I
I.
I
I
I
I
I
I
II
IUMlRICAL COMPARISOlI OF INPROYD ME'rHODS OF !ESlfIlI}
:rott
CONtDIGDCY tABLES WItJI SMALL FREQlJEIICIIS
By
Kariaki Sugiura
and
Masanori otake
(Uni-versity of North CaroliBa
(Atomic BoDib Casualty
" Hiroshima llniTersity)
Commission, Hiroshi.)
Institute of Statistics .ameo Series 110.
553
NoveDilter 1967
Ie
I
I
I
I
I
I
I
•I
!his research partially supported by the Sakko-Itai
J'Ou.nutioD and the Batioaal. ScieDce 1I'0undatioD
CJrant 10. GU-2059
DEP.All!lCff
at StAtI8!ICS
VaiverBity of Borth Carolina
Chapel Bill, •• C•
I
I.
I
I
I
I
I
I
I
IroMERICAL COMPARISON or IMPROVED METHODS OF TES'fiIG
FOR CONTINGENCY TABLES w:rm SMALL FREQUENCIES
;..
Bariaki Sugiura*
ad
Masanori otake
(University of North Carolina
(Atomic Bomb Casualt7'
& B1roshima university)
Commission, HiroShima)
1.
Introduction and sU1DlD8.l'Y'
!he signific8.Dce levels of various tests for a general c x k con-
tingency table are usually' given by larae sample theory.
accurate for the one having smal1 frequencies.
But tlley are not
In this paper, a numerical
eValuation was made to determine how good the &l'Proximation ot significance
level is for various improved tests that have been developed b7 .ass ['],
Ie
Yoshimura [9], Gart [2] etc. for c x k contingenq- table with small trequen-
I
I
1eve1s of the various approximate methods (i) with those ot one sided tail
I,
I'
I
I
I
•I
cies in some of cells.
For this purpose we compared tile signiticance
defined in the terms of exact probabilities for given margina1s in 2 X 2
table; (ii) with those of exact probabilities accumulated in the order of
magnitUde of
I
ste.tistic or 1ikelihood ratio (-LIt) statistic in 2 x 3
table mentioned by Yates [8].
In 2 x 2 tab1e it is well known that Yates'
correction gives satisfactory result for 8_11 cell frequencies and the
other methods that we have not referred here, can be considered if' we devote
our attention only to, 2 x 2 or 2 x k tab1e.
But we are mainl7 i.terested
in comparina the methods that are applicable to general c x k table.
appears that such a comparison for the various im,proved methods in the
same examp1e has not ,been made explicitly, eVeD though these tests are
frequently used in biological and _dical research•
"'This research was partially supported by the Sakko-kai Foundation
1
It
I
I.
I
I
I
I
I
I
I
Ie
Our
numerical experience sbo,", the tollowing facts.
somewhat better than the others tor 2 x 2 tables, though thq are aomewbat
too small tor exact levels in the ra.qe one to tive percent.
the others are not alWlqS conservative.
ot
!rIae corrected L'R statistic DJ'
Yoshimura [9] 18 a11f8¥S a little lIIOre aCCNrate tJaan the LR statistic.
lbwever, the approximate signitic8Dce level ot L1l statistic ... allfa:1s
smaller thILIl the others except tar the table llaTiDg
comparative~ large
marginala.
!he other methods that we iaftatigated give a.l...:>st the a. .
2
results as the standa.'rd 1 test. !he method ot .ass (6]proved Rot to gift
as good aa approximation as he conjectured.
!he followiDg notation ot observed frequencies for
&
c x k con-
tingency table is used throughout this l"tper. .
~
I
I
I
I
I
~1'
~,
•• •
• ••
I
'!he ..thod
Gart [2] is conservative for 2 x 3 taD1. tar tile practical purpoae, 'bllt
Xu'
.-
!he approximte
.
significance levels due to Gart [2] ad the Editied Dandekar's _tho4 are
I
I
.
2.
... , ~
~.
..., Xa
~.
...
• ••
Xc1' Xc2'···' Xck
Xc·
X· l , X· 2,·· ., X. k
•
Description ot .thods with R'UeriCal. cOQ&risClllS in bo's eD:Q1e
Rao
([7], :P.2(2) consid.erecl the tollow:LDc 2 x 2 ta'b1e, testiDe
whether the cit;y soldiers are more sociable tllaD village soldiers. '
SociaBle
City soldiers
Tillage soldiera
13
Total.
19
.o.socia'bl.
4
lfotal :
1"
'2.0
11
37
6
2
17
.......
", '"
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
.-
I
In this example he compared the various improved techniques for small
frequencies with the exact method, that is, the one sided tail probability
(X
= 0 '" 4) for fixed marginals is given by equation (l) and the approxi12
mate significance levels of the other methods described by him are shown in
(5).
(2)
(1)
(2)
Exact probability = 0.0059
2
2
X test, p{x > 7.9435)/2
(3)
Yates' correction,
p{ ,>2
(4)
LR test,
> 8.2811)/2 = 0.0020
(5)
Dandekar's correction,
p{-21og\
= 0.0024
> 6.1922)/2
= 0.0064
p{ 'i~ > 6.1086)/2
= 0.0068.
These results show that _:2 method (2) and LR method (4) do not give satisfactory approximation and that Yates' correction (3) and Dandekar' s
correction (5) are fairly good.
But Yates' correction cannot be extended
to a general c x k table as well as Dandeku's correction.
~ate
We shall investi-
some other techniques applicable to c x k table which give good epproxi-
mations in this example.
(6)
Corrected LR statistic.
YOlhiDll1ra [9]calculated the correction
factor K for the LR statistic, -2 log )" such that the first and the second
conditional moments tor given margiD&1s of the statistic, - 2K log }.., are
equal to those of the ·'{c-l}{k.-l) distribution up to the order of lin, that is,
,
- 2K log '. = - 2 K!
..
;:~=l;~=lXijlOgXij
-
z~=lXi .logXi •
_)Jt
X logX
+ If log If
•j
•j
~j=l
where
3
I
Ie
I
I
I
I
I
I
I
,e
I
I
I
I
I
I
I
.e
I
Then he proposed using the statistic - 2 K log A, which is a.pproxiDl&tely
Xcc-l) (k-l) dist~ibut~on.
distributed a.ccording to the
K = 0.95906 and p( - 2 K log l\ > 7.94?1) = 0.0024.
In Rao' s example,
!he accuracy of this
correction gives the same significance level as that of the
l
method (2),
though it is a little better than the LR method (4).
(7)
Corrected
I
statistic.
we shall determine the correction
fe,ctors a and b such that the first and second cODditional mments of the
statistic a
-l
+ b are equal to those of Ice-l)(k-l) distribution.
I
conditional Jll)ments of the
E
l
=
E[li Xi.' X.
E2 =
E[i~ I Xi·'
j
statistic are giTeD. by Haldane [4].
= N(lf-l)-l(c"":i.)(k'-l)
]
J
X. j
'.rhe
=
~[(Ji-l)(I'-i!)(~)rl
(A +
o
Al~
+A• l)
2
A = (~)2(k_l)2 + 2(e-l)(k-l)
O
~ = -4(e-l)2(k-l)2
+
c~2
+ 2ck - 2 -
(k2+2k-e)lf£~=lX; ;
2
__ -It -1 _...2( c -1) (-'k -1
- (c +2c-2)Ej=lX. j + .- r.i=lX i '. Ej"=lX. j )
A_ =
-,
ck(c-2) (k-e) + k(k-e).r:i_1X-i:J. + c(c-e)~ X-l
-
+
~(t~=lX~;) (~=lX:-~)
•
•
Then we get a = [ 2 (e-l) (k-l)/(E 2
l
regarding the statistic a
example, a
= 0.9864,
'b =
j=l .j
~) ]1/2
and b = (c-l) (k-l) - aE ,
l
+ b as a x(C-l){k-l) variate.
0.01382 end P(al' + b
!his correction is slightly better then the
In bots
> 7.8218)/2 = 0.0026.
l- method
(2), thoqh it
is not satisfactory to us.
(8) Method by Nass I.
a
I
Nass [6J proposes to use the test statistic
as a ~ variate, 'Where the quantities a aDd. f are determined such
that the first and second conditional moments of the statistic a
i
are
I
I.
I
I
I
I
I
I
I
equaJ. to those of the
f = aE
~ distribution, that is, a = 2E l /(2E2 - ~),
l where El' E2 are given by (7).
This method differs from the.
others in that the test statistic has a ~ distribution with non-integer
number of degrees of freedom.
~
tm
In order to get the tail probabilit7 for
distribution, we made use of the Cheb7ahev series expansions for
gamma functions mentioned by Clenabaw [1] and the continued fraction
expansion
-e. x
x e
!
-
'1: a-l
1 1-.
1
2-a 2
x e x dx X+""1+ i+' 1+ x+
c;o
3..... • •• n-l n-e. •••
1+
X+ 1+
for 0 < a < 1, which is discussed by Gupta and Waknis [3].
example, a • 1.0000, f = 1.0278 and
p(a~ > 7.(439)/2 = 0.0025. This
result is almost the same as that for the
(9)
Method by Nass II.
In Rao's
l
method (2) and method (7).
Rass [6] conjectured that his method (8)
l
Ie
would be improved, it he could use the test statistic, a
I
I
I
I
I
I
I
a, b and f are determi..ned such that the first three conditional lIIOments
••I
+ D, as if
it were distributed according to the ~ distribution, _ere the parametea
ofax2 + b are equaJ. to those of the ~ distribution.
In a 2 x k table
we can check this conjecture 'by using the third Jl¥)ment given b7 Haldane [5].
We shall write it after some rearrangement.
2
E = E(IIX .,X. j] = .5 [(N-l)(R-2) ••• (1'-5) ]-lUk -l)(k+3) +
i
3
2
c l = 2(:30k + 69k - 43)-e(9k + 47)-S=lX:-~
+
c2 =
~lca. .-a}
~(~.X2. )-1(-3;-elk2-e4k+26+(3k+19)~=lX:-~}
120C~k-l)-3601f~=lX:-~
+
120~~=lX~
+
r(~.~. )-lr5k3-57k2-e66k+120+3(9k+67).~=lX:-~
-
.4(~.~.)-e(-e(k3+9k2+l4k-12)+(3k+22).S=lX:-~ - ~S=lX~}
5
-
3em2~=lX~}
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
.e
I
c
=
3
~(X1.X2.>-1 r60k(k-e)-30(k-6).S=lX:-~ - 9O:I~=lX~ }
-
= -§4(~.X2.)~(4k(k2~)-(2J.k-e0).S=lX:-~ -l:r.rS=lX:~)
c4
c 5=
By
N4(~.~.>-e (6k(k-2)(k+5)~(3k~3)xS=lX:-~ - 16rS=lX~)
-4N4(~.~.> -e (3k~)B~=1X:-~+ w2S.1X~}.
using the :.oments E together with E , 1: given by (7), we
1 2
3
the three parameters as a =
t: =
8~/~,
where
-r = E1,
4~/~,
~
4~(~
-
= ~ - 31:~1
+
b =
= 1:2 -
~
and .,
CaD
determine
~m,)/m5'
ai.
In
2
Rso's example, a = 1.0287, b = 0.03034, f = 1.0877 and P(aXf+b > 8.2022)/2
= 0.0024. !he result ot: this correction ia again the same as that ot: the
X2 method (2) ad methods (6) ,... (8). !he approximation method conjectured
by Nass does not aeem to be satisfactory.
(J.O) Method by Gart I. Gart [2] derived the modified L1t statiatic
Mfd, as a x.2 variate with (c-1)(k-1) degrees ot: freedom tram an interesting
viewpoint of regarding the data value Xij as parameters and cell probabUi ties as random variables based on the equality connecting the JDLl1tinomiaJ..
distribution with the multivariate Beta distribution.
- S=l(2X. j + c)log(2X. j + c) + (211 + ck)log(2N + Ck)k
It
c
1
d=l+t
1:.
-I:
-J;
3(c-1)(k-1f i=l j=l
i=l 2Xi • +k
j=l
1 {C
n~j+1
I
I
}
at.
j
+6+
2Jr+ck
We shall remark that this correction factor d contains random variables Xij ,
which distinguishes this method from others.
d = 1.0565 and P(Mfd >
7.3~0)/2 =
0.0033.
than the previous ones.
6
In Rao' s example, M = 7.8096,
This result is somewhat better
..
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
.e
I
(n)
Method bY' Gut II.
Gart [2] also proposed to use the more
accurate correction factor d' instead of d in the method (10).
d' =
1
(c-1Hk-l)
[CE
+ (:1 _
{ 1-
i=l j=l
C{ 1- E
i=l·
kr.
1
+
3(2Xi .+k)
1
3 (211 + ck)
+
1
3(2X +1)
ij
+
1
~
-1
8(21: , +1<)2 }
- jal
1
1
6(211 +
1
rl
8 (2X +1)2
ij
Ck)~
}-l
In Rao's example, d' = 1.0562 and P(M,ld'
,-
1-
1
+
3(2X. j-+<:)
1
8(2X +c)2
.j
r
1
_ •
2:
7.3~Ji.)/2 =
0.0033.
1'.I1e result of this correction is the same as that of the .thod (10).
!he
difterences between methods (10) and (n) will be seen in another example
l.ater.
(12)
Modified DaIldekar' s .thod.
J)andekar' s correction mentioned
in Roo ([7]), p.203) cannot be exteD4eci to a general c x k table.
because tor given marginals we c....ot
unique~
1'.I1is is
increase or decrease the
observed f'requencY' bY' 1 accordingly as the miDimwn observed frequellc7 is
decreased or increased by 1.
suggested .,. Mr. Veda.
We shall consider the following modification
Let the minimum number ot all the cell trequencies
be attained bY' the (il,jl)' ••• ' (il,j j) ceUa.
Then we caleuJ.ate the
l-
statistic tor the table having the BJdified. trequency X' , where X' = X +
rs
rs
rel
X' 0 = X • + (muiber of i ,. 0., it equal to r) x t- , X' = X + (number
r
r oS
•a l
l
of jl' ••• ' jl equal to s) x J- for (r,s) = (il,jl)'···' (it,
jiJ,
X'rs = Xrs for other cells and 11' :: R + 1.
~l.
(~
We sball define this value as__
The modified Dandekar' s method is to use the statistic
-X :1)(-X:1 - '(
~)/(~l -
'cl)
as a
7
l
~
=
variate with (c-l)(k-l)
x. ~ -
I
t-l ,
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
••I
degrees of treedom, where
observed trequency Xi;"
~ means the value of x,2 statistic for the
Here we DDlst note that the inequalities
222
> It 0 > X -1 are not necessarily hold.
'X+l
Figure q. we have X:l <
~<
x:
l"
~
In Rao's example,
.1-1 = 6.7556 and P('X~ 2: 7.295 8)/2 •
= 9 '" 12 in
7.91J.35, ~ = 9.3678,
In fact for X
12
0.0035.
=
This result is slightly better
than Gart' s methods (10) and (11).
3.
Comparison in some 2 x 2 and 2 x 3 tables
In the previous section ve have checked the accurac," for the various
approximation methods in Roo's eX8JllJ)le onl.7.
the effects of these corrections for ~
We shall first investigate
=0
..,
7 or
~l = 0 ,..,
8 (accurm-
1ated in opposite direction) in the same example, 'Whose results are shown
in Figures 1 and 2.
(i)
!hen we also examiDe ia cue of other tables u
Rao's example
~
~
20
19
18
37
2 x 2 s,mmetric case
~l
X
12
20
(Figure 3)
~l
~
20
20
20
q.o
(Figures 1 and 2)
(ii)
(iii)
(iv)
17
~l
~l
2 x 2 table with large
18
arginals
X
X
(Figure 4)
~l
~
555
133
440
573
~3
17
11
12
2 x 3 table by Yates'
( [8] p. 233)
(Figures 5 and 6)
13
30
follows:
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
••I
Figures 1. and 2 show the following facts.
Gart (1.0), (n) and modified
Dandekar (]2) give fairl.y better results aad LR method (4) always gives
methods (1) ....., (9) are almost the same as the
-l method
(2).
note in Figure 3 that the vaJ.ue obtained by Gart (1.0) for
larger than tor
XJ.2
~e
Yoshimura (6) is unifor:me1.7 better then (4).
the smal1.est T&lues.
= 1..
We JIII18t
Xu
= 0 is
This fact seems to be a weak point ot metbod (1.0).
Gart (11.) is preferable to Gart (10) in the presence of zero frequencies
for some cells; otherwise they are nearly equal.
Figure 4 shon that the
LR method (4) and Yoshimura (6) are better and thq are aJ.mcst equaJ. to
Gart (1.0) and (ll).
In this case, the /
method (2) as well as moditied
Dandekar's method (1.2) do not give good approximations.
In Figures
5 and 6, the methods (2) and (1) ,., (9) give a1.lmst the same results as
in the previous Figures, but they are close to exact method (1.).
Gut (1.0)
and (ll) are conservative, but JlIOdified J)andekar (]2) is not.
Acknowledgment.
The authors wish to express their gratitude to
Mr. S. Ueda, Ministry of Foreign Attairs ucl to Mr. C. E. Lead, AlICC,
for their valuab1.e suggestions.
~,. &1.80
thaDk Miss FutagaJli, Computins
Laboratory, Hiroshima 'University tor her eomputational help.
9
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
References
[1]
C. W. Clenshaw; Mathematical Tables Vol. 5, Chebyshev series for
mathematical function, Her Majesty's stationary office, l'at101'1al
Physical Laboratory, 1962.
[2]
John J. Gart; Alternative anaJ..yses ot cont1llgency tables, J. Roy
Statist. Soc. I,
[3]
28(1966), 164-179.
Shanti S. Gupta and Mnldu.ll.a I'. Waknis; A S18tem of inequaJ.ities
for the incomplete
Math. Statist.
[4]
gamDII.
f'unction and the normal integral, Alm.
36(1965), 139-149.
J. B. S. Haldane; !he mean and variance ot
a test
of hempneity, when expections are s-.n, Biometrika 31(1911-0),
346-355.
[5]
J. B. S. Baldane; The use of .,..2 as a test ot hoDr>geneity in a (n x 2)
fold table who expectations are s-.11, Biometrika,
33(1~'),
234-238.
[6]
C. A. G. I'&8S; The
-l test tor
small ex,pectations in contingency
tables, vith special reference to accidents and absenteeim,
Biometrika, 46(1959), 365-385.
[7]
C. R. Rao; Advanced Statistical. Methods in Biometric Research,
John Wiley and Sons, Inc.
1952.
[8] F. Yates; Contingency tables inwlving s-.u muabers aad the
Supp1- J. Roy. Statist. Soc.
[9]
l
test,
1(1934), 217-235.
I. YoshiBlra; Scale correction ot likelihood ratio test statistics
related to contingenc7 tables, Rep. Stat. Appl. Res.,
.e
10
I
-l when 'Used as
lO(1963), 1-10.
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
••
I
Figure 1
One sided tail probabilities for X12 ·0-8 in Rao's example
log10P
-1
--
-2
-3
-4
Exact (1)
-5
_ · _ - X2
(2)
Yates (3)
-6
-- --------- -21ogA (4)
- - _ . _ - - -2KlogA (6)
_·o-.-.aX2 +b
-8
-
.. -
A-
-Nass I
(?)
(8)
· - x - « - N a s s II (9)
"-A-4-Gart I'(10)
- - ' - - G a r t II (11)
-=--..-
-10
._Modified (12)
" Dandekar
-11
-12
0
1
2
3
4
5
6
.7
X12
One sided tail probabilities for X11 • 0-8 in Rao's example
-1
-2
-4
-----Exact
(1)
- - - - X2
(2)
-----yates
(3)
---------- -21ogA (4)
~---- -2KlogA (6)
-0-0-
aX2 +b
-tA-tA-
Nass I :(8)
-I(-)t-
Nass II (9)..
(7)
-4-4-Gart I
- . - - . - Gart II (11)
.-e._e_
Modified (12)
Dandekar
-11
o
1
2
3
4
5
6
7
8
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
I·
I
One sided tail probabilities for X12 • 0-9 in symmetric case
-1
-2
_ _ _ _ _ Exact
-5
(1)
---
(2)
- - - - - Yates C~)
__________ -2logA (4)
-- -
-2KlogA (6)
- -
(.7)
-A-A-
-
(8)
Nass I
x - x.- Nass II
-4-6-
Gar1i I
(9)
(10)
---. - . - Gart II (11)
-u-u-
.
Modified (12)
Dandekar
i
i I
i
I
II
I,'
-12
II
( I
! I
,: I
o
1
2
4
5
6
7
8
I
One sided tail probabilities for X12 " 0-12 with lar!?e
margl.nals
Figure 4
_ _ _ _ _ Exact
_ _ _ _ %2
(2)
- - - - - Yates
---------- -2log", (4)
- - - - - -2Klog", (6)
-0-0-
aX2 +b
(7)
-6-4-
Nass I
(8)
-~-x.- Nass
I
2
4
5
6
7
(9)
-
A-A-
Gart I
-
• -
Gart II (11)
_
1
II
••
8
• -
(10)
•• _Modified (12)
Dandekar
9
10
11
- -e - ---------------e
,~e
Figure
Tail probabilities in 2 X 3 table
5
accumulated by the magintude of
X.
statistic
log10P
-1
-2
.-~
-/
'\
~V-h£'
/
Exact
,.
_ - - X}
(1)
(2)
- - - - - - - - - -2log)... (4)
\
"
./
- - - - - -2Klog)... (6)
2
aX +b (7)
-.-0-
"
-3
Nass I
-
Nass II (9)
9
7
5
0
8
0
10 10
1
3
5
1
3
6
7
6
10
2
3
3
4
2
9
9
1
4
4
6
8
3
5
5
6
6
x. -~-
_4-A-
Gart I
-
Gart II (11)
-
X12 2
X13 5
(8)
-A-A-
3
4
t -t-
•• -
6
1
tt_
5
6
8
1
(10)
Modified (12)
Dandekar
7
9
1
2
------- - -• -- - - - - -e e
Tail probabilities in 2 ~ 3 tables accumulated by the mae;intudc of LR statistics
Figure 6
log10P
-1
Exact
-2
\\.
/-
/.~
/
~
_ _ _ X,2
__ _ _ tI'
_______ -- -21ogA
(1)
(2)
(4)
- - - - --2KlogA (6)
-
"
,,
_ o _ o - a X2 +b (7)
..,--- -
_I.
-3
-A-Nass I
- ) ( - K- Nass II (9)
_A_A-Gar: I
X 12
(8)
7
X
0
13
8
0
2
4
9
2
5
5
10
1
3
6
7
10
6
3
10
2
5
1
4
6
3
3
6
6
5
6
4
2
(10)
-
• - . - Gart II (11)
_
•• _
9
9
8
3
1
4
5
5
.. ~ Modified (12)
Dandekar
8
7 9
3 6
2
1
4 1- 1
© Copyright 2026 Paperzz