California State University, Northridge
APPROPRIATE USE OF YATES' CORRECTION FOR CONTINUITY

A thesis submitted in partial fulfillment of the
requirements for the degree of Master of Science in
Health Science, Biostatistics & Epidemiology

by

Nuntavarn Pattaphongse

August, 1978
The thesis of Nuntavarn Pattaphongse is approved:

Seymour Eiseman

Bernard Hanes, Committee Chairman

California State University, Northridge
ACKNOWLEDGEMENTS

I wish to thank Dr. Bernard Hanes, Dr. Ram Swaroop, and Dr. Seymour Eiseman for their critical readings of the drafts and their helpful suggestions.
TABLE OF CONTENTS

ACKNOWLEDGEMENTS

ABSTRACT

Chapter

     I    INTRODUCTION

     II   REVIEW OF THE LITERATURE

     III  STUDY SCHEME

     IV   CONCLUSIONS

FOOTNOTES

BIBLIOGRAPHY

APPENDICES

     A.

     B.   Fisher Exact Probability

     C.   Notes on Programming Procedure

LIST OF TABLES
ABSTRACT
APPROPRIATE USE OF YATES' CORRECTION FOR CONTINUITY
by
Nuntavarn Pattaphongse
Master of Science in Health Science
The exact distribution of Pearson chi-squared is discontinuous. Thus, the continuous chi-squared table may give a poor approximation to the exact probabilities. In 1934 Yates introduced his correction term for continuity. Since then, much research has been done to find out the extent of the necessity of the correction. In 1974, Conover, with a sample size of forty, examined the practicability of Yates' correction. His results showed that Yates' correction is useful in some situations but not in all circumstances. The main objective of this thesis is to examine Yates' correction for continuity for sample sizes of fifty-six, thirty, and twenty, on lines similar to Conover's study. The Fisher exact probability test is used as a standard in comparing the performances of Pearson χ² with Yates' correction and χ² without the correction term. A time-sharing computer program (APL) is used to perform all the necessary calculations.
The results obtained agree with Conover's conclusions. Namely, in cases with predetermined marginal totals, the employment of Yates' correction improved the probabilities where one or both of the marginal totals are equal. In those conditions where none of the marginal totals are equal, Pearson's χ² with Yates' correction is inferior to the χ² without Yates' correction.
Chapter I
INTRODUCTION
In 1900, Pearson's interest was in finding mathematical contributions to the theory of evolution. He developed the method of chi-squared to deal with the problem of the relationship of attributes not capable of quantitative measurement. His chi-squared measures the total variation of the expected from the observed, which he termed contingency.1 The greater the contingency, the greater must be the amount of association or of correlation between the two attributes.2 A summarization of Pearson's development of chi-squared is as follows:3
Given 'N' observations of a random sample from a population, classified into 'k' mutually exclusive classes, the null hypothesis gives the probability p_i that an observation falls into the ith class (i = 1, 2, ..., k). The quantities m_i = N p_i are called the expected numbers, where

    Σ (i = 1 to k) p_i = 1.

The observed numbers x_i falling in the ith class follow a multinomial distribution with the p_i as probabilities. The joint distribution of the x_i's is therefore specified by the probabilities

    N! / (x_1! x_2! ... x_k!) · p_1^{x_1} p_2^{x_2} ... p_k^{x_k}.
As a test criterion for the null hypothesis, Karl Pearson proposed the quantity

    χ² = Σ (i = 1 to k) (x_i − m_i)² / m_i.

If the null hypothesis holds, the limiting distribution of χ² as N → ∞, with the p_i remaining fixed, is the continuous χ² distribution. Proof is shown in Appendix A.
Pearson's χ² statistic is a useful method for testing the independence of two characteristics. The term independent means that the distribution of one characteristic should be the same regardless of the other characteristic.4 Suppose a researcher wants to test the independence between sex and frequency of nightmares.5 The data obtained are arranged in the (2 x 2) table below.

    Nightmare       Men    Women   Total
    frequency
    Often            55       60     115
    Seldom          105      132     237
    Total           160      192     352
χ² for the above data is 0.39. For one degree of freedom, χ² at α = .05 is equal to 3.84; therefore, in the above example the hypothesis of independence is accepted at the five percent level of significance. Thus, the occurrence of nightmares is independent of sex.
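The arithmetic of this example can be sketched in a few lines of Python (a modern stand-in, not the APL program described later in this thesis):

```python
# Pearson chi-squared for a 2 x 2 table: expected counts are
# E_ij = (row total)(column total)/N.  Data: the nightmare example above.
observed = [[55, 60],     # Often:  Men, Women
            [105, 132]]   # Seldom: Men, Women

N = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

chi_sq = 0.0
for i in range(2):
    for j in range(2):
        expected = row_totals[i] * col_totals[j] / N
        chi_sq += (observed[i][j] - expected) ** 2 / expected

print(round(chi_sq, 2))   # 0.39, well below the 3.84 critical value at alpha = .05
```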
Another statistical test of independence for a (2 x 2) contingency table is Fisher's exact test.6 Fisher's exact test uses the 'exact' probability of observing the data under the assumption that a particular (2 x 2) table was generated by sampling from a four-variable hypergeometric distribution.7 Also, Fisher's exact test yields the exact probabilities of a particular (2 x 2) table and of more extreme cases of that table. The method of calculating the Fisher exact probability is shown in Appendix B. For large sample sizes the calculation becomes tedious because the technique involves factorials. Even with the use of existing computer programs, such as BMDP, SPSS, and IMSL, the maximum sample size allowed is only twenty-one. Although Fisher's exact test gives exact probabilities, it is difficult to use because it involves tedious calculations. For purposes of this thesis, the Fisher exact probability is used as a reference for comparison of Pearson χ² with and without Yates' correction for continuity.
Yates8 introduced the correction term in 1934 to improve the Pearson χ² estimates. In the case of a (2 x 2) table, Yates felt that the approximation of the discrete sampling distribution of Pearson χ², as computed to estimate the continuous χ² distribution, can be markedly improved by reducing the absolute value of each difference between observed and expected frequencies by .5.9
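A minimal sketch of the corrected statistic for the same nightmare data, again in Python rather than APL, shows the effect of shrinking each absolute difference by 0.5:

```python
# Yates' correction: each absolute difference |O - E| is reduced by 0.5
# before squaring.  Same nightmare data as in the example above.
observed = [[55, 60], [105, 132]]
N = sum(map(sum, observed))
row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]

chi_sq_c = 0.0
for i in range(2):
    for j in range(2):
        e = row_totals[i] * col_totals[j] / N
        chi_sq_c += (abs(observed[i][j] - e) - 0.5) ** 2 / e

print(round(chi_sq_c, 2))   # about 0.26, smaller than the uncorrected 0.39
```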
Since the distribution of Pearson χ² is discontinuous, when the expected frequencies are small the number of distinct values of χ² may be very limited. Thus, the continuous χ² table may give a poor approximation to the exact probability, mainly because the area of a continuous curve is used to approximate the sum of a small number of discrete probabilities.10 The conventional limit on the smallest expectation in any cell of the (2 x 2) contingency table is five.11
Since the introduction of Yates' correction for (2 x 2) contingency tables in 1934, much argument has developed regarding the necessity of the use of the correction. Conover (1974)12 cautioned against the use of Yates' correction in all situations. His study is used as a starting point in this thesis to further examine the appropriateness of Yates' correction. As in Conover's paper, the Fisher exact probability is used as a reference in comparing the probabilities of Pearson χ² with and without the correction.

The main objective of this thesis is to examine whether Conover's results hold with changes in sample size. The organization is as follows: In Chapter II, a series of articles on the effectiveness of Yates' correction for continuity on the chi-squared estimates is cited. Chapter III focuses on the extended study of Conover's research with various sample sizes. First, the method of sampling is presented; in addition, Conover's study is expanded to encompass different sample sizes. The APL computer language is employed in the calculation of χ², χ² with the correction, their probabilities, and the Fisher exact probabilities.
Chapter II

REVIEW OF THE LITERATURE
Karl Pearson1 (1900) developed the method of chi-squared to deal with problems of the relationship of attributes not capable of quantitative measurement.

    χ² = Σ (observed − expected)² / expected

Pearson termed any measure of the total variation of the expected from the observed a measure of its contingency. He established the necessary distribution theory for finding significance levels when the null hypothesis provides the exact values of the expected.
Yates2 in 1934 introduced a correction term for calculating Pearson χ² in a (2 x 2) table, where an error occurs when a continuous distribution is used to approximate the discrete Pearson χ². Yates found that Pearson's χ² estimation improves greatly when his correction of ½N (N = total sample size) is used.
In referring to the performance of χ²_c, chi-squared with Yates' correction, E. S. Pearson (1947)3 said it "becomes clear that in the case of small samples, at any rate, this method of introducing the normal approximation gives such an overestimate of the true chances of falling beyond a contour as to be almost valueless." The contour referred to the sample values which lead to the rejection of the hypothesis of equality of proportions at α = 0.05 or 0.01 when various values were assigned as the common value of the proportions under the null hypothesis.
In 1949, Lancaster4 cautioned that the use of χ²_c when combining several (2 x 2) contingency tables may result in serious bias. Also, in 1952, Cochran5 indicated that the correction term is not needed even when one or two expectations fall as low as one-half, provided the remainder are above the conventional limits of five or ten. For a (2 x 2) contingency table, Cochran suggested the use of Fisher's exact test when the sample size N is less than twenty, or when 20 < N < 40 and the smallest expectation is less than five. If N is greater than forty, χ² was recommended, and if the smallest expectation is less than five, χ²_c should be considered.
Plackett's6 results (1964) confirmed the conclusions reached empirically by Pearson that a correction is inappropriate if the reference set considered relevant is generated by independent binomial samples.
Grizzle7 (1967) noted that researchers are more likely to control the probability of making a Type I error at α = 0.05 and α = 0.01 when using the χ² statistic than when using χ²_c. Also, the results from his studies were in agreement with the usual recommendation that the smallest expected value be at least five before placing complete reliance on the χ² statistic. However, when the smallest expected value is less than five, χ²_c should be chosen over χ².
In contrast to Grizzle's results, Mantel and Greenhouse8 (1968), in their comparison of the exact test probabilities for a (2 x 2) contingency table with the estimates obtained alternatively from the corrected and the uncorrected chi-squared, found that the corrected chi-squared came closer to the Fisher exact probability.
Roscoe and Byars9 (1971) suggested using Fisher's exact test for (2 x 2) tables when the expected frequencies are 7.5 or less. Also, they concluded that the chi-squared approximation with 1 degree of freedom provided an acceptable approximation under all circumstances where the average expected frequency was maintained at 7.5 or more.10
Conover11 (1974) made the following assertions in the beginning of his paper:

(1) If the row and column totals are predetermined and n_1 = n_2 or c_1 = c_2, then the corrected χ²_c improves the probability estimates in most cases.

(2) If the row and column totals are predetermined, but n_1 ≠ n_2 and c_1 ≠ c_2, then χ²_c improves the estimates (over χ²) for about half of the possible contingency tables and provides worse probability estimates than the uncorrected statistic in the rest of the cases.

Conover used a particular sampling scheme to demonstrate these assertions (14, 15, 16, 17, 18).
Conover's sampling scheme was the same as the example used by Mantel and Greenhouse (1968). It involved a contingency table with equal row totals n_1 = n_2 = 20 and column totals c_1 = 7, c_2 = 33.12 The test of performance between the corrected χ²_c and the uncorrected χ² is based upon the comparison of their probabilities against the Fisher exact probabilities.

Also, the Conover sampling scheme is divided into four parts. Part A consists of a contingency table with equal row totals n_1 = n_2 = 20 and column totals c_1 = 7, c_2 = 33.
Part A

    a    b  |  20
    c    d  |  20
    -------------
    7   33  |  40

Given all the possible combinations of cell frequencies, with a ranging from 0 to 7, it was found that the χ²_c estimates perform better than the χ² estimates.13 In other words, the difference between the corrected χ² and the Fisher exact probability was less than that between the uncorrected χ² and the Fisher exact probability.14
In part B, Conover changed the row totals slightly to n_1 = 19, n_2 = 21, but the column totals remain the same.

Part B

    a    b  |  19
    c    d  |  21
    -------------
    7   33  |  40

Again, all the possible combinations of cell frequencies were examined, and the results changed drastically. In this example, the χ² estimates were better than χ²_c; the correction was not needed.15
In part C, all the marginal totals are set equal to twenty to obtain the most symmetric case. The results again illustrated that χ²_c provides better estimates than χ².16
Lastly, the marginal totals shown in part D are adjusted slightly from part C to obtain n_1 = 19, n_2 = 21, c_1 = 19, c_2 = 21. All the possible combinations of cell frequencies were taken, and it was found that the χ² estimates are closer to the true probabilities than the χ²_c estimates more times than not.17

Part D

    a    b  |  19
    c    d  |  21
    -------------
   19   21  |  40
Conover's assertions are supported by the above sampling schemes. Furthermore, he concluded that the actual significance level is lowered when the correction is used. This results in a reduction in power; that is, it reduces the probability of detecting a real association or a real difference in rates.18
Chapter III

STUDY SCHEME

METHOD OF SAMPLING
The example used by Mantel and Greenhouse1 involved a contingency table with equal row totals n_1 = n_2 = 20 and column totals c_1 = 7, c_2 = 33.

    a    b  |  20
    c    d  |  20
    -------------
    7   33  |  40

They found that the χ²_c estimates performed considerably better than the χ² estimates.2 The results were repeated by Conover (1974) with three additional sampling schemes. First, the row totals were changed slightly to n_1 = 19, n_2 = 21, but the column totals remained the same, c_1 = 7, c_2 = 33.

    a    b  |  19
    c    d  |  21
    -------------
    7   33  |  40
Secondly, Conover fixed all the marginal totals at 20.

    a    b  |  20
    c    d  |  20
    -------------
   20   20  |  40
In the last sample, the marginal totals were readjusted slightly to n_1 = 19, n_2 = 21 and c_1 = 19, c_2 = 21.

    a    b  |  19
    c    d  |  21
    -------------
   19   21  |  40
Conover dealt only with a total sample size of forty.3 The study presented in this thesis is an extension of Conover's paper. Three more sets of sample sizes, fixed at fifty-six, thirty, and twenty, are examined. As in Conover's study, the same method of sampling is used in this thesis. That is, for part A the row totals are ½N and the column totals are fixed at arbitrary numbers. In part B, the row totals are changed slightly from part A but the column totals remain the same. In part C, the row and column totals are set equal to ½N, and in part D both the row and column totals are changed slightly from part C.
In each of the above parts, all the possible combinations of cell frequencies are considered. The Fisher exact test is used as a reference in the comparison between the χ² and χ²_c estimates. The derivation of the Fisher exact test is given in Appendix B. Several computer program packages, such as SPSS, IMSL and BMDP, permit a maximum sample size of only twenty-one, which is quite limited. Therefore, a time-sharing APL program is written to calculate the Fisher exact probabilities. This program allows a total sample size of up to sixty-nine. Programming procedures together with flow charts are presented in Appendix C.
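The sampling scheme can be sketched compactly in Python (used here as a stand-in for the APL factorial routines; the margins are those of part A for N = 40, and flooring the corrected difference at zero is an assumption of this sketch):

```python
# Enumerate every 2 x 2 table with the fixed margins below and compare the
# chi-squared approximations (with and without Yates' correction) with the
# exact exceedance probability from the hypergeometric distribution.
from scipy.stats import hypergeom, chi2

n1, n2, c1, c2 = 20, 20, 7, 33           # part A margins for N = 40
N = n1 + n2

def pearson(a, corrected=False):
    b, c = n1 - a, c1 - a
    d = n2 - c
    diff = abs(a * d - b * c)
    if corrected:                        # Yates: shrink |ad - bc| by N/2
        diff = max(diff - N / 2, 0.0)
    return N * diff ** 2 / (n1 * n2 * c1 * c2)

pmf = {a: hypergeom.pmf(a, N, c1, n1) for a in range(c1 + 1)}
for a in range(c1 + 1):
    x2, x2c = pearson(a), pearson(a, corrected=True)
    exact = sum(p for t, p in pmf.items() if pearson(t) >= x2)
    print(a, round(exact, 5), round(chi2.sf(x2, 1), 5), round(chi2.sf(x2c, 1), 5))
```

For these margins the printed values agree, up to rounding and the grouping of symmetric tables, with part A of Table 2.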
RESULTS OBTAINED FROM THE STUDY

Tables 1, 2, 3 and 4 show the computer results obtained from the sampling scheme with sample sizes of fifty-six, forty, thirty and twenty respectively. Column 1 lists the possible values of χ²; column 2 gives the contingency tables that yield each χ²; column 3 gives the exact probability. Columns 4 and 5 contain estimates of the probabilities obtained by comparing the chi-squared distribution with one degree of freedom with χ² and χ²_c respectively. The last columns present the comparison in performance between χ² and χ²_c; that is, a 'b' marks which of the two performs better.

Table 1 illustrates how the χ² and χ²_c estimates compare in cases where the sample size is greater than forty. In agreement with Conover's findings, part A of Table 1 shows that the χ²_c estimates are closer to the exact probability in eighty percent of the cases.
Part A: Table 1

    a    b  |  28
    c    d  |  28
    -------------
    9   47  |  56

If the row totals are changed slightly, only by a factor of plus or minus two, to n_1 = 26, n_2 = 30, while the column totals remain the same, the results change drastically (part B, Table 1). The χ² estimates are now better than χ²_c in most instances.

Part B: Table 1

    a    b  |  26
    c    d  |  30
    -------------
    9   47  |  56
In part C of Table 1, all the marginal totals are set to twenty-eight, which is equal to half of the total sample size, to obtain the most symmetric case. The results show again that χ²_c provides better estimates than χ², in seventy percent of the cases.

Part C: Table 1

    a    b  |  28
    c    d  |  28
    -------------
   28   28  |  56
Again, for the last part (part D), the row and column totals are adjusted slightly by a factor of plus or minus two to n_1 = 26, n_2 = 30 and c_1 = 26, c_2 = 30. It is found that the χ² estimates are closer than the χ²_c estimates to the exact probabilities (in about sixty-nine percent of the cases).

Part D: Table 1

    a    b  |  26
    c    d  |  30
    -------------
   26   30  |  56
Again, in Table 3 and Table 4, with sample sizes of thirty and twenty respectively, part A, where n_1 = n_2, presents the same performance pattern as seen before; i.e., the χ²_c estimates are more accurate than the χ² estimates. This also holds true in part C of Tables 3 and 4, with n_1 = n_2 = c_1 = c_2 = ½N. When the row totals are changed slightly by a factor of plus or minus two, as in part B, the χ² estimates perform better than the χ²_c estimates. Also, part D of Tables 3 and 4 shows that when both the row and column totals are adjusted slightly, the χ² estimates better approximate the exact probabilities than the χ²_c estimates.
TABLE 1

Exact and Approximate Exceedance Probabilities for the Sample Size of Fifty-Six

Part A.  n_1 = 28, n_2 = 28, c_1 = 9, c_2 = 47

    χ²        a       Exact prob.*   P(χ²)     P(χ²_c)
    10.723    0, 9    0.0018         0.0008    0.0036
    6.487     1, 8    0.0268         0.0109    0.0290
    3.307     2, 7    0.1430         0.0689    0.1456
    1.192     3, 6    0.4688         0.2750    0.4688
    0.132     4, 5    1.0000         0.7159    1.0000

Part B.  n_1 = 26, n_2 = 30, c_1 = 9, c_2 = 47

    χ²        a    Exact prob.   P(χ²)     P(χ²_c)
    12.373    9    0.0004        0.0004    0.0016
    9.294     0    0.0023        0.0023    0.0073
    7.773     8    0.0085        0.0053    0.0154
    5.378     1    0.0206        0.0204    0.0507
    4.237     7    0.0663        0.0396    0.0903
    2.526     2    0.1537        0.1120    0.2207
    1.766     6    0.2771        0.1839    0.3350
    0.739     3    0.4808        0.3899    0.6206
    0.359     5    0.7188        0.5490    0.8146
    0.017     4    1.0000        0.8964    0.8146

Part C.  n_1 = 28, n_2 = 28, c_1 = 28, c_2 = 28

    χ²        a         Exact prob.   P(χ²)      P(χ²_c)
    23.143    5, 23     2.6E-6**      1.5E-6**   5.5E-6**
    18.286    6, 22     3.9E-5**      1.9E-5**   6.1E-5**
    14.000    7, 21     0.0004        0.0002     0.0005
    10.286    8, 20     0.0029        0.0016     0.0004
    7.143     9, 19     0.0154        0.0075     0.0162
    4.571     10, 18    0.0604        0.0325     0.0614
    2.571     11, 17    0.1810        0.1088     0.1815
    1.143     12, 16    0.6230        0.2851     0.4227
    0.286     13, 15    0.7896        0.5930     0.7893
    0.000     14        1.0000        1.0000     0.7893

Part D.  n_1 = 26, n_2 = 30, c_1 = 26, c_2 = 30

    χ²        a     Exact prob.   P(χ²)     P(χ²_c)
    18.805    4     1.4E-5        1.4E-5    4.7E-5
    18.145    20    3.6E-5        2.0E-5    6.6E-5
    14.434    5     0.0002        0.0001    0.0004
    13.875    19    0.0004        0.0002    0.0006
    10.640    6     0.0014        0.0011    0.0028
    10.146    18    0.0028        0.0014    0.0035
    7.424     7     0.0082        0.0064    0.0140
    7.012     17    0.0149        0.0081    0.0173
    4.785     8     0.0353        0.0066    0.0550
    4.455     16    0.0582        0.0348    0.0654
    2.723     9     0.1156        0.0990    0.1671
    2.476     15    0.1791        0.1156    0.1920
    1.239     10    0.2953        0.2658    0.3986
    1.074     14    0.4210        0.3001    0.4428
    0.331     11    0.6013        0.5649    0.7588
    0.249     13    1.0000        0.6179    0.8179

* Column 3 is the probability of that particular table and less; it is the exact probability of unfavourable events.

** E- denotes the exponential power of base 10; e.g., 2.6E-6 is equal to 2.6 x 10^-6.
TABLE 2

Exact and Approximate Exceedance Probabilities for the Sample Size of Forty

Part A.  n_1 = 20, n_2 = 20, c_1 = 7, c_2 = 33

    χ²       a       Exact prob.   P(χ²)      P(χ²_c)
    8.485    0, 7    0.00832       0.00358    0.01253
    4.329    1, 6    0.09148       0.03747    0.09601
    1.558    2, 5    0.40748       0.21189    0.40527
    0.173    3, 4    1.00000       0.67732    1.00000

Part B.  n_1 = 19, n_2 = 21, c_1 = 7, c_2 = 33

    χ²       a    Exact prob.   P(χ²)      P(χ²_c)
    9.378    7    0.00270       0.00220    0.00815
    7.677    0    0.00894       0.00559    0.01857
    4.969    6    0.03950       0.02581    0.06992
    3.754    1    0.09480       0.05270    0.12832
    1.948    5    0.22578       0.16278    0.32752
    1.219    2    0.41242       0.26954    0.49179
    0.316    4    0.68893       0.57379    0.88406
    0.073    3    1.00000       0.78653    1.00000

Part C.  n_1 = 20, n_2 = 20, c_1 = 20, c_2 = 20

    χ²        a        Exact prob.   P(χ²)      P(χ²_c)
    19.600    3, 17    0.00002       0.00001    0.00004
    14.400    4, 16    0.00036       0.00015    0.00050
    10.000    5, 15    0.00385       0.00157    0.00443
    6.400     6, 14    0.02565       0.01141    0.02686
    3.600     7, 13    0.11284       0.05778    0.11385
    1.600     8, 12    0.34307       0.20590    0.34278
    0.400     9, 11    0.75237       0.52709    0.75183
    0.000     10       1.00000       1.00000    0.75183

Part D.  n_1 = 19, n_2 = 21, c_1 = 19, c_2 = 21

    χ²        a     Exact prob.   P(χ²)      P(χ²_c)
    19.839    2     0.00001       0.00001    0.00004
    19.558    16    0.00002       0.00001    0.00004
    14.593    3     0.00017       0.00013    0.00046
    14.352    15    0.00034       0.00015    0.00052
    10.151    4     0.00195       0.00144    0.00412
    9.950     14    0.00375       0.00161    0.00455
    6.513     5     0.01405       0.01071    0.02542
    6.352     13    0.02626       0.01172    0.02557
    3.558     12    0.11195       0.05926    0.11659
    3.679     6     0.06732       0.05511    0.10939
    1.568     11    0.34192       0.21049    0.34968
    1.648     7     0.22476       0.19917    0.33359
    0.422     8     0.54498       0.51576    0.73923
    0.382     10    0.75181       0.53645    0.76329
    0.0003    9     1.00000       0.98735    0.76329
TABLE 3

Exact and Approximate Exceedance Probabilities for the Sample Size of Thirty

Part A.  n_1 = 15, n_2 = 15, c_1 = 9, c_2 = 21

    χ²        a       Exact prob.   P(χ²)      P(χ²_c)
    12.857    0, 9    0.00070       0.00034    0.00144
    7.778     1, 8    0.01419       0.00529    0.01683
    3.968     2, 7    0.10646       0.04637    0.11102
    1.429     3, 6    0.42698       0.23200    0.42556
    0.159     4, 5    1.00000       0.69033    1.00000

Part B.  n_1 = 14, n_2 = 16, c_1 = 9, c_2 = 21

    χ²        a    Exact prob.   P(χ²)      P(χ²_c)
    14.694    9    0.00014       0.00013    0.00059
    11.250    0    0.00094       0.00080    0.00313
    9.209     8    0.00430       0.00241    0.00840
    6.531     1    0.01689       0.01060    0.03107
    5.000     7    0.04568       0.02535    0.06624
    3.087     2    0.11844       0.07893    0.17459
    2.066     6    0.23599       0.15058    0.29919
    0.918     3    0.43973       0.33790    0.57615
    0.408     5    0.69449       0.52290    0.81066
    0.026     4    1.00000       0.87310    0.81066

Part C.  n_1 = 15, n_2 = 15, c_1 = 15, c_2 = 15

    χ²        a        Exact prob.   P(χ²)      P(χ²_c)
    30.000    0, 15    1.3E-9        4.32E-8    3.19E-7
    22.533    1, 14    2.9E-6        2.07E-6    1.18E-5
    16.133    2, 13    0.00015       0.00006    0.00026
    10.800    3, 12    0.00281       0.00102    0.00349
    6.533     4, 11    0.02684       0.01587    0.02846
    3.333     5, 10    0.14311       0.06789    0.14413
    1.200     6, 9     0.46610       0.27332    0.46521
    0.133     7, 8     1.00000       0.71500    1.00000

Part D.  n_1 = 14, n_2 = 16, c_1 = 14, c_2 = 16

    χ²        a     Exact prob.   P(χ²)      P(χ²_c)
    16.476    1     0.00005       0.00005    0.00022
    16.081    12    0.00013       0.00006    0.00027
    11.059    2     0.00127       0.00088    0.00309
    10.736    11    0.00267       0.00105    0.00362
    6.718     3     0.01361       0.00954    0.02607
    6.467     10    0.02613       0.01099    0.02954
    3.453     4     0.08126       0.06312    0.13581
    3.247     9     0.14140       0.07038    0.14912
    1.126     5     0.29890       0.26068    0.44845
    1.158     8     0.46430       0.28198    0.47826
    0.153     6     0.73002       0.69563    0.98049
    0.117     7     1.00000       0.73211    0.98049
TABLE 4

Exact and Approximate Exceedance Probabilities for the Sample Size of Twenty

Part A.  n_1 = 10, n_2 = 10, c_1 = 7, c_2 = 13

    χ²        a       Exact prob.   P(χ²)      P(χ²_c)
    10.769    0, 7    0.00310       0.00103    0.00491
    5.495     1, 6    0.05728       0.01908    0.06076
    1.987     2, 5    0.34984       0.15960    0.34844
    0.220     3, 4    1.00000       0.63921    1.00000

Part B.  n_1 = 9, n_2 = 11, c_1 = 7, c_2 = 13

    χ²        a    Exact prob.   P(χ²)      P(χ²_c)
    13.162    7    0.00046       0.00029    0.00159
    8.811     0    0.00472       0.00299    0.21252
    7.213     6    0.01660       0.00724    0.02680
    4.105     1    0.07028       0.04276    0.11998
    3.039     5    0.15968       0.08128    0.20332
    1.174     2    0.37423       0.27850    0.54019
    0.642     4    0.64242       0.42314    0.74154
    0.020     3    1.00000       0.88759    0.74154

Part C.  n_1 = 10, n_2 = 10, c_1 = 10, c_2 = 10

    χ²        a        Exact prob.   P(χ²)      P(χ²_c)
    20.000    0, 10    0.00001       7.74E-6    5.70E-5
    12.800    1, 9     0.00109       0.00035    0.00175
    7.200     2, 8     0.02301       0.00729    0.02535
    3.200     3, 7     0.17890       0.07364    0.17971
    0.800     4, 6     0.65628       0.37109    0.65472
    0.000     5        1.00000       1.00000    0.65472

Part D.  n_1 = 9, n_2 = 11, c_1 = 9, c_2 = 11

    χ²        a    Exact prob.   P(χ²)       P(χ²_c)
    20.000    9    5.95E-6       7.74E-6     5.81E-5
    13.388    0    0.00033       0.000253    0.001340
    12.735    8    0.00092       0.000359    0.001827
    7.593     1    0.00976       0.005859    0.021232
    7.130     7    0.02155       0.007694    0.026864
    3.430     2    0.09228       0.064012    0.161400
    3.104     6    0.17480       0.078111    0.190190
    0.899     3    0.40590       0.342810    0.619260
    0.737     5    0.65340       0.390730    0.684330
    0.002     4    1.00000       0.963970    0.684330
Chapter IV

CONCLUSIONS

The results obtained from this study are in agreement with Conover's, and they are as follows:

1. In cases where the row or column totals are equal, n_1 = n_2 or c_1 = c_2, the probability estimates of χ²_c come closer to the Fisher exact probability than those of χ² without the correction term.

2. In cases where the row and column totals are unequal, n_1 ≠ n_2 and c_1 ≠ c_2, χ² without the correction provides the closer approximation to the Fisher exact probability.
FOOTNOTES

CHAPTER I

1 Pearson, Karl. Early Statistical Papers. Cambridge: University Press, 1948, p. 443

3 Gibbons, J.D. Nonparametric Statistical Inference. New York: McGraw-Hill Book Co., 1971, p. 231

4 Dixon, Wilfrid and Massey, Frank J. Introduction to Statistical Analysis. 3rd ed. New York: McGraw-Hill Book Co., 1969, p. 240

5 Data taken from Larsen, Richard J. Statistics for the Allied Health Sciences. Ohio: Charles E. Merrill Publishing Co., 1975, p. 246

6 Documented in an article written by Starmer, C. Frank, Grizzle, James E., and Sen, P.K., "Comment on the Yates Continuity Correction", Journal of the American Statistical Association, June 1974, p. 377

7 Ibid.

8 Yates, F., "Contingency Tables Involving Small Numbers and the χ² Test", Supp. Journal of the Royal Statistical Society, 1 (1934), p. 217

9 Ibid., p. 230

10 Cochran, William G., "The Chi-Square Test of Goodness of Fit", Annals of Mathematical Statistics, 23 (1952), pp. 315-345

11 Ibid., p. 331

12 Conover, W.J., "Some Reasons for Not Using the Yates Continuity Correction on 2 x 2 Contingency Tables", Journal of the American Statistical Association (1974), p. 374

CHAPTER II

1 Pearson, Karl, p. 443

2 Yates, F., pp. 217-235

3 Pearson, E.S., "The Choice of Statistical Test Illustrated on the Interpretation of Data Classified in a 2 x 2 Table", Biometrika, 34 (1947), pp. 139-167

4 Lancaster, H.O., "The Combination of Probabilities Arising from Data in Discrete Distribution", Biometrika, 36 (1949), pp. 370-382

5 Cochran, William G., p. 315

6 Plackett, R.L., "The Continuity Correction in 2 x 2 Tables", Biometrika, 51 (1964), pp. 427-438

7 Grizzle, James E., "Continuity Correction in the χ² Test for 2 x 2 Tables", The American Statistician, October 1967, pp. 28-32

8 Mantel, Nathan and Greenhouse, Samuel W., "What is the Continuity Correction?", The American Statistician, December 1968, pp. 27-30

9 Roscoe, John T. and Byars, Jackson A., "An Investigation of the Restraints with Respect to Sample Size Commonly Imposed on the Use of the Chi-Square Statistic", Journal of the American Statistical Association, 66 (1971), pp. 755-769

10 Ibid.

11 Conover, W.J., p. 374

12 Conover, W.J., p. 375

13 Conover, W.J., p. 375

14 Conover, W.J., p. 375

15 Conover, W.J., p. 375

16 Conover, W.J., p. 375

17 Conover, W.J., p. 375

18 Conover, W.J., p. 376

CHAPTER III

1 Mantel, Nathan and Greenhouse, Samuel W., p. 28

2 Conover, W.J., p. 375

3 This sample size was also examined by Conover; it is repeated in this thesis to check the results obtained by the APL program written for this thesis. The results are comparable; therefore, the APL program is considered reliable.
BIBLIOGRAPHY

1. Cochran, William G., "The Chi-Square Test of Goodness of Fit", Annals of Mathematical Statistics, 23 (1952), pp. 315-345

2. Conover, W.J., "Some Reasons for Not Using the Yates Continuity Correction on 2 x 2 Contingency Tables", Journal of the American Statistical Association (1974), pp. 374-376

3. Dixon, Wilfrid and Massey, Frank J. Introduction to Statistical Analysis. 3rd ed. New York: McGraw-Hill Book Co., 1969

4. Ehrenfeld and Littauer. Introduction to Statistical Method. New York: McGraw-Hill Book Co.

5. Fisher, R.A. Statistical Methods for Research Workers. New York: Hafner Pub. Co., Inc.

6. Fisher, R.A. Collected Papers of R.A. Fisher. J.H. Bennett, ed. The University of Adelaide, 1974

7. Gibbons, J.D. Nonparametric Statistical Inference. New York: McGraw-Hill Book Co., 1971

8. Grizzle, James E., "Continuity Correction in the χ² Test for 2 x 2 Tables", The American Statistician, October 1967, pp. 28-32

9. Kendall, M.G. and Stuart, Alan. The Advanced Theory of Statistics. Vol. 2. New York: Hafner Pub. Co., 1958

10. Lancaster, H.O., "The Combination of Probabilities Arising from Data in Discrete Distribution", Biometrika, 36 (1949), pp. 370-382

11. Mantel, Nathan and Greenhouse, Samuel W., "What is the Continuity Correction?", The American Statistician, December 1968, pp. 27-30

12. Maxwell, A.E. Analysing Qualitative Data. Great Britain: Spottiswoode, Ballantyne & Co., Ltd., 1961

13. McNemar, Quinn. Psychological Statistics. 3rd ed. New York: John Wiley & Sons Inc., 1963

14. Mosteller, F. and Rourke, R.E. Sturdy Statistics: Nonparametrics and Order Statistics. Addison-Wesley Pub., 1973

15. Pearson, E.S., "The Choice of Statistical Test Illustrated on the Interpretation of Data Classified in a 2 x 2 Table", Biometrika, 34 (1947), pp. 139-167

16. Pearson, Karl. Early Statistical Papers. Cambridge: University Press, 1948

17. Plackett, R.L., "The Continuity Correction in 2 x 2 Tables", Biometrika, 51 (1963), pp. 427-438

18. Roscoe, John T. and Byars, Jackson A., "An Investigation of the Restraints with Respect to Sample Size Commonly Imposed on the Use of the Chi-Square Statistic", Journal of the American Statistical Association, 66 (1971), pp. 755-769

19. Siegel, S. Nonparametric Statistics for the Behavioral Sciences. New York: McGraw-Hill Book Co., 1956

20. Walker, H.M. and Lev, J. Statistical Inference. Henry Holt Co. Inc., 1953

21. Yarnold, James K., "The Minimum Expectation in χ² Goodness-of-Fit Tests and the Accuracy of Approximation for the Null Distribution", Journal of the American Statistical Association, 65 (1970), pp. 884-886

22. Yates, F., "Contingency Tables Involving Small Numbers and the χ² Test", Supp. Journal of the Royal Statistical Society, 1 (1934), p. 217
APPENDIX A

Source: Lecture notes from Dr. Ram Swaroop's 590A class

A theorem states that if O_i and E_i are the observed and expected frequencies under model H_0, then for large samples

    U = Σ (i = 1 to k) (O_i − E_i)² / E_i  ~  χ²(k−1).

Proof: In the proof three results are used.

A) The frequencies are observed under a multinomial distribution.

B) d_i³ = (O_i − E_i)³ and higher powers may be ignored, and Σ d_i = Σ (O_i − E_i) = 0.

C) If λ = max L(x|H_0) / max L(x|H), the likelihood ratio, then −2 log λ ~ χ² with the appropriate degrees of freedom.

By A),

    max L(x|H_0) = [N! / Π (O_i!)] Π (E_i / N)^{O_i}

    max L(x|H)   = [N! / Π (O_i!)] Π (O_i / N)^{O_i}

so that

    λ = Π (E_i / O_i)^{O_i}.

By C) and B),

    −2 log λ = 2 Σ O_i log(O_i / E_i)
             = 2 Σ (d_i + E_i) log((d_i + E_i) / E_i)
             = 2 Σ (d_i + E_i) [ d_i/E_i − d_i²/(2E_i²) + ... ]
             = Σ [ 2 d_i + d_i²/E_i − d_i³/E_i² + ... ]
             = Σ d_i² / E_i                (since Σ d_i = 0 and the d_i³ terms are ignored)
             = Σ (O_i − E_i)² / E_i  ~  χ²(k−1).
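A quick numerical check of this limiting result, not part of the original appendix, can be run in Python:

```python
# Empirical check: U = sum (O_i - E_i)^2 / E_i over repeated multinomial
# samples should be approximately chi-squared with k - 1 degrees of freedom.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
p = np.array([0.2, 0.3, 0.5])      # hypothesised class probabilities (k = 3)
n = 500                            # observations per experiment
E = n * p

U = [(((rng.multinomial(n, p) - E) ** 2) / E).sum() for _ in range(10000)]

# The simulated upper 5% point should be close to the chi-squared(2) value 5.99.
print(np.quantile(U, 0.95), chi2.ppf(0.95, df=2))
```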
APPENDIX B

Fisher Exact Probability

Source: Siegel, S., Nonparametric Statistics for the Behavioral Sciences (McGraw-Hill Book Co.), 1956

For a two by two contingency table, an exact technique for analyzing discrete data (either nominal or ordinal) when the two independent random samples are small is the Fisher exact probability. Table 5 represents the frequencies in a 2 x 2 contingency table.

Table 5
2 x 2 Contingency Table

                 +        −       Total
    Group I      A        B       A+B
    Group II     C        D       C+D
    Total        A+C      B+D     N

Groups I and II might be any two independent groups, such as experimentals and controls, males and females, etc. The column headings, indicated as plus and minus, may be any two classifications: absence and presence of a factor, pass and fail, agree and disagree, etc. The Fisher exact probability test, as well as the Pearson χ², determines whether the two groups differ in the proportion with which they fall into the above classifications; that is, the null hypothesis is that Group I and Group II do not differ in the proportion of pluses and minuses attributed to them.

In a (2 x 2) contingency table, the exact probability p of observing a particular set of frequencies is given by the hypergeometric distribution

    p = [C(A+B, A) C(C+D, C)] / C(N, A+C),

which, on simplifying, is

    p = (A+B)! (C+D)! (A+C)! (B+D)! / (N! A! B! C! D!).

For example, using the supposed data in Table 6, one can calculate the exact probability of the observed occurrence.

Table 6

                 +       −      Total
    Group I      10       0      10
    Group II      4       5       9
    Total        14       5      19

Substituting the observed values in the formula, the exact probability that these 19 cases should fall in the four cells as they did is p = .0108. H_0, the null hypothesis, is rejected at α = .05 but accepted at α = .01.

In the above example, one of the cells (cell B) had a frequency of '0', making the computation simple. But in cases where none of the cell frequencies is zero, 'more extreme' deviations from the distribution under H_0 could occur with the same marginal totals, and these cases must be taken into consideration.

For example, in Table 7 the smallest observation is in cell D, with the value of 2.

Table 7

                 +      −      Total
    Group I       3      9      12
    Group II      6      2       8
    Total         9     11      20

With the marginal totals unchanged, the more extreme occurrences would be those shown in Table 8; that is, reducing cell D from 2 to 0.

Table 8

    Group I      2     10  |  12        Group I      1     11  |  12
    Group II     7      1  |   8        Group II     8      0  |   8
                 9     11  |  20                     9     11  |  20

Thus, in applying a statistical test of the null hypothesis to the data given in Table 7, the probability of that occurrence must be added to the probabilities of the more extreme possible ones (Table 8). Each p is computed, resulting in

    p_I   = 12! 8! 9! 11! / (20! 3! 9! 6! 2!)  = 0.0367
    p_II  = 12! 8! 9! 11! / (20! 2! 10! 7! 1!) = 0.0031
    p_III = 12! 8! 9! 11! / (20! 1! 11! 8! 0!) = 0.0001

Therefore, the total p = .0367 + .0031 + .0001 = .0399 is the value of p which is used in deciding whether the data in Table 7 lead to the rejection of H_0. In this case, H_0 is rejected at α = .05.
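The same calculation can be written as a short Python sketch; the three terms correspond to Table 7 and the two more extreme tables of Table 8:

```python
# Fisher exact probability for Table 7: sum the probability of the observed
# table and of the more extreme tables (Table 8) with the same margins.
from math import comb

def table_prob(a, b, c, d):
    # hypergeometric probability of one 2 x 2 arrangement
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

# Observed table (3, 9 / 6, 2); reducing cell D from 2 to 0 gives the extremes.
p = sum(table_prob(3 - k, 9 + k, 6 + k, 2 - k) for k in range(3))
print(round(p, 4))   # 0.0399, matching the hand calculation above
```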
APPENDIX C

Notes on Programming Procedure

Source: Gilman, Leonard and Rose, Allen J., APL: An Interactive Approach (John Wiley & Sons, Inc.), 1974

Given the elements of a 2 x 2 contingency table, an APL program is developed to calculate the values of χ², χ²_c, and the exact probability. APL is chosen because of its powerful handling of vectors and of special functions, such as factorials and random number generators, under an interactive environment. The overall program consists of the following:

(A) The main program.

(B) The conversion routine, where a χ² value is given and P(χ²) is obtained as an end result.

(C) The calculation of the Fisher exact probability.
(A) MAIN PROGRAM

Given the values a, b, c, d of the 2 x 2 contingency table, the value of χ² is obtained by the formula

    χ² = ((a x d) − (b x c))² x N / (n_1. x n_2. x n_.1 x n_.2)

where

    n_1. = a + b,   n_2. = c + d,   n_.1 = a + c,   n_.2 = b + d,   N = a + b + c + d.

The value of χ² with Yates' continuity correction (χ²_c) is given by

    χ²_c = (|(a x d) − (b x c)| − ½N)² x N / (n_1. x n_2. x n_.1 x n_.2).
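Written in Python rather than APL, the two shortcut formulas amount to the following sketch (flooring the corrected difference at zero is a common convention and an assumption here, since the original listing is not reproduced):

```python
# The two shortcut formulas of the main program, in Python rather than APL.
# a, b, c, d are the four cells of the 2 x 2 table.
def chi_squared(a, b, c, d, corrected=False):
    n1, n2 = a + b, c + d            # row totals
    m1, m2 = a + c, b + d            # column totals
    N = a + b + c + d
    diff = abs(a * d - b * c)
    if corrected:                    # Yates: reduce |ad - bc| by N/2
        diff = max(diff - N / 2, 0.0)    # flooring at zero is an assumption
    return N * diff ** 2 / (n1 * n2 * m1 * m2)

# The most extreme table of part A for N = 40 (a = 0, b = 20, c = 7, d = 13):
print(chi_squared(0, 20, 7, 13), chi_squared(0, 20, 7, 13, corrected=True))
# about 8.485 and 6.234, the first row of part A in Table 2
```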
(B) CONVERSION ROUTINE

What the conversion routine does is to transform the χ² values into probabilities. This is accomplished by utilizing the nature of the χ² distribution for one degree of freedom. The formula for a general χ² distribution with v degrees of freedom is as follows:

    P(x) = (2 f(x) x / v) (1 + Σ (k = 1 to ∞) x^k / [(v+2)(v+4) ... (v+2k)])

where f(x) is the χ² density function

    f(x) = (½x)^{½v} e^{−½x} / (x Γ(½v)).

For v = 1 the above formula reduces to

    f(x) = (½x)^{½} e^{−½x} / (x Γ(½)),

and the evaluation of the summation is truncated when the differences in successive terms become exceedingly small. (P(x) as written is the cumulative probability; the exceedance probabilities reported in the tables are obtained as its complement.)

(C) This routine is straightforward. Along with routines A and B, the flow diagrams that follow describe the derivation of the programs.
MAIN PROGRAM 'CHISQ'

    Enter a, b, c, d as the elements of the 2 x 2 contingency table.
    Calculate χ² and χ²_c; obtain P(χ²) and P(χ²_c) from the conversion routine.
    If the Fisher exact probability is requested, call the Fisher routine.
    Print the table.
    END

SUBROUTINE 'FISHER EXACT PROBABILITY'

    Program inputs: a, b, c, d, N.
    Fisher exact probability formula:

        F.E.P. = (a+b)! (c+d)! (a+c)! (b+d)! / (a! b! c! d! N!)

    While more extreme tables remain, decrement a and d, increment b and c,
    and accumulate the probability. The final value of F.E.P. is the summation
    over the observed and the more extreme cases.
    Return to the main program.

SUBROUTINE 'CONVERSION'

    Start; input x.
    Evaluate f(x) for one degree of freedom:

        f(x) = (½x)^{½} e^{−½x} / (x Γ(½))

    Calculate SUM1, the summation of the series

        SUM1 = Σ (k = 1, 2, ...) x^k / (3 · 5 · 7 ... (1+2k)),

    truncating the series when the kth term closely approximates the (k+1)th term.

        P(x) = 2 f(x) x (1 + SUM1)

    Return to the main program.
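The conversion routine can be sketched in Python as follows; the complement is taken at the end so that the function returns the exceedance probability reported in the tables (this last step is implicit in the flow chart):

```python
# The conversion routine in Python: chi-squared exceedance probability for one
# degree of freedom via the series above (a sketch, not the original APL code).
from math import exp, pi, sqrt

def chi2_tail_1df(x, tol=1e-12):
    if x <= 0:
        return 1.0
    f = sqrt(0.5 * x) * exp(-0.5 * x) / (x * sqrt(pi))  # density; Gamma(1/2) = sqrt(pi)
    total, denom, k = 1.0, 1.0, 0
    while True:                      # series 1 + sum_k x^k / (3*5*...*(1+2k))
        k += 1
        denom *= 1 + 2 * k
        term = x ** k / denom
        total += term
        if term < tol:               # truncate once the terms are negligible
            break
    return 1.0 - 2.0 * f * x * total  # exceedance probability = 1 - P(x)

print(round(chi2_tail_1df(3.84), 4))  # about 0.05, the familiar 5% point
```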
LIST OF TABLES

TABLE 1   Exact and Approximate Exceedance Probabilities for the Sample Size of Fifty-Six
TABLE 2   Exact and Approximate Exceedance Probabilities for the Sample Size of Forty
TABLE 3   Exact and Approximate Exceedance Probabilities for the Sample Size of Thirty
TABLE 4   Exact and Approximate Exceedance Probabilities for the Sample Size of Twenty
TABLE 5   Hypothetical (2 x 2) Contingency Table
TABLE 6   Hypothetical Data for a (2 x 2) Contingency Table
TABLE 7   Hypothetical Data for a (2 x 2) Contingency Table