Use and Misuse of the Chi-square Test of Association
in Medical Research
Chansuda Wongsrichanalai and H. Kyle Webster
Abstract: Chansuda Wongsrichanalai and H. Kyle Webster. 1988. Use and Misuse of Chi-square
Test of Association in Medical Research. Thai J Hlth Resch 2(2): 123-131
Chi-square test of association is a statistical method frequently used in medical research to
compare two or more groups of subjects on a certain characteristic to see whether the numbers or proportions
of the subjects having the characteristic are different. The test is appealing because of its mathematical
and conceptual simplicity. Unfortunately, it is sometimes misused, leading to misinterpretation of a
scientific study. In this paper, the uses of chi-square test of association will be described and the
situations in which misuses may occur will be emphasized, all illustrated with examples.
Department of Immunology and Biochemistry, U.S. Army Medical Component, Armed Forces Research Institute of Medical Sciences
(AFRIMS), Rajvithi Road, Bangkok 10400.
Thai J Hlth Resch, Vol 2 No 2, Sept 1988
Introduction
Chi-square test is a method of statistical testing to determine whether two or more series of
count data are different. Chi-square with one degree of freedom is the distribution of the square of a unit
normally distributed variable (z^2). Thus, chi-square and the normal critical ratio are arithmetically similar.
The general formulation of chi-square is:
\[ \chi^2\,(df) = \sum \frac{(O - E)^2}{E} \]

where
    χ² = chi-square statistic
    Σ  = sum over all cells
    O  = observed counts
    E  = expected counts
    df = degrees of freedom
Being a sum of squared quantities, chi-square assumes no negative values.
Like the t test, it deals with the concept of degrees of freedom. The appeal of chi-square is its
mathematical and conceptual simplicity. In addition, chi-square can be used for the comparison of several
proportions. Its simplicity, however, sometimes invites misapplication, which results
in flawed scientific conclusions.
Chi-square test of association, sometimes called chi-square test of independence, is probably the
most frequent use of the chi-square distribution in medical research. Other uses are the goodness-of-fit
test, test of homogeneity (Daniel, 1983), and test of trend in proportions (Armitage, 1971). Misuses of the
chi-square test of association are rarely mentioned or explained in textbooks of biostatistics. In this paper,
uses and misuses of the test are summarized and illustrated by examples, and the rationale for each situation
is discussed.
Generally, chi-square test of association is used to test the null hypothesis that two criteria of
classification (e.g., types of drugs used versus disease outcomes) are independent when applied to the same
set of data. An investigator concludes that the two criteria of classification are associated if the distribution
of one criterion is different among categories of the other criterion. Classification of the data according to
two criteria is commonly presented by a table in which the rows represent the categories of one criterion
and columns represent those of the other. This is called a "contingency table", the simplest of which is
when each of the two criteria gives rise to dichotomous categories resulting in a so-called "2 x 2" or fourfold
table.
To simplify the following discussion, only 2 x 2 tables (i.e., involving chi-square test with one
degree of freedom) will be considered. The examples given are taken from research on the epidemiology of
malaria. However, the formulated concepts can be applied to other branches of medical research and, of
course, to diseases other than malaria.
Uses of the Chi-square Test of Association
1. Independent samples
Webster et al. (1988) reported a seroepidemiologic study of 135 Thai soldiers deployed to a
malaria endemic area in northeast Thailand. The soldiers were divided into two groups, namely, "malaria
naive" and "malaria experienced" based on their history of malaria infection and baseline circumsporozoite
(CS) and blood-stage antibody (ABS) levels. They were prospectively followed for development of parasitemia
and antibody responses. In table 1 of Webster et al. (1988), the numbers of malaria naive and malaria
experienced soldiers who developed P. falciparum malaria during the seventy-seven day study period were
presented along with other seroepidemiological characteristics of the two groups.
Suppose we want to know whether soldiers who are malaria experienced are less likely to be
infected with P. falciparum malaria than those who are malaria naive. We may evaluate this association by
comparing the proportions of individuals who develop P. falciparum infection between the two groups.
Table 1 displays these data.
Table 1. Distribution of malaria-naive and malaria-experienced soldiers according to the development of
P. falciparum infection (Webster et al., 1988).

                          Soldier groups
P. falciparum         Malaria     Malaria
infection             naive       experienced     Total
Yes                   28          25              53
No                    43          39              82
Total                 71          64              135

The numbers 28, 25, 43, and 39 are observed counts. Under the null hypothesis, one would expect,
among the 71 malaria naive soldiers, 71 x (53/135) = 27.9 to be infected with P. falciparum malaria and
71 x (82/135) = 43.1 to be not infected. Correspondingly, the numbers expected to be infected and not
infected among malaria experienced soldiers are 64 x (53/135) = 25.1 and 64 x (82/135) = 38.9. The
calculation of chi-square can then proceed as follows:
\[ \chi^2\,(1\ \mathrm{df}) = \sum \frac{(O - E)^2}{E} \approx 0.002; \quad P = 0.96 \]

A chi-square of 0.002 with a P value of 0.96 suggests that P. falciparum infection has no association
with the grouping of soldiers according to their malaria history, CS antibody, and ABS antibody levels.
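
The calculation for the independent-samples example can be retraced with a short script. This is only a minimal sketch, not the authors' own computation: the function name `chi_square_2x2` is hypothetical, no continuity correction is applied, and the P value uses the fact stated in the Introduction that chi-square with one degree of freedom is the square of a unit normal variable.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    expected = []
    for i, row in enumerate(table):
        expected_row = []
        for j, observed in enumerate(row):
            # Expected count under independence: row total x column total / N.
            e = row_totals[i] * col_totals[j] / n
            expected_row.append(e)
            chi2 += (observed - e) ** 2 / e
        expected.append(expected_row)
    # With 1 df, chi-square is a squared unit normal deviate, so the
    # upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, expected

# Rows: P. falciparum infection yes / no.
# Columns: malaria naive / malaria experienced.
table1 = [[28, 25], [43, 39]]
chi2, p_value, expected = chi_square_2x2(table1)

# The expected counts round to 27.9, 25.1, 43.1 and 38.9, as quoted in the text.
print([[round(e, 1) for e in row] for row in expected])
print(round(chi2, 3), round(p_value, 2))
```

Carrying the expected counts unrounded gives a statistic of about 0.002; using the one-decimal expected counts instead gives a slightly smaller value, but the conclusion is unchanged.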
Like many other statistical methods, chi-square test is based on approximation. The validity of
the test is, therefore, reduced as the cell entries become smaller. As a general rule, when at least one cell
has a small enough expected value, such as less than 5, a chi-square test is considered to be inappropriate
and a calculation of exact probability is advised. The technique for calculating exact P values can be found
elsewhere (Dixon and Massey, 1969).
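
As an illustration of the small-expected-value rule, the sketch below checks the smallest expected count and, if it is below 5, computes an exact probability. We assume here that the exact method is Fisher's exact test, which is one common choice and may differ in detail from the technique given in the cited reference. The function names and the small table are hypothetical and are not data from this paper.

```python
from math import comb

def smallest_expected_count(table):
    """Smallest expected cell count of a 2 x 2 table under independence."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return min(r * k / n for r in rows for k in cols)

def fisher_exact_two_sided(table):
    """Two-sided exact P value: the sum of the probabilities of all tables with
    the same margins that are no more probable than the observed table."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # The small tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(lowest, highest + 1)
               if prob(k) <= p_observed * (1 + 1e-9))

# Hypothetical small table (not data from this paper).
small_table = [[3, 1], [1, 3]]
needs_exact_test = smallest_expected_count(small_table) < 5
exact_p = fisher_exact_two_sided(small_table)
print(needs_exact_test, round(exact_p, 3))
```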
2. Paired samples
When the samples are paired and each of the two criteria of classification results in dichotomous
groupings, a special type of chi-square test of association called McNemar's test is used.
In order to determine the possible protective effect of CS antibodies against naturally acquired
P. falciparum malaria, a seroepidemiologic study of malaria was conducted in an endemic population of
southeastern Thailand. The villagers' blood samples were collected monthly or bimonthly during 1985-1987
for antibody assays and malaria smears (Webster HK, and Gingrich JB, unpublished data). Each villager
who had a blood smear positive for P. falciparum (a case) in October 1986 was matched to a villager of
the same age category (< 15 years or ≥ 15 years) with a negative smear (a control). For each case and each
control, the P. falciparum CS antibody level for September 1986 was ascertained. Antibody absorbance
(at 414 nm) of 0.1 OD or more is considered positive. Each pair of subjects thus results in one of the
following four possibilities:
- both villagers were CS antibody positive
- the P. falciparum-infected villager was CS antibody negative but the non-infected
villager was positive
- the P. falciparum-infected villager was CS antibody positive but the non-infected
villager was negative
- both villagers were CS antibody negative
These four possibilities may be presented in the following 2 x 2 table.
Table 2. Distribution of the pairs of villagers with smears positive for P. falciparum in October 1986 and
their matched controls according to the categories of CS antibodies (Webster HK, and Gingrich
JB, unpublished data)

P. falciparum-infected        Non-infected villager
villager                      CS antibodies
CS antibodies                 +           -           Total
+                             a = 5       c = 9       14
-                             b = 3       d = 12      15
Total                         8           21          29

In the context of probability, we are interested in comparing the probability of having had positive
CS antibodies given that a villager is P. falciparum-infected to the probability of having had positive CS
antibodies given that a villager is not. The null hypothesis states that the former probability, (a + c)/N1,
equals the latter, (a + b)/N2, where N1 (number of cases) = N2 (number of controls) = N. This leads to
testing (a + c)/N = (a + b)/N.
Since a appears in both numerators, we can discard it and use only b and c to determine
the association. b and c are called the "discordant" or "unlike" pairs.
If the null hypothesis is true, we would expect b and c to be equal. We would reject the null
hypothesis if b and c are significantly different at a specified significance level.

Characteristics           No. of pairs        No. of pairs
of cases, controls        observed            expected
-, +                      b                   (b + c)/2
+, -                      c                   (b + c)/2

In this case, the characteristic of interest is the positivity of CS antibody in September 1986.
If b is greater than c, the data suggest a protective effect of CS antibodies. Algebraically, McNemar's
test is derived from the original chi-square formula as follows.
\[ \chi^2\,(1\ \mathrm{df}) = \frac{[(b - c)/2]^2}{(b + c)/2} + \frac{[(c - b)/2]^2}{(b + c)/2} = \frac{(b - c)^2}{b + c} \qquad \text{(McNemar's test)} \]
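So, for the above example, with b = 3 and c = 9 discordant pairs:

\[ \chi^2\,(1\ \mathrm{df}) = \frac{(3 - 9)^2}{3 + 9} = 3.0; \quad P = 0.08 \]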
We cannot reject the null hypothesis that P. falciparum - infected and non-infected villagers are
equally likely to have had positive CS antibodies. The naturally-acquired CS antibodies do not seem to
protect against P. falciparum malaria infection according to the above observation. The result of this
analysis, however, does not exclude the possibility that protection is dependent on CS antibody levels,
especially at the time of mosquito exposure. The positive/negative cut-off point of 0.1 OD used here
represents background reactivity of individuals who live in non-endemic areas and have never had malaria.
It is probably too low to be a sensitive indicator of protection.
Sometimes, matching is done using a subject as his own control. Examples of such self-pairing
are an efficacy study to compare two drugs using a cross-over design, and a comparison of two diagnostic
procedures in which each specimen is subjected to both procedures.
In the study of malaria in the endemic population of southeastern Thailand that was just mentioned,
each malaria smear was read separately by two technicians. Technician 2 used a microscope with a higher
magnifying power and examined more fields on each slide, so there was no question that he detected
malaria cases more often than technician 1 did. To answer the question of whether there is any difference
between the two technicians in reporting the results of the smears, we consider low parasite densities
reported by technician 2, which are unlikely to be detected by technician 1, to be negative. The P. vivax
data for August 1986 will be used to illustrate this example. "Positive" smears for P. vivax according
to technician 2 are arbitrarily defined here as smears with 4 or more P. vivax per 500 white blood cells
(wbc).
Table 3. Distribution of malaria smear positivity according to the two technicians' reading of P. vivax
(Webster HK, and Gingrich JB, unpublished data).

Technician 1          Technician 2
P. vivax              +           -           Total
+                     6           1           7
-                     13          183         196
Total                 19          184         203

The data are clearly self-pairing because each malaria smear is subjected to examination by two
technicians, and it is the discrepancy in reporting between them that we want to assess. The null hypothesis states that
the two technicians report P. vivax at equal frequency. McNemar's test for the above data results in a
chi-square (1 df) equal to (13 - 1)^2/(13 + 1) = 10.3 and a P value of 0.001. Technician 2 has definitely reported P. vivax more
often than technician 1 and the difference is statistically significant at the 0.05 significance level. The 95
per cent confidence interval for the difference between the proportions of P. vivax positive smears read
by the two technicians, which is 13/203 - 1/203 = 0.06, can be calculated as follows:
\[ \frac{b - c}{N} \pm 1.96\,\frac{\sqrt{b + c}}{N} = \frac{13 - 1}{203} \pm 1.96\,\frac{\sqrt{13 + 1}}{203} \approx 0.06 \pm 0.04 \]
We learn from this example that although the existence of the difference depended solely on
the discordant pairs, its magnitude could not be estimated without knowing the total number of subjects.
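
The paired analysis of the two technicians' readings can be sketched as follows. The function names are hypothetical, and the standard error used for the interval, sqrt(b + c)/N, is an assumption on our part: it is the simple large-sample form consistent with McNemar's test, and other texts subtract a small correction term under the square root.

```python
import math

def mcnemar(b, c):
    """McNemar's chi-square (1 df, no continuity correction) from the
    two discordant-pair counts b and c."""
    chi2 = (b - c) ** 2 / (b + c)
    # With 1 df, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

def paired_difference_ci(b, c, n, z=1.96):
    """Approximate confidence interval for the difference between two paired
    proportions, (b - c)/n, using the assumed standard error sqrt(b + c)/n."""
    difference = (b - c) / n
    half_width = z * math.sqrt(b + c) / n
    return difference - half_width, difference + half_width

# Discordant pairs for the P. vivax readings: 13 smears positive only
# according to technician 2, 1 positive only according to technician 1,
# out of 203 smears read by both.
b, c, n = 13, 1, 203
chi2, p_value = mcnemar(b, c)
lower, upper = paired_difference_ci(b, c, n)
print(round(chi2, 1), round(p_value, 3))
print(round(lower, 3), round(upper, 3))
```

Note that `mcnemar` needs only the discordant counts, whereas `paired_difference_ci` also needs the total number of pairs, which mirrors the point made above about existence versus magnitude of the difference.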
It should be noted that this illustrative analysis only serves to compare the smear reading by the
two technicians. It is irrelevant to the question of whether technician 2 has over-reported P. vivax cases or
technician 1 has missed any of them. This is because we consider neither of their reports to be a standard.
In fact, if we assume technician 2 to be a reference microscopist, the data indicate that although technician 1
has not detected as many cases, his performance is considered satisfactory in that he has not missed any cases
with vivax densities greater than 5/500 wbc (N = 6). He has even detected a case with only 2/500 wbc.
Misuses
There are three common situations for which the chi-square test of association may be misused.
1. Self-pairing samples presented as independent and analysed as independent.
In the above example on the diagnoses of P. vivax made by the two technicians, suppose we
ignore the self-pairing nature of the data and present the data as follows.
Table 4. Rearrangement of the data presented in table 3.

P. vivax              Technician 1    Technician 2    Total
+                     7               19              26
-                     196             184             380
Total                 203             203             406

This looks as if there were 203 x 2 = 406 subjects. In fact, there are 406 observations but only
203 subjects. Presenting the data this way tends to mislead an investigator into comparing the proportion of
P. vivax positive specimens read by technician 1 (7/203) with that read by technician 2 (19/203). With such
an inappropriate data presentation, a correct analysis cannot be achieved. Although analysis disregarding
pairs usually does not lead to a substantially different result (Colton, 1974), the test cannot be considered
statistically valid. In this example, chi-square (1 df) equals 5.9 and the P value is also small (0.015).
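
To make the first misuse concrete, the sketch below treats the 406 readings as if they came from 406 independent subjects and compares 7/203 with 19/203. The function name `pearson_chi2_2x2` is hypothetical; it uses the usual shortcut formula for a 2 x 2 table without continuity correction.

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square for the 2 x 2 table [[a, b], [c, d]] using the
    shortcut N(ad - bc)^2 / (product of the four marginal totals)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 df, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Rows: P. vivax positive / negative.
# Columns: technician 1 / technician 2 (each read the same 203 smears).
unpaired_chi2, unpaired_p = pearson_chi2_2x2(7, 19, 196, 184)
print(round(unpaired_chi2, 1), round(unpaired_p, 3))
```

The statistic is computed correctly for the table as laid out, but the layout itself is wrong: the two columns are not independent samples, so the value cannot be interpreted as a valid test of the technicians' agreement.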
2. Paired samples presented as paired but analysed as independent.
If the data on P. falciparum malaria are presented properly as paired samples (as they appear in
table 2), but a chi-square test of association for independent samples rather than McNemar's test is applied,
the conclusion would be different. Here, the null hypothesis claims an equal probability between a CS
antibody-positive and a CS antibody-negative infected villager to have a matched control with positive CS
antibody. This is equivalent to testing the difference between 5/14 and 3/15 in the given example. Applying
the chi-square test of association for independent samples to these data results in a chi-square (1 df) of 0.90
(P = 0.34).
This simply tells us that the pairs in which both subjects are CS antibody positive occur as
frequently as pairs with the P. falciparum case having negative but the matched control having positive CS
antibody. It provides no information about the relationship of interest, i.e., between malaria and CS
antibody.
3. Independent samples presented as paired and analysed as paired.
When each of the two criteria of classification results in two categories and the row and column
categories look similar, the data are sometimes erroneously considered as paired instead of independent
samples.
In a study on the development of antibody to P. falciparum malaria in children and adolescents
living in a malaria endemic area of Liberia, Wahlgren et al. (1986) reported the relationship between spleen
size and ABS antibody detected by indirect immunofluorescence assay (IFA). The null hypothesis stated
that children and adolescents with positive antibody titers (IFA titers of ≥ 1:20) were as likely as those
with negative titers to have enlarged spleens. If we arbitrarily classify spleen size as follows: grades 0-I =
negative, and grades II-III = positive (enlarged), the following counts are obtained.
Table 5. Distribution of Liberian children and adolescents according to their immunological status (derived
from Figure 5 of Wahlgren et al., 1986)

                      Seropositive (IFAT)
Splenomegaly          Yes         No          Total
Yes                   9           10          19
No                    14          3           17
Total                 23          13          36

Here, the independent categories of interest are seropositivity (IFAT): Yes and No; and the
dependent categories of interest are splenomegaly: Yes and No. Misapplication of a chi-square test occurs
when the data are regarded as self-pairing and analysed as paired using McNemar's test. The 9 children
and adolescents with positive IFA titers and enlarged spleens are discarded. So are the other 3 with negative
IFA titers and no enlarged spleens. McNemar's test [(10 - 14)^2/(10 + 14)] results in a chi-square
(1 df) of 0.67 and a P value of 0.41. This only tells us that positive IFA titers and enlarged spleens occur at
equal chance in the study population. The test is irrelevant to the stated null hypothesis. Our interest is in
the comparison of spleen sizes among those seropositive and those seronegative. By not taking into account
the total number of subjects with positive (N = 23) and negative (N = 13) serum IFA titers, it is impossible
to determine the relationship between spleen size and seropositivity.
The chi-square test of association for independent samples shows that the difference between 9/23
and 10/13 is statistically significant at the 0.05 significance level (chi-square with 1 df = 4.76, P = 0.03)
and the null hypothesis is rejected. The test indicates that children and adolescents with negative serum IFA
titers are more likely to have enlarged spleens. This supports Wahlgren's conclusion that IFA titers are
inversely related to spleen size.
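
The contrast between the two analyses of the Liberian data can be checked numerically. The sketch below is only illustrative; the function names are hypothetical and neither statistic uses a continuity correction.

```python
import math

def upper_tail_1df(chi2):
    """Upper-tail probability of chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

def independent_chi2(a, b, c, d):
    """Pearson chi-square for independent samples, table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def mcnemar_chi2(b, c):
    """McNemar's chi-square from the two off-diagonal counts."""
    return (b - c) ** 2 / (b + c)

# Rows: splenomegaly yes / no; columns: seropositive yes / no.
a, b, c, d = 9, 10, 14, 3

# Correct analysis: compare 9/23 with 10/13 as independent samples.
correct = independent_chi2(a, b, c, d)
correct_p = upper_tail_1df(correct)

# Misapplied analysis: treat the off-diagonal cells as discordant pairs.
misapplied = mcnemar_chi2(b, c)
misapplied_p = upper_tail_1df(misapplied)

print(round(correct, 2), round(correct_p, 2))
print(round(misapplied, 2), round(misapplied_p, 2))
```

The misapplied statistic ignores the cells 9 and 3 entirely, which is why it cannot address the comparison of spleen sizes between seropositive and seronegative subjects.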
The question arises, then, whether McNemar's test can be used to test the stated null hypothesis
of this study. The answer is yes if the data are arranged properly. One way to do that is to match the
subjects based on the independent variable (seropositivity). Each subject with a positive IFA titer is matched
to a subject without. Spleen size is verified for each individual. The rearranged data appear in the following
format.

                          Seronegative subject
Seropositive subject      Splenomegaly
Splenomegaly              Yes         No          Total
Yes                       a           c           a + c
No                        b           d           b + d
Total                     a + b       c + d       N

a = number of pairs in which both subjects have splenomegaly
b = number of pairs in which the seropositive subject does not have splenomegaly but the seronegative
subject does
c = number of pairs in which the seropositive subject has splenomegaly but the seronegative subject
does not
d = number of pairs in which neither subject has splenomegaly
Summary
There are two types of data pertaining to the use of chi-square test of association, namely,
independent and paired. Each has its own format of data presentation and analysis. Paired samples must
be presented and analysed as paired and independent samples must be presented and analysed as independent.
Misapplications of the chi-square test of association occur when 1) self-pairing samples are presented as
independent and analysed as independent, 2) paired samples are presented as paired but analysed as
independent, and 3) independent samples are presented as paired and analysed as paired. The first condition
involves an inappropriate statistical technique, but usually does not lead to a substantially different
result or conclusion. The latter two conditions are more serious and can result in a totally erroneous interpretation of the study.
The relatively frequent use of the chi-square test of association in medical research is encouraged
by its mathematical and conceptual simplicity. It is important that investigators use the test correctly in
order to avoid flawed scientific conclusions.
Acknowledgements
We are grateful to Dr John B Gingrich and Yupha Onthuam for discussions and comments on
the topic. We thank Nipaporn Nimsombun for typing the manuscript.
REFERENCES
Armitage P. 1971. Statistical methods in medical research. Chapter 4, statistical inference. Blackwell Scientific
Publications, London: pp 363-365.
Colton T. 1974. Statistics in medicine. Chapter 5, inference on proportions. Little, Brown & Company,
Boston: pp 169-172.
Daniel WW. 1983. Biostatistics: a foundation for analysis in the health sciences. Chapter 10, the chi-square
distribution and the analysis of frequencies. John Wiley & Sons, New York: pp 354-365, 376-380.
Dixon WJ and Massey FJ. 1969. Introduction to statistical analysis. Chapter 13, enumeration statistics.
McGraw-Hill Book Company, New York: p 243.
Wahlgren M, Bjorkman A, Perlmann H, Berzins K, and Perlmann P. 1986. Anti-Plasmodium falciparum
antibodies acquired by residents in a holoendemic area of Liberia during development of clinical
immunity. Am J Trop Med Hyg 35: 22-29.
Webster HK, Brown AE, Chuenchitra C, Permpanich B, and Pipithkul J. 1988. Antibodies to sporozoites
in Plasmodium falciparum malaria: characterization of antibody response and correlation with
protection. J Clin Microbiol 26: 923-927.