Use and Misuse of the Chi-square Test of Association in Medical Research

Chansuda Wongsrichanalai and H. Kyle Webster

Department of Immunology and Biochemistry, U.S. Army Medical Component, Armed Forces Research Institute of Medical Sciences (AFRIMS), Rajvithi Road, Bangkok 10400.

Thai J Hlth Resch, Vol 2 No 2, Sept 1988

Abstract: Chansuda Wongsrichanalai and H. Kyle Webster. 1988. Use and Misuse of Chi-square Test of Association in Medical Research. Thai J Hlth Resch 2(2): 123-131

The chi-square test of association is a statistical method frequently used in medical research to compare two or more groups of subjects on a certain characteristic, to see whether the numbers or proportions of subjects having the characteristic are different. The test is appealing because of its mathematical and conceptual simplicity. Unfortunately, it is sometimes misused, leading to misinterpretation of a scientific study. In this paper, the uses of the chi-square test of association are described and the situations in which misuses may occur are emphasized, all illustrated with examples.
Introduction

The chi-square test is a method of statistical testing used to determine whether two or more series of count data are different. Chi-square with one degree of freedom is the distribution of the square of a unit normally distributed variable (z^2). Thus, chi-square and the normal critical ratio are arithmetically similar. The general formulation of chi-square is:

$$\chi^2_{(df)} = \sum \frac{(O - E)^2}{E}$$

where
$\chi^2$ = chi-square statistic
$\sum$ = sum over all cells
O = observed counts
E = expected counts
df = degrees of freedom

Being a sum of squared quantities, chi-square assumes no negative values. Like the t test, it deals with the concept of degrees of freedom. The appeal of chi-square is its mathematical and conceptual simplicity. In addition, chi-square can be used for the comparison of several proportions. Its simplicity, however, sometimes leads to misapplication, which results in flawed scientific conclusions.

The chi-square test of association, sometimes called the chi-square test of independence, is probably the most frequent use of the chi-square distribution in medical research. Other uses are the goodness-of-fit test, the test of homogeneity (Daniel, 1983), and the test of trend in proportions (Armitage, 1971). Misuses of the chi-square test of association are rarely mentioned or explained in textbooks of biostatistics. In this paper, uses and misuses of the test are summarized, all illustrated by examples, and the rationale for each situation is discussed.

Generally, the chi-square test of association is used to test the null hypothesis that two criteria of classification (e.g., types of drugs used versus disease outcomes) are independent when applied to the same set of data. An investigator concludes that the two criteria of classification are associated if the distribution of one criterion is different among categories of the other criterion.
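The general formula can be illustrated with a short Python sketch. This is a minimal example under stated assumptions, not the authors' procedure: the function name `chi_square_2x2` is hypothetical, expected counts are taken as row total x column total / N (the usual expectation under independence), no continuity correction is applied, and the P value relies on the one-degree-of-freedom relation to the normal distribution mentioned above. The counts are those analysed in Table 1 below.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2 x 2 table.

    `table` is [[a, b], [c, d]] of observed counts. Returns the statistic,
    the expected counts, and the two-sided P value for 1 df.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count of a cell under independence: row total * column total / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # With 1 df, chi-square is the square of a standard normal deviate,
    # so P(chi2 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, expected, p_value


# Rows: malaria naive, malaria experienced; columns: infected yes, no
stat, expected, p_value = chi_square_2x2([[28, 43], [25, 39]])
print(round(stat, 3), round(p_value, 2))
```

The same function applies to any 2 x 2 table of independent samples; it is not appropriate for paired data, for which only the discordant pairs enter the statistic.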
Classification of the data according to two criteria is commonly presented in a table in which the rows represent the categories of one criterion and the columns represent those of the other. This is called a "contingency table", the simplest form of which arises when each of the two criteria gives rise to dichotomous categories, resulting in a so-called "2 x 2" or fourfold table.

To simplify the following discussion, only 2 x 2 tables (i.e., those involving a chi-square test with one degree of freedom) will be considered. The examples given are taken from research on the epidemiology of malaria. However, the concepts can be applied to other branches of medical research and, of course, to diseases other than malaria.

Uses of the Chi-square Test of Association

1. Independent samples

Webster et al. (1988) reported a seroepidemiologic study of 135 Thai soldiers deployed to a malaria-endemic area in northeast Thailand. The soldiers were divided into two groups, namely "malaria naive" and "malaria experienced", based on their history of malaria infection and baseline circumsporozoite (CS) and blood-stage antibody (ABS) levels. They were prospectively followed for development of parasitemia and antibody responses. In table 1 of Webster et al. (1988), the numbers of malaria-naive and malaria-experienced soldiers who developed P. falciparum malaria during the seventy-seven day study period were presented along with other seroepidemiological characteristics of the two groups.

Suppose we want to know whether soldiers who are malaria experienced are less likely to be infected with P. falciparum malaria than those who are malaria naive. We may evaluate this association by comparing the proportions of individuals who develop P. falciparum infection between the two groups. Table 1 displays these data.

Table 1.
Distribution of malaria-naive and malaria-experienced soldiers according to the development of P. falciparum infection (Webster et al., 1988).

                         P. falciparum infection
Soldier groups           Yes        No         Total
Malaria naive            28         43         71
Malaria experienced      25         39         64
Total                    53         82         135

The numbers 28, 25, 43, and 39 are observed counts. Under the null hypothesis, one would expect, among the 71 malaria-naive soldiers, 71 x (53/135) = 27.9 to be infected with P. falciparum malaria and 71 x (82/135) = 43.1 to be not infected. Correspondingly, the numbers expected to be infected and not infected among malaria-experienced soldiers are 64 x (53/135) = 25.1 and 64 x (82/135) = 38.9. The calculation of chi-square can then proceed as follows:

$$\chi^2_{(1\,df)} = \sum \frac{(O - E)^2}{E} = 0.002; \quad P = 0.96$$

where the sum runs over the four cells and uses the unrounded expected counts.

A chi-square of 0.002 with a P value of 0.96 suggests that P. falciparum infection has no association with the grouping of soldiers according to their malaria history, CS antibody, and ABS antibody levels.

Like many other statistical methods, the chi-square test is based on approximation. The validity of the test is, therefore, reduced as the cell entries become smaller. As a general rule, when at least one cell has a small expected value, such as less than 5, a chi-square test is considered inappropriate and a calculation of exact probability is advised. The technique for calculating exact P values can be found elsewhere (Dixon and Massey, 1969).

2. Paired samples

When the samples are paired and each of the two criteria of classification results in dichotomous groupings, a special type of chi-square test of association called McNemar's test is used.

In order to determine the possible protective effect of CS antibodies against naturally-acquired P. falciparum malaria, a seroepidemiologic study of malaria was conducted in an endemic population of southeastern Thailand. The villagers' blood samples were collected monthly or bimonthly during 1985-1987 for antibody assays and malaria smears (Webster HK and Gingrich JB, unpublished data).
Each villager who had a blood smear positive for P. falciparum (a case) in October 1986 was matched to a villager of the same age category (< 15 years or >= 15 years) with a negative smear (a control). For each case and each control, the P. falciparum CS antibody level for September 1986 was ascertained. Antibody absorbance (at 414 nm) of 0.1 OD or more is considered positive. Each pair of subjects thus results in one of the following four possibilities:

- both villagers were CS antibody positive
- the P. falciparum-infected villager was CS antibody negative but the non-infected villager was positive
- the P. falciparum-infected villager was CS antibody positive but the non-infected villager was negative
- both villagers were CS antibody negative

These four possibilities may be presented in the following 2 x 2 table.

Table 2. Distribution of the pairs of villagers with smears positive for P. falciparum in October 1986 and their matched controls according to the categories of CS antibodies (Webster HK and Gingrich JB, unpublished data).

                          P. falciparum-infected villager
                          CS antibodies
Non-infected villager     +            -            Total
CS antibodies +           a = 5        b = 3        8
CS antibodies -           c = 9        d = 12       21
Total                     14           15           29

In the context of probability, we are interested in comparing the probability of having had positive CS antibodies given that a villager is P. falciparum-infected with the probability of having had positive CS antibodies given that a villager is not. The null hypothesis states that the former probability, (a + c)/N1, equals the latter, (a + b)/N2, where N1 (number of cases) = N2 (number of controls) = N. This leads to testing (a + c)/N = (a + b)/N.

Since a is common to both numerators, we can discard it and use only b and c to determine the association. b and c are called the "discordant" or "unlike" pairs. If the null hypothesis is true, we would expect b and c to be equal.
We would reject the null hypothesis if b and c are significantly different at a specified significance level.

Characteristics of cases, controls     No. of pairs observed     No. of pairs expected
Case -, control +                      b                         (b + c)/2
Case +, control -                      c                         (b + c)/2

In this case, the characteristic of interest is the positivity of CS antibody in September 1986. If b is greater than c, the data suggest a protective effect of CS antibodies. Algebraically, McNemar's test is derived from the original chi-square formula as follows.

$$\chi^2_{(1\,df)} = \frac{[b - (b + c)/2]^2}{(b + c)/2} + \frac{[c - (b + c)/2]^2}{(b + c)/2} = \frac{[(b - c)/2]^2}{(b + c)/2} + \frac{[(c - b)/2]^2}{(b + c)/2} = \frac{(b - c)^2}{b + c} \quad \text{(McNemar's test)}$$

So, for the above example, with b = 3 and c = 9 discordant pairs (the counts implied by the proportions 5/14 and 3/15 quoted under Misuses):

$$\chi^2_{(1\,df)} = \frac{(3 - 9)^2}{3 + 9} = 3.0; \quad P = 0.08$$

We cannot reject the null hypothesis that P. falciparum-infected and non-infected villagers are equally likely to have had positive CS antibodies. The naturally-acquired CS antibodies do not seem to protect against P. falciparum malaria infection according to the above observation. The result of this analysis, however, does not exclude the possibility that protection is dependent on CS antibody levels, especially at the time of mosquito exposure. The positive/negative cut-off point of 0.1 OD used here represents background reactivity of individuals who live in non-endemic areas and have never had malaria. It is probably too low to be a sensitive indicator of protection.

Sometimes, matching is done using a subject as his own control. Examples of such self-pairing are an efficacy study comparing two drugs using a cross-over design, and a comparison of two diagnostic procedures in which each specimen is subjected to both procedures.

In the study of malaria in the endemic population of southeastern Thailand that was just mentioned, each malaria smear was read separately by two technicians. Technician 2 used a microscope with a higher magnifying power and examined more fields on each slide, so there was no question that he detected malaria cases more often than technician 1 did.
To answer the question of whether there is any difference between the two technicians in reporting the results of the smears, we consider low parasite densities reported by technician 2, which are unlikely to be detected by technician 1, to be negative. The P. vivax data for August 1986 will be used to illustrate this example. "Positive" smears for P. vivax according to technician 2 are arbitrarily defined here as smears with 4 or more P. vivax per 500 white blood cells (wbc).

Table 3. Distribution of malaria smear positivity according to the two technicians' reading of P. vivax (Webster HK and Gingrich JB, unpublished data).

                    Technician 1
                    P. vivax
Technician 2        +          -          Total
P. vivax +          6          13         19
P. vivax -          1          183        184
Total               7          196        203

The data are clearly self-pairing because each malaria smear is examined by both technicians, and it is the discrepancy between their reports that we want to assess. The null hypothesis states that the two technicians report P. vivax at equal frequency. McNemar's test for the above data, (13 - 1)^2/(13 + 1), results in a chi-square (1 df) equal to 10.3 and a P value of 0.001. Technician 2 has definitely reported P. vivax more often than technician 1 and the difference is statistically significant at the 0.05 significance level.

The ninety-five per cent confidence interval for the difference between the proportions of P. vivax positive smears read by the two technicians, which is 13/203 - 1/203 = 0.06, can be calculated as follows:

$$\frac{b - c}{N} \pm 1.96 \times SE = \frac{13 - 1}{203} \pm 1.96 \times SE$$

where SE is the standard error of the difference.

We learn from this example that although the existence of the difference depended solely on the discordant pairs, its magnitude could not be estimated without knowing the total number of subjects.

It should be noted that this illustrative analysis only serves to compare the smear reading by the two technicians. It is irrelevant to the question of whether technician 2 has over-reported P. vivax cases or technician 1 has missed any of them.
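As a rough check on the paired analysis of the two technicians' readings, the following Python sketch computes McNemar's statistic from the discordant counts quoted above (13 and 1 among 203 smears) and an approximate 95% interval for the difference in proportions. The function names are hypothetical, and the Wald-type standard error, sqrt(b + c - (b - c)^2/N)/N, is an assumption made for illustration; the paper's own expression for the interval may differ.

```python
import math


def mcnemar(b, c):
    """McNemar chi-square (1 df, no continuity correction) and its P value.

    b and c are the two discordant-pair counts; concordant pairs are ignored.
    """
    stat = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(stat / 2))  # two-sided P for 1 df
    return stat, p_value


def paired_difference_ci(b, c, n, z=1.96):
    """Approximate interval for the difference between paired proportions.

    Assumption: SE = sqrt(b + c - (b - c)**2 / n) / n, a common textbook
    (Wald) form; it is not necessarily the formula used in the paper.
    """
    diff = (b - c) / n
    se = math.sqrt(b + c - (b - c) ** 2 / n) / n
    return diff, diff - z * se, diff + z * se


# Discordant smears: 13 (technician 2 only) and 1 (technician 1 only), N = 203
stat, p_value = mcnemar(13, 1)
diff, low, high = paired_difference_ci(13, 1, 203)
print(round(stat, 1), round(p_value, 3))
print(round(diff, 3), round(low, 3), round(high, 3))
```

Note that `mcnemar` needs only the discordant counts, whereas `paired_difference_ci` also needs the total number of smears, which mirrors the point made above about existence versus magnitude of the difference.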
This is because we consider neither of their reports to be a standard. In fact, if we assume technician 2 to be a reference microscopist, the data indicate that although technician 1 has not detected as many cases, his performance is considered satisfactory in that he has not missed any cases with vivax densities greater than 5/500 wbc (N = 6). He has even detected a case with only 2/500 wbc.

Misuses

There are three common situations in which the chi-square test of association may be misused.

1. Self-pairing samples presented as independent and analysed as independent.

In the above example on the diagnoses of P. vivax made by the two technicians, suppose we ignore the self-pairing nature of the data and present the data as follows.

Table 4. Rearrangement of the data presented in table 3.

P. vivax      Technician 1     Technician 2     Total
+             7                19               26
-             196              184              380
Total         203              203              406

This looks as if there were 203 x 2 = 406 subjects. In fact, there are 406 observations but only 203 subjects. Presenting the data this way tends to mislead an investigator into comparing the proportion of P. vivax positive specimens read by technician 1 (7/203) with that read by technician 2 (19/203). With such an inappropriate data presentation, a correct analysis cannot be achieved. Although analysis disregarding pairs usually does not lead to a substantially different result (Colton, 1974), the test cannot be considered statistically valid. In this example, chi-square (1 df) equals 5.9 and the P value is also small (0.015).

2. Paired samples presented as paired but analysed as independent.

If the data on P. falciparum malaria are presented properly as paired samples (as they appear in table 2), but a chi-square test of association for independent samples rather than McNemar's test is applied, the conclusion would be different.
Here, the null hypothesis claims that a CS antibody-positive infected villager and a CS antibody-negative infected villager are equally likely to have a matched control with positive CS antibody. This is equivalent to testing the difference between 5/14 and 3/15 in the given example. Applying the chi-square test of association for independent samples to these data results in a chi-square (1 df) of 0.90 (P = 0.34). This simply tells us that the pairs in which both subjects are CS antibody positive occur as frequently as pairs in which the P. falciparum case has negative but the matched control has positive CS antibody. It provides no information about the relationship of interest, i.e., between malaria and CS antibody.

3. Independent samples presented as paired and analysed as paired.

When each of the two criteria of classification results in two categories and the row and column categories look similar, the data are sometimes erroneously considered as paired instead of independent samples.

In a study on the development of antibody to P. falciparum malaria in children and adolescents living in a malaria-endemic area of Liberia, Wahlgren et al. (1986) reported the relationship between spleen size and ABS antibody detected by indirect immunofluorescence assay (IFA). The null hypothesis stated that children and adolescents with positive antibody titers (IFA titers of >= 1:20) were as likely as those with negative titers to have enlarged spleens. If we arbitrarily classify spleen size as follows: grades 0-I = negative, and grades II-III = positive (enlarged), the following counts are obtained.

Table 5. Distribution of Liberian children and adolescents according to their immunological status (derived from Figure 5 of Wahlgren et al., 1986).

                  Seropositive (IFAT)
Splenomegaly      Yes        No         Total
Yes               9          10         19
No                14         3          17
Total             23         13         36

Here, the independent categories of interest are seropositivity (IFAT): Yes and No; and the dependent categories of interest are splenomegaly: Yes and No.
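A short Python sketch can make the two possible analyses of Table 5 concrete. It computes the chi-square for independent samples and, for comparison only, the McNemar statistic that results if the table is wrongly treated as paired. The function names are hypothetical; the independent-samples statistic uses the shortcut N(ad - bc)^2 / (product of the four marginal totals), which for a 2 x 2 table is algebraically the same as summing (O - E)^2/E over the cells, and no continuity correction is applied.

```python
import math


def p_value_1df(stat):
    """Two-sided P value for a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(stat / 2))


def chi_square_independent(a, b, c, d):
    """Pearson chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def mcnemar(b, c):
    """McNemar chi-square; valid only when b and c count discordant PAIRS."""
    return (b - c) ** 2 / (b + c)


# Table 5: rows = splenomegaly yes/no, columns = seropositive yes/no
a, b, c, d = 9, 10, 14, 3

indep = chi_square_independent(a, b, c, d)  # appropriate: independent samples
paired = mcnemar(b, c)                      # misapplied: these are not pairs
p_indep = p_value_1df(indep)
p_paired = p_value_1df(paired)

print("independent:", round(indep, 2), round(p_indep, 2))
print("McNemar (misapplied):", round(paired, 2), round(p_paired, 2))
```

The misapplied statistic uses only 24 of the 36 subjects and ignores the group sizes of 23 and 13, which is why it cannot address the stated null hypothesis.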
Misapplication of a chi-square test occurs when the data are regarded as self-pairing and analysed as paired using McNemar's test. The 9 children and adolescents with positive IFA titers and enlarged spleens are discarded. So are the other 3 with negative IFA titers and no enlarged spleens. McNemar's test, (10 - 14)^2/(10 + 14), results in a chi-square (1 df) of 0.67 and a P value of 0.41. This only tells us that positive IFA titers and enlarged spleens occur with equal frequency in the study population. The test is irrelevant to the stated null hypothesis. Our interest is in the comparison of spleen sizes among those seropositive and those seronegative. By not taking into account the total number of subjects with positive (N = 23) and negative (N = 13) serum IFA titers, it is impossible to determine the relationship between spleen size and seropositivity.

The chi-square test of association for independent samples shows that the difference between 9/23 and 10/13 is statistically significant at the 0.05 significance level (chi-square with 1 df = 4.76, P = 0.03) and the null hypothesis is rejected. The test indicates that children and adolescents with negative serum IFA titers are more likely to have enlarged spleens. This supports Wahlgren's conclusion that IFA titers are inversely related to spleen size.

The question arises, then, whether McNemar's test can be used to test the stated null hypothesis of this study. The answer is yes if the data are arranged properly. One way to do that is to match the subjects based on the independent variable (seropositivity). Each subject with a positive IFA titer is matched to a subject without. Spleen size is verified for each individual. The rearranged data appear in the following format.
                          Seropositive subject
                          Splenomegaly
Seronegative subject      Yes        No         Total
Splenomegaly Yes          a          b          a + b
Splenomegaly No           c          d          c + d
Total                     a + c      b + d      N

a = number of pairs in which both subjects have splenomegaly
b = number of pairs in which the seropositive subject does not have splenomegaly but the seronegative subject does
c = number of pairs in which the seropositive subject has splenomegaly but the seronegative subject does not
d = number of pairs in which neither subject has splenomegaly

Summary

There are two types of data pertaining to the use of the chi-square test of association, namely, independent and paired. Each has its own format of data presentation and analysis. Paired samples must be presented and analysed as paired, and independent samples must be presented and analysed as independent. Misapplications of the chi-square test of association occur when 1) self-pairing samples are presented as independent and analysed as independent, 2) paired samples are presented as paired but analysed as independent, and 3) independent samples are presented as paired and analysed as paired. The first condition involves an inappropriate statistical technique, but usually does not lead to a substantially different result or conclusion. The latter two conditions are more serious and can result in a totally erroneous interpretation of the study.

The relatively frequent use of the chi-square test of association in medical research is encouraged by its mathematical and conceptual simplicity. It is important that investigators use the test correctly in order to avoid flawed scientific conclusions.

Acknowledgements

We are grateful to Dr John B Gingrich and Yupha Onthuam for discussions and comments on the topic. We thank Nipaporn Nimsombun for typing the manuscript.

REFERENCES

Armitage P. 1971. Statistical methods in medical research. Chapter 4, statistical inference. Blackwell Scientific Publications, London: pp 363-365.
Colton T. 1974. Statistics in medicine.
Chapter 5, inference on proportions. Little, Brown & Company, Boston: pp 169-172.
Daniel WW. 1983. Biostatistics: a foundation for analysis in the health sciences. Chapter 10, the chi-square distribution and the analysis of frequencies. John Wiley & Sons, New York: pp 354-365, 376-380.
Dixon WJ and Massey FJ. 1969. Introduction to statistical analysis. Chapter 13, enumeration statistics. McGraw-Hill Book Company, New York: p 243.
Wahlgren M, Bjorkman A, Perlmann H, Berzins K, and Perlmann P. 1986. Anti-Plasmodium falciparum antibodies acquired by residents in a holoendemic area of Liberia during development of clinical immunity. Am J Trop Med Hyg 35: 22-29.
Webster HK, Brown AE, Chuenchitra C, Permpanich B, and Pipithkul J. 1988. Antibodies to sporozoites in Plasmodium falciparum malaria: characterization of antibody response and correlation with protection. J Clin Microbiol 26: 923-927.