Goodness of Fit Tests

Marc H. Mehlman
[email protected]
University of New Haven

Table of Contents

1. Goodness of Fit Chi–Squared Test
2. Tests of Independence
3. Chapter #9 R Assignment

Goodness of Fit Chi–Squared Test

Idea of the chi-square test

The chi-square (χ²) test is used when the data are categorical. It measures how different the observed data are from what we would expect if H0 were true.

[Figure: two bar charts of the percentage of births on each day of the week (Mon. through Sun., vertical axis 0% to 20%). One shows the sample composition (observed sample proportions, 1 SRS of 700 births); the other shows the expected composition (expected proportions under H0: p1 = p2 = p3 = p4 = p5 = p6 = p7 = 1/7).]

The chi-square distributions

The χ² distributions are a family of distributions that take only positive values, are skewed to the right, and are each specified by a number of degrees of freedom.

Published tables and software give the upper-tail area for critical values of many χ² distributions.

Table D

Example: for df = 6, if χ² = 15.9 the P-value is between 0.01 and 0.02.
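Software can return the upper-tail area directly instead of bracketing it between two table columns. The following is a minimal sketch in R using the base functions pchisq and qchisq; the variable names are illustrative and do not come from the slides.

```r
# Upper-tail area (P-value) for an observed chi-square statistic.
x  <- 15.9   # statistic from the df = 6 example
df <- 6
p_value <- pchisq(x, df = df, lower.tail = FALSE)
p_value      # lies between 0.01 and 0.02

# Critical values whose upper-tail areas are 0.02 and 0.01;
# these bracket the statistic, as in the df = 6 row of the table.
crit <- qchisq(c(0.02, 0.01), df = df, lower.tail = FALSE)
round(crit, 2)   # 15.03 16.81
```

Setting lower.tail = FALSE asks for P(C ≥ x) rather than the default lower-tail probability P(C ≤ x).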
Upper-tail critical values of the χ² distributions (df = degrees of freedom, p = upper-tail probability):

  df    0.25     0.2    0.15     0.1    0.05   0.025    0.02    0.01   0.005  0.0025   0.001  0.0005
   1    1.32    1.64    2.07    2.71    3.84    5.02    5.41    6.63    7.88    9.14   10.83   12.12
   2    2.77    3.22    3.79    4.61    5.99    7.38    7.82    9.21   10.60   11.98   13.82   15.20
   3    4.11    4.64    5.32    6.25    7.81    9.35    9.84   11.34   12.84   14.32   16.27   17.73
   4    5.39    5.99    6.74    7.78    9.49   11.14   11.67   13.28   14.86   16.42   18.47   20.00
   5    6.63    7.29    8.12    9.24   11.07   12.83   13.39   15.09   16.75   18.39   20.51   22.11
   6    7.84    8.56    9.45   10.64   12.59   14.45   15.03   16.81   18.55   20.25   22.46   24.10
   7    9.04    9.80   10.75   12.02   14.07   16.01   16.62   18.48   20.28   22.04   24.32   26.02
   8   10.22   11.03   12.03   13.36   15.51   17.53   18.17   20.09   21.95   23.77   26.12   27.87
   9   11.39   12.24   13.29   14.68   16.92   19.02   19.68   21.67   23.59   25.46   27.88   29.67
  10   12.55   13.44   14.53   15.99   18.31   20.48   21.16   23.21   25.19   27.11   29.59   31.42
  11   13.70   14.63   15.77   17.28   19.68   21.92   22.62   24.72   26.76   28.73   31.26   33.14
  12   14.85   15.81   16.99   18.55   21.03   23.34   24.05   26.22   28.30   30.32   32.91   34.82
  13   15.98   16.98   18.20   19.81   22.36   24.74   25.47   27.69   29.82   31.88   34.53   36.48
  14   17.12   18.15   19.41   21.06   23.68   26.12   26.87   29.14   31.32   33.43   36.12   38.11
  15   18.25   19.31   20.60   22.31   25.00   27.49   28.26   30.58   32.80   34.95   37.70   39.72
  16   19.37   20.47   21.79   23.54   26.30   28.85   29.63   32.00   34.27   36.46   39.25   41.31
  17   20.49   21.61   22.98   24.77   27.59   30.19   31.00   33.41   35.72   37.95   40.79   42.88
  18   21.60   22.76   24.16   25.99   28.87   31.53   32.35   34.81   37.16   39.42   42.31   44.43
  19   22.72   23.90   25.33   27.20   30.14   32.85   33.69   36.19   38.58   40.88   43.82   45.97
  20   23.83   25.04   26.50   28.41   31.41   34.17   35.02   37.57   40.00   42.34   45.31   47.50
  21   24.93   26.17   27.66   29.62   32.67   35.48   36.34   38.93   41.40   43.78   46.80   49.01
  22   26.04   27.30   28.82   30.81   33.92   36.78   37.66   40.29   42.80   45.20   48.27   50.51
  23   27.14   28.43   29.98   32.01   35.17   38.08   38.97   41.64   44.18   46.62   49.73   52.00
  24   28.24   29.55   31.13   33.20   36.42   39.36   40.27   42.98   45.56   48.03   51.18   53.48
  25   29.34   30.68   32.28   34.38   37.65   40.65   41.57   44.31   46.93   49.44   52.62   54.95
  26   30.43   31.79   33.43   35.56   38.89   41.92   42.86   45.64   48.29   50.83   54.05   56.41
  27   31.53   32.91   34.57   36.74   40.11   43.19   44.14   46.96   49.64   52.22   55.48   57.86
  28   32.62   34.03   35.71   37.92   41.34   44.46   45.42   48.28   50.99   53.59   56.89   59.30
  29   33.71   35.14   36.85   39.09   42.56   45.72   46.69   49.59   52.34   54.97   58.30   60.73
  30   34.80   36.25   37.99   40.26   43.77   46.98   47.96   50.89   53.67   56.33   59.70   62.16
  40   45.62   47.27   49.24   51.81   55.76   59.34   60.44   63.69   66.77   69.70   73.40   76.09
  50   56.33   58.16   60.35   63.17   67.50   71.42   72.61   76.15   79.49   82.66   86.66   89.56
  60   66.98   68.97   71.34   74.40   79.08   83.30   84.58   88.38   91.95   95.34   99.61  102.70
  80   88.13   90.41   93.11   96.58  101.90  106.60  108.10  112.30  116.30  120.10  124.80  128.30
 100  109.10  111.70  114.70  118.50  124.30  129.60  131.10  135.80  140.20  144.30  149.40  153.20

Data for n observations of a categorical variable with k possible outcomes are summarized as observed counts n1, n2, ..., nk in k cells. Let H0 specify the cell probabilities p1, p2, ..., pk for the k possible outcomes.

Definition

\[
o_j \overset{\text{def}}{=} \text{observed count in cell } j,
\qquad
e_j \overset{\text{def}}{=} n p_j = \text{expected count in cell } j
\]

Example

Three species of large fish (A, B, C) that are native to a certain river have been observed to exist in equal proportions. A recent survey of 300 large fish found 89 of species A, 120 of species B and 91 of species C. What are the observed and expected counts?

Solution: o1 = 89, o2 = 120 and o3 = 91.

\[
e_1 = e_2 = e_3 = n p_j = 300 \cdot \frac{1}{3} = 100
\]

Theorem (Chi–Squared Goodness of Fit Test)

The chi-square statistic, which measures how much the observed cell counts differ from the expected cell counts, is

\[
x \overset{\text{def}}{=} \sum_{j=1}^{k} \frac{(o_j - e_j)^2}{e_j}.
\]

Let H0: the cell probabilities are p1, ..., pk. If H0 is true and
- all expected counts are ≥ 1, and
- no more than 20% of the expected counts are < 5,
then the chi-squared statistic is approximately χ²(k − 1) distributed.
In that case, the p-value of the test of H0 versus HA: not H0 is approximately P(C ≥ x), where C ∼ χ²(k − 1).

Example: River ecology

Three species of large fish (A, B, C) that are native to a certain river have been observed to co-exist in equal proportions. A recent random sample of 300 large fish found 89 of species A, 120 of species B, and 91 of species C. Do the data provide evidence that the river's ecosystem has been upset?

H0: pA = pB = pC = 1/3
Ha: H0 is not true

Number of proportions compared: k = 3
All the expected counts are n/k = 300/3 = 100
Degrees of freedom: k − 1 = 3 − 1 = 2

χ² calculation:

\[
\chi^2 = \frac{(89 - 100)^2}{100} + \frac{(120 - 100)^2}{100} + \frac{(91 - 100)^2}{100}
       = 1.21 + 4.0 + 0.81 = 6.02
\]

Example (cont.)

If H0 were true, how likely would it be to find by chance a discrepancy between observed and expected frequencies yielding a χ² value of 6.02 or greater?

From Table D (df = 2), we find 5.99 < χ² < 7.38, so 0.05 > P > 0.025. Software gives P-value = 0.049.

Using a typical significance level of 5%, we conclude that the results are significant. We have found evidence that the 3 fish populations are not currently equally represented in this ecosystem (P < 0.05).

Example (cont.): Interpreting the χ² output

The individual values summed in the χ² statistic are the χ² components. When the test is statistically significant, the largest components indicate which condition(s) differ most from what H0 predicts. You can also compare the actual proportions qualitatively in a graph.

[Figure: bar chart of percent of total (0% to 40%) for species A (gumpies), B (sticklebarbs) and C (spotheads).]

Here the components are 1.21, 4.0 and 0.81, which sum to χ² = 6.02. The largest χ² component, 4.0, is for species B. The increase in species B contributes the most to significance.

Example: Goodness of fit for a genetic model

Under a genetic model of dominant epistasis, a cross of white and yellow summer squash will yield white, yellow, and green squash with probabilities 12/16, 3/16 and 1/16 respectively (expected ratios 12:3:1).

Suppose we observe 155 white, 40 yellow and 10 green squash (n = 205). Are these data consistent with the genetic model?

H0: p_white = 12/16; p_yellow = 3/16; p_green = 1/16
Ha: H0 is not true

We use H0 to compute the expected counts for each squash type: 205 · 12/16 = 153.75 white, 205 · 3/16 = 38.4375 yellow and 205 · 1/16 = 12.8125 green.

Example (cont.)

We then compute the chi-square statistic:

\[
\chi^2 = \frac{(155 - 153.75)^2}{153.75} + \frac{(40 - 38.4375)^2}{38.4375} + \frac{(10 - 12.8125)^2}{12.8125}
       = 0.01016 + 0.06352 + 0.61738 = 0.69106
\]

Degrees of freedom = k − 1 = 2, and χ² = 0.691. Using Table D we find P > 0.25. Software gives P = 0.708.

This is not significant and we fail to reject H0. The observed data are consistent with a dominant epistatic genetic model (12:3:1). The small observed deviations from the model could simply have arisen from the random sampling process alone.
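As a sketch of the arithmetic behind the river ecology example, the statistic, its components and the P-value can be computed step by step in R and then compared with the built-in chisq.test. The variable names (obs, p0, components and so on) are illustrative assumptions, not part of the original slides.

```r
obs <- c(A = 89, B = 120, C = 91)   # observed counts from the survey
p0  <- rep(1/3, 3)                  # cell probabilities under H0

n          <- sum(obs)
expected   <- n * p0                           # e_j = n * p_j
components <- (obs - expected)^2 / expected    # chi-square components
x          <- sum(components)                  # chi-square statistic
df         <- length(obs) - 1
p_value    <- pchisq(x, df = df, lower.tail = FALSE)   # P(C >= x)

# Conditions for the chi-square approximation stated in the theorem.
stopifnot(all(expected >= 1), mean(expected < 5) <= 0.20)

round(components, 2)   # 1.21 4.00 0.81
round(x, 2)            # 6.02
round(p_value, 3)      # 0.049

# Cross-check against the built-in test.
chisq.test(obs, p = p0)
```

Keeping the components as a separate vector makes it easy to see which cell contributes most to the statistic, here species B.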
The squash test in R:

    > obs=c(155,40,10)
    > tprob=c(12/16, 3/16, 1/16)
    > chisq.test(obs,p=tprob)

            Chi-squared test for given probabilities

    data:  obs
    X-squared = 0.6911, df = 2, p-value = 0.7078

    > exp=chisq.test(obs,p=tprob)$expected
    > exp
    [1] 153.7500  38.4375  12.8125
    > (obs-exp)^2/exp
    [1] 0.01016260 0.06351626 0.61737805

Tests of Independence

Example: Two-way tables

An experiment has a two-way, or block, design if two categorical factors are studied with several levels of each factor.

Two-way tables organize data about two categorical variables with any number of levels/treatments obtained from a two-way, or block, design.

High school students were asked whether they smoke, and whether their parents smoke. The first factor (rows) is parent smoking status; the second factor (columns) is student smoking status.

Example (cont.)

                         student smokes   student doesn't smoke    Total
both parents smoke                  400                   1,380    1,780
one parent smokes                   416                   1,823    2,239
neither parent smokes               188                   1,168    1,356
Total                             1,004                   4,371    5,375

Assuming the observed sample reflects the population, i.e. using empirical probabilities in place of actual probabilities:

P(student & one parent smokes) = P(being in row #2 & column #1) = (2,1 entry)/(grand total) = 416/5,375 = 0.077
P(student smokes) = P(being in column #1) = (column #1 total)/(grand total) = 1,004/5,375 = 0.187
P(one parent smokes) = P(being in row #2) = (row #2 total)/(grand total) = 2,239/5,375 = 0.417.
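The empirical joint and marginal probabilities for the smoking data can be reproduced in R from the matrix of counts. This is a minimal sketch; the row and column labels and the variable names are assumptions chosen for illustration.

```r
# Counts: rows are parent smoking status, columns are student smoking status.
obs <- rbind(
  both    = c(400, 1380),
  one     = c(416, 1823),
  neither = c(188, 1168)
)
colnames(obs) <- c("smokes", "doesnt_smoke")

n        <- sum(obs)             # grand total, 5375
joint    <- obs / n              # P(row i & column j)
row_marg <- rowSums(obs) / n     # P(row i)
col_marg <- colSums(obs) / n     # P(column j)

round(joint["one", "smokes"], 3)   # 0.077
round(col_marg["smokes"], 3)       # 0.187
round(row_marg["one"], 3)          # 0.417
```

Base R's prop.table(obs) returns the same matrix as joint; the explicit division is used here only to mirror the formulas.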
Using the same two-way table, and again using empirical probabilities in place of actual probabilities, the conditional probabilities are:

P(student smokes | both parents smoke) = (1,1 entry)/(row #1 total) = 400/1,780 = 0.225
P(student smokes | one parent smokes) = (2,1 entry)/(row #2 total) = 416/2,239 = 0.186
P(student smokes | neither parent smokes) = (3,1 entry)/(row #3 total) = 188/1,356 = 0.139.

Observe: Assuming H0: row variable and column variable are independent,

\[
\begin{aligned}
e_{ij} &= (\text{grand total}) \cdot P(\text{being in the } ij\text{th cell}) \\
       &= (\text{grand total}) \cdot P(\text{being in row } i) \cdot P(\text{being in column } j) \\
       &= (\text{grand total}) \cdot \frac{\text{row } i \text{ total}}{\text{grand total}} \cdot \frac{\text{column } j \text{ total}}{\text{grand total}} \\
       &= \frac{(\text{row } i \text{ total}) \cdot (\text{column } j \text{ total})}{\text{grand total}}.
\end{aligned}
\]

Example (cont.)

The expected counts of the six cells are:

\[
\begin{aligned}
e_{11} &= \frac{1780 \cdot 1004}{5375} = 332.49, & e_{12} &= \frac{1780 \cdot 4371}{5375} = 1447.51, \\
e_{21} &= \frac{2239 \cdot 1004}{5375} = 418.22, & e_{22} &= \frac{2239 \cdot 4371}{5375} = 1820.78, \\
e_{31} &= \frac{1356 \cdot 1004}{5375} = 253.29, & e_{32} &= \frac{1356 \cdot 4371}{5375} = 1102.71.
\end{aligned}
\]

Theorem (Chi–Squared Test for Two–Way Tables)

The chi-square statistic from a two-way r × c table,

\[
x \overset{\text{def}}{=} \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}},
\]

measures how much the observed cell counts differ from the cell counts expected when H0: row variable and column variable are independent holds. If H0 is true and
- all expected counts are ≥ 1, and
- no more than 20% of the expected counts are < 5,
then the chi-squared statistic is approximately χ²((r − 1)(c − 1)) distributed. In that case, the p-value of the test of H0 versus HA: not H0 is approximately P(C ≥ x), where C ∼ χ²((r − 1)(c − 1)).

Example (cont.): Influence of parental smoking

Here is computer output for a chi-square test performed on the data from a random sample of high school students (rows are parental smoking habits, columns are the students' smoking habits). What does it tell you?

- Sample size?
- Hypotheses?
- Are the data ok for a χ² test?
- Interpretation?

Example (cont.)

    > row1=c(400,1380)
    > row2=c(416,1823)
    > row3=c(188,1168)
    > obs = rbind(row1,row2,row3)
    > chisq.test(obs)

            Pearson's Chi-squared test

    data:  obs
    X-squared = 37.5663, df = 2, p-value = 6.959e-09

    > exp=chisq.test(obs)$expected
    > exp
             [,1]     [,2]
    row1 332.4874 1447.513
    row2 418.2244 1820.776
    row3 253.2882 1102.712
    > (obs-exp)^2/exp
                [,1]       [,2]
    row1 13.70862455 3.14881241
    row2  0.01183057 0.00271743
    row3 16.82884348 3.86551335

Consider a 2 × 2 two-way table:

                male   female
bad driver       789      823
good driver      563      575

One can test whether being a bad/good driver has nothing to do with gender by
1. a z test for comparing two proportions, or
2. a chi-squared test for independence.

Both ways are equivalent and will yield the same result: for a 2 × 2 table, the chi-squared statistic is the square of the two-sided pooled z statistic, provided no continuity correction is applied.

Chapter #9 R Assignment

1. A car expert claims that 30% of all cars in Johnstown are American made, 35% are Japanese made, 20% are Korean made and 15% are European.
Of 156 cars randomly observed in Johnstown, 67 were American, 42 were Japanese, 24 were Korean and 23 were European. Find the p-value of a goodness of fit test between what was expected and what was observed.

2. Senie et al. (1981) investigated the relationship between age and frequency of breast self-examination in a sample of women (Senie, R. T., Rosen, P. P., Lesser, M. L., and Kinne, D. W. Breast self-examinations and medical examination relating to breast cancer stage. American Journal of Public Health, 71, 583–590.) A summary of the results is presented in the following table:

Frequency of breast                   Age
self-examination      under 45    45 - 59    60 and over
Monthly                     91        150            109
Occasionally                90        200            198
Never                       51        155            172

From Hand et al., page 307, table 368.

Do an independence test to see if age and frequency of breast self-examination are independent.