Math 143 – Fall 2010 Chi-Squared Tests

Goodness of Fit Testing

Golfballs in the Yard

The owners of a residence located along a golf course collected the first 500 golf balls that landed on their property. Most golf balls are labeled with the make of the golf ball and a number, for example "Nike 1" or "Titleist 3". The numbers are typically between 1 and 4, and the owners of the residence wondered if these numbers are equally likely (at least among golf balls used by golfers of poor enough quality that they lose them in the yards of the residences along the fairway).

Of the 500 golf balls collected, 486 of them bore a number between 1 and 4. The results are tabulated below:

  number on ball     1     2     3     4
  count            137   138   107   104

Q. What should the owners conclude?

A. We will follow our usual four-step hypothesis test procedure to answer this question.

1. State the null and alternative hypotheses.

   • The null hypothesis is that all of the numbers are equally likely. We could express this as

         H0: p1 = p2 = p3 = p4

     where pi is the true proportion of golf balls (in our population of interest) bearing the number i. Since these must sum to 1, this is equivalent to

         H0: p1 = 0.25, p2 = 0.25, p3 = 0.25, and p4 = 0.25.

   • The alternative is that there is some other distribution (in the population).

2. Calculate a test statistic.

   While we could make up a test statistic (and we did this in class), the usual test statistic used for this situation is the Chi-squared statistic:

         χ² = Σ (Oi − Ei)² / Ei ,

   where Oi = observed count for cell i and Ei = expected count for cell i.

   In this example, if H0 is true, we would expect 1/4 of the golf balls in each category, so Ei = 486/4 = 121.5 for each cell. Our test statistic is

         χ² = (137 − 121.5)²/121.5 + (138 − 121.5)²/121.5 + (107 − 121.5)²/121.5 + (104 − 121.5)²/121.5
            = 1.98 + 2.24 + 1.73 + 2.52
            = 8.47.

3. Determine the p-value.
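The arithmetic above (and the p-value found in this step) can be checked with a short script. The course itself uses StatCrunch; this Python sketch using scipy is just an illustration, and the variable names are my own.

```python
# Goodness-of-fit check for the golf-ball counts (illustrative sketch only;
# the course uses StatCrunch, not Python).
from scipy.stats import chisquare

observed = [137, 138, 107, 104]      # counts for ball numbers 1-4
expected = [486 / 4] * 4             # 121.5 per cell under H0

stat, p = chisquare(observed, f_exp=expected)
print(round(stat, 2))                # chi-squared statistic, about 8.47
print(round(p, 4))                   # p-value, about 0.037
```

With equal expected counts, `chisquare` uses them by default; passing `f_exp` explicitly just makes the null hypothesis visible in the code.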
When the null hypothesis is true, our test statistic has approximately a Chi-squared distribution. The degrees of freedom is 1 less than the number of cells in our data table. (That is, 1 less than the number of categories.)

As usual, we can ask StatCrunch to compute this for us (Stat > Calculators > Chi-square). You will be asked to provide the degrees of freedom (3, in our example) and the value of the test statistic (8.47, in our example). Our p-value is the probability of being as large or larger than 8.47. We choose the larger tail because the test statistic will be larger the more different our observed and expected counts are. In our example, the p-value is P(X ≥ 8.47) = 0.03725 when H0 is true.

Inference with Two Categorical Variables

Two-Way Tables

To look for an association between two categorical variables, we first put the data into a table (called a two-way table). The values inside the table describe the joint distribution of the two variables. In the example below, right-handed men and women had their feet measured to see which foot (if any) was larger. The two categorical variables are gender (Male or Female) and foot comparison (left foot larger, both feet the same, or right foot larger). The joint distribution is displayed in the following table.

            LBig   Equal   RBig
  Men          2      10     28
  Women       55      18     14

In addition to the joint distribution, there are two other kinds of distributions that are of interest in this situation.

1. The marginal distributions give the distribution of one variable ignoring the other variable altogether. It is common to put this information in the "margin" of the table. That's where the name comes from. Here is our data set with the marginal distributions added.

            LBig   Equal   RBig   Total
  Men          2      10     28      40
  Women       55      18     14      87
  Total       57      28     42     127

   For the marginal distribution of gender, we see that 40 of the 127 subjects were men.
   For the marginal distribution of foot comparison, we see that 57 subjects had larger left feet, 28 had equally sized feet, and 42 had larger right feet.

2. The conditional distributions give the distribution of one variable for a fixed value of the other variable.

   For example, the conditional distribution of foot comparison among the men (i.e., conditional on being a man) appears in the first row of the table (2 with larger left feet, 10 with equally sized feet, and 28 with larger right feet out of 40 men). Of course each column also represents a conditional distribution (this time conditional on a particular foot comparison).

Important idea: Conditional distributions are found by looking at rows or columns of a two-way table, because a row or column represents selecting just one possible value for one variable and looking at the distribution of the other variable in that situation.

Often it is more useful to express the joint, marginal, and conditional distributions using percentages or proportions rather than counts. This especially makes it easier to compare them when the total counts differ. Here are the joint and marginal distributions again described with percentages.

            LBig   Equal   RBig   Total
  Men        1.6     7.9   22.0    31.5
  Women     43.3    14.2   11.0    68.5
  Total     44.9    22.0   33.1   100.0

Of more interest to us are the conditional probabilities. We would like to know, for example, how the percentage of right-handed men with larger left feet compares to the percentage of right-handed women with larger left feet. We can do this by calculating tables of row- and column-percents (or proportions). The tables below show percentages.

  Conditional on gender:          Conditional on foot comparison:

            Men   Women                     Men   Women   Total
  LBig      5.0    63.2           LBig      3.5    96.5   100.0
  Equal    25.0    20.7           Equal    35.7    64.3   100.0
  RBig     70.0    16.1           RBig     66.7    33.3   100.0
  Total   100.0   100.0

1 Express the percentages in the previous two tables using conditional probability notation.
[P(A | B)]

2 Answer the following questions referring to the previous two tables.

   a) One of these tables is much more informative than the other. Which one and why?
   b) Informally/intuitively, what do these data suggest?
   c) Can we be sure that this holds for the population as well as for our sample? Why or why not?
   d) Do you think that this might hold for the population as well as for our sample? Why or why not?

The Chi-Square Test for 2-Way Tables

The hypothesis test that we use in this situation is called the Pearson chi-square test. It follows the same outline as our other hypothesis tests, but we have some details to fill in.

1. State the null and alternative hypotheses. Details needed:

2. Compute the test statistic. Details needed:

3. Find the p-value. Details needed:

4. Draw a conclusion. Details needed:

Step 1: State the Hypotheses

Chi-square tests are generally used to answer 1 of 2 questions (or, the same question phrased 2 ways):

1. Does the percent in a certain category change from population to population?
   Example: Does the percent of deaths from heart disease change from country to country? Or we could say, is there an association between the percent of deaths from heart disease and which country a person is from?

2. Are these two categorical variables independent?
   Example: Is alcohol consumption associated with oesophageal cancer?

In part the difference has to do with how the data were collected. If we independently sample from multiple populations, question 1 often fits the situation better. If we sample from one population and categorize the subjects in two ways, question 2 often fits better.

So our null and alternative hypotheses are

• H0:

• Ha:

Step 2: Compute the Test Statistic

Here is our table again. We want to measure how close these numbers are to what we expect if the null hypothesis is true.
To answer that, we first need to ask "What do we expect?"

Observed Counts

            LBig   Equal   RBig   Total
  Men          2      10     28      40
  Women       55      18     14      87
  Total       57      28     42     127

3 The top left cell contains the number 2. What would we expect if there is no association between gender and foot comparison? Fill in the table below with the expected counts for each cell.

Expected Counts

            LBig   Equal   RBig
  Men
  Women

We still need a way to compare the tables of observed and expected counts and to summarize the differences in a single number, our test statistic:

      X² =
         =
         = 45.00

Step 3: Compute the p-value

So how big is 45? Is it unusually big? About what we would expect? The p-value will answer that for us. But to get that answer we first need to answer a different question, namely:

Step 4: Interpret the p-value and Draw a Conclusion

StatCrunch can do all the work in steps 2 and 3 for us if we like (Stat > Tables > Contingency). Here is the output from StatCrunch:

• There are 2 degrees of freedom because

• The p-value is < 0.0001, so

• What more can we learn from the other information in the table?

Example 1. A random sample of high school and college students in 1973 was classified according to political views and frequency of marijuana use. The following two-way table summarizes the data.

  Marijuana Use   Liberal   Conservative   Other
  Never               479            214     172
  Rarely              173             47      45
  Frequently          119             15      85

Example 2. A number of school children (elementary and middle) were asked which of three things they thought was most important to them: Grades, Popularity, or Sports. Here are the results.

          Grades   Popular   Sports
  boy         27        11        9
  girl        20        14        3

Some Fine Print

Is the Approximation Good Enough?

The Chi-squared distribution is only an approximation. The approximation is poor when expected cell counts are small.
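For the foot-size table, the whole computation from Steps 2 and 3, including the expected counts, can be reproduced in code. This is an illustrative Python/scipy sketch (the course uses StatCrunch's Stat > Tables > Contingency); the expected counts it returns are exactly the (row total)(column total)/(grand total) values asked for above, and all of them happen to be at least 5.

```python
# Chi-square test for the 2-way foot-size table (illustrative sketch only;
# the course uses StatCrunch for this calculation).
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[2, 10, 28],     # men:   LBig, Equal, RBig
                     [55, 18, 14]])   # women: LBig, Equal, RBig

stat, p, dof, expected = chi2_contingency(observed)
print(round(stat, 2), dof)      # statistic about 45.0, with 2 degrees of freedom
print(np.round(expected, 2))    # e.g. men/LBig expected count is 17.95
print(bool(expected.min() >= 5))  # True: the rule of thumb is satisfied
```

Note that `chi2_contingency` applies a continuity correction only to 2 × 2 tables, so for this 2 × 3 table it matches the plain Pearson statistic.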
Our rule of thumb will be this:

   The Chi-squared approximation is good enough for our purposes provided all the expected cell counts are at least 5.

We will use this rule for both goodness of fit and 2-way table situations. Some authors suggest slightly more relaxed (but more complicated) rules of thumb. As with many things, this isn't about black and white, but shades of gray. The smaller the cell counts, the worse things work. There are other methods that can be used when the cell counts are small, but we won't cover them in this course.

Causation?

When we do a Chi-squared test for a 2-way table and the p-value is small, it is tempting to infer a causal relationship between the 2 categorical variables, but the Chi-squared test cannot tell us about cause (at least not directly). Recall our examples of Simpson's paradox. There may be an association between our two variables even without a causal relationship between them.

Not All Tables are Created Equal

For a table to be analyzed by these methods

• The cell values must be counts.
• Every individual in the sample must be counted in exactly 1 cell.

This is another way of saying the tables come from tabulating the categories of categorical variables.

Chi-squared or 2-proportion for 2 × 2 tables?

2 × 2 tables can also be analyzed using a 2-proportion test. The advantage to a 2-proportion approach is that we can produce a confidence interval for the difference in the two proportions.

Beware of Really Large Data Sets

In very large data sets, a small p-value might not be very interesting. With large data sets, we can detect very small deviations from the null hypothesis. But these small differences might not be important, even if they are statistically significant. We haven't discussed following up with confidence intervals (our usual solution to this problem in previous scenarios), but we should at least

• compare cell percentages to hypothesized percentages for goodness of fit tests.
• compare row percentages across columns or column percentages across rows for 2-way tables.

This will often give us a good sense for whether the differences are big enough to be interesting.

Residuals

The residual for the ith cell is defined by

      residual_i = (Oi − Ei) / √Ei .

The square of the residual is the contribution of a cell to the chi-squared statistic. The primary advantage of residuals is that they can be positive or negative. Looking at the residuals will show which cells of the table are farthest from what we would have expected under H0 and whether those counts are too small or too large. StatCrunch doesn't compute residuals, but other software does. Here is some typical output:

  > footsize
        Lbig Equal Rbig
  men      2    10   28
  women   55    18   14

  > xchisq.test(footsize)

          Pearson's Chi-squared test

  data:  footsize
  X-squared = 45.0029, df = 2, p-value = 1.689e-10

    2.00      10.00      28.00
  (17.95)   ( 8.82)    (13.23)
  [14.176]  [ 0.158]   [16.495]
  <-3.77>   < 0.40>    < 4.06>

   55.00      18.00      14.00
  (39.05)   (19.18)    (28.77)
  [ 6.518]  [ 0.073]   [ 7.584]
  < 2.55>   <-0.27>    <-2.75>

  key:
          observed
          (expected)
          [contribution to X-squared]
          <residual>

From this we see that compared to what our null hypothesis predicted:

• There were too few men with large left feet, and too many men with large right feet.
• There were too few women with large right feet and too many women with large left feet.
• The numbers of men and women with equal sized feet was about what we would have expected.

Combining Cells

Often researchers will combine cells from a larger table to make a smaller table. There are at least two reasons to do this:

• Combining cells increases the cell counts, which can be important if some cells have small counts.
• Some categories might naturally go together and lead to a test of a reasonable and interesting hypothesis.
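The residuals shown in the output in the Residuals section can also be computed directly from the definition. This is an illustrative Python sketch (variable names are my own; the course software is StatCrunch/R):

```python
# Computing (O - E)/sqrt(E) residuals for the foot-size table
# (illustrative sketch of the definition in the Residuals section).
import numpy as np

observed = np.array([[2, 10, 28],
                     [55, 18, 14]], dtype=float)

# Expected counts under independence: (row total)(column total)/(grand total)
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

residuals = (observed - expected) / np.sqrt(expected)
print(np.round(residuals, 2))
# men/LBig is about -3.77 (too few) and men/RBig about 4.06 (too many),
# matching the <...> values in the xchisq.test output.
```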
For example, we could combine the LBig and RBig columns to see whether men or women are more likely to have feet that are the same size:

          Different   Equal
  men            30      10
  women          69      18

  > chisq.test(footsize2)

          Pearson's Chi-squared test with Yates' continuity correction

  data:  footsize2
  X-squared = 0.0985, df = 1, p-value = 0.7536

Here we see that we have no reason to reject the null hypothesis that men and women are equally likely to have equally sized feet. (In this case we could have also used a 2-proportion test.)
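The combined 2 × 2 analysis above can be reproduced outside of R as well. This is an illustrative Python/scipy sketch; like R's chisq.test, scipy applies Yates' continuity correction to 2 × 2 tables by default, which is why the numbers match the R output.

```python
# Chi-square test on the combined 2x2 table (illustrative sketch;
# Yates' continuity correction is the default for 2x2 tables here, as in R).
import numpy as np
from scipy.stats import chi2_contingency

combined = np.array([[30, 10],    # men:   Different, Equal
                     [69, 18]])   # women: Different, Equal

stat, p, dof, expected = chi2_contingency(combined)  # correction=True by default
print(round(stat, 4), round(p, 4))   # about 0.0985 and 0.7536
```

Passing `correction=False` would give the uncorrected Pearson statistic instead, which is what a 2-proportion z-test squared would produce.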