P1: PBU/OVY GTBL011-23 P2: PBU/OVY QC: PBU/OVY GTBL011-Moore-v18.cls T1: PBU June 21, 2006 21:6

CHAPTER 23

Two Categorical Variables: The Chi-Square Test

(Chapter opening photo: Lisl Dennis/Getty Images)

The two-sample z procedures of Chapter 21 allow us to compare the proportions of successes in two groups, either two populations or two treatment groups in an experiment. In the first example in Chapter 21 (page 513), we compared young men and young women by looking at whether or not they lived with their parents. That is, we looked at a relationship between two categorical variables, gender (female or male) and “Where do you live?” (with parents or not). In fact, the data include three more outcomes for “Where do you live?”: in another person’s home, in your own place, and in group quarters such as a dormitory. When there are more than two outcomes, or when we want to compare more than two groups, we need a new statistical test. The new test addresses a general question: is there a relationship between two categorical variables?

In this chapter we cover...
Two-way tables
The problem of multiple comparisons
Expected counts in two-way tables
The chi-square test
Using technology
Cell counts required for the chi-square test
Uses of the chi-square test
The chi-square distributions
The chi-square test and the z test*
The chi-square test for goodness of fit*

Two-way tables

We saw in Chapter 6 that we can present data on two categorical variables in a two-way table of counts. That’s our starting point. Here is an example.

EXAMPLE 23.1 Health care: Canada and the United States

Canada has universal health care. The United States does not, but often offers more elaborate treatment to patients with access. How do the two systems compare in treating heart attacks? A comparison of random samples of 2600 U.S. and 400 Canadian heart attack patients found that “the Canadian patients typically stayed in the hospital one day longer (P = 0.009) than the U.S. patients but had a much lower rate of cardiac catheterization (25 percent vs. 72 percent, P < 0.001), coronary angioplasty (11 percent vs. 29 percent, P < 0.001), and coronary bypass surgery (3 percent vs. 14 percent, P < 0.001).”1

The study then looked at many outcomes a year after the heart attack. There was no significant difference in the patients’ survival rate. Another key outcome was the patients’ own assessment of their quality of life relative to what it had been before the heart attack. Here are the data for the patients who survived a year:

Quality of life      Canada   United States
Much better              75             541
Somewhat better          71             498
About the same           96             779
Somewhat worse           50             282
Much worse               19              65
Total                   311            2165

The two-way table in Example 23.1 shows the relationship between two categorical variables. The explanatory variable is the patient’s country, Canada or the United States. The response variable is quality of life a year after a heart attack, with 5 categories. The two-way table gives the counts for all 10 combinations of values of these variables. Each of the 10 counts occupies a cell of the table.

It is hard to compare the counts because the U.S. sample is much larger. Here are the percents of each sample with each outcome:

Quality of life      Canada   United States
Much better             24%             25%
Somewhat better         23%             23%
About the same          31%             36%
Somewhat worse          16%             13%
Much worse               6%              3%
Total                  100%            100%

In the language of Chapter 6 (page 153), these are the conditional distributions of outcomes, given the patients’ nationality. The differences are not large, but slightly higher percents of Canadians thought their quality of life was “somewhat worse” or “much worse.” Figure 23.1 compares the two distributions. We want to know if there is a significant difference between the two distributions of outcomes.
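The percents above can be reproduced with a few lines of code. The sketch below is one way to do it in Python, dividing each count by its column (country) total. The variable and function names are illustrative choices, not part of the study.

```python
# Observed counts from Example 23.1, in the order the outcomes are listed.
outcomes = ["Much better", "Somewhat better", "About the same",
            "Somewhat worse", "Much worse"]
counts = {
    "Canada": [75, 71, 96, 50, 19],
    "United States": [541, 498, 779, 282, 65],
}

def conditional_distribution(column):
    """Return each count as a percent of its column total."""
    total = sum(column)
    return [100 * c / total for c in column]

percents = {country: conditional_distribution(col)
            for country, col in counts.items()}

# Print one row per outcome, Canada first and then the United States.
for i, outcome in enumerate(outcomes):
    row = "  ".join(f"{percents[c][i]:5.1f}%" for c in counts)
    print(f"{outcome:<16} {row}")
```

Rounding the printed values to whole percents should give the table of conditional distributions shown above.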
[Figure 23.1: bar graph with Percent on the vertical axis (0 to 40) and, for each of the five outcomes, one bar for Canada and one for the United States. An annotation reads: “These bars compare ‘somewhat worse’ percents: 16% in Canada, 13% in the United States.”]

FIGURE 23.1 Bar graph comparing quality of life a year after a heart attack in Canada and the United States, for Example 23.1.

APPLY YOUR KNOWLEDGE

23.1 Smoking among French men. Smoking remains more common in much of Europe than in the United States. In the United States, there is a strong relationship between education and smoking: well-educated people are less likely to smoke. Does a similar relationship hold in France? Here is a two-way table of the level of education and smoking status (nonsmoker, former smoker, moderate smoker, heavy smoker) of a sample of 459 French men aged 20 to 60 years.2 The subjects are a random sample of men who visited a health center for a routine checkup. We are willing to consider them an SRS of men from their region of France.

                           Smoking Status
Education           Nonsmoker   Former   Moderate   Heavy
Primary school             56       54         41      36
Secondary school           37       43         27      32
University                 53       28         36      16

(a) What percent of men with a primary school education are nonsmokers? Former smokers? Moderate smokers? Heavy smokers? These percents should add to 100% (up to roundoff error). They form the conditional distribution of smoking, given a primary education.
(b) In a similar way, find the conditional distributions of smoking among men with a secondary education and among men with a university education. Make a table that presents the three conditional distributions. Be sure to include a “Total” column showing that each row adds to 100%.
(c) Compare the three conditional distributions. Is there any clear relationship between education and smoking?

23.2 Attitudes toward recycled products. Recycling is supposed to save resources. Some people think recycled products are lower in quality than other products, a fact that makes recycling less practical. Here are data on attitudes toward coffee filters made of recycled paper.3

             Think the quality of the recycled product is
             Higher   The same   Lower
Buyers           20          7       9
Nonbuyers        29         25      43

(a) It appears that people who have bought the recycled filters have more positive opinions than those who have not. Give percents to back up this claim. Make a bar graph that compares your percents for buyers and nonbuyers.
(b) Association does not prove causation. Explain how buying recycled filters might improve a person’s opinion of their quality. Then explain how the opinion a person holds might influence his or her decision to buy or not. You see that the cause-and-effect relationship might go in either direction.

The problem of multiple comparisons

The null hypothesis in Example 23.1 is that there is no difference between the distributions of outcomes in Canada and the United States. Put more generally, the null hypothesis is that there is no relationship between two categorical variables,

$H_0$: there is no relationship between nationality and quality of life

The alternative hypothesis says that there is a relationship but does not specify any particular kind of relationship,

$H_a$: there is some relationship between nationality and quality of life

Any difference between the Canadian and American distributions means that the null hypothesis is false and the alternative hypothesis is true. The alternative hypothesis is not one-sided or two-sided. We might call it “many-sided” because it allows any kind of difference.
With only the methods we already know, we might start by comparing the proportions of patients in the two nations with “much better” quality of life, using the two-sample z test for proportions. We could similarly compare the proportions with each of the other outcomes: five tests in all, with five P-values. This is a bad idea. The P-values belong to each test separately, not to the collection of five tests together. Think of the distinction between the probability that a basketball player makes a free throw and the probability that she makes all of five free throws. When we do many individual tests or confidence intervals, the individual P-values and confidence levels don’t tell us how confident we can be in all of the inferences taken together.

Because of this, it’s cheating to pick out the largest of the five differences and then test its significance as if it were the only comparison we had in mind. For example, the “much worse” proportions in Example 23.1 are significantly different (P = 0.0047) if we compare just this one outcome. But is it surprising that the most different proportions among five outcomes differ by this much? That’s a different question.

The problem of how to do many comparisons at once with an overall measure of confidence in all our conclusions is common in statistics. This is the problem of multiple comparisons. Statistical methods for dealing with multiple comparisons usually have two steps:

1. An overall test to see if there is good evidence of any differences among the parameters that we want to compare.
2. A detailed follow-up analysis to decide which of the parameters differ and to estimate how large the differences are.

The overall test, though more complex than the tests we met earlier, is often reasonably straightforward. The follow-up analysis can be quite elaborate.
In our basic introduction to statistical practice, we will concentrate on the overall test, along with data analysis that points to the nature of the differences.

APPLY YOUR KNOWLEDGE

23.3 Nonsmokers and education in France. In the setting of Exercise 23.1, consider only the proportions of nonsmokers in the three populations of men with primary, secondary, and university education. Do three significance tests of the three null hypotheses

$H_0$: $p_{\text{primary}} = p_{\text{secondary}}$
$H_0$: $p_{\text{primary}} = p_{\text{university}}$
$H_0$: $p_{\text{secondary}} = p_{\text{university}}$

against the two-sided alternatives. Give P-values for each test. These three P-values don’t tell us how often the three proportions for the three education groups will be spread this far apart just by chance.

23.4 Who’s online? A sample survey by the Pew Internet and American Life Project asked a random sample of adults about use of the Internet and about the type of community they lived in. Following is the two-way table:4

                       Community Type
                   Rural   Suburban   Urban
Internet users       433       1072     536
Nonusers             463        627     388

(a) Give three 95% confidence intervals, for the percents of adults in rural, suburban, and urban communities who use the Internet.
(b) Explain clearly why we are not 95% confident that all three of these intervals capture their respective population proportions.

Expected counts in two-way tables

Our general null hypothesis $H_0$ is that there is no relationship between the two categorical variables that label the rows and columns of a two-way table. To test $H_0$, we compare the observed counts in the table with the expected counts, the counts we would expect (except for random variation) if $H_0$ were true. If the observed counts are far from the expected counts, that is evidence against $H_0$. It is easy to find the expected counts.
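As a rough sketch of that calculation: each expected count is the row total times the column total, divided by the table total. The Python below applies this rule to the quality-of-life counts of Example 23.1. The function name `expected_counts` is a hypothetical helper chosen for illustration, not something from the text.

```python
def expected_counts(observed):
    """Expected count for each cell: row total * column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]

# Rows: the five quality-of-life outcomes, "much better" through "much worse".
# Columns: Canada, United States.
observed = [[75, 541], [71, 498], [96, 779], [50, 282], [19, 65]]
expected = expected_counts(observed)

for row in expected:
    print("  ".join(f"{x:7.2f}" for x in row))
```

A quick check on the result is that the expected counts in each column add up to the observed column total, 311 for Canada and 2165 for the United States.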
He started it! A study of deaths in bar fights showed that in 90% of the cases, the person who died started the fight. You shouldn’t believe this. If you killed someone in a fight, what would you say when the police ask you who started the fight? After all, dead men tell no tales.

EXPECTED COUNTS

The expected count in any cell of a two-way table when $H_0$ is true is

$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$

EXAMPLE 23.2 Observed versus expected counts

Let’s find the expected counts for the quality-of-life study. Here is the two-way table with row and column totals:

Quality of life      Canada   United States   Total
Much better              75             541     616
Somewhat better          71             498     569
About the same           96             779     875
Somewhat worse           50             282     332
Much worse               19              65      84
Total                   311            2165    2476

The expected count of Canadians with much better quality of life a year after a heart attack is

$$\frac{\text{row 1 total} \times \text{column 1 total}}{\text{table total}} = \frac{(616)(311)}{2476} = 77.37$$

Here is the table of all 10 expected counts:

Quality of life      Canada   United States   Total
Much better           77.37          538.63     616
Somewhat better       71.47          497.53     569
About the same       109.91          765.09     875
Somewhat worse        41.70          290.30     332
Much worse            10.55           73.45      84
Total                   311            2165    2476

As this table shows, the expected counts have exactly the same row and column totals (up to roundoff error) as the observed counts. That’s a good way to check your work.

To see how the data diverge from the null hypothesis, compare the observed counts with these expected counts. You see, for example, that 19 Canadians reported much worse quality of life, whereas we would expect only 10.55 if the null hypothesis were true.

Why the formula works

Where does the formula for an expected cell count come from? Think of a basketball player who makes 70% of her free throws in the long run. If she shoots 10 free throws in a game, we expect her to make 70% of them, or 7 of the 10.
Of course, she won’t make exactly 7 every time she shoots 10 free throws in a game. There is chance variation from game to game. But in the long run, 7 of 10 is what we expect. In more formal language, if we have n independent tries and the probability of a success on each try is p, we expect np successes.

Now go back to the count of Canadians with much better quality of life a year after a heart attack. The proportion of all 2476 subjects with much better quality of life is

$$\frac{\text{count of successes}}{\text{table total}} = \frac{\text{row 1 total}}{\text{table total}} = \frac{616}{2476}$$

Think of this as p, the overall proportion of successes. If $H_0$ is true, we expect (except for random variation) this same proportion of successes in both countries. So the expected count of successes among the 311 Canadians is

$$np = (311)\left(\frac{616}{2476}\right) = 77.37$$

That’s the formula in the Expected Counts box.

APPLY YOUR KNOWLEDGE

23.5 Smoking among French men. The two-way table in Exercise 23.1 displays data on the education and smoking behavior of a sample of French men. The null hypothesis says that there is no relationship between these variables. That is, the distribution of smoking is the same for all three levels of education.
(a) Find the expected counts for each smoking status among men with a university education. This is one row of the two-way table of expected counts. Find the row total and verify that it agrees with the row total for the observed counts.
(b) We conjecture that men with a university education smoke less than the null hypothesis calls for. How does comparing the observed and expected counts in this row confirm this conjecture?

23.6 Attitudes toward recycled products. Exercise 23.2 describes a comparison of the attitudes of people who do and don’t buy coffee filters made of recycled paper.
The null hypothesis “no relationship” says that in the population of all consumers, the proportions who hold each attitude are the same for buyers and nonbuyers.
(a) Find the expected cell counts if this hypothesis is true and display them in a two-way table. Add the row and column totals to your table and check that they agree with the totals for the observed counts.
(b) Are there any large deviations between the observed counts and the expected counts? What kind of relationship between the two variables do these deviations point to?

The chi-square test

The statistical test that tells us whether the observed differences between Canada and the United States are statistically significant compares the observed and expected counts. The test statistic that makes the comparison is the chi-square statistic.

CHI-SQUARE STATISTIC

The chi-square statistic is a measure of how far the observed counts in a two-way table are from the expected counts. The formula for the statistic is

$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

The sum is over all cells in the table.

The chi-square statistic is a sum of terms, one for each cell in the table. In the quality-of-life example, 75 Canadian patients reported much better quality of life. The expected count for this cell is 77.37. So the term of the chi-square statistic from this cell is

$$\frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}} = \frac{(75 - 77.37)^2}{77.37} = \frac{5.617}{77.37} = 0.073$$

Think of the chi-square statistic $X^2$ as a measure of the distance of the observed counts from the expected counts. Like any distance, it is always zero or positive, and it is zero only when the observed counts are exactly equal to the expected counts. Large values of $X^2$ are evidence against $H_0$ because they say that the observed counts are far from what we would expect if $H_0$ were true.
Although the alternative hypothesis $H_a$ is many-sided, the chi-square test is one-sided because any violation of $H_0$ tends to produce a large value of $X^2$. Small values of $X^2$ are not evidence against $H_0$.

Using technology

Calculating the expected counts and then the chi-square statistic by hand is a bit time-consuming. As usual, software saves time and always gets the arithmetic right. Figure 23.2 (pages 556 and 557) shows output for the chi-square test for the quality-of-life data from a graphing calculator, two statistical programs, and a spreadsheet program.

EXAMPLE 23.3 Chi-square from software

The outputs differ in the information they give. All except the Excel spreadsheet tell us that the chi-square statistic is $X^2$ = 11.725, with P-value 0.020. There is quite good evidence that the distributions of outcomes are different in Canada and the United States.

The two statistical programs repeat the two-way table of observed counts and add the row and column totals. Both programs offer additional information on request. We asked CrunchIt! to add the column percents that enable us to compare the Canadian and American distributions. The chi-square statistic is a sum of 10 terms, one for each cell in the table. We asked Minitab to give the expected count and the contribution to chi-square for each cell. The top-left cell has expected count 77.4 and chi-square term 0.073, just as we calculated. Look at the 10 terms. More than half the value of $X^2$ (6.766 out of 11.725) comes from just one cell. This points to the most important difference between the two countries: a higher proportion of Canadians report much worse quality of life. Most of the rest of $X^2$ comes from two other cells: more Canadians report somewhat worse quality of life, and fewer report about the same quality.

Excel is as usual more awkward than software designed for statistics. It lacks a menu selection for the chi-square test.
You must program the spreadsheet to calculate the expected cell counts and then use the CHITEST worksheet formula. This gives the P-value but not the test statistic itself. You can of course program the spreadsheet to find the value of $X^2$. The Excel output shows the observed and expected cell counts and the P-value.

The chi-square test is the overall test for detecting relationships between two categorical variables. If the test is significant, it is important to look at the data to learn the nature of the relationship. We have three ways to look at the quality-of-life data:

• Compare appropriate percents: which outcomes occur in quite different percents of Canadian and American patients? This is the method we learned in Chapter 6.
• Compare observed and expected cell counts: which cells have more or fewer observations than we would expect if $H_0$ were true?
• Look at the terms of the chi-square statistic: which cells contribute the most to the value of $X^2$?

TI-83

[Calculator screen image.]

CrunchIt!

Cell format: Count (Column percent)

                   Canada          USA              Total
Much better        75 (24.12%)     541 (24.99%)     616 (24.88%)
Somewhat better    71 (22.83%)     498 (23%)        569 (22.98%)
About the same     96 (30.87%)     779 (35.98%)     875 (35.34%)
Somewhat worse     50 (16.08%)     282 (13.03%)     332 (13.41%)
Much worse         19 (6.109%)     65 (3.002%)      84 (3.393%)
Total              311 (100.00%)   2165 (100.00%)   2476 (100.00%)

Statistic     DF   Value       P-value
Chi-square    4    11.725485   0.0195

FIGURE 23.2 Output from the TI-83 graphing calculator, CrunchIt!, Minitab, and Excel for the two-way table in the quality-of-life study (continued).

EXAMPLE 23.4 Canada and the United States: conclusions

There is a significant difference between the distributions of quality of life reported by Canadian and American patients a year after a heart attack.
Minitab

                   Canada                  USA                      All
Much better        75   77.4   0.0728      541   538.6   0.0105     616   616.0   *
Somewhat better    71   71.5   0.0031      498   497.5   0.0004     569   569.0   *
About the same     96   109.9  1.7593      779   765.1   0.2527     875   875.0   *
Somewhat worse     50   41.7   1.6515      282   290.3   0.2372     332   332.0   *
Much worse         19   10.6   6.7660      65    73.4    0.9719     84    84.0    *
All                311  311.0  *           2165  2165.0  *          2476  2476.0  *

Cell Contents: Count, Expected count, Contribution to Chi-square
(This key identifies the output for each cell in the table.)

Pearson Chi-Square = 11.725, DF = 4, P-Value = 0.020

Excel

        A                        B          C
 1      Observed                 Canada     USA
 2                               75         541
 3                               71         498
 4                               96         779
 5                               50         282
 6                               19         65
 8      Expected                 Canada     USA
 9                               77.37      538.63
10                               71.47      497.53
11                               109.91     765.09
12                               41.7       290.3
13                               10.55      73.45
16      CHITEST(B2:C6,B9:C13)    0.019482

FIGURE 23.2 (continued).

All three ways of comparing the distributions agree that the main difference is that a higher proportion of Canadians report that their quality of life is worse than before their heart attack. Other response variables measured in the study agree with this conclusion.

The broader conclusion, however, is controversial. Americans are likely to point to the better outcomes produced by their much more intensive treatment. Canadians reply that the differences are small, that there was no significant difference in survival, and that the American advantage comes at high cost. The resources spent on expensive treatment of heart attack victims could instead be spent on providing basic health care to the many Americans who lack it.
There is an important message here: although statistical studies shed light on issues of public policy, statistics alone rarely settles complicated questions such as “Which kind of health care system works better?”

APPLY YOUR KNOWLEDGE

23.7 Smoking among French men. In Exercises 23.1 and 23.5, you began to analyze data on the smoking status and education of French men. Figure 23.3 displays the Minitab output for the chi-square test applied to these data.
(a) Starting from the observed and expected counts in the output, calculate the four terms of the chi-square statistic for the bottom row (university education). Verify that your work agrees with Minitab’s “Contribution to Chi-square” up to roundoff error.
(b) According to Minitab, what is the value of the chi-square statistic $X^2$ and the P-value of the chi-square test?
(c) Look at the “Contribution to Chi-square” entries in Minitab’s display. Which terms contribute the most to $X^2$? Write a brief summary of the nature and significance of the relationship between education and smoking.

             Nonsmoker             Former                Moderate              Heavy                 All
Primary      56   59.48  0.2038    54   50.93  0.1856    41   42.37  0.0443    36   34.22  0.0924    187  187.00  *
Secondary    37   44.21  1.1769    43   37.85  0.6996    27   31.49  0.6414    32   25.44  1.6928    139  139.00  *
University   53   42.31  2.7038    28   36.22  1.8655    36   30.14  1.1414    16   24.34  2.8576    133  133.00  *
All          146  146.00  *        125  125.00  *        104  104.00  *        84   84.00   *        459  459.00  *

Cell Contents: Count, Expected count, Contribution to Chi-square

Pearson Chi-Square = 13.305, DF = 6, P-Value = 0.038

FIGURE 23.3 Minitab output for the two-way table of education level and smoking status among French men, for Exercise 23.7.

23.8 Attitudes toward recycled products. In Exercises 23.2 and 23.6 you began to analyze data on consumer attitudes toward recycled products. Figure 23.4 gives CrunchIt! output for these data.
(a) Starting from the observed and expected counts, find the six terms of the chi-square statistic and then the statistic $X^2$ itself. Check your work against the computer output.
(b) What is the P-value for the test? Explain in simple language what it means to reject $H_0$ in this setting.
(c) Which cells contribute the most to $X^2$? What kind of relationship do these terms in combination with the row percents in the table point to?

Cell format: Count (Row percent) Expected count

             Higher               The same             Lower                Total
Buyers       20 (55.56%) 13.26    7 (19.44%) 8.662     9 (25%) 14.08        36 (100.00%)
Nonbuyers    29 (29.9%) 35.74     25 (25.77%) 23.34    43 (44.33%) 37.92    97 (100.00%)
Total        49 (36.84%)          32 (24.06%)          52 (39.1%)           133 (100.00%)

Statistic     DF   Value      P-value
Chi-square    2    7.638116   0.0219

FIGURE 23.4 CrunchIt! output for the study of consumer attitudes toward recycled products, for Exercise 23.8.

Cell counts required for the chi-square test

The chi-square test, like the z procedures for comparing two proportions, is an approximate method that becomes more accurate as the counts in the cells of the table get larger. We must therefore check that the counts are large enough to trust the P-value. Fortunately, the chi-square approximation is accurate for quite modest counts. Here is a practical guideline.5

CELL COUNTS REQUIRED FOR THE CHI-SQUARE TEST

You can safely use the chi-square test with critical values from the chi-square distribution when no more than 20% of the expected counts are less than 5 and all individual expected counts are 1 or greater. In particular, all four expected counts in a 2 × 2 table should be 5 or greater.

Note that the guideline uses expected cell counts. The expected counts for the quality of life study of Example 23.1 appear in the Minitab output in Figure 23.2.
The smallest expected count is 10.6, so the data easily meet the guideline for safe use of chi-square.

APPLY YOUR KNOWLEDGE

23.9 Does chi-square apply? Figure 23.3 displays Minitab output for data on French men. Using the information in the output, verify that the data meet the cell count requirement for use of chi-square.

23.10 Does chi-square apply? Figure 23.4 displays CrunchIt! output for data on consumer attitudes toward recycled products. Using the information in the output, verify that the data meet the cell count requirement for use of chi-square.

Uses of the chi-square test

Two-way tables can arise in several ways. The study of the quality of life of heart attack patients compared two independent random samples, one in Canada and the other in the United States. The design of the study fixed the sizes of the two samples. The next example illustrates a different setting, in which all the observations come from just one sample.

EXAMPLE 23.5 Extracurricular activities and grades

STATE: North Carolina State University studied student performance in a course required by its chemical engineering major. Students must earn at least a C in the course in order to continue in the major. One question of interest was the relationship between time spent in extracurricular activities and success in the course. Students were asked to estimate how many hours per week they spent on extracurricular activities (less than 2, 2 to 12, or greater than 12). The CrunchIt! output in Figure 23.5 shows the two-way table of extracurricular activity time and course grade for the 119 students who answered the question.6

FORMULATE: Carry out a chi-square test for

$H_0$: there is no relationship between extracurricular activity time and course grade
$H_a$: there is some relationship between these two variables

Compare column percents or observed versus expected cell counts or terms of chi-square to see the nature of the relationship.

SOLVE: First check the guideline for use of chi-square. The expected cell counts appear in the output in Figure 23.5. Two of the expected counts are quite small, 5.513 and 2.487. But all the expected counts are greater than 1, and only 1 out of 6 (17%) is less than 5. We can safely use chi-square. The output shows that there is a significant relationship ($X^2$ = 6.926, P = 0.0313). The column percents show an interesting pattern: students who spend low and high amounts of time on extracurricular activities are both less likely to earn a C or better than students who spend a moderate amount of time.

CONCLUDE: We find that 75% of students in the moderate extracurricular activity group succeed in the course, compared with 55% in the low group and only 38% in the high group. These differences in success percents are significant (P = 0.03). Because there are few students in the low and (especially) high groups, we now wish that the questionnaire had not lumped 2 to 12 hours together. We should also look at other data that might help explain the pattern. For example, are the “low extracurricular” students more often employed? Or are they students with low GPAs who are struggling despite lots of study time?

FIGURE 23.5 CrunchIt! output for the two-way table of course grade and extracurricular activities, for Example 23.5.
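The SOLVE step of Example 23.5 (check the cell-count guideline, then find the chi-square statistic and its P-value) can be sketched in code. Because the counts for Example 23.5 appear only in Figure 23.5, this sketch uses the quality-of-life counts of Example 23.1 instead. The helper names are hypothetical. The P-value formula is an assumption that goes beyond the text: it is the closed form for the right-tail area of a chi-square distribution with an even number of degrees of freedom, so it covers this 5 × 2 table, which has (5 − 1)(2 − 1) = 4 degrees of freedom, but not tables with odd degrees of freedom.

```python
import math

def expected_counts(observed):
    """Expected count for each cell: row total * column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def meets_guideline(expected):
    """No more than 20% of expected counts below 5, and none below 1."""
    cells = [x for row in expected for x in row]
    small = sum(1 for x in cells if x < 5)
    return small <= 0.20 * len(cells) and min(cells) >= 1

def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e
               for orow, erow in zip(observed, expected)
               for o, e in zip(orow, erow))

def right_tail_even_df(x2, df):
    """Area to the right of x2; this closed form is valid only for even df."""
    if df % 2 != 0:
        raise ValueError("this closed form needs an even df")
    half = x2 / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))

# Rows: five quality-of-life outcomes; columns: Canada, United States.
observed = [[75, 541], [71, 498], [96, 779], [50, 282], [19, 65]]
expected = expected_counts(observed)
ok = meets_guideline(expected)
x2 = chi_square(observed, expected)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = right_tail_even_df(x2, df)
print(ok, round(x2, 3), df, round(p_value, 4))
```

For these data the guideline is met, and the statistic and P-value should agree with the software output in Figure 23.2 up to roundoff.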
Alt-6/Alamy 561 P1: PBU/OVY GTBL011-23 562 P2: PBU/OVY QC: PBU/OVY GTBL011-Moore-v18.cls T1: PBU June 21, 2006 21:6 C H A P T E R 23 • Two Categorical Variables: The Chi-Square Test Pay attention to the nature of the data in Example 23.5: • • We do not have three separate samples of students with low, moderate, and high extracurricular activity. We have a single group of 119 students, each classified in two ways (extracurricular activity and course grade). The data (except for small nonresponse) cover all of the students enrolled in this course in one semester. We might regard this as a sample of students enrolled in the course over several years. But we might also regard these 119 students as the entire population rather than a sample from a larger population. One of the most useful properties of chi-square is that it tests the null hypothesis “the row and column variables are not related to each other” whenever this hypothesis makes sense for a two-way table. It makes sense when we are comparing a categorical response in two or more samples, as when we compared quality of life for patients in Canada and the United States. The hypothesis also makes sense when we have data on two categorical variables for the individuals in a single sample, as when we examined grades and extracurricular activities for a sample of college students. The hypothesis “no relationship” makes sense even if the single sample is an entire population. Statistical significance has the same meaning in all these settings: “A relationship this strong is not likely to happen just by chance.” This makes sense whether the data are a sample or an entire population. USES OF THE CHI-SQUARE TEST Use the chi-square test to test the null hypothesis H 0 : there is no relationship between two categorical variables when you have a two-way table from one of these situations: • Independent SRSs from each of two or more populations, with each individual classified according to one categorical variable. 
(The other variable says which sample the individual comes from.)
• A single SRS, with each individual classified according to both of two categorical variables.

APPLY YOUR KNOWLEDGE

23.11 Majors for men and women in business. A study of the career plans of young women and men sent questionnaires to all 722 members of the senior class in the College of Business Administration at the University of Illinois. One question asked which major within the business program the student had chosen. Here are the data from the students who responded:7

                  Female   Male
Accounting            68     56
Administration        91     40
Economics              5      6
Finance               61     59

This is an example of a single sample classified according to two categorical variables (gender and major).
(a) Describe the differences between the distributions of majors for women and men with percents, with a bar graph, and in words.
(b) Verify that the expected cell counts satisfy the requirement for use of chi-square.
(c) Test the null hypothesis that there is no relationship between the gender of students and their choice of major. Give a P-value.
(d) Which two cells have the largest terms of the chi-square statistic? How do the observed and expected counts differ in these cells? (This should strengthen your conclusions in (a).)
(e) What percent of the students did not respond to the questionnaire? Why does this nonresponse weaken conclusions drawn from these data?

The chi-square distributions

Software usually finds P-values for us. The P-value for a chi-square test comes from comparing the value of the chi-square statistic with critical values for a chi-square distribution.

THE CHI-SQUARE DISTRIBUTIONS

The chi-square distributions are a family of distributions that take only positive values and are skewed to the right. A specific chi-square distribution is specified by giving its degrees of freedom.
The chi-square test for a two-way table with r rows and c columns uses critical values from the chi-square distribution with (r − 1)(c − 1) degrees of freedom. The P-value is the area to the right of $X^2$ under the density curve of this chi-square distribution.

FIGURE 23.6 Density curves for the chi-square distributions with 1, 4, and 8 degrees of freedom. Chi-square distributions take only positive values and are right-skewed.

Figure 23.6 shows the density curves for three members of the chi-square family of distributions. As the degrees of freedom increase, the density curves become less skewed and larger values become more probable. Table E in the back of the book gives critical values for chi-square distributions. You can use Table E if you do not have software that gives you P-values for a chi-square test.

EXAMPLE 23.6 Using the chi-square table

The two-way table of 5 outcomes by 2 countries for the quality-of-life study has 5 rows and 2 columns. That is, r = 5 and c = 2. The chi-square statistic therefore has degrees of freedom

$$(r - 1)(c - 1) = (5 - 1)(2 - 1) = (4)(1) = 4$$

Three of the outputs in Figure 23.2 give 4 as the degrees of freedom. The observed value of the chi-square statistic is $X^2 = 11.725$. Look in the df = 4 row of Table E. The value $X^2 = 11.725$ falls between the 0.02 and 0.01 critical values of the chi-square distribution with 4 degrees of freedom. Remember that the chi-square test is always one-sided. So the P-value of $X^2 = 11.725$ is between 0.02 and 0.01. The outputs in Figure 23.2 show that the P-value is 0.0195, close to 0.02.

Table E excerpt, df = 4:

p       .02     .01
x*    11.67   13.28

We know that all z and t statistics measure the size of an effect in the standard scale centered at zero.
We can roughly assess the size of any z or t statistic by the 68–95–99.7 rule, though this is exact only for z. The chi-square statistic does not have any such natural interpretation. But here is a helpful fact: the mean of any chi-square distribution is equal to its degrees of freedom. In Example 23.6, $X^2$ would have mean 4 if the null hypothesis were true. The observed value $X^2 = 11.725$ is so much larger than 4 that we suspect it is significant even before we look at Table E.

APPLY YOUR KNOWLEDGE

23.12 Attitudes toward recycled products. The CrunchIt! output in Figure 23.4 gives 2 degrees of freedom for the table in Exercise 23.2.
(a) Verify that this is correct.
(b) The computer gives the value of the chi-square statistic as $X^2 = 7.638$. Between what two entries in Table E does this value lie? What does the table tell you about the P-value?
(c) What is the mean value of the statistic $X^2$ if the null hypothesis is true? How does the observed value of $X^2$ compare with this mean?

23.13 Smoking among French men. The Minitab output in Figure 23.3 gives the degrees of freedom for the table of education and smoking status as DF = 6.
(a) Show that this is correct for a table with 3 rows and 4 columns.
(b) Minitab gives the chi-square statistic as Chi-Square 13.305. Between which two entries in Table E does this value lie? Verify that Minitab’s result P-Value = 0.038 lies between the tail areas for these values.

The chi-square test and the z test∗

One use of the chi-square test is to compare the proportions of successes in any number of groups. If the r rows of the two-way table are r groups and the columns are “success” and “failure,” the counts form an r × 2 table. P-values come from the chi-square distribution with r − 1 degrees of freedom. If r = 2, we are comparing just two proportions.
We now have two ways to do this: the z test from Chapter 21 and the chi-square test with 1 degree of freedom for a 2 × 2 table. These two tests always agree. In fact, the chi-square statistic $X^2$ is just the square of the z statistic, and the P-value for $X^2$ is exactly the same as the two-sided P-value for z. We recommend using the z test to compare two proportions because it gives you the choice of a one-sided test and is related to a confidence interval for the difference $p_1 - p_2$.

∗ The remainder of the material in this chapter is optional.

APPLY YOUR KNOWLEDGE

23.14 Treating ulcers. Gastric freezing was once a recommended treatment for ulcers in the upper intestine. Use of gastric freezing stopped after experiments showed it had no effect. One randomized comparative experiment found that 28 of the 82 gastric-freezing patients improved, while 30 of the 78 patients in the placebo group improved.8 We can test the hypothesis of “no difference” between the two groups in two ways: using the two-sample z statistic or using the chi-square statistic.
(a) Check the conditions required for both tests, given in the boxes on pages 521 and 560. The conditions are very similar, as they ought to be.
(b) State the null hypothesis with a two-sided alternative and carry out the z test. What is the P-value, exactly from software or approximately from the bottom row of Table C?
(c) Present the data in a 2 × 2 table. Use the chi-square test to test the hypothesis from (b). Verify that the $X^2$ statistic is the square of the z statistic. Use software or Table E to verify that the chi-square P-value agrees with the z result (up to the accuracy of the tables if you do not use software).
(d) What do you conclude about the effectiveness of gastric freezing as a treatment for ulcers?
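The agreement between the z test and the chi-square test for a 2 × 2 table can be checked numerically. The sketch below is a minimal illustration under stated assumptions: the counts (30 successes in 100 and 45 successes in 100) are hypothetical, chosen only for illustration, and the z statistic is assumed to be the pooled two-sample statistic. The function names are made up for this example.

```python
import math


def two_proportion_z(s1, n1, s2, n2):
    """Pooled two-sample z statistic for comparing two proportions."""
    pooled = (s1 + s2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (s1 / n1 - s2 / n2) / se


def chi_square_2x2(s1, n1, s2, n2):
    """Chi-square statistic for the 2 x 2 table of successes and failures."""
    table = [[s1, n1 - s1], [s2, n2 - s2]]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts, for illustration only
z = two_proportion_z(30, 100, 45, 100)
x2_stat = chi_square_2x2(30, 100, 45, 100)

# Two-sided normal P-value, and upper-tail chi-square P-value with 1 df
p_from_z = math.erfc(abs(z) / math.sqrt(2))
p_from_x2 = math.erfc(math.sqrt(x2_stat / 2))

print(math.isclose(z ** 2, x2_stat), math.isclose(p_from_z, p_from_x2))
```

Both comparisons should come out true up to floating-point rounding. The sign of z, which is what allows a one-sided test, is lost when the statistic is squared.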
The chi-square test for goodness of fit∗

The most common and most important use of the chi-square statistic is to test the hypothesis that there is no relationship between two categorical variables. A variation of the statistic can be used to test a different kind of null hypothesis: that a categorical variable has a specified distribution. Here is an example that illustrates this use of chi-square.

More chi-square tests
There are other chi-square tests for hypotheses more specific than “no relationship.” A sociologist places people in classes by social status, waits ten years, then classifies the same people again. The row and column variables are the classes at the two times. She might test the hypothesis that there has been no change in the overall distribution of social status in the group. Or she might ask if moves up in status are balanced by matching moves down. These and other null hypotheses can be tested by variations of the chi-square test.

EXAMPLE 23.7 Never on Sunday?

Births are not evenly distributed across the days of the week. Fewer babies are born on Saturday and Sunday than on other days, probably because doctors find weekend births inconvenient. Exercise 1.4 (page 10) gives national data that demonstrate this fact.

A random sample of 140 births from local records shows this distribution across the days of the week:

Day      Sun.   Mon.   Tue.   Wed.   Thu.   Fri.   Sat.
Births     13     23     24     20     27     18     15

Sure enough, the two smallest counts of births are on Saturday and Sunday. Do these data give significant evidence that local births are not equally likely on all days of the week?

The chi-square test answers the question of Example 23.7 by comparing observed counts with expected counts under the null hypothesis. The null hypothesis for births says that they are evenly distributed.
To state the hypotheses carefully, write the discrete probability distribution for days of birth:

Day           Sun.    Mon.    Tue.    Wed.    Thu.    Fri.    Sat.
Probability   $p_1$   $p_2$   $p_3$   $p_4$   $p_5$   $p_6$   $p_7$

The null hypothesis says that the probabilities are the same on all days. In that case, all 7 probabilities must be 1/7. So the null hypothesis is

$$H_0\colon p_1 = p_2 = p_3 = p_4 = p_5 = p_6 = p_7 = \frac{1}{7}$$

The alternative hypothesis says that days are not all equally probable:

$$H_a\colon \text{not all } p_i = \frac{1}{7}$$

As usual in chi-square tests, $H_a$ is a “many-sided” hypothesis that simply says that $H_0$ is not true. The chi-square statistic is also as usual:

$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

The expected count for an outcome with probability p is np, as we saw in the discussion following Example 23.2. Under the null hypothesis, all the probabilities $p_i$ are the same, so all 7 expected counts are equal to

$$n p_i = 140 \times \frac{1}{7} = 20$$

These expected counts easily satisfy our guideline for using chi-square. The chi-square statistic is

$$X^2 = \sum \frac{(\text{observed count} - 20)^2}{20} = \frac{(13 - 20)^2}{20} + \frac{(23 - 20)^2}{20} + \cdots + \frac{(15 - 20)^2}{20} = 7.6$$

This new use of $X^2$ requires different degrees of freedom. To find the P-value, compare $X^2$ with critical values from the chi-square distribution with degrees of freedom one less than the number of values the birth day can take. That’s 7 − 1 = 6 degrees of freedom. From Table E, we see that $X^2 = 7.6$ is smaller than the smallest entry in the df = 6 row, which is the critical value for tail area 0.25. The P-value is therefore greater than 0.25 (software gives the more exact value P = 0.269). These 140 births don’t give convincing evidence that births are not equally likely on all days of the week.

The chi-square test applied to the hypothesis that a categorical variable has a specified distribution is called the test for goodness of fit.
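The arithmetic of Example 23.7 can be reproduced in a few lines. This is a minimal sketch, not the method used by any particular software: the function names are illustrative, and the P-value routine relies on a closed form for the upper-tail area that holds only when the degrees of freedom are even, which happens to cover this example (df = 6).

```python
import math


def goodness_of_fit(observed, probs):
    """Chi-square statistic and degrees of freedom for a test of fit."""
    n = sum(observed)
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


def upper_tail_even_df(x, df):
    """Area to the right of x under the chi-square density; EVEN df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # builds (x/2)^k / k!
        total += term
    return math.exp(-half) * total


births = [13, 23, 24, 20, 27, 18, 15]          # Sun. through Sat.
stat, df = goodness_of_fit(births, [1 / 7] * 7)
p_value = upper_tail_even_df(stat, df)
print(round(stat, 1), df, round(p_value, 3))   # prints: 7.6 6 0.269
```

The same tail-area function applied to $X^2 = 11.725$ with 4 degrees of freedom gives about 0.0195, the P-value reported for Example 23.6.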
The idea is that the test assesses whether the observed counts “fit” the distribution. The only differences between the test of fit and the test for a two-way table are that the expected counts are based on the distribution specified by the null hypothesis and that the degrees of freedom are one less than the number of possible outcomes in this distribution. Here are the details.

Table E excerpt, df = 6 (for Example 23.7):

p      .25    .20
x*    7.84   8.56

THE CHI-SQUARE TEST FOR GOODNESS OF FIT

A categorical variable has k possible outcomes, with probabilities $p_1, p_2, p_3, \dots, p_k$. That is, $p_i$ is the probability of the ith outcome. We have n independent observations from this categorical variable.

To test the null hypothesis that the probabilities have specified values

$$H_0\colon p_1 = p_{10},\ p_2 = p_{20},\ \dots,\ p_k = p_{k0}$$

use the chi-square statistic

$$X^2 = \sum \frac{(\text{count of outcome } i - n p_{i0})^2}{n p_{i0}}$$

The P-value is the area to the right of $X^2$ under the density curve of the chi-square distribution with k − 1 degrees of freedom.

In Example 23.7, the outcomes are days of the week, with k = 7. The null hypothesis says that the probability of a birth on the ith day is $p_{i0} = 1/7$ for all days. We observe n = 140 births and count how many fall on each day. These are the counts used in the chi-square statistic.

APPLY YOUR KNOWLEDGE

23.15 Saving birds from windows. Many birds are injured or killed by flying into windows. It appears that birds don’t see windows. Can tilting windows down so that they reflect earth rather than sky reduce bird strikes? Place six windows at the edge of a woods: two vertical, two tilted 20 degrees, and two tilted 40 degrees.
During the next four months, there were 53 bird strikes, 31 on the vertical window, 14 on the 20-degree window, and 8 on the 40-degree window.9 If the tilt has no effect, we expect strikes on all three windows to have equal probability. Test this null hypothesis. What do you conclude?

23.16 More on birth days. Births really are not evenly distributed across the days of the week. The data in Example 23.7 failed to reject the null hypothesis that all days are equally likely because of random variation in a quite small number of births. Here are data on 700 births in the same locale:

Day      Sun.   Mon.   Tue.   Wed.   Thu.   Fri.   Sat.
Births     84    110    124    104     94    112     72

(a) The null hypothesis is that all days are equally probable. What are the probabilities specified by this null hypothesis? What are the expected counts for each day in 700 births?
(b) Calculate the chi-square statistic for goodness of fit.
(c) What are the degrees of freedom for this statistic? Do these 700 births give significant evidence that births are not equally probable on all days of the week?

23.17 Course grades. Most students in a large statistics course are taught by teaching assistants (TAs). One section is taught by the course supervisor, a senior professor. The distribution of grades for the hundreds of students taught by TAs this semester was

Grade            A      B      C    D/F
Probability   0.32   0.41   0.20   0.07

The grades assigned by the professor to students in his section were

Grade    A    B    C   D/F
Count   22   38   20    11

(These data are real. We won’t say when and where, but the professor was not the author of this book.)
(a) What percents of each grade did students in the professor’s section earn? In what ways does this distribution of grades differ from the TA distribution?
(b) Because the TA distribution is based on hundreds of students, we are willing to regard it as a fixed probability distribution.
If the professor’s grading follows this distribution, what are the expected counts of each grade in his section?
(c) Does the chi-square test for goodness of fit give good evidence that the professor’s grades follow a different distribution? (State hypotheses, check the guideline for using chi-square, give the test statistic and its P-value, and state your conclusion.)

23.18 What’s your sign? The University of Chicago’s General Social Survey (GSS) is the nation’s most important social science sample survey. For reasons known only to social scientists, the GSS regularly asks its subjects their astrological sign. Here are the counts of responses in the most recent year this question was asked:10

Sign      Count     Sign          Count
Aries       225     Libra           243
Taurus      222     Scorpio         214
Gemini      241     Sagittarius     200
Cancer      240     Capricorn       216
Leo         260     Aquarius        224
Virgo       250     Pisces          244

If births are spread uniformly across the year, we expect all 12 signs to be equally likely. Are they? Follow the four-step process in your answer.

CHAPTER 23 SUMMARY

The chi-square test for a two-way table tests the null hypothesis $H_0$ that there is no relationship between the row variable and the column variable. The alternative hypothesis $H_a$ says that there is some relationship but does not say what kind.

The test compares the observed counts of observations in the cells of the table with the counts that would be expected if $H_0$ were true. The expected count in any cell is

$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$

The chi-square statistic is

$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

The chi-square test compares the value of the statistic $X^2$ with critical values from the chi-square distribution with (r − 1)(c − 1) degrees of freedom.
Large values of $X^2$ are evidence against $H_0$, so the P-value is the area under the chi-square density curve to the right of $X^2$.

The chi-square distribution is an approximation to the distribution of the statistic $X^2$. You can safely use this approximation when all expected cell counts are at least 1 and no more than 20% are less than 5.

If the chi-square test finds a statistically significant relationship between the row and column variables in a two-way table, do data analysis to describe the nature of the relationship. You can do this by comparing well-chosen percents, comparing the observed counts with the expected counts, and looking for the largest terms of the chi-square statistic.

STATISTICS IN SUMMARY

Here are the most important skills you should have acquired from reading this chapter.

A. TWO-WAY TABLES
1. Understand that the data for a chi-square test must be presented as a two-way table of counts of outcomes.
2. Use percents to describe the relationship between any two categorical variables, starting from the counts in a two-way table.

B. INTERPRETING CHI-SQUARE TESTS
1. Locate the chi-square statistic, its P-value, and other useful facts (row or column percents, expected counts, terms of chi-square) in output from your software or calculator.
2. Use the expected counts to check whether you can safely use the chi-square test.
3. Explain what null hypothesis the chi-square statistic tests in a specific two-way table.
4. If the test is significant, compare percents, compare observed with expected cell counts, or look for the largest terms of the chi-square statistic to see what deviations from the null hypothesis are most important.

C. DOING CHI-SQUARE TESTS BY HAND
1. Calculate the expected count for any cell from the observed counts in a two-way table. Check whether you can safely use the chi-square test.
2.
Calculate the term of the chi-square statistic for any cell, as well as the overall statistic.
3. Give the degrees of freedom of a chi-square statistic. Make a quick assessment of the significance of the statistic by comparing the observed value with the degrees of freedom.
4. Use the chi-square critical values in Table E to approximate the P-value of a chi-square test.

CHECK YOUR SKILLS

The National Survey of Adolescent Health interviewed several thousand teens (grades 7 to 12). One question asked was “What do you think are the chances you will be married in the next ten years?” Here is a two-way table of the responses by sex:11

                                 Female   Male
Almost no chance                    119    103
Some chance, but probably not       150    171
A 50-50 chance                      447    512
A good chance                       735    710
Almost certain                     1174    756

23.19 The number of female teenagers in the sample is
(a) 4877. (b) 2625. (c) 2252.

23.20 The percent of the females in the sample who responded “almost certain” is about
(a) 44.7%. (b) 39.6%. (c) 33.6%.

23.21 The percent of the females in the sample who responded “almost certain” is
(a) higher than the percent of males who felt this way.
(b) about the same as the percent of males who felt this way.
(c) lower than the percent of males who felt this way.

23.22 The expected count of females who respond “almost certain” is about
(a) 464.6. (b) 891.2. (c) 1038.8.

23.23 The term in the chi-square statistic for the cell of females who respond “almost certain” is about
(a) 17.6. (b) 15.6. (c) 0.1.

23.24 The degrees of freedom for the chi-square test for this two-way table are
(a) 4. (b) 8. (c) 20.

23.25 The null hypothesis for the chi-square test for this two-way table is
(a) Equal proportions of female and male teenagers are almost certain they will be married in ten years.
(b) There is no difference between female and male teenagers in their distributions of opinions about marriage.
(c) There are equal numbers of female and male teenagers.

23.26 The alternative hypothesis for the chi-square test for this two-way table is
(a) Female and male teenagers do not have the same distribution of opinions about marriage.
(b) Female teenagers are more likely than male teenagers to think it is almost certain they will be married in ten years.
(c) Female teenagers are less likely than male teenagers to think it is almost certain they will be married in ten years.

23.27 Software gives chi-square statistic $X^2 = 69.8$ for this table. From the table of critical values, we can say that the P-value is
(a) between 0.0025 and 0.001.
(b) between 0.001 and 0.0005.
(c) less than 0.0005.

23.28 The most important fact that allows us to trust the results of the chi-square test is that
(a) the sample is large, 4877 teenagers in all.
(b) the sample is close to an SRS of all teenagers.
(c) all of the cell counts are greater than 100.

CHAPTER 23 EXERCISES

If you have access to software or a graphing calculator, use it to speed your analysis of the data in these exercises. Exercises 23.29 to 23.38 are suitable for hand calculation if necessary.

23.29 Who’s online? A sample survey by the Pew Internet and American Life Project asked a random sample of adults about use of the Internet and about the type of community they lived in. Here, repeated from Exercise 23.4, is the two-way table:

                        Community Type
                   Rural   Suburban   Urban
Internet users       433       1072     536
Nonusers             463        627     388

(a) Give a 95% confidence interval for the difference between the proportions of rural and suburban adults who use the Internet.
(b) What is the overall pattern of the relationship between Internet use and community type? Is the relationship statistically significant?
23.30 Child care workers. A large study of child care used samples from the data tapes of the Current Population Survey over a period of several years. The result is close to an SRS of child care workers. The Current Population Survey has three classes of child care workers: private household, nonhousehold, and preschool teacher. Here are data on the number of blacks among women workers in these three classes:12

               Total   Black
Household       2455     172
Nonhousehold    1191     167
Teachers         659      86

(a) What percent of each class of child care workers is black?
(b) Make a two-way table of class of worker by race (black or other).
(c) Can we safely use the chi-square test? What null and alternative hypotheses does $X^2$ test?
(d) The chi-square statistic for this table is $X^2 = 53.194$. What are its degrees of freedom? What is the mean of $X^2$ if the null hypothesis is true? Use Table E to approximate the P-value of the test.
(e) What do you conclude from these data?

23.31 Free speech for racists? The General Social Survey (GSS) for 2002 asked this question: “Consider a person who believes that Blacks are genetically inferior. If such a person wanted to make a speech in your community claiming that Blacks are inferior, should he be allowed to speak, or not?” Here are the responses, broken down by the race of the respondent:13

               Black   White   Other
Allowed           67     476      35
Not allowed       53     252      17

(a) Because the GSS is essentially an SRS of all adults, we can combine the races in these data and give a 99% confidence interval for the proportion of all adults who would allow a racist to speak. Do this.
(b) Find the column percents and use them to compare the attitudes of the three racial groups. How significant are the differences found in the sample?

23.32 Do you use cocaine? Sample surveys on sensitive issues can give different results depending on how the question is asked. A University of Wisconsin study divided 2400 respondents into 3 groups at random. All were asked if they had ever used cocaine.
One group of 800 was interviewed by phone; 21% said they had used cocaine. Another 800 people were asked the question in a one-on-one personal interview; 25% said “Yes.” The remaining 800 were allowed to make an anonymous written response; 28% said “Yes.”14 Are there statistically significant differences among these proportions? State the hypotheses, convert the information given into a two-way table of counts, give the test statistic and its P-value, and state your conclusions.

23.33 Ethnicity and seat belt use. How does seat belt use vary with drivers’ race or ethnic group? The answer depends on gender (males are less likely to buckle up) and also on location. Here are data on a random sample of male drivers observed in Houston:15

            Drivers   Belted
Black           369      273
Hispanic        540      372
White           257      193

(a) The table gives the number of drivers in each group and the number of these who were wearing seat belts. Make a two-way table of group by belted or not.
(b) Are there statistically significant differences in seat belt use among men in these three groups? If there are, describe the differences.

23.34 Did the randomization work? After randomly assigning subjects to treatments in a randomized comparative experiment, we can compare the treatment groups to see how well the randomization worked. We hope to find no significant differences among the groups. A study of how to provide premature infants with a substance essential to their development assigned infants at random to receive one of four types of supplement, called PBM, NLCP, PL-LCP, and TG-LCP.16
(a) The subjects were 77 premature infants. Outline the design of the experiment if 20 are assigned to the PBM group and 19 to each of the other treatments.
(b) The random assignment resulted in 9 females in the TG-LCP group and 11 females in each of the other groups. Make a two-way table of group by gender and do a chi-square test to see if there are significant differences among the groups. What do you find?

23.35 Opinions about the death penalty. “Do you favor or oppose the death penalty for persons convicted of murder?” When the General Social Survey asked this question in its 2002 survey, the responses of people whose highest education was a bachelor’s degree and of people with a graduate degree were as follows:17

            Favor   Oppose
Bachelor      135       71
Graduate       64       50

(a) Is there evidence that the proportions of all people at these levels of education who favor the death penalty differ? Find the two sample proportions, the z statistic, and its P-value.
(b) Is there evidence that the opinions of all people at these levels of education differ? Find the chi-square statistic $X^2$ and its P-value. If your work is correct, $X^2$ should be the same as $z^2$ and the two P-values should be identical.

23.36 Unhappy rats and tumors. Some people think that the attitude of cancer patients can influence the progress of their disease. We can’t experiment with humans, but here is a rat experiment on this theme. Inject 60 rats with tumor cells and then divide them at random into two groups of 30. All the rats receive electric shocks, but rats in Group 1 can end the shock by pressing a lever. (Rats learn this sort of thing quickly.) The rats in Group 2 cannot control the shocks, which presumably makes them feel helpless and unhappy. We suspect that the rats in Group 1 will develop fewer tumors. The results: 11 of the Group 1 rats and 22 of the Group 2 rats developed tumors.18
(a) State the null and alternative hypotheses for this investigation.
Explain why the z test rather than the chi-square test for a 2 × 2 table is the proper test.
(b) Carry out the test and report your conclusion.

23.37 Regulating guns. The National Gun Policy Survey, conducted by the National Opinion Research Center at the University of Chicago, asked a random sample of adults many questions about regulation of guns in the United States. One of the questions was “Do you think there should be a law that would ban possession of handguns except for the police and other authorized persons?” Figure 23.7 displays Minitab output that includes the two-way table of response versus the respondents’ highest level of education.19

        Less than      High school   Some       College     Postgraduate
        high school    graduate      college    graduate    degree          All

Yes        58             84           169         98           77           486
           50.00          39.44        36.50       42.06        43.75        40.47
           2.6055         0.0558       1.7989      0.1463       0.4690       *

No         58             129          294         135          99           715
           50.00          60.56        63.50       57.94        56.25        59.53
           1.7710         0.0379       1.2228      0.0994       0.3188       *

All        116            213          463         233          176          1201
           100.00         100.00       100.00      100.00       100.00       100.00
           *              *            *           *            *            *

Cell Contents: Count
               % of Column
               Contribution to Chi-square

Pearson Chi-Square = 8.525, DF = 4, P-Value = 0.074

FIGURE 23.7 Minitab output for the sample survey responses of Exercise 23.37.

(a) The column percents show the breakdown of responses separately for each level of education. Which education groups show particularly high and low support for the proposed law? Which education group’s responses contribute the most to the size of the chi-square statistic? Is there a consistent direction in the relationship, such as “people with more education are more likely to support strong gun laws”?
(b) Verify the degrees of freedom given by Minitab. How does the value of the chi-square statistic compare with its mean under the null hypothesis?
What do you conclude from the chi-square test?

23.38 I think I’ll be rich by age 30. A sample survey of young adults (aged 19 to 25) asked, “What do you think are the chances you will have much more than a middle-class income at age 30?” The CrunchIt! output in Figure 23.8 shows the two-way table and related information, omitting a few subjects who refused to respond or who said they were already rich.20 Use the output as the basis for a discussion of the differences between young men and young women in assessing their chances of being rich by age 30.

Cell format: Count (Column percent)

                          Male              Female            Total
Almost no chance          98 (3.985%)       96 (4.056%)       194 (4.02%)
Some, but probably not    286 (11.63%)      426 (18%)         712 (14.75%)
A 50-50 chance            720 (29.28%)      696 (29.4%)       1416 (29.34%)
A good chance             758 (30.83%)      663 (28.01%)      1421 (29.44%)
Almost certain            597 (24.28%)      486 (20.53%)      1083 (22.44%)
Total                     2459 (100.00%)    2367 (100.00%)    4826 (100.00%)

Statistic     DF   Value       P-Value
Chi-square    4    43.94552    <0.0001

FIGURE 23.8 CrunchIt! output for the sample survey responses of Exercise 23.38.

The remaining exercises concern larger tables that require software for easy analysis. Follow the Formulate, Solve, and Conclude steps of the four-part process in your answers to these exercises. It may be helpful to restate in your own words the State information given in the exercise.

23.39 Students and catalog shopping. What is the most important reason that students buy from catalogs? The answer may differ for different groups of students. Here are results for samples of American and East Asian students at a large midwestern university:21

                         American   Asian
Save time                      29      10
Easy                           28      11
Low price                      17      34
Live far from stores           11       4
No pressure to buy             10       3
Other reason                   20       7
Total                         115      69

Describe the most important differences between American and Asian students.
Is there a significant overall difference between the two distributions of responses?

23.40 Where do young adults live? A survey by the National Institutes of Health asked a random sample of young adults (aged 19 to 25), “Where do you live now? That is, where do you stay most often?” We earlier (page 513) compared the proportions of men and women who lived with their parents. Here now is the full two-way table (omitting a few who refused to answer and one who claimed to be homeless):22

                          Female   Male
Parents’ home               923     986
Another person’s home       144     132
Own place                  1294    1129
Group quarters              127     119

What are the most important differences between young men and women? Are their choices of living places significantly different?

23.41 How are schools doing? The nonprofit group Public Agenda conducted telephone interviews with a stratified sample of parents of high school children. There were 202 black parents, 202 Hispanic parents, and 201 white parents. One question asked was “Are the high schools in your state doing an excellent, good, fair or poor job, or don’t you know enough to say?” Here are the survey results:23

              Black parents   Hispanic parents   White parents
Excellent          12               34                22
Good               69               55                81
Fair               75               61                60
Poor               24               24                24
Don’t know         22               28                14
Total             202              202               201

Are the differences in the distributions of responses for the three groups of parents statistically significant? What departures from the null hypothesis “no relationship between group and response” contribute most to the value of the chi-square statistic? Write a brief conclusion based on your analysis.

23.42 The Mediterranean diet. Cancer of the colon and rectum is less common in the Mediterranean region than in other Western countries. The Mediterranean diet contains little animal fat and lots of olive oil.
Italian researchers compared 1953 patients with colon or rectal cancer with a control group of 4154 patients admitted to the same hospitals for unrelated reasons. They estimated consumption of various foods from a detailed interview, then divided the patients into three groups according to their consumption of olive oil. Here are some of the data:24

                          Olive oil
                 Low   Medium   High   Total
Colon cancer     398     397     430    1225
Rectal cancer    250     241     237     728
Controls        1368    1377    1409    4154

(a) Is this study an experiment? Explain your answer.
(b) The investigators report that “less than 4% of cases or controls refused to participate.” Why does this fact strengthen our confidence in the results?
(c) The researchers conjectured that high olive oil consumption would be more common among patients without cancer than among patients with colon cancer or rectal cancer. What do the data say?

23.43 Market research. Before bringing a new product to market, firms carry out extensive studies to learn how consumers react to the product and how best to advertise its advantages. Here are data from a study of a new laundry detergent.25 The subjects are people who don’t currently use the established brand that the new product will compete with. Give subjects free samples of both detergents. After they have tried both for a while, ask which they prefer. The answers may depend on other facts about how people do laundry.

                                          Laundry practices
                          Soft water,   Soft water,   Hard water,   Hard water,
                          warm wash     hot wash      warm wash     hot wash
Prefer standard product       53            27            42            30
Prefer new product            63            29            68            42

How do laundry practices (water hardness and wash temperature) influence the choice of detergent? In which settings does the new detergent do best? Are the differences between the detergents statistically significant?

Support for political parties.
Political parties anxiously ask what groups of people support them. The General Social Survey (GSS) asked its 2002 sample, “Generally speaking, do you usually think of yourself as a Republican, Democrat, Independent, or what?” Here is a large two-way table breaking down the responses by age group:26

                                        Age group
                               18–30   31–40   41–55   56–89
Strong Democrat                  60      83     113     151
Not strong Democrat              99     126     138     148
Independent, near Democrat       72      56      77      62
Independent                     152     124     149     102
Independent, near Republican     53      41      50      54
Not strong Republican            90      85     133     138
Strong Republican                42      56      89     127
Other party                       9      12      14      13

Exercises 23.44 to 23.46 are based on this table.

23.44 Other parties. The GSS is essentially an SRS of American adults. Give a 95% confidence interval for the proportion of adults who support “other parties.”

23.45 Party support. Make a 2 × 4 table by combining the counts in the three rows that mention Democrat and in the three rows that mention Republican and ignoring strict independents and supporters of other parties. We might think of this table as comparing all adults who lean Democrat and all adults who lean Republican. How does support of the two major parties differ among age groups?

23.46 Politics and age. Use the full table to analyze the differences in political party support among age groups. The sample is so large that the differences are bound to be highly significant, but give the chi-square statistic and its P-value nonetheless. The main challenge is in seeing what the data say. Does the full table yield any insights not found in the compressed table you analyzed in the previous exercise?

EESEE CASE STUDIES

The Electronic Encyclopedia of Statistical Examples and Exercises (EESEE) is available on the text CD and Web site.
These more elaborate stories, with data, provide settings for longer case studies. Here are some suggestions for EESEE stories that apply the chi-square test.

23.47 Read the EESEE story “Surgery in a Blanket.” Write a report that answers Questions 1, 3, 5, 6, and 7 for this case study.

23.48 Read the EESEE story “Trilobite Bites.” Write a report that answers Questions 1, 2, 4, and 5 for this case study.
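
Several of the exercises above rely on software output. As a check on such output, the chi-square computation can also be scripted directly. The following is a minimal sketch in Python, not part of the original text, that recomputes the quantities shown in Figure 23.7 (Exercise 23.37) from the counts alone: the expected counts, each cell's contribution to chi-square, the statistic, and the degrees of freedom. The function names are illustrative choices. The P-value helper uses a closed form for the upper tail of the chi-square distribution that holds only when the degrees of freedom are even, which happens to be the case here (DF = 4); a statistics library would be needed for the general case.

```python
import math

# Counts from Figure 23.7 (Exercise 23.37).
# Rows: Yes, No. Columns: the five education levels, lowest to highest.
observed = [
    [58, 84, 169, 98, 77],
    [58, 129, 294, 135, 99],
]


def chi_square(table):
    """Return (statistic, df, contributions) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    contributions = []
    for i, row in enumerate(table):
        contrib_row = []
        for j, count in enumerate(row):
            # Expected count under "no relationship": row total x column total / n
            expected = row_totals[i] * col_totals[j] / n
            contrib_row.append((count - expected) ** 2 / expected)
        contributions.append(contrib_row)
    statistic = sum(sum(r) for r in contributions)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, contributions


def upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


statistic, df, contributions = chi_square(observed)
p_value = upper_tail_even_df(statistic, df)

print(f"chi-square = {statistic:.3f}, df = {df}, P-value = {p_value:.3f}")
for label, row in zip(["Yes", "No"], contributions):
    print(label, [round(c, 4) for c in row])
```

The results should agree with the Minitab output in Figure 23.7 up to rounding. The same `chi_square` function accepts any of the two-way tables in the exercises above, though the P-value helper applies only when (r − 1)(c − 1) is even.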