The chi-square test - Yosemite Community College District

GTBL011-23
P2: PBU/OVY
QC: PBU/OVY
GTBL011-Moore-v18.cls
T1: PBU
June 21, 2006
21:6
CHAPTER
Lisl Dennis/Getty Images
P1: PBU/OVY
Two Categorical Variables:
The Chi-Square Test
The two-sample z procedures of Chapter 21 allow us to compare the proportions
of successes in two groups, either two populations or two treatment groups in an
experiment. In the first example in Chapter 21 (page 513), we compared young
men and young women by looking at whether or not they lived with their parents. That is, we looked at a relationship between two categorical variables, gender
(female or male) and “Where do you live?” (with parents or not). In fact, the data
include three more outcomes for “Where do you live?”: in another person’s home,
in your own place, and in group quarters such as a dormitory. When there are more
than two outcomes, or when we want to compare more than two groups, we need a
new statistical test. The new test addresses a general question: is there a relationship
between two categorical variables?
23
In this chapter we cover...
Two-way tables
The problem of multiple comparisons
Expected counts in two-way tables
The chi-square test
Using technology
Cell counts required for the chi-square test
Uses of the chi-square test
The chi-square distributions
The chi-square test and the z test∗
The chi-square test for goodness of fit∗
Two-way tables
We saw in Chapter 6 that we can present data on two categorical variables in a
two-way table of counts. That’s our starting point. Here is an example.
EXAMPLE 23.1
Health care: Canada and the United States
Canada has universal health care. The United States does not, but often offers more
elaborate treatment to patients with access. How do the two systems compare in
treating heart attacks? A comparison of random samples of 2600 U.S. and 400 Canadian
heart attack patients found that “the Canadian patients typically stayed in the hospital
one day longer (P = 0.009) than the U.S. patients but had a much lower rate of cardiac
catheterization (25 percent vs. 72 percent, P < 0.001), coronary angioplasty (11 percent vs. 29 percent, P < 0.001), and coronary bypass surgery (3 percent vs. 14 percent,
P < 0.001).”1
The study then looked at many outcomes a year after the heart attack. There was
no significant difference in the patients’ survival rate. Another key outcome was the
patients’ own assessment of their quality of life relative to what it had been before the
heart attack. Here are the data for the patients who survived a year:
Quality of life     Canada   United States
Much better             75             541
Somewhat better         71             498
About the same          96             779
Somewhat worse          50             282
Much worse              19              65
Total                  311            2165
The two-way table in Example 23.1 shows the relationship between two categorical variables. The explanatory variable is the patient’s country, Canada or the
United States. The response variable is quality of life a year after a heart attack,
with 5 categories. The two-way table gives the counts for all 10 combinations of
values of these variables. Each of the 10 counts occupies a cell of the table.
It is hard to compare the counts because the U.S. sample is much larger. Here
are the percents of each sample with each outcome:
Quality of life     Canada   United States
Much better            24%             25%
Somewhat better        23%             23%
About the same         31%             36%
Somewhat worse         16%             13%
Much worse              6%              3%
Total                 100%            100%
In the language of Chapter 6 (page 153), these are the conditional distributions of
outcomes, given the patients’ nationality. The differences are not large, but slightly
higher percents of Canadians thought their quality of life was “somewhat worse”
or “much worse.” Figure 23.1 compares the two distributions. We want to know if
there is a significant difference between the two distributions of outcomes.
[Bar graph: percent of patients (vertical axis, 0 to 40) reporting each outcome, from much better to much worse, with paired bars for Canada and the United States. A callout notes that the “somewhat worse” bars compare 16% in Canada with 13% in the United States.]
FIGURE 23.1 Bar graph comparing quality of life a year after a heart attack in
Canada and the United States, for Example 23.1.
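The conditional distributions above can be reproduced with a few lines of code. The following is a minimal Python sketch, not part of the original study; the names `counts` and `conditional_distribution` are illustrative choices, and the counts are copied from the two-way table in Example 23.1.

```python
# Observed counts from Example 23.1 (patients who survived a year).
outcomes = ["Much better", "Somewhat better", "About the same",
            "Somewhat worse", "Much worse"]
counts = {
    "Canada": [75, 71, 96, 50, 19],
    "United States": [541, 498, 779, 282, 65],
}

def conditional_distribution(column):
    """Return each count as a percent of its column (sample) total."""
    total = sum(column)
    return [100 * c / total for c in column]

# One conditional distribution of outcomes per country.
percents = {country: conditional_distribution(col)
            for country, col in counts.items()}

for i, outcome in enumerate(outcomes):
    row = "  ".join(f"{percents[country][i]:5.1f}%" for country in counts)
    print(f"{outcome:<16} {row}")
```

Dividing by the column total rather than the table total is what makes these conditional distributions: each country's percents add to 100%, so samples of very different sizes can be compared directly.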
APPLY YOUR KNOWLEDGE
23.1 Smoking among French men. Smoking remains more common in much of
Europe than in the United States. In the United States, there is a strong
relationship between education and smoking: well-educated people are less likely
to smoke. Does a similar relationship hold in France? Here is a two-way table of
the level of education and smoking status (nonsmoker, former smoker, moderate
smoker, heavy smoker) of a sample of 459 French men aged 20 to 60 years.2 The
subjects are a random sample of men who visited a health center for a routine
checkup. We are willing to consider them an SRS of men from their region of
France.
                              Smoking Status
Education           Nonsmoker   Former   Moderate   Heavy
Primary school             56       54         41      36
Secondary school           37       43         27      32
University                 53       28         36      16
(a) What percent of men with a primary school education are nonsmokers?
Former smokers? Moderate smokers? Heavy smokers? These percents should add
to 100% (up to roundoff error). They form the conditional distribution of
smoking, given a primary education.
(b) In a similar way, find the conditional distributions of smoking among men
with a secondary education and among men with a university education. Make a
table that presents the three conditional distributions. Be sure to include a “Total”
column showing that each row adds to 100%.
(c) Compare the three conditional distributions. Is there any clear relationship
between education and smoking?
23.2 Attitudes toward recycled products. Recycling is supposed to save resources.
Some people think recycled products are lower in quality than other products, a
fact that makes recycling less practical. Here are data on attitudes toward coffee
filters made of recycled paper.3
                 Think the quality of
               the recycled product is
              Higher   The same   Lower
Buyers            20          7       9
Nonbuyers         29         25      43
(a) It appears that people who have bought the recycled filters have more
positive opinions than those who have not. Give percents to back up this claim.
Make a bar graph that compares your percents for buyers and nonbuyers.
(b) Association does not prove causation. Explain how buying recycled filters
might improve a person’s opinion of their quality. Then explain how the opinion
a person holds might influence his or her decision to buy or not. You see that the
cause-and-effect relationship might go in either direction.
The problem of multiple comparisons
The null hypothesis in Example 23.1 is that there is no difference between the
distributions of outcomes in Canada and the United States. Put more generally,
the null hypothesis is that there is no relationship between two categorical variables,
H0: there is no relationship between nationality and quality of life
The alternative hypothesis says that there is a relationship but does not specify any
particular kind of relationship,
Ha: there is some relationship between nationality and quality of life
Any difference between the Canadian and American distributions means that the
null hypothesis is false and the alternative hypothesis is true. The alternative hypothesis is not one-sided or two-sided. We might call it “many-sided” because it
allows any kind of difference.
With only the methods we already know, we might start by comparing the
proportions of patients in the two nations with “much better” quality of life, using
the two-sample z test for proportions. We could similarly compare the proportions
with each of the other outcomes: five tests in all, with five P-values. This is a
bad idea. The P-values belong to each test separately, not to the collection of five
tests together. Think of the distinction between the probability that a basketball
player makes a free throw and the probability that she makes all of five free throws.
When we do many individual tests or confidence intervals, the individual P-values and
confidence levels don’t tell us how confident we can be in all of the inferences taken
together.
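The free-throw comparison can be made concrete with a short calculation. The sketch below is illustrative only: the 70% free-throw rate is an assumed figure, and treating five tests as independent is a simplifying assumption that does not hold exactly for five outcomes from the same table (their percents must add to 100%).

```python
def prob_all_succeed(p_each, n):
    """Probability that n independent events, each with probability p_each, all occur."""
    return p_each ** n

# Assumed 70% shooter: one free throw versus all five in a row.
one_free_throw = 0.70
all_five_free_throws = prob_all_succeed(one_free_throw, 5)

# Five tests, each with a 5% chance of a false alarm when H0 is true.
# Under the (assumed) independence, the chance that all five avoid a
# false alarm is 0.95**5, so the chance of at least one is the complement.
all_five_tests_correct = prob_all_succeed(0.95, 5)
at_least_one_false_alarm = 1 - all_five_tests_correct

print(f"One free throw: {one_free_throw:.2f}; all five: {all_five_free_throws:.3f}")
print(f"At least one false alarm in five tests: {at_least_one_false_alarm:.3f}")
```

Under these assumptions the chance of at least one false alarm is well above the 5% attached to any single test, which is the sense in which individual P-values do not describe the collection of tests.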
Because of this, it’s cheating to pick out the largest of the five differences and
then test its significance as if it were the only comparison we had in mind. For
example, the “much worse” proportions in Example 23.1 are significantly different
(P = 0.0047) if we compare just this one outcome. But is it surprising that the most
different proportions among five outcomes differ by this much? That’s a different
question.
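For reference, the single comparison mentioned above can be sketched as a two-sample z test with a pooled standard error, as in Chapter 21. This is a minimal Python illustration using only the standard library; the function name `two_proportion_z_test` is a hypothetical helper, and the counts are the “much worse” cells and sample sizes from Example 23.1.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z test of H0: p1 = p2 using the pooled sample proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail area under the standard Normal curve.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# "Much worse" outcome: 19 of 311 Canadians, 65 of 2165 Americans.
z, p_value = two_proportion_z_test(19, 311, 65, 2165)
print(f"z = {z:.2f}, P = {p_value:.4f}")
```

The result should agree with the P-value quoted in the text up to rounding. It answers only the one-comparison question; it says nothing about how surprising the largest of five differences is.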
The problem of how to do many comparisons at once with an overall measure
of confidence in all our conclusions is common in statistics. This is the problem of
multiple comparisons. Statistical methods for dealing with multiple comparisons
usually have two steps:
1. An overall test to see if there is good evidence of any differences among the
parameters that we want to compare.
2. A detailed follow-up analysis to decide which of the parameters differ and to
estimate how large the differences are.
The overall test, though more complex than the tests we met earlier, is often
reasonably straightforward. The follow-up analysis can be quite elaborate. In our
basic introduction to statistical practice, we will concentrate on the overall test,
along with data analysis that points to the nature of the differences.
APPLY YOUR KNOWLEDGE
23.3 Nonsmokers and education in France. In the setting of Exercise 23.1, consider
only the proportions of nonsmokers in the three populations of men with primary,
secondary, and university education. Do three significance tests of the three null
hypotheses
H0: p_primary = p_secondary
H0: p_primary = p_university
H0: p_secondary = p_university
against the two-sided alternatives. Give P-values for each test. These three
P-values don’t tell us how often the three proportions for the three education
groups will be spread this far apart just by chance.
23.4 Who’s online? A sample survey by the Pew Internet and American Life Project
asked a random sample of adults about use of the Internet and about the type of
community they lived in. Following is the two-way table:4
                        Community Type
                   Rural   Suburban   Urban
Internet users       433       1072     536
Nonusers             463        627     388
(a) Give three 95% confidence intervals, for the percents of adults in rural,
suburban, and urban communities who use the Internet.
(b) Explain clearly why we are not 95% confident that all three of these intervals
capture their respective population proportions.
Expected counts in two-way tables
Our general null hypothesis H0 is that there is no relationship between the two
categorical variables that label the rows and columns of a two-way table. To test
H0, we compare the observed counts in the table with the expected counts, the
counts we would expect (except for random variation) if H0 were true. If the
observed counts are far from the expected counts, that is evidence against H0. It
is easy to find the expected counts.
He started it!
A study of deaths in bar fights
showed that in 90% of the cases,
the person who died started the
fight. You shouldn’t believe this. If
you killed someone in a fight, what
would you say when the police ask
you who started the fight? After all,
dead men tell no tales.
EXPECTED COUNTS
The expected count in any cell of a two-way table when H0 is true is
$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$
EXAMPLE 23.2
Observed versus expected counts
Let’s find the expected counts for the quality-of-life study. Here is the two-way table
with row and column totals:
Quality of life     Canada   United States   Total
Much better             75             541     616
Somewhat better         71             498     569
About the same          96             779     875
Somewhat worse          50             282     332
Much worse              19              65      84
Total                  311            2165    2476
The expected count of Canadians with much better quality of life a year after a heart
attack is
$$\frac{\text{row 1 total} \times \text{column 1 total}}{\text{table total}} = \frac{(616)(311)}{2476} = 77.37$$
Here is the table of all 10 expected counts:
Quality of life     Canada   United States   Total
Much better          77.37          538.63     616
Somewhat better      71.47          497.53     569
About the same      109.91          765.09     875
Somewhat worse       41.70          290.30     332
Much worse           10.55           73.45      84
Total                  311            2165    2476
As this table shows, the expected counts have exactly the same row and column totals (up to
roundoff error) as the observed counts. That’s a good way to check your work.
To see how the data diverge from the null hypothesis, compare the observed counts
with these expected counts. You see, for example, that 19 Canadians reported much
worse quality of life, whereas we would expect only 10.55 if the null hypothesis were
true.
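The calculation in Example 23.2 can be sketched in Python as follows. This is an illustration rather than the output of any package discussed in the chapter; the function name `expected_counts` and the row-by-row layout of `observed` (one row per outcome, columns Canada and United States) are choices made for this example.

```python
# Observed counts: rows are outcomes, columns are (Canada, United States).
observed = [
    [75, 541],   # Much better
    [71, 498],   # Somewhat better
    [96, 779],   # About the same
    [50, 282],   # Somewhat worse
    [19, 65],    # Much worse
]

def expected_counts(table):
    """Expected count for each cell: row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
for obs_row, exp_row in zip(observed, expected):
    print(obs_row, [round(e, 2) for e in exp_row])
```

As a check, the expected counts in each row and column add up to the same totals as the observed counts, which is the property the example recommends using to verify hand calculations.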
Why the formula works Where does the formula for an expected cell count
come from? Think of a basketball player who makes 70% of her free throws in the
long run. If she shoots 10 free throws in a game, we expect her to make 70% of
them, or 7 of the 10. Of course, she won’t make exactly 7 every time she shoots
10 free throws in a game. There is chance variation from game to game. But in
the long run, 7 of 10 is what we expect. In more formal language, if we have n
independent tries and the probability of a success on each try is p, we expect np
successes.
Now go back to the count of Canadians with much better quality of life a year
after a heart attack. The proportion of all 2476 subjects with much better quality
of life is
$$\frac{\text{count of successes}}{\text{table total}} = \frac{\text{row 1 total}}{\text{table total}} = \frac{616}{2476}$$
Think of this as p, the overall proportion of successes. If H0 is true, we expect
(except for random variation) this same proportion of successes in both countries.
So the expected count of successes among the 311 Canadians is
$$np = (311)\left(\frac{616}{2476}\right) = 77.37$$
That’s the formula in the Expected Counts box.
APPLY YOUR KNOWLEDGE
23.5 Smoking among French men. The two-way table in Exercise 23.1 displays data
on the education and smoking behavior of a sample of French men. The null
hypothesis says that there is no relationship between these variables. That is, the
distribution of smoking is the same for all three levels of education.
(a) Find the expected counts for each smoking status among men with a
university education. This is one row of the two-way table of expected counts.
Find the row total and verify that it agrees with the row total for the observed
counts.
(b) We conjecture that men with a university education smoke less than the null
hypothesis calls for. How does comparing the observed and expected counts in
this row confirm this conjecture?
23.6 Attitudes toward recycled products. Exercise 23.2 describes a comparison of
the attitudes of people who do and don’t buy coffee filters made of recycled paper.
The null hypothesis “no relationship” says that in the population of all consumers,
the proportions who hold each attitude are the same for buyers and nonbuyers.
(a) Find the expected cell counts if this hypothesis is true and display them in a
two-way table. Add the row and column totals to your table and check that they
agree with the totals for the observed counts.
(b) Are there any large deviations between the observed counts and the expected
counts? What kind of relationship between the two variables do these deviations
point to?
The chi-square test
The statistical test that tells us whether the observed differences between Canada
and the United States are statistically significant compares the observed and
expected counts. The test statistic that makes the comparison is the chi-square
statistic.
CHI-SQUARE STATISTIC
The chi-square statistic is a measure of how far the observed counts in a
two-way table are from the expected counts. The formula for the statistic is
$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$
The sum is over all cells in the table.
The chi-square statistic is a sum of terms, one for each cell in the table. In the
quality-of-life example, 75 Canadian patients reported much better quality of life.
The expected count for this cell is 77.37. So the term of the chi-square statistic
from this cell is
$$\frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}} = \frac{(75 - 77.37)^2}{77.37} = \frac{5.617}{77.37} = 0.073$$
Think of the chi-square statistic X² as a measure of the distance of the observed counts from the expected counts. Like any distance, it is always zero or
positive, and it is zero only when the observed counts are exactly equal to the expected counts. Large values of X² are evidence against H0 because they say that
the observed counts are far from what we would expect if H0 were true. Although
the alternative hypothesis Ha is many-sided, the chi-square test is one-sided because any
violation of H0 tends to produce a large value of X². Small values of X² are not
evidence against H0.
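To see the statistic assembled cell by cell, here is a minimal Python sketch for the quality-of-life table. The helper names `expected_counts` and `chi_square_terms` are illustrative, and the observed counts are those of Example 23.1.

```python
# Observed counts: rows are outcomes, columns are (Canada, United States).
observed = [
    [75, 541],   # Much better
    [71, 498],   # Somewhat better
    [96, 779],   # About the same
    [50, 282],   # Somewhat worse
    [19, 65],    # Much worse
]

def expected_counts(table):
    """Expected count for each cell: row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]

def chi_square_terms(table):
    """One term (observed - expected)^2 / expected for every cell."""
    expected = expected_counts(table)
    return [[(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
            for obs_row, exp_row in zip(table, expected)]

terms = chi_square_terms(observed)
x2 = sum(sum(row) for row in terms)   # the chi-square statistic X^2
print(f"X^2 = {x2:.3f}")
```

Keeping the individual terms, instead of only their sum, makes it possible to see which cells contribute most to X², which matters for the follow-up analysis.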
Using technology
Calculating the expected counts and then the chi-square statistic by hand is a
bit time-consuming. As usual, software saves time and always gets the arithmetic
right. Figure 23.2 (pages 556 and 557) shows output for the chi-square test for
the quality-of-life data from a graphing calculator, two statistical programs, and a
spreadsheet program.
EXAMPLE 23.3
Chi-square from software
The outputs differ in the information they give. All except the Excel spreadsheet tell us
that the chi-square statistic is X² = 11.725, with P-value 0.020. There is quite good evidence that the distributions of outcomes are different in Canada and the United States.
The two statistical programs repeat the two-way table of observed counts and add
the row and column totals. Both programs offer additional information on request. We
asked CrunchIt! to add the column percents that enable us to compare the Canadian
and American distributions. The chi-square statistic is a sum of 10 terms, one for each
cell in the table. We asked Minitab to give the expected count and the contribution
to chi-square for each cell. The top-left cell has expected count 77.4 and chi-square
term 0.073, just as we calculated. Look at the 10 terms. More than half the value of
X² (6.766 out of 11.725) comes from just one cell. This points to the most important
difference between the two countries: a higher proportion of Canadians report much
worse quality of life. Most of the rest of X² comes from two other cells: more Canadians
report somewhat worse quality of life, and fewer report about the same quality.
Excel is as usual more awkward than software designed for statistics. It lacks a menu
selection for the chi-square test. You must program the spreadsheet to calculate the
expected cell counts and then use the CHITEST worksheet formula. This gives the
P-value but not the test statistic itself. You can of course program the spreadsheet to
find the value of X². The Excel output shows the observed and expected cell counts and
the P-value.
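The P-value that the software reports can also be checked without a statistics package. The sketch below relies on two facts that are assumptions relative to this section: the degrees of freedom for an r × c table are (r − 1)(c − 1), matching the DF = 4 in the output, and for an even number of degrees of freedom the upper tail area of the chi-square distribution has a simple closed form. The function name `chi_square_p_value_even_df` is hypothetical. Libraries such as SciPy offer a general routine (`scipy.stats.chi2_contingency`) that handles any degrees of freedom.

```python
import math

def chi_square_p_value_even_df(x2, df):
    """Upper tail area of the chi-square distribution, valid for even df only.

    Uses P(X^2 >= x) = exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x2 / 2
    term, total = 1.0, 1.0          # j = 0 term
    for j in range(1, df // 2):
        term *= half / j            # builds (x/2)^j / j! step by step
        total += term
    return math.exp(-half) * total

rows, cols = 5, 2                   # five outcomes, two countries
df = (rows - 1) * (cols - 1)
p_value = chi_square_p_value_even_df(11.725, df)
print(f"df = {df}, P = {p_value:.4f}")
```

The value should agree with the P-values shown in the outputs up to rounding of the statistic.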
The chi-square test is the overall test for detecting relationships between two
categorical variables. If the test is significant, it is important to look at the data to
learn the nature of the relationship. We have three ways to look at the quality-of-life data:
• Compare appropriate percents: which outcomes occur in quite different
percents of Canadian and American patients? This is the method we learned
in Chapter 6.
• Compare observed and expected cell counts: which cells have more or
fewer observations than we would expect if H0 were true?
• Look at the terms of the chi-square statistic: which cells contribute the
most to the value of X²?

TI-83
[Calculator screen not reproduced.]

CrunchIt!
Cell format: Count (Column percent)

                   Canada            USA               Total
Much better        75 (24.12%)       541 (24.99%)      616 (24.88%)
Somewhat better    71 (22.83%)       498 (23%)         569 (22.98%)
About the same     96 (30.87%)       779 (35.98%)      875 (35.34%)
Somewhat worse     50 (16.08%)       282 (13.03%)      332 (13.41%)
Much worse         19 (6.109%)       65 (3.002%)       84 (3.393%)
Total              311 (100.00%)     2165 (100.00%)    2476 (100.00%)

Statistic     DF   Value       P-value
Chi-square    4    11.725485   0.0195

Minitab
                   Canada       USA       All
Much better            75       541       616
                     77.4     538.6     616.0
                   0.0728    0.0105         *

Somewhat better        71       498       569
                     71.5     497.5     569.0
                   0.0031    0.0004         *

About the same         96       779       875
                    109.9     765.1     875.0
                   1.7593    0.2527         *

Somewhat worse         50       282       332
                     41.7     290.3     332.0
                   1.6515    0.2372         *

Much worse             19        65        84
                     10.6      73.4      84.0
                   6.7660    0.9719         *

All                   311      2165      2476
                    311.0    2165.0    2476.0
                        *         *         *

Cell Contents:   Count
                 Expected count
                 Contribution to Chi-square
(This key identifies the output for each cell in the table.)

Pearson Chi-Square = 11.725, DF = 4, P-Value = 0.020

Excel
      A                        B          C
1     Observed                 Canada     USA
2                              75         541
3                              71         498
4                              96         779
5                              50         282
6                              19         65
7
8     Expected                 Canada     USA
9                              77.37      538.63
10                             71.47      497.53
11                             109.91     765.09
12                             41.7       290.3
13                             10.55      73.45
14
15
16    CHITEST(B2:C6,B9:C13)    0.019482

FIGURE 23.2 Output from the TI-83 graphing calculator, CrunchIt!, Minitab, and
Excel for the two-way table in the quality-of-life study.

EXAMPLE 23.4
Canada and the United States: conclusions
There is a significant difference between the distributions of quality of life reported by
Canadian and American patients a year after a heart attack. All three ways of comparing
the distributions agree that the main difference is that a higher proportion of Canadians
report that their quality of life is worse than before their heart attack. Other response
variables measured in the study agree with this conclusion.
The broader conclusion, however, is controversial. Americans are likely to point
to the better outcomes produced by their much more intensive treatment. Canadians
reply that the differences are small, that there was no significant difference in survival,
and that the American advantage comes at high cost. The resources spent on expensive
treatment of heart attack victims could instead be spent on providing basic health care
to the many Americans who lack it.
There is an important message here: although statistical studies shed light on issues
of public policy, statistics alone rarely settles complicated questions such as “Which kind
of health care system works better?”
APPLY YOUR KNOWLEDGE
23.7 Smoking among French men. In Exercises 23.1 and 23.5, you began to analyze
data on the smoking status and education of French men. Figure 23.3 displays the
Minitab output for the chi-square test applied to these data.
(a) Starting from the observed and expected counts in the output, calculate the
four terms of the chi-square statistic for the bottom row (university education).
Verify that your work agrees with Minitab’s “Contribution to Chi-square” up to
roundoff error.
               Nonsmoker    Former   Moderate     Heavy       All
Primary               56        54         41        36       187
                   59.48     50.93      42.37     34.22    187.00
                  0.2038    0.1856     0.0443    0.0924         *

Secondary             37        43         27        32       139
                   44.21     37.85      31.49     25.44    139.00
                  1.1769    0.6996     0.6414    1.6928         *

University            53        28         36        16       133
                   42.31     36.22      30.14     24.34    133.00
                  2.7038    1.8655     1.1414    2.8576         *

All                  146       125        104        84       459
                  146.00    125.00     104.00     84.00    459.00
                       *         *          *         *         *

Cell Contents:   Count
                 Expected count
                 Contribution to Chi-square

Pearson Chi-Square = 13.305, DF = 6, P-Value = 0.038

FIGURE 23.3 Minitab output for the two-way table of education level and smoking
status among French men, for Exercise 23.7.
(b) According to Minitab, what is the value of the chi-square statistic X² and the
P-value of the chi-square test?
(c) Look at the “Contribution to Chi-square” entries in Minitab’s display. Which
terms contribute the most to X²? Write a brief summary of the nature and
significance of the relationship between education and smoking.
23.8 Attitudes toward recycled products. In Exercises 23.2 and 23.6 you began to
analyze data on consumer attitudes toward recycled products. Figure 23.4 gives
CrunchIt! output for these data.
(a) Starting from the observed and expected counts, find the six terms of the
chi-square statistic and then the statistic X² itself. Check your work against the
computer output.
(b) What is the P-value for the test? Explain in simple language what it means to
reject H0 in this setting.
(c) Which cells contribute the most to X²? What kind of relationship do these
terms in combination with the row percents in the table point to?
Cell format: Count (Row percent), Expected count

             Higher        The same      Lower         Total
Buyers       20            7             9             36
             (55.56%)      (19.44%)      (25%)         (100.00%)
             13.26         8.662         14.08

Nonbuyers    29            25            43            97
             (29.9%)       (25.77%)      (44.33%)      (100.00%)
             35.74         23.34         37.92

Total        49            32            52            133
             (36.84%)      (24.06%)      (39.1%)       (100.00%)

Statistic     DF   Value      P-value
Chi-square    2    7.638116   0.0219

FIGURE 23.4 CrunchIt! output for the study of consumer attitudes toward recycled
products, for Exercise 23.8.
Cell counts required for the chi-square test
The chi-square test, like the z procedures for comparing two proportions, is an
approximate method that becomes more accurate as the counts in the cells of the
table get larger. We must therefore check that the counts are large enough to trust
the P-value. Fortunately, the chi-square approximation is accurate for quite modest
counts. Here is a practical guideline.5
CELL COUNTS REQUIRED FOR THE CHI-SQUARE TEST
You can safely use the chi-square test with critical values from the
chi-square distribution when no more than 20% of the expected counts are
less than 5 and all individual expected counts are 1 or greater. In particular,
all four expected counts in a 2 × 2 table should be 5 or greater.
Note that the guideline uses expected cell counts. The expected counts for the
quality of life study of Example 23.1 appear in the Minitab output in Figure 23.2.
The smallest expected count is 10.6, so the data easily meet the guideline for safe
use of chi-square.
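The guideline is easy to automate. The following Python sketch encodes the rule as stated in the box; the function name `meets_cell_count_guideline` is a hypothetical helper, the first table holds the expected counts from Example 23.2, and the second table is made-up data used only to show a failing case.

```python
def meets_cell_count_guideline(expected):
    """Check the practical guideline for safe use of the chi-square test.

    General tables: no more than 20% of expected counts below 5,
    and every expected count at least 1.
    2 x 2 tables: all four expected counts must be 5 or greater.
    """
    cells = [e for row in expected for e in row]
    if len(expected) == 2 and all(len(row) == 2 for row in expected):
        return all(e >= 5 for e in cells)
    share_below_5 = sum(1 for e in cells if e < 5) / len(cells)
    return min(cells) >= 1 and share_below_5 <= 0.20

# Expected counts for the quality-of-life study (Example 23.2).
quality_of_life_expected = [
    [77.37, 538.63],
    [71.47, 497.53],
    [109.91, 765.09],
    [41.70, 290.30],
    [10.55, 73.45],
]

# Hypothetical 2 x 2 table with one small expected count (illustration only).
small_table_expected = [
    [3.2, 6.8],
    [12.8, 27.2],
]

print(meets_cell_count_guideline(quality_of_life_expected))
print(meets_cell_count_guideline(small_table_expected))
```

Note that the check is applied to expected counts, not observed counts, in line with the guideline.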
APPLY YOUR KNOWLEDGE
23.9 Does chi-square apply? Figure 23.3 displays Minitab output for data on French
men. Using the information in the output, verify that the data meet the cell
count requirement for use of chi-square.
23.10 Does chi-square apply? Figure 23.4 displays CrunchIt! output for data on
consumer attitudes toward recycled products. Using the information in the
output, verify that the data meet the cell count requirement for use of chi-square.
Uses of the chi-square test
Two-way tables can arise in several ways. The study of the quality of life of heart attack patients compared two independent random samples, one in Canada and the
other in the United States. The design of the study fixed the sizes of the two samples. The next example illustrates a different setting, in which all the observations
come from just one sample.
EXAMPLE 23.5
Extracurricular activities and grades
STATE: North Carolina State University studied student performance in a course required by its chemical engineering major. Students must earn at least a C in the course
in order to continue in the major. One question of interest was the relationship between
time spent in extracurricular activities and success in the course. Students were asked
to estimate how many hours per week they spent on extracurricular activities (less than
2, 2 to 12, or greater than 12). The CrunchIt! output in Figure 23.5 shows the two-way
table of extracurricular activity time and course grade for the 119 students who answered
the question.6
FORMULATE: Carry out a chi-square test for
H0: there is no relationship between extracurricular activity time and course
grade
Ha: there is some relationship between these two variables
Compare column percents or observed versus expected cell counts or terms of chi-square
to see the nature of the relationship.
SOLVE: First check the guideline for use of chi-square. The expected cell counts appear in the output in Figure 23.5. Two of the expected counts are quite small, 5.513
and 2.487. But all the expected counts are greater than 1, and only 1 out of 6 (17%) is
less than 5. We can safely use chi-square. The output shows that there is a significant
relationship (X² = 6.926, P = 0.0313). The column percents show an interesting pattern: students who spend low and high amounts of time on extracurricular activities are
both less likely to earn a C or better than students who spend a moderate amount of
time.
CONCLUDE: We find that 75% of students in the moderate extracurricular activity
group succeed in the course, compared with 55% in the low group and only 38% in the
high group. These differences in success percents are significant (P = 0.03). Because
there are few students in the low and (especially) high groups, we now wish that the
questionnaire had not lumped 2 to 12 hours together. We should also look at other data
that might help explain the pattern. For example, are the “low extracurricular” students
more often employed? Or are they students with low GPAs who are struggling despite
lots of study time?
FIGURE 23.5 CrunchIt! output for the two-way table of course grade and
extracurricular activities, for Example 23.5.
Pay attention to the nature of the data in Example 23.5:
• We do not have three separate samples of students with low, moderate, and
high extracurricular activity. We have a single group of 119 students, each
classified in two ways (extracurricular activity and course grade).
• The data (except for small nonresponse) cover all of the students enrolled in
this course in one semester. We might regard this as a sample of students
enrolled in the course over several years. But we might also regard these
119 students as the entire population rather than a sample from a larger
population.
One of the most useful properties of chi-square is that it tests the null hypothesis “the row and column variables are not related to each other” whenever this
hypothesis makes sense for a two-way table. It makes sense when we are comparing a categorical response in two or more samples, as when we compared quality
of life for patients in Canada and the United States. The hypothesis also makes
sense when we have data on two categorical variables for the individuals in a
single sample, as when we examined grades and extracurricular activities for a
sample of college students. The hypothesis “no relationship” makes sense even
if the single sample is an entire population. Statistical significance has the same
meaning in all these settings: “A relationship this strong is not likely to happen
just by chance.” This makes sense whether the data are a sample or an entire
population.
USES OF THE CHI-SQUARE TEST
Use the chi-square test to test the null hypothesis
$H_0$: there is no relationship between two categorical variables
when you have a two-way table from one of these situations:
• Independent SRSs from each of two or more populations, with
each individual classified according to one categorical variable.
(The other variable says which sample the individual comes
from.)
• A single SRS, with each individual classified according to both of two
categorical variables.
APPLY YOUR KNOWLEDGE
23.11 Majors for men and women in business. A study of the career plans of young
women and men sent questionnaires to all 722 members of the senior class in the
College of Business Administration at the University of Illinois. One question
asked which major within the business program the student had chosen. Here are
the data from the students who responded:7
                Female  Male
Accounting          68    56
Administration      91    40
Economics            5     6
Finance             61    59
This is an example of a single sample classified according to two categorical
variables (gender and major).
(a) Describe the differences between the distributions of majors for women and
men with percents, with a bar graph, and in words.
(b) Verify that the expected cell counts satisfy the requirement for use of
chi-square.
(c) Test the null hypothesis that there is no relationship between the gender of
students and their choice of major. Give a P-value.
(d) Which two cells have the largest terms of the chi-square statistic? How do
the observed and expected counts differ in these cells? (This should strengthen
your conclusions in (a).)
(e) What percent of the students did not respond to the questionnaire? Why does
this nonresponse weaken conclusions drawn from these data?
The chi-square distributions
Software usually finds P-values for us. The P-value for a chi-square test comes from
comparing the value of the chi-square statistic with critical values for a chi-square
distribution.
THE CHI-SQUARE DISTRIBUTIONS
The chi-square distributions are a family of distributions that take only
positive values and are skewed to the right. A specific chi-square
distribution is specified by giving its degrees of freedom.
The chi-square test for a two-way table with r rows and c columns uses
critical values from the chi-square distribution with (r − 1)(c − 1) degrees
of freedom. The P-value is the area to the right of $X^2$ under the density
curve of this chi-square distribution.
FIGURE 23.6 Density curves for the chi-square distributions with 1, 4, and 8 degrees
of freedom. Chi-square distributions take only positive values and are right-skewed.
Figure 23.6 shows the density curves for three members of the chi-square family
of distributions. As the degrees of freedom increase, the density curves become less
skewed and larger values become more probable. Table E in the back of the book
gives critical values for chi-square distributions. You can use Table E if you do not
have software that gives you P-values for a chi-square test.
EXAMPLE 23.6
Using the chi-square table
The two-way table of 5 outcomes by 2 countries for the quality-of-life study has 5 rows
and 2 columns. That is, r = 5 and c = 2. The chi-square statistic therefore has degrees
of freedom
(r − 1)(c − 1) = (5 − 1)(2 − 1) = (4)(1) = 4
Table E excerpt, df = 4:
p     .02     .01
x*    11.67   13.28
Three of the outputs in Figure 23.2 give 4 as the degrees of freedom.
The observed value of the chi-square statistic is $X^2 = 11.725$. Look in the df = 4
row of Table E. The value $X^2 = 11.725$ falls between the 0.02 and 0.01 critical values
of the chi-square distribution with 4 degrees of freedom. Remember that the chi-square
test is always one-sided. So the P-value of $X^2 = 11.725$ is between 0.02 and 0.01. The
outputs in Figure 23.2 show that the P-value is 0.0195, close to 0.02.
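If you have no statistical software but can run a few lines of Python, the right-tail area in Example 23.6 can be computed directly. The sketch below is only an illustration: the function name `chi2_right_tail` is our own, and it relies on the standard finite-series formulas for the chi-square tail area with whole-number degrees of freedom, which are not derived in this chapter.

```python
import math


def chi2_right_tail(x, df):
    """Area to the right of x under the chi-square density with df degrees of freedom.

    Hypothetical helper for illustration; df must be a positive whole number.
    """
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over j = 0 .. df/2 - 1 of (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum of x^j / (1*3*...*(2j+1))
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total


# Example 23.6: a table with 5 rows and 2 columns
rows, cols = 5, 2
df = (rows - 1) * (cols - 1)
x2 = 11.725
p_value = chi2_right_tail(x2, df)
print(f"df = {df}, P-value = {p_value:.4f}")

# Tail areas at the two Table E critical values that bracket the statistic
print(f"{chi2_right_tail(11.67, df):.3f} and {chi2_right_tail(13.28, df):.3f}")
```

The computed P-value should agree with the 0.0195 reported by the outputs in Figure 23.2, and the tail areas at 11.67 and 13.28 should come out close to the table's 0.02 and 0.01.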
We know that all z and t statistics measure the size of an effect in the standard
scale centered at zero. We can roughly assess the size of any z or t statistic by the
68–95–99.7 rule, though this is exact only for z. The chi-square statistic does not
have any such natural interpretation. But here is a helpful fact: the mean of any
chi-square distribution is equal to its degrees of freedom. In Example 23.6, $X^2$ would
have mean 4 if the null hypothesis were true. The observed value $X^2 = 11.725$
is so much larger than 4 that we suspect it is significant even before we look at
Table E.
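A short simulation can make the "mean equals degrees of freedom" fact concrete. The sketch below assumes a fact this chapter does not state: a chi-square variable with df degrees of freedom can be generated as the sum of df squared independent standard Normal draws. The function name, the seed, and the number of draws are arbitrary choices of ours.

```python
import random


def simulate_chi_square(df, draws, seed=2006):
    """Simulate chi-square values as sums of df squared standard Normal draws."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df)) for _ in range(draws)]


values = simulate_chi_square(df=4, draws=50_000)

# The average of the simulated values should be close to the degrees of freedom.
sample_mean = sum(values) / len(values)

# The share of simulated values at least as large as the observed statistic
# approximates the P-value from Example 23.6.
share_above = sum(v >= 11.725 for v in values) / len(values)

print(f"simulated mean = {sample_mean:.2f}")
print(f"share of values >= 11.725: {share_above:.4f}")
```

With this many draws the simulated mean should land near 4, and only about 2% of the simulated values should reach 11.725, which is why the observed statistic looks large even before consulting Table E.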
APPLY YOUR KNOWLEDGE
23.12 Attitudes toward recycled products. The CrunchIt! output in Figure 23.4
gives 2 degrees of freedom for the table in Exercise 23.2.
(a) Verify that this is correct.
(b) The computer gives the value of the chi-square statistic as $X^2 = 7.638$.
Between what two entries in Table E does this value lie? What does the table tell
you about the P-value?
(c) What is the mean value of the statistic $X^2$ if the null hypothesis is true? How
does the observed value of $X^2$ compare with this mean?
23.13 Smoking among French men. The Minitab output in Figure 23.3 gives the
degrees of freedom for the table of education and smoking status as DF = 6.
(a) Show that this is correct for a table with 3 rows and 4 columns.
(b) Minitab gives the chi-square statistic as Chi-Square 13.305. Between
which two entries in Table E does this value lie? Verify that Minitab’s result
P-Value = 0.038 lies between the tail areas for these values.
The chi-square test and the z test∗
One use of the chi-square test is to compare the proportions of successes in any
number of groups. If the r rows of the two-way table are r groups and the columns
are “success” and “failure,” the counts form an r × 2 table. P-values come from the
chi-square distribution with r − 1 degrees of freedom. If r = 2, we are comparing
just two proportions. We now have two ways to do this: the z test from Chapter 21
and the chi-square test with 1 degree of freedom for a 2 × 2 table. These two tests
always agree. In fact, the chi-square statistic $X^2$ is just the square of the z statistic,
and the P-value for $X^2$ is exactly the same as the two-sided P-value for z. We
recommend using the z test to compare two proportions because it gives you the
choice of a one-sided test and is related to a confidence interval for the difference
p1 − p2.
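The agreement between the two tests is easy to check numerically. The sketch below uses made-up counts (30 successes in 100 trials versus 45 in 100), not data from this chapter, and it assumes that the z test in question is the two-sample z statistic built on the pooled sample proportion. The function names are ours.

```python
import math


def two_proportion_z(s1, n1, s2, n2):
    """Two-sample z statistic using the pooled sample proportion."""
    p1, p2 = s1 / n1, s2 / n2
    pooled = (s1 + s2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def chi_square_2x2(s1, n1, s2, n2):
    """Chi-square statistic for the 2 x 2 table of successes and failures."""
    observed = [[s1, n1 - s1], [s2, n2 - s2]]
    row_totals = [n1, n2]
    col_totals = [s1 + s2, (n1 - s1) + (n2 - s2)]
    total = n1 + n2
    x2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            x2 += (observed[i][j] - expected) ** 2 / expected
    return x2


# Hypothetical counts, chosen only for illustration
z = two_proportion_z(30, 100, 45, 100)
x2 = chi_square_2x2(30, 100, 45, 100)

# Two-sided Normal P-value, and chi-square right-tail area with 1 degree of freedom
p_from_z = math.erfc(abs(z) / math.sqrt(2))
p_from_x2 = math.erfc(math.sqrt(x2 / 2))

print(f"z = {z:.4f}, z squared = {z * z:.4f}, X^2 = {x2:.4f}")
print(f"P from z = {p_from_z:.4f}, P from X^2 = {p_from_x2:.4f}")
```

The two P-values coincide because, with 1 degree of freedom, the area to the right of $X^2$ under the chi-square curve equals the two-sided Normal tail area beyond |z|. The z test still has the advantage of allowing a one-sided alternative.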
∗ The remainder of the material in this chapter is optional.
APPLY YOUR KNOWLEDGE
23.14 Treating ulcers. Gastric freezing was once a recommended treatment for ulcers
in the upper intestine. Use of gastric freezing stopped after experiments showed it
had no effect. One randomized comparative experiment found that 28 of the 82
gastric-freezing patients improved, while 30 of the 78 patients in the placebo
group improved.8 We can test the hypothesis of “no difference” between the two
groups in two ways: using the two-sample z statistic or using the chi-square
statistic.
(a) Check the conditions required for both tests, given in the boxes on pages 521
and 560. The conditions are very similar, as they ought to be.
(b) State the null hypothesis with a two-sided alternative and carry out the z test.
What is the P-value, exactly from software or approximately from the bottom row
of Table C?
(c) Present the data in a 2 × 2 table. Use the chi-square test to test the
hypothesis from (b). Verify that the $X^2$ statistic is the square of the z statistic. Use
software or Table E to verify that the chi-square P-value agrees with the z result
(up to the accuracy of the tables if you do not use software).
(d) What do you conclude about the effectiveness of gastric freezing as a
treatment for ulcers?
The chi-square test for goodness of fit∗
The most common and most important use of the chi-square statistic is to test the
hypothesis that there is no relationship between two categorical variables. A variation
of the statistic can be used to test a different kind of null hypothesis: that a categorical variable has a specified distribution. Here is an example that illustrates this
use of chi-square.
More chi-square tests
There are other chi-square tests for hypotheses more specific than “no
relationship.” A sociologist places people in classes by social status, waits ten
years, then classifies the same people again. The row and column variables are the
classes at the two times. She might test the hypothesis that there has been no
change in the overall distribution of social status in the group. Or she might ask
if moves up in status are balanced by matching moves down. These and other null
hypotheses can be tested by variations of the chi-square test.
EXAMPLE 23.7
Never on Sunday?
Births are not evenly distributed across the days of the week. Fewer babies are born on
Saturday and Sunday than on other days, probably because doctors find weekend births
inconvenient. Exercise 1.4 (page 10) gives national data that demonstrate this fact.
A random sample of 140 births from local records shows this distribution across the
days of the week:
Day     Sun.  Mon.  Tue.  Wed.  Thu.  Fri.  Sat.
Births    13    23    24    20    27    18    15
Sure enough, the two smallest counts of births are on Saturday and Sunday. Do these
data give significant evidence that local births are not equally likely on all days of the
week?
The chi-square test answers the question of Example 23.7 by comparing observed counts with expected counts under the null hypothesis. The null hypothesis
for births says that they are evenly distributed. To state the hypotheses carefully,
write the discrete probability distribution for days of birth:
Day          Sun.   Mon.   Tue.   Wed.   Thu.   Fri.   Sat.
Probability  $p_1$  $p_2$  $p_3$  $p_4$  $p_5$  $p_6$  $p_7$
The null hypothesis says that the probabilities are the same on all days. In that
case, all 7 probabilities must be 1/7. So the null hypothesis is
$$H_0: p_1 = p_2 = p_3 = p_4 = p_5 = p_6 = p_7 = \frac{1}{7}$$
The alternative hypothesis says that days are not all equally probable:
$$H_a: \text{not all } p_i = \frac{1}{7}$$
As usual in chi-square tests, $H_a$ is a “many-sided” hypothesis that simply says that
$H_0$ is not true. The chi-square statistic is also as usual:
$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$
The expected count for an outcome with probability p is np, as we saw in the
discussion following Example 23.2. Under the null hypothesis, all the probabilities
$p_i$ are the same, so all 7 expected counts are equal to
$$np_i = 140 \times \frac{1}{7} = 20$$
These expected counts easily satisfy our guideline for using chi-square. The chi-square statistic is
$$X^2 = \sum \frac{(\text{observed count} - 20)^2}{20} = \frac{(13 - 20)^2}{20} + \frac{(23 - 20)^2}{20} + \cdots + \frac{(15 - 20)^2}{20} = 7.6$$
This new use of $X^2$ requires different degrees of freedom. To find the P-value,
compare $X^2$ with critical values from the chi-square distribution with
degrees of freedom one less than the number of values the birth day can take. That’s
7 − 1 = 6 degrees of freedom. From Table E, we see that $X^2 = 7.6$ is smaller than
the smallest entry in the df = 6 row, which is the critical value for tail area 0.25.
The P-value is therefore greater than 0.25 (software gives the more exact value
P = 0.269). These 140 births don’t give convincing evidence that births are not
equally likely on all days of the week.
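The arithmetic of Example 23.7 can be reproduced in a few lines. This is a sketch rather than a general routine: the variable names are ours, and the P-value line uses a closed-form expression for the chi-square right-tail area that is exact only for 6 degrees of freedom (a standard result not derived in this chapter).

```python
import math

# Observed births by day of the week, from Example 23.7
births = {"Sun.": 13, "Mon.": 23, "Tue.": 24, "Wed.": 20,
          "Thu.": 27, "Fri.": 18, "Sat.": 15}

n = sum(births.values())       # total number of births
expected = n / len(births)     # equal probability 1/7 for each day

# Chi-square statistic: sum of (observed - expected)^2 / expected
x2 = sum((observed - expected) ** 2 / expected for observed in births.values())

# Degrees of freedom: one less than the number of possible outcomes
df = len(births) - 1

# Right-tail area for df = 6 only: exp(-h) * (1 + h + h^2 / 2) with h = x2 / 2
half = x2 / 2
p_value = math.exp(-half) * (1 + half + half ** 2 / 2)

print(f"n = {n}, expected count = {expected:.0f}")
print(f"X^2 = {x2:.1f}, df = {df}, P-value = {p_value:.3f}")
```

The result should match the values in the text: a statistic of 7.6 on 6 degrees of freedom and a P-value of about 0.269.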
Table E excerpt, df = 6:
p     .25     .20
x*    7.84    8.56
The chi-square test applied to the hypothesis that a categorical variable has a
specified distribution is called the test for goodness of fit. The idea is that the test
assesses whether the observed counts “fit” the distribution. The only differences
between the test of fit and the test for a two-way table are that the expected counts
are based on the distribution specified by the null hypothesis and that the degrees
of freedom are one less than the number of possible outcomes in this distribution.
Here are the details.
THE CHI-SQUARE TEST FOR GOODNESS OF FIT
A categorical variable has k possible outcomes, with probabilities $p_1, p_2, p_3, \ldots, p_k$.
That is, $p_i$ is the probability of the ith outcome. We have
n independent observations from this categorical variable.
To test the null hypothesis that the probabilities have specified values
$$H_0: p_1 = p_{10},\ p_2 = p_{20},\ \ldots,\ p_k = p_{k0}$$
use the chi-square statistic
$$X^2 = \sum \frac{(\text{count of outcome } i - np_{i0})^2}{np_{i0}}$$
The P-value is the area to the right of $X^2$ under the density curve of the
chi-square distribution with k − 1 degrees of freedom.
In Example 23.7, the outcomes are days of the week, with k = 7. The null
hypothesis says that the probability of a birth on the ith day is $p_{i0} = 1/7$ for all
days. We observe n = 140 births and count how many fall on each day. These are
the counts used in the chi-square statistic.
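The recipe in the box does not require equal probabilities. As a sketch of the general case, the hypothetical function below takes observed counts and any list of null probabilities and returns the statistic, its degrees of freedom, and the expected counts. The function name and the three-category data are invented for illustration. The P-value would then come from Table E or software, using k − 1 degrees of freedom.

```python
def goodness_of_fit(counts, null_probs):
    """Chi-square goodness of fit statistic for specified null probabilities.

    Returns (statistic, degrees of freedom, expected counts).
    """
    if len(counts) != len(null_probs):
        raise ValueError("need one null probability per outcome")
    if abs(sum(null_probs) - 1.0) > 1e-9:
        raise ValueError("null probabilities must add to 1")
    n = sum(counts)
    # Expected count for outcome i is n times its null probability
    expected = [n * p for p in null_probs]
    terms = [(obs - exp) ** 2 / exp for obs, exp in zip(counts, expected)]
    return sum(terms), len(counts) - 1, expected


# Hypothetical data: 100 observations on a variable with 3 outcomes
hypothetical_counts = [60, 25, 15]
hypothetical_probs = [0.5, 0.3, 0.2]

x2, df, expected = goodness_of_fit(hypothetical_counts, hypothetical_probs)
print(f"expected counts = {expected}")
print(f"X^2 = {x2:.3f} with {df} degrees of freedom")
```

Returning the expected counts alongside the statistic makes it easy to check the guideline for using chi-square before trusting the P-value.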
APPLY YOUR KNOWLEDGE
23.15 Saving birds from windows. Many birds are injured or killed by flying into
windows. It appears that birds don’t see windows. Can tilting windows down so
that they reflect earth rather than sky reduce bird strikes? Place six windows at the
edge of a woods: two vertical, two tilted 20 degrees, and two tilted 40 degrees.
During the next four months, there were 53 bird strikes, 31 on the vertical
window, 14 on the 20-degree window, and 8 on the 40-degree window.9 If the tilt
has no effect, we expect strikes on all three windows to have equal probability.
Test this null hypothesis. What do you conclude?
23.16 More on birth days. Births really are not evenly distributed across the days of
the week. The data in Example 23.7 failed to reject this null hypothesis because
of random variation in a quite small number of births. Here are data on 700 births
in the same locale:
Day     Sun.  Mon.  Tue.  Wed.  Thu.  Fri.  Sat.
Births    84   110   124   104    94   112    72
(a) The null hypothesis is that all days are equally probable. What are the
probabilities specified by this null hypothesis? What are the expected counts for
each day in 700 births?
(b) Calculate the chi-square statistic for goodness of fit.
(c) What are the degrees of freedom for this statistic? Do these 700 births give
significant evidence that births are not equally probable on all days of the week?
23.17 Course grades. Most students in a large statistics course are taught by teaching
assistants (TAs). One section is taught by the course supervisor, a senior professor.
The distribution of grades for the hundreds of students taught by TAs this
semester was
Grade        A     B     C     D/F
Probability  0.32  0.41  0.20  0.07
The grades assigned by the professor to students in his section were
Grade  A   B   C   D/F
Count  22  38  20  11
(These data are real. We won’t say when and where, but the professor was not the
author of this book.)
(a) What percents of each grade did students in the professor’s section earn? In
what ways does this distribution of grades differ from the TA distribution?
(b) Because the TA distribution is based on hundreds of students, we are willing
to regard it as a fixed probability distribution. If the professor’s grading follows this
distribution, what are the expected counts of each grade in his section?
(c) Does the chi-square test for goodness of fit give good evidence that the
professor’s grades follow a different distribution? (State hypotheses, check the
guideline for using chi-square, give the test statistic and its P-value, and state your
conclusion.)
23.18 What’s your sign? The University of Chicago’s General Social Survey (GSS) is
the nation’s most important social science sample survey. For reasons known only
to social scientists, the GSS regularly asks its subjects their astrological sign. Here
are the counts of responses in the most recent year this question was asked:10
Sign     Count    Sign         Count
Aries      225    Libra          243
Taurus     222    Scorpio        214
Gemini     241    Sagittarius    200
Cancer     240    Capricorn      216
Leo        260    Aquarius       224
Virgo      250    Pisces         244
If births are spread uniformly across the year, we expect all 12 signs to be equally
likely. Are they? Follow the four-step process in your answer.
CHAPTER 23 SUMMARY
The chi-square test for a two-way table tests the null hypothesis $H_0$ that there is
no relationship between the row variable and the column variable. The
alternative hypothesis $H_a$ says that there is some relationship but does not say
what kind.
The test compares the observed counts of observations in the cells of the table
with the counts that would be expected if $H_0$ were true. The expected count in
any cell is
$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$
The chi-square statistic is
$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$
The chi-square test compares the value of the statistic $X^2$ with critical values
from the chi-square distribution with (r − 1)(c − 1) degrees of freedom. Large
values of $X^2$ are evidence against $H_0$, so the P-value is the area under the
chi-square density curve to the right of $X^2$.
The chi-square distribution is an approximation to the distribution of the
statistic $X^2$. You can safely use this approximation when all expected cell counts
are at least 1 and no more than 20% are less than 5.
If the chi-square test finds a statistically significant relationship between the row
and column variables in a two-way table, do data analysis to describe the nature
of the relationship. You can do this by comparing well-chosen percents,
comparing the observed counts with the expected counts, and looking for the
largest terms of the chi-square statistic.
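The steps in this summary can be gathered into one short sketch: expected counts, the guideline check, the statistic with its individual terms, and the degrees of freedom. The function names and the 2 × 3 table of counts below are hypothetical, chosen only to illustrate the calculations. The P-value would again come from Table E or software.

```python
def two_way_chi_square(table):
    """Return (statistic, degrees of freedom, expected counts, terms) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # expected count = row total * column total / table total
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    # one term of the statistic per cell: (observed - expected)^2 / expected
    terms = [[(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
             for obs_row, exp_row in zip(table, expected)]
    x2 = sum(sum(row) for row in terms)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df, expected, terms


def meets_guideline(expected):
    """All expected counts at least 1, and no more than 20% of them below 5."""
    cells = [e for row in expected for e in row]
    share_small = sum(e < 5 for e in cells) / len(cells)
    return min(cells) >= 1 and share_small <= 0.20


# Hypothetical counts: 2 groups (rows) by 3 outcomes (columns)
table = [[30, 40, 30],
         [20, 50, 30]]

x2, df, expected, terms = two_way_chi_square(table)
print(f"expected counts: {expected}")
print(f"guideline satisfied: {meets_guideline(expected)}")
print(f"X^2 = {x2:.3f} with {df} degrees of freedom")

# The largest terms show which cells deviate most from "no relationship"
largest = max(max(row) for row in terms)
print(f"largest single term: {largest:.3f}")
```

Keeping the per-cell terms, rather than only their sum, supports the follow-up analysis recommended above when a test turns out to be significant.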
STATISTICS IN SUMMARY
Here are the most important skills you should have acquired from reading this
chapter.
A. TWO-WAY TABLES
1. Understand that the data for a chi-square test must be presented as a
two-way table of counts of outcomes.
2. Use percents to describe the relationship between any two categorical
variables, starting from the counts in a two-way table.
B. INTERPRETING CHI-SQUARE TESTS
1. Locate the chi-square statistic, its P-value, and other useful facts (row or
column percents, expected counts, terms of chi-square) in output from
your software or calculator.
2. Use the expected counts to check whether you can safely use the
chi-square test.
3. Explain what null hypothesis the chi-square statistic tests in a specific
two-way table.
4. If the test is significant, compare percents, compare observed with
expected cell counts, or look for the largest terms of the chi-square statistic
to see what deviations from the null hypothesis are most important.
C. DOING CHI-SQUARE TESTS BY HAND
1. Calculate the expected count for any cell from the observed counts in a
two-way table. Check whether you can safely use the chi-square test.
2. Calculate the term of the chi-square statistic for any cell, as well as the
overall statistic.
3. Give the degrees of freedom of a chi-square statistic. Make a quick
assessment of the significance of the statistic by comparing the observed
value with the degrees of freedom.
4. Use the chi-square critical values in Table E to approximate the P-value of
a chi-square test.
CHECK YOUR SKILLS
The National Survey of Adolescent Health interviewed several thousand teens (grades 7
to 12). One question asked was “What do you think are the chances you will be married
in the next ten years?” Here is a two-way table of the responses by sex:11
                               Female  Male
Almost no chance                  119   103
Some chance, but probably not     150   171
A 50-50 chance                    447   512
A good chance                     735   710
Almost certain                   1174   756
23.19 The number of female teenagers in the sample is
(a) 4877.
(b) 2625.
(c) 2252.
23.20 The percent of the females in the sample who responded “almost certain” is about
(a) 44.7%.
(b) 39.6%.
(c) 33.6%.
23.21 The percent of the females in the sample who responded “almost certain” is
(a) higher than the percent of males who felt this way.
(b) about the same as the percent of males who felt this way.
(c) lower than the percent of males who felt this way.
23.22 The expected count of females who respond “almost certain” is about
(a) 464.6.
(b) 891.2.
(c) 1038.8.
23.23 The term in the chi-square statistic for the cell of females who respond “almost
certain” is about
(a) 17.6.
(b) 15.6.
(c) 0.1.
23.24 The degrees of freedom for the chi-square test for this two-way table are
(a) 4.
(b) 8.
(c) 20.
23.25 The null hypothesis for the chi-square test for this two-way table is
(a) Equal proportions of female and male teenagers are almost certain they will
be married in ten years.
(b) There is no difference between female and male teenagers in their
distributions of opinions about marriage.
(c) There are equal numbers of female and male teenagers.
23.26 The alternative hypothesis for the chi-square test for this two-way table is
(a) Female and male teenagers do not have the same distribution of opinions
about marriage.
(b) Female teenagers are more likely than male teenagers to think it is almost
certain they will be married in ten years.
(c) Female teenagers are less likely than male teenagers to think it is almost
certain they will be married in ten years.
23.27 Software gives chi-square statistic $X^2 = 69.8$ for this table. From the table of
critical values, we can say that the P-value is
(a) between 0.0025 and 0.001.
(b) between 0.001 and 0.0005.
(c) less than 0.0005.
23.28 The most important fact that allows us to trust the results of the chi-square test is
that
(a) the sample is large, 4877 teenagers in all.
(b) the sample is close to an SRS of all teenagers.
(c) all of the cell counts are greater than 100.
CHAPTER 23 EXERCISES
If you have access to software or a graphing calculator, use it to speed your analysis of
the data in these exercises. Exercises 23.29 to 23.38 are suitable for hand calculation if
necessary.
23.29 Who’s online? A sample survey by the Pew Internet and American Life Project
asked a random sample of adults about use of the Internet and about the type of
community they lived in. Here, repeated from Exercise 23.4, is the two-way table:
                      Community Type
                 Rural  Suburban  Urban
Internet users     433      1072    536
Nonusers           463       627    388
(a) Give a 95% confidence interval for the difference between the proportions of
rural and suburban adults who use the Internet.
(b) What is the overall pattern of the relationship between Internet use and
community type? Is the relationship statistically significant?
23.30 Child care workers. A large study of child care used samples from the data
tapes of the Current Population Survey over a period of several years. The result is
close to an SRS of child care workers. The Current Population Survey has three
classes of child care workers: private household, nonhousehold, and preschool
teacher. Here are data on the number of blacks among women workers in these
three classes:12
              Total  Black
Household      2455    172
Nonhousehold   1191    167
Teachers        659     86
(a) What percent of each class of child care workers is black?
(b) Make a two-way table of class of worker by race (black or other).
(c) Can we safely use the chi-square test? What null and alternative hypotheses
does $X^2$ test?
(d) The chi-square statistic for this table is $X^2 = 53.194$. What are its degrees of
freedom? What is the mean of $X^2$ if the null hypothesis is true? Use Table E to
approximate the P-value of the test.
(e) What do you conclude from these data?
23.31 Free speech for racists? The General Social Survey (GSS) for 2002 asked this
question: “Consider a person who believes that Blacks are genetically inferior. If
such a person wanted to make a speech in your community claiming that Blacks
are inferior, should he be allowed to speak, or not? ” Here are the responses,
broken down by the race of the respondent:13
             Black  White  Other
Allowed         67    476     35
Not allowed     53    252     17
(a) Because the GSS is essentially an SRS of all adults, we can combine the races
in these data and give a 99% confidence interval for the proportion of all adults
who would allow a racist to speak. Do this.
(b) Find the column percents and use them to compare the attitudes of the three
racial groups. How significant are the differences found in the sample?
23.32 Do you use cocaine? Sample surveys on sensitive issues can give different
results depending on how the question is asked. A University of Wisconsin study
divided 2400 respondents into 3 groups at random. All were asked if they had
ever used cocaine. One group of 800 was interviewed by phone; 21% said they
had used cocaine. Another 800 people were asked the question in a one-on-one
personal interview; 25% said “Yes.” The remaining 800 were allowed to make an
anonymous written response; 28% said “Yes.” 14 Are there statistically significant
differences among these proportions? State the hypotheses, convert the
information given into a two-way table of counts, give the test statistic and its
P-value, and state your conclusions.
23.33 Ethnicity and seat belt use. How does seat belt use vary with drivers’ race or
ethnic group? The answer depends on gender (males are less likely to buckle up)
and also on location. Here are data on a random sample of male drivers observed
in Houston:15
          Drivers  Belted
Black         369     273
Hispanic      540     372
White         257     193
(a) The table gives the number of drivers in each group and the number of these
who were wearing seat belts. Make a two-way table of group by belted or not.
(b) Are there statistically significant differences in seat belt use among men in
these three groups? If there are, describe the differences.
23.34 Did the randomization work? After randomly assigning subjects to treatments
in a randomized comparative experiment, we can compare the treatment groups
to see how well the randomization worked. We hope to find no significant
differences among the groups. A study of how to provide premature infants with a
substance essential to their development assigned infants at random to receive
one of four types of supplement, called PBM, NLCP, PL-LCP, and TG-LCP.16
(a) The subjects were 77 premature infants. Outline the design of the
experiment if 20 are assigned to the PBM group and 19 to each of the other
treatments.
(b) The random assignment resulted in 9 females in the TG-LCP group and
11 females in each of the other groups. Make a two-way table of group by gender
and do a chi-square test to see if there are significant differences among the
groups. What do you find?
23.35 Opinions about the death penalty. “Do you favor or oppose the death penalty
for persons convicted of murder? ” When the General Social Survey asked this
question in its 2002 survey, the responses of people whose highest education was a
bachelor’s degree and of people with a graduate degree were as follows:17
          Favor  Oppose
Bachelor    135      71
Graduate     64      50
(a) Is there evidence that the proportions of all people at these levels of
education who favor the death penalty differ? Find the two sample proportions,
the z statistic, and its P-value.
(b) Is there evidence that the opinions of all people at these levels of education
differ? Find the chi-square statistic $X^2$ and its P-value. If your work is correct, $X^2$
should be the same as $z^2$ and the two P-values should be identical.
23.36 Unhappy rats and tumors. Some people think that the attitude of cancer
patients can influence the progress of their disease. We can’t experiment with
humans, but here is a rat experiment on this theme. Inject 60 rats with tumor
cells and then divide them at random into two groups of 30. All the rats receive
electric shocks, but rats in Group 1 can end the shock by pressing a lever. (Rats
learn this sort of thing quickly.) The rats in Group 2 cannot control the shocks,
which presumably makes them feel helpless and unhappy. We suspect that the rats
in Group 1 will develop fewer tumors. The results: 11 of the Group 1 rats and 22
of the Group 2 rats developed tumors.18
(a) State the null and alternative hypotheses for this investigation. Explain
why the z test rather than the chi-square test for a 2 × 2 table is the proper test.
(b) Carry out the test and report your conclusion.
23.37 Regulating guns. The National Gun Policy Survey, conducted by the National
Opinion Research Center at the University of Chicago, asked a random sample of
adults many questions about regulation of guns in the United States. One of the
questions was “Do you think there should be a law that would ban possession of
handguns except for the police and other authorized persons?” Figure 23.7
displays Minitab output that includes the two-way table of response versus the
respondents’ highest level of education.19

       Less than     High school   Some          College       Postgraduate
       high school   graduate      college       graduate      degree        All
Yes    58            84            169           98            77            486
       50.00         39.44         36.50         42.06         43.75         40.47
       2.6055        0.0558        1.7989        0.1463        0.4690        *
No     58            129           294           135           99            715
       50.00         60.56         63.50         57.94         56.25         59.53
       1.7710        0.0379        1.2228        0.0994        0.3188        *
All    116           213           463           233           176           1201
       100.00        100.00        100.00        100.00        100.00        100.00
       *             *             *             *             *             *

Cell Contents:  Count
                % of Column
                Contribution to Chi-square
Pearson Chi-Square = 8.525, DF = 4, P-Value = 0.074
FIGURE 23.7 Minitab output for the sample survey responses of Exercise 23.37.
(a) The column percents show the breakdown of responses separately for each
level of education. Which education groups show particularly high and low
support for the proposed law? Which education group’s responses contribute the
most to the size of the chi-square statistic? Is there a consistent direction in the
relationship, such as “people with more education are more likely to support
strong gun laws”?
(b) Verify the degrees of freedom given by Minitab. How does the value of the
chi-square statistic compare with its mean under the null hypothesis? What do
you conclude from the chi-square test?
23.38 I think I’ll be rich by age 30. A sample survey of young adults
(aged 19 to 25) asked, “What do you think are the chances you will have
much more than a middle-class income at age 30? ” The CrunchIt! output
in Figure 23.8 shows the two-way table and related information, omitting
a few subjects who refused to respond or who said they were already rich.20
Cell format: Count (Column percent)

                          Male              Female            Total
Almost no chance          98 (3.985%)       96 (4.056%)       194 (4.02%)
Some, but probably not    286 (11.63%)      426 (18%)         712 (14.75%)
A 50-50 chance            720 (29.28%)      696 (29.4%)       1416 (29.34%)
A good chance             758 (30.83%)      663 (28.01%)      1421 (29.44%)
Almost certain            597 (24.28%)      486 (20.53%)      1083 (22.44%)
Total                     2459 (100.00%)    2367 (100.00%)    4826 (100.00%)

Statistic     DF    Value       P-Value
Chi-square    4     43.94552    <0.0001

FIGURE 23.8 CrunchIt! output for the sample survey responses of Exercise 23.38.
Use the output as the basis for a discussion of the differences between
young men and young women in assessing their chances of being rich by
age 30.
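As a starting point for that discussion, the column percents and the chi-square statistic in Figure 23.8 can be reproduced directly from the counts. This is a minimal Python sketch, assuming the counts as transcribed from the figure; the variable names are illustrative and the code is not CrunchIt!'s own procedure.

```python
# Counts transcribed from Figure 23.8, in the order the responses are listed.
responses = [
    "Almost no chance",
    "Some, but probably not",
    "A 50-50 chance",
    "A good chance",
    "Almost certain",
]
male = [98, 286, 720, 758, 597]
female = [96, 426, 696, 663, 486]


def column_percents(counts):
    """Percent of the column total that falls in each response category."""
    total = sum(counts)
    return [100 * c / total for c in counts]


male_pct = column_percents(male)
female_pct = column_percents(female)

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
n = sum(male) + sum(female)
stat = 0.0
for m, f in zip(male, female):
    row_total = m + f
    for obs, col_total in ((m, sum(male)), (f, sum(female))):
        expected = row_total * col_total / n
        stat += (obs - expected) ** 2 / expected

for label, mp, fp in zip(responses, male_pct, female_pct):
    print(f"{label:<24} male {mp:6.2f}%   female {fp:6.2f}%   difference {mp - fp:+6.2f}")
print(f"chi-square = {stat:.3f}")
```

The difference column makes it easy to see in which response categories the men's and women's distributions diverge.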
The remaining exercises concern larger tables that require software for easy analysis.
Follow the Formulate, Solve, and Conclude steps of the four-part process in your
answers to these exercises. It may be helpful to restate in your own words the State
information given in the exercise.
23.39 Students and catalog shopping. What is the most important reason that
students buy from catalogs? The answer may differ for different groups of students.
Here are results for samples of American and East Asian students at a large
midwestern university:21
                       American    Asian
Save time              29          10
Easy                   28          11
Low price              17          34
Live far from stores   11          4
No pressure to buy     10          3
Other reason           20          7
Total                  115         69
Describe the most important differences between American and Asian students.
Is there a significant overall difference between the two distributions of
responses?
23.40 Where do young adults live? A survey by the National Institutes of Health
asked a random sample of young adults (aged 19 to 25), “Where do you live now?
That is, where do you stay most often?” We earlier (page 513) compared the
proportions of men and women who lived with their parents. Here now is the full
two-way table (omitting a few who refused to answer and one who claimed to be
homeless):22
                        Female    Male
Parents’ home           923       986
Another person’s home   144       132
Own place               1294      1129
Group quarters          127       119
What are the most important differences between young men and women? Are
their choices of living places significantly different?
23.41 How are schools doing? The nonprofit group Public Agenda conducted
telephone interviews with a stratified sample of parents of high school children.
There were 202 black parents, 202 Hispanic parents, and 201 white parents.
One question asked was “Are the high schools in your state doing an excellent,
good, fair or poor job, or don’t you know enough to say?” Here are the survey
results:23
              Black      Hispanic   White
              parents    parents    parents
Excellent     12         34         22
Good          69         55         81
Fair          75         61         60
Poor          24         24         24
Don’t know    22         28         14
Total         202        202        201
Are the differences in the distributions of responses for the three groups
of parents statistically significant? What departures from the null hypothesis “no
relationship between group and response” contribute most to the value of the
chi-square statistic? Write a brief conclusion based on your analysis.
23.42 The Mediterranean diet. Cancer of the colon and rectum is less common in the
Mediterranean region than in other Western countries. The Mediterranean diet
contains little animal fat and lots of olive oil. Italian researchers compared
1953 patients with colon or rectal cancer with a control group of 4154 patients
admitted to the same hospitals for unrelated reasons. They estimated consumption
of various foods from a detailed interview, then divided the patients into three
groups according to their consumption of olive oil. Here are some of the
data:24
                         Olive Oil
                 Low     Medium   High    Total
Colon cancer     398     397      430     1225
Rectal cancer    250     241      237     728
Controls         1368    1377     1409    4154
(a) Is this study an experiment? Explain your answer.
(b) The investigators report that “less than 4% of cases or controls refused to
participate.” Why does this fact strengthen our confidence in the results?
(c) The researchers conjectured that high olive oil consumption would be more
common among patients without cancer than among patients with colon cancer
or rectal cancer. What do the data say?
23.43 Market research. Before bringing a new product to market, firms carry out
extensive studies to learn how consumers react to the product and how best to
advertise its advantages. Here are data from a study of a new laundry detergent.25
The subjects are people who don’t currently use the established brand that the
new product will compete with. Give subjects free samples of both detergents.
After they have tried both for a while, ask which they prefer. The answers may
depend on other facts about how people do laundry.
                                        Laundry Practices
                           Soft water,   Soft water,   Hard water,   Hard water,
                           warm wash     hot wash      warm wash     hot wash
Prefer standard product    53            27            42            30
Prefer new product         63            29            68            42
How do laundry practices (water hardness and wash temperature) influence the
choice of detergent? In which settings does the new detergent do best? Are the
differences between the detergents statistically significant?
Support for political parties. Political parties anxiously ask what groups of people
support them. The General Social Survey (GSS) asked its 2002 sample, “Generally
speaking, do you usually think of yourself as a Republican, Democrat, Independent, or
what?” Here is a large two-way table breaking down the responses by age group:26
                                          Age Group
                                18–30    31–40    41–55    56–89
Strong Democrat                 60       83       113      151
Not strong Democrat             99       126      138      148
Independent, near Democrat      72       56       77       62
Independent                     152      124      149      102
Independent, near Republican    53       41       50       54
Not strong Republican           90       85       133      138
Strong Republican               42       56       89       127
Other party                     9        12       14       13
Exercises 23.44 to 23.46 are based on this table.
23.44 Other parties. The GSS is essentially an SRS of American adults. Give a 95%
confidence interval for the proportion of adults who support “other parties.”
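The arithmetic for such an interval can be sketched in Python as follows. This is a minimal sketch that assumes the large-sample interval, p-hat plus or minus z* times the standard error; other interval methods could be substituted. The counts are transcribed from the table above, and the dictionary name `party_counts` is an illustrative choice.

```python
import math
from statistics import NormalDist

# Counts transcribed from the GSS table; each list holds the age groups
# 18-30, 31-40, 41-55, 56-89 in that order.
party_counts = {
    "Strong Democrat": [60, 83, 113, 151],
    "Not strong Democrat": [99, 126, 138, 148],
    "Independent, near Democrat": [72, 56, 77, 62],
    "Independent": [152, 124, 149, 102],
    "Independent, near Republican": [53, 41, 50, 54],
    "Not strong Republican": [90, 85, 133, 138],
    "Strong Republican": [42, 56, 89, 127],
    "Other party": [9, 12, 14, 13],
}

n = sum(sum(row) for row in party_counts.values())  # all respondents in the table
successes = sum(party_counts["Other party"])        # "other party" supporters
p_hat = successes / n

# Critical value for 95% confidence (about 1.96).
z_star = NormalDist().inv_cdf(0.975)
margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - margin, p_hat + margin

print(f"n = {n}, p-hat = {p_hat:.4f}, 95% CI = ({low:.4f}, {high:.4f})")
```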
23.45 Party support. Make a 2 × 4 table by combining the counts in the three rows
that mention Democrat and in the three rows that mention Republican and
ignoring strict independents and supporters of other parties. We might think of
this table as comparing all adults who lean Democrat and all adults who lean
Republican. How does support of the two major parties differ among age
groups?
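Collapsing the rows by hand is error-prone, so it may help to let software build the 2 × 4 table. The sketch below, in Python, sums every row whose label mentions "Democrat" and every row whose label mentions "Republican", leaving out strict independents and other parties. The counts are transcribed from the table above; the helper name `combine` is hypothetical.

```python
# Counts transcribed from the GSS table; each list holds the age groups
# 18-30, 31-40, 41-55, 56-89 in that order.
age_groups = ["18-30", "31-40", "41-55", "56-89"]
party_counts = {
    "Strong Democrat": [60, 83, 113, 151],
    "Not strong Democrat": [99, 126, 138, 148],
    "Independent, near Democrat": [72, 56, 77, 62],
    "Independent": [152, 124, 149, 102],
    "Independent, near Republican": [53, 41, 50, 54],
    "Not strong Republican": [90, 85, 133, 138],
    "Strong Republican": [42, 56, 89, 127],
    "Other party": [9, 12, 14, 13],
}


def combine(keyword):
    """Add up, age group by age group, all rows whose label contains keyword."""
    rows = [counts for name, counts in party_counts.items() if keyword in name]
    return [sum(col) for col in zip(*rows)]


democrat = combine("Democrat")
republican = combine("Republican")

# Percent of major-party leaners in each age group who lean Democrat.
for age, d, r in zip(age_groups, democrat, republican):
    print(f"{age}: Democrat {d}, Republican {r}, {100 * d / (d + r):.1f}% lean Democrat")
```

The two lists `democrat` and `republican` form the rows of the compressed table, which can then be analyzed with the chi-square test in the usual way.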
23.46 Politics and age. Use the full table to analyze the differences in political party
support among age groups. The sample is so large that the differences are bound to
be highly significant, but give the chi-square statistic and its P-value nonetheless.
The main challenge is in seeing what the data say. Does the full table yield any
insights not found in the compressed table you analyzed in the previous
exercise?
EESEE CASE STUDIES
The Electronic Encyclopedia of Statistical Examples and Exercises (EESEE) is available
on the text CD and Web site. These more elaborate stories, with data, provide settings
for longer case studies. Here are some suggestions for EESEE stories that apply the
chi-square test.
23.47 Read the EESEE story “Surgery in a Blanket.” Write a report that answers
Questions 1, 3, 5, 6, and 7 for this case study.
23.48 Read the EESEE story “Trilobite Bites.” Write a report that answers Questions 1,
2, 4, and 5 for this case study.