Notes on Chi-Squared Tests

Math 143 – Fall 2010
Chi-Squared Tests
Goodness of Fit Testing
Golfballs in the Yard
The owners of a residence located along a golf course collected the first 500 golf balls that landed on their
property. Most golf balls are labeled with the make of the golf ball and a number, for example “Nike 1” or
“Titleist 3”. The numbers are typically between 1 and 4, and the owners of the residence wondered if these
numbers are equally likely (at least among golf balls used by golfers of poor enough quality that they lose them
in the yards of the residences along the fairway).
Of the 500 golf balls collected, 486 of them bore a number between 1 and 4. The results are tabulated
below:
number on ball      1     2     3     4
count             137   138   107   104
Q. What should the owners conclude?
A. We will follow our usual four-step hypothesis test procedure to answer this question.
1. State the null and alternative hypotheses.
• The null hypothesis is that all of the numbers are equally likely. We could express this as
H0 : p1 = p2 = p3 = p4
where pi is the true proportion of golf balls (in our population of interest) bearing the number i.
Since these must sum to 1, this is equivalent to
H0 : p1 = 0.25, p2 = 0.25, p3 = 0.25, and p4 = 0.25 .
• The alternative is that there is some other distribution (in the population).
2. Calculate a test statistic.
While we could make up a test statistic (and we did this in class), the usual test statistic used for this
situation is the Chi-squared statistic:
χ² = Σᵢ (Oᵢ − Eᵢ)² / Eᵢ ,
where
Oᵢ = observed count for cell i ,
Eᵢ = expected count for cell i .
In this example, if H0 is true, we would expect 1/4 of the golf balls in each category, so Eᵢ = 486/4 = 121.5
for each cell. Our test statistic is

χ² = (137 − 121.5)²/121.5 + (138 − 121.5)²/121.5 + (107 − 121.5)²/121.5 + (104 − 121.5)²/121.5
   = 1.98 + 2.24 + 1.73 + 2.52
   = 8.47 .
3. Determine the p-value.
When the null hypothesis is true, our test statistic has approximately a Chi-squared distribution. The
degrees of freedom is 1 less than the number of cells in our data table. (That is, 1 less than the number
of categories.)
As usual, we can ask StatCrunch to compute this for us (Stat > Calculators > Chi-square). You will be
asked to provide the degrees of freedom (3, in our example) and the value of the test statistic (8.47, in
our example). Our p-value is the probability of being as large or larger than 8.47. We use the upper
tail because the test statistic gets larger the more our observed and expected counts differ.
In our example, the p-value is
P(X ≥ 8.47) = 0.03725
when H0 is true.
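If you want to check StatCrunch's arithmetic, the whole calculation fits in a few lines of Python. (Python is not part of the course software; this is just a sketch. The tail-probability formula below is a closed form that works only for 3 degrees of freedom.)

```python
import math

observed = [137, 138, 107, 104]          # counts of balls numbered 1-4
expected = [sum(observed) / 4] * 4       # H0: equally likely, so 486/4 = 121.5 each

# Chi-squared statistic: sum of (O - E)^2 / E over the four cells
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Upper-tail probability for a chi-squared distribution with df = 3.
# (For df = 3 the survival function has a closed form involving erfc.)
def chi2_sf_df3(x):
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p_value = chi2_sf_df3(chi_sq)
print(round(chi_sq, 2), round(p_value, 3))   # chi-squared ≈ 8.47, p ≈ 0.037
```

This agrees with the StatCrunch calculator: a statistic of about 8.47 on 3 degrees of freedom gives a p-value of about 0.037.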
Inference with Two Categorical Variables
Two-Way Tables
To look for an association between two categorical variables, we first put the data into a table (called a two-way
table). The values inside the table describe the joint distribution of the two variables.
In the example below, right-handed men and women had their feet measured to see which foot (if any) was
larger. The two categorical variables are gender (Male or Female) and foot comparison (left foot larger, both
feet the same, or right foot larger). The joint distribution is displayed in the following table.
        LBig   Equal   RBig
Men        2      10     28
Women     55      18     14
In addition to the joint distribution, there are two other kinds of distributions that are of interest in this
situation.
1. The marginal distributions give the distribution of one variable ignoring the other variable altogether.
It is common to put this information in the “margin” of the table. That’s where the name comes from.
Here is our data set with the marginal distributions added.
        LBig   Equal   RBig   Total
Men        2      10     28      40
Women     55      18     14      87
Total     57      28     42     127
For the marginal distribution of gender, we see that 40 of the 127 subjects were men. For the marginal
distribution of foot comparison, we see that 57 subjects had larger left feet, 28 had equally sized feet, and
42 had larger right feet.
2. The conditional distributions give the distribution of one variable for a fixed value of the other variable.
For example, the conditional distribution of foot comparison among the men (i.e., conditional on being a
man) appears in the first row of the table (2 with larger left feet, 10 with equally sized feet, and 28 with
larger right feet out of 40 men).
Of course each column also represents a conditional distribution (this time conditional on a particular
foot comparison).
Important idea: Conditional distributions are found by looking at rows or columns of a two-way
table, because a row or column represents selecting just one possible value for one variable and
looking at the distribution of the other variable in that situation.
Often it is more useful to express the joint, marginal, and conditional distributions using percentages or
proportions rather than counts. This especially makes it easier to compare them when the total counts differ.
Here are the joint and marginal distributions again described with percentages.
        LBig   Equal   RBig   Total
Men      1.6     7.9   22.0    31.5
Women   43.3    14.2   11.0    68.5
Total   44.9    22.0   33.1   100.0
Of more interest to us are the conditional probabilities. We would like to know, for example, how the
percentage of right-handed men with larger left feet compares to the percentage of right-handed women with
larger left feet. We can do this by calculating tables of row- and column-percents (or proportions). The tables
below show percentages.
Row percents (conditional on gender):

        LBig   Equal   RBig   Total
Men      5.0    25.0   70.0   100.0
Women   63.2    20.7   16.1   100.0

Column percents (conditional on foot comparison):

         LBig   Equal   RBig
Men       3.5    35.7   66.7
Women    96.5    64.3   33.3
Total   100.0   100.0  100.0
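The percent tables above are just arithmetic on the counts. Here is a sketch in Python (not the course software, purely to show the arithmetic):

```python
counts = {"Men": [2, 10, 28], "Women": [55, 18, 14]}   # LBig, Equal, RBig

# Row percents: divide each row by its own total (conditional on gender)
row_pct = {g: [round(100 * c / sum(row), 1) for c in row]
           for g, row in counts.items()}

# Column percents: divide each cell by its column total (conditional on foot comparison)
col_totals = [sum(col) for col in zip(*counts.values())]
col_pct = {g: [round(100 * c / t, 1) for c, t in zip(row, col_totals)]
           for g, row in counts.items()}

print(row_pct["Men"])    # [5.0, 25.0, 70.0]
print(col_pct["Women"])  # [96.5, 64.3, 33.3]
```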
1 Express the percentages in the previous two tables using conditional probability notation. [P(A | B)]
2 Answer the following questions referring to the previous two tables.
a) One of these tables is much more informative than the other. Which one and why?
b) Informally/intuitively, what do these data suggest?
c) Can we be sure that this holds for the population as well as for our sample? Why or why not?
d) Do you think that this might hold for the population as well as for our sample? Why or why not?
The Chi-Square Test for 2-Way Tables
The hypothesis test that we use in this situation is called the Pearson chi-square test. It follows the same
outline as our other hypothesis tests, but we have some details to fill in.
1. State the null and alternative hypotheses:
Details needed:
2. Compute the test statistic.
Details needed:
3. Find the p-value
Details needed:
4. Draw a conclusion.
Details needed:
Step 1: State the Hypotheses
Chi-square tests are generally used to answer 1 of 2 questions (or, the same question phrased 2 ways):
1. Does the percent in a certain category change from population to population?
Example: Does the percent of deaths from heart disease change from country to country? Or we could
say, is there an association between the percent of deaths from heart disease and which country a person
is from?
2. Are these two categorical variables independent?
Example: Is alcohol consumption associated with oesophageal cancer?
In part the difference has to do with how the data were collected. If we independently sample from multiple
populations, question 1 often fits the situation better. If we sample from one population and categorize the
subjects in two ways, question 2 often fits better.
So our null and alternative hypotheses are
• H0 :
• Ha :
Step 2: Compute the Test Statistic
Here is our table again. We want to measure how close these numbers are to what we expect if the null hypothesis
is true. To answer that, we first need to ask “What do we expect?”
Observed Counts

        LBig   Equal   RBig   Total
Men        2      10     28      40
Women     55      18     14      87
Total     57      28     42     127
3 The top left cell contains the number 2. What would we expect if there is no association between gender and
foot comparison? Fill in the table below with the expected counts for each cell.
Expected Counts

        LBig   Equal   RBig
Men     ____   _____   ____
Women   ____   _____   ____
We still need a way to compare the tables of observed and expected counts and to summarize the differences
in a single number, our test statistic:
X² =
   =
   = 45.00
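For checking your work, here is how the 45.00 can be computed in Python (outside the course software; each expected count is the standard (row total × column total) / grand total):

```python
observed = [[2, 10, 28],    # men:   LBig, Equal, RBig
            [55, 18, 14]]   # women: LBig, Equal, RBig

row_totals = [sum(row) for row in observed]         # 40, 87
col_totals = [sum(col) for col in zip(*observed)]   # 57, 28, 42
grand = sum(row_totals)                             # 127

# Expected count for each cell: row total * column total / grand total
expected = [[r * c / grand for c in col_totals] for r in row_totals]

# Pearson chi-squared statistic: sum of (O - E)^2 / E over all six cells
x_sq = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
print(round(x_sq, 2))   # 45.0
```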
Step 3: Compute the p-value
So how big is 45? Is it unusually big? About what we would expect? The p-value will answer that for us. But
to get that answer we first need to answer a different question, namely:
Step 4: Interpret the p-value and Draw a Conclusion
StatCrunch can do all the work in steps 2 and 3 for us if we like (Stat > Tables > Contingency). Here is the
output from StatCrunch:
• There are 2 degrees of freedom because
• The p-value is < 0.0001, so
• What more can we learn from the other information in the table?
Example 1. A random sample of high school and college students in 1973 was classified according to
political views and frequency of marijuana use. The following two-way table summarizes the data.
              Marijuana Use
              Never   Rarely   Frequently
Liberal         479      173          119
Conservative    214       47           15
Other           172       45           85
Example 2. A number of school children (elementary and middle) were asked which of three things they
thought was most important to them: Grades, Popularity, or Sports. Here are the results.

       Grades   Popular   Sports
boy        27        11        9
girl       20        14        3
Some Fine Print
Is the Approximation Good Enough?
The Chi-squared distribution is only an approximation. The approximation is poor when expected cell
counts are small. Our rule of thumb will be this:
The Chi-squared approximation is good enough for our purposes provided
all the expected cell counts are at least 5.
We will use this rule for both goodness of fit and 2-way tables situations.
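If you already have the expected counts, checking the rule of thumb takes one line. A Python sketch (the helper name is made up, not part of any course software):

```python
def chi_squared_ok(expected_counts, minimum=5):
    """Rule of thumb: the approximation is good enough if every expected count is at least 5."""
    return all(e >= minimum for e in expected_counts)

print(chi_squared_ok([121.5, 121.5, 121.5, 121.5]))               # golf balls: True
print(chi_squared_ok([17.95, 8.82, 13.23, 39.05, 19.18, 28.77]))  # foot table: True
```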
Some authors suggest slightly relaxed (but more complicated) rules of thumb. As with many things, this
isn’t about black and white, but shades of gray. The smaller the cell counts, the worse things work. There are
other methods that can be used when the cell counts are small, but we won’t cover them in this course.
Causation?
When we do a Chi-squared test for a 2-way table and the p-value is small, it is tempting to infer a causal
relationship between the 2 categorical variables, but the Chi-squared test cannot tell us about cause (at least
not directly). Recall our examples of Simpson’s paradox. There may be an association between our two
variables even without a causal relationship between them.
Not All Tables are Created Equal
For a table to be analyzed by these methods
• The cell values must be counts.
• Every individual in the sample must be counted in exactly 1 cell.
This is another way of saying the tables come from tabulating the categories of categorical variables.
Chi-squared or 2-proportion for 2 × 2 tables?
2 × 2 tables can also be analyzed using a 2-proportion test. The advantage to a 2-proportion approach is that
we can produce a confidence interval for the difference in the two proportions.
Beware of Really Large Data Sets
In very large data sets, a small p-value might not be very interesting. With large data sets, we can detect very
small deviances from the null hypothesis. But these small differences might not be important, even if they are
statistically significant. We haven’t discussed following up with confidence intervals (our usual solution to this
problem in previous scenarios), but we should at least
• compare cell percentages to hypothesis percentages for goodness of fit tests.
• compare row percentages across columns or column percentages across rows for 2-way tables.
This will often give us a good sense for whether the differences are big enough to be interesting.
Residuals
The residual for the ith cell is defined by
residualᵢ = (Oᵢ − Eᵢ) / √Eᵢ .
The square of the residual is the contribution of a cell to the chi-squared statistic. The primary advantage of
residuals is that they can be positive or negative. Looking at the residual will show which cells of the table are
farthest from what we would have expected under H0 and whether those counts are too small or too large.
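Applying the formula to the foot table gives all six residuals at once. A Python sketch (not the course software):

```python
import math

observed = [[2, 10, 28],    # men:   LBig, Equal, RBig
            [55, 18, 14]]   # women: LBig, Equal, RBig

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
grand = sum(row_totals)

# residual = (O - E) / sqrt(E), where E = row total * column total / grand total
residuals = [[(o - r * c / grand) / math.sqrt(r * c / grand)
              for o, c in zip(row, col_totals)]
             for row, r in zip(observed, row_totals)]

rounded = [[round(x, 2) for x in row] for row in residuals]
print(rounded)   # [[-3.77, 0.4, 4.06], [2.55, -0.27, -2.75]]
```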
StatCrunch doesn’t compute residuals, but other software does. Here is some typical output:
> footsize
      Lbig Equal Rbig
men      2    10   28
women   55    18   14
> xchisq.test(footsize)
Pearson’s Chi-squared test
data: footsize
X-squared = 45.0029, df = 2, p-value = 1.689e-10
   2.00     10.00     28.00
 (17.95)  ( 8.82)   (13.23)
 [14.176] [ 0.158]  [16.495]
 <-3.77>  < 0.40>   < 4.06>

  55.00     18.00     14.00
 (39.05)  (19.18)   (28.77)
 [ 6.518] [ 0.073]  [ 7.584]
 < 2.55>  <-0.27>   <-2.75>

key:
observed
(expected)
[contribution to X-squared]
<residual>
From this we see that compared to what our null hypothesis predicted:
• There were too few men with large left feet, and too many men with large right feet.
• There were too many women with large left feet and too few women with large right feet.
• The numbers of men and women with equal sized feet was about what we would have expected.
Combining Cells
Often researchers will combine cells from a larger table to make a smaller table. There are at least two reasons
to do this:
• Combining cells increases the cell counts, which can be important if some cells have small counts.
• Some categories might naturally go together and lead to a test of a reasonable and interesting hypothesis.
For example, we could combine the LBig and Rbig columns to see whether men or women are more likely
to have feet that are the same size:
       Different   Equal
men           30      10
women         69      18
> chisq.test(footsize2)
Pearson’s Chi-squared test with Yates’ continuity correction
data: footsize2
X-squared = 0.0985, df = 1, p-value = 0.7536
Here we see that we have no reason to reject the null hypothesis that men and women are equally likely
to have equally sized feet. (In this case we could have also used a 2-proportion test.)
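For the curious, the R result above can be reproduced by hand. A Python sketch (the Yates continuity correction subtracts 0.5 from each |O − E| before squaring; the tail formula erfc(√(X²/2)) is a closed form specific to 1 degree of freedom):

```python
import math

observed = [[30, 10],   # men:   Different, Equal
            [69, 18]]   # women: Different, Equal

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
grand = sum(row_totals)
expected = [[r * c / grand for c in col_totals] for r in row_totals]

# Yates-corrected chi-squared statistic for a 2x2 table
x_sq = sum((abs(o - e) - 0.5) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))

# Upper-tail probability for a chi-squared distribution with df = 1
p_value = math.erfc(math.sqrt(x_sq / 2))
print(round(x_sq, 4), round(p_value, 4))   # 0.0985 and 0.7536
```

Both numbers match the `chisq.test` output above.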