The chi-square test for two-way tables

Objectives (PSLS Chapter 22)
The chi-square test for two-way tables

Two-way tables

Hypotheses for the chi-square test for two-way tables

Expected counts in a two-way table

Conditions for the chi-square test

The chi-square test for two-way tables

Simpson’s paradox
Two-way tables
Two-way tables organize data about two categorical variables with a
finite number of levels/treatments.
High school students were asked whether they smoke
and whether their parents smoke:
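First factor (rows): parent smoking status
Second factor (columns): student smoking status

                         Student smokes    Student does not smoke
Both parents smoke       400               1380
One parent smokes        416               1823
Neither parent smokes    188               1168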
Marginal distribution
The marginal distributions (in the “margins” of the table) summarize
each factor independently.
Marginal distribution for
parental smoking:
P(both parents) = ??/?? = ??%
P(one parent) = ??%
P(neither parent) = ??%
[Bar chart: percent of all students for each parental smoking status (both parents, one parent, neither parent)]
With two factors, there are two marginal distributions.
Marginal distribution for student smoking:
P(student smokes) = ??/?? = ??%
P(student doesn't) = ??/?? = ??%
[Bar chart: percent of all students for each student smoking status (student smokes, student doesn't)]
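The marginal distributions can be computed directly from the table counts. The following is a minimal Python sketch, assuming the counts shown in the two-way table above; the variable names are illustrative and not part of the original slides.

```python
# Observed counts from the smoking table: rows are parental smoking
# status, columns are (student smokes, student does not smoke).
observed = {
    "both parents": (400, 1380),
    "one parent": (416, 1823),
    "neither parent": (188, 1168),
}

table_total = sum(sum(row) for row in observed.values())

# Marginal distribution for parental smoking: row total / table total.
parent_marginal = {
    status: sum(row) / table_total for status, row in observed.items()
}

# Marginal distribution for student smoking: column total / table total.
column_totals = [sum(col) for col in zip(*observed.values())]
student_marginal = {
    "student smokes": column_totals[0] / table_total,
    "student doesn't": column_totals[1] / table_total,
}

for label, share in {**parent_marginal, **student_marginal}.items():
    print(f"P({label}) = {share:.1%}")
```

Each marginal distribution sums to 100% because it describes one factor on its own, ignoring the other.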
Conditional distribution
The cells of the two-way table represent the intersection of a given level
of one factor with a given level of the other factor. They can be used to
compute the conditional distributions.
Conditional distribution of student smoking for
different parental smoking statuses:
P(student smokes | both parents) = ??/?? = ??%
P(student smokes | one parent) = ??/?? = ??%
P(student smokes | neither parent) = ??/?? = ??%
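A conditional distribution divides each cell count by the total of its own row rather than by the table total. A minimal Python sketch, assuming the same smoking counts as above (names are illustrative):

```python
# Rows are parental smoking status; each tuple is
# (student smokes, student does not smoke).
observed = {
    "both parents": (400, 1380),
    "one parent": (416, 1823),
    "neither parent": (188, 1168),
}

# P(student smokes | parental status) = smokers in the row / row total.
conditional_smokes = {
    status: smokes / (smokes + does_not)
    for status, (smokes, does_not) in observed.items()
}

for status, share in conditional_smokes.items():
    print(f"P(student smokes | {status}) = {share:.1%}")
```

Comparing these three proportions is the informal version of the question the chi-square test answers formally.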
Hypotheses
A two-way table has r rows and c columns.
H0: There is no association between the row and column variables.
Ha: There is an association/relationship between the two variables.
We will compare actual counts from the sample data with expected
counts given the null hypothesis of no relationship.
Expected counts in a two-way table
A two-way table has r rows and c columns. H0 states that there is no
association between the row and column variables (factors) in the table.
The expected count in any cell of a two-way table when H0 is true is:
$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$
Conditions for the chi-square test
The chi-square test for two-way tables looks for evidence of
association between two categorical variables (factors) in sample
data. The samples can be drawn either:

By randomly selecting SRSs from different populations (or from a
population subjected to different treatments). Examples:
    girls vaccinated for HPV (human papillomavirus) or not among
    8th-graders and 12th-graders
    remission or no remission for different treatments
Or by taking one SRS and classifying the individuals according to two
categorical variables (factors). Example:
    obesity and ethnicity among high school students
We can safely use the chi-square test when:

very few (no more than 1 in 5) expected counts are < 5.0

all expected counts are ≥ 1.0
[Note: If one factor has many levels and too many expected counts are
too low, you might be able to “collapse” some of the levels (regroup
them) and thus have large enough expected counts.]
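The two rules of thumb above can be checked mechanically once the expected counts are known. This is a minimal Python sketch; the function name and both example tables are hypothetical, made up only to illustrate the check.

```python
def chi_square_conditions_met(expected):
    """Check the two rules of thumb on a list of rows of expected counts."""
    cells = [count for row in expected for count in row]
    share_below_5 = sum(count < 5.0 for count in cells) / len(cells)
    # All expected counts >= 1.0, and no more than 1 in 5 below 5.0.
    return min(cells) >= 1.0 and share_below_5 <= 0.20


# Hypothetical expected counts, invented for illustration only.
mostly_large = [[12.0, 30.5], [4.2, 9.8], [20.1, 15.4]]  # 1 of 6 cells < 5
too_sparse = [[0.8, 6.0], [3.1, 7.4]]                    # one cell < 1

print(chi_square_conditions_met(mostly_large))
print(chi_square_conditions_met(too_sparse))
```

When the check fails, collapsing levels as described in the note and recomputing the expected counts is one option.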
The chi-square test for two-way tables
H0 : there is no association between the row and column variables
Ha : H0 is not true
The χ² statistic is summed over all r × c cells in the table:
$$\chi^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$
When H0 is true, the χ² statistic approximately follows a χ² distribution with
(r − 1)(c − 1) degrees of freedom.
P-value: P(χ² variable ≥ calculated χ²)
Expected counts computation
Each cell shows the observed count, then the expected count computation.

                         Student smokes                   Student does not smoke            Row total
Both parents smoke       400; 1780*1004/5375 = 332.49     1380; 1780*4371/5375 = 1447.51    1780
One parent smokes        416; 2239*1004/5375 = 418.22     1823; 2239*4371/5375 = 1820.78    2239
Neither parent smokes    188; 1356*1004/5375 = 253.29     1168; ??                          1356
Column total             1004                             4371                              5375
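The expected counts in this table follow from the row and column totals alone. A minimal Python sketch of that computation, assuming the observed smoking counts (variable names are illustrative):

```python
observed = [
    [400, 1380],  # both parents smoke
    [416, 1823],  # one parent smokes
    [188, 1168],  # neither parent smokes
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(col) for col in zip(*observed)]
table_total = sum(row_totals)

# expected count = row total * column total / table total
expected = [
    [r * c / table_total for c in column_totals]
    for r in row_totals
]

for row in expected:
    print([round(value, 2) for value in row])
```

The expected counts in each row add up to that row's observed total, which is a quick sanity check on the arithmetic.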
Chi-square statistic computation
$$\chi^2 = \frac{(400 - 332.49)^2}{332.49} + \frac{(1380 - 1447.51)^2}{1447.51} + \frac{(416 - 418.22)^2}{418.22} + \frac{(1823 - 1820.78)^2}{1820.78} + \frac{(188 - 253.29)^2}{253.29} + \frac{(1168 - 1102.71)^2}{1102.71}$$
$$= 13.709 + 3.149 + 0.012 + 0.003 + 16.829 + 3.866 = 37.566$$
Influence of parental smoking
Here is a computer output for a chi-square test performed on the data from
a random sample of high school students (rows are parental smoking
habits; columns are the students’ smoking habits). What does it tell you?
Sample size?
Hypotheses?
Are the data okay for a χ² test?
Interpretation?
Table D
Example: with df = 6 and χ² = 15.9, the P-value is between 0.01 and 0.02.
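Software replaces the table lookup with an exact tail probability. As a sketch only: when the degrees of freedom are even, the upper tail of the χ² distribution has a closed form (a finite Poisson sum), which the Python function below uses. The function name is an assumption, and for odd df a statistics library would be needed instead.

```python
import math


def chi_square_p_value_even_df(statistic, df):
    """P(chi-square variable >= statistic); valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = statistic / 2
    # Sum of half**j / j! for j = 0, 1, ..., df/2 - 1.
    term, total = 1.0, 0.0
    for j in range(df // 2):
        total += term
        term *= half / (j + 1)
    return math.exp(-half) * total


p_value = chi_square_p_value_even_df(15.9, 6)
print(f"P-value = {p_value:.4f}")
```

For df = 2 the sum has a single term, so the P-value reduces to exp(−χ²/2).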
Cocaine addiction
Cocaine produces short-term feelings of physical
and mental well-being. To maintain the effect, the
drug may have to be taken more frequently and at
higher doses. After stopping use, users will feel tired,
sleepy, and depressed.
A study compares the rates of successful rehabilitation for cocaine addicts
following one of three treatment options:
1. antidepressant treatment (desipramine)
2. standard treatment (lithium)
3. placebo (“sugar pill”)
[Figure: observed % and expected % by treatment; the expected % is 35% for each of the three treatments]
Expected relapse counts
               Relapse: No                    Relapse: Yes
Desipramine    25*26/74 ≈ 8.78 (25*0.351)     16.22 (25*0.649)
Lithium        9.13 (26*0.351)                16.87 (26*0.649)
Placebo        8.07 (23*0.351)                14.93 (23*0.649)
Table of counts: “actual/expected,” with three rows and two columns:
               No relapse     Relapse
Desipramine    15 / 8.78      10 / 16.22
Lithium         7 / 9.13      19 / 16.87
Placebo         4 / 8.07      19 / 14.93
df = (3 − 1)(2 − 1) = 2
We compute the χ² statistic:
$$\chi^2 = \frac{(15 - 8.78)^2}{8.78} + \frac{(10 - \text{??})^2}{\text{??}} + \frac{(7 - 9.14)^2}{9.14} + \frac{(19 - 16.86)^2}{16.86} + \frac{(4 - 8.08)^2}{8.08} + \frac{(19 - 14.93)^2}{14.93} = \text{??}$$
Using Table D: 10.60 < χ² < 11.98
?? > P > ??
The P-value is very small (software gives P = 0.0047) and we reject H0.
There is a significant relationship between treatment type (desipramine, lithium,
placebo) and outcome (relapse or not).
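The software P-value quoted above can be reproduced with a short script. This is a minimal Python sketch, assuming the observed counts from the table of counts; it relies on the fact that, for df = 2 only, the χ² upper-tail probability is exactly exp(−χ²/2).

```python
import math

# Rows are desipramine, lithium, placebo;
# columns are (no relapse, relapse).
observed = [[15, 10], [7, 19], [4, 19]]

row_totals = [sum(row) for row in observed]
column_totals = [sum(col) for col in zip(*observed)]
table_total = sum(row_totals)

chi_square = sum(
    (count - r * c / table_total) ** 2 / (r * c / table_total)
    for row, r in zip(observed, row_totals)
    for count, c in zip(row, column_totals)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# The closed form below is valid for df = 2 only.
if df != 2:
    raise ValueError("exp(-x / 2) is the chi-square tail only when df = 2")
p_value = math.exp(-chi_square / 2)

print(f"chi-square = {chi_square:.2f}, df = {df}, P = {p_value:.4f}")
```

The statistic falls between the two Table D entries quoted above, so the table and the software agree.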
Interpreting the χ² output
When the χ² test is statistically significant:
The largest components indicate which condition(s) are most different
from H0. You can also compare the observed and expected counts, or
compare the computed proportions in a graph.
χ² components
               No relapse    Relapse
Desipramine    4.41          2.39
Lithium        0.50          0.27
Placebo        2.06          1.12
The largest χ² component, 4.41, is for
desipramine/no relapse. Desipramine has
the highest success rate (see graph).
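The component for each cell is its own term in the χ² sum, so the largest one can be found programmatically. A minimal Python sketch, assuming the cocaine study counts; because the expected counts are left unrounded here, a component may differ by about 0.01 from the values in the table above.

```python
treatments = ["Desipramine", "Lithium", "Placebo"]
outcomes = ["No relapse", "Relapse"]
observed = [[15, 10], [7, 19], [4, 19]]

row_totals = [sum(row) for row in observed]
column_totals = [sum(col) for col in zip(*observed)]
table_total = sum(row_totals)

# One chi-square component per (treatment, outcome) cell.
components = {}
for treatment, row, r in zip(treatments, observed, row_totals):
    for outcome, count, c in zip(outcomes, row, column_totals):
        expected = r * c / table_total
        components[(treatment, outcome)] = (count - expected) ** 2 / expected

largest = max(components, key=components.get)

for cell, value in components.items():
    print(cell, round(value, 2))
print("largest component:", largest)
```

A large component only says that a cell departs from H0; comparing the observed count with the expected count shows the direction of the departure.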
A 2013 Gallup study investigated how phrasing affects the opinions of Americans
regarding physician-assisted suicide. Telephone interviews were conducted with a
random sample of 1,535 national adults. Using random assignment, 719 heard the
question in Form A (“end the patient’s life by some painless means”) and 816 the
one in Form B (“assist the patient to commit suicide”).
                    Allowed    Not allowed    No opinion    Total
"Painless means"    503        194            22            719
"Commit suicide"    416        367            33            816
The chi-square test statistic for these data is χ² = 57.88. Conclude using α = 0.05.
A. There is significant evidence of a relationship between opinions and question
wording.
B. We failed to find significant evidence of a relationship between opinions and
question wording.
C. The test assumptions are not met.
χ² = 57.88
We found that phrasing significantly (P < 0.0005) influences opinions about
physician-assisted suicide.
Specifically, the phrasing of “painless means” resulted in a substantially higher
approval (70% in favor) than the phrasing of “commit suicide” (51% in favor).

                         Form A                        Form B
                         "End the patient's life by    "Assist the patient to
                         some painless means"          commit suicide"
Should be allowed        70%                           51%
Should not be allowed    27%                           45%
No opinion               3%                            4%
Number interviewed       719                           816
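The decision at α = 0.05 and the approval percentages can both be recovered from the raw counts. This is a minimal Python sketch, assuming the Gallup counts from the table of responses; the table is 2 × 3, so df = 2 and the P-value is exactly exp(−χ²/2).

```python
import math

# Counts per form: (allowed, not allowed, no opinion).
observed = {
    "Form A": [503, 194, 22],
    "Form B": [416, 367, 33],
}

rows = list(observed.values())
row_totals = [sum(row) for row in rows]
column_totals = [sum(col) for col in zip(*rows)]
table_total = sum(row_totals)

chi_square = sum(
    (count - r * c / table_total) ** 2 / (r * c / table_total)
    for row, r in zip(rows, row_totals)
    for count, c in zip(row, column_totals)
)

# df = (2 - 1)(3 - 1) = 2, so the upper tail is exp(-x / 2).
p_value = math.exp(-chi_square / 2)
alpha = 0.05
reject_h0 = p_value < alpha

# Conditional percent answering "allowed" under each wording.
percent_allowed = {
    form: 100 * counts[0] / sum(counts) for form, counts in observed.items()
}

print(f"chi-square = {chi_square:.2f}, reject H0: {reject_h0}")
print({form: round(pct) for form, pct in percent_allowed.items()})
```

Since individuals were randomly assigned to a wording, a significant result here supports a causal reading of the wording effect.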
Caution with categorical data
Beware of lurking variables!
An association that holds for all of several groups can reverse direction
when the data are combined to form a single group. This reversal is
called Simpson’s paradox.
Kidney stones
A study compared the success rates of two different procedures for removing
kidney stones: open surgery and percutaneous nephrolithotomy (PCNL),
a minimally invasive technique.
             Open surgery    PCNL
Success      273             289
Failure      77              61
% failure    22%             17%
Can you think of a possible lurking variable here?
The procedures are not chosen randomly by surgeons! In fact, the minimally
invasive procedure is most likely used for smaller stones (with a good chance of
success) whereas open surgery is likely used for more problematic conditions.
Small stones
             Open surgery    PCNL
Success      81              234
Failure      6               36
% failure    7%              13%

Large stones
             Open surgery    PCNL
Success      192             55
Failure      71              25
% failure    27%             31%
For both small stones and large stones, open surgery has a lower failure rate,
yet in the combined data it has the higher failure rate (22% versus 17%).
This is Simpson’s paradox. The more challenging cases with large stones tend
to be treated more often with open surgery, making it appear as if
the procedure were less reliable overall.
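The reversal can be verified by computing the failure rates within each stone size and then again after pooling. A minimal Python sketch, assuming the counts from the small-stone and large-stone tables (names are illustrative):

```python
# (successes, failures) for each procedure within each stone size.
results = {
    "small stones": {"open surgery": (81, 6), "PCNL": (234, 36)},
    "large stones": {"open surgery": (192, 71), "PCNL": (55, 25)},
}


def failure_rate(successes, failures):
    return failures / (successes + failures)


# Failure rates within each stone size.
within = {
    size: {proc: failure_rate(*counts) for proc, counts in by_proc.items()}
    for size, by_proc in results.items()
}

# Pool the two stone sizes for each procedure.
combined = {}
for by_proc in results.values():
    for proc, (successes, failures) in by_proc.items():
        total_s, total_f = combined.get(proc, (0, 0))
        combined[proc] = (total_s + successes, total_f + failures)
overall = {proc: failure_rate(*counts) for proc, counts in combined.items()}

for size, rates in within.items():
    print(size, {proc: f"{rate:.0%}" for proc, rate in rates.items()})
print("combined", {proc: f"{rate:.0%}" for proc, rate in overall.items()})
```

The pooled counts weight open surgery toward large stones (263 of 350 cases) and PCNL toward small stones (270 of 350 cases), which is what drives the reversal.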