Lecture 21

Section 2.5: Data Analysis for Two-Way Tables
Section 9.1: Chi-square test for Two-Way Tables
Learning goals for this chapter:
Find the joint, marginal, and conditional distributions from a two-way table of the
counts by hand and with SPSS.
Determine from the wording of the story whether the question is asking for a
joint, marginal, or conditional percentage/probability.
Know when two-way tables and the chi-square test are the correct statistical
technique for a story.
Perform a χ² hypothesis test, including: stating the hypotheses,
obtaining the test statistic and P-value from SPSS, and writing a conclusion in
terms of the story.
Check the assumptions to see whether it is appropriate to use a χ² test, using the
footnote of the SPSS χ² test output.
Two-way tables and the chi-square test are used when you are studying the association
between 2 categorical variables.
The joint distribution of the 2 categorical variables is the set of proportions
(cell #) / (total #), one for each cell of the table (the inner squares).
All the joint proportions should add to 1.
The marginal distribution allows us to study 1 variable at a time. You get them just by
adding across a row or down a column for the specific variable you are interested in. The
marginals are written in the margins of the table (far right and very bottom).
The marginals for the row variable should add to 1.
The marginals for the column variable should add to 1.
Conditional distribution: If you know one variable for sure (you have “reduced your
world”), what are the respective percentages for the other variable?
Bar graphs are a good way to demonstrate conditional distributions.
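The three kinds of distribution can also be written in symbols. The notation below is an assumption introduced only for illustration and is not used elsewhere in this lecture: n_ij is the count in row i and column j, n_i. and n_.j are the row and column totals, and n is the overall total.

```latex
\begin{align*}
\text{joint:} \quad & p_{ij} = \frac{n_{ij}}{n}, \qquad \sum_i \sum_j p_{ij} = 1 \\
\text{marginal (row variable):} \quad & p_{i\cdot} = \frac{n_{i\cdot}}{n} = \sum_j p_{ij} \\
\text{marginal (column variable):} \quad & p_{\cdot j} = \frac{n_{\cdot j}}{n} = \sum_i p_{ij} \\
\text{conditional (column given row } i\text{):} \quad & p_{j \mid i} = \frac{n_{ij}}{n_{i\cdot}} = \frac{p_{ij}}{p_{i\cdot}}
\end{align*}
```

The conditional formula is the "reduced world" idea: the denominator is the total for the row (or column) you already know, not the overall total.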
Hypothesis testing with 2-way tables
H0: There is no association between the row and column variables in the population.
Ha: There is an association between the row and column variables in the population.
To test the null hypothesis, compare observed cell counts with expected cell counts
calculated under the assumption that the null hypothesis is true.
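Test statistic: Chi-Square Test Statistic

$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

$$\text{Expected count} = \frac{\text{row total} \times \text{column total}}{n},$$

where n = total # of observations for the table, and the sum is taken over all cells
of the table.
The X² test statistic has an approximately chi-square distribution.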
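As a rough check on what SPSS does behind the scenes, the sketch below computes the expected counts and the X² statistic directly from the formulas above. It uses the wine-and-music counts from the example later in this lecture; the function names are hypothetical and chosen only for this illustration.

```python
# Observed counts from the wine-and-music example in this lecture.
# Rows: music (None, French, Italian); columns: wine (French, Italian, Other).
observed = [
    [30, 11, 43],
    [39, 1, 35],
    [30, 19, 35],
]


def expected_counts(table):
    """Expected count for each cell = row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


x2 = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r-1)(c-1)
print(f"X^2 = {x2:.3f}, df = {df}")
```

The printed statistic should agree with the "Pearson Chi-Square" value that SPSS reports for the same table (18.279 with 4 degrees of freedom).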
To use the chi-square table, you need the degrees of freedom, (r-1)(c-1), where r is the
number of rows and c is the number of columns in the table. Go to Table F
in the back of the book.
WE WILL LET SPSS CALCULATE THE TEST STATISTIC AND P-VALUE
FOR US. YOU DO NOT NEED TO KNOW HOW TO USE THE TABLE.
P-value for chi-square test is: $P(\chi^2 \ge X^2)$
(We’ll be using SPSS to do the test.)
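For readers curious about where the P-value comes from, here is a minimal sketch of the upper-tail probability P(χ² ≥ X²) using only Python's standard `math` module. The function name `chi_square_p_value` is hypothetical. It starts from the exact tail formulas for 1 and 2 degrees of freedom and steps up two degrees of freedom at a time; this is one possible approach and says nothing about how SPSS computes the value internally.

```python
import math


def chi_square_p_value(x2, df):
    """Upper-tail probability P(chi-square with df degrees of freedom >= x2)."""
    if x2 <= 0:
        return 1.0
    half = x2 / 2.0
    if df % 2 == 1:
        a = 0.5
        tail = math.erfc(math.sqrt(half))  # exact tail for df = 1
    else:
        a = 1.0
        tail = math.exp(-half)             # exact tail for df = 2
    # Each pass adds 2 degrees of freedom:
    # Q(a + 1, z) = Q(a, z) + z^a * e^(-z) / Gamma(a + 1)
    while a < df / 2.0:
        tail += half ** a * math.exp(-half) / math.gamma(a + 1)
        a += 1.0
    return tail


# Wine-and-music example from this lecture: X^2 = 18.279 with df = 4.
print(round(chi_square_p_value(18.279, 4), 3))
```

For the wine-and-music example this rounds to .001, the same value SPSS shows under "Asymp. Sig."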
The chi-square test becomes more accurate as the cell counts increase and for tables
larger than 2x2.
For tables larger than 2x2: use the chi-square test whenever
the average of the expected counts is 5 or more,
the smallest expected count is 1 or more, and
fewer than 20% of the cells have expected counts of less than 5.
For 2x2 tables: use the chi-square test whenever
all 4 expected cell counts are 5 or more.
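The guidelines above can be sketched as a small check on a table of expected counts. This is an illustration only: the function names are hypothetical, and the code assumes that all three conditions listed for larger tables must hold together. The expected counts are computed from the wine-and-music counts used later in this lecture.

```python
def expected_counts(table):
    """Expected count for each cell = row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_appropriate(expected):
    """Apply the lecture's expected-count guidelines (assumed to be joint conditions)."""
    cells = [e for row in expected for e in row]
    is_2x2 = len(expected) == 2 and len(expected[0]) == 2
    if is_2x2:
        # 2x2 tables: all 4 expected counts must be 5 or more.
        return all(e >= 5 for e in cells)
    average_ok = sum(cells) / len(cells) >= 5
    smallest_ok = min(cells) >= 1
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return average_ok and smallest_ok and share_below_5 < 0.20


# Rows: music (None, French, Italian); columns: wine (French, Italian, Other).
wine_observed = [[30, 11, 43], [39, 1, 35], [30, 19, 35]]
wine_expected = expected_counts(wine_observed)
smallest = min(min(row) for row in wine_expected)
print(chi_square_appropriate(wine_expected), round(smallest, 2))
```

In practice the SPSS footnote reports the same two facts (how many cells have expected counts below 5, and the minimum expected count), so the check can be read straight off the output.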
Example: Market researchers know that background music can influence the mood and
purchasing behavior of customers. One study in a supermarket in Northern Ireland
compared 3 treatments: no music, French accordion music, and Italian string music.
Under each condition, the researchers recorded the number of bottles of French, Italian,
and other wine purchased. Here is the 2-way table that summarizes the data in counts
(total # of bottles sold = 243):
                      Wine
Music        French   Italian   Other
None           30       11       43
French         39        1       35
Italian        30       19       35
Calculate the joint distribution for music and wine (in % of the total):
                      Wine
Music        French   Italian   Other
None          12.3      4.5     17.7
French        16.0      0.4     14.4
Italian       12.3      7.8     14.4
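The joint percentages in this table can be reproduced with a few lines of Python. This is only a sketch of the hand calculation (each cell count divided by the overall total); the variable names are chosen for illustration.

```python
music = ["None", "French", "Italian"]    # row labels
wine = ["French", "Italian", "Other"]    # column labels
counts = [
    [30, 11, 43],
    [39, 1, 35],
    [30, 19, 35],
]

n = sum(sum(row) for row in counts)      # overall total number of bottles

# Joint distribution: every cell count divided by the overall total, in percent.
joint = [[round(100 * c / n, 1) for c in row] for row in counts]

for label, row in zip(music, joint):
    print(f"{label:8s}", row)
```

Because each cell is rounded to one decimal place, the nine rounded percentages may add to slightly more or less than 100.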
Calculate the marginal distribution for music:
                      Wine
Music        French   Italian   Other   Marg. for music
None          12.3      4.5     17.7        34.6
French        16.0      0.4     14.4        30.9
Italian       12.3      7.8     14.4        34.6
Calculate the marginal distribution for wine:
                             Music
Wine               None    French   Italian   Marg. for wine
French             12.3     16.0     12.3         40.7
Italian             4.5      0.4      7.8         12.8
Other              17.7     14.4     14.4         46.5
Marg. for music    34.6     30.9     34.6        100
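The marginal percentages, and one conditional distribution for comparison, can be sketched in Python as follows. The conditional distribution shown (wine purchased given that no music was playing) is chosen here only as an illustration; note that its denominator is the "no music" row total, not the overall total.

```python
music = ["None", "French", "Italian"]    # row labels
wine = ["French", "Italian", "Other"]    # column labels
counts = [
    [30, 11, 43],
    [39, 1, 35],
    [30, 19, 35],
]

n = sum(sum(row) for row in counts)
music_totals = [sum(row) for row in counts]         # add across each row
wine_totals = [sum(col) for col in zip(*counts)]    # add down each column

# Marginal distributions: totals divided by the overall total, in percent.
music_marginal = [round(100 * t / n, 1) for t in music_totals]
wine_marginal = [round(100 * t / n, 1) for t in wine_totals]

# Conditional distribution of wine given no music: divide by the row total.
no_music_row = counts[music.index("None")]
wine_given_no_music = [round(100 * c / sum(no_music_row), 1) for c in no_music_row]

print("Marginal for music:", music_marginal)
print("Marginal for wine: ", wine_marginal)
print("Wine given no music:", wine_given_no_music)
```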
Questions (joint, marginal, conditional?):
1. “What percent of all wine bought was Italian with French music playing in the
store?”
2. “Of the Italian wine purchased, what percent was from a store playing French
music?”
3. “What percent of wine bought was Italian?”
4. “What percent of the wine purchased from French music-playing stores was
French?”
5. “What percent of wine was purchased from a store with no music playing?”
Using SPSS, set up the data so that you have a wine column, a music column, and a
purchase column (where you will input the counts inside the chart).
Wine       Music      Purchase
French     None       30
Italian    None       11
Other      None       43
French     French     39
Italian    French     1
Other      French     35
French     Italian    30
Italian    Italian    19
Other      Italian    35
Then go to Data --> Weight Cases. Click “Weight cases by” and then move “purchase”
into the “frequency variable” box. Click OK. Do Analyze--> Descriptive Statistics -->
Crosstabs. Make sure “observed” is checked. Put “wine” into the “Rows” box and
“music” into the “Columns” box. Click OK. You will get:
Type of Wine * Type of Music Crosstabulation
Count
                           Type of Music
Type of Wine     French   Italian   None   Total
French             39       30       30      99
Italian             1       19       11      31
Other              35       35       43     113
Total              75       84       84     243
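The same crosstabulation can be built outside SPSS from the three-column layout (wine, music, purchase). The sketch below is a plain-Python illustration of what "Weight Cases" followed by Crosstabs does: each row of the data contributes its purchase count to one cell. Variable names are hypothetical.

```python
from collections import defaultdict

# (wine, music, purchase) rows, as entered in the SPSS data view.
records = [
    ("French", "None", 30), ("Italian", "None", 11), ("Other", "None", 43),
    ("French", "French", 39), ("Italian", "French", 1), ("Other", "French", 35),
    ("French", "Italian", 30), ("Italian", "Italian", 19), ("Other", "Italian", 35),
]

# Weighting cases by "purchase": add each count into its (wine, music) cell.
table = defaultdict(int)
for wine, music, purchase in records:
    table[(wine, music)] += purchase

wines = ["French", "Italian", "Other"]     # rows
musics = ["French", "Italian", "None"]     # columns, in the order SPSS prints them

row_totals = {w: sum(table[(w, m)] for m in musics) for w in wines}
col_totals = {m: sum(table[(w, m)] for w in wines) for m in musics}
grand_total = sum(row_totals.values())

for w in wines:
    print(f"{w:8s}", [table[(w, m)] for m in musics], row_totals[w])
print("Total   ", [col_totals[m] for m in musics], grand_total)
```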
Then if you want the %s for joint and marginal distributions instead of counts, you go
back to your data and do Analyze --> Descriptive Statistics --> Crosstabs --> (your rows
and columns should still be entered from the previous step) --> Click “Cells” --> Click
“Total.” Also, un-click “observed” so your table won’t also include the counts and be
too crowded. Click “Continue” and then “OK.” You will get:
Type of Wine * Type of Music Crosstabulation
% of Total
                           Type of Music
Type of Wine     French   Italian   None     Total
French           16.0%    12.3%     12.3%    40.7%
Italian            .4%     7.8%      4.5%    12.8%
Other            14.4%    14.4%     17.7%    46.5%
Total            30.9%    34.6%     34.6%   100.0%
Is there a relationship in the population between the type of wine purchased and the type
of music that is playing? Perform a significance test, and write a short summary of your
conclusion.
Hypotheses:
Test statistic:
P-value:
Conclusion in terms of the story:
Was it appropriate to use the chi-square test here? Justify your answer.
To make SPSS do the hypothesis test, go back to Analyze --> Descriptive Statistics -->
Crosstabs --> Cells. Then click “Total” to make its check go away. Also click
“Expected” under “Counts.” Click Continue. Then click Statistics --> Chi-Square -->
Continue --> OK. You will get:
Chi-Square Tests
                       Value        df    Asymp. Sig. (2-sided)
Pearson Chi-Square     18.279(a)     4          .001
Likelihood Ratio       21.875        4          .000
N of Valid Cases       243
a. 0 cells (.0%) have expected count less than 5. The
minimum expected count is 9.57.
Use the “Pearson Chi-Square” to get your X² test statistic, and the “Asymp. Sig.” to get
the P-value.
Example: Psychological and social factors can influence the survival of patients with
serious diseases. One study examined the relationship between survival of patients with
coronary heart disease and pet ownership. Each of 92 patients was classified as having a
pet or not and by whether they survived for one year. The researchers suspect that having
a pet might be connected to the patient status. Here are the data:
                    Pet ownership
Patient Status      No      Yes
Alive               28      50
Dead                11       3
Total               39      53
a) Find the joint and marginal distributions (in probabilities) of patient status and
pet ownership.
                    Pet ownership
Patient Status      No       Yes      Marg. for status
Alive               0.304    0.543    0.848
Dead                0.120    0.033    0.152
Marg. for pets      0.424    0.576
b) Assuming a patient is still alive, what is the probability he owns a pet? Is this a
joint, marginal, or conditional probability?
c) What is the probability a patient is still alive and owns a pet? Is this a joint,
marginal, or conditional probability?
d) What is the probability a patient owns a pet? Is this a joint, marginal, or
conditional probability?
e) State the hypotheses for a χ² test of this problem, find the X² test statistic, its
degrees of freedom, and the P-value. State your conclusion in terms of the
original problem.
Hypotheses:
Test statistic:
P-value:
Conclusion in terms of the story:
Chi-Square Tests
                               Value       df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             8.851(b)     1     .003
Continuity Correction(a)       7.190        1     .007
Likelihood Ratio               9.011        1     .003
Fisher's Exact Test                                              .006         .004
Linear-by-Linear Association   8.755        1     .003
N of Valid Cases               92
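The Pearson row of this output can be cross-checked by hand or with a short script. The sketch below recomputes the expected counts, X², and the P-value for the 2x2 pet-ownership table using only the standard library. For 1 degree of freedom the upper-tail chi-square probability equals `erfc(sqrt(X²/2))`; the variable names are chosen for illustration.

```python
import math

# Rows: patient status (Alive, Dead); columns: pet ownership (No, Yes).
observed = [
    [28, 50],
    [11, 3],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count = row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

x2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail probability for a chi-square distribution with 1 degree of freedom.
p_value = math.erfc(math.sqrt(x2 / 2))

min_expected = min(min(row) for row in expected)
print(f"X^2 = {x2:.3f}, df = {df}, P-value = {p_value:.3f}")
print(f"smallest expected count = {min_expected:.2f}")
```

The smallest expected count is what you compare against the 2x2 guideline (all 4 expected counts 5 or more) when deciding whether the chi-square test is appropriate.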
Student Handout for M&Ms/Skittles Activity (Chapter 9: Two Way Distributions)
Part 1: Plain vs. Peanut M&Ms
1. Your data for plain (mine for peanut), in counts:

          Brown   Yellow   Red   Blue   Orange   Green   Total
Plain
Peanut      2       5       0      3       8       4       22
Total

Overall total number of plain and peanut M&Ms counted:
2. Joint Distribution (in white boxes). Divide each count above by the overall total
of M&Ms.

                     Brown   Yellow   Red   Blue   Orange   Green   Marginal for flavor
Plain
Peanut
Marginal for color                                                        100%
3. Marginal Distributions (above in shading). Add down the columns and across the
rows. The bottom numbers should add to 100%, and the right column should add
to 100%.
4. Conditional distribution of flavor for green M&Ms (you know the M&M is green,
now what is the chance it is …). The denominator will be the same for both of
these calculations. These two percentages should add to 100%.
Plain:
Peanut:
5. Bar graph for the conditional distributions above (you will have 2 bars on 1
graph):
6. Conditional distribution of color for plain M&Ms. The denominator will be the same
for all 6 calculations. All 6 add to 100%.
Brown:
Yellow:
Red:
Blue:
Orange:
Green:
7. Sketch a bar graph for the conditional distribution of color for plain M&Ms. You
will have 6 bars on the graph.
8. Conditional distribution of color for peanut M&Ms. The denominator will be the
same for all 6 calculations. All 6 add to 100%.
Brown:
Yellow:
Red:
Blue:
Orange:
Green:
9. Sketch a bar graph for the conditional distribution of color for peanut M&Ms.
You will have 6 bars on the graph. Use the same y-axis scale that you used for
the bar graph for plain M&Ms so that you can easily compare your results.
How do they compare? In order to do a hypothesis test, we need a large data set, like one
from the whole class.
          Plain   Peanut
Brown      147      69
Yellow     302     110
Red        264      70
Blue       407     162
Orange     330     148
Green      373     123
10. Hypotheses for the M&Ms χ² hypothesis test. Be sure to state whether your
conclusion refers to the population or the sample.
11. Test statistic and P-value for the χ² hypothesis test:
12. Conclusion for the χ² hypothesis test (α = 0.01) in terms of the story.
13. Was it appropriate to use the chi-square test here?
Part 2: M&Ms vs. Skittles
Table for counts for the whole class:
            Yellow   Non-yellow   Total
M&Ms          302       1521       1823
Skittles      361       1351       1712
Total         663       2872       3535
14. Hypotheses for χ² test:
15. Test statistic and P-value:
16. Conclusion (α = 0.01) in terms of the story.
17. Was it appropriate to use the chi-square test here?