Chi-Square Test for Qualitative Data

For qualitative data (measured on a nominal scale)
* Observations MUST be independent
- No more than one measurement per subject
* Sample size must be large enough
- Expected frequencies must be ≥ 5
Chi-square distribution
Critical Values Table on page 537 in your book!
X2 rollercoaster right here in California
Goodness of Fit χ2
1 variable
* H0: observed & expected frequencies do not differ
* Steps:
- Calculate expected frequencies
- Compute χ2
- Compare to critical value (df = # categories - 1)

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

where f_o is the observed frequency and f_e is the expected frequency.
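The steps above can be sketched in a few lines of Python. This is only an illustration, not code from the course: the function name `chi_square_gof` and the sample counts are made up for the example.

```python
def chi_square_gof(observed, expected):
    """Return (chi-square statistic, df) for a one-variable goodness-of-fit test."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per category")
    if any(fe < 5 for fe in expected):
        raise ValueError("every expected frequency should be at least 5")
    # Sum of (fo - fe)^2 / fe over all categories
    chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    df = len(observed) - 1  # df = number of categories - 1
    return chi2, df


# Hypothetical counts, for illustration only: 3 categories, 30 observations.
chi2, df = chi_square_gof([12, 8, 10], [10, 10, 10])
print(round(chi2, 2), df)  # 0.8 2
```

The statistic is then compared to the critical value for that df from the table.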
Example: Goodness of Fit χ2
                  Married  Single  Separated  Divorced  Widowed  Total
Sample (N = 100)
fo                50       22      8          18        2        100
expected prop.    0.55     0.21    0.09       0.10      0.05     100%
fe (prop. x N)    55       21      9          10        5        100
Is the marital status of our sample representative of the population?
Statistical Hypotheses: H0: the fo’s (observed frequencies) conform to the fe’s (expected frequencies)
H1: the sample differs from the expected frequencies
Decision rule: α = .05; df = 5 - 1 = 4; critical χ2 = 9.49
Calculate test statistic: (*expected frequencies should not be below 5 in any cell!)
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

$$\chi^2 = \frac{(50 - 55)^2}{55} + \frac{(22 - 21)^2}{21} + \frac{(8 - 9)^2}{9} + \frac{(18 - 10)^2}{10} + \frac{(2 - 5)^2}{5}$$

$$\chi^2 = .45 + .05 + .11 + 6.4 + 1.8 = 8.81$$
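The arithmetic above can be double-checked with a short Python sketch. The variable names are chosen here for illustration. The expected frequencies are the population proportions multiplied by N = 100.

```python
observed = [50, 22, 8, 18, 2]  # Married, Single, Separated, Divorced, Widowed
population_props = [0.55, 0.21, 0.09, 0.10, 0.05]

n = sum(observed)                               # 100
expected = [p * n for p in population_props]    # about 55, 21, 9, 10, 5

# One term per category: (fo - fe)^2 / fe
terms = [(fo - fe) ** 2 / fe for fo, fe in zip(observed, expected)]
chi2 = sum(terms)
df = len(observed) - 1
critical = 9.49  # from the critical values table, alpha = .05, df = 4

print([round(t, 2) for t in terms])         # [0.45, 0.05, 0.11, 6.4, 1.8]
print(round(chi2, 2), df, chi2 > critical)  # 8.81 4 False
```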
Getting the Critical Value
Example: Goodness of Fit χ2
Observed statistical test value: χ2 (4) = 8.81, p > .05
Make a decision & interpret
- Retain H0 because 8.81 < 9.49
- The sample does not significantly differ from the population with regard to marital status
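The "p > .05" above comes from comparing 8.81 to the critical value. As an optional cross-check that is not part of the slides, the exact upper-tail probability has a simple closed form when df is even. The sketch below uses that closed form, and the function name `chi2_sf_even_df` is made up for this example.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with EVEN df.

    For df = 2k the tail is exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    Odd df is not handled by this closed form.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


p = chi2_sf_even_df(8.81, 4)
print(round(p, 3), p > 0.05)  # 0.066 True
```

A p-value above .05 agrees with the decision to retain H0.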
Another Example
                  Rated G  Rated PG-13  Rated NC17
Sample (N = 24)
fo                5        5            14
expected freq.
fe                8        8            8
Is there an association between sexy advertising and buying more products?
Statistical Hypotheses: H0: there is no association between sexy advertising and purchases (purchases are spread evenly across the three ratings, so fe = 24/3 = 8 per rating); H1: there is an association between advertising and purchases
Decision rule: α = .05; df = 3 - 1 = 2; critical χ2 = 5.99
Calculate statistic: (remember: expected frequencies should not be below 5 in any cell!)
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

$$\chi^2 = \frac{(5 - 8)^2}{8} + \frac{(5 - 8)^2}{8} + \frac{(14 - 8)^2}{8}$$

$$\chi^2 = 1.125 + 1.125 + 4.5 = 6.75$$
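When every category is expected to be equally likely, the expected frequency is N divided by the number of categories. A small sketch of this example under that assumption (variable names are illustrative):

```python
observed = [5, 5, 14]  # counts for Rated G, Rated PG-13, Rated NC17

n = sum(observed)                               # 24
expected = [n / len(observed)] * len(observed)  # 8.0 in every category

chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
df = len(observed) - 1
critical = 5.99  # alpha = .05, df = 2

print(chi2, df, chi2 > critical)  # 6.75 2 True
```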
Another Example
* Observed statistical test value: χ2 (2) = 6.75, p < .05
* Make a decision & interpret
- Reject H0 because 6.75 > 5.99
- Sex sells!
Practice! Goodness of Fit χ2
* Let’s say you roll a 6-sided die 120 times. You would EXPECT that each side would come up 1/6 of the time (i.e., 20 times)

Side  1   2   3   4   5   6
fo    18  19  21  23  22  17

* Now your friend gets his own 6-sided die and rolls it 120 times. You would have the same EXPECTED frequency here, right?

Side  1   2   3   4   5   6
fo    8   9   15  15  16  57

* Calculate a goodness of fit χ2 for both you and your friend, and determine whether one of you has a weighted die, at α = .05. Don’t forget to calculate df to get the critical χ2 value! Is one of the dice suspect?
Your 120 Rolls
Dice   Obs.  Exp.  O-E  (O-E)2  (O-E)2/E
1      18    20    -2   4       .20
2      19    20    -1   1       .05
3      21    20     1   1       .05
4      23    20     3   9       .45
5      22    20     2   4       .20
6      17    20    -3   9       .45
Total  120   120    0           1.4
Friend’s 120 Rolls
Dice   Obs.  Exp.  O-E  (O-E)2  (O-E)2/E
1      8     20    -12  144     7.20
2      9     20    -11  121     6.05
3      15    20    -5   25      1.25
4      15    20    -5   25      1.25
5      16    20    -4   16      0.80
6      57    20    37   1369    68.45
Total  120   120    0           85
df & critical value…
* df = # categories – 1 = 6 – 1 = 5
* Critical χ2 = 11.07
Practice: Goodness of Fit χ2
* You: $\chi^2 = \sum \frac{(O - E)^2}{E} = 1.4$, NOT SIGNIFICANT (1.4 < 11.07)
* Friend: $\chi^2 = \sum \frac{(O - E)^2}{E} = 85$, SIGNIFICANT (85 > 11.07)
Is your friend using a weighted die?
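Both sets of rolls can be checked with the same few lines. This is a sketch for verifying the hand calculation. The helper name `gof_statistic` is made up here.

```python
def gof_statistic(observed, expected):
    """Sum of (O - E)^2 / E across categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


rolls = 120
expected = [rolls / 6] * 6  # 20 per side for a fair die
yours = [18, 19, 21, 23, 22, 17]
friends = [8, 9, 15, 15, 16, 57]
critical = 11.07  # alpha = .05, df = 6 - 1 = 5

for label, obs in (("You", yours), ("Friend", friends)):
    chi2 = gof_statistic(obs, expected)
    verdict = "SIGNIFICANT" if chi2 > critical else "NOT SIGNIFICANT"
    print(label, round(chi2, 2), verdict)
# You 1.4 NOT SIGNIFICANT
# Friend 85.0 SIGNIFICANT
```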
χ2 Test for Independence
* Tests the association between 2 categorical variables
* Do the frequencies you actually observe differ from the expected frequencies by more than chance alone?
* Statistical hypotheses:
- H0: the 2 variables are independent (i.e. no association)
- H1: the variables are not independent
* Steps:
- Calculate expected frequency of each cell
- Compute χ2
- Compare to critical value (df = (# rows – 1) x (# columns – 1))

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

where f_o is the observed frequency and f_e is the expected frequency.
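The three steps can be sketched as one small Python function. The function name `chi_square_independence` and the demo table are assumptions made for illustration. Each expected frequency is computed as row total x column total / grand total.

```python
def chi_square_independence(table):
    """Return (chi2, df, expected) for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Step 1: expected frequency of each cell
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Step 2: chi-square statistic summed over every cell
    chi2 = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )

    # Step 3 needs df to look up the critical value
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Hypothetical table whose rows have identical proportions, so chi2 is 0.
chi2, df, expected = chi_square_independence([[10, 20], [30, 60]])
print(chi2, df, expected)  # 0.0 1 [[10.0, 20.0], [30.0, 60.0]]
```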
Example: χ2 Test for Independence
* Is there an association between gender and vegetarianism?

         Vegetarian  Non-Vegetarian  Total:
Male     10          60              70
Female   50          80              130
Total:   60          140             200

* Statistical Hypotheses:
- H0: gender and food preference are independent
- H1: gender and food preference are associated/not independent
* Decision rule: α = .05
- df = (# rows – 1) x (# columns – 1) → (2-1) x (2-1) = 1
- Critical χ2 = 3.841
Next step: calculate the expected frequency of each cell

expected frequency of each cell = (row total x column total) / grand total

         Vegetarian                  Non-Vegetarian                Total:
Male     fo = 10                     fo = 60                       70
         fe = (70 x 60)/200 = 21     fe = (70 x 140)/200 = 49
Female   fo = 50                     fo = 80                       130
         fe = (130 x 60)/200 = 39    fe = (130 x 140)/200 = 91
Total:   60                          140                           200
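The expected-frequency rule (row total x column total / grand total) can be applied to the gender table in a few lines. The dictionary layout below is just one convenient way to hold the counts and is not from the slides.

```python
observed = {
    "Male": {"Vegetarian": 10, "Non-Vegetarian": 60},
    "Female": {"Vegetarian": 50, "Non-Vegetarian": 80},
}

row_totals = {row: sum(cells.values()) for row, cells in observed.items()}
col_totals = {}
for cells in observed.values():
    for col, count in cells.items():
        col_totals[col] = col_totals.get(col, 0) + count
grand_total = sum(row_totals.values())  # 200

# fe = row total x column total / grand total, for every cell
expected = {
    row: {col: row_totals[row] * col_totals[col] / grand_total for col in col_totals}
    for row in observed
}

for row, cells in expected.items():
    print(row, cells)
# Male {'Vegetarian': 21.0, 'Non-Vegetarian': 49.0}
# Female {'Vegetarian': 39.0, 'Non-Vegetarian': 91.0}
```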
Now put it into the table…
                  Male Veg  Male Non-Veg  Female Veg  Female Non-Veg
Sample (N = 200)
fo                10        60            50          80
expected freq.
fe                21        49            39          91

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

$$\chi^2 = \frac{(10 - 21)^2}{21} + \frac{(60 - 49)^2}{49} + \frac{(50 - 39)^2}{39} + \frac{(80 - 91)^2}{91}$$

$$\chi^2 = 5.76 + 2.47 + 3.10 + 1.33 = 12.66$$
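A quick check of the four terms and their sum, written as a sketch with illustrative variable names:

```python
observed = [10, 60, 50, 80]  # Male Veg, Male Non-Veg, Female Veg, Female Non-Veg
expected = [21, 49, 39, 91]

terms = [(fo - fe) ** 2 / fe for fo, fe in zip(observed, expected)]
chi2 = sum(terms)
critical = 3.841  # alpha = .05, df = 1

print([round(t, 2) for t in terms])     # [5.76, 2.47, 3.1, 1.33]
print(round(chi2, 2), chi2 > critical)  # 12.66 True
```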
Example: χ2 Test for Independence
* Observed statistical test value: χ2 (1) = 12.66, p < .05
* Make a decision & interpret
- Reject H0 and accept H1 because 12.66 > 3.84
- Gender is related to food preference!
Practice!
* Is there an association between cat ownership (yes/no) and life success (yes/no)? You survey 100 people…

         Successful  Not Successful  Total:
Cat      60          15
No Cat   15          10
Total:                               100

* Don’t forget to get your row and column totals…
* And follow the steps of hypothesis testing:
- Statistical Hypothesis
- Decision Rule
- Calculate Test Statistic
- Make a Decision & Interpret
         Successful  Not Successful  Total:
Cat      60          15              75
No Cat   15          10              25
Total:   75          25              100
Statistical Hypotheses:
H0: cat ownership and life success are independent
H1: cat ownership and life success are related
Decision rule: α = .05
df = (# rows – 1) x (# columns – 1) → (2-1) x (2-1) = 1
Critical χ2 = 3.841
         Successful                   Not Successful                Total:
Cat      fo = 60                      fo = 15                       75
         fe = (75 x 75)/100 = 56.25   fe = (75 x 25)/100 = 18.75
No Cat   fo = 15                      fo = 10                       25
         fe = (25 x 75)/100 = 18.75   fe = (25 x 25)/100 = 6.25
Total:   75                           25                            100
                  Cat, Success  No cat, Success  Cat, No success  No cat, No Success
Sample (N = 100)
fo                60            15               15               10
expected freq.
fe                56.25         18.75            18.75            6.25
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

$$\chi^2 = \frac{(60 - 56.25)^2}{56.25} + \frac{(15 - 18.75)^2}{18.75} + \frac{(15 - 18.75)^2}{18.75} + \frac{(10 - 6.25)^2}{6.25}$$

$$\chi^2 = .25 + .75 + .75 + 2.25 = 4.0$$
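The whole practice problem, from raw counts to the comparison with the critical value, can be sketched as follows. The variable names are illustrative.

```python
table = [[60, 15],  # Cat:    Successful, Not Successful
         [15, 10]]  # No Cat: Successful, Not Successful

row_totals = [sum(row) for row in table]        # [75, 25]
col_totals = [sum(col) for col in zip(*table)]  # [75, 25]
n = sum(row_totals)                             # 100

chi2 = 0.0
for i, row in enumerate(table):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n  # expected frequency of the cell
        chi2 += (fo - fe) ** 2 / fe

df = (len(table) - 1) * (len(table[0]) - 1)
critical = 3.841  # alpha = .05, df = 1

print(chi2, df, chi2 > critical)  # 4.0 1 True
```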
* Observed statistical test value: χ2 (1) = 4.00, p < .05
* Make a decision & interpret
- Reject H0 because 4.00 > 3.84
- Cat ownership is related to life success!
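As an optional cross-check that is not part of the slides, the p-value for df = 1 can be computed directly, because a chi-square variable with 1 df is a squared standard normal. The helper name `chi2_p_value_df1` is made up for this sketch, and it only applies when df = 1.

```python
import math


def chi2_p_value_df1(chi2):
    """Upper-tail p-value for a chi-square statistic with exactly 1 df.

    P(Z^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    """
    return math.erfc(math.sqrt(chi2 / 2))


p = chi2_p_value_df1(4.00)
print(round(p, 4), p < 0.05)  # 0.0455 True
```

The result is just under .05, which matches a statistic that only narrowly exceeds the critical value of 3.84.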