Categorical variables

Hypothesis testing
Part 2: Categorical
variables
Intermediate Training in
Quantitative Analysis
Bangkok 19-23 November 2007
LEARNING PROGRAMME
Topics to be covered in this presentation
• Pearson’s chi-square
Learning objectives
By the end of this session, the participant
should be able to:
Conduct a chi-square test
Hypothesis testing for
categorical variables…
We sometimes want to determine…
Whether the proportion of people with some
particular outcome differs by another variable
Ex. Does the proportion of food insecure
households differ between male- and female-headed
households?
What if we want to test whether there is
a relationship between two categorical
variables?
Pearson Chi-Square
Pearson’s chi-square test
• Pearson’s chi-square test (χ²) is an omnibus
test of the hypothesis that the row and the
column variables of a contingency table are
independent
• It compares the frequencies you observe in
each category with the frequencies you would
expect to get in those categories by chance,
i.e. if the two variables were unrelated.
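As a rough illustration of what "expected by chance" means, the sketch below derives expected counts from the row and column totals of a table. It uses the child gender by underweight counts from the example later in this deck; the function name `expected_counts` is an assumption made for this illustration.

```python
# Observed counts from the child gender x underweight example in this deck.
observed = [
    [2086, 587],   # Male:   not underweight, underweight
    [2204, 470],   # Female: not underweight, underweight
]

def expected_counts(table):
    """Expected count per cell if the row and column variables were independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count = row total * column total / grand total.
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
for row in expected:
    print([round(cell, 1) for cell in row])
# [2144.6, 528.4]
# [2145.4, 528.6]
```

These are the same "Expected Count" values that appear in the crosstabulation for this example.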
Assumptions of the chi-square
test
Two assumptions:
1. For the test to be meaningful it is
imperative that each unit contributes to
only one cell of the contingency table.
2. The expected frequencies should be
greater than 5 in each cell (or the test
may fail to detect a genuine effect)
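The second assumption can be checked before running the test. The following is a minimal sketch, assuming the table is held as a list of rows; `min_expected_count` is a name chosen for this illustration, and the small table is hypothetical data invented to show a violation. The first assumption (each unit in only one cell) depends on how the data were collected and cannot be checked from the counts alone.

```python
def min_expected_count(table):
    """Smallest expected cell count under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return min(r * c / n for r in row_totals for c in col_totals)

deck_table = [[2086, 587], [2204, 470]]   # counts from the example in this deck
small_table = [[3, 7], [2, 20]]           # hypothetical small sample

for name, table in [("deck example", deck_table), ("small hypothetical", small_table)]:
    smallest = min_expected_count(table)
    verdict = "ok" if smallest > 5 else "assumption violated"
    print(f"{name}: minimum expected count = {smallest:.2f} ({verdict})")
```

For the deck example the minimum is 528.40, which agrees with the footnote SPSS prints under its chi-square table.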
Chi square formula…
$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$
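The formula translates almost directly into code. This is only a sketch: the function name `chi_square` and the two-cell counts are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts: 50 units, 25 expected in each of two cells.
observed = [30, 20]
expected = [25, 25]

statistic = chi_square(observed, expected)
print(statistic)  # 2.0
```

Each cell contributes (5)² / 25 = 1.0, so the two cells sum to 2.0.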
Chi square example
Child Gender * underweight Crosstabulation

                                   underweight
Child Gender                       no        yes       Total
Male      Count                    2086      587       2673
          Expected Count           2144.6    528.4     2673
Female    Count                    2204      470       2674
          Expected Count           2145.4    528.6     2674
Total     Count                    4290      1057      5347
          Expected Count           4290      1057      5347
Chi Square example…
• $\chi^2 = \frac{(2086-2144.6)^2}{2144.6} + \frac{(587-528.4)^2}{528.4} + \frac{(2204-2145.4)^2}{2145.4} + \frac{(470-528.6)^2}{528.6}$
• $\chi^2 = 1.60 + 6.50 + 1.60 + 6.50$
• $\chi^2 = 16.2$ (then check the $\chi^2$ distribution…)
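The hand calculation, including the final step of checking the chi-square distribution, can be sketched in code. The p-value line relies on a shortcut that holds only for 1 degree of freedom, where the upper-tail probability of the chi-square distribution equals `erfc(sqrt(x / 2))`; for larger tables a statistics library would be needed. Variable names are assumptions made for this illustration.

```python
import math

# Observed counts: rows = Male, Female; columns = not underweight, underweight.
observed = [[2086, 587], [2204, 470]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (observed - expected)^2 / expected over the four cells.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi2 += (o - e) ** 2 / e

# Degrees of freedom = (rows - 1) * (columns - 1), which is 1 for a 2x2 table.
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Valid for df = 1 only: upper-tail probability of the chi-square distribution.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value:.6f}")
```

The statistic comes out close to the hand-computed 16.2, and the p-value is far below 0.05, so the hypothesis of independence would be rejected.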
Chi Square example…
• If we run it in SPSS, we get the same answer

Gender of child * WAZPREV Crosstabulation

                                          WAZPREV
Gender of child                           .00       1.00      Total
Male      Count                           2086      587       2673
          Expected Count                  2144.6    528.4     2673.0
          % within Gender of child        78.0%     22.0%     100.0%
          % within WAZPREV                48.6%     55.5%     50.0%
Female    Count                           2204      470       2674
          Expected Count                  2145.4    528.6     2674.0
          % within Gender of child        82.4%     17.6%     100.0%
          % within WAZPREV                51.4%     44.5%     50.0%
Total     Count                           4290      1057      5347
          Expected Count                  4290.0    1057.0    5347.0
          % within Gender of child        80.2%     19.8%     100.0%
          % within WAZPREV                100.0%    100.0%    100.0%
Chi-Square Tests

                                Value        df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                  (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square              16.196 (b)   1    .000
Continuity Correction (a)       15.921       1    .000
Likelihood Ratio                16.223       1    .000
Fisher's Exact Test                                             .000         .000
Linear-by-Linear Association    16.193       1    .000
N of Valid Cases                5347

a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 528.40.
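To see where the first three rows of this output come from, the sketch below applies the usual textbook formulas to the same counts: the Pearson statistic, the Yates continuity correction (subtract 0.5 from each absolute difference before squaring), and the likelihood ratio statistic 2 Σ O ln(O / E). The slides do not document SPSS's internal algorithms, so treat this as an assumption that happens to agree with the printed values.

```python
import math

observed = [[2086, 587], [2204, 470]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pair every observed count with its expected count under independence.
cells = [
    (observed[i][j], row_totals[i] * col_totals[j] / n)
    for i in range(2)
    for j in range(2)
]

pearson = sum((o - e) ** 2 / e for o, e in cells)
continuity = sum((abs(o - e) - 0.5) ** 2 / e for o, e in cells)
likelihood_ratio = 2 * sum(o * math.log(o / e) for o, e in cells)

print(f"Pearson chi-square:    {pearson:.3f}")
print(f"Continuity correction: {continuity:.3f}")
print(f"Likelihood ratio:      {likelihood_ratio:.3f}")
```

With a sample this large the three statistics are nearly identical; the continuity correction matters more when counts are small.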
To calculate chi-squares in
SPSS
In SPSS, chi-square tests are run using the following
steps:
1. Click on “Analyze” drop down menu
2. Click on “Descriptive Statistics”
3. Click on “Crosstabs…”
4. Move the variables into the proper boxes (“Row(s)” and “Column(s)”)
5. Click on “Statistics…”
6. Check box beside “Chi-square”
7. Click “Continue”
8. Click “OK”
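Outside SPSS, the crosstabulation that these steps produce can be built by counting case-level records. The sketch below is not SPSS syntax; it is a Python illustration, and the six records are hypothetical data invented for the example.

```python
from collections import Counter

# Hypothetical case-level records: one (gender, underweight) pair per child.
records = [
    ("Male", "no"), ("Male", "yes"), ("Female", "no"),
    ("Female", "no"), ("Male", "no"), ("Female", "yes"),
]

# Each record falls into exactly one cell of the table.
counts = Counter(records)

row_labels = ["Male", "Female"]
col_labels = ["no", "yes"]
table = [[counts[(r, c)] for c in col_labels] for r in row_labels]

for label, row in zip(row_labels, table):
    print(label, row)
```

The resulting list of rows is the same shape of table that the chi-square calculation works on.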
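Reading the Chi-square test
• A significant result (here Asymp. Sig. = .000, i.e. p < .001)
tells us that the two variables are related
• However, it is difficult to get an idea about the
strength of that relationship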
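One way to see why the chi-square value says little about strength: it grows with sample size even when the proportions stay the same. The sketch below compares the deck's table with a hypothetical table in which every count is multiplied by 10; the function name is an assumption made for this illustration.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(len(table))
        for j in range(len(table[0]))
    )

observed = [[2086, 587], [2204, 470]]                       # counts from this deck
scaled = [[cell * 10 for cell in row] for row in observed]  # hypothetical: 10x the sample

# Share of underweight children is identical in both tables.
pct_male = 100 * observed[0][1] / sum(observed[0])
pct_female = 100 * observed[1][1] / sum(observed[1])
print(f"underweight: male {pct_male:.1f}%, female {pct_female:.1f}%")

chi2_original = pearson_chi_square(observed)
chi2_scaled = pearson_chi_square(scaled)
print(f"chi-square: original {chi2_original:.1f}, 10x sample {chi2_scaled:.1f}")
```

The gap between the two groups (22.0% versus 17.6%) is unchanged, yet the statistic is ten times larger, so the percentages in the crosstabulation are a better guide to the size of the difference.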
Important Note:
• If you compare two categorical variables and
at least one has more than two categories, you can
determine which categories differ from
one another by running a Z-test under
“Custom Tables”
• This is rather complicated, so we will not
discuss it in detail
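For orientation only, the sketch below shows a pooled two-proportion z-test on the deck's 2x2 example. It is not a reproduction of the SPSS “Custom Tables” procedure, which is not described in these slides; it only illustrates the kind of comparison a z-test makes between two categories. Variable names are assumptions.

```python
import math

# Counts from the deck's example: underweight children out of all children, by gender.
n_male, uw_male = 2673, 587
n_female, uw_female = 2674, 470

p_male = uw_male / n_male
p_female = uw_female / n_female

# Pooled proportion under the hypothesis that both groups share one rate.
pooled = (uw_male + uw_female) / (n_male + n_female)
se = math.sqrt(pooled * (1 - pooled) * (1 / n_male + 1 / n_female))

z = (p_male - p_female) / se
# Two-sided p-value from the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.3f}, z squared = {z * z:.3f}, p = {p_value:.6f}")
```

For a 2x2 table the square of this z statistic equals the Pearson chi-square, so the two tests agree; with more than two categories, several such pairwise comparisons are needed.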
Now… exercise!