PSYC 502 Lecture 09 Categorical data and tabulation 2012

Nominal/Ordinal/Categorical data
Black Chapter 20
What if the data that we obtain are not interval or ratio level?
• Working with percentages
• Do children with reading disabilities have trouble getting along with their teachers?
• What personality characteristics are associated with harsh parenting?
• Are low-SES children more likely to have psychopathology than those in middle and high SES?
Categorical Data
Do’s and Don’ts
NOT IN BLACK
Tabular presentation of data
Reading level         Public Schools   Private Schools   Religious Schools
Behind grade level    36               24                21
At grade level        33               28                43
Above grade level     14               22                14
Do you have a directional hypothesis?
• If yes: use that to guide your presentation
• If no: think about temporal or logical sequencing
• Present percentages accordingly
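SPSS crosstabs output: presenting percentages
(In each cell: count, % of row (reading level), % of column (school type), % of total.)

Reading level                          Public Schools   Private Schools   Religious Schools   SUM
Behind grade level   Count             36               24                21                  81
                     % of row          44.44%           29.63%            25.93%
                     % of column       43.37%           32.43%            26.92%
                     % of total        15.32%           10.21%            8.94%
At grade level       Count             33               28                43                  104
                     % of row          31.73%           26.92%            41.35%
                     % of column       39.76%           37.84%            55.13%
                     % of total        14.04%           11.91%            18.30%
Above grade level    Count             14               22                14                  50
                     % of row          28.00%           44.00%            28.00%
                     % of column       16.87%           29.73%            17.95%
                     % of total        5.96%            9.36%             5.96%
SUM                  Count             83               74                78                  235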
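The three percentages in each cell of the crosstab can be reproduced from the raw counts. The sketch below is only an illustration in plain Python, not SPSS output; the function name `crosstab_percentages` and the dictionary layout are assumptions made for this example.

```python
# Counts from the reading level by school type table (rows: reading level).
counts = {
    "Behind grade level": {"Public": 36, "Private": 24, "Religious": 21},
    "At grade level": {"Public": 33, "Private": 28, "Religious": 43},
    "Above grade level": {"Public": 14, "Private": 22, "Religious": 14},
}


def crosstab_percentages(table):
    """Return row %, column %, and total % for every cell, plus the totals."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    grand_total = sum(row_totals.values())

    percentages = {}
    for row, cells in table.items():
        for col, n in cells.items():
            percentages[(row, col)] = {
                "row_pct": 100 * n / row_totals[row],
                "col_pct": 100 * n / col_totals[col],
                "total_pct": 100 * n / grand_total,
            }
    return percentages, row_totals, col_totals, grand_total


percentages, row_totals, col_totals, grand_total = crosstab_percentages(counts)
for (row, col), pct in percentages.items():
    print(f"{row:20s} {col:10s} "
          f"row {pct['row_pct']:.2f}%  col {pct['col_pct']:.2f}%  "
          f"total {pct['total_pct']:.2f}%")
```

Which percentage to report depends on the hypothesis: the column percentages (within school type) are the ones that compare schools directly.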
Presenting percentages
Table 1. Distribution of the reading level of children in public, private, and religious schools (N=235)

Reading Level         Public Schools   Private Schools   Religious Schools
Behind grade level    43.4%            32.4%             26.9%
At grade level        39.8%            37.8%             55.1%
Above grade level     16.9%            29.7%             17.9%
N                     83               74                78
Writing about tabular data
• Among the children in public schools, 43% were reading behind their grade level, compared to 32% in private schools and 27% in religious schools. The share of children reading above their grade level differed as well: only 17% in public schools and 18% in religious schools, but 30% in private schools.
Graphical presentation of categorical data
[Figure: chart of the percentage of children at each reading level (Behind, At, Above grade level) in Public, Private, and Religious Schools; percentage axis from 10% to 50%]
Graphical presentation of categorical data
[Figure: chart with a 0% to 100% axis showing the share of children at each reading level (Above, At, Behind grade level) within Public, Private, and Religious Schools]
Graphical presentation of categorical data
[Figure: chart with a 0% to 100% axis showing Public, Private, and Religious Schools across the reading levels (Behind, At, Above grade level)]
Nominal/Ordinal data
Black Chapter 20
Reflection exercise
• Do you see any similarities between t-tests and F-tests, on the one hand, and chi-square tests, on the other?
The concept of observed and expected frequencies
• We may have prior information about how things are in the population.

Children who have reading problems   Population percentages
Yes                                  10%
No                                   90%
The concept of observed and expected frequencies
• We may have prior information about how things are in the population.

Children who have reading problems   Observed   Population percentages
Yes                                  14         10%
No                                   40         90%
What would be observed and expected frequencies
• Compute the expected frequencies

Children who have reading problems   Observed frequencies   Expected frequencies   Expected percentages
Yes                                  14                     5.4                    10%
No                                   40                     48.6                   90%
Total                                54                     54
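As a worked check of the expected column, each expected frequency is the population percentage applied to the sample size of 54:

```latex
\[
E_{\text{Yes}} = 0.10 \times 54 = 5.4,
\qquad
E_{\text{No}} = 0.90 \times 54 = 48.6
\]
```

The expected frequencies add up to the same total as the observed ones (5.4 + 48.6 = 54).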
Is this sample similar to a known population?
• Chi-square test

\chi^2 = \sum_{i=1}^{m} \frac{(O_i - E_i)^2}{E_i}

m = number of cells
df = m - 1
The p-value tests the NULL hypothesis that this sample is similar to the known population. It is the probability of obtaining a chi-square value at least this large if the null hypothesis were true (it is not the probability that the null hypothesis is true).
A p-value greater than the chosen significance level means we fail to reject the null hypothesis: the data are consistent with this sample coming from the known population.
A p-value smaller than the chosen significance level indicates that this sample is coming from a population different from the known population.
Apply the chi-square test

Children who have reading problems   Observed frequencies   Expected frequencies   Expected percentages   Chi square
Yes                                  14                     5.4                    10%                    13.696296
No                                   40                     48.6                   90%                    1.5218107
Total                                54                     54                                            15.218107
p                                                                                                         0.000096
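The table above can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the function names are made up for this example. The p-value helper assumes df = 1, where the upper tail of the chi-square distribution reduces to `erfc(sqrt(chi2 / 2))`; for other degrees of freedom a statistics package would be needed.

```python
import math


def chi_square_goodness_of_fit(observed, population_proportions):
    """One-sample chi-square: compare observed counts to known proportions."""
    n = sum(observed)
    expected = [n * p for p in population_proportions]
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    chi2 = sum(contributions)
    df = len(observed) - 1
    return expected, contributions, chi2, df


def p_value_df1(chi2):
    """Upper-tail probability of the chi-square distribution, df = 1 only."""
    return math.erfc(math.sqrt(chi2 / 2))


# Children with reading problems: 14 yes, 40 no; population: 10% yes, 90% no.
expected, contributions, chi2, df = chi_square_goodness_of_fit([14, 40], [0.10, 0.90])
p = p_value_df1(chi2)

print("expected:", [round(e, 1) for e in expected])
print("cell chi-square:", [round(c, 4) for c in contributions])
print(f"chi-square = {chi2:.4f}, df = {df}, p = {p:.6f}")
```

Because p is far below the usual .05 level, the null hypothesis that this sample resembles the known population is rejected.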
1-sample Chi square test: Application guide
• Write the null hypothesis:
  1. This sample is coming from the known population.
  2. This sample is similar to the known population.
• Write the alternative hypothesis:
  1. This sample is not coming from the known population.
  2. This sample is not similar to the known population.
• Compute "expected" frequencies
• Compute chi-square value
• Obtain the p-value
  • p=CHIDIST(chi-value,df)
• Interpret the p-value
Example 2 - Step 1: Hypotheses
• In the US population, 16% of all women experience depressive symptoms that are clinically meaningful during a year. The numbers below are taken from a sample of Turkish women living in metropolitan areas. To what extent are these women's experiences with depression similar to those of American women?

Women who have clinically significant depressive symptoms   Turkish   US Percentages
Depressive                                                  45        16%
Not depressive                                              105       84%
Step 2: What would be the expected frequencies

Women who have clinically significant depressive symptoms   Turkish   US Percentages   Expected Turkish frequency
Depressive                                                  45        16%              24
Not depressive                                              105       84%              126
Total                                                       150                        150
Step 3: Compute Chi square value

Women who have clinically significant depressive symptoms   Turkish   US Percentages   Expected Turkish frequency   Oi-Ei   (Oi-Ei)**2   Chi square values
Depressive                                                  45        16%              24                           21      441          18.375
Not depressive                                              105       84%              126                          -21     441          3.5
Total                                                       150                        150                                               21.875
Step 4: Compute p value

Women who have clinically significant depressive symptoms   Turkish   US Percentages   Expected Turkish frequency   Oi-Ei   (Oi-Ei)**2   Chi square values
Depressive                                                  45        16%              24                           21      441          18.375
Not depressive                                              105       84%              126                          -21     441          3.5
Total                                                       150                        150                                               21.875
p value                                                                                                                                  2.91E-06 (0.00000 to five decimals)
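Written out term by term, the chi-square value in the table divides each squared difference by its expected frequency and sums over the two cells:

```latex
\[
\chi^2 = \frac{(45 - 24)^2}{24} + \frac{(105 - 126)^2}{126}
       = \frac{441}{24} + \frac{441}{126}
       = 18.375 + 3.5
       = 21.875,
\qquad df = 2 - 1 = 1
\]
```

Both cells have the same squared difference (441), but the "Depressive" cell contributes far more because its expected frequency is much smaller.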
Step 5: Write your conclusion
Comparing two samples - Nominal/ordinal variables
• We may have a hypothesis about an association:
  • Religiosity and sexual orientation are associated

Sexual Orientation   Religiosity: Yes   Religiosity: No
Heterosexual         57                 105
Gay                  13                 27
Bisexual             8                  17

• Null Hypothesis:
  • Religiosity and sexual orientation are not associated.
• Alternative Hypothesis:
  • Religiosity and sexual orientation are associated.
The assumption of independence
• If the two variables were independent, what would have been the distribution of the cases?

Sexual Orientation   Religiosity: Yes   Religiosity: No   Total
Heterosexual         57                 105               162
Gay                  13                 27                40
Bisexual             8                  17                25
Total                78                 149               227
The assumption of independence - Step 1
• If the two variables were independent, what would have been the distribution of the cases?

Sexual Orientation   Religiosity: Yes   Religiosity: No   Total   Percent sexual orientation
Heterosexual         57                 105               162     71.37%
Gay                  13                 27                40      17.62%
Bisexual             8                  17                25      11.01%
Total                78                 149               227
Percent religious    34.36%             65.64%

Marginal distribution: the "Percent religious" row and the "Percent sexual orientation" column.

What would be the probability that a person is heterosexual AND religious?
p(Heterosexual & religious) = 71.37%*34.36% = 24.52%
How many people would be heterosexual AND religious?
N(Heterosexual & religious) = 24.52%*227 = 55.7
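The two steps above (multiply the marginal proportions, then multiply by N) collapse into a single expression: the expected count of a cell is its row total times its column total, divided by N. For the heterosexual and religious cell:

```latex
\[
E = N \cdot \frac{162}{227} \cdot \frac{78}{227}
  = \frac{162 \times 78}{227}
  \approx 55.67
\]
```

The same shortcut applies to every other cell of the table with its own row and column totals.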
Expected frequencies under the assumption of independence - Step 2
• Calculate what is the percent of cases expected in each cell, under the assumption of independence

Observed frequencies
Sexual Orientation      Religiosity: Yes   Religiosity: No   Total   Marginal distribution
Heterosexual            57                 105               162     71.37%
Gay                     13                 27                40      17.62%
Bisexual                8                  17                25      11.01%
Total                   78                 149               227
Marginal distribution   34.36%             65.64%

Percent in each cell expected
Sexual Orientation      Yes      No
Heterosexual            24.52%   46.84%
Gay                     6.05%    11.57%
Bisexual                3.78%    7.23%
Expected frequencies under the assumption of independence - Step 3
• Calculate what is the number of cases expected in each cell, under the assumption of independence

Observed frequencies
Sexual Orientation      Religiosity: Yes   Religiosity: No   Total   Marginal distribution
Heterosexual            57                 105               162     71.37%
Gay                     13                 27                40      17.62%
Bisexual                8                  17                25      11.01%
Total                   78                 149               227
Marginal distribution   34.36%             65.64%

Percent in each cell expected
Sexual Orientation      Yes      No
Heterosexual            24.52%   46.84%
Gay                     6.05%    11.57%
Bisexual                3.78%    7.23%

Frequency in each cell expected
Sexual Orientation      Yes        No
Heterosexual            55.6652    106.3348
Gay                     13.74449   26.25551
Bisexual                8.590308   16.40969
Chi square values for each cell - Step 4

Observed frequencies
Sexual Orientation      Religiosity: Yes   Religiosity: No   Total   Marginal distribution
Heterosexual            57                 105               162     71.37%
Gay                     13                 27                40      17.62%
Bisexual                8                  17                25      11.01%
Total                   78                 149               227
Marginal distribution   34.36%             65.64%

Percent in each cell expected
Sexual Orientation      Yes      No
Heterosexual            24.52%   46.84%
Gay                     6.05%    11.57%
Bisexual                3.78%    7.23%

Frequency in each cell expected
Sexual Orientation      Yes        No
Heterosexual            55.6652    106.3348
Gay                     13.74449   26.25551
Bisexual                8.590308   16.40969
Total                                         227

Chi-square for each cell
Sexual Orientation      Yes        No
Heterosexual            0.032007   0.016756
Gay                     0.040327   0.021111
Bisexual                0.040565   0.021235
Sum                                           0.172
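Steps 1 through 4 can be sketched in a few lines of plain Python. This is an illustrative sketch, not the spreadsheet used on the slides; the helper names `expected_counts` and `cell_chi_squares` are made up for this example.

```python
# Observed counts: rows are Heterosexual, Gay, Bisexual; columns are Yes, No.
observed = [
    [57, 105],
    [13, 27],
    [8, 17],
]


def expected_counts(table):
    """Expected count per cell under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def cell_chi_squares(table):
    """Contribution of each cell to chi-square: (O - E)^2 / E."""
    expected = expected_counts(table)
    return [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]


expected = expected_counts(observed)
cells = cell_chi_squares(observed)
chi2 = sum(sum(row) for row in cells)

for label, exp_row, chi_row in zip(["Heterosexual", "Gay", "Bisexual"], expected, cells):
    print(label, [round(e, 4) for e in exp_row], [round(c, 6) for c in chi_row])
print(f"chi-square = {chi2:.3f}")
```

A chi-square this close to zero means the observed counts are almost exactly what independence predicts.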
Is the observed distribution conforming to independence? Two sample chi-square test
• Chi-square test

\chi^2 = \sum_{i=1}^{mn} \frac{(O_i - E_i)^2}{E_i}

The sum runs over all m*n cells of the table.
df=(m-1)*(n-1)
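The p-value gives us the probability of obtaining a chi-square value at least this large if the hypothesis of independence were true (it is not the probability that the hypothesis of independence is true).
A p-value greater than the chosen significance level means we fail to reject independence: the data are consistent with the two variables of interest being independent.
A p-value smaller than the chosen significance level indicates that the two variables of interest are associated.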
Example - chi-square: Step 1

                     Public Schools   Private Schools   Religious Schools   Total
Behind grade level   36               24                21                  81
At grade level       33               28                43                  104
Above grade level    14               22                14                  50
Total                83               74                78                  235
Step 2

                     Public Schools   Private Schools   Religious Schools   Total   Proportion
Behind grade level   36               24                21                  81      0.344681
At grade level       33               28                43                  104     0.442553
Above grade level    14               22                14                  50      0.212766
Total                83               74                78                  235
Proportion           0.353191         0.314894          0.331915
Step 3

                     Public Schools   Private Schools   Religious Schools   Total   Proportion
Behind grade level   36               24                21                  81      0.344681
At grade level       33               28                43                  104     0.442553
Above grade level    14               22                14                  50      0.212766
Total                83               74                78                  235
Proportion           0.353191         0.314894          0.331915

Expected %           Public Schools   Private Schools   Religious Schools
Behind grade level   0.121738         0.108538          0.114405
At grade level       0.156306         0.139357          0.14689
Above grade level    0.075147         0.066999          0.07062
Sum                                                                         1
Step 4

Expected %           Public Schools   Private Schools   Religious Schools
Behind grade level   0.121738         0.108538          0.114405
At grade level       0.156306         0.139357          0.14689
Above grade level    0.075147         0.066999          0.07062
Sum                                                                         1

Expected #           Public Schools   Private Schools   Religious Schools
Behind grade level   28.60851         25.50638          26.88511
At grade level       36.73191         32.74894          34.51915
Above grade level    17.65957         15.74468          16.59574
Step 5

Observed             Public Schools   Private Schools   Religious Schools
Behind grade level   36               24                21
At grade level       33               28                43
Above grade level    14               22                14

Expected #           Public Schools   Private Schools   Religious Schools
Behind grade level   28.60851         25.50638          26.88511
At grade level       36.73191         32.74894          34.51915
Above grade level    17.65957         15.74468          16.59574

Chi                  Public Schools   Private Schools   Religious Schools
Behind grade level   1.909715         0.088966          1.28824
At grade level       0.379158         0.688645          2.083621
Above grade level    0.75837          2.485221          0.406001
Step 6

Chi                  Public Schools   Private Schools   Religious Schools
Behind grade level   1.909715         0.088966          1.28824
At grade level       0.379158         0.688645          2.083621
Above grade level    0.75837          2.485221          0.406001

Chi-square = 10.08794
df = 4
p = 0.038972
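Steps 1 through 6 can be run end to end as follows. This is a sketch under stated assumptions: it uses only the Python standard library, the function names are invented for this example, and the p-value helper relies on a closed form that holds only when df is even (which covers the df = 4 case here). For arbitrary df a statistics package would be the usual choice.

```python
import math


def chi_square_independence(table):
    """Chi-square statistic and df for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for row, r_total in zip(table, row_totals):
        for observed, c_total in zip(row, col_totals):
            expected = r_total * c_total / n  # expected count under independence
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


def chi2_upper_tail_even_df(chi2, df):
    """P(X >= chi2) for a chi-square variable; valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here requires a positive even df")
    half = chi2 / 2
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (chi2/2)^k / k!
        total += term
    return math.exp(-half) * total


# Rows: Behind, At, Above grade level; columns: Public, Private, Religious.
schools = [
    [36, 24, 21],
    [33, 28, 43],
    [14, 22, 14],
]
chi2, df = chi_square_independence(schools)
p = chi2_upper_tail_even_df(chi2, df)
print(f"chi-square = {chi2:.5f}, df = {df}, p = {p:.6f}")
```

With p below .05, reading level and school type would be judged associated at the conventional significance level.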
What if the sample size was 150?

                     Public Schools   Private Schools   Religious Schools
Behind grade level   23               15                13
At grade level       21               18                27
Above grade level    9                14                9

Same percentages, different N!
What if the sample size was 150?

Chi                  Public Schools   Private Schools   Religious Schools
Behind grade level   1.218967         0.056787          0.822281
At grade level       0.242016         0.439561          1.329971
Above grade level    0.484066         1.586312          0.25915

Chi-square = 6.439109
df = 4
p = 0.168668

Same percentages, different N!
What if the sample size was 1500?

                     Public Schools   Private Schools   Religious Schools
Behind grade level   230              153               134
At grade level       211              179               274
Above grade level    89               140               89

Same percentages, different N!
What if the sample size was 1500?

Chi                  Public Schools   Private Schools   Religious Schools
Behind grade level   12.18967         0.567865          8.22281
At grade level       2.420156         4.395607          13.29971
Above grade level    4.840657         15.86312          2.591496

Chi-square = 64.39109
df = 4
p = 0.00000

Same percentages, different N!
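The effect of sample size can be checked by rescaling the N=235 table while holding every cell percentage fixed. The sketch below is an illustration, not the slides' own calculation: it scales the counts proportionally, so cells are fractional rather than the rounded counts shown above, and the chi-square values on the slides appear to have been obtained the same way. The names `rescale` and `p_value_df4` are assumptions for this example, and the p-value formula is the closed form for df = 4 only.

```python
import math


def chi_square(table):
    """Chi-square statistic for a two-way table (cells may be fractional)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (obs - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_totals)
        for obs, c in zip(row, col_totals)
    )


def p_value_df4(chi2):
    """Upper-tail chi-square probability; closed form valid for df = 4 only."""
    return math.exp(-chi2 / 2) * (1 + chi2 / 2)


def rescale(table, new_n):
    """Scale all cells so the table sums to new_n with unchanged percentages."""
    old_n = sum(sum(row) for row in table)
    return [[cell * new_n / old_n for cell in row] for row in table]


schools = [
    [36, 24, 21],
    [33, 28, 43],
    [14, 22, 14],
]

results = {}
for n in (150, 235, 1500):
    chi2 = chi_square(rescale(schools, n))
    results[n] = (chi2, p_value_df4(chi2))
    print(f"N = {n:4d}: chi-square = {chi2:.5f}, p = {results[n][1]:.6f}")
```

With the percentages held constant, chi-square grows in direct proportion to N, so the same pattern is non-significant at N=150 and highly significant at N=1500.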