Chapter 6 Categorical Variables Two

Basic Practice of Statistics - 3rd Edition
Chapter 6
Two-Way Tables
BPS - 3rd Ed.
Chapter 6
Categorical Variables
• In this chapter we will study the relationship between two categorical variables (variables whose values fall into groups or categories).
• To analyze categorical data, use the counts or percents of individuals that fall into the various categories.
Two-Way Table
• When there are two categorical variables, the data are summarized in a two-way table
  – each row of the table represents a value of the row variable
  – each column of the table represents a value of the column variable
• The number of observations falling into each combination of categories is entered into the corresponding cell of the table
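As a minimal sketch of how such a table is tallied, the Python snippet below counts each combination of categories. The six observations are hypothetical and exist only to illustrate the tally; the names `observations` and `table` are likewise assumptions made for this example.

```python
from collections import Counter

# Hypothetical observations (illustration only): one (row value, column value)
# pair per individual.
observations = [
    ("Men", "Accepted"),
    ("Men", "Not accepted"),
    ("Women", "Accepted"),
    ("Men", "Accepted"),
    ("Women", "Not accepted"),
    ("Women", "Accepted"),
]

# Each cell of the two-way table counts one combination of categories.
cell_counts = Counter(observations)

row_values = ["Men", "Women"]
col_values = ["Accepted", "Not accepted"]

# Store the table as a nested dict: table[row][column] -> count.
table = {r: {c: cell_counts[(r, c)] for c in col_values} for r in row_values}

for r in row_values:
    print(r, [table[r][c] for c in col_values])
```

A nested dict keeps the row variable and the column variable explicit, which makes the later row and column totals easy to compute.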
Marginal Distributions
• A distribution for a categorical variable tells how often each outcome occurred
  – totaling the values in each row of the table gives the marginal distribution of the row variable (totals are written in the right margin)
  – totaling the values in each column of the table gives the marginal distribution of the column variable (totals are written in the bottom margin)
Marginal Distributions
• It is usually more informative to display each marginal distribution in terms of percents rather than counts
  – each marginal total is divided by the table total to give the percents
• A bar graph can be used to display the marginal distributions of categorical variables
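The calculation can be sketched in Python as follows. The counts are taken from the college admissions example that appears later in this chapter; the variable names and the helper `to_percents` are assumptions made for this illustration, not part of the text.

```python
# Counts from the college admissions example later in this chapter.
table = {
    "Men":   {"Accepted": 198, "Not accepted": 162},
    "Women": {"Accepted": 88,  "Not accepted": 112},
}

# Row totals go in the right margin, column totals in the bottom margin.
row_totals = {row: sum(cells.values()) for row, cells in table.items()}
columns = list(next(iter(table.values())))
col_totals = {col: sum(table[row][col] for row in table) for col in columns}
table_total = sum(row_totals.values())


def to_percents(totals, grand_total):
    """Divide each marginal total by the table total and express it as a percent."""
    return {key: round(100 * value / grand_total, 1) for key, value in totals.items()}


row_marginal = to_percents(row_totals, table_total)
col_marginal = to_percents(col_totals, table_total)

print(row_totals, col_totals, table_total)
print(row_marginal)
print(col_marginal)
```

With these counts the row totals are 360 and 200, the column totals are 286 and 274, and the table total is 560, matching the margins of the admissions table.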
Case Study
Age and Education
(Statistical Abstract of the United States, 2001)
Data from the U.S. Census Bureau for the year 2000 on the level of education reached by Americans of different ages.
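Case Study
Age and Education
[Table: counts of persons (in thousands) by education level and age group. The two variables label the rows and columns; the marginal distributions are the totals in the right and bottom margins.]
Marginal distribution of education level: 15.9%, 33.1%, 25.4%, 25.6%
Marginal distribution of age group: 21.6%, 46.5%, 32.0%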
Case Study
Age and Education
Marginal Distribution for Education Level (bar graph)
Not HS grad:      15.9%
HS grad:          33.1%
College 1-3 yrs:  25.4%
College ≥4 yrs:   25.6%
Conditional Distributions
• Relationships between categorical variables are described by calculating appropriate percents from the counts given in the table
  – using percents prevents misleading comparisons due to unequal sample sizes for different groups
Case Study
Age and Education
Compare the 25-34 age group to the 35-54 age group in terms of success in completing at least 4 years of college:
Data are in thousands, so 11,071,000 persons in the 25-34 age group have completed at least 4 years of college, compared to 23,160,000 persons in the 35-54 age group.
The groups appear greatly different, but look at the group totals.
Case Study
Age and Education
Compare the 25-34 age group to the 35-54 age group in terms of success in completing at least 4 years of college:
Change the counts to percents:
11,071 / 37,786 = 0.293 (29.3%) for the 25-34 age group
23,160 / 81,435 = 0.284 (28.4%) for the 35-54 age group
Now, with a fairer comparison using percents, the groups appear very similar.
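The same arithmetic can be checked with a few lines of Python. This is only a sketch; the counts (in thousands) come from the case study, while the dictionary names are assumptions made for the example.

```python
# Counts in thousands, as given in the case study.
college_4yr = {"25-34": 11_071, "35-54": 23_160}
group_total = {"25-34": 37_786, "35-54": 81_435}

# Divide each count by its own group total so that groups of different
# sizes can be compared fairly.
shares = {age: college_4yr[age] / group_total[age] for age in college_4yr}

for age, share in shares.items():
    print(f"{age}: {share:.3f} ({share:.1%})")
```

Dividing by each group's own total, rather than comparing raw counts, is what removes the effect of the 35-54 group being more than twice as large.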
Case Study
Age and Education
If we compute the percent completing at least four years of college within each of the age groups, we get one entry from each conditional distribution of education, given the age group. Because each percent is computed within a different age group, the three values need not add to 100%:
Age:                           25-34    35-54    55 and over
Percent with ≥ 4 yrs college:  29.3%    28.4%    18.9%
Conditional Distributions
• The conditional distribution of one variable can be calculated for each category of the other variable.
• These can be displayed using bar graphs.
• If the conditional distributions of the second variable are nearly the same for each category of the first variable, then we say that there is not an association between the two variables.
• If there are significant differences in the conditional distributions for each category, then we say that there is an association between the two variables.
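A short Python sketch of this comparison, using the admissions counts from the Simpson's Paradox example later in this chapter. The function name `conditional_distribution` and the variable `gap` are assumptions made for illustration; the text does not prescribe how large a difference must be to count as an association, so the sketch only reports the difference.

```python
# Counts from the college admissions example later in this chapter.
table = {
    "Men":   {"Accepted": 198, "Not accepted": 162},
    "Women": {"Accepted": 88,  "Not accepted": 112},
}


def conditional_distribution(cells):
    """Percents within one row: each cell divided by that row's total."""
    total = sum(cells.values())
    return {key: round(100 * value / total, 1) for key, value in cells.items()}


# Conditional distribution of the outcome, given each category of the row variable.
conditionals = {row: conditional_distribution(cells) for row, cells in table.items()}

for row, dist in conditionals.items():
    print(row, dist)

# Difference in percent accepted between the two groups (in percentage points).
gap = conditionals["Men"]["Accepted"] - conditionals["Women"]["Accepted"]
print(f"gap: {gap:.1f} percentage points")
```

Each conditional distribution adds to 100% within its own row, so the two rows can be compared directly even though 360 men and only 200 women applied.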
Case Study
Age and Education
[Figure: bar graphs of the conditional distributions of age for each level of education]
Simpson’s Paradox
• When studying the relationship between two variables, there may be a lurking variable that reverses the direction of the relationship: the direction seen when the lurking variable is ignored is the opposite of the direction seen when the lurking variable is considered.
• The lurking variable creates subgroups, and failure to take these subgroups into consideration can lead to misleading conclusions regarding the association between the two variables.
Discrimination?
(Simpson’s Paradox)
Consider the acceptance rates for the following group of men and women who applied to college.

Counts     Accepted   Not accepted   Total
Men        198        162            360
Women       88        112            200
Total      286        274            560

Percents   Accepted   Not accepted
Men        55%        45%
Women      44%        56%

A higher percentage of men were accepted: Discrimination?
Discrimination?
(Simpson’s Paradox)
Lurking variable: Applications were split between the Business School (240) and the Art School (320).

BUSINESS SCHOOL
Counts     Accepted   Not accepted   Total
Men         18        102            120
Women       24         96            120
Total       42        198            240

Percents   Accepted   Not accepted
Men        15%        85%
Women      20%        80%

A higher percentage of women were accepted in Business
Discrimination?
(Simpson’s Paradox)
Lurking variable: Applications were split between the Business School (240) and the Art School (320).

ART SCHOOL
Counts     Accepted   Not accepted   Total
Men        180         60            240
Women       64         16             80
Total      244         76            320

Percents   Accepted   Not accepted
Men        75%        25%
Women      80%        20%

A higher percentage of women were also accepted in Art
Discrimination?
(Simpson’s Paradox)
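• So within each school a higher percentage of women than of men were accepted. There is no sign of discrimination against women within either school!
• This is an example of Simpson’s Paradox. When the lurking variable (school applied to: Business or Art) is ignored, the data seem to suggest discrimination against women. However, when the school is considered, the association is reversed and suggests discrimination against men.
• The reversal occurs because most men (240 of 360) applied to the Art School, which accepted 244 of its 320 applicants (about 76%), while most women (120 of 200) applied to the Business School, which accepted only 42 of its 240 applicants (17.5%).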
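The reversal can be verified numerically. The Python sketch below uses the Business School and Art School counts from this example; the data layout and the names `schools`, `rate`, `within`, and `pooled` are assumptions made for the illustration.

```python
# (accepted, not accepted) counts by school and gender, from the example above.
schools = {
    "Business": {"Men": (18, 102), "Women": (24, 96)},
    "Art":      {"Men": (180, 60), "Women": (64, 16)},
}


def rate(accepted, not_accepted):
    """Acceptance rate as a proportion of all applicants in the group."""
    return accepted / (accepted + not_accepted)


# Acceptance rates within each school (lurking variable considered).
within = {
    school: {gender: rate(*counts) for gender, counts in by_gender.items()}
    for school, by_gender in schools.items()
}

# Acceptance rates with the schools pooled (lurking variable ignored).
pooled = {}
for gender in ("Men", "Women"):
    accepted = sum(schools[s][gender][0] for s in schools)
    not_accepted = sum(schools[s][gender][1] for s in schools)
    pooled[gender] = rate(accepted, not_accepted)

women_higher_in_each_school = all(
    within[s]["Women"] > within[s]["Men"] for s in schools
)
men_higher_overall = pooled["Men"] > pooled["Women"]

print(within)
print(pooled)
print(women_higher_in_each_school, men_higher_overall)
```

Both flags come out true for these counts: women have the higher acceptance rate in each school, yet men have the higher rate once the schools are pooled, which is exactly the reversal that defines Simpson’s Paradox.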