CHAPTER 1 Exploring Data

CHAPTER 1
Exploring Data
1.1
Analyzing Categorical
Data
The Practice of Statistics, 5th Edition
Starnes, Tabor, Yates, Moore
Bedford Freeman Worth Publishers
Analyzing Categorical Data
Learning Objectives
After this section, you should be able to:
 CALCULATE and DISPLAY the marginal distribution of a
categorical variable from a two-way table
 CALCULATE and DISPLAY the conditional distribution of a
categorical variable for a particular value of the other categorical
variable in a two-way table
 DESCRIBE the association between two categorical variables
The Practice of Statistics, 5th Edition
2
Two-Way Tables and Marginal Distributions
When a dataset involves two categorical variables, we begin by
examining the counts or percents in various categories for one of the
variables.
A two-way table describes two categorical variables,
organizing counts according to a row variable and a
column variable.
Young adults by gender and chance of getting rich
Female
Male
Total
Almost no chance
96
98
194
Some chance, but probably not
426
286
712
A 50-50 chance
696
720
1416
A good chance
663
758
1421
Almost certain
486
597
1083
Total
2367
2459
4826
The Practice of Statistics, 5th Edition
What are the variables
described by this
two-way table?
How many young
adults were surveyed?
3
Two-Way Tables and Marginal Distributions
The marginal distribution of one of the categorical variables in a twoway table of counts is the distribution of values of that variable among
all individuals described by the table.
Note: Percents are often more informative than counts, especially
when comparing groups of different sizes.
How to examine a marginal distribution:
1)Use the data in the table to calculate the marginal
distribution (in percents) of the row or column totals.
2)Make a graph to display the marginal distribution.
The Practice of Statistics, 5th Edition
4
Two-Way Tables and Marginal Distributions
Examine the marginal
distribution of chance
of getting rich.
Female
Male
Total
Almost no chance
96
98
194
Some chance, but probably not
426
286
712
A 50-50 chance
696
720
1416
A good chance
663
758
1421
Almost certain
486
597
1083
Total
2367
2459
4826
Chance of being wealthy by age 30
Percent
35
Almost no
chance
194/4826 = 4.0%
30
Some chance
712/4826 = 14.8%
A 50-50 chance
1416/4826 = 29.3%
A good chance
1421/4826 = 29.4%
25
Percent
Response
Young adults by gender and chance of getting rich
20
15
10
5
0
Almost certain
1083/4826 = 22.4%
Almost
none
Some
chance
50-50
chance
Good
chance
Almost
certain
Survey Response
The Practice of Statistics, 5th Edition
5
Relationships Between Categorical Variables
A conditional distribution of a variable describes the values of that
variable among individuals who have a specific value of another
variable.
How to examine or compare conditional distributions:
1) Select the row(s) or column(s) of interest.
2) Use the data in the table to calculate the conditional
distribution (in percents) of the row(s) or column(s).
3) Make a graph to display the conditional distribution.
• Use a side-by-side bar graph or segmented bar
graph to compare distributions.
The Practice of Statistics, 5th Edition
6
Relationships Between Categorical Variables
Calculate the conditional
distribution of opinion
among males. Examine the
relationship between gender
and opinion.
Young adults by gender and chance of getting rich
Female
Male
Total
Almost no chance
96
98
194
Some chance, but probably not
426
286
712
A 50-50 chance
696
720
1416
A good chance
663
758
1421
Almost certain
486
597
1083
Total
2367
2459
Chance of being wealthy
by
age 4826
30
Response
Male
Female
Almost no chance
98/2459 =
4.0%
96/2367 =
4.1%
90%
286/2459 =
11.6%
426/2367 =
18.0%
70%
720/2459 =
29.3%
696/2367 =
29.4%
758/2459 =
30.8%
663/2367 =
28.0%
597/2459 =
24.3%
486/2367 =
20.5%
A 50-50 chance
A good chance
Almost certain
80%
Percent
Some chance
100%
Almost certain
60%
50%
Good chance
40%
30%
50-50 chance
20%
10%
Some chance
0%
Males
Opinion
The Practice of Statistics,
5th
Edition
Females
Almost no
chance
7
Relationships Between Categorical Variables
Making this determination
requires formal inference, which
will have to wait a few chapters.
Chance of being wealthy by age 30
Percent
Can we say there is an
association between gender and
opinion in the population of
young adults?
100%
90%
80%
70%
60%
50%
40%
30%
20%
10%
0%
Almost certain
Good chance
50-50 chance
Some chance
Males
Opinion
Females
Almost no
chance
Caution!
Even a strong association between two categorical variables can
be influenced by other variables lurking in the background.
The Practice of Statistics, 5th Edition
8
Cell Phones
The Pew Research Center asked a random sample of 2024 adult cell
phone owners from the United States which type of cell phone they
own: iPhone, Android, or other (including non-smart phones). Here are
the results, broken down by age category.
• http://pewinternet.org/Reports/2013/Smartphone-Ownership2013.aspx
iPhone
Android
Other
Total
18-34
169
214
134
517
35-54
171
189
277
637
55+
127
100
643
870
Total
467
503
1054
2024
What are the
variables
described by
this two way
table?
Age and type of
phone
The Practice of Statistics, 5th Edition
9
Cell Phones
(a) Use the cell phone data to calculate the marginal distribution (in
percents) of type of cell phone.
(b) Make a graph to display the marginal distribution. Describe what
you see.
Age
Type
of
Phone
iPhone
Android
Other
Total
18-34
169
214
134
517
35-54
171
189
277
637
55+
127
100
643
870
Total
467
503
1054
2024
467
= 23%
2024
503
android:
= 25%
2024
1054
other:
= 52%
2024
iphone:
The percent who own
iPhones and Androids are
about the same, but more
than half of cell phone
owners do not own either
an iPhone or an Android.
The Practice of Statistics, 5th Edition
10
Cell Phones
(c) Use the cell phone data to calculate the conditional distribution of
cell phone type for the 18- to 34-year-olds.
Age
Type
of
Phone
iPhone
Android
Other
Total
18-34
169
214
134
517
The Practice of Statistics, 5th Edition
35-54
171
189
277
637
55+
127
100
643
870
Total
467
503
1054
2024
169
= 32.7%
517
214
android:
= 41.4%
517
134
other:
= 25.9%
517
iphone:
11
Cell Phones
(d) Based on the cell phone data, is there evidence of a difference in
the distribution of cell phone ownership for the three age groups?
Justify your answer.
Age
Type
of
Phone
iPhone
Android
Other
Total
18-34
32.7%
41.4%
25.9%
100%
35-54
26.8%
29.7%
43.5%
100%
55+
14.6%
11.5%
73.9%
100%
Based on the sample data, more than half of the
18- to 34-year-olds and 35- to 54-year-olds own
an iPhone or an Android, while much less than
half of the 55-and-older group own one of these
types. Furthermore, younger people tend to
prefer the Android to the iPhone, while older
people show a slight preference for the iPhone
over the Android.
The Practice of Statistics, 5th Edition
12
Data Analysis: Making Sense of Data
Section Summary
In this section, we learned how to…
 DISPLAY categorical data with a bar graph
 IDENTIFY what makes some graphs of categorical data
deceptive
 CALCULATE and DISPLAY the marginal distribution of a
categorical variable from a two-way table
 CALCULATE and DISPLAY the conditional distribution of a
categorical variable for a particular value of the other categorical
variable in a two-way table
 DESCRIBE the association between two categorical variables
The Practice of Statistics, 5th Edition
13
PAGE 22
20-26 EVEN
Homework
The Practice of Statistics, 5th Edition
14