Unit 2: Exploring Data
2.1 Variables and their Graphs
Name______________________
Lab: Questions on Backs
Student Survey
Categorical vs Quantitative Variables
What is the difference between a categorical and a quantitative variable?
Do we ever use numbers to describe the values of a categorical variable? Give some examples.
What is a distribution?
Example: US Census Data
Here is information about 10 randomly selected US residents from the 2000 census.
State | Number of Family Members | Age | Gender | Marital Status | Total Income | Travel time to work
Kentucky | 2 | 61 | Female | Married | 21000 | 20
Florida | 6 | 27 | Female | Married | 21300 | 20
Wisconsin | 2 | 27 | Male | Married | 30000 | 5
California | 4 | 33 | Female | Married | 26000 | 10
Michigan | 3 | 49 | Female | Married | 15100 | 25
Virginia | 3 | 26 | Female | Married | 25000 | 15
Pennsylvania | 4 | 44 | Male | Married | 43000 | 10
Virginia | 4 | 22 | Male | Never married/single | 3000 | 0
California | 1 | 30 | Male | Never married/single | 40000 | 15
New York | 4 | 34 | Female | Separated | 30000 | 40
(a) Who are the individuals in this data set?
(b) What variables are measured? Identify each as categorical or quantitative. In what units were the
quantitative variables measured?
(c) Describe the individual in the first row.
Graphs of Data:
Categorical
Quantitative
[Chapter 3]
2.2 Analyzing and Displaying Categorical Data
What graphs are used for categorical data?
Bar Graph:
Pie Graph:
What is the most important thing to remember when making pie charts and bar graphs? Why do statisticians
prefer bar graphs?
Segmented Bar Graph:
[Figure: blank segmented bar graph; vertical axis "Percent" from 0 to 1 in steps of 0.1; bars for Male and Female]
Frequency and Relative Frequency Tables:
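Color | Freq. | Rel. Freq. | Percent
Blue | 13 | _____ | _____
Red | 7 | _____ | _____
Orange | 11 | _____ | _____
Green | 9 | _____ | _____
Yellow | 8 | _____ | _____
Brown | 7 | _____ | _____
TOTAL | 55 | 1.000 | 100%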
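One way to check the Rel. Freq. and Percent columns is a short Python sketch like the one below. It uses the frequencies from the color table; the function name `relative_frequencies` is made up for this illustration.

```python
# Frequencies copied from the color table.
freq = {"Blue": 13, "Red": 7, "Orange": 11, "Green": 9, "Yellow": 8, "Brown": 7}

def relative_frequencies(counts):
    """Return {category: count / total} for a frequency table."""
    n = sum(counts.values())
    return {category: count / n for category, count in counts.items()}

total = sum(freq.values())
rel = relative_frequencies(freq)

# Print one row per color: frequency, relative frequency, percent.
for color, r in rel.items():
    print(f"{color:<7}{freq[color]:>4}{r:>8.3f}{r:>8.1%}")
print(f"{'TOTAL':<7}{total:>4}{sum(rel.values()):>8.3f}{sum(rel.values()):>8.1%}")
```

Each relative frequency is the count divided by the total of 55, and the percent is the same number times 100.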
What are some common ways to make a misleading graph?
What is wrong with these graphs?
Two-way tables:
What is a contingency (two-way) table?
[Blank two-way table to fill in: rows Male, Female, TOTAL; final column TOTAL]
What is a marginal distribution?
What is a conditional distribution?
The conditional distribution of political preference, conditional on being male:
       | Liberal | Moderate | Conservative | TOTAL
Male   | _____ | _____ | _____ | _____
The conditional distribution of political preference, conditional on being female:
       | Liberal | Moderate | Conservative | TOTAL
Female | _____ | _____ | _____ | _____
What is the conditional relative frequency distribution of gender among conservatives?
Classwork: Transportation and Gender
[Chapter 4-5]
2.3 Analyzing and Displaying Quantitative Data
What graphs are used to display quantitative data?
Dotplots:
Stemplots (stem and leaf):
Example: Make a stemplot for the following data.
The following data are price per ounce for various brands of dandruff shampoo at a local grocery store.
0.32 0.21 0.29 0.54 0.17 0.28 0.36 0.23
Can you make a stemplot with this data?
What is the most important thing to remember when making a stemplot?
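As a check on a hand-drawn stemplot of the shampoo prices, here is a minimal Python sketch. It assumes the stem is the tenths digit and the leaf is the hundredths digit; the function name `stemplot` and the `scale` parameter are made up for this illustration.

```python
from collections import defaultdict

prices = [0.32, 0.21, 0.29, 0.54, 0.17, 0.28, 0.36, 0.23]

def stemplot(values, scale=100):
    """Return stemplot lines; stem = tenths digit, leaf = hundredths digit."""
    # Work in whole cents to avoid floating-point surprises.
    cents = sorted(round(v * scale) for v in values)
    leaves = defaultdict(list)
    for c in cents:
        leaves[c // 10].append(c % 10)
    lines = []
    # List every stem from smallest to largest, including empty stems.
    for stem in range(cents[0] // 10, cents[-1] // 10 + 1):
        row = f"{stem} | " + " ".join(str(d) for d in leaves[stem])
        lines.append(row.rstrip())
    return lines

lines = stemplot(prices)
print("\n".join(lines))
print("Key: 1 | 7 means $0.17 per ounce")
```

The empty stem 4 is kept so the gap between 0.36 and 0.54 stays visible, and the key tells the reader how to read a stem and leaf.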
Back-to-Back Stemplots:
Example: Tobacco use in G-rated Movies
Total tobacco exposure time (in seconds) for Disney movies:
223 176 548 37 158 51 299 37 11 165 74 9
Total tobacco exposure time (in seconds) for other studios’ movies:
205 162 6 1 117 5 91 155 24 55 17
Make a back-to-back stemplot.
Boxplots:
Example: We will use the following data representing tornadoes per year in Oklahoma from 1995 until
2004 (Sullivan, 2nd edition, p. 167) to construct a modified boxplot.
79 47 55 83 145 44 61 18 78
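A modified boxplot needs the five-number summary and the 1.5 x IQR outlier fences. The Python sketch below computes them for the nine tornado counts as listed here. It assumes the quartiles are the medians of the lower and upper halves, leaving out the overall median when the count is odd; other quartile conventions can give slightly different values. The function names are made up for this illustration.

```python
from statistics import median

tornadoes = [79, 47, 55, 83, 145, 44, 61, 18, 78]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    # Skip the middle value when n is odd.
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def outlier_fences(q1, q3):
    """Return the lower and upper 1.5 x IQR fences."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

low, q1, med, q3, high = five_number_summary(tornadoes)
lo_fence, hi_fence = outlier_fences(q1, q3)
outliers = [x for x in tornadoes if x < lo_fence or x > hi_fence]

print("Five-number summary:", low, q1, med, q3, high)
print("Fences:", lo_fence, hi_fence)
print("Outliers:", outliers)
```

In a modified boxplot, each outlier is drawn as a separate point and the whiskers stop at the most extreme values that are not outliers.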
Describing Distributions:
Briefly describe/illustrate the following distribution shapes:
Symmetric
Skewed right
Skewed left
Unimodal
Bimodal
Uniform
Identify the shape of the following distributions:
62
Example: Smart Phone Battery Life
Here is the estimated battery life for each of 9 different smart phones (in minutes). Make a graph of the data
and describe what you see.
Smart Phone | Battery Life (minutes)
Apple iPhone | 300
Motorola Droid | 385
Palm Pre | 300
Blackberry Bold | 360
Blackberry Storm | 330
Motorola Cliq | 360
Samsung Moment | 330
Blackberry Tour | 300
HTC Droid | 460
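To check a hand-drawn graph of the battery life data, the Python sketch below prints a rough text dotplot (one star per phone) and computes the mean and median. The rows are not spaced to scale, so a dotplot on a number line will show the gaps more clearly.

```python
from collections import Counter
from statistics import mean, median

battery = {
    "Apple iPhone": 300, "Motorola Droid": 385, "Palm Pre": 300,
    "Blackberry Bold": 360, "Blackberry Storm": 330, "Motorola Cliq": 360,
    "Samsung Moment": 330, "Blackberry Tour": 300, "HTC Droid": 460,
}

values = list(battery.values())
counts = Counter(values)

# One row per distinct value, one star per phone with that value.
for v in sorted(counts):
    print(f"{v:>4} | " + "*" * counts[v])

print("mean   =", round(mean(values), 1))
print("median =", median(values))
```

Comparing the mean with the median gives a quick hint about the shape: a mean noticeably above the median suggests the high values are pulling it up.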
Lab: Features of Distributions
Center:
Unusual Features:
Spread:
Shape’s Impact on Mean and Median:
Resistant Measures:
2.4 Histograms
Why would we prefer a relative frequency histogram to a frequency histogram?
What will cause you to lose points on tests and projects (and cause Miss Hartman to go crazy…or crazier)?
The following table presents the average points scored per game (PPG) for the 30 NBA teams in the 2009–2010
regular season. Make a histogram of the distribution.
Team | PPG
Atlanta Hawks | 101.7
Boston Celtics | 99.2
Charlotte Bobcats | 95.3
Chicago Bulls | 97.5
Cleveland Cavaliers | 102.1
Dallas Mavericks | 102
Denver Nuggets | 106.5
Detroit Pistons | 94
Golden State Warriors | 108.8
Houston Rockets | 102.4
Indiana Pacers | 100.8
Los Angeles Clippers | 95.7
Los Angeles Lakers | 101.7
Memphis Grizzlies | 102.5
Miami Heat | 96.5
Milwaukee Bucks | 97.7
Minnesota Timberwolves | 98.2
New Jersey Nets | 92.4
New Orleans Hornets | 100.2
New York Knicks | 102.1
Oklahoma City Thunder | 101.5
Orlando Magic | 102.8
Philadelphia 76ers | 97.7
Phoenix Suns | 110.2
Portland Trail Blazers | 98.1
Sacramento Kings | 100
San Antonio Spurs | 101.4
Toronto Raptors | 104.1
Utah Jazz | 104.2
Washington Wizards | 96.2
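Before drawing the histogram, it helps to count how many teams fall in each class. The Python sketch below does that for the PPG values. The classes (width 2, starting at 92, each including its left endpoint) are an assumption made for this illustration; other reasonable class widths work too. The function name `bin_counts` is also made up.

```python
ppg = [101.7, 99.2, 95.3, 97.5, 102.1, 102, 106.5, 94, 108.8, 102.4,
       100.8, 95.7, 101.7, 102.5, 96.5, 97.7, 98.2, 92.4, 100.2, 102.1,
       101.5, 102.8, 97.7, 110.2, 98.1, 100, 101.4, 104.1, 104.2, 96.2]

def bin_counts(data, start, width, n_bins):
    """Count values in equal-width classes [start, start + width), and so on."""
    counts = [0] * n_bins
    for x in data:
        counts[int((x - start) // width)] += 1
    return counts

counts = bin_counts(ppg, start=92, width=2, n_bins=10)

# Text histogram: one '#' per team, with the relative frequency beside it.
for i, c in enumerate(counts):
    lo = 92 + 2 * i
    print(f"{lo:>3} to <{lo + 2:<3} | {'#' * c:<8} {c / len(ppg):.0%}")
```

The counts give the bar heights for a frequency histogram; dividing each by 30 gives the heights for a relative frequency histogram.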
Here is some data on time spent on the internet. Graph the data using a histogram.
Time on Internet (min.) | Frequency
0 | 7
10 | 1
20 | 3
30 | 7
40 | 1
45 | 1
60 | 15
90 | 3
120 | 14
180 | 10
210 | 1
240 | 10
270 | 2
300 | 9
360 | 3
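When the data come as a frequency table, each class count is the sum of the frequencies of the values that fall in that class. The Python sketch below groups the internet times into 60-minute classes; that width is an assumption for this illustration, not a required choice.

```python
# time in minutes -> frequency, copied from the table
time_freq = {0: 7, 10: 1, 20: 3, 30: 7, 40: 1, 45: 1, 60: 15, 90: 3,
             120: 14, 180: 10, 210: 1, 240: 10, 270: 2, 300: 9, 360: 3}

width = 60                                   # assumed class width
n_bins = max(time_freq) // width + 1         # classes 0-<60 up to 360-<420
counts = [0] * n_bins
for minutes, f in time_freq.items():
    counts[minutes // width] += f            # add the whole frequency to its class

for i, c in enumerate(counts):
    lo = i * width
    print(f"{lo:>3} to <{lo + width:<3} | {'#' * c} ({c})")
```

The class counts should add up to the total number of responses in the table.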
2.5 Comparing Two Distributions
Example: McDonald’s Beef Sandwiches
Here is data for the amount of fat (in grams) for McDonald’s beef sandwiches. Calculate the median
and the IQR.
Sandwich | Fat (g)
Hamburger | 9
Cheeseburger | 12
Double Cheeseburger | 23
McDouble | 19
Quarter Pounder® | 19
Quarter Pounder® with Cheese | 26
Double Quarter Pounder® with Cheese | 42
Big Mac® | 29
Big N' Tasty® | 24
Big N' Tasty® with Cheese | 28
Angus Bacon & Cheese | 39
Angus Deluxe | 39
Angus Mushroom & Swiss | 40
McRib® | 26
Mac Snack Wrap | 19
Are there any outliers in the beef sandwich distribution?
Here is data for the amount of fat (in grams) for McDonald’s chicken sandwiches. Are there any outliers in this distribution?
Sandwich | Fat (g)
McChicken® | 16
Premium Grilled Chicken Classic Sandwich | 10
Premium Crispy Chicken Classic Sandwich | 20
Premium Grilled Chicken Club Sandwich | 17
Premium Crispy Chicken Club Sandwich | 28
Premium Grilled Chicken Ranch BLT Sandwich | 12
Premium Crispy Chicken Ranch BLT Sandwich | 23
Southern Style Crispy Chicken Sandwich | 17
Ranch Snack Wrap® (Crispy) | 17
Ranch Snack Wrap® (Grilled) | 10
Honey Mustard Snack Wrap® (Crispy) | 16
Honey Mustard Snack Wrap® (Grilled) | 9
Chipotle BBQ Snack Wrap® (Crispy) | 15
Chipotle BBQ Snack Wrap® (Grilled) | 9
Draw parallel boxplots for the beef and chicken sandwich data. Compare these distributions.
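The median, IQR, and outlier checks for the two sandwich lists can be verified with a Python sketch like the one below. It assumes the quartiles are the medians of the lower and upper halves, leaving out the overall median when the count is odd, and it uses the 1.5 x IQR rule. The function names are made up for this illustration.

```python
from statistics import median

beef = [9, 12, 23, 19, 19, 26, 42, 29, 24, 28, 39, 39, 40, 26, 19]
chicken = [16, 10, 20, 17, 28, 12, 23, 17, 17, 10, 16, 9, 15, 9]

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(xs), median(upper)

def find_outliers(data):
    """Return the values beyond the 1.5 x IQR fences."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    return [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

for name, fat in (("beef", beef), ("chicken", chicken)):
    q1, med, q3 = quartiles(fat)
    print(f"{name}: Q1={q1}, median={med}, Q3={q3}, IQR={q3 - q1}, "
          f"outliers={find_outliers(fat)}")
```

The same summary values are what the parallel boxplots display, so they also support the comparison of center and spread between the two distributions.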
Example: Energy Cost: Top vs. Bottom Freezers
How do the annual energy costs (in dollars) compare for refrigerators with top freezers and refrigerators with
bottom freezers? The data below is from the May 2010 issue of Consumer Reports.
[Figure: Dotplot of EnergyCost vs Type; rows "bottom" and "top"; horizontal axis EnergyCost from 56 to 140]
Example: Which gender is taller, males or females? A sample of 14-year-olds from the United Kingdom was
randomly selected using the CensusAtSchool website. Here are the heights of the students (in cm). Make a
back-to-back stemplot and compare the distributions.
Male: 154, 157, 187, 163, 167, 159, 169, 162, 176, 177, 151, 175, 174, 165, 165, 183, 180
Female: 160, 169, 152, 167, 164, 163, 160, 163, 169, 157, 158, 153, 161, 165, 165, 159, 168, 153, 166, 158, 158, 166
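A back-to-back stemplot of the heights can be checked with the Python sketch below. It assumes stems are the first two digits (tens of centimeters) and leaves are the ones digits, with male leaves on the left reading outward from the stem. The function names are made up for this illustration.

```python
male = [154, 157, 187, 163, 167, 159, 169, 162, 176, 177, 151, 175, 174,
        165, 165, 183, 180]
female = [160, 169, 152, 167, 164, 163, 160, 163, 169, 157, 158, 153, 161,
          165, 165, 159, 168, 153, 166, 158, 158, 166]

def leaves_by_stem(values):
    """Map each stem (value // 10) to its sorted list of leaves (value % 10)."""
    out = {}
    for v in sorted(values):
        out.setdefault(v // 10, []).append(v % 10)
    return out

def back_to_back(left, right):
    """Return stemplot lines with left leaves reversed so they grow outward."""
    l, r = leaves_by_stem(left), leaves_by_stem(right)
    stems = range(min(min(l), min(r)), max(max(l), max(r)) + 1)
    width = max(len(v) for v in l.values())
    lines = []
    for s in stems:
        left_leaves = "".join(str(d) for d in reversed(l.get(s, [])))
        right_leaves = "".join(str(d) for d in r.get(s, []))
        lines.append(f"{left_leaves:>{width}} | {s} | {right_leaves}".rstrip())
    return lines

lines = back_to_back(male, female)
print("  Male |    | Female")
print("\n".join(lines))
print("Key: 15 | 2 means 152 cm")
```

Sharing one column of stems puts both groups on the same scale, which makes it easier to compare shape, center, and spread.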
Lab: Matching Graphs to Variables
2.6 Standard Deviation
Lab: Guess My Age
In the distribution below, how far are the values from the mean, on average?
[Figure: dot plot of Collection 1; horizontal axis "Data" from 0 to 8]
What does the standard deviation measure?
What are some similarities and differences between the range, IQR, and standard deviation?
How is the standard deviation calculated? What is the variance?
What are some properties of the standard deviation?
Example: A random sample of 5 students was asked how many minutes they spent doing HW the previous
night. Here are their responses (in minutes): 0, 25, 30, 60, 90. Calculate and interpret the standard deviation.
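The hand calculation for the homework example can be checked step by step with the Python sketch below. It assumes the sample standard deviation, which divides the sum of squared deviations by n - 1, and compares the result with `statistics.stdev` from the Python standard library.

```python
from math import sqrt
from statistics import stdev

hw = [0, 25, 30, 60, 90]   # minutes of homework

n = len(hw)
mean = sum(hw) / n                          # step 1: the mean
deviations = [x - mean for x in hw]         # step 2: distance of each value from the mean
squared = [d ** 2 for d in deviations]      # step 3: square each deviation
variance = sum(squared) / (n - 1)           # step 4: "average" the squares using n - 1
sd = sqrt(variance)                         # step 5: square root gets back to minutes

print("mean     =", mean)
print("variance =", variance)
print("std dev  =", round(sd, 2))
print("stdev()  =", round(stdev(hw), 2))    # library check
```

One way to word the interpretation: the homework times typically vary from the mean by about the standard deviation, measured in minutes.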
Unit 2 FRAPPY