Lesson 1 - Scatter Plots

9/29/2014
Scatterplots &
Correlation
Section 3.1A
Relationships between quantitative
variables.
Relationships between two
Variables
O A study found that short women are more
likely to have heart attacks than tall
women….
O Smokers on average die younger than
nonsmokers….
O But – to make these conclusions we must
first eliminate the effect of other variables.
NOTE: Statistical relationships are overall tendencies, NOT
absolute rules!
Example: A smoker who lives to age 90.
Lurking Variables
O Can strongly influence the relationship
between two variables.
[Figure: Math SAT Scores]
Does this mean that we should conclude that country of origin is the
cause for the difference in math SAT scores?
No. A broad section of all U.S. students take the test, while other
countries might have a more select group.
Case of the Missing
Cookies
[Scatterplot axes: Hand Span (cm), Height (cm)]
Variables
Response
O A response variable
measures the
outcome of a study.
Explanatory
O An explanatory
variable may help
explain or influence
changes in a
response variable.
Which is the Explanatory and
which is the Response Variable?
O We think that car weight helps to explain accident death.
Explanatory: Car Weight
Response: Accident Death Rate
O We think that smoking influences life expectancy.
Explanatory: # of Cigarettes Smoked
Response: Life Expectancy
O It is easiest to identify explanatory and response variables when we
actually specify values of one variable to see how it affects
another variable.
O When we don’t specify the values of either
variable but just observe both variables,
there may or may not be explanatory and
response variables.
Scatterplot
O The most useful graph to show the relationship between two
quantitative variables measured on the same individuals. Each
individual in the data appears as a point in the graph.
Points to Remember about a
Scatter Plot
O Put the correct variable on the x and y axis.
O Explanatory variable goes on the horizontal axis.
(eXplanatory goes on the x-axis.)
O If there is no explanatory variable then either variable can go
on the horizontal axis.
O Label and scale your axes.
O Plot your points.
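The points above can be sketched in a few lines of Python. This is only an illustration: `ascii_scatter` is a hypothetical helper written for this handout, and the 40-by-12 grid size is an arbitrary assumption. The data are the body weight and backpack weight values given later in this lesson, with body weight treated as the explanatory variable.

```python
# Minimal text-mode scatterplot. `ascii_scatter` is a hypothetical helper
# written for illustration; it is not a library function.

def ascii_scatter(xs, ys, width=40, height=12):
    """Return a list of text rows with '*' marking each (x, y) individual."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Guard against a zero range so the scaling never divides by zero.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))   # horizontal position
        row = round((y - y_min) / y_span * (height - 1))  # vertical position
        grid[height - 1 - row][col] = "*"                  # grid row 0 is the top
    return ["".join(line) for line in grid]

# Explanatory variable goes on the x-axis, response on the y-axis.
body_weight_lb = [120, 187, 109, 103, 131, 165, 158, 116]
backpack_lb = [26, 30, 26, 24, 29, 35, 31, 28]

rows = ascii_scatter(body_weight_lb, backpack_lb)

# Label and scale the axes, then print the plotted points.
print("Backpack (lb), from", min(backpack_lb), "to", max(backpack_lb))
for line in rows:
    print("|" + line)
print("+" + "-" * 40)
print("Body wt (lb), from", min(body_weight_lb), "to", max(body_weight_lb))
```

Each individual (here, each hiker) becomes exactly one `*` on the grid, which matches the definition of a scatterplot given above.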
Types of Correlation
[Three scatterplots: Strong Positive Linear; Strong Negative Linear; No Correlation]
Has the increase been constant?
[Scatterplot: Would Vote for a woman. x-axis: Years (since 1900); y-axis: % Responding Yes]
Describe the correlation
O Apples: circumference, weight
O College freshmen: shoe size, weight
O People: age, grip strength
O Drivers: blood alcohol, reaction time
Caution…..
Association Does
Not Imply Causation!
Interpreting Scatterplots
O Look for DIRECTION (positive, negative, none)
O Look at the FORM of the relationship
O Straight or curved
O Any clusters
O Look at the STRENGTH
O How closely does it follow the form
O Look for outliers
O Individual value that falls outside the overall pattern of
the relationship
When writing to describe:
O There appears to be a (strong, weak,
moderate) (positive/negative) (linear,
nonlinear) relationship between _____ (give
the x variable) and ______ (give the y
variable)
O Do not just say between x & y!
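The sentence template for describing a scatterplot can be written as a small Python function. This is a sketch for illustration only: `describe_relationship` is a hypothetical helper name, and the example values (strong, positive, linear) are taken from the manatee interpretation later in this lesson.

```python
# Fill in the slide's sentence template. `describe_relationship` is a
# hypothetical helper name used only for this illustration.

STRENGTHS = ("strong", "moderate", "weak")
DIRECTIONS = ("positive", "negative")
FORMS = ("linear", "nonlinear")

def describe_relationship(strength, direction, form, x_name, y_name):
    """Build the description sentence, naming both variables in context."""
    if strength not in STRENGTHS:
        raise ValueError(f"strength must be one of {STRENGTHS}")
    if direction not in DIRECTIONS:
        raise ValueError(f"direction must be one of {DIRECTIONS}")
    if form not in FORMS:
        raise ValueError(f"form must be one of {FORMS}")
    # "Do not just say between x & y!": require real variable names.
    for name in (x_name, y_name):
        if name.strip().lower() in ("x", "y"):
            raise ValueError("name the actual variables, not just x and y")
    return (f"There appears to be a {strength} {direction} {form} "
            f"relationship between {x_name} and {y_name}.")

sentence = describe_relationship(
    "strong", "positive", "linear",
    "the number of boats registered", "the number of manatees killed",
)
print(sentence)
```

The check on the variable names enforces the last bullet: the description has to name the variables in context.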
Interpret
Direction: Negative; States in which higher percentages of high school
graduates take the SAT tend to have lower mean SAT Math scores.
Form: Slightly curved; It appears that most states fall into one of
two distinct clusters.
Strength: Strength is determined by how closely the points
follow a clear form. The overall relationship in this figure is
moderately strong: states with similar percents taking the
SAT tend to have roughly similar mean SAT Math scores.
Let’s look at a scatter plot of the #
of registered boats in Florida and
the # of manatees killed by boats
for the years 1977 - 2007
Interpret
Direction: Positive; the more boats registered, the more
manatees killed.
Form: Linear; the overall pattern follows a straight line
from lower left to upper right.
Strength: Strong; the points don’t deviate greatly from a
line. There are no obvious outliers.
NOTE: Although the scatterplot shows a strong linear relationship
between the variables, we can NOT conclude that the increase in
manatee deaths was caused by the change in boat registrations.
Graph Using a calculator:
Sprint Time (sec)
5.41 5.05 9.49 8.09 7.01 7.17 6.83 6.73 8.01 5.68 5.78 6.31 6.04
Long Jump (in)
171 184 48 151 90 65 94 78 71 130 173 143 141
Interpret….
[Scatterplot: x-axis Sprint (seconds); one point is labeled "Influential Pt!"]
1. Direction
2. Form
3. Strength
4. Outliers
A point is INFLUENTIAL if removing it would markedly change the
result of the calculation.
The following data represents 9th grade
students who go on a backpacking trip.
Body wt (lb)
120 187 109 103 131 165 158 116
Backpack (lb)
26 30 26 24 29 35 31 28
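The idea of an influential point can be checked numerically on the backpack data above. The slides do not say which calculation is meant; this sketch assumes it is the correlation coefficient r named in the lesson title, computed with the standard Pearson formula. The function name `pearson_r` and the leave-one-out loop are illustrative choices, not something taken from the lesson.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Backpack data from the slide: body weight is the explanatory variable.
body_weight_lb = [120, 187, 109, 103, 131, 165, 158, 116]
backpack_lb = [26, 30, 26, 24, 29, 35, 31, 28]

r_all = pearson_r(body_weight_lb, backpack_lb)
print(f"r with all 8 hikers: {r_all:.3f}")

# Leave each hiker out in turn and record how far r moves.
changes = []
for i in range(len(body_weight_lb)):
    xs = body_weight_lb[:i] + body_weight_lb[i + 1:]
    ys = backpack_lb[:i] + backpack_lb[i + 1:]
    r_left_out = pearson_r(xs, ys)
    changes.append(
        (abs(r_left_out - r_all), body_weight_lb[i], backpack_lb[i], r_left_out)
    )

# The hiker whose removal changes r the most is the best candidate
# for an influential point under this assumption.
_, wt, pack, r_without = max(changes)
print(f"Largest change: removing ({wt} lb, {pack} lb) gives r = {r_without:.3f}")
```

Whether a change counts as "marked" is a judgment call. The code only ranks the points by how much r moves when each one is removed.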
Interpret: Backpack
Direction: Positive; lighter students tend to carry lighter
backpacks.
Form: Somewhat Linear; the overall pattern follows a straight
line from lower left to upper right.
Strength: Moderately Strong; the points vary somewhat from
the linear pattern. One possible outlier: the hiker with body
weight 187 pounds and pack weight 30 pounds.
O The Starnes family arrived at Old Faithful
after it had erupted. They wondered how
long it would be until the next eruption.
Here is a scatterplot that plots the interval
between consecutive eruptions of Old
Faithful against the duration of the previous
eruption, for the month prior to their visit.
Answer the following
questions:
O Describe the direction of the relationship.
Explain why this makes sense.
POSITIVE - The longer the duration of the eruption, the longer the
wait between eruptions. One reason for this may be that if the geyser
erupted for longer, it expended more energy and it will take longer to
build up the energy needed to erupt again.
Pg. 148
O What form does the relationship take? Why
are there 2 clusters of points?
LINEAR. The clusters indicate that in general there are two types of
eruptions, one shorter, the other somewhat longer.
O How strong is the relationship? Justify.
FAIRLY STRONG.
O Are there any outliers?
THERE ARE A FEW OUTLIERS AROUND THE CLUSTERS, but not many and not
very distant from the main grouping of points.
O What information does the Starnes family need to
predict when the next eruption will occur?
The Starnes family needs to know how long the last eruption was in
order to predict how long it will be until the next one.
Homework
O Page 159 (1-13) odd