Scatter Plots - Parkway School District

Scatter Plots
A Scatter (XY) Plot has points
that show the relationship
between two sets of data.
In this example, each dot shows one
person's weight versus their height.
(The data is plotted on the graph as
"Cartesian (x,y) Coordinates".)
Example:
The local ice cream shop keeps track of how much ice cream they sell
versus the noon temperature on that day. Here are their figures for the
last 12 days:
Ice Cream Sales vs Temperature
Temperature °C    Ice Cream Sales
14.2°             $215
16.4°             $325
11.9°             $185
15.2°             $332
18.5°             $406
22.1°             $522
19.4°             $412
25.1°             $614
23.4°             $544
18.1°             $421
22.6°             $445
17.2°             $408
And here is the same data as a Scatter Plot:
It is now easy to see that warmer weather leads to more sales, but
the relationship is not perfect.
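As a rough illustration of how each (temperature, sales) pair becomes one point, here is a minimal Python sketch that draws the same 12 values as a plain-text scatter plot. The function name text_scatter and the 40 by 12 grid size are assumptions made for this example only; a real chart would normally be drawn with a plotting tool.

```python
# Ice cream data from the table: (noon temperature in °C, sales in $)
data = [
    (14.2, 215), (16.4, 325), (11.9, 185), (15.2, 332),
    (18.5, 406), (22.1, 522), (19.4, 412), (25.1, 614),
    (23.4, 544), (18.1, 421), (22.6, 445), (17.2, 408),
]

def text_scatter(points, width=40, height=12):
    """Return text rows with '*' marking each (x, y) point.

    Hypothetical helper for illustration; it assumes the x values
    are not all equal and the y values are not all equal.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale each value to a column / row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["".join(r) for r in grid]

rows = text_scatter(data)
for line in rows:
    print("|" + line)              # vertical axis: sales
print("+" + "-" * len(rows[0]))    # horizontal axis: temperature
```

The stars climb from the bottom left (coldest day, lowest sales) to the top right (warmest day, highest sales), which is the upward pattern described above.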
Line of Best Fit
We can also draw a "Line of Best Fit" (also called a "Trend Line") on our
scatter plot:
Try to have the line as close as possible to all points, and as many
points above the line as below.
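The advice above is for drawing the line by eye. One common way to calculate a line instead is the "least squares" method, which this text does not cover. The sketch below applies it to the ice cream data; the function name best_fit_line is made up for this example.

```python
# Ice cream data from the table: (noon temperature in °C, sales in $)
data = [
    (14.2, 215), (16.4, 325), (11.9, 185), (15.2, 332),
    (18.5, 406), (22.1, 522), (19.4, 412), (25.1, 614),
    (23.4, 544), (18.1, 421), (22.6, 445), (17.2, 408),
]

def best_fit_line(points):
    """Least-squares line y = slope * x + intercept (illustrative helper)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # How x and y vary together, and how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through the means
    return slope, intercept

slope, intercept = best_fit_line(data)
print(f"sales = {slope:.2f} * temperature + {intercept:.2f}")
```

For this data the slope comes out at roughly 30, which suggests about $30 more in sales for each extra degree.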
Example: Sea Level Rise
A Scatter Plot of Sea Level Rise:
And here I have drawn on a "Line of Best Fit".
Correlation
When the two sets of data are strongly linked together we say they have
a High Correlation.
The word Correlation is made of Co- (meaning "together") and Relation.

Correlation is Positive when the values increase together, and

Correlation is Negative when one value decreases as the other
increases.
Like this:
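To put a number on how strongly two sets of data are linked, one widely used measure is Pearson's correlation coefficient. It is not described in this text, so treat the sketch below as an assumption about how correlation might be measured; the function name pearson_r is hypothetical. The result lies between -1 and +1: positive when the values increase together, negative when one decreases as the other increases.

```python
from math import sqrt

# Ice cream data from the table: (noon temperature in °C, sales in $)
data = [
    (14.2, 215), (16.4, 325), (11.9, 185), (15.2, 332),
    (18.5, 406), (22.1, 522), (19.4, 412), (25.1, 614),
    (23.4, 544), (18.1, 421), (22.6, 445), (17.2, 408),
]

def pearson_r(points):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)

r = pearson_r(data)
print(f"correlation = {r:.2f}")

# Flipping the sign of one variable turns a positive correlation negative.
r_flipped = pearson_r([(x, -y) for x, y in data])
print(f"flipped correlation = {r_flipped:.2f}")
```

For the ice cream data the value is close to +1 (about 0.96), which matches the "High Correlation" visible in the scatter plot.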
Outliers
"Outliers" are values that "lie outside" the other values.
When we collect data, sometimes there are values that are "far away"
from the main group of data ... what do we do with them?
Example: Long Jump
A new coach has been working with the Long Jump team this month, and
the athletes' performance has changed.
Augustus can now jump 0.15m further, and June and Carol can each jump
0.06m further.
Here are all the results:

Augustus: +0.15m

Tom: +0.11m

June: +0.06m

Carol: +0.06m

Bob: +0.12m

Sam: -0.56m
Oh no! Sam got worse.
Here are the results on the number line:
The mean is:
(0.15+0.11+0.06+0.06+0.12-0.56) / 6 = -0.06 / 6 = -0.01m
So, on average the performance went DOWN.
The coach is obviously useless ... right?
Sam's result is an "Outlier" ... what if we remove Sam's result?
Example: Long Jump (continued)
Let us try the results WITHOUT Sam:
Mean = (0.15+0.11+0.06+0.06+0.12) / 5 = 0.50 / 5 = 0.10m
Hey, the coach looks much better now!
But is that fair? Can we just get rid of values we don't like?
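The two means can be checked with a short Python sketch. The dictionary layout and the helper name mean are choices made for this example; note that the divisor changes from 6 to 5 once Sam's result is left out.

```python
# Change in jump distance (metres) for each athlete, from the list above.
results = {
    "Augustus": 0.15,
    "Tom": 0.11,
    "June": 0.06,
    "Carol": 0.06,
    "Bob": 0.12,
    "Sam": -0.56,
}

def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    values = list(values)
    return sum(values) / len(values)

mean_all = mean(results.values())
mean_without_sam = mean(v for name, v in results.items() if name != "Sam")

print(f"With Sam:    {mean_all:+.2f}m")          # six results
print(f"Without Sam: {mean_without_sam:+.2f}m")  # five results
```

One extreme value is enough to pull the mean from about +0.10m down to about -0.01m.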
What To Do?
You need to think "why is that value over there?"
It may be quite normal to have high or low values:

People can be short or tall

Some days there is no rain, other days there can be a downpour

Athletes can perform better or worse on different days
Or there may be an unusual reason for extreme data.
Example: Long Jump (continued)
We find out that Sam was feeling sick that day. Not the coach's fault at
all.
So it is a good idea in this case to remove Sam's result.
When we remove outliers we are changing the data. It is no longer
"pure", so we shouldn't just get rid of the outliers without a good reason!
And when we do get rid of them, we should explain what we are doing
and why.