Aim #97: How do we distinguish between scatter plots that model a linear versus a
nonlinear equation, and how do we write the linear regression equation for a set of
data using our calculator?
5-9-17
Homework: Handout
Do Now: 1) A scatter plot is an informative way to display numerical data with two
variables. Here is a scatter plot of the data on elevation and mean number of clear
days.
a) Do you see a pattern in the scatter plot, or does it look like the data points are
scattered?
b) How would you describe the relationship between elevation and mean number of
clear days for these 14 cities? That is, does the mean number of clear days tend
to increase as elevation increases, or does the mean number of clear days tend to
decrease as elevation increases?
c) Do you think that a straight line would be a good way to describe the relationship
between the mean number of clear days and elevation? Why do you think this?
2) The scatter plot below shows number of cell phone calls and age. Is there a
relationship between number of cell phone calls and age? If there is a relationship
between number of cell phone calls and age, does the relationship appear to be
linear?
3) Below are three scatter plots. Each one represents a data set with eight
observations. The scales on the x and y axes have been left off these plots on
purpose so you will have to think carefully about the relationships.
a) If one of these scatter plots represents the relationship between height and
weight for eight adults, which scatter plot do you think it is and why?
b) If one of these scatter plots represents the relationship between height and
SAT math score for eight high school seniors, which scatter plot do you think it is
and why?
c) If one of these scatter plots represents the relationship between the weight of
a car and fuel efficiency for eight cars, which scatter plot do you think it is and
why?
d) Which of these three scatter plots does not appear to represent a linear
relationship? Explain the reasoning behind your choice.
4) The scatter plot below compares frying time and moisture content. Is there a
relationship between moisture content and frying time, or do the data points look
scattered? If there is a relationship, does it look linear or nonlinear? If nonlinear,
what type of association exists between frying time and moisture content?
5) Describe the type of association for each type of scatter plot.
6) The scatter plot below shows a straight line that can be used to model the
relationship between elevation and mean number of clear days. The equation of this
line is y = 83.6 + 0.008x.
a) There are 14 US cities shown in the scatter plot above. Would you expect to see more
clear days per year in Los Angeles, which is near sea level, or in Denver, which is known
as the Mile High City? Justify your choice.
b) One of the cities in the data set was Albany, New York, which has an elevation of
275 feet. What would you predict the mean number of clear days for Albany to be,
based on the equation of the line that describes the relationship between elevation
and mean number of clear days?
c) Another city in the data set was Albuquerque, New Mexico. It has an elevation of
5,311 feet. What would you predict the mean number of clear days for Albuquerque to
be, using the equation of the line?
d) The actual value for Albany is 69 clear days and the actual value for Albuquerque
is 167 clear days. Was the prediction of the mean number of clear days based on the
line closer to the actual value for Albany or for Albuquerque? How could you tell this
from looking at the scatter plot with the line shown above?
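As a check on calculator work for problem 6, the line y = 83.6 + 0.008x can be evaluated with a short function. This is only a sketch: the function name is made up for illustration, and the 1,000-foot elevation is a hypothetical value, not one of the 14 cities in the data set.

```python
def predicted_clear_days(elevation_ft):
    """Evaluate the worksheet's line y = 83.6 + 0.008x for an elevation in feet."""
    return 83.6 + 0.008 * elevation_ft

# Hypothetical elevation (not a city from the data set).
example_elevation = 1000
print(round(predicted_clear_days(example_elevation), 1))  # 91.6
```

Because the slope 0.008 is positive, each additional foot of elevation raises the predicted mean number of clear days by 0.008.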
7) Kendra likes to watch crime scene investigation shows on television. She
watched a show where investigators used a shoe print to help identify a suspect
in a case. She questioned how it is possible to predict someone's height from his
shoe print.
To investigate, she collected data on shoe length (in inches) and height (in inches)
from 10 adult men. Her data appears in the table and scatter plot below.
Steps for finding the linear regression equation
(also known as the line of best fit)
1. Press Stat, choose Edit, and enter your data into L1 and L2.
2. Press Stat, choose Calc, and choose LinReg (option 4).
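The calculator's LinReg command finds the least-squares line for the data in L1 and L2. The same calculation can be sketched in Python. The four data pairs below are made up for illustration (they are NOT Kendra's table), and the function name is an assumption, not a calculator command.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: L1 holds the x values, L2 holds the y values.
L1 = [10, 11, 12, 13]
L2 = [62, 66, 68, 72]
slope, intercept = least_squares_line(L1, L2)
print(f"y = {slope:.2f}x + {intercept:.1f}")  # y = 3.20x + 30.2
```

The slope is printed to the nearest hundredth and the y-intercept to the nearest tenth, matching the rounding this worksheet asks for.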
a) Using your calculator, write the linear regression equation for the table above
where shoe length is the independent variable. Round the slope to the nearest
hundredth and the y-intercept to the nearest tenth.
b) Is there a relationship between shoe length and height? Explain.
c) Do the men with longer shoe lengths tend to be taller?
d) Using the equation of the line of best fit from part (a), predict the height of a
man with a shoe length of 12 inches. Round to the nearest hundredth.
e) Use the equation of the line of best fit to predict the height of a man with a
shoe length of 12.6 inches.
f) How does the prediction from part (e) compare to the first data point in the
table?
Since his actual height was different than the predicted height, you can calculate
the prediction error by subtracting the predicted value from the actual value.
This prediction error is called a residual. For the first data point, the residual is
calculated as follows:
Residual = actual y value - predicted y value
= 74 - 71.42
= 2.58 inches
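The residual calculation above can be reproduced with a few lines of Python. This sketch uses the line y = 3.66x + 25.3 that the worksheet gives in part (g), with the first data point taken to be a shoe length of 12.6 inches and an actual height of 74 inches; the function name is made up for illustration.

```python
def predict_height(shoe_length_in):
    """Predicted height (inches) from the worksheet's line y = 3.66x + 25.3."""
    return 3.66 * shoe_length_in + 25.3

actual_height = 74  # actual y value for the first data point
predicted_height = round(predict_height(12.6), 2)  # predicted y value
residual = round(actual_height - predicted_height, 2)  # actual minus predicted
print(predicted_height, residual)  # 71.42 2.58
```

A positive residual means the actual value lies above the line; a negative residual means it lies below the line.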
g) For the line y = 3.66x + 25.3, calculate the missing values, to the nearest
hundredth, and complete the table.
[Table column headings: actual, predicted, (residual)^2]
h) Why is the residual in the table's first row positive, and the residual in the
second row negative?
i) What is the sum of the residuals? ________
j) Why did you get a number close to zero for this sum?
k) Does this mean that all of the residuals were close to 0?
When you use a line to describe the relationship between two numerical variables, the
best line is the line that makes the residuals as small as possible overall.
l) If the residuals tend to be small, what does this say about the fit of the line to
the data?
m) Calculate the residuals squared and fill in the column in the table above.
The way we determine the line of best fit is to make the sum of the squared
residuals as small as possible.
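To see what "as small as possible" means, the sum of the squared residuals can be computed for two candidate lines and compared. This is a sketch on made-up data (NOT Kendra's table). Line A is the least-squares line for these four made-up points, and line B is an arbitrary nearby line chosen for comparison; the function name is an assumption.

```python
def sum_squared_residuals(xs, ys, slope, intercept):
    """Add up (actual - predicted)^2 over all data points for the given line."""
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return sum(r ** 2 for r in residuals)

# Hypothetical data for illustration only.
xs = [10, 11, 12, 13]
ys = [62, 66, 68, 72]

sse_a = sum_squared_residuals(xs, ys, 3.2, 30.2)  # line A: y = 3.2x + 30.2
sse_b = sum_squared_residuals(xs, ys, 3.0, 32.5)  # line B: y = 3.0x + 32.5
print(round(sse_a, 2), round(sse_b, 2))  # 0.8 1.0
```

Line A has the smaller sum of squared residuals, so by this criterion it fits the made-up data better than line B.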
n) Calculate the sum of the squared residuals. Fill this in the table above.
o) Why do we use the sum of the squared residuals instead of just the sum of the
residuals (without squaring)?
p) Assuming that the 10 men in the sample are representative of adult men in
general, what height would you predict for a man whose shoe length is 12.5 inches?
q) What shoe length would you predict for a man whose height is 60 inches?
r) Give an interpretation of the slope of the linear regression equation for
predicting height from shoe length for adult men.
s) What is the y-intercept of y = 3.66x + 25.3?
t) Explain why it does not make sense to interpret the y-intercept of 25.3 as the
predicted height for an adult male whose shoe length is zero.
Sum it up!
A scatter plot can be used to investigate whether or not there is a relationship
between two numerical variables. This relationship can be described as linear or
nonlinear.
The line that has the smallest sum of squared residuals for a data set is called
the least-squares line. This line can also be called the linear regression equation
or the line of best fit.
The least-squares line (best-fit line) can be used to predict the value of either
variable given the other.
Residual = actual y value - predicted y value