Students Matter. Success Counts.

Section 12.2
Linear Regression
With added comments and content
by D.R.S., University of Cordele
HAWKES LEARNING SYSTEMS
Students Matter. Success Counts.
Copyright © 2013 by Hawkes Learning
Systems/Quant Systems, Inc.
All rights reserved.
Least-Squares Regression Line
The least-squares regression line is the line for which
the average variation from the data is the smallest, also
called the line of best fit, given by ŷ = b0 + b1x.
Linear Regression
Input: A bunch of (𝑥, 𝑦) data points in our sample.
Requirement: We have found that the variables do
have significant linear correlation.
Output: “The Regression Line” (also called “The Line of
Best Fit”) is 𝑦 = 𝑎 + 𝑏𝑥.
It’s an equation of a line that models the relationship.
The Theory behind it.

Best fit means that the sum of the squares
of the vertical distance from each point to
the line is at a minimum.
(This slide is mostly from
Bluman’s 5th edition,
© McGraw Hill)
Your textbook has the awful formulas
to determine the equation of the
line. That’s what the calculator uses
to come up with its results.
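Those formulas come from minimizing the sum of squared vertical distances described on this slide. As a sketch in the b0, b1 notation used later in this section (the label SSE is added here for convenience and does not appear in the slides):

```latex
\mathrm{SSE}(b_0, b_1) = \sum_{i=1}^{n} \bigl( y_i - (b_0 + b_1 x_i) \bigr)^2
```

The line of best fit is the choice of b0 and b1 that makes this sum as small as possible.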
Living with Inconsistent Notation
Traditional algebra class line: 𝑦 = 𝑚𝑥 + 𝑏
– Slope is the number that multiplies the 𝑥.
– The y-intercept is the 𝑏 (keep the sign.)
The calculator uses 𝑦 = 𝑎 + 𝑏𝑥 in some menus (LinRegTTest) and
𝑦 = 𝑎𝑥 + 𝑏 in others (LinReg(ax+b)).
Hawkes talks about 𝑦 = 𝑏0 + 𝑏1 𝑥
• If a problem asks for b1, you just have to know it’s
the slope and the slope is the coefficient of x.
• If a problem asks for b0, you just have to know it’s
the y-intercept.
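Side by side, these forms all describe the same line; only the letters move around. A summary sketch (LinRegTTest and LinReg(ax+b) are the TI-83/84 menu names used elsewhere in these slides):

```latex
\begin{array}{lll}
\text{Form} & \text{Slope} & y\text{-intercept} \\
y = mx + b \ (\text{algebra class}) & m & b \\
y = a + bx \ (\text{LinRegTTest}) & b & a \\
y = ax + b \ (\text{LinReg(ax+b)}) & a & b \\
\hat{y} = b_0 + b_1 x \ (\text{Hawkes}) & b_1 & b_0
\end{array}
```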
Slope of the Least-Squares Regression Line
The slope of the least-squares regression line for paired
data from a sample is given by
$$b_1 = \frac{n \sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n \sum x_i^2 - \left(\sum x_i\right)^2}$$
(This frightening formula is presented for shock value and
informational purposes only.)
where n is the number of data pairs in the sample,
xi is the ith value of the explanatory variable, and
yi is the ith value of the response variable.
y-Intercept of the Least-Squares Regression Line
The y-intercept of the least-squares regression line for
paired data from a sample is given by
$$b_0 = \frac{\sum y_i}{n} - b_1 \frac{\sum x_i}{n}$$
(This frightening formula is presented for shock value and
informational purposes only.)
where n is the number of data pairs in the sample,
xi is the ith value of the explanatory variable,
yi is the ith value of the response variable, and
b1 is the slope of the least-squares regression line.
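For anyone curious what the calculator is doing behind the scenes, the slope and y-intercept formulas can be written out in a few lines of Python. This is only a sketch: the function name least_squares_line and the four data points are made up for illustration and are not from the textbook.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the line y-hat = b0 + b1*x using the sum formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x-values are equal, so the slope is undefined")
    b1 = (n * sum_xy - sum_x * sum_y) / denominator   # slope
    b0 = sum_y / n - b1 * sum_x / n                   # y-intercept
    return b0, b1


# Made-up data for illustration only (not from the textbook).
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
b0, b1 = least_squares_line(xs, ys)
print(f"b1 (slope) = {b1:.3f}, b0 (y-intercept) = {b0:.3f}")
```

The slope is computed first because the y-intercept formula needs it.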
Example 12.10: Finding a Least-Squares
Regression Line Using a TI-83/84 Plus Calculator
The local school board wants to evaluate the
relationship between class size and performance on the
state achievement test. It decides to collect data from
various schools in the district, and the data from a
sample of eight classes are shown in the following
table. Each pair of data values represents the class size
and corresponding average score on the achievement
test for one class.
Example 12.10: Finding a Least-Squares Regression
Line Using a TI-83/84 Plus Calculator (cont.)
Class Sizes and Average Test Scores
Class Size            15    17    18    20    21    24    26    29
Average Test Score    85.3  86.2  85    82.7  81.9  78.8  75.3  72.1
Determine if there is a significant linear relationship
between class size and average test score at the 0.05
level of significance. If the relationship is significant,
find the least-squares regression line for these data.
Example 12.10: Finding a Least-Squares Regression
Line Using a TI-83/84 Plus Calculator (cont.)
Solution
First, we must decide which variable should be the
x-variable and which variable should be the y-variable.
Consider whether there is a possibility that one of
these variables influences the values of the other. In
this case, we are interested in the possibility that class
size influences the average test score. Thus, class size is
the explanatory variable, x, and average test score is
the response variable, y.
Example 12.10: Finding a Least-Squares Regression
Line Using a TI-83/84 Plus Calculator (cont.)
• Press STAT.
• Select option 1:Edit.
• Enter the values for class size (x) in List 1 (L1) and
the values for average test score (y) in List 2 (L2).
• The textbook's next steps: press STAT, select CALC, choose
option 4:LinReg(ax+b), and press ENTER twice.
• INSTEAD, use STAT, TESTS, LinRegTTest,
as we demonstrated in Section 12.1.
Example 12.10 Inputs
Put x and y values in
two lists, as usual.
Recall: STAT, TESTS, then LinRegTTest
(ALPHA F on the TI-84, ALPHA E on the TI-83).
Press VARS, Y-VARS, 1, 1 to put
Y1 into the RegEQ field.
Example 12.10 Outputs
Do we have a significant
linear relationship here?
Compare the p-Value to
the Level Of Significance
α=0.05
If significant, then y = a + bx
with these values of a and b
is the “Line Of Best Fit”. You
do NOT have to retype them;
see the next screen!
r and r² are here as usual.
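LinRegTTest reports a p-value for this decision. As a rough cross-check without the calculator, the sketch below computes r and the usual test statistic t = r√(n − 2)/√(1 − r²) for the class size data, then compares |t| with 2.447. That number is an assumption brought in from a standard t table (two-tailed, α = 0.05, n − 2 = 6 degrees of freedom); the slides themselves compare the p-value with α. The function name is illustrative.

```python
from math import sqrt

class_size = [15, 17, 18, 20, 21, 24, 26, 29]
avg_score = [85.3, 86.2, 85, 82.7, 81.9, 78.8, 75.3, 72.1]


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


n = len(class_size)
r = correlation(class_size, avg_score)
t = r * sqrt(n - 2) / sqrt(1 - r ** 2)   # test statistic for H0: no linear correlation

T_CRITICAL = 2.447  # assumption: two-tailed, alpha = 0.05, df = 6, from a t table
significant = abs(t) > T_CRITICAL
print(f"r = {r:.3f}, t = {t:.2f}, significant at 0.05: {significant}")
```

Comparing |t| with a critical value and comparing the p-value with α are two views of the same test, so they lead to the same decision.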
Example 12.10 Outputs
AWESOME!
Because you told
LinRegTTest
“RegEQ: Y1”,
the y = a + bx is assembled
for you in the Y= screen as
equation Y1.
Example 12.10 Outputs
You can then set up a STAT
PLOT to do the scatter plot.
ZOOM 9:ZoomStat plots
both the scatter plot and
the Line of Best Fit together.
Example 12.10: Finding a Least-Squares Regression
Line Using a TI-83/84 Plus Calculator (cont.)
Next we need to consider the shape of the data in the
scatter plot. Looking at the following graph, we can
verify that the data points fall in a somewhat linear
fashion. It is now appropriate to consider the linear
regression model. In the LinReg(ax+b) output, the slope of
the regression line is a = b1 ≈ −1.043 and the y-intercept is
b = b0 ≈ 103.085. (LinRegTTest swaps the letters: there a is
the y-intercept and b is the slope.) Thus, the equation of the
regression line, in the form ŷ = b0 + b1x, is as follows.
ŷ = 103.085 − 1.043x
Example 12.10: Finding a Least-Squares Regression
Line Using a TI-83/84 Plus Calculator (cont.)
[Graph: scatter plot of the class size and average test score data; image not reproduced.]
A prediction should not be made with a
regression model if…
1. The data do not fall in a linear pattern when
graphed on a scatter plot.
2. The correlation coefficient is not statistically
significant.
3. You wish to make a prediction about a value outside
the range of the sample data.
4. The population is different from that from which the
sample data were drawn.
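Conditions 2 and 3 are easy to check mechanically. The sketch below is one hypothetical way to wrap a fitted line so that it refuses to extrapolate; the function name safe_predict, its parameters, and the sample numbers are all made up for illustration. Conditions 1 and 4 (linear pattern, same population) still need human judgment.

```python
def safe_predict(b0, b1, x, sample_xs, r_is_significant):
    """Return b0 + b1*x, or None when a prediction should not be made.

    Hypothetical helper: checks only that r was judged significant and that
    x lies inside the range of the sample x-values.
    """
    if not r_is_significant:
        return None                      # condition 2: r not significant
    if not (min(sample_xs) <= x <= max(sample_xs)):
        return None                      # condition 3: x outside sample range
    return b0 + b1 * x


# Made-up numbers for illustration only.
sample_xs = [2, 4, 6, 8]
print(safe_predict(1.0, 0.5, 5, sample_xs, True))    # inside the range
print(safe_predict(1.0, 0.5, 20, sample_xs, True))   # outside the range
print(safe_predict(1.0, 0.5, 5, sample_xs, False))   # r not significant
```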
Example 12.11: Making Predictions Using a
Least-Squares Regression Line
Use the equation of the regression line from the
previous example,
ŷ = 103.085 − 1.043x,
to predict what the average achievement test score will
be for the following class sizes.
a. 16
b. 19
c. 25
d. 45
But instead of plugging in and computing by hand, use the
TI-84 to do function notation directly: Y1(16), and so on.
Example 12.11: Making Predictions Using a
Least-Squares Regression Line (cont.)
Solution
a. ŷ = 103.085 − 1.043(16) = 86.397
b. ŷ = 103.085 − 1.043(19) = 83.268
c. ŷ = 103.085 − 1.043(25) = 77.010
(They used rounded-off values for the slope and y-intercept
and got slightly different results compared to what we did
with Y1(16), etc.)
d. It is not meaningful to predict the value of y for
this class size because the value x = 45 is outside the
range of the original data. The original data only
considered class sizes between 15 and 29, so we
should only predict the average achievement test
scores for class sizes within this range.
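The slight differences between the textbook's answers and the Y1(16) approach come from rounding. As a sketch (variable names are illustrative), the code below predicts with the rounded coefficients from the textbook and with unrounded coefficients recomputed from the class size data, and declines to predict for x = 45 because it lies outside the 15 to 29 range.

```python
class_size = [15, 17, 18, 20, 21, 24, 26, 29]
avg_score = [85.3, 86.2, 85, 82.7, 81.9, 78.8, 75.3, 72.1]

# Unrounded coefficients from the least-squares sum formulas.
n = len(class_size)
sum_x, sum_y = sum(class_size), sum(avg_score)
sum_xy = sum(x * y for x, y in zip(class_size, avg_score))
sum_x2 = sum(x * x for x in class_size)
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

B0_ROUNDED, B1_ROUNDED = 103.085, -1.043   # values printed in the textbook

for x in (16, 19, 25, 45):
    if not (min(class_size) <= x <= max(class_size)):
        print(f"x = {x}: outside the sample range, no prediction")
        continue
    rounded = B0_ROUNDED + B1_ROUNDED * x
    unrounded = b0 + b1 * x
    print(f"x = {x}: rounded {rounded:.3f}, unrounded {unrounded:.3f}")
```

The two columns agree to about two decimal places, which is why the calculator's Y1 values differ only slightly from the textbook's.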
Example 12.12: Finding the Least-Squares
Regression Line for a Given Data Set
The following table lists data collected on the selling
prices of used Land Rover Freelanders and their ages in
years. Find a linear regression model for predicting the
price of a used Land Rover Freelander based on its age
in years, if appropriate at the 0.05 level of significance.
Selling Prices and Ages of Used Land Rover Freelanders
Age (in Years)       3       4       1       2       2       3       4       3       4       1
Price (in Dollars)   15,500  14,995  30,795  28,995  23,995  20,900  20,500  19,995  19,888  29,995
Example 12.12: Finding the Least-Squares
Regression Line for a Given Data Set (cont.)
Solution
To begin, we must determine which variable is the
explanatory variable (x) and which variable is the
response variable (y). We want to use the age of a used
Freelander to predict its selling price; thus, age (x) is the
explanatory variable and price (y) is the response
variable.
Use LinRegTTest again and see if you agree with
their results on the following slides.
Example 12.12: Finding the Least-Squares
Regression Line for a Given Data Set (cont.)
Thus, it is appropriate to consider the linear regression
model. The slope of the regression line is a = b1 ≈
−4257.818 and the y-intercept of the regression line is
b = b0 ≈ 34,051.909. Therefore the regression line, in
the form ŷ = b0 + b1x, is as follows.
ŷ = 34,051.909 − 4257.818x
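To see whether you agree with these results, the coefficients can also be recomputed from the table. The sketch below applies the slope and y-intercept sum formulas from earlier in this section to the Freelander data; the variable names are illustrative.

```python
ages = [3, 4, 1, 2, 2, 3, 4, 3, 4, 1]                      # x, in years
prices = [15500, 14995, 30795, 28995, 23995,
          20900, 20500, 19995, 19888, 29995]               # y, in dollars

n = len(ages)
sum_x, sum_y = sum(ages), sum(prices)
sum_xy = sum(x * y for x, y in zip(ages, prices))
sum_x2 = sum(x * x for x in ages)

b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
b0 = sum_y / n - b1 * sum_x / n                                # y-intercept

print(f"y-hat = {b0:,.3f} + ({b1:,.3f})x")
```

The negative slope says that, within this sample, each additional year of age goes with a lower predicted price.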
Example 12.13: Making Predictions Using a
Linear Regression Model
Use the linear regression model from the previous
example,
ŷ = 34,051.909 − 4257.818x,
to predict the following.
a. The selling price of a Land Rover Freelander that is
2.5 years old
b. The selling price of a Land Rover Freelander that is
10 years old
c. The selling price of a Land Rover Range Rover that is
3 years old
Example 12.13: Making Predictions Using a
Linear Regression Model (cont.)
Solution
a. Substitute the value x = 2.5 into the regression line
and evaluate ŷ.
ŷ = 34,051.909 − 4257.818x
ŷ = 34,051.909 − 4257.818(2.5)
ŷ ≈ 23,407.364
Thus, we would predict that a 2.5-year-old
Freelander would sell for approximately $23,407.36.
Example 12.13: Making Predictions Using a
Linear Regression Model (cont.)
b. The original sample only contains Land Rover
Freelanders that are 1 to 4 years old; therefore, it is
inappropriate to use this model to predict the price
of a Freelander that is 10 years old.
c. The population is Freelanders, not Range Rovers.
Thus, it is inappropriate to use this model to predict
the price of a Range Rover, no matter how old it is.
Example 12.14: A Linear Regression Model, Start
to Finish
The following table gives the average monthly
temperatures and corresponding monthly precipitation
totals for one year in Key West, Florida.
Average Temperatures and Precipitation Totals in Key West, Florida
Average Temperature (in °F)   75    76    79    82    85    88    89    90    88    85    81    77
Precipitation (in Inches)     2.22  1.51  1.86  2.06  3.48  4.57  3.27  5.4   5.45  4.34  2.64  2.14
Example 12.14: A Linear Regression Model, Start
to Finish (cont.)
a. Create a scatter plot for the data. Does there appear
to be a linear relationship between x and y?
b. Calculate the correlation coefficient, r.
c. Verify that the correlation coefficient is statistically
significant at the 0.05 level of significance.
d. Determine the equation of the line of best fit.
e. Calculate and interpret the coefficient of
determination, r².
Example 12.14: A Linear Regression Model, Start
to Finish (cont.)
f. If appropriate, predict the monthly precipitation
total in Key West for a month in which the average
temperature is 80 degrees.
g. If appropriate, predict the monthly precipitation in
Destin, Florida, for a month in which the average
temperature is 83 degrees.
Example 12.14: A Linear Regression Model, Start
to Finish (cont.)
If r is statistically significant for the variables, a linear
regression model would be appropriate.
Example 12.14: A Linear Regression Model, Start
to Finish (cont.)
c. Using Table I in Appendix A, we find that the critical
value at the 0.05 level of significance is 0.576. Since
|0.859| > 0.576, we can conclude that r is indeed
statistically significant. (Alternatively, use the p-value
from LinRegTTest.)
d. From the calculator, we see that the equation of the
line of best fit is as follows.
ŷ = −15.424 + 0.225x
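Parts b through d can be reproduced from the table. The sketch below computes r, compares |r| with the critical value 0.576 from part c, and then finds the line of best fit; the helper name fit_and_correlate is made up for illustration.

```python
from math import sqrt

temps = [75, 76, 79, 82, 85, 88, 89, 90, 88, 85, 81, 77]            # x, °F
precip = [2.22, 1.51, 1.86, 2.06, 3.48, 4.57,
          3.27, 5.4, 5.45, 4.34, 2.64, 2.14]                        # y, inches


def fit_and_correlate(xs, ys):
    """Return (b0, b1, r) for paired data using the sum formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sxx = n * sum(x * x for x in xs) - sum_x ** 2
    syy = n * sum(y * y for y in ys) - sum_y ** 2
    sxy = n * sum_xy - sum_x * sum_y
    b1 = sxy / sxx                      # slope
    b0 = sum_y / n - b1 * sum_x / n     # y-intercept
    r = sxy / sqrt(sxx * syy)           # correlation coefficient
    return b0, b1, r


b0, b1, r = fit_and_correlate(temps, precip)
R_CRITICAL = 0.576   # from the textbook's Table I, alpha = 0.05, n = 12
print(f"r = {r:.3f}, significant: {abs(r) > R_CRITICAL}")
print(f"y-hat = {b0:.3f} + {b1:.3f}x")
```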
Example 12.14: A Linear Regression Model, Start
to Finish (cont.)
e. The coefficient of determination is approximately
0.738. This tells us that approximately 73.8% of the
variation in precipitation can be attributed to the
linear relationship between temperature and
precipitation. The remaining 26.2% of the variation
is from unknown sources.
f. Because r is statistically significant, we can use the
regression line to make predictions regarding the
variables.
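For the coefficient of determination in part e, a quick worked check using the rounded value r ≈ 0.859 from part c:

```latex
r^2 \approx (0.859)^2 \approx 0.738, \qquad 1 - r^2 \approx 1 - 0.738 = 0.262
```

These are the 73.8% and 26.2% quoted in part e.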
Example 12.14: A Linear Regression Model, Start
to Finish (cont.)
In addition, 80 is within the range of the x-values from
the sample data, so it is appropriate to predict the
monthly precipitation total when the average
temperature is 80 degrees. Because we designated the
average temperatures as the x-values, substitute x = 80
into the regression equation to obtain an estimate for
the precipitation total for a month when the average
temperature is 80 degrees.
Example 12.14: A Linear Regression Model, Start
to Finish (cont.)
ŷ = −15.424 + 0.225x
ŷ = −15.424 + 0.225(80)
ŷ ≈ 2.58
Thus, a reasonable estimate for the precipitation for a
month in which the average temperature is 80 degrees
is approximately 2.58 inches.
Example 12.14: A Linear Regression Model, Start
to Finish (cont.)
g. The data were collected in Key West—not Destin,
Florida. Therefore, it is not appropriate to use the
linear regression equation to make predictions
regarding the precipitation in Destin.
Excel – Data tab, Data Analysis add-in,
Regression
Excel > Data > Data Analysis > Regression
A lot of
choices
to make:
Excel Regression tool
with class size and test score example
• Some familiar things in the output (yellow)
• But you have to know where to find them
• A lot of new and different advanced stuff
• Some of it’s discussed in Lessons 12.3 and 12.4
• Some of it’s from advanced courses.