4.2: Linear Regression and the Coefficient of Determination

Least-squares criterion - The sum of the squares of the vertical
distances from the data points (x, y) to the line is made as small
as possible
Least-squares line: ŷ = a + bx
slope: b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)
intercept: a = ȳ − b·x̄
Least-squares line:
• the point (x̄, ȳ) is always on the least-squares line
• the slope b of the least-squares line tells how many units the
response variable (y) is expected to change for each one-unit
change in the explanatory variable (x)
• also called the marginal change of the response
variable
Issues affecting the validity of predictions using least-squares
equations:
• correlation coefficient
• interpolation - predicting ŷ values for x values that are
between observed x values in the data set … more reliable
• extrapolation - predicting ŷ values for x values that are
beyond observed x values in the data set … less reliable
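The interpolation/extrapolation distinction can be illustrated with a fitted line; the coefficients and the observed x range here are made-up example values:

```python
# Illustrative fitted line y-hat = 0.05 + 1.99x, fit over observed x in [1, 5].
a, b = 0.05, 1.99

def predict(x):
    """Return the predicted y-hat for a given x."""
    return a + b * x

# Interpolation: x = 3.5 lies between observed x values (1..5) -- more reliable.
print(predict(3.5))

# Extrapolation: x = 20 lies far beyond the observed range -- the linear
# pattern may not continue out there, so this prediction is less reliable.
print(predict(20))
```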
Coefficient of determination (r²)
• a measure of the proportion of variation in y that is
explained by the regression line, using x as the explanatory
variable
• if r = 0.8, then r² = 0.64 … meaning that 64% of the
variation in the y variable can be explained by the
corresponding variation in the x variable … the remaining
36% of the variation in y is due to random
chance or to the possibility of lurking variables that
influence y
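A quick sketch of computing r and r² from the summation form of the correlation coefficient; the data values are illustrative:

```python
import math

# Made-up example data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sx2 = sum(xi * xi for xi in x)
sy2 = sum(yi * yi for yi in y)

# Sample correlation coefficient r (computational formula).
r = (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
r2 = r ** 2

# r2 is the proportion of variation in y explained by the regression on x.
print(f"r = {r:.3f}, r^2 = {r2:.3f} -> {100 * r2:.1f}% of variation explained")
```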
Assignment: