November 25

Statistics
The Regression Line
November 25, 2008
Outline
1. Data: (x1, y1), ..., (xn, yn)
(n individuals, two variables x and y)
2. Goal: to write the equation of a line that best describes the linear relationship exhibited
in the data.
3. Notation: We will write the line as ŷ = a + bx.
(a) The notation ŷ should be read “predicted” y or “fitted” y and indicates that the
actual values of y for a given x are not necessarily going to be on the line.
(b) The use of a and b is somewhat unfortunate, as the usual mathematical notation is
y = b + mx. But the a, b notation is absolutely standard in statistics.
(c) Note that we are trying to predict y from x and not the other way around. So there
is a distinction between the explanatory variable (x) and the response variable (y).
(d) A line such as this is called a “regression” line.
(e) The coefficients a and b have natural interpretations in terms of x and y.
The slope b is the change in y predicted from a one-unit increase in x.
The intercept a is the predicted value of y for x = 0. (Note that x = 0 often can’t
happen, in which case a has no direct interpretation.)
4. Many lines seem possible. We will choose the “least squares” line. The values for b and
a are computed from the correlation coefficient r by
b = r · sy / sx
a = ȳ − b x̄.
5. What is the “least-squares” line? The line that minimizes the sum of the squares of the
residuals, where the residual for each data point is the actual y minus the predicted ŷ.
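
The formulas in item 4 and the least-squares property in item 5 can be checked numerically. The following is a minimal Python sketch, not part of the original notes. The data values and all function names are made up for illustration, and the standard deviations are assumed to use the n − 1 divisor (the ratio sy / sx is the same with either divisor).

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    # Sample standard deviation with the n - 1 divisor (an assumption).
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    # Correlation coefficient r: average product of standardized values.
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    products = (((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)

def least_squares_line(xs, ys):
    # Slope b = r * sy / sx, intercept a = ybar - b * xbar.
    b = correlation(xs, ys) * sample_sd(ys) / sample_sd(xs)
    a = mean(ys) - b * mean(xs)
    return a, b

def sum_squared_residuals(a, b, xs, ys):
    # Residual = actual y minus predicted y-hat = y - (a + b*x).
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, chosen only to keep the arithmetic simple.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

a, b = least_squares_line(xs, ys)
best = sum_squared_residuals(a, b, xs, ys)

# Nudging the intercept or the slope away from (a, b) should only
# increase the sum of squared residuals.
nearby = [
    sum_squared_residuals(a + da, b + db, xs, ys)
    for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
]

print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
print(f"sum of squared residuals = {best:.3f}")
print("every nearby line is worse:", all(s > best for s in nearby))
```

For this made-up data set the slope works out to 0.6 and the intercept to 2.2, so the fitted line is ŷ = 2.2 + 0.6x. Comparing against only four nearby lines is a spot check of the minimizing property, not a proof.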