Statistics
The Regression Line
November 25, 2008

Outline

1. Data: (x1, y1), . . . , (xn, yn) (n individuals, two variables x and y).
2. Goal: to write the equation of a line that best describes the linear relationship exhibited in the data.
3. Notation: We will write the line as ŷ = a + bx.
(a) The notation ŷ should be read “predicted y” or “fitted y”. It indicates that the actual values of y for a given x are not necessarily going to lie on the line.
(b) The use of a and b is somewhat unfortunate, as the usual mathematical notation is y = b + mx. But this notation is absolutely standard in statistics.
(c) Note that we are trying to predict y from x and not the other way around. So there is a distinction between the explanatory variable (x) and the response variable (y).
(d) A line such as this is called a “regression” line.
(e) The coefficients a and b have natural interpretations in terms of x and y. The slope b is the predicted change in y for a one-unit increase in x. The intercept a is the predicted value of y when x = 0. (Note that x = 0 often can’t happen.)
4. Many lines seem possible. We will choose the “least-squares” line. The values of b and a are computed from the correlation coefficient r by
b = r · (sy / sx),   a = ȳ − b x̄,
where sx and sy are the standard deviations of x and y, and x̄ and ȳ are their means.
5. What is the “least-squares” line? It is the line that minimizes the sum of the squares of the residuals, where the residual for individual i is yi − ŷi (observed y minus predicted y).
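The formulas for b and a follow from the least-squares criterion. A short derivation, assuming the n − 1 convention for both the standard deviations and r, and assuming the x values are not all equal (so sx > 0):

```latex
\begin{aligned}
\mathrm{SSE}(a,b) &= \sum_{i=1}^{n} (y_i - a - b x_i)^2 \\
\frac{\partial\,\mathrm{SSE}}{\partial a} &= -2\sum_{i=1}^{n} (y_i - a - b x_i) = 0
  \;\Longrightarrow\; a = \bar{y} - b\,\bar{x} \\
\text{substituting } a:\quad \mathrm{SSE}(b) &= \sum_{i=1}^{n} \bigl[(y_i - \bar{y}) - b\,(x_i - \bar{x})\bigr]^2 \\
\frac{d\,\mathrm{SSE}}{d b} &= -2\sum_{i=1}^{n} (x_i - \bar{x})\bigl[(y_i - \bar{y}) - b\,(x_i - \bar{x})\bigr] = 0
  \;\Longrightarrow\; b = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i} (x_i - \bar{x})^2} \\
\text{with } s_x^2 = \frac{\sum_i (x_i - \bar{x})^2}{n-1},\;
  r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\,s_x s_y}:\quad
  b &= \frac{r\,(n-1)\,s_x s_y}{(n-1)\,s_x^2} = r\,\frac{s_y}{s_x}
\end{aligned}
```

A side consequence of a = ȳ − b x̄ is that the least-squares line always passes through the point (x̄, ȳ).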
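A minimal Python sketch of the computation in items 4 and 5, using only the standard library. The helper names (mean, sample_sd, correlation, least_squares_line, sse) and the five-point data set are hypothetical, chosen only for illustration. The sketch assumes at least two points and that neither x nor y is constant, so that sx, sy and r are defined.

```python
import math


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """Correlation coefficient r, using the same n - 1 convention as sample_sd."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cross / ((n - 1) * sample_sd(xs) * sample_sd(ys))


def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x, via b = r * sy/sx and a = y-bar - b * x-bar."""
    b = correlation(xs, ys) * sample_sd(ys) / sample_sd(xs)
    a = mean(ys) - b * mean(xs)
    return a, b


def sse(xs, ys, a, b):
    """Sum of squared residuals (y - y-hat) for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    xs = [1, 2, 3, 4, 5]
    ys = [2, 4, 5, 4, 5]
    a, b = least_squares_line(xs, ys)
    print(round(a, 4), round(b, 4))     # 2.2 0.6
    print(round(sse(xs, ys, a, b), 4))  # 2.4
    # A slightly steeper line through the same intercept has a larger SSE.
    print(sse(xs, ys, a, b + 0.1) > sse(xs, ys, a, b))  # True
```

On this made-up data the fitted line is ŷ = 2.2 + 0.6x. Read as in item 3(e), each one-unit increase in x predicts an increase of 0.6 in y. Nudging the slope away from the least-squares value raises the sum of squared residuals, which is the property item 5 uses to define the line.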