
Statistics
Linear Regression
May 7, 2009

1. Data: (x1 , y1 ), . . . , (xn , yn ) (n individuals, two variables x and y)
2. Least-squares regression line:
ŷ = a + bx.
3. Residuals: y − ŷ. “Error” in using ŷ to predict y.
4. Choose a and b to minimize sums of squares of residuals.
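Steps 1–4 can be sketched in Python. The data set below is hypothetical, chosen only to illustrate the closed-form least-squares solution:

```python
# Hypothetical data: n = 5 individuals, two variables x and y.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Closed-form slope and intercept that minimize the sum of
# squared residuals (step 4).
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar

# Residuals (step 3): the "error" in using ŷ = a + bx to predict y.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
```

For this data the fitted line is ŷ = 0.05 + 1.99x, and the residuals sum to (essentially) zero, as they always do for a least-squares fit.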
5. Properties of line:
   • The slope is b = r·(sy/sx), where sx and sy are the standard deviations of x and y and r is the correlation coefficient.
   • The line passes through the point (x̄, ȳ).
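Both properties in item 5 can be checked numerically; a sketch with a small hypothetical data set:

```python
import math

# Hypothetical data set for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sample standard deviations sx, sy and correlation coefficient r.
sx = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
sy = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)

# Property 1: the least-squares slope is b = r * (sy / sx).
b = r * sy / sx
a = y_bar - b * x_bar

# Property 2: the line passes through the point of means (x̄, ȳ).
print(abs((a + b * x_bar) - y_bar) < 1e-12)  # True
```

The second check holds by construction: a = ȳ − b·x̄ forces ŷ = ȳ when x = x̄.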
6. Explaining variation in y
Why aren’t all the different y values equal?
(a) x “explains” some of it
(b) natural variation (“error”) explains the rest.
(c) Total variation in y to be explained: SStotal = Σ(y − ȳ)².
(d) Variation not explained by x: SSerror = Σ(y − ŷ)².
(e) Variation explained by x: SSmodel = SStotal − SSerror.
(f) Percentage of variation explained by x: R² = SSmodel/SStotal.
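The decomposition in item 6 can be computed directly. A sketch with a hypothetical data set (the fit uses the standard least-squares formulas):

```python
# Hypothetical data set for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares fit ŷ = a + bx.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar
y_hat = [a + b * x for x in xs]

# Sums of squares from item 6.
ss_total = sum((y - y_bar) ** 2 for y in ys)               # Σ(y − ȳ)²
ss_error = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # Σ(y − ŷ)²
ss_model = ss_total - ss_error                             # explained by x

# Fraction (percentage) of variation explained by x.
r_squared = ss_model / ss_total
```

For this data R² comes out close to 1, since the points lie nearly on a line; SSmodel + SSerror reproduces SStotal by construction.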
7. Describing the result of a linear regression: use either R² or b.