April 13

Mathematics 243
Lines (Investigation 4.8, concluded)
Day 37 - April 13
Observation = Fitted + Residual
y_i = b_0 + b_1 x_i + e_i.
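The decomposition above can be checked numerically. The following is a minimal sketch in R; the `foot` and `height` values are made up for illustration and are not the class's statstu data.

```r
# Hypothetical data (NOT the statstu data set used in class)
foot   <- c(24, 26, 27, 28, 29, 30, 31)
height <- c(62, 65, 66, 68, 68, 70, 72)

fit <- lm(height ~ foot)

yhat <- fitted(fit)      # fitted values b_0 + b_1 * x_i
e    <- residuals(fit)   # residuals e_i

# The fitted values are just the line evaluated at each x_i
b <- coef(fit)
by_hand <- b[1] + b[2] * foot
all.equal(unname(yhat), unname(by_hand))

# Observation = Fitted + Residual, for every student
all.equal(unname(yhat + e), height)
```

Both comparisons should report TRUE, since the residual is defined as the observation minus the fitted value.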
1. Regression in R:
> lm(height~foot,data=statstu)
Call:
lm(formula = height ~ foot, data = statstu)
Coefficients:
(Intercept)         foot
     38.302        1.033
2. Graphing the regression line:
> xyplot(height~foot,data=statstu,type=c('p','r'))
or
> xyplot(height~foot,data=statstu)
> l=lm(height~foot,data=statstu)
> ladd(panel.abline(l))
3. Evaluating the fit, part I. We want SSE, the sum of the squared residuals, to be small. But how small is small?
> l=lm(height~foot,data=statstu)
> e=residuals(l)
> SSE=sum(e^2)
> SSE
[1] 235.0006
(a) Is 235 small? Compare it to SSE for the model without the explanatory variable.
> meanmodel=lm(height~1,data=statstu)
> meanmodel
Call:
lm(formula = height ~ 1, data = statstu)
Coefficients:
(Intercept)
67.75
> e=residuals(meanmodel)
> SSEm=sum(e^2)
> SSEm
[1] 475.75
(b) The model reduces the variation from 475.75 to about 235. We write this improvement as a fraction:
(SSE(mean model) − SSE(linear model)) / SSE(mean model)
> (SSEm-SSE)/SSEm
[1] 0.5060419
(c) This expression is usually reported as a percentage and it is called the coefficient of determination of the
linear model. The symbol for it is R^2. We say:
51% of the variation in the height of these statistics students is “explained” by the foot size.
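The steps in item 3 can be collected into one short script. This is a sketch on made-up data (not statstu), so the numbers will differ from those above. It also checks two facts that hold for a least squares line with an intercept: the mean model's SSE is the sum of squared deviations from the mean, and the fraction computed by hand agrees with the R-squared value reported by `summary()` and with the squared correlation.

```r
# Hypothetical data (NOT the statstu data set used in class)
foot   <- c(24, 26, 27, 28, 29, 30, 31)
height <- c(62, 65, 66, 68, 68, 70, 72)

linmodel  <- lm(height ~ foot)   # model with the explanatory variable
meanmodel <- lm(height ~ 1)      # model that predicts the mean for everyone

SSE  <- sum(residuals(linmodel)^2)
SSEm <- sum(residuals(meanmodel)^2)

# The mean model's SSE is the sum of squared deviations from the mean
all.equal(SSEm, sum((height - mean(height))^2))

# Coefficient of determination, computed as in part (b)
R2 <- (SSEm - SSE) / SSEm

# Same value as R reports, and equal to the squared correlation
all.equal(R2, summary(linmodel)$r.squared)
all.equal(R2, cor(foot, height)^2)
```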
Investigation 4.9 – Movies
1. Do parts (a), (c), (d) and (e) of Investigation 4.9.
2. Some useful R. Draw the scatterplot with the regression line superimposed. We’d like to point at various points
with the mouse and see which movies they correspond to.
> trellis.focus()     # picks a panel
> panel.identify()    # now click on a point on the graph (use escape to end)
3. Do parts (e) and (f). Remember that you can identify different groups with the groups= argument of xyplot.
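A minimal sketch of a grouped scatterplot follows. The data frame `d` and its columns `x`, `y`, and `g` are hypothetical stand-ins, not the movie data from Investigation 4.9. The argument is written out in full as `groups`, the name documented in lattice.

```r
library(lattice)

# Hypothetical data: x and y are numeric, g is a grouping factor
d <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2.1, 3.9, 6.2, 3.0, 4.1, 4.9),
  g = factor(rep(c("A", "B"), each = 3))
)

# One color per group; type = c("p", "r") draws points plus a
# separate regression line for each group; auto.key adds a legend
p <- xyplot(y ~ x, data = d, groups = g,
            type = c("p", "r"), auto.key = TRUE)
print(p)
```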
Homework, Due Tuesday, April 17
1. Use the golfers data that we used in Investigation 4.7 (golfers.csv in Stob’s data).
(a) Write the equation of the least squares line for predicting average score from average number of putts per
hole.
(b) Interpret the slope of the regression line. (That is, write a good sentence that explains the meaning of the
number that you computed for the slope.)
(c) Use this equation to predict the average score of a golfer who uses an average of 1.8 putts per hole.
2. Changing the variables. We used a model to predict the height of statistics students from foot size. Suppose that we
want to predict foot size from height. Write the appropriate fitted equation using lm:
foot = ______ + ______ × height
(a) Use this equation to predict the foot size of someone with a height of 63 in.
(b) For the foot size that you get in part (a), use our first model to predict the height.
(c) Why isn’t your answer in part (b) equal to 63 in?