Mathematics 243
Day 42 - April 23

More explanatory variables
Observation = Model + Error
y_i = β0 + β1 x_{1,i} + · · · + βk x_{k,i} + ε_i
Observation = Fitted + Residual
y_i = b0 + b1 x_{1,i} + · · · + bk x_{k,i} + e_i
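A minimal R sketch of the second identity, using simulated data: the fitted values and residuals returned by lm() add back up to the observations. The variables x1, x2, y and the coefficients used to generate them are made up for illustration and are not course data.

```r
# Simulated illustration only: x1, x2, y are hypothetical, not course data.
set.seed(243)
n  <- 50
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 5)
y  <- 2 + 0.5 * x1 - 1.5 * x2 + rnorm(n)   # Observation = Model + Error

model <- lm(y ~ x1 + x2)
b <- coefficients(model)   # b0, b1, b2: estimates of beta0, beta1, beta2
f <- fitted(model)         # b0 + b1*x1 + b2*x2 for each observation
e <- residuals(model)      # e_i = y_i - fitted_i

# Observation = Fitted + Residual (up to floating-point rounding)
all.equal(unname(f + e), y)
```

The bs are computed from the data, so they are known; the βs and the errors ε_i are not observed.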
1980 Census Undercount
The dataset Ericksen in the package car has data on undercounting in the 1980 census. It includes data on the
16 largest cities, the rest of the states in which those cities are located, and the other US states. We are interested
in estimating the percentage by which the 1980 census undercounted the city or state population (undercount).
Ericksen et al. also considered using other variables in the dataset. For example, they tried the model:
undercount = β0 + β1 minority + β2 crime + ε
The R code to fit this model is:
> model=lm(undercount~minority+crime,data=Ericksen)
> model
Call:
lm(formula = undercount ~ minority + crime, data = Ericksen)
Coefficients:
(Intercept)     minority        crime
   -1.60665      0.06595      0.03561
1. What are the three assumptions for the regression model in this specific case?
2. To graph the data, try
> cloud(undercount~minority+crime,data=Ericksen)
3. Write the equation for predicting undercount percentage from minority percentage and crime rate.
4. Compare the residual standard errors of this model and the model with only minority percentage. Comment.
5. Compare the values of R². Comment.
6. Write 95% confidence intervals for β1 and β2.
7. Interpret b1. In other words, how would you explain to someone what b1 means?
8. The minority percentage and crime rate in Grand Rapids were 18.9% and 87, respectively. Predict the Grand Rapids
undercount rate and compare it to the prediction from the previous model.
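For questions 3 and 8, the fitted equation can be evaluated by hand from the coefficients printed in the lm() output. A small sketch follows; the names b, grand_rapids, and pred are ours. With the model object available, predict(model, newdata) does the same computation using the unrounded coefficients.

```r
# Coefficients copied from the printed lm() output (rounded to 5 decimals)
b <- c(intercept = -1.60665, minority = 0.06595, crime = 0.03561)

# Grand Rapids values from question 8
grand_rapids <- c(minority = 18.9, crime = 87)

# Predicted undercount = b0 + b1 * minority + b2 * crime
pred <- b[["intercept"]] +
  b[["minority"]] * grand_rapids[["minority"]] +
  b[["crime"]]    * grand_rapids[["crime"]]
pred
```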
Census model yet again
Let’s not stop here. The authors also considered a model that used the variable conventional, which is the
percentage of the enumeration done by the traditional “door-to-door” method.
undercount = β0 + β1 minority + β2 crime + β3 conventional + ε
1. Fit this model. What are the estimates of the betas?
2. Examine such evidence as you want to decide how much better this model is than the previous one.
3. The percentage of conventional enumeration for Grand Rapids was 0. Predict the Grand Rapids undercount
percentage using this model. Compare to your other two models.
4. We would like to look at residuals, but what would we plot them against? (Try residual plots against each predictor,
the response, and the fitted values.)
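One way to organize the comparison across the three census models is sketched below. It is not a prescribed solution. It assumes the Ericksen data can be loaded from the carData package, where recent versions of car keep their datasets. The object names m1, m2, m3, comparison, and gr are ours, and base graphics are used for the residual plots instead of xyplot.

```r
# Assumption: Ericksen is available from carData (older setups: library(car))
data(Ericksen, package = "carData")

m1 <- lm(undercount ~ minority, data = Ericksen)
m2 <- lm(undercount ~ minority + crime, data = Ericksen)
m3 <- lm(undercount ~ minority + crime + conventional, data = Ericksen)

# Residual standard error and R^2 for each model, side by side
models <- list(m1 = m1, m2 = m2, m3 = m3)
comparison <- data.frame(
  sigma     = sapply(models, function(m) summary(m)$sigma),
  r.squared = sapply(models, function(m) summary(m)$r.squared)
)
comparison

# Grand Rapids: minority 18.9, crime 87, conventional 0
# (each model uses only the columns it needs)
gr <- data.frame(minority = 18.9, crime = 87, conventional = 0)
sapply(models, predict, newdata = gr)

# Residuals of the largest model against fitted values and each predictor
r <- residuals(m3)
plot(fitted(m3), r); abline(h = 0)
plot(Ericksen$minority, r); abline(h = 0)
plot(Ericksen$crime, r); abline(h = 0)
plot(Ericksen$conventional, r); abline(h = 0)
```

Because each model contains the previous one, R² cannot decrease from m1 to m3. The question is whether the increase is large enough to matter, which is why the residual standard error is worth examining alongside it.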
> model=lm(height~foot,data=statstu)        # fits the linear model
> r=residuals(model)                        # defines the vector of residuals
> f=fitted(model)                           # defines the vector of fitted values
> bs=coefficients(model)                    # a vector with the intercept and slope
> xyplot(r~foot,data=statstu)               # plotting the residuals
> xyplot(height~foot,data=statstu)          # plotting the data
> ladd(panel.abline(model))                 # adds the regression line to data plot
> summary(model)                            # summary of statistics and inference
> confint(model)                            # confidence intervals for betas
> x=data.frame(foot=c(25,30,35))            # a dataframe of x values
> predict(model,x,interval='confidence')    # predictions and confidence interval for mean of y at x
> predict(model,x,interval='prediction')    # prediction and prediction interval for value of y at x
College GPA
(We are going to investigate John’s question about how R² might combine.) The dataset SampleGrads.csv, in Stob’s data directory, contains a random sample of 100 Calvin graduates who entered Calvin between 2001 and 2005.
Variables include:
ACT.Comp          Composite ACT score
Start.Term.GPA    GPA in first term at Calvin
Grad.GPA          GPA upon graduation
Grad              logical variable indicating graduated
Cohort            Year entering Calvin
HS.GPA            High school GPA
For each of the following models write the fitted equation and compute SSE and R².
(A) Grad.GPA = β0 + ε
Equation: \widehat{Grad.GPA} =                    SSE:
(B) Grad.GPA = β0 + β1 HS.GPA + ε
Equation: \widehat{Grad.GPA} =                    SSE:            R²:
(C) Grad.GPA = β0 + β1 ACT.Comp + ε
Equation: \widehat{Grad.GPA} =                    SSE:            R²:
(D) Grad.GPA = β0 + β1 ACT.Comp + β2 HS.GPA + ε
Equation: \widehat{Grad.GPA} =                    SSE:            R²:
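SSE and R² can be computed directly from the residuals. The sketch below uses a simulated stand-in data frame with the same column names as SampleGrads.csv, because the actual file lives in the course data directory, so the numbers it produces say nothing about Calvin graduates. The helper names sse and r2 are ours. With the real data, replace the simulated grads with the result of read.csv() on the course file.

```r
# Simulated stand-in for SampleGrads.csv (hypothetical values, NOT real data)
set.seed(1)
n <- 100
grads <- data.frame(HS.GPA   = runif(n, 2.5, 4.0),
                    ACT.Comp = round(runif(n, 18, 34)))
grads$Grad.GPA <- 1 + 0.5 * grads$HS.GPA + 0.02 * grads$ACT.Comp +
  rnorm(n, sd = 0.3)

# SSE: sum of squared residuals of a fitted model
sse <- function(model) sum(residuals(model)^2)

# R^2 = 1 - SSE / SST, where SST is the SSE of the intercept-only model (A)
r2 <- function(model, null_model) 1 - sse(model) / sse(null_model)

mA <- lm(Grad.GPA ~ 1, data = grads)
mB <- lm(Grad.GPA ~ HS.GPA, data = grads)
mC <- lm(Grad.GPA ~ ACT.Comp, data = grads)
mD <- lm(Grad.GPA ~ ACT.Comp + HS.GPA, data = grads)

sapply(list(A = mA, B = mB, C = mC, D = mD), sse)
sapply(list(B = mB, C = mC, D = mD), r2, null_model = mA)

# The helper agrees with the value reported by summary()
all.equal(r2(mD, mA), summary(mD)$r.squared)
```

Model (A) serves as the baseline: its SSE is the total variation in Grad.GPA around the mean, and every R² measures the fraction of that variation removed by adding predictors.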
1. Is high-school GPA or ACT score more useful for predicting graduating GPA?
2. Compare models (B) and (D). What does this seem to say about using both ACT scores and high-school GPA for
predicting graduating GPA?
3. You might ask why ACT scores do not add more predictive power to high-school GPA than they do. To think
about this question, fit the following model and think about what the result says:
HS.GPA = β0 + β1 ACT.Comp + ε
Equation: \widehat{HS.GPA} =                    R²: