Mathematics 243                                        Day 42 - April 23

More explanatory variables

Observation = Model + Error

    y_i = β_0 + β_1 x_{1,i} + ··· + β_k x_{k,i} + ε_i

Observation = Fitted + Residual

    y_i = b_0 + b_1 x_{1,i} + ··· + b_k x_{k,i} + e_i

1980 Census Undercount

The dataset Ericksen in the package car has data on undercounting in the 1980 census. It includes data on the 16 largest cities, the rest of the states in which those cities are located, and the other US states. We are interested in estimating the percentage by which the 1980 census undercounted the city or state population (undercount). Ericksen et al. also considered using other variables in the dataset. For example, they tried the model:

    undercount = β_0 + β_1 minority + β_2 crime + ε

The R code to fit this model is here:

    > model=lm(undercount~minority+crime,data=Ericksen)
    > model

    Call:
    lm(formula = undercount ~ minority + crime, data = Ericksen)

    Coefficients:
    (Intercept)     minority        crime
       -1.60665      0.06595      0.03561

1. What are the three assumptions for the regression model in this specific case?

2. To graph the data, try

    > cloud(undercount~minority+crime,data=Ericksen)

3. Write the equation for predicting undercount percentage from minority percentage and crime rate.

4. Compare the residual standard errors of this model and the model with only minority percentage. Comment.

5. Compare the values of R^2. Comment.

6. Write 95% confidence intervals for β_1 and β_2.

7. Interpret b_1. In other words, how would you explain to someone what b_1 means?

8. The minority percentage and crime rate in Grand Rapids were 18.9% and 87, respectively. Predict the Grand Rapids undercount rate and compare it to the prediction from the previous model.

Census model yet again

Let's not stop here. The authors also considered a model that used the variable conventional, which was the percentage of the enumeration done by the traditional "door-to-door" method.

    undercount = β_0 + β_1 minority + β_2 crime + β_3 conventional + ε

1. Fit this model. What are the estimates of the betas?

2. Examine such evidence as you want to decide how much better this model is than the previous one.

3. The percentage of conventional enumeration for Grand Rapids was 0. Predict the Grand Rapids undercount percentage using this model. Compare to your other two models.

4. We would like to look at residuals, but what would we plot them against? (Try residual plots against each predictor, the response, and the fitted values.)

    > model=lm(height~foot,data=statstu)         # fits the linear model
    > r=residuals(model)                         # defines the vector of residuals
    > f=fitted(model)                            # defines the vector of fitted values
    > bs=coefficients(model)                     # a vector with the intercept and slope
    > xyplot(r~foot,data=statstu)                # plotting the residuals
    > xyplot(height~foot,data=statstu)           # plotting the data
    > ladd(panel.abline(model))                  # adds the regression line to data plot
    > summary(model)                             # summary of statistics and inference
    > confint(model)                             # confidence intervals for betas
    > x=data.frame(foot=c(25,30,35))             # a dataframe of x values
    > predict(model,x,interval='confidence')     # predictions and confidence interval for mean of y at x
    > predict(model,x,interval='prediction')     # prediction and prediction interval for value of y at x

College GPA

(We are going to investigate John's question about how R^2 might combine.)

A dataset containing a random sample of 100 Calvin graduates who entered Calvin between 2001 and 2005 is in Stob's data directory and is named SampleGrads.csv. Variables include

    ACT.Comp          Composite ACT score
    Start.Term.GPA    GPA in first term at Calvin
    Grad.GPA          GPA upon graduation
    Grad              logical variable indicating graduated
    Cohort            Year entering Calvin
    HS.GPA            High school GPA

For each of the following models, write the fitted equation and compute SSE and R^2.
(A) Grad.GPA = β_0 + ε

    Equation: \widehat{Grad.GPA} =
    SSE:

(B) Grad.GPA = β_0 + β_1 HS.GPA + ε

    Equation: \widehat{Grad.GPA} =
    SSE:                R^2:

(C) Grad.GPA = β_0 + β_1 ACT.Comp + ε

    Equation: \widehat{Grad.GPA} =
    SSE:                R^2:

(D) Grad.GPA = β_0 + β_1 ACT.Comp + β_2 HS.GPA + ε

    Equation: \widehat{Grad.GPA} =
    SSE:                R^2:

1. Is high-school GPA or ACT score more useful for predicting graduating GPA?

2. Compare models (B) and (D). What does this seem to say about using both ACT scores and high-school GPA for predicting graduating GPA?

3. You might ask why ACT scores do not add more predictive power to high-school GPA than they do. To think about this question, fit the following model and think about what the result says:

    HS.GPA = β_0 + β_1 ACT.Comp + ε

    Equation: \widehat{HS.GPA} =
    R^2:
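One possible way to organize the SSE and R^2 computations for models (A) through (D) is sketched below. This is only an illustration: the data frame grads is simulated here as a stand-in for SampleGrads.csv (the numbers are made up and are not the real data), and the helper name fit_stats is hypothetical. With the real file, you would presumably read it with read.csv and skip the simulation step.

```r
# Simulated stand-in for SampleGrads.csv (made-up values, NOT the real data).
set.seed(243)
n <- 100
ACT.Comp <- round(rnorm(n, mean = 26, sd = 4))
HS.GPA   <- pmin(4, 2.0 + 0.06 * ACT.Comp + rnorm(n, sd = 0.35))
Grad.GPA <- pmin(4, 0.8 + 0.6 * HS.GPA + 0.02 * ACT.Comp + rnorm(n, sd = 0.3))
grads <- data.frame(ACT.Comp, HS.GPA, Grad.GPA)

# Hypothetical helper: SSE and R^2 of a fitted lm object.
fit_stats <- function(model) {
  sse <- sum(residuals(model)^2)          # sum of squared residuals
  c(SSE = sse, R2 = summary(model)$r.squared)
}

# The four models from the worksheet.
models <- list(
  A = lm(Grad.GPA ~ 1, data = grads),
  B = lm(Grad.GPA ~ HS.GPA, data = grads),
  C = lm(Grad.GPA ~ ACT.Comp, data = grads),
  D = lm(Grad.GPA ~ ACT.Comp + HS.GPA, data = grads)
)

# One row per model, columns SSE and R2.
stats <- t(sapply(models, fit_stats))
print(round(stats, 4))

# R^2 can also be recovered from SSE alone: R^2 = 1 - SSE/SST,
# where SST is the SSE of the intercept-only model (A).
sst <- stats["A", "SSE"]
r2_from_sse <- 1 - stats[, "SSE"] / sst
print(round(r2_from_sse, 4))
```

Because (A) is nested in (B) and (C), and both are nested in (D), the SSE column can only stay the same or decrease as variables are added. Comparing how much it drops from (B) to (D) is one way to approach question 2.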