Soc 6Z03: Multiple Regression Homework Answers

10/21/2016
John Fox
2015-09-12
> States <- read.table("D:/Courses/2016-2017/Soc6Z03/Data/States.txt",
+   header=TRUE, sep="", na.strings="NA", dec=".", strip.white=TRUE)
1.
> RegModel.1 <- lm(satMath~percentTaking+teacherPay, data=States)
> summary(RegModel.1)

Call:
lm(formula = satMath ~ percentTaking + teacherPay, data = States)

Residuals:
    Min      1Q  Median      3Q     Max
-45.642 -10.763   0.035   8.764  39.615

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   537.9430    15.7742  34.103  < 2e-16 ***
percentTaking  -1.2887     0.1171 -11.003 1.01e-14 ***
teacherPay      1.0328     0.4945   2.089   0.0421 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 17.33 on 48 degrees of freedom
Multiple R-squared:  0.7624,	Adjusted R-squared:  0.7525
F-statistic: 77.02 on 2 and 48 DF,  p-value: 1.045e-15
a = 537.943: The literal interpretation of the regression constant is the predicted average math SAT
score for a state where teachers on average earn nothing and no students took the exam. No state is
anywhere near these values, so the constant has no sensible substantive interpretation.
b1 = −1.289: Each increase of 1 in the percentage of students taking the exam is associated on
average with a decline of a little more than 1 point in the average math SAT score, holding average
teachers’ pay constant. In states where relatively few students take the exam, the stronger students tend
to take it, so this makes sense.
b2 = 1.033: Each increase of $1000 in average teachers’ pay is associated on average with a rise of
about 1 point in the average math SAT score, holding percentage of students taking the exam constant.
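As a worked illustration of these interpretations, the fitted equation can be evaluated by hand. The sketch below hard-codes the rounded coefficients from the summary output. The function name predict_sat_math and the example values (50 percent taking the exam, average pay of $40,000) are hypothetical, chosen only for illustration, and the code assumes teacherPay is recorded in thousands of dollars, as the interpretation of b2 implies.

```r
# Coefficients copied from summary(RegModel.1) above
a  <- 537.9430   # intercept
b1 <- -1.2887    # percentTaking
b2 <-  1.0328    # teacherPay (assumed to be in $1000s)

# Hypothetical helper: predicted average math SAT for given predictor values
predict_sat_math <- function(percentTaking, teacherPay) {
  a + b1 * percentTaking + b2 * teacherPay
}

# Illustrative values only; these do not describe any particular state
base   <- predict_sat_math(percentTaking = 50, teacherPay = 40)
more_p <- predict_sat_math(percentTaking = 51, teacherPay = 40)
more_t <- predict_sat_math(percentTaking = 50, teacherPay = 41)

more_p - base   # equals b1: one more percentage point, pay held constant
more_t - base   # equals b2: $1000 more pay, percentage held constant
```

Changing one predictor by a single unit while fixing the other reproduces the corresponding slope, which is what "holding constant" means in the interpretations above.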
R^2 = .7624: The regression of average SAT math score on percentage of students taking the exam and
average teachers’ pay accounts for more than 3/4 of the variation among states in the average SAT math
scores, a substantial amount.
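The other summary statistics in the output follow from R^2 and the degrees of freedom, which gives a quick consistency check. The sketch below uses only the rounded numbers printed above. The variable names are illustrative, and small discrepancies from the printed output are expected because R^2 is rounded to four digits.

```r
# Values copied from summary(RegModel.1) above
r2 <- 0.7624       # multiple R-squared
k  <- 2            # number of predictors
df <- 48           # residual degrees of freedom, n - k - 1
n  <- df + k + 1   # implied number of observations

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df

# Overall F-statistic expressed in terms of R-squared
f_stat <- (r2 / k) / ((1 - r2) / df)

round(adj_r2, 4)   # compare with "Adjusted R-squared" in the output
round(f_stat, 2)   # compare with "F-statistic", up to rounding of r2
```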
2.
> RegModel.2 <- lm(satMath~teacherPay, data=States)
> summary(RegModel.2)

Call:
lm(formula = satMath ~ teacherPay, data = States)

Residuals:
    Min      1Q  Median      3Q     Max
-64.971 -22.715   0.179  14.881  66.008

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 610.3956    26.6261  22.925  < 2e-16 ***
teacherPay   -2.2603     0.7312  -3.091  0.00328 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 32.19 on 49 degrees of freedom
Multiple R-squared:  0.1632,	Adjusted R-squared:  0.1461
F-statistic: 9.556 on 1 and 49 DF,  p-value: 0.003284
The coefficient for teachers’ pay in the simple regression is negative, b = −2.260: As teachers’ pay
rises, the average SAT math score declines.
The difference in sign from the multiple regression is due to the failure to control for percentage of
students taking the exam, which becomes a lurking variable: Teachers’ pay is positively correlated with
percentage of students taking the exam (r = .605, computed using the R Commander; i.e., states that pay
their teachers relatively well tend to have a larger percentage of students taking the SAT), and
percentage of students taking the exam is in turn negatively related to average SAT math score.
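The sign reversal can be made concrete with a standard least-squares identity: the simple-regression slope for teacherPay equals b2 + b1 × d, where d is the slope from regressing percentTaking on teacherPay. The sketch below first backs out the d implied by the coefficients reported above, then checks the identity on synthetic data. The simulated variables (pay, taking, sat) and their generating values are assumptions made only for illustration and are not the States data.

```r
# Coefficients copied from the two summaries above
b1       <- -1.2887   # percentTaking, multiple regression
b2       <-  1.0328   # teacherPay, multiple regression
b_simple <- -2.2603   # teacherPay, simple regression

# Implied slope from regressing percentTaking on teacherPay
d <- (b_simple - b2) / b1
d

# Check the identity on synthetic data (NOT the States data)
set.seed(123)
pay    <- rnorm(51, mean = 35, sd = 6)
taking <- 2.5 * pay - 50 + rnorm(51, sd = 20)
sat    <- 540 - 1.3 * taking + 1.0 * pay + rnorm(51, sd = 17)

full   <- coef(lm(sat ~ taking + pay))   # both predictors
simple <- coef(lm(sat ~ pay))            # lurking variable omitted
aux    <- coef(lm(taking ~ pay))         # omitted variable on included one

# simple slope = partial slope + (slope of omitted variable) * (auxiliary slope)
identity_holds <- all.equal(unname(simple["pay"]),
                            unname(full["pay"] + full["taking"] * aux["pay"]))
identity_holds
```

A positive d combined with a negative b1 contributes a negative term large enough to outweigh the positive b2, which is how the simple-regression slope ends up negative.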
3.
> States$fitted.RegModel.1 <- fitted(RegModel.1)
> States$residuals.RegModel.1 <- residuals(RegModel.1)
> scatterplot(residuals.RegModel.1~percentTaking,
+   reg.line=FALSE, smooth=TRUE, spread=TRUE,
+   boxplots=FALSE, span=0.9, ellipse=FALSE, levels=c(.5, .9),
+   data=States)
> scatterplot(residuals.RegModel.1~teacherPay,
+   reg.line=FALSE, smooth=TRUE, spread=TRUE,
+   boxplots=FALSE, span=0.9, ellipse=FALSE, levels=c(.5, .9),
+   data=States)
> scatterplot(residuals.RegModel.1~fitted.RegModel.1,
+   reg.line=FALSE, smooth=TRUE, spread=TRUE,
+   boxplots=FALSE, span=0.9, ellipse=FALSE, levels=c(.5, .9),
+   data=States)
The residual plot for percentTaking reveals that the partial relationship between satMath and
percentTaking is nonlinear.
There is also nonlinearity apparent in the plot of residuals against fitted values.
The residual plot for teacherPay shows only smaller departures from linearity.
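For readers without the car package, a rough base-R counterpart of these residual plots is sketched below. The helper residual_plot is hypothetical, it uses lowess() rather than the smoother that scatterplot() applies, and it is demonstrated on synthetic curved data rather than the States data, so it shows only the general pattern to look for: a smooth that bends away from the zero line signals nonlinearity.

```r
# Hypothetical helper: plot residuals against x with a lowess smooth
residual_plot <- function(model, x, xlab = "x") {
  res <- residuals(model)
  plot(x, res, xlab = xlab, ylab = "Residuals")
  abline(h = 0, lty = 2)              # reference line at zero
  smooth <- lowess(x, res, f = 0.9)   # f is the smoother span
  lines(smooth, lwd = 2)
  invisible(smooth)                   # return the smooth for inspection
}

# Synthetic data with a curved relationship (NOT the States data)
set.seed(1)
x <- runif(51, 0, 80)
y <- 600 - 3 * x + 0.025 * x^2 + rnorm(51, sd = 10)
fit <- lm(y ~ x)   # deliberately misspecified straight-line fit

sm <- residual_plot(fit, x, xlab = "x")
```

With the States data loaded, the analogous call would be residual_plot(RegModel.1, States$percentTaking, xlab = "percentTaking"), assuming no rows were dropped for missing values when the model was fit.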