Homework 9
Q8.6
(a)
(b)
(c) The parabolic trend in the residual plot indicates that the model is misspecified and suggests that adding a second-order (quadratic) term may improve the fit of the model.
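The sketch below illustrates why a parabolic residual pattern points to a missing second-order term. It uses simulated data (hypothetical, not the Q8.6 data) generated from a quadratic mean. A straight-line fit leaves curvature in the residuals, and adding I(x^2) removes it. The names fit1, fit2 and p_quad are illustrative only.

```r
# Simulated data (hypothetical): the true mean is quadratic in x
set.seed(1)
x <- 1:30
y <- 1 + 2 * x + 0.5 * x^2 + rnorm(30, sd = 3)

# First-order model: misspecified, so residuals show a parabolic trend
fit1 <- lm(y ~ x)
# Second-order model: adds the squared term
fit2 <- lm(y ~ x + I(x^2))

# Residuals versus fitted values for both models
par(mfrow = c(1, 2))
plot(fitted(fit1), residuals(fit1), main = "First-order fit")
abline(h = 0, lty = 2)
plot(fitted(fit2), residuals(fit2), main = "Second-order fit")
abline(h = 0, lty = 2)

# t-test p-value for the coefficient of I(x^2)
p_quad <- summary(fit2)$coefficients["I(x^2)", "Pr(>|t|)"]
p_quad
```

With the quadratic term included, the residual band should look patternless and the residual standard error should drop.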
Q8.12
HAWAII=read.table(file.choose(),header=TRUE)
# second-order model, then residuals versus fitted values
out=lm(LEASEFEE~SIZE+I(SIZE^2),HAWAII)
plot(fitted(out),residuals(out))
# second-order model for the lw data
lwx=c(9.6,7.9,11.5,8.2,12,10,10.2,7.7,9.9,11.2)
lwy=c(52.7,43.2,103.8,45.1,73.3,61.3,85,47,54.7,68)
out1=lm(lwy~lwx+I(lwx^2))
summary(out1)
anova(out1)
# second-order model for the uw data
uwx=c(13.5,17.6,15.2,13.8,14.5,18.7,13.2,16.3,12.3,12.4)
uwy=c(70.7,87.6,86.8,144.3,148,171.2,97.5,158.1,74.2,75.2)
out2=lm(uwy~uwx+I(uwx^2))
summary(out2)
anova(out2)
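One compact way to read off the test of H0: β2 = 0 for each data set is to pull the I(x^2) row out of the coefficient table. The helper quad_term_test is hypothetical. The data vectors are copied from the code above, so the results should agree with the summary(out1) and summary(out2) output.

```r
# Data copied from the Q8.12 code above
lwx <- c(9.6, 7.9, 11.5, 8.2, 12, 10, 10.2, 7.7, 9.9, 11.2)
lwy <- c(52.7, 43.2, 103.8, 45.1, 73.3, 61.3, 85, 47, 54.7, 68)
uwx <- c(13.5, 17.6, 15.2, 13.8, 14.5, 18.7, 13.2, 16.3, 12.3, 12.4)
uwy <- c(70.7, 87.6, 86.8, 144.3, 148, 171.2, 97.5, 158.1, 74.2, 75.2)

# Hypothetical helper: fit y = b0 + b1*x + b2*x^2 and return the t-test for b2
quad_term_test <- function(x, y) {
  fit <- lm(y ~ x + I(x^2))
  ct <- summary(fit)$coefficients
  c(estimate = ct["I(x^2)", "Estimate"],
    t_value  = ct["I(x^2)", "t value"],
    p_value  = ct["I(x^2)", "Pr(>|t|)"])
}

# One row per data set
results <- rbind(lw = quad_term_test(lwx, lwy),
                 uw = quad_term_test(uwx, uwy))
print(results)
```

A small p_value supports keeping the curvature term for that data set.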
Q8.18
BOILERS=read.table(file.choose(),header=TRUE)
out=lm(Man.HRs~Capacity+Pressure+Boiler+Drum,BOILERS)
r=residuals(out)
par(mfrow=c(1,2))
hist(r,probability=TRUE) #histogram
#add a normal curve
curve(dnorm(x,mean = 0,sd = sd(r)),col=2,add=TRUE)
qqnorm(r,pch=20) #q-q plot
qqline(r,col=2)
shapiro.test(r)
> shapiro.test(r)
Shapiro-Wilk normality test
data: r
W = 0.9547, p-value = 0.1469
Since the p-value (0.1469) exceeds 0.05, we fail to reject the null hypothesis of normality at the 5% level: there is no evidence that the random errors are not normally distributed.
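The decision above can be wrapped in a small helper that runs shapiro.test() and compares the p-value with the chosen level. The name normality_check and the simulated residuals are hypothetical, used only to show both outcomes.

```r
# Hypothetical helper: Shapiro-Wilk test on residuals with a decision at level alpha
normality_check <- function(r, alpha = 0.05) {
  test <- shapiro.test(r)
  list(W = unname(test$statistic),
       p_value = test$p.value,
       reject_normality = test$p.value < alpha)
}

# Illustration on simulated residuals (hypothetical data)
set.seed(2)
res_normal <- normality_check(rnorm(40))   # drawn from a normal distribution
res_skewed <- normality_check(rexp(200))   # strongly right-skewed
res_normal
res_skewed
```

Applied to the Q8.18 residuals r, the reported p-value of 0.1469 would give reject_normality = FALSE at alpha = 0.05.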
Q8.28
MISSWORK=read.table(file.choose(),header=TRUE)
out1=lm(formula=HOURS~WAGES,data=MISSWORK)
plot(MISSWORK$WAGES,out1$residuals)
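x=MISSWORK$WAGES
X=cbind(1,x) # design matrix: column of ones (intercept) plus WAGES
H=X%*%solve(t(X)%*%X)%*%t(X) # hat matrix H = X(X'X)^{-1}X'
H[13,13] # leverage of observation 13; same as hatvalues(out1)[13]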
MISSWORK_d=subset(MISSWORK,HOURS!=543)
out2=lm(formula=HOURS~WAGES,data=MISSWORK_d)
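For a model with an intercept, such as out1, the hat matrix must be built from a design matrix that includes a column of ones. The built-in hatvalues() returns the same diagonal. Two checks follow from the theory: every leverage is at least 1/n, and the leverages sum to k + 1. The sketch uses simulated stand-in data (hypothetical, not the MISSWORK file) and applies the 2(k+1)/n cutoff used in part (d).

```r
# Simulated stand-in for MISSWORK (hypothetical values, not the textbook data)
set.seed(3)
wages <- round(runif(15, 8, 16), 2)
hours <- round(200 - 9 * wages + rnorm(15, sd = 40))
fit <- lm(hours ~ wages)

# Design matrix with the intercept column, matching lm(hours ~ wages)
X <- cbind(1, wages)
H <- X %*% solve(t(X) %*% X) %*% t(X)
h <- diag(H)

# Same leverages from the built-in accessor
all.equal(unname(h), unname(hatvalues(fit)))

# Rule-of-thumb cutoff 2(k + 1)/n with k = 1 predictor
n <- nrow(X); k <- 1
cutoff <- 2 * (k + 1) / n
which(h > cutoff)   # observations flagged as high leverage (may be none)
```

In practice, hatvalues(out1)[13] is the shortest route to the leverage of observation 13.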
(a)
lm(formula = HOURS ~ WAGES, data = MISSWORK)

Residuals:
   Min     1Q Median     3Q    Max
-78.22 -52.68 -46.50   0.45 436.54

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  222.636    119.900   1.857   0.0861 .
WAGES         -9.601      9.502  -1.010   0.3307
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 130.9 on 13 degrees of freedom
Multiple R-squared: 0.07282, Adjusted R-squared: 0.001496
F-statistic: 1.021 on 1 and 13 DF, p-value: 0.3307
(b)
(c)
(d)
The leverage of the 13th observation is the 13th diagonal element of the hat matrix H = X(X'X)^{-1}X', where X is the design matrix (including the intercept column).
> H[13,13]
[1] 0.06129994
Compare h_13 = 0.0613 with the cutoff 2(k+1)/n = 2(1+1)/15 = 0.2667.
Since 0.0613 < 0.2667, the 13th observation is not influential by the leverage criterion.
Alternatively, influence can be assessed with jackknife (studentized deleted) residuals or Cook’s distance.
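Both alternatives are available in base R through rstudent() (studentized deleted residuals) and cooks.distance(). The sketch uses simulated data with one planted outlier at position 13 (hypothetical, not the MISSWORK data). The cutoffs shown, |d| > 3 and Cook's D above the 50th percentile of F(k+1, n-k-1), are common rules of thumb; conventions vary between texts.

```r
# Simulated data with one planted outlier (hypothetical, for illustration)
set.seed(4)
x <- 1:15
y <- 2 + 0.5 * x + rnorm(15, sd = 0.5)
y[13] <- y[13] + 20   # planted outlier at observation 13
fit <- lm(y ~ x)

# Studentized deleted (jackknife) residuals: each point is left out of its own fit
d <- rstudent(fit)
# Cook's distance: combines residual size and leverage
cd <- cooks.distance(fit)

# Rules of thumb (conventions vary)
k <- 1; n <- length(y)
flag_d  <- which(abs(d) > 3)
flag_cd <- which(cd > qf(0.5, k + 1, n - k - 1))
flag_d
flag_cd
```

For the homework model, rstudent(out1) and cooks.distance(out1) would be the corresponding calls.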
(e)
lm(formula = HOURS ~ WAGES, data = MISSWORK_d)

Residuals:
   Min     1Q Median     3Q    Max
-47.03 -22.45 -15.84  10.85  74.50

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  191.256     36.210   5.282 0.000194 ***
WAGES         -9.585      2.861  -3.350 0.005784 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 39.43 on 12 degrees of freedom
Multiple R-squared: 0.4832, Adjusted R-squared: 0.4401
F-statistic: 11.22 on 1 and 12 DF, p-value: 0.005784
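After the observation with HOURS = 543 is removed, R-squared rises sharply from 0.0728 to 0.4832, the residual standard error falls from 130.9 to 39.43, and the WAGES slope becomes significant (p = 0.0058 versus 0.3307), even though the slope estimate barely changes (-9.585 versus -9.601).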
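The before/after comparison can be tabulated directly from summary(). The sketch uses simulated data with one planted outlier (hypothetical, not the MISSWORK data), and the helper fit_stats is a hypothetical name. It refits without observation 13 and reports R-squared, the residual standard error, and the slope p-value side by side.

```r
# Simulated data with one planted outlier (hypothetical)
set.seed(5)
x <- 1:15
y <- 200 - 9 * x + rnorm(15, sd = 10)
y[13] <- y[13] + 300   # planted outlier
dat <- data.frame(x = x, y = y)

fit_all  <- lm(y ~ x, data = dat)
fit_drop <- lm(y ~ x, data = dat[-13, ])   # refit without observation 13

# Hypothetical helper: key summary statistics of a simple linear fit
fit_stats <- function(f) {
  s <- summary(f)
  c(r_squared = s$r.squared,
    sigma     = s$sigma,
    slope_p   = s$coefficients["x", "Pr(>|t|)"])
}
cmp <- rbind(all = fit_stats(fit_all), dropped = fit_stats(fit_drop))
print(cmp)
```

A single gross outlier inflates the residual standard error, which widens the slope's standard error and can hide a real relationship, mirroring the pattern in parts (a) and (e).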