Solution of HW 2

ACMS 30600 HW #2 Solutions
28 points [should have been 26 points; everyone receives 2/2 free]
1. Compute 95% CI of Beta 1
0.03618-qt(0.975,14)*0.01200
0.03618+qt(0.975,14)*0.01200
Resulting interval is (0.0104, 0.0619).
2. How does the CI you calculated for Beta 1 relate to the p-value for estimated Beta 1?
The p-value for β1 in the R output, 0.00927, indicates that we would reject the null hypothesis for the
test H0: β1 = 0 vs. Ha: β1 ≠ 0 at α = 0.05. Since this 95% CI does not include the null value β1 = 0,
the CI is consistent with the test.
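The agreement between the interval and the test can be checked numerically. The sketch below is an illustration that uses only the estimate (0.03618), standard error (0.01200), and 14 degrees of freedom quoted above; the variable names are placeholders, not part of the original solution.

```r
# Values taken from the R output quoted in this solution
b1  <- 0.03618   # estimated slope
se1 <- 0.01200   # standard error of the slope
df  <- 14        # residual degrees of freedom

# Two-sided t-test of H0: beta1 = 0
t.stat  <- b1 / se1
p.value <- 2 * pt(-abs(t.stat), df)

# 95% confidence interval for beta1
ci <- b1 + c(-1, 1) * qt(0.975, df) * se1

# The test rejects at alpha = 0.05 exactly when the 95% CI excludes 0
reject     <- p.value < 0.05
excludes.0 <- ci[1] > 0 || ci[2] < 0

print(c(t = t.stat, p = p.value))
print(ci)
print(reject == excludes.0)
```

The computed p-value should land close to the 0.00927 reported by R; small differences come from the rounded inputs.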
3. Report and interpret both the coefficient of determination and the coefficient of correlation (4 pts.)
The coefficient of determination is r² = 0.3937. Thus, almost 40% of the variation in ACTIVITY is
explained by EMPATHY.
The correlation coefficient, r = 0.6275, indicates a moderately strong positive linear relationship between brain
activity and empathic concern.
Note: It does NOT indicate that an increase of 1 unit in empathy is on average associated with a 0.6275
increase in activity. That would be the interpretation for Beta=0.6275, NOT for r=0.6275.
4. Obtain from R or SAS and interpret a 95% confidence interval for the mean value of y for x=17.
predict(mod.brain,data.frame(EMPATHY=17),interval="confidence",level=0.95)
        fit       lwr       upr
1 0.2225702 0.1329588 0.3121817
We are 95% confident that the mean ACTIVITY score for all subjects with EMPATHY=17 is contained in the
interval (0.1329588, 0.3121817).
5. Obtain from R or SAS and interpret a 95% prediction interval for the mean value of y for x=17.
predict(mod.brain,data.frame(EMPATHY=17),interval="prediction",level=0.95)
        fit        lwr      upr
1 0.2225702 -0.1322776 0.577418
We are 95% confident the ACTIVITY score for a single future individual with EMPATHY=17 will fall in the
interval (-0.1322776,0.577418).
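The prediction interval is wider than the confidence interval because it adds the variability of a single observation to the uncertainty in the estimated mean. As a rough check under the usual simple-linear-regression formulas, the two half-widths reported above imply a residual standard error. The variable names below are illustrative and not part of the original solution.

```r
fit.17 <- 0.2225702
ci.17  <- c(0.1329588, 0.3121817)    # confidence interval for the mean at EMPATHY = 17
pi.17  <- c(-0.1322776, 0.577418)    # prediction interval for one subject at EMPATHY = 17

t.crit <- qt(0.975, 14)              # 14 residual degrees of freedom

se.mean <- (ci.17[2] - fit.17) / t.crit   # standard error of the estimated mean
se.pred <- (pi.17[2] - fit.17) / t.crit   # standard error for a new observation

# se.pred^2 = se.mean^2 + s^2, so the residual standard error s can be recovered
s <- sqrt(se.pred^2 - se.mean^2)
print(c(se.mean = se.mean, se.pred = se.pred, s = s))
```

The recovered s should agree, up to rounding, with the residual standard error printed by summary(mod.brain).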
6. Using R, fit a multiple regression model with y=sr and the other four variables as predictors. Report
and interpret the p-value for the Model Utility F-test.
savings<-read.table("savings.txt", header=TRUE)
# pass the data frame explicitly instead of attach()ing it
mod.complete<-lm(sr~pop15+pop75+dpi+ddpi, data=savings)
The p-value for the Model Utility Test is 0.0007904. Since this is less than α = 0.05, we reject the null hypothesis that all
slopes are equal to zero and conclude that at least one predictor is statistically significant (or "the
model is better than an intercept-only model").
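The Model Utility F-test compares the full model with an intercept-only model. The sketch below assumes that savings.txt holds the same data as R's built-in LifeCycleSavings data frame (an assumption; the variable names match), so that it runs without the file.

```r
# Assumption: LifeCycleSavings (datasets package) stands in for savings.txt
savings.demo <- LifeCycleSavings

mod.null <- lm(sr ~ 1, data = savings.demo)                            # intercept only
mod.full <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = savings.demo)   # all four predictors

# anova() on the two nested models carries out the Model Utility F-test
f.table <- anova(mod.null, mod.full)
print(f.table)

# The same p-value, computed from the F statistic stored by summary()
f <- summary(mod.full)$fstatistic   # value, numerator df, denominator df
p.value <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)
print(unname(p.value))
```

Both routes give the same p-value, which is the one printed on the last line of summary() output.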
7. Report the individual tests for each coefficient
Note: This implies that the t-statistics and p-values are listed, and that there is also an interpretation of whether to
reject the null hypothesis that the coefficient is zero. One point was deducted if there was no
interpretation. No credit was given if only the coefficient estimates were provided.
Example: Since the p-value for pop15 is 0.002603 (t-value of -3.189), which is less than alpha=0.05 (equivalently, |t| is larger
than the critical value of about 2.014 for 45 degrees of freedom), we can reject the null hypothesis and conclude that the coefficient on
pop15 is statistically significantly different from zero.
Note: If the p-value is larger than 0.05, you don't accept the null hypothesis; instead, you fail to reject
the null hypothesis.
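The decision rule in the example can be applied to every slope at once. The sketch below re-enters the t values reported by summary(mod.complete) in this solution and uses the 45 residual degrees of freedom; the object names are illustrative.

```r
# t values copied from the coefficient table of summary(mod.complete)
t.values <- c(pop15 = -3.189, pop75 = -1.561, dpi = -0.362, ddpi = 2.088)
df.resid <- 45
alpha    <- 0.05

t.crit   <- qt(1 - alpha / 2, df.resid)        # two-sided critical value for 45 df
p.values <- 2 * pt(-abs(t.values), df.resid)   # two-sided p-values

# Reject H0: beta_j = 0 when the p-value is below alpha (equivalently |t| > t.crit)
decision <- ifelse(p.values < alpha, "reject H0", "fail to reject H0")

print(t.crit)
print(data.frame(t = t.values, p = round(p.values, 6), decision))
```

The p-value rule and the critical-value rule always lead to the same decision, since both are based on the same t distribution.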
summary(mod.complete)
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 28.5660865  7.3545161   3.884 0.000334 ***
pop15       -0.4611931  0.1446422  -3.189 0.002603 **
pop75       -1.6914977  1.0835989  -1.561 0.125530
dpi         -0.0003369  0.0009311  -0.362 0.719173
ddpi         0.4096949  0.1961971   2.088 0.042471 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.803 on 45 degrees of freedom
Multiple R-squared: 0.3385, Adjusted R-squared: 0.2797
F-statistic: 5.756 on 4 and 45 DF, p-value: 0.0007904
8. 95% Confidence Intervals:
confint(mod.complete, 'pop15', level=0.95) [-0.752517542, -0.16986752]
confint(mod.complete, 'pop75', level=0.95) [-3.873977955, 0.490982602]
confint(mod.complete, 'dpi', level=0.95) [-0.002212248, 0.001538444]
confint(mod.complete, 'ddpi', level=0.95) [0.014533628, 0.804856227]
Note: In general, when reporting an interval, it is implied that the interval should be interpreted.
However, since very few students interpreted these intervals, no points were deducted for missing
interpretations here.
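One way to read these intervals: an interval that excludes zero corresponds to a slope that is significant at the 5% level. The sketch below assumes that R's built-in LifeCycleSavings data frame stands in for savings.txt (an assumption; the variable names match) and rebuilds the intervals by hand as estimate ± t × SE.

```r
# Assumption: LifeCycleSavings (datasets package) stands in for savings.txt
mod.demo <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

ci <- confint(mod.demo, level = 0.95)     # intervals for all coefficients at once

# Manual construction: estimate +/- t critical value * standard error
est    <- coef(summary(mod.demo))[, "Estimate"]
se     <- coef(summary(mod.demo))[, "Std. Error"]
t.crit <- qt(0.975, df.residual(mod.demo))
ci.manual <- cbind(est - t.crit * se, est + t.crit * se)

# TRUE where the interval excludes zero (slope significant at the 5% level)
excludes.zero <- ci[, 1] > 0 | ci[, 2] < 0
print(cbind(ci, excludes.zero))
```

For example, an interval for pop15 that lies entirely below zero would be read as: holding the other predictors fixed, we are 95% confident that a one-unit increase in pop15 is associated with a decrease in mean sr somewhere between the two endpoints.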
9. Compute Variance/Covariance Matrix:
vcov(mod.complete)
This code generates the correct output, but the output itself must be printed for full credit.
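For reference, the matrix is shown by evaluating the call at the console or wrapping it in print(). Its diagonal holds the squared standard errors, which gives a quick sanity check. The sketch assumes that R's built-in LifeCycleSavings data frame stands in for savings.txt (an assumption; the variable names match).

```r
# Assumption: LifeCycleSavings (datasets package) stands in for savings.txt
mod.demo <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

V <- vcov(mod.demo)   # 5 x 5 variance/covariance matrix of the coefficient estimates
print(V)

# Square roots of the diagonal reproduce the "Std. Error" column of summary()
se.from.vcov    <- sqrt(diag(V))
se.from.summary <- coef(summary(mod.demo))[, "Std. Error"]
print(cbind(se.from.vcov, se.from.summary))
```

The off-diagonal entries are the covariances between pairs of coefficient estimates, so the matrix is symmetric.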
Part II: R Activity
1.
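# Bootstrap distribution of r^2: resample the 16 subjects with replacement
r.squared.vector<-numeric(1000)
for (i in 1:1000){
  my.sample<-sample(1:16,16,replace=TRUE)   # resampled subject indices
  ACTIVITY.i<-ACTIVITY[my.sample]
  EMPATHY.i<-EMPATHY[my.sample]
  r.i<-cor(ACTIVITY.i,EMPATHY.i)
  r.squared.vector[i]<-r.i^2                # store r^2 for this resample
}
hist(r.squared.vector)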
2.
quantile(r.squared.vector,.025)
quantile(r.squared.vector,.975)
Actual answers will vary due to the randomness of the bootstrap method. I obtained
(0.06999965, 0.7193106).
3.
We are 95% confident that the true population value of the coefficient of determination is contained in
the above interval.
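The three steps above can be collected into one reproducible sketch. Because the brain data are not reproduced here, the example simulates 16 hypothetical (x, y) pairs; the seed, simulated values, and object names are assumptions for illustration only.

```r
set.seed(30600)                          # hypothetical seed so the results repeat
n <- 16
x <- rnorm(n, mean = 20, sd = 5)         # simulated stand-in for EMPATHY
y <- 0.03 * x + rnorm(n, sd = 0.15)      # simulated stand-in for ACTIVITY

# One bootstrap replicate: resample subjects (x, y pairs) with replacement
boot.r2 <- function() {
  idx <- sample(n, n, replace = TRUE)
  cor(x[idx], y[idx])^2
}

# 1000 replicates, then the 2.5% and 97.5% percentiles as the interval
r.squared.boot <- replicate(1000, boot.r2())
ci.boot <- quantile(r.squared.boot, c(0.025, 0.975))
print(ci.boot)
```

Setting a seed makes the interval repeatable from run to run; calling hist(r.squared.boot) afterwards would display the bootstrap distribution as in step 1.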