ST430 Exam 1 with Answers
Date: October 5, 2015
Name:
Guidelines:
• You may use one page (front and back of a standard A4 sheet) of notes.
• No laptops or textbooks are permitted, but you may use a calculator.
• Giving or receiving assistance from other students is not allowed.
• Show work to receive partial credit! Partial credit will be given, but only for work
written on the exam.
• The total points are 25.
• Good luck!
1. Assume that the math scores of high school seniors in North Carolina are normally distributed
with mean 82 and standard deviation 5.
(a) (2 points) Compute the z-score of a student with math score 85. Would you say this
student has an extremely high math score? Justify your answer.
ANSWER: z-score = (85 − 82)/5 = 0.6. For a normal distribution, about 95% of the
values lie within 2 standard deviations of the mean; here |0.6| < 2, so a score of 85 is
not an extreme value.
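The arithmetic in (a) can be checked with a few lines of Python. This is only a sketch: the helper name `z_score` and the 2-standard-deviation cutoff are illustrative choices that mirror the rule of thumb used in the answer.

```python
def z_score(x, mean, sd):
    """Number of standard deviations by which x lies above the mean."""
    return (x - mean) / sd

# Values from the problem: score 85, mean 82, standard deviation 5
z = z_score(85, 82, 5)

# Rule of thumb from the answer: more than 2 SDs from the mean counts as extreme
is_extreme = abs(z) > 2

print(round(z, 2), is_extreme)
```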
(b) (3 points) Let X̄ be the mean math score of a class of 25. What is the probability that
X̄ is greater than 83.6?
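ANSWER:
\[
\begin{aligned}
P(\bar{X} > 83.6) &= P\left(\frac{\bar{X} - 82}{5/\sqrt{25}} > \frac{83.6 - 82}{5/\sqrt{25}}\right) \\
&= P\left(Z > \frac{83.6 - 82}{5/\sqrt{25}}\right) \\
&= P(Z > 1.6) \\
&= .0548
\end{aligned}
\]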
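As a numerical check of (b), the upper-tail normal probability can be computed from the complementary error function in Python's standard library. The helper name `normal_sf` is an assumption introduced here for illustration; on the exam the value would be read from a normal table.

```python
import math

def normal_sf(z):
    """Upper-tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

mu, sigma, n = 82, 5, 25          # population mean, SD, and class size
se = sigma / math.sqrt(n)         # standard error of the sample mean: 5/5 = 1
z = (83.6 - mu) / se              # standardized cutoff, about 1.6
p = normal_sf(z)                  # P(X-bar > 83.6)

print(round(z, 2), round(p, 4))
```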
2. Consider the following three scatter plots with least squares lines.
(a) (1 point) Which least squares line has the largest intercept?
ANSWER: Plot 3, with an intercept of about 3.1.
(b) (1 point) Which least squares line has the largest slope?
ANSWER: Plot 1 has the largest slope.
(c) (2 points) Which simple linear regression has the largest coefficient of determination
(R²)?
ANSWER: Plot 3, as its line fits the data best among the three.
3. The British Journal of Sports Medicine (April 2000) published a study of the effect of massage
on boxing performance. Two variables measured on the boxers were blood lactate concentration (mM) and the boxer’s perceived recovery (28-point scale). The data were obtained for
16 five-round boxing performances, where a massage was given to the boxer between rounds.
The plot below gives the 95% confidence interval for the average value and the 95% prediction interval for a particular value
of perceived recovery for several levels of blood lactate concentration.
(a) (2 points) Explain why the interval for a particular value is considerably wider than the
interval for the average value.
ANSWER: Recall the formulas for the confidence interval and the prediction interval. The only
difference lies in the margin of error, where
\[
\sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} < \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}.
\]
Intuitively, for the confidence interval we estimate E(y) = β0 + β1 x with ŷ = β̂0 + β̂1 x, so the
error is just ŷ − E(y). For the prediction interval we estimate y with ŷ, so the error is
ŷ − y = ŷ − E(y) + (E(y) − y) = ŷ − E(y) − ε, which carries the additional error ε of the
individual observation.
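To see how much the extra "1 +" term matters, the two square-root factors can be compared numerically. In this sketch only n = 16 comes from the problem; the values of `x_bar` and `ss_xx` are made up for illustration, and the function names are hypothetical.

```python
import math

def ci_factor(n, x_p, x_bar, ss_xx):
    """Square-root factor in the confidence interval for the mean E(y)."""
    return math.sqrt(1 / n + (x_p - x_bar) ** 2 / ss_xx)

def pi_factor(n, x_p, x_bar, ss_xx):
    """Square-root factor in the prediction interval for an individual y."""
    return math.sqrt(1 + 1 / n + (x_p - x_bar) ** 2 / ss_xx)

# n = 16 as in the boxing data; x_bar and ss_xx are hypothetical values
n, x_bar, ss_xx = 16, 5.0, 20.0

factors = []
for x_p in (3.0, 5.0, 7.0):
    ci = ci_factor(n, x_p, x_bar, ss_xx)
    pi = pi_factor(n, x_p, x_bar, ss_xx)
    factors.append((x_p, ci, pi))
    print(x_p, round(ci, 3), round(pi, 3))
```

Both factors are smallest at x_p = x̄, and the prediction factor never drops below 1, which is why the prediction band stays wide even where the confidence band is narrow.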
(b) (1 point) Would it be wise to use this simple linear regression model to predict a boxer’s
perceived recovery if the blood lactate level is 1 mM? Explain.
ANSWER: No. Prediction at a value of the explanatory variable that lies outside the
range of the observed data is unreliable, and 1 mM is well below the lower bound
of the data.
4. The R output for the data of the previous problem is:
Call:
lm(formula = RECOVERY ~ LACTATE, data = BOXING2)

Residuals:
   Min     1Q Median     3Q    Max
-6.577 -3.752  0.060  3.067  8.043

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.7967     4.9838   0.561   0.5836
LACTATE       2.5667     0.9883   2.597   0.0211 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.28 on 14 degrees of freedom
Multiple R-squared:  0.3251,    Adjusted R-squared:  0.2769
F-statistic: 6.744 on 1 and 14 DF,  p-value: 0.0211
(a) (1 point) Give the least squares regression line.
ANSWER: ŷ = 2.7967 + 2.5667 × LACTATE
(b) (2 points) Give the slope and its interpretation in the context of the problem.
ANSWER: The slope is 2.5667. Thus, a boxer’s perceived recovery (28-point scale) is estimated to increase by 2.5667 points, on average, for each 1 mM increase in blood lactate
concentration.
(c) (1 point) Give the sample correlation between blood lactate level and perceived recovery.
ANSWER: The sample correlation is just √R² = √.3251 = .57 (note that the sign must agree
with the sign of the slope).
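The sign rule can be made explicit in code. This short sketch takes R² and the slope from the R output and uses `math.copysign` to attach the slope's sign to the square root; the variable names are illustrative.

```python
import math

r_squared = 0.3251   # Multiple R-squared from the R output
slope = 2.5667       # LACTATE coefficient from the R output

# In simple linear regression, r has the magnitude sqrt(R^2) and the sign of the slope
r = math.copysign(math.sqrt(r_squared), slope)

print(round(r, 2))
```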
(d) (2 points) Is there a statistically significant association between blood lactate level and
perceived recovery at the 0.05 level? Explain.
ANSWER: H0: β1 = 0, Ha: β1 ≠ 0.
The t-test for LACTATE has p-value .0211 < .05. Therefore, we
reject the null hypothesis and conclude that there is a statistically significant association between blood
lactate level and perceived recovery.
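The test statistic in the output can be reproduced from the estimate and its standard error. The sketch below also checks the standard fact that, with a single predictor, the overall F statistic equals the square of the slope's t statistic. The p-value is copied from the R output rather than recomputed, since Python's standard library has no t distribution.

```python
estimate, std_error = 2.5667, 0.9883   # LACTATE row of the R output

t_value = estimate / std_error         # should match the reported 2.597
f_value = t_value ** 2                 # simple regression: F = t^2, reported as 6.744

p_value = 0.0211                       # copied from the R output, not recomputed
alpha = 0.05
reject_null = p_value < alpha

print(round(t_value, 3), round(f_value, 2), reject_null)
```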
(e) (1 point) Would you say there is a strong linear relationship between blood lactate level
and perceived recovery? Explain.
ANSWER: The R-squared is 0.3251, which means only about 32.5 percent of the variation in perceived recovery is explained by the model. Hence the linear relationship is not very
strong.
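As a consistency check, R² and adjusted R² can be recovered from the F statistic and the degrees of freedom in the output, using the standard identities noted in the comments. The variable names are illustrative, and small differences from the printed values are due to rounding in the output.

```python
# Values read from the R output for RECOVERY ~ LACTATE
f_value = 6.744    # F-statistic
df_model = 1       # one predictor
df_resid = 14      # residual degrees of freedom

n = df_model + df_resid + 1   # sample size, 16

# R^2 = F * df_model / (F * df_model + df_resid)
r_squared = f_value * df_model / (f_value * df_model + df_resid)

# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / df_resid
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_resid

print(round(r_squared, 4), round(adj_r_squared, 4))
```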
5. The first-order multiple regression model with two predictors is
Y = β0 + β1 X1 + β2 X2 + ε,
where Y is the dependent variable, X1 and X2 are the independent variables, and ε is the
random error. We collect 32 observations and perform a multiple regression. The R output
is:
Call:
lm(formula = Y ~ X1 + X2)

Residuals:
    Min      1Q  Median      3Q     Max
-2.2128 -0.5937  0.1083  0.7110  1.8639

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -2.8676     1.3173  -2.177  0.03778 *
X1            2.4296     0.6857   3.543  0.00136 **
X2            2.2206     0.6615   3.357  0.00222 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.012 on 29 degrees of freedom
Multiple R-squared:  0.396,     Adjusted R-squared:  0.3543
F-statistic: 9.505 on 2 and 29 DF,  p-value: 0.0006691
(a) (1 point) What is the least squares regression line?
ANSWER: ŷ = −2.8676 + 2.4296X1 + 2.2206X2
(b) (1 point) Conduct a test of overall model utility. Use α = .05.
ANSWER:
H0 : β1 = β2 = 0 Ha : at least one of β1 and β2 is not equal to zero.
The F statistic is 9.505 with p-value = .00067, which is less than .05. Thus we reject H0 and conclude that the model is useful
in that it explains some of the variation in Y using X1 and X2.
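The F statistic and its p-value can be cross-checked from R² alone. This sketch assumes the usual identity F = (R²/k) / ((1 − R²)/(n − k − 1)) and uses a closed form for the upper tail of the F distribution that holds only when the numerator degrees of freedom equal 2, as they do here. Results differ slightly from the printed output because R² is rounded to three decimals.

```python
r_squared = 0.396   # Multiple R-squared from the R output
n, k = 32, 2        # observations and number of predictors
df_resid = n - k - 1                     # 29

# Overall F statistic computed from R^2
f_value = (r_squared / k) / ((1 - r_squared) / df_resid)

# Upper-tail probability for F(2, df_resid); this closed form is valid
# only for numerator df = 2: P(F > f) = (1 + 2f/df_resid)^(-df_resid/2)
p_value = (1 + 2 * f_value / df_resid) ** (-df_resid / 2)

reject_null = p_value < 0.05
print(round(f_value, 2), reject_null)
```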
(c) (2 points) Conduct a test whether X1 is significantly associated with Y . Use α = .05.
ANSWER: H0: β1 = 0 and Ha: β1 ≠ 0. The t-value for X1 is 3.543. Its p-value
is .00136, which is less than .05. Thus we reject the null hypothesis and conclude that X1 is significantly
associated with Y.
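The same conclusion follows from the critical-value approach. In this sketch the t statistic is recomputed from the X1 row of the output and compared with 2.045, the two-sided 5% critical value of the t distribution with 29 degrees of freedom as read from a standard t table (it is hard-coded here, not computed).

```python
estimate, std_error = 2.4296, 0.6857   # X1 row of the R output

t_value = estimate / std_error         # should match the reported 3.543

# Two-sided 5% critical value for t with 29 df, taken from a t table
t_crit = 2.045

reject_null = abs(t_value) > t_crit
print(round(t_value, 3), reject_null)
```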
(d) (2 points) What assumptions about ε’s distribution are needed for the test in (c)?
ANSWER: We need the following:
1. εi follows the same normal distribution N(0, σ²) for all i;
2. εi is independent of εj for j ≠ i.