Answers to ch 9 review

Name
1) The equation of the least squares regression line of a scatter plot is ŷ = 2.4 + 0.6x.
a) What is the residual for the point (3, 5)?
ŷ = 2.4 + 0.6(3) = 4.2, so the residual is y − ŷ = 5 − 4.2 = 0.8.
b) Does the equation of ŷ overstate or understate the actual value?
It understates it: the residual is positive, so the point lies above the line.
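A minimal Python sketch of the residual arithmetic in question 1 (the helper names `predict` and `residual` are illustrative, not from the worksheet):

```python
def predict(x, a=2.4, b=0.6):
    """Predicted value on the line y-hat = a + b*x (defaults are question 1's line)."""
    return a + b * x


def residual(x, y, a=2.4, b=0.6):
    """Residual = observed y minus predicted y-hat."""
    return y - predict(x, a, b)


r = residual(3, 5)
print(round(r, 2))  # 0.8
# A positive residual means the point lies above the line,
# so the line understates the observed y at x = 3.
print("understates" if r > 0 else "overstates")  # understates
```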
2) Given: r = −0.47, x̄ = 19.5, ȳ = 201.5, sx = 6.1, sy = ___. What is the regression equation?
Slope: b = r(sy/sx). Intercept: a = ȳ − b·x̄. Regression equation: ŷ = a + bx.
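The slope and intercept formulas can be checked with a short Python sketch. The value of sy below is a hypothetical placeholder, since no value of sy is printed in the question; the function name is also illustrative.

```python
def ls_line_from_summary(r, x_bar, y_bar, s_x, s_y):
    """Return (a, b) for y-hat = a + b*x using only summary statistics."""
    b = r * s_y / s_x      # slope: b = r * (s_y / s_x)
    a = y_bar - b * x_bar  # the least squares line passes through (x-bar, y-bar)
    return a, b


# s_y = 20.0 is a HYPOTHETICAL value for illustration; substitute the given s_y.
a, b = ls_line_from_summary(r=-0.47, x_bar=19.5, y_bar=201.5, s_x=6.1, s_y=20.0)
print(f"y-hat = {a:.3f} + ({b:.4f})x")
```

Because r is negative, the slope is negative for any positive sy, and the line always passes through (x̄, ȳ).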
3) Which of the following could not represent a correlation coefficient?
a) -.35
b) 1.35
c) .35
d) -.00002
Answer: b, because a correlation coefficient must lie between -1 and 1.
4) (Review from ch. 2): Which of the following displays is best suited for categorical data?
a. Box plot
b. Bar graph
c. Stem and leaf plot
d. Dot plot
e. Scatterplot
Answer: b
5) A bivariate set of data relates the amount of annual salary raise and previous performance rating. The
least squares regression equation is ŷ = 1,400 + 2,000x, where ŷ is the estimated raise and x is the
performance rating. Which of the following statements is not correct?
a. For each increase of one point in performance rating, the raise will increase on average by $2,000.
b. This equation produces predicted raises with an average error of 0.
c. A rating of 0 will yield a predicted raise of $1,400.
d. The correlation coefficient of the data is positive.
e. All of the above are true.
Answer: e
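Statements b and d rest on two general facts about least squares lines: the residuals average to 0, and the slope has the same sign as r. This Python sketch checks both on a small, invented data set (the ratings and raises are hypothetical, not from the question).

```python
from statistics import mean

# HYPOTHETICAL (performance rating, raise in dollars) pairs, invented for illustration.
ratings = [1, 2, 3, 4, 5]
raises = [3200, 5600, 7100, 9800, 11300]

x_bar, y_bar = mean(ratings), mean(raises)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ratings, raises))
sxx = sum((x - x_bar) ** 2 for x in ratings)
syy = sum((y - y_bar) ** 2 for y in raises)

b = sxy / sxx                 # least squares slope
a = y_bar - b * x_bar         # least squares intercept
r = sxy / (sxx * syy) ** 0.5  # correlation coefficient

residuals = [y - (a + b * x) for x, y in zip(ratings, raises)]
print(abs(mean(residuals)) < 1e-9)  # True: the average residual (error) is 0
print(b > 0 and r > 0)              # True: slope and r have the same sign
```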
6) Linear regression usually employs the method of least squares. Which of the following is the quantity
that is minimized by the least squares process?
a. ŷᵢ
b. xᵢ − x̄
c. Σ(yᵢ − ŷᵢ)²
d. (xᵢ, yᵢ)
e. Σ(xᵢ − x̄)²
Answer: c
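To see what "minimized by the least squares process" means, this Python sketch computes the least squares line for a small hypothetical data set and checks that nudging the intercept or slope always increases Σ(yᵢ − ŷᵢ)².

```python
def least_squares(xs, ys):
    """Intercept and slope that minimize the sum of squared residuals."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def sse(a, b, xs, ys):
    """Sum of squared residuals: sum of (y_i - y-hat_i)^2 for y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# HYPOTHETICAL data for illustration.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

a, b = least_squares(xs, ys)
best = sse(a, b, xs, ys)

# Moving the line in any direction (intercept or slope) increases the SSE.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05), (0.1, -0.05)]
print(all(sse(a + da, b + db, xs, ys) > best for da, db in nudges))  # True
```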
7) Which of the following is not true?
a. Two sets of data can have the same means but different variances.
b. Two sets of data can have the same variances but different means.
c. Two different values in a data set can have the same z-score.
d. All of the absolute values of z-scores for a data set can be equal.
e. All of the above are true.
Answer: c
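A small Python sketch of the z-score facts behind question 7, using invented data sets and the population standard deviation (`statistics.pstdev`); with the sample standard deviation the conclusions are the same.

```python
from statistics import mean, pstdev


def z_scores(data):
    """z = (x - mean) / sd, using the population standard deviation."""
    mu, sigma = mean(data), pstdev(data)
    return [(x - mu) / sigma for x in data]


# Statement d can be true: here every |z| equals 1.
print(z_scores([10, 20, 10, 20]))  # [-1.0, 1.0, -1.0, 1.0]

# Statement c is false: z is a strictly increasing function of x (sd > 0),
# so different values always get different z-scores.
zs = z_scores([3, 7, 8, 12])
print(len(set(zs)) == 4)  # True
```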
8) The average price per dozen received by farmers for eggs has varied as shown:
Year Cents/Dozen
1985 57.1
1986 61.6
1987 54.9
1988 52.8
1989 68.9
1990 70.9
1991 67.8
1992 57.6
1993 63.4
1994 61.4
a) Find the equation of the best-fitting line. (Use the last two digits of the year as the independent
variable.)
ŷ = 0.997 + 0.678x
b) What is the standard error of estimate?
Se = √(SSE/(n − 2)) = √(302.19/8) ≈ 6.15
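Parts (a) and (b) can be reproduced with a short Python sketch using the table above; it uses plain sums rather than a statistics library, and the helper names are illustrative.

```python
from math import sqrt

# Egg prices from the table: x = last two digits of the year, y = cents per dozen.
xs = list(range(85, 95))
ys = [57.1, 61.6, 54.9, 52.8, 68.9, 70.9, 67.8, 57.6, 63.4, 61.4]


def fit_line(xs, ys):
    """Least squares intercept and slope."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def std_error(xs, ys, a, b):
    """Standard error of estimate: Se = sqrt(SSE / (n - 2))."""
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(sse / (len(xs) - 2))


a, b = fit_line(xs, ys)
print(f"y-hat = {a:.3f} + {b:.3f}x")           # y-hat = 0.997 + 0.678x
print(f"Se = {std_error(xs, ys, a, b):.2f}")   # Se = 6.15
```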
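c) Do you think this equation will be useful in predicting egg prices for years after 1994? Explain.
No. The line is not a good fit (r² ≈ 0.11), so it explains little of the year-to-year variation in price,
and predicting beyond 1994 is extrapolation.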
d) Use the equation to predict the price for 1995. Compare with the actual price of 64.0 cents/dozen.
Discuss the results in relation to your answer to (b).
ŷ = 0.997 + 0.678(95) ≈ 65.4 cents/dozen. The actual price of 64.0 is about 1.4 cents below the
prediction, which is well within one standard error of estimate (Se ≈ 6.15).
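e) Including the actual 1995 price, find a new equation of the best-fitting line. Comment on how the new
data point affects the fit.
ŷ ≈ 6.46 + 0.615x. The 1995 point lies below the old line, so the slope drops (from about 0.678 to
0.615); r² stays low (about 0.12), so the fit remains weak.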
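Parts (d) and (e) in the same style: predict x = 95 from the 1985 to 1994 line, then refit with the 1995 point (95, 64.0) added. A minimal sketch; helper names are illustrative.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


xs = list(range(85, 95))
ys = [57.1, 61.6, 54.9, 52.8, 68.9, 70.9, 67.8, 57.6, 63.4, 61.4]

# (d) Predict 1995 (x = 95) from the 1985-1994 line.
a, b = fit_line(xs, ys)
pred_1995 = a + b * 95
print(f"predicted 1995 price = {pred_1995:.1f}")  # 65.4
print(f"residual = {64.0 - pred_1995:.1f}")        # -1.4

# (e) Refit after adding the actual 1995 price.
a2, b2 = fit_line(xs + [95], ys + [64.0])
print(f"new line: y-hat = {a2:.2f} + {b2:.3f}x")  # new line: y-hat = 6.46 + 0.615x
```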
f) Lee's z-score on his math test was 1.5. The class average was 62.1 and the variance was 6.76. What
was Lee's "raw" score (his score before converting to z-scores)?
σ = √6.76 = 2.6, so x = 62.1 + 1.5(2.6) = 66.0.
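The z-score inversion as a one-function Python sketch; note that the problem gives the variance, so the standard deviation is its square root.

```python
from math import sqrt


def raw_score(z, mean, variance):
    """Invert z = (x - mean) / sd, with sd = sqrt(variance)."""
    return mean + z * sqrt(variance)


print(round(raw_score(1.5, 62.1, 6.76), 1))  # 66.0
```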
For questions 9 - 11, describe the type of function that is shown and tell what type of transformation
should be made to the data.
9) [Scatterplot: y from 0 to 30 versus X Variable 1 from 0 to 4]
10) [Scatterplot: y from 0 to 600 versus X Variable 1 from 0 to 8]
Power function; transform (x, y) → (log x, log y).
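The worksheet's scatterplots are not reproduced here, so this Python sketch uses invented, noise-free data to show the two standard straightening transformations: a power function y = ax^p becomes linear under (x, y) → (log x, log y), and an exponential function y = a·b^x becomes linear under (x, y) → (x, log y). The models y = 5x² and y = 3(1.8)^x are hypothetical.

```python
from math import log10


def fit_line(xs, ys):
    """Least squares intercept and slope."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


xs = [1, 2, 3, 4, 5, 6, 7, 8]
power_ys = [5 * x ** 2 for x in xs]   # HYPOTHETICAL power model y = 5x^2
expo_ys = [3 * 1.8 ** x for x in xs]  # HYPOTHETICAL exponential model y = 3(1.8)^x

# Power function: (x, y) -> (log x, log y) gives log y = log 5 + 2 log x.
a_p, b_p = fit_line([log10(x) for x in xs], [log10(y) for y in power_ys])
print(round(10 ** a_p, 3), round(b_p, 3))  # 5.0 2.0

# Exponential function: (x, y) -> (x, log y) gives log y = log 3 + x log 1.8.
a_e, b_e = fit_line(xs, [log10(y) for y in expo_ys])
print(round(10 ** a_e, 3), round(10 ** b_e, 3))  # 3.0 1.8
```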
11) Solve each equation.
a. log₄(5x + 6) + log₄ 2 = ___
Using the product rule, the left side equals log₄[2(5x + 6)].
b. 6 · 12^(3x − 5) + 4 = 70
12^(3x − 5) = 11
(3x − 5) log 12 = log 11
3x − 5 ≈ 0.965, so x ≈ 1.988
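A quick numerical check of part (b) in Python, solving with natural logs (any base gives the same x):

```python
from math import log

# 6 * 12**(3x - 5) + 4 = 70
# -> 12**(3x - 5) = 11          (subtract 4, divide by 6)
# -> (3x - 5) log 12 = log 11   (take logs of both sides)
# -> x = (log 11 / log 12 + 5) / 3
x = (log(11) / log(12) + 5) / 3
print(round(x, 3))                          # 1.988
print(round(6 * 12 ** (3 * x - 5) + 4, 9))  # 70.0 (check)
```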
12) The number of a certain type of bacteria present after a certain number of hours can be
modeled by the equation ln(number) = −0.0048 + 0.586(hours). What would be the predicted
quantity of bacteria after 3.75 hours?
ln(number) = −0.0048 + 0.586(3.75) = 2.1927, so number = e^2.1927 ≈ 8.96
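A minimal Python sketch of the back-transformation: compute ln(number) from the model, then exponentiate.

```python
from math import exp


def predicted_count(hours):
    """Back-transform the model ln(number) = -0.0048 + 0.586(hours)."""
    ln_number = -0.0048 + 0.586 * hours
    return exp(ln_number)


print(round(predicted_count(3.75), 2))  # 8.96
```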
13) The computer output below shows the result of a linear regression analysis for predicting the concentration
of zinc, in parts per million (ppm), from the concentration of lead, in ppm, found in fish from a certain river.

Predictor   Coef   StDev   T       P
Constant           4.90    3.32    0.003
Lead        19.0   1.89    10.01   0.000

S = 16.17   R-Sq = 82.0%

Which of the following statements is a correct interpretation of the value 19.0 in the output?
(A) On average there is a predicted increase of 19.0 ppm in concentration of lead for every increase of 1 ppm
in concentration of zinc found in the fish.
(B) On average there is a predicted increase of 19.0 ppm in concentration of zinc for every increase of 1 ppm
in concentration of lead found in the fish.
(C) The predicted concentration of zinc is 19.0 ppm in fish with no concentration of lead.
(D) The predicted concentration of lead is 19.0 ppm in fish with no concentration of zinc.
(E) Approximately 19% of the variability in zinc concentration is predicted by its linear relationship with lead
concentration.
Answer: B
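Why (B) is the right reading of 19.0: in a fitted line, the slope is the change in predicted response per 1-unit change in the explanatory variable, regardless of the intercept or the starting value. This Python sketch assumes 19.0 is the coefficient of lead and uses a hypothetical intercept, because the constant's coefficient does not appear in the output above.

```python
SLOPE = 19.0      # coefficient of lead (ppm zinc per ppm lead), assumed from the output
INTERCEPT = 10.0  # HYPOTHETICAL intercept, for illustration only


def predicted_zinc(lead_ppm):
    """Predicted zinc concentration (ppm) from the fitted line."""
    return INTERCEPT + SLOPE * lead_ppm


# The predicted change for a 1 ppm increase in lead equals the slope,
# whatever lead level we start from and whatever the intercept is.
changes = [predicted_zinc(lead + 1) - predicted_zinc(lead) for lead in (0.5, 2.0, 7.3)]
print([round(c, 6) for c in changes])  # [19.0, 19.0, 19.0]
```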
14) The dataset "Healthy Breakfast" contains, among other variables, the Consumer
Reports ratings of 77 cereals and the number of grams of sugar contained in each serving.
(Data source: Free publication available in many grocery stores. Dataset available through
the Statlib Data and Story Library (DASL).)
A linear regression model with "Fat" and "Sugars" as the explanatory variables and
"Rating" as the response variable produced the following summary statistics:

Predictor   StDev    T       P
Constant    1.953    31.28   0.000
Fat         1.036    -2.96   0.004
Sugars      0.2347   -9.43   0.000

S = 8.755   R-Sq = 62.2%   R-Sq(adj) = 61.2%

What would be the predicted rating of a cereal if the amount of sugar was 12 grams and fat
was 8 grams?
The coefficients are consistent with T × StDev (for example, 31.28 × 1.953 ≈ 61.09):
Predicted Rating = 61.089 − 3.066(Fat) − 2.2128(Sugars)
= 61.089 − 3.066(8) − 2.2128(12) ≈ 9.01
Section I
Sample Examination One
[Scatterplot of y versus x showing the least squares regression line and two labeled points, P and Q]
9. The scatterplot above shows 52 points with the associated least squares regression line
for predicting values of y from values of x. One of the two labeled points, either P or
Q, will be removed. Which of the following is true?
(A) Removal of the point P would substantially increase the slope of the least squares
regression line. Removal of the point Q would have little effect on the slope of
the least squares regression line.
(B) Removal of the point P would substantially decrease the slope of the least
squares regression line. Removal of the point Q would have little effect on the
slope of the least squares regression line.
(C) Removal of the point Q would substantially increase the slope of the least squares
regression line. Removal of the point P would have little effect on the slope of
the least squares regression line.
(D) Removal of the point Q would substantially decrease the slope of the least
squares regression line. Removal of the point P would have little effect on the
slope of the least squares regression line.
(E) Removal of the point P would have a substantial effect on the slope of the least
squares regression line and removal of the point Q would have a substantial
effect on the slope of the least squares regression line.
Answer
A
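The figure is not reproduced, so this Python sketch uses invented points to show the general principle behind the answer: a point far from the other x-values and below the trend (like P) can pull the slope down strongly, while an outlier near the middle of the x-values (like Q) changes the slope much less. The coordinates of P and Q below are hypothetical.

```python
def slope(points):
    """Least squares slope for a list of (x, y) points."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    return sxy / sxx


# HYPOTHETICAL data: a roughly linear cloud plus two unusual points.
cloud = [(1, 1.2), (2, 1.9), (3, 3.1), (4, 4.0), (5, 4.8), (6, 6.1), (7, 7.0), (8, 7.9)]
P = (20, 5.0)    # far from the other x-values and below the trend: high leverage
Q = (4.5, 12.0)  # near the middle of the x-values but far above the trend

s_all = slope(cloud + [P, Q])
s_no_p = slope(cloud + [Q])  # P removed: the slope rises sharply
s_no_q = slope(cloud + [P])  # Q removed: the slope changes much less
print(round(s_all, 3), round(s_no_p, 3), round(s_no_q, 3))
```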
Unauthorized copying or reusing
any part of this page is illegal.
17. For a group of students, the correlation between their heights (in inches) and their
weights (in pounds) is 0.332. You are given that 1 inch = 2.54 centimeters and that 1
pound = 0.454 kilogram. If the heights are expressed in centimeters and the weights are
expressed in kilograms, what will be the value of the correlation?
(A) 0.059
(B) 0.288
(C) 0.332
(D) 0.383
(E) 1.857
Answer
C
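Why the correlation is unchanged: r is computed from standardized values, and multiplying every height by 2.54 and every weight by 0.454 (positive constants) leaves the standardized values the same. A Python check on hypothetical heights and weights:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# HYPOTHETICAL heights (in) and weights (lb), for illustration only.
heights_in = [62, 65, 67, 70, 72, 74]
weights_lb = [120, 150, 135, 170, 160, 185]

r_imperial = pearson_r(heights_in, weights_lb)
r_metric = pearson_r([h * 2.54 for h in heights_in], [w * 0.454 for w in weights_lb])
print(abs(r_imperial - r_metric) < 1e-12)  # True: r has no units
```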
38. Having graded a test, a teacher was interested in the relationship between the amount of
time the students studied for the test and the scores they received. She asked the 24
students individually how much they studied, and then compiled a list giving for each
student the amount of time studied and the score on the test. The teacher performed a
least squares regression analysis. Part of the computer output from that analysis is
shown below.

Dependent variable: Score on test

Predictor   Coef        SE Coef    T       P
Constant    69.555194   3.721432   18.69   <.0001
Time        0.2642443   0.10921    2.42    0.0243

S = 6.3241   R-sq = 21.0%   R-sq(adj) = 17.5%

Which of the following is a 99% confidence interval for the slope of the regression line
that relates the time spent studying and the score on the test?
(A) 69.555 ± (2.807)(3.721)
(B) 69.555 ± (2.819)(3.721)
(C) 69.555 ± (18.69)(3.721)
(D) 0.264 ± (2.807)(0.109)
(E) 0.264 ± (2.819)(0.109)
Answer
E
(df = 24 − 2 = 22, so t* = 2.819)
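The interval in (E) as a Python computation. The critical value t* = 2.819 for 99% confidence with 22 degrees of freedom is taken from a t table and hard-coded, so the sketch needs no statistics library.

```python
# Slope (Time) and its standard error, from the regression output.
b1 = 0.2642443
se_b1 = 0.10921
n = 24
df = n - 2      # 22 degrees of freedom for simple linear regression
t_star = 2.819  # t critical value for 99% confidence, df = 22 (from a t table)

margin = t_star * se_b1
lower, upper = b1 - margin, b1 + margin
print(f"{b1:.3f} ± {margin:.3f} -> ({lower:.3f}, {upper:.3f})")  # 0.264 ± 0.308 -> (-0.044, 0.572)
```

If SciPy is installed, `scipy.stats.t.ppf(0.995, 22)` returns the same critical value (about 2.819) instead of hard-coding it.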
40. Two variables, x and y, were measured for a random sample of 10 subjects. In the first
of two transformations, log y was plotted (on the vertical axis) against x (on the
horizontal axis), a least squares regression was performed on the transformed variables,
and the following residual plot was obtained.
[Residual plot: residuals (from about −0.01 to 0.01) versus x (from about 2.2 to 3.0)]
In the second transformation, log y was plotted (on the vertical axis) against log x (on
the horizontal axis), a least squares regression was performed on the transformed
variables, and the following residual plot was obtained.
[Residual plot: residuals (from about −0.004 to 0.003) versus log x (from about 0.3 to 0.5)]
Which of the following conclusions is best supported by the evidence above?
(A) x and y are related according to an equation of the form y = ax^p, where a and p
are constants.
(B) x and y are related according to an equation of the form y = a + x^p, where a and
p are constants.
(C) x and y are related according to an equation of the form y = a·b^x, where a and b
are constants.
(D) x and y are related according to an equation of the form y = a + b^x, where a and
b are constants.
(E) x and y are related according to an equation of the form y = a + b log x, where a
and b are constants.
Answer
A
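A Python sketch of the reasoning behind (A), using hypothetical noise-free data from the power law y = 2x^1.5 over roughly the same x-range as the plots: the (log x, log y) fit leaves essentially zero residuals, while the (x, log y) fit leaves a curved residual pattern.

```python
from math import log10


def residuals(xs, ys):
    """Residuals from the least squares line of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# HYPOTHETICAL data following y = 2 * x**1.5 (no noise), x from 2.2 to 3.0.
xs = [2.2 + 0.08 * i for i in range(11)]
ys = [2 * x ** 1.5 for x in xs]
log_x = [log10(x) for x in xs]
log_y = [log10(y) for y in ys]

res_exp = residuals(xs, log_y)     # log y vs x: tests an exponential model
res_pow = residuals(log_x, log_y)  # log y vs log x: tests a power model

# The (x, log y) residuals bend (negative at both ends, positive in the middle),
# while the (log x, log y) residuals are essentially zero.
print(max(abs(r) for r in res_exp), max(abs(r) for r in res_pow))
```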
Section II
Sample Examination Two - Answers
Question Two
(a) The regression equation is: Predicted Time = 15.027(Distance) − 5826. In the least
squares line, Time denotes the time in days for a planet to revolve around the sun and
Distance denotes distance of the planet from the sun in millions of kilometers.

Predictor   Coef     StDev   T       P
Constant    -5826    3303    -1.76   0.121
Distance    15.027   1.218   12.34   0.000

S = 7471   R-Sq = 95.6%   R-Sq(adj) = 95.0%
The value of the slope and the y-intercept can be found in the computer printout. The y-intercept,
or constant, is shown under the column "Coef" in row "Constant". The value of the slope is shown
under the column "Coef" in row "Distance".
(b) The slope is interpreted as: for each increase of one million kilometers in a planet's distance
from the sun, it takes an estimated 15.027 additional days to revolve around the sun. Slope is
defined to be the change in y divided by the change in x. Here y is the time in days for a planet
to revolve around the sun and x is the distance of the planet from the sun.
(c) Substituting 1,450 for Distance in the equation above, we get:
Predicted Time = 15.027(1450) - 5826 = 15963 days.
(d) The p-value of 0.000 in the Distance row indicates that there is strong evidence that a
linear relationship exists between the distance of a planet from the sun (in millions of
kilometers) and the number of days it takes to revolve around the sun. The p-value is the
probability of getting a result at least as extreme as the one actually obtained if the null
hypothesis is true. That is, a p-value of .000 for the slope suggests that it is very unlikely
to get the results that we did if the null hypothesis is true. This leads us to reject the null
hypothesis. The null hypothesis in this case would be H0: β = 0, that is, the slope of the
population regression line is 0 and there is no linear relationship between the
explanatory and response variables.
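Each T value in the printout is the coefficient divided by its standard error (StDev), which is the test statistic for H0: β = 0 in the Distance row. A short Python check:

```python
# (coefficient, standard error) pairs from the printout.
rows = {"Constant": (-5826, 3303), "Distance": (15.027, 1.218)}

# T = Coef / StDev for each row.
ratios = {name: coef / se for name, (coef, se) in rows.items()}
for name, t in ratios.items():
    print(name, round(t, 2))  # Constant -1.76, then Distance 12.34
```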
(e) The residual plot indicates that a line is not an appropriate model. There is a curved
pattern to the residual plot. Our next step may be to try to make the data fit a straight
line pattern by taking the logarithm of the y values, or the logarithm of both the x and
the y values.