Solutions to Ch.4 Book Problems

Chapter 4 Solutions
4.1
(a) Amount of time spent studying for the exam is the explanatory variable and grade on the exam is the response variable.
(b) Simply explore the relationship; either height or weight could be considered the explanatory variable.
(c) Inches of rain is the explanatory variable and corn yield is the response variable.
(d) Simply explore the relationship (and they are related!).
4.2
(a) Family’s income is the explanatory variable and years of education their eldest child completes is the response variable.
(b) Square footage is the explanatory variable and price of a house is the response variable.
(c) Simply explore the relationship; either variable could be considered the explanatory variable.
(d) Simply explore the relationship; either variable could be considered the explanatory variable for the other.
4.3
(a) Without including A, B, and C, the overall pattern is moderately linear. There is a more linear trend for IQ test scores over 100 than for those less than 100.
(b) Student A has an IQ of about 103 and a GPA of about 0.5. Student A appears to be an underachiever.
(c) Student A has an average IQ score but a very low GPA. Student B has a somewhat higher IQ score (about 109) and a low GPA. Student C has a low IQ score, but a fairly high GPA.
4.4
(a) The overall relationship is moderately linear (or slightly curved) with average score on the SAT Math test decreasing as the percent of high school graduates taking the test increased. We see two clusters of points corresponding to the states that require the ACT for admission and those requiring the SAT for admission. The students who took the SAT in states where it was not required for admission to the state schools tend to be strong students (admission as a non-resident is typically more difficult as is admission to a private college) and so the average score is higher. When a higher percentage of students take the exam, the average score is lowered by some of the poorer performing students who are taking the exam.
(b) West Virginia stands out because we would expect the average score to be higher with such a small percentage of students taking the exam. Maine stands out because their average score is also lower than expected based on the general relationship of the data.
4.5
(a) The number of males available to return each year could vary dramatically depending on the survival rate from the previous season. So it would be more appropriate to use percents when comparing the various years.
(b)
(c) The data do support the theory that a smaller percent of birds survive following a successful breeding season. The scatterplot has a definite downward trend, but it is not very linear and is possibly curved.
4.6
(a) and (b)
(c) There is a positive, linear association for city mileage and highway mileage for both types of cars. The form of the relationship is similar between the two types of cars except that two-seater cars (with the marked exception of the Smart convertible) tend to get somewhat worse mileage than the mini/subcompacts.
4.7
(a) You would expect a positive association between a state’s median household income and mean personal income since states with a high (low) median household income will also tend to have a high (low) mean personal income.
(b) The overall pattern of the plot shows a moderately strong, positive, linear association, at least up to a mean personal income of about $45,000. The data shows more spread at the higher median household income levels (above about $45,000).
(c) Utah is unusual because it has a higher than expected median household income given the mean income per person in the state. Connecticut is unusual because it has a lower median household income than we would expect given its high mean income per person, based on the relationship seen in the rest of the data. D.C. is unusual for the same reason as Connecticut, but the difference is even greater here than what we would expect based on the rest of the observations.
4.8
(a)
(b) The association appears to be moderately strong, negative and linear. We could argue, based on this data, that babies tend to crawl sooner when the average temperature is higher after six months.
4.9
(a) The highest percent return on T-bills is about 15% and the lowest is about 1%. The highest percent return on stocks is about 50% and the lowest is about -30%.
(b) There is no readily observable pattern to the data. There is no strong evidence that high interest rates are bad for stocks. The relationship between interest rates and stock returns appears to be weak or nonexistent.
4.10
(b) [Scatterplot of Mileage versus Age. Window: [0,12,1,-1000,190000,10000,1]]
(c) The scatterplot shows a moderately strong positive linear association between the age of a car and its mileage, with two possible outliers. The observations (7, 40,000) and (10, 98,000) have slightly lower mileages than would be expected from the trend evident in the remainder of the data.
4.11
(a)
(b) The association is strongly positive and linear. States with higher percentages of proficient students tend to have higher average NAEP math scores.
(c) Answers will vary.
4.12
The percent scoring proficient on the NAEP mathematics test tends to
decrease as the percentage of students in a state who were eligible for free
or reduced-price school lunch increases.
4.13
(a) [Scatterplot of men's heights versus women's heights.] Based on the plot we would expect the correlation to be positive but not, due to the outlier, to be close to 1.
(b) $\bar{x}_W = 66$, $s_W = 2.098$, $\bar{x}_M = 69$, $s_M = 2.53$

Women      66      64      66      65      70      65
z_Women     0      -0.95    0      -0.48    1.91   -0.48
Men        72      68      70      68      71      65
z_Men       1.19   -0.40    0.40   -0.40    0.79   -1.58

$r = \frac{1}{5}[(0)(1.19) + (-0.95)(-0.40) + (0)(0.40) + (-0.48)(-0.40) + (1.91)(0.79) + (-0.48)(-1.58)] = 0.57$.
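The z-score calculation in 4.13 can be checked with a short script. This is only a sketch: the function name `correlation` is chosen here for illustration, and it works from unrounded z-scores, so intermediate values differ slightly from the rounded table entries.

```python
import math

def correlation(xs, ys):
    """Correlation r: sum of z-score products divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divisor n - 1), as in the solution.
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    products = [((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

women = [66, 64, 66, 65, 70, 65]
men = [72, 68, 70, 68, 71, 65]
r = correlation(women, men)
print(round(r, 2))
```

Rounded to two decimals this agrees with the value r = 0.57 found by hand.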
4.14
(a) If all the men were 6 inches shorter, the correlation would not change. The correlation tells us that there is a weak to moderate association between women’s heights and men’s heights (that is, that taller women tend to date taller men), but it does not tell us whether or not they tend to date men taller than themselves.
(b) The correlation would not change.
(c) +1.
4.15
The correlation r measures the strength of the linear relationship between GPA and exam score. If r is positive, there is evidence that GPAs and exam score measures would be positively associated. That is, higher GPAs would be associated with higher exam scores. If r is negative, these variables would be negatively associated.
4.16
(a)
(b) For the data given, $\bar{x} = 0$, $s_x = 4.123$, $\bar{y} = 2.6$, and $s_y = 2.408$. Using these values and computing the z-score for each x- and y-value, we have:

x      -5      -3       0       3       5
y       0       4       5       4       0
z_x    -1.21   -0.73    0       0.73    1.21
z_y    -1.08    0.58    1.00    0.58   -1.08

Then $r = \frac{1}{4}[(-1.21)(-1.08) + (-0.73)(0.58) + (0)(1) + (0.73)(0.58) + (1.21)(-1.08)] = 0$.
(c) r is a measure of the strength of the linear relationship between two variables. Two variables, as in this example, can be strongly associated in a non-linear fashion.
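A quick numerical check of 4.16 shows how r can be 0 while the points still follow an exact curve. The observation that these five points satisfy x^2 + y^2 = 25 is not stated in the solution; it is an added illustration that can be verified directly from the data.

```python
import math

xs = [-5, -3, 0, 3, 5]
ys = [0, 4, 5, 4, 0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# r written as S_xy / sqrt(S_xx * S_yy), which is algebraically the same
# as averaging the z-score products with divisor n - 1.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)
s_yy = sum((y - mean_y) ** 2 for y in ys)
r = s_xy / math.sqrt(s_xx * s_yy)

# Every data point lies on the circle x^2 + y^2 = 25 (integer arithmetic).
on_circle = all(x * x + y * y == 25 for x, y in zip(xs, ys))

print(abs(round(r, 3)), on_circle)
```

The positive and negative cross-products cancel, so the linear measure r sees no association even though y is completely determined by x.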
4.17
Answers may vary. We would expect the height of women at age 4 and their
height as women at age 18 to be the highest correlation since it is reasonable
to expect taller children to become taller adults and shorter children to
become shorter adults. The next highest would be the correlation between
the heights of male parents and their adult children. Tall fathers tend to
have tall sons, but typically not as tall, and likewise for shorter fathers. The
lowest correlation would be between husbands and their wives. Husbands
may be taller than their wives in general, but there is no reason to expect
anything more than a weak positive correlation.
4.18
(a) r = 0.9.
(b) r = 0.
(c) r = 0.7.
(d) r = −0.3.
(e) r = −0.9.
4.19
(a) The direction of the scatterplot is positive, but appears curved, not linear. The strength of the association is moderate.
(b) The hippopotamus is unusual because its lifespan is longer than would be expected given its gestation period, based on the association in the remaining data. The Asian elephant is unusual because it has the second to largest gestation time. The two animals with the largest gestation times do not follow the curvilinear pattern seen in the shorter gestation times, rather, they tend to have much longer lifetimes as well. The giraffe’s observation tends to follow the curvilinear shape, with possibly a little shorter lifespan than is to be expected based on the pattern in the remaining data.
(c) With a gestation period of about 280 days and an average lifespan of around 70 years worldwide (according to Wikipedia), the data point for humans would not be part of the general pattern.
4.20
(a) r = −0.319 says that there is a weak negative correlation between a child’s score on the vocabulary portion of the Wechsler Intelligence Scale for Children and the number of siblings a child has. In general, the more siblings a child has, the lower their score on the Intelligence Scale.
(b) Answers will vary. For example, one explanation might be that having more siblings gets in the way of having enough attention paid to you to develop your vocabulary to the fullest. The correlation gives us no clue as to the reasons behind the relationship, only that there is one.
4.21
(a) −1 ≤ r ≤ 1.
(b) s ≥ 0.
4.22
(a) False. It’s possible that no three of the points are collinear.
(b) False. There can be a strong non-linear relationship.
(c) True. Usually, the thicker the book, the greater the number of pages.
(d) False. Heavier cars are less fuel efficient; the correlation would be negative.
(e) True. The value of r is not related to the units of measurement.
4.23
(a) The association is negative. The higher the level of carbon monoxide, the lower the level of nitrogen oxides. The levels of nitrogen oxides drop off quickly until you reach 5 or 6 grams of carbon monoxide and then become quite linear after that. The only clear outlier is the point (4, 2.9), although the bulk of the values are 15g or less of carbon monoxide with only 3 values beyond that point.
(b) The plot does not back up the statement. It’s possible, likely even, to have low amounts of carbon monoxide and high amounts of nitrogen oxides as well as high amounts of carbon monoxide and low amounts of nitrogen oxides.
4.24
(a) A substantial negative correlation (the older the car, the lower the price).
(b) A substantial negative correlation (heavier cars get worse gas mileage).
(c) A substantial positive correlation (taller men weigh more).
(d) A small correlation.
(e) A moderate positive correlation.
4.25
(a) [Scatterplot of HAV angle versus MA angle, with the points (12, 50), (33, 38), and (37, 32) labeled.] The explanatory variable is MA Angle.
(b) There is a weak to moderate positive linear association between MA angle and HAV angle. The point (12, 50) is a clear outlier. The points (37, 32) and (33, 38) are removed from the bulk of the data but are not really outliers.
(c) A correlation of r = 0.30 tells us that there is a weak positive linear relationship between MA angle and HAV angle.
(d) There is only a weak correlation (r = 0.3) between MA Angle and HAV Angle so that the doctor’s speculation is only somewhat supported.
4.26
Answers will vary. The relationship between x and y is a strong quadratic relationship, but the correlation between x and y is only r = −0.07.
4.27
(a) The slope is −19.87, the coefficient of x. We predict the amount of gas consumed in Joan’s home to decrease by 19.87 cubic feet for every degree the average monthly temperature increases.
(b) The y-intercept is 1425. When the average monthly temperature is 0°F, the predicted gas consumption for Joan’s home is 1425 cubic feet.
(c) Predicted gas = 1425 − 19.87*30 = 828.9 cubic feet. We predict that the amount of natural gas Joan will use in a month with an average temperature of 30°F is 828.9 cubic feet.
4.28
(a) The y-intercept is clearly at the point (0,10). To find the slope of the line (the coefficient of x in the least-squares line), note that (3,8) is another point on the least-squares line. Thus the slope is $m = \frac{10 - 8}{0 - 3} = -\frac{2}{3}$. The equation of the least squares line is then $y = 10 - \frac{2}{3}x$.
(b) The slope of the line is $-\frac{2}{3}$. For every additional slice of pizza eaten, we predict the number of laps the player could run afterward to decrease by two-thirds of a lap.
(c) The y-intercept is 10. If the player does not eat any pizza, we predict that he can run 10 laps.
4.29
(a) Predicted number of laps for John = $10 - \frac{2}{3} \cdot 8 = 4.67$. Using the least-squares line, we predict that John will complete 4.7 laps.
(b) The least-squares line should not be used to predict how many laps Ezekiel will complete because the number of slices of pizza that Ezekiel ate is too far outside of the range of data values that were used to calculate the least-squares line.
4.30
Answers will vary but should include a discussion of how the least-squares
line minimizes the squares of the vertical distances between the observed
points and the line.
4.31
The vertical distances from the points to the two lines are given in the table
below:
 x    y    y=1-x   Distance   Squared     y=3-2x   Distance   Squared
                              Distance                        Distance
-1    2      2        0          0           5        -3         9
 1    1      0        1          1           1         0         0
 1    0      0        0          0           1        -1         1
 3   -1     -2        1          1          -3         2         4
 5   -5     -4       -1          1          -7         2         4
For the line y=1-x, the sum of the squares of the vertical distances is 3. For
the line y=3-2x, the sum of the squares of the vertical distances is 18. Thus,
the line y=1-x fits the data best.
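The comparison in 4.31 can be reproduced with a few lines of code. This is a minimal sketch; the helper name `sum_squared_distances` is chosen here for illustration.

```python
# The five data points from the table in 4.31.
points = [(-1, 2), (1, 1), (1, 0), (3, -1), (5, -5)]

def sum_squared_distances(intercept, slope, data):
    """Sum of squared vertical distances (observed y minus line's y)."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in data)

ss_line1 = sum_squared_distances(1, -1, points)  # y = 1 - x
ss_line2 = sum_squared_distances(3, -2, points)  # y = 3 - 2x
print(ss_line1, ss_line2)
```

The line with the smaller total is the better fit by the least-squares criterion, which is the comparison made in the solution.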
4.32
The predicted sleep debt for a 5-day school week, based on the least-squares
regression equation, is 2.23+3.17*5=18.08 hours, a little more than 3 hours
greater than what was found in the research study. Based on their collected
data, the students have reason to be skeptical of the research study’s
reported results.
4.33
To find the residuals, we first calculate the predicted pack weights by
substituting the corresponding body weights into the least-squares
regression equation: Packweight=16.3+0.09(Bodyweight). The residuals are
then found by subtracting the predicted values from the observed pack
weights.
Body weight             120     187     109     103     131     165     158     116
Packweight               26      30      26      24      29      35      31      28
Predicted Packweight  27.10   33.13   26.11   25.57   28.09   31.15   30.52   26.74
Residual              -1.10   -3.13   -0.11   -1.57    0.91    3.85    0.48    1.26

The sum of the residuals is −1.1 − 3.13 − 0.11 − 1.57 + 0.91 + 3.85 + 0.48 + 1.26 = 0.59. The sum is not exactly 0 due to roundoff error.
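The residual table in 4.33 can be recomputed as follows. This is a sketch using the rounded coefficients 16.3 and 0.09 from the solution; the variable names are chosen here for illustration.

```python
body_weights = [120, 187, 109, 103, 131, 165, 158, 116]
pack_weights = [26, 30, 26, 24, 29, 35, 31, 28]

# Least-squares line as given in the solution (rounded coefficients).
INTERCEPT = 16.3
SLOPE = 0.09

predicted = [INTERCEPT + SLOPE * bw for bw in body_weights]
# Residual = observed pack weight minus predicted pack weight.
residuals = [obs - pred for obs, pred in zip(pack_weights, predicted)]

for bw, pred, res in zip(body_weights, predicted, residuals):
    print(f"{bw:>4} {pred:6.2f} {res:6.2f}")
print("sum of residuals:", round(sum(residuals), 2))
```

Because the coefficients are rounded, the residuals sum to a small nonzero value rather than exactly 0, as the solution notes.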
4.34
The least-squares regression line relating the amount of natural gas
consumed in Joan’s Midwestern home based on the average monthly
temperature is y=1425-19.87x (see Exercise 4.27). For the month with an
average temperature of 49.4 °F, the observed gas consumed was 520 cubic
feet. The predicted value is 1425-19.87*49.4=443.422 and the residual is
then 520-443.422=76.578. This value differs slightly from what is shown in
calculator corner (76.643) due to rounding error.
4.35
The statement for r is false. r is a measure of the strength of the linear
relationship between two variables; it does not tell you what percentage of
the individuals in a sample can have their values predicted accurately. The
statement is also false for r². r² is the percentage of the variability in the
response variable that can be accounted for by the straight line relationship
with the explanatory variable, not the proportion of individuals in the sample
for which prediction is valid.
4.36
x has no value in predicting y in this situation since the y-value is 3 for every choice of x.
4.37
(a) [Scatterplot of BAC versus Beers.] We are interested in predicting BAC from the number of beers consumed. Hence, number of beers consumed is the x-variable and BAC is the y-variable.
(b) r = 0.894. Yes, this is an appropriate measure of the strength of the association between BAC and beers because the scatterplot clearly shows that there is a linear association.
(c) (BAC) = −0.013 + 0.018(# beers). The slope of the regression line, 0.018, tells us that the BAC is predicted to increase by 0.018, or 1.8%, for each additional beer. The y-intercept, −0.013, is outside of the range of the data and should not be interpreted.
4.38
(a) BAC = −0.013 + 0.018(5) = 0.077.
(b) The regression line is based on at most 9 beers. We therefore do not know the accuracy of the line for predicting BAC for a person who drinks 15 beers. It’s quite possible that the relationship between the number of beers consumed and BAC changes significantly outside of the scope of the observed data.
(c) The residuals are tightly clustered around 0 (ranging from approximately −0.03 to 0.04), indicating that the regression line is a good fit to the data. Note also the random scatter of the residuals around 0.
(d) r² = 0.80. 80% of the variation in the BAC levels is explained by the least-squares regression of y on x, number of beers consumed.
4.39
The more plausible explanation is the presence of a lurking variable, temperature (z). When the weather is warmer, more people are likely to go swimming and thus there are more drowning deaths (y). Ice cream sales (x) are likely to be higher during this time as well.
[Diagram relating x, y, and z.]
4.40
Answers will vary. One possibility is that a large proportion of the nondrinkers do not drink because of a severe illness. These people might have a
higher death rate even though they did not drink.
4.41
Answers will vary. Some examples:
(1) The men are highly motivated and this could explain job success as well as years of schooling.
(2) Greater intelligence could lead to more schooling as well as higher job success.
(3) Family pressure to succeed leads to more schooling and job success.
4.42
It very rarely snows in the San Francisco Bay Area and it is an area in which there are frequent earthquakes. In other words, low (or no) snowfall would be associated with higher earthquake activity. If Ontario has few earthquakes, then heavy snowfall would be associated with low earthquake activity. Hence, the negative correlation. This does not mean that a strong snowfall will prevent earthquakes. There is an observed association between snow and earthquakes (as represented by the dashed line), but not a cause and effect relationship (as would be represented by an arrow).
[Diagram: x and y joined by a dashed line.]
4.43
Answers will vary. For example, parental pressure could lead to taking high school algebra and geometry as well as success in college. It’s very hard to get into college without at least algebra and geometry, so we would be very surprised if successful college students had not generally taken algebra and geometry whether or not they were in a minority. The association between taking Algebra and Geometry and college success does not indicate that taking math leads to college success.
4.44
(a) It’s most likely an example of common response, with the amount of water and heat expelled during the previous eruption as possible lurking variables. If the duration of the previous eruption was short, it is likely that the amount of heat and water expelled are less requiring a shorter amount of time for the geyser to build up and erupt again, leading to a shorter duration between eruptions.
(b) Answers will vary.
4.45
(a) The general trend is a weak negative association between the number of patients treated and the mortality rate. For hospitals that treat few heart attacks, the mortality rate is considerably higher than those that treat at least, say, 100.
(b) The scatterplot, taken as a whole, appears to be curved with a rapid drop off in the mortality rate until about 100 or so cases, after which it becomes reasonably linear. Actually, two linear models (0-100, 100-700) seem to fit the data well. The non-linearity strengthens the thesis that patients should avoid hospitals that treat few heart attacks since by far the highest mortality rates are at the hospitals that treat few heart attacks.
4.46
Answers will vary. For example, states like New York and Rhode Island are densely populated, while states like South Carolina, Alabama, and Arkansas are more rural and less densely populated. If cancer rates are higher in more populated areas, then this variable could be confounded with the effects of a state’s beer consumption. Another lurking variable might be general lifestyle differences between residents of the Northeast (New York, Rhode Island) and the South. If people in the South were generally more health conscious than people in the Northeast, this could help explain the lower cancer rates.
4.47
We would expect active girls to weigh less so that more hours of activity would be associated with a lower BMI. 3.24% (r² = 0.0324) of the variation in BMI among the girls in the study can be explained by the straight-line relationship with hours of activity.
4.48
(a) Since 80°F is well within the range of the observed data, the regression equation can be used.
(b) The correlation will not change since it does not depend on the units of measurement. The equation to convert Fahrenheit to Celsius is $C = (F - 32) \cdot \frac{5}{9}$. The slope is defined by the equation $b = r \frac{s_y}{s_x}$. By changing the units, $s_x$ is the only quantity that changes, the relationship being $s_C = \frac{5}{9} s_F$. Thus, the slope of the regression equation using Celsius can be found by dividing the original slope by 5/9. The equation for the y-intercept is $a = \bar{y} - b\bar{x}$. If the units change, both b and $\bar{x}$ will change and we can calculate the new intercept as follows:
$a_C = \bar{y} - b_C \bar{x}_C = \bar{y} - \frac{b_F}{5/9} (\bar{x}_F - 32) \cdot \frac{5}{9} = \bar{y} - b_F \bar{x}_F + 32 b_F = a_F + 32 b_F$.
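The unit-change argument in 4.48 can be checked numerically by fitting the same responses against temperatures in both scales. The data below are hypothetical values made up for this sketch (the exercise's data are not reproduced in the solution), and `least_squares` is an illustrative helper name.

```python
def least_squares(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data: temperatures in Fahrenheit and some response y.
temps_f = [50.0, 59.0, 68.0, 77.0, 86.0]
response = [3.0, 5.0, 6.0, 9.0, 10.0]

# The same temperatures converted to Celsius: C = (F - 32) * 5/9.
temps_c = [(f - 32) * 5 / 9 for f in temps_f]

a_f, b_f = least_squares(temps_f, response)
a_c, b_c = least_squares(temps_c, response)

print("slope:    ", b_c, "vs b_F / (5/9) =", b_f / (5 / 9))
print("intercept:", a_c, "vs a_F + 32 b_F =", a_f + 32 * b_f)
```

Up to floating-point error, the two printed pairs agree, matching the relations $b_C = b_F / (5/9)$ and $a_C = a_F + 32 b_F$.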
4.49
[Scatterplot of Weight versus Day.] The overall pattern is roughly straight-line. The correlation looks to be quite close to −1 (in fact, r = −0.998). The pattern is strongly negatively linear.
4.50
(a) b = −6.31 tells us that the soap lost, on average, 6.31g per day. You could also say that the weight of the soap is predicted to decrease by 6.31g every day.
(b) The y-intercept of 133.2 is the predicted weight of the soap on day 0. That is, our regression line predicts that the soap weighed 133.2g when Mr. Boggs began collecting data.
(c) weight = 133.2 − 6.31(4) = 107.96g on day 4.
(d) [Scatterplot of Weight versus Day.]
4.51
(a) Weight = 133.2 − 6.31(30) = −56.1g on day 30. This clearly does not make sense since the soap can’t weigh less than 0g. There is no way to know whether the trend illustrated extends beyond the domain of the original data. In general, it is a bad idea to extrapolate.
(b) 99.6% of the variation in the soap weights is explained by the least-squares regression of weight on day.
(c) There is an obvious pattern in the scatterplot that suggests that a nonlinear model would fit the data better.
4.52
(a) The least-squares regression line appears to fit the data quite well; the points are tightly clustered around the line and the residuals are clustered around 0. Additionally, an r² value of 0.96 indicates that 96% of the variation in the temperature is explained by the least-squares regression of temperature on number of cricket chirps.
(b)
Number of Chirps            10      20      30      40
Dr. LeMone’s prediction     48.92   57.84   66.76   75.68
Web prediction              47      57      67      77
The predictions are very close.
4.53
Answers will vary. For example, let x = use of chemicals, y = number of miscarriages and z = time spent standing up. Since the problem describes both x and z as possible causes whose effects are confounded, we have
[Diagram of x, y, and z with question marks on the links; x and z are confounded.]
4.54
$b = r \frac{s_y}{s_x} = 0.795 \cdot \frac{3.462}{30.296} = 0.0908$, $a = 28.625 - (0.0908)(136.125) = 16.26$.
4.55
(a) We would expect the correlation between length and weight to be positive because the longer the insect is the more it should weigh.
(b) The correlation would not change since it does not depend on the units of measurement.
4.56
(a) For Alaska, the maximum daily rainfall is about 15 inches and the maximum annual rainfall is about 330 inches.
(b) Excluding Alaska and Hawaii, the maximum annual rainfall ranges between about 10 and 200 inches, with most states between about 50 and 150. The trend is linear but the regression line would be almost horizontal. That is, the predicted y-value for every x would be about the same. Hence, knowing a state’s highest daily precipitation would not be a great help in predicting that state’s highest yearly precipitation.
4.57
(a)
(b) n = 6, $\bar{x} = 5$, $s_x = 4$, $\bar{y} = 4$, $s_y = 3.74$.

 x     z_x      y     z_y
 1    -1        1    -0.80
 2    -0.75     3    -0.267
 3    -0.5      3    -0.267
 4    -0.25     5     0.267
10     1.25     1    -0.80
10     1.25    11     1.87

$r = \frac{1}{5}[(-1)(-0.8) + (-0.75)(-0.267) + (-0.5)(-0.267) + (-0.25)(0.267) + (1.25)(-0.80) + (1.25)(1.87)] = 0.48$.
(c) The point (10, 1) is an influential point that lies well outside of the general pattern of the data. If the point (10, 1) is removed, the correlation becomes r = 0.99.
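The effect of the influential point in 4.57 can be seen by computing r with and without (10, 1). This is a minimal sketch; the function name `correlation` is chosen here for illustration, and it uses the form S_xy / sqrt(S_xx * S_yy), which equals the z-score formula used in the solution.

```python
import math

def correlation(data):
    """Correlation r for a list of (x, y) pairs."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    s_xx = sum((x - mean_x) ** 2 for x, _ in data)
    s_yy = sum((y - mean_y) ** 2 for _, y in data)
    return s_xy / math.sqrt(s_xx * s_yy)

data = [(1, 1), (2, 3), (3, 3), (4, 5), (10, 1), (10, 11)]
without_point = [p for p in data if p != (10, 1)]

r_all = correlation(data)
r_without = correlation(without_point)
print(round(r_all, 2), round(r_without, 2))
```

Dropping the single point (10, 1) moves r from a moderate value to nearly 1, which is what makes that point influential.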
4.58
We can conclude that there is a moderately strong linear association between students’ scores on the SAT and the number of hours they spent preparing for the math section. However, we cannot conclude that preparation time caused the score on the SAT to be higher. We cannot rule out, based on correlation alone, the effect of confounding variables.
4.59
(a) Dolphins: Body weight: 190 kg; Brain weight: 1600 g. Hippos: Body weight: 1400 kg; Brain weight: 600 g.
(b) If we assume that “smart” means that an animal has a large brain relative to its body size, then dolphins are smart because their actual brain size lies well above their predicted brain size, and hippos are dumb because their brain size lies well below their predicted brain size.
4.60
(a) It would tend to decrease the correlation. An outlier generally in line with the bulk of the data will tend to increase the correlation. Without that point, the points for human, dolphin, and hippo would have more effect on the straight-line pattern of the data.
(b) If we removed dolphin, human, and hippo, the correlation would increase. With the exception of these three points, the other points tend to lie pretty much on a line indicating a large correlation. The linear relationship is clearly strengthened by the removal of these points.
4.61
r² = 0.7396. This means that about 74% of the variation in brain weight can be accounted for by the least-squares regression of brain weight on body weight.
4.62
(a) The brain weight is predicted to be about 900 g. This value was found by finding 600 kg on the x-axis and finding the corresponding y-value on the regression line.
(b) The answer is 1.3. We can estimate the slope by finding two points on the regression line. Two approximate points are (0,0) and (725, 1000). An approximation for the slope is then $m = \frac{1000 - 0}{725 - 0} = 1.4$. This is closest to 1.3. The other two values, 0.5 and 3.2, produce lines that are much too flat or steep, respectively.
4.63
(a) The association is negative; the number of days in April until the first bloom decreases as the average March temperature increases. The association is linear and fairly strong.
(b) The least-squares regression equation is Predicted Number of Days = 33.12 − 4.69*Temperature. For every 1 degree increase in average March temperature, in degrees Celsius, we predict the number of days in April until first bloom to decrease by 4.69. The y-intercept is outside of the range of data and therefore has no meaningful interpretation.
(c) Predicted number of days until 1st bloom = 33.12 − 4.69(3.5) = 16.7. We predict the first cherry blossom to appear on April 17th.
(d) Predicted number of days until 1st bloom = 33.12 − 4.69(4.5) = 12.015. The observed value was 10. The residual is then 10 − 12.015 = −2.015.
(e) There is no discernible pattern in the residuals. They are clustered about 0 in a random fashion.
(f) r² = 0.72. 72% of the variation in the number of days in April until the first cherry blossom appears is explained by the least-squares regression of the number of days in April until 1st bloom on the average temperature, in Celsius, in March.
4.64
The finding of a positive correlation between average teacher salaries and liquor sales does not tell us that the increase in average teacher salaries caused the increase in liquor sales. Other confounding factors, such as an improving economy, could explain both increasing teacher salaries and increasing liquor sales.
4.65
(a) From the graph, there would be about a 37% or 38% reduction in injuries.
(b) Answers will vary. For example, it’s likely a result of causation since there have been many investigations that demonstrate that wearing a seat belt reduces the risk of injury in an accident. Further: the association is strong; the association is consistent; more people wearing belts is positively associated with a reduction in injuries; the increase of wearing belts preceded the reduction in injury rate; and wearing of belts is a plausible reason for the reduced injury rate.