Review Quiz Solutions

Chapter 10: Quiz 1
1. a. true
b. true
c. false
d. false
2. D. 40.5% of 1000 is 405.
3. A. The P-value is about 0.0054.
4. E. The alternative hypothesis is that at least one proportion in the population is not equal to that in your model.
5. For each category, the expected frequency is 0.2(23) = 4.6. Because the expected frequencies are not all 5 or greater, the
conditions are not met for a chi-square goodness-of-fit test. Further, because this is not a random sample from the grades
of this instructor but, rather, all of the grades from a single course, a significance test could only test whether the
differences in observed proportions from 0.2 can reasonably be attributed to chance.
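The expected-frequency check in item 5 can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the five equal categories are an assumption inferred from the hypothesized proportion of 0.2 per category.

```python
def expected_counts(n, proportions):
    """Expected frequency for each category: n times its hypothesized proportion."""
    return [n * p for p in proportions]


def meets_expected_count_condition(counts, minimum=5):
    """True only if every expected frequency is at least `minimum`."""
    return all(c >= minimum for c in counts)


# Item 5: 23 grades; five categories at 0.2 each is an assumption
# inferred from the hypothesized proportion, not stated in the solution.
expected = expected_counts(23, [0.2] * 5)
condition_met = meets_expected_count_condition(expected)
print(expected, condition_met)
```

Because every expected count is 4.6, the check fails, which matches the conclusion that the conditions for a chi-square goodness-of-fit test are not met.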
Chapter 10: Quiz 2
1. a. False. The chi-square test of homogeneity has (2 – 1)(2 – 1) = 1 degree of freedom.
b. False. In a test of independence, one sample is taken from a single population. In a test of homogeneity, independent
random samples are taken from two or more populations.
c. true
d. true
2. E
3. C; the P-value is 0.0046.
4. Four possible bar charts are shown here. The proportion of females among the Hispanics who were stopped is smaller
than the proportion of females stopped in the other races/ethnicities. This is most easily seen in the bar chart on the lower
right. It can also be seen easily in the bar chart on the lower left because the proportion of the bar that is female is smaller
for Hispanics than it is for the other races/ethnicities.
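The degrees-of-freedom calculation in item 1a follows the usual (rows – 1)(columns – 1) rule for a two-way table. A minimal sketch, with a hypothetical helper name:

```python
def chi_square_df(rows, cols):
    """Degrees of freedom for a chi-square test on an r-by-c two-way table."""
    if rows < 2 or cols < 2:
        raise ValueError("a two-way table needs at least 2 rows and 2 columns")
    return (rows - 1) * (cols - 1)


# Item 1a: a 2-by-2 table.
df_item_1a = chi_square_df(2, 2)
print(df_item_1a)
```

The same rule applies to tests of homogeneity and tests of independence; only the sampling design differs.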
Chapter 11: Quiz 1
1. I, III, II
2. A
3. B
4. E
5. B
6. C
7. a. The regression equation is batting average = 239.56 + 2.3135 · number of home runs. The slope of 2.31 means that if
one player has one more home run than another player, you can expect his batting average to be about 2.31 points (0.231
percentage points) higher.
b. Check conditions. The 15 baseball players were randomly selected from the American League baseball players in a
given season. (See the plot below.) The trend appears to be linear, but there is a serious concern with the point at the
upper right because it may be influential. The residual plot shows no obvious pattern, so a linear model fits the data well.
There is no evidence that the residuals tend to change in size as x increases. The boxplot of the residuals shows no
outliers or obvious skewness or other indications of non-normality.
State your hypotheses.
H0: 1  0 , where 1 is the slope of the true linear relationship between the total number of home runs and the batting
average for all American League baseball players in the given year.
Ha: 1  0
Compute the test statistic and P-value and draw a sketch. The test statistic is t = 2.611. There are 13 degrees of
freedom, and the P-value is 0.022, as shown in the following sketch.
Write a conclusion in context. Because the P-value of 0.022 is small, reject the null hypothesis that the slope of the true
linear relationship is zero. There is evidence of a nonzero linear relationship between a player’s batting average and
number of home runs. However,
as indicated earlier, the possibly influential point (33, 317) is of serious concern. By removing that point and redoing the
analysis, we find that the point is indeed influential. When that point is removed, the conclusion changes because the new
P-value increases to 0.12.
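The P-value in item 7b can be checked numerically from the reported test statistic and degrees of freedom. The sketch below is one possible approach, not necessarily how the solution was computed: it integrates the t density with Simpson's rule using only the standard library, and the function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat, df, steps=2000):
    """Two-sided P-value via Simpson's rule on the density from 0 to |t|.

    `steps` must be even for Simpson's rule.
    """
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t|)
    return 2 * (0.5 - central_area)


n_players = 15       # sample size from item 7b
df = n_players - 2   # simple linear regression uses n - 2 degrees of freedom
p_value = two_sided_p_value(2.611, df)
print(f"df = {df}, two-sided P-value = {p_value:.4f}")
```

The result should land close to the reported 0.022; small differences are expected because the test statistic is rounded to three decimal places.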
Chapter 11: Quiz 2
1. E
2. III and IV are defensible.
I: This isn’t a defensible conclusion because the relationship could be quadratic, for example.
II: This isn’t defensible because the student failed to reject the null hypothesis and the relationship is curved.
III: The scatterplot could, for example, show a parabolic trend.
IV: A transformation of one or both variables could straighten out the plot.
3. a. The Caspian Sea is very influential. In the t-test for the significance of the slope, when the Caspian Sea was
removed, the P-value went from 0.025 to 0.13. The
appearance of a positive linear relationship was due entirely to this one lake. (Note that in both cases, the slope is very
small. Even when the Caspian Sea is included and so the slope is significant, the increase in depth tends to be about
0.02 foot per 1-square-mile increase in the surface area.)
No, removing the Caspian Sea did not result in a more randomly scattered residual plot, nor did it take care of the
outliers or skewness in the residuals.
b. Yes, making a log-log transformation of the data results in a much more randomly scattered residual plot, and the
residuals are more symmetric and without outliers. The P-value for the test of the slope, 0.0032, is now highly significant.
Therefore, we can reject the null hypothesis and
conclude that there is evidence to support the claim that the linear trend between ln(area) and ln(depth) cannot reasonably
be explained by chance alone.
c. One condition was violated after the log-log transformation: This was not a random sample from the world’s lakes. The
regression line is meaningful as a summary whether you have a random sample or the entire population. It was useful to
do a significance test for the slope here, but the conclusion must be stated carefully: The slope of the relationship between
the log of the surface area and the log of the depth of the lakes is different from what you would expect if the depth of the
lakes had been assigned at random to the lakes. In other words, there is a pattern here that cannot be explained by chance
alone.
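The log-log transformation in item 3b amounts to fitting a least-squares line to ln(depth) against ln(area). The following is a minimal sketch with hypothetical function names. The values used are synthetic, generated from a known power law, and are not the lake data from the quiz.

```python
import math


def least_squares(xs, ys):
    """Slope and intercept of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def log_log_fit(areas, depths):
    """Fit ln(depth) = intercept + slope * ln(area)."""
    log_areas = [math.log(a) for a in areas]
    log_depths = [math.log(d) for d in depths]
    return least_squares(log_areas, log_depths)


# Synthetic values from depth = 3 * area ** 0.5, NOT the lake data.
areas = [10.0, 100.0, 1000.0, 10000.0]
depths = [3 * a ** 0.5 for a in areas]
slope, intercept = log_log_fit(areas, depths)
print(slope, math.exp(intercept))
```

On an exact power law the fitted slope recovers the exponent and exp(intercept) recovers the multiplier, which is why a log-log plot straightens out this kind of curved relationship.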