Solutions

STAB22 section 2.1
2.3 Both ounces and price are quantitative variables,
and so we could draw a scatterplot to see how
they are related. We might expect that bigger
sizes cost more, though a Venti (24 ounces) costs
less than twice a Tall (12 ounces), even though
it’s twice the size. (I have problems with a company that calls its smallest serving a Tall, but
that may just be me.) If you leave the variable
Size as categorical, you could make something
like a bar graph but using Ounces instead of frequency. The individuals (cases) here are cups of
Mocha Frappuccino.
2.9 The price of a drink depends on the size. So price
should be the response and size the explanatory
variable, and on your scatterplot price should be
on the vertical y scale. I typed the numbers into
software and produced the plot shown in Figure 1, though you could almost as easily do this
by hand. As the size goes up, the price goes up
as well, but not in a straight-line way: the relationship
looks less steep as the size increases (reflecting
the fact that a 24-ounce drink costs the
least per ounce of coffee, because the coffee itself
is only one component of the price, and there is
also the fixed cost of hiring a barista to serve you,
however big a drink you have).
Figure 1: Scatterplot of price vs. size for Mocha Frappuccino
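The per-ounce argument in 2.9 can be checked with a little arithmetic. The sketch below is only an illustration: the 12- and 24-ounce sizes come from the exercise, but the 16-ounce size and all three prices are made-up placeholder values, not the data from the text, and the function names are my own.

```python
# Hypothetical (size in ounces, price in dollars) pairs, for illustration only.
# These are NOT the prices given in the exercise.
drinks = [(12, 3.00), (16, 3.60), (24, 4.20)]

def price_per_ounce(drinks):
    """Return (ounces, price per ounce) for each drink."""
    return [(oz, price / oz) for oz, price in drinks]

def slopes(drinks):
    """Extra dollars per extra ounce between consecutive sizes."""
    ordered = sorted(drinks)
    return [(p2 - p1) / (o2 - o1)
            for (o1, p1), (o2, p2) in zip(ordered, ordered[1:])]

for oz, ppo in price_per_ounce(drinks):
    print(f"{oz} oz: {ppo:.3f} dollars per ounce")
print("slopes between sizes:", [round(s, 3) for s in slopes(drinks)])
```

With prices like these, the price per ounce falls as the size goes up, and the slope between consecutive sizes gets smaller, which is what "less steep as the size increases" looks like in numbers.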
2.24 The first test comes before the final exam
chronologically, so the final exam score should be
the response (and go on the vertical scale on your
scatterplot). Again, this one could be done either
by hand or by using software (your choice).
I used Minitab, with the results shown in Figure 2.
Select Scatterplot and Simple, then select the
response as Y and the explanatory as X.

Figure 2: Scatterplot of first test and final exam scores

There is essentially no relationship between the
two scores: if you knew the first test score, that
would not help you at all in predicting the final
exam score. This might be because the first test
came very early in the course, and the material
it tested was very different from that on the final
exam. Or students might react to their first test
result: a student who scores poorly might study
hard for the final, and a student who scores well
might relax a bit too much before the final.

2.25 Again, the final exam score will be the response.
My scatterplot is shown in Figure 3. This appears
to be something of a positive association (more so
than in Figure 2, anyway), so that knowing the
score on the second test helps a bit in predicting
the final exam score. (Note that the student who
does best on the 2nd test, 175, does well on the
final, and the two students who score under 150
on the second test don’t do very well on the final
either.)

Figure 3: Scatterplot of second test and final exam scores

By the time the second test comes around, usually
late in the semester, it will be pretty
clear what material is going to be tested (pretty
much the same stuff that will be on the final), so
a student who does well on one will probably do
well on the other (and will know how hard they
need to study for the final).

2.27 Think of whether one variable might be the
cause of the other, or whether the two variables
are just things that “happen to go together”.
In (b) and (e), the two values in each case are
obtained at the same time, and so they just “go
together” (or not): just explore the relationship
in each case.

In (a), older children will tend to be heavier, so
that if you knew the age of a child, you would be
able to predict their weight. Being able to say “if
I knew x, I would be able to predict y” means
that x is explanatory and y is the response: here
age is explanatory and weight is the response.

In (c), if you knew how many bedrooms the apartment
has, you could make a guess at its rental
price. Thus bedrooms is explanatory, and rental
price is the response.

In (d), likewise, if you knew how much sugar a
cup of coffee has, you would be able to guess
how sweet it would taste. (A more interesting
setup would be to have a friend prepare three
cups of coffee with differing amounts of sugar in,
and then, by tasting, you would rank them in order
of sweetness. If you’re a big coffee drinker,
you would probably get pretty close to the right
order.)

In each of (a), (c) and (d) here, you could make
a case for the explanatory and response variables
being the other way around, but the major interest
would be in the relationships as described
above. For instance, if you knew the weight of a
child, you could guess their age, but you would
normally want to do it the other way around.

2.28 Parents’ income is explanatory and college debt
is the response, because parental income influences
college debt (it comes first). These variables
are both quantitative (you would measure
them). If the parents have a high income, the
student will not have to borrow so much money,
so the debt will be low; if the parents have a low
income, the student will have to borrow a lot of
money to pay tuition, living expenses and so on.
So we would expect a negative association.

This is assuming that parents will pay their children’s
college expenses, if they can. This isn’t always
the case. Some students work while they’re
at school (or during the summers) and save what
they earn, and such students can be expected to
graduate with a lower debt than they would otherwise
have had.

2.29 IQ is supposed to be a measure of general
intelligence, and we would expect more intelligent
children to be more interested in and more skilled
in reading. This would be especially true for children
in the same grade (and thus of about the
same age).
In Figure 2.13, children with higher IQ scores
generally have higher reading scores, though
there is a lot of scatter. There are four children
(with IQs between 100 and 130, and reading
scores less than 20) that don’t seem to follow the
general trend. Their reading scores are about 40
points less than you would expect based on their
IQ; these children could have some kind of developmental
problems that hinder their reading even
though they score well on general intelligence.

Ignoring the outliers, the trend is roughly linear
(there is no obvious curve to the relationship,
which is how you tell). But it isn’t very strong:
there is a lot of scatter in the picture, which
is another way of saying that if you know a child’s
IQ, you wouldn’t be able to predict their reading
test score very accurately. (There is more to reading
than general intelligence, in other words.)

2.30 As on a normal probability plot, when you see
a “stair-step” pattern like this, it means that one
of the variables only takes a few different values.
Here, it’s the child’s self-estimate of reading ability,
which can only be 1, 2, 3, 4 or 5. There are 60
children, so there are several with the same
self-estimate. Having said that, children with a high
test score also tend to have a high self-estimate
(all of the children with test scores above 80 rate
themselves 3 or better). Likewise, the children
with a test score below 40 rate themselves 3 or
worse, with one exception. This exception is the
one outlier: a test score of about 10, and a
self-estimate of 4, which is a serious over-estimate
(looking at the plot, you would expect this child
to have a self-rating of 1 or maybe 2).

2.32 Get the data from the disk into your software.
In Minitab, select Graph and Plot, with the right
variable (cycle length, here) as the response, Y,
variable. My plot is shown in Figure 4.

Figure 4: Plot of cycle length against day length

The point on the far right (with day length close
to 24) is an outlier, because it is not part of the
general pattern. You could claim that there is a
positive association, but it is very weak: if you
try to predict cycle length from day length, your
prediction won’t be very accurate.
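One rough way to back up a judgement like "positive, but weak" in 2.32 is to count how many points are above average (or below average) on both variables at once, against how many are above on one and below on the other. The sketch below does that count. The function name and the sample numbers are invented for illustration and are not the data from the disk. Looking at the plot remains the main tool.

```python
from statistics import mean

def quadrant_counts(xs, ys):
    """Count points that agree or disagree with a positive association.

    A point "agrees" if x and y are both above, or both below, their means.
    It "disagrees" if one is above its mean and the other is below.
    Points lying exactly at either mean are ignored.
    """
    mx, my = mean(xs), mean(ys)
    agree = sum(1 for x, y in zip(xs, ys) if (x - mx) * (y - my) > 0)
    disagree = sum(1 for x, y in zip(xs, ys) if (x - mx) * (y - my) < 0)
    return agree, disagree

# Made-up values, NOT the data from the exercise.
x_values = [1, 2, 3, 4, 5, 6]
y_values = [2, 1, 4, 3, 6, 5]
agree, disagree = quadrant_counts(x_values, y_values)
print(agree, disagree)
```

Many more agreeing than disagreeing points suggests a positive association, the reverse suggests a negative one, and roughly equal counts suggest little association either way.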
2.33 I did this in Minitab again (though you could
do this one by hand if you really want to). Get
the data from the disk into Minitab; treat brain
activity as the response. Select Graph and Plot,
and select the two variables into Y and X with
brain activity as Y. My plot is in Figure 5.

Figure 5: Scatterplot of brain activity against social distress

The relationship shows an upward trend: a higher
score on the distress scale leads to a higher brain
activity measurement. The relationship is more
or less linear and fairly strong. I don’t see any
outliers. The data do suggest that distress from
social exclusion is related to brain activity in the
“pain” region.

2.34 My plot of team value against revenue is in
Figure 6. I don’t think there’s much of a relationship.
If anything, the trend is downward, since one of
the teams with no revenue has the highest value,
and the team with the highest revenue has almost
the lowest value.

Figure 6: Plot of team value against revenue

On the other hand, the plot of value against debt
is close to a perfect upward straight line: the
larger the debt, the larger the value. There are
some outliers at the bottom left (in the sense of
points that are further off the line than the others):
the Oklahoma City Thunder and the Orlando
Magic have higher value than you would
expect given their amounts of debt, and the Portland
Trail Blazers have lower value than you
would expect from their debt. None of these are
far off the trend, but the overall fit is so good that
I would call even these moderately “off” values
outliers.

Figure 7: Plot of team value against debt

I’d describe the value–income plot as a weakish
positive association, since there does seem to be
some relationship. The teams with negative income
seem to be following the same trend as the
others, except for the Dallas Mavericks: for a
team with that kind of value, you’d expect a positive
income at least.

Figure 8: Plot of team value against operating income

2.35 The last sentence of the first paragraph in the
text gives you a clue as to what should be on
the y-axis: rate is the response, and mass the
explanatory variable.

So get a scatterplot of Rate against Mass, “with
groups”, and use Sex as the grouping categorical
variable. Your plot should look something like
Figure 9.

Figure 9: Metabolic rate vs. lean body mass

Looking at all the data, the relationship is positive
(larger lean body mass goes with larger
metabolic rate), and the trend looks linear. The
relationship looks quite strong, except perhaps
at the upper end. Separating out the men and
women, some of the men (red squares) have large
lean body mass and large metabolic rate, and the
trend overall for the men is not as clear as it is
for the women (black circles). (Most of the larger
values are men, and all of the smaller values, on
both variables, are women.)

2.37 To get the plot with men’s and women’s records
separately labelled, use the same idea as 2.35: do
a scatterplot “with groups”, and select Sex as the
grouping variable.
Figure 10: Men’s and women’s 10,000-metre record times
Men (red squares) have been running this event
for longer than women (black circles), so their
history is longer. But the women’s record appears to have been dropping more quickly than
the men’s. In recent years, though, the women’s
record hasn’t dropped very much, while the men’s
has dropped more quickly. So the data support
the first claim of (b), but not the second (the
men’s record is still less than the women’s, with
no apparent sign that the women are going to
catch up).
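For readers not using Minitab, the "with groups" idea in 2.35 and 2.37 amounts to splitting the rows by the categorical variable and then plotting each group with its own symbol. Here is a minimal sketch of the splitting step in Python. The column names, the function name and the three sample rows are hypothetical placeholders, not values from the text.

```python
from collections import defaultdict

def split_by_group(rows, group_key, x_key, y_key):
    """Return {group: (list of x values, list of y values)}."""
    groups = defaultdict(lambda: ([], []))
    for row in rows:
        xs, ys = groups[row[group_key]]
        xs.append(row[x_key])
        ys.append(row[y_key])
    return dict(groups)

# Placeholder rows, NOT real record times.
rows = [
    {"sex": "M", "year": 1950, "time": 1800.0},
    {"sex": "F", "year": 1980, "time": 1900.0},
    {"sex": "M", "year": 1990, "time": 1650.0},
]
by_sex = split_by_group(rows, "sex", "year", "time")
for sex, (years, times) in by_sex.items():
    print(sex, years, times)
```

Each group's pair of lists can then be handed to whatever plotting routine you use, with a different marker per group (for example, red squares for men and black circles for women, as in the figures here).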