Solutions

STAB22 section 1.1
1.2 The “who” is the students in the introductory
statistics class. Each case (observation) is one
student. We don’t know how many students there
were in this class. As to “what”: the variables
are ID (categorical, really, even though it is a
number), marks on two exams, homework, the final
exam, and a project (all quantitative), the total
points, calculated from the other marks (quantitative),
and the grade (categorical). These data
are (probably) the instructor’s marks table used
for calculating grades, but you might imagine
that the instructor could compare this class with
others.
1.3 Find the student with ID 105, who is in row 6.
For this student, Exam1 is 79, Exam2 is 88, and
Final is 88 as well, reading along the row.
1.4 This one involves a careful reading of the description
in Example 1.2. All the marks shown are out
of 100, but when the grade is computed, some of
them are out of other things. The computation
is like this for the student given:
total = 200(86/100) + 200(82/100)
      + 300(77/100) + 200(90/100)
      + 100(80/100)
      = 172 + 164 + 231 + 180 + 80 = 827,
taking each mark and weighting it according to
the figures given in the example. According to
the description, 827 is between 800 and 900, so
it is a B. (Check the spreadsheet printout to see
that the marks there between 800 and 900 did
indeed get a B.)
1.5 Go through the variables and decide whether you
would just classify the possible values (categorical)
or count or measure them (quantitative).
The individuals (cases) are the apartments (they
would form the rows of your spreadsheet). The
variables are:
• monthly rent: quantitative
• cable included: categorical (values are yes
and no)
• pets allowed: categorical (yes/no)
• number of bedrooms: quantitative (count
them)
• distance to campus: quantitative (measure
it)
There are therefore 5 variables. (I prefer to figure
out what they are first and then count them.)
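If you want to check the weighted total in 1.4 without a calculator, a few lines of Python will do it. This is only a sketch: the (weight, mark) pairs are copied from the computation in 1.4 in the order they appear there, and the names `components` and `weighted_total` are mine. Treating the B range as "at least 800 and below 900" is also my assumption, since the solution only says that 800 to 900 is a B.

```python
# (weight, mark out of 100) pairs, in the order used in the 1.4 computation.
components = [(200, 86), (200, 82), (300, 77), (200, 90), (100, 80)]


def weighted_total(parts):
    """Add up weight * (mark / 100) over all the components."""
    return sum(weight * mark / 100 for weight, mark in parts)


total = weighted_total(components)

# Assumption: the B band is 800 <= total < 900 (exact endpoints not stated).
is_b = 800 <= total < 900

print(total, is_b)
```

The five terms come out as 172, 164, 231, 180 and 80, so the total lands inside the B band.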
1.27 You can do this one by hand, or (easier) get your
software to do it.
To get the graph in Minitab, select Graph and
Bar Chart. The bars are going to be the values in
the Percent column, so change Counts of Unique
Variables to Values from a Table. “Simple” is
what we want. Click OK. The dialog box asks
for two variables (columns): in the big box select
Percent by double-clicking on it in the left box,
then click on the box under Categorical Variable
and select Spam in the same way. The bar graph
comes out with the spam categories in the order
given (actually, I think, in alphabetical order).
My bar graph is in Figure 1.
To get the bars sorted in order, re-do what you
just did, up to the dialog box where you select
Percent and Spam. Click on Bar Chart Options.
Select Order Main X Groups Based on Decreasing Y. Click OK a couple of times. My graph
is in Figure 2. This second bar chart makes it
easier to see that most of the spam is trying to
sell products, with the “adult” and “scam” categories
very close together.
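If you are not using Minitab, the "Order Main X Groups Based on Decreasing Y" step in 1.27 is just a sort on the percents before drawing. Here is a rough sketch in plain Python. The category names and numbers in `placeholder` are invented purely for illustration (they are NOT the spam data from the text), and the function names are my own.

```python
def order_decreasing(table):
    """Return (category, value) pairs sorted from largest value to smallest."""
    return sorted(table.items(), key=lambda item: item[1], reverse=True)


def text_bar_chart(pairs, width=40):
    """Build one text line per bar, scaled so the tallest bar is `width` long."""
    top = max(value for _, value in pairs)
    lines = []
    for category, value in pairs:
        bar = "#" * round(width * value / top)
        lines.append(f"{category:<10} {bar} {value}")
    return lines


# Placeholder values only: substitute the Percent column from the exercise.
placeholder = {"type A": 10, "type B": 40, "type C": 25}

ordered = order_decreasing(placeholder)
chart = text_bar_chart(ordered)
for line in chart:
    print(line)
```

Sorting first and drawing second mirrors what the Minitab option does: the bars are the same, only their left-to-right order changes.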
1.30 To use a pie chart, the data need to be parts of a
whole. These figures are percentages, but of the
“wrong thing” as far as a pie chart is concerned.
(What adds up to 100% is percent of females and
percent of males within each subject area, so you
could do 9 different pie charts, one each for MBA,
MEd, MA, MSc and so on.) If these figures were
percentages of all female graduates who graduated
with degrees of these different types (in which case
the percentages would add up to 100%), then a pie
chart would work.
Figure 1: Default bar chart for types of spam
Drawing a bar chart is very like the previous exercise. Open Minitab, get hold of the data, and
select Graph and Bar Chart. Make sure Minitab
recognizes the percents as Values from a Table,
and, in Bar Chart Options, Order Main X Groups
Based on Decreasing (or Increasing, if you prefer) Y. My bar chart is shown in Figure 3. You
can see easily from the bar chart that education
degrees are most popular among women (compared to men) and theology the least popular.
You could see this from the original data too,
but it is more work to do so.
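The pie chart condition in 1.30 (the values have to be parts of one whole) is easy to check mechanically: the percentages should add up to 100, give or take rounding. A minimal sketch follows; the function name, the tolerance and all the numbers are my own illustrative choices, not figures from the text.

```python
def parts_of_a_whole(percents, tolerance=1.0):
    """True if the percentages add up to 100, allowing for rounding error."""
    return abs(sum(percents) - 100.0) <= tolerance


# Invented example 1: female and male percentages within ONE subject area.
within_one_area = [60.0, 40.0]

# Invented example 2: the female percentage in each of several subject areas.
across_areas = [60.0, 35.0, 50.0]

print(parts_of_a_whole(within_one_area))
print(parts_of_a_whole(across_areas))
```

The first list could go in a pie chart; the second cannot, which is the situation in this exercise.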
1.36 The data are in co2.
The values are measured in emissions per person
because some of the countries have a lot more
people than others; if you just give total
emissions, you don’t know whether the figure for a
country is large because each person there burns
a lot of fuel, or because that country simply has
a lot of people. So the figures as given are large for
countries that burn a lot of fuel, relative to how
many people live in that country.
Figure 2: Bar chart for types of spam with bars tallest
to shortest
You can make either a histogram or a stemplot of
the numbers. Let’s start with a histogram, which
looks like Figure 4. Those three highest values might
be outliers, but it’s hard to be sure: we don’t
know how far above the others they are. A stemplot
might shed some more light. See Figure 5.
The stemplot could be done by hand, but that has the
disadvantage that it is not clear what you want
to have as stems and what as leaves. Doing it
with software has the advantage that you can let
the software choose the stems, and then fix it up
if you don’t like it.
Figure 4: Histogram of CO2 emissions per person
Figure 3: Bar chart of women’s degree types
The stemplot shows that the three largest values
are 16, 17 and 20, which are quite a bit bigger
than 11 (the next largest value). The stemplot
doesn’t show a big gap, though.
Figure 5: Stemplot of CO2 emissions per person
Figure 6: Default histogram of pH
Nonetheless, the histogram and stemplot clearly
show the right-skewed shape of this distribution:
there are some countries that produce a lot of
CO2. The centre is hard to judge because of the
extreme skewness; as we’ll see in Section 1.2,
measures of centre such as the mean and median will
be very different for data like this. But it seems
clear that the “top” three countries are outliers;
looking back at the data (in the text), these are
the United States, Canada and Australia (so no
surprise there). If you ignore the outliers, the
spread is from 0 to about 11.
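To see why 1.36 uses emissions per person rather than totals, here is a tiny made-up comparison. The two "countries" and all their numbers are invented for illustration only; nothing here comes from the co2 data set, and the units are arbitrary.

```python
def per_person(total_emissions, population):
    """Emissions per person: total emissions divided by number of people."""
    return total_emissions / population


# Invented figures: (total emissions, population), arbitrary units.
big_country = (1000.0, 500.0)
small_country = (300.0, 30.0)

big_rate = per_person(*big_country)
small_rate = per_person(*small_country)

print(big_rate, small_rate)
```

The big country has the larger total, but the small country has the larger figure per person, so the two measures can rank countries differently.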
1.38 It’s easiest to do this question backwards: first
get the default histogram, then get a histogram
with 14 intervals, then get the interval boundaries
in the right place.
The default histogram, shown in Figure 6, has
about 14 intervals (actually 13), but the intervals
are centred on 4.2, 4.4 etc., which is not quite
what we wanted.
So, in Minitab: double-click on one of the bars of
the histogram, and select Binning from the popup
box. At the top, change Midpoint to Cutpoint
(because you are going to specify the ends of the
intervals). Select Midpoint/Cutpoint Positions,
and then type in the values for the class
boundaries (4.2, 4.4, up to 7.0). When you click OK,
you’ll get a histogram with the ends of the
intervals in the right place.
(Minitab Version 12 instructions are: select
Graph, Histogram, put C1 or pH under X, click
Options. Under Type of Intervals, click Cutpoint
(to ensure Minitab makes the interval boundaries,
not the interval midpoints, at the values you’re
going to give). Under Definition of Intervals,
select Midpoint/Cutpoint Positions, then type into
the box 4.2 4.4 4.6 and so on up to 7. (Enter the
numbers with spaces between). Click OK twice.)
The result is shown in Figure 7.
Figure 7: Histogram (a) of pH
For (b), repeat the above, but for the cutpoint
positions, enter 4.14 4.34 and so on up to 6.94.
The result is shown in Figure 8.
Figure 8: Histogram (b) of pH
(Other software is different, but if you don’t go
straight to a default histogram, you’ll usually
have the chance to set where the intervals
(sometimes called “bins”) start from and how wide they
are. Thus in (a) you start from 4.2 and use a
width of 0.2; in (b) you start from 4.14 and again
use a width of 0.2.)
To get around to answering the question (finally):
Figure 7 shows two modes (peaks), around 5.1
and 5.7. In Figure 8, the data set is much closer
to having a single “flat-top” peak between about
5 and 6. The default histogram, Figure 6, has
one peak around 5.6. So the way the histogram
looks depends on apparently small choices about
how it is drawn.
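For 1.38, the two sets of cutpoints (start at 4.2 or at 4.14, width 0.2, 14 intervals) can be generated and tried out in a few lines. This is a sketch under my own assumptions: the function names are mine, and the five values in `sample` are made up to show the effect; they are not the pH data.

```python
def cutpoints(start, width, count):
    """Return count + 1 interval boundaries, rounded to tame float drift."""
    return [round(start + i * width, 2) for i in range(count + 1)]


def bin_counts(values, edges):
    """Count the values falling in each interval [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        for i in range(len(counts)):
            if edges[i] <= value < edges[i + 1]:
                counts[i] += 1
                break
    return counts


edges_a = cutpoints(4.2, 0.2, 14)    # 4.2, 4.4, ..., 7.0 as in part (a)
edges_b = cutpoints(4.14, 0.2, 14)   # 4.14, 4.34, ..., 6.94 as in part (b)

# Made-up values, NOT the pH data: just enough to show the bins shifting.
sample = [5.05, 5.15, 5.25, 5.65, 5.75]

counts_a = bin_counts(sample, edges_a)
counts_b = bin_counts(sample, edges_b)
print(counts_a)
print(counts_b)
```

The same five values give differently shaped counts under the two sets of boundaries, which is the small-scale version of what Figures 7 and 8 show.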
1.39 If the only possible values for a variable are 0
and 1, the histogram will have two bars with a
gap between, like (b) and (c). There should be a
similar number of males and females (with, these
days, slightly more females), as in (c), while the
right-handers will typically outnumber the left-handers
(about 15% of the population as a whole
is left-handed). This corresponds to (b).
The other two variables can take many different
values. Some people are of average height, but
we’d expect similar numbers of people of above-average
and below-average height, i.e. a symmetric
shape like (d). Finally, a right-skewed shape
like (a) would make sense for studying times,
because a few people study a lot (and you can’t
study less than 0 minutes!).
1.42 As in 1.39, you can use a histogram, as shown in
Figure 9. You might also argue that a stemplot
shows more accuracy in the data values. Either
is good.
There appears to be one outlier, the value around
4.9 (actually 4.88). (The two values around 5.1
might be outliers too.) Apart from that, the
shape is more or less symmetric, with centre
around 5.5 or 5.6 and a spread from 5.1 to 5.8.
My guess at the density of the earth is the centre,
5.5 or 5.6 times the density of water.
Figure 9: Histogram of Cavendish’s measurements
1.43 There are really 4 variables here, since OBS just
distinguishes the individuals (students). GPA,
IQ and self-concept are quantitative (measured)
variables, while Gender is categorical.
When you read the data into Minitab from the
disk, you’ll find that you have five columns of
data (the four variables plus OBS). So make sure
you make a stemplot of GPA (in column C2) and
not something else! (Select gpa into the Variables
box when you select Graph, Stem and Leaf.) See
Figure 10. You can verify that Minitab did indeed
round or trim the GPA values to the nearest tenth
of a point. (If it does not, go back to the dialogue
box and type 0.1 into the “increment” box.)
Stem-and-Leaf Display: gpa
Stem-and-leaf of gpa  N = 78
Leaf Unit = 0.10
    1    0  5
    2    1  7
    3    2  4
    7    3  4689
   11    4  0678
   15    5  0259
   22    6  0001249
 (22)    7  1122344555566668888999
   34    8  001111223378899
   19    9  011133445555679
    4   10  1577
Figure 10: Stemplot of GPA
This distribution is skewed left, with centre
around 8. The spread is from around 1 to around
11. There aren’t really any values distinct enough
from the others to call outliers.
In Minitab version 14 you can make an overlaid
histogram, which allows you to compare boys and
girls. This is not quite a back-to-back stemplot,
but is as close as we will get. Select Graph,
Histogram, and then With Outline and Groups. Select
GPA as your graph variable and Gender as
your “categorical variable for grouping”. When
you click OK, you’ll find two overlaid histograms,
with boys as red and girls as black. See Figure 11.
The three lowest GPAs are all boys.
Figure 11: Overlaid histogram of GPAs for boys and
girls
Your software might also produce a histogram
like the one in Figure 12, with the girls in blue
and the boys in red. Again, the boys dominate
the lowest GPAs, though, since there are fewer
girls (only 40% of the total), this might be the
result of the sample sizes rather than gender
differences. (With a larger sample, you have more
chance to see extremely high or low values.)
Figure 12: Stacked histogram for GPA by gender
1.44 See Figure 13. A histogram would be equally
good. The shape is slightly skewed left, though
if you consider the four lowest values (below 80)
as outliers, the shape is more or less symmetric.
The centre is around 112, and the spread from 72
(or 86) to 136. The centre is clearly above 100;
in fact, only 14 of the 78 students have IQs below
100.
Character Stem-and-Leaf Display
Stem-and-leaf of iq  N = 78
Leaf Unit = 1.0
    2    7  24
    4    7  79
    4    8
    6    8  69
   10    9  0133
   14    9  6778
   24   10  0022333344
   36   10  555666777789
 (19)   11  0000111122223334444
   23   11  55688999
   15   12  003344
    9   12  677888
    3   13  02
    1   13  6
Figure 13: Stemplot of IQ
1.45 Same deal again, as shown in Figure 14. This is
skewed left, but not as much as GPA. The centre
is about 60 (hard to judge, but less than 65).
The spread is from 20 to 80; the value 80 is a little
higher than the others, but, given the large
number of values around 70, not really high enough
to be an outlier.
Character Stem-and-Leaf Display
Stem-and-leaf of concept  N = 78
Leaf Unit = 1.0
    2    2  01
    3    2  8
    4    3  0
    8    3  5679
   13    4  02344
   17    4  6799
   30    5  1111223344444
   39    5  556668899
   39    6  00001233344444
   25    6  55666677777899
   11    7  0000111223
    1    7
    1    8  0
Figure 14: Stemplot of self-concept scores
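Because the leaf unit is 1.0 in the IQ and self-concept stemplots (Figures 13 and 14), you can turn the displays back into data values and check the statements in 1.44 and 1.45. The sketch below does that in Python. The rows are my transcription of the two Minitab displays (stem, then the string of leaves), and the helper name `decode` is my own.

```python
from statistics import median


def decode(rows, leaf_unit=1):
    """Turn (stem, leaves) rows of a stemplot back into data values."""
    values = []
    for stem, leaves in rows:
        for leaf in leaves:
            values.append((10 * stem + int(leaf)) * leaf_unit)
    return values


# Transcribed from the IQ stemplot (Figure 13); an empty string is an empty row.
iq_rows = [
    (7, "24"), (7, "79"), (8, ""), (8, "69"), (9, "0133"), (9, "6778"),
    (10, "0022333344"), (10, "555666777789"),
    (11, "0000111122223334444"), (11, "55688999"),
    (12, "003344"), (12, "677888"), (13, "02"), (13, "6"),
]

# Transcribed from the self-concept stemplot (Figure 14).
concept_rows = [
    (2, "01"), (2, "8"), (3, "0"), (3, "5679"), (4, "02344"), (4, "6799"),
    (5, "1111223344444"), (5, "556668899"),
    (6, "00001233344444"), (6, "55666677777899"),
    (7, "0000111223"), (7, ""), (8, "0"),
]

iq = decode(iq_rows)
concept = decode(concept_rows)

below_100 = sum(1 for value in iq if value < 100)
print(len(iq), min(iq), max(iq), median(iq), below_100)
print(len(concept), min(concept), max(concept), median(concept))
```

Both displays decode to 78 values. For IQ, 14 values fall below 100 and the range is 72 to 136; for self-concept the range is 20 to 80 and the median is 59.5, in line with "about 60".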