STAB22 section 1.1

1.2 The “who” is the students in the introductory statistics class. Each case (observation) is one student. We don’t know how many students there were in this class. As to “what”: the variables are ID (categorical, really, even though a number), marks on two exams, homework, the final exam, and a project (all quantitative), the total points, calculated from the other marks (quantitative) and the grade (categorical). These data are (probably) the instructor’s marks table used for calculating grades, but you might imagine that the instructor could compare this class with others.

1.3 Find the student with ID 105, who is in row 6. For this student, Exam1 is 79, Exam2 is 88, and Final is 88 as well, reading along the row.

1.4 This one involves a careful reading of the description in Example 1.2. All the marks shown are out of 100, but when the grade is computed, some of them are out of other things. The computation is like this for the student given:

total = 200(86/100) + 200(82/100) + 300(77/100) + 200(90/100) + 100(80/100)
      = 172 + 164 + 231 + 180 + 80
      = 827,

taking each mark and weighting it according to the figures given in the example. According to the description, 827 is between 800 and 900, so it is a B. (Check the spreadsheet printout to see that the marks there between 800 and 900 did indeed get a B.)

1.5 Go through the variables and decide whether you would just classify the possible values (categorical) or count or measure them (quantitative). The individuals (cases) are the apartments (they would form the rows of your spreadsheet). The variables are:

• monthly rent: quantitative
• cable included: categorical (values are yes and no)
• pets allowed: categorical (yes/no)
• number of bedrooms: quantitative (count them)
• distance to campus: quantitative (measure it)

There are therefore 5 variables. (I prefer to figure out what they are first and then count them.)

1.27 You can do this one by hand, or (easier) get your software to do it.
To get the graph in Minitab, select Graph and Bar Chart. The bars are going to be the values in the Percent column, so change Counts of Unique Variables to Values from a Table. “Simple” is what we want. Click OK. The dialog box asks for two variables (columns): in the big box select Percent by double-clicking on it in the left box, then click on the box under Categorical Variable and select Spam in the same way. The bar graph comes out with the spam categories in the order given (actually, I think, in alphabetical order). My bar graph is in Figure 1.

Figure 1: Default bar chart for types of spam

To get the bars sorted in order, re-do what you just did, up to the dialog box where you select Percent and Spam. Click on Bar Chart Options. Select Order Main X Groups Based on Decreasing Y. Click OK a couple of times. My graph is in Figure 2. This second bar chart makes it easier to see that the largest share of spam is trying to sell products, with the “adult” and “scam” categories very close together.

1.30 To use a pie chart, the data need to be parts of a whole. These figures are percentages, but of the “wrong thing” as far as a pie chart is concerned. (What adds up to 100% is percent of females and percent of males within each subject area, so you could do 9 different pie charts, one each for MBA, MEd, MA, MSc and so on.) If these figures were of all female graduates that graduated with degrees of these different types (in this case the percentages would add up to 100%), then a pie chart would work.

Drawing a bar chart is very like the previous exercise. Open Minitab, get hold of the data, and select Graph and Bar Chart. Make sure Minitab recognizes the percents as Values from a Table, and, in Bar Chart Options, Order Main X Groups Based on Decreasing (or Increasing, if you prefer) Y. My bar chart is shown in Figure 3. You can see easily from the bar chart that education degrees are most popular among women (compared to men) and theology the least popular.
You could see this from the original data too, but it is more work to do so.

Figure 2: Bar chart for types of spam with bars tallest to shortest

Figure 3: Bar chart of women’s degree types

1.36 The data are in co2. The values are measured in emissions per person because some of the countries have a lot more people than others; if you just gave total emissions, you wouldn’t know whether the figure for a country is large because a lot of fuel is burned per person in that country, or because that country has a lot of people. So the figures as given are large for countries that burn a lot of fuel, relative to how many people live in that country.

You can make either a histogram or a stemplot of the numbers. Let’s start with a histogram, which looks like Figure 4. Those three highest values might be outliers, but it’s hard to be sure: we don’t know how far above the others they are. A stemplot might shed some more light. See Figure 5. The stemplot could be done by hand, but has the disadvantage that it is not clear what you want to have as stems and what as leaves. Doing it with software has the advantage that you can let the software choose the stems, and then fix it up if you don’t like it.

Figure 4: Histogram of CO2 emissions per person

Figure 5: Stemplot of CO2 emissions per person

The stemplot shows that the three largest values are 16, 17 and 20, which are quite a bit bigger than 11 (the next largest value). The stemplot doesn’t show a big gap, though.

Nonetheless, the histogram and stemplot clearly show the right-skewed shape of this distribution: there are some countries that produce a lot of CO2. The centre is hard to judge because of the extreme skewness; as we’ll see in Section 1.2, measures of centre such as the mean and median will be very different for data like this. But it seems clear that the “top” three countries are outliers; looking back at the data (in the text), these are the United States, Canada and Australia (so no surprise there). If you ignore the outliers, the spread is from 0 to about 11.

1.38 It’s easiest to do this question backwards: first get the default histogram, then get a histogram with 14 intervals, then get the interval boundaries in the right place.

The default histogram, shown in Figure 6, has about 14 intervals (actually 13), but the intervals are centred on 4.2, 4.4 etc., which is not quite what we wanted. So, in Minitab: double-click on one of the bars of the histogram, and select Binning from the popup box. At the top, change Midpoint to Cutpoint (because you are going to specify the ends of the intervals). Select Midpoint/Cutpoint Positions, and then type in the values for the class boundaries (4.2, 4.4, up to 7.0). When you click OK, you’ll get a histogram with the ends of the intervals in the right place. (Minitab Version 12 instructions are: select Graph, Histogram, put C1 or pH under X, click Options. Under Type of Intervals, click Cutpoint (to ensure Minitab makes the interval boundaries, not the interval midpoints, at the values you’re going to give). Under Definition of Intervals, select Midpoint/Cutpoint Positions, then type into the box 4.2 4.4 4.6 and so on up to 7. (Enter the numbers with spaces between.) Click OK twice.) The result is shown in Figure 7.

Figure 6: Default histogram of pH

Figure 7: Histogram (a) of pH

For (b), repeat the above, but for the cutpoint positions, enter 4.14 4.34 and so on up to 6.94. The result is shown in Figure 8. (Other software is different, but if you don’t go straight to a default histogram, you’ll usually have the chance to set where the intervals (sometimes called “bins”) start from and how wide they are. Thus in (a) you start from 4.2 and use a width of 0.2; in (b) you start from 4.14 and again use a width of 0.2.)

Figure 8: Histogram (b) of pH

To get around to answering the question (finally): Figure 7 shows two modes (peaks), around 5.1 and 5.7. In Figure 8, the data set is much closer to having a single “flat-top” peak between about 5 and 6. The default histogram, Figure 6, has one peak around 5.6. So the way the histogram looks depends on apparently small choices about how it is drawn.

1.39 If the only possible values for a variable are 0 and 1, the histogram will have two bars with a gap between, like (b) and (c). There should be a similar number of males and females (with, these days, slightly more females), as in (c), while the right-handers will typically outnumber the left-handers (about 15% of the population as a whole is left-handed). This corresponds to (b). The other two variables can take many different values. Some people are of average height, but we’d expect similar numbers of people of above-average and below-average height, i.e. a symmetric shape like (d). Finally, a right-skewed shape like (a) would make sense for studying times, because a few people study a lot (and you can’t study less than 0 minutes!).

1.42 As in 1.39, you can use a histogram, as shown in Figure 9. You might also argue that a stemplot shows more accuracy in the data values. Either is good. There appears to be one outlier, the value around 4.9 (actually 4.88). (The two values around 5.1 might be outliers too.) Apart from that, the shape is more or less symmetric, with centre around 5.5 or 5.6 and a spread from 5.1 to 5.8. My guess at the density of the earth is the centre, 5.5 or 5.6 times the density of water.

Figure 9: Histogram of Cavendish’s measurements

1.43 There are really 4 variables here, since OBS just distinguishes the individuals (students). GPA, IQ and self-concept are quantitative (measured) variables, while Gender is categorical. When you read the data into Minitab from the disk, you’ll find that you have five columns of data (the four variables plus OBS). So make sure you make a stemplot of GPA (in column C2) and not something else! (Select gpa into the Variables box when you select Graph, Stem and Leaf.) See Figure 10. You can verify that Minitab did indeed round or trim the GPA values to the nearest tenth of a point. (If it does not, go back to the dialogue box and type 0.1 into the “increment” box.)

Stem-and-Leaf Display: gpa

Stem-and-leaf of gpa   N = 78
Leaf Unit = 0.10

    1    0  5
    2    1  7
    3    2  4
    7    3  4689
   11    4  0678
   15    5  0259
   22    6  0001249
  (22)   7  1122344555566668888999
   34    8  001111223378899
   19    9  011133445555679
    4   10  1577

Figure 10: Stemplot of GPA

This distribution is skewed left, with centre around 8. The spread is from around 1 to around 11. There aren’t really any values distinct enough from the others to call outliers.

In Minitab version 14 you can make an overlaid histogram, which allows you to compare boys and girls. This is not quite a back-to-back stemplot, but is as close as we will get. Select Graph, Histogram, and then With Outline and Groups. Select GPA as your graph variable and Gender as your “categorical variable for grouping”. When you click OK, you’ll find two overlaid histograms, with boys as red and girls as black. See Figure 11. The three lowest GPAs are all boys.

Figure 11: Overlaid histogram of GPAs for boys and girls

Your software might also produce a histogram like the one in Figure 12, with the girls in blue and the boys in red. Again, the boys dominate the lowest GPAs, though, since there are fewer girls (only 40% of the total), this might be the result of the sample sizes rather than gender differences. (With a larger sample, you have more chance to see extremely high or low values.)

Figure 12: Stacked histogram for GPA by gender

1.44 See Figure 13. A histogram would be equally good. The shape is slightly skewed left, though if you consider the four lowest values (below 80) as outliers, the shape is more or less symmetric. The centre is around 112, and the spread from 72 (or 86) to 136. The centre is clearly above 100; in fact, only 14 of the 78 students have IQs below 100.

Character Stem-and-Leaf Display

Stem-and-leaf of iq   N = 78
Leaf Unit = 1.0

    2    7  24
    4    7  79
    4    8
    6    8  69
   10    9  0133
   14    9  6778
   24   10  0022333344
   36   10  555666777789
  (19)  11  0000111122223334444
   23   11  55688999
   15   12  003344
    9   12  677888
    3   13  02
    1   13  6

Figure 13: Stemplot of IQ

1.45 Same deal again, as shown in Figure 14. This is skewed left, but not as much as GPA. The centre is about 60 (hard to judge, but less than 65). The spread is from 20 to 80; the value 80 is a little higher than the others, but, given the large number of values around 70, not really high enough to be an outlier.

Character Stem-and-Leaf Display

Stem-and-leaf of concept   N = 78
Leaf Unit = 1.0

    2    2  01
    3    2  8
    4    3  0
    8    3  5679
   13    4  02344
   17    4  6799
   30    5  1111223344444
   39    5  556668899
   39    6  00001233344444
   25    6  55666677777899
   11    7  0000111223
    1    7
    1    8  0

Figure 14: Stemplot of self-concept scores
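
If you would rather let software do the arithmetic in 1.4, the weighted total can be checked in a few lines of Python. This is only a sketch: the list pairs each mark with the points it is worth, in the order the terms appear in the sum in 1.4, and the names (components, weighted_total) are hypothetical. Only the B band of 800 to 900 is stated in the answer, so that is the only grade checked; whether the endpoints belong to the band is an assumption.

```python
# (points available, mark out of 100) for each term of the sum in 1.4,
# in the order the terms are written there.
components = [(200, 86), (200, 82), (300, 77), (200, 90), (100, 80)]


def weighted_total(parts):
    """Rescale each mark from 'out of 100' to its points value and add them up."""
    return sum(points * mark / 100 for points, mark in parts)


total = weighted_total(components)
max_points = sum(points for points, _ in components)

# Only the B band (800 to 900) is given in the answer. Treating 800 as
# included and 900 as excluded is an assumption, not something stated there.
is_b = 800 <= total < 900

print(total, max_points, is_b)  # prints: 827.0 1000 True
```

The marks add up to 827 out of a possible 1000, which falls in the B band.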
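
The point of 1.38, that the look of a histogram depends on where the intervals start, can also be tried outside Minitab. The sketch below is not the Minitab procedure: it just counts values in 14 intervals of width 0.2, once starting at 4.2 as in (a) and once starting at 4.14 as in (b). The pH values are made up for illustration (the real data set is not reproduced here), and the function name bin_counts is hypothetical.

```python
def bin_counts(values, start, width, n_bins):
    """Count values in n_bins intervals [start + k*width, start + (k+1)*width).

    Everything is converted to whole hundredths of a unit first, so that
    interval boundaries are exact integers and not rounded decimals.
    """
    lo = round(start * 100)
    step = round(width * 100)
    counts = [0] * n_bins
    for v in values:
        k = (round(v * 100) - lo) // step
        if 0 <= k < n_bins:  # values outside the intervals are ignored
            counts[k] += 1
    return counts


# Made-up pH readings, for illustration only (NOT the data from the exercise).
ph_values = [4.45, 4.95, 5.05, 5.15, 5.18, 5.25, 5.35, 5.45,
             5.55, 5.62, 5.65, 5.75, 5.85, 6.05, 6.55]

counts_a = bin_counts(ph_values, start=4.2, width=0.2, n_bins=14)   # cutpoints 4.2, 4.4, ..., 7.0
counts_b = bin_counts(ph_values, start=4.14, width=0.2, n_bins=14)  # cutpoints 4.14, 4.34, ..., 6.94

print("(a)", counts_a)
print("(b)", counts_b)
```

Both runs count the same readings, but shifting the starting point by 0.06 moves several readings into a neighbouring interval, so the bar heights (and hence the apparent peaks) change.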