is of course 7. If the middle two countries had 7 and 9 gold medals, then we would have selected 8 as the median, according to the convention.

SECTION 1.3 EXERCISES

1. In a national survey, men were asked why they exercise. Here are the reasons they gave:

   Reason           Percentage responding
   Health           51
   Stress relief    25
   Weight loss      20
   Other             4

   Draw a circle graph of these data.

2. Toss a coin 30 times and record the numbers of heads and tails. Display your result as a circle graph. (Hint: To draw a circle graph, you will need to change your results into percentages.)

3. Draw a line graph for the high temperatures in Exercise 6 of Section 1.2. Use the graph to answer the following questions:
   a. What are the highest and lowest temperatures?
   b. What is the range of the data?
   c. What do the center and spread of the data tell you about the weather on this day?

4. Draw a line graph for the weights of rats used in agricultural research in Exercise 2 of Section 1.2. Use the graph to answer the following questions:
   a. What are the highest and lowest weights?
   b. What is the range of the data?
   c. Find the median, or middle number, of the data.

For additional exercises, see page 714.

1.4 FREQUENCY HISTOGRAMS

In statistics, a bar graph is called a frequency histogram, or just histogram. The horizontal axis of such a graph tells the scale of the observations involved, and the vertical axis shows how many observations fall in various intervals. That is, it gives the numbers of data points, or frequencies, of the observations in various intervals. Thus, when statisticians say “frequency,” they simply mean the number of data points. Since it is the frequency of the data occurring in various locations over the range of the data that determines the overall shape of a data set, the concept of a frequency histogram is very useful and important. The histogram is the most common way to graphically display data.
It loses some information (the exact value of each data point, that is, the leaf), but it is extremely informative about centering, spread, and especially shape. These are the three aspects of data we usually care most about.

Our first example of a histogram is the one presented in Figure 1.3. The tar content of a cigarette, in milligrams, was measured for 24 brands of cigarettes. The graph shows the number (frequency) of those 24 cigarette brands that contain given amounts of tar. For example, the leftmost bar represents the number of cigarette brands that contain from 0 to less than 2.5 milligrams of tar per cigarette. (This range can be expressed as 0 to 2.5⁻, the minus sign indicating that the value 2.5 is not included.) That bar reaches to a height of 1, meaning that one brand has a tar content in that range. We read the other bars, corresponding to tar ranges 2.5 to 5⁻, 5 to 7.5⁻, and so on, in a similar manner. This convention is needed so that we know which interval to put 5.0 into, for example: it goes into 5 to 7.5⁻, not 2.5 to 5⁻.

[Figure 1.3 Histogram of tar content of selected brands of cigarettes. Horizontal axis: amount of tar (mg), 0 to 17.5 in steps of 2.5. Vertical axis: number of cigarette brands, 0 to 10.]

How did we construct this histogram? The first step was to take the “raw” data, or original 24 pieces of data, and construct a frequency table. Here are the raw data:

    1.1    8.9   14.7
    4.0    9.0   15.0
    4.5   11.6   15.0
    4.9   12.1   15.3
    7.2   12.5   15.8
    7.5   12.9   16.2
    7.9   13.3   16.8
    8.3   14.1   17.3

From these data and using intervals of width 2.5, we obtain the needed frequency table, shown in Table 1.11. Then we simply convert this frequency table into Figure 1.3 by drawing, over each interval, a bar whose height is that interval's frequency.

What does the histogram of Figure 1.3 tell us about the data, that is, about tar content? We see that although some of the 24 brands have relatively low amounts of tar, more of the brands are near the high end of the range of tar contents measured. Further, we see that none of the brands has a tar content greater than 17.5 mg.
Thus the graph shows the frequencies of the data falling in various intervals, and this shows us the overall shape of the data. We might ask why there is no value exceeding 17.5 mg. Does that value represent the natural tar content of tobacco, and does the shape of the graph (high to the right, low to the left) arise from the fact that some manufacturers remove tar from their tobacco? To answer such questions, we must leave statistics; we might ask a chemist, for example.

Table 1.11 Frequency Table of Tar (mg) per Cigarette

    Interval       Frequency (f)
    0–2.5⁻          1
    2.5–5.0⁻        3
    5.0–7.5⁻        1
    7.5–10.0⁻       5
    10.0–12.5⁻      2
    12.5–15.0⁻      5
    15.0–17.5⁻      7
    Total          24

What if, instead of using the ranges (or intervals) 0 to 2.5⁻, 2.5 to 5⁻, and so on, we had used the broader ranges 0 to 5⁻, 5 to 10⁻, and so on? Of course we could construct a new frequency table, starting from the original data, as we did above. But we can now more easily work directly from Figure 1.3. Looking at our first group, we add the heights of the first two bars: the number of cigarette brands whose cigarettes contain from 0 to less than 5 mg of tar is 1 + 3 = 4. Similarly, the number of brands having a tar content from 5 to less than 10 mg is 1 + 5 = 6. We continue in this manner and create the histogram in Figure 1.4.

Have we lost or gained anything by graphing the data this way? We see that the rectangles still increase in height as we move from low tar content to high tar content, so our new graph still communicates the fact that more of the 24 brands have higher tar content than lower. But the heights increase more gradually than in Figure 1.3. And the two rightmost rectangles in Figure 1.4 are of the same height, meaning that just as many of the brands have a tar content between 10 and 15 mg as between 15 mg and 20 mg. Judging from Figure 1.4 alone, we would not be able to say whether, within the range from 10 mg to 20 mg, more of the cigarette brands had tar content closer to 20 than to 10.
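The two groupings just discussed can be checked with a short Python sketch. It is only an illustration: the function name frequency_table and its parameters are our own choices, not part of the text, and it assumes every interval includes its left end and excludes its right end, as in the 2.5⁻ convention.

```python
# Tar content (mg) of the 24 brands, copied from the raw data in the text.
tar = [1.1, 4.0, 4.5, 4.9, 7.2, 7.5, 7.9, 8.3,
       8.9, 9.0, 11.6, 12.1, 12.5, 12.9, 13.3, 14.1,
       14.7, 15.0, 15.0, 15.3, 15.8, 16.2, 16.8, 17.3]


def frequency_table(data, width, n_bins):
    """Count the data points in [0, width), [width, 2*width), and so on.

    Illustrative helper (not from the text). Each interval includes its
    left end and excludes its right end, so 5.0 lands in 5 to 7.5-.
    """
    counts = [0] * n_bins
    for x in data:
        counts[int(x // width)] += 1
    return counts


narrow = frequency_table(tar, 2.5, 7)   # intervals of width 2.5
wide = frequency_table(tar, 5.0, 4)     # intervals of width 5

# Combining adjacent bars of the narrow histogram gives the wide one.
# A zero is appended to stand for the empty interval 17.5 to 20-.
padded = narrow + [0]
combined = [padded[i] + padded[i + 1] for i in range(0, len(padded), 2)]

print(narrow)    # [1, 3, 1, 5, 2, 5, 7]
print(wide)      # [4, 6, 7, 7]
print(combined)  # [4, 6, 7, 7]
```

The first list agrees with the frequencies in Table 1.11, and the last two lists agree with each other, which is why the wider histogram can be built from the bars of Figure 1.3 without returning to the raw data.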
All this suggests that by consolidating the bars of Figure 1.3 (that is, by combining the intervals) we have lost information that we would prefer to keep. For these data we would do better to leave the graph as it is in Figure 1.3.

[Figure 1.4 Histogram derived from that of Figure 1.3 by combining bars. Horizontal axis: amount of tar (mg), 0 to 20 in steps of 5. Vertical axis: number of brands, 0 to 10.]

It is also possible to graph a greater number of bars than is useful. For example, is the histogram of Figure 1.5 very useful in displaying the general shape of the data set it was constructed from? If we choose wider bars, we obtain the histogram of Figure 1.6. Thus we discover that the shape of the histogram is rather flat over the range of the data, a very useful piece of information. Clearly, Figure 1.5 hid from us this important fact about the approximate flatness in the shape of the data.

[Figure 1.5 A histogram with too many bars.]

[Figure 1.6 Histogram derived from that of Figure 1.5 by combining bars.]

As a general rule of thumb, 5 to 15 bars are usually appropriate. Too few bars flatten out important local aspects of the general shape of the data, as we saw in Figure 1.4. Too many bars lead to a wide and accidental variation of bar heights that hides the general shape of the data, as we saw in Figure 1.5. Clearly, the more data points there are, the more bars are appropriate. For example, 200 data points could allow 15 bars, whereas 30 data points might be best displayed with 5 to 6 bars. This decision is a judgment call on the part of the user; rigid rules about this are not appropriate.

Histograms can be readily made from stem-and-leaf tables, which are basically just a special kind of histogram. Let’s look at the stem-and-leaf table of annual earnings to the nearest $1000 in Table 1.12. By simply enclosing each row (set of leaves) in a bar, we have a respectable histogram. See Figure 1.7a.
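As a rough illustration of how a stem-and-leaf table doubles as a histogram, the following Python sketch stores the leaves of Table 1.12 by stem and prints one bar per stem whose length is the number of leaves. The variable names and the use of "#" for the bars are our own assumptions, not the text's.

```python
# Leaves of Table 1.12, keyed by stem (earnings in thousands of dollars).
stem_leaf = {
    1: [6, 7],
    2: [9, 9, 9, 2, 7, 7, 8, 3, 5],
    3: [4, 5, 1, 2, 6, 6, 3, 9],
    4: [6, 9, 7, 7, 7, 5, 9, 8, 6, 3, 0, 3, 4, 8, 3],
    5: [3, 5, 3, 7, 7, 9],
    6: [3, 2, 2, 4, 4, 0, 0],
    7: [7, 4],
}

# The frequency for each stem is simply its number of leaves.
counts = {stem: len(leaves) for stem, leaves in stem_leaf.items()}

# Print a horizontal text histogram: one "#" per leaf.
for stem, f in counts.items():
    print(f"{stem}0 to {stem + 1}0-  {'#' * f}  ({f})")
```

Each printed row corresponds to one bar, with the stem 1 row covering earnings from 10 to 20⁻ thousand dollars, and so on.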
We can also count how many leaves there are for each stem and redraw our histogram showing the frequencies (how many leaves there are) along one axis of the graph. See Figure 1.7b.

It is more common to use vertical bars instead of horizontal bars in bar graphs. In that case, our histogram of the annual earnings looks like the one in Figure 1.8. Note that the horizontal axis scale (salaries in $1000s) has been included, too; we put in the scale of the horizontal axis so that it is clear which income each bar represents, with the first interval from 10 to 20⁻, and so on. Thus, as stated above, a stem-and-leaf plot is really a special case of a histogram.

Using bars of unequal widths is occasionally useful. For example, we may use wider bars in regions where the data are more sparse. Exercise 7(b) and (c) provides such an example.

Table 1.12 Stem-and-Leaf Plot of Annual Earnings (Thousands of Dollars)

    Stem | Leaf
    1    | 6,7
    2    | 9,9,9,2,7,7,8,3,5
    3    | 4,5,1,2,6,6,3,9
    4    | 6,9,7,7,7,5,9,8,6,3,0,3,4,8,3
    5    | 3,5,3,7,7,9
    6    | 3,2,2,4,4,0,0
    7    | 7,4

    Key: “1 | 6” stands for $16,000.

[Figure 1.7 Histogram derived from stem-and-leaf plot of Table 1.12. (a) Each row of leaves enclosed in a horizontal bar. (b) The same bars drawn against a frequency scale from 0 to 15.]

[Figure 1.8 Histogram of Figure 1.7 put in usual position. Horizontal axis: annual salary ($1000s), 0 to 80 in steps of 10. Vertical axis: frequency, 0 to 16.]

SECTION 1.4 EXERCISES

1. Draw a histogram of the data given in the stem-and-leaf plot below. The data (from Exercise 4 of Section 1.2) are the scores of 20 students on a statistics test.

    Stem | Leaf
    7    | 2,3,4,5,9
    8    | 0,2,3,4,4,5,7,8
    9    | 0,1,2,3,5,6,6

    Key: “8 | 2” stands for 82.

2. Draw the stem-and-leaf plot in Exercise 1 with six stems instead of three. Draw a histogram of the data. Does the addition of more stems give a clearer picture of the data?

3.
Draw a histogram of the data in Exercise 2 of Section 1.2 using five bars: 180–190⁻, 190–200⁻, and so on. Now draw a separate histogram with each bar split in half: 180–185⁻, 185–190⁻, and so on. Which histogram gives you a clearer picture of the data? Note: 190⁻ means that the bar excludes 190, and so on.

4. Toss a coin 30 times and keep track of the numbers of heads and tails. Construct a histogram to show your results. (Hint: Your histogram will have only two bars.) If you perform the experiment again, would you expect to get the same results? Why or why not?

5. Fingerprints can be classified according to the number of ridges between “loops” in the patterns. The number of ridges is called the ridge count for a particular person. Suppose we have the following ridge counts for the fingerprints of 20 people:

    189   207   186   205
    181   201   189   213
    205   185   192   207
    210   188   194   213
    198   192   215   220

   Draw a stem-and-leaf plot of these data. Then draw a histogram.

6. Draw a histogram of the high temperatures from Exercise 6 of Section 1.2, choosing the bars so that you get the “best” picture of the data. Explain your choice of bars.

For additional exercises, see page 715.

1.5 RELATIVE FREQUENCY HISTOGRAMS AND POLYGONS

As we have seen, the histogram is a powerful graphical tool for showing the general shape, or distribution, of a data set. Consider the frequency table of 30 exam scores given by Table 1.13. We first note that we can rescale these frequencies relative to the total number of data points, 30, by dividing each frequency by 30. Doing so converts them to proportions, also called relative frequencies. In this textbook such proportions are thought of as probabilities, and in fact they are often called experimental probabilities and are denoted by P. We will study them in detail in Chapter 5.
Table 1.13 Frequency Table of Exam Scores

    Score    Frequency (f)
    75        1
    76        0
    77        2
    78        3
    79        4
    80       11
    81        5
    82        3
    83        1
    Total    30

Table 1.14 Proportions of Persons Obtaining Various Exam Scores, Used for Figure 1.9

    Score     f      P
    75        1     0.03
    76        0     0
    77        2     0.07
    78        3     0.10
    79        4     0.13
    80       11     0.37
    81        5     0.17
    82        3     0.10
    83        1     0.03
    Total    30     1.00
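The step from Table 1.13 to Table 1.14 is a single division per row. The Python sketch below is meant only as an illustration (the variable names are our own): it divides each frequency by the total, 30, and rounds to two decimal places as in the P column.

```python
# Scores and frequencies copied from Table 1.13.
scores = [75, 76, 77, 78, 79, 80, 81, 82, 83]
freqs = [1, 0, 2, 3, 4, 11, 5, 3, 1]

total = sum(freqs)  # number of data points, 30

# Relative frequency (proportion) for each score, rounded to two places.
props = [round(f / total, 2) for f in freqs]

for score, f, p in zip(scores, freqs, props):
    print(f"{score}  {f:2d}  {p:.2f}")
print(f"Total  {total}  {sum(props):.2f}")
```

For example, 11 of the 30 students scored 80, so the proportion is 11/30, or about 0.37. Because each proportion is rounded separately, such a column can in general add up to slightly more or less than 1; for these data the rounded values happen to sum to exactly 1.00.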