CHAPTER 12 DATA ANALYSIS

SECTION 12.1 MEASURES OF CENTRAL TENDENCY
Sets are useful for grouping interesting and related numbers. One such set is the
heights of all of the people in your class. In order to use these sets, we need to
analyze the numbers, or data, in context. The first step in data analysis, the
process of making sense of a set, is collecting data. In data analysis, the idea of
a data set is slightly different from that of a set. Unlike regular sets, data sets can
have repetition of elements, and the order or arrangement matters.
EXPLORATION 1
Measure the height of each person in your class and record their name, age in months,
and height in inches in a table like the one below. The numbers below were taken
from another class, so your own class will have different results. Try to find ways
to summarize the information in the table so that you can share your results with
a friend without showing her the whole table. Would your strategy still work if
there were 100 people in the survey? 1000 people?
Name       Height (in)   Age (months)
Sophia     52            113
Rhonda     51            112
Edna       57            112
Danette    61            115
Hesam      55            117
Eloi       62            110
Vanessa    58            113
Michelle   60            108
Mari       58            125
Calvin     56            129
Moises     57            124
Amanda     57            120
Hannah     55            131
Tricia     55            129
Kristen    57            130
Max        52            135
Jim        50            142
Karen      57            136
Diane      49            138
Tiankai    58            138
Oscar      51            137
Jenny      60            138
Bence      59            142
Pat        53            134
Teri       59            135
Sally      57            139
Will       57            140
The entire collection of numbers is called the data, and each individual piece of
information is called a data point. Data is the plural of datum.
A major goal of data analysis is to find a simple measure of the data, called a
measure of central tendency, that summarizes or represents, in a general way,
the majority of the data. There are three common measures of central tendency:
the mean, median, and mode. The mean, median and mode are different ways
to identify the center of the data. We are also interested in how spread out our
data is. The range, the difference between the largest and smallest values of the
data, provides a simple measure of how much the data varies.
The mean, also called the arithmetic mean or average, is the sum of all the
data values divided by the number of data points. For a visual example, suppose
we have five containers holding 7, 3, 5, 7, and 3 blocks.
These data can be grouped into a data set: {7, 3, 5, 7, 3}. There are 25 blocks total.
The mean number of blocks in a container is the number of blocks each container
has if these 25 blocks are distributed evenly among the 5 containers: $\frac{25}{5} = 5$.
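The even-sharing idea can be checked with a few lines of Python. This is only a minimal sketch for illustration; the variable names are chosen here and are not part of the lesson.

```python
# Number of blocks in each of the five containers from the example.
blocks = [7, 3, 5, 7, 3]

total = sum(blocks)    # all of the blocks together
count = len(blocks)    # number of containers (data points)
mean = total / count   # blocks per container after sharing evenly

print(total, count, mean)  # 25 5 5.0
```

Dividing the total by the number of data points is the whole calculation, so the same three steps work for any list of numbers.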
The median is the value of the middle data point when the values are arranged in
increasing order. If the data set has an even number of data points, the median is
the average of the two middle values. To find the median value for the container
example, order the data, with the smallest number of blocks first and the largest
number last: {3, 3, 5, 7, 7}.
The median is the number of blocks in the middle, or third, container of the
sorted ordering, which is 5. The median is a helpful measure of central tendency
because at least half of the values are less than or equal to the median and at
least half of the values are greater than or equal to it.
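The two cases in the definition of the median (odd and even numbers of data points) can be sketched in Python as follows. The function name median and the four-value data set are made up for this illustration; only the container data come from the text.

```python
def median(values):
    """Return the middle value of the sorted data.

    When the number of data points is even, return the average
    of the two middle values instead.
    """
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: one value sits exactly in the middle.
        return ordered[middle]
    # Even count: average the two values on either side of the middle.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([7, 3, 5, 7, 3]))  # 5, the container example
print(median([7, 3, 5, 7]))     # 6.0, a made-up set with an even count
```

Sorting first is essential: the middle position of the unsorted list {7, 3, 5, 7, 3} happens to hold 5 as well, but that is a coincidence.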
Frequency is the number of times a data point appears in a data set. For example,
if there are 4 people in the class who are 56 inches tall, then the frequency of the
height 56 inches in the class is 4. The mode is the value or element that occurs
most often, that is, with the highest frequency in the data set. In Exploration 1,
the mode of the height data is 57 inches because that height appears 7 times, more
often than any other. What is the mode in your class's data? A set of data can have
more than one mode. For the containers of blocks example, the modes are 3 and 7
because both appear twice.
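Counting frequencies by hand gets tedious for larger data sets. The following is a minimal Python sketch that uses the standard library's collections.Counter to tally frequencies; the helper name modes is made up for this illustration.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    frequency = Counter(values)        # maps each value to its count
    highest = max(frequency.values())  # the largest count in the data
    return sorted(v for v, f in frequency.items() if f == highest)

blocks = [7, 3, 5, 7, 3]
heights = [52, 51, 57, 61, 55, 62, 58, 60, 58, 56, 57, 57, 55, 55,
           57, 52, 50, 57, 49, 58, 51, 60, 59, 53, 59, 57, 57]

print(modes(blocks))         # [3, 7]
print(modes(heights))        # [57]
print(Counter(heights)[57])  # 7
```

Returning a list rather than a single number reflects the fact that a data set can have more than one mode.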
One way to display data is to use a table as in Exploration 1. Another way is to
use a stem-and-leaf plot. Let us use the data of heights in Exploration 1 to
illustrate how stem-and-leaf plots are constructed. The data in Exploration 1 are:
{52, 51, 57, 61, 55, 62, 58, 60, 58, 56, 57, 57, 55, 55, 57, 52, 50, 57, 49, 58, 51,
60, 59, 53, 59, 57, 57}. As the name suggests, each data value is split into a stem
and a leaf. Because the data are all two-digit numbers, we use the tens digit as the
stem and the ones digit as the leaf. For example, the number 49 has stem 4 and
leaf 9.
The stem and leaf designation will vary with the data you are working with. To
begin, you may wish to rewrite the 27 data points in increasing order as follows:
{49, 50, 51, 51, 52, 52, 53, 55, 55, 55, 56, 57, 57, 57, 57, 57, 57, 57, 58, 58, 58,
59, 59, 60, 60, 61, 62}.
The stem-and-leaf plot for the heights will look like the following:
Stem | Leaf
   4 | 9
   5 | 0 1 1 2 2 3 5 5 5 6 7 7 7 7 7 7 7 8 8 8 9 9
   6 | 0 0 1 2
A key should be included for the reader. “5|3 is read 53” is one way of designating the key.
The stems are listed in a column in numerical order, and each leaf is written in
the row of its stem. Leaves are usually single digits, while stems may have more
than one digit. Within a row, the leaves are listed in increasing order, and a leaf
is repeated as many times as its value repeats in the data set.
Notice that the frequency of each value is easy to see in the stem-and-leaf plot,
as are the largest and smallest data values used to determine the range. The median
can also be found using the stem-and-leaf plot with a little additional work.
Determine the range and mode from the stem-and-leaf plot. Use the stem-and-leaf
plot to also determine the median.
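The construction of a stem-and-leaf plot can also be sketched in Python. This is a hedged illustration, not a required method: the function name stem_and_leaf is made up here, and the sketch assumes the data are non-negative whole numbers whose last digit serves as the leaf.

```python
from collections import defaultdict

heights = [52, 51, 57, 61, 55, 62, 58, 60, 58, 56, 57, 57, 55, 55,
           57, 52, 50, 57, 49, 58, 51, 60, 59, 53, 59, 57, 57]

def stem_and_leaf(values):
    """Group values into rows: stem = all digits but the last, leaf = last digit."""
    rows = defaultdict(list)
    for value in sorted(values):        # sorting keeps leaves in increasing order
        stem, leaf = divmod(value, 10)  # e.g. 49 -> stem 4, leaf 9
        rows[stem].append(leaf)
    return dict(rows)

plot = stem_and_leaf(heights)
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem:>2} | {leaves}")
```

Because the stem is everything except the last digit, the same function splits a three-digit value such as 113 into stem 11 and leaf 3.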
PROBLEM 1
Use the age data in Exploration 1 to:
a. Construct a stem-and-leaf plot. You may wish to use the hundreds and
tens digits combined as the stem. For example, for 113, use 11 for the stem
and 3 for the leaf. For 120, use 12 for the stem and 0 for the leaf.
b. Determine the mode, range, and median for the data set.
EXAMPLE 1
Find the mean, median, mode, and range of the following data set.
{2, 8, 4, 8, 8, 6}
SOLUTION
The mean is found by adding the values together and dividing by the number of
values. The sum of the values is 2 + 8 + 4 + 8 + 8 + 6 = 36. The number of values
in the set is 6. The mean is $\frac{36}{6} = 6$.
Putting the data set into order from smallest to greatest value results in
{2, 4, 6, 8, 8, 8}. Because there are an even number of values in the set, the
median is the average of the two middle values. The median is the average of 6
and 8: $\frac{6 + 8}{2} = \frac{14}{2} = 7$.
The most commonly occurring value in the data set is 8, so 8 is the mode.
The range is the difference between the highest and lowest value. The range is
8 – 2 = 6.
PROBLEM 2
In the following data set, what is the mean? the median? the mode? the range?
{4, 9, 12, 5, 9, 14, 11, 15, 5, 6, 7, 5}
The mean depends on all the numbers in the data, but the median only depends
on the value of the data point in the middle position. That does not, however,
suggest that the mean is a better measure of central tendency than the median.
PROBLEM 3
Find the mean and median of the following six weeks test grades:
{95, 30, 98, 93, 100}.
Compare the value of each as a measure of the data.
EXPLORATION 2
Using the data from Exploration 1, compute the mean and the median of the
heights of the class. Then, imagine that a giant who is 400 inches tall joins the class.
Compute the new mean and find the new median. How has each changed?
If the data are skewed, or uneven, the median usually gives a more accurate picture
of a typical value than the mean does. Exploration 2 had a very tall giant join the
class. The mean was strongly affected by this outlier, a term used to refer to a
value that is drastically different from most of the data values. The median, however,
was not affected. The mean is usually more influenced by extreme values than the
median is.
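The effect of the giant on the example class from Exploration 1 can be checked with Python's standard statistics module. This is a minimal sketch using the example heights; your own class data will give different numbers.

```python
from statistics import mean, median

# Heights (in inches) of the 27 students in the example class.
heights = [52, 51, 57, 61, 55, 62, 58, 60, 58, 56, 57, 57, 55, 55,
           57, 52, 50, 57, 49, 58, 51, 60, 59, 53, 59, 57, 57]

# The same class after a 400-inch giant joins.
with_giant = heights + [400]

print(round(mean(heights), 2), median(heights))        # 56.04 57
print(round(mean(with_giant), 2), median(with_giant))  # 68.32 57.0
```

One outlier raises the mean by more than 12 inches, while the median stays at 57 inches.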
Let us review the ways in which we summarized data in this section.
If we have a set of n values, then we can find the following measures:
• Find the mean by adding the values and dividing by n.
• Find the median by ordering the values and finding the value that is in
the middle, if n is odd, or taking the average of the middle values, if n
is even.
• The mode is the most frequent value that occurs. There could be two or
more such values.
• The range is the difference between the largest and the smallest values
in the set.
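The four measures in this summary can be gathered into one short Python sketch. It relies on mean, median, and multimode from the standard statistics module (multimode requires Python 3.8 or later); the function name summarize is made up for this illustration.

```python
from statistics import mean, median, multimode

def summarize(values):
    """Return the mean, median, mode(s), and range of a data set."""
    return {
        "mean": mean(values),                # sum of the values divided by n
        "median": median(values),            # middle value, or average of the two middle values
        "modes": multimode(values),          # every value with the highest frequency
        "range": max(values) - min(values),  # largest value minus smallest value
    }

# The data set from Example 1.
summary = summarize([2, 8, 4, 8, 8, 6])
print(summary)
```

For the Example 1 data this gives a mean of 6, a median of 7, a single mode of 8, and a range of 6, which agrees with the solution worked by hand.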
EXERCISES
1. Using the example class from Exploration 1,
a. Find the mean age, in years, to the nearest tenth.
b. Now look at the ages of students in your own class. Find the mean,
median and mode of the ages, in years, to the nearest tenth.
c. How are the two means different? What are the factors that cause the
difference?
d. To the nearest inch, is the median height of students in your class different from the mean height? If so, why do you think they are different?
2. Separate the data from your class into categories by age and find the mean
height for each age. Does the mean height increase with age? Explain the
results of your analysis.
3. Find the mean, median, mode and range of the following data sets:
a. {8, 6, 10, 14, 10, 9, 8, 3, 8, 16, 0, 8, 6, 2, 15, 2, 12, 8, 16, 5}
b. {74, 66, 66, 66, 64, 66, 71, 66, 71, 66, 74, 64, 66, 73, 71, 60, 71, 65, 63, 74}
c. Construct a stem-and-leaf plot for the data in part b.
4. Use the following results of a math test as data to create a stem-and-leaf
plot. Determine the range, mode, and median for the math test using the
stem-and-leaf plot. Also, determine the mean.
{100, 92, 79, 65, 86, 80, 78, 63, 91, 83, 91, 87, 79, 86, 85, 92, 75, 76, 95,
78, 68, 67, 73, 76, 71, 86, 89, 85, 91, 96, 83, 77, 93}
5. A class has 17 students and the total height of all the students is 935 inches.
What is the mean height of the class? What is the median height?
6. Rhonda joins a class that has 17 students. The class mean height was 58
inches. Rhonda is 65 inches tall. What is the new mean height for the class
with Rhonda as an additional student? Give your answer to the nearest
hundredth of an inch.
7. A class has six students with a mean height of 55 inches. The class mean
height changes to 56 inches after Hannah, a new student, joins the class.
How tall is Hannah?
8. A 14-person class with an average height of 54 inches merges with a
12-person class with an average height of 50 inches. What is the average
height of the combined class to the nearest hundredth of an inch?
9. Which measure of central tendency is most helpful in representing the following?
a. Your grade in math.
b. The winner of the race for mayor.
10. Choose the appropriate measure of central tendency or range to describe
the data in the table. Justify your reasoning.
School                   Number of Teachers
Jones Middle School      32
Lampasas Middle School   36
Falls Middle School      28
Miller Middle School     37
Fossum Middle School     51
11. The heights of various buildings in the city are listed. Which measure of central
tendency or range would make the heights appear tallest? 168 ft., 186 ft.,
221 ft., 73 ft., 152 ft., 186 ft., 199 ft.
12. The January mean daily temperatures for Castolon, TX and Galveston, TX
are approximately the same. However, their ranges are quite different. The
temperature data, in °F, from NOAA are:
City        Maximum   Minimum   Mean   Range
Galveston   61.9      49.7      55.8   12.2
Castolon    67.7      33.6      50.7   34.1
Even though Galveston and Castolon have about the same daily mean temperature for January, would you consider packing different clothes for the two
places? Which measure of central tendency influenced your decision? Why?
13. Below are estimated national median heights in inches for 9- through
14-year-olds in 2000, according to the National Center for Health Statistics (NCHS).
Age group      Height (in)
9-year-olds    52.5
10-year-olds   54.5
11-year-olds   56.5
12-year-olds   59.0
13-year-olds   62.0
14-year-olds   65.0
Based on this data, what is your estimate for the median height for 15-year-olds?
Do you think the median heights for 24-year-olds and 25-year-olds are that much
different? Explain.
14. Ingenuity:
It takes 1263 digits to number all the pages of
a book. How many pages are there in the book?
15. Investigation:
Supply an example that applies to your hometown where the median is a
more appropriate measure than the mean.