Summarizing Data - Signal Hill #181

Summarizing Data

Common Core Standard: Recognize that a measure of center for a numerical data set summarizes all of its values
with a single number, while a measure of variation describes how its values vary with a single number.

Summarize numerical data sets in relation to their context, such as by:
o Reporting the number of observations.
o Describing the nature of the attribute under investigation, including how it was measured and its units of
  measurement.
o Giving quantitative measures of center (median and/or mean) and variability (interquartile range and/or
  mean absolute deviation), as well as describing any overall pattern and any striking deviations from the
  overall pattern with reference to the context in which the data were gathered.
o Relating the choice of measures of center and variability to the shape of the data distribution and the
  context in which the data were gathered.

Example: Determine how many observations were recorded to make the following dot plot.
o Each dot on the dot plot represents one observation (one recorded count)
o In this case, each dot represents one observed snake, placed above its type
  Note: A type of snake can have more than one dot, because each sighting of that type adds another dot
o To determine the total number of observations, count how many dots there are
o There are 29 dots on this dot plot
o This means that there are 29 observations, or 29 snakes
Which measure of center is better?
o Mean – best to use when there are no outliers present in the data
o Median – best to use when there are outliers
  Outlier – a data point that is much larger or much smaller than the rest of the data and that does not
  fit in with the majority of the data points
  Example: In {20, 21, 22, 20, 22, 100, 21}, the outlier is 100, because it is much larger than the rest of
  the data points
o Mode – best to use when the data is made up of categories instead of numbers (e.g., favorite colors)
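
To see why an outlier changes which measure of center is better, here is a minimal Python sketch using the data set from the outlier example. It assumes Python's standard statistics module; the variable names are only for illustration.

```python
from statistics import mean, median

data = [20, 21, 22, 20, 22, 100, 21]              # data set from the outlier example
without_outlier = [x for x in data if x != 100]   # same data with the outlier left out

# The single outlier (100) pulls the mean far above the typical values,
# while the median stays in the middle of the cluster.
print(round(mean(data), 1))      # mean with the outlier, about 32.3
print(median(data))              # median with the outlier, 21
print(mean(without_outlier))     # mean without the outlier, 21
```

With the outlier included, the mean (about 32.3) is larger than every value except 100, so it no longer describes a typical data point. The median (21) does.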
Example: Determine which measure of center best describes the data shown below:
o {14, 16, 15, 15, 19, 12, 13, 18, 17, 12} – this would be the mean, because there is no outlier present
o {9, 11, 0, 12, 11, 9} – this would be the median, because there is an outlier, 0
o Determine which type of snake is most commonly found in the zoo – this would be the mode, because the
  data are different types of snakes, not numbers
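
The three choices above can be checked with a short Python sketch. The two number sets come from the example; the snake list is hypothetical (made up only to show how the mode works), since the zoo data is not listed here.

```python
from statistics import mean, median, mode

no_outlier = [14, 16, 15, 15, 19, 12, 13, 18, 17, 12]   # from the example
with_outlier = [9, 11, 0, 12, 11, 9]                     # from the example (0 is the outlier)

# Hypothetical snake sightings, invented for illustration only
snakes = ["python", "cobra", "python", "boa", "python", "cobra"]

print(mean(no_outlier))       # 15.1
print(median(with_outlier))   # 10.0
print(mode(snakes))           # python
```

The mode works on words as well as numbers, which is why it fits data such as types of snakes.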

Measures of Variability:
o A measure of variability is a single number that describes the spread of the data (how much the values
  differ from one another)
Mean Absolute Deviation – a value that shows how far, on average, the data points are from the mean
o To determine mean absolute deviation:
  1st: Determine the mean
  2nd: Subtract each data point from the mean
  3rd: Determine the absolute value of each difference (its distance away from zero)
  4th: Determine the mean of those absolute values

Example: What is the MAD (Mean Absolute Deviation) for {3, 4, 1, 2, 1, 1}?
Mean: 3 + 4 + 1 + 2 + 1 + 1 = 12; 12 ÷ 6 = 2
Subtract the values:
o 2 – 3 = -1; 2 – 4 = -2; 2 – 1 = 1; 2 – 2 = 0; 2 – 1 = 1; 2 – 1 = 1
Absolute value:
o of -1 is 1; of -2 is 2; of 1 is 1; of 0 is 0
Mean of the absolute values:
o 1 + 2 + 1 + 0 + 1 + 1 = 6; 6 ÷ 6 = 1
MAD is 1, so on average the data points are 1 away from the mean, which tells us that
the data is pretty close together!
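
The four steps can also be written as a short Python function. This is only a sketch; the function name mean_absolute_deviation is chosen here for illustration and is not a built-in.

```python
def mean_absolute_deviation(values):
    center = sum(values) / len(values)            # 1st: determine the mean
    differences = [center - v for v in values]    # 2nd: subtract each data point from the mean
    distances = [abs(d) for d in differences]     # 3rd: take the absolute value of each difference
    return sum(distances) / len(distances)        # 4th: take the mean of the absolute values

print(mean_absolute_deviation([3, 4, 1, 2, 1, 1]))   # 1.0
```

The result matches the MAD of 1 found by hand for {3, 4, 1, 2, 1, 1}.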

Interquartile Range – the difference between the third (upper) quartile and the first (lower) quartile
o This tells us how spread out the middle half of the data is, which is the width of the box in the Box
  and Whisker Plot
o Example: The Box and Whisker Plot below shows data describing the number of books teachers read during
  the summer. Determine the interquartile range.
  The first quartile (lower quartile) is 16
  The third quartile (upper quartile) is 18
  The difference is 18 – 16 = 2
  The interquartile range is 2.
  This tells us that the middle half of the data spans only 2 books, so the data within the interquartile
  range is close together
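
When the data is given as a list instead of a plot, the interquartile range can be found with a sketch like the one below. It assumes the "median of each half" way of finding quartiles. Other quartile conventions exist and can give slightly different answers. The books list is hypothetical: the values behind the box plot are not given, so these were chosen only so that the quartiles come out to 16 and 18.

```python
from statistics import median

def interquartile_range(values):
    """IQR using the median of the lower half and the median of the upper half.

    Needs at least 2 values. For an odd count, the middle value is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]      # values below the median position
    upper_half = ordered[-half:]     # values above the median position
    q1 = median(lower_half)          # first (lower) quartile
    q3 = median(upper_half)          # third (upper) quartile
    return q3 - q1

books = [14, 16, 16, 17, 18, 18, 20]   # hypothetical data, not read from the plot
print(interquartile_range(books))       # 2
```

Here the lower half is 14, 16, 16 (median 16) and the upper half is 18, 18, 20 (median 18), so the interquartile range is 18 – 16 = 2.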

Shape of the graph
o Skewed left – most of the data is to the right of the graph, with a tail stretching to the left
o Skewed right – most of the data is to the left of the graph, with a tail stretching to the right
o Equally distributed or evenly distributed – the data is spread close to equally throughout the graph