Spread

Measures of Spread
The terms spread, dispersion, and variation all refer to a measure of the way a data set is distributed about some central value (i.e. indicate how closely a set of data clusters around its center).
Why do we care about spread?
Range
The range of data is the difference between the maximum and the minimum value.
Box and Whisker Plot
A graphical representation of the quartiles of data.
Quartiles and Box and Whisker Plot
Three values that divide a set of ordered data into four groups with equal numbers of data in each group.
Variance (coming up...)
Standard Deviation (stay tuned...)
Box and Whisker Plots
• a representation of the spread of the data using the medians of the data
• need to know:
range - minimum value to maximum value
quartiles -
lower quartile (Q1) - median of lower half of data
median (Q2)
upper quartile (Q3) - median of upper half of data
Let's construct one!
Data: { 26, 26, 27, 28, 30, 32, 34, 35, 39, 42, 44, 45, 49, 51, 64 }, n = 15
1) make a number line containing at least the range of your data
25 30 35 40 45 50 55 60 65 70
2) determine Q1, Q2 and Q3 of the data
{ 26, 26, 27, 28, 30, 32, 34, 35, 39, 42, 44, 45, 49, 51, 64}
lower quartile is the (1+7)/2 = 4th entry, 28
median is the (1+15)/2 = 8th entry, 35
upper quartile is the (9+15)/2 = 12th entry, 45
3) mark Q1, Q2 and Q3 on the number line, and then complete the box
4) Draw a line along the number line from the left side of the box to the minimum value, and one from the right side of the box to the maximum value.
5) Enjoy your completed box and whisker plot!
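The quartile step above can also be checked with a short Python sketch. It assumes the median-of-halves method used in this lesson (for an odd number of values, the median itself is left out of both halves); the function name `quartiles` is only illustrative. Other textbooks and software may use slightly different quartile rules.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]           # values below the median position
    upper = ordered[half + n % 2:]   # values above the median position
    return median(lower), median(ordered), median(upper)

data = [26, 26, 27, 28, 30, 32, 34, 35, 39, 42, 44, 45, 49, 51, 64]
q1, q2, q3 = quartiles(data)

# Five values needed to draw the box and whisker plot:
# minimum, Q1, median, Q3, maximum
summary = (min(data), q1, q2, q3, max(data))
print(summary)  # (26, 28, 35, 45, 64)
```

The box runs from 28 to 45 with the median line at 35, and the whiskers reach out to 26 and 64.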
Analysing Box and Whisker Plots
• box and whisker plots provide information about the spread of the data when it is divided up into quarters
• consider....
[Figure: box and whisker plot on a number line from 25 to 70]
• What is the range of this data set?
• Determine the four intervals which each have an equal number of data points or 25% of the data.
• What is the interquartile range of this data set?
The interquartile range (IQR) is the difference between the upper and lower quartiles. It spans the middle 50% of the data, so it is not affected by outliers.
• Describe the difference in spread between the 1st quartile and the 4th quartile.
• Describe the difference in spread between the 2nd quartile and the 3rd quartile.
• Describe the difference in spread between the 1st quartile and the 3rd quartile.
Interquartile Range (IQR)
The interquartile range (IQR) is the difference between the first and the third quartiles (Q3 - Q1). The IQR contains the middle 50% of data (recall: the box in a box-and-whisker plot).
Note: the larger the interquartile range, the larger the spread of the data.
Your turn...
A farmer keeps track of the number of corn cobs produced by 11 different stalks in two different fields. Find the quartile values and calculate the IQR to compare the two fields.
Field A: 25, 23, 33, 28, 27, 30, 25, 30, 27, 29, 31
Field B: 24, 28, 27, 30, 36, 31, 29, 29, 27, 30, 28
Field A shows a slightly greater spread in the number of corn cobs per stalk than Field B.
In other words, Field B is more consistent than Field A.
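As a check on the comparison above, here is a minimal Python sketch that computes the IQR for both fields. It assumes the 11 values per field as read from the data table, and the same median-of-halves quartile method used earlier in the lesson; the helper name `iqr` is only illustrative.

```python
from statistics import median

def iqr(data):
    """Interquartile range (Q3 - Q1) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])          # median of the lower half
    q3 = median(ordered[half + n % 2:])  # median of the upper half
    return q3 - q1

# Corn cobs per stalk, assumed to be the 11 values listed for each field
field_a = [25, 23, 33, 28, 27, 30, 25, 30, 27, 29, 31]
field_b = [24, 28, 27, 30, 36, 31, 29, 29, 27, 30, 28]

print(iqr(field_a))  # 5  (Q1 = 25, Q3 = 30)
print(iqr(field_b))  # 3  (Q1 = 27, Q3 = 30)
```

The middle 50% of Field A covers 5 cobs while the middle 50% of Field B covers only 3, which is why Field B is described as more consistent.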
Variance
A measure of dispersion that is found by averaging the squares of the deviation (from the mean) of each piece of data. (i.e. the mean of the squares of the deviations).
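The definition can be followed step by step in a short Python sketch. It divides by the number of data values n, which matches "averaging" in the definition (some courses divide by n - 1 when the data is a sample). The function name `variance` and the use of the Field A corn-cob counts as example data are choices made for illustration.

```python
def variance(data):
    """Mean of the squared deviations from the mean (divides by n)."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = [(x - mean) ** 2 for x in data]
    return sum(squared_deviations) / n

# Field A corn-cob counts, assumed from the exercise above
field_a = [25, 23, 33, 28, 27, 30, 25, 30, 27, 29, 31]

print(variance(field_a))  # 8.0  (mean is 28, squared deviations sum to 88)
```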
Standard Deviation
A statistic used to measure how much the data is spread out around the mean. The square root of the variance.
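The lowercase Greek letter sigma, σ, is the symbol for standard deviation.
Standard Deviation: $\sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$, where $\bar{x}$ is the mean and $n$ is the number of data values.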
The smaller the standard deviation, the more compact (less spread) the data.
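A minimal Python sketch of this calculation is shown below, taking the square root of the variance with division by n, as in the definition above. The function name `std_dev` is illustrative, and the two fields from the corn-cob exercise are used as assumed example data.

```python
import math

def std_dev(data):
    """Square root of the variance (variance divides by n)."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n
    return math.sqrt(variance)

# Corn-cob counts, assumed from the exercise above
field_a = [25, 23, 33, 28, 27, 30, 25, 30, 27, 29, 31]
field_b = [24, 28, 27, 30, 36, 31, 29, 29, 27, 30, 28]

print(round(std_dev(field_a), 2))  # 2.83
print(round(std_dev(field_b), 2))  # 2.86
```

The two standard deviations are nearly equal, even though the IQR separates the fields more clearly. Standard deviation uses every data value, so the single stalk with 36 cobs in Field B pulls its value up, while the IQR ignores it.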
Standard Deviation for Grouped Data
A student keeps track of the time one has to wait for an oil change at a Toyota garage. Find the standard deviation of the wait times.
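For grouped data, one common approach is to treat every value in an interval as if it sat at the interval's midpoint and weight each midpoint by its frequency. The sketch below assumes that approach. The wait times in it are hypothetical numbers made up for illustration, not the data from this example, and the function name `grouped_std_dev` is likewise only illustrative.

```python
import math

def grouped_std_dev(midpoints, frequencies):
    """Standard deviation for grouped data using interval midpoints."""
    n = sum(frequencies)  # total number of data values
    mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n
    variance = sum(f * (m - mean) ** 2
                   for m, f in zip(midpoints, frequencies)) / n
    return math.sqrt(variance)

# HYPOTHETICAL wait times in minutes (not the lesson's data):
# intervals 0-10, 10-20, 20-30 with midpoints 5, 15, 25
midpoints = [5, 15, 25]
frequencies = [2, 4, 2]

print(round(grouped_std_dev(midpoints, frequencies), 2))  # 7.07
```

Here the weighted mean is 15 minutes and the variance is 50, so the standard deviation is about 7.07 minutes.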
Homework: pg.168 #1, 2ac, 3ac, 4, 5, 6
Summary of Measures of Spread
• range = maximum - minimum
• quartiles; interquartile range (IQR)
• box and whisker plot
• variance
• standard deviation
You need to know how to calculate each of these quantities and generate the plots. You will also need to know how to analyse these measures of spread to determine something about the data set (e.g. comparing the variances of different data sets to determine which one has the greater spread).