Box Plots & Cumulative Frequency Graphs
Box Plots
• Also known as box-and-whisker plots
• Graphical display of the 5 number summary:
– Minimum value
– Lower quartile (Q1)
– Median (Q2)
– Upper quartile (Q3)
– Maximum value
Example. Draw a box plot for the following data:
8  2  3  9  6  5  3  2
2  6  2  5  4  5  5  6
1. Determine the 5 number summary.
Min: 2
Q1: 2.5
Q2: 5
Q3: 6
Max: 9
2. Create an appropriate number line. Be sure
to label the axis.
3. Plot the 5 number summary & connect the
appropriate points using a straight edge.
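The 5 number summary above can be checked with a short Python sketch. It assumes the quartile convention that reproduces the worked answer: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, with the middle value left out when the count is odd. Other software may use a different convention and give slightly different quartiles. The function name `five_number_summary` is made up for this illustration.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    lower = data[:n // 2]            # values below the median position
    upper = data[n // 2 + n % 2:]    # values above it (middle value skipped if n is odd)
    return (data[0], median(lower), median(data), median(upper), data[-1])

data = [8, 2, 2, 6, 3, 2, 9, 5, 6, 4, 5, 5, 3, 5, 2, 6]
print(five_number_summary(data))  # (2, 2.5, 5.0, 6.0, 9)
```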
Outliers
• Outliers are noted on box plots with an
asterisk or a dot
• Use the 1.5 × IQR criterion: a value is an outlier if it lies below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR
• It is possible to have more than one outlier at
either end
• Each whisker extends to the last value that is
not an outlier
Example. Draw a box plot for the following data.
1  3  3  5  6  7  7  7  8  8
8  8  9  9  10  10  12  13  14  16
1. Determine the 5 number summary.
2. Test for outliers.
3. Create an appropriate number line. Be sure
to label the axis.
4. Plot the 5 number summary & connect the
appropriate points using a straight edge.
1. Determine the 5 number summary.
Min: 1
Q1: 6.5
Q2: 8
Q3: 10
Max: 16
2. Test for outliers.
IQR = Q3 − Q1 = 10 − 6.5 = 3.5
Lower boundary: Q1 − 1.5 × IQR = 6.5 − 1.5 × 3.5 = 1.25
Upper boundary: Q3 + 1.5 × IQR = 10 + 1.5 × 3.5 = 15.25
1 is below the lower boundary, so it is an outlier.
16 is above the upper boundary, so it is an outlier.
3. Create an appropriate number line. Be sure to label the
axis.
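4. Plot the 5 number summary & connect the appropriate
points using a straight edge. Mark the outliers 1 and 16
separately; the whiskers end at 3 and 14, the last values
that are not outliers.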
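The outlier test above can be sketched in Python as a check on the hand calculation. This assumes the same median-of-halves quartile convention used in the worked solution; the helper name `quartiles` is hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    data = sorted(values)
    n = len(data)
    return median(data[:n // 2]), median(data[n // 2 + n % 2:])

data = [1, 3, 3, 5, 6, 7, 7, 7, 8, 8, 8, 8, 9, 9, 10, 10, 12, 13, 14, 16]

q1, q3 = quartiles(data)
iqr = q3 - q1
lower_boundary = q1 - 1.5 * iqr
upper_boundary = q3 + 1.5 * iqr

# Values outside the boundaries are outliers; each whisker stops at the
# most extreme value that is still inside the boundaries.
outliers = [x for x in data if x < lower_boundary or x > upper_boundary]
inside = [x for x in data if lower_boundary <= x <= upper_boundary]
whisker_ends = (min(inside), max(inside))

print(lower_boundary, upper_boundary)  # 1.25 15.25
print(outliers)                        # [1, 16]
print(whisker_ends)                    # (3, 14)
```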
Box Plots
How can we make the calculator do this for us?
1. Press “Stat”, then “Edit”.
2. Enter the data into L1.
3. Press “2nd” “y=” (“Stat Plot”).
4. Select the plot you’d like to use.
5. Turn the plot “on”.
6. Select the type you’d like to use.
7. Set List: L1 (or the applicable list).
8. Press “Graph”.
Tips:
- Make sure there aren’t equations in “y=” that will interfere with your box plot.
- You can use the calculator to check the box plot that you create.
Interpreting Box Plots
• 25% of values are between smallest value &
lower quartile (lower whisker)
• 25% of values are between lower quartile &
median (lower small rectangle)
• 25% of values are between median & upper
quartile (upper small rectangle)
• 25% of values are between upper quartile &
largest value (upper whisker)
• 50% of values lie between lower & upper
quartiles (entire rectangle)
A set of data with a symmetric distribution will have a symmetric box plot. The whiskers are the same length and the median is in the center of the box.
A set of data which is positively skewed will have a positively skewed box plot. The right whisker is longer than the left whisker and the median line sits toward the left side of the box.
A set of data which is negatively skewed will have a negatively skewed box plot. The left whisker is longer than the right whisker and the median line sits toward the right side of the box.
Parallel Box Plots
• A visual comparison of the distribution of two
data sets
• Can easily compare descriptive statistics, such
as median, range & IQR
Example. A hospital is trialing a new anesthetic
drug and has collected data on how long the
new and old drugs take before the patient
becomes unconscious. They wish to know which
drug is more reliable.
Using the box plots below, compare the two
drugs for speed and reliability.
Speed: Using the median, 50% of the time the new drug
takes 9 seconds or less, compared with 10 seconds for the
old drug. We conclude that the new drug is generally a
little quicker.
Reliability:
Old drug: Range = 21 – 5 = 16; IQR = 12.5 – 8 = 4.5
New drug: Range = 12 – 7 = 5; IQR = 10 – 8 = 2
The new drug times are less “spread out” than the old
drug times. The new drug is more reliable.
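The comparison above can be reproduced with a small Python sketch. The 5 number summaries are the values used in the calculations on this slide (times in seconds); the dictionary layout and the function name `spread` are choices made for this illustration only.

```python
# 5 number summaries as used in the worked comparison (seconds).
old_drug = {"min": 5, "q1": 8, "median": 10, "q3": 12.5, "max": 21}
new_drug = {"min": 7, "q1": 8, "median": 9, "q3": 10, "max": 12}

def spread(summary):
    """Return (range, IQR) for a 5 number summary."""
    return summary["max"] - summary["min"], summary["q3"] - summary["q1"]

for name, s in (("old", old_drug), ("new", new_drug)):
    rng, iqr = spread(s)
    print(f"{name} drug: median = {s['median']}, range = {rng}, IQR = {iqr}")
# old drug: median = 10, range = 16, IQR = 4.5
# new drug: median = 9, range = 5, IQR = 2
```

A smaller median points to the quicker drug, while a smaller range and IQR point to the more consistent one.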
Cumulative Frequency Graphs
• Cumulative frequency: the sum of all the
frequencies up to and including the current class
Race finishing time t      Frequency   Cumulative frequency
2 h 26 ≤ t < 2 h 28        8           8
2 h 28 ≤ t < 2 h 30        3           8 + 3 = 11
2 h 30 ≤ t < 2 h 32        9           11 + 9 = 20
2 h 32 ≤ t < 2 h 34        11          20 + 11 = 31
2 h 34 ≤ t < 2 h 36        12          31 + 12 = 43
2 h 36 ≤ t < 2 h 38        7           43 + 7 = 50
2 h 38 ≤ t < 2 h 40        5           50 + 5 = 55
2 h 40 ≤ t < 2 h 48        8           55 + 8 = 63
2 h 48 ≤ t < 2 h 56        6           63 + 6 = 69
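The running totals in the last column can be generated with `itertools.accumulate` from the Python standard library. This is only a sketch to check the arithmetic; the list holds the frequencies from the table in order.

```python
from itertools import accumulate

frequencies = [8, 3, 9, 11, 12, 7, 5, 8, 6]

# Each entry is the sum of all frequencies up to and including that class.
cumulative = list(accumulate(frequencies))
print(cumulative)  # [8, 11, 20, 31, 43, 50, 55, 63, 69]
```

The last cumulative frequency, 69, is the total number of runners.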
• To draw a cumulative frequency graph:
– scale and label the axes correctly (variable on the x-axis & cumulative frequency on the y-axis)
– plot the first point (lowest bound, 0)
– plot the middle points (upper bound, corresponding
cumulative frequency)
– plot the last point (highest bound, total frequency)
– connect the points with a smooth curve
Example. A supermarket is open 24 hours a day and has free parking. The number of parked cars each hour is monitored over a period of several days.
Organize this information into a cumulative frequency table. Then draw a graph of the cumulative frequency.
# of cars parked per hour (n)   Frequency
0 ≤ n < 50                      6
50 ≤ n < 100                    23
100 ≤ n < 150                   41
150 ≤ n < 200                   42
200 ≤ n < 250                   30
250 ≤ n < 300                   24
300 ≤ n < 350                   9
350 ≤ n < 400                   5
# of cars parked per hour (n)   Frequency   Cumulative frequency
0 ≤ n < 50                      6           6
50 ≤ n < 100                    23          29
100 ≤ n < 150                   41          70
150 ≤ n < 200                   42          112
200 ≤ n < 250                   30          142
250 ≤ n < 300                   24          166
300 ≤ n < 350                   9           175
350 ≤ n < 400                   5           180
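As a sketch of how the table turns into points for the graph, the Python below builds the cumulative frequencies for the parking data and pairs each one with the upper bound of its class, starting from the point (lowest bound, 0). The variable names are chosen for this illustration.

```python
from itertools import accumulate

lowest_bound = 0
upper_bounds = [50, 100, 150, 200, 250, 300, 350, 400]
frequencies = [6, 23, 41, 42, 30, 24, 9, 5]

cumulative = list(accumulate(frequencies))

# First point is (lowest bound, 0); every other point is
# (upper bound of the class, cumulative frequency).
points = [(lowest_bound, 0)] + list(zip(upper_bounds, cumulative))

for n, cf in points:
    print(n, cf)
```

The last point printed is (400, 180): the highest bound paired with the total frequency.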
Interpreting Cumulative Frequency Graphs
• To find the median, calculate 50% of the total frequency. Start at this value on the y-axis, move across to the curve, and then down to the x-axis to read off the median.
• To find the lower quartile, Q1, do the same starting from 25% of the total frequency.
• To find the upper quartile, Q3, do the same starting from 75% of the total frequency.
• To find the interquartile range, subtract the lower quartile from the upper quartile: IQR = Q3 – Q1.
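The read-off procedure can be approximated numerically. The sketch below uses the supermarket parking data and assumes straight lines between the plotted points instead of a smooth curve, so the results are estimates that may differ slightly from values read off a hand-drawn graph. The function name `value_at` is hypothetical.

```python
# (upper bound, cumulative frequency) points for the parking data,
# starting from (lowest bound, 0).
points = [(0, 0), (50, 6), (100, 29), (150, 70), (200, 112),
          (250, 142), (300, 166), (350, 175), (400, 180)]

def value_at(cum_freq, points):
    """Estimate the x-value at which the cumulative frequency reaches cum_freq.

    Uses linear interpolation and assumes every class has a nonzero frequency.
    """
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= cum_freq <= c1:
            return x0 + (x1 - x0) * (cum_freq - c0) / (c1 - c0)
    raise ValueError("cumulative frequency is outside the table")

total = points[-1][1]                 # 180 hours observed
q1 = value_at(0.25 * total, points)   # 25% of 180 = 45
med = value_at(0.50 * total, points)  # 50% of 180 = 90
q3 = value_at(0.75 * total, points)   # 75% of 180 = 135
iqr = q3 - q1

print(round(q1), round(med), round(q3), round(iqr))
```

Under these assumptions Q1 is about 120 cars, the median about 174 cars, and Q3 about 238 cars, giving an IQR of roughly 119 cars.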
Example. Use the cumulative frequency graph to
estimate the:
i. median finishing time
ii. number of competitors who finished in less
than 2 hours 35 minutes
iii. percentage of competitors who took more
than 2 hours 39 minutes to finish
iv. time taken by a competitor who finished in
the top 20% of runners completing the
marathon
i. Median = 50th percentile. 50% of 69 = 34.5. Start with a cumulative frequency of 34.5 & find the corresponding time: 2 hours 34.5 minutes.
ii. Start on the x-axis at 2 hours 35 minutes & find the corresponding cumulative frequency: 37 people.
iii. Start on the x-axis at 2 hours 39 minutes & find the corresponding cumulative frequency. 52 people took < 2 h 39 min, so 69 − 52 = 17 people took longer.
(69 − 52) / 69 = 17 / 69 ≈ 24.6%
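The percentage in part iii is easy to check with a few lines of Python. The value 52 is the cumulative frequency read off the graph at 2 h 39 min, and 69 is the total number of runners.

```python
total = 69
under_2h39 = 52                      # read off the graph at 2 h 39 min

over_2h39 = total - under_2h39       # runners who took longer
percent_over = 100 * over_2h39 / total

print(over_2h39, round(percent_over, 1))  # 17 24.6
```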
iv. 20% of 69 = 13.8. Find the time corresponding to a cumulative frequency of 13.8: 2 hours 31 minutes.
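Parts i, ii and iv can be estimated from the table without the graph. The sketch below uses only the marathon classes up to 2 h 38 min (all that these three parts need), measures time in minutes past 2 hours, and assumes straight lines between points, so the answers are close to, but not identical with, readings from the smooth curve. The function names `time_at` and `cum_freq_at` are hypothetical.

```python
# (minutes past 2 h, cumulative frequency) for the classes up to 2 h 38 min.
points = [(26, 0), (28, 8), (30, 11), (32, 20), (34, 31), (36, 43), (38, 50)]

def time_at(cum_freq):
    """Estimate the time (minutes past 2 h) at a given cumulative frequency."""
    for (t0, c0), (t1, c1) in zip(points, points[1:]):
        if c0 <= cum_freq <= c1:
            return t0 + (t1 - t0) * (cum_freq - c0) / (c1 - c0)
    raise ValueError("cumulative frequency is outside this part of the table")

def cum_freq_at(minutes):
    """Estimate the cumulative frequency at a given time (minutes past 2 h)."""
    for (t0, c0), (t1, c1) in zip(points, points[1:]):
        if t0 <= minutes <= t1:
            return c0 + (c1 - c0) * (minutes - t0) / (t1 - t0)
    raise ValueError("time is outside this part of the table")

median_time = time_at(0.50 * 69)   # part i:  cumulative frequency 34.5
under_2h35 = cum_freq_at(35)       # part ii: runners under 2 h 35 min
top_20_time = time_at(0.20 * 69)   # part iv: cumulative frequency 13.8

print(round(median_time, 1), under_2h35, round(top_20_time, 1))
```

This gives a median of about 2 h 34.6 min, 37 runners under 2 h 35 min, and about 2 h 30.6 min for the top 20%, in line with the graph readings above.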