Lab 3 - Tim Busken

Statistics Chapter 2
Name:
2.5 Measures of Position — LAB
Learning objectives:
1. How to find the first, second, and third quartiles of a data set, how to find the interquartile range of a data set, and how to represent a data set graphically using a box-and
whisker plot
2. How to interpret other fractiles such as percentiles and how to find percentiles for a
specific data entry
3. Determine and interpret the standard score (z-score)
Fractiles
• In this section, you will learn how to use fractiles to specify the location of a data
entry within a data set.
• Fractiles are numbers that partition (divide) an ordered data set into equal parts.
Quartiles
• First quartile, Q1 : About one quarter of the data fall on or below Q1 .
• Second quartile, Q2 : About one half of the data fall on or below Q2 (median).
• Third quartile, Q3 : About three quarters of the data fall on or below Q3 .
Example: The number of nuclear power plants in the top 15 nuclear power-producing
countries in the world are listed. Find the first, second, and third quartiles of the data set.
7 18 11 6 59 17 18 54 104 20 31 8 10 15 19
Solution: Q2 divides the data set into two halves.
The first and third quartiles are the medians of the lower and upper halves of the data set.
Interquartile Range (IQR)
• The difference between the third and first quartiles.
• IQR = Q3 − Q1 .
Recall that for the nuclear power plant data, Q1 = 10, Q2 = 18 and Q3 = 31. The
interquartile range is
IQR = Q3 − Q1 = 31 − 10 = 21
The number of power plants in the middle portion of the data set vary by at most 21.
Box-and-whisker plot
• Exploratory data analysis tool.
• Highlights important features of a data set.
• Requires (five-number summary): Minimum entry, First quartile Q1 , Median Q2 ,
Third quartile Q3 , Maximum entry
How to draw a Box-and-Whisker Plot
1. Find the five-number summary of the data set.
2. Construct a horizontal scale that spans the range of the data.
3. Plot the five numbers above the horizontal scale.
4. Draw a box above the horizontal scale from Q1 to Q3 and draw a vertical line in the
box at Q2 .
5. Draw whiskers from the box to the minimum and maximum entries.
Example The five number summary for the power plant data is:
Min = 6,
Q1 = 10,
Q2 = 18,
Q3 = 31,
Max = 104
About half the scores are between 10 and 31. By looking at the length of the right whisker,
you can conclude 104 is a possible outlier. Another way to identify outliers is to use the
interquartile range.
Using the Interquartile Range to Identify Outliers
1. Find Q1 and Q3
2. Find the interquartile range, IQR = Q3 − Q1
3. Multiply the IQR by 1.5
4. Subtract (1.5)×(IQR) from Q1 to find the lower fence.
lowerf ence = Q1 − 1.5 · (IQR)
Any data entry less than the lower fence is an outlier.
5. Add 1.5×IQR to Q3 to find the upper fence.
upperf ence = Q3 + 1.5 · (IQR)
Any data entry greater than the upper fence is an outlier.
Example When carrying out this list of steps for the power plant data we find:
1. Q1 = 10 and Q3 = 31
2. Find the interquartile range, IQR = Q3 − Q1 = 21
3. Multiply the IQR by 1.5:
1.5×21 = 31.5
4. lower fence = 10 − 31.5 = -21.5 . There is no data entry less than −21.5 value.
5. upper fence = 31 + 31.5 = 62.5 . So, 104 is an outlier since it is greater than 62.5.
Modified Boxplot The boxplot is redrawn with the whisker on the right extended out to
the largest value in the data set that is not larger than the upper fence, namely data value
59. Had there been an outlier on the lower end, then our graph would have the left whisker
extended to the lowest value in the data set that was not an outlier. Also, notice that outliers
are marked on the graph with a cross marker.
Your Turn! Use the sample data below answer the questions on the next page. The sample
represents the number of paid vacation days used by 20 employees in a recent year.
16 25 1 33 15 5 18 8 20 14 17 19 16 10 21 28 14 37 18 15
1. List the data in ascending fashion.
2. Find the minimum
2.
3. Find Q1
3.
4. Find Q2
4.
5. Find Q3
5.
6. Find the maximum
6.
7. Find the lower fence.
7.
8. Find the upper fence
8.
9. What numbers are outliers?
9.
10. Sketch the graph of a modified boxplot. Label the x axis.
11. About 75% of the employees in the sample took at least how many days off?
11.
12. What percentage of employees took more than 16 and a half days off?
12.
13. You randomly select one employee from the sample. What is the likelihood that the
person took less than 20.5 days off?
13.
The ogive at the right represents
the cumulative frequency distribution for SAT scores of collegebound students in a recent year.
What SAT score represents the
62nd percentile?
Answer/Interpretation: An SAT
score of 1600. This means that
approximately 62% of the students
had an SAT score of 1600 or less.
14. What percentage of students scored at most 1500?
15. Approximately what SAT score represents the 90th percentile? How should you interpret
this?
Another way to find a percentile is to use a formula.
Definition
the formula
To find the percentile that corresponds to a specific data entry x, use
number of data entries less than x
· 100
total number of data entries
and then round to the nearest whole number.
Percentile of x =
Example For the vacation days data, find the percentile that corresponds to 8 vacation
days.
Answer There are two data entries less than 8 and the total number of data entries is 20.
Percentile of 8 =
number of data entries less than x
2
· 100 =
· 100 = 10
total number of data entries
20
Interpretation 8 vacation days off corresponds to the 10th percentile. 8 vacation days is
greater than 10% of the entries in the sample.
Try These! : The exam scores of a particular geography class are listed.
26 54 59 64 67 73 73 73 74 76
77 78 80 80 81 83 83 85 86 87
87 88 88 89 90 91 92 93 94 97
16. Find the percentile that corresponds to an exam score of 67. How should you interpret
this?
17. Find the percentile that corresponds to an exam score of 89. How should you interpret
this?
Standard Score (z-score)
• Every number x in a data set has a z-score
• A z-score represents the number of standard deviations a given value x is away from the
mean µ.
x−µ
• z-score formula
z=
σ
• A z-score can be negative, positive or zero.
• When z is negative the corresponding x value is less than the mean
• When z is positive the corresponding x value is greater than the mean
• When z = 0 the corresponding x value is equal to the mean
How to use the calculator to graph a Boxplot
The TI-83, TI-83 Plus, and TI-84 Plus calculators will take a list of data and automatically draw
a box-and-whisker plot for that data. Since there are a number of different types of plots
available on the calculator, it is important to make sure that all other plots are turned off before
you begin or you graph will be cluttered with several unrelated plots being graphed at the same
time. Even worse, it is possible for previous plots to become invalid or the data sets that were
used before are changed or deleted, causing an error whenever you try to graph anything new.
Before you begin any plotting:
• Press the Y= key at the top left of the keyboard, and delete or deselect
any equations being plotted there.
• Press STAT PLOT (above the Y= key) and choose 4 to turn off any
other statistical plots.
There are three stages to creating a box-and-whisker plot.
1. Enter the data into a list as before.
2. Tell the calculator what kind of plot you want.
3. Tell the calculator what size to draw the window for the plot.
Example: Household Incomes
The following numbers represent household incomes in thousands of dollars for twelve
households:
35
29
44
72 34
64 41
50 54
We’ll use this data to construct a box-and-whisker plot.
First store the data in the list L1.
Press the 2nd key and then press the Y= key to get to
STATPLOT.
Press the number 1 key.
Move the cursor to On and press the ENTER key.
Use the arrow keys and highlight the box-and-whisker
plot picture.
Type L1 in for Xlist:
104
39
58
Set the Freq: to 1.
Press the ZOOM key.
Press the number 9 key.
We have two different types of box-and-whisker plots to select from. One will separate outliers
from the maximum or minimum value (the modified boxplot) and the other will include the
outliers in to whiskers (the boxplot).
If you select the picture that shows two dots after the
maximum on the plot, the outliers will be shown
outside of the whiskers.