Chapter 2:

2.1.notebook
September 06, 2013
*You need your book today!!!!
Chapter 2: Organizing Data
Florence Nightingale (1820-1910)
has been described as the "relevant
statistician."
--One of the first to use graphic
representations of stats
--Improved sanitation in hospitals
with charts/diagrams
Her stat reports about the appalling
sanitary conditions at Scutari (main
British hospital in Crimean War)
were taken seriously by the English
Secretary of War (Sidney Herbert).
Her recommendations were instituted in military hospitals,
and the mortality rate dropped from 42.7% to 2.2%!!
There were no vessels for water or utensils of any kind; no soap, towels, or clothes, no hospital clothes; the men lying in their uniforms, stiff with gore and covered with filth to a degree and of a kind no one could write about; their persons covered with vermin . . . We have not seen a drop of milk, and the bread is extremely sour. The butter is most filthy; it is Irish butter in a state of decomposition; and the meat is more like moist leather than food. Potatoes we are waiting for, until they arrive from France . . .
Early in 1855, because of the defects in the sanitation system, there was a great increase in the number of cases of cholera and of typhus fever among Nightingale's patients. Seven of the army doctors and three of the nurses died. Frostbite and dysentery from exposure in the trenches before Sevastopol made the wards fuller than before. There were over 2000 sick and wounded in the hospital, and in February 1855 the death rate rose to 42%. The War Office ordered the sanitary commissioners at Scutari to carry out sanitary reforms immediately, after which the death rate declined rapidly until in June it had fallen to 2%.
Florence Nightingale said, "In dwelling upon the vital
importance of sound observation, it must never be lost sight
of what observation is for.
"It is not for the sake of piling up miscellaneous information
or curious facts, but for the sake of saving life and
increasing health and comfort."
--Notes on Nursing
Ways to Organize & Present Data
--Pictographs
--Bar Graphs
--Pie Charts
--Dot Plots
--Histograms
--Time Series
--Stem Plots
--Frequency Distributions
--Pareto Charts
--Frequency Polygons
--Ogives
--Box-and-Whisker*
--Scatter Plots*
*We will discuss these in later chapters.
Pictographs
Uses a picture or graphic to represent frequency
Frequency Distribution -- lists each category or class and the number of times it has occurred (frequency).
Two types of Frequency Distributions are:
--Categorical
--Grouped
Constructing a Categorical Frequency Distribution
The following is a list of the top 10 saddest children's movies. Pick the one you think is the saddest. Construct a categorical frequency distribution of the class data.
BAMBI
OLD YELLER
ET: THE EXTRA-TERRESTRIAL
WHERE THE RED FERN GROWS
UP
DUMBO
CHARLOTTE'S WEB
Constructing a Distribution with small amounts of data
The following set of N = 20 scores was obtained from a 10-point stats quiz. Organize these into a frequency distribution.
8 9 8 7 10 9 6 4 9 8
7 8 10 9 8 6 9 7 8 8
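A minimal sketch of one way to tally these quiz scores, using Python's standard `collections.Counter`. The variable names are illustrative, and listing the scores from highest to lowest (including scores nobody earned) is a layout assumption, not something the notes specify.

```python
from collections import Counter

scores = [8, 9, 8, 7, 10, 9, 6, 4, 9, 8,
          7, 8, 10, 9, 8, 6, 9, 7, 8, 8]

counts = Counter(scores)

# List every score from the highest to the lowest observed value,
# including scores with a frequency of 0 (assumed layout).
frequency_table = [(x, counts[x]) for x in range(max(scores), min(scores) - 1, -1)]

print("X    f")
for x, f in frequency_table:
    print(f"{x:<4} {f}")
```

The frequencies should add up to N = 20, which is a quick way to check that no score was missed.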
Constructing a Grouped Frequency Distribution (for large amounts of data)
An instructor has obtained the set of N = 25 exam scores shown here. Construct a grouped frequency distribution of the data set using 9 classes.
82 75 88 93 53 84 87 58 72 94 69 84 61
91 64 87 84 70 76 89 75 80 73 78 60
Step 1: Find the range of the scores.
Step 2: Divide the range by the # of classes to get the class width. Round UP to the next highest whole number.
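Steps 1 and 2 can be sketched in a few lines of Python for the 25 exam scores. `math.ceil` is used here for the rounding up. How to treat a quotient that is already a whole number is not covered in these notes, so this sketch does not address that case.

```python
import math

exam_scores = [82, 75, 88, 93, 53, 84, 87, 58, 72, 94, 69, 84, 61,
               91, 64, 87, 84, 70, 76, 89, 75, 80, 73, 78, 60]
num_classes = 9

# Step 1: range = highest score - lowest score
data_range = max(exam_scores) - min(exam_scores)      # 94 - 53 = 41

# Step 2: divide by the number of classes and round UP
class_width = math.ceil(data_range / num_classes)     # 41 / 9 = 4.56 -> 5

print(data_range, class_width)
```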
Step 3: Begin setting classes by the multiples of the width you're using. Lowest value is 53, so we will start with the number ______ and go up by the width. Be careful: that first number IS counted in the width. Write down class limits.
Class Limits | Class Boundaries | Frequency | Cumulative Frequency
Step 4: Find the class boundaries by adjusting your class limits by 0.5 in both directions.
Step 5: Fill out your table with class limits, boundaries, and frequencies.
Class Limits | Class Boundaries | Frequency | Cumulative Frequency
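Steps 1 through 5 can be sketched together as follows. The starting number is left blank in Step 3, so this sketch assumes it is the largest multiple of the class width that does not exceed the lowest score. That gives a first class of 50-54, consistent with the 49.5 boundary on the histogram axis later in these notes. Variable names are illustrative.

```python
import math

exam_scores = [82, 75, 88, 93, 53, 84, 87, 58, 72, 94, 69, 84, 61,
               91, 64, 87, 84, 70, 76, 89, 75, 80, 73, 78, 60]
num_classes = 9

class_width = math.ceil((max(exam_scores) - min(exam_scores)) / num_classes)

# Assumption: start at the largest multiple of the width <= the lowest score.
start = (min(exam_scores) // class_width) * class_width

rows = []
cumulative = 0
for i in range(num_classes):
    lower = start + i * class_width
    upper = lower + class_width - 1          # the lower limit itself counts in the width
    frequency = sum(lower <= x <= upper for x in exam_scores)
    cumulative += frequency
    # Step 4: boundaries are the limits moved out by 0.5 in both directions
    rows.append((f"{lower}-{upper}", f"{lower - 0.5}-{upper + 0.5}", frequency, cumulative))

print("Limits   Boundaries   f    cf")
for row in rows:
    print("{:<8} {:<12} {:<4} {}".format(*row))
```

The last cumulative frequency should equal N = 25.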
Your turn!
Construct a Grouped Frequency Distribution using 6 classes.
The following are scores from a math test.
65 75 50 67 86 66 62 64 71 47
57 74 63 67 56 65 70 87 48 50
41 66 73 60 63 45 78 68 53 75
Answers!
Class Limits | Class Boundaries | Frequency | Cumulative Frequency
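One possible way to check an answer for the math test scores, following the same steps as the worked example. It assumes the first class starts at a multiple of the class width, as Step 3 describes. Under a different starting convention the classes would shift, so treat the result as a sketch rather than the official key.

```python
import math
from collections import Counter

math_scores = [65, 75, 50, 67, 86, 66, 62, 64, 71, 47,
               57, 74, 63, 67, 56, 65, 70, 87, 48, 50,
               41, 66, 73, 60, 63, 45, 78, 68, 53, 75]
num_classes = 6

width = math.ceil((max(math_scores) - min(math_scores)) / num_classes)  # 46 / 6 -> 8
start = (min(math_scores) // width) * width                              # assumed start: 40

# Map each score to the index of its class, then count scores per class.
per_class = Counter((x - start) // width for x in math_scores)

table = []
running_total = 0
for i in range(num_classes):
    lower = start + i * width
    upper = lower + width - 1
    running_total += per_class[i]
    table.append((lower, upper, lower - 0.5, upper + 0.5, per_class[i], running_total))

for lower, upper, lb, ub, f, cf in table:
    print(f"{lower}-{upper}   {lb}-{ub}   {f}   {cf}")
```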
The Histogram
Histograms are bar graphs for interval or ratio data. The data scale is on the x-axis and the frequency is on the y-axis.
Use the grouped frequency distribution we completed in our example to construct a histogram of the data set.
[Blank histogram grid: y-axis f (frequency) from 1 to 5; x-axis EXAM SCORE with class boundaries 49.5, 54.5, 59.5, 64.5, 69.5, 74.5, 79.5, 84.5, 89.5, 94.5]
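A rough text-only sketch of the histogram for the 25 exam scores, using the class boundaries from the x-axis above. Each `#` stands for one score. This is an illustration of the bar heights, not a replacement for drawing the graph.

```python
exam_scores = [82, 75, 88, 93, 53, 84, 87, 58, 72, 94, 69, 84, 61,
               91, 64, 87, 84, 70, 76, 89, 75, 80, 73, 78, 60]

# Class boundaries: 49.5, 54.5, ..., 94.5
boundaries = [49.5 + 5 * i for i in range(10)]

# One bar per class: count the scores between each pair of adjacent boundaries.
frequencies = [sum(lo < x < hi for x in exam_scores)
               for lo, hi in zip(boundaries, boundaries[1:])]

for lo, hi, f in zip(boundaries, boundaries[1:], frequencies):
    print(f"{lo}-{hi} | {'#' * f}")
```

Because boundaries fall halfway between whole numbers, no score can land exactly on one, so every score belongs to exactly one bar.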
The Frequency Polygon
Frequency Polygons are also used for interval or ratio data. The data scale is on the x-axis and the frequency is on the y-axis.
Use the grouped frequency distribution we completed in our example to construct a frequency polygon of the data set.
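[Blank frequency polygon grid: y-axis f (frequency) from 1 to 5; x-axis EXAM SCORE marked at 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 with class boundaries 49.5, 54.5, 59.5, 64.5, 69.5, 74.5, 79.5, 84.5, 89.5, 94.5]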
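A sketch of the points a frequency polygon would connect, assuming the usual approach of plotting each class frequency above its class midpoint. The class limits 50-54 through 90-94 are inferred from the boundaries on the axis. Closing the polygon with a zero-frequency point at each end is a common convention and an assumption here.

```python
exam_scores = [82, 75, 88, 93, 53, 84, 87, 58, 72, 94, 69, 84, 61,
               91, 64, 87, 84, 70, 76, 89, 75, 80, 73, 78, 60]
width = 5
lower_limits = range(50, 95, width)          # 50, 55, ..., 90

points = []
for lower in lower_limits:
    upper = lower + width - 1
    midpoint = (lower + upper) / 2           # e.g. (50 + 54) / 2 = 52.0
    frequency = sum(lower <= x <= upper for x in exam_scores)
    points.append((midpoint, frequency))

# Assumed convention: add a zero-frequency point one class width
# before the first midpoint and one after the last.
points = [(points[0][0] - width, 0)] + points + [(points[-1][0] + width, 0)]

for midpoint, frequency in points:
    print(midpoint, frequency)
```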
The Ogive
Ogives are also used for interval or ratio data. The data scale is on the x-axis and the cumulative frequency is on the y-axis.
Use the grouped frequency distribution we completed in our example to construct an ogive of the data set.
[Blank ogive grid: y-axis cf (cumulative frequency) from 5 to 25; x-axis EXAM SCORE with class boundaries 49.5, 54.5, 59.5, 64.5, 69.5, 74.5, 79.5, 84.5, 89.5, 94.5]
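A sketch of the points for the ogive of the exam scores, assuming the common approach of plotting each cumulative frequency above the upper boundary of its class and starting at 0 above the first lower boundary. `itertools.accumulate` builds the running total.

```python
from itertools import accumulate

exam_scores = [82, 75, 88, 93, 53, 84, 87, 58, 72, 94, 69, 84, 61,
               91, 64, 87, 84, 70, 76, 89, 75, 80, 73, 78, 60]

boundaries = [49.5 + 5 * i for i in range(10)]        # 49.5, 54.5, ..., 94.5
frequencies = [sum(lo < x < hi for x in exam_scores)
               for lo, hi in zip(boundaries, boundaries[1:])]

cumulative = list(accumulate(frequencies))            # running total of the frequencies

# Start at cf = 0 on the first boundary, then one point per upper boundary.
ogive_points = [(boundaries[0], 0)] + list(zip(boundaries[1:], cumulative))

for boundary, cf in ogive_points:
    print(boundary, cf)
```

The last point should sit at the total number of scores, N = 25.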
Distribution Shapes
Histograms are valuable tools. If the raw data came from a random sample, the resulting histogram should have a shape similar to that of the entire population. This will be of major importance later in this course.
--Mound-shaped Symmetrical
--Uniform or Rectangular Distribution
--Positively Skewed (Skewed Right)
--Negatively Skewed (Skewed Left)
--Bimodal Histogram
Mode = the value that occurs most often
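As a quick check of the definition, here is a small sketch that finds the mode of the 20 quiz scores from earlier in these notes. It uses `statistics.multimode` (Python 3.8+), which returns every value tied for most frequent, so a data set with two equally common values would return both.

```python
from statistics import multimode

quiz_scores = [8, 9, 8, 7, 10, 9, 6, 4, 9, 8,
               7, 8, 10, 9, 8, 6, 9, 7, 8, 8]

# multimode returns a list of the most frequent value(s)
modes = multimode(quiz_scores)
print(modes)
```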
More Practice!
1. What is the difference between a class boundary and a class limit?
2. A data set has values ranging from a low of 10 to a high of 52. What's wrong with using the class limits 10-19, 20-29, 30-39, 40-49 for a frequency distribution?
3. A data set with whole numbers has a low value of 20 and a high value of 82. Find the class width and class limits for a frequency distribution with 7 classes.
More Practice! ANSWERS
1. What is the difference between a class boundary and a class limit?
Boundary: the halfway point between the upper limit of one class and the lower limit of the next class (cannot be a data value)
Limit: the lowest and highest data values that can fall in a class (can be data values)
2. A data set has values ranging from a low of 10 to a high of 52. What's wrong with using the class limits 10-19, 20-29, 30-39, 40-49 for a frequency distribution?
It does not include all the data: values from 50 to 52 fall outside every class.
3. A data set with whole numbers has a low value of 20 and a high value of 82. Find the class width and class limits for a frequency distribution with 7 classes.
Class width = (82 - 20) ÷ 7 ≈ 8.9, rounded up to 9
Class Limits: 20-28, 29-37, 38-46, 47-55, 56-64, 65-73, 74-82
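The arithmetic for problem 3 can be checked with a short sketch. It starts the first class at the low value of 20, as the answer above does. Variable names are illustrative.

```python
import math

low, high, num_classes = 20, 82, 7

# (82 - 20) / 7 = 8.857..., rounded UP to 9
width = math.ceil((high - low) / num_classes)

# Each class covers `width` whole numbers, counting its lower limit.
limits = [(low + i * width, low + (i + 1) * width - 1) for i in range(num_classes)]

print(width)
print(", ".join(f"{lo}-{hi}" for lo, hi in limits))
```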
4. You are a manager of a specialty coffee shop and collect data throughout a full day regarding waiting time for customers from the time they enter the shop until the time they pick up their order.
a. What type of distribution do you think would be most desirable for the waiting times: skewed right, skewed left, or bell-shaped symmetrical?
b. What if the distribution were bimodal? What might be an explanation?
4. You are a manager of a specialty coffee shop and collect data throughout a full day regarding waiting time for customers from the time they enter the shop until the time they pick up their order.
a. What type of distribution do you think would be most desirable for the waiting times: skewed right, skewed left, or bell-shaped symmetrical?
SKEWED RIGHT: you want most wait times to be short
b. What if the distribution were bimodal? What might be an explanation?
The day may mix two kinds of periods: busy ones with lots of customers and long wait times (long lines), and slow ones with few customers and short wait times (short lines).
5. The following data represent salaries, in 1000s of dollars, for employees of a small company. The data have been ordered from lowest to highest.
24 25 25 27 27 29 30 35 35 35 36 38
38 39 39 40 40 40 45 45 45 45 47 52
52 52 58 59 59 61 61 67 68 68 68 250
a. Make a histogram using the class boundaries: 23.5, 69.5, 115.5, 161.5, 207.5, and 253.5.
[Blank histogram grid: y-axis f (frequency) from 10 to 50; x-axis Salary in Thousands]
b. Look at the last data value. Does it appear to be an outlier (a value that doesn't fit the rest of the values)? Could this be an owner's salary?
c. Take out the highest value (250). Make a new histogram with the class boundaries: 23.5, 32.5, 41.5, 50.5, 59.5, and 68.5. Does this new histogram represent salaries of the company better than the first one you did?
[Blank histogram grid: y-axis f (frequency) from 2 to 10; x-axis Salary in Thousands]
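A sketch comparing the bar heights for parts a and c of problem 5, using the class boundaries given in each part. The helper name `class_frequencies` is hypothetical and only for illustration.

```python
salaries = [24, 25, 25, 27, 27, 29, 30, 35, 35, 35, 36, 38,
            38, 39, 39, 40, 40, 40, 45, 45, 45, 45, 47, 52,
            52, 52, 58, 59, 59, 61, 61, 67, 68, 68, 68, 250]

def class_frequencies(data, boundaries):
    """Count the values that fall between each pair of adjacent boundaries."""
    return [sum(lo < x < hi for x in data)
            for lo, hi in zip(boundaries, boundaries[1:])]

# Part a: all 36 salaries, wide classes that stretch out to reach 250
with_outlier = class_frequencies(salaries, [23.5, 69.5, 115.5, 161.5, 207.5, 253.5])

# Part c: drop the 250 and use narrower classes
trimmed = [s for s in salaries if s != 250]
without_outlier = class_frequencies(trimmed, [23.5, 32.5, 41.5, 50.5, 59.5, 68.5])

print(with_outlier)
print(without_outlier)
```

In part a nearly every salary lands in the first class, so the histogram shows almost nothing about how the typical salaries are spread. The part c counts spread across all five classes.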
6. Certain kinds of tumors tend to recur. The following data represent the lengths of time, in months, for a tumor to recur after chemotherapy (DP Byar, Journal of Urology, vol. 10, pp. 556-561). Using 5 classes, construct an ogive.
19 18 17 1 21 22 54 46 25 49
50 1 59 39 43 39 5 9 38 18
14 45 54 59 46 50 29 12 19 36
38 40 43 41 10 50 41 25 19 39
27 20
[Blank ogive grid: y-axis cf (cumulative frequency) from 10 to 50; x-axis Time (months)]
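A sketch of the ogive points for problem 6. It assumes the first class starts at the lowest data value (1), as in the answer to practice problem 3. A different starting convention would shift the classes, so this is one possible setup, not the only one.

```python
import math
from itertools import accumulate

months = [19, 18, 17, 1, 21, 22, 54, 46, 25, 49,
          50, 1, 59, 39, 43, 39, 5, 9, 38, 18,
          14, 45, 54, 59, 46, 50, 29, 12, 19, 36,
          38, 40, 43, 41, 10, 50, 41, 25, 19, 39,
          27, 20]
num_classes = 5

width = math.ceil((max(months) - min(months)) / num_classes)   # 58 / 5 = 11.6 -> 12
start = min(months)                                            # assumed starting point

limits = [(start + i * width, start + (i + 1) * width - 1) for i in range(num_classes)]
frequencies = [sum(lo <= x <= hi for x in months) for lo, hi in limits]
cumulative = list(accumulate(frequencies))

# Plot cf above each upper class boundary, starting from 0 at the first lower boundary.
ogive_points = [(start - 0.5, 0)] + [(hi + 0.5, cf) for (_, hi), cf in zip(limits, cumulative)]

for boundary, cf in ogive_points:
    print(boundary, cf)
```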