Chapter 3 Outline

Introduction
What is an average? It is a single number used to describe the central tendency of a set of data.
Examples of an average are:
• The average length of the school year for students in public schools in the United States is
180 days.
• The median salary of New York Yankees baseball players on opening day 2004 was
$3,100,000. (http://asp.usatoday.com/sports/baseball/salaries/mediansalaries.aspx?year=2004 )
• The median price of houses on Kelley’s Island went up 51% in five years to $208,250. (The
Blade, June 27, 2004)
• The 2004 average income for Computer Support Specialists was $20.50 per hour. (The
Census Bureau’s Income Statistics Branch, July, 2004)
• Computer Software Engineers’ pay averaged $36.65 per hour, while Sales Cashiers
averaged $7.58 per hour. (Bureau of Labor Statistics web site: http://stats.bls.gov, July, 2004)
There are several types of averages. We will consider five: the arithmetic mean, the median, the
mode, the weighted mean, and the geometric mean.
Measures of Location
The purpose of a measure of location is to pinpoint the center of a set of observations.
Measure of location: A single value that summarizes a set of data. It locates the
center of the values.
The arithmetic mean, or simply the mean, is the most widely used measure of location.
Mean: The sum of observations divided by the total number of observations.
The population mean is calculated as follows:
\[ \text{Population mean} = \frac{\text{Sum of all values in the population}}{\text{Number of values in the population}} \]
In terms of symbols, the formula for the mean of a population is:
Population Mean
\[ \mu = \frac{\Sigma X}{N} \]   [3-1]
Where:
\(\mu\) represents the population mean. It is the Greek letter “mu.”
\(N\) is the number of items in the population.
\(X\) is any particular value.
\(\Sigma\) indicates the operation of adding all the values. It is the Greek letter “sigma.”
\(\Sigma X\) is the sum of the X values.
[3-1] indicates the formula number from the text.
Any measurable characteristic of a population is called a parameter.
Parameter: A characteristic of a population.
The Sample Mean
As explained in Chapter 1, we frequently select a sample from the population to find out
something about a specific characteristic of the population.
The mean of a sample and the mean of a population are computed in the same way, but the
shorthand notation is different.
In terms of symbols, the formula for the mean of a sample is:
Sample Mean
\[ \bar{X} = \frac{\Sigma X}{n} \]   [3-2]
Where:
\(\bar{X}\) is the sample mean; it is read “X bar.”
\(n\) is the number of values in the sample.
\(X\) is a particular value.
\(\Sigma\) indicates the operation of adding all the values.
\(\Sigma X\) is the sum of the X values.
[3-2] is the formula number from the text.
The mean of a sample, or any other measure based on sample data, is called a statistic.
Statistic: A characteristic of a sample.
“The mean weight of a sample of laptop computers is 6.5 pounds,” is an example of a statistic.
In formulas [3-1] and [3-2] the mean is calculated by summing the observations and dividing by
the total number of observations.
Suppose the Kellogg Company’s quarterly earnings per share for the last five quarters are: $0.89,
$0.77, $1.05, $0.79, and $0.95. If the earnings are a population, the mean is found by:
\[ \mu = \frac{\Sigma X}{N} = \frac{\$0.89 + \$0.77 + \$1.05 + \$0.79 + \$0.95}{5} = \frac{\$4.45}{5} = \$0.89 \]
The mean quarterly earnings per share is $0.89.
In some situations the mean may not be representative of the data.
As an example, the annual salaries of five vice presidents at AVX, LLC are $90,000, $92,000,
$94,000, $98,000, and $350,000. The mean is:
\[ \mu = \frac{\Sigma X}{N} = \frac{\$90{,}000 + \$92{,}000 + \$94{,}000 + \$98{,}000 + \$350{,}000}{5} = \frac{\$724{,}000}{5} = \$144{,}800 \]
Notice how the one extreme value ($350,000) pulled the mean upward. Four of the five vice
presidents earned less than the mean, raising the question whether the arithmetic mean value of
$144,800 is typical of the salary of the five vice presidents.
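The two calculations above can be checked with a short Python sketch. The function name `population_mean` is an illustrative choice, not something defined in the text.

```python
def population_mean(values):
    """Return the sum of the values divided by the number of values."""
    return sum(values) / len(values)

# Kellogg quarterly earnings per share, treated as a population
earnings = [0.89, 0.77, 1.05, 0.79, 0.95]
# Annual salaries of the five AVX vice presidents
salaries = [90_000, 92_000, 94_000, 98_000, 350_000]

print(round(population_mean(earnings), 2))   # 0.89
print(population_mean(salaries))             # 144800.0

# Count how many salaries fall below the mean to see the pull of the extreme value
below_mean = [s for s in salaries if s < population_mean(salaries)]
print(len(below_mean))                       # 4
```

Four of the five salaries fall below the computed mean, which is the effect of the single $350,000 value described above.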
Properties of the Mean
As stated, the mean is a widely used measure of location. It has several important properties.
1. Every set of interval level and ratio level data has a mean.
2. All the data values are included in the calculation.
3. A set of data has only one mean, that is, the mean is unique.
4. The mean is a useful measure for comparing two or more populations.
5. The sum of the deviations of each value from the mean will always be zero, that is:
\[ \Sigma (X - \bar{X}) = 0 \]
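The last property follows directly from the definition of the mean. As a brief sketch for a sample of n values, using the fact that the sum of the values equals n times the mean:

```latex
\[
\Sigma (X - \bar{X}) = \Sigma X - n\bar{X} = \Sigma X - \Sigma X = 0
\]
```

For instance, with the illustrative values 3, 8, and 4 (not from the text), the mean is 5 and the deviations are -2, +3, and -1, which sum to zero.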
Weighted Mean
The weighted mean is a special case of the arithmetic mean. It is often useful when there are
several observations of the same value.
Weighted mean: The value of each observation is multiplied by the number of
times it occurs. The sum of these products is divided by the total number of
observations to determine the weighted mean.
In general, the weighted mean of a set of values, designated \(X_1, X_2, X_3, \ldots, X_n\), with the
corresponding weights \(w_1, w_2, w_3, \ldots, w_n\), is computed by:
Weighted Mean
\[ \bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + w_3 X_3 + \cdots + w_n X_n}{w_1 + w_2 + w_3 + \cdots + w_n} \]   [3-3]
The weighted mean is particularly useful when various classes or groups contribute differently to
the total. For example, a small accounting firm consists of administrative assistants who are paid
$12 per hour, financial assistants who earn $15 per hour, and tax examiners who earn $24 per
hour.
To say the average hourly wage for the firm is $17 per hour, found by ($12 + $15 + $24) ÷ 3, would
not be accurate unless there was the same number of people in each group.
Suppose the accounting firm has ten employees: two administrative assistants who earn $12 per
hour, three financial assistants who earn $15 per hour, and five tax examiners who earn $24 per hour.
The weighted mean is:
\[ \bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + w_3 X_3 + \cdots + w_n X_n}{w_1 + w_2 + w_3 + \cdots + w_n} = \frac{(2 \times \$12) + (3 \times \$15) + (5 \times \$24)}{2 + 3 + 5} = \frac{\$24 + \$45 + \$120}{10} = \frac{\$189}{10} = \$18.90 \]
Thus the weighted mean is $18.90.
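As a minimal sketch of formula [3-3], the accounting firm example can be reproduced in Python. The names `weighted_mean`, `hourly_rates`, and `head_counts` are illustrative and not part of the text.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    total = sum(w * x for x, w in zip(values, weights))
    return total / sum(weights)

hourly_rates = [12, 15, 24]   # dollars per hour for each job group
head_counts = [2, 3, 5]       # number of employees in each group (the weights)

print(weighted_mean(hourly_rates, head_counts))   # 18.9
```

Here the weights are the head counts, so each hourly rate contributes in proportion to the number of employees who earn it.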
The Median
It was pointed out that the arithmetic mean is often not representative of data with extreme
values. The median is a useful measure when we encounter data with an extreme value.
Median: The midpoint of the values after all observations have been ordered
from the smallest to the largest, or from largest to smallest.
To determine the median, the values are ordered from low to high, or high to low, and the middle
value is selected. Hence, half the observations are above the median and half are below it. For
example, for five executive incomes of $40,000, $42,000, $44,000, $48,000, and $300,000, the
middle value, $44,000, is the median.
$40,000   $42,000   $44,000 (median)   $48,000   $300,000
The median of $44,000 is a more representative value in this problem than the mean of $94,800.
Note that there were an odd number of executive incomes (5). For an odd number of ungrouped
values we just order them and select the middle value. To determine the median of an even
number of ungrouped values, the first step is to arrange them from low to high as usual, and then
determine the value half way between the two middle values.
As an example, the number of bronze castings produced in a day at Markey Bronze is 87, 62, 91,
58, 99, and 85. Ordering these from low to high:
58   62   85   87   91   99
The two middle values are 85 and 87.
The median number produced is halfway between the two middle values of 85 and 87. The
median is 86. Thus we note that the median (86) may not be one of the values in a set of data.
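The odd and even cases described above can be sketched in a few lines of Python. The function below is a hand-written illustration; Python's standard `statistics.median` should give the same results.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: halfway between the two

incomes = [40_000, 42_000, 44_000, 48_000, 300_000]   # five executive incomes
castings = [87, 62, 91, 58, 99, 85]                   # six daily production counts

print(median(incomes))    # 44000
print(median(castings))   # 86.0
```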
Properties of the Median
The major properties of the median are:
1. The median is a unique value, that is, like the mean, there is only one median for a set of
data.
2. It is not influenced by extremely large or small values and is therefore a valuable measure
of location when such values do occur.
3. It can be computed for ratio level, interval level, and ordinal-level data.
4. Fifty percent of the observations are greater than the median and fifty percent of the
observations are less than the median.
The Mode
A third measure of location is the mode.
Mode: The value of the observation that appears most frequently.
The mode is the value that occurs most often in a set of raw data. The dividends per share
declared on five stocks were: $3, $2, $4, $5, and $4. Since $4 occurred twice (the most frequent
value), the mode is $4.
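A short Python sketch of finding the mode follows. It returns a list, on the assumption that a data set may have one mode, several modes, or none; the function name `modes` and the second and third data sets are illustrative only.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or an empty list if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []   # every value appears once, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)

dividends = [3, 2, 4, 5, 4]     # dividends per share from the example
print(modes(dividends))          # [4]
print(modes([1, 1, 2, 2, 3]))    # [1, 2]  (hypothetical bimodal data)
print(modes([1, 2, 3]))          # []      (hypothetical data with no mode)
```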
Properties of the Mode
1. The mode can be found for all levels of data (nominal, ordinal, interval, and ratio).
2. The mode is not affected by extremely high or low values.
3. A set of data can have more than one mode. If it has two modes, it is said to be bimodal.
4. A disadvantage is that a set of data may not have a mode because no value appears more
than once.
The Relative Positions of the Mean, Median, and Mode
The mean, median, and mode of a set of data are usually not all equal. However, if they are
identical, the distribution is a symmetrical distribution.
Symmetrical distribution: A distribution that has the same shape on either side of
the median.
The accompanying chart shows the useful life of a sample of batteries used in a CD player. Note
the symmetrical bell shape of the distribution. In a symmetrical distribution the mean, median,
and mode are equal.
[Chart: Symmetric Distribution. Horizontal axis: Hours of Useful Life; vertical axis: Number.]
If the distribution is not symmetrical, it is skewed and the relationship between the mean,
median, and mode changes. If the long tail is to the right, the distribution is said to be a
positively skewed distribution.
Positively skewed distribution: The long tail is to the right; that is, in the positive
direction. The mean is larger than the median or the mode.
The accompanying chart shows the years of service for a group of employees at an old
manufacturing plant that was revitalized with a new product line and experienced a hiring surge
about 13 years ago. It is a positively skewed distribution. The mean is larger than the median,
which is larger than the mode.
[Chart: Positive Skewness. Horizontal axis: Years of Service; vertical axis: Number.]
For a negatively skewed distribution the mean is the smallest of the three measures of central
tendency (because it is being pulled down by the small observations). The mode is the highest of
the three measures.
Negatively skewed distribution: The long tail is to the left or in the negative
direction. The mean is smaller than the median or mode.
The accompanying chart shows the years of service for a group of teachers in a school system that
has an experienced staff and has not hired many staff in recent years. The mean is smaller than
the median, which is smaller than the mode.
[Chart: Negative Skewness. Horizontal axis: Years of Service; vertical axis: Number.]
In skewed distributions the mode always appears at the apex or top (highest point) on the curve
and the mean is pulled in the direction of the tail. The median always appears between the mode
and the mean, regardless of the direction of the tail.
The Geometric Mean
The geometric mean is used to determine the mean percent increase from one period to another. It
is also used in finding the average of ratios, indexes, and growth rates.
Geometric mean: The nth root of the product of n values.
The formula for finding the geometric mean is:
Geometric Mean
\[ GM = \sqrt[n]{(X_1)(X_2)(X_3) \cdots (X_n)} \]   [3-4]
Where:
\(X_1, X_2, X_3\), etc. are data values.
\(n\) is the number of values.
\(\sqrt[n]{\ }\) is the nth root.
The geometric mean can be used for averaging percents. Suppose the return on investment for
McDermoll International for the past 4 years is 0.4%, 2.9%, 2.1%, and 12.3%. The GM increase
over the period is 4.3 percent, found as follows. (Note that 1.004 represents the 0.4% return on
investment plus the original investment of 1.000. The other returns are treated the same way.)
\[ GM = \sqrt[n]{(X_1)(X_2)(X_3) \cdots (X_n)} = \sqrt[4]{(1.004)(1.029)(1.021)(1.123)} = \sqrt[4]{1.18455} = 1.043 \]
The geometric mean is the fourth root of 1.18455, which is 1.043. The average return on the
investment is found by subtracting one from the geometric mean. (1.043 – 1.000) = 0.043 = 4.3%.
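The same calculation can be sketched in Python. The helper name `geometric_mean` is illustrative; it uses `math.prod`, which is available in Python 3.8 and later.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))

returns_pct = [0.4, 2.9, 2.1, 12.3]                    # yearly returns in percent
growth_factors = [1 + r / 100 for r in returns_pct]    # 1.004, 1.029, 1.021, 1.123

gm = geometric_mean(growth_factors)
print(round(gm, 3))               # 1.043
print(round((gm - 1) * 100, 1))   # 4.3
```

Each percent return is first converted to a growth factor (1 plus the return), and one is subtracted at the end to convert the result back to a rate.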
Another application of the geometric mean is to find the average percent increase over a period of
time. Text formula [3-5] is used:
Average Percent Increase Over Time
\[ GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at beginning of period}}} - 1 \]   [3-5]
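A minimal Python sketch of formula [3-5] follows. It assumes that n is the number of periods between the beginning and ending values, which this outline does not state explicitly, and the figures used are hypothetical rather than taken from the text.

```python
def average_percent_increase(start_value, end_value, n_periods):
    """Formula [3-5]: n-th root of (end value / beginning value), minus 1."""
    return (end_value / start_value) ** (1 / n_periods) - 1

# Hypothetical figures: a value that grows from 100 to 121 over 2 periods
rate = average_percent_increase(100, 121, 2)
print(round(rate * 100, 1))   # 10.0
```

An average increase of 10 percent per period is consistent with these figures, since 100 × 1.10 × 1.10 = 121.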
Why Study Dispersion?
A direct comparison of two sets of data based only on two measures of location such as the mean
and the median can be misleading since an average does not tell us anything about the spread of
the data.
For example, the mean salary paid to baseball players for the New York Yankees is $6,568,757.
However, the range is $21,697,450, with a low of $302,550 and a high of $22,000,000. The
Tampa Devil Rays have a mean salary of $1,094,681. The range is $7,197,500, with a low of
$302,500 and a high of $7,500,000. (http://espn.go.com/mlb/clubhouses/salaries/2004).
Suppose a statistics instructor has two classes, one in the morning and one in the evening; each
with six students. In the morning class (AM) the students’ ages are 18, 20, 21, 21, 23, and 23
years. In the evening class (PM) the ages are 17, 17, 18, 20, 25, and 29 years. Note that for both
classes the mean age is 21 years but there is more variation or dispersion in the ages of the
evening students.
A small value for a measure of dispersion indicates that the data are clustered closely, say, around
the arithmetic mean. Thus the mean is considered representative of the data, that is, it is reliable.
Conversely, a large measure of dispersion indicates that the mean is not reliable and is not
representative of the data.
Measures of Dispersion
We will consider several measures of dispersion: the range, the mean deviation, the variance,
and the standard deviation.
Range
The simplest measure of dispersion is the range.
Range: The difference between the largest and smallest values in a data set.
The formula for range is:
Range
Range = Largest value - Smallest value   [3-6]
The statistics instructor referred to above has two classes with the ages indicated:
A.M. Class: 18, 20, 21, 21, 23, 23
P.M. Class: 17, 17, 18, 20, 25, 29
The range for the classes is:
A.M. Class: (23 - 18) = 5
P.M. Class: (29 - 17) = 12
Thus we can say that there is more spread in the ages of the students enrolled in the evening
(P.M.) class compared with the morning (A.M.) class.
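The range comparison for the two classes can be sketched in Python as follows. The function name `value_range` is an illustrative choice.

```python
def value_range(values):
    """Largest value minus smallest value, as in formula [3-6]."""
    return max(values) - min(values)

am_ages = [18, 20, 21, 21, 23, 23]
pm_ages = [17, 17, 18, 20, 25, 29]

print(value_range(am_ages))   # 5
print(value_range(pm_ages))   # 12
```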
The characteristics of the range are:
• Only two values are used in the calculation.
• It is influenced by extreme values.
• It is easy to compute and understand.
• It can be distorted by an extreme value.
The range has two disadvantages. It can be distorted by a single extreme
value. Suppose the same statistics instructor has a third class of five
students. The ages of these students are given in the table.
Ages of Students
20 20 21 22 60
The range of ages is 40 years, yet four of the five students’ ages are within two years of each
other. The 60-year-old student has distorted the spread. Another disadvantage is that only two
values, the largest and the smallest, are used in its calculation.
Mean Deviation
In contrast to the range, the mean deviation considers all the data.
Mean Deviation: The arithmetic mean of the absolute values of the deviations
from the arithmetic mean.
In terms of symbols, the formula for the mean deviation is:
Mean Deviation
\[ MD = \frac{\Sigma |X - \bar{X}|}{n} \]   [3-7]
Where:
\(X\) is the value of each observation.
\(\bar{X}\) is the arithmetic mean of the values.
\(n\) is the number of observations in the sample.
\(|\ |\) indicates the absolute value.
We take the absolute value of the deviations from the mean because if we didn’t, the positive and
negative deviations from the mean exactly offset each other, and the mean deviation would
always be zero. Such a measure (zero) would be a useless statistic.
The mean deviation is computed by first determining the difference between each observation and
the mean. These differences are then averaged without regard to their signs. For the PM
statistics class the mean deviation is 4.0 years, found by the following table:

Age \(X\)    Deviation \(X - \bar{X}\)    Absolute deviation \(|X - \bar{X}|\)
17           17 - 21 = -4                 4
17           17 - 21 = -4                 4
18           18 - 21 = -3                 3
20           20 - 21 = -1                 1
25           25 - 21 = +4                 4
29           29 - 21 = +8                 8
                                          Total = 24

Then
\[ MD = \frac{\Sigma |X - \bar{X}|}{n} = \frac{24}{6} = 4.0 \]
The parallel lines | | indicate absolute value. To interpret, 4.0 years is the mean amount by which
the ages differ from the arithmetic mean age of 21.0 years for the PM students.
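The mean deviation calculation can be sketched in Python. The function name `mean_deviation` is illustrative, and the AM class result is computed here rather than quoted from the text.

```python
def mean_deviation(values):
    """Arithmetic mean of the absolute deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

pm_ages = [17, 17, 18, 20, 25, 29]
am_ages = [18, 20, 21, 21, 23, 23]

print(mean_deviation(pm_ages))             # 4.0
print(round(mean_deviation(am_ages), 2))   # 1.33
```

The smaller value for the AM class agrees with the earlier observation that the morning students' ages are less dispersed.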
The major characteristics of the mean deviation are:
1. All the observations are used in the calculations.
2. It is easy to understand.
The mean deviation has a disadvantage because of the use of the absolute values. Generally,
absolute values are difficult to work with, so the mean deviation is not used as often as other
measures of dispersion.
Variance and Standard Deviation
The disadvantage of the mean deviation is that the absolute values are difficult to manipulate
mathematically. Squaring the difference between each value and the mean eliminates the problem
of absolute values. These squared differences are used in the computation of both the variance
and the standard deviation.
Variance: The arithmetic mean of the squared deviations from the mean.
The variance is non-negative and is zero only if all observations are the same.
Standard Deviation: The square root of the variance
Squaring units of measurement, such as dollars or years, makes the variance cumbersome to use
since it yields units like “dollars squared” or “years squared.” However, by calculating the
standard deviation, which is the positive square root of the variance, we can return to the original
units, such as years or dollars. Because the standard deviation is easier to interpret, it is more
widely used than the mean deviation or the variance.
Population Variance
The formulas for the population variance and the sample variance are slightly different. The
formula for the population variance is:
Population Variance
\[ \sigma^2 = \frac{\Sigma (X - \mu)^2}{N} \]   [3-8]
Where:
\(\sigma^2\) is the symbol for the population variance (\(\sigma\) is the Greek letter sigma). It is
usually referred to as “sigma squared.”
\(X\) is a value of an observation in the population.
\(\mu\) is the arithmetic mean of the population.
\(N\) is the total number of observations in the population.
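The major characteristics of the variance are:
1. All the observations are used in the calculations.
2. Because every observation enters the calculation, it does not depend on the two extreme values
alone, as the range does. Extreme observations still affect it, since the deviations are squared.
3. The units are somewhat difficult to work with. (They are the original units squared.)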
Population Standard Deviation
The population standard deviation is the square root of the population variance. The formula for
the population standard deviation is:
Population Standard Deviation
\[ \sigma = \sqrt{\frac{\Sigma (X - \mu)^2}{N}} \]   [3-9]
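Formulas [3-8] and [3-9] can be sketched in Python. The outline gives no worked example here, so the sketch reuses the PM class ages and treats them as a complete population, which is an assumption made only for illustration.

```python
import math

def population_variance(values):
    """Mean of the squared deviations from the population mean (formula [3-8])."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def population_std_dev(values):
    """Positive square root of the population variance (formula [3-9])."""
    return math.sqrt(population_variance(values))

pm_ages = [17, 17, 18, 20, 25, 29]   # assumed here to be the whole population

print(round(population_variance(pm_ages), 2))   # 20.33 (years squared)
print(round(population_std_dev(pm_ages), 2))    # 4.51 (years)
```

Taking the square root returns the measure to the original units, years in this case.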
Sample Variance
The conversion of the population variance formula to the sample variance formula is not as direct
as the change made when we went from the population mean formula to the sample mean
formula. Recall in that instance we replaced \(\mu\) with \(\bar{X}\) and \(N\) with \(n\).
The conversion from population variance to sample variance requires a change in the
denominator. Instead of substituting n, the number in the sample, for N, the number in the
population, we replace N with (n – 1). Thus the formula for the sample variance is:
Sample Variance
\[ s^2 = \frac{\Sigma (X - \bar{X})^2}{n - 1} \]   [3-10]
Where:
\(s^2\) is the symbol for the sample variance. It is pronounced as “s squared.”
\(X\) is the value of each observation in the sample.
\(\bar{X}\) is the mean of the sample.
\(n\) is the total number of observations in the sample.
Changing the denominator to (n – 1) may seem insignificant. However, using n in the denominator
tends to underestimate the population variance. The use of (n – 1) in the denominator provides an
appropriate correction factor.
Sample Standard Deviation
The sample standard deviation is used as an estimator of the population standard deviation. The
sample standard deviation is the square root of the sample variance. The formula is:
Standard Deviation
\[ s = \sqrt{\frac{\Sigma (X - \bar{X})^2}{n - 1}} \]   [3-11]
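Formulas [3-10] and [3-11] can be sketched in Python in the same way. The PM class ages are treated here as a sample, an assumption made only for illustration, so the divisor is n - 1 rather than n.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1 (formula [3-10])."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_std_dev(values):
    """Square root of the sample variance (formula [3-11])."""
    return math.sqrt(sample_variance(values))

pm_ages = [17, 17, 18, 20, 25, 29]   # assumed here to be a sample

print(sample_variance(pm_ages))            # 24.4
print(round(sample_std_dev(pm_ages), 2))   # 4.94
```

The sum of squared deviations for these ages is 122. Dividing by n - 1 = 5 gives 24.4, whereas dividing by n = 6 would give about 20.33, the smaller value that the (n - 1) correction is meant to offset.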
Interpretation and Uses of the Standard Deviation
The standard deviation is used to measure the spread of the data. A small standard deviation
indicates that the data are clustered close to the mean; thus the mean is representative of the
data. A large standard deviation indicates that the data are spread out from the mean and the mean
is not representative of the data.
Chebyshev’s Theorem
We can use Chebyshev’s theorem to determine the percent of the values that lie within a specified
number of standard deviations of the mean.
Chebyshev’s theorem: For any set of observations (sample or population), the
proportion of the values that lie within k standard deviations of the mean is at
least \(1 - 1/k^2\), where k is any constant greater than 1.
The theorem holds for any set of observations regardless of the shape of the distribution.
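As a short illustration of the theorem, the minimum proportion can be evaluated for a few values of k. The function name `chebyshev_lower_bound` is an illustrative choice.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k) * 100, 1))
# 2 75.0
# 3 88.9
```

So at least 75 percent of any set of observations lies within two standard deviations of the mean, and at least 88.9 percent lies within three.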
The Empirical Rule
Chebyshev’s theorem is concerned with any set of values; that is, the distribution of values can
have any shape. If the distribution is approximately symmetrical and bell shaped, then the
Empirical Rule, or Normal Rule as it is often called, is applied.
Empirical Rule: For a symmetrical, bell-shaped frequency distribution,
approximately 68 percent of the observations will lie within plus and minus one
standard deviation of the mean; about 95 percent of the observations will lie
within plus and minus two standard deviations of the mean; and practically all
(99.7 percent) will lie within plus and minus three standard deviations of the
mean.
The rule states that:
• The mean, plus and minus one standard deviation, will include about 68% of the
observations.
• The mean, plus and minus two standard deviations, will include about 95% of the
observations.
• The mean, plus and minus three standard deviations, will include about 99.7% of the
observations.
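The intervals described by the Empirical Rule can be sketched in Python. The mean of 100 and standard deviation of 10 are hypothetical values, not from the text, and the rule applies only when the distribution is approximately symmetrical and bell shaped.

```python
def empirical_rule_intervals(mean, std_dev):
    """Map each approximate coverage percent to its (low, high) interval."""
    coverage = {1: 68.0, 2: 95.0, 3: 99.7}   # standard deviations -> approximate percent
    return {pct: (mean - k * std_dev, mean + k * std_dev)
            for k, pct in coverage.items()}

# Hypothetical mean and standard deviation, chosen only for illustration
for pct, (low, high) in empirical_rule_intervals(100, 10).items():
    print(pct, low, high)
# 68.0 90 110
# 95.0 80 120
# 99.7 70 130
```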