Introduction

MENDMC01_0131877062.QXD
3/29/06
11:48 PM
Page 1
MARKED SET
CHAPTER
1
Introduction
OBJECTIVE
To identify the role of statistics in the analysis of data from
engineering and the sciences
CONTENTS
•
•
•
1.1
Statistics: The Science of Data
1.2
Fundamental Elements of Statistics
1.3
Types of Data
1.4
The Role of Statistics in Critical Thinking
1.5
A Guide to Statistical Methods Presented in This Text
STATISTICS IN ACTION
Contamination of Fish in the Tennessee River: Collecting
the Data
1
MENDMC01_0131877062.QXD
3/29/06
11:48 PM
Page 2
MARKED SET
2 Chapter 1 Introduction
1.1 Statistics: The Science of Data
According to The Random House College Dictionary, statistics is “the science that
deals with the collection, classification, analysis, and interpretation of numerical facts
or data.” In short, statistics is the science of data.
Definition 1.1
Statistics is the science of data. This involves collecting, classifying, summarizing,
organizing, analyzing, and interpreting data.
The science of statistics is commonly applied to two types of problems:
1. Summarizing, describing, and exploring data
2. Using sample data to infer the nature of the data set from which the sample was
selected
As an illustration of the descriptive applications of statistics, consider the United
States census, which involves the collection of a data set that purports to characterize
the socioeconomic characteristics of the approximately 295 million people living in the
United States. Managing this enormous mass of data is a problem for the computer
software engineer, and describing the data utilizes the methods of statistics. Similarly,
an environmental engineer uses statistics to describe the data set consisting of the daily
emissions of sulfur oxides of an industrial plant recorded for 365 days last year. The
branch of statistics devoted to these applications is called descriptive statistics.
Definition 1.2
The branch of statistics devoted to the organization, summarization, and description of data sets is called descriptive statistics.
Sometimes the phenomenon of interest is characterized by a data set that is either
physically unobtainable or too costly or time-consuming to obtain. In such situations,
we obtain a subset of the data—called a sample—and use the sample information to
infer its nature. To illustrate, suppose the phenomenon of interest is the drinking-water
quality on an inhabited, but remote, Pacific island. You might expect water quality to
depend on such factors as temperature of the water, the level of the most recent rainfall, etc. In fact, if you were to measure the water quality repeatedly within the same
hour at the same location, the quality measurements would vary, even for the same
water temperature. Thus, the phenomenon “drinking-water quality” is characterized
by a large data set that consists of many (actually, an infinite number of) water quality
measurements—a data set that exists only conceptually. To determine the nature of
this data set, we sample it—i.e., we record quality for n water specimens collected at
specified times and locations, and then use this sample of n quality measurements to
infer the nature of the large conceptual data set of interest. The branch of statistics
used to solve this problem is called inferential statistics.
Definition 1.3
The branch of statistics concerned with using sample data to make an inference
about a large set of data is called inferential statistics.
1.2 Fundamental Elements of Statistics
In statistical terminology, the data set that we want to describe, the one that characterizes a phenomenon of interest to us, is called a population. Then, we can define a
sample as a subset of data selected from a population. Sometimes, the words
population and sample are used to represent the objects upon which the measurements
MENDMC01_0131877062.QXD
3/29/06
11:48 PM
Page 3
MARKED SET
1.2 Fundamental Elements of Statistics
3
are taken (i.e., the experimental units). In a particular study, the meaning attached to
these terms will be clear by the context in which they are used.
Definition 1.4
A statistical population is a data set (usually large, sometimes conceptual) that
is our target of interest.
Definition 1.5
A sample is a subset of data selected from the target population.
Definition 1.6
The object (e.g., person, thing, transaction, specimen, or event) upon which
measurements are collected is called the experimental unit. (Note: A population consists of data collected on many experimental units.)
In studying populations and samples, we focus on one or more characteristics or
properties of the experimental units in the population. The science of statistics refers
to these characteristics as variables. For example, in the drinking-water quality study,
two variables of interest to engineers are the chlorine-residual (measured in parts per
million) and the number of fecal coliforms in a 100-milliliter water specimen.
Definition 1.7
A variable is a characteristic or property of an individual experimental unit.
Example 1.1
Engineers with the University of Kentucky Transportation Research Program
have collected data on accidents occurring at intersections in Lexington, Kentucky. One of the goals of the study was to estimate the rate at which left-turn
accidents occur at intersections without left-turn-only lanes. This estimate will
be used to develop numerical warrants (or guidelines) for the installation of leftturn lanes at all major Lexington intersections. The engineers collected data at
each of 50 intersections without left-turn-only lanes over a 1-year period. At
each intersection, they monitored traffic and recorded the total number of cars
turning left that were involved in an accident.
a. Identify the variable and experimental unit for this study.
b. Describe the target population and the sample.
c. What inference do the transportation engineers want to make?
Solution
a. Since the engineers collected data at each of 50 intersections, the experimental unit
is an intersection without a left-turn-only lane. The variable measured is the total
number of cars turning left that were involved in an accident.
b. The goal of the study is to develop guidelines for the installation of left-turn lanes
at all major Lexington intersections; consequently, the target population consists of
all major intersections in the city. The sample consists of the subset of 25 intersections monitored by the engineers.
c. The engineers will use the sample data to estimate the rate at which left-turn accidents occur at all major Lexington intersections. (We learn, in Chapter 7, that this
estimate is the number of left-turn accidents in the sample divided by the total
number of cars making left turns in the sample.)
The preceding definitions and example identify four of the five elements of an inferential statistical problem: a population, one or more variables of interest, a sample,
and an inference. The fifth element pertains to knowing how good the inference is—
that is, the reliability of the inference. The measure of reliability that accompanies an
inference separates the science of statistics from the art of fortune-telling. A palm
MENDMC01_0131877062.QXD
3/29/06
11:48 PM
Page 4
MARKED SET
4 Chapter 1 Introduction
reader, like a statistician, may examine a sample (your hand) and make inferences
about the population (your future life). However, unlike statistical inferences, the
palm reader’s inferences include no measure of how likely the inference is to be true.
To illustrate, consider the transportation engineers’ estimate of the left-turn accident rate at Lexington, Kentucky, intersections in Example 1.1. The engineers are interested in the error of estimation (i.e., the difference between the sample accident rate
and the accident rate for the target population). Using statistical methods, we can determine a bound on the estimation error. This bound is simply a number (e.g., 10%)
that our estimation error is not likely to exceed. In later chapters, we learn that this
bound is used to help measure our “confidence” in the inference. The reliability of statistical inferences is discussed throughout this text. For now, simply realize that an inference is incomplete without a measure of reliability.
Definition 1.8
A measure of reliability is a statement (usually quantified) about the degree of
uncertainty associated with a statistical inference.
A summary of the elements of both descriptive and inferential statistical problems
is given in the following boxes.
Four Elements of Descriptive Statistical Problems
1. The population or sample of interest
2. One or more variables (characteristics of the population or sample units) that are
to be investigated
3. Tables, graphs, or numerical summary tools
4. Identification of patterns in the data
Five Elements of Inferential Statistical Problems
1. The population of interest
2. One or more variables (characteristics of the experimental units) that are to be
investigated
3. The sample of experimental units
4. The inference about the population based on information contained in the sample
5. A measure of reliability for the inference
Applied Exercises
1.1
of Ottawa, is a collection of publicly available data sets
to serve researchers in building prediction software models. A PROMISE data set on software reuse, saved in the
SWREUSE file, provides information on the success or
failure of reusing previously developed software for each
project in a sample of 24 new software development
projects. (Data source: IEEE Transactions on Software
Engineering, Vol. 28, 2002.) Of the 24 projects, 9 were
judged failures and 15 were successfully implemented.
a. Identify the experimental units for this study.
b. Describe the population from which the sample is
selected.
c. Use the sample information to make an inference about
the population.
Steel anticorrosion study. Researchers at the Department
of Materials Science and Engineering, National Technical
University (Athens, Greece), examined the anticorrosive
behavior of different epoxy coatings on steel. (Pigment &
Resin Technology, Vol. 32, 2003.) Flat panels cut from
steel sheets were coated with one of four different types of
epoxy (S1, S2, S3, and S4). After exposing the panels to
water for one day, the corrosion rate (nanoamperes per
square centimeter) was determined for each panel.
a. What are the experimental units for the study?
b. Suppose you are interested in describing only the corrosion rates of steel panels coated with epoxy type S1.
Define the target population and relevant sample.
SWREUSE
1.2
Success/failure of software reuse. The PROMISE Soft-
ware Engineering Repository, hosted by the University
1.3
Orchard contamination from insecticides. Pesticides ap-
plied to an extensively grown crop can result in inadvertent
MENDMC01_0131877062.QXD
3/29/06
11:48 PM
Page 5
MARKED SET
1.3 Types of Data
market. The system employs an air and water mixture designed to yield effective cooling with a much lower water
flow than conventional hydrocooling. To compare the effectiveness of the two systems, 20 batches of green tomatoes
were divided into two groups; one group was precooled
with the new method, and the other with the conventional
method. The water flow (in gallons) required to effectively
cool each batch was recorded.
a. Identify the population, the samples, and the type of
statistical inference to be made for this problem.
b. How could the sample data be used to compare the
cooling effectiveness of the two systems?
ambient air contamination. Environmental Science & Technology (Oct. 1993) reported on thion residues of the insecticide chlorpyrifos used on dormant orchards in the San
Joaquin Valley, California. Ambient air specimens were
collected daily at an orchard site during an intensive period
of spraying—a total of 13 days—and the thion level
(ng/m3) was measured each day.
a. Identify the population of interest to the researchers.
b. Identify the sample.
1.4
Ground motion of earthquakes. In the Journal of Earth-
quake Engineering (Nov., 2004), a team of civil and
environmental engineers studied the ground motion characteristics of 15 earthquakes that occurred around the
world between 1940 and 1995. Three (of many) variables
measured on each earthquake were the type of ground
motion (short, long, or forward directive), earthquake
magnitude (Richter scale) and peak ground acceleration
(feet per second). One of the goals of the study was to estimate the inelastic spectra of any ground motion cycle.
a. Identify the experimental units for this study.
b. Do the data for the 15 earthquakes represent a population or a sample? Explain.
1.5
1.6
COGAS
1.7
Weekly carbon monoxide data. The World Data Centre for
Greenhouse Gases collects and archives data for greenhouse and related gases in the atmosphere. One such data
set lists the level of carbon monoxide gas (measured in
parts per billion) in the atmosphere each week at the Cold
Bay, Alaska, weather station. The weekly data for the
years 2000–2002 are saved in the COGAS file.
a. Identify the variable measured and the corresponding
experimental unit.
b. If you are interested in describing only the weekly carbon monoxide values at Cold Bay station for the years
2000–2002, does the data represent a population or a
sample? Explain.
1.8
Monitoring defective items. Checking all manufactured
Computer power system load currents. Electrical engi-
neers recognize that high neutral current in computer
power systems is a potential problem. To determine the
extent of the problem, a survey of the computer power
system load currents at 146 U.S. sites was taken (IEEE
Transactions on Industry Applications, July/Aug. 1990).
The survey revealed that less than 10% of the sites had
high neutral to full-load current ratios.
a. Identify the population of interest.
b. Identify the sample.
c. Use the sample information to make an inference about
the population.
Precooling vegetables. Researchers have developed a new
precooling method for preparing Florida vegetables for
5
items coming off an assembly line for defectives would be
a costly and time-consuming procedure. One effective
and economical method of checking for defectives involves the selection and examination of a portion of the
items by a quality control engineer. The percentage of examined items that are defective is computed and then used
to estimate the percentage of all items manufactured on
the line that are defective. Identify the population, the
sample, and a type of statistical inference to be made for
this problem.
1.3 Types of Data
Data can be one of two types, quantitative or qualitative. Quantitative data are those
that represent the quantity or amount of something, measured on a numerical scale.
For example, the power frequency (measured in megahertz) of a semiconductor is a
quantitative variable, as is the breaking strength (measured in pounds per square inch)
of steel pipe. In contrast, qualitative (or categorical) data possess no quantitative interpretation. They can only be classified. The set of n occupations corresponding to a
group of n engineering graduates is a qualitative data set. The type of pigment (zinc or
mica) used in an anticorrosion epoxy coating also represents qualitative data.*
*A finer breakdown of data types into nominal, ordinal, interval, and ratio data is possible. Nominal data are
qualitative data with categories that cannot be meaningfully ordered. Ordinal data are also qualitative data,
but a distinct ranking of the groups from high to low exists. Interval and ratio data are two different types
of quantitative data. For most statistical applications (and all the methods presented in this introductory
text), it is sufficient to classify data as either quantitative or qualitative.
MENDMC01_0131877062.QXD
3/29/06
11:48 PM
Page 6
MARKED SET
6 Chapter 1 Introduction
Definition 1.9
Quantitative data are those that represent the quantity or amount of
something.
Definition 1.10
Qualitative data are those that have no quantitative interpretation, i.e., they
can only be classified into categories.
Example 1.2
The Journal of Performance of Constructed Facilities (Feb., 1990) reported on
the performance dimensions of water distribution networks in the Philadelphia
area. For one part of the study, the following variables were measured for each
sampled water pipe section. Identify the data produced by each as quantitative
or qualitative.
a. Pipe diameter (measured in inches)
b. Pipe material (steel or PVC)
c. Pipe location (Center City or suburbs)
d. Pipe length (measured in feet)
Solution
Both pipe diameter (in inches) and pipe length (in feet) are measured on a meaningful
numerical scale; hence, these two variables produce quantitative data. Both type of
pipe material and pipe location can only be classified—material is either steel or PVC;
location is either Center City or the suburbs. Consequently, pipe material and pipe location are both qualitative variables.
The proper statistical tool used to describe and analyze data will depend on
the type of data. Consequently, it is important to differentiate between quantitative
and qualitative data.
Applied Exercises
1.9
Drinking-water quality study. Disasters (Vol. 28, 2004) published a study of the effects of a tropical cyclone on the
quality of drinking water on a remote Pacific island. Water
samples (size 500 milliliters) were collected approximately
4 weeks after Cyclone Ami hit the island. The following
variables were recorded for each water sample. Identify
each variable as quantitative or qualitative.
a. Town where sample was collected
b. Type of water supply (river intake, stream, or borehole)
c. Acidic level (pH scale, 1 to 14)
d. Turbidity level (nephalometric turbidity units = NTUs)
e. Temperature (degrees Centigrade)
f. Number of fecal coliforms per 100 milliliters
g. Free chlorine-residual (milligrams per liter)
h. Presence of hydrogen sulphide (yes or no)
1.10 Extinct New Zealand birds. Environmental engineers at the
University of California (Riverside) are studying the patterns of extinction in the New Zealand bird population.
(Evolutionary Ecology Research, July 2003.) The following characteristics were determined for each bird species
that inhabited New Zealand at the time of the Maori colonization (i.e., prior to European Contact).
a. Flight capability (volant or flightless)
b. Habitat type (aquatic, ground terrestrial, or aerial
terrestrial)
c. Nesting site (ground, cavity within ground, tree, cavity
d.
e.
f.
g.
h.
above ground)
Nest density (high or low)
Diet (fish, vertebrates, vegetables, or invertebrates)
Body mass (grams)
Egg length (millimeters)
Extinct status (extinct, absent from island, present)
1.11 Computer power system load currents. Refer to the IEEE
Transactions on Industry Applications (July/Aug. 1990)
survey of computer power system load currents in Exercise
1.5. In addition to the ratio of neutral current to full-load
current, the researchers also recorded the type of load (lineto-line or line-to-neutral) and the computer system vendor.
Identify the type of data for each variable recorded.
1.12 CT scanning for lung cancer. A new type of screening for
lung cancer, computed tomography (CT), has been developed. Medical physicists believe CT scans are more sensitive than regular X-rays in pinpointing small tumors. The
H. Lee Moffitt Cancer Center at the University of South
MENDMC01_0131877062.QXD
3/29/06
11:48 PM
Page 7
MARKED SET
1.4 The Role of Statistics in Critical Thinking 7
Florida is currently conducting a clinical trial of 50,000
smokers nationwide to compare the effectiveness of CT
scans with X-rays for detecting lung cancer. (Todays’ Tomorrows, Fall 2002.) Each participating smoker is randomly assigned to one of two screening methods, CT or
chest X-ray, and their progress tracked over time. In addition to the type of screening method used, the physicists
recorded the age at which the scanning method first detects a tumor for each smoker.
a. Identify the experimental units of the study.
b. Identify the two variables measured for each experimental unit.
c. Identify the type (quantitative or qualitative) of the
variables measured.
d. What is the inference that will ultimately be drawn
from the clinical trial?
variables in the drilling process are described here. Identify
the data type for each variable.
a. Chip discharge rate (number of chips discarded per
minute)
b. Drilling depth (millimeters)
c. Oil velocity (millimeters per second)
d. Type of drilling (single-edge, BTA, or ejector)
e. Quality of hole surface
1.14 National Bridge Inventory. All highway bridges in the
1.13 Deep hole drilling. “Deep hole” drilling is a family of
drilling processes used when the ratio of hole depth to
hole diameter exceeds 10. Successful deep hole drilling
depends on the satisfactory discharge of the drill chip. An
experiment was conducted to investigate the performance
of deep hole drilling when chip congestion exists (Journal
of Engineering for Industry, May 1993). Some important
United States are inspected periodically for structural deficiency by the Federal Highway Administration (FHWA).
Data from the FHWA inspections are compiled into the
National Bridge Inventory (NBI). Several of the nearly
100 variables maintained by the NBI are listed below.
Classify each variable as quantitative or qualitative.
a. Length of maximum span (feet)
b. Number of vehicle lanes
c. Toll bridge (yes or no)
d. Average daily traffic
e. Condition of deck (good, fair, or poor)
f. Bypass or detour length (miles)
g. Route type (interstate, U.S., state, county, or city)
1.4 The Role of Statistics in Critical Thinking
Experimental research in engineering and the sciences typically involves the use of
experimental data—a sample—to infer the nature of some conceptual population that
characterizes a phenomenon of interest to the experimenter. This inferential process is
an integral part of the scientific method. Inference based on experimental data is first
used to develop a theory about some phenomenon. Then the theory is tested against
additional sample data.
How does the science of statistics contribute to this process? To answer this question, we must note that inferences based on sample data will almost always be subject
to error, because a sample will not provide an exact image of the population. The nature of the information provided by a sample depends on the particular sample chosen
and thus will change from sample to sample. For example, suppose you want to estimate the proportion of all steel alloy failures at U.S. petrochemical plants caused by
stress corrosion cracking. You investigate the cause of failure for a sample of 100 steel
alloy failures and find that 47 were caused by stress corrosion cracking. Does this
mean that exactly 47% of all steel alloy failures at petrochemical plants are caused by
stress corrosion cracking? Of course, the answer is “no.” Suppose that, unknown to
you, the true percentage of steel alloy failures caused by stress corrosion cracking is
44%. One sample of 100 failures might yield 47 that were caused by cracking, whereas
another sample of 100 might yield only 42. Thus, an inference based on sampling is
always subject to uncertainty.
On the other hand, suppose one petrochemical plant experienced a steel alloy failure rate of 81%. Is this an unusually high failure rate, given the sample rate of 47%?
The theory of statistics uses probability to measure the uncertainty associated with an
inference. It enables engineers and scientists to calculate the probabilities of observing
specific samples or data measurements, under specific assumptions about the population. These probabilities are used to evaluate the uncertainties associated with sample
inferences; for example, we can determine whether the plant’s steel alloy failure rate
MENDMC01_0131877062.QXD
3/29/06
11:48 PM
Page 8
MARKED SET
8 Chapter 1 Introduction
Descriptive
Probability
Reliability
Chapters
3, 4, 5, 6
Qualitative
Section
2.1
Data
Inferential
Study
Chapter
17
Quantitative
Sections
2.2–2.7
Quantitative
Qualitative
Data
Parameter
Parameter
Proportions
Model
relationships
One
proportion
Two
proportions
Three or
more
proportions
Chapters
10, 11, 12
Section 15.7
Sections
7.9, 8.11
16.6
Sections
7.10, 8.12
Chapter 9
Means
Variances
One
mean
Two
means
Three or
more means
One
variance
Two
variances
Sections
7.6, 8.7,
15.2, 16.3
Sections
7.7, 7.8,
8.9, 8.10
15.3, 15.4
Chapter 14,
Sections
15.5, 15.6
Sections
7.11, 8.13,
16.4
Sections
7.12, 8.14
FIGURE 1.1
Flowchart of statistical methods described in the text
of 81% is unusually high by calculating the chance of observing such a high rate given
the sample information.
Thus, a major contribution of statistics is that it enables engineers and scientists to
make inferences—estimates and decisions about the target population—with a known
measure of reliability. With this ability, an engineer can make intelligent decisions and inferences from data; that is, statistics helps engineers to think critically about their results.
MENDMC01_0131877062.QXD
3/29/06
11:48 PM
Page 9
MARKED SET
Quick Review 9
Definition 1.11
Statistical thinking involves applying rational thought and the science of statistics to critically assess data and inferences.
1.5 A Guide to Statistical Methods Presented in This Text
Although we present some useful methods for exploring and describing data sets
(Chapter 2), the major emphasis in this text and in modern statistics is in the area of inferential statistics. The flowchart in Figure 1.1 is provided as an outline of the chapters
in this text and as a guide to selecting the statistical method appropriate for your particular analysis.
Quick Review
Key Terms
Data
Descriptive statistics
Experimental unit
Inference
Inferential statistics
Measure of reliability
Measurement error
Population
Qualitative data
Quantitative data
Reliability
Sample
Statistical thinking
Statistics
Variable
Chapter Summary Notes
•
•
•
•
•
Two types of statistical applications: descriptive and inferential
Fundamental elements of statistics: population, experimental units, variable, sample, inference, measure of reliability
Descriptive statistics involves summarizing and describing data sets.
Inferential statistics involves using a sample to make inferences about a population.
Two types of data: quantitative and qualitative
Supplementary Exercises
1.15 Reliability of a computer system. The reliability of a com-
puter system is measured in terms of the lifelength of a
specified hardware component (e.g., the hard disk drive).
To estimate the reliability of a particular system, 100 computer components are tested until they fail, and their lifelengths are recorded.
a. What is the population of interest?
b. What is the sample?
c. Are the data quantitative or qualitative?
d. How could the sample information be used to estimate
the reliability of the computer system?
1.16 Traveling turtle hatchlings. Hundreds of sea turtle hatch-
lings, instinctively following the bright lights of condominiums, wandered to their deaths across a coastal highway in
Florida (Tampa Tribune, Sept. 16, 1990). This incident
led researchers to begin experimenting with special lowpressure sodium lights. One night, 60 turtle hatchlings were
released on a dark beach and their direction of travel noted.
The next night, the special lights were installed and the
same 60 hatchlings were released. Finally, on the third
night, tar paper was placed over the sodium lights. Conse-
quently, the direction of travel was recorded for each
hatchling under three experimental conditions—darkness,
sodium lights, and sodium lights covered with tar paper.
a. Identify the population of interest to the researchers.
b. Identify the sample.
c. What type of data were collected, quantitative or
qualitative?
1.17 Acid neutralizer experiment. A chemical engineer con-
ducts an experiment to determine the amount of hydrochloric acid necessary to neutralize 2 milliliters (ml) of
a newly developed cleaning solution. The chemist prepares five 2-ml portions of the solution and adds a known
concentration of hydrochloric acid to each. The amount of
acid necessary to achieve neutrality of the solution is determined for each of the five portions.
a. Identify the experimental units for the study.
b. Identify the variable measured.
c. Describe the population of interest to the chemical
engineer.
d. Describe the sample.
MENDMC01_0131877062.QXD
3/29/06
11:48 PM
Page 10
MARKED SET
10 Chapter 1 Introduction
1.18 PCB in soil. A preliminary study was conducted to obtain
information on the background levels of the toxic substance polychlorinated biphenyl (PCB) in soil samples in
the United Kingdom (Chemosphere, Feb. 1986). For each
soil sample taken, the researchers recorded the location
(rural or urban) and the PCB level (measured in grams per
kilogram of soil). Identify the variables measured as quantitative or qualitative.
1.19 Intellectual development of engineering students. Perry’s
model of intellectual development was applied to undergraduate engineering students at Penn State (Journal of
Engineering Education, Jan. 2005). Perry scores (ranging
from 1 to 5) were determined for 21 students in a firstyear, project-based design course. (Note: A Perry score of
1 indicates the lowest level of intellectual development,
and a Perry score of 5 indicates the highest level.) The average Perry score for the 21 students was 3.27.
a. Identify the experimental units for this study.
b. What is the population of interest? The sample?
c. What type of data, quantitative or qualitative, are
collected?
d. Use the sample information to make an inference about
the population.
1.20 Type of data. State whether each of the following data sets
a. Arrival times of 16 reflected seismic waves
b. Types of computer software used in a database man-
agement system
c. Brands of calculator used by 100 engineering students
on campus
d. Ash contents in pieces of coal from three different mines
e. Mileages attained by 12 automobiles powered by alcohol
f. Numbers of print characters per line of computer out-
put for 20 line printers
g. Shift supervisors in charge of computer operations at
an airline company
h. Accident rates at 46 machine shops
1.21 Structurally deficient bridges. Refer to Exercise 1.14. The
most recent NBI data were analyzed, and the results published in the Journal of Infrastructure Systems (June 1995).
Using the FHWA inspection ratings, each of the 470,515
highway bridges in the United States was categorized as
structurally deficient, functionally obsolete, or safe. About
26% of the bridges were found to be structurally deficient,
and 19% were functionally obsolete.
a. What is the variable of interest to the researchers?
b. Is the variable of part a quantitative or qualitative?
c. Is the data set analyzed a population or a sample?
Explain.
d. How did the researchers obtain the data for their study?
is quantitative or qualitative.
•
•
•
STATISTICS IN ACTION
Contamination of Fish in the Tennessee River: Collecting the Data
hemical and manufacturing plants often discharge toxic waste materials into nearby rivers and
streams. These toxicants have a detrimental effect on the plant and animal life inhabiting the river
and the river’s bank. One type of pollutant, commonly known as DDT, is especially harmful to fish and,
indirectly, to people. The Food and Drug Administration sets the limit for DDT content in individual fish at 5
parts per million (ppm). Fish with DDT content exceeding this limit are considered potentially hazardous to
people if consumed. A study was undertaken to examine the DDT content of fish inhabiting the Tennessee
River (in Alabama) and its tributaries.
The Tennessee River flows in a west–east direction across the northern part of the state of Alabama,
through Wheeler Reservoir, a national wildlife refuge. Ecologists fear that contaminated fish migrating from
the mouth of the river to the reservoir could endanger other wildlife that prey on the fish. This concern
is more than academic. A manufacturing plant was once located along Indian Creek, which enters the
Tennessee River 321 miles upstream from the mouth. Although the plant has been inactive for over
10 years, there is evidence that the plant discharged toxic materials into the creek, contaminating all the
fish in the immediate area. Have the fish in the Tennessee River and its tributary creeks also been contaminated? And if so, how far upstream have the contaminated fish migrated? To answer these and other questions, members of the U.S. Army Corps of Engineers collected fish specimens at different locations along
the Tennessee River and three tributary creeks: Flint Creek (which enters the river 309 miles upstream from
the river’s mouth), Limestone Creek (310 miles upstream), and Spring Creek (282 miles upstream). Each fish
was first weighed (in grams) and measured (length in centimeters), then the fillet of the fish was extracted
and the DDT concentration (in parts per million) in the fillet was measured.
The FISHDDT file contains the length, weight, and DDT measurements for a total of 144 fish specimens.
FISHDDT
Obviously, not all the fish in the Tennessee River and its tributaries were captured. Consequently, the data
are based on a sample collected from the population of all fish inhabiting the Tennessee River. Here, the
C
MENDMC01_0131877062.QXD
3/29/06
11:48 PM
Page 11
MARKED SET
Statistics in Action
11
words population and sample are used to describe the objects upon which the measurements are taken,
i.e., the fish. We could also use the terms to represent data sets. For example, the 144 DDT measurements
represent a sample collected from the population consisting of DDT measurements for all fish inhabiting
the river.
In addition to the quantitative variables length, weight, and DDT concentration, notice that the data set
also contains information on the qualitative variables location (i.e., where the fish were captured) and
species of the fish. Three species of fish were examined: channel catfish, largemouth bass, and smallmouth
buffalo. The different symbols for location are interpreted as follows. The first two characters represent the
river or creek, and the remaining characters represent the distance (in miles) from the mouth of the river or
creek. For example, FCM5 indicates that the fish was captured in Flint Creek (FC), 5 miles upstream from
the mouth of the creek (M5). Similarly, TRM380 denotes a fish sample collected from the Tennessee River
(TR), 380 miles upstream from the river’s mouth (M380).
The U.S. Army Corps of Engineers used the data in the FISHDDT file to compare the DDT contents of fish
at different locations and among the different species, and to determine the relationship (if any) of length
and weight to DDT content. In subsequent chapters, we demonstrate several of these analyses in examples,
exercises, and in a statistics in Action section. •
MENDMC01_0131877062.QXD
3/29/06
11:48 PM
Page 12
MARKED SET