1 Collecting Data

So far our focus has been on very small sets of data: frequency and point value of
Scrabble tiles, career statistics for single players (Steve Carlton), team statistics
for a single season (Boston Celtics, Atlanta Falcons), etc. We have had the
ability to collect the entire set of data. This is certainly not always true in the
field of statistics.
Definition 1 Individuals are the objects described by a set of data. Individuals may be people, animals, objects or abstract concepts.
Definition 2 A variable is a characteristic that differs or varies from one
observation to the next.
Example 1 If we study athletes at KSU then the athlete is the individual. Different variables are the athlete’s height, weight, sport, GPA, etc.
Example 2 If we study professional sports teams then the teams are the individuals. Different variables are the sport, team payroll, number of players on
the team, etc.
Definition 3 A population is the entire collection of data that describe some
phenomenon.
Definition 4 A sample is a subset of a population from which we actually
collect data.
Definition 5 Using a sample to make an inference about a population is called
inferential statistics.
We frequently use samples when the population is too large to measure in full. Say
we wish to determine the average number of hours student athletes at KSU
prepare for classes. It might be very difficult and time-consuming to collect
this data from every student athlete. It would be easier to use a sample. One
possible sample is the chess team. Another sample would be all
the student athletes present in this classroom. A third sample would be
one student randomly selected from each team at KSU. The sampling design
describes how the sample is selected. Unfortunately, not all samples are equally
useful when practicing inferential statistics.
Problem 1 Is the KSU chess team a good sample to use for estimating the
number of hours the average KSU athlete studies? Explain.
Problem 2 Is the sample of all the student athletes present in this classroom a
good sample to use for estimating the number of hours the average KSU athlete
studies? Explain.
Problem 3 Is the sample of a single student from each of KSU’s teams appropriate for estimating the number of hours the average KSU athlete studies?
Explain.
Definition 6 A sample selected by taking individuals of the population that are
easy to reach is called a convenience sample.
Definition 7 A sample consisting of people who choose themselves to respond
to an appeal for opinions is a self-selected sample or a voluntary response
sample.
Convenience samples and voluntary response samples almost always provide
biased results.
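A small simulation can make the idea of bias concrete. The sketch below is an illustration only: the population, the group labels "easy_to_reach" and "hard_to_reach", and all of the numbers are invented for this example and do not come from any real survey. It assumes that the individuals who are easy to reach tend to have larger values of the variable than everyone else, and then compares the mean of the convenience sample with the mean of the whole population.

```python
import random
from statistics import mean

# Invented population for illustration only: (group, value of the variable).
# Assumption: the easy-to-reach individuals tend to have larger values.
rng = random.Random(0)
population = [("easy_to_reach", rng.uniform(15, 25)) for _ in range(20)]
population += [("hard_to_reach", rng.uniform(5, 15)) for _ in range(180)]

# Mean of the variable over the entire population.
population_mean = mean(value for _, value in population)

# Convenience sample: only the individuals that are easy to reach.
convenience = [value for group, value in population if group == "easy_to_reach"]
convenience_mean = mean(convenience)

print("population mean: ", round(population_mean, 1))
print("convenience mean:", round(convenience_mean, 1))
```

Because the convenience sample contains only one kind of individual, its mean overstates the population mean in this made-up setting, and collecting more responses from the same group would not correct that.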
Problem 4 Identify misuses of statistics in the following: Creative Loafing
posted an online poll asking readers whether Atlantans should approve a 2% increase
in sales tax to purchase new stadiums for millionaire athletes. Ten people
responded; 83% said ‘no’ and 17% said ‘yes.’ Creative Loafing reported that
Georgians do not want an increase in sales tax to fund urban renewal projects.
What makes for a good sample? A simple random sample.
Definition 8 A simple random sample (SRS) of size n is a sample of n
individuals from the population such that every individual has an equal chance
of inclusion.
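One way to draw a simple random sample in practice is to list every individual in the population and let a random number generator choose n of them. The sketch below uses Python's standard `random` module; the roster of 200 labeled athletes and the function name `simple_random_sample` are hypothetical, introduced only for this illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals from the population, without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population, n)

# Hypothetical roster: each label stands in for one individual.
roster = [f"athlete_{i:03d}" for i in range(1, 201)]

sample = simple_random_sample(roster, n=10, seed=1)
print(sample)
```

Each individual on the roster has the same chance, n divided by the population size, of ending up in the sample, and no individual can be chosen twice.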
Reliable data goes beyond collecting a good sample.
Problem 5 Do you have concerns about the following survey?
2 Exercises
1. Identify five misuses of statistics in the following statement.
A sports reporter on the local country radio station asked his listeners
to call in and answer the following question: Do you support the use of
random drug tests to catch cheaters who soil the reputation of baseball by
using steroids? The DJ reported that twenty people responded, 87% said
yes and 13% said no. The DJ concluded that Americans overwhelmingly
support random drug testing.