Coverage and Sampling

COVERAGE AND
SAMPLING
Damon Burton
University of Idaho
What does each of these
important sampling terms
mean?
ESSENTIAL SAMPLING
DEFINITIONS
Survey Population -- consists of all units
(i.e., individuals, households,
organizations) to which one desires to
generalize survey results.
Sample Frame -- list from which a
sample is to be drawn in order to
represent the survey population.
Sample -- consists of all units of the
population that are drawn for inclusion
in the survey.
ESSENTIAL SAMPLING
DEFINITIONS
Completed Sample -- consists of all units
(i.e., persons) that complete the survey.
Coverage Error -- results when not every unit
in the survey population has a
known, nonzero chance of being
included in the sample.
Sampling Error -- results from
collecting data from only a subset,
rather than all, members of the
sampling frame.
COVERAGE CONSIDERATIONS
Telephone coverage
Internet coverage
Mail coverage
TELEPHONE COVERAGE
In 2000, telephones were regarded as the best
survey mode for general surveys because:
• Coverage was high (i.e., 90% of Americans had
phones),
• Random Digit Dialing (RDD) procedures allowed
sampling of most phone users,
• People were amenable to answering survey
questions over the phone.
By 2003, half of all US citizens used cell phones,
and by 2007, 16% had only cell service.
Today almost 20% of US adults would be
excluded by RDD sampling procedures.
INTERNET COVERAGE
The internet is a useful mode for conducting
surveys for specific populations who have
service (e.g., students, professionals, and
businesses), but it has significant coverage gaps
with the general population.
• As of 2007, only 71% of Americans used the
internet at least occasionally.
• Only 67% had internet service in their homes.
• Only 47% had high-speed home internet service;
23% had dial-up and 29% had no
home internet access.
• Internet growth seems to be slowing.
PROBLEMS WITH INTERNET
FOR POPULATION SURVEYS
No list of all, or most, internet subscribers (i.e., a
sampling frame) is available.
No simple procedure is available for drawing
samples in which individuals, or households, have a
known, nonzero chance of inclusion.
People’s ability to use the internet varies
significantly, even in households with good access.
Because internet providers are private, not public,
legal and cultural barriers prevent contacting
randomly generated email addresses.
Web surveyors often use self-selected panels of
respondents, creating a number of sampling issues.
MAIL COVERAGE
Phone books once were good sources of addresses
for mail surveys.
By 1990, 25% of households had unlisted numbers,
and cell phone-only households rose sharply.
Address-based sampling has become more feasible
with US Postal Service Delivery Sequence File (DSF) lists.
• The DSF is an electronic file containing all delivery point
addresses serviced by the USPS.
• Names are provided for addresses except PO boxes.
• The DSF can’t tell homes from businesses.
• Geocoding is possible for stratified sampling or
targeting specific populations.
• The DSF is available through vendors, each with different
processes for managing and updating lists.
MAIL COVERAGE
Missing addresses for multiperson dwellings (e.g.,
apartments) are problematic.
Initial evaluations have shown DSF mail surveys with
a reminder mailing resulted in 4-7% higher response
rates than RDD surveys.
RDD and DSF surveys overrepresent married, white,
non-Hispanic individuals with higher education levels.
Other lists sometimes used include licensed drivers,
utility users, registered voters, and homeowners.
General lists may be compiled from multiple sources,
including: credit card holders, telephone directories,
magazine subscribers, bank depositors, organization
membership lists, catalog and internet customers, and
other sources.
What are the major
coverage issues for phone,
internet and mail surveys?
REDUCING COVERAGE ERRORS
Many surveys are designed for
special populations.
You need to know how a specific
list is compiled, maintained and
used.
5 important questions to ask about
any potential sampling list.
COVERAGE QUESTION 1
Does the list contain everyone in
the survey population?
If not, determine whether getting
the remainder of the people onto the
list is possible.
Evaluate the consequences of not
obtaining excluded names.
COVERAGE QUESTION 2
Does the list include names of people
who are not in the study population?
If so, learning up front exactly who is
on the list and why allows
respondents to answer only the
questions appropriate for them.
This targeting strategy would save
valuable resources.
COVERAGE QUESTION 3
How is the list maintained and
updated?
You may need to check the accuracy
of addresses before surveying.
Accuracy depends on continual
updating of addresses on list.
COVERAGE QUESTION 4
Are the same sample units included
on the list more than once?
Customers’ names may be added to
the list each time they order if a
slightly different name or address is
given.
Divorced parents are often on the list
twice, whereas married parents
appear only once.
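A minimal sketch of how duplicate customer entries might be screened before sampling. The record fields (`name`, `address`) and the normalization rules are assumptions made for illustration; this only catches differences in case, punctuation, and spacing, so genuine variants (nicknames, a second household address) would still need manual review.

```python
def normalize(text):
    """Lowercase, drop periods and commas, and collapse repeated spaces."""
    cleaned = text.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def remove_duplicate_units(records):
    """Keep the first record for each normalized (name, address) pair."""
    seen = set()
    unique = []
    for record in records:
        key = (normalize(record["name"]), normalize(record["address"]))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical list entries: the first two differ only in formatting.
customer_list = [
    {"name": "Pat Smith", "address": "12 Elm St., Moscow, ID"},
    {"name": "PAT SMITH", "address": "12 Elm St Moscow ID"},
    {"name": "Lee Jones", "address": "40 Oak Ave., Moscow, ID"},
]
print(len(remove_duplicate_units(customer_list)))
```

Each unit should end up with one entry so that every unit has the same chance of being drawn.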
COVERAGE QUESTION 5
Does the list contain other
information that can be used to
improve the survey?
Use mixed modes for different
aspects of the survey process.
Age and gender can be used to
identify nonresponse error.
What other information would be
valuable?
RESPONDENT SELECTION
Samples drawn from phone books in the 1970s-1990s
typically produced a higher proportion of
male respondents, even when letters requested
that females complete the survey.
Women are more likely to participate in phone
surveys because they answer the phone more
often.
Surveys commonly ask for “the adult with the most
recent birthday” to randomize respondent selection
within the household.
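The "most recent birthday" rule can be sketched in code, purely for illustration. In practice the household applies the rule itself; here the household roster, its field names, and the handling of February 29 birthdays (treated as February 28 in non-leap years) are all assumptions.

```python
from datetime import date


def last_birthday(month, day, today):
    """Return the most recent occurrence of a birthday on or before `today`."""
    for year in (today.year, today.year - 1):
        try:
            candidate = date(year, month, day)
        except ValueError:
            # Feb 29 in a non-leap year: fall back to Feb 28 (assumption).
            candidate = date(year, 2, 28)
        if candidate <= today:
            return candidate


def select_most_recent_birthday(adults, today):
    """Pick the adult whose birthday occurred most recently before `today`."""
    return max(
        adults,
        key=lambda adult: last_birthday(adult["month"], adult["day"], today),
    )


# Hypothetical household roster.
household = [
    {"name": "Adult A", "month": 3, "day": 10},
    {"name": "Adult B", "month": 11, "day": 2},
]
print(select_most_recent_birthday(household, date(2024, 6, 1))["name"])
```

Because birthdays are unrelated to who tends to answer the phone or open the mail, the rule approximates a random pick within the household.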
Other surveys target “the individual who shops for
groceries most often,” “who makes the
investment decisions,” or “who purchases the
computer.”
COVERAGE OUTCOMES
The goal is that every unit in the survey
population appears on the sample frame list only
once, so the survey population is prepared for
actual sampling.
Often researchers must decide what amount of
coverage is acceptable.
Do alternatives exist? What is the cost of those
alternatives? Can the coverage error be accurately
assessed?
Mixed mode surveys are a possibility. For
example, most of the survey is conducted on the
internet, but hard copy surveys are mailed to the
portion of the sample without internet
access.
PROBABILITY SAMPLING
Sampling error is the type of error that
occurs because information is requested
from only a sample of the population rather
than the entire population.
The first step in drawing a sample is to
understand the number of properly selected
respondents necessary for generalizing
results to the population and with what
degree of accuracy.
How do I calculate the
desired sample size for a
survey study?
HOW LARGE SHOULD A
SAMPLE BE?
The size of the sample, not the proportion
sampled, is what affects precision.
The formula takes into account:
• How much sampling error can be tolerated within
a given confidence interval,
• The amount of confidence one wishes to have in
the estimates,
• How varied the population is with respect to the
characteristic of interest, and
• The size of the population from which the sample
is drawn.
SAMPLE SIZE
$$N_s = \frac{(N_p)(p)(1-p)}{(N_p - 1)(B/C)^2 + (p)(1-p)}$$
Formula terms
• $N_s$ = the completed sample size needed for the
desired level of precision,
• $N_p$ = the size of the population,
• $p$ = the proportion of the population expected to
choose one of the 2 response categories,
• $B$ = margin of error (i.e., half of the desired
confidence interval width such as 3%),
• $C$ = Z score associated with the confidence level (i.e.,
1.96 corresponds with a 95% confidence level).
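The sample size formula above can be evaluated with a few lines of code. This is a minimal sketch: the function and parameter names are hypothetical, B and p are entered as proportions (0.03 rather than 3%), and rounding the result up to a whole respondent is an assumption (published tables may round to the nearest whole number instead).

```python
import math


def completed_sample_size(population_size, margin_of_error,
                          proportion=0.5, z_score=1.96):
    """Completed sample size Ns for the slide's formula, rounded up.

    population_size -> Np, margin_of_error -> B,
    proportion -> p, z_score -> C.
    """
    variance = proportion * (1 - proportion)  # (p)(1-p)
    numerator = population_size * variance
    denominator = ((population_size - 1) * (margin_of_error / z_score) ** 2
                   + variance)
    return math.ceil(numerator / denominator)


# Example: +/-3% margin of error, 95% confidence, p = 0.5.
for size in (1_000, 100_000, 100_000_000):
    print(size, completed_sample_size(size, 0.03))
```

Setting p = 0.5 gives the largest value of (p)(1-p), so it is the conservative choice when the split is unknown.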
How do I draw a good
simple random sample?
5 SAMPLING PREMISES
1. Relatively few completed questionnaires can provide
surprising precision at a high level of confidence.
2. Among large populations, there is virtually no
difference in the completed sample size needed for a
given level of precision.
3. Within small populations, greater proportions of the
population must be surveyed to achieve
estimates with a given margin of error.
4. At higher levels of sample size, increasing your
sample size yields smaller and smaller reductions in
margin of error.
5. Completed sample sizes must be much larger if one
wants to make precise estimates for subgroups of the
population.
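Premise 4 can be checked numerically by solving the sample size formula for the margin of error B. The rearrangement and the function name below are not from the slides; this is a sketch under the same assumptions as the formula (p = 0.5, C = 1.96 unless stated otherwise).

```python
import math


def margin_of_error(completed_n, population_size,
                    proportion=0.5, z_score=1.96):
    """Margin of error B implied by a completed sample of size Ns.

    Obtained by solving the sample size formula for B:
    B = C * sqrt(p(1-p) * (Np - Ns) / (Ns * (Np - 1))).
    """
    variance = proportion * (1 - proportion)
    finite_correction = (population_size - completed_n) / (population_size - 1)
    return z_score * math.sqrt(variance * finite_correction / completed_n)


# Doubling the completed sample repeatedly in a population of 1,000,000:
# each doubling buys a smaller reduction in the margin of error.
for n in (100, 200, 400, 800, 1600):
    print(n, f"{margin_of_error(n, 1_000_000):.3f}")
```

The same function can be applied to a subgroup's completed sample size to see why precise subgroup estimates (premise 5) require a much larger overall sample.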
DRAWING A SIMPLE
RANDOM SAMPLE
Typically numbers are assigned to every member of
the sample frame, and the computer randomly
selects a certain number of respondents.
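The numbering-and-selection procedure just described might look like the following. This is a sketch only: the frame contents and function name are hypothetical, and the seed is there only so a draw can be reproduced and documented.

```python
import random


def draw_simple_random_sample(sample_frame, sample_size, seed=None):
    """Number every frame member, then select numbers at random
    without replacement so each member has an equal chance."""
    rng = random.Random(seed)
    numbered = dict(enumerate(sample_frame, start=1))  # 1..N -> member
    chosen_numbers = rng.sample(range(1, len(sample_frame) + 1), sample_size)
    return [numbered[number] for number in sorted(chosen_numbers)]


# Hypothetical frame of 500 list entries.
frame = [f"member-{i:03d}" for i in range(1, 501)]
sample = draw_simple_random_sample(frame, 25, seed=2024)
print(sample[:5])
```

Sampling without replacement guarantees that no unit is drawn twice, which assumes the frame itself lists each unit only once.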
Sometimes comparisons require sampling different
segments of the population unequally. Comparing
employees who have worked for a company more
or less than 6 months requires weighted sampling.
Because employees with less than 6 months of service
represent only 5% of the workforce, more veteran
employees have a 20% greater chance of being selected.
If you need equal numbers for these 2 groups, you’ll
need to sample a higher percentage of new than
of older employees.
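A sketch of drawing equal-sized groups from unequal segments, using the 5% new-employee share from the example. The workforce size of 1,000, the group labels, and the function name are hypothetical. The weight shown is the usual design weight (group size divided by number sampled), which would be applied when combining the two groups into a workforce-wide estimate.

```python
import random


def sample_equal_groups(frame_by_group, per_group, seed=None):
    """Draw the same number of units from each group and report the
    sampling fraction and design weight that result."""
    rng = random.Random(seed)
    design = {}
    for group, members in frame_by_group.items():
        design[group] = {
            "sample": rng.sample(members, per_group),
            "sampling_fraction": per_group / len(members),
            "weight": len(members) / per_group,
        }
    return design


# Hypothetical workforce of 1,000: 5% new, 95% veteran.
workforce = {
    "new": [f"new-{i}" for i in range(1, 51)],
    "veteran": [f"vet-{i}" for i in range(1, 951)],
}
design = sample_equal_groups(workforce, per_group=40, seed=7)
for group, info in design.items():
    print(group, f"{info['sampling_fraction']:.1%}", round(info["weight"], 2))
```

Getting 40 respondents per group here means sampling a far higher percentage of new employees than of veterans, which is the trade-off the slide describes.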