SAMPLE DESIGN:
WHO WILL BE IN THE SAMPLE?
Lu Ann Aday, Ph.D.
The University of Texas
School of Public Health
SAMPLE DESIGN:
Key Components
Target Population or Universe: group
about which information is desired
Sampling frame: operational definition of the
target population, ideally matching it closely,
e.g., existing or constructed list of individuals
from which the sample would actually be drawn
Sample elements: types of individuals or units that
will be drawn; the ultimate sampling unit is the
final sampling unit, usually the focus of the
analysis, e.g., individuals
SAMPLE DESIGN:
Types of Designs
Probability Sample: Relies on laws
of chance to pick the sample,
where probability of selection is
known, i.e., based on sampling
fraction: n/N
Nonprobability Sample: Relies on
human judgment to pick the
sample
SAMPLE DESIGN:
Types of Nonprobability Designs
Purposive: Pick people for certain purpose,
e.g., focus groups
Quota: Pick target number of people in certain
categories, e.g., women 18-35
Chunk: Pick convenient “chunk” of people,
e.g., church attendees
Volunteer: Ask for volunteers, e.g., healthy
male medical students
Snowball: Identify small number of individuals
representative of the population of interest,
who then identify others that meet the same
inclusion criteria, e.g., drug users
SAMPLE DESIGN:
Types of Probability Designs
Simple random sample
Systematic random sample
Stratified sample
Cluster sample
SAMPLE DESIGN:
Simple Random Sample
Definition: Every unit in the
population has a known,
nonzero, and equal chance
of being selected through a
lottery-type procedure
SAMPLE DESIGN:
Simple Random Sample
Procedures
Draw sample randomly from numbers
assigned to sampling elements placed
in a sampling “urn” OR
Use a random numbers table to identify
sampling elements to be included OR
Use computer software to randomly
select sample from computerized
sampling frame
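The software option above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the frame of 50 numbered elements, the function name simple_random_sample, and the seed are assumptions made for the example.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n elements without replacement; every element has chance n/N."""
    rng = random.Random(seed)  # the seed only makes the illustration repeatable
    return rng.sample(frame, n)

# Hypothetical computerized sampling frame: elements numbered 1-50
frame = list(range(1, 51))
sample = simple_random_sample(frame, n=10, seed=1)

sampling_fraction = len(sample) / len(frame)  # n/N = 10/50
print(sorted(sample))
print(sampling_fraction)
```

Drawing without replacement keeps any element from entering the sample twice, which mirrors pulling numbers from an urn and not returning them.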
RANDOM NUMBERS TABLE:
Example: 1-Select random starting point “X”; 2-Look at the first two
digits of each random number; 3-Proceed from left to right through the
table to identify elements from the sampling frame (numbered 1-50),
skipping values outside 1-50 and repeats, until the target sample
size (n), e.g., 10, has been reached.
91567  42595 X 27958  30134  04024
17955  56349  90999  49127  20044
46503  18584  18845  49618  02304
92157  89634  94824  78171  84610
14577  62765  35065  81263  39667
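The table walk can be traced with a short sketch. It assumes the walk begins at the number that follows the “X” marker, that the numbers are read in the order listed, and that values outside 1-50 and repeated values are skipped; the function name pick_from_table is made up for the illustration.

```python
# Random numbers from the slide, in reading order, beginning at the number
# that follows the "X" marker (assumed to be the starting point).
table = ["27958", "30134", "04024", "17955", "56349", "90999", "49127",
         "20044", "46503", "18584", "18845", "49618", "02304", "92157",
         "89634", "94824", "78171", "84610", "14577", "62765", "35065",
         "81263", "39667"]

def pick_from_table(numbers, frame_size, n):
    """Use the first two digits of each number to pick frame elements."""
    selected = []
    for number in numbers:
        element = int(number[:2])  # first two digits, e.g. "04024" -> 4
        # Skip values outside the frame and elements already chosen
        if 1 <= element <= frame_size and element not in selected:
            selected.append(element)
        if len(selected) == n:
            break
    return selected

sample = pick_from_table(table, frame_size=50, n=10)
print(sample)  # [27, 30, 4, 17, 49, 20, 46, 18, 2, 14]
```

Under these assumptions the tenth element is reached at 14577, so the remaining numbers in the table are not needed.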
SAMPLE DESIGN:
Systematic Random Sample
Definition: Variation of
simple random sample
selected through randomly
selecting a starting point
and then taking every k’th
unit thereafter, where the
interval k = N/n is based on
the sampling fraction
SAMPLE DESIGN:
Systematic Random Sample
Procedures
1-Determine the sampling interval required to
sample the required number of cases, based
on the sampling fraction: n/N, e.g., 10/50 =
1/5, giving an interval of N/n = 5
2-Select a random starting point “X” within
the first sampling interval, e.g., elements 1-5
3-Starting at “X”, sample every (N/n)th case,
e.g., every 5th, from the sampling frame until
the target sample size (n), e.g., 10, has been
reached
SYSTEMATIC RANDOM SAMPLE:
Example, e.g., n/N=10/50 = 1/5 (20%)
 1     11     21     31     41
 2     12     22     32     42
 3 X   13 X   23 X   33 X   43 X
 4     14     24     34     44
 5     15     25     35     45
 6     16     26     36     46
 7     17     27     37     47
 8 X   18 X   28 X   38 X   48 X
 9     19     29     39     49
10     20     30     40     50
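The systematic procedure and the 10/50 example can be sketched as follows. This is an illustration under the assumption that N divides evenly by n; the function name systematic_sample and its start and seed parameters are hypothetical.

```python
import random

def systematic_sample(frame, n, start=None, seed=None):
    """Take every k-th element after a random start in the first interval."""
    N = len(frame)
    k = N // n  # sampling interval N/n (assumes N is a multiple of n)
    if start is None:
        # Random starting point "X" within the first interval, 1..k
        start = random.Random(seed).randint(1, k)
    # Positions start, start+k, start+2k, ... (1-based), capped at n picks
    return [frame[i - 1] for i in range(start, N + 1, k)][:n]

frame = list(range(1, 51))  # elements numbered 1-50
sample = systematic_sample(frame, n=10, start=3)
print(sample)  # [3, 8, 13, 18, 23, 28, 33, 38, 43, 48]
```

With a start of 3 and an interval of 5, the selected elements are 3, 8, 13, and so on up to 48, which gives the target of 10 cases.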
SAMPLE DESIGN:
Stratified Sample
Definition: Sample based on
dividing the population into
homogeneous strata and drawing
random-type sample separately
from all the strata
Proportionate: Use same sampling
fraction in each stratum
Disproportionate: Use different
sampling fraction in each (or
selected) stratum
SAMPLE DESIGN:
Stratified Sample
Procedures
1-Order or group the sampling frame by
relevant strata
2-Determine the sampling interval required to
sample the required number of cases, based
on the sampling fraction
3-Select a random starting point “X” within
the first sampling interval
4-Starting at “X”, sample every (N/n)th case
from the sampling frame until the target
sample size (n) has been reached
STRATIFIED SAMPLE: Example, Proportionate,
e.g., n/N=1/20 (5%) in all strata
STRATA   N (%)          n/N    n (%)
A          500 (5%)     1/20    25 (5%)
B         3000 (30%)    1/20   150 (30%)
C         2000 (20%)    1/20   100 (20%)
D          500 (5%)     1/20    25 (5%)
E          700 (7%)     1/20    35 (7%)
F         1600 (16%)    1/20    80 (16%)
G          700 (7%)     1/20    35 (7%)
H         1000 (10%)    1/20    50 (10%)
Total    10000                 500
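The proportionate allocation in this example can be reproduced with a short sketch. The stratum sizes come from the table; the variable names are chosen for the illustration only.

```python
from fractions import Fraction

# Stratum sizes (N) from the example table
strata = {"A": 500, "B": 3000, "C": 2000, "D": 500,
          "E": 700, "F": 1600, "G": 700, "H": 1000}
fraction = Fraction(1, 20)  # same sampling fraction in every stratum

# Sample size per stratum: n = N * (n/N)
allocation = {name: int(size * fraction) for name, size in strata.items()}

total_N = sum(strata.values())
total_n = sum(allocation.values())
for name in strata:
    pop_share = 100 * strata[name] / total_N
    sample_share = 100 * allocation[name] / total_n
    print(f"{name}: N={strata[name]:>5} ({pop_share:.0f}%)  "
          f"n={allocation[name]:>3} ({sample_share:.0f}%)")
print(total_N, total_n)
```

Because the fraction is constant, each stratum's share of the sample equals its share of the population.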
STRATIFIED SAMPLE: Example, Disproportionate,
e.g., n/N=1/20 (5%) in
strata B,C,F,H & 1/10 (10%) in strata A,D,E,G
STRATA   N (%)          n/N    n (%)
A          500 (5%)     1/10    50 (8.1%)
B         3000 (30%)    1/20   150 (24.2%)
C         2000 (20%)    1/20   100 (16.1%)
D          500 (5%)     1/10    50 (8.1%)
E          700 (7%)     1/10    70 (11.3%)
F         1600 (16%)    1/20    80 (12.9%)
G          700 (7%)     1/10    70 (11.3%)
H         1000 (10%)    1/20    50 (8.1%)
Total    10000                 620
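The disproportionate example can be checked the same way. In this sketch the stratum sizes and sampling fractions are taken from the table, and the names oversampled, rates, and shares are illustrative.

```python
from fractions import Fraction

# Stratum sizes (N) from the example table
strata = {"A": 500, "B": 3000, "C": 2000, "D": 500,
          "E": 700, "F": 1600, "G": 700, "H": 1000}
oversampled = {"A", "D", "E", "G"}  # strata sampled at 1/10 instead of 1/20

rates = {name: Fraction(1, 10) if name in oversampled else Fraction(1, 20)
         for name in strata}
allocation = {name: int(strata[name] * rates[name]) for name in strata}

total_n = sum(allocation.values())
# Each stratum's share of the sample, as a percentage rounded to one decimal
shares = {name: round(100 * allocation[name] / total_n, 1) for name in strata}

for name in strata:
    print(f"{name}: n/N={rates[name]}  n={allocation[name]:>3} ({shares[name]}%)")
print(total_n)
```

The smaller strata now make up a larger share of the sample than of the population, for example stratum A at 8.1% of the sample versus 5% of the population.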
SAMPLE DESIGN:
Cluster Sample
Definition: Sample based on
dividing the population into
heterogeneous clusters and
drawing a random-type sample
from within a sample of
those clusters
CLUSTER SAMPLE: Example—Probability
Proportionate to Size (PPS) (Aday &
Cornelius, 2006, Table 6.2)
(continued in next lecture)
Block A: 100 HUs*   Block F: 250 HUs*   Block K: 200 HUs*
Block B: 50 HUs     Block G: 125 HUs*   Block L: 300 HUs*
Block C: 75 HUs     Block H: 50 HUs     Block M: 125 HUs
Block D: 150 HUs*   Block I: 100 HUs*   Block N: 150 HUs*
Block E: 200 HUs*   Block J: 50 HUs     Block O: 275 HUs*
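One common way to select clusters with probability proportionate to size is systematic selection along the cumulative size totals. The sketch below uses the block sizes listed above, but the number of blocks to draw (5) and the starting point (200) are hypothetical. It does not attempt to reproduce Table 6.2 or the asterisks, and the function name pps_systematic is made up for the illustration.

```python
from itertools import accumulate

# Block sizes (HUs) from the slide
blocks = {"A": 100, "B": 50, "C": 75, "D": 150, "E": 200,
          "F": 250, "G": 125, "H": 50, "I": 100, "J": 50,
          "K": 200, "L": 300, "M": 125, "N": 150, "O": 275}

def pps_systematic(sizes, n_clusters, start):
    """Select clusters by stepping through cumulative sizes at a fixed interval."""
    names = list(sizes)
    cumulative = list(accumulate(sizes.values()))  # running total of HUs
    interval = cumulative[-1] / n_clusters         # total size / clusters wanted
    picks = []
    for j in range(n_clusters):
        point = start + j * interval               # j-th selection point
        # Pick the first block whose cumulative total reaches the point
        for name, upper in zip(names, cumulative):
            if point <= upper:
                picks.append(name)
                break
    return picks

total = sum(blocks.values())
# Hypothetical draw: 5 blocks, starting point 200 within the first interval
selected = pps_systematic(blocks, n_clusters=5, start=200)
# Chance that each block is selected under this scheme: n_clusters * size / total
prob = {name: 5 * size / total for name, size in blocks.items()}
print(total, selected)
```

Larger blocks span more of the cumulative range, so they are more likely to contain a selection point; in a real draw the starting point would be chosen at random within the first interval.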
CRITERIA FOR EVALUATING
SAMPLE DESIGNS
Precision—how close the estimates
derived from the sample are to the
true population value as a function
of variable sampling error
Accuracy—how close the estimates
derived from the sample are to the
true population value as a function
of systematic sampling error (bias)
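The difference between precision and accuracy can be seen in a small simulation. Everything in this sketch is made up for the illustration: the population values, the sample sizes, and the flawed frame that leaves out the lowest fifth of the population.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the illustration is repeatable

# Hypothetical population of 10,000 values
population = [rng.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.fmean(population)

def repeated_means(frame, n, draws=500):
    """Means of many simple random samples of size n drawn from the frame."""
    return [statistics.fmean(rng.sample(frame, n)) for _ in range(draws)]

# Precision: variable sampling error shrinks as the sample size grows
small = repeated_means(population, n=25)
large = repeated_means(population, n=400)

# Accuracy: a frame missing the lowest fifth of the population gives
# estimates that are systematically too high, however large the sample
biased_frame = sorted(population)[2000:]
biased = repeated_means(biased_frame, n=400)

print("spread, n=25: ", statistics.stdev(small))
print("spread, n=400:", statistics.stdev(large))
print("offset, full frame:  ", statistics.fmean(large) - true_mean)
print("offset, flawed frame:", statistics.fmean(biased) - true_mean)
```

The larger samples cluster more tightly around the true mean, while the flawed frame produces estimates that are consistently off target even though they are tightly clustered.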
CRITERIA FOR EVALUATING
SAMPLE DESIGNS (cont.)
Complexity—number of stages and
steps required to implement the
sample design
Efficiency—obtaining the most
accurate and precise estimates at
the lowest possible costs
ADVANTAGES & DISADVANTAGES:
Simple Random Sample
ADVANTAGES
Requires little knowledge of population in advance
DISADVANTAGES
May not capture certain groups of interest
May not be very efficient
ADVANTAGES & DISADVANTAGES:
Systematic Random Sample
ADVANTAGES
Easy to analyze and compute sampling (standard) errors
High precision
DISADVANTAGES
Periodic ordering of elements in sample frame may create biases in the data
May not capture certain groups of interest
May not be very efficient
ADVANTAGES & DISADVANTAGES:
Stratified Sample
ADVANTAGES
Enables certain groups of interest to be captured
Enables disproportionate sampling within strata
Highest precision
DISADVANTAGES
Requires knowledge of population in advance
May introduce more complexity in analyzing data and computing sampling (standard) errors
ADVANTAGES & DISADVANTAGES:
Cluster Sample
ADVANTAGES
Lowers field costs
Enables sampling of groups of individuals for which detail on individuals themselves may not be available
DISADVANTAGES
Introduces more complexity in analyzing data and computing sampling (standard) errors
Lowest precision