PPS Sampling

R-PATA 8029: Improving Agricultural and
Rural Statistics for Food Security
Basic Sample Survey Concepts
Dalisay S. Maligalig
27 March 2014
Outline of the Lecture
• Multi-stage Probability Sampling
– Example: Philippines master sample
• Review of Cluster Sampling
• Probability Proportional to Size Sampling
• Sample Designs
Multi-stage Probability Sampling (1)
• Large national probability samples involve
several stages of stratified cluster sampling.
• The whole country is divided into geographic
clusters, urban and rural.
• Some large urban areas are selected with
certainty.
• Other areas are formed into strata of areas;
clusters are selected randomly from these
strata.
Multi-stage Probability Sampling (2)
• Within each sampled area, the clusters are
defined, and the process is repeated,
perhaps several times, until blocks or
exchanges are selected.
• At the last step, households and individuals
within households are randomly selected.
Purpose of Review of Cluster Sample
• To demonstrate how a cluster sample is
selected in practice
• To demonstrate how parameters are
estimated under cluster sampling
– We do this for clusters of the same size and
clusters of different sizes.
• The practicalities of cluster sampling are also
discussed.
SRS is not always appropriate!
Example
• Population of N=324
households
• Households arranged into 36
“villages” of 9 households each
• Costly to travel between
villages
• Cheap to travel between
households in a village
Taking a SRS of n=27 households is a “costly” strategy
Cluster Sampling Review (1)
Example (cont.)
• Each village is a primary
sampling unit (PSU)
• Each household in a village is a
secondary sampling unit (SSU)
• Take a sample of villages
• Sample all households within
the selected villages
• This is one-stage cluster
sampling.
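As a rough illustration of the example above, the sketch below draws a one-stage cluster sample in Python. The household labels, the seed, and the choice of 3 villages (which gives 27 households, matching the SRS size mentioned earlier) are assumptions made only for illustration.

```python
import random

# Hypothetical frame: 36 villages (PSUs), each with 9 households (SSUs).
villages = {v: [f"v{v}-h{h}" for h in range(1, 10)] for v in range(1, 37)}

def one_stage_cluster_sample(frame, n_psu, rng):
    """Pick n_psu villages by SRS without replacement, then take every household in them."""
    chosen = rng.sample(sorted(frame), n_psu)
    households = [hh for v in chosen for hh in frame[v]]
    return chosen, households

rng = random.Random(2014)  # fixed seed (an assumption) so the draw is reproducible
chosen, households = one_stage_cluster_sample(villages, 3, rng)
print(chosen, len(households))
```

Only three villages need to be visited, whereas an SRS of 27 households could be spread over many more villages.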
Cluster Sampling Review (2)
• Units in a cluster tend to be more similar to each
other than to units in other clusters.
• Cluster sampling often leads to less precise
estimates than SRS.
(opposite concept to stratification)
• Trade-off between convenience and precision:
If cluster sampling is cheaper to do, could take
larger sample to help improve precision.
How do we select PSUs?
• In this first (unrealistic) example, the villages all have
the same number of households, hence we select
villages using simple random sampling.
• In general, the PSUs (villages) may not have the same
number of SSUs (households). Might then want to
select PSUs using
– Probability proportional to size (PPS) sampling, which
gives a large PSU a greater probability of occurring
in the sample than a small PSU
PPS Sampling (with replacement) (1)
Example:
M=8 Villages (PSUs) of different sizes.
Want to sample 3 of them (m=3).
• Assume interest is still in income from sale of goods
(recorded for households and totalled for each village).
– Larger villages are likely to have higher incomes, and
smaller villages, lower incomes.
PPS Sampling (2)
• 240 households (SSUs) in the population arranged in
the villages as follows:
PSU (e.g. village no.)    1     2     3     4     5     6     7     8
PSU size                 10    10    20    20    40    40    50    50
• Probability of village being selected (p_i) is:
PSU (e.g. village no.)    1     2     3     4     5     6     7     8
p_i                     1/24  1/24  1/12  1/12   1/6   1/6  5/24  5/24
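The selection probabilities in the table are each village's size divided by the total of 240 households. A minimal sketch that reproduces them with exact fractions (the variable names are illustrative):

```python
from fractions import Fraction

# PSU sizes from the table above: village number -> number of households.
psu_sizes = {1: 10, 2: 10, 3: 20, 4: 20, 5: 40, 6: 40, 7: 50, 8: 50}
total = sum(psu_sizes.values())

# Probability of selection on a single draw: p_i = size_i / total.
p = {v: Fraction(size, total) for v, size in psu_sizes.items()}

for v, prob in p.items():
    print(v, prob)
```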
PPS Sampling (3)
Step 1: Calculate the cumulative sum of the SSUs
PSU (e.g. village no.)    1     2     3     4     5     6     7     8
PSU size                 10    10    20    20    40    40    50    50
Cumulative sum           10    20    40    60   100   140   190   240
Step 2: Draw a number at random from 1, 2, …, 240
• This determines which village is selected
e.g. 48 would be in Village 4, and 190 in Village 7.
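Steps 1 and 2 can be sketched in Python as a lookup into the cumulative sums. The function name `village_for` is hypothetical; a binary search is just one convenient way to find the first cumulative sum that is at least as large as the number drawn.

```python
from bisect import bisect_left
from itertools import accumulate

sizes = [10, 10, 20, 20, 40, 40, 50, 50]  # households in villages 1..8
cum = list(accumulate(sizes))              # Step 1: cumulative sums

def village_for(number):
    """Return the village (1-based) whose cumulative range contains `number`."""
    if not 1 <= number <= cum[-1]:
        raise ValueError("number must lie between 1 and the total size")
    # First position whose cumulative sum is >= number.
    return bisect_left(cum, number) + 1

print(village_for(48), village_for(190))
```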
PPS Sampling (4)
Step 3: Replace number and repeat to select other villages
Three random numbers may be 33, 174, 137. This implies that Villages 3,
7 and 6 will be the sample PSUs.
PSU (e.g. village no.)    1     2     3     4     5     6     7     8
Cumulative sum           10    20    40    60   100   140   190   240
Step 4: Sample all households in the selected villages
Calculate the estimated total income for the area, then
weight according to the size of the village.
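The slides do not name the estimator used in Step 4. One common choice for PPS sampling with replacement is the Hansen-Hurwitz estimator, which divides each sampled village total by its selection probability and averages the results; treat its use here as an assumption. The village income totals below are hypothetical and serve only to make the sketch runnable.

```python
from bisect import bisect_left
from itertools import accumulate

sizes = [10, 10, 20, 20, 40, 40, 50, 50]  # households in villages 1..8
cum = list(accumulate(sizes))
total_households = cum[-1]

# Step 3: the three random numbers quoted on the slide, mapped to villages.
draws = [33, 174, 137]
sampled = [bisect_left(cum, d) + 1 for d in draws]

# HYPOTHETICAL village income totals (not from the slides).
village_income = {3: 400.0, 6: 900.0, 7: 1000.0}

def hansen_hurwitz_total(sampled, totals):
    """Average of (village total / selection probability) over the draws."""
    expanded = [totals[v] / (sizes[v - 1] / total_households) for v in sampled]
    return sum(expanded) / len(expanded)

estimate = hansen_hurwitz_total(sampled, village_income)
print(sampled, estimate)
```

Because the draws are made with replacement, a village drawn twice would simply appear twice in `sampled` and contribute twice to the average.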
PPS Sampling (5)
How can we eliminate the effect of unequal cluster sizes (an increase in
variance) and so increase precision when using PPS sampling?
The aim is to satisfy the conditions of fixed n and equal probability of
selection with a two-stage sample from unequal-sized clusters.
Consider the selection equation:
$$P(\alpha\beta) = P(\alpha)\,P(\beta \mid \alpha) = f.$$
If a fixed sample size, b, is taken from each of the a selected PSUs, then n = ab is
fixed. Then, with $B_\alpha$ the size of PSU $\alpha$ and N the population size,
$$P(\alpha\beta) = P(\alpha)\,\frac{b}{B_\alpha} = f = \frac{ab}{N}.$$
Thus,
$$P(\alpha) = \frac{a B_\alpha}{N}.$$
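The selection equation can be checked numerically. The sketch below reuses the eight village sizes from the earlier example; the values a = 3 and b = 5 are assumptions chosen only so that every probability stays at or below 1.

```python
from fractions import Fraction

def selection_probabilities(psu_sizes, a, b):
    """For each PSU return (size, P(alpha), P(beta|alpha), overall probability)."""
    N = sum(psu_sizes)
    rows = []
    for B_alpha in psu_sizes:
        p_psu = Fraction(a * B_alpha, N)   # first stage: P(alpha) = a * B_alpha / N
        p_within = Fraction(b, B_alpha)    # second stage: P(beta | alpha) = b / B_alpha
        rows.append((B_alpha, p_psu, p_within, p_psu * p_within))
    return rows

# a = 3 PSUs and b = 5 households per PSU are illustrative assumptions.
rows = selection_probabilities([10, 10, 20, 20, 40, 40, 50, 50], a=3, b=5)
for row in rows:
    print(row)
```

The PSU size cancels in the product, so every household ends up with the same overall probability ab/N regardless of the size of its village.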
PPS: An Example
PSU                1     2     3     4
Bα                 5    10    12     9
Cumulative Bα      5    15    27    36
Select a random number from 1 to 36, say 19.
Then PSU 3 is selected.
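To see why this procedure gives probability proportional to size, the sketch below maps every possible random number from 1 to 36 to its PSU and counts the results. The helper name `select_psu` is hypothetical.

```python
from bisect import bisect_left
from collections import Counter
from itertools import accumulate

B = [5, 10, 12, 9]            # PSU sizes from the table
cum = list(accumulate(B))     # cumulative sizes

def select_psu(r):
    """Return the PSU (1-based) whose cumulative range contains r."""
    return bisect_left(cum, r) + 1

# How many of the 36 equally likely random numbers lead to each PSU?
counts = Counter(select_psu(r) for r in range(1, cum[-1] + 1))
print(select_psu(19), dict(counts))
```

Each PSU is reached by exactly as many random numbers as it has units, so its chance of selection on a single draw is its size divided by 36.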
PPS: Oversized PSUs
PSU     Mα    Cumulative Mα
A      300     300
B       30     330
C      200     530
D      900    1430
E      100    1530
F       70    1600
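PSU D is bound to be selected and has a chance of 1/8 of being
selected twice. (Under systematic selection with a = 2, the sampling interval is
1600/2 = 800; D's size of 900 exceeds the interval by 100, and 100/800 = 1/8.)
Suppose that a = 2 and b = 50:
$$f = \frac{2 M_\alpha}{1600} \cdot \frac{50}{M_\alpha} = \frac{1}{16}$$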
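The claim about PSU D can be checked by enumeration. The sketch below assumes systematic PPS selection with a sampling interval of 1600/2 = 800 and an integer random start between 1 and 800; the slide does not state the selection mechanism, although its figures are consistent with this one.

```python
from bisect import bisect_left
from collections import Counter
from fractions import Fraction
from itertools import accumulate

names = ["A", "B", "C", "D", "E", "F"]
M = [300, 30, 200, 900, 100, 70]
cum = list(accumulate(M))
a = 2
interval = cum[-1] // a   # sampling interval (assumed systematic selection)

def systematic_pps(start):
    """PSUs hit by start, start + interval, ... for an integer start in 1..interval."""
    return [names[bisect_left(cum, start + k * interval)] for k in range(a)]

# For every possible start, count how many of the two selections fall in D.
hits_D = Counter(systematic_pps(s).count("D") for s in range(1, interval + 1))
p_twice = Fraction(hits_D[2], interval)
print(dict(hits_D), p_twice)
```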
PPS: Oversized PSUs
PSU     Mα    Cumulative Mα
A      300     300
B       30     330
C      200     530
E      100     630
F       70     700

D      900     900
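Treat D as a self-representing PSU.
1. Apply f = 1/16 to D:
$$\frac{1}{16} = \frac{M_\alpha}{900} \cdot \frac{b}{M_\alpha} \;\Rightarrow\; b = \frac{900}{16} = 56.25$$
2. For the remaining PSUs, the selection equation is:
$$\frac{1}{16} = \frac{M_\alpha}{700} \cdot \frac{b}{M_\alpha} \;\Rightarrow\; b = \frac{700}{16} = 43.75$$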
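The two values of b follow from solving the selection equation for b within each group. A minimal sketch, assuming one selection (a = 1) in each group as the equations above imply; the function name `take_per_psu` is hypothetical.

```python
from fractions import Fraction

f = Fraction(1, 16)  # overall sampling fraction from the previous slide

def take_per_psu(group_total, a, f):
    """Solve f = (a * M / group_total) * (b / M) for b; the PSU size M cancels."""
    return f * group_total / a

b_D = take_per_psu(900, 1, f)     # D on its own as a self-representing PSU
b_rest = take_per_psu(700, 1, f)  # one selection from A, B, C, E and F
print(float(b_D), float(b_rest))
```

Since a fractional number of households cannot be sampled, these values would need to be rounded or otherwise handled in practice.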
PPS: Undersized PSUs
PSU      Mα        Cumulative Mα
A+B     300+30      330
C       200         530
E       100         630
F        70         700

D       900         900
1. Link small PSUs to others before selection.
2. Put small PSUs in a separate stratum with special selection
procedure.
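Option 1 can be sketched as a simple pass over the PSU list. The linking rule used here (attach an undersized PSU to the one listed just before it, or to the next one if it comes first) and the threshold of 44 (the smallest whole number at or above the take of 43.75) are assumptions for illustration; the slides do not specify how links are chosen.

```python
def link_small_psus(psus, min_size):
    """Merge each PSU smaller than min_size into a neighbouring PSU in list order."""
    linked = []
    carry = None  # undersized PSU at the head of the list, waiting for a partner
    for name, size in psus:
        if carry is not None:
            name, size = f"{carry[0]}+{name}", carry[1] + size
            carry = None
        if size < min_size:
            if linked:
                prev_name, prev_size = linked.pop()
                linked.append((f"{prev_name}+{name}", prev_size + size))
            else:
                carry = (name, size)
        else:
            linked.append((name, size))
    if carry is not None:
        linked.append(carry)
    return linked

psus = [("A", 300), ("B", 30), ("C", 200), ("E", 100), ("F", 70)]
linked = link_small_psus(psus, min_size=44)  # 44 is an assumed threshold
print(linked)
```

With these inputs only B falls below the threshold, and it is linked to A as in the table above.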
Intra-class Correlation
Intra-class correlation is related to the design
effect:
$$D^2(\bar{y}_c) = \frac{V(\bar{y}_c)}{V(\bar{y}_{SRS})} \cong 1 + (B - 1)\rho$$
If ρ is small and positive but B is large,
$D^2(\bar{y}_c)$ is large. The solution is to sub-sample to
make B smaller.
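The effect of sub-sampling on the design effect can be seen by plugging numbers into the approximation. The value ρ = 0.05 and the cluster takes of 100, 20 and 10 are hypothetical and serve only as an illustration.

```python
def design_effect(B, rho):
    """Approximate design effect 1 + (B - 1) * rho for cluster take B."""
    return 1 + (B - 1) * rho

rho = 0.05  # hypothetical intra-class correlation
for B in (100, 20, 10):
    print(B, design_effect(B, rho))
```

Even with a small ρ, a take of 100 units per cluster inflates the variance several times over relative to SRS, while a take of 10 keeps the inflation modest.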
Sample Designs For Compact
Populations With List Frames
• Simple random sampling
• Systematic sampling
• Stratified sampling
– proportionate
– disproportionate
Sample Designs For Widespread
Populations and/or
Populations Without List Frames
Complex cluster sample designs involving:
• Cluster sampling
• Multi-stage sampling
• Probability proportional to size (PPS) sampling
• Stratified sampling
• Systematic sampling
For More Information
[email protected]
Web site: http://sdbs.adb.org