R-PATA 8029: Improving Agricultural and Rural Statistics for Food Security

Basic Sample Survey Concepts
Dalisay S. Maligalig
27 March 2014

Outline of the Lecture
• Multi-stage Probability Sampling
  – Example: Philippines master sample
• Review of Cluster Sampling
• Probability Proportional to Size Sampling
• Sample Designs

Multi-stage Probability Sampling (1)
• Large national probability samples involve several stages of stratified cluster sampling.
• The whole country is divided into geographic clusters, urban and rural.
• Some large urban areas are selected with certainty.
• Other areas are grouped into strata; clusters are selected randomly from these strata.

Multi-stage Probability Sampling (2)
• Within each sampled area, clusters are defined again, and the process is repeated, perhaps several times, until blocks or exchanges are selected.
• At the last step, households and individuals within households are randomly selected.

Purpose of Review of Cluster Sampling
• To demonstrate how a cluster sample is selected in practice
• To demonstrate how parameters are estimated under cluster sampling
  – We do this for clusters of the same size and clusters of different sizes.
• The practicalities of cluster sampling are also discussed.

SRS is not always appropriate!
Example
• Population of N=324 households
• Households arranged into 36 “villages” of 9 households each
• Costly to travel between villages
• Cheap to travel between households in a village
Taking an SRS of n=27 households is a “costly” strategy: the sample could be spread over as many as 27 different villages.

Cluster Sampling Review (1)
Example (cont.)
• Each village is a primary sampling unit (PSU)
• Each household in a village is a secondary sampling unit (SSU)
• Take a sample of villages
• Sample all households within the selected villages
• This is one-stage cluster sampling.

Cluster Sampling Review (2)
• Units in a cluster tend to be more similar to each other and different from units in other clusters.
• Cluster sampling often leads to less precise estimates than SRS (the opposite of stratification).
• Trade-off between convenience and precision: if cluster sampling is cheaper, a larger sample can be taken to help improve precision.

How do we select PSUs?
• In this first (unrealistic) example, the villages all have the same number of households, hence we select villages using simple random sampling.
• In general, the PSUs (villages) may not have the same number of SSUs (households). We might then want to select PSUs using
  – Probability proportional to size (PPS), which gives a large PSU a greater probability of being in the sample than a small PSU

PPS Sampling (with replacement) (1)
Example: M=8 villages (PSUs) of different sizes. We want to sample 3 of them (m=3).
• Assume interest is still in income from sale of goods (recorded for households and totalled for each village).
  – Larger villages are likely to have higher incomes, and smaller villages lower incomes.

PPS Sampling (2)
• 240 households (SSUs) in the population, arranged in the villages as follows:

PSU (e.g. village no.)       1     2     3     4     5     6     7     8
PSU size                    10    10    20    20    40    40    50    50

• Probability of a village being selected (p_i) is its size divided by 240:

PSU (e.g. village no.)       1     2     3     4     5     6     7     8
p_i                       1/24  1/24  1/12  1/12   1/6   1/6  5/24  5/24

PPS Sampling (3)
Step 1: Calculate the cumulative sum of the SSUs

PSU (e.g. village no.)       1     2     3     4     5     6     7     8
PSU size                    10    10    20    20    40    40    50    50
Cumulative sum              10    20    40    60   100   140   190   240

Step 2: Draw a number at random from 1, 2, …, 240
• This determines which village is selected: the first village whose cumulative sum is at least the number drawn, e.g. 48 would be in Village 4, and 190 in Village 7.

PPS Sampling (4)
Step 3: Replace the number and repeat to select the other villages
Three random numbers may be 33, 174, 137. This implies that Villages 3, 7 and 6 will be the sample PSUs.

PSU (e.g. village no.)
                             1     2     3     4     5     6     7     8
Cumulative sum              10    20    40    60   100   140   190   240

Step 4: Sample all households in the selected villages
Calculate the estimated total income for the area, then weight according to the size of the village.

PPS Sampling (5)
How can we eliminate the effect of unequal cluster sizes (an increase in variance) to increase precision when using PPS sampling?
The aim is to satisfy the conditions of fixed n and equal probability of selection with a two-stage sample from unequal-sized clusters.
Consider the selection equation:
$P(\alpha\beta) = P(\alpha)\,P(\beta \mid \alpha) = f.$
If a fixed sample size, b, is taken from each of the a selected PSUs, then n = ab is fixed. Then
$P(\alpha\beta) = P(\alpha)\,\frac{b}{B_\alpha} = f = \frac{ab}{N}.$
Thus,
$P(\alpha) = \frac{a B_\alpha}{N}.$

PPS: An Example

PSU                  1     2     3     4
B_α                  5    10    12     9
Cumulative B_α       5    15    27    36

Select a random number from 1 to 36, say 19. Then PSU 3 is selected.

PPS: Oversized PSUs

PSU      M_α    Cumulative M_α
A        300     300
B         30     330
C        200     530
D        900    1430
E        100    1530
F         70    1600

Suppose that a=2 and b=50:
$f = \frac{2 M_\alpha}{1600} \cdot \frac{50}{M_\alpha} = \frac{1}{16}$
For D, P(α) = 2 × 900/1600 = 9/8, which exceeds 1: PSU D is bound to be selected and has a chance of 1/8 of being selected twice.

PPS: Oversized PSUs

PSU      M_α    Cumulative M_α
A        300     300
B         30     330
C        200     530
E        100     630
F         70     700
D        900     900

Treat D as a self-representing PSU.
1. Apply f=1/16 to D:
$\frac{1}{16} = \frac{M_\alpha}{900} \cdot \frac{b}{M_\alpha} \Rightarrow b = \frac{900}{16} = 56.25$
2. For the remaining PSUs the selection equation is:
$\frac{1}{16} = \frac{M_\alpha}{700} \cdot \frac{b}{M_\alpha} \Rightarrow b = \frac{700}{16} = 43.75$

PPS: Undersized PSUs

PSU      M_α       Cumulative M_α
A+B      300+30     330
C        200        530
E        100        630
F         70        700
D        900        900

1. Link small PSUs to others before selection.
2. Put small PSUs in a separate stratum with a special selection procedure.

Intra-class Correlation
Intra-class correlation is related to the design effect:
$D^2(\bar{y}_c) = \frac{V(\bar{y}_c)}{V(\bar{y}_{SRS})} \cong 1 + (B - 1)\rho$
If ρ is small and positive but B is large, $D^2(\bar{y}_c)$ is large. The solution is to sub-sample to make B smaller.
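The cumulative-sum selection, the selection equation and the design-effect approximation above can be checked with a short script. This is a minimal sketch in Python and is not part of the original lecture. The function names pps_select, overall_probability and design_effect, and the value ρ = 0.05, are illustrative assumptions. The random draws are passed in explicitly so that the slide examples can be reproduced.

```python
from bisect import bisect_left
from fractions import Fraction
from itertools import accumulate


def pps_select(sizes, draws):
    """Map each draw in 1..sum(sizes) to a PSU number (1-based) via cumulative sizes."""
    cumulative = list(accumulate(sizes))
    total = cumulative[-1]
    selected = []
    for r in draws:
        if not 1 <= r <= total:
            raise ValueError(f"draw {r} is outside 1..{total}")
        # The selected PSU is the first one whose cumulative sum is >= the draw.
        selected.append(bisect_left(cumulative, r) + 1)
    return selected


def overall_probability(size, total, a, b):
    """Selection equation: f = (a * size / total) * (b / size)."""
    first_stage = Fraction(a * size, total)   # P(alpha), PPS selection of the PSU
    second_stage = Fraction(b, size)          # P(beta | alpha), fixed take of b units
    return first_stage * second_stage


def design_effect(b, rho):
    """Approximate design effect 1 + (B - 1) * rho for a take of B units per cluster."""
    return 1 + (b - 1) * rho


village_sizes = [10, 10, 20, 20, 40, 40, 50, 50]
print(pps_select(village_sizes, [33, 174, 137]))   # [3, 7, 6], as on the slide
print(pps_select([5, 10, 12, 9], [19]))            # [3]
print(overall_probability(300, 1600, a=2, b=50))   # 1/16, whatever the PSU size
print(design_effect(b=50, rho=0.05))               # rho = 0.05 is an assumed value
```

In practice the draws would come from a random number generator, for example random.randint(1, 240) for the eight villages. With the assumed ρ = 0.05, reducing the take per cluster from B = 50 to B = 10 lowers the approximate design effect from about 3.45 to about 1.45.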

Sample Designs For Compact Populations With List Frames
• Simple random sampling
• Systematic sampling
• Stratified sampling
  – proportionate
  – disproportionate

Sample Designs For Widespread Populations and/or Populations Without List Frames
Complex cluster sample designs involving:
• Cluster sampling
• Multi-stage sampling
• Probability proportional to size (PPS) sampling
• Stratified sampling
• Systematic sampling

For More Information
[email protected]
Web site: http://sdbs.adb.org