
Sampling Stratification in Practice: An Example from a Rural Investment Climate Survey in Bangladesh
Mikhail Bontch-Osmolovski, DECRG
29 January, 2008
The Bangladesh Rural Investment Climate Assessment, 2007

Objectives
- Measure investment climate conditions
- Analyze enterprise performance and start-up decisions
- Identify and prioritize areas for policy action in support of a stronger private sector

Innovations in the 2007 ICA for Bangladesh
- Focus on rural-urban linkages
- Secondary towns
- Service sector
Urban Sample
Urban Sample



- City corporations of Dhaka, Chittagong, Khulna, Barisal, Rajshahi and Sylhet
- Large manufacturing enterprises
- 2006 Census of large enterprises as a sampling frame
Rural sample



- What is rural?
- Informal enterprises
- No sampling frame
Rural sample objectives
Non-farm enterprises in “rural” areas, but close to cities. Implications for the sampling design:
- Manufacturing, trade, and services → need to stratify by sector
- 90% of enterprises have fewer than 5 workers → need to stratify by size
- Also need households with no enterprises

Administrative Divisions
- 6 divisions: Barisal, Chittagong, Dhaka, Khulna, Rajshahi, Sylhet
- 64 districts (zila)
- 500 sub-districts (upazila/thana)
- Unions/wards
- Mahallas/villages (7500 in Ec. Census)
Stylized Description
[Figure: stylized map showing metropolitan areas, the surrounding peri-metro area, sadar upazilas, and villages]
Sampling Strategy

The strategy was to select the following random samples:
- 50 mahallas located in business-intensive peri-metropolitan areas and in sadar upazilas (base mahallas, BMs)
- 3 villages in the immediate neighborhood of each base mahalla (150 satellite villages, SVs, in total)
- 2,500 non-farm enterprises in the BMs and SVs
- 4 households without enterprises in each satellite village (600 households without enterprises in total)
Selecting Base Mahallas

Stratification: the 50 base mahallas were allocated to 7 strata:
- 2 mahallas in the Barisal peri-metropolitan area
- 5 mahallas in the Chittagong peri-metropolitan area
- 7 mahallas in the Dhaka peri-metropolitan area
- 3 mahallas in the Khulna peri-metropolitan area
- 6 mahallas in the Rajshahi peri-metropolitan area
- 2 mahallas in the Sylhet peri-metropolitan area
- 25 mahallas in the sadar upazilas
Mahallas were selected from each stratum with probability proportional to size (PPS), using the number of enterprises from the 2006 Business Census as the measure of size.
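The slides do not say which PPS method was used. A common choice is systematic PPS on the cumulated measure of size, so the following is a minimal sketch under that assumption; the function names and the enterprise counts are hypothetical. `inclusion_prob` returns the first-stage probability P(BM) = K*size/total, capped at 1 for very large mahallas.

```python
import bisect
import itertools
import random
from typing import List, Sequence


def pps_systematic(sizes: Sequence[int], n: int, rng: random.Random) -> List[int]:
    """Draw n units with probability proportional to size (systematic PPS).

    Returns indices into `sizes`. A unit larger than the sampling interval
    can be hit more than once; such units are effectively certainty selections.
    """
    cumulative = list(itertools.accumulate(sizes))  # running enterprise totals
    total = cumulative[-1]
    interval = total / n
    start = rng.random() * interval  # random start in [0, interval)
    picks = []
    for k in range(n):
        point = start + k * interval
        # First unit whose cumulative total exceeds the selection point
        idx = bisect.bisect_right(cumulative, point)
        picks.append(min(idx, len(sizes) - 1))  # guard against float rounding
    return picks


def inclusion_prob(size: int, total: int, n: int) -> float:
    """First-stage probability P(BM) = n * size / total, capped at 1."""
    return min(1.0, n * size / total)


# Hypothetical stratum: enterprise counts (2006 Business Census) per mahalla
stratum_sizes = [120, 45, 300, 80, 15, 60, 210, 90]
chosen = pps_systematic(stratum_sizes, n=2, rng=random.Random(2008))
print(chosen, [inclusion_prob(stratum_sizes[i], sum(stratum_sizes), 2) for i in chosen])
```

Systematic selection keeps the draw simple and spreads the sample along the frame order; its drawback is that exact joint inclusion probabilities are hard to compute, which matters for variance estimation but not for the weights discussed here.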
Selecting Satellite Villages

The sampling frame for the 3 satellite villages of each selected mahalla is the list of all villages that satisfy the following conditions:
- Be located in the same zila as the base mahalla;
- Be accessible from the base mahalla in one hour or less, using the most common means of public transportation;
- Be within a 30 km radius of the base mahalla;
- Be located outside the union/ward of the base mahalla.
The 3 satellite villages were selected by systematic equal-probability sampling from the list of all “qualifying” villages, with implicit stratification by travel time (the list is ordered by travel time before selection).
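The listing rules and the systematic draw can be sketched as follows. This is an illustration, not the data collection firm's procedure: the `Village` record, the function names, and the village listing are hypothetical, and implicit stratification is implemented by sorting the frame on travel time before the systematic draw.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Village:
    """One row of a hypothetical village listing."""
    name: str
    zila: str
    union: str
    travel_minutes: float  # by the most common means of public transport
    distance_km: float     # distance from the base mahalla


def qualifying_villages(villages: List[Village], bm_zila: str, bm_union: str) -> List[Village]:
    """Keep villages that satisfy all four listing rules."""
    return [
        v for v in villages
        if v.zila == bm_zila          # same zila as the base mahalla
        and v.travel_minutes <= 60    # one hour or less
        and v.distance_km <= 30       # within a 30 km radius
        and v.union != bm_union       # outside the BM's union/ward
    ]


def systematic_equal_prob(frame: List[Village], n: int, rng: random.Random) -> List[Village]:
    """Systematic equal-probability sample of n villages, sorted by travel time."""
    ordered = sorted(frame, key=lambda v: v.travel_minutes)  # implicit stratification
    size = len(ordered)
    if size <= n:
        return ordered  # take every qualifying village
    interval = size / n  # >= 1, so each village is hit at most once
    start = rng.random() * interval
    return [ordered[min(int(start + k * interval), size - 1)] for k in range(n)]


listing = [
    Village("A", "Zila1", "U2", 20, 8),
    Village("B", "Zila1", "U1", 15, 5),    # same union as the BM
    Village("C", "Zila1", "U3", 75, 25),   # more than one hour away
    Village("D", "Zila2", "U4", 30, 12),   # different zila
    Village("E", "Zila1", "U4", 45, 28),
    Village("F", "Zila1", "U5", 50, 35),   # beyond 30 km
    Village("G", "Zila1", "U6", 10, 4),
    Village("H", "Zila1", "U7", 55, 22),
]
frame = qualifying_villages(listing, bm_zila="Zila1", bm_union="U1")
sample = systematic_equal_prob(frame, 3, random.Random(7))
print([v.name for v in frame], [v.name for v in sample], 3 / len(frame))
```

Within a given base mahalla, each qualifying village then has P(SV|BM) = 3/N_sv (here 3/4).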
Selecting Satellite Villages: Implementation
In practice, the list of qualifying satellite villages contained no villages outside the upazila of the selected base mahalla; i.e., the data collection firm modified the SV listing rule.
Selecting Enterprises
The listing exercise: record employment and sector for each address.
Stratification decision:
- By sector: 1,500 manufacturing, 500 trade, 500 services.
- By size: P = 1 for enterprises with 10 or more workers; P proportional to employment for smaller enterprises.
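A minimal sketch of the within-listing selection probabilities for one sector, assuming the take-all group is removed first and the rest of the sector allocation is spread over smaller enterprises in proportion to employment. The function name and the small example are hypothetical; the survey itself allocated 1,500/500/500 over far larger listings.

```python
from typing import Dict, List

# Sector allocation stated on the slide (total 2,500 enterprises)
SECTOR_ALLOCATION: Dict[str, int] = {"manufacturing": 1500, "trade": 500, "services": 500}


def within_listing_probs(employment: List[int], n_sector: int, take_all_min: int = 10) -> List[float]:
    """Selection probability of each listed enterprise in one sector.

    Enterprises with `take_all_min` or more workers are taken with certainty;
    the remaining sample is spread over smaller enterprises in proportion to
    their employment.
    """
    certain = [e >= take_all_min for e in employment]
    n_rest = n_sector - sum(certain)
    if n_rest < 0:
        raise ValueError("more take-all enterprises than the sector allocation")
    small_total = sum(e for e, c in zip(employment, certain) if not c)
    probs = []
    for e, c in zip(employment, certain):
        if c:
            probs.append(1.0)
        elif small_total == 0:
            probs.append(0.0)
        else:
            # Capped at 1; a unit over the cap would normally join the take-all group
            probs.append(min(1.0, n_rest * e / small_total))
    return probs


print(within_listing_probs([12, 3, 1, 1, 5], n_sector=3))  # [1.0, 0.6, 0.2, 0.2, 1.0]
```

With a 3-enterprise allocation, the 12-worker enterprise is taken with certainty and the remaining 2 selections are spread over the 10 workers in small enterprises. A probability that would exceed 1 is normally handled by moving that unit into the take-all group and recomputing; the sketch only caps it.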
Selecting Households
No stratification:
- Select 4 households with no enterprises in each of the 150 satellite villages.
- Equal probability of selection within each village.

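Calculating weights: households
P(H) = P(H|SV)*P(SV)
- P(H|SV) = 4/N_hh, where N_hh is the number of households without enterprises in the satellite village
- P(SV) = P(BM)*3/N_sv, where N_sv is the number of qualifying villages for the base mahalla
- P(BM) is known from the first stage.
- W_01 = 1/P(H)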
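The first-pass formulas translate directly into code. A minimal sketch with hypothetical inputs and a hypothetical function name; summing W_01 over all sampled households gives the estimated number of households.

```python
def household_weight(p_bm: float, n_sv: int, n_hh: int,
                     sv_per_bm: int = 3, hh_per_sv: int = 4) -> float:
    """Naive household weight W_01 = 1/P(H) from the slide's formulas."""
    p_sv = p_bm * sv_per_bm / n_sv        # P(SV) = P(BM)*3/N_sv
    p_h_given_sv = hh_per_sv / n_hh       # P(H|SV) = 4/N_hh
    return 1.0 / (p_h_given_sv * p_sv)    # W_01 = 1/P(H)


# Hypothetical inputs: P(BM) = 0.5, 30 qualifying villages, 200 eligible households
w = household_weight(p_bm=0.5, n_sv=30, n_hh=200)
print(round(w))  # 1000
```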

????
Calculating weights: households
WRONG!
With these weights, the estimated number of households in our non-metro sample was 60 million, more than the total number of households in Bangladesh.
Calculating weights: households


Problem: the areas of eligible villages of two base mahallas may intersect, so a satellite village can be selected into the sample in more than one way.
That is why the actual probability of selecting an SV is higher than P(BM)*3/N_sv, and the weights are too high.
For each selected SV, we need to calculate its probability of selection, i.e., how many BMs were in its neighborhood.
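One way to account for multiple selection paths is to combine the per-BM probabilities of each path. The sketch below assumes that selections through different base mahallas are independent, which is only approximately true because mahallas are drawn without replacement within a stratum; the function name and the inputs are hypothetical.

```python
from typing import Sequence, Tuple


def sv_selection_prob(paths: Sequence[Tuple[float, int]], sv_per_bm: int = 3) -> float:
    """Probability that a village enters the sample through at least one BM.

    `paths` holds (P(BM_j), N_sv_j) for every mahalla j whose qualifying list
    contains the village. Paths are treated as independent (an approximation).
    """
    p_never = 1.0
    for p_bm, n_sv in paths:
        p_never *= 1.0 - p_bm * sv_per_bm / n_sv
    return 1.0 - p_never


single = sv_selection_prob([(0.5, 30)])
double = sv_selection_prob([(0.5, 30), (0.5, 30)])
print(round(single, 4), round(double, 4))  # 0.05 0.0975
```

With two overlapping neighborhoods the selection probability nearly doubles, so the naive weight of 1/0.05 = 20 overstates the corrected weight of roughly 1/0.0975 ≈ 10.3.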
Calculating weights: households
Assume that once a BM is selected, all villages in its upazila are eligible as satellite villages; upazilas are small.
- Each base mahalla in an upazila then has the same list of villages.
- P(SV) = P(upazila selected at the 1st stage)*3/N_sv
- P(upazila) is proportional to the “size” of the upazila.

Calculating weights: households
P(Upazila) = K*Nent_Upaz/Nent_stratum, where Nent is the number of enterprises
- K = number of base mahallas to be selected in the stratum: 2, 5, 7, 3, 6, 2, 25
- If P > 1 → the upazila is selected with certainty (P = 1).
New weights: an estimated 1.7 million households.
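The upazila-level shortcut, including the certainty rule, can be sketched as follows. The K values are those listed for the 7 strata; the stratum labels, function names, and enterprise counts are hypothetical. Note that K*Nent_Upaz/Nent_stratum is the expected number of base mahallas in the upazila, so it only approximates the probability that the upazila contains at least one.

```python
# Number of base mahallas per stratum (K), from the stratification slide
K_BY_STRATUM = {
    "Barisal peri-metro": 2,
    "Chittagong peri-metro": 5,
    "Dhaka peri-metro": 7,
    "Khulna peri-metro": 3,
    "Rajshahi peri-metro": 6,
    "Sylhet peri-metro": 2,
    "Sadar upazilas": 25,
}


def upazila_prob(k: int, nent_upaz: int, nent_stratum: int) -> float:
    """P(Upazila) = K*Nent_Upaz/Nent_stratum, set to 1 for certainty upazilas."""
    return min(1.0, k * nent_upaz / nent_stratum)


def sv_prob_upazila(p_upaz: float, n_sv_upaz: int, sv_per_bm: int = 3) -> float:
    """P(SV) = P(Upazila)*3/N_sv, with N_sv the number of villages in the upazila."""
    return p_upaz * sv_per_bm / n_sv_upaz


# Hypothetical Dhaka peri-metro upazila: 500 of the stratum's 2,000 enterprises
p_up = upazila_prob(K_BY_STRATUM["Dhaka peri-metro"], 500, 2000)
print(p_up)  # 1.0 (7*500/2000 = 1.75 > 1, so selected with certainty)
```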

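Probability of enterprises
Enterprises were not selected within each village separately, so the formula P(Ent) = P(Ent|SV)*P(SV) is only an approximation.
Formally (manufacturing case):
P(ent i) = 1500*n_i/N_man
where n_i is the employment of enterprise i and N_man is total manufacturing employment in the listing. N_man comes from the first stage, so it would be different if other satellite villages had been selected, and P(ent i) would then be different too.
Assuming total employment in the selected villages is constant, with SV_1 the village where enterprise i is located:
P(ent i) = P(ent i|SV_1)*P(SV_1) = 1500*n_i/N_man*P(SV_1)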
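A minimal sketch of the approximate enterprise probability for a manufacturing enterprise, holding N_man fixed as the slide assumes. The take-all branch for enterprises with 10 or more workers follows the size rule on the enterprise-selection slide; the function name and inputs are hypothetical.

```python
def enterprise_prob(n_i: int, n_man: int, p_sv: float,
                    n_alloc: int = 1500, take_all_min: int = 10) -> float:
    """Approximate P(ent i) = P(ent i|SV_1)*P(SV_1) for a manufacturing enterprise.

    n_i: employment of enterprise i; n_man: total manufacturing employment
    in the listing, treated as fixed (the slide's simplifying assumption).
    """
    if n_i >= take_all_min:
        p_within = 1.0  # 10+ workers: selected with certainty from the listing
    else:
        p_within = min(1.0, n_alloc * n_i / n_man)  # 1500*n_i/N_man
    return p_within * p_sv


# Hypothetical: 4 workers, 60,000 listed manufacturing workers, P(SV_1) = 0.05
p = enterprise_prob(4, 60_000, 0.05)
print(round(1 / p))  # 200
```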