On Small Area Estimation under Informative Sampling
Danny Pfeffermann1 and Michael Sverchkov2
1 Hebrew University and University of Southampton; 2 Bureau of Labor Statistics and User Technology Associates, Inc.
Bureau of Labor Statistics, Room 1950, 2 Massachusetts Avenue NE, Washington, DC 20212
SUMMARY
Classical small area estimation techniques assume either that all the areas are represented in the sample
or that the selection of the areas to the sample is noninformative. When the areas are sampled with
unequal selection probabilities that are related to the values of the response variable, the classical
estimators are biased; the magnitude of the bias depends on the sampling fraction and the covariance
between the sampling weights and the response variable. We illustrate this point using very simple
models employing the notions of the sample distribution and sample-complement distribution. We
suggest simple unbiased estimators based on these distributions.
Key words: Sample distribution, Sample-complement distribution
1. The sample and sample-complement distributions
Consider a finite population $U$ consisting of $N$ units belonging to $M$ areas, with $N_i$ units in area $i$, $\sum_{i=1}^{M} N_i = N$. Let $y$ define the study variable with value $y_{ij}$ for unit $j$ in area $i$, and denote by $x_{ij}$ the values of auxiliary (covariate) variables that are possibly known for that unit. In what follows we consider the population $y$-values as random realizations of the following two-level stochastic process. First level: values (random effects) $\{u_1,\ldots,u_M\}$ are generated independently from some distribution with probability density function (pdf) $f_p(u_i)$, for which $E_p(u_i)=0$ and $E_p(u_i^2)=\sigma_u^2$, where $E_p$ defines the expectation operator under the population distribution. Second level: values $\{y_{i1},\ldots,y_{iN_i}\}$ are generated from some conditional distribution with pdf $f_p(y_{ij} \mid x_{ij}, u_i)$, for $i=1,\ldots,M$. We assume a two-stage sampling scheme which in the first stage selects $m$ areas with inclusion probabilities $\pi_i = \Pr(i \in s)$, and in the second stage $n_i$ units are sampled from area $i$ selected in the first stage with inclusion probabilities $\pi_{j|i} = \Pr(j \in s_i \mid i \in s)$. Note that the sample inclusion probabilities at both stages may depend in general on all the population or area values of $y$, $x$ and possibly design variables $z$, used for the sample selection but not included in the working model. Denote by $I_i$ and $I_{ij}$ the sample indicator variables at the two stages ($I_i = 1$ iff $i \in s$, and similarly for $I_{ij}$), and by $w_i = 1/\pi_i$ and $w_{j|i} = 1/\pi_{j|i}$ the corresponding first- and second-stage sampling weights.
Following Pfeffermann et al. (1998), we define the conditional sample pdf of $u_i$, i.e., the first-level conditional pdf of $u_i$ for area $i \in s$, as

$$ f_s(u_i) \stackrel{def}{=} f(u_i \mid I_i = 1) \stackrel{Bayes}{=} \frac{\Pr(I_i = 1 \mid u_i)\, f_p(u_i)}{\Pr(I_i = 1)}. \qquad (1.1) $$

Similarly, the conditional sample-complement pdf, i.e., the conditional pdf of $u_i$ for area $i \notin s$, is defined in Sverchkov and Pfeffermann (2001) as

$$ f_c(u_i) \stackrel{def}{=} f(u_i \mid I_i = 0) \stackrel{Bayes}{=} \frac{\Pr(I_i = 0 \mid u_i)\, f_p(u_i)}{\Pr(I_i = 0)}. \qquad (1.2) $$

Notice that the population, sample and sample-complement pdfs of $u_i$ are the same iff $\Pr(I_i = 1 \mid u_i) = \Pr(I_i = 1)$ for all $i$, in which case the sampling of areas is noninformative.
The second-level sample pdf and sample-complement pdf of $y_{ij}$ are defined similarly to (1.1) and (1.2) as

$$ f_s(y_{ij} \mid x_{ij}, u_i) \stackrel{def}{=} f(y_{ij} \mid x_{ij}, u_i, I_{ij} = 1) = \frac{\Pr(I_{ij} = 1 \mid y_{ij}, x_{ij}, u_i)\, f_p(y_{ij} \mid x_{ij}, u_i)}{\Pr(I_{ij} = 1 \mid x_{ij}, u_i)}, \qquad (1.3) $$

$$ f_c(y_{ij} \mid x_{ij}, u_i) \stackrel{def}{=} f(y_{ij} \mid x_{ij}, u_i, I_{ij} = 0) = \frac{\Pr(I_{ij} = 0 \mid y_{ij}, x_{ij}, u_i)\, f_p(y_{ij} \mid x_{ij}, u_i)}{\Pr(I_{ij} = 0 \mid x_{ij}, u_i)}. \qquad (1.4) $$
The model defined by (1.1) and (1.3) is the two-level sample model analogue of the population model defined by $f_p(u_i \mid z_i)$ and $f_p(y_{ij} \mid x_{ij}, u_i)$; see also Pfeffermann et al. (2001).

The following relationships are established in Pfeffermann and Sverchkov (1999) and Sverchkov and Pfeffermann (2001) for general pairs of random variables $(v_{1i}, v_{2i})$ measured for elements $i \in U$, where $E_p$, $E_s$ and $E_c$ denote expectations under the population, sample and sample-complement distributions, and $(\pi_i, w_i)$ define the sample inclusion probability and the sampling weight:

$$ f_s(v_{1i} \mid v_{2i}) = f(v_{1i} \mid v_{2i}, i \in s) = \frac{E_p(\pi_i \mid v_{1i}, v_{2i})\, f_p(v_{1i} \mid v_{2i})}{E_p(\pi_i \mid v_{2i})}; \qquad (1.5) $$

$$ E_p(v_{1i} \mid v_{2i}) = \frac{E_s(w_i v_{1i} \mid v_{2i})}{E_s(w_i \mid v_{2i})}; \quad E_p(\pi_i \mid v_{2i}) = \frac{1}{E_s(w_i \mid v_{2i})}; \qquad (1.6) $$

$$ f_c(v_{1i} \mid v_{2i}) = f(v_{1i} \mid v_{2i}, i \notin s) = \frac{E_p[(1-\pi_i) \mid v_{1i}, v_{2i}]\, f_p(v_{1i} \mid v_{2i})}{E_p[(1-\pi_i) \mid v_{2i}]} = \frac{E_s[(w_i - 1) \mid v_{1i}, v_{2i}]\, f_s(v_{1i} \mid v_{2i})}{E_s[(w_i - 1) \mid v_{2i}]}; \qquad (1.7) $$

$$ E_c(v_{1i} \mid v_{2i}) = \frac{E_p[(1-\pi_i) v_{1i} \mid v_{2i}]}{E_p[(1-\pi_i) \mid v_{2i}]} = \frac{E_s[(w_i - 1) v_{1i} \mid v_{2i}]}{E_s[(w_i - 1) \mid v_{2i}]}. \qquad (1.8) $$

Defining $v_{1i} = u_i$ and $v_{2i} = \text{constant}$ yields the relationships holding for the random area effects $u_i$. Defining $v_{1ij} = y_{ij}$, $v_{2ij} = (x_{ij}, u_i)$ and substituting $\pi_{j|i}$ and $w_{j|i}$ for $\pi_i$ and $w_i$, respectively, yields the relationships holding for the observations $y_{ij}$.
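The identity (1.8) can be checked numerically. The following sketch (the logistic form of the inclusion probabilities and all numerical values are our own illustrative assumptions, not part of the paper) draws a finite set of area effects, applies Poisson sampling with probabilities that increase with $u_i$, and compares the direct mean of $u_i$ over the nonsampled areas with the sample-weighted estimator on the right-hand side of (1.8):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite set of M area effects u_i ~ N(0, 1).
M = 2000
u = rng.normal(0.0, 1.0, M)

# Informative Poisson sampling: inclusion probabilities increase with u_i
# (the logistic form is an illustrative choice).
pi = 0.5 / (1.0 + np.exp(-u))
w = 1.0 / pi

est_c, est_s = [], []
for _ in range(2000):
    I = rng.random(M) < pi                      # first-stage sample indicators
    # Direct mean of u over the nonsampled areas (left side of (1.8)).
    est_c.append(u[~I].mean())
    # Sample-based estimator E_s[(w-1)u] / E_s[(w-1)] (right side of (1.8)).
    ws, us = w[I], u[I]
    est_s.append(np.sum((ws - 1.0) * us) / np.sum(ws - 1.0))

print(np.mean(est_c), np.mean(est_s))           # nearly equal, both negative
```

Both averages estimate the same quantity $E_c(u_i)$, which is negative here because areas with large positive $u_i$ are oversampled.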
2. Optimal Small Area Predictors
The target population parameters are the small area means $\bar Y_i = \sum_{j=1}^{N_i} y_{ij}/N_i$, for $i = 1,\ldots,M$. Let $D_s = \{(y_{ij}, \pi_{j|i}, \pi_i),\ (i,j) \in s;\ (I_{kl}, I_k, x_{kl}),\ (k,l) \in U\}$ define the known data. The MSE of a predictor $\hat{\bar Y}_i$ given $D_s$ with respect to the population pdf is

$$ MSE(\hat{\bar Y}_i \mid D_s) = E_p[(\hat{\bar Y}_i - \bar Y_i)^2 \mid D_s] = E_p\{[\hat{\bar Y}_i - E_p(\bar Y_i \mid D_s)]^2 \mid D_s\} + V_p(\bar Y_i \mid D_s) = [\hat{\bar Y}_i - E_p(\bar Y_i \mid D_s)]^2 + V_p(\bar Y_i \mid D_s). \qquad (2.1) $$

The variance $V_p(\bar Y_i \mid D_s)$ does not depend on the form of the predictor, and hence the MSE is minimized when $\hat{\bar Y}_i = E_p(\bar Y_i \mid D_s)$. In what follows we distinguish between sampled areas ($I_i = 1$) and nonsampled areas ($I_i = 0$). Denote by $s_i$ the sample of units in sampled area $i$. Then, for the sampled areas,

$$ E_p(\bar Y_i \mid D_s, I_i = 1) = \frac{1}{N_i}\Big\{\sum_{j \in s_i} E_p(y_{ij} \mid D_s) + \sum_{l \notin s_i} E_p(y_{il} \mid D_s, I_{il} = 0)\Big\} = \frac{1}{N_i}\Big\{\sum_{j \in s_i} y_{ij} + \sum_{l \notin s_i} E_c(y_{il} \mid D_s)\Big\}. \qquad (2.2) $$

For areas $i$ not in the sample,

$$ E_p(\bar Y_i \mid D_s, I_i = 0) = \frac{1}{N_i}\sum_{k=1}^{N_i} E_p(y_{ik} \mid D_s, I_{ik} = 0) = \frac{1}{N_i}\sum_{k=1}^{N_i} E_c(y_{ik} \mid D_s). \qquad (2.3) $$

The predictors in (2.2) and (2.3) can be written in a single equation as

$$ E_p(\bar Y_i \mid D_s) = \frac{1}{N_i}\Big\{\sum_{k=1}^{N_i} y_{ik} I_{ik} + \sum_{k=1}^{N_i} E_c[y_{ik}(1 - I_{ik}) \mid D_s]\Big\} I_i + \frac{1}{N_i}\Big\{\sum_{k=1}^{N_i} E_c[y_{ik} \mid D_s]\Big\}(1 - I_i). \qquad (2.4) $$
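As a minimal sketch, the combined predictor (2.4) can be assembled from the observed values and any supplied estimates of the complement expectations $E_c(y_{ik} \mid D_s)$; the function name and interface below are hypothetical, not from the paper:

```python
import numpy as np

def area_mean_predictor(N_i, y_sampled, ec_rest):
    """Predictor (2.4) of the finite area mean: the observed y-values of the
    sampled units plus estimated complement expectations E_c(y_ik | D_s)
    for the remaining units (all N_i units, if the area was not sampled)."""
    y_sampled = np.asarray(y_sampled, dtype=float)
    ec_rest = np.asarray(ec_rest, dtype=float)
    assert y_sampled.size + ec_rest.size == N_i
    return (y_sampled.sum() + ec_rest.sum()) / N_i
```

For a sampled area the first argument holds the $n_i$ observations; for a nonsampled area it is empty and the predictor is the average of the $N_i$ complement expectations, exactly as in (2.3).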
3. Bias of Small Area Predictors when ignoring the Sampling Scheme
Consider for convenience the case of a sampled area. Ignoring the sampling scheme implies an implicit assumption that the sample-complement model and the sample model are the same, such that $\hat{\bar Y}_{i,IGN} = \frac{1}{N_i}[\sum_{j \in s_i} y_{ij} + \sum_{l \notin s_i} E_s(y_{il} \mid D_s)]$. Hence,

$$ E_p[(\hat{\bar Y}_{i,IGN} - \bar Y_i) \mid D_s, I_i = 1] = \frac{1}{N_i}\sum_{l \notin s_i}[E_s(y_{il} \mid D_s) - E_c(y_{il} \mid D_s)] = -\frac{1}{N_i}\sum_{l \notin s_i}\frac{Cov_s(y_{il}, w_{l|i} \mid D_s)}{E_s[(w_{l|i} - 1) \mid D_s]}, \qquad (3.1) $$

with the second equality following from (1.8). Thus, unless the response values $y_{il}$ and the ‘within’ sampling weights $w_{l|i}$ are uncorrelated, ignoring the sampling scheme results in biased predictors (see also the empirical results). A similar expression for the bias can be obtained for the nonsampled areas.
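The second equality in (3.1) rests on the per-unit identity $E_s(y) - E_c(y) = -Cov_s(y, w)/[E_s(w) - 1]$, which follows from (1.8). A small numerical check on an arbitrary discrete sample distribution (the values and weights below are our own illustrative choices):

```python
import numpy as np

# A hypothetical discrete sample distribution for (y, w) within one area:
# values y with sample-distribution probabilities p, and weights w that
# increase with y (informative within-area sampling).
y = np.array([0.0, 1.0, 2.0, 3.0])
p = np.array([0.4, 0.3, 0.2, 0.1])
w = np.array([1.5, 2.0, 3.0, 5.0])

Es = lambda v: np.sum(p * v)                     # expectation under f_s
Ec_y = Es((w - 1.0) * y) / Es(w - 1.0)           # E_c(y) via (1.8)

cov_s = Es(w * y) - Es(w) * Es(y)                # Cov_s(y, w)
lhs = Es(y) - Ec_y                               # per-unit bias term in (3.1)
rhs = -cov_s / (Es(w) - 1.0)

print(lhs, rhs)                                  # identical up to rounding
```

Because $y$ and $w$ are positively correlated here, the bias term is negative: the sample model understates the complement expectation.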
A simple example. Let the population model be the “unit level random effects model”

$$ y_{ij} = \mu + u_i + e_{ij}; \quad u_i \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2), \qquad (3.2) $$

with all the random effects and residual terms being mutually independent. Let $\pi_i = c \cdot N_i$, where $c$ is some constant, and $\pi_{j|i} = n_0/N_i$ (fixed sample size $n_0$ within the selected areas), such that $\pi_{ij} = \Pr[(i,j) \in s] = \pi_i \pi_{j|i} = \text{const}$. Note that the sample selection within the selected areas is noninformative in this case, but if the area sizes $N_i$ are correlated with the random effects $u_i$ (say, the areas are schools, the study variable measures children's attainment, and the large schools are in the poor areas), the selection of the areas is informative.

Suppose that the area sizes can be modeled as $\log(N_i) \sim N(A u_i, \sigma_M^2)$, implying that $E_p(\pi_i \mid u_i) = c \exp(A u_i + \sigma_M^2/2)$ by familiar properties of the lognormal distribution. It follows that (see Pfeffermann et al. 1998, Example 4.3)

$$ f_s(u_i) = \frac{E_p(\pi_i \mid u_i)\, f_p(u_i)}{E_p(\pi_i)} = N(A\sigma_u^2, \sigma_u^2), \qquad (3.3) $$

so that $E_s(u_i) = A\sigma_u^2 \neq E_p(u_i) = 0$. The fact that the random effects in the sample have in this case a positive expectation is easily explained by the fact that the sampling scheme considered tends to select the areas with large positive random effects. Note, however, that by defining $\mu^* = \mu + A\sigma_u^2$ and $u_i^* = u_i - A\sigma_u^2$, the model holding for the sample data in sampled areas is $y_{ij} = \mu^* + u_i^* + e_{ij}$, $u_i^* \sim N(0, \sigma_u^2)$, $e_{ij} \sim N(0, \sigma_e^2)$, which is the same as the population model. Thus, the optimal predictors under the population model of the area means $\theta_i = \mu + u_i$ of the sampled areas ($I_i = 1$) are still optimal under the sample model.

Next consider the nonsampled areas. By (1.7),

$$ f_c(u_i) = \frac{E_p[(1-\pi_i) \mid u_i]\, f_p(u_i)}{E_p(1-\pi_i)} = \frac{f_p(u_i) - E_p(\pi_i \mid u_i)\, f_p(u_i)}{E_p(1-\pi_i)} = \frac{M f_p(u_i) - M E_p(\pi_i)\, f_s(u_i)}{M - M E_p(\pi_i)}. \qquad (3.4) $$

Let $E_p(m) = E_p[\sum_{l=1}^{M} I_l] = E_p[E_p(\sum_{l=1}^{M} I_l \mid \{N_i\})] = E_p[\sum_{l=1}^{M} \pi_l] = M E_p(\pi_i)$ define the expected number of sampled areas, such that $E_p(\pi_i) = E_p(m)/M$. If the number of sampled areas is fixed, $E_p(m) = m$. By (3.4) and (1.5), $f_c(u_i) = [M f_p(u_i) - E_p(m) f_s(u_i)]/[M - E_p(m)]$, and hence

$$ E_c(u_i) = -\frac{E_p(m)\, E_s(u_i)}{M - E_p(m)} = -\frac{E_p(m)\, A\sigma_u^2}{M - E_p(m)}. \qquad (3.5) $$

Here again, the negative expectation of the random effects pertaining to nonsampled areas is easily explained by the tendency of the sampling scheme to select the areas with the large positive random effects. Thus, ignoring the sampling scheme underlying the selection of the areas and predicting the area means in nonsampled areas by, say, the average of the predictors in the sampled areas yields in general biased predictors, with a positive bias defined by the absolute value of the right-hand side of (3.5).
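The example can be reproduced by simulation. The sketch below (all numerical values are our own illustrative assumptions) generates area sizes with $\log(N_i) \sim N(A u_i, \sigma_M^2)$, samples areas with $\pi_i \propto N_i$, and confirms that the mean of $u_i$ over the sampled areas is close to $A\sigma_u^2$ while the nonsampled mean falls below it, as in (3.3) and (3.5):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative values (our own assumptions): M areas, log-size model of this example.
M, A, sigma_u, sigma_M = 200_000, 0.5, 1.0, 0.3

u = rng.normal(0.0, sigma_u, M)                    # area random effects
N_area = np.exp(rng.normal(A * u, sigma_M))        # log(N_i) ~ N(A u_i, sigma_M^2)
pi = 2000.0 * N_area / N_area.sum()                # pi_i = c N_i, about 2000 sampled areas
                                                   # (well below 1 for these values)
I = rng.random(M) < pi                             # Poisson sampling of areas
print(u[I].mean(), A * sigma_u**2)                 # sampled mean close to A*sigma_u^2, per (3.3)
print(u[~I].mean())                                # slightly below zero, per (3.5)
```

The nonsampled mean is only slightly negative because the sampling fraction of areas, $E_p(m)/M$, is small, exactly as the denominator $M - E_p(m)$ in (3.5) suggests.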
4. On Small Area Estimation based on Sample Distribution
In order to illustrate the proposed approach, we suppose that the area level random effects model defined by (4.1) holds for the sampled areas, i.e., for $j \in s_i$,

$$ y_{ij} = \mu + u_i + e_{ij}; \quad u_i \mid I_i = 1 \sim N(0, \sigma_u^2), \quad e_{ij} \mid I_{ij} = 1 \sim N(0, \sigma_e^2). \qquad (4.1) $$

We mention in this respect that the sample model can be identified using conventional techniques; see, e.g., Rao (2003).

Suppose that in the first stage $m$ areas are selected with inclusion probabilities $\pi_i$ ($m$ is fixed), and in the second stage $n_i$ units are sampled from area $i$ selected in the first stage with inclusion probabilities $\pi_{j|i}$, where again we assume for convenience that the sample sizes $n_i$ are fixed. Assume that

$$ E_s(w_{j|i} \mid y_{ij}, u_i) = E_s(w_{j|i} \mid y_{ij}) = c_i \exp(b y_{ij}), \qquad (4.2) $$

where $\theta_i = \mu + u_i$, and $c_i > 0$ and $b$ are fixed parameters. Notice that since $n_i$ is fixed, $E_s(w_{j|i} \mid u_i) = N_i/n_i$.

Comment: As with the sample model (4.1), the expectation in (4.2) refers to the sample distribution within the areas. The relationship between the sampling weights and the observed data holding in the sample can therefore be identified and estimated from the sample data. See Pfeffermann and Sverchkov (1999, 2003) for discussion and examples. On the other hand, the relationship between the sampling weights $w_i$ and the small area means $\theta_i = \mu + u_i$ is more difficult to detect, since the area means are not observable, and in what follows we do not model this relationship. See Pfeffermann et al. (2001) for an example of modeling the selection probabilities at both stages.
As established in Section 2, the optimal predictor for areas in the sample is $E_p(\bar Y_i \mid D_s, I_i = 1) = [\sum_{j \in s_i} y_{ij} + \sum_{l \notin s_i} E_c(y_{il} \mid D_s, I_i = 1)]/N_i$. In order to compute the expectations $E_c(y_{il} \mid D_s, I_i = 1)$ we proceed in the following steps. First, by (1.7), (4.1) and (4.2),

$$ f_c(y_{il} \mid \theta_i, I_i = 1) = \frac{[E_s(w_{l|i} \mid y_{il}, \theta_i) - 1]\, f_s(y_{il} \mid \theta_i)}{E_s(w_{l|i} \mid \theta_i) - 1} = [c_i \exp(b y_{il}) - 1]\, \frac{1}{\sigma_e}\phi\Big[\frac{y_{il} - \theta_i}{\sigma_e}\Big] \Big/ \Big[\frac{N_i}{n_i} - 1\Big] $$
$$ = \frac{n_i}{N_i - n_i}\Big\{c_i \exp\Big(\theta_i b + \frac{\sigma_e^2 b^2}{2}\Big)\frac{1}{\sigma_e}\phi\Big[\frac{y_{il} - (\theta_i + b\sigma_e^2)}{\sigma_e}\Big] - \frac{1}{\sigma_e}\phi\Big[\frac{y_{il} - \theta_i}{\sigma_e}\Big]\Big\}, \qquad (4.3) $$

where $\phi$ is the standard normal pdf. Notice that if $b = 0$ (noninformative selection within the sampled areas with equal inclusion probabilities), $c_i = N_i/n_i$ and the pdf in (4.3) reduces to the conditional normal density defined by (4.1). Second, by (4.3),

$$ E_c(y_{il} \mid \theta_i, I_i = 1) = \frac{n_i}{N_i - n_i}\Big\{c_i \exp\Big(\theta_i b + \frac{\sigma_e^2 b^2}{2}\Big)[\theta_i + b\sigma_e^2] - \theta_i\Big\}. \qquad (4.4) $$
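As a numerical sanity check of (4.3) and (4.4), one can evaluate the density in (4.3) on a fine grid and compare its mean with the closed form in (4.4). The parameter values below are our own illustrative choices, with $c_i$ calibrated so that $E_s(w_{l|i} \mid \theta_i) = N_i/n_i$:

```python
import numpy as np

# Illustrative parameter values (our own assumptions for this check):
theta, b, sigma_e, N_i, n_i = 1.0, 0.3, 1.0, 100, 20
# Calibrate c_i so that E_s(w_{l|i} | theta) = N_i / n_i, as required when n_i is fixed.
c_i = (N_i / n_i) * np.exp(-(theta * b + 0.5 * b**2 * sigma_e**2))

phi = lambda z: np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)   # standard normal pdf

# Density (4.3), first expression, on a fine grid.
y = np.linspace(theta - 10.0, theta + 10.0, 200_001)
dy = y[1] - y[0]
f_c = (c_i * np.exp(b * y) - 1.0) * phi((y - theta) / sigma_e) / sigma_e / (N_i / n_i - 1.0)

mass = np.sum(f_c) * dy                 # should be ~1
mean_num = np.sum(y * f_c) * dy         # numerical E_c(y_il | theta_i, I_i = 1)

# Closed form (4.4).
mean_44 = n_i / (N_i - n_i) * (
    c_i * np.exp(theta * b + 0.5 * sigma_e**2 * b**2) * (theta + b * sigma_e**2) - theta
)
print(mass, mean_num, mean_44)
```

One caveat of the exponential model (4.2): far in the lower tail $c_i e^{b y} < 1$, so the expression in (4.3) can dip slightly below zero there; for these values the effect is numerically negligible.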
Finally,

$$ E_c(y_{il} \mid D_s, I_i = 1) = E_s[E_c(y_{il} \mid I_i = 1, \theta_i)], \qquad (4.5) $$

where the exterior expectation is with respect to the distribution of $\theta_i \mid D_s, I_i = 1$. Under the model (4.1), the latter distribution is normal with mean $\hat\theta_i = \gamma_i \bar y_i + (1 - \gamma_i)\bar y$ and variance $v_i = \gamma_i \sigma_i^2 + (1 - \gamma_i)^2 (\sum_{i=1}^{m} \gamma_i/\sigma_u^2)^{-1}$, where $\bar y_i = \sum_{j=1}^{n_i} y_{ij}/n_i$ is the sample mean in sampled area $i$, $\bar y = \sum_{i=1}^{m} n_i \bar y_i / \sum_{i=1}^{m} n_i$, $\sigma_i^2 = Var(\bar y_i \mid u_i) = \sigma_e^2/n_i$, and $\gamma_i = \sigma_u^2/[\sigma_u^2 + \sigma_i^2]$. Thus, for the sampled areas, $E_c(y_{il} \mid D_s, I_i = 1)$ is obtained by computing the expectation of the right-hand side of (4.4) with respect to the normal distribution of $\theta_i \mid D_s, I_i = 1$. We find that

$$ E_c(y_{il} \mid D_s, I_i = 1) = \frac{n_i}{N_i - n_i}\Big\{c_i[\hat\theta_i + b(v_i + \sigma_e^2)]\exp\Big[\hat\theta_i b + \frac{b^2}{2}(\sigma_e^2 + v_i)\Big] - \hat\theta_i\Big\}. \qquad (4.6) $$

Notice that if $b = 0$ (noninformative sampling within the areas with equal inclusion probabilities), $c_i = N_i/n_i$ and $E_c(y_{il} \mid D_s, I_i = 1) = \hat\theta_i$.

Comment: The optimal predictor obtained for the case of noninformative sampling, $E_p(\bar Y_i \mid D_s, I_i = 1) = [\sum_{j \in s_i} y_{ij} + (N_i - n_i)\hat\theta_i]/N_i$ (Eq. 2.2), is different from the common predictor $\hat\theta_i$. This is so because the target parameter is defined to be the finite area mean $\bar Y_i$ rather than $\theta_i$. See also Prasad and Rao (1999).
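A direct implementation of (4.6), together with a Monte Carlo check that it equals the expectation of (4.4) over $\theta_i \sim N(\hat\theta_i, v_i)$. The parameter values and the calibration of $c_i$ are our own illustrative assumptions:

```python
import numpy as np

def ec_y_sampled_area(theta_hat, v_i, b, c_i, N_i, n_i, sigma_e):
    """Complement expectation (4.6) for a non-sampled unit in a sampled area."""
    term = c_i * (theta_hat + b * (v_i + sigma_e**2)) * np.exp(
        theta_hat * b + 0.5 * b**2 * (sigma_e**2 + v_i)
    )
    return n_i / (N_i - n_i) * (term - theta_hat)

# Monte Carlo check: (4.6) is the mean of (4.4) over theta ~ N(theta_hat, v_i).
rng = np.random.default_rng(2)
theta_hat, v_i, b, sigma_e, N_i, n_i = 1.0, 0.25, 0.4, 1.0, 50, 10
c_i = (N_i / n_i) * np.exp(-(theta_hat * b + 0.5 * b**2 * sigma_e**2))  # rough calibration

th = rng.normal(theta_hat, np.sqrt(v_i), 400_000)
ec_44 = n_i / (N_i - n_i) * (
    c_i * np.exp(th * b + 0.5 * sigma_e**2 * b**2) * (th + b * sigma_e**2) - th
)
mc = ec_44.mean()
closed = ec_y_sampled_area(theta_hat, v_i, b, c_i, N_i, n_i, sigma_e)
print(mc, closed)
```

Setting $b = 0$ and $c_i = N_i/n_i$ in the function returns $\hat\theta_i$, reproducing the noninformative case noted after (4.6).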
For the nonsampled areas, the optimal predictor is defined in (2.3) to be $E_p(\bar Y_i \mid D_s, I_i = 0) = \sum_{k=1}^{N_i} E_c(y_{ik} \mid D_s, I_i = 0)/N_i$. In order to compute the expectations $E_c(y_{ik} \mid D_s, I_i = 0)$ we note first that

$$ f_p(y_{ij} \mid \theta_i, I_i = 1) = f_p(y_{ij} \mid \theta_i, I_i = 0) = f_p(y_{ij} \mid \theta_i), \qquad (4.7) $$

signifying that, conditionally on the area means $\theta_i$, the population pdf is the same for all the areas, irrespective of whether the areas are sampled or not. The pdf $f_p(y_{il} \mid \theta_i)$ is obtained from (1.5), (1.6) and (4.2), similarly to the derivation of $f_c(y_{il} \mid \theta_i, I_i = 1)$ in (4.3), as

$$ f_p(y_{il} \mid \theta_i) = \frac{E_s(w_{l|i} \mid y_{il}, \theta_i)\, f_s(y_{il} \mid \theta_i)}{E_s(w_{l|i} \mid \theta_i)} = \frac{c_i n_i}{N_i}\exp\Big(\theta_i b + \frac{\sigma_e^2 b^2}{2}\Big)\frac{1}{\sigma_e}\phi\Big[\frac{y_{il} - (\theta_i + b\sigma_e^2)}{\sigma_e}\Big]. \qquad (4.8) $$

Notice that the population pdf is different from the sample pdf defined by (4.1) unless the sampling scheme within the areas is noninformative ($b = 0$). By (4.8),

$$ E_p(y_{il} \mid \theta_i) = \frac{c_i n_i}{N_i}\exp\Big(\theta_i b + \frac{\sigma_e^2 b^2}{2}\Big)(\theta_i + b\sigma_e^2). \qquad (4.9) $$
Now,

$$ E_c(y_{ik} \mid D_s, I_i = 0) \stackrel{def}{=} E_p(y_{ik} \mid D_s, I_i = 0) = E_p[E_p(y_{ik} \mid \theta_i, D_s, I_i = 0) \mid D_s, I_i = 0] $$
$$ = E_p[E_p(y_{ik} \mid \theta_i) \mid D_s, I_i = 0] \stackrel{def}{=} E_c[E_p(y_{ik} \mid \theta_i) \mid D_s], \qquad (4.10) $$

where the exterior expectations in the last row are with respect to the conditional distribution $f_p(\theta_i \mid D_s, I_i = 0) = f_c(\theta_i \mid D_s)$. Finally, by (1.8) and (4.10),

$$ E_c(y_{ik} \mid D_s, I_i = 0) = E_c[E_p(y_{ik} \mid \theta_i) \mid D_s] = \frac{E_s[(w_i - 1) E_p(y_{ik} \mid \theta_i) \mid D_s]}{E_s(w_i \mid D_s) - 1}. \qquad (4.11) $$

Denoting $\theta_{i,p} = E_p(y_{ik} \mid \theta_i)$, an estimator of the expectation $E_c(y_{ik} \mid D_s, I_i = 0)$ is obtained from (4.11) as

$$ \hat E_c(y_{ik} \mid D_s, I_i = 0) = \frac{\frac{1}{m}\sum_{i \in s}(w_i - 1)\hat\theta_{i,p}}{\hat E_s(w_i \mid D_s) - 1} = \frac{\sum_{i \in s}(w_i - 1)\hat\theta_{i,p}}{\sum_{i \in s}(w_i - 1)}, \qquad (4.12) $$

where $\hat E_s(w_i \mid D_s) = \frac{1}{m}\sum_{i \in s} w_i$, and $\hat\theta_{i,p}$ is obtained by substituting $\theta_i$ in (4.9) by $\hat\theta_i = \gamma_i \bar y_i + (1 - \gamma_i)\bar y = E_s(\theta_i \mid D_s, I_i = 1)$, or by use of a direct Hájek-type estimator of $\theta_{i,p} = E_p(y_{ik} \mid \theta_i)$. Notice that the right-hand side of (4.12) defines the predictor of the mean of the $\theta_i$'s in the nonsampled areas.
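The predictor (4.12) is simply a ratio of $(w_i - 1)$-weighted sums over the sampled areas. A minimal sketch (the function name is hypothetical):

```python
import numpy as np

def predict_nonsampled_mean(w, theta_p_hat):
    """Predictor (4.12) of E_c(y_ik | D_s, I_i = 0), from the first-stage
    sampling weights w_i and per-area estimates theta_hat_{i,p} of
    E_p(y_ik | theta_i) in the sampled areas."""
    w = np.asarray(w, dtype=float)
    theta_p_hat = np.asarray(theta_p_hat, dtype=float)
    return np.sum((w - 1.0) * theta_p_hat) / np.sum(w - 1.0)
```

With equal first-stage weights the predictor reduces to the simple average of the $\hat\theta_{i,p}$; unequal weights shift it toward the areas that were least likely to be sampled, mimicking the nonsampled part of the population.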
References

Pfeffermann, D., Krieger, A. M., and Rinott, Y. (1998). Parametric distributions of complex survey data under informative probability sampling. Statistica Sinica, 8, 1087-1114.

Pfeffermann, D., Moura, F., and Silva, P. N. (2002). Fitting multilevel models under informative probability sampling. Multilevel Modeling Newsletter, 14, 8-17.

Pfeffermann, D. and Sverchkov, M. (1999). Parametric and semi-parametric estimation of regression models fitted to survey data. Sankhya, Series B, 61, Pt. 1, 166-186.

Pfeffermann, D. and Sverchkov, M. (2003). Fitting generalised linear models under informative sampling. Chapter 12 in R. L. Chambers and C. Skinner (eds.), Analysis of Survey Data, Chichester: Wiley, pp. 175-195.

Prasad, N. G. N. and Rao, J. N. K. (1999). On robust small area estimation using a simple random effect model. Survey Methodology, 25, 67-72.

Rao, J. N. K. (2003). Small Area Estimation. New York: Wiley.

Sverchkov, M. and Pfeffermann, D. (2000). Prediction of finite population totals under informative sampling utilizing the sample distribution. American Statistical Association Proceedings of the Section on Survey Research Methods, 41-46.