
Biometrika (2004), 91, 1, pp. 95–112
© 2004 Biometrika Trust
Printed in Great Britain
Small-area estimation based on natural exponential family
quadratic variance function models and survey weights
B MALAY GHOSH
Department of Statistics, University of Florida, Gainesville, Florida 32611-8545, U.S.A.
[email protected]
AND TAPABRATA MAITI
Department of Statistics, Iowa State University, Ames, Iowa 50011-1210, U.S.A.
[email protected]
SUMMARY
We propose pseudo empirical best linear unbiased estimators of small-area means based
on natural exponential family quadratic variance function models when the basic data
consist of survey-weighted estimators of these means, area-specific covariates and certain
summary measures involving the weights. We also provide explicit approximate mean squared
errors of these estimators in the spirit of Prasad & Rao (1990), and these estimators can
be readily evaluated. A simulation study is undertaken to evaluate the performance of the
proposed inferential procedure. We also estimate the proportion of poor children in the 5–17 years age-group for the different counties in one of the states in the United States.
Some key words: Area specific covariate; Linear unbiased estimator; Mean squared error; Optimal estimating
equation; Poor school-age children; Pseudo empirical best.
1. INTRODUCTION
Direct survey estimators for small areas are usually unreliable, which makes it necessary
to use models, either explicit or implicit, to connect the small areas and obtain estimators
of improved precision by ‘borrowing strength’. The direct estimators, however, are not
without value, since they often form the starting point for finding model-based estimates.
Most of the existing small-area estimation methods do not make use of survey weights.
However, direct estimates are often survey-weighted, and are the only accessible data along
with area level covariates for secondary users of surveys. There is therefore a clear need
to develop small-area estimation methods based on weighted survey data.
We develop a general methodology for finding empirical best linear unbiased predictors
of small-area means based on direct survey-weighted estimators and natural exponential
family quadratic variance function superpopulation models. Morris (1982, 1983) first
characterised distributions belonging to the natural exponential family quadratic variance
function family and studied many of their properties. The six basic distributions belonging
to this family are the binomial, Poisson, normal with known variance, gamma, negative
binomial and the generalised hyperbolic secant. A simulation is undertaken to evaluate
the proposed methodology. Also, the procedure is illustrated by estimating the proportion
of poor children in the 5–17 age-group for the different counties of the United States as
a part of the project with the United States Bureau of the Census.
Kott (1989) initiated the study of small-area estimators using survey weights primarily
with the objective of achieving design consistency, as small-area sample sizes increase, in
the case of model failure. He used the familiar unit-level random effects model and obtained
both model-unbiased and design-consistent small-area estimators assuming equality of the
error variances.
Prasad & Rao (1999) obtained pseudo empirical best linear unbiased predictors of
small-area parameters which depended on survey weights and were design consistent.
They also obtained approximate estimators of the mean squared errors of the small-area
estimators which were more stable than the ones proposed by Kott (1989). Their approach,
based on a random effects model, needed estimation of both the model variance and the
error variance in finding the pseudo empirical best linear unbiased predictors. However,
the variance of the distribution is a known function of the mean for the binomial and
Poisson distributions, which are typically used to model binary and count data, and this
is true in general for the natural exponential family quadratic variance function family of
distributions.
Since our application consists of finding the number or proportion of poor school-age
children, it fits in very nicely within the general natural exponential family quadratic
variance function framework. Our procedure for estimating the model parameters will
depend only on the direct survey weighted estimators of small-area means, the area-specific
weights and certain summary measures involving the survey weights.
In § 2 of this paper, we introduce the natural exponential family quadratic variance function model along with a conjugate prior for the canonical parameter of the exponential model. Together, they constitute an overdispersed natural exponential family quadratic
variance function model. Our model involves area-level covariates, but is motivated from
the unit-level model via the use of design weights. Based on such models, we first develop
pseudo best linear unbiased predictors of the small-area means, and subsequently find the
pseudo empirical best linear unbiased predictors by estimating the prior parameters by a method based on the theory of optimal estimating functions, as proposed by Godambe & Thompson (1989), which is quite different from the currently available procedures even
in the normal case. We also point out the design consistency of these small-area estimators
when the sample size within a small area grows to infinity.
In § 3, we derive approximate formulae for the mean squared errors and find their
approximate estimators; to find bias-corrected estimators of the mean squared errors, we
use a variation of a certain technique of Cox & Snell (1968), who obtained asymptotic
biases of the maximum likelihood estimators up to a certain desired order. A small
simulation study in § 4 illustrates applicability of the proposed method, and § 5 describes
our demographic example.
Pfeffermann and his colleagues have argued strongly in favour of accounting for the
sample selection process for inferential purposes in complex surveys. The present study
provides a systematic account of how to use survey weights for small-area estimation
based on the natural exponential family quadratic variance function model, although,
unlike in Pfeffermann (1993), the present study is based on noninformative sampling.
However, we do indicate very briefly how to modify our procedure along the lines suggested by Pfeffermann, Krieger & Rinott (1998) and Pfeffermann & Sverchkov (1999)
when the selection probabilities of the different units depend on the response values.
2. DEVELOPMENT OF THE SMALL-AREA ESTIMATORS
Let $y_{ij}$ denote the response of the $j$th unit in the $i$th small area ($j=1,\ldots,n_i$; $i=1,\ldots,k$); let $\tilde w_{ij}$ be the weight attached to $y_{ij}$, usually the inverse of the selection probability, and let $w_{ij}=\tilde w_{ij}/\sum_{j=1}^{n_i}\tilde w_{ij}$, so that $\sum_{j=1}^{n_i}w_{ij}=1$. Our basic data consist of $\bar y_{iw}=\sum_{j=1}^{n_i}w_{ij}y_{ij}$ ($i=1,\ldots,k$), and not the $y_{ij}$ themselves. It is assumed that the weights $w_{ij}$ are independent of the $y_{ij}$ so that the former can be considered as fixed numbers given the sample.
Suppose that the $y_{ij}$ are independent, and $y_{ij}$ has probability density function or probability function belonging to the natural exponential family quadratic variance function family, written as
$$f(y_{ij}\mid\theta_i)=\exp[\xi_i\{\theta_i y_{ij}-\psi(\theta_i)\}+c(y_{ij},\xi_i)]. \qquad (2\cdot1)$$
This is the regular one-parameter exponential family model. Here
$$E(y_{ij}\mid\theta_i)=\psi'(\theta_i)=\mu_i,\qquad \mathrm{var}(y_{ij}\mid\theta_i)=\psi''(\theta_i)/\xi_i=V(\mu_i)/\xi_i,$$
say. Since $\mathrm{var}(y_{ij}\mid\theta_i)>0$, $\mu_i$ is strictly increasing in $\theta_i$.
With the quadratic variance function structure, $V(\mu_i)=v_0+v_1\mu_i+v_2\mu_i^2$, where $v_0$, $v_1$ and $v_2$ are not simultaneously zero. For the binomial distribution, $v_0=0$, $v_1=1$ and $v_2=-1$. For the Poisson distribution, $v_0=v_2=0$ and $v_1=1$. For the normal model with known variance $\sigma^2$, $\xi_i=\sigma^{-2}$, $v_0=1$ and $v_1=v_2=0$. From now on, unless otherwise specified, we will take $\xi_i=1$ for all $i=1,\ldots,k$.
In what follows we assume that the weights $w_{ij}$ are independent of the $y_{ij}$ and hence that they can be considered as fixed numbers given the sample. The estimator $\bar y_{iw}$ can be interpreted as a weighted maximum likelihood estimator or pseudo maximum likelihood estimator of $\mu_i$, using the terminology of Roberts et al. (1987) or Skinner et al. (1989). Following these authors, we begin with the weighted likelihood
$$L_i(\theta_i)=\prod_{j=1}^{n_i} f^{\,w_{ij}}(y_{ij}\mid\theta_i),$$
where $f(y_{ij}\mid\theta_i)$ is defined in (2·1). Now solving
$$\frac{d\log L_i(\theta_i)}{d\theta_i}=\sum_{j=1}^{n_i}w_{ij}(y_{ij}-\mu_i)=0,$$
one obtains the weighted maximum likelihood estimator of $\mu_i$ as $\bar y_{iw}$. Alternatively, following Pfeffermann, Skinner et al. (1998), we can begin with the census likelihood equation $\sum_{j=1}^{N_i}(y_{ij}-\mu_i)=0$, using all the hypothetical observations in a finite population of size $N_i$, and then make the Horvitz–Thompson replacement of $\sum_{j=1}^{N_i}(y_{ij}-\mu_i)$ by $\sum_{j=1}^{n_i}w_{ij}(y_{ij}-\mu_i)$, where $w_{ij}$ is the normalised inverse of the selection probability $p_{ij}$ of the $j$th unit in the $i$th local area.
Consider now the conjugate prior with probability density function
$$\pi(\theta_i)=\exp[\lambda\{m_i\theta_i-\psi(\theta_i)\}]\,C(\lambda,m_i), \qquad (2\cdot2)$$
for $\theta_i$, where $m_i=g(x_i^{T}\beta)$ ($i=1,\ldots,k$). Here $x_i$ is the design vector for the $i$th small area, $\beta$ is the regression coefficient and $g$ is the link function. Then (Morris, 1983)
$$E(\mu_i)=m_i,\qquad \mathrm{var}(\mu_i)=\frac{V(m_i)}{\lambda-v_2}, \qquad (2\cdot3)$$
where $\lambda>\max(0,v_2)$. Since $\mathrm{var}(\mu_i)$ is strictly decreasing in $\lambda$, $\lambda$ may be interpreted as the precision parameter.
We obtain first the best linear unbiased predictor of $\mu_i$ based on $\bar y_{iw}$. To this end, we compute
$$E(\bar y_{iw})=E\{E(\bar y_{iw}\mid\theta_i)\}=E(\mu_i)=m_i. \qquad (2\cdot4)$$
Also, writing $d_i=\sum_{j=1}^{n_i}w_{ij}^2$, we have
$$\begin{aligned}
\mathrm{var}(\bar y_{iw})&=\mathrm{var}\{E(\bar y_{iw}\mid\theta_i)\}+E\{\mathrm{var}(\bar y_{iw}\mid\theta_i)\}=\mathrm{var}(\mu_i)+d_iE\{V(\mu_i)\}\\
&=V(m_i)(\lambda-v_2)^{-1}+d_iE\{V(m_i)+(\mu_i-m_i)V'(m_i)+v_2(\mu_i-m_i)^2\}\\
&=V(m_i)(\lambda-v_2)^{-1}+d_i\{V(m_i)+v_2V(m_i)(\lambda-v_2)^{-1}\}\\
&=V(m_i)(\lambda-v_2)^{-1}(1+\lambda d_i)=\omega_iV(m_i), \qquad (2\cdot5)
\end{aligned}$$
say, and
$$\mathrm{cov}(\bar y_{iw},\mu_i)=E\{\mathrm{cov}(\bar y_{iw},\mu_i\mid\theta_i)\}+\mathrm{cov}\{E(\bar y_{iw}\mid\theta_i),\mu_i\}=E(0)+\mathrm{var}(\mu_i)=V(m_i)(\lambda-v_2)^{-1}. \qquad (2\cdot6)$$
By (2·3)–(2·6), the best linear unbiased predictor of $\mu_i$ based on $\bar y_{iw}$ is
$$\tilde\mu_i=m_i+\frac{V(m_i)(\lambda-v_2)^{-1}}{V(m_i)(\lambda-v_2)^{-1}(1+\lambda d_i)}(\bar y_{iw}-m_i)=r_{iw}\bar y_{iw}+(1-r_{iw})m_i, \qquad (2\cdot7)$$
where $r_{iw}=(1+\lambda d_i)^{-1}$. Note that this derivation depends only on the means, variances and covariances of $\bar y_{iw}$ and $\mu_i$.
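As a minimal illustration of (2·7), the following R sketch computes the pseudo best linear unbiased predictor for one area from the direct estimate, the prior mean $m_i=\psi'(x_i^{T}\beta)$ and the summary $d_i=\sum_j w_{ij}^2$; it assumes $\beta$ and $\lambda$ are known, and all function and variable names are ours, not the authors'.

  # Pseudo best linear unbiased predictor of mu_i from (2.7), assuming beta and
  # lambda are known; psi_prime is the inverse canonical link (plogis for binomial).
  pseudo_blup <- function(ybar_iw, x_i, beta, lambda, d_i, psi_prime = plogis) {
    m_i  <- psi_prime(sum(x_i * beta))   # prior mean m_i = psi'(x_i' beta)
    r_iw <- 1 / (1 + lambda * d_i)       # shrinkage factor (1 + lambda d_i)^(-1)
    r_iw * ybar_iw + (1 - r_iw) * m_i    # (2.7)
  }
  # Example with illustrative values only:
  pseudo_blup(ybar_iw = 0.20, x_i = c(1, 0.5), beta = c(-1.5, 1), lambda = 8, d_i = 0.05)

Estimation of $(\beta,\lambda)$, needed to make this operational, is taken up below.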
Remark 1. For the normal model with $\xi=\sigma^{-2}$, rather than 1, and $m_i=x_i^{T}\beta$, $\tilde\mu_i$ given in (2·7) becomes the Prasad & Rao (1999) estimator.

Remark 2. The formulation (2·1) and (2·2) is non-Bayesian. The word 'prior' is used only for convenience. Indeed the marginal distribution of $y_{ij}$ based on (2·1) and (2·2) is an overdispersed natural exponential family quadratic variance function distribution. Special cases are the beta-binomial or gamma-Poisson model depending on whether (2·1) is a binomial or a Poisson distribution.
However, $\beta$ and $\lambda$ are unknown, and need to be estimated from the marginal distributions of $\bar y_{iw}$ ($i=1,\ldots,k$). Also, for simplicity, in the remainder of the paper, we consider the canonical link, namely $m_i=\psi'(x_i^{T}\beta)$ ($i=1,\ldots,k$).

The marginal distributions of $\bar y_{iw}$ are nonstandard except for the normal case. We use the theory of optimal estimating functions (Godambe & Thompson, 1989) for simultaneous estimation of $\beta$ and $\lambda$. This requires evaluation of only the first four marginal moments of $\bar y_{iw}$ ($i=1,\ldots,k$) rather than full knowledge of their distributions.
Following Godambe & Thompson (1989), we begin with the elementary unbiased estimating functions $g_i=(g_{1i},g_{2i})^{T}$, where $g_{1i}=\bar y_{iw}-m_i$ and $g_{2i}=(\bar y_{iw}-m_i)^2-\omega_iV(m_i)$, where $\omega_i$ is defined in (2·5). Let
$$D_i^{T}=\begin{pmatrix}-E\!\left(\dfrac{\partial g_{1i}}{\partial\beta}\right) & -E\!\left(\dfrac{\partial g_{2i}}{\partial\beta}\right)\\[4pt] -E\!\left(\dfrac{\partial g_{1i}}{\partial\lambda}\right) & -E\!\left(\dfrac{\partial g_{2i}}{\partial\lambda}\right)\end{pmatrix}=V(m_i)\begin{pmatrix}x_i & V'(m_i)\,\omega_i\,x_i\\ 0 & (1+v_2d_i)(\lambda-v_2)^{-2}\end{pmatrix}, \qquad (2\cdot8)$$
where we use $\partial m_i/\partial\beta=V(m_i)x_i$. Next let $\mu_{ri}=E(\bar y_{iw}-m_i)^r$ ($r=1,2,\ldots$) and
$$S_i=\mathrm{var}(g_i)=\begin{pmatrix}\mu_{2i} & \mu_{3i}\\ \mu_{3i} & \mu_{4i}-\mu_{2i}^2\end{pmatrix}. \qquad (2\cdot9)$$
Then $\beta$ and $\lambda$ are obtained by solving the optimal estimating equations $\sum_{i=1}^{k}D_i^{T}S_i^{-1}g_i=0$. However, $|S_i|=\mu_{2i}\mu_{4i}-\mu_{3i}^2-\mu_{2i}^3=\Delta_i$, say, and
$$S_i^{-1}=\Delta_i^{-1}\begin{pmatrix}\mu_{4i}-\mu_{2i}^2 & -\mu_{3i}\\ -\mu_{3i} & \mu_{2i}\end{pmatrix}. \qquad (2\cdot10)$$
Hence, by Godambe & Thompson (1989), the optimal estimating equations are
$$\sum_{i=1}^{k}\Delta_i^{-1}\bigl[\{\mu_{4i}-\mu_{2i}^2-\mu_{3i}(1+\lambda d_i)(\lambda-v_2)^{-1}V'(m_i)\}g_{1i}+\{\mu_{2i}(1+\lambda d_i)(\lambda-v_2)^{-1}V'(m_i)-\mu_{3i}\}g_{2i}\bigr]V(m_i)x_i=0, \qquad (2\cdot11)$$
$$\sum_{i=1}^{k}\Delta_i^{-1}(\mu_{2i}g_{2i}-\mu_{3i}g_{1i})V(m_i)(1+v_2d_i)(\lambda-v_2)^{-2}=0. \qquad (2\cdot12)$$
Solving (2·11) and (2·12) simultaneously, and writing $\gamma=(\beta,\lambda)^{T}$, one obtains $\hat\gamma=(\hat\beta,\hat\lambda)^{T}$. Then $r_{iw}$ and $m_i$ are estimated by $\hat r_{iw}=(1+\hat\lambda d_i)^{-1}$ and $\hat m_i=\psi'(x_i^{T}\hat\beta)$. Accordingly, an empirical best linear unbiased predictor of $\mu_i$ is
$$\hat\mu_i=\hat r_{iw}\bar y_{iw}+(1-\hat r_{iw})\hat m_i. \qquad (2\cdot13)$$
In general, there is no guarantee that the solution $\hat\lambda$ found from (2·11) and (2·12) satisfies the constraint $\hat\lambda>\max(0,v_2)$. In such cases, we take $\hat\lambda=\max(0,v_2)$. Also, equations (2·11) and (2·12) can only be solved numerically. We accomplish this by the Nelder–Mead algorithm. In particular, we use the optim function in R to solve these equations, minimising the sum of squares of the estimating functions to get the roots of these equations. The R code can be obtained from the authors. This approach may present problems in the presence of multiple roots, but fortunately we did not encounter this in our example.
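The numerical strategy just described can be sketched in R. Everything below is our own illustrative code, not the authors' program: the functions mu2, mu3 and mu4 supplying the marginal central moments (for example from Theorem 1 below), Vfun and Vprime giving $V$ and $V'$, and the starting value start of length ncol(X)+1 are all assumptions of the sketch.

  # Sketch: solve (2.11)-(2.12) by minimising the sum of squared estimating
  # functions with Nelder-Mead, then form the empirical predictor (2.13).
  fit_nefqvf <- function(ybar, X, d, v2, Vfun, Vprime, mu2, mu3, mu4,
                         start, psi_prime = plogis) {
    ee <- function(par) {
      p      <- ncol(X)
      beta   <- par[1:p]
      lambda <- max(par[p + 1], v2 + 1e-6, 1e-6)   # keep lambda > max(0, v2)
      m      <- psi_prime(drop(X %*% beta))
      omega  <- (1 + lambda * d) / (lambda - v2)
      g1     <- ybar - m
      g2     <- (ybar - m)^2 - omega * Vfun(m)
      m2 <- mu2(m, lambda, d); m3 <- mu3(m, lambda, d); m4 <- mu4(m, lambda, d)
      Delta  <- m2 * m4 - m2^3 - m3^2
      a1 <- (m4 - m2^2 - m3 * omega * Vprime(m)) * g1          # (2.11), g1 part
      a2 <- (m2 * omega * Vprime(m) - m3) * g2                 # (2.11), g2 part
      eq_beta <- drop(t(X) %*% ((a1 + a2) * Vfun(m) / Delta))
      eq_lam  <- sum((m2 * g2 - m3 * g1) * Vfun(m) * (1 + v2 * d) /
                       ((lambda - v2)^2 * Delta))              # (2.12)
      c(eq_beta, eq_lam)
    }
    fit <- optim(start, function(par) sum(ee(par)^2), method = "Nelder-Mead")
    p <- ncol(X)
    lambda_hat <- max(fit$par[p + 1], v2, 0)                   # truncate as in the text
    beta_hat   <- fit$par[1:p]
    r_hat      <- 1 / (1 + lambda_hat * d)
    m_hat      <- psi_prime(drop(X %*% beta_hat))
    list(beta = beta_hat, lambda = lambda_hat,
         eblup = r_hat * ybar + (1 - r_hat) * m_hat)           # (2.13)
  }

As in the paper, the quality of the minimiser as a root-finder should be checked, since multiple roots of the estimating equations cannot be ruled out in general.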
In order to find $\hat\gamma$, we first need to find $S_i$. We have seen already that $\mu_{2i}=\omega_iV(m_i)$, where $\omega_i=(1+\lambda d_i)(\lambda-v_2)^{-1}$. The following theorem provides expressions for $\mu_{3i}$ and $\mu_{4i}$.
T 1. L et f =W ni w3 , n =W ni w4 and d=v2 −4v v . T hen, if l>max (0, v , 3v ),
i
1
0 2
2 2
j=1 ij i
j=1 ij
V (m )V ∞(m )
i
i (l2f +3ld +2),
m =
i
i
3i (l−v )(l−2v )
2
2
m ={dn +(3d2 +6n v )V (m )}V (m )
4i
i
i
i 2
i
i
+(dn v +(3d2 +6n v +4f )[2v V (m )+{V ∞(m )}2}]+6d V (m ))E(m −m )2
i 2
i
i 2
i
2
i
i
i
i
i
i
+{(3d2 +6n v )2v V ∞(m )+12v f V ∞(m )+6d V ∞(m )}E(m −m )3
i
i 2 2
i
2 i
i
i
i
i
i
+{v2 (3d2 +6n v )+8v2 f +6v d +1}E(m −m )4.
2 i
i 2
2 i
2 i
i
i
The proof of this theorem involves some tedious algebra, and is omitted. The details are available from the authors. We have found already that $E(\mu_i-m_i)^2=V(m_i)/(\lambda-v_2)$. From Morris (1983, Theorem 5·3), we have
$$E(\mu_i-m_i)^3=\frac{2V(m_i)V'(m_i)}{(\lambda-v_2)(\lambda-2v_2)},\qquad
E(\mu_i-m_i)^4=\frac{3(\lambda+6v_2)V^2(m_i)+6\delta V(m_i)}{(\lambda-v_2)(\lambda-2v_2)(\lambda-3v_2)},$$
provided that $\lambda>\max(0,v_2,3v_2)$.
The expressions given in Theorem 1 simplify in the binomial and Poisson cases. In the binomial case, $V(m_i)=m_i(1-m_i)$, while, for the Poisson case, $V(m_i)=m_i$.
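For instance, under our reconstruction of Theorem 1, the second and third marginal central moments needed in $S_i$ take a simple form in the binomial case; a hedged R sketch follows (the function names are ours, and the lengthier expression for $\mu_{4i}$ is omitted).

  # Marginal central moments of ybar_iw in the binomial case, where V(m) = m(1 - m),
  # V'(m) = 1 - 2m and v2 = -1, so that lambda - v2 = lambda + 1 and
  # lambda - 2 v2 = lambda + 2; d = sum_j w_ij^2 and f = sum_j w_ij^3.
  mu2_binom <- function(m, lambda, d) {
    (1 + lambda * d) * m * (1 - m) / (lambda + 1)              # mu_{2i} = omega_i V(m_i)
  }
  mu3_binom <- function(m, lambda, d, f) {
    m * (1 - m) * (1 - 2 * m) * (lambda^2 * f + 3 * lambda * d + 2) /
      ((lambda + 1) * (lambda + 2))                            # Theorem 1, as reconstructed
  }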
Remark 3. The optimal estimating equations developed here provide a unified approach for estimation of $\beta$ and $\lambda$, and are different from the equations of Prasad & Rao (1999) even in the normal case. To see this, we continue to assume for convenience that $\xi_i=1$. Now noting that marginally $\bar y_{iw}\sim N\{x_i^{T}\beta,\lambda^{-1}(1+\lambda d_i)\}$, one has $g_{1i}=\bar y_{iw}-x_i^{T}\beta$ and $g_{2i}=(\bar y_{iw}-x_i^{T}\beta)^2-(\lambda^{-1}+d_i)$. Then
$$D_i^{T}=\begin{pmatrix}x_i & 0\\ 0 & \lambda^{-2}\end{pmatrix},\qquad S_i=\begin{pmatrix}\lambda^{-1}+d_i & 0\\ 0 & 2(\lambda^{-1}+d_i)^2\end{pmatrix}.$$
Accordingly, $\sum_{i=1}^{k}D_i^{T}S_i^{-1}g_i=0$ simplifies to
$$\sum_{i=1}^{k}(\lambda^{-1}+d_i)^{-1}(\bar y_{iw}-x_i^{T}\beta)x_i=0, \qquad (2\cdot14)$$
and
$$\sum_{i=1}^{k}\tfrac12(\lambda^{-1}+d_i)^{-2}\lambda^{-2}\{(\bar y_{iw}-x_i^{T}\beta)^2-(\lambda^{-1}+d_i)\}=0$$
or equivalently
$$\sum_{i=1}^{k}(\lambda^{-1}+d_i)^{-2}(\bar y_{iw}-x_i^{T}\beta)^2=\sum_{i=1}^{k}(\lambda^{-1}+d_i)^{-1}. \qquad (2\cdot15)$$
While (2·14) is similar to that of Prasad & Rao (1999) and provides the optimal model-unbiased estimator of $\beta$ for known $\lambda$, (2·15) is quite different from that of Prasad & Rao (1999).
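In this normal special case the two equations can be solved by simple alternation; the R sketch below (our own code, with hypothetical names) pairs a weighted least-squares step for (2·14) with a one-dimensional root search for (2·15).

  # Alternating solution of the normal-case equations (2.14)-(2.15): weighted least
  # squares for beta given lambda, then a root search for lambda given beta.
  # A sketch only; a sign change of the criterion inside the bracket is assumed.
  fit_normal_case <- function(ybar, X, d, lambda0 = 1, tol = 1e-8, maxit = 100) {
    lambda <- lambda0
    beta   <- rep(0, ncol(X))
    for (it in 1:maxit) {
      cwt  <- 1 / (1 / lambda + d)                             # (lambda^{-1} + d_i)^{-1}
      beta <- solve(t(X) %*% (cwt * X), t(X) %*% (cwt * ybar)) # solves (2.14)
      res2 <- drop(ybar - X %*% beta)^2
      h <- function(l) {                                       # (2.15) as a root problem
        cw <- 1 / (1 / l + d)
        sum(cw^2 * res2) - sum(cw)
      }
      new_lambda <- tryCatch(uniroot(h, c(1e-6, 1e6))$root, error = function(e) lambda)
      if (abs(new_lambda - lambda) < tol) { lambda <- new_lambda; break }
      lambda <- new_lambda
    }
    list(beta = drop(beta), lambda = lambda)
  }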
Remark 4. In the normal case, it is preferable to use the fact that $\bar y_{iw}$ is normal rather than to set up estimating equations based only on the first four moments of $\bar y_{iw}$. For small areas with large $n_i$, even without normality of the $y_{ij}$'s, a normal approximation of the distribution of $\bar y_{iw}$ may not be too unreasonable. However, this does not work for small areas with small $n_i$. In such instances, given that the marginal distribution of $\bar y_{iw}$ is nonstandard and fairly complicated, neither maximum likelihood nor any other similar method of estimation seems feasible.

Remark 5. The estimators $\hat\beta$ and $\hat\lambda$ of $\beta$ and $\lambda$ are consistent and asymptotically normal as $k\to\infty$, under certain conditions. If we use arguments similar to a standard maximum likelihood analysis, see for example Theorem 5·13 of Shao (1999), it follows that $\hat\beta-\beta=O_p(k^{-1/2})$ and $\hat\lambda-\lambda=O_p(k^{-1/2})$. If this consistency holds, $\hat\mu_i$ is a consistent estimator of $\mu_i$ when $d_i\to0$ as $n_i\to\infty$, as for example when $w_{ij}=n_i^{-1}$ for all $j=1,\ldots,n_i$ and $i=1,\ldots,k$.
Remark 6. Throughout the model-based formulation of this section, and in the rest of this paper, we have tacitly assumed that the sample selection probabilities $p_{ij}=\tilde w_{ij}^{-1}$ are not correlated with the $y_{ij}$. This need not always be the case. It may so happen that, even after conditioning on the covariates $x_i$, the sampling mechanism remains informative, and should be taken into account for inferential purposes.

Pfeffermann and his colleagues have made concrete proposals of how to incorporate this selection mechanism in model-based inference. One key consequence is that the distributions of the samples may be different from the assumed superpopulation model because of the selection mechanism. We provide one illustration of this approach in the present context.

First regard the selection probabilities $p_{ij}=\tilde w_{ij}^{-1}$ as realisations of a random variable. If we use the suffix P to denote a population distribution, and use S for a sample distribution, then from equation (3·1) of Pfeffermann, Krieger & Rinott (1998) one obtains the sample probability density function of $y_{ij}$ as
$$f_S(y_{ij}\mid\theta_i)=E_P(p_{ij}\mid y_{ij})\,f_P(y_{ij}\mid\theta_i)/E_P(p_{ij}). \qquad (2\cdot16)$$
For the purpose of illustration, let $E_P(p_{ij}\mid y_{ij})=\exp(A_0+A_1y_{ij})$ (Pfeffermann & Sverchkov, 1999). Then, by (2·1), if $\xi_i=1$, we obtain
$$E_P(p_{ij})=E_P\{\exp(A_0+A_1y_{ij})\}=\exp\{A_0+\psi(A_1+\theta_i)-\psi(\theta_i)\}.$$
Consequently, from (2·16), we have
$$f_S(y_{ij}\mid\theta_i)=\exp\{(A_1+\theta_i)y_{ij}-\psi(A_1+\theta_i)+c(y_{ij},1)\}. \qquad (2\cdot17)$$
Thus the population density of $y_{ij}$ given in (2·1) has now changed to the new exponentially shifted sampling density as given in (2·17). If we write $\phi_i=A_1+\theta_i$, and assume a conjugate prior for $\phi_i$ as in (2·2), the calculations of this section can be carried out directly, although retrieving $E\{\psi'(\theta_i)\mid y_i\}$ from $E\{\psi'(\phi_i)\mid y_i\}$ may not be easy because of the lack of one-to-one correspondence between the two, the normal case in which $\psi'(\theta_i)=\theta_i$ being an exception. The situation becomes even more complex if one assigns a conjugate prior to $\theta_i$ and not to $\phi_i$, because the conjugacy of $\theta_i$ is not necessarily translated to the conjugacy of $\phi_i$. Nevertheless, it seems that the calculations can at least be carried out numerically.
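For the Bernoulli case the tilting in (2·17) has a closed form: with $\psi(\theta)=\log(1+e^{\theta})$, the sample-level success probability is $\mathrm{expit}(\theta_i+A_1)$. A small numerical check of (2·16) under the assumed form $E_P(p_{ij}\mid y_{ij})=\exp(A_0+A_1y_{ij})$, with values that are illustrative only:

  # Numerical check of (2.16)-(2.17) for a Bernoulli response: the sample pmf is
  # the population Bernoulli pmf exponentially tilted by A1.
  theta <- 0.3; A0 <- -2; A1 <- 0.8              # illustrative values only
  mpop  <- plogis(theta)                         # population success probability
  ey    <- exp(A0 + A1 * c(0, 1))                # E_P(p_ij | y_ij) at y = 0, 1
  fpop  <- c(1 - mpop, mpop)                     # population pmf of y_ij
  fsamp <- ey * fpop / sum(ey * fpop)            # (2.16): tilted sample pmf
  all.equal(fsamp[2], plogis(theta + A1))        # (2.17): success prob is expit(theta + A1)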
M G  T M
102
3. CORRECTED ESTIMATORS OF THE MEAN SQUARED ERRORS OF THE SMALL-AREA ESTIMATORS

We first state a theorem which provides an asymptotic expansion of the mean squared errors of the $\hat\mu_i$. Let $T_{1i}=\lambda d_i(1+\lambda d_i)^{-1}(\lambda-v_2)^{-1}V(m_i)$,
$$T_{2i}=d_i^2(1+\lambda d_i)^{-2}\,\mathrm{tr}\left\{\begin{pmatrix}\lambda^2V^2(m_i)x_ix_i^{T} & 0\\ 0 & 0\end{pmatrix}U_k^{-1}\right\},$$
where $U_k=\sum_{i=1}^{k}D_i^{T}S_i^{-1}D_i$, and
$$\begin{aligned}
T_{3i}={}&d_i(1+\lambda d_i)^{-1}V(m_i)\,\mathrm{tr}\left\{\begin{pmatrix}\lambda V(m_i)x_ix_i^{T} & -V'(m_i)(\lambda-v_2)^{-1}x_i\\ 0 & -(1+\lambda d_i)^{-1}(1+v_2d_i)(\lambda-v_2)^{-2}\end{pmatrix}U_k^{-1}\right\}\\
&-d_i(1+\lambda d_i)^{-1}\,\mathrm{tr}\left\{D_i^{T}S_i^{-1}\begin{pmatrix}\lambda d_i(1+\lambda d_i)^{-1}V(m_i)f_{i11}x_i^{T} & -d_i(1+\lambda d_i)^{-2}f_{i12}\\ \lambda d_i(1+\lambda d_i)^{-1}V(m_i)f_{i12}x_i^{T} & -d_i(1+\lambda d_i)^{-2}f_{i22}\end{pmatrix}U_k^{-1}\right\}.
\end{aligned}$$
Expressions for $f_{i11}$, $f_{i12}$ and $f_{i22}$ are given in (A·20)–(A·22). Then we have the following theorem.

THEOREM 2. The approximate mean squared error for $\hat\mu_i$ is given up to $O(k^{-1})$ as
$$E(\hat\mu_i-\mu_i)^2=T_{1i}+T_{2i}+2T_{3i}. \qquad (3\cdot1)$$
The proof of the theorem is deferred to the Appendix.
Since $T_{2i}$ and $T_{3i}$ are both $O(k^{-1})$, they are estimated by substitution of $\hat\beta$ and $\hat\lambda$ for $\beta$ and $\lambda$ respectively in the relevant expressions. However, $T_{1i}=O(1)$, and creation of an estimator $\hat T_{1i}$ of $T_{1i}$ which is correct up to $O(k^{-1})$ requires considerable effort.
To this end, we write $T_{1i}$ as $T_{1i}(\gamma)$, and, by a two-step Taylor expansion,
$$T_{1i}(\hat\gamma)\simeq T_{1i}(\gamma)+\left(\frac{\partial T_{1i}}{\partial\gamma}\right)^{T}(\hat\gamma-\gamma)+\frac12(\hat\gamma-\gamma)^{T}\frac{\partial^2T_{1i}}{\partial\gamma\,\partial\gamma^{T}}(\hat\gamma-\gamma). \qquad (3\cdot2)$$
Thus,
$$E\{T_{1i}(\hat\gamma)\}\simeq T_{1i}(\gamma)+\left(\frac{\partial T_{1i}}{\partial\gamma}\right)^{T}E(\hat\gamma-\gamma)+\frac12\,\mathrm{tr}\left[E\left\{\frac{\partial^2T_{1i}}{\partial\gamma\,\partial\gamma^{T}}(\hat\gamma-\gamma)(\hat\gamma-\gamma)^{T}\right\}\right]. \qquad (3\cdot3)$$
Note that
$$\frac{\partial T_{1i}}{\partial\gamma}=\begin{pmatrix}\partial T_{1i}/\partial\beta\\ \partial T_{1i}/\partial\lambda\end{pmatrix}=\begin{pmatrix}V'(m_i)V(m_i)h_i(\lambda)x_i\\ V(m_i)h_i'(\lambda)\end{pmatrix}, \qquad (3\cdot4)$$
$$\frac{\partial^2T_{1i}}{\partial\gamma\,\partial\gamma^{T}}=\begin{pmatrix}[2v_2V(m_i)+\{V'(m_i)\}^2]V(m_i)h_i(\lambda)x_ix_i^{T} & V'(m_i)V(m_i)h_i'(\lambda)x_i\\ V'(m_i)V(m_i)h_i'(\lambda)x_i^{T} & V(m_i)h_i''(\lambda)\end{pmatrix},$$
where we recall that $h_i(\lambda)=\lambda d_i(1+\lambda d_i)^{-1}(\lambda-v_2)^{-1}$. As before
$$E\left\{\frac{\partial^2T_{1i}}{\partial\gamma\,\partial\gamma^{T}}(\hat\gamma-\gamma)(\hat\gamma-\gamma)^{T}\right\}\simeq\frac{\partial^2T_{1i}}{\partial\gamma\,\partial\gamma^{T}}U_k^{-1},$$
which is $O(k^{-1})$ and is estimated by substitution of $\hat\beta$ and $\hat\lambda$ for $\beta$ and $\lambda$ in the resulting expression. We denote this estimator by $2T_{12i}(\hat\gamma)$.
Finally, we approximate $E(\hat\gamma-\gamma)$, up to $O(k^{-1})$, by the method of Cox & Snell (1968). We write $E(\hat\gamma-\gamma)$ as $z(\gamma)$ and show in the Appendix that $z(\gamma)$ is $O(k^{-1})$. We now write
$$T_{11i}(\gamma)=\left(\frac{\partial T_{1i}(\gamma)}{\partial\gamma}\right)^{T}z(\gamma),$$
and estimate $T_{1i}(\gamma)$ by $T_{1i}(\hat\gamma)-T_{11i}(\hat\gamma)-T_{12i}(\hat\gamma)$.
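To fix ideas, the leading term of (3·1) and the 'naive' root mean squared error used in § 5 can be written down directly; a brief R sketch follows, with names of our choosing, and with the corrected estimate obtained by subtracting the estimated bias terms $T_{11i}+T_{12i}$ and adding $T_{2i}+2T_{3i}$.

  # Leading term T_{1i} of (3.1) and the 'naive' root mean squared error of Sec. 5;
  # the corrected estimate also needs T_{11i}, T_{12i}, T_{2i} and T_{3i}, whose
  # evaluation requires U_k^{-1} and (A.20)-(A.22) and is omitted here.
  T1 <- function(m, lambda, d, v2, Vfun) {
    lambda * d * Vfun(m) / ((1 + lambda * d) * (lambda - v2))
  }
  naive_rmse <- function(m_hat, lambda_hat, d, v2, Vfun) {
    sqrt(T1(m_hat, lambda_hat, d, v2, Vfun))
  }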
4. A SIMULATION STUDY

We conduct a simulation study to check the performance of the estimation methodology as described in the previous sections. The procedure is very similar to the one conducted by Prasad & Rao (1999), who considered a normal–normal model instead of the present beta-binomial model.

Step 1. Generate a covariate $x_i$ from $N(0,10^{-3})$ ($i=1,\ldots,k$).

Step 2. Generate a size measure $z_{ij}$ from an exponential distribution with mean 100, for $j=1,\ldots,N_i$ and $i=1,\ldots,k$. Define $Z_i=\sum_{j=1}^{N_i}z_{ij}$ ($i=1,\ldots,k$). We have taken $N_i=200$ for $i=1,\ldots,k$.

Step 3. Generate the response variable $y_{ij}$ according to a beta-binomial model as described in (2·1)–(2·3). We have taken $\beta=(0,1)^{T}$ and $\lambda=8$.

Step 4. Select a probability proportional to size sample with replacement independently from each small area with $n_i=20$ ($i=1,\ldots,k$). The selection probabilities for the sample are given by $p_{ij}=z_{ij}/Z_i$ ($j=1,\ldots,N_i$, $i=1,\ldots,k$). Then the $y_{ij}$'s are observed from the small areas.

Step 5. The basic sampling weights are given by $\tilde w_{ij}=n_i^{-1}p_{ij}^{-1}$ so that
$$w_{ij}=\frac{p_{ij}^{-1}}{\sum_{j=1}^{n_i}p_{ij}^{-1}}.$$
Then we calculate $\bar y_{iw}=\sum_{j=1}^{n_i}w_{ij}y_{ij}$ ($i=1,\ldots,k$).

Step 6. Calculate the $\hat\mu_i$'s and $\mathrm{rmse}(\hat\mu_i)$'s, the square roots of the estimated mean squared errors ($i=1,\ldots,k$), as given in §§ 2 and 3.

Step 7. We repeat Steps 3–6 $R=500$ times keeping Steps 1 and 2 fixed.

Step 8. For the estimators $\hat\mu_i$ ($i=1,\ldots,k$) we calculate the following quantities: the simulated relative bias of $\hat\mu_i$, defined by
$$\mathrm{RB}(\hat\mu_i)=\frac{1}{R}\sum_{r=1}^{R}\{\hat\mu_i(r)-\mu_i(r)\}/\mu_i(r)\qquad(i=1,\ldots,k),$$
in which $\mu_i(r)$ and $\hat\mu_i(r)$ are the values of $\mu_i$ and $\hat\mu_i$ respectively at the $r$th run, $r=1,\ldots,R$; and the relative bias of $\{\mathrm{rmse}(\hat\mu_i)\}^2$, defined by
$$\mathrm{RB}\{\mathrm{rmse}(\hat\mu_i)\}^2=[\mathrm{MSE}(\hat\mu_i)-E\{\mathrm{rmse}(\hat\mu_i)\}^2]/\mathrm{MSE}(\hat\mu_i),$$
where
$$\mathrm{MSE}(\hat\mu_i)=\frac{1}{R}\sum_{r=1}^{R}\{\hat\mu_i(r)-\mu_i(r)\}^2,\qquad E\{\mathrm{rmse}(\hat\mu_i)\}^2=\frac{1}{R}\sum_{r=1}^{R}\{\mathrm{rmse}_r(\hat\mu_i)\}^2\qquad(i=1,\ldots,k),$$
and $\mathrm{rmse}_r(\hat\mu_i)$ denotes the estimated root mean squared error at the $r$th run.
Table 1 reports the RB and RBMSE values for different choices of k, the number of small areas. The summary measures considered are the first quartile, the median, the mean and the third quartile over all the small areas.
Table 1. The relative biases of the small-area estimates and relative biases of the mean squared errors in the simulation study, for k=20, k=30 and k=50 small areas

  k                    Q1        Median     Mean       Q3
  20   RB             0·066      0·075      0·076      0·096
       RBMSE (%)    −20·590     −9·580    −15·100     −2·893
  30   RB             0·065      0·074      0·077      0·090
       RBMSE (%)    −10·585     −9·416    −10·640      0·084
  50   RB             0·071      0·082      0·084      0·099
       RBMSE (%)    −10·630     −2·152     −3·988      2·614

RB, simulated small-area estimate relative bias; RBMSE, relative bias of mean squared error. Q1, first quartile; Q3, third quartile.
The simulation study refers to both the design for sample selection and the model to be fitted to the data. Clearly there is negligible bias in the small-area point estimates even for small k. For the relative bias of the mean squared error estimates, the median values decrease in magnitude as the number of small areas increases. The trend indicates that the bias would be negligible if we consider a large number of small areas, which is the case in many applications, such as the example in § 5 in which the number of sampled counties is more than 1300.
5. S-    - 
We now use our methods to estimate the proportion of poor school-age children, i.e. in
the age-group 5–17, for all the counties of a certain state, not named for confidentiality,
in the United States for the year 1989. The year 1989 is picked because then the different
estimates can be compared against the corresponding 1990 Census estimates, usually taken
as the ‘gold standard’ for this purpose. Although our illustration involves only one state,
the methods have been used for simultaneous estimation of these proportions for all the
counties in the United States.
First we provide briefly the background of this research. As part of its Small Area
Income and Poverty estimation project, the United States Bureau of the Census is required
to produce updated estimates of poor school-age children for the different counties and
school districts of the country. The March supplement to the Current Population Survey
provides reliable direct estimates annually for the number of poor children at the national
level, and also for large states. However, the direct estimates of nearly 40% of the counties
have high sampling variability.
The current approach of the Bureau of the Census is as follows. Let z be the estimated
iu
number of children in the 5–17 age-group for county i in year u, and let q be the
iu
estimated number of poor children in the 5–17 age-group for county i in year u, both
estimates being obtained from the Current Population Survey. Then the estimated poverty
rate for county i in year u is defined as p =q /z .
iu
iu iu
Suppose now a county $i$ has Current Population Survey samples for all the three years $t-1$, $t$ and $t+1$. Then, for this county, the direct estimate of the logarithm of the number of poor children in the 5–17 age-group for year $t$ is defined as
$$y^*_{it}=\log\{(w^*_{i,t-1}p_{i,t-1}+w^*_{i,t}p_{i,t}+w^*_{i,t+1}p_{i,t+1})/(w^*_{i,t-1}z_{i,t-1}+w^*_{i,t}z_{i,t}+w^*_{i,t+1}z_{i,t+1})\}. \qquad (5\cdot1)$$
Here the weight $w^*_{i,u}$ ($u=t-1,t,t+1$) is proportional to the number of interviewed housing units in county $i$ containing at least one child aged 5–17 in year $u$. For counties with Current Population Survey samples in only one or two of the three years, the formula given in (5·1) is modified by taking values only for that year or a two-year average analogous to (5·1). The direct estimates $y^*_{it}$ are assumed to be $N(\theta_{it},V_{it})$ with $V_{it}>0$ known, and $\theta_{it}\sim N(x_{it}^{T}\beta,A)$ with $A>0$ unknown, and an empirical Bayes analysis is carried out along the lines of Fay & Herriot (1979). For the detailed estimation procedure we refer to a Bureau of the Census technical report by W. R. Bell and others.
However, the above procedure has certain limitations. First, counties with no poor child
observed in the Current Population Survey sample during the three years are omitted
from model-fitting since logarithms cannot be taken when the direct estimate is zero. This
has typically led to the elimination of approximately 30% of the sampled counties.
Secondly, an estimator based on three years’ combined data may not necessarily produce
an unbiased estimator for the single year in question. Moreover, the logarithm of an
unbiased estimator of a given parameter is not an unbiased estimator of the logarithm of
the parameter. The above two limitations of current models for this project are discussed
in detail in an Iowa State University Technical Report by T. Maiti and E. V. Slud.
Our procedure, instead, focuses on one specific year. Dropping the suffix $t$, for a county $i$, we use the Current Population Survey estimate for that year. However, these estimates involve survey weights, and consequently are not straight averages of the binary responses of individuals in the counties. The individual weights $w_{ij}$ are constructed in a general way that does not explicitly take the response variable into account. The weight construction for Current Population Survey sample units is a complex procedure. The details are available in the United States Census Bureau's technical paper 63RV entitled 'Design and methodology'. In summary the final weight is the product of the basic weight, an adjustment for special weighting, a non-interview adjustment, a first-stage ratio adjustment factor and a second-stage ratio adjustment factor. The basic weight is not related to poverty rates. Our basic statistics are the $\bar y_{iw}$ for the sampled counties. In our framework, $y_{ij}$ is 1 or 0 according as the child is or is not poor. Also, we do not have access to the full microdata: the data given to us are $\bar y_{iw}$, $\sum_j w_{ij}^2$, $\sum_j w_{ij}^3$ and $\sum_j w_{ij}^4$.
Zero counts, $\bar y_{iw}=0$, do not cause a problem for our model. Also, we model $p_i$, the true proportion of poor school-age children in county $i$, by
$$\mathrm{logit}\{E(p_i)\}=\beta_0+\beta_1x_{1i}+\beta_2x_{2i}+\beta_3x_{3i}+\beta_4x_{4i},$$
where the covariates are as follows:
$x_{1i}$ = log(proportion of child exemptions reported by families in poverty for tax returns in county $i$),
$x_{2i}$ = log(proportion of people receiving food stamps in county $i$),
$x_{3i}$ = log(proportion of child exemptions reported by families for tax returns in county $i$),
$x_{4i}$ = log(proportion of poor school-age children from the previous census in county $i$).
M G  T M
106
These covariates are based on the recommendation of the Bureau of the Census, except
that we have considered logarithms of the proportions rather than logarithms of the
numbers.
Table 2, based on the results of §§ 2 and 3, provides for 1989 the sample sizes, $n_i$, survey-weighted direct estimates, $\bar y_{iw}$, 1990 census estimates, $c_i$, empirical best linear unbiased predictors, $\hat\mu_i$, and the corresponding estimates of the approximate root mean squared errors, $\mathrm{rmse}(\hat\mu_i)$, for all 39 counties of the state. We also provide naive estimates of the root mean squared errors given by $T_{1i}^{1/2}(\hat\gamma)$, ignoring $T_{2i}(\hat\gamma)$ and $T_{3i}(\hat\gamma)$ completely. This amounts to ignoring uncertainty in the estimation of the model parameters. The average values of the $\bar y_{iw}$'s and $\hat\mu_i$'s are 0·152 and 0·164 respectively, while the average value of the $c_i$'s is 0·171. The average naive root mean squared error is 0·051, whereas the average value of the $\mathrm{rmse}(\hat\mu_i)$'s is 0·061.
Table 2. Data for poor school-age children. Estimates and root mean squared errors for the natural exponential family quadratic variance function models

  County   $n_i$    $c_i$    $\bar y_{iw}$   $\hat\mu_i$   Naive rmse ($\times10$)   Corrected rmse ($\times10$)
   1         3     0·376       0·000          0·198            1·780                     1·823
   2         6     0·127       0·157          0·132            0·459                     0·753
   3        10     0·128       0·000          0·067            1·137                     1·196
   4        13     0·111       0·000          0·060            1·035                     1·060
   5        13     0·287       0·000          0·019            0·801                     1·048
   6        18     0·142       0·293          0·287            0·714                     1·021
   7        24     0·121       0·085          0·112            0·767                     0·718
   8        25     0·171       0·093          0·096            0·543                     0·785
   9        26     0·173       0·000          0·028            0·745                     0·748
  10        26     0·189       0·267          0·261            0·546                     0·807
  11        31     0·081       0·000          0·024            0·676                     0·680
  12        31     0·235       0·036          0·039            0·395                     0·569
  13        36     0·091       0·063          0·074            0·611                     0·756
  14        37     0·174       0·163          0·158            0·165                     0·250
  15        38     0·223       0·316          0·322            0·669                     0·722
  16        38     0·141       0·149          0·161            0·615                     0·664
  17        38     0·159       0·076          0·093            0·618                     0·693
  18        38     0·268       0·218          0·214            0·043                     0·058
  19        41     0·266       0·091          0·094            0·469                     0·621
  20        55     0·148       0·162          0·163            0·427                     0·581
  21        59     0·131       0·376          0·370            0·365                     0·548
  22        62     0·211       0·525          0·521            0·472                     0·560
  23        65     0·156       0·301          0·301            0·441                     0·555
  24        67     0·195       0·201          0·202            0·415                     0·538
  25        67     0·133       0·276          0·275            0·437                     0·582
  26        70     0·136       0·034          0·041            0·444                     0·529
  27        72     0·087       0·075          0·084            0·459                     0·460
  28        75     0·236       0·239          0·243            0·441                     0·489
  29        86     0·207       0·245          0·242            0·273                     0·390
  30        92     0·147       0·061          0·065            0·384                     0·486
  31       117     0·111       0·064          0·069            0·360                     0·398
  32       127     0·174       0·164          0·169            0·344                     0·346
  33       178     0·147       0·097          0·100            0·296                     0·308
  34       207     0·171       0·198          0·201            0·273                     0·277
  35       215     0·135       0·073          0·077            0·272                     0·278
  36       224     0·133       0·170          0·172            0·259                     0·259
  37       282     0·168       0·153          0·155            0·234                     0·235
  38       296     0·139       0·170          0·171            0·229                     0·230
  39       945     0·229       0·351          0·352            0·134                     0·135
The empirical best linear unbiased predictors shrink the direct estimates towards some
regression surface. As is evident from Table 2, the amount of shrinkage is less for counties
with large sample sizes, such as counties 35, 36, 37, 38 and 39, than for those with smaller
sample sizes, such as counties 1 and 2. Furthermore, a comparison of naive and the proposed root mean squared errors reveals that, for counties with large sample sizes, the
proposed values do not change the naive values very much. On the other hand, for counties
with smaller sample sizes, changes are more substantial.
Our method is clearly not comparable to that used by the Bureau of the Census: they have combined three years' data, which we have not, and they do not provide, as such, estimates of the proportions of poor children in the 5–17 age-group for the different counties. They provide estimated numbers of poor children in the 5–17 age-group, but do not provide the analogous estimates of the total number of children in the same age-group. Hence, to provide some comparative results, we consider a normal model for the logarithms of the sampled proportions. This is different from modelling the log counts as done by the Bureau of the Census. The normal theory model is given by $\log(\bar y_{iw})=\theta_i+e_i$, where the $e_i$ are independent $N(0,v_e/n_i)$. Also, $\theta_i$ is distributed, independently of $e_i$, as $N(x_i^{T}\beta,\sigma_u^2)$. Here $v_e=9{\cdot}97$, an approximate value supplied by the Bureau of the Census, and is treated as known. Note that, for the normal model, to overcome nonidentifiability one variance component needs to be known.
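A minimal sketch of the shrinkage form of this normal comparison model is given below; the names, the naive back-transformation to the proportion scale by exponentiation, and the assumption that $\sigma_u^2$ and $\beta$ have already been estimated are all ours, not the paper's.

  # Shrinkage estimate under the normal model for log(ybar_iw); requires ybar_iw > 0,
  # and back-transforms by simple exponentiation (our naive choice).
  eb_log_normal <- function(ybar_iw, n_i, x_i, beta, sigma2_u, v_e = 9.97) {
    B_i <- sigma2_u / (sigma2_u + v_e / n_i)        # weight on the direct log estimate
    exp(B_i * log(ybar_iw) + (1 - B_i) * sum(x_i * beta))
  }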
The two sets of estimates are compared against the 1990 decennial census estimates for 1989. Based on the recommendation of the panel of the National Academy of Sciences, for any estimate $e=(e_1,\ldots,e_{39})^{T}$, we compute the average relative bias, $\frac{1}{39}\sum_{i=1}^{39}(|c_i-e_i|/c_i)$, the average squared relative bias, $\frac{1}{39}\sum_{i=1}^{39}\{(c_i-e_i)^2/c_i^2\}$, the average absolute bias, $\frac{1}{39}\sum_{i=1}^{39}|c_i-e_i|$, and the average squared deviation, $\frac{1}{39}\sum_{i=1}^{39}(c_i-e_i)^2$. The results are reported in Table 3.
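These four criteria are straightforward to compute from the census values and any set of estimates; a brief R sketch, with argument names of our choosing:

  # Average relative bias, squared relative bias, absolute bias and squared
  # deviation of estimates 'est' against census values 'cens' (equal-length vectors).
  compare_to_census <- function(cens, est) {
    c(avg_rel_bias     = mean(abs(cens - est) / cens),
      avg_sq_rel_bias  = mean((cens - est)^2 / cens^2),
      avg_abs_bias     = mean(abs(cens - est)),
      avg_sq_deviation = mean((cens - est)^2))
  }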
Table 3. Poor school-age children example. Comparison of direct, normal and NEF–QVF estimates

  Estimates   Average relative bias   Average squared relative bias   Average absolute bias   Average squared deviation
  Direct             0·5444                   0·4870                         0·0945                   0·0169
  Normal             0·6634                   0·4541                         0·0116                   0·0140
  NEF–QVF            0·4662                   0·3820                         0·0814                   0·0125

NEF–QVF, natural exponential family quadratic variance function.
It follows from Table 3 that pseudo empirical best linear unbiased predictors based on
the beta-binomial model perform better than the direct estimates under all four criteria,
while they perform better than the estimates based on the normal model except in terms
of average absolute bias.
6. DISCUSSION

An alternative to the present estimating equations approach is the extended quasilikelihood approach (Nelder & Pregibon, 1987; McCullagh & Nelder, 1989). Indeed, Raghunathan (1993) proposed a quasilikelihood-based method for small-area estimation in a context where the direct estimators were the unit-level estimators $y_{ij}$, and not the survey-weighted estimators $\bar y_{iw}$. The apparent advantage of the quasilikelihood approach over the estimating function approach is that the small-area estimates can be obtained only from the first two moments of the $y_{ij}$'s and the first two moments of the prior. In the present context, the quasilikelihood, or more appropriately the quasi-loglikelihood, based on $E(\bar y_{iw}\mid\mu_i)$ and $\mathrm{var}(\bar y_{iw}\mid\mu_i)$ is given by
$$Q_i(\mu_i)=-\tfrac12\log\{2\pi d_iV(\mu_i)\}+d_i^{-1}\int_{\bar y_{iw}}^{\mu_i}\frac{\bar y_{iw}-t}{V(t)}\,dt.$$
One difficulty with the quasilikelihood is that it does not provide an analytical method for finding mean squared errors. Resampling methods such as the jackknife and bootstrap are possible, and have been used by Raghunathan (1993), but the analytical approximation as given here also has its virtues, especially since the overdispersed natural exponential family quadratic variance function structure facilitates calculation of moments, making the procedure readily available for implementation.
ACKNOWLEDGEMENT

We would like to thank the referees, an associate editor and the editor for comments that led to improvement of the paper. It is a pleasure to acknowledge Dr William Bell of the Bureau of the Census for many useful conversations. The paper was completed when both authors were American Statistical Association/National Science Foundation/Census Senior Research Fellows at the Bureau of the Census. The research was partially supported by grants from the United States National Science Foundation. Thanks are due to Bhramar Mukherjee for her help in the preparation of the final version of the manuscript.
APPENDIX
Technical details
Proof of Theorem 2. We begin with the calculation
$$E(\mu_i-\hat\mu_i)^2=E(\mu_i-\tilde\mu_i+\tilde\mu_i-\hat\mu_i)^2=E(\mu_i-\tilde\mu_i)^2+E(\tilde\mu_i-\hat\mu_i)^2+2E(\mu_i-\tilde\mu_i)(\tilde\mu_i-\hat\mu_i). \qquad (A\cdot1)$$
First we calculate
$$E(\mu_i-\tilde\mu_i)^2=E\{\mu_i-m_i-(1+\lambda d_i)^{-1}(\bar y_{iw}-m_i)\}^2=(1+\lambda d_i)^{-2}E\{\lambda d_i(\mu_i-m_i)-(\bar y_{iw}-\mu_i)\}^2. \qquad (A\cdot2)$$
Since
$$E(\mu_i-m_i)^2=\frac{V(m_i)}{\lambda-v_2},\qquad E(\bar y_{iw}-\mu_i)^2=\lambda d_iV(m_i)(\lambda-v_2)^{-1},$$
$$E\{(\mu_i-m_i)(\bar y_{iw}-\mu_i)\}=E[(\mu_i-m_i)E\{(\bar y_{iw}-\mu_i)\mid\mu_i\}]=0,$$
it follows from (A·2) that
$$E(\mu_i-\tilde\mu_i)^2=T_{1i}, \qquad (A\cdot3)$$
where $T_{1i}$ is defined after the statement of Theorem 2 in § 3. In order to evaluate $E(\tilde\mu_i-\hat\mu_i)^2$, let $\gamma=(\beta,\lambda)^{T}$ and $q_i(\gamma,\bar y_{iw})=r_{iw}\bar y_{iw}+(1-r_{iw})m_i$. By one-step Taylor expansion, we have
$$E(\tilde\mu_i-\hat\mu_i)^2=E\{q_i(\gamma,\bar y_{iw})-q_i(\hat\gamma,\bar y_{iw})\}^2\simeq E\left\{(\hat\gamma-\gamma)^{T}\frac{\partial q_i}{\partial\gamma}\left(\frac{\partial q_i}{\partial\gamma}\right)^{T}(\hat\gamma-\gamma)\right\}=\mathrm{tr}\left[E\left\{\frac{\partial q_i}{\partial\gamma}\left(\frac{\partial q_i}{\partial\gamma}\right)^{T}(\hat\gamma-\gamma)(\hat\gamma-\gamma)^{T}\right\}\right].$$
However,
$$\frac{\partial q_i}{\partial\gamma}=\begin{pmatrix}\partial q_i/\partial\beta\\ \partial q_i/\partial\lambda\end{pmatrix}=\begin{pmatrix}(1-r_{iw})V(m_i)x_i\\ -d_i(1+\lambda d_i)^{-2}(\bar y_{iw}-m_i)\end{pmatrix}. \qquad (A\cdot4)$$
Hence, from (A·4), after slight simplification, we have
$$E(\tilde\mu_i-\hat\mu_i)^2\simeq d_i^2(1+\lambda d_i)^{-2}\,\mathrm{tr}\,E\left\{\begin{pmatrix}\lambda^2V^2(m_i)x_ix_i^{T} & -\lambda V(m_i)(\bar y_{iw}-m_i)x_i\\ -\lambda V(m_i)(\bar y_{iw}-m_i)x_i^{T} & (1+\lambda d_i)^{-2}(\bar y_{iw}-m_i)^2\end{pmatrix}(\hat\gamma-\gamma)(\hat\gamma-\gamma)^{T}\right\}. \qquad (A\cdot5)$$
We will show now that the right-hand side of (A·5) can be approximated, up to $O(k^{-1})$, by
$$d_i^2(1+\lambda d_i)^{-2}\,\mathrm{tr}\left\{\begin{pmatrix}\lambda^2V^2(m_i)x_ix_i^{T} & 0\\ 0 & 0\end{pmatrix}U_k^{-1}\right\}=T_{2i}. \qquad (A\cdot6)$$
Since $U_k=O(k)$, the approximation (A·6) is accurate up to $O(k^{-1})$.

To see this, we write $S_k(\gamma)=\sum_{j=1}^{k}D_j^{T}S_j^{-1}g_j$. By one-step Taylor expansion,
$$0=S_k(\hat\gamma)\simeq S_k(\gamma)+\left(\frac{\partial S_k(\gamma)}{\partial\gamma}\right)^{T}(\hat\gamma-\gamma).$$
Since
$$\frac{\partial S_k(\gamma)}{\partial\gamma}=\sum_{j=1}^{k}\left\{\frac{\partial}{\partial\gamma}(D_j^{T}S_j^{-1})g_j+D_j^{T}S_j^{-1}\frac{\partial g_j}{\partial\gamma}\right\},$$
we have $E\{-\partial S_k(\gamma)/\partial\gamma\}=\sum_{j=1}^{k}D_j^{T}S_j^{-1}D_j=U_k$. Also, $\mathrm{var}(S_k)=U_k$. By the Central Limit Theorem, $U_k^{-1}S_k$ is asymptotically $N(0,U_k^{-1})$. Hence, $\hat\gamma-\gamma\simeq U_k^{-1}S_k+o_p(k^{-1/2})$, and
$$E\{(\hat\gamma-\gamma)(\hat\gamma-\gamma)^{T}\}=U_k^{-1}+o(k^{-1}).$$
Furthermore, by the mutual independence of the $\bar y_{iw}$ ($i=1,\ldots,k$),
$$\begin{aligned}
E\{(\bar y_{iw}-m_i)(\hat\gamma-\gamma)(\hat\gamma-\gamma)^{T}\}&=E\{(\bar y_{iw}-m_i)U_k^{-1}S_kS_k^{T}U_k^{-1}\}+o(k^{-1})\\
&=U_k^{-1}E\{(\bar y_{iw}-m_i)D_i^{T}S_i^{-1}g_ig_i^{T}S_i^{-1}D_i\}U_k^{-1}+o(k^{-1})\\
&=O(k^{-2})+o(k^{-1})=o(k^{-1}),
\end{aligned}$$
and a similar calculation shows that
$$E\{(\bar y_{iw}-m_i)^2(\hat\gamma-\gamma)(\hat\gamma-\gamma)^{T}\}=o(k^{-1}).$$
The above calculations lead to (A·6) from (A·5).
Finally we approximate $E(\mu_i-\tilde\mu_i)(\tilde\mu_i-\hat\mu_i)$ up to $O(k^{-1})$. Writing $y_i=(y_{i1},\ldots,y_{in_i})^{T}$, we obtain
$$E\{(\mu_i-\tilde\mu_i)(\tilde\mu_i-\hat\mu_i)\}=E[\{\mu_i-E(\mu_i\mid y_i)+E(\mu_i\mid y_i)-\tilde\mu_i\}(\tilde\mu_i-\hat\mu_i)]=E[\{E(\mu_i\mid y_i)-\tilde\mu_i\}(\tilde\mu_i-\hat\mu_i)]. \qquad (A\cdot7)$$
If $\bar y_{iw}$ and $\mu_i$ have a joint bivariate normal distribution, we have $\tilde\mu_i=E(\mu_i\mid y_i)$ and the above expression simplifies to zero. However, in general, this needs an approximation. From Morris (1983), we have $E(\mu_i\mid y_i)=r_i\bar y_i+(1-r_i)m_i$, where $\bar y_i=n_i^{-1}\sum_{j=1}^{n_i}y_{ij}$ and $r_i=(1+\lambda n_i^{-1})^{-1}$. Accordingly, we can rewrite (A·7) as
$$E(\mu_i-\tilde\mu_i)(\tilde\mu_i-\hat\mu_i)=E[\{r_i(\bar y_i-m_i)-r_{iw}(\bar y_{iw}-m_i)\}(\tilde\mu_i-\hat\mu_i)]. \qquad (A\cdot8)$$
The approximation of (A·8) proceeds as follows.
By two-step Taylor expansion,
$$\tilde\mu_i-\hat\mu_i=q_i(\gamma,\bar y_{iw})-q_i(\hat\gamma,\bar y_{iw})\simeq-\left\{\left(\frac{\partial q_i}{\partial\gamma}\right)^{T}(\hat\gamma-\gamma)+\frac12(\hat\gamma-\gamma)^{T}\frac{\partial^2q_i}{\partial\gamma\,\partial\gamma^{T}}(\hat\gamma-\gamma)\right\}. \qquad (A\cdot9)$$
Now approximating $\hat\gamma-\gamma$ by $U_k^{-1}S_k$ as was done earlier, we have
$$E(\mu_i-\tilde\mu_i)(\tilde\mu_i-\hat\mu_i)\simeq T_{31i}-T_{32i}, \qquad (A\cdot10)$$
where
$$T_{31i}=-E\left[\{r_i(\bar y_i-m_i)-r_{iw}(\bar y_{iw}-m_i)\}\left(\frac{\partial q_i}{\partial\gamma}\right)^{T}U_k^{-1}S_k(\gamma)\right], \qquad (A\cdot11)$$
$$T_{32i}=-\frac12\,\mathrm{tr}\left(E\left[\{r_i(\bar y_i-m_i)-r_{iw}(\bar y_{iw}-m_i)\}\frac{\partial^2q_i}{\partial\gamma\,\partial\gamma^{T}}U_k^{-1}S_k(\gamma)S_k^{T}(\gamma)U_k^{-1}\right]\right). \qquad (A\cdot12)$$
We begin with the calculation
$$\frac{\partial^2q_i}{\partial\gamma\,\partial\gamma^{T}}=\begin{pmatrix}\lambda d_i(1+\lambda d_i)^{-1}V'(m_i)V(m_i)x_ix_i^{T} & d_i(1+\lambda d_i)^{-2}V(m_i)x_i\\ d_i(1+\lambda d_i)^{-2}V(m_i)x_i^{T} & 2d_i^2(1+\lambda d_i)^{-3}(\bar y_{iw}-m_i)\end{pmatrix}. \qquad (A\cdot13)$$
As in the calculation of (A·6), from (A·12) and (A·13), $T_{32i}=O(k^{-2})$. Finally, we calculate
$$T_{31i}=T_{311i}-T_{312i}, \qquad (A\cdot14)$$
where
$$T_{311i}=\mathrm{tr}\left(E\left[S_k(\gamma)\,r_{iw}(\bar y_{iw}-m_i)\left(\frac{\partial q_i}{\partial\gamma}\right)^{T}\right]U_k^{-1}\right), \qquad (A\cdot15)$$
$$T_{312i}=\mathrm{tr}\left(E\left[S_k(\gamma)\,r_i(\bar y_i-m_i)\left(\frac{\partial q_i}{\partial\gamma}\right)^{T}\right]U_k^{-1}\right). \qquad (A\cdot16)$$
We now evaluate $T_{311i}$ and $T_{312i}$ separately. First we have
$$\begin{aligned}
E\left\{S_k(\gamma)\,r_{iw}(\bar y_{iw}-m_i)\left(\frac{\partial q_i}{\partial\gamma}\right)^{T}\right\}
&=E\left\{(D_i^{T}S_i^{-1}g_i)(\bar y_{iw}-m_i)\left(\frac{\partial q_i}{\partial\gamma}\right)^{T}\right\}\\
&=D_i^{T}S_i^{-1}E\left[\begin{pmatrix}\bar y_{iw}-m_i\\ (\bar y_{iw}-m_i)^2-\omega_iV(m_i)\end{pmatrix}(\bar y_{iw}-m_i)\left\{\frac{\lambda d_i}{1+\lambda d_i}V(m_i)x_i^{T},\; -\frac{d_i}{(1+\lambda d_i)^2}(\bar y_{iw}-m_i)\right\}\right]\\
&=D_i^{T}S_i^{-1}\begin{pmatrix}\lambda d_i(1+\lambda d_i)^{-1}V(m_i)\mu_{2i}x_i^{T} & -d_i(1+\lambda d_i)^{-2}\mu_{3i}\\ \lambda d_i(1+\lambda d_i)^{-1}V(m_i)\mu_{3i}x_i^{T} & -d_i(1+\lambda d_i)^{-2}(\mu_{4i}-\mu_{2i}^2)\end{pmatrix}\\
&=D_i^{T}S_i^{-1}\begin{pmatrix}\mu_{2i} & \mu_{3i}\\ \mu_{3i} & \mu_{4i}-\mu_{2i}^2\end{pmatrix}\begin{pmatrix}\lambda d_i(1+\lambda d_i)^{-1}V(m_i)x_i^{T} & 0\\ 0 & -d_i(1+\lambda d_i)^{-2}\end{pmatrix}\\
&=D_i^{T}\begin{pmatrix}\lambda d_i(1+\lambda d_i)^{-1}V(m_i)x_i^{T} & 0\\ 0 & -d_i(1+\lambda d_i)^{-2}\end{pmatrix}\\
&=\begin{pmatrix}\lambda d_i(1+\lambda d_i)^{-1}V^2(m_i)x_ix_i^{T} & -V(m_i)V'(m_i)d_i(1+\lambda d_i)^{-1}(\lambda-v_2)^{-1}x_i\\ 0 & -V(m_i)d_i(1+\lambda d_i)^{-2}(\lambda-v_2)^{-2}(1+v_2d_i)\end{pmatrix}.
\end{aligned}$$
Hence,
$$T_{311i}=d_i(1+\lambda d_i)^{-1}V(m_i)\,\mathrm{tr}\left\{\begin{pmatrix}\lambda V(m_i)x_ix_i^{T} & -V'(m_i)(\lambda-v_2)^{-1}x_i\\ 0 & -(1+\lambda d_i)^{-1}(1+v_2d_i)(\lambda-v_2)^{-2}\end{pmatrix}U_k^{-1}\right\}, \qquad (A\cdot17)$$
which is $O(k^{-1})$. Also,
$$T_{312i}=\mathrm{tr}\left(E\left[D_i^{T}S_i^{-1}g_i\left(\frac{\partial q_i}{\partial\gamma}\right)^{T}(\bar y_i-m_i)\right]U_k^{-1}\right)=\mathrm{tr}\left(D_i^{T}S_i^{-1}E\left[g_i\left(\frac{\partial q_i}{\partial\gamma}\right)^{T}(\bar y_i-m_i)\right]U_k^{-1}\right), \qquad (A\cdot18)$$
which is $O(k^{-1})$. Furthermore,
$$E\left[g_i\left(\frac{\partial q_i}{\partial\gamma}\right)^{T}(\bar y_i-m_i)\right]=\begin{pmatrix}\lambda d_i(1+\lambda d_i)^{-1}V(m_i)f_{i11}x_i^{T} & -d_i(1+\lambda d_i)^{-2}f_{i12}\\ \lambda d_i(1+\lambda d_i)^{-1}V(m_i)f_{i12}x_i^{T} & -d_i(1+\lambda d_i)^{-2}f_{i22}\end{pmatrix}, \qquad (A\cdot19)$$
where
$$f_{i11}=E\{(\bar y_{iw}-m_i)(\bar y_i-m_i)\}=V(m_i)(\lambda-v_2)^{-1}(1+\lambda n_i^{-1}), \qquad (A\cdot20)$$
and, after much algebraic simplification,
$$f_{i12}=E\{(\bar y_{iw}-m_i)^2(\bar y_i-m_i)\}=\frac{(\lambda+n_i)(\lambda d_i+2)V(m_i)V'(m_i)}{n_i(\lambda-v_2)(\lambda-2v_2)}, \qquad (A\cdot21)$$
$$\begin{aligned}
f_{i22}={}&E\{(\bar y_{iw}-m_i)^3(\bar y_i-m_i)\}-\omega_iV(m_i)f_{i11}\\
={}&\frac{1}{n_i}\{\delta f_i+3(d_i+2v_2f_i)V(m_i)\}V(m_i)\\
&+\left\{\frac{\delta v_2f_i}{n_i}+\frac{6}{n_i}(d_i+v_2f_i)+f_i[2v_2V(m_i)+\{V'(m_i)\}^2]+3(n_i^{-1}+d_i)V(m_i)\right\}E(\mu_i-m_i)^2\\
&+3\left\{\frac{1}{n_i}(5v_2d_i+4v_2^2f_i+1)+v_2f_i+d_i\right\}V'(m_i)\,E(\mu_i-m_i)^3\\
&+\left\{\frac{1}{n_i}(9v_2^2d_i+6v_2^3f_i+3v_2)+2v_2^2f_i+3v_2d_i+1\right\}E(\mu_i-m_i)^4\\
&-(1+\lambda d_i)V^2(m_i)(\lambda-v_2)^{-2}(1+\lambda n_i^{-1}). \qquad (A\cdot22)
\end{aligned}$$
Combining (A·1), (A·3), (A·6), (A·10)–(A·12) and (A·14)–(A·22), we obtain the result.
Approximation of $E(\hat\gamma-\gamma)$. Let $U_k^{-1}=(U_k^{rs})$. We first need some notation associated with $D_i^{T}S_i^{-1}$. Let
$$e_{1i}=\Delta_i^{-1}\{\mu_{4i}-\mu_{2i}^2-\mu_{3i}\omega_iV'(m_i)\},\qquad e_{2i}=\Delta_i^{-1}\{\mu_{2i}\omega_iV'(m_i)-\mu_{3i}\},$$
$$e_{3i}=-\Delta_i^{-1}\mu_{3i}V(m_i),\qquad e_{4i}=\Delta_i^{-1}\mu_{2i}V(m_i),$$
where we recall that $V(m_i)=v_0+v_1m_i+v_2m_i^2$ and $\omega_i=(1+\lambda d_i)(\lambda-v_2)^{-1}$. We now write $\gamma=(\gamma_1,\ldots,\gamma_{p+1})^{T}$ and $S_k(\gamma)\equiv S_k=(s_{k1},\ldots,s_{kp},s_{k,p+1})^{T}$, where
$$s_{kr}=\sum_{i=1}^{k}(e_{1i}g_{1i}+e_{2i}g_{2i})V(m_i)x_{ir},\qquad s_{k,p+1}=-\sum_{i=1}^{k}(e_{3i}g_{1i}+e_{4i}g_{2i})\frac{\partial\omega_i}{\partial\lambda}.$$
We will also find it convenient to write $\lambda=\beta_{p+1}$. Then, following Cox & Snell (1968), let
$$J_{t,rs}=\mathrm{cov}\left(s_{kt},\frac{\partial s_{kr}}{\partial\beta_s}\right),\qquad K_{t,rs}=E\left(\frac{\partial^2s_{kr}}{\partial\beta_s\,\partial\beta_t}\right)\qquad(r,s,t=1,\ldots,p+1).$$
Now writing
$$J_{k(r)}=(J_{t,rs}),\qquad K_{k(r)}=(K_{t,rs}),$$
$$a_k^{T}=\{\mathrm{tr}(U_k^{-1}J_{k(1)}),\ldots,\mathrm{tr}(U_k^{-1}J_{k(p+1)})\},\qquad b_k^{T}=\{\mathrm{tr}(U_k^{-1}K_{k(1)}),\ldots,\mathrm{tr}(U_k^{-1}K_{k(p+1)})\},$$
by the Cox–Snell formula, we have
$$E(\hat\gamma-\gamma)=U_k^{-1}(a_k+\tfrac12 b_k)=z(\gamma), \qquad (A\cdot23)$$
say. Finding the elements of $z(\gamma)$ requires heavy algebra. The details are omitted here, and can be obtained from the authors. We note also that $z(\gamma)=O(k^{-1})$ since $U_k^{-1}$ is $O(k^{-1})$, and its multiplier is $O(1)$.
REFERENCES

Cox, D. R. & Snell, E. J. (1968). A general definition of residuals (with Discussion). J. R. Statist. Soc. B 30, 248–75.
Fay, R. E. & Herriot, R. A. (1979). Estimates of income for small places: An application of James–Stein procedures to census data. J. Am. Statist. Assoc. 74, 269–77.
Godambe, V. P. & Thompson, M. E. (1989). An extension of quasi-likelihood estimation (with Discussion). J. Statist. Plan. Infer. 22, 137–52.
Kott, P. (1989). Robust small domain estimation using random effects modeling. Survey Methodol. 15, 3–12.
McCullagh, P. & Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. London: Chapman and Hall.
Morris, C. (1982). Natural exponential families with quadratic variance functions. Ann. Statist. 10, 65–80.
Morris, C. (1983). Natural exponential families with quadratic variance functions: Statistical theory. Ann. Statist. 11, 515–29.
Nelder, J. A. & Pregibon, D. (1987). An extended quasi-likelihood function. Biometrika 74, 221–32.
Pfeffermann, D. (1993). The role of sampling weights when modeling survey data. Int. Statist. Rev. 61, 317–37.
Pfeffermann, D., Krieger, A. M. & Rinott, Y. (1998). Parametric distributions of complex survey data under informative probability sampling. Statist. Sinica 8, 1087–114.
Pfeffermann, D., Skinner, C. J., Holmes, D. J., Goldstein, H. & Rasbash, J. (1998). Weighting for unequal selection probabilities in multi-level models (with Discussion). J. R. Statist. Soc. B 60, 23–40.
Pfeffermann, D. & Sverchkov, M. (1999). Parametric and semiparametric estimation of regression models fitted to survey data. Sankhya B 61, 166–86.
Prasad, N. G. N. & Rao, J. N. K. (1990). The estimation of the mean squared error of small-area estimators. J. Am. Statist. Assoc. 85, 163–71.
Prasad, N. G. N. & Rao, J. N. K. (1999). On robust small area estimation using a simple random effects model. Survey Methodol. 25, 67–72.
Raghunathan, T. E. (1993). A quasi-empirical Bayes method for small area estimation. J. Am. Statist. Assoc. 88, 1444–8.
Roberts, G., Rao, J. N. K. & Kumar, S. (1987). Logistic regression analysis of sample survey data. Biometrika 74, 1–12.
Shao, J. (1999). Mathematical Statistics. New York: Wiley.
Skinner, C. J., Holt, D. & Smith, T. M. F. (Eds.) (1989). Analysis of Complex Surveys. New York: Wiley.
[Received January 2002. Revised June 2003]