Biometrika (2004), 91, 1, pp. 95–112
© 2004 Biometrika Trust
Printed in Great Britain

Small-area estimation based on natural exponential family quadratic variance function models and survey weights

BY MALAY GHOSH
Department of Statistics, University of Florida, Gainesville, Florida 32611-8545, U.S.A.
[email protected]

TAPABRATA MAITI
Department of Statistics, Iowa State University, Ames, Iowa 50011-1210, U.S.A.
[email protected]

SUMMARY

We propose pseudo empirical best linear unbiased estimators of small-area means based on natural exponential family quadratic variance function models when the basic data consist of survey-weighted estimators of these means, area-specific covariates and certain summary measures involving the weights. We also provide explicit approximate mean squared errors of these estimators in the spirit of Prasad & Rao (1990), and these estimators can be readily evaluated. A simulation study is undertaken to evaluate the performance of the proposed inferential procedure. We also estimate the proportion of poor children in the 5–17 years age-group for the different counties in one of the states in the United States.

Some key words: Area specific covariate; Linear unbiased estimator; Mean squared error; Optimal estimating equation; Poor school-age children; Pseudo empirical best.

1. INTRODUCTION

Direct survey estimators for small areas are usually unreliable, which makes it necessary to use models, either explicit or implicit, to connect the small areas and obtain estimators of improved precision by 'borrowing strength'. The direct estimators, however, are not without value, since they often form the starting point for finding model-based estimates.

Most of the existing small-area estimation methods do not make use of survey weights. However, direct estimates are often survey-weighted, and are the only accessible data, along with area-level covariates, for secondary users of surveys. There is therefore a clear need to develop small-area estimation methods based on weighted survey data.

We develop a general methodology for finding empirical best linear unbiased predictors of small-area means based on direct survey-weighted estimators and natural exponential family quadratic variance function superpopulation models. Morris (1982, 1983) first characterised distributions belonging to the natural exponential family quadratic variance function family and studied many of their properties. The six basic distributions belonging to this family are the binomial, Poisson, normal with known variance, gamma, negative binomial and the generalised hyperbolic secant. A simulation is undertaken to evaluate the proposed methodology. Also, the procedure is illustrated by estimating the proportion of poor children in the 5–17 age-group for the different counties of the United States as a part of the project with the United States Bureau of the Census.

Kott (1989) initiated the study of small-area estimators using survey weights, primarily with the objective of achieving design consistency, as small-area sample sizes increase, in the case of model failure. He used the familiar unit-level random effects model and obtained both model-unbiased and design-consistent small-area estimators assuming equality of the error variances. Prasad & Rao (1999) obtained pseudo empirical best linear unbiased predictors of small-area parameters which depended on survey weights and were design consistent.
They also obtained approximate estimators of the mean squared errors of the small-area estimators which were more stable than the ones proposed by Kott (1989). Their approach, based on a random effects model, needed estimation of both the model variance and the error variance in finding the pseudo empirical best linear unbiased predictors. However, the variance of the distribution is a known function of the mean for the binomial and Poisson distributions, which are typically used to model binary and count data, and this is true in general for the natural exponential family quadratic variance function family of distributions. Since our application consists of finding the number or proportion of poor school-age children, it fits very nicely within the general natural exponential family quadratic variance function framework. Our procedure for estimating the model parameters will depend only on the direct survey-weighted estimators of small-area means, the area-specific covariates and certain summary measures involving the survey weights.

In § 2 of this paper, we introduce the natural exponential family quadratic variance function model along with a conjugate prior for the canonical parameter of the exponential model. Together, they constitute an overdispersed natural exponential family quadratic variance function model. Our model involves area-level covariates, but is motivated from the unit-level model via the use of design weights. Based on such models, we first develop pseudo best linear unbiased predictors of the small-area means, and subsequently find the pseudo empirical best linear unbiased predictors by estimating the prior parameters by a method based on the theory of optimal estimating functions as proposed by Godambe & Thompson (1989), which is quite different from the currently available procedures even in the normal case. We also point out the design consistency of these small-area estimators when the sample size within a small area grows to infinity. In § 3, we derive approximate formulae for the mean squared errors and find their approximate estimators; to find bias-corrected estimators of the mean squared errors, we use a variation of a certain technique of Cox & Snell (1968), who obtained asymptotic biases of the maximum likelihood estimators up to a certain desired order. A small simulation study in § 4 illustrates applicability of the proposed method, and § 5 describes our demographic example.

Pfeffermann and his colleagues have argued strongly in favour of accounting for the sample selection process for inferential purposes in complex surveys. The present study provides a systematic account of how to use survey weights for small-area estimation based on the natural exponential family quadratic variance function model, although, unlike in Pfeffermann (1993), the present study is based on noninformative sampling. However, we do indicate very briefly how to modify our procedure along the lines suggested by Pfeffermann, Krieger & Rinott (1998) and Pfeffermann & Sverchkov (1999) when the selection probabilities of the different units depend on the response values.

2. DEVELOPMENT OF THE SMALL-AREA ESTIMATORS

Let $y_{ij}$ denote the response of the $j$th unit in the $i$th small area $(j=1,\ldots,n_i;\ i=1,\ldots,k)$; let $\tilde w_{ij}$ be the weight attached to $y_{ij}$, usually the inverse of the selection probability, and let $w_{ij}=\tilde w_{ij}/\sum_{j=1}^{n_i}\tilde w_{ij}$, so that $\sum_{j=1}^{n_i}w_{ij}=1$. Our basic data consist of $\bar y_{iw}=\sum_{j=1}^{n_i}w_{ij}y_{ij}$ $(i=1,\ldots,k)$, and not the $y_{ij}$ themselves.
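In concrete terms, the basic data for one small area can be assembled from unit-level records along the following lines. The sketch below is ours, not the authors'; it is written in R, which the authors later use for their own computations, and the numerical values are purely illustrative. It computes the normalised weights $w_{ij}$, the direct estimator $\bar y_{iw}$ and the weight summaries $\sum_jw_{ij}^2$, $\sum_jw_{ij}^3$ and $\sum_jw_{ij}^4$ that enter the formulas developed below.

# One small area: unit-level responses y and raw weights w.tilde (inverse
# selection probabilities); both vectors are hypothetical illustrative values.
y       <- c(1, 0, 0, 1, 0, 0, 0, 1)
w.tilde <- c(40, 55, 32, 60, 45, 38, 50, 42)

w    <- w.tilde / sum(w.tilde)   # normalised weights, so that sum(w) == 1
ybar <- sum(w * y)               # survey-weighted direct estimator of the area mean

# weight summaries used by the estimation and mean squared error formulas below
d_i  <- sum(w^2)
f_i  <- sum(w^3)
nu_i <- sum(w^4)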
It is assumed that the weights $w_{ij}$ are independent of the $y_{ij}$, so that the former can be considered as fixed numbers given the sample.

Suppose that the $y_{ij}$ are independent, and that $y_{ij}$ has probability density function or probability function belonging to the natural exponential family quadratic variance function family, written as
$$f(y_{ij}\mid\theta_i)=\exp[\xi_i\{\theta_iy_{ij}-\psi(\theta_i)\}+c(y_{ij},\xi_i)].\qquad(2·1)$$
This is the regular one-parameter exponential family model. Here
$$E(y_{ij}\mid\theta_i)=\psi'(\theta_i)=m_i,\qquad \mathrm{var}(y_{ij}\mid\theta_i)=\psi''(\theta_i)/\xi_i=V(m_i)/\xi_i,$$
say. Since $\mathrm{var}(y_{ij}\mid\theta_i)>0$, $m_i$ is strictly increasing in $\theta_i$.

With the quadratic variance function structure, $V(m_i)=v_0+v_1m_i+v_2m_i^2$, where $v_0$, $v_1$ and $v_2$ are not simultaneously zero. For the binomial distribution, $v_0=0$, $v_1=1$ and $v_2=-1$. For the Poisson distribution, $v_0=v_2=0$ and $v_1=1$. For the normal model with known variance $\sigma^2$, $\xi_i=\sigma^{-2}$, $v_0=1$ and $v_1=v_2=0$. From now on, unless otherwise specified, we will take $\xi_i=1$ for all $i=1,\ldots,k$.

The estimator $\bar y_{iw}$ can be interpreted as a weighted maximum likelihood estimator or pseudo maximum likelihood estimator of $m_i$, using the terminology of Roberts et al. (1987) or Skinner et al. (1989). Following these authors, we begin with the weighted likelihood
$$L_i(\theta_i)=\prod_{j=1}^{n_i}f^{w_{ij}}(y_{ij}\mid\theta_i),$$
where $f(y_{ij}\mid\theta_i)$ is defined in (2·1). Now solving
$$\frac{d\log L_i(\theta_i)}{d\theta_i}=\sum_{j=1}^{n_i}w_{ij}(y_{ij}-m_i)=0,$$
one obtains the weighted maximum likelihood estimator of $m_i$ as $\bar y_{iw}$. Alternatively, following Pfeffermann, Skinner et al. (1998), we can begin with the census likelihood equation $\sum_{j=1}^{N_i}(y_{ij}-m_i)=0$, using all the hypothetical parameters in a finite population of size $N_i$, and then make the Horvitz–Thompson replacement of $\sum_{j=1}^{N_i}(y_{ij}-m_i)$ by $\sum_{j=1}^{n_i}w_{ij}(y_{ij}-m_i)$, where $w_{ij}$ is the normalised inverse of the selection probability $\pi_{ij}$ of the $j$th unit in the $i$th local area.

Consider now the conjugate prior with probability density function
$$\pi(\theta_i)=\exp[\lambda\{\mu_i\theta_i-\psi(\theta_i)\}]C(\lambda,\mu_i)\qquad(2·2)$$
for $\theta_i$, where $\mu_i=g(x_i^{\mathsf T}\beta)$ $(i=1,\ldots,k)$. Here $x_i$ is the design vector for the $i$th small area, $\beta$ is the regression coefficient and $g$ is the link function. Then (Morris, 1983)
$$E(m_i)=\mu_i,\qquad \mathrm{var}(m_i)=\frac{V(\mu_i)}{\lambda-v_2},\qquad(2·3)$$
where $\lambda>\max(0,v_2)$. Since $\mathrm{var}(m_i)$ is strictly decreasing in $\lambda$, $\lambda$ may be interpreted as the precision parameter.

We obtain first the best linear unbiased predictor of $m_i$ based on $\bar y_{iw}$. To this end, we compute
$$E(\bar y_{iw})=E\{E(\bar y_{iw}\mid\theta_i)\}=E(m_i)=\mu_i.\qquad(2·4)$$
Also, writing $d_i=\sum_{j=1}^{n_i}w_{ij}^2$, we have
$$\mathrm{var}(\bar y_{iw})=\mathrm{var}\{E(\bar y_{iw}\mid\theta_i)\}+E\{\mathrm{var}(\bar y_{iw}\mid\theta_i)\}=\mathrm{var}(m_i)+d_iE\{V(m_i)\}$$
$$=V(\mu_i)(\lambda-v_2)^{-1}+d_iE\{V(\mu_i)+(m_i-\mu_i)V'(\mu_i)+v_2(m_i-\mu_i)^2\}$$
$$=V(\mu_i)(\lambda-v_2)^{-1}+d_i\{V(\mu_i)+v_2V(\mu_i)(\lambda-v_2)^{-1}\}=V(\mu_i)(\lambda-v_2)^{-1}(1+\lambda d_i)=\omega_iV(\mu_i),\qquad(2·5)$$
say, and
$$\mathrm{cov}(\bar y_{iw},m_i)=E\{\mathrm{cov}(\bar y_{iw},m_i\mid\theta_i)\}+\mathrm{cov}\{E(\bar y_{iw}\mid\theta_i),m_i\}=E(0)+\mathrm{var}(m_i)=V(\mu_i)(\lambda-v_2)^{-1}.\qquad(2·6)$$
By (2·3)–(2·6), the best linear unbiased predictor of $m_i$ based on $\bar y_{iw}$ is
$$\tilde m_i=\mu_i+\frac{V(\mu_i)(\lambda-v_2)^{-1}}{V(\mu_i)(\lambda-v_2)^{-1}(1+\lambda d_i)}(\bar y_{iw}-\mu_i)=r_{iw}\bar y_{iw}+(1-r_{iw})\mu_i,\qquad(2·7)$$
where $r_{iw}=(1+\lambda d_i)^{-1}$. Note that this derivation depends only on the means, variances and covariances of $\bar y_{iw}$ and $m_i$.
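As a concrete illustration of (2·7), the following sketch, which is ours and not part of the paper, evaluates the pseudo best linear unbiased predictor for a binomial area mean under the canonical logit link, treating $\beta$ and $\lambda$ as known; all numerical inputs are hypothetical.

# Pseudo-BLUP (2.7) for the binomial NEF-QVF model with canonical (logit) link;
# beta and lambda are taken as known here.
pseudo_blup <- function(ybar_iw, x_i, beta, lambda, d_i) {
  mu_i <- 1 / (1 + exp(-sum(x_i * beta)))   # mu_i = psi'(x_i' beta) for the binomial family
  r_iw <- 1 / (1 + lambda * d_i)            # shrinkage factor in (2.7)
  r_iw * ybar_iw + (1 - r_iw) * mu_i
}

# Example call: the predictor shrinks the direct estimate towards mu_i, and the
# shrinkage is heavier the larger lambda * d_i is.
pseudo_blup(ybar_iw = 0.25, x_i = c(1, -1.2), beta = c(0.3, 1.0), lambda = 8, d_i = 0.12)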
Remark 1. For the normal model with $\xi=\sigma^{-2}$, rather than 1, and $\mu_i=x_i^{\mathsf T}\beta$, the $\tilde m_i$ given in (2·7) becomes the Prasad & Rao (1999) estimator.

Remark 2. The formulation (2·1) and (2·2) is non-Bayesian. The word 'prior' is used only for convenience. Indeed the marginal distribution of $y_{ij}$ based on (2·1) and (2·2) is an overdispersed natural exponential family quadratic variance function distribution. Special cases are the beta-binomial or gamma-Poisson model depending on whether (2·1) is a binomial or a Poisson distribution.

However, $\beta$ and $\lambda$ are unknown, and need to be estimated from the marginal distributions of $\bar y_{iw}$ $(i=1,\ldots,k)$. Also, for simplicity, in the remainder of the paper, we consider the canonical link, namely $\mu_i=\psi'(x_i^{\mathsf T}\beta)$ $(i=1,\ldots,k)$.

The marginal distributions of $\bar y_{iw}$ are nonstandard except for the normal case. We use the theory of optimal estimating functions (Godambe & Thompson, 1989) for simultaneous estimation of $\beta$ and $\lambda$. This requires evaluation of only the first four marginal moments of $\bar y_{iw}$ $(i=1,\ldots,k)$ rather than full knowledge of their distributions.

Following Godambe & Thompson (1989), we begin with the elementary unbiased estimating functions $g_i=(g_{1i},g_{2i})^{\mathsf T}$, where $g_{1i}=\bar y_{iw}-\mu_i$ and $g_{2i}=(\bar y_{iw}-\mu_i)^2-\omega_iV(\mu_i)$, with $\omega_i$ defined in (2·5). Let
$$D_i^{\mathsf T}=\begin{pmatrix}-E(\partial g_{1i}/\partial\beta) & -E(\partial g_{2i}/\partial\beta)\\ -E(\partial g_{1i}/\partial\lambda) & -E(\partial g_{2i}/\partial\lambda)\end{pmatrix}=V(\mu_i)\begin{pmatrix}x_i & \omega_iV'(\mu_i)x_i\\ 0 & (1+v_2d_i)(\lambda-v_2)^{-2}\end{pmatrix},\qquad(2·8)$$
where we use $\partial\mu_i/\partial\beta=V(\mu_i)x_i$. Next let $\mu_{ri}=E(\bar y_{iw}-\mu_i)^r$ $(r=2,3,\ldots)$ and
$$S_i=\mathrm{var}(g_i)=\begin{pmatrix}\mu_{2i} & \mu_{3i}\\ \mu_{3i} & \mu_{4i}-\mu_{2i}^2\end{pmatrix}.\qquad(2·9)$$
Then $\beta$ and $\lambda$ are obtained by solving the optimal estimating equations $\sum_{i=1}^kD_i^{\mathsf T}S_i^{-1}g_i=0$. However, $|S_i|=\mu_{2i}\mu_{4i}-\mu_{2i}^3-\mu_{3i}^2=\Delta_i$, say, and
$$S_i^{-1}=\Delta_i^{-1}\begin{pmatrix}\mu_{4i}-\mu_{2i}^2 & -\mu_{3i}\\ -\mu_{3i} & \mu_{2i}\end{pmatrix}.\qquad(2·10)$$
Hence, by Godambe & Thompson (1989), the optimal estimating equations are
$$\sum_{i=1}^k\Delta_i^{-1}[\{\mu_{4i}-\mu_{2i}^2-\mu_{3i}(1+\lambda d_i)(\lambda-v_2)^{-1}V'(\mu_i)\}g_{1i}+\{\mu_{2i}(1+\lambda d_i)(\lambda-v_2)^{-1}V'(\mu_i)-\mu_{3i}\}g_{2i}]V(\mu_i)x_i=0,\qquad(2·11)$$
$$\sum_{i=1}^k\Delta_i^{-1}(\mu_{2i}g_{2i}-\mu_{3i}g_{1i})V(\mu_i)(1+v_2d_i)(\lambda-v_2)^{-2}=0.\qquad(2·12)$$
Solving (2·11) and (2·12) simultaneously, and writing $\gamma=(\beta,\lambda)^{\mathsf T}$, one obtains $\hat\gamma=(\hat\beta,\hat\lambda)^{\mathsf T}$. Then $r_{iw}$ and $\mu_i$ are estimated by $\hat r_{iw}=(1+\hat\lambda d_i)^{-1}$ and $\hat\mu_i=\psi'(x_i^{\mathsf T}\hat\beta)$. Accordingly, an empirical best linear unbiased predictor of $m_i$ is
$$\hat m_i=\hat r_{iw}\bar y_{iw}+(1-\hat r_{iw})\hat\mu_i.\qquad(2·13)$$
In general, there is no guarantee that the solution $\hat\lambda$ found from (2·11) and (2·12) satisfies the constraint $\hat\lambda>\max(0,v_2)$. In such cases, we take $\hat\lambda=\max(0,v_2)$. Also, equations (2·11) and (2·12) can only be solved numerically. We accomplish this by the Nelder–Mead algorithm. In particular, we use the optim function in 'R' to solve these equations, minimising the sums of squares of the estimating functions to get the roots of these equations. The R code can be obtained from the authors. This approach may present problems in the presence of multiple roots, but fortunately we did not encounter this in our example.
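To make the computation concrete, the sketch below is our own minimal reconstruction of this step for the binomial (beta-binomial) case; it is not the authors' code, which is available from them. It writes the stacked equations (2·11)–(2·12) in the matrix form $\sum_iD_i^{\mathsf T}S_i^{-1}g_i=0$, takes $\mu_{3i}$ and $\mu_{4i}$ from Theorem 1 below together with Morris's formulas for the prior moments, and minimises the sum of squares of the estimating functions with optim; the log-parametrisation of $\lambda$, used to keep $\hat\lambda>\max(0,v_2)=0$, is our own convenience rather than part of the paper.

expit <- function(eta) 1 / (1 + exp(-eta))

# Binomial NEF-QVF constants: V(mu) = mu(1 - mu), so v0 = 0, v1 = 1, v2 = -1
v0 <- 0; v1 <- 1; v2 <- -1
Vfun  <- function(mu) v0 + v1*mu + v2*mu^2
Vpfun <- function(mu) v1 + 2*v2*mu

# Marginal moments of ybar_iw: mu2 from (2.5); mu3, mu4 from Theorem 1 below,
# with the prior moments E(m_i - mu_i)^r taken from Morris (1983)
moments <- function(mu, lam, d_i, f_i, nu_i) {
  V <- Vfun(mu); Vp <- Vpfun(mu); disc <- v1^2 - 4*v0*v2
  om <- (1 + lam*d_i) / (lam - v2)
  M2 <- V / (lam - v2)
  M3 <- 2*V*Vp / ((lam - v2)*(lam - 2*v2))
  M4 <- (3*(lam + 6*v2)*V^2 + 6*disc*V) / ((lam - v2)*(lam - 2*v2)*(lam - 3*v2))
  A  <- 3*d_i^2 + 6*nu_i*v2
  mu4 <- (disc*nu_i + A*V)*V +
    (disc*nu_i*v2 + (A + 4*f_i)*(2*v2*V + Vp^2) + 6*d_i*V)*M2 +
    (2*v2*A + 12*v2*f_i + 6*d_i)*Vp*M3 +
    (v2^2*A + 8*v2^2*f_i + 6*v2*d_i + 1)*M4
  list(mu2 = om*V,
       mu3 = V*Vp*(lam^2*f_i + 3*lam*d_i + 2) / ((lam - v2)*(lam - 2*v2)),
       mu4 = mu4, om = om)
}

# Sum of squares of the stacked optimal estimating functions sum_i D_i' S_i^{-1} g_i
ee_obj <- function(par, ybar, X, d, f, nu) {
  p <- ncol(X); beta <- par[1:p]; lam <- exp(par[p + 1])   # lambda > 0 by construction
  s <- numeric(p + 1)
  for (i in seq_along(ybar)) {
    mu <- expit(sum(X[i, ] * beta))
    mm <- moments(mu, lam, d[i], f[i], nu[i])
    g  <- c(ybar[i] - mu, (ybar[i] - mu)^2 - mm$mu2)
    Dt <- Vfun(mu) * rbind(cbind(X[i, ], mm$om * Vpfun(mu) * X[i, ]),
                           c(0, (1 + v2*d[i]) / (lam - v2)^2))
    Si <- matrix(c(mm$mu2, mm$mu3, mm$mu3, mm$mu4 - mm$mu2^2), 2, 2)
    s  <- s + Dt %*% solve(Si, g)
  }
  sum(s^2)
}

# Example call, with ybar, X, d, f, nu the area-level data (hypothetical names):
# fit <- optim(c(rep(0, ncol(X)), log(8)), ee_obj, method = "Nelder-Mead",
#              ybar = ybar, X = X, d = d, f = f, nu = nu)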
In order to find $\hat\gamma$, we first need to find $S_i$. We have seen already that $\mu_{2i}=\omega_iV(\mu_i)$, where $\omega_i=(1+\lambda d_i)(\lambda-v_2)^{-1}$. The following theorem provides expressions for $\mu_{3i}$ and $\mu_{4i}$.

THEOREM 1. Let $f_i=\sum_{j=1}^{n_i}w_{ij}^3$, $\nu_i=\sum_{j=1}^{n_i}w_{ij}^4$ and $d=v_1^2-4v_0v_2$. Then, if $\lambda>\max(0,v_2,3v_2)$,
$$\mu_{3i}=\frac{V(\mu_i)V'(\mu_i)}{(\lambda-v_2)(\lambda-2v_2)}(\lambda^2f_i+3\lambda d_i+2),$$
$$\mu_{4i}=\{d\nu_i+(3d_i^2+6\nu_iv_2)V(\mu_i)\}V(\mu_i)
+\bigl(d\nu_iv_2+(3d_i^2+6\nu_iv_2+4f_i)[2v_2V(\mu_i)+\{V'(\mu_i)\}^2]+6d_iV(\mu_i)\bigr)E(m_i-\mu_i)^2$$
$$+\{(3d_i^2+6\nu_iv_2)2v_2V'(\mu_i)+12v_2f_iV'(\mu_i)+6d_iV'(\mu_i)\}E(m_i-\mu_i)^3
+\{v_2^2(3d_i^2+6\nu_iv_2)+8v_2^2f_i+6v_2d_i+1\}E(m_i-\mu_i)^4.$$

The proof of this theorem involves some tedious algebra, and is omitted. The details are available from the authors. We have found already that $E(m_i-\mu_i)^2=V(\mu_i)/(\lambda-v_2)$. From Morris (1983, Theorem 5·3), we have
$$E(m_i-\mu_i)^3=\frac{2V(\mu_i)V'(\mu_i)}{(\lambda-v_2)(\lambda-2v_2)},\qquad
E(m_i-\mu_i)^4=\frac{3(\lambda+6v_2)V^2(\mu_i)+6dV(\mu_i)}{(\lambda-v_2)(\lambda-2v_2)(\lambda-3v_2)},$$
provided that $\lambda>\max(0,v_2,3v_2)$.

The expressions given in Theorem 1 simplify in the binomial and Poisson cases. In the binomial case, $V(\mu_i)=\mu_i(1-\mu_i)$, while, for the Poisson case, $V(\mu_i)=\mu_i$.

Remark 3. The optimal estimating equations developed here provide a unified approach for estimation of $\beta$ and $\lambda$, and are different from the equations of Prasad & Rao (1999) even in the normal case. To see this, we continue to assume for convenience that $\xi_i=1$. Now noting that marginally $\bar y_{iw}\sim N\{x_i^{\mathsf T}\beta,\lambda^{-1}(1+\lambda d_i)\}$, one has $g_{1i}=\bar y_{iw}-x_i^{\mathsf T}\beta$ and $g_{2i}=(\bar y_{iw}-x_i^{\mathsf T}\beta)^2-(\lambda^{-1}+d_i)$. Then
$$D_i^{\mathsf T}=\begin{pmatrix}x_i & 0\\ 0 & \lambda^{-2}\end{pmatrix},\qquad S_i=\begin{pmatrix}\lambda^{-1}+d_i & 0\\ 0 & 2(\lambda^{-1}+d_i)^2\end{pmatrix}.$$
Accordingly, $\sum_{i=1}^kD_i^{\mathsf T}S_i^{-1}g_i=0$ simplifies to
$$\sum_{i=1}^k(\lambda^{-1}+d_i)^{-1}(\bar y_{iw}-x_i^{\mathsf T}\beta)x_i=0\qquad(2·14)$$
and
$$\sum_{i=1}^k\tfrac12(\lambda^{-1}+d_i)^{-2}\lambda^{-2}\{(\bar y_{iw}-x_i^{\mathsf T}\beta)^2-(\lambda^{-1}+d_i)\}=0,$$
or equivalently
$$\sum_{i=1}^k(\lambda^{-1}+d_i)^{-2}(\bar y_{iw}-x_i^{\mathsf T}\beta)^2=\sum_{i=1}^k(\lambda^{-1}+d_i)^{-1}.\qquad(2·15)$$
While (2·14) is similar to that of Prasad & Rao (1999) and provides the optimal model-unbiased estimator of $\beta$ for known $\lambda$, (2·15) is quite different from that of Prasad & Rao (1999).

Remark 4. In the normal case, it is preferable to use the fact that $\bar y_{iw}$ is normal rather than to set up estimating equations based only on the first four moments of $\bar y_{iw}$. For small areas with large $n_i$, even without normality of the $y_{ij}$'s, a normal approximation of the distribution of $\bar y_{iw}$ may not be too unreasonable. However, this does not work for small areas with small $n_i$. In such instances, given that the marginal distribution of $\bar y_{iw}$ is nonstandard and fairly complicated, neither maximum likelihood nor any other similar method of estimation seems feasible.

Remark 5. The estimators $\hat\beta$ and $\hat\lambda$ of $\beta$ and $\lambda$ are consistent and asymptotically normal as $k\to\infty$, under certain conditions. If we use arguments similar to a standard maximum likelihood analysis, see for example Theorem 5·13 of Shao (1999), it follows that $\hat\beta-\beta=O_p(k^{-1/2})$ and $\hat\lambda-\lambda=O_p(k^{-1/2})$. If this consistency holds, $\hat m_i$ is a consistent estimator of $m_i$ when $d_i\to0$ as $n_i\to\infty$, as for example when $w_{ij}=n_i^{-1}$ for all $j=1,\ldots,n_i$ and $i=1,\ldots,k$.

Remark 6. Throughout the model-based formulation of this section, and in the rest of this paper, we have tacitly assumed that the sample selection probabilities $\pi_{ij}=w_{ij}^{-1}$ are not correlated with the $y_{ij}$. This need not always be the case. It may so happen that, even after conditioning on the covariates $x_i$, the sampling mechanism remains informative, and should be taken into account for inferential purposes. Pfeffermann and his colleagues have made concrete proposals of how to incorporate this selection mechanism in model-based inference.
One key consequence is that the distributions of the samples may be different from the assumed superpopulation model because of the selection mechanism. We provide one illustration of this approach in the present context.

First regard the selection probabilities $\pi_{ij}=w_{ij}^{-1}$ as realisations of a random variable. If we use the suffix $P$ to denote a population distribution, and use $S$ for a sample distribution, then from equation (3·1) of Pfeffermann, Krieger & Rinott (1998) one obtains the sample probability density function of $y_{ij}$ as
$$f_S(y_{ij}\mid\theta_i)=E_P(\pi_{ij}\mid y_{ij})f_P(y_{ij}\mid\theta_i)/E_P(\pi_{ij}).\qquad(2·16)$$
For the purpose of illustration, let $E_P(\pi_{ij}\mid y_{ij})=\exp(A_0+A_1y_{ij})$ (Pfeffermann & Sverchkov, 1999). Then, by (2·1), if $\xi_i=1$, we obtain
$$E_P(\pi_{ij})=E_P\{\exp(A_0+A_1y_{ij})\}=\exp\{A_0+\psi(A_1+\theta_i)-\psi(\theta_i)\}.$$
Consequently, from (2·16), we have
$$f_S(y_{ij}\mid\theta_i)=\exp\{(A_1+\theta_i)y_{ij}-\psi(A_1+\theta_i)+c(y_{ij},1)\}.\qquad(2·17)$$
Thus the population density of $y_{ij}$ given in (2·1) has now changed to the new exponentially shifted sampling density as given in (2·17). If we write $\phi_i=A_1+\theta_i$, and assume a conjugate prior for $\phi_i$ as in (2·2), the calculations of this section can be carried out directly, although retrieving $E\{\psi'(\theta_i)\mid y_i\}$ from $E\{\psi'(\phi_i)\mid y_i\}$ may not be easy because of the lack of one-to-one correspondence between the two, the normal case in which $\psi'(\theta_i)=\theta_i$ being an exception. The situation becomes even more complex if one assigns a conjugate prior to $\theta_i$ and not to $\phi_i$, because the conjugacy of $\theta_i$ is not necessarily translated to the conjugacy of $\phi_i$. Nevertheless, it seems that the calculations can at least be carried out numerically.

3. CALCULATION OF THE MEAN SQUARED ERRORS

We first state a theorem which provides an asymptotic expansion of the mean squared errors of the $\hat m_i$. Let
$$T_{1i}=\lambda d_i(1+\lambda d_i)^{-1}(\lambda-v_2)^{-1}V(\mu_i),$$
$$T_{2i}=d_i^2(1+\lambda d_i)^{-2}\,\mathrm{tr}\left\{\begin{pmatrix}\lambda^2V^2(\mu_i)x_ix_i^{\mathsf T} & 0\\ 0 & 0\end{pmatrix}U_k^{-1}\right\},$$
where $U_k=\sum_{i=1}^kD_i^{\mathsf T}S_i^{-1}D_i$, and
$$T_{3i}=d_i(1+\lambda d_i)^{-1}V(\mu_i)\,\mathrm{tr}\left\{\begin{pmatrix}\lambda V(\mu_i)x_ix_i^{\mathsf T} & -V'(\mu_i)(\lambda-v_2)^{-1}x_i\\ 0 & -(1+\lambda d_i)^{-1}(1+v_2d_i)(\lambda-v_2)^{-2}\end{pmatrix}U_k^{-1}\right\}$$
$$-\,d_i(1+\lambda d_i)^{-1}\,\mathrm{tr}\left\{D_i^{\mathsf T}S_i^{-1}\begin{pmatrix}\lambda d_i(1+\lambda d_i)^{-1}V(\mu_i)f_{i11}x_i^{\mathsf T} & -d_i(1+\lambda d_i)^{-2}f_{i12}\\ \lambda d_i(1+\lambda d_i)^{-1}V(\mu_i)f_{i12}x_i^{\mathsf T} & -d_i(1+\lambda d_i)^{-2}f_{i22}\end{pmatrix}U_k^{-1}\right\}.$$
Expressions for $f_{i11}$, $f_{i12}$ and $f_{i22}$ are given in (A·20)–(A·22). Then we have the following theorem.

THEOREM 2. The approximate mean squared error for $\hat m_i$ is given up to $O(k^{-1})$ as
$$E(\hat m_i-m_i)^2=T_{1i}+T_{2i}+2T_{3i}.\qquad(3·1)$$

The proof of the theorem is deferred to the Appendix.
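Once $U_k$ has been formed from the estimating-equation fit, the two leading terms of (3·1) are simple to evaluate. The sketch below is ours and is specialised to the binomial case; the inverse matrix $U_k^{-1}$ is assumed to be supplied by the user, and $T_{3i}$ and the bias correction developed in the remainder of this section are omitted.

# Leading terms T1 and T2 of (3.1) for one area, binomial case (v2 = -1).
mse_lead <- function(mu_i, lambda, d_i, x_i, Uk_inv, v2 = -1) {
  V  <- mu_i * (1 - mu_i)
  T1 <- lambda * d_i * V / ((1 + lambda * d_i) * (lambda - v2))
  p  <- length(x_i)
  # tr{ blockdiag(lambda^2 V^2 x x', 0) Uk^{-1} } reduces to a quadratic form in x_i
  T2 <- (d_i / (1 + lambda * d_i))^2 * lambda^2 * V^2 *
          drop(t(x_i) %*% Uk_inv[1:p, 1:p, drop = FALSE] %*% x_i)
  c(T1 = T1, T2 = T2)
}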
Since $T_{2i}$ and $T_{3i}$ are both $O(k^{-1})$, they are estimated by substitution of $\hat\beta$ and $\hat\lambda$ for $\beta$ and $\lambda$ respectively in the relevant expressions. However, $T_{1i}=O(1)$, and creation of an estimator $\hat T_{1i}$ of $T_{1i}$ which is correct up to $O(k^{-1})$ requires considerable effort.

To this end, we write $T_{1i}$ as $T_{1i}(\gamma)$, and, by a two-step Taylor expansion,
$$T_{1i}(\hat\gamma)\simeq T_{1i}(\gamma)+\left(\frac{\partial T_{1i}}{\partial\gamma}\right)^{\mathsf T}(\hat\gamma-\gamma)+\frac12(\hat\gamma-\gamma)^{\mathsf T}\frac{\partial^2T_{1i}}{\partial\gamma\,\partial\gamma^{\mathsf T}}(\hat\gamma-\gamma).\qquad(3·2)$$
Thus,
$$E\{T_{1i}(\hat\gamma)\}\simeq T_{1i}(\gamma)+\left(\frac{\partial T_{1i}}{\partial\gamma}\right)^{\mathsf T}E(\hat\gamma-\gamma)+\frac12\,\mathrm{tr}\left[E\left\{\frac{\partial^2T_{1i}}{\partial\gamma\,\partial\gamma^{\mathsf T}}(\hat\gamma-\gamma)(\hat\gamma-\gamma)^{\mathsf T}\right\}\right].\qquad(3·3)$$
Note that
$$\frac{\partial T_{1i}}{\partial\gamma}=\begin{pmatrix}\partial T_{1i}/\partial\beta\\ \partial T_{1i}/\partial\lambda\end{pmatrix}=\begin{pmatrix}V'(\mu_i)V(\mu_i)h_i(\lambda)x_i\\ V(\mu_i)h_i'(\lambda)\end{pmatrix},\qquad(3·4)$$
$$\frac{\partial^2T_{1i}}{\partial\gamma\,\partial\gamma^{\mathsf T}}=\begin{pmatrix}[2v_2V(\mu_i)+\{V'(\mu_i)\}^2]V(\mu_i)h_i(\lambda)x_ix_i^{\mathsf T} & V'(\mu_i)V(\mu_i)h_i'(\lambda)x_i\\ V'(\mu_i)V(\mu_i)h_i'(\lambda)x_i^{\mathsf T} & V(\mu_i)h_i''(\lambda)\end{pmatrix},$$
where we recall that $h_i(\lambda)=\lambda d_i(1+\lambda d_i)^{-1}(\lambda-v_2)^{-1}$. As before,
$$E\left\{\frac{\partial^2T_{1i}}{\partial\gamma\,\partial\gamma^{\mathsf T}}(\hat\gamma-\gamma)(\hat\gamma-\gamma)^{\mathsf T}\right\}\simeq\frac{\partial^2T_{1i}}{\partial\gamma\,\partial\gamma^{\mathsf T}}U_k^{-1},$$
which is $O(k^{-1})$ and is estimated by substitution of $\hat\beta$ and $\hat\lambda$ for $\beta$ and $\lambda$ in the resulting expression. We denote this estimator by $2T_{12i}(\hat\gamma)$.

Finally, we approximate $E(\hat\gamma-\gamma)$, up to $O(k^{-1})$, by the method of Cox & Snell (1968). We write $E(\hat\gamma-\gamma)$ as $z(\gamma)$ and show in the Appendix that $z(\gamma)$ is $O(k^{-1})$. We now write
$$T_{11i}(\gamma)=\left\{\frac{\partial T_{1i}(\gamma)}{\partial\gamma}\right\}^{\mathsf T}z(\gamma),$$
and estimate $T_{1i}(\gamma)$ by $T_{1i}(\hat\gamma)-T_{11i}(\hat\gamma)-T_{12i}(\hat\gamma)$.

4. A SIMULATION STUDY

We conduct a simulation study to check the performance of the estimation methodology as described in the previous sections. The procedure is very similar to the one conducted by Prasad & Rao (1999), who considered a normal-normal model instead of the present beta-binomial model. A sketch of Steps 1–5 in R is given at the end of this section.

Step 1. Generate a covariate $x_i$ from $N(0,10^{-3})$ $(i=1,\ldots,k)$.

Step 2. Generate a size measure $z_{ij}$ from an exponential distribution with mean 100, for $j=1,\ldots,N_i$ and $i=1,\ldots,k$. Define $Z_i=\sum_{j=1}^{N_i}z_{ij}$ $(i=1,\ldots,k)$. We have taken $N_i=200$ for $i=1,\ldots,k$.

Step 3. Generate the response variable $y_{ij}$ according to a beta-binomial model as described in (2·1)–(2·3). We have taken $\beta=(0,1)^{\mathsf T}$ and $\lambda=8$.

Step 4. Select a probability proportional to size sample with replacement independently from each small area with $n_i=20$ $(i=1,\ldots,k)$. The selection probabilities for the sample are given by $p_{ij}=z_{ij}/Z_i$ $(j=1,\ldots,N_i;\ i=1,\ldots,k)$. Then the $y_{ij}$'s are observed from the small areas.

Step 5. The basic sampling weights are given by $\tilde w_{ij}=n_i^{-1}p_{ij}^{-1}$, so that
$$w_{ij}=\frac{p_{ij}^{-1}}{\sum_{j=1}^{n_i}p_{ij}^{-1}}.$$
Then we calculate $\bar y_{iw}=\sum_{j=1}^{n_i}w_{ij}y_{ij}$ $(i=1,\ldots,k)$.

Step 6. Calculate the $\hat m_i$'s and the $\mathrm{rmse}(\hat m_i)$'s, the square roots of the estimated mean squared errors $(i=1,\ldots,k)$, as given in §§ 2 and 3.

Step 7. Repeat Steps 3–6 $R=500$ times, keeping Steps 1 and 2 fixed.

Step 8. For the estimators $\hat m_i$ $(i=1,\ldots,k)$ calculate the following quantities: the simulated relative bias of $\hat m_i$, defined by
$$\mathrm{RB}(\hat m_i)=\frac1R\sum_{r=1}^R\{\hat m_i(r)-m_i(r)\}/m_i(r)\qquad(i=1,\ldots,k),$$
in which $m_i(r)$ and $\hat m_i(r)$ are the values of $m_i$ and $\hat m_i$ respectively at the $r$th run, $r=1,\ldots,R$; and the relative bias of $\{\mathrm{rmse}(\hat m_i)\}^2$, defined by
$$\mathrm{RBMSE}(\hat m_i)=[E\{\mathrm{rmse}(\hat m_i)\}^2-\mathrm{MSE}(\hat m_i)]/\mathrm{MSE}(\hat m_i),$$
where
$$\mathrm{MSE}(\hat m_i)=\frac1R\sum_{r=1}^R\{\hat m_i(r)-m_i(r)\}^2,\qquad E\{\mathrm{rmse}(\hat m_i)\}^2=\frac1R\sum_{r=1}^R\{\mathrm{rmse}(\hat m_i)(r)\}^2\qquad(i=1,\ldots,k).$$

Table 1 reports the RB and RBMSE values for different choices of $k$, the number of small areas. The summary measures considered are the first quartile, the median, the mean and the third quartile over all the small areas.

Table 1. The relative biases of the small-area estimates and relative biases of the mean squared errors in the simulation study, for k=20, k=30 and k=50 small areas

  k                   Q1        Median     Mean       Q3
  20   RB (%)        0·066      0·075      0·076      0·096
       RBMSE (%)   −20·590     −9·580    −15·100     −2·893
  30   RB (%)        0·065      0·074      0·077      0·090
       RBMSE (%)   −10·585     −9·416    −10·640      0·084
  50   RB (%)        0·071      0·082      0·084      0·099
       RBMSE (%)   −10·630     −2·152     −3·988      2·614

RB, simulated relative bias of the small-area estimate; RBMSE, relative bias of the mean squared error estimate; Q1, first quartile; Q3, third quartile.

The simulation study reflects both the design for sample selection and the model fitted to the data. Clearly there is negligible bias in the small-area point estimates even for small $k$. For the relative bias of the mean squared error estimates, the median values decrease in magnitude as the number of small areas increases. The trend indicates that the bias would be negligible if we consider a large number of small areas, which is the case in many applications, such as the example in § 5 in which the number of sampled counties is more than 1300.
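The following sketch, ours rather than the authors', reproduces Steps 1–5 for a single replication of the simulation; object names are not from the paper. Each area mean $m_i$ is drawn from the conjugate beta prior with mean $\mu_i=\mathrm{expit}(\beta_0+\beta_1x_i)$ and precision $\lambda$, a finite population of binary responses is generated, and a probability-proportional-to-size sample with replacement is then drawn.

set.seed(1)
k <- 30; Ni <- 200; ni <- 20
beta <- c(0, 1); lambda <- 8
expit <- function(eta) 1 / (1 + exp(-eta))

x <- rnorm(k, 0, sqrt(1e-3))                    # Step 1: area-level covariate
z <- matrix(rexp(k*Ni, rate = 1/100), k, Ni)    # Step 2: size measures with mean 100
p <- z / rowSums(z)                             # selection probabilities z_ij / Z_i

mu <- expit(beta[1] + beta[2]*x)                # Step 3: beta-binomial population model
m  <- rbeta(k, lambda*mu, lambda*(1 - mu))      # conjugate beta draw of each area mean
ybar_w <- d_i <- numeric(k)
for (i in 1:k) {
  ypop <- rbinom(Ni, 1, m[i])                   # population responses in area i
  s    <- sample(Ni, ni, replace = TRUE, prob = p[i, ])  # Step 4: pps sample with replacement
  w    <- (1/p[i, s]) / sum(1/p[i, s])          # Step 5: normalised weights
  ybar_w[i] <- sum(w * ypop[s])                 # survey-weighted direct estimate
  d_i[i]    <- sum(w^2)
}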
5. SMALL-AREA ESTIMATION OF POOR SCHOOL-AGE CHILDREN

We now use our methods to estimate the proportion of poor school-age children, i.e. children in the age-group 5–17, for all the counties of a certain state, not named for confidentiality, in the United States for the year 1989. The year 1989 is picked because then the different estimates can be compared against the corresponding 1990 Census estimates, usually taken as the 'gold standard' for this purpose. Although our illustration involves only one state, the methods have been used for simultaneous estimation of these proportions for all the counties in the United States.

First we provide briefly the background of this research. As part of its Small Area Income and Poverty Estimation project, the United States Bureau of the Census is required to produce updated estimates of poor school-age children for the different counties and school districts of the country. The March supplement to the Current Population Survey provides reliable direct estimates annually for the number of poor children at the national level, and also for large states. However, the direct estimates of nearly 40% of the counties have high sampling variability.

The current approach of the Bureau of the Census is as follows. Let $z_{iu}$ be the estimated number of children in the 5–17 age-group for county $i$ in year $u$, and let $q_{iu}$ be the estimated number of poor children in the 5–17 age-group for county $i$ in year $u$, both estimates being obtained from the Current Population Survey. Then the estimated poverty rate for county $i$ in year $u$ is defined as $p_{iu}=q_{iu}/z_{iu}$.

Suppose now a county $i$ has Current Population Survey samples for all the three years $t-1$, $t$ and $t+1$. Then, for this county, the direct estimate of the logarithm of the number of poor children in the 5–17 age-group for year $t$ is defined as
$$y^*_{it}=\log\left\{\frac{w^*_{i,t-1}p_{i,t-1}+w^*_{i,t}p_{i,t}+w^*_{i,t+1}p_{i,t+1}}{w^*_{i,t-1}z_{i,t-1}+w^*_{i,t}z_{i,t}+w^*_{i,t+1}z_{i,t+1}}\right\}.\qquad(5·1)$$
Here the weight $w^*_{i,u}$ $(u=t-1,t,t+1)$ is proportional to the number of interviewed housing units in county $i$ containing at least one child aged 5–17 in year $u$. For counties with Current Population Survey samples in only one or two of the three years, the formula given in (5·1) is modified by taking values only for that year or a two-year average analogous to (5·1). The direct estimates $y^*_{it}$ are assumed to be $N(\theta_{it},V_{it})$ with $V_{it}>0$ known, and $\theta_{it}\sim N(x_{it}^{\mathsf T}\beta,A)$ with $A>0$ unknown, and an empirical Bayes analysis is carried out along the lines of Fay & Herriot (1979). For the detailed estimation procedure we refer to a Bureau of the Census technical report by W. R. Bell and others.

However, the above procedure has certain limitations. First, counties with no poor child observed in the Current Population Survey sample during the three years are omitted from model-fitting, since logarithms cannot be taken when the direct estimate is zero. This has typically led to the elimination of approximately 30% of the sampled counties. Secondly, an estimator based on three years' combined data may not necessarily produce an unbiased estimator for the single year in question. Moreover, the logarithm of an unbiased estimator of a given parameter is not an unbiased estimator of the logarithm of the parameter. The above two limitations of current models for this project are discussed in detail in an Iowa State University technical report by T. Maiti and E. V. Slud. Our procedure, instead, focuses on one specific year.
Dropping the suffix $t$, for a county $i$ we use the Current Population Survey estimate for that year. However, these estimates involve survey weights, and consequently are not straight averages of the binary responses of individuals in the counties. The individual weights $w_{ij}$ are constructed in a general way that does not explicitly take the response variable into account. The weight construction for Current Population Survey sample units is a complex procedure. The details are available in the United States Census Bureau's technical paper 63RV entitled 'Design and methodology'. In summary, the final weight is the product of the basic weight, an adjustment for special weighting, a non-interview adjustment, a first-stage ratio adjustment factor and a second-stage ratio adjustment factor. The basic weight is not related to poverty rates.

Our basic statistics are the $\bar y_{iw}$ for the year in question. In our framework, $y_{ij}$ is 1 or 0 according as the child is or is not poor. Also, we do not have access to the full microdata: the data given to us are $\bar y_{iw}$, $\sum_jw_{ij}^2$, $\sum_jw_{ij}^3$ and $\sum_jw_{ij}^4$.

Zero counts, $\bar y_{iw}=0$, do not cause a problem for our model. Also, we model $p_i$, the true proportion of poor school-age children in county $i$, by
$$\mathrm{logit}\{E(p_i)\}=\beta_0+\beta_1x_{1i}+\beta_2x_{2i}+\beta_3x_{3i}+\beta_4x_{4i},$$
where the covariates are as follows:
$x_{1i}=\log$ (proportion of child exemptions reported by families in poverty for tax returns in county $i$),
$x_{2i}=\log$ (proportion of people receiving food stamps in county $i$),
$x_{3i}=\log$ (proportion of child exemptions reported by families for tax returns in county $i$),
$x_{4i}=\log$ (proportion of poor school-age children from the previous census in county $i$).
These covariates are based on the recommendation of the Bureau of the Census, except that we have considered logarithms of the proportions rather than logarithms of the numbers.

Table 2, based on the results of §§ 2 and 3, provides for 1989 the sample sizes, $n_i$, survey-weighted direct estimates, $\bar y_{iw}$, 1990 census estimates, $c_i$, empirical best linear unbiased predictors, $\hat m_i$, and the corresponding estimates of the approximate root mean squared errors, $\mathrm{rmse}(\hat m_i)$, for all 39 counties of the state. We also provide naive estimates of the root mean squared errors, given by $T_{1i}^{1/2}(\hat\gamma)$, ignoring $T_{2i}(\hat\gamma)$ and $T_{3i}(\hat\gamma)$ completely. This amounts to ignoring uncertainty in the estimation of the model parameters. The average values of the $\bar y_{iw}$'s and $\hat m_i$'s are 0·152 and 0·164 respectively, while the average value of the $c_i$'s is 0·171. The average naive root mean squared error is 0·051, whereas the average value of the $\mathrm{rmse}(\hat m_i)$'s is 0·061.
Table 2. Data for poor school-age children. Estimates and root mean squared errors for the natural exponential family quadratic variance function models

  County   $n_i$    $c_i$    $\bar y_{iw}$   $\hat m_i$   Naive ×10   Corrected ×10
   1          3    0·376      0·000          0·198         1·780        1·823
   2          6    0·127      0·157          0·132         0·459        0·753
   3         10    0·128      0·000          0·067         1·137        1·196
   4         13    0·111      0·000          0·060         1·035        1·060
   5         13    0·287      0·000          0·019         0·801        1·048
   6         18    0·142      0·293          0·287         0·714        1·021
   7         24    0·121      0·085          0·112         0·767        0·718
   8         25    0·171      0·093          0·096         0·543        0·785
   9         26    0·173      0·000          0·028         0·745        0·748
  10         26    0·189      0·267          0·261         0·546        0·807
  11         31    0·081      0·000          0·024         0·676        0·680
  12         31    0·235      0·036          0·039         0·395        0·569
  13         36    0·091      0·063          0·074         0·611        0·756
  14         37    0·174      0·163          0·158         0·165        0·250
  15         38    0·223      0·316          0·322         0·669        0·722
  16         38    0·141      0·149          0·161         0·615        0·664
  17         38    0·159      0·076          0·093         0·618        0·693
  18         38    0·268      0·218          0·214         0·043        0·058
  19         41    0·266      0·091          0·094         0·469        0·621
  20         55    0·148      0·162          0·163         0·427        0·581
  21         59    0·131      0·376          0·370         0·365        0·548
  22         62    0·211      0·525          0·521         0·472        0·560
  23         65    0·156      0·301          0·301         0·441        0·555
  24         67    0·195      0·201          0·202         0·415        0·538
  25         67    0·133      0·276          0·275         0·437        0·582
  26         70    0·136      0·034          0·041         0·444        0·529
  27         72    0·087      0·075          0·084         0·459        0·460
  28         75    0·236      0·239          0·243         0·441        0·489
  29         86    0·207      0·245          0·242         0·273        0·390
  30         92    0·147      0·061          0·065         0·384        0·486
  31        117    0·111      0·064          0·069         0·360        0·398
  32        127    0·174      0·164          0·169         0·344        0·346
  33        178    0·147      0·097          0·100         0·296        0·308
  34        207    0·171      0·198          0·201         0·273        0·277
  35        215    0·135      0·073          0·077         0·272        0·278
  36        224    0·133      0·170          0·172         0·259        0·259
  37        282    0·168      0·153          0·155         0·234        0·235
  38        296    0·139      0·170          0·171         0·229        0·230
  39        945    0·229      0·351          0·352         0·134        0·135

Naive, naive estimate of the root mean squared error; Corrected, corrected estimate of the root mean squared error.

The empirical best linear unbiased predictors shrink the direct estimates towards some regression surface. As is evident from Table 2, the amount of shrinkage is less for counties with large sample sizes, such as counties 35, 36, 37, 38 and 39, than for those with smaller sample sizes, such as counties 1 and 2. Furthermore, a comparison of the naive and the proposed root mean squared errors reveals that, for counties with large sample sizes, the proposed values do not change the naive values very much. On the other hand, for counties with smaller sample sizes, the changes are more substantial.

Our method is clearly not comparable to that used by the Bureau of the Census: they have combined three years' data, which we have not, and they do not provide, as such, estimates of the proportions of poor children in the 5–17 age-group for the different counties. They provide estimated numbers of poor children in the 5–17 age-group, but do not provide the analogous estimates of the total number of children in the same age-group. Hence, to provide some comparative results, we consider a normal model for the logarithms of the sampled proportions. This is different from modelling the log counts as done by the Bureau of the Census. The normal theory model is given by $\log(\bar y_{iw})=\theta_i+e_i$, where the $e_i$ are independent $N(0,v_e/n_i)$. Also, $\theta_i$ is distributed, independently of $e_i$, as $N(x_i^{\mathsf T}\beta,\sigma_u^2)$. Here $v_e$ = 9·97 is an approximate value supplied by the Bureau of the Census, and is treated as known. Note that, for the normal model, to overcome nonidentifiability one variance component needs to be known. The two sets of estimates are compared against the 1990 decennial census estimates for 1989.
Based on the recommendation of the panel of the National Academy of Sciences, for any estimate $e=(e_1,\ldots,e_{39})^{\mathsf T}$ we compute the average relative bias, $\frac1{39}\sum_{i=1}^{39}(|c_i-e_i|/c_i)$, the average squared relative bias, $\frac1{39}\sum_{i=1}^{39}\{(c_i-e_i)^2/c_i^2\}$, the average absolute bias, $\frac1{39}\sum_{i=1}^{39}|c_i-e_i|$, and the average squared deviation, $\frac1{39}\sum_{i=1}^{39}(c_i-e_i)^2$. The results are reported in Table 3.

Table 3. Poor school-age children example. Comparison of direct, normal and NEF-QVF estimates

  Criterion                        Direct    Normal    NEF-QVF
  Average relative bias            0·5444    0·6634    0·4662
  Average squared relative bias    0·4870    0·4541    0·3820
  Average absolute bias            0·0945    0·0116    0·0814
  Average squared deviation        0·0169    0·0140    0·0125

NEF-QVF, natural exponential family quadratic variance function.

It follows from Table 3 that the pseudo empirical best linear unbiased predictors based on the beta-binomial model perform better than the direct estimates under all four criteria, while they perform better than the estimates based on the normal model except in terms of average absolute bias.

6. DISCUSSION

An alternative to the present estimating equations approach is the extended quasi-likelihood approach (Nelder & Pregibon, 1987; McCullagh & Nelder, 1989). Indeed, Raghunathan (1993) proposed a quasilikelihood-based method for small-area estimation in a context where the direct estimators were the unit-level estimators $y_{ij}$, and not the survey-weighted estimators $\bar y_{iw}$. The apparent advantage of the quasilikelihood approach over the estimating function approach is that the small-area estimates can be obtained from only the first two moments of the $y_{ij}$'s and the first two moments of the prior. In the present context, the quasilikelihood, or more appropriately the quasi-loglikelihood, based on $E(\bar y_{iw}\mid m_i)$ and $\mathrm{var}(\bar y_{iw}\mid m_i)$ is given by
$$Q_i(m_i;\bar y_{iw})=-\tfrac12\log\{2\pi d_iV(m_i)\}+d_i^{-1}\int_{\bar y_{iw}}^{m_i}\frac{\bar y_{iw}-t}{V(t)}\,dt.$$
One difficulty with the quasilikelihood is that it does not provide an analytical method for finding mean squared errors. Resampling methods such as the jackknife and the bootstrap are possible, and have been used by Raghunathan (1993), but the analytical approximation as given here also has its virtues, especially since the overdispersed natural exponential family quadratic variance function structure facilitates calculation of moments, making the procedure readily available for implementation.

ACKNOWLEDGEMENT

We would like to thank the referees, an associate editor and the editor for comments that led to improvement of the paper. It is a pleasure to acknowledge Dr William Bell of the Bureau of the Census for many useful conversations. The paper was completed when both authors were American Statistical Association/National Science Foundation/Census Senior Research Fellows at the Bureau of the Census. The research was partially supported by grants from the United States National Science Foundation. Thanks are due to Bhramar Mukherjee for her help in the preparation of the final version of the manuscript.

APPENDIX
Technical details

Proof of Theorem 2. We begin with the calculation
$$E(m_i-\hat m_i)^2=E(m_i-\tilde m_i+\tilde m_i-\hat m_i)^2=E(m_i-\tilde m_i)^2+E(\tilde m_i-\hat m_i)^2+2E\{(m_i-\tilde m_i)(\tilde m_i-\hat m_i)\}.\qquad(\mathrm{A}·1)$$
First we calculate
$$E(m_i-\tilde m_i)^2=E\{m_i-\mu_i-(1+\lambda d_i)^{-1}(\bar y_{iw}-\mu_i)\}^2=(1+\lambda d_i)^{-2}E\{\lambda d_i(m_i-\mu_i)-(\bar y_{iw}-m_i)\}^2.\qquad(\mathrm{A}·2)$$
Since
$$E(m_i-\mu_i)^2=\frac{V(\mu_i)}{\lambda-v_2},\qquad E(\bar y_{iw}-m_i)^2=\lambda d_iV(\mu_i)(\lambda-v_2)^{-1},\qquad E\{(m_i-\mu_i)(\bar y_{iw}-m_i)\}=E[(m_i-\mu_i)E\{(\bar y_{iw}-m_i)\mid m_i\}]=0,$$
it follows from (A·2) that
$$E(m_i-\tilde m_i)^2=T_{1i},\qquad(\mathrm{A}·3)$$
where $T_{1i}$ is as defined in § 3. In order to evaluate $E(\tilde m_i-\hat m_i)^2$, let $\gamma=(\beta,\lambda)^{\mathsf T}$ and $q_i(\gamma,\bar y_{iw})=r_{iw}\bar y_{iw}+(1-r_{iw})\mu_i$.
By one-step Taylor expansion, we have
$$E(\tilde m_i-\hat m_i)^2=E\{q_i(\gamma,\bar y_{iw})-q_i(\hat\gamma,\bar y_{iw})\}^2\simeq E\left\{(\hat\gamma-\gamma)^{\mathsf T}\frac{\partial q_i}{\partial\gamma}\left(\frac{\partial q_i}{\partial\gamma}\right)^{\mathsf T}(\hat\gamma-\gamma)\right\}=\mathrm{tr}\left[E\left\{\frac{\partial q_i}{\partial\gamma}\left(\frac{\partial q_i}{\partial\gamma}\right)^{\mathsf T}(\hat\gamma-\gamma)(\hat\gamma-\gamma)^{\mathsf T}\right\}\right].\qquad(\mathrm{A}·4)$$
However,
$$\frac{\partial q_i}{\partial\gamma}=\begin{pmatrix}\partial q_i/\partial\beta\\ \partial q_i/\partial\lambda\end{pmatrix}=\begin{pmatrix}(1-r_{iw})V(\mu_i)x_i\\ -d_i(1+\lambda d_i)^{-2}(\bar y_{iw}-\mu_i)\end{pmatrix}.$$
Hence, from (A·4), after slight simplification, we have
$$E(\tilde m_i-\hat m_i)^2\simeq d_i^2(1+\lambda d_i)^{-2}\,\mathrm{tr}\,E\left\{\begin{pmatrix}\lambda^2V^2(\mu_i)x_ix_i^{\mathsf T} & -\lambda V(\mu_i)(\bar y_{iw}-\mu_i)x_i\\ -\lambda V(\mu_i)(\bar y_{iw}-\mu_i)x_i^{\mathsf T} & (1+\lambda d_i)^{-2}(\bar y_{iw}-\mu_i)^2\end{pmatrix}(\hat\gamma-\gamma)(\hat\gamma-\gamma)^{\mathsf T}\right\}.\qquad(\mathrm{A}·5)$$
We will show now that the right-hand side of (A·5) can be approximated, up to $O(k^{-1})$, by
$$d_i^2(1+\lambda d_i)^{-2}\,\mathrm{tr}\left\{\begin{pmatrix}\lambda^2V^2(\mu_i)x_ix_i^{\mathsf T} & 0\\ 0 & 0\end{pmatrix}U_k^{-1}\right\}=T_{2i}.\qquad(\mathrm{A}·6)$$
Since $U_k=O(k)$, the approximation (A·6) is accurate up to $O(k^{-1})$.

To see this, we write $S_k(\gamma)=\sum_{j=1}^kD_j^{\mathsf T}S_j^{-1}g_j$. By one-step Taylor expansion,
$$0=S_k(\hat\gamma)\simeq S_k(\gamma)+\left\{\frac{\partial S_k(\gamma)}{\partial\gamma}\right\}^{\mathsf T}(\hat\gamma-\gamma).$$
Since
$$\frac{\partial S_k(\gamma)}{\partial\gamma}=\sum_{j=1}^k\left\{\frac{\partial}{\partial\gamma}(D_j^{\mathsf T}S_j^{-1})g_j+D_j^{\mathsf T}S_j^{-1}\frac{\partial g_j}{\partial\gamma}\right\},$$
we have $E\{-\partial S_k(\gamma)/\partial\gamma\}=\sum_{j=1}^kD_j^{\mathsf T}S_j^{-1}D_j=U_k$. Also, $\mathrm{var}(S_k)=U_k$. By the central limit theorem, $U_k^{-1}S_k$ is asymptotically $N(0,U_k^{-1})$. Hence, $\hat\gamma-\gamma\simeq U_k^{-1}S_k+o_p(k^{-1/2})$, and
$$E\{(\hat\gamma-\gamma)(\hat\gamma-\gamma)^{\mathsf T}\}=U_k^{-1}+o(k^{-1}).$$
Furthermore, by the mutual independence of the $\bar y_{iw}$ $(i=1,\ldots,k)$,
$$E\{(\bar y_{iw}-\mu_i)(\hat\gamma-\gamma)(\hat\gamma-\gamma)^{\mathsf T}\}=E\{(\bar y_{iw}-\mu_i)U_k^{-1}S_kS_k^{\mathsf T}U_k^{-1}\}+o(k^{-1})$$
$$=U_k^{-1}E\{(\bar y_{iw}-\mu_i)D_i^{\mathsf T}S_i^{-1}g_ig_i^{\mathsf T}S_i^{-1}D_i\}U_k^{-1}+o(k^{-1})=O(k^{-2})+o(k^{-1})=o(k^{-1}),$$
and a similar calculation shows that $E\{(\bar y_{iw}-\mu_i)^2(\hat\gamma-\gamma)(\hat\gamma-\gamma)^{\mathsf T}\}=o(k^{-1})$. The above calculations lead to (A·6) from (A·5).

Finally we approximate $E(m_i-\tilde m_i)(\tilde m_i-\hat m_i)$ up to $O(k^{-1})$. Writing $y_i=(y_{i1},\ldots,y_{in_i})^{\mathsf T}$, we obtain
$$E\{(m_i-\tilde m_i)(\tilde m_i-\hat m_i)\}=E[\{m_i-E(m_i\mid y_i)+E(m_i\mid y_i)-\tilde m_i\}(\tilde m_i-\hat m_i)]=E[\{E(m_i\mid y_i)-\tilde m_i\}(\tilde m_i-\hat m_i)].\qquad(\mathrm{A}·7)$$
If $\bar y_{iw}$ and $m_i$ have a joint bivariate normal distribution, we have $\tilde m_i=E(m_i\mid y_i)$ and the above expression simplifies to zero. However, in general, this needs an approximation. From Morris (1983), we have $E(m_i\mid y_i)=r_i\bar y_i+(1-r_i)\mu_i$, where $\bar y_i=n_i^{-1}\sum_{j=1}^{n_i}y_{ij}$ and $r_i=(1+\lambda n_i^{-1})^{-1}$. Accordingly, we can rewrite (A·7) as
$$E(m_i-\tilde m_i)(\tilde m_i-\hat m_i)=E[\{r_i(\bar y_i-\mu_i)-r_{iw}(\bar y_{iw}-\mu_i)\}(\tilde m_i-\hat m_i)].\qquad(\mathrm{A}·8)$$
The approximation of (A·8) proceeds as follows. By two-step Taylor expansion,
$$\tilde m_i-\hat m_i=q_i(\gamma,\bar y_{iw})-q_i(\hat\gamma,\bar y_{iw})\simeq-\left(\frac{\partial q_i}{\partial\gamma}\right)^{\mathsf T}(\hat\gamma-\gamma)-\frac12(\hat\gamma-\gamma)^{\mathsf T}\frac{\partial^2q_i}{\partial\gamma\,\partial\gamma^{\mathsf T}}(\hat\gamma-\gamma).\qquad(\mathrm{A}·9)$$
Now approximating $\hat\gamma-\gamma$ by $U_k^{-1}S_k$ as was done earlier, we have
$$E(m_i-\tilde m_i)(\tilde m_i-\hat m_i)\simeq T_{31i}-T_{32i},\qquad(\mathrm{A}·10)$$
where
$$T_{31i}=-E\left[\{r_i(\bar y_i-\mu_i)-r_{iw}(\bar y_{iw}-\mu_i)\}\left(\frac{\partial q_i}{\partial\gamma}\right)^{\mathsf T}U_k^{-1}S_k(\gamma)\right],\qquad(\mathrm{A}·11)$$
$$T_{32i}=\frac12\,\mathrm{tr}\,E\left[\{r_i(\bar y_i-\mu_i)-r_{iw}(\bar y_{iw}-\mu_i)\}\frac{\partial^2q_i}{\partial\gamma\,\partial\gamma^{\mathsf T}}U_k^{-1}S_k(\gamma)S_k^{\mathsf T}(\gamma)U_k^{-1}\right].\qquad(\mathrm{A}·12)$$
We begin with the calculation
$$\frac{\partial^2q_i}{\partial\gamma\,\partial\gamma^{\mathsf T}}=\begin{pmatrix}\lambda d_i(1+\lambda d_i)^{-1}V'(\mu_i)V(\mu_i)x_ix_i^{\mathsf T} & d_i(1+\lambda d_i)^{-2}V(\mu_i)x_i\\ d_i(1+\lambda d_i)^{-2}V(\mu_i)x_i^{\mathsf T} & 2d_i^2(1+\lambda d_i)^{-3}(\bar y_{iw}-\mu_i)\end{pmatrix}.\qquad(\mathrm{A}·13)$$
As in the calculation of (A·6), from (A·12) and (A·13), $T_{32i}=O(k^{-2})$. Finally, we calculate
$$T_{31i}=T_{311i}-T_{312i},\qquad(\mathrm{A}·14)$$
where
$$T_{311i}=\mathrm{tr}\left[E\left\{S_k(\gamma)r_{iw}(\bar y_{iw}-\mu_i)\left(\frac{\partial q_i}{\partial\gamma}\right)^{\mathsf T}\right\}U_k^{-1}\right],\qquad(\mathrm{A}·15)$$
$$T_{312i}=\mathrm{tr}\left[E\left\{S_k(\gamma)r_i(\bar y_i-\mu_i)\left(\frac{\partial q_i}{\partial\gamma}\right)^{\mathsf T}\right\}U_k^{-1}\right].\qquad(\mathrm{A}·16)$$
We now evaluate $T_{311i}$ and $T_{312i}$ separately.
First we have
$$E\left\{S_k(\gamma)r_{iw}(\bar y_{iw}-\mu_i)\left(\frac{\partial q_i}{\partial\gamma}\right)^{\mathsf T}\right\}
=E\left\{(D_i^{\mathsf T}S_i^{-1}g_i)(\bar y_{iw}-\mu_i)\left(\frac{\partial q_i}{\partial\gamma}\right)^{\mathsf T}\right\}$$
$$=D_i^{\mathsf T}S_i^{-1}E\left[\begin{pmatrix}\bar y_{iw}-\mu_i\\ (\bar y_{iw}-\mu_i)^2-\omega_iV(\mu_i)\end{pmatrix}(\bar y_{iw}-\mu_i)\left\{\frac{\lambda d_i}{1+\lambda d_i}V(\mu_i)x_i^{\mathsf T},\ -\frac{d_i}{(1+\lambda d_i)^2}(\bar y_{iw}-\mu_i)\right\}\right]$$
$$=D_i^{\mathsf T}S_i^{-1}\begin{pmatrix}\lambda d_i(1+\lambda d_i)^{-1}V(\mu_i)\mu_{2i}x_i^{\mathsf T} & -d_i(1+\lambda d_i)^{-2}\mu_{3i}\\ \lambda d_i(1+\lambda d_i)^{-1}V(\mu_i)\mu_{3i}x_i^{\mathsf T} & -d_i(1+\lambda d_i)^{-2}(\mu_{4i}-\mu_{2i}^2)\end{pmatrix}$$
$$=D_i^{\mathsf T}S_i^{-1}\begin{pmatrix}\mu_{2i} & \mu_{3i}\\ \mu_{3i} & \mu_{4i}-\mu_{2i}^2\end{pmatrix}\begin{pmatrix}\lambda d_i(1+\lambda d_i)^{-1}V(\mu_i)x_i^{\mathsf T} & 0\\ 0 & -d_i(1+\lambda d_i)^{-2}\end{pmatrix}
=D_i^{\mathsf T}\begin{pmatrix}\lambda d_i(1+\lambda d_i)^{-1}V(\mu_i)x_i^{\mathsf T} & 0\\ 0 & -d_i(1+\lambda d_i)^{-2}\end{pmatrix}$$
$$=\begin{pmatrix}\lambda d_i(1+\lambda d_i)^{-1}V^2(\mu_i)x_ix_i^{\mathsf T} & -V(\mu_i)V'(\mu_i)d_i(1+\lambda d_i)^{-1}(\lambda-v_2)^{-1}x_i\\ 0 & -V(\mu_i)d_i(1+\lambda d_i)^{-2}(\lambda-v_2)^{-2}(1+v_2d_i)\end{pmatrix}.$$
Hence,
$$T_{311i}=d_i(1+\lambda d_i)^{-1}V(\mu_i)\,\mathrm{tr}\left\{\begin{pmatrix}\lambda V(\mu_i)x_ix_i^{\mathsf T} & -V'(\mu_i)(\lambda-v_2)^{-1}x_i\\ 0 & -(1+\lambda d_i)^{-1}(1+v_2d_i)(\lambda-v_2)^{-2}\end{pmatrix}U_k^{-1}\right\},\qquad(\mathrm{A}·17)$$
which is $O(k^{-1})$. Also,
$$T_{312i}=\mathrm{tr}\left[E\left\{D_i^{\mathsf T}S_i^{-1}g_i(\bar y_i-\mu_i)\left(\frac{\partial q_i}{\partial\gamma}\right)^{\mathsf T}\right\}U_k^{-1}\right]=\mathrm{tr}\left[D_i^{\mathsf T}S_i^{-1}E\left\{g_i(\bar y_i-\mu_i)\left(\frac{\partial q_i}{\partial\gamma}\right)^{\mathsf T}\right\}U_k^{-1}\right],\qquad(\mathrm{A}·18)$$
which is $O(k^{-1})$. Furthermore,
$$E\left\{g_i(\bar y_i-\mu_i)\left(\frac{\partial q_i}{\partial\gamma}\right)^{\mathsf T}\right\}=\begin{pmatrix}\lambda d_i(1+\lambda d_i)^{-1}V(\mu_i)f_{i11}x_i^{\mathsf T} & -d_i(1+\lambda d_i)^{-2}f_{i12}\\ \lambda d_i(1+\lambda d_i)^{-1}V(\mu_i)f_{i12}x_i^{\mathsf T} & -d_i(1+\lambda d_i)^{-2}f_{i22}\end{pmatrix},\qquad(\mathrm{A}·19)$$
where
$$f_{i11}=E\{(\bar y_{iw}-\mu_i)(\bar y_i-\mu_i)\}=V(\mu_i)(\lambda-v_2)^{-1}(1+\lambda n_i^{-1}),\qquad(\mathrm{A}·20)$$
$$f_{i12}=E\{(\bar y_{iw}-\mu_i)^2(\bar y_i-\mu_i)\}=\frac{(\lambda+n_i)(\lambda d_i+2)V(\mu_i)V'(\mu_i)}{n_i(\lambda-v_2)(\lambda-2v_2)},\qquad(\mathrm{A}·21)$$
and, after much algebraic simplification,
$$f_{i22}=E\{(\bar y_{iw}-\mu_i)^3(\bar y_i-\mu_i)\}-\omega_iV(\mu_i)f_{i11}$$
$$=\frac1{n_i}\{df_i+3(d_i+2v_2f_i)V(\mu_i)\}V(\mu_i)
+\left\{\frac{dv_2f_i}{n_i}+\frac6{n_i}(d_i+v_2f_i)+f_i[2v_2V(\mu_i)+\{V'(\mu_i)\}^2]+3(n_i^{-1}+d_i)V(\mu_i)\right\}E(m_i-\mu_i)^2$$
$$+3\left\{\frac1{n_i}(5v_2d_i+4v_2^2f_i+1)+v_2f_i+d_i\right\}V'(\mu_i)E(m_i-\mu_i)^3
+\left\{\frac1{n_i}(9v_2^2d_i+6v_2^3f_i+3v_2)+2v_2^2f_i+3v_2d_i+1\right\}E(m_i-\mu_i)^4$$
$$-(1+\lambda d_i)V^2(\mu_i)(\lambda-v_2)^{-2}(1+\lambda n_i^{-1}).\qquad(\mathrm{A}·22)$$
Combining (A·1), (A·3), (A·6), (A·10)–(A·12) and (A·14)–(A·22), we obtain the result.

Approximation of $E(\hat\gamma-\gamma)$. Let $U_k^{-1}=(U_k^{rs})$. We first need some notation associated with $D_i^{\mathsf T}S_i^{-1}$. Let
$$e_{1i}=\Delta_i^{-1}\{\mu_{4i}-\mu_{2i}^2-\mu_{3i}\omega_iV'(\mu_i)\},\qquad e_{2i}=\Delta_i^{-1}\{\mu_{2i}\omega_iV'(\mu_i)-\mu_{3i}\},$$
$$e_{3i}=-\Delta_i^{-1}\mu_{3i}V(\mu_i),\qquad e_{4i}=\Delta_i^{-1}\mu_{2i}V(\mu_i),$$
where we recall that $V(\mu_i)=v_0+v_1\mu_i+v_2\mu_i^2$ and $\omega_i=(1+\lambda d_i)(\lambda-v_2)^{-1}$. We now write $\gamma=(\gamma_1,\ldots,\gamma_{p+1})^{\mathsf T}$ and $S_k(\gamma)\equiv S_k=(s_{k1},\ldots,s_{kp},s_{k,p+1})^{\mathsf T}$, where
$$s_{kr}=\sum_{i=1}^k(e_{1i}g_{1i}+e_{2i}g_{2i})V(\mu_i)x_{ir},\qquad s_{k,p+1}=-\sum_{i=1}^k(e_{3i}g_{1i}+e_{4i}g_{2i})\frac{\partial\omega_i}{\partial\lambda}.$$
We will also find it convenient to write $\lambda=\beta_{p+1}$. Then, following Cox & Snell (1968), let
$$J_{t,rs}=\mathrm{cov}\left(\frac{\partial s_{kr}}{\partial\beta_s},\,s_{kt}\right),\qquad K_{t,rs}=E\left(\frac{\partial^2s_{kr}}{\partial\beta_s\,\partial\beta_t}\right)\qquad(r,s,t=1,\ldots,p+1).$$
Now writing $J_{k(r)}=(J_{t,rs})$, $K_{k(r)}=(K_{t,rs})$,
$$a_k^{\mathsf T}=\{\mathrm{tr}(U_k^{-1}J_{k(1)}),\ldots,\mathrm{tr}(U_k^{-1}J_{k(p+1)})\},\qquad b_k^{\mathsf T}=\{\mathrm{tr}(U_k^{-1}K_{k(1)}),\ldots,\mathrm{tr}(U_k^{-1}K_{k(p+1)})\},$$
by the Cox–Snell formula, we have
$$E(\hat\gamma-\gamma)=U_k^{-1}(a_k+\tfrac12b_k)=z(\gamma),\qquad(\mathrm{A}·23)$$
say. Finding the elements of $z(\gamma)$ requires heavy algebra. The details are omitted here, and can be obtained from the authors. We note also that $z(\gamma)=O(k^{-1})$ since $U_k^{-1}$ is $O(k^{-1})$, and its multiplier is $O(1)$.
REFERENCES

Cox, D. R. & Snell, E. J. (1968). A general definition of residuals (with Discussion). J. R. Statist. Soc. B 30, 248–75.
Fay, R. E. & Herriot, R. A. (1979). Estimates of income for small places: an application of James–Stein procedures to census data. J. Am. Statist. Assoc. 74, 269–77.
Godambe, V. P. & Thompson, M. E. (1989). An extension of quasi-likelihood estimation (with Discussion). J. Statist. Plan. Infer. 22, 137–52.
Kott, P. (1989). Robust small domain estimation using random effects modeling. Survey Methodol. 15, 3–12.
McCullagh, P. & Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. London: Chapman and Hall.
Morris, C. N. (1982). Natural exponential families with quadratic variance functions. Ann. Statist. 10, 65–80.
Morris, C. N. (1983). Natural exponential families with quadratic variance functions: statistical theory. Ann. Statist. 11, 515–29.
Nelder, J. A. & Pregibon, D. (1987). An extended quasi-likelihood function. Biometrika 74, 221–32.
Pfeffermann, D. (1993). The role of sampling weights when modeling survey data. Int. Statist. Rev. 61, 317–37.
Pfeffermann, D., Krieger, A. M. & Rinott, Y. (1998). Parametric distributions of complex survey data under informative probability sampling. Statist. Sinica 8, 1087–114.
Pfeffermann, D. & Sverchkov, M. (1999). Parametric and semiparametric estimation of regression models fitted to survey data. Sankhyā B 61, 166–86.
Pfeffermann, D., Skinner, C. J., Holmes, D. J., Goldstein, H. & Rasbash, J. (1998). Weighting for unequal selection probabilities in multi-level models (with Discussion). J. R. Statist. Soc. B 60, 23–40.
Prasad, N. G. N. & Rao, J. N. K. (1990). The estimation of the mean squared error of small-area estimators. J. Am. Statist. Assoc. 85, 163–71.
Prasad, N. G. N. & Rao, J. N. K. (1999). On robust small area estimation using a simple random effects model. Survey Methodol. 25, 67–72.
Raghunathan, T. E. (1993). A quasi-empirical Bayes method for small area estimation. J. Am. Statist. Assoc. 88, 1444–8.
Roberts, G., Rao, J. N. K. & Kumar, S. (1987). Logistic regression analysis of sample survey data. Biometrika 74, 1–12.
Shao, J. (1999). Mathematical Statistics. New York: Springer.
Skinner, C. J., Holt, D. & Smith, T. M. F. (Eds.) (1989). Analysis of Complex Surveys. New York: Wiley.

[Received January 2002. Revised June 2003]