Quasi-Optimal Estimation in Survey Sampling
Stima quasi ottimale nelle inferenze campionarie

Giorgio E. Montanari
Department of Statistical Sciences - University of Perugia
Via A. Pascoli - 06100 Perugia - Italy - E-mail: [email protected]

Summary: This paper proposes an extended class of generalised regression estimators that includes the optimal estimator as a special case. Some results are then presented showing that the latter is equivalent to a generalised regression estimator based on a linear regression model whose explanatory variables include the variables balanced with respect to the sampling design in use, that is, those whose means are estimated without error by the usual unbiased estimators. Through a careful choice of the balanced variables it is then possible to construct a quasi-optimal estimator that combines the asymptotic efficiency of the optimal estimator with the greater stability of the generalised regression estimator.

Keywords: generalised regression estimator, superpopulation model, optimal estimator.

1. Introduction

Regression estimation is a powerful technique for estimating finite population means or totals of survey variables when the population means or totals of a set of auxiliary variables are known. Two types of regression estimators have recently been studied in the literature, namely the Generalised Regression Estimator (GRE) and the Optimal Estimator (OPE). Until now, they have been studied separately (for a short review see Montanari, 1998). In this paper we explore the connections between the two types of regression estimators and establish conditions under which a GRE is exactly or asymptotically equivalent to the OPE. Then we propose an estimation strategy that merges the large sample efficiency of the OPE with the greater stability of the GRE for samples of moderate size.

2. The framework

Consider a finite population $U = \{u_1, u_2, \ldots, u_N\}$, where the i-th unit is represented by its label i. Let $Y_i$ and $x_i = (X_{1i}, X_{2i}, \ldots, X_{qi})'$ be the values associated with unit i of the survey variable y and of a q-dimensional auxiliary variable x. The population mean vector $\bar{x} = \sum_{i=1}^{N} x_i / N$ of x is assumed known, e.g. from administrative registers or census data. The unknown mean of y, $\bar{Y} = \sum_{i=1}^{N} Y_i / N$, has to be estimated by means of a sample s drawn from U according to a probabilistic sampling design, taking into account the knowledge of $\bar{x}$. Let $\hat{Y} = \sum_{i \in s} Y_i / (N\pi_i)$ and $\hat{x} = \sum_{i \in s} x_i / (N\pi_i)$ be the design-unbiased Horvitz-Thompson estimators of $\bar{Y}$ and $\bar{x}$, respectively, where $\pi_i$ (i = 1,…,N) is the first order inclusion probability of unit i.

The most common way of taking into account the knowledge of the auxiliary variable population means is to adopt the regression estimator $\hat{Y}_R = \hat{Y} + \hat{\beta}'(\bar{x} - \hat{x})$, where $\hat{\beta}$ is a vector of regression coefficients given by some function of the sample data $\{(Y_i, x_i);\ i \in s\}$. The class of regression estimators contains well known estimators, such as ratio and product estimators, even with linearly transformed auxiliary variables, post-stratified estimators and regression estimators. So, the main issue the statistician has to deal with is the definition of $\hat{\beta}$.

The OPE is obtained from the difference estimator $\tilde{Y}_O = \hat{Y} + B'(\bar{x} - \hat{x})$, where B is a vector of constants. The latter is an unbiased estimator of $\bar{Y}$. Assembling the values of y and x into an N-vector Y and an $N \times q$ matrix X having $x_i'$ on its i-th row, the sampling variance of $\tilde{Y}_O$ is minimised by taking $B = (X'WX)^{-1}X'WY$ (Montanari, 1987), where W is the $N \times N$ matrix whose ij-th entry is $(\pi_{ij} - \pi_i\pi_j)/(N^2\pi_i\pi_j)$, and $\pi_{ij}$ is the second order inclusion probability of units i and j (i, j = 1,…,N; $\pi_{ii} = \pi_i$). When $X'WX$ is singular and its rank is $q^* < q$, to define B one or more entries of x, hence of $\bar{x}$, have to be dropped so as to obtain a $q^* \times q^*$ non-singular variance matrix $X'WX$. The optimum value of B can be estimated in many ways.
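The variance-minimising property of B can be checked numerically on a small case. The sketch below is an illustration only, not the author's code: it assumes simple random sampling without replacement, a single auxiliary variable (q = 1, so B is a scalar) and an invented five-unit population; the names `quad`, `ht_mean` and `design_moments` are hypothetical helpers. It enumerates every possible sample to obtain exact design moments.

```python
from itertools import combinations

# Hypothetical toy population (values invented for illustration only).
Y = [3.0, 5.0, 4.0, 9.0, 7.0]
X = [1.0, 2.0, 2.0, 4.0, 3.0]
N, n = len(Y), 3

# Inclusion probabilities under simple random sampling without replacement.
pi1 = n / N
pi2 = n * (n - 1) / (N * (N - 1))

def w(i, j):
    """ij-th entry of W: (pi_ij - pi_i pi_j) / (N^2 pi_i pi_j), with pi_ii = pi_i."""
    pij = pi1 if i == j else pi2
    return (pij - pi1 * pi1) / (N ** 2 * pi1 * pi1)

def quad(a, b):
    """Bilinear form a' W b."""
    return sum(a[i] * w(i, j) * b[j] for i in range(N) for j in range(N))

def ht_mean(values, s):
    """Horvitz-Thompson estimator of a population mean from sample s."""
    return sum(values[i] / (N * pi1) for i in s)

x_bar = sum(X) / N
y_bar = sum(Y) / N
B = quad(X, Y) / quad(X, X)  # (X'WX)^{-1} X'WY for q = 1

# Every sample of size n is equally likely under this design.
samples = list(combinations(range(N), n))

def design_moments(b):
    """Exact design expectation and variance of Y_hat + b (x_bar - x_hat)."""
    est = [ht_mean(Y, s) + b * (x_bar - ht_mean(X, s)) for s in samples]
    mean = sum(est) / len(est)
    var = sum((e - mean) ** 2 for e in est) / len(est)
    return mean, var

mean_opt, var_opt = design_moments(B)
_, var_ht = design_moments(0.0)  # b = 0 gives the plain Horvitz-Thompson estimator
```

Under these assumptions `var_ht` should agree with the quadratic form `quad(Y, Y)`, and `var_opt` with `quad(Y, Y) - quad(X, Y) ** 2 / quad(X, X)`, which is never larger than `var_ht`.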
For our purposes, if we take Horvitz-Thompson estimators of variances and covariances of unbiased estimators, under mild conditions on the sampling design, a consistent estimator of B is given by $\hat{B} = (X_s'W_{ss}X_s)^{-1}X_s'W_{ss}Y_s$, where $W_{ss} = \{(\pi_{ij} - \pi_i\pi_j)/(N^2\pi_i\pi_j\pi_{ij})\}_{i,j \in s}$; $X_s = \{x_i'\}_{i \in s}$; $Y_s = \{Y_i\}_{i \in s}$. Then, replacing B by $\hat{B}$ in $\tilde{Y}_O$, we get $\hat{Y}_O = \hat{Y} + \hat{B}'(\bar{x} - \hat{x})$, i.e. the OPE. In large samples this estimator shares the properties of $\tilde{Y}_O$, the latter being the first order Taylor linear approximation of the former. So, asymptotically, i.e. up to terms of the first order, $\hat{Y}_O$ is design unbiased and has the minimum variance among all regression estimators based on the same auxiliary information $\bar{x}$. The main drawback of $\hat{Y}_O$ is that it is complex to compute and may be unstable in finite size samples (Casady and Valliant, 1993). However, if an adequate number of degrees of freedom is available for estimating B, the problem can be overcome.

A GRE is based on a "working superpopulation linear regression model" relating the survey variable to the auxiliary variables whose population means are known. Consider the model $E_m(Y) = X\beta$ and $V_m(Y) = \sigma^2\Sigma$, where $\Sigma = \mathrm{diag}\{v_i\}_{i=1,\ldots,N}$ is a known matrix. Note that $E_m$, $V_m$ and $C_m$ denote the expected value, variance and covariance with respect to the model. Let $\hat{\beta}_N = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}Y$ be the census weighted least-squares regression estimator of $\beta$. Then, replacing $\hat{\beta}_N$ by the consistent estimator $\hat{\beta}_n = (X_s'\Sigma_{ss}^{-1}X_s)^{-1}X_s'\Sigma_{ss}^{-1}Y_s$, where $\Sigma_{ss}^{-1} = \mathrm{diag}\{1/(v_i\pi_i)\}_{i \in s}$, the corresponding GRE is defined to be $\hat{Y}_G = \hat{Y} + \hat{\beta}_n'(\bar{x} - \hat{x})$. The large sample properties of $\hat{Y}_G$ can be established by means of its first order Taylor linear approximation $\tilde{Y}_G = \hat{Y} + \hat{\beta}_N'(\bar{x} - \hat{x})$ (Särndal, Swensson and Wretman, 1992; p. 235). In particular, $\hat{Y}_G$ is asymptotically design unbiased, and when the working model holds true it has the minimum expected asymptotic design variance with respect to the model, i.e. for any other design unbiased or approximately design unbiased estimator $\hat{Y}^*$ of $\bar{Y}$, $E_mV(\tilde{Y}_G) \le E_mV(\hat{Y}^*)$, for all $\beta$. If, on the contrary, the model is wrongly specified, the asymptotic variance of $\hat{Y}_G$ may be appreciably higher than that of the $\hat{Y}_O$ based on the same auxiliary information $\bar{x}$.

3. An extended class of GREs

To explore the connections between GREs and the OPE, let us enlarge the GRE class to allow a non-diagonal model variance matrix. Consider the model $E_m(Y) = X\beta$ and $V_m(Y) = \sigma^2\Sigma$, where $\Sigma$ is now a positive definite symmetric $N \times N$ matrix. Let $\Sigma_{ss}^{-1}$ be the symmetric matrix that has $v^{ij}/\pi_{ij}$ as ij-th entry, where $i, j \in s$ and $v^{ij}$ is the ij-th entry of $\Sigma^{-1}$. Then, provided that the matrix $(X_s'\Sigma_{ss}^{-1}X_s)^{-1}$ exists for all samples s, the corresponding Extended GRE (EGRE) can be written as $\hat{Y}_{EG} = \hat{Y} + \hat{\beta}_n'(\bar{x} - \hat{x})$, where $\hat{\beta}_n = (X_s'\Sigma_{ss}^{-1}X_s)^{-1}X_s'\Sigma_{ss}^{-1}Y_s$. Observe that the entries of $X_s'\Sigma_{ss}^{-1}X_s$ and $X_s'\Sigma_{ss}^{-1}Y_s$ are design unbiased estimators of the corresponding entries of $X'\Sigma^{-1}X$ and $X'\Sigma^{-1}Y$, respectively, provided that $v^{ij} \neq 0$ implies $\pi_{ij} > 0$. So, under mild conditions on the second order inclusion probabilities, $\hat{\beta}_n$ converges in probability to $\hat{\beta}_N = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}Y$, i.e. the census weighted least-squares regression estimator of $\beta$, and the first order Taylor approximation of $\hat{Y}_{EG}$ is $\tilde{Y}_{EG} = \hat{Y} + \hat{\beta}_N'(\bar{x} - \hat{x})$. Obviously, when $\Sigma$ is a diagonal matrix, the EGRE reduces to a GRE.

Note that when W is non-singular, the OPE belongs to the EGRE class defined above, setting $\Sigma = W^{-1}$. However, W is generally only non-negative definite, being singular for many sampling designs. Even in such a case there are EGREs that are asymptotically equivalent to the OPE, as we show next.

Let us call Design Balanced Variable (DBV) any non-null auxiliary variable z whose mean $\bar{Z}$ is estimated without error by the Horvitz-Thompson estimator $\hat{Z} = \sum_{i \in s} Z_i/(N\pi_i)$, i.e. $\hat{Z} - \bar{Z} = 0$ for all s. For this reason, the DBVs are normally not inserted into a regression estimator.
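The DBV definition can be checked by brute force on a small design. The following Python sketch is an illustration under stated assumptions, not part of the paper: it assumes stratified simple random sampling with two hypothetical strata (stratum sizes and sample sizes are invented) and enumerates every possible sample; `ht_mean` and `is_dbv` are hypothetical helper names.

```python
from itertools import combinations, product

# Hypothetical stratified design: unit labels per stratum and sample sizes.
strata = [[0, 1, 2], [3, 4, 5, 6]]
n_h = [2, 2]
N = sum(len(units) for units in strata)

# First order inclusion probabilities: n_h / N_h within stratum h.
pi = {}
for units, n in zip(strata, n_h):
    for i in units:
        pi[i] = n / len(units)

def all_samples():
    """Yield every sample that the stratified design can select."""
    per_stratum = [combinations(units, n) for units, n in zip(strata, n_h)]
    for parts in product(*per_stratum):
        yield tuple(i for part in parts for i in part)

def ht_mean(values, s):
    """Horvitz-Thompson estimator of the population mean of `values`."""
    return sum(values[i] / (N * pi[i]) for i in s)

def is_dbv(values, tol=1e-12):
    """True if `values` is non-null and its mean is estimated without error for all s."""
    mean = sum(values) / N
    return any(values) and all(
        abs(ht_mean(values, s) - mean) < tol for s in all_samples()
    )

stratum_1 = [1.0 if i in strata[0] else 0.0 for i in range(N)]
stratum_2 = [1.0 if i in strata[1] else 0.0 for i in range(N)]
constant = [1.0] * N
size_like = [float(i + 1) for i in range(N)]  # an ordinary, non-balanced variable

print(is_dbv(stratum_1), is_dbv(stratum_2), is_dbv(constant), is_dbv(size_like))
```

In this assumed design the two stratum indicators and their sum (the constant) satisfy the definition, while a variable that varies within strata does not.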
However, assembling the population values $Z_i$ into the N-vector Z, the following theorems can be proved.

Theorem 1. A variable z is a DBV if and only if Z belongs to the subspace orthogonal to that spanned by the columns of W.

It follows that the subspace spanned by the DBVs has dimension $t = N - r(W)$, where $r(\cdot)$ denotes the rank of a matrix. So, there are at most t linearly independent DBVs. Now, let Z be an $N \times t$ matrix containing the population values of t linearly independent DBVs and assume that X does not contain any DBV.

Theorem 2. Assume the model $E_m(Y) = (Z\ X)\beta$ and $V_m(Y) = \sigma^2\Sigma$. If $\Sigma$ is chosen so that there exists a scalar $\lambda$ such that $\Sigma^{-1} - \Sigma^{-1}Z(Z'\Sigma^{-1}Z)^{-1}Z'\Sigma^{-1} = \lambda W$, then $\tilde{Y}_{EG} = \tilde{Y}_O$, i.e. the EGRE is asymptotically equal to the OPE based on the same auxiliary information $\bar{x}$.

Theorem 2 provides a sufficient condition for the asymptotic equality between an EGRE and the OPE that uses the same auxiliary variable x: besides the auxiliary variables that are not DBVs, the working model should include t linearly independent DBVs, and the variance matrix $\sigma^2\Sigma$ should be set so that $\Sigma^{-1} - \Sigma^{-1}Z(Z'\Sigma^{-1}Z)^{-1}Z'\Sigma^{-1}$ is proportional to W. In this way, it is possible to get an asymptotic minimum variance estimator, irrespective of the goodness of the working model, given the amount of auxiliary information $\bar{x}$. Technically, the information provided by Z and $\Sigma$ is summarised by W, hence by the sampling design. The next theorem assures the finite size sample identity between an EGRE and the OPE.

Theorem 3. If $\Sigma^{-1} - \Sigma^{-1}Z(Z'\Sigma^{-1}Z)^{-1}Z'\Sigma^{-1} \propto W$ implies $\Sigma_{ss}^{-1} - \Sigma_{ss}^{-1}Z_s(Z_s'\Sigma_{ss}^{-1}Z_s)^{-1}Z_s'\Sigma_{ss}^{-1} \propto W_{ss}$, then $\hat{Y}_{EG} = \hat{Y}_O$.

4. Conclusion

Generally speaking, the OPE is approximately equal to an EGRE based on a working model that includes the effect of any existing DBVs. Unfortunately, Theorem 2 does not provide guidelines for determining the structure of the matrix $\Sigma$ and the DBVs that correspond to the sampling design in use. However, for common designs, easy solutions can be found.
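For one common design, stratified simple random sampling, a candidate solution can be verified numerically. The sketch below is an illustration and not the author's derivation: it assumes two hypothetical strata, takes the stratum indicators as the DBVs in Z, and assumes a block-constant diagonal $\Sigma^{-1}$ with stratum value `c(h)`, a choice made here for the example. It then checks the orthogonality stated in Theorem 1 and the proportionality condition of Theorem 2 with the scalar equal to 1.

```python
# Hypothetical stratified simple random sampling design.
strata = [[0, 1, 2], [3, 4, 5, 6]]
n_h = [2, 2]
H = len(strata)
N = sum(len(units) for units in strata)
stratum_of = {i: h for h, units in enumerate(strata) for i in units}

def pi1(i):
    """First order inclusion probability n_h / N_h."""
    h = stratum_of[i]
    return n_h[h] / len(strata[h])

def pi2(i, j):
    """Second order inclusion probability, with pi_ii = pi_i."""
    if i == j:
        return pi1(i)
    hi, hj = stratum_of[i], stratum_of[j]
    if hi != hj:
        return pi1(i) * pi1(j)  # strata are sampled independently
    Nh, nh = len(strata[hi]), n_h[hi]
    return nh * (nh - 1) / (Nh * (Nh - 1))

# W as defined in Section 2.
W = [[(pi2(i, j) - pi1(i) * pi1(j)) / (N ** 2 * pi1(i) * pi1(j))
      for j in range(N)] for i in range(N)]

# Z: one stratum-indicator column per stratum (the assumed DBVs).
Z = [[1.0 if stratum_of[i] == h else 0.0 for h in range(H)] for i in range(N)]

# Theorem 1: every column of Z should be orthogonal to the columns of W.
WZ = [[sum(W[i][k] * Z[k][h] for k in range(N)) for h in range(H)]
      for i in range(N)]
max_wz = max(abs(v) for row in WZ for v in row)

def c(h):
    """Assumed diagonal entry of Sigma^{-1} for the units of stratum h."""
    Nh, nh = len(strata[h]), n_h[h]
    return Nh * (Nh - nh) / (N ** 2 * nh * (Nh - 1))

def lhs(i, j):
    """ij-th entry of Sigma^{-1} - Sigma^{-1} Z (Z' Sigma^{-1} Z)^{-1} Z' Sigma^{-1}.

    The indicator columns are disjoint, so Z' Sigma^{-1} Z is diagonal
    with entries c(h) * N_h and the projection term is c(h) / N_h within strata.
    """
    hi, hj = stratum_of[i], stratum_of[j]
    diag = c(hi) if i == j else 0.0
    proj = c(hi) / len(strata[hi]) if hi == hj else 0.0
    return diag - proj

max_gap = max(abs(lhs(i, j) - W[i][j]) for i in range(N) for j in range(N))
print(max_wz, max_gap)  # both should be zero up to rounding error
```

Under these assumptions the variance matrix depends only on the stratum sizes and sample sizes, that is, on the first and second order inclusion probabilities of the design.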
So, given an amount of auxiliary information $\bar{x}$, the OPE is approximately or exactly equal to an EGRE based on a working model which includes the maximum number of linearly independent DBVs and assumes a variance matrix that reflects the structure of the first and second order inclusion probabilities. Hence the OPE allows a better fit of the data, and this explains its asymptotic superiority. However, in finite size samples, besides incidental collinearities among regressors, the OPE is exposed to instabilities due to a likely inadequate number of residual degrees of freedom available for fitting the model. In particular, this concern may be relevant in stratified designs with a few observations per stratum, or in multistage sampling designs with a few PSUs per stratum.

The above analysis suggests the following quasi-optimal estimation strategy. When a reliable model is lacking, use a GRE based on a working model with a variance matrix set according to Theorem 2 and with a suitably reduced number of DBVs, so as to leave a sufficient number of degrees of freedom to fit the model. In particular, in the case of stratified samples, this may be accomplished by introducing into the model DBVs corresponding to superstrata obtained by collapsing the original strata. Collapsing should be performed so that, within each superstratum, strata effects can be considered negligible. For each superstratum, a DBV is obtained by adding the DBVs of the collapsed strata. By reducing the number of DBVs inserted into the model, we accept a lower level of asymptotic efficiency in exchange for better control of the finite size sampling variance of the estimator. This strategy has been implemented in a simulation study, and the resulting estimator was on average the most efficient across populations generated under a variety of superpopulation models. Results are available from the author.

References

Casady R.J., Valliant R. (1993) Conditional properties of post-stratified estimators under normal theory, Survey Methodology, 19, 183-192.
Montanari G.E. (1987) Post-sampling efficient QR-prediction in large-scale surveys, International Statistical Review, 55, 191-202.
Montanari G.E. (1998) On regression estimation of finite population mean, Survey Methodology, 24, 69-77.
Särndal C.E., Swensson B., Wretman J.H. (1992) Model assisted survey sampling, Springer-Verlag, New York.