Sample Correlation Coef’ cients Based on Survey Data Under Regression Imputation Jun Shao and Hansheng Wang Regression imputation is commonly used to compensate for item nonresponse when auxiliary data are available. It is common practice to compute survey estimators by treating imputed values as observed data and using the standard unbiased (or nearly unbiased) estimation formulas designed for the case of no nonresponse. Although the commonly used regression imputation method preserves unbiasedness for population marginal totals (i.e., survey estimators computed from imputed data are still nearly unbiased), it does not preserve unbiasedness for population correlation coef cients. A joint regression imputation method is proposed that preserves unbiasedness for marginal totals, second moments, and correlation coef cients. Some simulation results show that the joint regression imputation method produces not only sample correlation coef cients that are nearly unbiased, but also estimates that are more stable than those produced by marginal nonrandom regression imputation when correlation coef cients are in a certain range. Variance estimation for sample correlation coef cients under joint regression imputation is also studied, using a jackknife method that takes imputation into account. KEY WORDS: Jackknife; Joint imputation; Marginal imputation; Random imputation; Variance estimation. 1. INTRODUCTION Most surveys have nonresponse. Item nonresponse occurs when a sampled unit fails to provide information on some items (variables) in the survey. Imputation is commonly applied to compensate for item nonresponse. In addition to some practical reasons for imputation (Kalton and Kasprzyk 1986), imputation using auxiliary data may produce survey estimators that are more ef cient than the one obtained by ignoring nonrespondents and reweighting. It is a common practice to compute survey estimators by treating imputed values as observed data and using the standard unbiased (or nearly unbiased) estimation formulas designed for the case of no nonresponse (e.g., standard survey estimators for population item totals and correlation coef cients among items are sample weighted totals and sample correlation coef cients). Therefore, we should select an imputation method that preserves the unbiasedness in the sense that the survey (point) estimators computed from imputed data using standard formulas are still unbiased or nearly unbiased. Throughout this article, an imputation method is called unbiased for estimating a given set of parameters if it preserves the unbiasedness. As we discuss later, the unbiasedness of an imputation method often depends on the type of parameters to be estimated. For item nonresponse, imputing the whole vector of items whenever a sampled unit has at least one missing item is rarely used, because it discards many observed data. Marginal imputation, also called item imputation, is often used in practice; that is, items in a survey are imputed separately and the relationship among items is ignored. Marginal imputation is often unbiased for estimating functions of marginal population item totals (or means). For parameters measuring relationship among items such as the population correlation coef cients among items, marginal imputation may not be unbiased, because the relationship is not preserved during marginal imputation. The main purpose of this article is to study (a) the unbiasedness of a popular imputation method, the regression imputation method (see, e.g., Deville and Särndal 1994), for estimating not only marginal totals, but also correlation coef cients among items, and (b) the variance estimation problem for an unbiased regression imputation method. The latter is an important issue because it is well known that, even for an imputation method that produces unbiased or nearly unbiased point estimators, treating imputed values as observed data and applying standard variance estimation formulas designed for the case of no nonresponse may produce seriously biased variance estimators. In Section 2 we introduce the commonly used marginal regression imputation method and a joint regression imputation method, which is an extension of that of Srivastava and Carter (1986). The joint regression imputation method is shown to be unbiased for estimating marginal totals as well as correlation coef cients. We devote Section 3 to the variance estimation problem for sample correlation coef cients under joint regression imputation. We propose a jackknife method that takes imputation into account and produces asymptotically unbiased and consistent variance estimators. We present some simulation results in Section 4 to examine the nite-sample performance of joint regression imputation and the jackknife variance estimator. Simulation results show that the joint regression imputation method produces sample correlation coef cients that are not only nearly unbiased, but also more stable than those produced by marginal nonrandom regression imputation when correlation coef cients are in a certain range. Simulation results also show that the proposed jackknife variance estimator performs well in terms of the bias in variance estimation and the coverage probability of con dence intervals using the jackknife variance estimators. 2. REGRESSION IMPUTATION Let ° be a nite population containing M units and let ³ be a sample from ° obtained according to the following commonly used strati ed multistage sampling plan. The population ° is strati ed into H strata with Nh clusters in the hth stratum. In the rst-stage sampling, nh ¶ 2 clusters are selected Jun Shao is Professor, Department of Statistics, University of Wisconsin, Madison, W I 53706 (E-mail: [email protected]). Hansheng Wang is Principle Statistician, StatPlus, Inc., Yardley, PA 19067. The authors thank the referees for helpful comments and suggestions. The research of Jun Shao was partially supported by National Science Foundation grants DMS-9803112 and DMS-0102223 and National Security Agency grant MDA 904-99-1-0032. 544 © 2002 American Statistical Association Journal of the American Statistical Association June 2002, Vol. 97, No. 458, Theory and Methods Shao and Wang: Sample Correlation Coef’ cients 545 without replacement from stratum h according to some probability sampling plan, and the clusters are selected independently across the strata. A second-stage sample, a third-stage sample, and so on may be taken from each sampled cluster, using some sampling plan independently across the sampled clusters. Associated with the ith unit in ° is a vector 4yi 1 zi 1 : : : 5 of items of interest and a vector xi of auxiliary variables (which is observed for all sampled units). For i 2 ³, survey weights wi are constructed so that when there is no nonresponse, ³ ´ X X Es wi –i D –i 1 i2³ i2° for any set of values 8–i 2 i 2 ° 9, where Es is the expectation with respect to the randomness from sampling ³. 2.1 i2³ As for any other statistical method, the validity of an imputation method relies on some assumptions and/or models. The regression imputation method is based on the following model assumption. The population ° under consideration can be divided into subpopulations (called imputation classes) °k 1 k D 11 : : : 1 K, such that for an item y with nonresponse, i 2 °k 1 (1) i 2 °k 1 (2) where ‚k is a parameter vector and ‚0k is its transpose; vki D vk 4xi 5 with a known function vk 4¢5 > 03 …i is a random error (independent of xi ) with mean 0 and unknown variance ‘ …12 k > 03 ‚k 1‘ …1 k , and vk 4¢5 may be different in different imputation classes or for different items; and ai is the indicator of whether yi is a respondent (i.e., ai D 1 if yi is observed and ai D 0 if yi is a nonrespondent). Note that (1) is a linear regression model with heteroscedastic errors and (2) means that given x, the response probability for item y is independent of y. The creation of K imputation classes allows us to establish a reasonable model such as (1) in the imputation class °k that may not hold for the whole population ° . Imputation classes are usually formed by using an auxiliary variable without nonresponse; for example, in many business surveys, imputation classes are strata or unions of strata. Under (1)–(2), the nonrandom regression imputation method imputes a nonrespondent yi in ³k D ³ \ °k by yiü D ‚O 0k xi 1 where ‚O k D ³ X i2³k ƒ1 wi ai vki xi xi0 ´ƒ1 X i2³ wi yi . (4) i2³ Let Em denote the expectation under model (1)–(2) and let Es be the expectation under sampling, given item values bü 5 Em 4Y 5, where generated by model (1)–(2). Then Em Es 4Y is the equality up to a negligible term under the asymptotic bü is asymptotisetting described in Appendix A.1. That is, Y cally unbiased for Y . In this article we consider the estimation of the correlation coef cient between two items, say y and z, de ned by ³ ´ X D yi zi ƒ Y Z=M µ³ ƒ1 wi ai vki xi y i (3) i2³k is the best linear unbiased estimator of ‚k under (1) based on respondents in ³k . Once nonrespondents are imputed, survey estimators are computed by treating imputed data as observations and using the same formulas as in the case P of no nonresponse. For the (marginal) total of item y1 Y D i2° yi , its standard unbiased X yi2 i2° ƒ Y 2 =M ´³ X z2i i2° ´¶1=2 1 ´¶1=2 1 ƒ Z 2 =M P where Z D i2° zi . When there is no nonresponse, a standard estimator of is the sample correlation coef cient ³ ´ X b b b O D wi yi zi ƒ Y Z=M i2³ µ³ and P4ai D 1—yi 1 xi 5 D P 4ai D 1—xi 51 P i2° Marginal Regression Imputation 1=2 yi D ‚0k xi C vki …i 1 bD estimator in the case of no nonresponse is Y Hence its estimator based on imputed data is X X b Yü D wi ai yi C wi 41 ƒ ai 5yiü 0 X i2³ b wi yi2 ƒ b Y 2 =M ´³ X i2³ b 2 =M b wi z2i ƒ Z b is as de ned previously, M b D Pi2³ wi (M may be where Y b D Pi2³ wi zi . Note that O unknown in many surveys), and Z b1 Z1 b Pi2³ wi yi zi 1 Pi2³ wi yi2 , is notPexactly unbiased. Because Y P P 2 2 and i2³ wi zi are unbiased for Y 1 Z1 i2° yi zi 1 i2° yi , and P 2 O i2° zi 1 is asymptotically unbiased under the asymptotic setting given in Appendix A.1. Consider now the situation in which both y and z have nonrespondents. A model for item z analogous to (1)–(2) is zi D ƒk0 xi C u1=2 ki ‡i 1 i 2 °k (5) and P4bi D 1—zi 1 xi 5 D P4bi D 1—xi 51 i 2 °k 1 (6) where ƒk is unknown, uki D uk 4xi 5 with a known function uk 4¢5 > 0, and ‡i is a random error (independent of xi ) with 2 mean 0 and unknown variance ‘ ‡1 k > 0. Note that …i in (1) and ‡i in (5) are generally dependent. Similarly, ai in (2) and bi in (6) are generally dependent. Nonrandom regression imputed value of a missing zi in ³k is züi D ƒOk0 xi , where ƒOk is an analog of ‚O k for item z. Suppose that nonrespondents for y and z are imputed marginally (separately). For simplicity, let yiü D yi if yi is observed and yiü D the imputed value if yi is a nonrespondent, and let züi be de ned similarly. Then the sample correlation coef cient based on the imputed data is ³ ´ X ü ü bü Z bü =M b O ü D wi yi zi ƒ Y i2³ µ³ X i2³ wi yiü 2 bü =M b ƒY ´³ X i2³ wi züi 2 bü =M b ƒZ ´¶1=2 bü is similarly de ned. Y ü is de ned in (4) and Z where b 1 (7) 546 Journal of the American Statistical Association, June 2002 For marginal imputation, the previous discussion can be extended to the case of more than two items in a straightforward way. P P ü 2 ü ü It can be shown that and i2³ wi yi i2³ wi yi zi have asymptotic biases ³ ´ XX 2 ƒ Em ƒ 41 ai 5vki‘ …1 k k i2°k and ƒE m ³ XX k i2°k ´ 1=2 1=2 41 ƒ ai bi 5vki uki ‘ …1 ‡1 k 1 (8) where ‘ …1 ‡1 k D Em 4…i ‡i 5 for i 2 °k (details can be found in a technical report). These biases are of a nonnegligible order. Consequently, O ü in (7) based on marginal nonrandom regression imputation is not unbiased for the correlation coef cient . We now consider the random regression imputation method, which adds a random term to the regression imputed nonrespondent; that is, a nonrespondent yi in ³k is imputed by ü yiü D ‚O 0k xi C v1=2 ki …i 1 (9) where, given the observed data, the …iü ’s are independent with mean 0 and variance .X X ƒ1 ‘O …12 k D wi ai vki 4yi ƒ ‚O 0k xi 52 wi ai i2³k i…³ k (a consistent and asymptotically unbiased estimator of ‘ …12 k ). Adding random terms is common in imputation (Rubin and Schenker 1986; Srivastava and Carter 1986; Rubin 1987) and con dentiality editing (Kim 1986). The random regression imputation method imputes a nonrespondent zi in ³k by ü zi D ƒOk0 xi ü C u1=2 ki ‡i (10) 1 where, given the observed data, the ‡iü ’s are independent with mean 0 and variance .X X 2 D ‘O ‡1 wi bi uƒki1 4zi ƒ ƒOk0 xi 52 wi bi 0 k i2³k i2³k For marginal imputation, items y and z are imputed according to (9) and (10) separately, and the …iü ’s and ‡iü ’s are generated independently. P ü Em 4Y 5 and It can Pbe shown that EP m Es Eü 4 i2³ wi yi 5 ü 2 2 Em Es Eü 4 i2³ wi yi 5 Em 4 i2° yi 5, where Eü is the expectation with respect to the random term in the imputation. This means that the random regression imputation method is unbiased for the marginal rst and second moments. However, as P an estimator of the cross-product moment, i2³ wi yiü züi has an asymptotic bias given by (8). Thus O ü based on marginal random regression imputation is asymptotically biased, unless ‘ …1 ‡1 k D 0 for all k (i.e., yi and zi are conditionally uncorrelated, given xi ). 2.2 Joint Regression Imputation To obtain an unbiased imputation method for the correlation coef cient, some type of joint imputation is required. Under parametric models, there exist ef cient imputation methods (see, e.g., Rubin and Schenker 1986; Little and Rubin 1987; Schafer 1997). For complex survey data, however, it is often hard to nd a suitable parametric model. Note that the regression imputation method discussed in this article is not a parametric method, because no assumption is imposed on the form of the distribution of …i and ‡i . Consider rst the case in which only y and z are the items of interest (i.e., the bivariate case). We propose a joint regression imputation method that is the same as the random regression imputation method described in Section 2.1 [i.e., missing yi ’s and zj ’s are imputed according to (9) and (10)], except that the random terms …iü and ‡jü are generated according to the following scheme: 1. In the kth imputation class, when ai D 0 but bi D 1 (yi is missing but zi is observed), …iü D ‘O …1 ‡1 k O2 u1=2 ki ‘ ‡1 k 4zi ƒ ƒOk0 xi 5 C …Qiü 1 (11) where, given the observed data, the …Qiü ’s are independent ran2 dom variables with mean 0 and variance ‘O …12 k ƒ ‘O …12 ‡1 k =‘O ‡1 k, ³ O2 ‘ …1 k ‘O …1 ‡1 k 2 ‘O …1 ‡1 k ‘O ‡1 k D ´ X w i a i bi i2³k ³ ry12 ki ry1 ki rz1 ki ry1 ki rz1 ki rz12 ki ´ X w i a i bi 1 (12) i2³k ƒ ƒ ry1 ki D vki1=2 4yi ƒ ‚O 0k xi 5, and rz1 ki D uki1=2 4zi ƒ ƒOk0 xi 5. 2. In the kth imputation class, when ai D 1 but bi D 0, ‡iü D ‘O …1 ‡1 k O2 v1=2 ki ‘ …1 k 4yi ƒ ‚O 0k xi 5 C ‡Q iü 1 (13) where, given the observed data, the ‡Q iü ’s are independent ran2 ƒ O2 ‘ …1 ‡1 k =‘O …12 k . dom variables with mean 0 and variance ‘O ‡1 k 3. In the kth imputation class, when both ai D 0 and bi D 0, the (…iü 1 ‡üi )’s are independently (given the observed data) distributed with mean 0 and covariance matrix given in (12). Note that there are two major differences between joint regression imputation and marginal random regression imputation: 1. When both yi and zi are nonrespondents, …iü and ‡üi are independent in marginal random regression imputation, whereas for joint regression imputation, …iü and ‡iü have covariance given by (12), which is a consistent estimator of the covariance between …i and ‡i . 2. When exactly one of yi and zi is missing, …iü or ‡iü is generated according to (11) or (13) to ensure that the imputed pair of data preserves the same correlation as that of the original pair. Conditional on the observed data, …iü or ‡üi in (11) or (13) does not have mean 0, whereas …iü or ‡iü in marginal random regression imputation has mean 0. Shao and Wang: Sample Correlation Coef’ cients 547 Consider now the multivariate case with q ¶ 3 items and the estimation of correlation coef cients between all pairs of q items. Let ti D 4yi 1 zi 1 : : : 50 be the q vector of item values from unit i. Model (1)–(2) and (5)–(6) should be replaced by 0 1=2 ti D Bk xi C Vki ei 1 i 2 °k (14) and P 4ai D a—ti 1 xi 5 D P4ai D a—xi 51 i 2 °k 1 (15) where Bk D 4‚k 1 ƒk 1 : : : 5 is a matrix of regression parameters; Vki is a diagonal matrix whose diagonal elements are vki D vk 4xi 5 and uki D uk 4xi 51 : : : , with known positive functions vk 1 uk 1 : : : 3 ei is a random error (independent of xi ) with mean 0 and unknown covariance matrix èk ; and ai is the vecbk D 4‚O k 1 ƒOk 1 : : : 5 with tor of response indicators for ti . Let B O ‚k computed according to (3) and ƒOk 1 : : : , computed similarly, èk be a positive de nite consistent estimator of èk . Let and let b 811 : : : 1 q9 be the set of indices corresponding to missing components in q items and let c be the complement of . For each ti , let ti be the subvector of ti containing all nonrespondents and let tc i be the subvector containing all respondents. For any q q matrix A and two subsets 1 and 2 of 811 : : : 1 q9, let A41 1 2 5 be the submatrix containing rows of A indexed by the integers in 1 and columns of A indexed by the 41 5 41 5 4 1 5 èk D b èk , and b èk 1 2 D integers in 2 . De ne Vki D Vki 1 b b è1 2 k . The joint regression imputation method imputes ti by h i ƒ ü 0 1=2 bk b0 c k xi 5 C eQ üi 1 (16) ƒB èc k b èƒc1k Vc 1=2 ti D B xi C Vki b ki 4tc i where, given the observed data, the eQ iü ’s are independent èk ƒ random vectors with mean 0 and covariance matrix b b èc k b èƒc1k b è0c k 4b èƒc1k is de ned to be 0 if D 811 : : : 1 q95. One choice of a positive de nite consistent estimator of èk is .X X ƒ b bk0 xi 54ti ƒ B bk0 xi 50 Vkiƒ1=2 èk D wi Vki 1=2 4ti ƒ B wi 1 (17) i2² k i2² k where ²k is the set of i’s in imputation class k for which ti has no missing components. The proposed joint regression imputation is an extension of formulas (6)–(7) of Srivastava and Carter (1986), who considered the situation of no auxiliary x, normally distributed 4yi 1 zi 1 : : : 5’s, and uniform nonresponse (i.e., missing completely at random). Also, our method of generating random terms incorporates the survey weights wi ’s so that our results are applicable to complex surveys. Srivastava and Carter (1986) proposed drawing random terms from residuals computed using respondents for all items, whereas our method is exible for implementation, because random terms can be generated from any distribution with the covariance matrix given èk . If parameters other than functions of the rst- and by b second-order moments are considered (e.g., quantiles), then random terms should be generated from residuals as follows. For a given imputation class k, let ²k be the set of i’s with missing components indexed by . For each j 2 ²k 1 eQ üj is generated according to X P ü 4eQ jü D rki ƒ rN k 5 D wi wi 1 i 2 ²k 1 i2² k where ²k is the set of i’s in imputation class k for which ti has no missing components, ƒ1=2 ƒ 0 bk b0 c k xi 51 ƒB èc k b èƒc1k Vc 1=2 rki D Vki 4ti ƒ B xi 5 ƒ b ki 4tc i P P and rN k D i2²k wi rki = i2²k wi . It is shown in Appendix A.2 that this joint regression imputation method is unbiased for estimating marginal totals and correlation coef cients among items. 3. VARIANCE ESTIMATION Once a nearly unbiased survey estimator is obtained, the next important step in sample surveys is to nd a nearly unbiased and consistent variance estimator for assessing the variability of the survey estimator. It is well known that if we treat imputed values as observed data and apply standard variance estimation formulas for the case of no nonresponse, then we may seriously underestimate the true variances. It is possible to obtain a nearly unbiased and consistent variance estimator for the sample correlation coef cient O ü using linearization (i.e., Taylor expansion) and substitution. However, under joint regression imputation, linearization for O ü involves very messy derivations, especially for the multivariate case. Instead, we use the following adjusted jackknife method, which is an extension of the method of Rao and Shao bü under marginal ran(1992) for estimating the variance of Y dom imputation. Some other resampling methods have been given by Srivastava (1997). PH hD1 nh = PHAssume that the rst-stage sampling fraction N n =N is negligible, although may not be negligible h h hD1 h for some h’s. In the case of no nonresponse, the jackknife method works as follows. Let X D 8ti 1 xi 1 wi 2 i 2 ³9 denote the dataset including covariates and survey weights, and let 4hj5 Xhj D 8ti 1 xi 1 wi 2 i 2 ³9 be the 4h1 j5th pseudoreplicate after deleting the rst-stage cluster j in stratum h and suitably adjusting the survey weights, where 8 > if i is in cluster j <0 n w 4hj5 h i wi D if i is not in cluster j but in stratum h > : nh ƒ 1 wi if i is not in stratum h. O Let ˆO D ˆ4X5 be a survey estimator. The jackknife variance O estimator for ˆ4X5 is ¶2 nh µ nh H X nh ƒ 1 X 1 X O O vJack D ˆ4Xhj 5 ƒ ˆ4Xhi 5 0 (18) nh jD1 nh iD1 hD1 O Note that the ˆ4X hj 5’s can be computed repeatedly using a O program similar to that for computing ˆ4X5. When there are imputed nonrespondents, formula (18) has to be modi ed to take imputation into account. Let Xü and Xühj be the same as X and Xhj but with imputed nonrespondents, 4hj5 bk4hj5 and b bk and b èk be B èk , with wi ’s replaced by and let B 4hj5 4hj5 4hj5 b bk and èk are B bk and b wi ’s; that is, B èk recalculated using observed data in Xhj . We consider the following adjustment. ü For each imputed vector ti in Xühj [see (16)], we reimpute ü ti using the same imputation method but the observed data in Xhj . More precisely, for each (h1 j), the reimputed vector ü bk and b èk replaced by is the same as ti in (16) but with B 4hj5 4hj5 4hj5 ƒ1 1=2 ü ü b b Bk and èk and with eQ i replaced by 4b èek b èek 5 eQ i , where 548 Journal of the American Statistical Association, June 2002 4hj5 b èek D b èk ƒ b èc k b èƒc1k b è0c k and b èek is b èek recalculated using 4hj5 wi ’s instead of wi ’s. In the bivariate case, for example, if ai D 0 and bi D 1, then yiü is reimputed by 4‚O k 50 xi C 4hj5 O v1=2 ki ‘ …1 ‡1 k 4hj5 C ³ 4.1 4zi ƒ 4ƒOk 50 xi 5 4hj5 O 4hj5 2 u1=2 ki 4‘ ‡1 k 5 4hj5 vki 64‘O …1 k 52 ‘O …12 k 2 ƒ 4‘O …14hj5 O 4hj5 2 ‡1 k 5 =4‘ ‡1 k 5 7 2 ƒ ‘O …12 ‡1 k =‘O ‡1 k ´1=2 …Qiü 1 4hj5 4hj5 4hj5 4hj5 4hj5 where ‚O k 1 ƒOk 1 ‘O …1 k 1 ‘O ‡1 k , and ‘O …1 ‡1 k are the same as ‚O k 1 ƒOk 1 ‘O …1 k 1 ‘O ‡1 k , and ‘O …1 ‡1 k , except that the wi ’s are replaced 4hj5 bk and by wi ’s in their de nitions. Note that not only are B 4hj5 4hj5 b b b èk changed to Bk and èk in the reimputation, but also 4hj5 ƒ1 1=2 ü èek b èek 5 eQ i , which has the random term eQ iü is changed to 4b 4hj5 b imputation variance èek , although we do not generate any new random vectors in reimputation. ü Let XühjR I be the reimputed Xhj . The adjusted jackknife variadj ü O ance estimator vjack for ˆ4X 5 is then obtained using (18) with Xhj replaced by XühjR I. adj To show that vjack is asymptotically unbiased and consisbü or O ü , we consider for tent for the asymptotic variance of Y simplicity the special case of bivariate ti D 4yi 1 zi 50 . Note that Y ü or O ü are differentiable functions of weighted averboth b P bü D Yb4Xü 5 is a ages of the form i2³k wi –i . For example, Y O 2 2 differentiable function of ‚k 1 ‘O …1 ‡1 k 1 ‘O …1 k 1 ‘O ‡1 k (each of which is a function of weighted averages), X X X 1=2 ƒ1=2 wi ai yi 1 wi 41 ƒ ai 5bi xi 1 wi 41 ƒ ai 5bi vki uki zi 1 i2³k X i2³ k X i2³k i2³k i2³k ƒ 1=2 wi 41 ƒ ai 5bi vki uki1=2 xi 1 1=2 wi 41 ƒ ai 5bi vki „i 1 and X i2³k X i2³k wi 41 ƒ ai 541 ƒ bi 5xi 1 Bivariate Case With Normal Errors We consider a one-stage strati ed simple random sampling design used in the Transportation Annual Survey conducted by the U.S. Census Bureau (U.S. Census Bureau 1987). There are 33 strata divided into 4 imputation classes according to business type. Bivariate data 4yi 1 zi 5’s are generated according to (1) and (5) with a univariate x and vki 4x5 D uki 4x5 D x. Stratum sample sizes, survey weights, and model parameter values are listed in Table 1. The x values are independently generated from gamma distributions with means and variances given by those in Table 1. The error terms …i and ‡i are independently generated according to …i D Іi C „i and (19) ‡i D Іi C ’i 1 where †i 1 „i , and ’i are independently distributed as N 401 15 and Š ¶ 0 is a parameter. As Š increases, the value of the correlation coef cient increases. When Š D 01 yi and zi are conditionally independent, given xi . Table 1. Sample Size, Survey Weight, and Model Parameters Across Imputation Classes and Strata Imputation class Stratum Sample size Survey weight 1 1 2 3 4 5 6 7 14 11 10 11 16 18 31 12043 8091 6010 5073 2070 2017 1000 2 1 2 3 4 5 6 8 13 11 12 13 86 3 1 2 3 4 5 6 7 8 9 10 4 1 2 3 4 5 6 7 8 9 10 wi 41 ƒ ai 541 ƒ bi 5v1=2 ki ’ i 1 2 1=2 where „i D …Qiü =4‘O …12 k ƒ‘O …12 ‡1 k =‘O ‡1 and ’i D …iü =‘O …1 k are rank5 Y ü is dom variables with mean 0 and variance 1. (Because b ü a nonlinear function of so many terms and O is a nonlinear function of even more terms, variance estimation by linearization/Taylor expansion involves very messy derivations.) Note O ü R I5 is the same function of the same weighted averthat ˆ4X hj 4hj5 ages, except that the wi ’s are replaced by wi ’s. From the theory of jackknife (see, e.g., Krewski and Rao 1981), the adj jackknife variance estimator vjack is asymptotically unbiased and consistent (under the asymptotic setting in Appendix A.1). Note that the same conclusion cannot be reached if we use O ü 5 in (18). ˆ4X hj Although the same adjusted jackknife method can be applied to nonrandom regression imputation and marginal random regression imputation to obtain consistent variance estibü , jackknife variance estimators for O ü under mators for Y marginal regression imputation are meaningless when O ü is not asymptotically unbiased. 4. joint regression imputation, (b) the proposed jackknife variadj ance estimator vjack for O ü , and (c) the con dence interval (for adj the true correlation coef cient) using O ü and vjack . SIMULATION RESULTS We conducted a simulation study to study the nite-sample performance of (a) the sample correlation coef cient O ü under Model parameters p ‚ ƒ E(x) var(x) 200 300 400 500 600 700 800 01 01 01 01 01 01 01 100 100 100 100 100 100 100 06 06 06 06 06 06 06 32091 9085 10082 6008 3060 1000 205 305 405 505 605 705 02 02 02 02 02 02 05 05 05 05 05 05 07 07 07 07 07 07 14 11 13 14 16 18 15 15 40 38 87091 67039 44048 25028 15057 9080 6023 4068 2013 1000 300 400 500 600 700 800 900 1000 1100 1200 01 01 01 01 01 01 01 01 01 01 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 101 7 13 10 14 13 11 17 19 22 28 32014 16075 12090 7000 6018 4070 3031 1089 1082 1000 305 405 505 605 705 805 905 1005 1105 1205 01 01 01 01 01 01 01 01 01 01 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 Shao and Wang: Sample Correlation Coef’ cients 549 Given the generated data, respondents are generated according to (2) and (6), with independent ai ’s and bi ’s (although our results in Sections 2 and 3 hold even when ai and bi are related), P 4ai D 1—xi 5 D e01C005xi 1 C e01C005xi P4bi D 1—xi 5 D e02C004xi 0 1 C e02C004xi and The average response rates, E6P4ai D 1—xi 57 and E6P4bi D 1—xi 57, are approximately 61075%. The random terms in imputation are generated from a normal distribution with covariance matrix given by (12). We consider O ü based on the joint regression imputation and O ü based on the marginal nonrandom regression imputation. The sample correlation coef cient O based on data without nonresponse is also included. Table 2 lists the mean and standard deviation of three sample correlation coef cients for some values of Š and , computed based on 500 simulations. The results in Table 2 can be summarized as follows: 1. The mean of the sample correlation coef cient O ü under joint regression imputation is very close to the true . This supports our theory of the asymptotic unbiasedness of the joint regression imputation method. The standard deviation of O ü O indiunder joint regression imputation is higher than that of , cating the price paid for nonresponse and imputation. 2. The sample correlation coef cient O ü under marginal nonrandom regression imputation is biased, except for the case where Š D 1 (biases canceled out). Comparing standard deviations of O ü for marginal nonrandom imputation and joint regression imputation, we nd that marginal nonrandom imputation has lower standard deviation when Š µ 1, whereas joint regression imputation has lower standard deviation when Š ¶ 102. The variability in marginal nonrandom imputation increases with the value of Š. q adj The simulation average of vjack as an estimator of the standard deviation of O ü under joint regression imputation is given in Table 3 (the case of normal errors), which shows that the proposed jackknife estimator has a negligible bias in all cases. The average of the naive jackknife standard deviation estimator obtained by treating imputed values as observed data O ü 5 instead of Ô 4Xü R I5 in (18)] was .0317 in the [i.e., using ˆ4X hj hj case of Š D 1, which resulted in a very large negative bias as expected. A large sample 10041 ƒ 5% con dence interval q for the true adj ü z vjack correlation coef cient has the limits O , where z is the 1 ƒ quantile of the standard normal distribution. It is well known that con dence intervals based on the Fisher C Z scale h45 D 21 log 11ƒ perform better in this problem. A large sample 10041 ƒ q5% con dence interval for h45 has adj adj ü the limits h4O 5 z vjack , where vjack is the proposed jackü knife variance estimator for h4O 5. The simulation average of the coverage probabilities and the lengths of 95% con dence intervals (under the raw scale and Fisher Z scale) are reported in Table 3 (the case of normal errors). In terms of the coverage probability, it is clear that the intervals under the Fisher Z scale perform better. Rubin and Schenker (1986) proposed a multiple imputation method that imputes nonrespondents using an approximate Bayesian bootstrap (ABB) imputation. In our simulation, ABB imputation is carried out by generating bootstrap samples from the centered residuals. Multiple imputation imputes nonrespondents independently m ¶ 2 times to obtain imputed O multiple impudatasets e X1 1 : : : 1 e Xm . For a survey estimator ˆ, tation uses m 1 X O e ˆQm D ˆ4 Xl 5 m lD1 as the point estimator and bC vm D W mC1b B m Table 2. Averages of 500 Simulations of Sample Correlation Coef’ cients and Standard Deviations (SD) of Sample Correlation Coef’ cients (bivariate case) Without nonresponse Š 0 .2 .4 .6 .8 1.0 1.2 1.4 1.6 1.8 2.0 2.4 2.8 3.2 3.6 4.0 Marginal nonrandom regression imputation Joint regression imputation O O SD() O ü SD(O ü ) O ü SD(O ü ) 05491 05581 05777 06091 06462 06850 07220 07559 07863 08117 08352 08711 08975 09172 09320 09433 05521 05563 05779 06090 06467 06834 07223 07576 07858 08130 08345 08709 08976 09166 09311 09432 00351 00362 00378 00351 00332 00314 00283 00251 00230 00210 00192 00145 00120 00104 00087 00070 06618 06620 06656 06697 06796 06854 06917 06990 07065 07081 07155 07238 07286 07322 07442 07520 00494 00498 00478 00483 00485 00524 00569 00652 00725 00797 00848 00998 01027 01146 01225 01247 05462 05596 05795 06076 06456 06872 07233 07561 07819 08140 08350 08714 08971 09165 09325 09426 00693 00685 00691 00671 00626 00590 00549 00487 00443 00387 00350 00280 00226 00184 00153 00127 550 Journal of the American Statistical Association, June 2002 as the variance estimator for ˆQm , where m X bD 1 W v4e Xl 51 m lD1 have longer average lengths. Under multiple imputation, use of the Fisher Z scale seems unnecessary. 4.2 v4X5 is a variance estimator for Ô 4X5 in the case of no nonO X Q 5] and response [but v4e Xl 5 underestimates the variance of ˆ4 l m X O 1 bD B 4ˆ4e Xl 5 ƒ Q̂m 52 0 m ƒ 1 lD1 A 10041 ƒ 5% con dence interval p based on multiple imputation has the limits ˆQm t 4‡5 vm , where t 4‡5 is the 1 ƒ quantile of the t distribution with ‡ degrees of freedom and b =4m C 15B7 b2 ‡ D 4m ƒ 1561 C mW p . Q 1 Standard deviation of m vm , coverage probability and length of con dence interval based on ABB multiple imputation with m D 3 are evaluated by simulation, and the results are given in Table 3 (the case of normal errors). The results indicate that vm underestimates, but the bias becomes smaller as Š (or 5 increases. Raw scale con dence intervals based on multiple imputation generally have higher coverage probabilities than those based on O ü and the jackknife, but they also Bivariate Case With Nonnormal Errors To examine the effect of normality of the errors in (19), we repeat the simulation given in Table 3 with nonnormal errors. †i is distributed as an exponential random variable with probability density eƒ4xC15 1 x ¶ ƒ11 „i and ’i are distributed as a mixture of normal 04N 401 095 C 06N 401 302=35, and †i 1 „i , and ’i are independent. Consequently, the distributions of the errors …i and ‡i are skewed. The variables †i 1 „i , and ’i still have mean 0 and variance 1, so that the correlation coef cient between yi and zi is the same as that in Section 4.1. The other settings of the simulation are also the same as those in Section 4.1. In particular, the random terms in imputation are still generated from the normal distribution with covariance matrix given by (12). The simulation results are included in Table 3 (the case of nonnormal errors). It can be seen that the performance of the proposed method is similar to that in the case of normal errors. The standard deviations of O ü and Q m are increased, so are Table 3. Averages of 500 Simulations of the Standard Deviation (SD) of Sample Correlation Coef’ cients, SD Estimators, and the Coverage Probability (CP) and Length of 95% Con’ dence Intervals (bivariate case) Multiple imputation (m D 3) Proposed method Raw scale Š SD(O ü ) q adj Fisher Z scale Raw scale CP Length CP Length SD(Q m ) 0944 0924 0914 0918 0908 0926 0918 0906 0914 0922 0912 0920 0926 0912 0916 0924 0272 0263 0266 0259 0245 0227 0209 0187 0172 0152 0135 0105 0086 0071 0057 0049 0946 0934 0928 0932 0926 0946 0930 0920 0926 0934 0930 0942 0952 0938 0940 0940 0393 0389 0406 0416 0426 0436 0445 0442 0448 0453 0451 0441 0448 0448 0440 0443 00582 00573 00568 00551 00527 00479 00433 00397 00345 00306 00269 00215 00171 00137 00112 00097 0274 0275 0272 0277 0261 0260 0239 0226 0204 0181 0163 0136 0111 0088 0076 0064 0948 0944 0934 0940 0934 0926 0928 0924 0930 0940 0940 0944 0938 0934 0938 0952 0397 0405 0417 0449 0459 0491 0492 0517 0524 0528 0531 0543 0541 0545 0559 0556 00597 00596 00598 00593 00585 00557 00517 00477 00435 00392 00350 00287 00234 00192 00161 00135 v jack p Fisher Z scale CP Length CP Length 00503 00499 00494 00480 00459 00430 00396 00359 00327 00294 00263 00212 00173 00142 00118 00100 0938 0940 0937 0937 0940 0943 0950 0948 0953 0953 0960 0960 0962 0962 0968 0961 0290 0288 0287 0281 0269 0253 0234 0211 0193 0172 0154 0124 0101 0083 0069 0059 0942 0946 0941 0943 0944 0947 0956 0952 0960 0960 0965 0968 0970 0970 0974 0968 0419 0420 0434 0451 0465 0479 0491 0496 0507 0509 0511 0515 0518 0526 0526 0531 00511 00512 00506 00495 00479 00459 00431 00403 00374 00345 00316 00263 00221 00185 00156 00134 0935 0941 0936 0931 0932 0932 0937 0937 0944 0947 0952 0955 0959 0959 0964 0964 0292 0296 0294 0288 0279 0267 0249 0231 0213 0195 0177 0144 0120 0099 0083 0072 0938 0946 0942 0936 0936 0935 0940 0939 0947 0949 0951 0956 0959 0959 0963 0964 0422 0432 0446 0461 0482 0502 0517 0534 0551 0566 0575 0588 0605 0613 0620 0637 vm Case of normal errors 0 .2 .4 .6 .8 1.0 1.2 1.4 1.6 1.8 2.0 2.4 2.8 3.2 3.6 4.0 00693 00685 00692 00671 00626 00590 00534 00487 00443 00387 00350 00280 00226 00184 00153 00127 00693 00672 00678 00660 00624 00579 00534 00477 00440 00387 00345 00268 00220 00181 00145 00125 Case of nonnormal errors 0 .2 .4 .6 .8 1.0 1.2 1.4 1.6 1.8 2.0 2.4 2.8 3.2 3.6 4.0 00716 00680 00687 00697 00687 00685 00649 00617 00539 00478 00445 00352 00288 00247 00206 00161 00700 00702 00693 00709 00666 00662 00609 00576 00519 00461 00416 00347 00282 00234 00194 00162 0940 0934 0930 0928 0926 0918 0922 0918 0942 0930 0920 0934 0932 0920 0946 0950 adj NOTE: O ü denotes the sample correlation coef’ cient based on joint regression imputation; vjack , the proposed jackknife variance estimator; m , the sample correlation coef’ cient based on multiple imputation; and vm , the variance estimator based on multiple imputation. Shao and Wang: Sample Correlation Coef’ cients 551 Table 4. Averages of 500 Simulations of Sample Correlation Coef’ cients, Standard Deviation (SD) of Sample Correlation Coef’ cients, SD Estimators, and the Coverage Probability (CP) of 95% Con’ dence Intervals (multivariate case) q adj O ü p1 q SD(O ü ) vjack CP 11 2 11 3 11 4 11 5 11 6 11 7 11 8 11 9 11 10 21 3 21 4 21 5 21 6 21 7 21 8 21 9 21 10 31 4 31 5 31 6 31 7 31 8 31 9 31 10 41 5 41 6 41 7 41 8 41 9 41 10 51 6 51 7 51 8 51 9 51 10 61 7 61 8 61 9 61 10 71 8 71 9 71 10 81 9 81 10 91 10 05121 05085 04981 04844 04669 04505 04323 04152 03992 05482 05556 05483 05405 05321 05201 05107 04996 05906 06002 06029 06001 05964 05906 05833 06384 06490 06549 06550 06558 06522 06834 06950 07012 07051 07062 07247 07361 07424 07472 07607 07716 07779 07921 08019 08188 05039 05022 04964 04818 04686 04548 04392 04216 04056 05413 05419 05438 05344 05329 05234 05138 04986 05797 05878 05912 05941 05908 05839 05782 06230 06381 06478 06492 06466 06455 06704 06860 06901 06964 06967 07142 07264 07324 07382 07526 07653 07709 07840 07946 08152 00701 00679 00649 00646 00645 00630 00628 00605 00588 00665 00635 00616 00565 00576 00541 00547 00540 00587 00567 00545 00516 00500 00479 00488 00557 00501 00483 00465 00467 00432 00473 00439 00442 00423 00415 00414 00386 00365 00347 00356 00315 00322 00306 00299 00246 00740 00724 00701 00689 00666 00656 00647 00637 00626 00678 00671 00640 00630 00601 00590 00571 00572 00636 00598 00575 00555 00542 00527 00515 00574 00537 00508 00499 00478 00465 00502 00471 00456 00431 00418 00432 00411 00390 00374 00376 00350 00334 00322 00302 00276 0950 0958 0948 0942 0966 0952 0936 0944 0946 0944 0950 0944 0972 0942 0954 0946 0960 0940 0956 0970 0956 0960 0962 0942 0950 0954 0954 0968 0954 0958 0976 0966 0948 0944 0970 0968 0968 0962 0976 0948 0962 0964 0952 0956 0968 NOTE: p, q represent the case of estimating the correlation coef’ cient between variables p and q, 1 µ p µ q µ 10; O ü , sample correlation coef’ cient based on joint regression imputation; adj and v jack , the proposed jackknife variance estimator. the lengths of con dence intervals, which indicates the ef ciency loss of the proposed method (and the multiple imputation method) due to skewness of the errors. On the other hand, the coverage probabilities of the con dence intervals are almost the same as those in the case of normal error, which con rms the asymptotic unbiasedness and consistency of the proposed estimators. 4.3 Multivariate Case With Normal Errors Finally, we consider a simulation for the multivariate case where q D 10–dimensional ti ’s are generated according to (14) with Bk0 D 4011 021 : : : 1 1005 and Vki D xi Iq , where Iq is the identity matrix of order q. The sampling design and the distributions of xi ’s are still given in Table 1. The distribution of ei is multivariate normal with mean 0 and covariance matrix Iq C 1q 10q , where 1q is the column vector of 1s. The nonrespondents are generated with constant probability. The probability that ti has no missing component is 05, and given that ti has at least one missing component, the components of ti are missing independently with probability 05. The random terms in imputation are generated from a normal distribution with covariance matrix given by (17). For each pair of components of t, the true value of the correlation coef cient , simulation averages of O ü (the sample correlation coef cient based on joint regression imputation), the standard deviation of O ü , the jackknife variance estimaadj tor vjack for O ü , and the coverage probability of 95% con adj dence intervals based on O ü and vjack are given in Table 4. The adj results indicate that the performances of O ü and vjack are good, although their biases are larger than those in the bivariate case (the same sample sizes are used in both cases). However, because the jackknife estimator overestimates in this case, the coverage probabilities of the con dence intervals are larger than 95% in many cases. APPENDIX A.1 The asymptotic results in this article are based on the following asymptotic framework (see, e.g., Krewski and Rao 1981; Bickel and Freedman 1984; Valliant 1993). The population ° is assumed to be a member of a sequence of populations indexed by l, but l is suppressed for simplicity of notation. As l ! ˆ1 n ! ˆ1 n=N ! 0, and P P P maxh1 j i2³4h1 j5 wi =M D O4nƒ 1 5, where n D h nh 1 N D h Nh 1 M is the total number of ultimate units in ° , and ³4h1 j5 is the set of indices of sampled units in stratum h and cluster j. Also, as l ! ˆ, nh H X X hD1 jD1 E ˜–hj ƒ E4–hj 5˜2C„ D O4nƒ41C„5 5 for some „ > 0, where ˜˜ is the usual vector norm and –hj D P i2³4h1 j5 wi – i =M, with –i being any component of the matrix b=M57 and 0 < ti ti0 or xi xi0 . Finally, as l ! ˆ1 0 < lim inf6nvar4Y O lim inf6nvar457. Models (14) and (15) are assumed. Also, for every x in the support of the distribution of xi , it is assumed that P4a D a—x5 > 0. A.2 Y ü in (4) and the sample Here we show that the estimated total b ü correlation coef cient O in (7) based on the joint regression imputation described in Section 2 are asymptotically unbiased, under the asymptotic setting in Section A.1. We consider only the bivariate case. First, Em Es Eü 4b Y ü 5 D Em Es ³ X i2³ wi a i y i C XX k i2³k wi 41 ƒ ai 5 ´ h i ‘O ‚O 0k xi C 1=2…1 ‡1 k 4zi ƒ ƒOk0 xi 5 2 uki ‘O ‡1 k Em D Em ³ ³ X i2° X i2° ai yi C ´ yi 0 XX k i2° k ± 41 ƒ ai 5 ‚0k xi C ‘ …1 ‡1 k 2 ‘ ‡1 k ‡i ²´ 552 Journal of the American Statistical Association, June 2002 Next, Em Es Eü ³ X i2³ D Em Es C wi 41 ƒ ai 5bi yiü ³ XX k i2³k XX Em D Em D Em k i2³k ³ ³ ³ zi k i2°k XX k i2°k XX k i2°k 1=2 vki ‘O …1 ‡1 k O2 u1=2 ki ‘ ‡1 k ³ ³ X i2³ Em Es eü Finally, note that Eü 4…iü 2 5 D Then Em Es Eü ³ X i2³ D Em Es Em D Em D Em ³ ³ ³ ³ wi yiü 2 X i2³ C X i2° X i2° X i2° X ´ XX k i2³k Em ´ ³ ´ ü zi k i2³k REFERENCES ³ XX k i2° k XX k i2° k Em ³ X ai 41 ƒ bi 5yi zi ´ ´ 41 ƒ ai 541 ƒ bi 5yi zi 0 ´ yi zi 0 i2° wi 41 ƒ ai 564‚O 0k xi 52 C vki‘O …12 k 7 µ O2 ‘ …1 ‡1 k wi 41 ƒ ai 5 k i2° k k i2° k ai yi2 C XX XX XX yi2 0 wi yiü i2³ wi ai yi2 C ´ [Received November 1999. Revised August 2001.] ‘O …12 ‡1 k ‘O 2 0 2 C O 2 ƒ …1 ‡1 k ƒ O 4z ƒ 5 ‘ 0 x i k i …1 k 4 2 uki‘O ‡1 ‘O ‡1 k k ai yi2 C C ´´ ´ ´ Em ³ 2 ‘ ‡1 k ‡i z i 41 ƒ ai 5bi yi zi 0 wi 41 ƒ ai 541 ƒ bi 5yiü züi Thus v1=2 ki ‘ …1 ‡1 k ´ 1=2 41 ƒ ai 5bi 4‚0k xi ƒk0 xi C vki ‘ …1 ‡1 k 5 i2³ Em Ex Eü 4zi ƒ ƒOk0 xi 5zi 41 ƒ ai 5bi ‚0k xi zi C Similarly, ³ ´ X Em Es Eü wi ai 41 ƒ bi 5yi züi and As we argued in Section 3, Oü is a function of weighted averages. Furthermore, this function is a differentiable function. Hence, under the asymptotic setting in Appendix A.1, asymptotic unbiasedness, consistency, and asymptotic normality of O ü and consistency of the adjusted jackknife variance estimator for O ü can be proved using standard arguments of Krewski and Rao (1981), Bickel and Freedman (1984), and Valliant (1993). wi 41 ƒ ai 5bi ‚O 0k xi zi wi 41 ƒ ai 5bi XX A.3 ´ 41 ƒ ai 564‚0k xi 52 C vki‘ …12 k 7 41 ƒ ai 5 XX k i2° k 4 uki‘O ‡1 k 4zi ƒ ƒOk0 xi 52 ƒ µ ‘ …12 ‡1 k 4 ‘ ‡1 k ‡i2 ƒ ‘ …12 ‡1 k 2 ‘ ‡1 k ¶´ ´ 41 ƒ ai 564‚0k xi 52 C vki‘ …12 k 7 ¶´ ‘O …12 ‡1 k 2 ‘O ‡1 k Bickel, P. J., and Freedman, D. A. (1984), “Asymptotic Normality and the Bootstrap in Strati ed Sampling,” The Annals of Statistics, 12, 470–482. Deville, J. C., and Särndal, C. E. (1994), “Variance Estimation for the Regression Imputed Horvitz–Thompson Estimator,” Journal of Of cial Statistics, 10, 381–394. Kalton, G., and Kasprzyk, D. (1986), “The Treatment of Missing Data,” Survey Methodology, 12, 1–16. Kim, J. J. (1986), “A Method for Limiting Disclosure in Micodata Based on Random Noise and Transformation,” in Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 303–308. Krewski, D., and Rao, J. N. K. (1981), “ Inference From Strati ed Samples: Properties of the Linearization, Jackknife and Balanced Repeated Replication Methods,” The Annals of Statistics, 9, 1010–1019. Little, R. J., and Rubin, D. B. (1987), Statistical Analysis With Missing Data, New York: Wiley. Rao, J. N. K., and Shao, J. (1992), “Jackknife Variance Estimation With Survey Data Under Hot Deck Imputation,” Biometrika, 79, 811–822. Rubin, D. B. (1987), Multiple Imputation for Nonresponse in Surveys, New York: Wiley. Rubin, D. B., and Schenker, N. (1986), “Multiple Imputation for Interval Estimation From Simple Random Samples With Ignorable Nonresponse,” Journal of the American Statistical Association, 81, 366–374. Schafer, J. L. (1997), Analysis of Incomplete Multivariate Data, London: Chapman and Hall. Srivastava, M. S. (1997), “Resampling Methods for Imputing Missing Observation in Regression Models, Where the Jackknife as Well as the Bootstrap Estimate of the Variance is Given for the Imputed Estimator in a Regression Model,” Technical Report #9707, University of Toronto, Dept. of Statistics. Srivastava, M. S., and Carter, E. M. (1986), “The Maximum Likelihood Method for Nonresponse in Sample Surveys,” Survey Methodology, 12, 61–72. U.S. Census Bureau (1987), “Noncertainty Sample Speci cation,” BSR-87 Action Memo D.06, U.S. Census Bureau. Valliant, R. (1993), “Poststrati cation and Conditional Variance Estimation,” Journal of the American Statistical Association, 88, 89–96.
© Copyright 2026 Paperzz