PDF file for Variance Estimation for the National Compensation Survey When PSUs are Clustered Prior to the Second Phase of Sampling

VARIANCE ESTIMATION FOR THE NATIONAL COMPENSATION SURVEY WHEN PSUS ARE CLUSTERED
PRIOR TO THE SECOND PHASE OF SAMPLING
Christopher J. Guciardo, Alan H. Dorfman, Lawrence R. Ernst, Michail Sverchkov
[email protected], [email protected], [email protected], [email protected]
Bureau of Labor Statistics, 2 Massachusetts Ave., N.E., Room 3160, Washington, DC 20212-0001
KEY WORDS: Replication; zero-calibrated; BRR;
jackknife; NCS
1.
Introduction
In this paper we investigate the accuracy of the current
variance estimator used in the National Compensation
Survey (NCS) and some design-based alternatives. The
principal motivation for this study is that our current
variance estimator is based in part on the assumption, which
is not completely true in NCS, that the sampling of
establishments is done independently in each sample area.
Three of the Bureau of Labor Statistics (BLS)
compensation survey programs, the Employment Cost Index
(ECI), the Employee Benefits Survey (EBS), and the
locality wage surveys, were integrated, creating one
comprehensive National Compensation Survey (NCS)
program. The ECI publishes national indexes that track
quarterly and annual changes in employers’ labor costs, both
wages and benefits, and also cost level information,
previously annually but now quarterly, on the cost per hour
worked of each component of compensation. The cost level
counterpart of the ECI is referred to as Employee Costs for
Employee Compensation (ECEC).
The former EBS
publishes annual incidence and detailed provisions of
selected employee benefit plans and is now known as the
Benefits Incidence and Provisions Product. The locality
wage surveys program publishes locality and national
occupational wage data. There are two samples used in the
NCS program, a larger wage sample, used for those survey
products that only present wage data, and a smaller wage
and benefits subsample used for products that include
benefits data. We will only be considering the wage sample
in this paper.
The NCS wage sample mostly consists of five rotating
panels at any point in time. It is selected using a three-stage
stratified design with probability proportionate to
employment sampling at each stage. The first stage of
sample selection is a sample of 154 areas (PSUs); the
second stage is a sample of establishments within sampled
areas, with the frame of establishments presently partitioned
into 23 industry sampling strata; and the third stage is a
sample of occupations within sampled areas and
establishments.
Currently, government and aerospace
manufacturing establishments, although included in the
survey estimates, are not part of the rotating panel design.
A key aspect of the sample design is that the second
stage of sampling, the selection of establishments, is not
done independently in each of the 154 sample areas. If we
had sampled independently in each sample area × industry
sampling stratum cell, then the total number of sampling
cells would be 154×23=3542 (since there are 23 industry
sampling strata). Since we require a minimum of one unit
in each nonempty cell, the total number of sample units to
meet the minimums would then comprise a large proportion
of each rotating sample panel, resulting in an inefficient
design.
To alleviate this problem, we only sample
independently in 52 larger areas.
The three nonmetropolitan areas from Alaska and Hawaii are collapsed to
form one cluster; and the remaining 99 areas are collapsed
into a second, much larger cluster. The reason for the two
separate clusters is that the NCS wage estimates are used in
determining locality pay for federal workers, but only in the
48 contiguous states and the District of Columbia, which led
us to sample Alaska and Hawaii separately from the
remainder of the nation. For each of these clusters, there is
a single allocation for each industry stratum to all the PSUs
in the cluster. The set of frame establishments in the cluster
is sorted first by PSU and then by the frame employment. A
systematic PPS sample of establishments is then selected,
with the measure of the size being the product of the frame
employment and the reciprocal of the probability of
selection of the PSU in which the establishment is located.
Our current method of calculating variance estimates
uses the variation of balanced repeated replication (BRR)
developed by Fay (1989). BRR assumes that the sample
design is two PSUs per stratum with replacement. Since our
design is actually one PSU per stratum, we artificially
created a two PSU per stratum design by collapsing, for the
most part, pairs of noncertainty PSUs together to form what
we call variance strata. This is a common technique know
as collapsed stratum variance estimation. The certainty
PSUs were split in half to form additional variance strata,
with half of the noncertainty sample establishments in each
industry sampling stratum in one variance PSU and half in
the other, and with the certainty establishments in both
variance PSUs. Because the variance strata that we formed
do not completely reflect our actual design, it should be
anticipated that our variance estimator is biased. The bias
arising from the use of a collapsed stratum variance
estimator has been well studied (Wolter 1985). In addition,
it is assumed in BRR that sampling is done independently in
each variance PSU. This is not the case for the two multiarea clusters and as a result there is another source of bias,
which will be the major focus of the paper.
This issue was studied previously in Wang, Dorfman,
and Ernst (2004), where a hybrid variance estimator
dependent on both model-based and designed-based ideas
was introduced as an alternative to our current variance
estimator. Although that paper developed from the NCS
problem, the empirical part of the paper was based on an
artificial population. In this paper we estimate the accuracy
of the variance estimates for our current variance estimator,
and several design-based modifications and alternatives,
using both the jackknife (Wolter 1985) and BRR, through
an empirical investigation that more closely reflects the
NCS sampling. This is done by first selecting 100 samples
of establishments from the our sampling frame, the
Longitudinal Database (LDB), using a selection method
similar to that used in the NCS. Next, for each variance
estimation method under consideration, we calculate 100
variance estimates, one per sample, and then compare the
variance estimates with an approximation of the true
variance.
Initially, we intended to take 100 independent firststage samples (with 154 PSUs per sample), and then within
each of these samples of PSUs, choose a single sample of
establishments. Unfortunately, the setup costs of managing
a frame and determining sample sizes are high even for a
single sample of PSUs, let alone 100 independent samples.
Fortunately, for a fixed set of PSUs, the marginal costs of
taking multiple samples of establishments are relatively low.
Therefore, for this study, we selected 100 independent
samples from only one set PSU-sample, namely, the current
NCS sample of PSUs.
The true variance is approximated as the sum of a
between-PSU component, and a component from the second
stage of sampling. For between-PSU variance we use the
standard variance formula for a single-stage, one-PSU per
stratum design, or a Taylor series approximation in the case
of mean wages. The phase-two variance component is
computed by taking the mean squared deviation of the 100
sample estimates from their average.
One drawback to the empirical approach considered is
that it does not reflect the third stage of sampling, the
sampling of occupations. There appears to be no simple
way to reflect this stage of sampling in this type of study
since the frame contains no occupational information.
Section 2 details the various variance estimators that are
studied. The following is the key motivation for these
different variance estimators. The sample weights used in
this study guarantee that under certain conditions sample
estimates Ê of total employment agree with the universe
total E for all samples, that is, in keeping with common
usage,“ Ê is calibrated with respect to E,” and hence that the
variance of Ê is zero.
These conditions are that for each establishment the
employment used in calculating the estimates be the same as
the frame employment used in selecting the establishment
and also the same as the employment used in obtaining a
measure of size for each area that is used in the selection of
sample areas. (In the NCS, however, these will not be
equal, since PSUs were selected using 1995 data,
establishments are sampled annually, and once an
establishment is sampled, its data is collected for several
years.)
All the variance estimators considered involve
calculation of a set of replicate estimates based on a set of
replicate weights for each replicate. Depending on how
these weights are calculated, the replicate estimates of total
employment may or may not always be E. If they are
always E then the variance estimate of Ê is always zero, in
which case the variance estimator of Ê is said to be zerocalibrated (Sverchkov, Dorfman, Ernst, and Guciardo
2004). It is desirable for a variance estimator corresponding
to Ê to be zero-calibrated if Ê is calibrated with respect to
E. We surmised before doing the empirical study that zerocalibrated variance estimators produce more accurate
estimates of variance than variance estimators that are not
zero-calibrated, with the verification of this hypothesis a key
point of this study. Furthermore, among the variance
estimators that are not zero-calibrated, we believe that some
tend to produce estimates of the variance of Ê that are
closer to zero, which is to be preferred if Ê is calibrated
with respect to E.
In section 3 we present our empirical results.
2. Jackknife and BRR Variance Estimators Studied
Both the jackknife and BRR are based on the
construction of variance strata, with each variance stratum
consisting of a set of variance PSUs. These are described in
Section 2.1. Next, for both jackknife and BRR, the first step
in the calculation of variance estimates is to form a set of
replicate estimates in which for each replicate a portion of
the sample is either eliminated or down weighted in
comparison with the full sample weights used in the basic
estimates, and a portion has their sample weights increased.
The full sample weights are presented in Section 2.2.
Fifteen different approaches to replicate weighting were
considered in this study. For jackknife, five basic strategies
were used, and for each strategy, we explored two different
sets of variance strata: 90 strata and 27 strata. Jackknife
methods are presented in Section 2.3. For BRR, an
analogous set of five methods were used, yet only for the 90
strata case. BRR methods are presented in Section 2.4.
Finally, in Section 2.5, the variance estimation formulas for
the jackknife and BRR are presented. From these formulas
it is easily seen that that if the replicate estimates
corresponding to an estimator must all be the same as the
full sample estimate, then the variance estimate must always
be zero.
2.1 Variance PSUs and Variance Strata
The variance PSUs are a modification of the actual
sample PSUs and each variance stratum is a collection of
variance PSUs, with n g variance PSUs in the g-th variance
stratum. For the jackknife each actual noncertainty PSU
corresponds to a variance PSU and each certainty PSU is
split in half to form two variance PSUs as explained in the
description of the current variance estimation method in the
Introduction. We consider two possibilities for the variance
strata for the jackknife, one uses the same 90 variance strata
we use for current BRR estimates in NCS publications, the
other uses 27 strata. In the 90 strata case, two variance
PSUs, for the most part, are collapsed together to form a
variance stratum. In the case of certainty PSUs the
corresponding two variance PSUs constitute a variance
stratum. For noncertainty PSUs the PSUs collapsed
together most be from the same census division, among the
nine census divisions, and must be all metropolitan or all
nonmetropolitan. For any of these 18 categories for which
there are an odd number of variance PSUs, one of the
variance strata corresponding to the category consists of
three variance PSUs instead of two. In the second set of
variance strata for the jackknife there are 27 variance strata
corresponding to the 9 census divisions × {certainty
metropolitan, noncertainty metropolitan, nonmetropolitan
areas}, that is, only one variance stratum per category. We
surmised that the second set of variance strata might yield
higher variance estimates because generally more variance
PSUs that might not as similar are collapsed together to
form variance strata with this approach.
For BRR, we considered only one set of variance strata,
which is identical to the first set of 90 mentioned for the
jackknife and identical to the 90 used in to get NCS
published variances. The only difference in the variance
PSUs is that for BRR n g = 2 for all strata So for BRR, if a
variance stratum has 3 sampled PSUs, then two of these
PSUs are combined to form a single variance PSU in order
to be able to form variance strata with two variance PSUs in
all cases.
2.2 Sample Weighting
Our current weighting system in NCS national wage
publications includes factors reflecting the selection
probabilities at each of the three stages of selection, and
establishment and occupational nonresponse adjustment. It
does not presently include any type of calibration or
benchmarking, although other survey products in the NCS
program do benchmark industry employment estimates to
LDB counts, and there are plans to do this for NCS wage
estimates.
In our simulation study, the sample weight, wigjk ,
assigned to establishment k in industry stratum j in variance
PSU i within variance stratum g, is the product of the
reciprocal of the probability of selecting PSU ig and the
reciprocal of the probability of selecting establishment jk
conditional on the set of sample PSUs. This is known as the
full sample weight to distinguish it from the replicate
weights described in Sections 2.3, and 2.4. Weighting
factors representing the third stage of sampling and
nonresponse adjustment are not used, since there is no third
stage of sampling or nonresponse in our study.
We also consider an alternative weighting, where the
sample weights just described are adjusted to yield estimates
of employment that for each industry stratum j are
benchmarked with respect to its frame employment, E j ,
since the accuracy of various variance estimation procedures
may be different for weights with and without this
adjustment (Sverchkov et al. 2004).
2.3 Replicate Weights for the Jackknife
Both the jackknife and BRR variance estimators require
that a set of replicate estimates be first calculated, where in
the calculation of each replicate estimate a portion of the
sample is either eliminated or down weighted, and a portion
has their sample weights increased. For both the jackknife
and BRR we consider five types of replicate weighting. For
both types of variance estimators, the weight wigjk before
replicate adjustment is the sample weight without
benchmarking to industry employment totals.
In addition, the following modifications to wigjk are
used prior to the replicate adjustment. In the case of a
certainty establishment in a certainty area, the sample
weight is cut in half to compensate for the fact that the
establishment is placed in two variance PSUs. In addition,
in the BRR case, for those variance strata that consist of
three actual PSUs, two of which are collapsed together to
form a variance PSU, the sample weights for establishments
in the collapsed variance PSU are multiplied by 3/4 and
those in the other variance PSU by 3/2 to compensate for the
difference in the sum of the weighted employments in these
two variance PSUs before this adjustment.
In the case of the jackknife variance estimator
considered here, a single variance PSU i′ in a single
variance stratum g′ is deleted in each replicate. (There are
other forms of the jackknife, not considered in this paper
that delete more than one variance PSU in each replicate.)
The
five
replicate
weights,
wl*igjk , l = 1,...,5 ,
for
establishment igjk, for the replicate corresponding to this
deletion are as follows:
0
if i = i ′, g = g ′

 n g wigjk
w1*igjk [i ′g ′] = 
(2.1)
if i ≠ i ′, g = g ′,
 ng − 1
= w
otherwise
igjk

where n g is the number of variance PSUs in variance
stratum g.
w2*igjk [i′g ′] only differs from w1*igjki′g ′ in that
w2*ig ′jk [i ' g '] =
n jg ′ wig ′jk
n j[i′g ′]
if i ≠ i ′
(2.2)
where n jg ′ is the number of sample establishments that are
in both industry stratum j and variance stratum g′ and
n j[i′g ′] is the number of such establishments that are not in
variance PSU i ′g ′ .
w3*igjk [i′g ′] only differs from w1*igjki ′g ′ in that
w5*igjk [i ′g ′] =
E j w1*igjk [i ′g ′]
Eˆ ′ ′
Eˆ Fjg ′ wig ′jk
if i ≠ i ′
Eˆ ′ ′
where Ê j = ∑ wigjk Eigjk , Eˆ j[i′g ′] = ∑ w1*igjk [i′g ′] E igjk . The
where
Ê Fjg ′ = ∑ wig ′jk E Fig ′jk , Eˆ Fj[i′g ′] = ∑
key difference between these two weights is that for
w5*igjk [i′g ′] , the benchmark factors E j / Eˆ j[i′g ′] are
w3*igjk [i′g ′] =
j [i g ]
Fj[ i g ]
i ≠ i′
ik
igk
∑ wig′jk E Fig′jk
and
k
E Figjk is the frame employment for establishment igjk
used in the selection of the establishment.
The rationale for these first three replicate weights is as
follows. As noted in the Introduction, Ê is calibrated with
respect to E, that is ∑ wigjk E igjk = E , (where E igjk is the
igjk
employment of establishment igjk at the time for which
estimates are made and E is the universe employment at this
time), provided that the employment used to select the
sample establishments, the employment used to select the
sample areas, and the employment used in estimation are all
the same. Consequently, we would want under these
conditions for each replicate estimate of total employment
to equal the full sample estimate, in which case the
estimator of the variance of Ê is zero-calibrated. Now
replicate weight 1 does not result in replicate estimates that
are equal to the full sample estimate and hence does not
produce a zero-calibrated variance estimator. Replicate
weight 2 attempts to obtain this zero-calibration by means of
adjustment (2.2). It does not completely succeed, however,
in part because for each industry stratum, there are different
sampling intervals for each of the 54 area sampling clusters,
and in part because weighted employment for a certainty
establishment differs from that of noncertainty
establishments and other certainty establishments in the
same sampling cell. Replicate weight 3 does completely
succeed in obtaining this zero-calibration. In fact, for
replicate weight 3 the variance estimator of total
employment is zero even if the establishment sampling is
from a different frame than the area sampling, despite the
fact that in that case Ê is not calibrated with respect to E
and hence the variance estimator with replicate weight 3
tends to underestimate the true variance.
These first three replicate weights are intended for the
case when the weights wigjk are not benchmarked with
respect to E j , where E j is the frame employment of
industry j at the time for which estimates are calculated.
The final two replicate weights are for the case when
the wigjk are benchmarked with respect to E j . That is,
w4*igjk [i ′g ′] =
E j w1*igjk [i′g ′]
Eˆ j
igk
recalculated for the replicate corresponding to each deleted
PSU i ′g ′ , so that the replicate estimates of employment are
calibrated with respect to E j ; while this is not case for
w4*igjk [i′g ′] since the benchmark factor for each replicate
weight is the same as for the corresponding full sample
weight.
2.4 Replicate Weights for BRR
For BRR we consider an analogous set of five replicate
weighting adjustments to the five considered for the
jackknife. While in the jackknife one variance PSU is
dropped in the formation of the replicate weights, in BRR
one variance PSU is either dropped or down weighted from
each variance stratum, and the sample establishments in the
other variance PSU in each variance stratum has its weights
increased.
For BRR, we let S r denote, the set of variance PSUs
that have their weights increased in the r-th replicate and S r′
denote the complimentary set. The S r are determined by
specifying a Hadamard matrix as described in Wolter
(1985).
We let f denote a factor applied to variance PSUs in
S r , with f ′ = 2 − f the corresponding factor in S r′ . For
standard BRR, f = 2 and hence the variance PSUs in S r′
are dropped. Fay’s method corresponds to 1 < f < 2 with
f = 1.5 the most commonly used value for Fay’s method,.
For Fay’s method, all units receive positive weights.
The five replicate weights, wl*igjkr , l = 1,...,5 , for BRR,
analogous to the replicate weights for the jackknife, with the
subscript r denoting the replicate r, are
 fwigjk if ig ∈ S r
(2.3)
w1*igjkr = 
′ igjk if ig ∈ S r′
 f w
w2*igjkr =
n j w1*igjkr
fn jr + f n′ ′jr
where n jr , n′jr and n j are the total number of sample
establishments in S r , S r′ , and in all variance PSUs,
respectively, in industry j.
Eˆ Fj w1*igjkr
w3*igjk =
Eˆ Fjr
where Ê Fj = ∑ wigjk E Figjk , Eˆ Fjr =
igk
∑ w1*igjkr EFigjk .
igk
Note that, unlike in the case of the jackknife, replicate
weight 3 does not completely restore the calibration with
respect to E. This is solely due to variance strata with three
actual PSUs, and the weighting factors of 3/4 and 3/2 that
are used on the corresponding variance PSUs, as explained
in Section 2.3, and we anticipated that the impact of this
problem would be small.
The fourth and fifth replicate weights are completely
analogous to that for the jackknife. That is,
w4*igjkr
E j w1*igjkr
=
Eˆ
j
w5*igjkr =
E j w1*igjkr
Eˆ
jr
where Eˆ jr = ∑ w1*igjkr Eigjk .
igk
2.5 Variance Estimation Formula for BRR and the Jackknife
For the jackknife and BRR, the replicate estimates are
obtained using the same form of estimator as the full sample
estimator, with full sample weights replaced by the replicate
weights described in the previous two subsections.
For the jackknife, with Θ̂ the full sample estimator and
Θ̂ig the corresponding replicate estimator with PSU ig
deleted, the jackknife variance estimator v J is
 ng − 1
ˆ
ˆ )2
 (Θ
−Θ
vJ = ∑
 n g  ∑ [ig ]
g 
 i
For the BRR with factor f and with Θ̂ rf the replicate
estimate corresponding to replicate r and factor f, the BRR
variance estimator with factor f, denoted v BRRf , is
v BRRf =
1
2
∑ (Θˆ rf
ˆ )2
−Θ
R(1 − f ) r
where R is the number of replicates.
3. Empirical Results
The empirical study began with selection of 100
samples of establishments from our LDB sampling frame.
We generally followed the current method of selection for
the NCS wage sample as described in Ernst, Guciardo,
Ponikowski, and Tehonica (2002). However, while under
the current method most of the sample is obtained from a
rotating panel design, with each annual sample panel
consisting of one fifth of the entire rotating sample selected
from an updated frame, we selected all five sample panels
from a single frame to simplify our work. We excluded
from our sampling frame and our estimates all government
and aerospace manufacturing establishments.
This is
because, although such establishments do contribute to our
current production estimates, they were selected from an
older and somewhat different sample design. Also, in
addition to calculating variance estimates for the entire
nation (All U.S.), we calculated variance estimates for the
domain “remaining 99 primary sampling strata” (Rem 99)
corresponding to the cluster of the remaining 99 PSUs. This
cluster is of greatest concern, since the sampling is not
independent in each variance PSU corresponding to this
cluster. (We did not study the three-PSU Alaska-Hawaii
cluster, since this cluster is very small.). In our weight
adjustments, two definitions for industry j were used. For
All U.S. and its subdomains, the j were the 23 industry
strata within All U.S. For Rem 99 and its subdomains, the j
were the same 23 industry strata, yet restricted to Rem 99.
This ensures that Rem 99 total employment estimates are
calibrated, as well as the All U.S. estimates.
The results are presented in 10 different tables,
covering all combinations of domains (All U.S., Rem 99,
and their 23 industry strata), estimates (total employment
and mean quarterly wages), and variance estimation method
(jackknife with 90 variance strata, jackknife with 27
variance strata, and Fay’s BRR with f = 1.5 ). Let the two
jackknife methods be known as jackknife-90 and jackknife27. Tables 1 and 2 display detailed results for the two
domains All U.S. and Rem 99. Tables 4-10 have summary
results for the 23 industries in both All U.S. and Rem 99.
Tables 4-7 compare standard errors across the various
methods. Tables 7-10 provide summary information on
95% confidence interval coverage rates.
In Tables 1 and 2, the rows correspond to variance
estimators, for All U.S. or Rem. 99. The first column is an
estimate σ̂ of the true standard error for the nonbenchmarked case. The next 9 columns can be divided into
three groups of three with each group corresponding to one
of the sets of replicate weights wl*igjk [i′j ′] , l = 1,2,3, in the
case of the jackknife, and wl*igjkr , l = 1,2,3, in the case of
BRR. The first column in each group is a ratio equal to the
“root average variance estimate” (RAVE), divided by the
estimate σ̂ of the true standard error in column 1. The
RAVE is the square root of the arithmetic mean over the
100 samples of the variance estimate Vˆ (i ), i = 1,...,100
(calculated using the variance estimation formula
corresponding to the group). Values near 1 are to be
preferred in this column. The second column in a group is
the percentage of the 100 samples for which the estimated
95% confidence interval covers the true value. The final
column in each group is an estimate of the “percent relative
root mean squared error”:
1
100
100
i =1 

2
Vˆ (i ) − σˆ 
σˆ
∑

which is the square root of the mean squared error of the
100 estimates of standard error, expressed as a percent of
the estimated true standard error.
% Rel. RMSE = 100
The remaining columns will display output for the
benchmarked cases ( l = 4,5 ). For mean wages, the format
is identical. For total employment, however, we will only
show the square root of the average variance estimate for
l = 4 . All other results are trivial, because for both
methods, Ê is calibrated with respect to E, and for l = 5 ,
all variance estimators are zero-calibrated. The former
property ensures that the true variance is zero, and that each
confidence interval contains E (we are using closed
intervals, which contain Ê =E, even when the variance
estimate is zero). The latter property ensures that, for l = 5 ,
all standard error estimates equal zero, in which case the
standard error of the standard error estimator is zero.
Table 1 shows output for total employment, for All U.S
and Rem 99. The square roots of the average variance
estimates (or RAVE values) decrease from method 1 to 2 to
3. Values for method 1 are above the estimated true standard
error, method 2 is above or very close, and method 3 is
substantially smaller, which all was expected (see Section
2.3). Methods 1 and 2 have all of their confidence intervals
containing frame employment, yet method 3 has less
coverage. The estimated MSEs of the standard error
estimators are relatively high for most methods, yet this is
more because of the contribution to the MSE of the bias
squared term then than the variance term. The bias squared
term of this MSE was estimated by
variance
term
by
1
100
∑i =1 
100
(αˆ − σˆ )2 and
Vˆ (i ) − αˆ 

2
,
the
where
100
1
αˆ = 100
∑i=1 Vˆ (i) . These terms are omitted from the
tables. Fay’s BRR for method 2 has the least bias in the
standard error estimator. For methods 1 and 2, Jackknife-90
have RAVE values similar to Fay’s BRR, which was
expected, because they use the same 90 variance strata.
Jackknife-27 has substantially higher RAVE values for
methods 1 and 2, which was expected, because it has fewer
variance strata and hence a larger collapsed stratum effect.
In Table 2, we have output for mean wages for All U.S.
and Rem 99. The RAVE values are above the estimated true
standard error. For All U.S., as expected, the RAVE values
drop from methods 1 to 2 to 3, and method 5 (where we
rebenchmark for each replicate) results in lower RAVE
values than method 4. Some results for Rem 99 were
unexpected, since RAVE values for both jackknife methods
increase from method 1 to 2 to 3, and from method 4 to 5.
Overall, Fay’s BRR has RAVE values closest to the
estimated true value, particularly for methods 2 and 5, and
confidence interval coverage is closer to 95%, in general.
Table 1. Output for Total Employment
Domain,
Variance
Estimator
All US, Jack-90
All US, Jack-27
All US, Fay's
Rem 99, Jack-90
Rem 99, Jack-27
Rem 99, Fay's
Non-Benchmarked Cases
Bench
Method L = 1
Method L = 2
Method L = 3
L=4
Conf
Conf
Conf
σ̂ Avg (Vˆ ) Inter % Rel Avg (Vˆ ) Inter % Rel Avg (Vˆ ) Inter % Rel Avg (Vˆ )
Rmse
Rmse
Rmse
σ̂
σ̂
σ̂
Cov
Cov
Cov
1000
1000
of Vˆ
of Vˆ
of Vˆ
%
%
%
True
688
688
688
659
659
659
2.08
9.09
1.90
1.85
3.57
1.77
100
100
100
100
100
100
108
809
90
86
257
77
1.41
6.73
1.00
1.38
1.44
1.00
100
100
100
100
100
100
41
573
3
39
44
3
0.33 100
0.17 83
0.13 62
0.33 100
0.18 99
0.10 68
67
83
87
67
82
91
1383
6241
1278
1095
2292
1095
Table 2. Output for Mean Quarterly Earnings
Domain,
Variance
Estimator
All US, Jack-90
All US, Jack-27
All US, Fay's
Rem 99, Jack-90
Rem 99, Jack-27
Rem 99, Fay's
Benchmarked Cases
Non-Benchmarked Cases
Method
L=4
Method L = 5
Method
L
=
1
Method
L
=
2
Method
L
=
3
True
True
Conf
Conf
Conf
Conf
Conf
Avg (Vˆ ) Inter % Rel Avg (Vˆ ) Inter % Rel Avg (Vˆ ) Inter % Rel
Avg (Vˆ ) Inter % Rel Avg (Vˆ ) Inter % Rel
σˆ
Rmse
Rmse
Rmse σˆ
Rmse
Rmse
σˆ
σˆ
σˆ
σˆ
σˆ
Cov
Cov
Cov
Cov
Cov
of Vˆ
of Vˆ
of Vˆ
of Vˆ
of Vˆ
%
%
%
%
%
47
47
47
75
75
75
1.33
3.82
1.28
1.52
1.66
1.50
100
100
100
100
100
100
35
282
32
53
67
53
1.25
2.79
1.17
1.67
1.79
1.39
98
100
96
100
100
100
29
179
23
69
80
41
1.31
2.38
1.13
1.87
2.08
1.45
100
100
96
100
100
100
34
138
22
88
108
47
47
47
47
78
78
78
1.31
3.80
1.27
1.49
1.63
1.47
100
100
99
100
100
100
34
280
31
50
64
50
1.25
3.27
1.12
1.68
1.80
1.37
100
100
99
100
100
100
29
227
21
69
81
40
True
Jack-90 (L=1)
Jack-90 (L=2)
Jack-90 (L=3)
Jack-27 (L=1)
Jack-27 (L=2)
Jack-27 (L=3)
Fay's (L=1,2,3)
−
22
13
35
4
4
4
22
78
−
26
65
0
0
0
65
87
74
−
87
13
4
22
74
Fay's (L=1,2,3)
Jack-27 (L=3)
Jack-27 (L=2)
Jack-27 (L=1)
Jack-90 (L=3)
65 96 96 96
35 100 100 100
13 87 96 78
96 96 100
−
4 −
48 22
4 52 −
26
0 78 74 −
39 96 96 91
78
35
26
61
4
4
9
−
True
Jack-90 (L=1)
Jack-90 (L=2)
Jack-90 (L=3)
Jack-27 (L=1)
Jack-27 (L=2)
Jack-27 (L=3)
Fay's (L=1,2,3)
−
17
17
39
9
0
9
30
83
−
35
61
17
9
13
65
83
65
−
74
61
35
48
70
61
39
17
−
35
9
4
39
91 100
83 91
39 65
65 91
78
−
22 −
30 78
70 83
91
87
52
96
70
13
−
78
Fay's (L=1,2,3)
Jack-27 (L=3)
Jack-27 (L=2)
Jack-27 (L=1)
Jack-90 (L=3)
Fay's (L=3)
0
0
0
9
0
0
9
0
0
−
Jack-90 (L=2)
39
17
35
91
13
30
91
17
−
91
Percent of 23
Domains
where
Col > Row
Jack-90 (L=1)
13 87
0 78
9 70
17 100
0 22
0 61
− 100
0 −
9 83
91 100
Fay's (L=2)
Table 6. Comparing Average Standard Errors
Mean Wage, 23 Industries in Rem 99
Fay's (L=1)
Jack-27 (L=3)
Jack-27 (L=2)
Jack-27 (L=1)
13 96 52
0 96 30
0 78 43
− 100 96
0 −
26
4 74 −
83 100 91
0 78 39
9 87 70
91 100 100
Percent of 23
Domains
where
Col > Row
True
True
78 52
−
Jack-90 (L=1) 22 −
26
Jack-90 (L=2) 48 74 −
Jack-90 (L=3) 87 100 91
Jack-27 (L=1)
4
4 22
Jack-27 (L=2) 48 70 57
Jack-27 (L=3) 87 100 91
Fay's (L=1)
13 22 30
Fay's (L=2)
61 83 65
Fay's (L=3)
100 100 100
Jack-90 (L=3)
Jack-90 (L=2)
Jack-90 (L=1)
Percent of 23
Domains
where
Col > Row
True
Table 4. Comparing Average Standard Errors
Total Employment, 23 Industries in Rem 99
Table 5. Comparing Average Standard Errors
Mean Wage, 23 Industries in All U.S.
Jack-90 (L=2)
0
0
0
9
0
0
4
0
0
−
numerator and denominator of the mean wage estimator,
and hence cancel out leaving only the formula for l = 1 .
For mean wages in each industry j for benchmarked
cases l = 4,5 , variance estimates are the same as those for
the case l = 1 , because each industry j is a benchmark cell.
Therefore, all units have the same benchmark adjustment
factor in the replicate estimates, and the factor cancels out.
Jack-90 (L=1)
9 96 43
0 57
9
0 74
9
4 100 100
0
4
0
0
4
0
− 100 100
0 −
4
0 96 −
96 100 100
Fay's (L=3)
Fay's (L=2)
100
91
96
100
22
−
100
96
100
100
Fay's (L=1)
100
96
91
100
−
78
100
96
100
100
Jack-27 (L=3)
Jack-27 (L=2)
13
0
0
−
0
0
96
0
0
91
Jack-27 (L=1)
True
96 74
−
Jack-90 (L=1)
4 −
30
Jack-90 (L=2) 26 70 −
Jack-90 (L=3) 87 100 100
Jack-27 (L=1)
0
4
9
Jack-27 (L=2)
0
9
4
Jack-27 (L=3) 91 100 100
Fay's (L=1)
4 43 26
Fay's (L=2)
57 91 91
Fay's (L=3)
100 100 100
Jack-90 (L=3)
Jack-90 (L=2)
Jack-90 (L=1)
Percent of 23
Domains
where
Col > Row
True
Table 3. Comparing Average Standard Errors
Total Employment, 23 Industries in All US
For total employment in each industry j for
benchmarked cases l = 4,5 , the situation is the same as it is
for total employment in all industries combined; that is, for
both methods Ê is calibrated with respect to E, and for
l = 5 , all variance estimators are zero-calibrated.
In Tables 5 and 6, for mean wages, jackknife 27 with
l = 1,2 dominates, although less so for industries in Rem
99. All the values in row 1 of each table are greater than 50
% , with Jackknife-90 with l = 3 having the values closest
to 50%. For mean wages, for an industry j, the Fay’s BRR
variance estimates for l = 2,3 are equivalent to those for
l = 1 because the replicate adjustment factors (multiplied
by wigjk ) are the same for all units in j, in both the
True
Tables 3-8 record, for each cell, the percentage of the
23 industry domains where the RAVE value of the method
indicated by the column exceeds the RAVE value of the
method indicated by the row. The first row and column
refers to the estimated true standard error for the nonbenchmarked cases.
In Table 3 (for total employment, All US industries),
Jackknife-27 for l = 1,2 dominates (that is, the percentages
in the corresponding columns are high) and method 3 in
most cases is dominated by methods 1 and 2. Fay’s BRR
with l = 2 has a RAVE value greater than the true standard
error estimate for 43% of domains, which is closest to 50%
of all methods. (Note that a dominance of about 50% over
the estimated true standard error seems desirable as
supporting unbiasedness of the variance estimator.) In
Table 4, (for total employment, Rem 99 industries),
jackknife 27 is less dominant. Both jackknife-27 and
jackknife-90 with l = 2 have 52% dominance over the
estimated true variance, the closest to 50%.
70
35
30
61
30
17
22
−
Tables 7-10 summarize confidence interval coverage
rates; that is, the percent of the samples where the estimated
95% confidence interval contains the true employment or
wage. Each table-cell displays the percentage of the 23
industries that have coverage rates X within the given range
Tables 7 and 8 clearly show how extreme the
confidence interval coverage can be for employment. Most
rates are high, but rates are low for l = 3 since variance
estimates are too close to zero.
X > 98%
39 100 100
Fay's (L=3)
Fay's (L=2)
96
96
13
96
87
4
90% < X < 98%
4
4
13
0
0
9
4
4
9
80% < X < 90%
0
0
0
0
0
9
0
0
9
X < 80%
0
0
48
0
0
70
0
9
78
X > 98%
Fay's (L=3)
Fay's (L=2)
Fay's (L=1)
Jack-27 (L=3)
Jack-27 (L=2)
Jack-27 (L=1)
Jack-90 (L=3)
Jack-90 (L=2)
Jack-90 (L=1)
Table 8. Confidence Interval Coverage Rates
Total Employment, 23 Industries in Rem 99
Percent of 23
Domains
where 95%
Confid Interv
Coverage X is:
96
87
39
96
87
17
96
83
4
90% < X < 98%
0
4
17
0
0
4
0
0
9
80% < X < 90%
0
0
0
0
4
4
0
0
0
X < 80%
4
9
43
4
9
74
4
17
87
Tables 9 and 10 for mean wages show less extreme
coverage rates, yet coverage is usually too high. Jackknife90 for l = 3 and Fay’s BRR have the least amount of
overcoverage.. Also in Table 10, there is some evidence of
undercoverage.
Fay's (L=1,2,3)
Jack-27 (L=3)
Jack-27 (L=2)
Jack-27 (L=1)
Jack-90 (L=3)
Jack-90 (L=2)
Jack-90 (L=1)
Table 9. Confidence Interval Coverage Rates
Mean Wage, 23 Industries in All US
Percent of 23
Domains where
95% Confid
Interv Coverage
X is:
X > 98%
48
52
39
78
78
70
43
90% < X < 98%
43
39
52
13
13
22
48
80% < X < 90%
4
4
4
4
4
4
4
X < 80%
4
4
4
4
4
4
4
4.
Fay's (L=1,2,3)
Jack-27 (L=3)
Jack-27 (L=2)
Jack-27 (L=1)
Jack-90 (L=3)
X > 98%
52
57
48
57
65
52
57
90% < X < 98%
26
26
30
22
22
35
22
80% < X < 90%
9
4
9
9
4
4
9
13
13
13
13
9
9
13
X < 80%
Fay's (L=1)
Jack-27 (L=3)
Jack-27 (L=2)
Jack-27 (L=1)
Jack-90 (L=3)
Jack-90 (L=2)
Percent of 23
Domains
where 95%
Confid Interv
Coverage X is:
Jack-90 (L=1)
Table 7. Confidence Interval Coverage Rates
Total Employment, 23 Industries in All US
Jack-90 (L=2)
Percent of 23
Domains where
95% Confid
Interv Coverage
X is:
Jack-90 (L=1)
Table 10. Confidence Interval Coverage Rates
Mean Wage, 23 Industries in Rem 99
Conclusions
For total employment, we found method 2, particularly
for Fay’s BRR, to be the class of standard error estimators
with the least bias for most key domains when the estimates
are not benchmarked. Method 1 is generally too high,
method 2 is better, method 3 is too low. This is consistent
with what we surmised when we introduced these three
methods in Section 2. Method 5 tends to do better than
method 4, since it re-benchmarks for each replicate.
For mean wages, these trends are less prominent, and in
some situations reversed, particularly for jackknife-90 and
the Rem 99 domains. Yet Fay’s BRR does well in many
cases, especially for method 2.
5.
References
Ernst, L. R., Guciardo, C. J., Ponikowski, C. H., and
Tehonica, J. (2002). Sample Allocation and Selection for
the National Compensation Survey. 2002 Proceedings of
the American Statistical Association, Section on Survey
Research Methods, [CD-ROM], Alexandria, VA:
American Statistical Association.
Fay, R. E. (1989) Theory and Application of Replicate
Weighting for Variance Calculations. Proceedings of the
American Statistical Association, Section on Survey
Research Methods, 212-217.
Sverchkov, M., Dorfman, A. H., Ernst, L. R., Guciardo, C.
J., and (2004). Zero-Calibrated Variance Estimators.
Proceedings of the American Statistical Association,
Section on Survey Research Methods, [CD-ROM].
Wang, S., Dorfman, A. H., and Ernst, L. R. (2004). A New
Variance Estimator for a Two-phase Restratified Sample
with PPS at Both Phases. Journal of Statistical Planning
and Inference, (to appear).
Wolter, K. M (1985). Introduction to Variance Estimation.
New York: Springer-Verlag.
Any opinions expressed in this paper are those of the
authors and do not constitute policy of the Bureau of Labor
Statistics.