A Method of Reconciling Discrepant National and Sub-National Employment and
Unemployment Estimates*
DRAFT, May 19, 2004, Revised August 16, 2004
P.A.V.B. Swamy, Jatinder S. Mehta and I-Lok Chang
Bureau of Labor Statistics, Statistical Methods Staff, 2 Massachusetts Ave, N.E., Washington, DC 20212
Key Words: Sample survey; Small area estimation; Errors-in-the-variables model; Heteroscedastic nonlinear regression; Stochastic
coefficient estimation; Iteratively rescaled generalized least squares estimation.
Considering two discrepant estimates of employment or unemployment for each of several geographical
areas, this paper finds two models of their conditional variations across areas both at a point in time and
through time. One of these models is shown to predict the nation’s employment or unemployment better
than the other model. It improves one of the two estimates and corrects for nonresponse and measurement-error biases and sampling and non-sampling errors in the other estimate for each area. The improved
estimate is equal to the corrected estimate.
1. Introduction
In this paper, we consider a situation where we
have at least two different estimates of
employment or unemployment made from
different data sets for each of several small
geographical areas. The sources of two of these
data sets are the Current Population Survey
(CPS) and the American Community Survey
(ACS). The problem with CPS data is that for
many sub-state areas, they are either unavailable
or too sparse. For such areas, Local Area
Unemployment Statistics (LAUS) estimates
produced by the LAUS Program within the
Bureau of Labor Statistics (BLS) will be used in
place of the CPS estimates.
Section 2 derives the first two moments of two
conditional distributions from a joint distribution
of the CPS and ACS estimators across states at a
point in time. The mean of one of these two
distributions is shown to predict the nation’s
employment or unemployment better than that of
the other distribution. An empirical example is
given in Section 3. Section 4 concludes.
2. A Method of Simultaneously Improving
One and Correcting Another of Two
Estimates
Let Y be the finite population value of a
population characteristic for a geographical area
at a period. Such a value may not be equal to the
value of the population characteristic obtained
through a census (100% sample) because (i)
while conducting the census, some of the units in
the population may not be measured (omissions)
or may be measured more than once
(duplication) and (ii) measurement, response,
editing, coding, and tabulating errors may enter
into the census data. For these reasons, we treat
Y as an unknown “true value” and try to learn
about it from the available data that are not
perfect. This concept of a “true value” is not
different from Cochran’s (1977, p. 377) idea of a
“correct value.” The population characteristics
that are of interest in this paper are employment
and unemployment. The geographical areas that
are of interest in this paper are states.
Let $Y_{it}$ denote the “true value” of Y for state i at time t. Let the number of states in the nation be n. Then

$Y_t = \sum_{i=1}^{n} Y_{it}$    (1)

is the “true value” of national employment or unemployment. The estimators of $Y_{it}$ based on data for time t only from the sample units within area i are called the direct survey estimators. Let $\hat{Y}_{it}^{ACS}$ and $\hat{Y}_{it}^{CPS}$ denote the direct ACS and CPS estimators of $Y_{it}$, respectively.¹ The ACS is not designed to make monthly estimates, so we can only get ACS estimates of the annual averages of $Y_{it}$. Let t index years.
2.1 Choosing Between the Means of Two
Conditional Distributions Derived from a
Joint Distribution of Two Estimators
* Any opinions expressed in this paper are those of the authors and do not constitute policy of the Bureau of Labor Statistics. Thanks are due to J.N.K. Rao, Tamara Zimmerman and Edwin Robison for helpful comments.
¹ The ACS is not fully implemented yet. Only smaller-scale versions of the ACS were conducted in 2000, 2001, and 2002. In this paper, these smaller-scale ACS data for 2000 are used to evaluate $\hat{Y}_{it}^{ACS}$.
We write $\hat{Y}_{it}^{CPS} = Y_{it} + \varepsilon_{it}^{CPS}$ and $\hat{Y}_{it}^{ACS} = Y_{it} + \varepsilon_{it}^{ACS}$, where $\varepsilon_{it}^{CPS}$ and $\varepsilon_{it}^{ACS}$ denote the errors of $\hat{Y}_{it}^{CPS}$ and $\hat{Y}_{it}^{ACS}$, respectively. Here inferential disasters can be avoided by making assumptions about $\varepsilon_{it}^{CPS}$ and $\varepsilon_{it}^{ACS}$ that are attentive to the CPS- and ACS-design features, respectively. It is true that $\hat{Y}_{it}^{CPS}$ and $\hat{Y}_{it}^{ACS}$ are unbiased in probability sampling. It is also true that adjustments have been made in these estimators for nonresponse and, in addition, measures have been taken to control the presence of various sources of nonsampling errors in the CPS and ACS. However, the approximations involved in these adjustments and measures (see Cochran (1977, Chapter 13)) may not have permitted the complete elimination of all the biases nonresponse and measurement errors have produced in the estimates that are computed from the CPS and ACS data. Because of the difficulty of ensuring that no such approximations are present, we define $\varepsilon_{it}^{CPS}$ and $\varepsilon_{it}^{ACS}$ broadly to include both sampling and nonsampling errors and assume that these errors may not have zero means. The amounts of biases that are still remaining in $\hat{Y}_{it}^{CPS}$ and $\hat{Y}_{it}^{ACS}$ after these estimators have been adjusted for bias are given by $E\varepsilon_{it}^{CPS}$ and $E\varepsilon_{it}^{ACS}$, respectively, which may not be zero.

The estimator, $\hat{Y}^{CPS}$, is the BLS standard for measuring Y at the national level but not at the sub-national level, where the CPS sample size may be too small to yield direct survey estimates with meaningful accuracy. This standard suggests that a necessary condition for an estimate $y_{it}$ of $Y_{it}$ to be accurate is that $\sum_{i=1}^{n} y_{it}$ is equal to the CPS estimate of $Y_t$.
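The common component $Y_{it}$ of $\hat{Y}_{it}^{CPS}$ and $\hat{Y}_{it}^{ACS}$ shows that there is a relationship between $\hat{Y}_{it}^{CPS}$ and $\hat{Y}_{it}^{ACS}$. One form of this relationship is obtained by replacing the left-hand side of the identity

$Y_{it} = Y_{it}$    (2)

by $\hat{Y}_{it}^{CPS} - \varepsilon_{it}^{CPS}$ and its right-hand side by $\hat{Y}_{it}^{ACS} - \varepsilon_{it}^{ACS}$. Doing so gives

$\hat{Y}_{it}^{CPS} = \alpha_{0it}^{CPS} + \alpha_{1it}^{ACS} \hat{Y}_{it}^{ACS}$    (3)

where $\alpha_{0it}^{CPS} = \varepsilon_{it}^{CPS}$ and $\alpha_{1it}^{ACS} = 1 - (\varepsilon_{it}^{ACS} / \hat{Y}_{it}^{ACS})$.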
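The algebra behind (3) can be checked numerically. The following is a minimal sketch with made-up values for the true value and the two errors (none of these numbers come from the paper); it only confirms that the coefficient definitions given with (3) reproduce the CPS estimate and recover the true value.

```python
# Illustrative check of identity (3); every number here is hypothetical.
Y_true = 100_000.0     # assumed "true value" Y_it
eps_cps = -1_500.0     # assumed error of the CPS estimate
eps_acs = 30_000.0     # assumed error of the ACS estimate

y_cps = Y_true + eps_cps          # CPS estimate = truth + CPS error
y_acs = Y_true + eps_acs          # ACS estimate = truth + ACS error

alpha0_cps = eps_cps              # intercept coefficient defined with (3)
alpha1_acs = 1.0 - eps_acs / y_acs  # slope coefficient defined with (3)

# Right-hand side of (3) should reproduce the CPS estimate.
rhs = alpha0_cps + alpha1_acs * y_acs
# The slope term alone should recover the true value.
recovered_truth = alpha1_acs * y_acs

print(abs(rhs - y_cps) < 1e-6, abs(recovered_truth - Y_true) < 1e-6)
```

Because the coefficients are built from the unknown errors, (3) holds exactly for any values chosen above; the statistical content enters only through the assumptions later placed on the coefficients.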
If $\hat{Y}_{it}^{CPS}$ is based on a very small sample, then it is not a very precise estimator. To improve its precision, its relationship with $\hat{Y}_{it}^{ACS}$ in (3) may be used. This relationship can “borrow strength” cross-sectionally, over time, or both, more effectively than the relationship, $\hat{Y}_{it}^{CPS} = Y_{it} + \varepsilon_{it}^{CPS}$ with $Y_{it}$ replaced by a function of some auxiliary variables and random error, when relevant variables are omitted from the latter relationship, when the included auxiliary variables are measured with error, or when the function’s true functional form is misspecified. Ignoring the effects of these misspecifications leads to incorrect inferences, as Freedman and Navidi (1986) and Swamy, et al. (2003a, b) show.

A big virtue of model (3) is that by eliminating $Y_{it}$ from (2) it avoids all the misspecification problems associated with the regressions of $Y_{it}$ on auxiliary variables. For example, consider the usual assumption that the auxiliary variables, which are used as some of the determinants of $Y_{it}$, are independent of ‘the’ excluded determinants themselves. This assumption, shown to be either meaningless or false by Pratt and Schlaifer (1984, 1988), is avoided in (3). What (3) cannot avoid, however, is the errors-in-the-variables problem: both its regressand $\hat{Y}_{it}^{CPS}$ and its regressor $\hat{Y}_{it}^{ACS}$ are subject to error. Here, it should be pointed out that to estimate (3), the econometric method of instrumental variables cannot be used because these variables do not exist in the usual situations, where (i) relevant explanatory variables are omitted, (ii) the included explanatory variables are measured with error, and (iii) the unknown functional forms of relevant relationships are misspecified, as shown by Chang, et al. (2000).
The roles of $\hat{Y}_{it}^{CPS}$ and $\hat{Y}_{it}^{ACS}$ in (3) can be interchanged by replacing the left- and right-hand sides of (2) by $\hat{Y}_{it}^{ACS} - \varepsilon_{it}^{ACS}$ and $\hat{Y}_{it}^{CPS} - \varepsilon_{it}^{CPS}$, respectively. Doing so gives

$\hat{Y}_{it}^{ACS} = \alpha_{0it}^{ACS} + \alpha_{1it}^{CPS} \hat{Y}_{it}^{CPS}$    (4)

where $\alpha_{0it}^{ACS} = \varepsilon_{it}^{ACS}$ and $\alpha_{1it}^{CPS} = 1 - (\varepsilon_{it}^{CPS} / \hat{Y}_{it}^{CPS})$.

We now show that (3) is preferable to (4). In these models, the regressors are not independent of their respective coefficients. To account for the correlation between $\alpha_{1it}^{ACS}$ and $\hat{Y}_{it}^{ACS}$ in (3), we assume that given $\hat{Y}_{it}^{ACS} = y_{it}^{ACS}$,

$\alpha_{0it}^{CPS} = \pi_{00}^{CPS} + \pi_{01}^{CPS} \left( \frac{1}{y_{it}^{ACS}} \right) + \zeta_{0it}^{CPS}$    (5)

$\alpha_{1it}^{ACS} = \pi_{10}^{ACS} + \pi_{11}^{ACS} \left( \frac{1}{y_{it}^{ACS}} \right) + \zeta_{1it}^{ACS}$    (6)

where the vectors, $(\zeta_{0it}^{CPS}, \zeta_{1it}^{ACS})'$, are independent of $\hat{Y}_{it}^{ACS}$ and are independently distributed with mean vector zero and constant covariance matrix $\sigma_1^2 \Delta_1$ as i varies for fixed t.²
Assumption (5) says that the mean function, $E(\alpha_{0it}^{CPS} \mid \hat{Y}_{it}^{ACS} = y_{it}^{ACS}) = \pi_{00}^{CPS} + \pi_{01}^{CPS}(1/y_{it}^{ACS})$, with one constant and one variable term, is a proxy for the bias, $E\varepsilon_{it}^{CPS}$, and $\zeta_{0it}^{CPS}$, representing the fluctuating component of $\varepsilon_{it}^{CPS}$, is independent of $\hat{Y}_{it}^{ACS}$. It is analogous to one of Cochran’s (1977, pp. 377-379) assumptions. The mean function is a good proxy for $E\varepsilon_{it}^{CPS} = 0$ because $\pi_{01}^{CPS}(1/y_{it}^{ACS})$ takes only tiny values and zero is a convenient value for $\pi_{00}^{CPS}$, as we show in Section 3 below. Good proxies for $E\varepsilon_{it}^{CPS} \neq 0$ can be obtained by adding additional regressors to (5). Quite possibly, some of these regressors are the proportions of the population in state i falling in black and Hispanic racial groups with different unemployment rates. Thus, using various proxies for $E\varepsilon_{it}^{CPS}$ including the one for $E\varepsilon_{it}^{CPS} = 0$ in (5), we can investigate the statistical consequences of various departures from the assumption that $E\varepsilon_{it}^{CPS} = 0$.

Assumption (6) says that the correlation between $\alpha_{1it}^{ACS}$ and $\hat{Y}_{it}^{ACS}$ is due to the function, $\pi_{10}^{ACS} + \pi_{11}^{ACS}(1/y_{it}^{ACS})$, but once this function is subtracted from $\alpha_{1it}^{ACS}$, the remainder $\zeta_{1it}^{ACS}$ is independent of $\hat{Y}_{it}^{ACS}$. Given that $\alpha_{1it}^{ACS}$ is a function of $1/\hat{Y}_{it}^{ACS}$ and is correlated with $\hat{Y}_{it}^{ACS}$, assumption (6) is reasonable. This assumption may justify assumption (5) because (i) $\alpha_{0it}^{CPS}$ and $\alpha_{1it}^{ACS}$, being the coefficients of the same equation, may be affected by the same set of variables and (ii) if $\alpha_{1it}^{ACS}$ is affected by $1/\hat{Y}_{it}^{ACS}$, then $\alpha_{0it}^{CPS}$ may also be affected by the same variable.
To account for the correlation between $\alpha_{1it}^{CPS}$ and $\hat{Y}_{it}^{CPS}$ in (4), we assume that given $\hat{Y}_{it}^{CPS} = y_{it}^{CPS}$,

$\alpha_{0it}^{ACS} = \pi_{00}^{ACS} + \pi_{01}^{ACS} \left( \frac{1}{y_{it}^{CPS}} \right) + \zeta_{0it}^{ACS}$    (7)

$\alpha_{1it}^{CPS} = \pi_{10}^{CPS} + \pi_{11}^{CPS} \left( \frac{1}{y_{it}^{CPS}} \right) + \zeta_{1it}^{CPS}$    (8)

where the vectors, $(\zeta_{0it}^{ACS}, \zeta_{1it}^{CPS})'$, are independent of $\hat{Y}_{it}^{CPS}$ and are independently distributed with mean vector zero and constant covariance matrix $\sigma_2^2 \Delta_2$ as i varies for fixed t.
Inserting (5) and (6) into (3) gives

$\hat{Y}_{it}^{CPS} = \pi_{00}^{CPS} + \pi_{01}^{CPS} \left( \frac{1}{y_{it}^{ACS}} \right) + \pi_{10}^{ACS} y_{it}^{ACS} + \pi_{11}^{ACS} + \zeta_{0it}^{CPS} + \zeta_{1it}^{ACS} y_{it}^{ACS}$    (9)

where only the sum of $\pi_{00}^{CPS}$ and $\pi_{11}^{ACS}$ is identifiable. Equation (9) is a nonlinear regression model with heteroscedastic disturbances. Under (5) and (6), it implies that the conditional distribution of $\hat{Y}_{it}^{CPS}$ given $\hat{Y}_{it}^{ACS} = y_{it}^{ACS}$ has mean equal to the right-hand side of (9) with the last two terms suppressed and variance equal to $\mathrm{var}(\zeta_{0it}^{CPS}) + \mathrm{var}(\zeta_{1it}^{ACS})(y_{it}^{ACS})^2 + 2\,\mathrm{cov}(\zeta_{0it}^{CPS}, \zeta_{1it}^{ACS})\, y_{it}^{ACS}$, where the variances and covariance are given by the elements of $\sigma_1^2 \Delta_1$.
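The conditional mean and variance implied by (9) can be written as two small functions. This is a minimal sketch; the function names `cond_mean` and `cond_var` and the argument names are illustrative choices, not notation from the paper, and the inputs in the usage lines are arbitrary numbers chosen only to make the arithmetic easy to follow.

```python
def cond_mean(y_acs, pi01_cps, pi10_acs, pi_sum):
    """Conditional mean of the CPS estimator given the ACS value, from (9).

    pi_sum stands for pi00_cps + pi11_acs, the only identifiable
    combination of the two constants.
    """
    return pi_sum + pi01_cps / y_acs + pi10_acs * y_acs


def cond_var(y_acs, var_z0, var_z1, cov_z01):
    """Conditional variance of the CPS estimator given the ACS value.

    var_z0, var_z1 and cov_z01 are the elements of sigma_1^2 * Delta_1.
    The variance grows with y_acs, which is the heteroscedasticity in (9).
    """
    return var_z0 + var_z1 * y_acs ** 2 + 2.0 * cov_z01 * y_acs


# Arbitrary illustrative inputs (not estimates from the paper).
example_mean = cond_mean(2.0, pi01_cps=4.0, pi10_acs=3.0, pi_sum=1.0)
example_var = cond_var(2.0, var_z0=1.0, var_z1=0.5, cov_z01=0.25)
print(example_mean, example_var)
```

The quadratic dependence of the variance on the ACS value is what makes ordinary least squares inefficient for (9) and motivates a generalized least squares approach.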
² The time subscript t is fixed at a value because currently, the ACS data are available only for the years 2000-2002.

It follows from (3) that

$\varepsilon_{it}^{CPS} = \alpha_{0it}^{CPS}$    (10)

$\varepsilon_{it}^{ACS} = (1 - \alpha_{1it}^{ACS}) \hat{Y}_{it}^{ACS}$    (11)

$Y_{it} = \hat{Y}_{it}^{CPS} - \alpha_{0it}^{CPS} = \alpha_{1it}^{ACS} \hat{Y}_{it}^{ACS}$    (12)

Equation (12) gives the true value, $Y_{it}$, by eliminating the differences between $\hat{Y}_{it}^{CPS}$ and $\hat{Y}_{it}^{ACS}$. Summing it over i gives

$\sum_{i=1}^{n} (\hat{Y}_{it}^{CPS} - \alpha_{0it}^{CPS}) = \sum_{i=1}^{n} \alpha_{1it}^{ACS} \hat{Y}_{it}^{ACS}$    (13)
The left- and right-hand sides of this equation give two estimators of $Y_t$ that are equal to each other with probability (w.p.) 1. If, in addition, $\sum_{i=1}^{n} \alpha_{0it}^{CPS} = 0$ w.p. 1, then these two estimators are equal to the CPS estimator, $\sum_{i=1}^{n} \hat{Y}_{it}^{CPS}$. It is desirable to obtain such estimators if $|\sum_{i=1}^{n} \hat{Y}_{it}^{CPS} - Y_t| < |\sum_{i=1}^{n} \hat{Y}_{it}^{ACS} - Y_t|$ w.p. 1. The BLS belief in this condition is very high. Thus, under the condition, $\sum_{i=1}^{n} \alpha_{0it}^{CPS} = 0$ w.p. 1, the estimators on both sides of the second equality sign in (12) satisfy both the BLS standard and the implied necessary condition (stated at the end of the second paragraph of Section 2.1) for them to be precise. The condition, $\sum_{i=1}^{n} \alpha_{0it}^{CPS} = 0$ w.p. 1, is nearly satisfied if the sums of $\pi_{00}^{CPS} + \pi_{01}^{CPS}(1/y_{it}^{ACS})$ and $\zeta_{0it}^{CPS}$ over i are near zero w.p. 1.
If we consider the equation set, {(4), (7), (8)}, then (13) changes to

$\sum_{i=1}^{n} (\hat{Y}_{it}^{ACS} - \alpha_{0it}^{ACS}) = \sum_{i=1}^{n} \alpha_{1it}^{CPS} \hat{Y}_{it}^{CPS}$    (14)

which is equal to the BLS preferred estimator $\sum_{i=1}^{n} \hat{Y}_{it}^{CPS}$ if $\alpha_{1it}^{CPS} = 1$ w.p. 1 for all i and t. This condition is very strong and is unlikely to be satisfied in practice. Therefore, we can conclude that equation (4) usually gives an estimate of $Y_t$ that is different from its BLS preferred (CPS) estimate. Since it is easier to satisfy the condition, $\sum_{i=1}^{n} \alpha_{0it}^{CPS} = 0$ w.p. 1, than to satisfy the condition, $\alpha_{1it}^{CPS} = 1$ w.p. 1 for all i and t, we can easily obtain an estimate of $Y_t$ equal to its CPS estimate by using the set, {(3), (5), (6)}, which is equivalent to (9), instead of the set, {(4), (7), (8)}. This is the reason why we prefer (9) to the latter set.

Under (7) and (8), (4) gives the conditional distribution of $\hat{Y}_{it}^{ACS}$ given $\hat{Y}_{it}^{CPS} = y_{it}^{CPS}$, whose mean is equal to

$E(\hat{Y}_{it}^{ACS} \mid \hat{Y}_{it}^{CPS} = y_{it}^{CPS}) = \pi_{00}^{ACS} + \pi_{01}^{ACS} \left( \frac{1}{y_{it}^{CPS}} \right) + \pi_{10}^{CPS} y_{it}^{CPS} + \pi_{11}^{CPS}$    (15)

and whose variance is equal to $\mathrm{var}(\zeta_{0it}^{ACS}) + \mathrm{var}(\zeta_{1it}^{CPS})(y_{it}^{CPS})^2 + 2\,\mathrm{cov}(\zeta_{0it}^{ACS}, \zeta_{1it}^{CPS})\, y_{it}^{CPS}$, where the variances and covariance are given by the elements of $\sigma_2^2 \Delta_2$.

Given the data, $(y_{it}^{CPS}, y_{it}^{ACS})$ for i = 1, 2, ..., n and fixed t, an Iteratively Re-Scaled Generalized Least Squares (IRSGLS) method is used to obtain good approximations to the minimum variance linear unbiased estimators of the coefficients, $\pi_{01}^{CPS}$, $\pi_{10}^{ACS}$, $(\pi_{00}^{CPS} + \pi_{11}^{ACS})$, and the best linear unbiased predictors of the errors, $\zeta_{0it}^{CPS}$ and $\zeta_{1it}^{ACS}$, of equation (9).³ These approximations are denoted by $\hat{\pi}_{01}^{CPS}$, $\hat{\pi}_{10}^{ACS}$, $\widehat{(\pi_{00}^{CPS} + \pi_{11}^{ACS})}$, $\hat{\zeta}_{0it}^{CPS}$, and $\hat{\zeta}_{1it}^{ACS}$, respectively. The corresponding estimate of $\sigma_1^2 \Delta_1$ is denoted by $\hat{\sigma}_1^2 \hat{\Delta}_1$. Arbitrary prior values of the unknown parameters have little or no influence on these estimates. Without (5) and (6) these IRSGLS estimators are inconsistent because a necessary condition for their consistency and asymptotic normality is that $\zeta_{0it}^{CPS}$ and $\zeta_{1it}^{ACS}$ are independent of $\hat{Y}_{it}^{ACS}$ and other variables included on the right-hand side of (5) and (6). Sufficient conditions for their consistency and asymptotic normality have been worked out in the econometrics literature.
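To make the estimation problem concrete, the sketch below performs a single weighted least squares pass on the mean function of (9), treating the variance components as known. It is a simplified stand-in and is not the IRSGLS algorithm or the software of Chang, et al. (2000): an iterative method would re-estimate the variance components and repeat, and would also predict the errors. The helper names `solve_linear` and `wls_step` and the toy data are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def wls_step(y_cps, y_acs, var_z0, var_z1, cov_z01):
    """One weighted least squares pass for the mean function of (9).

    Regressors are 1/y_acs, y_acs and a constant; each observation is
    weighted by the inverse of the conditional variance implied by (9).
    Returns [pi01_cps, pi10_acs, pi00_cps + pi11_acs].
    """
    xtwx = [[0.0] * 3 for _ in range(3)]
    xtwy = [0.0] * 3
    for yc, ya in zip(y_cps, y_acs):
        x = [1.0 / ya, ya, 1.0]
        w = 1.0 / (var_z0 + var_z1 * ya ** 2 + 2.0 * cov_z01 * ya)
        for j in range(3):
            xtwy[j] += w * x[j] * yc
            for k in range(3):
                xtwx[j][k] += w * x[j] * x[k]
    return solve_linear(xtwx, xtwy)


# Toy, error-free data generated from known coefficients (2, 3, 1).
toy_acs = [1.0, 2.0, 4.0, 5.0]
toy_cps = [2.0 / ya + 3.0 * ya + 1.0 for ya in toy_acs]
coef = wls_step(toy_cps, toy_acs, var_z0=1.0, var_z1=0.1, cov_z01=0.0)
print(coef)
```

With error-free toy data the pass recovers the generating coefficients regardless of the weights; with real data the weights matter, which is why the variance components have to be estimated alongside the coefficients.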
³ Chang, et al. (2000) developed software for the IRSGLS method.

2.2 Bias- and Error-Corrected Version of the CPS Estimator

Equations (5), (10), and (12) can be used to show that given $\hat{Y}_{it}^{ACS} = y_{it}^{ACS}$ and $\pi_{00}^{CPS} = 0$,

$E(\hat{Y}_{it}^{CPS} - Y_{it})^2 = \left( \pi_{01}^{CPS} \frac{1}{y_{it}^{ACS}} \right)^2 + \mathrm{var}(\zeta_{0it}^{CPS})$    (16)

where the first term on the right-hand side is the square of the bias, $E\varepsilon_{it}^{CPS}$. Adding additional regressors on the right-hand side of (5) may push the magnitude of this bias above that of $\pi_{01}^{CPS}(1/y_{it}^{ACS})$ and decrease the magnitude of the fluctuating component of $\varepsilon_{it}^{CPS}$ below that of $\zeta_{0it}^{CPS}$. An estimate of $E\varepsilon_{it}^{CPS}$ is

$\widehat{\mathrm{Bias}}^{CPS} = \hat{\pi}_{01}^{CPS} \frac{1}{y_{it}^{ACS}}$    (17)

The standard error of (17) is given by the square root of

$\mathrm{var}(\hat{\pi}_{01}^{CPS}) \left( \frac{1}{y_{it}^{ACS}} \right)^2$    (18)

From equations (5) and (12) we have

$\hat{Y}_{it}^{BECCPS} = \hat{Y}_{it}^{CPS} - \hat{\alpha}_{0it}^{CPS} = \hat{Y}_{it}^{CPS} - \hat{\pi}_{01}^{CPS} \left( \frac{1}{y_{it}^{ACS}} \right) - \hat{\zeta}_{0it}^{CPS}$    (19)

if $\pi_{00}^{CPS} = 0$. This is what we call the bias- and error-corrected (BEC) version of the CPS estimator, $\hat{Y}_{it}^{CPS}$. It is close to $\hat{Y}_{it}^{CPS}$ if $\hat{\alpha}_{0it}^{CPS}$ is close to 0 w.p. 1. Estimator (19) can be more precise than $\hat{Y}_{it}^{CPS}$.
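Formulas (17) to (19) translate directly into code. The sketch below uses hypothetical function names; the coefficient estimate -0.064101 and its standard error 0.12511 are the values reported with (23) in Section 3, and 10980 and 988710 are the ACS estimates for VT and CA in Table 1 (the smallest and largest ACS values in the table).

```python
import math


def bias_cps(pi01_hat, y_acs):
    """Estimated CPS bias, formula (17)."""
    return pi01_hat / y_acs


def bias_cps_se(var_pi01_hat, y_acs):
    """Standard error of (17): square root of formula (18)."""
    return math.sqrt(var_pi01_hat * (1.0 / y_acs) ** 2)


def bec_cps(y_cps, pi01_hat, zeta0_hat, y_acs):
    """Bias- and error-corrected CPS estimate, formula (19), with pi00 = 0."""
    return y_cps - pi01_hat / y_acs - zeta0_hat


PI01_HAT = -0.064101      # coefficient on 1/y_acs reported with (23)
PI01_SE = 0.12511         # its asymptotic standard error reported with (23)

bias_vt = bias_cps(PI01_HAT, 10980)     # smallest ACS value in Table 1 (VT)
bias_ca = bias_cps(PI01_HAT, 988710)    # largest ACS value in Table 1 (CA)
se_vt = bias_cps_se(PI01_SE ** 2, 10980)
print(bias_vt, bias_ca, se_vt)
```

The two bias values come out at roughly -5.8E-06 and -6.5E-08, the range quoted for the estimated CPS bias in Section 3, and the standard error for VT is about twice the size of the bias estimate itself.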
2.3 Improved ACS Estimator

An estimator of $Y_{it}$ that gives a good approximation to (12) is

$\hat{Y}_{it}^{IACS} = \hat{\alpha}_{1it}^{ACS} y_{it}^{ACS} = \left[ \hat{\pi}_{10}^{ACS} + \hat{\pi}_{11}^{ACS} \left( \frac{1}{y_{it}^{ACS}} \right) + \hat{\zeta}_{1it}^{ACS} \right] y_{it}^{ACS}$    (20)

where $\hat{Y}_{it}^{IACS}$ denotes the improved ACS (IACS) estimator and $\hat{\pi}_{11}^{ACS}$ is the IRSGLS estimator of $\pi_{11}^{ACS}$ when $\pi_{00}^{CPS}$ is restricted to be zero. The standard error of (20), denoted by se($\hat{Y}_{it}^{IACS}$), is given by the square root of the conditional variance of $\hat{Y}_{it}^{IACS}$ given $\hat{Y}_{it}^{ACS} = y_{it}^{ACS}$, which is

$\mathrm{var}(\hat{\pi}_{10}^{ACS})(y_{it}^{ACS})^2 + \mathrm{var}(\hat{\pi}_{11}^{ACS}) + \mathrm{var}(\hat{\zeta}_{1it}^{ACS})(y_{it}^{ACS})^2 + 2\,\mathrm{cov}(\hat{\pi}_{10}^{ACS}, \hat{\pi}_{11}^{ACS})\, y_{it}^{ACS} + 2\,\mathrm{cov}(\hat{\pi}_{10}^{ACS}, \hat{\zeta}_{1it}^{ACS})(y_{it}^{ACS})^2 + 2\,\mathrm{cov}(\hat{\pi}_{11}^{ACS}, \hat{\zeta}_{1it}^{ACS})\, y_{it}^{ACS}$    (21)

2.4 Comparison of CPS, ACS, Improved ACS, and Bias- and Error-Corrected CPS Estimators

From (11) it follows that the amount of bias in $\hat{Y}_{it}^{ACS}$ is $E\varepsilon_{it}^{ACS} = E(1 - \alpha_{1it}^{ACS})\hat{Y}_{it}^{ACS}$. An idea of the magnitudes of $\varepsilon_{it}^{ACS}$ and $E\varepsilon_{it}^{ACS}$ can be obtained by examining the estimate, $\hat{\varepsilon}_{it}^{ACS} = (1 - \hat{\alpha}_{1it}^{ACS}) y_{it}^{ACS}$, of (11) and the estimate of $E(1 - \alpha_{1it}^{ACS})\hat{Y}_{it}^{ACS}$ equal to $y_{it}^{ACS} - \hat{\pi}_{10}^{ACS} y_{it}^{ACS} - \hat{\pi}_{11}^{ACS}$ when $\pi_{00}^{CPS} = 0$.

The second equality in (12) is preserved under the IRSGLS estimation of (9). Consequently, (19) and (20) have the same conditional distribution given $\hat{Y}_{it}^{ACS} = y_{it}^{ACS}$ and yield the same estimate of $Y_{it}$ when they are evaluated at the IRSGLS estimates of π’s and ζ’s in (9). They both are close to $\hat{Y}_{it}^{CPS}$ if $\hat{\alpha}_{0it}^{CPS}$ is close to 0 w.p. 1. Thus, (20) resolves the difficult problem of choosing between the two discrepant estimates of $Y_{it}$ given by the CPS and the ACS.

The previous studies found that for the year 2000, the amount of bias in $\hat{Y}_{it}^{ACS}$ exceeds that in $\hat{Y}_{it}^{CPS}$ for a majority of states. Consequently, $\hat{Y}_{it}^{ACS}$ without the improvements considered in Section 2.3 is not preferable to $\hat{Y}_{it}^{CPS}$. Now we need to compare (20) with $\hat{Y}_{it}^{CPS}$. The conditional distributions of (20) and $\hat{Y}_{it}^{CPS}$ given $\hat{Y}_{it}^{ACS} = y_{it}^{ACS}$ across i for fixed t are called the conditional cross-sectional distributions. Suppose that $F_{it}^{IACS}$, $\mu_{it}^{IACS}$, and $\omega_{it}^{IACS}$ (= (21)) denote the conditional cross-sectional distribution function of (20), its mean and variance, respectively. Let $F_{it}$, $\mu_{it}$, and $\omega_{it}$ denote the conditional cross-sectional distribution function of $\hat{Y}_{it}^{CPS}$, its mean and variance, respectively. Under (5), $\mu_{it} = Y_{it} + \pi_{00}^{CPS} + \pi_{01}^{CPS}(1/y_{it}^{ACS})$ and $\omega_{it} = \mathrm{var}(\zeta_{0it}^{CPS})$ if $Y_{it}$ is fixed. Unlike $\omega_{it}$, the CPS design variance of $\hat{Y}_{it}^{CPS}$ does not take into account the variation in $\hat{Y}_{it}^{CPS}$ across i and hence is not equal to $\omega_{it}$.

We say that (20) is better than $\hat{Y}_{it}^{CPS}$ if

$F_{it}^{IACS}(y_{it} + \mu_{it}^{IACS}) - F_{it}^{IACS}(-y_{it} + \mu_{it}^{IACS}) \geq F_{it}(y_{it} + \mu_{it}) - F_{it}(-y_{it} + \mu_{it})$    (22)

for each $y_{it}$. A necessary condition for (22) to hold is that $\omega_{it}^{IACS} \leq \omega_{it}$ (see Rao (1973, p. 96)). It is satisfied for fixed $Y_{it}$ if (21) ≤ $\mathrm{var}(\zeta_{0it}^{CPS})$. The difficulty here is that the population values of $\omega_{it}^{IACS}$ and $\omega_{it}$ are always unknown and the condition, $\omega_{it}^{IACS} \leq \omega_{it}$, may not be false when the sample estimates of $\omega_{it}^{IACS}$ and $\omega_{it}$ violate it. Furthermore, to verify the condition, $\omega_{it}^{IACS} \leq \omega_{it}$, if we use the same data that we have used to estimate the unknown quantities of (9), then we will be judging the validity of our estimated model (9) by the data from which we have derived it (see Friedman and Schwartz (1991, pp. 46-48))! This is not the right thing to do. Therefore, the practical verification of the condition, $\sum_{i=1}^{n} \alpha_{0it}^{CPS} = 0$, given below (13) gives more useful information about the relative accuracies of the estimates given by (20) and $\hat{Y}_{it}^{CPS}$ than that of the inequality, $\omega_{it}^{IACS} \leq \omega_{it}$.

A necessary condition for (20) to stochastically dominate $\hat{Y}_{it}^{CPS}$ is that $\omega_{it}^{IACS} + (\mu_{it}^{IACS} - Y_{it})^2 \leq \omega_{it} + (\mu_{it} - Y_{it})^2$ (see Rao (1973, p. 315, (5a.1.2)) and Lehmann and Casella (1998, p. 400, Problem 4.14)). It is possible that $(\mu_{it}^{IACS} - Y_{it})^2 \leq (\mu_{it} - Y_{it})^2$ because (20) is equal to (19), which is corrected for bias.

Changing the set of regressors in (5) and (6) changes the values of $\mu_{it}^{IACS}$, $\mu_{it}$, $\omega_{it}^{IACS}$, and $\omega_{it}$. One way of limiting the effects of misspecifications in (5) and (6) is to restrict attention to those regressors, which when included in (5) and (6), yield close approximations to design consistent estimates. This can be accomplished by using in (5) and (6) those regressors that make (20) lie between $\hat{Y}_{it}^{CPS} - 2\,\mathrm{se}(\hat{Y}_{it}^{CPS})$ and $\hat{Y}_{it}^{CPS} + 2\,\mathrm{se}(\hat{Y}_{it}^{CPS})$, since the estimator, $\sum_{i=1}^{n} \hat{Y}_{it}^{CPS}$, is design consistent.

3. Example

Let $Y_{it}$ denote the annual average unemployment for t = 2000 and let i index the 50 states and the District of Columbia. Following the IRSGLS method and using the CPS and ACS estimates, $(y_{it}^{CPS}, y_{it}^{ACS})$, of 51 $Y_{it}$’s given in columns (B) and (D) of Table 1, we obtain the following estimates for model (9):

$y_{it}^{CPS} = \underset{(0.12511)}{-0.064101} \left( \frac{1}{y_{it}^{ACS}} \right) + \underset{(0.03034)}{0.76659}\, y_{it}^{ACS} \; \underset{(4199.8)}{-\,2151.8} + \hat{\zeta}_{0it}^{CPS} + \hat{\zeta}_{1it}^{ACS} y_{it}^{ACS}$    (23)

where the figures appearing in parentheses below the coefficient estimates are asymptotic (large sample) standard errors and $\pi_{00}^{CPS} = 0$. The realized values of the design coefficients of variation (CV) of CPS estimates are given in column (C) of Table 1.

The estimate of the coefficient on $(1/y_{it}^{ACS})$ in equation (23), although it has the right sign, is insignificant. This shows that the estimate of the bias, $E\varepsilon_{it}^{CPS}$, of the CPS estimate, $y_{it}^{CPS}$, based on (17) is insignificant. The estimated values of $E\varepsilon_{it}^{CPS}$ lie between -5.8E-06 and -6.5E-08 and those of $E\varepsilon_{it}^{ACS}$ lie between 4714.6 and 232926.6; for the formula used to estimate $E\varepsilon_{it}^{ACS}$, see Section 2.4. Thus, under (5) and (6), the estimated absolute biases of $\hat{Y}_{it}^{ACS}$ are much larger than those of $\hat{Y}_{it}^{CPS}$, which are close to zero.

In (23), $-37760 \leq \hat{\zeta}_{0it}^{CPS} \leq 34198$ and $-0.15115 \leq \hat{\zeta}_{1it}^{ACS} \leq 0.16419$. When (5) and (6) are evaluated at the estimates in (23), we obtain $-37760 \leq \hat{\alpha}_{0it}^{CPS} \leq 34198$ and $0.55068 \leq \hat{\alpha}_{1it}^{ACS} \leq 0.90853$. All the estimates of $\alpha_{1it}^{ACS}$ have the right sign. Inserting the estimates of $\alpha_{0it}^{CPS}$ and $\alpha_{1it}^{ACS}$ into (10) and (11) gives $-37760 \leq \hat{\varepsilon}_{it}^{CPS} \leq 34198$ and $4933.534 \leq \hat{\varepsilon}_{it}^{ACS} \leq 139309.2$. The estimate of the error, $\hat{\varepsilon}_{it}^{CPS} = \hat{\alpha}_{0it}^{CPS}$, of the CPS estimate, $y_{it}^{CPS}$, is different from that of $E\varepsilon_{it}^{CPS}$ because the IRSGLS estimate of $\zeta_{0it}^{CPS}$ is large. Also, $|\hat{\varepsilon}_{it}^{ACS}| > |\hat{\varepsilon}_{it}^{CPS}|$ for 45 states. For Hawaii and North Carolina, $\hat{\varepsilon}_{it}^{ACS}$ is more than 170 and 269 times $\hat{\varepsilon}_{it}^{CPS}$, respectively. Also, for some states, such as DE, KS, MT, RI, VT, and WY, $\hat{\varepsilon}_{it}^{CPS}$ appears to be large in absolute value even though the CPS and ACS estimates are very close to each other. However, such large values do not arise when the Blacks/Population and Hispanics/Population ratios for state i are included in (5) and (6) as additional regressors (see column (I) of Table 1).

The formulas (20) and (21) were used to obtain the estimates, IACS1, in column (E) and their standard errors in column (F) of Table 1, respectively. For each state, the IACS1 estimate is equal to the bias- and error-corrected CPS estimate, as shown in Section 2.4, but is smaller than the ACS estimate, $y_{it}^{ACS}$. The latter result is obtained because the ACS data classified far more people as unemployed than the CPS data did for 2000.

All these results arise as a direct consequence of (5) and (6). Including in (5) and (6) the Blacks/Population and Hispanics/Population ratios for state i as additional regressors generally brings the estimates based on $\hat{Y}_{it}^{CPS} - \hat{\alpha}_{0it}^{CPS}$ much closer to $y_{it}^{CPS}$ by reducing substantially the fluctuating component of $\hat{\alpha}_{0it}^{CPS}$ without substantially increasing the bias of $\hat{Y}_{it}^{CPS}$ (see columns (E), (G), and (I) of Table 1).
The estimate of $-\sum_{i=1}^{51} \hat{\alpha}_{0it}^{CPS}$ given at the bottom of column (E) of Table 1 is -5, showing that the sum, $\sum_{i=1}^{51} (y_{it}^{CPS} - \hat{\alpha}_{0it}^{CPS})$, of the state BECCPS estimates based on (19) is practically equal to the sum, $\sum_{i=1}^{51} y_{it}^{CPS}$, of the state CPS estimates. This means that under (5) and (6), both the IACS1 and the BECCPS estimates of $Y_{it}$’s cohere with the national CPS estimate (of $\sum_{i=1}^{51} Y_{it}$), which is regarded as the BLS standard. This coherency property, however, does not exactly hold if the ratios of blacks and Hispanics to the total population in state i are included in (5) and (6) as additional regressors (see columns (G) and (I) of Table 1).
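The coherence comparison described above can be reproduced from the Total row of Table 1. The sketch below only re-does that subtraction and expresses each gap as a share of the national CPS total; the dictionary layout and variable names are illustrative.

```python
# National totals taken from the Total row of Table 1.
totals = {
    "CPS": 5694669,
    "ACS": 7357896,
    "IACS1": 5694664,
    "IACS2": 5677173,
    "IACS3": 5782378,
}

cps_total = totals["CPS"]

# Difference between each column total and the CPS total.
diffs = {name: value - cps_total for name, value in totals.items() if name != "CPS"}

# The same differences as a percentage of the CPS total.
pct = {name: 100.0 * d / cps_total for name, d in diffs.items()}

for name in diffs:
    print(f"{name}: difference {diffs[name]:+d} ({pct[name]:+.4f}% of CPS total)")
```

The differences match the figures printed below the totals in Table 1. In relative terms the IACS1 gap is negligible, the IACS2 and IACS3 gaps are well under 2 percent, and the unadjusted ACS total is more than 29 percent above the CPS total.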
The CV of IACS1 estimates in column (F) of
Table 1 are very high for many states. However,
they get reduced for all states except DC if the
proportion of state i’s population who were black
is included in (5) and (6) as a third regressor (see
column (H) of Table 1). The range of the CV of
IACS1 or IACS2 estimates is reduced when the
proportion of state i’s population who were
Hispanic is included in (5) and (6) as a fourth
regressor (see column (J) of Table 1). For each
state, the CV in columns (F), (H), and (J) are
larger than the CV in column (C). This result
should not be regarded as a demonstration that
the CPS estimates are more accurate than the
IACS3 estimates because the CV of the CPS
estimates ignore the variability of $\hat{Y}_{it}^{CPS}$ across i
and those of IACS3 take into account the
variability of the cross-sectional estimates of
coefficients and error terms in (23). The IACS3
estimates can be more accurate than the other
estimates presented in Table 1 because for all
states except one, they reside between the lower
and upper 95% confidence limits of the
corresponding CPS estimate.
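A check of this kind can be sketched for individual states. The code below assumes that the 95% limits are formed as the CPS estimate plus or minus 1.96 standard errors, with the standard error recovered from the percentage in column (C); the paper does not spell out how its limits were computed, so this is an approximation, and the function name is hypothetical. The four rows are taken from columns (B), (C), and (I) of Table 1.

```python
def within_cps_limits(cps, se_pct, estimate, z=1.96):
    """Return True if `estimate` lies inside cps +/- z * se.

    se_pct is the standard error expressed as a percentage of the CPS
    estimate (column (C) of Table 1); z = 1.96 is an assumed normal
    approximation for 95% limits.
    """
    se = cps * se_pct / 100.0
    return cps - z * se <= estimate <= cps + z * se


# (state, CPS, se(CPS)%, IACS3) from Table 1.
rows = [
    ("AL", 96978, 8.10, 96229),
    ("AK", 21373, 6.76, 21871),
    ("AZ", 98085, 7.98, 107889),
    ("NM", 42331, 7.52, 54963),
]

results = {state: within_cps_limits(cps, se_pct, iacs3)
           for state, cps, se_pct, iacs3 in rows}
print(results)
```

Under this approximation the IACS3 values for AL, AK, and AZ fall inside the CPS limits while the value for NM does not. Because the limits here are rebuilt from rounded percentages, borderline states could be classified differently from the paper's own calculation.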
4. Conclusions
Even with estimators that are unbiased in
probability
sampling,
nonresponse
and
measurement errors may produce biases in the
estimates that we are able to compute from the
data. Therefore, it is important that appropriate
adjustments for bias are made in the estimators
and, in addition, measures are taken to control
the presence of various sources of nonsampling
errors in the surveys. If these adjustments and
measures are inaccurate, then the estimates that
we are able to compute from the data may still be
biased and may contain nonsampling errors
besides sampling errors. This paper develops
a method of removing these biases and
errors. The method can be used to remove the
differences between any pair of discrepant
national and sub-national employment or
unemployment estimates.
The method’s
practical behavior is demonstrated on a real
dataset.
References
Chang, I., Swamy, P.A.V.B., Hallahan, C. and
Tavlas, G.S. (2000), “A Computational
Approach to Finding Causal Economic Laws,”
Computational Economics, 16, 105-136.
Cochran, W.G. (1977), Sampling Techniques, 3rd
edition, New York: John Wiley & Sons.
Freedman, D.A. and Navidi, W.C. (1986),
“Regression Models for Adjusting the 1980
Census (with discussion),” Statistical Science,
1, 3-39.
Friedman, M. and Schwartz, A.J. (1991),
“Alternative Approaches to Analyzing
Economic Data,” The American Economic
Review, 81, 39-49.
Lehmann, E.L. and Casella, G. (1998), Theory of
Point Estimation, 2nd edition, New
York: Springer.
Rao, C.R. (1973), Linear Statistical Inference
and its Applications, 2nd edition, New
York: John Wiley & Sons.
Swamy, P.A.V.B., Chang, I., Mehta, J.S. and
Tavlas, G.S. (2003a), “Correcting for
Omitted-Variable and Measurement-Error Bias
in Autoregressive Model Estimation with Panel
Data,” Computational Economics, 22, 225-253.
Swamy, P.A.V.B., Tavlas, G.S. and Chang, I.
(2003b), “How Stable are Monetary Policy
Rules: Estimating the Time-Varying
Coefficients in Monetary Policy Reaction
Function for the U.S.” Computational Statistics
& Data Analysis, forthcoming.
Table 1: Direct and Improved Estimates of Unemployment for the 50 States and the D.C. During 2000

StAbbr      CPS  se(CPS)%      ACS    IACS1  se(IACS1)%    IACS2  se(IACS2)%    IACS3  se(IACS3)%
(A)         (B)       (C)      (D)      (E)         (F)      (G)         (H)      (I)         (J)
AL        96978      8.10   134383   103061       19.08    97635       11.67    96229       13.09
AK        21373      6.76    24674    16059       31.48    17295       18.77    21871       11.29
AZ        98085      7.98   150382   114869       19.15   111060        9.77   107889       13.70
AR        55278      8.13    80232    61855       19.26    59559       10.79    56548       12.58
CA       835331      3.10   988710   849401       17.23   839316        8.59   843676       12.17
CO        65042      9.46    94290    73814       18.82    72539        9.47    71309       12.22
CT        40064     12.27    74750    63682       17.51    57199        9.96    46463       14.23
DE        16428      8.68    16947    10274       43.73    10604       32.75    15578       14.05
DC        17653      7.26    22875    15065       32.57    13070       41.56    16549       25.15
FL       280793      4.50   396620   269741       21.64   279558       10.19   286078       12.49
GA       155576      7.56   193889   158667       17.89   148949       11.23   153768       12.08
HI        25267      8.99    35702    25205       24.39    26479       13.30    28458       11.76
ID        31969      7.31    37291    25082       25.22    27459       12.95    33259       10.53
IL       280926      4.65   347670   291055       17.56   280375        8.92   282583       10.86
IN        99817      9.05   123002    86517       20.82    95087        9.26    99634       10.85
IA        41326     10.21    63321    48732       19.69    48883       10.02    44555       12.78
KS        51761      8.83    48841    29737       25.92    35227       11.84    48875        8.96
KY        81043      8.35   109211    81980       19.55    83431        9.43    81721       11.83
LA       110592      7.26   146198   109751       19.49   106226       12.90   108616       13.47
ME        23772      9.54    27220    17853       29.59    19466       16.62    23679       11.09
MD       106682      8.67   139970   104481       19.60   102609       11.97   105612       12.68
MA        87701      7.45   113099    82078       20.21    87170        9.33    89387       11.18
MI       182848      5.55   263720   169488       22.82   185245       10.21   184208       12.58
MN        90958      8.89    95724    56761       24.84    74155        9.41    88462        9.69
MS        74423      7.25    99271    73591       19.84    69673       15.14    72102       14.72
MO       101860      8.85   136955   103334       19.39   103627        9.49   102270       11.76
MT        23648      7.48    23631    14845       33.47    16473       19.18    22366       10.60
NE        28120      9.43    29627    19141       28.80    20975       16.09    27625       10.30
NV        42457      7.86    60214    44682       20.55    45513       10.52    48483       11.93
NH        19183     10.77    21918    14043       34.40    15263       20.59    19473       11.58
NJ       159974      5.46   220139   155354       20.76   161976        9.75   163761       11.91
NM        42331      7.52    63475    48445       19.85    48945       10.03    54963       14.42
NY       418855      3.55   507074   431383       17.33   420638        8.71   420826       10.77
NC       149668      6.48   198593   149485       19.45   148484       10.35   149459       11.97
ND        10520      9.61    14495     8826       49.42     9501       32.46    12475       14.60
OH       232986      5.15   279842   248530       16.52   230668        8.62   231245       10.70
OK        50532      9.51    96741    87892       16.20    73528        9.56    56527       15.05
OR        88361      7.66   112952    81358       20.36    87991        9.35    90663       11.24
PA       251237      4.79   316102   258742       17.95   250625        8.96   250817       11.14
RI        22249      8.71    23709    15114       32.91    16394       19.81    22670       10.93
SC        75102      8.90    98133    71562       20.18    70365       13.12    73496       13.08
SD         9447     10.48    13000     7689       55.87     8308       37.14    11480       15.36
TN       110409      8.46   164329   118360       20.31   118070       10.26   111752       12.98
TX       440736      4.14   572323   442125       19.10   440248        9.29   450006       13.15
UT        37152      8.80    66133    54103       18.44    51719        9.81    43610       13.79
VT         9695     10.49    10980     6046       69.85     6641       46.65    10523       16.05
VA        79094     10.49   150989   116853       18.90   101365       11.43    83787       16.07
WA       159026      7.78   161811   133407       17.74   139153        8.39   156877        9.25
WV        44562      7.12    51111    34347       23.28    38074       11.10    43615       10.55
WI       105419      8.67   123719    83455       21.71    96977        9.13   104644       10.47
WY        10360      8.39    11909     6742       63.10     7384       41.85    11857       14.99
Total   5694669            7357896  5694664              5677173              5782378
                           1663227       -5               -17496                87709
Note: se = standard error. The IACS1-3 are the improved ACS estimates obtained from equation (3). The
IACS1 were obtained by subjecting the coefficients of (3) to equations (5) and (6), respectively. The IACS2
were obtained by subjecting the coefficients of (3) to (5) and (6) with Blacks/Population as an additional
regressor, respectively. The IACS3 were obtained by subjecting the coefficients of (3) to (5) and (6) with
Blacks/Population and Hispanics/Population as additional regressors, respectively. The figures appearing below
the totals are the differences between the totals and the CPS total.
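Since IACS1 equals the estimated slope coefficient times the ACS estimate by (20), the implied coefficient and the estimated ACS error from (11) can be read off Table 1 as the ratio and the difference of columns (E) and (D). The sketch below does this for three states; the variable names are illustrative, and small discrepancies from the figures quoted in Section 3 are to be expected because the table entries are rounded to whole numbers.

```python
# (ACS, IACS1) pairs from columns (D) and (E) of Table 1.
table_rows = {
    "VT": (10980, 6046),
    "OK": (96741, 87892),
    "CA": (988710, 849401),
}

# Implied slope coefficient from (20): alpha1_hat = IACS1 / ACS.
alpha1_hat = {st: iacs1 / acs for st, (acs, iacs1) in table_rows.items()}

# Implied ACS error from (11): eps_acs_hat = (1 - alpha1_hat) * ACS = ACS - IACS1.
eps_acs_hat = {st: acs - iacs1 for st, (acs, iacs1) in table_rows.items()}

for st in table_rows:
    print(f"{st}: alpha1_hat = {alpha1_hat[st]:.5f}, eps_acs_hat = {eps_acs_hat[st]}")
```

VT and OK give coefficients of about 0.5506 and 0.90853, and VT and CA give ACS errors of 4934 and 139309, in line with the ranges reported for these quantities in Section 3.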