American Economic Association

Evaluating a Simple Method for Estimating Black-White Gaps in Median Wages
Author(s): William Johnson, Yuichi Kitamura and Derek Neal
Source: The American Economic Review, Vol. 90, No. 2, Papers and Proceedings of the One
Hundred Twelfth Annual Meeting of the American Economic Association (May, 2000), pp. 339-343
Published by: American Economic Association
Stable URL: http://www.jstor.org/stable/117247
This content downloaded from 128.143.200.124 on Mon, 27 Jul 2015 13:53:22 UTC
All use subject to JSTOR Terms and Conditions
Evaluating a Simple Method for Estimating
Black-White Gaps in Median Wages
By WILLIAM JOHNSON, YUICHI KITAMURA, AND DEREK NEAL*
Racial differences in wage rates are important measures of economic inequality among races because, for almost all individuals, labor income constitutes the most important component of lifetime income. As a consequence, the prices at which individuals may sell their time provide vital information about the distribution of welfare and economic success. However, we only observe prices when market transactions occur, and thus we only observe wage rates for individuals who are employed. Since the wages of employed workers are not randomly sampled from the distribution of potential wages, it is difficult to draw inferences concerning racial gaps in potential wages from data on observed wage rates. Richard Butler and James Heckman (1977) raised this issue in the context of assessing the impact of government policies on racial income inequality. Charles Brown (1984), James Smith and Finis Welch (1989), and Amitabh Chandra (2000) also examine the extent of racial differences in participation rates and the impact of these differences on measures of racial wage and income gaps. This topic remains salient, in part, because employment rates among working-age black males remain significantly below corresponding rates for whites.

Neal and Johnson (1996) estimated racial gaps in median wages among men by imputing wages of zero for all men in a particular cross-section who report that they have not worked at all during the survey period. Under a specific assumption concerning the distribution of missing wages, this procedure yields consistent estimates of the black-white gap in median wages conditional on observed characteristics. Below, we spell out this assumption and use panel data to investigate the extent to which it may be violated in cross-section wage analyses. Our results suggest that imputing wages of zero for unemployed individuals may provide a reasonable way to estimate median wage regressions among men.
I. A Simple Imputation Method
Consider the following linear model:

$$w_i = X_i'\beta_0 + \varepsilon_i,$$

where $w_i$, $X_i$, and $\varepsilon_i$ are the wage offer, observed characteristics, and unobserved traits for individual $i$. The conditional median of $\varepsilon_i$ given $X_i$ is assumed to be zero. We are interested in identifying the unknown parameter vector $\beta_0$. Our problem is that $w_i$ is not observed for those who do not work, $I_i = 0$. We proceed by creating a variable $y_i$ such that $y_i = w_i$ if $I_i = 1$ and $y_i = 0$ if $I_i = 0$, and we assume that the following condition (Condition A) holds:

$$w_i < X_i'\hat{\beta} \quad \text{if } I_i = 0.$$

Here, $\hat{\beta}$ is a hypothetical LAD (least absolute deviation) estimator based on the true wage offers, $w_i$. Given these assumptions, LAD estimation using $y_i$ has the following property:

$$\hat{\beta}_{\text{imputed}} = \arg\min_{\beta} \sum_{i=1}^{N} \left| y_i - X_i'\beta \right| = \arg\min_{\beta} \sum_{i=1}^{N} \left| w_i - X_i'\beta \right|,$$

because Condition A implies that the LAD estimation is not affected at all by the imputations.1 Further, since the hypothetical LAD
* Johnson: Department of Economics, University of Virginia, Charlottesville, VA 22903; Kitamura and Neal: Department of Economics, University of Wisconsin, Madison, WI 53706. Neal is also affiliated with the National Bureau of Economic Research.
1 Here, we assume that the LAD estimator is unique.
TABLE 1. MEDIAN REGRESSION RESULTS USING VARIOUS WAGE IMPUTATION METHODS: NLSY, 1990-1991

Variable        (i)        (ii)       (iii)      (iv)       (v)
Black          -0.091     -0.134     -0.141     -0.138     -0.139
               (0.036)    (0.034)    (0.035)    (0.033)    (0.031)
Hispanic        0.013     -0.014     -0.018     -0.017     -0.017
               (0.039)    (0.038)    (0.038)    (0.037)    (0.035)
Age             0.058      0.055      0.057      0.061      0.060
               (0.017)    (0.017)    (0.017)    (0.016)    (0.015)
AFQT            0.197      0.206      0.202      0.200      0.200
               (0.016)    (0.015)    (0.015)    (0.015)    (0.014)
(AFQT)^2        0.007      0.010      0.011      0.010      0.010
               (0.014)    (0.014)    (0.014)    (0.013)    (0.013)
N:              1,593      1,674      1,674      1,674      1,674

Notes: Explanation of regression imputations: (i) restricted sample, no imputations; (ii) impute zero if missing; (iii) use new wage data, 1992-1993; (iv) use new wage data, 1988-1989; (v) use new wage data, 1988, 1989, 1992, and 1993. Standard errors are reported in parentheses. Details concerning these samples and those used in Figure 1 are available online (see http://www.ssc.wisc.edu/-dneal).

estimator $\hat{\beta}$ is consistent, we know that $\hat{\beta}_{\text{imputed}}$ is also a consistent estimator of $\beta_0$ (see Peter Bloomfield and William Steiger [1983 pp. 44-52] for details).
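
The invariance claimed above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' code: the regressors are reduced to two hypothetical group dummies ("A" and "B"), so the LAD fit is simply the within-group median; the offer distributions, the non-work probabilities, and the names `group_medians` and `simulate` are all invented for illustration. Each group has an odd number of observations so that the median is unique, in the spirit of footnote 1. Non-workers are drawn only from people whose offer lies below their group's median, which is Condition A in this setting.

```python
import random
import statistics


def group_medians(values, groups):
    """Median of `values` within each group label.

    With only group dummies as regressors, the LAD fit is the set of
    within-group medians, so this stands in for a median regression.
    """
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    return {g: statistics.median(v) for g, v in by_group.items()}


def simulate(seed=0, n_per_group=501):
    """Draw hypothetical log wage offers and a work indicator satisfying Condition A."""
    rng = random.Random(seed)
    centers = {"A": 2.5, "B": 2.2}      # assumed median log offers (illustrative)
    exit_prob = {"A": 0.10, "B": 0.30}  # assumed chance a low-offer person does not work
    groups = [g for g in centers for _ in range(n_per_group)]
    offers = [rng.gauss(centers[g], 0.5) for g in groups]
    true_med = group_medians(offers, groups)
    # Only people whose offer is below their group's median may be non-workers.
    works = [not (w < true_med[g] and rng.random() < exit_prob[g])
             for w, g in zip(offers, groups)]
    return groups, offers, works, true_med


groups, offers, works, true_med = simulate()

# Rule (ii): impute zero for non-workers. Rule (i): drop non-workers.
imputed = [w if k else 0.0 for w, k in zip(offers, works)]
imputed_med = group_medians(imputed, groups)
observed_med = group_medians(
    [w for w, k in zip(offers, works) if k],
    [g for g, k in zip(groups, works) if k],
)

gap_true = true_med["B"] - true_med["A"]
gap_imputed = imputed_med["B"] - imputed_med["A"]
gap_observed = observed_med["B"] - observed_med["A"]
print(gap_true, gap_imputed, gap_observed)
```

In this setup the imputed-sample medians coincide with the medians of the full set of offers, because replacing a below-median value with another below-median value leaves the middle observation unchanged. Dropping the non-workers raises both group medians, and raises the median of the group with more non-workers by more, so the restricted sample shows a smaller gap.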
II. The Imputation Method Examined
The method we have described is easy to implement and also has important consequences for estimates of the effect of race on wages. To see this, compare the two median wage regressions presented in the first two columns of Table 1. The dependent variable is the log of the average wage earned over the period 1990-1991, and the data, from the National Longitudinal Survey of Youth (NLSY), are the same observations used in Neal and Johnson (1996). We lack wage observations only for those who work neither in 1990 nor in 1991. In the first regression, we simply eliminate all individuals who did not work in either interview year. In the second regression, we replicate the Neal and Johnson (1996) results by imputing a wage of zero for all individuals who do not work.2 A comparison of the two regressions reveals that the results in column (i) may understate the magnitude of the black-white wage gap by failing to account for the missing-data problem created by individuals who are not employed. The estimated black-white gap in log wages expands by nearly 50 percent, from -0.091 to -0.134, when we add the 81 imputed wages for people who did not report working in either interview year.

2 The coefficients in these columns do not match Neal and Johnson (1996) exactly because the NLSY data are edited over time as coding errors are found.

Joseph Altonji and Rebecca Blank (1999) question the wisdom of imputing low wages for individuals who are not working. They argue that some of the NLSY respondents who are not working may be high-wage workers who are temporarily unemployed or out of the labor force. We can never know exactly what wages these workers would have received if they had worked during 1990 or 1991. However, we can shed light on Altonji and Blank's conjecture by exploiting the panel nature of the NLSY data. We look at data from two years beyond and two years before the 1990-1991 period to find wage observations for the 81 individuals who report not working in the 1990 and 1991 interview years.

Panel A of Figure 1 summarizes our findings. The second column reports that in 49 of 81 cases, we are able to find a wage observation in at least one of the surveys from the years 1988, 1989, 1992, or 1993. The third column breaks down the locations of these wage observations. Eight men did not report a valid wage for the 1988-1989 period but did report a valid wage during the 1992-1993 period. Another 23 men report the opposite, while yet another 18 report wages in both the before and after periods.

The fourth column describes the relationships between these new wage observations and the predicted median wage given each individual's characteristics. We can interpret the results from this column in the context of standard search theory. Ignoring the complication of finite life spans, simple search models predict that each worker's reservation wage will be constant over time. Regardless of the details, all models with a constant reservation wage imply that any person who reports a wage before or after 1990-1991 that is lower than the predicted median wage given his characteristics must have also faced a best offer during 1990-1991 that was below this predicted median. To see this, consider a worker $i$, who reports a wage in 1992 that is less than the predicted median given his characteristics, $w_{i92} < X_i'\hat{\beta}$. We
ECONOMIC WELL-BEING OF AFRICAN-AMERICANS
FIGURE 1. NEW WAGE OBSERVATIONS: TWO YEARS BEFORE AND TWO YEARS AFTER ORIGINAL SAMPLE
[Tree diagram not reproduced. Each panel splits the sample by four questions: Wage observation in sample period? Other wage observation? Timing of other wage observation (after only, before only, both before and after)? Other wage observation greater than predicted median given X_i?]
A) Neal and Johnson Sample, Period: 1990-1991. Wage observation in sample period: Yes, 1,593; No, 81 (49).
B) Mincer Regression Sample, Period: 1992. Wage observation in sample period: Yes, 3,662; No, 294 (157).
C) Mincer Regression with Two-Year Average Wage Sample, Period: 1991-1992. Wage observation in sample period: Yes, 4,003; No, 205 (94).
Notes: Numbers in parentheses are those not reporting disability.
know that although $w_{i92} < X_i'\hat{\beta}$, $w_{i92}$ exceeds worker $i$'s reservation wage and therefore exceeds any offers that he received during the 1990-1991 period. Note that this argument holds even if worker $i$ also reports a wage for 1989 that is greater than his predicted median. If an individual's reservation wage is constant over time, the minimum of observed wages must bound his unaccepted and hence unobserved wage offers from above.
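
The bounding argument above amounts to a simple classification rule, sketched here in Python. The function names and category labels are hypothetical and chosen only to echo the "always / sometimes / never" wording of Figure 1; the inputs are assumed to be wages (or log wages) on the same scale as the predicted median. The sketch relies on the constant-reservation-wage assumption stated in the text and is not the authors' procedure.

```python
def classify_imputation(adjacent_wages, predicted_median):
    """Compare a non-worker's adjacent-year wages with his predicted median.

    adjacent_wages: wages observed in survey years before or after the
    sample period (an empty list if none were reported).
    """
    if not adjacent_wages:
        return "no other wage"
    above = [w > predicted_median for w in adjacent_wages]
    if all(above):
        return "always above"
    if any(above):
        return "sometimes above"
    return "never above"


def supports_condition_a(adjacent_wages, predicted_median):
    """True when the observed wages imply a sample-period offer below the median.

    With a constant reservation wage, every accepted wage exceeds every
    rejected offer, so the minimum accepted wage is an upper bound on the
    unobserved offers. If that minimum is below the predicted median, the
    unobserved offers were below it too.
    """
    return bool(adjacent_wages) and min(adjacent_wages) < predicted_median


# A man who earned 2.0 before and 3.0 after, with predicted median 2.5:
print(classify_imputation([2.0, 3.0], 2.5))   # sometimes above
print(supports_condition_a([2.0, 3.0], 2.5))  # True
```

Only the "always above" cases fail to support the imputation under this argument, which is why a single below-median wage is enough even when another reported wage lies above the median.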
Given this framework, ten cases appear to be problematic. Eight individuals who do not report wages during the 1992-1993 period do report wages during 1988-1989 that exceed the predicted medians based on their characteristics. Two more individuals, who report wages in both the 1988-1989 and 1992-1993 periods, always report wages greater than predicted medians given their characteristics. Because our assumption of a constant reservation wage seems less attractive in cases where persons report health problems, we are especially interested in outcomes for workers reporting no disabilities, given in parentheses for each category in Figure 1. Among these men, only one reports wages both before and after 1990-1991 that exceed the predicted median based on his characteristics. Four others who do not report wages in the 1992-1993 period do report wages before 1990-1991 that satisfy this criterion. Thus, five of the 81 imputations, or just over 6 percent, seem particularly suspect.
It is difficult to draw firm conclusions based on these data. Even in the five cases noted above, it is possible that these workers lost their jobs and temporarily (during 1990-1991) received wage offers that were not only below their reservation wage, but also below the predicted median based on their characteristics. Thus, it is possible that all five cases involve valid imputations. On the other hand, the second column indicates that 32 individuals never report a wage during the entire 1988-1993 period. We assume that, relative to others with similar education and experience, these individuals actually face low wage offers. However, we have no direct evidence that this is true.

In sum, if those who never worked during the 1988-1993 period actually faced low wage offers given their characteristics, then the vast majority of our 81 wage imputations are likely to involve individuals who faced wage offers during 1990-1991 that were below predicted medians given their characteristics. Further, the final three columns of Table 1 show that, in this NLSY sample, estimates of racial gaps in median wages do not change much when we incorporate wage data from the 1988, 1989, 1992, and 1993 surveys. These columns report results derived from different rules for assigning 1990-1991 wages based on wage observations found in other survey years. In all cases, the original imputation procedure produces estimates that are very close to those based on the expanded data.
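
One way to picture the assignment rules behind columns (ii) through (v) is the sketch below. It is an illustration under explicit assumptions, not the authors' rule: the text does not say how several adjacent-year wages are combined, so averaging them is assumed here, and the names `mean_of_valid` and `dependent_variable` are hypothetical. Missing wages are represented by `None`, and the imputed value of zero follows the "impute zero if missing" description in the notes to Table 1.

```python
import math


def mean_of_valid(wages):
    """Average the valid (non-missing, positive) wages; None if there are none."""
    valid = [w for w in wages if w is not None and w > 0]
    return sum(valid) / len(valid) if valid else None


def dependent_variable(sample_wages, adjacent_wages=()):
    """Log average wage for the sample period, with fallbacks.

    1. Use the average of valid sample-period wages when any exist.
    2. Otherwise use the average of valid adjacent-year wages
       (ASSUMPTION: the paper does not spell out this combination rule).
    3. Otherwise impute zero, as in column (ii).
    """
    wage = mean_of_valid(sample_wages)
    if wage is None:
        wage = mean_of_valid(adjacent_wages)
    return math.log(wage) if wage is not None else 0.0


print(dependent_variable([10.0, None]))               # worked in one sample year
print(dependent_variable([None, None], [8.0, 12.0]))  # uses adjacent years
print(dependent_variable([None, None]))               # imputed zero
```

Passing only the earlier years, only the later years, or both as `adjacent_wages` would correspond loosely to the three expanded-data columns; passing nothing reproduces the plain zero imputation.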
III. Imputations with a Mincerian Earnings Function
The specifications employed in Table 1 use scores on the Armed Forces Qualifying Test to control for skill, but these scores are not available in most data sets. Further, following Neal
and Johnson (1996), these regressions use data from only the three youngest cohorts in the NLSY. We now explore the validity of the imputation procedure described above using a more common median regression specification and data from all birth cohorts. Figure 1B provides information analogous to that in Figure 1A, but in this case, predicted medians are based on a Mincerian wage equation that includes schooling, potential experience, and potential experience squared. Further, the analysis is based on a single cross-section of the NLSY, the 1992 wave. Here, there are 294 persons with missing wage observations for 1992. Of these, 178 report wages during either the 1990-1991 period or the 1993-1994 period.

The results in Figure 1B provide slightly less support for the imputation rule described above. In this case, 45 individuals (7 + 28 + 10) only report wages that exceed predicted medians based on their characteristics. Of these, 29 (6 + 16 + 7) do not report disabilities. These 29 individuals represent less than 10 percent of the sample of imputations. The results in Figure 1C mirror those in Figure 1B but are based on a regression involving two-year wage averages.3 By using wage information from two years, we reduce the need for imputations. Figure 1C reports a larger total sample than Figure 1B but only 205 imputations compared to 294 in panel B. Only three of these 205 cases involve persons without a disability who report wages above predicted medians, based on their characteristics, both before and after the 1991-1992 period. Further, only 13 cases, or just over 6 percent, involve persons without disabilities who only report wages above their relevant predicted medians.
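
The shares of suspect imputations quoted for the three panels can be checked with a few lines of arithmetic. The tallies below are the ones stated in the text (not independent data), and the dictionary names are purely illustrative.

```python
# Number of imputed wages in each panel of Figure 1, as stated in the text.
imputations = {"1A": 81, "1B": 294, "1C": 205}

# Persons without disabilities whose other wage observations all exceed
# their predicted medians, as stated in the text for each panel.
suspect_no_disability = {
    "1A": 4 + 1,       # four "before only" cases plus one "both" case
    "1B": 6 + 16 + 7,  # after only, before only, both before and after
    "1C": 13,
}

shares = {panel: suspect_no_disability[panel] / imputations[panel]
          for panel in imputations}

for panel, share in shares.items():
    print(f"Figure {panel}: {suspect_no_disability[panel]} of "
          f"{imputations[panel]} imputations ({share:.1%})")
```

The ratios come out to roughly 6 percent for panels A and C and just under 10 percent for panel B, matching the descriptions "just over 6 percent" and "less than 10 percent".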
Using the data summarized in Figure 1B and C, we have computed regressions like those in Table 1. These results appear in Table 2. Once again, we find evidence that regression results based only on samples of persons who are currently working tend to understate the magnitude of the black-white wage gap. Both median regressions based on the imputation rule described above and median regressions involving imputations and additional wage data from adjacent years yield black-white wage gaps that are greater than those based on the sample of observed wages. However, the results involving imputations and additional wage data imply gaps that are slightly smaller than those implied by median regressions involving imputations alone.

3 Here, the total sample is larger than in Figure 1B. Some persons who report a valid wage in 1991 and did not report a valid wage in 1992 were actually working in both years, but coding problems contaminated their 1992 wage records. In our 1992 cross-section analyses, we eliminate these workers from the sample. We do not impute wages of zero unless individuals report that they did not work during the sample period in question.

TABLE 2. MEDIAN REGRESSION RESULTS USING VARIOUS WAGE IMPUTATION METHODS: (A) NLSY, 1992 (BASED ON DATA SUMMARIZED IN FIG. 1B); (B) NLSY, 1991-1992 (BASED ON DATA SUMMARIZED IN FIG. 1C)

Variable                   (i)        (ii)       (iii)      (iv)       (v)

A. Based on Data in Figure 1B (NLSY, 1992):
Black                     -0.300     -0.362     -0.351     -0.343     -0.338
                          (0.021)    (0.022)    (0.023)    (0.023)    (0.025)
Hispanic                  -0.079     -0.091     -0.089     -0.081     -0.084
                          (0.024)    (0.025)    (0.027)    (0.027)    (0.029)
Highest grade completed    0.090      0.101      0.099      0.101      0.098
                          (0.005)    (0.005)    (0.005)    (0.005)    (0.006)
N:                         3,662      3,956      3,956      3,956      3,956

B. Based on Data in Figure 1C (NLSY, 1991-1992):
Black                     -0.302     -0.335     -0.325     -0.329     -0.322
                          (0.017)    (0.017)    (0.018)    (0.017)    (0.017)
Hispanic                  -0.097     -0.102     -0.102     -0.098     -0.099
                          (0.020)    (0.020)    (0.020)    (0.020)    (0.019)
Highest grade completed    0.076      0.081      0.081      0.081      0.081
                          (0.004)    (0.004)    (0.004)    (0.004)    (0.004)
N:                         4,003      4,208      4,208      4,208      4,208

Notes: Explanation of regression imputations: (i) restricted sample, no imputations; (ii) impute zero if missing; (iii) use new wage data, 1992-1993; (iv) use new wage data, 1990-1991 (panel A) or 1989-1990 (panel B); (v) use new wage data, 1990, 1991, 1993, and 1994 (panel A) or 1989, 1990, 1993, and 1994 (panel B). Each regression also controls for potential experience and its square. Standard errors are reported in parentheses.

IV. Conclusion
Imputing below-median wages to workers with no wage observations may be a simple and fairly accurate way of handling selection problems when estimating median wage regressions among men. This procedure significantly affects estimates of racial gaps in median wages. Using data from short panels rather than single-year cross-sections may mitigate the need for additional imputations and also reduce the frequency of imputation error.4
4 Note that computing average wages over short panels involves implicit imputations, since only years with valid wage data contribute to the average calculations.

REFERENCES
Altonji, Joseph and Blank, Rebecca. "Race and Gender in the Labor Market," in Orley Ashenfelter and David Card, eds., Handbook of labor economics, Vol. 3. Amsterdam: North-Holland, 1999, pp. 3144-3213.
Bloomfield, Peter and Steiger, William L. Least absolute deviations: Theory, applications, and algorithms. Boston, MA: Birkhauser, 1983.
Brown, Charles. "Black-White Earnings Ratios Since the Civil Rights Act of 1964: The Importance of Labor Market Dropouts." Quarterly Journal of Economics, February 1984, 99(1), pp. 31-44.
Butler, Richard and Heckman, James J. "The Government's Impact on the Labor Market Status of Black Americans: A Critical Review," in L. Hausman et al., eds., Equal rights and industrial relations. Madison, WI: Industrial Relations Research Association, 1977.
Chandra, Amitabh. "Is the Convergence in the Racial Wage Gap Illusory?" Mimeo, University of Kentucky, 2000.
Neal, Derek and Johnson, William. "The Role of Premarket Factors in Black-White Wage Differences." Journal of Political Economy, October 1996, 104(5), pp. 869-95.
Smith, James and Welch, Finis. "Black Economic Progress after Myrdal." Journal of Economic Literature, June 1989, 27(2), pp. 519-64.