Evaluating a Simple Method for Estimating Black-White Gaps in Median Wages

By WILLIAM JOHNSON, YUICHI KITAMURA, AND DEREK NEAL*

Source: The American Economic Review, Vol. 90, No. 2, Papers and Proceedings of the One Hundred Twelfth Annual Meeting of the American Economic Association (May, 2000), pp. 339-343
Published by: American Economic Association
Stable URL: http://www.jstor.org/stable/117247
Accessed: 27-07-2015 13:53 UTC

Racial differences in wage rates are important measures of economic inequality among races because, for almost all individuals, labor income constitutes the most important component of lifetime income. As a consequence, the prices at which individuals may sell their time provide vital information about the distribution of welfare and economic success. However, we only observe prices when market transactions occur, and thus we only observe wage rates for individuals who are employed.
Since the wages of employed workers are not randomly sampled from the distribution of potential wages, it is difficult to draw inferences concerning racial gaps in potential wages from data on observed wage rates. Richard Butler and James Heckman (1977) raised this issue in the context of assessing the impact of government policies on racial income inequality. Charles Brown (1984), James Smith and Finis Welch (1989), and Amitabh Chandra (2000) also examine the extent of racial differences in participation rates and the impact of these differences on measures of racial wage and income gaps. This topic remains salient, in part, because employment rates among working-age black males remain significantly below corresponding rates for whites.

Neal and Johnson (1996) estimated racial gaps in median wages among men by imputing wages of zero for all men in a particular cross-section who report that they have not worked at all during the survey period. Under a specific assumption concerning the distribution of missing wages, this procedure yields consistent estimates of the black-white gap in median wages conditional on observed characteristics. Below, we spell out this assumption and use panel data to investigate the extent to which it may be violated in cross-section wage analyses. Our results suggest that imputing wages of zero for unemployed individuals may provide a reasonable way to estimate median wage regressions among men.

I. A Simple Imputation Method

Consider the following linear model:

$$w_i = X_i'\beta_0 + \varepsilon_i$$

where $w_i$, $X_i$, and $\varepsilon_i$ are the wage offer, observed characteristics, and unobserved traits for individual $i$. The conditional median of $\varepsilon_i$ given $X_i$ is assumed to be zero. We are interested in identifying the unknown parameter vector $\beta_0$. Our problem is that $w_i$ is not observed for those who do not work, $I_i = 0$. We proceed by creating a variable $y_i$ such that $y_i = w_i$ if $I_i = 1$ and $y_i = 0$ if $I_i = 0$, and we assume that the following condition (Condition A) holds:

$$w_i < X_i'\hat{\beta} \quad \text{if } I_i = 0.$$
Here, $\hat{\beta}$ is a hypothetical LAD (least absolute deviation) estimator based on the true wage offers, $w_i$. Given these assumptions, LAD estimation using $y_i$ has the following property:

$$\hat{\beta}_{\text{imputed}} = \arg\min_{\beta} \sum_{i=1}^{N} |y_i - X_i'\beta| = \arg\min_{\beta} \sum_{i=1}^{N} |w_i - X_i'\beta|$$

because Condition A implies that the LAD estimation is not affected at all by the imputations.¹ Further, since the hypothetical LAD estimator $\hat{\beta}$ is consistent, we know that $\hat{\beta}_{\text{imputed}}$ is also a consistent estimator of $\beta_0$ (see Peter Bloomfield and William Steiger [1983 pp. 44-52] for details).

* Johnson: Department of Economics, University of Virginia, Charlottesville, VA 22903; Kitamura and Neal: Department of Economics, University of Wisconsin, Madison, WI 53706. Neal is also affiliated with the National Bureau of Economic Research.

¹ Here, we assume that the LAD estimator is unique.

TABLE 1 - MEDIAN REGRESSION RESULTS USING VARIOUS WAGE IMPUTATION METHODS: NLSY, 1990-1991

Variable       (i)        (ii)       (iii)      (iv)       (v)
Black         -0.091     -0.134     -0.141     -0.138     -0.139
              (0.036)    (0.034)    (0.035)    (0.033)    (0.031)
Hispanic       0.013     -0.014     -0.018     -0.017     -0.017
              (0.039)    (0.038)    (0.038)    (0.037)    (0.035)
Age            0.058      0.055      0.057      0.061      0.060
              (0.017)    (0.017)    (0.017)    (0.016)    (0.015)
AFQT           0.197      0.206      0.202      0.200      0.200
              (0.016)    (0.015)    (0.015)    (0.015)    (0.014)
(AFQT)^2       0.007      0.010      0.011      0.010      0.010
              (0.014)    (0.014)    (0.014)    (0.013)    (0.013)
N:             1,593      1,674      1,674      1,674      1,674

Notes: Explanation of regression imputations: (i) restricted sample, no imputations; (ii) impute zero if missing; (iii) use new wage data, 1992-1993; (iv) use new wage data, 1988-1989; (v) use new wage data, 1988, 1989, 1992, and 1993. Standard errors are reported in parentheses. Details concerning these samples, and those used in Figure 1, are available online (see (http://www.ssc.wisc.edu/-dneal)).

II.
The Imputation Method Examined

The method we have described is easy to implement and also has important consequences for estimates of the effect of race on wages. To see this, compare the two median wage regressions presented in the first two columns of Table 1. The dependent variable is the log of the average wage earned over the period 1990-1991, and the data, from the National Longitudinal Survey of Youth (NLSY), are the same observations used in Neal and Johnson (1996). We lack wage observations only for those who work neither in 1990 nor in 1991. In the first regression, we simply eliminate all individuals who did not work in either interview year. In the second regression, we replicate the Neal and Johnson (1996) results by imputing a wage of zero for all individuals who do not work.² A comparison of the two regressions reveals that the results in column (i) may understate the magnitude of the black-white wage gap by failing to account for the missing-data problem created by individuals who are not employed. The estimated black-white gap in log wages expands by 50 percent from -0.091 to -0.134 when we add the 81 imputed wages for people who did not report working in either interview year.

² The coefficients in these columns do not match Neal and Johnson (1996) exactly because the NLSY data are edited over time as coding errors are found.

Joseph Altonji and Rebecca Blank (1999) question the wisdom of imputing low wages for individuals who are not working. They argue that some of the NLSY respondents who are not working may be high-wage workers who are temporarily unemployed or out of the labor force. We can never know exactly what wages these workers would have received if they had worked during 1990 or 1991. However, we can shed light on Altonji and Blank's conjecture by exploiting the panel nature of the NLSY data.
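The direction of the change between columns (i) and (ii) can be illustrated with a small made-up example. The sketch below is not the authors' code and uses no NLSY data: the wage offers, employment flags, and function names are all hypothetical. It also assumes the only regressor is a group indicator, in which case the median-regression coefficient on the indicator is simply the difference in group medians. Every non-worker is given an offer below his own group's median, so Condition A holds by construction.

```python
from statistics import median

# Hypothetical log wage offers for two groups (illustrative values only).
offers_white = [1.8, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0]
offers_black = [1.5, 1.7, 1.9, 2.1, 2.3, 2.5, 2.7]

# Hypothetical employment flags: every non-worker has an offer below
# his group's median, so Condition A holds by construction.
works_white = [False, True, True, True, True, True, True]
works_black = [False, False, False, True, True, True, True]


def observed(offers, works):
    """Restricted sample: keep only the wages of men who work."""
    return [w for w, employed in zip(offers, works) if employed]


def imputed(offers, works):
    """Full sample: replace each missing wage with zero."""
    return [w if employed else 0.0 for w, employed in zip(offers, works)]


def median_gap(black, white):
    """Black-white difference in medians (the coefficient on a group
    indicator in a median regression with no other regressors)."""
    return median(black) - median(white)


true_gap = median_gap(offers_black, offers_white)
restricted_gap = median_gap(observed(offers_black, works_black),
                            observed(offers_white, works_white))
imputed_gap = median_gap(imputed(offers_black, works_black),
                         imputed(offers_white, works_white))

print("gap in offered wages:  ", round(true_gap, 3))
print("gap, workers only:     ", round(restricted_gap, 3))
print("gap, zeros imputed:    ", round(imputed_gap, 3))
```

In this toy sample the workers-only gap is smaller in magnitude than the gap in offered wages, while the zero-imputed gap matches it, because moving values around below the median does not change the median. Nothing here says whether Condition A holds in real data, which is the question the panel evidence addresses.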
We look at data from two years beyond and two years before the 1990-1991 period to find wage observations for the 81 individuals who report not working in the 1990 and 1991 interview years. Panel A of Figure 1 summarizes our findings. The second column reports that in 49 of 81 cases, we are able to find a wage observation in at least one of the surveys from the years 1988, 1989, 1992, or 1993. The third column breaks down the locations of these wage observations. Eight men did not report a valid wage for the 1988-1989 period but did report a valid wage during the 1992-1993 period. Another 23 men report the opposite, while yet another 18 report wages in both the before and after periods.

The fourth column describes the relationships between these new wage observations and the predicted median wage given each individual's characteristics. We can interpret the results from this column in the context of standard search theory. Ignoring the complication of finite life spans, simple search models predict that each worker's reservation wage will be constant over time. Regardless of the details, all models with a constant reservation wage imply that any person who reports a wage before or after 1990-1991 that is lower than the predicted median wage given his characteristics must have also faced a best offer during 1990-1991 that was below this predicted median.

To see this, consider a worker $i$, who reports a wage in 1992 that is less than the predicted median given his characteristics, $w_{i,92} < X_i'\hat{\beta}$. We know that although $w_{i,92} < X_i'\hat{\beta}$, $w_{i,92}$ exceeds worker $i$'s reservation wage and therefore exceeds any offers that he received during the 1990-1991 period. Note that this argument holds even if worker $i$ also reports a wage for 1989 that is greater than his predicted median. If an individual's reservation wage is constant over time, the minimum of observed wages must bound his unaccepted and hence unobserved wage offers from above.

FIGURE 1. NEW WAGE OBSERVATIONS: TWO YEARS BEFORE AND TWO YEARS AFTER ORIGINAL SAMPLE
[Tree diagram not reproduced. Panels: A) Neal and Johnson Sample, Period: 1990-1991; B) Mincer Regression Sample, Period: 1992; C) Mincer Regression with Two-Year Average Wage Sample, Period: 1991-1992. Columns in each panel: wage observation in sample period?; other wage observation?; timing of other wage observation (after only, before only, both before and after); other wage observation greater than predicted median given $X_i$?]
Notes: Numbers in parentheses are those not reporting disability.

Given this framework, ten cases appear to be problematic. Eight individuals who do not report wages during the 1992-1993 period do report wages during 1988-1989 that exceed the predicted medians based on their characteristics. Two more individuals, who report wages in both the 1988-1989 and 1992-1993 periods, always report wages greater than predicted medians given their characteristics. Because our assumption of a constant reservation wage seems less attractive in cases where persons report health problems, we are especially interested in outcomes for workers reporting no disabilities, given in parentheses for each category in Figure 1.
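The bounding argument can be written as a simple classification rule. The following is a minimal sketch under the constant-reservation-wage assumption, not the authors' procedure: the function name, the three labels, and the treatment of a wage exactly equal to the predicted median (counted here as not below it) are all assumptions made for illustration.

```python
from typing import Sequence


def classify_imputation(adjacent_wages: Sequence[float],
                        predicted_median: float) -> str:
    """Classify one zero imputation using wages from adjacent years.

    adjacent_wages: wages the person reports before or after the sample
        period (empty if he never reports a wage).
    predicted_median: predicted median wage given his characteristics.
    """
    if not adjacent_wages:
        # No wage is ever observed, so the data are silent on this case.
        return "no evidence"
    if min(adjacent_wages) < predicted_median:
        # With a constant reservation wage, the lowest accepted wage
        # bounds every rejected offer from above, so offers in the
        # sample period were also below the predicted median.
        return "supported"
    # Every observed wage is at or above the predicted median.
    return "suspect"


# A low wage in one year is enough, even if another year's wage is high.
print(classify_imputation([2.0, 3.1], 2.5))
print(classify_imputation([2.8], 2.5))
print(classify_imputation([], 2.5))
```

A "suspect" label only flags a case for scrutiny. It does not show that the imputation is wrong, since a worker may have temporarily faced unusually low offers during the sample period.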
Among the men who report no disabilities, only one reports wages both before and after 1990-1991 that exceed the predicted median based on his characteristics. Four others who do not report wages in the 1992-1993 period do report wages before 1990-1991 that satisfy this criterion. Thus, five of the 81 imputations, or just over 6 percent, seem particularly suspect.

It is difficult to draw firm conclusions based on these data. Even in the five cases noted above, it is possible that these workers lost their jobs and temporarily (during 1990-1991) received wage offers that were not only below their reservation wage, but also below the predicted median based on their characteristics. Thus, it is possible that all five cases involve valid imputations. On the other hand, the second column indicates that 32 individuals never report a wage during the entire 1988-1993 period. We assume that, relative to others with similar education and experience, these individuals actually face low wage offers. However, we have no direct evidence that this is true.

In sum, if those who never worked during the 1988-1993 period actually faced low wage offers given their characteristics, then the vast majority of our 81 wage imputations are likely to involve individuals who faced wage offers during 1990-1991 that were below predicted medians given their characteristics. Further, the final three columns of Table 1 show that, in this NLSY sample, estimates of racial gaps in median wages do not change much when we incorporate wage data from the 1988, 1989, 1992, and 1993 surveys. These columns report results derived from different rules for assigning 1990-1991 wages based on wage observations found in other survey years. In all cases, the original imputation procedure produces estimates that are very close to those based on the expanded data.

III.
Imputations with a Mincerian Earnings Function

The specifications employed in Table 1 use scores on the Armed Forces Qualifying Test to control for skill, but these scores are not available in most data sets. Further, following Neal and Johnson (1996), these regressions use data from only the three youngest cohorts in the NLSY. We now explore the validity of the imputation procedure described above using a more common median regression specification and data from all birth cohorts.

Figure 1B provides information analogous to that in Figure 1A, but in this case, predicted medians are based on a Mincerian wage equation that includes schooling, potential experience, and potential experience squared. Further, the analysis is based on a single cross-section of the NLSY, the 1992 wave. Here, there are 294 persons with missing wage observations for 1992. Of these, 178 report wages during either the 1990-1991 period or the 1993-1994 period.

The results in Figure 1B provide slightly less support for the imputation rule described above. In this case, 45 individuals (7 + 28 + 10) only report wages that exceed predicted medians based on their characteristics. Of these, 29 (6 + 16 + 7) do not report disabilities. These 29 individuals represent less than 10 percent of the sample of imputations.

The results in Figure 1C mirror those in Figure 1B but are based on a regression involving two-year wage averages.³ By using wage information from two years, we reduce the need for imputations. Figure 1C reports a larger total sample than Figure 1B but only 205 imputations compared to 294 in panel B. Only three of these 205 cases involve persons without a disability who report wages above predicted medians, based on their characteristics, both before and after the 1991-1992 period.
Further, only 13 cases, or just over 6 percent, involve persons without disabilities who only report wages above their relevant predicted medians.

Using the data summarized in Figure 1B and C, we have computed regressions like those in Table 1. These results appear in Table 2. Once again, we find evidence that regression results based only on samples of persons who are currently working tend to understate the magnitude of the black-white wage gap. Both median regressions based on the imputation rule described above and median regressions involving imputations and additional wage data from adjacent years yield black-white wage gaps that are greater than those based on the sample of observed wages. However, the results involving imputations and additional wage data imply gaps that are slightly smaller than those implied by median regressions involving imputations alone.

³ Here, the total sample is larger than in Figure 1B. Some persons who report a valid wage in 1991 and did not report a valid wage in 1992 were actually working in both years, but coding problems contaminated their 1992 wage records. In our 1992 cross-section analyses, we eliminate these workers from the sample. We do not impute wages of zero unless individuals report that they did not work during the sample period in question.

TABLE 2 - MEDIAN REGRESSION RESULTS USING VARIOUS WAGE IMPUTATION METHODS: (A) NLSY, 1992 (BASED ON DATA SUMMARIZED IN FIG. 1B); (B) NLSY, 1991-1992 (BASED ON DATA SUMMARIZED IN FIG. 1C)

Variable                   (i)        (ii)       (iii)      (iv)       (v)

A. Based on Data in Figure 1B (NLSY, 1992):
Black                     -0.300     -0.362     -0.351     -0.343     -0.338
                          (0.021)    (0.022)    (0.023)    (0.023)    (0.025)
Hispanic                  -0.079     -0.091     -0.089     -0.081     -0.084
                          (0.024)    (0.025)    (0.027)    (0.027)    (0.029)
Highest grade completed    0.090      0.101      0.099      0.101      0.098
                          (0.005)    (0.005)    (0.005)    (0.005)    (0.006)
N:                         3,662      3,956      3,956      3,956      3,956

B. Based on Data in Figure 1C (NLSY, 1991-1992):
Black                     -0.302     -0.335     -0.325     -0.329     -0.322
                          (0.017)    (0.017)    (0.018)    (0.017)    (0.017)
Hispanic                  -0.097     -0.102     -0.102     -0.098     -0.099
                          (0.020)    (0.020)    (0.020)    (0.020)    (0.019)
Highest grade completed    0.076      0.081      0.081      0.081      0.081
                          (0.004)    (0.004)    (0.004)    (0.004)    (0.004)
N:                         4,003      4,208      4,208      4,208      4,208

Notes: Explanation of regression imputations: (i) restricted sample, no imputations; (ii) impute zero if missing; (iii) use new wage data, 1992-1993; (iv) use new wage data, 1990-1991 (panel A) or 1989-1990 (panel B); (v) use new wage data, 1990, 1991, 1993, and 1994 (panel A) or 1989, 1990, 1993, and 1994 (panel B). Each regression also controls for potential experience and its square. Standard errors are reported in parentheses.

IV. Conclusion

Imputing below-median wages to workers with no wage observations may be a simple and fairly accurate way of handling selection problems when estimating median wage regressions among men. This procedure significantly affects estimates of racial gaps in median wages. Using data from short panels rather than single-year cross-sections may mitigate the need for additional imputations and also reduce the frequency of imputation error.⁴

⁴ Note that computing average wages over short panels involves implicit imputations, since only years with valid wage data contribute to the average calculations.

REFERENCES

Altonji, Joseph and Blank, Rebecca. "Race and Gender in the Labor Market," in Orley Ashenfelter and David Card, eds., Handbook of labor economics, Vol. 3. Amsterdam: North-Holland, 1999, pp. 3144-3213.

Bloomfield, Peter and Steiger, William L. Least absolute deviations: Theory, applications, and algorithms. Boston, MA: Birkhauser, 1983.

Brown, Charles. "Black-White Earnings Ratios Since the Civil Rights Acts of 1964: The Importance of Labor Market Dropouts." Quarterly Journal of Economics, February 1984, 99(1), pp. 31-44.

Butler, Richard and Heckman, James J. "The Government's Impact on the Labor Market Status of Black Americans: A Critical Review," in L. Hausman et al., eds. Equal rights and industrial relations. Madison, WI: Industrial Relations Research Association, 1977.

Chandra, Amitabh. "Is the Convergence in the Racial Wage Gap Illusory?" Mimeo, University of Kentucky, 2000.

Neal, Derek and Johnson, William. "The Role of Premarket Factors in Black-White Wage Differences." Journal of Political Economy, October 1996, 104(5), pp. 869-95.

Smith, James and Welch, Finis. "Black Economic Progress after Myrdal." Journal of Economic Literature, June 1989, 27(2), pp. 519-64.