Local Employment Dynamics Data: Advanced Topics
C2ER Training Workshop
June 4, 2012
Stephen Tibbets
Erika McEntarfer
LEHD Program
US Census Bureau

Overview of this section
• Confidentiality protection in QWI and OnTheMap
• Data identities
• Understanding differences between LED data and other data products
• New data items: education, race & ethnicity, firm age and size
• Question and answer session

Confidentiality protection in QWI
• QWI was one of the first public-use data products to use noise infusion to protect the underlying microdata
  – Chief advantage: noise infusion allows release of small cells that would otherwise be suppressed.
  – Noise infusion can’t fully protect cells with very few observations, so there is still some cell suppression in QWI.

Noise Infusion (“Fuzzing”)
• How noise infusion works
  – Every data item is distorted by at least a minimum amount
  – For a given workplace, data are always distorted in the same direction and by the same percentage, in every period and every release of the QWI
• When aggregated, the effects of the distortion cancel out for the vast majority of the estimates
  – The fewer entities in the cell, the more protection (distortion) needed.
  – Be aware of noise infusion and suppression when aggregating small cells.

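The behavior described on the Noise Infusion slide can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the Census Bureau's procedure: the distortion bounds (5% and 15%), the uniform draw, and the function names are all hypothetical, chosen only to show a permanent per-workplace factor that is never close to 1 and that largely cancels out when many workplaces are summed.

```python
import random

# Hypothetical bounds on |factor - 1|; the real QWI parameters are not given in this deck.
MIN_DISTORTION = 0.05
MAX_DISTORTION = 0.15


def draw_fuzz_factor(rng):
    """Draw one permanent multiplicative factor for a workplace.

    The factor is never closer to 1 than MIN_DISTORTION, so every
    item is distorted by at least a minimum amount.
    """
    magnitude = rng.uniform(MIN_DISTORTION, MAX_DISTORTION)
    direction = rng.choice((-1, 1))  # fixed direction for this workplace
    return 1.0 + direction * magnitude


def fuzz_cell(true_counts, factors):
    """Sum the distorted workplace counts that fall in one cell."""
    return sum(count * factor for count, factor in zip(true_counts, factors))


def relative_error(true_counts, factors):
    """Absolute relative gap between the distorted and true cell totals."""
    true_total = sum(true_counts)
    return abs(fuzz_cell(true_counts, factors) - true_total) / true_total


rng = random.Random(2012)
# One factor per workplace, drawn once and reused in every period and release.
factors = [draw_fuzz_factor(rng) for _ in range(2000)]

# A cell with a single workplace keeps the full distortion...
small_cell_error = relative_error([40], factors[:1])
# ...while a cell with many workplaces sees most of it cancel out.
large_cell_error = relative_error([40] * 2000, factors)

print(f"one workplace:   {small_cell_error:.4f}")
print(f"2000 workplaces: {large_cell_error:.4f}")
```

Because the factors are drawn once and then reused, re-running the aggregation for another quarter with the same `factors` list mimics the "same direction, same percentage" property. Summing a handful of small cells, by contrast, can leave a noticeable residual distortion.
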
Confidentiality protection in OnTheMap/LODES
• LODES is one of the first partially synthetic data releases
  – Workforce characteristics by place of residence are synthesized conditional on the underlying microdata
  – Workforce characteristics by place of work are not synthesized (thus the ‘partially’)

QWI Identities: Overview
• A number of identities have been defined to relate QWIs both within and across quarters
  – A complete list can be found in the infrastructure document, section A.2.4
• Identities hold at the establishment level (in restricted-use data)
• Identities may not hold exactly in publicly released data due to a number of factors, including:
  – weighting
  – fuzzing
  – changes in geography or industry for individual establishments over time

Intertemporal Identity
• Employment at the end of period t equals employment at the beginning of period t+1
    EmpEnd_t = Emp_{t+1}
  – When this may not hold:
    • Industry or geography changes on the establishment record between quarters
    • Weighting adjustments change every quarter
    • Fuzz factor changes (successor-predecessor only)

Evolution of End-of-Period Employment
• Beginning- and end-of-period employment are tied by accessions and separations
    EmpEnd_t = Emp_t + HirA_t - Sep_t
  – When this may not hold:
    • This holds almost exactly in the public release files
      – Some minor differences may arise due to rounding or precision in the calculation
      – Selected measures may be suppressed on individual records

Job Flow Identity
• Job flow represents the net of job creation and job destruction
    FrmJbC_t = FrmJbGn_t - FrmJbLs_t
  – When this may not hold:
    • This holds almost exactly in the public release files
      – Some minor differences may arise due to rounding or precision in the calculation
      – Selected measures may be suppressed on individual records

Creation-Destruction Identity
• The difference between beginning- and end-of-period employment equals the net of creation and destruction
    EmpEnd_t = Emp_t + FrmJbGn_t - FrmJbLs_t
  – When this may not hold:
    • Alternate fuzzing is applied to firm measures, based on average fuzz factors for Emp and EmpEnd.
    • http://lehd.did.census.gov/led/library/techpapers/tp-2006-02.pdf

New Hires/Recalls Identity
• Accessions are the sum of new hires and recalls
    HirA_t = HirN_t + HirR_t
  – When this may not hold:
    • This holds almost exactly in the public release files
      – Some minor differences may arise due to rounding or precision in the calculation
      – Selected measures may be suppressed on individual records

Understanding Differences between QWI, OnTheMap and other data sources
• Users are often confused when different data sources provide different answers
  – For QWI, users want to understand how it differs from QCEW and JOLTS
  – For OnTheMap, users want to understand differences between LODES and Journey to Work.

Understanding QCEW-QWI Differences
• While state employment totals should be quite close, sub-state estimates will display deviations
• These differences have multiple sources
• Source data, employment concepts, geography edits, and other edits and imputations differ across the agencies.
• But the differences chiefly arise because:
  – to provide worker demographics, QWI aggregates from individual UI records rather than firm employment
  – Census does not receive a QCEW file that includes final edits.

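Returning to the QWI identities listed earlier in this section, a user can check how closely they hold in a public-use extract by computing the residual of each one. The sketch below assumes each record is a plain dictionary keyed by the variable names used on the identity slides (with accessions written as HirA), and that a suppressed measure is represented by `None`. Both conventions, the function names, and the numbers are made up for illustration; they are not real QWI values or an official file layout.

```python
# Each identity is written as: left-hand side = sum(plus terms) - sum(minus terms).
IDENTITIES = {
    "evolution": ("EmpEnd", ("Emp", "HirA"), ("Sep",)),
    "job_flow": ("FrmJbC", ("FrmJbGn",), ("FrmJbLs",)),
    "creation_destruction": ("EmpEnd", ("Emp", "FrmJbGn"), ("FrmJbLs",)),
    "hires_recalls": ("HirA", ("HirN", "HirR"), ()),
}


def _residual(row, lhs, plus, minus):
    """Return lhs - rhs, or None when any term is suppressed or missing."""
    names = (lhs, *plus, *minus)
    if any(row.get(name) is None for name in names):
        return None  # the identity cannot be evaluated on this record
    rhs = sum(row[name] for name in plus) - sum(row[name] for name in minus)
    return row[lhs] - rhs


def identity_residuals(row):
    """Residual of every within-quarter identity for one record."""
    return {name: _residual(row, *spec) for name, spec in IDENTITIES.items()}


def intertemporal_residual(row_t, row_next):
    """EmpEnd in quarter t minus Emp in quarter t+1, or None if suppressed."""
    if row_t.get("EmpEnd") is None or row_next.get("Emp") is None:
        return None
    return row_t["EmpEnd"] - row_next["Emp"]


# Illustrative records for two consecutive quarters (invented numbers).
q1 = {"Emp": 1000, "EmpEnd": 1030, "HirA": 150, "Sep": 120,
      "FrmJbC": 30, "FrmJbGn": 55, "FrmJbLs": 25, "HirN": 110, "HirR": 40}
q2 = {"Emp": 1031, "EmpEnd": 1050, "HirA": 140, "Sep": 121,
      "FrmJbC": 19, "FrmJbGn": 44, "FrmJbLs": 25, "HirN": 100, "HirR": None}

q1_residuals = identity_residuals(q1)
q2_residuals = identity_residuals(q2)
between_quarters = intertemporal_residual(q1, q2)

print(q1_residuals)
print(q2_residuals)
print(between_quarters)
```

A residual of zero means the identity holds exactly. A small nonzero residual is consistent with the rounding, weighting, or fuzzing effects noted on the slides, and `None` marks a record where suppression prevents the check.
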
Causes of Differences: Measure Definition
• B (QWI beginning-of-quarter employment) and Mon1 (QCEW first-month employment) do not capture exactly the same universe
  – An individual may count towards one of the measures but not the other
• Differences are generally minor, but may be noticeable in some industries with particular seasonal patterns
  – e.g., education, agriculture

Causes of Differences: BLS Data Editing
• LEHD data receipts
  – Before 2004, LEHD received BLS-edited data
  – Since 2004, LEHD has not received BLS-edited data (CIPSEA)
• The BLS QCEW file may be edited, and so differ from the file that LEHD receives
  – Completeness
  – Imputed employment
  – Industry/geography changes
• Statewide totals are close (<1% off)
• LEHD QA will periodically note BLS QCEW data that are inconsistent with internal LEHD QCEW microdata

Causes of Differences: UI Wage Data Reporting
• Firm may fail to report wage records
  – QCEW still reported or imputed
• Firm may report wage records and QCEW records on different account numbers
  – Successor/predecessor mistiming
  – Public sector issues
• PIK (SSN) miscoding prevents linking wage records to the same longitudinal job

Causes of Differences: Industry Assignment
• Most establishments are assigned an industry based on the reported NAICS_AUX
• For earlier years in the data series, the reported SIC code is probabilistically mapped to the current NAICS codes
  – Imputations may also be used for transitions between 1997, 2002, and 2007 NAICS
• LDB data are used for NAICS back-coding purposes when the file has been provided by the state
• Variations in algorithms between LEHD and BLS may result in differences
  – NAICS sector 55 (management of companies) displays particular issues during the SIC-NAICS transition

Causes of Differences: Geographic Coding
• LEHD performs its own geocoding of addresses
  – Generates lat-long for distance measures, allows custom geography
• Address data are processed along with address data from other sources
• Results may differ from BLS assignments
  – Marginal shift over county line
  – Significant relocation
• Effort currently underway to reengineer LEHD geographic assignment to improve results

Differences between OnTheMap and Journey to Work
• OnTheMap uses LEHD data
  – Administrative data on employment, wages, residence, and establishment locations
• Journey to Work uses ACS data
  – User-reported place of work and place of residence, wages and employment

OnTheMap and Journey to Work: some reasons they may differ
• OnTheMap
  – Establishment location may not be the same as the worksite (e.g., construction workers)
  – Tax address may differ from residence (e.g., students)
• Journey to Work
  – High nonresponse on place of work
  – Commute distance is capped in Journey to Work
  – Response bias in employment, wages

Overview: Summary
  – The QWI are developed by incorporating data from a broad variety of sources
  – Differences in data sources, construction, and imputation procedures may produce employment estimates that do not match other sources