LED Data: Advanced Topics

Local Employment Dynamics Data:
Advanced Topics
C2ER Training Workshop
June 4, 2012
Stephen Tibbets
Erika McEntarfer
LEHD Program
US Census Bureau
Overview of this section
• Confidentiality protection in QWI and OnTheMap
• Data identities
• Understanding differences between LED data
and other data products
• New data items: education, race & ethnicity, firm
age and size
• Question and answer session
Confidentiality protection in QWI
• QWI was one of the first public use data
products to use noise infusion to protect
the underlying microdata
– Chief advantage: noise infusion allows
release of small cells that would otherwise be
suppressed.
– Noise infusion can’t fully protect cells with very
few observations, so there is still some cell
suppression in QWI
Noise Infusion (“Fuzzing”)
• How noise infusion works
– Every data item is distorted by at least a minimum
amount
– For a given workplace, data are always distorted
in the same direction and by the same percentage in
every period and release of the QWI
• When aggregated, the effects of the distortion
cancel out for the vast majority of the
estimates
– The fewer entities in the cell, the more protection
(distortion) needed.
– Be aware of noise infusion and suppression when
aggregating small cells.
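The mechanics above can be illustrated with a minimal sketch. It is not the production QWI algorithm: the distortion bounds, the uniform draw, and the names MIN_DISTORTION, MAX_DISTORTION, draw_fuzz_factor and fuzzed_cell_total are all assumptions made for illustration. It shows a permanent multiplicative factor per workplace, bounded away from 1, whose effect largely cancels in a large cell but not in a single-workplace cell.

```python
import random

# Hypothetical bounds: every factor differs from 1 by at least 1% and at most 5%.
MIN_DISTORTION = 0.01
MAX_DISTORTION = 0.05

def draw_fuzz_factor(rng):
    """Draw one permanent multiplicative factor for a workplace (illustrative only)."""
    magnitude = rng.uniform(MIN_DISTORTION, MAX_DISTORTION)
    sign = rng.choice((-1, 1))
    return 1.0 + sign * magnitude

def fuzzed_cell_total(employment, factors):
    """Sum workplace employment after applying each workplace's own factor."""
    return sum(factors[w] * emp for w, emp in employment.items())

rng = random.Random(2012)
workplaces = [f"w{i}" for i in range(2000)]
# Factors are drawn once and would be reused in every quarter and release.
factors = {w: draw_fuzz_factor(rng) for w in workplaces}
true_emp = {w: 50 for w in workplaces}  # made-up employment counts

large_true = sum(true_emp.values())
large_rel_error = abs(fuzzed_cell_total(true_emp, factors) - large_true) / large_true

small_cell = {"w0": true_emp["w0"]}
small_rel_error = abs(fuzzed_cell_total(small_cell, factors) - 50) / 50

print(f"2000-workplace cell: {large_rel_error:.4%} off")
print(f"1-workplace cell:    {small_rel_error:.4%} off")
```

In this sketch the single-workplace cell is always off by at least the minimum distortion, while the large cell is off by far less, which is why care is needed when summing small cells.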
Confidentiality protection in
OnTheMap/LODES
• LODES is one of the first partially synthetic data releases
– Workforce characteristics by place of
residence are synthesized conditional on the
underlying microdata
– Workforce characteristics by place of work are
not synthesized (thus the ‘partially’)
QWI Identities: Overview
• A number of identities have been defined to
relate QWI measures both within and across
quarters
– A complete list can be found in the infrastructure document,
section A.2.4
• Identities hold at the establishment level (in restricted
use data)
• Identities may not hold exactly in publicly released
data due to a number of factors, including:
– weighting
– fuzzing
– changes in geography or industry for individual
establishments over time
Intertemporal Identity
• Employment at end of period t equals
employment at beginning of period t+1
EmpEnd_t = Emp_{t+1}
– When this may not hold:
• Industry or geography changes on establishment
record between quarters
• Weighting adjustments change every quarter
• Fuzz factor changes (successor-predecessor only)
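A quick way to see where the intertemporal identity fails in a published series is to difference each quarter's EmpEnd against the next quarter's Emp. The sketch below assumes a hypothetical layout (a dict keyed by (year, quarter)) and made-up numbers; the function name intertemporal_gaps is not part of any LED tool.

```python
def intertemporal_gaps(series):
    """Return {(year, quarter): EmpEnd_t - Emp_{t+1}} for each consecutive pair.

    series maps (year, quarter) -> {"Emp": ..., "EmpEnd": ...}; None marks a
    suppressed value. This layout is an assumption for illustration.
    """
    def next_quarter(yq):
        year, quarter = yq
        return (year, quarter + 1) if quarter < 4 else (year + 1, 1)

    gaps = {}
    for yq, row in series.items():
        nxt = series.get(next_quarter(yq))
        if nxt is None or row["EmpEnd"] is None or nxt["Emp"] is None:
            continue  # no following quarter, or a suppressed value
        gaps[yq] = row["EmpEnd"] - nxt["Emp"]
    return gaps

# Made-up values for one county-by-industry cell.
cell = {
    (2011, 3): {"Emp": 1200, "EmpEnd": 1230},
    (2011, 4): {"Emp": 1230, "EmpEnd": 1215},
    (2012, 1): {"Emp": 1221, "EmpEnd": 1240},
}
gaps = intertemporal_gaps(cell)
print(gaps)  # {(2011, 3): 0, (2011, 4): -6}
```

A nonzero gap, as between 2011Q4 and 2012Q1 here, is the kind of break the listed causes can produce.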
Evolution of
End-of-period employment
• Beginning and end-of-period employment are
tied by accessions and separations
EmpEnd_t = Emp_t + Hir_t - Sep_t
– When this may not hold:
• This holds almost exactly in the public release files
– Some minor differences may arise due to rounding or
precision in the calculation
– Selected measures may be suppressed on individual
records
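Because the identity holds only up to rounding, and a record may have a suppressed measure, a check against public files needs a tolerance and a third outcome. The sketch below is one way to do that; the tolerance of 1, the use of None for suppressed values, and the name check_evolution are assumptions, and the field names follow the equation on this slide.

```python
def check_evolution(record, tolerance=1.0):
    """Classify EmpEnd = Emp + Hir - Sep as 'holds', 'violated', or 'unknown'."""
    needed = ("Emp", "EmpEnd", "Hir", "Sep")
    if any(record.get(key) is None for key in needed):
        return "unknown"  # at least one measure is suppressed or missing
    residual = record["EmpEnd"] - (record["Emp"] + record["Hir"] - record["Sep"])
    return "holds" if abs(residual) <= tolerance else "violated"

# Made-up records.
exact = {"Emp": 500, "Hir": 60, "Sep": 45, "EmpEnd": 515}
rounded = {"Emp": 500, "Hir": 60, "Sep": 45, "EmpEnd": 516}
suppressed = {"Emp": 12, "Hir": None, "Sep": 3, "EmpEnd": 11}
broken = {"Emp": 500, "Hir": 60, "Sep": 45, "EmpEnd": 530}

for rec in (exact, rounded, suppressed, broken):
    print(check_evolution(rec))
```

The same pattern applies to the other within-quarter identities by swapping in their terms.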
Job Flow Identity
• Job flows represent the net of job creation
and job destruction
FrmJbC_t = FrmJbGn_t - FrmJbLs_t
– When this may not hold:
• This holds almost exactly in the public release files
– Some minor differences may arise due to rounding or
precision in the calculation
– Selected measures may be suppressed on individual
records
Creation-Destruction Identity
• The difference between beginning and end-of-period
employment equals the net of creation and destruction
EmpEnd_t = Emp_t + FrmJbGn_t - FrmJbLs_t
– When this may not hold:
• Alternate fuzzing is applied to firm measures, based on
average fuzz factors for Emp and EmpEnd.
• http://lehd.did.census.gov/led/library/techpapers/tp-2006-02.pdf
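Since firm measures are fuzzed differently from Emp and EmpEnd, it can be more useful to measure how far the identity is off than to test it for equality. A minimal sketch, with made-up numbers and a hypothetical function name, compares the net change implied by the stocks with the net change implied by the flows:

```python
def creation_destruction_gap(rec):
    """Return (gap, gap as a share of beginning employment).

    gap = (EmpEnd - Emp) - (FrmJbGn - FrmJbLs); zero when the identity holds.
    """
    net_from_stocks = rec["EmpEnd"] - rec["Emp"]
    net_from_flows = rec["FrmJbGn"] - rec["FrmJbLs"]
    gap = net_from_stocks - net_from_flows
    share = gap / rec["Emp"] if rec["Emp"] else None  # avoid dividing by zero
    return gap, share

# Made-up published values for one cell.
record = {"Emp": 2000, "EmpEnd": 2040, "FrmJbGn": 130, "FrmJbLs": 93}
gap, share = creation_destruction_gap(record)
print(gap, share)  # 3 0.0015
```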
New Hires/Recalls Identity
• Accessions are the sum of new hires and
recalls
HirA_t = HirN_t + HirR_t
– When this may not hold:
• This holds almost exactly in the public release files
– Some minor differences may arise due to rounding or
precision in the calculation
– Selected measures may be suppressed on individual
records
Understanding Differences between
QWI, OnTheMap and other data
sources
• Users are often confused when different
data provide different answers
– For QWI, users want to understand how it
differs from QCEW and JOLTS
– For OnTheMap, users want to understand
differences between LODES and Journey to
Work.
Understanding QCEW-QWI
Differences
• While state employment totals should be quite
close, sub-state estimates will display deviations
• These differences have multiple sources
• Source data, employment concepts, geography edits,
and other edits and imputations all differ across the
agencies.
• But the differences chiefly arise because:
– to provide worker demographics, QWI aggregates
from individual UI records rather than firm
employment
– Census does not receive a QCEW file that includes
final edits.
Causes of Differences:
Measure Definition
• B (QWI beginning-of-quarter employment) and Mon1
(QCEW first-month employment) do not capture
exactly the same universe
– An individual may count towards either one of
the measures, but not towards the other
• Differences generally minor, but may be
noticeable in some industries with
particular seasonal patterns
– e.g., education, agriculture
Causes of Differences:
BLS Data Editing
• LEHD data receipts
– Before 2004, LEHD received BLS-edited data
– Since 2004, LEHD has not received BLS-edited data
(CIPSEA)
• The BLS QCEW file may have been edited, and so may
differ from the file LEHD receives
– Completeness
– Imputed employment
– Industry/geography changes
• Statewide totals are close (<1% off)
• LEHD QA periodically notes BLS QCEW data that are
inconsistent with internal LEHD QCEW microdata
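A simple way to put the "<1% off" benchmark to work is to compute the percent difference between the two sources for each area and flag anything larger. The sketch below uses made-up employment counts and hypothetical names (percent_difference, flag_large_gaps); the 1% threshold is taken from the statewide figure above and is only a starting point for sub-state areas.

```python
def percent_difference(qwi_emp, qcew_emp):
    """Percent by which the QWI figure differs from the QCEW figure."""
    return 100.0 * (qwi_emp - qcew_emp) / qcew_emp

def flag_large_gaps(pairs, threshold_pct=1.0):
    """Return {area: rounded percent difference} where the gap exceeds the threshold.

    pairs maps area name -> (qwi_emp, qcew_emp).
    """
    flagged = {}
    for area, (qwi_emp, qcew_emp) in pairs.items():
        diff = percent_difference(qwi_emp, qcew_emp)
        if abs(diff) > threshold_pct:
            flagged[area] = round(diff, 2)
    return flagged

# Made-up counts: (QWI employment, QCEW employment).
areas = {
    "State total": (2_481_000, 2_495_000),
    "County A": (151_200, 150_900),
    "County B": (9_450, 9_900),
}
flagged = flag_large_gaps(areas)
print(flagged)  # {'County B': -4.55}
```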
Causes of Differences:
UI Wage Data Reporting
• Firm may fail to report wage records
– QCEW still reported or imputed
• Firm may report wage records and QCEW
records on different account numbers
– Successor/predecessor mistiming
– Public sector issues
• PIK (SSN) miscoding prevents linking
wage records to same longitudinal job
Causes of Differences:
Industry Assignment
• Most establishments are assigned an industry based on
the reported NAICS_AUX
• For earlier years in the data series, the reported SIC
code is probabilistically mapped to the current NAICS
codes
– Imputations may also be used for transitions between
1997, 2002, and 2007 NAICS
• LDB data are used for NAICS back-coding purposes
when the file has been provided by the state
• Variations in algorithms between LEHD and BLS may
result in differences
– NAICS sector 55 (management of companies) displays
particular issues during SIC-NAICS transition
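To make "probabilistically mapped" concrete, here is a minimal sketch of a weighted draw from a crosswalk. It is not the LEHD or BLS algorithm: the crosswalk, its shares, the placeholder codes (SIC_A, NAICS_X, and so on) and the function impute_naics are all invented for illustration. It also shows why two agencies with different crosswalks or draws can land on different codes for the same establishment.

```python
import random

# Hypothetical crosswalk: each old code maps to candidate new codes with shares
# that sum to 1. Codes and shares are placeholders, not real SIC/NAICS values.
CROSSWALK = {
    "SIC_A": [("NAICS_X", 0.60), ("NAICS_Y", 0.35), ("NAICS_Z", 0.05)],
    "SIC_B": [("NAICS_W", 1.00)],
}

def impute_naics(sic_code, rng):
    """Draw one candidate code with probability proportional to its share."""
    candidates = CROSSWALK[sic_code]
    codes = [code for code, _ in candidates]
    weights = [share for _, share in candidates]
    return rng.choices(codes, weights=weights, k=1)[0]

rng = random.Random(0)
draws = [impute_naics("SIC_A", rng) for _ in range(10_000)]
share_x = draws.count("NAICS_X") / len(draws)
print(f"Share assigned to NAICS_X: {share_x:.3f}")
```

A one-to-one entry such as SIC_B always yields the same code; only split entries introduce randomness.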
Causes of Differences:
Geographic Coding
• LEHD performs its own geocoding of addresses
– Generates lat-long for distance measures, allows
custom geography
• Address data are processed along with
address data from other sources
• Results may differ from BLS assignments
– Marginal shift over county line
– Significant relocation
• Effort currently underway to reengineer LEHD
geographic assignment to improve results
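Once each address has a latitude and longitude, a distance measure can be computed directly from the coordinates. The sketch below uses the standard haversine great-circle formula; this is a common choice, not necessarily the one LEHD uses, and the function name is an assumption.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat-long points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# One degree of longitude along the equator, as a sanity check.
one_degree = haversine_km(0.0, 0.0, 0.0, 1.0)
print(f"{one_degree:.2f} km")
```

A small geocoding shift moves this distance only slightly, but the same shift across a county line changes which county the establishment is tabulated in.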
Differences between OnTheMap
and Journey to Work
• OnTheMap uses LEHD data
– Administrative data on employment, wages,
residence, and establishment locations
• Journey to Work uses ACS data
– User reported place of work and place of
residence, wages and employment
OnTheMap and Journey to Work,
some reasons they may differ
• OnTheMap
– Establishment may not be same as worksite
(construction workers)
– Tax address may differ from residence (students)
• Journey to Work
– High nonresponse on place of work
– Commute distance is capped in Journey to Work
– Response bias in employment, wages
Overview: Summary
– The QWI are developed by incorporating data
from a broad variety of sources
– Differences in data sources, construction, and
imputation procedures may produce
employment estimates that do not match
other sources