Calibration estimators

How similar are different calibration estimators in the
presence of a zero-inflated auxiliary variable?
Evidence from the German job vacancy survey
Hans Kiesl
Institute for Employment Research (IAB), Germany
NTTS 2009 – New Techniques and Technologies for Statistics
Brussels • February 18-20, 2009
Background
Regulation (EC) No. 453/2008 of the European Parliament and of the
Council of 23 April 2008 on quarterly statistics on Community job
vacancies
Member states have to provide:
- quarterly data on job vacancies (broken down to NACE section level)
- quality reports
In Germany, the data will be provided by the IAB.
Background (2)
Information on job vacancies in Germany

- Business units may report job vacancies to the Federal Employment Agency.
- The Federal Employment Agency publishes monthly statistics on the number of registered job vacancies (by NACE sector).
- Since 1989, the IAB has conducted an annual (4th quarter) sample survey among business units to estimate the number of job vacancies (registered or not) and to collect additional information (e.g. on recruiting strategies).
- Mail questionnaire (8 pages); participation is voluntary.
- CATI interviews in quarters 1-3.
Basic estimation strategy

- Stratified simple random sampling (strata defined by size class and industry sector).
- Calculate design weights as the inverse of the (realized) sampling rate within each stratum.
- Calibrate the design weights to known totals from external data:
  - number of business units by size
  - number of business units by industry sector
  - number of employees by size
  - number of employees by industry sector
  - number of registered vacancies by industry sector
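The design-weight step can be sketched in a few lines of Python. This is a minimal illustration, not the IAB implementation: it assumes that strata are keyed by hypothetical (size class, NACE section) tuples, that the stratum sizes N_h come from the frame, and that n_h is the number of responding units in stratum h.

```python
from collections import Counter


def design_weights(pop_counts, respondent_strata):
    """Design weight d_h = N_h / n_h, the inverse realized sampling rate.

    pop_counts: dict stratum -> N_h (population size of stratum h)
    respondent_strata: iterable with the stratum of every responding unit
    Strata without respondents are left out of the result.
    """
    n_h = Counter(respondent_strata)
    return {h: pop_counts[h] / n for h, n in n_h.items()}


# Hypothetical strata: (size class, NACE section)
pop = {("1-10", "C"): 1200, ("500+", "C"): 40}
resp = [("1-10", "C")] * 24 + [("500+", "C")] * 8
print(design_weights(pop, resp))  # {('1-10', 'C'): 50.0, ('500+', 'C'): 5.0}
```

These weights are the starting point d_i for all calibration estimators below; every unit in stratum h receives d_h.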
Calibration estimators (1)
RAKCON


- Raking estimator with weight restrictions: within each stratum, only two different weights are allowed, one for units with vacancies and one for units without vacancies.
- Reason: to control the variance of the weights and the variance of the estimates.
- Start with the design weights and repeat the following two steps until the weights converge:
  1. proportional fitting of the weights of units with vacancies to the number of registered vacancies by sector;
  2. iterative proportional fitting of all weights to the number of units by size and by sector.
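The RAKCON loop can be sketched in Python with NumPy. This is an illustrative reading of the slide, not the production code: the argument names, the convergence tolerance and the use of a single IPF pass per outer iteration are assumptions. Each adjustment factor depends only on (sector, vacancy status), on size, or on sector, so units that start with the same design weight in a stratum keep one common weight per vacancy group, which is the RAKCON restriction.

```python
import numpy as np


def rakcon(d, size, sector, has_vac, reg_vac,
           units_by_size, units_by_sector, reg_by_sector,
           tol=1e-9, max_iter=5000):
    """Raking with at most two weights per stratum (units with / without vacancies).

    d: design weights; size, sector: class labels per unit (NumPy arrays);
    has_vac: bool array, unit reports vacancies; reg_vac: # registered vacancies.
    The *_by_* arguments are dicts mapping a class label to its known total.
    """
    w = np.asarray(d, dtype=float).copy()
    for _ in range(max_iter):
        w_prev = w.copy()
        # Step 1: scale only units WITH vacancies so that the weighted number
        # of registered vacancies matches the known total of each sector.
        for s, total in reg_by_sector.items():
            m = (sector == s) & has_vac
            est = np.sum(w[m] * reg_vac[m])
            if est > 0:
                w[m] *= total / est
        # Step 2: one IPF pass over all units, first by size, then by sector;
        # the outer loop repeats it until the weights stop changing.
        for labels, totals in ((size, units_by_size), (sector, units_by_sector)):
            for g, total in totals.items():
                m = labels == g
                w[m] *= total / w[m].sum()
        if np.max(np.abs(w - w_prev)) <= tol * np.max(np.abs(w_prev)):
            return w
    raise RuntimeError("RAKCON weights did not converge")
```

After convergence, the estimated number of vacancies is the weighted sum of the reported vacancies; the two weights of a stratum can be read off, for example, as `w[(size == a) & (sector == b) & has_vac][0]`.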
Calibration estimators (2)
Generalized regression estimator (GREG)



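Minimizes
$$\mathrm{dist}(w, d) = \sum_{i=1}^{n} \frac{(w_i - d_i)^2}{q_i d_i}$$
subject to
$$\hat{t}_{X_j} = \sum_{i} w_i x_{ij} = t_{X_j} \quad \text{for every calibration variable } X_j$$
where $d_i$ are the design weights, $w_i$ the calibrated weights, $q_i$ known positive scale factors, $x_{ij}$ the value of $X_j$ for unit $i$, and $t_{X_j}$ the known total of $X_j$ from external data.
GREG1 is calibrated to:
- number of units by size
- number of units by sector
- number of registered vacancies by sector
GREG2 is additionally calibrated to:
- number of employees by size
- number of employees by sector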
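This minimization has the standard closed-form solution w_i = d_i (1 + q_i x_i' λ), with λ = (Σ_i d_i q_i x_i x_i')^{-1} (t_X − Σ_i d_i x_i). Below is a NumPy sketch; the function names are hypothetical, and using a least-squares solve is a choice of this sketch. Redundant dummy columns, such as size dummies and sector dummies that both add up to the number of units, make the matrix singular, and the least-squares solve still returns weights that meet consistent totals. Chi-square weights can become negative.

```python
import numpy as np


def greg_weights(d, X, totals, q=None):
    """Chi-square-distance calibration weights (GREG).

    d: (n,) design weights; X: (n, p) calibration variables x_ij;
    totals: (p,) known totals t_Xj; q: (n,) scale factors (default 1).
    """
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    q = np.ones_like(d) if q is None else np.asarray(q, dtype=float)
    T = X.T @ (X * (d * q)[:, None])            # sum_i d_i q_i x_i x_i'
    gap = np.asarray(totals, dtype=float) - X.T @ d
    lam = np.linalg.lstsq(T, gap, rcond=None)[0]
    return d * (1.0 + q * (X @ lam))


def calibration_matrix(size, sector, reg_vac, employees=None):
    """Hypothetical helper: GREG1 columns, plus GREG2 columns if employees given.

    size, sector: integer class codes 0..K-1 (NumPy arrays).
    """
    S = np.eye(size.max() + 1)[size]            # units by size class
    E = np.eye(sector.max() + 1)[sector]        # units by sector
    cols = [S, E, E * reg_vac[:, None]]         # registered vacancies by sector
    if employees is not None:                   # GREG2: employees by size and sector
        cols += [S * employees[:, None], E * employees[:, None]]
    return np.column_stack(cols)
```

The totals vector must list the external totals in the same column order as `calibration_matrix`.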
Calibration estimators (3)
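Generalized regression estimator (GREG) with weight restrictions
$$\mathrm{dist}(w, d) = \sum_{i=1}^{n} \frac{(w_i - d_i)^2}{q_i d_i}
= \sum_{h} \left( \sum_{i \in N_1^h} \frac{(w_i - d_i)^2}{q_i d_i} + \sum_{i \in N_2^h} \frac{(w_i - d_i)^2}{q_i d_i} \right)
= \sum_{h} \left( \frac{(w_h^1 - d_h)^2}{d_h} \sum_{i \in N_1^h} \frac{1}{q_i} + \frac{(w_h^2 - d_h)^2}{d_h} \sum_{i \in N_2^h} \frac{1}{q_i} \right)$$
minimized subject to the same calibration constraints as the GREG, where $N_k^h$ is the set of units of $N_k$ in stratum $h$, $N_2$ is the complement of $N_1$, all units in $N_1^h$ share the weight $w_h^1$, all units in $N_2^h$ share the weight $w_h^2$, and $d_h$ is the design weight of stratum $h$.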
GREGCON1: N1 = set of units with vacancies

GREGCON2: N1 = set of units with registered vacancies
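The last form of the distance shows that the restricted problem is an ordinary chi-square calibration over the cells (stratum h, N_1 or N_2). Each cell c carries the design weight d_h, the summed calibration variables Σ_{i∈c} x_i and the scale factor q_c = 1 / Σ_{i∈c} 1/q_i. The NumPy sketch below uses this reduction. It assumes that the design weights are constant within each stratum, as they are when computed as inverse sampling rates; the function and argument names are illustrative only.

```python
import numpy as np


def gregcon_weights(d, X, totals, stratum, in_n1, q=None):
    """GREG with one weight per (stratum, N1 membership) cell.

    d: (n,) design weights, constant within strata; X: (n, p) calibration
    variables; totals: (p,) known totals; stratum: stratum label per unit;
    in_n1: bool per unit (True = unit belongs to N1); q: scale factors.
    """
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    q = np.ones_like(d) if q is None else np.asarray(q, dtype=float)
    # Map every unit to its cell (stratum h, member of N1 or not).
    cell_id = {}
    idx = np.array([cell_id.setdefault(key, len(cell_id))
                    for key in zip(stratum, in_n1)])
    n_cells = len(cell_id)
    Xc = np.zeros((n_cells, X.shape[1]))
    np.add.at(Xc, idx, X)                   # summed x_i per cell
    Qc = np.zeros(n_cells)
    np.add.at(Qc, idx, 1.0 / q)             # sum of 1/q_i per cell
    dc = np.zeros(n_cells)
    dc[idx] = d                             # d_h of the cell's stratum
    # Cell-level GREG with q_c = 1 / Qc: w_c = d_c (1 + Xc_c' lam / Qc_c).
    T = Xc.T @ (Xc * (dc / Qc)[:, None])
    gap = np.asarray(totals, dtype=float) - Xc.T @ dc
    lam = np.linalg.lstsq(T, gap, rcond=None)[0]
    wc = dc * (1.0 + (Xc @ lam) / Qc)
    return wc[idx]                          # expand cell weights back to units
```

In this sketch, GREGCON1 corresponds to `in_n1 = has_vac` and GREGCON2 to `in_n1 = reg_vac > 0`.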
Result of different calibration estimators
- 4th quarter 2007, Germany (west)
- Realized sample size: 7,485 (response rate: 20%)

Algorithm used | Estimated # of job vacancies, Germany (west), 4th quarter 2007
RAKCON | 994,735
GREGCON1 | 951,386
GREGCON2 | 848,178
GREG1 | 848,184
GREG2 | 812,513
Highly skewed distribution of job vacancies
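Share of units with a value of 0 (% of 0's), by size class:

Size class | Total # of vacancies | # of registered vacancies
1-10 | 91% | 97%
10-19 | 86% | 96%
20-49 | 77% | 91%
50-199 | 68% | 86%
200-499 | 48% | 75%
500+ | 36% | 71%

[Figure: distributions of the total # of vacancies and of the # of registered vacancies, excluding 0 (axis 0 to 300)]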
Simulation study

- Create a synthetic population by sampling with replacement from the original sample.
- Draw 300 samples from the synthetic population with the same sampling design and realized sample sizes as the original sample.
- Calculate all estimators described above.
- Repeat for different nonresponse models:
  - RHG1: equal response probabilities within strata
  - RHG2: equal response probabilities within two groups (units with and without vacancies) in every stratum
  - RHG3: equal response probabilities within two groups (units with and without registered vacancies) in every stratum
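One replicate of this simulation can be sketched in Python with NumPy. Everything specific here is an assumption for illustration: resampling within strata when building the synthetic population, the hypothetical response probabilities and gross sample sizes, and a simple design-weighted count as the estimator. In the study, every calibration estimator above would be computed for each replicate instead. The nonresponse model is encoded entirely in how `resp_prob` is built.

```python
import numpy as np


def synthetic_population(stratum, pop_sizes, rng):
    """Indices into the original sample: N_h draws with replacement per stratum."""
    return np.concatenate([
        rng.choice(np.flatnonzero(stratum == h), size=N_h, replace=True)
        for h, N_h in pop_sizes.items()])


def draw_respondents(pop_stratum, gross_n, resp_prob, rng):
    """Stratified SRS without replacement, then independent Bernoulli response.

    resp_prob[k] is the response probability of population unit k
    (constant within strata for RHG1, within stratum x group for RHG2/RHG3).
    """
    sample = np.concatenate([
        rng.choice(np.flatnonzero(pop_stratum == h), size=m_h, replace=False)
        for h, m_h in gross_n.items()])
    return sample[rng.random(sample.size) < resp_prob[sample]]


def weighted_count(flag, pop_stratum, resp, pop_sizes):
    """Design-weighted estimate of the number of population units with flag."""
    s = pop_stratum[resp]
    return sum(N_h * flag[resp][s == h].mean() for h, N_h in pop_sizes.items())


rng = np.random.default_rng(2009)
stratum = np.repeat([0, 1], [60, 40])        # strata of a hypothetical original sample
has_vac = rng.random(100) < 0.2              # hypothetical survey variable
pop_sizes = {0: 5000, 1: 800}
pop = synthetic_population(stratum, pop_sizes, rng)
pop_stratum, pop_vac = stratum[pop], has_vac[pop]
resp_prob = np.where(pop_vac, 0.3, 0.15)     # RHG2-type model (hypothetical values)
estimates = [weighted_count(pop_vac, pop_stratum,
                            draw_respondents(pop_stratum, {0: 400, 1: 150}, resp_prob, rng),
                            pop_sizes)
             for _ in range(300)]
print(np.mean(estimates), pop_vac.sum())
```

Under this RHG2-type setting, units with vacancies are over-represented among the respondents, so the mean of the 300 estimates lies above the count in the synthetic population. With `resp_prob` constant within strata (RHG1), the same estimator is approximately unbiased.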
Sampling distributions under RHG 1
[Figure: sampling distributions of the estimated number of job vacancies (axis 750,000 to 1,000,000) for RAKCON, GREGCON1, GREGCON2, GREG1 and GREG2; sampling under nonresponse model RHG1]
Sampling distributions under RHG 2
[Figure: sampling distributions of the estimated number of job vacancies (axis 700,000 to 1,100,000) for RAKCON, GREGCON1, GREGCON2, GREG1 and GREG2; sampling under nonresponse model RHG2]
Sampling distributions under RHG 3
[Figure: sampling distributions of the estimated number of job vacancies (axis 800,000 to 1,000,000) for RAKCON, GREGCON1, GREGCON2, GREG1 and GREG2; sampling under nonresponse model RHG3]
Two-step GREG estimation

- If RHG2 holds, the unconstrained GREG estimator is biased.
- Neither the frame nor the non-responding units provide information from which the response probabilities could be estimated directly.
- Suggestion: two-step GREG estimation.
  - First step: GREG estimation, calibrating to registered vacancies.
  - Using the calibrated weights, estimate the response probabilities.
  - Second step: adjust the design weights for the different response probabilities, then perform another GREG estimation step.
How do we estimate response probabilities?
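Population → sample: 1st stage with equal inclusion probabilities; sample → respondents: response according to model RHG2.

Group | Population | Respondents | Response homogeneity group (model RHG2)
units with registered vacancies | N̂_reg vac | n_reg vac | 1
units with unregistered vacancies only | N̂_unreg vac | n_unreg vac | 1
units without vacancies | N̂_no vac | n_no vac | 2

Both groups of units with vacancies share the same inclusion and response probabilities, so the ratio of their respondents estimates the ratio of their population sizes:
$$\hat{N}_{\text{unreg vac}} = \hat{N}_{\text{reg vac}} \cdot \frac{n_{\text{unreg vac}}}{n_{\text{reg vac}}}, \qquad \hat{N}_{\text{no vac}} = N - \hat{N}_{\text{reg vac}} - \hat{N}_{\text{unreg vac}}$$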
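The arithmetic for one stratum can be sketched in Python. This sketch assumes that N̂_reg vac has already been obtained from the first-step GREG weights and that the gross (first-stage) sample size m of the stratum is known. Turning the estimated group sizes into response probabilities θ̂ = n / (π N̂), with π = m / N, and into adjusted design weights 1 / (π θ̂) is this sketch's reading of the slide, not a formula stated on it. The function name is hypothetical.

```python
def rhg2_adjustment(N, m, N_reg_hat, n_reg, n_unreg, n_no):
    """Response probabilities and adjusted weights in one stratum under RHG2.

    N: stratum size; m: gross sample size (equal inclusion prob. pi = m / N);
    N_reg_hat: estimated # of units with registered vacancies (first step);
    n_reg, n_unreg, n_no: respondents with registered vacancies, with
    unregistered vacancies only, and without vacancies.
    """
    # Both vacancy groups share one response probability, so their respondent
    # ratio estimates their population ratio.
    N_unreg_hat = N_reg_hat * n_unreg / n_reg
    N_no_hat = N - N_reg_hat - N_unreg_hat
    pi = m / N
    theta_vac = (n_reg + n_unreg) / (pi * (N_reg_hat + N_unreg_hat))
    theta_no = n_no / (pi * N_no_hat)
    # Adjusted design weight 1 / (pi * theta) = estimated group size / respondents.
    w_vac = (N_reg_hat + N_unreg_hat) / (n_reg + n_unreg)
    w_no = N_no_hat / n_no
    return theta_vac, theta_no, w_vac, w_no


# Hypothetical stratum: N = 10,000 units, gross sample m = 500
print(rhg2_adjustment(10_000, 500, 400.0, 8, 12, 80))
```

For these hypothetical inputs, N̂_unreg vac = 600 and N̂_no vac = 9,000. The estimated response probabilities are 0.4 for units with vacancies and about 0.18 for units without. The adjusted weights are 50 and 112.5, and the second GREG step would start from these instead of the design weight of 20.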
Sampling distributions under RHG 1
[Figure: sampling distributions of the estimated number of job vacancies (axis 750,000 to 1,000,000) for RAKCON, GREGCON1, GREGCON2, GREG1, GREG2 and the two-step GREG; sampling under nonresponse model RHG1]
Sampling distributions under RHG 2
[Figure: sampling distributions of the estimated number of job vacancies (axis 700,000 to 1,100,000) for RAKCON, GREGCON1, GREGCON2, GREG1, GREG2 and the two-step GREG; sampling under nonresponse model RHG2]
Sampling distributions under RHG 3
[Figure: sampling distributions of the estimated number of job vacancies (axis 750,000 to 1,000,000) for RAKCON, GREGCON1, GREGCON2, GREG1, GREG2 and the two-step GREG; sampling under nonresponse model RHG3]
Conclusions

- Weight restrictions lead to a larger variance of the estimators.
- Calibration estimators work under an implicit nonresponse model.
- The two-step GREG estimator is applicable if
  - theory suggests certain response homogeneity groups (RHGs),
  - there is no complete information about RHG membership in the frame or among the non-responding units, and
  - the only information is an auxiliary variable, usable for calibration, that identifies part of an RHG.
  - Special case: a zero-inflated calibration variable with the property that units with a value greater than zero are in the same RHG, while units with a value of zero may belong to different RHGs.
Thank you very much for your attention!