11 Memobust course Weighting and Estimation

Weighting and Estimation
Eurostat
Presented by
• Loredana Di Consiglio
• Istituto Nazionale di Statistica, ISTAT
Outline
• Weighting and estimation in the Handbook
– Weighting, use of auxiliary variables and calibration
estimators
– Small area estimation
– Preliminary estimation
• Choice of estimation method
Weighting
• Principle of weighting: each sample unit
represents a number of population units.
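• Basic weights: the design weights $d_k = 1/\pi_k$, where $\pi_k$ is the inclusion probability of unit $k$
$$\pi_k = \Pr(k \in s) = \sum_{s} p(s)\, 1(k \in s) = E_p(I_k(s))$$
• Horvitz-Thompson estimator
$$\hat{Y}_{HT} = \sum_{k \in s} \frac{1}{\pi_k} y_k = \sum_{k \in s} d_k y_k = \sum_{k \in U} \frac{1}{\pi_k} 1(k \in s)\, y_k$$
• Non-linear estimation: plug-in (or substitution) principle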
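As a minimal sketch of the Horvitz-Thompson estimator, the following Python function weights each sampled value by the inverse of its inclusion probability. The function name and the sample values are hypothetical and chosen only for illustration.

```python
def horvitz_thompson_total(y, pi):
    """Horvitz-Thompson estimate of a population total.

    y  : observed values for the sampled units
    pi : inclusion probabilities of the same units, each in (0, 1]
    """
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    if any(p <= 0 or p > 1 for p in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    # Each unit is weighted by its design weight d_k = 1 / pi_k.
    return sum(yk / pk for yk, pk in zip(y, pi))


# Hypothetical sample of three units (illustrative numbers only).
y_sample = [10.0, 20.0, 30.0]
pi_sample = [0.5, 0.25, 0.125]
ht_total = horvitz_thompson_total(y_sample, pi_sample)
print(ht_total)  # 340.0
```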
Weighting
• The principle of weighting is also applied to account for unit non-response.
• Design weights can also be adjusted for non-response in order to reduce the possible bias of the resulting estimates.
• For example, the sample can be partitioned into sub-groups of units within which the response rate is assumed to be constant and non-respondents are assumed to behave similarly to respondents.
• Underlying assumption: non-response depends on auxiliary variables defining a partition of the population but, conditionally on these variables, it is independent of the target variable.
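The weighting-class adjustment described above can be sketched as follows: within each class, the design weights of respondents are inflated by the inverse of the weighted response rate. This is only an illustration under the stated assumption of constant response rates within classes; the function name and data are hypothetical.

```python
from collections import defaultdict


def adjust_for_nonresponse(design_weights, classes, responded):
    """Inflate respondents' design weights within weighting classes."""
    sampled_weight = defaultdict(float)
    respondent_weight = defaultdict(float)
    for d, c, r in zip(design_weights, classes, responded):
        sampled_weight[c] += d
        if r:
            respondent_weight[c] += d
    for c in sampled_weight:
        if respondent_weight.get(c, 0.0) == 0.0:
            raise ValueError(f"class {c!r} has no respondents; merge classes")
    adjusted = []
    for d, c, r in zip(design_weights, classes, responded):
        # Non-respondents get weight 0; respondents carry the class total.
        factor = sampled_weight[c] / respondent_weight[c] if r else 0.0
        adjusted.append(d * factor)
    return adjusted


# Hypothetical sample: two classes, half of each class responds.
d = [10.0, 10.0, 10.0, 10.0, 20.0, 20.0]
cls = ["A", "A", "A", "A", "B", "B"]
resp = [True, True, False, False, True, False]
w_adj = adjust_for_nonresponse(d, cls, resp)
print(w_adj)  # [20.0, 20.0, 0.0, 0.0, 40.0, 0.0]
```

The adjusted weights of the respondents sum to the same total as the design weights of the full sample.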
Use of Auxiliary information
• When auxiliary variables are available, they can be used to reduce bias and to reduce variance (sometimes, however, external bounds apply)
• Ratio estimator; auxiliary information: the known total $X$ of one numerical variable
$$\hat{Y}_{rat} = X \frac{\hat{Y}}{\hat{X}} = \sum_{s} w_k y_k, \quad \text{where } w_k = \frac{X}{\hat{X}} d_k$$
• If applied to the X variable, one gets a perfect estimate: $\hat{X}_{rat} = X$
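A minimal sketch of the ratio estimator: the design weights are rescaled by the ratio of the known total of the auxiliary variable to its sample estimate. Function name and data are hypothetical.

```python
def ratio_estimator(y, x, d, x_total):
    """Return the ratio estimate of the total of y and the final weights."""
    x_hat = sum(dk * xk for dk, xk in zip(d, x))
    if x_hat == 0:
        raise ValueError("estimated total of x is zero")
    # Final weights: w_k = d_k * X / X_hat.
    weights = [dk * x_total / x_hat for dk in d]
    estimate = sum(wk * yk for wk, yk in zip(weights, y))
    return estimate, weights


# Hypothetical data: known total of x is 120, its HT estimate is 100.
y = [4.0, 6.0, 10.0]
x = [2.0, 3.0, 5.0]
d = [10.0, 10.0, 10.0]
y_rat, w = ratio_estimator(y, x, d, x_total=120.0)

# Applying the same weights to x reproduces the known total.
x_rat = sum(wk * xk for wk, xk in zip(w, x))
print(y_rat, x_rat)
```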
Use of Auxiliary information
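• Poststratification; auxiliary information: the totals of a vector of poststratum indicators, i.e. the poststratum sizes $N_h$
$$X = (\ldots, N_h, \ldots)' = \sum_{U} (\ldots, 1(k \in U_h), \ldots)'$$
• The estimator is
$$\hat{Y}_{post} = \sum_h N_h \hat{\bar{Y}}_h, \quad \text{where } \hat{\bar{Y}}_h = \frac{\sum_s 1(k \in U_h)\, d_k y_k}{\sum_s 1(k \in U_h)\, d_k} = \frac{\hat{Y}_h}{\hat{N}_h}$$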
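The poststratified estimator can be sketched as below: within each poststratum the weighted mean is computed and multiplied by the known poststratum size. The function name, labels and numbers are hypothetical.

```python
def poststratified_total(y, d, strata, stratum_sizes):
    """Sum over poststrata of N_h times the weighted sample mean in h."""
    n_hat = {}  # estimated poststratum sizes (sum of design weights)
    y_hat = {}  # estimated poststratum totals
    for yk, dk, h in zip(y, d, strata):
        n_hat[h] = n_hat.get(h, 0.0) + dk
        y_hat[h] = y_hat.get(h, 0.0) + dk * yk
    missing = set(stratum_sizes) - set(n_hat)
    if missing:
        raise ValueError(f"no sampled units in poststrata: {sorted(missing)}")
    return sum(stratum_sizes[h] * y_hat[h] / n_hat[h] for h in n_hat)


# Hypothetical sample of five units in two poststrata.
y = [1.0, 3.0, 2.0, 4.0, 6.0]
d = [10.0, 10.0, 10.0, 10.0, 10.0]
strata = ["m", "m", "f", "f", "f"]
sizes = {"m": 60.0, "f": 40.0}
y_post = poststratified_total(y, d, strata, sizes)
print(y_post)  # 280.0
```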
Use of Auxiliary information
• Raking Ratio
– Auxiliary information: known marginal totals of different auxiliary variables (not cross-classified)
$N_{i\cdot}$, $i = 1, \ldots, I$
$N_{\cdot j}$, $j = 1, \ldots, J$
The raking-ratio method consists of post-stratifying on each variable in turn and iterating until the weighted totals match all the known margins
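A minimal sketch of raking for two auxiliary variables, assuming the two sets of margins are consistent (same grand total) and that every category has sampled units. Function names, labels and totals are hypothetical.

```python
def margin(weights, labels):
    """Sum of weights per category."""
    out = {}
    for wk, lab in zip(weights, labels):
        out[lab] = out.get(lab, 0.0) + wk
    return out


def rake(weights, row_labels, col_labels, row_totals, col_totals,
         max_iter=1000, tol=1e-10):
    """Alternate post-stratification on rows and columns until both match."""
    w = list(weights)
    for _ in range(max_iter):
        for labels, totals in ((row_labels, row_totals),
                               (col_labels, col_totals)):
            current = margin(w, labels)
            w = [wk * totals[lab] / current[lab]
                 for wk, lab in zip(w, labels)]
        # Columns match after the last step; stop once rows match as well.
        rows = margin(w, row_labels)
        if max(abs(rows[lab] - row_totals[lab]) for lab in row_totals) < tol:
            break
    return w


# Hypothetical sample of four units, one per cell of a 2 x 2 table.
d = [20.0, 30.0, 10.0, 40.0]
row_labels = ["a", "a", "b", "b"]
col_labels = ["x", "y", "x", "y"]
row_totals = {"a": 60.0, "b": 40.0}
col_totals = {"x": 70.0, "y": 30.0}
raked = rake(d, row_labels, col_labels, row_totals, col_totals)
print(margin(raked, row_labels), margin(raked, col_labels))
```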
Use of Auxiliary information
• GREG
$$\hat{Y}_{GREG} = \sum_{i \in s} d_i y_i + \hat{\beta}' \Big( X - \sum_{i \in s} d_i x_i \Big)$$
• GREG is «assisted» by a linear relationship between X and Y.
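A minimal sketch of GREG with a single auxiliary variable and no intercept, so that the regression coefficient reduces to a ratio of weighted sums. This simplification, the function name and the data are assumptions made for illustration.

```python
def greg_total(y, x, d, x_total):
    """GREG estimate of the total of y with one auxiliary variable x."""
    y_ht = sum(dk * yk for dk, yk in zip(d, y))
    x_ht = sum(dk * xk for dk, xk in zip(d, x))
    # Design-weighted least squares slope for the model y = beta * x + error.
    beta = (sum(dk * xk * yk for dk, xk, yk in zip(d, x, y))
            / sum(dk * xk * xk for dk, xk in zip(d, x)))
    # HT estimate plus a regression correction for the error in x.
    return y_ht + beta * (x_total - x_ht)


# Hypothetical data: the known total of x is 70, its HT estimate is 60.
d = [10.0, 10.0, 10.0]
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 7.0]
y_greg = greg_total(y, x, d, x_total=70.0)
print(y_greg)
```

Applied to x itself, the same function returns the known total, since the fitted slope is then 1.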
Calibration
• The estimate of the total Y is obtained by means of a procedure which
– corrects the bias due to non-response
– takes into account the knowledge of auxiliary variables, requiring that their estimates equal their known totals
$$\hat{Y}_{CAL} = \sum_{k \in s} y_k d_k g_k = \sum_{k \in s} y_k w_k$$
Calibration
• The weights $w_k = d_k g_k$ are calculated as follows:
– $d_k$ is the initial weight, equal to the inverse of the inclusion probability $\pi_k$
– $g_k$ is the final correction factor, which makes the sample estimates of the auxiliary variables equal to their known totals; it is calculated by means of the following equations
Calibration
• Final weights are chosen to be as close as possible to the design weights,
$$\min E_p\Big( \sum_{s} G(d_k, w_k) \Big)$$
subject to the constraints on the auxiliary variables
$$\sum_{s} w_k x_k = t_x$$
• where G is an appropriate distance function
• Subject to bounds for w/d
Calibration
• Distance function G:
– Linear: $\sum_s \frac{(w - d)^2}{2d}$
– Raking ratio: $(w/d) \log(w/d) - w/d + 1$
– Truncated linear: $\frac{(w/d - 1)^2}{2}$, with $w/d \in [L, U]$
Calibration
• The calibration estimator equals GREG when the linear (Euclidean) distance function is chosen
$$\hat{Y}_{CAL} = \hat{Y}_{GREG} = \hat{Y} + (X - \hat{X})' \hat{\beta}$$
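The equality with GREG can be checked numerically in a simple case. The sketch below assumes a single auxiliary variable and the linear distance, for which the calibrated weights have the closed form $w_k = d_k (1 + \lambda x_k)$; the function name and data are hypothetical.

```python
def linear_calibration_weights(x, d, x_total):
    """Calibrated weights under the linear distance, one auxiliary variable."""
    x_ht = sum(dk * xk for dk, xk in zip(d, x))
    # Lagrange multiplier solving the calibration constraint sum(w * x) = X.
    lam = (x_total - x_ht) / sum(dk * xk * xk for dk, xk in zip(d, x))
    return [dk * (1.0 + lam * xk) for dk, xk in zip(d, x)]


# Hypothetical data: the known total of x is 70, its HT estimate is 60.
d = [10.0, 10.0, 10.0]
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 7.0]
x_total = 70.0

w = linear_calibration_weights(x, d, x_total)
x_cal = sum(wk * xk for wk, xk in zip(w, x))
y_cal = sum(wk * yk for wk, yk in zip(w, y))

# GREG computed directly, with the design-weighted slope of y on x.
y_ht = sum(dk * yk for dk, yk in zip(d, y))
x_ht = sum(dk * xk for dk, xk in zip(d, x))
beta = (sum(dk * xk * yk for dk, xk, yk in zip(d, x, y))
        / sum(dk * xk * xk for dk, xk in zip(d, x)))
y_greg = y_ht + beta * (x_total - x_ht)
print(x_cal, y_cal, y_greg)
```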
Calibration
• All calibration estimators are asymptotically equal
to GREG
• They are approximately unbiased and consistent
• Their sampling variance converges to GREG
variance
Calibration
• Software
– CLAN (Statistics Sweden)
– BASCULA (The Netherlands)
– GES (StatCan)
• ReGenesees (ISTAT): R package
– A second R package, called ReGenesees.GUI, implements the presentation layer of the system: less experienced R users will benefit from the user-friendly graphical interface.
– Downloadable from Joinup: https://joinup.ec.europa.eu/software/regenesees/release/all
Weighting, use of auxiliary variable
and calibration
• Planned modules in HB
– Main theme module
– Calibration estimators
– Already available:
▪ GREG: http://www.cros-portal.eu/content/generalised-regression-estimator
Small area estimation
• Most national surveys are planned to produce accurate estimates at the national level.
• Estimates for finer partitions may not have the desired precision because of small, or even zero, sample sizes.
• A small area is a domain where the sample size is not sufficient to achieve a pre-specified level of precision.
Small area estimation
• Indirect estimators make use of what has been observed in other domains (or at other times)
– Traditional estimators:
▪ Synthetic estimators
▪ Composite estimators
– Model-based estimators:
▪ Area level models
▪ Unit level models
• With this class of estimators, extra information is gained in the estimation process by using observations from outside the domain of interest, through the implicit (synthetic estimators) or explicit (model-based estimators) use of models.
Small area estimation
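• Use information at local level with a common $\hat{\beta}$
– Modified direct
$$\hat{\bar{Y}}_d^{GRE} = \frac{1}{\hat{N}_d} \sum_{i \in s_d} w_{id} y_{id} + \Big( \bar{X}_{.d} - \frac{1}{\hat{N}_d} \sum_{i \in s_d} w_{id} x_{id} \Big)^T \hat{\beta}$$
or, equivalently,
$$\hat{\bar{Y}}_d^{GRE} = \frac{1}{\hat{N}_d} \sum_{i \in s_d} w_{id} \big( y_{id} - x_{id}^T \hat{\beta} \big) + \bar{X}_{.d}^T \hat{\beta}$$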
Small area estimation
• Synthetic estimators: in the simplest case, it is assumed that small areas have the same mean as larger domains (at least within classes)
$$\hat{\bar{Y}}_d^{SYN} = \bar{X}_{.d}^T \hat{\beta}$$
• Synthetic estimators can be based on different models (relationships between the variable of interest and the auxiliary variables): linear model; linear mixed model at unit level; linear mixed model at area level.
Small area estimation
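• Model-based estimators
– Based on area level model:
$$\hat{Y}_d^{EBLUP1} = \hat{\gamma}_d \hat{Y}_d^{HT} + (1 - \hat{\gamma}_d) X_{.d}^T \hat{\beta}, \qquad \hat{\gamma}_d = \frac{\hat{\sigma}_u^2}{\hat{\sigma}_u^2 + \hat{\psi}_d}$$
where $\hat{\psi}_d$ is the estimated sampling variance of the direct estimator in area $d$
– Based on unit level model:
$$\hat{\bar{Y}}_d^{EBLUP2} = \hat{\gamma}_d \big( \bar{y}_d + \bar{X}_d^T \hat{\beta} - \bar{x}_d^T \hat{\beta} \big) + (1 - \hat{\gamma}_d) \bar{X}_d^T \hat{\beta}, \qquad \hat{\gamma}_d = \frac{\hat{\sigma}_u^2}{\hat{\sigma}_u^2 + \hat{\sigma}_d^2 / n_d}$$
where $\hat{\sigma}_d^2$ is the estimated unit level error variance and $n_d$ the sample size in area $d$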
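The area level estimator is a weighted combination of a direct and a synthetic estimate, with a weight that grows as the direct estimate becomes more precise. The sketch below takes the variance components as given (in practice they are estimated); the function name, parameter names and numbers are hypothetical.

```python
def area_level_composite(direct, synthetic, sigma_u2, psi_d):
    """Combine direct and synthetic estimates with a shrinkage weight.

    sigma_u2 : variance of the area random effects (assumed known here)
    psi_d    : sampling variance of the direct estimate in the area
    """
    if sigma_u2 < 0 or psi_d < 0 or sigma_u2 + psi_d == 0:
        raise ValueError("variances must be non-negative and not both zero")
    gamma = sigma_u2 / (sigma_u2 + psi_d)
    estimate = gamma * direct + (1.0 - gamma) * synthetic
    return estimate, gamma


# Imprecise direct estimate: most weight goes to the synthetic part.
est_small, gamma_small = area_level_composite(100.0, 80.0, 4.0, 12.0)
# Precise direct estimate: most weight goes to the direct part.
est_large, gamma_large = area_level_composite(100.0, 80.0, 4.0, 1.0)
print(est_small, gamma_small, est_large, gamma_large)
```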
Small area estimation
• References in the HB:
– http://www.cros-portal.eu/content/small-area-estimation
– http://www.cros-portal.eu/content/eblup-area-level-sae
– http://www.cros-portal.eu/content/eblup-unit-level-sae
– http://www.cros-portal.eu/content/small-area-estimation-methods-timeseries-data
Small area estimation
• Guidelines can be found at: http://www.cros-portal.eu/sites/default/files//WP6-Report.pdf
• Quality assessment: http://www.cros-portal.eu/content/final-report-quality-assessment-sae-wp3
• In practice:
– http://www.cros-portal.eu/content/final-report-software-tools-sae-sae-wp4
– R codes from ESSnet SAE project: http://www.cros-portal.eu/content/r-codes-documentations-sae-wp4
Preliminary estimation
• The methods used to treat unit non-response may be applied.
• In this case, late responses are treated as non-response; however, to avoid biased estimates, the self-selection mechanism of quick respondents should not be assumed to be random.
Preliminary estimation
• Rao et al. (1989) proposed composite estimators that may improve on the standard estimator.
• The basic composite estimator is obtained as a weighted average of the preliminary estimate at time t and the final estimate at time t-1 adjusted for the difference between the preliminary estimates at times t and t-1.
$$Y_{t,\alpha} = \alpha Y_t^p + (1 - \alpha) \big[ Y_{t-1} + (Y_t^p - Y_{t-1}^p) \big]$$
• $\alpha$ in [0,1]
• $\alpha$ chosen on the basis of variances and covariances
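A minimal sketch of the basic composite estimator described above. The weight (called `alpha` here) is taken as given, whereas in practice it is chosen from variances and covariances; the function name and figures are hypothetical.

```python
def composite_preliminary(prelim_t, final_prev, prelim_prev, alpha):
    """Weighted average of the preliminary estimate at time t and the
    previous final estimate carried forward by the preliminary change."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    carried_forward = final_prev + (prelim_t - prelim_prev)
    return alpha * prelim_t + (1.0 - alpha) * carried_forward


# Hypothetical figures: the preliminary estimate at t-1 was revised
# upwards by 2 when the final estimate became available.
estimate = composite_preliminary(prelim_t=105.0, final_prev=102.0,
                                 prelim_prev=100.0, alpha=0.5)
print(estimate)  # 106.0
```

With `alpha = 1` the estimator reduces to the preliminary estimate; with `alpha = 0` the whole past revision is carried forward.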
Preliminary estimation
• In order to reduce the revision error of the preliminary estimates, model-based estimators can be considered. Rao, Srinath and Quenneville (1989) adopt a time series approach to preliminary estimation.
• Let $Y_t^P$, $Y_t$ and $Y_t^*$ be, respectively, the preliminary estimate, the final estimate and the measurement error in the preliminary estimate at time t.
Preliminary estimation
• Furthermore, suppose:
$$Y_t = Y_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2)$$
$$Y_t^* = Y_{t-1}^* + \eta_t, \qquad \eta_t \sim N(0, \sigma_0^2)$$
• The estimator results: Yˆt 1   Yt   1   Yt P1 Yt* 
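• Or, when auxiliary variables are available:
$$Y_t = Y_{t-1} + \sum_{k=1}^{P} \beta_k X_{kt} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2)$$
• Or, taking seasonality into account:
$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-12} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2)$$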
   2  02
Preliminary estimation
• Design based
– http://www.cros-portal.eu/content/preliminary-estimates-design-based-methods
• Model based
– http://www.cros-portal.eu/content/preliminary-estimates-model-based-methods
• Sub-sampling
– http://www.cros-portal.eu/content/subsampling-preliminary-estimation
Choice of estimation methods
• Quality indicators:
– Accuracy: degree of closeness of estimates to the true values.
▪ Bias
▪ Precision
– Timeliness: the length of time between the event or phenomenon the statistics describe and their availability.
– Revision errors
– Coherence and comparability: coherence with other statistics
Ref.: ESS Handbook for Quality Reports, Methodologies and Working Papers, 2009
Choice of estimation methods
• Close relationship with the sampling design (e.g. weights): choice of sampling strategy
• Non-probabilistic sample design? E.g. cut-off sampling: model-based estimators
– The model simply assumes that the units cut off behave similarly to those in the sampled portion.
– Model assumptions should be analysed as far as possible.
Thank you for your attention