Social Assistance Pilots Program
SA Pilots Seminar
Hybrid Means Testing (HMT) Model Development
Roman Semko
CASE Ukraine
March, 2010
Content
1. Introduction to modeling
2. Data analysis
3. Methods for estimation
4. Simulations
5. Income from assets (agriculture)
6. Double-blind experiment results
7. Model comparisons and conclusions
Concept
• The World Bank has developed a methodology for income estimation based on regression analysis: HYBRID MEANS TESTING (HMT)
• Under the HMT method, eligibility for the SA program is assessed by modeling household income
• Total income is divided into two parts: easy to verify (e.g., pension, stipend) and hard to verify (e.g., dividends, shadow wage)
• The final goal is to estimate the hard to verify share of income from a set of variables that can be accurately measured and that reflect the hard to verify income
• Hard to verify income is divided into income not generated by long-term assets (estimated by a regression model) and income from assets (estimated by formulas)
The main goal of the model is to predict total family income as precisely as possible
Criteria
1. Theoretical validity
2. Simplicity
3. Goodness of fit
4. Significance of explanatory variables
Model
Equation which estimates the applicant's income based on the available information:
Y = β1*X1 + β2*X2 + β3*X3 + …
• Y: total income, hard to verify income
• X1, X2, X3, …: family structure, type and sector of employment, education, region, other
Methods
Data and Knowledge
Source: Finance Ministry of Ukraine
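The equation above can be sketched in a few lines of Python. This is only an illustration of the linear form Y = β1*X1 + β2*X2 + …; the variable names and coefficient values are hypothetical and are not taken from the estimated model.

```python
def predict_income(coefficients, characteristics):
    """Return Y = b1*X1 + b2*X2 + ... for one applicant."""
    return sum(beta * characteristics[name] for name, beta in coefficients.items())


# Hypothetical coefficients (UAH per unit) and applicant data, for illustration only.
coefficients = {"children": 50.0, "working_persons": 300.0, "higher_education": 120.0}
applicant = {"children": 2, "working_persons": 1, "higher_education": 0}

predicted = predict_income(coefficients, applicant)
print(predicted)
```

Keeping the coefficients in a dictionary keyed by variable name makes it easy to check which explanatory variables enter the prediction.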
Application and Simulation
A good model should use all available relevant information for income prediction
HBS 2008
• 10,622 observations of households with total income
Pilots dataset
• > 3,000 observations of families with declared income
• Cannot be used separately for model estimation since total income is not available
A lot of information could/should be used to guarantee an acceptable level of precision
Declared income (DI) is an important indicator in total income (TI) assessment
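MATCHING (diagram): HBS 2008 records (TI and characteristics) are matched with pilots dataset records (characteristics and DI) to form combined records (TI, characteristics, DI)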
Observations are matched so as to guarantee the highest similarity between them
HBS 2008: Groups 1, …, N, each containing observations 1, …, K
Pilots dataset: Groups 1, …, N, each containing observations 1, …, L
Procedure
1. Form groups based on the following variables: type of settlement, type of assistance, household size, number of children, working persons, pensioners, sex of the head in single-head households
2. Match each observation from HBS to an observation from the pilots dataset in the same group, based on similar characteristics (age of the head, education of the head, etc.), using a Euclidean distance function
3. Each observation from the pilots dataset is used for matching no more than 2 times
4. Aggregate the groups if there is no good candidate for an HBS observation in the corresponding group of the pilots dataset, and match again
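The first three steps of the procedure can be sketched as a greedy nearest-neighbour match within groups. This is a minimal illustration under stated assumptions, not the procedure actually used: the field names and sample records are hypothetical, matching is done in a single greedy pass, variables are not rescaled before the distance is computed, and step 4 (aggregating groups) is left out.

```python
import math
from collections import defaultdict

MAX_USES = 2  # step 3: each pilots observation is matched at most twice


def distance(a, b, keys):
    """Euclidean distance between two observations over the given numeric keys."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


def match(hbs, pilots, group_keys, distance_keys):
    """Match every HBS row to the closest available pilots row in the same group."""
    by_group = defaultdict(list)
    for idx, row in enumerate(pilots):
        by_group[tuple(row[k] for k in group_keys)].append(idx)

    uses = defaultdict(int)
    matches = {}
    for h_idx, h in enumerate(hbs):
        group = tuple(h[k] for k in group_keys)
        candidates = [i for i in by_group[group] if uses[i] < MAX_USES]
        if not candidates:
            matches[h_idx] = None  # step 4 (aggregate groups, match again) not implemented
            continue
        best = min(candidates, key=lambda i: distance(h, pilots[i], distance_keys))
        uses[best] += 1
        matches[h_idx] = best
    return matches


# Hypothetical records, for illustration only.
hbs = [
    {"settlement": "village", "size": 3, "head_age": 40, "head_education": 2},
    {"settlement": "village", "size": 3, "head_age": 42, "head_education": 2},
    {"settlement": "village", "size": 3, "head_age": 41, "head_education": 2},
    {"settlement": "city", "size": 2, "head_age": 30, "head_education": 3},
]
pilots = [
    {"settlement": "village", "size": 3, "head_age": 41, "head_education": 2},
    {"settlement": "village", "size": 3, "head_age": 60, "head_education": 1},
]

result = match(hbs, pilots, ["settlement", "size"], ["head_age", "head_education"])
print(result)
```

In this toy data the third HBS row falls back to the second pilots row because the closest one has already been used twice, and the city household stays unmatched because its group has no pilots observations.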
Data comparison: the main difference between HBS households and pilots applicants is in their incomes, while most other characteristics are similar
Figure: Total vs. declared income comparison (without SA). Vertical axis: income, UAH. Horizontal axis: income vintiles. Series: HBS (total income), Pilots (declared income).
For some regions, average income in HBS differs significantly from the Personal Disposable Income (PDI) statistics
Figure: two regional maps of Ukraine, "Statistics, PDI" and "HBS". Differences in income without SA per capita compared to Chernivtsi region, UAH. Legend: >200; 100-200; <100.
Bayesian econometrics allows combining survey data with published regional PDI aggregates
Calibration
The researcher sets the model coefficient(s) directly. For example, if regional macrodata say that income in Kyiv city is 1108 UAH higher than in AR of Crimea, then it is assumed that income for Kyiv city applicants is 1108 UAH higher than for AR of Crimea applicants, other things equal.
Standard estimation
Coefficients are determined from the collected observations using standard regression tools (classical econometrics).
Bayesian estimation
Combines both approaches. The estimated coefficient lies between the calibrated value and the value estimated in the standard way.
Bayesian estimation does not lead to significant changes within regions, but across the regions of Ukraine the changes are significant: the average predicted income by region changes.
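One textbook way to obtain a coefficient that lies between the calibrated and the classically estimated value is a normal prior centred on the calibrated value, which gives a precision-weighted average. The sketch below assumes that setup; the slides do not state which prior was used. Only the 1108 UAH figure comes from the slide; the regression estimate and both variances are hypothetical.

```python
def posterior_mean(prior_mean, prior_variance, estimate, estimate_variance):
    """Precision-weighted average of a calibrated prior and a regression estimate."""
    prior_precision = 1.0 / prior_variance
    data_precision = 1.0 / estimate_variance
    weighted_sum = prior_precision * prior_mean + data_precision * estimate
    return weighted_sum / (prior_precision + data_precision)


calibrated = 1108.0    # from the slide: Kyiv city vs. AR of Crimea, UAH
ols_estimate = 600.0   # hypothetical classical estimate
prior_var = 100.0 ** 2     # hypothetical: how much the calibrated value is trusted
estimate_var = 100.0 ** 2  # hypothetical: sampling variance of the estimate

coefficient = posterior_mean(calibrated, prior_var, ols_estimate, estimate_var)
print(coefficient)
```

With equal variances the result is the midpoint of the two values; a tighter prior pulls the coefficient toward the calibrated value, and a more precise regression estimate pulls it toward the data.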
The linear model is the simplest
Description
• Linear relation between income and family characteristics
• Dependent variable is under the logarithm (log-linear)
• Independent variables (IVs) include easy to verify income
• Other IVs are: number of children, of working persons, of the elderly, type and sector of employment of household heads, education level
R2
Linear: 58% (large cities: 65%, small cities: 63%, villages: 48%)
Predictions (figure): declared income vs. predicted income
Concept: the more income the applicant declares, the lower the additional predicted income is, a sort of "zero sum game"
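A log-linear specification can be illustrated with a one-regressor least squares fit on the logarithm of income. This is a toy sketch, assuming a single explanatory variable and made-up data; the actual model uses many regressors, and the simple back-transformation by exp() is an assumption of this example.

```python
import math


def fit_log_linear(x, y):
    """Fit log(y) = a + b*x by ordinary least squares with one regressor."""
    log_y = [math.log(v) for v in y]
    mean_x = sum(x) / len(x)
    mean_log_y = sum(log_y) / len(log_y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (li - mean_log_y) for xi, li in zip(x, log_y))
    b = sxy / sxx
    a = mean_log_y - b * mean_x
    return a, b


def predict(a, b, x_new):
    """Back-transform the fitted log income to UAH."""
    return math.exp(a + b * x_new)


# Hypothetical data: income grows by a factor of 1.5 per working person.
working_persons = [0, 1, 2, 3]
income = [500.0, 750.0, 1125.0, 1687.5]

a, b = fit_log_linear(working_persons, income)
print(round(math.exp(b), 3), round(predict(a, b, 4), 2))
```

In a log-linear model each coefficient acts multiplicatively: here exp(b) recovers the factor by which income changes when the regressor increases by one.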
The nonlinear model performs well when income differences are large, as in the whole HBS sample
Description
• Nonlinear relation between income and family characteristics. The form of the relation is cubic or quadratic, since total income sorted in ascending order increases as a polynomial of 2nd or 3rd order
• Dependent variable is under the logarithm (log-linear)
• Independent variables are as in the linear model
R2
Nonlinear: R-square is not bounded within the [0%, 100%] range
Predictions (figure): declared income vs. predicted income
Concept: as for the linear model
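Why R-square may leave the [0%, 100%] range can be seen from its usual definition, 1 - SS_res / SS_tot. The sketch below assumes that definition (the slides do not say how R-square was computed for the nonlinear model) and uses made-up numbers: when predictions fit worse than the sample mean, the value becomes negative.

```python
def r_squared(actual, predicted):
    """R-square as 1 - SS_res / SS_tot; it is not guaranteed to lie in [0, 1]."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical incomes and deliberately poor predictions.
actual = [100.0, 200.0, 300.0]
poor_predictions = [300.0, 200.0, 100.0]

value = r_squared(actual, poor_predictions)
print(value)
```

The bound holds for a linear least squares fit with an intercept evaluated on the fitted scale; it can fail for other estimators or when predictions are transformed back from logarithms.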
The two-step model is effective when there are large numbers of families both with zero and with nonzero hard to verify incomes
Description
• At the first stage, the probability that a family has shadow income is estimated. At the second stage, a linear relation between income and family characteristics, augmented with the hazard of having shadow income, is used for estimation
• Dependent variable is under the logarithm (log-linear) and does not include salary
R2
Two-stage: 47% (no division by cities)
Predictions (figure): declared income vs. predicted income
Concept: a stable additional income is added to the declared income, "the game with constant markup"
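The "hazard" term in a two-step model is often the inverse Mills ratio from a first-stage probit, as in Heckman-type selection models. The sketch below assumes that form, which the slides do not confirm; the function names, the first-stage index, and the coefficient values are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def hazard(z):
    """Inverse Mills ratio pdf(z) / cdf(z) for a first-stage probit index z."""
    return normal_pdf(z) / normal_cdf(z)


def second_stage_log_income(linear_part, hazard_coefficient, first_stage_index):
    """Second-stage prediction: linear part plus the hazard correction term."""
    return linear_part + hazard_coefficient * hazard(first_stage_index)


# Hypothetical values, for illustration only.
log_income = second_stage_log_income(
    linear_part=5.0, hazard_coefficient=0.4, first_stage_index=0.0
)
print(round(hazard(0.0), 4), round(log_income, 4))
```

The correction shrinks as the first-stage index grows, so families that are very likely to have shadow income receive almost no adjustment.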
Each model needs a set of adjustments in order
to become fully useful
Adjustments and their descriptions
1. DEPENDENT VARIABLE: Informal (shadow) salary was incorporated into the dependent variable (hard to verify income), since it is not an easy to verify income
2. EXPLANATORY VARIABLES (EVs): Some EVs which could be used for predictions are themselves hard to verify, e.g., the number of mobile phones cannot be accurately measured
3. TIME INCONSISTENCIES: In order to compare incomes across different time periods, average growth rates of PDI and its elements were used for time adjustment
4. FAMILY HEADS: The definitions of family heads are standardized: male co-head and female co-head are used instead of voluntary definitions
Prediction does not change significantly unless the dependent variable is redefined. If the dependent variable is redefined, additional predicted income becomes more stable and decreases at a lower rate as declared income increases
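The time adjustment in item 3 can be sketched as scaling each income component by its own average growth rate. This is an illustration only: the component names, the growth rates, and the use of compound growth over whole periods are assumptions, not figures from the PDI statistics.

```python
def adjust_for_time(components, growth_rates, periods):
    """Scale each income component by its own average growth rate, compounded."""
    return {
        name: value * (1.0 + growth_rates[name]) ** periods
        for name, value in components.items()
    }


# Hypothetical incomes (UAH) and average growth rates per period.
income_2008 = {"wage": 1000.0, "pension": 500.0}
rates = {"wage": 0.10, "pension": 0.20}

income_2010 = adjust_for_time(income_2008, rates, periods=2)
print({name: round(value, 2) for name, value in income_2010.items()})
```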
Average predicted income exceeds declared by
26%
Figure: Declared vs. predicted income (by models). Vertical axis: income, UAH. Horizontal axis: type of assistance (low income, child care, single mothers, housing subsidies, fuel subsidies, mixed assistance, total). Series: declared w/o SA, predicted linear model, predicted matching model, predicted two-step model.
27% of families will be excluded from the SA programs
Figure: Number of beneficiaries (hypothetical scenario). Vertical axis: beneficiaries, %. Horizontal axis: type of assistance (low income, child care, single mothers, housing subsidies, fuel subsidies). Series: status quo, linear model, matching model, two-step model.
Average assistance will drop significantly, except
for low income and fuel subsidies
Figure: Average assistance (hypothetical scenario). Vertical axis: average assistance, %. Horizontal axis: type of assistance (low income, child care, single mothers, housing subsidies, fuel subsidies). Series: status quo, linear model, matching model, two-step model.
Total budget for SA expenditures will decrease by
27%
Figure: Total expenditures on SA (hypothetical scenario). Vertical axis: total expenditures, %. Horizontal axis: type of assistance (low income, child care, single mothers, housing subsidies, fuel subsidies). Series: status quo, linear model, matching model, two-step model.
Income from agricultural assets is calculated based on the developed normatives
Current situation
• Agriculture income is calculated as income per hectare
• Normatives are not unified across regions
New approach
• Income calculation per hectare and per each animal
• Differentiation between cities and villages
• Normatives are unified since they are based on the same methodology and data
Calculation procedure
• Information is certified by the village/city council
• Is not applied to families with disabled persons or the elderly (>70)
• If the applicant lives closer than 10 km to the city, city normatives apply
• Income from land is the product of land area and the normative
• Income from payi is calculated separately
• Income from livestock is the product of the number of livestock heads and the normative
Average predicted income exceeds declared by 28%
Example of income calculation from agriculture
Current situation
• CROPS, land area: only farmstead area of 0.56 hectares, located in a village (Donetsk region)
• CROPS, normative: 127.62 per hectare per month
• ANIMALS: possess one cow and 10 chickens
• ANIMALS, normative: only related through the hayfields and pasturage
• AGROINCOME: + 63.81 UAH per month
New approach
• CROPS, land area: only farmstead area of 0.56 hectares, located in a village (Donetsk region)
• CROPS, normative: 412.44 per hectare per month
• ANIMALS: possess one cow and 10 chickens (the same)
• ANIMALS, normative: 270.83 for a cow and 4.48 for one chicken
• AGROINCOME: + 521.85 UAH per month
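The arithmetic of this example can be checked with a short sketch: land income is land area times the normative, and livestock income is the number of heads times the per-animal normative. The function and its parameter names are hypothetical. Note that both totals on the slide (63.81 and 521.85 UAH) are reproduced with a land area of 0.5 hectares; with the stated 0.56 hectares the same formula gives about 71.47 and 546.60 UAH, so the slide appears to contain a discrepancy in either the area or the totals.

```python
def agro_income(land_ha, land_normative, livestock=None, livestock_normatives=None):
    """Monthly agricultural income: land area times normative plus heads times normative."""
    income = land_ha * land_normative
    for animal, heads in (livestock or {}).items():
        income += heads * livestock_normatives[animal]
    return round(income, 2)


animals = {"cow": 1, "chicken": 10}
new_normatives = {"cow": 270.83, "chicken": 4.48}

# Area of 0.5 ha is an assumption that reproduces the totals printed on the slide.
current = agro_income(0.5, 127.62)
new = agro_income(0.5, 412.44, animals, new_normatives)
print(current, new)

# The same formula with the 0.56 ha stated on the slide.
print(agro_income(0.56, 127.62), agro_income(0.56, 412.44, animals, new_normatives))
```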
Double-blind experiment: case study
Family description
• Father: unemployed and not registered in employment center
• Mother: housewife
• Children: one aged <18 and three aged <3
Model result
• Declared income = 211 UAH is LESS than the eligibility threshold = 255 UAH: GRANT?
• WHILE model prediction = 308 UAH is MORE than the eligibility threshold = 255 UAH: DENIAL (immediate decision: risky family, need home visit)
Commission case and home visit
• DENIAL
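The screening logic of this case can be written as a small decision rule. This is a sketch of the reasoning on the slide, not the official procedure: the function name and the returned labels are hypothetical, and treating a declared income at or above the threshold as a denial is an assumption.

```python
def screen_application(declared, predicted, threshold):
    """Compare declared and model-predicted income with the eligibility threshold."""
    if declared >= threshold:
        return "deny"          # assumption: not eligible on declared income alone
    if predicted > threshold:
        return "home visit"    # risky family: declared below, predicted above
    return "grant"


# Values from the case study on the slide, in UAH.
decision = screen_application(declared=211, predicted=308, threshold=255)
print(decision)
```

The model prediction does not replace the commission's decision here; it only flags the application for a home visit.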
Cases of SA denials through commission, based
on home inspections
Type of assistance   Declared income, UAH   Predicted income, UAH   Absolute difference, UAH   Relative difference, %
Low income           9                      133                     124                        1320
Low income           211                    308                     97                         46
Low income           264                    344                     80                         30
Child care           10                     133                     123                        1238
Child care           12                     127                     114                        931
Child care           209                    286                     77                         37
Child care           250                    327                     77                         31
Child care           260                    365                     105                        40
Child care           273                    330                     56                         21
Single mothers       0                      68                      68                         -
Single mothers       222                    291                     69                         31
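The two difference columns can be recomputed from declared and predicted income as follows. This is a minimal sketch with a hypothetical function name; the relative difference is undefined when declared income is zero. Some rows of the table seem to have been computed from unrounded incomes, so recomputing them from the rounded values shown can give slightly different numbers.

```python
def deviations(declared, predicted):
    """Return (absolute difference in UAH, relative difference in %, or None if undefined)."""
    absolute = predicted - declared
    relative = None if declared == 0 else round(100.0 * absolute / declared)
    return absolute, relative


# Rows taken from the table, in UAH.
print(deviations(211, 308))
print(deviations(222, 291))
print(deviations(0, 68))
```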
Predicted income helps to select families for home
inspection
Figure: Families selected for inspection. Vertical axis: relative deviation, %. Horizontal axis: absolute deviation, UAH. Series: not selected, selected.
Comments
• Each family has a chance to be selected for home inspection
• The probability of selection increases when the predicted income differs significantly from the declared income in absolute and relative terms
Model comparison
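Models: Linear (no matching); Nonlinear; Two-step model; Linear (matching); Linear (Bayesian + matching or not)
Theoretical background
• Linear (no matching): weak, since linear relations are rare in nature
• Nonlinear: stronger, since it takes nonlinearities into account
• Two-step model: strong if selection bias is expected
• Linear (matching): same as linear (no matching)
• Linear (Bayesian + matching or not): same as linear (no matching)
Simplicity
• Linear (no matching): very simple
• Nonlinear: simple
• Two-step model: slightly complex
• Linear (matching): simple in estimation but complex in data matching
• Linear (Bayesian + matching or not): very complex in estimation
R-square
• Linear (no matching): high
• Nonlinear: -
• Two-step model: high
• Linear (matching): high
• Linear (Bayesian + matching or not): -
Information as an input
• Linear (no matching): only HBS
• Nonlinear: only HBS
• Two-step model: only HBS
• Linear (matching): HBS and pilots dataset
• Linear (Bayesian + matching or not): very effective use of information
Influence on applicant
• Linear (no matching): high
• Nonlinear: high
• Two-step model: medium
• Linear (matching): high
• Linear (Bayesian + matching or not): high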
Conclusions
1. Income estimates generated by the models differ significantly from the incomes declared by the SA applicants
2. Further empirical tests with the models are needed
3. Initially, model results should be used only as advice rather than as a criterion for granting SA benefits
4. The models may be used as an instrument for selecting families for home inspections