Social Assistance Pilots Program
SA Pilots Seminar

Hybrid Means Testing (HMT) Model Development
Roman Semko, CASE Ukraine
March 2010

Content
1. Introduction to modeling
2. Data analysis
3. Methods for estimation
4. Simulations
5. Income from assets (agriculture)
6. Double-blind experiment results
7. Model comparisons and conclusions

Concept
• The World Bank has developed a methodology for income estimation based on regression analysis: HYBRID MEANS TESTING (HMT)
• Under the HMT method, eligibility for the SA program is assessed by modeling household income
• Total income is divided into two parts: easy to verify (e.g., pension, stipend) and hard to verify (e.g., dividends, shadow wage)
• The final goal is to estimate the hard to verify share of income from a set of variables that can be accurately measured and that reflect the hard to verify income
• Hard to verify income is divided into income that is not generated by long-term assets (estimated by a regression model) and income from assets (estimated by formulas)

The main goal of the model is to predict total family income as precisely as possible
[Diagram components: Criteria, Model, Methods, Data and Knowledge]
Model: an equation which estimates the applicant’s income based on the available information:
Y = β1*X1 + β2*X2 + β3*X3 + …
Y: total income, hard to verify
X1, X2, X3, …: family structure, type and sector of employment, education, region, other
Source: Finance Ministry of Ukraine
Criteria: 1. Theoretical validity 2. Simplicity 3. Goodness of fit 4.
Significance of explanatory variables
Application and Simulation

A good model should use all available relevant information for income prediction
HBS 2008
• 10,622 observations of households with total income
Pilots dataset
• > 3,000 observations of families with declared income
• Cannot be used separately for model estimation since total income is not available
A lot of information could/should be used to guarantee an acceptable level of precision
Declared income (DI) is an important indicator in total income (TI) assessment
MATCHING: HBS 2008 (TI, characteristics) + pilots dataset (characteristics, DI) → matched data (TI, characteristics, DI)

Observations are matched in a way that guarantees the highest similarity between them
Procedure
1. Form groups based on the following variables: type of settlement, type of assistance, household size, number of children, working persons, pensioners, sex of the head in single-head households
2. Match each observation from HBS to an observation from the pilots dataset in the same group, based on similar characteristics (age of the head, education of the head, etc.), using a Euclidean distance function
3. Each observation from the pilots dataset is used for matching no more than 2 times
4. If the corresponding group in the pilots dataset has no good candidate for an HBS observation, aggregate the groups and match again
[Diagram: HBS 2008 observations (1 … K1, 1 … K…, 1 … KN) and pilots dataset observations (1 … L1, 1 … L…, 1 … LN), each arranged into Groups 1 … N]

Data comparison: the main difference between HBS and pilots applicants is in their incomes, while most other characteristics are similar
Total vs.
declared income comparison (without SA)
[Figure: income, UAH, by income vintile; series: HBS (total income), Pilots (declared income)]

For some regions, average income in HBS differs significantly from the Personal Disposable Income (PDI) statistics
[Figure: two maps of Ukraine labeled by regional centre, "Statistics, PDI" and "HBS"; legend: differences in income without SA per capita compared to Chernivtsi region, UAH: >200, 100-200, <100]

Bayesian econometrics allows combining data with aggregated publications of regional PDI
Calibration: The researcher determines the model coefficient(s) artificially. For example, if regional macrodata say that income in Kyiv city is 1108 UAH higher than in AR of Crimea, then it is assumed that income for Kyiv city applicants is 1108 UAH higher than for AR of Crimea applicants, other things equal
Bayesian estimation: Combines both approaches.
The estimated coefficient lies between the calibrated value and the value estimated in the standard way
Standard estimation: Coefficients are determined from the collected observations using standard regression tools (classical econometrics)
The Bayesian approach does not lead to significant changes within regions, but changes across the regions of Ukraine are significant: average predicted income for regions changes

The linear model is the simplest
Description
• Linear relation between income and family characteristics
• The dependent variable is in logarithms (log-linear)
• Independent variables (IVs) include easy to verify income
• Other IVs are: number of children, of working persons, and of the elderly; type and sector of employment of household heads; education level
R2 (linear): 58% (large cities: 65%, small cities: 63%, villages: 48%)
Concept: the more income the applicant declares, the lower the additional predicted income is, a sort of "zero sum game"
[Figure: Predictions; series: declared income, predicted income]

The nonlinear model performs well when income differences are high (for the whole HBS sample)
Description
• Nonlinear relation between income and family characteristics.
The form of the relation is cubic or quadratic, since total income sorted in ascending order increases as a polynomial of 2nd or 3rd order
• The dependent variable is in logarithms (log-linear)
• Independent variables are as in the linear model
R2 (nonlinear): R-squared is not bounded to the [0%, 100%] range
Concept: as for the linear model
[Figure: Predictions; series: declared income, predicted income]

The two-step model is effective when there is a large number of families with zero and with nonzero hard to verify incomes
Description
• At the first stage, the probability that the family has shadow income is estimated; then a linear relation between income and family characteristics, together with the hazard of having shadow income, is used for estimation
• The dependent variable is in logarithms (log-linear) and does not include salary
R2 (two-stage): 47% (no division by cities)
Concept: a stable additional income is added to the declared income, "the game with constant markup"
[Figure: Predictions; series: declared income, predicted income]

Each model needs a set of adjustments in order to become fully useful
Adjustments and descriptions
1. DEPENDENT VARIABLE: Informal (shadow) salary was incorporated into the dependent variable (hard to verify income), since it is not easy to verify
2. EXPLANATORY VARIABLES (EVs): Some EVs which could be used for predictions are hard to verify, e.g., the number of mobile phones cannot be accurately measured
3. TIME INCONSISTENCIES: To compare incomes across different time periods, average growth rates of PDI and its elements were used for time adjustment
4. FAMILY HEADS: The definitions of family heads are standardized: male co-head and female co-head are used instead of voluntary definitions
Prediction does not change significantly unless the dependent variable is redefined.
If the dependent variable is redefined, additional predicted income becomes more stable and decreases at a lower rate as declared income increases

Average predicted income exceeds declared income by 26%
[Figure: Declared vs. predicted income (by models); y-axis: income, UAH; x-axis: type of assistance (Low income, Child care, Single mothers, Housing subsidies, Fuel subsidies, Mixed assistance, Total); series: Declared w/o SA, Predicted linear model, Predicted matching model, Predicted two-step model]

27% of families will be excluded from the SA programs
[Figure: Number of beneficiaries (hypothetical scenario); y-axis: beneficiaries, %; x-axis: type of assistance (Low income, Child care, Single mothers, Housing subsidies, Fuel subsidies); series: Status-quo, Linear model, Matching model, Two-step model]

Average assistance will drop significantly, except for low income and fuel subsidies
[Figure: Average assistance (hypothetical scenario); y-axis: average assistance, %; x-axis: type of assistance (Low income, Child care, Single mothers, Housing subsidies, Fuel subsidies); series: Status-quo, Linear model, Matching model, Two-step model]

The total budget for SA expenditures will decrease by 27%
[Figure: Total expenditures on SA (hypothetical scenario); y-axis: total expenditures, %; x-axis: type of assistance (Low income, Child care, Single mothers, Housing subsidies, Fuel subsidies); series: Status-quo, Linear model, Matching model, Two-step model]

Income from agricultural assets is calculated based on the developed normatives
Current situation
• Agricultural income is calculated as income per hectare
• Normatives are not unified across regions
New approach
• Income is calculated per hectare and per animal
• Differentiation between cities and villages
• Normatives are unified since they are based on the same methodology and data
Calculation procedure
Information is certified by the village/city council
Not applied to families with disabled persons or the elderly (>70)
If the applicant lives closer
than 10 km to the city, city normatives apply
Income from land is the product of land area and the normative
Average predicted income exceeds declared income by 28%
Income from payi (land shares) is calculated separately
Income from livestock is the number of livestock heads times the normative

Example of income calculation from agriculture
Current situation
  Crops, land area: Only a farmstead area of 0.56 hectares, located in a village (Donetsk region)
  Crops, normative: 127.62 per hectare per month
  Animals: Possesses one cow and 10 chickens
  Animals, normative: Only related through the hayfields and pasturage
  Agroincome: + 63.81 UAH per month
New approach
  Crops, land area: Only a farmstead area of 0.56 hectares, located in a village (Donetsk region)
  Crops, normative: 412.44 per hectare per month
  Animals: Possesses one cow and 10 chickens (the same)
  Animals, normative: 270.83 for a cow, and 4.48 for one chicken
  Agroincome: + 521.85 UAH per month

Double-blind experiment: case study
Family description
• Father: unemployed and not registered in the employment center
• Mother: housewife
• Age labels shown for the other family members: <18, <3, <3, <3
Model result
• Declared income = 211 UAH is LESS than the eligibility threshold = 255 UAH: GRANT?
• WHILE the model prediction = 308 UAH is MORE than the eligibility threshold = 255 UAH
• Immediate decision: risky family, home visit needed
• Commission case and home visit
• Outcome: DENIAL

Cases of SA denials through commission, based on home inspections
Type of assistance   Declared income,   Predicted income,   Absolute difference,   Relative difference,
                     UAH                UAH                 UAH                    %
Low income                 9                 133                 124                   1320
Low income               211                 308                  97                     46
Low income               264                 344                  80                     30
Child care                10                 133                 123                   1238
Child care                12                 127                 114                    931
Child care               209                 286                  77                     37
Child care               250                 327                  77                     31
Child care               260                 365                 105                     40
Child care               273                 330                  56                     21
Single mothers             0                  68                  68                      -
Single mothers           222                 291                  69                     31

Predicted income helps to select families for home inspection
Comments
• Each family has a chance to be selected for home inspection
• The probability of selection increases as predicted income differs more from declared income in absolute and relative terms
[Figure: Families selected for inspection; x-axis: absolute deviation, UAH; y-axis: relative deviation, %; series: Not selected, Selected]

Model comparison
Characteristics by model
Linear (no matching)
  Theoretical background: Weak, since linear relations are rare in nature
  Simplicity: Very simple
  R-square: High
  Information as an input: Only HBS
  Influence on applicant: High
Nonlinear
  Theoretical background: Stronger, since it takes nonlinearities into account
  Simplicity: Simple
  R-square: -
  Information as an input: Only HBS
  Influence on applicant: High
Two-step model
  Theoretical background: Strong if selection bias is expected to be
  Simplicity: Slightly complex
  R-square: High
  Information as an input: Only HBS
  Influence on applicant: Medium
Linear (matching)
  Theoretical background: Same as linear (no matching)
  Simplicity: Simple in estimation but complex in data matching
  R-square: High
  Information as an input: HBS and pilots dataset
  Influence on applicant: High
Linear (Bayesian + matching or not)
  Theoretical background: Same as linear (no matching)
  Simplicity: Very complex in estimation
  R-square: -
  Information as an input: Very effective use of information
  Influence on applicant: High

Conclusions
1. Income estimates generated by the models differ significantly from the incomes declared by the SA applicants
2. Further empirical tests with the models are needed
3. Initially, model results should be used only as advice rather than as a criterion for granting SA benefits
4. The models may be used as an instrument for selecting families for home inspections
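
Illustration: matching procedure
The matching procedure described earlier (form groups, match within a group by Euclidean distance, use each pilots observation at most twice) can be sketched as below. This is a minimal sketch, not the authors' implementation: the record layout (`id`, `group`, `features`), the greedy processing order, and the absence of feature scaling are assumptions, and step 4 (aggregating groups when no good candidate exists) is only marked, not implemented.

```python
import math
from collections import defaultdict

MAX_USES = 2  # from the procedure: a pilots observation is matched at most 2 times


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match(hbs, pilots):
    """Greedy within-group nearest-neighbour matching (hypothetical layout).

    Each record is a dict with 'id', 'group' (tuple of grouping variables)
    and 'features' (numeric tuple, e.g. age and education of the head).
    Returns {hbs_id: pilots_id or None}.
    """
    by_group = defaultdict(list)
    for p in pilots:
        by_group[p["group"]].append(p)

    uses = defaultdict(int)
    matches = {}
    for h in hbs:
        candidates = [p for p in by_group[h["group"]] if uses[p["id"]] < MAX_USES]
        if not candidates:
            # Step 4 of the procedure (aggregate groups, match again) would go here.
            matches[h["id"]] = None
            continue
        best = min(candidates, key=lambda p: euclidean(h["features"], p["features"]))
        uses[best["id"]] += 1
        matches[h["id"]] = best["id"]
    return matches
```

Because the loop is greedy, the result depends on the order of the HBS observations; the source does not say how ties or ordering were handled.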
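
Illustration: a coefficient between the calibrated and the estimated value
The slides state that the Bayesian coefficient lies between the calibrated value and the standard estimate, but do not give the prior or the likelihood. One textbook way to obtain such a value, assumed here purely for illustration, is a normal prior centred on the calibrated value combined with a normally distributed estimate, which yields a precision-weighted average. Only the 1108 UAH figure comes from the slides; the standard estimate and both variances are hypothetical.

```python
def posterior_mean(calibrated, calibrated_var, estimated, estimated_var):
    """Precision-weighted average of a calibrated (prior) value and a
    standard regression estimate, assuming both are normal with known variance."""
    w_prior = 1.0 / calibrated_var
    w_data = 1.0 / estimated_var
    return (w_prior * calibrated + w_data * estimated) / (w_prior + w_data)


# 1108 UAH: calibration value quoted in the slides (Kyiv city vs. AR of Crimea).
# 600 UAH and both variances are hypothetical numbers chosen for the example.
combined = posterior_mean(
    calibrated=1108.0, calibrated_var=100.0 ** 2,
    estimated=600.0, estimated_var=100.0 ** 2,
)
print(combined)  # with equal variances this is the midpoint of the two values
```

A smaller variance on either side pulls the combined coefficient toward that side, which is one way to express how much the regional macrodata are trusted relative to the survey.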
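
Illustration: income from agriculture
The calculation rule from the agriculture slides (land area times the land normative, plus livestock heads times the per-head normative) can be checked against the worked example. The function name and data layout below are assumptions; the normatives are the ones quoted in the example. Note that the totals shown in the example (63.81 and 521.85 UAH) are reproduced with an area of 0.50 hectares, not the stated 0.56; the source does not explain the difference.

```python
def agro_income(land_ha, land_norm, animals=None):
    """Monthly income from agriculture: land area times the land normative,
    plus head count times the per-head normative for each kind of animal."""
    animals = animals or {}
    livestock = sum(count * norm for count, norm in animals.values())
    return land_ha * land_norm + livestock


# Per-head normatives quoted in the example (UAH per month): (head count, normative).
new_animals = {"cow": (1, 270.83), "chicken": (10, 4.48)}

for area in (0.56, 0.50):
    current = agro_income(area, 127.62)            # current situation: land only
    new = agro_income(area, 412.44, new_animals)   # new approach: land plus animals
    print(f"{area:.2f} ha: current {current:.2f}, new {new:.2f}")
```

With 0.50 hectares the sketch gives 63.81 and 521.85 UAH per month; with 0.56 hectares it gives 71.47 and 546.60.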