Income Distribution Models and Income Inequality
Measures from the Robust Statistics Perspective
Revisited
Daniel Kosiorowski1
1 Cracow
University of Economics, Department of Statistics, Faculty o Management
Łódź, 3th Conference on Spatial Econometrics , 09.06.2014
Thanks for the polish NCS financial support DEC-011/03/B/HS4/01138
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Plan
1
Introduction
2
Robust estimators of the Pareto and log normal models
3
Properties of the estimators
4
An overview of further studies
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Introduction
Plan
1
Introduction
2
Robust estimators of the Pareto and log normal models
3
Properties of the estimators
4
An overview of further studies
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Introduction
MOTIVATIONS
• Economic size distributions (ESD) including Pareto, log-normal or
gamma distributions are a central part of Economic Theory distribution of wealth, of firm sizes, actuarial losses, city sizes, but
also the Internet traffic characteristics (data streams). These
distributions may be studied within Functional Data Analysis.
• Correct estimation of the ESD is directly related to tax, social or
regional politics via their derivative measures of social inequalities like
Gini or Pietra coefficient, measures of "degree of a free market" in a
sector of economy.
• Even small relative error in estimation of a shape parameter of a size
distribution may lead to a large relative error in estimated quantiles or
tail probabilities based on α. Classical MLE estimators of the size
models are not robust.
• Multivariate extensions of the Lorenz curve, Gini coefficient etc.,
appeared within the data depth concept.
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Introduction
Introduction - robustness of a procedure
Five monthly salaries (in PLN) in Poland in 2011 year:
3225; 3103; 2944; 3100; 1123,
the mean 2699 , the median 3100.
the standard deviation 886.63, the MAD 185.23.
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Introduction
Economic size distributions - introduction
• Pareto’s research on income distribution (1887) was related with his
discussion with the French and Italian Socialists who were insisting on
institutional reforms to reduce inequality in the distribution of income. He
found a regularity of observed income distribution obtained from tax records
- the Pareto type I model expresses this regularity. The Pareto principle
sometimes called the 80-20 rule claims that roughly 80% of the effects come
from 20% of the causes.
• In 1931 Gibrat - French engineer and economist developed a widely used
lognormal model for the size distribution of income.
• In 1898 March proposed using gamma distribution for distribution of wages
and fitted it for data from France and Germany.
• Kalecki (1945) - increments of the income are proportional to the excess in
ability of given members of the distribution over the lowest (or median)
member - truncated log normal distribution.
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Introduction
Economic size distributions - introduction
• PARETO: "...statistics tells us that the (income) curve varies very little in
time and space - different people and at different times, give very similar
curves..."
• D’Addario (1934,1939) - a detailed investigation of the income distributions:
the average income earned over a certain income x is increasing linear
function of x.
• Dagum (1977) studied elacticity of the survival function with respect to
income x - basing on it proposed system of distributions including among
others Pareto distributions (Dagum’s generalized logistic system)
• Champernowne (1953) - log logistic distribution, generalized gamma
distribution, Singh-Maddala distribution,...
• Stochastic models leading to the size distributions - e.g., Mandelbrot (1961)
• INEQUALITY MEASURES: Lorentz curve (1905), Gini coefficient (1936),
Pietra coefficient (1915), Atkinson coefficient, Generalized entropy
coefficient, Zenga curve (1984),...
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Introduction
ESD - theoretical, robust approaches.
• For nearly a century ESD considerations were strictly descriptive - data
were obtained from censuses.
• Theory for Lorenz Curve via empirical Lorentz process, confidence intervals
for simple points on the Lorentz curve - Sendler (1979), Lorentz curve
ordinates can be reduced to the asymptotics of the sample quantiles.
• ROBUST APPROACHES:Victoria-Feser (1993), Victoria-Feser (1993) &
Ronchetti (1994) - proposed bounded influence function M- alternatives to
the MLE estimators; Serfling & Brazauskas (2000a, 2000b, 2001, 2004) trimmed mean estimators, generalized median estimators - for Pareto and
log normal models.
• MULTIVARIATE EXTENSIONS: multivariate Lorentz curve and Gini
coeff. - Mosler (1996), Mosler & Koshevoy (1997), Liu, Parelius & Singh
(1999) - within data depth concept.
• Functional Data Analysis and decompositions of the ESD - Kosiorowski,
Mielczarek, Rydlewski, Snarska (2013), (2014); applications of robust
statistics to EDS - Kosiorowski & Tracz (2014a, 2014b).
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Introduction
1.0
Lorenz curve
GINI=0.36
GINI=0.1
0.6
0.4
0.2
L(p)
0.8
P(3,2), Gini=0.36
P(3,5), Gini=0.10
0.0
density
0.0 0.2 0.4 0.6 0.8 1.0 1.2
Pareto density split into 10 equal areas
3
4
5
6
7
8
0
20
x
40
60
80
100
px100
Pareto studied the economic agents income distribution for tax purposes. The
distribution was truncated to the left at a point xm , the maximum non-taxable
income xm > 0. Pareto found - a stable linear relation of the form
log N (x) = A − α log x,
x ≥ xm > 0 , α > 1 , where N (x) is the number of economic units with income
X > x , X is the income with range [xm , ∞).
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Introduction
Pareto distribution - basic facts
density
0.0 0.2 0.4 0.6 0.8 1.0 1.2
Pareto density split into 10 equal areas
P(3,2), Gini=0.36
P(3,5), Gini=0.10
3
4
5
6
7
8
A simple Pareto distribution P (xm , α) is
given by its CDF
x α
m
,
F (x) = 1 −
x
for x > xm , where α is the shape parameter
that characterizes the tail of the distribution
and xm > 0 is the scale parameter.
x
√
α
E(X) =
the median xm 2,
and the mode xm .
GIN I = 1/(2α − 1),
1−1/α
L(p) = 1 − (1 − p)
D. Kosiorowski (CUE)
(
2
D (X) =
.
∞
αxm
α−1
α≤1
,
α>1
∞
x2m α
(α−1)2 (α−2)
α ∈ (1, 2]
α>2
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Introduction
MLE estimators of the Pareto distribution
For large data sets, MLE estimators attains the minimum possible variance
among a large class of competing estimators
n
i=1 log(Xi /xm )
α̂M L = Pn
It can be easily found that
2nα
α̂M L
has cdf χ22n .
Although α̂M L is biased, it is easy to find its unbiased version (MLE)
n−1
,
i=1 log(Xi /xm )
M LE(α) = Pn
M LE(xm ) = min{Xi }
2
For large sample size n , M LE is approximately N α, αn
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Introduction
Estimation of the Pareto distribution
• Even small relative error in estimation of α in P (xm , α) may lead to a
large relative error in estimated quantiles or tail probab. based on α.
• For the quantile qε corresponding to upper tail probability ε , it
follows that qε = xm ε−1/α .
• For ε = 0.001 underestimation of α = 1 by only 5% leads to
overestimation of q0.001 by 58%.
• In practice when observing income, even up to 30% of units reports
zero values. These observation can be treated as outliers in a lower
tail of the distribution.
• Errors in estimation of α may result in errors in estimation of basic
measure of social inequity called the Gini coefficient
GIN I = 1/(2α − 1) for α ≥ 1
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Introduction
Robustness of a statistical procedure
• ROBUSTNESS: Small changes in data should result in small
changes of outputs of the procedure. By small changes we mean –
small changes in values of observations or big changes of values of a
small fraction of observations. Analysis of robustness of a statistical
procedure is closely tied with analysis of its continuity.
• FINITE SAMPLE BREAKDOWN POINT of ESTIMATOR: A
minimal fraction of bad observations in a sample, which makes the
estimator useless – i.e., e.g., its bias is unacceptable.
• qualitative robustness, robustness in terms of the Hampel’s influence
function, in the sense of Genton & Lucas,...
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Introduction
For a given distribution F in Rd and an ε > 0 , the version of F contaminated by
an ε amount of an arbitrary distribution G in Rd is denoted by
F (ε, G) = (1 − ε)F + εG . The influence function (IF) of an estimator T at a
given x ∈ Rd for a given F is defined
IF (x; T, F ) = lim+ (T (F (ε, δx )) − T (F ))/ε
(1)
ε→0
where δx is the point-mass probability measure at x ∈ Rd .
The IF (x; T, F ) describes the relative effect (influence) on T of an infinitesimal
point-mass contamination at x , and measures the local robustness of T . An
estimator with bounded IF (with respect to a given norm) is therefore robust
(locally, as well as globally) and very desirable.
Let X n be a sample of size. The replacement breakdown point (BP) of an
estimator T
nm
o
n
BP (T, X n ) =
: kT (Xm
) − T (X n )k > δ ,
n
n
where Xm is a contaminated sample resulting from replacing points ofX n with
arbitrary values, k·k denotes a norm, δ is certain content-related threshold.
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Robust estimators of the Pareto and log normal models
Plan
1
Introduction
2
Robust estimators of the Pareto and log normal models
3
Properties of the estimators
4
An overview of further studies
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Robust estimators of the Pareto and log normal models
Trimmed mean estimator of the shape in P (xm , α)
For specified β1 and β2 satisfying 0 ≤ β1 , β2 < 1/2 , a trimmed mean is
formed by discarding the population β1 lowest obs. and the proportion of
β2 uppermost obs. and averaging the remaining ones in some sense. In
particular, for estimating α, with known xm
Brasauskas and Serfling (2000) proposed the trimmed mean estimator
for P (xm , α)
α̂T M =
n
X
cni log X(i) /xm
!
−1
,
i=1
with cni = 0 for 1 ≤ i ≤ [nβi ], cni = 0 for n − [nβ2 ] + 1 ≤ i ≤ n and
cni = 1/d(β1 , β2 , n) for [nβ1 ] + 1 ≤ i ≤ n − [nβ1 ], where [·] denotes
“greatest integer part” and d (β1 , β2 , n) =
n−[nβ
P 2 ] j−1
P
(n − i)−1 .
j=[nβ1 ]+1 i=0
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Robust estimators of the Pareto and log normal models
The generalized median estimators
The generalized
median (GM) statistics are defined by taking median
!
n
of the
evaluations of a given kernel h(x1 , ..., xk ) over all k− subsets
k
of the data (see Serfling 1994).
Different choices of k and kernel h lead to different GM estimators. The
kernel is chosen such that h(x1 , ..., xk )is median unbiased for estimating
the parameter θ .
θ̂GM = median {h (X1 , ..., Xk )}
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Robust estimators of the Pareto and log normal models
The generalized median estimator of the shape in P (xm , α)
Brazauskas and Serfling (2002) proposed estimator for the parameter α
in case of xm known:
α̂GM = M ED {h (Xi1 , ..., Xik )} ,
with a particular kernel h(x1 , ..., xk ):
h(x1 , ..., xk ; xm ) =
1
k
Ck P
k
,
log (xj /xm )
j=1
where Ck is a multiplicative, the median – unibasing factor i.e. chosen so
that the distribution of h(x1 , ..., xk ; xm ) has median α - values of Ck for
k = 2 , C2 = 1.1916 , k = 3 C3 = 1.1219.
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Robust estimators of the Pareto and log normal models
Robust estimator for the lognormal distribution
A random variable Y has a lognormal distribution L(µ, σ) if X = log Y has the
normal distribution N (µ, σ 2 ) .
Three parameter form L(µ, σ, τ ) is the distribution of Y = τ + eX , where τ
represents a threshold value and X is a random variable with mean µ and std.
deviation. (Kalecki, (1945) a pioneer of the robust approach to ESD.)
Sample Y n = {Y1 , ..., Yn } from the model L(µ, σ), transformation to the
equivalent model N (µ, σ) yields the well-known ML estimators of the location µ
and σ scale parameter and depending on them expected value estimator:
n
µ̂M L =
1X
log Yi ,
n i=1
n
σ̂M L =
1X
2
(logYi − µ̂M L )
n i=1
!1/2
,
2
E(Y ) = eµ̂M L +σ̂M L /2 .
Estimators have good properties, minimal asymptotic variance but they fail to be
robust BP = 0, IF unbounded.
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Robust estimators of the Pareto and log normal models
The generalized median estimators in L(µ, σ)
For the log normal distribution L(µ, σ) Serfling (2004) introduced GM
estimators and obtained their properties.
A kernel for the location estimator
h1 (x1 , ..., xk ) =
k
1X
log xi ,
k i=1
µ̂GM (k) = median {h1 (X1 , ..., Xk )}, BP (µ̂GM (k)) = 1 − (1/2)1/k ,
smooth and bounded IF.
A kernel for the scale estimator
h2 (x1 , ..., xm ) =
1
mMm−1
X
(xi − xj )2 ,
1≤i<j≤m
2 (m) = median {h (X , ..., X )}, BP (σ̂ 2 (m)) = 1 − (1/2)1/m ,
σ̂GM
2
1
m
GM
smooth and bounded IF.
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Properties of the estimators
Plan
1
Introduction
2
Robust estimators of the Pareto and log normal models
3
Properties of the estimators
4
An overview of further studies
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Properties of the estimators
Aims of the studies
• Robust estimation of the parametric model (which of the say 100
possible models?) or maybe powerful nonparametric estimation, i.e.,
e.g., constrained local polynomial estimator providing estimates of
the derivatives of the PDF or CDF.
• Nonparametric studies of spatial differentiation of the income
distribution in Poland in 2005 (Bocian & Kosiorowski, 2012),
nonparametric studies of spatial differentiation relations between
incomes and expenditures (Węgrzynkiewicz & Kosiorowski, 2012) - in
both cases nonparametric panel models.
• Functional data analysis of the income distribution densities
depends on quality of their estimation (Kosiorowski et all, 2014).
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Properties of the estimators
Properties of the robust estimators - Pareto model
In order to investigate properties of the considered estimators MLE, TM
and GM we performed intensive simulation studies involving simulated
datasets of size 500 observations from the following mixtures of
distributions
• Mixture of P (1, 5) × 10% and P (10, 5) × 90%
• Mixture of lognormal distribution LN (2.14, 1) × 10% and P (7, 2) × 90%
• Mixture of normal distribution N (3300, 500) × 10% and P (2500, 4) × 90%
• Mixture of uniform U [0, 0.1] × 10% distribution and P (2500, 4) × 90%
distribution.
Empirical studies - income distributions in polish counties in 2005.
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Properties of the estimators
The estimators comparison - the scale unknown
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Properties of the estimators
The estimators comparison - rob. est. scale
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Properties of the estimators
The estimators comparison - scale known
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Properties of the estimators
The empirical Influence Function
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Properties of the estimators
Constrained local polynomial estimator?
10000
30000
0.0020
0.0000
Density
0.0015
0.0000
0
0e+00
4e+04
8e+04
MAL2005 ; Gini= 0.61 ; sk= 6.58
MAZ2005 ; Gini= 0.64 ; sk= 14.12
10000
20000
30000
N = 11235 Bandwidth = 48.46
D. Kosiorowski (CUE)
0.0000
Density
0
0.0015
N = 11256 Bandwidth = 32.75
0.0015
Density
LÓDZ2005 ; Gini= 0.64 ; sk= 31.14
N = 11141 Bandwidth = 44.46
0.0000
Density
DOL2005 ; Gini= 0.64 ; sk= 12.55
0e+00
4e+04
8e+04
N = 17872 Bandwidth = 45.02
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
Properties of the estimators
CONCLUSIONS - Pareto model.
• The MLE, TM, GM crucially depend on the estimate of the scale
parameter xm . Assuming that this parameter is known, we
recommend the GM estimator for k=5-7, which is a compromise
between a need of high efficiency and high robustness.
• The MLE, TM, GM estimators strongly depend on the distributional
assumption. Before the estimation certain diagnostic procedure
inspecting a sample should be performed. We recommend an usage of
simple Q-Q plot based procedures.
• The GM estimator with xm estimated as quantile of order
γ ∈ (0, 0.3), where γ is optimized using Kolmogorov - Smirnoff
goodness of fit statistics outperform classical MLE as well as TM
estimator. The estimator is computationally intensive however.
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
An overview of further studies
Plan
1
Introduction
2
Robust estimators of the Pareto and log normal models
3
Properties of the estimators
4
An overview of further studies
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
An overview of further studies
FDA for income distributions decomposition
In Kosiorowski et all (2014), we performed empirical studies using census
data from MINESSOTA POPULATION CENTER
(https://international.ipums.org/international/)
We considered data on TOTAL INCOME from the following countries:
PANAMA 1960, 1970, 1980, 1990, 2000, 2010
MEXICO 1960, 1970, 1990, 1995, 2000, 2005, 2010
PUERTO RICO 1970, 1980, 1990, 2000, 2005
CANADA 1971, 1981, 1991, 2001
BRAZIL 1960, 1970, 1980, 1991, 2000, 2010
USA 1960, 1970, 1980, 1990, 2000, 2005, 2010
For each case we used only nonzero observations smaller then third
quartile. For comparison, we standardized the data dividing them by their
medians. For each case the density was estimated by means of the local
linear polynomial estimator in equally spaced grid of 500 points.
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
An overview of further studies
The considered dataset cont.
10
15
20
25
30
0
5
10
15
20
D. Kosiorowski (CUE)
20
30
40
50
0
0
5
10
15
20
N = 2168065 Bandwidth = 0.01055
0.4
0.8
10
0
10
20
30
N = 21710 Bandwidth = 0.07218
MEX 2000 ; Gini coef.= 0.43 ; skew=MEX
2.7 2010 ; Gini coef.= 0.38 ; skew= 1.68
Density
0.4
0.0
0.0
0.6
Density
USA 2000 ; Gini coef.= 0.51 ; skew= USA
4.79 2010 ; Gini coef.= 0.52 ; skew= 3.92
N = 3852393 Bandwidth = 0.006646
Density
0
N = 111210 Bandwidth = 0.01013
4
5
0.0
0.0
0
N = 3413278 Bandwidth = 0.007884
Density
8
0
6
1.0
Density
2.0
4
4
2
2
0
N = 903395 Bandwidth = 0.01286
Density
MEX 1960 ; Gini coef.= 0.58 ; skew= MEX
5.45 1990 ; Gini coef.= 0.72 ; skew= 1.74
0.0
0.6
Density
0.8
0.4
0.0
Density
USA 1960 ; Gini coef.= 0.48 ; skew= 5.45
USA 1990 ; Gini coef.= 0.5 ; skew= 4.27
Fig. : MEXICO - income distribution
2
Fig. : USA - income distribution.
0
2
4
6
8
10
N = 1272178 Bandwidth = 0.003658
0
1
2
3
4
5
N = 1459061 Bandwidth = 0.005328
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
An overview of further studies
The considered dataset cont
Fig. : PUERTO RICO - income
distribution.
Fig. : CANADA - income distribution
CAN 1971 ; Gini coef.= 0.47 ; skew= CAN
3.08 1981 ; Gini coef.= 0.44 ; skew= 1.31
0
2
4
6
8
N = 6665 Bandwidth = 0.1321
0.8
0
−2
−1
0
1
2
3
4
0.4
Density
0.0
Density
0.0 0.4 0.8
1.0
Density
0.0
0.3
0.0
Density
PUE 1970 ; Gini coef.= 0.43 ; skew= PUE
2.09 1980 ; Gini coef.= 0.42 ; skew= 0.87
5
10
15
0
N = 114196 Bandwidth = 0.02256
1
2
3
4
5
6
N = 313437 Bandwidth = 0.0168
N = 39785 Bandwidth = 0.0231
CAN 1991 ; Gini coef.= 0.41 ; skew= CAN
0.72 2001 ; Gini coef.= 0.41 ; skew= 0.58
0.0
0.5
1.0
1.5
2.0
0.8
0.0
0
−0.5
0.4
Density
0.6
0.0
0.3
Density
1.2
0.6
0.0
Density
PUE 2000 ; Gini coef.= 0.29 ; skew= −0.13
1
2
3
N = 558278 Bandwidth = 0.01981
0.0
1.0
2.0
3.0
N = 565432 Bandwidth = 0.01783
N = 40408 Bandwidth = 0.02073
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
An overview of further studies
FPCA for the densities - a general idea.
Fig. : The estimated densities.
Fig. : The FPCA decomposition.
0
20
40
60
income
D. Kosiorowski (CUE)
80
100
0.00
0.00
values * values
0.02 0.04 0.06
densities
0.10 0.20 0.30
FDA transformed densities
0
20
40
60
income
80
100
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
An overview of further studies
FPCA for the estimated densities
Fig. : The estimated squared
harmonics.
1.0
Fig. : The estimated harmonics.
value
2.0
0
1
2
3
4
5
income/median(income)
D. Kosiorowski (CUE)
6
7
1.0
PCA1 78.2%
PCA2 15.3%
PCA3 4.4%
PCA4 1.9%
0.0
−2.0
values
−1.0
0.0
3.0
PCA1 78.2%
PCA2 15.3%
PCA3 4.4%
PCA4 1.9%
0
1
2
3
4
5
income/median(income)
6
7
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
An overview of further studies
Selected results FDA analysis of ESD
• The first harmonic which explains 80% of the total variability is
responsible for the Pareto type asymmetry between incomes of the
Pour and incomes of the majority concentrated around the median of
the income.
• The second harmonic responsible for 15% of the total variability
relates to the people entering middle class incomes.
• The third harmonic responsible for 4.4% of the variability relates to
the financial success. The incomes exceeds three times median
income.
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
An overview of further studies
Multivariate extensions within data depth concept
Data depth concept was originally introduced as a way to generalize the
concepts of median and quantiles to the multivariate framework. A depth
function D(·, F ) associates with any x ∈ Rd a measure D(x, F ) ∈ [0, 1] of
its centrality w.r.t. a probability measure F ∈ P over Rd or w.r.t. an
empirical measure Fn ∈ P calculated from a sample Xn = {x1 , ..., xn }.
The larger the depth of x, the more central x is w.r.t. to F or Fn .
The weighted Lp depth from sample Xn = {x1 , ..., xn } is computed as
follows:
1
Lp D(x; Xn ) =
,
n
P
1 + n1
w kx − Xi kp
i=1
where w is a suitable is non-decreasing and continuous on [0, ∞) weight
function, and k·kp stands for the Lp norm (when p = 2 we have the usual
Euclidean norm and spatial depth, spatial median).
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
An overview of further studies
Fig. : Sample global depth
Fig. : Sample local depth
1.0
1.0
6
6
0.8
4
0.8
4
0.6
y
0.6
2
2
0.4
0.4
0
0
0.2
0.2
-2
-2
0.0
-2
-4
0.0
-2
0
2
4
0
2
4
6
8
x
6
See Zuo & Serfling (2000), Mosler (2013), Kosiorowski (2012), (2014).
Visit DepthProc package web pages for free R package
http://r-forge.r-project.org/projects/depthproc/
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
An overview of further studies
Further studies - multivariate extensions
• The set of points for which depth takes value not smaller than
p ∈ [0, 1] is multivariate analogue of the quantile and is called the
p− central region, Dp (X) = {x ∈ Rd : D(x, X) ≥ p}.
• Lorentz curve – proportion of the mean confined to the central region
Dp (X) to the overall mean. Let f (x) denote wealth of a point
x = (x1 , ..., xd ) . We can define multivatiate Lorenz Curve as
L(p) = p ×
E (f (x)|x ∈ Dp (X))
.
E(f (x))
• Kurtosis in the form of sample Mahalanobis distances of observations
to a multivariate median.
• Local multivariate lorenz curves induced via local depth concept...
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
An overview of further studies
SUMMARY
• The GM estimators with scale (threshold in three parameter
lognormal model) estimated as quantile of order γ ∈ (0, 0.3), where γ
is optimized using Kolmogorov - Smirnoff goodness of fit statistics
outperforms classical MLE as well as TM estimator. The estimators
are computationally intensive however.
• It is worth considering to estimate the income distribution
nonparametrically - we recommend the constrained local polynomial
estimator - at least on the explanatory step of a research.
• We recommend calculating the Gini coefficient "nonparametrically"
i.e., without using an assumption of Pareto, log-normal or gamma
distributed data.
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
An overview of further studies
Selected references
• Brazauskas V., Serfling R., (2000), Robust and Efficient Estimation of the
Tail Index of a Single-Parameter Pareto Distribution, North American
Actuarial Journal.
• Brazauskas V., Serfling R., (2001), Robust Estimation of Tail Parameters for
Two-Parameter Pareto and Exponential Models via Generalized Quantile
Statistics, Extremes.
• Brazauskas V., Serfling R., (2004), Fovorable Estimators for Fitting Pareto
Models: A Study Using Goodness-of-Fit Measures with Actual Data, ASTIN
Bulletin.
• Serfling, R. (2002), Efficient and Robust Fitting of Lognormal Distributions
• Victoria-Feser, M.-P. (2000), Robust Methods for the Analysis of Income
Distributions, Inequality and Poverty, International Statistical Review.
D. Kosiorowski (CUE)
Łódź, 3th Conference on Spatial Econometric
Income Distribution Models and Income Inequality Measures from the Robust Statistics
/ 41Per
© Copyright 2026 Paperzz