Bstimation of the Lorenz Curve and Concentration Ratio Shi

Bstimation of the Lorenz Curve and Concentration Ratio
Shi-Tao Yeh, Consultant, Malvern, PA. OSA
Lorenz Curve and Concentration Ratio
The concentration ratio is a measure of
the. inequalities of the size distribution. It has been used for studyinq
market share concentration, city size
distribution,
income
distribution
inequality,
environmental
pollution
concentration, .•. , etc_. Such ~nequali­
ties can be recorded in the form of a
Lorenz curve. The diaqonal straiqht
line shows what a distribution of complete equality in cummutive percentaqe
would look like, so the extent to which
the LorenZ curve deviates from this
line gives an indication of relative
concentration.
Gini Coefficient
The Gini .coefficient, or concentration
ratio, provides a summary measure of
Lorenz Curve
"
the extent to which the Lorenz curve
deviated from the equalitarian lin.e.
"
This paper discusses the evaluation of
functional forms of the Lorenz curve
and its mathematical properties. The
SAS®
modules with SAScode for estimatinq the Lorenz curve and the concentration ratio are provided.
10
~
~
~
~
~
~
~
"
I~
Cum. ,.. 0( Total PopWatioD.
The last section of this paper
with real-world application.
deals
Figure ,. Lorenz Curve and Concentration Ratio
:Ui'l'RODtl'CTION
The concentration ratio is a measure of
the inequalities of the size distribution. It has been used for studyinq
market share concentration, city size
distribution,
income
distribution
inequality,
environmental
pollution
concentration, ... , etc. Such inequalities can be recorded in the form of a
Lorenz curve, as in Figure 1. The diaqonal straiqht line shows what a distribution of complete equality in cummutive percentaqe would look like, so
the extent to which the Lorenz curve
deviates from this line qives an indication of relative concentration. For
example, the diaqonal line shows how
one miqht expect 50% or total population to be account for by 50% of the
total income, whereas in fact SO, of
the lower income from the total populalation are accounted for by only 23% of
total income, as the Lorenz curve indicates in Fiqure 1.
The Gini coefficient,
crosshatched
lines area in Fiqure 1, provides a summary measure of the extent to which the
Lorenz curve deviated from the equalitarian line. It indicates the extent of
the crosshatched lines area in the
fiqure by dividinq the area of crosshatched lines by the area below the
line of equality. The value of the Gini
coefficient ranqes from 0 (complete
equality) to 1 (complete inequality).
'This paper discusses the evaluation of
functional forms of the Lorenz curve
and its mathematical properties. The
SAS modules with SAS code for estimatinq the Lorenz curve and the concentration ratio are provided.
The last section of this paper deals
with real-world application.
873
BVJ.LUATXON
0"
the other, suggested by Gupta [1], is
the following:
P1lJIC'l'XOIIAL PaRKS 0., ftE
LORENZ CURVE
(x - 1 )
In recent years, there has been some
development on deriving the functional
forms for the Lorenz curve. The functional form
y-x'A
FUnctional form (1) has been criticized
with, its nonlinearity in the parameter,
which makes estimation of the parameters by the linear l.east squares method
impossible [1]. ?unctional form (2) is
intrinsically linear; nonlinear with
y - fix)
represents the Lorenz curve if it
tisfies the following properties:
sa-
respect
Iii) f(1) - 1;
>- 0, forO <-
(iv) f"(x)
(v)
fix)
(vi)
0
>- 0,
< - x,
~J)(x) dx ~
<
1;
MODULE PaR PROCESSING INPUT DATA
The data from Table 1 are used throughout
the
paper
for
demonstration
purpose.
There are two proposed functional forms
, is proposed by Rache, Gaffney, Kee, and
Obst [2] as follows:
The following
0;
w'lere
0
parameters
< x < 1;
1/2.
r -x)
variables, but linear
becomes relatively simple and easy.
,tqat satisfy all of the properties. One
Y - [ 1-(
the'
the estimation of nonlinear
<- x <- 1;
forO
for 0
x
to
with respect to the parameters to be
estimated. These two functional forms
has been used for parameter estimation.
The results from these two forms are
very similar. The NLIN procedure in
SAS/STAT®provides a powerful tool that
(i)f(0) - 0;
(iii) {'(x)
with A> 1.••..•.•..• (2)
1/(3
1
<0; <-
percentage.
(1)
1,0
county
Philadelphia
Montgomery
Chester
Delaware
Bucks
Camden
Burlington
Gloucester
New Castle
SAS~ macro module
~rocess input data
~nto data with
<
(3
<-
will
and convert raw data
sorted
cummulative
1;,
1980
1970
1,688,.210
643,621
316,660
555,007
479,211
471,650
362,542
199, !).17
398.,115
1,949,996
624,080
277,746
603,456
416,278
456,291
323,132
172,681
385,856
Source: CB Co.aercial Real Bstate Group
TMJ/e 1. Population in
874.
Tr~State Area
1990
1,585,577
678,111
376,396
547,651
541,174
502,824
395,066
230,082
441,946
This macro tunction %cinput can be used
anywhere i\1 the SAS proqram. For example, a SAS data set F1 contains data of
Table 1 with variable names: COUNTY $ ,
Y1970 , Y1980, and Y1990. The following SAS statements will produce a data
set F2, with variables: 08S, Y70, X,
Yao, and Y90, shown in Figure 2.
"macro cinput(fl,varl,xI,var2 ofl)'
.' ,
data fl·
set &fl;
a - I'
proc son;by a &var I;
proc summary;
class a;
var &varl;
outputout-alsum_sl n-nl'
dataal;
,
setal;
a - I'
if typ~ _0'
keepaslnl:
data &ofI;
,
merge a I fl;
by a;
pa - &var 1151'
retain &var2 0.'
&var2 - &var + pa'
i
ctx-aln!'
. &xl 0;'
retam
&xl - &xl +ctx·
keep &x I &var2' '
"cinputlfl, yl 970,)(, y70,0 I I
"cinput(fI, yl 980,x, y80,021
%cinputlfl, yI990,x, y90,031
run;
data f2;
set 01; set 02; set 03;
'
1
2
3
4
5
6
7
8
9
"mend cinput;'
0.03315
0.08646
0.14849
0.22256
0.30246
0.39005
0.50589
0.62569
1. 00000
0.11111
0.22222
0.33333
0.44444
0.55556
0.66667
0.77778
0.88889
1.00000
0.03908
0.10099
0.17187
0.24971
0.34192
0.43561
0.54411
0.66994
1.00000
0.04342
0.11446
0.18901
0.27242
0.36731
0.46944
0.57279
0.70077
1. 00000
Figure 2 $AS Macro Code for Input Data Processing
Figure 3 Da.ta Set F2
The arguments
are:
of
this macro function
MODULE FOR PARAMETER ESTIMATION
fl: data set name which contains
the variables to be processed,
The NLIN procedure in SAS/STAT
is
employed for THE Lorenz curve parameter
~stimation. Figure 3 shows a SAS
macro
estimation module.
varl : variable name in fl data set
for which to be processed,
x I : computed values on equalitarian
line,
var 2 : variable name for processed
var1 values,
%macro nlreg(f,x, y,a I,b I,p 1);
proc nlin data - &f
maxiter-30
converge -.0000 I;
parms &a I - 0.5
&bl - 0.6;
bounds 0 < &a I < - I ,
0< &bl <-I;
model &y - (I. (1· &xl •• &a1)
•• (1-1 &b 1);
output out - &p I parms - &a I &b I;
%mend nlreg;
off : output data set name.
To invoke this function, write its
macro function name %cinput and then
the arguments of data set name fl,
variable name varl, name for computed
values on equalitarian line xl, variable name var2 for
processed
values, and output data set name
varl
ofl,
enclosed in parentheses.
875
The macro function %nlreq
8upplied arquments of
accept~
users
pl.: data set name which is the
oupU! file of macro
funaidn %nlreg,
f : data set name,
aI, b I : estimates of parameters ,
x : values on equalitarian
line,
year: the year which the data
are represented.
y : target variable for analysis,
The complete macro code are shown
the followinq tiqure.
aI, b I: estimates of parameters
a, and b in Lorenz
in
curve (1),
%macro cr(p I,a I,b I,year);
data &pl;
set &pI;
a - &al;
b - &bl;
al - 1I&al;
bl - (1I&bl) + I;
cl-al+bl;
cr - 1.0 - (2.0/a) •
(gamma(a I) •
.
gamma(b 1)/gamma(c I));
year - &year;
keep year cr a b;
%mend cr;
pI: output data set name.
After executing the macro
function
tnlreq, you can use the followinq macro
function to process output file.
%macro fone{p I ,a 1);
data &pI;
set&pl;
proc SOrl;by &a I; .
data &pl;
set &pl;
by&al;
iffirst.&a I;
%mend fone;
APPLICATIOW
XOOVLB FOR GIWI COBFFICIBWT COMPVTATIOW
The
Tri-state population
data
in
Table 1 show the population chanqes
from 1970 to 1990. One may ask the
question of equity in the population
size distribution in the study area.
The followinq SAS statements are used
to produce Table 2.
The computation ofthe Gini coefficient
is based on functional from specified
in equation (1). It is defined:
Substitutinq variables
u-I-(I-x)
%nlreg«(2,x,y70,aO,bO,pO)
%fone(pO,aO)
a
%cr(pO,aO,bO, 1970)
%nlreg(f2,x, y80,a 1,b 1,p I)
%fonelp I,a I)
%cr(pl,al,bl,1980)
%nlreg(f2,x,y90,a2,b2,p2)
%fone(p2,a2)
this is equal to:
f
%crq,2,a2,b2,1990)
data (3;
1
lil1
11",-1
CR - 1.0 - 2.0 (-) (1 - u)
u
du
"
- 1.0 - (2.0/a) • B(1/""
set pO pI p2;
proc print data -f3 noobs label split-·· ';
var year cr a b;
11~
+
label
year - 'year· - - - -
1)
I
cr - 'concentration rat;o·. ________________ __ '
a - 'estimateofa·--- __________ ,
b - 'estimate ofb*. - -. _________ ,
;
where 8 represents the beta
distribution.
SAS function of GAMMA is used
computinq the beta distribution.
macro function tcr is desiqned
computinq concentration
ratio.
arquments of this function are
title 'Population Concentration Trend in" Tr;.State';
run;
for
The
for
The
876
Population Concentration Trend in Tri-Stat.
year
concentration ratio
estimate of a
estimate of b
0.38806
0.33115
0.28947
0.50352
0.55668
0.59915
0.89967
0.91887
0.93148
~
1970
1980
1990
TmJe 2 Population Concentration TfMd in T,i-5tate Al'ea
UPlIRlDIClIS
Population Concentration in Tri-State
'"r-------------------------------~
.
.
[1)
Gupta, M.R.(1984),"FUnctional Forms
for Estimatinq the Lorenz curve",
Econometrica , 52 , pp. 1313-1314 •
[2)
Rasche, R. B., J. Gaffney, A. Y. C.
Koo, and N. Obst(1980), "FUnctional
Forms for Estimatinq the Lorenz
curve", Econometrica, 48, pp. 10611062.
i'eo
~
~
~
•
~
SAS, and
SAS/STAT
are
reqistered
trademarks of SAS Institute Inc., Cary
NC, USA
50
40
..
.
to
..
•
~
•
..
..
n
..
•
_
e-. n of ToW Pop1IlaUoA
Fi,UTa 5 Population Concentration TfMd in T,i-State Al'ea
Table 2 presents the estimatedparameters of" the Lorenz curves alonq with
the
different
inequality measures.
Fiqure 3 exhibits the correspondinq
Lorenz curves. It shows a trend of
equity in the population size distribution in the Tri-State area.
877