Towards understanding the Lorenz curve using the Uniform distribution
Chris J. Stephens
Newcastle City Council, Newcastle upon Tyne, UK
(For the Gini-Lorenz Conference, University of Siena, Italy, May 2005)
Abstract
Using the uniform statistical distribution and ordered uniform spacings, this paper provides a starting point for improving our understanding of the Lorenz curve and the Gini co-efficient of inequality 'G' under random allocation.
Starting with 'p' ordered uniform spacings, it establishes, under random allocation, the joint moment generating function of these observations and the exact distribution of the individual ordered observations. These provide a basis for understanding the Gini co-efficient of inequality and the associated Lorenz curve.
For example, the expected value of G under random allocation is, not unexpectedly, (p-1)/(2p), which approaches 0.5 as p increases.
Following Durbin (1965), the paper uses a simple test to establish whether an observed co-efficient is consistent with random allocation.
Additionally, the paper develops the associated Lorenz curve, which takes the form L(z) = z + (1-z) log(1-z), 0 < z < 1, and hence leads to the negative exponential distribution underlying the results.
It suggests three extensions covering particular values of 'G'.
1. Introduction
This paper looks at ordered uniform spacings, following a considerable number of authors who have studied uniform spacings (for example: Barton & David (1956), Durbin (1961), Lewis (1965), Pyke (1965) and Stephens (1986)).
It starts from the basics and applies the results to the Lorenz curve and the Gini co-efficient, with a view to a better understanding of both.
2. Foundations
Let v_i (i = 1, …, n; v_j > v_i for j > i) be n observations uniformly distributed on the interval [0, a]. Define w_i = v_i - v_{i-1} (i = 1, …, n+1; v_0 = 0, v_{n+1} = a).
Let x_i = w′_i, where w′_i is the ordered set of the w's (i.e. w′_i > w′_{i-1}), so that 0 < x_i < x_{i+1} < a and Σ x_i = a, where the summation is over all values of i = 1, …, p = n+1.
These x_i are the ordered uniform spacings. We will also define x_0 = 0. Some authors have used the terms D_(i) or c_(i) to describe these spacings.
3. Gini co-efficient
The Gini co-efficient of inequality is then, using the above x's, given by
G = (2 Σ_{i=1}^{p} i x_i - (p+1)a) / (ap).
The minimum G is 0, when all the x's are equal, that is x_i = a/p for all i.
The maximum G is (p-1)/p, when x_i = 0 (i = 1, …, p-1) and x_p = a. This approaches 1 as p increases.
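As an illustrative aside (not in the original paper), the formula above is straightforward to check numerically. The following minimal Python sketch, with a function name and sample values of my own choosing, evaluates G for a given allocation and confirms the stated minimum and maximum.

```python
import numpy as np

def gini_from_spacings(x, a=None):
    """G = (2*sum(i*x_i) - (p+1)*a) / (a*p), with x_1 <= ... <= x_p summing to a."""
    x = np.sort(np.asarray(x, dtype=float))   # order the spacings
    p = len(x)
    a = x.sum() if a is None else a           # the spacings sum to a
    i = np.arange(1, p + 1)
    return (2.0 * np.sum(i * x) - (p + 1) * a) / (a * p)

p, a = 5, 1.0
print(gini_from_spacings(np.full(p, a / p)))          # all equal      -> 0
print(gini_from_spacings(np.r_[np.zeros(p - 1), a]))  # one holds all  -> (p-1)/p = 0.8
```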
4. Distribution function of the x_i's
The first part of this section will outline the calculations used to produce
the joint moment generating function of the x’s. The next part will show
the individual moment generating functions. The final part will give the
individual distribution functions.
Additionally, as part of this section, the expected value of the Gini coefficient G is derived.
4.1 Joint Moment Generating Function of the x_i's
Here the aim is to find the Joint Moment Generating Function of the
xi’s.
That is,
E(e^{Σ s_i x_i}) = K_p ∫ … ∫_{S_p} e^{Σ s_i x_i} Π dx_j,
where S_p is the space such that 0 < x_i < x_{i+1} < a and Σ x_i = a, and K_p is a standardising function of p and a, so that the integral is unity if all the s_i are 0. (The summations and product are over i, j = 1, …, p.)
4.11
Using the transformation u_i = a - i x_{p-i+1} - Σ_{j=1}^{p-i} x_j, with u_1 ≡ 0, u_{p+1} ≡ a, u_i ≤ u_{i+1} and du_i = -i dx_{p-i+1}, it is easy to show that
E(e^{s(u_i - u_{i-1})}) = (K_p a^{p-1}/p!) Σ_{j=0}^{∞} (as)^j/(p+j-1)!,
independent of i.
Since, when s = 0, this must be '1', this implies that K_p = p!(p-1)!/a^{p-1}.
4.12
It follows that E(u_{i+1} - u_i) = c, a constant independent of i; i.e. E(i(x_{p-i+1} - x_{p-i})) = c.
Hence E(x_i) = c Σ_{k=p-i+1}^{p} (1/k), for all i.
But Σ_{i=1}^{p} x_i = a. Hence
E(x_i) = (a/p) Σ_{k=p-i+1}^{p} (1/k), for all i.
We can also express this as E(x_i) = (a/p) Σ_{k=1}^{i} 1/(p-k+1), for all i.
For example, E(x_1) = a/p^2, E(x_2) = (a/p)((1/p) + (1/(p-1))), and E(x_p) = (a/p) Σ_{k=1}^{p} (1/k).
If p = 2, E(x_1) = a/4 and E(x_2) = 3a/4.
If p = 3, E(x_1) = a/9, E(x_2) = 5a/18 and E(x_3) = 11a/18.
Not surprisingly, for any p, they add to a.
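These expectations are easy to confirm by simulation. The sketch below (an illustrative check of my own; the parameter choices and random seed are arbitrary) generates ordered uniform spacings for p = 4 and compares the sample means with (a/p) Σ_{k=p-i+1}^{p} 1/k.

```python
import numpy as np

rng = np.random.default_rng(0)
p, a, n_sim = 4, 1.0, 200_000

# theoretical means: E(x_i) = (a/p) * sum_{k=p-i+1}^{p} 1/k
theory = np.array([a / p * sum(1.0 / k for k in range(p - i + 1, p + 1))
                   for i in range(1, p + 1)])

# p spacings come from n = p-1 uniform points on [0, a]
v = np.sort(rng.uniform(0.0, a, size=(n_sim, p - 1)), axis=1)
w = np.diff(np.hstack([np.zeros((n_sim, 1)), v, np.full((n_sim, 1), a)]), axis=1)
x = np.sort(w, axis=1)                       # ordered spacings x_1 <= ... <= x_p

print("simulated:", x.mean(axis=0).round(4))
print("theory:   ", theory.round(4))         # for p=4: 0.0625, 0.1458, 0.2708, 0.5208
```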
4.13
Using the above it follows that
E(Σ_{i=1}^{p} i x_i) = Σ_{i=1}^{p} i (a/p) Σ_{k=p-i+1}^{p} (1/k) = (3p+1)a/4.
Hence
E(G) = (2 E(Σ i x_i) - (p+1)a)/(ap) = (2(3p+1)/4 - (p+1))/p = (p-1)/(2p).
This is exactly half way between the minimum (i.e. 0) and the maximum (i.e. (p-1)/p).
4.14
It follows, from 4.11, that
E(e^{Σ s_i (u_i - u_{i-1})}) = ((p-1)!/a^{p-1}) Σ (e^{a s_i} - 1)/(s_i Π (s_i - s_j)),
where the first summation is over i = 1, …, p, the second over i = 2, …, p, and the product over j = 2, …, p except j = i.
4.15
After noting the results of Stephens (1991): for example, for integer m and any s_i ≠ s_j,
Σ_i s_i^m / Π_{j≠i} (s_i - s_j) ≡ 0   (0 ≤ m ≤ p-2), and
Σ_i s_i^{p-1} / Π_{j≠i} (s_i - s_j) ≡ 1,
where the sums are over the p values s_i and the products are over all values of j with j ≠ i, this leads to
E(e^{Σ s_i x_i}) = (p-1)! Σ_{i=1}^{p} e^{t_i} / Π_{j≠i} (t_i - t_j),
where t_i = (a/i) Σ_{k=p-i+1}^{p} s_k, i.e. the average of the last i s_k's, multiplied by 'a'. The first summation is over i = 1, …, p and the product over j = 1, …, p, again with j ≠ i.
4.16 Examples
Let p = 2; then we have
E(e^{s_1 x_1 + s_2 x_2}) = e^{t_1}/(t_1 - t_2) + e^{t_2}/(t_2 - t_1)
= e^{a s_2}/(a(s_2 - (s_1+s_2)/2)) + e^{a(s_1+s_2)/2}/(a((s_1+s_2)/2 - s_2))
= 2 e^{a s_2/2}(e^{a s_2/2} - e^{a s_1/2})/(a(s_2 - s_1)).
Putting s_1 = 0 leads to the moment generating function of x_2, which is uniform on the interval [a/2, a]. Alternatively, putting s_2 = 0 leads to the moment generating function of x_1, which is also uniform, but on the interval [0, a/2].
In the limit as s_2 → s_1 = s, we have, not unexpectedly,
E(e^{s_1 x_1 + s_2 x_2}) = e^{as};
that is, the point 'a' with probability 1.
Generally, the variance var(x_1) = var(x_2) = a^2/48, and the covariance cov(x_1, x_2) = -a^2/48. Hence the correlation corr(x_1, x_2) = -1. This makes sense, intuitively. It says that if two people have a fixed amount between them, then what one person has the other person does not have.
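For p = 2 these claims can be verified directly by simulation; the short sketch below (illustrative only, with arbitrary seed and sample size) checks the uniform marginals, the common variance a^2/48 and the correlation of -1.

```python
import numpy as np

rng = np.random.default_rng(1)
a, n_sim = 1.0, 200_000

v = rng.uniform(0.0, a, size=n_sim)     # one point splits [0, a] into two spacings
x1 = np.minimum(v, a - v)               # smaller spacing
x2 = np.maximum(v, a - v)               # larger spacing

print(x1.min(), x1.max())               # close to 0 and a/2: x1 ~ U[0, a/2]
print(x2.min(), x2.max())               # close to a/2 and a: x2 ~ U[a/2, a]
print(x1.var(), x2.var(), a**2 / 48)    # both near a^2/48
print(np.corrcoef(x1, x2)[0, 1])        # -1, since x1 + x2 = a exactly
```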
4.17
Let p = 3; then we have
E(e^{s_1 x_1 + s_2 x_2 + s_3 x_3}) = 2e^{t_1}/((t_1 - t_2)(t_1 - t_3)) + 2e^{t_2}/((t_2 - t_1)(t_2 - t_3)) + 2e^{t_3}/((t_3 - t_1)(t_3 - t_2))
= 2e^{a s_3} / (a^2 (s_3 - (s_2+s_3)/2)(s_3 - (s_1+s_2+s_3)/3))
+ 2e^{a(s_2+s_3)/2} / (a^2 ((s_2+s_3)/2 - s_3)((s_2+s_3)/2 - (s_1+s_2+s_3)/3))
+ 2e^{a(s_1+s_2+s_3)/3} / (a^2 ((s_1+s_2+s_3)/3 - s_3)((s_1+s_2+s_3)/3 - (s_2+s_3)/2)).
Then we have, for example, that
corr(x_1, x_2) = 1/√(28), corr(x_1, x_3) = -5/√(52) and corr(x_2, x_3) = -8/√(91).
It is easy to show that the correlation matrix is singular, indicating that the three variables x_1, x_2 and x_3 are not independent, in an analogous way to the example when p = 2.
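Again, these values can be checked by simulation. The sketch below (my own illustrative check, with arbitrary seed and sample size) reproduces the three correlations and confirms the singularity through the determinant of the sample covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
a, n_sim = 1.0, 500_000

v = np.sort(rng.uniform(0.0, a, size=(n_sim, 2)), axis=1)      # two points -> three spacings
w = np.diff(np.hstack([np.zeros((n_sim, 1)), v, np.full((n_sim, 1), a)]), axis=1)
x = np.sort(w, axis=1)                                          # x1 <= x2 <= x3

c = np.corrcoef(x, rowvar=False)
print(c[0, 1], 1 / np.sqrt(28))                    # about  0.189
print(c[0, 2], -5 / np.sqrt(52))                   # about -0.693
print(c[1, 2], -8 / np.sqrt(91))                   # about -0.839
print(np.linalg.det(np.cov(x, rowvar=False)))      # essentially 0: covariance matrix singular
```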
4.2 Individual moment generating functions
By putting s_k = s and s_j = 0 for j ≠ k and, for ease of presentation here and in the rest of section 4, putting a = 1, the individual moment generating functions are
E(e^{s x_k}) = (p!(p-1)!/(s^{p-1}(p-k)!)) Σ_{i=p-k+1}^{p} e^{s/i} i^{p-2} (-1)^{i+k+p+1} / ((p-i)!(i-p+k-1)!),
together with other terms such that any negative powers of s disappear.
4.21 Examples
For example, if p = 2 we have, as mentioned in section 4.16,
E(e^{s x_1}) = 2(e^{s/2} - 1)/s and E(e^{s x_2}) = 2(e^{s} - e^{s/2})/s;
that is, the uniform distributions on [0, ½] and [½, 1], respectively.
4.22
For example, if p = 3,
E(e^{s x_1}) = 6(3e^{s/3} - 3 - s)/s^2,
E(e^{s x_2}) = 12(2e^{s/2} - 3e^{s/3} + 1)/s^2, and
E(e^{s x_3}) = 6(e^{s} - 4e^{s/2} + 3e^{s/3})/s^2.
Note, for example, that in both situations the minimum power of the exponential in the largest spacing is e^{s/p}. This reflects the fact that the minimum value of the largest spacing is, if a = 1, '1/p'.
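As a numerical cross-check (illustrative only; the value of s and the seed are arbitrary), the p = 3 formulas above can be compared with Monte Carlo estimates of E(e^{s x_k}):

```python
import numpy as np

rng = np.random.default_rng(7)
s, n_sim = 1.3, 400_000

v = np.sort(rng.uniform(size=(n_sim, 2)), axis=1)
w = np.diff(np.hstack([np.zeros((n_sim, 1)), v, np.ones((n_sim, 1))]), axis=1)
x = np.sort(w, axis=1)                                  # x1 <= x2 <= x3, p = 3, a = 1

mgf_sim = np.exp(s * x).mean(axis=0)
mgf_formula = np.array([
    6 * (3 * np.exp(s / 3) - 3 - s) / s**2,
    12 * (2 * np.exp(s / 2) - 3 * np.exp(s / 3) + 1) / s**2,
    6 * (np.exp(s) - 4 * np.exp(s / 2) + 3 * np.exp(s / 3)) / s**2,
])
print(mgf_sim.round(4))
print(mgf_formula.round(4))
```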
4.23 An unexpected pair of results
It is easy to show that
Σ_{k=1}^{p} E(e^{s x_k}) = p! Σ_{j=p-1}^{∞} s^{j-p+1}/j!,
and that
Σ_{k=1}^{p} k E(e^{s x_k}) = p·p! Σ_{j=p-1}^{∞} s^{j-p+1}/j! - ((p-1)·p!/2) Σ_{j=p-1}^{∞} (s/2)^{j-p+1}/j!.
These results, which are not intuitively obvious, are useful in determining G_r when we transform x_i → x_{ir} = x_i^r, for varying r > 0.
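Since these two identities are not obvious, a quick numerical confirmation may help; the sketch below (my own, with arbitrary choices of p, s and seed) truncates the infinite sums and compares them with Monte Carlo estimates.

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(3)
p, s, n_sim = 4, 0.7, 400_000

# ordered uniform spacings on [0, 1]
v = np.sort(rng.uniform(size=(n_sim, p - 1)), axis=1)
w = np.diff(np.hstack([np.zeros((n_sim, 1)), v, np.ones((n_sim, 1))]), axis=1)
x = np.sort(w, axis=1)

def tail_series(z, p, terms=60):
    # sum_{j=p-1}^{infinity} z^(j-p+1) / j!   (truncated)
    return sum(z ** (j - p + 1) / factorial(j) for j in range(p - 1, p - 1 + terms))

lhs1 = np.exp(s * x).mean(axis=0).sum()                      # sum_k E(e^{s x_k})
rhs1 = factorial(p) * tail_series(s, p)

k = np.arange(1, p + 1)
lhs2 = (k * np.exp(s * x).mean(axis=0)).sum()                # sum_k k E(e^{s x_k})
rhs2 = p * factorial(p) * tail_series(s, p) \
       - (p - 1) * factorial(p) / 2 * tail_series(s / 2, p)

print(lhs1, rhs1)   # agree to Monte Carlo accuracy
print(lhs2, rhs2)
```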
4.3 Individual probability density functions
For the sake of completeness we include the individual density functions.
Using the fact that
∫_{x=0}^{1/c} e^{sx} (1-cx)^r dx = r! c^r e^{s/c}/s^{r+1} + other terms which do not involve the exponential,
leads to the following:
f_pk(x) = f_pkm(x)   (1/(p-m+2) < x < 1/(p-m+1), 2 ≤ m ≤ k),
f_pk(x) = f_pk1(x)   (0 < x < 1/p),
f_pk(x) = 0          (elsewhere),
where
f_pkm(x) = ((p-1) p!/((k-1)!(p-k)!)) Σ_{j=m}^{k} (-1)^{k-j} (k-1)! (1-(p-j+1)x)^{p-2} / ((j-1)!(k-j)!),   1 ≤ k ≤ p.
These are the individual probability density functions. For any value of p, this gives the probability density function of the kth spacing, using, for example, k = 1 for the smallest value and k = p for the largest value.
It also uses m as a variable, increasing one at a time, allowing each particular segment of the individual functions to be specified.
We use the term segment to indicate the range over which a particular function is valid. Where a function has the value 0 (e.g. f_p1m(x) ≡ 0 for m ≥ 2, and f_pp1(x) ≡ 0) we will not treat this as a segment.
For p ≥ 3, there is no discontinuity in the value of the function at the end of each segment, although there will, for most values of k, be a discontinuity in the (p-2)th derivative.
4.31 Examples
The smallest density function has the form of a single segment,
f_p1(x) = f_p11(x) = p(p-1)(1-px)^{p-2}   (0 < x < 1/p, p ≥ 2).
Hence E(x) = 1/p^2 and the variance var(x) = (p-1)/(p^4(p+1)).
The second smallest density function has the form of two segments,
f_p2(x) = f_p21(x) = p(p-1)^2((1-(p-1)x)^{p-2} - (1-px)^{p-2})   (0 < x < 1/p),
f_p2(x) = f_p22(x) = p(p-1)^2(1-(p-1)x)^{p-2}   (1/p < x < 1/(p-1));   p ≥ 2.
The density function of the largest spacing is given by (p ≥ 2)
f_pp(x) = f_ppm(x) = (p-1)p Σ_{j=m}^{p} (-1)^{p-j} (p-1)! (1-(p-j+1)x)^{p-2} / ((j-1)!(p-j)!)   (1/(p-m+2) < x < 1/(p-m+1), 2 ≤ m ≤ p).
This has (p-1) segments.
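A brief simulation check of the smallest-spacing results (an illustrative sketch of my own; p, the evaluation point and the seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
p, n_sim = 5, 400_000

v = np.sort(rng.uniform(size=(n_sim, p - 1)), axis=1)
w = np.diff(np.hstack([np.zeros((n_sim, 1)), v, np.ones((n_sim, 1))]), axis=1)
x = np.sort(w, axis=1)

# density of the smallest spacing at a point t in (0, 1/p): f_p1(t) = p(p-1)(1-pt)^(p-2)
t, h = 0.05, 0.01
empirical_density = np.mean(np.abs(x[:, 0] - t) < h / 2) / h
print(empirical_density, p * (p - 1) * (1 - p * t) ** (p - 2))

# mean 1/p^2 and variance (p-1)/(p^4 (p+1)) of the smallest spacing
print(x[:, 0].mean(), 1 / p**2)
print(x[:, 0].var(), (p - 1) / (p**4 * (p + 1)))
```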
4.32 Limiting probability density functions
It is reasonably easy to compute, using the mean and standard deviation, the limiting probability density function (i.e. as p → ∞) of the smaller distributions.
There would appear to be no limiting probability density function of the larger values. This appears to be primarily because
lim_{p→∞} Σ_{j=1}^{p} 1/j
does not have a finite limit, but that
lim_{p→∞} (1/p) Σ_{j=1}^{p} 1/j → 0.
5. Distribution of the Gini co-efficient
We now apply all this to the Gini co-efficient of inequality. An
equivalent result was presented by Durbin (1965).
5.1
Using the result from section 4.15 and putting s_k = k s, we have, again putting a = 1, that t_k = s(2p+1-k)/2. Then
E(e^{Σ s_i x_i}) = E(e^{s Σ i x_i}) = (p-1)! Σ_{i=1}^{p} e^{t_i} / Π_{j≠i} (t_i - t_j)
= (p-1)! Σ_{i=1}^{p} e^{s(2p+1-i)/2} / ((s/2)^{p-1} Π_{j≠i} (j-i))
= e^{sp} (1 - e^{-s/2})^{p-1} / (s/2)^{p-1}
= e^{s(p+1)/2} (e^{s/2} - 1)^{p-1} / (s/2)^{p-1}.
This is the moment generating function of the sum of (p-1) uniform distributions over [0, ½] with a positive displacement of (p+1)/2; i.e. it has a mean value of (3p+1)/4 and a variance of (p-1)/48. This was put forward by Durbin (1965).
5.2
Hence, using the equation for the Gini co-efficient, we have that the moment generating function of the Gini co-efficient is given by
Mgf(G) = (e^{s/p} - 1)^{p-1} / (s/p)^{p-1}.
I.e. the Gini co-efficient of inequality is, under the uniform distribution, the sum of (p-1) uniform distributions over the interval [0, 1/p]. The expectation is then (p-1)/(2p), with variance (p-1)/(12p^2).
Hence, for moderately large p, a 95% confidence interval for G is given by the interval
(0.5 - 1.96/(2√(3p)), 0.5 + 1.96/(2√(3p))).
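This distributional result can be illustrated by simulating random allocations directly (a sketch of my own; p, the sample size and the seed are arbitrary), which also shows how the interval above can be used, in the spirit of Durbin (1965), to judge whether an observed G is consistent with random allocation.

```python
import numpy as np

rng = np.random.default_rng(5)
p, n_sim = 50, 100_000

# p ordered spacings from p-1 uniform points on [0, 1]
v = np.sort(rng.uniform(size=(n_sim, p - 1)), axis=1)
w = np.diff(np.hstack([np.zeros((n_sim, 1)), v, np.ones((n_sim, 1))]), axis=1)
x = np.sort(w, axis=1)

i = np.arange(1, p + 1)
G = (2 * (i * x).sum(axis=1) - (p + 1)) / p          # Gini co-efficient, a = 1

print(G.mean(), (p - 1) / (2 * p))                   # expectation (p-1)/(2p)
print(G.var(), (p - 1) / (12 * p**2))                # variance (p-1)/(12 p^2)

lo = 0.5 - 1.96 / (2 * np.sqrt(3 * p))               # approximate 95% interval for G
hi = 0.5 + 1.96 / (2 * np.sqrt(3 * p))
print(np.mean((G > lo) & (G < hi)))                  # roughly 0.95 for moderately large p
```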
6. The shape of the Lorenz curve
In this section we find the general shape of the Lorenz curve using the
expected values as derived in section 4.12.
We have that E(x_k) = (a/p) Σ_{j=p-k+1}^{p} (1/j).
Therefore, the value of the Lorenz curve at the value 'i' on the x-axis is given by
Σ_{k=1}^{i} E(x_k) = Σ_{k=1}^{i} (a/p) Σ_{j=p-k+1}^{p} (1/j) = (a/p) Σ_{j=p-i+1}^{p} (j-p+i)/j.
We will, again, put a = 1, but also let i = zp, where 0 < z < 1.
Then the point on the curve at the point where the x-axis has the value 'zp' is given by
(1/p) Σ_{j=p(1-z)+1}^{p} (j-p+zp)/j
= (1/p) Σ_{j=p(1-z)+1}^{p} (1 - p(1-z)/j)
= (1/p) Σ_{j=p(1-z)+1}^{p} 1 - (1-z) Σ_{j=p(1-z)+1}^{p} (1/j).
For moderately large p, this is
z - (1-z) log_e(p/(p(1-z))) = z + (1-z) log(1-z).
[Figure: Lorenz curve under a) Perfect Equality and b) the Uniform Distribution; both axes run from 0.0 to 1.0.]
Hence the equation of this is given by
L(zp) = z + (1-z) log(1-z)   (0 < z < 1).
By rescaling the x-axis to the proportion of the population (i.e. putting p = 1), we obtain the following Lorenz curve:
L(z) = z + (1-z) log(1-z)   (0 < z < 1).
This is the equation of the Lorenz curve under the assumption that the
original data comes from the Uniform random distribution. The graph
shows the shape and compares this with the straight-line equality
Lorenz curve.
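As an illustrative check of this limiting form (a sketch of my own; the choice p = 200 is arbitrary), the curve built from the exact expectations E(x_k) can be compared with L(z) = z + (1-z) log(1-z):

```python
import numpy as np

p = 200
i = np.arange(1, p + 1)

# exact finite-p Lorenz curve from E(x_k) = (1/p) * sum_{j=p-k+1}^{p} 1/j
Ex = np.array([np.sum(1.0 / np.arange(p - k + 1, p + 1)) / p for k in i])
finite_curve = np.cumsum(Ex)

# limiting curve L(z) = z + (1-z) log(1-z), taking L(1) = 1
z = i / p
limit_curve = np.append(z[:-1] + (1 - z[:-1]) * np.log(1 - z[:-1]), 1.0)

print(np.max(np.abs(finite_curve - limit_curve)))    # small, and shrinking as p grows
```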
6.1 Table of values and associated information
The table below shows values on the Lorenz curve. For each proportion (first column) it shows the cumulative amount that this group will have (i.e. the point on the Lorenz curve) and, in the third column, the average (using 1 unit as the overall average) that this group will have.
Following the blank column, the fifth column gives the proportion of individuals not counted in the first column, the sixth column shows the amount that these individuals will have, and the final column shows the average that these individuals will have.
Table showing values and associated information on the Lorenz curve
produced using the expected value of the ordered uniform spacings.
Proportion   Cumulative   Average          Above    Amount above   Average above
             Amount       Amount           point    point          the point
0.0000       0.0000         -              1.0000   1.0000         1.0000
0.1000       0.0052       0.0518           0.9000   0.9948         1.1054
0.2000       0.0215       0.1074           0.8000   0.9785         1.2231
0.3000       0.0503       0.1678           0.7000   0.9497         1.3567
0.4000       0.0935       0.2338           0.6000   0.9065         1.5108
0.5000       0.1534       0.3069           0.5000   0.8466         1.6931
0.6000       0.2335       0.3891           0.4000   0.7665         1.9163
0.6321       0.2642       0.4180           0.3679   0.7358         2.0000
0.7000       0.3388       0.4840           0.3000   0.6612         2.2040
0.8000       0.4781       0.5976           0.2000   0.5219         2.6094
0.9000       0.6697       0.7442           0.1000   0.3303         3.3026
1.0000       1.0000       1.0000           0.0000   0.0000           -

6.2 Examples
That is, under the uniform distribution, for example, the lowest 10% will have between them 0.52% of resources, an average of 0.0518 units. The remaining 90% of individuals will have 99.48% of resources, an average of 1.1054 units.
At the point where 90% are accounted for, these will have 66.97% of
resources, an average of 0.7442 units. The remaining 10% will have
33.03% of resources, an average of 3.3026 units.
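The table entries follow directly from L(z); a small sketch (my own, for illustration) reproduces a few rows:

```python
import numpy as np

def L(z):
    # Lorenz curve under the uniform distribution: L(z) = z + (1-z) log(1-z), 0 < z < 1
    return z + (1 - z) * np.log(1 - z)

for z in [0.1, 0.5, 1 - np.exp(-1), 0.9]:
    cum = L(z)
    print(f"z={z:.4f}  cumulative={cum:.4f}  average={cum / z:.4f}  "
          f"amount above={1 - cum:.4f}  average above={(1 - cum) / (1 - z):.4f}")
```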
6.3 Characteristics of this Lorenz curve
This section briefly looks at the characteristics of the curve. As noted
above and as expected, L(0)=0, L(1)=1.
6.31 Derivatives and Area under the curve
L′(z)= -log(1-z). This is greater than 0 for z> 0 and tends to infinity as
z→1.
L′′(z) = 1/(1-z). This is ‘1’ for z=0 and again tends to infinity as z→1.
The area under the curve, to the point z, is given by
∫_{y=0}^{z} [y + (1-y) log(1-y)] dy = z(3z-2)/4 - (1-z)^2 log(1-z)/2.
Hence, not unexpectedly, the total area under the curve is, on putting
z=1, 0.25. Hence the Gini co-efficient of inequality is 0.5 as expected.
6.32 The overall average
The point at which the gradient of the curve is '1' is where the horizontal distance between the line of equality and this curve is greatest. At this point the individual holds exactly the overall average, i.e. 1 unit.
Here L′(z1)= -log(1-z1)=1, i.e. z1 = (e-1)/e =63.21%. Up to this point
the cumulative amount is 26.42% of resources, giving an average of
0.4180 units. The remainder (i.e. 36.79%) have 73.58% of the
resources, i.e. an average of 2.00 units. These figures are also given
in the table.
6.33 Relationship of a 'particular value' to the 'average of the higher values'
Expanding on the previous section, for a particular ‘z’, the cumulative
amount is L(z)= z+((1-z)*log(1-z)), the actual value is –log(1-z), the
average amount to this point is 1+((1-z)*log(1-z)/z).
Further for the remaining ‘1-z’, the total amount that is left is
1-L(z) = 1-[z+(1-z)log(1-z)] = ((1-z)*(1-log(1-z))).
Hence the average for these higher values is (1-log(1-z)).
Hence, for a particular ‘z’, if we let ‘v’ = -log(1-z) (0 < z <1) be a
particular value, then the average of the values above this is given by
‘1+v’.
This is trivially obvious for v=0; in the previous section we noted it for
v=1, and now conclude that it is generally true.
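A short numerical check of this relationship (illustrative only; the chosen values of z are arbitrary):

```python
import numpy as np

for z in [0.25, 0.50, 0.90, 0.99]:
    v = -np.log(1 - z)                                           # value held at the point z
    avg_above = (1 - (z + (1 - z) * np.log(1 - z))) / (1 - z)    # (1 - L(z)) / (1 - z)
    print(f"z={z:.2f}  v={v:.4f}  average above={avg_above:.4f}  1+v={1 + v:.4f}")
```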
6.34
To round off our excursion, we note that for any Lorenz curve we have the following:
L_F(z) = μ^{-1} ∫_{0}^{z} F^{-1}(x) dx,   0 < z < 1,
where F^{-1}(x) = sup{y : F(y) < x} and F(x) is the cumulative distribution function. (See Sarabia et al. (1999).)
In our case, L(z) = z + (1-z) log(1-z) (0 < z < 1), L′(z) = -log(1-z) and μ = 1; hence F^{-1}(x) = -log(1-x), 0 < x < 1.
Hence x = F(-log(1-x)), and putting y = -log(1-x) leads to F(y) = 1 - e^{-y}, with F(0) = 0 and F(y) → 1 as y → ∞.
Hence, not unexpectedly given our starting point, f(y) = e^{-y}.
7. Extensions
Here we consider three extensions of this work. Previous work on creating families of Lorenz curves includes Sarabia et al. (1999). We define L_0(z) = z and L_1(z) = z + (1-z) log(1-z).
7.1
We can produce a Lorenz curve of the form
L_α(z) = z + α(1-z) log(1-z), with 0 ≤ α ≤ 1,
where α = 0 gives the line of equality and α = 1 gives the previously determined curve.
It is easy to show that, correspondingly, G_α = α/2. The problem with this family of curves is that it is not possible to produce, without further adjustments, a Lorenz curve with G_α > 1/2.
This curve appears to arise when there is a basic unit allocation together with a Uniformly randomly distributed allocation.
7.2 Extension using the transformation x_{ir} = x_i^r
We next consider the transformation x_i → x_{ir} = x_i^r for varying r, r > 0.
This would have the following effect:
If r = 0, we obtain perfect equality, since x_i^0 ≡ 1 for all values of x_i.
If r = 1, we have the current example, i.e. using the uniform random distribution.
As r → ∞, the curve works towards perfect inequality, since the largest x_i (i.e. x_p) takes precedence and in effect all the other terms tend to 0.
This would then create a family of Lorenz curves covering all values of
the Gini co-efficient ‘G’. The greater the ‘r’ the more inequality.
Using the results of section 4.23, we can show that
G_r = ((2^r - 1)/2^r)((p-1)/p),
which obviously tends to (2^r - 1)/2^r as p → ∞.
Hence by varying 'r' we may be able to provide a benchmark distribution with which to compare various Lorenz curves with the same Gini co-efficient of inequality, covering all values of G.
For a specified G (0 < G < 1) and large p, the required 'r' is found by putting
r_G = -log_e(1-G)/log_e 2.
This suggests that we should be able to find a particular Lorenz curve,
for a particular ‘r’ which is determined by a pre-specified ‘G’.
This curve could then be used as a benchmark for comparing other
curves with the same ‘G’.
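As an illustration (a sketch of my own; here G_r is read as a ratio of expectations, which is how I interpret the use of section 4.23, and p, r, the seed and the target G are arbitrary), the formula for G_r and the choice of r_G can be checked as follows:

```python
import numpy as np

rng = np.random.default_rng(6)
p, r, n_sim = 20, 2.0, 200_000

v = np.sort(rng.uniform(size=(n_sim, p - 1)), axis=1)
w = np.diff(np.hstack([np.zeros((n_sim, 1)), v, np.ones((n_sim, 1))]), axis=1)
xr = np.sort(w, axis=1) ** r                          # transformed ordered spacings x_i^r

i = np.arange(1, p + 1)
num = 2 * (i * xr).sum(axis=1).mean() - (p + 1) * xr.sum(axis=1).mean()
G_r = num / (p * xr.sum(axis=1).mean())               # Gini built from expected values

print(G_r, (2**r - 1) / 2**r * (p - 1) / p)           # should agree closely

G_target = 0.8                                        # choosing r for a target limiting G
r_G = -np.log(1 - G_target) / np.log(2)
print(r_G, (2**r_G - 1) / 2**r_G)                     # recovers G_target as p -> infinity
```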
7.21 Shape of this family of curves
Further work is required to establish if it is possible to produce the general shape of these curves. If we define this family of curves by L_r(z) (0 ≤ r < ∞), then we know the following, for 0 < z < 1:
L_0(z) = z and L_1(z) = z + (1-z) log(1-z), with G_r = (2^r - 1)/2^r;
and as r → ∞, L_r(z) → 0 for 0 < z < 1, L_r(1) = 1, and G_r = 1.
7.3 Third extension
This section looks at a third extension. Since L_0(z) = z and L_1(z) = z + (1-z) log(1-z), we can express a generalised form of these as
L_s(z) = z^{s+1}/(s+1) + Σ_{j=1}^{∞} s z^{s+j+1}/((s+j)(s+j+1)),   0 ≤ s < ∞.
It is easy to show that, for all s ≥ 0, L_s(0) = 0 and L_s(1) = 1. Since all the coefficients are positive it must be a Lorenz curve, and G_s = s/(s+1).
Hence, for any given G, we can put s_G = G/(1-G) and so find a Lorenz curve of the above form, again covering all values of G.
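A quick numerical look at this family (an illustrative sketch of my own; the series is truncated, so the printed values are approximate):

```python
import numpy as np

def L_s(z, s, terms=5000):
    # L_s(z) = z^(s+1)/(s+1) + sum_{j>=1} s z^(s+j+1) / ((s+j)(s+j+1)), truncated
    j = np.arange(1, terms + 1)
    return z**(s + 1) / (s + 1) + np.sum(s * z**(s + j + 1) / ((s + j) * (s + j + 1)))

z = np.linspace(0.0, 1.0, 10_001)
for s in [0.0, 1.0, 3.0]:
    curve = np.array([L_s(t, s) for t in z])
    area = np.sum(curve[1:] + curve[:-1]) * (z[1] - z[0]) / 2.0     # trapezoid rule
    print(f"s={s}: L_s(1)={L_s(1.0, s):.4f}  Gini={1 - 2 * area:.4f}  s/(s+1)={s / (s + 1):.4f}")
```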
8. Summary and conclusion
In summary, the paper started from the Uniform distribution, went via
some reasonably complicated functions, then to the Gini co-efficient
and the Lorenz curve, and ended at the negative exponential
distribution.
The paper went on to propose extensions to provide a family of Lorenz curves. These curves could be used as a benchmark for comparing actual curves with the same Gini co-efficient.
I hope this has been a useful trip into the properties of the Ordered
Uniform Spacings, Moment Generating functions, density functions, the
Gini co-efficient and the Lorenz curve.
We have related this work to the Uniform distribution; further work is required to relate it to other distributions.
CJS
25/04/05
References
Barton, D.E., and David, F.N. (1956), "Tests for randomness of points on a line", Biometrika, 43, 104-112.
Durbin, J. (1961), "Some methods of constructing exact tests", Biometrika, 48, 41-55.
Durbin, J. (1965), in the 'Discussion on Professor Pyke's Paper' "Spacings", J. R. Statist. Soc. B, 27, 437-438.
Lewis, P.A.W. (1965), "Some results on tests for Poisson processes", Biometrika, 52, 67-78.
Pyke, R. (1965), "Spacings", J. R. Statist. Soc. B, 27, 395-436, with discussion.
Sarabia, J.-M., Castillo, E., and Slottje, D.J. (1999), "An ordered family of Lorenz curves", Journal of Econometrics, 91, 43-60.
Stephens, C.J. (1991), "Symmetry in disguise", I.M.A. Bulletin, 27, 187-188.
Stephens, M.A. (1986), "Tests for the Uniform distribution". In Goodness-of-fit techniques (eds. R. D'Agostino and M. A. Stephens). New York: Marcel Dekker.