INFORMETRICS 89/90
L. Egghe and R. Rousseau (Editors)
© Elsevier Science Publishers B.V., 1990

ELEMENTS OF CONCENTRATION THEORY

Leo EGGHE
LUC, Universitaire Campus, B-3610 Diepenbeek, Belgium (*)
UIA, Universiteitsplein 1, B-2610 Wilrijk, Belgium

Ronald ROUSSEAU
Katholieke Industriële Hogeschool West-Vlaanderen,
Zeedijk 101, B-8400 Oostende, Belgium
UIA, Documentatie- en Bibliotheekwetenschap,
Universiteitsplein 1, B-2610 Wilrijk, Belgium
Abstract
We review some concentration measures proposed in the literature and present a set of principles that good concentration measures must fulfill. We moreover look into some of the consequences of these principles. The transfer principle is extended to yield a new family of principles, denoted E(p), but a concentration measure can only satisfy E(p) for at most one p. We briefly discuss the issue of sensitivity to transfers and show that Heine's dispersion measures are related to some well-known concentration measures.
1. INTRODUCTION
Concentration theory, or the measurement of inequality, deals with rankings of distributions. In economics and sociology this theory is used to answer questions like the following. Is the distribution of income in country X now more equal than it was in the past? Are third world countries characterized by greater inequality than western countries? Do taxes lead to greater equality in the distribution of wealth? [1]
In informetrics and linguistics one can ask similar questions. Is the distribution of journals that publish papers on physics more concentrated than the distribution of journals dealing with sociology? (cf. the Bradford distribution). Are citations to mathematics papers more unequally distributed than citations to chemistry papers? Give a number to characterize the difference in word occurrence between the works of Mark Twain and the works of James Joyce. Are borrowings in a university library more unequally distributed over the available books than in a public library?
This list of questions and areas of application for a general theory of concentration is by no means exhaustive. In general we will use the terminology of sources and items. In the examples taken from sociology the sources are the income classes and the items the people belonging to these classes; depending on the point of view taken by the investigator, sources might also be people, in which case an item is a certain amount of money.
(*) Permanent address.
For the Bradford distribution the sources are journals and the items are papers, but in the case of citation analysis the sources are papers and the items citations. In the linguistic example sources are words and items occurrences of words. Finally, for the library example, sources are books and items are borrowings. We recall that, especially in linguistic studies, one often uses the terms types and tokens instead of sources and items. We will denote the number of items in the i-th source by x_i. The total number of sources is denoted N, where N > 1, and the relative number of items in the i-th source by a_i = x_i / Σ_{j=1}^N x_j.
To develop a theory of concentration one has to find a mathematical formulation for the intuitive meaning of "concentration" or "inequality". This means that we will study functions f of N variables: f(x_1, ..., x_N). In the spirit of [2] we will develop a set of principles such a function must satisfy in order to be an acceptable measure of concentration. Let us already mention two obvious requirements.
(a) If there is perfect concentration, i.e. all x_i = 0 except one, then f(x_1, ..., x_N) attains its maximal value, given a fixed value for Σ_{i=1}^N x_i.
(b) If all x_i are equal (and different from zero), so that there is perfect equality, f(x_1, ..., x_N) must be zero.
Other natural, but more intricate principles will be investigated further on.
Equivalent to the problem of finding good concentration measures, there is the problem of finding good dispersion measures. Indeed, a dispersion measure can be viewed as the opposite of a concentration measure: it measures the way in which a certain distribution is not concentrated, i.e. is dispersed. If g(x_1, ..., x_N) denotes a measure of dispersion, principles (a) and (b) become the following:
(a') If we have no dispersion (i.e. perfect concentration) then g(x_1, ..., x_N) = 0.
(b') If we have perfect dispersion (i.e. no concentration) then g(x_1, ..., x_N) attains its maximal value, given a fixed total of Σ_{i=1}^N x_i items.
If the maximal value in condition (a) is taken to be 1, then a first suggestion for a dispersion measure g is to put g = 1 − f. This suggestion and related ones will be studied in a more general context.
In the next section we will review some concentration measures which were proposed in the literature. Section three gives the relation between some of these measures and the discrete Lorenz curve; we will also show that several of these proposed concentration measures are of the form: a measure of deviation (or scatter) about the arithmetic mean μ, divided by μ. In section four we propose a number of principles, i.e. requirements that good concentration measures must fulfill. We also check which measures satisfy these principles. Sections five and six deal with related or equivalent requirements. Section seven considers the important issue of sensitivity to transfers and, finally, section eight considers dispersion measures and their relation with concentration measures.
2. CONCENTRATION MEASURES
2.1. It is intuitively clear that the classical notions of standard deviation (σ) and variance (σ²), where

σ² = (1/N) Σ_{i=1}^N (x_i − μ)² = (1/N) Σ_{i=1}^N x_i² − μ²    (1)

(and μ denotes the mean of the distribution), bear some relation to the notion of concentration. But, as we will show, they do not satisfy essential conditions for good concentration measures. The equalities in (1) are readily seen and well known.

2.2. The coefficient of variation

V = σ/μ    (2)

was introduced to deal with relative values instead of absolute ones (cf. the argumentation in [3], p.164 and following). In a similar way we can consider its square V² = σ²/μ², or Gaston's measure (Ga) ([4], p.148), or Allison's modified squared variation coefficient (A) [3].
2.3. From linguistics we consider the Yule characteristic, defined as

K = 10⁴ ( Σ_i i² x_i − n ) / n² ,

where x_i is the number of words that occur i times and n is the total number of words occurring in the text under study. In [5] Johnson advocates the use of Simpson's index [6] in stylistic studies:

J = Σ_i i(i−1) x_i / ( n(n−1) ) .

Simpson's index is nothing but the number of identical pairs divided by the number of all possible pairs.
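As a small illustration, the following Python sketch computes both vocabulary statistics from a list of word counts. It assumes the formulas as reconstructed above; the function name and the sample data are ours.

```python
from collections import Counter

def yule_k_and_simpson(word_counts):
    """word_counts: number of occurrences of each word (tokens per type).
    Returns Yule's characteristic K and Simpson's index J, computed from
    the frequency-of-frequencies x_i = number of words occurring i times."""
    n = sum(word_counts)                      # total number of word occurrences
    freq_of_freq = Counter(word_counts)       # x_i, indexed by i
    s2 = sum(i * i * x for i, x in freq_of_freq.items())
    k = 1e4 * (s2 - n) / (n * n)              # Yule's K
    j = sum(i * (i - 1) * x for i, x in freq_of_freq.items()) / (n * (n - 1))
    return k, j                               # J = identical pairs / all possible pairs

if __name__ == "__main__":
    counts = [10, 4, 4, 2, 1, 1, 1, 1]        # hypothetical occurrence counts
    print(yule_k_and_simpson(counts))
```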
2.4. The Schutz coefficient (relative mean deviation) [7]

D = Σ_{i=1}^N |x_i − μ| / (2Nμ) .

According to Gastwirth [8], this measure was first proposed by Yntema [9] and Pietra in the 1930's.
2.5. Pratt's measure and the Gini index.
In order to define Pratt's measure we first assume that the x_i's (hence also the a_i's) are ordered decreasingly. Putting

q = Σ_{i=1}^N i a_i ,    (9)

Pratt's measure C is defined as ([10]):

C = 2 ( (N+1)/2 − q ) / (N − 1) .    (10)

Gini's index is then

G = ((N−1)/N) C .    (11)

It should be recalled, however, that the Gini index was introduced in econometrics [11] long before Pratt's measure was defined. Relation (11) was established in 1979 by Carpenter [12]. The usual definition of Gini's index uses the so-called Lorenz curve (see further). Using only absolute frequencies (x_i's), Pratt's measure can be rewritten as:

C = ( (N+1) Σ_{i=1}^N x_i − 2 Σ_{i=1}^N i x_i ) / ( (N−1) Σ_{i=1}^N x_i ) .    (12)

We also propose the following generalized Pratt measure (see section three for an explanation of this terminology):

P(r) = [ (1/(2N(N−1))) Σ_{k=1}^N Σ_{l=1}^N |x_k − x_l|^r ]^{1/r} / μ ,   r ≥ 1.
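A minimal numerical sketch of (9)-(12), assuming the reconstructed formulas above; it checks that the rank-based and the absolute-frequency forms of C agree and that G = ((N−1)/N)·C. The sample data are ours.

```python
def pratt_and_gini(x):
    """x: list of item counts per source. Returns Pratt's C and Gini's G
    following (9)-(12), with the x_i ranked decreasingly."""
    x = sorted(x, reverse=True)
    N, total = len(x), sum(x)
    a = [xi / total for xi in x]                       # relative frequencies
    q = sum((i + 1) * ai for i, ai in enumerate(a))    # (9), ranks start at 1
    C = 2 * ((N + 1) / 2 - q) / (N - 1)                # (10)
    G = (N - 1) / N * C                                # (11)
    # (12): the same C from absolute frequencies only
    C_abs = ((N + 1) * total - 2 * sum((i + 1) * xi for i, xi in enumerate(x))) \
            / ((N - 1) * total)
    assert abs(C - C_abs) < 1e-12
    return C, G

if __name__ == "__main__":
    print(pratt_and_gini([20, 10, 5, 3, 2]))
```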
2.6. Theil's measure [13].
This inequality measure is defined as:

Th = (1/N) Σ_{i=1}^N (x_i/μ) ln(x_i/μ)

(cf. the notion of entropy in information theory). We further remark that in this formula one sets 0·ln(0) = 0.

2.7. The variance of the logarithms.

L = (1/N) Σ_{i=1}^N ( ln x_i − (1/N) Σ_{j=1}^N ln x_j )² ,

which is only defined if all x_i ≠ 0.

2.8. Atkinson's index.
In [1] Atkinson introduced a family of concentration measures defined as:

A(e) = 1 − [ (1/N) Σ_{i=1}^N (x_i/μ)^{1−e} ]^{1/(1−e)} ,

where e > 0 and e ≠ 1. If all x_i ≠ 0, A(1) is defined as lim_{e→1} A(e), which is nothing but

A(1) = ( μ − GM(x_i)_i ) / μ ,    (18)

as is easily seen. Here GM(x_i)_i denotes the geometric mean of the x_i, i = 1, ..., N. Note also that lim_{e→0} A(e) = 0.

Remark. The formula for the Atkinson index as presented in ([2], p.873) seems to be in error.
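The following sketch implements the three measures of 2.6-2.8 as reconstructed here (Th, the variance of logarithms L, and A(e) with the limit case A(1) = (μ − GM)/μ). Function names and the test vector are ours; the value printed for Th on (2, 47, 134) can be compared with Proposition 6.8 below.

```python
import math

def theil(x):
    mu = sum(x) / len(x)
    return sum((xi / mu) * math.log(xi / mu) for xi in x if xi > 0) / len(x)

def var_of_logs(x):
    logs = [math.log(xi) for xi in x]          # requires all x_i != 0
    m = sum(logs) / len(logs)
    return sum((l - m) ** 2 for l in logs) / len(logs)

def atkinson(x, e):
    mu = sum(x) / len(x)
    if e == 1:                                  # A(1) = (mu - geometric mean)/mu
        gm = math.exp(sum(math.log(xi) for xi in x) / len(x))
        return (mu - gm) / mu
    mean_pow = sum((xi / mu) ** (1 - e) for xi in x) / len(x)
    return 1 - mean_pow ** (1 / (1 - e))

if __name__ == "__main__":
    data = [2, 47, 134]
    print(round(theil(data), 3))               # ~0.472, cf. Proposition 6.8
    print(round(var_of_logs(data), 3))
    print(round(atkinson(data, 0.5), 3), round(atkinson(data, 1), 3))
```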
2.9. The CON-index [14].
In the authors' own words, this index is the standard deviation of the percentage shares divided by the maximum possible standard deviation in a system of size N. This yields:

CON = [ Σ_{i=1}^N (a_i − 1/N)² / ( (N−1)/N ) ]^{1/2} .    (19)

Using the relative frequencies a_i = x_i / Σ_j x_j, this can be rewritten as follows:

CON = σ / ( μ √(N−1) ) = V / √(N−1)    (20)

(using (1) and (2)). Formula (20) shows that CON is only a variant of the coefficient of variation.
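A short numerical check, under the reconstructed formulas (19)-(20), that the CON-index coincides with V/√(N−1); the example vector is ours.

```python
import math

def con_index(x):
    N, total = len(x), sum(x)
    shares = [xi / total for xi in x]
    return math.sqrt(sum((a - 1 / N) ** 2 for a in shares) / ((N - 1) / N))

def coeff_variation(x):
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((xi - mu) ** 2 for xi in x) / len(x))
    return sigma / mu

if __name__ == "__main__":
    x = [1, 2, 24, 25]
    N = len(x)
    print(con_index(x), coeff_variation(x) / math.sqrt(N - 1))   # equal, cf. (20)
```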
2.10. Lotka's α.
We finally remark that Rao [15] pointed out that when data follow Lotka's distribution:

f(y) = C / y^α ,    (21)

the exponent α could be used as a measure of concentration. We recall that in (21) f(y) denotes the number of sources with y items.
3. THE EXPRESSION 'MEASURE OF DEVIATION' DIVIDED BY THE MEAN.
THE LORENZ DISTRIBUTION
Many concentration measures are of the form: a measure of deviation about the mean divided by the mean. This is obvious for the coefficient of variation, Schutz' coefficient and Theil's measure. Here we will show that other measures are also of this form.

3.1. Proposition
The Gini index G is equal to

G = [ (1/(2N²)) Σ_{k=1}^N Σ_{l=1}^N |x_k − x_l| ] / μ ,    (22)

showing that G is also of the special form we are investigating here.

Proof:
By (9), (10) and (11),

G = ( (N+1) − 2 Σ_{i=1}^N i a_i ) / N = Σ_{i=1}^N (N+1−2i) a_i / N .    (*)

When the x_i are ordered decreasingly,

Σ_{k=1}^N Σ_{l=1}^N |x_k − x_l| = 2 Σ_{i=1}^N (N+1−2i) x_i = 2Nμ Σ_{i=1}^N (N+1−2i) a_i .

Substituting this in (*) above gives (22).

3.2. Remark
The factor (1/N²) Σ_{k=1}^N Σ_{l=1}^N |x_k − x_l| is known in the literature as the Gini mean difference. It has recently been used [16] as the basis for a measure of association which generalizes the Pearson product-moment correlation coefficient, Kendall's tau and Spearman's rank correlation coefficient.

3.3. Corollary

Pratt's measure C = [ (1/(2N(N−1))) Σ_{k=1}^N Σ_{l=1}^N |x_k − x_l| ] / μ .    (23)

We remark that formulae (22) and (23) are independent of the special ranking we have used to define G and C.
Proposition 3.1, Corollary 3.3 and the fact that

V = [ (1/(2N²)) Σ_{k=1}^N Σ_{l=1}^N (x_k − x_l)² ]^{1/2} / μ

suggest to introduce a generalized Pratt measure (in the same way one could also consider a generalized Gini index) as:

P(r) = [ (1/(2N(N−1))) Σ_{k=1}^N Σ_{l=1}^N |x_k − x_l|^r ]^{1/r} / μ    (24)

(cf. [2], p.870). This gives a whole family of concentration measures depending on the parameter r. For r = 1, P(1) = C; for r = 2, P(2) = (N/(N−1))^{1/2} V.
The next result shows that, although A(e) has a somewhat similar form, it is more a measure of skewness divided by the mean.
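To make Proposition 3.1, Corollary 3.3 and formula (24) concrete, here is a small numerical sketch based on the normalization reconstructed above. It verifies P(1) = C, P(2) = (N/(N−1))^{1/2}·V, and G = (Gini mean difference)/(2μ) on sample data of our own choosing.

```python
import math

def generalized_pratt(x, r):
    """P(r) as in (24): an r-th power mean of pairwise differences, divided by mu."""
    N, mu = len(x), sum(x) / len(x)
    s = sum(abs(xk - xl) ** r for xk in x for xl in x)
    return (s / (2 * N * (N - 1))) ** (1 / r) / mu

def pratt_C(x):
    x = sorted(x, reverse=True)
    N, total = len(x), sum(x)
    q = sum((i + 1) * xi / total for i, xi in enumerate(x))
    return 2 * ((N + 1) / 2 - q) / (N - 1)

def coeff_variation(x):
    mu = sum(x) / len(x)
    return math.sqrt(sum((xi - mu) ** 2 for xi in x) / len(x)) / mu

if __name__ == "__main__":
    x = [20, 10, 5, 3, 2]
    N, mu = len(x), sum(x) / len(x)
    print(generalized_pratt(x, 1), pratt_C(x))                                    # P(1) = C
    print(generalized_pratt(x, 2), math.sqrt(N / (N - 1)) * coeff_variation(x))   # P(2)
    gini_mean_diff = sum(abs(a - b) for a in x for b in x) / (N * N)
    print(gini_mean_diff / (2 * mu), (N - 1) / N * pratt_C(x))                    # (22) vs (11)
```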
3.4. Proposition
A(e) has the form: arithmetic mean minus the generalized (1−e)-mean, divided by the arithmetic mean. Here the generalized (1−e)-mean is given by

M_{1−e} = [ (1/N) Σ_{i=1}^N x_i^{1−e} ]^{1/(1−e)} .

Proof:
A(e) = 1 − [ (1/N) Σ_{i=1}^N (x_i/μ)^{1−e} ]^{1/(1−e)} = ( μ − M_{1−e} ) / μ .
Remark that also A(1) has a similar form.

In the sequel of this section we will investigate another relation, namely that between the discrete Lorenz distribution and some of the proposed concentration measures.
3.5. The discrete Lorenz distribution
If there are N classes (sources) and there is perfect equality, then every class contains 1/N of the total number of items. The cumulative relative frequency distribution of this situation is called the discrete Lorenz distribution of equality and is given by the diagonal points of Fig. 1. When the data points a_i, i = 1, ..., N are ordered decreasingly we can also consider their cumulative relative frequencies (denoted by * in Fig. 1). This distribution is then the discrete Lorenz distribution of the data. The sum of the differences between these two discrete distributions is then:

Σ_{i=1}^N ( Σ_{j=1}^i a_j − i/N ) = ((N+1) − 1) a_1 + ... + ((N+1) − i) a_i + ... + ((N+1) − N) a_N − (N+1)/2
= Σ_{i=1}^N (N+1−i) a_i − (N+1)/2 .    (*)

The greatest difference between the Lorenz distribution of equality and an observed one is obtained when a_1 = 1, a_2 = ... = a_N = 0. Then the difference (*) equals (N−1)/2. Normalizing (*) to obtain a value which is always in the interval [0,1] yields

(2/(N−1)) ( Σ_{i=1}^N (N+1−i) a_i − (N+1)/2 ) = 2 ( (N+1)/2 − q ) / (N−1) = C .

This shows the following proposition.

3.6. Proposition
Pratt's measure C is the normalized sum of the differences between the observed Lorenz distribution and the discrete Lorenz distribution of equality.
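A sketch of Proposition 3.6 under the construction above: it sums the differences between the two discrete Lorenz distributions, normalizes by (N−1)/2, and compares the result with Pratt's C from (10). The data are ours.

```python
def pratt_via_lorenz(x):
    """Normalized sum of differences between the observed discrete Lorenz
    distribution (decreasing order) and the Lorenz distribution of equality."""
    x = sorted(x, reverse=True)
    N, total = len(x), sum(x)
    cum, diff_sum = 0.0, 0.0
    for i, xi in enumerate(x, start=1):
        cum += xi / total                 # observed Lorenz point
        diff_sum += cum - i / N           # difference with the diagonal
    return diff_sum / ((N - 1) / 2)       # maximal possible sum is (N-1)/2

def pratt_C(x):
    x = sorted(x, reverse=True)
    N, total = len(x), sum(x)
    q = sum((i + 1) * xi / total for i, xi in enumerate(x))
    return 2 * ((N + 1) / 2 - q) / (N - 1)

if __name__ == "__main__":
    data = [20, 10, 5, 3, 2]
    print(pratt_via_lorenz(data), pratt_C(data))   # both give the same value
```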
Also Schutz' coefficient can be related to the Lorenz distribution.
3.7. Proposition (cf. [8], Lemma 3)
Schutz' coefficient is equal to the maximal difference between the observed discrete Lorenz distribution and the Lorenz distribution of equality.

Proof:
The maximal difference between the two Lorenz distributions is

max_j ( Σ_{i=1}^j a_i − j/N ) = (1/(Nμ)) max_j ( Σ_{i=1}^j x_i − jμ ) ,

which should be equal to D = Σ_{k=1}^N |x_k − μ| / (2Nμ). So we have to show that

max_j [ 2 ( Σ_{i=1}^j x_i − jμ ) ] = Σ_{k=1}^N |x_k − μ| .

To show this, we first remark that, since Σ_{i=1}^N (x_i − μ) = 0, for every k ∈ {1, 2, ..., N}

2 ( Σ_{i=1}^k x_i − kμ ) = Σ_{i=1}^k (x_i − μ) − Σ_{i=k+1}^N (x_i − μ) .    (*)

Choose now j ∈ {1, ..., N} such that x_i ≥ μ if i ≤ j and x_i < μ if i > j. Such a j always exists if the x_i's are ordered decreasingly (which can always be achieved via a permutation and a relabeling). Then, by (*),

2 ( Σ_{i=1}^j x_i − jμ ) = Σ_{i=1}^j |x_i − μ| + Σ_{i=j+1}^N |x_i − μ|    (definition of j)
= Σ_{i=1}^N |x_i − μ| .

Hence the above max is certainly not smaller than Σ_{k=1}^N |x_k − μ|. But, using (*) again, we have also, for every i ∈ {1, ..., N}:

2 ( Σ_{k=1}^i x_k − iμ ) = Σ_{k=1}^i (x_k − μ) − Σ_{k=i+1}^N (x_k − μ) ≤ Σ_{k=1}^N |x_k − μ| .

Hence we have proved that

max_j [ 2 ( Σ_{i=1}^j x_i − jμ ) ] = Σ_{k=1}^N |x_k − μ| .
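A quick numerical check of Proposition 3.7 under the reconstruction above: the relative mean deviation D equals the largest vertical gap between the observed Lorenz distribution and the diagonal. The sample data are ours.

```python
def schutz(x):
    N, mu = len(x), sum(x) / len(x)
    return sum(abs(xi - mu) for xi in x) / (2 * N * mu)

def max_lorenz_gap(x):
    x = sorted(x, reverse=True)
    N, total = len(x), sum(x)
    cum, best = 0.0, 0.0
    for i, xi in enumerate(x, start=1):
        cum += xi / total
        best = max(best, cum - i / N)
    return best

if __name__ == "__main__":
    data = [20, 10, 5, 3, 2]
    print(schutz(data), max_lorenz_gap(data))   # equal, cf. Proposition 3.7
```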
4. CONCENTRATION PRINCIPLES
In this section f(x_1, ..., x_N) denotes a general concentration measure.

4.1. (C1) If all x_i are equal, say to c ≠ 0, then f(x_1, ..., x_N) attains its minimal value, equal to 0.

This is a perfectly natural condition, as already explained in the introductory section. Remark also that (C1) implies that a concentration measure is never negative. All measures considered in Section 2 obviously satisfy this condition (even Lotka's α is then zero). Indeed, it is in order to satisfy (C1) that one uses Pratt's measure C (formula (10)) instead of q (formula (9)).

4.2. (C2) For every (x_1, ..., x_N) and every permutation π : {1, ..., N} → {1, ..., N} we require that f(x_1, ..., x_N) = f(x_{π(1)}, ..., x_{π(N)}).

This principle expresses that e.g. poverty (or richness) of a nation is not a labelled property: it is only determined by the overall configuration. Also this principle is satisfied by all measures considered in Section 2, as is readily seen.
4.3.
Scale invariance : (C3)
This p r i n c i p l e says t h a t f o r every (x,
,....xN) and c > 0 :
It expresses the requirement t h a t a good concentration measure should not be
influenced by the units. Returning t o the case o f income d i s t r i b u t i o n s t h i s
means t h a t there must not be a d i f f e r e n c e whether we c a l c u l a t e the income i n
d o l l a r s , yen o r rupees.
2
The standard deviation ( a ) and the variance ( o ) do not s a t i s f y t h i s important
requirement, n e i t h e r do Gaston's measure (Gal, A l l i s o n ' s modified squared
c o e f f i c i e n t o f v a r i a t i o n (A) and Simpson's index ( J ) . From now on these measures
w i l l not be considered anymore.
However, concerning A l l i s o n ' s modified squared c o e f f i c i e n t of v a r i a t i o n , we
should p o i n t out t h a t A l l i s o n was f u l l y aware of the fact t h a t i t i s not scale
i n v a r i a n t for the observed data ( i n h i s case, these data were p u b l i c a t i o n
nmbers). He shows however t h a t A i s scale i n v a r i a n t for the underlying l a t e n t
r a t e o f p u b l i c a t i o n ([31, [171).
4.4.
(C4) "When the r i c h e s t source gets r i c h e r , i n e q u a l i t y r i s e s " .
This p r i n c i p l e i s a very natural one : i t has two requirements : the f i r s t i s
the one mentioned above and the second i s i t s dual : when the poorest source
gets poorer then t o o i n e q u a l i t y increases. I n a mathematical formulation t h i s
becomes the following.
(C4a) Ifxi = max {xl,
then, f o r h > 0,
...,xN1 and ifthere e x i s t s a k # i such t h a t xk # 0,
....,xi+h .....xN) > f(x ,,.,.,xN) .
= min (xl ,...,xN1 and 0 < h h x . then
J
f(xl
(C4b) If x .
J
The p r i n c i p l e s (C4a) and (C4b) can be expressed i n a d i f f e r e n t way as shown i n
the next propositions.
4.5. Propositions
A. (C4a) is equivalent with (C4a'):
(C4a') If x_i = max {x_1, ..., x_N} and if there exists a k ≠ i such that x_k ≠ 0, then, for h > 0, f(x_1, ..., x_i − h, ..., x_N) < f(x_1, ..., x_N), as long as x_i − h ≥ x_t for every t ≠ i.
B. (C4b) is equivalent with (C4b'):
(C4b') If x_j = min {x_1, ..., x_N} and 0 < h, then f(x_1, ..., x_j + h, ..., x_N) < f(x_1, ..., x_N), as long as x_j + h ≤ x_t for every t ≠ j.

Proof:
A. The implication (C4a) ⟹ (C4a') is trivial. Suppose now that (C4a') is satisfied; then

f(x_1, ..., x_i, ..., x_N) = f(x_1, ..., (x_i + h) − h, ..., x_N) < f(x_1, ..., x_i + h, ..., x_N) ,

as x_i = (x_i + h) − h ≥ x_t for every t ≠ i. This shows the implication (C4a') ⟹ (C4a).
The proof of part B is similar and is left to the reader.

Lotka's α does not satisfy (C4a) or (C4b), for a change in one source destroys the Lotka distribution.
We will now check whether the remaining concentration measures satisfy this principle. For convenience we only check (C4a); (C4b) can be dealt with in a similar way.
4.6. Proposition. V² (hence also V, K and the CON-index) satisfies (C4a).

Proof:
If x_i = max_j (x_j), then we essentially have to show that

( Σ_j x_j² + 2x_i h + h² ) / ( Σ_j x_j + h )² > ( Σ_j x_j² ) / ( Σ_j x_j )² .

This is equivalent with the requirement that

μ²N² ( Σ_j x_j² + 2x_i h + h² ) > ( μ²N² + 2μhN + h² ) ( Σ_j x_j² )

or, using the fact that Σ_j x_j² < μ²N², that

2μ²N² x_i h + μ²N² h² ≥ 2μNh ( Σ_j x_j² ) + h² Σ_j x_j² .

As μN x_i = ( Σ_j x_j ) x_i ≥ Σ_j x_j², this proves that V satisfies (C4a).
4.7. Proposition. Schutz' coefficient, Pratt's measure and the Gini index satisfy (C4a).

Proof:
This follows immediately from their interpretation using the discrete Lorenz curve (Section 3). Indeed, adding h to x_i yields a cumulative relative distribution which lies strictly above the original one. Hence Schutz' coefficient, Pratt's measure and the Gini index increase.

4.8. Proposition. Theil's measure satisfies (C4a).

Proof:
We consider the function f(h) = Th(x_1, ..., x_i + h, ..., x_N) and show it to be increasing in h. For this we calculate f'(h) and show that f'(h) > 0. As ln(N(x_i + h)) > ln(N x_k) for every k ≠ i, the resulting inequality is obviously satisfied.

4.9. Proposition. The variance of logarithms, L, satisfies (C4a).

Proof:
When x_i is replaced by x_i + h, all terms in this sum stay the same or increase, hence L increases.

4.10.
Proposition. Atkinson's index A(e) satisfies (C4a).

Proof:
Let e ≠ 1; then we consider the function

F(h) = [ (1/N) ( Σ_{j≠i} x_j^{1−e} + (x_i + h)^{1−e} ) ]^{1/(1−e)} / ( μ + h/N )

and show it to be decreasing for h ≥ 0. Taking the derivative and simplifying, this means we have to show that

μN + h < ( Σ_{j≠i} x_j^{1−e} + (x_i + h)^{1−e} ) (x_i + h)^e .

Now, since x_j ≤ x_i + h for every j,

Σ_{j≠i} x_j + (x_i + h) < (x_i + h)^e Σ_{j≠i} x_j^{1−e} + (x_i + h)    (as e > 0),

which is exactly the required inequality. This proves Proposition 4.10 if e ≠ 1.
For the case e = 1, we consider the function

g(h) = ( x_1 ··· (x_i + h) ··· x_N )^{1/N} / ( μ + h/N ) .

Then g'(h) has the same sign as 1/(x_i + h) − 1/(μ + h/N), so we have to show that μ + h/N < x_i + h. As μ ≤ x_i and N > 1, this is obvious.
This finishes the proof of Proposition 4.10.
Finally, we have to show that the generalized Pratt measure satisfies (C4a). As the proof we will present is rather long, we postpone it to the appendix. Here we only state the proposition.

4.11. Proposition. The generalized Pratt measure P(r) satisfies (C4a), for r ≥ 1.

An important consequence of (C4a) is that, if f satisfies (C4a), f depends explicitly on all sources. Indeed, if f were independent of the i-th source, one could add items to it until it becomes the richest source (without changing the value of f). An additional increase in items would still yield the same value for f, but this would contradict principle (C4a).

4.12. (C5) The principle of nominal increase.
This principle requires that an equal, nominal increase in each source strictly decreases the global inequality. More formally this becomes: for every (x_1, ..., x_N), where not all x_i are equal, and h > 0:

f(x_1 + h, ..., x_N + h) < f(x_1, ..., x_N) .

The principle of nominal increase is equivalent with the requirement that the function F(h) = f(x_1 + h, ..., x_N + h) is strictly decreasing on [0, +∞[.
Indeed, if an inequality measure satisfies this requirement then the principle holds. Conversely, if h_1 < h_2 and if f satisfies the principle of nominal increase, then

F(h_2) = f(x_1 + h_1 + (h_2 − h_1), ..., x_N + h_1 + (h_2 − h_1)) < f(x_1 + h_1, ..., x_N + h_1) = F(h_1) .

It is easy to see that V, V², K, CON and D satisfy (C5). Adding a fixed value to each class again destroys a Lotka distribution, so that Lotka's α does not satisfy this principle. Consequently, this measure will not be considered anymore in the following.
The next propositions show that P(r) (and hence also C and G and once more V), Th, L and A(e) satisfy (C5).

4.13. Proposition. The generalized Pratt measure satisfies the principle of nominal increase.

Proof:
Since

P(r)(x_1 + h, ..., x_N + h) = [ (1/(2N(N−1))) Σ_{k,l} |x_k − x_l|^r ]^{1/r} / ( μ + h ) ,

we see that adding h to each x_k does not change the numerator. The denominator, on the other hand, increases from μ to μ + h. This shows that P(r) satisfies (C5).
4.14. Proposition. Theil's measure satisfies the principle of nominal increase.

Proof:
It suffices to show that the function

F(h) = (1/N) Σ_{i=1}^N ( (x_i + h)/(μ + h) ) ln( (x_i + h)/(μ + h) )

is decreasing in h. Hence we will show that F'(h) < 0, for every h > 0.
We suppose now that not all x_i are equal and that they are ordered increasingly. Let j be the largest index in {1, ..., N} such that x_j ≤ μ (so j < N). Since Σ_{i=1}^N (μ − x_i) = 0 we have:

Σ_{i=1}^j (μ − x_i) = − Σ_{i=j+1}^N (μ − x_i) .    (*)

A direct calculation gives

F'(h) = ( 1/(N(μ + h)²) ) Σ_{i=1}^N (μ − x_i) ln( (x_i + h)/(μ + h) ) .

Hence, by the definition of j, we have the following majorization:

Σ_{i=1}^N (μ − x_i) ln( (x_i + h)/(μ + h) ) ≤ ln( (x_j + h)/(μ + h) ) Σ_{i=1}^j (μ − x_i) + ln( (x_{j+1} + h)/(μ + h) ) Σ_{i=j+1}^N (μ − x_i) ,

with equality only if x_1 = ... = x_j < x_{j+1} = ... = x_N. Using (*) this gives:

Σ_{i=1}^N (μ − x_i) ln( (x_i + h)/(μ + h) ) ≤ [ ln( (x_j + h)/(μ + h) ) − ln( (x_{j+1} + h)/(μ + h) ) ] Σ_{i=1}^j (μ − x_i) ≤ 0 .

Since not all x_i are equal, F'(h) < 0 for every h > 0. This finishes the proof of the proposition.
4.15.
Proposition. The variance o f l o g a r i t h s s a t i s f i e s the p r i n c i p l e o f
nominal increase.
Pmof :
We have t o show t h a t :
This i n e q u a l i t y i s obviously s a t i s f i e d as i n i s an increasing concave function.
F i n a l l y also Atkinson's index s a t i s f i e s the p r i n c i p l e o f nominal increase.
4.16. Proposition. A(e), e > 0, satisfies the principle of nominal increase.

Proof:
For e ≠ 1 we show that the function

F(h) = [ (1/N) Σ_{i=1}^N ( (x_i + h)/(μ + h) )^{1−e} ]^{1/(1−e)}

is increasing in h (h ≥ 0); since F = 1 − A(e) for the shifted distribution, this will prove the proposition for e ≠ 1.
Calculating F'(h) yields that F'(h) > 0 if and only if

( Σ_{i=1}^N (x_i + h)^{−e} ) ( Σ_{i=1}^N (x_i + h) ) > N Σ_{i=1}^N (x_i + h)^{1−e} .

This follows from Tchebycheff's inequality ([18], p.43): the power −e reverses the order of the (x_i + h)_i, i.e. the sequences ((x_i + h)^{−e})_i and (x_i + h)_i are oppositely ordered, which is always the case when −e < 0, i.e. e > 0; since not all x_i are equal, the inequality is strict. This proves the proposition for e ≠ 1.
For the case e = 1 we have to show that the function

G(h) = ( Π_{i=1}^N (x_i + h) )^{1/N} / ( μ + h )

increases, or that

( Σ_{i=1}^N (x_i + h) ) ( Σ_{j=1}^N 1/(x_j + h) ) − N² > 0 .

This again is a consequence of Tchebycheff's inequality ([18], p.43).
4.17. (C6) The transfer principle
This principle, originating from Dalton ([19]), states that if we make a strictly positive transfer from a poorer source to a richer one, this must lead to a strictly positive increase in the index of inequality. Formulated in a precise mathematical way this is: if x_i ≤ x_j and 0 < h ≤ x_i, then

f(x_1, ..., x_i, ..., x_j, ..., x_N) < f(x_1, ..., x_i − h, ..., x_j + h, ..., x_N) .    (25)

We remark that such a transfer leaves the arithmetic mean unchanged.
Again it is easy to show that σ, hence σ², V, V², K and CON satisfy the principle of transfers. The fact that the generalized Pratt measure P(r) (and hence also C, G and again V) satisfies the transfer principle will follow from our investigations in Section 6. Here we will show that Schutz' coefficient and the variance of logarithms do not satisfy (C6). Theil's measure and the Atkinson index, on the other hand, do satisfy the transfer principle.
Schutz' coefficient does not satisfy the transfer principle: transfers between x_i and x_j such that x_j + h remains smaller than μ, or such that x_i − h remains larger than μ, obviously leave D invariant. So the relative mean deviation is not a good measure of concentration.
Also, the variance of logarithms does not always satisfy the transfer principle.
4.18. Example
Take N = 4; x_1 = 1, x_2 = 2, x_3 = 24, x_4 = 25 and i = 3, j = 4, h = 1. Then L = 2.0936 and after the transfer L becomes 2.0929, so that in this case L decreases instead of showing an increase.
Allison ([2], p.868) writes that this effect happens when x_i and x_j are both larger than e (≈ 2.718) times the geometric mean. We were unable to verify this assertion.
We also remark that for N = 2, L does satisfy the principle of transfers (see appendix). But, of course, this remark is unimportant in practice.
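The following sketch reproduces Example 4.18: a transfer of one item from the poorer source x_3 = 24 to the richer x_4 = 25 lowers the variance of logarithms from about 2.0936 to about 2.0929, so (C6) fails.

```python
import math

def var_of_logs(x):
    logs = [math.log(xi) for xi in x]
    m = sum(logs) / len(logs)
    return sum((l - m) ** 2 for l in logs) / len(logs)

if __name__ == "__main__":
    before = [1, 2, 24, 25]
    after = [1, 2, 23, 26]                 # transfer h = 1 from source 3 to source 4
    print(round(var_of_logs(before), 4))   # ~2.0936
    print(round(var_of_logs(after), 4))    # ~2.0929: L decreases after the transfer
```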
4.19. Proposition. Theil's measure satisfies the principle of transfers.

Proof:
We have to show that

(x_i − h) ln(x_i − h) + (x_j + h) ln(x_j + h) > x_i ln x_i + x_j ln x_j .

This follows immediately from the fact that the function t ln t is convex.
4.20. Proposition. A(e) satisfies the transfer principle, for every e > 0.

Proof:
For every 0 < e < 1, the function H(t) = t^{1−e} is increasing and concave; hence, for every h > 0 and x_i ≤ x_j,

H(x_i − h) + H(x_j + h) < H(x_i) + H(x_j) .    (*)

(*) and the fact that J(t) = t^{1/(1−e)} is increasing show that such a transfer diminishes the second term of the numerator of A(e), hence A(e) itself increases.
If e > 1, the function H_1(t) = t^{1−e} is decreasing and convex; hence, for h > 0 and x_i ≤ x_j,

H_1(x_i − h) + H_1(x_j + h) > H_1(x_i) + H_1(x_j) .

This shows that the term Σ_{k=1}^N x_k^{1−e} increases, but as J_1(t) = t^{1/(1−e)} is decreasing in this case, ( Σ_{k=1}^N x_k^{1−e} )^{1/(1−e)} decreases as a whole, which shows that A(e) increases.
Finally, as (x_i − h)(x_j + h) < x_i x_j, the geometric mean decreases, so also A(1) satisfies the transfer principle.
In the next sections we will study some consequences and extensions of the transfer principle. Here, however, we already note one important consequence.

4.21. Theorem
If f satisfies the transfer principle (C6), then

f(n, 0, ..., 0) = max { f(x_1, ..., x_N) : Σ_{k=1}^N x_k = n } .

Indeed, one can transfer one unit at a time from a poorer source to a richer one. By the transfer principle, the function f increases during this process. It stops when one source contains all the items. As a consequence, it is at that moment that f attains its maximal value.
4.22. It could also be argued that a good measure of concentration should vary between 0 and 1:

Principle (0): For all (x_1, ..., x_N): 0 ≤ f(x_1, ..., x_N) ≤ 1.

However, this principle is only a mathematical convenience. It should not imply any preference, as simple transformations can produce any desired bounds. If a measure f is positive and does not satisfy the requirement that f ≤ 1, then we can use the transformation f → f/(1 + f). This yields an increasing function of f with values in the interval [0,1]. The transformed function satisfies (C1) to (C6) if f does.

4.23. We conclude that the following measures satisfy all our concentration principles and hence might be considered to be good concentration measures in the case the number of sources stays fixed: the coefficient of variation (V) and its square (V²), the Yule characteristic (K), the CON-index, Pratt's measure (C) and the Gini index (G), the generalized Pratt index P(r) for r ≥ 1, Theil's measure, and Atkinson's index.

4.24. Remark
The principles we have studied in this section can be described in a more abstract mathematical framework. Then they are a consequence of the fact that good concentration measures must be strictly Schur-convex and scale invariant. For the reader interested in this mathematical theory we refer to [20].
5. REQUIREMENTS RELATED TO THE TRANSFER PRINCIPLE
5.1. An equivalent formulation
Instead of taking from a poorer source to give to a richer one in order to increase the inequality, one may also consider the opposite. Does giving to the poor what has been taken from the rich diminish the inequality? And is this in some sense equivalent with the transfer principle as expressed in (25)? The exact answer is given in Theorem 5.2. We thank professor I.K. Ravichandra Rao for suggesting us to investigate this matter.
5.2. Theorem
If f satisfies (C2) then the transfer principle:

0 < x_i ≤ x_j, 0 < h ≤ x_i ⟹ f(x_1, ..., x_i, ..., x_j, ..., x_N) < f(x_1, ..., x_i − h, ..., x_j + h, ..., x_N)    (25)

is equivalent with the following:

x_i ≤ x_j, h > 0 and |(x_i + h) − (x_j − h)| < x_j − x_i ⟹ f(x_1, ..., x_i + h, ..., x_j − h, ..., x_N) < f(x_1, ..., x_i, ..., x_j, ..., x_N) .    (26)

Proof:
A. (25) ⟹ (26)
(i) If 0 ≤ (x_j − h) − (x_i + h) < x_j − x_i, then x_i + h ≤ x_j − h and h > 0. Hence (25) implies that

f(x_1, ..., x_i + h, ..., x_j − h, ..., x_N) < f(x_1, ..., x_i, ..., x_j, ..., x_N) .

(ii) If 0 ≤ (x_i + h) − (x_j − h) < x_j − x_i, then obviously x_i + h ≥ x_j − h. Moreover, (x_i + h) − (x_j − h) < x_j − x_i implies that h < x_j − x_i, or 0 < x_j − h − x_i. Then (25) implies that

f(x_1, ..., x_i + h, ..., x_j − h, ..., x_N) < f(x_1, ..., x_j, ..., x_i, ..., x_N) ,

(with x_j − h − x_i in the role of h). This proves part A. (Remark that we have used the fact that f satisfies (C2).)

B. (26) ⟹ (25)
If 0 < h ≤ x_i and x_i ≤ x_j then

|x_i − x_j| = x_j − x_i < (x_j + h) − (x_i − h) .

By (26), this implies that

f(x_1, ..., x_i, ..., x_j, ..., x_N) = f(x_1, ..., (x_i − h) + h, ..., (x_j + h) − h, ..., x_N) < f(x_1, ..., x_i − h, ..., x_j + h, ..., x_N) .

This proves part B.
5.3. Remark
It is easy to see that (25) and (26) are also equivalent with the restriction of (26) to transfers that do not reverse the ranking of the two sources, i.e. with

(27) x_i ≤ x_j, 0 < h and x_i + h ≤ x_j − h ⟹ f(x_1, ..., x_i + h, ..., x_j − h, ..., x_N) < f(x_1, ..., x_i, ..., x_j, ..., x_N) .

Indeed, (26) ⟹ (27) (trivial) and (27) ⟹ (25) as in part B of the preceding proof. Finally, (25) ⟹ (26) (part A of Theorem 5.2).
The transfer principle is also equivalent with the following.

5.4. Proposition. The transfer principle is equivalent with the requirement that for x_i ≤ x_j the function

Δ(h) = f(x_1, ..., x_i − h, ..., x_j + h, ..., x_N)

is strictly increasing on [0, x_i].

Proof:
If f satisfies the transfer principle and h_1 < h_2, then

f(x_1, ..., x_i − h_2, ..., x_j + h_2, ..., x_N) = f(x_1, ..., x_i − h_1 − (h_2 − h_1), ..., x_j + h_1 + (h_2 − h_1), ..., x_N) > f(x_1, ..., x_i − h_1, ..., x_j + h_1, ..., x_N) .

Conversely, if Δ(h) is strictly increasing then, for every h ∈ ]0, x_i],

f(x_1, ..., x_i − h, ..., x_j + h, ..., x_N) > f(x_1, ..., x_i, ..., x_j, ..., x_N) .
The transfer principle entails the following interesting consequences.

5.5. Proposition
(i) If f satisfies the transfer principle then

f(x_1 + h, x_2 − h/(N−1), ..., x_N − h/(N−1)) > f(x_1, ..., x_N) ,    (28)

where 0 < h < (N−1)·min {x_2, ..., x_N} and x_1 = max {x_1, ..., x_N}.
(ii) If f satisfies the transfer principle then

f(x_1 − h, x_2 + h/(N−1), ..., x_N + h/(N−1)) > f(x_1, ..., x_N) ,    (29)

where x_1 = min {x_1, ..., x_N} and 0 < h ≤ x_1.

Proof:
We will only show (i); (ii) follows in a similar way. By the transfer principle, transferring h/(N−1) from x_2 to x_1, then h/(N−1) from x_3 to the (enlarged) first source, and so on, gives at each step a strict increase of f. Combining these (N−1) inequalities yields the required result.

It is remarkable that, although in practice (28) is almost the same as (C4a), the proof that P(r) satisfies (C4a) is not trivial, while the proof that P(r) satisfies (28) is easy, as can be seen from the following direct proof.
5.6. Proposition. The generalized Pratt measure P(r) satisfies (28).

Proof:
The transformation in (28) leaves the mean (the denominator of P(r)) and the differences among x_2, ..., x_N unchanged. So we have to show that

Σ_{k=2}^N ( (x_1 − x_k) + hN/(N−1) )^r > Σ_{k=2}^N (x_1 − x_k)^r ,

which yields, term by term, ( (x_1 − x_k) + hN/(N−1) )^r > (x_1 − x_k)^r. This inequality is obviously satisfied.

The main difficulty in proving (C4a) for P(r) (or any other measure) lies mainly in the fact that in this principle Σ_{i=1}^N x_i does not remain constant (as opposed to the situation in (28)). This is, however, a very natural situation (in econometrics: the total wealth of a country is not constant in time; in bibliometrics: the total number of articles in a bibliography over a fixed time interval is not constant in time, and so on).
6. THE EXTENDED TRANSFER PRINCIPLE
In this section we introduce a family of principles, related to, and in fact extending, the transfer principle.

6.1. The E(p) principle
If (x_1, ..., x_N) is transformed into (x'_1, ..., x'_N) such that

Σ_{i=1}^N x_i = Σ_{i=1}^N x'_i

and

Σ_{k=1}^N Σ_{l=1}^N |x_k − x_l|^p < Σ_{k=1}^N Σ_{l=1}^N |x'_k − x'_l|^p ,

then f(x_1, ..., x_N) < f(x'_1, ..., x'_N).

V, V², K and CON obviously satisfy E(2); C and G satisfy E(1) (by Proposition 3.1 and Corollary 3.3); P(r) trivially satisfies E(r), for every r ≥ 1. The next proposition shows that E(p) is indeed a generalization of the transfer principle.
6.2. Proposition. If a concentration measure f satisfies E(p) for some p ≥ 1, then it satisfies the transfer principle.

Proof:
By induction on the number of sources N. Suppose that (x_k)_{k=1,...,N} is ordered decreasingly and let (x'_k)_{k=1,...,N} denote the sequence (x_1, ..., x_i + h, ..., x_j − h, ..., x_N), where i < j and 0 < h ≤ x_j. Since this transformation leaves Σ_k x_k unchanged, it suffices, by E(p), to show that

Σ_{k=1}^N Σ_{l=1}^N |x_k − x_l|^p < Σ_{k=1}^N Σ_{l=1}^N |x'_k − x'_l|^p .    (*)

For N = 2 this inequality becomes

(x_1 − x_2)^p < (x_1 − x_2 + 2h)^p ,

which is obviously satisfied.
We suppose now that (*) is satisfied for N and we will show that then (*) is also satisfied when N is replaced by N+1. We suppose the vectors to be ordered decreasingly.
a) If x'_1 = x_1 + h and x'_{N+1} = x_{N+1} − h, then every pairwise difference weakly increases, so (*) is trivially satisfied for N+1.
b) We now suppose that x'_1 ≠ x_1 + h or x'_{N+1} ≠ x_{N+1} − h.
b1) Suppose x'_{N+1} ≠ x_j − h, hence x_{N+1} < x_j − h (and x'_{N+1} = x_{N+1}). Deleting x_{N+1} gives the following N-sequence: (x_1, ..., x_i + h, ..., x_j − h, ..., x_N). Then

Σ_{k,l=1}^{N+1} |x'_k − x'_l|^p = Σ_{k,l=1}^{N} |x'_k − x'_l|^p + 2 Σ_{k=1}^{N} |x'_k − x_{N+1}|^p > Σ_{k,l=1}^{N} |x_k − x_l|^p + 2 Σ_{k=1}^{N} |x'_k − x_{N+1}|^p    (**)

(using the induction hypothesis). Consider the vectors A = (x_i − x_{N+1}, x_j − x_{N+1}) and B = (x_i + h − x_{N+1}, x_j − h − x_{N+1}). Note that all components are non-negative. It is easy to see that [18], p.89 is applicable to A and B, and hence (since φ(·) = |·|^p is convex)

(x_i − x_{N+1})^p + (x_j − x_{N+1})^p ≤ (x_i + h − x_{N+1})^p + (x_j − h − x_{N+1})^p .

Substituting this in (**) gives (*) for N+1.
b2) Suppose x'_1 ≠ x_1 + h. This proof is exactly the same as the one of (b1), now deleting x_1 instead of x_{N+1}.
This finishes the proof of Proposition 6.2.

6.3. Corollary
The generalized Pratt measure P(r), r ≥ 1, satisfies the transfer principle, and hence so do V, C and G.
6.4. Remark
There does not exist a function f(x_1, ..., x_N) that satisfies E(p) for every p ∈ ℕ_0. In fact, we can show [21] that if a function f satisfies E(p) for some fixed p, it cannot satisfy E(q) for any q ≠ p, at least if N > 3, which is always the case in practical applications. (If N = 3 then the above is also true, except for (p,q) = (2,4) or (p,q) = (4,2); indeed, if N = 3 then E(2) = E(4): see [21].)
6.5. Definition
As a kind of limiting property for the principles E(p) we also formulate a principle E(∞) as follows:
If (x_1, ..., x_N) is transformed into (x'_1, ..., x'_N) such that

Σ_{i=1}^N x_i = Σ_{i=1}^N x'_i

and

max_{i,j} |x_i − x_j| < max_{i,j} |x'_i − x'_j| ,

then f(x_1, ..., x_N) < f(x'_1, ..., x'_N).

Now one might conjecture that also E(∞) implies the transfer principle. That this is not so can be seen as follows. We define a measure P(∞) by

P(∞) = (1/μ) max_{i,j} |x_i − x_j| .

It is easily seen that the measure P(∞) is a limiting measure of the measures P(r), since lim_{r→∞} P(r) = P(∞). This argument also explains why the notation E(∞) is used above.
For this measure P(∞) we show the following.

6.6. Proposition. P(∞) satisfies E(∞) but it does not satisfy the transfer principle, if N ≥ 4.

Proof:
P(∞) satisfies E(∞) trivially. To see that P(∞) does not satisfy the transfer principle, an example suffices. Take N = 4:

x_1 = 1, x_2 = 2, x_3 = 3, x_4 = 5
x'_1 = 1, x'_2 = 1, x'_3 = 4, x'_4 = 5

(hence i = 2, j = 3, h = 1). Then P(∞)(1,2,3,5) = P(∞)(1,1,4,5), since both vectors have the same total and the same maximal difference, so this transfer does not strictly increase P(∞).

6.7.
Remark 1
It is easily seen that, if N = 3, E(∞) = E(1) (see also [21]). Hence only in this trivial case does E(∞) imply the transfer principle, since E(1) does.

Remark 2
The fact that E(p) implies the transfer principle while E(∞) does not (if N > 3) does not lead to a contradiction. Indeed, although E(∞) is a limiting case of the E(p)'s, this does not mean that E(∞) inherits all properties of the E(p)'s. The problem lies in the fact that if

max_{i,j} |x_i − x_j| < max_{i,j} |x'_i − x'_j| ,

then there is a p > 0 such that

Σ_{k,l} |x_k − x_l|^p < Σ_{k,l} |x'_k − x'_l|^p ,

but this p depends on the particular difference max_{i,j} |x'_i − x'_j| − max_{i,j} |x_i − x_j|. So the only thing that can be said is that if f satisfies E(p) for every p larger than some fixed p_0, then f satisfies E(∞). However, no function of this kind exists (cf. Remark 6.4).
It is possible that a measure satisfies the transfer principle and, in fact, all other principles C1 - C6, without satisfying any of the E(p)'s, p ≥ 1. This is shown several times by the next results.
6.8. Proposition. Theil's measure does not satisfy any of the E(p) principles, p ≥ 1.

Proof:
Let N = 3, X = (x_1, x_2, x_3) = (2, 47, 134) and X' = (x'_1, x'_2, x'_3) = (8, 34, 141). We will first show that, for every r ≥ 1:

Σ_{k,l} |x_k − x_l|^r < Σ_{k,l} |x'_k − x'_l|^r .    (*)

For this, it suffices to show that, for every r ≥ 1:

45^r + 132^r + 87^r < 26^r + 133^r + 107^r .

We put:

A = (a_1, a_2, a_3) = (132, 87, 45)
B = (b_1, b_2, b_3) = (133α, 107α, 26α) ,

where α = 264/266. The parameter α ensures that

Σ_{i=1}^3 a_i = Σ_{i=1}^3 b_i .

Furthermore: a_1 = b_1 and a_1 + a_2 ≤ b_1 + b_2. As also a_1 ≥ a_2 ≥ a_3, b_1 ≥ b_2 ≥ b_3 and r ≥ 1, we can apply [18], p.89 once more, showing that, for every r ≥ 1:

132^r + 87^r + 45^r ≤ (133α)^r + (107α)^r + (26α)^r .

Hence, since α < 1,

132^r + 87^r + 45^r < 133^r + 107^r + 26^r ,

which shows that (*) is satisfied. We also have Σ_{i=1}^3 x_i = Σ_{i=1}^3 x'_i = 183, but

Th(X) = 0.472 > Th(X') = 0.448 ,

showing that Th does not satisfy E(p), for any p ≥ 1.
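A numerical sketch of the counterexample in Proposition 6.8: both vectors have total 183, the pairwise-difference power sums satisfy (*) for a range of exponents, yet Theil's measure decreases from about 0.472 to about 0.448.

```python
import math

def theil(x):
    mu = sum(x) / len(x)
    return sum((xi / mu) * math.log(xi / mu) for xi in x) / len(x)

def diff_power_sum(x, p):
    return sum(abs(a - b) ** p for a in x for b in x)

if __name__ == "__main__":
    X, Xp = [2, 47, 134], [8, 34, 141]
    print(sum(X), sum(Xp))                                  # both 183
    for p in (1, 2, 3, 5, 10):                              # spot-check (*)
        assert diff_power_sum(X, p) < diff_power_sum(Xp, p)
    print(round(theil(X), 3), round(theil(Xp), 3))          # 0.472 > 0.448
```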
6.9. Proposition. Atkinson's index A(e) does not satisfy E(p), for any p ≥ 1.

Proof:
We will show this only for e = 0.5, e = 1, e = 2 and e = 3. We begin with the same example as in Proposition 6.8. Since

√2 + √47 + √134 = 19.846 < √8 + √34 + √141 = 20.534 ,

we see that A(0.5) does not satisfy E(p), for any p ≥ 1. Also, since

(2)(47)(134) = 12596 < (8)(34)(141) = 38352 ,

we see that A(1) does not satisfy E(p), for any p ≥ 1.
Finally, choose X = (x_1, x_2) and X' = (x'_1, x'_2) such that, for every p ≥ 1:

Σ_{k,l} |x_k − x_l|^p < Σ_{k,l} |x'_k − x'_l|^p    (*)

and such that

x_1 + x_2 = x'_1 + x'_2    (**)

(and no x_i or x'_i is zero). When N = 2, this can be realized by taking X = (x, x) and X' on the line segment joining (2x, 0) and (0, 2x), with X' ≠ X and not on the x-axis or y-axis (see Fig. 2).
Since (*) is also valid for p = 2, we have, by (1) in 2.1 (continuing with N = 2), that

x_1² + x_2² < x_1'² + x_2'² .    (***)

We now show that (**) and (***) imply that

x_1 x_2 > x'_1 x'_2 .    (****)

Indeed: x_1 + x_2 = x'_1 + x'_2, hence

x_1² + 2x_1x_2 + x_2² = x_1'² + 2x'_1x'_2 + x_2'² ,

so (***) implies x_1x_2 > x'_1x'_2. This, in turn, together with (*) and (**), yields that A(3) does not satisfy E(p) for any p ≥ 1. A similar argument, by (**) and (****), shows, together with (*) and (**), that A(2) does not satisfy E(p) for any p ≥ 1.
6.10. Remark
In [21] it is shown that, when N = 3, E(1) = E(∞). Consequently, we have that Th and A(e) do not satisfy E(∞) either.
7. SENSITIVITY TO TRANSFERS
7.1. The transfer principle describes an increase of f under the transformation

(x_1, ..., x_i, ..., x_j, ..., x_N) → (x_1, ..., x_i − h, ..., x_j + h, ..., x_N) ,

where x_i ≤ x_j and 0 < h ≤ x_i, but it does not say anything about the degree of increase as a function of other parameters. One such parameter could be the difference between x_j and x_i, or the place where the transfer occurs (when the N sources are ordered in some natural way). Considerations on this kind of sensitivity are given e.g. by Atkinson ([1]) and Allison ([2], [3]). In this context we first offer the following proposition.
7.2. Proposition. If, for every i and j, G_{i,j}(t) = f(x_1, ..., x_i, ..., x_j, ..., x_N), x_j ≥ x_i, is only a function of t = x_j − x_i, then the following are equivalent:
(a) the functions G_{i,j}(t) are strictly increasing, for every i, j;
(b) f satisfies the transfer principle.

Proof:
If the functions G_{i,j}(t) are strictly increasing then obviously f satisfies the transfer principle, since a transfer of h increases t = x_j − x_i to t + 2h.
Conversely, if 0 ≤ t_1 < t_2, take x_i and x_j such that x_j − x_i = t_1 and x_i ≥ (t_2 − t_1)/2, and put h = (t_2 − t_1)/2; then, by the transfer principle:

G_{i,j}(t_1) = f(x_1, ..., x_i, ..., x_j, ..., x_N) < f(x_1, ..., x_i − h, ..., x_j + h, ..., x_N) = G_{i,j}(t_2) .

This proves the proposition.
7.3. The expression f(x_1, ..., x_i − h, ..., x_j + h, ..., x_N) can also be studied as a function of "the place" where the transfer occurs. As there are actually two places where changes occur, the simplest way to proceed is to consider f as a function of (i+j)/2. This approach makes the most sense when the sources have some intrinsic ordering (i.e. are not just ordered from largest to smallest), as is the case when studying the income distribution of a country, where sources are income classes.
The problem of sensitivity of a concentration measure depends largely on the specific situation to which the measure is applied. For instance, Allison [3] argues, in the case of publications or citations, in favor of a measure that is equally sensitive to differences at all levels of the distribution. The coefficient of variation satisfies this requirement.
We will now review the sensitivity of some of the good concentration measures we have found in section 4. We leave an exhaustive investigation for further research. In what follows, Δf denotes the difference between f(x_1, ..., x_i − h, ..., x_j + h, ..., x_N) and f(x_1, ..., x_i, ..., x_j, ..., x_N), where 0 < h ≤ x_i ≤ x_j.

7.4. Sensitivity of V

ΔV² = (1/(Nμ²)) [ (x_i − h − μ)² + (x_j + h − μ)² − (x_i − μ)² − (x_j − μ)² ] = (2h/(Nμ²)) (x_j − x_i + h) .

This expression depends linearly on the difference between x_j and x_i, and is independent of the place where the transfer occurs.
7.5. Sensitivity of Theil's measure

ΔTh = (1/(Nμ)) [ (x_i − h) ln(x_i − h) + (x_j + h) ln(x_j + h) − x_i ln x_i − x_j ln x_j ] .

To estimate the meaning of this difference we will show that

lim_{h→0} ΔTh / h = (using L'Hôpital's rule) = (1/(Nμ)) ( ln x_j − ln x_i ) .

Hence, for small h, ΔTh depends linearly on the difference between ln x_j and ln x_i.
7.6. Sensitivity of Pratt's measure
Let j < i, hence x_i ≤ x_j (the x_k being ranked decreasingly). Write C_1 for Pratt's measure before, and C_2 for Pratt's measure after, a transfer of h from x_i to x_j, the latter computed from the transformed, re-ordered sequence consisting of x_i − h, x_j + h and the x_k, k ≠ i, k ≠ j. Suppose that x_i − h obtains rank i + s and that the rank of x_j + h is j − r (r, s ≥ 0). In the special case where there is no shift in rank (r = s = 0),

ΔC = C_2 − C_1 = 2h(i − j) / ( (N−1) N μ ) ,

as is readily seen from (12); when ranks do shift, the terms of Σ_k k x_k between positions j − r and i + s have to be adjusted accordingly.
For typically shaped income distributions the Pratt measure (and the Gini index) tends to be most sensitive to transfers around the middle of the distribution. On the other hand, for publication patterns, the Pratt measure is most sensitive among low producers.

The Atkinson index A(e) and the generalized Pratt index P(r) are in fact families of concentration measures. According to the value of the parameter e or r, the sensitivity changes. They have the advantage that the value of the parameter can be adapted to the specific case under consideration.
8. DISPERSION MEASURES
8.1. Roughly speaking, dispersion measures are the opposite of concentration measures. Making adequate changes to the principles for concentration measures (such as using < instead of >), one obtains a set of principles good dispersion measures must fulfill.

8.2. A general strategy to construct dispersion measures from concentration measures is the following. Let f be a concentration measure taking values in the interval [0,1]. Then h = 1 − f is a dispersion measure. So, implicitly, we already know a large set of dispersion measures.
8.3. Independently of the above remarks, Heine ([22]) studied some measures of dispersion. In our notation, the three measures he introduced are given as follows. Let (x_1, ..., x_N) be ordered in increasing order (this assumption is also made in [22]).

A. The adapted Gini index DG.

B. Singleton's index DS.
(It can readily be verified that formula (31) is the same as Heine's more intricate formula (7) in [22].)

C. The normalized entropy index DE.
If Σ_{i=1}^N x_i ≠ N and Σ_{i=1}^N x_i ≠ N + 1, then

DE = ( E − E_min ) / ( E_max − E_min ) ,    (33)

where E = −Σ_{i=1}^N a_i log_2(a_i) is the entropy of the observed relative frequencies, with

E_min = log_2 ( Σ_{i=1}^N x_i ) − ( 1 − (N−1)/Σ_{i=1}^N x_i ) log_2 ( Σ_{i=1}^N x_i − N + 1 )    (34)

and

E_max = −(N−K) [ (Σ_{i=1}^N x_i − K)/(N Σ_{i=1}^N x_i) ] log_2 [ (Σ_{i=1}^N x_i − K)/(N Σ_{i=1}^N x_i) ] − K [ (Σ_{i=1}^N x_i + N − K)/(N Σ_{i=1}^N x_i) ] log_2 [ (Σ_{i=1}^N x_i + N − K)/(N Σ_{i=1}^N x_i) ] ,    (35)

where K = Σ_{i=1}^N x_i (mod N), i.e. K is the rest of Σ_{i=1}^N x_i after division by N.
Although intricate, DE can immediately be determined from the raw data (x_1, ..., x_N). The use of log_2 shows its relation with information theory (but this is not essential).
8.4. In [22], Heine gives properties of the measures DG, DS and DE. We note, however, that DG, DS and DE are not new measures. Instead, we can relate them to well-known concentration measures.

8.5. Proposition. DS = 1 − C.

Proof:
With the data in increasing order,

C = 2 ( (N+1)/2 − q ) / (N−1) , where q is now equal to Σ_{i=1}^N (N − i + 1) a_i ,

since (x_1, ..., x_N) is now ordered in increasing order. So

C = ( N + 1 − 2 Σ_{i=1}^N (N−i) a_i − 2 ) / (N−1) = 1 − (2/(N−1)) Σ_{i=1}^N (N−i) a_i ,

which, by (31), equals 1 − DS.
The next proposition shows that DG and DS are essentially the same.

8.6. Proposition

DG = ( DS · Σ_{i=1}^N x_i − N ) / ( Σ_{i=1}^N x_i − N ) ,

or, equivalently, 1 − DG = ( Σ_{i=1}^N x_i / (Σ_{i=1}^N x_i − N) ) (1 − DS).

Proof:
This follows from the definitions (30) and (31) by a direct computation.

8.7. Corollary. DG ≤ DS, with equality only if all x_i are equal.

Proof:
Since 0 ≤ C ≤ 1, we have also 0 ≤ DS ≤ 1. Hence

DS ( Σ_{i=1}^N x_i − N ) ≥ DS · Σ_{i=1}^N x_i − N ,

with equality only if DS = 1, i.e. C = 0, i.e. when all x_i are equal. Hence

DS ≥ ( DS · Σ_{i=1}^N x_i − N ) / ( Σ_{i=1}^N x_i − N ) = DG .
The following corollaries are easy consequences of Propositions 8.5 and 8.6 and the relation between Pratt's measure and the Gini index.

8.8. Corollary

G = ( (N−1)/N ) (1 − DS) .

8.9. Corollary

G = ( (N−1)/N ) (1 − DG) ( Σ_{i=1}^N x_i − N ) / Σ_{i=1}^N x_i .

8.10. Corollary. C ≤ 1 − DG, with equality only if C = 0.

Proof:

1 − DG = ( Σ_{i=1}^N x_i / (Σ_{i=1}^N x_i − N) ) C > C , if C ≠ 0.

When C = 0, then all x_i are equal and hence DG = 1. In this case C = 1 − DG.

The previous results show that neither DG nor DS is a new measure, hence their properties can be deduced from Sections 3 and 4: no special investigation is needed. Furthermore, the same holds for DE.
8.11. Proposition

Th = ln N − E ,

where E = −Σ_{i=1}^N a_i ln a_i is the entropy of the relative frequencies; hence DE is a normalized dispersion version of Theil's measure.

So we conclude that Heine [22] is only dealing with dispersion versions of C (or G) and Th. As we have shown before that these are good concentration measures, DG, DS and DE are good dispersion measures.
9. CONCLUSION; SUGGESTIONS FOR FURTHER RESEARCH

In this paper we have reviewed some concentration measures proposed in the literature. We have presented a set of principles, (C1) to (C6), that good concentration measures must fulfill, and have investigated some of the consequences of these principles. The transfer principle was extended to yield a new family of principles, denoted E(p), but a concentration measure can only satisfy E(p) for at most one p. We briefly discussed the issue of sensitivity to transfers and have shown how Heine's dispersion measures are related to some well-known concentration measures.

Suggestions for further research:
1. The principles we have investigated are stated with respect to a fixed number of sources. What then is the behavior of a good concentration measure with respect to a varying number of sources? (cf. [14]).
2. In [23], Egghe calculated and interpreted Pratt's measure for some classical bibliometric distributions, including the geometric distribution. A similar investigation, using other concentration measures, might be interesting.
3. What are the implications of using different concentration measures and what is the "most desirable" level of inequality, in particular in connection with science policy decisions (cf. [24])?
4. A more refined study of sensitivity to transfers is certainly needed.
5. In [25] Stephen Cole raises the issue that the Gini index (hence also Pratt's measure) simultaneously measures two concepts. One is consensus in a field and the other is dispersion of recognition. Is this remark also valid with respect to other uses of the Gini index than in the field of the sociology of science? Does it apply to any concentration measure?
6. The measures of concentration we have studied in this paper are only one-dimensional representations of the complex notion "concentration". Do there exist useful, interpretable, higher-dimensional extensions of the measures we know?
REFERENCES
[1] Atkinson, A.B., On the measurement of inequality. Journal of Economic Theory, 2: 244-263, 1970.
[2] Allison, P.D., Measures of inequality. American Sociological Review, 43: 865-880, 1978.
[3] Allison, P.D., Inequality and scientific productivity. Social Studies of Science, 10: 163-179, 1980.
[4] Gaston, J., The reward system in British and American science. New York: Wiley, 1978.
[5] Johnson, R.L., Measures of vocabulary diversity. In: Ager, D.E., Knowles, F.E., Smith, J., editors. Advances in computer-aided literary and linguistic research. Birmingham: AMLC, 1979: 213-227.
[6] Simpson, E.H., Measurement of diversity. Nature, 163: 688, 1949.
[7] Schutz, R.R., On the measurement of income inequality. American Economic Review, 41: 107-122, 1951.
[8] Gastwirth, J., The estimation of the Lorenz curve and Gini index. Review of Economics and Statistics, 54: 306-316, 1972.
[9] Yntema, D., Measures of the inequality in the personal distribution of wealth or income. Journal of the American Statistical Association, 28.
[10] Pratt, A.D., A measure of class concentration in bibliometrics. Journal of the American Society for Information Science, 28: 285-292, 1977.
[11] Gini, C., Il diverso accrescimento delle classi sociali e la concentrazione della ricchezza. Giornale degli Economisti, serie II, 37, 1909.
[12] Carpenter, M.P., Similarity of Pratt's measure of class concentration to the Gini index. Journal of the American Society for Information Science, 30: 108-110, 1979.
[13] Theil, H., Economics and information theory. Chicago: Rand McNally, 1967.
[14] Ray, J.L., Singer, J.D., Measuring the concentration of power in the international system. Sociological Methods & Research, 1: 403-437, 1973.
[15] Rao, I.K. Ravichandra, Probability distributions and inequality measures for analyses of circulation data. In: Egghe, L., Rousseau, R., editors. Informetrics 87/88, Amsterdam: Elsevier, 1988: 231-248.
[16] Schechtman, E., Yitzhaki, S., A measure of association based on Gini's mean difference. Communications in Statistics - Theory and Methods, 16: 207-231, 1987.
[17] Allison, P.D., The reliability of variables measured as the number of events in an interval of time. In: Schuessler, K.F., editor. Sociological methodology 1978, San Francisco: Jossey-Bass, 1977: 238-253.
[18] Hardy, G., Littlewood, J.E., Pólya, G., Inequalities. Cambridge: Cambridge University Press, 1988.
[19] Dalton, H., The measurement of the inequality of incomes. Economic Journal, 30: 248-361, 1920.
[20] Marshall, A., Olkin, I., Inequalities: theory of majorization and its applications. New York: Academic Press, 1979.
[21] Egghe, L., A norm problem, its solution and its application in econometrics and bibliometrics. Preprint, 1988.
[22] Heine, M.H., Indices of literature dispersion based on qualitative attributes. Journal of Documentation, 34: 175-188, 1978.
[23] Egghe, L., Pratt's measure for some bibliometric distributions and its relation with the 80/20 rule. Journal of the American Society for Information Science, 38: 288-297, 1987.
[24] Chapman, I.D., Farina, C., Concentration of resources: the National Research Council's (Canada) grants in aid of research: 1964-1974. Scientometrics, 4: 105-117, 1982.
[25] Cole, S., The hierarchy of the sciences? American Journal of Sociology, 89: 111-139, 1983.
APPENDIX
1. If N = 2, L satisfies the transfer principle.

Proof:

L(x_1, x_2) = (1/2) [ ( ln x_1 − (1/2)(ln x_1 + ln x_2) )² + ( ln x_2 − (1/2)(ln x_1 + ln x_2) )² ] = (1/4) ( ln(x_1/x_2) )² .

Similarly (taking x_1 ≤ x_2),

L(x_1 − h, x_2 + h) = (1/4) ( ln( (x_1 − h)/(x_2 + h) ) )² .

Since 0 < (x_1 − h)/(x_2 + h) < x_1/x_2 ≤ 1, also ln(x_1/x_2) > ln( (x_1 − h)/(x_2 + h) ), and hence (as ln(x_1/x_2) ≤ 0!)

( ln( (x_1 − h)/(x_2 + h) ) )² > ( ln(x_1/x_2) )² ,

so L increases under the transfer.
2. The generalized Pratt measure P(r) satisfies (C4a), for r ≥ 1.

The following proof is due to Q. Burrell, who provided us with a simpler proof than the one we had originally. We thank him for his help.

Proof:
We note first that if a function z(t) is a quotient, say z(t) = u(t)/v(t), then the sign of z'(t) is the same as that of v(t)u'(t) − v'(t)u(t).
According to our principle (C2) we may order the sources in non-decreasing order of productivity:

0 ≤ x_1 ≤ x_2 ≤ ... ≤ x_N .

Then (C4a) becomes: if x_{N−1} ≠ 0, then, for h > 0,

f(x_1, ..., x_{N−1}, x_N + h) > f(x_1, ..., x_N) .

Now

P(r)(x_1, ..., x_{N−1}, x_N + h) = F(y) , where y = x_N + h .

P(r) satisfies (C4a) if F(y) is an increasing function of y, or, equivalently, if

G(y) = ( Σ_{k,l=1}^{N−1} |x_k − x_l|^r + 2 Σ_{l=1}^{N−1} (y − x_l)^r ) / ( Σ_{l=1}^{N−1} x_l + y )^r

is increasing in y. Then, by the remark made in the beginning of this proof,

sgn G'(y) = sgn { ( Σ_{l=1}^{N−1} x_l + y ) · 2 Σ_{l=1}^{N−1} (y − x_l)^{r−1} − Σ_{k,l=1}^{N−1} |x_k − x_l|^r − 2 Σ_{l=1}^{N−1} (y − x_l)^r } .

The expression within the braces can be rewritten as

2 Σ_{l=1}^{N−1} (y − x_l)^{r−1} ( Σ_{k=1}^{N−1} x_k + x_l ) − 2 Σ_{1 ≤ j < k ≤ N−1} (x_k − x_j)^r .

That this is positive is most easily seen by writing the two expressions in arrays. Because the x_i's are ordered non-decreasingly, all of the above terms are non-negative and every term (x_k − x_j)^r of the second sum is dominated by a corresponding term of the first sum (blanks in the lower array being taken as zeroes); e.g., if k > j then

(x_k − x_j)(y − x_j)^{r−1} ≥ (x_k − x_j)^r .

This shows that G(y), hence F(y), is increasing in y, so that P(r) satisfies (C4a).