a new short proof of Naranan`s theorem L.Egghe JASIST revision

Brief Communication
A new short proof of Naranan’s Theorem,
explaining Lotka’s Law and Zipf’s Law
L. Egghe
Universiteit Hasselt (UHasselt), Campus Diepenbeek, Agoralaan, B-3590 Diepenbeek,
Belgium¹
and
Universiteit Antwerpen (UA), Stadscampus, Venusstraat 35, B-2000 Antwerpen, Belgium
[email protected]
___________________________________________________________________________
ABSTRACT
Naranan’s important Theorem [Nature 227, 631-632] states that, if the number of journals
grows exponentially and if the number of articles in each journal grows exponentially (at the
same rate for each journal), then the system satisfies Lotka’s law, and a formula for
Lotka’s exponent is given as a function of the growth rates of the journals and the articles.
This brief communication re-proves this result by showing that the system satisfies Zipf’s law,
which is equivalent to Lotka’s law. The proof is short and algebraic and does not use
infinitesimal arguments.
¹ Permanent address
Key words and phrases: Naranan’s theorem, exponential growth, Lotka’s law, Zipf’s law.
I. Introduction
Naranan’s old but important theorem, published in Nature in 1970 (Naranan (1970)), can be
formulated as follows (here we replace “journal” by “source” and “article” by “item” to obtain
the general framework of information production processes (IPPs)).
Theorem I.1: Let us have an IPP in which
(i) The number of sources $N(t)$ grows exponentially in time $t$, denoted as
$$N(t) = c_1 a_1^t \tag{1}$$
(ii) The number of items $p(t)$ in each source grows exponentially in time $t$ and the
growth rate is the same in each source, denoted as
$$p(t) = c_2 a_2^t \tag{2}$$
($c_1, c_2 > 0$, $a_1, a_2 > 1$). Then this IPP satisfies Lotka’s law:
$$f(j) = \frac{C}{j^{\alpha}} \tag{3}$$
where $C > 0$, $\alpha > 1$, $j \geq 1$ and $f(j)$ is the density of the sources with item density $j$
(density in the sense of probability densities in probability theory (continuous distributions)).
The relation between Lotka’s exponent $\alpha$ and the growth rates $a_1$ and $a_2$ is given by (4)
$$\alpha = 1 + \frac{\ln a_1}{\ln a_2} \tag{4}$$
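As a quick numerical illustration of formula (4), the following Python sketch computes Lotka’s exponent from two growth rates. The function name `lotka_exponent` and the sample rates are assumptions chosen for illustration; they are not taken from Naranan’s data.

```python
import math


def lotka_exponent(a1: float, a2: float) -> float:
    """Lotka exponent alpha = 1 + ln(a1) / ln(a2), as in formula (4).

    a1: growth rate of the number of sources (must exceed 1)
    a2: growth rate of the number of items per source (must exceed 1)
    """
    if a1 <= 1 or a2 <= 1:
        raise ValueError("growth rates must exceed 1")
    return 1.0 + math.log(a1) / math.log(a2)


# Equal growth rates for sources and items give alpha = 2.
alpha_equal = lotka_exponent(1.1, 1.1)

# Sources growing faster than items (1.21 = 1.1 ** 2) raise alpha above 2.
alpha_fast_sources = lotka_exponent(1.21, 1.1)
```

In this sketch, equal growth rates yield the exponent 2, while faster growth of sources relative to items yields a larger exponent.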
The proof can be found in Naranan (1970) but also in Egghe (2005a,b) where more details are
given (but where the argument, essentially, is the same as in Naranan). The proof uses
infinitesimal arguments (i.e. arguments using derivatives and integrals).
The proof presented here is different in nature. First, we do not use infinitesimal arguments.
We also do not prove Lotka’s law but its equivalent Zipf’s law:
$$g(r) = \frac{B}{r^{\beta}} \tag{5}$$
where $B > 0$, $\beta > 0$, $r > 0$ and where $j = g(r)$ is the item density in source density $r$. The
equivalence of Lotka’s law (3) and the one of Zipf (5) is stated exactly as in Theorem I.2
below (see also Egghe (2005a), Exercise II.2.2.6 or Egghe and Rousseau (2006) where a
proof is given in the Appendix).
Theorem I.2: The following assertions are equivalent
(i) Lotka’s law as in (3)
(ii) Zipf’s law as in (5)
Moreover the parameters $\alpha$ and $\beta$ relate as in (6)
$$\beta = \frac{1}{\alpha - 1} \tag{6}$$
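The parameter relation (6) can be made plausible with a short numerical check. The sketch below assumes the continuous setting with sources ranked from the most productive downwards and an unbounded item density, so that the rank of a source with item density $j$ is $r(j) = \int_j^{\infty} f(x)\,dx = C j^{1-\alpha}/(\alpha-1)$. These modelling choices, the sample values and all function names are assumptions for illustration; this is not the proof given in the cited references.

```python
def rank_of(j: float, C: float, alpha: float) -> float:
    """Rank density of a source with item density j under Lotka's law (3).

    Closed form of the integral of C * x**(-alpha) from j to infinity
    (assumes alpha > 1 and an unbounded item density).
    """
    return C * j ** (1.0 - alpha) / (alpha - 1.0)


def zipf(r: float, B: float, beta: float) -> float:
    """Zipf's law (5): item density at rank density r."""
    return B / r ** beta


C, alpha = 5.0, 2.5                 # illustrative Lotka parameters
beta = 1.0 / (alpha - 1.0)          # relation (6)
B = (C / (alpha - 1.0)) ** beta     # Zipf constant implied by inverting r(j)

# Mapping j -> r(j) -> g(r(j)) should return j if (5) and (6) are consistent.
max_error = max(
    abs(zipf(rank_of(j, C, alpha), B, beta) - j)
    for j in (1.0, 2.0, 10.0, 250.0)
)
```

Under these assumptions the round trip reproduces $j$ up to floating-point error, which is what (6) predicts.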
So we will prove Theorem I.1 by using Theorem I.2. Zipf’s law is a power law. Power laws
are characterized by the well-known scale-free property: see Egghe (2005a) or Luce (1959),
Roberts (1979).
Theorem I.3: The following assertions are equivalent for a function $g : \mathbb{R}^+ \to \mathbb{R}^+$
(i) $g$ is continuous and scale-free, i.e. for every positive constant $D$ there is a positive
constant $E$ (only dependent on $D$) such that
$$g(Dr) = E g(r) \tag{7}$$
(ii) $g$ is a power law: there exist $B > 0$, $b \in \mathbb{R}$ such that
$$g(r) = B r^b \tag{8}$$
for all $r > 0$.
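The easy direction of Theorem I.3, (ii) implies (i), can be illustrated numerically: for a power law the ratio $g(Dr)/g(r)$ equals $D^b$ for every $r$. The sketch below checks this and contrasts it with an exponential function, for which the ratio depends on $r$. The parameter values and function names are assumptions for illustration only.

```python
import math


def power_law(r: float, B: float, b: float) -> float:
    """Power law (8): g(r) = B * r**b."""
    return B * r ** b


B, b = 3.0, -0.7      # illustrative constants
D = 4.0               # scaling constant
E = D ** b            # scale factor predicted for a power law

sample_ranks = (0.5, 1.0, 7.0, 120.0)

# For a power law the ratio g(D*r) / g(r) is the same at every r.
power_ratios = [power_law(D * r, B, b) / power_law(r, B, b) for r in sample_ranks]

# For an exponential function the ratio changes with r, so it is not scale-free.
exp_ratios = [math.exp(-D * r) / math.exp(-r) for r in sample_ranks]
```

Here `power_ratios` is constant and equal to `E`, whereas `exp_ratios` varies over the sample ranks.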
In the next section, Naranan’s theorem will be proved by proving that (under the assumptions
of Naranan) the rank-frequency function $g(r)$ (the item density of the source on rank density
$r$) is scale-free, hence, by Theorem I.3, a power law. We will also see that it decreases. This,
with Theorem I.2, yields Lotka’s law and the proof of Naranan’s theorem.
The paper ends with a remark on an extension of Zipf’s law.
II. New, short, algebraic proof of the Theorem of Naranan
Let $t \in \mathbb{R}^+$ be fixed, representing a fixed end time (such as the present). Let $\theta \in (-\infty, 1]$ be a
variable parameter on which we can measure growth (normally we take $\theta \in [0, 1]$, but for
technical reasons (the fact that $r > 0$ as argument of $g$) we take $\theta \in (-\infty, 1]$, as
will become clear in the sequel).
According to Naranan’s assumptions, we have that a new source at $\theta t$ (hence with $r = c_1 a_1^{\theta t}$)
has $c_2 a_2^{t - \theta t}$ items (densities). Hence the defining equation for the rank-frequency function $g$
is
$$g(r) = g\left(c_1 a_1^{\theta t}\right) = c_2 a_2^{t - \theta t} \tag{9}$$
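The defining equation (9) describes $g$ parametrically: each birth time gives a rank density and an item density. The sketch below traces this curve for a few birth times and computes the slope between consecutive points on a log-log scale, which should be constant for a power law. The name `theta` for the birth-time parameter, the numerical values and the helper names are assumptions for illustration.

```python
import math

c1, a1 = 2.0, 1.08    # illustrative source growth: N(t) = c1 * a1**t
c2, a2 = 1.0, 1.05    # illustrative item growth:   p(t) = c2 * a2**t
t = 50.0              # fixed end time


def rank(theta: float) -> float:
    """Rank density r of a source born at time theta * t."""
    return c1 * a1 ** (theta * t)


def items(theta: float) -> float:
    """Item density at end time t of a source born at time theta * t."""
    return c2 * a2 ** (t - theta * t)


# Birth-time parameters from -1.0 to 1.0 in steps of 0.1.
thetas = [-1.0 + 0.1 * k for k in range(21)]
points = [(math.log(rank(th)), math.log(items(th))) for th in thetas]

# Slopes between consecutive points of the curve (log r, log g(r)).
slopes = [
    (y1 - y0) / (x1 - x0)
    for (x0, y0), (x1, y1) in zip(points, points[1:])
]

# A power law g(r) = B / r**beta with beta = ln(a2) / ln(a1) has this slope.
expected_slope = -math.log(a2) / math.log(a1)
```

All entries of `slopes` agree with `expected_slope` up to rounding, consistent with a decreasing power law in $r$.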
This function is continuous and strictly decreasing. We now show that $g$ is scale-free. For
$D > 0$ (a constant) we have
$$Dr = D c_1 a_1^{\theta t} = c_1 a_1^{\log_{a_1} D} a_1^{\theta t} = c_1 a_1^{\log_{a_1} D + \theta t} =: c_1 a_1^{\theta' t}$$
defining $\theta'$ as
$$\theta' = \frac{\log_{a_1} D + \theta t}{t} = \frac{\log_{a_1} D}{t} + \theta \tag{10}$$
So, by (9)
$$g(Dr) = c_2 a_2^{t - \left(\frac{\log_{a_1} D}{t} + \theta\right) t} = c_2 a_2^{t - \theta t - \log_{a_1} D} = \frac{1}{a_2^{\log_{a_1} D}}\, g(r) = E g(r)$$
with
$$E = \frac{1}{a_2^{\log_{a_1} D}} \tag{11}$$
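The scale-free identity can be checked numerically. The sketch below evaluates $g$ directly from its defining equation (9), by first recovering the birth-time parameter from $r$, and compares $g(Dr)$ with $E g(r)$ for several values of $D$ and $r$. The numerical values and function names are assumptions for illustration, and the formula is applied to any $r > 0$ without enforcing an upper bound on the birth-time parameter.

```python
import math

c1, a1 = 2.0, 1.08    # illustrative source growth parameters
c2, a2 = 1.0, 1.05    # illustrative item growth parameters
t = 50.0              # fixed end time


def g(r: float) -> float:
    """Rank-frequency function defined by (9)."""
    theta = math.log(r / c1, a1) / t      # invert r = c1 * a1**(theta * t)
    return c2 * a2 ** (t - theta * t)


def scale_factor(D: float) -> float:
    """Factor E of (11): E = 1 / a2**(log_{a1} D)."""
    return 1.0 / a2 ** math.log(D, a1)


checks = [
    (g(D * r), scale_factor(D) * g(r))
    for D in (0.5, 2.0, 10.0)
    for r in (0.1, 1.0, 30.0)
]

# Largest relative deviation between g(D*r) and E*g(r) over all pairs.
max_rel_error = max(abs(lhs - rhs) / rhs for lhs, rhs in checks)
```

The deviation is at the level of floating-point rounding, and `scale_factor` takes only `D` as input, mirroring the statement that $E$ depends on $D$ alone.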
Since $a_1$ and $a_2$ are fixed, $E$ depends only on $D$ and hence $g$ is scale-free. By
Theorem I.3 and the fact that $g$ decreases strictly, $g(r)$ has the form (5). Hence,
by Theorem I.2, we have Lotka’s law (3). It only remains to prove (4).
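By (5) we have
$$g(Dr) = \frac{g(r)}{D^{\beta}}$$
hence, by (11) we have
$$D^{\beta} = a_2^{\log_{a_1} D}$$
hence (taking $\log_{a_2}$ of both sides, for $D \neq 1$)
$$\beta = \frac{\log_{a_1} D}{\log_{a_2} D} = \frac{\ln a_2}{\ln a_1} \tag{12}$$
(using that $\ln a_1 \cdot \log_{a_1} D = \ln D$ and similarly for $a_2$). This, together with (6), yields (4) and
hence Naranan’s theorem.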
□
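The last step of the proof relies on the ratio $\log_{a_1} D / \log_{a_2} D$ not depending on $D$. A minimal numerical sketch of this, with growth rates and values of $D$ chosen arbitrarily for illustration (all names are assumptions):

```python
import math

a1, a2 = 1.08, 1.05              # illustrative growth rates
D_values = (0.3, 2.0, 17.0)      # arbitrary positive constants, none equal to 1

# Ratio of logarithms from (12), computed separately for each D.
betas_from_D = [math.log(D, a1) / math.log(D, a2) for D in D_values]

# Closed form of (12) and the exponent obtained from (6).
beta = math.log(a2) / math.log(a1)
alpha = 1.0 + 1.0 / beta

# Exponent computed directly from formula (4), for comparison.
alpha_direct = 1.0 + math.log(a1) / math.log(a2)
```

Every entry of `betas_from_D` coincides with `beta`, and `alpha` agrees with `alpha_direct`, up to rounding.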
Remark: Formula (9) is remarkable, summarizing “in one line” that exponential growth in
sources and items yields a Zipfian function $g(r)$, hence also the Lotkaian function $f(j)$.
The importance of Naranan’s theorem cannot be overestimated: Lotka’s law is a natural
consequence of two exponential growth processes (the exponential function being perhaps the
most important function for natural growth). The present paper gives a simple (almost
trivial) proof of this non-trivial phenomenon, in which everything is summarized in formula (9).
We end with an open problem. Many data sets do not show a convexly decreasing rank-frequency
function g but a decreasing S-shaped function g , first convex then concave (see
Mansilla et al. (2007), Martinez-Mekler et al. (2009), Lavalette (1996), Campanario (2009)
and Egghe and Waltman (2010)). A function that can have both shapes (convexly decreasing
or decreasing, first convex, then concave) appears in Mansilla et al. (2007), Martinez-Mekler
et al. (2009) and Campanario (2009) and is given in (13)
$$g(r) = K \frac{(N + 1 - r)^b}{r^a} \tag{13}$$
with $K > 0$, $a > 0$, $b \geq 0$ and with $r \in (0, N+1)$. Function (13) for $a = b$ is called
Lavalette’s function (see Lavalette (1996)).
It has been shown that this function fits practical rank-frequency data very well. In Egghe and
Waltman (2010) we could show that $g$ strictly decreases, that it is convex if and only if $b = 0$
or $b \geq 1$ (the case $b = 0$ corresponding to Zipf’s law (5)) and that $g$ has an S-shape, first
convex then concave, if and only if $0 < b < 1$.
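The shape classification quoted above can be explored numerically. The sketch below evaluates function (13) on a grid of interior points and inspects the sign of its second differences: positive throughout suggests convexity, a single change from positive to negative suggests an S-shape. The grid, the parameter values and the names `g13` and `shape` are assumptions for illustration; this is a finite-difference heuristic, not the proof in Egghe and Waltman (2010).

```python
def g13(r: float, K: float, N: float, a: float, b: float) -> float:
    """Function (13): K * (N + 1 - r)**b / r**a."""
    return K * (N + 1 - r) ** b / r ** a


def shape(K: float, N: float, a: float, b: float, steps: int = 2000) -> str:
    """Classify the curve on a uniform grid using second differences."""
    h = (N + 1) / (steps + 2)
    # Interior grid points only, so r = 0 and r = N + 1 are never evaluated.
    values = [g13(h * k, K, N, a, b) for k in range(1, steps + 2)]
    second = [
        values[i - 1] - 2 * values[i] + values[i + 1]
        for i in range(1, len(values) - 1)
    ]
    convex_here = [d > 0 for d in second]
    if all(convex_here):
        return "convex"
    first_concave = convex_here.index(False)
    # S-shape: convex up to some point, concave everywhere after it.
    if first_concave > 0 and not any(convex_here[first_concave:]):
        return "S-shaped"
    return "other"


shape_zipf = shape(K=1.0, N=100.0, a=1.0, b=0.0)       # b = 0: Zipf case
shape_middle = shape(K=1.0, N=100.0, a=1.0, b=0.5)     # 0 < b < 1
shape_large_b = shape(K=1.0, N=100.0, a=1.0, b=2.0)    # b >= 1
```

For these sample parameters the heuristic reports a convex curve for $b = 0$ and $b = 2$ and an S-shaped curve for $b = 0.5$, in line with the classification stated above.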
How can this function be “explained” in the sense of Naranan? Or how can the proof of
Naranan’s theorem be adapted to yield another general size- or rank-frequency function as
described above?
References
Campanario, J.M. (2009). Distribution changes in impact factors over time. Scientometrics, to
appear.
Egghe, L. (2005a). Power Laws in the Information Production Process: Lotkaian Informetrics.
Oxford, UK: Elsevier.
Egghe, L. (2005b). The power of power laws and an interpretation of Lotkaian informetric
systems as self-similar fractals. Journal of the American Society for Information
Science and Technology, 56(7), 669-675.
Egghe, L. & Rousseau, R. (2006). An informetric model for the Hirsch-index. Scientometrics,
69(1), 121-129.
Egghe, L. & Waltman, L. (2010). Relations between the shape of a size-frequency distribution
and the shape of a rank-frequency distribution. Information Processing and
Management, to appear.
Lavalette, D. (1996). Facteur d’impact: impartialité ou impuissance ? Report INSERM U350,
Institut Curie-Recherche, Bât.112. Centre Universitaire, 91405 Orsay, France.
Luce, R.D. (1959). On the possible psychophysical laws. The Psychological Review, 66(2),
81-95.
Mansilla, R., Köppen, E., Cocho, G. & Miramontes, P. (2007). On the behavior of journal
impact factor rank-order distribution. Journal of Informetrics, 1(2), 155-160.
Martinez-Mekler, G., Alvarez Martinez, R., Beltrán del Rio, M., Mansilla, R., Miramontes, P.
& Cocho, G. (2009). Universality of rank-order distributions in the arts and sciences.
PLoS ONE, 4(3), e4791.
Naranan, S. (1970). Bradford’s law of bibliography of science: an interpretation. Nature, 227,
631-632.
Roberts, F.S. (1979). Measurement Theory with Applications to Decisionmaking, Utility, and
the Social Sciences. Reading (MA), USA: Addison-Wesley.