
ARE YOUR DATA REALLY POWER LAW DISTRIBUTED?
ECONOPHYSICS AND NETWORKS ACROSS SCALES - LORENTZ CENTER 2013
PASQUALE CIRILLO
Many types of Paretianity (or power law behavior)
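Pareto I
$F(x) = 1 - \left(\frac{x}{x_0}\right)^{-\alpha}, \quad 0 < x_0 \le x$
Pareto II
$F(x) = 1 - \left[1 + \frac{x}{b}\right]^{-\alpha}, \quad x > 0$
GPD
$F(x) = \begin{cases} 1 - \left(1 + \frac{\xi (x - \nu)}{\sigma}\right)^{-1/\xi}, & \xi \neq 0 \\ 1 - \exp\left(-\frac{x - \nu}{\sigma}\right), & \xi = 0 \end{cases}$
...
Power law
$f(x) \propto L(x)\, x^{-\alpha^{*}}$, with $L(x)$ a slowly varying function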
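To make the three parametric families above concrete, here is a minimal sketch of their distribution functions together with the quantile functions obtained by solving F(x) = p. All function names are hypothetical, and the symbol sigma for the GPD scale is an assumption of this example rather than notation recovered from the slide.

```python
import math
import random


def pareto1_cdf(x, x0, alpha):
    """Pareto I: F(x) = 1 - (x / x0)^(-alpha) for x >= x0 > 0."""
    return 1.0 - (x / x0) ** (-alpha) if x >= x0 else 0.0


def pareto2_cdf(x, b, alpha):
    """Pareto II: F(x) = 1 - (1 + x / b)^(-alpha) for x > 0."""
    return 1.0 - (1.0 + x / b) ** (-alpha) if x > 0 else 0.0


def gpd_cdf(x, nu, sigma, xi):
    """GPD with location nu, scale sigma (assumed symbol) and shape xi."""
    z = (x - nu) / sigma
    if z <= 0:
        return 0.0
    if xi == 0:
        return 1.0 - math.exp(-z)
    base = 1.0 + xi * z
    if base <= 0:  # only possible for xi < 0: x lies beyond the upper endpoint
        return 1.0
    return 1.0 - base ** (-1.0 / xi)


# Quantile functions: solve F(x) = p for x, with 0 <= p < 1.
def pareto1_ppf(p, x0, alpha):
    return x0 * (1.0 - p) ** (-1.0 / alpha)


def pareto2_ppf(p, b, alpha):
    return b * ((1.0 - p) ** (-1.0 / alpha) - 1.0)


def gpd_ppf(p, nu, sigma, xi):
    if xi == 0:
        return nu - sigma * math.log(1.0 - p)
    return nu + sigma * ((1.0 - p) ** (-xi) - 1.0) / xi


def sample(ppf, n, *params, seed=0):
    """Inverse-transform sampling: push uniform draws through a quantile function."""
    rng = random.Random(seed)
    return [ppf(rng.random(), *params) for _ in range(n)]


if __name__ == "__main__":
    draws = sample(pareto1_ppf, 5, 1.0, 2.0)
    print([round(v, 3) for v in draws])
```

For instance, `sample(gpd_ppf, 1000, 0.0, 1.0, 0.5)` would draw from a GPD with shape 0.5 under these conventions.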
Why do we care?
✤ Size distributions of companies: Gibrat's law, Pareto law.
✤ Individual income and wealth distributions.
✤ Tax evasion and under-reporting.
✤ Scale-free networks.
✤ Etc.
Zipf plot
✤ A very common tool to look for Paretianity in the data is the so-called Zipf plot (log-log plot of the empirical survival function $\bar{F}(x) = 1 - F(x)$).
✤ There exist different versions, e.g. with and without binning.
✤ Most of the time it is the only "test" used to check for Paretianity, before estimating the power law parameters.
Pareto I
$\bar{F}(x) = 1 - F(x) = \left(\frac{x}{x_0}\right)^{-\alpha}$
$\log(\bar{F}(x)) = \alpha \log(x_0) - \alpha \log(x)$
$\log(\bar{F}(x)) = C - \alpha \log(x)$
This essentially holds for the other specifications as well.
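The linear relation above can be checked numerically. The sketch below builds the unbinned Zipf plot coordinates for a simulated Pareto I sample and fits a least-squares line to them. The helper names (`zipf_points`, `ols_line`, `simulate_pareto1`) are hypothetical, and the plain OLS fit is only an illustration of the plot's geometry, not a recommended estimator of alpha.

```python
import math
import random


def zipf_points(sample):
    """Return (log x, log empirical survival) pairs, one per observation."""
    xs = sorted(x for x in sample if x > 0)
    n = len(xs)
    # At the i-th smallest value (i = 0, 1, ...) a fraction (n - i) / n of the
    # sample is at least that large, so the survival estimate is never zero.
    return [(math.log(x), math.log((n - i) / n)) for i, x in enumerate(xs)]


def ols_line(points):
    """Least-squares intercept and slope of y on x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate_pareto1(n, x0, alpha, seed=0):
    """Inverse-transform sampling from a Pareto I distribution."""
    rng = random.Random(seed)
    return [x0 * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


if __name__ == "__main__":
    data = simulate_pareto1(20000, x0=1.0, alpha=2.0)
    intercept, slope = ols_line(zipf_points(data))
    print(f"fitted slope: {slope:.3f} (the Pareto I relation predicts -alpha = -2)")
```

Plotting the pairs returned by `zipf_points` with any plotting tool gives the Zipf plot itself; the fitted slope should sit close to -alpha for genuinely Paretian data.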
Is this Power law?
What do you reckon?
NO: lognorm(1,5)
What about this?
Again...
NO: lognorm(0,2)
And this?
Last one.
YES for x > 1000, otherwise lognorm(5,1)
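The quiz above can be replayed numerically. The sketch below draws a lognormal sample, keeps the upper 20% of the Zipf plot and measures how well a straight line fits it. It assumes that the second argument of "lognorm(0,2)" is the standard deviation on the log scale, which the slides do not state; the function names and the 20% cut-off are also choices made for this example.

```python
import math
import random


def zipf_points(sample):
    """(log x, log empirical survival) pairs for the positive observations."""
    xs = sorted(x for x in sample if x > 0)
    n = len(xs)
    return [(math.log(x), math.log((n - i) / n)) for i, x in enumerate(xs)]


def tail_fit(sample, tail_fraction=0.2):
    """Slope and R^2 of a least-squares line through the upper tail of the Zipf plot."""
    points = zipf_points(sample)
    keep = max(2, int(round(len(points) * tail_fraction)))
    tail = points[-keep:]  # the largest observations
    m = len(tail)
    mean_x = sum(x for x, _ in tail) / m
    mean_y = sum(y for _, y in tail) / m
    sxx = sum((x - mean_x) ** 2 for x, _ in tail)
    syy = sum((y - mean_y) ** 2 for _, y in tail)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in tail)
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumption: mu = 0 and sigma = 2 on the log scale.
    data = [rng.lognormvariate(0.0, 2.0) for _ in range(5000)]
    slope, r2 = tail_fit(data)
    print(f"apparent tail exponent: {-slope:.2f}, R^2 of the linear fit: {r2:.3f}")
```

A high R^2 here comes from data that are lognormal by construction, which is exactly why a straight-looking Zipf plot cannot be taken as evidence of a power law.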
Hence
✤ The Zipf plot alone is not reliable.
✤ Yet, it is the most used (and abused) plot to look for Paretianity in the data.
✤ It represents a necessary condition (negative linear dependence), but it is not at all sufficient.
✤ It is very difficult to interpret for mixtures and generalized distributions.
✤ It can be used heuristically to find a candidate for the Paretianity threshold.
[Figure: Zipf plot (log-log plot of the survival function), log(1-F(x)) against log(x). Some guidelines for interpretation (pure cases): reference curves for Pareto/Power Law, Exponential, Weibull, Lognormal, Gamma and Normal.]
An alternative from EVT: the Meplot
✤ The mean excess function plot is based on the behavior of the mean excess function (ME).
✤ Let X be a random variable with distribution F. The mean excess function is
$e(u) = E[X - u \mid X > u] = \frac{\int_u^{\infty} (t - u)\, dF(t)}{\int_u^{\infty} dF(t)}, \quad 0 < u < x_F$
where $x_F$ is the right endpoint of F.
✤ Interestingly, the ME is a way of characterizing distributions within the class of continuous distributions.
✤ For example, the Pareto class is the only class satisfying the so-called van der Wijk's law.
Pareto I
$e_{PAI}(u) = \frac{u}{\alpha - 1}, \quad \alpha > 1$
GPD (scale $\sigma$, shape $\xi$, location 0)
$e_{GPD}(u) = \frac{\sigma + \xi u}{1 - \xi}, \quad \sigma + \xi u > 0$
etc.
✤ Empirically, the mean excess function is
$e_n(u) = \frac{\sum_{i=1}^{n} (X_i - u)\, 1_{\{X_i > u\}}}{\sum_{i=1}^{n} 1_{\{X_i > u\}}}$
✤ If we plot $e_n(u)$ against the ordered u, we construct the mean excess plot, or meplot.
✤ For Paretian/power law data we look for some increasing linear trend.
✤ As for the Zipf plot, we can use the meplot to heuristically identify the Paretian threshold.
[Figure: Mean Excess Function Plot (MEPLOT), e(u) against threshold u. Some guidelines for interpretation (pure cases): reference curves for Power Law, Lognormal, Weibull, Gamma, Exponential and Normal.]
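The empirical mean excess function is simple to compute. The sketch below averages the excesses over a threshold and builds the meplot coordinates with the thresholds running over the ordered sample. The names `mean_excess` and `meplot_points`, and the choice to skip the few largest order statistics (where the average rests on a handful of points), are assumptions of this example.

```python
from bisect import bisect_right
import random


def mean_excess(sample, u):
    """Empirical mean excess e_n(u): average of X_i - u over the X_i > u."""
    excesses = [x - u for x in sample if x > u]
    if not excesses:
        raise ValueError("no observation exceeds the threshold")
    return sum(excesses) / len(excesses)


def meplot_points(sample, skip_largest=5):
    """(u, e_n(u)) pairs with u running over the ordered sample values."""
    xs = sorted(sample)
    n = len(xs)
    # suffix[k] = xs[k] + ... + xs[n - 1], so each threshold costs O(log n).
    suffix = [0.0] * (n + 1)
    for k in range(n - 1, -1, -1):
        suffix[k] = suffix[k + 1] + xs[k]
    points = []
    for u in sorted(set(xs[: max(0, n - skip_largest)])):
        k = bisect_right(xs, u)  # index of the first observation strictly above u
        m = n - k                # number of exceedances
        if m > 0:
            points.append((u, suffix[k] / m - u))
    return points


if __name__ == "__main__":
    rng = random.Random(0)
    alpha = 4.0
    data = [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(50000)]  # Pareto I, x0 = 1
    for u in (1.5, 2.0, 3.0):
        print(f"u = {u}: e_n(u) = {mean_excess(data, u):.3f}, u / (alpha - 1) = {u / (alpha - 1):.3f}")
```

For Pareto I data the printed values should track the line u / (alpha - 1), which is the increasing linear trend the meplot is meant to reveal.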
So... is this power law?
The meplot should now help you.
[Figure: Zipf plot (1 − F(x) against x, both on log scale) next to the mean excess plot (mean excess e(u) against threshold u). Annotation on the mean excess plot: "You may want to ignore these points".]
Sorry, the answer is no.
Lognorm(0,1)
Little digression
Be careful: plots are easily manipulable.
Simply acting on axes...
[Figure: Zipf plot (1 − F(x) against x, both on log scale) and mean excess plot (mean excess e(u) against threshold u).]
Why do they fail?
✤ As for the Zipf plot, the problem is in our eyes. We are inclined to look for Paretianity, and we are very happy to see it everywhere, even if the plot is perfectly consistent with a lognormal distribution.
✤ The misunderstanding generated by the meplot is more subtle:
★ The lognormal distribution shows an increasing mean excess function, like the Pareto one.
★ The main difference is that the Paretian e(u) grows linearly, while the lognormal ME is concave.
★ Empirical investigations and simulations show that, on average, we need more than 10000 observations in order to clearly distinguish between a Paretian and a lognormal mean excess function.
✤ In both plots, the range of variation of our data is 0-30. Such a small range is not really compatible with a distribution belonging to the Paretian family, which typically accounts for a larger volatility.
[Figure: Mean Excess Function Plot (MEPLOT), e(u) against threshold u, with reference curves for Power Law, Lognormal, Weibull, Gamma, Exponential and Normal.]
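The linear versus concave contrast can be seen without simulation. The sketch below evaluates the theoretical mean excess of a lognormal using the standard closed form for its partial expectation, and compares it with the Pareto I line u / (alpha - 1). The parameter values (mu = 0, sigma = 1, alpha = 3) and the function names are assumptions chosen for illustration.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def lognormal_mean_excess(u, mu=0.0, sigma=1.0):
    """e(u) = E[X 1{X > u}] / P(X > u) - u for a lognormal X, with u > 0."""
    partial_mean = math.exp(mu + 0.5 * sigma ** 2) * norm_cdf((mu + sigma ** 2 - math.log(u)) / sigma)
    survival = norm_cdf((mu - math.log(u)) / sigma)
    return partial_mean / survival - u


def pareto_mean_excess(u, alpha):
    """Pareto I mean excess, valid for alpha > 1 and u at or above the scale x0."""
    return u / (alpha - 1.0)


if __name__ == "__main__":
    thresholds = [2.0 * k for k in range(1, 9)]  # 2, 4, ..., 16
    previous = None
    for u in thresholds:
        e_ln = lognormal_mean_excess(u)
        step = "" if previous is None else f"  (increase: {e_ln - previous:.3f})"
        print(f"u = {u:4.1f}  lognormal e(u) = {e_ln:.3f}{step}  Pareto e(u) = {pareto_mean_excess(u, 3.0):.3f}")
        previous = e_ln
```

Over this grid the lognormal increments shrink as u grows while the Pareto increments stay constant. With noisy empirical estimates that difference is easy to miss, which is consistent with the large sample sizes mentioned above.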
My proposal
P.C. 2013. Are your data really power law distributed? Submitted.
✤ The information given by the Zipf and the mean excess plot can be complemented with other graphs.
✤ I have recently proposed two possibilities:
★ The discriminant moment-ratio plot
★ The Zenga plot
✤ These plots are more reliable, and they can be used to refine the analysis if we suspect the presence of Paretianity in the data.
[Figure: Discriminant Moment-ratio Plot, skewness against CV (scale is not representative). Regions: Paretian zone, Lognormal-like zone, Gray zone. Reference curves and points: Pareto, Lognormal, Inv. Gamma, Gamma, Exponential/Thin Tailed, Bernoulli, Normal/Symmetric.]
[Figure: Zenga Plot, Z(u) against u on [0, 1]. Reference curves: Exponential, Lognormal (small, large), Pareto.]
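A data set enters the discriminant moment-ratio plot as a single point whose coordinates are the sample coefficient of variation and the sample skewness. The sketch below computes that pair with population-style (divide by n) moments. This convention and the function name are assumptions, and the zone boundaries of the plot are not reproduced here because the slides do not give them.

```python
import math


def cv_and_skewness(sample):
    """Sample coefficient of variation and skewness (moments divided by n)."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in sample) / n  # third central moment
    sd = math.sqrt(m2)
    return sd / mean, m3 / sd ** 3


if __name__ == "__main__":
    cv, skew = cv_and_skewness([1.0, 1.0, 1.0, 5.0])
    print(f"CV = {cv:.4f}, skewness = {skew:.4f}")
```

The sample values are always finite, but for heavy-tailed data the population skewness may not exist (for a Pareto I it requires alpha > 3), so the plotted point should be read with that in mind.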
And they do work...
[Figure: Discriminant Moment−ratio Plot (skewness against CV) and Zenga plot (Z(u) against u) for the example labeled Pareto(10,2).]
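A Zenga plot can be sketched from a sample as follows. The slides do not spell out the definition, so this example assumes Zenga's inequality curve written through the Lorenz curve L(u), namely Z(u) = 1 - [L(u) / u] [(1 - u) / (1 - L(u))], which equals one minus the ratio between the mean of the smallest observations and the mean of the remaining ones. The function name is hypothetical and the data are assumed to be strictly positive.

```python
def zenga_curve(sample):
    """(u, Z(u)) pairs at u = k / n for k = 1, ..., n - 1 (positive data assumed)."""
    xs = sorted(sample)
    n = len(xs)
    total = sum(xs)
    points = []
    lower_sum = 0.0
    for k in range(1, n):
        lower_sum += xs[k - 1]      # sum of the k smallest observations
        u = k / n
        lorenz = lower_sum / total  # empirical Lorenz curve L(u)
        z = 1.0 - (lorenz / u) * ((1.0 - u) / (1.0 - lorenz))
        points.append((u, z))
    return points


if __name__ == "__main__":
    for u, z in zenga_curve([1.0, 2.0, 4.0, 8.0, 16.0]):
        print(f"u = {u:.1f}  Z(u) = {z:.3f}")
```

Under this definition a perfectly equal sample gives Z(u) = 0 everywhere, and the shape of the curve for unequal data is what the plot compares against the reference curves.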
However
✤ Graphical tools are nice, useful instruments, BUT they should only represent the first step of the analysis.
✤ Distributional tests such as the Kolmogorov-Smirnov (KS) and the Anderson-Darling (AD) tests should always be used!
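As a rough illustration of the KS step, the sketch below fits a Pareto I tail above a given threshold by maximum likelihood and computes the KS distance between the fitted and the empirical distribution functions. The threshold `x0` is taken as known, the function names are hypothetical, and no p-value is returned: with an estimated alpha the classical KS tables no longer apply, so a significance level would need a simulation-based calibration that is not shown here. The AD test is not covered.

```python
import math
import random


def fit_pareto_alpha(tail, x0):
    """Maximum likelihood estimate of alpha for Pareto I data above a known x0."""
    return len(tail) / sum(math.log(x / x0) for x in tail)


def ks_distance_pareto(sample, x0):
    """Fitted alpha and KS distance sup |F_n(x) - F(x)| for the observations >= x0."""
    tail = sorted(x for x in sample if x >= x0)
    n = len(tail)
    alpha = fit_pareto_alpha(tail, x0)
    distance = 0.0
    for i, x in enumerate(tail):
        fitted = 1.0 - (x / x0) ** (-alpha)
        # The empirical CDF jumps from i / n to (i + 1) / n at x: check both sides.
        distance = max(distance, abs(fitted - i / n), abs((i + 1) / n - fitted))
    return alpha, distance


if __name__ == "__main__":
    rng = random.Random(0)
    pareto = [(1.0 - rng.random()) ** (-1.0 / 2.5) for _ in range(5000)]
    lognormal = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]
    for name, data in (("Pareto I", pareto), ("lognormal", lognormal)):
        alpha, d = ks_distance_pareto(data, x0=1.0)
        print(f"{name}: fitted alpha = {alpha:.2f}, KS distance = {d:.3f}")
```

In this setup the lognormal sample should produce a visibly larger distance than the genuinely Paretian one, which is the kind of discrimination the graphical tools alone cannot guarantee.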
THANKS!
WWW.PASQUALECIRILLO.EU