Lecture 12: Why Power Laws/Scale-free Networks?
Consider cases of power laws/networks and discontinuous change
1) Turkeys: Nassim Taleb uses the turkey as an example of the danger of ignoring discontinuous change.
Each day humans feed and take care of the turkey. Then on Thanksgiving comes the big surprise: 46 million
turkeys are eaten; another 22 million are eaten at Christmas and 19 million at Easter. Not much turkey is
eaten the rest of the year --> power law.
2) Finance: IndyMac was the seventh largest mortgage originator in the US until, on July 11, 2008, it became
what was then the fourth largest bank failure; variations in derivatives portfolios.
3) Social Media: Twitter and Facebook
4) The 20th century's largest disasters, from the EM-DAT International Disaster Database compiled by the Center for Research on
Epidemiology of Disasters, which currently lists disasters from 1900 to 2015 (http://www.emdat.be/database). A disaster
means at least one of the following: 10 or more deaths; 2,000 or more affected people for droughts and famines, or 100 or
more for other disasters; a government disaster declaration; or a plea for international assistance.
The heavy tail of power-law distributions shows up in the moments of the distribution. Assume Y (frequency) = S^(-a),
or ln Y = C - a ln S. Smaller a's mean greater weight on tail events. When a < 3 the second moment does not exist: there is no
variance. When a < 2 the first moment, the mean, also does not exist. You can still calculate those statistics with
finite data and will get a finite variance from ∑(X - mean X)²/N, but the true variance is infinite.
Why does a < 2 give no mean? The mean sums S · S^(-a) = S^(1-a). If a = 2 this is the harmonic series ∑ 1/S, which diverges. If a = 1
it is ∑ 1, which is infinite. Since we are never at infinity, maybe that is not a problem. But infinite variance means high
sensitivity of empirical variance measures to the presence or absence of a small number of big events. Most studies find a to
be between 2 and 3, but with large SD. A small difference in a has a huge impact on the probability of an extreme outcome.
Consider the Pareto distribution. If 0 < k < 2 the variance is infinite, and if 0 < k < 1 the mean is infinite (you can do the calculus
or see http://en.wikipedia.org/wiki/Pareto_distribution). In power-law form a = k + 1, where a is the power-law
coefficient, so 1 < a < 3 means the second moment is infinite and 1 < a < 2 means the first moment is infinite.
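A minimal numerical sketch of this sensitivity (assuming Python with numpy; not part of the lecture): draw samples whose density is proportional to S^(-a) and watch what happens to the sample moments as a falls below 3 and then below 2.
```python
# Sketch: sample moments of power-law draws (density ~ S^(-a), S >= 1, i.e. Pareto with k = a - 1).
import numpy as np

rng = np.random.default_rng(0)

def power_law_draws(a, n):
    k = a - 1.0                                   # Pareto shape parameter
    return (1.0 - rng.random(n)) ** (-1.0 / k)    # inverse-CDF sampling

for a in (3.5, 2.5, 1.5):                         # finite variance / infinite variance / infinite mean
    x = power_law_draws(a, 100_000)
    print(f"a = {a}: sample mean = {x.mean():10.2f}  sample var = {x.var():14.2f}  max = {x.max():14.2f}")

# Rerun with other seeds: for a < 3 the "finite" variance you compute swings by orders of
# magnitude depending on whether one big event happened to land in the sample.
```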
Is there a universal law underlying all situations in which we observe a power law?
Three ways to generate power laws:
1) As the outcome of a statistical process, via a generalization of the Central Limit Theorem. The classical theorem says
that if errors are iid with finite mean and variance, then as N → infinity the distribution of (normalized) sums is NORMAL;
the generalized version extends the limit to stable distributions that allow infinite variance.
2) Through some model in which the interrelation among parts creates thick tails: STRUCTURE with feedback loops
that react to random shocks in ways that produce a power law. The hope is to find simple and deep principles
that underlie the regularities and obviate the need for details to understand economics.
3) Through optimizing behavior that brings the system to the "brink" of large changes.
Two camps: people who seek/sometimes find broad robust detail-free laws and those who believe details matter.
1) STATISTICAL -- properties of power-law distributions as stable distributions: a linear combination of two
independently drawn copies of the variable has the same distribution. There are three closed-form representatives of the stable
family: the Normal, the Cauchy distribution, and the Levy distribution used in finance. The Cauchy is symmetric with such
a thick tail that it has neither mean nor variance; the Levy is for a non-negative variable and is not symmetric. Being STABLE
distributions, all three are "attractors": if lots of random "stuff" happens, you end up with one of these distributions.
What it means to have a thick tail in a distribution: comparison of Cauchy and Levy with Normal
A stable distribution has four parameters: the key one is the tail index a, plus a skewness parameter, a scale parameter, and a
location parameter. Once regarded as "exotic," stable distributions are now interesting and useful, because infinite variance is
likely to show up as jumps, which fits the observation that "many time series appear to exhibit discontinuities (e.g., large jumps)" (Knight, 1997).
Evidence plus the Generalized Central Limit Theorem justifies the use of stable models. There are examples in finance and
economics where the data are poorly described by a Gaussian model but well described by a stable distribution – stock prices,
for instance (Journal of Business & Economic Statistics, Vol. 8, No. 2, Apr. 1990).
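For a concrete sense of how much thicker these tails are, here is a small comparison (a sketch assuming Python with scipy, using the standard parameterizations in scipy.stats rather than any financial calibration):
```python
# Tail probabilities P(X > x) for the three closed-form stable distributions.
from scipy.stats import norm, cauchy, levy

for x in (2, 5, 10, 50):
    print(f"x = {x:3d}:  Normal {norm.sf(x):.2e}   Cauchy {cauchy.sf(x):.2e}   Levy {levy.sf(x):.2e}")

# The Normal tail collapses exponentially fast; the Cauchy and Levy tails shrink only as power laws.
```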
If we have systems that operate by power law processes, can we predict/improve outcomes?
2) STRUCTURAL MODEL I: stochastic growth – proportionate growth plus some barrier/bound: % growth +
bounds generate a power law (associated with Herb Simon, who had a nasty debate with Mandelbrot over it).
Without barriers/bounds, stochastic growth gives a lognormal: random % (log) growth –> lognormal with variance σ².
If the rate of growth is independent of initial size and the variance of growth is the same for all units (known as Gibrat's
law in economics), we get an equation for the growth of the firm (http://docentes.fe.unl.pt/~jmata/gibrat.pdf):
the % change in SIZE is a random shock g, so that SIZE(t) = (1 + g) SIZE(t-1) –> ln SIZE(t) = ln SIZE(t-1) + ln(1 + g)
We need something to fatten the tails of the distribution and go beyond the lognormal: some lower bound/friction. Gibrat
+ a lower bound –> Zipf. This is the STEADY STATE distribution. The bound produces "reflected Brownian motion",
originally shown by Champernowne for the income distribution.
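A simulation sketch of this mechanism (assuming Python with numpy; the drift and variance are illustrative numbers chosen so that 2·|drift|/variance = 1, the Zipf case, and the barrier is fixed at 1):
```python
# Gibrat-style proportional growth with a lower reflecting barrier.
import numpy as np

rng = np.random.default_rng(1)
n, periods, s_min = 10_000, 5_000, 1.0

size = np.ones(n)
for _ in range(periods):
    g = rng.normal(loc=-0.005, scale=0.1, size=n)   # same % growth process for every unit (Gibrat)
    size = np.maximum(size * np.exp(g), s_min)      # reflect at the lower bound

# Rank-size check on the upper tail: Zipf predicts a slope of about -1,
# which the simulation approaches as the number of periods grows.
top = np.sort(size)[::-1][:2000]
slope = np.polyfit(np.log(top), np.log(np.arange(1, top.size + 1)), 1)[0]
print("slope of log rank on log size:", round(slope, 2))
```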
Gabaix model for cities: cities of different sizes have the same growth rate with a constant variance. The position of
cities can change, but the distribution replicates itself. LA surpasses Chicago, but the number 2 city remains proportionate (½
of the largest city) under Zipf's law with coefficient 1.
You follow the average growth rate plus a random component unless you are very small. If you are very small, you
grow at 0 or at some positive value that depends on the average growth and the random shock. By moving density up from the
bottom, you push the distribution toward fatter tails. The empirical issue for cities: do all but the smallest have the same growth
rate with constant variance, regardless of "policies"?
To see the mechanism, consider a fixed total population that distributes itself among cities. With the same % growth, larger
cities have greater absolute growth. Thus, there must be more small cities to maintain the fixed population.
Example: p = % of cities that double every period; (1 - p) = % of cities that halve. This does not allow
for variance in growth rates, but it shows how the rule produces the distribution.
Assume a fixed population with city size scaled at 1. Then no NET growth requires: 2(p) + ½(1 - p) = 1
Solving, we get p = 1/3 → cities of size 2 make up 1/3 of cities; cities of size ½ make up 2/3 of cities.
There are twice as many small cities as large cities.
What about the next period? A city of size 2 becomes 4 if it doubles or 1 if it halves; a city of size ½ becomes 1 or ¼. The resulting shares of cities:
Rank of city by size    Size    Frequency (share of cities)
1                       4       1/9
2                       1       2/9   (a size-2 city that halved)
3                       1       2/9   (a size-½ city that doubled)
4                       1/4     4/9
And so on for the next period.
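A quick arithmetic check of these shares (plain Python; just iterating the doubling/halving rule with p = 1/3):
```python
# Iterate the rule: each city doubles with probability p = 1/3, halves with probability 2/3.
from collections import defaultdict

p = 1.0 / 3.0
dist = {1.0: 1.0}                      # start with every city at size 1

for period in range(1, 3):
    new = defaultdict(float)
    for size, share in dist.items():
        new[size * 2] += share * p
        new[size / 2] += share * (1 - p)
    dist = dict(new)
    print(f"period {period}:", {s: round(q, 3) for s, q in sorted(dist.items())})

# period 1: {0.5: 0.667, 2.0: 0.333}              -- the 2/3 vs 1/3 split above
# period 2: {0.25: 0.444, 1.0: 0.444, 4.0: 0.111} -- the 4/9, (2/9 + 2/9), 1/9 rows of the table
```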
A stable distribution by size classes after a long period of doubling/halving requires the same absolute changes in each class,
which holds only if the size classes have the same total population, and that fits Zipf with bins:
Rank    SIZE    # CITIES    POPULATION IN CLASS
1       4       1           4
2       2       2           4
3       1       4           4
4       ½       8           4
3) STRUCTURAL MODEL II: PREFERENTIAL ATTACHMENT
This is the story of the power law in web pages: growth rates differ across web sites because of differential attachment,
with new links more likely to attach to older/larger sites. A small number of older/larger sites will grow more rapidly than smaller/newer sites →
power law. This can also explain why paper citations show a power law. But these models also need one other element:
some baseline attachment/citation for new sites/papers.
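A sketch of one simple preferential-attachment rule (assuming Python with numpy; the numbers are illustrative, and the '+1' weight stands in for the baseline attachment that new sites/papers need):
```python
# Each new page links to an existing page with probability proportional to (degree + 1).
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
degree = np.zeros(n, dtype=np.int64)

for t in range(1, n):
    weights = degree[:t] + 1.0          # '+1' = baseline attachment so brand-new pages can still be linked
    target = rng.choice(t, p=weights / weights.sum())
    degree[target] += 1                 # older/larger pages are more likely to gain the new link
    degree[t] += 1                      # the new page's own link

top_share = np.sort(degree)[::-1][:50].sum() / degree.sum()
print("share of all links held by the 50 most-linked pages:", round(top_share, 3))
```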
Mixture of distributions: for web pages, the lifetime of sites is exponential, so there are a few long-lived sites
and many short-lived sites. Random growth in the number of attachments to a site would give a lognormal distribution. The
mixture of the exponential (lifetimes) and the lognormal (growth) gives a power law. Similarly for firm size: the distribution of firm
size in an industry = exponential life distribution + distribution of industry means –> power law.
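A sketch of the mixture argument (assuming Python with numpy; the lifetime scale, drift, and volatility below are made-up illustrative parameters, not estimates):
```python
# Lognormal growth over exponentially distributed lifetimes -> heavy (power-law) upper tail.
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
age = rng.exponential(scale=20.0, size=n)                          # few old sites, many young ones
log_size = rng.normal(loc=0.05 * age, scale=0.3 * np.sqrt(age))    # cumulative random growth over each lifetime
size = np.exp(log_size)

# Crude tail check: the log-log rank-size relation is roughly a straight line for the
# largest sites (a power law); a pure lognormal's curve would keep bending downward.
top = np.sort(size)[::-1][:2000]
slope = np.polyfit(np.log(top), np.log(np.arange(1, top.size + 1)), 1)[0]
print("upper-tail rank-size slope:", round(slope, 2))
```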
Other ways to get a power law: as the inverse of a function that follows a power law; as combinations of
exponentials; from the random-walk distribution of lifetime until ruin → lots of short lives, few long ones.
4) LOCAL INTERACTIONS AND OPTIMIZATION --> SOC, self-organized criticality: the system has a birth/death
process that moves it to a border area where it is subject to the risk of major disruptions, producing a power law. This is the P.
Bak "sandpile theory" explanation: systems naturally move to a point where they generate "avalanche" events. But
then "we must also abandon any idea of detailed long-term determinism or predictability". There will be a Silicon
Valley or an economic collapse, but you cannot predict where or when it occurs. In any short period you get a few BIG
EVENTS, but you never know when.
Bak: “Large fluctuations ... in economics indicate an economy operating at the SOC state, in which minor
shocks can lead to avalanches of all sizes ... there is no way one can stabilize the economy... eventually something
different and quite unexpected will upset ... balance ... and there will be a major avalanche somewhere else” (p 191).
Almost Prez A. Gore Paean to the Sandpile Model: “The sandpile theory – self-organized criticality – is irresistible as a
metaphor; one can begin by applying it to the developmental stages of human life. The formation of identity is akin to a
formation of the sandpile, with each person being unique and thus affected by events differently. A personality reaches the
critical state once the basic contours of its distinctive shape are revealed; then the impact of each new experience reverberates
throughout the whole person, both directly, at the time it occurs, and indirectly, by setting the stage for future change. Having
reached this mature configuration, a person continues to pile up grains of experience, building on the existing base. But
sometimes, at midlife, the grains start to stack up as if the entire pile is still pushing upward, still searching for its mature
shape. The unstable configuration that results makes one vulnerable to a cascade of change.”
The model is a cellular automaton that uses nearest-neighbor interactions to produce an avalanche -- lots of places
changing at once. Think of debts and bankruptcy under this rule. Add an extra debt: Owe --> Owe + 1. If you hit the debt limit,
you go bankrupt, pushing your debts onto your neighbors by lowering their assets. In the diagram, the numbers measure debts.
You drop a new debt onto the model (the 4 in the second diagram). That person can't pay loans to neighbors, which adds
to their debt. They go under. And so on. The avalanche is defined as the number of sites that hit 0 -- that go bankrupt.
This is a LOCAL INTERACTION MODEL in which several sites get close to an avalanche, so that "the next
straw breaks the camel's back". The model has a power law: # of avalanches of size S per period = S^(-1.1)
Avalanche size    # of avalanches
2                 .47
10                .08
100               .006
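A sketch of a Bak-style sandpile in this debt framing (assuming Python with numpy; the grid size, debt limit of 4, and number of drops are illustrative, and the measured exponent depends on those choices, so the -1.1 above should be read as indicative rather than something this toy run reproduces exactly):
```python
# Drop one unit of "debt" at a random site each period; any site reaching the limit fails
# and pushes one unit to each neighbour. Avalanche size = number of failures per drop.
import numpy as np

rng = np.random.default_rng(5)
L, drops, burn_in, limit = 20, 10_000, 3_000, 4
grid = np.zeros((L, L), dtype=np.int64)
sizes = []

for t in range(drops):
    i, j = rng.integers(L), rng.integers(L)
    grid[i, j] += 1                                   # one new unit of debt lands somewhere
    failures = 0
    while (over := np.argwhere(grid >= limit)).size:  # anyone at the limit goes under...
        for x, y in over:
            grid[x, y] -= limit
            failures += 1
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                if 0 <= x + dx < L and 0 <= y + dy < L:
                    grid[x + dx, y + dy] += 1         # ...and pushes one unit to each neighbour
                # units pushed past the edge of the grid simply disappear
    if t >= burn_in:                                  # skip the start-up period before collecting statistics
        sizes.append(failures)

sizes = np.array(sizes)
for s in (1, 10, 100):
    print(f"share of drops triggering {s} or more failures:", round((sizes >= s).mean(), 4))
```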
5) OPTIMIZATION: HOT (highly optimized tolerance) develops power laws from optimization. Systems are
optimized along some dimensions and are robust there, but risk failure from a cascade of shocks along other dimensions. If you optimize
the return to investment, you leave the door open to catastrophe –> heavy tails.
Forest fire model
Consider a forest with trees on a grid. Random lightning bolts cause fires. If a bolt lands on an empty space, there is
no fire; if it lands on a tree, it burns the tree and the fire spreads to all neighboring trees. The chance of a fire at any tree is p. The
number of trees that burn is inversely related to the number of fires per time period in a power law: many small fires in
which few trees burn, and a few large fires. The key metric is R(p), the probability that there is a path across the space
(a spanning cluster) so that the whole forest burns. R(p) makes a dramatic transition from low to high values (a
phase transition) at a critical value, when the density of trees is about 0.59.
Large fires are more likely when there are few sparks. Why? Because with few small fires the forest gets denser –>
one big fire. With lots of sparks, there is likely to be space between trees. Carlson and Doyle explain this with a model in
which a forester plants trees to optimize lumber, subject to fires that burn trees. The optimizing strategy is to plant
trees in blocks with narrow firebreaks between them to prevent fire from spreading: smaller blocks in regions where fire is
likely, larger blocks in regions where fire is unlikely. The structure is designed to respond optimally to small
perturbations, but optimal performance in normal times risks ruinous collapse if an unlikely shock arrives. In an area without
many sparks you put up few firebreaks and the trees are close together; an unlikely spark event → the forest burns down. In an
area with lots of sparks you put up lots of firebreaks; the unlikely spark event happens and you are safe.
Optimization against common perturbations leads to good properties with respect to those shocks but leaves the system especially
"fragile" to rare events, unanticipated changes in the environment, and flaws in the design.
Arithmetic Insights from Power Laws
In the 1970s the distribution of unemployment was hotly debated. M. Feldstein said "unemployment is widely
distributed," and therefore is not so bad and does not need much UI. L. Summers and K. Clark said "unemployment is concentrated,"
so there is a group that suffers a lot and needs help. Who is right?
It depends how you look at the same data:
Consider 1 person unemployed for 12 months – Mr. Longterm
and 12 people unemployed for one month each – the Short-termers.
Of those unemployed in ANY MONTH: ½ will be long-term.
Of those unemployed OVER THE YEAR, just 1/13 will be long-term.
The mean spell of unemployment is 24/13 ≈ 1.85 months; the median spell is 1 month.
But ½ of UI money will go to the long-spell person.
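The arithmetic spelled out (plain Python):
```python
# The Mr. Longterm example: 1 twelve-month spell plus 12 one-month spells spread over a year.
spells = [12] + [1] * 12

mean_spell = sum(spells) / len(spells)
median_spell = sorted(spells)[len(spells) // 2]
share_long_over_year = 1 / len(spells)               # 1 of the 13 people unemployed that year
share_long_any_month = 1 / 2                         # each month: Mr. Longterm plus one short-termer
share_ui_to_long = 12 / sum(spells)                  # benefits proportional to months unemployed

print("mean spell (months):  ", round(mean_spell, 2))                               # ~1.85
print("median spell (months):", median_spell)                                       # 1
print("long-term share of the year's unemployed:", round(share_long_over_year, 3))  # ~0.077
print("long-term share of any month's unemployed:", share_long_any_month)           # 0.5
print("long-term share of UI months paid:", share_ui_to_long)                       # 0.5
```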
This is an arithmetic relation based on a power-law-like distribution: long spells contribute more to the stock of
anything in a given period. A small number of LONG-TIMERS take up most of the resources. Short spells contribute a lot
to the MEAN spell and dominate the MEDIAN, so the average looks short. So if you want to say most unemployment
is short term, give the average length of a spell. If you want to say it is long term, give the % long term in a period.
The key to duration statistics is the Hazard Rate, the chance of leaving the state.
In surveys we ask people how long they have been in a state – for how many years have you been at Harvard?
Unemployed? In Jail? We thus get measures of INCOMPLETE durations of spells.
For the people in the state, the completed spell must exceed the incomplete spell.
If the sample is randomly drawn, the mean completed spell will be twice the mean incomplete spell for
persons in the state, assuming no changes in the world that generated the spells.
If the hazard rate is constant, the mean incomplete spell equals the mean of ALL completed spells.
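A simulation sketch of these claims under a constant hazard (assuming Python with numpy; the mean spell of 6 "months" and the sample sizes are arbitrary choices):
```python
# Constant hazard = exponential spells; length-biased sampling mimics interviewing people
# who are currently in the state.
import numpy as np

rng = np.random.default_rng(4)
spells = rng.exponential(scale=6.0, size=1_000_000)       # completed spells, mean 6 "months"

# A person found in the state sits in a spell with probability proportional to its length.
picked = rng.choice(spells.size, size=200_000, p=spells / spells.sum())
in_state = spells[picked]
incomplete = rng.random(in_state.size) * in_state         # elapsed time is uniform within the spell

print("mean of ALL completed spells:          ", round(spells.mean(), 2))
print("mean INCOMPLETE spell of those in state:", round(incomplete.mean(), 2))  # ~equal (constant hazard)
print("mean COMPLETED spell of those in state: ", round(in_state.mean(), 2))    # ~twice the incomplete mean
```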
Finally, note difference in averages depending on whom we ask
Firm size (workers)    # of firms    Total workers in group
40                     1             40
20                     2             40
10                     4             40
1                      40            40
Average employment AMONG FIRMS: 160 workers / 47 firms ≈ 3.4 workers.
Average employment REPORTED BY WORKERS: 71/4 ≈ 17.8. Why? Since each size group has 40 workers, the
average reported by the 160 workers = 40·(40 + 20 + 10 + 1)/160 = 71/4 ≈ 17.8.
The median firm has just 1 worker (40 of the 47 firms).
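The same comparison in code (plain Python):
```python
# Firm-weighted vs worker-weighted averages answer different questions.
firm_sizes = [40] * 1 + [20] * 2 + [10] * 4 + [1] * 40      # 47 firms, 160 workers

workers = sum(firm_sizes)
avg_among_firms = workers / len(firm_sizes)
avg_reported_by_workers = sum(s * s for s in firm_sizes) / workers   # each worker reports own firm's size

print("firms:", len(firm_sizes), " workers:", workers)
print("average employment AMONG FIRMS:   ", round(avg_among_firms, 1))          # ~3.4
print("average employment ACROSS WORKERS:", round(avg_reported_by_workers, 2))  # 17.75
print("median firm size:", sorted(firm_sizes)[len(firm_sizes) // 2])            # 1
```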
Another paradoxical numerical relation, and a theorem, is that the average number of friends of people is never more
than the average number of friends of their friends. Why? Because you are more likely to be friends with someone
who has lots of friends. Numerical example: lists of friends for four people.
Network (degree sequence)                Average friends       Average friends of friends, by person       Grand average
Most unequal – 3 edges (3,1,1,1)         (3+1+1+1)/4 = 1.5     most popular avg 1; other 3 avg 3 each      10/4 = 2.5
Another friendship – 4 edges (3,2,2,1)   (3+2+2+1)/4 = 2.0     most popular avg 5/3; 2 avg 2.5; 1 avg 3    (29/3)/4 ≈ 2.42
Another friendship – 5 edges (3,3,2,2)   (3+3+2+2)/4 = 2.5     2 avg 7/3; other 2 avg 3                    (32/3)/4 ≈ 2.67
Add another edge – 6 edges (3,3,3,3)     3                     3                                           3
See https://en.wikipedia.org/wiki/Friendship_paradox.
Does the same hold true for Twitter, where the graph is a di-graph rather than ↔?
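A quick check of the table (plain Python; vertices are numbered 0 to 3, and the edge lists below are one way to realize each degree sequence):
```python
# For each example graph, compare the average number of friends with the average,
# over people, of their friends' average friend counts.
graphs = {
    "star, 3 edges (3,1,1,1)":     [(0, 1), (0, 2), (0, 3)],
    "4 edges (3,2,2,1)":           [(0, 1), (0, 2), (0, 3), (1, 2)],
    "5 edges (3,3,2,2)":           [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)],
    "complete, 6 edges (3,3,3,3)": [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)],
}

for name, edges in graphs.items():
    friends = {i: set() for i in range(4)}
    for a, b in edges:
        friends[a].add(b)
        friends[b].add(a)
    avg_friends = sum(len(f) for f in friends.values()) / 4
    avg_friends_of_friends = sum(
        sum(len(friends[j]) for j in friends[i]) / len(friends[i]) for i in range(4)
    ) / 4
    print(f"{name}: avg friends {avg_friends:.2f}, avg friends-of-friends {avg_friends_of_friends:.2f}")
```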