
Winners don’t take all: Characterizing the
competition for links on the web
written by
D.Pennock; G.Flake; S.Lawrence; E.Glover; C.Giles
Evren CEYLAN
2003700163
Dec 17, 2004
Introduction









General Purpose of the Paper
What is Power Law?
Category-Specific Degree Distributions
Inbound link distributions for company homepages
Network Growth Model
Analytic Solution
Related Models
Web Data and Model Comparisons
Conclusion
General Purpose of the Paper

The World Wide Web displays a striking ‘‘rich get richer’’ behavior, with a relatively small
number of sites receiving a disproportionately large share of hyperlink references and traffic.

Although the connectivity distribution over the entire web is close to a power law
distribution, this study shows that the distribution within specific categories is typically
unimodal on a log scale, so the extent of the “rich get richer” phenomenon varies across
different categories and networks, such as:

research paper citations,
patterns of movie actor collaborations,
United States power grid connections.

The proposed generative model, incorporating a mixture of preferential and uniform
attachment, quantifies
 the degree to which the rich nodes grow richer, and
 how new (and poorly connected) nodes can compete.
What is Power Law?

Several investigations show that the distribution of the number of links to (and from) a web
page obeys a power law. So, what is power law?

Barabasi and Albert attribute power law scaling to a “rich get richer” mechanism called
preferential attachment:
 as the network grows, the probability that a given vertex receives an edge is
proportional to that vertex’s current connectivity.

In a power law distribution, very large events are rare, but small events are quite common.
For example:

There are few large earthquakes, but many small ones.
There are few mega-cities, but many small towns.
There are few words, such as 'and' and 'the', that occur very frequently, but many that
occur rarely.
Category-Specific Degree Distributions

Several studies find that the probability that a randomly selected web page has k links is proportional to
$k^{-\gamma}$ for large k, where $\gamma$ is a constant that is empirically determined as

2.1 for inbound links and 2.72 for outbound links.

At small connectivities k, the distribution of links on the web fails to fit a power law (explained
later).

When displayed on a log–log plot, this so-called power law distribution appears linear with slope $-\gamma$.
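The straight-line behavior on a log–log plot can be checked numerically. The following is a minimal sketch, not taken from the paper: the helper names `power_law` and `loglog_slope` are hypothetical, and the distribution is left unnormalized because the normalizing constant only shifts the line without changing its slope.

```python
import math

def power_law(k, gamma):
    """Unnormalized power law: P(k) proportional to k^(-gamma)."""
    return k ** (-gamma)

def loglog_slope(k1, k2, gamma):
    """Slope of the line through two points of P(k) on log-log axes."""
    y1 = math.log(power_law(k1, gamma))
    y2 = math.log(power_law(k2, gamma))
    return (y2 - y1) / (math.log(k2) - math.log(k1))

# Exponents quoted above: 2.1 for inbound links, 2.72 for outbound links.
slope_in = loglog_slope(10, 10_000, 2.1)
slope_out = loglog_slope(10, 10_000, 2.72)
print(round(slope_in, 2), round(slope_out, 2))
```

Whichever two values of k are chosen, the computed slope equals the negative of the exponent, which is why a power law appears as a straight line on such a plot.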

What are inbound links and outbound links?

Inbound links: Links pointing to a website from other sites. When a user arrives at a website by
following a link on another site, that link is an inbound link.

Outbound links: Links that point away from a website to other sites.

If the number of inbound links to a web page is interpreted as a measure of its popularity, then power law
distribution implies that a small fraction of web pages receive a disproportionately large share of such
endorsements.

As a result, although the vast majority of web pages have relatively small numbers of links, a few pages
have enormous numbers of links.

Category-Specific Degree Distributions (Cont’d)

Therefore, according to the power law, a few popular pages on the web benefit from

a greater volume of traffic from web surfers,
a higher probability of being indexed in search engine databases, and
a more prominent ranking within search engine results.

However, the majority of web sites suffer from relatively poor visibility.
For example, new commercial sites may have difficulty competing for consumer
attention.
This state of affairs on the web is called the “winners take all” phenomenon.
Inbound link distributions for company homepages

The authors examined the inbound link distributions for a set of public company homepages (Figure 1).
- Analytic solution: Eq. 3
- Data: empirically observed
- Simulation: simulation of the model
-- The tail of the distribution continues to fit a power law.
-- The body of the distribution is unimodal with a sharp, single mode (indicating that
company homepages most commonly have between 99 and 146 inbound links).
The connectivity distributions of company homepages and university
homepages:
-They display the same qualitative shape —unimodal body and power law tail.
-Tails indicate that popular pages still gain a disproportionate percentage of all inbound links.
- Among less popular web pages of the same type, the distribution of inbound links is more balanced.
- Relative to their community, winners don’t quite “take all”.
- In the figures above, “losing” sites (ordinary websites) attract a considerably higher proportion of
links than would be the case under a pure power law distribution.
The connectivity distributions of scientist homepages and newspaper
homepages:
Network Growth Model
Generative Process Description:

The proposed model of network growth is used to explain the observed connectivity distributions
for

web categories,
other social networks.

The model is similar to other generalized BA (Barabasi-Albert) models.
In the proposed model:

The network begins with m0 vertices.
At each time step t, one vertex and m edges are added to the network.

(In the BA model, all m edges connect the new vertex with an old vertex according to preferential
attachment: the probability $\Pi(k_i)$ that an edge connects to vertex i is proportional to $k_i$, where $k_i$ is the
current number of edges incident on vertex i.)
Instead, in this model every vertex has at least some baseline probability of gaining an
edge.
So, both endpoints of each edge are chosen according to a mixture: preferential
attachment with probability α and uniform attachment with probability 1-α (Eq. 1).
The probability that an endpoint of a new edge connects to vertex i is

$$\Pi(k_i) = \alpha \frac{k_i}{2mt} + (1-\alpha) \frac{1}{m_0 + t} \qquad \text{(Eq. 1)}$$

- $m_0 + t$: total number of vertices at time t
- 2mt: total connectivity at time t
- α: preferential attachment parameter
- Each vertex increments its connectivity independently according to Eq. 1.
- So, edge endpoints are chosen symmetrically, rather than pinned to the newest vertex.
Therefore:

Solitary vertices are not destined to remain forever disconnected.

Under preferential attachment alone, sites that are already rich in links tend to get richer, resulting in a
power law distribution.

With the addition of a component for uniform attachment, the poorer sites (with some luck) can get rich
too.
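The generative process described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulation code. The function name `grow_network`, the `endpoints` list used for preferential sampling, the fallback to a uniform choice while the network has no edges yet, the updating of connectivities after every edge (rather than once per time step), and the acceptance of self-loops and repeated edges are all assumptions made here for simplicity.

```python
import random

def grow_network(m0, m, alpha, steps, seed=0):
    """Grow a network with mixed preferential/uniform attachment.

    Returns a list in which entry i is the connectivity of vertex i.
    """
    rng = random.Random(seed)
    degree = [0] * m0   # connectivity k_i of each vertex
    endpoints = []      # vertex i appears k_i times in this list

    def pick_endpoint():
        # With probability alpha, choose in proportion to current
        # connectivity; otherwise (or while no edges exist) choose uniformly.
        if endpoints and rng.random() < alpha:
            return rng.choice(endpoints)
        return rng.randrange(len(degree))

    for _ in range(steps):
        degree.append(0)            # one new vertex per time step
        for _ in range(m):          # m new edges per time step
            a, b = pick_endpoint(), pick_endpoint()
            for v in (a, b):
                degree[v] += 1
                endpoints.append(v)
    return degree

degree = grow_network(m0=5, m=2, alpha=0.9, steps=2000)
print(len(degree), sum(degree), max(degree))
```

Because both endpoints of every edge are sampled from the whole network, a vertex that arrives with no links can still acquire some later through the uniform term. Rerunning the sketch with a smaller `alpha` should give a more even spread of connectivities.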
Analytic Solution


Using an approximation similar to the one used by Barabasi and Albert, the connectivity
distribution for the model can also be derived in closed form.
The probability that a vertex has connectivity k is:
Eq. 2
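A closed form of this kind can be sketched as follows. This is a reconstruction under stated assumptions, not a quotation of the paper's Eq. 2: connectivity is treated as a continuous quantity, the attachment probability is taken to be the mixture $\alpha k_i/(2mt) + (1-\alpha)/(m_0+t)$ described in the Network Growth Model section, a vertex arriving at time $t_i$ starts with zero connectivity, arrival times are treated as uniformly distributed, and t is large enough that $m_0 + t \approx t$. The paper's exact notation and normalization may differ.

```latex
\begin{align*}
\frac{\partial k_i}{\partial t}
  &= 2m\left[\alpha\,\frac{k_i}{2mt} + \frac{1-\alpha}{m_0+t}\right]
   \approx \frac{\alpha k_i + 2m(1-\alpha)}{t}, \\
k_i(t)
  &= \frac{2m(1-\alpha)}{\alpha}
     \left[\left(\frac{t}{t_i}\right)^{\alpha} - 1\right],
     \qquad k_i(t_i) = 0, \\
\Pr(k)
  &\approx \left[2m(1-\alpha)\right]^{1/\alpha}
     \left[\alpha k + 2m(1-\alpha)\right]^{-1-1/\alpha}.
\end{align*}
```

The factor 2m in the first line counts the 2m edge endpoints added per time step. For large k the last expression behaves like $k^{-(1+1/\alpha)}$, so under this sketch the tail is a power law whose exponent depends only on α.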

In the limit k → ∞, the empirically observed exponents are obtained as follows:



inbound web links: 2.10
outbound web links: 2.72
A transformation of the probability density in Eq. 2 can be performed to allow a
comparison on a log-scale plot:
Eq. 3
- k = 2m(1-α): mode of the distribution on a log-scale plot
- As α → 1 or m → 0, the distribution approaches a power law.
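The stated mode can be checked numerically. The sketch below assumes a density of the form $\Pr(k) = [2m(1-\alpha)]^{1/\alpha}[\alpha k + 2m(1-\alpha)]^{-1-1/\alpha}$, which is a reconstruction consistent with the mode and limits listed above rather than a quotation of the paper's Eq. 2, and it takes the log-scale density to be k times Pr(k). The function names and the values of m and α are illustrative assumptions only.

```python
def pr_k(k, m, alpha):
    """Assumed closed-form density of connectivity k (reconstruction)."""
    c = 2 * m * (1 - alpha)
    return c ** (1 / alpha) * (alpha * k + c) ** (-1 - 1 / alpha)

def log_scale_density(k, m, alpha):
    """Density per unit of ln k, obtained by multiplying Pr(k) by k."""
    return k * pr_k(k, m, alpha)

def alpha_from_exponent(gamma):
    """Invert the tail exponent gamma = 1 + 1/alpha of the assumed form."""
    return 1 / (gamma - 1)

m, alpha = 50.0, 0.9                      # illustrative values only
grid = [x / 10 for x in range(1, 2001)]   # k from 0.1 to 200.0
mode = max(grid, key=lambda k: log_scale_density(k, m, alpha))
print(mode, 2 * m * (1 - alpha))

# Values of alpha implied by the observed web-wide exponents, under this sketch.
print(round(alpha_from_exponent(2.10), 3), round(alpha_from_exponent(2.72), 3))
```

The grid search lands on the grid point nearest to 2m(1-α), in agreement with the mode stated above. Under the same assumed form, the tail exponents 2.10 and 2.72 would correspond to α of roughly 0.909 and 0.581.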
Related Models

As early as 1955, Simon invoked a similar process to explain Zipf’s word-frequency distribution.
[Figure: word frequency (freq.) plotted against word in two panels, one with linear scales on both axes and one with logarithmic scales on both axes]
A Zipf distribution characterizes the use of words in a natural language (like English):

a language has a few words ("the", "and", etc.) that are used extremely often

a language has quite a lot of words ("dog", "house", etc.) that are used fairly often

a language has many words ("Zipf", "double-logarithmic", etc.) that are almost never used
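The rank-frequency view behind a Zipf distribution can be illustrated with a short sketch. The helper names `rank_frequency` and `zipf_expected` and the sample sentence are hypothetical. The idealized curve used here assumes the classic form in which frequency is inversely proportional to rank, and a text this small is far too short to test that claim seriously.

```python
from collections import Counter

def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    counts = Counter(text.lower().split())
    return [(rank, word, n)
            for rank, (word, n) in enumerate(counts.most_common(), start=1)]

def zipf_expected(rank, top_count):
    """Idealized Zipf count: inversely proportional to rank."""
    return top_count / rank

sample = "the cat and the dog and the bird saw the house"
table = rank_frequency(sample)
for rank, word, n in table:
    print(rank, word, n, round(zipf_expected(rank, table[0][2]), 2))
```

Plotting count against rank on logarithmic axes for a large corpus is the usual way to compare real data with the idealized curve.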
Related Models: Zipf distribution (Cont’d):


Web use appears to follow a Zipf distribution as well.
The following figure shows the distribution of incoming page requests to www.sun.com during a
one-month period.
- black line: actual empirical data
- red line: Zipf curve, which seems to fit the data quite well except at the low end
- Each data point on the x-axis represents one page, sorted according to popularity.
- The first page is the most popular one (the home page).
- The second page is the one that received the second-most requests in that month.
- This continues until page number 10,000 (which was requested only a single time that month).
Web Data and Model Comparisons

The proposed model is able to fit both the body and the tail of the connectivity
distributions of category-specific web pages.

There is a clear fit between the model and the actual connectivity distributions:
 both the body and the tail of a typical degree distribution are fit, as shown in the figures (for
category-specific data: company, university, newspaper, and scientist homepages).

The distribution of links to university homepages exhibits the largest deviation
from a power law.

However, the distribution of inbound links on the web as a whole is closest to a
pure power law.
Conclusion

In all cases studied, the mixture parameter α is greater than 0.5.
 Thus, preferential attachment plays a larger role in web-link-growth than does
uniform attachment.

The growth of links to company homepages (α = 0.950) and to newspaper homepages
(α = 0.948) is dominated by the “rich get richer” process of preferential attachment.

However, link growth on scientist homepages (α = 0.602) and university homepages
(α = 0.612) suggests a more balanced mixture of preferential and uniform terms.

The results are seen in the following table:
Conclusion (Cont’d)

This study shows that:

Among web pages of the same type, the body of the distribution of inbound
links deviates strongly from a power law, showing a roughly log-normal
shape.

The proposed generative model, which incorporates uniform and preferential
attachment, explains data from the web as a whole, as well as category-specific
data from company, university, newspaper, and scientist homepages.
Thank you…
Questions ?