Winners don’t take all: Characterizing the
competition for links on the web
written by
D. Pennock; G. Flake; S. Lawrence; E. Glover; C. Giles
Evren CEYLAN
2003700163
Dec 17, 2004
Introduction
General Purpose of the Paper
What is Power Law?
Category-Specific Degree Distributions
Inbound link distributions for company homepages
Network Growth Model
Analytic Solution
Related Models
Web Data and Model Comparisons
Conclusion
General Purpose of the Paper
The World Wide Web displays a striking ‘‘rich get richer’’ behavior, with a relatively small
number of sites receiving a disproportionately large share of hyperlink references and traffic.
Although the connectivity distribution over the entire web is close to the power law
distribution, this study will show that the distribution within specific categories is typically
unimodal on a log-scale, and so the extent of the rich get richer phenomenon varies across
different categories. Similar distributions occur in other networks, such as:
research paper citations,
movie actor collaborations,
United States power grid connections.
The proposed generative model, incorporating a mixture of preferential and uniform
attachment, quantifies the degree to which the rich nodes grow richer and how new
(and poorly connected) nodes can compete.
What is Power Law?
Several investigations show that the distribution of the number of links to (and from) a web
page obeys a power law. So, what is a power law?
Barabasi and Albert attribute power law scaling to a ‘‘rich get richer’’ mechanism called
preferential attachment:
as the network grows, the probability that a given vertex receives an edge is
proportional to that vertex’s current connectivity.
In a power law distribution, very large events are rare, but small events are quite common.
For example:
There are few large earthquakes, but many small ones.
There are few mega-cities, but many small towns.
There are few words, such as 'and' and 'the' that occur very frequently, but many which
occur rarely.
Category-Specific Degree Distributions
Several studies find that the probability that a randomly selected web page has k links is proportional to $k^{-\gamma}$
for large k, where $\gamma$ is a constant, which is empirically determined as
2.1 for inbound links and 2.72 for outbound links.
At small connectivities k, the distribution of links on the web fails to fit a power law (explained
later).
When displayed on a log-log plot, this so-called power law distribution appears linear with slope $-\gamma$.
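The log-log linearity can be checked numerically. The following is a minimal sketch, assuming an idealized, noise-free power law with the inbound exponent 2.1 quoted above; the helper name `loglog_slope` and the range of k are illustrative choices, not something taken from the paper.

```python
import math

def loglog_slope(ks, ps):
    """Least-squares slope of log(p) plotted against log(k)."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(p) for p in ps]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

gamma = 2.1                        # inbound-link exponent quoted in the text
ks = list(range(1, 1001))          # assumed range of connectivities
ps = [k ** -gamma for k in ks]     # unnormalized power law, no noise
slope = loglog_slope(ks, ps)
print(round(slope, 3))             # approximately -2.1
```

Real link data would be noisy and would deviate from the line at small k, so a fit like this is only meaningful on the tail.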
What are inbound links and outbound links?
Inbound links: Links pointing to a website. When a user arrives at a website from another site, that
link is called an Inbound Link.
Outbound links: Links that point away from a website.
If the number of inbound links to a web page is interpreted as a measure of its popularity, then power law
distribution implies that a small fraction of web pages receive a disproportionately large share of such
endorsements.
As a result, although the vast majority of web pages have relatively small numbers of links, a few pages
have enormous numbers of links.
Category-Specific Degree Distributions (Cont’d)
Therefore, according to the power law, a few popular pages on the web benefit from
a greater volume of traffic from web surfers,
a higher probability of being indexed in search engine databases, and
a higher ranking within search engine results.
However, the majority of web sites suffer from relatively poor visibility.
For example, new commercial sites may have difficulty competing for consumer
attention.
This situation on the web is called the “winners take all” phenomenon.
Inbound link distributions for company homepages
We examined the inbound link distributions for a set of public company homepages. (Figure-1)
- Analytic Sol.: Eq. 3
- Data: Empirically observed
- Simulation: Simulation of the model
-- The tail of the distribution continues to fit a power law.
-- The body of the distribution is unimodal with a sharp and singular mode (indicating that
company homepages most commonly have between 99 and 146 inbound links).
The connectivity distributions of company homepages and university
homepages:
- They display the same qualitative shape: a unimodal body and a power law tail.
- The tails indicate that popular pages still gain a disproportionate percentage of all inbound links.
- Among less popular web pages of the same type, the distribution of inbound links is more balanced.
- Relative to their community, winners don’t quite “take all”.
- In the figures above, losing sites (ordinary web sites) attract a considerably higher proportion of
links than would be the case under the power law distribution.
(Figure: the connectivity distributions of scientist homepages and newspaper homepages)
Network Growth Model
Generative Process Description:
The proposed model of network growth is used to explain the observed connectivity distributions
for
web categories and
other social networks.
The model is similar to other generalized Barabasi-Albert (BA) models.
In the proposed model:
The network begins with m0 vertices.
At each time step t, one vertex and m edges are added to the network.
(In the BA model, all m edges connect the new vertex with an old vertex according to preferential
attachment: the probability $\Pi(k_i)$ that an edge connects to vertex i is proportional to $k_i$, where $k_i$ is the
current number of edges incident on vertex i.)
Instead, in this model every vertex has at least some baseline probability of gaining an
edge.
So, both endpoints of each edge are chosen according to a mixture of preferential attachment,
with probability α, and uniform attachment, with probability 1-α. (Eq. 1)
The probability that an endpoint of a new edge connects to vertex i is:
$\Pi(k_i) = \alpha \frac{k_i}{2mt} + (1-\alpha)\frac{1}{m_0 + t}$   (Eq. 1)
Growth components:
- $m_0 + t$: total number of vertices at time t
- 2mt: total connectivity at time t
- α: preferential attachment parameter
- Each vertex increments its connectivity independently according to Eq. 1.
- So, edge endpoints are chosen symmetrically, rather than pinned to the newest vertex.
Therefore;
Solitary vertices are not destined to remain forever disconnected.
Under preferential attachment alone, sites that are already rich in links tend to get richer, resulting in a
power law distribution.
With the addition of a component for uniform attachment, the poorer sites (with some luck) can get rich
too.
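The growth process described in this section can be sketched as a small simulation. This is a minimal sketch and not the authors' code. It assumes an integer number of edges m per step, allows an edge to pick the same vertex for both endpoints, and falls back to uniform attachment while the network has no edges yet. The function name `grow_network` and the default parameter values are hypothetical.

```python
import random

def grow_network(steps, m=2, m0=3, alpha=0.9, seed=0):
    """Grow a network by mixing preferential and uniform attachment.

    Returns a list with the connectivity (degree) of every vertex.
    """
    rng = random.Random(seed)
    degree = [0] * m0      # the network begins with m0 isolated vertices
    endpoints = []         # every edge contributes both of its endpoints here

    def pick_vertex():
        # With probability alpha: preferential attachment. A uniformly random
        # entry of `endpoints` selects vertex i with probability k_i / (sum of degrees).
        if endpoints and rng.random() < alpha:
            return rng.choice(endpoints)
        # Otherwise: uniform attachment over all existing vertices.
        return rng.randrange(len(degree))

    for _ in range(steps):
        degree.append(0)               # one new vertex per time step
        for _ in range(m):             # m new edges per time step
            a, b = pick_vertex(), pick_vertex()
            degree[a] += 1
            degree[b] += 1
            endpoints.extend((a, b))
    return degree

degrees = grow_network(steps=2000)
print(len(degrees), sum(degrees))      # 2003 vertices, total connectivity 8000
```

Because both endpoints are drawn from the mixture, a vertex that arrives with no edges can still be selected later through the uniform term, which is the point made above about solitary vertices.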
Analytic Solution
Using an approximation similar to the one used by Barabasi and Albert, the connectivity
distribution for the model can also be derived in closed form.
The probability that a vertex has connectivity k is:
$P(k) = [2m(1-\alpha)]^{1/\alpha}\,[\alpha k + 2m(1-\alpha)]^{-1-1/\alpha}$   (Eq. 2)
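One way to see where a closed form of this kind can come from is the following sketch of a continuous approximation. The assumptions of the sketch (not statements taken from the slides) are: the attachment probability is $\Pi(k_i) = \alpha k_i/(2mt) + (1-\alpha)/(m_0+t)$, 2m endpoints are placed per time step, each vertex arrives with k = 0 at a time $t_i$ spread uniformly over [0, t], and $t \gg m_0$.

```latex
\begin{align*}
\frac{dk_i}{dt} &= 2m\,\Pi(k_i) \approx \frac{\alpha k_i + 2m(1-\alpha)}{t}, \\
\alpha k_i(t) + 2m(1-\alpha) &= 2m(1-\alpha)\left(\frac{t}{t_i}\right)^{\alpha}
  \quad\text{since } k_i(t_i) = 0, \\
\Pr[k_i(t) < k] &= \Pr\!\left[t_i > t\left(\frac{2m(1-\alpha)}{\alpha k + 2m(1-\alpha)}\right)^{1/\alpha}\right]
  \approx 1 - \left(\frac{2m(1-\alpha)}{\alpha k + 2m(1-\alpha)}\right)^{1/\alpha}, \\
P(k) &= \frac{d}{dk}\Pr[k_i(t) < k]
  = [2m(1-\alpha)]^{1/\alpha}\,[\alpha k + 2m(1-\alpha)]^{-1-1/\alpha}.
\end{align*}
```

For large k the last line behaves like $k^{-(1+1/\alpha)}$, which is a power law tail.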
In the limit k → ∞, the distribution approaches a power law with exponent 1 + 1/α, and the empirically observed exponents are obtained as follows:
inbound web links: 2.10
outbound web links: 2.72
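The mixture parameter implied by each exponent can be computed directly. This is a minimal sketch, assuming that the large-k exponent of the model is γ = 1 + 1/α; the function name `alpha_from_exponent` is hypothetical.

```python
def alpha_from_exponent(gamma):
    """Invert gamma = 1 + 1/alpha (assumed large-k exponent of the model)."""
    if gamma <= 1.0:
        raise ValueError("gamma must be greater than 1")
    return 1.0 / (gamma - 1.0)

for label, gamma in [("inbound", 2.10), ("outbound", 2.72)]:
    print(label, round(alpha_from_exponent(gamma), 3))
# inbound 0.909
# outbound 0.581
```

Under this assumption, only exponents of 2 or more correspond to a valid mixture parameter (α at most 1).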
An analogous transformation of the probability density (Eq. 2) can be performed to make a
comparison on a log-scale plot:
$P(\ln k) \propto k\,[\alpha k + 2m(1-\alpha)]^{-1-1/\alpha}$   (Eq. 3)
- k = 2m(1-α): mode of the distribution on a log-scale plot
- As α → 1 or m → 0, the distribution approaches a power law.
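The location of the mode can be checked numerically. This is a minimal sketch, assuming the log-scale density is proportional to k·[αk + 2m(1-α)]^(-1-1/α); the parameter values α = 0.6 and m = 10 are hypothetical and chosen only for illustration.

```python
def log_scale_density(k, alpha, m):
    """Unnormalized density of ln(k): k * [alpha*k + 2m(1-alpha)]^(-1-1/alpha)."""
    c = 2.0 * m * (1.0 - alpha)
    return k * (alpha * k + c) ** (-1.0 - 1.0 / alpha)

alpha, m = 0.6, 10.0                           # hypothetical parameter values
grid = [i / 100.0 for i in range(1, 10001)]    # k from 0.01 to 100.00
best_k = max(grid, key=lambda k: log_scale_density(k, alpha, m))
predicted_mode = 2.0 * m * (1.0 - alpha)
print(best_k, predicted_mode)                  # the two values should agree closely
```

As α approaches 1 or m approaches 0, the predicted mode 2m(1-α) moves toward zero, so the unimodal body disappears and only the power law tail remains.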
Related Models
As early as 1955, Simon invoked a similar process to explain Zipf’s word-frequency distribution.
(Figure: frequency plotted for each word, once with linear scales on both axes and once with logarithmic scales on both axes)
The Zipf distribution characterizes the use of words in a natural language (like English):
a language has a few words ("the", "and", etc.) that are used extremely often
a language has quite a lot of words ("dog", "house", etc.) that are used fairly often
a language has many words ("Zipf", "double-logarithmic", etc.) that are almost never used
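The ranking behind such a plot can be sketched in a few lines. This is a minimal illustration, not part of the paper: the sample sentence is made up, and the helper name `rank_frequency` is hypothetical. A real analysis would use a large corpus and proper tokenization.

```python
from collections import Counter

def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    counts = Counter(text.lower().split())
    return [(rank, word, n)
            for rank, (word, n) in enumerate(counts.most_common(), start=1)]

sample = "the cat and the dog and the bird saw the house"   # made-up sample text
ranked = rank_frequency(sample)
for rank, word, n in ranked[:3]:
    print(rank, word, n)
# 1 the 4
# 2 and 2
# 3 cat 1
```

Plotting the count against the rank on logarithmic axes gives the kind of curve described for the Zipf distribution.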
Related Models: Zipf distribution (Cont’d):
Web use has been observed to follow a Zipf distribution as well.
The following figure shows the distribution of incoming page requests to www.sun.com during a
one-month period.
- black line: actual empirical data
- red line: Zipf curve that seems to fit the data quite well except for the low end
- Each data point on the x-axis represents one page, with pages sorted by popularity.
- The first page is the most popular one (the home page),
- The second page is the one that received second-most requests in that month,
- It goes on until reaching page number 10,000 (which was only requested a single time that month.)
Web Data and Model Comparisons
The proposed model is able to fit both the body and the tail of the connectivity distributions
of category-specific web pages.
There is a clear fit between the model and the actual connectivity distributions:
the body and the tail of the typical degree distribution are fit, as shown in the figures (for
category-specific data: company, university, newspaper, and scientist homepages).
The distribution of links to university homepages exhibits the largest deviation
from a power law.
However, the distribution of inbound links on the web as a whole is closest to a
pure power law.
Conclusion
In all cases studied, the mixture parameter α is greater than 0.5.
Thus, preferential attachment plays a larger role in web-link-growth than does
uniform attachment.
The growth of links to company homepages (α = 0.950) and to newspaper homepages
(α = 0.948) is dominated by the “rich get richer” process of preferential attachment.
However, link growth on scientist homepages (α = 0.602) and university homepages
(α = 0.612) suggests a more balanced mixture of preferential and uniform terms.
The results are seen in the following table:
Conclusion (Cont’d)
This study shows that:
Among web pages of the same type, the body of the distribution of inbound
links deviates strongly from a power law, showing a roughly log-normal
shape.
The proposed generative model, which incorporates uniform and preferential
attachment, explains data from the web as a whole, as well as category-specific
data from company, university, newspaper, and scientist homepages.
Thank you…
Questions ?