Traffic-driven model of the
World-Wide-Web Graph
A. Barrat, LPT, Orsay, France
M. Barthélemy, CEA, France
A. Vespignani, LPT, Orsay, France
Outline
• The WebGraph
• Some empirical characteristics
• Various models
• Weights and strengths
• Our model:
  • Definition
  • Analysis: analytics + numerics
  • Conclusions
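The Web as a directed graph
[Figure: web-pages i, j, l joined by directed hyperlinks]
Nodes i: web-pages
Directed links: hyperlinks
In- and out-degrees: k_i^in (hyperlinks pointing to page i) and k_i^out (hyperlinks leaving page i)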
Empirical facts
• Small world (short average path lengths, growing like log N): captured by Erdős-Rényi graphs
With probability p, an edge is established between each pair of vertices
⟨k⟩ = pN
Poisson degree distribution
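A minimal sketch of the G(N, p) construction described above (the function names are our own, for illustration). It checks numerically that ⟨k⟩ ≈ p(N-1), which is pN for large N, and that the degree variance is close to the mean, as expected for a Poisson distribution.

```python
import random
import statistics


def erdos_renyi(n, p, seed=0):
    """G(n, p): each of the n(n-1)/2 vertex pairs is linked independently with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]


def degrees(n, edges):
    """Degree of every vertex of an undirected edge list."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


n, p = 1000, 0.01
deg = degrees(n, erdos_renyi(n, p))
mean_k = statistics.mean(deg)
var_k = statistics.pvariance(deg)
print(f"<k> = {mean_k:.2f} (p(N-1) = {p * (n - 1):.2f}), var/mean = {var_k / mean_k:.2f}")
```

A variance-to-mean ratio near 1 is a quick fingerprint of the Poisson shape; a broad (heavy-tailed) distribution would give a ratio much larger than 1.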
Empirical facts
• Small world
• Large clustering: different neighbours of a node will likely know each other
[Figure: node n with neighbours 1, 2, 3, which have a higher probability to be connected to each other]
=> graph models with large clustering, e.g. Watts-Strogatz 1998
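A short sketch of the Watts-Strogatz construction, written from scratch for illustration (not a reference implementation): a ring lattice where each node is linked to its k nearest neighbours, after which each lattice edge is rewired to a random endpoint with probability p. For k = 4 the unrewired lattice has average clustering exactly 1/2, while a fully rewired graph behaves like a random graph with very small clustering.

```python
import random


def watts_strogatz(n, k, p, seed=0):
    """Ring of n nodes, each linked to its k nearest neighbours (k even);
    each lattice edge is then rewired to a random endpoint with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            if j in adj[i] and rng.random() < p:
                # new endpoint: no self-loop and no duplicate edge
                candidates = [x for x in range(n) if x != i and x not in adj[i]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[i].discard(j)
                    adj[j].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj


def average_clustering(adj):
    """Mean over nodes of c_i = (links among neighbours of i) / (k_i (k_i - 1) / 2);
    nodes with k_i < 2 contribute 0."""
    total = 0.0
    for i, nbrs in adj.items():
        nb = list(nbrs)
        k_i = len(nb)
        if k_i < 2:
            continue
        links = sum(1 for a in range(k_i) for b in range(a + 1, k_i) if nb[b] in adj[nb[a]])
        total += 2.0 * links / (k_i * (k_i - 1))
    return total / len(adj)


print(average_clustering(watts_strogatz(20, 4, 0.0)))   # 0.5 (regular ring lattice)
print(average_clustering(watts_strogatz(1000, 4, 1.0)))  # fully rewired: close to a random graph
```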
Empirical facts
• Small world
• Large clustering
• Dynamical network
• Broad connectivity distributions
• Also observed in many other contexts (from biological to social networks)
• Intense modeling activity
(Barabasi-Albert 1999; Broder et al. 2000; Kumar et al. 2000;
Adamic-Huberman 2001; Laura et al. 2003)
Various growing network models
• Barabási-Albert (1999): preferential attachment
• Many variations on the BA model: rewiring (Tadic 2001, Krapivsky et al. 2001), addition of edges, directed model (Dorogovtsev-Mendes 2000, Cooper-Frieze 2001), fitness (Bianconi-Barabási 2001), ...
• Kumar et al. (2000): copying mechanism
• Pandurangan et al. (2002): PageRank + preferential attachment
• Laura et al. (2002): multi-layer model
• Menczer (2002): textual content of web-pages
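For reference, a minimal undirected version of the Barabási-Albert preferential attachment rule (a sketch of our own; the directed variants cited above differ in how in- and out-links are treated). Drawing uniformly from a list of edge endpoints is equivalent to drawing nodes with probability proportional to their degree.

```python
import random
from collections import Counter


def barabasi_albert(n_nodes, m=2, seed=0):
    """Growing network: each new node attaches m edges to distinct existing nodes,
    chosen with probability proportional to their degree."""
    rng = random.Random(seed)
    # seed graph: complete graph on m + 1 nodes
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # every edge contributes both endpoints, so node i appears k_i times here and a
    # uniform draw from this list is a degree-proportional draw
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


edges = barabasi_albert(2000, m=2)
deg = Counter(v for edge in edges for v in edge)
print(len(edges), max(deg.values()), 2 * len(edges) / len(deg))
```

The largest degrees end up far above the mean degree ⟨k⟩ ≈ 2m, the signature of a broad connectivity distribution.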
The Web as a directed graph
[Figure: web-pages i, j, l joined by directed hyperlinks]
Broad P(k^in); cut-off for P(k^out)
(Broder et al. 2000; Kumar et al. 2000;
Adamic-Huberman 2001; Laura et al. 2003)
Additional level of complexity:
Weights and Strengths
[Figure: links between pages i, j, l carrying traffic]
Links carry weights/traffic: w_ij
In- and out-strengths: s_i^in = Σ_j w_ji, s_i^out = Σ_j w_ij
Adamic-Huberman 2001: broad distribution of s^in
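The strengths are plain sums of link weights, s_i^in = Σ_j w_ji and s_i^out = Σ_j w_ij. A small sketch, assuming the weights are stored in a dictionary keyed by directed links (a representation chosen here for illustration):

```python
from collections import defaultdict


def strengths(weights):
    """weights maps a directed link (i, j) to its traffic w_ij.
    Returns s_in[i] = sum_j w_ji and s_out[i] = sum_j w_ij."""
    s_in, s_out = defaultdict(float), defaultdict(float)
    for (i, j), w in weights.items():
        s_out[i] += w
        s_in[j] += w
    return dict(s_in), dict(s_out)


w = {("a", "b"): 3.0, ("a", "c"): 1.0, ("c", "b"): 2.0}
s_in, s_out = strengths(w)
print(s_in)   # {'b': 5.0, 'c': 1.0}
print(s_out)  # {'a': 4.0, 'c': 2.0}
```

With all weights equal to 1, the strengths reduce to the degrees k^in and k^out.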
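Model: directed network
[Figure: new page n linking to existing pages i and j]
(i) Growth: at each time step a new page n is added, with k^out = m out-links
(ii) Strength-driven preferential attachment: n links preferentially to pages with large in-strength
“Busy gets busier”
AND...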
Weights reinforcement mechanism
[Figure: new page n linking to page i, which links to page j]
The new traffic on the link n→i increases the traffic on the existing links i→j
“Busy gets busier”
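A minimal simulation sketch of the growth, attachment and reinforcement steps. The exact formulas are not reproduced in the slides, so the rules below are our assumptions: the new page n links to page i with probability (s_i^in + 1)/Σ_l (s_l^in + 1); the new link n→i has weight 1; each existing out-link of i is reinforced by δ·w_ij/s_i^out, with δ a hypothetical name for the reinforcement parameter. The seed graph (m + 1 mutually linked pages) is also a hypothetical choice.

```python
import itertools
import random
from collections import defaultdict


def simulate_traffic_driven_web(n_nodes, m=2, delta=0.5, seed=0):
    """Sketch of growth + strength-driven attachment + weight reinforcement.
    Assumed rules (our reading, not the authors' code):
      * P(new page n -> page i) = (s_i^in + 1) / sum_l (s_l^in + 1)
      * the new link n -> i has weight 1
      * each out-link i -> j grows by delta * w_ij / s_i^out
    """
    rng = random.Random(seed)
    w = defaultdict(dict)          # w[i][j] = traffic on link i -> j
    s_in = defaultdict(float)
    s_out = defaultdict(float)
    m0 = m + 1                     # hypothetical seed: m0 pages, all linked to each other
    for i in range(m0):
        for j in range(m0):
            if i != j:
                w[i][j] = 1.0
                s_out[i] += 1.0
                s_in[j] += 1.0
    for n in range(m0, n_nodes):
        nodes = range(n)
        cum = list(itertools.accumulate(s_in[i] + 1.0 for i in nodes))
        targets = set()
        while len(targets) < m:    # m distinct targets, all drawn with the same probabilities
            targets.add(rng.choices(nodes, cum_weights=cum)[0])
        for i in targets:
            old_out = s_out[i]
            # "busy gets busier": spread the extra traffic delta over i's out-links
            for j, w_ij in list(w[i].items()):
                dw = delta * w_ij / old_out
                w[i][j] = w_ij + dw
                s_in[j] += dw
            s_out[i] = old_out + delta
            w[n][i] = 1.0          # the new hyperlink n -> i
            s_out[n] += 1.0
            s_in[i] += 1.0
    return w, s_in, s_out


w, s_in, s_out = simulate_traffic_driven_web(2000, m=2, delta=0.5)
k_in = defaultdict(int)
for i, links in w.items():
    for j in links:
        k_in[j] += 1
top = max(s_in, key=s_in.get)
print(f"busiest page: {top}, s_in = {s_in[top]:.1f}, k_in = {k_in[top]}")
```

Each step adds m links of weight 1 plus a total reinforcement mδ, so the total traffic grows by m(1 + δ) per new page; the out-degree of every page stays m while its out-strength keeps growing.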
Evolution equations (continuous approximation)
The equation for s_i^in contains a coupling term: the traffic redistributed to i by the pages pointing to it
Resolution: ansatz, supported by numerics
Results
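For orientation, a sketch of what these equations look like, assuming the attachment probability Π_{n→i} = (s_i^in + 1)/Σ_l (s_l^in + 1), unit weight for new links and the redistribution w_ij → w_ij + δ w_ij/s_i^out (our reading of the mechanism, with δ the reinforcement parameter; not formulas reproduced from the slides). Each step adds m links of weight 1, a total reinforcement mδ and one new page, so Σ_l (s_l^in + 1) ≈ [m(1+δ) + 1] t.

```latex
\frac{ds_i^{in}}{dt} = m\,\frac{s_i^{in}+1}{\sum_l (s_l^{in}+1)}
  + \underbrace{\sum_j m\,\frac{s_j^{in}+1}{\sum_l (s_l^{in}+1)}\,\delta\,\frac{w_{ji}}{s_j^{out}}}_{\text{coupling term}},
\qquad
\frac{ds_i^{out}}{dt} = m\,\delta\,\frac{s_i^{in}+1}{\sum_l (s_l^{in}+1)}

% Ansatz (to be checked numerically):
\sum_j (s_j^{in}+1)\,\frac{w_{ji}}{s_j^{out}} \simeq A\, s_i^{in}
\quad\Rightarrow\quad
\frac{ds_i^{in}}{dt} \simeq \frac{m\,[(1+\delta A)\,s_i^{in} + 1]}{[m(1+\delta)+1]\,t}

s_i^{in} \propto t^{\theta},\qquad
\theta = \frac{m(1+\delta A)}{m(1+\delta)+1},\qquad
P(s^{in}) \sim (s^{in})^{-\gamma_s},\qquad
\gamma_s = 1 + \frac{1}{\theta}
```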
Approximation
Total in-weight Σ_i s_i^in: approximately the total number of in-links Σ_i k_i^in times the average weight ⟨w⟩ = 1 + δ (δ: reinforcement parameter)
Then: A = 1 + 1/(m(1+δ)), which gives γ_{s^in} ∈ [2; 2 + 1/m]
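A quick numerical check of the exponent range. Summing the ansatz Σ_j (s_j^in + 1) w_ji/s_j^out ≈ A s_i^in over i, and using Σ_i w_ji = s_j^out, gives A Σ_i s_i^in ≈ Σ_i s_i^in + N; with Σ_i s_i^in ≈ (1+δ)mN this yields A ≈ 1 + 1/(m(1+δ)). The sketch assumes the continuous-approximation solution θ = m(1+δA)/(m(1+δ)+1) and γ = 1 + 1/θ (our reconstruction of the missing formulas, δ being the reinforcement parameter).

```python
def gamma_s_in(m, delta):
    """Predicted exponent of P(s^in), assuming
    A = 1 + 1/(m(1+delta)), theta = m(1+delta*A)/(m(1+delta)+1), gamma = 1 + 1/theta."""
    A = 1.0 + 1.0 / (m * (1.0 + delta))
    theta = m * (1.0 + delta * A) / (m * (1.0 + delta) + 1.0)
    return 1.0 + 1.0 / theta


for delta in (0.0, 0.5, 1.0, 5.0, 100.0):
    print(delta, round(gamma_s_in(2, delta), 3))
```

At δ = 0 the exponent is 2 + 1/m, and it decreases monotonically towards 2 as δ grows, matching the range [2; 2+1/m].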
Numerical simulations
[Figure: measured value of A; prediction of γ from the measured A; approximation of γ]
Numerical simulations
NB: P(s^out) is broad even though every page has k^out = m
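Under the assumed rules of the model (a page is created with m out-links of unit weight, and each new in-link to page i adds exactly δ, the reinforcement parameter, to the total weight of its out-links), the out-strength of a page is fixed by its in-degree, which is one way to see why P(s^out) inherits the broad in-degree distribution:

```latex
s_i^{out}(t) = m + \delta\,k_i^{in}(t)
\qquad\Longrightarrow\qquad
P\big(s^{out} = m + \delta k\big) = P\big(k^{in} = k\big)
```

This holds for every page added after the initial seed graph.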
Clustering spectrum
C(k): average clustering coefficient c_i of the nodes of degree k, where c_i is the fraction of pairs of neighbours of node i that are connected to each other
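A sketch of c_i and of the spectrum C(k), assuming the clustering is measured on the undirected version of the web graph (link directions ignored) and that nodes with fewer than two neighbours get c_i = 0; both are conventions chosen here.

```python
from collections import defaultdict


def local_clustering(adj, i):
    """c_i: fraction of pairs of neighbours of i that are themselves linked (0 if k_i < 2)."""
    nb = list(adj[i])
    k = len(nb)
    if k < 2:
        return 0.0
    links = sum(1 for a in range(k) for b in range(a + 1, k) if nb[b] in adj[nb[a]])
    return 2.0 * links / (k * (k - 1))


def clustering_spectrum(adj):
    """C(k): average of c_i over all nodes of degree k."""
    by_k = defaultdict(list)
    for i in adj:
        by_k[len(adj[i])].append(local_clustering(adj, i))
    return {k: sum(cs) / len(cs) for k, cs in sorted(by_k.items())}


# triangle 0-1-2 plus a dangling page 3 attached to 2
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(clustering_spectrum(adj))  # {1: 0.0, 2: 1.0, 3: 0.3333333333333333}
```

Already in this toy graph C(k) decreases with k: the best-connected node has neighbours that are not linked to each other.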
Clustering spectrum
• δ (reinforcement) increases => clustering increases
• New pages point to various well-known pages, which are often connected together => large clustering for low-degree nodes
• Old, popular pages with large k receive in-links from many less popular pages, which are not connected together => smaller clustering for high-degree nodes
Clustering and weighted clustering
The weighted clustering coefficient takes into account the relevance of triangles in the global traffic
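A sketch of the weighted clustering coefficient of Barrat, Barthélemy, Pastor-Satorras and Vespignani (PNAS 2004), c_i^w = 1/(s_i (k_i - 1)) Σ_{j,h} (w_ij + w_ih)/2 a_ij a_ih a_jh, where the sum runs over ordered pairs of neighbours. We assume this is the definition meant here and apply it to the undirected, symmetrised version of the graph; both are assumptions.

```python
def weighted_clustering(w, i):
    """Weighted clustering of node i on an undirected weighted graph.
    w[i][j] == w[j][i] is the weight of the link i-j.
    Each linked pair of neighbours (j, h) contributes (w_ij + w_ih) / 2 twice,
    once as (j, h) and once as (h, j)."""
    nb = list(w[i])
    k = len(nb)
    if k < 2:
        return 0.0
    s = sum(w[i].values())
    total = 0.0
    for a in range(k):
        for b in range(a + 1, k):
            j, h = nb[a], nb[b]
            if h in w[j]:
                total += w[i][j] + w[i][h]
    return total / (s * (k - 1))


# hub 0 with neighbours 1, 2, 3; only 1 and 2 are linked, and that triangle carries heavy links
w = {0: {1: 3.0, 2: 3.0, 3: 1.0}, 1: {0: 3.0, 2: 1.0}, 2: {0: 3.0, 1: 1.0}, 3: {0: 1.0}}
uniform = {i: {j: 1.0 for j in nbrs} for i, nbrs in w.items()}
print(round(weighted_clustering(w, 0), 4))        # 0.4286
print(round(weighted_clustering(uniform, 0), 4))  # 0.3333, the topological value
```

With uniform weights c_i^w reduces to the topological c_i; it exceeds c_i when the links that close triangles carry more than their share of the traffic.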
Clustering and weighted clustering
Weighted clustering is larger than topological clustering:
triangles carry a large part of the traffic
Assortativity
Average connectivity k_nn,i of the nearest neighbours of i
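A sketch of k_nn,i = (1/k_i) Σ_{j ∈ V(i)} k_j and of its average k_nn(k) over nodes of degree k, assuming the undirected version of the graph (the same computation can be repeated with in-degrees only). A decreasing k_nn(k) signals disassortative mixing: hubs tend to be linked to poorly connected nodes.

```python
from collections import defaultdict


def knn(adj, i):
    """Average degree of the nearest neighbours of i: k_nn,i = (1/k_i) * sum_{j in V(i)} k_j."""
    return sum(len(adj[j]) for j in adj[i]) / len(adj[i])


def knn_spectrum(adj):
    """k_nn(k): average of k_nn,i over nodes of degree k (decreasing => disassortative)."""
    by_k = defaultdict(list)
    for i in adj:
        if adj[i]:
            by_k[len(adj[i])].append(knn(adj, i))
    return {k: sum(v) / len(v) for k, v in sorted(by_k.items())}


# a hub (page 0) linked to four pages that have no other links
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(knn_spectrum(star))  # {1: 4.0, 4: 1.0}
```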
Assortativity
• k_nn: disassortative behaviour (k_nn decreases with k), as usual in growing network models and typical of technological networks
• Lack of correlations in popularity, as measured by the in-degree
Summary
• Web: heterogeneous topology and traffic
• Mechanism taking into account the interplay between topology and traffic
• Simple mechanism => complex behaviour, scale-free distributions for connectivity and traffic
• Analytical study possible
• Study of correlations: non-trivial hierarchical behaviour
• Possibility to add features (fitness, rewiring, addition of edges, etc.) and to modify the redistribution rule
• Empirical studies of traffic and correlations?