Web Systems and Algorithms
Small World Networks
Chris Brooks
Department of Computer Science
University of San Francisco
Department of Computer Science — University of San Francisco – p.1/??
Advantages of Graphs
What are some advantages to modeling the web as a
graph?
Graphs are well-understood data structures
Lets us explicitly represent link structure
Lets us think about connectivity, components
We can develop algorithms to find or rank pages that
take advantage of graph structure.
Weaknesses of Graphs
What are some disadvantages to modeling the web as a
graph?
Hard to capture dynamics
State captured in URIs
Unique search requests
Rich client-side interactions
Deep Web/Dark Web
Examples of Networks
One interesting aspect of “small-world” graphs is that
they appear in so many different domains:
Computer networks
Social networks
Citation networks
Food Webs in biology
Gene regulatory networks
Neural pathways in nematodes
and many more ...
Milgram’s Small-world experiment
In the 1960s, Stanley Milgram conducted some of the
first research on small-world networks.
The experiment:
Select some source people in Nebraska
Give them a letter with the name and address of a
person in Massachusetts.
They must get the letter to its destination - each
person can only forward the letter to someone they
know on a first-name basis.
Only about 25% of the letters were delivered.
Those that arrived had an average path length
between 5 and 6.
Basis for the idea of “six degrees of separation.”
Milgram’s Small-world experiment
Assuming we think Milgram’s results are correct, there
are two interesting implications:
There exist short paths between randomly chosen
pairs of nodes, even ones that are geographically distant.
It is possible to discover those paths using only local
information.
Definitions: small-world
Networks such as Milgram’s are referred to as
small-world networks.
This is a network in which most nodes are not directly
connected, but a path of short length exists between any
two nodes due to the presence of occasional edges
linking distant nodes.
These networks also have a higher-than-expected
clustering coefficient
A parameter that describes how likely nodes are to
connect to their neighbors.
Small-world networks tend to have lots of cliques.
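As a rough illustration of the clustering coefficient, the sketch below computes, for each node of a small undirected graph, the fraction of pairs of its neighbors that are themselves linked. The function name `local_clustering` and the toy graph are assumptions made for this example; they do not come from the slides.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of `node`'s neighbors that are linked to each other."""
    neighbors = adj[node]
    if len(neighbors) < 2:
        return 0.0  # convention used here: fewer than two neighbors counts as 0
    pairs = list(combinations(neighbors, 2))
    linked = sum(1 for u, v in pairs if v in adj[u])
    return linked / len(pairs)


# Hypothetical undirected toy graph: a triangle (a, b, c) plus a pendant node d.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

coefficients = {n: local_clustering(adj, n) for n in adj}
average = sum(coefficients.values()) / len(coefficients)
print(coefficients, average)
```

Nodes b and c sit inside the triangle, so all of their neighbor pairs are linked; node a has three neighbors of which only one pair is linked.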
Definitions: Power law
Many small-world networks observed in practice are said to obey
a power law in their in-degree and out-degree.
A power law distribution tells us the probability that a
randomly selected element has value x.
$P(X = x) \sim K x^{q}$, where $K$ is some constant and $q$ is a
parameter of the distribution.
When we sort these values according to size, and plot
them, we get a familiar-looking graph.
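One way to see why the plot looks familiar: on log-log axes a power law is a straight line whose slope is the exponent. The sketch below checks this numerically. The exponent q = -2 and the support 1..1000 are assumptions picked only for illustration.

```python
import math

q = -2.0                      # assumed exponent, chosen only for illustration
xs = range(1, 1001)           # assumed support: values 1..1000
weights = [x ** q for x in xs]
K = 1.0 / sum(weights)        # normalizing constant so the probabilities sum to 1
probs = [K * w for w in weights]


def loglog_slope(x1, x2):
    """Slope between two points of the distribution on log-log axes."""
    p1, p2 = probs[x1 - 1], probs[x2 - 1]
    return (math.log(p2) - math.log(p1)) / (math.log(x2) - math.log(x1))


# The slope is the same between any two points and equals q.
print(loglog_slope(1, 10), loglog_slope(10, 1000))
```

An exponential distribution would not pass this check: its slope on log-log axes keeps changing with x.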
Definitions: Power law
Graph from Newman, The Structure and Function of Complex Networks.
c, d, and f are power laws.
e is a regular exponential distribution.
a has a truncated power-law curve, and b deviates from the power law for small degree.
Definitions: long tail
When we plot empirical data, we often see that the “tail”
of the data does not match up exactly with the predicted
curve.
This is known as a long tail.
The phrase is also sometimes used to refer more
generally to the right-hand part of the distribution.
This phrasing is popular in economics, where it’s
used to justify niche markets
Definitions: scale-free
Scale-free is a term sometimes applied to graphs
that obey a power law in some feature.
The term gets at a notion of self-similarity or fractalness;
subgraphs of the network share the same
characteristics (power law, clustering coefficient) that
the network as a whole does.
The work of Watts and Strogatz and of Kleinberg
proposes algorithmic techniques for randomly creating
graphs that have these characteristics.
This allows researchers to easily study and compare
algorithms designed to exploit the properties of
scale-free networks.
For example, simulate the performance of a new web
page ranking algorithm.
Watts-Strogatz model
Duncan Watts and Steve Strogatz developed a
mathematical model that attempted to elegantly
describe how to generate small-world networks.
Start with $V = \{v_1, v_2, \ldots, v_n\}$ points, evenly spaced in
one dimension (a circle).
We will distinguish between local and long-range
contacts.
Local contacts for each point are the $k$ closest neighbors
($k/2$ on each side), where $k$ is a parameter that can be
manipulated.
Watts-Strogatz model
β is a parameter that governs the number of long-range
contacts.
For each point $v_i$:
  For each neighbor $v_j$:
    • With probability β, remove the edge from $v_i$ to $v_j$
      and replace it with an edge from $v_i$ to $v_k$, where
      $v_k$ is selected from a uniform distribution over all
      nodes not connected to $v_i$.
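The construction and rewiring steps can be sketched as follows. This is a minimal illustration, not the authors' reference implementation: the function name `watts_strogatz`, the adjacency-set representation, and the choice to visit only the "clockwise" local edges (so that each edge is considered for rewiring once) are assumptions. It also assumes k is even and n is larger than k.

```python
import random


def watts_strogatz(n, k, beta, seed=None):
    """Return an undirected graph as {node: set(neighbors)}; assumes k even, n > k."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    # Local contacts: the k/2 nearest neighbors on each side of the ring.
    for i in range(n):
        for offset in range(1, k // 2 + 1):
            j = (i + offset) % n
            adj[i].add(j)
            adj[j].add(i)
    # Rewiring: each clockwise local edge (i, j) is moved with probability beta.
    for i in range(n):
        for offset in range(1, k // 2 + 1):
            j = (i + offset) % n
            if rng.random() >= beta:
                continue  # keep this local edge
            candidates = [v for v in range(n) if v != i and v not in adj[i]]
            if not candidates:
                continue  # node i is already connected to every other node
            new = rng.choice(candidates)
            adj[i].remove(j)
            adj[j].remove(i)
            adj[i].add(new)
            adj[new].add(i)
    return adj


graph = watts_strogatz(n=10, k=2, beta=0.25, seed=1)
# Rewiring moves edges rather than adding them, so the count stays n * k / 2.
edge_count = sum(len(nbrs) for nbrs in graph.values()) // 2
print(edge_count)
```

Setting beta to 0 leaves the ring untouched, and beta close to 1 rewires almost every edge, which is a convenient way to explore the range between a regular lattice and a random graph.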
Watts-Strogatz model
n = 10, k = 2, β = 0.25
We have set up local
connections here.
Watts-Strogatz model
n = 10, k = 2, β = 0.25
After rewiring. We can
see that in-degree is
more varied, and there is
a node that is beginning
to look like a “hub”
Watts-Strogatz model
These networks exhibit a high clustering coefficient, as
do observed small-world networks.
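Without rewiring (β = 0), the average path length is
$|V| / (2k)$, which scales linearly with $|V|$.
As β increases, the average path length drops quickly toward
$\ln |V| / \ln k$, the value for a random graph.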
However, nodes do not have a scale-free degree
distribution.
No underlying “geography” to take advantage of; does
this help explain Milgram’s experiment?
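The $|V| / (2k)$ figure for the ring before any rewiring can be checked numerically. The sketch below builds the unrewired ring lattice and measures the average shortest-path length by breadth-first search from every node. The helper names `ring_lattice` and `average_path_length` and the sizes n = 100, k = 4 are assumptions for this example.

```python
from collections import deque


def ring_lattice(n, k):
    """Each node is linked to its k/2 nearest neighbors on each side (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for offset in range(1, k // 2 + 1):
            j = (i + offset) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def average_path_length(adj):
    """Mean shortest-path length over all ordered pairs, by BFS from every node."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


n, k = 100, 4
measured = average_path_length(ring_lattice(n, k))
print(measured, n / (2 * k))
```

The measured value lands close to n / (2k); the small gap comes from rounding hop counts up to whole numbers.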
Kleinberg model
One problem with the Watts-Strogatz model is:
Even though short paths exist, there is no way to find
them in a decentralized way (a la Milgram)
Participants must just guess at an edge to use in
forwarding a message.
Kleinberg’s work generalizes the Watts-Strogatz model,
and shows how agents can discover short paths
using only local information.
Kleinberg Model
Kleinberg begins with a two-dimensional lattice, rather
than a ring.
Edges in this graph are directed.
The “lattice distance” between two nodes is their
Manhattan distance.
As in the Watts-Strogatz model, all nodes are connected
to their neighbors, with a few far-away connections.
Kleinberg Model
Kleinberg’s model uses a parameter $p$ to indicate
neighborhood.
A node $u$ has an edge from it to every other node within lattice distance $p$.
It then uses parameters $q$ and $r$ to construct long-range
contacts:
for i = 1 to q:
  Add an edge from the source $u$ to a long-range
  neighbor $v$, where each candidate $v$ is selected with
  probability proportional to $d(u, v)^{-r}$.
If $r = 0$, all nodes are equally likely. As $r$ grows,
closer nodes become increasingly likely.
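The effect of r on long-range contacts can be seen by sampling. The sketch below draws contacts for one node on a square lattice with probability proportional to the lattice distance raised to the power -r. The function names, the 21 x 21 lattice, and the sample size are assumptions for illustration; contacts are drawn with replacement to keep the code short.

```python
import random


def lattice_distance(u, v):
    """Manhattan distance between two (row, col) lattice points."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])


def long_range_contacts(u, n, q, r, rng):
    """Draw q long-range contacts for u on an n x n lattice, weighted by d^-r."""
    others = [(i, j) for i in range(n) for j in range(n) if (i, j) != u]
    weights = [lattice_distance(u, v) ** -r for v in others]
    return rng.choices(others, weights=weights, k=q)


rng = random.Random(0)
source = (10, 10)  # centre of the assumed 21 x 21 lattice
mean_distance = {}
for r in (0, 2, 5):
    contacts = long_range_contacts(source, n=21, q=1000, r=r, rng=rng)
    total = sum(lattice_distance(source, v) for v in contacts)
    mean_distance[r] = total / len(contacts)
print(mean_distance)
```

The average distance to a sampled contact shrinks as r grows, which matches the observation that large r favors nearby nodes.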
Route-finding
This network provides a nice, easy algorithm for finding
short paths in a decentralized way:
Each node forwards the message to whichever of its contacts is nearest
(according to $d$) to the target.
Kleinberg shows that, in graphs in which contacts are
formed independently of $d$ (i.e. $r = 0$), the expected length of the path
found by any decentralized mechanism is at least proportional to
$n^{2/3}$.
For $r = 2$ specifically, the path found by this decentralized mechanism
has expected length at most proportional to $\log^2 n$.
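The forwarding rule can be sketched on a small lattice. This is an illustrative simulation under stated assumptions, not Kleinberg's own code: it uses p = 1 (four lattice neighbors), q = 1 long-range contact per node, a 20 x 20 lattice, and hypothetical helper names `build_contacts` and `greedy_route`.

```python
import random


def dist(u, v):
    """Lattice (Manhattan) distance between two (row, col) points."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])


def build_contacts(n, r, rng):
    """Local contacts at distance 1 (p = 1) plus one long-range contact (q = 1)."""
    nodes = [(i, j) for i in range(n) for j in range(n)]
    contacts = {}
    for u in nodes:
        local = [v for v in nodes if dist(u, v) == 1]
        far = [v for v in nodes if v != u]
        weights = [dist(u, v) ** -r for v in far]
        contacts[u] = local + rng.choices(far, weights=weights, k=1)
    return contacts


def greedy_route(contacts, source, target):
    """Forward to the contact closest to the target; return the number of hops."""
    current, hops = source, 0
    while current != target:
        current = min(contacts[current], key=lambda v: dist(v, target))
        hops += 1
    return hops


rng = random.Random(0)
n = 20
contacts = build_contacts(n, r=2, rng=rng)
hops = greedy_route(contacts, (0, 0), (n - 1, n - 1))
print(hops, dist((0, 0), (n - 1, n - 1)))
```

Some lattice neighbor is always one step closer to the target, so every hop reduces the remaining distance and the loop terminates; long-range contacts can only shorten the route relative to walking the lattice.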
Route-finding
For $r = 0$, edges are uniformly distributed throughout the lattice. (This is the Watts-Strogatz model.)
This generalizes to $k$-dimensional lattices, where $r = k$ produces optimal performance.
For $r \gg 2$, paths become progressively longer, as long-range links get shorter.
[Figure: lower bound on delivery time plotted against r.]
Applications to the Web
Again, what does this have to do with the Web?
Understanding network structure helps us identify
cliques or communities of related documents.
Knowing about shortest-path properties can help with
navigational suggestions.
These ideas also turn out to play a large role in
determining the importance of documents.
Documents with lots of inward links are considered
important.
Understanding Web structure can help us design better
peer-to-peer and Web caching mechanisms.
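The idea that documents with many inward links are important can be sketched by counting in-links over a link graph. The page names and link structure below are hypothetical, and plain in-degree counting is only the simplest version of the idea, not a full ranking algorithm.

```python
from collections import Counter

# Hypothetical link structure: page -> pages it links to.
links = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["home", "about"],
    "blog": ["home", "news"],
}

# Start every known page at zero so pages with no in-links still appear.
in_degree = Counter({page: 0 for page in links})
for page, targets in links.items():
    in_degree.update(targets)

# Rank by in-degree (highest first), breaking ties alphabetically.
ranking = sorted(in_degree, key=lambda p: (-in_degree[p], p))
print(ranking)
```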