Adventures in random graphs: Models, structures and algorithms

Armand M. Makowski
ECE & ISR/HyNet
University of Maryland at College Park
[email protected]
BCAM, January 2011
Complex networks
• Many examples
– Biology (Genomics, proteomics)
– Transportation (Communication networks, Internet, roads
and railroads)
– Information systems (World Wide Web)
– Social networks (Facebook, LinkedIn, etc.)
– Sociology (Friendship networks, sexual contacts)
– Bibliometrics (Co-authorship networks, references)
– Ecology (food webs)
– Energy (Electricity distribution, smart grids)
• Larger context of “Network Science”
Objectives
• Identify generic structures and properties of “networks”
• Mathematical models and their analysis
• Understand how network structure and processes on networks
interact
• Dynamics of networks vs. dynamics on networks

What is new? Very large data sets now easily available!
Bibliography (I): Random graphs
• N. Alon and J.H. Spencer, The Probabilistic Method (Second
Edition), Wiley-Interscience Series in Discrete Mathematics and
Optimization, John Wiley & Sons, New York (NY), 2000.
• A. D. Barbour, L. Holst and S. Janson, Poisson
Approximation, Oxford Studies in Probability 2, Oxford
University Press, Oxford (UK), 1992.
• B. Bollobás, Random Graphs, Second Edition, Cambridge
Studies in Advanced Mathematics, Cambridge University
Press, Cambridge (UK), 2001.
• R. Durrett, Random Graph Dynamics, Cambridge Series in
Statistical and Probabilistic Mathematics, Cambridge
University Press, Cambridge (UK), 2007.
• M. Draief and L. Massoulié, Epidemics and Rumours in
Complex Networks, Cambridge University Press, Cambridge
(UK), 2009.
• D. Dubhashi and A. Panconesi, Concentration of Measure for
the Analysis of Randomized Algorithms, Cambridge University
Press, New York (NY), 2009.
• M. Franceschetti and R. Meester, Random Networks for
Communication: From Statistical Physics to Information
Systems, Cambridge Series in Statistical and Probabilistic
Mathematics, Cambridge University Press, Cambridge (UK), 2007.
• S. Janson, T. Łuczak and A. Ruciński, Random Graphs,
Wiley-Interscience Series in Discrete Mathematics and
Optimization, John Wiley & Sons, 2000.
• R. Meester and R. Roy, Continuum Percolation, Cambridge
University Press, Cambridge (UK), 1996.
• M.D. Penrose, Random Geometric Graphs, Oxford Studies in
Probability 5, Oxford University Press, New York (NY), 2003.
Bibliography (II): Survey papers
• R. Albert and A.-L. Barabási, “Statistical mechanics of
complex networks,” Reviews of Modern Physics 74 (2002), pp.
47-97.
• M.E.J. Newman, “The structure and function of complex
networks,” SIAM Review 45 (2003), pp. 167-256.
Bibliography (III): Complex networks
• A. Barrat, M. Barthélemy and A. Vespignani, Dynamical
Processes on Complex Networks, Cambridge University Press,
Cambridge (UK), 2008.
• R. Cohen and S. Havlin, Complex Networks - Structure,
Robustness and Function, Cambridge University Press,
Cambridge (UK), 2010.
• D. Easley and J. Kleinberg, Networks, Crowds, and Markets:
Reasoning About a Highly Connected World, Cambridge
University Press, Cambridge (UK), 2010.
• M.O. Jackson, Social and Economic Networks, Princeton
University Press, Princeton (NJ), 2008.
• M.E.J. Newman, A.-L. Barabási and D.J. Watts (Editors), The
Structure and Dynamics of Networks, Princeton University
Press, Princeton (NJ), 2006.
LECTURE 1
Basics of graph theory
What are graphs?
With V a finite set, a graph G is an ordered pair (V, E) where
elements in V are called vertices/nodes and E is the set of
edges/links:

E ⊆ V × V,   E(G) = E

Nodes i and j are said to be adjacent, written i ∼ j, if

e = (i, j) ∈ E,   i, j ∈ V
Multiple representations for G = (V, E)
Set-theoretic – Edge variables {ξij, i, j ∈ V} with

ξij = 1 if (i, j) ∈ E,   ξij = 0 if (i, j) ∉ E

Algebraic – Adjacency matrix A = (Aij) with

Aij = 1 if (i, j) ∈ E,   Aij = 0 if (i, j) ∉ E
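A minimal Python sketch (not from the slides) of the two representations, built from a hypothetical edge list on the vertex set {0, 1, 2, 3}:

# Building both representations of a small undirected graph
V = range(4)
edges = {(0, 1), (1, 2), (2, 3)}          # hypothetical example edges

# Set-theoretic representation: edge variables xi[(i, j)]
xi = {(i, j): int((i, j) in edges or (j, i) in edges)
      for i in V for j in V if i != j}

# Algebraic representation: adjacency matrix A (symmetric, zero diagonal)
A = [[xi[(i, j)] if i != j else 0 for j in V] for i in V]

for row in A:
    print(row)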
Some terminology
Simple graphs vs. multigraphs
Directed vs. undirected
(i, j) ∈ E   if and only if   (j, i) ∈ E

No self loops:

(i, i) ∉ E,   i ∈ V
Here: Simple, undirected graphs with no self loops!
Types of graphs
• The empty graph
• Complete graphs
• Trees/forests
• A subgraph H = (W, F ) of G = (V, E) is a graph with vertex
set W such that
W ⊆ V   and   F = E ∩ (W × W)
• Cliques (Complete subgraphs)
Labeled vs. unlabeled graphs
A graph automorphism of G = (V, E) is any one-to-one mapping
σ : V → V that preserves the graph structure, namely
(σ(i), σ(j)) ∈ E   if and only if   (i, j) ∈ E

Group Aut(G) of graph automorphisms of G
Of interest
• Connectivity and k-connectivity (with k ≥ 1)
• Number and size of components
• Isolated nodes
• Degree of a node: degree distribution/average degree,
maximal/minimal degree
• Distance between nodes (in terms of number of hops): Shortest
path, diameter, eccentricity, radius
• Small graph containment (e.g., triangles, trees, cliques, etc.)
• Clustering
• Centrality: Degree, closeness, in-betweenness
For i, j in V,

ℓij = shortest path length between nodes i and j in the graph G = (V, E)

Convention: ℓij = ∞ if nodes i and j belong to different
components, and ℓii = 0.

Average distance

ℓAvg = (1 / (|V|(|V| − 1))) Σi∈V Σj∈V ℓij

Diameter

d(G) = max (ℓij, i, j ∈ V)
Eccentricity

Ec(i) = max (ℓij, j ∈ V),   i ∈ V

Radius

rad(G) = min (Ec(i), i ∈ V)

ℓAvg ≤ d(G)   and   rad(G) ≤ d(G) ≤ 2 rad(G)
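A minimal Python sketch (not from the slides) of these quantities, computing ℓij by breadth-first search on a small hypothetical graph given as an adjacency list:

from collections import deque
from itertools import combinations

adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}   # hypothetical example

def bfs_lengths(adj, source):
    """Shortest path lengths (in hops) from `source`; unreachable -> inf."""
    dist = {v: float("inf") for v in adj}
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if dist[w] == float("inf"):
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

lengths = {i: bfs_lengths(adj, i) for i in adj}
n = len(adj)

ell_avg = sum(lengths[i][j] for i in adj for j in adj if i != j) / (n * (n - 1))
diameter = max(lengths[i][j] for i, j in combinations(adj, 2))
ecc = {i: max(lengths[i].values()) for i in adj}
radius = min(ecc.values())

print(ell_avg, diameter, ecc, radius)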
Centrality
Q: How central is a node?
Closeness centrality

g(i) = 1 / Σj∈V ℓij,   i ∈ V

Betweenness centrality

b(i) = Σk≠i, k≠j σkj(i) / σkj,   i ∈ V
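A minimal Python sketch (not from the slides) of closeness centrality on a small hypothetical graph; betweenness would additionally require counting shortest paths (e.g., with Brandes' algorithm) and is omitted here. The graph is assumed connected so that the sum of distances is finite.

from collections import deque

adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}   # hypothetical example

def hop_distances(source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

# g(i) = 1 / sum_j l_ij
closeness = {i: 1.0 / sum(d for v, d in hop_distances(i).items() if v != i)
             for i in adj}
print(closeness)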
Clustering
Clustering coefficient of node i

C(i) = ( Σj≠i, k≠i, j≠k ξij ξik ξkj ) / ( Σj≠i, k≠i, j≠k ξij ξik )

Average clustering coefficient

CAvg = (1/n) Σi∈V C(i)

C = 3 · (Number of fully connected triples) / (Number of triples)
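A minimal Python sketch (not from the slides) of C(i) and CAvg, written directly from the edge variables of a small hypothetical graph:

from itertools import permutations

V = [1, 2, 3, 4]
edges = {(1, 2), (1, 3), (2, 3), (3, 4)}   # hypothetical example

def xi(i, j):
    return int((i, j) in edges or (j, i) in edges)

def clustering(i):
    """C(i): fraction of (ordered) pairs of neighbours of i that are linked."""
    num = sum(xi(i, j) * xi(i, k) * xi(k, j)
              for j, k in permutations(V, 2) if j != i and k != i)
    den = sum(xi(i, j) * xi(i, k)
              for j, k in permutations(V, 2) if j != i and k != i)
    return num / den if den else 0.0

C_avg = sum(clustering(i) for i in V) / len(V)
print({i: clustering(i) for i in V}, C_avg)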
Random graphs
Random graphs?
G(V) ≡ Collection of all (simple, undirected, free of self-loops)
graphs with vertex set V.

Definition – Given a probability triple (Ω, F, P), a random
graph is simply a graph-valued rv G : Ω → G(V).

Modeling – We need only specify the pmf

{P [G = G],   G ∈ G(V)}.
Many, many ways to do that!
Equivalent representations for G
Set-theoretic – Link assignment rvs {ξij, i, j ∈ V} with

ξij = 1 if (i, j) ∈ E(G),   ξij = 0 if (i, j) ∉ E(G)

Algebraic – Random adjacency matrix A = (Aij) with

Aij = 1 if (i, j) ∈ E(G),   Aij = 0 if (i, j) ∉ E(G)
Why random graphs?
Useful models in many applications to capture binary relationships
between participating entities
Because
|G(V)| = 2^(|V|(|V|−1)/2) ≈ 2^(|V|²/2)

is a very large number, there is a need to identify/discover typicality!
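A minimal Python sketch (not from the slides) that makes the count concrete: for a tiny vertex set every subset of the possible edges is a graph, and enumeration already checks the formula 2^(n(n−1)/2). For n = 10 the count is 2^45, about 3.5 × 10^13, so enumeration is hopeless beyond very small n.

from itertools import combinations, chain

n = 4
pairs = list(combinations(range(1, n + 1), 2))   # possible undirected edges

def all_graphs():
    # every subset of the possible edges is a graph on {1, ..., n}
    return chain.from_iterable(combinations(pairs, k)
                               for k in range(len(pairs) + 1))

count = sum(1 for _ in all_graphs())
print(count, 2 ** (n * (n - 1) // 2))            # both equal 64 for n = 4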
Scaling laws – Zero-one laws as |V| becomes large, e.g.,

V ≡ Vn = {1, . . . , n}   (n → ∞)
Ménagerie of random graphs
• Erdős-Rényi graphs G(n; m)
• Erdős-Rényi graphs G(n; p)
• Generalized Erdős-Rényi graphs
• Geometric random models/disk models
• Intrinsic fitness and threshold random models
• Random intersection graphs
• Growth models: Preferential attachment, copying
• Small worlds
• Exponential random graphs
• Etc
Erdős-Rényi graphs G(n; m)

With

1 ≤ m ≤ (n choose 2) = n(n − 1)/2,

the pmf on G(Vn) is specified by

P [G(n; m) = G] = u(n; m)^(−1) if |E(G)| = 2m,   and   0 if |E(G)| ≠ 2m,

(here |E(G)| = 2m because E(G) ⊆ V × V contains both (i, j) and (j, i) for
each of the m edges), where

u(n; m) = (n(n−1)/2 choose m)

Uniform selection over the collection of all graphs on the vertex
set {1, . . . , n} with exactly m edges
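A minimal Python sketch (not from the slides) of this uniform selection: draw m of the n(n−1)/2 possible edges uniformly at random, without replacement.

import random
from itertools import combinations

def sample_gnm(n, m, rng=random):
    possible = list(combinations(range(1, n + 1), 2))
    return set(rng.sample(possible, m))

print(sample_gnm(6, 4))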
Erdős-Rényi graphs G(n; p)

With

0 ≤ p ≤ 1,

the link assignment rvs {χij(p), 1 ≤ i < j ≤ n} are i.i.d.
{0, 1}-valued rvs with

P [χij(p) = 1] = 1 − P [χij(p) = 0] = p,   1 ≤ i < j ≤ n

For every G in G(V),

P [G(n; p) = G] = p^(|E(G)|/2) · (1 − p)^(n(n−1)/2 − |E(G)|/2)
Related to, but easier to implement than G(n; m)
Similar behavior/results under the matching condition

|E(G(n; m))| = E [|E(G(n; p))|],

namely

m = (n(n − 1)/2) p
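A minimal Python sketch (not from the slides) of G(n; p): flip an independent coin with bias p for each of the n(n−1)/2 possible edges. The printout compares the realized edge count with the matching value p n(n−1)/2.

import random
from itertools import combinations

def sample_gnp(n, p, rng=random):
    return {(i, j) for i, j in combinations(range(1, n + 1), 2)
            if rng.random() < p}

print(len(sample_gnp(100, 0.05)), 0.05 * 100 * 99 / 2)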
Generalized Erdős-Rényi graphs

With

0 ≤ pij ≤ 1,   1 ≤ i < j ≤ n,

the link assignment rvs {χij(pij), 1 ≤ i < j ≤ n} are mutually
independent {0, 1}-valued rvs with

P [χij(pij) = 1] = 1 − P [χij(pij) = 0] = pij,   1 ≤ i < j ≤ n

An important case: With positive weights w1, . . . , wn,

pij = wi wj / W   with   W = w1 + . . . + wn
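A minimal Python sketch (not from the slides) of the weight-based special case with pij = wi wj / W. The min(1, ·) clamp is an assumption added here, needed only when some product wi wj exceeds W.

import random
from itertools import combinations

def sample_weighted(weights, rng=random):
    W = sum(weights)
    n = len(weights)
    return {(i, j) for i, j in combinations(range(n), 2)
            if rng.random() < min(1.0, weights[i] * weights[j] / W)}

print(sample_weighted([1.0, 2.0, 3.0, 4.0]))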
Geometric random graphs (d ≥ 1)
With random locations in R^d at

X1, . . . , Xn,

the link assignment rvs {χij(ρ), 1 ≤ i < j ≤ n} are given by

χij(ρ) = 1 [ ‖Xi − Xj‖ ≤ ρ ],   1 ≤ i < j ≤ n,

where ρ > 0.
Usually, the rvs X1, . . . , Xn are taken to be i.i.d. rvs uniformly
distributed over some compact subset Γ ⊆ R^d

Even then, it is not so obvious to write

P [G(n; ρ) = G],   G ∈ G(V),

since the rvs {χij(ρ), 1 ≤ i < j ≤ n} are no longer i.i.d. rvs

For d = 2, long history of modeling wireless networks (known as
the disk model), where ρ is interpreted as the transmission range
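A minimal Python sketch (not from the slides) of the disk model with d = 2 and n i.i.d. points uniform on the unit square, connected whenever their distance is at most ρ:

import random
from itertools import combinations

def sample_rgg(n, rho, rng=random):
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    edges = {(i, j) for i, j in combinations(range(n), 2)
             if (pts[i][0] - pts[j][0]) ** 2
              + (pts[i][1] - pts[j][1]) ** 2 <= rho ** 2}
    return pts, edges

pts, edges = sample_rgg(20, 0.3)
print(len(edges))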
Threshold random graphs
Given R+-valued i.i.d. rvs W1, . . . , Wn with absolutely continuous
probability distribution function F,

i ∼ j   if and only if   Wi + Wj > θ   for some θ > 0

Generalizations:

i ∼ j   if and only if   R(Wi, Wj) > θ

for some symmetric mapping R : R+ × R+ → R+.
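A minimal Python sketch (not from the slides) of a threshold random graph. The exponential fitness distribution is an assumption made here; any absolutely continuous distribution on R+ fits the model.

import random
from itertools import combinations

def sample_threshold(n, theta, rng=random):
    w = [rng.expovariate(1.0) for _ in range(n)]   # assumed fitness distribution
    return w, {(i, j) for i, j in combinations(range(n), 2)
               if w[i] + w[j] > theta}

w, edges = sample_threshold(10, theta=2.0)
print(len(edges))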
Random intersection graphs
Given a finite set W ≡ {1, . . . , W} of features, with random
subsets K1, . . . , Kn of W,

i ∼ j   if and only if   Ki ∩ Kj ≠ ∅
Co-authorship networks, random key distribution schemes,
classification/clustering
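A minimal Python sketch (not from the slides) of a random intersection graph. The rule used here, each node independently picking a uniformly random K-subset of the W features, is one possible assignment (as in random key distribution schemes) and is an assumption; other feature assignments fit the model.

import random
from itertools import combinations

def sample_intersection(n, W, K, rng=random):
    feats = [set(rng.sample(range(W), K)) for _ in range(n)]
    return {(i, j) for i, j in combinations(range(n), 2)
            if feats[i] & feats[j]}

print(sample_intersection(10, W=20, K=3))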
Growth models
{Gt, t = 0, 1, . . .}   with rules   Vt+1 ← Vt   and   Gt+1 ← (Gt, Vt+1)
• Preferential attachment
• Copying
Scale-free networks
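A minimal Python sketch (not from the slides) of one growth rule of this type: preferential attachment in which each new node attaches a single edge to an existing node chosen with probability proportional to its degree (Barabási-Albert style; the one-edge-per-step choice is an assumption made for brevity).

import random

def preferential_attachment(T, rng=random):
    edges = [(0, 1)]                 # G_0: two nodes joined by an edge
    targets = [0, 1]                 # each node appears once per unit of degree
    for t in range(2, T + 2):
        target = rng.choice(targets)   # degree-proportional choice
        edges.append((target, t))
        targets.extend([target, t])
    return edges

print(preferential_attachment(10))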
Small worlds
• Between randomness and order
• Shortcuts
• Short paths but high clustering
Milgram’s experiment and six degrees of separation
Exponential random graphs
• Models favored by sociologists and statisticians
• Graph analog of exponential families often used in statistical
modeling
• Related to Markov random fields
With I parameters

θ = (θ1, . . . , θI)

and a set of I observables (statistics)

ui : G(V) → R+,

we postulate

P [G = G] = e^(θ1 u1(G) + . . . + θI uI(G)) / Z(θ),   G ∈ G(V)

with normalization constant

Z(θ) = ΣG∈G(V) e^(θ1 u1(G) + . . . + θI uI(G))
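A minimal Python sketch (not from the slides) evaluating this pmf by brute-force enumeration; the tiny n and the single statistic u1(G) = number of edges are assumptions made for illustration.

from itertools import combinations, chain
from math import exp

n, theta = 4, 0.5
pairs = list(combinations(range(n), 2))
graphs = list(chain.from_iterable(combinations(pairs, k)
                                  for k in range(len(pairs) + 1)))

weights = [exp(theta * len(G)) for G in graphs]   # exp(theta_1 * u_1(G))
Z = sum(weights)                                  # normalization constant Z(theta)
pmf = [w / Z for w in weights]
print(len(graphs), sum(pmf))                      # 64 graphs, probabilities sum to 1

With this particular edge-count statistic the weight factorizes over edges, so the family reduces to an Erdős-Rényi graph G(n; p) with p = e^θ / (1 + e^θ); richer statistics (triangles, stars, etc.) break this independence.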
In sum
• Many different ways to specify the pmf on G(V )
– Local description vs. global representation
– Static vs. dynamic
– Application-dependent mechanisms