RANDOM GRAPH MODELS
Changiz Eslahchi
Computer Science
Shahid Beheshti University
Workshop on Systems Biology
June 2014
1
Outline
●
Complexity and Networks
●
Definitions
●
Random Graphs
●
Classical Models
●
Configuration Model
●
Small-World Model
●
Geometric Model
●
Scale-Free Model
2
COMPLEXITY
●
“Complexity, a scientific theory which asserts that some
systems display behavioral phenomena that are completely
inexplicable by any conventional analysis of the systems'
constituent parts. These phenomena, commonly referred to as
emergent behavior, seem to occur in many complex systems
involving living organisms, such as a stock market or the
human brain” – John L. Casti, Encyclopedia Britannica
●
“I think the next century will be the century of complexity” –
Stephen Hawking, Jan 23, 2000
3
COMPLEX SYSTEMS AND
NETWORKS
●
Complex system: A group or organization made up of many
interacting parts
●
We can model a Complex System as a Network, each part as a node,
and links between them if they interact
●
Examples:
–
World Trading Network
–
Power Grid Networks
–
Social Networks
–
Biological Networks: PPI Networks, Metabolic Networks, Signaling
Networks, etc.
4
COMPLEX SYSTEMS AND
NETWORKS
●
“It is increasingly clear that we will never understand complex
systems unless we gain a deep understanding of the networks
behind them.” – Albert-László Barabási, Network Science
●
Network Science draws on theories and methods including:
–
Graph theory from mathematics
–
Statistical mechanics from physics
–
Data mining and information visualization from computer science
–
Inferential models from statistics
–
Social structures from sociology
5
COMPLEX SYSTEMS AND
NETWORKS
●
Hawking was right!
6
COMPLEX SYSTEMS AND
NETWORKS
●
Why Now?
–
Data availability
●
World Wide Web network, 1999
●
C. elegans neural wiring network, 1990
●
Metabolic network, 2000
●
PPI network, 2001
–
Universality
●
The architecture of networks emerging in various domains of science, nature, and
technology is more similar across domains than one would have expected
–
The need to understand complexity
●
Networks are not only essential for this journey; during the past decade some of
the most important advances toward understanding complexity were made in the
context of network theory.
7
NETWORK SCIENCE
IN REAL WORLD
●
The threat of global pandemic
–
2003: SARS outbreak
–
2009: H1N1 pandemic
8
NETWORK SCIENCE
IN REAL WORLD
●
The Northeast blackout of 2003
–
Affected 10 million people in Ontario and 45 million
in eight U.S. states.
9
NETWORK SCIENCE
IN REAL WORLD
●
The human brain has on the order of 10-100 billion neurons
–
How to model and analyze it?
10
MODELING COMPLEX NETWORKS
●
Real-world complex networks contain an extremely large
number of interacting parts.
–
Model each part as a node in a graph
–
Put an edge between two nodes if their corresponding
parts interact.
●
Question: How should edges be placed in order to model
the real-world complex network at hand?
●
Question: How well does a model fit the real-world network?
11
DEFINITIONS
●
The neighborhood Ni for a node vi is defined as its immediately connected
neighbors as follows:
N_i = { v_j : e_ij ∈ E ∧ e_ji ∈ E }
●
Walk:
–
A walk from node i to node j is an alternating sequence of nodes and
edges that begins with i and ends with j.
–
The length of the walk is defined as the number of edges in the
sequence.
–
A path is a walk in which no node is visited more than once.
–
The walk with minimal length between two nodes is known as shortest
path or geodesic.
12
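The geodesic defined above can be computed with a breadth-first search, which visits nodes in order of increasing distance. A minimal sketch in plain Python, assuming graphs are stored as adjacency dicts (node → set of neighbors), a representation chosen here purely for illustration:

```python
from collections import deque

def geodesic_length(adj, i, j):
    """Length of a shortest path (geodesic) from i to j via BFS.

    adj maps each node to the set of its neighbors; returns None
    if j is unreachable from i.
    """
    if i == j:
        return 0
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                if v == j:
                    return dist[v]
                queue.append(v)
    return None

# A 5-node path graph 0-1-2-3-4: the geodesic from 0 to 4 has length 4
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(geodesic_length(path, 0, 4))  # → 4
```

BFS runs in O(n + m) time per source, so all-pairs quantities such as the diameter cost O(n(n + m)) on sparse graphs.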
DEFINITIONS
●
Connectivity:
–
A graph is said to be connected if for every pair of
distinct nodes i and j, there is a path from i to j.
Otherwise it is said disconnected or unconnected.
–
A component of the graph is a maximally
connected induced subgraph.
–
A giant component is a component whose size is
of the same order as the number of vertices.
13
NETWORK PROPERTIES
●
Degree Distribution, P(k):
–
The probability that a node chosen uniformly at
random has degree k
–
The fraction of nodes in the graph having degree k
●
Mean degree:
⟨k⟩ = Σ_k k P(k)
14
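Both quantities on this slide can be read directly off a stored graph. A small sketch (the adjacency-dict representation is an illustrative assumption, not part of the slides):

```python
from collections import Counter

def degree_distribution(adj):
    """Empirical P(k): fraction of nodes with degree k."""
    n = len(adj)
    counts = Counter(len(neigh) for neigh in adj.values())
    return {k: c / n for k, c in counts.items()}

# Star graph on 4 nodes: center 0 linked to 1, 2, 3
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
pk = degree_distribution(star)               # {3: 0.25, 1: 0.75}
mean_k = sum(k * p for k, p in pk.items())   # ⟨k⟩ = Σ_k k P(k)
print(pk, mean_k)                            # ⟨k⟩ = 1.5
```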
NETWORK PROPERTIES
●
The degree distribution completely determines the statistical
properties of uncorrelated networks.
●
As we will see later in this presentation, a large number of
real-world networks are correlated in the sense that the
probability that a node of degree k' is connected to another
node of degree, say k, depends on k.
●
Conditional Probability P(k'|k) :
–
The probability that a link from a node with degree k points
to a node of degree k'
16
NETWORK PROPERTIES
●
Average nearest neighbors degree
–
The average degree of the nearest neighbors of
nodes with degree k can be defined as:
k_nn(k) = Σ_{k'} k' P(k'|k)
–
Correlated graphs are classified as assortative if knn(k)
is an increasing function of k, whereas they are
referred to as disassortative when knn(k) is a
decreasing function of k.
17
NETWORK PROPERTIES
●
In assortative networks the nodes tend to
connect to their connectivity peers
●
In disassortative networks nodes with low
degree are more likely connected with highly
connected ones.
18
NETWORK PROPERTIES
●
Diameter:
–
The longest of all shortest paths.
●
Average Shortest Path:
–
The average of all shortest paths between all pairs of vertices.
●
Global Efficiency:
–
The inverse of the average shortest path.
–
Such a quantity is an indicator of the traffic capacity of a
network.
19
NETWORK PROPERTIES
●
Node Centrality:
–
Refers to the relative importance of a node or vertex within the
network.
–
Betweenness Centrality:
●
The betweenness centrality of a particular vertex is the fraction of
shortest paths in the network that pass through this vertex.
–
Closeness Centrality:
●
The inverse of the average distance from all other nodes. Distance
is defined as the length of shortest path between two nodes.
20
NETWORK PROPERTIES
●
Clustering (transitivity):
–
Two individuals with a common friend are likely
to know each other.
–
Transitivity measures the presence of a high
number of triangles in a graph.
T = (3 × number of triangles in G) / (number of connected triples of vertices in G)
21
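The transitivity formula above can be sketched directly: count connected triples centered at each vertex, and count triangles by checking which neighbor pairs are themselves linked (adjacency-dict representation assumed for illustration):

```python
from itertools import combinations

def transitivity(adj):
    """T = 3 × (# triangles) / (# connected triples)."""
    triangles = 0
    triples = 0
    for v, neigh in adj.items():
        k = len(neigh)
        triples += k * (k - 1) // 2            # connected triples centered at v
        for a, b in combinations(neigh, 2):
            if b in adj[a]:
                triangles += 1                 # each triangle counted 3x, once
                                               # per center, matching the 3x above
    return triangles / triples if triples else 0.0

# Triangle 0-1-2 plus a pendant node 3 attached to 2:
# 1 triangle, 5 connected triples → T = 3/5
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(transitivity(g))  # → 0.6
```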
NETWORK PROPERTIES
●
Local Clustering Coefficient:
–
It quantifies how close a vertex and its neighbors
are to forming a clique.
–
The local clustering coefficient Ci for a node vi is then
given by the proportion of links between the nodes
within its neighborhood divided by the number of links
that could possibly exist between them.
C_i = |{ e_jk : v_j, v_k ∈ N_i, e_jk ∈ E }| / ( k_i (k_i − 1) )
22
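The local coefficient C_i, and the global coefficient ⟨C_i⟩ of the next slide, can be sketched as follows. Note one assumption: the slide's denominator k_i(k_i − 1) counts ordered neighbor pairs; for an undirected graph where each link is stored once, the equivalent form 2·links / (k_i(k_i − 1)) is used here:

```python
from itertools import combinations

def local_clustering(adj, i):
    """C_i for an undirected graph: fraction of the possible links
    among the neighbors of i that are actually present."""
    neigh = adj[i]
    k = len(neigh)
    if k < 2:
        return 0.0          # convention: C_i = 0 for degree < 2
    links = sum(1 for a, b in combinations(neigh, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

# Triangle 0-1-2 plus a pendant node 3 attached to 2
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(local_clustering(g, 2))   # neighbors {0,1,3}: 1 of 3 possible links → 1/3

# Global clustering coefficient: the average of C_i over all nodes
mean_C = sum(local_clustering(g, v) for v in g) / len(g)
```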
NETWORK PROPERTIES
●
Global Clustering Coefficient:
–
The average of the local clustering coefficients of
all nodes in the graph:
C = ⟨C_i⟩ = (1/n) Σ_i C_i
●
Clustering Coefficient of a connectivity class k:
–
The average of Ci taken over all nodes with a given
degree k.
23
NETWORK PROPERTIES
●
Randomized version of a Graph:
–
Given a graph G, the randomized version of G is a
graph with the same number of nodes, links and the
same degree distribution as G but where links are
distributed at random.
●
Motifs:
–
A motif M is a small connected graph occurring in the
graph G at a number significantly higher than in
randomized versions of the graph.
24
NETWORK PROPERTIES
●
Motifs:
–
The statistical significance of M is described by the
Z-score defined as:
Z_M = ( n_M − ⟨n_M^rand⟩ ) / σ_{n_M}^rand
where n_M is the number of times subgraph M appears in G,
and ⟨n_M^rand⟩ and σ_{n_M}^rand are, respectively, the mean and
standard deviation of the number of appearances in the
randomized network ensemble.
25
RANDOM GRAPHS
●
The notion of a random graph originated in a paper
of Erdős (1947) to prove the existence of a
graph with a specific Ramsey property.
26
RANDOM GRAPHS
●
Random Graph can be described as the
probability space (Ω,F,P) where Ω is the set of all
graphs with vertex set [n] = {1,2,...,n}, F is the
family of all subsets of Ω and for every G∈Ω, P(G)
is the probability of the outcome G.
27
CLASSICAL MODELS
●
Binomial random graph:
–
G(n,p) is defined by taking Ω the set of all graphs
on the vertex set [n] and setting
P(G) = p^{e_G} (1 − p)^{C(n,2) − e_G}
where e_G = |E(G)| stands for the number of edges
of G, 0 ≤ p ≤ 1, and C(n,2) = n(n−1)/2 is the binomial coefficient.
28
CLASSICAL MODELS
●
Binomial random graph:
–
G(n,p) can be viewed as the result of C(n,2)
independent coin flips, one for each pair
of vertices, with the probability of success
equal to p
29
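The coin-flip view translates directly into a sampler: one biased coin flip per vertex pair. A minimal sketch (adjacency-dict representation assumed for illustration):

```python
import random
from itertools import combinations

def gnp(n, p, seed=None):
    """Sample a G(n,p) graph: one coin flip of success
    probability p for each of the C(n,2) vertex pairs."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

g = gnp(200, 0.05, seed=42)
m = sum(len(nb) for nb in g.values()) // 2
print(m)  # expected ⟨m⟩ = C(200,2) · 0.05 = 995
```

This direct method costs O(n²) flips; for sparse graphs (small p) one can instead skip ahead geometrically between successes, but the version above matches the definition on the slide.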
CLASSICAL MODEL PROPERTIES
●
The total probability of drawing a graph with m
edges from the G(n,p) ensemble is:
P(m) = C( C(n,2), m ) p^m (1 − p)^{C(n,2) − m}
●
The mean value can be derived using the binomial
theorem:
⟨m⟩ = Σ_{m=0}^{C(n,2)} m P(m) = C(n,2) p
30
CLASSICAL MODEL PROPERTIES
●
For any network with m edges and n nodes, the
mean degree of a node is ⟨k⟩ = 2m/n. Thus, the
mean degree (sometimes denoted c) in
G(n,p) is:
⟨k⟩ = Σ_{m=0}^{C(n,2)} (2m/n) P(m) = (2/n) C(n,2) p = (n − 1) p
34
CLASSICAL MODEL PROPERTIES
●
For a network of n vertices without
self-connections the maximal possible vertex
degree is n − 1
●
So, the probability that a randomly chosen
vertex has degree k is:
P(k) = C(n−1, k) p^k (1 − p)^{n−1−k}
35
CLASSICAL MODEL PROPERTIES
P(k) = C(n−1, k) p^k (1 − p)^{n−1−k}
●
Let p = c/(n−1):
ln[(1 − p)^{n−1−k}] = (n−1−k) ln(1 − c/(n−1))
≃ −(n−1−k) c/(n−1)
≃ −c
●
So:
(1 − p)^{n−1−k} ≃ e^{−c}
36
CLASSICAL MODEL PROPERTIES
P(k) = C(n−1, k) p^k (1 − p)^{n−1−k}
●
Thus:
P(k) ≃ C(n−1, k) p^k e^{−c} ≃ (c^k / k!) e^{−c}
37
CLASSICAL MODEL PROPERTIES
●
Thus, as the number of nodes in G(n,p) tends to infinity, the degree
distribution P(k) approaches a Poisson distribution with mean and variance c:
P(k) ≃ (c^k / k!) e^{−c}
38
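The limit above can be checked numerically: for large n, the exact binomial degree distribution of G(n,p) is already nearly indistinguishable from the Poisson pmf. A small deterministic sketch (the values n = 10000 and c = 5 are arbitrary illustrations):

```python
from math import comb, exp, factorial

def binom_pk(n, p, k):
    """Exact G(n,p) degree distribution: C(n-1,k) p^k (1-p)^(n-1-k)."""
    return comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)

def poisson_pk(c, k):
    """Poisson limit: (c^k / k!) e^(-c)."""
    return c**k / factorial(k) * exp(-c)

n, c = 10_000, 5.0
p = c / (n - 1)          # so that the mean degree (n-1)p equals c
for k in (0, 5, 10):
    print(k, binom_pk(n, p, k), poisson_pk(c, k))  # agree to ~3 decimals
```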
CLASSICAL MODEL PROPERTIES
●
Clustering Coefficient:
C = (3 × number of triangles) / (number of connected triples)
= (3 C(n,3) p^3) / (3 C(n,3) p^2) = p = c/(n−1)
●
This implies that C = O(1/n) for large graphs. It means the
density of triangles in these graphs decays toward zero. The
result is that G(n,p) graphs are locally tree-like.
39
CLASSICAL MODEL PROPERTIES
●
The giant component:
–
The G(n,p) model exhibits one extremely interesting property: the
sudden appearance, as we vary the mean degree c, of a giant
component. For c < 1 all components are of size O(log n), while
for c > 1 a giant component of size O(n) emerges.
●
This sudden appearance is called a phase transition.
40
CLASSICAL MODEL PROPERTIES
●
Small diameter:
–
The size of the giant component is O(n).
–
The structure of the giant component is locally
tree-like as we have seen.
–
We conclude that with high probability, the giant
component has a depth O(log n), which will be the
diameter of the network.
42
CLASSICAL MODELS' DRAWBACKS
●
Most real-world networks:
–
Have heavy-tailed degree distribution
–
Significantly higher clustering coefficient than
that of classical models
●
We need more accurate models to
analyze real-world networks.
43
CONFIGURATION MODEL
●
We can improve classical models by using a
generalization called the configuration model.
●
In this generalization we generate a graph G of
size n, given a degree sequence:
k⃗ = (k_i), i ∈ {1, 2, ..., n}
●
The degree sequence k⃗ can be any sequence so
long as Σ_i k_i is an even integer.
44
CONFIGURATION MODEL
●
Choose k⃗ to be a sequence of values drawn independently from some degree
distribution P(k).
●
Let v be an array of length 2m and let us write the index of each vertex i exactly ki
times in the vector v. Each of these entries will represent a single edge stub
attached to vertex i.
●
Take a random permutation of the entries of v and read the contents of the array
in order, in pairs. For each pair, add the corresponding edge to the graph.
46
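The stub-matching procedure above can be sketched in a few lines: list each vertex index once per stub, shuffle, and read off consecutive pairs as edges. As the next slide notes, self-loops and multi-edges may appear; this sketch keeps them:

```python
import random

def configuration_model(degrees, seed=None):
    """Stub matching for a given degree sequence.
    Returns an edge list that may contain self-loops and multi-edges."""
    assert sum(degrees) % 2 == 0, "degree sum must be even"
    rng = random.Random(seed)
    # the array v: vertex i written exactly degrees[i] times (its stubs)
    stubs = [i for i, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)                  # random permutation of the stubs
    # read the shuffled array in consecutive pairs
    return [(stubs[j], stubs[j + 1]) for j in range(0, len(stubs), 2)]

edges = configuration_model([3, 2, 2, 1], seed=1)
print(edges)  # 4 edges; vertex i appears in exactly degrees[i] endpoints
```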
CONFIGURATION MODEL
●
The matching can include self-loops and multi-edges, yielding
non-simple graphs, which are undesirable in most applications.
●
We can either:
–
Reject configurations that turn out to be non-simple graphs.
–
Perform an additional test whenever two stubs are matched and discard
inadmissible matchings.
●
In either case, the uniformity of sampling is not guaranteed
anymore which may be important in some applications.
47
CONFIGURATION MODEL
APPLICATION
●
Shen-Orr et al. introduced a method for computing the statistical
significance of a subgraph in a network as follows:
–
Given a graph G and a network pattern ζ, count the number of occurrences NG(ζ) of
this pattern in the whole graph.
–
Build a set H of graphs with the same degree sequence as G but otherwise randomly
distributed edges.
–
Count the number of occurrences of this pattern for all graphs G' ∈ H and compute
the fraction p of graphs in which the number of occurrences of this pattern is at least
as large as in the original graph G.
●
The resulting fraction is an empirical approximation of the true p-value. A low
p-value implies that the observed frequency of ζ is unlikely to be explained
by the degree structure of the data alone.
49
SMALL-WORLD RANDOM GRAPHS
●
Milgram's “Small World Experiments” in 1967
●
Small-World Effect is the observation that one can find a
short chain of acquaintances, often of no more than a
handful of individuals, connecting almost any two
people on the planet.
●
Duncan Watts et al. conducted the first large-scale
replication of Milgram's experiment involving 24,163
email chains and 18 targets around the world. They got
the same results!
50
SMALL-WORLD RANDOM GRAPHS
●
In 1998, Duncan J. Watts and Steven Strogatz
published the first network model on small-world
phenomenon.
●
Their model has both the small diameter seen in
classical random graphs ( O(log n) ) and a high
clustering coefficient, in contrast to the near-zero
clustering coefficient of classical random graphs.
51
SMALL-WORLD RANDOM GRAPHS
●
In this model, n vertices are arranged on a
1-dimensional circular lattice (a “ring” network) and
each vertex is connected to its k nearest neighbors.
●
For each lattice edge (i, j), we then “rewire” it
with probability p.
–
Rewire: Change j to k, where k is chosen
uniformly at random from among all vertices.
52
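The construction and rewiring steps above can be sketched as follows. One assumption made here beyond the slide: rewirings that would create a self-loop or a duplicate edge are simply skipped, a common implementation choice:

```python
import random

def watts_strogatz(n, k, p, seed=None):
    """Ring lattice of n nodes, each tied to its k nearest neighbors
    (k even), then each lattice edge rewired with probability p
    to a uniformly random new endpoint."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    # step 1: the circular lattice, linking v to its k/2 neighbors each side
    for v in range(n):
        for d in range(1, k // 2 + 1):
            u = (v + d) % n
            adj[v].add(u)
            adj[u].add(v)
    # step 2: rewire each original lattice edge with probability p
    for v in range(n):
        for d in range(1, k // 2 + 1):
            u = (v + d) % n
            if u in adj[v] and rng.random() < p:
                w = rng.randrange(n)
                if w != v and w not in adj[v]:   # skip self-loops / multi-edges
                    adj[v].remove(u)
                    adj[u].remove(v)
                    adj[v].add(w)
                    adj[w].add(v)
    return adj

lattice = watts_strogatz(20, 4, 0.0, seed=0)      # p = 0: pure ring lattice
print(all(len(nb) == 4 for nb in lattice.values()))  # → True
```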
SMALL-WORLD RANDOM GRAPHS
●
In this model as p → 1, all edges are rewired, and we
have a classic random graph, with no clustering but
small diameter. When p → 0, we have the original
lattice, with high clustering but large diameter.
53
SMALL-WORLD RANDOM GRAPHS
●
Interesting observation:
–
The Small-World model shows that some measures of network structure
are extremely sensitive to small structural perturbations.
–
Example: With less than 1% of the original edges rewired, the
diameter has already fallen to roughly 20% of its original value,
while the clustering coefficient has barely budged.
[Figure: L = mean geodesic path length normalized by the original
path length L(0); C = clustering coefficient normalized by the
original clustering coefficient C(0).]
54
GEOMETRIC RANDOM GRAPHS
●
A geometric graph G(V,r) with radius r is a graph with node set V of
points in a metric space and edge set
E ={( u , v )∣( u , v ∈V ) ∧( 0<∣u−v∣⩽r ) }
where | . | is an arbitrary distance norm in this space.
●
That is, points in a metric space correspond to nodes, and two
nodes are adjacent if the distance between them is at most r.
●
A random geometric graph G(n,r) is a geometric graph with n nodes
which correspond to n independently and uniformly randomly
distributed points in a metric space.
55
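The definition of G(n,r) above can be sketched directly: scatter n uniform points and link pairs within distance r. The unit square with the Euclidean norm is an illustrative choice of metric space, not part of the general definition:

```python
import random
from itertools import combinations
from math import dist

def random_geometric_graph(n, r, seed=None):
    """n points placed uniformly at random in the unit square;
    two nodes are adjacent iff their Euclidean distance is at most r."""
    rng = random.Random(seed)
    pos = {v: (rng.random(), rng.random()) for v in range(n)}
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if dist(pos[u], pos[v]) <= r:
            adj[u].add(v)
            adj[v].add(u)
    return pos, adj

# r ≥ √2 covers the whole unit square, so the graph is complete
_, g = random_geometric_graph(30, 1.5, seed=0)
print(all(len(nb) == 29 for nb in g.values()))  # → True
```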
PREFERENTIAL ATTACHMENT
●
In studies of networks of citations between scientific
papers, Derek de Solla Price in 1965 showed that the
number of links to papers had a heavy-tailed distribution,
following a Pareto distribution, or power law.
●
Price also proposed a mechanism to explain the occurrence
of power laws in citation networks.
●
In 1999, Albert-László Barabási and collaborators coined
the term “scale-free network” to describe the class of
networks that exhibit a power-law degree distribution
57
PREFERENTIAL ATTACHMENT
●
Barabasi and Albert proposed a generative mechanism to
explain the appearance of power-law distributions, which
they called “preferential attachment” and which is
essentially the same as that proposed by Price.
●
Their paper was the most cited Science paper of 1999; it was
highlighted by ISI as one of the ten most cited papers in
physics in the decade after its publication.
58
PREFERENTIAL ATTACHMENT
●
There are two ingredients needed to build up a scale-free model:
1. Adding or removing nodes. Usually we concentrate on growing
the network, i.e. adding nodes.
2. Preferential Attachment: the probability P that the new node
will be connected to each of the old nodes.
●
The Barabási-Albert model assumes that the probability P
that a new node attaches to node i is proportional to the
degree k_i of node i.
59
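The two ingredients above can be sketched as follows. The degree-proportional choice uses a standard trick: keep a list in which each node appears once per unit of degree, so a uniform draw from the list is a preferential draw. The clique seed of m+1 nodes is an illustrative initialization choice:

```python
import random

def barabasi_albert(n, m, seed=None):
    """Growth + preferential attachment: start from a small clique,
    then each new node attaches to m distinct existing nodes chosen
    with probability proportional to their current degree."""
    rng = random.Random(seed)
    # seed graph: a clique on m+1 nodes so every node has degree >= m
    adj = {v: {u for u in range(m + 1) if u != v} for v in range(m + 1)}
    # repeated-endpoint list: node i appears deg(i) times, so a
    # uniform draw implements degree-proportional attachment
    targets = [v for v, nb in adj.items() for _ in nb]
    for v in range(m + 1, n):
        adj[v] = set()
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for u in chosen:
            adj[v].add(u)
            adj[u].add(v)
            targets += [u, v]       # both endpoints gained one degree
    return adj

g = barabasi_albert(1000, 2, seed=7)
degs = sorted((len(nb) for nb in g.values()), reverse=True)
print(degs[:5])  # heavy tail: a few hubs far above the mean degree ≈ 4
```

Unlike the Poisson-distributed degrees of G(n,p), the resulting degree sequence is heavy-tailed, which is the power-law behavior the model was built to explain.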
Thank you!
60