p.58-61

12 Random Graphs
In this section we present the most standard random network model, the Erdős–Rényi graph (introduced in 1959 by Paul Erdős and Alfréd Rényi).
We consider simple graphs G on n vertices, i.e. a set of n points and some edges among them. The actual geometric realization
is unimportant, only the connectivity structure among the n points (expressed by the edges) is relevant. For example, the vertices can
represent students in a university and two vertices are connected if the two students are friends (assuming that this relation is symmetric:
Paul is a friend of Peter if and only if Peter is a friend of Paul – think of Facebook). The adjective simple refers to the fact that there are no multiple
edges between the same vertices. The set of vertices is denoted by V , and the set of edges by E. One can think of E as a subset of
unordered pairs of elements of V :
$$E \subset \big\{ \{i, j\} : i, j \in V,\ i \neq j \big\} \qquad (47)$$
in a natural way.
Two vertices are called adjacent if they are connected by an edge. A connected component of the graph $G$ is a maximal collection of vertices $V_1 \subset V$ such that between any two vertices $v, w \in V_1$ there is a path, i.e. a sequence of vertices
$$v = v_1, v_2, \ldots, v_{k-1}, v_k = w$$
such that $v_i$ and $v_{i+1}$ are adjacent for every $i = 1, 2, \ldots, k-1$. In other words, any two vertices in $V_1$ are connected (maybe not adjacent, of course). The word "maximal" refers to the requirement that one cannot add further vertices to $V_1$ and keep the connectivity property; in other words, there are no edges connecting an element of $V_1$ with the complement set $V \setminus V_1$. The degree of a vertex $v$, denoted by $d(v)$, is the number of vertices adjacent to $v$ (in other words, the number of edges incident to $v$).
On $n$ (labelled) vertices there are altogether $2^{\binom{n}{2}}$ graphs, since there are altogether $\binom{n}{2}$ unordered pairs formed from elements of $V$ (the cardinality of the set on the right hand side of (47)), and a graph is uniquely identified by its edge set, which can be any subset of the set of vertex pairs. We now consider a random graph on $n$ fixed vertices. There are many possible distributions one can put on the set of all graphs on $n$ vertices, but one of the most natural is the "most random" choice, where each edge is chosen independently with an identical Bernoulli distribution. More precisely, we fix a parameter $0 < p < 1$ and select a random subset $E$ of the set in (47) by independent Bernoulli($p$) trials, one for each pair. This distribution on graphs is called the Erdős–Rényi random graph and is denoted by $G(n, p)$, indicating the two parameters.
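The definition translates directly into a sampler: run one Bernoulli($p$) trial per unordered pair of vertices. A minimal sketch in Python (the function name `sample_gnp` is ours, purely illustrative):

```python
import itertools
import random

def sample_gnp(n, p, rng=None):
    """Sample an Erdos-Renyi graph G(n, p): each of the C(n, 2)
    possible edges is present independently with probability p."""
    rng = rng or random.Random()
    # One Bernoulli(p) trial for every unordered pair {i, j}.
    return {frozenset(pair)
            for pair in itertools.combinations(range(n), 2)
            if rng.random() < p}

assert len(sample_gnp(5, 1.0)) == 10     # p = 1 keeps all C(5, 2) = 10 edges
assert sample_gnp(5, 0.0) == set()       # p = 0 keeps none
```

Representing the edge set as a set of frozensets mirrors the unordered pairs in (47).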
Some simple facts  A few easy questions can be immediately answered. The probability of generating a particular graph with $k$ edges is
$$p^k (1-p)^{\binom{n}{2} - k}.$$
The probability that our random graph has exactly $k$ edges is
$$\binom{\binom{n}{2}}{k} p^k (1-p)^{\binom{n}{2} - k}$$
by a trivial fact on the binomial distribution. The expected value of the degree of any fixed vertex $v$ is
$$E\, d(v) = p(n-1)$$
since there are $n-1$ possible edges touching $v$ and each is present with probability $p$. The distribution of the degree of a given vertex is also binomial:
$$P(d(v) = k) = \binom{n-1}{k} p^k (1-p)^{n-1-k}.$$
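The binomial degree distribution and its mean can be checked numerically; a small sketch (the helper name `degree_pmf` is ours):

```python
from math import comb

def degree_pmf(n, p, k):
    """P(d(v) = k) for a fixed vertex of G(n, p): Binomial(n - 1, p)."""
    return comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)

n, p = 10, 0.3
pmf = [degree_pmf(n, p, k) for k in range(n)]   # k = 0, ..., n - 1
assert abs(sum(pmf) - 1) < 1e-12                # a valid distribution
mean = sum(k * pk for k, pk in enumerate(pmf))
assert abs(mean - p * (n - 1)) < 1e-12          # E d(v) = p(n - 1)
```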
Sparse graphs  Consider the interesting special case of sparse graphs; these are graphs where $p$ depends on $n$ and goes to zero as $n$ goes to infinity. For example, choose
$$p = p_n = \frac{c}{n-1} \qquad (48)$$
where $c$ is a fixed constant and we let $n$ be very large. In this case, the expected degree is exactly $c$ for any $n$. As the size $n$ of the graph grows, the graph gets sparser: despite the growing number of possible edges emanating from a vertex $v$, still only finitely many are present. This is a very typical situation for a large network, e.g. the friendship graph. Even if the number of possible friends (all of mankind) increases, typically the number of your friends does not. The degree distribution in the sparse limit becomes Poisson (recall that a binomial distribution converges to Poisson in this regime), i.e. under (48) we have
$$P(d(v) = k) \approx \frac{c^k}{k!} e^{-c}. \qquad (49)$$
Triangles  What is the probability that the graph has a triangle, i.e. three vertices that are pairwise adjacent? The number of possible triangles is $\binom{n}{3}$, and the probability that a fixed triangle is present is $p^3$. So the expected number of triangles is
$$E\,\#\{\text{triangles}\} = \binom{n}{3} p^3.$$
In the sparse case (48) we have
$$E\,\#\{\text{triangles}\} = \binom{n}{3} p^3 \approx \frac{(np)^3}{6} = \frac{c^3}{6}. \qquad (50)$$
In particular, if $p \gg 1/n$, then the expected number of triangles diverges.
However, just from this calculation we cannot conclude that there is a triangle with very high probability, since the expectation of a random variable can be large even if, say, in half of the cases (i.e. with probability 1/2) there is no triangle at all. Still, $p \gg 1/n$ turns out to be the right threshold to guarantee a triangle, but one cannot argue by expectation alone. More precisely, the expectation gives only a one-sided bound:
$$P(\exists\,\text{triangle}) \le E\,\#\{\text{triangles}\} \approx \frac{c^3}{6}.$$
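The expectation formula itself can be checked by simulation; a Monte Carlo sketch (the parameters, seed, and helper names are arbitrary choices of ours):

```python
import itertools
import random
from math import comb

def sample_adj(n, p, rng):
    """Adjacency sets of a G(n, p) sample."""
    adj = {v: set() for v in range(n)}
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def count_triangles(adj):
    """Each triangle is seen once per each of its three edges."""
    t = sum(len(adj[u] & adj[v])
            for u in adj for v in adj[u] if u < v)
    return t // 3

rng = random.Random(1)
n, c, trials = 60, 2.0, 400
p = c / (n - 1)
total = sum(count_triangles(sample_adj(n, p, rng)) for _ in range(trials))

exact = comb(n, 3) * p**3     # E#{triangles} = C(n, 3) p^3, close to c^3/6
assert abs(total / trials - exact) < 0.4
```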
For a lower bound, we have to argue by the inclusion-exclusion principle. Let $i = 1, 2, \ldots, \binom{n}{3}$ label all possible triangles $T_1, \ldots, T_{\binom{n}{3}}$, and let $A_i$ be the event that the $i$-th triangle is present in the graph. We are looking for
$$P(\exists\,\text{triangle}) = P\Big(\bigcup_i A_i\Big).$$
We clearly have an upper bound
P
⇣[
i
⌘
Ai 
X
P (Ai ) =
i
!
n 3
c3
p ⇡
3
6
as before. For the lower bound, we first interpret $A_i$ as the set of all graphs which contain the $i$-th triangle. Then we have
$$P\Big(\bigcup_i A_i\Big) \ge \sum_i P(A_i) - \sum_{i \neq j} P(A_i \cap A_j)$$
because the second term on the right compensates for all possible overcounting coming from overlaps. Now we compute the probability of overlaps. There are three types of $A_i \cap A_j$ sets: either the two triangles $T_i$ and $T_j$ share an edge, or they share only a vertex, or they are fully disjoint. So we can write
$$\sum_{i \neq j} P(A_i \cap A_j) = P(\exists\,\text{two triangles sharing an edge}) + P(\exists\,\text{two triangles sharing a vertex only}) + P(\exists\,\text{two disjoint triangles}).$$
We need an upper bound on these probabilities, which can be obtained by the expectation as above. Two triangles sharing an edge require choosing four vertices (there are $\binom{n}{4}$ possibilities) and connecting five among the six possible edges among them, i.e.
$$P(\exists\,\text{two triangles sharing an edge}) \le 6 \binom{n}{4} p^5$$
(the prefactor 6 comes from selecting the five edges; of course this is an overcounting). Similarly
$$P(\exists\,\text{two triangles sharing a vertex only}) \le 15 \binom{n}{5} p^6$$
and
$$P(\exists\,\text{two disjoint triangles}) \le 10 \binom{n}{6} p^6$$
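The combinatorial prefactors 6, 15 and 10 can be verified by brute-force enumeration of triangle pairs on 4, 5 and 6 labelled vertices; a small check in Python (the helper name is ours):

```python
from itertools import combinations

def triangle_pairs(m, shared):
    """Count unordered pairs of distinct triangles on m labelled
    vertices that together use all m vertices and share exactly
    `shared` vertices."""
    tris = list(combinations(range(m), 3))
    return sum(1 for t1, t2 in combinations(tris, 2)
               if len(set(t1) & set(t2)) == shared
               and len(set(t1) | set(t2)) == m)

assert triangle_pairs(4, 2) == 6    # pairs sharing an edge
assert triangle_pairs(5, 1) == 15   # pairs sharing a vertex only
assert triangle_pairs(6, 0) == 10   # disjoint pairs
```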
(think over the factors 15 and 10!). Thus
$$P(\exists\,\text{triangle}) \ge \binom{n}{3} p^3 - 6 \binom{n}{4} p^5 - 15 \binom{n}{5} p^6 - 10 \binom{n}{6} p^6.$$
In the sparse regime we see that the two middle terms are negligible (they are of order $n^4 p^5 \approx c^5/n$ and $n^5 p^6 \approx c^6/n$) and we are left with
$$P(\exists\,\text{triangle}) \ge \binom{n}{3} p^3 - 10 \binom{n}{6} p^6 + O(1/n) \approx \frac{c^3}{6} - \frac{c^6}{72}.$$
Remark for challenge  Notice that in the above argument we truncated the general inclusion-exclusion formula
$$P\Big(\bigcup_{j=1}^N A_j\Big) = \sum_{k=1}^{N} (-1)^{k-1} \sum_{\substack{I \subset \{1, 2, \ldots, N\} \\ |I| = k}} P\Big(\bigcap_{j \in I} A_j\Big)$$
at level $k = 2$. One can truncate it at any higher level, successively obtaining better and better alternating (lower and upper) bounds.
Finally, we would get
$$P(\exists\,\text{triangle}) = 1 - e^{-c^3/6}$$
in the $n \to \infty$ limit under the choice (48) (you can try to prove this rigorously, extending the above argument). In particular, if $p \gg 1/n$ (i.e. $c \gg 1$), then there is a triangle with a probability that goes to 1 as $n$ increases.
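This limiting value can be compared against a Monte Carlo estimate at moderate $n$; a sketch (seed and parameters are arbitrary, and the agreement is only approximate at finite $n$):

```python
import itertools
import random
from math import exp

def has_triangle(n, p, rng):
    """Sample G(n, p) and report whether it contains a triangle."""
    adj = {v: set() for v in range(n)}
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    # An edge (u, v) with a common neighbour closes a triangle.
    return any(adj[u] & adj[v] for u in adj for v in adj[u] if u < v)

rng = random.Random(7)
n, c, trials = 200, 1.5, 300
hits = sum(has_triangle(n, c / (n - 1), rng) for _ in range(trials))
limit = 1 - exp(-c**3 / 6)          # about 0.43 for c = 1.5
assert abs(hits / trials - limit) < 0.12
```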
Criticism of the Erdős–Rényi model  Notice that the expected number of triangles in the Erdős–Rényi graph (50) is very different from what you experience in the real "friendship" graph. In friendship graphs there are many more triangles! Although typically everyone has only finitely many friends (i.e. the expected degree $c$ is finite; maybe a big number, like 50 or 100, but much smaller than the total number $n$ of people, so the sparse graph approximation is well justified), the number of triangles is typically proportional to $n$, since very often two of your friends are friends with each other as well, so a positive proportion of people is part of a triangle. So
$$E\,\#\{\text{triangles}\} \sim Cn$$
in contrast to (50). This justifies the need to model large networks with random graph distributions different from the most naive (but still extremely powerful and insightful) Erdős–Rényi model.
Large connected component  Finally we discuss the connectivity properties of the Erdős–Rényi graph. One usually formulates these questions in the form: how large should $p$ be so that certain connectivity patterns arise? Clearly connectivity is monotonically increasing in $p$, so we are looking for a minimal value of $p$, which often depends on $n$.
If $p$ is very small then the graph is totally disconnected: it has only a few small components and many isolated vertices (by the Poisson distribution of $d(v)$, see (49), the probability that a fixed vertex remains isolated is $e^{-c}$ in the sparse case). As $p$ increases, the connected components become larger, but surprisingly not in a continuous way. Let $M_p$ denote the size of the largest connected component; then its expected size behaves as follows:
$$E\, M_p \sim \begin{cases} \varepsilon^{-2} \log n & \text{if } p = \frac{1-\varepsilon}{n} \\ n^{2/3} & \text{if } p = \frac{1}{n} \\ \varepsilon n & \text{if } p = \frac{1+\varepsilon}{n} \end{cases}$$
Similar results hold with high probability. The interpretation of these formulas is that as long as $p < 1/n$, i.e. the expected degree is smaller than 1, there are only very small connected components (even the largest one is almost bounded). When $p = 1/n$, suddenly a large component of size $n^{2/3}$ appears. If $p$ increases further to $p = \frac{1+\varepsilon}{n}$, then the large component becomes macroscopic, meaning that its size is proportional to the total size $n$ of the graph (it is called the giant component). Such a sudden change in behavior is called a phase transition.
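The jump in the largest component size can be observed directly in simulation; a union-find sketch (the sizes, seed, and helper names are arbitrary choices of ours):

```python
import itertools
import random

def largest_component(n, p, rng):
    """Size of the largest connected component of a G(n, p) sample,
    computed with union-find (path halving) over the sampled edges."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            parent[find(u)] = find(v)

    sizes = {}
    for v in range(n):
        r = find(v)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values())

rng = random.Random(3)
n = 2000
small = largest_component(n, 0.5 / n, rng)   # c = 0.5: subcritical
giant = largest_component(n, 1.5 / n, rng)   # c = 1.5: supercritical
assert small < 100 and giant > 200           # a giant component has appeared
```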
Now we give an argument why $c = 1$ is the threshold for the giant component to appear. Let $u$ be the fraction of vertices that do not belong to the giant component; this is the same as the probability that a given vertex does not belong to the giant component. Since the giant component occupies a positive percentage of the vertices by its definition, $u < 1$ means there exists a giant component and $u = 1$ means there is no giant component.
Suppose a fixed vertex $v$ does not belong to the giant component. Then it cannot be adjacent to any vertex that belongs to the giant component either. So for any other vertex $v'$ two scenarios are possible: either $v'$ is not adjacent to $v$, or $v'$ is adjacent to $v$ but does not belong to the giant component itself. The first scenario happens with probability $1 - p$, the second with probability $pu$. If $v$ is disconnected from the giant component, then the above argument has to hold for all $n - 1$ possible choices of $v' \neq v$, and these choices are approximately independent, i.e. the probability that $v$ is not in the giant component satisfies
$$u = (1 - p + pu)^{n-1}.$$
(Notice that this last step holds only approximately, and for a complete proof more argument is needed, but the equation itself turns out to be correct.) A simple calculation with (48) shows
$$u = \Big[1 - \frac{c(1-u)}{n-1}\Big]^{n-1} \approx e^{-c(1-u)},$$
or
$$q = 1 - e^{-cq} \qquad (51)$$
for $q := 1 - u$, the probability that a vertex is in the giant component. Now we analyze this equation for $q$. Clearly $q = 0$ is the trivial solution, meaning that there is no giant component. But an easy graphing shows that (51) has another solution if $c > 1$. (Why? Because the function $f(q) = 1 - e^{-cq}$ has derivative $f'(0) = c$ at zero and for large $q$ it is almost constant, so for $c > 1$ the graph of the function $f(q)$ intersects the graph of the identity function a second time apart from zero.) This nontrivial solution $q > 0$ is the size of the giant component as a fraction of $n$.
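Equation (51) can also be solved numerically by fixed-point iteration starting from $q = 1$; a short sketch confirming the $c = 1$ threshold (the function name is ours):

```python
from math import exp

def giant_fraction(c, iters=200):
    """Iterate q -> 1 - exp(-c q) from q = 1; the iteration
    converges to the largest solution of (51)."""
    q = 1.0
    for _ in range(iters):
        q = 1 - exp(-c * q)
    return q

assert giant_fraction(0.5) < 1e-6            # c < 1: only the trivial q = 0
assert 0.5 < giant_fraction(1.5) < 0.7       # c > 1: a nontrivial solution
assert giant_fraction(2.0) > giant_fraction(1.5)   # the fraction grows with c
```

Starting from $q = 1$ the iterates decrease monotonically, so they cannot get stuck at the trivial solution $q = 0$ when a larger one exists.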