Topics in Theoretical Computer Science        February 18, 2013

Lecture 1: The Probabilistic Method, Cliques and Independent Sets

Lecturer: Ola Svensson        Scribes: Ashkan Norouzi Fard, Saeid Sahraei

1  The Probabilistic Method
In this lecture, we will discuss the use of the Probabilistic Method in Theoretical Computer Science. This
method is a general approach to proving facts; it relies on the design and manipulation of certain
random experiments, whose outcomes have properties that we would find difficult to guarantee
otherwise.
To be more specific, imagine that you would like to prove the existence of a certain combinatorial
structure; sometimes, deterministically designing structures (e.g. graphs) which satisfy certain properties
can be challenging. However, it is often the case that we can design a clever random experiment, whose
outcome is the desired structure with non-zero probability; this will directly imply the existence of
the desired structure. Similarly, this method can have algorithmic applications. Consider for example
that you want to construct an expander graph, i.e. a graph with very high connectivity. So far,
deterministically designing such a graph has been a challenging task. However, picking a graph at
random from a (specific) very simple distribution will return an expander graph with high probability.
In the following, we will see how to apply the Probabilistic Method in some simple scenarios.
2  Cliques and Independent Sets
Let’s start with a simple example.
Lemma 1 Suppose you invite 6 persons to dinner. We claim that among them there are either 3 persons
who are all mutual friends or 3 persons who are all mutual strangers.
Proof Suppose the invited persons are A, B, C, D, E, F . Among the five other guests, A is either a
friend of at least 3 of them or a stranger to at least 3 of them. We prove the lemma for each of these
cases.

• Case 1: A is a friend of at least 3 persons. Without loss of generality, assume the friends include
B, C and D. If any two of B, C, D are friends, then together with A they form 3 persons who are
all friends. If not, then we have three persons who are pairwise strangers (B, C and D).

• Case 2: A has fewer than 3 friends. In this case A is a stranger to at least 3 persons, say B, C and
D. If any two of B, C, D are strangers, then together with A they form a set of three persons who
are all pairwise strangers; otherwise B, C and D form a set of three mutual friends.
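This case analysis can also be checked exhaustively by computer: with 6 guests there are only 2^15 friend/stranger assignments for the 15 pairs, and every one of them contains 3 mutual friends or 3 mutual strangers. A minimal brute-force sketch (the encoding and names are ours):

```python
from itertools import combinations

def has_mono_triangle(friend):
    # friend maps each pair of guests to True (friends) or False (strangers)
    for a, b, c in combinations(range(6), 3):
        if friend[(a, b)] == friend[(a, c)] == friend[(b, c)]:
            return True  # the three pairwise relations agree: a monochromatic triangle
    return False

pairs = list(combinations(range(6), 2))  # the 15 pairs of guests (edges of K6)
assert len(pairs) == 15

# enumerate all 2^15 friend/stranger assignments
for mask in range(1 << 15):
    friend = {p: bool(mask >> i & 1) for i, p in enumerate(pairs)}
    assert has_mono_triangle(friend)
print("every 2-coloring of K6 has a monochromatic triangle")
```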
This example can be generalized to the case of n guests. We use the following definitions for the sake
of simplicity:
Definition 2 A subset of vertices of a graph G is called a clique if and only if there is an edge between
each pair of them.

Definition 3 A subset of vertices of a graph G is called an independent set (IS) if and only if none of
them are connected by an edge.
Now we state a generalization of the previous Lemma:
Lemma 4 A graph with n vertices either has a clique of size t or an independent set of size s, assuming
n ≥ 2^{s+t} − 1.
Proof We will use induction on s + t.

Base case: The claim always holds for s ≤ 2 or t ≤ 2. For example, consider s = 2. A graph which does
not have an independent set of size 2 is a complete graph, so it has a clique of size n. Therefore it has a
clique of size t for any t satisfying n ≥ 2^{2+t} − 1 ≥ t. Consequently the claim holds for all s and t
satisfying s + t ≤ 5 (as they are both integers).

Inductive step: Now assume the claim is correct for s + t = k − 1. We will conclude that it is true for
s + t = k as well. Let G = (V, E) be a graph with |V| = 2^k − 1 and let A ∈ V be an arbitrary vertex.
Besides A there are 2^k − 2 other vertices, so there are two possible cases:

• Case 1: A has at least (2^k − 2)/2 = 2^{k−1} − 1 neighbors. Since 2^{k−1} − 1 ≥ 2^{s+(t−1)} − 1, by
the induction hypothesis the graph consisting of the neighbors of A has either an independent set
of size s or a clique of size t − 1; in the latter case, adding A gives a clique of size t. So G has
either an independent set of size s or a clique of size t.

• Case 2: A has fewer than 2^{k−1} − 1 neighbors, so there are at least 2^{k−1} − 1 vertices that are
not connected to A. Since 2^{k−1} − 1 ≥ 2^{(s−1)+t} − 1, by the induction hypothesis the graph
consisting of the vertices that are not connected to A has either an independent set of size s − 1
or a clique of size t; in the former case, adding A gives an independent set of size s. So G has
either an independent set of size s or a clique of size t.
3  Random Graph Model
Now, let us try to extend the above result for graphs which are drawn according to a specific (and very
natural) distribution:
Definition 5 For p ∈ [0, 1], a graph sampled from G(n, p) is a graph with n vertices, obtained by
including each edge with probability p, independently of all other edges.
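Sampling from this distribution is straightforward; a small sketch (function and variable names are ours):

```python
import random
from itertools import combinations

def sample_gnp(n, p, seed=None):
    """Return the edge set of a graph drawn from G(n, p): each of the
    C(n, 2) possible edges is included independently with probability p."""
    rng = random.Random(seed)
    return {e for e in combinations(range(n), 2) if rng.random() < p}

g = sample_gnp(100, 0.5, seed=1)
# the expected number of edges is C(100, 2) * 0.5 = 2475
print(len(g))
```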
The above random graph model is called the Erdős-Rényi model, and is one of the most widely used
random graph models. We will also need the following well-known inequality:
Theorem 6 (Markov's inequality) If X is a nonnegative random variable and a > 0, then

    \Pr[X \ge a] \le \frac{E[X]}{a}.
Now, we are ready to argue about the existence of cliques (and independent sets; notice that a clique
in G is an independent set in Ḡ, and G and Ḡ are equiprobable if G ∼ G(n, 1/2)) in such random graphs
(all logarithms below are in base 2):

Theorem 7 A graph G ∼ G(n, 1/2) has no independent set or clique of size larger than 2 log(n)
asymptotically almost surely (i.e. with probability that goes to 1 as n goes to infinity).
Proof We will calculate the expected number of cliques of size 2 log(n) + 1. For any S ⊆ V , let X_S
be an indicator variable such that X_S = 1 if S is a clique and X_S = 0 otherwise. We start by
calculating E[X_S] = Pr[S is a clique]. Suppose that |S| = t; there are t(t−1)/2 edges between the
vertices of S, so

    E[X_S] = \Big(\frac{1}{2}\Big)^{\#\text{ of edges}} = 2^{-\frac{t(t-1)}{2}} = 2^{-\binom{t}{2}}.

Let X be the number of cliques of size t = 2 log(n) + 1. Now we will calculate E[X]:

    E[X] = E\Big[\sum_{|S|=t} X_S\Big] = \sum_{|S|=t} E[X_S] = \binom{n}{t} \cdot 2^{-\binom{t}{2}} \le \frac{n^t}{t!} \cdot \frac{1}{2^{\frac{t(t-1)}{2}}} = \frac{1}{t!} \cdot \Big(\frac{n}{2^{\frac{t-1}{2}}}\Big)^t = \frac{1}{t!} \cdot \Big(\frac{n}{n}\Big)^t = \frac{1}{t!} \le \frac{1}{n},

where we used that 2^{(t-1)/2} = 2^{\log(n)} = n for t = 2 log(n) + 1.
Now, by using Markov's inequality:

    \Pr[X \ge 1] \le \frac{E[X]}{1} \le \frac{1}{n}.

The probability that G has at least one clique of size t is less than 1/n. Similarly, the probability that G
has an independent set of size t is less than 1/n. Thus the probability that G has at least one clique or
one independent set of size t is at most 2/n.
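The bound E[X] ≤ 1/n can be checked exactly with integer arithmetic; a sketch for a few values of n (the choice t = 2 log₂(n) + 1 follows the proof, the parameter values are ours):

```python
from fractions import Fraction
from math import comb

def expected_cliques(n, t):
    # E[X] = C(n, t) * 2^(-t(t-1)/2), computed exactly as a rational number
    return Fraction(comb(n, t), 2 ** (t * (t - 1) // 2))

for k in (4, 5, 6, 7):
    n = 2 ** k               # so that log2(n) = k exactly
    t = 2 * k + 1            # t = 2 log2(n) + 1
    assert expected_cliques(n, t) <= Fraction(1, n)
print("E[X] <= 1/n verified for n = 16, 32, 64, 128")
```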
We continue by solving two exercises:
4  Exercise 1

For a given positive constant t, what is the largest value of p for which G ∼ G(n, p) has no cycle of
length t almost surely?

4.1  Solution
The solution is similar to the proof of the previous part. We will calculate the expected number of cycles
of length t. Fix a set S ⊆ V with |S| = t and one of the (t − 1)!/2 cyclic orderings of S; let X_C = 1 if
this particular cycle C is present and X_C = 0 otherwise. Since a cycle of length t consists of t edges,
E[X_C] = Pr[C is present] = p^t. Let X be the number of cycles of length t. Now we will calculate E[X]:

    E[X] = E\Big[\sum_C X_C\Big] = \sum_C E[X_C] = \binom{n}{t} \cdot \frac{(t-1)!}{2} \cdot p^t \le \frac{n^t}{t} \cdot p^t = \frac{1}{t} \cdot (p \cdot n)^t.

Again by using Markov's inequality:

    \Pr[X \ge 1] \le \frac{E[X]}{1} \le \frac{1}{t} \cdot (p \cdot n)^t.

So for any p = o(1/n) this probability goes to zero as n goes to infinity; hence 1/n marks the largest
order of p for which G almost surely has no cycle of length t.
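As a numeric sanity check, the exact expectation C(n, t)·(t−1)!/2·p^t indeed stays below the bound (pn)^t/t and vanishes as n grows when p is below 1/n; a small sketch with parameter choices of our own (t = 5, p = n^−1.2):

```python
from math import comb, factorial

def expected_cycles(n, t, p):
    # E[X] = C(n, t) * (t-1)!/2 * p^t : choose the t vertices,
    # then one of the (t-1)!/2 distinct cycles on them
    return comb(n, t) * factorial(t - 1) / 2 * p ** t

t = 5
for n in (10**3, 10**4, 10**5):
    p = n ** -1.2                     # p = o(1/n)
    e = expected_cycles(n, t, p)
    assert e <= (p * n) ** t / t      # the bound used in the text
    print(n, e)
```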
If we want to create a dense graph, a graph with a lot of edges, without any cycles of length l, we
can use the expected number of cycles from Exercise 1. Here we describe an algorithm which finds such
a graph:
1- Sample G ∼ G(n, p)
2- Remove one edge from each cycle of length l.
3- Output the resulting graph
The expected number of edges of the resulting graph is at least the expected number of edges of G
minus the expected number of cycles of length l, so it is at least

    \binom{n}{2} \cdot p - \frac{n^l}{2l} \cdot p^l.

We need to maximize this by selecting an appropriate p. For p = n^{\frac{2-l}{l-1}}, the resulting graph
will have on the order of n^{1+\frac{1}{l-1}} edges.
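For l = 3 (forbidding triangles) the sample-and-delete procedure above is easy to implement directly; a sketch (all names are ours) that samples G(n, p) and removes one edge from each triangle it encounters:

```python
import random
from itertools import combinations

def dense_triangle_free(n, p, seed=0):
    rng = random.Random(seed)
    # step 1: sample G ~ G(n, p)
    edges = {e for e in combinations(range(n), 2) if rng.random() < p}
    # step 2: remove one edge from each triangle (cycle of length 3)
    for a, b, c in combinations(range(n), 3):
        if (a, b) in edges and (a, c) in edges and (b, c) in edges:
            edges.discard((a, b))
    return edges

n = 200
p = n ** (-1 / 2)        # p = n^((2-l)/(l-1)) with l = 3
g = dense_triangle_free(n, p)
# no triangle survives: every triple misses at least one edge
assert not any((a, b) in g and (a, c) in g and (b, c) in g
               for a, b, c in combinations(range(n), 3))
print(len(g), "edges, triangle-free")
```

Every triple is inspected after all earlier deletions, so any triangle still intact at its turn loses an edge, which guarantees the output is triangle-free.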
5  Exercise 2

What is the largest value of t such that G ∼ G(n, 1/2) has at least one clique of size t asymptotically
almost surely?

5.1  Solution
We claim that the answer is t = 2(1 − ε) log(n) for any arbitrarily small constant ε which is strictly larger
than zero.
In order to prove this, let X be the random variable that stands for the number of cliques of size t. We
already know that

    E[X] = \binom{n}{t} \Big(\frac{1}{2}\Big)^{\binom{t}{2}}.
We want to make sure this expectation is large. So, contrary to before, we have to use a lower bound
on \binom{n}{t}:

    \binom{n}{t} = \frac{n(n-1)\cdots(n-t+1)}{t!} = \frac{n}{t} \cdot \frac{n-1}{t-1} \cdots \frac{n-t+1}{1}.

All fractions are of the form \frac{n-i}{t-i} for 0 ≤ i < t, thus all of them are larger than or equal to
n/t. Therefore we have the following lower bound:

    \binom{n}{t} \ge \frac{n}{t} \cdots \frac{n}{t} = \Big(\frac{n}{t}\Big)^t.
We also have that:

    \Big(\frac{1}{2}\Big)^{\binom{t}{2}} = \Big(\frac{1}{2}\Big)^{\frac{t(t-1)}{2}} = \Big(\frac{1}{2}\Big)^{\frac{t^2}{2}} \cdot 2^{\frac{t}{2}} > \Big(\frac{1}{2}\Big)^{\frac{t^2}{2}}.
We can now bound the expectation:

    E[X] = \binom{n}{t} \Big(\frac{1}{2}\Big)^{\binom{t}{2}} > \Big(\frac{n}{t}\Big)^t \Big(\frac{1}{2}\Big)^{\frac{t^2}{2}} = \Big(\frac{n}{t \cdot 2^{t/2}}\Big)^t.

In 2^{t/2} we replace t with the value we claimed for it:

    E[X] > \Big(\frac{n}{t \cdot 2^{(1-ε)\log(n)}}\Big)^t = \Big(\frac{n}{t \cdot n^{1-ε}}\Big)^t = \Big(\frac{n^{ε}}{t}\Big)^t.

No matter how small ε is chosen, as n goes to infinity, the fraction n^ε/t goes to infinity too. Consequently
we have that, for the claimed value of t,

    \lim_{n \to +\infty} E[X] = +\infty.
This still does not imply that Pr[X ≥ 1] also grows arbitrarily close to 1 as n grows large. Consider
for example a random variable X with the following PDF:

    f(x) = 0.99\,\delta(x) + 0.01\,\delta(x - 10^{12}),

where δ(x) denotes the Dirac delta function.¹

¹ To be precise, δ is a generalized function that ranges over the reals, such that δ(x) = 0 if x ≠ 0, and
\int_{-\infty}^{\infty} \delta(t)\,dt = 1.
This is clearly a PDF, as the integral of f equals 1. Furthermore, the expected value of X is quite
large:

    E[X] = 0.99 \cdot 0 + 0.01 \cdot 10^{12} = 10^{10}.

Nonetheless, the probability that X > 1 is only 0.01.
What is noticeable in this example is that the standard deviation of the random variable X is also very
large:

    Var[X] = E[X^2] - E[X]^2 = 10^{22} - 10^{20} \approx 10^{22} \;\Rightarrow\; σ[X] \approx 10^{11},

where Var[X] = σ²[X].
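The arithmetic of this two-point example is easy to verify exactly:

```python
from fractions import Fraction

# X = 0 with probability 0.99, X = 10^12 with probability 0.01
dist = {0: Fraction(99, 100), 10**12: Fraction(1, 100)}

mean = sum(x * p for x, p in dist.items())
second_moment = sum(x**2 * p for x, p in dist.items())
var = second_moment - mean**2

assert mean == 10**10
assert second_moment == 10**22
assert var == 10**22 - 10**20
# Pr[X > 1] is only 0.01 despite the huge expectation
assert sum(p for x, p in dist.items() if x > 1) == Fraction(1, 100)
print(mean, var)
```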
This example shows that a random variable can be arbitrarily far from its expected value with high
probability when its standard deviation is sufficiently large. This gives rise to the following question:
does having a small standard deviation guarantee that a random variable remains close to its expected
value with high probability? Chebyshev's inequality answers affirmatively:
Theorem 8 (Chebyshev's inequality) If X is a random variable with finite expected value µ and
finite non-zero standard deviation σ, then

    \Pr[|X - µ| \ge λσ] \le \frac{1}{λ^2},

where λ is an arbitrary positive number.
Proof

    σ^2 = Var[X] = E[(X - µ)^2] \ge λ^2 σ^2 \cdot \Pr[|X - µ| \ge λσ],

where the inequality follows from Markov's inequality applied to the nonnegative random variable
(X − µ)² with the parameter λ²σ². Dividing both sides by λ²σ² gives Chebyshev's inequality.
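A quick Monte Carlo sanity check of Chebyshev's bound (our choice of distribution: X the number of heads in 100 fair coin flips, so µ = 50 and σ = 5):

```python
import random

rng = random.Random(0)
mu, sigma, lam = 50.0, 5.0, 2.0
trials = 20000

# count how often X lands at least 2 sigma away from its mean
far = sum(abs(sum(rng.random() < 0.5 for _ in range(100)) - mu) >= lam * sigma
          for _ in range(trials))

# empirical Pr[|X - mu| >= 2 sigma] must respect the bound 1/lambda^2 = 0.25
print(far / trials)
assert far / trials <= 1 / lam ** 2
```

For this distribution the true probability is far below 0.25; Chebyshev only promises an upper bound.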
Back to our problem, we now know that in order to prove that Pr[X ≥ 1] is large we also need to
bound the standard deviation of X. In particular, note that if we set λ = µ/σ in Chebyshev's inequality,
we obtain:

    \Pr[|X - µ| \ge µ] \le \frac{σ^2}{µ^2}.

The left hand side of the inequality can be written as Pr[|X − µ| ≥ µ] = Pr[X ≤ 0] + Pr[X ≥ 2µ] ≥
Pr[X ≤ 0]. Thus we have:

    \Pr[X \le 0] \le \frac{σ^2}{µ^2} = \frac{Var[X]}{E[X]^2}.

If we can show that Var[X]/E[X]² goes to zero as n goes to infinity, we will have shown that X is larger
than or equal to 1 (as X only takes integer values, X > 0 is equivalent to X ≥ 1) asymptotically almost
surely, and thus the number of cliques of size t is at least 1 (a.a.s.). Hence:
    Var[X] = E[X^2] - E[X]^2 = E\Big[\Big(\sum_{|S|=t} X_S\Big)^2\Big] - \Big(E\Big[\sum_{|S|=t} X_S\Big]\Big)^2 = \sum_{|A|=t,|B|=t} E[X_A X_B] - \sum_{|A|=t,|B|=t} E[X_A]\,E[X_B].

If A and B are disjoint, then X_A and X_B are independent. Thus E[X_A X_B] − E[X_A]E[X_B] = 0. The
same argument holds when A and B intersect in only one vertex, as in that case they do not have any
edges in common, thus again X_A and X_B are independent. So, we may assume that A and B intersect
in at least two vertices.
    Var[X] \le \sum_{|A|=t} \; \sum_{|B|=t,\,|A \cap B| \ge 2} E[X_A X_B] = \binom{n}{t}\, 2^{-\binom{t}{2}} \sum_{i=2}^{t} \binom{t}{i}\binom{n-t}{t-i}\, 2^{-\binom{t}{2}+\binom{i}{2}}.
Here the term \binom{n}{t} stands for the number of different ways one can choose A. The probability
that A is a clique is 2^{-\binom{t}{2}}. The term \binom{t}{i} accounts for all possible ways for B and A
to share i vertices. B may choose its remaining vertices in \binom{n-t}{t-i} different ways. Finally,
2^{-\binom{t}{2}+\binom{i}{2}} is the probability that the remaining edges of B are also present and thus
B is a clique too.
    \frac{Var[X]}{E[X]^2} \le \frac{\binom{n}{t}\, 2^{-\binom{t}{2}} \sum_{i=2}^{t} \binom{t}{i}\binom{n-t}{t-i}\, 2^{-\binom{t}{2}+\binom{i}{2}}}{\binom{n}{t}^2\, 2^{-2\binom{t}{2}}} = \frac{\sum_{i=2}^{t} \binom{t}{i}\binom{n-t}{t-i}\, 2^{\binom{i}{2}}}{\binom{n}{t}} \le t \cdot \frac{\max_{i=2}^{t} \binom{t}{i}\binom{n-t}{t-i}\, 2^{\binom{i}{2}}}{\binom{n}{t}}.
The expression in the numerator is maximized for i = 2; this is because

    \frac{\binom{t}{i}\binom{n-t}{t-i}\, 2^{\binom{i}{2}}}{\binom{t}{i-1}\binom{n-t}{t-(i-1)}\, 2^{\binom{i-1}{2}}} = \frac{(t-i+1)(t-i+1)\, 2^{i-1}}{i\,(n-2t+i)} < 1,

where the last inequality is obtained as n goes to infinity, due to the fact that n ≫ t.
Thus we have that:

    \max_{i=2}^{t} \binom{t}{i}\binom{n-t}{t-i}\, 2^{\binom{i}{2}} = \binom{t}{2}\binom{n-t}{t-2}\, 2^{\binom{2}{2}} = 2\binom{t}{2}\binom{n-t}{t-2}.
    \frac{Var[X]}{E[X]^2} \le t \cdot \frac{2\binom{t}{2}\binom{n-t}{t-2}}{\binom{n}{t}} = t \cdot t(t-1) \cdot \frac{\frac{(n-t)!}{(t-2)!\,(n-2t+2)!}}{\frac{n!}{t!\,(n-t)!}} = t^3 (t-1)^2 \cdot \frac{n-2t+3}{n} \cdots \frac{n-t}{n-t+3} \cdot \frac{1}{n-t+2} \cdot \frac{1}{n-t+1} < \frac{t^5}{(n-t+1)^2},

and this goes to zero as n goes to infinity, since n ≫ t (recall that t = 2(1 − ε) log(n)).
We have thus proved that almost all graphs from G(n, 1/2) have a clique of size 2(1 − ε) log(n), for any
small and positive constant ε. But this does not mean that finding such a clique is easy. In fact, it is
still an open problem to find, in polynomial time, a clique of expected size c log(n) for any constant
c > 1 in a graph from G(n, 1/2). As a result, for graphs generated from G(n, 1/2) we have two
thresholds: a clique of size 2(1 − ε) log(n) exists a.a.s., while no clique of size larger than 2 log(n)
exists a.a.s.
Here we describe an algorithm which finds such a clique for the case of c = 1:
0. Set S = ∅.
1. Pick any vertex v at random and set S = S ∪ {v}.
2. Discard v and all vertices not connected to v from the graph.
3. If the graph is not empty, repeat from step 1.
4. Output S.
Since in G(n, 1/2) every edge exists with probability 1/2, in every step of the algorithm half of the
remaining nodes are discarded on average. Thus the expected size of the returned clique is log(n).
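The greedy procedure above is only a few lines of code; a sketch on a sampled G(n, 1/2) (helper names are ours), which also checks that the output is indeed a clique:

```python
import random
from itertools import combinations

def greedy_clique(n, seed=0):
    rng = random.Random(seed)
    edges = {e for e in combinations(range(n), 2) if rng.random() < 0.5}
    adj = lambda u, v: (min(u, v), max(u, v)) in edges

    S, alive = [], set(range(n))
    while alive:                        # repeat until the graph is empty
        v = rng.choice(sorted(alive))   # pick any remaining vertex at random
        S.append(v)
        # discard v and all vertices not connected to v
        alive = {u for u in alive if u != v and adj(u, v)}
    return S, edges

S, edges = greedy_clique(1024)
# every pair in S is connected, so S is a clique; expected size about log2(1024) = 10
assert all((min(u, v), max(u, v)) in edges for u, v in combinations(S, 2))
print(len(S))
```

Every vertex kept in `alive` is adjacent to all vertices chosen so far, which is why the output is always a clique.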