Math 216A Notes, Week 7

Disclaimer: These notes are not nearly as polished (and quite possibly not nearly as correct) as a published
paper. Please use them at your own risk.
1. The Framework From Last Week, and the General Local Lemma
As often happens with the probabilistic method, we have some list {A1 , . . . , An } of bad events that we want to
avoid. If the events each happened with probability less than 1 and were independent, then we’d immediately
have that with positive (non-zero) probability no event occurred. Our goal was to relax this somewhat: If
each event is sufficiently rare and they’re nearly independent, then we still have positive probability that no
event occurs. To make "nearly independent" precise, we defined
Definition 1. A Dependency Graph for a set {A1 , . . . , An } of events is a graph H on vertex set {1, . . . , n},
such that for every i the event Ai is independent from the set
{Aj |j is not adjacent to i in H}.
Given this definition, we can state
Theorem 1. (Lovász Local Lemma, general version) Let A1 , . . . , An be events having dependency graph H.
Suppose that there are constants x1 , . . . , xn such that 0 ≤ xi < 1 and satisfying
P(Ai) ≤ xi ∏_{(i,j) ∈ E(H)} (1 − xj)

for every i. Then

P(no Ai occurs) ≥ ∏_{i=1}^{n} (1 − xi) > 0.
As a special case, we have the symmetric version we stated last time
Corollary 1. (Lovász Local Lemma, symmetric version) Let A1 , . . . , An be events having dependency graph
H. Suppose that there is a 0 < p < 1 and d > 0 such that the maximum degree of H is at most d, each Ai
has probability at most p, and
ep(d + 1) < 1.
Then with nonzero probability no event occurs.
Proof. Let xi = 1/(d + 1) for every i. Then we have

xi ∏_{(i,j) ∈ E(H)} (1 − xj) ≥ (1/(d + 1)) (1 − 1/(d + 1))^d ≥ 1/((d + 1)e) ≥ p,

so by the Local Lemma there is a positive (though possibly exponentially small in n) probability no event occurs.
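To make the hypothesis concrete, here is a small Python sketch (my own, not part of the notes; all names are made up) that checks the general Local Lemma condition for given probabilities, dependency graph, and constants xi, and returns the guaranteed lower bound ∏(1 − xi). The symmetric version corresponds to taking every xi = 1/(d + 1).

```python
from math import e, prod

def lll_lower_bound(probs, neighbors, xs):
    """If P(A_i) <= x_i * prod_{(i,j) in E(H)} (1 - x_j) for every i,
    return the guaranteed lower bound prod_i (1 - x_i) on the
    probability that no event occurs; otherwise return None."""
    for i, p in enumerate(probs):
        if p > xs[i] * prod(1 - xs[j] for j in neighbors[i]):
            return None  # hypothesis of the Local Lemma fails
    return prod(1 - x for x in xs)

# Symmetric example: n events of probability p = 0.1 arranged in a cycle,
# so the dependency graph has maximum degree d = 2, and e*p*(d+1) < 1.
n, p, d = 10, 0.1, 2
neighbors = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
bound = lll_lower_bound([p] * n, neighbors, [1 / (d + 1)] * n)
assert e * p * (d + 1) < 1 and bound is not None
```

Note that the returned bound, here (2/3)^10, can be exponentially small in n; the Lemma only promises it is positive.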
2. The Proof of the Local Lemma
Consider first the special case when H is empty (when all the events are independent). Then the Lemma collapses down to the statement that if P(Ai) ≤ xi, then

P(Ā1 ∧ Ā2 ∧ · · · ∧ Ān) = ∏_{i=1}^{n} (1 − P(Ai)) ≥ ∏_{i=1}^{n} (1 − xi),
which is trivially true. In our case the events aren’t independent, so the above equation becomes
P(Ā1 ∧ Ā2 ∧ · · · ∧ Ān) = P(Ā1) × P(Ā2 | Ā1) × · · · × P(Ān | Ā1 ∧ Ā2 ∧ · · · ∧ Ān−1).
To prove the Local Lemma, it would be enough to show it term by term, that is, to show

P(Ai | Ā1 ∧ Ā2 ∧ · · · ∧ Āi−1) ≤ xi.
More generally, it would be enough to show
Lemma 1. For any index i and any set S ⊆ {1, . . . , n} not containing i, we have

P(Ai | ∧_{j∈S} Āj) ≤ xi.
In a sense, this lemma is saying that we can think of xi as an upper bound on the probability of Ai . No
matter what other events we assume are not holding, the probability of Ai remains at most xi .
We will prove this lemma by induction on |S|. If S is empty, then there’s nothing to prove, since by
assumption we have
P(Ai) ≤ xi ∏_{(i,j) ∈ E(H)} (1 − xj) ≤ xi.
So now let us assume that S is non-empty and the result holds for all proper subsets of S. We partition
S = S1 ∪ S2 , where S1 consists of those elements of S adjacent to i in H, and S2 consists of all the remaining
elements of S. Making use of the general identity
P(A | B ∧ C) = P(A ∧ B | C) / P(B | C),
we have
P(Ai | ∧_{j∈S} Āj) = P(Ai ∧ (∧_{j∈S1} Āj) | ∧_{ℓ∈S2} Āℓ) / P(∧_{j∈S1} Āj | ∧_{ℓ∈S2} Āℓ).
We can bound the numerator above by

P(Ai | ∧_{ℓ∈S2} Āℓ),

which by the definition of H is just

P(Ai) ≤ xi ∏_{(i,j) ∈ E(H)} (1 − xj).
If S1 is empty, then the denominator is just 1 and we are already done. If this is not the case, then let
S1 = {j1, . . . , jr}. Using the same conditional probability expansion as in the start of the proof, we can write

P(∧_{j∈S1} Āj | ∧_{ℓ∈S2} Āℓ) = P(Āj1 | ∧_{ℓ∈S2} Āℓ) × P(Āj2 | Āj1 ∧ (∧_{ℓ∈S2} Āℓ)) × · · ·
  ≥ (1 − xj1)(1 − xj2) · · · (1 − xjr)
  ≥ ∏_{(i,j) ∈ E(H)} (1 − xj).
Here we used our inductive hypothesis to bound each of the conditional probabilities in our product from
below. Multiplying our lower bound on the denominator with our upper bound on the numerator, we get
the desired result.
3. The Symmetric Local Lemma and R(k, k)
Consider a graph where every edge is colored red or blue independently with probability 1/2. For each set
S with |S| = k, we let AS denote the event that all edges within S are the same color. As before, we have
P(AS) = 2^{1 − (k choose 2)}.
We form a dependency graph H by connecting two subsets if and only if they share at least one edge. We can upper bound the degree of a vertex in H by counting the number of ways to first choose the edge of S where the two sets overlap, then choose the remaining k − 2 vertices of the other set, giving

d ≤ (k choose 2) (n choose k−2).
The Symmetric Local Lemma, in this case, then becomes the following: If
e · 2^{1 − (k choose 2)} · (1 + (k choose 2)(n choose k−2)) < 1,
then with positive probability none of the events occur. In other words, we must have R(k, k) > n.
The difference from our previous union bound argument is that (essentially) (n choose k) has been replaced by (k choose 2)(n choose k−2). Running through the same asymptotics as before with Stirling's approximation tells us that the effect of this change is to show

R(k, k) > (√2/e)(1 + o(1)) k 2^{k/2},
a factor of 2 improvement over the previous bound. This may not look like much (given that the upper
bound is still roughly 4k ), but it’s actually still the best known lower bound!
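For concreteness (this computation is mine, not in the notes), one can find, for a fixed k, the largest n satisfying the displayed condition, which then certifies R(k, k) > n:

```python
from math import comb, e

def lll_ramsey_n(k):
    """Largest n with e * 2^{1-C(k,2)} * (1 + C(k,2)*C(n, k-2)) < 1,
    so that the symmetric Local Lemma gives R(k, k) > n."""
    threshold = e * 2.0 ** (1 - comb(k, 2))
    n = k
    # advance n while the LLL condition still holds at n + 1
    while threshold * (1 + comb(k, 2) * comb(n + 1, k - 2)) < 1:
        n += 1
    return n

n16 = lll_ramsey_n(16)  # the LLL lower bound certificate for R(16, 16)
```

For comparison, the asymptotic (√2/e)k·2^{k/2} is about 2131 at k = 16; the exact cutoff from the condition is somewhat smaller, since the (1 + o(1)) factor is not yet negligible at this k.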
4. The Asymmetric Local Lemma and R(3, k)
We now turn to the question of providing an improved lower bound on R(3, k). The idea, like our previous
lower bound on this number, is to start with a random graph with edge probability p (to be determined
later), and show that with positive probability it satisfies our conditions. Now we effectively have two sets
of bad conditions.
• There are (n choose 3) events of the form AT, where T is a triangle that could be in G. Each occurs with probability p³.
• There are (n choose k) events of the form BS, where S is an independent set of size k that could occur in G. Each occurs with probability (1 − p)^{(k choose 2)}.
Our idea will be to apply the general LLL to these events. As in the previous section, two events will be adjacent in the dependency graph precisely when they share at least one edge in common.
Before we go into detailed calculations, it's worth looking ahead to our final goal. We eventually want to take k = c1 √n log n for some constant c1 (which corresponds to a lower bound R(3, k) ≥ c k²/log² k). The trouble is, if we take two "typical" sets S1 and S2 of this size, they probably will intersect in several places. Roughly speaking, this is because there are about n log² n pairs (s1, s2) ∈ S1 × S2, and each pair collides (has the same element appearing in both places) with probability about n^{−1}. (This is akin to a fact sometimes referred to as the "birthday paradox": even though any two random people share the same birthday with probability only about 1/365, if you have, say, 30 people in a room there will probably be two with the same birthday, because there are so many pairs of people.)
In terms of our dependency graph, what this means is that any BS will be connected to most of the other B events. So for our purposes, the Local Lemma probably won't be of too much help with the B events. On the other hand, if we look at two triangles, then most of the time they will not intersect, and we can hope to get something out of the Local Lemma. So our general philosophy in handling H (and elsewhere) will be to carefully bound the terms involving intersections with triangles, but not to worry too much about getting good bounds on the expressions involving intersections with k-sets. With that in mind, we now turn to bounding the degrees in H.
We know that each triangle T shares an edge with 3(n − 3) ≤ 3n other triangles (3 choices for the edge where they intersect, n − 3 choices for the remaining vertex of the other triangle). We also know (trivially) that it intersects at most (n choose k) sets of size k. Similarly, we know that each S of size k shares an edge with at most (k choose 2)(n − 2) < nk²/2 triangles, and we again use the trivial bound that it intersects at most (n choose k) other sets of size k. From the Local Lemma, we have the following:
Claim 1. Suppose there are p, x, y between 0 and 1 such that
(1)    p³ ≤ x (1 − x)^{3n} (1 − y)^{(n choose k)}
(2)    (1 − p)^{(k choose 2)} ≤ y (1 − x)^{nk²/2} (1 − y)^{(n choose k)}.
Then R(3, k) > n.
It remains to choose p, x, and y so as to make n as large as possible.
5. Optimizing the Variable Values
This is in a sense as much an art form as anything. Here’s one way you might be led towards (what turn
out to be within a constant factor of) the optimal values of the variables.
We start with our observation from before: We can’t really expect much help from the Local Lemma on the
BS . Given this, the easiest way to handle the BS is to just make sure that the expected number of BS that
occur is small. We have
E(Number of BS) = (n choose k) (1 − p)^{(k choose 2)} ≈ (ne/k)^k e^{−pk²/2} = (ne/(k e^{pk/2}))^k

(for a very loose definition of ≈). To make this small, it's enough to have pk/2 be some constant multiple of log n. For example, we can take p = c2 n^{−1/2}, where c2 > 5/(2c1).
At this point if we look at the left hand side of (2), we see that

(3)    (1 − p)^{(k choose 2)} = (1 − c2 n^{−1/2})^{(k²/2)(1+o(1))} = e^{−c2 n^{−1/2} (k²/2)(1+o(1))} = e^{−c2 c1² n^{1/2} log² n (1/2 + o(1))}

is incredibly tiny. So we can get away with taking y to be very small. So small, in fact, that we might hope that (1 − y)^{(n choose k)} ≈ 1. We'll assume that for now, then check later on if our assumption is reasonable. Under this assumption, equation (1) becomes
p³ = c2³ n^{−3/2} ≤ x (1 − x)^{3n} (1 + o(1)).

This is satisfied, for example, if x = c3 n^{−3/2}, where c3 > c2³, since in that case

(1 − x)^{3n} > 1 − 3nx = 1 − o(1).
We next turn to equation (2). Using our bound (3) on the left hand side turns that equation into

e^{−c2 c1² n^{1/2} log² n (1/2 + o(1))} ≤ y (1 − c3 n^{−3/2})^{c1² n² log² n / 2} (1 − y)^{(n choose k)} = (1 + o(1)) y e^{−c3 c1² n^{1/2} log² n (1/2 + o(1))}

under our assumption that (1 − y)^{(n choose k)} ≈ 1.
This equation holds if, for example, we take y = e^{−c4 n^{1/2} log² n}, where

c4 + c3 c1²/2 < c2 c1²/2.
We now have two loose ends to tie up. We have to check our assumption on y, and we have to make sure
that all of the inequalities we’ve been making relating the various ci are actually consistent. To check the
first, note that we have
(1 − y)^{(n choose k)} ≥ 1 − y (n choose k),
and
y (n choose k) ≤ y (ne/k)^k = e^{−c4 √n log² n} (ne/(c1 √n log n))^{c1 √n log n} = e^{−c4 √n log² n} e^{c1 √n log² n (1/2 + o(1))},

and this tends to 0 if c4 > c1/2.
Reviewing all of our constants, it suffices to take c1 large and

c2 = 3 c1^{−1},    c3 = 30 c1^{−3},    c4 = 1.25 c1.
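As a quick sanity check (mine, not from the notes; the o(1) terms are ignored and log n is taken as the natural log), one can verify numerically that these choices satisfy every constraint placed on the ci above, e.g. with c1 = 10:

```python
c1 = 10.0            # "c1 large"
c2 = 3 / c1
c3 = 30 / c1 ** 3
c4 = 1.25 * c1

assert c2 > 5 / (2 * c1)                          # keeps E(number of B_S) small
assert c3 > c2 ** 3                               # makes equation (1) hold
assert c4 + c3 * c1 ** 2 / 2 < c2 * c1 ** 2 / 2   # makes equation (2) hold
assert c4 > c1 / 2                                # makes y * C(n, k) tend to 0
```

Smaller values of c1 eventually fail the third constraint, which is why the notes ask for c1 large.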
Remark 1. Notice how often the inequalities

1 − x ≤ e^{−x}

(with equality almost holding if x = o(1)) and

(n choose k) ≤ (ne/k)^k

came in handy here. They're both VERY useful.
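Both inequalities are easy to spot-check numerically (my own illustration, not part of the notes):

```python
from math import comb, e, exp

# 1 - x <= e^{-x}, with near-equality when x = o(1)
for x in (0.5, 0.1, 0.001):
    assert 1 - x <= exp(-x)
assert abs((1 - 1e-6) - exp(-1e-6)) < 1e-9  # nearly equal for tiny x

# C(n, k) <= (ne/k)^k
for n, k in ((10, 3), (100, 7), (1000, 50)):
    assert comb(n, k) <= (n * e / k) ** k
```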
6. Arithmetic Ramsey Theory
We’ll focus more on this starting next week, but the general idea is as follows:
Instead of coloring the edges of a graph, we will color the integers {1, . . . , N } with some finite number of
colors. The goal will be to show that if N is sufficiently large, then at least one of the colors must contain some sort of structure, such as a monochromatic solution to an equation or system of equations. As it turns out,
this isn’t always possible. But there are some cases where it is true.
Theorem 2. (I. Schur) For any fixed k, there is an N0 such that if we color the integers {1, . . . , N } with k
colors and N ≥ N0 , then there must be x, y, z all the same color satisfying
x + y = z.
More generally, for any fixed k, ℓ there is an N0 such that if we color with k colors and N ≥ N0, then there must be x1, x2, . . . , xℓ, c all the same color satisfying

x1 + · · · + xℓ = c.
Proof. Our proof relies on the following "k-color" Ramsey theorem.
Lemma 2. For any k and m1 . . . mk there is a finite R(m1 , . . . , mk ) such that if we color the edges of Kn
with k colors and n > R(m1 , . . . , mk ), then there is a monochromatic Kmj of color j for some j.
Proof. (Sketchy) By the same induction as in the 2-color case, we have
R(m1 , m2 . . . mk ) ≤ R(m1 − 1, m2 , . . . , mk ) + R(m1 , m2 − 1, . . . , mk ) + · · · + R(m1 , m2 , . . . , mk − 1).
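The induction behind this sketch can be written out as a short recursion (my own illustration, not from the notes; the base case R = 1 when some mi ≤ 1 holds because a single vertex vacuously contains a monochromatic K1):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def ramsey_upper(*ms):
    """Upper bound on the k-color Ramsey number R(m1, ..., mk) from the
    recursion R(m1,...,mk) <= sum_i R(m1, ..., mi - 1, ..., mk)."""
    if min(ms) <= 1:
        return 1  # a single vertex contains a monochromatic K_1
    return sum(ramsey_upper(*(ms[:i] + (ms[i] - 1,) + ms[i + 1:]))
               for i in range(len(ms)))

assert ramsey_upper(3, 3) == 6  # matches the classical bound R(3, 3) <= 6
```

This plain sum recursion is enough for finiteness, which is all Schur's theorem needs; sharper multicolor bounds exist but aren't required here.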
We now turn to the proof of Schur’s theorem. Given a coloring of the integers {1, . . . , N }, we can define an
auxiliary coloring on the graph KN as follows: For each i < j, we color the edge (i, j) by the color given
j − i in the original coloring. By Ramsey's theorem, if N > R(ℓ + 1, ℓ + 1, . . . , ℓ + 1) we know that this
graph coloring contains a monochromatic K`+1 . This corresponds to a set {a1 , . . . , a`+1 } such that all the
differences within this set have the same color in our original coloring.
Letting

x1 = a2 − a1
x2 = a3 − a2
...
xℓ = aℓ+1 − aℓ
c = aℓ+1 − a1
This gives our desired monochromatic solution.
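As a small illustration (mine, not in the notes), the k = 2 case can be checked exhaustively: the coloring {1, 4} / {2, 3} of {1, . . . , 4} has no monochromatic solution to x + y = z, but every 2-coloring of {1, . . . , 5} has one (repetitions x = y are allowed, as in the statement of the theorem).

```python
from itertools import product

def has_mono_schur_triple(coloring):
    """coloring: dict mapping each integer to a color.  Look for x, y, z
    of a single color with x + y = z (x = y allowed)."""
    return any(coloring[x] == coloring[y] == coloring[x + y]
               for x in coloring for y in coloring if x + y in coloring)

# {1,...,4} colored {1,4} red / {2,3} blue avoids monochromatic triples...
assert not has_mono_schur_triple({1: "r", 4: "r", 2: "b", 3: "b"})
# ...but all 32 two-colorings of {1,...,5} contain one.
assert all(has_mono_schur_triple(dict(zip(range(1, 6), colors)))
           for colors in product("rb", repeat=5))
```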
Note that actually this gives us something a little bit stronger. Effectively we have that all sums of the form

Σ_{m=i}^{j} xm

are the same color, for any 1 ≤ i ≤ j ≤ ℓ, since that sum equals aj+1 − ai.