Regularity Lemma

Graphs & Algorithms: Advanced Topics
Szemerédi’s Regularity Lemma
Uli Wagner
ETH Zürich
Motivation: Random Bipartite Graphs
B
A
Let G = (A ∪˙ B, E ),
|A| = |B| = n, every
possible edge ab chosen
independently with probability p, 0 ≤ p ≤ 1.
a
Prob = p
b
X
Y
!
"#
n
$
Theorem (Bernstein–Chernoff Bound)
!
"#
n
$
If Z1 , . . . , Zm are independent {0, 1}-valued random variables,
Pr[Zi = 1] = p, then
X
2
Pr[|
Zi − pm| > ∆] < 2e −2∆ /m .
i
Corollary
For X ⊆ A and Y ⊆ B,
Pr[|e(X , Y ) − p|X ||Y ||] > ε|A||B|] < 2e −2ε
2 |A|2 |B|2 /|X ||Y |
2 |A||B|
≤ 2e −2ε
Random Bipartite Graphs Cont’d
B
A
Let G = (A ∪˙ B, E ),
|A| = |B| = n, every
possible edge ab chosen
independently with probability p, 0 ≤ p ≤ 1.
a
b
Y
!
Corollary
Prob = p
X
"#
n
!
$
"#
n
$
For X ⊆ A and Y ⊆ B,
Pr[|e(X , Y ) − p|X ||Y || > ε|A||B|] < 2e −2ε
2 |A||B|
2 n2
≤ 2e −2ε
The number of X ’s, Y ’s is 2|A| · 2|B| = 4n . Thus, with probability
4 2
at least 1 − 4n · 2e −2ε n → 1 (as n → ∞),
|e(X , Y ) − p|X ||Y || ≤ ε|A||B|
holds for all X ⊆ A, Y ⊆ B.
The regularity Lemma—Informal Statement
Example (A Class of Dense Graphs)
Let k be a fixed constant, and let pij ∈ [0, 1] be fixed probabilities.
Given n, let V = V1 ∪ V2 ∪ . . . ∪ Vk with |Vi | = n/k, 1 ≤ i ≤ k,
construct a k-partite graph by choosing the edges between Vi and
Vj independently with probability pij (no edges inside the Vi ).
Message of the Regularity Lemma
Every dense graph G (i.e., Ω(n2 ) edges) is approximately of the
above form, i.e., G is the approximate union of constantly many
random-like bipartite subgraphs.
The number of parts depends only on the error of the
approximation error but not the size of G !
One of the most important tools in “dense” combinatorics.
The Regularity Lemma—Regular Pairs
Definition (Density, Regular Pairs)
Let ε > 0 and let G = (V , E ) be a graph. For subsets A, B ⊆ V ,
the the density of the pair (A, B) is
|E (A, B)|
d(A, B) :=
|A| · |B|
A pair (A, B) of subsets A, B ⊆ V is called ε-regular pair (for G ) if
all X ⊆ A, and Y ⊆ B satisfy
e(X , Y ) − d(A, B) · |X ||Y | ≤ ε|A||B|.
Remark
In other words, (A, B) is ε-regular iff the (induced) bipartite graph
G (A, B) is ε-quasirandom, i.e., ε-approximately behaves like a
random bipartite graph with edge probability p = d(A, B).
Szemerédi’s Regularity Lemma—Regular Partitions
Definition
Let ε > 0, let G = (V , E ) be a graph, and let V = V1 ∪ . . . ∪ Vk
be a partition of V .
I
For an ordered pair (x, y ) ∈ V × V , there is a unique pair
(i, j) ∈ [k] × [k] with x ∈ Vi and y ∈ Vj (possibly i = j).
I
Let B := {(i, j) ∈ [k] × [k] : (Vi , Vj ) not ε-regular} be the
(indices of) nonregular (“bad”) pairs.
I
The partition is called ε-regular (for G ) if
X
|Vi | · |Vj | ≤ ε|V |2 .
(i,j)∈B
In other words, if we pick (u, v ) ∈ V × V uniformly at random
then with probability at least 1 − ε, the corresponding pair
(Vi , Vj ) is ε-regular.
Regularity Lemma—Formal Statement
Theorem (Regularity Lemma)
Given ε > 0, there exists K = K (ε) depending only on ε such that
every graph G has an ε-regular partition into at most K parts.
Remarks
I
Meaningful only for dense graphs with at least εn2 edges.
I
Sometimes, one also wants to prescribe a lower bound k on
the number of parts. In that case, the upper bound
K = K (ε, k) depends on k and ε.
I
One can also obtain regular partitions for which all parts have
approximately equal size, |Vi | ≤ |Vj | ≤ |Vi | + 1 for all i, j.
I
Was devised to prove that “dense sets of integers contain an
arithmetic progression of arbitrary length”.
Szemerédi’s Theorem on Arithmetic Progressions
Definition
An arithmetic progression (of integers, of length k) is a set of
integers of the form
{a, a + d, a + 2d, . . . , a + (k − 1)d}.
Theorem (Szemerédi’s Theorem, 1975)
For any integer k ≥ 1 and δ > 0 there is an integer N = N(k, δ)
such that any subset S ⊆ {1, . . . , N} with |S| ≥ δN contains an
arithmetic progression of length k.
History of Szemerédi’s Theorem and the Regularity Lemma
I
Conjectured by Erdős and Turán (1936). Important problem
in mathematics, inspired lots of great new ideas and research
in various fields.
I
Case k = 3: analytic number theory (Roth, 1953; Fields
medal)
I
First proof for arbitrary k: combinatorial (Szemerédi, 1975)
I
Second proof: ergodic theory (Furstenberg, 1977)
I
Third proof: analytic number theory (Gowers, 2001; Fields
medal)
I
Fourth proof: fully combinatorial (Hypergraph Regularity)
(Rödl–Frankl–Nagle–Schacht–Skokan, Gowers, 2007)
I
Fifth proof: measure theory (Elek-Szegedy, 2007)
I
One of the ingredients in the proof of Green and Tao: “primes
contain arbitrary long arithmetic progressions’
Regularity—An Analytic Reformulation
Let G = (A ∪ B, E ) be a bipartite graph of density p.
We can view G as a function G : A × B → {0, 1},
1, if {x, y } ∈ E (G ),
G (x, y ) :=
0,
else.
Renormalization
Define fG : A × B → [−1, 1] by fG (x, y ) := G (x, y ) − p, i.e.,
1 − p, if {x, y } ∈ E (G ),
fG (x, y ) :=
−p,
else.
Observation
The pair (A, B) is ε-regular iff for any two functions u : A → {0, 1}
and w : B → {0, 1},
X
X
fG (x, y ) · u(x) · w (y ) ≤ ε|A||B|.
x∈A y ∈B
Regularity—An Analytic Reformulation (cont’d)
Let G = (A ∪ B, E ) be a bipartite graph of density p.
Lemma
If the pair (A, B) is ε-regular then, for any pair of functions
u : A → [−1, 1] and w : B → [−1, 1], we have
X X
≤ 4ε|A||B|.
f
(x,
y
)
·
u(x)
·
w
(y
)
G
x∈A y ∈B
(1)
Proof.
I
If (1) fails for some functions u and w then by random
rounding we may assume that u and w take values in
{−1, 0, 1}.
I
Write u = u+ − u− and w = w+ − w− as differences of
{0, 1}-valued functions. The LHS of (1) can be written as a
sum of four terms each involving only these functions. If the
sum exceeds 4ε|A||B| then one term must exceed ε|A||B|.
Counting Lemma
Lemma (Counting Lemma)
Let G be a t-partite graph on V = V1 ∪ . . . ∪ Vt .
Let H be a graph o the vertex set [t] = {1, 2, . . . , t}, and let
#{H ⊆ G } be the number of labeled copies of H in G with vertex
i of H represented by some vi ∈ Vi .
Assume that for every edge {i, j} ∈ E (H), the pair (Vi , Vj ) is
ε-regular of density pij (no assumption on the non-edges). Then
#{H ⊆ G } −
|
Y
pij
{i,j}∈E (H)
{z
t
Y
i=1
t
Y
|E (H)|
·ε·
|Vi |.
|Vi | < 4 · 2
}
expectation in the random case
i=1
Proof of the Counting Lemma
For {i, j} ∈ [t] = V (H), let Gij = G (Vi , Vj ) and fij = Gij − pij .
X
X
Y
#{H ⊆ G } =
...
fij (xi , xj ) + pij
x1 ∈V1
xt ∈Vt {i,j}∈E (H)
|
{z
(∗)
}
Expanding the productQ(∗) yields an inner sum of 2|E (H)| terms.
One of these terms is ij∈E (H) pij . Summing up this term over all
Q
Q
t
p
(x1 , . . . , xt ) ∈ V1 × · · · × Vt yields
ij∈E (H) ij
i=1 |Vi | =: E .
Every other term in the expansion of (∗) involves at least one
factor fk` (xk , x` ), and all other factors depend only on xk or on xj ,
but not on both. Thus, if we fix xi , i 6= k, ` we can write the term
as fk` (xk , x` )u(xk )w (x` ). If we sum this up over xk and x` the
result is at most 4ε|Vk ||V` | in absolute value.
Q If we sum this up
over all xi , i 6= k, ` the result is at most 4ε ti=1 |Vi |. There are
less than 2|E (H)| such other terms in the expansion of (∗), so
t
Y
|#{H ⊆ G } − E | < 4 · 2|E (H)| · ε
|Vi |.
i=1
Proof of the Regularity Lemma—Mean Square Density
Proposition (Cauchy-Schwarz Inequality)
R
For every a = (a1 , . . . , ar ), b = (b1 , . . . , br ) ∈ r ,
r
r
r
X
X
X
2
2
2
2
2
bi2
ai bi = ha, bi ≤ kak2 kbk2 =
ai ) ·
i=1
i=1
i=1
Definition (Mean-Square Density)
R
P
f (x)
Let U be a finite set, f : U → a function, and let d := x∈U
|U|
be the mean of f on U.
P
x∈Ui f (x)
i|
Let U = U1 ∪ . . . ∪ Ur a partition, γi := |U
|U| and let di =
|Ui |
be the mean of f |Ui on Ui . Then
the
mean-square
density
of
f
P
w.r.t. the partition {Ui }ri=1 is ri=1 γi di 2 .
Lemma (Mean-Square Density Exceeds Square Density)
2
d ≤
r
X
i=1
γi di 2 .
Refinements, Vertex and Edge Partitions
Corollary (Refinements Increase Mean-Square Density)
R
Let f : U → , U = U1 ∪ . . . ∪ Ur , d, γi , and di as before,
1 ≤ i ≤ r.
S
Let {Uij } be a refinement of the partition {Ui }, i.e., Ui = j Uij ,
P
1 ≤ i ≤ r . Let dij :=
γij :=
|Uij |
|U| .
Then
x∈Uij
f (x)
|Uij |
X
i
be the mean of f |Uij and
γi di 2 ≤
X
γij dij 2 .
i,j
Remark (Vertex Partitions induce Partitions of Pairs)
Let G be a bipartite graph with vertex sets A and B, and let
A = A1 ∪ . . . ∪ Am and B = B1 ∪ . . . ∪ Bn be partitions of the
vertex sets. These induce a partition of U = A × B into subsets
Uij = Ai × Bj .
The mean-square density of G w.r.t. the partitions {Xi }, {Yj } is
the mean-square density of G : A × B → {0, 1} w.r.t. the induced
P
|Bj |
e(Ai ,Bj )
|Ai |
2
partition:
i,j αi βj dij , where αi = |A| , βj = |B| , dij = |Ai ||Bj |
Density Incremement, First Step
Lemma
Let G be a bipartite graph of density d with vertex sets A and B
and assume that G is not ε-regular. Then there are partitions
A = A1 ∪ A2 and B = B1 ∪ B2 such that the mean-square density
of G is at least d 2 + ε2
Proof.
Let f (x, y ) = G (x, y ) − d. By definition
P of regularity, there are
sets X ⊆ A and Y ⊆ B such that | (x,y )∈X ×Y f (x, y )| > ε|A||B|.
Set A1 = X , A2 = A \ X , B1 = Y , B2 = B \ Y , and let φij be the
mean of f on Ai × Bj . Then the mean of G on Ai × Bj is φij + d.
Note that |φ11 | > ε |A|A||B|
. The mean-square density of G is
1 ||B1 |
2
X
|Bj |
|B|
z}|{
X
αi βj (d + φij )2 ≥ d 2 + 2d
αi βj φij +α1 β1 φ11 2
|{z}
|{z}
| {z }
i,j
i,j=1 |A |
ε
i
>(
)2
d 2 +2dφij +φij 2
| {z }
α1 β1
|A|
=0
Density Increment, Second Step
Lemma (Main Lemma)
Let G be a bipartite graph with vertex sets A and B, respectively.
Let A = A1 ∪ . . . ∪ Am and B = B1 ∪ . . . ∪ Bn be partitions,
|Bj |
i|
be d 2 .
αi = |A
|A| , βj = |B| , and let the mean-square density
P
Let N := {(i, j) : G (Ai , Bj ) not ε-regular}. If (i,j)∈N αi βj > ε
then there exist refinements Ai = Ai1 ∪ . . . ∪ Aisi , 1 ≤ i ≤ m and
Bj = Bj1 ∪ . . . ∪ Bjtj , 1 ≤ j ≤ n, such that the mean-square density
of G w.r.t. the refined partition is at least d 2 + ε3 .
Moreover, all si ≤ 2n and all tj ≤ 2m .
Proof of the Regularity Lemma
If G is a bipartite graph, start with an arbitrary partition of the
vertex sets and iterate the main lemma as long as the partition is
not ε-regular.
Since the mean-square density can never exceed 1, the procedure
has to terminate after at most ε−3 steps. The resulting number of
2···
parts on either side is at most 22 (a tower of height ε−3 ).
If G is not bipartite then first choose a partition of the vertex sets
into k = 2/ε color classes such that at least (1 − ε/2) of all vertex
pairs are bicolored.
Maintain a partition for every color class and refine all partitions
until for every pair of color classes, the corresponding partitions are
ε/2-regular.
Triangle Removal Lemma & Roth’s Theorem
Theorem (Triangle Removal Lemma)
For every γ > 0 there exists δ = δ(γ) > 0 such that
the following
n
holds: If G is an n-vertex graph with at most δ 3 triangles
then
one can make G triangle-free by deleting at most γ n2 edges.
(Proof: Homework)
Theorem (3-Term Arithmetric Progressions (Roth, 1953))
For every > 0 there exists N = N() such that for n ≥ N, any
S ⊆ [n] with |S| ≥ n contains a three-term arithmetic progression.
Remark
Deriving the general form of Szemerédi’s Theorem (arithmetic
progressions of length k) from Szemerédi’s Regularity Lemma is
HIGHLY non-trivial.
First Application of Regularity: Proof of Roth’s Theorem
Create a tri-partite graph H = H(S) from S.
Vertices:
V (H) = {(i, 1) : i ∈ [n]} ∪ {(j, 2) : j ∈ [2n]} ∪ {(k, 3) : k ∈ [3n]}
Edges: (i, 1) and (j, 2) are adjacent if j − i ∈ S
(j, 2) and (k, 3) are adjacent if k − j ∈ S
(i, 1) and (k, 3) are adjacent if k − i ∈ 2S
Note. {(i, 1), (j, 2), (3, k)} is a triangle in H iff j = i + x and
k = i + x + y with x, y ∈ S and x + y ∈ 2S.
If x = y , we have a boring triangle {(i, 1), (i + x, 2), (i + 2x, 3)}.
There is one of these for every i ∈ [n], x ∈ S ⇒ and they are
pairwise edge-disjoint.
|V (H)|
edges must be removed
Thus, at least |S|n ≥ n2 ≥ 18
2
from H to make it triangle-free.
So H contains at least δ |V (H)|
3
triangles, where δ = δ 18
as in the Removal Lemma.
Interesting triangles (x 6= y ) correspond to 3-term arithmetic
progressions {x, y , x+y
} ⊆ S. No such ⇒ number of triangles in H
2
6n
2
is n|S| ≤ n < δ 3 , contradiction, provided n > N() := 1δ .
Application 2—Property Testing
Input extremely large: even a linear time algorithm takes too long.
Have: random access to every entry of the input. One read from
the input is called a query.
Want: some information in constant time.
An input, given as function f : D → F , is ε-close to satisfying
property P if there exists a function f 0 : D → F that
I
satisfies P
I
differs from f at no more than ε|D| places.
An input that is not ε-close to satisfying P is called ε-far from
satisfying P.
Let P be a property and n be the input size. An ε-test for P with
q = q(ε) queries is a randomized algorithm that reads the input up
to q places and with probability at least 23 distinguishes between
the case that the input satisfies P and the case that the input is
ε-far from satisfying P.
Want: q independent of n.
Triangle-Freeness is Testable
Let ε be given. Let δ = δ(ε) be from the Removal Lemma.
Let G = (V , E ) be an input graph.
Choose uniformly and independently ` = 2δ vertex triplets
T1 , . . . , T` ⊆ V , |Ti | = 3.
If a triangle is found reject G , otherwise accept it.
What is the probability that we reject a triangle-free input G ? 0
What is the probability that we accept an input G which is ε-far
from being triangle-free? at most 31
Why? By the Removal Lemma,
Pr [Ti forms a triangle in G ] ≥ δ
Pr [none of T1 , . . . , T` is a triangle] ≤ (1 − δ)` ≤ e −δ`
1
≤ e −2 <
3