Hardness of Finding Independent Sets in Almost q-Colorable
Graphs
Subhash Khot∗
Rishi Saket†
April 5, 2012
Abstract
We show that for any ε > 0, and positive integers k and q such that $q \ge 2^k + 1$, given a graph
on N vertices that has a q-colorable induced subgraph of (1 − ε)N vertices, it is NP-hard to find an
independent set of $N/q^{k+1}$ vertices. This substantially improves upon the work of Dinur et al. [DKPS10]
who gave a corresponding bound of $N/q^2$.
Our result implies that for any positive integer k, given a graph that has an independent set of ≈
$(2^k + 1)^{-1}$ fraction of vertices, it is NP-hard to find an independent set of $(2^k + 1)^{-(k+1)}$ fraction of
vertices. This improves on the previous work of Engebretsen and Holmerin [EH08] who proved a gap of
≈ $2^{-k}$ vs $2^{-\binom{k}{2}}$, which is best possible using techniques (including those of [EH08]) based on the query
efficient PCP of Samorodnitsky and Trevisan [ST00].
∗ New York University
† IBM T. J. Watson Research Center
1 Introduction
Given a graph, an independent set is a subset of the vertices that does not contain both end points of any
edge. Computing a maximum sized independent set is a very well studied problem in computer science and
combinatorics. A related problem is that of graph coloring, where the goal is to color the vertices of a given
graph using the minimum number of colors such that no edge has both end points of the same color. It is easy
to see that if a graph is colorable by q colors then it must contain an independent set of $q^{-1}$ fraction of the
vertices. Equivalently, if a graph does not contain an independent set of $q^{-1}$ fraction of vertices, then it cannot
be colored using q colors. Thus, the absence of a large independent set certifies that the chromatic number
of the graph – the minimum number of colors required to color the graph – is large.
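The observation that a q-coloring yields an independent set of at least a $q^{-1}$ fraction of the vertices can be checked on a toy instance. The following is a minimal sketch, not part of the paper's argument; the 6-cycle, its coloring and the helper names are hypothetical choices made only for illustration.

```python
from collections import defaultdict

def largest_color_class(coloring):
    """Return the largest color class of a coloring given as {vertex: color}."""
    classes = defaultdict(set)
    for vertex, color in coloring.items():
        classes[color].add(vertex)
    return max(classes.values(), key=len)

def is_independent(edges, vertex_set):
    """True if no edge has both end points inside vertex_set."""
    return not any(u in vertex_set and v in vertex_set for u, v in edges)

# Hypothetical example: a 6-cycle, properly colored with q = 3 colors.
edges = [(i, (i + 1) % 6) for i in range(6)]
coloring = {i: i % 3 for i in range(6)}
q = len(set(coloring.values()))
best = largest_color_class(coloring)

# By pigeonhole, the largest class has at least N/q vertices.
print(sorted(best), len(best) * q >= len(coloring))
```

Any proper coloring can be passed in the same way; the pigeonhole bound does not depend on the particular graph.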
Another closely related problem is that of computing the minimum vertex cover, i.e. a subset of vertices
of minimum size that contains at least one end point of every edge in the graph. The complement of vertex
cover is an independent set. Therefore, as in the case of coloring, the absence of a large independent set
guarantees that every vertex cover is large. The best known inapproximability result for vertex cover is due
to Dinur and Safra [DS05] who showed that it is NP-hard to approximate it within a factor of 1.36. The
result of [DS05] showed in particular that it is NP-hard to decide whether a graph of N vertices has an
independent set of size ≈ $N/3$ or every independent set is of size at most ≈ $N/9$.
Building upon the work of Dinur and Safra [DS05], Dinur, Khot, Perkins and Safra [DKPS10] proved
that: for any positive integer q ≥ 3 and small constant ε > 0, given a graph, it is NP-hard to decide whether
(i) the graph contains an induced subgraph of (1 − ε)-fraction of the vertices which is q-colorable, where
each color class has $(1 - \varepsilon)\frac{1}{q}$ fraction of the vertices in the graph, or (ii) the maximum independent set in
the graph contains less than $\frac{1}{q^2}$ fraction of the vertices. In other words, it is NP-hard to find an independent
set of $\frac{1}{q^2}$ fraction of the vertices in almost q-colorable graphs.
In this work, we generalize and substantially improve upon the result of Dinur, Khot, Perkins and Safra
[DKPS10]. We show that for any positive integer k, any integer q such that $q \ge 2^k + 1$, and an
arbitrarily small constant ε > 0, given a graph, it is NP-hard to decide between the following two cases,
• The graph contains a q-colorable induced subgraph of (1 − ε)-fraction of vertices, where each color
class has $(1 - \varepsilon)\frac{1}{q}$ fraction of vertices.
• Every independent set in the graph has less than $\frac{1}{q^{k+1}}$ fraction of the vertices.
The reduction used to prove this result builds upon the ideas of [DS05] and [DKPS10] and combines them
with a new outer verifier based on the query-efficient Probabilistically Checkable Proof (PCP) system of
Håstad and Khot [HK05]. We elaborate more on these techniques later in this section.
Note that for the setting of parameters k = 1 and 3 ≤ q ≤ 4, our result is the same as that of [DKPS10],
i.e. it is NP-hard to distinguish between an almost q-colorable graph and a graph where the maximum
independent set contains at most $\frac{1}{q^2}$ fraction of vertices. For larger values of k and q, our result yields
substantially better hardness factors. For example, setting k = 2 and $q = 2^2 + 1 = 5$ shows that it is
NP-hard to find an independent set of $\frac{1}{125}$ fraction of the vertices in an almost 5-colorable graph. Setting
k = 3 and $q = 2^3 + 1 = 9$ gives us that it is NP-hard to find an independent set of $\frac{1}{6561}$ fraction of vertices
in an almost 9-colorable graph.
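The arithmetic behind these examples is easy to reproduce. The sketch below is only an illustration: the function name `gap_fractions` is hypothetical, and the (1 − ε) factor in the YES case is ignored.

```python
from fractions import Fraction

def gap_fractions(k, q=None):
    """Return (1/q, 1/q^(k+1)) for parameters with q >= 2^k + 1.

    Defaults to the smallest admissible q = 2^k + 1.
    """
    if q is None:
        q = 2 ** k + 1
    if q < 2 ** k + 1:
        raise ValueError("the result requires q >= 2^k + 1")
    return Fraction(1, q), Fraction(1, q ** (k + 1))

for k in (1, 2, 3):
    yes_fraction, no_fraction = gap_fractions(k)
    print(k, yes_fraction, no_fraction)
```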
In terms of hardness of approximating maximum independent set, our result shows that: for any positive
integer k it is NP-hard to find an independent set of $(2^k + 1)^{-(k+1)}$ fraction of vertices in a graph which has
an independent set of ≈ $(2^k + 1)^{-1}$ fraction of vertices. This improves upon previous work of Engebretsen
and Holmerin [EH08] who showed that, for any integer k ≥ 2 it is NP-hard to compute an independent set
of ≈ $2^{-\binom{k}{2}}$ fraction in a graph which has an independent set of ≈ $2^{-k}$ fraction of vertices. The result of
Engebretsen and Holmerin [EH08] extended the work of Samorodnitsky and Trevisan [ST00] who gave a
gap of ≈ $2^{-k}$ versus $2^{-k^2/4}$ (say for even k). The latter paper also demonstrated why $2^{-\binom{k}{2}}$ is a natural
bottleneck for these proof techniques. Roughly, the reason is that there exist functions $f : \{0,1\}^n \mapsto \{0,1\}$
that have no non-negligible Fourier coefficients, but still pass the full hyper-graph linearity test with
probability $2^{-\binom{k}{2}}$, i.e. with this much probability $f(\bigoplus_{i \in S} x_i) = \bigoplus_{i \in S} f(x_i)$, $\forall S \subseteq \{1, 2, \dots, k\}$ for a
random choice of $x_1, \dots, x_k \in \{0,1\}^n$. On the other hand, as soon as the probability exceeds $2^{-\binom{k}{2}}$, the
function must have a non-negligible Fourier coefficient, even when the test is carried out only for all $\binom{k}{2}$ sets
S, |S| = 2. We find it quite interesting that this bottleneck is bypassed in our result via completely different
techniques based on [DS05, DKPS10].
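To make the hyper-graph linearity test concrete, here is a minimal sketch of the check described above. It assumes points of $\{0,1\}^n$ are encoded as Python integers (bitmasks); the helper names, the mask and the random seed are hypothetical and not taken from [ST00] or [EH08].

```python
import random
from itertools import combinations

def xor_all(values):
    """XOR of an iterable of integers (0 for the empty iterable)."""
    acc = 0
    for value in values:
        acc ^= value
    return acc

def passes_test(f, xs, sizes=None):
    """Check f(XOR_{i in S} x_i) == XOR_{i in S} f(x_i) for every S of the given sizes.

    With sizes=None every subset S of {1, ..., k} is tested (the full test);
    sizes=[2] restricts the test to pairs.
    """
    k = len(xs)
    if sizes is None:
        sizes = range(k + 1)
    for size in sizes:
        for S in combinations(range(k), size):
            lhs = f(xor_all(xs[i] for i in S))
            rhs = xor_all(f(xs[i]) for i in S)
            if lhs != rhs:
                return False
    return True

def parity_on(mask):
    """A linear function: the parity of the bits of x selected by mask."""
    return lambda x: bin(x & mask).count("1") % 2

rng = random.Random(0)
n, k = 8, 4
xs = [rng.randrange(2 ** n) for _ in range(k)]
linear = parity_on(0b10110001)
print(passes_test(linear, xs), passes_test(linear, xs, sizes=[2]))
```

A linear function passes for every choice of the points, while the constant-one function already fails on pairs.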
It is pertinent to note that assuming the Unique Games Conjecture (UGC) of Khot [Kho02], Bansal
and Khot [BK09, BK10] showed that for any constant δ > 0 it is NP-hard to find an independent set of δ
fraction of vertices in almost 2-colorable graphs. This bound is significantly stronger than those obtained
unconditionally, including the results in this paper. However, since UGC remains unresolved we believe that
our results – in addition to proving stronger unconditional lower bounds – shed new light on the applicability
of some important techniques in PCP theory, especially on the methods developed in the work of Dinur and
Safra [DS05]. In the rest of this section we shall define the problems studied and formally state our results.
We shall also review the related previous work and informally describe the techniques used in this work.
1.1 Problem Definition
We begin by defining a decision problem for the size of the maximum independent set as follows.
INDEPENDENTSET(c, s): Given a graph G(V, E), decide between the following cases.
• YES Case: IS(G) ≥ c|V |.
• NO Case: IS(G) < s|V |.
where IS(G) is the size of the maximum independent set in G.
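For very small graphs the two cases of the decision problem can be evaluated by brute force. The sketch below is only meant to illustrate the definition (it takes exponential time); the function names and the 5-cycle example are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def max_independent_set_size(vertices, edges):
    """IS(G) by exhaustive search; only sensible for tiny graphs."""
    edge_set = {frozenset(e) for e in edges}
    for size in range(len(vertices), 0, -1):
        for candidate in combinations(vertices, size):
            if not any(frozenset(pair) in edge_set
                       for pair in combinations(candidate, 2)):
                return size
    return 0

def classify(vertices, edges, c, s):
    """Return 'YES' if IS(G) >= c|V|, 'NO' if IS(G) < s|V|, else 'neither'."""
    best = max_independent_set_size(vertices, edges)
    if best >= c * len(vertices):
        return "YES"
    if best < s * len(vertices):
        return "NO"
    return "neither"  # the instance violates the promise

# Hypothetical example: the 5-cycle, whose maximum independent set has size 2.
vertices = list(range(5))
edges = [(i, (i + 1) % 5) for i in range(5)]
print(classify(vertices, edges, Fraction(2, 5), Fraction(1, 5)))
```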
Given a graph G, let χ(G) be its chromatic number, i.e. the minimum number of colors required to color
the graph such that every edge has distinctly colored end points. The graph coloring problem is defined as:
COLORING(q, Q): Given a graph G(V, E), decide between,
• YES Case: χ(G) ≤ q.
• NO Case: χ(G) ≥ Q.
It is easy to see that if COLORING(q, Q) is NP-hard for some parameters $q, Q \in \mathbb{Z}^+$ then it is NP-hard to
color a q-colorable graph with Q − 1 colors. In this paper we study a slight variant of graph coloring, which
we refer to as almost coloring, which is defined, for positive integers q, Q and a constant ε > 0, as follows.
ALMOSTCOLORING_ε(q, Q): Given a graph G(V, E), decide between,
• YES Case: There is a subset of (1 − ε) fraction of the vertices, such that the graph G′ induced by it
satisfies χ(G′) ≤ q. We also denote this by $\chi_\varepsilon(G) \le q$.
• NO Case: $IS(G) < \frac{|V|}{Q}$.
Note that the second property above, i.e. $IS(G) < \frac{|V|}{Q}$, implies that $\chi_\varepsilon(G) \ge Q$ for sufficiently small ε > 0.
1.2 Our Results
The main theorem that we prove is stated below.
Theorem 1.1. For any constant ε > 0, and positive integers k and q such that $q \ge 2^k + 1$, given a graph
G(V, E), it is NP-hard to distinguish between the following two cases:
• YES Case: There are q disjoint independent sets $V_1, \dots, V_q \subseteq V$, such that $|V_i| = \frac{(1-\varepsilon)}{q}|V|$ for
i = 1, . . . , q.
• NO Case: There is no independent set in G of size $\frac{1}{q^{k+1}}|V|$.
The following theorems follow directly from Theorem 1.1.
Theorem 1.2. For any constant ε > 0, a positive integer k and integer q such that $q \ge 2^k + 1$,
ALMOSTCOLORING_ε(q, $q^{k+1}$) is NP-hard.
Theorem 1.3. For any constant ε > 0, a positive integer k and integer q such that $q \ge 2^k + 1$,
INDEPENDENTSET($(1 - \varepsilon)\frac{1}{q}$, $\frac{1}{q^{k+1}}$) is NP-hard.
1.3 Previous Work
In the results stated in this subsection, ε > 0 shall be taken to be an arbitrarily small constant. The
independent set and graph coloring problems are NP-hard in general and much of the research on these
problems has focused on understanding their approximability. For approximating maximum independent
set, the best algorithm is due to Feige [Fei04] who gave an $O\left(\frac{n (\log\log n)^2}{\log^3 n}\right)$-approximation for it, where n
is the number of vertices in the graph. On the other hand, Håstad [Hås99] gave an $n^{1-\varepsilon}$ hardness factor for
maximum independent set which was improved by Engebretsen and Holmerin [EH00] to $n^{1 - O(1/\sqrt{\log\log n})}$
and by Khot [Kho01] to $n/2^{(\log n)^{1-\gamma}}$ for some fixed γ > 0. The current best inapproximability is by
Khot and Ponnuswami [KP06], who showed that INDEPENDENTSET(c, s) is NP-hard, assuming
$\mathrm{NP} \not\subseteq \mathrm{DTIME}(2^{\mathrm{poly}(\log n)})$, where $c/s \ge n/2^{(\log n)^{3/4+\varepsilon}}$. However, the parameter c is a subconstant, i.e. c = o(1)
in the result of [KP06] (as also in [Hås99], [EH00] and [Kho01]). For constant values of c, Khot and Regev
[KR08] and subsequently Bansal and Khot [BK09, BK10] showed, assuming UGC, that
INDEPENDENTSET($\frac{1}{2} - \varepsilon$, ε) is NP-hard. Unconditionally however, the previous best result showing that
INDEPENDENTSET($(1 - \varepsilon)2^{-k}$, $(1 + \varepsilon)2^{-\binom{k}{2}}$) is NP-hard for any integer k ≥ 2 was proved by Engebretsen
and Holmerin [EH08] (after applying the FGLSS reduction [FGL+96]). They extended the query efficient PCP
of Samorodnitsky and Trevisan [ST00] who, as mentioned before, also demonstrated why $2^{-\binom{k}{2}}$ is a natural
bottleneck for these techniques. In our work, we bypass this bottleneck by proving in Theorem 1.3 that
INDEPENDENTSET($(1 - \varepsilon)(2^k + 1)^{-1}$, $(2^k + 1)^{-(k+1)}$) is NP-hard for any positive integer k.
The related problem of COLORING(q, Q) has also been well studied for various ranges of the parameters
q, Q. COLORING(2, Q) can be solved efficiently by checking whether a graph is bipartite. For q = 3, a long
line of research ([Wig83], [Blu94], [KMS98], [BK97], [ACC06]) shows that COLORING(3, $n^{\alpha}$) can be
solved where the best value of α ≈ 0.207. For general values of q, Halperin, Nathaniel and Zwick [HNZ02]
solve COLORING(q, $n^{\alpha_q}$) for some constant $\alpha_q \in (0, 1)$ depending on q. These algorithmic results also hold
for ALMOSTCOLORING_ε(q, Q), for small enough values of ε.
The hardness of approximation results for graph coloring are quite far from matching the algorithmic
upper bounds. For q = 3 the best hardness result shows that COLORING(q, Q) is NP-hard for Q = 5, and
for general q it is hard for $Q = q + 2\lceil \frac{q}{3} \rceil$ [KLS00, GK04]. For all sufficiently large (but unspecified) q, Khot
[Kho01] showed that the problem is NP-hard for $Q = q^{\frac{\log q}{25}}$. For the almost coloring variant, Dinur, Khot,
Perkins and Safra [DKPS10] showed that ALMOSTCOLORING_ε(q, $q^2$) is NP-hard for all values of q ≥ 3.
We note that our result in Theorem 1.2 significantly improves on the work of [DKPS10] for all values of
q ≥ 5. It is incomparable to the result of Khot [Kho01] since our lower bound is better and holds also for
small values of q, though it is only for the almost coloring variant.
Assuming the UGC yields stronger results for graph coloring. Bansal and Khot [BK09, BK10] showed,
assuming UGC, that ALMOSTCOLORING_ε(2, Q) is NP-hard for any positive integer Q ≥ 3. Similarly,
assuming a variant of UGC, Dinur, Mossel and Regev [DMR09] showed that COLORING(3, Q) is NP-hard
for all positive integers Q ≥ 3.
1.4 Our Techniques
The overall approach of our proof builds upon the ideas of [DS05] and [DKPS10]. The constructions in
[DS05], [DKPS10] and in our paper consist of three main parts : (i) An initial Label Cover Problem, (ii)
Outer Verifier on blocks of variables, and (iii) a final Combinatorial Gadget. In the remainder of this section
we shall describe how our construction differs from and builds upon the work of [DS05] and [DKPS10] in
each of these steps ([DKPS10] can be considered as the special case k = 1).
Label Cover
The constructions of [DS05] and [DKPS10] use the same Label Cover instance based on the PCP Theorem
[AS98, ALM+98] combined with Raz’s Parallel Repetition Theorem [Raz98]. Using this they prove the
following theorem on independent sets in a special class of graphs. Before we state the theorem, let us
define (m, r)-co-partite graphs: a graph G(V, E) is (m, r)-co-partite if V = M × R, where |M | = m,
|R| = r, such that for each i ∈ M , the subset of vertices {i} × R is a clique.
The following theorem can be deduced from the PCP Theorem and Raz’s Parallel Repetition Theorem.
Theorem 1.4. For any δ > 0 and positive integer h, there exists a parameter r such that the following
problem is NP-hard : Given an (m, r)-co-partite graph G, decide between the following two cases:
• YES Case: There is an independent set in G of size m.
• NO Case: Any subset of vertices of G of size at least δm contains a clique of size h.
The above theorem is, however, not strong enough to be used in our reduction. We require a strengthening
of the NO Case to lower bound the size of cliques in the k-wise repeated graph $G^k$, for a given positive
integer k. This k-wise repeated graph has vertex set $V^k$ and any two tuples $(u_1, \dots, u_k), (v_1, \dots, v_k) \in V^k$
are adjacent if there exist edges between $u_i$ and $v_i$ in G for all i = 1, . . . , k. Note that $|V^k| = (mr)^k$. In
this work we prove and use the following theorem.
Theorem 1.5. For any δ > 0 and positive integers k, h, there exists a parameter r such that the following
problem is NP-hard: Given an (m, r)-co-partite graph G and letting $G^k$ be its k-wise repetition, decide
between the following two cases:
• YES Case: There is an independent set in G of size m.
• NO Case: Any subset of $V^k$ of size at least $\delta m^k$ contains a clique (in $G^k$) of size h.
We would like to emphasize that the above theorem does not automatically follow from Theorem 1.4,
even in the case k = h = 2. In the NO Case of Theorem 1.4, the graph G may have an independent set I
of size $\frac{m}{r}$ (since only independent set of size δm is ruled out and $r = (1/\delta)^C$ for some large constant C
therein). Then I × V is an independent set of size $m^2$ (recall that |V| = mr) in $G^2$!
Indeed, we require a considerable effort to prove Theorem 1.5. The proof proceeds via the query efficient
PCP over a large alphabet constructed by Håstad and Khot [HK05] (which itself builds on [ST00, HW03]).
This PCP consists of a set of tests or predicates, each over a small subset of the variables. The number
of satisfying patterns, say B, for any predicate is much smaller than the inverse of the soundness of the
PCP. Roughly speaking, this rules out the existence of even the not-so-large independent sets (of ≈ $1/B^k$
fraction) in the NO case of the co-partite graph G derived from this PCP. Combining this property with a
careful analysis of the structure of $G^k$ yields the desired hardness result. Theorem 1.5 is restated as Theorem
3.1 and proved in Section 6.
Outer Verifier
The Outer Verifier is a new graph derived from the (m, r)-co-partite graph G(V, E) given by Theorem 1.5
stated above. A simplified overview of the Outer Verifier graph is as follows. Set a parameter l ≫ r and for
every block $B \in \binom{V}{l}$ and true-false assignment $a : B \mapsto \{T, F\}$, there is a vertex (B, a). The T values are
supposed to correspond to an independent set in G. For some technical reasons, only those assignments are
considered which have a sufficient number of T values, and we call this set of assignments $R_B$. The vertices
corresponding to assignments $R_B$ for any particular block B form a clique. Our construction crucially
differs from that of [DS05] and [DKPS10] in the manner in which edges across different blocks are defined.
In the previous constructions of [DS05, DKPS10], there is an edge between $(B_1, a_1)$ and $(B_2, a_2)$ if (i)
$B_1 = \hat{B} \cup \{u\}$ and $B_2 = \hat{B} \cup \{v\}$ where {u, v} ∈ E and $\hat{B} \in \binom{V}{l-1}$ is a sub-block, and (ii) either the
restrictions of $a_1$ and $a_2$ on the sub-block $\hat{B}$ are inconsistent or $a_1(u) = a_2(v) = T$. The vertices u and v
are referred to as pivot vertices for this edge. It can be shown that if there is a large independent set in G
then there is a large independent set in the Outer Verifier graph as well.
In our construction, we define edges in a more general manner: $(B_1, a_1)$ and $(B_2, a_2)$ have an edge
between them iff ([DKPS10] corresponds to the case k = 1):
(i) $B_1 = \hat{B} \cup \{u_1, \dots, u_k\}$ and $B_2 = \hat{B} \cup \{v_1, \dots, v_k\}$ where $\{u_i, v_i\} \in E$ for all i = 1, . . . , k and
$\hat{B} \in \binom{V}{l-k}$ is a sub-block.
(ii) Either the restrictions of $a_1$ and $a_2$ on $\hat{B}$ are inconsistent or $a_1(u_i) = a_2(v_i) = T$ for some i ∈
{1, . . . , k}.
In our construction there are k pairs of pivot vertices. As before, it can be shown that the Outer Verifier
graph in our construction, which we denote as $G_{\mathcal{B}}$, has a large independent set if G has a large independent
set. Since we have k pairs of pivot vertices, for our analysis we require the property about the existence of
large cliques in $G^k$ given by Theorem 1.5.
Combinatorial Gadget
The combinatorial gadget we use is essentially the same as the one used in [DKPS10]. For any integer q,
probability parameter p and dimension n, consider the graph $G_{q,p}[n]$ whose vertex set is $\{*, 1, \dots, q\}^n$,
which we refer to as colorings. There is an edge between two colorings $F_1$ and $F_2$ if for all i = 1, . . . , n,
$(F_1(i), F_2(i)) \notin \{(1, 1), (2, 2), \dots, (q, q)\}$, i.e. no coordinate is colored with the same symbol in [q] by
both the colorings. The colorings are equipped with a product measure that assigns, in each coordinate, a
weight 1 − p (which is set to be very small) for the ‘∗’ symbol and weight p/q to each of the symbols in [q].
Every coordinate i ∈ [n] yields q disjoint independent sets of measure p/q each, by taking the q subsets
of colorings which have a particular symbol from [q] in the ith coordinate. It is also easy to see that maximal
independent sets in $G_{q,p}[n]$ are monotone under the following partial order in each coordinate: ∗ < j for
all j ∈ [q]. The results proved in [DKPS10], based on Friedgut’s and Russo’s theorems, imply that every
monotone subset of the colorings is a junta under a small perturbation of the bias p. The Diametric Theorem
of Ahlswede and Khachatrian [AK98] can be applied to show that when p < 1, any monotone subset of the
colorings of measure at least $1/q^{k+1}$ contains two colorings F, F′ such that there are at most k coordinates
on which both F, F′ have the same value in [q]. The work of [DKPS10] also uses a similar fact, albeit only
for k = 1.
We use this gadget with the graph $G_{\mathcal{B}}$ in our construction as follows: in our final graph there is a copy of
$G_{q,p}[|R_B|]$ for each block $B \in \binom{V}{l}$, consisting of the vertex set $\{*, 1, \dots, q\}^{R_B}$, i.e. colorings of the block
assignments $R_B$. In other words there is a vertex (B, F) for every coloring F to the block assignments $R_B$.
There is an edge between $(B_1, F_1)$ and $(B_2, F_2)$ if both the following conditions are satisfied.
(1) $B_1 = \hat{B} \cup \{u_1, \dots, u_k\}$ and $B_2 = \hat{B} \cup \{v_1, \dots, v_k\}$ where $\{u_i, v_i\} \in E$ for all i = 1, . . . , k and
$\hat{B} \in \binom{V}{l-k}$ is a sub-block.
(2) For all $a_1 \in R_{B_1}$ and $a_2 \in R_{B_2}$: $F_1(a_1) = F_2(a_2) \in [q] \implies$ there is an edge between $(B_1, a_1)$ and
$(B_2, a_2)$ in $G_{\mathcal{B}}$.
A crucial ingredient in our construction is to upper bound the value of q for which the final graph is dense
enough. For this we prove that for all values of $q \ge 2^k + 1$ the following holds: for any two blocks $B_1$
and $B_2$ satisfying property (1) above, there is a joint distribution on the q-colorings $F_1 \in \{1, \dots, q\}^{R_{B_1}}$ and
$F_2 \in \{1, \dots, q\}^{R_{B_2}}$ such that $F_1$ and $F_2$ are uniformly distributed as marginals and $(B_1, F_1)$ and $(B_2, F_2)$
have an edge between them in the final graph. This follows from the following lemma that is restated as
Lemma 2.10 and proved in Section 2.4.
Lemma. For any positive integers k, q such that $q \ge 2^k + 1$, there exists a joint distribution over two
q-colorings of $2^{[k]}$, viz. $f, g : 2^{[k]} \mapsto [q]$ such that: (i) the marginal distributions of both f and g are uniform
over all the q-colorings of $2^{[k]}$, and (ii) for any S, T ⊆ [k], if S ∩ T = ∅, then f(S) ≠ g(T).
The work of [DKPS10] uses a similar fact, but only for k = 1 and $q \ge 2^1 + 1 = 3$. For k = 1, the
structural constraints satisfied by the two colorings have been frequently referred to as “α-constraints” (e.g.
in [DS05], [DKPS10] and [DMR09]).
1.5 Organization of the paper
In the next section we describe some useful objects and state some relevant combinatorial results. In Section
3 we give the hardness reduction and state its completeness and soundness properties which are proved in
Sections 4 and 5 respectively. The rest of the paper contains the proof of a strengthened hardness result for
finding independent sets in co-partite graphs, formally stated as Theorem 3.1.
2 Preliminaries
As in [DKPS10] we first formally define and state properties of several objects before we describe the
reduction. For the common objects we follow notation similar to that of [DKPS10].
2.1 Graph $G_{q,p}[n]$
For any positive integers n, q and probability parameter p ∈ (0, 1] the graph $G_{q,p}[n]$ consists of the weighted
vertex set $\{*, 1, \dots, q\}^n$. The measure $\mu_p$ on the vertex set is a product measure assigning, in each
coordinate, probability mass 1 − p to ∗ and $\frac{p}{q}$ to each of the remaining q elements. There is an edge between
vertices $F_1, F_2 \in \{*, 1, \dots, q\}^n$ iff for all i ∈ [n], $(F_1(i), F_2(i)) \notin \{(1, 1), (2, 2), \dots, (q, q)\}$.
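The definition can be exercised by brute force for tiny parameters. The sketch below assumes colorings are encoded as Python tuples over the symbols "*", 1, ..., q; the function names and the parameters n = 2, q = 3, p = 1/2 are hypothetical choices for illustration only.

```python
from fractions import Fraction
from itertools import product

STAR = "*"

def colorings(n, q):
    """All elements of {*, 1, ..., q}^n as tuples."""
    return list(product([STAR] + list(range(1, q + 1)), repeat=n))

def mu(F, p, q):
    """Product measure: weight 1 - p for '*', weight p/q for each symbol in [q]."""
    weight = Fraction(1)
    for symbol in F:
        weight *= (1 - p) if symbol == STAR else p / q
    return weight

def is_edge(F1, F2):
    """Edge iff no coordinate carries the same symbol of [q] in both colorings."""
    return not any(a == b and a != STAR for a, b in zip(F1, F2))

n, q, p = 2, 3, Fraction(1, 2)
vertices = colorings(n, q)

# The colorings with symbol 1 in the first coordinate pairwise agree there,
# so they form an independent set; its measure is p/q.
dictator = [F for F in vertices if F[0] == 1]
print(sum(mu(F, p, q) for F in dictator))
```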
2.2 Definitions
• Monotonicity: A family $\mathcal{F} \subseteq \{*, 1, \dots, q\}^n$ is monotone if $F \in \mathcal{F}$ implies $F' \in \mathcal{F}$ where F′ can be
obtained by changing some ∗ in any coordinate of F to some element in [q]. It is easy to see that any
maximal independent set in $G_{q,p}[n]$ is monotone.
• Every element of $\{*, 1, \dots, q\}^n$ is referred to as a coloring of [n].
• Two colorings $F_1$ and $F_2$ agree on i ∈ [n] if $F_1(i) = F_2(i) \in [q]$ (note that the common coordinate is
not ∗).
• A family $\mathcal{F}$ of colorings of [n] is agreeing if for all $F_1, F_2 \in \mathcal{F}$, there is an i ∈ [n] such that $F_1, F_2$
agree on i.
• A family $\mathcal{F}$ is (k + 1)-agreeing if for all $F_1, F_2 \in \mathcal{F}$ there exists a set of k + 1 coordinates
$\{i_1, i_2, \dots, i_{k+1}\} \subseteq [n]$ so that $F_1, F_2$ agree on all of them, i.e. $F_1(i_j) = F_2(i_j) \in [q]$ for all
j ∈ [k + 1].
• A set C ⊆ [n] is a (δ, p)-core for a family $\mathcal{F}$, if there exists a family $\mathcal{F}'$ such that $\mu_p(\mathcal{F} \triangle \mathcal{F}') \le \delta$ and
$\mathcal{F}'$ depends only on the coordinates in C, i.e. for any $F \in \{*, 1, \dots, q\}^n$, changing the value of the
coordinates in [n] \ C does not affect whether F is in $\mathcal{F}'$.
• For t ∈ (0, 1) and a subset C ⊆ [n] let a core-family $[\mathcal{F}]^t_C$ be defined as follows,
$$[\mathcal{F}]^t_C = \left\{ F \in \{*, 1, \dots, q\}^C \;\middle|\; \Pr_{F' \in \mu_p^{[n] \setminus C}} \left[ (F, F') \in \mathcal{F} \right] > t \right\},$$
where (F, F′) is a combined coloring that is obtained by choosing the coloring assigned by F on
coordinates in C and by F′ on coordinates in [n] \ C.
• The influence of a coordinate i ∈ [n] for a family $\mathcal{F}$ is defined as follows:
$$\mathrm{Inf}^p_i(\mathcal{F}) := \mu_p\left(\{F : F|_{i=*} \notin \mathcal{F} \text{ and } F|_{i=r} \in \mathcal{F} \text{ for some } r \in [q]\}\right),$$
where $F|_{i=*}$ is a coloring identical to F except on the ith coordinate where it is ∗, and similarly for
$F|_{i=r}$ for any r ∈ [q].
• The average sensitivity of $\mathcal{F}$ at p is,
$$\mathrm{as}_p(\mathcal{F}) := \sum_{i=1}^{n} \mathrm{Inf}^p_i(\mathcal{F}).$$
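Influence and average sensitivity can be computed directly from these definitions when n is tiny. The sketch below is a brute-force illustration with hypothetical names; it encodes colorings as tuples over "*", 1, ..., q and uses the monotone family of colorings whose first coordinate equals 1 as an example.

```python
from fractions import Fraction
from itertools import product

STAR = "*"

def mu(F, p, q):
    """Product measure: weight 1 - p for '*', weight p/q for each symbol in [q]."""
    weight = Fraction(1)
    for symbol in F:
        weight *= (1 - p) if symbol == STAR else p / q
    return weight

def influence(family, i, n, q, p):
    """Measure of colorings F with F|_{i=*} outside the family and some F|_{i=r} inside."""
    total = Fraction(0)
    for F in product([STAR] + list(range(1, q + 1)), repeat=n):
        with_star = F[:i] + (STAR,) + F[i + 1:]
        if with_star in family:
            continue
        if any(F[:i] + (r,) + F[i + 1:] in family for r in range(1, q + 1)):
            total += mu(F, p, q)
    return total

def average_sensitivity(family, n, q, p):
    """Sum of the influences of all n coordinates."""
    return sum(influence(family, i, n, q, p) for i in range(n))

n, q, p = 2, 3, Fraction(1, 2)
family = {F for F in product([STAR] + list(range(1, q + 1)), repeat=n) if F[0] == 1}
print(influence(family, 0, n, q, p), influence(family, 1, n, q, p))
```

For this family only the first coordinate is influential, so the average sensitivity equals the influence of that coordinate.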
2.3 Useful Results
We begin by stating the following simple lemma proved in [DKPS10] (as Lemma 1), and we omit the proof
here.
Lemma 2.1. [Russo’s Lemma [Rus82]] Let $\mathcal{F} \subseteq \{*, 1, \dots, q\}^n$ be monotone, then $\mu_p(\mathcal{F})$ is increasing
with p. In fact,
$$\frac{1}{q} \cdot \mathrm{as}_p(\mathcal{F}) \le \frac{d\mu_p(\mathcal{F})}{dp} \le \mathrm{as}_p(\mathcal{F}).$$
For our analysis we shall require Friedgut’s Theorem [Fri98] which we state below.
Theorem 2.2. [Friedgut’s Theorem [Fri98]] Fix δ > 0. Let $\mathcal{F} \subseteq \{*, 1, \dots, q\}^n$ with $a = \mathrm{as}_p(\mathcal{F})$. There
exists a function $C_{\mathrm{Friedgut}}(p, \delta, a) \le c_p^{a/\delta}$, for a constant $c_p$ depending only on p, so that $\mathcal{F}$ has a (δ, p)-core
C of size $|C| \le C_{\mathrm{Friedgut}}(p, \delta, a)$.
The following theorem of Bourgain, Kahn, Kalai, Katznelson and Linial [BKK+92] lower bounds the
average sensitivity of monotone families.
Theorem 2.3. ([BKK+92]) For any monotone family $\mathcal{F}$ such that $\mu_p(\mathcal{F}) \le 1/2$,
$$\mathrm{as}_p(\mathcal{F}) \ge \mu_p(\mathcal{F}) \log \frac{1}{\mu_p(\mathcal{F})}.$$
We now bound the maximum size of a (k + 1)-agreeing monotone family. We shall use the Diametric
Theorem of Ahlswede and Khachatrian [AK98] which we restate here.
Theorem 2.4. (Diametric Theorem [AK98]) Consider the family $\{1, \dots, q\}^n$ for some q ≥ 2, and fix a
positive integer t < n. For any integer i ≥ 0, define,
$$K_i := \left\{ F \in \{1, \dots, q\}^n \;\middle|\; |\{j \in [1, t + 2i] \mid F(j) = 1\}| \ge t + i \right\}.$$
Let r ≥ 0 be the largest non-negative integer satisfying,
$$t + 2r < \min\left\{ n + 1,\; t + 2\,\frac{t - 1}{q - 2} \right\}.$$
Then the maximum size of a t-agreeing subset of $\{1, \dots, q\}^n$ is $|K_r|$. Note that $|K_0| = q^{n-t}$.
The following two lemmas follow from a simple application of the above theorem.
Lemma 2.5. Let k, q be positive integers with $q \ge 2^k + 1$, p ∈ (0, 1]. Let $\mathcal{I} \subseteq \{*, 1, \dots, q\}^n$ be a monotone
and agreeing family, where n ≫ k. Then, $\mu_p(\mathcal{I}) \le 1/q$.
Proof. By monotonicity and using Lemma 2.1 we have that $\mu_1(\mathcal{I}) \ge \mu_p(\mathcal{I})$. With our setting of parameters
and applying Theorem 2.4 we obtain that any 1-agreeing subset of $\{1, \dots, q\}^n$ contains at most 1/q fraction
of elements. Thus $\mu_p(\mathcal{I}) \le \mu_1(\mathcal{I}) \le 1/q$.
Lemma 2.6. Let k > 0 be any integer, and let $\mathcal{F} \subseteq \{*, 1, \dots, q\}^n$ be monotone with n ≫ k,
$q \ge 2^k + 1$ and $\mu_p(\mathcal{F}) \ge \left(\frac{1}{q}\right)^{k+1}$ for some 0 < p < 1. Then $\mathcal{F}$ is not (k + 1)-agreeing. Specifically, there
exist $F_1, F_2 \in \mathcal{F}$ such that $F_1$ and $F_2$ agree on at most k coordinates.
Proof. We may assume that $\mu_p(\mathcal{F}) < 1$, otherwise the lemma is trivially true. Along with monotonicity,
this implies that $\mathrm{as}_p(\mathcal{F}) > 0$. From Lemma 2.1 and since p < 1, we have that $\mu_1(\mathcal{F}) > \mu_p(\mathcal{F}) \ge \left(\frac{1}{q}\right)^{k+1}$.
Therefore, we can assume $\mathcal{F}$ to be a subset of $[q]^n$ – this clearly preserves the (k + 1)-agreeing property if
it existed. Since n ≫ k, in Theorem 2.4 using the setting of parameters t = k + 1 and $q \ge 2^k + 1$, we get
that r = 0. This implies that since $\mu_1(\mathcal{F}) > \left(\frac{1}{q}\right)^{k+1}$, the subset $\mathcal{F}$ is not (k + 1)-agreeing.
We shall also require the following “Sunflower Theorem” of Erdős and Rado [ER60].
Theorem 2.7. There is a function $C_{ER}(\ell, d)$ so that for any $\mathcal{I} \subseteq \binom{R}{\ell}$, if $|\mathcal{I}| \ge C_{ER}(\ell, d)$ then there are d
distinct sets $I_1, \dots, I_d \in \mathcal{I}$ so that if $\Delta = I_1 \cap I_2 \cap \dots \cap I_d$ then the sets $I_j \setminus \Delta$ are pairwise disjoint for
j = 1, . . . , d. This also holds when $\mathcal{I}$ contains subsets of size at most ℓ rather than exactly ℓ.
Finally we state below Propositions 2.8 and 2.9 (proved as Proposition 3 in [DKPS10] and Lemma 3.1
in [DS05] respectively).
Proposition 2.8. Let $\mathcal{F} \subseteq \{*, 1, \dots, q\}^n$, and C ⊆ [n].
• If $\mathcal{F}$ is monotone, then so is $[\mathcal{F}]^{3/4}_C$.
• If $\mathcal{F}$ is agreeing, then so is $[\mathcal{F}]^{3/4}_C$.
Proposition 2.9. If C is a (δ, p)-core of $\mathcal{F}$, then
$$\mu^C_p\left([\mathcal{F}]^{3/4}_C\right) \ge \mu_p(\mathcal{F}) - 4\delta.$$
2.4 A Joint Distribution of Two Colorings of $2^{[k]}$
Before we describe the reduction we shall show the existence of a joint distribution over two colorings of
subsets of [k] (for any integer k ≥ 1) satisfying some desired properties. Let q be an integer such that
$q \ge 2^k + 1$. Consider the set $2^{[k]}$. A coloring f of $2^{[k]}$ with q colors is a mapping $f : 2^{[k]} \mapsto [q]$. We prove
the following lemma.
Lemma 2.10. For any integer k ≥ 1 and integer $q \ge 2^k + 1$, there exists a joint distribution D over two
colorings $f, g : 2^{[k]} \mapsto [q]$ satisfying the following two properties.
1. The marginal distributions on f and g are identical and same as that of a random mapping of $2^{[k]}$ to
[q].
2. For any pair of colorings such that D(f, g) > 0, if S, T ⊆ [k] such that S ∩ T = ∅, then f(S) ≠ g(T).
Proof. The distribution D is constructed as follows. Choose a random coloring f. Based on this choice
of f we shall construct a coloring g which satisfies the properties in the lemma. Let the set
$A_f := \{j \in [q] \mid |f^{-1}(j)| \ge 1\}$, i.e. $A_f$ is the set of colors that are used at least once in the coloring f. Also, let
$B_f := \{j \in [q] \mid |f^{-1}(j)| > 1\}$ be the set of colors used at least twice in the coloring f. Let $j^* := f(\emptyset)$,
and let $B'_f = B_f \cup \{j^*\}$. By a simple counting argument it is easy to see that,
$$|[q] \setminus A_f| \ge |B'_f|. \qquad (1)$$
Indeed, every color in $B_f$ is used at least twice, so $|A_f| + |B_f| \le 2^k \le q - 1$, and hence
$|[q] \setminus A_f| \ge |B_f| + 1 \ge |B'_f|$. We construct the coloring g in a random way as follows:
1. Choose a random injective (one-to-one) map $h : B'_f \mapsto ([q] \setminus A_f)$. Such a map exists by Equation (1).
2. Define g as follows:
$$g(S) = \begin{cases} h(f(S)) & \text{if } f(S) \in B'_f, \\ f(S) & \text{otherwise.} \end{cases}$$
Observe that in the above construction, the color classes in f corresponding to the set of colors $B'_f$ are
colored in g with randomly chosen distinct colors from $[q] \setminus A_f$. This preserves the size of the color classes
and therefore the marginal distribution of g is identical to the uniform distribution. It is easy to see this also
satisfies property 2 of the lemma.
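The construction in the proof translates directly into a sampling procedure. The following is a minimal sketch that follows the steps above and checks property 2 empirically; the function names, the encoding of subsets as frozensets and the random seed are hypothetical choices, and the uniformity of the marginal of g is not tested here.

```python
import random
from collections import Counter
from itertools import combinations

def subsets(k):
    """All subsets of [k] = {1, ..., k} as frozensets."""
    return [frozenset(c) for size in range(k + 1)
            for c in combinations(range(1, k + 1), size)]

def sample_pair(k, q, rng):
    """Sample (f, g) following the construction in the proof of Lemma 2.10."""
    if q < 2 ** k + 1:
        raise ValueError("the lemma requires q >= 2^k + 1")
    all_sets = subsets(k)
    f = {S: rng.randrange(1, q + 1) for S in all_sets}      # uniformly random coloring
    counts = Counter(f.values())
    used = set(counts)                                      # A_f
    repeated = {c for c, m in counts.items() if m > 1}      # B_f
    b_prime = repeated | {f[frozenset()]}                   # B'_f = B_f with j* added
    unused = [c for c in range(1, q + 1) if c not in used]  # [q] \ A_f
    # Random injective map h : B'_f -> [q] \ A_f; possible by Equation (1).
    h = dict(zip(sorted(b_prime), rng.sample(unused, len(b_prime))))
    g = {S: h.get(f[S], f[S]) for S in all_sets}
    return f, g

def satisfies_property_2(f, g):
    """Check f(S) != g(T) for all disjoint S, T."""
    return all(f[S] != g[T] for S in f for T in g if not (S & T))

rng = random.Random(1)
ok = all(satisfies_property_2(*sample_pair(k, 2 ** k + 1, rng))
         for k in (1, 2, 3) for _ in range(200))
print(ok)
```

Note that S = T = ∅ counts as a disjoint pair, which is why the color $j^* = f(\emptyset)$ is always remapped.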
3 Reduction
In this section we give the hardness reduction for proving Theorem 1.1. As in previous works [DS05,
DKPS10], we begin with the problem of finding independent sets in co-partite graphs. However, we need a
stronger hardness of finding nearly independent sets in what we refer to as the k-wise repeated graph.
A graph G(V, E) is (m, r)-co-partite if V = M × R, where |M| = m, |R| = r, such that for each
i ∈ M, the subset of vertices {i} × R is a clique. Let IS(G) be the size of the maximum independent set in G.
For an (m, r)-co-partite graph G(V, E), define the k-wise repeated graph $G^k$ consisting of vertex set
$V^k$. Note that $|V^k| = (mr)^k$. There is an edge between $u = (u_1, \dots, u_k), v = (v_1, \dots, v_k) \in V^k$ if there is
an edge in G between $u_i$ and $v_i$ for all i ∈ [k]. For any integer h ≥ 2, let $IS_h(G^k)$ be the maximum size of a
subset of vertices of $G^k$ which does not contain a clique of size h. For parameters h, γ, r, k the khIS(r, γ, h, k)
problem is: given an (m, r)-co-partite graph G, distinguish between the cases:
• YES case: IS(G) = m.
• NO case: $IS_h(G^k) < \gamma m^k$.
In Section 6 we prove the following theorem.
Theorem 3.1. For any k, h, γ > 0, there exists a constant r = r(k, h, γ) such that the problem khIS(r, γ, h, k)
is NP-hard.
The theorem below states our reduction starting from an (m, r)-co-partite graph G to a weighted graph
$G^q_{\mathcal{B}}$ and, along with Theorem 3.1, proves Theorem 1.1.
Theorem 3.2. For any ε > 0 and positive integers k, q such that $q \ge 2^k + 1$, there exists a small enough
$\varepsilon_0 > 0$ and large enough h > 0 (both depending only on k, q and ε) such that: for a constant integer r,
given an (m, r)-co-partite graph G(V, E), there is a polynomial time reduction to a weighted graph $G^q_{\mathcal{B}}$,
such that:
• Total weight of all the vertices in $G^q_{\mathcal{B}}$ is 1.
• (Completeness) IS(G) = m implies that there are q disjoint independent sets $I_1, \dots, I_q$ in $G^q_{\mathcal{B}}$, each
of weight $\frac{1 - 2\varepsilon}{q}$. In particular, there is a q-colorable subset of vertices V′ in $G^q_{\mathcal{B}}$ with weight 1 − 2ε.
• (Soundness) $IS_h(G^k) < \varepsilon_0 m^k$ implies that $IS(G^q_{\mathcal{B}}) < 1/q^{k+1}$ where $IS(G^q_{\mathcal{B}})$ is the maximum weight
of an independent set in $G^q_{\mathcal{B}}$.
The rest of this section and Sections 4 and 5 are devoted to proving the above theorem. We begin with
the setting of certain parameters based on the values of ε, k and q given in the statement of Theorem 3.2.
3.1 Setting of parameters
From the statement of Theorem 3.2 we are given ε > 0 and positive integers k and q satisfying q ≥ 2k + 1.
Additionally, we define the following list of parameters.
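• p = 1 − ε.
• $\varepsilon_1 = \frac{\varepsilon}{8q^{2k+4}}$.
• $h_0 = \sup_{p' \in [1-\frac{\varepsilon}{2},\, 1-\frac{\varepsilon}{4}]} C_{\mathrm{Friedgut}}\left(p', \frac{1}{16}\varepsilon_1, \frac{8q}{\varepsilon}\right)$, where $C_{\mathrm{Friedgut}}$ is the function from Theorem 2.2.
• $\eta = \frac{1}{h_0 2^{k+3}} \cdot \left(\frac{p}{q}\right)^{(2^{k+1}+1)h_0}$.
• $h_1 = h_0 + \left\lceil \frac{8q}{\varepsilon\eta} \right\rceil$.
• $h_s = 1 + \sum_{j=0}^{h_0} \binom{h_1}{j} \cdot q^{h_0} \cdot q^{h_0}$.
• $h = C_{ER}(h_1, h_s)$, where $C_{ER}$ is the function given by Theorem 2.7.
• $\varepsilon_0 = \varepsilon_1 \cdot \frac{1}{2^{k+5}} \cdot \frac{1}{k^k}$.
• $l_T = \max\{4 \ln \frac{2}{\varepsilon},\ 2^{k+4} \cdot (h_1)^2 \cdot k!\} \cdot (10^6 \cdot k^3)$.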
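To get a feel for the magnitudes involved, the two closed-form parameters ε1 = ε/(8q^(2k+4)) and ε0 = ε1 · 2^(-(k+5)) · k^(-k) can be evaluated exactly. The sketch below is illustrative only; the function name `epsilon_parameters` is hypothetical, and the parameters that depend on the functions from Theorems 2.2 and 2.7 are deliberately left out because their values are not given here.

```python
from fractions import Fraction


def epsilon_parameters(eps, k, q):
    """Return (eps1, eps0) as exact fractions.

    eps1 = eps / (8 q^(2k+4)) and eps0 = eps1 / (2^(k+5) k^k),
    following the parameter list of Section 3.1.
    """
    if q < 2 * k + 1:
        raise ValueError("the construction assumes q >= 2k + 1")
    eps = Fraction(eps)
    eps1 = eps / (8 * q ** (2 * k + 4))
    eps0 = eps1 / (2 ** (k + 5) * k ** k)
    return eps1, eps0


eps1, eps0 = epsilon_parameters(Fraction(1, 10), k=1, q=3)
print(eps1, eps0)
```

Even for the smallest admissible case k = 1, q = 3 the soundness threshold ε0 is already tiny, which is expected since all these quantities are constants independent of the instance size.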
3.2 Construction of GqB
We begin with a graph G(V, E) which is (m, r)-co-partite with V = M × R where |M| = m and |R| = r.
Let $l = 2 l_T \cdot r^k$, and let $\mathcal{B}$ be the family of all subsets of V of size exactly l, i.e. $\mathcal{B} = \binom{V}{l}$. Each element
B ∈ B is called a block. As in the constructions of [DS05, DKPS10] we think of any independent set in G
as an assignment of {T, F} to each vertex in V , where T means the vertex is in the independent set. In the
YES case, G has an independent set of size m, and therefore the expected number of T values in a randomly
chosen block is $2 l_T \cdot r^{k-1}$. Therefore, w.h.p. a random block has at least $l_T \cdot r^{k-1}$ many T values in it.
Using this fact, we let the set of block assignments $R_B$ to the block B be the set of all {T, F} assignments
to the vertices in B such that at least $l_T \cdot r^{k-1}$ of them are assigned value T. Formally, for any block B ∈ B,
$R_B = \{a : B \to \{T, F\} \mid |a^{-1}(T)| \ge l_T \cdot r^{k-1}\}$.
We now construct an intermediate graph $G_B = (V_B, E_B)$. The vertex set is $V_B := \bigcup_{B \in \mathcal{B}} \{B\} \times R_B$, i.e.
for each block B, there is a cluster of vertices, one for every block assignment in RB . We let each cluster
be a clique. We add edges between two clusters in the following manner. Let B1 and B2 be two blocks
satisfying the following properties:
(i) B1 and B2 are (l−k)-wise intersecting, i.e. B̂ := B1 ∩B2 and |B̂| = l−k. Let B1 = {u1 , . . . , uk }∪B̂
and B2 = {v1 , . . . , vk } ∪ B̂.
(ii) There is an edge between ui and vi in G, for all i = 1, . . . , k. We refer to ui (vi ) as the ith pivot vertex
for B1 (B2 ).
Add an edge in GB between (B1 , a1 ) and (B2 , a2 ) where a1 ∈ RB1 and a2 ∈ RB2 if and only if at least one of
the following two conditions is satisfied:
• a1 |B̂ ≠ a2 |B̂ , where ai |B̂ is the restriction of ai onto the sub-block B̂ for i = 1, 2.
• a1 (ui ) = a2 (vi ) = T for some i ∈ {1, . . . , k}.
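The cross-cluster edge rule can be sketched as a predicate. This is a minimal illustration under assumptions: block assignments are encoded as dictionaries from vertices to booleans (True standing for T), the caller has already checked conditions (i) and (ii) on the two blocks, and the pivots are passed as ordered tuples; the name `gb_cross_edge` is hypothetical.

```python
def gb_cross_edge(a1, a2, pivots1, pivots2):
    """Decide whether (B1, a1) and (B2, a2) are adjacent in G_B.

    a1, a2: dict vertex -> bool, assignments to blocks B1 and B2.
    pivots1, pivots2: ordered pivot tuples (u_1..u_k) and (v_1..v_k),
    assumed to satisfy conditions (i) and (ii).
    """
    common = set(a1) & set(a2)  # the shared sub-block
    # First condition: the restrictions to the shared sub-block differ.
    if any(a1[x] != a2[x] for x in common):
        return True
    # Second condition: some matched pivot pair is assigned T on both sides.
    return any(a1[u] and a2[v] for u, v in zip(pivots1, pivots2))


a1 = {"x": True, "y": False, "u": True}
a2 = {"x": True, "y": False, "v": False}
print(gb_cross_edge(a1, a2, ("u",), ("v",)))
```

In the usage lines the two assignments agree on the shared vertices and the pivot pair is not simultaneously T, so the two vertices of G_B are non-adjacent, as they would be for restrictions of a genuine independent set.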
This reduction preserves large independent sets as the following proposition shows.
Proposition 3.3. IS(G) = m =⇒ IS(GB ) ≥ m′ (1 − ε) where m′ = |B|.
Proof. Let I ⊆ V be an independent set of size m in G. Let σ : V ↦ {T, F} be a map which assigns T
iff the vertex is in I. For each block B ∈ B let σB be the restriction of σ to the vertices in B. The subset
IB = {(B, σB ) | B ∈ B} is an independent set in GB as the following argument shows. Consider two
vertices (B1 , σB1 ) and (B2 , σB2 ) such that B1 and B2 are (l − k)-wise intersecting and there are edges in G
between ui and vi for all i = 1, . . . , k, where ui (vi ) are pivot vertices for B1 (B2 ). Since both σB1 and σB2
are restrictions of σ, they agree on B1 ∩ B2 . Since I is an independent
set in G, at most one of ui or vi is assigned T by the assignment σ, for i = 1, . . . , k. Therefore, there cannot
be an edge between (B1 , σB1 ) and (B2 , σB2 ) in GB . We lose a small number of elements in IB which might
not have the required number (= $l_T \cdot r^{k-1}$) of T-values assigned by σ, and this loss is of at most ε fraction
as per the setting of the parameters. Thus, GB contains an independent set of size m′ (1 − ε).
The final graph $G_B^q$ is constructed as follows. Recall that q ≥ 2k + 1. The graph $G_B^q$ has vertex set
$V_B^q$, edge set $E_B^q$ and a weight function Λ. For every cluster (B, RB ) in GB , there is a copy of Gq,p [n] in
$G_B^q$ where n = |RB |. The set of vertices $V_B^q[B]$ of the cluster corresponding to block B is the set of all the
colorings of RB . Formally,
$V_B^q[B] := \{(B, F) \mid F \in \{*, 1, \dots, q\}^{R_B}\}$,   $V_B^q := \bigcup_{B} V_B^q[B]$.
We shall frequently abuse notation to denote a vertex (B, F ) by F when the block B is clear from the
context.
The weight function Λ is defined as follows. Let µp be the distribution on $V_B^q[B] = \{F \in \{*, 1, \dots, q\}^{R_B}\}$.
The weight function Λ assigns equal measure to each cluster, and within each cluster the vertices are
weighted according to µp . Formally, for $F \in V_B^q[B]$,
$\Lambda(F) = |\mathcal{B}|^{-1} \mu_p(F)$.
The edges in the edge set EBq within each cluster are already determined by the edges in Gq,p [n]. The
edges across clusters in EBq are as follows.
$E_B^q = \{((B_1, F_1), (B_2, F_2)) \in V_B^q[B_1] \times V_B^q[B_2] \mid F_1^{-1}(i) \times F_2^{-1}(i) \subseteq E_B,\ \forall i \in [q]\}$.
In other words, there is an edge between colorings F1 in $V_B^q[B_1]$ and F2 in $V_B^q[B_2]$ if and only if for all
colors i ∈ [q], if a1 ∈ RB1 and a2 ∈ RB2 such that F1 (a1 ) = F2 (a2 ) = i, then ((B1 , a1 ), (B2 , a2 )) ∈ EB .
Stated in the contrapositive, this means that there is no edge between F1 and F2 if and only if there exist
a1 ∈ RB1 and a2 ∈ RB2 such that F1 (a1 ) = F2 (a2 ) ∈ [q] and ((B1 , a1 ), (B2 , a2 )) ∉ EB .
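The cross-cluster adjacency of two colorings can be phrased as a search for a witness of non-adjacency. The sketch below assumes colorings are dictionaries from block assignments to a color in {1, ..., q} or the symbol `STAR`, and that adjacency in G_B is supplied as a callable; `coloring_edge`, `STAR` and `gb_edge` are hypothetical names introduced for the illustration.

```python
STAR = "*"


def coloring_edge(F1, F2, gb_edge):
    """Decide whether colorings F1 (of R_B1) and F2 (of R_B2) are adjacent.

    F1, F2: dict block-assignment -> color in {STAR, 1, ..., q}.
    gb_edge(a1, a2): True iff ((B1, a1), (B2, a2)) is an edge of G_B.
    The colorings are non-adjacent exactly when some a1, a2 receive the
    same color in [q] while (a1, a2) is not an edge of G_B.
    """
    for a1, c1 in F1.items():
        if c1 == STAR:
            continue  # the star symbol never produces a witness
        for a2, c2 in F2.items():
            if c1 == c2 and not gb_edge(a1, a2):
                return False
    return True


gb_edges = {("a", "c")}
F1 = {"a": 1, "b": STAR}
F2 = {"c": 1, "d": 2}
print(coloring_edge(F1, F2, lambda x, y: (x, y) in gb_edges))
```

Here the only pair sharing a color in [q] is (a, c), which is an edge of the toy G_B, so the two colorings are adjacent.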
The following proposition follows from Proposition 6 of [DKPS10].
Proposition 3.4. (Maximal independent sets in GqB are monotone) Let I be an independent set in GqB . If
F ∈ I ∩ VBq [B], and F ′ is monotonically above F , then I ∪ {F ′ } is also an independent set.
4 Completeness
Lemma 4.1. If IS(G) = m then there exist disjoint independent sets $I_1, \dots, I_q$ in $G_B^q$ such that
$\Lambda(I_j) \ge \frac{1-2\varepsilon}{q}$, for j = 1, . . . , q.
Proof. By Proposition 3.3, if IS(G) ≥ m then IS(GB ) ≥ (1 − ε)m′ . Let IB be an independent set in GB of
size (1 − ε)m′ . Since each cluster in GB is a clique, the set IB contains exactly one vertex,
say (B, a) with a ∈ RB , from the cluster of each of (1 − ε)m′ blocks B, and no vertex from the remaining clusters. For each j ∈ [q]
define,
Ij = {(B, F ) ∈ GqB | ∃a ∈ RB s.t. (B, a) ∈ IB and F (a) = j}.
In other words, for each block B such that there is some (B, a) ∈ IB , the set Ij contains all colorings that
assign the color j to the assignment a. Since IB contains at most one vertex (B, a) from each block,
it is easy to see that the sets Ij are disjoint for j = 1, . . . , q. To see that Ij is an independent set observe the
following two cases:
• For a block B, let (B, F1 ), (B, F2 ) ∈ Ij . Then there is an assignment a ∈ RB such that F1 (a) =
F2 (a) = j and therefore from the structure of Gq,p [n], there is no edge between the colorings F1 and
F2 in the copy of Gq,p [n] corresponding to block B.
• For B1 ≠ B2 , let (B1 , F1 ), (B2 , F2 ) ∈ Ij . Then there must exist (B1 , a1 ), (B2 , a2 ) ∈ IB so that
((B1 , a1 ), (B2 , a2 )) ∉ EB along with the property that F1 (a1 ) = F2 (a2 ) = j. This implies that there
is no edge in GqB between (B1 , F1 ) and (B2 , F2 ).
Also, the weight of Ij is $\Lambda(I_j) = (1-\varepsilon) \cdot \frac{1-\varepsilon}{q} \ge \frac{1-2\varepsilon}{q}$, since Ij meets a (1 − ε) fraction of the clusters and, within
each such cluster, consists of all colorings that assign color j to the chosen block assignment (B, a), a set
of µp -measure p/q = (1 − ε)/q.
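The weight computation can be checked by brute force on a tiny cluster. The sketch assumes that µp is the product measure giving the symbol ∗ probability 1 − p and each color in [q] probability p/q per coordinate, which is consistent with the weight (1 − ε)/q used above; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def mu_p(F, p, q):
    """Product measure of one coloring F (a tuple over {'*', 1..q}).

    Assumption: each coordinate is '*' with probability 1 - p and
    each color of [q] with probability p / q.
    """
    weight = Fraction(1)
    for c in F:
        weight *= (1 - p) if c == "*" else p / q
    return weight


def weight_of_color_class(n, q, p, a, j):
    """Total measure of colorings of n assignments that give color j to index a."""
    symbols = ["*"] + list(range(1, q + 1))
    return sum(mu_p(F, p, q) for F in product(symbols, repeat=n) if F[a] == j)


eps = Fraction(1, 10)
p, q = 1 - eps, 3
per_cluster = weight_of_color_class(n=3, q=q, p=p, a=0, j=2)
print(per_cluster, (1 - eps) * per_cluster >= (1 - 2 * eps) / q)
```

The per-cluster weight comes out as p/q regardless of n, and multiplying by the (1 − ε) fraction of clusters gives (1 − ε)²/q, which is at least (1 − 2ε)/q.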
5 Soundness
Lemma 5.1. If $\mathrm{IS}(G_B^q) \ge \frac{1}{q^{k+1}}$, then $\mathrm{IS}_h(G^k) \ge \varepsilon_0 m^k$.
Let I be an independent set in $G_B^q$ with $\Lambda(I) \ge \frac{1}{q^{k+1}}$. Denote by I[B] the intersection $I \cap V_B^q[B]$. WLOG
we may assume that I is maximal. The following proposition is a restatement of Proposition 7 of [DKPS10].
Proposition 5.2. For all B ∈ B, I[B] is monotone and agreeing.
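Thus, by Lemma 2.5, µp (I[B]) ≤ µ1 (I[B]) ≤ 1/q. By averaging, it must be that for at least $\frac{1}{2} \cdot \frac{1}{q^{k+1}}$
fraction of the blocks B, the measure µp (I[B]) is at least $\frac{1}{2} \cdot \frac{1}{q^{k+1}}$. Therefore, by Theorem 2.3 and Lemma
2.1, if p is increased from 1 − ε to $1 - \frac{\varepsilon}{2}$, then the new measure µ(I) increases by at least
$\frac{\varepsilon}{2} \cdot \frac{1}{2q^{k+1}} \cdot \frac{1}{q} \cdot \frac{1}{2q^{k+1}} \ge \frac{\varepsilon}{8q^{2k+4}}$. So, the new measure is,
$\Lambda_{1-\frac{\varepsilon}{2}}(I) \ge \frac{1}{q^{k+1}} + \frac{\varepsilon}{8q^{2k+4}} = \frac{1}{q^{k+1}} + \varepsilon_1$.   (2)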
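The arithmetic behind Equation (2) is a one-line product, and it can be checked exactly. The sketch below is only a sanity check under the stated formula: the gain is (ε/2) · 1/(2q^(k+1)) · (1/q) · 1/(2q^(k+1)), and ε1 = ε/(8q^(2k+4)); the function name `measure_gain` is hypothetical.

```python
from fractions import Fraction


def measure_gain(eps, k, q):
    """Lower bound on the increase of the measure when p goes from 1-eps to 1-eps/2."""
    eps = Fraction(eps)
    half_density = Fraction(1, 2 * q ** (k + 1))
    return (eps / 2) * half_density * Fraction(1, q) * half_density


eps, k, q = Fraction(1, 10), 1, 3
eps1 = eps / (8 * q ** (2 * k + 4))
print(measure_gain(eps, k, q), eps1, measure_gain(eps, k, q) >= eps1)
```

The product simplifies to ε/(8q^(2k+3)), which exceeds ε1 by a factor of q, so the inequality in the text has some slack.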
We borrow and generalize the following terminology from [DKPS10].
• F ∈ VBq is referred to as a coloring of the block assignments in RB , i.e. F ∈ {∗, 1, . . . , q}RB or
F : RB 7→ {∗, 1, . . . , q}.
• A family of colorings F ⊆ {∗, 1, . . . , q}RB is monotone if it is a monotone subset of {∗, 1, . . . , q}RB .
• Two colorings F1 , F2 ∈ {∗, 1, . . . , q}RB agree on a block assignment a ∈ RB if F1 (a) = F2 (a) ∈ [q].
• A set of block assignments A = {a1 , . . . , as } ⊆ RB is distinguished by a family F of colorings of
RB if there exist F1 , F2 ∈ F which agree on the block assignments in A and do not agree on any
block assignment in RB \ A. We also say that F has a distinguished set A.
In the next subsection we construct a large enough subset B ′ ⊆ B of the blocks such that the families
I[B] for B ∈ B ′ satisfy certain properties.
5.1 Selection of B′ ⊆ B.
Lemma 5.3. There exists some $p' \in [1 - \frac{\varepsilon}{2}, 1 - \frac{\varepsilon}{4}]$ and a set of blocks B′ ⊆ B whose size is $|B'| \ge \frac{1}{4}\varepsilon_1 \cdot |B|$,
such that for all B ∈ B′:
1. I[B] has a $(\frac{1}{16}\varepsilon_1, p')$-core, Core[B] ⊆ RB , of size |Core[B]| ≤ h0 .
2. The core family $CF_B := [I[B]]^{3/4}_{\mathrm{Core}[B]}$ has a distinguished subset AB ⊆ Core[B] of block assignments
such that 1 ≤ |AB | ≤ k.
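Proof. First we define a subset of blocks in which I has significantly large weight.
$\tilde{B} = \left\{ B \in B \;\middle|\; \mu_{1-\frac{\varepsilon}{2}}(I[B]) \ge \frac{1}{q^{k+1}} + \frac{\varepsilon_1}{2} \right\}$.
Using Equation (2) and averaging we obtain, $|\tilde{B}| \ge \frac{1}{2}\varepsilon_1 \cdot |B|$. Since the measure of the monotone families
µp (I[B]) increases with p, we have that $\mu_{p'}(I[B]) \ge \frac{1}{q^{k+1}} + \frac{1}{2}\varepsilon_1$ for all B ∈ B̃ and $p' \in [1 - \frac{\varepsilon}{2}, 1 - \frac{\varepsilon}{4}]$ (we
shall fix p′ later). Define the subset B′ as follows,
$B' = \left\{ B \in \tilde{B} \;\middle|\; \mathrm{as}_{p'}(I[B]) \le \frac{8q}{\varepsilon} \right\}$.   (3)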
The following proposition is identical to Proposition 8 in [DKPS10] and we state it without proof.
Proposition 5.4. There exists $p' \in [1 - \frac{\varepsilon}{2}, 1 - \frac{\varepsilon}{4}]$ such that
$|B'| \ge \frac{1}{4}\varepsilon_1 \cdot |B|$.
We fix p′ as given by the above proposition. Property 1 of Lemma 5.3 follows directly from Theorem
2.2 and the setting of the parameter h0 . To prove Property 2, observe that from the definition of CF B in
the statement of the lemma, and by Propositions 5.2, 2.8 and 2.9, CF B is monotone, pairwise agreeing and
satisfies,
$\mu_{p'}(CF_B) \ge \mu_{p'}(I[B]) - 4 \cdot \frac{\varepsilon_1}{16} \ge \frac{1}{q^{k+1}}$.   (4)
q
Applying Lemma 2.6 to the above lower bound we obtain that there are two colorings F ♭ , F ♯ ∈ CF B that
agree only on a subset AB ⊆ Core[B] of the block assignments such that 1 ≤ |AB | ≤ k. Moreover, since
CF B is monotone, both F ♭ and F ♯ can be assumed to be in {1, . . . , q}Core[B] .
Definition 5.5. For p′ , B′ as defined above and B ∈ B′ , the extended core ECore[B] is defined as,
$\mathrm{ECore}[B] := \mathrm{Core}[B] \cup \{ a \in R_B \mid \mathrm{Inf}^{p'}_a(I[B]) \ge \eta \}$.
The next proposition follows from the definition of influence and average sensitivity.
Proposition 5.6. (The extended core is small): For B ∈ B′ ,
$|\mathrm{ECore}[B]| \le h_0 + \frac{\mathrm{as}_{p'}(I[B])}{\eta} \le h_0 + \frac{8q}{\varepsilon\eta} \le h_1$.
Definition 5.7. (Preservation:) Let B ∈ B′ and B̃ ⊆ B such that |B̃| = (l − k). For any block assignment
a ∈ RB let a|B̃ be the restriction of a to B̃. We say that B̃ preserves B if there is no pair of block assignments
a1 ≠ a2 ∈ ECore[B] with a1 |B̃ = a2 |B̃ .
Lemma 5.8. For all B ∈ B′ ,
$|\{X \subseteq B \mid |X| = k \text{ and } B \setminus X \text{ does not preserve } B\}| \le \frac{h_1^2 (l-1)^{k-1}}{2}$.
Proof. Each pair of block assignments in ECore[B] can cause at most $\binom{l-1}{l-k} \le (l-1)^{k-1}$ of sub-blocks B̃
to not preserve B. Since, for any B ∈ B′ , |ECore[B]| ≤ h1 , the lemma follows.
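The counting argument can be checked by brute force on a toy block. The sketch assumes assignments are dictionaries from vertices to booleans and that the list passed in plays the role of ECore[B]; `non_preserving_count` and `lemma_5_8_bound` are hypothetical helper names.

```python
from itertools import combinations


def non_preserving_count(block, assignments, k):
    """Count k-subsets X of the block such that B \\ X does not preserve B.

    B \\ X fails to preserve B when two distinct assignments in
    `assignments` have the same restriction to B \\ X.
    """
    count = 0
    for X in combinations(block, k):
        rest = [v for v in block if v not in X]
        seen = set()
        for a in assignments:
            key = tuple(a[v] for v in rest)
            if key in seen:  # two assignments collide on the sub-block
                count += 1
                break
            seen.add(key)
    return count


def lemma_5_8_bound(h1, l, k):
    """Right-hand side of Lemma 5.8: h1^2 (l-1)^(k-1) / 2."""
    return h1 ** 2 * (l - 1) ** (k - 1) / 2


block = [0, 1, 2, 3]
a = {0: True, 1: True, 2: False, 3: False}
b = {0: True, 1: False, 2: False, 3: False}
c = {0: False, 1: False, 2: True, 3: True}
print(non_preserving_count(block, [a, b, c], k=2), lemma_5_8_bound(3, 4, 2))
```

In this toy case a and b differ only at vertex 1, so exactly the 2-subsets containing vertex 1 destroy preservation, matching the per-pair count of (l − 1 choose k − 1) used in the proof.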
5.2 Selection of B̂
In this subsection we shall use the subset B′ of the blocks to select an (l − k)-sized sub-block $\hat{B} \in \binom{V}{l-k}$
which satisfies some desired properties.
First we need some more notation. Let B ∈ B ′ , and AB be the distinguished set of assignments given
by Lemma 5.3. Let TB be the size of AB , so that 1 ≤ TB ≤ k. We fix an arbitrary numbering of
the assignments in AB and denote them by ȧ1 [B], . . . , ȧTB [B], referring to ȧi [B] as the ith distinguished
assignment for block B.
Also, consider an element $v \in V^k$, i.e. v is an ordered tuple (v1 , . . . , vk ) such that vi ∈ V for i =
1, . . . , k. By abuse of notation, for any $\tilde{B} \in \binom{V}{l-k}$, we shall denote by B̃ ∪ v the set B̃ ∪ {v1 , . . . , vk } where
duplicates, if any, are removed. Similarly, for $B \in \binom{V}{l}$, B \ v shall be used to denote B \ {v1 , . . . , vk }.
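Definition 5.9. For any (l − k) sub-block B̃, let $V^k_{\tilde{B}} \subseteq V^k$ be,
$V^k_{\tilde{B}} = \{ v = (v_1, \dots, v_k) \in (V \setminus \tilde{B})^k \mid B = \tilde{B} \cup v \in B',\ \tilde{B} \text{ preserves } B \text{ and, } \dot{a}_i[B](v_i) = T,\ \forall i \in \{1, \dots, T_B\} \}$.
Proposition 5.10. There exists $\hat{B} \in \binom{V}{l-k}$ such that $|V^k_{\hat{B}}| \ge \varepsilon_1 \cdot \frac{1}{2^{k+5}} \cdot m^k$.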
Proof. In the following, the probability is taken over a random sub-block $\hat{B} \in \binom{V}{l-k}$ and a tuple v =
(v1 , . . . , vk ) consisting of distinct vertices from V \ B̂, where B = B̂ ∪ v. From Proposition 5.4 we have,
$\Pr_{\hat{B},v}\left[ v \in V^k_{\hat{B}} \right] \ge \frac{1}{4}\varepsilon_1 \cdot \Pr_{B,v}\left[ v \in V^k_{\hat{B}} \mid B \in B' \right]$.
Now, conditioned on B ∈ B′ , v is a random tuple of k distinct vertices from B. For each of the
distinguished assignments ȧi (1 ≤ i ≤ TB ), there are at least $l_T \cdot r^{k-1} = \frac{l}{2r}$ elements v ∈ B so that
ȧi (v) = T. Since lT is much larger than k (by our setting $l_T > 10^6 k^3$), it is easy to see that there are at least
$\frac{1}{2}\left(\frac{l}{2r}\right)^k$ tuples v = (v1 , . . . , vk ) of distinct vertices from B such that ȧi (vi ) = T for i = 1, . . . , TB . Out of
these, there are at most $\frac{h_1^2 (l-1)^{k-1} k!}{2}$ tuples v such that B \ v does not preserve B. Since $l > 2^{k+4} \cdot h_1^2 \cdot r^k \cdot k!$,
the above implies that there are at least $\frac{1}{2^{k+2}}\left(\frac{l}{r}\right)^k$ tuples $v \in V^k_{\hat{B}}$, where B̂ ∪ v = B. Noting that the total
number of k-tuples of vertices from B is at most $l^k$, we obtain,
$\Pr_{\hat{B},v}\left[ v \in V^k_{\hat{B}} \right] \ge \frac{1}{4}\varepsilon_1 \cdot \Pr_{B,v}\left[ v \in V^k_{\hat{B}} \mid B \in B' \right] \ge \varepsilon_1 \cdot \frac{1}{2^{k+4}} \cdot \frac{1}{r^k}$.   (5)
Finally, noting that at least half of the tuples in $V^k$ have k distinct vertices and $|V^k| = m^k r^k$, the above
inequality yields the proposition.
For the rest of the section we shall fix B̂ obtained from the above proposition. Before we proceed, we shall
choose an appropriate subset of $V^k_{\hat{B}}$. We first define the set RB̂ := {â | â : B̂ ↦ {T, F}}, i.e. the set
of all assignments to the sub-block B̂. As before, for any a ∈ RB , where B is a block containing B̂ as a
sub-block, let a|B̂ be the restriction of a onto the sub-block B̂.
Lemma 5.11. There is a mapping κ : RB̂ ↦ {1, . . . , k} and a subset $U^k_{\hat{B}} \subseteq V^k_{\hat{B}}$ of size
$|U^k_{\hat{B}}| \ge (1/k^k)|V^k_{\hat{B}}| \ge \varepsilon_0 \cdot m^k$ (by our setting of parameter ε0 ) satisfying the following property:
If $u = (u_1, \dots, u_k) \in U^k_{\hat{B}}$ and B = B̂ ∪ u, then κ(ȧi [B]|B̂ ) = i, for all i ∈ {1, . . . , TB }.
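Proof. Consider any $v \in V^k_{\hat{B}}$ and let B = B̂ ∪ v ∈ B′ . From the preservation property we have that, for
any i ≠ j ∈ {1, . . . , TB }, ȧi [B]|B̂ ≠ ȧj [B]|B̂ . Therefore, taking the probability over a random mapping
κ : RB̂ ↦ [k], we have
$\Pr_{\kappa}\left[ \bigwedge_{i=1}^{T_B} \left( \kappa(\dot{a}_i[B]|_{\hat{B}}) = i \right) \right] \ge \frac{1}{k^k}$.
By averaging over $v \in V^k_{\hat{B}}$, some fixed mapping κ satisfies the property for at least a $1/k^k$ fraction of the
tuples in $V^k_{\hat{B}}$; taking $U^k_{\hat{B}}$ to be this set of tuples proves the lemma.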
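The probability bound for a random κ can be verified exhaustively on a small domain. In the sketch below the domain stands in for the set of sub-block assignments, and the list `targets` stands in for the distinct restrictions of the distinguished assignments; both the encoding and the function name `kappa_success_probability` are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import product


def kappa_success_probability(domain, targets, k):
    """Fraction of maps kappa: domain -> {1..k} with kappa(targets[i-1]) = i for all i.

    `targets` must consist of distinct elements of `domain`.
    """
    good = total = 0
    for labels in product(range(1, k + 1), repeat=len(domain)):
        kappa = dict(zip(domain, labels))
        total += 1
        if all(kappa[t] == i for i, t in enumerate(targets, start=1)):
            good += 1
    return Fraction(good, total)


prob = kappa_success_probability(["a", "b", "c", "d"], ["c", "a"], k=3)
print(prob, prob >= Fraction(1, 3 ** 3))
```

With T distinct targets the probability is exactly k^(-T), which is at least k^(-k) because T ≤ k; distinctness of the targets is what the preservation property supplies.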
Summarizing the above analysis, we have the sub-block B̂, a mapping κ : RB̂ ↦ [k] and a subset
$U^k_{\hat{B}} \subseteq V^k$ satisfying the following properties:
• |UB̂k | ≥ ε0 mk .
• For any u = (u1 , . . . , uk ) ∈ UB̂k ,
– B = B̂ ∪ u ∈ B ′ .
– B̂ preserves B.
– ȧi [B](ui ) = T for all i ∈ {1, . . . , TB }.
– κ(ȧi [B]|B̂ ) = i for all i ∈ {1, . . . , TB }.
Using the above we are ready to prove the main soundness lemma, which shall complete the proof of
Lemma 5.1 and Theorem 3.2.
Lemma 5.12. The set UB̂k does not contain a clique of size h in Gk . Since, by Lemma 5.11, |UB̂k | ≥ ε0 mk ,
we have that ISh (Gk ) ≥ ε0 mk .
The proof of this lemma is given in the next subsection.
5.3 Proof of Lemma 5.12
Suppose, for the sake of contradiction, that there is a subset of $U^k_{\hat{B}}$ of size h which is a clique in $G^k$. Let this
subset be $u^1, \dots, u^h$. Let $B_i := \hat{B} \cup u^i$ for i = 1, . . . , h, be the h blocks that shall be of interest to us.
We shall show the existence of an edge in $I[B_{i_1}] \cup I[B_{i_2}]$ for some i1 , i2 ∈ {1, . . . , h}, which shall be a
contradiction to the initial assumption that I is an independent set.
First, we shall simplify some notation. For each block Bi (1 ≤ i ≤ h), we define the following:
• Ti := TBi , the number of distinguished assignments for block Bi . Recall that 1 ≤ Ti ≤ k.
• ȧij = ȧj [Bi ], the jth distinguished block assignment for Bi , for 1 ≤ j ≤ Ti .
• Ci := Core[Bi ] and Ĉi := Ci |B̂ = {a|B̂ | a ∈ Ci }.
• Ei := ECore[Bi ] and Êi := Ei |B̂ . Recall that from the preservation property, there is a one-to-one
correspondence between Ei and Êi , in particular |Êi | = |Ei | and |Ĉi | = |Ci |.
• Since ȧi1 , . . . , ȧiTi are distinguished assignments, there exist two colorings Fi♭ and Fi♯ ∈ CF Bi that
agree only on ȧi1 , . . . , ȧiTi . Moreover, by monotonicity of CF Bi , both Fi♭ and Fi♯ can be taken to be
in {1, . . . , q}Ci . Let F̂i♭ , F̂i♯ be their restrictions to B̂ which is well-defined because of preservation.
In other words, F̂i♭ and F̂i♯ are colorings of Ĉi with colors from [q]. It follows that F̂i♭ and F̂i♯ agree
only on the sub-block assignments ȧij |B̂ for 1 ≤ j ≤ Ti .
5.3.1 Sunflower and Pigeonhole Principle
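Proposition 5.13. There exist $i_1 \ne i_2 \in [h]$ such that, for $\Delta = \hat{E}_{i_1} \cap \hat{E}_{i_2}$,
1. $\hat{C}_{i_1} \cap \Delta = \hat{C}_{i_2} \cap \Delta$.
2. $\hat{F}^\flat_{i_1}$ and $\hat{F}^\flat_{i_2}$ agree on $\hat{C}_{i_1} \cap \Delta = \hat{C}_{i_2} \cap \Delta$.
3. $\hat{F}^\sharp_{i_1}$ and $\hat{F}^\sharp_{i_2}$ agree on $\hat{C}_{i_1} \cap \Delta = \hat{C}_{i_2} \cap \Delta$.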
Proof. We apply Theorem 2.7. Here we set R = RB̂ , $F = \{\hat{E}_1, \dots, \hat{E}_h\}$. Since we set h = CER (h1 , hs )
the theorem yields J ⊆ [h], |J| = hs such that $\{\hat{E}_i \setminus \Delta\}_{i \in J}$ are pairwise disjoint for $\Delta = \cap_{i \in J} \hat{E}_i$.
Consider the triplets $\langle \hat{C}_i \cap \Delta,\ \hat{F}^\flat_i|_{\Delta \cap \hat{C}_i},\ \hat{F}^\sharp_i|_{\Delta \cap \hat{C}_i} \rangle$, over all i ∈ J. The total number of possible distinct
values of these triplets is at most,
$\sum_{j=0}^{h_0} \binom{h_1}{j} \cdot q^{h_0} \cdot q^{h_0} < h_s = |J|$,
by our choice of hs . Using the Pigeonhole Principle, this means there exist distinct indices i1 , i2 ∈ J that
satisfy the Proposition.
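The pigeonhole bound is a direct count, and it can be evaluated for small parameters. The sketch below simply computes the number of possible triplets and the resulting h_s = 1 + (that number); the function names are hypothetical and the parameter values in the usage line are arbitrary toy values, far smaller than those arising in the proof.

```python
from math import comb


def triplet_count(h0, h1, q):
    """Upper bound on distinct triplets: a subset of size <= h0 of a set of
    size <= h1, together with two q-colorings of at most h0 elements."""
    subsets = sum(comb(h1, j) for j in range(h0 + 1))
    return subsets * q ** h0 * q ** h0


def sunflower_size(h0, h1, q):
    """h_s = 1 + triplet_count, so h_s indices force a repeated triplet."""
    return 1 + triplet_count(h0, h1, q)


print(triplet_count(2, 4, 3), sunflower_size(2, 4, 3))
```

Because h_s exceeds the number of possible triplets by exactly one, any family of h_s indices must contain two with identical triplets, which is all the proposition needs.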
For the rest of the section we shall focus on the two indices obtained from the above proposition. For
convenience we rename them i1 = 1 and i2 = 2.
5.3.2 Finding an edge between IB1 and IB2
Proposition 5.14. If a1 ∈ C1 and a2 ∈ C2 such that F1♭ (a1 ) = F2♯ (a2 ) ∈ [q], then ((B1 , a1 ), (B2 , a2 )) ∈
EB .
Proof. Assume that ((B1 , a1 ), (B2 , a2 )) ∉ EB . Then a1 |B̂ = a2 |B̂ . This implies the following: a1 |B̂ =
a2 |B̂ ∈ ∆, since a1 |B̂ ∈ Ê1 and a2 |B̂ ∈ Ê2 . In particular, we have a1 |B̂ = a2 |B̂ ∈ Ĉ1 ∩ ∆ = Ĉ2 ∩ ∆.
Furthermore, since F1♭ (a1 ) = F2♯ (a2 ) ∈ [q], we have the following equalities,
F̂1♭ (a1 |B̂ ) = F̂2♯ (a2 |B̂ ) = F̂1♯ (a1 |B̂ ), and,
F̂1♭ (a1 |B̂ ) = F̂2♭ (a2 |B̂ ),
where the first equality is because of the preservation property and the definition of a restricted coloring.
Recalling from above that a1 |B̂ = a2 |B̂ ∈ Ĉ1 ∩ ∆ = Ĉ2 ∩ ∆, the second equality follows from agreement
of F̂1♯ and F̂2♯ on Ĉ1 ∩ ∆ = Ĉ2 ∩ ∆, and similarly the third equality follows from the agreement of F̂1♭ and
F̂2♭ on Ĉ1 ∩ ∆ = Ĉ2 ∩ ∆. In particular all the terms in the three equalities are equal.
However, the colorings F̂1♭ and F̂1♯ agree only on some restriction of a distinguished assignment i.e.
ȧ1j |B̂ for some 1 ≤ j ≤ T1 . Similarly, the colorings F̂2♭ and F̂2♯ agree only on some restriction of a
distinguished assignment i.e. ȧ2j ′ |B̂ for some 1 ≤ j ′ ≤ T2 . Now, since ȧ1j |B̂ = ȧ2j ′ |B̂ , this implies
that κ(ȧ1j |B̂ ) = κ(ȧ2j ′ |B̂ ). But, κ(ȧ1j |B̂ ) = j and κ(ȧ2j ′ |B̂ ) = j ′ . Therefore, j = j ′ = j ∗ for some
1 ≤ j ∗ ≤ min{T1 , T2 }, and by preservation a1 = ȧ1j ∗ and a2 = ȧ2j ∗ .
Recall that B1 = B̂ ∪ u1 and B2 = B̂ ∪ u2 such that there is an edge between u1 and u2 in Gk . Let
u1 = (u11 , . . . , u1k ) and u2 = (u21 , . . . , u2k ). Since there is an edge between u1 and u2 in Gk , there are edges
in G between u1j and u2j for j = 1, . . . , k. In particular, there is an edge in G between u1j ∗ and u2j ∗ .
However, from our choice of u1 and u2 , we have ȧ1j ∗ (u1j ∗ ) = T and ȧ2j ∗ (u2j ∗ ) = T. This implies that there
is an edge between (B1 , ȧ1j ∗ ) and (B2 , ȧ2j ∗ ) in EB . This is a contradiction to our supposition and thus
completes the proof.
We partition the set of sub-block assignments RB̂ as follows. Let,
$\hat{D} = \hat{C}_1 \cup \hat{C}_2$ and $\hat{R} = R_{\hat{B}} \setminus \hat{D}$.
This partition also induces partitions on the two sets of block assignments RB1 and RB2 as follows:
$D_1 = \{ a \in R_{B_1} \mid a|_{\hat{B}} \in \hat{D} \}$ and $R_1 = R_{B_1} \setminus D_1$.
$D_2 = \{ a \in R_{B_2} \mid a|_{\hat{B}} \in \hat{D} \}$ and $R_2 = R_{B_2} \setminus D_2$.
Proposition 5.15. $|D_1|, |D_2| \le 2^{k+1} \cdot h_0$.
Proof. For any sub-block assignment in RB̂ , there are at most $2^k$ block assignments in RB1 (or in RB2 )
which correspond to it by restriction. Therefore,
$|D_1|, |D_2| \le 2^k \cdot |\hat{D}| \le 2^k \cdot (|\hat{C}_1| + |\hat{C}_2|) = 2^k \cdot (|C_1| + |C_2|) \le 2^{k+1} \cdot h_0$.
Proposition 5.16. (D1 \ C1 ) ∩ E1 = ∅ and (D2 \ C2 ) ∩ E2 = ∅.
Proof. We shall only prove the first equality and the proof of the second is analogous.
Suppose for a contradiction that a ∈ D1 ∩ E1 and a ∉ C1 . Since a ∈ D1 , a|B̂ ∈ D̂ = Ĉ1 ∪ Ĉ2 . We
analyze two cases.
Case a|B̂ ∈ Ĉ1 : In this case, there exists b ∈ C1 such that b|B̂ = a|B̂ . Since a ∈ E1 \ C1 , this contradicts
the preservation property.
Case a|B̂ ∈ Ĉ2 \ Ĉ1 : In this case, a|B̂ ∈ Ĉ2 \ ∆ ⊆ Ê2 \ ∆. This implies that a|B̂ ∉ Ê1 since Ê2 \ ∆ is
disjoint from Ê1 \ ∆. This means that a ∉ E1 which is a contradiction.
Proposition 5.17. There is a coloring F1′ of every a1 ∈ D1 \ C1 and a coloring F2′ of every a2 ∈ D2 \ C2 ,
using only colors in [q], such that if we combine the colorings F1′ and F1♭ to form a coloring F1∗ of D1 and
similarly combine F2′ and F2♯ to form a coloring F2∗ of D2 , we have that for all a1 ∈ D1 and a2 ∈ D2 ,
F1∗ (a1 ) = F2∗ (a2 ) ∈ [q] =⇒ ((B1 , a1 ), (B2 , a2 )) ∈ EB .
Proof. We begin by noting that, by the choice of B1 and B2 , if ((B1 , a1 ), (B2 , a2 )) ∉ EB then it must be
the case that a1 |B̂ = a2 |B̂ . Therefore, it is sufficient to describe the colorings for block assignments in D1
and D2 that correspond to the same sub-block assignment in D̂.
Consider any sub-block assignment â ∈ D̂, the subsets D1 (â) := {a1 ∈ D1 | a1 |B̂ = â}, and
D2 (â) := {a2 ∈ D2 | a2 |B̂ = â}. By the preservation property we have that |D1 (â) ∩ E1 | ≤ 1 and
|D2 (â) ∩ E2 | ≤ 1. From the definition of D̂, we also obtain that |D1 (â) ∩ C1 | + |D2 (â) ∩ C2 | ≥ 1.
Before we proceed, we shall introduce some notation to simplify the analysis. Recall that $B_1 = \hat{B} \cup u^1$,
where $u^1 = (u^1_1, \dots, u^1_k)$, and similarly $B_2 = \hat{B} \cup u^2$, where $u^2 = (u^2_1, \dots, u^2_k)$. Define a mapping
$X_1 : R_{B_1} \to 2^{[k]}$, as follows:
$X_1(a_1) := \{ i \in [k] \mid a_1(u^1_i) = T \}$.
Similarly, define the mapping $X_2 : R_{B_2} \to 2^{[k]}$ analogously,
$X_2(a_2) := \{ i \in [k] \mid a_2(u^2_i) = T \}$.
Since there are edges in G between $u^1_i$ and $u^2_i$ for all i ∈ [k], it can be seen that for any a1 ∈ D1 (â) and
a2 ∈ D2 (â),
$((B_1, a_1), (B_2, a_2)) \in E_B \iff X_1(a_1) \cap X_2(a_2) \ne \emptyset$.   (6)
Therefore, we need to extend the coloring F1♭ in D1 (â), and the coloring F2♯ in D2 (â) so that for any
a1 ∈ D1 (â) and a2 ∈ D2 (â), if a1 and a2 are assigned the same color in [q] then X1 (a1 ) ∩ X2 (a2 ) must be
non-empty. We shall do this by the following case analysis.
Case 1: |D1 (â) ∩ C1 | = 1 and D2 (â) ∩ C2 = ∅. Clearly, none of the block assignments in D2 (â) are
colored by F2♯ . Let a′ ∈ D1 (â) ∩ C1 . Color all the block assignments in D1 (â) with the color F1♭ (a′ ), and
color all block assignments in D2 (â) with a different color from [q]. This gives us the desired coloring of
D1 (â) ∪ D2 (â).
Case 2: |D1 (â) ∩ C1 | = 1 and |D2 (â) ∩ C2 | = 1. Let a′1 ∈ D1 (â) ∩ C1 and a′2 ∈ D2 (â) ∩ C2 . By
Proposition 5.14, we have that
F1♭ (a′1 ) = F2♯ (a′2 ) ∈ [q] =⇒ ((B1 , a′1 ), (B2 , a′2 )) ∈ EB .
Here we need to consider two subcases.
Case 2a. F1♭ (a′1 ) = F2♯ (a′2 ) = ℓ∗ ∈ [q]. Equation (6) implies X1 (a′1 ) ∩ X2 (a′2 ) ≠ ∅. Clearly,
X1 (a′1 ), X2 (a′2 ) ≠ ∅. Since q ≥ 3, let i1 ≠ i2 ∈ [q] \ {ℓ∗ }. Define two mappings $f, g : 2^{[k]} \to [q]$ as
follows:
f (X1 (a′1 )) = g(X2 (a′2 )) = ℓ∗ ,
∀S ≠ X1 (a′1 ), f (S) = i1 ,
∀T ≠ X2 (a′2 ), g(T ) = i2 .
Letting F1′ (a1 ) = f (X1 (a1 )) for all a1 ∈ D1 (â) \ C1 and F2′ (a2 ) = g(X2 (a2 )) for all a2 ∈ D2 (â) \ C2 ,
and using Equation (6) we obtain the desired coloring of D1 (â) ∪ D2 (â) for this subcase.
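The maps f and g of Case 2a can be checked exhaustively for small k. The sketch below encodes subsets of [k] as frozensets and verifies that f(S) = g(T) forces S and T to intersect; the helper names `case_2a_maps` and `respects_intersection` are hypothetical, and the sets S*, T* stand in for X1(a′1) and X2(a′2).

```python
from itertools import combinations


def all_subsets(k):
    """All subsets of {1, ..., k} as frozensets."""
    items = range(1, k + 1)
    return [frozenset(c) for n in range(k + 1) for c in combinations(items, n)]


def case_2a_maps(s_star, t_star, ell, i1, i2):
    """f sends s_star to ell and everything else to i1;
    g sends t_star to ell and everything else to i2."""
    def f(S):
        return ell if S == s_star else i1

    def g(T):
        return ell if T == t_star else i2

    return f, g


def respects_intersection(f, g, k):
    """True iff f(S) == g(T) implies that S and T intersect."""
    subsets = all_subsets(k)
    return all(S & T for S in subsets for T in subsets if f(S) == g(T))


f, g = case_2a_maps(frozenset({1, 2}), frozenset({2, 3}), ell=1, i1=2, i2=3)
print(respects_intersection(f, g, k=3))
```

The check succeeds because the only way for f and g to agree is on the pair (S*, T*), which intersects by Equation (6); if i1 and i2 were allowed to coincide, the pair of empty sets would already violate the property, which is why three colors are needed.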
Case 2b. F1♭ (a′1 ) ≠ F2♯ (a′2 ) ∈ [q]. In this case, color all the block assignments in D1 (â) with the
color F1♭ (a′1 ) and color all the block assignments in D2 (â) with the color F2♯ (a′2 ). This gives us the desired
coloring of D1 (â) ∪ D2 (â) for this subcase.
Case 3. |D2 (â) ∩ C2 | = 1 and D1 (â) ∩ C1 = ∅. This case is handled analogously to Case 1.
The above case analysis, done for each â ∈ D̂, shows the existence of a coloring F1∗ of D1 using colors in
[q] extending F1♭ , and a coloring F2∗ of D2 using colors in [q] extending F2♯ , such that for all a1 ∈ D1 and
a2 ∈ D2 ,
F1∗ (a1 ) = F2∗ (a2 ) ∈ [q] =⇒ ((B1 , a1 ), (B2 , a2 )) ∈ EB .
We fix the colorings F1∗ = (F1♭ , F1′ ) and F2∗ = (F2♯ , F2′ ) of D1 and D2 respectively as obtained from the
above proposition. We define the following,
$I_1 := \{ F \in \{*, 1, \dots, q\}^{R_1} \mid (F, F_1^*) \in I[B_1] \}$,
and,
$I_2 := \{ F \in \{*, 1, \dots, q\}^{R_2} \mid (F, F_2^*) \in I[B_2] \}$,
where (F, F1∗ ) is the coloring of RB1 = D1 ∪ R1 obtained by combining the coloring F of R1 with the
coloring F1∗ of D1 . The coloring (F, F2∗ ) is analogously defined. Since I[B1 ] and I[B2 ] are monotone,
it is easy to see that both I1 and I2 are monotone. Note that Proposition 5.16 implies that for any block
assignment a1 ∈ D1 \ C1 , $\mathrm{Inf}^{p'}_{a_1}(I[B_1]) < \eta$ and similarly for a2 ∈ D2 \ C2 , $\mathrm{Inf}^{p'}_{a_2}(I[B_2]) < \eta$. Using this
we have the following proposition which is proved in a similar manner to Proposition 17 in [DKPS10].
Proposition 5.18. $\mu^{R_1}_{p'}(I_1) > \frac{1}{2}$ and $\mu^{R_2}_{p'}(I_2) > \frac{1}{2}$.
Proof. We shall prove the above for I1 , and the proof for I2 is analogous. From the definition of C1 , we
know that more than a 3/4 fraction of the extensions of F1♭ outside C1 are in I[B1 ]. We shall additionally fix the coloring
of the block assignments in D1 \ C1 to the coloring F1′ obtained from Proposition 5.17, and show that at
least 1/2 of the extensions of F1∗ are in I[B1 ].
Note that Proposition 5.16 implies that for any block assignment a ∈ D1 \ C1 , $\mathrm{Inf}^{p'}_a(I[B_1]) < \eta$. Let,
$I[B_1]' := \{ F \in I[B_1] \mid F|_{D_1 \setminus C_1 \leftarrow F_1'} \notin I[B_1] \}$,
where $F|_{D_1 \setminus C_1 \leftarrow F_1'}$ is the coloring obtained by changing the colors of D1 \ C1 to that given by F1′ . Let
$(D_1 \setminus C_1)^\complement$ be RB1 \ (D1 \ C1 ) and,
$F'' = \{ F \in \{*, 1, \dots, q\}^{(D_1 \setminus C_1)^\complement} \mid (F, F_1') \notin I[B_1] \text{ but } (F, H) \in I[B_1] \text{ for some } H \in \{1, \dots, q\}^{D_1 \setminus C_1} \}$.
From the monotonicity of I[B1 ] it is easy to see that any coloring F ∈ F ′′ contributes
$\mu_{p'}^{(D_1 \setminus C_1)^\complement}(F) \cdot \left(\frac{p'}{q}\right)^{|D_1 \setminus C_1|}$ to the influence $\mathrm{Inf}^{p'}_e(I[B_1])$ for some element e ∈ D1 \ C1 . Since the sum of the influences in
D1 \ C1 is less than |D1 \ C1 | · η, we have,
$\mu_{p'}^{(D_1 \setminus C_1)^\complement}(F'') < |D_1 \setminus C_1| \cdot \eta \cdot \left(\frac{p'}{q}\right)^{-|D_1 \setminus C_1|}$.
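Using Proposition 5.15 we have that $|D_1 \setminus C_1| \le 2^{k+1} \cdot h_0$. Along with our setting of the parameter η and
since 1 > p′ ≥ p, this implies
$\mu_{p'}^{(D_1 \setminus C_1)^\complement}(F'') < 2^{k+1} \cdot h_0 \cdot \eta \cdot \left(\frac{p'}{q}\right)^{-2^{k+1} \cdot h_0} \le \frac{1}{4}\left(\frac{p'}{q}\right)^{h_0}$.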
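Since $I[B_1]' \subseteq \{ F \in \{*, 1, \dots, q\}^{R_{B_1}} \mid F|_{(D_1 \setminus C_1)^\complement} \in F'' \}$, we have that,
$\mu_{p'}(I[B_1]') \le \mu_{p'}^{(D_1 \setminus C_1)^\complement}(F'') < \frac{1}{4}\left(\frac{p'}{q}\right)^{h_0}$.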
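Since F1♭ is a coloring with colors only from [q], and |C1 | ≤ h0 , we have $\mu_{p'}(F_1^\flat) \ge \left(\frac{p'}{q}\right)^{h_0}$. Along
with the fact that Pr[A | B] ≤ Pr[A]/ Pr[B], this implies,
$\Pr_{F \in \mu_{p'}^{R_{B_1}}}\left[ F \in I[B_1]' \mid F|_{C_1} = F_1^\flat \right] \le \Pr_{F \in \mu_{p'}^{R_{B_1}}}\left[ F \in I[B_1]' \right] \cdot \left(\frac{p'}{q}\right)^{-h_0} < \frac{1}{4}$.
Using the above we have,
$\Pr_{F \in \mu_{p'}^{R_{B_1}}}\left[ F \in I[B_1] \setminus I[B_1]' \mid F|_{C_1} = F_1^\flat \right] \ge \Pr_{F \in \mu_{p'}^{R_{B_1}}}\left[ F \in I[B_1] \mid F|_{C_1} = F_1^\flat \right] - \Pr_{F \in \mu_{p'}^{R_{B_1}}}\left[ F \in I[B_1]' \mid F|_{C_1} = F_1^\flat \right] > \frac{3}{4} - \frac{1}{4} = \frac{1}{2}$.
Since the LHS of the above inequality can be written as Pr[F ∈ I[B1 ] | F |D1 = F1∗ ], we have,
$\mu^{R_1}_{p'}(I_1) = \Pr\left[ F \in I[B_1] \mid F|_{D_1} = F_1^* \right] > \frac{1}{2}$.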
We shall extend the colorings F1∗ and F2∗ to colorings of RB1 and RB2 respectively, such that both are
in the (supposed) independent set but yield an edge in it.
Proposition 5.19. Given I1 and I2 as above there exist H1 ∈ I1 and H2 ∈ I2 such that for all a1 ∈ RB1
and a2 ∈ RB2 , if H1 (a1 ) = H2 (a2 ) ∈ [q] then ((B1 , a1 ), (B2 , a2 )) ∈ EB . Thus, ((B1 , (H1 , F1∗ )), (B2 , (H2 , F2∗ )))
is an edge in GqB .
Proof. By the monotonicity of I1 and I2 we can assume that p′ = 1. Note that F1∗ and F2∗ already color
block assignments in D1 and D2 respectively as given in Proposition 5.17. We shall consider each â ∈ R̂
separately and construct the coloring of R1 (â) in H1 and of R2 (â) in H2 using colors from [q], where
R1 (â) := {a ∈ R1 | a|B̂ = â} and R2 (â) := {a ∈ R2 | a|B̂ = â}.
Fix â ∈ R̂. We use the maps X1 and X2 as given in the proof of Proposition 5.17. Let f, g be a
pair of random colorings sampled from the distribution D given by Lemma 2.10. For any a1 ∈ R1 (â), let
H1 (a1 ) = f (X1 (a1 )). Similarly, for any a2 ∈ R2 (â), let H2 (a2 ) = g(X2 (a2 )). From property 2 of Lemma
2.10 f, g satisfy,
f (X1 (a1 )) = g(X2 (a2 )) =⇒ X1 (a1 ) ∩ X2 (a2 ) ≠ ∅.
Using the property that,
X1 (a1 ) ∩ X2 (a2 ) ≠ ∅ ⇐⇒ ((B1 , a1 ), (B2 , a2 )) ∈ EB ,
we conclude that the coloring of R1 (â) by H1 and of R2 (â) by H2 satisfies the desired property in the
statement of the proposition.
This randomized construction of the colorings H1 and H2 is done for each â ∈ R̂. From
property 1 of Lemma 2.10 we obtain that under the measure µ1 , H1 is distributed uniformly in the set I1 and
H2 is uniformly distributed in the set I2 . Using the fact that both I1 and I2 are monotone we have that
µ1 (I1 ) > 1/2 and µ1 (I2 ) > 1/2, and thus,
Pr[H1 ∈ I1 ] > 1/2 and Pr[H2 ∈ I2 ] > 1/2.
Therefore, by the union bound over the two complementary events,
Pr[H1 ∈ I1 , H2 ∈ I2 ] > 0.
This implies that with non-zero probability over the choice of H1 and H2 , ((B1 , (H1 , F1∗ )), (B2 , (H2 , F2∗ )))
is an edge in the (supposed) independent set. This completes the proof of this proposition and of Lemma
5.12.
6 A New Label Cover and Proof of Theorem 3.1
In this section we shall prove Theorem 3.1. For this we begin with the description of a query efficient PCP
construction of Håstad and Khot [HK05]. Based on this PCP we construct a new Label Cover problem
for which we prove a specific hardness property. Using this property of the new Label Cover Problem, we
formally give the proof of Theorem 3.1 in Section 6.4.
The notations used in this section are in a context different to the previous sections.
6.1 PCP of Håstad and Khot [HK05]
Håstad and Khot [HK05] constructed a query efficient PCP with perfect completeness and proved the following theorem.
Theorem 6.1. Let t, q > 1 be any arbitrary positive integers. Consider a constraint system C consisting
of a variable set V of size n and a set of m predicates $P = \{P_i\}_{i=1}^m$ where each predicate Pi is over the
[q]-labelings of a fixed tuple of $t^2 + 4t$ variables and the set of accepting assignments $\mathrm{Acc}(P_i) \subseteq [q]^{t^2+4t}$
satisfies $|\mathrm{Acc}(P_i)| = q^{4t}$. It is NP-hard to distinguish between the following two cases.
YES CASE. There is an assignment σ : V ↦ [q] such that every constraint is satisfied.
NO CASE. Any assignment of labels to the variables satisfies at most $q^{-t^2+1}$ fraction of the predicates.
6.2 A Label Cover Problem
Consider the following Label Cover problem LC derived from the PCP of Håstad and Khot [HK05] given in
Theorem 6.1 with parameters t, q. The set of vertices consists of the variable set V and the set of predicates
P. The label sets for v ∈ V is [q], and for P ∈ P is Acc(P ). There is an edge between v ∈ V and P ∈ P
iff v is a variable of P denoted by v ∈ Var(P ). The edge is satisfied only if the label a ∈ [q] assigned to
v is consistent with the label b ∈ Acc(P ) assigned to P . We prove a somewhat non-standard hardness of
approximation for this Label Cover problem – as stated in the theorem below – which is tailored for our
purpose.
Theorem 6.2. For any constants α, β > 0 and positive integer k, for large enough values of t and q
satisfying t > 2k and $q = 2^{2(t^2+4t)}$, given an instance of LC derived from the PCP in Theorem 6.1 with
parameters t and q, it is NP-hard to distinguish between the following two cases.
YES CASE. There is a labeling to the set of variables V and the set of predicates P such that all edges are
satisfied.
NO CASE. There is no labeling to V and P such that for $\beta(q^{-4tk})$ fraction of the predicates P , at least α
fraction of the edges incident on P are satisfied.
Proof. The YES Case follows from the fact that in the YES case of Theorem 6.1 there is an assignment to
every variable in V that satisfies all the constraints in P. Choosing the corresponding satisfying assignments
as labels for the predicates, we obtain a labeling to the instance of LC that satisfies all edges.
In the NO Case, assume for a contradiction that there is a labeling to the variables and predicates such that for $\beta q^{-4tk}$ fraction of the predicates $P$, at least $\alpha$ fraction of the edges incident on $P$ are satisfied. Call these predicates “good”. Now randomize the labels of the variables as follows: independently with probability $1/2$, reset each variable to a uniformly random value in $[q]$. For any “good” predicate, the probability that all of its incident edges are satisfied after the randomization is at least $2^{-(t^2+4t)} q^{-(1-\alpha)(t^2+4t)}$. This is because, with probability $2^{-(t^2+4t)}$, the variables corresponding to the already satisfied edges remain the same and the variables corresponding to the unsatisfied edges are reset randomly. With a further probability of at least $q^{-(1-\alpha)(t^2+4t)}$, the previously unsatisfied edges are now satisfied.
Since $\beta q^{-4tk}$ fraction of the predicates are “good”, taking expectations shows that there is a labeling to the variables that satisfies a
\[ 2^{-(t^2+4t)} \, q^{-(1-\alpha)(t^2+4t)} \, \beta \, q^{-4tk} \]
fraction of the predicates. For our setting of parameters and for large enough $t, q$, this exceeds the soundness in the NO case of Theorem 6.1. Thus, we obtain a contradiction and this completes the proof of the theorem.
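The final parameter comparison in the proof can be spelled out. The following is a sketch of the arithmetic, using only the relation $q = 2^{2(t^2+4t)}$ from Theorem 6.2 and the soundness $q^{-t^2+1}$ from Theorem 6.1; the last line is a sufficient condition derived here, not a statement taken from the text.

```latex
\begin{align*}
2^{-(t^2+4t)} &= q^{-1/2}, \\
2^{-(t^2+4t)}\, q^{-(1-\alpha)(t^2+4t)}\, \beta\, q^{-4tk}
  &= \beta\, q^{-(1-\alpha)(t^2+4t) - 4tk - 1/2}, \\
\beta\, q^{-(1-\alpha)(t^2+4t) - 4tk - 1/2} > q^{-t^2+1}
  &\iff \alpha t^2 - 4(1-\alpha)t - 4tk - \tfrac{3}{2} > \log_q(1/\beta).
\end{align*}
```

For fixed $\alpha$, $\beta$ and $k$, the left side of the last inequality grows like $\alpha t^2$ while the right side tends to $0$ as $q$ grows, so the inequality holds once $t$ is large enough.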
Define the set of all predicate-label pairs $D := \bigcup_{P \in \mathcal{P}} \bigcup_{b \in \mathrm{Acc}(P)} \{(P, b)\}$. Clearly $|D| = m \cdot q^{4t}$. For a given positive integer $k$, we shall study the structure of large enough subsets of $D^k$. First we describe what is meant by a pair of elements of $D$ or $D^k$ being contradictory.
Contradictory Pair. Two predicate-label pairs $(P, a)$ and $(P', b)$, where $a \in \mathrm{Acc}(P)$ and $b \in \mathrm{Acc}(P')$, are contradictory if one of the following conditions holds.
1. $P = P'$ and $a \neq b$.
2. $P$ and $P'$ share a variable $v$ and $a(v) \neq b(v)$, where $a(v)$ (respectively $b(v)$) is the label in $[q]$ assigned to $v$ by $a$ (respectively $b$).
Extending the above terminology to $D^k$, we say that a pair $g^1, g^2 \in D^k$ is contradictory if for every index $i \in [k]$ the $i$th coordinate of $g^1$ and the $i$th coordinate of $g^2$ are contradictory as elements of $D$.
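The two conditions, and their coordinate-wise extension to $D^k$, can be written as a short check. This is an illustrative sketch only: a predicate-label pair is represented here as a hypothetical triple (name, variables, label), with the label a tuple aligned with the variables, and predicate names are assumed to identify predicates uniquely.

```python
def contradictory(pair1, pair2):
    """Decide whether two predicate-label pairs are contradictory.

    Each pair is (name, variables, label); label[j] is the value given to variables[j].
    """
    (p, vars_p, a), (p2, vars_p2, b) = pair1, pair2
    if p == p2:
        # Condition 1: same predicate, different accepting assignments.
        return a != b
    # Condition 2: some shared variable receives different labels.
    label_a = dict(zip(vars_p, a))
    label_b = dict(zip(vars_p2, b))
    return any(label_a[v] != label_b[v] for v in label_a.keys() & label_b.keys())

def contradictory_tuples(g1, g2):
    """Elements of D^k are contradictory iff they are contradictory in every coordinate."""
    return all(contradictory(x, y) for x, y in zip(g1, g2))
```

Note that two pairs over different predicates with no shared variable are never contradictory under this definition.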
The following is the main lemma that we shall prove.
Lemma 6.3. For any constant $\xi > 0$, and positive integers $h$ and $k$, there exist parameters $t, q$ satisfying $t > 2^k$, $q = 2^{2(t^2+4t)}$ such that if the instance of LC is the NO Case, then for every subset $S \subseteq D^k$ such that $|S| \geq \xi \cdot m^k$, there exist $h$ distinct elements $g^1, \ldots, g^h \in S$ which are pairwise contradictory.
Before proving the above lemma we need to define and analyze contradictions in sub-multisets of $[q]^k$. Two elements $z^1, z^2 \in [q]^k$ are contradictory if they differ in each coordinate. Let $Z$ be a multiset with elements from $[q]^k$. For any coordinate $i \in [k]$ and any $a \in [q]$, define the normalized occurrence of $a$ in coordinate $i$ as
\[ w_Z(i, a) := \frac{1}{|Z|} \cdot |\{z \in Z \mid z(i) = a\}|. \]
We are now ready to state the following lemma relating the normalized occurrence values to the existence of contradictory cliques in multisets with elements from $[q]^k$.
Lemma 6.4. Fix any constant integer $h > 1$. Let $Z$ be a multiset with elements from $[q]^k$ such that for every $z^1, \ldots, z^h \in Z$, there exist $j \neq j' \in \{1, \ldots, h\}$ such that $z^j$ and $z^{j'}$ are not contradictory. Then there exist $a \in [q]$ and $i \in [k]$ such that $w_Z(i, a) \geq 1/(2hk)$.
Proof. Suppose that there is no $a \in [q]$ and $i \in [k]$ such that $w_Z(i, a) \geq 1/(2hk)$. Consider the following iterative process. Set $Z_0 := Z$. For $i = 1, \ldots, h$, do the following.
• Choose $z^i$ arbitrarily from $Z_{i-1}$.
• Let $Z_i$ be the multiset formed by removing from $Z_{i-1}$ all elements that agree with $z^i$ on any coordinate. If $Z_i$ is empty, then stop.
We claim that the above process continues for $h$ steps. This is because each step removes at most $k \cdot (1/(2hk)) = 1/(2h)$ fraction of the elements of $Z$, so fewer than half of the elements of $Z$ have been removed before the $h$th choice is made. Therefore, this process selects $z^1, \ldots, z^h$ which are pairwise contradictory. This contradicts the assumption in the lemma, and thus completes its proof.
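The iterative process in the proof can be sketched in a few lines. This is an illustration under stated assumptions: the multiset $Z$ is modelled as a Python list of equal-length tuples, and "choose arbitrarily" is implemented as "take the first remaining element".

```python
from fractions import Fraction

def normalized_occurrence(Z, i, a):
    """w_Z(i, a): fraction of the elements z of the multiset Z with z[i] == a."""
    return Fraction(sum(1 for z in Z if z[i] == a), len(Z))

def greedy_contradictory(Z, h):
    """Greedily select up to h pairwise contradictory elements of Z.

    After each selection, every element that agrees with the selected one on
    some coordinate is removed, so later selections differ from it everywhere.
    """
    chosen, remaining = [], list(Z)
    while remaining and len(chosen) < h:
        z = remaining[0]
        chosen.append(z)
        remaining = [y for y in remaining if all(y[c] != z[c] for c in range(len(z)))]
    return chosen
```

When every normalized occurrence is below $1/(2hk)$ the loop cannot run out of elements before $h$ selections, which is the claim of the proof; when some label is heavy, as in a multiset whose elements all share a first coordinate, the process may stop early.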
In the next subsection we give the proof of Lemma 6.3.
6.3 Proof of Lemma 6.3
For a contradiction to the statement of Lemma 6.3, we assume that there is a subset $S \subseteq D^k$ such that $|S| \geq \xi \cdot m^k$, and it does not contain any $h$ distinct elements which are pairwise contradictory.
Recall that the set of predicates $\mathcal{P}$ is of size $m$, and for each predicate $P$, the set of possible labels to it is $\mathrm{Acc}(P)$ of size $q^{4t}$. For any tuple $(P_1, \ldots, P_k)$ of predicates there are at most $q^{4tk}$ possibilities for elements $g \in S$ such that the $i$th coordinate of $g$ is a predicate-label pair of the form $(P_i, b)$ where $b \in \mathrm{Acc}(P_i)$, for $i = 1, \ldots, k$.
We construct a subset $T \subseteq S$ by the following process: for each tuple $p = (P_1, \ldots, P_k)$ of predicates such that there is at least one element of the form $((P_1, b'_1), \ldots, (P_k, b'_k))$ in $S$, choose exactly one such element of $S$, picked arbitrarily and denoted by $((P_1, b^p_1), \ldots, (P_k, b^p_k))$. Clearly, the size of $T$ is at least $\xi \cdot m^k / q^{4tk}$. Moreover, since $T$ is a subset of $S$, it satisfies our supposition that there exist no $h$ distinct elements in $T$ which are pairwise contradictory. For notational purposes we also define $T_{\mathcal{P}}$ to be the subset of $\mathcal{P}^k$ such that for $p = (P_1, \ldots, P_k) \in T_{\mathcal{P}}$, the unique element $((P_1, b^p_1), \ldots, (P_k, b^p_k))$ is in $T$. Clearly, $|T_{\mathcal{P}}| = |T|$.
Now consider an element $v = (v_1, \ldots, v_k)$ in $V^k$. Define the set
\[ N(v) := \{p = (P_1, \ldots, P_k) \in T_{\mathcal{P}} \mid v_i \in \mathrm{Var}(P_i),\ \forall i \in [k]\}. \]
Essentially, $N(v)$ is the set of coordinate-wise “neighbors” of the tuple $v$ that occur with a labeling in $T$. Analogously, for $p \in T_{\mathcal{P}}$, define $N^{-1}(p) := \{v \in V^k \mid p \in N(v)\}$. It is easy to see that since $p = (P_1, \ldots, P_k) \in T_{\mathcal{P}}$, $N^{-1}(p) = \{(v_1, \ldots, v_k) \in V^k \mid v_i \in \mathrm{Var}(P_i),\ \forall i \in [k]\}$.
Define a multiset $L(v)$ of the labelings induced on the tuple $v$ by $N(v)$, i.e.
\[ L(v) := \biguplus_{p \in N(v)} \{(b^p_1(v_1), \ldots, b^p_k(v_k))\}, \]
where $b^p_i(v_i) \in [q]$ is the label induced on $v_i$ by the label $b^p_i \in \mathrm{Acc}(P_i)$, where $p = (P_1, \ldots, P_k)$. Note that $L(v)$ is a multiset with elements from $[q]^k$. The fact that the set $T$ has no large pairwise contradictory subsets enables us to prove the following lemma.
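Lemma 6.5. For any $v \in V^k$ such that $N(v) \neq \emptyset$, there is an element $a \in [q]$ and an index $i \in [k]$ such that $w_{L(v)}(i, a) \geq 1/(2hk)$.
Proof. Since $T$ does not contain any $h$ distinct elements that are pairwise contradictory, it follows that the set $\{g = ((P_1, b^p_1), \ldots, (P_k, b^p_k)) \mid p = (P_1, \ldots, P_k) \in N(v)\} \subseteq T$ does not contain any $h$ distinct elements that are pairwise contradictory. If two elements of $L(v)$ differ in the $i$th coordinate, then the corresponding elements of $T$ are contradictory in the $i$th coordinate: their $i$th predicates either coincide and carry different labels, or share the variable $v_i$ and disagree on it. This implies that the multiset $L(v)$ does not contain $h$ distinct elements that are pairwise contradictory. Applying Lemma 6.4 to $L(v)$ completes the proof.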
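The sets $N(v)$ and $L(v)$ used in Lemma 6.5 can be computed directly from $T$. The sketch below is illustrative only: `PRED_VARS` is a hypothetical table of $\mathrm{Var}(P)$, and $T$ is represented as a hypothetical dictionary mapping each predicate tuple $p \in T_{\mathcal{P}}$ to its chosen label tuple $(b^p_1, \ldots, b^p_k)$.

```python
# Hypothetical Var(P) for a two-predicate toy instance.
PRED_VARS = {"P1": ("x", "y"), "P2": ("y", "z")}

def neighbors(v, T):
    """N(v): predicate tuples p in T_P with v[i] in Var(P_i) for every coordinate i."""
    return [p for p in T if all(v[i] in PRED_VARS[P] for i, P in enumerate(p))]

def induced_labels(v, T):
    """L(v): the multiset (as a list) of labelings induced on v by N(v)."""
    out = []
    for p in neighbors(v, T):
        labels = T[p]  # labels[i] is b^p_i, a tuple aligned with Var(P_i)
        out.append(tuple(labels[i][PRED_VARS[P].index(v[i])] for i, P in enumerate(p)))
    return out
```

Each element of `induced_labels(v, T)` lies in $[q]^k$, so the normalized occurrence $w_{L(v)}(i, a)$ is well defined whenever $N(v)$ is non-empty.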
We need a few more definitions before proceeding with the proof. For $v \in V^k$ and $i \in [k]$ let $A(v, i) := \{a \in [q] \mid w_{L(v)}(i, a) \geq 1/(2hk)\}$. Since the values $w_{L(v)}(i, a)$ sum to $1$ over $a \in [q]$, it is easy to see that $|A(v, i)| \leq 2hk$.
For each $i \in [k]$, define a random assignment $\sigma_i$ of labels from $[q]$ to each $v \in V^k$ as follows: if $N(v) = \emptyset$ then let $\sigma_i(v)$ be a random element from $[q]$, else if $A(v, i) \neq \emptyset$, then set $\sigma_i(v)$ to be a uniformly random element from $A(v, i)$, otherwise choose a random element from $[q]$. We have the following lemma.
Lemma 6.6. With probability taken over a random $p \in T_{\mathcal{P}}$, a random $v = (v_1, \ldots, v_k) \in N^{-1}(p)$, a random $i \in [k]$ and a random assignment $\sigma_i$ as constructed above,
\[ \Pr\left[ b^p_i(v_i) = \sigma_i(v) \right] \geq \frac{1}{4h^2k^3}. \]
Proof. Conditioned on a choice of $v = (v_1, \ldots, v_k) \in V^k$, the randomness in the lemma chooses $p$ uniformly from the (non-empty) set $N(v)$. By Lemma 6.5 there is an index $i^* \in [k]$ and a label $a \in [q]$ such that $w_{L(v)}(i^*, a) \geq 1/(2hk)$, and therefore $a \in A(v, i^*)$. This implies that with probability at least $1/(2hk)$, $b^p_{i^*}(v_{i^*}) = a$. With further probability $1/k$ the randomness chooses the index $i^*$ and with probability $1/|A(v, i^*)| \geq 1/(2hk)$, $a$ is chosen by $\sigma_{i^*}$. Combining everything we obtain that with probability at least $1/(4h^2k^3)$, the desired event occurs.
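The set $A(v, i)$ of heavy labels and one draw of the random assignment $\sigma_i(v)$ can be sketched as follows. The representation is an assumption made here for illustration: $L(v)$ is passed as a list of tuples (empty when $N(v) = \emptyset$), and $[q]$ is modelled as the integers $0, \ldots, q-1$.

```python
import random
from collections import Counter
from fractions import Fraction

def heavy_labels(L, i, h, k):
    """A(v, i): labels a whose normalized occurrence in coordinate i of L is >= 1/(2hk)."""
    counts = Counter(z[i] for z in L)
    threshold = Fraction(1, 2 * h * k)
    return {a for a, c in counts.items() if Fraction(c, len(L)) >= threshold}

def sample_sigma(L, i, h, k, q, rng=random):
    """One draw of sigma_i(v): uniform over A(v, i) if it is non-empty, else uniform over [q]."""
    A = heavy_labels(L, i, h, k)
    return rng.choice(sorted(A)) if A else rng.randrange(q)
```

Since at most $2hk$ labels can each account for a $1/(2hk)$ fraction of $L(v)$, a fixed heavy label is returned with probability at least $1/(2hk)$, which is the last factor in the bound of Lemma 6.6.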
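Without loss of generality, from the above lemma we may assume that there exists an assignment $\sigma_1$ such that, taking probability over the choice of a random $p = (P_1, \ldots, P_k) \in T_{\mathcal{P}}$ and random $v_1, \ldots, v_k$ such that $v_i \in \mathrm{Var}(P_i)$ (for all $i \in [k]$),
\[ \Pr\left[ b^p_1(v_1) = \sigma_1((v_1, \ldots, v_k)) \right] \geq \frac{1}{4h^2k^3}. \]
This can be rewritten as:
\[ \mathop{\mathbb{E}}_{\substack{p = (P_1, \ldots, P_k) \in T_{\mathcal{P}} \\ v_2 \in \mathrm{Var}(P_2), \ldots, v_k \in \mathrm{Var}(P_k)}} \; \mathop{\mathbb{E}}_{v_1 \in \mathrm{Var}(P_1)} \left[ \mathbb{1}\{ b^p_1(v_1) = \sigma_1((v_1, \ldots, v_k)) \} \right] \geq \frac{1}{4h^2k^3}. \]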
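By averaging (a quantity with values in $[0, 1]$ and expectation at least $c$ is at least $c/2$ with probability at least $c/2$), the above implies,
\[ \Pr_{\substack{p = (P_1, \ldots, P_k) \in T_{\mathcal{P}} \\ v_2 \in \mathrm{Var}(P_2), \ldots, v_k \in \mathrm{Var}(P_k)}} \left[ \frac{|\{v_1 \in \mathrm{Var}(P_1) \mid b^p_1(v_1) = \sigma_1((v_1, \ldots, v_k))\}|}{|\mathrm{Var}(P_1)|} \geq \frac{1}{8h^2k^3} \right] \geq \frac{1}{8h^2k^3}. \]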
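Since $|T_{\mathcal{P}}| \geq \xi m^k / q^{4tk} = \xi \cdot |\mathcal{P}^k| / q^{4tk}$, the preceding bound over $p \in T_{\mathcal{P}}$ implies,
\[ \Pr_{\substack{p = (P_1, \ldots, P_k) \in \mathcal{P}^k \\ v_2 \in \mathrm{Var}(P_2), \ldots, v_k \in \mathrm{Var}(P_k)}} \left[ p \in T_{\mathcal{P}} \ \text{and}\ \frac{|\{v_1 \in \mathrm{Var}(P_1) \mid b^p_1(v_1) = \sigma_1((v_1, \ldots, v_k))\}|}{|\mathrm{Var}(P_1)|} \geq \frac{1}{8h^2k^3} \right] \geq \frac{\xi}{q^{4tk}} \cdot \frac{1}{8h^2k^3}. \]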
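Therefore, there is a setting of $P_2^*, \ldots, P_k^* \in \mathcal{P}$ and $v_2^* \in \mathrm{Var}(P_2^*), \ldots, v_k^* \in \mathrm{Var}(P_k^*)$ such that,
\[ \Pr_{P_1 \in \mathcal{P}} \left[ p = (P_1, P_2^*, \ldots, P_k^*) \in T_{\mathcal{P}} \ \text{and}\ \frac{|\{v_1 \in \mathrm{Var}(P_1) \mid b^p_1(v_1) = \sigma_1((v_1, v_2^*, \ldots, v_k^*))\}|}{|\mathrm{Var}(P_1)|} \geq \frac{1}{8h^2k^3} \right] \geq \frac{\xi}{q^{4tk}} \cdot \frac{1}{8h^2k^3}. \tag{7} \]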
This enables us to define a labeling $\rho_V$ to each variable by setting $\rho_V(v) := \sigma_1(v, v_2^*, \ldots, v_k^*)$. Also, define a (partial) labeling $\rho_{\mathcal{P}}$ to the predicates by setting $\rho_{\mathcal{P}}(P) := b^p_1$, where $p = (P, P_2^*, \ldots, P_k^*)$. From Equation (7) we obtain that the labeling given by $\rho_V$ and $\rho_{\mathcal{P}}$ to the variables and the predicates has the property that for $\frac{\xi q^{-4kt}}{8h^2k^3}$ fraction of the predicates $P$, $\frac{1}{8h^2k^3}$ fraction of the edges incident on $P$ are satisfied. This contradicts Theorem 6.2 for large enough setting of $t, q$. This completes the proof of Lemma 6.3.
6.4 Proof of Theorem 3.1
Set the parameter ξ in Lemma 6.3 to be ε0 in the statement of Theorem 3.1, and the parameters h and k as in
the statement of Theorem 3.1. Choose the parameters q and t appropriately as in Lemma 6.3. Let LC be the
instance of Label Cover given by Theorem 6.2 with the parameters q and t as set. We construct the following
graph $G(V, E)$, where $V := D$, the set of all predicate-label pairs. Note that the size $|V| = |D| = m \cdot q^{4t}$, where $m$ is the number of predicates in the LC instance. There is an edge between $(P, a), (P', b) \in D$ if the two are contradictory. It is easy to see that $G$ is $(m, q^{4t})$-co-partite. With the definition of $G^k$ as given in Section 3, Theorem 6.2 and Lemma 6.3 imply that it is NP-hard to distinguish between the following two cases:
• YES Case: There is an independent subset in $G$ of size $m$.
• NO Case: Every subset of $V^k$ of size at least ε0 · $m^k$ contains $h$ elements that form a clique in $G^k$.
The above completes the proof of Theorem 3.1.
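The graph $G$ of this construction can be built explicitly for a toy instance. The following is a sketch under stated assumptions: the predicates, their variables and their accepting sets are hypothetical, and the helper names are introduced here only for illustration.

```python
from itertools import combinations

# Hypothetical toy instance: name -> (Var(P), Acc(P)).
PREDICATES = {
    "P1": (("x", "y"), [(0, 0), (1, 1)]),
    "P2": (("y", "z"), [(0, 1), (1, 0)]),
}

def contradictory(u, w, predicates):
    """Edge rule of G: same predicate with different labels, or a shared variable labelled differently."""
    (p, a), (p2, b) = u, w
    if p == p2:
        return a != b
    la = dict(zip(predicates[p][0], a))
    lb = dict(zip(predicates[p2][0], b))
    return any(la[v] != lb[v] for v in la.keys() & lb.keys())

def build_graph(predicates):
    """Vertices are all predicate-label pairs; edges join contradictory pairs."""
    vertices = [(p, b) for p, (_, acc) in predicates.items() for b in acc]
    edges = {frozenset((u, w)) for u, w in combinations(vertices, 2)
             if contradictory(u, w, predicates)}
    return vertices, edges

def independent_set_from_assignment(predicates, sigma):
    """In the YES case, a satisfying assignment sigma picks one vertex per predicate."""
    return [(p, tuple(sigma[v] for v in variables)) for p, (variables, _) in predicates.items()]
```

The vertices that share a predicate form a clique, which is the co-partite structure noted above, and the vertices selected by a satisfying assignment are pairwise non-contradictory, giving an independent set of size $m$.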
References
[ACC06] Sanjeev Arora, Eden Chlamtac, and Moses Charikar. New approximation guarantee for chromatic number. In Proceedings of the ACM Symposium on the Theory of Computing, pages 215–224, 2006.
[AK98] Rudolf Ahlswede and Levon H. Khachatrian. The Diametric Theorem in Hamming spaces – optimal anticodes. Advances in Applied Mathematics, 20(4), 1998.
[ALM+98] Sanjeev Arora, Carsten Lund, Rajeev Motwani, Madhu Sudan, and Mario Szegedy. Proof verification and the hardness of approximation problems. Journal of the ACM, 45(3):501–555, 1998.
[AS98] Sanjeev Arora and Shmuel Safra. Probabilistic checking of proofs: A new characterization of NP. Journal of the ACM, 45(1):70–122, 1998.
[BK97] Avrim Blum and David R. Karger. An $\tilde{O}(n^{3/14})$-coloring algorithm for 3-colorable graphs. Information Processing Letters, 61(1):49–53, 1997.
[BK09] Nikhil Bansal and Subhash Khot. Optimal long code test with one free bit. In Proceedings of the Annual Symposium on Foundations of Computer Science, pages 453–462, 2009.
[BK10] Nikhil Bansal and Subhash Khot. Inapproximability of hypergraph vertex cover and applications to scheduling problems. In Proceedings of the International Colloquium on Automata, Languages and Programming, pages 250–261, 2010.
[BKK+92] Jean Bourgain, Jeff Kahn, Gil Kalai, Yitzhak Katznelson, and Nathan Linial. The influence of variables in product spaces. Israel Journal of Mathematics, 77(1):55–64, 1992.
[Blu94] Avrim Blum. New approximation algorithms for graph coloring. Journal of the ACM, 41(3):470–516, 1994.
[DKPS10] Irit Dinur, Subhash Khot, Will Perkins, and Muli Safra. Hardness of finding independent sets in almost 3-colorable graphs. In Proceedings of the Annual Symposium on Foundations of Computer Science, pages 212–221, 2010.
[DMR09] Irit Dinur, Elchanan Mossel, and Oded Regev. Conditional hardness for approximate coloring. SIAM Journal of Computing, 39(3):843–873, 2009.
[DS05] Irit Dinur and Samuel Safra. On the hardness of approximating minimum vertex cover. Annals of Mathematics, 165(1):439–485, 2005.
[EH00] Lars Engebretsen and Jonas Holmerin. Clique is hard to approximate within $n^{1-o(1)}$. In Proceedings of the International Colloquium on Automata, Languages and Programming, pages 2–12, 2000.
[EH08] Lars Engebretsen and Jonas Holmerin. More efficient queries in PCPs for NP and improved approximation hardness of maximum CSP. Random Struct. Algorithms, 33(4):497–514, 2008.
[ER60] Paul Erdős and Richard Rado. Intersection theorems for systems of sets. London Math. Soc., 35:85–90, 1960.
[Fei04] Uriel Feige. Approximating maximum clique by removing subgraphs. SIAM Journal of Discrete Mathematics, 18(2):219–225, 2004.
[FGL+96] Uriel Feige, Shafi Goldwasser, László Lovász, Shmuel Safra, and Mario Szegedy. Interactive proofs and the hardness of approximating cliques. Journal of the ACM, 43(2):268–292, 1996.
[Fri98] Ehud Friedgut. Boolean functions with low average sensitivity depend on few coordinates. Combinatorica, 18(1):27–35, 1998.
[GK04] Venkatesan Guruswami and Sanjeev Khanna. On the hardness of 4-coloring a 3-colorable graph. SIAM Journal of Discrete Mathematics, 18(1):30–40, 2004.
[Hås99] Johan Håstad. Clique is hard to approximate within n to the power 1-epsilon. Acta Mathematica, 182:105–142, 1999.
[HK05] Johan Håstad and Subhash Khot. Query efficient PCPs with perfect completeness. Theory of Computing, 1(1):119–148, 2005.
[HNZ02] Eran Halperin, Ram Nathaniel, and Uri Zwick. Coloring k-colorable graphs using relatively small palettes. J. Algorithms, 45(1):72–90, 2002.
[HW03] Johan Håstad and Avi Wigderson. Simple analysis of graph tests for linearity and PCP. Random Struct. Algorithms, 22(2):139–160, 2003.
[Kho01] Subhash Khot. Improved inapproximability results for maxclique, chromatic number and approximate graph coloring. In Proceedings of the Annual Symposium on Foundations of Computer Science, pages 600–609, 2001.
[Kho02] Subhash Khot. On the power of unique 2-prover 1-round games. In Proceedings of the ACM Symposium on the Theory of Computing, pages 767–775, 2002.
[KLS00] Sanjeev Khanna, Nathan Linial, and Shmuel Safra. On the hardness of approximating the chromatic number. Combinatorica, 20(3):393–415, 2000.
[KMS98] David R. Karger, Rajeev Motwani, and Madhu Sudan. Approximate graph coloring by semidefinite programming. Journal of the ACM, 45(2):246–265, 1998.
[KP06] Subhash Khot and Ashok Kumar Ponnuswami. Better inapproximability results for maxclique, chromatic number and min-3lin-deletion. In Proceedings of the International Colloquium on Automata, Languages and Programming, pages 226–237, 2006.
[KR08] Subhash Khot and Oded Regev. Vertex cover might be hard to approximate to within 2-epsilon. Journal of Computer and System Sciences, 74(3):335–349, 2008.
[Raz98] Ran Raz. A parallel repetition theorem. SIAM Journal of Computing, 27(3):763–803, 1998.
[Rus82] L. Russo. An approximate zero-one law. Z. Wahrsch. Verw. Gebiete, 61(1):129–139, 1982.
[ST00] Alex Samorodnitsky and Luca Trevisan. A PCP characterization of NP with optimal amortized query complexity. In Proceedings of the ACM Symposium on the Theory of Computing, pages 191–199, 2000.
[Wig83] Avi Wigderson. Improving the performance guarantee for approximate graph coloring. Journal of the ACM, 30(4):729–735, 1983.