
Why almost all satisfiable
k-CNF formulas are easy?
Danny Vilenchik
Joint work with A. Coja-Oghlan and M. Krivelevich
SAT – Basic Notions
3CNF form:
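$F = (x_1 \vee x_2 \vee \neg x_5) \wedge (x_3 \vee \neg x_4 \vee \neg x_1) \wedge (x_1 \vee x_2 \vee x_6) \wedge \dots$
Assignment ψ: x1 = F, x2 = F, x3 = T, x4 = F, x5 = F, x6 = T
$F(\psi) = (F \vee F \vee T) \wedge (T \vee T \vee T) \wedge (F \vee F \vee T) \wedge \dots$
x5 supports the first clause w.r.t. ψ: ¬x5 is the only literal of that clause that ψ makes true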
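The notion of support is used throughout the talk, so here is a minimal Python sketch of it: a variable supports a clause w.r.t. an assignment when its literal is the only true literal of that clause. The DIMACS-style encoding (+i for x_i, -i for ¬x_i) and the helper names `literal_value` and `supporter` are illustrative assumptions, not part of the original work.

```python
from typing import Dict, List, Optional, Tuple

Clause = Tuple[int, ...]  # DIMACS-style literals: +i stands for x_i, -i for ¬x_i


def literal_value(lit: int, assignment: Dict[int, bool]) -> bool:
    """Truth value of a literal under an assignment (True = T, False = F)."""
    value = assignment[abs(lit)]
    return value if lit > 0 else not value


def supporter(clause: Clause, assignment: Dict[int, bool]) -> Optional[int]:
    """The variable whose literal is the only true literal of the clause, else None."""
    true_lits = [lit for lit in clause if literal_value(lit, assignment)]
    return abs(true_lits[0]) if len(true_lits) == 1 else None


# First three clauses of F and the assignment psi from the slide
F: List[Clause] = [(1, 2, -5), (3, -4, -1), (1, 2, 6)]
psi = {1: False, 2: False, 3: True, 4: False, 5: False, 6: True}

for clause in F:
    row = ["T" if literal_value(lit, psi) else "F" for lit in clause]
    print(clause, row, "supporter:", supporter(clause, psi))
```

Running it reports x5 as the supporter of the first clause, as on the slide.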
Goal: an efficient algorithm that produces an optimal result and works for all inputs
SAT – Some Background

Finding a satisfying assignment is NP-hard [Cook’71]

No polynomial-time approximation for MAX-SAT with factor better than 7/8, unless P = NP [Hastad’01]

How to proceed?

Hardness results only show that there exist hard instances

The heuristic approach relaxes the universality requirement:
a heuristic is a polynomial-time algorithm that produces optimal results
on typical instances

Typical instance?

One possibility: random models
Random 3SAT


Random 3SAT:

Fix m,n

Pick m clauses uniformly at random (over the n variables)
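A minimal Python sketch of the Random 3SAT model just described. It assumes one common variant: each of the m clauses is drawn independently (with replacement) on 3 distinct variables, and each literal is negated with probability 1/2; `random_3cnf` is a hypothetical helper name.

```python
import random
from typing import List, Tuple

Clause = Tuple[int, ...]  # +i stands for x_i, -i for ¬x_i


def random_3cnf(n: int, m: int, rng: random.Random) -> List[Clause]:
    """Draw m clauses independently and uniformly at random over x_1..x_n:
    3 distinct variables per clause, each negated with probability 1/2."""
    formula: List[Clause] = []
    for _ in range(m):
        variables = rng.sample(range(1, n + 1), 3)
        clause = tuple(v if rng.random() < 0.5 else -v for v in variables)
        formula.append(clause)
    return formula


F = random_3cnf(n=20, m=80, rng=random.Random(0))  # clause density m/n = 4
print(len(F), F[:3])
```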
Threshold: there exists a constant d such that [Fri99]

$m/n \ge d$: most 3CNFs are not satisfiable (known: $d \le 4.506$)

$m/n < d$: most 3CNFs are satisfiable (known: $d \ge 3.52$)
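A small brute-force experiment can illustrate the two regimes. This is a sketch under stated assumptions: each clause has 3 distinct variables with random signs (an assumed variant of the model), and n = 10 is tiny, so finite-size effects blur the transition and the empirical crossover need not sit at the experimentally observed value of about 4.26. All function names are hypothetical.

```python
import itertools
import random
from typing import List, Tuple

Clause = Tuple[int, ...]  # +i stands for x_i, -i for ¬x_i


def random_3cnf(n: int, m: int, rng: random.Random) -> List[Clause]:
    """m independent clauses, each on 3 distinct variables with random signs."""
    return [tuple(v if rng.random() < 0.5 else -v for v in rng.sample(range(1, n + 1), 3))
            for _ in range(m)]


def is_satisfiable(formula: List[Clause], n: int) -> bool:
    """Exhaustive search over all 2^n assignments (feasible only for small n)."""
    for bits in itertools.product((False, True), repeat=n):  # bits[i - 1] is x_i
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in formula):
            return True
    return False


def fraction_satisfiable(n: int, ratio: float, trials: int, seed: int = 0) -> float:
    """Empirical probability that a random 3CNF with m = round(ratio * n) is satisfiable."""
    rng = random.Random(seed)
    m = round(ratio * n)
    hits = sum(is_satisfiable(random_3cnf(n, m, rng), n) for _ in range(trials))
    return hits / trials


for ratio in (2.0, 4.26, 7.0):
    print(f"m/n = {ratio}: {fraction_satisfiable(n=10, ratio=ratio, trials=20):.2f}")
```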

Near-threshold 3CNFs are apparently “hard” for many SAT heuristics

Possible reason: complicated structure of solution space (clustering)
Near Threshold Clustering Phenomenon
Conjectured solution space of Random k-SAT just below the threshold:
(part of this picture was rigorously proved for $k \ge 8$ [AR06, MMZ05])
All assignments within a cluster are “close”
A linear number of variables are “frozen” (take the same value in every assignment of the cluster)

Every two clusters are “far” from each other
Exponentially many clusters
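To make “cluster” and “frozen” concrete, here is a brute-force Python sketch for tiny formulas. It uses a simplified proxy that is an assumption, not the definition used in [AR06, MMZ05]: clusters are taken to be the connected components of the satisfying assignments, two assignments being adjacent iff they differ in exactly one variable, and a variable is frozen in a cluster if it takes the same value throughout it. Function names are hypothetical.

```python
import itertools
from typing import Dict, List, Set, Tuple

Clause = Tuple[int, ...]       # +i stands for x_i, -i for ¬x_i
Assignment = Tuple[bool, ...]  # position i - 1 holds the value of x_i


def satisfying_assignments(formula: List[Clause], n: int) -> List[Assignment]:
    """All satisfying assignments, by exhaustive search (tiny n only)."""
    return [bits for bits in itertools.product((False, True), repeat=n)
            if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
                   for clause in formula)]


def clusters(solutions: List[Assignment]) -> List[List[Assignment]]:
    """Connected components of the solutions, where two solutions are adjacent
    iff they differ in exactly one variable (a simplified notion of cluster)."""
    remaining: Set[Assignment] = set(solutions)
    components: List[List[Assignment]] = []
    while remaining:
        start = remaining.pop()
        component, stack = [start], [start]
        while stack:  # depth-first search over single-variable flips
            a = stack.pop()
            for i in range(len(a)):
                b = a[:i] + (not a[i],) + a[i + 1:]
                if b in remaining:
                    remaining.remove(b)
                    component.append(b)
                    stack.append(b)
        components.append(component)
    return components


def frozen_variables(cluster: List[Assignment]) -> Dict[int, bool]:
    """Variables (1-based) that take the same value in every assignment of the cluster."""
    first = cluster[0]
    return {i + 1: first[i] for i in range(len(first))
            if all(a[i] == first[i] for a in cluster)}


# x1 = x2 is forced, so the two solutions FF and TT are "far apart" (distance 2)
example: List[Clause] = [(1, -2), (-1, 2)]
for cluster in clusters(satisfying_assignments(example, 2)):
    print(sorted(cluster), frozen_variables(cluster))
```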

Our Result
Rigorously characterize the structure of the solution space of a uniformly random satisfiable 3CNF,
with m/n a (sufficiently large) constant above the threshold:

Single cluster of satisfying assignments
Size of the cluster is exponential in n
$(1-e^{-(m/n)})\,n$ variables are frozen
Our Results
Theorem: There exists a deterministic polynomial time algorithm
that finds a satisfying assignment for almost all satisfiable 3CNF formulas
with m/n>C, C a sufficiently large constant


Our results rigorously complement those for the very sparse case:

When clustering is simple – the problem is easy

When clustering is “complicated” – the problem is harder (?)
Improves on the exponential-time algorithm for uniformly random satisfiable
3CNFs in this regime (the only algorithm known so far, [Chen03])
Almost all k-CNF formulas are easy!
The Planted Distribution


Planted 3SAT distribution with parameters m,n:

Fix an assignment φ

Pick u.a.r. m clauses out of all clauses that are satisfied by φ
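A minimal Python sketch of sampling from the planted distribution. Assumptions (not from the slides): φ is drawn uniformly at random, clauses are drawn with replacement on 3 distinct variables, and rejection sampling is used, i.e. a random clause is kept unless φ falsifies all three of its literals, which leaves it uniform over the clauses satisfied by φ. `planted_3cnf` is a hypothetical name.

```python
import random
from typing import Dict, List, Tuple

Clause = Tuple[int, ...]  # +i stands for x_i, -i for ¬x_i


def planted_3cnf(n: int, m: int, rng: random.Random) -> Tuple[Dict[int, bool], List[Clause]]:
    """Fix a uniformly random assignment phi, then draw m clauses (with
    replacement) uniformly from the 3-clauses that phi satisfies."""
    phi = {v: rng.random() < 0.5 for v in range(1, n + 1)}
    formula: List[Clause] = []
    while len(formula) < m:
        variables = rng.sample(range(1, n + 1), 3)
        clause = tuple(v if rng.random() < 0.5 else -v for v in variables)
        # Rejection step: discard the 1 sign pattern out of 8 that phi falsifies,
        # so accepted clauses are uniform over the clauses satisfied by phi.
        if any(phi[abs(lit)] == (lit > 0) for lit in clause):
            formula.append(clause)
    return phi, formula


phi, F = planted_3cnf(n=50, m=500, rng=random.Random(1))
print(all(any(phi[abs(lit)] == (lit > 0) for lit in c) for c in F))  # True by construction
```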
Planted 3SAT was analyzed in several papers:

[Fla03] shows a spectral algorithm for solving sparse instances

Ben-Sasson et al. for $m/n = \Omega(\log n)$ (planted and uniform coincide)

Planted models also “fashionable” for graph coloring, max clique,
max independent set, min bisection …

Planted models are more approachable – clauses are practically
independent

Open question: how does the planted model compare with the uniform?
Our Result

We show that the planted and uniform distributions share many
structural properties (“close”)

In particular, same structure of the solution space

Justifying the somewhat unnatural usage of planted-solution models

Flaxman’s algorithm [Fla03] works for the uniform distribution as well
SAT and Message Passing

[FMV06] Warning Propagation was shown to solve planted 3SAT
instances with m/n>C, C some sufficiently large constant

Our work implies that WP works in the uniform setting as well

Reinforces the following thesis:

When clustering is complicated $\Rightarrow$ formulas are hard $\Rightarrow$ sophisticated algorithms needed: Survey Propagation

When clustering is simple $\Rightarrow$ formulas are easy $\Rightarrow$ naïve algorithms work: Warning Propagation
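A simplified Python sketch of Warning Propagation, to make the “naïve algorithm” concrete. It follows the usual WP rule: a clause warns a variable when every other variable of the clause is pushed, by the warnings it receives from its other clauses, towards falsifying its literal in that clause. The random initialization, the synchronous update schedule, the iteration cap, and leaving zero-field variables unassigned are assumptions of this sketch; the algorithm analyzed in [FMV06] may include further steps (e.g. for unassigned variables) that are not shown. Function names are hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple

Clause = Tuple[int, ...]  # +i stands for x_i, -i for ¬x_i


def sign(lit: int) -> int:
    return 1 if lit > 0 else -1


def warning_propagation(formula: List[Clause], n: int, rng: random.Random,
                        max_iters: int = 1000) -> Optional[Dict[int, Optional[bool]]]:
    """Run WP to a fixed point; return x -> True/False/None (None = zero field),
    or None if the messages did not converge within max_iters rounds."""
    # occurrences[v]: (clause index, literal) pairs in which x_v appears
    occurrences: Dict[int, List[Tuple[int, int]]] = {v: [] for v in range(1, n + 1)}
    for ci, clause in enumerate(formula):
        for lit in clause:
            occurrences[abs(lit)].append((ci, lit))

    # u[(ci, v)] = 1 iff clause ci warns x_v to satisfy it (random initial values)
    u = {(ci, abs(lit)): rng.randint(0, 1)
         for ci, clause in enumerate(formula) for lit in clause}

    def cavity_field(v: int, exclude: int) -> int:
        """Signed sum of the warnings x_v receives from clauses other than `exclude`."""
        return sum(u[(cj, v)] * sign(lit) for cj, lit in occurrences[v] if cj != exclude)

    for _ in range(max_iters):
        new_u = {}
        for ci, clause in enumerate(formula):
            for lit in clause:
                # Clause ci warns |lit| iff every other variable of the clause
                # is pushed towards falsifying its own literal in this clause.
                new_u[(ci, abs(lit))] = int(all(
                    cavity_field(abs(other), ci) * sign(other) < 0
                    for other in clause if abs(other) != abs(lit)))
        if new_u == u:
            break
        u = new_u
    else:
        return None

    # Local field: positive -> True, negative -> False, zero -> left unassigned
    assignment: Dict[int, Optional[bool]] = {}
    for v in range(1, n + 1):
        field = sum(u[(ci, v)] * sign(lit) for ci, lit in occurrences[v])
        assignment[v] = True if field > 0 else (False if field < 0 else None)
    return assignment


# Tiny example: the unit clause ¬x2 forces x2 = F, which then forces x1 = T
print(warning_propagation([(-2,), (1, 2)], 2, random.Random(0)))
```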
Clustering: Proof Technique

Recall: uniform distribution over satisfiable 3CNFs with m clauses

Why is it more difficult than the planted distribution?


For starters, consider the planted 3SAT distribution


Uniform case: clauses (“edges”) are not independent
m/n sufficiently large constant
Every variable is expected to support 3m/(7n) clauses w.r.t. the planted assignment φ

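$\Pr[x \text{ supports } C] = \Pr[x \text{ supports } C \mid x \text{ appears in } C] \cdot \Pr[x \text{ appears in } C] = \frac{1}{7} \cdot \frac{3}{n}$
(x is the only true literal in 1 of the 7 sign patterns satisfied by φ); summing over the m clauses gives 3m/(7n)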
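A quick Monte Carlo check of the expected-support computation, as a Python sketch. It assumes a rejection-sampling version of the planted model (uniformly random φ, 3 distinct variables per clause, clauses drawn with replacement); `planted_3cnf` and `average_support` are hypothetical helper names. The average support over all variables should come out close to 3m/(7n).

```python
import random
from typing import Dict, List, Tuple

Clause = Tuple[int, ...]  # +i stands for x_i, -i for ¬x_i


def planted_3cnf(n: int, m: int, rng: random.Random) -> Tuple[Dict[int, bool], List[Clause]]:
    """Uniformly random phi, then m clauses (with replacement) satisfied by phi."""
    phi = {v: rng.random() < 0.5 for v in range(1, n + 1)}
    formula: List[Clause] = []
    while len(formula) < m:
        clause = tuple(v if rng.random() < 0.5 else -v
                       for v in rng.sample(range(1, n + 1), 3))
        if any(phi[abs(lit)] == (lit > 0) for lit in clause):  # reject if phi falsifies it
            formula.append(clause)
    return phi, formula


def average_support(phi: Dict[int, bool], formula: List[Clause], n: int) -> float:
    """Mean, over the n variables, of the number of clauses each one supports
    w.r.t. phi (i.e. its literal is the only one that phi makes true)."""
    support = [0] * (n + 1)
    for clause in formula:
        true_vars = [abs(lit) for lit in clause if phi[abs(lit)] == (lit > 0)]
        if len(true_vars) == 1:
            support[true_vars[0]] += 1
    return sum(support) / n


n, m = 2000, 20000  # m/n = 10
phi, F = planted_3cnf(n, m, random.Random(2))
print(average_support(phi, F, n), 3 * m / (7 * n))  # empirical vs. predicted
```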
Fact 1: whp there is no subformula H on h variables s.t. h<n/100 and there are at
least hm/(10n) clauses containing two variables from H
Fact 2: whp there are no two satisfying assignments at distance greater than n/100
Clustering: Proof Technique
Claim: suppose that every variable has the expected support, and Facts 1
and 2 hold, then F is uniquely satisfiable
Proof: suppose not,

Let φ be the planted assignment and ψ some other satisfying assignment

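Take x s.t. $\psi(x) \ne \varphi(x)$; x supports 3m/(7n) clauses w.r.t. φ
Consider such a clause: under φ it reads (T ∨ F ∨ F); under ψ the literal of x becomes F,
so ψ must flip a second variable of the clause, e.g. (F ∨ T ∨ F)
Define $H = \{x : \psi(x) \ne \varphi(x)\}$, $h = |H| < n/100$ (Fact 2)

Each clause supported w.r.t. φ by a variable of H contains a second variable from H,
and a clause has at most one supporter, so there exist at least 3hm/(7n) > hm/(10n)
clauses containing two variables from H

This contradicts Fact 1.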
Clustering: Proof Technique


This picture is whp the case when $m/n > C \log n$
When m/n=O(1) - whp not the case (some variables have 0 support)
Definition: Given a 3CNF F and a satisfying assignment ψ, a set C of variables is
called a core of F if $\forall x \in C$, x supports at least m/(4n) clauses in F[C] w.r.t. ψ
Claim: For F in the planted distribution, m/n a sufficiently large constant,
there exists a core C s.t.
$|V(C)| > (1-e^{-(m/n)})\,n$
C is frozen in F
Corollary: one-cluster structure
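One natural way to compute a core is peeling: start from all variables and repeatedly delete a variable that supports fewer than the threshold number of clauses in F[C]. This Python sketch assumes F[C] means the clauses whose variables all lie in C; under that reading a union of cores is again a core, so peeling returns the largest core. It is an illustration, not necessarily the procedure used in the paper; `maximal_core` is a hypothetical name, and for the claim above the threshold would be m/(4n).

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Clause = Tuple[int, ...]  # +i stands for x_i, -i for ¬x_i


def maximal_core(formula: List[Clause], n: int, psi: Dict[int, bool],
                 threshold: float) -> Set[int]:
    """Peel away variables that support fewer than `threshold` clauses of F[C]
    w.r.t. psi; F[C] is taken to be the clauses whose variables all lie in C."""
    # supporter[ci]: the unique variable whose literal psi makes true, or 0 if none
    supporter: List[int] = []
    for clause in formula:
        true_vars = [abs(lit) for lit in clause if psi[abs(lit)] == (lit > 0)]
        supporter.append(true_vars[0] if len(true_vars) == 1 else 0)

    clauses_of: Dict[int, List[int]] = {v: [] for v in range(1, n + 1)}
    for ci, clause in enumerate(formula):
        for v in {abs(lit) for lit in clause}:
            clauses_of[v].append(ci)

    support = {v: 0 for v in range(1, n + 1)}
    for s in supporter:
        if s:
            support[s] += 1

    core = set(range(1, n + 1))
    alive = [True] * len(formula)  # clause still belongs to F[core]
    queue = deque(v for v in core if support[v] < threshold)
    while queue:
        v = queue.popleft()
        if v not in core:  # already peeled
            continue
        core.remove(v)
        for ci in clauses_of[v]:
            if alive[ci]:
                alive[ci] = False  # the clause leaves F[core]
                s = supporter[ci]
                if s and s in core:
                    support[s] -= 1
                    if support[s] < threshold:
                        queue.append(s)
    return core
```

Each variable and clause is removed at most once, so the peeling runs in time linear in the size of F plus the queue operations.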
Moving to the Uniform Case


A – a “bad” structural property (in our case: no big core)
μ – expected number of satisfying assignments of a planted 3CNF
Claim: $\Pr_{\text{uniform}}[A] < \mu \cdot \Pr_{\text{planted}}[A]$
Claim: $\Pr_{\text{uniform}}[\text{no big core}] < \mu \cdot \Pr_{\text{planted}}[\text{no big core}] < \mu \cdot e^{-nc}$
Claim: $\mu < e^{nc'}$, $c' < c$
Corollary: $\Pr_{\text{uniform}}[\text{no big core}] = o(1)$
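Chaining the three claims shows why the corollary follows; the only step spelled out here is the final arithmetic, which uses $c' < c$:

```latex
\Pr_{\text{uniform}}[\text{no big core}]
  < \mu \cdot \Pr_{\text{planted}}[\text{no big core}]
  < \mu \cdot e^{-nc}
  < e^{nc'} \cdot e^{-nc}
  = e^{-n(c - c')}
  \xrightarrow[n \to \infty]{} 0
```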
Further Research
[Figure: structure of the solution space as a function of m/n, with marked values 4.26, c and c log n]