Why almost all satisfiable
k-CNF formulas are easy?
Danny Vilenchik
Joint work with A. Coja-Oghlan and M. Krivelevich
SAT – Basic Notions
3CNF form:
F = (x1 ∨ x2 ∨ ¬x5) ∧ (x3 ∨ ¬x4 ∨ ¬x1) ∧ (x1 ∨ x2 ∨ x6) ∧ …
ψ:   x1  x2  x3  x4  x5  x6
     F   F   T   F   F   T
F = (F ∨ F ∨ T) ∧ (T ∨ T ∨ T) ∧ (F ∨ F ∨ T) ∧ …
x5 supports the first clause w.r.t. ψ: its literal ¬x5 is the only true literal in that clause
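The notions on this slide can be sketched in a few lines of Python. This is only an illustration: the encoding of a literal as a `(variable, is_positive)` pair and the helper names `literal_value`, `satisfies` and `supporter` are assumptions made for the example, not notation from the talk.

```python
def literal_value(literal, assignment):
    """Truth value of a literal (variable, is_positive) under the assignment."""
    var, is_positive = literal
    return assignment[var] if is_positive else not assignment[var]

def satisfies(formula, assignment):
    """True if every clause contains at least one true literal."""
    return all(any(literal_value(lit, assignment) for lit in clause)
               for clause in formula)

def supporter(clause, assignment):
    """Variable whose literal is the only true one in the clause, else None."""
    true_vars = [lit[0] for lit in clause if literal_value(lit, assignment)]
    return true_vars[0] if len(true_vars) == 1 else None

# The three clauses shown on the slide, and the assignment from the table.
formula = [
    [(1, True), (2, True), (5, False)],   # (x1 v x2 v ~x5)
    [(3, True), (4, False), (1, False)],  # (x3 v ~x4 v ~x1)
    [(1, True), (2, True), (6, True)],    # (x1 v x2 v x6)
]
psi = {1: False, 2: False, 3: True, 4: False, 5: False, 6: True}

print(satisfies(formula, psi))
print(supporter(formula[0], psi))
```

A clause with two or more true literals has no supporter, which is why `supporter` returns `None` for the second clause.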
Goal: an algorithm that is efficient, produces an optimal result, and works for all inputs
SAT – Some Background
Finding a satisfying assignment is NP-hard [Cook’71]
No polynomial-time approximation for MAX-SAT with factor better than 7/8, unless P = NP [Håstad’01]
How to proceed?
Hardness results only show that there exist hard instances
The heuristic approach relaxes the universality requirement
A heuristic is a polynomial-time algorithm that produces optimal results
on typical instances
Typical instance?
One possibility: random models
Random 3SAT
Random 3SAT:
Fix m,n
Pick m clauses uniformly at random (over the n variables)
Threshold: there exists a constant d such that [Fri99]
m/n ≥ d: most 3CNFs are not satisfiable (known bound: d ≤ 4.506)
m/n < d: most 3CNFs are satisfiable (known bound: d ≥ 3.52)
Near-threshold 3CNFs are apparently “hard” for many SAT heuristics
Possible reason: complicated structure of solution space (clustering)
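A minimal sketch of the random 3SAT model above. Two details are assumptions, since the slide does not fix them: clauses are drawn independently (with replacement), and each clause uses three distinct variables. The function name `random_3cnf` and the `(variable, is_positive)` literal encoding are hypothetical.

```python
import random

def random_3cnf(n, m, rng=None):
    """Draw m clauses independently and uniformly over variables 1..n.

    Assumption: sampling is with replacement, so a clause may repeat.
    """
    rng = rng or random.Random()
    formula = []
    for _ in range(m):
        variables = rng.sample(range(1, n + 1), 3)   # three distinct variables
        clause = [(v, rng.random() < 0.5) for v in variables]  # random signs
        formula.append(clause)
    return formula

# Example: density m/n = 4.0, inside the window between the two known bounds.
formula = random_3cnf(n=100, m=400, rng=random.Random(0))
print(len(formula), formula[0])
```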
Near Threshold Clustering Phenomenon
Conjectured solution space of Random k-SAT just below the threshold:
(part of this picture was rigorously proved for k ≥ 8 [AR06, MMZ05])
All assignments within a cluster are “close”
A linear number of variables are “frozen”
Every two clusters are “far” from each other
Exponentially many clusters
Our Result
We rigorously characterize the structure of the solution space of random
satisfiable 3SAT, for m/n some constant above the threshold:
Single cluster of satisfying assignments
Size of the cluster is exponential in n
(1 − e^{−Θ(m/n)})n variables are frozen
Our Results
Theorem: There exists a deterministic polynomial time algorithm
that finds a satisfying assignment for almost all satisfiable 3CNF formulas
with m/n>C, C a sufficiently large constant
Rigorously complement results for the very sparse case:
When clustering is simple – the problem is easy
When clustering is “complicated” – the problem is harder (?)
Improves on the exponential-time algorithm for uniform satisfiable
3CNFs in this regime (the only one known so far [Chen03])
Almost all satisfiable k-CNF formulas are easy!
The Planted Distribution
Planted 3SAT distribution with parameters m,n:
Fix an assignment φ
Pick u.a.r. m clauses out of all clauses that are satisfied by φ
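The planted distribution just defined can be sketched with rejection sampling. This is an illustrative sketch under assumptions: clauses are drawn with replacement, the planted assignment is itself chosen at random, and the name `planted_3cnf` and the `(variable, is_positive)` literal encoding are hypothetical.

```python
import random

def planted_3cnf(n, m, rng=None):
    """Return (planted, formula): m clauses, each satisfied by the planted assignment."""
    rng = rng or random.Random()
    planted = {v: rng.random() < 0.5 for v in range(1, n + 1)}
    formula = []
    while len(formula) < m:
        variables = rng.sample(range(1, n + 1), 3)
        clause = [(v, rng.random() < 0.5) for v in variables]
        # A literal (v, is_positive) is true iff planted[v] == is_positive.
        # Rejecting unsatisfied clauses leaves a uniform draw over the
        # 7 out of 8 sign patterns that the planted assignment satisfies.
        if any(planted[v] == is_positive for v, is_positive in clause):
            formula.append(clause)
    return planted, formula

planted, formula = planted_3cnf(n=100, m=2000, rng=random.Random(1))
print(len(formula))
```

Because each clause is accepted or rejected on its own, the clauses are independent given the planted assignment, which is what makes this model easier to analyze.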
Planted 3SAT was analyzed in several papers:
[Fla03] shows a spectral algorithm for solving sparse instances
Ben-Sasson et al. for m/n = Ω(log n) (planted and uniform coincide)
Planted models also “fashionable” for graph coloring, max clique,
max independent set, min bisection …
Planted models are more approachable – clauses are practically
independent
Open question: how does the planted model compare with the uniform?
Our Result
We show that the planted and uniform distributions share many
structural properties (“close”)
In particular, same structure of the solution space
Justifying the somewhat unnatural usage of planted-solution models
Flaxman’s algorithm [Fla03] works for the uniform distribution as well
SAT and Message Passing
[FMV06] Warning Propagation was shown to solve planted 3SAT
instances with m/n>C, C some sufficiently large constant
Our work implies – WP works in the uniform setting as well
Reinforces the following thesis:
When clustering is complicated ⇒ formulas are hard ⇒
sophisticated algorithms needed: Survey Propagation
When clustering is simple ⇒ formulas are easy ⇒ naïve
algorithms work: Warning Propagation
Clustering: Proof Technique
Recall: uniform distribution over satisfiable 3CNFs with m clauses
Why more difficult than the planted distribution? Clauses (edges) are not independent
For starters, consider the planted 3SAT distribution,
m/n a sufficiently large constant
Every variable is expected to support 3m/(7n) clauses w.r.t. the planted assignment φ
Pr[x supports C] = Pr[x supports C | x appears in C] · Pr[x appears in C] = (1/7) · (3/n)
Fact 1: whp there is no subformula H on h variables s.t. h<n/100 and there are at
least hm/(10n) clauses containing two variables from H
Fact 2: whp there are no two satisfying assignments at distance greater than n/100
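The expected support of 3m/(7n) quoted on this slide can be spelled out as follows. The derivation assumes that each clause is uniform over the clauses on three distinct variables that the planted assignment satisfies; that reading of the model is an assumption of this sketch.

```latex
\begin{align*}
\Pr[x \text{ appears in } C] &= \frac{\binom{n-1}{2}}{\binom{n}{3}} = \frac{3}{n}, \\
\Pr[x \text{ supports } C \mid x \text{ appears in } C] &= \frac{1}{7}
  \quad \text{(7 of the 8 sign patterns are satisfied; in exactly one, only $x$'s literal is true)}, \\
\Pr[x \text{ supports } C] &= \frac{1}{7} \cdot \frac{3}{n} = \frac{3}{7n}, \\
\mathbb{E}[\#\{C : x \text{ supports } C\}] &= m \cdot \frac{3}{7n} = \frac{3m}{7n}.
\end{align*}
```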
Clustering: Proof Technique
Claim: if every variable has the expected support, and Facts 1
and 2 hold, then F is uniquely satisfiable
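Proof: suppose not.
Let φ be the planted assignment and ψ some other satisfying assignment
Take x s.t. ψ(x) ≠ φ(x); x supports 3m/(7n) clauses w.r.t. φ
Consider such a clause: under φ it evaluates to (T ∨ F ∨ F), with x's literal the only true one
Under ψ, x's literal is F, so another literal must be T: that variable also differs between ψ and φ
Define H = { x : ψ(x) ≠ φ(x) }, h = |H| < n/100 (Fact 2)
A clause has at most one supporter, so there exist at least 3hm/(7n) clauses containing two variables from H
This contradicts Fact 1, since 3hm/(7n) > hm/(10n).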
Clustering: Proof Technique
This picture is whp the case when m/n > C log n
When m/n = O(1), whp this is not the case (some variables have 0 support)
Definition: Given a 3CNF F and a satisfying assignment ψ, a set C of variables is
called a core of F if ∀x ∈ C, x supports at least m/(4n) clauses in F[C]
Claim: For F in the planted distribution, m/n a sufficiently large constant,
there exists a core C s.t.
|V(C)| > (1 − e^{−Θ(m/n)})n
C is frozen in F
Corollary: one-cluster structure
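One way to look for a core in the sense of the definition above is a peeling procedure: repeatedly discard variables with too little support in the current subformula. This is a sketch under assumptions, not the procedure from the talk: F[C] is read as the set of clauses whose three variables all lie in C, and the name `find_core` and the `(variable, is_positive)` literal encoding are hypothetical.

```python
def find_core(formula, assignment, threshold):
    """Largest variable set C in which every x supports >= threshold clauses of F[C].

    Assumption: F[C] consists of the clauses whose variables all lie in C.
    """
    core = {v for clause in formula for v, _ in clause}
    while True:
        support = {v: 0 for v in core}
        for clause in formula:
            if not all(v in core for v, _ in clause):
                continue  # clause is not part of F[C]
            true_vars = [v for v, is_positive in clause
                         if assignment[v] == is_positive]
            if len(true_vars) == 1:       # exactly one true literal
                support[true_vars[0]] += 1
        weak = {v for v in core if support[v] < threshold}
        if not weak:
            return core
        core -= weak                      # peel and re-check the rest

# Tiny example: x1, x2, x3 each support one clause; x4 supports none.
all_true = {1: True, 2: True, 3: True, 4: True}
toy = [
    [(1, True), (2, False), (3, False)],
    [(2, True), (1, False), (3, False)],
    [(3, True), (1, False), (2, False)],
    [(4, True), (1, True), (2, False)],
]
print(find_core(toy, all_true, threshold=1))
```

For a formula with m clauses over n variables, the slide's definition corresponds to `threshold = m / (4 * n)`. Removing a variable can only lower the support of the others, so the peeling order does not affect the result.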
Moving to the Uniform Case
A – a “bad” structural property (in our case: no big core)
μ – expected number of satisfying assignments of a planted 3CNF
Claim: Pr_uniform[A] < μ · Pr_planted[A]
Claim: Pr_uniform[no big core] < μ · Pr_planted[no big core] < μ · e^{−cn}
Claim: μ < e^{c'n}, c' < c
Corollary: Pr_uniform[no big core] = o(1)
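Chaining the claims on this slide gives the corollary in one line. Here μ denotes the expected number of satisfying assignments of a planted 3CNF, and c' < c are the constants from the claims.

```latex
\Pr_{\text{uniform}}[\text{no big core}]
  < \mu \cdot \Pr_{\text{planted}}[\text{no big core}]
  < \mu \cdot e^{-cn}
  < e^{c'n} \cdot e^{-cn}
  = e^{-(c - c')n}
  = o(1) \qquad \text{since } c' < c .
```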
Further Research
[Figure: structure of the solution space along the m/n axis, with marks at 4.26, c, and c log n]