Critical behaviour of combinatorial search
algorithms, and the unitary-propagation
universality class
Christophe Deroulers
LPTENS Paris
and
Rémi Monasson
LPTENS Paris
Universität zu Köln — May 13th, 2005
Outline
P & NP problems in computer science
Random 2 + p-SAT
Combinatorial search algorithms
Analysis of the UC algorithm
The critical behaviour: first approach
The critical behaviour: computation
Conclusion
NP problems in computer science
P and NP problems
Let N be the size of a computer science problem. Roughly:
P = { “easy” problems } = { problems solvable in a time bounded by a polynomial in N }
NP = { “hard” problems } = { the others, but where a solution can be tested in polynomial time }
Examples:
◮ Sorting N objects is easy (can be done in time ≃ N ln N )
◮ Inverting a matrix of size N is easy (Gauss algorithm)
P and NP problems– cont’d
Believed to be hard:
◮ Finding the exact ground state of a 3-dimensional sample of an Ising spin glass (given the energies of the nearest-neighbour interactions)
◮ Traveling Salesman Problem (TSP): is the shortest path that goes at least
once through each of N given cities shorter than 1000km?
◮ The graph coloring problem (COL): given a geographic map of N
countries, is it possible to color it with k colors so that no two countries
that share a frontier have the same color?
◮ Finding a non-trivial factorization of an integer with N digits
The SAT problem
Given boolean variables a, b, c, . . .
and a boolean formula, e.g. (a AND (NOTb)) OR c = TRUE,
problem: is there an assignment of the variables that SATisfies the formula, or not?
Answer to problem is yes or no.
Here, answer to problem is yes: solutions to formula are {a = TRUE and
b = FALSE, whatever c} and {c = TRUE, whatever a and b}.
Size of problem N = size of the formula.
Historical motivation: automatic proof of theorems
Industrial interest: a test step during the conception of computer chips
Reduction
Reduction of problems in polynomial time:
e.g. writing a SAT formula to decide whether a graph can be colored with k colors;
or drawing a graph, a k-coloring of which would solve a SAT formula
→ {NP-complete problems} = { NP problems to which any NP problem can
be reduced } (= hardest NP problems)
⇒ Solving one of them in polynomial time would solve all others.
Examples: SAT, COL, TSP.
Not integer factorization.
[Diagram: the class P and the NP-complete problems inside NP.]
NP problems: comments
Comments:
◮ A non-polynomial time algorithm (method) to solve NP problems is easy:
enumeration (needs polyn. time for each solution × exp. number of
solutions).
◮ This is a worst-case analysis: for some spin-glass samples, some
dispositions of the cities (TSP) or some geographic maps (COL), the
answer is easy to find. But for some others, it is difficult. Tells nothing
about typical case.
◮ This classification dates from the 1970s. No proof that P ≠ NP since then.
Random 2 + p-SAT
Random K-SAT
Observation: in practice, many boolean formulae are easy to solve.
We want:
◮ a systematic way to generate difficult formulae
◮ study the average or typical case
→ use a random distribution of formulae, with some parameter — the random
K-SAT problem
Random K-SAT
Let a, b, c,... be N boolean variables and K a fixed integer.
Draw uniformly at random M K-uples of variables:
(a OR b OR c) AND (a OR e OR i) AND (f OR g OR h) AND
(d OR g OR j) AND (b OR c OR i) AND (e OR i OR j) AND
(a OR f OR h) AND (a OR g OR j) AND (a OR e OR f)
negate each variable with uniform probability 1/2 (b̄ means “NOT b”; the overlines on negated literals were lost in extraction)
and add ANDs between the K-clauses (K-uples) to build up a formula.
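The drawing procedure above can be sketched in a few lines. This is an illustration, not code from the talk; the helper name and literal encoding are choices of this sketch.

```python
import random

def random_ksat(n_vars, n_clauses, k=3, rng=random):
    """Random K-SAT: each clause is a K-uple of distinct variables drawn
    uniformly, each literal negated independently with probability 1/2.
    A literal is (variable index, sign); sign False means a negated variable."""
    formula = []
    for _ in range(n_clauses):
        variables = rng.sample(range(n_vars), k)   # K distinct variables
        formula.append([(v, rng.random() < 0.5) for v in variables])
    return formula

formula = random_ksat(10, 9, k=3)   # N = 10 variables, M = 9 clauses, as above
```

With N = 10 and M = 9 this reproduces the size of the example formula on the slide.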
Static phase transition
Fix α and let M, N → +∞ with M = αN (thermodynamic limit with fixed
clauses-per-variable ratio).
For 2-SAT, rigorous results:
If α = M/N < 1, almost surely the formula is satisfiable.
If α = M/N > 1, almost surely the formula is not satisfiable.
Width of the transition region ≃ N^{−1/3} when N → +∞.
Borgs, Chayes et al. 2001
For 3-SAT, numerical results:
If α = M/N < 4.3, almost surely the formula is satisfiable.
If α = M/N > 4.3, almost surely the formula is not satisfiable.
late 1980s / early 1990s, Selman
Static phase transition
3-SAT, average computation time:
[Figure: median computational complexity versus the number of clauses per variable α (0 to 10), for 50, 75 and 100 variables; the cost peaks sharply around αC.]
Critical slowing down!
K-SAT = spin glass (∞-dimension)
Boolean variables x1, x2, … ↔ Ising spins S1, S2, …
TRUE, FALSE ↔ 1, −1
Formula (x1 OR (NOT x2)) AND x3 ↔ Hamiltonian: H = (1 − S1)(1 + S2) + (1 − S3)
Solution ↔ Ground state (energy 0)
Replica computations give (non-rigorously) location of the phase transition
Monasson, Zecchina, Biroli, Weigt, Mézard, Parisi...
Rigorous results from the mathematics and computer science communities for
3-SAT:
◮ Width of the transition region → 0 when N → +∞
◮ Threshold αstatic < 4.51
Dubois et al.
◮ Threshold αstatic > 3.52
Kirousis et al.
random 2 + p-SAT
Let 0 ≤ p ≤ 1. With N variables, draw uniformly at random M clauses that
are
◮ of length 3 with probability p
◮ of length 2 with probability 1 − p
p = 0 ↔ 2-SAT
p = 1 ↔ 3-SAT
Complete algorithms for resolution
of SAT
Noncomplete algorithms
Given a SAT formula: start from a random assignment of its variables, and
update it by “spin flips” until a solution is found.
May be efficient; used in practice
Interesting out-of-equilibrium phenomena (but nonphysical dynamics)
Can’t give a proof that no solution exists (the answer to the problem is yes, or the algorithm runs forever)
Won’t be discussed here.
Complete algorithms
Algorithms that, for a formula:
◮ either give a solution (answer to problem is yes)
◮ or guarantee that there is no solution (answer to problem is no)
Principle: explore the tree of all possibilities.
Davis, Putnam et al. 1960s
Example
0. Let the formula be nine 3-clauses over the variables a, …, j:
a OR b OR c — a OR e OR i — a OR g OR h — d OR g OR j — g OR h OR i — e OR i OR j — a OR f OR h — a OR b OR j — a OR e OR f
(some of these literals are negated; the overlines were lost in extraction).
1.–3. Choose a variable, assign it (a := true) and reduce the clauses where it appears: clauses satisfied by the assignment become TRUE and disappear; clauses containing the opposite literal lose one variable.
4.–6. Choose another variable, assign it (e := false) and reduce again; one clause is now reduced to a single literal on i.
7.–9. Natural choice: the unitary clause (1-clause). Assign its variable (i := true) and reduce.
10.–12. Choose a variable, assign it (g := false) and reduce.
13. Contradiction! The two remaining 1-clauses are h and NOT h: no assignment of h can satisfy both.
Complete algorithms
Principle: explore the tree of all possibilities, and backtrack when a contradiction is found.
Davis, Putnam et al. 1960s
[Figure: search tree branching on a := true / a := false, then b, then c; branches ending in a contradiction are marked C, the successful leaf is marked S.]
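The principle above can be sketched as a minimal recursive DPLL-style search. This is an illustration under this sketch's conventions (clauses as lists of (variable, sign) pairs), not the authors' implementation.

```python
def dpll(formula, assignment=None):
    """Complete DPLL-style search.  `formula` is a list of clauses, each a
    list of (variable, sign) literals; a literal is true iff
    assignment[variable] == sign.  Returns a satisfying assignment or None."""
    if assignment is None:
        assignment = {}
    reduced = []
    for clause in formula:
        if any(assignment.get(v) == s for v, s in clause):
            continue                      # clause already TRUE: drop it
        rest = [(v, s) for v, s in clause if v not in assignment]
        if not rest:
            return None                   # empty clause: contradiction, backtrack
        reduced.append(rest)
    if not reduced:
        return assignment                 # no clause left: success
    # Unit Propagation: a 1-clause forces its variable, so branch there first.
    unit = next((c for c in reduced if len(c) == 1), None)
    var, sign = unit[0] if unit else reduced[0][0]
    for value in ([sign] if unit else [True, False]):
        result = dpll(reduced, {**assignment, var: value})
        if result is not None:
            return result
    return None
```

Exhausting both branches everywhere is what makes the algorithm complete: a None return is a proof that no solution exists.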
Heuristics “UC”
How to choose variables to assign?
Heuristics “Unit Clause”:
◮ If there are 1-clauses, take a variable there (you will have to satisfy
1-clauses now or later anyway) and satisfy its 1-clause. Unit Propagation
◮ Else, take a variable at random and assign it at random.
[Illustration: the partially reduced formula from the example above, which contains the 1-clause i.]
Heuristics “GUC”
Heuristics “Generalized Unit Clause”:
◮ If there are 1-clauses, take a variable there (you will have to satisfy
1-clauses now or later anyway) and satisfy its 1-clause. Unit Propagation
◮ Else, choose one of the shortest clauses and assign a variable of this clause
so as to satisfy the clause.
[Illustration: the same partially reduced formula; GUC picks a variable in one of the shortest clauses and satisfies that clause.]
Success without backtracking
Central object of our computations: probability P that a solution is found
without backtracking (“success probability”)
Intuitively, for N large:
finite P ↔ no backtracking needed ↔ polynomial time T
P ≈ e^{−N} ↔ complete algorithm (with backtracking) ↔ exponential time T
Success without backtracking - 2-SAT
[Figure: success probability versus α for 2-SAT at N = +∞, decreasing from 1 at α = 0 and vanishing at α = 1.]
Success without backtracking - 3-SAT
[Figure: success probability versus α for 3-SAT at N = +∞, for the heuristics UC, GUC and HL; each curve drops to 0 at its own algorithm-dependent threshold, below α = 4.]
A dynamic phase transition
→ 2 kinds of thresholds:
αstatic (p) depends only on the disorder distribution: are there solutions to the formula?
αalgo (p) depends also on the algorithm (= choice of dynamics): are we smart enough to quickly find solutions to the formula?
Clearly, αalgo (p) ≤ αstatic (p): if there is no solution, no success!
Open problem: find a smart algorithm with αalgo = αstatic .
Phase space for algo. UC on 2 + p-SAT
Analysis of algorithm UC
Analysis of UC on 2 + p-SAT
Input: a random 2 + p-SAT formula with N variables and αN clauses.
↓
Algorithm UC
While there are clauses:
◮ If there are 1-clauses, take a variable there (you will have to satisfy
1-clauses now or later anyway) and satisfy its 1-clause. Unit Propagation
◮ Else, take a variable at random and assign it at random.
Don’t backtrack, stop if contradiction found.
↓
Output: yes (success) or don’t know (failure)
Probability of success at large N =?
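The input/algorithm/output pipeline above can be sketched directly, and the success probability estimated by repetition. A Monte Carlo illustration with hypothetical helper names, not the talk's code:

```python
import random

def uc_run(n_vars, alpha, p, rng):
    """One run of UC without backtracking on a random 2+p-SAT formula.
    Returns True on success, False if an empty (contradictory) clause appears."""
    m = int(alpha * n_vars)
    formula = []
    for _ in range(m):
        k = 3 if rng.random() < p else 2           # 3-clause w.p. p, else 2-clause
        variables = rng.sample(range(n_vars), k)
        formula.append([(v, rng.random() < 0.5) for v in variables])
    unassigned = set(range(n_vars))
    while formula:
        unit = next((c for c in formula if len(c) == 1), None)
        if unit:
            var, value = unit[0]                   # Unit Propagation
        else:
            var = rng.choice(tuple(unassigned))    # free step: random variable,
            value = rng.random() < 0.5             # random value
        unassigned.discard(var)
        reduced = []
        for clause in formula:
            if (var, value) in clause:
                continue                           # clause satisfied: drop it
            rest = [lit for lit in clause if lit[0] != var]
            if not rest:
                return False                       # contradiction: stop
            reduced.append(rest)
        formula = reduced
    return True

rng = random.Random(0)
runs = 200
p_success = sum(uc_run(100, 0.5, 0.0, rng) for _ in range(runs)) / runs
```

At p = 0 (pure 2-SAT) and α = 0.5, well below the threshold, most runs should succeed even at modest N.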
Analysis of UC on 2 + p-SAT
Time T := number of assigned variables.
0 ≤ T ≤ N.
2 important remarks:
After averaging over “disorder”, i.e. the input data (the SAT formula), the distribution of partially reduced formulae at time T is completely fixed by the knowledge of the numbers of 1-, 2- and 3-clauses
C⃗(T) := (C3(T), C2(T), C1(T)).
Franco
In other words: the annealed approximation is exact.
C3 and C2 are self-averaging. We only need to compute ⟨C2(T)⟩ and ⟨C3(T)⟩.
Analysis of UC on 2 + p-SAT
Flow of clauses:
[Diagram: three reservoirs, C3 (3-clauses) → C2 (2-clauses) → C1 (1-clauses); clauses are eliminated at rates e3, e2, e1 and flow to the next shorter reservoir at rates w2, w1.]
Analysis of UC: continuous time limit
For the averages:
⟨C3(T+1)⟩ − ⟨C3(T)⟩ = − 3⟨C3⟩/(N−T)
⟨C2(T+1)⟩ − ⟨C2(T)⟩ = − 2⟨C2⟩/(N−T) + (1/2) · 3⟨C3⟩/(N−T)
⟨C1(T+1)⟩ − ⟨C1(T)⟩ = ⟨C2⟩/(N−T) − E[C1 > 0]
Cj(T) varies over times T of order 1;
cj(t := T/N) := Cj(T)/N varies over times T of order N
⇒ Cj(T) = N cj(T/N) + o(N)
Chao & Franco
Analysis of UC: ODEs
Hence:
dc3/dt = − 3c3/(1−t)
dc2/dt = − 2c2/(1−t) + (1/2) · 3c3/(1−t)
dc1/dt = c2/(1−t) − ρ1
where ρ1(t) = E[C1(T) > 0] is the probability that there is at least one 1-clause at time t.
For 3-SAT: c3(t) = α(0)(1−t)³ and c2(t) = (3/2) α(0) t(1−t)².
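The ODEs can be integrated numerically and checked against the closed forms quoted on the slide for 3-SAT. A minimal Euler sketch; the function name, step size and starting α are arbitrary choices of this illustration:

```python
def integrate_uc_3sat(alpha0, t_max=0.9, dt=1e-5):
    """Euler integration of dc3/dt = -3 c3/(1-t) and
    dc2/dt = -2 c2/(1-t) + (1/2) * 3 c3/(1-t),
    starting from c3(0) = alpha0, c2(0) = 0 (a random 3-SAT formula)."""
    c3, c2, t = alpha0, 0.0, 0.0
    while t < t_max:
        dc3 = -3 * c3 / (1 - t)
        dc2 = -2 * c2 / (1 - t) + 0.5 * 3 * c3 / (1 - t)
        c3, c2, t = c3 + dc3 * dt, c2 + dc2 * dt, t + dt
    return t, c3, c2

t, c3, c2 = integrate_uc_3sat(2.0)
# to be compared with the closed forms c3 = α(0)(1-t)^3, c2 = (3/2) α(0) t (1-t)^2
```

The integration stops before t = 1 since the ODEs are singular there.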
Analysis of UC: study of C1
Probability that we notice no contradiction between T and T + 1:
(1 − 1/(2(N−T)))^{max(C1 − 1, 0)}
hence probability that we notice no contradiction from 0 to T = Nt (success):
exp( − ∫_0^t ⟨max(C1 − 1, 0)⟩/(2(1 − t′)) dt′ )
→ we need information about C1.
Transition matrix for C1
H_N[C1′ ← C1; T, C2] = Σ_{s2,r2=0}^{C2} (C2 choose s2, r2) (1/(N−T))^{s2+r2} (1 − 2/(N−T))^{C2−s2−r2} ×
[ δ_{C1=0} δ_{C1′=r2} + (1 − δ_{C1=0}) Σ_{s1=0}^{C1−1} (C1−1 choose s1) (1/(2(N−T)))^{s1} (1 − 1/(N−T))^{C1−1−s1} δ_{C1′=C1−1−s1+r2} ]
◮ H depends (explicitly) only on C1 , C2 and T .
◮ Exact expression: contains all the information except C2.
◮ Correct only for this distribution of disorder (of formulae)
◮ but correct for all algorithms that use the Unit Propagation rule (variables
in 1-clauses taken first).
Random walk with bias
C1 ≥ 0 performs a biased random walk:
Step left: −1, because of the elimination of one 1-clause.
Step right: +d, where d is a binomial variable with probability 1/(N − T) among the C2(T) 2-clauses, because of the reduction of 2-clauses into 1-clauses.
⇒ while c2/(1−t) < 1, C1 stays bounded when N → +∞.
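The walk can be simulated directly, replacing C2(T) by its self-averaging value for 2-SAT (c2 = α(1−t)², taken from the ODE analysis). A sketch with hypothetical names; it illustrates that the excursions of C1 stay small as long as the drift is below 1:

```python
import random

def peak_c1(n, alpha, rng):
    """Biased walk of C1 for UC on random 2-SAT, with C2(T) replaced by its
    self-averaging value alpha*N*(1 - T/N)^2 (drift alpha*(1 - T/N) < 1)."""
    c1 = peak = 0
    for big_t in range(n - 1):
        if c1 > 0:
            c1 -= 1                      # step left: satisfy one 1-clause
        c2 = alpha * n * (1 - big_t / n) ** 2
        p = 1 / (n - big_t)              # proba a given 2-clause becomes a 1-clause
        c1 += sum(rng.random() < p for _ in range(int(c2)))
        peak = max(peak, c1)
    return peak

rng = random.Random(1)
peak = peak_c1(2_000, 0.5, rng)          # alpha < 1: C1 should stay bounded
```

For α close to 1 the drift approaches 1 at early times and the excursions grow, which is the mechanism behind the divergence studied below the threshold.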
Resolution trajectories
Finite success probability
→ if the initial α < αc (= 8/3 for 3-SAT, 1 for 2-SAT), C1(T) is a.s. bounded and we know its distribution as a function of c2(t). (If the initial α ≥ αc, C1 is no longer bounded.)
Conclusion: iff the initial α < αc, the probability P of success is > 0:
ln(P) = 3α/16 − (1/2) (8/(3α) − 1)^{−1/2} arctan[(8/(3α) − 1)^{−1/2}]   for 3-SAT
ln(P) = [α + ln(1 − α)]/4   for 2-SAT
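The success probability can be checked by quadrature. This sketch assumes ⟨max(C1 − 1, 0)⟩ = λ²/(2(1−λ)) for the stationary walk with drift λ = c2/(1−t) (an assumption of this illustration, not stated explicitly on the slides), and the closed forms in the code are this editor's reconstruction of the garbled formulae:

```python
import math

def ln_p(lam, n=200_000):
    """Midpoint rule for ln P = -(1/4) * ∫_0^1 λ(t)^2 / ((1-λ(t))(1-t)) dt."""
    s = 0.0
    for i in range(n):
        t = (i + 0.5) / n
        l = lam(t)
        s += l * l / ((1 - l) * (1 - t))
    return -s / (4 * n)

alpha2, alpha3 = 0.7, 2.0
num2 = ln_p(lambda t: alpha2 * (1 - t))             # 2-SAT: λ = α(1-t)
num3 = ln_p(lambda t: 1.5 * alpha3 * t * (1 - t))   # 3-SAT: λ = (3α/2) t(1-t)
closed2 = (alpha2 + math.log(1 - alpha2)) / 4
q = 1 / math.sqrt(8 / (3 * alpha3) - 1)
closed3 = 3 * alpha3 / 16 - 0.5 * q * math.atan(q)
```

Both quadratures agree with the closed forms, and the 3-SAT expression indeed blows up like 1/√(αc − α) as α → 8/3.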
Analysis of UC: P(α) for 2-SAT
[Figure: success probability versus α for 2-SAT, decreasing from 1 at α = 0 to 0 at α = 1.]
− ln(P(α)) = Θ(ln |α − αc|) when α → αc⁻ = 1.
Analysis of UC: P(α) for 3-SAT
[Figure: P versus the initial α for 3-SAT: numerics for 2500, 12500 and 50000 variables against the theoretical curve.]
− ln(P(α)) = Θ(1/√|α − αc|) when α → αc⁻ = 8/3.
The critical behaviour: a first
approach
Critical behaviour
N is large.
For each algorithm without backtracking that uses the unit-propagation
principle, ∃ αc such that :
◮ If α < αc , proba of success P = Θ(1).
◮ If α > αc , − ln(P ) ∝ N (thus P → 0).
αc depends on the algorithm.
But do we have critical exponents that do not depend on the algorithm?
Erdös-Rényi random graphs
Fix c ≥ 0. Given N points, for each pair of points, we draw an edge between them with probability c/N.
Percolation of Erdös-Rényi random graphs
Size of the largest connected component:
t(c, N) = ln(N)/(c − 1 − ln(c))   if c < 1
t(c, N) = O(N^{2/3})   if c = 1
t(c, N) = f(c) N   if c > 1
where 1 − f(c) = e^{−c f(c)}.
Intuition ⇒ ∃ a function σ such that t(c, N) ∼ N^{2/3} σ(N^{1/3}(c − 1)) with:
σ(x) ∼ 6 ln|x|/x²   when x → −∞
σ(x) ∼ 2x   when x → +∞
0 < σ(x) < +∞   ∀x ∈ ℝ
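The sub- and super-critical regimes can be observed with a small union-find simulation. A sketch: drawing cN/2 uniform random pairs approximates the G(N, c/N) edge draw up to negligible self-loops and duplicates.

```python
import random

def largest_component(n, c, rng):
    """Largest connected component of an Erdös-Rényi-like graph obtained by
    drawing c*n/2 uniform random pairs (union-find with path halving)."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for _ in range(int(c * n / 2)):
        ra, rb = find(rng.randrange(n)), find(rng.randrange(n))
        if ra != rb:
            if size[ra] < size[rb]:
                ra, rb = rb, ra             # union by size
            parent[rb] = ra
            size[ra] += size[rb]
    return max(size[find(v)] for v in range(n))

rng = random.Random(2)
n = 20_000
sub = largest_component(n, 0.5, rng)   # c < 1: O(ln N)
sup = largest_component(n, 2.0, rng)   # c > 1: f(c)*N with 1 - f = exp(-2 f)
```

At c = 2 the fixed-point equation gives f ≈ 0.797, so the giant component should contain roughly 80% of the vertices.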
Algorithms from the UC family
Here, we are looking for a scaling function:
ln[P(α, N)] = N^β π(N^γ (α − αc))
with probably γ = 1/3 by analogy with phase transitions in random graphs: the α(1 − p) = 1 condition indicates that the graph {variables = vertices, 2-clauses = edges} percolates.
β = ?
Algorithms from the UC family
ln[P(α, N)] =? N^β π(N^γ (α − αc))
We know (p > 2/5-SAT case):
− ln(P) ∼ cte/√(αc − α)   when α → αc⁻, N → +∞
− ln(P) ∼ N f(α − αc)   when α > αc, N → +∞
Hence probably γ = 1/3, β = 1/6 and
π(x) ∼ cte/√(−x)   when x → −∞
π(x) ∼ x^{5/2}   when x → +∞
Numerical results (3-SAT)
[Figure: log10(−log10(P)) versus log10(N), for 2.5 ≤ log10(N) ≤ 5; the mean is well fitted by x/6 − 0.327, with a −1.3/x² correction at small N.]
β ≈ 1/6 (0.13 < β < 0.19).
Interpretation
What happens when c2/(1−t) ≈ 1?
The graph where nodes = variables and edges = 2-clauses percolates.
Assigning a variable that is in a giant component (size N^{2/3}) will produce (successively) N^{2/3} 1-clauses.
Interpretation
The critical window for the random graph where edges are 2-clauses and vertices are variables has width N^{−1/3}.
Tangential approach → we stay in the critical window for ≈ N^{−1/6} in reduced time t, i.e. for ≈ N^{5/6} steps.
We have to eliminate connected components of size N^{2/3} of the graph that percolates → C1 is of order √(N^{2/3}) = N^{1/3}.
− ln(P) = − Σ_{0 ≤ T ≤ N(t* − N^{−1/6})} ⟨max(C1 − 1, 0)⟩ ln(1 − 1/(2(N−T)))   [= O(1)]
− Σ_{N(t* − N^{−1/6}) ≤ T ≤ N t*} ⟨max(C1 − 1, 0)⟩ ln(1 − 1/(2(N−T)))   [= Θ(N^{1/3}) × Θ(N^{5/6}) terms × Θ(1/N) = Θ(N^{1/6})]
hence − ln[P(α = αc)] ≈ N^{1/6}.
Interpretation
[Figure: histogram of the maximum of C1 during a run, for N = 100, 1000 and 10000; inset: the rescaled curves, frequency × N^{1/3} versus C1/N^{1/3}, collapse onto a single curve.]
Critical behaviour: exact
computation
Starting point
We know that, at the success/failure transition, C1 diverges: we have to study its distribution.
Generating function of C1:
p_N(T, x) := Σ_{C1=0}^{+∞} x^{C1} P_N(C1, T)
The matrix H(C1′ ← C1) becomes:
p_N(T+1, x) = (1 + (x−1)/(N−T))^{C2(T)} × [ (1/A) p_N(T, A) + (1 − 1/A) p_N(T, 0) ]
with A := 1/(2(N−T)) + (1 − 1/(N−T)) x.
Exact, all information but C2; relevant for all algorithms that apply the Unit Propagation rule.
Zoom in for 3-SAT
Zoom in
Zoom in around the contact point c2/(1−t) = 1. In the case of the UC algorithm:
◮ α = (8/3)(1 + ǫ0 N^{−θ})
◮ t = (1/2)(1 + t0 N^{−τ})
◮ C2(T) = 4T(1 − T/N)² + O(√N)
θ, τ to be chosen to study the first non-vanishing order.
Computations...
We substitute in
p_N(T+1, x) = (1 + (x−1)/(N−T))^{C2(T)} × [ (1/A) p_N(T, A) + (1 − 1/A) p_N(T, 0) ]
with A := 1/(2(N−T)) + (1 − 1/(N−T)) x.
θ and τ are free. If we ask that all first-order non-vanishing terms are of the same order:
θ = γ = γ0 = 1/3, λ = τ = 1/6.
PDE for f
And (for 3-SAT) the probability density function f(c, t0) of c = C1/N^{1/3} is given by:
(1/2) ∂²f/∂c² + (t0² − ǫ0) ∂f/∂c + (c̄ − c) f = 0
with boundary condition:
∂c f(0, t0) + (t0² − ǫ0) f(0, t0) = 0
We solve for f:
f(c, t0) ∝ e^{−(t0² − ǫ0)c} Ai[2^{1/3} c + z(t0² − ǫ0)]
where z(x) is the reciprocal function of x(z) = 2^{1/3} Ai′(z)/Ai(z).
Final result
ln Psucc = −N^{1/6} φ(t0 = +∞) + O(1) with φ(t0 = −∞) = 0 and ∂_{t0} φ(t0) = c̄(t0):
− ln Psucc((1 + ǫ0)αc, N) = N^{1/6} Φ(ǫ0 N^{1/3}) + O(1)
with
Φ(ǫ0) = (1/4) ∫_{−ǫ0}^{+∞} dx/√(ǫ0 + x) [x² − 2^{2/3} z(x)]
E.g., for 3-SAT at the critical initial α, Φ(0) ≈ 1.1277.
Generality of this result
This computation depends on the algorithm only through C2. Actually, if we zoom in around the contact point tA of another algorithm, we get:
c2/(1−t) = 1 + b(t − tA)²   when t → tA
The computation is the same, but with different values of b and tA. We find:
Φ_A(ǫ0) = r_A^Φ Φ(r_A^ǫ ǫ0)
where the r_A’s are functions of b and tA.
Non-generic case (possible for K > 3-SAT):
c2/(1−t) = 1 + b(t − tA)^{4, 6, ...}
Generality of this result
The computation depends on the distribution of formulae: the scaling function
Φ may change. But the exponents remain the same because of the robustness
of the distribution of sizes of the connected components of a random graph.
The 2-SAT case
Here the resolution trajectory cuts the α(1 − p) = 1 line, instead of becoming
tangent to it:
The 2-SAT case
Therefore, t = 0 + t0 N^{−τ} with now τ = 1/3 instead of 1/6, and the time-derivative term in the PDE for f is relevant:
(1/2) ∂²f/∂c² + β(p)(t0 − ǫ0) ∂f/∂c + (1/2)(c̄ − c) f = ∂f/∂t0
which yields a logarithmic behaviour:
ln Psucc = − ln(N)/(12 β(p)) + const + o(1)
and β vanishes as p → 2/5.
Since we don’t know how to solve the PDE analytically, we have to solve it numerically to get the constant. E.g. Psucc ∼ N^{−1/12} exp(1/4 − 0.24370(1)) for 2-SAT,
Psucc ∼ N^{−1/6} exp(5/16 − (ln 2)/4 − 0.20157(1)) for 2+1/4-SAT.
The 2-SAT case
[Figure: −log10(P)/log10(N) versus 1/log10(N): simulations against the theoretical prediction.]
The 2+2/5-SAT case
Finally, for p = (2/5)(1 + ǫp N^{−1/6}) and at α close to αc(2/5) = 5/3:
α = (5/3)(1 + ǫp N^{−1/6} + ǫα N^{−1/3})
there is a transition between the 2- and 3-SAT cases:
ln(Psucc) = N^{1/6} Φ1(ǫp, ǫα) + (1/24) ln(N) + O(1)
with Φ1(ǫp) ∝ ln|ǫp|/ǫp for ǫα = (4/9)ǫp² and ǫp → −∞ (following the c2/(1−t) = 1 line), thus the 2-SAT behaviour is recovered if ǫp ≈ −N^{1/6}.
Φ1(0, 0) = Φ1(+∞) · 2^{−5/6} = 0.6329 is expressed as an integral with Airy Ai functions (Φ1(ǫp) → 1.1277 as ǫp → +∞).
Phase space for algo. UC on 2 + p-SAT
Conclusion
Conclusion
◮ The framework of simple algorithms for solving random 2 + p-SAT
formulae is conceptually simple and mathematically tractable thanks to
the independence of random variables (no correlations in the disorder)
◮ But it yields a rather rich behaviour with several critical exponents and
windows, and we can even compute some of the scaling functions and
non-universal coefficients.
Conclusion
◮ The critical exponents are expected to be the same for all algorithms that attack a random NP problem (without too many correlations) and that use the Unit Propagation rule (but they have to use it!).
◮ What would be the results with strongly different correlations on the
random formulae (e.g. finite dimension, planar graphs, power-law degree
distributions...)?
◮ Can this understanding of why our algorithms fail help us in designing better algorithms (that find solutions up to αstatic)?