Backdoor Sets in SAT Instances

Ryan Williams
Carnegie Mellon University
Joint work in IJCAI03 with:
Carla Gomes and Bart Selman
Cornell University
Significant progress in complete search methods!
(Complete = always returns SAT or UNSAT)
Software and hardware verification
– complete methods are critical e.g. for verifying the correctness of
chip design, using SAT encodings
Current methods can automatically verify
the correctness of a large fraction of
a Pentium IV design.
A “real world” example
(Thanks to: Oliver Kullmann)
Bounded Model Checking instance:
i.e. ((not x1) or x7)
and ((not x1) or x6)
and … etc.
10 pages later:
…
(x177 or x169 or x161 or x153 …
or x17 or x9 or x1 or (not x185))
clauses / constraints are getting more interesting…
4000 pages later:
a clause with 59 literals … ?!!
Finally, 15,000 pages later:
Note that:
… !!!
The MiniSat solver (Eén & Sörensson) solves this instance in 2 seconds.
Gap between Theory and
Practice
The good scaling behavior of state-of-the-art SAT
solvers seems to defy the complexity-theoretic
intuition that SAT, being NP-complete, is intractable!
How can we explain this gap between theory and practice?
What makes this possible?
Our answer: Hidden tractable substructure in real-world problems.
Can we make this more precise?
Proposal: We consider structures we call backdoor sets.
The idea came out of the study of heavy-tailed phenomena in the runtime
distributions of some SAT solvers.
Backdoor Sets – Initial Motivation
Heavy-tailed distributions and
randomization.
Certain problems, when solved by
randomized backtracking, yield a runtime
distribution that is heavy-tailed:
Pr[no solution found by time t] ~ 1/t^c, 0 < c < 2
• Explains why restarting a solver is often
an effective strategy
• Implies a wide range of possible solution
times, often including short runs
How to explain short runs?
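The effect of restarts on a heavy-tailed runtime distribution can be illustrated with a small simulation. This is only a sketch: the Pareto distribution, the exponent 0.5, the cutoff of 4 time units, and the function name run_with_restarts are assumptions chosen for illustration, not taken from the talk.

```python
import random

def run_with_restarts(sample_runtime, cutoff):
    """Total time spent when every run is abandoned after `cutoff` time units."""
    total = 0.0
    while True:
        t = sample_runtime()
        if t <= cutoff:
            return total + t   # this run finished before the cutoff
        total += cutoff        # give up on this run and restart from scratch

if __name__ == "__main__":
    rng = random.Random(0)
    # Pareto tail: Pr[T > t] = t^(-0.5) for t >= 1, so the mean is infinite.
    heavy = lambda: rng.paretovariate(0.5)
    plain = [heavy() for _ in range(10_000)]
    restarted = [run_with_restarts(heavy, cutoff=4.0) for _ in range(10_000)]
    print("sample mean without restarts:", sum(plain) / len(plain))
    print("sample mean with restarts:   ", sum(restarted) / len(restarted))
```

Without restarts the sample mean is dominated by a few extremely long runs; with a cutoff, each attempt has a fixed chance of being one of the short runs, so the total time stays small.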
Explaining short runs:
Backdoors to tractability
Informally:
A backdoor set to a given problem instance is a
subset of its variables such that, once assigned
values, the remaining instance simplifies to a
tractable class.
Formally:
We define the notion of a “sub-solver”
(which handles the tractable substructure of a problem instance),
and then backdoor sets and strong backdoor sets.
Defining a sub-solver
The definition is general enough to encompass many polynomial-time
propagation methods
(including those for which we do not know a clean characterization of
the tractable subclass).
It is also valid for encoding languages other than SAT, e.g. Mixed
Integer Programming and Constraint Satisfaction Problems.
Defining backdoors
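Backdoor set (for satisfiable instances): a set S of variables such that, for some assignment to S, the sub-solver returns a satisfying assignment of the simplified instance.
Strong backdoor set (applies to satisfiable or inconsistent instances): a set S of variables such that, for every assignment to S, the sub-solver either returns a satisfying assignment of the simplified instance or concludes that it is unsatisfiable.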
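To make the two notions concrete, here is a minimal sketch that checks them by brute force on a tiny CNF formula. It assumes unit propagation as the sub-solver and reads "backdoor" as "some assignment to the set lets the sub-solver find a solution" and "strong backdoor" as "every assignment to the set lets the sub-solver decide the instance". The clause encoding (lists of signed integers) and all function names are illustrative choices, not part of the talk.

```python
from itertools import product

def unit_propagate(clauses, assignment):
    """Assumed sub-solver: returns a satisfying (possibly partial) assignment,
    the string "UNSAT", or None when propagation alone cannot decide."""
    assignment = dict(assignment)
    while True:
        unit, all_satisfied = None, True
        for clause in clauses:
            if any(assignment.get(abs(lit)) == (lit > 0) for lit in clause):
                continue                      # clause already satisfied
            all_satisfied = False
            free = [lit for lit in clause if abs(lit) not in assignment]
            if not free:
                return "UNSAT"                # every literal is false
            if len(free) == 1:
                unit = free[0]                # clause forces this literal
        if all_satisfied:
            return assignment
        if unit is None:
            return None                       # reject: cannot decide
        assignment[abs(unit)] = unit > 0

def is_backdoor(clauses, subset):
    """True if some assignment to `subset` lets the sub-solver find a solution."""
    return any(
        isinstance(unit_propagate(clauses, dict(zip(subset, values))), dict)
        for values in product([False, True], repeat=len(subset))
    )

def is_strong_backdoor(clauses, subset):
    """True if every assignment to `subset` lets the sub-solver decide."""
    return all(
        unit_propagate(clauses, dict(zip(subset, values))) is not None
        for values in product([False, True], repeat=len(subset))
    )

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
clauses = [[1, 2], [-1, 3], [-2, -3]]
print(is_backdoor(clauses, [1]))         # True
print(is_strong_backdoor(clauses, [1]))  # True
print(is_backdoor(clauses, []))          # False: propagation alone is stuck
```

In this toy formula a single variable is enough: once x1 is fixed either way, unit propagation settles the rest.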
Backdoors can be surprisingly small:
Backdoors help explain how a solver can get
“lucky” on certain runs: the backdoor variables are
identified early on in the backtracking search.
Most recent: other combinatorial domains, e.g. Graphplan planning, with
near-constant-size backdoors (2 or 3 variables) in certain domains.
(Hoffmann, Gomes, Selman ’03)
Backdoors capture critical problem resources (bottlenecks).
Constraint Satisfaction Problem
The Constraint Satisfaction Problem (CSP):
• A finite set of n variables is given, each
with an associated non-empty
finite domain.
• A constraint on k variables X1,…,Xk is a
relation R(X1,…,Xk) ⊆ D1 × … × Dk.
• A solution to a CSP is an assignment of
values to all the variables, satisfying all the
constraints.
• (Satisfaction of a constraint = the relation holds)
(Dechter 86, Freuder 82, Mackworth 77, Tsang 93, van Beek and Dechter 97)
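The definition above maps directly onto a small data structure. The sketch below is one possible encoding, with names chosen for illustration: domains are lists of values, each constraint is a pair (scope, relation) where the relation is the set of allowed value tuples, and solve_csp is a plain brute-force search rather than any algorithm from the talk.

```python
from itertools import product

def solve_csp(domains, constraints):
    """Brute-force CSP search.

    domains:     dict mapping each variable to a non-empty list of values
    constraints: list of (scope, relation); scope is a tuple of variables,
                 relation is the set of value tuples allowed on that scope
    Returns a satisfying assignment (dict) or None if there is none.
    """
    variables = list(domains)
    for values in product(*(domains[v] for v in variables)):
        assignment = dict(zip(variables, values))
        # A constraint is satisfied when the assigned tuple is in its relation.
        if all(tuple(assignment[v] for v in scope) in relation
               for scope, relation in constraints):
            return assignment
    return None

# Three variables over {0, 1}; X must differ from Y, and Y from Z.
domains = {"X": [0, 1], "Y": [0, 1], "Z": [0, 1]}
different = {(0, 1), (1, 0)}
constraints = [(("X", "Y"), different), (("Y", "Z"), different)]
print(solve_csp(domains, constraints))  # {'X': 0, 'Y': 1, 'Z': 0}
```

SAT is the special case where every domain is {false, true} and every relation is the set of tuples that satisfy one clause.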
Explicit Algorithms for
Finding/Exploiting Backdoor Sets
We cover three kinds of strategies for dealing with
instances with small backdoor sets:
• A deterministic algorithm
• A randomized algorithm
– Provably better worst-case performance than the
deterministic one
• A heuristic randomized algorithm
– Assumes existence of a good heuristic for
choosing variables to branch on
– We believe this is close to what happens in
practice
Deterministic Generalized
Iterative Deepening
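The algorithm itself does not survive in this text. One plausible reading of the name, offered here as an assumption, is: try candidate variable subsets in order of increasing size, and for each subset try every assignment, handing the simplified instance to the sub-solver. The sketch below uses plain enumeration of assignments and unit propagation as the sub-solver; both are stand-ins, and all names are hypothetical.

```python
from itertools import combinations, product

def unit_propagate(clauses, assignment):
    """Assumed sub-solver: satisfying assignment, "UNSAT", or None (reject)."""
    assignment = dict(assignment)
    while True:
        unit, all_satisfied = None, True
        for clause in clauses:
            if any(assignment.get(abs(lit)) == (lit > 0) for lit in clause):
                continue
            all_satisfied = False
            free = [lit for lit in clause if abs(lit) not in assignment]
            if not free:
                return "UNSAT"
            if len(free) == 1:
                unit = free[0]
        if all_satisfied:
            return assignment
        if unit is None:
            return None
        assignment[abs(unit)] = unit > 0

def deterministic_search(clauses, variables, sub_solver):
    """Try subsets of size 0, 1, 2, ... until one of them acts as a backdoor."""
    for size in range(len(variables) + 1):
        for subset in combinations(variables, size):
            undecided = False
            for values in product([False, True], repeat=size):
                result = sub_solver(clauses, dict(zip(subset, values)))
                if isinstance(result, dict):
                    return result          # solution found via this subset
                if result is None:
                    undecided = True       # sub-solver rejected this branch
            if not undecided:
                return "UNSAT"             # every branch was refuted
    return "UNSAT"                         # not reached: full assignments always decide

clauses = [[1, 2], [-1, 3], [-2, -3]]
print(deterministic_search(clauses, [1, 2, 3], unit_propagate))
```

The work grows with the number of subsets of each size times the number of assignments per subset, which is why a small backdoor matters so much.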
Randomized Generalized
Iterative Deepening
Assumption:
There exists a backdoor whose size is
bounded by a function of n (call it B(n))
Idea:
Repeatedly choose random subsets of
variables that are slightly larger than B(n),
searching these subsets for the backdoor
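The idea above can be sketched as follows. The sketch is an illustration under assumptions: the subset size and the number of tries are left to the caller (the text only says "slightly larger than B(n)"), assignments to each sampled subset are simply enumerated, and unit propagation stands in for the sub-solver. All names are hypothetical.

```python
import random
from itertools import product

def unit_propagate(clauses, assignment):
    """Assumed sub-solver: satisfying assignment, "UNSAT", or None (reject)."""
    assignment = dict(assignment)
    while True:
        unit, all_satisfied = None, True
        for clause in clauses:
            if any(assignment.get(abs(lit)) == (lit > 0) for lit in clause):
                continue
            all_satisfied = False
            free = [lit for lit in clause if abs(lit) not in assignment]
            if not free:
                return "UNSAT"
            if len(free) == 1:
                unit = free[0]
        if all_satisfied:
            return assignment
        if unit is None:
            return None
        assignment[abs(unit)] = unit > 0

def randomized_search(clauses, variables, sub_solver, subset_size, tries, rng):
    """Sample random subsets and test whether one of them contains a backdoor."""
    for _ in range(tries):
        subset = rng.sample(variables, subset_size)
        undecided = False
        for values in product([False, True], repeat=subset_size):
            result = sub_solver(clauses, dict(zip(subset, values)))
            if isinstance(result, dict):
                return result              # solution found
            if result is None:
                undecided = True
        if not undecided:
            return "UNSAT"                 # all branches refuted
    return None                            # gave up: no sampled subset worked

clauses = [[1, 2], [-1, 3], [-2, -3]]
rng = random.Random(0)
print(randomized_search(clauses, [1, 2, 3], unit_propagate, 1, 20, rng))
```

A sampled subset only has to contain the backdoor, not equal it, which is why making the subsets slightly larger than B(n) raises the chance of success per try.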
Deterministic Versus Randomized
Suppose variables have 2 possible values
(e.g. SAT)
For B(n) = n/k, the algorithm runtime is c^n.
[Figure: the base c plotted against k, with one curve for the deterministic algorithm and one for the randomized algorithm]
Complete Randomized Depth
First Search with Heuristic
Assume we have the following.
DFS, a generic randomized depth-first
backtrack search solver, with:
• A (polytime) sub-solver A
• A heuristic H that (randomly) chooses variables to
branch on, in polynomial time
– H has probability 1/h of choosing a
backdoor variable (h is a fixed constant)
Call this ensemble (DFS, H, A)
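A minimal sketch of such an ensemble is given below. It is an illustration, not the authors' solver: unit propagation plays the role of A, a uniformly random choice among unassigned variables plays the role of H, and the budget parameter (a cap on the number of branching nodes per run) is an added assumption that lets a run be abandoned early.

```python
import random

def unit_propagate(clauses, assignment):
    """Assumed sub-solver A: satisfying assignment, "UNSAT", or None (reject)."""
    assignment = dict(assignment)
    while True:
        unit, all_satisfied = None, True
        for clause in clauses:
            if any(assignment.get(abs(lit)) == (lit > 0) for lit in clause):
                continue
            all_satisfied = False
            free = [lit for lit in clause if abs(lit) not in assignment]
            if not free:
                return "UNSAT"
            if len(free) == 1:
                unit = free[0]
        if all_satisfied:
            return assignment
        if unit is None:
            return None
        assignment[abs(unit)] = unit > 0

def uniform_heuristic(free_variables, rng):
    """Assumed heuristic H: pick an unassigned variable uniformly at random."""
    return rng.choice(free_variables)

def dfs(clauses, variables, assignment, sub_solver, heuristic, rng, budget):
    """Randomized backtrack search. `budget` is a one-element list holding the
    number of branching nodes still allowed; returns None when it runs out."""
    result = sub_solver(clauses, assignment)
    if result is not None:
        return result                      # A decided this subproblem
    if budget[0] == 0:
        return None                        # cutoff reached
    budget[0] -= 1
    var = heuristic([v for v in variables if v not in assignment], rng)
    for value in rng.sample([False, True], 2):   # random value order
        result = dfs(clauses, variables, {**assignment, var: value},
                     sub_solver, heuristic, rng, budget)
        if result is None or isinstance(result, dict):
            return result                  # cutoff or solution: stop here
    return "UNSAT"                         # both branches refuted

clauses = [[1, 2], [-1, 3], [-2, -3]]
rng = random.Random(0)
print(dfs(clauses, [1, 2, 3], {}, unit_propagate, uniform_heuristic, rng, [4]))
```

The sub-solver is called at every node, so as soon as the branching variables cover a backdoor, the remaining subproblem is settled without further branching.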
Polytime Restart Strategy for
(DFS, H, A)
Essentially:
If there is a small backdoor,
then (DFS, H, A) has a restart strategy
that runs in polytime.
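A back-of-the-envelope calculation suggests why such a restart strategy is plausible. It is a sketch under assumptions that are not stated in the text above: Boolean variables; a strong backdoor of size B(n) ≤ c log2 n for a constant c; a heuristic that branches on the same variable at every node of a given depth; and independent choices, each hitting a not-yet-chosen backdoor variable with probability at least 1/h. Here p(n) is an assumed polynomial bound on the cost of one call to the sub-solver A.

```latex
\begin{align*}
\Pr[\text{the first } B(n) \text{ branching variables are all backdoor variables}]
  &\ge h^{-B(n)} \ge h^{-c \log_2 n} = n^{-c \log_2 h}, \\
\text{cost of one such run}
  &\le 2^{B(n)+1}\, p(n) \le 2\, n^{c}\, p(n), \\
\mathbb{E}[\text{total cost with cutoff } 2\, n^{c}\, p(n)]
  &\le n^{c \log_2 h} \cdot 2\, n^{c}\, p(n).
\end{align*}
```

Both exponents are constants, so under these assumptions the expected total cost is polynomial in n.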
Runtime Table for Algorithms
[Table: runtimes of the algorithms, including (DFS, H, A), as a function of B(n)]
B(n) = upper bound on the size of a backdoor, given n variables
When the backdoor size is a constant fraction of n, the
randomized algorithm is exponentially faster than the
deterministic algorithm.
Summary
Introduced notion of a “backdoor set” of
variables.
1) More closely captures combinatorics of a
problem instance, as dealt with in practice.
2) Provides insight into restart strategies.
3) Backdoors can be surprisingly small in
practice.
4) Search heuristics + randomization can be
used to find them, provably efficiently.