The SAT 2009 competition results: does theory meet practice?
Daniel Le Berre, Olivier Roussel, Laurent Simon
Andreas Goerdt, Ines Lynce, Aaron Stump
Supported by CRIL, LRI and the French ANR UNLOC project
SAT 2009 conference, Swansea, 3 July 2009
For those having a computer and WiFi access
- See the rules and benchmark details at: http://www.satcompetition.org/2009/
- See the results live at: http://www.cril.univ-artois.fr/SAT09/
The team
Organizers
- Daniel Le Berre
- Olivier Roussel
- Laurent Simon (apart from the main track)
Judges
- Andreas Goerdt
- Ines Lynce
- Aaron Stump
Computer infrastructure provided by CRIL (96 bi-processor cluster) and LRI (48 quad-core cluster plus one 16-core machine with 68 GB of memory for the parallel track).
The tracks
Main track: sequential solvers
  - competition: the source code of the solver must be made available after the competition
  - demonstration: the binary code must be made available after the competition (for research purposes)
Parallel track: solvers tailored to run on multicore computers (up to 16 cores)
Minisat Hack track: submission of (small) patches against the latest public release of Minisat2
Preprocessing track: competition of preprocessors placed in front of Minisat2
Integration of the competition in the conference
Tuesday
- Efficiently Calculating Tree Measures Using SAT: bio 2 benchmarks
- Finding Efficient Circuits Using SAT solvers: mod circuits benchmarks
Wednesday
- On the fly clause improvement: CircUs, main track
- Problem sensitive restarts heuristics for the DPLL procedure: Minisat09z, Minisat hack
- Improved Conflict-Clause Minimization Leads to Improved Propositional Proof Traces: Minisat2Hack, Minisat hack
- A novel approach to combine SLS and a DPLL solver for the satisfiability problem: hybridGM, main track
- Building a Hybrid SAT solver via Conflict Driven, Look-Ahead and Xor reasoning techniques: MoRsat, main track
- Improving Variable Selection Process in Stochastic Local Search for Propositional Satisfiability: slstc, main track
- VARSAT: Integrating Novel Probabilistic Inference Techniques with DPLL Search: VARSAT, main track
Thursday
- Width-Based Restart Policies for Clause Learning: Rsat, main track
Common rules to all tracks
- No more than 3 solvers per submitter
- Solvers compared using a simple static ranking scheme
- Results available for SAT, UNSAT and SAT+UNSAT benchmarks
- Results available to the submitters for checking: it is the responsibility of each competitor to check that their system performed as expected!
New scoring scheme
- Purse-based scoring has been used since 2005 (designed by Allen Van Gelder).
  - pros:
    - takes into account various aspects of the solver (power, robustness, speed)
    - focuses on singular solvers
  - cons:
    - difficult to check (and to understand)
    - too much weight on singularity?
    - depends on the set of competitors
- A "Spec 2009" static scoring scheme is desirable:
  - to compare other solvers easily (e.g. reference solvers) without disturbing the ranking of the competitors
  - to allow anybody to compare their solver to the SAT 2009 competitors under similar settings
Available metrics
NBTOTAL: total number of benchmarks to solve
NBSOLVED: total number of benchmarks solved within a given timeout
NBUNSOLVEDSERIES: total number of benchmark series in which the solver was unable to solve any element
TIMEOUT: time allowed to solve a given benchmark
ti: time needed to solve benchmark i, within the time limit
PENALTY: constant used as a penalty for benchmarks not solved within the timeout
SERIESPENALTY: constant used as a penalty for a series of benchmarks in which the solver solves no member
Spec 2009 proposals and results of the votes
- Lexicographical (NBSOLVED, Σ ti): 9 votes
- Cumulative time based, with timeout penalty: 3 votes
    Σ ti + (NBTOTAL − NBSOLVED) × TIMEOUT × PENALTY
- Cumulative time based, with timeout penalty, log based:
    Σ log10(1 + ti) + (NBTOTAL − NBSOLVED) × log10((1 + TIMEOUT) × PENALTY)
- Cumulative time based, with timeout and robustness penalties (proposed by Marijn Heule): 4 votes
    Σ ti + (NBTOTAL − NBSOLVED) × TIMEOUT × PENALTY + NBUNSOLVEDSERIES × SERIESPENALTY
- SAT 2005 and 2007 purse-based scoring
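To make the proposals concrete, here is a minimal Python sketch (not the official evaluation code) of the winning lexicographical ranking and of the cumulative-time proposals; the solver data and the PENALTY and SERIESPENALTY constants are made-up examples.

    import math

    TIMEOUT = 10000          # second-stage timeout, in seconds
    PENALTY = 2              # made-up penalty constant
    SERIESPENALTY = 30000    # made-up robustness penalty (Heule's proposal)

    def lexicographic_key(times):
        # Winning proposal: most benchmarks solved first, cumulative time as tie-breaker.
        return (-len(times), sum(times))

    def cumulative(times, nb_total):
        # sum(ti) + (NBTOTAL - NBSOLVED) * TIMEOUT * PENALTY
        return sum(times) + (nb_total - len(times)) * TIMEOUT * PENALTY

    def cumulative_log(times, nb_total):
        # sum(log10(1 + ti)) + (NBTOTAL - NBSOLVED) * log10((1 + TIMEOUT) * PENALTY)
        return (sum(math.log10(1 + t) for t in times)
                + (nb_total - len(times)) * math.log10((1 + TIMEOUT) * PENALTY))

    def heule(times, nb_total, nb_unsolved_series):
        # cumulative score plus NBUNSOLVEDSERIES * SERIESPENALTY
        return cumulative(times, nb_total) + nb_unsolved_series * SERIESPENALTY

    # CPU times of the benchmarks each (fictional) solver solved
    results = {"solverA": [12.3, 950.0, 4000.0], "solverB": [8.1, 700.2]}
    ranking = sorted(results, key=lambda s: lexicographic_key(results[s]))
    print(ranking)    # ['solverA', 'solverB']: solverA solved more benchmarks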
Industrial vs Application
- Many instances in the industrial category do not come from industry
- "Application" better reflects the wide use of SAT technology
Benchmark selection: random category
Based on O. Kullmann's recommendations in 2005 (see [OK JSAT06] for details).

Generated benchmark parameters:

          3-SAT                       5-SAT                      7-SAT
        ratio  start-stop  step     ratio  start-stop  step    ratio  start-stop  step
Medium  4.26   360-560     20       21.3   90-120      10      89     60-75       5
Large   4.2    2000-18000  2000     20     700-1100    100     81     140-220     20

Number of generated benchmarks:

          3-SAT             5-SAT             7-SAT
        SAT   UNKNOWN     SAT   UNKNOWN     SAT   UNKNOWN
Medium  110   110         40    40          40    40
Large   90    -           50    -           50    -

- Balanced number of SAT/UNKNOWN benchmarks for complete solvers: 190/190
- Specific benchmarks for complete SAT solvers: 190
- Specific benchmarks for incomplete SAT solvers: 190
- Satisfiability of the medium benchmarks checked using gNovelty+.
- Satisfiability of the large benchmarks holds by construction (ratio below the threshold).
- 100 benchmarks generated for each setting.
- 10 benchmarks randomly selected out of each 100 using the judges' random seed.
- 40 large 3-SAT benchmarks (20K-26K variables) added for the second stage.
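As an illustration of the generation process, here is a hedged Python sketch that produces a random k-SAT instance at a given clause-to-variable ratio in DIMACS format; gen_ksat and its parameters are illustrative, not the competition's actual generator.

    import random

    def gen_ksat(k, n_vars, ratio, seed=0):
        rng = random.Random(seed)
        n_clauses = round(ratio * n_vars)
        lines = [f"p cnf {n_vars} {n_clauses}"]
        for _ in range(n_clauses):
            variables = rng.sample(range(1, n_vars + 1), k)   # k distinct variables
            clause = [v if rng.random() < 0.5 else -v for v in variables]
            lines.append(" ".join(map(str, clause)) + " 0")
        return "\n".join(lines)

    # A medium 3-SAT instance at the threshold ratio used above.
    print(gen_ksat(3, 360, 4.26)[:80])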
How to predict benchmark hardness for non-random benchmarks?
- Problem: we need benchmarks that discriminate between solvers (i.e. not too easy, not too hard).
- Challenging benchmarks are necessary to see the limits of current approaches.
- Idea: use a small set of the last SAT winners in each category:
  - Rsat, Minisat and picosat to rank application benchmarks
  - March-KS, SATzilla-Crafted and Minisat for crafted benchmarks
- easy: solved within 30 s by all the solvers
- hard: not solved by any of the solvers (within the timeout)
- medium: remaining instances
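A minimal sketch of this easy/medium/hard classification, assuming the reference solvers' runtimes per benchmark are available; the 1200 s timeout and the runtime data are assumptions, only the 30 s rule comes from the slide.

    TIMEOUT = 1200.0   # assumed timeout; unsolved benchmarks are recorded as TIMEOUT

    def classify(times, timeout=TIMEOUT):
        if all(t <= 30.0 for t in times):
            return "easy"        # solved within 30 s by all reference solvers
        if all(t >= timeout for t in times):
            return "hard"        # solved by none of the reference solvers
        return "medium"

    runtimes = {"bench1.cnf": [3.2, 7.9, 1.4],
                "bench2.cnf": [1200.0, 1200.0, 1200.0],
                "bench3.cnf": [12.0, 1200.0, 450.0]}
    print({b: classify(ts) for b, ts in runtimes.items()})
    # {'bench1.cnf': 'easy', 'bench2.cnf': 'hard', 'bench3.cnf': 'medium'}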
Judges' decisions regarding the selection of submitted vs existing benchmarks
- No more than 10% of the benchmarks should come from the same source.
- The final selection of benchmarks should contain 45% existing benchmarks and 55% submitted benchmarks.
- The final selection should contain 10% easy, 40% medium and 50% hard benchmarks.
- Duplicate benchmarks found after the selection was done are simply removed from the selection; no other benchmarks are added to the selection.
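The quota arithmetic implied by these rules can be written down directly; the pool size of 300 below is just an example, and selection_quotas is a hypothetical helper, not the judges' actual selection script.

    def selection_quotas(total):
        return {
            "max_per_source": round(0.10 * total),   # no more than 10% from one source
            "existing":       round(0.45 * total),
            "submitted":      round(0.55 * total),
            "easy":           round(0.10 * total),
            "medium":         round(0.40 * total),
            "hard":           round(0.50 * total),
        }

    print(selection_quotas(300))
    # {'max_per_source': 30, 'existing': 135, 'submitted': 165,
    #  'easy': 30, 'medium': 120, 'hard': 150}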
Application benchmarks submitted to the competition
Aprove (Carsten Fuhs): term rewriting system benchmarks.
BioInfo I (Fabien Corblin): queries to find the maximal size of a biological behavior without cycles in discrete genetic networks.
BioInfo II (Maria Luisa Bonet): evolutionary trees (presented on Tuesday).
Bit Verif (Robert Brummayer): bit-precise software verification, generated by the SMT solver Boolector.
C32SAT (Hendrik Post and Carsten Sinz): software verification, generated by the C32SAT satisfiability checker for C programs.
Crypto (Milan Sesum): encodings of attacks on both the DES and MD5 crypto systems.
Diagnosis (Anbulagan and Alban Grastien): 4 different encodings of discrete event systems.
Application benchmarks: classification

                EASY           MEDIUM         HARD
Origin          SAT   UNSAT    SAT   UNSAT    SAT   UNSAT   UNKNOWN   Total
SAT RACES       6     18       43    50       3     21      -         141
SAT COMP 07     6     15       47    49       7     12      45        181
SUBMITTED 09    60    38       38    60       8     12      102       318
Total           72    71       128   159      18    45      147       640

Submitted 2009 benchmarks per family:

Origin       Total
Aprove       25
BioInfo I    20
BioInfo II   40
Bit Verif    65
C32SAT       10
Crypto       62
Diagnosis    96
Total        318
Application benchmarks, final selection

          EASY                MEDIUM               HARD                       Total
Origin    SAT   UNSAT   ALL   SAT   UNSAT   ALL    SAT   UNSAT   UNK    ALL
old       1     9       10    21    33      54     6     23      34     63    127
new       18    1       19    25    40      65     8     10      63     81    165
Total     19    10      29    46    73      119    14    33      97     144   292
Crafted benchmarks submitted to the competition
Edge Matching (Marijn Heule): four encodings of edge matching problems.
Mod Circuits (Grigory Yaroslavtsev): presented on Tuesday.
Parity Games (Oliver Friedmann): the generator encodes parity games of a fixed size n that force the strategy improvement algorithm to require at least i iterations.
Ramsey Cube (Philipp Zumstein).
RB SAT (Nouredine Ould Mohamedou): random CSP problems encoded into SAT.
Sgen (Ivor Spence): small but hard satisfiability benchmarks, either SAT or UNSAT.
SGI (Calin Anton): random SGI model SRSGI; subgraph isomorphism problems.
Difficulty of crafted benchmarks

Crafted benchmarks submitted, per origin:

Origin          Benchmarks
Edge Matching   32
ModCircuits     19
Parity Games    24
Ramsey Cube     10
RBSAT           360
SGEN            21
SGI             107
Total           573

Overall difficulty of the 573 submitted crafted benchmarks: 118 easy SAT, 10 easy UNSAT, 75 medium SAT, 9 medium UNSAT, 6 hard SAT, and 355 hard UNSAT or UNKNOWN.

Crafted benchmarks, final selection:

          EASY                MEDIUM               HARD   Total
Origin    SAT   UNSAT   ALL   SAT   UNSAT   ALL    ALL    ALL
old       -     4       4     19    42      61     74     139
new       19    7       26    50    9       59     76     161
Total     19    11      30    69    51      120    150    300
Preprocessor track: aim
Back to the aim of the first competition:
- a lot of new methods exist, but it is hard to tell which one is the best
- SatElite is widely used, but getting old
- we want to encourage new methods
- allow all solvers to be enhanced easily by just adding a preprocessor in front of them
Preprocessor track: competitors

Solver name               Authors
Competition division
IUT BMB SIM 1.0           Abdorrahim Bahrami, Seyed Rasoul Mousavi, Kiarash Bazargan
ReVivAl 0.23              Cédric Piette
ReVivAl 0.23 + SatElite   Cédric Piette
SatElite + ReVivAl 0.23   Cédric Piette
Demonstration division
kw pre                    Johan Alfredsson
Reference solvers
minisat2-core             Niklas Een and Niklas Sorensson
minisat2-simp             Niklas Een and Niklas Sorensson
Preprocessing track: experimental settings
Benchmarks: the ones from the main track, in both the application and crafted categories.
SAT engine: Minisat2 070721 core solver (without preprocessing).
Comparison criteria: the preprocessor and the engine are evaluated as a black box.
Timeout: 1200 s.
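A rough sketch of such a black-box run, in Python; the binary names, file paths and command-line conventions are assumptions, only the 1200 s budget and the preprocessor-then-engine order come from the settings above.

    import subprocess, time

    def run_black_box(preprocessor, solver, cnf, budget=1200.0):
        start = time.time()
        simplified = cnf + ".pre"
        try:
            # step 1: preprocess the input formula
            subprocess.run([preprocessor, cnf, simplified], timeout=budget)
            # step 2: give the SAT engine whatever budget is left
            remaining = budget - (time.time() - start)
            result = subprocess.run([solver, simplified], timeout=remaining)
            return result.returncode    # 10 = SAT, 20 = UNSAT by the usual convention
        except subprocess.TimeoutExpired:
            return None                 # counted as unsolved

    print(run_black_box("./satelite", "./minisat2-core", "bench.cnf"))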
Preprocessing track: results in the application category

Rank  Solver                     Total  SAT  UNSAT  CPU time (s)
-     Virtual Best Solver (VBS)  163    67   96     33886.97
1     kw pre                     149    58   91     34591.65
2     ReVivAl 0.23 + SatElite    121    51   70     39093.24
3     SatElite + ReVivAl 0.23    119    48   71     38374.13
4     ReVivAl 0.23               117    53   64     44067.36
5     minisat2-simp              116    46   70     25111.90
6     IUT BMB SIM 1.0            111    46   65     30273.14
7     minisat2-core              106    47   59     23477.71
Preprocessing track running time: application (SAT+UNSAT)
[Cactus plot: CPU time (s) vs. number of solved instances for IUT_BMB_SIM 1.0, kw_pre 2009-03-21, minisat2-core 070721, minisat2-simp 070721, ReVivAl 0.23 2009-03-18, ReVivAl 0.23 + SatElite 2009-03-18, SatElite + ReVivAl 0.23 2009-03-18]
Preprocessing track: results in the crafted category

Rank  Solver                     Total  SAT  UNSAT  CPU time (s)
-     Virtual Best Solver (VBS)  137    92   45     20732.67
1     minisat2-simp              119    76   43     23212.54
2     SatElite + ReVivAl 0.23    119    75   44     24059.71
3     ReVivAl 0.23 + SatElite    119    75   44     24622.54
4     ReVivAl 0.23               114    72   42     20435.40
5     IUT BMB SIM 1.0            107    74   33     23163.33
6     kw pre                     106    72   34     16298.74
7     minisat2-core              100    69   31     17639.05
Preprocessing track running time: crafted (SAT+UNSAT)
[Cactus plot: CPU time (s) vs. number of solved instances for IUT_BMB_SIM 1.0, kw_pre 2009-03-21, minisat2-core 070721, minisat2-simp 070721, ReVivAl 0.23 2009-03-18, ReVivAl 0.23 + SatElite 2009-03-18, SatElite + ReVivAl 0.23 2009-03-18]
Minisat Hack track: aim
- Observe the effect of clearly identified "small changes" in a widely used solver
- Help understand what is really important in Minisat and what can be improved
- Ensure that all solvers are comparable (small syntactic changes)
- Encourage easy entries to the competition (e.g. by Master's or first-year PhD students)
Minisat Hack track: competitors

Solver name        Authors
Submissions
APTUSAT            Alexander Mishunin and Grigory Yaroslavtsev
BinMiniSat         Kiyonori Taniguchi, Miyuki Koshimura, Hiroshi Fujita, and Ryuzo Hasegawa
MiniSAT 09z        Markus Iser
MiniSat2hack       Allen Van Gelder
minisat cumr p/r   Kazuya Masuda and Tomio Kamada
Reference solvers
minisat2 core      Niklas Een and Niklas Sorensson

MiniSAT 09z and MiniSat2hack were presented during the SAT 2009 conference.
Minisat Hack results

Rank  Solver                     Total  SAT  UNSAT  CPU time (s)
-     Virtual Best Solver (VBS)  169    71   98     40959.35
1     MiniSAT 09z                149    59   90     37228.91
2     minisat cumr p             142    58   84     32636.31
3     minisat cumr r             131    60   71     29316.97
4     APTUSAT                    123    54   69     25418.27
5     BinMiniSat                 123    48   75     29326.67
6     minisat2 core              120    53   67     25600.16
7     MiniSat2hack               119    52   67     24024.97
Minisat Hack running time: application (SAT+UNSAT)
[Cactus plot: CPU time (s) vs. number of solved instances for APTUSAT 2009-03-22, BinMiniSat 2009-03-21, MiniSAT 09z 2009-03-22, minisat2 070721/core, MiniSat2hack 2009-03-23, minisat_cumr p-2009-03-18, minisat_cumr r-2009-03-18]
Parallel (multithreaded) track: aim
We will have to deal with multicore computers, so let's start thinking about it.
- Naive parallelization should not work on many cores: memory access is a hard bottleneck for SAT solvers
- We would like to observe whether multithreaded solvers scale well on a machine with 16 cores
- Problem: not enough competitors!
Parallel track: the competitors

Solver name              Authors
No limit on threads
gNovelty+-T              Duc-Nghia Pham and Charles Gretton
satake                   Kota Tsuyuzaki
ttsth-5-0                Ivor Spence
Limited to 4 threads
ManySAT 1.1 aimd 0/1/2   Youssef Hamadi, Saïd Jabbour, Lakhdar Saïs
Parallel track: the settings
- Parallel solvers were run on 3 different computers:
  - 2 processors, together with the main track first stage, at CRIL
  - 4 cores on a cluster of 4-core computers at LRI
  - 16 cores on one specific 16-core computer at LRI
- The solvers are given 10000 s of CPU time, to be shared by the different threads: to be compared with the second stage of the main track.
- Only solvers able to use the 16 cores were run on the 16-core computer.
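One way to enforce a CPU-time budget that is shared across all threads of a solver process is a per-process CPU rlimit, which is what the sketch below does (POSIX only); the solver path and flags are placeholders, and this is not the actual competition harness.

    import resource, subprocess

    def run_with_cpu_budget(cmd, cpu_seconds=10000):
        def limit():
            # RLIMIT_CPU counts CPU time summed over all threads of the child process,
            # matching the "10000 s CPU shared by the threads" rule above.
            resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        return subprocess.run(cmd, preexec_fn=limit).returncode

    print(run_with_cpu_budget(["./manysat", "-nthreads=4", "bench.cnf"]))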
Parallel track: the results

Application category

Solver                Total  SAT  UNSAT  CPU time (s)
2 threads (CRIL)
ManySAT 1.1 aimd 1    193    71   122    173344.71
4 threads (LRI)
ManySAT 1.1 aimd 1    187    68   119    112384.15
ManySAT 1.1 aimd 0    185    69   116    103255.01
ManySAT 1.1 aimd 2    181    65   116    104021.63
satake                118    52   66     50543.61
ttsth-5-0             7      3    4      2274.38
16 threads (LRI)
satake                106    40   66     130477.38
ttsth-5-0             7      3    4      9007.53

Random category

Solver                          Total  SAT  UNSAT  CPU time (s)
gNovelty+-T (2 threads, CRIL)   314    314  -      143439.69
gNovelty+-T (4 threads, LRI)    296    296  -      95118.33
gNovelty+-T (16 threads, LRI)   237    237  -      68173.49
The main track: competitors

Solver name                          Authors
adaptg2wsat2009/++                   Chu Min Li, Wanxia Wei
CircUs                               Hyojung Han
clasp 1.2.0-SAT09-32                 Benjamin Kaufmann
CSat 2009-03-22                      Guanfeng Lv, Qian Wang, Kaile Su
glucose 1.0                          Gilles Audemard and Laurent Simon
gnovelty+/2/2-H                      Duc-Nghia Pham and Charles Gretton
Hybrid2                              Wanxia Wei, Chu Min Li, and Harry Zhang
hybridGM 1/3/7                       Adrian Balint
HydraSAT base/flat/multi             Christoph Baldow, Friedrich Gräter, Steffen Hölldobler, Norbert Manthey, Max Seelemann, Peter Steinke, C
iPAWS                                John Thornton and Duc Nghia Pham
IUT BMB SAT 1.0                      Abdorrahim Bahrami, Seyed Rasoul Mousavi, Kiarash Bazargan
LySAT c/i                            Youssef Hamadi, Saïd Jabbour, Lakhdar Saïs
march hi/nn                          Marijn Heule
MoRsat                               Jingchao Chen
MXC                                  David Bregman
NCVWr                                Wanxia Wei, Chu Min Li, and Harry Zhang
picosat 913                          Armin Biere
precosat 236                         Armin Biere
Rsat                                 Knot Pipatsrisawat and Adnan Darwiche
SApperloT base/hrp                   Stephan Kottler
SAT4J CORE 2.1 RC1                   Daniel Le Berre
SATzilla2009 C/I/R                   Lin Xu, Frank Hutter, Holger H. Hoos and Kevin Leyton-Brown
slstc 1.0                            Anton Belov, Zbigniew Stachniak
TNM                                  Wanxia Wei and Chu Min Li
tts-5-0                              Ivor Spence
VARSAT-crafted/random/industrial     Eric Hsu
kw 2009-03-20                        Johan Alfredsson
MiniSat 2.1 (Sat-race'08 Edition)    Niklas Sorensson, Niklas Een
The main track: reference solvers from 2007

Solver name        Authors
Random
adaptg2wsat+       Wanxia Wei, Chu-Min Li and Harry Zhang
gnovelty+          Duc Nghia Pham and Charles Gretton
March KS           Marijn Heule and Hans van Maaren
SATzilla RANDOM    Lin Xu, Frank Hutter, Holger H. Hoos and Kevin Leyton-Brown
Application
picosat 535        Armin Biere
Rsat 07            Knot Pipatsrisawat and Adnan Darwiche
Crafted
SATzilla CRAFTED   Lin Xu, Frank Hutter, Holger H. Hoos and Kevin Leyton-Brown
minisat SAT 2007   Niklas Sorensson and Niklas Een
Main track: phase 1, application

Rank  Solver                     Total  SAT  UNSAT  CPU time (s)
-     Virtual Best Solver (VBS)  196    79   117    33863.84
1     precosat 236               164    65   99     37379.67
2     MiniSat 2.1                155    65   90     27011.56
3     LySAT i                    153    57   96     35271.11
4     glucose 1.0                152    54   98     34784.84
5     MiniSAT 09z                152    59   93     37872.87
6     kw                         150    58   92     35080.23
7     ManySAT 1.1 aimd 1         149    54   95     34834.19
8     ManySAT 1.1 aimd 0         149    54   95     38639.59
9     MXC                        147    62   85     27968.90
10    ManySAT 1.1 aimd 2         145    51   94     34242.50
11    CircUs                     144    59   85     36680.28
12    Rsat                       143    53   90     31000.89
13    SATzilla2009 I             142    60   82     33608.36
14    minisat cumr p             141    58   83     29304.08
15    picosat 913                139    63   76     34013.47
16    clasp 1.2.0-SAT09-32       138    53   85     33317.37
17    Rsat 2007                  133    56   77     28975.23
18    SApperloT base             129    55   74     31762.78
19    picosat 535                126    59   67     33871.13
Main track: phase 1, application (continued)

Rank  Solver                     Total  SAT  UNSAT  CPU time (s)
-     Virtual Best Solver (VBS)  196    79   117    33863.84
20    LySAT c                    123    51   72     26865.49
21    IUT BMB SAT 1.0            116    46   70     20974.40
22    HydraSAT base              116    53   63     26856.33
23    HydraSAT flat              115    51   64     26016.34
24    VARSAT-industrial          110    49   61     22753.77
25    SApperloT hrp              107    42   65     20954.19
26    HydraSAT multi             106    49   57     16308.48
27    SATzilla2009 C             106    45   61     25974.72
28    VARSAT-crafted             99     44   55     23553.01
29    SAT4J CORE 2.1 RC1         95     46   49     25380.84
30    satake                     92     40   52     18309.62
31    CSat 2009-03-22            91     40   51     20461.14
32    SATzilla2009 R             59     36   23     6260.03
33    VARSAT-random              59     25   34     16836.65
34    march hi                   21     9    12     5170.80
35    march nn                   21     10   11     6189.51
36    Hybrid2                    12     11   1      3851.84
37    adaptg2wsat2009            11     8    3      1746.45
38    adaptg2wsat2009++          11     8    3      1806.37
Main track: phase 1, application (continued)

Rank  Solver                     Total  SAT  UNSAT  CPU time (s)
-     Virtual Best Solver (VBS)  196    79   117    33863.84
39    slstc 1.0                  10     9    1      2093.29
40    tts                        10     6    4      2539.03
41    NCVWr                      10     9    1      2973.72
42    iPAWS                      8      8    0      1400.34
43    ttsth-5-0                  8      4    4      2937.42
44    hybridGM7                  7      7    0      468.76
45    gnovelty+                  7      7    0      1586.83
46    gNovelty+-T                7      7    0      1826.46
47    TNM                        6      5    1      1157.83
48    hybridGM 1                 5      5    0      731.62
49    hybridGM3                  5      5    0      1103.11
50    gnovelty+2                 4      4    0      91.85
First stage: crafted

Rank  Solver                             Total  SAT  UNSAT  CPU time (s)
-     Virtual Best Solver (VBS)          194    124  70     19204.67
1     clasp 1.2.0-SAT09-32               131    78   53     22257.76
2     SATzilla2009 I                     128    86   42     21700.11
3     SATzilla2009 C                     125    73   52     16701.85
4     MXC 2009-03-10                     124    80   44     22256.57
5     precosat 236                       122    81   41     22844.50
6     IUT BMB SAT 1.0                    120    76   44     22395.97
7     minisat SAT 2007                   119    76   43     22930.58
8     SATzilla CRAFTED                   114    82   32     18066.80
9     MiniSat 2.1 (Sat-race'08 Edition)  114    74   40     18107.02
10    glucose 1.0                        114    75   39     20823.96
11    VARSAT-industrial                  113    73   40     22306.77
12    SApperloT base                     113    73   40     22826.65
13    picosat 913                        112    80   32     17111.73
14    LySAT c                            112    70   42     21080.61
15    CircUs                             107    70   37     16148.01
16    kw                                 106    72   34     16460.37
17    Rsat                               105    71   34     14010.73
18    SATzilla2009 R                     104    78   26     14460.38
19    ManySAT 1.1 aimd 1                 103    72   31     14991.64
First stage: crafted (continued)

Rank  Solver                     Total  SAT  UNSAT  CPU time (s)
-     Virtual Best Solver (VBS)  194    124  70     19204.67
20    HydraSAT multi             103    70   33     20825.53
21    HydraSAT flat              102    70   32     17796.15
22    SApperloT hrp              102    69   33     20647.84
23    minisat cumr p             102    75   27     23176.38
24    VARSAT-crafted             102    61   41     23304.40
25    LySAT i                    100    69   31     14874.18
26    ManySAT 1.1 aimd 2         99     70   29     14211.48
27    ManySAT 1.1 aimd 0         99     71   28     15251.61
28    HydraSAT base              99     66   33     16718.94
29    MiniSAT 09z                99     72   27     17027.31
30    VARSAT-random              84     47   37     14023.19
31    satake                     75     55   20     16261.12
32    iPAWS                      71     71   0      7352.89
33    SAT4J CORE 2.1 RC1         71     50   21     15136.95
34    adaptg2wsat2009            70     68   2      9425.51
35    adaptg2wsat2009++          66     64   2      5796.69
36    Hybrid2                    66     66   0      10425.56
37    CSat                       65     50   15     10319.33
First stage: crafted (continued)

Rank  Solver                     Total  SAT  UNSAT  CPU time (s)
-     Virtual Best Solver (VBS)  194    124  70     19204.67
38    march hi                   63     45   18     10622.02
39    TNM                        62     62   0      8181.19
40    March KS                   61     42   19     9021.93
41    march nn                   58     43   15     6232.17
42    gnovelty+                  54     54   0      5853.95
43    gNovelty+-T                53     53   0      5073.82
44    hybridGM                   51     51   0      5298.30
45    hybridGM3                  51     51   0      6737.29
46    NCVWr                      48     48   0      12116.63
47    tts 5-0                    46     25   21     2507.80
48    ttsth-5-0                  46     24   22     4020.68
49    gnovelty+2                 46     44   2      4840.28
50    hybridGM7                  38     38   0      4385.10
51    slstc 1.0                  33     33   0      4228.67
Random results

Rank  Solver                     Total  SAT  UNSAT  CPU time (s)
-     Virtual Best Solver (VBS)  459    359  100    62339.75
1     SATzilla2009 R             365    299  66     51997.72
2     TNM                        317    317  0      35346.17
3     gnovelty+2                 305    305  0      26616.48
4     hybridGM3                  299    299  0      23272.79
5     hybridGM7                  298    298  0      25567.23
6     adaptg2wsat2009++          297    297  0      26432.65
7     hybridGM 1                 294    294  0      23732.78
8     adaptg2wsat2009            294    294  0      26658.47
9     Hybrid2                    290    290  0      30134.40
10    gnovelty+                  281    281  0      25523.72
11    NCVWr                      278    278  0      31132.10
12    gnovelty+                  272    272  0      21956.28
13    SATzilla RANDOM            268    177  91     42919.16
14    gNovelty+-T                266    266  0      22823.37
15    adaptg2wsat+               265    265  0      22333.18
16    iPAWS                      258    258  0      19296.93
17    march hi                   247    147  100    65568.89
18    march nn                   243    145  98     66494.85
19    March KS                   239    149  90     57869.03
20    SATzilla2009 I             145    90   55     37645.86
Random results: weak solvers

Rank  Solver                             Total  SAT  UNSAT  CPU time (s)
21    slstc 1.0                          118    118  0      13250.77
22    clasp 1.2.0-SAT09-32               84     66   18     32979.32
23    VARSAT-random                      83     72   11     30273.41
24    picosat 913                        79     57   22     29440.52
25    SATzilla2009 C                     73     61   12     22395.73
26    VARSAT-industrial                  71     61   10     27295.84
27    VARSAT-crafted                     70     60   10     27367.38
28    SApperloT base                     70     53   17     28249.79
29    IUT BMB SAT 1.0                    63     50   13     25630.38
30    MXC                                61     50   11     28069.37
31    LySAT c                            60     48   12     24329.68
32    MiniSat 2.1 (Sat-race'08 Edition)  41     37   4      16957.09
33    minisat cumr p                     29     29   0      14078.15
34    precosat 236                       27     25   2      9522.84
35    satake                             24     24   0      11034.05
36    SApperloT hrp                      17     13   4      7724.34
37    glucose 1.0                        17     17   0      7772.56
Random problems: very bad solvers

Rank  Solver                Total  SAT  UNSAT  CPU time (s)
38    HydraSAT flat         16     15   1      5738.82
39    HydraSAT multi        16     16   0      7836.19
40    HydraSAT base         13     13   0      4930.65
41    CircUs                8      8    0      2553.24
42    ManySAT 1.1 aimd 0    7      7    0      1783.47
43    ManySAT 1.1 aimd 2    6      6    0      957.09
44    LySAT i               6      6    0      2124.49
45    CSat                  6      6    0      2263.92
46    Rsat                  5      5    0      1801.20
47    ManySAT 1.1 aimd 1    5      5    0      4144.36
48    kw                    4      4    0      635.52
49    SAT4J CORE 2.1 RC1    4      4    0      1440.19
50    MiniSAT 09z           3      3    0      1096.04
51    tts 5.0               0      0    0      0.00
52    ttsth-5-0             0      0    0      0.00
Finally
The results of the second stage!
Final results, Application, SAT+UNSAT

Rank  Solver                             Total  SAT  UNSAT  CPU time (s)
-     Virtual Best Solver (VBS)          229    91   138    153127.06
1     precosat 236                       204    79   125    180345.80
2     glucose 1.0                        204    77   127    218826.10
3     LySAT i                            197    73   124    198491.53
4     CircUs                             196    77   119    229285.44
5     SATzilla2009 I 2009-03-22          195    81   114    234743.41
6     MiniSat 2.1 (Sat-race'08 Edition)  194    78   116    144548.45
7     ManySAT 1.1 aimd 1                 193    71   122    173344.71
8     MiniSAT 09z                        193    78   115    184696.75
9     MXC                                190    79   111    180409.82
10    minisat cumr p                     190    75   115    206371.06
11    Rsat                               188    74   114    187726.95
12    SApperloT base                     186    78   108    282488.39
13    Rsat 2007-02-08                    180    69   111    195748.38
14    kw                                 175    67   108    90213.34
15    clasp 1.2.0-SAT09-32               175    60   115    163460.74
16    picosat 535                        171    76   95     209004.97
Cactus plot: application SAT+UNSAT
Final results, Application, SAT only

Rank  Solver                             SAT  CPU time (s)
-     Virtual Best Solver (VBS)          91   52336.24
1     SATzilla2009 I                     81   96609.87
2     precosat 236                       79   52903.18
3     MXC                                79   75203.55
4     MiniSat 2.1 (Sat-race'08 Edition)  78   42218.37
5     MiniSAT 09z                        78   75075.48
6     SApperloT base                     78   111286.45
7     CircUs                             77   74720.59
8     glucose 1.0                        77   90532.72
9     picosat 535                        76   84382.33
10    minisat cumr p                     75   67373.20
11    Rsat 2009-03-22                    74   85363.26
12    LySAT i                            73   81793.98
13    ManySAT 1.1 aimd 1                 71   62994.30
14    Rsat 2007-02-08                    69   47294.67
15    kw                                 67   31254.87
16    clasp 1.2.0-SAT09-32               60   25529.94
Cactus plot: application SAT (timeout matters!)
Final results, Application, UNSAT only

Rank  Solver                             UNSAT  CPU time (s)
-     Virtual Best Solver (VBS)          138    100790.82
1     glucose 1.0                        127    128293.39
2     precosat 236                       125    127442.62
3     LySAT i                            124    116697.55
4     ManySAT 1.1 aimd 1                 122    110350.41
5     CircUs                             119    154564.85
6     MiniSat 2.1 (Sat-race'08 Edition)  116    102330.08
7     MiniSAT 09z                        115    109621.27
8     clasp 1.2.0-SAT09-32               115    137930.80
9     minisat cumr p                     115    138997.86
10    Rsat                               114    102363.69
11    SATzilla2009 I                     114    138133.54
12    MXC                                111    105206.27
13    Rsat 2007-02-08                    111    148453.71
14    kw 2009-03-20                      108    58958.47
15    SApperloT base                     108    171201.93
16    picosat 535                        95     124622.64
Cactus plot: application UNSAT (timeout matters!)
Final results, Crafted, SAT+UNSAT

Rank  Solver                             Total  SAT  UNSAT  CPU time (s)
-     Virtual Best Solver (VBS)          187    108  79     62264.60
1     clasp 1.2.0-SAT09-32               156    92   64     89194.49
2     SATzilla2009 C                     155    83   72     94762.27
3     minisat SAT 2007                   150    90   60     99960.89
4     IUT BMB SAT 1.0                    149    89   60     93502.16
5     SApperloT base                     149    92   57     108298.52
6     MXC                                146    91   55     76965.59
7     VARSAT-industrial                  145    85   60     119365.13
8     precosat 236                       141    90   51     66318.44
9     LySAT c                            141    83   58     89925.84
10    SATzilla CRAFTED                   137    84   53     76856.90
11    MiniSat 2.1 (Sat-race'08 Edition)  137    87   50     78381.80
12    glucose 1.0                        135    86   49     70385.63
Cactus plot: crafted SAT+UNSAT
Final results: crafted, SAT only

Rank  Solver                             SAT  CPU time (s)
-     Virtual Best Solver (VBS)          108  21224.84
1     clasp 1.2.0-SAT09-32               92   49775.04
2     SApperloT base                     92   54682.14
3     MXC 2009-03-10                     91   39227.16
4     precosat 236                       90   34447.16
5     minisat SAT 2007                   90   48346.20
6     IUT BMB SAT 1.0                    89   45287.01
7     MiniSat 2.1 (Sat-race'08 Edition)  87   41994.77
8     glucose 1.0                        86   37779.61
9     VARSAT-industrial                  85   54521.77
10    SATzilla CRAFTED                   84   21726.48
11    SATzilla2009 C                     83   39383.44
12    LySAT c                            83   42073.80
Cactus plot: crafted SAT only
Final results: crafted, UNSAT only

Rank  Solver                             UNSAT  CPU time (s)
-     Virtual Best Solver (VBS)          79     41039.76
1     SATzilla2009 C                     72     55378.83
2     clasp 1.2.0-SAT09-32               64     39419.45
3     IUT BMB SAT 1.0                    60     48215.14
4     minisat SAT 2007                   60     51614.69
5     VARSAT-industrial                  60     64843.36
6     LySAT c                            58     47852.03
7     SApperloT base                     57     53616.38
8     MXC                                55     37738.43
9     SATzilla CRAFTED                   53     55130.42
10    precosat 236                       51     31871.28
11    MiniSat 2.1 (Sat-race'08 Edition)  50     36387.03
12    glucose 1.0                        49     32606.02
Cactus plot: crafted UNSAT only
Random, SAT (420/380 benchmarks)

Rank  Solver                     SAT      CPU time (s)
-     Virtual Best Solver (VBS)  404/371  97656.83
1     TNM                        379/353  194780.22
2     gnovelty+2                 355/352  154503.93
3     hybridGM3                  340/309  101986.32
4     SATzilla2009 R             339/335  122158.36
5     adaptg2wsat2009++          338/337  133641.90
6     gnovelty+ 2007-02-08       318/311  130357.30
7     gNovelty+-T                314/309  143439.69
8     adaptg2wsat+ 2007-02-08    298      117302.89
9     iPAWS                      288      93855.93
10    SATzilla RANDOM            181      23793.38
11    March KS 2007-02-08        177      98629.25
12    march hi                   173      90433.09
Cactus plot: random SAT only
Random, UNSAT and SAT+UNSAT

SAT+UNSAT (610 benchmarks)

Rank  Solver                Total    SAT      UNSAT  CPU time (s)
1     SATzilla2009 R        435/431  339/335  96     231051.45
2     march hi              313      173      140    261826.59
3     SATzilla RANDOM       308      181      127    186335.14
4     March KS 2007-02-08   308      177      131    258763.45

UNSAT (190 benchmarks)

Rank  Solver                UNSAT  CPU time (s)
1     march hi              140    171393.50
2     March KS 2007-02-08   131    160134.20
3     SATzilla RANDOM       127    162541.76
4     SATzilla2009 R        96     108893.09
Cactus plot: random UNSAT only
Awards summary

Application
Category    Gold            Silver          Bronze
SAT         SATzilla2009 I  precosat        MXC
UNSAT       glucose         precosat        LySAT
SAT+UNSAT   precosat        glucose         LySAT

Crafted
Category    Gold            Silver          Bronze
SAT         clasp           SApperloT       MXC
UNSAT       SATzilla2009 C  clasp           IUT BMB SAT
SAT+UNSAT   clasp           SATzilla2009 C  IUT BMB SAT

Random
Category    Gold            Silver          Bronze
SAT         TNM             gnovelty+2      hybridGM3 / adaptg2wsat2009++
UNSAT       march hi        SATzilla2009 R  -
SAT+UNSAT   SATzilla2009 R  march hi        -

Special prizes
ManySAT: best parallel SAT solver on application benchmarks
gNovelty+-T: best parallel solver on random benchmarks
MiniSAT 09z: best Minisat Hack solver
Summary of the competition
- Steady improvement since the SAT 2007 competition
- Huge improvements in SLS solvers (random SAT)
- Many good solvers!
- Many new solvers awarded!
- The portfolio approach remains quite accurate despite 55% of new benchmarks in the competition!
- The evaluation of parallel SAT solving needs to be discussed!
Discussion
Is the competition good or bad for the community?
It depends on how the results are used!
- independent public results available
- new benchmarks appear each year
- high visibility outside the community
- reward for solver designers
Main problem: how to spot good new ideas?
- purse-based scoring?
- time-independent metrics?
- submit less mature solvers too!
Provocative idea
Should we remove the random category to force incomplete-solver designers to tune their approaches to application benchmarks?