The SAT 2009 Competition Results: Does Theory Meet Practice?
Daniel Le Berre, Olivier Roussel, Laurent Simon, Andreas Goerdt, Ines Lynce, Aaron Stump
Supported by CRIL, LRI and the French ANR UNLOC project
SAT 2009 conference, Swansea, 3 July 2009

For those with a computer and WiFi access
- Rules and benchmark details: http://www.satcompetition.org/2009/
- Live results: http://www.cril.univ-artois.fr/SAT09/

The team
- Organizers: Daniel Le Berre, Olivier Roussel, Laurent Simon (all tracks except the main track)
- Judges: Andreas Goerdt, Ines Lynce, Aaron Stump
- Computer infrastructure provided by CRIL (a cluster of 96 bi-processor machines) and LRI (a cluster of 48 quad-core machines, plus one 16-core machine with 68 GB of memory for the parallel track).

The tracks
- Main track: sequential solver competition. In the competition division the source code of the solver must be made available after the competition; in the demonstration division the binary must be made available after the competition, for research purposes.
- Parallel track: solvers tailored to run on multicore computers (up to 16 cores).
- Minisat Hack track: submission of (small) patches against the latest public release of Minisat2.
- Preprocessing track: competition of preprocessors placed in front of Minisat2.

Integration of the competition in the conference
Tuesday
- Efficiently Calculating Tree Measures Using SAT: bio 2 benchmarks
- Finding Efficient Circuits Using SAT Solvers: mod circuits benchmarks
Wednesday
- On the Fly Clause Improvement: CircUs, main track
- Problem Sensitive Restart Heuristics for the DPLL Procedure: MiniSAT 09z, Minisat Hack track
- Improved Conflict-Clause Minimization Leads to Improved Propositional Proof Traces: MiniSat2hack, Minisat Hack track
- A Novel Approach to Combine an SLS and a DPLL Solver for the Satisfiability Problem: hybridGM, main track
- Building a Hybrid SAT Solver via Conflict-Driven, Look-Ahead and XOR Reasoning Techniques: MoRsat, main track
- Improving Variable Selection Process in Stochastic Local Search for Propositional Satisfiability: slstc, main track
- VARSAT: Integrating Novel Probabilistic Inference Techniques with DPLL Search: VARSAT, main track
Thursday
- Width-Based Restart Policies for Clause Learning: Rsat, main track

Common rules for all tracks
- No more than 3 solvers per submitter.
- Solvers are compared using a simple static ranking scheme.
- Results are reported for SAT, UNSAT and SAT+UNSAT benchmarks.
- Results are made available to the submitters for checking: it is the responsibility of each competitor to check that their system performed as expected!

New scoring scheme
Purse-based scoring has been used since 2005 (designed by Allen Van Gelder).
Pros:
- Takes various aspects of a solver into account (power, robustness, speed).
Cons:
- Focuses on singular solvers.
- Difficult to check (and to understand).
- Puts too much weight on singularity?
- Depends on the set of competitors.
A "Spec 2009" static scoring scheme is desirable:
- to easily compare other solvers (e.g. reference solvers) without disturbing the ranking of the competitors;
- to allow anybody to compare their own solver to the SAT 2009 competitors under similar settings.

Available metrics
- NBTOTAL: total number of benchmarks to solve.
- NBSOLVED: total number of benchmarks solved within a given timeout.
- NBUNSOLVEDSERIES: total number of benchmark series for which the solver was unable to solve any element.
- TIMEOUT: time allowed to solve a given benchmark.
- ti: time needed to solve benchmark i, within the time limit.
- PENALTY: constant used as a penalty for each benchmark not solved within the timeout.
- SERIESPENALTY: constant used as a penalty for each benchmark series of which the solver solved no member.

Spec 2009 proposals and results of the votes
- Lexicographic on (NBSOLVED, sum of ti): 9 votes.
- Cumulative time with a timeout penalty: 3 votes.
      sum(ti) + (NBTOTAL - NBSOLVED) * TIMEOUT * PENALTY
- Cumulative time with a timeout penalty, log based:
      sum(log10(1 + ti)) + (NBTOTAL - NBSOLVED) * log10((1 + TIMEOUT) * PENALTY)
- Cumulative time with timeout and robustness penalties (proposed by Marijn Heule): 4 votes.
      sum(ti) + (NBTOTAL - NBSOLVED) * TIMEOUT * PENALTY + NBUNSOLVEDSERIES * SERIESPENALTY
- SAT 2005 and 2007 purse-based scoring.
(A minimal sketch of these scoring functions is given after the benchmark-selection slides below.)

Industrial vs Application
- Many instances in the industrial category do not actually come from industry.
- "Application" better reflects the wide use of SAT technology.

Benchmark selection: random category
Based on O. Kullmann's 2005 recommendations (see [OK JSAT06] for details).

Generated benchmark parameters (ratio = clauses/variables; range and step refer to the number of variables):
         3-SAT                      5-SAT                    7-SAT
         ratio  range       step    ratio  range      step   ratio  range    step
Medium   4.26   360-560     20      21.3   90-120     10     89     60-75    5
Large    4.2    2000-18000  2000    20     700-1100   100    81     140-220  20

Number of selected benchmarks (SAT / UNKNOWN):
         3-SAT       5-SAT     7-SAT
Medium   110 / 110   40 / 40   40 / 40
Large    90 / -      50 / -    50 / -

- Balanced number of SAT/UNKNOWN benchmarks for complete solvers: 190/190.
- Specific benchmarks for complete SAT solvers: 190.
- Specific benchmarks for incomplete SAT solvers: 190.
- Satisfiability of the medium benchmarks was checked using gNovelty+.
- The large benchmarks are satisfiable by construction (ratio below the threshold).
- 100 benchmarks were generated for each setting; 10 of them were then randomly selected per setting, using the judges' random seed.
- 40 large 3-SAT benchmarks (20K-26K variables) were added for the second stage.

How to predict benchmark hardness for non-random benchmarks?
- Problem: we need benchmarks that discriminate between solvers (i.e. neither too easy nor too hard).
- Challenging benchmarks are necessary to see the limits of current approaches.
- Idea: use a small set of recent winners in each category:
  - Rsat, Minisat and picosat to rate application benchmarks;
  - March-KS, SATzilla-Crafted and Minisat for crafted benchmarks.
- easy: solved within 30 s by all of these solvers.
- hard: solved by none of these solvers (within the timeout).
- medium: the remaining instances.

Judges' decisions regarding the selection of submitted vs existing benchmarks
- No more than 10% of the benchmarks should come from the same source.
- The final selection should contain 45% existing benchmarks and 55% submitted benchmarks.
- The final selection should contain 10% easy, 40% medium and 50% hard benchmarks.
- Duplicate benchmarks found after the selection was done were simply removed from the selection; no other benchmarks were added to replace them.
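The static scoring proposals above are simple enough to restate as executable code. The following is a minimal sketch in Python (not the organisers' evaluation scripts), assuming each solver's results are given as a list of per-benchmark CPU times with None marking a benchmark not solved within the timeout; the TIMEOUT, PENALTY and SERIES_PENALTY constants are illustrative placeholders.

import math

TIMEOUT = 10000.0         # seconds; illustrative value, actual limits differ per track and stage
PENALTY = 2.0             # timeout penalty factor (illustrative)
SERIES_PENALTY = 50000.0  # penalty per series left completely unsolved (illustrative)

def lexicographic_key(times):
    """Lexicographic ranking: maximise NBSOLVED, break ties by cumulative time."""
    solved = [t for t in times if t is not None]
    return (-len(solved), sum(solved))

def cumulative_score(times):
    """Cumulative time plus a timeout penalty for every unsolved benchmark (lower is better)."""
    solved = [t for t in times if t is not None]
    unsolved = len(times) - len(solved)
    return sum(solved) + unsolved * TIMEOUT * PENALTY

def log_cumulative_score(times):
    """Log-scaled variant of the cumulative score."""
    solved = [t for t in times if t is not None]
    unsolved = len(times) - len(solved)
    return (sum(math.log10(1 + t) for t in solved)
            + unsolved * math.log10((1 + TIMEOUT) * PENALTY))

def robust_score(times, series):
    """Heule's variant: additionally penalise every series with no solved benchmark.
    `series` maps a series name to the benchmark indices it contains."""
    unsolved_series = sum(1 for idxs in series.values()
                          if all(times[i] is None for i in idxs))
    return cumulative_score(times) + unsolved_series * SERIES_PENALTY

# Example: rank two made-up solvers lexicographically (smaller key is better).
results = {"solverA": [12.3, None, 840.0, 3.1], "solverB": [10.0, 95.0, None, None]}
print(sorted(results, key=lambda s: lexicographic_key(results[s])))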
Application benchmarks submitted to the competition
- Aprove (Carsten Fuhs): term rewriting system benchmarks.
- BioInfo I (Fabien Corblin): queries to find the maximal size of a biological behaviour without cycles in discrete genetic networks.
- BioInfo II (Maria Luisa Bonet): evolutionary trees (presented on Tuesday).
- Bit Verif (Robert Brummayer): bit-precise software verification benchmarks generated by the SMT solver Boolector.
- C32SAT (submitted by Hendrik Post and Carsten Sinz): software verification benchmarks generated by the C32SAT satisfiability checker for C programs.
- Crypto (Milan Sesum): encodings of attacks on both the DES and MD5 cryptographic systems.
- Diagnosis (Anbulagan and Alban Grastien): four different encodings of discrete event systems.

Application benchmarks: classification by origin
               EASY         MEDIUM        HARD                  Total
               SAT  UNSAT   SAT  UNSAT    SAT  UNSAT  UNKNOWN
SAT RACES        6    18     43    50       3    21      -       141
SAT COMP 07      6    15     47    49       7    12     45       181
SUBMITTED 09    60    38     38    60       8    12    102       318
Total           72    71    128   159      18    45    147       640

Submitted application benchmarks per family (totals): Aprove 25, BioInfo I 20, BioInfo II 40, Bit Verif 65, C32SAT 10, Crypto 62, Diagnosis 96; 318 in total, with the same EASY/MEDIUM/HARD and SAT/UNSAT/UNKNOWN breakdown as the SUBMITTED 09 row above.

Application benchmarks, final selection (old = already existing, new = submitted in 2009)
         EASY              MEDIUM             HARD                     Total
         SAT  UNSAT  ALL   SAT  UNSAT  ALL    SAT  UNSAT  UNK   ALL
old        1     9    10    21    33    54      6    23    34    63     127
new       18     1    19    25    40    65      8    10    63    81     165
Total     19    10    29    46    73   119     14    33    97   144     292

Crafted benchmarks submitted to the competition
- Edge Matching (Marijn Heule): four encodings of edge matching problems.
- Mod Circuits (Grigory Yaroslavtsev): presented on Tuesday.
- Parity Games (Oliver Friedmann): the generator encodes parity games of a fixed size n that force the strategy improvement algorithm to require at least i iterations.
- Ramsey Cube (Philipp Zumstein).
- RB SAT (Nouredine Ould Mohamedou): random CSP problems encoded into SAT.
- Sgen (Ivor Spence): small but hard satisfiability benchmarks, either SAT or UNSAT.
- SGI (Calin Anton): subgraph isomorphism problems from the random SGI model SRSGI.

Difficulty of the crafted benchmarks, by source (totals): Edge Matching 32, Mod Circuits 19, Parity Games 24, Ramsey Cube 10, RB SAT 360, Sgen 21, SGI 107; 573 in total.

Crafted benchmarks, final selection (old/new)
         EASY   MEDIUM   HARD   Total
old        4      61      74     139
new       26      59      76     161
Total     30     120     150     300

Preprocessor track: aim
Back to the aim of the first competition:
- Many new preprocessing methods exist, but it is hard to tell which one is best.
- SatElite is widely used, but it is getting old.
- We want to encourage new methods.
- Preprocessors make it easy to enhance any solver, simply by placing them in front of it.

Preprocessor track: competitors
Competition division:
- IUT BMB SIM 1.0: Abdorrahim Bahrami, Seyed Rasoul Mousavi, Kiarash Bazargan
- ReVivAl 0.23: Cédric Piette
- ReVivAl 0.23 + SatElite: Cédric Piette
- SatElite + ReVivAl 0.23: Cédric Piette
Demonstration division:
- kw pre: Johan Alfredsson
Reference solvers:
- minisat2-core: Niklas Een and Niklas Sorensson
- minisat2-simp: Niklas Een and Niklas Sorensson

Preprocessing track: experimental settings
- Benchmarks: those of the main track, in both the application and crafted categories.
- SAT engine: the Minisat2 070721 core solver (without preprocessing).
- Comparison criterion: the preprocessor and the engine are timed together, as a black box.
- Timeout: 1200 s.
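A minimal sketch of how the black-box criterion above can be measured: the preprocessor and the Minisat2 core engine are run back to back, and their combined CPU time is charged against the single 1200 s budget. The binary names, file paths and the simple after-the-fact budget check below are illustrative placeholders, not the organisers' actual execution environment.

import resource
import subprocess

TIMEOUT = 1200.0  # CPU seconds for preprocessor + engine together

def children_cpu_time():
    """Total CPU time (user + system) consumed so far by finished child processes."""
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return usage.ru_utime + usage.ru_stime

def run_black_box(cnf_path, simplified_path="simplified.cnf"):
    start = children_cpu_time()
    # Hypothetical command lines: the preprocessor reads a CNF and writes a simplified one.
    subprocess.run(["./preprocessor", cnf_path, simplified_path], check=True)
    engine = subprocess.run(["./minisat2-core", simplified_path])
    elapsed = children_cpu_time() - start
    if elapsed > TIMEOUT:
        return "timeout", elapsed
    # Minisat-style exit codes: 10 means SAT, 20 means UNSAT.
    answer = {10: "SAT", 20: "UNSAT"}.get(engine.returncode, "UNKNOWN")
    return answer, elapsed

if __name__ == "__main__":
    print(run_black_box("bench.cnf"))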
Preprocessing track: the results in the application category
Rank  Solver                      Solved  SAT  UNSAT  CPU time (s)
  -   Virtual Best Solver (VBS)     163    67    96      33886.97
  1   kw pre                        149    58    91      34591.65
  2   ReVivAl 0.23 + SatElite       121    51    70      39093.24
  3   SatElite + ReVivAl 0.23       119    48    71      38374.13
  4   ReVivAl 0.23                  117    53    64      44067.36
  5   minisat2-simp                 116    46    70      25111.90
  6   IUT BMB SIM 1.0               111    46    65      30273.14
  7   minisat2-core                 106    47    59      23477.71

[Cactus plot: CPU time (s) vs. number of solved instances, SAT/UNSAT answers, application category, for IUT_BMB_SIM 1.0, kw_pre, minisat2-core, minisat2-simp, ReVivAl 0.23, ReVivAl 0.23 + SatElite and SatElite + ReVivAl 0.23.]

Preprocessing track: the results in the crafted category
Rank  Solver                      Solved  SAT  UNSAT  CPU time (s)
  -   Virtual Best Solver (VBS)     137    92    45      20732.67
  1   minisat2-simp                 119    76    43      23212.54
  2   SatElite + ReVivAl 0.23       119    75    44      24059.71
  3   ReVivAl 0.23 + SatElite       119    75    44      24622.54
  4   ReVivAl 0.23                  114    72    42      20435.40
  5   IUT BMB SIM 1.0               107    74    33      23163.33
  6   kw pre                        106    72    34      16298.74
  7   minisat2-core                 100    69    31      17639.05

[Cactus plot: CPU time (s) vs. number of solved instances, SAT/UNSAT answers, crafted category, same solvers as above.]

Minisat Hack track: aim
- Observe the effect of clearly identified "small changes" in a widely used solver.
- Help understand what is really important in Minisat and what can be improved.
- Ensure that all submitted solvers remain comparable (only small syntactic changes).
- Encourage easy entries to the competition (e.g. by a Master's or first-year PhD student).

Minisat Hack track: competitors
Submissions:
- APTUSAT: Alexander Mishunin and Grigory Yaroslavtsev
- BinMiniSat: Kiyonori Taniguchi, Miyuki Koshimura, Hiroshi Fujita, and Ryuzo Hasegawa
- MiniSAT 09z: Markus Iser
- MiniSat2hack: Allen Van Gelder
- minisat cumr p/r: Kazuya Masuda and Tomio Kamada
Reference solver:
- minisat2 core: Niklas Een and Niklas Sorensson
(MiniSAT 09z and MiniSat2hack were also presented as papers during the SAT 2009 conference; see the Wednesday programme above.)

Minisat Hack track: the results
Rank  Solver                      Solved  SAT  UNSAT  CPU time (s)
  -   Virtual Best Solver (VBS)     169    71    98      40959.35
  1   MiniSAT 09z                   149    59    90      37228.91
  2   minisat cumr p                142    58    84      32636.31
  3   minisat cumr r                131    60    71      29316.97
  4   APTUSAT                       123    54    69      25418.27
  5   BinMiniSat                    123    48    75      29326.67
  6   minisat2 core                 120    53    67      25600.16
  7   MiniSat2hack                  119    52    67      24024.97

[Cactus plot: CPU time (s) vs. number of solved instances, SAT/UNSAT answers, application category, for the Minisat Hack solvers.]

Parallel (multithread) track: aim
We will have to deal with multicore computers, so let us start thinking about it now.
- Naive parallelization should not work on many cores: memory access is a hard bottleneck for SAT solvers.
- We would like to observe whether multithreaded solvers scale well on a machine with 16 cores.
- Problem: not enough competitors!
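The tables above are complemented by cactus plots (one per track and category) showing, for each solver, how many instances it solves within a given amount of CPU time. A minimal sketch of how the plotted points are obtained from raw per-instance results, with None marking a timeout:

def cactus_points(times):
    """Sort the successful solve times; the k-th smallest time gives the point (x=k, y=time)."""
    solved = sorted(t for t in times if t is not None)
    return [(k, t) for k, t in enumerate(solved, start=1)]

# Example with made-up numbers: a solver that answers 4 out of 6 instances in time.
for solved, seconds in cactus_points([3.2, None, 120.5, 7.9, None, 950.0]):
    print(f"{solved} instances solved within {seconds:.1f} s")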
Parallel track: the competitors
No limit on the number of threads:
- gNovelty+-T: Duc-Nghia Pham and Charles Gretton
- satake: Kota Tsuyuzaki
- ttsth-5-0: Ivor Spence
Limited to 4 threads:
- ManySAT 1.1 aimd 0/1/2: Youssef Hamadi, Saïd Jabbour, Lakhdar Saïs

Parallel track: the settings
- Parallel solvers ran on 3 different computers:
  - 2 processors, alongside the first stage of the main track, at CRIL;
  - 4 cores, on a cluster of 4-core computers at LRI;
  - 16 cores, on one dedicated 16-core computer at LRI.
- The solvers were given 10000 s of CPU time, to be shared by their threads, which is meant to be compared with the second stage of the main track.
- Only solvers able to use all 16 cores were run on the 16-core computer.

Parallel track: the results
Application category
- 2 threads (CRIL):
  - ManySAT 1.1 aimd 1: 193 solved (71 SAT, 122 UNSAT), 173344.71 s
- 4 threads (LRI):
  - ManySAT 1.1 aimd 1: 187 solved (68 SAT, 119 UNSAT), 112384.15 s
  - ManySAT 1.1 aimd 0: 185 solved (69 SAT, 116 UNSAT), 103255.01 s
  - ManySAT 1.1 aimd 2: 181 solved (65 SAT, 116 UNSAT), 104021.63 s
  - satake: 118 solved (52 SAT, 66 UNSAT), 50543.61 s
  - ttsth-5-0: 7 solved (3 SAT, 4 UNSAT), 2274.38 s
- 16 threads (LRI):
  - satake: 106 solved (40 SAT, 66 UNSAT), 130477.38 s
  - ttsth-5-0: 7 solved (3 SAT, 4 UNSAT), 9007.53 s
Random category
- gNovelty+-T (2 threads, CRIL): 314 solved (all SAT), 143439.69 s
- gNovelty+-T (4 threads, LRI): 296 solved (all SAT), 95118.33 s
- gNovelty+-T (16 threads, LRI): 237 solved (all SAT), 68173.49 s

The main track: competitors
- adaptg2wsat2009/++: Chu Min Li, Wanxia Wei
- CircUs: Hyojung Han
- clasp 1.2.0-SAT09-32: Benjamin Kaufmann
- CSat 2009-03-22: Guanfeng Lv, Qian Wang, Kaile Su
- glucose 1.0: Gilles Audemard and Laurent Simon
- gnovelty+/2/2-H: Duc-Nghia Pham and Charles Gretton
- Hybrid2: Wanxia Wei, Chu Min Li, and Harry Zhang
- hybridGM 1/3/7: Adrian Balint
- HydraSAT base/flat/multi: Christoph Baldow, Friedrich Gräter, Steffen Hölldobler, Norbert Manthey, Max Seelemann, Peter Steinke
- iPAWS: John Thornton and Duc Nghia Pham
- IUT BMB SAT 1.0: Abdorrahim Bahrami, Seyed Rasoul Mousavi, Kiarash Bazargan
- LySAT c/i: Youssef Hamadi, Saïd Jabbour, Lakhdar Saïs
- march hi/nn: Marijn Heule
- MoRsat: Jingchao Chen
- MXC: David Bregman
- NCVWr: Wanxia Wei, Chu Min Li, and Harry Zhang
- picosat 913: Armin Biere
- precosat 236: Armin Biere
- Rsat: Knot Pipatsrisawat and Adnan Darwiche
- SApperloT base/hrp: Stephan Kottler
- SAT4J CORE 2.1 RC1: Daniel Le Berre
- SATzilla2009 C/I/R: Lin Xu, Frank Hutter, Holger H. Hoos and Kevin Leyton-Brown
- slstc 1.0: Anton Belov, Zbigniew Stachniak
- TNM: Wanxia Wei and Chu Min Li
- tts-5-0: Ivor Spence
- VARSAT-crafted/random/industrial: Eric Hsu
- kw 2009-03-20: Johan Alfredsson
- MiniSat 2.1 (Sat-race'08 Edition): Niklas Sorensson, Niklas Een

The main track: reference solvers from 2007
- Random: adaptg2wsat+ (Wanxia Wei, Chu-Min Li and Harry Zhang), gnovelty+ (Duc Nghia Pham and Charles Gretton), March KS (Marijn Heule and Hans van Maaren), SATzilla RANDOM (Lin Xu, Frank Hutter, Holger H. Hoos and Kevin Leyton-Brown)
- Application: picosat 535 (Armin Biere), Rsat 07 (Knot Pipatsrisawat and Adnan Darwiche)
- Crafted: SATzilla CRAFTED and minisat SAT 2007, by respectively Lin Xu, Frank Hutter, Holger H.
Hoos and Kevin Leyton-Brown Niklas Sorensson and Niklas Een Main track : phase 1, application Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 36/65 Solver Virtual Best Solver (VBS) precosat 236 MiniSat 2.1 LySAT i glucose 1.0 MiniSAT 09z kw ManySAT 1.1 aimd 1 ManySAT 1.1 aimd 0 MXC ManySAT 1.1 aimd 2 CircUs Rsat SATzilla2009 I minisat cumr p picosat 913 clasp 1.2.0-SAT09-32 Rsat 2007 SApperloT base picosat 535 Total 196 164 155 153 152 152 150 149 149 147 145 144 143 142 141 139 138 133 129 126 SAT 79 65 65 57 54 59 58 54 54 62 51 59 53 60 58 63 53 56 55 59 UNSAT 117 99 90 96 98 93 92 95 95 85 94 85 90 82 83 76 85 77 74 67 CPU Time 33863.84 37379.67 27011.56 35271.11 34784.84 37872.87 35080.23 34834.19 38639.59 27968.90 34242.50 36680.28 31000.89 33608.36 29304.08 34013.47 33317.37 28975.23 31762.78 33871.13 Main track : phase 1, application continued Rank 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 37/65 Solver Virtual Best Solver (VBS) LySAT c IUT BMB SAT 1.0 HydraSAT Base HydraSAT-Flat Flat VARSAT-industrial SApperloT hrp HydraSAT-Multi SATzilla2009 C VARSAT-crafted SAT4J CORE 2.1 RC1 satake CSat 2009-03-22 SATzilla2009 R VARSAT-random march hi march nn Hybrid2 adaptg2wsat2009 adaptg2wsat2009++ Total 196 123 116 116 115 110 107 106 106 99 95 92 91 59 59 21 21 12 11 11 SAT 79 51 46 53 51 49 42 49 45 44 46 40 40 36 25 9 10 11 8 8 UNSAT 117 72 70 63 64 61 65 57 61 55 49 52 51 23 34 12 11 1 3 3 CPU Time 33863.84 26865.49 20974.40 26856.33 26016.34 22753.77 20954.19 16308.48 25974.72 23553.01 25380.84 18309.62 20461.14 6260.03 16836.65 5170.80 6189.51 3851.84 1746.45 1806.37 Main track : phase 1, application continued two Rank 39 40 41 42 43 44 45 46 47 48 49 50 38/65 Solver Virtual Best Solver (VBS) slstc 1.0 tts NCVWr iPAWS ttsth-5-0 hybridGM7 gnovelty+ gNovelty+-T TNM hybridGM 1 hybridGM3 gnovelty+2 Total 196 10 10 10 8 8 7 7 7 6 5 5 4 SAT 79 9 6 9 8 4 7 7 7 5 5 5 4 UNSAT 117 1 4 1 3 4 1 - CPU Time 33863.84 2093.29 2539.03 2973.72 1400.34 2937.42 468.76 1586.83 1826.46 1157.83 731.62 1103.11 91.85 First stage : crafted Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 39/65 Solver Virtual Best Solver (VBS) clasp 1.2.0-SAT09-32 SATzilla2009 I SATzilla2009 C MXC 2009-03-10 precosat 236 IUT BMB SAT 1.0 minisat SAT 2007 SATzilla CRAFTED MiniSat 2.1 (Sat-race’08 Edition) glucose 1.0 VARSAT-industrial SApperloT base picosat 913 LySAT c CircUs kw Rsat SATzilla2009 R ManySAT 1.1 aimd 1 Total 194 131 128 125 124 122 120 119 114 114 114 113 113 112 112 107 106 105 104 103 SAT 124 78 86 73 80 81 76 76 82 74 75 73 73 80 70 70 72 71 78 72 UNSAT 70 53 42 52 44 41 44 43 32 40 39 40 40 32 42 37 34 34 26 31 CPU Time 19204.67 22257.76 21700.11 16701.85 22256.57 22844.50 22395.97 22930.58 18066.80 18107.02 20823.96 22306.77 22826.65 17111.73 21080.61 16148.01 16460.37 14010.73 14460.38 14991.64 First stage : crafted continued Rank 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 40/65 Solver Virtual Best Solver (VBS) HydraSAT-Multi HydraSAT-Flat SApperloT hrp minisat cumr p VARSAT-crafted LySAT i ManySAT 1.1 aimd 2 ManySAT 1.1 aimd 0 HydraSAT base MiniSAT 09z VARSAT-random satake iPAWS SAT4J CORE 2.1 RC1 adaptg2wsat2009 adaptg2wsat2009++ Hybrid2 CSat Total 194 103 102 102 102 102 100 99 99 99 99 84 75 71 71 70 66 66 65 SAT 124 70 70 69 75 61 69 70 71 66 72 47 55 71 50 68 64 66 50 UNSAT 70 33 32 33 27 41 31 29 28 33 27 37 20 21 2 2 15 CPU Time 19204.67 20825.53 17796.15 20647.84 23176.38 23304.40 14874.18 14211.48 15251.61 16718.94 17027.31 14023.19 16261.12 7352.89 15136.95 9425.51 
5796.69 10425.56 10319.33 First stage : crafted continued two Rank 38 39 40 41 42 43 44 45 46 47 48 49 50 51 41/65 Solver Virtual Best Solver (VBS) march hi TNM March KS march nn gnovelty+ gNovelty+-T hybridGM hybridGM3 NCVWr tts 5-0 ttsth-5-0 gnovelty+2 hybridGM7 slstc 1.0 Total 194 63 62 61 58 54 53 51 51 48 46 46 46 38 33 SAT 124 45 62 42 43 54 53 51 51 48 25 24 44 38 33 UNSAT 70 18 19 15 21 22 2 - CPU Time 19204.67 10622.02 8181.19 9021.93 6232.17 5853.95 5073.82 5298.30 6737.29 12116.63 2507.80 4020.68 4840.28 4385.10 4228.67 Random results Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Solver Virtual Best Solver (VBS) SATzilla2009 R TNM gnovelty+2 hybridGM3 hybridGM7 adaptg2wsat2009++ hybridGM 1 adaptg2wsat2009 Hybrid2 gnovelty+ NCVWr gnovelty+ SATzilla RANDOM gNovelty+-T adaptg2wsat+ iPAWS march hi march nn March KS SATzilla2009 I Total 459 365 317 305 299 298 297 294 294 290 281 278 272 268 266 265 258 247 243 239 145 SAT 359 299 317 305 299 298 297 294 294 290 281 278 272 177 266 265 258 147 145 149 90 UNSAT 100 66 91 100 98 90 55 CPU Time 62339.75 51997.72 35346.17 26616.48 23272.79 25567.23 26432.65 23732.78 26658.47 30134.40 25523.72 31132.10 21956.28 42919.16 22823.37 22333.18 19296.93 65568.89 66494.85 57869.03 37645.86 Random results : weak solvers Rank 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 43/65 Solver slstc 1.0 clasp 1.2.0-SAT09-32 VARSAT-random picosat 913 SATzilla2009 C VARSAT-industrial VARSAT-crafted SApperloT base IUT BMB SAT 1.0 MXC LySAT c MiniSat 2.1 (Sat-race’08 Edition) minisat cumr p precosat 236 satake SApperloT hrp glucose 1.0 Total 118 84 83 79 73 71 70 70 63 61 60 41 29 27 24 17 17 SAT 118 66 72 57 61 61 60 53 50 50 48 37 29 25 24 13 17 UNSAT 18 11 22 12 10 10 17 13 11 12 4 2 4 - CPU Time 13250.77 32979.32 30273.41 29440.52 22395.73 27295.84 27367.38 28249.79 25630.38 28069.37 24329.68 16957.09 14078.15 9522.84 11034.05 7724.34 7772.56 Random problems : very bad solvers Rank 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 44/65 Solver HydraSAT-Flat HydraSAT-Multi HydraSAT Base CircUs ManySAT 1.1 aimd 0 ManySAT 1.1 aimd 2 LySAT i CSat Rsat ManySAT 1.1 aimd 1 kw SAT4J CORE 2.1 RC1 MiniSAT 09z tts 5.0 ttsth-5-0 Total 16 16 13 8 7 6 6 6 5 5 4 4 3 0 0 SAT 15 16 13 8 7 6 6 6 5 5 4 4 3 0 0 UNSAT 1 0 0 CPU Time 5738.82 7836.19 4930.65 2553.24 1783.47 957.09 2124.49 2263.92 1801.20 4144.36 635.52 1440.19 1096.04 0.00 0.00 Finally The results of the second stage ! 
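Every results table that follows starts with a Virtual Best Solver (VBS) row. Assuming the usual definition (an oracle portfolio that, on each instance, behaves like the fastest solver that succeeds; the slides do not spell this out), a minimal sketch of how that row can be derived from the individual results:

def virtual_best(all_results):
    """all_results: {solver: [time or None per instance]}, all over the same instance list.
    Returns (#instances solved, total CPU time on the solved instances) for the VBS."""
    n = len(next(iter(all_results.values())))
    solved, total_time = 0, 0.0
    for i in range(n):
        times = [r[i] for r in all_results.values() if r[i] is not None]
        if times:                      # at least one solver succeeded on instance i
            solved += 1
            total_time += min(times)   # the VBS is charged the fastest solver's time
    return solved, total_time

# Example with made-up numbers for three solvers on four instances.
print(virtual_best({
    "A": [5.0, None, 60.0, None],
    "B": [7.0, 30.0, None, None],
    "C": [None, 40.0, 50.0, None],
}))  # -> (3, 85.0)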
45/65 Final results, Application, SAT+UNSAT Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Solver Virtual Best Solver (VBS) precosat 236 glucose 1.0 LySAT i CircUs SATzilla2009 I 2009-03-22 MiniSat 2.1 (Sat-race’08 Edition) ManySAT 1.1 aimd 1 MiniSAT 09z MXC minisat cumr p Rsat SApperloT base Rsat 2007-02-08 kw clasp 1.2.0-SAT09-32 picosat 535 Total 229 204 204 197 196 195 194 193 193 190 190 188 186 180 175 175 171 SAT 91 79 77 73 77 81 78 71 78 79 75 74 78 69 67 60 76 UNSAT 138 125 127 124 119 114 116 122 115 111 115 114 108 111 108 115 95 CPU Time 153127.06 180345.80 218826.10 198491.53 229285.44 234743.41 144548.45 173344.71 184696.75 180409.82 206371.06 187726.95 282488.39 195748.38 90213.34 163460.74 209004.97 Cactus plot : application SAT+UNSAT 47/65 Final results, Application, SAT only Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Solver Virtual Best Solver (VBS) SATzilla2009 I precosat 236 MXC MiniSat 2.1 (Sat-race’08 Edition) MiniSAT 09z SApperloT base CircUs glucose 1.0 picosat 535 minisat cumr p Rsat 2009-03-22 LySAT i ManySAT 1.1 aimd 1 Rsat 2007-02-08 kw clasp 1.2.0-SAT09-32 SAT 91 81 79 79 78 78 78 77 77 76 75 74 73 71 69 67 60 CPU Time 52336.24 96609.87 52903.18 75203.55 42218.37 75075.48 111286.45 74720.59 90532.72 84382.33 67373.20 85363.26 81793.98 62994.30 47294.67 31254.87 25529.94 Final results, Application, SAT only Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Solver Virtual Best Solver (VBS) SATzilla2009 I precosat 236 MXC MiniSat 2.1 (Sat-race’08 Edition) MiniSAT 09z SApperloT base CircUs glucose 1.0 picosat 535 minisat cumr p Rsat 2009-03-22 LySAT i ManySAT 1.1 aimd 1 Rsat 2007-02-08 kw clasp 1.2.0-SAT09-32 SAT 91 81 79 79 78 78 78 77 77 76 75 74 73 71 69 67 60 CPU Time 52336.24 96609.87 52903.18 75203.55 42218.37 75075.48 111286.45 74720.59 90532.72 84382.33 67373.20 85363.26 81793.98 62994.30 47294.67 31254.87 25529.94 Cactus plot : application SAT (timeout matters !) 49/65 Final results, Application, UNSAT only Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Solver Virtual Best Solver (VBS) glucose 1.0 precosat 236 LySAT i ManySAT 1.1 aimd 1 CircUs MiniSat 2.1 (Sat-race’08 Edition) MiniSAT 09z clasp 1.2.0-SAT09-32 minisat cumr p Rsat SATzilla2009 I MXC Rsat 2007-02-08 kw 2009-03-20 SApperloT base picosat 535 UNSAT 138 127 125 124 122 119 116 115 115 115 114 114 111 111 108 108 95 CPU Time 100790.82 128293.39 127442.62 116697.55 110350.41 154564.85 102330.08 109621.27 137930.80 138997.86 102363.69 138133.54 105206.27 148453.71 58958.47 171201.93 124622.64 Cactus plot : application UNSAT (timeout matters !) 
51/65 Final results, Crafted, SAT+UNSAT Rank 1 2 3 4 5 6 7 8 9 10 11 12 Solver Virtual Best Solver (VBS) clasp 1.2.0-SAT09-32 SATzilla2009 C minisat SAT 2007 IUT BMB SAT 1.0 SApperloT base MXC VARSAT-industrial precosat 236 LySAT c SATzilla CRAFTED MiniSat 2.1 (Sat-race’08 Edition) glucose 1.0 Total 187 156 155 150 149 149 146 145 141 141 137 137 135 SAT 108 92 83 90 89 92 91 85 90 83 84 87 86 UNSAT 79 64 72 60 60 57 55 60 51 58 53 50 49 CPU Time 62264.60 89194.49 94762.27 99960.89 93502.16 108298.52 76965.59 119365.13 66318.44 89925.84 76856.90 78381.80 70385.63 Cactus plot : crafted SAT+UNSAT 53/65 Final results : crafted SAT only Rank 1 2 3 4 5 6 7 8 9 10 11 12 54/65 Solver Virtual Best Solver (VBS) clasp 1.2.0-SAT09-32 SApperloT base MXC 2009-03-10 precosat 236 minisat SAT 2007 IUT BMB SAT 1.0 MiniSat 2.1 (Sat-race’08 Edition) glucose 1.0 VARSAT-industrial SATzilla CRAFTED SATzilla2009 C LySAT c SAT 108 92 92 91 90 90 89 87 86 85 84 83 83 CPU Time 21224.84 49775.04 54682.14 39227.16 34447.16 48346.20 45287.01 41994.77 37779.61 54521.77 21726.48 39383.44 42073.80 Cactus plot : crafted SAT only 55/65 Final results : crafted UNSAT only Rank 1 2 3 4 5 6 7 8 9 10 11 12 56/65 Solver Virtual Best Solver (VBS) SATzilla2009 C clasp 1.2.0-SAT09-32 IUT BMB SAT 1.0 minisat SAT 2007 VARSAT-industrial LySAT c SApperloT base MXC SATzilla CRAFTED precosat 236 MiniSat 2.1 (Sat-race’08 Edition) glucose 1.0 UNSAT 79 72 64 60 60 60 58 57 55 53 51 50 49 CPU Time 41039.76 55378.83 39419.45 48215.14 51614.69 64843.36 47852.03 53616.38 37738.43 55130.42 31871.28 36387.03 32606.02 Cactus plot : crafted UNSAT only 57/65 Random, SAT (420/380 benchmarks) Rank 1 2 3 4 5 6 7 8 9 10 11 12 58/65 Solver Virtual Best Solver (VBS) TNM gnovelty+2 hybridGM3 SATzilla2009 R adaptg2wsat2009++ gnovelty+ 2007-02-08 gNovelty+-T adaptg2wsat+ 2007-02-08 iPAWS SATzilla RANDOM March KS 2007-02-08 march hi SAT 404 /371 379/353 355/352 340/309 339/335 338/337 318/311 314/309 298 288 181 177 173 CPU Time 97656.83 194780.22 154503.93 101986.32 122158.36 133641.90 130357.30 143439.69 117302.89 93855.93 23793.38 98629.25 90433.09 Cactus plot : random SAT only 59/65 Random, UNSAT/SAT+UNSAT Rank 1 2 3 4 1 2 3 4 60/65 Solver Total SAT UNSAT SAT+UNSAT (610 benchmarks) SATzilla2009 R 435/431 339/335 96 march hi 313 173 140 SATzilla RANDOM 308 181 127 March KS 2007-02-08 308 177 131 UNSAT (190 march hi 140 March KS 2007-02-08 131 SATzilla RANDOM 127 SATzilla2009 R 96 CPU Time 231051.45 261826.59 186335.14 258763.45 benchmarks) 171393.50 160134.20 162541.76 108893.09 Cactus plot : random UNSAT only 61/65 Awards Summary Category Gold SAT UNSAT SAT+UNSAT SATzilla2009 I glucose precosat SAT UNSAT SAT+UNSAT clasp SATzilla2009 C clasp SAT UNSAT SAT+UNSAT Special prizes TNM March hi SATzilla2009 R Silver Application precosat precosat glucose Crafted SApperloT clasp SATzilla2009 C Random gNovelty2+ SATzilla2009 R March hi Bronze MXC Lysat Lysat MXC IUT BMB SAT IUT BMB SAT hybridGM3 / adaptg2wsat2009++ - ManySAT Best parallel SAT solver for application benchmarks gNovelty+-T Best parallel solver for random benchmarks Minisat 09z Best Minisat Hack solver 62/65 Summary of the competition I Steady improvement since SAT competition 2007 I Huge improvements in SLS solvers (Random SAT) I Many good solvers ! I Many new solvers awarded ! I The portfolio approach is quite accurate despite 55% of new benchmarks in the competition ! I Parallel SAT solving evaluation needs to be discussed ! 63/65 Discussion Is the competition good or bad for the community ? 
It depends on how the results are used!
- Independent public results are available.
- New benchmarks appear each year.
- High visibility outside the community.
- A reward for solver designers.
Main problem: how do we spot good new ideas?
- Purse-based scoring?
- Time-independent metrics?
- Also submit less mature solvers!

Provocative idea
Should we remove the random category, to force incomplete-solver designers to tune their approaches to application benchmarks?