KTH/CSC

Empirical investigations of local search on random KSAT for K = 3, 4, 5, 6...
CDInfos0803 Program, Kavli Institute for Theoretical Physics China
Erik Aurell, KTH Royal Institute of Technology, Stockholm, Sweden
March 4, 2008
Erik Aurell, KTH Computational Biology

Circumspect descent prevails in solving combinatorial optimization problems
Mikko Alava, John Ardelius, E.A., Petteri Kaski, Supriya Krishnamurthy, Pekka Orponen, Sakari Seitz, arXiv:0711.4902 (Nov 30, 2007)
Earlier work by E.A., Scott Kirkpatrick and Uri Gordon (2004), Alava, Orponen and Seitz (2005), Ardelius and E.A. (2006), Ardelius, E.A. and Krishnamurthy (2007), and many others

Why did we get into this?

Let me give three reasons

It is a fundamental and practically important problem
...which I learnt about working for the Swedish railways
E.A., J. Ekman, Capacity of single rail yards [in Swedish], Swedish Railway Authority Technical reports (2002)

They have potential, under-used applications in systems biology
As an example I will describe consulting work we did for Global Genomics, a now defunct Swedish biotech company. They claimed to have a new method to measure global gene expression. Many of their ideas were in fact from S. Brenner and K. Livak, PNAS 86 (1989), 8902-06, and K. Kato, Nucleic Acids Res. 23 (1995), 3685-3690.

The problem is that using only one Type IIS restriction enzyme, there is not enough information in the data to determine which genes were expressed (many genes could have given rise to a given peak). Kato (1995) tried using several enzymes of the same type sequentially. Problem: loss of accuracy, complicated.
Global Genomics AB’s invention was to use several enzymes in parallel.

The Global Genomics invention led to an optimal matching problem
Matching the observations to a gene database gives a bipartite graph, where a link between a gene g and an observation o represents the fact that o could be an observation of g.
The best matching can be represented as a subgraph of the graph above plus expression levels.
[Figure: two panels, "All possible matchings" and "An optimal matching", each linking the gene database (gene 1, gene 2, gene 3) to the observations]
A. Ameur, E.A., M. Carlsson, J. Orzechowski Westholm, “Global gene expression analysis by combinatorial optimization”, In Silico Biology 4 (0020) (2004)

Testing using the FANTOM database of mouse cDNA (RIKEN)
For in silico testing we used the FANTOM database of full-length mouse cDNA, available at genome.gsc.riken.go.jp
We used an early 2003 version of 60 770 RIKEN full-length clones, partitioned into 33 409 groups representing different genes. This second list can be taken as a proxy for all genes in mouse.
Principle of in silico tests:
1. Select a fraction of genes
2. Generate random expression levels
3. Generate random peak and length perturbations
4. Run the algorithm
5. Compare

Both methods solve the optimization according to the given criteria when the perturbation parameters are small enough
The methods are comparable at low or moderate fraction of genes expressed
Local search is superior at high fraction of genes expressed
Ameur et al. (2004)

In theory, combinatorial optimization and constraint satisfiability give rise to many of the computationally hardest problems

In practice, combinatorial optimization and constraint satisfaction problems are routinely solved by complete methods (branch-and-bound), local search heuristics, mixed integer programming, etc.

How is this possible?
Following many others we will look at a simple model

Random K-satisfiability problems
Let there be N Boolean variables, and 2N literals
Let there be M logical propositions (clauses) $P_a = L_{a_1} \lor L_{a_2} \lor \dots \lor L_{a_k}$
A clause expresses that one out of $2^k$ possible configurations of k variables is forbidden.
Clauses are picked randomly (with replacement) from all possible k-tuples of variables.
Can all M clauses be satisfied simultaneously? $P = P_1 \land P_2 \land \dots \land P_M$

The 4.3 Point
KSAT is characterized by the number of clauses per variable, $\alpha = M/N$
Phase transition from almost surely SAT to almost surely UNSAT
Several simple algorithms take a.s. linear time for α small enough
Algorithms take longest time (on the average) close to the phase boundary
[Figure: DP calls (curves for 20, 40 and 50 variables) and probability of satisfiability (50% sat marked) versus ratio of clauses to variables, from 2 to 8]
Mitchell, Selman, and Levesque 1991; Mitchell, Selman, Levesque (AAAI-92); Kirkpatrick, Selman, Science 264:1297 (1994)

A statistical physics prediction, now about a decade old, for 3SAT and other constraint satisfaction problems: a clustering transition
[Diagram along the axis $\alpha = M/N$: SAT with one state below $\alpha_d$; SAT with many states between $\alpha_d$ and $\alpha_{cr}$; UNSAT with many states and no solutions above $\alpha_{cr}$]
3SAT threshold values: $\alpha_d = 3.92$, $\alpha_{cr} = 4.27$

The Mezard, Palassini and Rivoire 2005 prediction for 3COL
Obtained by the entropic cavity method, computing within a 1RSB scenario the number of states with a given number of solutions
[Figure labels: one green state; many green states, but most solutions in one or a few big states]

The latest clustering predictions for KSAT, K > 3, are in F. Krzakała, A. Montanari, F. Ricci-Tersenghi, G. Semerjian, L. Zdeborová, “Gibbs states and the set of solutions of random constraint satisfaction problems”, PNAS 2007 Jun 19;104(25):10318-23.
[Figure labels: single cluster; many small clusters but most solutions in a few of them; many clusters and solutions are found in a large set of all about equal size]

The cluster condensation transition in F. Krzakała et al. (2007)
[Figure labels: many clusters and solutions are found in a large set of all about equal size; most clusters disappear, and again most solutions are found in a small number of them]

So does clustering in fact pose a problem to simple local search?
Are the known features of the static landscape relevant to dynamics?
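
The random K-SAT ensemble defined on the earlier slide can be sketched in a few lines. This is a minimal illustration, not the code used for the results in this talk: the function names `random_ksat` and `count_unsat`, the signed-integer encoding of literals, and the choice to draw k distinct variables per clause are all assumptions made here.

```python
import random


def random_ksat(n_vars, alpha, k=3, seed=None):
    """Draw M = round(alpha * N) random k-clauses over variables 1..N.

    A literal is a signed integer: +v means variable v, -v its negation.
    (This encoding is an assumption of the sketch.)
    """
    rng = random.Random(seed)
    n_clauses = round(alpha * n_vars)
    clauses = []
    for _ in range(n_clauses):
        # Assumption: the k variables inside one clause are distinct.
        variables = rng.sample(range(1, n_vars + 1), k)
        # Each variable is negated with probability 1/2.
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return clauses


def count_unsat(clauses, assignment):
    """Energy: number of clauses with no true literal.

    assignment[v] is the Boolean value of variable v; index 0 is unused.
    """
    return sum(
        not any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


if __name__ == "__main__":
    n = 100
    clauses = random_ksat(n_vars=n, alpha=4.2, k=3, seed=0)
    rng = random.Random(1)
    assignment = [False] + [rng.random() < 0.5 for _ in range(n)]
    print(len(clauses), "clauses,", count_unsat(clauses, assignment), "unsatisfied")
```

The number of unsatisfied clauses returned by `count_unsat` is the "energy" that the local search heuristics try to bring to zero.
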

A landscape that could be difficult for local search
[Figure, courtesy Sui Huang: landscape with labels "local minima", "another local minimum", "global minimum"]

Papadimitriou invented a stochastic local search algorithm for SAT problems in 1991, today often referred to as RandomWalkSAT:
Pick an unsatisfied clause
Pick a variable in that clause, flip it, loop
Not quite like an equilibrium physics process in detailed balance, because only variables in unsatisfied clauses are updated
Solves 3SAT in linear time on average up to α about 2.7

A benchmark algorithm is Cohen-Kautz-Selman walksat
www.cs.washington.edu/homes/kautz/walksat
Pick an unsatisfied clause
Compute for each variable in the clause the breakclause
breakclause is the number of other, presently satisfied, clauses that would be broken if the variable is flipped
If any variable has breakclause zero, flip it, loop
With probability p, flip the variable with least breakclause, loop
Else, with probability 1-p, flip a random variable in the clause, loop
Solves 3SAT in linear time on average up to α about 4.15, using default parameters from the public repository (Aurell, Gordon, Kirkpatrick (2004))

We have worked with the Focused Metropolis Search (FMS) algorithm, and ASAT, an alternative version
ASAT:
If you have a solution, output and stop
Pick an unsatisfied clause
Pick randomly a variable in the clause
If flipping that variable decreases the energy, do so
If not, flip the variable with probability p
Loop
Also not in detailed balance (also tries only unsat clauses)
Parameter p has to be optimized. The optimal value depends on the problem class, e.g. about 0.2 for 3SAT

We have a new algorithm ChainSAT which by design never goes up in energy
Algorithm 1. ChainSAT
  S = random assignment of values to the variables
  chaining = FALSE
  while S is not a solution do
    if not chaining then
      C = a clause not satisfied by S selected uniformly at random
      V = a variable in C selected uniformly at random
    end if
    ΔE = change in the number of unsatisfied clauses if V is flipped in S
    if ΔE = 0 then
      flip V in S
    else if ΔE < 0 then
      with probability p1
        flip V in S
      end with
    end if
    chaining = FALSE
    if ΔE > 0 then
      with probability 1 - p2
        C = a clause that is satisfied only by V selected uniformly at random
        X = a variable in C other than V selected uniformly at random
        V = X
        chaining = TRUE
      end with
    end if
  end while

[Figure: Solution course of a good local search (ASAT at 4.2)]

[Figure: Runtimes for ASAT on 3SAT at α=4.21, Ardelius and E.A. (2006)]

[Figure: Runtimes for ASAT on 3SAT at α=4.25, Ardelius and E.A. (2006)]

[Figure: FMS on 4SAT at α=9.6]

[Figure: ChainSAT on 4SAT, 5SAT, 6SAT]

Do we know how local search fails on hard CSPs?
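
The ChainSAT pseudocode given earlier (Algorithm 1) can be turned into a small runnable sketch. This is an illustration under stated assumptions, not the implementation behind the results in this talk: the function name `chainsat`, the signed-integer literal encoding, the step limit `max_steps`, and the default values of `p1` and `p2` are all chosen here for readability. The sketch also assumes every clause has at least two distinct variables, and it rescans all clauses at every step, which is far slower than a production solver would be.

```python
import random


def chainsat(clauses, n_vars, p1=0.5, p2=0.1, max_steps=200_000, seed=None):
    """Sketch of ChainSAT; returns a satisfying assignment or None.

    Literals are signed integers (+v / -v); assignment index 0 is unused.
    p1, p2 and max_steps are illustrative values, not tuned ones.
    """
    rng = random.Random(seed)
    assign = [False] + [rng.random() < 0.5 for _ in range(n_vars)]
    occurs = [[] for _ in range(n_vars + 1)]  # clause indices per variable
    for idx, clause in enumerate(clauses):
        for lit in clause:
            occurs[abs(lit)].append(idx)

    def is_true(lit):
        return assign[abs(lit)] == (lit > 0)

    def n_true(idx):
        return sum(is_true(lit) for lit in clauses[idx])

    def unsat_clauses():
        return [i for i in range(len(clauses)) if n_true(i) == 0]

    chaining = False
    var = None
    for _ in range(max_steps):
        unsat = unsat_clauses()  # O(M) rescan, kept simple on purpose
        if not unsat:
            return assign
        if not chaining:
            var = abs(rng.choice(clauses[rng.choice(unsat)]))
        # Clauses a flip of var would fix (currently no true literal) ...
        fixed = [i for i in occurs[var] if n_true(i) == 0]
        # ... and clauses it would break (var holds the only true literal).
        broken = [
            i for i in occurs[var]
            if n_true(i) == 1
            and any(abs(lit) == var and is_true(lit) for lit in clauses[i])
        ]
        delta = len(broken) - len(fixed)
        if delta == 0 or (delta < 0 and rng.random() < p1):
            assign[var] = not assign[var]
        chaining = False
        if delta > 0 and rng.random() < 1 - p2:
            # Never go uphill: chain to another variable of a clause
            # that is satisfied only by var.
            clause = clauses[rng.choice(broken)]
            var = abs(rng.choice([lit for lit in clause if abs(lit) != var]))
            chaining = True
    return assign if not unsat_clauses() else None


if __name__ == "__main__":
    # Tiny satisfiable instance: (x1 or x2) and (not x1 or x2) and (x1 or not x2)
    demo = [[1, 2], [-1, 2], [1, -2]]
    print(chainsat(demo, n_vars=2, seed=0))
```

Note that an assignment is only ever changed when ΔE is zero or negative, which is the "never goes up in energy" property stated for ChainSAT.
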
The first guess would be that local search fails if solutions have little slackness, which is expressed by Parisi whitening

Several proposed clustering transitions do not stop circumspect descent
Not even an algorithm which would be trapped in a potential well of any depth
The reason why local search eventually fails is unknown

Clustering has been rigorously proven for KSAT with K equal to 8 and greater
For K less than 8 there are cavity method predictions
How do numerics compare to these?

Solve a 3SAT instance L times with a stochastic local search (ASAT)
Compute the overlaps between these L solutions
See how that quantity changes with α
[Figure panels: average overlap; variance of the overlap]
Ardelius, E.A. and Krishnamurthy (2007)

The rank-ordered plots of the overlaps in a chain of instances with increasing number of clauses display a transition around 4.25
α ranges from 3.5 to 4.3
N is 2000
For α = 4.3, repeat until a solvable instance is found
For α <= 4.3, repeat until ASAT finds many solutions on the instance
Ardelius, E.A. and Krishnamurthy (2007)

Generate many chains of instances, check for the α at which all solutions found have an overlap of at least 80%
N is 100, 200, 400, 1000, 2000
Number of chains at each N is 110
If a chain does not reach the 80% threshold, repeat
Threshold is between 4.25 and 4.27, could in fact coincide with SAT/UNSAT for 3SAT
This is not in contradiction with the theoretical predictions of Krzakala et al. (2007), who do not address 3SAT
Ardelius, E.A. and Krishnamurthy (2007)

[Figure: FMS diffusion, 4SAT, different α]

[Figure: FMS diffusion, 4SAT, α=9.6]

[Figure: FMS diffusion, 4SAT, different N]

As far as numerics can tell, if there are clusters beyond the clustering transitions in 4SAT, they are not separated by overlap

How does local search compare to more sophisticated (and specialized) methods that we will hear about at this school?
(here I have to go to PDF)

A question to the experts:
Which is (or are) the good metrics to compare runtimes?
Wall-clock time? Some intrinsic count?

Conclusions
Local heuristics (walksat, Focused Metropolis Search, Focused Record-to-Record Travel, ASAT, ChainSAT) are effective on hard random 3SAT, 4SAT… problems
This is true even if the heuristic by design can never get out of a potential well, of any depth (ChainSAT). Traps in the landscape do not stop these algorithms.
There seems to be a “clustering condensation” transition in 3SAT very close to the SAT/UNSAT transition.
If there is a clustering transition in 4SAT, these clusters do not seem to be separated in overlap (in contrast to K equal to 8 and greater)

Thanks to
John Ardelius, Supriya Krishnamurthy
KTH/CSC
Mikko Alava, Petteri Kaski, Pekka Orponen, Sakari Seitz

Is the search trapped in “potential wells” of metastable states?
[Figure panels: energy as function of time; distance to target]
N is 1000, α is 4.2
ASAT linear regime, solution in 1000 sweeps

Is the search trapped in “potential wells” of metastable states?
[Figure panels: energy as function of time; distance to target]
N is 1000, α is 4.3
ASAT nonlinear regime, no barrier seen

Is the search trapped in “potential wells” of metastable states?
[Figure panels: energy as function of time; distance to target]
N is 1000, α is 4.1
ASAT linear regime, solution in 20 sweeps
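
An energy-versus-time curve of the kind described on these last slides could be recorded with a sketch like the following. It follows the ASAT steps as listed earlier in the talk (accept a flip that decreases the energy, otherwise accept with probability p). The function name `asat_energy_trace`, the signed-integer literal encoding, and the `max_flips` limit are assumptions made here, and the energy is recomputed from scratch at every step for clarity rather than speed.

```python
import random


def asat_energy_trace(clauses, n_vars, p=0.2, max_flips=100_000, seed=None):
    """Sketch of ASAT that records the energy after every attempted flip.

    Returns (assignment, trace), where trace[t] is the number of
    unsatisfied clauses after t attempted flips.
    """
    rng = random.Random(seed)
    assign = [False] + [rng.random() < 0.5 for _ in range(n_vars)]

    def satisfied(clause):
        return any(assign[abs(lit)] == (lit > 0) for lit in clause)

    def energy():
        return sum(not satisfied(c) for c in clauses)

    trace = [energy()]
    for _ in range(max_flips):
        if trace[-1] == 0:
            break  # solution found: stop
        unsat = [c for c in clauses if not satisfied(c)]
        var = abs(rng.choice(rng.choice(unsat)))  # random variable of a random unsat clause
        assign[var] = not assign[var]  # tentative flip
        new_energy = energy()
        # Keep the flip if the energy decreased; otherwise keep it only with probability p.
        if new_energy >= trace[-1] and rng.random() >= p:
            assign[var] = not assign[var]  # reject: undo the flip
            new_energy = trace[-1]
        trace.append(new_energy)
    return assign, trace


if __name__ == "__main__":
    # Tiny satisfiable instance: (x1 or x2) and (not x1 or x2) and (x1 or not x2)
    demo = [[1, 2], [-1, 2], [1, -2]]
    assignment, trace = asat_energy_trace(demo, n_vars=2, seed=0)
    print(assignment, trace)
```

Plotting `trace` against the flip count, for instances at different α, gives a small-scale analogue of the energy curves shown for ASAT.
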