All Roads Lead To Rome—New Search Methods for Optimal Triangulation

Thorsten J. Ottosen, Department of Computer Science, Aalborg University, Denmark, [email protected]
Jiří Vomlel, Institute of Information Theory and Automation of the AS CR, The Czech Republic, [email protected]

Abstract
To perform efficient inference in Bayesian networks, the network graph needs to be triangulated. The quality of this triangulation largely determines the efficiency of the subsequent inference, but the triangulation problem is unfortunately NP-hard. It is common for existing methods to use the treewidth criterion for optimality of a triangulation. However, this criterion may lead to a somewhat harder inference problem than the total table size criterion. We therefore investigate new methods for depth-first search and best-first search for finding optimal total table size triangulations. The search methods are made faster by efficient dynamic maintenance of the cliques of a graph. The algorithms are mainly intended as off-line methods, but they may form the basis for efficient any-time heuristics. Furthermore, the methods make it possible to evaluate the quality of heuristics precisely.

1 Introduction
We consider the problem of finding optimal cost triangulations of Bayesian networks. We solve this problem by searching the space of all possible triangulations. This search is carried out by trying all possible elimination orders and choosing one of those that have a minimal total table size. Of all commonly used optimality criteria, the total table size yields the most exact bound on the memory and time requirements of the probabilistic inference. However, finding optimal triangulations is difficult: computing a minimum fill-in is NP-complete (Yannakakis, 1981) and finding a triangulation with minimal total table size is NP-hard (Wen, 1990). There are several issues that motivate an investigation of this problem.
Since the problem is NP-hard, we cannot expect it to be solvable in a reasonable amount of time for large networks. However, triangulation can always be performed off-line on specialized servers and saved for use by the inference algorithms. This is important as intractability, or simply poor performance, is a major obstacle to more widespread adoption of Bayesian networks and decision graphs in statistics, engineering and other sciences. Furthermore, efficient off-line algorithms allow us to evaluate the quality of on-line methods, which can otherwise only be compared to other on-line methods. An off-line method, on the other hand, can effectively answer whether the subsequent inference is tractable. Finally, off-line methods can often be turned into good any-time heuristics. Previous research on triangulation has also used best-first search (Dow and Korf, 2007) and depth-first search (Gogate and Dechter, 2004); however, the optimality criterion is the treewidth of the graph, and so the found triangulation is (in the best case) only guaranteed to be within a factor of n (n being the number of vertices of the graph) of the optimal total table size triangulation—this factor could mean the difference between an intractable and a tractable inference. With the treewidth optimality criterion, one can continuously apply the preprocessing rules of (Bodlaender et al., 2005), but for the total table size criterion we can (so far) only remove simplicial vertices, which makes this problem considerably harder. The seminal idea of divide-and-conquer triangulation using decomposable subgraphs dates back to (Tarjan, 1985).

Pp. 209–217 in Proceedings of the Fifth European Workshop on Probabilistic Graphical Models (PGM-2010), edited by Petri Myllymäki, Teemu Roos and Tommi Jaakkola. HIIT Publications 2010–2.
Leimer refines this approach such that the generated subgraphs are not themselves decomposable (i.e., they are maximal prime subgraphs, and this unique decomposition is denoted a maximal prime subgraph decomposition) (Leimer, 1993). Basically, this means that the problem of triangulating a graph G is no more difficult than triangulating the largest maximal prime subgraph of G. This decomposition is exploited in (Flores and Gámez, 2003). In (Shoikhet and Geiger, 1997) a dynamic programming algorithm is given based on decompositions by minimal separators, and again the optimality criterion is treewidth. As noted by its authors, the method may be adapted to yield an optimal total table size triangulation as well. Finally, an overview of triangulation approaches is given in (Flores and Gámez, 2007).

2 Preliminaries
We shall use the following notation and definitions. G = (V, E) is an undirected graph with vertices V = V(G) and edges E = E(G). For a set of edges F, V(F) is the set of vertices {u, v : {u, v} ∈ F}. An undirected graph is triangulated (or chordal) if every cycle of length greater than 3 has a chord. For example, in Figure 1 the graph on the left is not triangulated whereas the graph on the right is triangulated. For W ⊆ V, G[W] is the subgraph induced by W. A triangulation of G is a set of edges T such that T ∩ E = ∅ and the graph H = (V, E ∪ T) is triangulated. We denote the set of all triangulations of a graph G by T(G). Two vertices u and v are connected in G if there is an edge between them. A graph G is complete if all pairs of vertices {u, v} (u ≠ v) are connected in G. A set of vertices W ⊆ V is complete in G if G[W] is a complete graph. The neighbours nb(v, G) of a vertex v ∈ V are the set W ⊆ V such that each u ∈ W is connected to v. The family fa(v, G) of a vertex v is the set nb(v, G) ∪ {v}, and the neighbours and family of a set of vertices are defined similarly.
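To make the notation above concrete, here is a minimal Python sketch (the dict-of-sets graph representation and the example graph are our own assumptions, not from the paper) of the neighbour, family, and completeness operations:

```python
def nb(v, G):
    """Neighbours of v: all vertices connected to v by an edge."""
    return set(G[v])

def fa(v, G):
    """Family of v: its neighbours together with v itself."""
    return nb(v, G) | {v}

def is_complete(W, G):
    """True if the vertex set W is complete in G, i.e. G[W] is a complete graph."""
    return all(u in G[w] for w in W for u in W if u != w)

# A small hypothetical graph as a dict mapping each vertex to its set of neighbours.
G = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(nb(3, G))                   # {1, 2, 4}
print(fa(4, G))                   # {3, 4}
print(is_complete({1, 2, 3}, G))  # True
```

The symmetric dict-of-sets encoding (v in G[u] iff u in G[v]) is assumed throughout the sketches in this document.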
The elimination of a vertex v ∈ V of G = (V, E) is the process of removing v from G and making nb(v, G) a complete set. This process induces a new graph H = (V \ {v}, E ∪ F) where F is the set of fill-in edges. For example, in Figure 1 (left), eliminating the vertex 6 induces the two fill-in edges shown with dotted edges in the adjacent graph. If F = ∅, then v is a simplicial vertex. An elimination order of G = (V, E) is a bijection f : V → {1, 2, . . . , |V|} prescribing an order for eliminating all vertices of G. If all vertices are eliminated in G according to an elimination order f, the union of all the fill-in edges produced induces a triangulation of G. In this way each triangulation T of G corresponds to at least one elimination order, and we may explore the space T(G) by investigating all possible elimination orders. Given G = (V, E), a set of vertices C ⊆ V is a clique if it is a maximal complete set, and C(G) is the set of all cliques in G. The table size of a clique C is given by ts(C) = ∏_{v∈C} |sp(v)|, where sp(v) denotes the state space of the variable corresponding to v in the Bayesian network. Finally, the total table size of a graph H is given by tts(H) = ∑_{C∈C(H)} ts(C). Triangulation algorithms aim at minimizing different criteria. The most common are the fill-in, the treewidth and the total table size criteria. The fill-in criterion requires the triangulated graph to have the minimum total number of fill-in edges. The treewidth of a graph is the size of the largest clique minus one, and the treewidth criterion requires the triangulated graph to have minimum treewidth. The total table size criterion requires the triangulated graph to have the minimum total table size. Commonly seen triangulation heuristics include min-fill and min-width, which both greedily pick the next vertex to eliminate based on a local score.
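The elimination operation and the ts/tts formulas above can be illustrated with a short Python sketch. The representation, the brute-force clique enumeration, and the example graph are our own assumptions; a practical implementation would enumerate cliques with Bron-Kerbosch or the dynamic update of Section 4:

```python
import math
from itertools import combinations

def eliminate(G, v):
    """Eliminate v: make nb(v, G) complete, then remove v.
    Returns the new graph and the set of fill-in edges introduced."""
    fill = {frozenset({a, b}) for a, b in combinations(sorted(G[v]), 2)
            if b not in G[a]}
    H = {u: set(s) - {v} for u, s in G.items() if u != v}
    for a, b in fill:
        H[a].add(b); H[b].add(a)
    return H, fill

def cliques(G):
    """All maximal complete sets (brute force; fine for tiny graphs only)."""
    sets = [set(c) for k in range(1, len(G) + 1)
            for c in combinations(sorted(G), k)
            if all(b in G[a] for a, b in combinations(c, 2))]
    return [c for c in sets if not any(c < d for d in sets)]

def tts(G, sp):
    """Total table size: sum over cliques of the product of state spaces."""
    return sum(math.prod(sp[v] for v in C) for C in cliques(G))

# Hypothetical example: a 4-cycle with binary variables.
G = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
sp = {v: 2 for v in G}
print(tts(G, sp))        # four edge-cliques of table size 2^2 each -> 16
H, fill = eliminate(G, 1)
print(fill)              # the single fill-in edge {2, 4}
```

Eliminating vertex 1 connects its neighbours 2 and 4, leaving a triangle {2, 3, 4} with tts 2^3 = 8.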
In min-fill a vertex is chosen if its elimination leads to the fewest fill-in edges; in min-width a vertex is chosen if it has the fewest number of neighbours.

Figure 1: Example of the fill-in edges and partially triangulated graphs induced by an elimination order that starts with the sequence ⟨6, 4⟩: the dotted edges are fill-in edges. Left: the initial graph. Middle left: the fill-in edges induced by eliminating vertex 6. Middle right: the fill-in edges induced by eliminating vertex 4. Right: the final triangulated graph.

3 The Search Space for Triangulation Algorithms
Our goal is to explore the space T(G) encoding all possible ways to triangulate a graph G. To do this, we generate a search graph dynamically (on-demand) where each node corresponds to a subset of V being eliminated from G, and where each edge is labelled with the particular vertex that has been eliminated. (Note that we use the term "node" exclusively for vertices in the search graph, whereas the term "vertex" is used exclusively for vertices in the undirected graph being triangulated.) In the start node s no vertices have been eliminated, and in a goal node t all vertices have been eliminated and the graph G has been triangulated. Since we are seeking optimal total table size triangulations we also need to associate this quantity with each node. By definition, the total table size is easy to compute if we know the cliques of the partially triangulated graph, and therefore we also need to associate this graph with each node. Below we give a small example of a path in the search space—in Section 5 we shall explain the algorithms in detail.

Example 1. Consider the graphs in Figure 1 and assume that all variables (in the original Bayesian network) are binary. The graph associated with the start node s would be the graph on the left, and this graph has a total table size of 2^2 + 2^2 + 2^2 + 2^3 + 2^3 = 28.
The graph associated with a successor node m of s (corresponding to the elimination of vertex 6) would correspond to the graph in the middle (left), including dotted edges, with total table size 2^2 + 2^2 + 2^3 + 2^4 = 32. And the successor node of m (corresponding to the elimination of vertex 4) would be associated with the graph on the right, which is also a goal node, with total table size 2^3 + 2^4 + 2^4 = 40. Note that when introducing fill-in edges, we must not add edges to vertices that have already been eliminated—this is why this step does not add the edge {2, 6} even though the vertices are both neighbours of vertex 4. Observe that the total table size of a node is never higher than the total table size of its successor node(s). This implies that the total table size associated with any non-goal node n is a lower bound on the total table size of any goal node that may be discovered from n. This property guarantees that the algorithms in Section 5 are admissible.

4 Dynamic Clique Maintenance
To compute the cliques of a graph associated with a node in the search graph, we may use a standard algorithm for this task, for example, the well-known Bron-Kerbosch algorithm (Cazals and Karande, 2008). However, as we can see from the above example, each path in the search graph corresponds to a sequence of graphs where the difference between adjacent graphs is quite small. Therefore we may exploit this similarity among adjacent graphs to avoid the quite expensive recomputation of the cliques and total table size of the graphs. (Stix, 2004) investigated this problem; however, his method leads to many redundant computations when the added edges appear close together (as is the case for fill-in edges). Therefore we give a new algorithm that performs significantly faster for this type of update.
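For reference, the basic Bron-Kerbosch recursion mentioned above can be sketched as follows (shown without pivoting for brevity; the dict-of-sets representation and the example graph are our own assumptions):

```python
def bron_kerbosch(G, R=frozenset(), P=None, X=frozenset()):
    """Yield every maximal clique of G as a frozenset (no pivoting).
    R: current clique, P: candidate extensions, X: already-processed vertices."""
    if P is None:
        P = frozenset(G)
    if not P and not X:
        yield R          # R cannot be extended: it is a maximal clique
        return
    for v in sorted(P):
        yield from bron_kerbosch(G, R | {v}, P & G[v], X & G[v])
        P = P - {v}
        X = X | {v}

# Hypothetical example: a triangle {1, 2, 3} with a pendant vertex 4.
G = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(sorted(sorted(C) for C in bron_kerbosch(G)))  # [[1, 2, 3], [3, 4]]
```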
The general idea behind this method is simple: instead of searching for cliques in the whole graph, simply run a clique enumeration algorithm on a smaller subgraph where all the new cliques appear and existing cliques disappear. This algorithm for dynamic clique maintenance is presented as Algorithm 1, and as a side-effect it also updates the total table size and the current graph. This implies that the total table size does not need to be computed from scratch either. In our case we employ Bron-Kerbosch for the local search in FindCliques(·). Its correctness follows from the following result.

Theorem 1. (Ottosen and Vomlel, 2010). Let G = (V, E) be an undirected graph, and let G′ = (V, E ∪ F) be the graph resulting from adding a set of new edges F to G. Let U = V(F). The cliques C(G′) can be found by removing the cliques from C(G) that intersect with U and adding the cliques of G′[fa(U, G′)] that intersect with U.

(Xiang and Lee, 2006) describes a set of vertices called a cruz which is central to their method for learning. The method described above may also be used to efficiently determine the cruz.

Algorithm 1 Incremental maintenance of cliques and total table size by local search
procedure IncrUpdate(&G, &C, &tts, F)
  Input: A graph G = (V, E), the set of cliques C of G, the total table size tts of G, and the set of new edges F.
  Set G = (V, E ∪ F)
  Let U = V(F)
  Let C_new = FindCliques(G, fa(U, G))
  for all C ∈ C do ▷ Remove old cliques
    if C ∩ U ≠ ∅ then
      Set tts = tts − ts(C)
      Set C = C \ {C}
    end if
  end for
  for all C ∈ C_new do ▷ Add new cliques
    if C ∩ U ≠ ∅ then
      Set tts = tts + ts(C)
      Set C = C ∪ {C}
    end if
  end for
end procedure

5 Optimal Total Table Size Triangulation Algorithms
We have now shown how we may efficiently compute the total table size for each successor m of a node n in the search space T(G), and we have furthermore established that the total table size for a node n is a lower bound on that of any goal node reachable from n. Given this, we may use standard algorithms like best-first search and depth-first search to explore the search space and at the same time be guaranteed that the algorithms terminate with an optimal solution. Best-first search is an algorithm that successively expands nodes with the shortest distance to the start node until a goal node has a shorter path than all non-goal nodes. The benefit of the best-first strategy is that we may avoid exploring paths that are far from the optimal path. The disadvantage of a best-first strategy is that the algorithm must keep track of a frontier (or fringe) of nodes that still need to be explored. Depth-first search, on the other hand, explores all paths in a depth-first manner and therefore uses only Θ(|V|) memory for a graph G = (V, E). However, depth-first search is typically forced to explore more paths than best-first search. To compute a cost for each path in the search graph, we associate the following with each node n:
1. H = (V, E ∪ F): the original graph with all fill-in edges F accumulated along the path to n from the start node s.
2. R: the remaining graph H[V \ W] where W are the vertices of G eliminated along the path from s to n.
3. C: the set of cliques for H, C(H).
4. tts: the total table size for the graph H.
5. L: a list of vertices describing the elimination order.
To maintain tts(H) efficiently we need C(H), which in turn requires H; we saw how this can be done in Section 4. The graph R makes it easy to determine if the graph H is triangulated and may be computed on demand to reduce memory requirements.
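Algorithm 1 (Section 4) can be rendered in Python as the following sketch. This is our own reading of the pseudocode under Theorem 1; the representation, the brute-force enumerator standing in for the Bron-Kerbosch local search, and the example are assumptions:

```python
import math
from itertools import combinations

def local_cliques(G, W):
    """Maximal cliques of the induced subgraph G[W] (brute-force stand-in
    for the Bron-Kerbosch local search used in FindCliques)."""
    sub = {v: G[v] & W for v in W}
    sets = [set(c) for k in range(1, len(sub) + 1)
            for c in combinations(sorted(sub), k)
            if all(b in sub[a] for a, b in combinations(c, 2))]
    return [frozenset(c) for c in sets if not any(c < d for d in sets)]

def incr_update(G, cliques, tts, F, sp):
    """Add the edges F to G and repair cliques and tts locally (Theorem 1)."""
    ts = lambda C: math.prod(sp[v] for v in C)
    for a, b in F:                             # Set G = (V, E ∪ F)
        G[a].add(b); G[b].add(a)
    U = {v for e in F for v in e}              # U = V(F)
    fam = U | {u for v in U for u in G[v]}     # fa(U, G)
    for C in [C for C in cliques if C & U]:    # remove old cliques touching U
        tts -= ts(C); cliques.remove(C)
    for C in local_cliques(G, fam):            # add new cliques touching U
        if C & U:
            tts += ts(C); cliques.add(C)
    return tts

# Hypothetical example: a 4-cycle with binary variables; add the chord {1, 3}.
G = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
sp = {v: 2 for v in G}
cl = {frozenset(e) for e in [(1, 2), (2, 3), (3, 4), (1, 4)]}
tts = sum(math.prod(sp[v] for v in C) for C in cl)   # 16
tts = incr_update(G, cl, tts, [(1, 3)], sp)
print(tts)   # two triangle cliques of table size 2^3 each -> 16
```

After the update, cl holds exactly the two triangles {1, 2, 3} and {1, 3, 4}, without any global re-enumeration.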
In the worst case, the complexity of any best-first search method is O(β(|V|) · |V|!) (where β(·) is a function that describes the per-node overhead) because we must try each possible elimination order. However, it is well known that the remaining graph H[V \ W] is the same no matter what order the vertices in W have been eliminated in, so we can use coalescing of nodes and thus reduce the worst-case complexity to O(β(|V|) · 2^|V|) (Darwiche, 2009). For depth-first search the complexity is often thought to remain at Θ(γ(|V|) · |V|!); however, at the expense of memory we may also apply coalescing for pruning purposes. Hence, depth-first search can be made to run in O(γ(|V|) · |V|!) time using O(2^|V|) memory, but the hidden constants will be much smaller in this case compared to best-first search. Both γ(|V|) and β(|V|) take at least O(|V|^3) time as they are dominated by the removal of simplicial vertices (the lookup into the coalescing map takes O(|V|) time due to the computation of the hash-key, and the priority queue look-up for best-first search may take O(|V|) time since the queue may become exponentially large). Getting a more precise bound on the two functions is difficult as the complexity of maintaining the cliques and total table size depends very much on the graph being triangulated.
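The coalescing argument rests on the fact that the remaining graph depends only on the set of eliminated vertices, not on the order in which they were eliminated. A quick sanity check of this property (the toy graph and representation are our own assumptions):

```python
from itertools import combinations

def eliminate(G, v):
    """Remove v after connecting its neighbours pairwise."""
    H = {u: set(s) - {v} for u, s in G.items() if u != v}
    for a, b in combinations(sorted(G[v]), 2):
        H[a].add(b); H[b].add(a)
    return H

# Hypothetical 4-cycle: eliminating {1, 2} in either order leaves the same
# remaining graph, even though the fill-in edges produced along the way differ.
G = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
assert eliminate(eliminate(G, 1), 2) == eliminate(eliminate(G, 2), 1)
```

Order ⟨1, 2⟩ adds fill-in {2, 4} while order ⟨2, 1⟩ adds {1, 3}, yet both leave the single edge 3-4, which is why states can be keyed by the eliminated set alone.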
In Algorithm 2 we describe the basic best-first search with coalescing, and depth-first search with coalescing and pruning based on the currently best path is described in Algorithm 3. The procedure EliminateVertex(·) simply eliminates a vertex from the remaining graph R and updates the cliques and total table size of the partially triangulated graph H (see Section 4). The procedure EliminateSimplicial(·) removes all simplicial vertices from the remaining graph.

Algorithm 2 Best-first search with coalescing
function TriangulationByBFS(G)
  Let s = (G, G, C(G), tts(G), ⟨⟩)
  EliminateSimplicial(s)
  if |V(s.R)| = 0 then return s end if
  Let map = ∅ ▷ Coalescing map
  Let O = {s} ▷ The open set
  while O ≠ ∅ do
    Let n = arg min_{x∈O} x.tts
    if |V(n.R)| = 0 then return n end if
    Set O = O \ {n}
    for all v ∈ V(n.R) do
      Let m = Copy(n)
      EliminateVertex(m, v)
      EliminateSimplicial(m)
      if map[m.R].tts ≤ m.tts then continue end if
      Set O = O \ {map[m.R]}
      Set map[m.R] = m
      Set O = O ∪ {m}
    end for
  end while
end function

6 Results
In this section we describe experiments with the optimal methods as well as several heuristics derived from these. For that purpose we have generated 50 random graphs with varying size and density.
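The best-first strategy of Algorithm 2 can be sketched compactly in Python. This is our own simplification, not the paper's implementation: it skips simplicial-vertex elimination, uses brute-force clique enumeration, and assumes binary variables in the example; states are coalesced on the set of eliminated vertices:

```python
import heapq
import math
from itertools import combinations

def total_table_size(G, sp):
    """tts via brute-force maximal-clique enumeration (tiny graphs only)."""
    sets = [set(c) for k in range(1, len(G) + 1)
            for c in combinations(sorted(G), k)
            if all(b in G[a] for a, b in combinations(c, 2))]
    return sum(math.prod(sp[v] for v in C)
               for C in sets if not any(C < d for d in sets))

def expand(H, R, v):
    """Eliminate v from the remaining graph R; mirror fill-ins into H."""
    fill = [(a, b) for a, b in combinations(sorted(R[v]), 2) if b not in R[a]]
    H2 = {u: set(s) for u, s in H.items()}
    R2 = {u: set(s) - {v} for u, s in R.items() if u != v}
    for a, b in fill:
        H2[a].add(b); H2[b].add(a)
        R2[a].add(b); R2[b].add(a)
    return H2, R2

def optimal_tts(G, sp):
    """Best-first search over elimination orders with coalescing."""
    start = frozenset()
    state = {start: (G, G)}                  # eliminated-set -> (H, R)
    best = {start: total_table_size(G, sp)}
    heap = [(best[start], 0, start)]
    tick = 1                                 # tie-breaker for the heap
    while heap:
        cost, _, done = heapq.heappop(heap)
        if cost > best[done]:
            continue                         # stale heap entry
        H, R = state[done]
        if not R:
            return cost                      # first goal popped is optimal
        for v in R:
            H2, R2 = expand(H, R, v)
            key = done | {v}
            c = total_table_size(H2, sp)
            if c < best.get(key, math.inf):  # coalescing: keep cheapest state
                best[key] = c
                state[key] = (H2, R2)
                heapq.heappush(heap, (c, tick, key))
                tick += 1

# Hypothetical example: a 4-cycle with binary variables; one chord is optimal,
# yielding two triangles of table size 2^3 each.
G = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(optimal_tts(G, {v: 2 for v in G}))   # 16
```

Because the cost (tts) never decreases along a path, the first goal node popped from the priority queue is optimal, mirroring the admissibility argument of Section 3.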
In this paper we have only performed experiments on bipartite graphs—these graphs result from the application of rank-one decomposition to BN2O networks—see (Savicky and Vomlel, 2009) for details. The main reason for using these graphs is that they are among the most difficult to triangulate. This is because (1) moralization should not be applied after using rank-one decomposition, and (2) bottom and top layers are not connected. Thereby the initial graph is sparser than usual, which gives triangulation algorithms more freedom (in terms of choosing fill-in edges) when searching for a triangulation.

Algorithm 3 Depth-first search with coalescing and upper-bound pruning
function TriangulationByDFS(G)
  Let s = (G, G, C(G), tts(G), ⟨⟩)
  EliminateSimplicial(s)
  if |V(s.R)| = 0 then return s end if
  Let best = MinFill(s) ▷ Best path
  Let map = ∅ ▷ Coalescing map
  ExpandNode(s, best, map)
  return best
end function

procedure ExpandNode(n, &best, &map)
  for all v ∈ V(n.R) do
    Let m = Copy(n)
    EliminateVertex(m, v)
    EliminateSimplicial(m)
    if |V(m.R)| = 0 then
      if m.tts < best.tts then Set best = m end if
    else
      if m.tts ≥ best.tts then continue end if
      if map[m.R].tts ≤ m.tts then continue end if
      Set map[m.R] = m
      ExpandNode(m, best, map)
    end if
  end for
end procedure

Figure 2: Comparison of the running time of best-first search and depth-first search. Both algorithms terminate with an optimal triangulation (values above the line indicate depth-first search was faster).

We have performed two different tests on this dataset: (1) a comparison of depth-first search and best-first search, and (2) a comparison between heuristic methods. For Test 2 we have implemented the following heuristic methods: (a) Limited-branching depth-first search. This means that we only expand the n successors of a node which have the lowest total table size. For example, "limited-branch-5" expands at most five successors per node.
(b) Limited-memory best-first search. Here we limit the size of the open set O to some fixed value n by removing the worst nodes when the set is considered full. For example, "limited-mem-10k" has at most 10,000 nodes in its open set. (c) The min-width and min-fill heuristics implemented so that a successor node is generated for all ties instead of breaking ties randomly. We refer to these algorithms as min-width* and min-fill*, respectively.

The results from Test 1 are given in Figure 2. It appears that best-first search performs better when the computational time is large. In Table 1 we give summary statistics for this test. We have computed the p-value of the two-sided Wilcoxon two-sample test of the null hypothesis that the distribution of depth-first time minus best-first time is symmetric about 0. The p-value is 0.001069, which means that the null hypothesis is rejected, that is, the differences are significant (in favor of depth-first search).

Table 1: Summary statistics for exhaustive search algorithms. Each row summarizes the mean time for graphs with n vertices. The value in parentheses in the first column indicates the number of graphs of that size.

Vertices    DFS      BFS
20 (10)     0.3s     0.3s
30 (20)     9.1s     17.5s
40 (10)     30.6s    31.8s
50 (5)      30.4s    33.0s
60 (5)      591.7s   607.0s
% Fastest   71       37

The results from Test 2 are given in Table 2. Here we have run the heuristics on the 50 graphs from Test 1.

Table 2: Results for heuristic algorithms. The first column describes the percentage of graphs that were triangulated optimally, and the second column contains the maximum percent-wise deviation from the total table size of an optimal triangulation. The third column indicates the total time for triangulating all 50 graphs.

Method             % op.   % dev.   time
min-width*         82      23,916   1s
min-fill*          84      1,322    1s
lim.-br.-2 DFS     94      53       83s
lim.-br.-3 DFS     98      27       270s
lim.-br.-4 DFS     100     0        512s
lim.-mem-100 BFS   96      2        2234s
lim.-mem-1k BFS    100     0        6447s
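The limited-branching idea in heuristic (a) amounts to scoring all successors by total table size and keeping only the best n. A minimal sketch (the successor representation and scores here are our own toy assumptions):

```python
def limited_branching(successors, score, n):
    """Keep only the n successors with the lowest score (here: tts)."""
    return sorted(successors, key=score)[:n]

# Hypothetical successor states paired with their total table sizes.
states = [("a", 40), ("b", 28), ("c", 32), ("d", 36)]
best2 = limited_branching(states, score=lambda s: s[1], n=2)
print(best2)   # [('b', 28), ('c', 32)]
```

The same truncation applied to the open set instead of the successor list gives heuristic (b), limited-memory best-first search.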
From this we can conclude that min-fill and min-width are quite often good heuristics, but that their induced search spaces are too small to avoid triangulations that are far from optimal. The new heuristics seem to avoid this pitfall. Notice that the BN2O networks only have binary variables. Therefore min-width actually corresponds to the commonly used min-weight heuristic. The graph where min-width* found an exceedingly poor triangulation has 40 vertices and a density around 0.36. The total table size for min-width* was 595,634,176, which means that no stochastic (breaking ties randomly) min-width heuristic can yield a triangulation that requires below some 2.4 GB of memory (assuming 4 bytes for a float). Contrast this with the optimal triangulation, which leads to a memory requirement of only 10 MB.

7 Discussion
The fact that depth-first search came out as the fastest algorithm must be considered a surprise. We believe that the main reason for this is that the pruning via the coalescing map turns out to work quite well—this pruning is the direct cause of the change in complexity from Θ(γ(|V|) · |V|!) to O(γ(|V|) · |V|!). The experiments indicate that depth-first search actually runs in O(γ(|V|) · 2^|V|) time. Secondly, it is worth mentioning that depth-first search only needs very few (otherwise expensive) free-store allocations. To further improve the pruning by the coalescing map, we should consider a hybrid best-first/depth-first scheme where we explore the most promising paths earlier. In light of this discussion we believe depth-first search should be reconsidered also for the minimum treewidth criterion.

8 Conclusion
The contributions of this paper are three-fold. First, we have described new methods for finding optimal total table size triangulations of undirected graphs. The methods rely heavily on efficient dynamic maintenance of the cliques and total table size of a graph.
These methods are mainly intended to be used off-line, but they may also be transformed into any-time heuristics. Secondly, experiments show that depth-first search is faster than best-first search—this was quite unexpected. The main reason is that we use pruning based on a coalescing map, which lowers the time complexity from Θ(γ(n) · n!) to O(γ(n) · n!) (n being the number of vertices in the graph and γ(n) being the per-node overhead). From the experiments we can infer that this pruning is so effective that depth-first search actually runs in O(γ(n) · 2^n) time. Therefore we believe it will be beneficial to reconsider depth-first search for triangulation with the minimum treewidth criterion. Third, we have examined the quality of common heuristic algorithms on a set of graphs that are quite difficult to triangulate. The experiments show that these heuristics can never guarantee good triangulations on all types of graphs; for example, on one model the min-width (and min-weight) heuristic would return a triangulation that requires at least 2.4 GB of memory whereas the optimal solution requires only 10 MB. This shows that off-line triangulation methods could be required in some cases.

Acknowledgement
The authors would like to thank the three anonymous reviewers for their helpful comments. J. Vomlel was supported by the Ministry of Education of the Czech Republic through grants 1M0572 and 2C06019 and by the Czech Science Foundation through grants ICC/08/E010 and 201/09/1891.

References
Hans L. Bodlaender, Arie M. C. A. Koster, and Frank van den Eijkhof. 2005. Preprocessing rules for triangulation of probabilistic networks. Computational Intelligence, 21:286–305.
F. Cazals and C. Karande. 2008. A note on the problem of reporting maximal cliques. Theoretical Computer Science, 407(1-3):564–568.
Adnan Darwiche. 2009. Modeling and Reasoning with Bayesian Networks. Cambridge University Press.
P. Alex Dow and Richard E. Korf. 2007. Best-first search for treewidth. In AAAI'07: Proceedings of the 22nd National Conference on Artificial Intelligence, pages 1146–1151. AAAI Press.
M. Julia Flores and José A. Gámez. 2003. Triangulation of Bayesian networks by retriangulation. International Journal of Intelligent Systems, 18:153–164.
M. Julia Flores and José A. Gámez. 2007. A review on distinct methods and approaches to perform triangulation for Bayesian networks. Advances in Probabilistic Graphical Models, pages 127–152.
Vibhav Gogate and Rina Dechter. 2004. A complete anytime algorithm for treewidth. In Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI-04), pages 201–208, Arlington, Virginia. AUAI Press.
Hanns-Georg Leimer. 1993. Optimal decomposition by clique separators. Discrete Mathematics, 113(1-3):99–123.
Thorsten J. Ottosen and Jiří Vomlel. 2010. Honour thy neighbour—clique maintenance in dynamic graphs. In Proceedings of the Fifth European Workshop on Probabilistic Graphical Models.
Petr Savicky and Jiří Vomlel. 2009. Triangulation heuristics for BN2O networks. In C. Sossai and G. Chemello, editors, Proceedings of the 10th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty, pages 566–577. Springer.
Kirill Shoikhet and Dan Geiger. 1997. A practical algorithm for finding optimal triangulations. In AAAI'97: Proceedings of the 14th National Conference on Artificial Intelligence, pages 185–190. AAAI Press.
Volker Stix. 2004. Finding all maximal cliques in dynamic graphs. Computational Optimization and Applications, 27(2):173–186.
Robert E. Tarjan. 1985. Decomposition by clique separators. Discrete Mathematics, 55(2):221–232.
Wilson Wen. 1990. Optimal decomposition of belief networks. In Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence (UAI-90), pages 209–224, New York, NY. Elsevier Science.
Y. Xiang and J. Lee. 2006. Learning decomposable Markov networks in pseudo-independent domains with local evaluation. Machine Learning, 65(1):199–227.
Mihalis Yannakakis. 1981. Computing the minimum fill-in is NP-complete. SIAM Journal on Algebraic and Discrete Methods, 2(1):77–79.