Fast Computation of the Exact Hybridization Number of Two Phylogenetic Trees
Yufeng Wu and Jiayin Wang
Department of Computer Science and Engineering, University of Connecticut
ISBRA 2010

Phylogenetic Tree and Hybridization Network
• Phylogenetic Tree: rooted, binary trees
• Reticulate Evolution: the tree model is no longer sufficient, e.g. hybrid speciation, horizontal gene transfer, recombination
• Hybridization Network: a directed acyclic graph that displays two phylogenetic trees in a compact way
• Hybridization event: a node with in-degree two or more
• Hybridization Number Problem: compute the minimum number of hybridization events needed to construct a hybridization network displaying two trees
[Figure: input phylogenies T and T' (root ρ, taxa 1 to 4) and a hybridization network, annotated "delete two yellow edges" and "delete two red edges".]

A Related Problem: rSPR Distance Problem
• rSPR distance problem: the minimum number of rooted Subtree Prune and Regraft operations needed to transform T into T'
• rSPR distance of two phylogenies = the number of subtrees in a Maximum Agreement Forest (MAF) - 1 (Hein, et al and Bordewich, et al)
[Figure: input phylogenies T and T' (root ρ, taxa 1 to 4); a prune step and a regraft step; the trees after one rSPR operation and after two rSPR operations.]

Maximum Agreement Forest (MAF)
• Agreement Forest (AF) of T and T': a set of subtrees such that
– each subtree in the AF has the same topology in T and T'
– the subtrees partition the given taxa
– any two subtrees are vertex-disjoint
• Maximum Agreement Forest: an agreement forest of the two trees in which the number of subtrees is minimized
[Figure: input phylogenies T and T' (root ρ, taxa 1 to 6), an agreement forest, and a maximum agreement forest. Number of subtrees is 3.]

Maximum Acyclic Agreement Forest (MAAF)
• Maximum Acyclic Agreement Forest: an agreement forest with the fewest subtrees among those whose graph (defined below) is acyclic
• Ti in the AF is ancestral to Tj if the root of Ti is ancestral to the root of Tj in either T or T'
• Graph of Agreement Forest: GF(T,T')
– nodes in the graph correspond to trees in the AF
– there is an edge from Ti to Tj if Ti is ancestral to Tj in the AF
• When the graph of the AF is acyclic, the AF is said to be acyclic
[Figure: input phylogenies T and T' on taxa 1 to 5; a MAF containing subtrees T12 and T34, whose graph of AF is cyclic; an acyclic maximum agreement forest.]

Hybridization Number and Size of MAAF
• Hybridization Number of the two original trees = the number of subtrees in a MAAF - 1 (Baroni, et al, 2005)
• For example, the size of the Maximum Acyclic Agreement Forest is 3, so the hybridization number is 3-1=2
• Nodes 3 and 4 are hybridization events
[Figure: input phylogenies T and T' on taxa 1 to 5, the Maximum Acyclic Agreement Forest, and the Hybridization Network, annotated "keep two red edges" and "keep two yellow edges".]

Computation of the Exact Hybridization Number
• Previous Work: Bordewich, Semple, et al, (2007), HybridNumber
• Our Approach: use Integer Linear Programming (ILP) to minimize the number of subtrees
• Our Idea: find a minimum collection of edge-cuts that breaks the tree down into a MAAF
• Objective: $\min \sum_{i=1}^{m} C_i$, where $C_i = 1$ if edge $e_i$ is cut
• Subject to 3 groups of constraints that ensure the resulting AF is a MAAF:
– Triple Constraint
– Pathway Constraint
– Cyclic Constraint
• Example: triple 1,2,3 is incompatible, so at least one of the edges involved must be cut. ILP constraint for triple 1,2,3: $C_1 + C_2 + C_3 + C_4 + C_5 \ge 1$
• More details on the Triple Constraint and Pathway Constraint are in Wu (2009)
[Figure: input phylogenies (root ρ, taxa 1 to 4) with edges e1 to e5 labeled.]

Graph of AF and Leaf Pair (LP) Graph
• Difficulty: the graph of the AF depends on the AF
• Leaf Pair (LP) Graph: a node corresponds to a pair of two distinct leaves
• Leaf pair lp(i,j) is ancestral to lp(p,q) if the Most Recent Common Ancestor (MRCA) of (i,j) is ancestral to the MRCA of (p,q)
• Create an edge from lp(i,j) to lp(p,q) if:
– the path between i and j is disjoint from that between p and q in both T and T'; and
– lp(i,j) is ancestral to lp(p,q) in either T or T'
[Figure: input phylogenies T and T' on taxa 1 to 5 with MRCA(1,2) and MRCA(3,4) marked; part of the LP graph, with nodes (1,2) and (3,4).]

Acyclicity of Leaf Pair Graph
• Realized Leaf Pair: a leaf pair whose two leaves are in the same subtree
• Reduced LP Graph: an LP graph for a given AF
• Lemma: for an AF, say F, GF(T,T') is acyclic iff LP Graph(F) is acyclic
• Adding constraints naively means enumerating all cycles, which is impractical in most cases
[Figure: input phylogenies T and T' on taxa 1 to 5; LP graph nodes (1,2) and (3,4); a Maximum Agreement Forest and a Maximum Acyclic Agreement Forest.]

An Easy Way for Acyclic Constraints
• $M_{i,j} = 1$ if the path between i and j is not cut
• Deal with infeasible twin pairs first: $M_{i,j} + M_{p,q} \le 1$
• Example ILP constraint: $M_{1,3} + M_{4,5} \le 1$
• After removing the infeasible twin pairs, enumerate all remaining elementary cycles; for a cycle of c leaf pairs $i_1, \dots, i_c$: $\sum_{k=1}^{c} M_{i_k} \le c - 1$
• On biological data, this seems to give a great reduction
[Figure: input phylogenies T and T' on taxa 1 to 7; LP graph with nodes (1,3), (3,7), (4,5), (1,2), (4,6).]

Speed up by Divide and Conquer Approach
• Subtree Reduction: replace a pendant subtree that occurs identically in T and T' with a new label
• Subtree reduction preserves the Hybridization Number
• Cluster Reduction: replace a cluster common to T and T', say T1 and T'1, with a new label; the remaining parts of the two trees are T2 and T'2
• $h(T,T') = h(T_1,T'_1) + h(T_2,T'_2)$
• See Bordewich, et al (2007) for details
[Figure: input phylogenies T and T' on taxa 1 to 9, split into T1, T2 and T'1, T'2.]

Results on Simulation Datasets
• Simulation datasets are from Beiko and Hamilton (2006)
• Each pair of phylogenies has 100 leaves and is generated by applying 10 rSPR operations to one tree
• HybridNumber is another software tool that computes the exact Hybridization Number
• This version of HybridNumber was downloaded in Oct. 2009
• A later version of HybridNumber appears faster, but is still very slow for EEEP data
[Figure: running time (s) of HybridNumber and SPRDist.]

Results on Biological Datasets
• Tree pairs for a Grass (Poaceae) dataset from the Grass Phylogeny Working Group (2001)
• The results were obtained in a CPLEX environment
• The later version of HybridNumber gives roughly the same running time as ours, but is still not as scalable

Pair  #Taxa  #Hybridization  SPRDist (CPLEX)  HybridNumber
1     40     14              5s               3s
2     36     13              10s              3s
3     34     12              7s               6s
4     19     9               1s               1s
5     46     19              51s              667s
6     21     4               0s               1s
7     21     7               3s               1s
8     14     3               1s               1s
9     30     8               1s               1s
10    26     13              14s              16s
11    12     7               1s               1s
12    29     14              80s              4h2716s
13    10     1               0s               1s
14    31     15              115s             7h776s
15    15     8               1s               2s

Acknowledgment
Research is supported by National Science Foundation [IIS-0803440] and the Research Foundation of University of Connecticut
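
Illustrative sketch (not the authors' ILP implementation): the slides define an agreement forest as acyclic when its graph GF(T,T') has no directed cycle, and give the hybridization number as the MAAF size minus 1. The Python below checks that acyclicity condition with a standard topological sort (Kahn's algorithm). The function names `is_acyclic` and `hybridization_number`, the node labels, and the edge lists are assumptions made for illustration; in particular the edges are written by hand rather than derived from actual trees.

```python
from collections import defaultdict, deque


def is_acyclic(nodes, edges):
    """Return True if the directed graph (nodes, edges) has no cycle.

    Uses Kahn's algorithm: repeatedly remove nodes with in-degree 0.
    The graph is acyclic exactly when every node can be removed.
    """
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return removed == len(nodes)


def hybridization_number(maaf_size):
    """Hybridization number = number of subtrees in a MAAF - 1."""
    return maaf_size - 1


# Hypothetical graph of an AF: T12 is ancestral to T34 in one tree and
# T34 is ancestral to T12 in the other, so the two edges form a cycle.
cyclic_nodes = ["T12", "T34", "T5"]
cyclic_edges = [("T12", "T34"), ("T34", "T12")]

# Hypothetical graph of an AF with ancestry in one direction only.
acyclic_nodes = ["A", "B", "C"]
acyclic_edges = [("A", "B"), ("A", "C")]

print(is_acyclic(cyclic_nodes, cyclic_edges))
print(is_acyclic(acyclic_nodes, acyclic_edges))
print(hybridization_number(len(acyclic_nodes)))
```

The same check could in principle be applied to a reduced LP graph, since the lemma in the slides states that the two graphs are acyclic under the same conditions; the ILP itself avoids this explicit check by encoding cycles as constraints.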