Learning Bayesian Networks via Edge Walks on DAG Associahedra

Liam Solus
Based on joint work with Lenka Matejovicova, Adityanarayanan Radhakrishnan, Caroline Uhler, and Yuhao Wang
KTH Royal Institute of Technology, [email protected]
18 January 2017, Workshop on Convex Polytopes, Osaka University

Liam Solus (KTH) Learning Bayesian Networks 18 January 2017 1 / 34

Directed Acyclic Graph (DAG) Models

Let G = ([n], A) be a directed acyclic graph (DAG). [Figure: a DAG on the nodes 1, 2, 3, 4.] The node i is associated to a random variable X_i.

Markov Assumption (MA): The nonedges of G encode conditional independence (CI) relations capturing cause-effect relationships:

    X_i ⊥⊥ X_{nondes(i) \ pa(i)} | X_{pa(i)},

where pa(i) := the collection of parents of i and nondes(i) := the nondescendants of i = [n] \ (des(i) ∪ {i}).

The CI relations implied by the MA are captured by d-separation in G.

Let A, B, C ⊂ [n] be disjoint with A, B ≠ ∅. We say that C d-connects A and B in G if there is an undirected path U from A to B such that
1. every collider on U has a descendant in C, and
2. no other node of U is in C.
[Figure: a path through nodes i−1, i, i+1 illustrating the definition.]
If there is no such path U, we say A and B are d-separated by C.

The Global Markov Property: A probability distribution P obeys the MA for a DAG G if and only if X_A ⊥⊥ X_B | X_C for all disjoint A, B, C for which C d-separates A and B in G.

[Figure: Bio-Mom and Bio-Dad are both parents of Bio-Child, who is the parent of Bio-Grandchild.]
    Bio-Mom ⊥⊥ Bio-Grandchild | Bio-Child
    Bio-Mom ⊥̸⊥ Bio-Dad | Bio-Child

General Goal: Suppose we obtain data from an unknown DAG G, from which we infer a collection of CI relations C. Can we learn the DAG G from C? Algorithms? Consistency guarantees?
The PC and SP Algorithms

The PC-Algorithm (Spirtes, Glymour, and Scheines, 2001):
1. Identify the undirected skeleton.
2. Then orient the edges.

The SP-Algorithm (Uhler and Raskutti, 2014):
1. To each permutation π = π_1 π_2 ⋯ π_n, construct a permutation DAG G_π with arrows (π_i, π_j) ∈ E(G_π) if and only if i < j and π_i ⊥̸⊥ π_j | {π_1, …, π_{j−1}} \ {π_i}.
2. Choose a sparsest permutation π*; i.e., a permutation for which G_π has the fewest edges.

The SP-Algorithm: an example

n = 3 and C = {1 ⊥⊥ 3}: [Figure: the six permutation DAGs G_π for the permutations 123, 132, 213, 231, 312, 321.]

Consistency Guarantees

Faithfulness Assumption: A probability distribution P is faithful to a DAG G if the only CI relations satisfied by P are those entailed by the MA.

Restricted Faithfulness Assumption: P satisfies the restricted faithfulness assumption with respect to a DAG G = ([n], A) if it satisfies the following two conditions:
1. Adjacency faithfulness: faithfulness for all arrows i → j ∈ A.
2. Orientation faithfulness: faithfulness for all triples (i, j, k). [Figure: the two triple configurations on i, j, k.]

SMR Assumption: P satisfies the SMR assumption with respect to a DAG G if it satisfies the MA with respect to G and |G| < |H| for every DAG H such that P satisfies the MA for H and H is not Markov equivalent to G.

Theorem (Uhler and Raskutti, 2014): The SP-algorithm is consistent under the SMR assumption, which is strictly weaker than restricted faithfulness.

Downside: the SP-algorithm's search space is factorial in size! Can we search through the permutations S_n more efficiently?
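The permutation-DAG construction in step 1 of the SP-Algorithm is easy to prototype. The following is a minimal sketch, not the authors' implementation: `ci` is a hypothetical conditional-independence oracle standing in for CI tests on data, and the brute-force `sparsest_permutations` realizes exactly the factorial-size search the slide warns about.

```python
from itertools import permutations

def permutation_dag(pi, ci):
    """Build the permutation DAG G_pi from a CI oracle.

    `ci(a, b, S)` returns True when a and b are conditionally independent
    given the set S.  An arrow pi_i -> pi_j is added iff i < j and pi_i is
    *dependent* on pi_j given {pi_1, ..., pi_{j-1}} \ {pi_i}.
    """
    arrows = set()
    for j in range(len(pi)):
        for i in range(j):
            cond = set(pi[:j]) - {pi[i]}
            if not ci(pi[i], pi[j], cond):
                arrows.add((pi[i], pi[j]))
    return arrows

def sparsest_permutations(n, ci):
    """Brute-force SP algorithm: minimum edge count and the permutations
    attaining it (factorial time, so only for tiny n)."""
    dags = {p: permutation_dag(p, ci) for p in permutations(range(1, n + 1))}
    best = min(len(a) for a in dags.values())
    return best, [p for p, a in dags.items() if len(a) == best]

# The slide's example: n = 3 with the single CI relation 1 ⊥⊥ 3.
ci = lambda a, b, S: {a, b} == {1, 3} and S == set()
best, minimizers = sparsest_permutations(3, ci)
```

On this toy oracle the sparsest permutation DAGs have two edges, attained by the permutations 132 and 312 (both yield the collider 1 → 2 ← 3).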
Can we shrink the search space of the SP-algorithm while maintaining consistency guarantees?

Edge Walks on a Permutohedron P_n

Each vertex of the permutohedron P_n corresponds to a permutation DAG G_π. Edges correspond to flipping adjacent transpositions, e.g. 1342 − 1432.

Can we take a greedy walk along the edges of P_n? I.e., walk from G_π to G_τ whenever |E(G_π)| > |E(G_τ)|?

(One) Problem: two permutations can have the same permutation DAG.

DAG Associahedra (Mohammadi, Uhler, Wang, Yu, 2016)

[Figure: a DAG on the nodes 1, 2, 3, 4 and its DAG associahedron.]
CI relations: 1 ⊥⊥ 2; 1 ⊥⊥ 4 | 3; 1 ⊥⊥ 4 | {2, 3}; 2 ⊥⊥ 4 | 3; 2 ⊥⊥ 4 | {1, 3}.

Associate CI relations to edges of P_n with respect to the dependence relations i < j and π_i ⊥̸⊥ π_j | {π_1, …, π_{j−1}} \ {π_i}:
- 1 ⊥⊥ 2: no elements in the conditioning set, so the edges with nothing before {1, 2}, namely 1234 − 2134 and 1243 − 2143.
- 2 ⊥⊥ 4 | {3}: conditioning set {3}, so the edges with 3 before 2 and 4, namely 3241 − 3421.

Theorem (Mohammadi, Uhler, Wang, Yu, 2016): P_n(C) is a convex polytope whose vertices are labeled by the different possible permutation DAGs for C.
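The greedy walk contemplated above can be sketched directly: neighbors on the permutohedron differ by one adjacent transposition, and we move only when the neighboring permutation DAG is strictly sparser. This is an illustrative sketch, not the authors' code; the `permutation_dag` helper and the toy oracle `ci` (the running 1 ⊥⊥ 3 example) are restated so the snippet runs on its own. Note that such a strictly decreasing walk can stall at a non-sparsest permutation, which is part of why the refined walks on the DAG associahedron are needed.

```python
def permutation_dag(pi, ci):
    """Permutation DAG G_pi given a CI oracle ci(a, b, S)."""
    return {(pi[i], pi[j])
            for j in range(len(pi)) for i in range(j)
            if not ci(pi[i], pi[j], set(pi[:j]) - {pi[i]})}

def greedy_edge_walk(pi, ci):
    """Greedy walk on the permutohedron: from pi, cross an edge (an
    adjacent transposition) whenever the neighboring permutation DAG is
    strictly sparser; stop at a local minimum."""
    pi = list(pi)
    improved = True
    while improved:
        improved = False
        size = len(permutation_dag(pi, ci))
        for k in range(len(pi) - 1):
            tau = pi[:k] + [pi[k + 1], pi[k]] + pi[k + 2:]
            if len(permutation_dag(tau, ci)) < size:
                pi, improved = tau, True
                break
    return tuple(pi)

# Running example: n = 3 with the single CI relation 1 ⊥⊥ 3.
ci = lambda a, b, S: {a, b} == {1, 3} and S == set()
```

Starting from 123 the walk reaches the sparsest permutation 132, but starting from 213 it stalls immediately: every neighbor of 213 has the same number of edges, so a strictly decreasing walk cannot leave it.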
Greedy SP Algorithms

The vertices of the DAG associahedron P_n(C) serve as a reduced search space for the SP-algorithm.

Edge SP Algorithm. Input: C and a permutation π ∈ S_n. Take a "nonincreasing edge walk" along the edges of the DAG associahedron P_n(C).

When is this algorithm consistent? Under the faithfulness assumption?

Geometric Aspects of Faithfulness

A covered edge in a DAG G is any edge i → j such that pa(j) = pa(i) ∪ {i}.

Revisiting the SP-algorithm example, n = 3 and C = {1 ⊥⊥ 3}: [Figure: the permutation DAGs for π = 123 and τ = 132.]

Theorem (Matejovicova, LS, Uhler, Y. Wang, 2017): Under the faithfulness assumption, each edge e of P_n(C) corresponds to flipping a covered edge in one of the DAGs associated to the endpoints of e.

Triangle SP Algorithm. Input: C and a permutation π ∈ S_n. Take a "nonincreasing edge walk" along the edges of the DAG associahedron P_n(C) that correspond to covered edge reversals.

Edge Walks and Independence Maps

A DAG H is an independence map of another DAG G, written G ≤ H, if any CI relation entailed by H is also entailed by G.

Theorem (Matejovicova, LS, Uhler, Y. Wang, 2017): A positive probability distribution P is faithful to a sparsest DAG G_{π*} if and only if G_{π*} ≤ G_π for all permutations π.

A result of Chickering (2002) implies that under faithfulness we can always find a sequence of independence maps G_{π*} =: G^0 ≤ G^1 ≤ G^2 ≤ ⋯ ≤ G^N := G_π. In fact, we can always find such a sequence that coincides with a nonincreasing edge walk!
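Covered edges are simple to detect from parent sets alone. A small sketch under the definition above; the chain and collider on nodes 1, 2, 3 are the deck's running three-node examples, and DAGs are assumed to be given as sets of arrows.

```python
def parents(arrows, v):
    """Parent set of node v in a DAG given as a set of arrows (a, b)."""
    return {a for a, b in arrows if b == v}

def covered_edges(arrows):
    """Edges i -> j with pa(j) = pa(i) ∪ {i}: exactly the edges whose
    reversal the Triangle SP walk is allowed to perform."""
    return {(i, j) for i, j in arrows
            if parents(arrows, j) == parents(arrows, i) | {i}}

# Chain 1 -> 2 -> 3: only 1 -> 2 is covered (pa(3) = {2} but pa(2) ∪ {2} = {1, 2}).
chain = {(1, 2), (2, 3)}
# Collider 1 -> 2 <- 3: neither edge is covered, since pa(2) = {1, 3}.
collider = {(1, 2), (3, 2)}
```

Reversing the covered edge 1 → 2 in the chain yields 1 ← 2 → 3, which entails the same CI relation 1 ⊥⊥ 3 | 2.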
Theorem (Matejovicova, LS, Uhler, Y. Wang, 2017): The Triangle SP Algorithm is consistent under the faithfulness assumption.

Parsing out the Assumptions

ESP Assumption: the assumption guaranteeing consistency of Edge SP.
TSP Assumption: the assumption guaranteeing consistency of Triangle SP.
Write A ≺ B for "A is strictly weaker than B."

Theorem (Matejovicova, LS, Uhler, Y. Wang, 2017): SMR ≺ ESP ≺ TSP ≺ faithfulness.

What about restricted faithfulness?

Theorem (Matejovicova, LS, Uhler, Y. Wang, 2017): Consistency under the TSP assumption implies adjacency faithfulness, but not orientation faithfulness.

The Good News

- The vertices of the DAG associahedron serve as a reduced search space for the SP Algorithm; we can therefore execute the SP algorithm in a search space with fewer than n! elements.
- The ESP and TSP Algorithms can greedily search over the vertices of a DAG associahedron.
- The TSP Algorithm is consistent under faithfulness.
- We understand the relationships among the SMR, ESP, TSP, faithfulness, and restricted faithfulness assumptions.

The Bad News: Markov Equivalence of DAGs

Our search along the edges is not truly greedy: at times we are required to move between DAGs G_π and G_τ where |G_π| = |G_τ|.

[Figure: the DAGs 1 → 2 → 3, 1 ← 2 → 3, and 1 ← 2 ← 3 all entail 1 ⊥⊥ 3 | 2, while the collider 1 → 2 ← 3 entails 1 ⊥⊥ 3.]

Two DAGs that differ only by a covered edge reversal entail the same set of CI relations. We call any two such DAGs Markov equivalent.

Problem: the algorithm may search through large portions of a Markov equivalence class (MEC) before finding a neighboring DAG with fewer edges.
To terminate, it must search the ENTIRE MEC of the sparsest DAGs! This motivates two enumerative questions:
1. For a fixed set of graph parameters, how many MECs are there?
2. What are the sizes of these MECs?

Markov Equivalence of DAGs

A collider that is not in a triangle is called an immorality. [Figure: an immorality, versus a collider inside a triangle, which is not an immorality.]

Theorem (Verma and Pearl, 1992): Two DAGs are Markov equivalent if and only if they have the same skeleton and the same set of immoralities.

[Figure: an MEC with three elements.]

[G] = the MEC containing the DAG G. [G] has an associated partially directed graph called the essential graph Ĝ of [G]:
- chain components of [G] = the undirected connected components of Ĝ;
- essential components of [G] = the directed connected components of Ĝ.

The Two Enumerative Questions

Previous work:
1. (Gillespie and Perlman, 2001) Computer enumeration of all MECs on p ≤ 10 nodes.
2. (Gillespie '06, Steinsky '03, Wagner '13) Enumeration of MECs of a given size:
   - formulas only for small class sizes (size one, two, and three), or restricted chordal components;
   - inclusion-exclusion arguments on essential graphs.

A new approach: since a greedy SP algorithm must search the ENTIRE true MEC to terminate, when can we solve the enumeration problem for a fixed skeleton?

Combinatorial Enumeration by Skeleton: A New Instance of an Old Combinatorial Approach

Restrict to a type of skeleton and solve the enumeration problem there.
I_p := the path on p vertices.
M(G) := the number of MECs with skeleton G.
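The Verma-Pearl criterion gives a direct algorithmic test for Markov equivalence. A minimal sketch, assuming DAGs are given as sets of arrows over comparable (here integer) node labels:

```python
def skeleton(arrows):
    """Underlying undirected edge set of a DAG given as arrows (a, b)."""
    return {frozenset(e) for e in arrows}

def immoralities(arrows):
    """Colliders a -> c <- b with a and b non-adjacent (Verma-Pearl)."""
    skel = skeleton(arrows)
    parent_sets = {}
    for a, b in arrows:
        parent_sets.setdefault(b, set()).add(a)
    found = set()
    for c, pa in parent_sets.items():
        for a in pa:
            for b in pa:
                if a < b and frozenset((a, b)) not in skel:
                    found.add((a, c, b))
    return found

def markov_equivalent(g, h):
    """Same skeleton and same immoralities iff Markov equivalent."""
    return skeleton(g) == skeleton(h) and immoralities(g) == immoralities(h)

# The three-node examples: the chain and the fork entail 1 ⊥⊥ 3 | 2,
# while the collider 1 -> 2 <- 3 is an immorality entailing 1 ⊥⊥ 3.
chain, fork, collider = {(1, 2), (2, 3)}, {(2, 1), (2, 3)}, {(1, 2), (3, 2)}
```

The chain and the fork are Markov equivalent (no immoralities, same skeleton), while the collider is not equivalent to either.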
M(I_p) = M(I_{p−1}) + M(I_{p−2}), with M(I_1) = 1 and M(I_2) = 1. Hence M(I_p) = F_{p−1}, the (p−1)st Fibonacci number.

A Second Proof: More Information

An independent set in G is a subset of mutually non-adjacent nodes.
α_k(G) := the number of independent sets in G of size k.
I(G; x) := Σ_{k≥0} α_k(G) x^k, the independence polynomial of G.
m_k(G) := the number of MECs on G with k immoralities.

Σ_{k≥0} m_k(I_p) x^k = I(I_{p−2}; x) = F_{p−1}(x), the (p−1)st Fibonacci polynomial.

[Figure: Pascal's triangle; the coefficients of the Fibonacci polynomials appear along its diagonals.]

s_ℓ(G) := the number of MECs on G of size ℓ.

A composition of p into k parts is an ordered sum of k positive integers whose value is p: c_1 + c_2 + ⋯ + c_k = p (e.g., 1 + 3 + 2 is a composition of 6 into 3 parts).

s_ℓ(I_p) is the number of compositions of p − k into k + 1 parts, over all k, for which ∏_{j=1}^{k+1} c_j = ℓ.

Combinatorial Statistics: Refining the Problem

M(G) = the total number of MECs on G.
m_k(G) = the number of MECs on G with precisely k immoralities.
m(G) = the maximum number of immoralities within an MEC on G.
s_k(G) = the number of MECs on G of size k.
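The path recursion is a two-line dynamic program. A quick sketch, using the convention F_0 = F_1 = 1 under which the identity M(I_p) = F_{p−1} on the slide holds:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def M_path(p):
    """Number of MECs whose skeleton is the path I_p, via the slide's
    recursion M(I_p) = M(I_{p-1}) + M(I_{p-2}), M(I_1) = M(I_2) = 1."""
    if p <= 2:
        return 1
    return M_path(p - 1) + M_path(p - 2)

def fib(n):
    """Fibonacci numbers with the convention F_0 = F_1 = 1."""
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

For example, the first few values M(I_1), …, M(I_8) are 1, 1, 2, 3, 5, 8, 13, 21, matching F_0, …, F_7.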
The first three statistics combine into the polynomial generating function:

    M(G; x) := Σ_{k=0}^{m(G)} m_k(G) x^k.

Some Further Examples

The complete set of these statistics is recoverable for some other important graphs, including the cycle C_p, the star S_p ≅ K_{1,p}, and the complete bipartite graph K_{2,p}.

Sparse Examples: Bounds for Trees

A classical bound on the number of independent sets in a tree also holds for the number of MECs on a tree:

Theorem (Radhakrishnan, LS, Uhler, 2016): Let T_p be an undirected tree on p nodes. Then F_{p−1} = M(I_p) ≤ M(T_p) ≤ M(S_{p−1}) = 2^{p−1} − p + 1.

We can also bound the size of an MEC:

Theorem (Radhakrishnan, LS, Uhler, 2016): Let T_p be a directed tree on p nodes whose essential graph has ℓ > 0 chain components and m ≥ 0 essential components. Then 2^ℓ ≤ #[T_p] ≤ ((p − m)/ℓ)^ℓ.

Overcoming Exponentiality: An Implementable Algorithm

We need a version of the TSP Algorithm that avoids the problem of exponentially sized Markov equivalence classes of permutation DAGs.

Solution: introduce a search-depth bound d and a fixed number of runs r.

Triangle SP Algorithm with depth and run bounds. Input: C and two positive integers d and r.
1. Pick a permutation DAG G_π and do a depth-first search along the edges of P_n(C) with depth bound d, searching for a sparser permutation DAG. Repeat the search until no sparser DAG is found, and return the last DAG visited.
2. Do step 1 a total of r times, then select the sparsest of the r permutation DAGs recovered.
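One way to read the depth- and run-bounded algorithm is as a restarted, depth-bounded search that tunnels through plateaus of equally sparse permutation DAGs. The sketch below is an interpretation on the permutohedron rather than the authors' implementation (which walks the DAG associahedron and restricts to covered-edge flips); `ci` is the toy oracle from the running 1 ⊥⊥ 3 example.

```python
import random

def permutation_dag(pi, ci):
    """Permutation DAG G_pi given a CI oracle ci(a, b, S)."""
    return {(pi[i], pi[j])
            for j in range(len(pi)) for i in range(j)
            if not ci(pi[i], pi[j], set(pi[:j]) - {pi[i]})}

def neighbors(pi):
    """Permutohedron neighbors: one adjacent transposition away."""
    for k in range(len(pi) - 1):
        yield pi[:k] + (pi[k + 1], pi[k]) + pi[k + 2:]

def bounded_search(pi, ci, d):
    """Depth-first search to depth d through equally sized permutation
    DAGs, looking for a strictly sparser one; repeat until none found."""
    def sparser(pi, depth):
        if depth == 0:
            return None
        size = len(permutation_dag(pi, ci))
        for tau in neighbors(pi):
            t = len(permutation_dag(tau, ci))
            if t < size:
                return tau                      # strictly sparser: take it
            if t == size:
                hit = sparser(tau, depth - 1)   # tunnel through the plateau
                if hit is not None:
                    return hit
        return None
    while True:
        nxt = sparser(pi, d)
        if nxt is None:
            return pi
        pi = nxt

def bounded_tsp(n, ci, d, r, seed=0):
    """r restarts from random permutations; keep the sparsest result."""
    rng = random.Random(seed)
    runs = [bounded_search(tuple(rng.sample(range(1, n + 1), n)), ci, d)
            for _ in range(r)]
    return min(runs, key=lambda p: len(permutation_dag(p, ci)))

# Toy oracle from the running example: the single CI relation 1 ⊥⊥ 3.
ci = lambda a, b, S: {a, b} == {1, 3} and S == set()
```

With depth bound d = 1 the search from 213 is stuck (all neighbors are equally sparse), while d = 2 lets it cross the plateau and reach a sparsest permutation, illustrating the trade-off the depth bound controls.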
A Sample of Some Simulations

[Figures: simulation results, two slides.]

Moral of the Story

- The combinatorial convex geometry of generalized permutohedra can provide DAG model learning algorithms useful in causal inference!
- The combinatorics of DAG associahedra provide a graphical version of these algorithms that is implementable and consistent under common identifiability assumptions!
- The number and size of Markov equivalence classes is important to understanding the efficiency of algorithms searching over a space of DAGs.
- Connections to classic combinatorial optimization problems yield FUN problems and reveal that MECs can be large even for sparse graphs!
- Adjusting the algorithms with fixed search-depth and run bounds results in algorithms that are efficient and more reliable than the PC-algorithm!

Thank You! (Preprints available on the arXiv!)