The Junction Tree Algorithm

Seungjin Choi
Department of Computer Science and Engineering
Pohang University of Science and Technology
77 Cheongam-ro, Nam-gu, Pohang 37673, Korea
[email protected]

Motivation: Beyond Trees

◮ So far we have seen a probabilistic inference algorithm (sum-product) that works on trees.
◮ Our interest is to develop an inference algorithm that works on graphs with cycles.
◮ In other words, we would like to apply the sum-product algorithm to arbitrary graphs.
◮ The answer is the junction tree algorithm.
◮ The underlying idea of the junction tree framework is to cluster the nodes of a given graph (possibly with cycles) so that the clusters form a tree. But we have to cluster the nodes in a particular way.

An Example of Clique Tree

Figure: (a) A 6-node example; (b) the elimination cliques created from a run of the elimination algorithm using the ordering $\{X_6, X_5, X_4, X_3, X_2, X_1\}$, namely $\{X_2, X_5, X_6\}$, $\{X_2, X_3, X_5\}$, $\{X_2, X_4\}$, $\{X_1, X_2, X_3\}$, $\{X_1, X_2\}$, $\{X_1\}$; (c) the elimination cliques arranged into a clique tree.

An Example of Clique Tree with Separator Sets

[Figure: a clique tree with the separator sets marked on its edges.]

The Junction Tree Algorithm?

◮ The junction tree algorithm is a general algorithmic framework, which provides an understanding of the general concepts that underlie inference.
◮ The idea of the junction tree algorithm is to find ways to decompose a global calculation on a joint probability distribution into a linked set of local computations.
◮ The overall procedure of the junction tree algorithm:
   1. Moralization
   2. Triangulation
   3. Construction of the junction tree
   4. Potentials transfer
   5. Propagation

Clique Tree, Separator Set, and Junction Tree

Definition. A clique tree is a tree-structured graph whose vertices are maximal cliques of the original graph.

Definition. A separator set is attached to each edge of the clique tree and is the intersection of the two cliques joined by that edge.
Definition. A junction tree is a special kind of clique tree such that for every pair of cliques, $C_1$ and $C_2$, of the original graph, $C_1 \cap C_2$ is contained in every separator set on the unique path from $C_1$ to $C_2$ (the running intersection property).

Examples of Graphs and their Clique Trees

[Figure: two graphs on the vertices 1, 2, 3, 4 and their clique trees. Upper: cliques 123 and 234 joined through the separator 23. Lower: cliques 12, 24, 13, 34.]

The upper clique tree is a junction tree but the lower one is not. For the upper graph we can do elimination without adding any extra edges; for the lower graph we cannot.

Extended Representation of Joint Probabilities

Given a clique tree with cliques $\mathcal{C}$ and separators $\mathcal{S}$, the joint probability is defined as

$$p(x) = \frac{\prod_{C \in \mathcal{C}} \psi_C(x_C)}{\prod_{S \in \mathcal{S}} \phi_S(x_S)},$$

where $\psi_C(x_C)$ is a clique potential and $\phi_S(x_S)$ is a potential function for each separator $S \in \mathcal{S}$.

Example, for a three-node graph with nodes A, B, C:

$$p(x_A, x_B, x_C) = p(x_A, x_B)\, p(x_C \mid x_B) = \frac{p(x_A, x_B)\, p(x_B, x_C)}{p(x_B)} = \frac{\psi_{AB}\, \psi_{BC}}{\phi_B}.$$

The Basic Operation of Junction Tree Algorithm

The basic operation of the junction tree algorithm is an exchange of information between V and W, with S serving as a conduit for the flow of information.

[Figure: cliques V and W joined through the separator S, carrying the potentials $\psi_V$, $\phi_S$, $\psi_W$.]

◮ A separator potential is supportive if, whenever a configuration yields a value of zero for the separator potential, the clique potentials at both ends of the edge containing that separator also evaluate to zero.
◮ If the potentials are to represent marginal probabilities, it is necessary that they are consistent with each other, i.e., they must give the same marginals for the nodes that they have in common.
◮ It turns out that consistency is not only a necessary condition but also a sufficient condition for a probabilistic inference algorithm.
◮ Over the next few slides, we focus on the elemental problem of achieving consistency between a pair of cliques.

Information Exchange: V → W

The cliques V and W are endowed with potentials $\psi_V$ and $\psi_W$, and the separator S is also endowed with a potential $\phi_S$ that we initialize to unity.
In the first step (information passing from V to W), we update W based on V:

$$\phi_S^{*} = \sum_{V \setminus S} \psi_V, \qquad \psi_W^{*} = \frac{\phi_S^{*}}{\phi_S}\, \psi_W,$$

where the first equation marginalizes the potential $\psi_V$ onto S (summing over the variables in $V \setminus S$), storing the result in the separator potential. The second equation rescales the potential on W by multiplying it by an update factor that is the ratio of the new separator potential to its old value.

The joint distribution is invariant under this update (taking $\psi_V^{*} = \psi_V$):

$$\frac{\psi_V^{*}\, \psi_W^{*}}{\phi_S^{*}} = \frac{\psi_V\, \frac{\phi_S^{*}}{\phi_S}\, \psi_W}{\phi_S^{*}} = \frac{\psi_V\, \psi_W}{\phi_S}.$$

Information Exchange: W → V

We pass information from W back to V, using the same update rule:

$$\phi_S^{**} = \sum_{W \setminus S} \psi_W^{*}, \qquad \psi_V^{**} = \frac{\phi_S^{**}}{\phi_S^{*}}\, \psi_V^{*}.$$

Once again, the joint distribution remains unaltered by this update.

An important property: the potentials $\psi_V^{**}$ and $\psi_W^{**}$ (where $\psi_W^{**} = \psi_W^{*}$, since W is not changed in this step) are consistent with respect to S, i.e., they have the same marginals:

$$\sum_{V \setminus S} \psi_V^{**} = \sum_{V \setminus S} \frac{\phi_S^{**}}{\phi_S^{*}}\, \psi_V^{*} = \frac{\phi_S^{**}}{\phi_S^{*}} \sum_{V \setminus S} \psi_V^{*} = \frac{\phi_S^{**}}{\phi_S^{*}}\, \phi_S^{*} = \phi_S^{**} = \sum_{W \setminus S} \psi_W^{**}.$$

In the forward pass, from V to W, the algorithm stores the marginal of the V potential in the separator potential. In the backward pass, from W to V, the algorithm divides the V potential by its stored marginal and multiplies the result by the new marginal $\phi_S^{**}$, which is the marginal of the W potential. The rescaling equation essentially substitutes one marginal for another, thus making the two clique potentials consistent.

Propagation in a Clique Tree (1)

[Figure: cliques V and W joined through the separator S, together with further neighboring cliques $C_1$, $C_2$, $D_1$, $D_2$.]

How should we perform the local updates so that the local consistency obtained between a clique and its neighbor is not ruined by subsequent updates between that clique and its other neighbors?

The update of one clique based on another is referred to as a message-passing operation. The answer to the question above is as follows:

Message-Passing Protocol. A clique can send a message to a neighboring clique only when it has received messages from all of its other neighbors.
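The forward and backward updates between V and W can be checked numerically. The following is a minimal sketch, assuming a hypothetical pair of cliques V = {A, B} and W = {B, C} over binary variables with separator S = {B}; the potential values are made up for illustration, and the names `psi_V`, `psi_W`, `phi_S` are choices of this sketch, not of the slides.

```python
import math
from itertools import product

# Hypothetical cliques V = {A, B}, W = {B, C}, separator S = {B}; all binary.
psi_V = [[0.3, 0.2], [0.1, 0.4]]   # psi_V[a][b], illustrative values
psi_W = [[1.0, 2.0], [3.0, 1.0]]   # psi_W[b][c], illustrative values
phi_S = [1.0, 1.0]                 # separator potential initialized to unity


def joint(pV, pW, pS):
    """Extended representation: psi_V * psi_W / phi_S for every (a, b, c)."""
    return {(a, b, c): pV[a][b] * pW[b][c] / pS[b]
            for a, b, c in product(range(2), repeat=3)}


before = joint(psi_V, psi_W, phi_S)

# Forward pass V -> W: marginalize psi_V onto S, then rescale psi_W.
phi_S1 = [sum(psi_V[a][b] for a in range(2)) for b in range(2)]
psi_W1 = [[psi_W[b][c] * phi_S1[b] / phi_S[b] for c in range(2)]
          for b in range(2)]

# Backward pass W -> V: marginalize the updated psi_W onto S, rescale psi_V.
phi_S2 = [sum(psi_W1[b][c] for c in range(2)) for b in range(2)]
psi_V2 = [[psi_V[a][b] * phi_S2[b] / phi_S1[b] for b in range(2)]
          for a in range(2)]

after = joint(psi_V2, psi_W1, phi_S2)

# Marginals of the two updated clique potentials on the separator variable B.
marg_V = [sum(psi_V2[a][b] for a in range(2)) for b in range(2)]
marg_W = [sum(psi_W1[b][c] for c in range(2)) for b in range(2)]

consistent = all(math.isclose(marg_V[b], marg_W[b]) for b in range(2))
invariant = all(math.isclose(before[k], after[k]) for k in before)
print("consistent:", consistent, "joint invariant:", invariant)
```

The divisions are safe here only because every separator entry is nonzero; with zeros, the supportiveness condition on the separator potential is what keeps the update well defined.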
Propagation in a Clique Tree (2)

COLLECTEVIDENCE(node)
begin
    for each child of node
    begin
        UPDATE(node, COLLECTEVIDENCE(child))
    end
    return(node)
end

DISTRIBUTEEVIDENCE(node)
begin
    for each child of node
    begin
        UPDATE(child, node)
        DISTRIBUTEEVIDENCE(child)
    end
    return(node)
end

Propagation in a Clique Tree (3)

Theorem. The COLLECTEVIDENCE and DISTRIBUTEEVIDENCE recursions respect the Message-Passing Protocol.

Figure: The message passing resulting from a call of (a) COLLECTEVIDENCE and (b) DISTRIBUTEEVIDENCE.

So far we have ... and next?

◮ What we have seen so far:
   ◮ What clique trees and junction trees are.
   ◮ Message passing on junction trees.
◮ Our next questions are:
   ◮ How to get a junction tree?
   ◮ What classes of graphs have a junction tree?

Triangulated Graph

Definition. A cycle is chordless if no two non-adjacent vertices on the cycle are joined by an edge.

[Figure: two graphs on the vertices 1, 2, 3, 4; the left one is labeled non-chordless and the right one chordless.]

Definition. A graph is triangulated if it has no chordless cycles of length four or more.

All triangulated graphs have a junction tree.

Decomposable Graph

Definition. A graph is decomposable if either it is complete or its vertices can be divided into disjoint, non-empty subsets A, B, S such that:
   1. S separates A from B.
   2. S is a clique.
   3. A ∪ S and B ∪ S are also decomposable.

Examples of decomposable graphs are shown below:

[Figure: two example graphs, a three-node graph (left) and a four-node graph (right).]

For the left graph, A = {1}, B = {3}, and S = {2}. For the right graph, A = {1}, B = {4}, and S = {2, 3}.

Decomposable ≡ Triangulated ≡ Junction Tree

Theorem. The following properties of an undirected graph G are all equivalent:
   1. G is decomposable.
   2. G is triangulated.
   3. G has a junction tree.

Decomposability captures the divide-and-conquer nature of these algorithms. Elimination produces a triangulated graph that can be used to build a junction tree. Junction trees are a canonical data structure useful for organizing computations.
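The claim that elimination produces a triangulated graph can be made concrete with a short sketch of graph elimination that records the elimination cliques and the fill-in edges it adds. The function name `eliminate` and the edge lists are assumptions of this sketch: the six-node edge list is not given in the text (the figure is lost) and is chosen here so that the ordering 6, 5, 4, 3, 2, 1 reproduces the elimination cliques listed in the six-node example.

```python
def eliminate(edges, order):
    """Run graph elimination; return (elimination cliques, fill-in edges)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    cliques, fill = [], []
    for node in order:
        nbrs = adj.pop(node, set())
        # The eliminated node together with its remaining neighbours.
        cliques.append(frozenset(nbrs | {node}))
        for u in sorted(nbrs):
            adj[u].discard(node)
            # Connect every pair of remaining neighbours of the node.
            for v in sorted(nbrs):
                if u < v and v not in adj[u]:
                    adj[u].add(v)
                    adj[v].add(u)
                    fill.append((u, v))
    return cliques, fill


# Assumed six-node graph (hypothetical edge list, see the note above).
six_node = [(1, 2), (1, 3), (2, 4), (3, 5), (2, 6), (5, 6)]
cliques6, fill6 = eliminate(six_node, [6, 5, 4, 3, 2, 1])

# A four-cycle (cliques 12, 24, 13, 34): elimination must add a chord.
four_cycle = [(1, 2), (2, 4), (4, 3), (3, 1)]
_, fill_cycle = eliminate(four_cycle, [1, 2, 3, 4])

# The same four nodes with the chord 2-3 already present: no fill-in needed.
chorded = four_cycle + [(2, 3)]
_, fill_chorded = eliminate(chorded, [1, 4, 2, 3])

print([sorted(c) for c in cliques6])
print(fill6, fill_cycle, fill_chorded)
```

Adding the returned fill-in edges to the original graph gives the triangulated graph for that ordering; different orderings can give different fill-ins and clique sizes.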
Constructing a Junction Tree: An Example

[Figure: an original graph on the vertices 1 to 5 and its triangulated graph; the isolated cliques 123, 235, 245; a junction tree, 123 (23) 235 (25) 245; and a clique tree that is not a junction tree, 123 (2) 245 (25) 235. Separator sets are shown in parentheses.]

Constructing a Junction Tree (1)

As we have seen in the example on the previous slide, sometimes a clique tree is a junction tree, while other times it is not. In practice, we cannot enumerate all the clique trees, so with larger graphs we will need a better approach.

Definition. The weight of a separator set is defined as

$$w(S) = |S| = \sum_{i=1}^{N} I[x_i \in S].$$

Definition. The weight of a tree is defined by

$$w(T) = \sum_{j=1}^{M-1} w(S_j),$$

where M is the number of cliques (vertices in a clique tree). Recall that a tree with M nodes has M − 1 edges, and thus M − 1 separator sets (one for every edge).

Constructing a Junction Tree (2)

Consider a node $x_k$ and a clique tree T with cliques $C_i$ and separators $S_j$. The number of times that $x_k$ appears as an element of one of the $S_j$ is related to the number of times that $x_k$ appears as an element of one of the $C_i$.

Lemma. For a clique tree, with $k \in \{1, \ldots, N\}$,

$$\sum_{j=1}^{M-1} I[x_k \in S_j] \le \sum_{i=1}^{M} I[x_k \in C_i] - 1.$$

Note that equality holds if and only if the running intersection property holds for k.

Theorem. A clique tree T is a junction tree if and only if it is a maximal spanning tree, i.e., it has maximal weight w(T) among all clique trees.

$$w(T) = \sum_{j=1}^{M-1} w(S_j) = \sum_{j=1}^{M-1} \sum_{k=1}^{N} I[x_k \in S_j] = \sum_{k=1}^{N} \sum_{j=1}^{M-1} I[x_k \in S_j]$$

$$\le \sum_{k=1}^{N} \left( \sum_{i=1}^{M} I[x_k \in C_i] - 1 \right) = \sum_{i=1}^{M} \sum_{k=1}^{N} I[x_k \in C_i] - N = \sum_{i=1}^{M} |C_i| - N.$$

The upper bound does not depend on the tree, and by the lemma it is attained exactly when the running intersection property holds for every k.

The Hugin Algorithm

◮ Moralization: Convert a directed graph into an undirected graph.
◮ Introduction of evidence: Evidence is introduced by taking slices of the potentials.
◮ Triangulation: The elimination algorithm always produces a triangulated graph. However, there are N! possible orderings for the elimination algorithm. In fact, finding an optimal elimination ordering is NP-hard.
◮ Construction of a junction tree: Construct a junction tree by forming a maximal spanning tree from the cliques of the triangulated graph.
◮ Belief propagation: Use the update equations, designating a root node and calling COLLECTEVIDENCE and DISTRIBUTEEVIDENCE from the root.
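The junction tree construction step can be sketched as a maximal-weight spanning tree over the cliques, with the weight of an edge taken to be the size of the intersection of the two cliques it joins. The sketch below uses Kruskal's algorithm with a small union-find; this choice of algorithm and the function name `max_weight_clique_tree` are assumptions of the sketch rather than something the slides prescribe. The cliques 123, 235, 245 are those of the construction example.

```python
from itertools import combinations


def max_weight_clique_tree(cliques):
    """Kruskal's algorithm on the clique graph with weights |Ci & Cj|.

    Returns a list of tree edges (i, j, separator) over clique indices.
    """
    parent = list(range(len(cliques)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All candidate edges, heaviest separator first.
    candidates = sorted(
        ((len(cliques[i] & cliques[j]), i, j)
         for i, j in combinations(range(len(cliques)), 2)),
        key=lambda t: -t[0],
    )
    tree = []
    for _, i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:                      # edge joins two components
            parent[ri] = rj
            tree.append((i, j, cliques[i] & cliques[j]))
    return tree


cliques = [frozenset({1, 2, 3}), frozenset({2, 3, 5}), frozenset({2, 4, 5})]
tree = max_weight_clique_tree(cliques)

weight = sum(len(sep) for _, _, sep in tree)
num_nodes = len(frozenset().union(*cliques))
bound = sum(len(c) for c in cliques) - num_nodes   # sum of |C_i| minus N

for i, j, sep in tree:
    print(sorted(cliques[i]), sorted(sep), sorted(cliques[j]))
print("w(T) =", weight, "bound =", bound)
```

When the tree weight equals the bound given by the sum of the clique sizes minus N, the running intersection property holds for every node, so the resulting clique tree is a junction tree.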