The Junction Tree Algorithm

Seungjin Choi
Department of Computer Science and Engineering
Pohang University of Science and Technology
77 Cheongam-ro, Nam-gu, Pohang 37673, Korea
[email protected]
Motivation: Beyond Trees
◮ So far we have seen a nice probabilistic inference algorithm (sum-product) which works on trees.
◮ Our interest is to develop an inference algorithm which works on graphs with cycles.
◮ In other words, we would like to apply the sum-product algorithm to arbitrary graphs.
◮ The answer is the junction tree algorithm.
◮ The underlying idea of the junction tree framework is to cluster the nodes of a given graph (possibly with cycles) so that the clusters form a tree. But we have to cluster the nodes in a particular way.
An Example of Clique Tree
[Figure: (a) a 6-node example graph on X1, ..., X6; (b) the elimination cliques created by a run of the elimination algorithm using the ordering {X6, X5, X4, X3, X2, X1}: {X2, X5, X6}, {X2, X3, X5}, {X2, X4}, {X1, X2, X3}, {X1, X2}, {X1}; (c) the elimination cliques arranged into a clique tree.]
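The elimination run described in the figure can be sketched in a few lines of Python. This is a minimal sketch, not the author's code: the function name `elimination_cliques` is hypothetical, and the edge set `EDGES` is an assumption, chosen so that the ordering {X6, X5, X4, X3, X2, X1} reproduces the elimination cliques listed in the figure (nodes are written as integers 1 to 6).

```python
from itertools import combinations

def elimination_cliques(edges, ordering):
    """Run graph elimination; return (elimination cliques, fill-in edges)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    cliques, fill_in = [], []
    for node in ordering:
        nbrs = adj.pop(node, set())
        # The eliminated node together with its remaining neighbours.
        cliques.append(frozenset(nbrs | {node}))
        # Connect all remaining neighbours of the eliminated node.
        for u, v in combinations(sorted(nbrs), 2):
            if v not in adj[u]:
                adj[u].add(v)
                adj[v].add(u)
                fill_in.append((u, v))
        for u in nbrs:
            adj[u].discard(node)
    return cliques, fill_in

# Assumed edge set for the 6-node example (not given in the text).
EDGES = [(1, 2), (1, 3), (2, 4), (3, 5), (2, 6), (5, 6)]
CLIQUES, FILL_IN = elimination_cliques(EDGES, [6, 5, 4, 3, 2, 1])
```

Under this assumed edge set, the fill-in edges returned in `FILL_IN` are exactly the edges that elimination adds to triangulate the graph.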
An Example of Clique Tree with Separator Sets
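[Figure: the clique tree of the previous example, with cliques {X1}, {X1, X2}, {X2, X4}, {X1, X2, X3}, {X2, X3, X5}, {X2, X5, X6}; each edge is annotated with its separator set, one of {X1}, {X2}, {X1, X2}, {X2, X3}, {X2, X5}.]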
The Junction Tree Algorithm?
◮ The junction tree algorithm is a general algorithmic framework, which provides an understanding of the general concepts that underlie inference.
◮ The idea of the junction tree algorithm is to find ways to decompose a global calculation on a joint probability distribution into a linked set of local computations.
◮ The overall procedure of the junction tree algorithm:
1. Moralization
2. Triangulation
3. Construction of the junction tree
4. Potentials transfer
5. Propagation
Clique Tree, Separator Set, and Junction Tree
Definition
A clique tree is a tree-structured graph whose vertices are the maximal cliques of the original graph.
Definition
A separator set is attached to each edge of a clique tree; it is the intersection of the two cliques that the edge joins.
Definition
A junction tree is a special kind of clique tree such that, for every pair of cliques C1 and C2 of the original graph, C1 ∩ C2 is contained in every clique and every separator set on the unique path from C1 to C2 (the running intersection property).
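The running intersection property can be checked mechanically. The following is a minimal sketch, assuming a clique tree given as a list of `frozenset` cliques plus a list of index pairs for the tree edges; the helper names `tree_path` and `is_junction_tree` are hypothetical. The example uses the three cliques 123, 235, and 245 that appear in the construction example later in these slides.

```python
from itertools import combinations

def tree_path(edges, start, goal):
    """Return the unique path from start to goal in a tree of index pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for nxt in adj.get(node, []):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    raise ValueError("cliques are not connected")

def is_junction_tree(cliques, edges):
    """Check the running intersection property on a clique tree."""
    for i, j in combinations(range(len(cliques)), 2):
        shared = cliques[i] & cliques[j]
        path = tree_path(edges, i, j)
        # Every separator along the path must contain the shared variables.
        for a, b in zip(path, path[1:]):
            if not shared <= (cliques[a] & cliques[b]):
                return False
    return True

CLIQUES = [frozenset({1, 2, 3}), frozenset({2, 3, 5}), frozenset({2, 4, 5})]
CHAIN_OK = is_junction_tree(CLIQUES, [(0, 1), (1, 2)])   # 123 - 235 - 245
CHAIN_BAD = is_junction_tree(CLIQUES, [(0, 2), (2, 1)])  # 123 - 245 - 235
```

The second arrangement fails because node 3 is shared by 123 and 235 but is missing from the clique 245 that lies between them.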
Examples of Graphs and their Clique Trees
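[Figure: two graphs on nodes 1, 2, 3, 4 and their clique trees. Upper: cliques 123 and 234 joined through the separator 23. Lower: cliques 12, 24, 13, and 34.]
The upper clique tree is a junction tree but the lower one is not. For the upper graph we can run elimination without adding any extra edges; for the lower graph we cannot.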
Extended Representation of Joint Probabilities
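Given a clique tree with cliques $\mathcal{C}$ and separators $\mathcal{S}$, the joint probability is defined as
$$p(x) = \frac{\prod_{C \in \mathcal{C}} \psi_C(x_C)}{\prod_{S \in \mathcal{S}} \phi_S(x_S)},$$
where $\psi_C(x_C)$ is a clique potential and $\phi_S(x_S)$ is a potential function for each separator $S \in \mathcal{S}$.
Example: for the chain A - B - C,
$$p(x_A, x_B, x_C) = p(x_A, x_B)\, p(x_C \mid x_B) = \frac{p(x_A, x_B)\, p(x_B, x_C)}{p(x_B)} = \frac{\psi_{AB}\, \psi_{BC}}{\phi_B}.$$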
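The chain factorization above can be verified numerically. The sketch below assumes binary variables and made-up probability tables (`p_ab` and `p_c_given_b` are hypothetical numbers, not from the slides); exact fractions are used so that the final comparison is exact rather than approximate.

```python
from fractions import Fraction as F
from itertools import product

VALS = (0, 1)

# Hypothetical tables for a binary chain A - B - C.
p_ab = {(0, 0): F(3, 10), (0, 1): F(1, 10), (1, 0): F(2, 10), (1, 1): F(4, 10)}
# Keys are (c, b): p(c | b).
p_c_given_b = {(0, 0): F(1, 4), (1, 0): F(3, 4), (0, 1): F(2, 3), (1, 1): F(1, 3)}

# Joint distribution p(a, b, c) = p(a, b) p(c | b).
joint = {(a, b, c): p_ab[a, b] * p_c_given_b[c, b]
         for a, b, c in product(VALS, repeat=3)}

# Clique and separator potentials set to marginals of the joint.
psi_ab = {(a, b): sum(joint[a, b, c] for c in VALS)
          for a, b in product(VALS, repeat=2)}
psi_bc = {(b, c): sum(joint[a, b, c] for a in VALS)
          for b, c in product(VALS, repeat=2)}
phi_b = {b: sum(psi_ab[a, b] for a in VALS) for b in VALS}

# Extended representation: psi_AB * psi_BC / phi_B.
reconstructed = {(a, b, c): psi_ab[a, b] * psi_bc[b, c] / phi_b[b]
                 for a, b, c in product(VALS, repeat=3)}
```

With marginals as potentials, `reconstructed` coincides with `joint` entry by entry, which is the identity stated in the example.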
The Basic Operation of Junction Tree Algorithm
The basic operation of the junction tree algorithm is an exchange of information
between V and W with S serving as a conduit for the flow of information.
[Figure: cliques V and W joined through the separator S, carrying the potentials ψV, φS, and ψW.]
◮ A separator potential is supportive if, whenever a configuration yields a value of zero for the separator potential, the clique potentials at both ends of the edge containing that separator also evaluate to zero.
◮ If the potentials are to represent marginal probabilities, it is necessary that they are consistent with each other, i.e., they must give the same marginals for the nodes that they have in common.
◮ It turns out that consistency is not only a necessary condition but also a sufficient one, so achieving it is the goal of a probabilistic inference algorithm.
◮ Over the next few slides, we focus on the elemental problem of achieving consistency between a pair of cliques.
Information Exchange: V → W
The cliques V and W are endowed with potentials ψV and ψW and the
separator S is also endowed with a potential φS that we initialize to unity.
In the first step (information passing from V to W ), we update W based on V
$$\phi_S^{*} = \sum_{V \setminus S} \psi_V, \qquad \psi_W^{*} = \frac{\phi_S^{*}}{\phi_S}\, \psi_W,$$
where the first equation marginalizes the potential $\psi_V$ onto S, storing the result in the separator potential. The second equation rescales the potential on W by multiplying it by an update factor, the ratio of the new separator potential to its old value.
The joint distribution is invariant under this update (with $\psi_V^{*} = \psi_V$):
$$\frac{\psi_V^{*}\, \psi_W^{*}}{\phi_S^{*}} = \frac{\psi_V\, \psi_W\, \phi_S^{*}}{\phi_S\, \phi_S^{*}} = \frac{\psi_V\, \psi_W}{\phi_S}.$$
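A small numerical check of this invariance, as a minimal sketch: it assumes two binary cliques V = {A, B} and W = {B, C} with separator S = {B}, and the potential values are made up for illustration. Exact fractions keep the comparison exact.

```python
from fractions import Fraction as F
from itertools import product

VALS = (0, 1)

# Hypothetical potentials: psi_v[a, b] on V = {A, B}, psi_w[b, c] on W = {B, C}.
psi_v = {(0, 0): F(1), (0, 1): F(2), (1, 0): F(3), (1, 1): F(4)}
psi_w = {(0, 0): F(5), (0, 1): F(1), (1, 0): F(2), (1, 1): F(2)}
phi_s = {b: F(1) for b in VALS}  # separator S = {B}, initialised to unity

def joint(psi_v, psi_w, phi_s):
    """Extended representation psi_V * psi_W / phi_S (unnormalised)."""
    return {(a, b, c): psi_v[a, b] * psi_w[b, c] / phi_s[b]
            for a, b, c in product(VALS, repeat=3)}

before = joint(psi_v, psi_w, phi_s)

# phi*_S = sum over V \ S of psi_V (here: sum over a).
phi_s_new = {b: sum(psi_v[a, b] for a in VALS) for b in VALS}
# psi*_W = (phi*_S / phi_S) * psi_W.
psi_w_new = {(b, c): phi_s_new[b] / phi_s[b] * psi_w[b, c]
             for b, c in product(VALS, repeat=2)}

after = joint(psi_v, psi_w_new, phi_s_new)
```

The dictionaries `before` and `after` hold the same unnormalised joint, even though the separator and the potential on W have both changed.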
Information Exchange: W → V
We pass information from W back to V , using the same update rule:
$$\phi_S^{**} = \sum_{W \setminus S} \psi_W^{*}, \qquad \psi_V^{**} = \frac{\phi_S^{**}}{\phi_S^{*}}\, \psi_V^{*}.$$
Once again, the joint distribution remains unaltered by this update.
An important property: the potentials $\psi_V^{**}$ and $\psi_W^{**}$ (with $\psi_W^{**} = \psi_W^{*}$) are consistent with respect to S, i.e., they have the same marginals:
$$\sum_{V \setminus S} \psi_V^{**} = \sum_{V \setminus S} \frac{\phi_S^{**}}{\phi_S^{*}}\, \psi_V^{*} = \frac{\phi_S^{**}}{\phi_S^{*}} \sum_{V \setminus S} \psi_V^{*} = \frac{\phi_S^{**}}{\phi_S^{*}}\, \phi_S^{*} = \phi_S^{**} = \sum_{W \setminus S} \psi_W^{**}.$$
In the forward pass, from V to W, the algorithm stores the marginal of the V potential in the separator potential. In the backward pass, from W to V, the algorithm divides the V potential by its stored marginal and multiplies the result by the new marginal $\phi_S^{**}$, which is the marginal of the W potential. The rescaling equation essentially substitutes one marginal for another, thus making the two clique potentials consistent.
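Both passes can be written with one generic update routine. The sketch below is illustrative only: `absorb` and `marginal` are hypothetical helper names, the cliques are assumed to be V = {A, B} and W = {B, C} over binary variables with separator S = {B}, the potential values are made up, and all separator entries are assumed to be nonzero so that the division is defined.

```python
from fractions import Fraction as F

VALS = (0, 1)

def marginal(psi, pos):
    """Sum a clique potential down to the separator variable at index pos."""
    out = {s: F(0) for s in VALS}
    for key, value in psi.items():
        out[key[pos]] += value
    return out

def absorb(psi_from, psi_to, phi_s, s_from, s_to):
    """One update: marginalise psi_from onto S, then rescale psi_to.

    s_from and s_to give the position of the separator variable in the
    key tuples of psi_from and psi_to.
    """
    phi_new = marginal(psi_from, s_from)
    psi_new = {key: value * phi_new[key[s_to]] / phi_s[key[s_to]]
               for key, value in psi_to.items()}
    return phi_new, psi_new

# Hypothetical potentials: keys (a, b) for V = {A, B}, keys (b, c) for W = {B, C}.
psi_v = {(0, 0): F(1), (0, 1): F(2), (1, 0): F(3), (1, 1): F(4)}
psi_w = {(0, 0): F(5), (0, 1): F(1), (1, 0): F(2), (1, 1): F(2)}
phi_s = {0: F(1), 1: F(1)}

phi_1, psi_w_1 = absorb(psi_v, psi_w, phi_s, s_from=1, s_to=0)    # V -> W
phi_2, psi_v_2 = absorb(psi_w_1, psi_v, phi_1, s_from=0, s_to=1)  # W -> V
```

After the two passes, `marginal(psi_v_2, 1)`, `phi_2`, and `marginal(psi_w_1, 0)` are the same table, which is the consistency property derived above.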
Propagation in a Clique Tree (1)
[Figure: cliques V and W joined through the separator S, together with further neighboring cliques C1, C2, D1, and D2.]
How should the local updates be performed so that the local consistency obtained between a clique and its neighbor is not ruined by subsequent updates between the clique and its other neighbors?
The update of one clique based on another is referred to as a message-passing operation. The answer to the question above is as follows:
Message-Passing Protocol. A clique can send a message to a neighboring clique only when it has received messages from all of its other neighbors.
Propagation in a Clique Tree (2)
COLLECTEVIDENCE(node)
begin
    for each child of node
    begin
        UPDATE( node, COLLECTEVIDENCE(child) )
    end
    return( node )
end

DISTRIBUTEEVIDENCE(node)
begin
    for each child of node
    begin
        UPDATE( child, node )
        DISTRIBUTEEVIDENCE( child )
    end
    return( node )
end
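The order in which these two recursions trigger updates can be traced in Python. This is a sketch of the message schedule only (no potentials are touched): the function names, the adjacency-list format, and the example tree `TREE` are assumptions made for illustration. Each logged pair `(sender, receiver)` stands for one call UPDATE(receiver, sender).

```python
def collect_evidence(tree, node, parent, log):
    """Post-order pass: a child reports to node after collecting from its subtree."""
    for child in tree[node]:
        if child != parent:
            collect_evidence(tree, child, node, log)
            log.append((child, node))   # UPDATE(node, child)
    return node

def distribute_evidence(tree, node, parent, log):
    """Pre-order pass: node updates each child, which then does the same."""
    for child in tree[node]:
        if child != parent:
            log.append((node, child))   # UPDATE(child, node)
            distribute_evidence(tree, child, node, log)
    return node

def respects_protocol(tree, log):
    """True if each sender had already heard from all of its other neighbours."""
    received = {n: set() for n in tree}
    for sender, receiver in log:
        if set(tree[sender]) - {receiver} - received[sender]:
            return False
        received[receiver].add(sender)
    return True

# Hypothetical clique tree as an adjacency list; "V" is taken as the root.
TREE = {"V": ["C1", "C2", "W"], "W": ["V", "D1", "D2"],
        "C1": ["V"], "C2": ["V"], "D1": ["W"], "D2": ["W"]}
LOG = []
collect_evidence(TREE, "V", None, LOG)
distribute_evidence(TREE, "V", None, LOG)
```

On this tree the log contains one message in each direction along every edge, and `respects_protocol(TREE, LOG)` holds, whereas a schedule that starts with the root sending to W does not.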
Propagation in a Clique Tree (3)
Theorem
The COLLECTEVIDENCE and DISTRIBUTEEVIDENCE recursions respect the Message-Passing Protocol.
[Figure: the message passing that results from a call of (a) COLLECTEVIDENCE and (b) DISTRIBUTEEVIDENCE.]
So far we have ... and next?
◮ What we have seen so far:
    ◮ What clique trees and junction trees are.
    ◮ Message passing on junction trees.
◮ Our next questions are:
    ◮ How to get a junction tree?
    ◮ What classes of graphs have a junction tree?
Triangulated Graph
Definition
A cycle is chordless if no two non-adjacent vertices on the cycle are joined by an edge.
[Figure: two graphs on nodes 1, 2, 3, 4, one labeled "non-chordless" and one labeled "chordless".]
Definition
A graph is triangulated if it has no chordless cycles of length four or more.
All triangulated graphs have a junction tree.
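Whether a graph is triangulated can be tested without enumerating cycles. The sketch below relies on a standard characterisation that is not stated in these slides: a graph is triangulated (chordal) exactly when its vertices can be removed one at a time, each being simplicial (its remaining neighbours are pairwise adjacent) at the moment it is removed. The function name `is_triangulated` and the two 4-node test graphs are assumptions for illustration.

```python
from itertools import combinations

def is_triangulated(nodes, edges):
    """Greedy simplicial-vertex removal; True if the graph is triangulated."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    remaining = set(nodes)
    while remaining:
        # A vertex is simplicial if its neighbours are pairwise adjacent.
        simplicial = next(
            (n for n in sorted(remaining)
             if all(v in adj[u] for u, v in combinations(sorted(adj[n]), 2))),
            None)
        if simplicial is None:
            return False  # no simplicial vertex left: not triangulated
        for u in adj[simplicial]:
            adj[u].discard(simplicial)
        del adj[simplicial]
        remaining.discard(simplicial)
    return True

# Assumed examples: a 4-cycle, and the same cycle with the chord 2 - 3 added.
SQUARE = [(1, 2), (2, 4), (4, 3), (3, 1)]
SQUARE_WITH_CHORD = SQUARE + [(2, 3)]
```

The plain 4-cycle is rejected because none of its vertices is simplicial, while adding the chord makes nodes 1 and 4 simplicial and the test succeeds.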
Decomposable Graph
Definition
A graph is decomposable if either it is complete or it can be divided up into
disjoint, non-empty subsets, A, B, S such that:
1. S separates A from B
2. S is a clique
3. A ∪ S and B ∪ S are also decomposable.
Examples of decomposable graphs are shown below:
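[Figure: two decomposable graphs, the left one on nodes 1, 2, 3 and the right one on nodes 1, 2, 3, 4.]
For the left graph, A = {1}, B = {3}, and S = {2}. For the right graph, A = {1}, B = {4}, and S = {2, 3}.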
Decomposable ≡ Triangulated ≡ Junction Tree
Theorem
The following properties of an undirected graph G are all equivalent:
1. G is decomposable
2. G is triangulated
3. G has a junction tree
Decomposability captures the divide-and-conquer nature of these algorithms. Elimination produces a triangulated graph that can be used to build a junction tree. Junction trees are a canonical data structure useful for organizing computations.
Constructing a Junction Tree: An Example
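[Figure: the original graph on nodes 1, ..., 5 and its triangulated version; the isolated cliques 123, 235, and 245; a junction tree, 123 - (23) - 235 - (25) - 245; and a clique tree that is not a junction tree, 123 - (2) - 245 - (25) - 235. Separator sets are shown in parentheses.]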
Constructing a Junction Tree (1)
As we have seen in the example on the previous slide, sometimes a clique tree is a junction tree, while other times it is not. In practice, we cannot enumerate all the clique trees, so with larger graphs we will need a better approach.
Definition
The weight of a separator set is defined as
$$w(S) = |S| = \sum_{i=1}^{N} I[x_i \in S].$$
Definition
The weight of a tree is defined by
$$w(T) = \sum_{j=1}^{M-1} w(S_j),$$
where M is the number of cliques (vertices in the clique tree). Recall that a tree with M nodes has M − 1 edges, and thus M − 1 separator sets (one for every edge).
Constructing a Junction Tree (2)
Consider a node $x_k$ and a clique tree T with cliques $C_i$ and separators $S_j$. The number of times that $x_k$ appears as an element of one of the $S_j$ is related to the number of times that $x_k$ appears as an element of one of the $C_i$.
Lemma
For a clique tree, with $k \in \{1, \ldots, N\}$,
$$\sum_{j=1}^{M-1} I[x_k \in S_j] \le \sum_{i=1}^{M} I[x_k \in C_i] - 1.$$
Note that equality holds if and only if the running intersection property holds for k.
Theorem
A clique tree T is a junction tree if and only if it is a maximal spanning tree, i.e., a clique tree of maximal weight w(T).
$$
\begin{aligned}
w(T) &= \sum_{j=1}^{M-1} w(S_j) = \sum_{j=1}^{M-1} \sum_{k=1}^{N} I[x_k \in S_j] = \sum_{k=1}^{N} \sum_{j=1}^{M-1} I[x_k \in S_j] \\
&\le \sum_{k=1}^{N} \left( \sum_{i=1}^{M} I[x_k \in C_i] - 1 \right) = \sum_{i=1}^{M} \sum_{k=1}^{N} I[x_k \in C_i] - N = \sum_{i=1}^{M} |C_i| - N.
\end{aligned}
$$
The bound on the right does not depend on the tree, and by the lemma it is attained exactly when the running intersection property holds for every k.
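The theorem suggests a constructive recipe: weight every pair of cliques by the size of their intersection and keep a maximum-weight spanning tree. The sketch below uses Kruskal's algorithm for that purpose, which is one possible choice and is not named in the slides; the function name `max_weight_clique_tree` is hypothetical, and the cliques 123, 235, 245 with N = 5 nodes are taken from the construction example in these slides.

```python
from itertools import combinations

def max_weight_clique_tree(cliques):
    """Kruskal's algorithm on the clique graph, heaviest separators first."""
    parent = list(range(len(cliques)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    candidates = sorted(
        combinations(range(len(cliques)), 2),
        key=lambda e: len(cliques[e[0]] & cliques[e[1]]),
        reverse=True)
    tree = []
    for i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding this edge does not close a cycle
            parent[ri] = rj
            tree.append((i, j, cliques[i] & cliques[j]))
    return tree

CLIQUES = [frozenset({1, 2, 3}), frozenset({2, 3, 5}), frozenset({2, 4, 5})]
TREE = max_weight_clique_tree(CLIQUES)
WEIGHT = sum(len(sep) for _, _, sep in TREE)
BOUND = sum(len(c) for c in CLIQUES) - 5  # sum of |C_i| minus N, with N = 5
```

For these cliques the selected edges are 123 - 235 and 235 - 245, and the tree weight equals the upper bound from the derivation, as expected for a junction tree.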
The Hugin Algorithm
◮ Moralization: Convert a directed graph into an undirected graph by connecting the parents of each node and dropping the edge directions.
◮ Introduction of evidence: Evidence is introduced by taking slices of the potentials.
◮ Triangulation: The elimination algorithm always produces a triangulated graph. However, there are N! possible elimination orderings, and finding an optimal one is NP-hard.
◮ Construction of a junction tree: Construct a junction tree by forming a maximal spanning tree from the cliques of the triangulated graph.
◮ Belief propagation: Use the update equations, designating a root node and calling COLLECTEVIDENCE and DISTRIBUTEEVIDENCE from the root.
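The moralization step can be sketched in a few lines. This is a minimal illustration under stated assumptions: the directed graph is given as a hypothetical dictionary mapping each node to the list of its parents, the function name `moralize` is made up, and the example is a three-node graph A -> C <- B that does not come from the slides.

```python
from itertools import combinations

def moralize(parents):
    """Return the undirected edge set of the moral graph.

    `parents` maps each node to the list of its parents in the directed graph.
    """
    edges = set()
    for child, pa in parents.items():
        for p in pa:
            edges.add(frozenset((p, child)))  # drop the edge direction
        for p, q in combinations(pa, 2):
            edges.add(frozenset((p, q)))      # connect parents of a common child
    return edges

# Hypothetical example: A -> C <- B.
MORAL = moralize({"A": [], "B": [], "C": ["A", "B"]})
```

In this example the moral graph gains the edge A - B, so the family {A, B, C} of node C becomes a clique that can hold the potential p(C | A, B).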