
Bayesian Networks
Lecturer: 虞台文
Graduate Institute of Computer Science and Engineering, Tatung University
Intelligent Multimedia Lab
Contents
- Introduction
- Probability Theory (skip)
- Inference
- Clique Tree Propagation
  - Building the Clique Tree
  - Inference by Propagation
Bayesian Networks
Introduction
What Is a Bayesian Network?
- Bayesian networks are directed acyclic graphs (DAGs) with an associated set of probability tables.
- The nodes are random variables.
- Certain independence relations can be induced by the topology of the graph.
Why Use a Bayesian Network?
- To deal with uncertainty in inference via probability (Bayes).
- To handle incomplete data sets, e.g., in classification and regression.
- To model domain knowledge, e.g., causal relationships: a DAG is used to model the causality.
Example
[Figure: a Bayesian network over the nodes Train Strike, Martin Oversleep, Boss Failure-in-Love, Norman Oversleep, Martin Late, Norman Late, Project Delay, Office Dirty, and Boss Angry.]
Attach prior probabilities to all root nodes.

Martin Oversleep  Probability    Train Strike  Probability    Norman Oversleep  Probability
T                 0.01           T             0.1            T                 0.2
F                 0.99           F             0.9            F                 0.8
Example
[Figure: the same network.]

Boss Failure-in-Love  Probability
T                     0.01
F                     0.99
Example (each column sums to 1)
Attach conditional probability tables to the non-root nodes.

P(Martin Late | Train Strike, Martin Oversleep):
               Train Strike = T              Train Strike = F
Martin Late    Oversleep = T  Oversleep = F  Oversleep = T  Oversleep = F
T              0.95           0.8            0.7            0.05
F              0.05           0.2            0.3            0.95

P(Norman Untidy | Norman Oversleep):
Norman Untidy  Oversleep = T  Oversleep = F
T              0.6            0.2
F              0.4            0.8
Example (each column sums to 1)
[Figure: the same network with the conditional probability tables for the remaining non-root nodes, e.g., P(Boss Angry | Boss Failure-in-Love, Project Delay, Office Dirty), where Boss Angry takes the values very, mid, little, and no.]

Question: what is the difference between probability and fuzzy measurements?
Example: Medical Knowledge
Definition of Bayesian Networks
A Bayesian network is a directed acyclic graph with the following properties:
- Each node represents a random variable.
- Each node representing a variable A with parent nodes representing variables B1, B2, ..., Bn is assigned a conditional probability table (CPT):
    P(A | B1, B2, ..., Bn)
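As a concrete illustration (not from the slides), the Python sketch below stores a small network as parent lists plus CPT dictionaries; the node names and the dictionary layout are assumptions made for readability, and the numbers are the Train Strike / Martin Late / Norman Late tables used later in the Inference section.

```python
# A minimal sketch of a Bayesian network as parent lists plus CPTs (illustrative only).
network = {
    "TrainStrike": {"parents": [],
                    "cpt": {(): {True: 0.1, False: 0.9}}},
    "MartinLate":  {"parents": ["TrainStrike"],
                    "cpt": {(True,): {True: 0.6, False: 0.4},
                            (False,): {True: 0.5, False: 0.5}}},
    "NormanLate":  {"parents": ["TrainStrike"],
                    "cpt": {(True,): {True: 0.8, False: 0.2},
                            (False,): {True: 0.1, False: 0.9}}},
}

def p(node, value, assignment):
    """Look up P(node = value | parents), given a dict assigning values to the parents."""
    spec = network[node]
    parent_values = tuple(assignment[parent] for parent in spec["parents"])
    return spec["cpt"][parent_values][value]

print(p("MartinLate", True, {"TrainStrike": True}))   # 0.6
```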
Problems
- How to do inference?
- How to learn the probabilities from data?
- How to learn the structure from data?
- Bad news: all of them are NP-hard.
- What applications may we have?
Bayesian Networks
Inference
Inference

Example
[Figure: Train Strike is the parent of both Martin Late and Norman Late.]

Train Strike  Probability
T             0.1
F             0.9

P(Martin Late | Train Strike):
Martin Late   Train Strike = T   Train Strike = F
T             0.6                0.5
F             0.4                0.5

P(Norman Late | Train Strike):
Norman Late   Train Strike = T   Train Strike = F
T             0.8                0.1
F             0.2                0.9

Questions:
- P("Martin Late", "Norman Late", "Train Strike") = ?   (joint distribution)
- P("Martin Late") = ?                                   (marginal distribution)
- P("Martin Late" | "Norman Late") = ?                   (conditional distribution)
Example
Let A = "Martin Late", B = "Norman Late", C = "Train Strike".

Question: P("Martin Late", "Norman Late", "Train Strike") = ?   (joint distribution)

  P(A, B, C) = P(A | B, C) P(B | C) P(C) = P(A | C) P(B | C) P(C)
  e.g., P(A = T, B = T, C = T) = 0.6 × 0.8 × 0.1 = 0.048

A   B   C   Probability
T   T   T   0.048
F   T   T   0.032
T   F   T   0.012
F   F   T   0.008
T   T   F   0.045
F   T   F   0.045
T   F   F   0.405
F   F   F   0.405
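A short sketch of the computation above (the variable and helper names are illustrative): the joint table is filled in by multiplying the three CPTs according to the chain rule.

```python
from itertools import product

p_c = {True: 0.1, False: 0.9}                      # P(Train Strike)
p_a = {True: {True: 0.6, False: 0.4},              # P(Martin Late | Train Strike)
       False: {True: 0.5, False: 0.5}}
p_b = {True: {True: 0.8, False: 0.2},              # P(Norman Late | Train Strike)
       False: {True: 0.1, False: 0.9}}

def joint(a, b, c):
    # P(A, B, C) = P(A | C) P(B | C) P(C)
    return p_a[c][a] * p_b[c][b] * p_c[c]

table = {(a, b, c): joint(a, b, c) for a, b, c in product([True, False], repeat=3)}
print(round(table[(True, True, True)], 3))   # 0.048, as in the slide
```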
Example
Question: P("Martin Late", "Norman Late") = ?   (marginal distribution)

  P(A, B) = Σ_C P(A, B, C)
  e.g., P(A = T, B = T) = 0.048 + 0.045 = 0.093

Summing the joint table over C gives:

A   B   Probability
T   T   0.093
F   T   0.077
T   F   0.417
F   F   0.413
Example
Question: P("Martin Late") = ?   (marginal distribution)

  P(A) = Σ_{B,C} P(A, B, C) = Σ_B P(A, B)
  e.g., P(A = T) = 0.093 + 0.417 = 0.51

A   Probability
T   0.51
F   0.49
Example
Question: P("Martin Late" | "Norman Late") = ?   (conditional distribution)

  P(A | B) = P(A, B) / P(B)
  e.g., P(A = T | B = T) = 0.093 / 0.17 ≈ 0.5471

A   Probability        B   Probability
T   0.51               T   0.17
F   0.49               F   0.83
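Continuing the sketch above (it reuses the `table` dictionary of joint probabilities, which is an assumption of these notes rather than part of the slides), marginal and conditional distributions are obtained by summing and dividing entries of the joint table.

```python
def marginal_ab(a, b):
    # P(A, B) = sum over C of P(A, B, C)
    return sum(table[(a, b, c)] for c in (True, False))

def marginal_a(a):
    # P(A) = sum over B of P(A, B)
    return sum(marginal_ab(a, b) for b in (True, False))

def conditional_a_given_b(a, b):
    # P(A | B) = P(A, B) / P(B)
    p_b_value = sum(marginal_ab(x, b) for x in (True, False))
    return marginal_ab(a, b) / p_b_value

print(round(marginal_ab(True, True), 3))            # 0.093
print(round(marginal_a(True), 2))                   # 0.51
print(round(conditional_a_given_b(True, True), 4))  # 0.5471
```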
Inference Methods
- Exact algorithms:
  - Probability propagation
  - Variable elimination
  - Cutset conditioning
  - Dynamic programming
- Approximation algorithms:
  - Variational methods
  - Sampling (Monte Carlo) methods
  - Loopy belief propagation
  - Bounded cutset conditioning
  - Parametric approximation methods

The given (conditioning) variables are called the evidence.
Independence Assertions
- Bayesian networks have built-in independence assertions.
- An independence assertion is a statement of the form
  "X and Y are independent given Z", written X ⊥ Y | Z.
  That is,
    P(X | Y, Z) = P(X | Z)
  or
    P(X, Y | Z) = P(X | Z) P(Y | Z).
- We say that X and Y are d-separated by Z.
d-Separation
[Figure: an example graph with nodes Y1, Y2, Y3, Y4, Z, X1, X2, X3, W1, W2.]
- Is Xi ⊥ Yj given Z?
- Is Xi ⊥ Xj given Z, for i ≠ j?
- Is Yi ⊥ Yj given Z, for i ≠ j?
Types of Connections
[Same example graph; paths such as Y1/Y2 - Z - Y3/Y4 are classified by the arrow directions at Z.]
- Serial connections:     Yi - Z - Xj
- Converging connections: Y3 - Z - Y4
- Diverging connections:  Xi - Z - Xj
d-Separation
- Serial:     X → Z → Y.   Is X ⊥ Y given Z?
- Converging: X → Z ← Y.   Is X ⊥ Y given Z?
- Diverging:  X ← Z → Y.   Is X ⊥ Y given Z?
JPT: Joint probability table
CPT: Conditional probability table
Joint Distribution
By the chain rule:
  P(X1, X2, ..., Xn) = P(Xn | X1, ..., Xn-1) P(Xn-1 | X1, ..., Xn-2) ... P(X2 | X1) P(X1)
                     = ∏_{i=1..n} P(Xi | X1, ..., Xi-1)
By the independence assertions:
                     = ∏_{i=1..n} P(Xi | πi),   where πi denotes the parents of Xi.
With this, we can compute all probabilities.

[Figure: an example DAG over X1, ..., X11.]

Consider binary random variables:
1. To store the JPT of all random variables: 2^n - 1 table entries.
2. To store the CPTs of all random variables: ? table entries.
Joint Distribution
  P(X1, X2, ..., Xn) = ∏_{i=1..n} P(Xi | πi)
The CPT of Xi requires 2^|πi| entries, so storing all CPTs requires Σ_{i=1..n} 2^|πi| entries.

[Figure: the same DAG over X1, ..., X11.]

Consider binary random variables:
1. To store the JPT of all random variables: 2^n - 1 table entries.
2. To store the CPTs of all random variables: ? table entries.
Joint Distribution
To store the JPT of all random variables: 2^11 - 1 = 2047 entries are required.
To store the CPTs of all random variables: 29 entries are required.

CPT entries per node (2^|πi|): X1: 1, X2: 1, X3: 1, X4: 2, X5: 2, X6: 8, X7: 2, X8: 2, X9: 2, X10: 4, X11: 4.
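The arithmetic above can be reproduced with the small check below; the per-node parent counts are inferred from the slide's entry counts (2^|πi| entries per node) and are therefore an assumption about the example DAG.

```python
parent_counts = {"X1": 0, "X2": 0, "X3": 0, "X4": 1, "X5": 1, "X6": 3,
                 "X7": 1, "X8": 1, "X9": 1, "X10": 2, "X11": 2}

n = len(parent_counts)
jpt_entries = 2 ** n - 1                                    # full joint table: 2047
cpt_entries = sum(2 ** k for k in parent_counts.values())   # all CPTs together: 29
print(jpt_entries, cpt_entries)
```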
More on d-Separation
A path from X to Y is d-connecting w.r.t. the evidence nodes E if every interior node N on the path has the property that either
1. it is serial (linear) or diverging and is not a member of E; or
2. it is converging, and either N or one of its descendants is in E.

[Figure: an example graph with nodes X and Y and an evidence set E.]
Exercise: identify the d-connecting and non-d-connecting paths from X to Y.
More on d-Separation
Two nodes are d-separated if there is no d-connecting path between them.

[Figure: an example graph with nodes X and Y and an evidence set E.]
Exercise: remove the minimum number of edges such that X and Y are d-separated.
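A sketch (illustrative, not the slides' algorithm) of the d-connecting-path test defined above: the DAG is a dict of parent lists, a path is a node sequence, and each interior node is checked against the two conditions.

```python
def descendants(dag, node):
    children = {v: [c for c, ps in dag.items() if v in ps] for v in dag}
    out, stack = set(), [node]
    while stack:
        for child in children[stack.pop()]:
            if child not in out:
                out.add(child)
                stack.append(child)
    return out

def is_d_connecting(dag, path, evidence):
    """True if every interior node of `path` satisfies the two conditions above."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        converging = prev in dag[node] and nxt in dag[node]    # both arrows point into node
        if converging:
            if node not in evidence and not (descendants(dag, node) & evidence):
                return False
        else:                                                   # serial or diverging
            if node in evidence:
                return False
    return True

# Serial example X -> Z -> Y: the path is d-connecting until Z is observed.
dag = {"X": [], "Z": ["X"], "Y": ["Z"]}
print(is_d_connecting(dag, ["X", "Z", "Y"], evidence=set()))   # True
print(is_d_connecting(dag, ["X", "Z", "Y"], evidence={"Z"}))   # False
```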
More on d-Separation
Two sets of nodes, say X = {X1, ..., Xm} and Y = {Y1, ..., Yn}, are d-separated w.r.t. the evidence nodes E if every pair Xi and Yj is d-separated w.r.t. E.

[Figure: an example graph with node sets X and Y and an evidence set E.]

In this case, we have
  P(X, Y, E) = P(X | Y, E) P(Y, E)
             = P(X | E) P(Y, E)
             = P(X, E) P(Y, E) / P(E)
Bayesian Networks
Clique Tree Propagation
References
Clique tree propagation was developed by Lauritzen and Spiegelhalter and refined by Jensen et al.
- Lauritzen, S. L., and Spiegelhalter, D. J., "Local computations with probabilities on graphical structures and their application to expert systems," J. Roy. Stat. Soc. B, 50, 157-224, 1988.
- Jensen, F. V., Lauritzen, S. L., and Olesen, K. G., "Bayesian updating in causal probabilistic networks by local computations," Comp. Stat. Quart., 4, 269-282, 1990.
- Shenoy, P., and Shafer, G., "Axioms for probability and belief-function propagation," in Uncertainty in Artificial Intelligence, Vol. 4 (R. D. Shachter, T. Levitt, J. F. Lemmer and L. N. Kanal, Eds.), Elsevier, North-Holland, Amsterdam, 169-198, 1990.
Clique Tree Propagation (CTP)
- Given a Bayesian network, build a secondary structure, called the clique tree (an undirected tree).
- Inference is performed by propagating belief potentials among the tree nodes.
- It is an exact algorithm.
Notations
Item                                Notation              Examples
Random variables, uninstantiated    uppercase             A, B, C
Random variables, instantiated      lowercase             a, b, c
Random vectors, uninstantiated      boldface uppercase    X, Y, Z
Random vectors, instantiated        boldface lowercase    x, y, z
Definition: Family of a Node
The family of a node V, denoted FV, is defined by
  FV = {V} ∪ πV
Examples:
  FA = {A},  FB = {A, B},  FH = {E, G, H}

[Figure: the example DAG over A, ..., H used in the rest of this section; its edges, as implied by the families above and the CPTs below, are A→B, A→C, B→D, C→E, C→G, D→F, E→F, E→H, G→H.]
Potentials and Distributions
We will model the probability tables as potential functions. All of these tables map a set of random variables to a real value.

Prior probability (a function of a):
a     P(a)
on    0.5
off   0.5

Conditional probability (a function of a and b):
b     a = on    a = off
on    0.7       0.2
off   0.3       0.8

Conditional probability (a function of d, e, and f):
         d = on              d = off
f        e = on    e = off   e = on    e = off
on       0.95      0.8       0.7       0.05
off      0.05      0.2       0.3       0.95
Potential
A potential φX over a set of variables X is a map φX : X → R. Potentials are used to implement matrices or tables.

Two operations:
1. Marginalization: φY = Σ_{X\Y} φX,  for Y ⊆ X.
2. Multiplication:  φZ = φX φY,       with Z = X ∪ Y.
Marginalization
  φY = Σ_{X\Y} φX,  Y ⊆ X

Example: X = {A, B, C}, Y1 = {A, B}, Y2 = {A}
  φY1 = Σ_C φX,   φY2 = Σ_{B,C} φX

φX:                         φY1:                  φY2:
A   B   C   φABC            A   B   φAB           A   φA
T   T   T   0.048           T   T   0.093         T   0.51
F   T   T   0.032           F   T   0.077         F   0.49
T   F   T   0.012           T   F   0.417
F   F   T   0.008           F   F   0.413
T   T   F   0.045
F   T   F   0.045
T   F   F   0.405
F   F   F   0.405
Multiplication
  φZ(z) = φX(x) φY(y),  Z = X ∪ Y,
where x and y are the instantiations of X and Y consistent with z. The result does not necessarily sum to one.

Example: Z = {A, B, C}, X = {A, B}, Y = {B, C}
  φZ = φABC = φAB φBC = φX φY

φX:                      φY:
A   B   φAB              B   C   φBC
T   T   0.093            T   T   0.08
F   T   0.077            F   T   0.02
T   F   0.417            T   F   0.09
F   F   0.413            F   F   0.91

φZ:
A   B   C   φABC
T   T   T   0.093 × 0.08 = 0.00744
F   T   T   0.077 × 0.08 = 0.00616
T   F   T   0.417 × 0.02 = 0.00834
F   F   T   0.413 × 0.02 = 0.00826
T   T   F   0.093 × 0.09 = 0.00837
F   T   F   0.077 × 0.09 = 0.00693
T   F   F   0.417 × 0.91 = 0.37947
F   F   F   0.413 × 0.91 = 0.37583
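A table-based sketch (illustrative) of the two potential operations; the class layout and method names are assumptions of these notes, but the example at the end reproduces φAB · φBC from the slide.

```python
from itertools import product

class Potential:
    def __init__(self, variables, table):
        self.variables = list(variables)     # e.g. ["A", "B"]
        self.table = dict(table)             # {(True, True): 0.093, ...}

    def marginalize(self, keep):
        """Sum out every variable that is not in `keep`."""
        keep = [v for v in self.variables if v in keep]
        idx = [self.variables.index(v) for v in keep]
        out = {}
        for assignment, value in self.table.items():
            key = tuple(assignment[i] for i in idx)
            out[key] = out.get(key, 0.0) + value
        return Potential(keep, out)

    def multiply(self, other):
        """Pointwise product over the union of the two variable sets."""
        variables = self.variables + [v for v in other.variables if v not in self.variables]
        out = {}
        for assignment in product([True, False], repeat=len(variables)):
            values = dict(zip(variables, assignment))
            a = tuple(values[v] for v in self.variables)
            b = tuple(values[v] for v in other.variables)
            out[assignment] = self.table[a] * other.table[b]
        return Potential(variables, out)

phi_ab = Potential(["A", "B"], {(True, True): 0.093, (False, True): 0.077,
                                (True, False): 0.417, (False, False): 0.413})
phi_bc = Potential(["B", "C"], {(True, True): 0.08, (False, True): 0.02,
                                (True, False): 0.09, (False, False): 0.91})
phi_abc = phi_ab.multiply(phi_bc)
print(round(phi_abc.table[(True, True, True)], 5))   # 0.00744
```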
The Secondary Structure
Given a Bayesian network over a set of variables U = {V1, ..., Vn}, its secondary structure contains a graphical and a numerical component.
- Graphical component: an undirected clique tree that satisfies the join tree property.
- Numerical component: belief potentials on the nodes and edges.
How do we build a clique tree?
The Clique Tree T
The clique tree T for a belief network over a set of variables U = {V1, ..., Vn} satisfies the following properties:
- Each node in T is a cluster, or clique (a nonempty set of variables).
- The clusters satisfy the join tree property:
  - Given two clusters X and Y in T, all clusters on the path between X and Y contain X ∩ Y.
  - For each variable V ∈ U, FV is included in at least one of the clusters.
- Sepsets: each edge in T is labeled with the intersection of the adjacent clusters.

[Figure: the example DAG over A, ..., H and its clique tree.]

Clique tree for the example (sepsets in parentheses):
  ABD -(AD)- ADE -(AE)- ACE -(CE)- CEG
  ADE -(DE)- DEF
  CEG -(EG)- EGH

How do we assign the belief functions?
The Numeric Component
Clusters and sepsets are attached with belief potentials.
- Local consistency: for each cluster X and neighboring sepset S,
    φS = Σ_{X\S} φX
- Global consistency:
    P(U) = ∏_i φXi / ∏_j φSj

[Join tree as above.]

How do we assign the belief functions?
The Numeric Component
The key step in satisfying these constraints is to let
  φX = P(X)  and  φS = P(S).
If so, then for V ∈ X:  P(V) = Σ_{X\{V}} φX,
and for V ∈ S:          P(V) = Σ_{S\{V}} φS.

[Join tree as above.]
Bayesian Networks
Building the Clique Tree
The Steps
Belief Network → Moral Graph → Triangulated Graph → Clique Set → Join Tree
Moral Graph
[Figure: the belief network over A, ..., H and the corresponding moral graph.]
1. Convert the directed graph to an undirected graph.
2. Connect each pair of parent nodes of every node.
This step is, in fact, carried out together with the next step.
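A sketch of the moralization step (illustrative). The DAG below is the eight-node example of this section, with the parent lists inferred from the families and CPTs shown earlier; that structure is an assumption of these notes.

```python
from itertools import combinations

def moralize(dag):
    """dag: {node: [parents]}. Returns the undirected edge set of the moral graph."""
    edges = set()
    for node, parents in dag.items():
        for parent in parents:                     # step 1: drop the arc directions
            edges.add(frozenset((parent, node)))
        for p, q in combinations(parents, 2):      # step 2: connect each pair of parents
            edges.add(frozenset((p, q)))
    return edges

dag = {"A": [], "B": ["A"], "C": ["A"], "D": ["B"], "E": ["C"],
       "F": ["D", "E"], "G": ["C"], "H": ["E", "G"]}
moral = moralize(dag)
print(frozenset(("D", "E")) in moral)   # True: D and E are connected as parents of F
```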
Triangulation
[Figure: the moral graph and the triangulated graph.]
1. Triangulate the graph: add edges so that every cycle of length four or more has a chord.
There are many ways to do this.
Select Clique Set
[Figure: the moral graph GM and its working copy GM'.]
1. Copy GM to GM'.
2. While GM' is not empty:
   a) Select a node V from GM', according to the criterion below.
   b) Node V and its neighbors form a cluster.
   c) Connect all the nodes in the cluster. For each edge added to GM', add the same edge to GM.
   d) Remove V from GM'.
Criterion:
1. The weight of a node V is the number of values of V.
2. The weight of a cluster is the product of the weights of its constituent nodes.
- Choose the node that causes the least number of edges to be added.
- Break ties by choosing the node that induces the cluster with the smallest weight.
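A sketch (illustrative) of the elimination loop above, working on an undirected moral graph stored as neighbor sets. For brevity the selection criterion here counts only the fill-in edges; the weight-based tie-breaking from the slide is omitted.

```python
from itertools import combinations

def eliminate(moral_graph):
    """moral_graph: {node: set(neighbors)}. Returns the induced clusters in elimination order."""
    g = {v: set(nbrs) for v, nbrs in moral_graph.items()}   # working copy GM'
    clusters = []
    while g:
        def fill_in(v):   # number of edges that eliminating v would add
            return sum(1 for a, b in combinations(g[v], 2) if b not in g[a])
        v = min(g, key=fill_in)                              # simplified criterion
        clusters.append({v} | g[v])
        for a, b in combinations(g[v], 2):                   # connect the induced cluster
            g[a].add(b)
            g[b].add(a)
        for neighbor in g[v]:                                # remove v from GM'
            g[neighbor].discard(v)
        del g[v]
    return clusters

# Moral graph of the running example; the clusters come out as EGH, CEG, DEF,
# ACE, ABD, ADE (the exact order depends on how ties are broken).
moral = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "E", "G"},
         "D": {"B", "E", "F"}, "E": {"C", "D", "F", "G", "H"},
         "F": {"D", "E"}, "G": {"C", "E", "H"}, "H": {"E", "G"}}
print(eliminate(moral))
```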
Select Clique Set
[Figure: the moral graph GM and its working copy GM' during elimination.]

Eliminated Vertex   Induced Cluster   Edges Added
H                   EGH               none
G                   CEG               none
F                   DEF               none
C                   ACE               {A, E}
B                   ABD               {A, D}
D                   ADE               none
E                   AE                none
A                   A                 none
Building an Optimal Join Tree
We need to find a minimal number of edges to connect these cliques, i.e., to build a tree. Given n cliques, building a tree requires n-1 edges. There are many ways; how do we achieve optimality?

[Elimination table as above.]
Building an Optimal Join Tree
1. Begin with a set of n trees, each consisting of a single clique, and an empty set S.
2. For each distinct pair of cliques X and Y:
   a) Create a candidate sepset SXY = X ∩ Y, with backpointers to X and Y.
   b) Insert SXY into S.
3. Repeat until n-1 sepsets have been inserted into the forest:
   a) Select a sepset SXY from S, according to the criterion described in the next slide. Delete SXY from S.
   b) Insert SXY between cliques X and Y only if X and Y are on different trees in the forest.
Building an Optimal Join Tree
Criterion:
1. The mass of SXY is the number of nodes in X ∩ Y.
2. The cost of SXY is the weight of X plus the weight of Y.
   - The weight of a node V is the number of values of V.
   - The weight of a set of nodes X is the product of the weights of its constituent nodes.
- Choose the sepset with the largest mass.
- Break ties by choosing the sepset with the smallest cost.
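A sketch (illustrative) of the sepset-selection procedure with the mass/cost criterion; a union-find forest tracks which cliques are already connected. All variables are assumed binary here, so the weight of a set is 2 to the power of its size.

```python
from itertools import combinations

def build_join_tree(cliques):
    weight = lambda s: 2 ** len(s)                       # binary variables assumed
    candidates = [(frozenset(x) & frozenset(y), i, j)
                  for (i, x), (j, y) in combinations(enumerate(cliques), 2)]
    # Largest mass first; break ties by smallest cost.
    candidates.sort(key=lambda c: (-len(c[0]), weight(cliques[c[1]]) + weight(cliques[c[2]])))

    parent = list(range(len(cliques)))                   # union-find forest
    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    edges = []
    for sepset, i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:                                     # only join cliques on different trees
            parent[ri] = rj
            edges.append((cliques[i], sepset, cliques[j]))
        if len(edges) == len(cliques) - 1:
            break
    return edges

cliques = [{"A", "B", "D"}, {"A", "D", "E"}, {"A", "C", "E"},
           {"C", "E", "G"}, {"D", "E", "F"}, {"E", "G", "H"}]
for x, s, y in build_join_tree(cliques):
    print(sorted(x), sorted(s), sorted(y))               # reproduces the join tree below
```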
Building an Optimal Join Tree
[Figure: the belief network and the join tree produced by the graphical transformation.]

Resulting join tree (sepsets in parentheses):
  ABD -(AD)- ADE -(AE)- ACE -(CE)- CEG
  ADE -(DE)- DEF
  CEG -(EG)- EGH
Bayesian Networks
Inference by Propagation
Inferences
- P(V) = ?       Inference without evidence.
- P(V | e) = ?   Inference with evidence.
PPTC: Probability Propagation in Tree of Cliques.
Inference without Evidence:  P(V) = ?
[Figure: the Train Strike / lateness example network from the Introduction.]
Procedure for PPTC without Evidence
Belief Network
  → Graphical Transformation (building the graphic component)
  → Join Tree Structure
  → Initialization (building the numeric component)
  → Inconsistent Join Tree
  → Propagation
  → Consistent Join Tree
  → Marginalization
  → P(V)
Initialization
1. For each cluster and sepset X, set each φX(x) to 1:
     φX ← 1
2. For each variable V:
   a) Assign to V a cluster X that contains FV; call X the parent cluster of FV.
   b) Multiply φX(x) by P(V | πV):
     φX ← φX · P(V | πV)

[Belief network and join tree as before.]
Initialization (example)
Cluster ACE is the parent cluster of FC = {A, C} and FE = {C, E}, so
  φACE ← 1 · P(c | a) · P(e | c)

P(c | a):                      P(e | c):
c     a = on    a = off        e     c = on    c = off
on    0.7       0.2            on    0.3       0.6
off   0.3       0.8            off   0.7       0.4

Initial values of φACE:                       Initial values of φCE:
a     c     e     φACE                        c     e     φCE
on    on    on    1 × 0.7 × 0.3 = 0.21        on    on    1
on    on    off   1 × 0.7 × 0.7 = 0.49        on    off   1
on    off   on    1 × 0.3 × 0.6 = 0.18        off   on    1
on    off   off   1 × 0.3 × 0.4 = 0.12        off   off   1
off   on    on    1 × 0.2 × 0.3 = 0.06
off   on    off   1 × 0.2 × 0.7 = 0.14
off   off   on    1 × 0.8 × 0.6 = 0.48
off   off   off   1 × 0.8 × 0.4 = 0.32

Now, with N clusters X1, ..., XN, N-1 sepsets S1, ..., SN-1, and Q variables:
  ∏_{i=1..N} φXi / ∏_{j=1..N-1} φSj = ∏_{k=1..Q} P(Vk | πVk) = P(U)
(by the independence assertions).

After initialization, global consistency is satisfied, but local consistency is not.
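A sketch (illustrative) of the initialization step on table-based potentials; the data layout and helper names are assumptions of these notes, and all variables are taken to be binary.

```python
from itertools import product

def initialize(clusters, families, cpts):
    """clusters: {name: [vars]}; families: {V: set of vars};
    cpts: {V: function(assignment dict) -> P(V | parents)}."""
    potentials = {name: {a: 1.0 for a in product([True, False], repeat=len(vs))}
                  for name, vs in clusters.items()}            # step 1: all ones
    for v, family in families.items():
        # step 2a: pick a parent cluster containing the family of V
        name = next(n for n, vs in clusters.items() if family <= set(vs))
        vs = clusters[name]
        # step 2b: multiply the CPT of V into that cluster's potential
        for assignment in potentials[name]:
            potentials[name][assignment] *= cpts[v](dict(zip(vs, assignment)))
    return potentials
```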
Global Propagation
Global propagation is used to achieve local consistency. Let's consider single message passing first.
Message Passing (from cluster X to cluster Y through sepset R)
- Projection onto the sepset:
    φR_old ← φR
    φR ← Σ_{X\R} φX
- Absorption at the receiving cluster:
    φY ← φY (φR / φR_old)
The Effect of a Single Message Passing
Before the message, ∏_i φXi / ∏_j φSj = P(U). The message changes only φR and φY:
  φY_new = φY_old · φR_new / φR_old,
so
  φY_new / φR_new = φY_old / φR_old,
and the ratio ∏_i φXi / ∏_j φSj is left unchanged; it still equals P(U).
Global Propagation
1. Choose an arbitrary cluster X.
2. Unmark all clusters. Call Ingoing-Propagation(X).
3. Unmark all clusters. Call Outgoing-Propagation(X).
Global Propagation
Ingoing-Propagation(X):
- Mark X.
- Call Ingoing-Propagation recursively on X's unmarked neighboring clusters, if any.
- Pass a message from X to the cluster that invoked Ingoing-Propagation(X).

Outgoing-Propagation(X):
- Mark X.
- Pass a message from X to each of its unmarked neighboring clusters, if any.
- Call Outgoing-Propagation recursively on X's unmarked neighboring clusters, if any.

[Figure: the join tree with the resulting messages numbered 1 through 10.]

After global propagation, the clique tree is both globally and locally consistent.
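A sketch (illustrative) of single message passing and of global propagation as a collect/distribute pass over the join tree. A potential is stored as a (variables, table) pair; the representation and function names are assumptions of these notes.

```python
def marginalize(potential, keep):
    variables, table = potential
    idx = [variables.index(v) for v in keep]
    out = {}
    for assignment, value in table.items():
        key = tuple(assignment[i] for i in idx)
        out[key] = out.get(key, 0.0) + value
    return (list(keep), out)

def restrict(assignment, variables, keep):
    return tuple(assignment[variables.index(v)] for v in keep)

def pass_message(potentials, x, sepset, y):
    sep_vars, old_sep = potentials[sepset]
    new_sep = marginalize(potentials[x], sep_vars)[1]           # projection onto the sepset
    potentials[sepset] = (sep_vars, new_sep)
    y_vars, y_table = potentials[y]
    potentials[y] = (y_vars, {a: v * new_sep[restrict(a, y_vars, sep_vars)]
                                    / old_sep[restrict(a, y_vars, sep_vars)]
                              for a, v in y_table.items()})     # absorption

def global_propagation(potentials, neighbors, root):
    """neighbors: {cluster: [(sepset, neighboring cluster), ...]} describing the join tree."""
    def collect(x, parent=None):                                # Ingoing-Propagation
        for sep, y in neighbors[x]:
            if y != parent:
                collect(y, x)
                pass_message(potentials, y, sep, x)
    def distribute(x, parent=None):                             # Outgoing-Propagation
        for sep, y in neighbors[x]:
            if y != parent:
                pass_message(potentials, x, sep, y)
                distribute(y, x)
    collect(root)
    distribute(root)
```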
Marginalization
From a consistent join tree,
  P(A) = Σ_{B,D} φABD,   P(D) = Σ_{A,B} φABD

φABD:
a     b     d     φABD(a, b, d)
on    on    on    .225
on    on    off   .025
on    off   on    .125
on    off   off   .125
off   on    on    .180
off   on    off   .020
off   off   on    .150
off   off   off   .150

P(a):
on:  .225 + .025 + .125 + .125 = .500
off: .180 + .020 + .150 + .150 = .500

P(d):
on:  .225 + .125 + .180 + .150 = .680
off: .025 + .125 + .020 + .150 = .320

[Join tree as before.]
Review: Procedure for PPTC without Evidence
Belief Network
  → Graphical Transformation (building the graphic component)
  → Join Tree Structure
  → Initialization (building the numeric component)
  → Inconsistent Join Tree
  → Propagation
  → Consistent Join Tree
  → Marginalization
  → P(V)
Inference with Evidence:  P(V | e) = ?
[Figure: the Train Strike / lateness example network from the Introduction.]
Observations
- Observations are the simplest form of evidence.
- An observation is a statement of the form V = v.
- A collection of observations may be denoted by E = e, an instantiation of a set of variables E.
- Observations are referred to as hard evidence.
Likelihoods
Given E = e, the likelihood of V, denoted ΛV, is defined as:
- if V ∈ E:  ΛV(v) = 1 if v = e(V), and 0 otherwise;
- if V ∉ E:  ΛV(v) = 1 for all v.
Likelihoods (example)
With the observations C = on and D = off:

Variable V   ΛV(on)   ΛV(off)
A            1        1
B            1        1
C            1        0
D            0        1
E            1        1
F            1        1
G            1        1
H            1        1
Procedure for PPTC with Evidence
Belief Network
  → Graphical Transformation (building the graphic component)
  → Join Tree Structure
  → 1. Initialization  2. Observation Entry  (building the numeric component)
  → Inconsistent Join Tree
  → Propagation
  → Consistent Join Tree
  → 1. Marginalization  2. Normalization
  → P(V | e)
Initialization with Observations
1. For each cluster and sepset X, set each φX(x) to 1:
     φX ← 1
2. For each variable V:
   a) Assign to V a cluster X that contains FV; call X the parent cluster of FV.
   b) Multiply φX(x) by P(V | πV):
     φX ← φX · P(V | πV)
3. Set each likelihood element ΛV(v) to 1:
     ΛV ← 1

[Belief network and join tree as before.]
Observation Entry
1. Encode the observation V = v as a likelihood ΛV_new.
2. Identify a cluster X that contains V.
3. Update φX and ΛV:
     φX ← φX · ΛV_new
     ΛV ← ΛV_new

[Belief network and join tree as before.]
Marginalization
After global propagation,
  φX = P(X, e),   and for V ∈ X:  P(V, e) = Σ_{X\{V}} φX
  φS = P(S, e),   and for V ∈ S:  P(V, e) = Σ_{S\{V}} φS

[Join tree as before.]
Normalization
After global propagation,
  φX = P(X, e),   and for V ∈ X:  P(V, e) = Σ_{X\{V}} φX
  φS = P(S, e),   and for V ∈ S:  P(V, e) = Σ_{S\{V}} φS
Normalization:
  P(V | e) = P(V, e) / P(e) = P(V, e) / Σ_V P(V, e)
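A sketch (illustrative) of observation entry and normalization on the same (variables, table) potentials used above; the final line reproduces the P("Martin Late" | "Norman Late") example from the Inference section.

```python
def enter_observation(potentials, cluster, var, value):
    """Multiply the cluster potential by the 0/1 likelihood encoding var = value."""
    variables, table = potentials[cluster]
    i = variables.index(var)
    potentials[cluster] = (variables,
                           {a: (v if a[i] == value else 0.0) for a, v in table.items()})

def normalize(marginal):
    """marginal: {value: P(V = value, e)}  ->  {value: P(V = value | e)}."""
    p_e = sum(marginal.values())                     # P(e)
    return {value: p / p_e for value, p in marginal.items()}

print(normalize({True: 0.093, False: 0.077}))        # about {True: 0.547, False: 0.453}
```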
Handling Dynamic Observations
Suppose that the join tree is now consistent for an observation e1. How do we restore consistency if the observation is changed to e2?

Three observation states for a variable V:
1. No change.
2. Update: V goes from unobserved to observed.
3. Retraction: V goes from observed to unobserved, or V = v1 changes to V = v2 with v1 ≠ v2.
Handling Dynamic Observations
Belief Network
  → Graphical Transformation
  → Join Tree Structure
  → 1. Initialization  2. Observation Entry
  → Inconsistent Join Tree
  → Propagation
  → Consistent Join Tree
  → 1. Marginalization  2. Normalization
  → P(V | e)

Global retraction: when is it needed?   Global update: when is it needed?