Bayesian Networks Variable Elimination Algorithm

Bayesian Networks
Bucket Elimination Algorithm
Presenter: 虞台文
Graduate Institute of Computer Science and Engineering, Tatung University
Intelligent Multimedia Lab
Content
• Basic Concept
• Belief Updating
• Most Probable Explanation (MPE)
• Maximum A Posteriori (MAP)
Basic Concept
Satisfiability
Given a statement that is a conjunction of clauses (conjunctive normal form),
the satisfiability problem is to determine whether there
exists a truth assignment that makes the statement true.
Examples:
1. ( A  B  C )  (C  D)  ( A  B  D)  (A  D)
A=True, B=True, C=False, D=False
Satisfiable
2. ( A  B  C )  ( B  C  D)  ( B  C )  (A  C )  (D)
Satisfiable?
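Resolution
$(\alpha \lor q) \land (\beta \lor \lnot q)$ can be true if and only if $\alpha \lor \beta$ can be true.
From the clauses $\alpha \lor q$ and $\beta \lor \lnot q$, resolution derives the resolvent $\alpha \lor \beta$.
If resolution derives the empty clause, which can never be TRUE, the statement is unsatisfiable.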
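The resolution step above can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: the integer encoding of literals (a negative number for a negated variable) and the helper name `resolve` are assumptions made for the example.

```python
def resolve(c1, c2, var):
    """Resolve two clauses on `var`.

    A clause is a frozenset of nonzero ints; -v stands for NOT v
    (an encoding assumed for this sketch). Returns the resolvent,
    or None when the clauses do not clash on `var`.
    """
    if var in c1 and -var in c2:
        return frozenset((c1 - {var}) | (c2 - {-var}))
    if -var in c1 and var in c2:
        return frozenset((c1 - {-var}) | (c2 - {var}))
    return None


# (x1 OR q) and (x2 OR NOT q), with q encoded as variable 3
resolvent = resolve(frozenset({1, 3}), frozenset({2, -3}), 3)

# q and NOT q resolve to the empty clause: unsatisfiable
empty = resolve(frozenset({3}), frozenset({-3}), 3)

# No clash on variable 3, so there is nothing to resolve
no_clash = resolve(frozenset({1}), frozenset({2}), 3)
```

An empty `frozenset` plays the role of the empty clause, so checking for unsatisfiability reduces to checking whether any derived resolvent is empty.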
( A  B  C )  ( B  C  D)  ( B  C )  (A  C )  (D)
Direct Resolution
Example: Given the set of clauses below and an order d = ABCD, set the initial buckets as follows.
Clauses: A  B  C; BC  D; B  C; A  C; D
BucketA: (empty)
BucketB: (empty)
BucketC: A  B  C; B  C; A  C
BucketD: BC  D; D
Direct Resolution
Processing the buckets from BucketD back to BucketA, and placing each resolvent in the bucket of its highest remaining variable, gives:
BucketA: (empty)
BucketB: A B; B; A  B
BucketC: A  B  C; B  C; A  C; BC
BucketD: BC  D; D
Because no empty clause is derived, the statement is satisfiable.
How to get a truth assignment?
Direct Resolution
Assign truth values following the order d, consulting each bucket in turn:
BucketA: A = True or False
BucketB: B = True
BucketC: C = True if A = True; True or False if A = False
BucketD: D = False
Queries on Bayesian Networks
• Belief updating
– $P(X \mid E) = ?$
• Finding the most probable explanation (MPE)
– Given evidence, find a maximum-probability assignment to the rest of the variables.
• Maximizing a posteriori hypothesis (MAP)
– Given evidence, find an assignment to a subset of hypothesis variables that maximizes their probability.
• Maximizing the expected utility of the problem (MEU)
– Given evidence and a utility function, find an assignment to a subset of decision variables that maximizes the expected utility.
Bucket Elimination
• The algorithm serves as a framework for various probabilistic inference tasks on Bayesian networks.
Preliminary – Elimination Functions
Given a function $h$ defined over a subset of variables $S$, where $X \in S$, the functions $\min_X h$, $\max_X h$, $\mathrm{mean}_X h$, and $\sum_X h$ eliminate the parameter $X$ from $h$. They are defined over $U = S - \{X\}$; for every assignment $u$ to $U$:
$(\min_X h)(u) = \min_x h(x, u)$
$(\max_X h)(u) = \max_x h(x, u)$
$(\mathrm{mean}_X h)(u) = \frac{1}{|X|} \sum_x h(x, u)$
$(\sum_X h)(u) = \sum_x h(x, u)$
Preliminary – Elimination Functions
Given functions $h_1, \ldots, h_n$ defined over subsets of variables $S_1, \ldots, S_n$, respectively, the functions $\prod_j h_j$ and $\sum_j h_j$ are defined over $U = \bigcup_j S_j$; for every assignment $u$ to $U$:
$(\prod_j h_j)(u) = \prod_j h_j(u_{S_j})$
$(\sum_j h_j)(u) = \sum_j h_j(u_{S_j})$
where $u_{S_j}$ is the restriction of $u$ to the variables in $S_j$.
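The elimination and combination operators defined above can be sketched as follows. This is a minimal illustration under stated assumptions: a function is represented as a pair `(scope, table)` with a tuple of variable names and a dict keyed by assignments, all variables are binary unless `domain` says otherwise, and the helper names `eliminate` and `combine` are hypothetical.

```python
from itertools import product


def eliminate(factor, var, op):
    """Eliminate `var` from factor = (scope, table); op is 'min', 'max', 'sum' or 'mean'."""
    scope, table = factor
    i = scope.index(var)
    rest = scope[:i] + scope[i + 1:]
    groups = {}
    for assignment, value in table.items():
        # Group the values of h(x, u) by the remaining assignment u
        groups.setdefault(assignment[:i] + assignment[i + 1:], []).append(value)
    reducers = {"min": min, "max": max, "sum": sum,
                "mean": lambda vs: sum(vs) / len(vs)}
    return rest, {u: reducers[op](vs) for u, vs in groups.items()}


def combine(factors, op, domain=(0, 1)):
    """Pointwise 'prod' or 'sum' of factors over the union of their scopes."""
    scope = tuple(sorted({v for s, _ in factors for v in s}))
    table = {}
    for u in product(domain, repeat=len(scope)):
        env = dict(zip(scope, u))
        # Each h_j is evaluated on the restriction of u to its own scope S_j
        values = [t[tuple(env[v] for v in s)] for s, t in factors]
        if op == "prod":
            result = 1.0
            for x in values:
                result *= x
        else:
            result = sum(values)
        table[u] = result
    return scope, table


h = (("X", "U"), {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0})
sum_h = eliminate(h, "X", "sum")
max_h = eliminate(h, "X", "max")
mean_h = eliminate(h, "X", "mean")

h1 = (("A",), {(0,): 0.5, (1,): 2.0})
h2 = (("A", "B"), {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0})
prod_h = combine([h1, h2], "prod")
```

Processing a bucket then amounts to one `combine` over the bucket's functions followed by one `eliminate` on the bucket's variable.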
Belief Updating
Goal
$P(X = x \mid E = e) = \dfrac{P(X = x, E = e)}{P(E = e)}$
where $P(E = e)$ is the normalization factor.
Basic Concept of Variable Elimination
Queries: $P(a, g = 1) = ?$ and $P(a \mid g = 1) = ?$
Example:
[Figure: Bayesian network over the nodes A, B, C, D, F, G]
$P(g, f, d, c, b, a) = P(g \mid f)\, P(f \mid b, c)\, P(d \mid a, b)\, P(b \mid a)\, P(c \mid a)\, P(a)$
$P(a, g = 1) = \sum_{b, c, d, f, g=1} P(a, b, c, d, f, g)$
$= P(a) \sum_{b, c, d, f, g=1} P(g \mid f)\, P(f \mid b, c)\, P(d \mid a, b)\, P(b \mid a)\, P(c \mid a)$
$= P(a) \sum_c P(c \mid a) \sum_b P(b \mid a) \sum_f P(f \mid b, c) \sum_d P(d \mid a, b) \sum_{g=1} P(g \mid f)$
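Eliminating one variable at a time, from the innermost sum outward:
$P(a, g = 1) = P(a) \sum_c P(c \mid a) \sum_b P(b \mid a) \sum_f P(f \mid b, c) \sum_d P(d \mid a, b) \sum_{g=1} P(g \mid f)$
$= P(a) \sum_c P(c \mid a) \sum_b P(b \mid a) \sum_f P(f \mid b, c)\, \lambda_G(f) \sum_d P(d \mid a, b)$, with $\lambda_G(f) = \sum_{g=1} P(g \mid f)$
$= P(a) \sum_c P(c \mid a) \sum_b P(b \mid a)\, \lambda_D(a, b) \sum_f P(f \mid b, c)\, \lambda_G(f)$, with $\lambda_D(a, b) = \sum_d P(d \mid a, b)$
$= P(a) \sum_c P(c \mid a) \sum_b P(b \mid a)\, \lambda_D(a, b)\, \lambda_F(b, c)$, with $\lambda_F(b, c) = \sum_f P(f \mid b, c)\, \lambda_G(f)$
$= P(a) \sum_c P(c \mid a)\, \lambda_B(a, c)$, with $\lambda_B(a, c) = \sum_b P(b \mid a)\, \lambda_D(a, b)\, \lambda_F(b, c)$
$= P(a)\, \lambda_C(a)$, with $\lambda_C(a) = \sum_c P(c \mid a)\, \lambda_B(a, c)$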
Each elimination step corresponds to one bucket; a bucket's output function is placed in the bucket of a variable that it still mentions:
BucketG: $\lambda_G(f) = \sum_{g=1} P(g \mid f)$
BucketD: $\lambda_D(a, b) = \sum_d P(d \mid a, b)$
BucketF: $\lambda_F(b, c) = \sum_f P(f \mid b, c)\, \lambda_G(f)$
BucketB: $\lambda_B(a, c) = \sum_b P(b \mid a)\, \lambda_D(a, b)\, \lambda_F(b, c)$
BucketC: $\lambda_C(a) = \sum_c P(c \mid a)\, \lambda_B(a, c)$
BucketA: $P(a, g = 1) = P(a)\, \lambda_C(a)$
BucketG: $\lambda_G(f) = \sum_{g=1} P(g \mid f)$

f    λ_G(f)
+    0.1
−    0.7
BucketD: $\lambda_D(a, b) = \sum_d P(d \mid a, b)$

a    b    λ_D(a, b)
0    0    1
0    1    1
1    0    1
1    1    1
BucketF: $\lambda_F(b, c) = \sum_f P(f \mid b, c)\, \lambda_G(f)$

b    c    λ_F(b, c)
0    0    0.701
0    1    0.610
1    0    0.400
1    1    0.340
BucketB: $\lambda_B(a, c) = \sum_b P(b \mid a)\, \lambda_D(a, b)\, \lambda_F(b, c)$

a    c    λ_B(a, c)
0    0    0.9 × 0.701 + 0.1 × 0.400 = 0.6709
0    1    0.9 × 0.610 + 0.1 × 0.340 = 0.5830
1    0    0.6 × 0.701 + 0.4 × 0.400 = 0.5806
1    1    0.6 × 0.610 + 0.4 × 0.340 = 0.5020
BucketC: $\lambda_C(a) = \sum_c P(c \mid a)\, \lambda_B(a, c)$

a    λ_C(a)
1    0.67 × 0.5806 + 0.33 × 0.5020 = 0.554662
0    0.75 × 0.6709 + 0.25 × 0.5830 = 0.648925
BucketA: $P(a, g = 1) = P(a)\, \lambda_C(a)$

a    P(a, g=1)
1    0.3 × 0.554662 = 0.1663986
0    0.7 × 0.648925 = 0.4542475

$P(a \mid g = 1) = \dfrac{P(a, g = 1)}{P(g = 1)}$, with $P(g = 1) = 0.1663986 + 0.4542475 = 0.6206461$

a    P(a | g=1)
1    0.1663986 / 0.6206461 = 0.26811
0    0.4542475 / 0.6206461 = 0.73189
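The arithmetic of the last three buckets can be replayed with a short script. This is a sketch that starts from the tabulated values of the functions for BucketD and BucketF and reads $P(a)$, $P(b \mid a)$ and $P(c \mid a)$ off the products shown in the worked tables; the dictionary names and key layouts are assumptions made for the example.

```python
# Values read off the worked tables (key layouts are an assumption of this sketch)
P_a = {0: 0.7, 1: 0.3}
P_b_given_a = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.6, (1, 1): 0.4}      # key (b, a)
P_c_given_a = {(0, 0): 0.75, (1, 0): 0.25, (0, 1): 0.67, (1, 1): 0.33}  # key (c, a)
lam_D = {(a, b): 1.0 for a in (0, 1) for b in (0, 1)}                    # key (a, b)
lam_F = {(0, 0): 0.701, (0, 1): 0.610, (1, 0): 0.400, (1, 1): 0.340}    # key (b, c)

# BucketB: sum out b
lam_B = {(a, c): sum(P_b_given_a[b, a] * lam_D[a, b] * lam_F[b, c] for b in (0, 1))
         for a in (0, 1) for c in (0, 1)}

# BucketC: sum out c
lam_C = {a: sum(P_c_given_a[c, a] * lam_B[a, c] for c in (0, 1)) for a in (0, 1)}

# BucketA: joint with the evidence, then normalize
joint = {a: P_a[a] * lam_C[a] for a in (0, 1)}
evidence = sum(joint.values())            # P(g = 1)
posterior = {a: joint[a] / evidence for a in (0, 1)}

for a in (1, 0):
    print(a, round(joint[a], 7), round(posterior[a], 5))
```

Dividing by `evidence` is the normalization step: the bucket pass itself only yields the unnormalized joint $P(a, g = 1)$.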
Bucket Elimination Algorithm
Complexity
• The BuckElim algorithm can be applied to any ordering.
• The arity of the function recorded in a bucket
– the number of variables appearing in the processed bucket, excluding the bucket's variable.
• Time and space complexity grow exponentially with the arity r.
• The arity depends on the ordering.
• How many possible orderings are there for a BN's variables?
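Determination of the Arity
Consider the ordering AFDCBG.
[Figure: Bayesian network over the nodes A, B, C, D, F, G]
BucketG: $\sum_{g=1} P(g \mid f) = \lambda_G(f)$
BucketB: $\sum_b P(b \mid a)\, P(d \mid a, b)\, P(f \mid b, c) = \lambda_B(a, c, d, f)$
BucketC: $\sum_c P(c \mid a)\, \lambda_B(a, c, d, f) = \lambda_C(a, d, f)$
BucketD: $\sum_d \lambda_C(a, d, f) = \lambda_D(a, f)$
BucketF: $\sum_f \lambda_G(f)\, \lambda_D(a, f) = \lambda_F(a)$
BucketA: $P(a)\, \lambda_F(a) = P(a, g = 1)$
Number of other variables in each bucket (with its initial functions only, then after receiving functions from processed buckets):
G: 1; B: 4; C: 1, 3; D: 0, 2; F: 0, 1; A: 0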
Determination of the Arity
Given an ordering d, e.g., d = AFDCBG:
• The width of a graph is the maximum width of its nodes.
• w(d): width of the initial graph for ordering d.
• w*(d): width of the induced graph for ordering d.

Node    Width (initial graph)    Width (induced graph)
G       1                        1
B       4                        4
C       1                        3
D       0                        2
F       0                        1
A       0                        0

w(d) = 4, w*(d) = 4
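The per-bucket counts for an ordering can be obtained without any probability tables by tracking only which variables each function mentions. The sketch below does that symbolically; the function name `bucket_arities` and the string encoding of scopes are assumptions made for illustration, and the scopes are the six CPTs of the example network.

```python
def bucket_arities(scopes, ordering):
    """Symbolic bucket-elimination pass.

    Returns (initial, final): for each variable, the number of other
    variables in its bucket before and after functions from processed
    buckets have been added.
    """
    pos = {v: i for i, v in enumerate(ordering)}
    buckets = {v: [] for v in ordering}
    for scope in scopes:
        # A function goes into the bucket of its latest variable in the ordering
        buckets[max(scope, key=pos.get)].append(set(scope))

    def others(var):
        return set().union(*buckets[var]) - {var}

    initial = {v: len(others(v)) for v in ordering}
    final = {}
    for var in reversed(ordering):          # process buckets from last to first
        scope = others(var)
        final[var] = len(scope)
        if scope:                           # pass the new function down
            buckets[max(scope, key=pos.get)].append(scope)
    return initial, final


# Scopes of P(g|f), P(f|b,c), P(d|a,b), P(b|a), P(c|a), P(a)
cpt_scopes = ["GF", "FBC", "DAB", "BA", "CA", "A"]

init_d2, final_d2 = bucket_arities(cpt_scopes, "AFDCBG")
init_d1, final_d1 = bucket_arities(cpt_scopes, "ACBFDG")
print(max(final_d2.values()), max(final_d1.values()))
```

Under these assumptions the ordering AFDCBG produces a function over four variables, while ACBFDG never needs more than two, which illustrates how strongly the cost depends on the ordering.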
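Definition of Tree-Width
• The tree-width of a graph is the smallest induced width over all orderings.
• Goal: find an ordering with the smallest induced width.
• This problem is NP-hard.
• Greedy heuristics and approximation methods are available.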
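One common greedy heuristic is min-degree: repeatedly eliminate a node with the fewest remaining neighbors and connect those neighbors to each other. The sketch below is one such heuristic, not necessarily the one intended in the lecture. The edge list is an assumption derived from the example factorization (variables sharing a CPT are connected), and `min_degree_ordering` is a hypothetical helper name.

```python
def min_degree_ordering(edges):
    """Greedy min-degree heuristic; returns (ordering, induced width).

    Nodes are eliminated one by one; the ordering is the reverse of the
    elimination sequence because buckets are processed from last to first.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    eliminated, width = [], 0
    while adj:
        # Ties are broken alphabetically to keep the result deterministic
        node = min(sorted(adj), key=lambda n: len(adj[n]))
        nbrs = adj.pop(node)
        width = max(width, len(nbrs))
        for u in nbrs:
            adj[u].discard(node)
            adj[u] |= nbrs - {u}            # connect the remaining neighbors
        eliminated.append(node)
    return eliminated[::-1], width


# Variables that share a CPT in the example network are connected
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"),
         ("B", "F"), ("C", "F"), ("B", "C"), ("F", "G")]
ordering, width = min_degree_ordering(edges)
print("".join(ordering), width)
```

The heuristic carries no optimality guarantee in general; it is cheap to run and often gives a usable ordering.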
Summary
• The complexity of the BuckElim algorithm is dominated by the time and space needed to process a bucket.
• That time and space is exponential in the number of bucket variables.
• Induced width bounds the arity of bucket functions.
Exercises
• Use BuckElim to evaluate $P(a \mid b = 1)$ with the following two orderings:
1. d1 = ACBFDG
2. d2 = AFDCBG
[Figure: Bayesian network over the nodes A, B, C, D, F, G]
• Give the details and draw some conclusions.
• How can the algorithm be improved?
Most Probable Explanation (MPE)
MPE
Goal: $x^* = \arg\max_x P(X = x \mid E = e)$, where $x = (x_1, \ldots, x_n)$ and $e$ is the evidence.
Because $P(E = e)$ does not depend on $x$, this is equivalent to
$x^* = \arg\max_x P(x, e)$
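Notations
$\pi_i$: the parents of $X_i$
$\bar{x}_i = (x_1, \ldots, x_i)$
$F_i$: $X_i$ together with its children, i.e., the variables whose CPTs mention $X_i$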
$P_X(x) = P(x, e) = \prod_{i=1}^{n} P(x_i, e \mid \pi_i)$
Let $\bar{x}_n = (x_1, \ldots, x_n)$. Then
$P(x^*) = \max_x P_X(x) = \max_{\bar{x}_n} P_X(\bar{x}_n)$
$= \max_{\bar{x}_n} \prod_{i=1}^{n} P(x_i, e \mid \pi_i)$
$= \max_{\bar{x}_{n-1},\, x_n} \prod_{i=1}^{n} P(x_i, e \mid \pi_i)$
$P(x^*) = \max_{\bar{x}_{n-1},\, x_n} \prod_{i=1}^{n} P(x_i, e \mid \pi_i)$
Some terms involve $x_n$, some do not.
• $X_n$ is conditioned by its parents: $P(x_n, e \mid \pi_n)$
• $X_n$ conditions its children: $P(x_k, e \mid x_n, \ldots)$ for each child $X_k$ of $X_n$
$P(x^*) = \max_{\bar{x}_{n-1},\, x_n} \prod_{i=1}^{n} P(x_i, e \mid \pi_i)$
$= \max_{\bar{x}_{n-1}} \prod_{X_i \in X - F_n} P(x_i, e \mid \pi_i) \cdot \max_{x_n} P(x_n, e \mid \pi_n) \prod_{X_i \in ch_n} P(x_i, e \mid \pi_i)$
Here the first product collects the terms not conditioned by $x_n$; inside the maximization over $x_n$, $P(x_n, e \mid \pi_n)$ is the CPT of $X_n$ itself and the product over the children $ch_n$ of $X_n$ collects the terms conditioned by $x_n$. These are exactly the CPTs in which $x_n$ appears.
Eliminate variable $x_n$ at Bucket$_n$:
$h_n(x_{U_n}) = \max_{x_n} P(x_n, e \mid \pi_n) \prod_{X_i \in ch_n} P(x_i, e \mid \pi_i)$
$P(x^*) = \max_{\bar{x}_{n-1}} \prod_{X_i \in X - F_n} P(x_i, e \mid \pi_i) \cdot h_n(x_{U_n})$
Process the next bucket recursively.
Example
$\max_{a, b, c, d, f} P(a, b, c, d, f, g = 1) = ?$
$P(a, b, c, d, f, g = 1) = P(g = 1 \mid f)\, P(f \mid b, c)\, P(d \mid a, b)\, P(b \mid a)\, P(c \mid a)\, P(a)$
[Figure: Bayesian network over the nodes A, B, C, D, F, G]
Consider the ordering ACBFDG:
BucketG: $\max_{g=1} P(g \mid f) = h_G(f)$
BucketD: $\max_d P(d \mid a, b) = h_D(a, b)$
BucketF: $\max_f P(f \mid b, c)\, h_G(f) = h_F(b, c)$
BucketB: $\max_b P(b \mid a)\, h_D(a, b)\, h_F(b, c) = h_B(a, c)$
BucketC: $\max_c P(c \mid a)\, h_B(a, c) = h_C(a)$
BucketA: $\max_a P(a)\, h_C(a) = \max_{a, b, c, d, f} P(a, b, c, d, f, g = 1)$
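The max-elimination pass, including how the maximizing assignment is read back, can be sketched on a smaller network. The deck does not list the CPTs of the six-node example, so this sketch assumes a three-variable chain A → B → C with evidence C = 1; every number below is hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical CPTs for a chain A -> B -> C (illustrative values only)
P_a = {0: 0.6, 1: 0.4}
P_b_given_a = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # key (b, a)
P_c_given_b = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}  # key (c, b)
c_obs = 1  # evidence C = 1

# BucketC (evidence): h_C(b) = P(c = 1 | b)
h_C = {b: P_c_given_b[c_obs, b] for b in (0, 1)}

# BucketB: h_B(a) = max_b P(b | a) h_C(b), remembering the maximizing b
h_B, best_b = {}, {}
for a in (0, 1):
    best_b[a] = max((0, 1), key=lambda b: P_b_given_a[b, a] * h_C[b])
    h_B[a] = P_b_given_a[best_b[a], a] * h_C[best_b[a]]

# BucketA: maximize P(a) h_B(a), then read the assignment back in order
a_star = max((0, 1), key=lambda a: P_a[a] * h_B[a])
b_star = best_b[a_star]
mpe_value = P_a[a_star] * h_B[a_star]

# Brute-force check over all assignments to (a, b)
brute = max(product((0, 1), repeat=2),
            key=lambda ab: P_a[ab[0]] * P_b_given_a[ab[1], ab[0]] * P_c_given_b[c_obs, ab[1]])
print((a_star, b_star), mpe_value, brute)
```

Storing the argmax in each bucket (`best_b` here) is what lets the forward pass recover the assignment itself rather than only its probability.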
Exercise
Use the bucket elimination algorithm to evaluate $\max_{a, b, c, d, f} P(a, b, c, d, f, g = 1)$, considering the ordering ACBFDG.
Maximum A Posteriori (MAP)
MAP
Given a belief network, a subset of hypothesized variables $A = (A_1, \ldots, A_k)$, and evidence $E = e$, the goal is to determine
$a^* = \arg\max_a P(A = a \mid E = e)$
or, equivalently,
$a^* = \arg\max_a P(a, e)$
Example
[Figure: Bayesian network over the nodes A, B, C, D, F, G; B and C are the hypothesis (decision) variables, and g = 1 is the evidence]
$(b^*, c^*) = \arg\max_{b, c} P(b, c \mid g = 1) = ?$
MAP
Ordering: the hypothesis variables come first.
$d = X_1, \ldots, X_k, X_{k+1}, \ldots, X_n$
$= A_1, \ldots, A_k, X_{k+1}, \ldots, X_n$
$= \bar{A}_k, \bar{X}_{k+1}^{n}$
Some of $X_{k+1}, \ldots, X_n$ may be observed.
With $d = \bar{A}_k, \bar{X}_{k+1}^{n}$:
$\bar{a}_k^* = \arg\max_{\bar{a}_k} P(\bar{a}_k \mid e)$
$= \arg\max_{\bar{a}_k} \dfrac{P(\bar{a}_k, e)}{P(e)}$
$= \arg\max_{\bar{a}_k} P(\bar{a}_k, e)$
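$\bar{a}_k^* = \arg\max_{\bar{a}_k} P(\bar{a}_k, e)$
$P(\bar{a}_k, e) = \sum_{\bar{x}_{k+1}^{n}} P(\bar{a}_k, \bar{x}_{k+1}^{n}, e) = \sum_{\bar{x}_{k+1}^{n}} \prod_{i=1}^{n} P(x_i, e \mid \pi_i)$
$P(\bar{a}_k^*) = \max_{\bar{a}_k} \sum_{\bar{x}_{k+1}^{n}} \prod_{i=1}^{n} P(x_i, e \mid \pi_i)$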
$P(\bar{a}_k^*) = \max_{\bar{a}_k} \sum_{\bar{x}_{k+1}^{n}} \prod_{i=1}^{n} P(x_i, e \mid \pi_i)$
• The inner summation over $\bar{x}_{k+1}^{n}$ is bucket elimination for belief updating.
• The outer maximization over $\bar{a}_k$ is bucket elimination for MPE.
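Bucket Elimination Algorithm
Example
$(b^*, c^*) = \arg\max_{b, c} P(b, c \mid g = 1) = ?$
[Figure: Bayesian network over the nodes A, B, C, D, F, G with evidence g = 1]
Consider the ordering CBAFDG:
BucketG: $\sum_{g=1} P(g \mid f) = \lambda_G(f)$
BucketD: $\sum_d P(d \mid a, b) = \lambda_D(a, b)$
BucketF: $\sum_f P(f \mid b, c)\, \lambda_G(f) = \lambda_F(b, c)$
BucketA: $\sum_a P(c \mid a)\, P(b \mid a)\, P(a)\, \lambda_D(a, b) = \lambda_A(b, c)$
BucketB: $\max_b \lambda_F(b, c)\, \lambda_A(b, c) = \lambda_B(c)$
BucketC: $\max_c \lambda_B(c) = \max_{b, c} P(b, c, g = 1)$, which is attained at the same $(b, c)$ as $\max_{b, c} P(b, c \mid g = 1)$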
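The sum-then-max pattern of this example can be sketched numerically. The sketch assumes that the tables from the belief-updating example earlier in the deck ($P(a)$, $P(b \mid a)$, $P(c \mid a)$ and the tabulated functions for BucketD and BucketF) also apply here; the dictionary names and key layouts are illustrative.

```python
# Tables assumed to carry over from the belief-updating example
P_a = {0: 0.7, 1: 0.3}
P_b_given_a = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.6, (1, 1): 0.4}      # key (b, a)
P_c_given_a = {(0, 0): 0.75, (1, 0): 0.25, (0, 1): 0.67, (1, 1): 0.33}  # key (c, a)
lam_D = {(a, b): 1.0 for a in (0, 1) for b in (0, 1)}                    # key (a, b)
lam_F = {(0, 0): 0.701, (0, 1): 0.610, (1, 0): 0.400, (1, 1): 0.340}    # key (b, c)

# BucketA: sum out the non-hypothesis variable a
lam_A = {(b, c): sum(P_c_given_a[c, a] * P_b_given_a[b, a] * P_a[a] * lam_D[a, b]
                     for a in (0, 1))
         for b in (0, 1) for c in (0, 1)}

# BucketB: maximize over b, remembering the maximizer for each c
lam_B, arg_b = {}, {}
for c in (0, 1):
    arg_b[c] = max((0, 1), key=lambda b: lam_F[b, c] * lam_A[b, c])
    lam_B[c] = lam_F[arg_b[c], c] * lam_A[arg_b[c], c]

# BucketC: maximize over c, then read b back
c_star = max((0, 1), key=lam_B.get)
b_star = arg_b[c_star]
map_joint = lam_B[c_star]                  # max over (b, c) of P(b, c, g = 1)

# Consistency check: summing P(b, c, g = 1) over (b, c) gives P(g = 1)
p_evidence = sum(lam_F[b, c] * lam_A[b, c] for b in (0, 1) for c in (0, 1))
print((b_star, c_star), map_joint, p_evidence)
```

The summation buckets must be processed before the maximization buckets, because max and sum do not commute; this is why the hypothesis variables are placed first in the ordering.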
Exercise
Consider the ordering CBAFDG and find $(b^*, c^*) = \arg\max_{b, c} P(b, c \mid g = 1)$ with bucket elimination. Give the details of the computation in each bucket.