
Graphical Models: Sum-Product Algorithm
Sandro Schönborn
CS351, 09.04.2013
Factor Graphs
Big products of smaller factors:
f(x1, x2, x3) = f_a(x1) f_b(x2) f_c(x1, x2, x3)
P(E, B, A, J, M) = P(E) P(B) P(A|E, B) P(J|A) P(M|A)
f(x) = ∏_S f_S(x_S)
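As a concrete illustration, a minimal Python sketch of such a factorization for the three-factor example above; the tables and the binary state spaces are assumptions for illustration, not values from the slides:

import numpy as np

# Each factor stores its scope (tuple of variable names) and a table
# indexed in that order. All variables here are binary (assumed).
factors = {
    "fa": (("x1",),            np.array([0.6, 0.4])),
    "fb": (("x2",),            np.array([0.7, 0.3])),
    "fc": (("x1", "x2", "x3"), np.random.rand(2, 2, 2)),
}

def joint(assignment):
    """Evaluate f(x) = prod_S f_S(x_S) at a full assignment."""
    value = 1.0
    for scope, table in factors.values():
        value *= table[tuple(assignment[v] for v in scope)]
    return value

print(joint({"x1": 0, "x2": 1, "x3": 1}))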
Factor Graphs
f(x1, x2, x3) = f_a(x1) f_b(x2) f_c(x1, x2, x3)
Graph Conversions
Priors and Conditionals
Graph Conversions
Potentials
The Sum-Product Algorithm
Efficiently calculates all marginals in a tree graph
Message Passing: local computation
First known as Belief Propagation for directed graphs (Pearl 1982)
Known under many names: Forward-Backward Algorithm, Kalman Filter, Low-Density Parity-Check Codes, ...
Belief Propagation
Message Passing
Belief of a node about its neighbour
Marginal: combined beliefs of all neighbours
Tree propagation: all marginals for the cost of computing two
→ Sum-Product: generalization to factor graphs
The Sum-Product Algorithm
Tree Propagation
Choose a root node
Propagate from the leaves towards the root; for each node:
• Collect incoming messages from all children (wait)
• Send message to parent
Revert direction, start at the root:
• Distribute messages to all children
(a sketch of this two-pass schedule follows below)
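One way to realize the two-pass schedule on a tree, sketched in Python; `neighbors` is an assumed adjacency map of the (bipartite) factor graph, and node names are illustrative:

def two_pass_schedule(neighbors, root):
    """Return the message order for a tree as (sender, receiver) pairs:
    first leaves -> root (post-order), then root -> leaves (reversed)."""
    upward = []
    def collect(node, parent):
        for child in neighbors[node]:
            if child != parent:
                collect(child, node)          # wait for the subtree first
        if parent is not None:
            upward.append((node, parent))     # then send to the parent
    collect(root, None)
    downward = [(dst, src) for src, dst in reversed(upward)]
    return upward + downward

# Chain x1 - f - x2: messages run inwards to the root, then back out.
neighbors = {"x1": ["f"], "f": ["x1", "x2"], "x2": ["f"]}
print(two_pass_schedule(neighbors, root="x2"))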
Variable to Factor Message
µ_{x→f}(x) = µ_{f1→x}(x) µ_{f2→x}(x)
Factor to Variable Message
µ_{f→x}(x) = ∑_{x1} ∑_{x2} f(x, x1, x2) µ_{x1→f}(x1) µ_{x2→f}(x2)
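For discrete variables both updates are just elementwise products and tensor contractions; a numpy sketch for the two-neighbour configurations above (all message values and the factor table are assumed for illustration):

import numpy as np

# Variable-to-factor: product of the messages from the other factors.
mu_f1_to_x = np.array([0.5, 0.5])
mu_f2_to_x = np.array([0.9, 0.1])
mu_x_to_f = mu_f1_to_x * mu_f2_to_x

# Factor-to-variable: sum out x1 and x2 against the table f(x, x1, x2).
f = np.random.rand(2, 2, 2)          # axes ordered as (x, x1, x2)
mu_x1_to_f = np.array([0.2, 0.8])
mu_x2_to_f = np.array([0.6, 0.4])
mu_f_to_x = np.einsum("abc,b,c->a", f, mu_x1_to_f, mu_x2_to_f)
print(mu_x_to_f, mu_f_to_x)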
Starting and Stopping
Initial Messages
Marginal
p(x) ∝ ∏_i µ_{fi→x}(x)
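Reading off a marginal is then one product and a normalization, e.g. (incoming message values assumed):

import numpy as np

incoming = [np.array([0.5, 0.5]), np.array([0.18, 0.02])]
p = np.prod(incoming, axis=0)   # unnormalized marginal of x
p /= p.sum()                    # local normalization is sufficient
print(p)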
Remarks
Messages are functions of the variable involved
A message summarizes the information about the destination variable as seen from the subgraph behind the sender
A message never includes knowledge from the recipient (no double counting) ⇒ messages depend on the receiver
The graph must not have loops: loops break the message schedule → see Loopy Belief Propagation
Local normalization is sufficient
Observations
We know x = v
Initial messages change for observed variables:
µ_{x→f}(x) = δ_{x,v}
δ is the Kronecker delta:
δ_{x,v} = 1 if x = v, 0 if x ≠ v
Message Passing
Initial Messages
µ_{x→f}(x) = 1            (leaf variable node)
µ_{f→x}(x) = f(x)         (leaf factor node)
µ_{x→f}(x) = δ_{x,x_obs}  (observed variable)
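The three initializations in code, for an assumed binary variable:

import numpy as np

K = 2                        # number of states of x (assumed binary)
mu_leaf_var = np.ones(K)     # leaf variable node: constant message 1
f = np.array([0.3, 0.7])     # a leaf factor over x (assumed table)
mu_leaf_factor = f.copy()    # leaf factor node: the factor itself
x_obs = 1                    # observed value
mu_observed = np.zeros(K)    # observed variable: Kronecker delta at x_obs
mu_observed[x_obs] = 1.0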
Message Passing
Running Messages
µ_{x→f}(x) = ∏_{i=1..N} µ_{fi→x}(x)
µ_{f→x}(x) = ∑_{x1} ∑_{x2} ··· ∑_{xN} f(x, x1, …, xN) ∏_{i=1..N} µ_{xi→f}(xi)
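The general updates for a factor of degree N + 1, again as a numpy sketch; np.einsum performs the N-fold sum in a single contraction (the helper names are mine):

import numpy as np

def var_to_factor(incoming):
    """µ_{x→f}(x): product of the messages from the other N factors."""
    return np.prod(incoming, axis=0)

def factor_to_var(table, incoming):
    """µ_{f→x}(x): sum out x1..xN of f(x, x1..xN) times all µ_{xi→f}.
    Axis 0 of `table` must be the destination variable x."""
    letters = "abcdefghij"[: table.ndim]
    spec = letters + "," + ",".join(letters[1:]) + "->" + letters[0]
    return np.einsum(spec, table, *incoming)

f = np.random.rand(2, 3, 2)                    # f(x, x1, x2), assumed table
msgs = [np.random.rand(3), np.random.rand(2)]  # µ_{x1→f}, µ_{x2→f}
print(factor_to_var(f, msgs))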
The California Example
Example
Will John call? → P(J)
John called, how likely is the burglary? → P(B | j)
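A runnable check of both queries by summing the factorization P(E)P(B)P(A|E,B)P(J|A)P(M|A) directly, which is exactly the sum the algorithm organizes efficiently. The slides give no CPT numbers, so the values below are the commonly used ones for this network (Russell & Norvig) and should be read as assumptions:

import itertools

# Assumed CPTs (Russell & Norvig's burglary network, not from the slides).
P_B = {1: 0.001, 0: 0.999}
P_E = {1: 0.002, 0: 0.998}
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(a=1|b,e)
P_J = {1: 0.90, 0: 0.05}   # P(j=1|a)
P_M = {1: 0.70, 0: 0.01}   # P(m=1|a)

def joint(e, b, a, j, m):
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_E[e] * P_B[b] * pa * pj * pm

# P(J = 1): marginalize out all other variables.
p_j = sum(joint(e, b, a, 1, m)
          for e, b, a, m in itertools.product([0, 1], repeat=4))
# P(B = 1 | j = 1): clamp the observation, sum the rest, normalize.
p_b_and_j = sum(joint(e, 1, a, 1, m)
                for e, a, m in itertools.product([0, 1], repeat=3))
print("P(J) =", p_j, "  P(B|j) =", p_b_and_j / p_j)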
Continuous Variables
Continuous variables do not change the structure of the algorithm
Factor-to-variable messages change:
∑_x → ∫ dx
Continuous Variables
How to represent messages that are real functions?
• Parametric representation (e.g. Gaussians)
• Function-space representation with a finite basis
Observations:
The initial message becomes a Dirac delta:
µ_{x→f}(x) = δ(x − x_obs)
The California Example — Beefed-Up
Replace John by a web-connected microphone at home
The microphone measures the continuous sound level S, with S ∈ ℝ
We need a new conditional distribution P(S|A):
P(S|a) = N(S | 1, 0.04)
P(S|ā) = N(S | 0, 0.25)
California Example — Beefed-Up
Example
What is the sound level? → P(S)
We read 0.9, how likely is the burglary? → P(B | S = 0.9)
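A sketch of the P(B | S = 0.9) query: the Dirac-delta observation message collapses the integral over S to a pointwise evaluation of the two Gaussians from the previous slide (their second arguments read as variances, following Bishop's N(x | µ, σ²) convention). The discrete CPT numbers are again the assumed ones from above:

import itertools
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Observing S = 0.9 turns the factor P(S|A) into two plain numbers.
s = 0.9
like = {1: normal_pdf(s, 1.0, 0.04),   # P(s | a)
        0: normal_pdf(s, 0.0, 0.25)}   # P(s | not a)

# Assumed discrete CPTs as before (not from the slides).
P_B = {1: 0.001, 0: 0.999}
P_E = {1: 0.002, 0: 0.998}
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}

def weight(e, b, a):
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    return P_E[e] * P_B[b] * pa * like[a]

num = sum(weight(e, 1, a) for e, a in itertools.product([0, 1], repeat=2))
den = sum(weight(e, b, a) for e, b, a in itertools.product([0, 1], repeat=3))
print("P(B=1 | S=0.9) =", num / den)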
Max-Product Algorithm
Often we need the maximum posterior probability:
max_x P(x),  x* = argmax_x P(x)
The max function satisfies
max(a·c, b·c) = max(a, b)·c  for c ≥ 0,
so we can use the same algorithm to calculate maximal assignments!
Max-Product Algorithm
Change µ_{f→x}(x) to
µ_{f→x}(x) = max_{x1} max_{x2} f(x, x1, x2) µ_{x1→f}(x1) µ_{x2→f}(x2)
Introduce bookkeeping of the local maximal assignments at the factors (sketched below)
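A minimal sketch of the changed update with its bookkeeping, on a tiny chain x1 − f − x with an assumed factor table: take the max instead of the sum, and remember which x1 achieved it so the maximal assignment can be read back:

import numpy as np

f = np.array([[0.1, 0.6],          # f(x, x1), axes (x, x1); assumed table
              [0.8, 0.2]])
mu_x1_to_f = np.array([0.5, 0.5])  # incoming message from x1

scores = f * mu_x1_to_f            # shape (x, x1)
mu_f_to_x = scores.max(axis=1)     # max instead of sum over x1
backptr = scores.argmax(axis=1)    # bookkeeping: best x1 for each x

x_star = int(mu_f_to_x.argmax())   # maximal assignment at the root x
x1_star = int(backptr[x_star])     # traced back through the factor
print(x_star, x1_star, mu_f_to_x[x_star])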
Summary
Factor graph: factorization of large products
Marginals can be calculated by message passing
Messages are beliefs: what the sub-graph behind a node thinks of the destination of the message
Exact on tree-structured factor graphs
Complexity depends on the factor with the highest degree (connectivity)
Replacing ∑ by max solves the maximal probability problem
References
MacKay, Information Theory, Inference, and Learning Algorithms
Bishop, Pattern Recognition and Machine Learning (main resource, including all non-hand-drawn figures)
Jordan & Weiss, Probabilistic Inference in Graphical Models