Graphical Models: Sum-Product Algorithm
Sandro Schönborn, CS351, 09.04.2013

Factor Graphs
Big products of smaller factors:
f(x_1, x_2, x_3) = f_a(x_1) f_b(x_2) f_c(x_1, x_2, x_3)
P(E, B, A, J, M) = P(E) P(B) P(A|E, B) P(J|A) P(M|A)
In general, a factor graph represents a factorization
f(x) = \prod_S f_S(x_S)

Factor Graphs
f(x_1, x_2, x_3) = f_a(x_1) f_b(x_2) f_c(x_1, x_2, x_3)

Graph Conversions: Priors and Conditionals

Graph Conversions: Potentials

The Sum-Product Algorithm
• Efficiently calculates all marginals in a tree graph
• Message passing: local computation
• First known as Belief Propagation for directed graphs (Pearl 1982)
• Known under many names: Forward-Backward algorithm, Kalman filter, Low-Density Parity-Check codes, ...

Belief Propagation: Message Passing
• Belief of a node about its neighbour
• Marginal: combined beliefs of all neighbours
• Tree propagation: all marginals for the cost of two
→ Sum-Product: generalization to factor graphs

The Sum-Product Algorithm: Tree Propagation
Choose a root node.
Propagate from the leaves towards the root; for each node:
• Collect incoming messages from all children (wait)
• Send a message to the parent
Revert the direction, starting at the root:
• Distribute messages to all children

Variable-to-Factor Message
\mu_{x \to f}(x) = \mu_{f_1 \to x}(x) \mu_{f_2 \to x}(x)

Factor-to-Variable Message
\mu_{f \to x}(x) = \sum_{x_1} \sum_{x_2} f(x, x_1, x_2) \mu_{x_1 \to f}(x_1) \mu_{x_2 \to f}(x_2)

Starting and Stopping
Initial messages start at the leaves; the marginal at a variable node combines all incoming messages:
p(x) \propto \prod_i \mu_{f_i \to x}(x)

Remarks
• Messages are functions of the variable involved
• A message summarizes the information about the destination variable as seen from the subgraph behind the sender
• A message never includes knowledge from the recipient (no double counting) ⇒ messages depend on the receiver
• The graph must not have loops: loops destroy the scheduling → see Loopy Belief Propagation
• Local normalization is sufficient

Observations
If we know x = v, the initial messages change for observed variables:
\mu_{x \to f}(x) = \delta_{xv}
where \delta_{xv} is the Kronecker delta: \delta_{xv} = 1 if x = v and 0 if x \neq v.

Message Passing: Initial Messages
\mu_{x \to f}(x) = 1                  (leaf variable node)
\mu_{f \to x}(x) = f(x)               (leaf factor node)
\mu_{x \to f}(x) = \delta_{x, x_{obs}}   (observed variable)

Message Passing: Running Messages
\mu_{x \to f}(x) = \prod_{i=1}^{N} \mu_{f_i \to x}(x)
\mu_{f \to x}(x) = \sum_{x_1} \sum_{x_2} \cdots \sum_{x_N} f(x, x_1, \ldots, x_N) \prod_{i=1}^{N} \mu_{x_i \to f}(x_i)

The California Example
• Will John call? P(J)
• John called, how likely is the burglary? P(B | j)

Continuous Variables
• Continuous variables do not change the structure of the algorithm
• Factor-to-variable messages change: \sum_x \longrightarrow \int dx

Continuous Variables
How do we represent messages, which are now real functions?
• Parametric representation (e.g. Gaussians)
• Function-space representation with a finite basis
Observations: the initial message becomes a Dirac delta
\mu_{x \to f}(x) = \delta(x - x_{obs})

The California Example, Beefed Up
• Replace John by a web-connected microphone at home
• The microphone measures the continuous sound level S, with S \in \mathbb{R}
• We need a new conditional distribution P(S|A):
P(S|a) = N(S | 1, 0.04)
P(S|\bar{a}) = N(S | 0, 0.25)

California Example, Beefed Up
• What is the sound level? P(S)
• We read 0.9, how likely is the burglary? P(B | S = 0.9)
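To make the discrete message-passing recipe concrete, here is a minimal NumPy sketch of one sum-product sweep on a simplified three-variable chain version of the burglary network (Burglary - Alarm - John), computing P(B | J = 1). The chain structure follows the discrete California example above, but all conditional probability values and variable names in the code are illustrative assumptions, not numbers from the lecture.

```python
import numpy as np

# A minimal sum-product pass on the chain  B -- fA -- A -- fJ -- J,
# with John observed to have called (J = 1).
# All CPT numbers below are illustrative, not taken from the lecture.

p_B = np.array([0.999, 0.001])            # P(B): [no burglary, burglary]
p_A_given_B = np.array([[0.99, 0.01],     # P(A | B=0)
                        [0.06, 0.94]])    # P(A | B=1)   rows: B, cols: A
p_J_given_A = np.array([[0.95, 0.05],     # P(J | A=0)
                        [0.10, 0.90]])    # P(J | A=1)   rows: A, cols: J

# Leaf messages
msg_J_to_fJ = np.array([0.0, 1.0])        # observed J = 1: Kronecker delta
msg_fB_to_B = p_B                         # leaf factor sends f(x) itself

# Factor-to-variable messages: sum out the other variables
msg_fJ_to_A = p_J_given_A @ msg_J_to_fJ   # sum_J P(J|A) * delta(J,1) = P(J=1|A)
msg_A_to_fA = msg_fJ_to_A                 # A has only one other neighbour, so pass through
msg_fA_to_B = p_A_given_B @ msg_A_to_fA   # sum_A P(A|B) * msg(A)

# Marginal at B: product of all incoming messages, then local normalization
posterior_B = msg_fB_to_B * msg_fA_to_B
posterior_B /= posterior_B.sum()
print("P(B | J=1) =", posterior_B)        # [P(no burglary), P(burglary)]
```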
Max-Product Algorithm
Often we need the maximum posterior probability and its maximizing assignment:
\max_x P(x), \quad \hat{x} = \arg\max_x P(x)
The max function distributes over products with non-negative factors:
\max(a c, b c) = \max(a, b) c  for c \geq 0,
so the same message-passing algorithm can compute maximal assignments.
Change \mu_{f \to x}(x) to
\mu_{f \to x}(x) = \max_{x_1} \max_{x_2} f(x, x_1, x_2) \mu_{x_1 \to f}(x_1) \mu_{x_2 \to f}(x_2)
and introduce bookkeeping of the local maximal assignments at the factors (a code sketch follows after the references).

Summary
• Factor graph: factorization of large products
• Marginals can be calculated by message passing
• Messages are beliefs: what the subgraph behind a node thinks of the destination of the message
• Exact on tree-structured factor graphs
• Complexity depends on the factor with the highest degree (connectivity)
• Replacing \sum by \max solves the maximal-probability problem

References
MacKay, Information Theory, Inference, and Learning Algorithms
Bishop, Pattern Recognition and Machine Learning (main resource, including all non-hand-drawn figures)
Jordan and Weiss, Probabilistic inference in graphical models
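For completeness, here is a matching max-product sketch on the same illustrative chain: each sum is replaced by a max, and the argmax bookkeeping at the factor lets us read off the most probable joint assignment. As before, the probability tables are invented for illustration only.

```python
import numpy as np

# Max-product on the illustrative chain  B -- fA -- A -- fJ -- J,  with J observed as 1.

p_B = np.array([0.999, 0.001])
p_A_given_B = np.array([[0.99, 0.01],
                        [0.06, 0.94]])   # rows: B, cols: A
p_J_given_A = np.array([[0.95, 0.05],
                        [0.10, 0.90]])   # rows: A, cols: J

msg_J_to_fJ = np.array([0.0, 1.0])       # observed J = 1

# Factor-to-variable messages: replace the sum by a max and remember the argmax
msg_fJ_to_A = (p_J_given_A * msg_J_to_fJ).max(axis=1)        # max over J, for each A
msg_fA_to_B = (p_A_given_B * msg_fJ_to_A).max(axis=1)        # max over A, for each B
best_A_for_B = (p_A_given_B * msg_fJ_to_A).argmax(axis=1)    # bookkeeping at factor fA

# At the root B: combine with the leaf-factor message P(B), pick the best B,
# then follow the bookkeeping back to recover the best A
scores_B = p_B * msg_fA_to_B
best_B = int(scores_B.argmax())
best_A = int(best_A_for_B[best_B])
print("most probable assignment given J=1:  B =", best_B, " A =", best_A)
print("unnormalized max-product score:", scores_B[best_B])
```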