Probabilistic Inference
Lecture 1

M. Pawan Kumar
[email protected]
Slides available online: http://cvc.centrale-ponts.fr/personnel/pawan/

About the Course
• 7 lectures + 1 exam
• Probabilistic Models – 1 lecture
• Energy Minimization – 4 lectures
• Computing Marginals – 2 lectures
• Related Courses
  – Probabilistic Graphical Models (MVA)
  – Structured Prediction

Instructor
• Assistant Professor (2012 – Present)
• Center for Visual Computing
  – 12 Full-time Faculty Members
  – 2 Associate Faculty Members
• Research Interests
  – Probabilistic Models
  – Machine Learning
  – Computer Vision
  – Medical Image Analysis

Students
• Third year at ECP
• Specializing in Machine Learning and Vision
• Prerequisites
  – Probability Theory
  – Continuous Optimization
  – Discrete Optimization

Outline
• Probabilistic Models
  – Markov Random Fields (MRF)
  – Bayesian Networks
  – Factor Graphs
• Conversions
• Exponential Family
• Inference
Example (on board)!!

Markov Random Fields (MRF)

[Figure: a 3x3 grid of unobserved random variables V1–V9; the edges define a neighborhood over the random variables.]

• Variable Va takes a value, or label, va from a set L = {l1, l2, …, lh}; the model is discrete and finite.
• An assignment V = v is called a labeling.
• An MRF assumes the Markovian property for P(v): Va is conditionally independent of any non-neighboring variable Vb given the neighbors of Va.
• Hammersley-Clifford Theorem: the probability P(v) can be decomposed into clique potentials, e.g. ψ12(v1,v2) and ψ56(v5,v6) for the edges (V1,V2) and (V5,V6).

[Figure: the same grid with observed data da attached to each variable Va, giving potentials such as ψ1(v1,d1).]

• Probability P(v) is proportional to Π(a,b) ψab(va,vb); probability P(d|v) is proportional to Πa ψa(va,da).
• Joint probability P(v,d) = (1/Z) Πa ψa(va,da) Π(a,b) ψab(va,vb), where Z is known as the partition function.
• A high-order potential, e.g. ψ4578(v4,v5,v7,v8), is defined over more than two variables; a pairwise MRF contains only unary potentials ψa(va,da) and pairwise potentials ψab(va,vb).
• A set of variables A is conditionally independent of a set B given a set C if there is no path from A to B when C is removed from the graph.

Conditional Random Fields (CRF)
• A CRF assumes the Markovian property for the conditional distribution P(v|d) (Hammersley-Clifford Theorem).
• P(v|d) is proportional to Πa ψa(va;d) Π(a,b) ψab(va,vb;d), with clique potentials that depend on the data.
• P(v|d) = (1/Z) Πa ψa(va;d) Π(a,b) ψab(va,vb;d), where Z is known as the partition function.

MRF and CRF
• In both cases the distribution over labelings takes the form P(v) = (1/Z) Πa ψa(va) Π(a,b) ψab(va,vb).
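To make the pairwise factorization concrete, here is a minimal brute-force sketch (not part of the slides; the chain structure and all potential values are made up for illustration) that evaluates P(v,d) and the partition function Z on a three-variable model:

```python
import itertools

# Toy pairwise MRF: a chain of 3 variables with 2 labels each.
# psi_unary[a][v] plays the role of psi_a(v_a, d_a) with the data d fixed;
# psi_pair[(a, b)][v][w] plays the role of psi_ab(v_a, v_b).
labels = (0, 1)
psi_unary = [[0.9, 0.1],
             [0.5, 0.5],
             [0.2, 0.8]]
psi_pair = {(0, 1): [[1.0, 0.5], [0.5, 1.0]],
            (1, 2): [[1.0, 0.5], [0.5, 1.0]]}

def score(v):
    """Unnormalized score of labeling v: product of all potentials."""
    s = 1.0
    for a, va in enumerate(v):
        s *= psi_unary[a][va]
    for (a, b), table in psi_pair.items():
        s *= table[v[a]][v[b]]
    return s

# Partition function Z: the sum of the score over all h^n labelings.
Z = sum(score(v) for v in itertools.product(labels, repeat=3))

# Normalized probability of one labeling.
print(score((0, 0, 1)) / Z)
```

Enumerating all h^n labelings to compute Z is exponential in n, which is why the lectures on energy minimization and computing marginals develop more efficient algorithms.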
Bayesian Networks

[Figure: a directed acyclic graph over V1–V8 with edges V1→V2, V1→V3, V2→V4, V2→V5, V3→V5, V3→V6, V4→V7, V5→V7, V5→V8, V6→V8.]

• Directed Acyclic Graph (DAG) – no directed loops; ignoring the directionality of the edges, a DAG can still have loops.
• A Bayesian network concisely represents the probability P(v):
  P(v) = Πa P(va|Parents(va))
• For the example DAG: P(v) = P(v1) P(v2|v1) P(v3|v1) P(v4|v2) P(v5|v2,v3) P(v6|v3) P(v7|v4,v5) P(v8|v5,v6)
• Va is conditionally independent of its ancestors given its parents.
• Conditional independence of A and B given C. [Figure: example graphs, courtesy Kevin Murphy.]
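The factorized form is cheap to evaluate, since it is just a product of local conditional probabilities. Below is a minimal sketch (not from the slides) using a three-variable fragment V1 → V2, V1 → V3 of a DAG, with made-up probability tables:

```python
# Toy Bayesian network: V1 -> V2, V1 -> V3 with binary labels, so that
# P(v) = P(v1) P(v2|v1) P(v3|v1). All probability tables are made up.
p_v1 = [0.6, 0.4]                         # P(V1 = 0), P(V1 = 1)
p_v2_given_v1 = [[0.7, 0.3], [0.2, 0.8]]  # rows indexed by v1
p_v3_given_v1 = [[0.5, 0.5], [0.9, 0.1]]  # rows indexed by v1

def joint(v1, v2, v3):
    """P(v) as a product of local conditional probabilities."""
    return p_v1[v1] * p_v2_given_v1[v1][v2] * p_v3_given_v1[v1][v3]

# Unlike an MRF, no partition function is needed: the local factors are
# conditional probabilities, so the joint sums to 1 by construction.
total = sum(joint(a, b, c)
            for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(joint(0, 1, 0), total)  # total is 1.0 up to floating point
```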
Factor Graphs

[Figure: a bipartite factor graph with variable nodes V1–V6 and factor nodes a–g; for example, factor a connects V1 and V2, and factor b connects V2 and V3.]

• Two types of nodes: variable nodes and factor nodes, with a bipartite graph between the two types.
• A factor graph concisely represents the probability P(v): each factor a has a potential ψa({v}a) over the subset of variables {v}a it is connected to, e.g. ψa(v1,v2) and ψb(v2,v3).
• P(v) = (1/Z) Πa ψa({v}a), where Z is known as the partition function.

Conversions
• MRF to Factor Graphs
• Bayesian Networks to Factor Graphs
• Factor Graphs to MRF

Exponential Family

Motivation
• Random variable V with label set L = {l1, l2, …, lh}, and samples V1, V2, …, Vm that are i.i.d.
• Functions ϕα: L → Reals, where α indexes a set of functions.
• Empirical expectations: μα = (Σi ϕα(Vi))/m.
• Expectation with respect to a distribution P: EP[ϕα(V)] = Σi ϕα(li)P(li).
• Problem: given the empirical expectations, find a compatible distribution. This is an underdetermined problem.

Maximum Entropy Principle
• Maximize the entropy of the distribution subject to compatibility:
  max -Σi P(li) log(P(li))
  s.t. Σi ϕα(li)P(li) = μα for all α
       Σi P(li) = 1
• The solution satisfies P(v) proportional to exp(-Σα θαϕα(v)).

Exponential Family
• Random variables V = {V1, V2, …, Vn} with label set L = {l1, l2, …, lh}; a labeling V = v has va ∈ L for all a ∈ {1, 2, …, n}.
• Functions Φα: Lⁿ → Reals.
• P(v) = exp{-Σα θαΦα(v) - A(θ)}, where the θα are the parameters, the Φα are the sufficient statistics, and A(θ) is the normalization constant, also called the log-partition function.

Minimal Representation
• There is no non-zero c such that Σα cαΦα(v) is constant over all labelings v.

Ising Model (minimal representation)
• Random variables V = {V1, V2, …, Vn} with label set L = {-1, +1}, and a neighborhood over variables specified by edges E.
• Sufficient statistics and parameters:
  – va with parameter θa, for all Va ∈ V
  – vavb with parameter θab, for all (Va,Vb) ∈ E
• P(v) = exp{-Σa θava - Σ(a,b) θabvavb - A(θ)}

Interactive Binary Segmentation
• Foreground histogram of RGB values: FG; background histogram of RGB values: BG. '+1' indicates foreground and '-1' indicates background.
• θa proportional to -log(FG(da)) + log(BG(da)), so a pixel that is more likely to be foreground than background is biased towards '+1', and vice versa.
• θab proportional to -exp(-(da-db)²), so similar neighboring pixels are more likely to take the same label, and dissimilar ones less likely.

Rest of lecture 1 ….

Overcomplete Representation
• P(v) = exp{-Σα θαΦα(v) - A(θ)}, with parameters, sufficient statistics and log-partition function as before.
• There exists a non-zero c such that Σα cαΦα(v) is constant over all labelings v.

Ising Model (overcomplete representation)
• Random variables V = {V1, V2, …, Vn} with label set L = {0, 1}, and a neighborhood over variables specified by edges E.
• Sufficient statistics and parameters:
  – Ia;i(va), the indicator for va = li, with parameter θa;i, for all Va ∈ V, li ∈ L
  – Iab;ik(va,vb), the indicator for va = li and vb = lk, with parameter θab;ik, for all (Va,Vb) ∈ E, li, lk ∈ L
• P(v) = exp{-Σa Σi θa;iIa;i(va) - Σ(a,b) Σi,k θab;ikIab;ik(va,vb) - A(θ)}

Interactive Binary Segmentation (overcomplete)
• '1' indicates foreground and '0' indicates background.
• θa;0 proportional to -log(BG(da)); θa;1 proportional to -log(FG(da)).
• θab;ik proportional to exp(-(da-db)²) if i ≠ k, and θab;ik = 0 if i = k, so label changes between similar neighboring pixels are penalized.
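A small numerical sketch may help here (not from the slides): it builds the minimal-representation Ising model with the segmentation-style parameters above, using made-up pixel data and histogram values, and evaluates P(v) by brute force.

```python
import itertools
import math

# Ising model with labels {-1, +1} on a 4-pixel chain, with parameters
#   theta_a  proportional to -log(FG(d_a)) + log(BG(d_a))
#   theta_ab proportional to -exp(-(d_a - d_b)^2)
# The pixel data d and the histogram lookups FG, BG are made up.
d = [0.1, 0.2, 0.8, 0.9]
FG = [0.2, 0.3, 0.7, 0.8]   # made-up foreground histogram values FG(d_a)
BG = [0.8, 0.7, 0.3, 0.2]   # made-up background histogram values BG(d_a)
theta_a = [-math.log(f) + math.log(b) for f, b in zip(FG, BG)]
edges = [(0, 1), (1, 2), (2, 3)]
theta_ab = {(a, b): -math.exp(-(d[a] - d[b]) ** 2) for a, b in edges}

def exponent(v):
    """The sum inside exp{-...}: sum_a theta_a*v_a + sum_ab theta_ab*v_a*v_b."""
    s = sum(theta_a[a] * v[a] for a in range(len(v)))
    s += sum(theta_ab[(a, b)] * v[a] * v[b] for a, b in edges)
    return s

# Log-partition function A(theta) by brute-force enumeration.
A = math.log(sum(math.exp(-exponent(v))
                 for v in itertools.product((-1, 1), repeat=4)))

# P(v) = exp{-sum_a theta_a v_a - sum_ab theta_ab v_a v_b - A(theta)}
print(math.exp(-exponent((-1, -1, 1, 1)) - A))
```

Note how the signs realize the behavior described above: a negative θa makes the label +1 (foreground) more probable for that pixel, and a negative θab rewards neighboring pixels that agree.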
Metric Labeling
• Random variables V = {V1, V2, …, Vn} with label set L = {l1, l2, …, lh}, indexed as {0, …, h-1}, and a neighborhood over variables specified by edges E.
• Sufficient statistics Ia;i(va) and Iab;ik(va,vb) with parameters θa;i and θab;ik, as in the overcomplete Ising model, where θab;ik is a metric distance function over the labels.
• P(v) = exp{-Σa Σi θa;iIa;i(va) - Σ(a,b) Σi,k θab;ikIab;ik(va,vb) - A(θ)}

Stereo Correspondence

[Figure: a stereo image pair and the resulting disparity map.]

• L = {disparities}: pixel (xa,ya) in the left image corresponds to pixel (xa+va,ya) in the right image.
• θa;i is proportional to the difference in RGB values between the putatively corresponding pixels.
• θab;ik = wab d(i,k), where wab is proportional to exp(-(da-db)²) and d(i,k) is a metric over the disparities.

Pairwise MRF
• Random variables V = {V1, V2, …, Vn} with label set L = {l1, l2, …, lh}, and a neighborhood over variables specified by edges E.
• Sufficient statistics Ia;i(va) and Iab;ik(va,vb) with parameters θa;i and θab;ik, for all Va ∈ V, (Va,Vb) ∈ E and li, lk ∈ L.
• P(v) = exp{-Σa Σi θa;iIa;i(va) - Σ(a,b) Σi,k θab;ikIab;ik(va,vb) - A(θ)}
• Relation to the earlier potential form P(v) = (1/Z) Πa ψa(va) Π(a,b) ψab(va,vb):
  A(θ) = log Z, ψa(li) = exp(-θa;i), ψab(li,lk) = exp(-θab;ik).
• The parameters θ are sometimes also referred to as potentials.
• Writing the labeling as a function f: {1, 2, …, n} → {1, 2, …, h}, where variable Va takes the label lf(a), gives
  P(f) = exp{-Σa θa;f(a) - Σ(a,b) θab;f(a)f(b) - A(θ)}
• Defining the energy Q(f) = Σa θa;f(a) + Σ(a,b) θab;f(a)f(b), this becomes P(f) = exp{-Q(f) - A(θ)}.

Inference
• Maximum a Posteriori (MAP) estimation:
  maxv P(v) = exp{-Σa Σi θa;iIa;i(va) - Σ(a,b) Σi,k θab;ikIab;ik(va,vb) - A(θ)}
• Equivalently, energy minimization:
  minf Q(f) = Σa θa;f(a) + Σ(a,b) θab;f(a)f(b)
• Computing marginals:
  P(va = li) = Σv P(v) δ(va = li)
  P(va = li, vb = lk) = Σv P(v) δ(va = li) δ(vb = lk)
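To close the lecture, here is a minimal brute-force sketch of both inference problems (not from the slides; the chain model and all θ values are made up). It shows that MAP estimation reduces to energy minimization, and computes a marginal by explicit summation:

```python
import itertools
import math

# Toy pairwise MRF: 3 variables, 2 labels, chain neighborhood.
n, h = 3, 2
theta_unary = [[0.0, 1.0],
               [0.5, 0.5],
               [1.0, 0.0]]                                 # theta_{a;i}
edges = [(0, 1), (1, 2)]
theta_pair = {e: [[0.0, 0.6], [0.6, 0.0]] for e in edges}  # theta_{ab;ik}

def energy(f):
    """Q(f) = sum_a theta_{a;f(a)} + sum_{(a,b)} theta_{ab;f(a)f(b)}."""
    q = sum(theta_unary[a][f[a]] for a in range(n))
    q += sum(theta_pair[(a, b)][f[a]][f[b]] for a, b in edges)
    return q

labelings = list(itertools.product(range(h), repeat=n))

# MAP estimation: maximizing P(f) = exp{-Q(f) - A(theta)} is exactly
# minimizing the energy Q(f).
f_map = min(labelings, key=energy)

# Marginals: P(v_a = l_i) sums P(f) over all labelings with f(a) = i.
Z = sum(math.exp(-energy(f)) for f in labelings)
marginals_v0 = [sum(math.exp(-energy(f)) for f in labelings if f[0] == i) / Z
                for i in range(h)]
print(f_map, marginals_v0)
```

Both loops enumerate all h^n labelings, which is feasible only for toy models; the tree-structured algorithm covered next avoids this exhaustive enumeration.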
Next Lecture …
Energy minimization for tree-structured pairwise MRFs