Algorithms for MAP estimation
in Markov Random Fields
Vladimir Kolmogorov
University College London
Tutorial at GDR (Optimisation Discrète, Graph Cuts et Analyse d'Images)
Paris, 29 November 2005
Energy function
E(x | θ) = θconst + Σp θp(xp) + Σ(p,q) θpq(xp, xq)

unary terms (data)          pairwise terms (coherence)

- xp are discrete variables (for example, xp ∈ {0,1})
- θp(•) are unary potentials
- θpq(•,•) are pairwise potentials
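For concreteness, the energy above can be evaluated directly. This is a minimal sketch with illustrative potentials (the node names and numbers are not from the slides):

```python
# Evaluate E(x | theta) = const + sum_p theta_p(x_p) + sum_{(p,q)} theta_pq(x_p, x_q)
# for a tiny binary MRF. All numbers here are illustrative.

def energy(x, const, unary, pairwise):
    """x: dict node -> label; unary: dict node -> [cost per label];
    pairwise: dict (p, q) -> 2x2 cost table."""
    e = const
    for p, costs in unary.items():
        e += costs[x[p]]
    for (p, q), table in pairwise.items():
        e += table[x[p]][x[q]]
    return e

unary = {'p': [3, 1], 'q': [5, 2]}          # theta_p(0)=3, theta_p(1)=1, ...
pairwise = {('p', 'q'): [[0, 4], [4, 0]]}   # Ising-style coherence term
E = energy({'p': 1, 'q': 1}, 0.0, unary, pairwise)   # 0 + 1 + 2 + 0 = 3
```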
Minimisation algorithms
• Min Cut / Max Flow [Ford&Fulkerson ‘56]
[Greig, Porteous, Seheult ‘89] : non-iterative (binary variables)
[Boykov, Veksler, Zabih ‘99] : iterative - alpha-expansion, alpha-beta swap, …
(multi-valued variables)
+ If applicable, gives very accurate results
– Can be applied to a restricted class of functions
• BP – Max-product Belief Propagation [Pearl ‘86]
+ Can be applied to any energy function
– In vision, results are usually worse than those of graph cuts
– Does not always converge
• TRW - Max-product Tree-reweighted Message Passing
[Wainwright, Jaakkola, Willsky ‘02] , [Kolmogorov ‘05]
+ Can be applied to any energy function
+ For stereo finds lower energy than graph cuts
+ Convergence guarantees for the algorithm in [Kolmogorov ’05]
Main idea: LP relaxation
• Goal: Minimize energy E(x) under constraints xp ∈ {0,1}
• In general, NP-hard problem!
• Relax discreteness constraints: allow xp ∈ [0,1]
• Results in linear program. Can be solved in polynomial time!
[Figure: the LP relaxation lower-bounds the minimum of the energy function with discrete variables; the relaxation may be tight or not tight.]
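The local-polytope LP relaxation can be written down explicitly for a tiny instance. A sketch, assuming SciPy is available; the two-node potentials are illustrative, and the pairwise term is submodular, so the relaxation here is tight (LP value equals the discrete minimum):

```python
# LP relaxation of a 2-node binary MRF (local polytope), solved with
# scipy.optimize.linprog, compared against the brute-force discrete minimum.
import itertools
from scipy.optimize import linprog

theta_p = [0.0, 2.0]            # unary for node p
theta_q = [2.0, 0.0]            # unary for node q
theta_pq = [[0.0, 1.0],         # pairwise (submodular: 0 + 0 <= 1 + 1)
            [1.0, 0.0]]

# Variables: mu_p(0), mu_p(1), mu_q(0), mu_q(1),
#            mu_pq(00), mu_pq(01), mu_pq(10), mu_pq(11)
c = theta_p + theta_q + [theta_pq[i][j] for i in range(2) for j in range(2)]
A_eq = [
    [1, 1, 0, 0, 0, 0, 0, 0],    # mu_p sums to 1
    [0, 0, 1, 1, 0, 0, 0, 0],    # mu_q sums to 1
    [-1, 0, 0, 0, 1, 1, 0, 0],   # marginalise over x_q, x_p = 0
    [0, -1, 0, 0, 0, 0, 1, 1],   # marginalise over x_q, x_p = 1
    [0, 0, -1, 0, 1, 0, 1, 0],   # marginalise over x_p, x_q = 0
    [0, 0, 0, -1, 0, 1, 0, 1],   # marginalise over x_p, x_q = 1
]
b_eq = [1, 1, 0, 0, 0, 0]
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
lp_value = res.fun

# Brute-force discrete minimum for comparison.
discrete_min = min(theta_p[i] + theta_q[j] + theta_pq[i][j]
                   for i, j in itertools.product((0, 1), repeat=2))
```

For non-submodular terms the same LP can return fractional (half-integral) marginals, which is exactly the "not tight" case in the figure.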
Solving LP relaxation
• Too large for general purpose LP solvers (e.g. interior point methods)
• Solve dual problem instead of primal:
– Formulate lower bound on the energy
– Maximize this bound
– When done, solves primal problem (LP relaxation)
• Two different ways to formulate lower bound
– Via posiforms: leads to maxflow algorithm
– Via convex combination of trees: leads to tree-reweighted message passing
[Figure: the lower bound on the energy function, the LP relaxation optimum, and the minimum of the energy function with discrete variables.]
Notation and Preliminaries
Energy function - visualisation

E(x | θ) = θconst + Σp θp(xp) + Σ(p,q) θpq(xp, xq)

[Figure: two nodes p and q joined by edge (p,q); each node carries a cost per label (e.g. θp(0) for label 0), each edge a cost per label pair (e.g. θpq(0,1)), plus the constant term θconst.]
Energy function - visualisation

E(x | θ) = θconst + Σp θp(xp) + Σ(p,q) θpq(xp, xq)

θ - vector of all parameters

[Figure: the same node, edge and constant costs viewed as entries of the parameter vector θ.]
Reparameterisation

[Figure: subtracting 1 from an edge's costs θpq(0,·) and adding 1 to the node cost θp(0) changes the individual terms but not the energy.]
• Definition. θ’ is a reparameterisation of θ
if they define the same energy:
E(x | θ’) = E(x | θ) for any x
• Maxflow, BP and TRW perform reparameterisations
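A reparameterisation of the message-passing kind can be checked by brute force. A minimal sketch with illustrative numbers: a quantity m(i) is moved from the pairwise term θpq(i, ·) into the unary term θp(i), and the energy of every labelling is unchanged:

```python
# Check that a message-style reparameterisation leaves the energy unchanged:
# theta'_p(i) = theta_p(i) + m(i), theta'_pq(i, j) = theta_pq(i, j) - m(i).
import itertools

theta_p = [3.0, 1.0]
theta_q = [5.0, 2.0]
theta_pq = [[0.0, 4.0], [4.0, 0.0]]

m = [2.0, -1.0]   # an arbitrary "message" into node p

theta_p2 = [theta_p[i] + m[i] for i in range(2)]
theta_pq2 = [[theta_pq[i][j] - m[i] for j in range(2)] for i in range(2)]

def E(tp, tq, tpq, x):
    return tp[x[0]] + tq[x[1]] + tpq[x[0]][x[1]]

# The two parameter vectors define the same energy for every labelling x.
same = all(abs(E(theta_p, theta_q, theta_pq, x)
               - E(theta_p2, theta_q, theta_pq2, x)) < 1e-12
           for x in itertools.product((0, 1), repeat=2))
```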
Part I: Lower bound via
posiforms
(→ maxflow algorithm)
Lower bound via posiforms
[Hammer, Hansen, Simeone’84]
E(x | θ) = θconst + Σp θp(xp) + Σ(p,q) θpq(xp, xq)

maximise θconst subject to all unary and pairwise terms being non-negative

θconst is then a lower bound on the energy:
E(x | θ) ≥ θconst for all x
Outline of part I
• Maximisation algorithm?
– Consider functions of binary variables only
• Maximising lower bound for submodular functions
– Definition of submodular functions
– Overview of min cut/max flow
– Reduction to max flow
– Global minimum of the energy
• Maximising lower bound for non-submodular functions
– Reduction to max flow
• More complicated graph
– Part of optimal solution
Submodular functions
of binary variables
• Definition: E is submodular if every pairwise term satisfies
θpq(0,0) + θpq(1,1) ≤ θpq(0,1) + θpq(1,0)
• Can be converted to “canonical form”:
[Figure: example unary and pairwise terms rewritten into an equivalent form in which all but a few entries have zero cost.]
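The submodularity condition above is a one-line check per edge. A sketch with illustrative terms:

```python
# Submodularity test for a binary pairwise term, per the definition above:
# theta_pq(0,0) + theta_pq(1,1) <= theta_pq(0,1) + theta_pq(1,0).

def is_submodular(theta_pq):
    return theta_pq[0][0] + theta_pq[1][1] <= theta_pq[0][1] + theta_pq[1][0]

ising = [[0, 4], [4, 0]]        # attractive (Potts/Ising) term: submodular
frustrated = [[4, 0], [0, 4]]   # repulsive term: not submodular
```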
Overview of min cut/max flow
Min Cut problem

[Figure: directed weighted graph with a source, a sink and three internal nodes; edge capacities between 1 and 5.]

Cut:
S = {source, node 1}
T = {sink, node 2, node 3}
Cost(S,T) = 1 + 1 = 2 (sum of capacities of edges going from S to T)

• Task: Compute cut with minimum cost
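The min-cut cost equals the max-flow value (max-flow/min-cut theorem), which the next slides exploit. A compact sketch of the Edmonds-Karp variant of Ford-Fulkerson (BFS augmenting paths); the graph below is illustrative, not the exact graph from the figure:

```python
# Min cut via max flow (Edmonds-Karp: shortest augmenting paths by BFS).
from collections import deque

def max_flow(cap, s, t):
    """cap: dict u -> dict v -> capacity. Returns the max-flow value,
    which equals the min-cut cost by the max-flow/min-cut theorem."""
    # Residual capacities, including reverse edges of capacity 0.
    res = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u, nbrs in cap.items():
        for v in nbrs:
            res.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Recover the path, find the bottleneck, push flow along it.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        flow += push

graph = {'s': {'a': 2, 'b': 4}, 'a': {'t': 1, 'b': 1}, 'b': {'t': 3}, 't': {}}
mincut = max_flow(graph, 's', 't')   # cut {a->t, b->t} has cost 1 + 3 = 4
```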
Maxflow algorithm

[Figure: animation of augmenting paths on the example graph; residual capacities shrink as flow is pushed and value(flow) grows 0 → 1 → 2, at which point no augmenting path remains and the flow is maximal.]
Maximising lower bound for
submodular functions:
Reduction to maxflow
Maxflow algorithm and reparameterisation

[Figure: animation running maxflow on the graph built from the energy; each augmentation (value(flow) = 0 → 1 → 2) is simultaneously a reparameterisation that increases θconst while keeping all other terms non-negative. When no augmenting path remains, the zero-cost labels give the minimum of the energy: x = (0, 1, 1).]
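The reduction behind this can be sketched end to end: build a source/sink graph from a submodular binary energy, solve max flow, and recover the minimum energy as θconst plus the min-cut cost. This is a standard construction (in the spirit of Greig et al.), not a verbatim transcription of the slides; the three-node example is illustrative, chosen so that its minimiser happens to be x = (0, 1, 1) as in the figure:

```python
# Reduce a submodular binary energy to min cut and verify against brute force.
from collections import deque
import itertools

def max_flow(cap, s, t):
    """Edmonds-Karp max flow; cap: dict u -> dict v -> capacity."""
    res = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u, nbrs in cap.items():
        for v in nbrs:
            res.setdefault(v, {}).setdefault(u, 0)
    flow = 0.0
    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        flow += push

def reduce_and_solve(unary, pairwise):
    """Return min_x E(x) for a submodular binary energy via min cut."""
    const = 0.0
    acc = {p: list(c) for p, c in unary.items()}   # accumulated unary terms
    cut_edges = {}
    for (p, q), t in pairwise.items():
        A, B, C, D = t[0][0], t[0][1], t[1][0], t[1][1]
        assert A + D <= B + C, "pairwise term must be submodular"
        # E_pq = A + (C-A)*x_p + (D-C)*x_q + (B+C-A-D)*[x_p=0, x_q=1]
        const += A
        acc[p][1] += C - A
        acc[q][1] += D - C
        cut_edges[(p, q)] = B + C - A - D
    cap = {'s': {}, 't': {}}
    for p, (u0, u1) in acc.items():
        shift = min(u0, u1)                 # keep capacities non-negative
        const += shift
        cap.setdefault(p, {})['t'] = u0 - shift   # edge cut iff x_p = 0
        cap['s'][p] = u1 - shift                  # edge cut iff x_p = 1
    for (p, q), w in cut_edges.items():
        cap[p][q] = cap[p].get(q, 0) + w          # cut iff x_p = 0, x_q = 1
    return const + max_flow(cap, 's', 't')

nodes = ['a', 'b', 'c']
unary = {'a': [0, 3], 'b': [2, 1], 'c': [3, 0]}
pairwise = {('a', 'b'): [[0, 2], [2, 0]], ('b', 'c'): [[0, 2], [2, 0]]}

def full_energy(x):
    idx = {p: i for i, p in enumerate(nodes)}
    return (sum(unary[p][x[idx[p]]] for p in nodes)
            + sum(t[x[idx[p]]][x[idx[q]]] for (p, q), t in pairwise.items()))

brute = min(full_energy(x) for x in itertools.product((0, 1), repeat=3))
solved = reduce_and_solve(unary, pairwise)   # both equal 3, at x = (0, 1, 1)
```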
Maximising lower bound for
non-submodular functions
Arbitrary functions of binary variables
E(x | θ) = θconst + Σp θp(xp) + Σ(p,q) θpq(xp, xq)

maximise θconst subject to all unary and pairwise terms being non-negative

• Can be solved via maxflow [Boros, Hammer, Sun ’91]
– Specially constructed graph
• Gives solution to LP relaxation: for each node p,
xp ∈ {0, 1/2, 1}
Arbitrary functions of binary variables

[Figure: LP relaxation solution on an example graph; each node is labelled 0, 1 or 1/2.]

• Part of optimal solution [Hammer, Hansen, Simeone ’84]: nodes that receive an integral label (0 or 1) keep that label in some global minimum; only the 1/2-labelled nodes remain undetermined.
Part II: Lower bound via
convex combination of trees
(→ tree-reweighted message passing)
Convex combination of trees
[Wainwright, Jaakkola, Willsky ’02]
• Goal: compute minimum of the energy for θ:
F(θ) = min_x E(x | θ)
• In general, intractable!
• Obtaining lower bound:
– Split θ into several components: θ = θ¹ + θ² + ...
– Compute minimum for each component:
F_i(θ^i) = min_x E(x | θ^i)
– Combine F1, F2, ... to get a bound on F
• Use trees!
Convex combination of trees
(cont’d)

[Figure: graph decomposed into two spanning trees T and T’.]

θ = (1/2) θ^T + (1/2) θ^T’

F(θ) ≥ (1/2) F(θ^T) + (1/2) F(θ^T’)   (lower bound on the energy: maximize it)
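The bound can be verified numerically on a small example. A sketch with illustrative potentials: a frustrated triangle is split into two parts T and T’ (forests here, for simplicity) whose convex combination reproduces θ; the combined minima give a valid, and in this case loose, lower bound:

```python
# Lower bound from a convex combination of trees:
# theta = 0.5*theta_T + 0.5*theta_T', so
# min_x E(x|theta) >= 0.5*min_x E(x|theta_T) + 0.5*min_x E(x|theta_T').
import itertools

nodes = ['a', 'b', 'c']
unary = {'a': [0.0, 0.0], 'b': [0.0, 0.0], 'c': [0.0, 0.0]}
same_label_cost = [[1.0, 0.0], [0.0, 1.0]]   # pay 1 if the two labels agree
edges = {('a', 'b'): same_label_cost, ('b', 'c'): same_label_cost,
         ('a', 'c'): same_label_cost}        # frustrated triangle

def minimum(unary, pairwise):
    idx = {p: i for i, p in enumerate(nodes)}
    return min(sum(unary[p][x[idx[p]]] for p in nodes)
               + sum(t[x[idx[p]]][x[idx[q]]] for (p, q), t in pairwise.items())
               for x in itertools.product((0, 1), repeat=len(nodes)))

doubled = [[2.0, 0.0], [0.0, 2.0]]
# T carries edges (a,b), (b,c); T' carries (a,c); each doubled so that
# 0.5*theta_T + 0.5*theta_T' reproduces the original potentials.
theta_T  = {('a', 'b'): doubled, ('b', 'c'): doubled}
theta_T2 = {('a', 'c'): doubled}

F     = minimum(unary, edges)                                   # true minimum: 1
bound = 0.5 * minimum(unary, theta_T) + 0.5 * minimum(unary, theta_T2)  # bound: 0
```

Each part can make all of its edges disagree, but the full triangle cannot, so the bound (0) is strictly below the minimum (1): the frustrated cycle is exactly where tree bounds become loose.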
TRW algorithms
• Goal: find reparameterisation maximizing lower bound
• Apply sequence of different reparameterisation
operations:
– Node averaging
– Ordinary BP on trees
• Order of operations?
– Affects performance dramatically
• Algorithms:
– [Wainwright et al. ’02]: parallel schedule
• May not converge
– [Kolmogorov’05]: specific sequential schedule
• Lower bound does not decrease, convergence guarantees
Node averaging

[Figure: node p has unary vector (0, 4) in tree T and (1, 0) in tree T’; averaging replaces both with (0.5, 2).]
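The averaging step in the figure is a two-line computation. It preserves the sum θ^T + θ^T’ (so the combined energy is unchanged), while the sum of per-tree minima, i.e. the lower bound, can only grow:

```python
# Node averaging from the figure: vectors (0, 4) in tree T and (1, 0) in
# tree T' are replaced by their average (0.5, 2).
t  = [0.0, 4.0]   # theta_p in tree T
t2 = [1.0, 0.0]   # theta_p in tree T'
avg = [(a + b) / 2 for a, b in zip(t, t2)]

before = min(t) + min(t2)        # contribution to the bound before: 0
after  = min(avg) + min(avg)     # contribution after averaging: 1
total_preserved = all(a + b == 2 * c for a, b, c in zip(t, t2, avg))
```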
Belief propagation (BP) on trees
• Send messages
– Equivalent to reparameterising node and edge parameters
• Two passes (forward and backward)
Belief propagation (BP) on trees
• Key property (Wainwright et al.):
Upon termination θp gives min-marginals for node p:
θp(0) = min over x with xp = 0 of E(x | θ) + const
θp(1) = min over x with xp = 1 of E(x | θ) + const
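Min-sum BP on a chain makes this property easy to check: after one forward and one backward pass, the unary term plus the two incoming messages equals the min-marginal (equivalently, the reparameterised θp up to a constant). A sketch with illustrative potentials, verified against brute force:

```python
# Two-pass min-sum BP on a chain of 3 binary nodes; check that
# unary + forward message + backward message gives the min-marginals.
import itertools

unary = [[0.0, 2.0], [1.0, 0.0], [3.0, 0.0]]
pair  = [[[0.0, 1.0], [1.0, 0.0]]] * 2     # "pay 1 if labels differ" on both edges

n = len(unary)
fwd = [[0.0, 0.0] for _ in range(n)]   # fwd[i]: message from node i-1 into i
bwd = [[0.0, 0.0] for _ in range(n)]   # bwd[i]: message from node i+1 into i
for i in range(1, n):                  # forward pass
    for j in (0, 1):
        fwd[i][j] = min(unary[i-1][k] + fwd[i-1][k] + pair[i-1][k][j]
                        for k in (0, 1))
for i in range(n - 2, -1, -1):         # backward pass
    for j in (0, 1):
        bwd[i][j] = min(unary[i+1][k] + bwd[i+1][k] + pair[i][j][k]
                        for k in (0, 1))

min_marginals = [[unary[i][j] + fwd[i][j] + bwd[i][j] for j in (0, 1)]
                 for i in range(n)]

def brute(i, j):
    """min of E(x) over all labellings with x_i = j."""
    return min(sum(unary[k][x[k]] for k in range(n))
               + sum(pair[k][x[k]][x[k+1]] for k in range(n - 1))
               for x in itertools.product((0, 1), repeat=n) if x[i] == j)

ok = all(abs(min_marginals[i][j] - brute(i, j)) < 1e-12
         for i in range(n) for j in (0, 1))
```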
TRW algorithm of Wainwright et al.
with tree-based updates (TRW-T)
Run BP on all trees
“Average” all nodes
• If converges, gives (local) maximum of lower bound
• Not guaranteed to converge.
• Lower bound may go down.
Sequential TRW algorithm (TRW-S)
[Kolmogorov’05]
Pick node p
Run BP on all trees
containing p
“Average” node p
Main property of TRW-S
• Theorem: lower bound never decreases.
• Proof sketch (node p has vectors (0, 4) in tree T and (1, 0) in tree T’):
– Before averaging: F(θ^T) ≥ 0 + const, F(θ^T’) ≥ 0 + const’
– After averaging (both vectors become (0.5, 2)): F(θ^T) ≥ 0.5 + const, F(θ^T’) ≥ 0.5 + const’
TRW-S algorithm
• Particular order of averaging and BP operations
• Lower bound guaranteed not to decrease
• There exists limit point that satisfies
weak tree agreement condition
• Efficiency?
Efficient implementation
Pick node p
Run BP on all trees
containing p
“Average” node p
inefficient?
Efficient implementation
• Key observation:
Node averaging operation
preserves messages oriented
towards this node
• Reuse previously passed messages!
[Figure: 3×3 grid with nodes numbered 1–9.]
• Need a special choice of trees:
– Pick an ordering of nodes
– Trees: monotonic chains
Efficient implementation
• Algorithm:
– Forward pass:
• process nodes in the increasing order
• pass messages from lower neighbours
– Backward pass:
• do the same in reverse order
• Linear running time of one iteration

[Figure: 3×3 grid with nodes numbered 1–9.]
Memory requirements
• Additional advantage of TRW-S:
– Needs only half as much memory as standard message
passing!
– Similar observation for bipartite graphs and parallel
schedule was made in [Felzenszwalb&Huttenlocher’04]
[Figure: messages stored by standard message passing vs. TRW-S.]
Experimental results:
binary segmentation (“GrabCut”)
[Plot: energy (×10^5, axis range 3–6) against time (0–400), averaged over 50 instances.]
Experimental results: stereo
[Figure: left image and ground truth; plot of energy (×10^5, ticks 3.6–4.0) against iteration (20–100) for BP and TRW-S.]
Experimental results: stereo
[Plots: energy against iteration (20–140) for two further stereo instances, on ×10^6 (ticks 1.93–1.94) and ×10^7 (ticks 1.36–1.44) scales.]
Summary
• MAP estimation algorithms are based on LP relaxation
– Maximize lower bound
• Two ways to formulate lower bound
• Via posiforms: leads to maxflow algorithm
– Polynomial time solution
– But: applicable only to restricted classes of energies (e.g. binary variables)
• Submodular functions: global minimum
• Non-submodular functions: part of optimal solution
• Via convex combination of trees: leads to TRW algorithm
– Convergence in the limit (for TRW-S)
– Applicable to arbitrary energy function
• Graph cuts vs. TRW:
– Accuracy: similar
– Generality: TRW is more general
– Speed: for stereo TRW is currently 2-5 times slower. But:
• 3 vs. 50 years of research!
• More suitable for parallel implementation (GPU? Hardware?)
Discrete vs. continuous functionals
Discrete formulation
(Graph cuts)
E(x) = Σp Ep(xp) + Σ(p,q) Epq(xp, xq)
• Maxflow algorithm
– Global minimum, polynomial-time
• Metrication artefacts?
Continuous formulation
(Geodesic active contours)
E(C) = ∫₀^|C| g(C(s)) ds
• Level sets
– Numerical stability?
• Geometrically motivated
– Invariant under rotation
Geo-cuts
• Continuous functional:
E(C) = ∫₀^|C| g(N) ds + ∫_interior(C) f dV
• Construct graph such that for smooth contours C
E(C) ≈ cost of the corresponding cut
• Class of continuous functionals?
[Boykov&Kolmogorov’03], [Kolmogorov&Boykov’05]:
– Geometric length/area (e.g. Riemannian)
– Flux of a given vector field
– Regional term