Advances in discrete energy minimisation for computer vision

Algorithms for MAP estimation
in Markov Random Fields
Vladimir Kolmogorov
University College London
Tutorial at GDR (Optimisation Discrète, Graph Cuts et Analyse d'Images)
Paris, 29 November 2005
Note: these slides contain animation
Energy function
E(x | θ) = θ_const + Σ_p θ_p(x_p) + Σ_{(p,q)} θ_pq(x_p, x_q)

- θ_p: unary terms (data); θ_pq: pairwise terms (coherence)
- x_p are discrete variables (for example, x_p ∈ {0,1})
- θ_p(·) are unary potentials
- θ_pq(·,·) are pairwise potentials
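The energy above can be evaluated directly. A minimal sketch on a toy two-variable model (the node names and cost values are illustrative, not from the slides):

```python
def energy(x, theta_const, unary, pairwise):
    """x: dict node -> label; unary: dict node -> {label: cost};
    pairwise: dict (p, q) -> {(label_p, label_q): cost}."""
    E = theta_const
    for p, costs in unary.items():
        E += costs[x[p]]                 # unary (data) terms
    for (p, q), costs in pairwise.items():
        E += costs[(x[p], x[q])]         # pairwise (coherence) terms
    return E

# Two binary variables joined by one edge with a Potts-like coherence term.
unary = {"p": {0: 3, 1: 1}, "q": {0: 0, 1: 2}}
pairwise = {("p", "q"): {(0, 0): 0, (0, 1): 2, (1, 0): 2, (1, 1): 0}}
print(energy({"p": 1, "q": 0}, 0.0, unary, pairwise))  # 1 + 0 + 2 -> 3.0
```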
Minimisation algorithms
• Min Cut / Max Flow [Ford&Fulkerson ‘56]
[Greig, Porteous, Seheult '89] : non-iterative (binary variables)
[Boykov, Veksler, Zabih ‘99] : iterative - alpha-expansion, alpha-beta swap, …
(multi-valued variables)
+ If applicable, gives very accurate results
– Can be applied to a restricted class of functions
• BP – Max-product Belief Propagation [Pearl ‘86]
+ Can be applied to any energy function
– In vision, results are usually worse than those of graph cuts
– Does not always converge
• TRW - Max-product Tree-reweighted Message Passing
[Wainwright, Jaakkola, Willsky ‘02] , [Kolmogorov ‘05]
+ Can be applied to any energy function
+ For stereo finds lower energy than graph cuts
+ Convergence guarantees for the algorithm in [Kolmogorov ’05]
Main idea: LP relaxation
• Goal: minimize energy E(x) under constraints x_p ∈ {0,1}
• In general, an NP-hard problem!
• Relax discreteness constraints: allow x_p ∈ [0,1]
• Results in a linear program. Can be solved in polynomial time!
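For concreteness, the resulting linear program can be written in the standard "local polytope" form, with one distribution μ_p per node and one joint distribution μ_pq per edge. This form is standard in the literature (e.g. Wainwright et al.) but is not spelled out on the slide:

```latex
\min_{\mu}\; \theta_{\mathrm{const}}
  + \sum_{p}\sum_{i} \theta_{p}(i)\,\mu_{p}(i)
  + \sum_{(p,q)}\sum_{i,j} \theta_{pq}(i,j)\,\mu_{pq}(i,j)
\quad \text{s.t.} \quad
\sum_{i} \mu_{p}(i) = 1, \qquad
\sum_{j} \mu_{pq}(i,j) = \mu_{p}(i), \qquad
\mu \ge 0 .
```

When the μ's are forced to be 0/1 this is exactly the discrete problem; dropping integrality gives the polynomial-time relaxation.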
[Figure: the minimum of the LP relaxation lies below the minimum of the energy function with discrete variables; the relaxation may be tight (equal minima) or not tight.]
Solving LP relaxation
• Too large for general purpose LP solvers (e.g. interior point methods)
• Solve dual problem instead of primal:
– Formulate lower bound on the energy
– Maximize this bound
– When done, solves primal problem (LP relaxation)
• Two different ways to formulate lower bound
– Via posiforms: leads to maxflow algorithm
– Via convex combination of trees: leads to tree-reweighted message passing
[Figure: the lower bound given by the LP relaxation lies below the energy function with discrete variables.]
Notation and Preliminaries
Energy function - visualisation
E(x | θ) = θ_const + Σ_p θ_p(x_p) + Σ_{(p,q)} θ_pq(x_p, x_q)

[Figure: each node p carries unary costs θ_p(0), θ_p(1) for labels 0 and 1, each edge (p,q) carries pairwise costs θ_pq(x_p, x_q), and there is a constant term θ_const.]
Energy function - visualisation
E(x | θ) = θ_const + Σ_p θ_p(x_p) + Σ_{(p,q)} θ_pq(x_p, x_q)

θ = vector of all parameters

[Figure: the same node/edge diagram, with all unary, pairwise and constant costs collected into the parameter vector θ.]
Reparameterisation

[Figure: subtracting a constant from one row of an edge's pairwise costs and adding it to the corresponding unary cost (e.g. θ_pq(0,·) − 1 and θ_p(0) + 1) leaves the energy of every labelling unchanged.]

• Definition. θ′ is a reparameterisation of θ if they define the same energy:
  E(x | θ′) = E(x | θ) for any x
• Maxflow, BP and TRW perform reparameterisations
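The definition can be checked by enumeration. A sketch with toy values (not the slides' exact numbers): move a constant δ from the pairwise row θ_pq(0,·) into the unary term θ_p(0) and verify that every labelling keeps its energy.

```python
import itertools

def energy(x, const, unary, pairwise):
    return (const
            + sum(unary[p][x[p]] for p in unary)
            + sum(costs[(x[p], x[q])] for (p, q), costs in pairwise.items()))

unary = {"p": [0.0, 2.0], "q": [5.0, 4.0]}
pairwise = {("p", "q"): {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 0.0}}

# Reparameterise: subtract delta from theta_pq(0, .) and add it to theta_p(0).
delta = 1.0
unary2 = {"p": [unary["p"][0] + delta, unary["p"][1]], "q": list(unary["q"])}
pairwise2 = {("p", "q"): {(a, b): c - (delta if a == 0 else 0.0)
                          for (a, b), c in pairwise[("p", "q")].items()}}

for bits in itertools.product([0, 1], repeat=2):
    x = {"p": bits[0], "q": bits[1]}
    assert energy(x, 0.0, unary, pairwise) == energy(x, 0.0, unary2, pairwise2)
print("same energy for every labelling")
```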
Part I: Lower bound via posiforms
(→ maxflow algorithm)
Lower bound via posiforms
[Hammer, Hansen, Simeone '84]

E(x | θ) = θ_const + Σ_p θ_p(x_p) + Σ_{(p,q)} θ_pq(x_p, x_q)

Maximize θ_const subject to all unary and pairwise terms being non-negative;
θ_const is then a lower bound on the energy:

min_x E(x | θ) ≥ θ_const
Outline of part I
• Maximisation algorithm?
  – Consider functions of binary variables only
• Maximising lower bound for submodular functions
  – Definition of submodular functions
  – Overview of min cut/max flow
  – Reduction to max flow
  – Global minimum of the energy
• Maximising lower bound for non-submodular functions
  – Reduction to max flow
    • More complicated graph
  – Part of optimal solution
Submodular functions of binary variables
• Definition: E is submodular if every pairwise term satisfies
  θ_pq(0,0) + θ_pq(1,1) ≤ θ_pq(0,1) + θ_pq(1,0)
• Can be converted to "canonical form":

[Figure: a pairwise term rewritten so that one configuration has zero cost and all remaining unary and pairwise costs are non-negative.]
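The submodularity condition is a one-line check per edge. A sketch with illustrative tables (a Potts-style term is submodular; its "anti-Potts" reversal is not):

```python
def is_submodular(pairwise):
    """pairwise: dict edge -> {(i, j): cost} for binary labels i, j.
    Checks theta_pq(0,0) + theta_pq(1,1) <= theta_pq(0,1) + theta_pq(1,0)."""
    return all(t[(0, 0)] + t[(1, 1)] <= t[(0, 1)] + t[(1, 0)]
               for t in pairwise.values())

potts = {("p", "q"): {(0, 0): 0, (0, 1): 2, (1, 0): 2, (1, 1): 0}}
antipotts = {("p", "q"): {(0, 0): 2, (0, 1): 0, (1, 0): 0, (1, 1): 2}}
print(is_submodular(potts))      # True:  0 + 0 <= 2 + 2
print(is_submodular(antipotts))  # False: 2 + 2 >  0 + 0
```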
Overview of min cut/max flow

Min Cut problem

[Figure: a directed weighted graph with a source, a sink and three inner nodes; edge weights between 1 and 5.]

• A cut partitions the nodes into S (containing the source) and T (containing the sink),
  e.g. S = {source, node 1}, T = {sink, node 2, node 3}
• Cost(S,T) = sum of weights of edges going from S to T; in the example, Cost(S,T) = 1 + 1 = 2
• Task: compute the cut with minimum cost
Maxflow algorithm

[Animation: augmenting paths are pushed from source to sink; residual capacities decrease along each path while value(flow) grows from 0 to 1 to 2, until no augmenting path remains.]
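The augmenting-path scheme animated above can be sketched as Edmonds-Karp (BFS-based Ford-Fulkerson). This is a minimal stdlib-only sketch; the graph and capacities below are illustrative, not the slides' exact example:

```python
from collections import deque

def max_flow(cap, s, t):
    """cap: dict (u, v) -> capacity. Returns the maximum flow value."""
    residual = dict(cap)
    for (u, v) in cap:
        residual.setdefault((v, u), 0)   # reverse edges start at capacity 0
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        # (A real implementation would use adjacency lists, not a full scan.)
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for (a, b), c in residual.items():
                if a == u and c > 0 and b not in parent:
                    parent[b] = u
                    queue.append(b)
        if t not in parent:
            return flow                  # no augmenting path left: flow is maximum
        # Recover the path, find its bottleneck capacity, then augment.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[e] for e in path)
        for (u, v) in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
        flow += bottleneck

cap = {("s", "p"): 2, ("s", "q"): 5, ("p", "q"): 1, ("p", "t"): 1, ("q", "t"): 2}
print(max_flow(cap, "s", "t"))  # -> 3
```

By max-flow/min-cut duality, the returned value also equals the cost of the minimum cut.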
Maximising lower bound for
submodular functions:
Reduction to maxflow
Maxflow algorithm and reparameterisation

[Animation: each augmenting step of maxflow is shown side by side with the corresponding reparameterisation of the energy; value(flow) grows from 0 to 2, and θ_const grows with the flow, so the lower bound increases.]

When the flow is maximum, the bound is tight and a labelling achieving the global
minimum of the energy can be read off from the final residual graph: x = (0,1,1)
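The correspondence driving the reduction is that cuts of the constructed s-t graph cost exactly the energy of the matching labelling. A sketch with toy names and values (the standard construction, but not the slides' exact graph), verified by enumeration:

```python
import itertools

unary = {"p": (1.0, 3.0), "q": (4.0, 0.0)}   # (theta_p(0), theta_p(1))
potts = {("p", "q"): 2.0}                    # submodular: cost c if labels differ

def energy(x):
    return (sum(unary[p][x[p]] for p in unary)
            + sum(c for (p, q), c in potts.items() if x[p] != x[q]))

def build_graph(unary, potts):
    cap = {}
    for p, (t0, t1) in unary.items():
        cap[("s", p)] = t1   # cut iff x_p = 1 (p ends up on the sink side)
        cap[(p, "t")] = t0   # cut iff x_p = 0 (p ends up on the source side)
    for (p, q), c in potts.items():
        cap[(p, q)] = c      # cut iff x_p = 0, x_q = 1
        cap[(q, p)] = c      # cut iff x_p = 1, x_q = 0
    return cap

def cut_cost(cap, x):
    side = {"s": 0, "t": 1, **x}   # 0 = source side, 1 = sink side
    return sum(c for (u, v), c in cap.items() if side[u] == 0 and side[v] == 1)

cap = build_graph(unary, potts)
for bits in itertools.product([0, 1], repeat=2):
    x = {"p": bits[0], "q": bits[1]}
    assert cut_cost(cap, x) == energy(x)
print("every cut costs exactly the energy of its labelling")
```

Hence the minimum cut (computed by maxflow) is the global minimum of the energy.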
Maximising lower bound for
non-submodular functions
Arbitrary functions of binary variables

E(x | θ) = θ_const + Σ_p θ_p(x_p) + Σ_{(p,q)} θ_pq(x_p, x_q)

Maximize θ_const subject to all unary and pairwise terms being non-negative.

• Can be solved via maxflow [Boros, Hammer, Sun '91]
  – Specially constructed graph
• Gives solution to LP relaxation: for each node
  x_p ∈ {0, 1/2, 1}

[Figure: a labelling in which some nodes receive 0 or 1 and the remaining nodes receive 1/2.]
Part of optimal solution
[Hammer, Hansen, Simeone '84]: nodes that receive an integral label (0 or 1) in the LP relaxation retain that label in some global minimum of the energy.
Part II: Lower bound via convex combination of trees
(→ tree-reweighted message passing)
Convex combination of trees
[Wainwright, Jaakkola, Willsky '02]

• Goal: compute the minimum of the energy for θ:
  F(θ) = min_x E(x | θ)
• In general, intractable!
• Obtaining a lower bound:
  – Split θ into several components: θ = θ¹ + θ² + ...
  – Compute the minimum for each component:
    F_i(θ^i) = min_x E(x | θ^i)
  – Combine F_1, F_2, ... to get a bound on F
• Use trees!
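The bound rests on min_x Σ_i E(x | θ^i) ≥ Σ_i min_x E(x | θ^i). A brute-force sketch on a toy 3-cycle (illustrative values): split θ into two chain components and compare the sum of their minima against the true minimum.

```python
import itertools

nodes = ["a", "b", "c"]
unary = {"a": [0, 1], "b": [2, 0], "c": [1, 1]}
edges = {("a", "b"): [[0, 3], [3, 0]],   # theta_ab(i, j)
         ("b", "c"): [[0, 2], [2, 0]],
         ("a", "c"): [[1, 0], [0, 1]]}

def emin(unary, edges):
    """Minimum energy by enumeration over all binary labellings."""
    best = float("inf")
    for bits in itertools.product([0, 1], repeat=len(nodes)):
        x = dict(zip(nodes, bits))
        e = (sum(u[x[p]] for p, u in unary.items())
             + sum(t[x[p]][x[q]] for (p, q), t in edges.items()))
        best = min(best, e)
    return best

# Split theta = theta1 + theta2: halve the unaries, partition the edges
# into two trees (chains) so the cycle is broken.
half = {p: [v / 2 for v in u] for p, u in unary.items()}
tree1 = (half, {e: edges[e] for e in [("a", "b"), ("b", "c")]})   # chain a-b-c
tree2 = (half, {e: edges[e] for e in [("a", "c")]})               # chain a-c
bound = emin(*tree1) + emin(*tree2)
print(bound, "<=", emin(unary, edges))  # -> 1.5 <= 3
```

On trees each component minimum is cheap to compute exactly (by dynamic programming rather than enumeration), which is why trees are used.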
Convex combination of trees (cont'd)

[Figure: the graph is written as a convex combination of two spanning trees T and T':
θ = ½ θ^T + ½ θ^T'.]

½ F(θ^T) + ½ F(θ^T') ≤ F(θ)

Maximize the left-hand side: a lower bound on the energy.
TRW algorithms
• Goal: find reparameterisation maximizing lower bound
• Apply a sequence of different reparameterisation operations:
– Node averaging
– Ordinary BP on trees
• Order of operations?
– Affects performance dramatically
• Algorithms:
– [Wainwright et al. ’02]: parallel schedule
• May not converge
– [Kolmogorov’05]: specific sequential schedule
• Lower bound does not decrease, convergence guarantees
Node averaging

[Figure: node p has unary vector (0, 4) in tree T and (1, 0) in tree T'; averaging replaces both with their mean (0.5, 2).]
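Using the slide's numbers, node averaging can be sketched in a few lines: it is a reparameterisation (the total parameters are unchanged) and the sum of per-tree minima, i.e. the lower bound, does not decrease.

```python
theta_T  = [0.0, 4.0]   # unary vector of node p in tree T
theta_Tp = [1.0, 0.0]   # unary vector of node p in tree T'

avg = [(a + b) / 2 for a, b in zip(theta_T, theta_Tp)]
print(avg)  # -> [0.5, 2.0]

# Total parameters are unchanged, so this is a reparameterisation ...
assert [2 * v for v in avg] == [a + b for a, b in zip(theta_T, theta_Tp)]
# ... while the sum of per-tree minima (the lower bound) does not decrease:
before = min(theta_T) + min(theta_Tp)   # 0 + 0 = 0
after = min(avg) + min(avg)             # 0.5 + 0.5 = 1
assert after >= before
print(before, "->", after)  # -> 0.0 -> 1.0
```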
Belief propagation (BP) on trees
• Send messages
  – Equivalent to reparameterising node and edge parameters
• Two passes (forward and backward)

Belief propagation (BP) on trees
• Key property (Wainwright et al.): upon termination the reparameterised θ_p gives min-marginals for node p:
  θ_p(0) = min_{x: x_p=0} E(x | θ) + const
  θ_p(1) = min_{x: x_p=1} E(x | θ) + const
TRW algorithm of Wainwright et al. with tree-based updates (TRW-T)

Repeat: run BP on all trees, then "average" all nodes.

• If it converges, gives a (local) maximum of the lower bound
• Not guaranteed to converge
• Lower bound may go down
Sequential TRW algorithm (TRW-S)
[Kolmogorov '05]

Repeat: pick node p, run BP on all trees containing p, then "average" node p.
Main property of TRW-S
• Theorem: the lower bound never decreases.
• Proof sketch (using the node-averaging example): before averaging,
  F(θ^T) = 0 + const and F(θ^T') = 0 + const';
  after averaging, F(θ^T) = 0.5 + const and F(θ^T') = 0.5 + const',
  so the combined bound cannot go down.
TRW-S algorithm
• Particular order of averaging and BP operations
• Lower bound guaranteed not to decrease
• There exists a limit point that satisfies the weak tree agreement condition
• Efficiency?
Efficient implementation

Pick node p, run BP on all trees containing p, "average" node p - inefficient?

• Key observation: the node-averaging operation preserves messages oriented towards that node
• Reuse previously passed messages!
• Needs a special choice of trees:
  – Pick an ordering of nodes
  – Trees: monotonic chains

[Figure: a 3×3 grid with nodes numbered 1-9; the trees are chains that are monotonic with respect to this ordering.]
Efficient implementation
• Algorithm:
  – Forward pass:
    • process nodes in increasing order
    • pass messages from lower-numbered neighbours
  – Backward pass:
    • do the same in reverse order
• Linear running time per iteration

[Figure: the 3×3 grid again, with messages passed along the node ordering 1-9.]
Memory requirements
• Additional advantage of TRW-S:
  – Needs only half as much memory as standard message passing!
  – A similar observation for bipartite graphs and a parallel schedule was made in [Felzenszwalb & Huttenlocher '04]

[Figure: message storage for standard message passing vs. TRW-S.]
Experimental results: binary segmentation ("GrabCut")

[Plot: energy vs. running time, averaged over 50 instances.]
Experimental results: stereo

[Plot: energy (×10^5) vs. iterations for BP and TRW-S, shown with the left image and ground-truth disparity.]
Experimental results: stereo

[Plots: energy vs. iterations on two further stereo instances (scales ×10^7 and ×10^6).]
Summary
• MAP estimation algorithms are based on LP relaxation
  – Maximize a lower bound
• Two ways to formulate the lower bound:
  – Via posiforms: leads to the maxflow algorithm
    • Polynomial-time solution
    • But: applicable only to restricted energies (e.g. binary variables)
      – Submodular functions: global minimum
      – Non-submodular functions: part of optimal solution
  – Via convex combination of trees: leads to the TRW algorithm
    • Convergence in the limit (for TRW-S)
    • Applicable to arbitrary energy functions
• Graph cuts vs. TRW:
  – Accuracy: similar
  – Generality: TRW is more general
  – Speed: for stereo TRW is currently 2-5 times slower. But:
    • 3 vs. 50 years of research!
    • More suitable for parallel implementation (GPU? Hardware?)
Discrete vs. continuous functionals

Discrete formulation (graph cuts):
E(x) = Σ_p E_p(x_p) + Σ_{(p,q)} E_pq(x_p, x_q)
• Maxflow algorithm
  – Global minimum, polynomial time
• Metrication artefacts?

Continuous formulation (geodesic active contours):
E(C) = ∫_0^{|C|} g(C(s)) ds
• Level sets
  – Numerical stability?
• Geometrically motivated
  – Invariant under rotation
Geo-cuts
• Continuous functional:
  E(C) = ∫_0^{|C|} g(N) ds + ∫_{interior(C)} f dV
• Construct a graph such that for smooth contours C,
  E(C) = cost of the corresponding cut
• Class of continuous functionals? [Boykov & Kolmogorov '03], [Kolmogorov & Boykov '05]:
  – Geometric length/area (e.g. Riemannian)
  – Flux of a given vector field
  – Regional term