Linear Programming Relaxations for Graphical Models

Tommi Jaakkola (MIT), Amir Globerson (Hebrew Univ.)
Based on joint work with M. Collins, M. Fromer, T. Koo, M. Meila, O. Meshi, A. Rush, D. Sontag
Outline
• Inference with linear programs
- problems, examples
- geometry, polytopes, constraints
- relaxations, consequences
- dual decomposition, properties
• Solving linear programs
- sub-gradient, coordinate descent
• Learning with LPs
- structured prediction
- coordinate descent, cutting plane
- pseudo-max
Inference problems
• Computer vision
- e.g., image segmentation, stereopsis, scene analysis
• Computational biology
- e.g., drug design, molecular structure prediction
• Natural language processing
- e.g., parsing, machine translation, information extraction
• Medical informatics
- e.g., automated diagnosis
• Exploratory analysis
- e.g., learning Bayesian networks
• etc.
Inference problems
• Computer vision
- e.g., image segmentation, stereopsis, scene analysis (Tappen & Freeman ’03)
[figure: a stereo image pair; corresponding pixels are related by displacements, and neighboring pixels constrain each other’s displacements]
Goal: recover the depth of each pixel in the image
Inference problems
• Computational biology
- e.g., drug design, molecular structure prediction
[figure: a protein backbone with side chains $y_1,\dots,y_5$; orientation angles as variables, densely coupled through pairwise energies such as $\theta_{15}(y_1, y_5)$]
• Orientations of side chains are energetically coupled:
    $P(y) = \frac{1}{Z} \exp\Big( \sum_{(i,j)\in E} \theta_{ij}(y_i, y_j) \Big)$
and the corresponding MAP problem is
    $\max_y \sum_{(i,j)\in E} \theta_{ij}(y_i, y_j)$
Goal: recover energetically optimal side-chain orientations
Inference problems
• Natural language processing
- e.g., dependency parsing, machine translation, information extraction
[figure: dependency arcs over the sentence “* John saw a movie yesterday that he liked”, with word positions indexed $i = 0$ (root symbol *) through $n$]
• Let $y(i, j) = 1$ if arc $i \to j$ is selected and zero otherwise. With per-arc scores,
    $\max_y \sum_{i,j} y(i, j)\, \theta(i, j) + \theta_T(y)$    ($y$ has to be a tree)
• Useful scoring functions couple modifier (arc) choices:
    $\max_y \sum_i \theta_i(y(i, \cdot)) + \theta_T(y)$
Goal: find the highest scoring parse for any given sentence
Inference problems
• Exploratory analysis
- e.g., learning Bayesian networks
[figure: a data matrix of discrete observations over variables 1, 2, 3, next to candidate graphs over the three variables]
• Each variable selects a parent set, e.g.
    $y_1 \in \{\emptyset, \{2\}, \{3\}, \{2,3\}\}$, $y_2 \in \{\emptyset, \{1\}, \{3\}, \{1,3\}\}$, $y_3 \in \{\emptyset, \{1\}, \{2\}, \{1,2\}\}$
with local scores $\theta_1(\emptyset)$, $\theta_2(\{1,3\})$, $\theta_3(\{1\})$, etc., giving the problem
    $\max_y \sum_i \theta_i(y_i) + \theta_{DAG}(y)$    ($y$ has to be a DAG)
Goal: find the highest scoring acyclic graph
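To make the parent-set formulation concrete, here is a minimal brute-force sketch for the three-variable case above. The random scores are placeholders for real parent-set scores (e.g., marginal likelihoods), and the helper names (`parent_sets`, `is_dag`) are illustrative, not from the tutorial.

```python
import itertools
import random

variables = [0, 1, 2]
# All parent sets for each variable, e.g. y1 in {{}, {1}, {2}, {1,2}}.
parent_sets = {i: [frozenset(s) for r in range(3)
                   for s in itertools.combinations(set(variables) - {i}, r)]
               for i in variables}

random.seed(0)
theta = {(i, ps): random.gauss(0, 1)            # placeholder local scores
         for i in variables for ps in parent_sets[i]}

def is_dag(parents):
    """Check acyclicity by repeatedly removing nodes with no remaining parents."""
    remaining = dict(parents)
    while remaining:
        ready = [i for i, ps in remaining.items() if not ps & set(remaining)]
        if not ready:
            return False                         # every node waits on a cycle
        for i in ready:
            del remaining[i]
    return True

choices = ({0: a, 1: b, 2: c} for a in parent_sets[0]
           for b in parent_sets[1] for c in parent_sets[2])
score, best = max(((sum(theta[(i, y[i])] for i in variables), y)
                   for y in choices if is_dag(y)), key=lambda t: t[0])
print(score, best)
```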
Inference problems
• MAP inference problems are often provably hard
- protein design (Pierce et al. ’02)
- dependency parsing with sibling scoring (McDonald et al. ’07)
- structure learning of Bayesian networks (Chickering ’96, etc.)
- etc.
• But “typical” instances of these problems may be solvable substantially faster
• We will discuss here (dual) linear programming (LP) relaxations for solving “typical” instances of such problems
Markov Random Fields
• Let’s start with a simple Markov model over binary variables $y_1 - y_2 - y_3$, $y_i \in \{0, 1\}$:
    $P(y; \theta) = \frac{1}{Z(\theta)} \exp\big( \theta_{12}(y_1, y_2) + \theta_{23}(y_2, y_3) \big)$
where the parameters are stacked into a single vector
    $\theta = [\,\theta_{12}(0,0), \theta_{12}(1,0), \theta_{12}(0,1), \theta_{12}(1,1), \theta_{23}(0,0), \theta_{23}(1,0), \theta_{23}(0,1), \theta_{23}(1,1)\,]^\top$
• The MAP problem we wish to solve is
    $(y_1^*, y_2^*, y_3^*) = \arg\max_{y_1, y_2, y_3} \theta_{12}(y_1, y_2) + \theta_{23}(y_2, y_3)$
• Each assignment picks out entries of $\theta$ via an indicator vector; e.g.
    $\theta_{12}(1, 0) + \theta_{23}(0, 1) = \theta \cdot [\,0, 1, 0, 0, 0, 0, 1, 0\,]^\top$
and in general
    $\theta_{12}(y_1, y_2) + \theta_{23}(y_2, y_3) = \theta \cdot \phi(y)$
where $\phi(y)$ stacks the indicators $1[y_1 = a \wedge y_2 = b]$ and $1[y_2 = b \wedge y_3 = c]$.
• The MAP problem is therefore $y^* = \arg\max_y \theta \cdot \phi(y)$

MAP assignment as a LP
• The vectors $\phi(y)$ are the vertexes of their convex hull $M = \mathrm{conv}\{\phi(y)\}$
[figure: the points $\phi(y)$, $\phi(y')$ as vertexes of the polytope $M$, with the objective direction $\theta$ pointing toward $\phi(y^*)$]
• A linear objective over a convex set attains its maximum at a vertex, so
    $\max_y \theta \cdot \phi(y) = \theta \cdot \phi(y^*) = \max_{\mu \in M} \theta \cdot \mu$
convex set, linear objective: the MAP problem is a linear program.
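As a sanity check, here is a minimal sketch (the potentials are random placeholders) that enumerates all assignments of the three-variable chain, builds the indicator vector $\phi(y)$, and confirms that $\theta \cdot \phi(y)$ reproduces the pairwise sum, so that MAP is maximization of a linear function over the finite set $\{\phi(y)\}$.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
theta12 = rng.normal(size=(2, 2))       # theta_12(y1, y2)
theta23 = rng.normal(size=(2, 2))       # theta_23(y2, y3)
# Stack parameters in the order (0,0), (1,0), (0,1), (1,1) per edge.
order = [(0, 0), (1, 0), (0, 1), (1, 1)]
theta = np.array([theta12[a, b] for (a, b) in order]
                 + [theta23[a, b] for (a, b) in order])

def phi(y):
    """Indicator vector: which edge configurations y turns on."""
    y1, y2, y3 = y
    return np.array([(y1, y2) == c for c in order]
                    + [(y2, y3) == c for c in order], dtype=float)

best = max(itertools.product((0, 1), repeat=3),
           key=lambda y: theta @ phi(y))
assert np.isclose(theta @ phi(best),
                  theta12[best[0], best[1]] + theta23[best[1], best[2]])
print("MAP assignment:", best, "value:", theta @ phi(best))
```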
A more algebraic view
• Maximizing over assignments is equivalent to maximizing over distributions $p$:
    $\max_y \theta \cdot \phi(y) = \max_p \sum_y p(y)\, [\theta \cdot \phi(y)] = \max_p \theta \cdot \big[ \textstyle\sum_y p(y)\, \phi(y) \big] = \max_{\mu \in M} \theta \cdot \mu$
where the bracketed term is $\mu$ and $M = \mathrm{conv}\{\phi(y)\}$.
• What are the expected statistics $\mu = \sum_y p(y)\, \phi(y)$?
• The convex hull $M$ is defined by exponentially many vertexes $\{\phi(y)\}$. Can we specify it more compactly?
Expected statistics
• Let’s go back to our simple MRF $y_1 - y_2 - y_3$, $y_i \in \{0, 1\}$:
    $\mu = \sum_y p(y)\, \phi(y) = \sum_y p(y)\, \big[\, 1[y_1 = 0 \wedge y_2 = 0], \dots, 1[y_2 = 1 \wedge y_3 = 1] \,\big]^\top$
      $= [\, \mu_{12}(0,0), \mu_{12}(1,0), \mu_{12}(0,1), \mu_{12}(1,1), \mu_{23}(0,0), \mu_{23}(1,0), \mu_{23}(0,1), \mu_{23}(1,1) \,]^\top$
a vector of marginal probabilities.
Marginal polytope
• By Weyl’s theorem, the convex hull of a finite set of vectors (here the $\phi(y)$’s) can be written in terms of a set of linear constraints, i.e., as a polytope:
    $M = \mathrm{conv}\{\phi(y)\} = \{\mu : A\mu \le b\}$
- what are the linear constraints?
- how many constraints do we need to specify M?
Linear constraints: examples
• The marginal polytope for tree structured models is defined by simple constraints on the edge marginals $\mu = \{\mu_{ij}(y_i, y_j)\}$:
$\mu \in M_L$ iff
(1) non-negative: $\mu_{ij}(y_i, y_j) \ge 0$
(2) normalized: $\sum_{y_i, y_j} \mu_{ij}(y_i, y_j) = 1$
(3) consistent: $\sum_{y_i} \mu_{ij}(y_i, y_j) = \sum_{y_k} \mu_{jk}(y_j, y_k)$ for $(i,j)$ and $(j,k) \in E$
• $M_L$ is the “local marginal polytope” (the “pairwise relaxation”)
• We will prove shortly that, indeed, for trees $M_L = M$
Linear constraints: examples
• It is often convenient to add extra single node statistics, so that
    $\mu = [\, \mu_1(0), \mu_1(1), \mu_2(0), \mu_2(1), \mu_3(0), \mu_3(1), \mu_{12}(0,0), \dots, \mu_{23}(1,1) \,]^\top$
$\mu = \{\mu_i(y_i), \mu_{ij}(y_i, y_j)\} \in M_L$ iff
(1) non-negative: $\mu_i(y_i) \ge 0$, $\mu_{ij}(y_i, y_j) \ge 0$
(2) normalized: $\sum_{y_i} \mu_i(y_i) = 1$
(3) consistent: $\sum_{y_i} \mu_{ij}(y_i, y_j) = \mu_j(y_j)$ and $\sum_{y_j} \mu_{ij}(y_i, y_j) = \mu_i(y_i)$
(same polytope, embedded in higher dimensions)
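These constraints are easy to hand to an off-the-shelf LP solver. Below is a minimal sketch (assuming SciPy is available; the indexing scheme and random potentials are illustrative) that solves the pairwise relaxation for the three-node chain; since the chain is a tree, the solution should come out integral.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, edges = 3, [(0, 1), (1, 2)]           # the chain y1 - y2 - y3

# Flat indexing for the variables mu_i(s) and mu_ij(s, t).
idx = {}
for i in range(n):
    for s in (0, 1):
        idx[('n', i, s)] = len(idx)
for (i, j) in edges:
    for s in (0, 1):
        for t in (0, 1):
            idx[('e', i, j, s, t)] = len(idx)

theta = rng.normal(size=len(idx))        # random node and edge potentials

A, b = [], []
def eq(coeffs, rhs):                     # register one equality constraint
    row = np.zeros(len(idx))
    for key, v in coeffs:
        row[idx[key]] = v
    A.append(row)
    b.append(rhs)

for i in range(n):                       # (2) normalization
    eq([(('n', i, s), 1.0) for s in (0, 1)], 1.0)
for (i, j) in edges:                     # (3) consistency with both endpoints
    for s in (0, 1):
        eq([(('e', i, j, s, t), 1.0) for t in (0, 1)] + [(('n', i, s), -1.0)], 0.0)
    for t in (0, 1):
        eq([(('e', i, j, s, t), 1.0) for s in (0, 1)] + [(('n', j, t), -1.0)], 0.0)

# linprog minimizes, so negate theta; bounds enforce (1) non-negativity.
res = linprog(-theta, A_eq=np.array(A), b_eq=np.array(b), bounds=(0, 1))
print("LP value:", theta @ res.x)
print("integral solution?", bool(np.allclose(res.x, np.round(res.x))))
```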
“Proof” for trees
• Consider tree structured models where the expected statistics include single node marginals (for simplicity):
$\mu = \{\mu_i(y_i), \mu_{ij}(y_i, y_j)\} \in M_L$ iff
(1) $\mu_{ij}(y_i, y_j) \ge 0$
(2) $\sum_{y_i} \mu_i(y_i) = 1$
(3) $\sum_{y_i} \mu_{ij}(y_i, y_j) = \mu_j(y_j)$, $\sum_{y_j} \mu_{ij}(y_i, y_j) = \mu_i(y_i)$
• clearly any $\mu \in M$ also belongs to $M_L$, so $M \subseteq M_L$
• any $\mu \in M_L$ gives rise to a tree distribution
    $P(y) = \prod_i \mu_i(y_i) \prod_{(i,j)\in E} \frac{\mu_{ij}(y_i, y_j)}{\mu_i(y_i)\, \mu_j(y_j)}$
such that $\sum_y P(y)\, \phi(y) = \mu$, so $M_L \subseteq M$
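The second step can be checked numerically. The sketch below (locally consistent marginals chosen by hand for illustration) builds the tree distribution above for the chain and verifies that it sums to one and reproduces a chosen edge marginal.

```python
import itertools
import numpy as np

# Locally consistent marginals for y1 - y2 - y3 (illustrative values).
mu1 = np.array([0.3, 0.7])
mu12 = np.array([[0.2, 0.1],             # rows: y1, cols: y2
                 [0.3, 0.4]])
mu2 = mu12.sum(axis=0)                   # (0.5, 0.5), consistent with mu12
mu23 = np.array([[0.1, 0.4],             # rows: y2, cols: y3
                 [0.2, 0.3]])
mu3 = mu23.sum(axis=0)

def P(y1, y2, y3):
    """Tree distribution: product of node marginals times edge ratios."""
    return (mu1[y1] * mu2[y2] * mu3[y3]
            * mu12[y1, y2] / (mu1[y1] * mu2[y2])
            * mu23[y2, y3] / (mu2[y2] * mu3[y3]))

joint = {y: P(*y) for y in itertools.product((0, 1), repeat=3)}
print("sums to one:", np.isclose(sum(joint.values()), 1.0))
edge_marg = sum(p for (a, b, _), p in joint.items() if (a, b) == (0, 1))
print("recovers mu12(0,1):", np.isclose(edge_marg, mu12[0, 1]))
```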
Excluding vertexes (for trees)
• For trees, the pairwise relaxation (1)–(3) suffices to define $M$
• What if we needed to exclude a specific (e.g., the maximizing) vertex $\phi(y^*)$?
• Add one more linear constraint (Fromer et al. ’09):
    (4) $\sum_i (1 - d_i)\, \mu_i(y_i^*) + \sum_{(i,j)\in E} \mu_{ij}(y_i^*, y_j^*) \le 0$
where $d_i$ is the degree of node $i$.
- if $\mu = \phi(y^*)$: the left-hand side equals 1 $\Rightarrow$ violated!
- if $\mu$ is a neighbor of $\phi(y^*)$ (one coordinate flipped): the left-hand side equals 0 $\Rightarrow$ tight!
The constrained polytope $M(y^*)$ thus retains every integral vertex except $\phi(y^*)$.
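The two evaluations above are a one-line computation. This sketch (a chain of three nodes; the helper name `constraint4` is hypothetical) evaluates constraint (4) at an integral vertex and at one of its neighbors.

```python
def constraint4(y_star, y, degrees, edges):
    """LHS of (4) evaluated at the integral vertex mu = phi(y)."""
    node_term = sum((1 - degrees[i]) * (y[i] == y_star[i])
                    for i in range(len(y)))
    edge_term = sum((y[i], y[j]) == (y_star[i], y_star[j])
                    for (i, j) in edges)
    return node_term + edge_term

edges, degrees = [(0, 1), (1, 2)], [1, 2, 1]     # the chain y1 - y2 - y3
y_star = (0, 0, 0)
print(constraint4(y_star, (0, 0, 0), degrees, edges))   # 1 -> violated
print(constraint4(y_star, (1, 0, 0), degrees, edges))   # 0 -> tight
```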
The spanning tree polytope
• Suppose our MAP problem involves finding the maximum weight spanning tree
    $y_{ij} = 1$ if edge $(i, j)$ is selected and zero otherwise
    $\theta_T(y) = \begin{cases} 0, & \text{if } y \text{ specifies a spanning tree} \\ -\infty, & \text{otherwise} \end{cases}$
    $\max_y \sum_{(i,j)} y_{ij}\, \theta_{ij} + \theta_T(y) = \max_{\mu \in M} \theta \cdot \mu$
$\mu \in M$ iff
a) $\mu_{ij} \ge 0$
b) $\sum_{(i,j)} \mu_{ij} = n - 1$
c) $\sum_{(i,j) \in E(S)} \mu_{ij} \le |S| - 1$, $\forall S \subset V$
• There are exponentially many constraints!
• Polynomial description exists via lift & project (e.g., Yannakakis ’90)
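Note that the exponential facet count does not make the integral problem hard: a greedy Kruskal-style sweep, sketched below, optimizes over the polytope’s vertexes directly (the function and its signature are illustrative).

```python
def max_spanning_tree(n, weighted_edges):
    """weighted_edges: list of (weight, i, j); returns the selected edges."""
    parent = list(range(n))                  # union-find forest
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x
    tree = []
    for w, i, j in sorted(weighted_edges, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:                         # adding (i, j) keeps y acyclic
            parent[ri] = rj
            tree.append((i, j))
    return tree

print(max_spanning_tree(4, [(3.0, 0, 1), (1.0, 1, 2),
                            (2.0, 2, 3), (2.5, 0, 2)]))
```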
The acyclic subgraph polytope
• The problem is considerably harder if we consider arc selections that form acyclic graphs (cf. structure learning)
    $y_{ij} = 1$ if arc $i \to j$ is selected and zero otherwise
    $\theta_{DAG}(y) = \begin{cases} 0, & \text{if } y \text{ specifies a DAG} \\ -\infty, & \text{otherwise} \end{cases}$
    $\max_y \sum_{i,j} y_{ij}\, \theta_{ij} + \theta_{DAG}(y) = \max_{\mu \in M} \theta \cdot \mu$
• Many but not all of the facet defining inequalities are known (e.g., Grötschel et al. ’85)
    $\mu_{ij} \ge 0$,  $\sum_{i \to j \in C} \mu_{ij} \le |C| - 1$, $\forall$ cycles $C$    (“cycle inequalities”)
Trees vs. non-trees
[figure: left, the chain $y_1 - y_2 - y_3$, for which $M = M_L$; right, the triangle $y_1, y_2, y_3$, for which $M' \subset M_L'$]
• For the tree, the local constraints (1)–(3) define the marginal polytope exactly: $M = M_L$
• For the cycle, $M' \subset M_L'$: the integral vertexes $\phi(y)$ of $M'$ remain vertexes of $M_L'$, but $M_L'$ also gains fractional vertexes $\mu'$
• As a result, the LP over $M_L'$ may be maximized at a fractional vertex for some $\theta$
Fractional vertexes
• Example on the triangle: take $\mu_i = (0.5, 0.5)$ at every node, with each edge marginal in either
    “agreement form”  $\begin{bmatrix} 0.5 & 0 \\ 0 & 0.5 \end{bmatrix}$   or   “disagreement form”  $\begin{bmatrix} 0 & 0.5 \\ 0.5 & 0 \end{bmatrix}$
• With an odd number of disagreement edges around the cycle (e.g., all three, or exactly one), these satisfy (1)–(3) but no distribution can generate these marginals (cf. Deza & Laurent ’96): $\mu'$ is a fractional vertex of $M_L'$
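This can be confirmed mechanically: asking whether any joint distribution over $\{0,1\}^3$ matches the three disagreement-form edge marginals is itself a small LP feasibility problem. A sketch (assuming SciPy; the all-disagreement case from the slide):

```python
import itertools
import numpy as np
from scipy.optimize import linprog

assignments = list(itertools.product((0, 1), repeat=3))
edges = [(0, 1), (1, 2), (0, 2)]
disagree = {(0, 0): 0.0, (1, 1): 0.0, (0, 1): 0.5, (1, 0): 0.5}

A, b = [], []
A.append([1.0] * len(assignments))       # p must sum to one
b.append(1.0)
for (i, j) in edges:                     # p must match every edge marginal
    for st, target in disagree.items():
        A.append([1.0 if (y[i], y[j]) == st else 0.0 for y in assignments])
        b.append(target)

res = linprog(np.zeros(len(assignments)), A_eq=np.array(A),
              b_eq=np.array(b), bounds=(0, 1))
print("realizable by a distribution?", res.success)   # False: infeasible
```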
Persistency
• Theorem: in pairwise MRF models, any partially integral solution resulting from the pairwise relaxation can be extended to an optimal solution, provided that $\mu_i(y_i) > 0$ $\forall y_i$ holds for all neighbors of integral nodes
(cf. Nemhauser et al. ’75, Kolmogorov et al. ’05, Weiss et al. ’08)
[figure: a model where some nodes have integral marginals (0, 1) or (1, 0) while their neighbors have fractional marginals (0.5, 0.5); pairwise marginals not shown]
Preventing fractional vertexes
• Structural constraints
- e.g., trees, planar, perfect graphs (Jebara ’10)
• Parameter restrictions
- e.g., attractive potentials
• Tighter relaxations
- e.g., Gomory, Sherali-Adams (’90), Lovász and Schrijver (’91), Lasserre (’01)

Parameter restrictions
• In the pairwise relaxation, a fractional vertex has to have at least one pairwise marginal in a “disagreement” form.
• We can eliminate such marginals from the maximizing set by constraining the parameters so that
    $\theta_{ij}(1, 1) + \theta_{ij}(0, 0) \ge \theta_{ij}(1, 0) + \theta_{ij}(0, 1)$
(a.k.a. attractive potentials)
Tighter relaxations
• We can iteratively add additional constraints to remove fractional vertexes
[figure: cutting-plane iteration in the space of marginal probabilities — solve over the pairwise constraints, find a constraint violated by the current solution, add it, and re-solve for a new solution]
    $M \subseteq M^{(k)} \subset \dots \subset M^{(1)} \subset M_L$
• What are the additional constraints? How to find them?
- lifting: extend $\mu$ with auxiliary variables $\tau$ on which simpler constraints can be stated; projection maps $(\mu, \tau)$ back to $\mu$
Tighter relaxations via lifting
• We can eliminate some fractional vertexes by requiring that any triplet of pairwise marginals come from a valid distribution
• Keep (1)–(3) and add the new constraints
    (4) $\tau_{ijk}(y_i, y_j, y_k) \ge 0$,  $\sum_{y_i, y_j, y_k} \tau_{ijk}(y_i, y_j, y_k) = 1$
    (5) $\sum_{y_k} \tau_{ijk}(y_i, y_j, y_k) = \mu_{ij}(y_i, y_j)$
        $\sum_{y_j} \tau_{ijk}(y_i, y_j, y_k) = \mu_{ik}(y_i, y_k)$
        $\sum_{y_i} \tau_{ijk}(y_i, y_j, y_k) = \mu_{jk}(y_j, y_k)$
• The new constraints cut off the current fractional solution and the LP moves to a new solution
Dual decomposition
• There are many possible ways of preventing fractional vertexes, including
- structural constraints: e.g., trees, planar, perfect graphs (Jebara ’10)
- parameter restrictions: e.g., attractive potentials
- tighter relaxations: e.g., Gomory, Sherali-Adams (’90), Lovász and Schrijver (’91), Lasserre (’01)
• These are key ingredients in decomposition methods that break the original problem into exactly solvable sub-problems (cf. Guignard, Fisher, ’80s)
- e.g., Wainwright et al. ’02+, Johnson et al. ’07, Komodakis ’07+, Sontag et al. ’08+, etc.
Decomposition
• We can always break the original model into pieces that would be exactly solvable in isolation
    $\sum_{(i,j)\in E} \theta_{ij}(y_i, y_j) = \sum_{f \in F} \theta_f(y_f)$
[figure: a pairwise model over $y_1, \dots, y_5$ split into overlapping tractable components $\theta_f(y_f)$, $\theta_g(y_g)$]
• Adding explicit single node terms $\theta_i(y_i)$ (e.g., $\theta_1(y_1)$) gives
    $\sum_i \theta_i(y_i) + \sum_{f \in F} \theta_f(y_f) = \theta \cdot \phi(y)$
Decomposition, LP relaxation
• The decomposition leads to a natural component-wise LP relaxation:
    $\max_y \sum_i \theta_i(y_i) + \sum_{f \in F} \theta_f(y_f) = \max_{\mu \in M} \sum_i \sum_{y_i} \mu_i(y_i)\, \theta_i(y_i) + \sum_{f \in F} \sum_{y_f} \mu_f(y_f)\, \theta_f(y_f)$
      $\le \max_{\mu \in M_L} \theta \cdot \mu$
where $\mu = \{\mu_i(y_i), \mu_f(y_f)\} \in M_L$ iff
    $\mu_i(y_i) \ge 0$, $\mu_f(y_f) \ge 0$
    $\sum_{y_i} \mu_i(y_i) = 1$
    $\sum_{y_f \setminus i} \mu_f(y_f) = \mu_i(y_i)$, if $i \in f$
Dual decomposition
• We can think of the relaxation as enforcing that the components agree about the maximizing assignment:
    $\max_y \sum_i \theta_i(y_i) + \sum_{f \in F} \theta_f(y_f) \le \sum_i \max_{y_i} \theta_i(y_i) + \sum_{f \in F} \max_{y_f} \theta_f(y_f)$
• Agreements can be enforced via Lagrange multipliers $\delta_{fi}(y_i)$ associated with the equality constraints, giving the bound
    $\sum_i \max_{y_i} \Big[ \theta_i(y_i) + \sum_{f: i \in f} \delta_{fi}(y_i) \Big] + \sum_{f \in F} \max_{y_f} \Big[ \theta_f(y_f) - \sum_{i \in f} \delta_{fi}(y_i) \Big]$
• The Lagrange multipliers re-parameterize the model: they shift score between the single node terms and the components without changing the total $\theta \cdot \phi(y)$
Dual decomposition
• We minimize the dual objective (the right-hand side of)
    $\max_y \sum_i \theta_i(y_i) + \sum_{f \in F} \theta_f(y_f) \le \sum_i \max_{y_i} \Big[ \theta_i(y_i) + \sum_{f: i \in f} \delta_{fi}(y_i) \Big] + \sum_{f \in F} \max_{y_f} \Big[ \theta_f(y_f) - \sum_{i \in f} \delta_{fi}(y_i) \Big]$
with respect to the Lagrange multipliers associated with the equality constraints
- this is an unconstrained convex minimization problem (cf. constrained primal maximization)
- minimization seeks to enforce agreements among the sub-problems
- a number of distributed algorithms have been developed for this purpose (e.g., Werner ’07, Globerson et al. ’08) (2nd part)
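A minimal subgradient sketch of this minimization for a binary chain split into edge factors (the step-size schedule, iteration count, and data structures are illustrative; the second part covers this family of algorithms properly). Since the chain is a tree the relaxation is tight, so the sub-problems are expected to reach agreement.

```python
import numpy as np

rng = np.random.default_rng(1)
n, factors = 3, [(0, 1), (1, 2)]                  # chain split into edges
theta_i = rng.normal(size=(n, 2))                 # node potentials
theta_f = {f: rng.normal(size=(2, 2)) for f in factors}
delta = {(f, i): np.zeros(2) for f in factors for i in f}

for t in range(200):
    # Solve each sub-problem independently under the current multipliers.
    y_i = [int(np.argmax(theta_i[i] + sum(delta[(f, i)]
               for f in factors if i in f))) for i in range(n)]
    y_f = {}
    for f in factors:
        i, j = f
        scores = theta_f[f] - delta[(f, i)][:, None] - delta[(f, j)][None, :]
        y_f[f] = np.unravel_index(np.argmax(scores), (2, 2))
    # Subgradient step on each disagreement: make the node's pick less
    # attractive at the node term, the factor's pick more attractive.
    step, agree = 1.0 / (t + 1), True
    for f in factors:
        for k, i in enumerate(f):
            if y_f[f][k] != y_i[i]:
                agree = False
                delta[(f, i)][y_i[i]] -= step
                delta[(f, i)][y_f[f][k]] += step
    if agree:
        print("components agree -> certificate of optimality:", y_i)
        break
```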
Weak duality
• For any setting of the multipliers, the dual upper bounds the primal:
    dual $= \sum_i \max_{y_i} \Big[ \theta_i(y_i) + \sum_{f: i \in f} \delta_{fi}(y_i) \Big] + \sum_{f \in F} \max_{y_f} \Big[ \theta_f(y_f) - \sum_{i \in f} \delta_{fi}(y_i) \Big]$
    $\ge \sum_i \sum_{y_i} \mu_i(y_i) \Big[ \theta_i(y_i) + \sum_{f: i \in f} \delta_{fi}(y_i) \Big] + \sum_{f \in F} \sum_{y_f} \mu_f(y_f) \Big[ \theta_f(y_f) - \sum_{i \in f} \delta_{fi}(y_i) \Big]$
for any $\mu$ satisfying the primal constraints
    $\mu_i(y_i) \ge 0$, $\sum_{y_i} \mu_i(y_i) = 1$;  $\mu_f(y_f) \ge 0$, $\sum_{y_f} \mu_f(y_f) = 1$;  $\sum_{y_f \setminus i} \mu_f(y_f) = \mu_i(y_i)$, $\forall i \in f$
• Using the consistency constraints, the multiplier terms cancel, leaving
    dual $\ge \sum_i \sum_{y_i} \mu_i(y_i)\, \theta_i(y_i) + \sum_{f \in F} \sum_{y_f} \mu_f(y_f)\, \theta_f(y_f) =$ primal
Strong duality
• Min dual value = max primal value:
    $\sum_i \max_{y_i} \Big[ \theta_i(y_i) + \sum_{f: i \in f} \delta_{fi}(y_i) \Big] + \sum_{f \in F} \max_{y_f} \Big[ \theta_f(y_f) - \sum_{i \in f} \delta_{fi}(y_i) \Big]$
    $= \sum_i \sum_{y_i} \mu_i(y_i) \Big[ \theta_i(y_i) + \sum_{f: i \in f} \delta_{fi}(y_i) \Big] + \sum_{f \in F} \sum_{y_f} \mu_f(y_f) \Big[ \theta_f(y_f) - \sum_{i \in f} \delta_{fi}(y_i) \Big]$
• At the primal-dual optimum, equality follows from complementary slackness:
    $\mu_i(y_i) > 0 \Rightarrow y_i \in \arg\max_{y_i} \theta_i(y_i) + \sum_{f: i \in f} \delta_{fi}(y_i)$
    $\mu_f(y_f) > 0 \Rightarrow y_f \in \arg\max_{y_f} \theta_f(y_f) - \sum_{i \in f} \delta_{fi}(y_i)$
Dual properties
• The dual objective upper bounds the MAP value:
    $\max_y \sum_i \theta_i(y_i) + \sum_{f \in F} \theta_f(y_f) \le \sum_i \max_{y_i} \Big[ \theta_i(y_i) + \sum_{f: i \in f} \delta_{fi}(y_i) \Big] + \sum_{f \in F} \max_{y_f} \Big[ \theta_f(y_f) - \sum_{i \in f} \delta_{fi}(y_i) \Big]$
• Suppose we find a maximizing assignment $\hat{y}$ that is the same for all the terms:
    $\hat{y}_i \in \arg\max_{y_i} \theta_i(y_i) + \sum_{f: i \in f} \delta_{fi}(y_i)$   $\forall i$
    $\hat{y}_f \in \arg\max_{y_f} \theta_f(y_f) - \sum_{i \in f} \delta_{fi}(y_i)$   $\forall f \in F$
• Then the upper bound evaluates to
    $\sum_i \Big[ \theta_i(\hat{y}_i) + \sum_{f: i \in f} \delta_{fi}(\hat{y}_i) \Big] + \sum_{f \in F} \Big[ \theta_f(\hat{y}_f) - \sum_{i \in f} \delta_{fi}(\hat{y}_i) \Big]$
and the multiplier terms cancel (reparameterization), so the bound equals
    $\sum_i \theta_i(\hat{y}_i) + \sum_{f \in F} \theta_f(\hat{y}_f)$  $\Rightarrow$ $\hat{y}$ is a MAP assignment!
Dual properties
• We minimize the dual objective with respect to the Lagrange multipliers associated with the equality constraints
• Theorem (certificate): if the components agree about an assignment, the assignment is optimal
• We can obtain such an agreement only if the corresponding LP relaxation is tight and the Lagrange multipliers are set optimally
Dual decomposition: summary
    $\max_y \sum_i \theta_i(y_i) + \sum_{f \in F} \theta_f(y_f) \le \sum_i \max_{y_i} \Big[ \theta_i(y_i) + \sum_{f: i \in f} \delta_{fi}(y_i) \Big] + \sum_{f \in F} \max_{y_f} \Big[ \theta_f(y_f) - \sum_{i \in f} \delta_{fi}(y_i) \Big]$
• we try to solve the MAP problem by encouraging exactly solvable sub-problems to agree
• the dual objective is an unconstrained convex optimization problem (cf. primal)
• certificate of optimality = primal integrality
• the dual value always upper bounds the MAP value (cf. learning)
• dual value can guide search for tighter relaxations (e.g., Sontag et al. ’08)
[figure: dual value decreasing and primal value increasing over time, sandwiching the LP value and the MAP value]