Linear Programming Relaxations for Graphical Models

Tommi Jaakkola (MIT), Amir Globerson (Hebrew Univ.)
Based on joint work with M. Collins, M. Fromer, T. Koo, M. Meila, O. Meshi, A. Rush, D. Sontag
Outline
• Inference with linear programs
- problems, examples
- geometry, polytopes, constraints
- relaxations, consequences
- dual decomposition, properties
• Solving linear programs
- sub-gradient, coordinate descent
• Learning with LPs
- structured prediction
- coordinate descent, cutting plane
- pseudo-max
Inference problems
• Computer vision
- e.g., image segmentation, stereopsis, scene analysis
• Computational biology
- e.g., drug design, molecular structure prediction
• Natural language processing
- e.g., parsing, machine translation, information extraction
• Medical informatics
- e.g., automated diagnosis
• Exploratory analysis
- e.g., learning Bayesian networks
• etc.
Inference problems
• Computer vision
- e.g., image segmentation, stereopsis, scene analysis (Tappen & Freeman ’03)
[figure: a stereo image pair; corresponding pixels are related by displacements, and neighboring pixels constrain each other’s displacements]
Goal: recover the depth of each pixel in the image
Inference problems
• Computational biology
- e.g., drug design, molecular structure prediction
[figure: a protein backbone with side chains $y_1,\dots,y_5$; orientation angles as variables, densely coupled through pairwise energies such as $\theta_{15}(y_1, y_5)$]
• Orientations of side chains are energetically coupled:
    $P(y) = \frac{1}{Z} \exp\Big( \sum_{(i,j)\in E} \theta_{ij}(y_i, y_j) \Big)$
and the corresponding MAP problem is
    $\max_y \sum_{(i,j)\in E} \theta_{ij}(y_i, y_j)$
Goal: recover energetically optimal side-chain orientations
Inference problems
• Natural language processing
- e.g., dependency parsing, machine translation, information extraction
[figure: dependency arcs over the sentence “* John saw a movie yesterday that he liked”, with word positions indexed $i = 0$ (root symbol *) through $n$]
• Let $y(i, j) = 1$ if arc $i \to j$ is selected and zero otherwise. With per-arc scores,
    $\max_y \sum_{i,j} y(i, j)\, \theta(i, j) + \theta_T(y)$    ($y$ has to be a tree)
• Useful scoring functions couple modifier (arc) choices:
    $\max_y \sum_i \theta_i(y(i, \cdot)) + \theta_T(y)$
Goal: find the highest scoring parse for any given sentence
Inference problems
• Exploratory analysis
- e.g., learning Bayesian networks
[figure: a data matrix of discrete observations over variables 1, 2, 3, next to candidate graphs over the three variables]
• Each variable selects a parent set, e.g.
    $y_1 \in \{\emptyset, \{2\}, \{3\}, \{2,3\}\}$, $y_2 \in \{\emptyset, \{1\}, \{3\}, \{1,3\}\}$, $y_3 \in \{\emptyset, \{1\}, \{2\}, \{1,2\}\}$
with local scores $\theta_1(\emptyset)$, $\theta_2(\{1,3\})$, $\theta_3(\{1\})$, etc., giving the problem
    $\max_y \sum_i \theta_i(y_i) + \theta_{DAG}(y)$    ($y$ has to be a DAG)
Goal: find the highest scoring acyclic graph
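To make the parent-set formulation concrete, here is a minimal brute-force sketch for the three-variable case above. The random scores are placeholders for real parent-set scores (e.g., marginal likelihoods), and the helper names (`parent_sets`, `is_dag`) are illustrative, not from the tutorial.

```python
import itertools
import random

variables = [0, 1, 2]
# All parent sets for each variable, e.g. y1 in {{}, {1}, {2}, {1,2}}.
parent_sets = {i: [frozenset(s) for r in range(3)
                   for s in itertools.combinations(set(variables) - {i}, r)]
               for i in variables}

random.seed(0)
theta = {(i, ps): random.gauss(0, 1)            # placeholder local scores
         for i in variables for ps in parent_sets[i]}

def is_dag(parents):
    """Check acyclicity by repeatedly removing nodes with no remaining parents."""
    remaining = dict(parents)
    while remaining:
        ready = [i for i, ps in remaining.items() if not ps & set(remaining)]
        if not ready:
            return False                         # every node waits on a cycle
        for i in ready:
            del remaining[i]
    return True

choices = ({0: a, 1: b, 2: c} for a in parent_sets[0]
           for b in parent_sets[1] for c in parent_sets[2])
score, best = max(((sum(theta[(i, y[i])] for i in variables), y)
                   for y in choices if is_dag(y)), key=lambda t: t[0])
print(score, best)
```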
Inference problems
• MAP inference problems are often provably hard
- protein design (Pierce et al. ’02)
- dependency parsing with sibling scoring (McDonald et al. ’07)
- structure learning of Bayesian networks (Chickering ’96, etc.)
- etc.
• But “typical” instances of these problems may be solvable substantially faster
• We will discuss here (dual) linear programming (LP) relaxations for solving “typical” instances of such problems
Markov Random Fields
• Let’s start with a simple Markov model over binary variables $y_1 - y_2 - y_3$, $y_i \in \{0, 1\}$:
    $P(y; \theta) = \frac{1}{Z(\theta)} \exp\big( \theta_{12}(y_1, y_2) + \theta_{23}(y_2, y_3) \big)$
where the parameters are stacked into a single vector
    $\theta = [\,\theta_{12}(0,0), \theta_{12}(1,0), \theta_{12}(0,1), \theta_{12}(1,1), \theta_{23}(0,0), \theta_{23}(1,0), \theta_{23}(0,1), \theta_{23}(1,1)\,]^\top$
• The MAP problem we wish to solve is
    $(y_1^*, y_2^*, y_3^*) = \arg\max_{y_1, y_2, y_3} \theta_{12}(y_1, y_2) + \theta_{23}(y_2, y_3)$
• Each assignment picks out entries of $\theta$ via an indicator vector; e.g.
    $\theta_{12}(1, 0) + \theta_{23}(0, 1) = \theta \cdot [\,0, 1, 0, 0, 0, 0, 1, 0\,]^\top$
and in general
    $\theta_{12}(y_1, y_2) + \theta_{23}(y_2, y_3) = \theta \cdot \phi(y)$
where $\phi(y)$ stacks the indicators $1[y_1 = a \wedge y_2 = b]$ and $1[y_2 = b \wedge y_3 = c]$.
• The MAP problem is therefore $y^* = \arg\max_y \theta \cdot \phi(y)$

MAP assignment as a LP
• The vectors $\phi(y)$ are the vertexes of their convex hull $M = \mathrm{conv}\{\phi(y)\}$
[figure: the points $\phi(y)$, $\phi(y')$ as vertexes of the polytope $M$, with the objective direction $\theta$ pointing toward $\phi(y^*)$]
• A linear objective over a convex set attains its maximum at a vertex, so
    $\max_y \theta \cdot \phi(y) = \theta \cdot \phi(y^*) = \max_{\mu \in M} \theta \cdot \mu$
convex set, linear objective: the MAP problem is a linear program.
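As a sanity check, here is a minimal sketch (the potentials are random placeholders) that enumerates all assignments of the three-variable chain, builds the indicator vector $\phi(y)$, and confirms that $\theta \cdot \phi(y)$ reproduces the pairwise sum, so that MAP is maximization of a linear function over the finite set $\{\phi(y)\}$.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
theta12 = rng.normal(size=(2, 2))       # theta_12(y1, y2)
theta23 = rng.normal(size=(2, 2))       # theta_23(y2, y3)
# Stack parameters in the order (0,0), (1,0), (0,1), (1,1) per edge.
order = [(0, 0), (1, 0), (0, 1), (1, 1)]
theta = np.array([theta12[a, b] for (a, b) in order]
                 + [theta23[a, b] for (a, b) in order])

def phi(y):
    """Indicator vector: which edge configurations y turns on."""
    y1, y2, y3 = y
    return np.array([(y1, y2) == c for c in order]
                    + [(y2, y3) == c for c in order], dtype=float)

best = max(itertools.product((0, 1), repeat=3),
           key=lambda y: theta @ phi(y))
assert np.isclose(theta @ phi(best),
                  theta12[best[0], best[1]] + theta23[best[1], best[2]])
print("MAP assignment:", best, "value:", theta @ phi(best))
```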
A more algebraic view
• Maximizing over assignments is equivalent to maximizing over distributions $p$:
    $\max_y \theta \cdot \phi(y) = \max_p \sum_y p(y)\, [\theta \cdot \phi(y)] = \max_p \theta \cdot \big[ \textstyle\sum_y p(y)\, \phi(y) \big] = \max_{\mu \in M} \theta \cdot \mu$
where the bracketed term is $\mu$ and $M = \mathrm{conv}\{\phi(y)\}$.
• What are the expected statistics $\mu = \sum_y p(y)\, \phi(y)$?
• The convex hull $M$ is defined by exponentially many vertexes $\{\phi(y)\}$. Can we specify it more compactly?
Expected statistics
• Let’s go back to our simple MRF $y_1 - y_2 - y_3$, $y_i \in \{0, 1\}$:
    $\mu = \sum_y p(y)\, \phi(y) = \sum_y p(y)\, \big[\, 1[y_1 = 0 \wedge y_2 = 0], \dots, 1[y_2 = 1 \wedge y_3 = 1] \,\big]^\top$
      $= [\, \mu_{12}(0,0), \mu_{12}(1,0), \mu_{12}(0,1), \mu_{12}(1,1), \mu_{23}(0,0), \mu_{23}(1,0), \mu_{23}(0,1), \mu_{23}(1,1) \,]^\top$
a vector of marginal probabilities.
Marginal polytope
• By Weyl’s theorem, the convex hull of a finite set of vectors (here the $\phi(y)$’s) can be written in terms of a set of linear constraints, i.e., as a polytope:
    $M = \mathrm{conv}\{\phi(y)\} = \{\mu : A\mu \le b\}$
- what are the linear constraints?
- how many constraints do we need to specify M?
Linear constraints: examples
• The marginal polytope for tree structured models is defined by simple constraints on the edge marginals $\mu = \{\mu_{ij}(y_i, y_j)\}$:
$\mu \in M_L$ iff
(1) non-negative: $\mu_{ij}(y_i, y_j) \ge 0$
(2) normalized: $\sum_{y_i, y_j} \mu_{ij}(y_i, y_j) = 1$
(3) consistent: $\sum_{y_i} \mu_{ij}(y_i, y_j) = \sum_{y_k} \mu_{jk}(y_j, y_k)$ for $(i,j)$ and $(j,k) \in E$
• $M_L$ is the “local marginal polytope” (the “pairwise relaxation”)
• We will prove shortly that, indeed, for trees $M_L = M$
Linear constraints: examples
• It is often convenient to add extra single node statistics, so that
    $\mu = [\, \mu_1(0), \mu_1(1), \mu_2(0), \mu_2(1), \mu_3(0), \mu_3(1), \mu_{12}(0,0), \dots, \mu_{23}(1,1) \,]^\top$
$\mu = \{\mu_i(y_i), \mu_{ij}(y_i, y_j)\} \in M_L$ iff
(1) non-negative: $\mu_i(y_i) \ge 0$, $\mu_{ij}(y_i, y_j) \ge 0$
(2) normalized: $\sum_{y_i} \mu_i(y_i) = 1$
(3) consistent: $\sum_{y_i} \mu_{ij}(y_i, y_j) = \mu_j(y_j)$ and $\sum_{y_j} \mu_{ij}(y_i, y_j) = \mu_i(y_i)$
(same polytope, embedded in higher dimensions)
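These constraints are easy to hand to an off-the-shelf LP solver. Below is a minimal sketch (assuming SciPy is available; the indexing scheme and random potentials are illustrative) that solves the pairwise relaxation for the three-node chain; since the chain is a tree, the solution should come out integral.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, edges = 3, [(0, 1), (1, 2)]           # the chain y1 - y2 - y3

# Flat indexing for the variables mu_i(s) and mu_ij(s, t).
idx = {}
for i in range(n):
    for s in (0, 1):
        idx[('n', i, s)] = len(idx)
for (i, j) in edges:
    for s in (0, 1):
        for t in (0, 1):
            idx[('e', i, j, s, t)] = len(idx)

theta = rng.normal(size=len(idx))        # random node and edge potentials

A, b = [], []
def eq(coeffs, rhs):                     # register one equality constraint
    row = np.zeros(len(idx))
    for key, v in coeffs:
        row[idx[key]] = v
    A.append(row)
    b.append(rhs)

for i in range(n):                       # (2) normalization
    eq([(('n', i, s), 1.0) for s in (0, 1)], 1.0)
for (i, j) in edges:                     # (3) consistency with both endpoints
    for s in (0, 1):
        eq([(('e', i, j, s, t), 1.0) for t in (0, 1)] + [(('n', i, s), -1.0)], 0.0)
    for t in (0, 1):
        eq([(('e', i, j, s, t), 1.0) for s in (0, 1)] + [(('n', j, t), -1.0)], 0.0)

# linprog minimizes, so negate theta; bounds enforce (1) non-negativity.
res = linprog(-theta, A_eq=np.array(A), b_eq=np.array(b), bounds=(0, 1))
print("LP value:", theta @ res.x)
print("integral solution?", bool(np.allclose(res.x, np.round(res.x))))
```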
“Proof” for trees
• Consider tree structured models where the expected statistics include single node marginals (for simplicity):
$\mu = \{\mu_i(y_i), \mu_{ij}(y_i, y_j)\} \in M_L$ iff
(1) $\mu_{ij}(y_i, y_j) \ge 0$
(2) $\sum_{y_i} \mu_i(y_i) = 1$
(3) $\sum_{y_i} \mu_{ij}(y_i, y_j) = \mu_j(y_j)$, $\sum_{y_j} \mu_{ij}(y_i, y_j) = \mu_i(y_i)$
• clearly any $\mu \in M$ also belongs to $M_L$, so $M \subseteq M_L$
• any $\mu \in M_L$ gives rise to a tree distribution
    $P(y) = \prod_i \mu_i(y_i) \prod_{(i,j)\in E} \frac{\mu_{ij}(y_i, y_j)}{\mu_i(y_i)\, \mu_j(y_j)}$
such that $\sum_y P(y)\, \phi(y) = \mu$, so $M_L \subseteq M$
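The second step can be checked numerically. The sketch below (locally consistent marginals chosen by hand for illustration) builds the tree distribution above for the chain and verifies that it sums to one and reproduces a chosen edge marginal.

```python
import itertools
import numpy as np

# Locally consistent marginals for y1 - y2 - y3 (illustrative values).
mu1 = np.array([0.3, 0.7])
mu12 = np.array([[0.2, 0.1],             # rows: y1, cols: y2
                 [0.3, 0.4]])
mu2 = mu12.sum(axis=0)                   # (0.5, 0.5), consistent with mu12
mu23 = np.array([[0.1, 0.4],             # rows: y2, cols: y3
                 [0.2, 0.3]])
mu3 = mu23.sum(axis=0)

def P(y1, y2, y3):
    """Tree distribution: product of node marginals times edge ratios."""
    return (mu1[y1] * mu2[y2] * mu3[y3]
            * mu12[y1, y2] / (mu1[y1] * mu2[y2])
            * mu23[y2, y3] / (mu2[y2] * mu3[y3]))

joint = {y: P(*y) for y in itertools.product((0, 1), repeat=3)}
print("sums to one:", np.isclose(sum(joint.values()), 1.0))
edge_marg = sum(p for (a, b, _), p in joint.items() if (a, b) == (0, 1))
print("recovers mu12(0,1):", np.isclose(edge_marg, mu12[0, 1]))
```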
Excluding vertexes (for trees)
• For trees, the pairwise relaxation (1)–(3) suffices to define $M$
• What if we needed to exclude a specific (e.g., the maximizing) vertex $\phi(y^*)$?
• Add one more linear constraint (Fromer et al. ’09):
    (4) $\sum_i (1 - d_i)\, \mu_i(y_i^*) + \sum_{(i,j)\in E} \mu_{ij}(y_i^*, y_j^*) \le 0$
where $d_i$ is the degree of node $i$.
- if $\mu = \phi(y^*)$: the left-hand side equals 1 $\Rightarrow$ violated!
- if $\mu$ is a neighbor of $\phi(y^*)$ (one coordinate flipped): the left-hand side equals 0 $\Rightarrow$ tight!
The constrained polytope $M(y^*)$ thus retains every integral vertex except $\phi(y^*)$.
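The two evaluations above are a one-line computation. This sketch (a chain of three nodes; the helper name `constraint4` is hypothetical) evaluates constraint (4) at an integral vertex and at one of its neighbors.

```python
def constraint4(y_star, y, degrees, edges):
    """LHS of (4) evaluated at the integral vertex mu = phi(y)."""
    node_term = sum((1 - degrees[i]) * (y[i] == y_star[i])
                    for i in range(len(y)))
    edge_term = sum((y[i], y[j]) == (y_star[i], y_star[j])
                    for (i, j) in edges)
    return node_term + edge_term

edges, degrees = [(0, 1), (1, 2)], [1, 2, 1]     # the chain y1 - y2 - y3
y_star = (0, 0, 0)
print(constraint4(y_star, (0, 0, 0), degrees, edges))   # 1 -> violated
print(constraint4(y_star, (1, 0, 0), degrees, edges))   # 0 -> tight
```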
The spanning tree polytope
• Suppose our MAP problem involves finding the maximum weight spanning tree
    $y_{ij} = 1$ if edge $(i, j)$ is selected and zero otherwise
    $\theta_T(y) = \begin{cases} 0, & \text{if } y \text{ specifies a spanning tree} \\ -\infty, & \text{otherwise} \end{cases}$
    $\max_y \sum_{(i,j)} y_{ij}\, \theta_{ij} + \theta_T(y) = \max_{\mu \in M} \theta \cdot \mu$
$\mu \in M$ iff
a) $\mu_{ij} \ge 0$
b) $\sum_{(i,j)} \mu_{ij} = n - 1$
c) $\sum_{(i,j) \in E(S)} \mu_{ij} \le |S| - 1$, $\forall S \subset V$
• There are exponentially many constraints!
• Polynomial description exists via lift & project (e.g., Yannakakis ’90)
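Note that the exponential facet count does not make the integral problem hard: a greedy Kruskal-style sweep, sketched below, optimizes over the polytope’s vertexes directly (the function and its signature are illustrative).

```python
def max_spanning_tree(n, weighted_edges):
    """weighted_edges: list of (weight, i, j); returns the selected edges."""
    parent = list(range(n))                  # union-find forest
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x
    tree = []
    for w, i, j in sorted(weighted_edges, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:                         # adding (i, j) keeps y acyclic
            parent[ri] = rj
            tree.append((i, j))
    return tree

print(max_spanning_tree(4, [(3.0, 0, 1), (1.0, 1, 2),
                            (2.0, 2, 3), (2.5, 0, 2)]))
```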
The acyclic subgraph polytope
• The problem is considerably harder if we consider arc selections that form acyclic graphs (cf. structure learning)
    $y_{ij} = 1$ if arc $i \to j$ is selected and zero otherwise
    $\theta_{DAG}(y) = \begin{cases} 0, & \text{if } y \text{ specifies a DAG} \\ -\infty, & \text{otherwise} \end{cases}$
    $\max_y \sum_{i,j} y_{ij}\, \theta_{ij} + \theta_{DAG}(y) = \max_{\mu \in M} \theta \cdot \mu$
• Many but not all of the facet defining inequalities are known (e.g., Grötschel et al. ’85)
    $\mu_{ij} \ge 0$,  $\sum_{i \to j \in C} \mu_{ij} \le |C| - 1$, $\forall$ cycles $C$    (“cycle inequalities”)
Trees vs. non-trees
[figure: left, the chain $y_1 - y_2 - y_3$, for which $M = M_L$; right, the triangle $y_1, y_2, y_3$, for which $M' \subset M_L'$]
• For the tree, the local constraints (1)–(3) define the marginal polytope exactly: $M = M_L$
• For the cycle, $M' \subset M_L'$: the integral vertexes $\phi(y)$ of $M'$ remain vertexes of $M_L'$, but $M_L'$ also gains fractional vertexes $\mu'$
• As a result, the LP over $M_L'$ may be maximized at a fractional vertex for some $\theta$
Fractional vertexes
• Example on the triangle: take $\mu_i = (0.5, 0.5)$ at every node, with each edge marginal in either
    “agreement form”  $\begin{bmatrix} 0.5 & 0 \\ 0 & 0.5 \end{bmatrix}$   or   “disagreement form”  $\begin{bmatrix} 0 & 0.5 \\ 0.5 & 0 \end{bmatrix}$
• With an odd number of disagreement edges around the cycle (e.g., all three, or exactly one), these satisfy (1)–(3) but no distribution can generate these marginals (cf. Deza & Laurent ’96): $\mu'$ is a fractional vertex of $M_L'$
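This can be confirmed mechanically: asking whether any joint distribution over $\{0,1\}^3$ matches the three disagreement-form edge marginals is itself a small LP feasibility problem. A sketch (assuming SciPy; the all-disagreement case from the slide):

```python
import itertools
import numpy as np
from scipy.optimize import linprog

assignments = list(itertools.product((0, 1), repeat=3))
edges = [(0, 1), (1, 2), (0, 2)]
disagree = {(0, 0): 0.0, (1, 1): 0.0, (0, 1): 0.5, (1, 0): 0.5}

A, b = [], []
A.append([1.0] * len(assignments))       # p must sum to one
b.append(1.0)
for (i, j) in edges:                     # p must match every edge marginal
    for st, target in disagree.items():
        A.append([1.0 if (y[i], y[j]) == st else 0.0 for y in assignments])
        b.append(target)

res = linprog(np.zeros(len(assignments)), A_eq=np.array(A),
              b_eq=np.array(b), bounds=(0, 1))
print("realizable by a distribution?", res.success)   # False: infeasible
```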
Persistency
• Theorem: in pairwise MRF models, any partially integral solution resulting from the pairwise relaxation can be extended to an optimal solution, provided that $\mu_i(y_i) > 0$ $\forall y_i$ holds for all neighbors of integral nodes
(cf. Nemhauser et al. ’75, Kolmogorov et al. ’05, Weiss et al. ’08)
[figure: a model where some nodes have integral marginals (0, 1) or (1, 0) while their neighbors have fractional marginals (0.5, 0.5); pairwise marginals not shown]
Preventing fractional vertexes
• Structural constraints
- e.g., trees, planar, perfect graphs (Jebara ’10)
• Parameter restrictions
- e.g., attractive potentials
• Tighter relaxations
- e.g., Gomory, Sherali-Adams (’90), Lovász and Schrijver (’91), Lasserre (’01)

Parameter restrictions
• In the pairwise relaxation, a fractional vertex has to have at least one pairwise marginal in a “disagreement” form.
• We can eliminate such marginals from the maximizing set by constraining the parameters so that
    $\theta_{ij}(1, 1) + \theta_{ij}(0, 0) \ge \theta_{ij}(1, 0) + \theta_{ij}(0, 1)$
(a.k.a. attractive potentials)
Tighter relaxations
• We can iteratively add additional constraints to remove fractional vertexes
[figure: cutting-plane iteration in the space of marginal probabilities — solve over the pairwise constraints, find a constraint violated by the current solution, add it, and re-solve for a new solution]
    $M \subseteq M^{(k)} \subset \dots \subset M^{(1)} \subset M_L$
• What are the additional constraints? How to find them?
- lifting: extend $\mu$ with auxiliary variables $\tau$ on which simpler constraints can be stated; projection maps $(\mu, \tau)$ back to $\mu$
Tighter relaxations via lifting
• We can eliminate some fractional vertexes by requiring that any triplet of pairwise marginals come from a valid distribution
• Keep (1)–(3) and add the new constraints
    (4) $\tau_{ijk}(y_i, y_j, y_k) \ge 0$,  $\sum_{y_i, y_j, y_k} \tau_{ijk}(y_i, y_j, y_k) = 1$
    (5) $\sum_{y_k} \tau_{ijk}(y_i, y_j, y_k) = \mu_{ij}(y_i, y_j)$
        $\sum_{y_j} \tau_{ijk}(y_i, y_j, y_k) = \mu_{ik}(y_i, y_k)$
        $\sum_{y_i} \tau_{ijk}(y_i, y_j, y_k) = \mu_{jk}(y_j, y_k)$
• The new constraints cut off the current fractional solution and the LP moves to a new solution
Dual decomposition
• There are many possible ways of preventing fractional vertexes, including
- structural constraints: e.g., trees, planar, perfect graphs (Jebara ’10)
- parameter restrictions: e.g., attractive potentials
- tighter relaxations: e.g., Gomory, Sherali-Adams (’90), Lovász and Schrijver (’91), Lasserre (’01)
• These are key ingredients in decomposition methods that break the original problem into exactly solvable sub-problems (cf. Guignard, Fisher, ’80s)
- e.g., Wainwright et al. ’02+, Johnson et al. ’07, Komodakis ’07+, Sontag et al. ’08+, etc.
Decomposition
• We can always break the original model into pieces that would be exactly solvable in isolation
    $\sum_{(i,j)\in E} \theta_{ij}(y_i, y_j) = \sum_{f \in F} \theta_f(y_f)$
[figure: a pairwise model over $y_1, \dots, y_5$ split into overlapping tractable components $\theta_f(y_f)$, $\theta_g(y_g)$]
• Adding explicit single node terms $\theta_i(y_i)$ (e.g., $\theta_1(y_1)$) gives
    $\sum_i \theta_i(y_i) + \sum_{f \in F} \theta_f(y_f) = \theta \cdot \phi(y)$
Decomposition, LP relaxation
• The decomposition leads to a natural component-wise LP relaxation:
    $\max_y \sum_i \theta_i(y_i) + \sum_{f \in F} \theta_f(y_f) = \max_{\mu \in M} \sum_i \sum_{y_i} \mu_i(y_i)\, \theta_i(y_i) + \sum_{f \in F} \sum_{y_f} \mu_f(y_f)\, \theta_f(y_f)$
      $\le \max_{\mu \in M_L} \theta \cdot \mu$
where $\mu = \{\mu_i(y_i), \mu_f(y_f)\} \in M_L$ iff
    $\mu_i(y_i) \ge 0$, $\mu_f(y_f) \ge 0$
    $\sum_{y_i} \mu_i(y_i) = 1$
    $\sum_{y_f \setminus i} \mu_f(y_f) = \mu_i(y_i)$, if $i \in f$
Dual decomposition
• We can think of the relaxation as enforcing that the components agree about the maximizing assignment:
    $\max_y \sum_i \theta_i(y_i) + \sum_{f \in F} \theta_f(y_f) \le \sum_i \max_{y_i} \theta_i(y_i) + \sum_{f \in F} \max_{y_f} \theta_f(y_f)$
• Agreements can be enforced via Lagrange multipliers $\delta_{fi}(y_i)$ associated with the equality constraints, giving the bound
    $\sum_i \max_{y_i} \Big[ \theta_i(y_i) + \sum_{f: i \in f} \delta_{fi}(y_i) \Big] + \sum_{f \in F} \max_{y_f} \Big[ \theta_f(y_f) - \sum_{i \in f} \delta_{fi}(y_i) \Big]$
• The Lagrange multipliers re-parameterize the model: they shift score between the single node terms and the components without changing the total $\theta \cdot \phi(y)$
Dual decomposition
• We minimize the dual objective (the right-hand side of)
    $\max_y \sum_i \theta_i(y_i) + \sum_{f \in F} \theta_f(y_f) \le \sum_i \max_{y_i} \Big[ \theta_i(y_i) + \sum_{f: i \in f} \delta_{fi}(y_i) \Big] + \sum_{f \in F} \max_{y_f} \Big[ \theta_f(y_f) - \sum_{i \in f} \delta_{fi}(y_i) \Big]$
with respect to the Lagrange multipliers associated with the equality constraints
- this is an unconstrained convex minimization problem (cf. constrained primal maximization)
- minimization seeks to enforce agreements among the sub-problems
- a number of distributed algorithms have been developed for this purpose (e.g., Werner ’07, Globerson et al. ’08) (2nd part)
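A minimal subgradient sketch of this minimization for a binary chain split into edge factors (the step-size schedule, iteration count, and data structures are illustrative; the second part covers this family of algorithms properly). Since the chain is a tree the relaxation is tight, so the sub-problems are expected to reach agreement.

```python
import numpy as np

rng = np.random.default_rng(1)
n, factors = 3, [(0, 1), (1, 2)]                  # chain split into edges
theta_i = rng.normal(size=(n, 2))                 # node potentials
theta_f = {f: rng.normal(size=(2, 2)) for f in factors}
delta = {(f, i): np.zeros(2) for f in factors for i in f}

for t in range(200):
    # Solve each sub-problem independently under the current multipliers.
    y_i = [int(np.argmax(theta_i[i] + sum(delta[(f, i)]
               for f in factors if i in f))) for i in range(n)]
    y_f = {}
    for f in factors:
        i, j = f
        scores = theta_f[f] - delta[(f, i)][:, None] - delta[(f, j)][None, :]
        y_f[f] = np.unravel_index(np.argmax(scores), (2, 2))
    # Subgradient step on each disagreement: make the node's pick less
    # attractive at the node term, the factor's pick more attractive.
    step, agree = 1.0 / (t + 1), True
    for f in factors:
        for k, i in enumerate(f):
            if y_f[f][k] != y_i[i]:
                agree = False
                delta[(f, i)][y_i[i]] -= step
                delta[(f, i)][y_f[f][k]] += step
    if agree:
        print("components agree -> certificate of optimality:", y_i)
        break
```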
Weak duality
• For any setting of the multipliers, the dual upper bounds the primal:
    dual $= \sum_i \max_{y_i} \Big[ \theta_i(y_i) + \sum_{f: i \in f} \delta_{fi}(y_i) \Big] + \sum_{f \in F} \max_{y_f} \Big[ \theta_f(y_f) - \sum_{i \in f} \delta_{fi}(y_i) \Big]$
    $\ge \sum_i \sum_{y_i} \mu_i(y_i) \Big[ \theta_i(y_i) + \sum_{f: i \in f} \delta_{fi}(y_i) \Big] + \sum_{f \in F} \sum_{y_f} \mu_f(y_f) \Big[ \theta_f(y_f) - \sum_{i \in f} \delta_{fi}(y_i) \Big]$
for any $\mu$ satisfying the primal constraints
    $\mu_i(y_i) \ge 0$, $\sum_{y_i} \mu_i(y_i) = 1$;  $\mu_f(y_f) \ge 0$, $\sum_{y_f} \mu_f(y_f) = 1$;  $\sum_{y_f \setminus i} \mu_f(y_f) = \mu_i(y_i)$, $\forall i \in f$
• Using the consistency constraints, the multiplier terms cancel, leaving
    dual $\ge \sum_i \sum_{y_i} \mu_i(y_i)\, \theta_i(y_i) + \sum_{f \in F} \sum_{y_f} \mu_f(y_f)\, \theta_f(y_f) =$ primal
Strong duality
• Min dual value = max primal value:
    $\sum_i \max_{y_i} \Big[ \theta_i(y_i) + \sum_{f: i \in f} \delta_{fi}(y_i) \Big] + \sum_{f \in F} \max_{y_f} \Big[ \theta_f(y_f) - \sum_{i \in f} \delta_{fi}(y_i) \Big]$
    $= \sum_i \sum_{y_i} \mu_i(y_i) \Big[ \theta_i(y_i) + \sum_{f: i \in f} \delta_{fi}(y_i) \Big] + \sum_{f \in F} \sum_{y_f} \mu_f(y_f) \Big[ \theta_f(y_f) - \sum_{i \in f} \delta_{fi}(y_i) \Big]$
• At the primal-dual optimum, equality follows from complementary slackness:
    $\mu_i(y_i) > 0 \Rightarrow y_i \in \arg\max_{y_i} \theta_i(y_i) + \sum_{f: i \in f} \delta_{fi}(y_i)$
    $\mu_f(y_f) > 0 \Rightarrow y_f \in \arg\max_{y_f} \theta_f(y_f) - \sum_{i \in f} \delta_{fi}(y_i)$
Dual properties
• The dual objective upper bounds the MAP value:
    $\max_y \sum_i \theta_i(y_i) + \sum_{f \in F} \theta_f(y_f) \le \sum_i \max_{y_i} \Big[ \theta_i(y_i) + \sum_{f: i \in f} \delta_{fi}(y_i) \Big] + \sum_{f \in F} \max_{y_f} \Big[ \theta_f(y_f) - \sum_{i \in f} \delta_{fi}(y_i) \Big]$
• Suppose we find a maximizing assignment $\hat{y}$ that is the same for all the terms:
    $\hat{y}_i \in \arg\max_{y_i} \theta_i(y_i) + \sum_{f: i \in f} \delta_{fi}(y_i)$   $\forall i$
    $\hat{y}_f \in \arg\max_{y_f} \theta_f(y_f) - \sum_{i \in f} \delta_{fi}(y_i)$   $\forall f \in F$
• Then the upper bound evaluates to
    $\sum_i \Big[ \theta_i(\hat{y}_i) + \sum_{f: i \in f} \delta_{fi}(\hat{y}_i) \Big] + \sum_{f \in F} \Big[ \theta_f(\hat{y}_f) - \sum_{i \in f} \delta_{fi}(\hat{y}_i) \Big]$
and the multiplier terms cancel (reparameterization), so the bound equals
    $\sum_i \theta_i(\hat{y}_i) + \sum_{f \in F} \theta_f(\hat{y}_f)$  $\Rightarrow$ $\hat{y}$ is a MAP assignment!
Dual properties
• We minimize the dual objective with respect to the Lagrange multipliers associated with the equality constraints
• Theorem (certificate): if the components agree about an assignment, the assignment is optimal
• We can obtain such an agreement only if the corresponding LP relaxation is tight and the Lagrange multipliers are set optimally
Dual decomposition: summary
    $\max_y \sum_i \theta_i(y_i) + \sum_{f \in F} \theta_f(y_f) \le \sum_i \max_{y_i} \Big[ \theta_i(y_i) + \sum_{f: i \in f} \delta_{fi}(y_i) \Big] + \sum_{f \in F} \max_{y_f} \Big[ \theta_f(y_f) - \sum_{i \in f} \delta_{fi}(y_i) \Big]$
• we try to solve the MAP problem by encouraging exactly solvable sub-problems to agree
• the dual objective is an unconstrained convex optimization problem (cf. primal)
• certificate of optimality = primal integrality
• the dual value always upper bounds the MAP value (cf. learning)
• dual value can guide search for tighter relaxations (e.g., Sontag et al. ’08)
[figure: dual value decreasing and primal value increasing over time, sandwiching the LP value and the MAP value]