Submodular-Bregman and the Lovász-Bregman Divergences with Applications
Rishabh Iyer and Jeff Bilmes
University of Washington, Seattle
Overview
- Introduce the Submodular Bregman and Lovász Bregman divergences.
- They subsume many known discrete divergences (e.g., Hamming distance, conditional mutual information, etc.).
- Show many useful properties of these divergences.
- Discuss two applications: a proximal framework for submodular optimization, and a Lovász Bregman clustering framework.
The Submodular Semi-Differentials

Let f : 2^V → R be a submodular function on a ground set V, and write f(j | S) = f(S ∪ {j}) − f(S) for the marginal gain.

- Subdifferential: ∂_f(Y) = {y ∈ R^V : f(Y) − y(Y) ≤ f(X) − y(X) for all X ⊆ V}.
- Extreme subgradients come from the greedy procedure: take a permutation σ_Y of the ground set V whose first |Y| elements are Y, i.e., S_j = {σ_Y(1), . . . , σ_Y(j)} and S_{|Y|} = Y; then

  h_{Y,σ_Y}(σ_Y(j)) = f(S_j) − f(S_{j−1}).

- Superdifferential: ∂^f(Y) = {y ∈ R^V : f(Y) − y(Y) ≥ f(X) − y(X) for all X ⊆ V}.
- There is no simple characterization of the extreme points of the superdifferential. However, there exist three efficiently computable supergradients at X:

  g^1_X(j) = f(j | X \ j) if j ∈ X,   g^1_X(j) = f(j) if j ∉ X
  g^2_X(j) = f(j | V \ j) if j ∈ X,   g^2_X(j) = f(j | X) if j ∉ X
  g^3_X(j) = f(j | V \ j) if j ∈ X,   g^3_X(j) = f(j) if j ∉ X

- Define H_f and G^f as subgradient and supergradient maps, such that H_f(Y) = h_Y ∈ ∂_f(Y) and G^f(Y) = g_Y ∈ ∂^f(Y). A minimal computational sketch of these objects follows below.
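The following is a minimal sketch (ours, not from the poster) of these objects for one concrete submodular function, a weighted coverage function; the names f, gain, extreme_subgradient, and supergradients are our own.

```python
from itertools import chain

# Toy weighted-coverage function (submodular): f(S) = w(union of SETS[j], j in S).
SETS = [{0, 1}, {1, 2}, {2, 3}, {3, 4}]
W = {0: 1.0, 1: 2.0, 2: 1.5, 3: 0.5, 4: 1.0}
V = frozenset(range(len(SETS)))

def f(S):
    return sum(W[e] for e in set(chain.from_iterable(SETS[j] for j in S)))

def gain(j, S):
    """Marginal gain f(j | S) = f(S + j) - f(S), for j not in S."""
    return f(S | {j}) - f(S)

def extreme_subgradient(Y, sigma):
    """h_{Y,sigma}: greedy gains along a permutation whose first |Y| items are Y."""
    assert set(sigma[:len(Y)]) == set(Y)
    h, S = {}, frozenset()
    for j in sigma:
        h[j] = gain(j, S)
        S = S | {j}
    return h

def supergradients(X):
    """The three efficiently computable supergradients g^1, g^2, g^3 at X."""
    g1, g2, g3 = {}, {}, {}
    for j in V:
        if j in X:
            g1[j] = gain(j, X - {j})
            g2[j] = g3[j] = gain(j, V - {j})
        else:
            g1[j] = g3[j] = gain(j, frozenset())
            g2[j] = gain(j, X)
    return g1, g2, g3
```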
The Bregman divergences

- The generalized Bregman divergences (GB): a generalization of Bregman divergences to non-differentiable convex functions, defined via a subgradient map H_φ, which is such that H_φ(y) = h_y ∈ ∂φ(y):

  d_φ^{H_φ}(x, y) = φ(x) − φ(y) − ⟨H_φ(y), x − y⟩, ∀x, y ∈ S.  (1)

- The Lovász Bregman divergence (LB): the special case of the GBs with φ = f̂, the Lovász extension of a submodular function f. It is defined via the subgradient h_{y,σ_y}, where σ_y orders the elements of y in decreasing value:

  h_{y,σ_y}(σ_y(k)) = f(Y_k) − f(Y_{k−1}), ∀k, with Y_i = {σ_y(1), · · · , σ_y(i)}.

  Since f̂(y) = ⟨h_{y,σ_y}, y⟩, the Lovász Bregman divergence (LBD) simplifies to

  d_f̂(x, y) = f̂(x) − ⟨h_{y,σ_y}, x⟩.  (2)

  A minimal sketch of the LBD follows below.
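Below is a small sketch (our code) of the LBD; f is any submodular set-function oracle over frozensets (such as the coverage function above), and y, x are 1-D numpy arrays.

```python
import numpy as np

def lovasz_subgradient(y, f):
    """h_{y,sigma_y}: greedy gains of f along the decreasing order of y."""
    sigma = np.argsort(-y)                 # sigma_y: sort y in decreasing value
    h = np.zeros(len(y))
    prev, S = 0.0, set()
    for k in sigma:
        S.add(int(k))
        cur = f(frozenset(S))
        h[k] = cur - prev                  # h(sigma_y(k)) = f(Y_k) - f(Y_{k-1})
        prev = cur
    return h

def lovasz_extension(x, f):
    """hat f(x) = <h_{x,sigma_x}, x> (Edmonds' greedy)."""
    return float(lovasz_subgradient(x, f) @ x)

def lovasz_bregman(x, y, f):
    """Eq. (2): d(x, y) = hat f(x) - <h_{y,sigma_y}, x>."""
    return lovasz_extension(x, f) - float(lovasz_subgradient(y, f) @ x)
```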
The submodular Bregman (SB) divergences

Lower bound SBs (from subgradients):
- The gen. lower bound SB (GLSB):

  d_f^{H_f}(X, Y) = f(X) − f(Y) − ⟨H_f(Y), 1_X − 1_Y⟩.  (3)

- The perm. lower bound SB (PLSB): the special case of the GLSB when the subgradient is an extreme point h_{Y,σ_Y}. Since h_{Y,σ_Y}(Y) = f(Y),

  d_f^Σ(X, Y) = f(X) − h_{Y,σ_Y}(X) = f(X) − ⟨H_f^Σ(Y), 1_X⟩.  (4)

- The extreme lower bound SB (ELSB): choosing the extreme subgradients argmax_{h∈∂_f(Y)} ⟨h, 1_X − 1_Y⟩ and argmin_{h∈∂_f(Y)} ⟨h, 1_X − 1_Y⟩ gives, respectively,

  d_f^♯(X, Y) = f(X) + f(Y) − f(X ∩ Y) − f(X ∪ Y),  (5)
  d_f^♭(X, Y) = f(X) − f(Y) + f(Y \ X) + f(V \ (X \ Y)) − f(V).  (6)

Upper bound SBs (from supergradients):
- The gen. upper bound SB (GUSB):

  d_f^{G^f}(X, Y) = f(X) − f(Y) − ⟨G^f(X), 1_X − 1_Y⟩.  (7)

- The Nemhauser upper bound SBs (NUSB): special cases of the GUSB, with g^1_X, g^2_X and g^3_X:

  d^1_f(X, Y) ≜ f(X) − Σ_{j∈X\Y} f(j | X − {j}) + Σ_{j∈Y\X} f(j | ∅) − f(Y),  (8)
  d^2_f(X, Y) ≜ f(X) − Σ_{j∈X\Y} f(j | V − {j}) + Σ_{j∈Y\X} f(j | X) − f(Y),  (9)
  d^3_f(X, Y) ≜ f(X) − Σ_{j∈X\Y} f(j | V − {j}) + Σ_{j∈Y\X} f(j | ∅) − f(Y).  (10)

- The extreme upper bound SBs (EUSB): similar to the ELSB, we can define analogs for the upper bound SBs from the Nemhauser upper bounds:

  d_f^♯(X, Y) ≜ f(X) − Σ_{j∈X\Y} f(j | X − {j}) + Σ_{j∈Y\X} f(j | X ∩ Y) − f(Y),  (11)
  d_f^♭(X, Y) ≜ f(X) − Σ_{j∈X\Y} f(j | X ∪ Y − {j}) + Σ_{j∈Y\X} f(j | X) − f(Y).  (12)

A small computational sketch of these divergences follows below.
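The closed forms (5) and (8)–(11) are direct to compute from a set-function oracle. In this sketch (ours), F is any submodular oracle over frozensets, e.g., the coverage function from the first sketch; marg, d_elsb_sharp, d_nemhauser, and d_eusb_sharp are our names.

```python
def marg(F, j, S):
    """Marginal gain f(j | S)."""
    return F(S | {j}) - F(S)

def d_elsb_sharp(F, X, Y):
    """ELSB (5): f(X) + f(Y) - f(X & Y) - f(X | Y)."""
    return F(X) + F(Y) - F(X & Y) - F(X | Y)

def d_nemhauser(F, X, Y, V, which=1):
    """NUSB (8)-(10), induced by the supergradients g^1, g^2, g^3 at X."""
    if which == 1:
        neg = sum(marg(F, j, X - {j}) for j in X - Y)
        pos = sum(marg(F, j, frozenset()) for j in Y - X)
    elif which == 2:
        neg = sum(marg(F, j, V - {j}) for j in X - Y)
        pos = sum(marg(F, j, X) for j in Y - X)
    else:
        neg = sum(marg(F, j, V - {j}) for j in X - Y)
        pos = sum(marg(F, j, frozenset()) for j in Y - X)
    return F(X) - neg + pos - F(Y)

def d_eusb_sharp(F, X, Y):
    """EUSB (11)."""
    return (F(X) - sum(marg(F, j, X - {j}) for j in X - Y)
            + sum(marg(F, j, X & Y) for j in Y - X) - F(Y))
```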
Table: Instances of weighted divergences as special cases of d_f^{H_f} and d_f^{G^f}, w ∈ R^n_+.

  Name          | Type       | d(X, Y)                                  | f(X)           | H_f(Y) / G^f(X)
  Hamming       | d_f^{H_f}  | w(X\Y) + w(Y\X)                          | w(X)           | h_Y = 2 w 1_Y
  Hamming       | d_f^{G^f}  | w(X\Y) + w(Y\X)                          | −w(X)          | g_X = −2 w 1_X
  Recall        | d_f^{H_f}  | 1 − w(X∩Y)/w(Y)                          | 1              | h_Y = w 1_Y / w(Y)
  Precision     | d_f^{G^f}  | 1 − w(X∩Y)/w(X)                          | −1             | g_X = −w 1_X / w(X)
  AER           | d_f^{H_f}  | AER(Y, X; Y) = 1 − (|Y| + |Y∩X|)/(2|Y|)  | 1/2            | h_Y = 1_Y / (2|Y|)
  Cond. MI      | d_f^♯      | I(X_{X\Y}; X_{Y\X} | X_{X∩Y})            | H(X_X)         | extreme subgradient
  Itakura-Saito | d_f^{G^f}  | w(Y)/w(X) − log(w(Y)/w(X)) − 1           | log w(X)       | g_X = w / w(X)
  Gen. KL       | d_f^{G^f}  | w(Y) log(w(Y)/w(X)) − w(Y) + w(X)        | −w(X) log w(X) | g_X = −w(1 + log w(X))
Theorem
For a submodular function f, d^3_f(X, Y) ≥ d^1_f(X, Y) ≥ d^♯_f(X, Y) and d^3_f(X, Y) ≥ d^2_f(X, Y) ≥ d^♭_f(X, Y), where d^♯_f, d^♭_f are the EUSBs (11), (12). Similarly, for every permutation map Σ, d^♯_f(X, Y) ≤ d^Σ_f(X, Y) ≤ d^♭_f(X, Y), where d^♯_f, d^♭_f are the ELSBs (5), (6). Also, the Lovász Bregman divergence is a continuous extension of the PLSB.
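The first chain of inequalities is easy to sanity-check numerically. This snippet (ours) reuses d_nemhauser and d_eusb_sharp from the sketch after the SB definitions, with a small unweighted coverage oracle:

```python
import random
from itertools import chain

SETS2 = [{0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 0}]
F = lambda S: len(set(chain.from_iterable(SETS2[j] for j in S)))
V = frozenset(range(len(SETS2)))

random.seed(0)
for _ in range(200):
    X = frozenset(random.sample(sorted(V), random.randint(0, len(V))))
    Y = frozenset(random.sample(sorted(V), random.randint(0, len(V))))
    d3 = d_nemhauser(F, X, Y, V, which=3)
    d1 = d_nemhauser(F, X, Y, V, which=1)
    assert d3 + 1e-9 >= d1 >= d_eusb_sharp(F, X, Y) - 1e-9
```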
Properties of SB and LB

Submodular Bregman (SB):
- The SBs are non-negative.
- The GLSB is submodular in its first argument, while the GUSB is supermodular in its second argument.
- d ∈ d^{H_f} iff, for all A, B ⊆ V, d(X, A) is submodular in X and d(X, A) − d(X, B) is modular in X.
- d ∈ d^{G^f} iff, for all A, B ⊆ V, d(A, Y) is supermodular in Y and d(A, Y) − d(B, Y) is modular in Y.
- Other properties: equivalence classes, set separation, a generalized triangle inequality over sets, and forms of Fenchel and submodular duality. More properties are given in the paper.

Lovász Bregman (LB):
- Non-negative and convex in the first argument.
- Given a submodular function whose polyhedron contains all extreme points (e.g., f(X) = √|X|), d_f̂(x, y) = 0 if and only if σ_x = σ_y.
- The LB divergences not only capture the distance between σ_x and σ_y, but also weigh it with the value of x. Hence we shall also write d_f̂(x, y) = d_f̂(x || σ_y).
- Easily amenable to a k-means style algorithm for clustering ranked data, which is difficult using other permutation-based distances.
Proximal framework for submodular optimization

  Initialize X^0; t ← 0
  repeat
      X^{t+1} := argmin_{X ∈ S} F(X) + λ d(X, X^t)
      t ← t + 1
  until convergence

The algorithm above subsumes a large number of combinatorial optimization algorithms.

Submodular function minimization:
- Set d(X, X^t) = d^1_f(X^t, X), d(X, X^t) = d^2_f(X^t, X), or d(X, X^t) = d^3_f(X^t, X), with λ = 1; each proximal step then reduces to minimizing a modular function (see the sketch after this list).
- This provides improved bounds on the contraction of the lattice of minimizers.
- In the context of constrained minimization under combinatorial constraints (like spanning trees or perfect matchings), setting d(X, X^t) = d^2_f(X^t, X) provides a number of improved and tight curvature-dependent bounds.

Submodular maximization:
- In this context F(X) = −f(X) for some submodular function f, and we set d(X, X^t) = d^Σ_f(X, X^t), obtaining an iterative algorithm for maximizing a submodular function.
- This subsumes a large class of approximation algorithms, including the 1/2 and 1 − 1/e approximations for unconstrained non-monotone and cardinality-constrained monotone submodular maximization.
- It also provides improved curvature-dependent bounds.

Minimizing the difference between submodular functions:
- This problem is both NP-hard and NP-hard to approximate.
- However, for F = f − g with f and g submodular, setting d(X, X^t) = d^Σ_g(X, X^t) + d̃_f(X^t, X), where d̃_f = d^1_f or d̃_f = d^2_f, provides iterative algorithms which, though heuristics, have been shown to work well in practice.
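As one concrete instantiation (our sketch, with our own toy objective), the minimization variant with d(X, X^t) = d^1_f(X^t, X) and λ = 1 makes each proximal step a modular minimization, solvable by thresholding the marginal gains:

```python
from itertools import chain

# Toy non-monotone submodular f: coverage minus a modular cost (our choice).
SETS = [{0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 0}]
COST = [0.9, 1.6, 0.8, 1.7, 0.7, 1.5]
V = frozenset(range(len(SETS)))

def f(S):
    return len(set(chain.from_iterable(SETS[j] for j in S))) \
           - sum(COST[j] for j in S)

def marg(j, S):
    return f(S | {j}) - f(S)

X = V  # X^0; the fixed point reached depends on the initialization
for _ in range(100):
    # Proximal step: f(X) + d^1_f(X^t, X) is modular in X, so minimize it by
    # thresholding: keep j in X^t iff f(j | X^t - j) < 0, and add j outside
    # X^t iff f(j | empty set) < 0.
    nxt = frozenset([j for j in X if marg(j, X - {j}) < 0] +
                    [j for j in V - X if marg(j, frozenset()) < 0])
    if nxt == X:
        break
    X = nxt
print("local minimizer:", sorted(X), "f(X) =", round(f(X), 3))
```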
The Lovász Bregman clustering framework

- Given vectors x_1, x_2, · · · , x_n, each inducing a total ordering of its coordinates, cluster them based on their orderings.
- Real-world applications: voter clustering, outputs of classifiers.
- The right notion of a mean is natural here: given a set of scores, find the mean permutation, which has minimum average LB divergence from this set.
- The Lovász Bregman representative, the permutation σ = argmin_{σ'} Σ_{i=1}^n d_f̂(x_i || σ'), is exactly σ_μ, where μ = (1/n) Σ_{i=1}^n x_i.
- We therefore cluster the vectors with a k-means style algorithm, alternating an assignment step (finding the cluster memberships) and a re-estimation step (finding the means); a sketch follows below.

Figure: Results of k-means clustering using the LBD ((a) LB2, (b) LB3) and the Euclidean distance ((c) Euc2, (d) Euc3). YouTube animations at http://youtu.be/kfEnLOmvEVc and http://youtu.be/IqRhemUg14I
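A hedged sketch of the clustering loop (our code; lb_kmeans, h_sigma and all parameters are our names, and f is any submodular oracle whose polyhedron has distinct extreme points, e.g., f(S) = sqrt(|S|)):

```python
import numpy as np

def h_sigma(sigma, f, n):
    """Subgradient h_sigma: greedy gains of f along the permutation sigma."""
    h, prev, S = np.zeros(n), 0.0, set()
    for j in sigma:
        S.add(int(j))
        cur = f(frozenset(S))
        h[j] = cur - prev
        prev = cur
    return h

def lb_kmeans(X, f, k, iters=50, seed=0):
    """k-means with the LBD: each representative is a permutation sigma_mu."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(k, size=len(X))
    n = X.shape[1]
    for _ in range(iters):
        # Re-estimation: the LB representative of a cluster is sigma_mu, the
        # ordering of the cluster mean mu (the result quoted above).
        H = np.stack([
            h_sigma(np.argsort(-X[labels == c].mean(axis=0)), f, n)
            if np.any(labels == c) else np.zeros(n)
            for c in range(k)
        ])
        # Assignment: d(x || sigma) = fhat(x) - <h_sigma, x>, and fhat(x) is
        # the same for every cluster, so pick the cluster maximizing <h_sigma, x>.
        new = np.argmax(X @ H.T, axis=1)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels

# Example: cluster random score vectors by their orderings.
pts = np.random.default_rng(1).random((200, 6))
print(np.bincount(lb_kmeans(pts, lambda S: len(S) ** 0.5, k=3)))
```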