talk - People @ EECS at UC Berkeley

Parallel Double Greedy
Submodular Maxmization
Xinghao Pan, Stefanie Jegelka, Joseph Gonzalez, Joseph Bradley, Michael
I. Jordan
Submodular Set Functions
Diminishing Returns Property
F: 2V → R, such that for all
and
Submodular Examples
Sensing
(Krause & Guestrin 2005)
Document
Summarization
Network Analysis
(Kempe, Kleinberg, Tardos 2003,
Mossel & Roch 2007)
Graph Algorithms
S
clustering
graph partitioning
Max / Min
Cut
(Lin & Bilmes 2011)
Submodular Maximization
Parallel /
Distributed
Sequential
max F(A), A ⊆ V
Monotone (increasing) functions
[ Positive marginal gains ]
Non-monotone functions
Greedy (Nemhauser et al, 1978)
(1-1/e) - approximation
Optimal polytime
Double Greedy (Buchbinder et al, 2012)
½ - approximation
Optimal polytime
GREEDI (Mirzasoleiman et al, 2013)
(1-1/e)2 / p – approximation
1 MapReduce round
Concurrency Control Double Greedy
Optimal ½ - approximation
Bounded overhead
(Kumar et al, 2013)
1 / (2+ε) – approximation
O(1/ε) MapReduce rounds
Coordination Free Double Greedy
Bounded error
Minimal overhead
?
Double Greedy Algorithm
u
Set
A
Set
B
u
v
w
x
y
z
v
w
x
y
z
Double Greedy Algorithm
u
Set
A
Set
B
u
v
w
x
y
z
v
w
x
y
z
Double Greedy Algorithm
v
Set
A
Set
B
u
v
w
x
y
z
u
w
x
y
z
Marginal gains
Δ+(u|A) = F(A ∪ u) – F(A),
Δ-(u|B) = F(B \ u) – F(B).
p( u | A, B ) = + A
-B
0
1
rand
Double Greedy Algorithm
v
Set
A
Set
B
u
v
w
x
y
z
u
w
x
y
z
Marginal gains
Δ+(u|A) = F(A ∪ u) – F(A),
Δ-(u|B) = F(B \ u) – F(B).
p( u | A, B ) = + A
-B
0
1
rand
Double Greedy Algorithm
v
Set
A
u
Set
B
u
v
w
x
y
z
v
w
x
y
z
Marginal gains
Δ+(v|A) = F(A ∪ v) – F(A),
Δ-(v|B) = F(B \ v) – F(B).
p( v | A, B ) = + A
-B
0
1
rand
Double Greedy Algorithm
w
Set
A
u
Set
B
u
w
x
y
z
Return A
x
y
z
Parallel Double Greedy
Algorithm
u
Set
A
Set
B
v
CPU 1
u
v
w
x
y
z
CPU 2
w
x
y
z
Parallel Double Greedy
Algorithm
Set
A
Set
B
?
?
u
v
CPU 1
u
w
y
Δ+(u | ?) = ?
Δ-(u | ?) = ?
u
v
w
x
y
z
CPU 2
v
x
z
Δ+(v | ?) = ?
Δ-(v | ?) = ?
Concurrency Control Double
Greedy
Maintain bounds on A, B  Enable threads to make decisions locally
Set
A
Set
B
CPU 1
u
w
y
v
x
z
u
u
v
w
x
CPU 2
v
y
z
Concurrency Control Double
Greedy
Maintain bounds on A, B  Enable threads to make decisions locally
Set
A
u
v
Set
B
CPU 1
u
u
w
y
Δ+(u|A) ∈ [Δ+min(u|A), Δ+max(u|A)]
Δ-(u|B) ∈ [Δ-min(u|B), Δ-max(u|B) ]
p( u | A, B ) = + A
v
Uncertaint
y
-B
0
1
w
x
CPU 2
v
y
z
x
z
Δ+(v|A) ∈ [Δ+min(v|A), Δ+max(v|A)]
Δ-(v|B) ∈ [Δ-min(v|B), Δ-max(v|B) ]
p( v | A, B ) = + A
0
Uncertaint
y
-B
1
Concurrency Control Double
Greedy
Maintain bounds on A, B  Enable threads to make decisions locally
Set
A
u
v
Set
B
CPU 1
u
u
w
y
Δ+(u|A) ∈ [Δ+min(u|A), Δ+max(u|A)]
Δ-(u|B) ∈ [Δ-min(u|B), Δ-max(u|B) ]
Uncertaint
y
p( u | A, B ) = + A
v
-B
0
1
rand
w
x
CPU 2
v
y
z
x
z
Δ+(v|A) ∈ [Δ+min(v|A), Δ+max(v|A)]
Δ-(v|B) ∈ [Δ-min(v|B), Δ-max(v|B) ]
p( v | A, B ) = + A
0
Uncertaint
y
-B
1
Concurrency Control Double
Greedy
Maintain bounds on A, B  Enable threads to make decisions locally
Set
A
u
v
Set
B
CPU 1
w
y
Δ+(u|A) ∈ [Δ+min(u|A), Δ+max(u|A)]
Δ-(u|B) ∈ [Δ-min(u|B), Δ-max(u|B) ]
u
Uncertaint
y
p( u | A, B ) = + A
v
Common,
Fast
-B
0
1
rand
w
x
CPU 2
v
y
z
x
Rare,
Slow
z
Δ+(v|A) =
∈ F(A
[Δ+min
∪(v|A),
v) – F(A)
Δ+max(v|A)]
Δ-(v|B) =
∈ F(B
[Δ-min(v|B),
\ v) – F(B)
Δ-max(v|B) ]
p( v | A, B ) = + A
Uncertaint
y
0
-B
1
rand
Theorem: CC double greedy is serializable.
Corollary: CC double greedy preserves optimal
approximation guarantee of ½OPT.
Lemma: CC has bounded overhead.
Expected number of blocked elements
Concurren
cy
Correctness
Properties of CC Double
Greedy
set cover with costs:
< 2τ
sparse max cut:
< 2τ |E| / |V|
τ
Set A
Set B
u
u
w
w
x
x
D
y
y
E
z
Empirical Validation
Multicore up to 16 threads
Set cover, Max graph cut
Real and synthetic graphs
Graph
Vertices
Edges
IT-2004
Italian web-graph
41 Million
1.1 Billion
UK-2005
UK web-graph
39 Million
0.9 Billion
Arabic-2005
Arabic web-graph
22 Million
0.6 Billion
Friendster
Social sub-network
10 Million
0.6 Billion
Erdos-Renyi
Synthetic random
20 Million
2.0 Billion
ZigZag
Synthetic expander
25 Million
2.0 Billion
CC Double Greedy
Coordination
Only 0.01%
needs
blocking!
% elements failed + blocked
Increase in Coordination
Runtime and Strong-Scaling
Speedup for Max Graph Cut
3
15
2.5
2
Concurrency Ctrl.
Sequential
1.5
1
Speedup
Runtime relative to sequential
Runtime, relative to sequential
10
5
0.5
0
0
5
10
# threads
15
0
0
5
10
# threads
15
Conclusion
Scalability
Approximation
Sequential
Double Greedy
Always slow
Always optimal
Concurrency
Control
Double Greedy
Usually fast
Always optimal
Coordination Free
Double Greedy
Always fast
Near optimal
Paper @ NIPS 2014:
Parallel Double Greedy Submodular Maximization.
BACKUP SLIDES
Coordination Free
Theorem: serializable.
preserves optimal
approximation bound
½OPT.
τ
Lemma:
approximation
bound ½OPT error
Set A
Set B
u
u
w
w
x
x
D
y
y
E
set cover with costs:
sparse maxcut:
Correctness
Correctness
Concurrency Control
Lemma:
bounded overhead.
set cover with costs:
sparse maxcut:
from same
dependencies
(uncertainty region)
no
overhead
Concurren
Concurren
cy
z
Adversarial Setting
Processing order
Overlapping covers
 Increased coordination
Adversarial Setting – Ring Set
Cover
Faster
Larger Error
Coord
Free
Conc
Ctrl
Always fast
Possibly wrong
Possibly slow
Always optimal