Determinant Preserving Sparsification of SDDM Matrices with Applications to Counting and Sampling Spanning Trees

Richard Peng (Georgia Tech)

Joint work with: David Durfee, John Peebles, and Anup B. Rao
OUTLINE
• Laplacians and matrix-tree theorem
• Applications of det. preserving sparsification
• Proof of sparsification guarantees
GRAPH LAPLACIANS
Matrices that correspond to undirected graphs
[Figure: triangle on vertices 1, 2, 3 with edge weights w(1,2) = 1, w(1,3) = 2, w(2,3) = 3]

L_G = [  3  -1  -2 ]
      [ -1   4  -3 ]
      [ -2  -3   5 ]

• Coordinates ↔ vertices
• Non-zeros ↔ edges
Many uses in: scientific computing, network
science, combinatorial optimization
KIRCHHOFF'S MATRIX TREE THEOREM

# / total weight of spanning trees of G, where w(T) = ∏_{e ∈ T} w(e)
=
determinant of an (n−1)-sized minor of L_G: det+(L)

Example (the triangle above, with edge weights 1, 2, 3):
Total trees = 1 × 2 + 2 × 3 + 1 × 3 = 11
det+ = det [ 3 -1 ; -1 4 ] = 3 × 4 − 1 = 11
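A quick numerical check of this example (a minimal numpy sketch; the triangle and its weights are taken from the slides above):

```python
import numpy as np

# Laplacian of the weighted triangle: w(1,2) = 1, w(1,3) = 2, w(2,3) = 3
L = np.array([[ 3., -1., -2.],
              [-1.,  4., -3.],
              [-2., -3.,  5.]])

# det+: determinant of an (n-1)-sized principal minor
# (here: drop the last row and column)
print(np.linalg.det(L[:-1, :-1]))  # ~11.0 = total weight of spanning trees
```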
ALGORITHMS FOR DET+(L)

General matrix determinant: O(n^ω), ω ≈ 2.3727

Sparse graphs:
• det+(G / e): number / total weight of trees containing e
• det+(G / e) / det+(G): leverage score, τ_e
• Linearity of expectation: Σ_e τ_e = n − 1
• Cramer's rule: for e = uv, τ_e = w(e) · ((L_{-v})^{-1})_{uu}
• Fast linear system solvers: Õ(m) time
• Pick one tree, contract all its edges: Õ(nm)
• Can also use this to sample a random tree
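These identities can be checked on the running example; a minimal numpy sketch (edge list and weights as above, grounding at v to apply Cramer's rule):

```python
import numpy as np

L = np.array([[ 3., -1., -2.],
              [-1.,  4., -3.],
              [-2., -3.,  5.]])
edges = [(0, 1, 1.0), (0, 2, 2.0), (1, 2, 3.0)]  # (u, v, w)

tau = {}
for u, v, w in edges:
    keep = [i for i in range(3) if i != v]        # ground vertex v
    Linv = np.linalg.inv(L[np.ix_(keep, keep)])   # (L_{-v})^{-1}
    tau[(u, v)] = w * Linv[keep.index(u), keep.index(u)]

print(tau)                # {(0,1): 5/11, (0,2): 8/11, (1,2): 9/11}
print(sum(tau.values()))  # n - 1 = 2
```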
DENSE VS. SPARSE

Problem                   Dense (m ≈ n²)       Sparse (m ≈ n)
Max matching              O(n^2.37)            Õ(m^{7/4})
Parallel Shortest Path    O(n^2.37)            Õ(m n^{1/2})
Approx. maxflow           Õ(m)                 Õ(m)
Lx = b                    Õ(m)                 Õ(m)
Determinant               O(n^2.37)            Õ(nm)
Rand Spanning tree        Õ(n^{5/3} m^{1/3})   Õ(m^{4/3})

Missing piece: sparsification subroutine
SPARSIFICATION FOR DET+(G)

O(n^{1.5}) (rescaled) edges, sampled with probabilities proportional to τ_e, give H s.t. det+(H) ≈ det+(G)

Reason for the n² running time: need to estimate each τ_e to error n^{-1/4}

Applications: Õ(n² δ^{-2}) time algorithms for:
• Estimating the determinant to error (1 ± δ)
• Generating a spanning tree from a distribution with TV distance ≤ δ to the uniform distribution
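A minimal sketch of the sampling step. One detail is an assumption not fixed by the slides: each kept edge is rescaled by 1/(s·p_e), the standard spectral-sparsification choice (the paper's exact rescaling, which must also cancel the exp(−n²/2s) drift computed later, may differ):

```python
import numpy as np

def sparsify_det(edges, tau, s, rng=None):
    """Sample s edges without replacement, w.p. proportional to tau.

    edges: list of (u, v, w); tau: leverage scores (sum to n - 1).
    Sketch only: rescaling by 1 / (s * p_e) is an assumption.
    """
    rng = rng or np.random.default_rng()
    p = np.asarray(tau) / np.sum(tau)
    idx = rng.choice(len(edges), size=s, replace=False, p=p)
    return [(edges[i][0], edges[i][1], edges[i][2] / (s * p[i]))
            for i in idx]
```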
OUTLINE
• Laplacians and matrix-tree theorem
• Applications of det. preserving sparsification
• Proof of sparsification guarantees
SCHUR COMPLEMENTS

Partition V into V1 and V2:

L = [ L11  L12 ]   V1
    [ L21  L22 ]   V2

Schur complement Sc(L, V2): the partial state of Gaussian elimination after eliminating V1 (keeping V2):

Sc(L, V2) = L22 − L21 L11^{-1} L12

[Figure: G split into G[V1] and G[V2], joined by crossing edges]

Key fact: Sc(L, Vi) is still a graph (a Laplacian), so it can be sparsified!
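A small numpy check, continuing the triangle example (the split V1 = {1}, V2 = {2, 3} is chosen here just for illustration), that the Schur complement is again a Laplacian, i.e., symmetric with zero row sums:

```python
import numpy as np

L = np.array([[ 3., -1., -2.],
              [-1.,  4., -3.],
              [-2., -3.,  5.]])
V1, V2 = [0], [1, 2]  # eliminate V1, keep V2

L11 = L[np.ix_(V1, V1)]
Sc = (L[np.ix_(V2, V2)]
      - L[np.ix_(V2, V1)] @ np.linalg.inv(L11) @ L[np.ix_(V1, V2)])

print(Sc)              # [[ 11/3, -11/3], [-11/3, 11/3]]: one edge of weight 11/3
print(Sc.sum(axis=1))  # ~0: still a graph Laplacian
```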
DETERMINANT APPROXIMATION

The determinant is invariant under row/column operations (which are the Gaussian elimination primitives):

L = [ L11  L12 ]   V1
    [ L21  L22 ]   V2

det+(L) = det(L11) ∙ det+(Sc(L, V2))

• [KLPSS `16] / [JKPS `17]: can sample Sc(L, V2) with probabilities (n^{-1/4}-close to) τ_e in Õ(n²) time
• Recurse + control errors:
T(n, δ) = 2T(n / 2, δ / √2) + Õ(n² δ^{-2}) = Õ(n² δ^{-2})
Requires analyzing variance instead of errors
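The factorization can be verified directly on the running example (same split V1 = {1}, V2 = {2, 3} as above):

```python
import numpy as np

L = np.array([[ 3., -1., -2.],
              [-1.,  4., -3.],
              [-2., -3.,  5.]])
V1, V2 = [0], [1, 2]
L11 = L[np.ix_(V1, V1)]
Sc = (L[np.ix_(V2, V2)]
      - L[np.ix_(V2, V1)] @ np.linalg.inv(L11) @ L[np.ix_(V1, V2)])

det_plus_L  = np.linalg.det(L[:-1, :-1])  # 11
det_plus_Sc = Sc[0, 0]                    # 1x1 minor of the 2x2 Sc: 11/3
print(det_plus_L, np.linalg.det(L11) * det_plus_Sc)  # 11 = 3 * (11/3)
```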
HIGH LEVEL ROLE OF SC(G, V2)

det+(L) = det(L11) ∙ det+(Sc(L, V2)):
the divide step of div-conquer algorithms

[Figure: G splits into G[V1] and G[V2]; recurse on G[V1] and on Sc(G, V2), whose new edges are formed by the Schur complement]
ALGORITHMS THAT USE DIV-CONQUER

Same table as in DENSE VS. SPARSE above, annotated by which problems directly use div-conquer and which use div-conquer in some inner loop.
DIV-CONQUER: OI VERSION

CTSC = Chinese (IOI) Team Selection Contest
[Images: CTSC`13 Report, CTSC`08 Homework]

Div-conquer +
• Convexity / Monge search
• Augmented search trees
• KMP and suffix-tree/array
• Voronoi diagrams
DIV-CONQUER + ? + ERRORS

• The "?": sparsification of Sc(G, V2) [Kyng-Sachdeva FOCS `16]
• Our result: determinant-preserving sparsifiers of Sc(G, V2), giving sampling of spanning trees in Õ(n²) time
• Previously [DKPRS STOC `17]: Õ(n^{5/3} m^{1/3})
SPANNING TREE DISTRIBUTIONS

Tree distribution given by:
• H ← Sparsify(G)
• T ← SampleTree(H)

TV distance: d_TV(p, q) = ½ Σ_T |p(T) − q(T)|

Bound d_TV(trees(G), trees(H = Sparsify(G))) by bounding E_{H | T ⊆ H}[det+(H)²] for any tree T
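For intuition, tree distributions and their TV distance can be computed exactly on small graphs by brute force (a sketch; the "sparsifier" here just reweights one edge, chosen only to exercise the formula):

```python
from itertools import combinations

def tree_dist(edges, n):
    """Exact spanning tree distribution: p(T) proportional to prod of weights."""
    dist = {}
    for T in combinations(range(len(edges)), n - 1):
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        w, is_tree = 1.0, True
        for i in T:
            u, v, wt = edges[i]
            ru, rv = find(u), find(v)
            if ru == rv:        # creates a cycle: not a tree
                is_tree = False
                break
            parent[ru] = rv
            w *= wt
        if is_tree:
            dist[T] = w
    Z = sum(dist.values())
    return {T: w / Z for T, w in dist.items()}

edges = [(0, 1, 1.0), (0, 2, 2.0), (1, 2, 3.0)]
p = tree_dist(edges, 3)                        # trees(G)
q = tree_dist(edges[:-1] + [(1, 2, 6.0)], 3)   # trees(H): one edge reweighted
d_tv = 0.5 * sum(abs(p.get(T, 0) - q.get(T, 0)) for T in set(p) | set(q))
print(d_tv)  # ~0.082
```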
MODIFIED ALGORITHM

[Figure: V1 and V2, with new edges formed by the Schur complement]

• A random spanning tree in Sc(G, V1) decides all edges in G[V1]
• Contract/remove edges of G[V1] based on the tree picked
• Find another spanning tree on Sc(G', V2)
KEY NEW IDEAS

On the quasi-bipartite G' (no remaining edges inside V1), there is an (efficient) bijection between trees on Sc(G', V2) and trees on G'

• Sparsify Schur complements, then recurse
• Similar to, but messier than, the determinant algorithm
OUTLINE
• Laplacians and matrix-tree theorem
• Applications of det. preserving sparsification
• Proof of sparsification guarantees
SIMPLIFYING ASSUMPTIONS

All edges have leverage score ≤ n / m
• In any G, leverage scores sum to n − 1.
• Split e into ≈ τ_e · m / n copies, so each copy has leverage score ≤ n / m, and let m → ∞.
ASIDE: CONCENTRATION BOUNDS

Matrix concentration (e.g. [RV `97] [Tropp `12]):
s = Õ(n ε^{-2}) samples give L_H ≈_ε L_G
• L_H ≈_ε L_G implies all eigenvalues agree up to 1 ± ε
• det+(G): product of all non-zero eigenvalues of L_G, so the error compounds to (1 ± ε)^n ≈ e^{±εn}
• n eigenvalues: need ε = 1/n, so s ≈ n³
• Variance-based proofs: s ≈ n²
MAIN MOTIVATION

Main insight: uniform leverage scores ≈ complete graph

[Janson `94]: a random graph with O(n^{1.5}) edges, G(n, O(n^{1.5})), has concentrated numbers of:
• Spanning trees
• Matchings
• Hamiltonian cycles

Aside: this does not work for G(n, p = n^{-0.5}), where edges are picked independently!
EXPECTATION

Random subset of s > n² edges, picked without replacement
Probability of a single edge being picked: p = s/m

Probability of a (fixed) tree being picked:
p^{n−1} · exp(−n²/(2s) − o(1))

Linearity of expectation:
E[T(H)] = T(G) · p^{n−1} · exp(−n²/(2s) − o(1))

Goal: show E[T(H)²] is close to the square of this
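A Monte Carlo sanity check of this expectation (a sketch; K6 with unit weights is a hypothetical choice, and for such small n the exact survival probability ∏_{i<n−1}(s−i)/(m−i) is used in place of the asymptotic p^{n−1}·exp(−n²/(2s) − o(1)) it converges to):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

def total_trees(n, edge_list):
    # Matrix-tree theorem: det of an (n-1)-sized minor of the Laplacian.
    L = np.zeros((n, n))
    for u, v in edge_list:
        L[u, u] += 1; L[v, v] += 1
        L[u, v] -= 1; L[v, u] -= 1
    return np.linalg.det(L[:-1, :-1])

n = 6
edges = list(combinations(range(n), 2))  # K6: uniform leverage scores
m, s = len(edges), 12

# Monte Carlo estimate of E[T(H)] over s-subsets, picked without replacement
mc = np.mean([total_trees(n, [edges[i] for i in
              rng.choice(m, size=s, replace=False)])
              for _ in range(5000)])

# Exact: Pr[a fixed tree survives] = prod_{i < n-1} (s - i) / (m - i)
pr = np.prod([(s - i) / (m - i) for i in range(n - 1)])
print(mc, total_trees(n, edges) * pr)  # agree up to MC noise (~342)
```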
BOUNDING SECOND MOMENT

Goal: show E[T(H)²] is close to:
T(G)² · p^{2n−2} · exp(−n²/s − o(1))

Main steps:
• Express E[T(H)²] as a sum, over pairs of trees, of the probability that both are in H.
• Express this probability in terms of the size of the intersection.
• Bound the number of pairs of trees with intersection size k in terms of k, using bounded leverage scores + negative correlation.
E[T(H)²]

Interpretation: expected # of pairs of trees in H:
E[T(H)²] = Σ_{T1,T2} Pr_H[T1 ⊆ H ∧ T2 ⊆ H] = Σ_{T1,T2} Pr_H[T1 ∪ T2 ⊆ H]

This probability depends only on k = |T1 ∩ T2|; bound it by:
p^{2n−2} · exp(−2n²/s) · ((1/p) · (1 + 2n/s))^k
INCORPORATING LEVERAGE SCORES

S: a subset of k edges
Negative correlation between tree edges:
number of trees T containing S ≤ T(G) · ∏_{e ∈ S} τ_e

Uniform leverage score assumption: τ_e ≤ n / m
• Number of trees containing S: ≤ T(G) · (n/m)^k
• Pairs of trees both containing S: ≤ T(G)² · (n/m)^{2k}

Number of subsets of E of size k: (m choose k) ≤ m^k / k!

Total number of pairs with intersection of size k:
≤ T(G)² · (m^k / k!) · (n/m)^{2k} = T(G)² · (1/k!) · (n²/m)^k
PUTTING THINGS TOGETHER

# of pairs T1, T2 with |T1 ∩ T2| = k:
T(G)² · (1/k!) · (n²/m)^k

Pr_H[H contains both T1, T2]:
p^{2n−2} · exp(−2n²/s) · ((1/p) · (1 + 2n/s))^k

Terms depending on k (recall p = s/m, so n²/(mp) = n²/s):
Σ_k (1/k!) · (n²/(mp))^k · (1 + 2n/s)^k ≤ exp(n²/s + O(n³/s²))

Substituting in E[T(H)]:
E[T(H)²] ≤ E[T(H)]² · exp(O(n³/s²))
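A quick numeric check of that series bound (it is the identity Σ_k x^k/k! = e^x with x = (n²/s)(1 + 2n/s) = n²/s + O(n³/s²); the values n = 100, s = n^{1.5}, and the constant 3 inside the O(·) are illustrative):

```python
import math

n = 100
s = int(n**1.5)                      # s = n^{1.5}, as in the main theorem
x = (n * n / s) * (1 + 2 * n / s)    # exponent of the k-sum

series = sum(x**k / math.factorial(k) for k in range(60))
bound = math.exp(n * n / s + 3 * n**3 / s**2)
print(series, math.exp(x), bound)    # series = e^x <= bound
```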
FUTURE DIRECTIONS

• Matrix-concentration based extensions?
• Determinantal processes?
• Janson `94: matchings and Hamiltonian tours.
• Getting fewer than n^{1.5} edges? Directly working with TV distances (skipping the determinant)?
• Removing the n² factor (a result of needing estimates of τ_e with error n^{-1/4})
• Combining with algorithms for sparse graphs? (KM `09, MST `15)
(some) references:
• Paper: https://arxiv.org/abs/1705.00985
• [DKPRS `17]: https://arxiv.org/abs/1705.00985
• [Kyng-Sachdeva `16]: https://arxiv.org/abs/1605.02353
• [KLPSS `16]: https://arxiv.org/abs/1512.01892