
Fast, Randomized Algorithms for
Partitioning, Sparsification, and
the Solution of Linear Systems
Daniel A. Spielman
MIT
Joint work with Shang-Hua Teng (Boston University)
Papers
Nearly-Linear Time Algorithms for Graph Partitioning, Graph Sparsification, and Solving Linear Systems
  S-Teng '04
The Mixing Rate of Markov Chains, an Isoperimetric Inequality, and Computing the Volume
  Lovász-Simonovits '93
The Eigenvalues of Random Symmetric Matrices
  Füredi-Komlós '81
Overview
Preconditioning to solve the linear system Ax = b:
find B that approximates A, such that solving By = c is easy.
Three techniques:
  Augmented Spanning Trees [Vaidya '90]
  Sparsification: approximate a graph by a sparse graph
  Partitioning: runtime proportional to the number of nodes removed
Outline
I. Linear system solvers
Combinatorial and Spectral Graph Theory
II. Sparsification using partitioning and
random sampling
Eigenvalues of random graphs
III. Graph partitioning by truncated random walks
Diagonally Dominant Matrices
Solve Ax = b, where each diagonal entry of A is at least the sum of the absolute values of the other entries in its row:
  a_ii >= sum_{j != i} |a_ij|
Ex: Laplacian matrices of weighted graphs:
  a_ij = negative of the weight from i to j
  diagonal a_ii = weighted degree of node i
The general diagonally dominant case reduces to Laplacians, so it suffices to solve Lx = b for Laplacian L.
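As a notation check, here is a small sketch (mine, not the talk's; the graph and weights are invented) that builds such a Laplacian and verifies diagonal dominance:

```python
# Minimal sketch: Laplacian of a small weighted graph, L = D - W.
import numpy as np

def laplacian(n, weighted_edges):
    """Off-diagonal L[i,j] = -w(i,j); diagonal L[i,i] = weighted degree."""
    L = np.zeros((n, n))
    for i, j, w in weighted_edges:
        L[i, j] -= w
        L[j, i] -= w
        L[i, i] += w
        L[j, j] += w
    return L

edges = [(0, 1, 2.0), (1, 2, 1.0), (2, 3, 4.0), (3, 0, 3.0)]
L = laplacian(4, edges)

# Diagonal dominance: each diagonal entry equals (here, with equality)
# the sum of the absolute values of the other entries in its row.
off = np.abs(L).sum(axis=1) - np.abs(np.diag(L))
assert np.allclose(np.diag(L), off)
print(L)
```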
Complexity of Solving Ax = b
A positive semi-definite
General Direct Methods:
  Gaussian Elimination (Cholesky): O(n^3)
  Fast matrix inversion: roughly O(n^2.376)
Conjugate Gradient: O(mn)
(n is the dimension, m is the number of non-zeros)
Complexity of Solving Ax = b
A positive semi-definite, structured
Express A = L L^T:
  Path: forward and backward elimination, O(n)
  Tree: like a path, working up from the leaves, O(n)
  Planar: nested dissection, O(n^1.5) [Lipton-Rose-Tarjan '79]
Iterative Methods
Preconditioned Conjugate Gradient:
find an easy B that approximates A.
Solve Ax = b to relative accuracy eps in about sqrt(kappa(A,B)) log(1/eps) iterations,
each costing a multiply by A plus a solve of By = c.
  kappa(A,B) measures the quality of the approximation.
  The time to solve By = c must be small.
(We omit the log(1/eps) factor for the rest of the talk.)
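To make the iteration concrete, here is a minimal preconditioned-CG sketch, assuming A is symmetric positive semi-definite and `solve_B` is a black-box application of B^{-1} (e.g., a fast tree solver). This is an illustration, not the talk's code:

```python
import numpy as np

def pcg(A, b, solve_B, tol=1e-8, max_iter=1000):
    """Preconditioned conjugate gradient for symmetric PSD A.
    solve_B(r) should (approximately) solve By = r."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x
    z = solve_B(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = solve_B(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Example: a Jacobi preconditioner (B = diag(A)) standing in
# for a fast subgraph solver.
A = np.array([[2.0, -1, 0], [-1, 2, -1], [0, -1, 2]])
b = np.array([1.0, 0, -1])
x = pcg(A, b, solve_B=lambda r: r / np.diag(A))
print(np.allclose(A @ x, b))
```

With B = A (an exact solve_B) the loop converges in one iteration; the closer kappa(A,B) is to 1, the fewer iterations are needed.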
Main Result
For symmetric, diagonally dominant A,
iteratively solve Ax = b in time
  General: m log^{O(1)} m
  Planar: nearly linear, with smaller polylogarithmic overhead
Vaidya's Subgraph Preconditioners
Precondition A by the Laplacian of a subgraph, B.
[Figure: a weighted graph A and a subgraph B on the same vertices, keeping a subset of A's weighted edges.]
History
Vaidya '90
  Relate kappa(A,B) to the congestion and dilation of embedding A's edges into B.
  Use the MST, then an augmented MST.
  General: about m^1.75; planar: about m^1.2.
Gremban, Miller '96
  Steiner vertices, many details.
Joshi '97, Reif '98
  Recursive, multi-level approach; improved planar bounds.
Bern, Boman, Chen, Gilbert, Hendrickson, Nguyen, Toledo
  All the details, algebraic approach.
History
Maggs, Miller, Parekh, Ravi, Woo '02
  Faster solves after preprocessing.
Boman, Hendrickson '01
  Low-stretch spanning trees.
S-Teng '03
  Augmented low-stretch trees: about m^1.31.
This work (S-Teng '04)
  Clustering, partitioning, sparsification, recursion;
  lower-stretch spanning trees.
The relative condition number:
  kappa(A,B) = lambda_max(A,B) / lambda_min(A,B)
for positive semi-definite A and B.
Write B <= A if A - B is positive semi-definite,
that is, x^T B x <= x^T A x for all x.
For A the Laplacian of a graph with edge set E:
  x^T A x = sum over (i,j) in E of w_ij (x_i - x_j)^2
Bounding kappa(A,B): for B a subgraph of A, x^T B x <= x^T A x for all x,
so lambda_min(A,B) >= 1, and kappa(A,B) <= lambda_max(A,B).
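A small numeric sanity check of these definitions (my sketch; the cycle-versus-path pair is invented for illustration). Since both Laplacians have nullspace span(1), the non-zero eigenvalues of pinv(B) A are the generalized eigenvalues we want:

```python
import numpy as np

def laplacian(n, edges):
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, j] -= 1; L[j, i] -= 1
        L[i, i] += 1; L[j, j] += 1
    return L

n = 8
path = [(i, i + 1) for i in range(n - 1)]
cycle = path + [(n - 1, 0)]
A, B = laplacian(n, cycle), laplacian(n, path)

# Non-zero eigenvalues of pinv(B) @ A = generalized eigenvalues lambda(A,B).
evals = np.sort(np.linalg.eigvals(np.linalg.pinv(B) @ A).real)
nz = evals[evals > 1e-8]
# Since B (the path) is a subgraph of A (the cycle), lambda_min = 1;
# the extra edge stretches along the whole path, so lambda_max = n.
print(nz.min(), nz.max(), nz.max() / nz.min())   # kappa(A,B) = n
```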
Fundamental Inequality
For a path v_0, v_1, ..., v_k and any vector x,
  (x_{v_0} - x_{v_k})^2 <= k * sum_{i=1}^{k} (x_{v_{i-1}} - x_{v_i})^2
(by Cauchy-Schwarz): an edge can be simulated by a path of length k at a cost of a factor k.
[Figure: an example path on 8 vertices illustrating the inequality.]
Application to eigenvalues of graphs
lambda_1 = 0 for Laplacian matrices, and
  lambda_2 = min over x perp 1 of x^T L x / x^T x.
Example:
  For the complete graph on n nodes, all non-zero eigs = n.
  For the path on n nodes, lambda_2 = O(1/n^2).
[Figure: the test vector x = (-7, -5, -3, -1, 1, 3, 5, 7) on an 8-node path.]
Lower bound on lambda_2 of the path [Guattery-Leighton-Miller '97]:
embed the complete graph into the path.
Each edge (i,j) of K_n routes along a sub-path of length at most n,
so by the fundamental inequality, summing over all edges (congestion times dilation),
  x^T K_n x <= n^3 * x^T P_n x.
So, for x perp 1, n x^T x = x^T K_n x <= n^3 * x^T P_n x,
and lambda_2(P_n) >= 1/n^2.
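A quick check of both eigenvalue facts (my sketch, not from the talk):

```python
# Verify: K_n has all non-zero Laplacian eigenvalues equal to n,
# and lambda_2(path) = 2(1 - cos(pi/n)) ~ pi^2/n^2.
import numpy as np

def laplacian(n, edges):
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, j] -= 1; L[j, i] -= 1
        L[i, i] += 1; L[j, j] += 1
    return L

n = 32
Kn = laplacian(n, [(i, j) for i in range(n) for j in range(i + 1, n)])
Pn = laplacian(n, [(i, i + 1) for i in range(n - 1)])

print(np.round(np.linalg.eigvalsh(Kn), 6))   # 0, n, n, ..., n
lam2 = np.linalg.eigvalsh(Pn)[1]
print(lam2, np.pi**2 / n**2)                  # same 1/n^2 scaling
```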
Preconditioning with a Spanning Tree B
[Figure: a graph A and a spanning tree B of A.]
Every edge of A not in B has a unique path in B.
When B is a tree, applying the fundamental inequality to each such path gives
  lambda_max(A,B) <= sum over the edges of A of the lengths of their tree paths (the total stretch).
Low-Stretch Spanning Trees
Theorem (Boman-Hendrickson '01): lambda_max(A,B) <= st_B(A), the total stretch of A's edges with respect to the tree B.
Theorem (Alon-Karp-Peleg-West '91): every graph has a spanning tree of total stretch m * 2^{O(sqrt(log n log log n))}.
Theorem (Elkin-Emek-S-Teng '04): every graph has a spanning tree of total stretch O(m log^2 n log log n).
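To make stretch concrete, here is a small sketch (mine) computing the total stretch of a spanning tree of an unweighted graph; the cycle/path example is invented:

```python
# stretch(e=(u,v)) = length of the tree path from u to v;
# total stretch = sum over all graph edges.
from collections import deque

def tree_distances(n, tree_edges, source):
    """BFS distances in the tree from `source`."""
    adj = {v: [] for v in range(n)}
    for u, v in tree_edges:
        adj[u].append(v); adj[v].append(u)
    dist = {source: 0}
    q = deque([source])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return dist

def total_stretch(n, graph_edges, tree_edges):
    return sum(tree_distances(n, tree_edges, u)[v] for u, v in graph_edges)

# Cycle vs. spanning path: the single non-tree edge has stretch n-1.
n = 8
path = [(i, i + 1) for i in range(n - 1)]
cycle = path + [(n - 1, 0)]
print(total_stretch(n, cycle, path))   # (n-1)*1 + (n-1) = 14
```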
Vaidya's Augmented Spanning Trees
B = a spanning tree plus s edges in total.
Adding Edges to a Tree
Partition the tree into t sub-trees, balancing stretch.
For sub-trees connected in A, add one such bridge edge, carefully.
[Figure: a tree partitioned into sub-trees, with the added bridge edges highlighted.]
Theorem: with t sub-trees, the condition number improves enough to give solve time
about m^1.75 in general, and about m^1.2 if planar.
Sparsification
Feder-Motwani '91; Benczur-Karger '96;
Eppstein, Galil, Italiano, Spencer '93; Eppstein, Galil, Italiano, Nissenzweig '97:
all graphs can be well-approximated by a sparse graph.
Benczur-Karger '96: can find a sparse subgraph H such that every cut in H has weight within a (1 +- eps) factor of its weight in G
(the edges of H are a subset of G's, but with different weights), and H has O(n log n / eps^2) edges.
We need the stronger spectral approximation: x^T L_H x close to x^T L_G x for all x.
Example: Complete Graph
If A is the Laplacian of K_n, all non-zero eigenvalues are n.
If B is the Laplacian of a d-regular Ramanujan expander, all non-zero eigenvalues satisfy
  d - 2 sqrt(d-1) <= lambda <= d + 2 sqrt(d-1).
And so, scaling B by n/d,
  kappa(A, (n/d) B) <= (d + 2 sqrt(d-1)) / (d - 2 sqrt(d-1)) = 1 + O(1/sqrt(d)).
Example: Dumbbell
A = two copies of K_n joined by a middle edge.
If B does not contain the middle edge, then B is disconnected and kappa(A,B) is unbounded:
any sparsifier must keep that edge, so uniform random sampling fails.
[Figure: two complete graphs K_n connected by a single edge.]
Example: Grid plus edge
An m-by-m grid of unit-weight edges, plus one attached edge.
[Figure: the grid-plus-edge example; the slide's cut computation gives weights (m-1)^2 and (m-1)^2 + k(m-1).]
- Random sampling not sufficient.
- Cut approximation not sufficient.
Conductance
Cut = partition of the vertices into (S, V-S).
Conductance of S:
  Phi(S) = w(edges leaving S) / min(vol(S), vol(V-S)),
where vol(S) is the sum of the degrees of the vertices in S.
Conductance of G: Phi_G = min over S of Phi(S).
[Figure: example sets S of high and low conductance.]
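A direct transcription of this definition for unweighted graphs (my sketch; the two-triangle example is invented):

```python
# Phi(S) = |edges leaving S| / min(vol(S), vol(V-S)).
def conductance(n, edges, S):
    S = set(S)
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1; deg[v] += 1
    boundary = sum(1 for u, v in edges if (u in S) != (v in S))
    volS = sum(deg[v] for v in S)
    return boundary / min(volS, sum(deg) - volS)

# Dumbbell-like example: two triangles joined by one edge.
edges = [(0,1),(1,2),(0,2),(3,4),(4,5),(3,5),(2,3)]
print(conductance(6, edges, {0, 1, 2}))   # 1/7: a low-conductance cut
```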
Conductance and Sparsification
If conductance is high (expander):
  can precondition by random sampling.
If conductance is low:
  can partition the graph by removing few edges.
Decomposition: partition the vertex set so that
  few edges are removed, and
  the graph on each part has high conductance.
Graph Decomposition
Lemma: there exists a partition of the vertices V_1, ..., V_k such that
  each V_i has large conductance, and
  at most half the edges cross the partition.
Alg: sample the edges inside the parts;
recurse on the edges that cross the partition (see the sketch below).
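To show the shape of the recursion, here is a toy sketch (mine, not the paper's algorithm). The `decompose` argument is a stand-in assumption for the nearly-linear partitioner of part III, and a real implementation would rescale the kept edge weights by 1/p:

```python
import random

def sparsify(edges, decompose, p=0.5, min_edges=8):
    """Sample inside high-conductance parts; recurse on crossing edges.
    Since at most half the edges cross, the recursion depth is O(log m)."""
    if len(edges) <= min_edges:
        return list(edges)
    parts = decompose(edges)   # list of vertex sets, each of high conductance
    inside, crossing = [], []
    for u, v in edges:
        if any(u in P and v in P for P in parts):
            inside.append((u, v))
        else:
            crossing.append((u, v))
    kept = [e for e in inside if random.random() < p]
    return kept + sparsify(crossing, decompose, p, min_edges)

# Trivial stand-in partitioner (one part holding every vertex),
# just to make the sketch runnable.
one_part = lambda edges: [set(u for e in edges for u in e)]
print(len(sparsify([(i, (i + 1) % 12) for i in range(12)], one_part)))
```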
Graph Decomposition Exists
Lemma: there exists a partition of the vertices such that
  each V_i has conductance at least some fixed Phi, and
  at most half the edges cross the partition.
Proof: let S be the largest set with vol(S) <= vol(V)/2 and small conductance.
If no such S exists, the whole graph already has high conductance.
If S is small, we only need to recurse inside S.
If S is big, then V-S is not too big.
[Figure: the set S and its complement V-S.]
Either way, the recursion depth stays bounded.
Sparsification from Graph Decomposition
Alg: sample the edges inside the high-conductance parts;
recurse on the edges that cross the partition.
Theorem: there exists B with #edges < n log^{O(1)} n that approximates A well.
Need to find the partition in nearly-linear time.
Sparsification
Thm: given G, can find a subgraph H with
  #edges(H) < n log^{O(1)} n,
approximating G spectrally, in time m log^{O(1)} m.
Thm: for all diagonally dominant A, can find such a preconditioner B.
General solve time: m log^{O(1)} m.
Cheeger's Inequality
[Cheeger; Sinclair-Jerrum '89]: for the Laplacian L, with D the diagonal matrix whose entry d_ii is the degree of node i,
  Phi(G)^2 / 2 <= lambda_2(D^{-1} L) <= 2 Phi(G).
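A brute-force check of the inequality on a small graph (my sketch; exhaustive minimization over cuts is exponential, so this only works for tiny n):

```python
from itertools import combinations
import numpy as np

def laplacian(n, edges):
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, j] -= 1; L[j, i] -= 1
        L[i, i] += 1; L[j, j] += 1
    return L

n = 6
edges = [(0,1),(1,2),(0,2),(3,4),(4,5),(3,5),(2,3)]  # two triangles + bridge
L = laplacian(n, edges)
deg = np.diag(L).copy()

def phi(S):
    S = set(S)
    cut = sum(1 for u, v in edges if (u in S) != (v in S))
    vol = sum(deg[v] for v in S)
    return cut / min(vol, deg.sum() - vol)

# Conductance by brute force over all non-trivial vertex sets.
Phi = min(phi(S) for r in range(1, n) for S in combinations(range(n), r))

# lambda_2(D^{-1}L), via the similar symmetric form D^{-1/2} L D^{-1/2}.
Ds = np.diag(1.0 / np.sqrt(deg))
lam2 = np.linalg.eigvalsh(Ds @ L @ Ds)[1]
print(Phi**2 / 2, "<=", lam2, "<=", 2 * Phi)
```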
Random Sampling
Given a graph G-hat with adjacency matrix A-hat, randomly sample to get G-tilde with adjacency matrix A-tilde, so that
  || D^{-1/2} (A-hat - A-tilde) D^{-1/2} || <= eps,
where D is the diagonal matrix of the degrees in G-hat.
This is useful if this norm is small and lambda_2 of the normalized Laplacian D^{-1/2} L-hat D^{-1/2} is big,
that is (by Cheeger), if the conductance of G-hat is big.
If the norm is at most eps and lambda_2 >= lambda,
then for all x orthogonal to the (mutual) nullspace,
x^T L-tilde x is within a factor 1 +- eps/lambda of x^T L-hat x.
But we don't want D in there.
D has no impact: the same D appears on both sides,
so a bound between the D-normalized forms implies the same bound
between x^T L-hat x and x^T L-tilde x, and hence a small kappa(L-hat, L-tilde).
Work with the adjacency matrix:
  a_ij = weight of the edge from i to j, and 0 if i = j.
No difference, because L = D - A.
Random Sampling Rule
Choose a parameter lam governing the sparsity of G-tilde.
Keep edge (i,j) with probability
  p_ij = min(1, lam / min(d_i, d_j)).
If we keep an edge, raise its weight by a factor 1/p_ij.
The reweighting guarantees E[A-tilde] = A,
and the choice of p_ij guarantees an expected at most lam * n edges in G-tilde.
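A direct transcription of this rule (my sketch; `lam` and the unit edge weights are illustration choices):

```python
# Keep edge (i,j) with prob p_ij = min(1, lam / min(d_i, d_j)),
# scaling kept weights by 1/p_ij so that E[A-tilde] = A.
import random

def sample_graph(n, edges, lam, seed=0):
    rng = random.Random(seed)
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1; deg[v] += 1
    sampled = []
    for u, v in edges:
        p = min(1.0, lam / min(deg[u], deg[v]))
        if rng.random() < p:
            sampled.append((u, v, 1.0 / p))   # (endpoint, endpoint, new weight)
    return sampled

# Since sum of p_ij <= lam * sum_i d_i / d_i = lam * n,
# the expected number of kept edges is at most lam * n.
```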
Random Sampling Theorem
Theorem: with high probability, || D^{-1/2} (A - A-tilde) D^{-1/2} || is small.
Useful for graphs of high conductance.
Will prove in the unweighted case:
from now on, all edges have weight 1.
Analysis by Trace
Trace = sum of diagonal entries = sum of eigenvalues.
For even k, || M ||^k <= Tr(M^k), so || M || <= (Tr(M^k))^{1/k}.
Main Lemma: a bound on E[Tr((A - A-tilde)^k)].
Proof of Theorem: apply Markov's inequality to the trace, then take the k-th root.
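A quick numeric illustration of the trace trick (my sketch): for symmetric M and even k, the k-th root of Tr(M^k) upper-bounds the spectral norm, and the bound tightens as k grows:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
M = (M + M.T) / 2                       # symmetrize

norm = np.linalg.norm(M, 2)             # spectral norm = max |eigenvalue|
for k in (2, 4, 8, 16, 32):
    # For even k, Tr(M^k) = sum of lambda_i^k >= max|lambda|^k.
    bound = np.trace(np.linalg.matrix_power(M, k)) ** (1.0 / k)
    print(k, norm, "<=", bound)
```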
Expected Trace
Writing Delta = A - A-tilde,
  Tr(Delta^k) = sum over closed walks v_0, v_1, ..., v_k = v_0 of Delta_{v_0 v_1} Delta_{v_1 v_2} ... Delta_{v_{k-1} v_k}.
[Figure: a closed walk on vertices 1-4 using edges (1,2), (2,3), (3,4), (4,2), (2,1).]
Most terms are zero in expectation, because E[Delta_ij] = 0 and the edges are sampled independently.
So the sum is zero unless each edge appears at least twice.
Will code such walks to count them.
Coding walks
For a sequence v_0, v_1, ..., v_k:
  S = { i : edge (v_{i-1}, v_i) was not used before, in either direction },
  sigma(i) = for i not in S, the index of the time edge (v_{i-1}, v_i) was used before, in either direction.
Coding walks: Example
step: 0, 1, 2, 3, 4, 5, 6, 7, 8
vert: 1, 2, 3, 4, 2, 3, 4, 2, 1
S = {1, 2, 3, 4}: the new edges (1,2), (2,3), (3,4), (4,2) are first used at steps 1-4.
sigma: 5 -> 2, 6 -> 3, 7 -> 4, 8 -> 1: each remaining step reuses the edge first used at the indicated step.
[Figure: the walk drawn on vertices 1-4.]
Valid s
For each i in S
is a neighbor of
with a probability of being chosen < 1
That is,
can be non-zero
For each i not in S
can take edge indicated by
That is
Expected Trace from Code
Given the code (S, sigma), the walk is determined except at the steps in S,
and at each step in S there are only a bounded number of ways to choose v_i
(a neighbor whose edge can have Delta non-zero).
Summing over codes and choices bounds E[Tr(Delta^k)].
Random Sampling Theorem
Theorem: with high probability, || D^{-1/2} (A - A-tilde) D^{-1/2} || is small.
Useful when the conductance is high:
random sampling sparsifies graphs of high conductance.
Graph Partitioning Algorithms
SDP/LP: too slow.
Spectral: finds one cut quickly, but it can be unbalanced, so many runs may be needed.
Multilevel (Chaco/Metis): can't analyze; misses small sparse cuts.
New Alg: based on truncated random walks.
Approximates the optimal balance, can be used to decompose,
and runs in time proportional to the size of the piece it removes.
Lazy Random Walk
At each step:
  stay put with prob 1/2;
  otherwise, move to a neighbor with probability proportional to the edge weight.
Diffusing probability mass:
  keep 1/2 for self,
  distribute the rest among the neighbors.
Lazy Random Walk
Diffusing probability mass:
  keep 1/2 for self, distribute the rest among the neighbors;
or, with self-loops, distribute the mass evenly over the edges.
[Figure: a four-node example, starting with all mass on one vertex; successive steps give masses such as (1/2, 1/2, 0, 0), then (1/2, 1/3, 1/12, 1/12), then (11/24, 1/4, 7/48, 7/48), converging toward (3/8, 2/8, 2/8, 1/8).]
In the limit, probability mass is proportional to degree.
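One diffusion step as a matrix (my sketch; the 4-vertex path is invented). The columns of W = (I + A D^{-1})/2 spread each vertex's mass, and iterating converges to mass proportional to degree:

```python
import numpy as np

def lazy_walk_matrix(adj):
    """W = (I + A D^{-1}) / 2: each vertex keeps half its mass and sends
    the rest to its neighbors in proportion to edge weight."""
    d = adj.sum(axis=0)
    return 0.5 * (np.eye(adj.shape[0]) + adj / d)

# Path on 4 vertices (degrees 1, 2, 2, 1).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
W = lazy_walk_matrix(adj)

p = np.array([1.0, 0.0, 0.0, 0.0])   # all mass on vertex 0
for _ in range(200):
    p = W @ p
print(p)   # converges to degree/sum(degrees) = (1/6, 2/6, 2/6, 1/6)
```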
Why lazy?
Otherwise, there might be no limit:
[Example: on a single edge, all the mass hops back and forth between (1, 0) and (0, 1).]
Why so lazy as to keep 1/2 the mass?
Diagonally dominant matrices:
with a_ij the weight of the edge from i to j, and d_i the degree of node i,
the lazy walk's transition matrix W = (I + A D^{-1}) / 2 gives the probabilities of going from i to j,
and 2(I - W) = (D - A) D^{-1} is the (diagonally dominant) Laplacian, up to normalization.
Rate of convergence
Cheeger's inequality: the rate of convergence is governed by conductance.
If conductance is low, convergence is slow:
if we start uniform on S, the fraction of the mass that leaves at each step is at most Phi(S)/2,
so after 1/(2 Phi(S)) steps, at least 3/4 of the mass is still in S.
Lovasz-Simonovits Theorem
If convergence is slow, then there is a cut of low conductance.
And, we can find the cut from the highest-probability nodes.
[Figure: walk probabilities at the vertices (.137, .135, .134, .129, .112, .094); the cut separates the high-probability nodes from the rest.]
Lovasz-Simonovits Theorem
From now on, every node has the same degree, d.
Let I_t(k) = the probability mass on the k vertices with the most probability at step t.
Theorem: for all vertex sets S and all t <= T,
if every sweep cut seen up to step T has conductance at least Phi,
then I_t(k) approaches k/n at a rate like sqrt(k) (1 - Phi^2/c)^t, for a constant c.
So: if we start from a node in a set S of small conductance,
the walk outputs a set of small conductance.
Extension: the output is mostly contained in S.
Speed of Lovasz-Simonovits
If we want a cut of conductance phi, we can run for about 1/phi^2 steps (up to log factors).
But we want run-time proportional to the number of vertices removed
(local clustering on a massive graph).
Problem: most vertices can have non-zero mass by the time the cut is found.
Speeding up Lovasz-Simonovits
Round all small entries of the probability vector to zero.
Algorithm Nibble
input: start vertex v, target set size k
start: all mass on v
step: take one lazy-walk step, then map all values below a small threshold (on the order of 1/k, up to factors) to zero.
Theorem: if v lies in a set of conductance < phi and size about k,
Nibble outputs a set of conductance < roughly sqrt(phi) (up to polylog factors),
mostly overlapping that size-k set.
(A rough sketch of the idea follows.)
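This is only a sketch of the Nibble idea; the thresholds, step counts, and sweep rule below are my simplifications, not the paper's exact choices:

```python
# Run the truncated lazy walk from v; after each step, sweep the
# probability/degree order for a low-conductance cut. Truncation is
# what keeps the work local: only vertices near v ever hold mass.
import numpy as np

def nibble_sketch(adj, v, steps, trunc):
    n = adj.shape[0]
    deg = adj.sum(axis=0)
    W = 0.5 * (np.eye(n) + adj / deg)     # lazy walk matrix
    p = np.zeros(n); p[v] = 1.0
    best, best_phi = None, np.inf
    for _ in range(steps):
        p = W @ p
        p[p < trunc] = 0.0                 # round small entries to zero
        order = np.argsort(-p / deg)       # sweep by probability / degree
        for k in range(1, n):
            S = set(order[:k].tolist())
            cut = sum(adj[i, j] for i in S for j in range(n) if j not in S)
            vol = deg[list(S)].sum()
            phi = cut / min(vol, deg.sum() - vol)
            if phi < best_phi:
                best, best_phi = S, phi
    return best, best_phi

# Two triangles joined by a bridge, starting inside one triangle:
adj = np.array([[0,1,1,0,0,0],[1,0,1,0,0,0],[1,1,0,1,0,0],
                [0,0,1,0,1,1],[0,0,0,1,0,1],[0,0,0,1,1,0]], float)
print(nibble_sketch(adj, 0, steps=6, trunc=1e-3))   # finds {0,1,2}, phi = 1/7
```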
The Potential Function
For integer k, I_t(k) = the most probability mass held by any k vertices at step t;
extend linearly in between these integer points.
Concave: the slopes decrease as k grows.
Easy Inequality: I_t(k) <= I_{t-1}(k).
Fancy Inequality: if the sweep cuts have conductance at least Phi, then (up to the exact constants)
  I_t(k) <= 1/2 ( I_{t-1}(k - Phi k') + I_{t-1}(k + Phi k') ), with k' = min(k, n-k).
[Figure: chords of the concave curve; where chords make little progress, a dominating curve still makes progress.]
Proof of Easy Inequality
Order the vertices by probability mass.
[Figure: the vertices at time t-1 and at time t.]
If the top k at time t-1 only connect to the top k at time t, we get equality.
Otherwise, some mass leaves, and we get inequality.
External Edges from Self-Loops
Lemma: for every set S, and every set R of the same size,
at least a Phi(S) fraction of the edges from S do not hit R.
Tight example:
[Figure: sets S and R with Phi(S) = 1/2, meeting the bound with equality.]
External Edges from Self-Loops
Lemma: for every set S, and every set R of the same size,
at least a Phi(S) fraction of the edges from S do not hit R.
Proof: when R = S, this is the definition of conductance.
Each vertex of R - S can absorb d edges from S,
but each vertex of S - R has d self-loops that do not go to R.
[Figure: overlapping sets S and R.]
Proof of Fancy Inequality
[Figure: the vertices at time t-1 and at time t; the mass entering the top k at time t is classified as "top", "in", or "out".]
I_t(k) = top + in <= top + (in + out)/2 = top/2 + (top + in + out)/2,
using in <= out, which follows from the self-loops lemma.
The two halves are values of I_{t-1} at points below and above k, giving the fancy inequality.
Local Clustering
Theorem:
if S is a set of conductance < phi, and v is a random vertex of S,
then the algorithm outputs a set of conductance < roughly sqrt(phi) (up to polylog factors),
mostly in S,
in time proportional to the size of the output.
Can it be done for all conductances???
Experimental code: ClusTree
1. Cluster, crudely.
2. Make trees in the clusters.
3. Add edges between the trees, optimally.
No recursion on reduced matrices.
Implementation
ClusTree: in Java
  (its timing is not included below; including it could increase total time by 20%).
Compared against:
  PCG with drop-tolerance incomplete Cholesky, direct Cholesky, and Vaidya preconditioners,
  via TAUCS [Chen, Rotkin, Toledo];
  orderings: amd, genmmd, Metis, RCM.
Machine: Intel Xeon 3.06GHz, 512k L2 cache, 1M L3 cache.
2D grid, Neumann boundary
[Plot: "2d Grid, Neumann" - running time in seconds (0 to 120) vs. number of nodes (up to 2 x 10^6) for pcg/ic/genmmd, direct/genmmd, Vaidya, and clusTree; all runs to residual error 10^-8.]
Impact of boundary conditions
[Plots: "2d Grid, Dirichlet" (direct/genmmd, Vaidya, pcg/mic/rcm, clusTree) and "2d Grid, Neumann" (pcg/ic/genmmd, direct/genmmd, Vaidya, clusTree) - seconds (0 to 120) vs. number of nodes (up to 2 x 10^6).]
2D Unstructured Delaunay Mesh
[Plots: "2d Unstructured Delaunay Mesh, Dirichlet" and "2d Unstructured Delaunay Mesh, Neumann" (Vaidya, direct/Metis, pcg/mic/genmmd or pcg/ic/genmmd, clusTree) - seconds (0 to 120) vs. number of nodes (up to 15 x 10^5).]
Future Work
Practical Local Clustering
More Sparsification
Other physical problems:
Cheeger's inequality
Implications for combinatorics,
and Spectral Hypergraph Theory
To learn more
“Nearly-Linear Time Algorithms for Graph
Partitioning, Sparsification, and Solving Linear
Systems” (STOC ’04, Arxiv)
Will be split into two papers:
numerical and combinatorial
My lecture notes for
Spectral Graph Theory and its Applications