Slides

Estimating PageRank on Graph Streams
Atish Das Sarma (Georgia Tech)
Sreenivas Gollapudi, Rina Panigrahy (Microsoft Research)
PageRank
• PageRank
– Determines a ranking of the nodes in a graph
• Typically large graphs: WWW, social networks
• Run daily by commercial search engines
PageRank computation
[Figure: graph with nodes a, b, c, and u]
PageRank Computation
[Figure: graph with nodes a, b, c, and u]
Our Approach: No Matrix-Vector Multiplication!
Our Result
Many random walk samples, efficiently.
Approximate PageRank
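The idea of estimating PageRank from random walk samples, rather than by matrix-vector multiplication, can be illustrated with a minimal in-memory sketch. It uses one standard Monte Carlo estimator (walks that stop with a fixed reset probability), which is an assumption here and not necessarily the construction used in the talk. The function name `estimate_pagerank`, the `reset_prob` value, and the adjacency-dict input are all hypothetical.

```python
import random
from collections import Counter


def estimate_pagerank(out_edges, num_walks, reset_prob=0.15, seed=0):
    """Estimate PageRank as the endpoint distribution of many short walks.

    out_edges: dict mapping every node to a list of its out-neighbors
               (hypothetical input format; every node must appear as a key).
    Each walk starts at a uniformly random node and stops with probability
    reset_prob before every step, so its endpoint is distributed according
    to PageRank with damping factor 1 - reset_prob.
    """
    rng = random.Random(seed)
    nodes = list(out_edges)
    hits = Counter()
    for _ in range(num_walks):
        node = rng.choice(nodes)
        while rng.random() > reset_prob:
            neighbors = out_edges[node]
            # Assumption: a dangling node jumps to a uniformly random node.
            node = rng.choice(neighbors) if neighbors else rng.choice(nodes)
        hits[node] += 1
    return {v: hits[v] / num_walks for v in nodes}
```

For example, on a tiny graph where a, b, and c all link to u, calling `estimate_pagerank({"a": ["u"], "b": ["u"], "c": ["u"], "u": ["a"]}, 20000)` should rank u above b and c. The estimate needs only walk endpoints, which is why producing many walk samples efficiently is enough.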
Other results from Random Walks
[Figure: graph G with node u]
We can estimate:
• Mixing time
• Conductance
Using Streams
Streaming
Input is a “stream”: e1, e2, e3, e4, e5, e6, e7, …
• Few passes
• Small RAM (working memory)
• Frequency moments, quantiles
• Graphs: edges, in arbitrary order
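The streaming model above can be made concrete with a small sketch: the input is only readable as a sequence of edges, each full read counts as one pass, and working memory should stay small. The `EdgeStream` class and the `out_degrees` helper are hypothetical names introduced for illustration; a real stream would be read from disk or the network rather than from a list.

```python
from collections import Counter


class EdgeStream:
    """A re-readable edge stream that counts how many passes were made."""

    def __init__(self, edges):
        # Stand-in for external storage; the algorithm itself should not
        # index into this list, only iterate over it.
        self._edges = list(edges)
        self.passes = 0

    def __iter__(self):
        self.passes += 1
        return iter(self._edges)


def out_degrees(stream):
    """Compute every node's out-degree in one pass, using O(n) counters."""
    degree = Counter()
    for src, _dst in stream:
        degree[src] += 1
    return degree
```

A usage example: `stream = EdgeStream([("a", "u"), ("a", "b"), ("b", "u")])`, then `out_degrees(stream)` reads the edges once, after which `stream.passes` equals 1.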
Related Work
• Sparsifiers (Benczur-Karger 96, Spielman-Teng 01, Spielman-Srivastava 08)
– Given an undirected graph, produce a sparse one
– Approximately preserves x’Lx
– Can be used to compute sparse cuts
• Streaming version of BK96 (Ahn, Guha 09)
– Sparse cuts in 1 pass and $\tilde{O}(n)$ space
• Accelerated PageRank (McSherry 08)
– Heuristics
Key Idea
[Figure: a walk of length l from u to v]
One walk of length l from u, efficiently.
Later extended to many walks.
Single Random Walk - Naive Algo.
[Figure: walk starting at node s]
One step with every pass!
Constant space, l passes (one pass per step of a length-l walk)
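The naive algorithm can be sketched as follows: each pass over the stream picks one uniformly random out-edge of the current node by reservoir sampling, so the walk advances one step per pass while storing only a constant number of values beyond the walk itself. The function name `naive_stream_walk` and the representation of the stream as a re-iterable collection of `(src, dst)` pairs are assumptions made for this sketch.

```python
import random


def naive_stream_walk(edges, start, length, seed=0):
    """Random walk that takes exactly one step per pass over the edges.

    edges: any re-iterable collection of (src, dst) pairs; every `for`
           loop over it below stands for one pass over the stream.
    Returns the walk (list of nodes) and the number of passes used.
    """
    rng = random.Random(seed)
    walk = [start]
    current = start
    passes = 0
    for _ in range(length):
        passes += 1
        chosen, seen = None, 0
        for src, dst in edges:
            if src == current:
                seen += 1
                # Reservoir sampling: keep the k-th candidate with prob 1/k,
                # which leaves each out-edge chosen with equal probability.
                if rng.randrange(seen) == 0:
                    chosen = dst
        if chosen is None:
            break  # current node has no out-edges
        current = chosen
        walk.append(current)
    return walk, passes
```

On a stream where every node has an out-edge, a walk of length l therefore costs exactly l passes, which is the baseline the rest of the talk improves on.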
Second Naive Algo
[Figure: walk starting at node s]
Single pass: sample sufficient edges!
If [condition missing], then sample 2 out-edges from each node (store order).
Comparison
Naive (single walk): [formula missing; surviving symbol: l]
Our Result: [formula missing]
In fact, automatically: [formula missing] walks!
Insight: Merge Short Walks
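[Figure: short walk segments labeled w from sampled centers, merged into one walk starting at s; nodes a and b]
• Sample a [formula missing] fraction of nodes (centers)
• [formula missing] length walks from the centers, in [formula missing] passes
• Merge and extend short walks!
Two problems:
• End up at a node a second time
• End up at a non-sampled node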
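The merging idea can be sketched in memory as follows. Centers are sampled, one short walk is precomputed from each center, and the long walk is built by appending the unused short walk of whatever center it currently sits on. When the walk hits one of the two problems listed above (a center whose short walk is already used, or a non-sampled node), this sketch simply takes one fresh random step; that fallback is a simplification chosen for illustration, not the stuck-node handling developed on the following slides. The names `stitched_walk`, `short_walk`, `w`, and `center_prob` are hypothetical, and the graph is an adjacency dict rather than a stream.

```python
import random


def short_walk(adj, start, w, rng):
    """Walk of up to w steps from start; stops early at a dead end."""
    path = [start]
    for _ in range(w):
        neighbors = adj[path[-1]]
        if not neighbors:
            break
        path.append(rng.choice(neighbors))
    return path


def stitched_walk(adj, start, length, w, center_prob, seed=0):
    """Build a walk of `length` steps by concatenating short walks."""
    rng = random.Random(seed)
    # Sample centers; the start node is always treated as a center.
    centers = [v for v in adj if v == start or rng.random() < center_prob]
    # One independent short walk per center; each may be used only once.
    unused = {c: short_walk(adj, c, w, rng) for c in centers}
    walk = [start]
    while len(walk) - 1 < length:
        here = walk[-1]
        segment = unused.pop(here, None)
        if segment is None:
            # Stuck: used center or non-sampled node. Simplified fallback:
            # take a single fresh random step.
            neighbors = adj[here]
            if not neighbors:
                break
            segment = [here, rng.choice(neighbors)]
        if len(segment) == 1:
            break  # the short walk could not leave a dead-end node
        walk.extend(segment[1:])
    return walk[: length + 1]
```

Because every short walk is consumed at most once and the decision to use it depends only on the current node, the stitched sequence has the same distribution as an ordinary random walk; reusing a short walk would correlate the steps.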
Stuck Nodes
[Figure: merged walk from s made of segments labeled w, ending at a stuck node]
Again. And again... Slow?
Sample an edge from each stuck node.
If new nodes, good.
In [formula missing] passes!
Stuck nodes
Stuck on same nodes?
[Figure: walk segments labeled w, with s sampled edges at each stuck node]
• Sample s edges from each stuck node
• The set must include previously seen centers
• s progress OR new node!
Summary
[Figure: walk segments labeled w, with s sampled edges at each stuck node]
• Perform short walks from sampled centers
• Concatenate walks until stuck
• Sample edges from stuck nodes
• Make local progress until a new node is reached
• Local progress = s
• New node: center with prob. [formula missing]
• Amortized progress, every pass: [formula missing]
Summary
Total number of passes: [formula missing]
Total space: [formula missing]
Summary
Set [formula missing]
Number of passes = [formula missing]
Space = [formula missing]
Many Walks
Naive space bound:
$\tilde{O}(n)$ for K [relation missing] n / l
We show: [formula missing]
Observation: many short walks are not used in a single RW.
Many Random Walks
• r_i: probability that node i’s short walk is used in a single RW.
• If r_i is known: save a lot of space!
• Perform K random walks
• Total number of short walks required is about $K \sum_i r_i \approx K \cdot l / w$
• Don’t know r_i. But can estimate.
Estimating ri
• Run K = O(log n) walks of length l
• Gives a crude estimate of r_i
• Sufficient to double K
• Continue doubling K
• Gives K walks in space [formula missing; surviving terms: n, K, l, Kl]
• Passes: [formula missing]
Distributions
Samples: [formula missing]
Space: [formula missing]
Passes: [formula missing]
Distribution: [formula missing]
Mixing Time, Conductance
• Undirected graphs: compare the distribution with the steady state.
• Estimating the difference: [formula missing] samples [Batu et al. ’01]
– Approximate mixing time.
• Directed, until the distribution “stabilizes”: [formula missing] samples.
• Conductance: [formula missing]
• Recall space for walks: $\tilde{O}(n)$ for K [relation missing] n / l
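For undirected graphs, the comparison with the steady state can be sketched as below. The steady state of a random walk on a connected undirected graph gives node v probability deg(v) / 2m, so the sketch samples walk endpoints and measures the L1 distance to that distribution. It uses the plain empirical distribution, not the sample-efficient distance tester of Batu et al. cited above, and the doubling search for a mixing length is likewise an assumption. The names `l1_distance_to_stationary` and `estimate_mixing_length` are hypothetical, and the graph is assumed connected with no isolated nodes.

```python
import random
from collections import Counter


def l1_distance_to_stationary(adj, start, length, num_walks, seed=0):
    """Empirical L1 distance between the length-step distribution and deg/2m.

    adj: dict node -> list of neighbors of an undirected graph, with each
         edge listed in both directions (hypothetical input format).
    """
    rng = random.Random(seed)
    two_m = sum(len(neighbors) for neighbors in adj.values())
    hits = Counter()
    for _ in range(num_walks):
        node = start
        for _ in range(length):
            node = rng.choice(adj[node])
        hits[node] += 1
    return sum(abs(hits[v] / num_walks - len(adj[v]) / two_m) for v in adj)


def estimate_mixing_length(adj, start, eps, num_walks, max_length):
    """Smallest power-of-two length whose estimated distance is at most eps."""
    length = 1
    while length <= max_length:
        if l1_distance_to_stationary(adj, start, length, num_walks) <= eps:
            return length
        length *= 2
    return None  # no tested length was within eps
```

The number of samples needed by the plain empirical estimate grows with the number of nodes, which is the motivation for using a dedicated distribution-closeness test instead.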
Results recap
• [symbol missing]-mixing time for undirected graphs: space $\tilde{O}(n)$
• Quadratic approximation to conductance
• PageRank to accuracy [formula missing]
Open Questions?
• Reduce the number of passes for random walks; in particular, achieve sub-linear space and constant passes.
• Graph cuts and graph sparsification for directed graphs
• Better (streaming) algorithms for computing eigenvectors
Thank You!
Summary
• Perform short walks from sampled centers
• Concatenate walks until stuck
• Sample edges from stuck nodes
• Make local progress until a new node is reached
• Local progress = s
• New node = [formula missing] nodes gives center
• Amortized, every pass: [formula missing]
Analysis
• Total number of passes: [formula missing]
• Total space: [formula missing]
• Set [formula missing]
• Number of passes = [formula missing]
• Space = [formula missing]