Estimating PageRank on Graph Streams
Atish Das Sarma (Georgia Tech)
Sreenivas Gollapudi, Rina Panigrahy (Microsoft Research)

PageRank
• PageRank: determine a ranking of the nodes in a graph
• Typically large graphs: WWW, social networks
• Run daily by commercial search engines

PageRank Computation
[Figure: example graph on nodes a, b, c, u]
• Our approach: no matrix-vector multiplication!

Our Result
• Many random walk samples, efficiently
• Approximate PageRank

Other Results from Random Walks
[Figure: graph G with node u]
• We can estimate: mixing time, conductance
• Using streams

Streaming
• Input is a "stream": e1, e2, e3, e4, e5, e6, e7, ...
• Few passes
• Small RAM working memory
• Frequency moments, quantiles
• Graphs: edges, in arbitrary order

Related Work
• Sparsifiers (Benczur-Karger 96, Spielman-Teng 01, Spielman-Srivastava 08)
  – Given an undirected graph, produces a sparse one
  – Approximately preserves x'Lx
  – Can be used to compute sparse cuts
• Streaming version of BK96 (Ahn, Guha 09)
  – Sparse cuts in 1 pass and Õ(n) space
• Accelerated PageRank (McSherry 08)
  – Heuristics

Key Idea
[Figure: walk of length l from u to v]
• One walk from u of length l, efficiently
• Later extend to many walks

Single Random Walk: Naive Algorithm
• One step with every pass!
• Constant space, l passes

Second Naive Algorithm
• Single pass
• Sample sufficient edges!
• If [...], then sample 2 out-edges from each node (store the order)

Comparison
• Naive (single walk): l
• Our result: [...]
• In fact, automatically: [...] walks!

Insight: Merge Short Walks
[Figure: short walks of length w between sampled centers]
• Sample a [...] fraction of the nodes (centers)
• [...]-length walks, in [...] passes
• Merge and extend short walks!
• Two problems:
  – End up at a node a second time
  – End up at a non-sampled node

Stuck Nodes
• Again. And again... Slow?
• Sample an edge from the stuck node.
• If new nodes, good: progress in [...] passes!

Stuck on the Same Nodes?
• Sample s edges from each stuck node
• Must include previously seen centers in the set
• s progress OR new node!
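For reference, the first naive algorithm (one step with every pass, constant space) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function name `naive_stream_walk`, the convention that `edge_stream()` returns a fresh iterator over directed `(src, dst)` pairs for each pass, and the use of size-1 reservoir sampling to pick a uniform out-edge are all assumptions made for the example.

```python
import random


def naive_stream_walk(edge_stream, start, length, rng=None):
    """Walk `length` steps from `start`, reading the edge stream once per step.

    edge_stream: hypothetical zero-argument callable; each call starts a new
                 pass and returns an iterator over (src, dst) edges.
    Returns the list of visited nodes, beginning with `start`.
    """
    rng = rng or random.Random()
    current = start
    path = [start]
    for _ in range(length):
        chosen = None
        seen = 0  # out-edges of `current` seen so far in this pass
        for src, dst in edge_stream():  # one full pass over the stream
            if src == current:
                seen += 1
                # Reservoir sampling with a reservoir of size 1: the i-th
                # matching edge replaces the current choice with prob. 1/i,
                # so every out-edge is equally likely at the end of the pass.
                if rng.randrange(seen) == 0:
                    chosen = dst
        if chosen is None:  # no out-edges: the walk cannot continue
            break
        current = chosen
        path.append(current)
    return path
```

Only the current node, one candidate edge, and a counter are kept between edges (the returned path aside), which is the constant-space, l-pass baseline that the merged short walks are meant to improve on.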
Summary
• Perform short walks from sampled centers
• Concatenate walks until stuck
• Sample edges from stuck nodes
• Make local progress until a new node is reached
• Local progress = s
• New node: center with prob. [...]
• Amortized progress, every pass

Analysis
• Total number of passes: [...]
• Total space: [...]
• Set w [...]
• Number of passes = [...]
• Space = [...]

Many Walks
• Naive space bound: [...]
• We show: Õ(n) for K [...] n/l
• Observation: many short walks are not used in the single random walk.

Many Random Walks
• r_i: probability that node i's short walk is used in the single random walk.
• If r_i is known: save a lot of space!
• Perform K random walks
• Total number of short walks required is about [...] (an expression in l, K, w and r_i)
• We don't know r_i, but we can estimate it.

Estimating r_i
• Run K = O(log n) walks of length l
• Gives a crude estimate of r_i
• Sufficient to double K
• Continue doubling K
• Gives K walks in space Õ([...]) (a bound in n, K and l)
• Passes: [...]

Distributions
[Figure: node u]
• Samples: [...]
• Space: [...]
• Passes: [...]

Distribution: Mixing Time, Conductance
• Undirected graphs: compare the distribution with the steady state.
• Estimating the difference: [...] samples [Batu et al. '01], giving an approximate mixing time.
• Directed graphs: run until the distribution "stabilizes": [...] samples.
• Conductance: [...]
• Recall space for walks: Õ(n) for K [...] n/l

Results Recap
• [...]-mixing time for undirected graphs. Space: Õ(n)
• Quadratic approximation to conductance
• PageRank to accuracy [...]

Open Questions?
• Improve passes for random walks; in particular, sub-linear space and constant passes.
• Graph cuts and graph sparsification for directed graphs
• Better (streaming) algorithms for computing eigenvectors

Thank You!
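To make the "compare distribution with steady state" step for undirected graphs concrete, here is a small Python sketch. It is a plain empirical comparison and is deliberately not the sample-efficient tester of Batu et al. cited in the talk; the helper names `empirical_distribution`, `stationary_distribution` and `l1_distance` are hypothetical, and the input `endpoints` is assumed to be the list of end nodes of independent random walks of a fixed length.

```python
from collections import Counter


def empirical_distribution(endpoints):
    """Fraction of walks ending at each node (nodes never hit are omitted)."""
    counts = Counter(endpoints)
    total = len(endpoints)
    return {node: c / total for node, c in counts.items()}


def stationary_distribution(undirected_edges):
    """Steady state of a random walk on a connected undirected graph:
    the probability of a node is proportional to its degree."""
    degree = Counter()
    for u, v in undirected_edges:
        degree[u] += 1
        degree[v] += 1
    total = sum(degree.values())  # equals 2 * (number of edges)
    return {node: d / total for node, d in degree.items()}


def l1_distance(p, q):
    """Sum over all nodes of |p(x) - q(x)|, treating missing nodes as 0."""
    return sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in set(p) | set(q))
```

Under these assumptions, one would draw endpoints for increasing walk lengths and take the first length at which `l1_distance(empirical_distribution(endpoints), stationary_distribution(edges))` drops below a chosen threshold as a rough mixing-time estimate; the threshold and the number of samples needed are left open here.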