CMU SCS Peta-Graph Mining Christos Faloutsos Appel, Ana Chau, Polo Leskovec, Jure Kang, U Prakash, Aditya Shringarpure, Suyash Tsourakakis, Charalampos Yahoo/Hadoop, 2008 #1 CMU SCS Our goal: One-stop solution for mining huge graphs Yahoo/Hadoop, 2008 2 CMU SCS Outline Centralized Hadoop Degree Distribution old old Pagerank old old Diameter/ANF old X Communities old X Triangles X todo Visualization X todo Datasets: (a) Synthetic (‘Kronecker’, ~300M nodes, 1B edges) Yahoo/Hadoop, (b) NetFlix (20K movies, ~500K2008 users, 100M edges) 3 CMU SCS Degree Distributions - NetFlix count 100 machines - 8min Yahoo/Hadoop, 2008 Movie in-degree 4 CMU SCS Degree Distributions - NetFlix count Theoretically expected 100 machines - 8min Yahoo/Hadoop, 2008 Movie in-degree 5 CMU SCS Degree Distributions - NetFlix count 100 machines - 8min Yahoo/Hadoop, 2008 User out-degree 6 CMU SCS Degree Distributions - NetFlix count Theoretically expected Sharp drop below 100 ratings User out-degree 100 machines - 8min Yahoo/Hadoop, 2008 7 CMU SCS Degree Distributions - Kronecker count degree 100 machines - 6h Yahoo/Hadoop, 2008 Nodes:259M - Edges: 1B 8 CMU SCS Degree Distributions - timings Time (sec) 24 tasks 48 tasks 1 task Yahoo/Hadoop, Edge 2008 file size (MB) 9 CMU SCS Outline Centralized Hadoop Degree Distribution old old Pagerank old old Diameter/ANF old X Communities old X Triangles X todo Visualization X todo Datasets: (a) Synthetic (‘Kronecker’, ~300M nodes, 1B edges) Yahoo/Hadoop, (b) NetFlix (20K movies, ~500K2008 users, 100M edges) 10 CMU SCS Diameter Diameter of a graph Maximum shortest path Normally, > O(N**2) ANF : `Approximate Neighborhood function’ [Palmer+02]: O(E) Goal : calculate neighborhood function Neighborhood N(h) : number of pairs of nodes within distance h Yahoo/Hadoop, 2008 11 CMU SCS Time (min) Diameter 1 node 48 nodes 28 nodes Edge file (MB) For large jobs, parallelization helps Yahoo/Hadoop, 2008 Unstable results due to shared machines 12 CMU SCS Diameter / Hop Plot (Netflix) # of reachable pairs within <= h hops h: # of hops Yahoo/Hadoop, 2008 13 CMU SCS Diameter / Hop Plot (Netflix) # of reachable pairs within <= h hops Diameter: 3 h: # of hops Yahoo/Hadoop, 2008 14 CMU SCS Outline Centralized Hadoop Degree Distribution old old Pagerank old old Diameter/ANF old X Communities old X Triangles X todo Visualization X todo Datasets: (a) Synthetic (‘Kronecker’, ~300M nodes, 1B edges) Yahoo/Hadoop, (b) NetFlix (20K movies, ~500K2008 users, 100M edges) 15 CMU SCS Community detection • Cross associations [Chakrabarti+ ’04] Yahoo/Hadoop, 2008 16 CMU SCS Community detection Yahoo/Hadoop, 2008 17 CMU SCS Outline Centralized Hadoop Degree Distribution old old Pagerank old old Diameter/ANF old X Communities old X Triangles X todo Visualization X todo Datasets: (a) Synthetic (‘Kronecker’, ~300M nodes, 1B edges) Yahoo/Hadoop, (b) NetFlix (20K movies, ~500K2008 users, 100M edges) 18 CMU SCS Triangles • ‘friends of friends are friends’ Yahoo/Hadoop, 2008 19 CMU SCS Triangles • ‘friends of friends are friends’ Yahoo/Hadoop, 2008 20 CMU SCS Triangles • ‘friends of friends are friends’ • Naïve algo: 3-way join (slow) • [Tsourakakis’08]: # triangles ~ sum of cubes of eigenvalues • Thus, super-fast computation of #triangles (100x - 25,000x faster than naïve; >95% accuracy Yahoo/Hadoop, 2008 21 CMU SCS Triangles • Easy to implement on hadoop: it only needs eigenvalues (to do, with Lanczos) Yahoo/Hadoop, 2008 22 CMU SCS Outline Centralized Hadoop Degree Distribution old old Pagerank old old Diameter/ANF old X Communities old X Triangles X todo Visualization X todo Datasets: (a) Synthetic (‘Kronecker’, ~300M nodes, 1B edges) Yahoo/Hadoop, (b) NetFlix (20K movies, ~500K2008 users, 100M edges) 23 CMU SCS Visualization Principled visualization of large graphs Yahoo/Hadoop, 2008 (show few most `important’ edges) 24 CMU SCS Summary Centralized Hadoop Degree Distribution old old Pagerank old old Diameter/ANF old X Communities old X Triangles X todo Visualization X todo Goal: one-stop solution for mining huge graphs Yahoo/Hadoop, 2008 25
© Copyright 2026 Paperzz