SCS CMU

Colibri: Fast Mining of Large Static and Dynamic Graphs
Joint work by Hanghang Tong, Spiros Papadimitriou, Jimeng Sun, Philip S. Yu, Christos Faloutsos
Speaker: Hanghang Tong
KDD 2008, Aug. 24-27, 2008, Las Vegas

Graphs are everywhere!
Q: How to find patterns? e.g., community, anomaly, etc.

Motivation
• Q: How to find patterns?
  – e.g., community, anomaly, etc.
• A: Low-Rank Approximation (LRA) for the adjacency matrix of the graph: $A \approx L \times M \times R$

LRA for Graph Mining: Example
[Figure: adjacency matrix A between authors (John, Carl, Tom, Bob, Van Roy) and conferences (ICDM, KDD, ISMB, RECOMB), approximated as $A \approx L \times M \times R$; panels labeled author clusters, cluster interaction, and conference clusters.]
• High reconstruction error: 'Carl' is abnormal

Challenges
• How to get (L, M, R)
  + Efficiently (both time and space);
  + Intuitively (easy for interpretation);
  + Dynamically (track patterns over time)?

Roadmap
• Motivation
• Existing Methods
  – SVD
  – CUR/CX
• Proposed Methods: Colibri
• Experimental Results
• Conclusion

Matrix & Column Space
• Matrix: $B = \begin{bmatrix} 3 & 1 \\ 1 & 1 \\ 0 & 0 \end{bmatrix} = [b_1, b_2]$; $b_1$, $b_2$ are vectors in 3-d space!
• Column space of a matrix
[Figure: the vectors $b_1$ and $b_2$ and the column space they span.]

Projection, Projection Matrix & Core Matrix
$\tilde{v} = B (B^T B)^{+} B^T v$
• $v$: an arbitrary vector
• $\tilde{v}$: projection of $v$ onto the column space of $B$
• $(B^T B)^{+}$: core matrix
• $B (B^T B)^{+} B^T$: projection matrix of $B$

Singular-Value-Decomposition (SVD)
$A \approx U \Sigma V^T$, where $A = [a_1, a_2, a_3, \ldots]$ is n x m
• $U = [u_1, \ldots, u_k]$: left singular vectors
• $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_k)$: singular values
• $V = [v_1, \ldots, v_k]$: right singular vectors

SVD: How to
• #1: Find the left matrix U, where $u_i = \frac{A v_i}{\sigma_i} = \frac{a_1 v_{i,1} + a_2 v_{i,2} + \cdots + a_m v_{i,m}}{\sigma_i}$
• #2: Project A into the column space of U: $A \approx U (U^T U)^{+} U^T A = \cdots = U \Sigma V^T$, where $U (U^T U)^{+} U^T$ is the projection matrix of the column space of U

SVD: drawbacks
$A \approx U \Sigma V^T$
• Efficiency
  – Time: $O(\min(n^2 m, n m^2))$
  – Space: (U, V) are dense
• Interpretation
  [Figure: 1st singular vector and 2nd singular vector.]
• Dynamic: not easy

CUR (CX) decomposition
$A \approx C \times U \times R$, with $U = (C^T C)^{+}$ and $R = C^T A$, where A is n x m
• Sample columns from A to form C
• Project A onto the column space of C

CUR (CX): advantages
• Efficiency (better than SVD)
  – Time: $O(c^2 n)$ or $O(c^3 + cm)$ (c is the number of sampled columns)
  – Space: (C, R) are sparse
• Interpretation

CUR (CX): drawbacks
• Redundancy in C, wasting both time and space
  – 3 copies of green,
  – 2 copies of red,
  – 2 copies of purple
  – purple = 0.5*green + red…
• Dynamic: not easy

Roadmap
• Motivation
• Existing Methods
• Colibri
  – Colibri-S for static graphs
  – Colibri-D for dynamic graphs
• Experimental Results
• Conclusion

Colibri-S: Basic Idea
[Figure: the original matrix, its CUR (CX) decomposition, and its Colibri-S decomposition $L \times M \times R$.]
• The sampled matrix in CUR (CX) holds 3 copies of green, 2 copies of red, 2 copies of purple, and purple = 0.5*green + red…
• We want the columns in L to be linearly independent of each other!

Colibri-S: Input and Output
• Input: the initially sampled matrix C
• Output:
  – L: linearly independent columns
  – $M = (L^T L)^{-1}$: core matrix
  – $R = L^T \times A$
• Q: How to find L & M from C efficiently?

A: Find L & M iteratively!
• Start from the initially sampled matrix C and the current L & M
• For each column v in C:
  – Project it on L
  – Redundant? Discard v
  – Otherwise, expand L & M

Colibri-S vs. CUR (CX)
• Quality: Colibri-S = CUR (CX)
• Time: Colibri-S >= CUR (CX) (at least as good)
  – $O(\tilde{c}^3 + \tilde{c}\tilde{m})$ vs. $O(c^3 + cm)$, where $\tilde{c} \le c$, $\tilde{m} \le m$
• Space: Colibri-S >= CUR (CX)
• Illustrations
  [Figure: Colibri-S vs. CUR (CX).]

Colibri-D for dynamic graphs
[Figure: at time t, the initially sampled matrix yields $L_t$, $M_t$, $R_t$; at time t+1, $L_{t+1}$, $M_{t+1}$, $R_{t+1}$ are unknown.]
• Q: How to update L and M efficiently?

Colibri-D: How-To
[Figure: the initially sampled matrix at t and t+1, with columns marked as selected or redundant, and the columns changed from t highlighted.]

Colibri-D: How-To
[Figure: the selected columns that are unchanged from t give $\tilde{L}$ and $\tilde{M}$, the subspace spanned by the blue columns at t+1; from these, $L_{t+1}$ and $M_{t+1}$ are obtained.]

Roadmap
• Motivation
• Existing Methods
• Colibri
• Experimental Results
• Conclusion

Experimental Setup
• Datasets
  – Network traffic
  – 21,837 sources/destinations
  – 1,222 consecutive hours
  – 22,800 edges per hour
• Accuracy: Accu = [formula missing in source]
• Space Cost: [formula missing in source]

Performance of Colibri-S
[Figure: time and space of CUR, CMD, and ours.]
• Accuracy
  – Same, 91%+
• Time
  – 12x faster than CMD
  – 28x faster than CUR
• Space
  – ~1/3 of CMD
  – ~10% of CUR

More Evaluation on Colibri-S
[Figure: log time (sec) vs. approximation accuracy for CUR, CMD, and Colibri-S.]

Time Performance of Colibri-D
[Figure: time vs. # of changed columns for CMD, Colibri-S, and Colibri-D.]
• Colibri-D achieves up to 112x speedups

A Family of Low-Rank Approximation for Fast Graph Mining
• Colibri-S
  – For static graphs
  – Remove redundancy
  – Significant savings in time & space for "free"
• Colibri-D
  – For dynamic graphs
  – Explores "smoothness"
  – Up to 112x faster than the best known methods

Poster tonight! Thank you!
www.cs.cmu.edu/~htong
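
Appendix sketch: the iterative procedure on the "A: Find L & M iteratively!" slide (project each sampled column on L, discard it if redundant, otherwise expand L & M) can be illustrated with a short NumPy example. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `colibri_s_sketch`, `sampled_idx`, and `tol` are hypothetical, the relative-tolerance redundancy test is an assumption, and the update of M uses the standard block-matrix inverse identity, which may differ from the update used in the actual method.

```python
import numpy as np

def colibri_s_sketch(A, sampled_idx, tol=1e-8):
    """Illustrative only: keep the linearly independent sampled columns.

    Returns L (kept columns), M = (L^T L)^{-1}, and R = L^T A.
    `tol` is a hypothetical relative threshold for "redundant".
    """
    A = np.asarray(A, dtype=float)
    C = A[:, sampled_idx]                  # initially sampled matrix
    L = np.zeros((A.shape[0], 0))          # current L (no columns yet)
    M = np.zeros((0, 0))                   # current core matrix
    for j in range(C.shape[1]):
        v = C[:, j]
        y = M @ (L.T @ v)                  # coordinates of v's projection on L
        r = v - L @ y                      # residual after projecting on L
        delta = float(r @ r)               # squared residual norm
        if delta <= tol * max(float(v @ v), 1.0):
            continue                       # redundant: discard v
        # Expand M with the block-inverse identity (assumed update rule).
        k = L.shape[1]
        M_new = np.empty((k + 1, k + 1))
        M_new[:k, :k] = M + np.outer(y, y) / delta
        M_new[:k, k] = -y / delta
        M_new[k, :k] = -y / delta
        M_new[k, k] = 1.0 / delta
        M = M_new
        L = np.column_stack([L, v])        # expand L with the new column
    R = L.T @ A
    return L, M, R

# Toy matrix mirroring the slide: 3 copies of green, 2 of red, 2 of purple,
# with purple = 0.5*green + red.
green = np.array([1.0, 0.0, 2.0, 0.0])
red = np.array([0.0, 1.0, 0.0, 1.0])
purple = 0.5 * green + red
A = np.column_stack([green, green, green, red, red, purple, purple])

L, M, R = colibri_s_sketch(A, sampled_idx=list(range(A.shape[1])))
A_hat = L @ M @ R                          # low-rank reconstruction of A
```

In this toy case only one green and one red column should survive in L, since every other sampled column lies in their span. The core matrix is grown one row and column at a time rather than recomputed, which is the point of expanding L and M together.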