Fast Direction-Aware Proximity for Graph Mining
Speaker: Hanghang Tong
Joint work w/ Yehuda Koren, Christos Faloutsos
2007-8-13, KDD 2007, San Jose

Proximity on Graph
• Un-directed graph
  – What is Prox between A and B?
  – 'How close is Smith to Johnson?'
[Figure: undirected graph connecting A and B, all edge weights 1]
But many real graphs are directed...

Edge Direction w/ Proximity
[Figure: the same graph with directed edges between A and B, edge weights 1 and 0.5]
What is Prox from A to B? What is Prox from B to A?

Motivating Questions (Fast DAP)
• Q1: How to define it?
• Q2: How to compute it efficiently?
• Q3: How to benefit real applications?

Roadmap
• DAP definitions
  – Escape Probability
  – Issue #1: 'degree-1 node' effect
  – Issue #2: weakly connected pair
• Computational Issues
  – FastAllDAP: ALL pairs
  – FastOneDAP: One pair
• Experimental Results
• Conclusion

Defining DAP: escape probability
• Define Random Walk (RW) on the graph
• Esc_Prob(A->B)
  – Prob (starting at A, reaches B before returning to A)
[Figure: A and B attached to 'the remaining graph']
Esc_Prob = Pr (smile before cry)

Esc_Prob: Example
[Figure: directed graph between A and B, edge weights 1 and 0.5]
Esc_Prob(a->b)=1 > Esc_Prob(b->a)=0.5

Esc_Prob is good, but...
• Issue #1:
  – 'Degree-1 node' effect
• Issue #2:
  – Weakly connected pair
Need some practical modifications!

Issue #1: 'degree-1 node' effect [Faloutsos+] [Koren+]
[Figure: left, chain A, D, B with edge weights 1, Esc_Prob(a->b)=1; right, the same chain with degree-1 nodes E and F attached to D (weights 1/3 out of D, 1 back), Esc_Prob(a->b)=1]
• No influence for degree-1 nodes (E, F)!
  – known as 'pizza delivery guy' problem in undirected graph
• Solution: Universal Absorbing Boundary!

Universal Absorbing Boundary
[Figure: chain A, D, B with transitions 0.9 and an edge of 0.1 from each node to U-A-B]
U-A-B is a black-hole!
Footnote: fly-out probability = 0.1

Introducing Universal-Absorbing-Boundary
[Figure: the two graphs from Issue #1, before and after adding U-A-B]
• Chain A, D, B: Esc_Prob(a->b)=1 without U-A-B; Prox(a->b)=0.91 with U-A-B
• Chain plus degree-1 nodes E, F: Esc_Prob(a->b)=1 without U-A-B; Prox(a->b)=0.74 with U-A-B
Footnote: fly-out probability = 0.1

Issue #2: Weakly connected pair
[Figure: A and B connected through three directed edges of weight 1]
Prox(A->B) = Prox(B->A) = 0
Solution: Partial symmetry!
[Figure: an edge i -> j of weight w becomes i -> j with weight a·w plus j -> i with weight (1-a)·w]

Practical Modifications: Partial Symmetry
[Figure: the three-edge graph before and after partial symmetry; each weight-1 edge becomes 0.9 in its original direction and 0.1 in the reverse direction]
• Before: Prox(A->B) = Prox(B->A) = 0
• After: Prox(A->B) = 0.081 > Prox(B->A) = 0.009

Solving Esc_Prob: [Doyle+]
P: transition matrix (row norm.)
n: # of nodes in the graph
Esc_Prob(i->j) = p_{i,j} + row_i (I - P_rest)^{-1} col_j
  – row_i: 1 x (n-2), i^th row of P removing i^th & j^th elements
  – P_rest: (n-2) x (n-2), P removing i^th & j^th rows & cols
  – col_j: (n-2) x 1, j^th col of P removing i^th & j^th elements
One matrix inversion, one Esc_Prob!

Solving Esc_Prob: Example
    | p1,1  p1,2  p1,3  p1,4  p1,5  p1,6 |
    | p2,1  p2,2  p2,3  p2,4  p2,5  p2,6 |
P = | p3,1  p3,2  p3,3  p3,4  p3,5  p3,6 |
    | p4,1  p4,2  p4,3  p4,4  p4,5  p4,6 |
    | p5,1  p5,2  p5,3  p5,4  p5,5  p5,6 |
    | p6,1  p6,2  p6,3  p6,4  p6,5  p6,6 |
P: Transition matrix (row norm.)
[Figure: example directed graph on nodes 1-6, edge weights 0.5 and 1]
Esc_Prob(1->5) = p1,5 + [p1,2 p1,3 p1,4 p1,6] (I - P_rest)^{-1} [p2,5 p3,5 p4,5 p6,5]^T
  – P_rest: P removing rows & cols 1 and 5

Solving DAP (Straight-forward way)
1-c: fly-out probability (to black-hole)
[Equation not preserved in this transcript; it combines a 1 x (n-2) block, an (n-2) x (n-2) block, and an (n-2) x 1 block, as for Esc_Prob]
One matrix inversion, one proximity!

Challenges
• Case 1: Medium Size Graph
  – Matrix inversion is feasible, but...
  – What if we want many proximities?
  – Q: How to get all (n^2) proximities efficiently?
  – A: FastAllDAP!
• Case 2: Large Size Graph
  – Matrix inversion is infeasible
  – Q: How to get one proximity efficiently?
  – A: FastOneDAP!

FastAllDAP
• Q1: How to efficiently compute all possible proximities on a medium size graph?
  – a.k.a. how to efficiently solve multiple linear systems simultaneously?
• Goal: reduce # of matrix inversions!
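
The partial-symmetry step and the straight-forward solver from the slides above can be sketched as follows, in Python with NumPy (an assumed choice; the talk names no language). The function names and the 4-node toy graph are hypothetical. The way the fly-out factor c enters (every transition probability is scaled by c, so c = 1 gives the plain escape probability) is also an assumption, since the DAP equation itself did not survive in this transcript.

```python
import numpy as np

def partial_symmetry(W, a=0.9):
    """Edge i->j of weight w becomes i->j with a*w and j->i with (1-a)*w."""
    W = np.asarray(W, dtype=float)
    return a * W + (1.0 - a) * W.T

def row_normalize(W):
    """Turn a non-negative weight matrix into a row-normalized transition matrix P."""
    W = np.asarray(W, dtype=float)
    sums = W.sum(axis=1, keepdims=True)
    # Rows without out-edges stay all-zero instead of dividing by zero.
    return np.divide(W, sums, out=np.zeros_like(W), where=sums > 0)

def prox_straight(P, i, j, c=1.0):
    """One proximity from one (n-2) x (n-2) linear solve.

    c = 1.0: escape probability (start at i, reach j before returning to i).
    c < 1.0: assumed DAP form with fly-out probability 1 - c at every step.
    """
    n = P.shape[0]
    rest = [k for k in range(n) if k != i and k != j]
    row_i = P[i, rest]               # 1 x (n-2)
    col_j = P[rest, j]               # (n-2) x 1
    P_rest = P[np.ix_(rest, rest)]   # (n-2) x (n-2)
    A = np.eye(n - 2) - c * P_rest
    return c * P[i, j] + c**2 * (row_i @ np.linalg.solve(A, col_j))

# Hypothetical toy graph: 0 -> 1, 1 -> {2, 3}, 2 -> {0, 1}, 3 -> 1.
W = np.array([[0, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
P = row_normalize(W)

esc_02 = prox_straight(P, 0, 2)          # escape probability 0 -> 2
esc_20 = prox_straight(P, 2, 0)          # escape probability 2 -> 0
dap_02 = prox_straight(P, 0, 2, c=0.9)   # with fly-out probability 0.1
print(esc_02, esc_20, dap_02)
```

On this toy graph the escape probability is direction-dependent (about 1 from node 0 to node 2, about 0.5 in the other direction), which mirrors the asymmetry shown in the Esc_Prob example slide. Each call performs its own solve, which is exactly the cost the next slides set out to reduce.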

FastAllDAP: Observation
[Figure: the 6 x 6 transition matrix P = [p_{i,j}] shown twice, with different rows & cols removed for two different node pairs, next to the example graph on nodes 1-6 (edge weights 0.5 and 1)]
Need two different matrix inversions!

FastAllDAP: Rescue
[Figure: the matrix P for Prox(1->5) and for Prox(1->6), with the sub-matrix used by each shaded gray]
Overlap between two gray parts!
Redundancy among different linear systems!

FastAllDAP: Theorem
• Theorem: [equation not preserved in this transcript]
• Example: [equation not preserved in this transcript]
• Proof: by SM (Sherman-Morrison) Lemma

FastAllDAP: Algorithm
• Alg.
  – Compute Q
  – For i, j = 1, ..., n, compute Prox(i->j) [formula not preserved in this transcript]
• Computational Savings: O(1) matrix inversions instead of O(n^2)!
• Example
  – w/ 1000 nodes,
  – 1M matrix inversions vs. 1 matrix inversion!

FastOneDAP
• Q1: How to efficiently compute one single proximity on a large size graph?
  – a.k.a. how to solve one linear system efficiently?
• Goal: avoid matrix inversion!

FastOneDAP: Observation
[Figure: example graph on nodes 1-6, edge weights 0.5 and 1]
Partial info. (4 elements / 2 cols) of Q is enough!

FastOneDAP: Observation
• Q: How to compute one column of Q?
• A: Taylor expansion
[Equation: the i^th col of Q written in terms of the unit vector [0, ..., 0, 1, 0, ..., 0]^T; the 'Reminder:' formula is not preserved in this transcript]

FastOneDAP: Observation
[Equation: the i^th col of Q as a sum of repeated matrix products applied to [0, ..., 0, 1, 0, ..., 0]^T]
Sparse matrix-vector multiplications!

FastOneDAP: Iterative Alg.
• Alg. to estimate i^th col of Q [steps not preserved in this transcript]

FastOneDAP: Property
• Convergence guaranteed!
• Computational Savings
  – Example:
    • 100K nodes and 1M edges (50 iterations)
    • 10,000,000x faster!
• Footnote: 1 col is enough!
  – (details in paper)

Datasets (all real)
Name   Node #   Edge #   Directionality
WL     4k       10k      A-links-to-B
PC     36k      64k      Who-contact-whom
EP     76k      509k     Who-trust-whom
CN     28k      353k     A-cites-B
AE     38k      115k     Who-email-to-whom

Link Prediction: existence
[Figure: two density histograms over Prox(i->j) + Prox(j->i), one for node pairs 'with link' and one for pairs with 'no link']
DAP is effective at distinguishing red from blue!

Link Prediction: existence
Dataset   Accuracy
WL        65.40%
PC        79.60%
AE        81.51%
CN        86.71%
EP        92.21%

Link Prediction: direction
• Q: Given the existence of the link, what is the direction of the link?
• A: Compare Prox(i->j) and Prox(j->i)
[Figure: density of Prox(i->j) - Prox(j->i); annotation: >70%]

Efficiency: FastAllDAP
[Figure: Time (sec) vs. Size of Graph, Straight-Solver vs. FastAllDAP]
1,000x faster!

Efficiency: FastOneDAP
[Figure: Time (sec) vs. Size of Graph, Straight-Solver vs. FastOneDAP]
10,000x faster!

Conclusion (Fast DAP)
• Q1: How to define it?
• A1: Esc_Prob + Practical Modifications
• Q2: How to compute it efficiently?
• A2: FastAllDAP & FastOneDAP
  – (100x – 10,000x faster!)
• Q3: How to benefit real applications?
• A3: Link Prediction (existence & direction)

More in the paper...
• Generalization to group proximity
  – Definitions; Fast solutions
  – 'How close between/from CEOs and/to Accountants?'
• More applications
  – Dir-CePS, attributed-graphs
[Figure: four example subgraphs on A, B, C: CePS; common descendant; common ancestor; descendant of B & common ancestor of A and C]

Cupid uses arrows, so does graph mining!
Thank you!
www.cs.cmu.edu/~htong
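
Backup sketch for the computational slides (FastAllDAP and FastOneDAP), in Python with NumPy. Everything here is an assumption made for illustration, because the theorem and algorithm equations are not preserved in this transcript: Q is taken to be (I - cP)^{-1}, the all-pairs closed form is Prox(i->j) = Q[i,j] / (Q[i,i]·Q[j,j] - Q[i,j]·Q[j,i]), and one column of Q is estimated by the truncated series e_i + cP e_i + (cP)^2 e_i + ... The function names and the 4-node toy graph are hypothetical.

```python
import numpy as np

def fast_all_dap(P, c=0.9):
    """All-pairs proximities from a single n x n inversion (assumed closed form)."""
    n = P.shape[0]
    Q = np.linalg.inv(np.eye(n) - c * P)   # the only matrix inversion
    d = np.diag(Q)
    denom = np.outer(d, d) - Q * Q.T       # Q[i,i]*Q[j,j] - Q[i,j]*Q[j,i]
    np.fill_diagonal(denom, 1.0)           # avoid 0/0 for i == j
    prox = Q / denom
    np.fill_diagonal(prox, 0.0)            # proximity of a node to itself is not defined here
    return prox

def q_column(P, i, c=0.9, iters=50):
    """Estimate the i-th column of Q using only matrix-vector products."""
    term = np.zeros(P.shape[0])
    term[i] = 1.0                          # unit vector [0, ..., 0, 1, 0, ..., 0]^T
    col = term.copy()
    for _ in range(iters):
        term = c * (P @ term)              # next term of the series
        col += term
    return col

def fast_one_dap(P, i, j, c=0.9, iters=50):
    """One proximity from two estimated columns of Q (4 elements in total)."""
    qi = q_column(P, i, c, iters)          # holds Q[i,i] and Q[j,i]
    qj = q_column(P, j, c, iters)          # holds Q[i,j] and Q[j,j]
    return qj[i] / (qi[i] * qj[j] - qj[i] * qi[j])

# Hypothetical toy graph, already row-normalized:
# 0 -> 1, 1 -> {2, 3}, 2 -> {0, 1}, 3 -> 1.
P = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [0.5, 0.5, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])

all_prox = fast_all_dap(P)
one_prox = fast_one_dap(P, 0, 2, iters=300)
print(all_prox[0, 2], all_prox[2, 0], one_prox)
```

The dense arrays keep the sketch short; on a large graph P would be stored in a sparse format so that each iteration costs time proportional to the number of edges, which is the point of the 'sparse matrix-vector multiplications' slide. The iteration count trades accuracy for time, since the truncated terms shrink roughly like c^k.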