FastOneDAP - VideoLectures

Fast Direction-Aware Proximity
for Graph Mining
Speaker: Hanghang Tong
Joint work w/ Yehuda Koren, Christos Faloutsos
2007-8-13
KDD 2007, San Jose
Proximity on Graph
• Undirected graph
– What is Prox between A and B?
– ‘How close is Smith to Johnson?’
[Figure: undirected graph containing A and B; all edge weights 1]
But many real graphs are directed...
Edge Direction w/ Proximity
[Figure: directed graph containing A and B; edge weights 1 and 0.5]
What is Prox from A to B?
What is Prox from B to A?
Motivating Questions (Fast DAP)
• Q1: How to define it?
• Q2: How to compute it efficiently?
• Q3: How to benefit real applications?
Roadmap
• DAP definitions
– Escape Probability
– Issue # 1: ‘degree-1 node’ effect
– Issue # 2: weakly connected pair
• Computational Issues
– FastAllDAP: ALL pairs
– FastOneDAP: One pair
• Experimental Results
• Conclusion
Defining DAP: escape probability
• Define a random walk (RW) on the graph
• Esc_Prob(A→B)
– Prob(starting at A, the walk reaches B before returning to A)
[Figure: A and B connected through the remaining graph]
Esc_Prob = Pr(smile before cry)
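A minimal Monte Carlo sketch of the definition above, assuming the graph is stored as a dict of dicts {node: {neighbor: weight}} (a hypothetical format chosen for illustration; `row_normalize` and `esc_prob_mc` are illustrative names): start many walks at A and count the fraction that hit B before coming back to A.

```python
import random


def row_normalize(W):
    """Turn weighted out-edges {u: {v: w}} into transition probabilities."""
    P = {}
    for u, nbrs in W.items():
        total = sum(nbrs.values())
        P[u] = {v: w / total for v, w in nbrs.items()} if total > 0 else {}
    return P


def esc_prob_mc(P, a, b, trials=20_000, seed=0, max_steps=10_000):
    """Monte Carlo estimate of Esc_Prob(a->b): the fraction of random walks
    started at a that reach b before returning to a."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        node = a
        for _ in range(max_steps):
            nbrs = P.get(node, {})
            if not nbrs:            # dead end: this walk can never reach b
                break
            node = rng.choices(list(nbrs), weights=list(nbrs.values()))[0]
            if node == b:           # reached b first: "smile"
                hits += 1
                break
            if node == a:           # came back to a first: "cry"
                break
        # walks still running after max_steps are counted as misses
    return hits / trials


# Undirected path A - D - B, stored as edges in both directions.
W = {"A": {"D": 1}, "D": {"A": 1, "B": 1}, "B": {"D": 1}}
print(round(esc_prob_mc(row_normalize(W), "A", "B"), 2))  # close to 0.5
```

The exact value for this path is 0.5: from A the walk must move to D, and from D it goes to B or back to A with probability 1/2 each.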
Esc_Prob: Example
[Figure: directed example graph on A and B; edge weights 1 and 0.5]
Esc_Prob(A→B) = 1 > Esc_Prob(B→A) = 0.5
Esc_Prob is good, but...
• Issue #1:
– ‘Degree-1 node’ effect
• Issue #2:
– Weakly connected pair
Need some practical modifications!
Issue #1: ‘degree-1 node’ effect
[Faloutsos+] [Koren+]
[Figure: A → D → B, edge weights 1] Esc_Prob(A→B) = 1
[Figure: the same graph plus degree-1 nodes E and F attached to D; D's out-edges get probability 1/3 each] Esc_Prob(A→B) = 1
• Degree-1 nodes (E, F) have no influence!
– known as the ‘pizza delivery guy’ problem in undirected graphs
• Solution: Universal Absorbing Boundary!
Universal Absorbing Boundary
Footnote: fly-out probability = 0.1
[Figure: chain A → D → B with edge weights 0.9, a 0.1 fly-out edge from each node into U-A-B, and a weight-1 self-loop on U-A-B]
U-A-B is a black hole!
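The black hole can be written directly into the transition matrix. The sketch below is illustrative only: it assumes a dense, row-normalized matrix given as a list of lists, appends U-A-B as an extra last node, and takes c = 1 − fly-out probability (c = 0.9 for the 0.1 in the footnote). The function name `add_black_hole` is hypothetical.

```python
def add_black_hole(P, c=0.9):
    """Return the (n+1) x (n+1) transition matrix of the graph plus a
    universal absorbing boundary (U-A-B) appended as node n.

    Each original transition u -> v keeps probability c * P[u][v]; the
    remaining 1 - c of every row flies out to U-A-B, which never leaves.
    Assumes every row of P sums to 1 (no dangling nodes).
    """
    n = len(P)
    augmented = []
    for u in range(n):
        row = [c * p for p in P[u]]
        row.append(1.0 - c)              # fly-out edge u -> U-A-B
        augmented.append(row)
    augmented.append([0.0] * n + [1.0])  # U-A-B: self-loop with probability 1
    return augmented


# Directed 3-cycle A -> D -> B -> A (a hypothetical example graph).
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]
for row in add_black_hole(P):
    print([round(x, 2) for x in row])
```

A walk that enters U-A-B is absorbed, so each step survives only with probability c, and long detours (for example through degree-1 nodes) now cost proximity.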
Introducing Universal-Absorbing-Boundary
[Figure: chain A → D → B] Esc_Prob(A→B) = 1; with U-A-B: Prox(A→B) = 0.91
[Figure: the same chain plus degree-1 nodes E and F attached to D] Esc_Prob(A→B) = 1; with U-A-B: Prox(A→B) = 0.74
Footnote: fly-out probability = 0.1
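Issue #2: Weakly connected pair
[Figure: A and B connected by edges of weight 1, but with no directed path between them]
Prox(A→B) = Prox(B→A) = 0
Solution: Partial symmetry!
– each edge i → j with weight w_ij becomes an edge i → j with weight a·w_ij plus a reverse edge j → i with weight (1−a)·w_ij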
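A minimal sketch of partial symmetry on a dense weighted adjacency matrix (a list of lists, a hypothetical representation). In matrix form it is W' = a·W + (1−a)·Wᵀ, followed by the usual row normalization. The default a = 0.9 matches the 0.9/0.1 edge weights on the next slide but is otherwise an assumption, as are the function names.

```python
def partial_symmetry(W, a=0.9):
    """Keep a * w on each edge i -> j and add (1 - a) * w on the reverse
    edge j -> i, i.e. W' = a * W + (1 - a) * W^T."""
    n = len(W)
    return [[a * W[i][j] + (1 - a) * W[j][i] for j in range(n)] for i in range(n)]


def row_normalize(W):
    """Row-normalize W into a transition matrix; all-zero rows stay zero."""
    P = []
    for row in W:
        total = sum(row)
        P.append([w / total for w in row] if total > 0 else [0.0] * len(row))
    return P


# Hypothetical weakly connected pair: A -> C <- B (nodes 0, 1, 2 = A, B, C).
# C has no out-edges, so no walk can get from A to B or from B to A.
W = [[0, 0, 1],
     [0, 0, 1],
     [0, 0, 0]]
P = row_normalize(partial_symmetry(W))
print([round(p, 2) for p in P[2]])  # C now leads back to A and B: [0.5, 0.5, 0.0]
```

The reverse edges are light, so the original direction still dominates; that is why the two proximities become nonzero but unequal.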
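Practical Modifications: Partial Symmetry
[Figure, before: the weakly connected pair, edge weights 1] Prox(A→B) = Prox(B→A) = 0
[Figure, after partial symmetry: forward edge weights 0.9, reverse edge weights 0.1] Prox(A→B) = 0.081 > Prox(B→A) = 0.009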
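Solving Esc_Prob: [Doyle+]
$\mathrm{Esc\_Prob}(i \to j) = P(i,j) + P(i,\mathcal{I})\,\bigl(I - P(\mathcal{I},\mathcal{I})\bigr)^{-1} P(\mathcal{I},j)$, where $\mathcal{I} = V \setminus \{i, j\}$
• P: transition matrix (row-normalized)
• n: # of nodes in the graph
• $P(i,\mathcal{I})$, 1 x (n-2): i-th row of P with the i-th & j-th elements removed
• $P(\mathcal{I},\mathcal{I})$, (n-2) x (n-2): P with the i-th & j-th rows & cols removed
• $P(\mathcal{I},j)$, (n-2) x 1: j-th col of P with the i-th & j-th elements removed
One matrix inversion, one Esc_Prob!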
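A small dense sketch of this formula, assuming P is a row-normalized matrix given as a list of lists (the helper names `solve` and `esc_prob` are hypothetical). Rather than forming the inverse, it solves (I − P(rest, rest)) v = P(rest, j), where rest is every node except i and j and v[k] is the probability that a walk from that node reaches j before i. It assumes every node in rest can eventually reach i or j; otherwise the system is singular.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(x) for x in row] + [float(b[r])] for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def esc_prob(P, i, j):
    """Esc_Prob(i->j) = P(i,j) + P(i,rest) (I - P(rest,rest))^{-1} P(rest,j),
    where rest = all nodes except i and j."""
    rest = [k for k in range(len(P)) if k not in (i, j)]
    A = [[(1.0 if r == s else 0.0) - P[r][s] for s in rest] for r in rest]
    v = solve(A, [P[r][j] for r in rest])  # v[k]: reach j before i, from rest[k]
    return P[i][j] + sum(P[i][r] * vk for r, vk in zip(rest, v))


# Directed star (nodes 0..4 = A, D, B, E, F): A -> D; D -> B, E, F; E -> D; F -> D.
third = 1.0 / 3.0
P = [[0, 1, 0, 0, 0],
     [0, 0, third, third, third],
     [0, 0, 0, 0, 0],
     [0, 1, 0, 0, 0],
     [0, 1, 0, 0, 0]]
print(round(esc_prob(P, 0, 2), 6))  # E and F do not lower the result: 1.0
```

This star shows the degree-1 effect numerically: removing E and F (so that D → B has probability 1) also gives 1.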
$P = [p_{i,j}]_{6 \times 6}$: transition matrix (row-normalized) of a 6-node graph
[Figure: 6-node example graph with transition probabilities 0.5 and 1]
$\mathrm{Esc\_Prob}(1 \to 5) = p_{1,5} + P(1,\mathcal{I})\,\bigl(I - P(\mathcal{I},\mathcal{I})\bigr)^{-1} P(\mathcal{I},5)$, with $\mathcal{I} = \{2, 3, 4, 6\}$
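Solving DAP (Straightforward way)
1-c: fly-out probability (to black-hole)
$\mathrm{Prox}(i \to j) = c\,P(i,j) + c^{2}\,P(i,\mathcal{I})\,\bigl(I - c\,P(\mathcal{I},\mathcal{I})\bigr)^{-1} P(\mathcal{I},j)$, where $\mathcal{I} = V \setminus \{i, j\}$
• $P(i,\mathcal{I})$: 1 x (n-2); $P(\mathcal{I},\mathcal{I})$: (n-2) x (n-2); $P(\mathcal{I},j)$: (n-2) x 1
One matrix inversion, one proximity!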
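The straightforward solver can be sketched the same way, again assuming a dense row-normalized P as a list of lists (`prox_direct` is a hypothetical name). Every step now survives only with probability c, so the first step and the system inside the inverse are both scaled by c; with c = 1 it reduces to Esc_Prob.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(x) for x in row] + [float(b[r])] for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def prox_direct(P, i, j, c=0.9):
    """Prox(i->j) = c P(i,j) + c^2 P(i,rest) (I - c P(rest,rest))^{-1} P(rest,j),
    with rest = all nodes except i and j; 1 - c is the fly-out probability."""
    rest = [k for k in range(len(P)) if k not in (i, j)]
    A = [[(1.0 if r == s else 0.0) - c * P[r][s] for s in rest] for r in rest]
    v = solve(A, [P[r][j] for r in rest])
    return c * P[i][j] + c * c * sum(P[i][r] * vk for r, vk in zip(rest, v))


# Undirected path A - D - B (nodes 0, 1, 2).
P = [[0, 1, 0], [0.5, 0, 0.5], [0, 1, 0]]
print(round(prox_direct(P, 0, 2), 3))         # two surviving steps: 0.81 * 0.5 = 0.405
print(round(prox_direct(P, 0, 2, c=1.0), 3))  # no fly-out: Esc_Prob = 0.5
```

Each call solves a different (n−2) × (n−2) system, which is exactly the cost FastAllDAP and FastOneDAP try to avoid.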
Challenges
• Case 1: Medium-size graph
– Matrix inversion is feasible, but...
– What if we want many proximities?
– Q: How to get all (n²) proximities efficiently?
– A: FastAllDAP!
• Case 2: Large graph
– Matrix inversion is infeasible
– Q: How to get one proximity efficiently?
– A: FastOneDAP!
FastAllDAP
• Q1: How to efficiently compute all possible proximities on a medium-size graph?
– a.k.a. how to efficiently solve multiple linear systems simultaneously?
• Goal: reduce # of matrix inversions!
FastAllDAP: Observation
[Figure: the 6-node example graph (transition probabilities 0.5 and 1) and its transition matrix P = [p_{i,j}], shown twice]
Prox(1→5) inverts $I - cP(\mathcal{I},\mathcal{I})$ with $\mathcal{I} = \{2,3,4,6\}$; Prox(1→6) inverts it with $\mathcal{I} = \{2,3,4,5\}$.
Need two different matrix inversions!
FastAllDAP: Rescue
[Figure: P = [p_{i,j}] shown for Prox(1→5) and for Prox(1→6), with the submatrix each one inverts shaded gray]
Overlap between the two gray parts!
Redundancy among different linear systems!
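FastAllDAP: Theorem
• Theorem: with $Q = (I - cP)^{-1}$, $\mathrm{Prox}(i \to j) = \frac{Q(i,j)}{Q(i,i)\,Q(j,j) - Q(i,j)\,Q(j,i)}$
• Example: $\mathrm{Prox}(1 \to 5) = \frac{Q(1,5)}{Q(1,1)\,Q(5,5) - Q(1,5)\,Q(5,1)}$
• Proof: by the SM (Sherman-Morrison) Lemma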
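One way to see why the theorem holds, written as a block-inverse (Schur complement) argument rather than the Sherman-Morrison route cited on the slide: let M = I − cP, split the nodes into S = {i, j} and the rest 𝓘, and read Q(S,S) off the block inverse of M.

```latex
\begin{aligned}
&M = I - cP,\quad S = \{i, j\},\quad \mathcal{I} = V \setminus S,\quad
 Q_{SS} = \bigl(M_{SS} - M_{S\mathcal{I}}\,M_{\mathcal{I}\mathcal{I}}^{-1}\,M_{\mathcal{I}S}\bigr)^{-1} =: A^{-1} \\
&A_{ij} = -c\,P(i,j) - c^{2}\,P(i,\mathcal{I})\,\bigl(I - cP(\mathcal{I},\mathcal{I})\bigr)^{-1} P(\mathcal{I},j) = -\mathrm{Prox}(i \to j) \\
&Q(i,j) = \frac{-A_{ij}}{\det A}, \qquad Q(i,i)\,Q(j,j) - Q(i,j)\,Q(j,i) = \det Q_{SS} = \frac{1}{\det A} \\
&\Longrightarrow\ \frac{Q(i,j)}{Q(i,i)\,Q(j,j) - Q(i,j)\,Q(j,i)} = -A_{ij} = \mathrm{Prox}(i \to j)
\end{aligned}
```

The second line uses $M_{S\mathcal{I}} = -cP(S,\mathcal{I})$ and $M_{\mathcal{I}S} = -cP(\mathcal{I},S)$; the third uses the explicit inverse of a 2 × 2 matrix.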
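FastAllDAP: Algorithm
• Alg.
– Compute $Q = (I - cP)^{-1}$ (one matrix inversion)
– For i, j = 1, …, n, compute Prox(i→j) from Q(i,j), Q(i,i), Q(j,j), Q(j,i) using the theorem
• Computational savings: O(1) matrix inversions instead of O(n²)!
• Example
– w/ 1,000 nodes,
– ~1M matrix inversions vs. 1 matrix inversion!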
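The algorithm can be sketched in pure Python as below, assuming a dense row-normalized P as a list of lists; `inverse` and `fast_all_dap` are hypothetical names, and a real implementation would use an optimized linear-algebra library for the single inversion.

```python
def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(x) for x in row] + [1.0 if r == k else 0.0 for k in range(n)]
         for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [x / pivot for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def fast_all_dap(P, c=0.9):
    """All-pairs DAP from ONE inversion Q = (I - cP)^{-1}, using
    Prox(i->j) = Q(i,j) / (Q(i,i) Q(j,j) - Q(i,j) Q(j,i))."""
    n = len(P)
    Q = inverse([[(1.0 if r == s else 0.0) - c * P[r][s] for s in range(n)]
                 for r in range(n)])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                prox[i][j] = Q[i][j] / (Q[i][i] * Q[j][j] - Q[i][j] * Q[j][i])
    return prox


# Undirected path A - D - B (nodes 0, 1, 2).
prox = fast_all_dap([[0, 1, 0], [0.5, 0, 0.5], [0, 1, 0]])
print(round(prox[0][2], 3), round(prox[2][0], 3))  # 0.405 0.405
```

The cost is one O(n³) inversion plus an O(n²) loop. With n = 1,000 there are n(n−1) = 999,000 ordered pairs, i.e. the roughly 1M separate inversions that the straightforward solver would need.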
FastOneDAP
• Q1: How to efficiently compute one single proximity on a large graph?
– a.k.a. how to solve one linear system efficiently?
• Goal: avoid matrix inversion!
FastOneDAP: Observation
[Figure: the 6-node example graph (transition probabilities 0.5 and 1)]
Partial info (4 elements / 2 cols) of Q is enough: Q(i,i) and Q(j,i) from the i-th col, Q(i,j) and Q(j,j) from the j-th col!
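A sketch of this observation, assuming a dense P as a list of lists: two linear solves, one per needed column of Q, replace the full inversion. Dense Gaussian elimination is used here only to keep the example self-contained; on a large graph those two columns would be computed iteratively, as the following slides describe. All function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(x) for x in row] + [float(b[r])] for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def column_of_Q(P, k, c=0.9):
    """k-th column of Q = (I - cP)^{-1}, obtained by solving (I - cP) x = e_k."""
    n = len(P)
    M = [[(1.0 if r == s else 0.0) - c * P[r][s] for s in range(n)] for r in range(n)]
    return solve(M, [1.0 if r == k else 0.0 for r in range(n)])


def prox_from_two_columns(P, i, j, c=0.9):
    """Prox(i->j) from the four entries Q(i,i), Q(j,i), Q(i,j), Q(j,j)."""
    qi = column_of_Q(P, i, c)  # column i: qi[i] = Q(i,i), qi[j] = Q(j,i)
    qj = column_of_Q(P, j, c)  # column j: qj[i] = Q(i,j), qj[j] = Q(j,j)
    return qj[i] / (qi[i] * qj[j] - qj[i] * qi[j])


# Undirected path A - D - B (nodes 0, 1, 2).
print(round(prox_from_two_columns([[0, 1, 0], [0.5, 0, 0.5], [0, 1, 0]], 0, 2), 3))  # 0.405
```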
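FastOneDAP: Observation
• Q: How to compute one column of Q?
• A: Taylor expansion
– i-th col of Q: $Q e_i = (I - cP)^{-1} e_i = \sum_{k=0}^{\infty} (cP)^k e_i$, with $e_i = [0, \ldots, 0, 1, 0, \ldots, 0]^T$ (1 in position i)
– Reminder: $Q = (I - cP)^{-1}$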
FastOneDAP: Observation
$Q e_i = e_i + cP\,e_i + cP\,(cP\,e_i) + cP\,(cP\,(cP\,e_i)) + \ldots$
Each term is cP times the previous term: sparse matrix-vector multiplications!
FastOneDAP: Iterative Alg.
• Alg. to estimate the i-th col of Q
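One way to realize the Taylor expansion, offered as a sketch rather than the paper's exact algorithm: start from x = e_i and repeat x ← e_i + cP·x. After t steps x holds the first t + 1 terms of Σ(cP)^k e_i, and each step costs one sparse matrix-vector product. The row-list sparse format, the default of 50 iterations (borrowed from the example on the next slide), the stopping tolerance, and the function names are all assumptions.

```python
def to_sparse(P):
    """Keep only the nonzero entries of each row: row -> [(col, prob), ...]."""
    return [[(s, p) for s, p in enumerate(row) if p != 0] for row in P]


def column_of_Q_iter(rows, i, c=0.9, max_iter=50, tol=1e-12):
    """Estimate the i-th column of Q = (I - cP)^{-1} = sum_k (cP)^k e_i
    with the fixed-point iteration x <- e_i + c P x."""
    n = len(rows)
    e = [0.0] * n
    e[i] = 1.0
    x = e[:]
    for _ in range(max_iter):
        # one sparse matrix-vector product: only nonzero P[r][s] are touched
        new = [e[r] + c * sum(p * x[s] for s, p in rows[r]) for r in range(n)]
        change = max(abs(a - b) for a, b in zip(new, x))
        x = new
        if change < tol:  # the newest Taylor term is negligible
            break
    return x


# Directed 2-cycle A <-> B: here Q = [[1, 0.9], [0.9, 1]] / 0.19 exactly.
col = column_of_Q_iter(to_sparse([[0, 1], [1, 0]]), 0, max_iter=1000)
print([round(v, 4) for v in col])  # [5.2632, 4.7368]
```

Because each row of P sums to at most 1, the k-th term is at most c^k in size, so the iteration converges geometrically for c < 1.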
FastOneDAP: Property
• Convergence guaranteed!
• Computational savings
– Example:
• 100K nodes and 1M edges (50 iterations)
• 10,000,000x faster!
• Footnote: 1 col is enough!
– (details in paper)
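A back-of-the-envelope check of the 10,000,000x figure, under assumptions of ours rather than the paper's own accounting: a dense inversion costs about n³ operations, each iteration touches every edge once, two columns of Q are computed, and constant factors are ignored. `rough_speedup` is a hypothetical helper.

```python
def rough_speedup(n_nodes, n_edges, iterations, columns=2):
    """Ratio of ~n^3 dense-inversion work to the iterative column work."""
    dense_work = n_nodes ** 3                        # Gauss-Jordan, order n^3
    iterative_work = columns * iterations * n_edges  # sparse mat-vecs
    return dense_work / iterative_work


print(f"{rough_speedup(100_000, 1_000_000, 50):,.0f}x")             # 10,000,000x
print(f"{rough_speedup(100_000, 1_000_000, 50, columns=1):,.0f}x")  # 20,000,000x
```

Either way the ratio is on the order of 10⁷; under the same assumptions, the one-column variant from the footnote roughly doubles it.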
Datasets (all real)
Name   Node #   Edge #   Directionality
WL     4k       10k      A-links-to-B
PC     36k      64k      Who-contacts-whom
EP     76k      509k     Who-trusts-whom
CN     28k      353k     A-cites-B
AE     38k      115k     Who-emails-whom
Link Prediction: existence
[Figure: density of Prox(i→j) + Prox(j→i) for node pairs with a link and for node pairs with no link]
DAP effectively distinguishes pairs with a link from pairs without one!
Link Prediction: existence
Dataset   Accuracy
WL        65.40%
PC        79.60%
AE        81.51%
CN        86.71%
EP        92.21%
Link Prediction: direction
• Q: Given that a link exists, what is its direction?
• A: Compare Prox(i→j) and Prox(j→i)
[Figure: density of Prox(i→j) − Prox(j→i), annotated ">70%"]
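The two decision rules can be sketched as below. The proximity matrix could come from any DAP solver; the existence threshold is a hypothetical parameter (the slides do not say how it was chosen), and ties in direction are broken toward i → j arbitrarily. `predict_link` is an illustrative name.

```python
def predict_link(prox, i, j, threshold):
    """DAP-based link prediction for the node pair (i, j).

    existence: Prox(i->j) + Prox(j->i) >= threshold (threshold is hypothetical)
    direction: i -> j if Prox(i->j) >= Prox(j->i), otherwise j -> i
    """
    forward, backward = prox[i][j], prox[j][i]
    exists = forward + backward >= threshold
    direction = (i, j) if forward >= backward else (j, i)
    return exists, direction


# Made-up proximity scores for three nodes, for illustration only.
prox = [[0.00, 0.30, 0.01],
        [0.05, 0.00, 0.02],
        [0.00, 0.00, 0.00]]
print(predict_link(prox, 1, 0, threshold=0.1))  # (True, (0, 1))
print(predict_link(prox, 0, 2, threshold=0.1))  # (False, (0, 2))
```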
Efficiency: FastAllDAP
[Figure: time (sec) vs. size of graph for Straight-Solver and FastAllDAP]
FastAllDAP is 1,000x faster!
Efficiency: FastOneDAP
[Figure: time (sec) vs. size of graph for Straight-Solver and FastOneDAP]
FastOneDAP is 10,000x faster!
Conclusion (Fast DAP)
• Q1: How to define it?
• A1: Esc_Prob + Practical Modifications
• Q2: How to compute it efficiently?
• A2: FastAllDAP & FastOneDAP
– (100x – 10,000x faster!)
• Q3: How to benefit real applications?
• A3: Link Prediction (existence & direction)
More in the paper…
• Generalization to group proximity
– Definitions; Fast solutions
– ‘How close between/from CEOs and/to Accountants?’
• More applications
– Dir-CePS, attributed-graphs
[Figure: Dir-CePS examples for query nodes A, B, C: CePS; common descendant; common ancestor; descendant of B & common ancestor of A and C]
Cupid uses arrows, and so does graph mining!
Thank you!
www.cs.cmu.edu/~htong