Reduce and Aggregate: Similarity
Ranking in Multi-Categorical
Bipartite Graphs
Alessandro Epasto
J. Feldman*, S. Lattanzi*, S. Leonardi°, V. Mirrokni*.
*Google Research °Sapienza U. Rome
Motivation
● Recommendation Systems:
● Bipartite graphs with Users and Items.
● Identify similar users and suggest relevant
items.
● Concrete example: The AdWords case.
● Two key observations:
● Items belong to different categories.
● Graphs are often lopsided.
Modeling the Data as a Bipartite Graph
[Figure: a bipartite graph linking advertisers (e.g., Nike Store New York) to queries (Soccer Shoes, Soccer Ball) with bid amounts (1$ to 5$) on the edges; queries carry labels such as Retailers, Apparel, and Sport Equipment. Scale: millions of advertisers, billions of queries, hundreds of labels.]
Personalized PageRank
For a node v (the seed) and a restart probability alpha, the random walk at each step jumps back to v with probability alpha and otherwise moves to a uniformly random neighbor of the current node u:
π_v = α·e_v + (1 − α)·π_v·P
The stationary distribution assigns a similarity
score to each node in the graph w.r.t. node v.
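As a concrete sketch (an illustration, not the paper's implementation), Personalized PageRank can be computed by power iteration on a small row-stochastic transition matrix:

```python
import numpy as np

def personalized_pagerank(P, seed, alpha=0.15, tol=1e-10, max_iter=1000):
    """Stationary distribution of a walk that, at each step, jumps back
    to the seed with probability alpha and otherwise follows a random
    out-edge of the row-stochastic transition matrix P."""
    n = P.shape[0]
    e = np.zeros(n)
    e[seed] = 1.0
    pi = e.copy()
    for _ in range(max_iter):
        nxt = alpha * e + (1 - alpha) * pi @ P
        if np.abs(nxt - pi).sum() < tol:
            return nxt
        pi = nxt
    return pi

# Tiny 3-node chain: 0 <-> 1 <-> 2, seeded at node 0.
P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
scores = personalized_pagerank(P, seed=0)
# scores sum to 1; the seed's side of the chain gets more mass
# (scores[0] > scores[2]).
```

The restart pulls probability mass toward the seed, which is what turns the stationary distribution into a seed-relative similarity score.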
The Problem
[Figure: the advertiser-query bipartite graph from the previous slide, repeated to pose the ranking problem.]
Other Applications
● General approach applicable to several
contexts:
● Users, Movies, Genres: find similar users
and suggest movies.
● Authors, Papers, Conferences: find
related authors and suggest papers to
read.
Semi-Formal Problem Definition
[Figure: a bipartite graph of Advertisers and Queries; one advertiser is marked A, and the queries carry labels.]
Goal:
Find the nodes most
“similar” to A.
How to Define Similarity?
● We address the computation of several node
similarity measures:
● Neighborhood-based: Common Neighbors,
Jaccard Coefficient, Adamic-Adar.
● Path-based: Katz.
● Random-walk based: Personalized PageRank.
● Experimental question: which measure is useful?
● Algorithmic questions:
● Can it scale to huge graphs?
● Can we compute it in real-time?
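The neighborhood-based measures are easy to state on a toy bipartite graph. A small sketch (the advertiser and query names are made up for illustration):

```python
import math

# Toy bipartite adjacency: advertiser -> set of queries it bids on.
adj = {
    "nike_store":  {"soccer shoes", "soccer ball", "running shoes"},
    "sports_hub":  {"soccer shoes", "soccer ball", "tennis racket"},
    "shoe_outlet": {"running shoes", "sandals"},
}
# Reverse index: query -> advertisers bidding on it.
rev = {}
for a, qs in adj.items():
    for q in qs:
        rev.setdefault(q, set()).add(a)

def common_neighbors(a, b):
    return len(adj[a] & adj[b])

def jaccard(a, b):
    return len(adj[a] & adj[b]) / len(adj[a] | adj[b])

def adamic_adar(a, b):
    # Shared queries weighted by inverse log-degree: rare queries count more.
    return sum(1.0 / math.log(len(rev[q]))
               for q in adj[a] & adj[b] if len(rev[q]) > 1)

print(common_neighbors("nike_store", "sports_hub"))  # 2 shared queries
print(jaccard("nike_store", "sports_hub"))           # 2 / 4 = 0.5
```

Katz and Personalized PageRank instead look beyond the immediate neighborhood, at weighted paths and random walks respectively.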
Our Contribution
● Reduce and Aggregate: a general approach to
induce real-time similarity rankings in
multi-categorical bipartite graphs, which we
apply to several similarity measures.
● Theoretical guarantees for the precision of the
algorithms.
● Experimental evaluation with real world data.
Personalized PageRank (recap)
For a node v (the seed) and a restart probability alpha, the random walk at each step jumps back to v with probability alpha and otherwise moves to a uniformly random neighbor of the current node u:
π_v = α·e_v + (1 − α)·π_v·P
The stationary distribution assigns a similarity
score to each node in the graph w.r.t. node v.
Challenges
● Our graphs are too big (billions of nodes) even for
very large-scale MapReduce systems.
● MapReduce is not real-time.
● We cannot pre-compute the rankings for each
subset of labels.
Reduce and Aggregate
[Figure: per-category reduced graphs (1, 2, 3) over advertiser nodes a, b, c.]
Reduce: Given the bipartite graph and a category,
construct a graph with only A-side nodes that
preserves the ranking computed on the entire graph.
Aggregate: Given a node v in A and the reduced
graphs of the categories of interest,
determine the ranking for v.
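For the simplest measure, common neighbors, the pattern can be made concrete: the per-category reduced graph is a weighted co-occurrence graph on the A side, and aggregation just sums edge weights over the selected categories. A sketch with made-up names (it mirrors the reduce/aggregate pattern, not the paper's exact construction):

```python
from collections import Counter
from itertools import combinations

def reduce_category(adj, query_category, category):
    """Reduce: project the bipartite graph onto the A side, keeping only
    queries of one category. Edge weight = number of shared queries in
    that category (exact for the common-neighbors measure)."""
    rev = {}
    for a, qs in adj.items():
        for q in qs:
            if query_category[q] == category:
                rev.setdefault(q, set()).add(a)
    w = Counter()
    for q, ads in rev.items():
        for a, b in combinations(sorted(ads), 2):
            w[(a, b)] += 1
    return w

def aggregate(reduced_graphs):
    """Aggregate: sum the per-category reduced graphs chosen at run time."""
    total = Counter()
    for g in reduced_graphs:
        total.update(g)
    return total

adj = {
    "nike_store":  {"soccer shoes", "soccer ball", "running shoes"},
    "sports_hub":  {"soccer shoes", "soccer ball", "tennis racket"},
    "shoe_outlet": {"running shoes", "sandals"},
}
query_category = {
    "soccer shoes": "apparel", "running shoes": "apparel",
    "sandals": "apparel",
    "soccer ball": "sport", "tennis racket": "sport",
}
g_apparel = reduce_category(adj, query_category, "apparel")
g_sport = reduce_category(adj, query_category, "sport")
combined = aggregate([g_apparel, g_sport])
# combined[("nike_store", "sports_hub")] == 2, matching the
# common-neighbors count on the full graph.
```

For random-walk measures such as PPR the reduction is more delicate, which is what the next slides address.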
Reduce (Precomputation)
[Figure: for each label, the advertiser-query graph is reduced offline to a per-category graph with precomputed rankings.]
Aggregate (Run Time)
[Figure: at run time, the precomputed rankings of the selected categories (red + yellow) are aggregated into a single ranking for advertiser A.]
Reduce for Personalized PageRank
[Figure: the bipartite graph on Side A and Side B is reduced to a graph on the Side A nodes (X, Y) only.]
● Markov chain state aggregation theory
(Simon and Ando, ’61; Meyer, ’89, etc.).
● 750x reduction in the number of nodes
while correctly preserving the PPR
distribution on the entire graph.
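The reduction rests on Meyer's stochastic complementation: collapsing side B of a chain yields a smaller chain on side A whose stationary distribution is exactly the full chain's stationary distribution restricted to A, renormalized. A small numerical sketch (the matrix here is arbitrary toy data, not AdWords):

```python
import numpy as np

def stochastic_complement(P, A, B):
    """Meyer's stochastic complement of block A:
    S = P_AA + P_AB (I - P_BB)^-1 P_BA.
    The stationary distribution of S equals the full chain's
    stationary distribution restricted to A, renormalized."""
    PAA = P[np.ix_(A, A)]
    PAB = P[np.ix_(A, B)]
    PBA = P[np.ix_(B, A)]
    PBB = P[np.ix_(B, B)]
    return PAA + PAB @ np.linalg.solve(np.eye(len(B)) - PBB, PBA)

def stationary(M, iters=500):
    """Stationary distribution by power iteration (M row-stochastic)."""
    x = np.full(M.shape[0], 1.0 / M.shape[0])
    for _ in range(iters):
        x = x @ M
    return x / x.sum()

P = np.array([[0.40, 0.30, 0.20, 0.10],
              [0.25, 0.25, 0.25, 0.25],
              [0.10, 0.20, 0.30, 0.40],
              [0.30, 0.30, 0.20, 0.20]])
A, B = [0, 1], [2, 3]
S = stochastic_complement(P, A, B)
pi = stationary(P)
pi_A = pi[A] / pi[A].sum()
# stationary(S) agrees with pi_A: the reduced chain on A preserves
# the full chain's distribution on A.
```

The catch, which the slides address next, is that computing the complement directly involves the huge B side; the paper's contribution is doing the reduction and the run-time aggregation without ever materializing it per label subset.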
Run-time Aggregation
Koury et al. Aggregation-Disaggregation Algorithm
Step 1: Partition the Markov chain into DISJOINT subsets A and B.
Koury et al. Aggregation-Disaggregation Algorithm
Step 2: Approximate the stationary distribution on each
subset independently (π_A on A, π_B on B).
Koury et al. Aggregation-Disaggregation Algorithm
Step 3: Consider the transitions between subsets
(the blocks P_AA, P_AB, P_BA, P_BB).
Koury et al. Aggregation-Disaggregation Algorithm
Step 4: Aggregate the distributions into updated estimates
π′_A, π′_B. Repeat until convergence.
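The four steps can be sketched as a simple iterative aggregation/disaggregation loop (a simplified variant for illustration, not Koury et al.'s exact formulation):

```python
import numpy as np

def stationary_small(M, iters=500):
    """Stationary distribution of a small row-stochastic matrix."""
    x = np.full(M.shape[0], 1.0 / M.shape[0])
    for _ in range(iters):
        x = x @ M
    return x / x.sum()

def aggregation_disaggregation(P, blocks, iters=50):
    """Iterative aggregation/disaggregation on a row-stochastic P.
    Step 1: `blocks` is a DISJOINT partition of the states."""
    n = P.shape[0]
    # Step 2: uniform initial guess of the distribution inside each block.
    pis = [np.full(len(b), 1.0 / len(b)) for b in blocks]
    pi = np.zeros(n)
    k = len(blocks)
    for _ in range(iters):
        # Step 3: coupling matrix of transition mass between blocks.
        C = np.zeros((k, k))
        for i, bi in enumerate(blocks):
            for j, bj in enumerate(blocks):
                C[i, j] = (pis[i] @ P[np.ix_(bi, bj)]).sum()
        xi = stationary_small(C)  # stationary mass of each block
        # Step 4: aggregate block distributions into a global vector,
        # take one power step (disaggregation), renormalize per block.
        for i, b in enumerate(blocks):
            pi[b] = xi[i] * pis[i]
        pi = pi @ P
        for i, b in enumerate(blocks):
            pis[i] = pi[b] / pi[b].sum()
    return pi

P = np.array([[0.40, 0.30, 0.20, 0.10],
              [0.25, 0.25, 0.25, 0.25],
              [0.10, 0.20, 0.30, 0.40],
              [0.30, 0.30, 0.20, 0.20]])
approx = aggregation_disaggregation(P, [[0, 1], [2, 3]])
```

For an ergodic chain the iteration converges to the true stationary distribution; the point of the scheme is that the expensive work is confined to the small per-block problems and the tiny coupling matrix.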
Aggregation in PPR
[Figure: reduced graphs for two categories, with stationary distributions π_A and π_B.]
Precompute the stationary distributions individually.
Aggregation in PPR
The two subsets A and B are not disjoint!
Our Approach
[Figure: the reduced graphs underlying π_A and π_B share the Advertiser-side nodes X and Y.]
● The algorithm is based only on the reduced graphs
with Advertiser-Side nodes.
● The aggregation algorithm is scalable and
converges to the correct distribution.
Experimental Evaluation
● We experimented with publicly available and
proprietary datasets:
● Query-Ads graph from Google AdWords: > 1.5
billion nodes, > 5 billion edges.
● DBLP Author-Papers and Patent
Inventor-Inventions graphs.
● Ground-Truth clusters of competitors in Google
AdWords.
Patent Graph
[Plot: Precision vs Recall on the Patent graph for Inter, Jaccard, Adamic-Adar, Katz, and PPR.]
Google AdWords
[Plot: Precision vs Recall on the Google AdWords graph.]
Conclusions and Future Work
● It is possible to compute several similarity
scores on very large bipartite graphs in
real-time with good accuracy.
● Future work could focus on the relevant
case where categories are not disjoint.
Thank you for your attention
Reduction to the Query Side
[Figure: the reduction applied to the query side, with stationary distributions π_A and π_B.]
This is the larger side of the graph.
Convergence after One Iteration
[Plot: Kendall-Tau correlation vs position k (10 to 50, and All) for DBLP, Patent, and Query-Ads (cost).]
Convergence
[Plot: approximation error (1 - Cosine similarity), log scale from 1e-06 to 0.001, vs number of iterations (0 to 20) for DBLP and Patent.]