More Ranking: Top

Rank Aggregation
Rank Aggregation: Settings
• Multiple items
– Web pages, cars, apartments, …
• Multiple scores for each item
– From different reviewers or users, or according to different features, …
• Some aggregation function on the scores
– Sum, average, max, …
• Goal: compute the top-k items
Rank Aggregation Example
Model    PriceRank   ComfortRank   BeautyRank
Honda    9           7             3
Volvo    3           10            8
Subaru   9           5             4
Model    TotalRank(min)   TotalRank(avg)
Honda    3                6.333
Volvo    3                7
Subaru   4                6
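The two total tables can be reproduced with a few lines of Python. This is a minimal sketch: the dictionary layout and the helper name `aggregate` are illustrative assumptions, and the scores are copied from the tables above.

```python
from statistics import mean

# Per-model scores in the order (price, comfort, beauty).
scores = {
    "Honda": (9, 7, 3),
    "Volvo": (3, 10, 8),
    "Subaru": (9, 5, 4),
}

def aggregate(scores, agg):
    """Apply the aggregation function `agg` to each item's tuple of scores."""
    return {item: agg(values) for item, values in scores.items()}

total_min = aggregate(scores, min)   # TotalRank(min)
total_avg = aggregate(scores, mean)  # TotalRank(avg)

for model in scores:
    print(model, total_min[model], round(total_avg[model], 3))
```

Note that the winner depends on the aggregation function: Subaru is best under min, Volvo under average.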
Naïve Algorithm
• Compute the aggregated rank for all items
• Find the best one, then the second best, …, up to the k-th best
• Good for small-scale problems
• Still not feasible at web scale
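The naïve approach could be sketched as follows. The function names and the dictionary input format are assumptions made for illustration; the car scores are reused from the earlier example.

```python
import heapq

def naive_top_k(scores, agg, k):
    """Aggregate every item, then pick the k largest totals.

    scores: dict mapping item -> sequence of individual scores.
    Returns a list of (total, item) pairs, best first.
    """
    totals = [(agg(values), item) for item, values in scores.items()]
    return heapq.nlargest(k, totals)

def average(values):
    return sum(values) / len(values)

cars = {"Honda": (9, 7, 3), "Volvo": (3, 10, 8), "Subaru": (9, 5, 4)}
top2 = naive_top_k(cars, average, k=2)
print(top2)
```

Every score of every item is read, so the work grows with the total number of items no matter how small k is.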
Can we do any better?
• An assumption to help us: each individual list
comes sorted
– Reasonable for search engines, user rankings…
• Another assumption: monotonicity of the
aggregation function
• Now can we do any better?
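Monotonicity can be spelled out as follows. The symbols are notation assumed here for illustration: t is the aggregation function, m the number of lists, and x_i an item's score in list i.

```latex
t(x_1, \dots, x_m) \le t(x'_1, \dots, x'_m)
\quad \text{whenever } x_i \le x'_i \text{ for every } i = 1, \dots, m .
```

Sum, average, min and max all satisfy this condition: raising any individual score can never lower the aggregated score.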
Fagin's algorithm (FA)
• Do sorted access on all lists in parallel
• For every item seen, do random access to the other lists to fetch all of its values
• Stop when at least k items have been seen (under sorted access) in all lists
• Sort the seen items by aggregated score and output the top k
• Why is this enough?
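The steps above can be sketched in Python. This is an illustrative sketch, not a reference implementation: each list is assumed to be a Python list of (item, score) pairs sorted by descending score, every item is assumed to appear in every list, and random access is modelled by a dictionary lookup. The function and variable names are hypothetical.

```python
def fagin_top_k(lists, agg, k):
    """Return (top-k list of (total, item), number of sorted-access rounds)."""
    lookup = [dict(lst) for lst in lists]   # random access: item -> score
    seen_in = {}                            # item -> indices of lists where it was seen
    depth = 0
    n = len(lists[0])
    while depth < n:
        # One round of sorted access on all lists in parallel.
        for i, lst in enumerate(lists):
            item, _ = lst[depth]
            seen_in.setdefault(item, set()).add(i)
        depth += 1
        # Stop once at least k items were seen under sorted access in all lists.
        fully_seen = sum(1 for s in seen_in.values() if len(s) == len(lists))
        if fully_seen >= k:
            break
    # Random access: fetch all values of every item seen, then aggregate.
    totals = [(agg([lk[item] for lk in lookup]), item) for item in seen_in]
    totals.sort(reverse=True)
    return totals[:k], depth

def average(values):
    return sum(values) / len(values)

beauty = [("A", 9), ("B", 9), ("C", 3), ("D", 3)]
comfort = [("B", 10), ("C", 5), ("A", 4), ("D", 1)]
top3, rounds = fagin_top_k([beauty, comfort], average, k=3)
print(top3, rounds)
```

Because the aggregation function is monotone, any item not yet seen scores no higher in every list than each of the k fully seen items, so it cannot beat them.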
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     6.5
B     9        C     5
C     3        A     4
D     3        D     1
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     6.5
B     9        C     5        B     9.5
C     3        A     4
D     3        D     1
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     6.5
B     9        C     5        B     9.5
C     3        A     4        C     4
D     3        D     1
Example (top-3)
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     6.5
B     9        C     5        B     9.5
C     3        A     4        C     4
D     3        D     1
How do we know not to look further?
Complexity
• Probabilistic analysis on the order of items can
be used to show better bounds (with good
probability)
• Can we do even better?
Cost model
• This is a very simple setting, so we can define a finer cost model than worst-case complexity
• In a web context it is important to do so
– Since the scale is huge
• We associate some cost Cs with every sorted access, and some cost Cr with every random access
• Denote the cost of algorithm A on input instance I by cost(A,I)
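One way to write this cost down, assuming algorithm A performs s sorted accesses and r random accesses on instance I (the symbols s and r are introduced here for illustration):

```latex
\mathrm{cost}(A, I) = s \cdot C_s + r \cdot C_r
```

For example, an algorithm that reads 3 rows from each of 2 lists and then does 3 random accesses would pay 6 C_s + 3 C_r under this model.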
Instance-optimality
• An algorithm A is instance-optimal if for every
input instance I, cost(A,I) = O(cost(A',I)) for
every algorithm A'
• A very strong notion
• But we can realize it here!
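Unfolding the O-notation, the definition can be read as follows. This is a sketch of the usual formalization; the constants c and c' are notation assumed here, and they must not depend on A' or I.

```latex
\exists\, c, c' \;\; \forall A' \;\; \forall I : \quad
\mathrm{cost}(A, I) \le c \cdot \mathrm{cost}(A', I) + c'
```

This is stronger than worst-case optimality: A has to be within a constant factor of the best algorithm on each individual input, not only on the hardest one.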
Threshold Algorithm (TA)
• Idea: sometimes we can stop before seeing k objects in every list
• Use a threshold on how good the score of an unseen object can be
• The threshold is the aggregate of the lowest score seen so far (under sorted access) in each list
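A possible Python sketch of TA follows. As assumptions for illustration, each list is a Python list of (item, score) pairs sorted by descending score, every item appears in every list, and random access is a dictionary lookup; the names are hypothetical.

```python
import heapq

def threshold_top_k(lists, agg, k):
    """Return (top-k list of (item, total), number of sorted-access rounds)."""
    lookup = [dict(lst) for lst in lists]   # random access: item -> score
    totals = {}                             # item -> aggregated score
    depth = 0
    n = len(lists[0])
    while depth < n:
        # One round of sorted access; complete each new item by random access.
        for lst in lists:
            item, _ = lst[depth]
            if item not in totals:
                totals[item] = agg([lk[item] for lk in lookup])
        # Threshold: aggregate of the last score seen in each list.
        threshold = agg([lst[depth][1] for lst in lists])
        depth += 1
        top = heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])
        # No unseen item can score above the threshold, so we may stop.
        if len(top) == k and top[-1][1] >= threshold:
            break
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1]), depth

def average(values):
    return sum(values) / len(values)

beauty = [("A", 9), ("B", 9), ("C", 3), ("D", 3)]
comfort = [("B", 10), ("C", 5), ("A", 4), ("D", 1)]
top1, rounds = threshold_top_k([beauty, comfort], average, k=1)
print(top1, rounds)
```

On these lists with k = 1 the sketch stops after the first round, because B's score of 9.5 already equals the threshold (9 + 10) / 2 = 9.5. The stopping rule of FA would need a second round, since B appears in the Beauty list only at depth 2.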
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     6.5
B     9        C     5
C     3        A     4
D     3        D     1
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     6.5
B     9        C     5
C     3        A     4
D     3        D     1
T=9.5
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     6.5
B     9        C     5        B     9.5
C     3        A     4
D     3        D     1
T=9.5
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     6.5
B     9        C     5        B     9.5
C     3        A     4        C     4
D     3        D     1
T=7
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     6.5
B     9        C     5        B     9.5
C     3        A     4        C     4
D     3        D     1
One step less!
T=3.5
Instance-optimality
• Theorem: If the aggregation function is strictly monotone and every
two items in a list have distinct grades, then TA is instance-optimal
– Intuition: If an algorithm stops on input I before reaching the
threshold, then we can design an input I' on which it is wrong,
by changing values it did not see
– TA sees at most k items more than any algorithm on any input
• Strict monotonicity is needed to avoid "lucky guesses" in breaking ties
– Theorem: In general, no instance-optimal algorithm exists
• Theorem: TA is instance-optimal against all algorithms that do not "guess"
– i.e., that never do random access to an item they have not seen under sorted access
Restricted Sorted Access
• Some rankings are not available in sorted order
– E.g. distances from a map site
• Then we can revise TA to do sorted access only on the lists where it is possible
• And it is still instance-optimal! (Against algorithms that work under the same restrictions, of course)
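One way such a revision could look is sketched below. It rests on an assumption not stated in the slides: a list without sorted access contributes the maximum possible grade to the threshold, passed here as the hypothetical parameter `max_score`. The input layout and all names are likewise illustrative.

```python
import heapq

def restricted_ta_top_k(sorted_lists, random_only, agg, k, max_score):
    """TA variant with sorted access on `sorted_lists` only.

    sorted_lists: lists of (item, score) pairs, sorted by descending score.
    random_only:  dicts item -> score that support random access only.
    """
    lookup = [dict(lst) for lst in sorted_lists] + list(random_only)
    totals = {}
    depth = 0
    n = len(sorted_lists[0])
    while depth < n:
        for lst in sorted_lists:
            item, _ = lst[depth]
            if item not in totals:
                totals[item] = agg([lk[item] for lk in lookup])
        # Unsortable lists can only be bounded by the maximum grade (assumption).
        bounds = [lst[depth][1] for lst in sorted_lists]
        bounds += [max_score] * len(random_only)
        threshold = agg(bounds)
        depth += 1
        top = heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            break
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1]), depth

def average(values):
    return sum(values) / len(values)

beauty = [("A", 9), ("B", 9), ("C", 3), ("D", 3)]   # sorted access available
comfort = {"B": 10, "C": 5, "A": 4, "D": 1}         # random access only
top1, rounds = restricted_ta_top_k([beauty], [comfort], average, 1, max_score=10)
print(top1, rounds)
```

The threshold drops more slowly than in plain TA, because the unsortable list never tightens its bound, so more rounds of sorted access may be needed.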
No Random Access
• Maintain lower and upper bounds for every item (worst and best grades)
• Best is the aggregation of the scores we have seen for the item, with the lowest score seen so far in each list filling in for the scores we have not seen; Worst is the aggregation of the scores we have seen, with zeros for the rest
• Keep in the list the items with the top-k "worst" grades
– Break ties by "best" grades
• Halt if we have k items in the list, and the best grade of every item outside the list is no greater than the worst grade of the k-th item in the list
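The bookkeeping above can be sketched in Python using sorted access only. The assumptions are illustrative: each list is a Python list of (item, score) pairs sorted by descending score, every item appears in every list, the minimum grade is 0 (the hypothetical parameter `min_score`), and items never seen at all are bounded by the aggregate of the last scores seen.

```python
def nra_top_k(lists, agg, k, min_score=0):
    """Return (top-k items, number of sorted-access rounds), no random access."""
    m, n = len(lists), len(lists[0])
    known = {}      # item -> {list index: score seen under sorted access}
    ranked = []
    depth = 0
    while depth < n:
        for i, lst in enumerate(lists):
            item, score = lst[depth]
            known.setdefault(item, {})[i] = score
        bottoms = [lst[depth][1] for lst in lists]   # last score seen per list
        depth += 1

        def worst(item):
            # Unknown scores are replaced by the minimum grade.
            return agg([known[item].get(i, min_score) for i in range(m)])

        def best(item):
            # Unknown scores are replaced by the last score seen in that list.
            return agg([known[item].get(i, bottoms[i]) for i in range(m)])

        # Order by worst grade, breaking ties by best grade.
        ranked = sorted(known, key=lambda it: (worst(it), best(it)), reverse=True)
        top, rest = ranked[:k], ranked[k:]
        if len(top) == k:
            kth_worst = worst(top[-1])
            # Items never seen at all can score at most agg(bottoms).
            unseen_best = agg(bottoms) if len(known) < n else float("-inf")
            if unseen_best <= kth_worst and all(best(it) <= kth_worst for it in rest):
                break
    return ranked[:k], depth

def average(values):
    return sum(values) / len(values)

beauty = [("A", 9), ("B", 9), ("C", 3), ("D", 3)]
comfort = [("B", 10), ("C", 5), ("A", 4), ("D", 1)]
top3, rounds = nra_top_k([beauty, comfort], average, k=3)
print(top3, rounds)
```

The sketch returns only the identities of the top-k items. Without random access the exact grades of the winners are not always known when the algorithm halts.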
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     4.5<S<9
B     9        C     5
C     3        A     4
D     3        D     1
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     4.5<S<9
B     9        C     5        B     5<S<10
C     3        A     4
D     3        D     1
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     4.5<S<9
B     9        C     5        B     9.5
C     3        A     4
D     3        D     1
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     4.5<S<9
B     9        C     5        B     9.5
C     3        A     4        C     2.5<S<5
D     3        D     1
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     4.5<S<9, then 6.5
B     9        C     5        B     9.5
C     3        A     4        C     4
D     3        D     1
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     6.5
B     9        C     5        B     9.5
C     3        A     4        C     4
D     3        D     1
Example
Beauty         Comfort        Average
Item  Score    Item  Score    Item  Score
A     9        B     10       A     6.5
B     9        C     5        B     9.5
C     3        A     4        C     4
D     3        D     1
Score(D)<3