
Learning to Rank
--A Brief Review
Yunpeng Xu
1
Ranking and sorting

Ranking: there are only K structured categories

Sorting: each sample has a distinct rank
In general, there is no need to distinguish between the two

2
Overview





Rank aggregation
Label ranking
Query and rank by example
Preference learning
Problems left: what can we do?
3
Ranking aggregation

Needs of combining different ranking results

Voting systems, welfare economics, decision making
1. Hillary Clinton > John Edwards > Barack Obama
2. Barack Obama > John Edwards > Hillary Clinton
=> ?
4
Ranking aggregation (cont.)

Arrow’s impossibility theorem

Kenneth Arrow, 1951
If the decision-making body has at least
two members and at least three options to
decide among, then it is impossible to
design a social welfare function that
satisfies all these conditions at once.
5
Ranking aggregation (cont.)

Arrow’s impossibility theorem


5 fair assumptions:
non-dictatorship
unrestricted domain (universality)
independence of irrelevant alternatives
positive association of social and individual values (monotonicity)
non-imposition (citizen sovereignty)
These cannot all be satisfied simultaneously
6
Ranking aggregation (cont.)

Borda’s method (1781)



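Given k ranked lists $\tau_1, \dots, \tau_k$, each has n items
For each item j and each list $\tau_i$:
Define $B_i(j)$ as the number of items ranked below j in $\tau_i$
Rank all items by the total score $B(j) = \sum_{i=1}^{k} B_i(j)$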
Hillary Clinton: 2, John Edwards: 2, Barack Obama: 2
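
The scores above can be reproduced with a few lines of Python. This is a minimal sketch, assuming every list is a full ranking written best-first; the function name `borda_scores` is chosen here for illustration.

```python
def borda_scores(lists):
    """Sum, over all lists, the number of items ranked below each item."""
    scores = {}
    for ranking in lists:
        n = len(ranking)
        for position, item in enumerate(ranking):
            # The item at 0-based position p has n - 1 - p items below it.
            scores[item] = scores.get(item, 0) + (n - 1 - position)
    return scores


lists = [
    ["Hillary Clinton", "John Edwards", "Barack Obama"],
    ["Barack Obama", "John Edwards", "Hillary Clinton"],
]
scores = borda_scores(lists)
print(scores)
```

Both lists are mirror images of each other, so every candidate ends up with the same total and Borda's method cannot break the tie.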
7
Ranking aggregation (cont.) -- Borda

Condorcet criterion (CC)


If the majority prefers x to y, then x must be ranked above y
Borda’s method does not satisfy CC, nor does any method that assigns weights to each rank position
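
A small counterexample makes the violation concrete. The sketch below uses five hypothetical ballots over candidates A, B, C (not from the slides): a majority prefers A to B, yet the positional scores put B first.

```python
def borda_scores(ballots):
    """Positional scores: an item earns one point per item ranked below it."""
    scores = {}
    for ballot in ballots:
        n = len(ballot)
        for position, item in enumerate(ballot):
            scores[item] = scores.get(item, 0) + (n - 1 - position)
    return scores


def majority_prefers(ballots, x, y):
    """True if strictly more than half of the ballots rank x above y."""
    wins = sum(1 for ballot in ballots if ballot.index(x) < ballot.index(y))
    return wins > len(ballots) / 2


# Hypothetical electorate: three voters say A > B > C, two say B > C > A.
ballots = [["A", "B", "C"]] * 3 + [["B", "C", "A"]] * 2
scores = borda_scores(ballots)
borda_order = sorted(scores, key=scores.get, reverse=True)
print(scores, borda_order, majority_prefers(ballots, "A", "B"))
```

Here A beats both B and C in pairwise majority votes, but its two last-place ballots cost it enough points that B wins on score.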
8
Ranking aggregation (cont.)


Assumption relaxation
Maximize consensus criteria



Equivalent to minimizing disagreement (Kemeny, social choice theory)
NP Hard!
Sub-optimal solutions using heuristics
9
Ranking aggregation (cont.)

Basic idea


Assign different weights to different experts
Supervised aggregation


Weighting according to a final judge (ground truth)
Unsupervised aggregation

Aims to minimize the disagreement measured by
certain distances
10
Ranking aggregation (cont.)

Distance measure

Spearman footrule distance

Kendall tau distance
$K(\sigma, \tau) = |\{(i, j) \mid i < j,\ \sigma(i) < \sigma(j),\ \text{but } \tau(i) > \tau(j)\}|$

Kendall tau distance for multiple lists

Scaled footrule distance
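
Two of these distances are easy to compute directly. The sketch below assumes both rankings are full lists over the same items, written best-first, and uses the standard definitions: the footrule sums the absolute position differences, and the Kendall tau distance counts the pairs ordered differently by the two lists. The example rankings are made up.

```python
from itertools import combinations


def positions(ranking):
    """Map each item to its 0-based position in the ranking."""
    return {item: pos for pos, item in enumerate(ranking)}


def spearman_footrule(sigma, tau):
    """Sum over items of |position in sigma - position in tau|."""
    ps, pt = positions(sigma), positions(tau)
    return sum(abs(ps[x] - pt[x]) for x in ps)


def kendall_tau(sigma, tau):
    """Number of item pairs that the two rankings order differently."""
    ps, pt = positions(sigma), positions(tau)
    return sum(
        1
        for x, y in combinations(sigma, 2)
        if (ps[x] - ps[y]) * (pt[x] - pt[y]) < 0
    )


sigma = ["a", "b", "c", "d"]
tau = ["b", "a", "d", "c"]
print(spearman_footrule(sigma, tau), kendall_tau(sigma, tau))
```

The footrule takes linear time once positions are known, while this direct Kendall tau count is quadratic in the number of items.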
11
Ranking aggregation (cont.)
-Distance Measure

Kemeny optimal ranking



Minimizing Kendall distance
Still NP-Hard to compute
Local Kemenization (local optimal aggregation)

Can be computed in O(knlogn)
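
For intuition only, a Kemeny optimal ranking can be found by exhaustive search when there are very few items. The sketch below is not local Kemenization; it tries every permutation, so it is usable only for tiny inputs, and it assumes all lists are full rankings of the same items. The three input lists are hypothetical.

```python
from itertools import combinations, permutations


def kendall_tau(sigma, tau):
    """Number of item pairs that the two rankings order differently."""
    ps = {item: pos for pos, item in enumerate(sigma)}
    pt = {item: pos for pos, item in enumerate(tau)}
    return sum(
        1
        for x, y in combinations(sigma, 2)
        if (ps[x] - ps[y]) * (pt[x] - pt[y]) < 0
    )


def kemeny_optimal(lists):
    """Brute force: the permutation with the smallest total Kendall distance."""
    items = lists[0]
    best = min(
        permutations(items),
        key=lambda candidate: sum(kendall_tau(candidate, l) for l in lists),
    )
    return list(best)


lists = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
print(kemeny_optimal(lists))
```

The search space grows as n!, which is why heuristics and locally optimal procedures are used in practice.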
12
Ranking aggregation (cont.)

Supervised Ranking Aggregation (SRA WWW07)

Ground truth: preference matrix H

Example

Goal: rank by the score

It can be seen that
, or with relaxation
13
Ranking aggregation (cont.) -- SRA

Method

Use Borda’s score

Objective
14
Ranking aggregation (cont.)

Markov Chain Rank Aggregation (MCRA, WWW05)

Map a ranked list to a Markov Chain M
Compute the stationary distribution of M
Rank items based on the stationary distribution

Example:





B>C>D
A>D>E
A>B>E
15
Ranking aggregation (cont.) - MCRA

Different transition strategies




MC1
all out-degree edges have uniform probabilities
MC2
choose a list, then choose next item on the list;
…
For a disconnected graph, define transition probabilities based on a measure of item similarity
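
The pipeline (lists, Markov chain, stationary distribution, ranking) can be sketched on the three example lists B>C>D, A>D>E, A>B>E. The transition rule below is an assumption: it follows one common formulation of an MC2-style chain (from the current item, pick a list containing it uniformly, then move uniformly to an item ranked at least as high in that list), which may differ in detail from the rule on the slide. The small `teleport` probability is also an assumption, added so that the chain has a unique stationary distribution.

```python
def mc2_transition(lists, items):
    """Row-stochastic transition matrix for the assumed MC2-style rule."""
    index = {item: i for i, item in enumerate(items)}
    n = len(items)
    P = [[0.0] * n for _ in range(n)]
    for p in items:
        containing = [l for l in lists if p in l]
        for l in containing:
            # p itself and every item ranked above it in this list.
            candidates = l[: l.index(p) + 1]
            share = 1.0 / (len(containing) * len(candidates))
            for q in candidates:
                P[index[p]][index[q]] += share
    return P


def stationary(P, teleport=0.05, iterations=500):
    """Power iteration with uniform teleportation (an added assumption)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [
            (1 - teleport) * sum(pi[i] * P[i][j] for i in range(n)) + teleport / n
            for j in range(n)
        ]
    return pi


lists = [["B", "C", "D"], ["A", "D", "E"], ["A", "B", "E"]]
items = ["A", "B", "C", "D", "E"]
pi = stationary(mc2_transition(lists, items))
ranking = sorted(items, key=lambda x: -pi[items.index(x)])
print(ranking)
```

Items with more stationary probability are ranked higher. Without teleportation all probability mass would drain into A in this example, since A is never ranked below another item.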
16
Ranking aggregation (cont.)

Unsupervised Learning Algorithm for Rank Aggregation (ULARA: Dan Roth ECML07)
Goal:

Method: maximize agreement
17
Ranking aggregation (cont.) - ULARA

Method

Algorithm: iterative gradient descent

Initially, w is uniform, then updated iteratively
18
Overview





Rank aggregation
Label ranking
Query and rank by example
Preference learning
Problems left: what can we do?
19
Label Ranking


Goal: learn a mapping from the input space to the set of total orders over a finite set of labels
Related to multi-label or multi-class problems
Input: Customer information
Output: Porsche > Toyota > Ford
Mountain > Sea > Beach
20
Label Ranking (cont.)

Pairwise ranking (ECML03)



Train a classifier for each pair of labels $(\lambda_i, \lambda_j)$
When judging an example x:
If the classifier predicts $\lambda_i \succ \lambda_j$, then count it as a vote for $\lambda_i$
Then rank all labels according to their votes
Total: $k(k-1)/2$ classifiers for k labels
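
The voting step can be sketched as follows. The three "classifiers" are hand-written stand-ins for trained binary models, and the customer features `income` and `age` are hypothetical; only the vote counting reflects the method described above.

```python
from itertools import combinations


def rank_labels(x, labels, classifiers):
    """Each pairwise classifier casts one vote; labels are sorted by votes."""
    votes = {label: 0 for label in labels}
    for a, b in combinations(labels, 2):
        # classifiers[(a, b)](x) is True when label a is preferred to label b.
        winner = a if classifiers[(a, b)](x) else b
        votes[winner] += 1
    return sorted(labels, key=lambda label: -votes[label]), votes


labels = ["Porsche", "Toyota", "Ford"]
# Hypothetical rules standing in for trained pairwise classifiers.
classifiers = {
    ("Porsche", "Toyota"): lambda x: x["income"] > 100,
    ("Porsche", "Ford"): lambda x: x["income"] > 80,
    ("Toyota", "Ford"): lambda x: x["age"] < 50,
}
order, votes = rank_labels({"income": 120, "age": 35}, labels, classifiers)
print(order, votes)
```

With three labels there are three pairwise classifiers; tie handling among labels with equal votes is left to the stable sort here.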
21
Label Ranking (cont.)

Constraint Classification (NIPS 02)

Consider a linear sorting function

Goal: learn the values of
rank all labels by the score
22
Label Ranking (cont.) -- CC

Expand the feature vector

Generate positive/ negative samples in
23
Label Ranking (cont.) -- CC

Learn a separating hyperplane

Can be solved by SVM
24
Overview





Rank aggregation
Label ranking
Query and rank by example
Preference learning
Problems left: what can we do?
25
Query and rank by example

Given one query, rank the retrieved items according to their relevance with respect to the query.
26
Query and rank by example (cont.)

Rank on manifold

Convergence form

Essentially, this is a one-class semi-supervised method
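
A minimal sketch of ranking on a manifold, assuming the usual formulation of manifold ranking: scores spread over a similarity graph through the iteration f <- alpha * S f + (1 - alpha) * y, whose limit is $f^* = (1 - \alpha)(I - \alpha S)^{-1} y$. The Gaussian affinity, the parameter values, and the one-dimensional toy points are assumptions made for illustration.

```python
import math


def manifold_rank(points, query_index, alpha=0.9, sigma=1.0, iterations=200):
    """Spread the query's score over a Gaussian similarity graph."""
    n = len(points)
    # Affinity matrix with zero diagonal.
    W = [
        [
            0.0 if i == j
            else math.exp(-((points[i] - points[j]) ** 2) / (2 * sigma ** 2))
            for j in range(n)
        ]
        for i in range(n)
    ]
    degree = [sum(row) for row in W]
    # Symmetric normalisation S = D^{-1/2} W D^{-1/2}.
    S = [
        [W[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]
    y = [1.0 if i == query_index else 0.0 for i in range(n)]
    f = y[:]
    for _ in range(iterations):
        f = [
            alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
            for i in range(n)
        ]
    return f


# Two well-separated clusters on a line; the query is the first point.
points = [0.0, 0.1, 0.2, 5.0, 5.1]
scores = manifold_rank(points, query_index=0)
print(scores)
```

Only the query carries a label, which is why the method behaves like a one-class semi-supervised learner: points in the query's cluster receive far higher scores than points in the distant cluster.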
27
Preference learning

Given a set of items and a set of user preferences over these items, rank all items according to the user preferences.

Motivated by the needs of personalized search.
28
Preference learning

Input:
preference: a set of partial orders on X
Output: a total order on X
or, map X onto a structured label space Y

Preference function
29
Existing methods






Learning to order things [W. Cohen 98]
Large margin ordinal regression [R. Herbrich 98]
PRanking with Ranking [K Crammer 01]
Optimizing Search Engines using Clickthrough
Data [T Joachims 02]
Efficient boosting algorithm for combining
preferences [Yoav Freund 03]
Classification Approach towards Ranking and
Sorting Problems [S Rajaram 03]
30
Existing methods

Learning to Rank using Gradient Descent [C
Burges 05]


Stability and Generalization of Bipartite
Ranking [S Agarwal 05]
Generalization Bounds for k-Partite Ranking [S Rajaram 05]


Ranking with a p-norm push [C Rudin 05]
Magnitude-Preserving Ranking Algorithms [C Cortes 07]

From Pairwise Approach to Listwise [Z Cao 07]
31
Large Margin Ordinal Regression

Mapping to an axis using inner product
32
Large Margin Ordinal Regression

Consider
Then

Introduce soft margin

Solve using SVM
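
The reduction to a classification problem can be sketched as follows: every pair with different ranks yields one difference vector, and a linear model is trained so that the inner product with each difference is at least 1. This is a sketch under stated assumptions: a plain hinge-loss subgradient loop stands in for an SVM solver, and the toy data, learning rate, and regularization constant are made up.

```python
def pairwise_differences(X, y):
    """One difference vector x_i - x_j for every pair with y_i > y_j."""
    return [
        [a - b for a, b in zip(X[i], X[j])]
        for i in range(len(X))
        for j in range(len(X))
        if y[i] > y[j]
    ]


def train_linear_ranker(X, y, epochs=200, lr=0.1, reg=0.01):
    """Hinge-loss subgradient descent on difference vectors (SVM stand-in)."""
    diffs = pairwise_differences(X, y)
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for d in diffs:
            margin = sum(wk * dk for wk, dk in zip(w, d))
            if margin < 1:  # constraint w . (x_i - x_j) >= 1 is violated
                w = [wk + lr * dk for wk, dk in zip(w, d)]
        w = [wk * (1 - lr * reg) for wk in w]  # L2 shrinkage once per epoch
    return w


# Toy data: the first feature carries the rank, the second is noise.
X = [[1.0, 0.0], [2.0, 0.1], [3.0, 0.0], [4.0, 0.2]]
y = [1, 2, 3, 4]
w = train_linear_ranker(X, y)
scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in X]
print(w, scores)
```

Sorting the samples by their projection onto w recovers the rank order on this toy set.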

33
Learn to order things

A greedy ordering algorithm to order things
Calculate a score for each item
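
A minimal sketch of the greedy ordering step, assuming a pairwise preference function `pref(u, v)` in [0, 1] that is large when u should precede v. Each item's score is the preference mass in its favor minus the mass against it; the best item is placed, removed, and the remaining scores are adjusted. The toy preference function built from `quality` is hypothetical.

```python
def greedy_order(items, pref):
    """Repeatedly place the item with the largest net preference score."""
    remaining = list(items)
    potential = {
        v: sum(pref(v, u) - pref(u, v) for u in remaining if u != v)
        for v in remaining
    }
    order = []
    while remaining:
        best = max(remaining, key=lambda v: potential[v])
        order.append(best)
        remaining.remove(best)
        for v in remaining:
            # Remove the contribution of the item that was just placed.
            potential[v] += pref(best, v) - pref(v, best)
    return order


# Hypothetical preference function derived from a hidden quality score.
quality = {"a": 3, "b": 1, "c": 2}
pref = lambda u, v: 1.0 if quality[u] > quality[v] else 0.0
order = greedy_order(["a", "b", "c"], pref)
print(order)
```

When the preference function is consistent with a total order, as here, the greedy pass recovers that order; with conflicting preferences it yields an approximation.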
34
Learn to order things (cont.)

Combine different ranking functions

To learn the weight iteratively
35
Learn to order things
Combine preference functions
Do ranking aggregation
Update weights based on feedback
36


Initially, w is uniform
At each step



Compute a combined ranking function
Produce a ranking aggregation
Measure the loss
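
One way to turn the measured losses into new weights is a multiplicative (Hedge-style) update: each weight is multiplied by beta raised to its loss and the weights are renormalized. This is a sketch under the assumption that such an update is used; the value of `beta` and the loss values below are hypothetical.

```python
def hedge_update(weights, losses, beta=0.5):
    """Shrink each weight by beta ** loss, then renormalise to sum to one."""
    updated = [w * beta ** loss for w, loss in zip(weights, losses)]
    total = sum(updated)
    return [w / total for w in updated]


def combine(weights, preference_functions):
    """Weighted combination of preference functions."""
    return lambda u, v: sum(
        w * f(u, v) for w, f in zip(weights, preference_functions)
    )


weights = [1 / 3, 1 / 3, 1 / 3]  # initially uniform
losses = [0.0, 0.5, 1.0]         # hypothetical losses in [0, 1]
weights = hedge_update(weights, losses)
print(weights)
```

Experts with smaller loss keep more of their weight, so the combined preference function drifts toward the rankers that agree with the feedback.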
37
RankBoost

Bipartite ranking problems

Combine weak rankers

Sort based on values of H(x)
38
RankBoost (cont.)
Bipartite ranking problem
Initialize the sampling distribution

Learn a weak ranker
Update and normalize the sampling distribution
Combine weak rankers
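
The loop above can be sketched for the bipartite case. This is an illustration under stated assumptions, not a reference implementation: weak rankers are simple feature thresholds with outputs in {0, 1}, the weight of each round is alpha = 0.5 * ln((1 + r) / (1 - r)), and the toy data are made up.

```python
import math


def weak_rankers(points):
    """Threshold rankers h(x) = 1 if x[f] > theta else 0, built from the data."""
    return [(f, theta) for f in range(len(points[0]))
            for theta in sorted({p[f] for p in points})]


def h(ranker, x):
    f, theta = ranker
    return 1.0 if x[f] > theta else 0.0


def rankboost(pos, neg, rounds=10):
    pairs = [(n, p) for n in neg for p in pos]   # p should outrank n
    D = [1.0 / len(pairs)] * len(pairs)          # initial sampling distribution
    candidates = weak_rankers(pos + neg)
    model, z_product = [], 1.0
    for _ in range(rounds):
        def r_of(c):
            return sum(d * (h(c, p) - h(c, n)) for d, (n, p) in zip(D, pairs))
        best = max(candidates, key=lambda c: abs(r_of(c)))  # learn weak ranker
        r = max(min(r_of(best), 1 - 1e-9), -1 + 1e-9)       # keep alpha finite
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        # Up-weight pairs the weak ranker misorders, then normalise.
        D = [d * math.exp(alpha * (h(best, n) - h(best, p)))
             for d, (n, p) in zip(D, pairs)]
        Z = sum(D)
        D = [d / Z for d in D]
        z_product *= Z
        model.append((alpha, best))
    return model, z_product


def H(model, x):
    """Combined ranker: weighted sum of the weak rankers."""
    return sum(alpha * h(ranker, x) for alpha, ranker in model)


def pair_error(model, pos, neg):
    bad = sum(1 for n in neg for p in pos if H(model, p) <= H(model, n))
    return bad / (len(pos) * len(neg))


pos = [(0.9, 0.2), (0.8, 0.7), (0.4, 0.9)]
neg = [(0.3, 0.1), (0.5, 0.3), (0.2, 0.6)]
model, z_product = rankboost(pos, neg)
print(pair_error(model, pos, neg), z_product)
```

The product of the normalization factors upper-bounds the fraction of misordered training pairs, which is the quantity each round tries to shrink.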
39
Stability and Generalization

Bipartite ranking problems
Expected rank error

Empirical rank error
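
The empirical rank error can be computed directly as the fraction of positive-negative pairs that the scoring function misorders. The sketch below assumes the common convention that ties count as half an error; under that convention the value equals one minus the AUC. The sample scores are hypothetical.

```python
def empirical_rank_error(f, positives, negatives):
    """Fraction of positive-negative pairs misordered by f (ties count 1/2)."""
    total = 0.0
    for p in positives:
        for n in negatives:
            if f(p) < f(n):
                total += 1.0
            elif f(p) == f(n):
                total += 0.5
    return total / (len(positives) * len(negatives))


positives = [0.9, 0.6, 0.4]
negatives = [0.5, 0.2]
error = empirical_rank_error(lambda x: x, positives, negatives)
print(error)
```

Only the pair (0.4, 0.5) is misordered, so one of the six pairs contributes to the error.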

40
Stability and Generalization (cont.)

Stability

Remove one training sample: how much does the learned ranking function change?

Generalization

Generalize to k-partite ranking problem…
41
Rank on graph data

Objective
42
P-norm push

Focus on the topmost ranked items

The top left region is the most important
43
P-norm push (cont.)

Height of k (k is a negative sample)
Cost of sample k:
g is convex and monotonically increasing
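
A short sketch of the quantities involved, assuming the height of a negative sample is the number of positive samples scored at or below it and that the cost is g(r) = r^p. The scores and the choice p = 4 are hypothetical.

```python
def heights(f, positives, negatives):
    """Height of each negative: number of positives scored at or below it."""
    return [sum(1 for p in positives if f(p) <= f(n)) for n in negatives]


def p_norm_push_cost(f, positives, negatives, p=4):
    """Sum of g(height) over negatives with g(r) = r ** p."""
    return sum(height ** p for height in heights(f, positives, negatives))


positives = [0.9, 0.6, 0.4]
negatives = [0.7, 0.2]
identity = lambda x: x
print(heights(identity, positives, negatives))
print(p_norm_push_cost(identity, positives, negatives, p=4))
```

A larger p penalizes a single high-ranked negative much more than several low-ranked ones, which is how the objective concentrates on the top of the list; with p = 1 the cost reduces to the plain count of misordered pairs.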
44
p-norm push

Run RankBoost to solve the problem
45
Thanks!
46