Learning to Rank with Ties
Authors: Ke Zhou, Gui-Rong Xue, Hongyuan Zha, and Yong Yu
Presenter: Davidson
Date: 2009/12/29
Published in SIGIR 2008

Contents
- Introduction
- Pairwise learning to rank
- Models for paired comparisons with ties
  - General linear models
  - Bradley-Terry model
  - Thurstone-Mosteller model
- Loss functions
- Learning by functional gradient boosting
- Experiments
- Conclusions and future work

Introduction
- Learning to rank: ranking objects in response to queries
  - Applications: document retrieval, expert finding, anti web spam, product ratings, etc.
- Learning to rank methods:
  - Pointwise approach
  - Pairwise approach
  - Listwise approach

Existing approaches (1/2)
- Vector space model methods
  - Represent the query and documents as vectors of features
  - Compute the distance between vectors as a similarity measure
- Language-modeling-based methods
  - Use a probabilistic framework for the relevance of a document with respect to a query
  - Estimate the parameters of the probability models

Existing approaches (2/2)
- Supervised machine learning framework
  - Learn a ranking function from pairwise preference data
  - Minimize the number of contradicting pairs in the training data
- Direct optimization of a loss function designed from a performance measure
  - Obtain a ranking function that is optimal with respect to some performance measure
  - Requires absolute judgment data

Pairwise learning to rank
- Uses pairs of preference data: x ≻ y means x is preferred to y
- How to obtain preference judgments?
  - Clickthroughs: user click counts on search results, or heuristic rules such as clickthrough rates
  - Absolute relevance judgments: human labels, e.g. 4-level judgments (perfect, excellent, good, and bad)
- Absolute judgments must be converted to preference data

Conversion from preference judgments to preference data
- Example: 4 documents with 2-level judgments

  Document   Judgment
  A          Relevant
  B          Relevant
  C          Irrelevant
  D          Irrelevant

- Preference data: (A, C), (A, D), (B, C), (B, D)
- Tie cases (A, B) and (C, D) are ignored!
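The conversion above can be sketched in a few lines of code. This is an illustrative sketch, not the paper's implementation; the function name and data layout are ours:

```python
# Convert absolute relevance judgments into pairwise training data.
# Unlike standard pairwise methods, we keep the tie pairs instead of
# discarding them, which is the point of the paper.

from itertools import combinations

def judgments_to_pairs(judgments):
    """Split labeled documents into preference pairs (first element
    preferred to the second) and tie pairs (equal judgments)."""
    preferences, ties = [], []
    for (a, ra), (b, rb) in combinations(judgments.items(), 2):
        if ra > rb:
            preferences.append((a, b))
        elif rb > ra:
            preferences.append((b, a))
        else:
            ties.append((a, b))
    return preferences, ties

# The slide's example: A and B relevant (1), C and D irrelevant (0).
prefs, ties = judgments_to_pairs({"A": 1, "B": 1, "C": 0, "D": 0})
# prefs -> [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")]
# ties  -> [("A", "B"), ("C", "D")]
```

This reproduces the slide's example: four preference pairs, plus the two tie pairs that conventional pairwise methods would throw away.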
Models for paired comparisons with ties
- Notation:
  - x ≻ y: x is preferred to y
  - x ≡ y: x and y are preferred equally (a tie)
- Proposed framework: general linear models for paired comparisons (Oxford University Press, 1988)
- Statistical models for paired comparisons:
  - Bradley-Terry model (Biometrika, 1952)
  - Thurstone-Mosteller model (Psychological Review, 1927)

General linear models (1/4)
- F : ℝ → ℝ is a non-decreasing function with F(+∞) = 1, F(−∞) = 0, and F(−x) = 1 − F(x)
- The probability that document x is preferred to document y is

  P(x ≻ y) = F(h(x) − h(y))

  where h is the scoring/ranking function

General linear models (2/4)
- With ties, the model becomes

  P(x ≻ y) = F(h(x) − h(y) − ε)
  P(x ≡ y) = F(h(x) − h(y) + ε) − F(h(x) − h(y) − ε)

  where ε ≥ 0 is a threshold that controls the tie probability
- When ε = 0, this model is identical to the original general linear model

General linear models (3/4), (4/4)
[Plots of P(y ≻ x), P(x ≡ y), and P(x ≻ y) as functions of h(x) − h(y)]

Bradley-Terry model
- The function F is set to the logistic function

  F(x) = exp(x) / (1 + exp(x))

  so that

  P(x ≻ y) = exp(h(x)) / (exp(h(x)) + exp(h(y)))

- With ties:

  P(x ≻ y) = exp(h(x)) / (exp(h(x)) + θ·exp(h(y)))
  P(x ≡ y) = (θ² − 1)·exp(h(x))·exp(h(y)) / [(exp(h(x)) + θ·exp(h(y)))·(exp(h(y)) + θ·exp(h(x)))]

  where θ = exp(ε)

Thurstone-Mosteller model
- The function F is set to the Gaussian cumulative distribution function

  F(x) = Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp(−t²/2) dt

- With ties:

  P(x ≻ y) = Φ(h(x) − h(y) − ε)
  P(x ≡ y) = Φ(h(x) − h(y) + ε) − Φ(h(x) − h(y) − ε)

Loss functions
- Training data {(xᵢ, yᵢ)}, i = 1, …, N + M: the first N pairs are preference data (xᵢ ≻ yᵢ) and the remaining M pairs are tie data (xᵢ ≡ yᵢ)
- Minimize the empirical risk

  R̂[h] = Σ_{i=1..N} L(h, xᵢ ≻ yᵢ) + Σ_{i=N+1..N+M} L(h, xᵢ ≡ yᵢ)

- Loss function (negative log-likelihood):

  L(h, x ≻ y) = −log P(x ≻ y)
  L(h, x ≡ y) = −log P(x ≡ y)

Learning by functional gradient boosting
- Obtain a function h* from a function space H that minimizes the empirical loss:

  h* = argmin_{h ∈ H} R̂[h]

- Apply the gradient boosting algorithm
  - Approximate h* by iteratively constructing a sequence of base learners
  - Base learners are regression trees
- The number of iterations and the shrinkage factor of the boosting algorithm are chosen by cross-validation

Experiments
- Learning to rank methods compared include BT (the Bradley-Terry model) and TM (the Thurstone-Mosteller model).
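The Bradley-Terry probabilities with ties and the negative log-likelihood risk defined earlier can be sketched as follows. This is a minimal sketch under the slide's formulas; the function names and the dict-of-scores representation of h are ours:

```python
# Bradley-Terry model with ties and its empirical risk.
# theta = exp(eps), where eps >= 0 is the tie threshold; eps must be > 0
# for tie pairs to have nonzero probability.

import math

def p_prefer(hx, hy, eps):
    """P(x > y) = e^h(x) / (e^h(x) + theta * e^h(y))."""
    theta = math.exp(eps)
    return math.exp(hx) / (math.exp(hx) + theta * math.exp(hy))

def p_tie(hx, hy, eps):
    """P(x == y) = (theta^2 - 1) e^h(x) e^h(y)
       / ((e^h(x) + theta e^h(y)) (e^h(y) + theta e^h(x)))."""
    theta = math.exp(eps)
    num = (theta ** 2 - 1.0) * math.exp(hx) * math.exp(hy)
    den = (math.exp(hx) + theta * math.exp(hy)) * (math.exp(hy) + theta * math.exp(hx))
    return num / den

def empirical_risk(scores, prefs, ties, eps):
    """R[h]: negative log-likelihood summed over preference and tie pairs."""
    risk = -sum(math.log(p_prefer(scores[x], scores[y], eps)) for x, y in prefs)
    risk -= sum(math.log(p_tie(scores[x], scores[y], eps)) for x, y in ties)
    return risk
```

A quick sanity check of the model: with h(x) = h(y) and θ = 2 (ε = ln 2), the three outcomes x ≻ y, y ≻ x, and x ≡ y each get probability 1/3, and the three probabilities always sum to one.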
- Also compared: BT-noties and TM-noties (BT and TM without tie data), and the baselines RankSVM, RankBoost, AdaRank, and FRank.
- Datasets (LETOR data collection):
  - OHSUMED (16,140 pairs, 3-level judgments)
  - TREC2003 (49,171 pairs, binary judgments)
  - TREC2004 (74,170 pairs, binary judgments)

Performance measures
- Precision: binary judgments only
- Mean Average Precision (MAP): binary judgments only; sensitive to the entire ranking order
- Normalized Discounted Cumulative Gain (NDCG): multi-level judgments

Performance comparisons over OHSUMED
[Results figure]

Performance comparisons over TREC2003
[Results figure]

Performance comparisons over TREC2004
[Results figure]

Performance comparison: ties from different relevance levels
[Results figure]

Learning curves of the Bradley-Terry model
[Results figure]

Conclusions and future work
- Conclusions:
  - Tie data improve ranking performance
  - Ties between relevant documents help extract their common features; irrelevant documents have more diverse tie features, so their ties are less effective
  - BT and TM are comparable in most cases
- Future work:
  - Theoretical analysis of ties
  - New methods, algorithms, and loss functions for incorporating ties
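As a supplement, the two ranking-quality measures used in the experiments above, average precision (aggregated into MAP over queries) and NDCG, can be sketched as follows. This is an illustrative implementation of the standard definitions, not the paper's evaluation code; the input is a list of relevance labels in ranked order:

```python
# Sketch of the evaluation measures: average precision (binary labels)
# and NDCG (multi-level labels), computed on one ranked list.

import math

def average_precision(labels):
    """Mean of precision@k over the ranks k that hold relevant documents."""
    hits, total = 0, 0.0
    for k, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0

def ndcg(labels, k=None):
    """DCG with gain 2^rel - 1 and log2(rank + 1) discount,
    normalized by the DCG of the ideal (sorted) ranking."""
    k = len(labels) if k is None else k
    def dcg(ls):
        return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(ls[:k]))
    ideal = dcg(sorted(labels, reverse=True))
    return dcg(labels) / ideal if ideal > 0 else 0.0
```

For example, a ranking with binary labels [1, 0, 1] has average precision (1/1 + 2/3) / 2 = 5/6, and a ranking that places the single relevant document first, [1, 0], has NDCG 1.0.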