Less is More: Probabilistic Models for Retrieving Fewer Relevant Documents
Harr Chen, David R. Karger
MIT CSAIL
ACM SIGIR 2006, August 9, 2006

Outline
• Motivations
• Expected Metric Principle
• Metrics
• Bayesian Retrieval
• Objectives
• Heuristics
• Experimental Results
• Related Work
• Future Work and Conclusions

Motivation
• In IR, we have formal models and formal metrics
• Models provide a framework for retrieval
  – E.g., probabilistic models
• Metrics provide a rigorous evaluation mechanism
  – E.g., precision and recall
• The probability ranking principle (PRP) is provably optimal for precision/recall
  – Ranking by probability of relevance
• But other metrics capture other notions of result set quality, and PRP isn’t necessarily optimal for them

Example: Diversity
• A user may be satisfied with one relevant result
  – Navigational queries, question answering
• In this case, we want to “hedge our bets” by retrieving for diversity in the result set
  – Better to satisfy different users with different interpretations than one user many times over
• Reciprocal rank and search length metrics capture this notion
• PRP is suboptimal

IR System Design
• Metrics define a preference ordering on result sets
  – Metric[Result set 1] > Metric[Result set 2] means Result set 1 is preferred to Result set 2
• Traditional approach: try out heuristics that we believe will improve relevance performance
  – Heuristics not directly motivated by the metric
  – E.g., synonym expansion, pseudorelevance feedback
• Observation: given a model, we can try to directly optimize for some metric

Expected Metric Principle (EMP)
• Knowing which metric to use tells us what to maximize: the expected value of the metric for each result set, given a model
• [Diagram: from a corpus of Document 1, Document 2, Document 3, enumerate candidate result sets (1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2); calculate E[Metric] for each using the model; return the set with the maximum score]

Our Contributions
• Primary: EMP – metric as retrieval goal
  – Metric designed to measure retrieval quality
    • Metrics we consider: precision/recall @ n, search length, reciprocal rank, instance recall, k-call
  – Build a probabilistic model
  – Retrieve to maximize an objective: the expected value of the metric
    • Expectations are calculated according to our probabilistic model
  – Use computational heuristics to make the optimization problem tractable
• Secondary: retrieving for diversity (special case)
  – A natural side effect of optimizing for certain metrics
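To make the EMP objective concrete, the sketch below scores every candidate result set by its expected 1-call under a toy model and returns the best set. It is a minimal illustration, not the paper's implementation: the per-document relevance probabilities, the independence assumption, and all names are invented for the example.

```python
from itertools import combinations
from math import prod

def expected_one_call(result_set, rel_prob):
    """E[1-call] = Pr[at least one result is relevant], assuming document
    relevances are independent given the model (a simplifying assumption)."""
    return 1.0 - prod(1.0 - rel_prob[d] for d in result_set)

def emp_retrieve(corpus, rel_prob, n):
    """Brute-force EMP: evaluate the expected metric for every size-n
    candidate result set and return the best one (tractable only for toy
    corpora; the greedy heuristic later in the talk addresses this)."""
    return max(combinations(corpus, n),
               key=lambda s: expected_one_call(s, rel_prob))

# Toy corpus with model-estimated relevance probabilities.
rel_prob = {"d1": 0.6, "d2": 0.55, "d3": 0.3}
print(emp_retrieve(["d1", "d2", "d3"], rel_prob, 2))
```

If document relevances really were independent, maximizing expected 1-call would simply pick the PRP top n; the diversity behavior described in the talk arises from models in which selecting one document changes beliefs about the relevance of the others, which is what the greedy algorithm later exploits.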
Detour: What is a Heuristic?
Ad hoc approach:
• Use heuristics that are believed to be correlated with good performance
• Heuristics are used to improve relevance
• Heuristics (probably) make the system slower
• Infinite number of possibilities, no formalism
• Model and heuristics intertwined
Our approach:
• Build a model that directly optimizes for good performance
• Heuristics are used to improve efficiency
• Heuristics (probably) make the optimization worse
• Well-known space of optimization techniques
• Clean separation between model and heuristics

[Our Contributions roadmap slide repeated]

Search Length / Reciprocal Rank
• (Mean) search length (MSL): the number of irrelevant results until the first relevant one
• (Mean) reciprocal rank (MRR): one over the rank of the first relevant result
• [Diagram: first relevant result at rank 3 – search length = 2, reciprocal rank = 1/3]

Instance Recall
• Each topic has multiple instances (subtopics, aspects)
• Instance recall is the fraction of instances covered (in union) by the first n results
• [Diagram: instance recall @ 5 = 0.75]

k-call @ n
• Binary metric: 1 if the top n results contain at least k relevant documents, 0 otherwise
• 1-call is (1 – %no)
  – See the TREC robust track
• [Diagram: two relevant results in the top 5 – 1-call @ 5 = 1, 2-call @ 5 = 1, 3-call @ 5 = 0]
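These metrics are straightforward to compute from a ranked list and a set of relevance judgments. The sketch below is illustrative; the function names and data structures are not from the paper.

```python
def k_call_at_n(ranking, relevant, k, n):
    """1 if at least k of the top n results are relevant, else 0."""
    return int(sum(d in relevant for d in ranking[:n]) >= k)

def search_length(ranking, relevant):
    """Number of irrelevant results before the first relevant one."""
    for i, d in enumerate(ranking):
        if d in relevant:
            return i
    return len(ranking)  # no relevant document retrieved

def reciprocal_rank(ranking, relevant):
    """One over the rank of the first relevant result (0 if none retrieved)."""
    for i, d in enumerate(ranking, start=1):
        if d in relevant:
            return 1.0 / i
    return 0.0

def instance_recall_at_n(ranking, doc_instances, all_instances, n):
    """Fraction of a topic's instances covered by the union of the top n results."""
    covered = set().union(*(doc_instances.get(d, set()) for d in ranking[:n]))
    return len(covered & all_instances) / len(all_instances)

# Example reproducing the values shown above: relevant results at ranks 3 and 5.
ranking = ["a", "b", "c", "d", "e"]
relevant = {"c", "e"}
print(search_length(ranking, relevant),                           # 2
      reciprocal_rank(ranking, relevant),                         # 1/3
      [k_call_at_n(ranking, relevant, k, 5) for k in (1, 2, 3)])  # [1, 1, 0]
```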
Motivation for k-call
• 1-call: want one relevant document
  – Many queries are satisfied with a single relevant result
  – Needing only one relevant document leaves more room to explore, which promotes result set diversity
• n-call: want every retrieved document to be relevant
  – “Perfect precision”
  – Home in on one interpretation and stick to it!
• Intermediate k
  – Risk/reward tradeoff
• Plus, easily modeled in our framework
  – Binary variable

[Our Contributions roadmap slide repeated]

Bayesian Retrieval Model
• There exist distributions that generate relevant documents and irrelevant documents
• PRP: rank by Pr[r | d], which is monotone in the likelihood ratio Pr[d | r] / Pr[d | ¬r]
• Remaining modeling questions: the form of the relevant/irrelevant distributions and the parameters of those distributions
• In this paper, we assume multinomial models and choose parameters by maximum a posteriori estimation
  – The prior is the background corpus word distribution

[Our Contributions roadmap slide repeated]

Objective
• Probability Ranking Principle (PRP): maximize Pr[r | d] at each step in the ranking
• Expected Metric Principle (EMP): maximize E[metric | d1, …, dn] for the complete result set
• In particular, for k-call, maximize
  E[k-call | d1, …, dn] = Pr[at least k relevant | d1, …, dn]

[Our Contributions roadmap slide repeated]

Optimization of Objective
• Exact optimization of the objective is usually NP-hard
  – E.g., exact optimization for k-call is reducible to the NP-hard maximum graph clique problem
• Approximation heuristic: greedy algorithm
  – Select documents successively in rank order
  – Hold previous documents fixed and optimize the objective at each rank
• [Diagram, built up over three slides: choose d1 to maximize E[metric | d]; then, with d1 fixed, choose d2 to maximize E[metric | d, d1]; then, with d1 and d2 fixed, choose d3 to maximize E[metric | d, d1, d2]; and so on]
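A minimal sketch of this greedy approximation, assuming an expected_metric(prefix) function that returns E[metric | prefix] under whatever probabilistic model is in use (the function and all names here are placeholders, not the paper's code):

```python
def greedy_emp(candidates, expected_metric, n):
    """Greedy approximation to EMP: build the ranking one position at a time,
    holding earlier choices fixed and picking the document that maximizes the
    expected metric value of the partial result set."""
    chosen = []
    remaining = list(candidates)
    for _ in range(min(n, len(remaining))):
        best = max(remaining, key=lambda d: expected_metric(chosen + [d]))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

For 1-call, this step-by-step conditioning on the already-chosen documents reduces to treating them as irrelevant, as the next slide derives.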
Greedy on 1-call and n-call
• 1-greedy
  – The greedy algorithm reduces to ranking each successive document assuming all previous documents are irrelevant:
    Pr[r1 ∨ r2 ∨ … ∨ ri] = Pr[r1 ∨ … ∨ r(i-1)] + Pr[¬(r1 ∨ … ∨ r(i-1))] · Pr[ri | ¬(r1 ∨ … ∨ r(i-1))]
    so, with d1, …, d(i-1) fixed, maximizing over di amounts to maximizing Pr[ri | ¬r1, ¬r2, …, ¬r(i-1)]
  – The algorithm has “discovered” incremental negative pseudorelevance feedback
• n-greedy: assume all previous documents are relevant

Greedy on Other Metrics
• Greedy with precision/recall reduces to PRP!
• Greedy on k-call for general k (k-greedy)
  – More complicated…
• Greedy with MSL, MRR, and instance recall works out to the 1-greedy algorithm
  – Intuition: to make the first relevant document appear earlier, we want to hedge our bets as to the query interpretation (i.e., diversify)

Experiments Overview
• Experiments verify that optimizing for a metric improves performance on that metric
  – They do not tell us which metrics to use
• Looked at ad hoc diversity examples
• TREC topics/queries
• Tuned weights on a separate development set
• Tested on:
  – Standard ad hoc (robust track) topics
  – Topics with multiple annotators
  – Topics with multiple instances

Diversity on Google Results
• Task: reranking the top 1,000 Google results
• When optimizing 1-call, our algorithm finds more diverse results than PRP or the original Google ranking

Experiments: Robust Track
• TREC 2003, 2004 robust tracks
  – 249 topics
  – 528,000 documents

              1-call   10-call   MRR     MSL     P@10
  PRP         0.791    0.020     0.563   3.052   0.333
  1-greedy    0.835    0.004     0.579   2.763   0.269
  10-greedy   0.671    0.084     0.517   3.992   0.337

• The 1-call and 10-call results are statistically significant

Experiments: Instance Retrieval
• TREC-6, 7, 8 interactive tracks
  – 20 topics
  – 210,000 documents
  – 7 to 56 instances per topic
• PRP baseline: instance recall @ 10 = 0.234
• Greedy 1-call: instance recall @ 10 = 0.315

Experiments: Multi-annotator
• TREC-4, 6 ad hoc retrieval
  – Independent annotators assessed the same topics
  – TREC-4: 49 topics, 568,000 documents, 3 annotators
  – TREC-6: 50 topics, 556,000 documents, 2 annotators

                     1-call (1)   1-call (2)   1-call (3)   Total
  TREC-4 PRP         0.735        0.551        0.653        1.939
  TREC-4 1-greedy    0.776        0.633        0.714        2.122
  TREC-6 PRP         0.660        0.620        N/A          1.280
  TREC-6 1-greedy    0.800        0.820        N/A          1.620

• More annotators are satisfied when using 1-greedy
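The 1-greedy runs above use the reduction derived earlier: each successive document is ranked as if all previously selected documents were irrelevant. The following is a hedged sketch of that selection rule under a simplified smoothed unigram model; the smoothing scheme, the constant mu, and the crude query-only estimate of the relevant model are illustrative assumptions, not the paper's exact MAP estimation.

```python
import math
from collections import Counter

def one_greedy(docs, query_terms, corpus_counts, n, mu=2000.0):
    """Sketch of 1-greedy selection via incremental negative pseudorelevance
    feedback: after each pick, the chosen document's term counts are folded
    into the irrelevant (background) model before the remaining documents
    are rescored.

    docs: dict mapping doc id -> Counter of term frequencies.
    corpus_counts: Counter of corpus-wide term frequencies, assumed to be
        built from the same documents so every term has a nonzero count.
    query_terms: list of query terms (crude stand-in for the relevant model).
    """
    corpus_total = sum(corpus_counts.values())
    rel = Counter(query_terms)        # illustrative relevant-model counts
    rel_total = sum(rel.values())
    irrel = Counter(corpus_counts)    # irrelevant model starts as the background
    chosen = []
    for _ in range(min(n, len(docs))):
        irrel_total = sum(irrel.values())

        def score(doc_id):
            # Smoothed log-likelihood ratio of the document under the
            # relevant vs. irrelevant unigram models.
            s = 0.0
            for term, count in docs[doc_id].items():
                background = corpus_counts[term] / corpus_total
                p_rel = (rel[term] + mu * background) / (rel_total + mu)
                p_irr = (irrel[term] + mu * background) / (irrel_total + mu)
                s += count * math.log(p_rel / p_irr)
            return s

        best = max((d for d in docs if d not in chosen), key=score)
        chosen.append(best)
        irrel.update(docs[best])      # treat the chosen document as irrelevant
    return chosen
```

The essential line is irrel.update(docs[best]): each selected document becomes evidence for the irrelevant model, pushing later selections toward other interpretations of the query, which is the source of the diversity behavior described in the talk.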
Related Work
• Fits in the risk minimization framework (the objective as a negative loss function)
• Other approaches optimize for metrics directly, using training data
• Pseudorelevance feedback
• Subtopic retrieval
• Maximal marginal relevance
• Clustering
• See the paper for references

Future Work
• General k-call (k = 2, etc.)
  – Determine whether this is what users actually want
• A better underlying probabilistic model
  – Our contribution is in the ranking objective, not the model; the model can be arbitrarily sophisticated
• Better optimization techniques
  – E.g., local search would differentiate the algorithms for MRR and 1-call
• Other metrics
  – Preliminary work on mean average precision and precision @ recall
    • (Perhaps) surprisingly, these metrics are not optimized by PRP!

Conclusions
• EMP: the metric can motivate the model – choosing and believing in a metric already gives us a reasonable objective, E[metric]
• EMP can potentially be applied on top of a variety of different underlying probabilistic models
• Diversity is one practical example of a natural side effect of using EMP with the right metric

Acknowledgments
• Harr Chen is supported by the Office of Naval Research through a National Defense Science and Engineering Graduate Fellowship
• Jaime Teevan, Susan Dumais, and the anonymous reviewers provided constructive feedback
• ChengXiang Zhai, William Cohen, and Ellen Voorhees provided code and data