Swap algorithm for query result diversification
Emre Can Kucukoglu, [email protected]

Reference articles:
• C. Yu, L. Lakshmanan, and S. Amer-Yahia, “It takes variety to make a world: diversification in recommender systems,” in EDBT, 2009.
• M. R. Vieira, H. L. Razente, M. C. N. Barioni, M. Hadjieleftheriou, D. Srivastava, C. Traina Jr., and V. J. Tsotras, “On query result diversification,” in ICDE, 2011, pp. 1163-1174.

Let the candidate set be S = {d1, d2, d3, d4, d5, d6, d7, d8} and k = 4. The documents in decreasing order of relevance score:

d4   d5   d1   d7   d6   d2   d8   d3
0.8  0.7  0.6  0.5  0.4  0.3  0.2  0.1

The Swap algorithm starts with the k highest-scoring documents. It then takes the remaining documents in decreasing order of score and tries to swap each one with the document in the result set that contributes the least to the function F. At each iteration, a candidate document with a lower relevance is swapped into the top-k set if and only if the swap increases the overall F value of the resulting set.

In function F, λ is the tradeoff value between the sim function and the div function, and k is the result set size. The sim function computes the sum of the similarity distances of all documents in the set to the query. The div function computes the sum of the diversity distances among all documents: for a given set of documents, the diversity distances of every pair of documents are added up. Similarity distances can be computed with tf-idf or other algorithms.

STEP 1
In the while loop [pseudocode lines 2-5], the first 4 documents are added to the result set R. These are d4, d5, d1 and d7. They are also removed from the candidate set S.

S:  d6 d2 d8 d3
R:  d4 d5 d1 d7
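The description of F and the initial top-k selection can be sketched in Python. This is a minimal sketch under two assumptions that are not stated above: F is taken to have the form (k - 1)(1 - λ)·sim + 2λ·div, which is how it is recalled from the Vieira et al. reference, and `toy_div` is a made-up diversity distance used only so the code can run.

```python
from itertools import combinations

# Relevance scores of the example documents (similarity to the query).
REL = {"d4": 0.8, "d5": 0.7, "d1": 0.6, "d7": 0.5,
       "d6": 0.4, "d2": 0.3, "d8": 0.2, "d3": 0.1}


def toy_div(a, b):
    """Hypothetical diversity distance: gap between the document indices."""
    return abs(int(a[1:]) - int(b[1:]))


def sim(docs, rel):
    """Sum of the similarity scores of all documents in the set."""
    return sum(rel[d] for d in docs)


def div(docs, div_fn):
    """Sum of the diversity distances over every unordered pair of documents."""
    return sum(div_fn(a, b) for a, b in combinations(sorted(docs), 2))


def F(docs, rel, div_fn, lam, k):
    """Assumed objective: (k - 1)(1 - lam) * sim + 2 * lam * div."""
    return (k - 1) * (1 - lam) * sim(docs, rel) + 2 * lam * div(docs, div_fn)


def top_k(rel, k):
    """Step 1: split the ranking into the top-k result set and the rest."""
    ranked = sorted(rel, key=rel.get, reverse=True)
    return ranked[:k], ranked[k:]


R, S = top_k(REL, 4)
print("R =", R)
print("S =", S)
print("F(R) with lam = 0.5:", F(R, REL, toy_div, 0.5, 4))
```

With λ = 0 only relevance counts, and with λ = 1 only diversity counts, so λ controls how far the result may drift from the plain top-k ranking.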
STEP 2
While the candidate set S still has documents, pick the highest-scoring one. The first pick is d6 (shown as C below); remove it from S. Assign R to R’ (R’ ← R). Then pick d7 from R [pseudocode line 10] and compute the F value for A = {d4, d5, d1, d6} (R with d7 replaced by d6) and for R’ = {d4, d5, d1, d7}.

C:  d6
S:  d2 d8 d3
R:  d4 d5 d1 d7
R’: d4 d5 d1 d7
STEP 3
Let F(q, A) > F(q, R’). Then R’ is updated to A = {d4, d5, d1, d6}.

C:  d6
S:  d2 d8 d3
R:  d4 d5 d1 d7
R’: d4 d5 d1 d6

Next, pick d1 from R [pseudocode line 10] and compute the F value for A = {d4, d5, d6, d7} (R with d1 replaced by d6) and for R’ = {d4, d5, d1, d6}.
STEP 4
Let F(q, A) > F(q, R’) again. Then R’ is updated to A = {d4, d5, d6, d7}.

C:  d6
S:  d2 d8 d3
R:  d4 d5 d1 d7
R’: d4 d5 d6 d7

Next, pick d5 from R [pseudocode line 10] and compute the F value for A = {d4, d6, d1, d7} (R with d5 replaced by d6) and for R’ = {d4, d5, d6, d7}.

STEP 5
Let F(q, A) < F(q, R’). Then R’ is left unchanged.
C:  d6
S:  d2 d8 d3
R:  d4 d5 d1 d7
R’: d4 d5 d6 d7

Next, pick d4 from R [pseudocode line 10] and compute the F value for A = {d6, d5, d1, d7} (R with d4 replaced by d6) and for R’ = {d4, d5, d6, d7}.

STEP 6
Let F(q, A) < F(q, R’). Then R’ is left unchanged.

After the inner for-loop [pseudocode lines 10-12], if R’ has a higher F value than the initial set R, R’ is assigned to R (R ← R’).
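Every comparison made for the pick d6 uses a set A built the same way: one member of R is replaced by the pick. A minimal sketch of that enumeration follows. The helper name `candidate_swaps` is made up for illustration, and the lowest-score-first visiting order simply mirrors the walk-through; the pseudocode may not prescribe an order.

```python
def candidate_swaps(R, c):
    """Yield (removed document, A), where A is R with that document replaced by c."""
    for d in reversed(R):  # R is ordered by decreasing score, so start at the lowest
        yield d, [c if x == d else x for x in R]


R = ["d4", "d5", "d1", "d7"]
swaps = list(candidate_swaps(R, "d6"))
for removed, A in swaps:
    print("replace", removed, "->", A)
```

R itself is never modified inside this loop; only the best A found so far is remembered as R’.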
STEP 7
After the inner for-loop, R’ is compared with R [pseudocode lines 13-14]. Let F(q, R’) > F(q, R). Then R’ is assigned to R, so R = {d4, d5, d6, d7}.

C:  d6
S:  d2 d8 d3
R:  d4 d5 d6 d7
R’: d4 d5 d6 d7

While the candidate set S still has documents, pick the highest-scoring one. The second pick is d2. Remove it from S, assign R to R’ (R’ ← R), and repeat steps 3-6.

C:  d2
S:  d8 d3
R:  d4 d5 d6 d7
R’: d4 d5 d6 d7
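The whole walk-through can be condensed into a short Python sketch. It is not the authors' implementation. The form of F, (k - 1)(1 - λ)·sim + 2λ·div, is an assumption recalled from the Vieira et al. reference, and `toy_div` is a hypothetical diversity distance, so the sets it produces differ from the "Let F(q, A) > F(q, R’)" outcomes assumed in the steps above.

```python
from itertools import combinations

REL = {"d4": 0.8, "d5": 0.7, "d1": 0.6, "d7": 0.5,
       "d6": 0.4, "d2": 0.3, "d8": 0.2, "d3": 0.1}


def toy_div(a, b):
    """Hypothetical diversity distance: gap between the document indices."""
    return abs(int(a[1:]) - int(b[1:]))


def objective(docs, rel, div_fn, lam, k):
    """Assumed form of F: (k - 1)(1 - lam) * sim + 2 * lam * div."""
    s = sum(rel[d] for d in docs)
    dv = sum(div_fn(a, b) for a, b in combinations(sorted(docs), 2))
    return (k - 1) * (1 - lam) * s + 2 * lam * dv


def swap(rel, div_fn, lam, k):
    """Start from the top-k documents, then try each remaining document once."""
    ranked = sorted(rel, key=rel.get, reverse=True)
    R, S = ranked[:k], ranked[k:]
    for c in S:                      # remaining documents, highest score first
        best = list(R)               # R' <- R
        for d in R:                  # try replacing each member of R by c
            A = [c if x == d else x for x in R]
            if objective(A, rel, div_fn, lam, k) > objective(best, rel, div_fn, lam, k):
                best = A
        if objective(best, rel, div_fn, lam, k) > objective(R, rel, div_fn, lam, k):
            R = best                 # keep the swap only if F improves
    return R


print(swap(REL, toy_div, 0.0, 4))
print(swap(REL, toy_div, 1.0, 4))
```

With λ = 0 no swap can raise F, so the result stays at the plain top-k; with λ = 1 relevance is ignored and the result spreads out according to the toy distance.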
STEP 8
If the complexity of evaluating function F is O(C), the overall complexity of the Swap algorithm is O(N·k·C), where N is the size of the candidate set S.

The F value of the final result R is not guaranteed to be optimal, because documents in the candidate set S are examined in order of their similarity distances only. That is, the method does not take the order of the diversity distances in S into account.


MMR* algorithm for query result diversification
*: Maximal Marginal Relevance
Emre Can Kucukoglu, [email protected]

Reference articles:
• J. Carbonell and J. Goldstein, “The use of MMR, diversity-based reranking for reordering documents and producing summaries,” in Proc. ACM SIGIR, 1998.
• M. R. Vieira, H. L. Razente, M. C. N. Barioni, M. Hadjieleftheriou, D. Srivastava, C. Traina Jr., and V. J. Tsotras, “On query result diversification,” in ICDE, 2011, pp. 1163-1174.

Let the candidate set be S = {d1, d2, d3, d4, d5, d6, d7, d8} and k = 4. The documents in decreasing order of δsim score:

d4   d5   d1   d7   d6   d2   d8   d3
0.8  0.7  0.6  0.5  0.4  0.3  0.2  0.1

The MMR algorithm iteratively constructs the result set R by selecting the new document in S that maximizes the function mmr. In mmr, λ is the tradeoff value between the δsim function and the δdiv function, and k is the result set size. The δsim function computes the similarity distance between the query and a document. The δdiv function computes the diversity distance between a pair of documents. Similarity distances can be computed with tf-idf or other algorithms.

R:  (empty)
S:  d1 d2 d3 d4 d5 d6 d7 d8

STEP 1
Since R is empty in the initial iteration, |R| is 0, so the first element is picked according to similarity distance alone: the element with the highest δsim in S is always included in R, regardless of the λ value. d4 has the highest δsim score. Add it to R and remove it from S.

R:  d4
S:  d1 d2 d3 d5 d6 d7 d8
STEP 2-4
Then, at every step, pick the document in S with the highest mmr value, until |R| = k. Let d3, d6 and d8 be the documents with the highest mmr values in these steps.

R:  d4 d3 d6 d8
S:  d1 d2 d5 d7

For each calculation of mmr, since |R| is increasing, the weight of the diversity distances is decreasing.

Since the result is constructed incrementally by inserting a new element into the previous result, the first chosen element has a large influence on the quality of the final result set R. Moreover, experimental results show that the quality of the results of the MMR method decreases very fast as the λ parameter increases.

If the complexity of picking the highest-scoring document according to mmr is O(C), the overall complexity of the MMR algorithm is O(k·C).
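The MMR steps above can be sketched in Python as follows. This is a sketch under assumptions, not the authors' code. The score is taken to be mmr(d) = (1 - λ)·δsim(d, q) + (λ / |R|)·Σ δdiv(d, r) over r in R, as recalled from the Vieira et al. reference, and `toy_div` is a made-up diversity distance, so the documents it picks differ from the d3, d6, d8 assumed in the walk-through.

```python
REL = {"d4": 0.8, "d5": 0.7, "d1": 0.6, "d7": 0.5,
       "d6": 0.4, "d2": 0.3, "d8": 0.2, "d3": 0.1}


def toy_div(a, b):
    """Hypothetical diversity distance: gap between the document indices."""
    return abs(int(a[1:]) - int(b[1:]))


def mmr_score(d, R, rel, div_fn, lam):
    """Assumed mmr value of candidate d given the current result set R."""
    if not R:  # empty R: the diversity term vanishes, only relevance counts
        return (1 - lam) * rel[d]
    return (1 - lam) * rel[d] + lam / len(R) * sum(div_fn(d, r) for r in R)


def mmr(rel, div_fn, lam, k):
    """Build R incrementally, adding the candidate with the highest mmr value."""
    S = sorted(rel, key=rel.get, reverse=True)  # ties go to the higher score
    R = []
    while S and len(R) < k:
        best = max(S, key=lambda d: mmr_score(d, R, rel, div_fn, lam))
        S.remove(best)
        R.append(best)
    return R


print(mmr(REL, toy_div, 0.0, 4))
print(mmr(REL, toy_div, 1.0, 4))
```

Whatever λ is, the first pick is d4, which matches the remark that the first chosen element does not depend on λ. The division by |R| is what makes the weight of each individual diversity distance shrink as R grows.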