{d4,d5,d6,d7} and R`={d4,d5,d1,d6}.

Swap algorithm for query
result diversification
Emre Can Kucukoglu, [email protected]
Reference articles:
• C. Yu, L. Lakshmanan, and S. Amer-Yahia, “It takes variety to make a world: diversification in recommender systems,” in EDBT, 2009.
• Marcos R. Vieira, Humberto Luiz Razente, Maria Camila Nardini Barioni, Marios Hadjieleftheriou, Divesh Srivastava, Caetano Traina Jr., Vassilis
J. Tsotras: On query result diversification. ICDE 2011: 1163-1174
Let candidate set S: {d1, d2, d3, d4, d5, d6, d7, d8},
k = 4,
Document relevance scores, in decreasing order:
d4  0.8
d5  0.7
d1  0.6
d7  0.5
d6  0.4
d2  0.3
d8  0.2
d3  0.1
In the while loop [2-5], the first k = 4 documents, taken in decreasing order of relevance, are added to the result set R.
Start with the k highest scoring documents, and swap
the document which contributes the least to the
function F with the next highest scoring document
among the remaining documents. At each iteration, a
candidate document with a lower relevance is
swapped into the top-k set if and only if it increases
the overall function F value of the resulting set.
Function F measures the quality of a result set R for a query q:
λ is the tradeoff value between the sim function and the div function.
k is the result set size.
The sim function computes the sum of the similarity distances between the query and each document in the result set.
The div function computes the sum of the diversity distances within the result set: the diversity distances of every pair of documents are added together.
Similarity distances can be computed with tf-idf or
other algorithms.
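A minimal sketch of F, assuming the form used in the second reference article (Vieira et al.): F(q, R) = (k - 1)(1 - λ)·sim(q, R) + 2λ·div(R). The names `delta_sim` and `delta_div`, and in particular the toy diversity distance, are illustrative assumptions and not part of the original slides.

```python
from itertools import combinations

def F(q, R, k, lam, delta_sim, delta_div):
    """Assumed form: (k - 1) * (1 - lam) * sim(q, R) + 2 * lam * div(R)."""
    # sim: sum of query-to-document similarity distances over R
    sim = sum(delta_sim(q, d) for d in R)
    # div: sum of diversity distances over every unordered pair in R
    div = sum(delta_div(a, b) for a, b in combinations(R, 2))
    return (k - 1) * (1 - lam) * sim + 2 * lam * div

# Relevance scores from the running example.
SCORES = {"d4": 0.8, "d5": 0.7, "d1": 0.6, "d7": 0.5,
          "d6": 0.4, "d2": 0.3, "d8": 0.2, "d3": 0.1}

def delta_sim(q, d):
    return SCORES[d]

def delta_div(a, b):
    # Hypothetical diversity distance: gap between document numbers, scaled to [0, 1].
    return abs(int(a[1:]) - int(b[1:])) / 7.0

top_k = ["d4", "d5", "d1", "d7"]
print(F("q", top_k, 4, 0.0, delta_sim, delta_div))  # relevance term only
print(F("q", top_k, 4, 1.0, delta_sim, delta_div))  # diversity term only
```

Under this assumed form, the factors (k - 1) and 2 put the k similarity terms and the k(k - 1)/2 pairwise diversity terms on a comparable scale.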
State: R = {} (empty)
STEP 1
These are d4, d5, d1, d7.
State: R = {d4, d5, d1, d7}
STEP 1
They are then removed from the candidate set S.
State: S = {d6, d2, d8, d3}; R = {d4, d5, d1, d7}
STEP 1
While the candidate set S is not empty, pick its highest-scoring document.
State: S = {d6, d2, d8, d3}; R = {d4, d5, d1, d7}
STEP 2
The first pick is d6 (shown as C).
State: C = d6; S = {d6, d2, d8, d3}; R = {d4, d5, d1, d7}
STEP 2
d6 is then removed from S.
State: C = d6; S = {d2, d8, d3}; R = {d4, d5, d1, d7}
STEP 2
Copy R into R’ (R’ ← R).
State: C = d6; S = {d2, d8, d3}; R = {d4, d5, d1, d7}; R’ = {d4, d5, d1, d7}
STEP 2
Pick d7 from R [line 10]: replacing d7 with d6 gives A = {d4, d5, d1, d6}.
Compute the F value for A and for R’ = {d4, d5, d1, d7}.
State: C = d6; S = {d2, d8, d3}; R = {d4, d5, d1, d7}; R’ = {d4, d5, d1, d7}
STEP 3
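The set A is the current result set R with one member replaced by the candidate. As a small sketch, the sets that the inner loop considers for candidate d6 can be enumerated as below. The helper name `swap_candidates` is hypothetical, and the reversed iteration order is only chosen to match the order of this walk-through (d7 first), since the algorithm itself iterates over all members of R.

```python
def swap_candidates(R, c):
    """Yield (removed, A), where A is R with one member replaced by candidate c."""
    # The walk-through starts from the lowest-scoring member of R.
    for out in reversed(R):
        yield out, [c if d == out else d for d in R]

R = ["d4", "d5", "d1", "d7"]
for removed, A in swap_candidates(R, "d6"):
    print(removed, A)
```

Each A produced here is compared against R’ using function F, and R’ keeps the best set seen so far.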
Suppose F(q, A) > F(q, R’).
Then R’ is updated to A = {d4, d5, d1, d6}.
State: C = d6; S = {d2, d8, d3}; R = {d4, d5, d1, d7}; R’ = {d4, d5, d1, d6}
STEP 3
Pick d1 from R [line 10]: replacing d1 with d6 gives A = {d4, d5, d6, d7}.
Compute the F value for A and for R’ = {d4, d5, d1, d6}.
State: C = d6; S = {d2, d8, d3}; R = {d4, d5, d1, d7}; R’ = {d4, d5, d1, d6}
STEP 4
Suppose F(q, A) < F(q, R’).
Then R’ is left unchanged.
State: C = d6; S = {d2, d8, d3}; R = {d4, d5, d1, d7}; R’ = {d4, d5, d1, d6}
STEP 4
Pick d5 from R [line 10]: replacing d5 with d6 gives A = {d4, d6, d1, d7}.
Compute the F value for A and for R’ = {d4, d5, d1, d6}.
State: C = d6; S = {d2, d8, d3}; R = {d4, d5, d1, d7}; R’ = {d4, d5, d1, d6}
STEP 5
Suppose F(q, A) > F(q, R’).
Then R’ is updated to A = {d4, d6, d1, d7}.
State: C = d6; S = {d2, d8, d3}; R = {d4, d5, d1, d7}; R’ = {d4, d6, d1, d7}
STEP 5
Pick d4 from R [line 10]: replacing d4 with d6 gives A = {d6, d5, d1, d7}.
Compute the F value for A and for R’ = {d4, d6, d1, d7}.
State: C = d6; S = {d2, d8, d3}; R = {d4, d5, d1, d7}; R’ = {d4, d6, d1, d7}
STEP 6
Suppose F(q, A) < F(q, R’).
Then R’ is left unchanged.
State: C = d6; S = {d2, d8, d3}; R = {d4, d5, d1, d7}; R’ = {d4, d6, d1, d7}
STEP 6
After the inner for-loop [10-12], if R’ has a higher F value than the current set R, assign R’ to R (R ← R’).
State: C = d6; S = {d2, d8, d3}; R = {d4, d5, d1, d7}; R’ = {d4, d6, d1, d7}
STEP 7
This comparison and assignment are lines [13-14].
Suppose F(q, R’) > F(q, R). Then R becomes {d4, d6, d1, d7}.
State: C = d6; S = {d2, d8, d3}; R = {d4, d6, d1, d7}; R’ = {d4, d6, d1, d7}
STEP 7
The second pick is d2, which is removed from S.
Copy R into R’ (R’ ← R) and repeat STEP 3-6 for d2.
State: C = d2; S = {d8, d3}; R = {d4, d6, d1, d7}; R’ = {d4, d6, d1, d7}
STEP 8
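The steps walked through above can be put together in a short sketch. It assumes the form F(q, R) = (k - 1)(1 - λ)·sim(q, R) + 2λ·div(R) from the second reference article, and the bracketed comments follow the line references used in the walk-through. The score table is the running example, while `delta_div` is a hypothetical diversity distance introduced only to make the sketch runnable.

```python
from itertools import combinations

def F(q, R, k, lam, delta_sim, delta_div):
    # Assumed form of F: (k - 1)(1 - lam) * sim(q, R) + 2 * lam * div(R)
    sim = sum(delta_sim(q, d) for d in R)
    div = sum(delta_div(a, b) for a, b in combinations(R, 2))
    return (k - 1) * (1 - lam) * sim + 2 * lam * div

def swap(q, S, k, lam, delta_sim, delta_div):
    score = lambda X: F(q, X, k, lam, delta_sim, delta_div)
    # Sorting once stands in for repeatedly picking the highest-scoring document.
    ranked = sorted(S, key=lambda d: delta_sim(q, d), reverse=True)
    R, rest = ranked[:k], ranked[k:]           # [2-5] start with the top-k documents
    for c in rest:                              # [6-8] next highest-scoring candidate
        R_prime = R                             # [9]   R' <- R
        for out in R:                           # [10-12] try replacing each member
            A = [c if d == out else d for d in R]
            if score(A) > score(R_prime):
                R_prime = A
        if score(R_prime) > score(R):           # [13-14] keep the swap only if F improves
            R = R_prime
    return R

SCORES = {"d4": 0.8, "d5": 0.7, "d1": 0.6, "d7": 0.5,
          "d6": 0.4, "d2": 0.3, "d8": 0.2, "d3": 0.1}

def delta_sim(q, d):
    return SCORES[d]

def delta_div(a, b):
    # Hypothetical diversity distance: gap between document numbers, scaled to [0, 1].
    return abs(int(a[1:]) - int(b[1:])) / 7.0

print(swap("q", list(SCORES), 4, 0.0, delta_sim, delta_div))  # relevance only: top-k is kept
print(swap("q", list(SCORES), 4, 0.5, delta_sim, delta_div))
```

Because R is replaced only when F strictly increases, the F value of the returned set is never lower than that of the initial top-k set.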
If the cost of evaluating function F is assumed to be O(C),
the overall complexity of the swap algorithm is O(N·k·C).
N is the size of the candidate set S.
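The bound can be read off by counting evaluations of F. As a brief sketch: once the first k documents are in R, each of the remaining N - k candidates is tried against each of the k members of R, and each trial evaluates F at cost O(C).

```latex
\underbrace{(N-k)}_{\text{candidates}} \times \underbrace{k}_{\text{members of } R} \times O(C)
  \;=\; O\big((N-k)\,k\,C\big) \;\subseteq\; O(N \cdot k \cdot C)
```

If F is recomputed from scratch with a pairwise div term, C itself would grow with the k(k - 1)/2 pairs in the result set; this is an assumption about the implementation, not a statement from the slides.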
The F value of the final result R is not guaranteed to be optimal, since the documents in the candidate set S are examined only in decreasing order of their similarity distances.
That is, the method does not take the diversity distances into account when ordering S.
MMR* algorithm for query
result diversification
Emre Can Kucukoglu, [email protected]
Reference articles:
• J. Carbonell and J. Goldstein, “The use of MMR, diversity-based reranking for reordering documents and producing summaries,” in Proc. ACM
SIGIR, 1998.
• Marcos R. Vieira, Humberto Luiz Razente, Maria Camila Nardini Barioni, Marios Hadjieleftheriou, Divesh Srivastava, Caetano Traina Jr., Vassilis
J. Tsotras: On query result diversification. ICDE 2011: 1163-1174
*: Maximal Marginal Relevance
Let candidate set S: {d1, d2, d3, d4, d5, d6, d7, d8},
k = 4,
δsim scores, in decreasing order:
d4  0.8
d5  0.7
d1  0.6
d7  0.5
d6  0.4
d2  0.3
d8  0.2
d3  0.1
Since R is initially empty, the first element is picked according to the similarity distances alone.
State: R = {} (empty); S = {d1, d2, d3, d4, d5, d6, d7, d8}
The MMR algorithm iteratively constructs the result set R by selecting the document in S that maximizes the function mmr.
λ is the tradeoff value between the δsim function and the δdiv function.
k is the result set size.
The δsim function computes the similarity distance between the query and a document.
The δdiv function computes the diversity distance between a pair of documents.
Similarity distances can be computed with tf-idf or other algorithms.
Since R is empty in the first iteration, |R| is 0, so the element with the highest δsim in S is always included in R, regardless of the λ value.
STEP 1
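A minimal sketch of the mmr function, assuming the form used in the second reference article (Vieira et al.): mmr(d) = (1 - λ)·δsim(q, d) + (λ / |R|)·Σ δdiv(d, r) over r in R, with the diversity term taken as 0 while R is empty. The names `delta_sim` and `delta_div` and the toy diversity distance are illustrative assumptions.

```python
def mmr(q, d, R, lam, delta_sim, delta_div):
    """Assumed form: (1 - lam) * delta_sim(q, d) + (lam / |R|) * sum of delta_div(d, r)."""
    relevance = (1 - lam) * delta_sim(q, d)
    if not R:
        # Empty result set: only the similarity term contributes.
        return relevance
    diversity = sum(delta_div(d, r) for r in R) / len(R)
    return relevance + lam * diversity

SCORES = {"d4": 0.8, "d5": 0.7, "d1": 0.6, "d7": 0.5,
          "d6": 0.4, "d2": 0.3, "d8": 0.2, "d3": 0.1}

def delta_sim(q, d):
    return SCORES[d]

def delta_div(a, b):
    # Hypothetical diversity distance: gap between document numbers, scaled to [0, 1].
    return abs(int(a[1:]) - int(b[1:])) / 7.0

S = list(SCORES)
for lam in (0.0, 0.5, 0.9):
    first = max(S, key=lambda d: mmr("q", d, [], lam, delta_sim, delta_div))
    print(lam, first)  # d4 is picked first for each of these λ values
```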
d4 has the highest δsim score, so it is added to R and removed from S.
State: R = {d4}; S = {d1, d2, d3, d5, d6, d7, d8}
STEP 1
Then, at every step, pick the document in S with the highest mmr value and move it to R, until |R| = k.
State: R = {d4}; S = {d1, d2, d3, d5, d6, d7, d8}
STEP 2-4
Suppose d3, d6 and d8 obtain the highest mmr scores in the next three steps.
State: R = {d4, d3, d6, d8}; S = {d1, d2, d5, d7}
STEP 2-4
Each time function mmr is evaluated, |R| has grown, so the weight given to each individual diversity distance decreases.
State: R = {d4, d3, d6, d8}; S = {d1, d2, d5, d7}
STEP 2-4
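The greedy construction described in these steps can be sketched as follows. It assumes the same form of mmr as in the second reference article, namely (1 - λ)·δsim(q, d) + (λ / |R|)·Σ δdiv(d, r). The function name `mmr_select` and the toy diversity distance are hypothetical, so the documents it selects need not match the d3, d6, d8 assumed in the walk-through.

```python
def mmr(q, d, R, lam, delta_sim, delta_div):
    # Assumed form of mmr; the diversity term is 0 while R is empty.
    relevance = (1 - lam) * delta_sim(q, d)
    if not R:
        return relevance
    return relevance + lam * sum(delta_div(d, r) for r in R) / len(R)

def mmr_select(q, S, k, lam, delta_sim, delta_div):
    S, R = list(S), []
    while len(R) < k and S:
        # Pick the document in S with the highest mmr value given the current R.
        best = max(S, key=lambda d: mmr(q, d, R, lam, delta_sim, delta_div))
        S.remove(best)
        R.append(best)
    return R

SCORES = {"d4": 0.8, "d5": 0.7, "d1": 0.6, "d7": 0.5,
          "d6": 0.4, "d2": 0.3, "d8": 0.2, "d3": 0.1}

def delta_sim(q, d):
    return SCORES[d]

def delta_div(a, b):
    # Hypothetical diversity distance: gap between document numbers, scaled to [0, 1].
    return abs(int(a[1:]) - int(b[1:])) / 7.0

print(mmr_select("q", SCORES, 4, 0.0, delta_sim, delta_div))  # relevance only
print(mmr_select("q", SCORES, 4, 0.7, delta_sim, delta_div))
```

Each selection is final, which is why the first chosen element carries so much weight in the result.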
Since the result is constructed incrementally by inserting a new element into the previous result, the first chosen element has a large influence on the quality of the final result set R.
Moreover, experimental results show that the quality of the results of the MMR method decreases very quickly as the λ parameter increases.
If the cost of picking the highest-scoring document according to function mmr is assumed to be O(C), the overall complexity of the MMR algorithm is O(k·C).