Find Shortest Snippet

• Given a query (e.g., ‘hello world goodbye’)
and a document, find the shortest snippet
(the smallest window of text) in the
document that contains all of the query
words at least once.
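As a baseline, the problem statement above can be sketched directly. This is only one plausible reading of a naive approach, not necessarily the one the course has in mind: it assumes the document is split on whitespace, matching is case-insensitive, and window size is measured in words. The function name `shortest_snippet_brute_force` is hypothetical.

```python
def shortest_snippet_brute_force(document, query):
    """Try every start word and extend until all query words are covered."""
    words = document.lower().split()      # assumption: whitespace tokenization
    needed = set(query.lower().split())
    if not needed:
        return None
    best = None                           # (start, end) indices into words
    for start in range(len(words)):
        seen = set()
        for end in range(start, len(words)):
            if words[end] in needed:
                seen.add(words[end])
            if seen == needed:
                # First complete window for this start is the shortest one.
                if best is None or end - start < best[1] - best[0]:
                    best = (start, end)
                break
    if best is None:
        return None                       # some query word never appears
    return " ".join(words[best[0]:best[1] + 1])


doc = "hello there world and goodbye hello world"
print(shortest_snippet_brute_force(doc, "hello world goodbye"))
```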
• Hint
– Use a data structure similar to an inverted index:
n arrays of sorted integers representing the
positions of words in the document. For
example:
• hello: 5 14 19 35 52
• world: 11 17 29 40
• goodbye: 1 25 63 72
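One possible way to use the sorted position arrays from the hint is sketched here. It keeps one pointer per array, treats the smallest and largest current positions as a candidate window, and repeatedly advances the pointer at the smallest position. The use of a heap and the name `shortest_snippet` are assumptions made for illustration; the slides do not say which enhanced algorithm is intended.

```python
import heapq


def shortest_snippet(position_lists):
    """Return (start, end) positions of the smallest window that
    contains at least one position from every sorted list."""
    if not position_lists or any(not plist for plist in position_lists):
        return None
    # Heap entries: (position, index of list, index within that list).
    heap = [(plist[0], i, 0) for i, plist in enumerate(position_lists)]
    heapq.heapify(heap)
    window_end = max(plist[0] for plist in position_lists)
    best = None
    while True:
        window_start, i, j = heap[0]      # smallest current position
        if best is None or window_end - window_start < best[1] - best[0]:
            best = (window_start, window_end)
        if j + 1 == len(position_lists[i]):
            # The list holding the window start is exhausted, so no
            # later window can be smaller than the best one found.
            return best
        nxt = position_lists[i][j + 1]
        window_end = max(window_end, nxt)
        heapq.heapreplace(heap, (nxt, i, j + 1))


positions = [
    [5, 14, 19, 35, 52],   # hello
    [11, 17, 29, 40],      # world
    [1, 25, 63, 72],       # goodbye
]
print(shortest_snippet(positions))  # → (17, 25)
```

For the example arrays, the window from position 17 to 25 covers world (17), hello (19), and goodbye (25).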
• Further Questions
– What is the time complexity of the brute-force
algorithm?
– What is the time complexity of the enhanced
algorithm?
– Will this runtime matter in practice for typical
queries and web pages?
– Why is this simple algorithm not appropriate
for the snippets generated for Google search
results?
– How could your algorithm be improved to
generate better snippets from an end user's
perspective?