Intelligent Information Retrieval and Web Search

CS 178H
Introduction to
Computer Science Research
What is CS Research?
What is CS Research?
• Discovery of new knowledge of computing
through mathematical analysis and
experimental evaluation of algorithms and
computer software.
Epistemology
(definitions from Wikipedia)
• Epistemology (from Greek επιστήμη - episteme,
"knowledge" + λόγος, "logos") or theory of
knowledge is the branch of philosophy concerned
with the nature and scope (limitations) of
knowledge. It addresses the questions:
– "What is knowledge?"
– "How is knowledge acquired?"
– "What do people know?"
– "How do we know what we know?"
Rationalism
• Rationalism is "any view appealing to reason as a
source of knowledge or justification" (Lacey 286).
In more technical terms it is a method or a theory
"in which the criterion of the truth is not sensory
but intellectual and deductive" (Bourke 263).
• Originated with Socrates (469 BC–399 BC) and
Plato (428/427 BC – 348/347 BC).
Empiricism
• Empiricism is a theory of knowledge which
asserts that knowledge arises from experience.
Empiricism emphasizes the role of experience and
evidence, especially sensory perception, in the
formation of ideas.
• Originated with Aristotle (384 BC – 322 BC)
Rationalism in CS
(Theoretical CS)
• Programs are formal mathematical objects.
• Therefore, important properties of
algorithms/software can be proven
mathematically.
– Termination
– Correctness (satisfies a formal specification)
– Computational Complexity (time and space
requirements)
Theoretical CS Research
• Algorithm Design and Analysis
– Design a new (more efficient) algorithm for some well-defined problem (e.g., sorting, longest common subsequence)
– Mathematically prove the correctness and improved
complexity of the new algorithm.
• Theoretical Analysis
– Form a mathematical conjecture about a computational
problem (e.g. graph isomorphism is NP-complete)
– Mathematically prove the conjecture as a theorem.
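As a concrete instance of the algorithm-design bullet, here is the textbook dynamic-programming algorithm for longest common subsequence, sketched in Python (a standard algorithm, not taken from this deck; the name `lcs_length` is illustrative). Its correctness follows by induction on prefix lengths, and the two nested loops give the O(mn) time and space bound that a theoretical analysis would prove.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    Time and space are O(len(a) * len(b)).
    """
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                # Matching last characters extend the LCS of both shorter prefixes.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Otherwise drop the last character of one string or the other.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]


print(lcs_length("ABCBDAB", "BDCABA"))  # 4 (e.g., "BCBA")
```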
Limits of Rationalism in CS
• Sometimes software is too complex to analyze
theoretically.
• Sometimes correctness cannot be characterized
formally and depends on natural or human
behavior.
– Protein folding
– Handwriting/speech recognition
• Sometimes software behavior on real data depends
on unknown natural properties of this data.
– Locality affecting paging performance
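A toy simulation illustrates the paging bullet: the same LRU replacement code, with the same number of frames, behaves very differently depending only on the locality of the access trace. This is a sketch under simple assumptions (hypothetical function `lru_faults`, made-up traces), not a model of any real operating system.

```python
from collections import OrderedDict


def lru_faults(trace, frames):
    """Count page faults when replaying an access trace through an LRU cache."""
    cache = OrderedDict()  # resident pages, ordered least- to most-recently used
    faults = 0
    for page in trace:
        if page in cache:
            cache.move_to_end(page)  # hit: mark as most recently used
        else:
            faults += 1
            if len(cache) >= frames:
                cache.popitem(last=False)  # evict the least recently used page
            cache[page] = True
    return faults


# Same code, same 4 frames; only the access pattern (the data) differs.
print(lru_faults([0, 1, 2] * 10, frames=4))        # 3: only the initial misses
print(lru_faults([0, 1, 2, 3, 4] * 10, frames=4))  # 50: every access misses
```

A loop over one more page than there are frames is a classic worst case for LRU, which is why the behavior cannot be predicted without knowing properties of the real data.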
Empiricism in CS
(Experimental CS)
• Behavior of software can be studied
experimentally.
• Anecdotal evidence (running a few sample
cases) is insufficient.
• Collect data (e.g., accuracy, run-time) by
running programs many times on large, real-world benchmark collections.
• Verify hypotheses about behavior using
controlled experiments.
• Statistically analyze results for significance.
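A minimal sketch of the data-collection step, assuming a hypothetical benchmark collection of random lists and Python's built-in `sorted` as the program under study: every input is run several times and all run-times are kept, so they can be summarized and later tested for significance rather than judged from a few anecdotal runs.

```python
import random
import statistics
import time


def benchmark(func, inputs, repeats=5):
    """Run func on every input `repeats` times and return all run-times (seconds)."""
    times = []
    for data in inputs:
        for _ in range(repeats):
            start = time.perf_counter()
            func(data)
            times.append(time.perf_counter() - start)
    return times


# Hypothetical stand-in for a benchmark collection: 20 random lists.
rng = random.Random(0)
collection = [[rng.random() for _ in range(10_000)] for _ in range(20)]

times = benchmark(sorted, collection)
print(f"runs={len(times)} mean={statistics.mean(times):.6f}s "
      f"stdev={statistics.stdev(times):.6f}s")
```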
Scientific Method
(steps from Wikipedia)
• 1) Define the question
• 2) Gather information and resources (observe)
• 3) Form hypothesis
• 4) Perform experiment and collect data
• 5) Analyze data
• 6) Interpret data and draw conclusions that
serve as a starting point for new hypothesis
• 7) Publish results
• 8) Retest (frequently done by other scientists)
1) Define the question
• Example from My Research: Search Query
Disambiguation from Short Sessions
– Can a web search engine disambiguate queries?
[Figure: a search box containing the ambiguous query "scrubs"]
2) Gather information and resources
• Obtained web search session data from
Microsoft
• Find instances of ambiguous queries
• Find contextual clues that might help
disambiguate queries
Context can Aid Disambiguation
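• Session 1 (query → clicked result):
– 98.7 fm → www.star987.com
– kroq → www.kroq.com
– scrubs → ??? (scrubs-tv.com)
• Session 2 (query → clicked result):
– huntsville hospital → www.huntsvillehospital.com
– ebay.com → www.ebay.com
– scrubs → ??? (scrubs.com)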
3) Form Hypothesis
• Previous queries and clicks in a session can help
disambiguate queries by relating them to previous
sessions involving the same query (where we
know what result was clicked).
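To make the hypothesis concrete, here is a deliberately simple sketch: each URL previously clicked for the same query gets a vote from every past session that clicked it, weighted by the Jaccard overlap between that session's earlier queries/clicks and the current session's. This similarity-weighted vote is a stand-in of my own for illustration, not the Markov logic network used in this research; the function names and the session log are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two sets of context items (1.0 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def context_click_scores(query, context, past_sessions):
    """Score URLs previously clicked for `query`, weighting each past click
    by how similar that session's context was to the current context."""
    scores = {}
    for past_context, past_query, clicked_url in past_sessions:
        if past_query == query:
            scores[clicked_url] = scores.get(clicked_url, 0.0) + jaccard(context, past_context)
    return scores


# Hypothetical session log in the spirit of the "scrubs" example.
past = [
    ({"kroq", "www.kroq.com"}, "scrubs", "scrubs-tv.com"),
    ({"huntsville hospital", "www.huntsvillehospital.com"}, "scrubs", "scrubs.com"),
]
current = {"98.7 fm", "www.star987.com", "kroq", "www.kroq.com"}
print(context_click_scores("scrubs", current, past))
# {'scrubs-tv.com': 0.5, 'scrubs.com': 0.0}
```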
4) Perform Experiment and Collect Data
• Build a system that uses prior context and
previous session data to predict clicked
results for a new user.
• Reorder results from an existing search engine
based on the predicted probability of clicking
on each result.
– Should reduce the number of results the user needs to
examine before finding a relevant one.
• Test on unseen data and compare
predictions to actual results clicked.
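The reordering step and its intended effect can be sketched as follows, assuming the predicted click probabilities are already available as a dictionary (the URLs, probabilities, and function names here are hypothetical). Python's sort is stable even with `reverse=True`, so results with equal predicted probability keep the engine's original order.

```python
def rerank(results, click_prob):
    """Reorder the engine's results by predicted click probability, highest first."""
    return sorted(results, key=lambda url: click_prob.get(url, 0.0), reverse=True)


def results_examined_before_click(ranking, clicked):
    """Number of results a top-down reader passes over before the clicked one."""
    return ranking.index(clicked)


engine_order = ["scrubs.com", "hospitallink.com", "scrubs-tv.com"]  # hypothetical
predicted = {"scrubs-tv.com": 0.7, "scrubs.com": 0.2, "hospitallink.com": 0.1}  # hypothetical

new_order = rerank(engine_order, predicted)
print(new_order)  # ['scrubs-tv.com', 'scrubs.com', 'hospitallink.com']
print(results_examined_before_click(engine_order, "scrubs-tv.com"),
      results_examined_before_click(new_order, "scrubs-tv.com"))  # 2 0
```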
Using Relational Information with a
Markov Logic Network (MLN)
[Figure: past sessions linked through shared queries (huntsville school, huntsville hospital, scrubs, ebay) and clicked results (huntsvillehospital.org, scrubs.com, hospitallink.com, scrubs-tv.com, ebay.com), with ??? marking the click to be predicted]
Controlled Experiment
• Performance of experimental system must be
compared to some baseline or control.
• Controls are necessary to demonstrate that the
system improves over some naïve method
(strawman) or over the current best system for a problem.
– For example, in the old joke, someone claims that they are
snapping their fingers "to keep the tigers away" and justifies
this behavior by saying "see - it's working!" While this
"experiment" does not falsify the hypothesis "snapping fingers
keeps the tigers away", it does not really support the
hypothesis either: not snapping your fingers keeps the
tigers away just as well (Wikipedia: Experiment).
Control for Query Disambiguation
• A simple control is to order the search
engine's results randomly.
• Another baseline is to just use the ordering
from the existing (non-personalized) search
engine.
Performance Metrics
• Need a quantitative measure of the system’s
performance (runtime or accuracy).
• Compare the quantitative performance of the
experimental system to the baseline control
system.
• To measure the accuracy of an ordering of web
search results, we use AUC-ROC (area under the ROC curve)
– With one relevant result: the percentage of irrelevant
results the user does not see before finding it (scanning
results from the top); with several relevant results, this
is averaged over the relevant results
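A minimal Python sketch of AUC-ROC for one ranked list, assuming 0/1 relevance labels given in rank order (the function name `ranking_auc` is hypothetical). It counts the (relevant, irrelevant) pairs in which the relevant result is ranked higher; a random ordering scores 0.5 in expectation, which is why random ordering is the natural floor for this metric.

```python
def ranking_auc(relevance):
    """AUC-ROC of a ranked list, given 0/1 relevance labels in rank order.

    AUC = fraction of (relevant, irrelevant) pairs in which the relevant
    result is ranked above the irrelevant one.
    """
    n_rel = sum(relevance)
    n_irr = len(relevance) - n_rel
    if n_rel == 0 or n_irr == 0:
        raise ValueError("need at least one relevant and one irrelevant result")
    correctly_ordered = 0
    irrelevant_seen = 0
    for label in relevance:  # scan from the top of the ranking
        if label:
            # every irrelevant result not yet seen is ranked below this one
            correctly_ordered += n_irr - irrelevant_seen
        else:
            irrelevant_seen += 1
    return correctly_ordered / (n_rel * n_irr)


print(ranking_auc([0, 0, 1, 0]))  # about 0.333: 1 of 3 irrelevant results is below the click
print(ranking_auc([1, 0, 0, 0]))  # 1.0: the clicked result is first
```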
5) Analyze Data
• Do results support the hypothesis?
• Are differences statistically significant?
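– Use a statistical test to determine whether observed
differences are unlikely to be due only to
random variation, i.e., the p-value (the probability
of a difference at least this large if the null
hypothesis of no real difference were true) is < 0.05.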
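One common way to run such a test on per-query scores is a paired randomization (sign-flip) test; the deck does not say which test was used, so this is an illustrative choice, and the function name and AUC values below are hypothetical. Under the null hypothesis the two systems are interchangeable, so each per-query difference is equally likely to have either sign.

```python
import random


def paired_permutation_test(a, b, trials=10_000, seed=0):
    """Two-sided paired randomization test; returns an estimated p-value."""
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(trials):
        # Randomly flip the sign of each per-query difference.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) >= observed:
            extreme += 1
    # Add-one smoothing keeps the estimate away from an impossible p = 0.
    return (extreme + 1) / (trials + 1)


# Hypothetical per-query AUC-ROC values for two systems on the same queries.
system = [0.62, 0.55, 0.71, 0.58, 0.66, 0.60, 0.57, 0.69]
baseline = [0.50, 0.52, 0.61, 0.49, 0.55, 0.58, 0.51, 0.60]
p = paired_permutation_test(system, baseline)
print(f"p = {p:.4f}; significant at 0.05: {p < 0.05}")
```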
Results (AUC-ROC)
[Figure: bar chart of AUC-ROC (y-axis 0.46 to 0.58) for Random, ClickSim, ClickKW-Sim, MLN1, MLN2, and MLN3]
* Indicates statistically significant improvement over previous result
6) Interpret data and draw conclusions that
serve as a starting point for new hypothesis
• Is random ordering the best baseline to
compare to?
• What if we just order results by
popularity (i.e., how many people clicked on
a particular result after submitting a given
ambiguous query)?
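The popularity baseline is simple to sketch, assuming a hypothetical click log of (query, clicked URL) pairs: results are ordered by how often past users clicked them after the same query, ignoring session context entirely. That context-free design is exactly what makes it a fair but demanding control for a context-aware system.

```python
from collections import Counter


def popularity_ranking(query, results, click_log):
    """Order results by past click counts for `query` (stable for ties)."""
    counts = Counter(url for q, url in click_log if q == query)
    return sorted(results, key=lambda url: counts[url], reverse=True)


# Hypothetical click log of (query, clicked URL) pairs.
log = [("scrubs", "scrubs.com"), ("scrubs", "scrubs-tv.com"),
       ("scrubs", "scrubs-tv.com"), ("ebay", "ebay.com")]
print(popularity_ranking("scrubs", ["hospitallink.com", "scrubs.com", "scrubs-tv.com"], log))
# ['scrubs-tv.com', 'scrubs.com', 'hospitallink.com']
```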
New Baseline Results
Refine System
• Develop an MLN that incorporates popularity
information.
• Rerun the experiment to obtain results for the
revised version and test the hypothesis
that it performs better than the popularity
baseline.
Results for Revised System
7) Publish Results
• Paper submitted to KDD, the ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining.
– KDD-09: Paris, June 28 – July 1, 2009