CS 178H
Introduction to Computer Science Research
What is CS Research?

What is CS Research?
• Discovery of new knowledge of computing through mathematical analysis and experimental evaluation of algorithms and computer software.

Epistemology (definitions from Wikipedia)
• Epistemology (from Greek επιστήμη - episteme, "knowledge" + λόγος, "logos") or theory of knowledge is the branch of philosophy concerned with the nature and scope (limitations) of knowledge. It addresses the questions:
  – "What is knowledge?"
  – "How is knowledge acquired?"
  – "What do people know?"
  – "How do we know what we know?"

Rationalism
• Rationalism is "any view appealing to reason as a source of knowledge or justification" (Lacey 286). In more technical terms it is a method or a theory "in which the criterion of the truth is not sensory but intellectual and deductive" (Bourke 263).
• Originated with Socrates (469 BC – 399 BC) and Plato (428/427 BC – 348/347 BC).

Empiricism
• Empiricism is a theory of knowledge which asserts that knowledge arises from experience. Empiricism emphasizes the role of experience and evidence, especially sensory perception, in the formation of ideas.
• Originated with Aristotle (384 BC – 322 BC)

Rationalism in CS (Theoretical CS)
• Programs are formal mathematical objects.
• Therefore, important properties of algorithms/software can be proven mathematically.
  – Termination
  – Correctness (satisfies a formal specification)
  – Computational Complexity (time and space requirements)

Theoretical CS Research
• Algorithm Design and Analysis
  – Design a new (more efficient) algorithm for some well-defined problem (e.g. sorting, longest-common-subsequence)
  – Mathematically prove the correctness and improved complexity of the new algorithm.
• Theoretical Analysis
  – Form a mathematical conjecture about a computational problem (e.g. graph isomorphism is NP-complete)
  – Mathematically prove the conjecture as a theorem.
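
As a small illustration of the "Algorithm Design and Analysis" bullet, here is a minimal sketch of the textbook dynamic-programming algorithm for the longest-common-subsequence length. It is not taken from the slides; the function name `lcs_length` and the sample strings are chosen only for illustration. For inputs of lengths m and n it runs in O(mn) time, in contrast to the exponential cost of enumerating all subsequences, and its correctness follows by induction on the prefixes processed.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    # prev[j] is the LCS length of the prefix of a processed so far and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]  # LCS with the empty prefix of b is always 0
        for j, y in enumerate(b, start=1):
            if x == y:
                # Matching symbols extend the LCS of the two shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop the last symbol of one prefix or the other.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


print(lcs_length("AGGTAB", "GXTXAYB"))  # 4 (one such subsequence is "GTAB")
```

Keeping only two rows of the table reduces the space requirement to O(n) without changing the running time.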

Limits of Rationalism in CS
• Sometimes software is too complex to analyze theoretically.
• Sometimes correctness cannot be characterized formally and depends on natural or human behavior.
  – Protein folding
  – Handwriting/speech recognition
• Sometimes software behavior on real data depends on unknown natural properties of this data.
  – Locality affecting paging performance

Empiricism in CS (Experimental CS)
• Behavior of software can be studied experimentally.
• Anecdotal evidence (running a few sample cases) is insufficient.
• Collect data (e.g. accuracy, run-time) by running programs many times on large, real-world benchmark collections.
• Verify hypotheses about behavior using controlled experiments.
• Statistically analyze results for significance.

Scientific Method (steps from Wikipedia)
• 1) Define the question
• 2) Gather information and resources (observe)
• 3) Form hypothesis
• 4) Perform experiment and collect data
• 5) Analyze data
• 6) Interpret data and draw conclusions that serve as a starting point for new hypothesis
• 7) Publish results
• 8) Retest (frequently done by other scientists)

1) Define the question
• Example from My Research: Search Query Disambiguation from Short Sessions
  – Can a web search engine disambiguate queries?
[Figure: a search for the ambiguous query "scrubs"]

2) Gather information and resources
• Obtained web search session data from Microsoft
• Find instances of ambiguous queries
• Find contextual clues that might help disambiguate queries

Context can Aid Disambiguation
[Figure: two example sessions, each query shown with its clicked result]
• Session 1: 98.7 fm (www.star987.com), kroq (www.kroq.com), scrubs (??? scrubs-tv.com)
• Session 2: huntsville hospital (www.huntsvillehospital.com), ebay.com (www.ebay.com), scrubs (??? scrubs.com)

3) Form Hypothesis
• Previous queries and clicks in a session can help disambiguate queries by relating them to previous sessions involving the same query (where we know what result was clicked).
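
The hypothesis above can be made concrete with a deliberately simple sketch. This is not the system described in the slides: the Jaccard-overlap scoring, the function names `jaccard` and `rank_by_context`, and the toy sessions (loosely modeled on the "scrubs" example) are all assumptions made for illustration. It ranks the results that past users clicked for the same query by how similar their earlier queries were to the current user's earlier queries.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard overlap between two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_context(query, context, past_sessions):
    """Rank URLs clicked for `query` in past sessions by context similarity.

    Each past session is a (context_set, query, clicked_url) triple. A URL's
    score is the summed similarity between the current context and the
    contexts of the past sessions in which it was clicked for this query.
    """
    scores = defaultdict(float)
    for past_context, past_query, clicked_url in past_sessions:
        if past_query == query:
            scores[clicked_url] += jaccard(context, past_context)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical toy sessions, illustrative only (not real search-log data).
past_sessions = [
    ({"98.7 fm", "kroq"}, "scrubs", "scrubs-tv.com"),
    ({"huntsville hospital", "ebay.com"}, "scrubs", "scrubs.com"),
]

print(rank_by_context("scrubs", {"huntsville hospital"}, past_sessions))
# ['scrubs.com', 'scrubs-tv.com']
```

A user who previously searched for a hospital gets the medical-clothing site first, while a user whose context was `{"kroq"}` would get the TV-show site first.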

4) Perform Experiment and Collect Data
• Build system that uses prior context and previous session data to predict clicked results for new user.
• Reorder results from existing search engine based on predicted probability of clicking on a result.
  – Should reduce number of results user needs to examine before finding a relevant one.
• Test on unseen data and compare predictions to actual results clicked.

Using Relational Information with a Markov Logic Network (MLN)
[Figure: diagram of related sessions; node labels: huntsville school, huntsville hospital, huntsvillehospital.org, scrubs, scrubs.com, ebay, ebay.com, hospitallink.com, scrubs-tv.com, ???]

Controlled Experiment
• Performance of experimental system must be compared to some baseline or control.
• Controls are necessary to demonstrate the system is improving over some naïve method (strawman) or current best system for a problem.
  – For example, in the old joke, someone claims that they are snapping their fingers "to keep the tigers away" and justifies this behavior by saying "see - it's working!" While this "experiment" does not falsify the hypothesis "snapping fingers keeps the tigers away", it does not really support the hypothesis: not snapping your fingers keeps the tigers away as well (Wikipedia: Experiment)

Control for Query Disambiguation
• Simple control is to order results from search engine randomly.
• Another baseline is to just use ordering from existing (non-personalized) search engine.

Performance Metrics
• Need quantitative measure of system's performance (runtime or accuracy).
• Compare quantitative performance of experimental system to baseline control system.
• To measure accuracy of ordering of web search results we measure AUC-ROC
  – Percentage of irrelevant results not seen by user before finding a relevant result (if scanning results from the top)

5) Analyze Data
• Do results support the hypothesis?
• Are differences statistically significant?
  – Use statistical test to determine if observed differences are unlikely to be due only to random variation, i.e. p-value < .05 (the probability, under the null hypothesis, of observing a difference at least this large).

Results (AUC-ROC)
[Figure: bar chart of AUC-ROC (y-axis from 0.46 to 0.58) for Random, ClickSim, ClickKW-Sim, MLN1, MLN2, MLN3; * indicates statistically significant improvement over previous result]

6) Interpret data and draw conclusions that serve as a starting point for new hypothesis
• Is random ordering the best baseline to compare to?
• What if we just order results based on popularity (i.e. how many people clicked on a particular result after submitting a given ambiguous query)?

New Baseline Results

Refine System
• Develop MLN that incorporates popularity information.
• Rerun experiment to obtain results for revised version and verify the hypothesis that it performs better than the popularity baseline.

Results for Revised System

7) Publish Results
• Paper submitted to the international data mining conference.
  – KDD-09: Paris, June 28 – July 1, 2009
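
The evaluation outlined under "Performance Metrics" and "5) Analyze Data" can be sketched as follows. The slides do not say which statistical test was used, so the exact paired sign-flip (permutation) test here is an assumption, as are the function names and the per-query scores, which are made up for illustration. `single_click_auc` follows the slide's description of AUC-ROC for the case of one clicked result per query: the fraction of irrelevant results ranked below the clicked one.

```python
from itertools import product


def single_click_auc(ranked_urls, clicked_url):
    """Fraction of non-clicked results ranked below the clicked result.

    With exactly one relevant (clicked) result this equals AUC-ROC for the
    ranking: 1.0 if the click is at the top, 0.0 if it is at the bottom.
    """
    rank = ranked_urls.index(clicked_url)  # 0-based position of the click
    irrelevant = len(ranked_urls) - 1
    if irrelevant == 0:
        raise ValueError("need at least one non-clicked result")
    return (irrelevant - rank) / irrelevant


def paired_sign_flip_p_value(system_scores, baseline_scores):
    """Exact two-sided sign-flip test on paired per-query scores.

    Under the null hypothesis the two systems are interchangeable, so each
    per-query difference is equally likely to carry either sign. Enumerates
    all 2**n sign patterns, so it is only practical for small n.
    """
    diffs = [s - b for s, b in zip(system_scores, baseline_scores)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(sign * d for sign, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    return extreme / total


# Hypothetical per-query AUC values for 8 test queries (not real results).
system_auc = [0.75, 1.0, 0.5, 0.75, 1.0, 0.75, 0.5, 1.0]
baseline_auc = [0.5, 0.75, 0.25, 0.5, 0.75, 0.5, 0.25, 0.75]

p = paired_sign_flip_p_value(system_auc, baseline_auc)
print(p, p < 0.05)  # 0.0078125 True
```

Because the made-up system beats the baseline on every one of the 8 queries, only the all-positive and all-negative sign patterns are as extreme as the observed difference, giving p = 2/256. For realistic test-set sizes, the sign patterns would be sampled at random rather than enumerated.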