Toward A Session-Based Search Engine
Smitha Sriram, Xuehua Shen, ChengXiang Zhai
Department of Computer Science, University of Illinois, Urbana-Champaign, U.S.A.

Motivation
• Information retrieval is inherently an interactive process
  – A user's information need is unlikely to be fully satisfied by a single query execution
  – A user often needs to interact with the system several times through query reformulation and document browsing
  – Thus, in general, a query exists in a search session
• A search session provides lots of contextual information for a query that can be exploited (e.g., previous queries and clickthrough data)
• Such contextual information is mostly ignored in existing search engines
• We aim to develop a session-based search engine that can exploit such contextual information to improve retrieval

Traditional vs. Session-based Retrieval
• Example: "IR" can mean either "information retrieval" or "infrared"
• Traditional ("1-query") retrieval
  – Query = "IR applications"
  – The retrieval system searches the document collection and returns D1 (infrared), D2 (infrared), D3 (retrieval), D4 (infrared), D5 (retrieval)
• Session-based retrieval
  – Query = "IR applications"; previous query = "retrieval systems"; …
  – Frequency in viewed docs: infrared: 0, retrieval: 5
  – The retrieval system returns D3 (retrieval), D5 (retrieval)
  – Uses more contextual information and gives more accurate results

Research Issues
• What is an appropriate architecture for supporting session-based retrieval?
  – How should session information be managed?
• How can we detect session boundaries?
• What contextual information should we exploit?
• How can we exploit such contextual information to improve document ranking?
• How can we display search results in the context of a session?
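The "IR applications" example can be reproduced with a toy re-ranker. This is a minimal sketch, not the ACES ranking method: it scores documents by plain term overlap with the query plus any session context, and the five document texts and the `rank` function are hypothetical stand-ins for D1 to D5.

```python
from collections import Counter

# Hypothetical toy collection mirroring the slide's "IR applications" example.
DOCS = {
    "D1": "ir applications of infrared sensors",
    "D2": "infrared ir camera applications",
    "D3": "ir applications in information retrieval systems",
    "D4": "ir infrared spectroscopy applications",
    "D5": "retrieval systems and ir applications",
}


def tokens(text):
    """Lowercase whitespace tokenization."""
    return text.lower().split()


def overlap_score(terms, doc_text):
    """Count occurrences of the given terms in the document."""
    doc_counts = Counter(tokens(doc_text))
    return sum(doc_counts[t] for t in terms)


def rank(query, context=""):
    """Rank documents by overlap with the query plus optional session context."""
    terms = tokens(query) + tokens(context)
    scored = [(overlap_score(terms, text), doc_id) for doc_id, text in DOCS.items()]
    # Highest score first; ties broken by document id so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


print(rank("IR applications"))                       # all five documents tie
print(rank("IR applications", "retrieval systems"))  # D3 and D5 move to the top
```

With the query alone, every document matches "ir" and "applications" once, so the ranking cannot separate the two senses. Adding the previous query as context lifts the two "retrieval" documents to the top, as on the slide.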
A Client-Server Architecture for Session-based IR
[Figure: client-server architecture with a server side and a client side. Components: search engine, docs, session manager, user model, search context, personalized agent, user, and local collection. The query goes to the search engine, which returns the top-N results, and the user sees a ranked result list (1., 2., 3., …).]

Advantages of Server-Side Processing
• Persistent user profiles (useful if a user often uses different machines)
• Access to global user information
  – Can exploit information about all users to identify common access patterns
  – Can exploit information about similar users to help improve performance for any individual user
• Access to all the documents
  – Can perform more powerful statistical analysis (e.g., to identify the most frequently accessed docs)
  – Can improve document representation over time

Advantages of a Client-Side Agent
• Can capture more information about the user, and thus model the user more accurately
  – Can exploit the complete interaction history (e.g., easily capture clickthrough information)
  – Can exploit a user's other activities (e.g., searching immediately after reading an email)
  – Can detect session boundaries more accurately
• More scalable ("distributed personalization")
• Alleviates the privacy problem of personalization

Session Boundary Detection
• Detection is generally easier if done on the client side
  – More information about the user can be exploited
  – E.g., knowing that a "logout" and a "login" happened between two queries
• The server side has access to query co-occurrence patterns, which can help judge query coherence
• Possible clues for session boundary detection
  – Time interval between queries
  – Query coherence (based on word relatedness and/or query log analysis)
  – Activities between two queries

Useful Session Context Information
• Previous queries in the same session
• Documents viewed and not viewed so far in the current session
• Other user activities during the current session
• Context information collected in similar sessions by the current user or other users
• …

Session-based Retrieval Models
• Framework: the risk minimization retrieval framework [Lafferty & Zhai 01, Zhai 02] can be naturally extended to support session-based retrieval
• One possible model (KL-divergence model)
  – Retrieval = estimating a query model + estimating a doc model + computing their KL-divergence
  – Session context information (and any other potentially useful information) can be used to estimate a better (session-based) query model
    $\hat{\theta}_D = \arg\max_{\theta} p(\theta \mid \text{Doc})$
    $\hat{\theta}_Q = \arg\max_{\theta} p(\theta \mid \text{Query}, \text{User}, \text{CurrentSessionContext})$
  – Refinement of this model leads to specific retrieval formulas

Session-based Result Presentation
• Retrieval results can be displayed in the context of the current session
  – Previous search results in the session can be used to show which documents have been consistently moving up in the ranking as the user reformulates the query
  – All the queries in the session can be combined and analyzed to generate a subtopic space for the user's information need, and documents can be organized and displayed in this space
• Session-based result presentation can
  – Help a user digest the search results more effectively and efficiently
  – Help a user quickly focus on the important concepts and topic dimensions
  – Help a user figure out how to formulate a better query

ACES: A Contextual Engine for Search
• Architecture: server-side session management
• Session-boundary detection: a probabilistic measure of query similarity
• Session-based ranking: use the KL-divergence retrieval model and estimate a query model based on
  – The original query
  – The displayed title and summary of viewed documents in the same session
  – Previous queries in the same search session
• Session-based result display: show the rank of each doc with respect to
all the previous queries

ACES System Architecture
[Figure: ACES system architecture. A Web browser exchanges queries, clickthrough data, search results, and document text over the Internet with a Web/application server. Server-side components include the search engine, a profile-capture component, the user profile, a text DB, and an RDBMS.]

Details of the Ranking Algorithm
• Query model updating using past queries q_1, q_2, …, q_k
  $p(w \mid q') = p(w \mid q_1, q_2, \ldots, q_k) = \frac{1-\alpha}{1-\alpha^{k}} \sum_{i=1}^{k} \alpha^{k-i} \frac{c(w, q_i)}{|q_i|}$
• Further query model updating using the displayed title and summary of the viewed documents s_1, s_2, …, s_{k-1}
  $p(w \mid q'') = (1-\beta)\, p(w \mid q') + \beta\, \frac{1-\alpha}{1-\alpha^{k-1}} \sum_{i=1}^{k-1} \alpha^{k-1-i} \frac{c(w, s_i)}{|s_i|}$
• Here c(w, x) is the count of word w in x, and |x| is the length of x
• α is a decay factor that emphasizes the most recent context
• β is a parameter that controls the influence of the clickthrough data
• Currently all parameters are set in an ad hoc way

Demo: Exploiting Previous Queries in ACES
• TREC AP data + Topics 1-150 + relevance judgments
• Allows us to compare traditional search with contextual search

ACES is still far from a full-fledged session-based search engine; much more research remains to be done.

Architecture of Personalized System
[Figure: client-server architecture of the personalized system, with search engine, docs, session manager, user model, search context, personalized agent, profile, and collection. The query goes to the search engine, and the top-N results are returned as a ranked list.]
[Figure: generative model underlying the risk minimization framework, with context C, user U, and source S. Model selection yields the query model θ_Q and the document model θ_D, which generate the query q and the document d, respectively.]
[Figure: variant of the ACES system architecture with a "Context Capturer" component and the AP text DB.]
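The clues on the Session Boundary Detection slide (time interval and query coherence) can be combined into a simple rule. This is a minimal sketch under stated assumptions. ACES itself uses a probabilistic measure of query similarity, and here word-overlap (Jaccard) similarity stands in for it. The thresholds (30 minutes, 5 minutes, 0.1) and the names `QueryEvent`, `is_new_session`, and `split_sessions` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QueryEvent:
    text: str
    timestamp: float  # seconds since an arbitrary epoch


def jaccard(a, b):
    """Word-overlap coherence between two queries, from 0.0 to 1.0."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def is_new_session(prev, curr, max_gap=1800.0, coherence_gap=300.0, min_coherence=0.1):
    """Hypothetical rule: a long gap, or a moderate gap with unrelated wording, starts a new session."""
    gap = curr.timestamp - prev.timestamp
    if gap > max_gap:
        return True  # long idle period
    if gap > coherence_gap and jaccard(prev.text, curr.text) < min_coherence:
        return True  # moderate gap and incoherent queries
    return False


def split_sessions(events):
    """Group a time-ordered query stream into sessions."""
    sessions = []
    for event in events:
        if sessions and not is_new_session(sessions[-1][-1], event):
            sessions[-1].append(event)
        else:
            sessions.append([event])
    return sessions


events = [
    QueryEvent("retrieval systems", 0.0),
    QueryEvent("IR applications", 60.0),
    QueryEvent("cheap flights", 1000.0),
    QueryEvent("flights to chicago", 1100.0),
]
sessions = split_sessions(events)
print([[e.text for e in s] for s in sessions])
```

The first two queries share no words, but they arrive within a minute of each other, so they stay in one session. "cheap flights" arrives after a longer gap and shares no words with "IR applications", so it opens a second session. A client-side agent could add further signals to `is_new_session`, such as a logout or login between the two queries.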
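The KL-divergence model on the Session-based Retrieval Models slide ranks a document by −D(θ_Q ‖ θ_D) = −Σ_w p(w | θ_Q) log(p(w | θ_Q) / p(w | θ_D)). The following is a minimal sketch of that scoring step. The slides do not say how the document model is estimated, so Dirichlet-prior smoothing against a collection model is assumed here. The `mu` values and the small `floor` for words unseen in the whole collection are also illustrative assumptions.

```python
import math
from collections import Counter


def unigram_mle(text):
    """Maximum-likelihood unigram model: p(w) = c(w, text) / |text|."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def dirichlet_doc_model(doc_text, collection_model, mu=2000.0):
    """Dirichlet-smoothed document model (an assumed estimator, not stated on the slides)."""
    counts = Counter(doc_text.lower().split())
    length = sum(counts.values())

    def p(w):
        return (counts[w] + mu * collection_model.get(w, 0.0)) / (length + mu)

    return p


def kl_score(query_model, doc_model, floor=1e-12):
    """Return -D(theta_Q || theta_D); a higher score means a better match."""
    score = 0.0
    for w, p_q in query_model.items():
        if p_q > 0.0:
            # The floor guards against words unseen anywhere in the collection.
            p_d = max(doc_model(w), floor)
            score -= p_q * math.log(p_q / p_d)
    return score


docs = {"D1": "infrared sensor applications", "D2": "information retrieval applications"}
collection = unigram_mle(" ".join(docs.values()))
# A session-based query model in which context has shifted mass toward "retrieval".
query_model = {"retrieval": 0.5, "applications": 0.5}
scores = {d: kl_score(query_model, dirichlet_doc_model(t, collection, mu=10.0))
          for d, t in docs.items()}
print(sorted(scores, key=scores.get, reverse=True))
```

The query-model entropy term is the same for every document, so ranking by this score is equivalent to ranking by the cross-entropy term Σ_w p(w | θ_Q) log p(w | θ_D). All session context enters through `query_model`, which is the point of the framework: the document side stays unchanged.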
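The two query-model updates on the Details of the Ranking Algorithm slide can be sketched as decayed averages of unigram models. This minimal sketch assumes the following form. Each text x contributes its maximum-likelihood model c(w, x)/|x|. Query q_i (oldest first, k queries) gets a weight proportional to α^{k-i}, normalized to sum to 1. The clickthrough model, built the same way from the summaries s_1, …, s_{k-1}, is then mixed in with weight β. The exact placement of β, the defaults α = 0.5 and β = 0.3, and the function names are hypothetical, since the slides say only that the parameters are set ad hoc.

```python
from collections import Counter


def unigram(text):
    """Maximum-likelihood unigram model c(w, x) / |x| (text assumed non-empty)."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def decayed_average(texts, alpha):
    """Mix unigram models of texts (oldest first) with weights proportional to alpha**(n - i)."""
    n = len(texts)
    weights = [alpha ** (n - 1 - j) for j in range(n)]  # most recent text gets alpha**0 = 1
    z = sum(weights)  # equals (1 - alpha**n) / (1 - alpha) when alpha != 1
    model = {}
    for weight, text in zip(weights, texts):
        for w, p in unigram(text).items():
            model[w] = model.get(w, 0.0) + (weight / z) * p
    return model


def update_query_model(queries, summaries, alpha=0.5, beta=0.3):
    """Hypothetical update: p(w|q'') = (1 - beta) p(w|q') + beta * clickthrough model."""
    q_prime = decayed_average(queries, alpha)
    if not summaries:
        return q_prime  # no clickthrough yet: use the query-history model alone
    clicks = decayed_average(summaries, alpha)
    vocab = set(q_prime) | set(clicks)
    return {w: (1 - beta) * q_prime.get(w, 0.0) + beta * clicks.get(w, 0.0) for w in vocab}


history_model = update_query_model(["retrieval systems", "IR applications"], [])
session_model = update_query_model(["retrieval systems", "IR applications"],
                                   ["information retrieval"])
print(sorted(session_model.items(), key=lambda kv: -kv[1]))
```

With α = 0.5 and two queries, the current query "IR applications" gets weight 2/3 and the previous query gets 1/3. The viewed summary then pushes "retrieval" above "ir" in the session model. Normalizing by the explicit sum of weights, rather than by the closed form, keeps the code valid for α = 1 (a plain average).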