A Cognitive Framework for Exploiting Context in Information Retrieval

Birger Larsen
Information Interaction and Information Architecture
Royal School of Library and Information Science
Copenhagen, Denmark
IR Seminar, University of Glasgow, January 25, 2010

Outline
• The idea of polyrepresentation in Information Retrieval
  – cognitive representations associated with users, documents and IR models
• Empirical evidence published to date
  – Similar approaches – not adhering directly to polyrepresentation
  – Results of experiments from polyrepresentative perspective
• On Information Space
  – Combination of databases
  – Combinations of search engines
  – Combinations of document representations
• On Cognitive Space
  – Work task perception and knowledge state inclusion
• Future work

DCS, University of Glasgow Cognitive Framework for Exploiting Context 2

Polyrepresentation
• First presented in 1994 / 1996
  – Originates in Peter Ingwersen’s work on establishing a theory for interactive IR from a cognitive point of view (1992)
  – May be seen as an effort to demonstrate the applicability of this cognitive viewpoint
  – Not a formal mathematical theory, but rather a holistic framework
  – Emphasises the potential benefits of exploiting combinations of representations based on their cognitive origins

Polyrepresentation
Central hypothesis:
The more cognitively or typologically different representations (evidence; features) that point to an information object – and the more intensively they do so – the higher the probability that the object is relevant to the topic, the information need, the situation at hand, or the influencing context of the situation (The Turn, 2005, p. 208)

Polyrepresentation
• Why use polyrepresentation at all today?
  – It's all about context … and how to exploit different contexts
  – It is integration … and might serve as a common framework for integrating various facets of IR and interaction
  – It is oriented towards practical application
• Relatively few studies have so far directly implemented polyrepresentation
  – First presented as a ‘theory’, later as a ‘principle’...

Representations?
• A plethora of different preconditions and interpretations of the current situation:
  – from different cognitive origins – cognitively different
  – from the same origin, but displaying functionally different cognitive types, e.g. TI, AB, full text sections, table captions etc. from one author
• Performed in different styles depending on domain
  – For instance, academic papers vs. blog entries vs. radio news broadcasts

[Figure labels: Documents; Users seeking information; IR models & systems]

Features of Author’s responsibility
• Interpretation by author(s)
  – Full-text terms – Zipfian distributions
  – Particular section terms (e.g. Introduction – XML structures)
  – Title & section title terms
  – Caption terms … Image features
• Situational/domain interpretation by author(s)
  – References & anchor texts (with cited names, journals, titles..)
  – Out-links – with anchor text

[Figure: Polyrepresentative overlaps of cognitively & typologically different representations by one engine in information space, associated with one searcher statement in scholarly documents. Figure labels:
  COGNITIVE OVERLAP
  CITATIONS: in-links to titles, authors & passages
  THESAURUS structure
  SELECTORS: journal name, publication year, database(s), corporate source, country
  AUTHOR(s): text, images, headings, captions, titles, references, out-links
  INDEXERS: class codes, descriptors, document type, weights]

Earlier use of features for IR – not adhering explicitly to polyrepresentation (or any other theory)
• Databases via (relevant) seed documents (Medline+SCI), McCain (1989), Pao (1994)
• Engines (probabilistic+vector space): I3R, Croft & Thompson (1987)
  – overlaps not assessed for relevance (union: to increase recall; intersection: to increase precision)
• Weighting & indexing algorithms with human RF: combinations seem to outperform individual algorithms, Ruthven, Lalmas & van Rijsbergen (2002)
• Different searcher statements: combinations outperform single query formulations, Belkin et al. (1993)

Polyrepresentation lessons
• Some experiences from practical application of polyrepresentation:
  – Skov, Larsen & Ingwersen (2004; 2008)
  – Larsen, Ingwersen & Lund (2009)
  – Kelly et al. (2005; 2007) – information space polyrepresentation
  – White et al. (2006) – on relevance feedback and later
  – Efron (2009) – automatic generation of pseudo relevance assessments

Results of polyrep. experiments 2
• Combinations of query representations (Skov et al., 2004; 2008)
  – Cystic Fibrosis collection (1200 docs., +reference lists, freq.
of citations, graded relevance, 29 topics)
  – Tests of query structure; value-adding by MeSH-terms; use of reference title words+TI+AB+DE
• In total 15 different overlap combinations tested:

[Figure: Results for all 15 overlaps – restricted polyrepresentation]

Skov et al. – applying weights to overlaps (Cumulated Gain values)

Rank            5     10    15    20    25    30
Ideal vector    9.8   18.1  24.8  30.7  35.1  38.7
Run 1           4.5    8.4  11.2  13.6  15.2  17.1
Run 2           5.5   10.2  13.6  16.8  18.5  20.2
Run 3           5.3    9.7  13.4  15.5  18.2  20.0
Run 4           4.9    8.9  12.0  14.3  16.7  18.2
Run 5           5.3    9.6  12.9  15.8  17.7  19.3
Run 6           5.4    9.5  12.6  15.6  17.0  18.7
Bag-of-words    5.9   10.1  13.0  14.9  16.9  18.6

Run 1: No weighting applied
Run 2: Overlap 1, 3, 4, and 5: weight 100
Run 3: Overlap 1, 3, 4, and 5: weight 100; overlap 2, 6, 8, and 10: weight 50
Run 4: Overlap 1: weight 100; overlap 2, 3, 4, and 5: weight 66; overlap 6, 7, 8, 9, 10, and 11: weight 33
Run 5: Overlap 1, 3, 4, and 5: weight 100 + received at least one citation
Run 6: Overlap 1, 3, 4, and 5: weight 100 + received at least three citations

Results of experiments directly adhering to polyrepresentation 3
• Results (Skov et al., 2008):
  – The more cognitively different the representations in overlaps, the higher the precision
  – Combinations with reference title terms outperformed other combinations as well as individual searches
  – Structured queries outperformed unstructured queries over all combinations
  – Re-ranking by citation frequency decreased performance (small numbers though!)
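The restricted-overlap ranking and the cumulated gain measure reported above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the four representation names, the document ids, the relevance grades, and the choice to weight an overlap by the number of representations it contains. It is not the weighting scheme of Skov et al. Four representations give 2^4 − 1 = 15 possible non-empty overlaps; whether these are the same 15 overlaps tested in the study is likewise an assumption.

```python
from itertools import accumulate

# Hypothetical result sets: documents retrieved via each representation.
retrieved = {
    "TI": {"d1", "d2", "d3"},
    "AB": {"d1", "d2", "d4"},
    "DE": {"d1", "d5"},
    "REF": {"d1", "d2", "d6"},
}

def disjoint_overlaps(retrieved):
    """Map each document to the exact set of representations that retrieved it.

    Each document lands in exactly one overlap (restricted polyrepresentation).
    """
    overlaps = {}
    for rep, docs in retrieved.items():
        for doc in docs:
            overlaps.setdefault(doc, set()).add(rep)
    return {doc: frozenset(reps) for doc, reps in overlaps.items()}

def rank_by_overlap(overlaps, weight=len):
    """Rank documents by overlap weight, highest first; ties broken by doc id.

    The default weight (number of representations) is an illustrative choice.
    """
    return sorted(overlaps, key=lambda d: (-weight(overlaps[d]), d))

def cumulated_gain(ranking, relevance):
    """Running sum of graded relevance scores down the ranking."""
    return list(accumulate(relevance.get(d, 0) for d in ranking))

overlaps = disjoint_overlaps(retrieved)
ranking = rank_by_overlap(overlaps)
relevance = {"d1": 2, "d2": 1, "d4": 1}  # made-up graded judgements
cg = cumulated_gain(ranking, relevance)
print(ranking)
print(cg)
```

Passing a different `weight` function (for example a lookup table from overlap to 100, 50 or 0) would mimic per-overlap weights of the kind listed for Runs 2 to 4, again only as a sketch.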

[Figure: Overlap between different IR models (data fusion). Venn diagram of IR model X, IR model Y and IR model Z, with pairwise overlaps xy, xz and yz and the total cognitive overlap xyz]

Two types of Polyrepresentation
Restricted/disjoint: each document appears in only one overlap (by NOT logic): documents in ‘fuse4’ are not in the ‘fuse3’ overlaps.
Relaxed/traditional: documents in ‘fuse4’ are also present in the ‘fuse3’ & ‘fuse2’ overlaps, providing a list of documents that may be ranked by weights according to presence.

Lund et al. – data fusion (30 TREC 5 topics, DCV = 100)
ETH & COR: SMART family
UWG: special IR algorithm
GEN: NLP - machine

Kelly et al. 2005, 2007
• TREC HARD track: 13 searchers contributed 45 topics
• Searchers assessed relevance: off-topic; on-topic/relevant = relevant
• Use of clarification forms
  – Q1: Times in the past searching topic?
  – Q2: Describe what you already know about topic
  – Q3: Why do you want to know about this topic?
  – Q4: Please input any additional keywords that describe your topic.

[Figure: Overlap between different parts of the user’s cognitive structures (Kelly …). Figure labels: Document set A; Request version; Cognitive overlap; Task / Goal Description; from IR model X; Document set B; Precision; Recall]

Kelly et al. 2005, 2007 cont. …
• Lemur toolkit – OKAPI BM25 engine, MAP + T-tests
  – Baseline run: using terms from TREC topic title and description (BL)
  – Experimental runs: BL + pseudo RF; BL + real RF; BL+Q2; BL+Q3 …
• Results: no.
of query terms per source:
  – BL: 9.33; Q2: 16.18; Q3: 10.67; Q4: 2.33 (considerable variation)
  – Pseudo RF lower than baseline (.284), but pseudo50 better than BL
  – All single Q and Q-combinations (weighted union) outperform baseline (Q2+3+4: .368)
  – Direct strong correlation between query length (BL > BL+Q4 > BL+Q3 …) and performance!

Concluding remarks
• Many possible ways of polyrepresentation yet to be tested
• Some indications from experiments demonstrate that the principle works – but:
  – Care must be taken over which cognitively different structures to combine:
    • low-performing engines/actors will reduce performance. Use the best-performing ones combined

Concluding remarks
• Unclear so far how citations (and inlinks) may perform: the time issue
• More robust tests should be performed, including:
  – bigger and more recent data sets
  – graded relevance
  – real searchers
  – non-textual material
  – contextual information (like implicit RF: White)
• Integration of geometric models and polyrepresentation?