ONTOSE 2010 – 7th June 2010 @ CAISE, Hammamet, Tunisia

Query expansion for the legal domain: a case study from the JUMAS project
Fabio Sartori, Matteo Palmonari
DISCo, University of Milan-Bicocca, Italy
Paper appears in: M.-A. Sicilia, Ch. Kop, F. Sartori, ONTOSE 2010, Springer, Lecture Notes in Business Information Processing (LNBIP), Vol. 62

Outline
• JUMAS: Background & Motivation
• Query Expansion Methods
• Platform for Qualitative Evaluation of Methods

Libraries in the Juridical Domain
• The legal domain has interested AI for many years
• Knowledge representation systems for reasoning about the legal domain
• Large and heterogeneous data sources:
  – Documents
  – Video and audio of trials
• Need for effective techniques to manage libraries in the juridical domain
• Semantic processing and search
• The JUMAS project addresses these issues

The JUMAS project
• JUdicial MAnagement by Digital Libraries Semantics
• JUMAS envisages a system for embedded semantic extraction from multimedia data, integrated into an advanced knowledge management system

JUMAS Objectives
• Objectives:
  – Knowledge Models and Spaces: search directly in the audio and video source without a verbatim transcription of the proceedings
  – Knowledge and Content Management: exploit hidden semantics in audiovisual digital libraries in order to facilitate search and retrieval, intelligent processing and effective presentation of multimedia information
  – Sensor and Multimedia Integration: fusion of information from multi-modal sources (audio/video) in order to improve accuracy in the automatic transcription and annotation phases
  – Effective Information Management: streamline and optimise the document workflow, allowing the analysis of (un)structured information for document search and evidence base assessment
  – ICT Infrastructure: Service Oriented Architecture supporting a large-scale
audio-video retrieval system focusing on scalability, interoperability and modularity.

JUMAS Functional Architecture
• Automatic transcription
• Semantic annotation
• Search services, including semantic search

Semantic Search and Query Expansion (QE)
• Problem with search over semantically annotated content: poor recall
• Query expansion (QE) is the process of reformulating a query to improve retrieval performance
  – Evaluates a query (i.e.
the keywords typed into the search form) and expands it to match additional relevant documents, using a domain ontology
• Many different approaches:
  – Difficult to evaluate which is better in a given domain

Specific Goal & Objectives
• Main goal
  – Testing and comparison of existing ontology-based methods for query expansion to be used in the JUMAS project
• Objectives
  – Develop a configurable platform for query expansion that:
    · implements and integrates existing methods
    · can be adopted by domain experts in the context of the JUMAS project

Query Expansion Explained
• A query can be phrase-based (e.g. "knife, trial, murder") or single-keyword (e.g. "knife")
• The expansion E often consists of a set of weighted keywords
• Two main approaches in the literature:
  – Probabilistic QE
  – Ontology-based QE
• Example expansion of "knife": knife 1.0; blade 0.6; melee weapon 0.5; bayonet 0.3; machete 0.3; scalpel 0.3; sword 0.2

Methodology
1. Analysis of proposed approaches
2. Implementation of the platform, including a set of selected methods
3. Testing and qualitative comparison of the methods

Proposed and Selected QE Methods
• There are many classifications according to:
  – Type of method (relevance feedback / similarity function)
  – User involvement (manual / interactive / automatic)
  – Usage of adjacent words (single word / phrase based)
  – Kind of query (short / long, structured / not structured, navigational / informational / transactional)
  – Application domain (documents / tagged resources / ontologies / source code / …)
  – Kind of domain (generic / specific)
• Methods from the literature (a label in brackets marks a method selected for the platform):
  – Voorhees 94 [Voorhees]
  – Tuominen et al. 09
  – Alani et al. 00 [ATJ]
  – Navigli & Velardi 03
  – Hirst & St-Onge 97 [HO]
  – Andreou 05
  – Xhu et al. 06
  – Calegari & Pasi 08
  – plus Wu & Palmer 94 [Ancestor]

Adopted Ontology Model and Web-compliant Representation
• Thesauri as the ontological model (i.e. lexical ontologies)
  – focus on four relations among terms:
    · SYN (Synonymy);
    · NT (Narrower Term);
    · BT (Broader Term);
    · RT (Related Term).
• Representation with SKOS (Simple Knowledge Organization System)
  – W3C recommendation for thesauri definition
  – based on RDF (+/- OWL)

Platform’s functional architecture
• Three main blocks:
  – query expansion testing interface
  – query expansion engine
  – ontology management

[Figure: QE workflow]

Query Expansion Engine in JUMAS
• QE exposed as a Web service

Query Expansion Platform’s GUI
• Queries and results
  – with explanations
[Screenshot callouts: type of relation; retrieved word and distance; word to expand]

Query Expansion Platform’s GUI
• Ontology Management Interface
  – Add/delete terms and relations to/from the thesaurus
  – Export the ontology

Query Expansion Platform’s GUI
• Platform configuration
  – Configuration of parameters for each method (example shown for HO)

Conclusions & Future Work
• Development of
  – an environment for testing and exploring the potential of query expansion through lexical
ontologies (i.e. thesauri)
  – a query expander service that can be invoked as a Web service
• The platform allowed the expert from the court to select the method that they felt best fit their needs (Hirst & St-Onge 97)
• Future work
  – More quantitative experiments
  – Axiomatic ontologies
  – Adaptation of other similarity measures for ontologies (e.g. in the Simpack library)

Thanks!

Thesaurus adopted
• Thesaurus PICO, from www.culturaitalia.it (the website of the Italian Ministry of Cultural Resources Management)
• Why PICO?
  – Developed in SKOS;
  – Complete;
  – Well-known domain.
• Extended by adding:
  – RT relations;
  – Synonyms;
  – Terms.

Query set
• The queries adopted are single words
• They were selected so that each term can be distinguished from the others within the thesaurus. In particular, they differ in:
  – generality;
  – number of NT;
  – number of RT;
  – presence of synonyms;
  – number of parents.

Evaluation Criteria
• Qualitative, not quantitative!
• Based on reasonable search scenarios

[Figure: sample output for a query, showing its expansion by means of Ancestor and by means of Voorhees (variant 1)]

Results
Indications on the better expansion for different scenarios:
• Contextual definition: expansion by means of sibling terms or NT
• Components definition: expansion by means of NT is ideal; it is better to avoid expansion by means of sibling nodes
• Search for general information: RT are very useful; better to avoid NT methods
• Search for specific information: better to avoid sibling-based methods; synonyms are very useful
• Causal relationships among terms: it is ideal to expand by means of children instead of siblings

Results
Indications on general features of QE:
• Importance vs. number of relationships: the importance of the relationships among terms seems to be inversely proportional to their number
• Expansion “Explosion”: some terms, if reached during the expansion, cause the addition of many other terms
which are irrelevant

These results are starting points for new tests!

Future developments
• Platform extension:
  – Addition of new QE methods
  – Addition of new kinds of ontologies
  – Addition of a suitable GUI
• Application to the JUMAS and CRESM-Milano Antica projects
• Analysis and research: looking for new methods, phrase-based methods, and techniques to detect the kind of query and configure the method to apply accordingly

Method 1: Voorhees
Four variants:
1. Expansion only by means of synonyms
2. Expansion by means of synonyms and hyponyms
3. Expansion by means of synonyms, hyponyms and hypernyms
4. Expansion by means of all the adjacent nodes

Method 2: ATJ
Four variants:
• expansion with hyponyms and hypernyms, weighting the relations by type and depth;
• as above, with the addition of RT relations;
• topological constraint on the expansion of RT relations;
• semantic constraint on the expansion of RT relations (not implemented).
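
The four Voorhees variants of Method 1 above can be illustrated with a small Python sketch. It is not the JUMAS implementation: the `THESAURUS` dictionary, the relations assigned to each term and the `voorhees_expand` function are hypothetical, hyponyms and hypernyms are assumed to correspond to NT and BT, and the expansion is assumed to stop at directly adjacent terms.

```python
# Toy thesaurus (hypothetical data): term -> relation -> directly related terms.
THESAURUS = {
    "knife": {
        "SYN": ["blade"],
        "NT": ["bayonet", "machete", "scalpel"],
        "BT": ["melee weapon"],
        "RT": ["sword"],
    },
}

# Relations followed by each variant (assumption: hyponym = NT, hypernym = BT).
VARIANT_RELATIONS = {
    1: ("SYN",),                    # synonyms only
    2: ("SYN", "NT"),               # synonyms and hyponyms
    3: ("SYN", "NT", "BT"),         # synonyms, hyponyms and hypernyms
    4: ("SYN", "NT", "BT", "RT"),   # all adjacent nodes
}


def voorhees_expand(term, variant, thesaurus=THESAURUS):
    """Return the term followed by its one-step expansion for the given variant."""
    entry = thesaurus.get(term, {})
    expansion = [term]
    for relation in VARIANT_RELATIONS[variant]:
        for other in entry.get(relation, []):
            if other not in expansion:  # keep each term once, in discovery order
                expansion.append(other)
    return expansion


if __name__ == "__main__":
    for variant in sorted(VARIANT_RELATIONS):
        print(variant, voorhees_expand("knife", variant))
```

Each variant only widens the set of relations that are followed, so the result of variant n is always a prefix of the result of variant n+1 in this sketch.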

Method 3: HO
• Only the terms reached through specific patterns are added:
  – A BT relation may be preceded only by other BT relations
  – Only one change of direction is allowed, or two if the pattern consists of a series of BT, then a series of RT, then a series of NT
  – The maximum path length is 5

Method 3: HO, patterns
[Figure: allowed path patterns, built from segments labelled "one or more RT", "one or more BT" and "one or more NT"]

Method 4: Ancestor
• Computes the similarity of two terms with a formula that takes into account their distance from the common parent and the depth of the latter:

  $$\mathrm{ConSim}(C_1, C_2) = \frac{2 N_3}{N_1 + N_2 + 2 N_3}$$

  where $N_1$ and $N_2$ are the distances of $C_1$ and $C_2$ from their common parent and $N_3$ is the depth of that parent.
• The terms with the smallest distance from the initial term are added.
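
The Ancestor similarity can be worked through on a small hierarchy. The following is a minimal sketch under stated assumptions, not the platform's code: the `PARENT` map and the function names are hypothetical, every term is assumed to have a single BT parent under one root, and the root is assumed to have depth 1 (counting the root as depth 0 is also possible and would change the values).

```python
# Hypothetical BT hierarchy: child -> parent (single parent per term assumed).
PARENT = {
    "melee weapon": "weapon",
    "knife": "melee weapon",
    "sword": "melee weapon",
    "bayonet": "knife",
}


def path_to_root(term):
    """Return the list [term, parent, ..., root]."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def con_sim(c1, c2):
    """ConSim(C1, C2) = 2*N3 / (N1 + N2 + 2*N3)."""
    path1, path2 = path_to_root(c1), path_to_root(c2)
    # Closest common ancestor: first node on c1's path that is also on c2's path.
    # (Raises StopIteration if the terms share no ancestor; not the case here.)
    common = next(node for node in path1 if node in path2)
    n1 = path1.index(common)         # steps from c1 up to the common ancestor
    n2 = path2.index(common)         # steps from c2 up to the common ancestor
    n3 = len(path_to_root(common))   # depth of the ancestor, root = 1 (assumption)
    return 2 * n3 / (n1 + n2 + 2 * n3)


if __name__ == "__main__":
    for other in ("knife", "sword", "bayonet", "weapon"):
        print(other, round(con_sim("knife", other), 3))
```

For instance, "knife" and "sword" share the parent "melee weapon" at depth 2, so `con_sim("knife", "sword")` evaluates 2·2 / (1 + 1 + 2·2) = 4/6. Ranking candidate terms by decreasing similarity then gives the order in which they would be added to the expansion.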