Semantic Web Best Practices and Deployment
SKOS
Ecoterm 2006
Alistair Miles
CCLRC Rutherford Appleton Laboratory

Reminder: what is it?
• Simple Knowledge Organisation System
• Formal language for representing controlled structured vocabularies (thesauri, classification schemes, … ?)
• Subject metadata & information retrieval …
  – ‘this document is about romantic love’.
  – ‘this document is about the cure of tuberculosis by X-ray in India in the 1950s’.
• Application of RDF

Alistair Miles, Ecoterm 2006, slide 2 http://www.w3.org/2004/02/skos

Since Ecoterm 2005 …
• SKOS Core Guide & SKOS Core Vocabulary Specification …
  – First Working Draft May 2005
  – Second Working Draft October 2005
• Minor changes
• Quick Guide to Publishing a Thesaurus on the Semantic Web …
  – First Working Draft May 2005

What comes next … ?
• Life after SWBPD-WG … ?
• Plans for next phase of W3C Semantic Web Activity …
• New WG?
• SKOS W3C Recommendation by end 2007?
• N.B. Not yet approved!

If Rec then …
• What is the scope? What is the fundamental design goal?
• First part of SKOS Rec would be requirements specification.
• Between now and Sept/Oct 2006 … define scope and requirements.

What I’d like to do here …
• Talk about some of the assumptions behind SKOS.
• Sketch some ideas on how to define scope and requirements for SKOS.
• Get your feedback.
[email protected]
“SKOS: Requirements for Standardization”
isegserv.itd.rl.ac.uk/public/skos/press/dc2006/paper.pdf

Brief history of scope …
• 2003-04: SWAD-Europe
  – ISO 2788 thesauri
  – “Non-standard” thesauri via extensibility e.g. GeMET
  – Classification scheme (PACS)
  – Multilingual thesauri
  – Semantic mapping
• 2004: W3C Glossaries
• 2005: Discussion re “terminologies”
• Subject headings? Gazetteers? Folksonomies? Taxonomies?

Assumptions: purpose …
• Formal representation of controlled structured vocabularies intended for use in information retrieval applications.

Assumptions: workflow …
a) Build a vocabulary
b) Build an index
c) Retrieve

Assumptions: components …
• Vocabulary Development Application
  – Something to help build a vocabulary
• Indexing Application
  – Something to help build an index
• Retrieval Application
  – Something to help retrieve things
• SKOS ultimately designed to support interoperation of these three “key components”.

Proposed scope …
• SKOS is a formal language for representing controlled structured vocabularies intended for use within information retrieval applications.
• SKOS is required to support the interoperation of these three key components.
• I.e. define the requirements for SKOS by describing a set of functionalities that must be enabled.

Other components …
• Vocabulary mapping … ?
• Metadata registries … ?
• … ?

Component specs …
• … first discuss social and technological context, then return to component specs …

Context …
• What is the social and technological context in which controlled structured vocabs are used?
• Assume two basic needs …
  – Locate something I already know about.
  – Discover something new.
• N.B. a good location service is not necessarily a good discovery service.
  – Cf. Google and del.icio.us

Strategies …
• Basic strategies for implementing retrieval services …
  1. Statistical text analysis
  2. Analysis of user behaviour
  3. Index with controlled vocab
• Other strategies …
  1. … KOS-assisted text analysis?

Cost problem …
• Given that applying controlled structured vocab for retrieval involves significant initial and ongoing investment …
• Given that other strategies are cheaper …
• Huge pressure to drive down cost and increase utility.
• Requirement for seamless integration.
  – I.e. controlled vocab is seldom used in isolation; most applications will combine strategies.

Use case …
• Search portal …
• Use combined strategies.

Component specs …
• Important factors …
• Minimise cost.
  – Decentralisation.
  – Assistance.
• Maximise “utility”.
  – Query expansion.
  – Smart ranking.
  – Maximise lifetime.
• Use the Semantic Web!
  – Situation A. search across many collections, where indexers use same controlled vocab.
  – Situation B. search across many collections, where indexes use different controlled vocabs.

Focus areas …
• Decentralisation requires different models of collaboration and change.
• Representing change is a key factor in keeping a vocab applicable.
• Ranking and scoring well understood for text, less so for controlled index.
• Theory of query expansion? Field trials of query expansion?
• Strategies for providing assistance?
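
The "query expansion" and "smart ranking" factors listed above can be made concrete with a small sketch. This is only an illustration under stated assumptions: the toy vocabulary, the document ids, and the functions `expand_query` and `search` are hypothetical and are not taken from SKOS or from this talk. The hierarchy is modelled as a plain dict of narrower-concept links, in the spirit of a thesaurus broader/narrower relation.

```python
from collections import deque

# Hypothetical toy vocabulary: concept -> its narrower concepts.
# All names here are illustrative assumptions.
NARROWER = {
    "animals": ["mammals", "birds"],
    "mammals": ["cats", "dogs"],
    "birds": [],
    "cats": [],
    "dogs": [],
}

# Hypothetical index: document id -> concepts assigned by an indexer.
INDEX = {
    "doc1": {"cats"},
    "doc2": {"birds"},
    "doc3": {"mammals"},
    "doc4": {"economics"},
}


def expand_query(concept, narrower, max_depth=None):
    """Return {concept: depth} for the query concept and its narrower
    descendants, found breadth-first. Depth 0 is the query concept itself."""
    depths = {concept: 0}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        if max_depth is not None and depths[current] >= max_depth:
            continue  # do not expand below the requested depth
        for child in narrower.get(current, []):
            if child not in depths:
                depths[child] = depths[current] + 1
                queue.append(child)
    return depths


def search(concept, narrower, index, max_depth=None):
    """Return matching document ids, closest matches first.
    A document's score is the smallest depth among its matching concepts;
    ties are broken by document id."""
    depths = expand_query(concept, narrower, max_depth)
    hits = []
    for doc_id, concepts in index.items():
        matched = [depths[c] for c in concepts if c in depths]
        if matched:
            hits.append((min(matched), doc_id))
    return [doc_id for _, doc_id in sorted(hits)]
```

Calling `search("mammals", NARROWER, INDEX)` also finds documents indexed with narrower concepts such as "cats", while `max_depth=0` switches expansion off. Ranking by hierarchy distance is just one possible choice; the slides leave the question of ranking for a controlled index open.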

Change and collaboration
• Continuum of collaboration models: centralised <-> decentralised
• Continuum of change management models: continuous <-> discrete
• Decentralisation can reduce cost of development and maintenance
• Change management can ensure continued utility – maximise ROI
• Support for declarative representation of change is a requirement for SKOS.

Semantic Web architecture …
• Exploit Semantic Web facility to distribute and merge data.
• However, best practices for publishing data on the Semantic Web still need work.
• See “Best Practice Recipes for Publishing RDF Vocabularies” W3C Working Draft (Google “publishing RDF”).

Semantic Web architecture

Direct interaction …

Information retrieval …
• Indexing and query evaluation well understood for text content.
• Less well understood for controlled metadata.
• Query types?
• Query evaluation strategies, e.g. query expansion?
• Ranking?

Assistance for indexers …
• Provide suggestions
  – Comparison of labels and annotations
  – Machine learning
  – Exploit lexical resources
  – … ?

Assistance for mappers …
• Provide suggestions …
  – Analysis of labels and annotations
  – Exploit lexical resources
  – … ?

Summary
• SKOS: fundamental requirement to support information retrieval using controlled structured vocabularies.
• Define requirements by describing information retrieval functionalities.
• Divide functionalities into:
  – Presentation styles
  – Query types e.g. compound queries, coordination …
  – Query evaluation strategies
• Assumptions:
  – Key components
  – Semantic Web interaction
  – Context – pressure to make vocabularies “profitable” …
  – Issues: change, assistance, theory …
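
The three key components assumed in this talk (build a vocabulary, build an index, retrieve) and the idea of a compound, coordinated query can be sketched in a few lines. This is a minimal illustration, not a SKOS implementation: the concept ids, the labels, the document ids, and the functions `build_label_lookup`, `build_index`, and `retrieve` are all hypothetical. Preferred and alternative labels are modelled as plain strings, and coordination is taken to mean that a document must be indexed with every concept named in the query.

```python
# Hypothetical mini-vocabulary: concept id -> preferred and alternative labels.
# Ids and labels are illustrative assumptions only.
VOCAB = {
    "c1": {"pref": "tuberculosis", "alt": ["TB"]},
    "c2": {"pref": "radiotherapy", "alt": ["x-ray therapy"]},
    "c3": {"pref": "India", "alt": []},
}


def build_label_lookup(vocab):
    """Vocabulary component: map every label (case-insensitive) to its concept id."""
    lookup = {}
    for concept_id, labels in vocab.items():
        for label in [labels["pref"], *labels["alt"]]:
            lookup[label.lower()] = concept_id
    return lookup


def build_index(assignments):
    """Indexing component: invert document -> concepts into concept -> documents."""
    index = {}
    for doc_id, concept_ids in assignments.items():
        for concept_id in concept_ids:
            index.setdefault(concept_id, set()).add(doc_id)
    return index


def retrieve(labels, lookup, index):
    """Retrieval component: coordinate (AND) all query labels.
    A label that is not in the vocabulary matches no documents."""
    result = None
    for label in labels:
        concept_id = lookup.get(label.lower())
        docs = index.get(concept_id, set()) if concept_id else set()
        result = docs if result is None else result & docs
    return sorted(result or [])


LOOKUP = build_label_lookup(VOCAB)
INDEX = build_index({
    "doc1": ["c1", "c2", "c3"],
    "doc2": ["c1"],
    "doc3": ["c2", "c3"],
})
```

Here `retrieve(["TB", "India"], LOOKUP, INDEX)` resolves the alternative label "TB" to the same concept as "tuberculosis" and returns only documents indexed with both concepts. Because the vocabulary, the index, and the retrieval step share nothing but concept ids, each could in principle be a separate application, which is the kind of interoperation the proposed scope asks SKOS to support.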