Maximising Utility and Minimising Cost - SKOS and

Rutherford Appleton Laboratory
Semantic Web Best Practices and Deployment
SKOS
Ecoterm 2006
Alistair Miles
CCLRC Rutherford Appleton Laboratory
Reminder: what is it?
• Simple Knowledge Organisation System
• Formal language for representing
controlled structured vocabularies
(thesauri, classification schemes, … ?)
• Subject metadata & information
retrieval …
– ‘this document is about romantic love’.
– ‘this document is about the cure of tuberculosis by x-ray in India in the 1950s’.
• Application of RDF
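To make "application of RDF" concrete, here is a minimal sketch of the 'romantic love' statement written as RDF triples and serialised as N-Triples. The SKOS Core and Dublin Core namespaces are real; the example.org URIs, the broader concept and the use of dc:subject to link the document to the concept are assumptions made for illustration only.

```python
# Real namespaces: SKOS Core, RDF and Dublin Core elements.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC_SUBJECT = "http://purl.org/dc/elements/1.1/subject"

# Hypothetical URIs, invented for this illustration.
LOVE = "http://example.org/thesaurus/romantic-love"
EMOTION = "http://example.org/thesaurus/emotion"
DOC = "http://example.org/docs/42"


def uri(u):
    """Wrap a URI in N-Triples angle brackets."""
    return f"<{u}>"


def lit(text, lang):
    """Language-tagged literal (assumes the text needs no escaping)."""
    return f'"{text}"@{lang}'


triples = [
    # The concept itself, with a preferred label and a broader concept.
    (uri(LOVE), uri(RDF_TYPE), uri(SKOS + "Concept")),
    (uri(LOVE), uri(SKOS + "prefLabel"), lit("romantic love", "en")),
    (uri(LOVE), uri(SKOS + "broader"), uri(EMOTION)),
    # Subject metadata: 'this document is about romantic love'.
    (uri(DOC), uri(DC_SUBJECT), uri(LOVE)),
]


def to_ntriples(statements):
    """Serialise (subject, predicate, object) tuples, one per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in statements)


print(to_ntriples(triples))
```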
Alistair Miles, Ecoterm 2006, slide 2
http://www.w3.org/2004/02/skos
Since Ecoterm 2005 …
• SKOS Core Guide & SKOS Core
Vocabulary Specification …
– First Working Draft May 2005
– Second Working Draft October 2005
• Minor changes
• Quick Guide to Publishing a
Thesaurus on the Semantic Web …
– First Working Draft May 2005
Alistair Miles, Ecoterm 2006, slide 3
What comes next … ?
• Life after SWBPD-WG … ?
• Plans for next phase of W3C
Semantic Web Activity …
• New WG?
• SKOS W3C Recommendation by end
2007?
• N.B. Not yet approved!
Alistair Miles, Ecoterm 2006, slide 4
If Rec then …
• What is the scope? What is the
fundamental design goal?
• First part of SKOS Rec would be
requirements specification.
• Between now and Sept/Oct 2006 …
define scope and requirements.
Alistair Miles, Ecoterm 2006, slide 5
What I’d like to do here …
• Talk about some of the assumptions
behind SKOS.
• Sketch some ideas on how to define
scope and requirements for SKOS.
• Get your feedback.
“SKOS: Requirements for Standardization”
isegserv.itd.rl.ac.uk/public/skos/press/dc2006/paper.pdf
Alistair Miles, Ecoterm 2006, slide 6
Brief history of scope …
• 2003-04: SWAD-Europe
– ISO 2788 thesauri
– “Non-standard” thesauri via extensibility, e.g. GEMET
– Classification scheme (PACS)
– Multilingual thesauri
– Semantic mapping
• 2004: W3C Glossaries
• 2005: Discussion re “terminologies”
• Subject headings? Gazetteers?
Folksonomies? Taxonomies?
Alistair Miles, Ecoterm 2006, slide 7
Assumptions: purpose …
• Formal representation of controlled
structured vocabularies intended for
use in information retrieval
applications.
Alistair Miles, Ecoterm 2006, slide 8
Assumptions: workflow …
a) Build a vocabulary
b) Build an index
c) Retrieve
Alistair Miles, Ecoterm 2006, slide 9
Assumptions: components …
• Vocabulary Development Application
– Something to help build a vocabulary
• Indexing Application
– Something to help build an index
• Retrieval Application
– Something to help retrieve things
• SKOS ultimately designed to support
interoperation of these three “key
components”.
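The three components and the workflow they support (build a vocabulary, build an index, retrieve) can be sketched in a few lines. This is only an illustration under stated assumptions: the concept identifiers, labels and document names are invented, and real components would exchange SKOS data rather than share Python dictionaries.

```python
# a) Vocabulary development: concept id -> preferred label (hypothetical data).
vocabulary = {
    "c1": "tuberculosis",
    "c2": "x-ray therapy",
    "c3": "India",
}

# b) Indexing: document id -> set of concept ids chosen by an indexer.
index = {}


def assign(doc_id, concept_id):
    """Index a document with a concept; only controlled terms are allowed."""
    if concept_id not in vocabulary:
        raise KeyError(f"{concept_id!r} is not in the controlled vocabulary")
    index.setdefault(doc_id, set()).add(concept_id)


# c) Retrieval: find every document indexed with a given concept.
def retrieve(concept_id):
    return sorted(doc for doc, concepts in index.items() if concept_id in concepts)


assign("doc1", "c1")
assign("doc1", "c2")
assign("doc2", "c1")
assign("doc2", "c3")
print(retrieve("c1"))
```

The point of the sketch is that all three steps depend on the same concept identifiers, which is why a shared representation of the vocabulary matters for interoperation.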
Alistair Miles, Ecoterm 2006, slide 10
Proposed scope …
• SKOS is a formal language for
representing controlled structured
vocabularies intended for use within
information retrieval applications.
• SKOS is required to support the
interoperation of these three key
components.
• I.e. define the requirements for SKOS by
describing a set of functionalities that
must be enabled.
Alistair Miles, Ecoterm 2006, slide 11
Other components …
• Vocabulary mapping … ?
• Metadata registries … ?
• … ?
Alistair Miles, Ecoterm 2006, slide 12
Component specs …
• … first discuss social and
technological context, then return to
component specs …
Alistair Miles, Ecoterm 2006, slide 13
Context …
• What is the social and technological
context in which controlled
structured vocabs are used?
• Assume two basic needs…
– Locate something I already know about.
– Discover something new.
• N.B. a good location service is not
necessarily a good discovery
service.
– Cf. Google and del.icio.us
Alistair Miles, Ecoterm 2006, slide 14
Strategies …
• Basic strategies for implementing
retrieval services …
1. Statistical text analysis
2. Analysis of user behaviour
3. Index with controlled vocab
• Other strategies …
1. … KOS-assisted text analysis?
Alistair Miles, Ecoterm 2006, slide 15
Cost problem …
• Given that applying controlled structured
vocab for retrieval involves significant
initial and ongoing investment…
• Given that other strategies are cheaper…
• Huge pressure to drive down cost and
increase utility.
• Requirement for seamless integration.
– I.e. controlled vocab is seldom used in isolation, most
applications will combine strategies.
Alistair Miles, Ecoterm 2006, slide 16
Use case …
• Search portal …
• Use combined strategies.
Alistair Miles, Ecoterm 2006, slide 17
Component specs …
• Important factors …
• Minimise cost.
– Decentralisation.
– Assistance.
• Maximise “utility”.
– Query expansion.
– Smart ranking.
– Maximise lifetime.
• Use the Semantic Web!
– Situation A. search across many collections, where
indexers use same controlled vocab.
– Situation B. search across many collections, where
indexers use different controlled vocabs.
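Query expansion, listed above under utility, might look like the following sketch for Situation A, where every collection is indexed with the same vocabulary. The hierarchy, the index contents and the choice to expand along narrower links only are assumptions made for this example, not something the slides prescribe.

```python
# Hypothetical hierarchy: concept -> directly narrower concepts.
narrower = {
    "animals": ["mammals", "birds"],
    "mammals": ["cats", "dogs"],
}

# Hypothetical index: document -> concepts assigned by indexers.
index = {
    "doc1": {"cats"},
    "doc2": {"birds"},
    "doc3": {"minerals"},
}


def expand(concept):
    """Return the concept plus everything reachable via narrower links."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current in seen:  # guards against cycles in the vocabulary
            continue
        seen.add(current)
        stack.extend(narrower.get(current, []))
    return seen


def search(concept):
    """Retrieve documents indexed with the concept or any narrower one."""
    wanted = expand(concept)
    return sorted(doc for doc, concepts in index.items() if concepts & wanted)


print(search("animals"))
```

Without expansion, a search for "animals" would find nothing in this index, since no document is indexed with that exact concept.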
Alistair Miles, Ecoterm 2006, slide 18
Focus areas …
• Decentralisation requires different
models of collaboration and change.
• Representing change is a key factor in
keeping a vocab applicable.
• Ranking and scoring well understood
for text, less so for controlled index.
• Theory of query expansion? Field
trials of query expansion?
• Strategies for providing assistance?
Alistair Miles, Ecoterm 2006, slide 19
Change and collaboration
• Continuum of collaboration models:
centralised <-> decentralised
• Continuum of change management
models: continuous <-> discrete
• Decentralisation can reduce cost of
development and maintenance
• Change management can ensure
continued utility – maximise ROI
• Support for declarative representation of
change a requirement for SKOS.
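One possible reading of "declarative representation of change" is a change set: the statements removed and the statements added between two versions of a vocabulary. The sketch below assumes that reading; the versions, the shorthand property names and the change-set structure are all invented for illustration and are not part of SKOS.

```python
# Two hypothetical versions of a vocabulary, as sets of statements.
version_1 = {
    ("love", "prefLabel", "love"),
    ("love", "broader", "emotion"),
}
version_2 = {
    ("love", "prefLabel", "romantic love"),
    ("love", "altLabel", "love"),
    ("love", "broader", "emotion"),
}


def diff(old, new):
    """Describe a change as the statements removed and added."""
    return {"removed": old - new, "added": new - old}


def apply_change(old, change):
    """Replay a change set against an older version."""
    return (old - change["removed"]) | change["added"]


change = diff(version_1, version_2)
print(sorted(change["removed"]))
print(sorted(change["added"]))
```

Because the change is data rather than an edit log, an indexing or retrieval application holding version 1 can be brought up to date by applying the change set alone.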
Alistair Miles, Ecoterm 2006, slide 20
Semantic Web architecture…
• Exploit Semantic Web facility to
distribute and merge data.
• However, best practices for publishing
data on the Semantic Web still need
work.
• See “Best Practice Recipes for
Publishing RDF Vocabularies” W3C
Working Draft (Google “publishing
RDF”).
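The "distribute and merge" facility can be illustrated with plain sets of triples. This sketch assumes every node is named by a URI (no blank nodes), in which case merging two RDF graphs amounts to a set union; the two sources and their contents are hypothetical, and the prefixed names are shorthand for full URIs.

```python
# Statements about the same concept published by two hypothetical sources.
thesaurus_graph = {
    ("ex:love", "skos:prefLabel", "romantic love"),
    ("ex:love", "skos:broader", "ex:emotion"),
}
library_graph = {
    ("lib:doc42", "dc:subject", "ex:love"),
    ("ex:love", "skos:prefLabel", "romantic love"),  # duplicate statement
}

# With URI-named nodes only, merging is a set union; duplicates collapse.
merged = thesaurus_graph | library_graph


def about(graph, resource):
    """All statements in which the resource is subject or object."""
    return sorted(t for t in graph if resource in (t[0], t[2]))


for statement in about(merged, "ex:love"):
    print(statement)
```

The merge works only because both sources use the same URI for the concept, which is the practical reason publication practices matter.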
Alistair Miles, Ecoterm 2006, slide 21
Semantic Web architecture
Alistair Miles, Ecoterm 2006, slide 22
Direct interaction …
Alistair Miles, Ecoterm 2006, slide 23
Information retrieval…
• Indexing and query evaluation well
understood for text content.
• Less well understood for controlled
metadata.
• Query types?
• Query evaluation strategies, e.g.
query expansion?
• Ranking?
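As one illustration of what ranking over a controlled index could mean, the sketch below scores a document by how close its concepts sit to the query concept in the hierarchy. The scoring rule (1.0 for an exact match, 1/(steps + 1) for narrower concepts) is an assumption invented for this example, not an established method, and the data is hypothetical.

```python
from collections import deque

# Hypothetical hierarchy and index.
narrower = {"animals": ["mammals"], "mammals": ["cats"]}
index = {
    "doc1": {"cats"},
    "doc2": {"animals"},
    "doc3": {"mammals"},
    "doc4": {"minerals"},
}


def concept_scores(concept):
    """Breadth-first walk: score 1.0 for the query concept itself,
    1/(steps + 1) for a concept that many narrower steps away."""
    scores = {concept: 1.0}
    queue = deque([(concept, 0)])
    while queue:
        current, depth = queue.popleft()
        for child in narrower.get(current, []):
            if child not in scores:  # keep the shortest path only
                scores[child] = 1.0 / (depth + 2)
                queue.append((child, depth + 1))
    return scores


def ranked_search(concept):
    """Return (document, score) pairs, best score first."""
    scores = concept_scores(concept)
    hits = []
    for doc, concepts in index.items():
        best = max((scores[c] for c in concepts if c in scores), default=0.0)
        if best > 0:
            hits.append((doc, best))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


print(ranked_search("animals"))
```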
Alistair Miles, Ecoterm 2006, slide 24
Assistance for indexers …
• Provide suggestions
– Comparison of labels and annotations
– Machine learning
– Exploit lexical resources
– … ?
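The first suggestion strategy, comparison of labels, could be as simple as the sketch below: propose every concept whose label occurs in the text to be indexed. The concept identifiers, the labels and the whole-word matching rule are assumptions for illustration; a real assistant would also need stemming, disambiguation and ranking.

```python
import re

# Hypothetical vocabulary: concept id -> preferred and alternative labels.
labels = {
    "c1": ["tuberculosis", "TB"],
    "c2": ["x-ray", "radiography"],
    "c3": ["India"],
}


def suggest(text):
    """Suggest concepts whose labels appear as whole words in the text."""
    lowered = text.lower()
    suggestions = []
    for concept_id, concept_labels in labels.items():
        for label in concept_labels:
            pattern = r"\b" + re.escape(label.lower()) + r"\b"
            if re.search(pattern, lowered):
                suggestions.append(concept_id)
                break  # one matching label is enough for this concept
    return sorted(suggestions)


print(suggest("Treating TB with x-ray therapy in 1950s India"))
```

The indexer stays in control: the output is a list of candidates to accept or reject, not an automatic index.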
Alistair Miles, Ecoterm 2006, slide 25
Assistance for mappers …
• Provide suggestions …
– Analysis of labels and annotations
– Exploit lexical resources
– … ?
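For mapping between vocabularies, analysis of labels might start from simple string similarity. The sketch below uses Python's standard difflib.SequenceMatcher to propose candidate pairs; the two vocabularies and the 0.6 threshold are assumptions chosen for this example, and the output is meant as suggestions for a human mapper, not as finished mappings.

```python
from difflib import SequenceMatcher

# Two hypothetical vocabularies: concept id -> preferred label.
vocab_a = {"a1": "Tuberculosis", "a2": "Radiotherapy"}
vocab_b = {"b1": "tuberculosis (TB)", "b2": "Radiation therapy", "b3": "Geology"}


def similarity(label_x, label_y):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, label_x.lower(), label_y.lower()).ratio()


def suggest_mappings(source, target, threshold=0.6):
    """Return (source id, target id, score) for pairs above the threshold."""
    candidates = []
    for source_id, source_label in source.items():
        for target_id, target_label in target.items():
            score = similarity(source_label, target_label)
            if score >= threshold:
                candidates.append((source_id, target_id, round(score, 2)))
    return sorted(candidates, key=lambda c: -c[2])


for candidate in suggest_mappings(vocab_a, vocab_b):
    print(candidate)
```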
Alistair Miles, Ecoterm 2006, slide 26
Summary
• SKOS: fundamental requirement to
support information retrieval using
controlled structured vocabularies.
• Define requirements by describing
information retrieval functionalities.
• Divide functionalities into:
– Presentation styles
– Query types e.g. compound queries, coordination …
– Query evaluation strategies
• Assumptions:
– Key components
– Semantic Web interaction
– Context – pressure to make vocabularies “profitable”
– … Issues: change, assistance, theory …
Alistair Miles, Ecoterm 2006, slide 27