Introductory Review of
Current
Knowledge Organization
Systems/Structures/Services
(KOS)
Marcia Lei Zeng
Second International Seminar on Subject
Access to Information, Helsinki,
Finland, 29-30 November 2007
Purpose of this talk
• Introduce different types of
knowledge organization
systems/structures/services
(KOS)
• Provide a common
terminology and background
M.L.Zeng @ ISSAI, Helsinki,2007
1. KOS overview (1)
Knowledge organization
systems/structures/services
(KOS) encompass all types of
schemes for organizing
information and promoting
knowledge management.
– (Gail Hodge, 2000)
1. KOS overview (2)
These systems
• model the underlying semantic
structure of a domain, and
• provide semantics, navigation, and
translation through labels,
definitions, typing, relationships,
and properties for concepts.
– (Hill et al. 2002, Koch and Tudhope 2004).
A Taxonomy of KOS
(Figure: KOS types arranged along two axes, Structure and Function)
Relationship Models:
– Ontologies
– Semantic networks
– Thesauri
Classification & Categorization:
– Classification schemes
– Taxonomies
– Categorization schemes
– Subject Headings
Metadata-like Models:
– Gazetteers
– Directories
– Authority Files
Term Lists:
– Synonym Rings
– Glossaries/Dictionaries
– Pick lists
2. Fundamentals of KOS Approaches
• 2.1 Eliminating ambiguity
• 2.2 Controlling synonyms or
equivalents
• 2.3 Making explicit semantic
relationships
– Hierarchical relationships
– Hierarchical + other associative
relationships
• 2.4 Presenting relationships as well
as properties of concepts
2.1 Eliminating ambiguity
• Ambiguity: terms having the
same spelling (homographs)
that represent different
concepts or meanings
• Ambiguity exists when a given
term can be used to represent
completely different concepts.
Ambiguity / Homographs
Source: Z39.19-2005, p.25
To eliminate ambiguity (1)
1. Adding a qualifier to a term
-- one of the major methods used
by almost every type of KOS,
especially lists of subject
headings and thesauri.
• e.g., Mercury (automobile)
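As a rough illustration of how qualifiers separate homographs, the sketch below maps one ambiguous term to its qualified forms. Only "Mercury (automobile)" comes from the slide; the other two qualified terms and the function name `disambiguate` are assumptions made up for this example.

```python
from typing import Dict, List

# Hypothetical mini-vocabulary. "Mercury (automobile)" is the slide's example;
# the metal and planet entries are assumed here for illustration only.
QUALIFIED_TERMS: Dict[str, List[str]] = {
    "mercury": [
        "Mercury (automobile)",
        "Mercury (metal)",
        "Mercury (planet)",
    ],
}


def disambiguate(term: str) -> List[str]:
    """Return the qualified forms of a homograph, or the term itself."""
    return QUALIFIED_TERMS.get(term.lower(), [term])
```

A term without an entry is returned unchanged, on the assumption that it is not a homograph in this vocabulary.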
To eliminate ambiguity (2)
2. Providing a scope note
-- another major method used by
almost every type of KOS,
especially lists of subject
headings, classifications, and
thesauri.
Screenshot from MeSH
http://www.nlm.nih.gov/mesh/MBrowser.html
Entry: mercury
To eliminate ambiguity (3)
3. Providing a context for a term
What are these?
• Flying Horse
• King Fisher
• Royal Challenge
• Heineken
• Budweiser
• Miller-Lite
• Bud-Light
Drinks
• Flying Horse
• King Fisher
• Royal Challenge
• Taj Mahal
• Hayward’s 2000
• Heineken
• Corona
• Budweiser
• Miller-Lite
• Bud-Light
Lists (Picklists)
A type of controlled vocabulary included in
NISO Z39.19 Standard
• Lists are used to describe aspects of content
objects or entities that have a limited number of
possibilities.
• Examples include:
– geography (e.g., country, state, city),
– language (e.g., English, French, Swedish),
– format (e.g., text, image, sound), or
– ……
Lists can be used effectively for
both browsing and searching.
• In browsing, items are directly
accessed when the list of terms
is reviewed and one term is
selected.
Source: http://www.ncbi.nlm.nih.gov/genome/guide/human/resources.shtml
• In searching, a list may be
used to access content in a
single term search, or the terms
from the list may be used to
limit a retrieved set by another
attribute of interest for the user
(one or more terms in the
search).
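The two uses of a list, browsing and searching, can be sketched in a few lines. The pick list reuses the format example (text, image, sound); the `Item` records, their titles, and the function names are hypothetical and exist only to make the sketch runnable.

```python
from dataclasses import dataclass
from typing import List, Optional

# Pick list taken from the format example: a small, closed set of terms.
FORMATS = ("text", "image", "sound")


@dataclass(frozen=True)
class Item:
    title: str
    format: str


# Hypothetical content objects, invented for illustration.
ITEMS = [
    Item("Harbour at dawn", "image"),
    Item("Harbour survey report", "text"),
    Item("Harbour bell recording", "sound"),
]


def _check(term: str) -> None:
    """Reject any value that is not a member of the pick list."""
    if term not in FORMATS:
        raise ValueError(f"{term!r} is not in the pick list")


def browse(selected: str) -> List[Item]:
    """Browsing: the user picks one term and gets the items filed under it."""
    _check(selected)
    return [item for item in ITEMS if item.format == selected]


def search(keyword: str, limit_to: Optional[str] = None) -> List[Item]:
    """Searching: a keyword query, optionally limited by a pick-list term."""
    hits = [item for item in ITEMS if keyword.lower() in item.title.lower()]
    if limit_to is not None:
        _check(limit_to)
        hits = [item for item in hits if item.format == limit_to]
    return hits
```

Calling `browse("image")` mimics selecting a term from a pull-down list, while `search("harbour", limit_to="text")` mimics an advanced-search form that narrows a result set by format.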
Source: Google’s advanced search http://www.google.com
pick lists
Waterford County Image Archive
http://www.waterfordcountyimages.org
List - Definition, Purpose, and Uses
• A list (also called a pick list) is
a limited set of terms arranged
as a simple alphabetical list or
in some other logically evident
way.
– A list is a series of terms in some
sequential order.
– Terms can be ordered
alphabetically, chronologically,
numerically, etc.
Exercise: Which list is
better?
• The defining characteristics of
a list are that the terms:
· are all members of the same set
or class of items (e.g., countries,
products)
· are not overlapping in meaning
· are equal in terms of specificity
(granularity)
Typical applications
• Lists are frequently used to
display small sets of terms that
are to be used for quite
narrowly defined purposes
such as a web pull-down list or
list of menu choices.
2.2 Controlling synonyms or
equivalents
• Synonyms: terms with the same or
similar meanings
1. True synonyms (unusual)
– mean exactly the same thing and are
used in precisely the same context
2. Near synonyms (most common)
1. True Synonyms
• common and technical names
– salt vs. sodium chloride
• changes in usage of terms over time
– electronic calculating machines vs.
computers
• in different languages
– eyeglasses, spectacles, glasses
• acronyms
– BBC, British Broadcasting Corporation;
MPG, miles per gallon
• variant spellings:
– cancelled, canceled; honor, honour
2. Near Synonyms
• Same stem
– computing, computers,
computed,
microcomputers,
supercomputers
• General and
specific terms
– Coffee: Double Espresso, Latte,
Cappuccino, Short Black,
Macchiato, Flat White, etc.
• Overlapping concepts
– medicine, drugs
– fired, laid off
– forest, woods
– arid, dry
Synonymy
Source: Z39.19-2005, p.25
• Each distinct concept should
refer to a unique linguistic
form.
• Information or content that is
provided to a user should not be
spread across the system under
multiple access points, but
should be gathered together in
one place.
Controlling synonyms: there will only be one term used to represent
a given concept or entity.
Authority File
……
150 World War, 1939-1945
450 European War, 1939-1945
450 Second World War, 1939-1945
450 World War 2, 1939-1945
450 World War II, 1939-1945
450 World War Two, 1939-1945
Source: FAST: Faceted Application of Subject Terminology
http://fast.oclc.org/
or:
Thesaurus
World War, 1939-1945
  UF European War, 1939-1945
  UF Second World War, 1939-1945
  UF World War 2, 1939-1945
  UF World War II, 1939-1945
  UF World War Two, 1939-1945
European War, 1939-1945
  USE World War, 1939-1945
Second World War, 1939-1945
  USE World War, 1939-1945
World War 2, 1939-1945
  USE World War, 1939-1945
World War II, 1939-1945
  USE World War, 1939-1945
World War Two, 1939-1945
  USE World War, 1939-1945
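The USE / UF pairs above amount to a lookup table from every entry term to a single preferred term. A minimal sketch follows, assuming a plain dictionary is enough; the names `resolve` and `used_for` are illustrative and not taken from FAST or any thesaurus software.

```python
from typing import Dict, List

PREFERRED = "World War, 1939-1945"

# Non-preferred entry terms (the UF / 450 lines in the example above).
VARIANTS = [
    "European War, 1939-1945",
    "Second World War, 1939-1945",
    "World War 2, 1939-1945",
    "World War II, 1939-1945",
    "World War Two, 1939-1945",
]

# USE references: every variant points to the one preferred term.
USE: Dict[str, str] = {variant: PREFERRED for variant in VARIANTS}


def resolve(term: str) -> str:
    """Follow a USE reference; preferred or unknown terms are returned as-is."""
    return USE.get(term, term)


def used_for(term: str) -> List[str]:
    """Return the UF list of a preferred term, derived from the USE table."""
    return sorted(variant for variant, target in USE.items() if target == term)
```

Deriving the UF list from the USE table keeps the two directions consistent, which mirrors the reciprocal nature of the USE / UF pair.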
Source: Art and Architecture
Thesaurus (AAT)
Source: Medical Subject Headings (MeSH)
Synonym Rings
A type of controlled vocabulary included in
NISO Z39.19 Standard
A synonym ring connects a set of words that are
defined as equivalent for retrieval.
astronaut
spaceman
spationaut
cosmonaut
taikonaut
An example from International SEMATECH.
A search for Silicon would look like this:
Your search was submitted as “SILICON” or “SI”
Synonym Rings are used
• to expand queries for content objects
– If a user enters any one of these terms as
a query to the system, all items are
retrieved that contain any of the terms
in the cluster.
• in systems where the underlying
content objects are left in their
unstructured natural language
format
– The control is achieved through the
interface by drawing together similar
terms into these clusters.
• in conjunction with search engines
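Query expansion with a synonym ring can be sketched as follows. The ring is the astronaut example from the earlier slide; the sample documents, the whitespace tokenisation, and the function names are simplifying assumptions, not a description of any particular search engine.

```python
from typing import List, Set

# One ring: all members are treated as equivalent for retrieval.
RINGS: List[Set[str]] = [
    {"astronaut", "spaceman", "spationaut", "cosmonaut", "taikonaut"},
]


def expand(query: str) -> Set[str]:
    """Return every term in the ring containing the query, or just the query."""
    term = query.lower()
    for ring in RINGS:
        if term in ring:
            return set(ring)
    return {term}


def search(query: str, documents: List[str]) -> List[str]:
    """Retrieve documents containing any term of the expanded query."""
    terms = expand(query)
    # Naive tokenisation on whitespace; punctuation handling is left out.
    return [doc for doc in documents if terms & set(doc.lower().split())]
```

The documents themselves stay in unstructured natural language; the control lives entirely in the `expand` step at query time.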
Poverty mitigation
Poverty alleviation
Poverty reduction
Poverty elimination
Poverty prevention
Poverty abatement
Poverty reducation
Poverty eradication
Source: Bedford, 2006 ppt.
Rings can include all kinds of
synonyms - true,
misspellings, predecessors,
abbreviations
Exercise
• Find synonyms of this type of
object:
2.3 Making explicit semantic relationships –
Hierarchical relationships
Birds
Cardinals
Doves
Robins
Wrens
All specific names of
birds are kinds of birds.
Scientific Taxonomy
An example: Leatherback turtle
Phylum: Chordata
Class: Reptilia
Subclass: Anapsida
Order: Testudines
Suborder: Cryptodira
Family: Dermochelyidae
Genus: Dermochelys
Species: Dermochelys coriacea
(Leatherback turtle)
Classifications
superordinate classes (e.g., parents)
. coordinate classes (e.g., siblings)
. . subordinate classes (e.g., children)
. . subordinate classes
. coordinate classes
. coordinate classes
. . subordinate classes
relationship types: generic, instance, and whole-part
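A hierarchy of this kind can be stored as child-to-parent links and walked in either direction. The sketch below encodes the leatherback turtle lineage from the scientific taxonomy example; the dictionary layout and the helper names are assumptions chosen for brevity.

```python
from typing import Dict, List

# Child -> parent (broader term) links, copied from the leatherback example.
BROADER: Dict[str, str] = {
    "Dermochelys coriacea": "Dermochelys",
    "Dermochelys": "Dermochelyidae",
    "Dermochelyidae": "Cryptodira",
    "Cryptodira": "Testudines",
    "Testudines": "Anapsida",
    "Anapsida": "Reptilia",
    "Reptilia": "Chordata",
}


def ancestors(term: str) -> List[str]:
    """Walk upward through the superordinate classes, nearest first."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


def narrower(term: str) -> List[str]:
    """Return the direct subordinate classes of a term."""
    return sorted(child for child, parent in BROADER.items() if parent == term)


def is_kind_of(term: str, candidate: str) -> bool:
    """True if candidate appears anywhere above term in the hierarchy."""
    return candidate in ancestors(term)
```

Because each term has exactly one parent here, the structure is a strict tree; a polyhierarchy would need a list of parents per term.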
2.3 Making explicit semantic relationships –
Associative relationships (not hierarchical)
Relationship                        Example
Part / Whole                        Bicycle / Bicycle Wheel
Cause / Effect                      Accident / Injury
Process / Agent                     Velocity measurement / Speedometer
Action / Product                    Writing / Publication
Action / Patient                    Teaching / Student
Concept or Thing / Properties       Steel alloy / Corrosion resistance
Concept or Thing / Origins          Water / Well
Thing or Action / Counter-agent     Pest / Pesticide
Raw material / Product              Grapes / Wine
Action / Property                   Communication / Communication skills
Antonyms                            Single people / Married people
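Associative links can be recorded as typed pairs and indexed in both directions, since a related-term link is reciprocal. The sketch uses four of the relationship/example pairs listed above; the triple layout and function names are assumptions for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# (term, relationship type, related term), taken from the examples above.
RELATED: List[Tuple[str, str, str]] = [
    ("Accident", "Cause / Effect", "Injury"),
    ("Writing", "Action / Product", "Publication"),
    ("Pest", "Thing or Action / Counter-agent", "Pesticide"),
    ("Grapes", "Raw material / Product", "Wine"),
]


def build_index(triples: List[Tuple[str, str, str]]) -> DefaultDict[str, List[Tuple[str, str]]]:
    """Index each link from both ends, because related-term links are reciprocal."""
    index: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)
    for source, relation, target in triples:
        index[source].append((relation, target))
        index[target].append((relation, source))
    return index


INDEX = build_index(RELATED)


def related_terms(term: str) -> List[str]:
    """Return the terms associatively linked to the given term."""
    return [other for _, other in INDEX.get(term, [])]
```

Keeping the relationship type on each link is optional in a thesaurus, which often shows only RT, but it is what lets richer KOS distinguish one kind of association from another.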
Source: Z39.19-2005, p.29
KOS in Use at World Bank
• Topic Thesaurus (500,000+
English terms, French and
Spanish language versions in
progress now)
• Topic Classification Scheme
(30 top classes, 700+ subtopics,
300+ subsubtopics)
• Business Function Thesaurus
(50,000 terms and growing)
• Business Function
Classification Scheme (5
business areas, 30 lines of
business, 300+ business
processes)
• Country-Region classification
scheme (6 regions, ca. 200
countries)
• Content Type Classification
Scheme (8 content types, 300+
secondary content types – in
refinement now)
• Media-Format Classification
Scheme
• Country Name Authority Control
(synonym, predecessor, successor
sources)
• Edition Statements Authority
Control
• Publisher Name Authority
Control
• Organization Authority Control
• Language Authority Control
• Series Name/Collection Title
Authority Control
• Translation Type Authority
Control
Source: Bedford, 2007, ASIST
Vision of An Enterprise Advanced Search
(Figure labels: Synonym Rings, Pick lists, Hierarchical taxonomy)
Source: Revised based on Bedford, 2006 ppt.
(Figure labels: Synonym Rings, Metadata, Thesaurus)
Source: Revised based on Bedford, 2006 ppt.
2.4 Presenting relationships as
well as properties of concepts
• Entity types
• Relationship types
• Properties
Semantic networks
organize sets of terms
representing concepts,
modeled as the nodes in a
network of variable
relationship types.
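A semantic network in this sense can be sketched as concepts (nodes) that carry an entity type and properties, joined by edges labelled with a relationship type. Everything below is a hypothetical toy model: the class names, the relation label, and the sample property are assumptions, and none of it reproduces the UMLS Semantic Network.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Concept:
    label: str
    entity_type: str
    properties: Dict[str, str] = field(default_factory=dict)


class SemanticNetwork:
    def __init__(self) -> None:
        self.concepts: Dict[str, Concept] = {}
        # Each edge is (source label, relationship type, target label).
        self.edges: List[Tuple[str, str, str]] = []

    def add_concept(self, concept: Concept) -> None:
        self.concepts[concept.label] = concept

    def relate(self, source: str, relation: str, target: str) -> None:
        if source not in self.concepts or target not in self.concepts:
            raise KeyError("both concepts must be added before relating them")
        self.edges.append((source, relation, target))

    def neighbours(self, label: str, relation: Optional[str] = None) -> List[str]:
        """Targets reachable from label, optionally filtered by relation type."""
        return [
            target
            for source, rel, target in self.edges
            if source == label and (relation is None or rel == relation)
        ]


def example_network() -> SemanticNetwork:
    """Build a two-node network; labels and property values are illustrative."""
    net = SemanticNetwork()
    net.add_concept(Concept("Grapes", "Raw material"))
    net.add_concept(Concept("Wine", "Product", {"state": "liquid"}))
    net.relate("Grapes", "is raw material of", "Wine")
    return net
```

The three ingredients of section 2.4 map directly onto the model: `entity_type` for entity types, the edge label for relationship types, and `properties` for properties.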
UMLS Semantic Network
135 Semantic Types and 54 Semantic Relation Types
Ontologies
Classes
attributes
instances
Source: Noy, N. F. and Tu, S.W. (2003).
The Graph view of relations
A Taxonomy of KOS © 2007 Zeng
(Figure: matrix of KOS types by structure and by major functions)
Structure:
Relationship Models (multiple dimensions):
– Ontologies
– Semantic networks
– Thesauri
Classification & Categorization (two dimensions):
– Classification schemes
– Taxonomies
– Categorization schemes
– Subject Headings
Metadata-like Models (flat):
– Gazetteers
– Directories
– Authority Files
Term Lists (flat):
– Synonym Rings
– Glossaries/Dictionaries
– Pick lists
Major functions:
– eliminating ambiguity
– controlling synonyms
– establishing relationships: hierarchical
– establishing relationships: associative
– presenting properties
Networked KOS → NKOS
• KOS are not used in isolation;
• KOS may be used, re-used, and repurposed in web-based services;
• KOS are used for:
– organizing, indexing, cataloging, and searching,
AND
– learning, knowledge modeling, reasoning, etc.
• NKOS need to be machine-processable,
machine-understandable
– (more to discuss later today)
References
• Hodge, Gail (2000). Systems of Knowledge Organization for
Digital Libraries: Beyond Traditional Authority Files. Washington,
DC: Council on Library and Information Resources.
http://www.clir.org/pubs/reports/pub91/contents.html
http://www.clir.org/pubs/reports/pub91/pub91.pdf
• Hill, Linda, Buchel, Olha, Janee, Greg, and Zeng, Marcia L.
2002. Integration of knowledge organization systems into
digital library architectures. In: Mai, Jens-Erik, et al. ed.:
Advances in classification research, volume 13, proceedings of the
13th ASIST SIG/CR Workshop, 17 November 2002,
Philadelphia PA, pp. 62-68.
• Koch, Traugott and Tudhope, Douglas. 2004. User-centred
approaches to Networked Knowledge Organization
Systems/Services (NKOS): Background.
http://www2.db.dk/nkos-workshop/#Background