Wordnet

From WordNet,
to EuroWordNet,
to the Global Wordnet Grid:
anchoring languages to universal meaning
Piek Vossen
VU University Amsterdam
1
Princeton WordNet
• http://wordnet.princeton.edu/
• Developed by George Miller and his team at
Princeton University, as the implementation of
a mental model of the lexicon
• Organized around the notion of a synset: a set
of synonyms in a language that represent a
single concept
• Semantic relations between concepts
• Covers over 117,000 concepts and over
150,000 English words
2
What kind of resource is
WordNet?
• Most widely used database in language
technology
• Enormous impact in language technology
development
• Large
• Free and downloadable
• English
3
Wordnet Starting point
• Lexical database organized around concepts instead of lexical
forms:
– Separates lexical forms from concepts
– Defines concepts through a relational model of meaning and
not an encyclopedic view
• A concept is defined by a synset; synsets distinguish
word meanings:
– {board, plank}{board, committee}{board, get on}
• The ‘synset’ as a weak notion of synonymy:
“two expressions are synonymous in a linguistic context C if the
substitution of one for the other in C does not alter the truth
value.” (Miller et al. 1993)
4
Wordnet: a network of semantically
related words
{conveyance;transport}
{vehicle}
{motor vehicle; automotive vehicle}
{car; auto; automobile; machine; motorcar}
{car mirror}
{armrest}
{car door}
{doorlock}
{bumper}
{car window}
{cruiser; squad car; patrol car;
police car; prowl car}
{hinge;
flexible joint}
{cab; taxi; hack; taxicab}
5
Polysemy & Word Forms



Synsets (word meanings)
contain one or more
word forms
Word forms in multiple
synsets are polysemous
Polysemous word forms
are said to have multiple
senses
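The relation between synsets, word forms and polysemy can be sketched in a few lines of Python. This is only an illustration, using the three {board, ...} synsets from the earlier slide; the function names are hypothetical and not part of any WordNet API.

```python
from collections import defaultdict

# Toy synsets from the slides: each synset is a set of word forms.
SYNSETS = [
    {"board", "plank"},
    {"board", "committee"},
    {"board", "get on"},
]


def senses_per_word(synsets):
    """Map each word form to the indices of the synsets that contain it."""
    index = defaultdict(list)
    for i, synset in enumerate(synsets):
        for word in sorted(synset):
            index[word].append(i)
    return dict(index)


def polysemous_words(synsets):
    """Return the word forms that occur in more than one synset."""
    index = senses_per_word(synsets)
    return sorted(word for word, ids in index.items() if len(ids) > 1)


SENSES = senses_per_word(SYNSETS)
POLYSEMOUS = polysemous_words(SYNSETS)
```

Here "board" occurs in three synsets and therefore has three senses, while "plank" and "committee" are monosemous in this toy data.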
Polysemy, Familiarity
& Zipf's Law



Zipf's law:

There is a constant k such that f * r = k

In words: there is a predictable relation between the
frequency of a word and its rank
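A small worked example of f * r = k, as a sketch only. The word counts below are invented so that the relation holds exactly; real corpus counts only approximate it.

```python
def zipf_constant(frequency, rank):
    """Return k = f * r for one word."""
    return frequency * rank


def predicted_frequency(k, rank):
    """Predict the frequency of the word at a given rank from k."""
    return k / rank


# Hypothetical counts, chosen for illustration (not from a real corpus).
COUNTS = {"the": 60000, "of": 30000, "and": 20000, "to": 15000}

# Rank words by descending frequency; rank 1 is the most frequent word.
RANKED = sorted(COUNTS.items(), key=lambda item: item[1], reverse=True)
CONSTANTS = [
    zipf_constant(freq, rank)
    for rank, (_, freq) in enumerate(RANKED, start=1)
]
```

With these counts every product f * r equals 60000, so the frequency at rank 3 is predicted as 60000 / 3 = 20000.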
Zipf's other law:

The number of meanings of a word is related to its
frequency of use
Polysemy indicates familiarity in wordnet:
• “horse” has 6 meanings, “equus caballus” has 1
Familiarity & SemCor

Sense numbers indicate frequency in SemCor (250,000
tokens from Brown corpus manually tagged with WordNet
senses):
• {horse:1, Equus caballus:1} => animal
• {horse:5, knight:2} => in chess
Wordnet 3.0 statistics
POS         Unique Strings   Synsets   Total Word-Sense Pairs
Noun               117,798    82,115                  146,312
Verb                11,529    13,767                   25,047
Adjective           21,479    18,156                   30,002
Adverb               4,481     3,621                    5,580
Totals             155,287   117,659                  206,941
9
Wordnet 3.0 statistics
POS         Monosemous Words and Senses   Polysemous Words   Polysemous Senses
Noun                            101,863             15,935              44,449
Verb                              6,277              5,252              18,770
Adjective                        16,503              4,976              14,399
Adverb                            3,748                733               1,832
Totals                          128,391             26,896              79,450
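The polysemy figures above can be turned into an average number of senses per polysemous word. The following is a short arithmetic sketch that uses only the numbers from the table; the variable names are illustrative.

```python
# (polysemous words, polysemous senses) per part of speech, from the table.
POLYSEMY = {
    "Noun": (15935, 44449),
    "Verb": (5252, 18770),
    "Adjective": (4976, 14399),
    "Adverb": (733, 1832),
}


def average_senses(words, senses):
    """Average number of senses per polysemous word."""
    return senses / words


AVERAGES = {
    pos: round(average_senses(words, senses), 2)
    for pos, (words, senses) in POLYSEMY.items()
}

# The per-POS rows add up to the Totals row of the table.
TOTAL_WORDS = sum(words for words, _ in POLYSEMY.values())
TOTAL_SENSES = sum(senses for _, senses in POLYSEMY.values())
```

Verbs come out as the most polysemous class (about 3.57 senses per polysemous verb), against about 2.79 for nouns.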
10
Semantic organization of
Nouns in WordNet

25 unique beginners
noun.Tops file

Contains very general classifications
Lexicalization patterns
top-layer
entity
organism
object
artifact
building
animal
bird
church
canary
abbey
common
dog crocodile
25 unique
beginners
plant
tree
flower
rose
Basic Level
Concepts
(Rosch)
canary
13
Lexicalization patterns
top-layer
entity
organism
object
artifact
building
animal
bird
church
canary
abbey
common
canary
dog crocodile
plant
tree
flower
25 unique
beginners
basic level
concepts
rose
• balance of two principles:
  ● predict most features
  ● apply to most subclasses
• where most concepts are created
• amalgamate most parts
• most abstract level to draw a picture
14
Lexicalization patterns
entity
inessential
souvenir
garbage
threat
artifact
building
organism
object
animal
bird
church
canary
abbey
common
dog crocodile
plant
tree
top-layer
curiosity ....etc....
waste
25 unique
variable beginners
flower
basic level
concepts
rose
canary
15
Wordnet top level
17
Meronymy & pictures
beak
tail
leg
18
Dogs in WordNet
20
Type-role distinction
• Current WordNet treatment:
(1) a husky is a kind of dog (type)
(2) a husky is a kind of working dog (role)
• What’s wrong?
(2) is defeasible, (1) is not:
*This husky is not a dog
This husky is not a working dog
Other roles: watchdog, sheepdog, herding dog, lapdog, etc….
21
Ontological observations
• Identity criteria as used in OntoClean (Guarino &
Welty 2002):
– rigidity: to what extent are properties true for all
instances of entities in all worlds? You are always a
human, but you can be a student for a short while.
• Ignoring this distinction leads to ISA-overloading
22
Ontology and lexicon
• Hierarchy of disjoint types:
Canine => PoodleDog; NewfoundlandDog;
GermanShepherdDog; Husky
• Lexicon:
– NAMES for TYPES:
{poodle}EN, {poedel}NL, {pudoru}JP
=> (instance x Poodle)
– LABELS for ROLES:
{watchdog}EN, {waakhond}NL, {banken}JP
=>((instance x Canine) and (role x GuardingProcess))
23
Expansion with pure hyponymy
relations
dog
hunting dog
puppy
lapdog
street dog
bitch
poodle
dachshund
watchdog
short hair
dachshund
long hair
dachshund
Expansion from a type to roles
24
Expansion with pure hyponymy
relations
dog
hunting dog
puppy
lapdog
street dog
bitch
poodle
dachshund
watchdog
short hair
dachshund
long hair
dachshund
Expansion from a role to types and other roles
25
Synset definition
Synsets consist of interchangeable words or synonyms (Miller, 1998)
strict criteria
loose criteria
lion, king_of_beasts,
Panthera_leo [large gregarious
predatory feline of Africa and India
having a tawny coat with a shaggy
mane in the male]
dog, domestic dog, Canis
familiaris (a member of the
genus Canis..)
Pooch, doggie, doggy,
barker, bow-wow (informal
terms for dogs)
26
Differences among wordnets
English Wordnet
large number of synsets
asshole, bastard, cocksucker, dickhead, shit,
mother fucker, motherfucker, prick, whoreson,
son of a bitch, SOB
cad, bounder, blackguard, dog, hound, heel
gasbag, windbag
rotter, rat, skunk, stinker, bum, puke, crumb,
lowlife, scum_bag, so-and-so
pain, pain_in_the_neck, nuisance
worm, louse, insect, dirt_ball
Dutch Wordnet
62 synonyms
naarling:1/r_n-24518, beroerling:1/d_n-26921,
ellendeling:1/r_n-12324, etterbak:1/d_n-75936,
etterbuil:2/d_n-75940, fielt:1/d_n-80137,
fluim:2/d_n-81948, gemenerik:1/r_n-14607,
hond:2/r_n-79023, hondenlul:1/r_n-17019,
kankerlijer:1/d_n-130709, kelerelijder:1/d_n-540923,
kelerelijer:1/d_n-147148, klerelijer:1/r_n-19790,
kloot:1/r_n-19887, kloothommel:1/d_n-137246,
klootspiraal:1/d_n-412711, klootzak:1/r_n-19888,
kwal:2/r_n-21077, lamgat:1/d_n-152244,
lammeling:1/r_n-21272, lamstraal:1/d_n-152396,
lamzak:1/r_n-21286, lazersteen:1/d_n-413025,
lazerstraal:1/d_n-154087, loeder:1/r_n-22410,
lul:2/r_n-22757, lulhannes:1/d_n-161976,
lulletje:1/d_n-541138, miesgasser:1/d_n-172163,
mispunt:1/r_n-24006, onverlaat:1/r_n-26320,
paardelul:1/d_n-228940, paardenlul:1/n_n-501022,
patjakker:1/d_n-212558, pleurislijder:1/r_n-28842,
ploert:1/r_n-28881, plurk:1/d_n-220067, etc. etc.
insulting terms for people who are stupid, ridiculous, irritating, lazy, slow, ...
27
Splitting or Lumping
• Lumping
“one denotation and several connotations per synset”
lion, king_of_beasts, Panthera_leo
• Splitting
“one denotation and one connotation per synset”
Kraut, Krauthead, Jerry, Hun
hyponym
German
Splitting
• On which criteria? Usage (register, domain,
frequency, style, etc.); Attitude (polarity,
subjectivity); Morphology, Syntax, etc.
• Consistent splitting leads to synsets without
synonyms
• Leads to ISA-overloading (German is not a
hypernym of Krauthead)
Lumping
• Consistent lumping leads to extremely
large synsets
• Low interchangeability of synonyms as
their connotations differ too much
• Low interoperability between wordnets:
precise translation equivalence is impossible
• Leads to ‘unintuitive’ synsets
Hybrid and 2-layered
Summary
• Synsets are more compact representations for concepts than word
meanings in traditional lexicons
• Synonyms and hypernyms are substitutional variants:
– begin – commence
– I once had a canary. The bird got sick. The poor animal died.
• Hyponymy and meronymy chains are important transitive relations for
predicting properties and explaining textual properties:
object -> artifact -> vehicle -> 4-wheeled vehicle -> car
• Strict separation of part of speech although concepts are closely
related (bed – sleep) and are similar (dead – death)
• Lexicalization patterns reveal important mental structures
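The transitivity of hyponymy can be illustrated with the chain object -> artifact -> vehicle -> 4-wheeled vehicle -> car from the summary. A minimal sketch, assuming each concept has a single hypernym (real wordnets allow several); the function names are hypothetical.

```python
# Each concept points to its direct hypernym (toy data from the slide).
HYPERNYM = {
    "car": "4-wheeled vehicle",
    "4-wheeled vehicle": "vehicle",
    "vehicle": "artifact",
    "artifact": "object",
}


def hypernym_chain(concept, hypernym_of):
    """Follow hypernym links upward and return all ancestors in order."""
    chain = []
    seen = {concept}
    while concept in hypernym_of:
        concept = hypernym_of[concept]
        if concept in seen:
            raise ValueError("cycle in hypernym links")
        seen.add(concept)
        chain.append(concept)
    return chain


def is_a(concept, ancestor, hypernym_of):
    """True if `ancestor` is reachable from `concept` via hypernym links."""
    return ancestor in hypernym_chain(concept, hypernym_of)


CAR_CHAIN = hypernym_chain("car", HYPERNYM)
```

Because the relation is transitive, any property stated for "artifact" can be predicted for "car" without a direct link between the two.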
32
EuroWordNet
• The development of a multilingual database with wordnets
for several European languages
• Funded by the European Commission, DG XIII,
Luxembourg as projects LE2-4003 and LE4-8328
• March 1996 - September 1999
• 2.5 Million EURO.
• http://www.hum.uva.nl/~ewn
• http://www.illc.uva.nl/EuroWordNet/finalresults-ewn.html
33
EuroWordNet
• Languages covered:
– EuroWordNet-1 (LE2-4003): English, Dutch, Spanish, Italian
– EuroWordNet-2 (LE4-8328): German, French, Czech, Estonian.
• Size of vocabulary:
– EuroWordNet-1: 30,000 concepts - 50,000 word meanings.
– EuroWordNet-2: 15,000 concepts - 25,000 word meanings.
• Type of vocabulary:
– the most frequent words of the languages
– all concepts needed to relate more specific concepts
34
EuroWordNet Model
Domains
Ontology
Traffic
2OrderEntity
move
go
Air
III
ride
Location Dynamic
Road
III
rijden
drive
I
III
bewegen
gaan
I
II
III
II
Lexical Items Table
III
cabalgar
jinetear
conducir
III
mover
transitar
II
Lexical Items Table
Lexical Items Table
ILI-record
{drive}
Lexical Items Table
berijden
III
II
Inter-Lingual-Index
I = Language Independent link
II = Link from Language Specific
to Inter lingual Index
III = Language Dependent Link
cavalcare
guidare
III
andare
muoversi
35
Differences in relations between
EuroWordNet and WordNet
• Added Features to relations
• Cross-Part-Of-Speech relations
• New relations to differentiate shallow hierarchies
• Different interpretations of some relations
36
Cross-Part-Of-Speech relations
WordNet1.5: nouns and verbs are not interrelated by basic semantic
relations such as hyponymy and synonymy:
adornment 2
adorn 1
=> change of state-- (the act of changing something)
=> change, alter-- (cause to change; make different)
EuroWordNet: words of different parts of speech can be inter-linked with
explicit xpos-synonymy, xpos-antonymy and xpos-hyponymy relations:
{adorn V}
{size N}
XPOS_NEAR_SYNONYM
XPOS_NEAR_HYPONYM
{adornment N}
{tall A}
{short A}
37
Role relations
In the case of many verbs and nouns the most salient relation is not the hyperonym
but the relation between the event and the involved participants. These relations
are expressed as follows:
{knife}      ROLE_INSTRUMENT       {to cut}
{to cut}     INVOLVED_INSTRUMENT   {knife}      (reversed)
{school}     ROLE_LOCATION         {to teach}
{to teach}   INVOLVED_LOCATION     {school}     (reversed)
These relations are typically used when other relations, mainly hyponymy, do not
clarify the position of the concept in the network, but the word is still closely related to
another word.
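The pairing of ROLE_* relations with their reversed INVOLVED_* counterparts can be sketched as follows. The triple representation and the helper name are assumptions made for illustration, not the EuroWordNet database format.

```python
# Each ROLE relation has a reversed INVOLVED counterpart (from the slide).
REVERSE = {
    "ROLE_INSTRUMENT": "INVOLVED_INSTRUMENT",
    "ROLE_LOCATION": "INVOLVED_LOCATION",
}
# Make the table symmetric so INVOLVED_* also maps back to ROLE_*.
REVERSE.update({involved: role for role, involved in list(REVERSE.items())})

# (source, relation, target) triples taken from the examples above.
RELATIONS = [
    ("knife", "ROLE_INSTRUMENT", "to cut"),
    ("school", "ROLE_LOCATION", "to teach"),
]


def with_reversed(relations):
    """Return the relations together with their reversed counterparts."""
    result = set(relations)
    for source, relation, target in relations:
        result.add((target, REVERSE[relation], source))
    return result


ALL_RELATIONS = with_reversed(RELATIONS)
```

Starting from the two noun-to-verb links, the sketch derives the two verb-to-noun links, so "to cut" can be queried for its instrument and "to teach" for its location.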
38
Co_Role relations
guitar player
player
to play music
guitar
ice saw
saw
ice
HAS_HYPERONYM
CO_AGENT_INSTRUMENT
HAS_HYPERONYM
ROLE_AGENT
CO_AGENT_INSTRUMENT
HAS_HYPERONYM
ROLE_INSTRUMENT
HAS_HYPERONYM
CO_INSTRUMENT_AGENT
HAS_HYPERONYM
CO_INSTRUMENT_PATIENT
HAS_HYPERONYM
ROLE_INSTRUMENT
CO_PATIENT_INSTRUMENT
player
guitar
person
to play music
musical instrument
to make
musical instrument
musical instrument
guitar player
saw
ice
saw
to saw
ice saw REVERSED
39
Horizontal & vertical semantic relations
chronic patient;
mental patient
HYPONYM
ρ-PATIENT
patient
STATE
cure
ρ-CAUSE
doctor
treat
ρ-PATIENT
ρ-AGENT
HYPONYM
child doctor
disease; disorder
HYPONYM
stomach disease,
kidney disorder,
ρ-INSTRUMENT
physiotherapy
medicine
etc.
ρ-LOCATION
co-ρAGENT-PATIENT
hospital, etc.
child
40
Overview of the Language
Internal relations in EuroWordnet
Same Part of Speech relations:
NEAR_SYNONYMY               apparatus - machine
HYPERONYMY/HYPONYMY         car - vehicle
ANTONYMY                    open - close
HOLONYMY/MERONYMY           head - nose
Cross-Part-of-Speech relations:
XPOS_NEAR_SYNONYMY          dead - death; to adorn - adornment
XPOS_HYPERONYMY/HYPONYMY    to love - emotion
XPOS_ANTONYMY               to live - dead
CAUSE                       die - death
SUBEVENT                    buy - pay; sleep - snore
ROLE/INVOLVED               write - pencil; hammer - hammer
STATE                       the poor - poor
MANNER                      to slurp - noisily
BELONG_TO_CLASS             Rome - city
41
The Multilingual Design
• Inter-Lingual-Index: unstructured fund of concepts to
provide an efficient mapping across the languages;
• Index-records are mainly based on WordNet synsets and
consist of synonyms, glosses and source references;
• Various types of complex equivalence relations are
distinguished;
• Equivalence relations from synsets to index records: not on a
word-to-word basis;
• Indirect matching of synsets linked to the same index items;
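Indirect matching through the index can be sketched as a lookup over shared ILI records. The links below are illustrative assumptions loosely based on the words in the EuroWordNet model diagram (rijden, conducir, cabalgar, guidare, cavalcare); they are not the actual database contents.

```python
# Hypothetical links from language-specific words to ILI records.
ILI_LINKS = {
    "nl": {"rijden": {"drive", "ride"}},
    "es": {"conducir": {"drive"}, "cabalgar": {"ride"}},
    "it": {"guidare": {"drive"}, "cavalcare": {"ride"}},
}


def equivalents(word, source, target, links=ILI_LINKS):
    """Find target-language words that share an ILI record with `word`."""
    source_records = links[source][word]
    return sorted(
        candidate
        for candidate, records in links[target].items()
        if records & source_records
    )


DUTCH_TO_SPANISH = equivalents("rijden", "nl", "es")
SPANISH_TO_ITALIAN = equivalents("conducir", "es", "it")
```

No language pair is linked directly: Spanish and Italian are matched only because both point to the same index record, which is what keeps the number of mappings linear in the number of languages.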
42
Equivalent Near Synonym
1. Multiple Targets (1:many)
Dutch wordnet: schoonmaken (to clean) matches with 4
senses of clean in WordNet1.5:
● make clean by removing dirt, filth, or unwanted substances from
● remove unwanted substances from, such as feathers or pits, as of chickens or fruit
● remove in making clean; "Clean the spots off the rug"
● remove unwanted substances from - (as in chemistry)
2. Multiple Sources (many:1)
Dutch wordnet: versiersel near_synonym versiering
ILI-Record: decoration.
3. Multiple Targets and Sources (many:many)
Dutch wordnet: toestel near_synonym apparaat
ILI-records: machine; device; apparatus; tool
43
Complex mappings across
languages
EN-Net
IT-Net
toe
dito
finger
{ toe : part of foot }
head
{ finger : part of hand }
{ dedo , dito :
finger or toe }
{ head : part of body }
NL-Net
hoofd
kop
{ hoofd : human head }
{ kop : animal head }
ES-Net
dedo
= normal equivalence
= eq_has_hyponym
= eq_has_hyperonym
44
Typical gaps in the (English)
ILI
• Dutch:
doodschoppen (to kick to death):
eq_hyperonym {kill}V and to {kick}V
aardig (Adjective, to like):
eq_near_synonym {like}V
cassière (female cashier)
eq_hyperonym {cashier}, {woman}
kunstproduct (artifact substance)
eq_hyperonym {artifact} and to {product}
• Spanish:
alevín (young fish):
eq_hyperonym {fish} and eq_be_in_state {young}
cajera (female cashier)
eq_hyperonym {cashier}, {woman}
45
Wordnets as semantic
structures
• Wordnets are unique language-specific structures:
– different lexicalizations
– differences in synonymy and homonymy
– different relations between synsets
– same organizational principles: synset structure and same set
of semantic relations.
• Language-independent knowledge is assigned to the
ILI and can thus be shared by all languages linked to
the ILI: both an ontology and a domain hierarchy
46
Autonomous & Language-Specific
Wordnet1.5
Dutch Wordnet
voorwerp
{object}
object
artifact, artefact
(a man-made object)
block
natural object (an
object occurring
naturally)
blok
instrumentality
body {block}
implement
container
tool
spoon
bag
lichaam
{body}
device
instrument
box
werktuig{tool}
bak
{box}
lepel
{spoon}
tas
{bag}
47
Wordnets versus ontologies
• Wordnets:
• autonomous language-specific lexicalization patterns in a
relational network.
• Usage: to predict substitution in text for information
retrieval, text generation, machine translation,
word-sense disambiguation.
• Ontologies:
• data structure with formally defined concepts.
• Usage: making semantic inferences.
48
Building wordnets
• Two major approaches:
– Expand model: translate the English synonyms and
copy the synsets and relations;
• Fast and cheap
• Benefits from English research and resources
• Bias by Princeton wordnet
– Merge model: build wordnet independently of English
and create equivalence mapping afterwards:
• Slow and expensive
• Complicated since the structures differ and you cannot
change English
• Better representation of the language structure:
theoretically more sound (true WORD-net)
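The expand model can be sketched as "translate the words, copy the structure". The synset identifiers and the bilingual dictionary below are hypothetical; the Dutch words auto and voertuig appear elsewhere in these slides.

```python
# Toy English wordnet: synset id -> synonyms, plus relations between ids.
ENGLISH_SYNSETS = {"s1": ["car", "auto"], "s2": ["vehicle"]}
ENGLISH_RELATIONS = [("s1", "has_hyperonym", "s2")]

# Hypothetical bilingual dictionary (English -> Dutch).
TRANSLATIONS = {"car": ["auto"], "auto": ["auto"], "vehicle": ["voertuig"]}


def expand(synsets, relations, translations):
    """Translate the synonyms of each synset and copy the relations."""
    target_synsets = {}
    for synset_id, words in synsets.items():
        translated = []
        for word in words:
            for candidate in translations.get(word, []):
                if candidate not in translated:
                    translated.append(candidate)
        if translated:
            target_synsets[synset_id] = translated
    # Keep only relations whose two ends both survived translation.
    target_relations = [
        (source, relation, target)
        for source, relation, target in relations
        if source in target_synsets and target in target_synsets
    ]
    return target_synsets, target_relations


DUTCH_SYNSETS, DUTCH_RELATIONS = expand(
    ENGLISH_SYNSETS, ENGLISH_RELATIONS, TRANSLATIONS
)
```

The sketch also shows where the bias comes from: the Dutch hierarchy is by construction a copy of the English one, whether or not Dutch lexicalizes the concepts the same way.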
49
How to harmonize wordnets?
• Define universal sets of concepts that play a major role in
many different wordnets: so-called Base Concepts
• Define base concepts in each language wordnet
– High level in the hierarchy
– Many hyponyms
• Provide the closest equivalent in English wordnet
• Expand down-ward with hyponyms & determine the
intersection of English equivalences
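The two selection criteria (high level in the hierarchy, many hyponyms) can be sketched as a filter over a hyponymy tree. The toy hierarchy loosely follows the lexicalization slides, and the thresholds max_depth and min_hyponyms are assumptions for illustration, not the values used in EuroWordNet.

```python
# Toy hyponymy tree: concept -> direct hyponyms.
HYPONYMS = {
    "entity": ["object", "organism"],
    "object": ["artifact"],
    "artifact": ["building"],
    "building": ["church"],
    "church": ["abbey"],
    "organism": ["animal", "plant"],
    "animal": ["bird", "dog", "crocodile"],
    "bird": ["canary"],
    "plant": ["tree", "flower"],
    "flower": ["rose"],
}


def count_descendants(concept, hyponyms):
    """Count all direct and indirect hyponyms of a concept."""
    return sum(
        1 + count_descendants(child, hyponyms)
        for child in hyponyms.get(concept, [])
    )


def depths(root, hyponyms):
    """Return the depth of every concept below (and including) the root."""
    result = {root: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in hyponyms.get(node, []):
            result[child] = result[node] + 1
            stack.append(child)
    return result


def base_concept_candidates(root, hyponyms, max_depth, min_hyponyms):
    """Select concepts that are high in the tree and have many hyponyms."""
    return sorted(
        concept
        for concept, depth in depths(root, hyponyms).items()
        if depth <= max_depth
        and count_descendants(concept, hyponyms) >= min_hyponyms
    )


CANDIDATES = base_concept_candidates("entity", HYPONYMS, 2, 4)
```

In a real setting the candidates selected per language would then be mapped to their closest English equivalents and intersected across wordnets.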
50
Base Concepts in Wordnet
entity
object
garbage
threat
artifact
25 unique
beginners
organism
animal
building bird
church canary dog crocodile
abbey common
canary
plant
tree flower
rose
1024 base
concepts
basic level
concepts
51
Common Base Concepts
Important in at least two languages
                                 Nouns   Verbs   Total
Physical objects & substances      491       -     491
Processes and states               272     228     500
Mental objects                      33       -      33
Total                              796     228    1024
52
EuroWordNet data
Language   Synsets   No. of senses   Sens./syns.   Entries   Sens./entry   LIRels.   LIRels/syns   EQRels-ILI   EQRels/syn   Synsets without ILI
Dutch        44015           70201          1.59     56283          1.25    111639          2.54        53448         1.21                  7203
Spanish      23370           50526          2.16     27933          1.81     55163          2.36        21236         0.91                     0
Italian      40428           48499          1.20     32978          1.47    117068          2.90        71789         1.78                  1561
French       22745           32809          1.44     18777          1.75     49494          2.18        22730         1.00                    20
German       15132           20453          1.35     17098          1.20     34818          2.30        16347         1.08                     0
Czech        12824           19949          1.56     12283          1.62     26259          2.05        12824         1.00                     0
Estonian      7678           13839          1.80     10961          1.26     16318          2.13         9004         1.17                     0
English      16361           40588          2.48     17320          2.34     42140          2.58         n.a.         n.a.                  n.a.
WN15         94515          187602          1.98    126617          1.48    211375          2.24         n.a.         n.a.                  n.a.
53
From EuroWordNet to Global
WordNet
• Currently, wordnets exist for more than 70 languages,
including: Arabic, Bantu, Basque, Chinese, Bulgarian,
Estonian, Hebrew, Icelandic, Japanese, Kannada, Korean,
Latvian, Nepali, Persian, Romanian, Sanskrit, Tamil, Thai,
Turkish, Zulu...
• Many languages are genetically and typologically
unrelated
• http://www.globalwordnet.org
54
Indo Wordnet Project
http://www.cfilt.iitb.ac.in/wordnet/webhwn/
• Basis for 10 year Indian Machine translation
project: translation through ILI is more
efficient than building translation memories
• Hindi wordnet as the ILI, while Hindi is linked
to the English wordnet
• About 20 Indian languages (900 million
speakers)
55
Indo Wordnet progress 2010
Synsets
Assamese
Words
353
9.739
21.223
5.802
10.278
8.679
18.563 Nepali
Bodo
3.837
13.357 Oriya
Hindi
Words
19.609 Marathi
Bengali
Gujarati
Synsets
To start
970
2.125 Punjabi
33.900
8.200 Sanskrit
3.340
17.820
4.750
9.821
10.639
18.250
6.123
9.641
Kannada
5.920
7.344 Tamil
Kashmiri
6.569
8.674 Telugu
Malayalam
6.154
8.622 Urdu
Manipuri
2.744
5.231
To start
56
Asian Wordnet Project
http://asianwordnet.org/
• Built using a Wordnet Management System
developed by NICT, NECTEC Bangkok,
Thailand
• Free download: Apache 2.0+, PHP 5.2+,
MySQL 5.0+
• Translation of English wordnet (expand
model), voting, progress monitoring,
intersection and overlap
• Uses Wordnet-LMF as exchange format
58
Asian Wordnet Editor
59
South African Wordnets
• Started in 2008 as a collaboration of the Department of
African Languages at UNISA and the Centre for Text
Technology (CTexT®) at the North-West University.
• Languages:
– Afrikaans
– Setswana, isiNdebele, isiZulu, isiXhosa and Sesotho sa
Leboa
• Nr. synsets: between 5,000 and 15,000 per language and
will be completed by the end of January 2011.
• Linked to the English wordnet
• DebVisDic wordnet editing environment:
http://deb.fi.muni.cz/clients-debvisdic.php
62
Some downsides of the
EuroWordnet model
• Construction is not done uniformly
• Coverage differs
• Not all wordnets can communicate with one another
• Proprietary rights restrict free access and usage
• A lot of semantics is duplicated
• Complex and obscure equivalence relations due to
linguistic differences between English and other
languages
63
Next step: Global WordNet
Grid
Fahrzeug
1
Auto Zug
Inter-Lingual
Ontology
vehicle
voertuig
1
auto trein
1
car
Object
train
2
German Words
2
Dutch Words
2
English Words
Device
3
TransportDevice
vehículo
1
véhicule
veicolo
Spanish Words
voiture
1
auto treno
dopravní prostředník
2
Italian Words
1
auto
2
Czech Words
liiklusvahend
auto killavoor
3
auto tren
2
1
2
Estonian Words
1
train
2
French Words
vlak
64
GWN-Grid: Main Features
• Construct separate wordnets for each Grid
language
• Contributors from each language encode the
same core set of concepts plus
culture/language-specific ones
• Synsets (concepts) can be mapped
crosslinguistically via an ontology
65
The Ontology: Main Features
• Formal ontology serves as universal index of
concepts
• List of concepts is not just based on the lexicon of
a particular language (unlike in EuroWordNet) but
uses ontological observations
• Ontology contains only upper and mid-level
concepts but concepts can be derived using formal
expressions in e.g. KIF or RDF
• Concepts are related in a type hierarchy
• Concepts are defined with axioms
66
The Ontology: Main Features
• Minimal set of concepts (Reductionist view):
– to express equivalence across languages
– to support inferencing
• Ontology must be powerful enough to encode all concepts
that are lexically expressed in any of the Grid languages
• Ontology need not and cannot provide a linguistic
encoding (label) for all concepts found in the Grid
languages
– Lexicalization in a language is not sufficient to warrant inclusion
in the ontology
– Lexicalization in all or many languages may be sufficient
• Ontological observations will be used to define the
concepts in the ontology
67
Lexicalizations not mapped to
WordNet
• Not added to the type hierarchy:
{straathond}NL (a dog that lives in the streets)
=> ((instance x Canine) and (habitat x Street))
• Added to the type hierarchy:
{klunen}NL (to walk on skates from one frozen body to the next
over land)
WalkProcess = KluunProcess
Axioms:
(and (instance x Human) (instance y Walk) (instance z Skates)
(wear x z) (instance s1 Skate) (instance s2 Skate) (before s1 y)
(before y s2) etc…
• National dishes, customs, games,....
68
Most mismatching concepts are
not new types
• Refer to sets of types in specific circumstances or
to concepts that depend on these types; next
to {rivierwater}NL (river water) there are many others:
{theewater}NL (water used for making tea)
{koffiewater}NL (water used for making coffee)
{bluswater}NL (water used for extinguishing fire)
• Relate to linguistic phenomena:
– gender, perspective, aspect, diminutives, politeness,
pejoratives, part-of-speech constraints
69
KIF expression for gender
marking
• {teacher}EN => ((instance x Human) and
(agent x TeachingProcess))
• {Lehrer}DE => ((instance x Man) and
(agent x TeachingProcess))
• {Lehrerin}DE => ((instance x Woman) and
(agent x TeachingProcess))
70
KIF expression for perspective
sell: subj(x), direct obj(z),indirect obj(y)
versus
buy: subj(y), direct obj(z),indirect obj(x)
=> (and (instance x Human)(instance y Human)
(instance z Entity) (instance e FinancialTransaction)
(source x e) (destination y e) (patient e)
The same process but a different perspective through subject
and object realization: Russian has two verbs for marry;
French apprendre can mean both teach and learn
71
Open Questions/Challenges
• What is a word, i.e., a lexical unit?
• What is the status of complex lexemes like
English lightning rod, word of mouth, find
out, kick the bucket?
• What is a semantic unit, i.e. a concept?
72
Open Questions/Challenges
• Is there a core inventory of concepts that are
universally encoded?
• If so, what are these concepts?
• How can crosslinguistic equivalence be verified?
• Is there systematicity to the language-specific
extensions?
• What are the lexicalization patterns of individual
languages?
• Are lexical gaps accidental or systematic?
73
Global Wordnet Grid
Installations
• Global Wordnet Association:
– Upload and GWG viewer through DebVisDic
• KYOTO project
– WordNet-LMF
– Web service and editor
– Mapping of complete English Wordnet to
DOLCE
DebVisDic GWA
http://deb.fi.muni.cz:9000/gwgeng?action=listPreview
GWA-KYOTO
Knowledge Integration in KYOTO
• A model of division of labour (along the lines of Putnam 1975) in which
knowledge is stored in 3 layers:
– Vocabularies, term databases, etc. (SKOS)
– WordNet (WN-LMF)
– Ontology (OWL-DL)
• Mapping relations that support the division of labour
– language-specific conceptualizations
• Each layer supports different types of inferencing
– SPARQL queries
– Graph algorithms (UKB, SSID+)
– Formal reasoning (OWL-DL reasoners, FaCT++)
3-layered knowledge model
Division of labor (Putnam 1975)
Geonames
8 million places
Kyoto ontology
2,000 classes, 3,000 axioms
English Wordnet 3.0
100,000 concepts
Top synsets
663
mappings
Base Concepts
1,000
Dolce-Lite
OntoWordNet
Perdurant
200,000 mappings Endurant
Quality
Domain concepts
150,486
mappings
Species 2000
3 million species
All nouns
All verbs
All adjectives
Domain Wordnet
1216 synsets
210 WN3.0 synsets
Ont: pollution
Ont: warming
Spanish WN: BC+equi
990 mappings
1,006 new synsets
Wn: greenhouse gas wo:done-by
wo:patient
Wn: polluted water
Japan WN: BC+equi
Basque WN: BC+equi
Chinese WN: BC+equi
Italian WN: BC+equi
Dutch WN: BC+equi
Division of labor in knowledge sources
Skos database
2.1 million species
Animalia
Chordata
Wordnet-LMF
100,000 synsets
animal:1
Base Concept
chordate:1
Amphibia
vertebrate:1,craniate:1
Ontology-OWL-DL
2,000 classes & 3,000 axioms
endurant
perdurant
physical-object endanger
organism
Anura
Leptodactylidae
Eleutherodactylus
Eleutherodactylus
atrabracus
Eleutherodactylus
augusti
amphibian:3
frog:1, toad:1, toad frog:1,
anuran:1, batrachian:1, salientian:1
barking frog
Term database
500,000 terms
endemic frog
endangered frog
poisonous frog
alien frog
Example





268 Species 2000 concepts
Animalia/Chordata/Aves/Anseriformes/Anatidae/Anas/ITS-175103 : Yellow-billed Pintail
eng-3.0-01847565-n <Anas, genus Anas>
297 WN3.0 Base Concepts
• 01507175-n 05 399 bird_genus
Connected to KYOTO ontology
• bird_genus-eng-3.0-01507175-n type
Wordnet ontology relations
Rigid vs. Non-rigid
Rigid:
• Synset:Endurant; Synset:Perdurant; Synset:Quality:
• sc_equivalenceOf or sc_subclassOf
Non-rigid:
• Synset:Role; Synset:Endurant
• sc_domainOf: range of ontology types that restricts a role
• sc_playRole: role that is being played
Rigidity can be detected automatically (Rudify, 80% precision,
IAG 80%) and is stored in Wordnet-LMF as attributes to synsets
Wordnet to ontology mappings
{create, produce, make}Verb, English
-> sc_equivalenceOf construction
{artifact, artefact}Noun, English
-> sc_domainOf physical_object
-> sc_playRole result-existence
-> sc_participantOf construction
{kunststof}Noun, Dutch // lit. artifact substance
-> sc_domainOf amount_of_matter
-> sc_playRole result-existence
-> sc_participantOf construction
{meat}Noun, English
-> sc_domainOf cow, sheep, pig
-> sc_playRole patient
-> sc_playRole eat
{名 肉, 食物, 餐}Noun, Chinese
-> sc_domainOf animal
-> sc_playRole patient
-> sc_playRole eat
{طعام, لحم, غذاء}Noun, Arabic
-> sc_domainOf cow, sheep
-> sc_playRole patient
-> sc_playRole eat
Wordnet to ontology mappings
{teacher}Noun, English
-> sc_domainOf human
-> sc_playRole done-by
-> sc_participantOf teach
{leraar}Noun, Dutch // lit. male teacher
-> sc_domainOf man
-> sc_playRole done-by
-> sc_participantOf teach
{lerares}Noun, Dutch // lit. female teacher
-> sc_domainOf woman
-> sc_playRole done-by
-> sc_participantOf teach
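The three mappings above can be compared mechanically to see what the synsets share and where they differ. A minimal sketch, with each mapping represented as a set of (relation, ontology term) pairs; this representation and the helper names are assumptions for illustration, not the KYOTO data format.

```python
# Synset-to-ontology mappings copied from the slide above.
MAPPINGS = {
    ("teacher", "en"): {
        ("sc_domainOf", "human"),
        ("sc_playRole", "done-by"),
        ("sc_participantOf", "teach"),
    },
    ("leraar", "nl"): {
        ("sc_domainOf", "man"),
        ("sc_playRole", "done-by"),
        ("sc_participantOf", "teach"),
    },
    ("lerares", "nl"): {
        ("sc_domainOf", "woman"),
        ("sc_playRole", "done-by"),
        ("sc_participantOf", "teach"),
    },
}


def shared(a, b, mappings=MAPPINGS):
    """Mapping statements that two synsets have in common."""
    return mappings[a] & mappings[b]


def differing(a, b, mappings=MAPPINGS):
    """Mapping statements that hold for only one of the two synsets."""
    return mappings[a] ^ mappings[b]


COMMON = shared(("leraar", "nl"), ("lerares", "nl"))
DIFFERENCE = differing(("leraar", "nl"), ("lerares", "nl"))
```

The Dutch synsets share the role and the process and differ only in the sc_domainOf restriction, which is how gender marking is kept out of the type hierarchy.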
Wordnet-LMF
<LexicalEntry id="footprint">
<Lemma writtenForm="footprint" partOfSpeech="n"/>
<Sense id="footmark_1" synset="eng-30-06645039-n">
<MonolingualExternalRefs>
<MonolingualExternalRef externalSystem="Wordnet3.0" externalReference="" />
</MonolingualExternalRefs>
</Sense>
</LexicalEntry>
<Synset/>
<SenseAxis/>
<SenseAxis id="sa_ita16-eng30_001" relType="eq_synonym">
<Target ID="ita-16-1251-n" />
<Target ID="eng-30-13480848-n"/>
</SenseAxis>
WN-LMF Synset relations
<Synset id="eng-30-06645039-n" baseConcept="0"> <!-- footprint -->
<Definition gloss="mark of a foot or shoe on a surface">
<Statement example="the police made casts of the footprints in the soft earth outside the window" />
</Definition>
<OntologicalMetaProperties rigidValue="1">
<rigid score="0.57" author="Rudify1.0" date="2008-07-01"/>
<non-rigid score="0.09" author="Rudify1.0" date="2008-07-01"/>
</OntologicalMetaProperties>
<SynsetRelations/>
<MonolingualExternalRefs>
<MonolingualExternalRef externalSystem="SUMO" externalRef="superficialPart" relType="at"/>
<MonolingualExternalRef externalSystem="KYO" externalRef="mark" relType="sc_subclassOf"/>
</MonolingualExternalRefs>
</Synset>
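Reading the ontology links out of such a synset record can be sketched with Python's standard xml.etree.ElementTree module. The fragment below is a simplified copy of the footprint synset above (well-formed, without the meta-properties); the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed copy of the footprint synset shown above.
SYNSET_XML = """
<Synset id="eng-30-06645039-n" baseConcept="0">
  <Definition gloss="mark of a foot or shoe on a surface"/>
  <MonolingualExternalRefs>
    <MonolingualExternalRef externalSystem="SUMO" externalRef="superficialPart" relType="at"/>
    <MonolingualExternalRef externalSystem="KYO" externalRef="mark" relType="sc_subclassOf"/>
  </MonolingualExternalRefs>
</Synset>
"""


def external_refs(xml_text):
    """Return (system, relation type, reference) for each external link."""
    root = ET.fromstring(xml_text.strip())
    return [
        (ref.get("externalSystem"), ref.get("relType"), ref.get("externalRef"))
        for ref in root.iter("MonolingualExternalRef")
    ]


REFS = external_refs(SYNSET_XML)
SYNSET_ID = ET.fromstring(SYNSET_XML.strip()).get("id")
```

Each tuple names the external resource, the mapping relation and the target concept, for example the sc_subclassOf link from the synset to the KYOTO ontology class mark.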
WN-LMF Synset relations
<Synset id="eng-30-06852312-n" baseConcept="0"> <!-- migration bird -->
<Definition gloss="birds that migrate in winter to warmer regions"/>
<OntologicalMetaProperties rigidValue="0">
<rigid score="0.00" author="Rudify1.0" date="2008-07-01"/>
<non-rigid score="0.69" author="Rudify1.0" date="2008-07-01"/>
</OntologicalMetaProperties>
<SynsetRelations/>
<MonolingualExternalRefs>
<Statement>
<MonolingualExternalRef externalSystem="KY" externalRef="bird" relType="sc_domainOf"/>
<MonolingualExternalRef externalSystem="KY" externalRef="done-by" relType="sc_playRole"/>
<MonolingualExternalRef externalSystem="KY" externalRef="migration" relType="sc_participantOf"/>
</Statement>
</MonolingualExternalRefs>
</Synset>
WN-LMF Synset relations
<Synset id="eng-30-02356039-n" baseConcept="0"> <!-- wonder woman -->
<Definition gloss="a woman that can achieve great things"/>
<OntologicalMetaProperties rigidValue="0">
<rigid score="0.00" author="Rudify1.0" date="2008-07-01"/>
<non-rigid score="0.69" author="Rudify1.0" date="2008-07-01"/>
</OntologicalMetaProperties>
<Subjectivity positive="1" negative="0"/>
<SynsetRelations/>
<MonolingualExternalRefs/>
</Synset>
KYOTO project: Wikyoto editor
http://www.wikyoto.net/