Exploratory learning

AUTOMATIC GLOSS FINDING
for a Knowledge Base using Ontological Constraints
Bhavana Dalvi (PhD Student, LTI)
Work done with:
Prof. William Cohen, CMU
Prof. Einat Minkov, University of Haifa
Prof. Partha Talukdar, IISc Bangalore
Motivation
Need for gloss finding
KBs are useful for many NLP tasks, e.g. question answering.
• There is a lot of research in fact extraction to populate KBs.
• Glosses can further help in applications like:
- Word/entity sense disambiguation
- Information retrieval
• Automatically constructed KBs lack glosses, e.g. NELL, YAGO.

Example: Gloss finding
Class constraints:
• Inclusion: every entity of type “Fruit” is also of type “Food”.
• Mutual exclusion: if an entity is of type “Food”, then it cannot be of type “Organization”.
Knowledge bases: NELL / Freebase / YAGO
Candidate glosses: DBpedia abstracts / Wiktionary definitions
Gloss Finding
Problem Definition
Input (with examples):
• KB classes, e.g. Food, Fruit, Company …
• Ontological constraints (subset, mutex), e.g. Fruit ⊆ Food, Food ∩ Company = ∅
• Entities E belonging to KB categories, e.g. Banana, Microsoft
• Lexical strings L that refer to entities in E, e.g. ‘MS’, ‘microsoft inc’
• Candidate glosses, e.g. G3: “Apple, formerly Apple Computer Inc., is an American multinational corporation headquartered in Cupertino …”
Output: a matching of candidate glosses to entities in the KB, e.g. (Apple, G3) → Company:Apple
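As a concrete picture of these inputs, the sketch below encodes the toy example in Python. The dictionary layout and the names `SUBSET`, `MUTEX`, `LEXICON` and `is_consistent` are hypothetical, chosen only for illustration; the slides do not specify a data format. `is_consistent` checks whether a set of class labels obeys the subset and mutex constraints.

```python
# Toy instance of the gloss-finding input; the representation is an assumption.
SUBSET = {("Fruit", "Food")}                # (child, parent): Fruit ⊆ Food
MUTEX = {frozenset({"Food", "Company"})}    # Food ∩ Company = ∅

# Lexical strings -> KB entities they may refer to (entities written "Class:Name").
LEXICON = {
    "banana": {"Fruit:Banana"},
    "apple": {"Fruit:Apple", "Company:Apple"},
    "microsoft": {"Company:Microsoft"},
    "ms": {"Company:Microsoft"},
    "microsoft inc": {"Company:Microsoft"},
}


def is_consistent(labels: set[str]) -> bool:
    """Return True if a set of class labels satisfies all subset and mutex constraints."""
    for child, parent in SUBSET:
        if child in labels and parent not in labels:
            return False  # inclusion violated: child class present without its parent
    for pair in MUTEX:
        if pair <= labels:
            return False  # mutual exclusion violated: both classes present
    return True
```

For example, {Fruit, Food} is consistent, while {Fruit} alone breaks the inclusion constraint and {Food, Company} breaks the mutex.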
Can we use existing techniques?
Problem: match potential glosses to the appropriate entities in the KB.
• Entity linking assumes the existence of glosses on the KB side.
→ The input KB does not have glosses: a chicken-and-egg problem.
• In ontology alignment, both ends being matched are structured databases.
• Our problem is asymmetric:
- a structured KB on one side, without glosses;
- candidate glosses on the other, containing text but no structure.
Proposed Gloss Finding Procedure
• Decide the head-NP for each gloss, i.e., the NP being defined.
G3: Apple, formerly Apple Computer Inc., is an American multinational corporation headquartered in Cupertino …
• Select the candidate glosses whose head-NP string-matches a KB entity. For each gloss, this gives a set of candidate KB entities.
(Apple, G3) → {Fruit:Apple, Company:Apple}
• Classify the head-NP into KB classes using ontological constraints.
(Apple, G3) → Company
• Choose the KB entity match based on the chosen KB category.
(Apple, G3) → Company:Apple
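The four steps (head-NP, string match, classification, entity choice) can be sketched in Python as below, under loud assumptions: the head-NP heuristic (text before the first comma or " is ") is a hypothetical stand-in for a real NP chunker, `LEXICON` is a toy lexicon, and the classification step is not implemented; its output class is simply passed in.

```python
# Toy lexicon (hypothetical): lexical string -> candidate KB entities "Class:Name".
LEXICON = {
    "apple": {"Fruit:Apple", "Company:Apple"},
    "banana": {"Fruit:Banana"},
}


def head_np(gloss: str) -> str:
    """Step 1 (crude heuristic, an assumption): text before the first comma or ' is '."""
    first_clause = gloss.split(",")[0]
    return first_clause.split(" is ")[0].strip().lower()


def candidate_entities(gloss: str) -> set[str]:
    """Step 2: string-match the head-NP against the KB lexicon."""
    return LEXICON.get(head_np(gloss), set())


def choose_entity(candidates: set[str], predicted_class: str):
    """Step 4: keep the candidate whose class equals the predicted class (else None)."""
    matches = [e for e in candidates if e.split(":", 1)[0] == predicted_class]
    return matches[0] if len(matches) == 1 else None


g3 = ("Apple, formerly Apple Computer Inc., is an American multinational "
      "corporation headquartered in Cupertino")
cands = candidate_entities(g3)           # {"Fruit:Apple", "Company:Apple"}
print(choose_entity(cands, "Company"))   # step 3 output assumed to be "Company"
```

Returning `None` when zero or several candidates share the predicted class mirrors the assumption that, given the category, a mention is unambiguous.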
Building Classifiers
Training classifiers for KB categories
• Test: ambiguous glosses
• Train: unambiguous glosses
Assumptions
• If a gloss has only one candidate entity match in the KB, then that match is correct,
→ i.e., we assume that the KB is always correct and complete in terms of senses.
→ This assumption holds for 81% of the NELL dataset.
• Given the category, a mention is unambiguous [Suchanek WWW’07, Nakashole ACL’13],
→ i.e., we can differentiate between entities of different categories, but not within a category.
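Under the first assumption, the train/test split follows directly from the number of candidate entities per gloss. A minimal sketch; the function name and gloss ids are hypothetical, and a gloss with no candidate entity is simply dropped here:

```python
def split_by_ambiguity(candidates):
    """Split glosses by how many KB entities their head-NP matches.

    candidates: gloss id -> set of candidate entities written "Class:Name".
    Returns (train, test): train maps each unambiguous gloss to the class of its
    single candidate (used as a seed label); test keeps ambiguous glosses with
    their candidate sets. Glosses with no candidate are dropped.
    """
    train, test = {}, {}
    for gloss_id, entities in candidates.items():
        if len(entities) == 1:
            (entity,) = entities
            train[gloss_id] = entity.split(":", 1)[0]
        elif len(entities) > 1:
            test[gloss_id] = entities
    return train, test


train, test = split_by_ambiguity({
    "G1": {"Fruit:Banana"},                    # unambiguous -> seed labeled Fruit
    "G3": {"Fruit:Apple", "Company:Apple"},    # ambiguous -> to be classified
    "G9": set(),                               # no KB match -> ignored
})
```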
Methods
• Baselines
- SVM: train binary classifiers using unambiguous glosses, then predict categories for ambiguous glosses.
- Label propagation: PIDGIN [Wijaya et al. CIKM’13], a graph-based label propagation method.
• GLOFIN: semi-supervised EM + use of ontological constraints.
Proposed Method: GLOFIN
Initialize the model with a few seeds per class.
Iterate until convergence (of the data likelihood):
• E step: predict labels for unlabeled points. For each unlabeled data point:
- find P(Class | data point) for all classes;
- assign a consistent bit vector of labels in accordance with the ontological constraints.
• M step: recompute the model parameters using the seeds + predicted labels for unlabeled points.
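The EM loop can be sketched in Python as follows. This is a simplified hard-EM illustration, not the authors' implementation: it uses a multinomial Naïve Bayes model with Laplace smoothing, stops when the predicted labels stop changing (a proxy for the data-likelihood criterion), and replaces the constraint-aware E step with a hypothetical `assign` function. The one shown, `top_class_with_ancestors`, picks the most probable class and adds its ancestors so that subset constraints hold. `PARENT`, `fit_nb`, `posteriors` and `hard_em` are illustrative names, and the documents are toy data.

```python
import math
from collections import Counter


def fit_nb(docs, labels, classes, alpha=1.0):
    """M step: one multinomial word distribution per class, Laplace-smoothed.
    labels[i] is a set of classes (a bit vector); a doc counts toward each of them."""
    vocab = {w for d in docs for w in d}
    counts = {c: Counter() for c in classes}
    n_docs = Counter()
    for doc, labs in zip(docs, labels):
        for c in labs:
            counts[c].update(doc)
            n_docs[c] += 1
    total = sum(n_docs.values())
    model = {}
    for c in classes:
        denom = sum(counts[c].values()) + alpha * len(vocab)
        log_prior = math.log((n_docs[c] + 1) / (total + len(classes)))
        log_pw = {w: math.log((counts[c][w] + alpha) / denom) for w in vocab}
        model[c] = (log_prior, log_pw, math.log(alpha / denom))  # last: unseen words
    return model


def posteriors(model, doc):
    """P(class | doc) for every class, normalised with the log-sum-exp trick."""
    scores = {c: lp + sum(pw.get(w, unk) for w in doc)
              for c, (lp, pw, unk) in model.items()}
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    return {c: math.exp(s - top) / z for c, s in scores.items()}


PARENT = {"Fruit": "Food", "Food": None, "Company": None}  # hypothetical hierarchy


def top_class_with_ancestors(post):
    """Hypothetical E-step labeler: best class plus its ancestors (satisfies subset)."""
    c, labels = max(post, key=post.get), set()
    while c is not None:
        labels.add(c)
        c = PARENT[c]
    return labels


def hard_em(seed_docs, seed_labels, unlabeled, classes, assign, max_iter=20):
    model = fit_nb(seed_docs, seed_labels, classes)  # initialise from seeds only
    predicted, previous = [], None
    for _ in range(max_iter):
        predicted = [assign(posteriors(model, d)) for d in unlabeled]  # E step
        if predicted == previous:  # labels stable: stop
            break
        previous = predicted
        model = fit_nb(seed_docs + unlabeled, seed_labels + predicted, classes)  # M step
    return model, predicted


seeds = [["sweet", "fruit", "tree"], ["software", "company", "shares"]]
seed_labels = [{"Fruit", "Food"}, {"Company"}]
unlabeled = [["fruit", "sweet", "juice"], ["company", "software", "headquartered"]]
model, labels = hard_em(seeds, seed_labels, unlabeled,
                        ["Fruit", "Food", "Company"], top_class_with_ancestors)
```

Swapping `fit_nb`/`posteriors` for another class model changes the GLOFIN variant without touching the loop.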
Estimating class parameters and assignment probabilities
• Naïve Bayes: independent multinomial distributions per word
• K-Means: cosine similarity between centroid and data point
• von Mises-Fisher: data distributed on a unit hypersphere
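A rough sketch of how each variant can score a data point against a class, assuming sparse bag-of-words vectors stored as Python dicts. All function names are hypothetical. The vMF score omits the normalizer log C_d(κ), so it is only comparable across classes that share the same concentration κ; with a shared κ it ranks classes exactly like the cosine score.

```python
import math
from collections import Counter


def nb_log_likelihood(doc, word_probs):
    """Naive Bayes: words drawn independently from the class's multinomial.
    word_probs must already be smoothed so every word in doc has P > 0."""
    return sum(math.log(word_probs[w]) for w in doc)


def unit(v):
    """Scale a sparse vector (dict) to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v.values()))
    return {k: x / norm for k, x in v.items()} if norm else dict(v)


def cosine(a, b):
    """K-Means variant: cosine similarity between a data point and a centroid."""
    a, b = unit(a), unit(b)
    return sum(x * b.get(k, 0.0) for k, x in a.items())


def centroid(vectors):
    """Mean of the unit-normalised member vectors of a class."""
    total = Counter()
    for v in vectors:
        total.update(unit(v))
    return {k: x / len(vectors) for k, x in total.items()}


def vmf_log_density_unnormalised(x, mu, kappa):
    """von Mises-Fisher: log f(x; mu, kappa) = log C_d(kappa) + kappa * mu.x
    for unit x and mu; log C_d(kappa) is dropped (assumed shared across classes)."""
    return kappa * cosine(x, mu)
```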
Mixed Integer Linear Program
Input: P(C_j | X_i) for all classes C_j; class constraints (subset, mutex)
Output: a consistent bit vector y_ji for X_i (y_ji = 1 iff X_i is assigned class C_j)
Objective: max {likelihood of assignment − constraint-violation penalty}
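For a handful of classes, this optimization can be illustrated by exhaustive search instead of an MILP solver. The objective below is an assumption, not the paper's exact formulation: the "likelihood of assignment" is taken as the Bernoulli log-likelihood of each bit under P(C_j | X_i), and every violated subset or mutex constraint costs a fixed `penalty`. Enumeration is exponential in the number of classes, which is why a real system would hand the problem to an MILP solver.

```python
import itertools
import math


def best_consistent_labels(probs, subset, mutex, penalty=10.0, eps=1e-9):
    """Brute-force stand-in for the MILP (tiny class sets only).

    probs: class -> P(class | x); subset: (child, parent) pairs; mutex: (a, b) pairs.
    Score of a bit vector = Bernoulli log-likelihood - penalty * #violated constraints.
    """
    classes = sorted(probs)
    best, best_score = None, -math.inf
    for bits in itertools.product((0, 1), repeat=len(classes)):
        y = dict(zip(classes, bits))
        ll = sum(math.log(max(probs[c], eps)) if y[c] else math.log(max(1 - probs[c], eps))
                 for c in classes)
        violations = sum(y[ch] and not y[pa] for ch, pa in subset)  # child without parent
        violations += sum(y[a] and y[b] for a, b in mutex)          # both exclusive classes
        score = ll - penalty * violations
        if score > best_score:
            best, best_score = y, score
    return {c for c, on in best.items() if on}


probs = {"Fruit": 0.7, "Food": 0.4, "Company": 0.6}  # toy P(C_j | X_i)
labels = best_consistent_labels(probs, subset=[("Fruit", "Food")],
                                mutex=[("Food", "Company")])
```

Here the independent per-class choice would be {Fruit, Company}, which breaks Fruit ⊆ Food; with the penalty, the sketch returns {Fruit, Food}.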
Experiments
Candidate glosses
• DBpedia is a database derived from Wikipedia.
• We use short abstracts (definitions of up to 500 characters, taken from the Wikipedia page).
E.g. “McGill University is a research university located in Montreal Quebec Canada Founded in 1821 during the British colonial era the university bears the name of James McGill a prominent Montreal merchant from Glasgow Scotland and alumnus of Glasgow University whose bequest formed the beginning of the university.”
Knowledge bases
GLOFIN vs. SVM & Label propagation
[Figure: Freebase dataset, performance on ambiguous glosses. Precision, recall and F1 (y-axis 0 to 100) for SVM, Label Propagation and GLOFIN-Naïve-Bayes.]
GLOFIN vs. SVM & Label propagation
[Figure: NELL dataset, performance on ambiguous glosses. Precision, recall and F1 (y-axis 0 to 80) for SVM, Label Propagation and GLOFIN-Naïve-Bayes.]
Are the datasets close to the real world?
• A large fraction of the data is used for training: 80% of NELL, 90% of Freebase.
• In real-world scenarios, the training data might be only a small fraction of the dataset.
• We simulate this by using 10% of the unambiguous glosses for training.
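The 10% setting amounts to subsampling the seed set. A minimal sketch, assuming uniform random sampling with a fixed seed for reproducibility; the slides do not say how the 10% was drawn, and `subsample_seeds` is a hypothetical name. No per-class stratification is attempted.

```python
import random


def subsample_seeds(train, fraction=0.10, seed=0):
    """Keep a random `fraction` of the unambiguous (training) glosses.

    train: gloss id -> class label. At least one gloss is always kept.
    """
    ids = sorted(train)                      # sort so results do not depend on dict order
    k = max(1, round(fraction * len(ids)))
    chosen = random.Random(seed).sample(ids, k)
    return {g: train[g] for g in chosen}
```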
Small amount of training data
[Figure: Freebase dataset, performance on ambiguous glosses. Precision, recall and F1 (y-axis 0 to 100) for SVM, Label Propagation and GLOFIN-Naïve-Bayes.]
Small amount of training data
[Figure: NELL dataset, performance on ambiguous glosses. Precision, recall and F1 (y-axis 0 to 80) for SVM, Label Propagation and GLOFIN-Naïve-Bayes.]
Compare variants of GLOFIN
[Figure: Freebase dataset. Flat vs. hierarchical GLOFIN for K-Means, von Mises-Fisher and Naïve Bayes (y-axis 60 to 85).]
Compare variants of GLOFIN
[Figure: NELL dataset. Flat vs. hierarchical GLOFIN for K-Means, von Mises-Fisher and Naïve Bayes (y-axis 0 to 70).]
Some more experiments …
• Evaluating the quality of automatically acquired seeds
• Manually creating a gold standard for the NELL dataset
• Different ways of scaling GLOFIN
• NELL to Freebase mappings via common glosses
http://www.cs.cmu.edu/~bbd
Conclusions
And Future Work …
Conclusions
• A completely unsupervised method for gloss finding:
- using unambiguous matches as training data;
- hierarchical classification instead of entity linking.
• Our proposed method GLOFIN: GLOFIN ≥ Label Propagation ≥ SVM
• Variants of hierarchical GLOFIN: Naïve Bayes ≥ K-Means, von Mises-Fisher
• Ontological constraints help all GLOFIN variants: Hierarchical GLOFIN ≥ Flat GLOFIN
• In future, we would like to add new entities to the KB.
head-NP | Gloss | Candidate NELL entities | Entity selected by GLOFIN
McGill_University | McGill University is a research university located in Montreal Quebec Canada Founded in 1821 during the British colonial era the university bears the name of James McGill a prominent … | University:E, Sports_team:E | University:E
Kingston_upon_Hull | Kingston upon Hull frequently referred to as Hull is a city and unitary authority area in the ceremonial county of the East Riding of Yorkshire England It stands on the River Hull at its junction with … | City:E, Visual_Artist:E | City:E
Robert_Southey | Robert Southey was an English poet of the Romantic school one of the so called Lake Poets and Poet Laureate for 30 years from 1813 to his death in 1843 Although his fame has been long eclipsed by that … | Person_Europe:E, Person_Africa:E, Politician_USA:E | Person_Europe:E
Thank You
Questions?
Extra Slides
Comparison of GLOFIN approximations
Eval: quality of seeds for NELL KB
• Noisy seeds: only 81% of leaf-category assignments are correct.
• Hierarchical labeling can help: 94% of higher-level category labels are correct.
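The gap between 81% and 94% arises because a wrong leaf can still sit under the right higher-level category. The toy sketch below measures accuracy at both levels; the hierarchy `PARENT`, the (predicted, gold) pairs and the function names are hypothetical, not NELL data.

```python
PARENT = {"Fruit": "Food", "Vegetable": "Food", "Food": None, "Company": None}


def top_level(cls, parent):
    """Walk up the subset hierarchy to the root category."""
    while parent.get(cls) is not None:
        cls = parent[cls]
    return cls


def accuracy_at_levels(pairs, parent):
    """pairs: (predicted leaf, gold leaf). Returns (leaf accuracy, top-level accuracy)."""
    leaf = sum(p == g for p, g in pairs) / len(pairs)
    top = sum(top_level(p, parent) == top_level(g, parent) for p, g in pairs) / len(pairs)
    return leaf, top


pairs = [("Fruit", "Fruit"), ("Fruit", "Vegetable"),
         ("Company", "Company"), ("Vegetable", "Company")]
print(accuracy_at_levels(pairs, PARENT))  # (0.5, 0.75)
```

The Fruit/Vegetable confusion counts as an error at the leaf level but as correct at the Food level.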
Creating a gold standard for NELL
• Gold standard for evaluation on ambiguous glosses
• For most glosses, the precise category is part of NELL.
NELL–Freebase mappings via common glosses
Pros and Cons of GLOFIN
Advantages
• Generative EM framework that can build on SSL methods: Naïve Bayes, K-Means, vMF
• Can label unseen data points once the models are learnt.
Limitations
• Assumption: the input KB is complete and accurate.
• All experiments are done in a transductive setting: need to extend to missing entities and categories in the KB.
Future work …
• Adding new entities to existing KB categories
→ KBs are usually incomplete w.r.t. coverage of entities.
→ GLOFIN classifies mentions into categories.
• Introducing new clusters of entities: missing categories in the KB
→ Extensions similar to Exploratory EM [Dalvi et al. ECML’13]
→ New categories: entities belonging to them, along with glosses for those entities.