Effective Entity Recognition and Typing
by Relation Phrase-Based Clustering
2015-11-16
Content
• Motivation
• Definition
• Problem model
• ClusType Algorithm
• Experiments
Motivation
• Fine-grained type information is useful for downstream applications (e.g., it
improved the F1 score by 93% for a relation extraction system [1]).
• Traditional named entity recognition systems are designed for a few major
types (e.g., person, organization, location) and general domains (e.g., news),
and require additional steps to adapt to a new domain or new types.
• Entity linking techniques suffer from limited coverage and freshness (e.g., over
50% of entities mentioned in Web documents are unlinkable [2]).
• Previous methods have difficulty handling entity mentions with sparse
context: there are often many ways to describe even the same relation between
two entities (e.g., “beat” and “won the game 34-28 over”).
Definition
Problem model
• Based on several hypotheses
• Hypothesis 1: Entity-Relation Co-occurrences
• If surface name c often appears as the left (right) argument of relation phrase
p, then c's type indicator tends to be similar to the corresponding type
indicator in p's type signature.
• Hypothesis 2: Mention correlation
• If there exists a strong correlation (i.e., within sentence, common neighbor
mentions) between two candidate mentions that share the same name, then
their type indicators tend to be similar.
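Hypothesis 2 can be illustrated with a small sketch. It assumes, purely for illustration, that correlation between two same-name mentions is measured by the Jaccard overlap of their context words; the mention ids, context words, and the 0.5 threshold are hypothetical and not taken from the slides.

```python
def context_overlap(context_a, context_b):
    """Jaccard overlap between two lists of context words (treated as sets)."""
    a, b = set(context_a), set(context_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical mentions: (mention id, surface name, context words).
mentions = [
    ("d1_Washington", "Washington", ["beat", "huskies", "game", "score"]),
    ("d2_Washington", "Washington", ["huskies", "game", "won", "score"]),
    ("d3_Washington", "Washington", ["senate", "bill", "president"]),
]


def correlated_pairs(mentions, threshold=0.5):
    """Return same-name mention pairs whose context overlap meets the threshold."""
    pairs = []
    for i in range(len(mentions)):
        for j in range(i + 1, len(mentions)):
            id_i, name_i, ctx_i = mentions[i]
            id_j, name_j, ctx_j = mentions[j]
            if name_i != name_j:
                continue  # the hypothesis only concerns mentions sharing a name
            score = context_overlap(ctx_i, ctx_j)
            if score >= threshold:
                pairs.append((id_i, id_j, score))
    return pairs


pairs = correlated_pairs(mentions)
```

Under this assumption, the two sports-related mentions of "Washington" are linked and would be pushed toward the same type, while the politics-related mention stays unconnected.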
Problem model
• Hypothesis 3: Type signature consistency.
• If two relation phrases have similar cluster memberships, the type indicators
of their left and right arguments (type signature) tend to be similar,
respectively.
• Hypothesis 4: Relation phrase similarity.
• Two relation phrases tend to have similar cluster memberships, if they have
similar (1) strings; (2) context words; and (3) left and right argument type
indicators.
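As a rough illustration of view (3) in Hypothesis 4, the sketch below compares relation phrases by the cosine similarity of their left-argument type counts. The counts, the type order, and the choice of cosine similarity are assumptions made for this example; they are not stated in the slides.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical left-argument type counts, ordered as (person, team, location).
left_type_counts = {
    "beat": [1, 9, 0],
    "won the game 34-28 over": [0, 8, 1],
    "was born in": [10, 0, 0],
}

sim_same = cosine(left_type_counts["beat"],
                  left_type_counts["won the game 34-28 over"])
sim_diff = cosine(left_type_counts["beat"],
                  left_type_counts["was born in"])
```

Here "beat" and "won the game 34-28 over" share no words, yet their argument types are nearly identical, so they are good candidates for the same cluster; "was born in" is not.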
[Figure: illustration of Hypotheses H1, H2, H3, and H4]
ClusType Algorithm
• Framework Overview
• 1. Perform phrase mining on a POS-tagged corpus to extract candidate entity
mentions and relation phrases, and construct a heterogeneous graph G.
• 2. Collect seed entity mentions M_L as labels by linking the extracted candidate
mentions M to the KB Ψ.
• 3. Estimate the type indicator y for each unlinkable candidate mention m ∈ M_U
with G, using clustering-integrated type propagation.
Candidate Generation [4]
• 1. Mine frequent contiguous patterns up to a fixed length.
• 2. Use a greedy agglomerative algorithm to generate longer phrases,
terminating when the next highest-scoring merge does not meet a
pre-defined significance threshold.
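Step 1 can be sketched as plain n-gram counting with a minimum support. This is a minimal illustration, not the algorithm from [4]: the function name, `max_len`, `min_support`, and the toy corpus are all hypothetical, and the agglomerative merging of step 2 is not shown.

```python
from collections import Counter


def frequent_ngrams(sentences, max_len=3, min_support=2):
    """Count contiguous token patterns up to max_len and keep the frequent ones."""
    counts = Counter()
    for tokens in sentences:
        for n in range(1, max_len + 1):
            for start in range(len(tokens) - n + 1):
                counts[tuple(tokens[start:start + n])] += 1
    # Keep only patterns that occur at least min_support times.
    return {gram: c for gram, c in counts.items() if c >= min_support}


# Hypothetical two-sentence corpus, already tokenized.
corpus = [
    "the support vector machine is a classifier".split(),
    "a support vector machine separates classes".split(),
]
patterns = frequent_ngrams(corpus)
```

With this toy corpus, "support vector machine" survives as a frequent pattern, while one-off sequences such as "a support" are dropped.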
Construction of Graph G
• Name-Relation Phrase Subgraph
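One way to picture this subgraph is as two sets of weighted links, one for names appearing as the left argument of a relation phrase and one for the right argument. The sketch below assumes the corpus has already been reduced to (left name, relation phrase, right name) triples and uses raw co-occurrence counts as link weights; both the input format and the weighting are assumptions for illustration.

```python
from collections import defaultdict


def build_name_relation_links(triples):
    """Count how often each name is the left or right argument of each phrase."""
    left = defaultdict(int)
    right = defaultdict(int)
    for left_name, phrase, right_name in triples:
        left[(left_name, phrase)] += 1
        right[(right_name, phrase)] += 1
    return dict(left), dict(right)


# Hypothetical extracted triples.
triples = [
    ("Washington", "beat", "Oregon"),
    ("Washington", "beat", "Stanford"),
    ("Oregon", "won the game 34-28 over", "Stanford"),
]
left_links, right_links = build_name_relation_links(triples)
```

These counts are the raw material for Hypothesis 1: a name that is often the left argument of "beat" should share the left type of that phrase's type signature.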
Construction of Graph G
• Mention Correlation Subgraph
• Mention-Name Subgraph
• Washington <-> 76_Washington
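The mention-name links can be sketched as a simple mapping. The code assumes that a mention id has the form "<id>_<surface name>", as the example "76_Washington" suggests; what the numeric prefix identifies is not stated in the slides, so it is treated here as an opaque id.

```python
from collections import defaultdict


def mention_to_name(mention_id):
    """Split '<id>_<surface name>' at the first underscore and return the name."""
    _prefix, _sep, name = mention_id.partition("_")
    return name


def group_mentions_by_name(mention_ids):
    """Link every mention to its surface name (one name, many mentions)."""
    groups = defaultdict(list)
    for mention_id in mention_ids:
        groups[mention_to_name(mention_id)].append(mention_id)
    return dict(groups)


# Hypothetical mention ids; only "76_Washington" appears in the slides.
groups = group_mentions_by_name(["76_Washington", "12_Washington", "76_Oregon"])
```

Keeping mentions separate from names lets two mentions of "Washington" end up with different types even though they share one name node.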
Clustering-integrated Type Propagation
• 1. Seed Mention Generation
• use an entity name disambiguation tool (http://spotlight.dbpedia.org/) and
keep only the entities mapped with high confidence scores (η > 0.8)
• 2. Joint Optimization
• the type indicators of entity names C
• the type signatures of relation phrases {P_L, P_R}
• F follows from Hypothesis 1 to model type propagation
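The seed selection in step 1 amounts to a confidence filter. The sketch below does not call the disambiguation tool; it assumes its output has already been collected into a dictionary of mention id to (type, confidence), which is a hypothetical format chosen for this example. Only the threshold η > 0.8 comes from the slide.

```python
def select_seed_mentions(linked, eta=0.8):
    """Split mentions into confident seeds and the remaining unlinkable ones."""
    seeds = {}
    unlinkable = []
    for mention_id, (entity_type, confidence) in linked.items():
        if entity_type is not None and confidence > eta:
            seeds[mention_id] = entity_type  # kept as a labeled seed
        else:
            unlinkable.append(mention_id)    # its type must be estimated later
    return seeds, unlinkable


# Hypothetical linker output: mention id -> (type or None, confidence score).
linked = {
    "76_Washington": ("Location", 0.93),
    "12_Washington": ("Person", 0.41),
    "76_Huskies": (None, 0.0),
}
seeds, unlinkable = select_seed_mentions(linked)
```

Mentions that fall below the threshold are treated the same as mentions with no link at all, so a low-confidence link never becomes supervision.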
Clustering-integrated Type Propagation
• The clustering term follows Hypotheses 3 and 4 to model the
multi-view relation phrase clustering.
• The mention term models the type indicator for each candidate entity
mention, the mention-mention links, and the supervision from seed mentions.
• Finally, solve the real-valued relaxation of (2) and predict the exact type of
each candidate mention from the resulting type indicators.
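A common way to turn real-valued indicators into a single type, assumed here because the prediction formula is not reproduced in the slides, is to pick the type with the largest score for each mention. The type names, mention ids, and scores below are hypothetical.

```python
def predict_types(scores, type_names):
    """Assign each mention the type with the highest real-valued score."""
    predictions = {}
    for mention_id, row in scores.items():
        best = max(range(len(row)), key=lambda k: row[k])
        predictions[mention_id] = type_names[best]
    return predictions


type_names = ["Person", "Organization", "Location"]

# Hypothetical relaxed type indicators, one row of scores per mention.
scores = {
    "12_Washington": [0.7, 0.1, 0.2],
    "31_Huskies": [0.05, 0.8, 0.15],
}
predicted = predict_types(scores, type_names)
```

In this sketch ties go to the first type in `type_names`, since `max` returns the first maximal index.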
Experiments
References
• [1] X. Ling and D. S. Weld. Fine-grained entity recognition. AAAI, 2012.
• [2] T. Lin, Mausam, and O. Etzioni. No noun phrase left behind:
detecting and typing unlinkable entities. EMNLP-CoNLL, 2012.
• [3] X. Ren, A. El-Kishky, C. Wang, et al. ClusType: Effective
entity recognition and typing by relation phrase-based clustering.
KDD, 2015.
• [4] A. El-Kishky, Y. Song, C. Wang, C. R. Voss, and J. Han. Scalable
topical phrase mining from text corpora. VLDB, 2015.