Biperpedia: An Ontology for Search Applications

Authors: Rahul Gupta, Alon Halevy, Xuezhi Wang, Steven Whang, Fei Wu
Presented By: Pan Dai
Spring2017CSCI586 | 1
Outline
• Introduction
• Problem Definition
• The Biperpedia System (Architecture of Biperpedia)
• Query Stream Extraction
• Extraction from Web Text
• Synonym Detection
• Attribute Quality Evaluation
• Finding Best Classes: Placement Algorithm
• Interpreting Web Tables: Mapping Algorithm
• Conclusions & Discussion
Biperpedia:
An ontology of binary attributes that contains up to two orders of magnitude more attributes than Freebase
Scope:
An attribute in Biperpedia is a relationship between:
• a pair of entities (e.g., CAPITAL of countries)
• an entity and a value (e.g., COFFEE PRODUCTION)
• an entity and a narrative (e.g., CULTURE)
Figure 1: The elements of a Biperpedia attribute
Biperpedia:
• Goal: to support search applications
• Includes constructs that facilitate query and text understanding
• Attaches to every attribute:
1) Common misspellings of the attribute
2) Synonyms (some of which may be approximate)
3) Other related attributes (even if the specific relationship is unknown)
4) Common text phrases that mention the attribute
Problem Definition
To find schema-level attributes that can be associated with classes of entities.
(e.g., CAPITAL, GDP, LANGUAGES SPOKEN → attributes of COUNTRIES.)
• Class hierarchy
• Name, domain class, and range
• Synonyms and misspellings
• Related attributes and mentions
• Provenance
• Differences from a traditional ontology
• Evaluation
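The elements listed above can be pictured as a single record per attribute. The following is a minimal sketch only: the class name `Attribute`, its field names, and the example values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Attribute:
    """Hypothetical record for one attribute; all field names are assumptions."""
    name: str                       # e.g. "CAPITAL"
    domain_class: str               # class the attribute is attached to
    range_type: str                 # assumed labels: "entity", "value", "narrative"
    synonyms: Set[str] = field(default_factory=set)
    misspellings: Set[str] = field(default_factory=set)
    related: Set[str] = field(default_factory=set)
    mentions: List[str] = field(default_factory=list)   # text phrases that mention it
    provenance: Set[str] = field(default_factory=set)   # sources that produced it

    def surface_forms(self) -> Set[str]:
        """All lowercase strings under which the attribute may appear."""
        forms = {self.name} | self.synonyms | self.misspellings
        return {f.lower() for f in forms}


capital = Attribute(
    name="CAPITAL",
    domain_class="COUNTRIES",
    range_type="entity",
    synonyms={"capital city"},
    misspellings={"capitol"},
    provenance={"query stream", "web text"},
)
```

Keeping misspellings and synonyms on the record itself means a query or table header can be matched against `surface_forms()` without a separate lookup.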
Biperpedia System
• Phase 1: Extract candidate attributes
• Phase 2: Merge extractions & enhance ontology
o Attribute extraction
o Ontology enhancement
o Misspellings
o Synonyms
o Sub-attributes
o Best class
o Categorization as numeric/textual/non-atomic
Biperpedia extraction pipeline
Query Stream Extraction:
• Find candidate attributes
• Reconcile to Freebase
• Remove co-reference mentions
• Output attribute candidates
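As a rough illustration of the first step, candidate attributes can be pulled from queries that pair an attribute phrase with a known entity. This is a sketch under stated assumptions: the two query shapes ("A of E" and "E's A"), the function name `extract_candidates`, and the sample queries are all hypothetical, and the reconciliation and co-reference steps are not shown.

```python
import re
from collections import Counter

# Assumed query shapes (not taken from the paper):
#   "<attribute> of <entity>"   and   "<entity>'s <attribute>"
OF_PATTERN = re.compile(
    r"^(?:what is the |the )?(?P<attr>[a-z ]+?) of (?P<entity>[a-z ]+)$"
)
POSSESSIVE_PATTERN = re.compile(r"^(?P<entity>[a-z ]+)'s (?P<attr>[a-z ]+)$")


def extract_candidates(queries, known_entities):
    """Count attribute phrases that co-occur with a known entity in a query."""
    counts = Counter()
    for query in queries:
        text = query.strip().lower()
        for pattern in (OF_PATTERN, POSSESSIVE_PATTERN):
            match = pattern.match(text)
            if match and match.group("entity") in known_entities:
                counts[match.group("attr")] += 1
                break
    return counts


queries = [
    "capital of france",
    "what is the population of brazil",
    "brazil's capital",
    "weather tomorrow",
    "capital of atlantis",   # entity is not in the known set, so it is ignored
]
candidates = extract_candidates(queries, known_entities={"france", "brazil"})
```

Requiring the entity to be known is what keeps arbitrary "X of Y" queries from producing candidates.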
Extraction From Web Text
Extraction via distant supervision
Attribute extraction using induced patterns
• Need a pattern-induction pipeline to
learn the large variety of patterns
• Cannot simply operate with a small
set of hand-written patterns
Attribute Classification
Features
Classifier
Optimize the training objective:
• W: the weight vector to be trained
• F(x_i, y_i): the hashed feature vector for training attribute x_i labeled as y_i
• The L1/L2 regularization hyperparameters are set using cross-validation
• This objective is optimized using a standard off-the-shelf second-order solver
• For grid search and cross-validation
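The objective itself did not survive on this slide. For orientation only, one common form of an L1/L2-regularized log-linear objective over W and F is sketched below. It is a generic textbook form, not necessarily the authors' exact formula, and the symbols \lambda_1 and \lambda_2 for the two hyperparameters are assumed names.

```latex
% Generic sketch; \lambda_1, \lambda_2 are assumed names for the L1/L2 hyperparameters.
\max_{W} \; \sum_{i} \log p(y_i \mid x_i; W)
  \;-\; \lambda_1 \lVert W \rVert_1
  \;-\; \lambda_2 \lVert W \rVert_2^2,
\qquad
p(y \mid x; W) = \frac{\exp\bigl(W \cdot F(x, y)\bigr)}
                      {\sum_{y'} \exp\bigl(W \cdot F(x, y')\bigr)}
```

In a form like this, the L1 term pushes many weights to zero, while the L2 term keeps the remaining weights small.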
Experiments
Precision / Recall
Synonym Detection
e.g. TOURIST ATTRACTIONS and TOURIST SPOTS of a country are synonymous
For spelling correction, rely on the search engine
To detect synonyms, use an SVM classifier
The precision of the SVM-based synonymizer is 0.87
Attribute Quality
Experimental Setting
(Slide callouts: "Much bigger", "Does not exist")
Overall Quality
Finding Best Classes
Not too general, not too specific!
Placement Algorithm
Diversity!
Class C; Attribute A
Define S(C, A) as the maximal support that attribute A receives for class C from any of the sources
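The definition of S(C, A) amounts to a maximum over sources. The sketch below assumes each source reports a numeric support score per (class, attribute) pair. The source names, scores, and the function name `max_support` are hypothetical, and the rest of the placement algorithm (choosing a class that is neither too general nor too specific) is not shown.

```python
def max_support(support_by_source, cls, attr):
    """S(C, A): the largest support any single source gives `attr` on `cls`.

    A source that never saw the pair contributes 0.0.
    """
    return max(
        (scores.get((cls, attr), 0.0) for scores in support_by_source.values()),
        default=0.0,
    )


# Hypothetical support scores, for illustration only.
support_by_source = {
    "query_stream": {("COUNTRIES", "CAPITAL"): 0.9, ("LOCATIONS", "CAPITAL"): 0.4},
    "web_text": {("COUNTRIES", "CAPITAL"): 0.7, ("LOCATIONS", "CAPITAL"): 0.5},
}

s_countries = max_support(support_by_source, "COUNTRIES", "CAPITAL")
s_locations = max_support(support_by_source, "LOCATIONS", "CAPITAL")
s_unseen = max_support(support_by_source, "CITIES", "CAPITAL")
```

Taking the maximum rather than an average means one strong source is enough to support a placement, even if other sources never mention the pair.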
Placement Algorithm
Evaluation
• M_exact: ratio of the number of exact assignments to all assignments
• M_approx: ratio of the number of approximate assignments to all assignments
• M_filtered_exact (resp. M_filtered_approx) is the same as M_exact (resp. M_approx), applied only to attributes deemed good in the experiments
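These metrics are plain ratios over evaluator judgments, as the following sketch shows. It assumes each assignment carries exactly one label ("exact", "approx", or "wrong") and a flag saying whether the attribute was judged good. The label names, the helper `ratio`, and the sample data are hypothetical.

```python
def ratio(labels, target):
    """Fraction of `labels` equal to `target` (0.0 for an empty list)."""
    if not labels:
        return 0.0
    return sum(1 for label in labels if label == target) / len(labels)


# (attribute, evaluator label, attribute deemed good?) -- made-up judgments.
assignments = [
    ("CAPITAL", "exact", True),
    ("GDP", "approx", True),
    ("CULTURE", "wrong", True),
    ("ASDF", "wrong", False),
]

all_labels = [label for _, label, _ in assignments]
good_labels = [label for _, label, good in assignments if good]

m_exact = ratio(all_labels, "exact")
m_approx = ratio(all_labels, "approx")
m_filtered_exact = ratio(good_labels, "exact")
m_filtered_approx = ratio(good_labels, "approx")
```

The filtered variants only shrink the denominator, so they separate placement errors from errors caused by a bad attribute.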
Interpreting Web Tables
Mapping Algorithm
(1) Preprocess
(2) Match
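The slide names the two steps without detail, so the following only illustrates what a preprocess-then-match pass over table column headers could look like. The normalization rules, the exact-match lookup, and the names `preprocess`, `build_index`, and `match_headers` are assumptions, not the paper's mapping algorithm.

```python
import re


def preprocess(header):
    """Lowercase, drop parenthesized notes and punctuation, collapse whitespace."""
    text = re.sub(r"\(.*?\)", " ", header.lower())
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    return " ".join(text.split())


def build_index(ontology):
    """Map every normalized surface form (name, synonym, misspelling) to its attribute."""
    index = {}
    for attribute, forms in ontology.items():
        for form in [attribute, *forms]:
            index[preprocess(form)] = attribute
    return index


def match_headers(headers, index):
    """Return the matched attribute for each header, or None if nothing matches."""
    return {header: index.get(preprocess(header)) for header in headers}


# Hypothetical ontology fragment: attribute -> synonyms and misspellings.
ontology = {
    "CAPITAL": ["capital city", "capitol"],
    "POPULATION": ["number of inhabitants", "populaton"],
}
index = build_index(ontology)
mapping = match_headers(["Capital City", "Populaton (2010)", "Flag"], index)
```

Because synonyms and misspellings are indexed alongside the attribute name, headers that differ from the canonical name can still match.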
Interpreting Web Tables--Evaluation
Interpretation Quality
Comparison with Freebase
Interpreting Web Tables--Evaluation
Error Analysis: errors in detecting representative attributes
Causes of Errors:
1) Description constants and class hierarchy gaps
2) Insufficient information
3) Valid attributes on which the evaluators disagreed
4) Insufficient Biperpedia coverage
5) Sophisticated attribute names; noun-phrase recognition required
Conclusions & Discussion
Biperpedia → An ontology of binary attributes
Using the high-quality attributes to seed extraction from text
Developing methods for classifying different relationships
Mining a grammar for complex attribute names
Applying algorithms to query stream with possibly different results
References
Rahul Gupta, Alon Halevy, Xuezhi Wang, Steven Whang, Fei Wu.
Biperpedia: An Ontology for Search Applications. Proceedings of the
International Conference on Very Large Data Bases (VLDB 2014).