Biperpedia: An Ontology for Search Applications
Authors: Rahul Gupta, Alon Halevy, Xuezhi Wang, Steven Whang, Fei Wu
Presented by: Pan Dai
Spring 2017 | CSCI 586

Outline
Ø Introduction
Ø Problem Definition
Ø The Biperpedia System (Architecture of Biperpedia)
Ø Query Stream Extraction
Ø Extraction from Web Text
Ø Synonym Detection
Ø Attribute Quality Evaluation
Ø Finding Best Classes: Placement Algorithm
Ø Interpreting Web Tables: Mapping Algorithm
Ø Conclusions & Discussion

Biperpedia
An ontology of binary attributes that contains up to two orders of magnitude more attributes than Freebase.
Scope: an attribute in Biperpedia is a relationship between:
§ a pair of entities (e.g., CAPITAL of countries)
§ an entity and a value (e.g., COFFEE PRODUCTION)
§ an entity and a narrative (e.g., CULTURE)
(Figure 1: The elements of a Biperpedia attribute)

Biperpedia
v Goal: to support search applications
v Includes constructs that facilitate query and text understanding
v Attaches to every attribute:
1) a set of common misspellings of the attribute
2) synonyms (some of which may be approximate)
3) other related attributes (even if the specific relationship is unknown)
4) common text phrases that mention the attribute

Problem Definition
To find schema-level attributes that can be associated with classes of entities (e.g., CAPITAL, GDP, LANGUAGES SPOKEN → attributes of COUNTRIES).
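The elements attached to each attribute can be pictured as a single record. The paper does not prescribe a data model, so the following is only a hypothetical sketch; all field and class names are assumptions, not the authors' code.

```python
from dataclasses import dataclass, field

@dataclass
class BiperpediaAttribute:
    """Hypothetical sketch of one Biperpedia attribute record.

    Fields mirror the elements listed on the slides (name, domain class,
    range, synonyms, misspellings, mentions, provenance); the concrete
    representation here is an assumption for illustration only.
    """
    name: str                  # e.g. "capital"
    domain_class: str          # e.g. "Country"
    range_type: str            # "entity", "value", or "narrative"
    synonyms: list = field(default_factory=list)
    misspellings: list = field(default_factory=list)
    mentions: list = field(default_factory=list)    # text phrases mentioning the attribute
    provenance: list = field(default_factory=list)  # e.g. ["query-stream", "web-text"]

# Example: CAPITAL as an entity-to-entity attribute of COUNTRIES.
capital = BiperpediaAttribute(
    name="capital", domain_class="Country", range_type="entity",
    synonyms=["capital city"], misspellings=["capitol"],
    provenance=["query-stream"],
)
```

A record like this makes the later pipeline stages concrete: synonym detection fills `synonyms`, spell correction fills `misspellings`, and the placement algorithm chooses `domain_class`.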
§ Class hierarchy
§ Name, domain class, and range
§ Synonyms and misspellings
§ Related attributes and mentions
§ Provenance
§ Differences from a traditional ontology
§ Evaluation

The Biperpedia System
q Phase 1: Extract candidate attributes (attribute extraction)
q Phase 2: Merge extractions & enhance the ontology
o Misspellings
o Synonyms
o Sub-attributes
o Best class
o Categorization as numeric / textual / non-atomic
(Figure: the Biperpedia extraction pipeline)

Query Stream Extraction
v Find candidate attributes
v Reconcile to Freebase
v Remove co-reference mentions
v Output attribute candidates

Extraction from Web Text
Extraction via distant supervision (figure)

Attribute Extraction Using Induced Patterns
• A pattern-induction pipeline is needed to learn the large variety of patterns
• A small set of hand-written patterns is not enough

Attribute Classification Features
(Figure: feature table)

Classifier
Optimize a training objective of the form
W* = argmin_W Σᵢ loss(W · F(xᵢ, yᵢ)) + λ₁‖W‖₁ + λ₂‖W‖₂²
q W: the weight vector to be trained
q F(xᵢ, yᵢ): the hashed feature vector for training attribute xᵢ labeled as yᵢ
q λ₁, λ₂: the L1/L2 hyperparameters, set using grid search and cross-validation
q The objective is optimized with a standard off-the-shelf second-order solver

Experiments
(Figure: precision–recall results)

Synonym Detection
e.g., TOURIST ATTRACTIONS and TOURIST SPOTS of a country are synonymous
v For spell correction → rely on the search engine
v To detect synonyms → use an SVM classifier
v The precision of the SVM-based synonymizer is 0.87

Attribute Quality
(Figure: experimental setting; rating labels range from "Much Bigger" to "Not Exist")

Overall Quality
(Figure: overall quality results)

Finding Best Classes
Not too general, not too specific!

Placement Algorithm
v Diversity!
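The "not too general, not too specific" intuition behind placement can be sketched as a walk up the class hierarchy that generalizes only while a broader class adds substantially more support for the attribute. This is not the paper's actual algorithm; it is a minimal illustration assuming each class carries a support score, and all names and the threshold are hypothetical.

```python
# Hypothetical placement sketch: generalize from a specific class to its
# parent only while the parent class's support for the attribute is
# substantially larger, so the result is neither too specific nor too general.

def place_attribute(start_class, parent, support, gain_threshold=1.5):
    """Return the class where the attribute is best placed.

    parent:  dict mapping class -> its parent class (None at the root)
    support: dict mapping class -> support score for this attribute
             (e.g. how many extractions attach it to that class's instances)
    """
    current = start_class
    while parent.get(current) is not None:
        p = parent[current]
        # Generalize only if the parent adds a large gain in support.
        if support.get(p, 0) >= gain_threshold * support.get(current, 0):
            current = p
        else:
            break
    return current

# Toy hierarchy: City -> PopulatedPlace -> Location -> Thing
parent = {"City": "PopulatedPlace", "PopulatedPlace": "Location",
          "Location": "Thing", "Thing": None}
# "mayor" is well supported on City and barely more so on broader classes,
# so placement stays at City rather than over-generalizing.
support = {"City": 100, "PopulatedPlace": 120, "Location": 125, "Thing": 130}
print(place_attribute("City", parent, support))  # -> City
```

The threshold plays the role of a stopping criterion: if most of an attribute's support already comes from a specific class, promoting it to an ancestor would make the placement too general.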
Given a class C and an attribute A, define S(C, A) to be the maximal support A gets from any of the sources.

Placement Algorithm Evaluation
ü M_exact: the ratio of the number of exact assignments to all assignments
ü M_approx: the ratio of the number of approximate assignments to all assignments
ü M_filtered_exact (resp. M_filtered_approx): the same as M_exact (resp. M_approx), applied only to attributes deemed good in the experiments

Interpreting Web Tables
Mapping Algorithm:
(1) Preprocess
(2) Match

Interpreting Web Tables -- Evaluation
v Interpretation quality
v Comparison with Freebase

Interpreting Web Tables -- Evaluation
Error analysis: errors in detecting representative attributes. Causes of errors:
1) Description constants & gaps in the class hierarchy
2) Insufficient information
3) Attributes on which the evaluators disagreed
4) Insufficient Biperpedia coverage
5) Sophisticated attribute names requiring noun-phrase recognition

Conclusions & Discussion
v Biperpedia → an ontology of binary attributes
v Using the high-quality attributes to seed extraction from text
v Developing methods for classifying different relationships
v Mining a grammar for complex attribute names
v Applying the algorithms to other query streams, possibly with different results

References
Rahul Gupta, Alon Halevy, Xuezhi Wang, Steven Whang, Fei Wu. Biperpedia: An Ontology for Search Applications. Proceedings of the International Conference on Very Large Data Bases (VLDB 2014).