NLP&CC 2012 – Beijing, China

Ontology-Based Event Modeling for Semantic Understanding of Chinese News Story
Wang Wei, Zhao Dongyan
Institute of Computer Science & Technology

Outline
- Introduction
- Related Work
  - Event Definitions
  - Existing Event Models
- News Ontology Event Model
  - The Design of NOEM
  - Main Concepts and Properties in NOEM
- Evaluation
- Conclusion

Introduction
"News Information Overload"
- Numerous online news service providers
- Explosive increase in the number of online news users
[Figure: numbers of online news users (in units of ten thousand persons) and the time they spend browsing news]

Introduction
- Classification and summarization are widely used in the online news domain
  - Document-oriented techniques based on traditional "BOW" (bag-of-words) models cannot provide sufficient event semantic information
- Users need intelligent, event-level semantic news services
  - to push events, not documents, to users
  - employing entities and relations to provide semantic navigation, e.g., renlifang of Microsoft, soso waltz of Tencent
[Figure: Web of Document, Web of Data, Web of entity and relation]

Introduction
How to provide multi-dimensional semantic navigation?
5W1H: Who, When, Where, What, Why, How
[Figure: a news page annotated with the following remarks]
- The event took place on Ludao (鹭岛), not in Hong Kong.
- Keyword-based analysis easily leads to "semantic" errors.
- The Shanghai concert is Faye Wong's (王菲) event and has nothing to do with Andy Lau (刘德华).

Introduction
- Our research aim is the semantic understanding of Chinese news by
  - extracting the entities and relations involved in the key event of a news story
  - building a news event knowledge base and a semantic retrieval engine to support event-level semantic applications
- We implemented a novel framework that addresses the whole list of 5W1H elements
  - key event identification
  - event semantic element extraction
  - ontology-based event knowledge base construction
- This paper discusses Ontology-Based Event Modeling for Semantic Understanding of Chinese News Story

Methodology
5W1H element extraction from key events of Chinese news stories
Pipeline: Chinese online news -> key event identification in one news story -> 5W1H event semantic element extraction -> event semantic modeling and ontology population -> event knowledge base
We try to build a practical Chinese event extraction system by combining
- Natural Language Processing technologies (lexical analysis, NER)
- Machine Learning (SVM, CRF)
- Semantic Web technologies (ontology, OWL, rules)

Related Work
Event Definitions
- WordNet: "something that happens at a given place and time."
- Cognitive psychologists: "happenings in the outside world"; people observe and understand the world through events.
- Linguists (Chung and Timberlake, 1985): "an event can be defined in terms of three components: a predicate; an interval of time on which the predicate occurs and a situation or set of conditions under which the predicate occurs."
- TimeML: "a cover term for situations that happen or occur. Events can be punctual or last for a period of time."
- ACE (Automatic Content Extraction): "an event involving zero or more ACE entities, values and time expressions"
- Event-based summarization: atomic events link the major constituent parts (participants, locations, times) of events through verbs or action nouns labeling the event itself.

Related Work
We define an event as "a specific occurrence which involves some participants". It has three components:
- a predicate;
- core participants, i.e., agents and patients;
- auxiliary participants, i.e., the time and location of the event.
An event is written as <S, P, O, T, L>, where S, P, O are the core elements and T, L are the subordinate elements.
These participants are usually named entities, which correspond to the what, who, whom, when, and where elements of an event.

Related Work
Existing Event Models
- Script Theory, Event Domain Cognitive Model: cognitive linguistics
- Probabilistic Event Model: TDT
- Structural Event Model: MUC & ACE
- Atomic Event Model: event-based automatic summarization
- Generic Event Model: event-centric multimedia data management
- Ontology Event Models: ABC, PROTON, EO (Event Ontology), Event-Model-F

News Ontology Event Model
Modeling (1) event information, (2) event relations, (3) event media

News Ontology Event Model
- Main concepts
- Relations

Evaluation
Janez Brank et al.
classified ontology evaluation methods into four categories:
(1) comparing the ontology to a "golden standard";
(2) using the ontology in an application and evaluating the results;
(3) comparing with a source of data about the domain to be covered by the ontology;
(4) evaluation by humans who assess how well the ontology meets a set of predefined criteria, standards, and requirements.

Evaluation
Comparison between NOEM and existing event models

Evaluation
Manual labeling
- 4 postgraduates
- 6000+ Chinese news stories from Xinhua News Agency
- Covers 23 top classes and 2082 subclasses of CNML
- In 85% of them, we found a topic sentence that contains the key event of the news story
- 4/5 Ws in the topic sentence can be described appropriately by NOEM

Category code   Category name                                       Subclasses
1               Politics                                            85
2               Law and justice                                     76
3               Foreign relations, international relations          72
4               Military                                            129
5               Society, labor, disasters and accidents             105
11              Economy                                             132
12              Economic theory research                            132
13              Capital construction, construction, real estate     47
14              Agriculture, rural areas                            99
15              Mining, industry                                    239
16              Energy, water services, water conservancy           69
17              Information industry                                72
18              Transportation, postal services, logistics          65
19              Commerce, foreign trade, customs                    55
21              Services, tourism                                   84
22              Environment, meteorology                            43
31              Education                                           63
33              Science and technology                              70
35              Culture, entertainment and leisure                  98
36              Literature, art                                     130
37              Media industry                                      61
38              Medicine, health                                    88
39              Sports                                              68

Evaluation: A Case Study
News story: Chinese President Hu Jintao arrived in Canada for a state visit
Result of 5W1H extraction of the key event:
<抵达 (arrive), isTypeof, Movement/Transport>
<胡锦涛 (Hu Jintao), isTypeof, Person>
<8日 (the 8th), isTypeof, Time>
<渥太华 (Ottawa), isTypeof, Place>
……

Evaluation: Population of NOEM
News story: Chinese President Hu Jintao arrived in Canada for a state visit
[Figure: an automatically generated OWL file]

Conclusion
Main contributions
- an extensive investigation of "event" and "event modeling"
- the design of an ontology-based event model: NOEM
- the use of the concept of 5W1H semantic elements in the
Chinese news domain
- defining concepts of entities (time, person, location, organization, etc.), events, and relationships to capture the temporal, spatial, informational, experiential, structural, and causal aspects, e.g., the 5W1H, of an event

Future work
- building a news event knowledge base and a semantic retrieval engine on NOEM to support event-level semantic applications

The End
Thank you for your patience!
Q&A

Framework
A pipeline of three steps and six sub-tasks:
- (1) title classification and (2) topic sentence extraction for key event identification;
- (3) semantic role labeling and (4) 5W1H element identification for event semantic element extraction;
- (5) NOEM definition and (6) ontology population for event knowledge base construction.

Publications
Please see our previous work for more details.
Key Event Extraction
- Wang, W., Zhao, D., Zhao, W.: Identification of topic sentence about key event in Chinese News. Acta Scientiarum Naturalium Universitatis Pekinensis 47(5), 789–796 (2011)
5Ws Extraction
- Wang, W., Zhao, D., Zou, L., Wang, D., Zheng, W.: Extracting 5W1H Event Semantic Elements from Chinese Online News. In: Chen, L., Tang, C., Yang, J., Gao, Y. (eds.) WAIM 2010. LNCS, vol. 6184, pp. 644–655. Springer, Heidelberg (2010)
- Wang, W., Zhao, D., Wang, D.: Chinese news event 5W1H elements extraction using semantic role labeling. In: 3rd ISIP, pp. 484–489 (2010)
Framework
- Wang, W., Zhao, D.: Chinese News Event 5W1H Semantic Elements Extraction for Event Ontology Population. WWW 2012 PhD Symposium, Lyon, France
(2012)

Title Based Key Event Extraction
Input: News document
Output: Topic sentences
Begin
  NLP-based preprocessing:
  Title classification; // classify the title as informative or non-informative
  Topic words extraction; // 1) TF-IDF; 2) PageRank on the word co-occurrence graph
  Title & topic words co-occurrence analysis; // (1)
  For each sentence do:
    Term frequency scoring; // (2)
    Sentence location scoring; // (3)
    Sentence length scoring; // (4)
    Named entity scoring; // (5)
    Sentence and title similarity scoring; // (6)
    Sentence weighting & ranking; // (8)
  End do
End

Chinese News Semantic Elements Extraction
Input: Topic sentences
Output: <Subject, Predicate, Object, Time, Location> & How of news
Begin
  For each topic sentence do
    1) NE recognition; // HMM-based NER tool
    2) NP recognition; // CRF-based NP tagger
    3) Event identification and classification, verb-driven & SVM; // What
    4) Syntactic-semantic rule-based <Subject, Predicate, Object> recognition; // Who did what to whom
    5) Time expression identification and normalization; // When
    6) Location identification; // Where
    7) Topic sentences as short summarization; // How
  End do
End
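
The per-sentence scoring loop of Title Based Key Event Extraction can be illustrated with a minimal Python sketch. It is not the authors' implementation: the slides name the features (term frequency, location, length, named entities, title similarity) but give no formulas or weights, so every formula, threshold, and weight below is an assumption, as are the names `Sentence`, `features`, and `rank_sentences`. The input is assumed to be already word-segmented and NER-tagged by an earlier step, and English tokens are used only for readability.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Sentence:
    tokens: List[str]   # pre-segmented words (segmentation is assumed done elsewhere)
    position: int       # 0-based index of the sentence in the story
    entity_count: int   # number of named entities reported by an NER step


# Hypothetical equal weights; the slides do not state the real values.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "term_frequency": 1.0,
    "location": 1.0,
    "length": 1.0,
    "named_entity": 1.0,
    "title_similarity": 1.0,
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set overlap, used here as an assumed sentence/title similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def features(s: Sentence, title: Set[str], topic_words: Set[str]) -> Dict[str, float]:
    """Compute one score in [0, 1] per feature named on the slide (formulas assumed)."""
    n = len(s.tokens)
    return {
        # share of tokens that are topic words
        "term_frequency": sum(t in topic_words for t in s.tokens) / n if n else 0.0,
        # earlier sentences score higher
        "location": 1.0 / (s.position + 1),
        # assumed acceptable length window of 4 to 40 tokens
        "length": 1.0 if 4 <= n <= 40 else 0.0,
        # saturates at three entities (assumed cap)
        "named_entity": min(s.entity_count / 3.0, 1.0),
        "title_similarity": jaccard(set(s.tokens), title),
    }


def rank_sentences(
    sentences: List[Sentence],
    title_tokens: List[str],
    topic_words: Set[str],
    weights: Dict[str, float] = DEFAULT_WEIGHTS,
) -> List[Tuple[float, Sentence]]:
    """Weight the feature scores and return (score, sentence) pairs, best first."""
    title = set(title_tokens)
    scored = []
    for s in sentences:
        f = features(s, title, topic_words)
        scored.append((sum(weights[k] * v for k, v in f.items()), s))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


story = [
    Sentence(["Hu", "Jintao", "arrived", "in", "Ottawa", "for", "a", "state", "visit"], 0, 2),
    Sentence(["The", "weather", "was", "mild"], 1, 0),
]
ranked = rank_sentences(
    story,
    title_tokens=["Hu", "Jintao", "arrived", "in", "Canada"],
    topic_words={"Hu", "Jintao", "arrived", "visit"},
)
print(" ".join(ranked[0][1].tokens))
```

In this toy input the first sentence ranks highest because it shares words with the title, contains topic words and entities, and appears first. A weighted linear sum is only one plausible reading of "sentence weighting & ranking"; a learned ranker would fit the same interface.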