NLP&CC 2012 – Beijing, China
Ontology-Based Event Modeling
for Semantic Understanding of
Chinese News Story
Wang Wei, Zhao Dongyan
Institute of Computer Science & Technology
[email protected]
Outline
Introduction
Related Work
    Event Definitions
    Existing Event Models
News Ontology Event Model
    The Design of NOEM
    Main Concepts and Properties in NOEM
Evaluation
Conclusion
NLP&CC, Beijing, China
Introduction

“News Information Overload”


Numerous online news service providers
Explosive increase of online news users
Figure: Numbers of online news users (in units of ten thousand persons) and the time they spend browsing news
Introduction

Classification and summarization are widely used in the online news domain
    Document-oriented techniques based on traditional bag-of-words (“BOW”) models cannot provide sufficient event-level semantic information
Users need intelligent, event-level semantic news services
    that push events, rather than documents, to users
    that employ entities and relations to provide semantic navigation, e.g., renlifang of Microsoft, soso waltz of Tencent
Figure: Web of Documents, Web of Data, Web of Entities and Relations
Introduction

How to provide multi-dimensional semantic navigation?

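5W1H: Who, When, Where, What, Why, How
Keyword-based analysis easily leads to “semantic” errors, for example:
    The event took place in Ludao (鹭岛, i.e. Xiamen), not Hong Kong
    The Shanghai concert is an event of Faye Wong (王菲) and has nothing to do with Andy Lau (刘德华)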
Introduction

Our research aims are
    semantic understanding of Chinese news by extracting the entities and relations involved in the key event of a news story
    building a news event knowledge base and a semantic retrieval engine to support event-level semantic applications
We implemented a novel framework that addresses the whole list of 5W1H through
    key event identification
    event semantic element extraction
    ontology-based event knowledge base construction
This paper discusses
    ontology-based event modeling for semantic understanding of Chinese news stories
Methodology

5W1H element extraction from key events of Chinese news stories
    Chinese online news -> key event identification in one news story -> 5W1H event semantic element extraction -> event semantic modeling and ontology population -> event knowledge base
We try to build a practical Chinese event extraction system by combining
    Natural Language Processing technologies (lexical analysis, NER)
    Machine Learning (SVM, CRF)
    Semantic Web technologies (ontology, OWL, rules)
Related Work

Event Definitions
    WordNet
        “something that happens at a given place and time.”
    Cognitive psychologists
        “happenings in the outside world”; people observe and understand the world through events.
    ACE (Automatic Content Extraction)
        “an event involving zero or more ACE entities, values and time expressions”
    TimeML
        “a cover term for situations that happen or occur. Events can be punctual or last for a period of time.”
    Linguists (Chung and Timberlake, 1985)
        “an event can be defined in terms of three components: a predicate; an interval of time on which the predicate occurs and a situation or set of conditions under which the predicate occurs.”
    Event-based summarization
        atomic events: link major constituent parts (participants, locations, times) of events through verbs or action nouns labeling the event itself.
Related Work

We define an event as
    “a specific occurrence involving some participants”.
It has three components:
    a predicate;
    core participants, i.e., agents and patients;
    auxiliary participants, i.e., the time and location of the event.
An event is written as <S, P, O, T, L> (subject, predicate, object, time, location), where S, P, O are core elements and T, L are subordinate elements.
    These participants are usually named entities, which correspond to the what, who, whom, when and where elements of an event.
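The five-tuple above maps naturally onto a small record type. The following is a minimal sketch in Python, not part of NOEM itself: the class name `Event`, its field names and the helper `five_w` are illustrative assumptions. The sample values are the ones from the Hu Jintao case study in the Evaluation slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical container for the <S, P, O, T, L> event tuple."""
    predicate: str                   # P: the action that labels the event (what)
    subject: Optional[str] = None    # S: agent (who)
    obj: Optional[str] = None        # O: patient (whom)
    time: Optional[str] = None       # T: auxiliary participant (when)
    location: Optional[str] = None   # L: auxiliary participant (where)

    def five_w(self) -> dict:
        """Map the tuple onto the who/what/whom/when/where elements."""
        return {
            "who": self.subject,
            "what": self.predicate,
            "whom": self.obj,
            "when": self.time,
            "where": self.location,
        }


# Values taken from the case study; this event has no patient, so obj stays None.
event = Event(predicate="抵达", subject="胡锦涛", time="8日", location="渥太华")
print(event.five_w())
```

Making only the predicate mandatory is a design choice of this sketch; it reflects that an extracted event may lack some participants.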
Related Work

Existing Event Models
    Script Theory, Event Domain Cognitive Model
        Cognitive linguistics
    Probabilistic Event Model
        TDT
    Structural Event Model
        MUC & ACE
    Atomic Event Model
        Event-based automatic summarization
    Generic Event Model
        Event-centric multimedia data management
    Ontology Event Models
        ABC, PROTON, EO (Event Ontology), Event-Model-F
News Ontology Event Model

Modeling

(1) event information, (2) event relations, (3) event media
News Ontology Event Model


Main concepts
Relations
Evaluation

Janez Brank et al. classified ontology evaluation methods into four categories:
    (1) comparing the ontology to a “golden standard”;
    (2) using the ontology in an application and evaluating the results;
    (3) comparing the ontology with a source of data about the domain it is meant to cover;
    (4) evaluation by humans, who assess how well the ontology meets a set of predefined criteria, standards and requirements.
Evaluation

Comparison between NOEM and existing event models
Evaluation

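Manual labeling
    4 postgraduates
    6000+ Chinese news stories from Xinhua News Agency
    Covers 23 top classes and 2082 subclasses of CNML
In 85% of them, we found
    a topic sentence which contains the key event of the news story
    4/5 Ws in the topic sentence which can be described appropriately by NOEM

Category code | Category name | Subclasses
1  | Politics (政治) | 85
2  | Law and justice (法律、司法) | 76
3  | Foreign relations and international relations (对外关系、国际关系) | 72
4  | Military (军事) | 129
5  | Society, labor, disasters and accidents (社会、劳动、灾难事故) | 105
11 | Economy (经济) | 132
12 | Economic theory research (经济理论研究) | 132
13 | Capital construction, construction industry, real estate (基本建设、建筑业、房地产) | 47
14 | Agriculture and rural areas (农业、农村) | 99
15 | Mining and industry (矿业、工业) | 239
16 | Energy, water services, water conservancy (能源、水务、水利) | 69
17 | Information industry (信息产业) | 72
18 | Transportation, postal services, logistics (交通运输、邮政、物流) | 65
19 | Commerce, foreign trade, customs (商业、外贸、海关) | 55
21 | Service industry and tourism (服务业、旅游业) | 84
22 | Environment and meteorology (环境、气象) | 43
31 | Education (教育) | 63
33 | Science and technology (科学技术) | 70
35 | Culture, entertainment and leisure (文化、娱乐休闲) | 98
36 | Literature and art (文学、艺术) | 130
37 | Media industry (传媒业) | 61
38 | Medicine and health (医药、卫生) | 88
39 | Sports (体育) | 68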
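The per-category counts in the table can be checked against the totals quoted on the slide (23 top classes, 2082 subclasses). The short Python check below transcribes the category codes and subclass counts from the table; the variable names are illustrative only.

```python
# Category code -> number of subclasses, transcribed from the table above.
subclass_counts = {
    1: 85, 2: 76, 3: 72, 4: 129, 5: 105,
    11: 132, 12: 132, 13: 47, 14: 99, 15: 239,
    16: 69, 17: 72, 18: 65, 19: 55, 21: 84,
    22: 43, 31: 63, 33: 70, 35: 98, 36: 130,
    37: 61, 38: 88, 39: 68,
}

top_classes = len(subclass_counts)
total_subclasses = sum(subclass_counts.values())
print(top_classes, total_subclasses)  # prints: 23 2082
```

Both figures agree with the coverage stated in the manual labeling summary.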
Evaluation: A Case Study

Chinese President Hu Jintao arrived in Canada for a state visit

Result of 5W1H extraction of the key event:
<抵达 (arrive), isTypeof, Movement/Transport>,
<胡锦涛 (Hu Jintao), isTypeof, Person>,
<8日 (the 8th), isTypeof, Time>,
<渥太华 (Ottawa), isTypeof, Place>
……
Evaluation: Population of NOEM

Chinese President Hu Jintao arrived in Canada for a state visit

Figure: an automatically generated OWL file
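To give a rough idea of what ontology population involves, the sketch below turns the typed elements of the case study into RDF statements. It is an assumption-laden illustration, not the authors' generator: it emits N-Triples rather than the OWL file shown on the slide, the namespace `http://example.org/noem#` is a placeholder, and the individual names (`individual0`, ...) and the helper functions are hypothetical.

```python
# Typed elements from the case study: (surface form, class assigned by extraction).
elements = [
    ("抵达", "Movement/Transport"),
    ("胡锦涛", "Person"),
    ("8日", "Time"),
    ("渥太华", "Place"),
]

NS = "http://example.org/noem#"  # placeholder namespace, not the real NOEM IRI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def local_name(text):
    """Make a class label usable as an IRI fragment (e.g. replace the slash)."""
    return text.replace("/", "_").replace(" ", "_")


def to_ntriples(items):
    """Emit a type statement and a label statement per element, in N-Triples syntax."""
    lines = []
    for index, (surface, cls) in enumerate(items):
        individual = f"<{NS}individual{index}>"
        lines.append(f"{individual} <{RDF_TYPE}> <{NS}{local_name(cls)}> .")
        # Keep the original Chinese surface form as a human-readable label.
        lines.append(f'{individual} <{RDFS_LABEL}> "{surface}"@zh .')
    return lines


for line in to_ntriples(elements):
    print(line)
```

A real population step would also link the individuals to an event instance through the properties defined in NOEM; those property names are not given on the slides, so they are left out here.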
Conclusion

Main contributions
    an extensive investigation of “event” and “event modeling”
    the design of an ontology-based event model, NOEM
        using the concept of 5W1H semantic elements in the Chinese news domain
        defining concepts of entities (time, person, location, organization, etc.), events and relationships to capture the temporal, spatial, informational, experiential, structural and causal aspects, e.g., the 5W1H, of an event
Future work
    building a news event knowledge base and a semantic retrieval engine on NOEM to support event-level semantic applications
The End
Thank you for your patience!
Q&A
Framework

A pipeline of three steps and six sub-tasks
    (1) title classification and (2) topic sentence extraction for key event identification;
    (3) semantic role labeling and (4) 5W1H element identification for event semantic element extraction;
    (5) NOEM definition and (6) ontology population for event knowledge base construction.
Publications

Please see our previous work for more details
    Key Event Extraction
        Wang, W., Zhao, D., Zhao, W.: Identification of topic sentence about key event in Chinese News. Acta Scientiarum Naturalium Universitatis Pekinensis 47(5), 789–796 (2011)
    5Ws Extraction
        Wang, W., Zhao, D., Zou, L., Wang, D., Zheng, W.: Extracting 5W1H Event Semantic Elements from Chinese Online News. In: Chen, L., Tang, C., Yang, J., Gao, Y. (eds.) WAIM 2010. LNCS, vol. 6184, pp. 644–655. Springer, Heidelberg (2010)
        Wang, W., Zhao, D., Wang, D.: Chinese news event 5W1H elements extraction using semantic role labeling. In: the 3rd ISIP, pp. 484–489 (2010)
    Framework
        Wang, W., Zhao, D.: Chinese News Event 5W1H Semantic Elements Extraction for Event Ontology Population. WWW 2012 PhD Symposium, Lyon, France (2012)
Title Based Key Event Extraction
Input: News document
Output: Topic sentences
Begin
    NLP-based preprocessing:
    Title classification; // classify the title as informative or non-informative
    Topic words extraction; // 1) TF-IDF; 2) PageRank on the word co-occurrence graph
    Title & topic words co-occurrence analysis; // (1)
    For each sentence do:
        Term frequency scoring; // (2)
        Sentence location scoring; // (3)
        Sentence length scoring; // (4)
        Named entity scoring; // (5)
        Sentence and title similarity scoring; // (6)
        Sentence weighting & ranking; // (8)
    End do
End
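The sentence weighting step can be illustrated with a weighted sum of per-sentence scores. The Python sketch below covers only three of the scores listed above, (2), (3) and (6), in simplified form. The scoring formulas, the weights and the sample sentences are all assumptions made for illustration, not the method or data of the original work.

```python
def term_score(sentence, topic_words):
    """Fraction of topic words that occur in the sentence (stand-in for step 2)."""
    if not topic_words:
        return 0.0
    return sum(word in sentence for word in topic_words) / len(topic_words)


def location_score(index, total):
    """Earlier sentences score higher (an assumed heuristic for step 3)."""
    return 1.0 - index / total


def title_similarity(sentence, title):
    """Jaccard overlap of character sets, a crude stand-in for step 6."""
    a, b = set(sentence), set(title)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def rank_sentences(sentences, title, topic_words, weights=(0.4, 0.3, 0.3)):
    """Return the sentences sorted by a weighted sum of the three scores."""
    w_term, w_loc, w_title = weights  # assumed weights, not taken from the slides
    scored = []
    for i, sentence in enumerate(sentences):
        score = (
            w_term * term_score(sentence, topic_words)
            + w_loc * location_score(i, len(sentences))
            + w_title * title_similarity(sentence, title)
        )
        scored.append((score, sentence))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored]


# Made-up sample document, for illustration only.
title = "胡锦涛抵达加拿大进行国事访问"
sentences = [
    "加拿大总督在机场举行欢迎仪式。",
    "国家主席胡锦涛8日抵达渥太华,开始对加拿大进行国事访问。",
    "访问期间双方将签署多项合作文件。",
]
ranked = rank_sentences(sentences, title, ["胡锦涛", "抵达", "加拿大"])
print(ranked[0])
```

Character-level overlap is used here only to avoid depending on a Chinese word segmenter; a real system would score segmented words.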
Chinese News Semantic Elements Extraction
Input: Topic sentences
Output: <Subject, Predicate, Object, Time, Location> & How of news
Begin
    For each topic sentence do
        1) NE recognition; // HMM-based NER tool
        2) NP recognition; // CRF-based NP tagger
        3) Event identification and classification by verb-driven & SVM; // What
        4) Syntactic-semantic rules-based <Subject, Predicate, Object> recognition; // Who did what to whom
        5) Time expressions identification and normalization; // When
        6) Location identification; // Where
        7) Topic sentences as short summarization; // How
    End do
End
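As an illustration of the time normalization step, a relative expression such as "8日" has to be anchored to a reference date before it can be stored as a time individual. The Python sketch below resolves "M月D日" and "D日" against the publication date of the story. The function name, the same-month and same-year assumption, and the publication date used in the example are all hypothetical and do not describe the original system.

```python
import re
from datetime import date


def normalize_day(expression, published):
    """Resolve 'M月D日' or 'D日' against the publication date; None if no match."""
    match = re.fullmatch(r"(?:(\d{1,2})月)?(\d{1,2})日", expression)
    if match is None:
        return None
    # Assumption: a missing month means the month of publication,
    # and the year is always the year of publication.
    month = int(match.group(1)) if match.group(1) else published.month
    day = int(match.group(2))
    try:
        return date(published.year, month, day)
    except ValueError:  # e.g. '31日' in a 30-day month
        return None


# Arbitrary publication date chosen only for the example.
print(normalize_day("8日", date(2012, 1, 10)))
```

Expressions that cross a month or year boundary (for example a story published on the 1st that mentions "30日") would need extra rules beyond this sketch.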