Ontology and HowNet
Dong Zhendong (董振东), Dong Qiang (董强)
[email protected] [email protected]
www.keenage.com
Research Centre of Computer & Language Engineering, Chinese Academy of Sciences
Harbin, 2003.08

Outline
Ontology
HowNet vs SUMO/WordNet/VerbNet

Ontology
What is an ontology?
Ontology and IT/NLP

What is an ontology?
Ontology as a discipline
Ontology as a resource

Ontology as a discipline
Ontology in philosophy
Ontology in AI/KR
Ontology in mathematics
Ontology in software engineering
Ontology in linguistics
Ontology in IT

Issues in defining "ontology"
Its intrinsic meaning
Its external representation
The Chinese translation of the term

Ontology and IT/NLP
similar to a dictionary or glossary, but with greater detail and structure that enables computers to process its content. An ontology consists of a set of concepts, axioms, and relationships that describe a domain of interest. An upper ontology is limited to concepts that are meta, generic, abstract and philosophical …
-- Standard Upper Ontology (SUO) Working Group
A common-sense knowledge base whose objects of description are the concepts represented by Chinese and English words, and whose basic content is the relationships between concepts and between the attributes of concepts.
-- HowNet (《知网》)

Typical ontologies
Cyc: http://www.cyc.com
IFF: The IFF Foundation Ontology
WordNet: http://www.cogsci.princeton.edu
EuroWordNet: http://www.hum.uva.nl/ewn/
HowNet: http://www.keenage.com
SUMO: http://ontology.teknowledge.com
EDR: http://www.iijnet.or.jp
VerbNet: http://www.cis.upenn.edu/verbnet/
Prototype (Sinica): http://ckip.iis.sinica.edu.tw/CKIP/ontology/

HowNet vs SUMO/WordNet/VerbNet
SUMO – Suggested Upper Merged Ontology
Mapping WordNet to SUMO
SUMO Sources
SUMO Subclass Hierarchy Tree (figure; process classes shown: making, constructing, manufacture, publication, cooking, searching, pursuing, investigating, diagnostic process, social interaction, change of possession, giving, unilateral giving, lending, getting, unilateral getting, borrowing)

Motivation for Mapping
How can a formal ontology be used effectively by those who lack extensive training in logic and mathematics?
How can an ontology be used automatically by applications?
How can we know when an ontology is complete?
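One concrete answer to the question of how applications can use an ontology automatically is to query its hierarchy directly. The sketch below runs subsumption checks over the process classes named on the SUMO Subclass Hierarchy Tree slide. Because the extracted text lost the tree layout, the parent links are an assumed reading of that figure (for example, that lending sits under giving and borrowing under getting), not SUMO's own KIF axioms; `SUBCLASS_OF`, `ancestors` and `is_subclass` are hypothetical names, not part of any SUMO tool.

```python
"""Subsumption queries over a small subclass hierarchy (illustrative sketch)."""

# Assumed child -> parent links for the process classes named on the
# "SUMO Subclass Hierarchy Tree" slide. The figure's layout was lost in
# extraction, so these edges are a hypothetical reading, not SUMO's KIF.
SUBCLASS_OF: dict[str, str] = {
    "constructing": "making",
    "manufacture": "making",
    "publication": "making",
    "cooking": "making",
    "pursuing": "searching",
    "investigating": "searching",
    "diagnostic process": "investigating",
    "change of possession": "social interaction",
    "giving": "change of possession",
    "unilateral giving": "giving",
    "lending": "giving",
    "getting": "change of possession",
    "unilateral getting": "getting",
    "borrowing": "getting",
}


def ancestors(cls: str) -> list[str]:
    """Return all superclasses of `cls`, nearest first."""
    chain: list[str] = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_subclass(child: str, parent: str) -> bool:
    """True if `parent` subsumes `child` (reflexive and transitive)."""
    return child == parent or parent in ancestors(child)


if __name__ == "__main__":
    print(ancestors("lending"))  # ['giving', 'change of possession', 'social interaction']
    print(is_subclass("borrowing", "change of possession"))  # True
    print(is_subclass("borrowing", "giving"))  # False
```

The single-parent dictionary keeps the sketch short; SUMO itself allows a class to have several superclasses, so a fuller version would store a list of parents per class and search the resulting graph.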
Architecture of HowNet (《知网》)
D-relation Trigger (Application Tools)
S-relation Trigger (Browser)
Basic Data (Concept Definitions / Taxonomies)

Basic Data – Sememes
Sememe class | Number of sememes
Entity: thing (physical, mental, fact), component (part, fitting), time, space (direction, location) | 154
Event (relation, state, action) | 818
Attribute | 248
Value | 892
Secondary feature | 107
Total | 2219

Basic Data – Concept Definition
NO.=020957
W_C=大学生
G_C=N
E_C=
W_E=college student
G_E=N
E_E=
DEF={human|人:{study|学习:agent={~},location={InstitutePlace|场所:domain={education|教育},modifier={HighRank|高等},{study|学习:location={~}},{teach|教:location={~}}}}}

Basic Data – Taxonomies
- {thing|万物} {entity|实体:{ExistAppear|存现:existent={~}}}
  - {physical|物质} {thing|万物:HostOf={Appearance|外观},{perception|感知:content={~}}}
    - {animate|生物} {physical|物质:HostOf={Age|年龄},{alive|活着:experiencer={~}},{die|死:experiencer={~}},{metabolize|代谢:experiencer={~}},{reproduce|生殖:agent={~},PatientProduct={~}}}
      - {AnimalHuman|动物} {animate|生物:HostOf={Sex|性别},{AlterLocation|变空间位置:agent={~}},{StateMental|精神状态:experiencer={~}}}
        - {human|人} {AnimalHuman|动物:HostOf={Name|姓名}{Wisdom|智慧}{Ability|能力},{think|思考:agent={~}},{speak|说:agent={~}}}

S-relation Trigger -- Browser
D-relation Trigger -- Application Tools
Relevant Concept Field Builder (相关概念场构造器), cf.
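“seed list” (Bonnie Dorr & Tiejun Zhao); examples: “化学” (chemistry), “射击” (shooting)
Sense Similarity Calculator (语义相似度计算器); examples: “毛衣” (sweater) vs “手套” (gloves) / “醋” (vinegar)
Chinese Chunk Extractor (中文语块抽取器)

Applications of HowNet in China and abroad (1)
Semantic Web: ontology, annotation, thesaurus
陈文鋕: Semantic Processing && Semantic Web Service (Institute for Information Industry, Taiwan)
Named Entity Recognition
Tianfang Yao, Wei Ding, Gregor Erbach: CHINERS: A Chinese Named Entity Recognition System for the Sports Domain

Applications of HowNet in China and abroad (2)
Word Sense Disambiguation
Chi-Yung Wang: Knowledge-based Sense Pruning using the HowNet: an Alternative to Word Sense Disambiguation
Wong Ping Wai: A Maximum Entropy Approach to HowNet-Based Chinese Word Sense Disambiguation
Word Similarity Computing
Liu Qun, Li Sujian: Word Similarity Computing Based on HowNet

Applications of HowNet in China and abroad (3)
Sense Annotation
Dependency Relation Annotation
Li Mingqin, Li Juanzi: Building a Large Chinese Corpus Annotated with Semantic Dependency
Cross-language Development
Licensed to the Institute of Information Science, Academia Sinica (Taiwan) for joint development of a HowNet Big5+ version
National Digital Archives Program (NDAP, 数位典藏国家型计划): http://ndap.org.tw/NewsLetter/content.html?subuid=559&uid=26

Thank you

Current research trends
Theoretical or philosophical exploration
Mapping, linking, merging
Research through applications
Building common-sense or domain-specific knowledge systems

Some views on building knowledge systems
Theory vs engineering – put engineering first
Research vs application – focus on application
Distinguish genuine integration (接轨, "joining the track") from forced conformity (接鬼, a pun on 接轨)
Five years ago, someone suggested that we turn HowNet into a WordNet
Recently, someone suggested that we revise HowNet's sememes to follow SUMO
Refitting HowNet, a qipao, as a two-piece Western skirt suit – that is 接鬼

Chinese WordNet or English HowNet?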
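For Chinese, a resource similar to WordNet already exists: HowNet (《知网》, http://www.keenage.com), begun independently in 1995 by Mr. Dong Zhendong of mainland China. It is a Chinese–English/English–Chinese bilingual lexical network. Early versions were open and free of charge; since 2002, when new versions came under the management of the Institute of Software, Chinese Academy of Sciences, a fee has been charged for use. HowNet's approach is distinctive: rather than adopting the architecture of the English WordNet, it uses its own. It first defines an ontology of world knowledge and then draws distinctions within that definition. This top-down method differs from the bottom-up method of the English and European wordnets, and it certainly has its merits. Unfortunately, because of the limited resources and information available at the time, Professor Dong Zhendong and his son Dong Qiang built HowNet essentially on conviction and enthusiasm, with very little outside support and without linking up with related research elsewhere in the world. The two of them spent some seven or eight years on it. However, there is basically no architectural basis for connecting it to the wordnets of other languages, and its upper-level knowledge classification reflects the two authors' own judgment: it cannot be called wrong, but it lacks a theoretical foundation and faces problems of interoperability with other systems.

Records in WordNet / HowNet
Record in WordNet
03592879 06 n 02 watch 0 ticker 1 012 @ 03506835 n 0000 ~ 02187181 n 0000 %p 02529205 n 0000 ~ 02570752 n 0000 %p 02659936 n 0000 ~ 02841320 n 0000 %p 03021820 n 0000 ~ 03104263 n 0000 ~ 03150171 n 0000 ~ 03410656 n 0000 %p 03593482 n 0000 ~ 03636122 n 0000 | a small portable timepiece
Record in HowNet
NO.=007738
W_C=表
G_C=N
E_C=手~,怀~,钟~,电子~,机械~,带钻石的~,这块~不防水
W_E=watch
G_E=N
E_E=
DEF={tool|用具:{tell|告诉:content={time|时间},instrument={~}}}

Axiom in SUMO / HowNet (1)
See SUMO_buy.doc
Cf. HowNet Event Relation & Role Shifting:
{buy|买} <----> {obtain|得到} [consequence];
agent OF {buy|买}=possessor OF {obtain|得到};
possession OF {buy|买}=possession OF {obtain|得到}.
{buy|买} (X) <----> {sell|卖} (Y) [mutual implication];
agent OF {buy|买}=target OF {sell|卖};
source OF {buy|买}=agent OF {sell|卖};
possession OF {buy|买}=possession OF {sell|卖};
cost OF {buy|买}=cost OF {sell|卖}.

Axiom in SUMO / HowNet (2)
{buy|买} [entailment] <----> {choose|选择};
agent OF {buy|买}=agent OF {choose|选择};
possession OF {buy|买}=content OF {choose|选择};
source OF {buy|买}=location OF {choose|选择}.
{buy|买} [entailment] <----> {pay|付};
agent OF {buy|买}=agent OF {pay|付};
cost OF {buy|买}=possession OF {pay|付};
source OF {buy|买}=target OF {pay|付}.

Thematic Roles in VerbNet / HowNet
See VerbNet_buy.doc
Thematic Roles
Agent[+animate OR +organization]
Asset[+currency]
Beneficiary[+animate OR +organization]
Source[+concrete]
Theme[]
Cf.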
HowNet Event Role with Typical Actors
{buy|买} {take|取:agent={human|人}{group|群体->},possession={artifact|人工物->},source={human|人}{InstitutePlace|场所},cost={money|货币},beneficiary={human|人}{group|群体->},domain={economy|经济}}

Components of HowNet
Taxonomy (义原层级规范)
Roles and Features (角色与特征规范)
Specifications of KDML (知识描述语言规范)
Knowledge Database (知识库)
Event Relations & Role Shifting (事件关系与角色转换)
Maintenance Tools (维护管理工具)
APIs (应用接口)

Nature of HowNet
An online knowledge-base which reveals the relationship among concepts, and the relationship among attributes of concepts
-- Dong Zhendong, "Knowledge Description: What, How and who?", Proceedings of International Symposium on Electronic Dictionary, Tokyo, 1988, p.18

Theory of HowNet
Knowledge is a system of relationships among concepts and among attributes of concepts.
Everything is constantly changing in a specific time and space, and converts from one state to another. The conversion embodies the change of its attributes.

Guidelines of Design
Computer-oriented
Relationship is the key; revealing relationships is the main objective of HowNet
Based on sememes
Use of KDML
Concepts are defined in a static and isolated way
Relationships are activated in a dynamic way

Concept Definitions in HowNet (1)
医生 (doctor): DEF={human|人:domain={medical|医},HostOf={Occupation|职位},{doctor|医治:agent={~}}}
患者 (patient): DEF={human|人:domain={medical|医},{SufferFrom|罹患:experiencer={~}},{doctor|医治:patient={~}}}
医院 (hospital): DEF={InstitutePlace|场所:{doctor|医治:location={~},content={disease|疾病}},domain={medical|医}}

Concept Definitions in HowNet (2)
病历 (medical record): DEF={document|文书:{record|记录:content={disease|疾病},LocationFin={~}},domain={medical|医}}
健康 (health): DEF={Health|健康:host={AnimalHuman|动物}}
多病 (sickly): DEF={unhealthy|不健}
{HealthValue|健康值}
  ├ {healthy|康健}
  └ {unhealthy|不健}

Concept Definitions in HowNet (3)
病 (disease): {disease|疾病} {phenomena|现象:{doctor|医治:content={~}},{SufferFrom|罹患:content={~}},RelateTo={medicine|药物}{Health|健康}{HealthValue|健康值},domain={medical|医}}
药 (medicine): {medicine|药物}
{artifact|人工物:{doctor|医治:instrument={~}},RelateTo={disease|疾病},domain={medical|医}{chemistry|化学}}

Identity of description in different language structures (1)
W_C=劫 G_C=V E_C= W_E=rob G_E=V E_E= DEF={rob|抢}
W_C=飞机 G_C=N E_C= W_E=plane G_E=N E_E= DEF={aircraft|飞行器}

Identity of description in different language structures (2)
W_C=劫机 G_C=V E_C= W_E=hijack a plane G_E=V E_E= DEF={rob|抢:possession={aircraft|飞行器}}

Identity of description in different language structures (3)
W_C=劫机犯 G_C=N E_C= W_E=hijacker G_E=N E_E= DEF={human|人:{rob|抢:agent={~},possession={aircraft|飞行器}}}

Identity of description in different language structures (4)
W_C=抓获劫机犯 G_C=V E_C= W_E=catch a hijacker G_E=V E_E= DEF={catch|捉住:patient={human|人:{rob|抢:agent={~},possession={aircraft|飞行器}}}}

Identity of description in different language structures (5)
W_C=机敏地抓获女劫机犯 G_C=V E_C= W_E=catch a woman hijacker cleverly G_E=V E_E= DEF={catch|捉住:manner={clever|灵},patient={human|人:{rob|抢:agent={~},possession={aircraft|飞行器}},modifier={female|女}}}

Applications of HowNet
1. Semantic tagging
2. WSD, sense pruning
3. Sensitive information detection
4. Information filtering
5. Similarity of words
6. Semantic Web
7. Matching with WordNet

Future work
Construction of resources: English HowNet; Chinese message structure bank
Adding more languages
Developing more APIs and tools
Administration: membership

Appendix: definitions of ontology (1)
a specification of a conceptualization
the theory of objects and their ties
similar to a dictionary or glossary, but with greater detail and structure that enables computers to process its content. An ontology consists of a set of concepts, axioms, and relationships that describe a domain of interest. An upper ontology is limited to concepts that are meta, generic, abstract and philosophical …

Appendix: definitions of ontology (2)
the study of what there is, an inventory of what exists … What we may call ontology is the attempt to say what entities exist.
Metaphysics, by contrast, is the attempt to say, of those entities, what they are.
the study of the categories of things that exist or may exist in some domain
The word ontology comes from the Greek ontos for being and logos for word.

Cost for French in EuroWordNet
For the development of the French wordnet, there were 2 partners: Avignon (AVI) and Memodata (MEM). The following was requested:

Partner | Personnel | Equipment | Travel & assistance | Consumables & computing | Overheads | Total
AVI | 72000 | 3000 | 5000 | 3000 | 16600 | 99600
MEM | 85000 | 0 | 1500 | 300 | 17100 | 104400

Since Memodata was a private company, only 50% of its request could be funded by the EC. So the total request was:
AVI | 99600
MEM | 52200

Notes:
1) Validation is not included in this table. It was done by Xerox and Bertin globally for several languages.
2) These amounts constituted a provisional budget corresponding to some 20,000 synsets.

Demo of Tools
(1) Relevant Concept Field
(2) Similarity of Words
(3) Chinese Chunk Extractor
(4) Smart Word Finder

Overview of HowNet
Components of HowNet
Nature of HowNet
Theory of HowNet
Guidelines of Design
Sememes and Relations

Backup files needed
HowNet Browser (desktop)
Relevant concept field (desktop) – “行” (a highly polysemous word)
Similarity computing (desktop) – National Digital Archives Program (数位典藏计划) (folder “ontology”)
Prof. Huang’s comment on HowNet (desktop)
Under U32: Taxonomy; Event Relation & Role Shifting; Taxonomy Typical Actors; Papers (Applications about HowNet)
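The Similarity of Words demo and the Sense Similarity Calculator rest on distances between sememes in the HowNet taxonomy. The sketch below assumes the sememe-level formula from Liu Qun and Li Sujian's word-similarity work, sim(p1, p2) = α / (d + α), where d is the number of edges between two sememes in the hierarchy and α is a tunable parameter; α = 1.6 is the value we believe that paper uses, so treat it as an assumption. The hierarchy is the chain from the Basic Data – Taxonomies slide (entity > thing > physical > animate > AnimalHuman > human), with mental and fact placed under thing as in the sememe table. `PARENT`, `path_distance` and `sememe_similarity` are hypothetical names, and the full word-level method, which also weighs several parts of each DEF, is not reproduced here.

```python
"""Sememe similarity from taxonomy path length (illustrative sketch)."""

# Child -> parent links from the HowNet taxonomy slide; "mental" and "fact"
# are placed under "thing" as listed in the sememe table.
PARENT: dict[str, str] = {
    "thing": "entity",
    "physical": "thing",
    "mental": "thing",
    "fact": "thing",
    "animate": "physical",
    "AnimalHuman": "animate",
    "human": "AnimalHuman",
}

ALPHA = 1.6  # assumed smoothing parameter


def path_to_root(sememe: str) -> list[str]:
    """Return [sememe, parent, grandparent, ..., root]."""
    path = [sememe]
    while sememe in PARENT:
        sememe = PARENT[sememe]
        path.append(sememe)
    return path


def path_distance(a: str, b: str) -> int:
    """Number of edges between a and b via their lowest common ancestor."""
    steps_from_b = {node: i for i, node in enumerate(path_to_root(b))}
    for steps_from_a, node in enumerate(path_to_root(a)):
        if node in steps_from_b:
            return steps_from_a + steps_from_b[node]
    raise ValueError(f"{a!r} and {b!r} share no ancestor")


def sememe_similarity(a: str, b: str, alpha: float = ALPHA) -> float:
    """alpha / (d + alpha): 1.0 for identical sememes, falling with distance."""
    return alpha / (path_distance(a, b) + alpha)


if __name__ == "__main__":
    print(path_distance("human", "animate"))  # 2
    print(path_distance("human", "mental"))  # 5
    print(round(sememe_similarity("human", "animate"), 3))  # 0.444
```

Under this formula human is closer to animate (1.6 / 3.6, about 0.44) than to mental (1.6 / 6.6, about 0.24), which matches the intuition that a person is a kind of living thing rather than a kind of mental object.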