Getting Deeper Semantics than Berkeley FrameNet with MSFA Kow KURODA∗ , Masao UTIYAMA∗ , Hitoshi ISAHARA∗ ∗ National Institute of Information and Communications Technology (NICT), Japan 3-5 Hikaridai, Sorakugun, Kyoto, 619–0289, Japan {kuroda, mutiyama, isahara}@nict.go.jp Abstract This paper illustrates relevant details of an on-going semantic-role annotation work based on a framework called M ULTI LAYERED / DIMENSIONAL S EMANTIC F RAME A NALYSIS (MSFA for short) (Kuroda and Isahara, 2005b), which is inspired by, if not derived from, Frame Semantics/Berkeley FrameNet approach to semantic annotation (Lowe et al., 1997; Johnson and Fillmore, 2000). 1. Introduction §1.1. presents the current status of our work, presenting our framework called MSFA. §1.2. explains our motivations. Main features of MSFA are presented in §2.1. Relevant details of the proposed annotation scheme under MSFA are presented in §2.3. with some explanations. 1.1. Current status In (Kuroda and Isahara, 2005b), we defined a framework for fine-grained semantic analysis/annotation called M ULTI - LAYERED / DIMENSIONAL S EMANTIC F RAME A NALYSIS (MSFA for short), and produced a small collection of semantic analyses/annotations for Japanese texts taken from Kyodai Corpus (KC) (Kurohashi and Nagao, 2003). This was just a first attempt to provide an evaluation version released for free, expecting feedback from users.1 More annotation work will follow.2 The target sentences of the official release were three Japanese newspaper articles, comprising 63 sentences in total, selected from the KC. The English translation of the KC was completed now at NICT, and the Chinese translation is underway. We hoped that our semantic role annotation could be used on a multi-lingual basis if a similar kind of semantic annotation was added to other language’s part. The number of the frames we identified through this annotaion/analysis work comes around 700 in terms of type. Note that, as we will see later, the semantic analysis of a sentence and the identification of frames needed for it are performed at the same time, in a cycle. This distinguishes our work from semantic annotation projects like SALSA Project (Erk et al., 2003), which crucially depend on the pre-existing (and yet completed) BFN database of frames. In MSFA, we annotate the texts for semantic roles differentiated from semantic types, an important distinction proposed by (Kuroda and Isahara, 2005a) based on insights from Frame Semantics (Fillmore, 1985; Fillmore and Atkins, 1994).3 While semantic types can be equated with 1 Data access is restricted, however. Users need an account. this “official” result, some additional semantic analyses/annotations are being done on a volunteer basis in the form of open development available at http://www.kotonoba.net/˜mutiyama/cgi-bin/ hiki/hiki.cgi?FrontPage. 3 Roughly, the proposed distinction of semantic roles from semantic types corresponds to the distinction of “relational” cate2 Besides natural kinds by and large and encoded in a thesaurus effectively, semantic roles can’t: they are situation-specific concepts highly dependent on culture, seeming to be more responsible for text understanding than semantic types. 1.2. Motivations: Dealing with “deeper” semantics Semantic annotation has become a trend: it is being practiced in a variety of forms. To name just a few frameworks, we have Berkeley FrameNet (BFN, henceforth) (Baker et al., 1998; Fillmore et al., 2003), PropBank (Kingsbury et al., 2002), SALSA Project and SemCor. For comparison of some of the major frameworks, see (Ellsworth et al., 2004). When it comes to the task of specifying “deeper” semantics useful to text understanding at realistic levels, BFN looks most appealing because it tries, or at least promises, to provide more specific semantic information than you can get through specification of argument structures of predicates (this is what PropBank aims for) and more interesting information than you can get through WordNet-based word sense disambiguation (this is what SemCor aimed for). By realistic, we mean it is capable of specifying semantic intuitions that reflect actual mental processing in human brain. Shallow semantic analysis like word sense disambiguations at the level of SemCor is useful, but we still need more to get into realistic text understanding, and without semantic specifications at some reasonably deep levels, understanding of texts at some psychologically realistic levels would be just a fancy, for the reasons specified below. 1.2.1. Need for finer-grained specifications: A case study of a Japanese verb osou (Nakamoto et al., 2005) conducted psychological experiments based on a detailed corpus-based analysis of the Japanese verb osou. They analyzed carefully all uses of “Xga Y -wo osou” (active) and “Y -ga X-ni osowareru” (passive form) collected from a Japanese-English alignment data (Utiyama and Isahara, 2003).4 giving the network analysis specified in Figure 1. The experiments showed that ordinary Japanese were able to easily differentiate at least 15 situations (F01–F15) specified at very concrete levels in gories from “object/entity” categories made by Genter and her colleagues (Gentner, 2005; Gentner and Kurtz, 2005; Asmuth and Gentner, 2005). Also, Gentner’s “relational schema” category seems to correspond to “(semantic) frame” in the BFN sense, and “(idealized) situation” in our sense. 4 The result of annotation work is freely available. More Abstract More Concrete L2 Level Situations 狼が羊の群れを襲った Wolves {attacked; ?*assaulted} a flock of sheep. F06: Predatory Victimization L1 Level Situations MM 1a A: Victimization of Animal by Animal (excluding Human) スズメバチの群れが人を襲った A swarm of wasps {attacked; ?*assaulted} people. MM 1c MM 1b F07: Nonpredatory Victimization B2: Victimization of Human by Animal (excluding Human) A,B: Victimization of Animal by Animal F07b: (Counter)Attack for Self-defense F07a: Territorial Conflict between Groups MM 1e E: Conflict between Groups B0: Victimization of Human by Animal (including Human) MM 1d MM 2 MM* 2 サルの群れが別の群れを襲った A group of apes {attacked; ?assaulted} another group. マフィアの殺し屋が別の組織の組長を襲った A hitman of a Mafia {attacked; assaulted} the leader of the opponents. F01: Conflict between Human Groups F01,02: Power Conflict between Human Groups B1: Victimization of Human by Human, Crime2 B3c: F01,02,03: Resource-aiming Victimization MM 11 暴徒と化した民衆が警官隊を襲った A mob {attacked; ?assaulted} the squad of police. 貧しい国が石油の豊富な国を襲った A poor country {attacked; ?assaulted} the oil-rich country. F02: Invasion MM 10 MM 9 B3: Victimization of Human by Human based on desire-basis, Crime1 A,B,C,D,E (=ROOT): Victimization of Y by X F04: Persection B3b: Physical Hurting = Abuse MM 8 F03a: Robbery 引ったくりが老婆を襲った. A purse-snatcher {attacked; ?*assaulted} an old woman. F03b: Robbery 三人組の男が銀行を襲った. A gang of three {attacked; ??assaulted} the bank branch. F03: Robbery B3a: Physical Hurting = Violence 狂った男が小学生を襲った A lunaric {attacked, assaulted} boys at elementary school. F05: Raping 男が二人の女性を襲った A man {attacked; assaulted; ??hit} a young woman. MM 14 ?MM 7b MM 0 F09,10(,11): Natural Disaster D: Perceptible Impact ? C,D,E: Victimization in Unfortunate Accident F12: Social Disaster F12a: Social Disaster on Larger Scale MM 4a MM 5a F12b: Social Disaster on Smaller Scale ? F13: Long-term sickness Hierarchical Frame Network (HFN) of “X-ga Y-wo osou” (active) and “Y-ga X-ni osowareru” (passive) E: Personal Disaster? NOTES • Instantiation/inheritance relation is indicated by solid arrow. • Typical “situations” at finer-grained levels are thick-lined. • Dashed arrows indicate that instantiation relations are not guaranteed. • attack is used to denote instantiations of A, B. • assault is used to denote instantiations of B3 (or B1). • hit, strike are used to denote instantiations of C. • Pink arrow with MM i indicates a metaphorical mapping: Source situations are in orange. MM 4b F13,14: Suffering a Physical Disorder L1 Level Situations ? L2 Level Situations F08: Misfortune ?MM 4c F15: Short-term mental disorder 肺癌が彼を襲った He {suffered; was hit by} a lung cancer (cf. Cancer {??attacked; hit; seized} him) 痙攣が患者を襲った The patient have a convulsive fit (cf. A convulsive fit {??attacked; ?seized him) F14: Short-term sickness F14,15: Temporal Suffering a Mental or Physical Disorder 大型の不況がその国を襲った A big depression {?*attacked; hit; ???seized} the country. 赤字がその会社を襲った The company {experienced; *suffered; went into} red figures. (cf. Red figures {?attacked; ?hit; ?*seized} the company}) MM 5b MM 13 F13,14,15: Getting Sick = Suffering a Mental or Physical Disorder 地震がその都市を襲った An earthquake {*attacked; hit; ?*seized} the city. ペストがその町を襲った The Black Death {?*attacked; hit; ?seized} the town. MM 3a MM 3b ? MM 7a MM 6a F11: Epidemic Spead ?MM 12 C: Disaster F10: Natural Disaster on Larger Scale ?MM 6b ? 突風がその町を襲った Gust of wind {?*attacked; hit; ?*seized} the town. F09: Natural Disaster on Smaller Scale 無力感が彼を襲った He {suffered from; was seized by} inertia (cf. The inertia {?*attacked; ?hit; ?seized} him). 不安が彼を襲った He was seized with a sudden anxiety. (cf. Anxiety attacked him suddenly} 暴走トラックが子供を襲った The children got victims of a runaway truck (cf. A runaway truck {*attacked; ?*hit} children.) Figure 1: Hierarchical Frame Network Analysis (HFNA) of the situations referred to by osou-sentences in Japanese. “L1 and L2 Levels” refer to the semantic classes “L1 and L2” defined in Figure 2. Figure 1. To set up for this, they tried to specify the full range of the verb’s uses in terms of Frame Semantics, coming up with the finer-grained situations (F01–F15). The result reported in (Nakamoto et al., 2005), especially the “frame-lattice” specified in Figure 1, can be used to indicate how much fine-grainedness is needed if we try to annotate every use of osou. Note that osou is a rather generic verb to denote a situation of victimization of a h V ICTIM i, Y , by a h H ARM CAUSER i, X. Its meanings range from “X { attack; assault } Y ” (e.g., A gang of three { attacked; ?assaulted } the bank, which comes close the “core” sense of osou, A purse-snatcher { attacked; ??assaulted } an old woman) to metaphorical “X { hit; strike } Y ” (e.g., The hurricane { hit; struck } the city), to further metaphorical “Y suffer (from) X” (e.g., The man suffered { a damage; from a serious disease }). For overview, the English translations of the osou’s occurrences in the freely accessible component of the JEAD are specified in Table 2. BFN provides a higher-level description of several coarse-grained classes for the situations specified in Figure 1. Relevant BFN frames are Attack, Cause harm, Cause impact, but they turn out to be too coarsegrained, for the reasons specified below, as described in (Kanamaru et al., 2005) in more detail. An important point is this: (Nakamoto et al., 2005) found that if activities have different purposes (includ- ing “null” value), no matter how “similar” they are in other respects, people tend to categorize/classify them as “different situations”, and they apparently had no difficulty in performing it. This means something quite ironical to most linguistic/lexicographic analyses: namely, the use of the same verb (e.g., osou) to refer to different situations does not guarantee the sameness of the conceptualizations with and against which situations are described. Rather, (even) ordinary people have no problem in understanding sentences employing highly “fine-grained” differentiated conceptualizations, at least as fine-grained as the levels of F01–F15. If this is true, it follows that the granularity levels of BFN’s Cause impact are unsatisfactory, and it can be easily confirmed by the following fact. Cause impact doesn’t distinguish metaphorical and nonmetaphorical uses. For example, Taifuu-ga Tokyo-wo osotta (‘The typhoon { hit; struck } Tokyo’) and Bouraku-ga shijou-wo osotta (‘The downfall { hit; struck } the market’) are not differentiated. Furthermore, we can find yet finer-grained distinctions in some nonmetaphorical cases of it. For example, taifuu (‘typhoon’) and toppuu (‘a gust of wind’) have different selectional properties: Taifuu-ga sono mati-wo osotta (‘The typhoon hit the city’) sounds natural but ??Taifuu-ga sono otoko-wo osotta (‘The typhoon hit the man’) sounds unnatural. In contrast, ?Toppuu-ga sono mati-wo osotta (‘A gust of wind hit the city’) sounds awkward but Toppuu-ga sono 112 (TOTAL) L0 = Sub L1 Level L1 attack[+human(s)]: rob 4 7 10 attack[+human(s)]: rob: break into attack[+human(s)]: rob: make off with MONEY attack[+human(s)]: rob: hold up attack[+human(s)]: rob: threaten 2 1 1 2 attack[+human(s)] 23 23 attack[+human(s)]: kill attack[+human(s)]: assault attack[+human(s)]: assault: raid attack[+human(s)]: assault: shoot attack[+human(s)]: assault: shoot, wound attack[+human(s)]: assault: shoot; rob attack[+human(s)]: assault: stab 1 9 1 3 1 1 3 1 10 attack[-human(s),+animal(s)] 7 8 attack[-human(s),+animal(s)]: kill attack[-human(s),?animal]: assault[+metaphoric?]: turn on 1 1 1 hit,strike: hit 3 8 hit,strike: rock hit,strike: strike hit,strike: pound hit,strike: destroy: wreak on hit,strike: destroy: ravage hit,strike: roar through hit,strike: sweep through hit,strike: wrought devastation hit,strike: IMPLICIT in: earthquake hit,strike: IMPLICIT in: in PLACE hit,strike: there is 1 2 2 1 1 1 1 1 2 2 1 hit,strike[+metaphoric, +human(s)?]: occur[=attack] 1 hit,strike[+metaphoric]: hurt hit,strike[+metaphoric?]: hit hit,strike[+metaphoric]: hit hit,strike[+metaphoric]: paralyze hit,strike[+metaphoric]: IMPLICIT in: shocks from hit,strike[+metaphoric]: overtake hit,strike[+metaphoric]: take a toll hit,strike[+metaphoric]: besiege hit,strike[+metaphoric]: engulf hit,strike[+metaphoric]: occur hit,strike[+metaphoric]: fall on hit,strike[+metaphoric]: IMPLICIT in: in PLACE hit,strike[+metaphoric]: IMPLICIT in: problems hit,strike[+metaphoric]: IMPLICIT in: turmoil 1 2 5 1 1 1 1 1 1 2 1 1 1 1 suffer 3 suffer: IMPLICIT in: victim suffer: be injured suffer: feel pain suffer: bring sorrow to people suffer: feel anxiety suffer: seized with suffer: suddenly begin a SYMPTOM suffer: experience attack[-human(s), +metaphoric] 1 1 1 1 1 1 1 2 English verbs that translate OSOU Semantic Classes at Level 1 Resourcethreatenig situations L2 Semantic Classes at Level 2 L3 Semantic Classes at Level 3 51 Intended Harmcausation[+animate] 90 Cause oriented 39 Disasters = Harmcausation[-animate] 10 Sufferings = Harmexperience 10 Effect oriented 3 42 Lifethreatening by human 9 Lifethreatening by nonhuman 18 Natural disasters 21 Social disasters[+met aphoric] 10 Sufferings 5 3 2 2 6 2 9 4 4 2 5 3 4 Figure 2: English verbs that appear in the translations of the osou-sentences in Japanese-English Alignment Data (JEAD) (Utiyama and Isahara, 2003) otoko-wo osotta (‘A gust of wind gust hit the man’) sounds fine. All of this suggests that taifuu is understood as a relatively large-scale disaster that happens to a relatively large group of people, whereas toppuu is understood as a relatively small-scale disaster that happens to an individual. The same is true of Cause harm and Attack but we omit details here for the sake of space. 1.2.2. Need for more effective identification procedure Based on the observation above, we can state the problem we are facing as follows: if BFN promises to provide “deeper” semantics, how deep should it be? More plainly, are BFN frames “deep enough”? For this matter, our answer is negative, because Cause impact, Cause harm, and Attack, provided by BFN, are not detailed enough to specify selectional properties we need to describe to reflect human semantic intuitions. But how is this case, after all? We believe that the frame identification procedure adopted by BFN now is not adequate for the purpose of serious, exhaustive semantic annotation/analysis of an unselected text, no matter how reasonable it may look for the purpose of building a relatively wide-coverage database in a short time. More specifically, we believe that the adequate levels of granularity need to be “discovered” through inductive exploration into real texts, unlike BFN’s topdown quick “scans” for frames. There are more “general” problems that are not particularly applicable to BFN, however. Most seriously, no evaluation measures are defined to tell whether a given semantic description is “deep enough” and without such measures, any promise of providing deeper semantics is likely to be ineffective. This means that we definitely need to define a measure for the goodness of frame identification procedure, which provides a “testbed” for evaluating semantic analyses in terms of frames. There are certain possible choices. At one extreme, one can define as many frames as people can recognize as “different situations”, based on the selectional properties for frame-governing verbs. In the case of Japanese verb osou, (Nakamoto et al., 2005) showed, as mentioned earlier, that ordinary Japanese were able to easily differentiate at least 15 situations specified at the most concrete levels (F01–F15) in Figure 1, depending on the nature of the Assailant (the Agent-class role of Attack), Impactor (the Agent-class role of Cause impact) and Harm causer (the Agent-class role of Cause harm), especially in terms of their purposes. If their claim is adequate, it is implied that HFNA in Figure 1 is not “too detailed.” If you want to claim it is, you need justify it on empirical basis, because your claim argues against an empirical result from psychological experiments. It is possible to say that BFN is more interested in establishing the syntax-semantics interface rather than doing deep enough semantic analysis alone. It sounds like a good excuse, but we are also afraid if it is not “too much for one, not enough for the other.” An important finding by (Nakamoto et al., 2005) is that for osou-sentences, the semantic classes of subjects co-vary with those of objects. For example, as F06: Predactory Victimization in Figure 1 specifies, a h P REDATOR i { attacks; ?*assaults; *hits } a h P REY i. This is why The wolf attacked the (school of) sheep contrasts to ?*The wolf attacked the (school of) sardins, and The tuna attacked the (school of) sardins contrasts to ?*The tuna attacked the (school of) sheep. This piece of deeper semantics, or knowledge of what animal serves as food for what animal, enables us to make such judgments. Arguably, the semantic co-variation (under coselection) among arguments (and adjuncts, too) for a given verb can be used as a good test for differentiation among finer-grained frames/situations, serving as a useful distinctive feature applicable to semantic classification of many other verbs. Later research confirmed this. This means that we can use it as a guideline for frame identification/differentiation. Unfortunately, however, this crucial feature of semantic co-variation is largely ignored, if not totally unrecognized, in BFN now, even though it plays a crucial role in identifying frames/situations. More importantly, if semantic co-variation accounts for selectional properties of predicates, they ought to be specified somewhere, and we believe their specifications need to be done in terms of frames/situations; or where else? — Do you want it to appear in the entries for WOLF, SHEEP, TUNA , and SARDIN , risking the notorious frame problem? 1.3. Why MSFA? The crucial point here is that there is a sort of “tradeoff” between the standardization and expressiveness measures: if you want more standardization, you usually need to sacrifice expressiveness; if you want more expressive- ness, you usually need to sacrifice its standardization level. Obviously, BFN sets higher priority on the standardization measure, but this is not the only way to go. In a sense, our semantic annotation/analysis was done as an experiment to see what would happen if expressiveness is more important. This is why we did what we did. 2. Details of MSFA-based Annotation In this section, we specify relevant details of the MSFAbased semantic annotation/analysis to clarify what makes it different from BFN-based one. 2.1. Features of MSFA MSFA-based annotation has the following features: (1) It is not presumptive, in that there are no presupposed frames before doing annotation/analysis, except when the results of a previous analysis are reused. More specifically, frame identification in MSFA is necessitated by semantic analysis/annotation itself. (2) This means that the analysis/annotation needs to be greedy, in that as many frames as you need can be identified and added to the analysis, as long as they found necessary for providing deep enough semantic analysis/annotation of a text. Sufficiency is determined by successful specification of semantic covariations among arguments and adjuncts. (2) It does not assume (at least currently) established definitions for any frames: they are always open to major modifications, and the annotation task is designed so to make it easy to manage. (3) It is exploratory, in that it aims to “discover” frames in a bottom-up, inductive fashion, through the process of “exhaustive” semantic analysis of a text itself. (4) It is meant to be exhaustive, in that every word and multi-word unit are identified as a frame-evoking element: no exception. This means that you are not allowed to ignore certain elements as “uninteresting for our purposes.” Rather, you are required to seriously work with capturing the frame-evocation effects by each word within a sentence in a running text. (8) It is meant to be open, in that annotation work can be done in the form of open-development. The annotation scheme was so defined that annotators are only required to have a good command of Microsoft Excel, dispensing with special house tools for annotation.7 The policy in (2) guarantees that MSFA’s aim to provide “unrestricted” and “unbiased” description. It means that it does not avoid the analysis of such “troublesome” cases as metaphors, metonymies, idioms, and other figurative uses of language. Analysis of them can, and it turned out to, be pains-taking, but it’s worth doing it. We will be discussing some relevant details in §2.3.3. 2.2. Remarks One of the anonymous reviewers said that our first attempt, semantic annotations for only 63 sentences, was just anecdotal. We don’t deny it, but we’ve just started. Please be patient and hold your judgment until you see coming releases. Any great developments started small. For this, it should be noted that MSFA is designed for an open, distributed development, and researchers are always welcome to participate. It doesn’t depend on any house softwares: annotators just need to work on Excel spreadsheets. This, we hope, would “compensate” our small start. 2.3. How Annotation/Analysis Goes in MSFA 2.3.1. Sample MSFAs For illustration, a sample MSFA of (9), which is S-ID: 950107210-002 of the KC, is given in Table 3. (9) アルゼンチンの元サッカー選手、ディエゴ・マラドーナ 氏が六日、同国の検察当局に一時身柄を拘束された。 (10) On January 6, Diego Maradona, a renowned former soccer player of Argentina, was temporarily taken into custody by the local prosecutor’s office in his country. As mentioned earlier, MSFA was inspired by the Frame Semantics/BFN approach. Details of MSFA, however, were specified independently of BFN, for reasons specified above. Their main differences are the following: (11) (5) In this sense, it is unbiased, in that identification of frames is not motivated by any specific applications like Machine Translation, QA, Information Retrieval. (6) Also, it provides semantic analyses/annotations independent from syntactic parses, in that it only assumes tokenization, or “shallow parses” in some limited cases only for the sake of simplication. Unlike many other frameworks, it tries to “unground” semantic specifications against syntactic ones. For one, specification of valence-patterns is out of our focus, because it is very likely that other databases will provide such information independently. (7) This makes annotation flexible, in that you are always allowed to add or remove frames. If semantic analysis/annotation depended on certain syntactic parses, it would be a disaster to do such an “editorial” job. a. BFN provides standardized frames in the form of a database, whereas MSFA provides an annotation database using frames discovered in a running text in an unselective way. b. Frames identified using MSFA are more detailed and specific, having more effectiveness than BFN frames in terms of granularity. This naturally has led to the difference in descriptive density between two frameworks. On average, a sentence has 20+ frames attached to it in the MSFA, drastically more dense if compared to 2 or 3 frames per sentence in the BFN annotation publicized at their Web site.8 Despite this, it should be noted that MSFA-based and BFN-based annotations are expected to be compatible, because MSFA should give us a superset of BFN frames. 7 Of course, the quality control of the annotation work is done by a “committee,” managed independently using a Wiki site. 8 But BFN is not (yet) a project of semantic annotation per se. So, it is kind of unfair to judge BFN from this point alone, as correctly pointed out by one of the reviewers of this article. Saying !"#$%&'( !) !* !+ ,* %=#2;"#:%@ !"#$%9:;9 !*A %=#2;"#:%@ %=#2;"#:%@ !"#$% B"%@CBB;@ !+ ,* <%=#:>;?@ %@&/+ !"#$% I#$% JKLM ¾ ¾ ¾ ¾ ¾ .¿À Âà ¾ LMÄ ¾ kLÄ ¾ JKYÍ» ÑÒÓÔŽ Ô Y LM .¿À Âà LN ,+ !- OPQR OR {Á {Á .¿À {Á Âà /+ .* 0* !1# .¿À LNÄ OPQRÄ ST UVWXY Z[U JK \]^Y_ abYcd ` .+ !12 !3 !4 0+ B"%@CBB;@ %%@&!5A %=#2;"#:%@ D;?@:>:C:% D;?@:>:C:% %=#2;"#:%@ !12 @&!) @&!1# 0+A D;?@:>:C:% @&.* Uefgh Ui*j UklU Uefgh UmnYo Ui+j YZ[Ui*j pq !5 !6 /* 7* '+ .5 mrsPY tu vwxyz {|}~Y •€ •‚yƒz {|}~Y •€ \]^|} ~Y•€ 0- '* B"%@CBB;@ %@&'+A D;?@:>:C:% @&!5A %=#2;"#:%@ '* %=#2;"#:%@ %=#2;"#:%@ .5A %=#2;"#:%@ !6F'+ %=#2;"#:%@ 7* /* •€ „…†‡ ˆ‰YŠ‹ Žy••Y YŒ• ‘’ !8 !*3 !*5 D;?@:>:C:% @&!**A %=#2;"#:%@ %=#2;"#:%@ !*5 !*) ‘’ U“”U !*+ !*1 !*6 !*- %=#2;"#:%@ B"%@C$%@ !*-F!*3A !*6A B"%@CBB;@ :#"G%:@ %=#2;"#:%@ %@&!*4 !*6 /+ –Y—˜ ›œ •–Y—˜ i™$%:#Bš; i™$%:#Bš ">Dj ;">Dj •ž Ÿ .- '- 05 D;?@:>:C:% @&!*6A %=#2;"#:%@ B"%@CBB;@ 05 %@&05 —¡Y¢£ mnY¤— ¥¦§¨• Y‘’ !** 03 !*) D;?@:>:C:% @&05 ©’ Uefgh Ui-j .1 .3 D;?@:>:C:% D;?@:>:C:% @&!*8 @&.1 !*8 !+* 06 "%#=>E%@ B"%@CBB;@ !*1A %@&!*4A %=#2;"#:%@ "%#=>E%@ 03F06A !+) B"%@CBB;@ %@&!+) ªr«Y¬ UmnYo ®¯XY° ²³Y´µ ®¯Y¶· -ª YZ[Ui+j ± !+) !++ D;?@:>:C:% @&!+*A %=#2;"#:%@ !++ ¸¹ º¹ í¸¹Ä º¹ÕÖ !*4 $;:>H#:%@ !*6A B"%@B#"%@ !+* »¼ ¼½ ®¯ ÅíÆ»¼Ä í¼Ä ®¯Ä Âà .¿À ÅOPQR ÄÆ ST{ kLÄ kLÄ ORÄ ORÄ ÈÉ{ LNÍ» LNÍ» ORÍ» ORÍ» Í» LMÄ .) D;?@:>:C:% D;?@:>:C:% D;?@:>:C:% D;?@:>:C:% @&!)A D;?@:>:C:% "%#=>E%@&!+ @&,* @&!) @&!) %=#2;"#:%@ @&.* !1# Z[Ä JÎ _`Ä cdÄ _`YÊ{ ÈÉ{ efghÇ RÄ efghÇ RÄ efghÇ RÄ “”Ä Z[Ä Å“”ËÌ ÄÆ JK ÏÐ Õ Ö×Ø::"i*j ÕÖ×Ø::" ef mn tuÄ×ef ÙÀ¿i*F-j “”Ú o ÛØ<7Ù< ¾ ‘’i•¡ Ü‘j •€YÜ z{â ÙÀ¿7Ù< ÙÀ¿ \]^ •€YÍ» •€Ä •€Ä •€Ä •€Ä ‘’Žy• Þ vwxy z{ æ ÙçèÙI(Ù< ÙçèÙI(Ù< ÙçèÙI(Ù< éêëì ÕÖ ÕÖ ÙçèÙI(Ù<â ÙÀ¿7Ù<ª aôâ ÙÀ¿7Ù< ÕÖ ÙÀ¿7Ù< pq}Š• ¡ pqßYà á ÙÀ¿7Ù< mn pqÄ tuÄ ‘’„… ÝJܪ ‘’Žy• ‘’Ú ãä•â ÙÀ¿7Ù< rå ÙÀ¿7Ù< ÙÀ¿ †‡Ä Œ•Ä ‘’Ä ‘’Ä .¿À .¿Àª í—îÄ ÕÖ o —˜YÕÖ —˜YÕÖ í›œÄ ÅíÆ•žÄ ÅíÆŸ Ä —¡Ä °±ÕÖ ´µÅÕÖÆ Ä í¶·Ä ï ðäñyò ó õ ¾ ÙçèÙI(Ù< ÙçèÙI(Ù< ÙçèÙI(Ù< ÙçèÙI(Ù< ÙçèÙI(Ù< ÙçèÙI(Ù< ÙçèÙI(Ù< ÙçèÙI(Ù< ÙçèÙI(Ù< ÛØ<7Ù< —îYö÷ ÙçèÙI(Ù< ÙçèÙI(Ù< ÙçèÙI(Ù< ÙçèÙI(Ù< ÙçèÙI(Ù< ÙçèÙI(Ù< ÙçèÙI(Ù< ÙçèÙI(Ù< ÙçèÙI(Ù< ÙçèÙI(Ù< ÛØ<7Ù< ef ÛØ<7Ù< ÛØ<7Ù< ÛØ<7Ù< ÛØ<7Ù< ÛØ<7Ù< ÙçèÙI(Ù< ÙçèÙI(Ù< ÙçèÙI(Ù< ÙçèÙI(Ù< ÛØ<7Ù< ÛØ<7Ù< °±Ä ¾ °±Í» ÛØ<7Ù< ´µÄ ÕÖ|øR —î×Ø::" ÛØ<7Ù< ÛØ<7Ù< ¸¹Ä »¼×Ø::" Ùçè ÛØ<7Ù< º¹Ä º¹Í »×Ø::" ÙçèÙI(Ù< ÙçèÙI(Ù< ÙçèÙI(Ù< ÛØ<7Ù< —»¼Ä »¼YÍ »×Ø::" ÛØ<7Ù< —¼Ä ¼ùYÍ »×Ø::" ¾ ÛØ<7Ù< ®—úÜ ®—ûü ¾ •žY] ¾ •žYÂà ¾ —îY-. ûü û æ /Ùçè 0 —îÄ×Ø::" .¿À •–"#[ –"#[Ç ›œ"&ÇR²³ R$% 'ÇRýþ —˜YÜ‘ —˜YÜ‘ ›œYÜ‘ •žYÜ‘ Ÿ —˜Y-. —˜Y-. ›œY-. •žYûü Ÿ ûü ûü ûü ¾ , ÕÖ|øR —î “”Þ /Ùçè •–Y—˜ Ä×Ø::" /Ùçè –Y—˜ Ä×Ø::" /Ùçè /Ùçè ›œÄ×Ø::" •žÄ×Ø::" —¡¢£Y ÏÐ mn .¿À ¢£Yýþ .¿À »¼ »¼YÍ» ¼ùYÍ» ®—YÍ» »¼Yýþ ¼ùYýþ ¶·YÜ‘ Yûü ¶·Yûü í¤—Äi2 34•–j ¥¦§¨ ©’Ú ¢£Ä ¤—Ä ‘’Ä ©’Ä ÕÖ ªr«ª °±Ä —´µÄ —¶·Ä ¸¹Ä º¹Ä —»¼Ä —¼Ä Ùçè Ùçè Ùçè Ùçè Ùçè ÙÀ¿7Ù<ª ª¬-Ī Ùçè Ùçè Ùçè Ùçè Ùçè Ùçè Ùçè @ú{Á ¢£Y>? ÿ!â ÙÀ¿7Ù< ÿ!â ÙÀ¿7Ù< ef ÿ!i*j ®¯()R |*+Rý ¶·Yýþ YÜ‘ /Ùçè 1Ÿ Ä×Ø::" º¹Í» ¶·YÏÐ mn 3 .¿ÀÙ<I¿ <ª Y 56 —îÄ -7 8 Ùçè ÛØ<7Ù< —îY9: ûü —îY<= >? ¾ ;ü lA " › œ *+ Š E •–Y—˜ –Y—˜Ä ›œÄ —•žÄ Ä Ùçè Ùçè Ùçè Ùçè ÛØ<7Ù< ÛØ<7Ù< ÛØ<7Ù< ÛØ<7Ù< —˜Y9: —˜Y9: ›œY9: ÙÀ¿i™D;$ ûü ûü ûü B;@>:%j —˜Y>? —˜Y>? ›œY>? —îYÍ» Ùçè Ùçè Ùçè Ùçè Ùçè Ùçè .¿À Ùçè Ùçè .¿À Ùçè Ùçè 1Ÿ Ä —˜YÍ» —˜YÕÖ ›œYÕÖ ¢£YÕÖ ÛØ<7Ù< ÙÀ¿ ÛØ<7Ù< .¿À Ùçè Ùçè Ùçè Ùçè ÛØ<7Ù< ªDÎ ÙÀ¿ ÙçèÙI(Ù< ÙçèÙI(Ù< Ùçè Ùçè Ùçè Ùçè ª¶·YÍ » ÙÀ¿BÿCâ ÙÀ¿BÿCâ ÙÀ¿7Ù< ÙÀ¿7Ù< ÿ!i+jâ ÙÀ¿7Ù< Ùçè Figure 3: MSFA of (9) [all annotations in Japanese]: Each column specifies a frame; each cell specifies a semantic role with the corresponding segment in the first row as its realization value. Empty, uncolored cells indicate that segments6 have no specific roles defined against the frames containing them. Relative positioning among frames has no specific meaning, though organized to make the table easy to read. Figure 4: Semantic Frame Network Analysis (SFNA) of (9) automatically generated by GraphViz (a free graphic-generator software) based on the specification in Figure 3, i.e., the MSFA of (9): Each circle specifies a frame (all in Japanese) 2.3.2. Specifying the interrelationships among frames Specified in Figure 4 is the relationship among frames that constitute the semantic analysis of (9). It specifies how frames are interrelated within a sentence. This is automatically generated from the MSFA in Figure 1. Listed in (12) according to their relative frequencies are “representative” relationships currently assumed in MSFA,9 most of which have an equivalent, or analoguous “frame-to-frame relation” in BFN: this, it should be pointed out that there is one remaining problem even if BFN is basically a lexicographic work: Who is going to supplement a huge amount of “unknown” frames missing in BFN, and how. We don’t believe it can be done automatically. 9 Other infrequent relations such as “F motivates G” are used. (12) a. “Elaboration” relation: A frame F elaborates another frame G; i.e., F inherits information from G; e.g., h Murder i elaborates h Killing i. b. “Constitution” relation: F constitutes G; i.e., F is part of G. e.g., h Paying i constitutes h Buying i. c. “Presupposition” relation: G presupposes F; e.g., h Buying i presupposes h Selling i. d. “Presumption” relation: F presumes G. This is the reverse of Presupposition; e.g., h Selling i presumes h Buying i. e. “Realization” relation: F realizes G; e.g., h Buying i realizes h Obtaining i. f. “Target/Transfer” relation: F targets G. This is specifically introduced in MSFA to describe the “figurative uses” of words including metaphor and/or idioms.10 e.g., h Shooting i targets h Refuting i in sentences like He shot down the counterarguments from his opponents. Some of the relations, such as Presumption, Realization, Transfer do not seem to be defined in BFN.11 2.3.3. Encoding “figurative” senses Real texts contain a lot of problematic cases in which words have “nonliteral” meanings, to which the BFN frame definitions are not easily applicable. Representative, if not all, cases are metaphoric expressions and idiomatic expressions. They pose a serious challenge to semantic annotation/analysis, because, as mentioned above, successful analysis of a metaphorical expression requires clarification of “(conceptual) metaphorical mapping” (Lakoff and Johnson, 1999), and idioms have, for whatever reason, noncompositional, “superlexical” meanings encoded in a distributed fashion. BFN does not explicitly specify how to encode such effects. This is one of the major reasons we defined MSFA independently of BFN. Take He spilled the political beans for example, where political interrupts spill(ed) the beans. This is an example of discontinuous frame-evocation. The multilayered, redundancytolerant design allows us to analyze this idiom with the Target/Transfer relation, as in Figure 5. h P ROCECUTOR : X i take h C RIMINAL : Y i into custody, appearing in (9) in the passive form Y was (temporarily) taken into custody and targeting h Arrest(ing) i (=h 逮捕 i) (via h Imprisoning i (=h 拘留 i) realizes h Arrest(ing) i), is analyzed in this way. Frame ID (Local) F1 F2 presupposes presupposes Frame-to-Frame F3; unsatifies F4; unsatisfies Relations (Global) F3; elaborates F4; targets F5 F2; targets F5 Frame Name (Global) Spilling F3 F4 elaborates F4 Scattering Holding Keeping F5 presupposes F4; unsatisfies F4; elaborates F8 Leaking = Failing to Keep Secret F8 Failing * ~Characteriza tion~ Politics Characterizer * GOVERNOR Spiller Scatterer Holder Keeper Leaker spilled GOVERNOR GOVERNOR OR EVOKER EVOKER EVOKER EVOKER[1,3] the F7 Failed Activity He political F6 presupposes F5,F7 Object.Attr[1] Object.Attr[1] Object.Attr[1] Object.Attr[1] EVOKER[2,3]: Secret.Attr[1] EVOKER[+c omposite] Object.Attr[2] Object.Attr[2] Object.Attr[2] Object.Attr[2] Secret.Attr[2] beans Object Object Object Object . EXTENDER EXTENDER EXTENDER EXTENDER Politician[+pot entially] Failer EVOKER[3,3]: Secret EXTENDER EXTENDER Attribute: EVOKER EVOKER Object Issue EXTENDER EXTENDER Figure 5: MSFA of He spilled the political beans. 3. References J. A. Asmuth and D. Gentner. 2005. Context sensitivity of relational nouns. In Proceedings of the 27th Annual Meeting of the Cognitive Science Society, pages 163–168. C. F. Baker, C. J. Fillmore, and J. B. Lowe. 1998. The Berkeley FrameNet project. In COLING-ACL 98, Montreal, Canada, pages 86–90. Association for the Computational Linguistics. A. Burchardt, K. Erk, A. Frank, A. Kowalski, S. Padó, and M. Pinkal. 2006. Consistency and coverage: Challenges 10 We came to know that a similar solution was adopted in SALSA framework (Burchardt et al., 2006). 11 It’s not easy to establish correspondence between our definitions and the ones given in BFN, because the proper semantics of certain frame-to-frame relations are unclear, need to be refined from the ontological perspective. for exhaustive semantic annotation. Presented at Worshop on Corpus-based Approaches to Noncompositional Phenomena, DGfS-06 (Bielefeld), Feb 24, 2006. M. Ellsworth, K. Erk, P. Kingsbury, and S. Padó. 2004. PropBank, SALSA, and FrameNet: How design determines product. In Proc. of the LREC 2004 Workshop on Building Lexical Resources from Semantically Annotated Corpora, Lisbon. K. Erk, A. Kowalski, S. Padó, and M. Pinkal. 2003. Towards a resource for lexical semantics: A large German corpus with extensive semantic annotation. In Proceedings of the ACL-03. C. Fillmore and B. T. S. Atkins. 1994. Starting where the dictionaries stop: The challenge for computational lexicography. In B. T. S. Atkins and A. Zampoli, editors, Compuational Approaches to the Lexicon, pages 349–393. Clarendon Press. C. Fillmore, C. Johnson, and M. Petruck. 2003. Background to FrameNet. International J. of Lexicography, 16(3): 235–250. C. Fillmore. 1985. Frames and the semantics of understanding. Quaderni di Semantica, 6(2):222–254. D. Gentner and K. J. Kurtz. 2005. Relational categories. In W. K. Ahn, R. L. Goldstone, B. C. Love, A. B. Markman, and P. W. Wolff, editors, Categorization Inside and Outside the Laboratory, pages 151–175. APA. D. Gentner. 2005. The development of relational category knowledge. In L. Gershkoff-Stow and D. H. Rakison, editors, Building Object Categories in Developmental Time, pages 245–275. Hillsdale, NJ: Lawrence Earlbaum. C. Johnson and C. Fillmore. 2000. The FrameNet tagset for frame-semantic and syntactic coding of predicate-argument structure. In Proceedings of the 1st Meeting of the North American Chapter of the Association for Computational Linguistics (ANLP-NAACL 2000), pages 56–62. T. Kanamaru, M. Murata, K. Kuroda, and H. Isahara. 2005. Obtaining Japanese lexical units for semantic frames from Berkeley FrameNet using a bilingual corpus. In Proceedings of the 6th International Workshop on Linguistically Interpreted Corpora (LINC-05), pages 11–20. P. Kingsbury, M. Palmer, and M. Marcus. 2002. Adding semantic annotation to the Penn TreeBank. In Proceedings of the Human Language Technology Conference. K. Kuroda and H. Isahara. 2005a. Towards a new classification of (word) concepts by distinguishing role names and object names. Technical Report of IEICE, 105 (204): 47–54. K. Kuroda and H. Isahara. 2005b. Proposing the M UL TILAYERED S EMANTIC F RAME A NALYSIS OF T EXT . In The 3rd International Conference on Generative Approaches to the Lexicon. [Revised version is available as: http://clsl.hi.h.kyoto-u.ac.jp/˜kkuroda/ papers/msfa-gal05-rev1.pdf]. S. Kurohashi and M. Nagao. 2003. Building a Japanese parsed corpus: While improving the parsing system. In A. Abeillé, editor, TreeBanks: Building and Using Parsed Corpora, pages 249–260. Kluwer Academic, Dordrecht/Boston/London. G. Lakoff and M. Johnson. 1999. The Philosophy in the Flesh. Basic Books. J. B. Lowe, C. F. Baker, and C. J. Fillmore. 1997. A framesemantic approach to semantic annotation. In Proceedings of the SIGLEX Workshop on Tagging Text with Lexical Semantics K. Nakamoto, K. Kuroda, and H. Nozawa. 2005. Proposing the feature rating task as a(nother) powerful method to explore sentence meanings. Japanese J. of Cognitive Psychology, 3 (1): 65–81. (written in Japanese). M. Utiyama and H. Isahara. 2003. Reliable measures for aligning Japanese-English newspaper articles and sentences. In Proceedings of the ACL 2003, pages 72–79.
© Copyright 2024 Paperzz