Enhancing Recall in Information Extraction through Ontological Semantics

Sergei Nirenburg, Marjorie McShane and Stephen Beale
Institute for Language and Information Technologies
University of Maryland, Baltimore County
Baltimore, MD, USA

[Figure: architecture for fact extraction. Components: Text Sources, Ontological-Semantic Analysis, Text Meaning Representation (TMR), Fact Extraction, Fact Repository (FR), Question Answering; static knowledge sources: Grammars, Lexicons, Ontology. Legend: data and control flow; knowledge support.]

[Figure: architecture for fact production and its applications. Components: Text Sources, Ontological-Semantic Analysis, Text Meaning Representation (TMR), Fact Production, Fact Repository (FR), User; applications: Question Answering, Report Generation, Summary Production, Trend Identification; static knowledge sources: Grammars, Lexicons, Ontology. Legend: data and control flow; knowledge support.]

[Figure: question answering over the Fact Repository. Components: Query, Ontological-Semantic Analysis, Text Meaning Representation (TMR), Query Formulation, Fact Repository (FR), Answer; static knowledge sources: Grammars, Lexicons, Ontology. Legend: data and control flow; knowledge support.]

(he
  (he-pro1
    (cat n)
    (MORPH )
    (ANNO
      (DEF "the pronoun 'he'")
      (EX "He kicked the can.")
      (COMMENTS "Expect the same weights for constraints for 'he' and 'she'"))
    (SYN-STRUC ((root $var0) (cat n) (type pro)))
    (SEM-STRUC (ANIMAL))
    (MEANING-PROCEDURE
      (TRIGGER-REFERENCE
        (third singular male)
        (same-clause .1)
        (preceding-clause .7)
        (pre-preceding-clause .5)
        (preceding-sent .5)
        (sentence-minus-2 .2)
        (sentence-minus-3 .1)
        (para-break .5)
        (repeat-collocation .7)
        (synonym-collocation .6)
        (agent-theme .8)
        (pp-embedded .2)
        (function-match .7)
        (coord .7)
        ;; (refl-null-sem .5) NA
        ;; (subj-of-same-clause .5) NA
        ))))

A description of the heuristics:

same-clause: The candidate is in the same clause.
preceding-clause: The candidate is in the preceding clause.
pre-preceding-clause: The candidate is in any clause in the given sentence farther back than the preceding clause.
preceding-sent: The candidate is in the preceding sentence.
sentence-minus-2: The candidate is two sentences back.
sentence-minus-3: The candidate is three sentences back.
para-break: The candidate is in the preceding paragraph. Valid only for candidates that are not themselves in the first sentence of a paragraph.
repeat-collocation: The nominal candidate has previously been the same argument of the given verb.
synonym-collocation: The nominal candidate has previously been the same argument of a synonymous (or similar) verb.
agent-theme: The nominal candidate is one of the main arguments in its clause, which we can define for now as the agent or theme (not path, beneficiary, instrument, etc.).
pp-embedded: The nominal candidate is embedded in a PP. This is mostly to weed out nonprominent adjuncts, but since some arguments are PPs, it cannot be too strong a heuristic; it is better to look at case roles.
function-match: The syntactic function of the candidate matches the function of the referring expression.
coord: The candidate is an argument in the preceding conjunct of a coordinate structure, BUT the coordinate structure must be larger than the category itself. That is, we want to catch the fact that the coordination in 'I picked up the book and read it' (VP coordination) is a strong indicator of coreference, but we do not want to assume coreference in 'I told John and him'.
refl-null-sem: The reflexive directly follows an NP that has matching features; in this case, it is rendered as null semantics (I myself know… He himself thought… The plans themselves are…).
subj-of-same-clause: The candidate is the head of the subject of the same clause (used to distinguish anaphors). By 'subject of same clause' we mean whichever of the following is the nearest:
  1) the overt or elided subject of the minimal tensed clause
     - Mary-i likes herself-i.
     - Mary-i is happy-go-lucky and pro-i likes herself-i.
  2) the PRO (non-overt) subject of the given infinitival clause
     - John-i forces his children-j PRO-j to fight for themselves-j.
  3) the overt PP "subject" of the given infinitival clause
     - For me-i to hurt myself-i would be stupid.
  It is important to say 'the head' of the subject because, for example, in 'Mary's-i dog likes her-i' the modifier of the head corefers with the direct object.
refl-cl-subj: The reflexive has to match features with the subject of the clause: He went to the movies himself.

;United Airlines files for US Bankruptcy protection.

PREPROCESSOR OUTPUT (notice the ambiguities):
((UNITED-Airlines n ((type pn) (type company)))
 (FILE N ((NUMBER PL)))
 (FILE V ((PERSON THIRD) (TENSE PRESENT)) *REPEAT*)
 (FOR CONJ NIL)
 (FOR PREP NIL *REPEAT*)
 (US N ((TYPE PRO)))
 (US N ((TYPE COUNTRY) (TYPE PN)) *REPEAT*)
 (BANKRUPTCY N NIL)
 (PROTECTION N NIL)
 (*PERIOD* PUNCT NIL))

SYNTAX OUTPUT:
0 - 6 : CL ((0 1 NP UNITED-AIRLINES) (1 2 V FILE) (2 6 PP FOR)) FILE 1
0 - 1 : NP ((0 1 N UNITED-AIRLINES)) UNITED-AIRLINES 0
0 - 1 : N NIL UNITED-AIRLINES 0
1 - 2 : V NIL FILE 1
2 - 6 : PP ((2 3 PREP FOR) (3 6 NP PROTECTION)) FOR 2
2 - 3 : PREP NIL FOR 2
3 - 6 : NP ((3 3 ART ***) (3 6 N PROTECTION)) PROTECTION 5
;; Notice that some ambiguity is left as to (US (BANKRUPTCY PROTECTION))
;; vs. ((US BANKRUPTCY) PROTECTION). The semantic analyzer picked the one
;; it liked best, but we will continue working on these n-n compounds.
3 - 6 : N ((3 5 N BANKRUPTCY) (5 6 N PROTECTION)) PROTECTION 5
3 - 6 : N ((3 4 N US) (4 6 N PROTECTION)) PROTECTION 5
3 - 5 : N ((3 4 N US) (4 5 N BANKRUPTCY)) BANKRUPTCY 4
3 - 4 : N NIL US 3
3 - 3 : ART NIL *** NIL
4 - 6 : N ((4 5 N BANKRUPTCY) (5 6 N PROTECTION)) PROTECTION 5
4 - 5 : N NIL BANKRUPTCY 4
5 - 6 : N NIL PROTECTION 5

SEMANTIC OUTPUT:
(APPLY-FOR-28
  (TIME (VALUE (COMMON (FIND-ANCHOR-TIME))))
  (THEME (VALUE (COMMON PROTECT-28)))
  (AGENT (VALUE (COMMON CORPORATION-28)))
  (INSTANCE-OF (VALUE (COMMON APPLY-FOR)))
)
(PROTECT-28
  (RELATION (VALUE (COMMON BANKRUPT-28)))
  (THEME-OF (VALUE (COMMON APPLY-FOR-28)))
  (INSTANCE-OF (VALUE (COMMON PROTECT)))
)
(CORPORATION-28
  (HAS-NAME (VALUE (COMMON "UAL CORP")))   ;; "UAL CORP" is the "official" FR name
  (AGENT-OF (VALUE (COMMON APPLY-FOR-28)))
  (INSTANCE-OF (VALUE (COMMON CORPORATION)))
)
(UNITED-STATES-OF-AMERICA-28
  (INSTANCE-OF (VALUE (COMMON UNITED-STATES-OF-AMERICA)))
)
(BANKRUPT-28
  (RELATION (VALUE (COMMON UNITED-STATES-OF-AMERICA-28)))
  (INSTANCE-OF (VALUE (COMMON BANKRUPT)))
)

;*b57-3*
; 10 December 2002
(DATE-29
  (VALUE (VALUE (COMMON ((YEAR \2002) (DATE \10)) ((YEAR \2002) (MONTH \12)))))
  (INSTANCE-OF (VALUE (COMMON DATE)))
)

;*b57-4*
;UAL Corporation filed for Chapter 11 protection.
(APPLY-FOR-153
  (TIME (VALUE (COMMON (< (FIND-ANCHOR-TIME)))))
  (THEME (VALUE (COMMON PROTECT-154)))
  (AGENT (VALUE (COMMON CORPORATION-153)))
  (INSTANCE-OF (VALUE (COMMON APPLY-FOR)))
)
(CORPORATION-153
  (HAS-NAME (VALUE (COMMON "UAL CORP")))
  (AGENT-OF (VALUE (COMMON APPLY-FOR-153)))
  (INSTANCE-OF (VALUE (COMMON CORPORATION)))
)
(PROTECT-154
  (RELATION (VALUE (COMMON CHAPTER-11-BANKRUPTCY-PROTECTION-155)))
  (THEME-OF (VALUE (COMMON APPLY-FOR-153)))
  (INSTANCE-OF (VALUE (COMMON PROTECT)))
)

;*b57-5*
;The company has said it will look at all aspects of its operations.
(SPEECH-ACT-232
  (TIME (VALUE (COMMON (< (FIND-ANCHOR-TIME)))))
  (THEME (VALUE (COMMON CONSIDER-232)))
  (AGENT (VALUE (COMMON CORPORATION-232)))
  (INSTANCE-OF (VALUE (COMMON SPEECH-ACT)))
)
(CORPORATION-232
  (HAS-NAME (VALUE (COMMON "UAL CORP")))   ;; reference resolution: "company" = "UAL CORP"
  (AGENT-OF (VALUE (COMMON SPEECH-ACT-232)))
  (INSTANCE-OF (VALUE (COMMON CORPORATION)))
)
(CONSIDER-232
  (THEME-OF (VALUE (COMMON SPEECH-ACT-232)))
  (TIME (VALUE (COMMON (> (FIND-ANCHOR-TIME)))))
  (THEME (VALUE (COMMON SET-232)))   ;; all aspects
  (AGENT (VALUE (COMMON PHYSICAL-OBJECT-232)))
  (INSTANCE-OF (VALUE (COMMON CONSIDER)))
)
(MILITARY-ACTIVITY-232   ;; "operations" - obviously not correct here
  (POSSESSED-BY          ;; its operation = operation of UAL
    (VALUE (COMMON PHYSICAL-OBJECT-233)))
  (PARTS                 ;; "aspect" of the operations
    (VALUE (COMMON OBJECT-232)))
  (CARDINALITY (VALUE (COMMON (> 1))))
  (INSTANCE-OF (VALUE (COMMON MILITARY-ACTIVITY)))
)
(SET-232   ;; all aspects
  (SET-MEMBER-TYPE (VALUE (COMMON OBJECT-232)))
  (QUANT (VALUE (COMMON \1)))
  (INSTANCE-OF (VALUE (COMMON SET)))
)
(OBJECT-232   ;; aspect of its operation
  (PART-OF (VALUE (COMMON MILITARY-ACTIVITY-232)))
  (CARDINALITY (VALUE (COMMON (> 1))))
  (THEME-OF (VALUE (COMMON CONSIDER-232)))
  (INSTANCE-OF (VALUE (COMMON OBJECT)))
)
(PHYSICAL-OBJECT-232   ;; reference resolution: it = UAL
  (HAS-NAME (VALUE (COMMON "UAL CORP")))
  (COREFERENCE (VALUE (COMMON +)))
  (AGENT-OF (VALUE (COMMON CONSIDER-232)))
  (INSTANCE-OF (VALUE (COMMON PHYSICAL-OBJECT)))
)
(PHYSICAL-OBJECT-233   ;; reference resolution: its = UAL's
  (HAS-NAME (VALUE (COMMON "UAL CORP")))
  (COREFERENCE (VALUE (COMMON +)))
  (INSTANCE-OF (VALUE (COMMON PHYSICAL-OBJECT)))
)

Column key:
 (1) Events
 (2) Full NP subjects
 (3) Pro subjects
 (4) No subjects
 (5) Proper-name subjects
 (6) Subjects which are common nouns
 (7) Nominalized subjects
 (8) No subjects
 (9) Events identified correctly
(10) Events identified incorrectly
(11) Events not identified
(12) Agents identified without reference resolution
(13) Agents identified with reference resolution
(14) Agent referents needing to be resolved
(15) Agent referents resolved correctly
(16) Agent referents resolved incorrectly

Text      (1)  (2)  (3)  (4)  (5)  (6)  (7)  (8)  (9) (10) (11)   (12)   (13) (14) (15) (16)
Text 1     15   10    3    2   10    2    1    2   15    1    0   5/15  13/15   10    7    1
Text 2      8    5    1    2    6    0    0    2    8    0    0    2/8    6/8    5    3    0
Text 3      6    3    1    2    4    0    0    2    4    0    2    2/6    4/6    4    2    0
Text 19    10    6    2    2    6    1    1    2    6    2    4   2/10   5/10    4    4    0
Text 57     6    5    1    0    5    0    1    0    6    0    0    1/6    5/6    3    3    0
Text 59    15   10    3    2   12    1    0    2   12    2    3   6/15   8/15    4    2    0
Totals     60   39   11   10   43    4    3   10   51    5    9  18/60  41/60   30   21    1
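Reading columns (12) and (13) as the recall of agent identification before and after reference resolution, the totals can be checked with a few lines of Python. The per-text counts are copied from the table; the names `ROWS`, `recall`, and `summarize` are illustrative only, not part of the system described here.

```python
# Agent-identification counts per text, copied from the results table:
# (text, events, agents found without reference resolution, agents found with it)
ROWS = [
    ("Text 1", 15, 5, 13),
    ("Text 2", 8, 2, 6),
    ("Text 3", 6, 2, 4),
    ("Text 19", 10, 2, 5),
    ("Text 57", 6, 1, 5),
    ("Text 59", 15, 6, 8),
]


def recall(found: int, total: int) -> float:
    """Fraction of events whose agent was identified."""
    return found / total


def summarize(rows):
    """Return (total events, agents without resolution, agents with resolution)."""
    events = sum(r[1] for r in rows)
    without_rr = sum(r[2] for r in rows)
    with_rr = sum(r[3] for r in rows)
    return events, without_rr, with_rr


if __name__ == "__main__":
    # Per-text recall before and after reference resolution.
    for name, events, without_rr, with_rr in ROWS:
        print(f"{name}: {recall(without_rr, events):.2f} -> {recall(with_rr, events):.2f}")
    # Overall recall over all 60 events.
    events, without_rr, with_rr = summarize(ROWS)
    print(f"Overall: {without_rr}/{events} = {recall(without_rr, events):.1%} -> "
          f"{with_rr}/{events} = {recall(with_rr, events):.1%}")
```

Run as a script, the last line reports 18/60 = 30.0% without reference resolution and 41/60 = 68.3% with it, matching the Totals row.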
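The TRIGGER-REFERENCE slot of the he-pro1 entry pairs each heuristic with a weight, but the entry does not say how the weights are combined. The Python sketch below assumes the simplest scheme: discard candidates that fail the feature constraint (third singular male), then sum the weights of the heuristics that hold for each remaining candidate and pick the highest. The `Candidate` class, the `score` and `resolve` functions, and the additive combination are all hypothetical; only the heuristic names and weights come from the entry.

```python
from dataclasses import dataclass, field

# Heuristic weights copied from the TRIGGER-REFERENCE slot of he-pro1.
# refl-null-sem and subj-of-same-clause are marked NA there and are omitted.
HE_WEIGHTS = {
    "same-clause": 0.1,
    "preceding-clause": 0.7,
    "pre-preceding-clause": 0.5,
    "preceding-sent": 0.5,
    "sentence-minus-2": 0.2,
    "sentence-minus-3": 0.1,
    "para-break": 0.5,
    "repeat-collocation": 0.7,
    "synonym-collocation": 0.6,
    "agent-theme": 0.8,
    "pp-embedded": 0.2,
    "function-match": 0.7,
    "coord": 0.7,
}

# Agreement features required by 'he' (third singular male).
HE_FEATURES = frozenset({"third", "singular", "male"})


@dataclass
class Candidate:
    """Hypothetical antecedent candidate: its name, features, and the heuristics that hold for it."""
    name: str
    features: frozenset
    heuristics: set = field(default_factory=set)


def score(candidate: Candidate, weights: dict) -> float:
    # Assumption: the weights of all heuristics that fire are simply added up.
    return sum(weights[h] for h in candidate.heuristics if h in weights)


def resolve(candidates, required=HE_FEATURES, weights=HE_WEIGHTS):
    """Return the best-scoring candidate whose features include the required ones, or None."""
    compatible = [c for c in candidates if required <= c.features]
    if not compatible:
        return None
    return max(compatible, key=lambda c: score(c, weights))


if __name__ == "__main__":
    # Hypothetical context: "John argued with Mary about Bill. He ..."
    john = Candidate("John", frozenset({"third", "singular", "male"}),
                     {"preceding-sent", "agent-theme", "function-match"})
    mary = Candidate("Mary", frozenset({"third", "singular", "female"}),
                     {"preceding-sent", "agent-theme", "function-match"})
    bill = Candidate("Bill", frozenset({"third", "singular", "male"}),
                     {"preceding-sent", "pp-embedded"})
    print(resolve([john, mary, bill]).name)  # John (2.0) outscores Bill (0.7); Mary fails agreement
```

Summation treats every weight as positive evidence, so it cannot express a low weight such as pp-embedded (.2) acting as a dispreference; if that is the intended reading, signed weights would be needed instead.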