Ontological Semantics

Enhancing Recall in Information Extraction through
Ontological Semantics
Sergei Nirenburg, Marjorie McShane and Stephen Beale
Institute for Language and Information Technologies
University of Maryland, Baltimore County
Baltimore, MD, USA
[Figure: system architecture for fact extraction. Components: Text Sources; Ontological-Semantic Analysis; Text Meaning Representation (TMR); Fact Extraction; Fact Repository (FR); Question Answering; Static Knowledge Sources (Grammars, Lexicons, Ontology). Arrow legend: Data and Control Flow; Knowledge Support.]
[Figure: system architecture with applications. Components: Text Sources; Ontological-Semantic Analysis; Text Meaning Representation (TMR); Fact Production; Fact Repository (FR); Static Knowledge Sources (Grammars, Lexicons, Ontology); Question Answering; Report Generation; Summary Production; Trend Identification; User. Arrow legend: Data and Control Flow; Knowledge Support.]
[Figure: question answering architecture. Components: Query; Ontological-Semantic Analysis; Text Meaning Representation (TMR); Query Formulation; Fact Repository (FR); Answer; Static Knowledge Sources (Grammars, Lexicons, Ontology). Arrow legend: Data and Control Flow; Knowledge Support.]
(he
  (he-pro1
    (cat n)
    (MORPH )
    (ANNO
      (DEF "the pronoun 'he'")
      (EX "He kicked the can.")
      (COMMENTS "Expect the same weights for constraints for 'he' and 'she'"))
    (SYN-STRUC
      ((root $var0) (cat n) (type pro)))
    (SEM-STRUC
      (ANIMAL))
    (MEANING-PROCEDURE
      (TRIGGER-REFERENCE
        (third singular male)
        (same-clause .1)
        (preceding-clause .7)
        (pre-preceding-clause .5)
        (preceding-sent .5)
        (sentence-minus-2 .2)
        (sentence-minus-3 .1)
        (para-break .5)
        (repeat-collocation .7)
        (synonym-collocation .6)
        (agent-theme .8)
        (pp-embedded .2)
        (function-match .7)
        (coord .7)
        ;; (refl-null-sem .5)        NA
        ;; (subj-of-same-clause .5)  NA
      ))))
A description of the heuristics
same-clause
The candidate is in the same clause.
preceding-clause
The candidate is in the preceding clause.
pre-preceding-clause
The candidate is in any clause in the given sentence farther back than the
preceding clause.
preceding-sent
The candidate is in the preceding sentence.
sentence-minus-2
The candidate is two sentences back.
sentence-minus-3
The candidate is three sentences back.
para-break
The candidate is in the preceding paragraph. Valid only for candidates that,
themselves, are not in the first sentence of a paragraph.
repeat-collocation
The nominal candidate has been the same argument of the given verb
previously.
synonym-collocation
The nominal candidate has been the same argument of a synonymous (or
similar) verb previously.
agent-theme
The nominal candidate is one of the main arguments in its clause, which we
can define for now as the agent or theme (not path, beneficiary,
instrument, etc.).
pp-embedded
The nominal candidate is embedded in a PP (this is mostly to weed out
non-prominent adjuncts, but since some arguments are PPs, it can't be
too strong a heuristic; it is better to look at case roles).
function-match
The syntactic function of the candidate matches the function of the referring
expression.
coord
The candidate is an argument in the preceding conjunct of a coordinate
structure, BUT the coordinate structure must be larger than the
category itself; i.e., we want to catch the fact that the coordination
in 'I picked up the book and read it' (VP coordination) is a strong
indicator of coreference, but we don't want to assume that there is
coreference in 'I told John and him'.
refl-null-sem
The reflexive directly follows an NP that has matching features; in this case,
it is rendered as null semantics (I myself know… He himself
thought… The plans themselves are…).
subj-of-same-clause
The candidate is the head of the subject of the same clause (used to
distinguish anaphors). By 'subject of same clause' we mean
whichever of the following is the nearest:
1) the overt or elided subject of the minimal tensed clause
- Mary-i likes herself-i.
- Mary-i is happy-go-lucky and pro-i likes herself-i.
2) the PRO (non-overt) subject of the given infinitival clause
- John-i forces his children-j PRO-j to fight for themselves-j.
3) the overt PP "subject" of the given infinitival clause
- For me-i to hurt myself-i would be stupid.
It's important to say 'the head' of the subject because, for example,
'Mary's-i dog likes her-i' has coreference between the modifier of the
head and the direct object.
refl-cl-subj
The reflexive has to match features with the subject of the clause: He went
to the movies himself.
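The weighted heuristics above can be illustrated with a small scoring sketch. The slides do not say how the weights are combined, so the plain sum used here is an assumption, as are the function names `score` and `best_candidate` and the two example candidates. The agreement filter `(third singular male)` is left out.

```python
# Weights copied from the TRIGGER-REFERENCE block of the he-pro1 entry.
HE_WEIGHTS = {
    "same-clause": 0.1,
    "preceding-clause": 0.7,
    "pre-preceding-clause": 0.5,
    "preceding-sent": 0.5,
    "sentence-minus-2": 0.2,
    "sentence-minus-3": 0.1,
    "para-break": 0.5,
    "repeat-collocation": 0.7,
    "synonym-collocation": 0.6,
    "agent-theme": 0.8,
    "pp-embedded": 0.2,
    "function-match": 0.7,
    "coord": 0.7,
}


def score(fired, weights=HE_WEIGHTS):
    """Score one candidate antecedent.

    ASSUMPTION: the score is the plain sum of the weights of the
    heuristics that fire; the source does not state the combination rule.
    """
    return sum(weights[name] for name in fired)


def best_candidate(candidates, weights=HE_WEIGHTS):
    """Return the candidate whose fired heuristics give the highest score."""
    return max(candidates, key=lambda name: score(candidates[name], weights))


# Hypothetical candidates with the heuristics assumed to fire for each.
candidates = {
    "John": {"preceding-sent", "agent-theme", "function-match"},
    "the can": {"preceding-clause", "pp-embedded"},
}

for name, fired in candidates.items():
    print(f"{name}: {score(fired):.1f}")
print("chosen:", best_candidate(candidates))
```

Under a different combination rule (for example, a product or a maximum) the ranking could change, so the sketch only shows the shape of the computation.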
;United Airlines files for US Bankruptcy protection.
PREPROCESSOR OUTPUT: notice the ambiguities:
((UNITED-Airlines n ((type pn) (type company)))
(FILE N ((NUMBER PL))) (FILE V ((PERSON THIRD) (TENSE PRESENT)) *REPEAT*)
(FOR CONJ NIL) (FOR PREP NIL *REPEAT*)
(US N ((TYPE PRO))) (US N ((TYPE COUNTRY) (TYPE PN)) *REPEAT*)
(BANKRUPTCY N NIL)
(PROTECTION N NIL)
(*PERIOD* PUNCT NIL))
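To make the ambiguities in the preprocessor output easier to see, the sketch below groups the readings by token. It assumes that `*REPEAT*` marks an additional reading of the preceding token, which is how the output above reads but is not stated explicitly. The readings are transcribed by hand, and the labels such as "N/pro" and the helper names are illustrative.

```python
# (word, reading label, carries *REPEAT*?) transcribed from the output above.
READINGS = [
    ("UNITED-AIRLINES", "N/company", False),
    ("FILE", "N", False),
    ("FILE", "V", True),
    ("FOR", "CONJ", False),
    ("FOR", "PREP", True),
    ("US", "N/pro", False),
    ("US", "N/country", True),
    ("BANKRUPTCY", "N", False),
    ("PROTECTION", "N", False),
    ("*PERIOD*", "PUNCT", False),
]


def group_tokens(readings):
    """Attach each *REPEAT* reading to the token that precedes it."""
    tokens = []
    for word, label, is_repeat in readings:
        if is_repeat and tokens:
            tokens[-1][1].append(label)
        else:
            tokens.append((word, [label]))
    return tokens


def ambiguous(tokens):
    """Return the words that have more than one reading."""
    return [word for word, labels in tokens if len(labels) > 1]


tokens = group_tokens(READINGS)
print(ambiguous(tokens))  # ['FILE', 'FOR', 'US']
```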
SYNTAX OUTPUT
0 - 6 : CL ((0 1 NP UNITED-AIRLINES) (1 2 V FILE) (2 6 PP FOR)) FILE 1
0 - 1 : NP ((0 1 N UNITED-AIRLINES)) UNITED-AIRLINES 0
0 - 1 : N NIL UNITED-AIRLINES 0
1 - 2 : V NIL FILE 1
2 - 6 : PP ((2 3 PREP FOR) (3 6 NP PROTECTION)) FOR 2
2 - 3 : PREP NIL FOR 2
3 - 6 : NP ((3 3 ART ***) (3 6 N PROTECTION)) PROTECTION 5
;; notice some ambiguity left as to (US (BANKRUPTCY PROTECTION)) vs.
;; ((US BANKRUPTCY) PROTECTION)
;; The semantic analyzer picked the one it liked best, but we will continue
;; working on these n-n compounds
3 - 6 : N ((3 5 N BANKRUPTCY) (5 6 N PROTECTION)) PROTECTION 5
3 - 6 : N ((3 4 N US) (4 6 N PROTECTION)) PROTECTION 5
3 - 5 : N ((3 4 N US) (4 5 N BANKRUPTCY)) BANKRUPTCY 4
3 - 4 : N NIL US 3
3 - 3 : ART NIL *** NIL
4 - 6 : N ((4 5 N BANKRUPTCY) (5 6 N PROTECTION)) PROTECTION 5
4 - 5 : N NIL BANKRUPTCY 4
5 - 6 : N NIL PROTECTION 5
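The two competing analyses of the noun-noun compound correspond to the two binary bracketings of a three-word sequence. The sketch below enumerates bracketings for any sequence of nouns; the function name `bracketings` is illustrative, and this is not a description of how the analyzer itself builds or ranks its chart.

```python
def bracketings(words):
    """Enumerate all binary bracketings of a sequence of words as nested tuples."""
    if len(words) == 1:
        return [words[0]]
    results = []
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                results.append((left, right))
    return results


for tree in bracketings(["US", "BANKRUPTCY", "PROTECTION"]):
    print(tree)
# ('US', ('BANKRUPTCY', 'PROTECTION'))
# (('US', 'BANKRUPTCY'), 'PROTECTION')
```

The number of bracketings grows quickly with compound length (five for a four-noun compound), which is one reason such compounds are usually disambiguated semantically and not by syntax alone.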
SEMANTIC OUTPUT:
(APPLY-FOR-28
(TIME
(VALUE
(COMMON (FIND-ANCHOR-TIME))))
(THEME
(VALUE
(COMMON PROTECT-28)))
(AGENT
(VALUE
(COMMON CORPORATION-28)))
(INSTANCE-OF
(VALUE
(COMMON APPLY-FOR)))
)
(PROTECT-28
(RELATION
(VALUE
(COMMON BANKRUPT-28)))
(THEME-OF
(VALUE
(COMMON APPLY-FOR-28)))
(INSTANCE-OF
(VALUE
(COMMON PROTECT)))
)
(CORPORATION-28
(HAS-NAME
(VALUE
(COMMON "UAL CORP"))) ;; "UAL CORP" is the "official" FR name
(AGENT-OF
(VALUE
(COMMON APPLY-FOR-28)))
(INSTANCE-OF
(VALUE
(COMMON CORPORATION)))
)
(UNITED-STATES-OF-AMERICA-28
(INSTANCE-OF
(VALUE
(COMMON UNITED-STATES-OF-AMERICA)))
)
(BANKRUPT-28
(RELATION
(VALUE
(COMMON UNITED-STATES-OF-AMERICA-28)))
(INSTANCE-OF
(VALUE
(COMMON BANKRUPT)))
)
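The frames above follow a regular pattern: an instance name followed by slots of the form (SLOT (VALUE (COMMON filler))). As a minimal sketch of how such a frame could be read into a program, the following parses the PROTECT-28 frame into a dictionary. The helper names are illustrative, and the tokenizer is deliberately naive: it does not handle quoted strings such as "UAL CORP", `;;` comments, or list-valued fillers such as (FIND-ANCHOR-TIME).

```python
FRAME = """
(PROTECT-28
 (RELATION (VALUE (COMMON BANKRUPT-28)))
 (THEME-OF (VALUE (COMMON APPLY-FOR-28)))
 (INSTANCE-OF (VALUE (COMMON PROTECT))))
"""


def tokenize(text):
    """Split an s-expression into parentheses and atoms (no string support)."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Recursively build nested lists from a token list (consumes the list)."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # drop the closing parenthesis
        return items
    return token


def frame_to_dict(sexpr):
    """Map each slot name to its filler, assuming (SLOT (VALUE (COMMON x)))."""
    name, *slots = sexpr
    fillers = {}
    for slot_name, (_facet, (_view, filler)) in slots:
        fillers[slot_name] = filler
    return name, fillers


name, slots = frame_to_dict(parse(tokenize(FRAME)))
print(name, slots)
```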
;*b57-3*
; 10 December 2002
(DATE-29
(VALUE
(VALUE
(COMMON ((YEAR \2002) (DATE \10)) ((YEAR \2002) (MONTH \12)))))
(INSTANCE-OF
(VALUE
(COMMON DATE)))
)
;*b57-4*
;UAL Corporation filed for Chapter 11 protection.
(APPLY-FOR-153
(TIME
(VALUE
(COMMON (< (FIND-ANCHOR-TIME)))))
(THEME
(VALUE
(COMMON PROTECT-154)))
(AGENT
(VALUE
(COMMON CORPORATION-153)))
(INSTANCE-OF
(VALUE
(COMMON APPLY-FOR)))
)
(CORPORATION-153
(HAS-NAME
(VALUE
(COMMON "UAL CORP")))
(AGENT-OF
(VALUE
(COMMON APPLY-FOR-153)))
(INSTANCE-OF
(VALUE
(COMMON CORPORATION)))
)
(PROTECT-154
(RELATION
(VALUE
(COMMON CHAPTER-11-BANKRUPTCY-PROTECTION-155)))
(THEME-OF
(VALUE
(COMMON APPLY-FOR-153)))
(INSTANCE-OF
(VALUE
(COMMON PROTECT)))
)
;*b57-5*
;The company has said it will look at all aspects of its operations.
(SPEECH-ACT-232
(TIME
(VALUE
(COMMON (< (FIND-ANCHOR-TIME)))))
(THEME
(VALUE
(COMMON CONSIDER-232)))
(AGENT
(VALUE
(COMMON CORPORATION-232)))
(INSTANCE-OF
(VALUE
(COMMON SPEECH-ACT)))
)
(CORPORATION-232
(HAS-NAME
(VALUE
(COMMON "UAL CORP"))) ;; reference resolution: "company" = "UAL CORP"
(AGENT-OF
(VALUE
(COMMON SPEECH-ACT-232)))
(INSTANCE-OF
(VALUE
(COMMON CORPORATION)))
)
(CONSIDER-232
(THEME-OF
(VALUE
(COMMON SPEECH-ACT-232)))
(TIME
(VALUE
(COMMON (> (FIND-ANCHOR-TIME)))))
(THEME
(VALUE
(COMMON SET-232))) ;; all aspects
(AGENT
(VALUE
(COMMON PHYSICAL-OBJECT-232)))
(INSTANCE-OF
(VALUE
(COMMON CONSIDER)))
)
(MILITARY-ACTIVITY-232 ;; "operations" - obviously not correct here
(POSSESSED-BY ;; its operation = operation of UAL
(VALUE
(COMMON PHYSICAL-OBJECT-233)))
(PARTS ;; "aspect" of the operations
(VALUE
(COMMON OBJECT-232)))
(CARDINALITY
(VALUE
(COMMON (> 1))))
(INSTANCE-OF
(VALUE
(COMMON MILITARY-ACTIVITY)))
)
(SET-232 ;; all aspects
(SET-MEMBER-TYPE
(VALUE
(COMMON OBJECT-232)))
(QUANT
(VALUE
(COMMON \1)))
(INSTANCE-OF
(VALUE
(COMMON SET)))
)
(OBJECT-232 ;; aspect of its operation
(PART-OF
(VALUE
(COMMON MILITARY-ACTIVITY-232)))
(CARDINALITY
(VALUE
(COMMON (> 1))))
(THEME-OF
(VALUE
(COMMON CONSIDER-232)))
(INSTANCE-OF
(VALUE
(COMMON OBJECT)))
)
(PHYSICAL-OBJECT-232 ;; reference resolution: it = UAL
(HAS-NAME
(VALUE
(COMMON "UAL CORP")))
(COREFERENCE
(VALUE
(COMMON +)))
(AGENT-OF
(VALUE
(COMMON CONSIDER-232)))
(INSTANCE-OF
(VALUE
(COMMON PHYSICAL-OBJECT)))
)
(PHYSICAL-OBJECT-233 ;; reference resolution: its = UAL's
(HAS-NAME
(VALUE
(COMMON "UAL CORP")))
(COREFERENCE
(VALUE
(COMMON +)))
(INSTANCE-OF
(VALUE
(COMMON PHYSICAL-OBJECT)))
)
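In the frames above, several instances from different sentences carry the same HAS-NAME value after reference resolution. One way to picture how they could be gathered under a single Fact Repository entry is to group instances by that name, as sketched below. The grouping rule and the helper name `group_by_name` are assumptions made for illustration; the slides do not describe the actual FR merging procedure.

```python
from collections import defaultdict

# HAS-NAME fillers transcribed from the frames above (None = no HAS-NAME slot).
HAS_NAME = {
    "CORPORATION-28": "UAL CORP",
    "UNITED-STATES-OF-AMERICA-28": None,
    "CORPORATION-153": "UAL CORP",
    "CORPORATION-232": "UAL CORP",
    "PHYSICAL-OBJECT-232": "UAL CORP",
    "PHYSICAL-OBJECT-233": "UAL CORP",
}


def group_by_name(has_name):
    """Group TMR instances that share a HAS-NAME filler (assumed merge key)."""
    groups = defaultdict(list)
    for instance, name in has_name.items():
        if name is not None:
            groups[name].append(instance)
    return dict(groups)


for name, instances in group_by_name(HAS_NAME).items():
    print(name, "<-", instances)
```

Instances without a name, such as UNITED-STATES-OF-AMERICA-28 here, are left ungrouped; a real repository would presumably match them by other means.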
Text    Events  Full NP   Pro       No        Proper Name  Common Noun  Nominalized  No
                Subjects  Subjects  Subjects  Subjects     Subjects     Subjects     Subjects
1       15      10        3         2         10           2            1            2
2       8       5         1         2         6            0            0            2
3       6       3         1         2         4            0            0            2
19      10      6         2         2         6            1            1            2
57      6       5         1         0         5            0            1            0
59      15      10        3         2         12           1            0            2
Totals  60      39        11        10        43           4            3            10

Text    Events      Events       Events      Agents identified  Agents identified
        identified  identified   not         without Reference  with Reference
        correctly   incorrectly  identified  Resolution         Resolution
1       15          1            0           5/15               13/15
2       8           0            0           2/8                6/8
3       4           0            2           2/6                4/6
19      6           2            4           2/10               5/10
57      6           0            0           1/6                5/6
59      12          2            3           6/15               8/15
Totals  51          5            9           18/60              41/60

Text    Agent Referents  Agent Referents  Agent Referents
        needing to be    resolved         resolved
        resolved         correctly        incorrectly
1       10               7                1
2       5                3                0
3       4                2                0
19      4                4                0
57      3                3                0
59      4                2                0
Totals  30               21               1
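The effect of reference resolution on agent identification can be checked by pooling the per-text counts from the table. The sketch below recomputes the totals and expresses them as percentages; reading these pooled rates as the recall figures of interest is an interpretation, and the helper names are illustrative.

```python
# (agents identified, events) per text, copied from the evaluation table.
WITHOUT_RR = {"1": (5, 15), "2": (2, 8), "3": (2, 6),
              "19": (2, 10), "57": (1, 6), "59": (6, 15)}
WITH_RR = {"1": (13, 15), "2": (6, 8), "3": (4, 6),
           "19": (5, 10), "57": (5, 6), "59": (8, 15)}


def pooled(counts):
    """Sum numerators and denominators over all texts."""
    hits = sum(h for h, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return hits, total


def percent(hits, total):
    """Express hits/total as a percentage rounded to one decimal place."""
    return round(100 * hits / total, 1)


print(pooled(WITHOUT_RR), percent(*pooled(WITHOUT_RR)))  # (18, 60) 30.0
print(pooled(WITH_RR), percent(*pooled(WITH_RR)))        # (41, 60) 68.3
```

The pooled counts match the Totals row (18/60 and 41/60), so on these six texts the share of events with an identified agent rises from 30.0% to 68.3% when reference resolution is applied.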