Finding Genealogy Facts with Linguistic Analysis

Ontology-based Information Extraction with a Cognitive Agent
Peter Lindes¹, Deryle Lonsdale, David Embley
Brigham Young University
AAAI 2015
¹ Now at University of Michigan
© 2015 Peter Lindes
1/22/2015
AAAI 2015 - IE with a Cognitive Agent
The Problem
Goals and Strategies
• OntoSoar project goals
– Extract genealogy facts from family history books
– Project extracted information onto a conceptual model to populate a searchable database
• Strategies
– Use ideas from Embodied Construction Grammar
– Use the Soar cognitive architecture
– Integrate several levels of knowledge
• Long-term goals
– Build computational models of human language processing
– Apply these models to real-world applications
Example 1
A Simple Ontology
• has: Charles Christopher Lathrop
• born on: 1817
• died on: 1865
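The three relationships on this slide can be read as subject, relation, value facts. The following is a minimal Python sketch of that reading, not OntoSoar's or OSMX's actual representation: the `Fact` tuple, the `values_of` helper, and the identifier `person1` are all hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One relationship instance: subject --relation--> value (hypothetical type)."""
    subject: str
    relation: str
    value: str


# "person1" is a made-up identifier for the person object in the diagram.
facts = [
    Fact("person1", "has", "Charles Christopher Lathrop"),
    Fact("person1", "born on", "1817"),
    Fact("person1", "died on", "1865"),
]


def values_of(facts, subject, relation):
    """Return every value linked to `subject` by `relation`."""
    return [f.value for f in facts if f.subject == subject and f.relation == relation]


print(values_of(facts, "person1", "born on"))
```

Storing each relationship as a separate fact keeps the structure open: a richer ontology only adds more relation names, not new fields.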
Example 2
A More Complex Ontology
Diagram labels: Myra Harwood; Feb. 13, 1874; Jonathan Squires; J. Wilbur Squires
The Solution
“Thus, intelligence is the ability to bring to bear all the knowledge that one has in service of one’s goals.”
Newell (1990), p. 90
Conceptual Models
World Knowledge
Pragmatics
Semantics
Syntax
Text Analysis
Page Layout *
OntoSoar Architecture
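Labels recovered from the architecture diagram (arrows and layout not recoverable):
• Components: PDF Tools, Segmenter, LG Parser, Soar, Meaning Builder, Conceptual Semantic Analyzer, Mapper, OntoES Tool Set
• Knowledge sources: Segment Rules (37), Link Grammar, Grammar Constructions (16), Inference Rules, User Ontology (OSMX)
• Data passed between components: Text Segments, Linkages, Meaning Schemas, Knowledge Structures, Facts
• Output: Populated User Ontology (OSMX)
• A total of 260 Soar productions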
Construction Grammar
Labels recovered from the construction diagrams:
• LIFE-EVENT: constituents REF-EXPR, LE-VERB, DATE; meaning LifeEvent, with Person and Date
• SON-OF: constituents REF-EXPR, "son of", REF-EXPR; meaning SonOf, with Person and Person
Applying Constructions
Charles Christopher Lathrop, N. Y. City, b. 1817, d. 1865, son of Mary Ely and Gerard Lathrop ;
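To make the target of this step concrete, the sketch below pulls the name, birth year, death year, and parents out of the sample entry. It is only an illustration of the desired output: OntoSoar derives these facts with the LG Parser and constructions in Soar, whereas this sketch swaps in plain regular expressions. The function name `extract_facts` and the dictionary keys are assumptions made for the example.

```python
import re

SENTENCE = ("Charles Christopher Lathrop, N. Y. City, b. 1817, d. 1865, "
            "son of Mary Ely and Gerard Lathrop ;")


def extract_facts(text):
    """Pull name, birth year, death year, and parents from one entry.

    Regular expressions stand in for the parser and constructions here;
    this is not how OntoSoar itself works.
    """
    # Assumption: the person's name is everything before the first comma.
    facts = {"name": text.split(",")[0].strip()}
    birth = re.search(r"\bb\.\s*(\d{4})", text)
    death = re.search(r"\bd\.\s*(\d{4})", text)
    parents = re.search(r"\bson of\s+(.+?)\s+and\s+(.+?)\s*;", text)
    if birth:
        facts["birth"] = birth.group(1)
    if death:
        facts["death"] = death.group(1)
    if parents:
        facts["parents"] = [parents.group(1), parents.group(2)]
    return facts


print(extract_facts(SENTENCE))
```

Patterns like these break as soon as an entry is phrased differently, which is the motivation for using a grammar of reusable constructions instead.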
… More Constructions
Charles Christopher Lathrop, N. Y. City, b. 1817, d. 1865, son of Mary Ely and Gerard Lathrop ;
Diagram labels: three Person schemas, each with a Name, and two SonOf schemas
Building Knowledge
Charles Christopher Lathrop, N. Y. City, b. 1817, d. 1865, son of Mary Ely and Gerard Lathrop ;
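• Person: gender M; name Gerard Lathrop; birth (empty); death (empty)
• Person: gender F; name Mary Ely; birth (empty); death (empty)
• Person: gender M; name Charles C. Lathrop; birth 1817; death 1865
• Couple: married (empty); husband: Gerard Lathrop; wife: Mary Ely; child: Charles C. Lathrop
• Charles C. Lathrop's parents: the Couple above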
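The Person and Couple structures on this slide can be sketched as two linked record types. This is a minimal Python illustration under stated assumptions, not the Soar working-memory representation: the class layout, the `children` list, and the use of `None` for empty fields are choices made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    name: str
    gender: str
    birth: Optional[str] = None       # None models an empty slot on the slide
    death: Optional[str] = None
    parents: Optional["Couple"] = None


@dataclass
class Couple:
    husband: Person
    wife: Person
    married: Optional[str] = None
    children: List[Person] = field(default_factory=list)


gerard = Person("Gerard Lathrop", "M")
mary = Person("Mary Ely", "F")
charles = Person("Charles C. Lathrop", "M", birth="1817", death="1865")

# "son of Mary Ely and Gerard Lathrop" yields one Couple linked both ways.
couple = Couple(husband=gerard, wife=mary)
couple.children.append(charles)
charles.parents = couple

print(charles.parents.husband.name, "and", charles.parents.wife.name)
```

Routing the parent link through a Couple, rather than pointing directly at two Persons, leaves a place to record a marriage date if the text supplies one.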
Knowledge Structures Compared
Charles Christopher Lathrop, N. Y. City, b. 1817, d. 1865, son of Mary Ely and Gerard Lathrop ;
… his widow married JONATHAN SQUIRES, who was born in Ohio, July 25, 1823, by whom
she had one son, J. Wilbur, born June 16, 1865,
Results on Examples
Data Accuracy for Test Data
Bar chart; bar values are not recoverable from the extracted text.
• Vertical axis: 0 to 120
• Test texts: CCL, Myra, OD1 through OD12
• Series: Persons, Births and Deaths, Marriages, Children
Results on The Ely Ancestry
a book of 830 pages, including our Example 1
Item Type    Instances Found
Persons      16,848
Births        8,609
Deaths        2,406
Genders       1,674
Couples       3,343
Children      3,049
Total        35,929
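The Total row can be checked against the six item counts. A short Python check, with the figures copied from the table above (the dictionary name `counts` is just a label for this example):

```python
# Instance counts reported for The Ely Ancestry, copied from the table.
counts = {
    "Persons": 16848,
    "Births": 8609,
    "Deaths": 2406,
    "Genders": 1674,
    "Couples": 3343,
    "Children": 3049,
}

total = sum(counts.values())
print(total)  # 35929, matching the Total row
```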
Conclusions
It works! … and, it could work a lot better.
Contributions
• Produces usable genealogy data from scanned books
• Does this using:
– Integration of several levels of knowledge
– An adaptation of Embodied Construction Grammar
– A cognitive architecture (Soar)
Future Work
• Integrate parsing with semantics
• Develop a means to learn many new constructions
• Adapt to varying book styles
• Scale up to perform well on hundreds of thousands of books