Proceedings of the 35th Hawaii International Conference on System Sciences - 2002
XML-hoo!: A Prototype Application for Intelligent Query of XML Documents
using Domain-Specific Ontologies
Henry M. Kim
Schulich School of Business, York University, 4700 Keele St., Toronto, Ontario Canada M3J 1P3
0-7695-1435-9/02 $17.00 (c) 2002 IEEE
Abstract
Use of XML holds great promise for standardizing data models, realizing benefits such as lowered development cost and time for integrating inter-organizational business processes and for intra-organizational knowledge management. Further benefits can be realized by formally defining common semantics in ontologies using the standardized models. Automation of business processes that require sharing knowledge represented in XML-based ontologies can then be supported. In this paper, a proof-of-concept application for using ontologies to support deduction of knowledge implicit in existing XML documents is presented. This system, called XML-hoo!, employs a customized portal user interface to answer queries about Shakespearian plays. Queries are answered by applying inference rules about these plays, represented as axioms that comprise a Shakespearian ontology whose terminology corresponds to existing XML DTDs. These rules are applied to plays represented in XML that are in the public domain. Hence, answers to queries such as "Who is Romeo’s father?" can be automatically deduced even though the facts required for such answers are not explicitly structured in XML documents. This application demonstrates use of reusable and shareable ontology representations to further leverage the expected proliferation of XML documents.
1. Introduction
Though many organizations have converted their documents to XML, these documents may not be fully utilized. Structured data is of limited value unless applications manipulate it in the way intended by the developer and required by the user; there must be a common understanding of the semantics of the structured data. Common understanding can be enforced via use of an off-the-shelf library, e.g. xCBL™ [1], or use of standardized industry-based languages, e.g. Financial products Markup Language (FpML) [2]. Both approaches can be considered top-down because they entail conformance to an external data schema and accompanying semantics. However, XML use for knowledge management systems (KMS) requires a bottom-up approach: semantics about structured data in existing documents must be organized and applied to support knowledge extraction and discovery.
Semantics in and of themselves constitute organizational knowledge, and if they are represented systematically and consistently with the XML structure of documents in the KMS’ repository, they can be codified as query answering routines. This benefit can be realized by representing these semantics using ontologies. An ontology is an explicit understanding of shared understanding [3], so it represents common knowledge in the KMS. More expressively, it “consists of a representational vocabulary with precise definitions of the meanings of the terms of this vocabulary plus a set of formal axioms that constrain interpretation and well-formed use of these terms” [4]. Ontology axioms can be codified and applied to structured data for automated inference, i.e. query answering.
Smith and Coulter [5] discuss the use of ontologies with XML data for e-business applications; Glushko et al. [6] posit limitations of such use, namely the impracticality of detailed formalization because of rapidly changing business needs. Through bottom-up development, however, a practical approach can be taken: domain-specific ontologies useful for focused tasks can be developed, and domain-independent, generalizable ontology representations can be organized over time [7].
A promising focused task is the use of ontologies to support querying an XML repository. A review of such applications, e.g. GoXML [8], XYZFind [9], and xmlTree [10], shows there are powerful search engines that index, classify, and store numerous XML documents on the web. However, they focus on speed and breadth of the search rather than depth. As a result, their observed performance in answering queries is not substantially better than that of a regular search engine unless the searcher is familiar with the structure of the searched documents; these engines generally do not commit to representing the semantics of XML documents in order to answer queries.
By committing to represent such semantics, is it feasible to develop an “intelligent” query application? Is it feasible to represent these semantics using ontologies? What can be learned for developing a query application for a KMS? In this paper, XML-hoo!, a proof-of-concept application developed to address these questions, is presented with an emphasis on its ontology. An ontological commitment is made to represent the semantics and structure of Shakespearian plays [11], which are represented in XML and available in the public domain. Next, excerpts of the ontology and the XML-hoo! application are presented. Then concluding remarks and future work are stated.
2. XML-hoo! Shakespearian Ontology
This ontology is developed and presented using the methodology shown in Figure 1 [12].
Figure 1: Overview of the TOVE Ontological Engineering Methodology
[Diagram panels: A. Motivating Scenario (narrative about a company); B. Competency Questions (the questions that an ontology should be used to answer; they specify the capability of the ontology to support problem-solving tasks); C. Ontology, comprising Terminology (data model of a domain) and Axioms (formalizations that define and constrain the data model, e.g. ∀A1∀A2∀Y { A1 ∧ A2 ⊃ Y }); D. Evaluation of Ontology (demonstration of competency using a Prolog-populated enterprise model).]
2.1. Motivating Scenario
The motivating scenario is a detailed narrative about problems faced or tasks performed by a user for which an ontology-based IS application is constructed. Here is XML-hoo!’s motivating scenario:
• An organization with an IS-based KMS is studying the feasibility of migrating the system to an XML-based platform. This study entails prototyping an application for knowledge extraction/query from existing XML documents to estimate development effort and cost/benefit before converting all documents to XML. A focus group is selected to provide requirements for, and ultimately test, the prototype application. Documents existing in the public domain, specifically Shakespearian plays, are chosen for the test. The reasons are the following: the focus group has read many plays, so it can ably provide requirements; consistent XML element definitions exist, so an application broadly querying all Shakespearian plays can be designed; and extending the application to query about other domains, e.g. plays in general or historical writings, may be possible. The focus group then provides requirements for an application more sophisticated than their current KM system and general query and search engines. Specifically, they want this application, XML-hoo!, to answer non-trivial queries, the kind that a student studying Shakespeare may ask.
2.2. Competency Questions - Informal
Users’ requirements expressed as queries of an ontology-based application are competency questions. Since terms to pose formal queries in the ontology's language are not yet developed, these questions are inherently informal, asked in English using vocabulary and semantics familiar to users. The following are some competency questions for the Shakespearian play ontology used for XML-hoo!.
CQ-1. Who is Romeo’s father?
CQ-2. Who said “Et tu, brute!”?
CQ-3. Where does ‘King Lear’ take place?
CQ-4. Who are all the characters in ‘Taming of the Shrew’?
CQ-5. Does Romeo utter a sonnet, and if so what does he say?
2.3. Ontology
2.3.1. Terminology. The terminology of the ontology comprises, minimally, all terms required to formally express, but not answer, the competency questions. In turn, the expression that defines a given term is expressed using other ontology terms. Ultimately, a primitive ontology term is not defined, but mapped from a data repository. In presenting the terminology, the data schema of Shakespearian XML documents is presented pictorially as a hierarchical model, then terminologically as the ontology’s primitive terms expressed as predicates. Next, key terms and relationships in the informal competency questions are identified and integrated into pictorial (ER) and terminological (predicate) models.
2.3.2. Primitive Terms - Hierarchical Model. The hierarchical relationships of the markup tags can be diagrammed this way:
Figure 3: Hierarchical Structure of the Shakespearian Plays in XML
[Diagram: the entities Play, FM, Personae, PGroup, Act, Scene and Speech with their attributes and unique identifiers; numbered terminal nodes (1) to (14) correspond to the variables listed in Section 2.3.3. Cardinality of has relationships is one-to-one unless explicitly noted as one-to-many with *.]
All relationships can be read top-to-bottom as has, i.e. a play has a title, or a play has an act, which has a title. Parent entities have child entities. Terminal nodes in the diagram correspond to elements that mark up document content, i.e. they are attributes of an entity. Some terminal nodes are attributes that uniquely identify an entity. This model can then be expressed using predicates, which relate an entity’s unique identifier either to its own attributes or to its child’s unique identifier. In an ontology, predicates that are not formally defined are called primitive terms because they are populated through assertions, not by inference. Therefore, relationships between terminal nodes in the XML data structure correspond to primitive terms of the ontology.
<PLAY>
  <TITLE>The Tragedy of Julius Caesar</TITLE>
  <PERSONAE>
    <TITLE>Dramatis Personae</TITLE>
    <PERSONA>JULIUS CAESAR</PERSONA>
    ...
    <PGROUP>
      <PERSONA>FLAVIUS</PERSONA>
      <PERSONA>MARULLUS</PERSONA>
      <GRPDESCR>tribunes.</GRPDESCR>
    ...
  </PERSONAE>
  <SCNDESCR>SCENE Rome: the neighbourhood of Sardis: the neighbourhood of Philippi.</SCNDESCR>
  ...
  <ACT>
    <TITLE>ACT I</TITLE>
    <SCENE>
      <TITLE>SCENE I. Rome. A street.</TITLE>
      ...
      <SPEECH>
        <SPEAKER>FLAVIUS</SPEAKER>
        <LINE>Hence! home, you idle creatures get you home:</LINE>
        <LINE>Is this a holiday? what! know you not,</LINE>
        ...
      </SPEECH>
  ...
  <PLAYSUBT>JULIUS CAESAR</PLAYSUBT>
  ...
</PLAY>
Figure 2: Use of Defined Elements within a Shakespearian XML document
2.3.3. Primitive Terms - Predicate Model. Here is a description of primitive term variables (⇒ denotes has). Variable numbers match those in Fig. 3.
(1) P: Name of the play; value of Play⇒Title
(2) S: Subtitle of the play; value of Play⇒Play Subtitle
(3) Scd: One scene description for the play; value of Play⇒Scene Description
(4) Pa: A character list, i.e. one descriptive name for the section wherein all characters are introduced; value of Play⇒Personae⇒Title
(5) Pe1: Character description set of all characters individually introduced; value of Play⇒Personae⇒Persona
(6) Gd: Character description for each grouping of characters; value of Play⇒Personae⇒PGroup⇒Group Description
(7) Pe2: Character description set of all characters introduced within a given group; value of Play⇒Personae⇒PGroup⇒Persona
(8) A: Title of an act within the play; value of Play⇒Act⇒Title
(9) Std1: One stage direction for each act; value of Play⇒Act⇒Stage Direction
(10) Sc: Title of a scene within an act; value of Play⇒Act⇒Scene⇒Title
(11) Std2: One stage direction for each scene; value of Play⇒Act⇒Scene⇒Stage Direction
(12) Sp: Speaker of a speech; value of Play⇒Act⇒Scene⇒Speech⇒Speaker
(13) L: A line in a speech; value of Play⇒Act⇒Scene⇒Speech⇒Line
(14) Px: Free text, which will not be represented using the ontology; value of Play⇒FM⇒P
PT-1. play_has_act(P,A)
e.g. play_has_act(‘The Tragedy of Julius Caesar’, ‘ACT I’).
PT-2. play_has_subtitle(P,S)
e.g. play_has_subtitle(‘The Tragedy of Julius Caesar’, ‘JULIUS CAESAR’).
PT-3. play_has_scene_description(P,Scd)
PT-4. play_has_character_list(P,Pa)
e.g. play_has_character_list(‘The Tragedy of Julius Caesar’, ‘Dramatis Personae’).
PT-5. character_list_has_character_description(Pa,Pe1)
e.g. character_list_has_character_description(‘Dramatis Personae’, ‘JULIUS CAESAR’).
PT-6. character_list_has_group(Pa,Gd)
e.g. character_list_has_group(‘Dramatis Personae’, ‘tribunes.’).
PT-7. group_has_character_description(Gd,Pe2)
e.g. group_has_character_description(‘tribunes.’, ‘FLAVIUS’).
PT-8. act_has_stage_direction(A,Std1)
PT-9. act_has_scene(A,Sc)
e.g. act_has_scene(‘ACT I’, ‘SCENE I. Rome. A street.’).
PT-10. scene_has_stage_direction(Sc,Std2)
PT-11. scene_has_speech(Sc,Sp,L1)
L1: First line of a speech. Though not explicitly an element, this attribute in combination with Sp is used to uniquely identify a speech.
e.g. scene_has_speech(‘SCENE I. Rome. A street.’, ‘FLAVIUS’, ‘Hence! home, you idle creatures get you home:’).
PT-12. speech_has_line(Sp,L1,L)
e.g. speech_has_line(‘FLAVIUS’, ‘Hence! home, you idle creatures get you home:’, ‘Is this a holiday? what! know you not,’).
Each primitive term can be populated by a query on XML documents. Take the question, “Given that ‘The Tragedy of Julius Caesar’ is the title of the play, what is the play’s subtitle?”. Using First-Order Logic, the question is expressed using primitive terms as the following axiom to prove: ∃S play_has_subtitle(‘The Tragedy of Julius Caesar’,S). Using the query language XML-QL, the question is expressed as follows and returns the value $st=‘JULIUS CAESAR’.
WHERE <PLAY>
        <TITLE>The Tragedy of Julius Caesar</>
        <PLAYSUBT>$st</PLAYSUBT>
      </> in "Julius_Caesar.xml"
CONSTRUCT <results>
            <PLAYSUBT>$st</>
          </results>
Though not explicitly structured using XML elements, there is an observed format for introducing characters, which applies with few exceptions. For instance, the value of the <PERSONA> element always starts with the character’s name, and may be followed by combinations of pseudonym, qualifiers, and statements of relationship with other characters.
<PERSONA>¹ ::= <description set>
<description set> ::=
    <character name> [ (<character pseudonym>) ]
    [ { , <qualification of character> }
      { , <relationship of character> }
      ; <description set> | <primitive description set>
    | , <description set> | <primitive description set>
    ]
    [.]
<GRPDESCR>¹ ::=
    <qualification of character> | <relationship of character>
    [...unstructured text ].
<primitive description set> ::=
    <character name>² [ (<character pseudonym>) ]
<qualification of character> ::=
    [ { <adjective> | <adverb> } ] <character title>
    [ { <preposition> | <article> } <location qualifier> ]
<relationship of character> ::=
    [ <conjunction> ] <relation title> [ { <preposition> | <article> } ] <character related to>
    [ { <conjunction> <character related to> } ]
<relation title> ::= <relation noun> <relation preposition>
e.g. in <PERSONA>REYNALDO, servant to Polonius.</PERSONA>
  - <character name> = REYNALDO
  - <relation noun> = servant
  - <relation preposition> = to
  - <character related to> = Polonius
e.g. in <PERSONA>PARIS, a young nobleman, kinsman to the prince.</PERSONA>
  - <character title> = nobleman
  - <relation noun> = kinsman
  - <relation preposition> = to
  - <character related to> = prince
e.g. in <GRPDESCR>friends to Brutus and Cassius.</GRPDESCR>
  - <relation noun> = friends
  - <relation preposition> = to
  - <character related to> = Brutus
  - <character related to> = Cassius
¹ <PERSONA> and <GRPDESCR> are the only XML elements; all others are shown in <..> for consistency with BNF notation.
² Formats in bold italics are primitive formats, i.e. those not defined in terms of other formats, that are significant for the ontology. Though other formats like <adverb> and <preposition> are also primitives, they mark up extraneous information, which need not be represented using the ontology.
BNF: [..] format within brackets is optional; {..} format within braces may occur 0 or more times; | or; x ::= y grammar of x defined by format expressed as y.
Figure 4: Format for Persona and Group Descriptions, expressed in BNF notation
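As a rough illustration of how these primitive terms could be populated, the Python sketch below reads a <PERSONAE> fragment modelled on Figure 6 with the standard library's xml.etree.ElementTree and emits tuples named after PT-4 to PT-7, PT-14 and PT-16 to PT-18. The parsing of <PERSONA> values is only an assumption-laden approximation of the Figure 4 format, not the paper's method. It splits a description on commas and treats the first part as <character name>. It then reads any "<noun> of|to [the] <object>" clause as a relationship if the noun is in a small hand-picked RELATION_NOUNS set (hypothetical), and otherwise as a qualifying title plus location qualifier. The names SAMPLE, RELATION_NOUNS, CLAUSE, parse_description and primitive_facts are illustrative.

```python
import re
import xml.etree.ElementTree as ET

# Excerpt adapted from Figure 6, one element per line; not a complete play file.
SAMPLE = """<PLAY>
<TITLE>The Tragedy of Romeo and Juliet</TITLE>
<PERSONAE>
<TITLE>Dramatis Personae</TITLE>
<PERSONA>ESCALUS, prince of Verona. </PERSONA>
<PERSONA>PARIS, a young nobleman, kinsman to the prince.</PERSONA>
<PGROUP>
<PERSONA>MONTAGUE</PERSONA>
<PERSONA>CAPULET</PERSONA>
<GRPDESCR>heads of two houses at variance with each other.</GRPDESCR>
</PGROUP>
<PERSONA>ROMEO, son to Montague.</PERSONA>
<PERSONA>LADY MONTAGUE, wife to Montague.</PERSONA>
</PERSONAE>
</PLAY>"""

# Hypothetical vocabulary deciding when a clause noun is a <relation noun>;
# any other "<noun> of|to <object>" clause is read as <character title> + <location qualifier>.
RELATION_NOUNS = {"son", "daughter", "wife", "husband", "father", "mother",
                  "nephew", "cousin", "kinsman", "friend", "friends", "servant"}

# A whole clause such as "son to Montague" or "kinsman to the prince".
CLAUSE = re.compile(r"(?:\w+\s+)*?(?P<noun>\w+)\s+(?P<prep>to|of)\s+(?:the\s+)?(?P<obj>[\w ]+)")


def parse_description(desc):
    """Approximate PT-14, PT-16, PT-17 and PT-18 facts for one <PERSONA> value."""
    name, *clauses = [part.strip() for part in desc.rstrip(".").split(",")]
    facts = {("primitive_description_set_has_character", desc, name)}
    for clause in clauses:
        clause = re.sub(r"^(?:and|a|an|the)\s+", "", clause)  # drop a leading 'and' or article
        m = CLAUSE.fullmatch(clause)
        if m is None:  # e.g. "young nobleman": a qualifier without an object
            continue
        noun, prep, obj = m["noun"], m["prep"], m["obj"]
        if noun in RELATION_NOUNS:
            facts.add(("description_has_relationship", desc, noun, prep, obj))
        else:
            facts.add(("description_has_qualifying_title", desc, noun))
            facts.add(("description_has_location_qualifier", desc, noun, obj))
    return facts


def primitive_facts(xml_text):
    """Populate PT-4 to PT-7 from <PERSONAE>, plus the parsed description facts."""
    root = ET.fromstring(xml_text)
    play = root.findtext("TITLE")
    personae = root.find("PERSONAE")
    char_list = personae.findtext("TITLE")
    facts = {("play_has_character_list", play, char_list)}
    for persona in personae.findall("PERSONA"):  # characters introduced individually
        desc = persona.text.strip()
        facts.add(("character_list_has_character_description", char_list, desc))
        facts |= parse_description(desc)
    for group in personae.findall("PGROUP"):  # characters introduced in a group
        group_desc = group.findtext("GRPDESCR").strip()
        facts.add(("character_list_has_group", char_list, group_desc))
        for persona in group.findall("PERSONA"):
            desc = persona.text.strip()
            facts.add(("group_has_character_description", group_desc, desc))
            facts |= parse_description(desc)
    return facts


for fact in sorted(primitive_facts(SAMPLE)):
    print(fact)
```

On this excerpt the sketch recovers, for example, ‘son’/‘to’/‘Montague’ for ROMEO and ‘prince’/‘Verona’ for ESCALUS. PARIS’s ‘a young nobleman’ clause has no preposition and is skipped, so the <character title> that Figure 4 assigns to it is missed. Pseudonyms, group descriptions and multi-character objects such as ‘Brutus and Cassius’ are also not handled.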
Obviously, implementing the parsing of values within an element is not trivial and is an issue for future work. Nevertheless, assuming parsing capability, the following are the primitive terms, which express relationships between <PERSONA> or <GRPDESCR> and primitive formats such as <character name> and <relation noun>.
Pe: Pe1 (individual character description) or Pe2 (description of individual characters described in a group description); value for <PERSONA> element
Pd: Description for one character; value for <primitive description set> format
D: Pd or Gd (group description; value for <GRPDESCR> element)
C: Just the name of the character; value for <character name> format
Ps: Pseudonym of the character; value for <character pseudonym> format
Qt: A qualifying title of a character, e.g. ‘King’; value for <character title> format
Lq: A location that qualifies a character’s title, e.g. ‘King of Denmark’; value for <location qualifier> format
Cr: A character who is referenced when describing another character; value for <character related to> format
Rn: The noun describing the nature of the relation between a character and Cr, e.g. father; value for <relation noun> format
Rp: Relation preposition that qualifies Rn, e.g. ‘of’ in ‘father-of’; value for <relation preposition> format
PT-13. character_description_has_primitive_description_set(Pe,Pd)
e.g1. character_description_has_primitive_description_set(‘Senators, Citizens, Guards, Attendants’, ‘Senators’).
e.g2. character_description_has_primitive_description_set(‘CLAUDIUS, king of Denmark’, ‘CLAUDIUS, king of Denmark’).
PT-14. primitive_description_set_has_character(Pd,C)
e.g. primitive_description_set_has_character(‘CLAUDIUS, king of Denmark’, ‘CLAUDIUS’).
PT-15. primitive_description_set_has_pseudonym(Pd,Ps)
e.g. primitive_description_set_has_pseudonym(‘MARCUS ANTONIUS (ANTONY)’, ‘ANTONY’).
PT-16. description_has_qualifying_title(D,Qt)
e.g. description_has_qualifying_title(‘CLAUDIUS, king of Denmark’, ‘king’).
PT-17. description_has_location_qualifier(D,Qt,Lq)
e.g. description_has_location_qualifier(‘CLAUDIUS, king of Denmark’, ‘king’, ‘Denmark’).
PT-18. description_has_relationship(D,Rn,Rp,Cr)
e.g1. description_has_relationship(‘REYNALDO, servant to Polonius.’, ‘servant’, ‘to’, ‘Polonius’).
e.g2. description_has_relationship(‘friends to Brutus and Cassius.’, ‘friends’, ‘to’, ‘Brutus’).
2.3.4. Ontology Data and Predicate Models. From the informal competency questions, the following key words are isolated: who, son, said/utter, where... take place, characters, sonnet, and what... say. Some of these words can be defined directly using the primitive terms:
Pred-1. character(C)
Pred-2. location(Lo)
Pred-3. speaker_starts_speech_with(Sp,L1)
These in turn are used to define ontology terms such as the following:
Pred-4. play_has_character(P,C)
Pred-5. play_has_location(P,Lo)
Pred-6. has_father(C1,C2)
Pred-7. speaker_says(Sp,L)
Pred-8. speaks_sonnet(Sp,L1,Ls)
A sonnet, whose contents are the list Ls, is spoken by speaker Sp and starts with line L1.
Figure 5: Revised Data Model (Represented using Entity-Relationships)
[Diagram: entities Play, Act, Scene, Speech, Line, Sonnet, Character List, Group, Character Description, Primitive Description Set, Character and Location; relationships include has act, has scene, has speech, has line, has character list, has group, has character description, has primitive description set, has character, has qualifier, has relationship, related to, has son, speaker starts speech with, and speaks sonnet.]
2.3.5. Formal Competency Questions. Competency questions can now be posed and formally expressed in First-Order Logic. Given a set of axioms in an ontology (Tontology), and a set of instance objects and relations
(Tground), a competency question Q is a First-Order sentence that can be entailed by, or is consistent with, these sets; i.e. Tontology ∪ Tground ⊨ Q or Tontology ∪ Tground ⊭ ¬Q.
CQ-1. Which character is the father of the Romeo character?
∃C has_father(‘Romeo’,C).
CQ-2. Which speaker says the line, “Et tu, brute!”?
∃C speaker_says(C,‘Et tu, brute!’).
CQ-3. What is the location for the play ‘King Lear’?
∃Lo play_has_location(‘King Lear’,Lo).
CQ-4. Which are all the characters in the play ‘Taming of the Shrew’?
∃L∀C [ C∈L → play_has_character(‘Taming of the Shrew’,C) ].
CQ-5. Does Romeo speak a sonnet, and what are the lines that comprise it?
∃L1∃L speaks_sonnet(‘Romeo’,L1,L).
Each question corresponds to an axiom. To prove it, ontology axioms defining and constraining use of the terms comprising the question axiom must exist.
2.3.6. Axioms
To answer CQ-1, the following predicates are formally defined.
Defn-1. character(C)
The first part of a primitive description set for one character is the character’s name.
∀C∃Pd [ primitive_description_set_has_character(Pd,C) ↔ character(C) ].
Defn-2. play_has_character_description(P,Pe)
A character description Pe is either in a list of individual character descriptions or contained within a list of group descriptions.
∀Pe∀P∃Pa [ play_has_character_list(P,Pa) ∧
  ( character_list_has_character_description(Pa,Pe) ∨
    ∃Gd ( character_list_has_group(Pa,Gd) ∧ group_has_character_description(Gd,Pe) ) )
  ↔ play_has_character_description(P,Pe) ].
Defn-3. play_has_character(P,C)
A character name C is the first part of a primitive description set describing one character, which is part of a list of character descriptions.
∀P∀C∃Pe∃Pd [ play_has_character_description(P,Pe) ∧
  character_description_has_primitive_description_set(Pe,Pd) ∧
  primitive_description_set_has_character(Pd,C)
  ↔ play_has_character(P,C) ].
Defn-4. character_has_pseudonym(C,Ps)
A primitive description set describing one character can have both the character’s name and pseudonym used.
∀C∀Ps∃Pd [ primitive_description_set_has_character(Pd,C) ∧
  primitive_description_set_has_pseudonym(Pd,Ps)
  ↔ character_has_pseudonym(C,Ps) ].
Defn-5. a) related_characters(C1,Rn,Rp,C2)
C1 has a relationship, expressed as relation noun (Rn) + preposition (Rp), with C2 if: C1 is explicitly stated as related to C2 or C2’s pseudonym; C1 is a character introduced individually, or is any of the characters in a group that has a relationship to C2; and C1 and C2 are characters in the same play.
∀C1∀C2∀Rn∀Rp∃D [
  ( description_has_relationship(D,Rn,Rp,C2) ∨
    ∃Cr ( description_has_relationship(D,Rn,Rp,Cr) ∧ character_has_pseudonym(C2,Cr) ) ) ∧
  ( primitive_description_set_has_character(D,C1) ∨
    ∃Pe∃Pd ( group_has_character_description(D,Pe) ∧
      character_description_has_primitive_description_set(Pe,Pd) ∧
      primitive_description_set_has_character(Pd,C1) ) ) ∧
  ∃P ( play_has_character(P,C1) ∧ play_has_character(P,C2) )
  → related_characters(C1,Rn,Rp,C2) ].
Defn-6. b) related_characters(C1,Rn,Rp,C2)
C1 has a relationship, expressed as relation noun (Rn) + preposition (Rp), with C2 if: C1 is a pseudonym for a character whose relationship with C2 can be inferred; C2 is a pseudonym for a character whose relationship with C1 can be inferred; or C1 and C2 are pseudonyms for characters whose relationship with each other can be inferred.
∀C1∀C2∀Rn∀Rp∃Ca∃Cb [
  ( related_characters(C1,Rn,Rp,Cb) ∧ character_has_pseudonym(Cb,C2) ) ∨
  ( related_characters(Ca,Rn,Rp,C2) ∧ character_has_pseudonym(Ca,C1) ) ∨
  ( related_characters(Ca,Rn,Rp,Cb) ∧ character_has_pseudonym(Ca,C1) ∧
    character_has_pseudonym(Cb,C2) )
  → related_characters(C1,Rn,Rp,C2) ].
Defn-7. a) may_be_related_characters(C1,Rn,Rp,C2)
C1 may have a relationship, expressed as relation noun (Rn) + preposition (Rp), with C2 if: C1 and C2’s relationship (Rn+Rp) cannot be inferred for sure; C1 is explicitly stated as related to C2’s qualifying title or location qualifier; C2 is a character introduced individually, or is any of the characters in a group; C1 is a character introduced individually, or is any of the characters in a group that has a relationship to C2; and C1 and C2 are characters in the same play.
∀C1∀C2∀Rn∀Rp∃D∃C∃D2 [
  ¬related_characters(C1,Rn,Rp,C2) ∧
  description_has_relationship(D,Rn,Rp,C) ∧
  ( description_has_qualifying_title(D2,C) ∨ ∃Qt description_has_location_qualifier(D2,Qt,C) ) ∧
  ( primitive_description_set_has_character(D2,C2) ∨
    ∃Pe2∃Pd2 ( group_has_character_description(D2,Pe2) ∧
      character_description_has_primitive_description_set(Pe2,Pd2) ∧
      primitive_description_set_has_character(Pd2,C2) ) ) ∧
  ( primitive_description_set_has_character(D,C1) ∨
    ∃Pe∃Pd ( group_has_character_description(D,Pe) ∧
      character_description_has_primitive_description_set(Pe,Pd) ∧
      primitive_description_set_has_character(Pd,C1) ) ) ∧
  ∃P ( play_has_character(P,C1) ∧ play_has_character(P,C2) )
  → may_be_related_characters(C1,Rn,Rp,C2) ].
Defn-8. b) may_be_related_characters(C1,Rn,Rp,C2)
Defined similarly to Defn-6.
Defn-9. has_son(C1,C2)
∀C1∀C2 [ related_characters(C2,‘son’,‘of’,C1) ∨ related_characters(C2,‘son’,‘to’,C1) → has_son(C1,C2) ].
Defn-10. has_wife(C1,C2)
∀C1∀C2 [ related_characters(C2,‘wife’,‘of’,C1) ∨ related_characters(C2,‘wife’,‘to’,C1) → has_wife(C1,C2) ].
Defn-11. a) has_father(C1,C2)
∀C1∀C2 [ related_characters(C2,‘father’,‘of’,C1) ∨ related_characters(C2,‘father’,‘to’,C1) → has_father(C1,C2) ].
Defn-12. b) has_father(C1,C2)
∀C1∀C2 [ has_son(C2,C1) ∧ ∃C3 has_wife(C2,C3) → has_father(C1,C2) ].
Obviously, many such relationship terms can be defined, e.g. has_mother, has_uncle, or additional definitions of has_father. Possible familial relationships can also be defined using may_be_related_characters.
Definitions for answering CQ-2 and CQ-3 are straightforward, so they are not presented. The predicate play_has_character has been defined, so CQ-4 can be answered. It is difficult to answer CQ-5 precisely because defining when lines rhyme in a sonnet, which consists of three quatrains and a couplet with a rhyme scheme of abab cdcd efef gg, is complicated. Furthermore, existing Shakespearian XML documents do not represent paragraph separations, so quatrains and couplets cannot be defined without modifying the documents. The following axiom at least disqualifies a speech that is obviously not a sonnet.
Cons-1. If a speech does not have 14 lines, it cannot be a sonnet.
∀Sp∃Ls [ ∃L1∀L ( L∈Ls → speech_has_line(Sp,L1,L) ) ∧ n(Ls)≠14 → ¬speaks_sonnet(Sp,L1,Ls) ].
where n(X) is a function that returns the number of elements in a list X.
In the next section, these axioms are applied to answer competency questions.
2.4. Demonstration of Competency
<?xml version="1.0"?>
<PLAY>
  <TITLE>The Tragedy of Romeo and Juliet</TITLE>
  <fm>
    <p>Text placed in the public domain by Moby Lexical Tools, 1992.</p>
    <p>SGML markup by Jon Bosak, 1992-1994.</p>
    <p>XML version by Jon Bosak, 1996-1997.</p>
    <p>This work may be freely copied and distributed worldwide.</p>
  </fm>
  <PERSONAE>
    <TITLE>Dramatis Personae</TITLE>
    <PERSONA>ESCALUS, prince of Verona. </PERSONA>
    <PERSONA>PARIS, a young nobleman, kinsman to the prince.</PERSONA>
    <PGROUP>
      <PERSONA>MONTAGUE</PERSONA>
      <PERSONA>CAPULET</PERSONA>
      <GRPDESCR>heads of two houses at variance with each other.</GRPDESCR>
    </PGROUP>
    <PERSONA>An old man, cousin to Capulet. </PERSONA>
    <PERSONA>ROMEO, son to Montague.</PERSONA>
    <PERSONA>MERCUTIO, kinsman to the prince, and friend to Romeo.</PERSONA>
    <PERSONA>BENVOLIO, nephew to Montague, and friend to Romeo.</PERSONA>
    ...
    <PERSONA>LADY MONTAGUE, wife to Montague.</PERSONA>
    <PERSONA>LADY CAPULET, wife to Capulet.</PERSONA>
Figure 6: Excerpt from XML document of ‘Romeo and Juliet’
(i) play_has_character_list(‘The Tragedy of Romeo and Juliet’, ‘Dramatis Personae’).
(ii) character_list_has_character_description(‘Dramatis Personae’, ‘Escalus, prince of Verona.’).
(iii) character_description_has_primitive_description_set(‘Escalus, prince of Verona.’, ‘Escalus, prince of Verona.’).
(iv) primitive_description_set_has_character(‘Escalus, prince of Verona.’, ‘Escalus’).
(v) description_has_qualifying_title(‘Escalus, prince of Verona.’, ‘prince’).
(vi) character_list_has_character_description(‘Dramatis Personae’, ‘ROMEO, son to Montague.’).
(vii) character_description_has_primitive_description_set(‘ROMEO, son to Montague.’, ‘ROMEO, son to Montague.’).
(viii) primitive_description_set_has_character(‘ROMEO, son to Montague.’, ‘ROMEO’).
(ix) description_has_relationship(‘ROMEO, son to Montague.’, ‘son’, ‘to’, ‘Montague’).
(x) character_list_has_character_description(‘Dramatis Personae’, ‘LADY MONTAGUE, wife to Montague.’).
(xi) character_description_has_primitive_description_set(‘LADY MONTAGUE, wife to Montague.’, ‘LADY MONTAGUE, wife to Montague.’).
(xii) primitive_description_set_has_character(‘LADY MONTAGUE, wife to Montague.’, ‘LADY MONTAGUE’).
(xiii) description_has_relationship(‘LADY MONTAGUE, wife to Montague.’, ‘wife’, ‘to’, ‘Montague’).
(xiv) character_list_has_character_description(‘Dramatis Personae’, ‘MERCUTIO, kinsman to the prince, and friend to Romeo.’).
(xv) character_description_has_primitive_description_set(‘MERCUTIO, kinsman to the prince, and friend to Romeo.’, ‘MERCUTIO, kinsman to the prince, and friend to Romeo.’).
(xvi) primitive_description_set_has_character(‘MERCUTIO, kinsman to the prince, and friend to Romeo.’, ‘MERCUTIO’).
(xvii) description_has_relationship(‘MERCUTIO, kinsman to the prince, and friend to Romeo.’, ‘kinsman’, ‘to’, ‘prince’).
(xviii) description_has_relationship(‘MERCUTIO, kinsman to the prince, and friend to Romeo.’, ‘friend’, ‘to’, ‘Romeo’).
(xix) character_list_has_group(‘Dramatis Personae’, ‘heads of two houses at variance with each other.’).
(xx) group_has_character_description(‘heads of two houses at variance with each other.’, ‘MONTAGUE’).
(xxi) character_description_has_primitive_description_set(‘MONTAGUE’, ‘MONTAGUE’).
(xxii) primitive_description_set_has_character(‘MONTAGUE’, ‘MONTAGUE’).
Figure 7: Relevant Primitive Term Instances
How XML-hoo! answers competency questions is presented below.
- applying Defn-2 to (i) & (vi), infer
(xxiii) play_has_character_description(‘The Tragedy of Romeo and Juliet’, ‘ROMEO, son to Montague.’)
- applying Defn-2 to (i), (xix) & (xx), infer
(xxiv) play_has_character_description(‘The Tragedy of Romeo and Juliet’, ‘MONTAGUE’)
- applying Defn-2 to (i) & (x), infer
(xxv) play_has_character_description(‘The Tragedy of Romeo and Juliet’, ‘LADY MONTAGUE, wife to Montague.’)
- applying Defn-3 to (xxiii), (vii) & (viii), infer
(xxvi) play_has_character(‘The Tragedy of Romeo and Juliet’, ‘ROMEO’)
- applying Defn-3 to (xxiv), (xxi) & (xxii), infer
(xxvii) play_has_character(‘The Tragedy of Romeo and Juliet’, ‘MONTAGUE’)
- applying Defn-3 to (xxv), (xi) & (xii), infer
(xxviii) play_has_character(‘The Tragedy of Romeo and Juliet’, ‘LADY MONTAGUE’)
- applying Defn-5 to (viii), (ix), (xxvi) & (xxvii), infer
(xxix) related_characters(‘ROMEO’, ‘son’, ‘to’, ‘Montague’)
- applying Defn-5 to (xii), (xiii), (xxvii) & (xxviii), infer
(xxx) related_characters(‘LADY MONTAGUE’, ‘wife’, ‘to’, ‘Montague’)
- applying Defn-9 to (xxix), infer
(xxxi) has_son(‘Montague’, ‘Romeo’).
- applying Defn-10 to (xxx), infer
(xxxii) has_wife(‘Montague’, ‘Lady Montague’).
- applying Defn-12 to (xxxi) & (xxxii), infer
(xxxiii) has_father(‘Romeo’, ‘Montague’).
Figure 8: Answering CQ-1
So, the following competency question is answered.
CQ-1. Which character is the father of the Romeo character?
∃C has_father(‘Romeo’,C). returns has_father(‘Romeo’, ‘Montague’).
3. XML-hoo!
XML-hoo! is presented to the user via a web browser. The user can perform 1) guided search through a topic classification tree, 2) keyword search using a traditional search engine, or 3) a menu-driven conceptual search of Shakespearian plays. Since the first two are capabilities provided by existing XML portals, this paper discusses the third.
The user is presented with the following screen, in which context-sensitive menus are presented:
- applying Defn-2 to (i) & (ii), infer
(xxxiv) play_has_character_description(‘The Tragedy of Romeo and Juliet’, ‘Escalus, prince of Verona.’)
- applying Defn-2 to (i) & (xiv), infer
(xxxv) play_has_character_description(‘The Tragedy of Romeo and Juliet’, ‘MERCUTIO, kinsman to the prince, and friend to Romeo.’)
- applying Defn-3 to (xxxiv), (iii) & (iv), infer
(xxxvi) play_has_character(‘The Tragedy of Romeo and Juliet’, ‘Escalus’)
- applying Defn-3 to (xxxv), (xv) & (xvi), infer
(xxxvii) play_has_character(‘The Tragedy of Romeo and Juliet’, ‘MERCUTIO’)
- applying Defn-7 to (xvii), (v), (iv), (xvi), (xxxvi) & (xxxvii), infer
(xxxviii) may_be_related_characters(‘Mercutio’, ‘kinsman’, ‘to’, ‘Escalus’).
Figure 9: Answering CQ-6
A more interesting question is, “Who are the kinsmen
to Escalus?”, since no direct relationship reference to
‘Escalus’ is stated in character introductions. Rather, there
are references to ‘prince,’ which describes him. This
implicit relationship can be inferred using the axioms:
CQ 6. Who is possibly Escalus’ kinsman?
∃C may_be_related_characters(C,‘kinsman’,‘to’,‘Escalus’,).
Figure 10: XML-hoo! Conceptual Search Interface
returns
may_be_related_characters(‘Merculio’,‘kinsman’,‘to’,‘Escalus’).
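The paper states that the pre-defined menu relationships correspond to ontology predicates, and that the Control Module (a Java program) turns the user's selection into a competency question (step 1 of Figure 11) and formats the answer or an error statement (step 8). A minimal Python sketch of those two translation steps follows. The menu labels, the `MENU_TEMPLATES` mapping and the output wording are hypothetical; only the predicate names come from the ontology.

```python
from typing import Iterable

# Hypothetical menu relationships -> competency-question templates.
# C is the variable the Ontology Query Engine is asked to bind.
MENU_TEMPLATES = {
    "father of": "has_father({who},C)",
    "son of": "has_son({who},C)",
    "possible kinsman of": "may_be_related_characters(C,'kinsman','to',{who})",
}


def prolog_atom(text: str) -> str:
    """Quote text as a Prolog atom: escape backslashes, double single quotes."""
    return "'" + text.replace("\\", "\\\\").replace("'", "''") + "'"


def to_competency_question(relationship: str, character: str) -> str:
    """Step (1): translate a menu selection into a competency question."""
    if relationship not in MENU_TEMPLATES:
        raise ValueError(f"unsupported menu relationship: {relationship!r}")
    return MENU_TEMPLATES[relationship].format(who=prolog_atom(character))


def format_answer(relationship: str, character: str, bindings: Iterable[str]) -> str:
    """Step (8): format the bindings found for C, or an error statement."""
    found = sorted(set(bindings))
    if not found:
        return f"No answer found for: {relationship} {character}"
    return f"{relationship} {character}: " + ", ".join(found)


if __name__ == "__main__":
    print(to_competency_question("possible kinsman of", "Escalus"))
    print(format_answer("possible kinsman of", "Escalus", ["Mercutio"]))
```

For example, selecting "possible kinsman of" and "Escalus" yields the CQ-6 query `may_be_related_characters(C,'kinsman','to','Escalus')`. Names are emitted as quoted Prolog atoms with embedded single quotes doubled, so that a name such as O'Brien still forms a well-formed atom.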
0-7695-1435-9/02 $17.00 (c) 2002 IEEE
The pre-defined relationships in the menu correspond to predicates defined in the ontology. The diagram below explains how a user query is answered.
[Diagram: User Interface, Control Module, Ontology Query Engine, Ontology Repository, XML Query Engine and XML Document Repository, connected by the numbered steps (1) to (8) described below.]
Figure 11: XML-hoo! Main Systems Architecture, and Query Answer Process
In (1), the Control Module, a Java program, parses the query submitted through the User Interface and expresses it as a competency question in the ontology implementation language. In (2), the Control Module passes the question to the Ontology Query Engine, which interacts with the Ontology Repository (the Prolog programming environment is used to implement both components) to prove the competency question axiom (3). In (4), the set of primitive term queries that need to be answered is passed back to the Control Module. In (5), the Control Module translates each query into the XML query language, XML-QL, and passes the queries to the XML Query Engine, which repeatedly queries XML documents in the repository (6). In (7), answers to the XML queries are returned to the Control Module, which returns a set of populated primitive terms to the Ontology Query Engine; the engine then proves the competency question, i.e. steps (2) to (4) are repeated. In (8), the Control Module formats the answer, if one exists, or an error statement, and returns it to the User Interface.
The systems architecture follows the rationale for using ontologies: sharability and re-usability. By de-coupling the Control Module, the XML Query Engine and Repository, and the Ontology Query Engine and Repository, the application can scale up to a variety of users querying different domains and different repository sources, albeit at the cost of inefficiency for focused users of a single, centralized repository source.
4. Concluding Remarks and Future Work
The development of this ontology, and of the XML-hoo! application based on it, provides the following evidence in support of using domain-specific ontologies to represent semantics for XML documents in a knowledge management system:
• The capability of the Shakespearian ontology to support inference of facts not explicitly structured in XML demonstrates that an ontology-based approach to query answering is a natural complementary function for an XML data repository. Familial relationships are not structured in XML; moreover, it is nowhere represented, either explicitly or parametrically, that Montague is Romeo’s father. Yet this can be inferred in XML-hoo!.
• The thorough example presented in this paper provides useful instructions for an ontology-building endeavor that complements an XML repository. In fact, the example is novel insofar as it applies a traditional ontological engineering methodology to develop XML-based ontologies.
• The XML-hoo! application serves as a first-pass reference for any endeavor to develop a knowledge management application using XML-based ontologies.
The crux of the XML-hoo! application is the Shakespearian play ontology. Its development can be described as follows:
• An ontological engineering methodology is followed to state the motivating scenario and competency questions.
• The hierarchical structure of a common set of XML documents, namely Shakespearian plays, is translated to develop primitive terms: ontology predicates that are populated by look-ups into an XML document rather than inferred using formal definitions.
• Additional structure within an element is discerned; e.g. there is a format for character introductions, which applies to the <PERSONA> element and holds with few exceptions. This structure is translated to develop more primitive terms.
• Ontology predicates are identified from the competency questions and checked for consistency with the primitive terms. This is sufficient to express the competency questions formally in the language of the ontology.
• Axioms that define the meanings of predicates or constrain their interpretation are developed. By applying ontology axioms to populated primitive terms, answers to competency questions are inferred.
This work can be extended in several ways. There is an
immediate opportunity to formalize and automate
manipulation of the format (grammar) within an XML
element that is not further structured. This is inherently difficult,
and achieving perfect manipulation (parsing) is unlikely.
However, the systematic perspective of an ontologist can
be applied to make simplifying assumptions in specifying
grammar and restricting the domain of discourse described
using that grammar. It is believed that a tractable parsing
system can result, and a prototype for XML-hoo! is
currently in development.
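The paper populates primitive terms by look-ups into XML documents (via XML-QL) and proposes formalizing the grammar of character introductions under simplifying assumptions. The Python sketch below swaps in the standard library's `xml.etree.ElementTree` for XML-QL and assumes a deliberately simple grammar: a NAME followed by comma-separated phrases of the form "<relation> to|of [the] <object>". The XML string is an abbreviated, hypothetical excerpt written in the style of Bosak's Shakespeare markup [11]; the real file's text may differ.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated excerpt in the style of Bosak's markup.
SAMPLE = """<PLAY>
<TITLE>The Tragedy of Romeo and Juliet</TITLE>
<PERSONAE>
<TITLE>Dramatis Personae</TITLE>
<PERSONA>ESCALUS, prince of Verona.</PERSONA>
<PERSONA>MERCUTIO, kinsman to the prince, and friend to Romeo.</PERSONA>
<PGROUP>
<PERSONA>MONTAGUE</PERSONA>
<PERSONA>CAPULET</PERSONA>
<GRPDESCR>heads of two houses at variance with each other.</GRPDESCR>
</PGROUP>
<PERSONA>ROMEO, son to Montague.</PERSONA>
</PERSONAE>
</PLAY>"""

# Assumed phrase grammar: "[and] <relation> (to|of) [the] <object>[.]"
PHRASE = re.compile(r"^(?:and\s+)?(\w+)\s+(to|of)\s+(?:the\s+)?(\w[\w ]*?)\.?$")


def persona_descriptions(xml_text):
    """Look up the text of every <PERSONA>, including those inside <PGROUP>."""
    root = ET.fromstring(xml_text)
    return [p.text.strip() for p in root.iter("PERSONA") if p.text]


def parse_introduction(text):
    """Split an introduction into (character, [(relation, prep, object), ...])."""
    name, _, rest = text.partition(",")
    relationships = []
    for phrase in rest.split(","):
        match = PHRASE.match(phrase.strip())
        if match:  # phrases outside the assumed grammar are skipped
            relationships.append(match.groups())
    return name.strip().rstrip("."), relationships


if __name__ == "__main__":
    for description in persona_descriptions(SAMPLE):
        print(parse_introduction(description))
```

Under these assumptions, Mercutio's introduction yields `("MERCUTIO", [("kinsman", "to", "prince"), ("friend", "to", "Romeo")])`, mirroring facts (xvi) to (xviii), while a bare name such as "MONTAGUE" yields no relationships. Introductions that break the assumed format simply contribute no relationship triples rather than raising an error.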
Beyond formally representing semantics to support automatic inference, ontologies are desirable because their representations are sharable and re-usable. So not only can ontologies be used to answer queries about data structured in XML more richly, they are also inherently re-usable as more documents are added to a repository. From this prototype, it appears that the Shakespearian play ontology’s definitions are indeed applicable to other types of plays and literature, and can be de-coupled from specific structural definitions. Future work will endeavor to provide stronger evidence of this. Though the definitions of primitive terms may need to be changed as similar but different DTDs are added to a repository, higher-level definitions may be re-used with little modification. As several iterations of the ontological engineering methodology are applied, a structuring of the ontologies for a repository will emerge: general ontologies will be defined in terms of specific ontologies’ representations. Whereas XML is used to structure data, ontologies can then serve to structure knowledge composed from data. This supports the role of ontology-based languages in advancing XML towards Berners-Lee’s Semantic Web [13].
5. Acknowledgements
The author expresses gratitude to the students in his Business Information Systems Analysis course, who assisted in detailing the ontology and user interface.
6. References
[1] Commerce One, Inc. (2000). "Commerce One XML Common Business Library (xCBL™), an Interconnectivity Guide, Version 2.0.1", Commerce One, Inc., Hacienda Business Park, Bldg. #4, 4440 Rosewood Dr., Pleasanton, CA 94588, February.
[2] FpML.org (2001). "FpML.org: Financial products Markup Language", [Online], Available: http://fpml.org, April.
[3] Gruber, Thomas R. (1993). "Towards Principles for the Design of Ontologies Used for Knowledge Sharing", In International Workshop on Formal Ontology, N. Guarino & R. Poli (Eds.), Padova, Italy.
[4] Campbell, A. E. and Shapiro, S. C. (1995). "Ontological Mediation: An Overview", Proceedings of the IJCAI Workshop on Basic Ontological Issues in Knowledge Sharing, Menlo Park, CA: AAAI Press.
[5] Smith, Howard and Poulter, Kevin (1999). "Share the Ontology in XML-based Trading Architectures", Communications of the ACM, Vol. 42, No. 3.
[6] Glushko, Robert J., Tenenbaum, Jay M., Meltzer, Bart (1999). "An XML Framework for Agent-based E-commerce", Communications of the ACM, Vol. 42, No. 3.
[7] Kim, Henry M. (2000). "Integrating Business Process-Oriented and Data-Driven Approaches for Ontology Development", AAAI Spring Symposium Series 2000 - Bringing Knowledge to Business Processes, Stanford, CA, March 20-22.
[8] goXML.com (2001). "goXML.com: Intelligent Searching", [Online], Available: http://www.goxml.com, April.
[9] XYZFind (2001). "XYZFind - XML Database Repository, Query, and Search for XML", [Online], Available: http://www.xyzfind.com, April.
[10] XMLTree (2001). "xmlTree.com - Directory of Content", [Online], Available: http://www.xmltree.com, April.
[11] Bosak, Jon (1997). "XML Shakespeare", [Online], Available: http://metalab.unc.edu/bosak/xml/eg/shaks200.zip.
[12] Kim, Henry M., Fox, Mark S., Grüninger, Michael (1999). "An Ontology for Quality Management: Enabling Quality Problem Identification and Tracing", BT Technology Journal, Kluwer, Netherlands, Vol. 17, No. 4.
[13] Fensel, D., Horrocks, I., Van Harmelen, F., Decker, S., Erdmann, M. and Klein, M. (2000). "OIL in a nutshell", Proceedings of the European Knowledge Acquisition Conference (EKAW-2000).