Presented

A TREE BASED ALGEBRA FRAMEWORK FOR XML DATA SYSTEMS
Ali El bekai, Nick Rossiter
School of Informatics, Northumbria University
Email: [email protected], [email protected]
Overview
• Algebraic framework for processing XML data
• Review related work
• Develop a simple algebra, called TA (Tree Algebra), for processing, storing and manipulating XML data as trees
• Describe the input and output of the algebraic operators
• Define the syntax of the relationships/operators and their semantics in terms of algorithms
• Give examples in a domain-specific XML query language
• Discuss closure and application
Related Work
• IBM (Beech & Rys, 1999)
• Lore (McHugh et al., 1997)
• YATL (Christophides et al., 2000)
• Niagara (Galanis et al., 2001)
• AT&T (W3C)
• TAX (Jagadish et al., 2001)
• Problems identified in complexity and generality
Tree Algebra
• True tree
– Each node one parent but many children
– Root node
• Leaves of tree
– Correspond to different sources – object
relational
• Two types of operators
– Algebraic operators
– Relational operators
Concepts in Tree Model
• Root (ultimate ancestor or parent)
• Node (parent or child)
• Edge (link from a parent to a child)
• Leaf (atomic values, nodes with no children)
• Path (sequence of edges between nodes)
• Descendants (all successor nodes for a node)
• Ancestors (all parent nodes for a node)
Mappings
• XML Document → Tree
• Element → Node (root, parent, child)
• Leaf → child node, atomic values
• Attribute → function, values
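The mappings above can be illustrated with a minimal sketch, assuming the standard Java DOM parser as the XML front end. The names MappingSketch, TreeNode and fromXml are hypothetical and are not taken from the authors' implementation. Each element becomes a node, an element without child elements becomes a leaf holding an atomic value, and attributes are kept as name/value pairs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;

final class MappingSketch {
    /** Hypothetical tree node: element name, optional atomic value, attributes, children. */
    static final class TreeNode {
        final String name;
        String value; // atomic value, set only for leaves
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<TreeNode> children = new ArrayList<>();
        TreeNode(String name) { this.name = name; }
    }

    /** XML document -> tree: parse with the DOM parser, then convert the root element. */
    static TreeNode fromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return convert(doc.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    private static TreeNode convert(Element element) {
        TreeNode node = new TreeNode(element.getTagName());
        // Attribute -> name/value pair on the node.
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            node.attributes.put(attrs.item(i).getNodeName(), attrs.item(i).getNodeValue());
        }
        // Element -> child node (recursively).
        NodeList kids = element.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i) instanceof Element) {
                node.children.add(convert((Element) kids.item(i)));
            }
        }
        // Leaf -> atomic value taken from the element's text.
        if (node.children.isEmpty()) {
            node.value = element.getTextContent().trim();
        }
        return node;
    }
}
```

Calling MappingSketch.fromXml on a small collection document returns the root node, from which the children, attributes and leaf values can be read.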
Example XML Tree
[Figure: XML tree for the document Doc, rooted at the collection element]
• Edge labels: pc (parent - child), ad (ancestor - descendant)
• Root – collection element; object1, object3 – sub-elements
• object1: objid 100, objNumber 1234, objectInfor1 (Info_id 10, des_date 12.10.98), objectInfor2 (Info_id 20, des_date 12.12.98)
• object3: objid 301, objNumber 3239, objectInfor3 (Info_id 03, des_date 12.10.99), objectInfor4 (Info_id 09, des_date 12.12.99), imageinfor (Img_id 016, F_format pdf), referenceinfo (ref_id r35, title, type bibliographic)
Algebraic Relationships
• Comparison of two trees
• Universal (unary)
– Defines tree containing all information
• Similarity (binary)
– Two trees have the same structure
• Equivalence (binary)
– Two trees are indistinguishable
• Subsumption (binary)
– One tree is subsumed in another
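The three binary relationships can be sketched as recursive comparisons. This is a minimal illustration under stated assumptions, not the authors' implementation: the class RelationshipSketch and its methods are hypothetical, siblings are compared in document order for similarity and equivalence, and root labels are compared like any other label.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class RelationshipSketch {
    /** Hypothetical node: a label, an atomic value (null for inner nodes), children. */
    static final class Node {
        final String label;
        final String value;
        final List<Node> children = new ArrayList<>();
        Node(String label, String value) { this.label = label; this.value = value; }
        Node add(Node child) { children.add(child); return this; }
    }

    /** Similarity: same labels in the same shape; leaf values are ignored. */
    static boolean similar(Node a, Node b) { return sameShape(a, b, false); }

    /** Equivalence: same shape and the same values. */
    static boolean equivalent(Node a, Node b) { return sameShape(a, b, true); }

    private static boolean sameShape(Node a, Node b, boolean compareValues) {
        if (!a.label.equals(b.label) || a.children.size() != b.children.size()) return false;
        if (compareValues && !Objects.equals(a.value, b.value)) return false;
        for (int i = 0; i < a.children.size(); i++) {
            if (!sameShape(a.children.get(i), b.children.get(i), compareValues)) return false;
        }
        return true;
    }

    /** Subsumption: every node of 'small' has a counterpart inside 'big'. */
    static boolean subsumedBy(Node small, Node big) {
        if (!small.label.equals(big.label)) return false;
        if (small.value != null && !small.value.equals(big.value)) return false;
        for (Node child : small.children) {
            boolean found = false;
            for (Node candidate : big.children) {
                if (subsumedBy(child, candidate)) { found = true; break; }
            }
            if (!found) return false;
        }
        return true;
    }
}
```

Under these assumptions equivalence implies similarity, and any tree is subsumed by itself.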
Example Equivalence Relationship
[Figure: trees Doc3 (root collection3) and Doc4 (root collection4) drawn side by side and linked by ~; the nodes shown are object1, object3, objectInfor1, objectInfor2, Info_id and desc_date, with values 10, 12.10.98 and 20, 12.12.98]
XML Tree Collection3 is equivalent to Collection4:
Same node structure, no mismatch in content
Example Subsumption Relationship
[Figure: tree Doc 3 (root collection3, with object1, object3, objectInfor1 and objectInfor2) drawn beside tree Doc1 (root collection1, with object1, object3, objectInfor1 to objectInfor4, imageinfor and referenceinfo)]
Collection3 is part of collection1 (structure and content)
Algebraic Operators for Trees
• Join (binary, input two trees, output one tree,
commutative, associative)
– Joined on a predicate
• Union (binary, input two trees, output one tree,
commutative, associative, disjoint)
– Summing trees together
• Complement (binary, input two trees, output one
tree, not commutative, not associative)
– Nodes in one tree not found in another
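One possible reading of the union operator is sketched below. The slide only says that union sums two trees together, so the details here are assumptions: the class UnionSketch is hypothetical, the result takes the root of the first tree, and a subtree that appears in both inputs is kept once. The join operator is not sketched because its predicate is not specified on the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class UnionSketch {
    /** Hypothetical node: a label, an atomic value (null for inner nodes), children. */
    static final class Node {
        final String label;
        final String value;
        final List<Node> children = new ArrayList<>();
        Node(String label, String value) { this.label = label; this.value = value; }
        Node add(Node child) { children.add(child); return this; }

        /** Text form of a subtree, used only to detect duplicate subtrees. */
        String canonical() {
            StringBuilder sb = new StringBuilder(label).append('=').append(value).append('(');
            for (Node c : children) sb.append(c.canonical()).append(',');
            return sb.append(')').toString();
        }
    }

    /** Union: a new root holding the top-level subtrees of both inputs, each kept once. */
    static Node union(Node a, Node b) {
        Node out = new Node(a.label, a.value); // assumption: result is rooted like the first tree
        Set<String> seen = new HashSet<>();
        for (Node root : Arrays.asList(a, b)) {
            for (Node child : root.children) {
                // Subtrees are shared, not copied; copy them if the inputs may be mutated.
                if (seen.add(child.canonical())) out.children.add(child);
            }
        }
        return out;
    }
}
```

With this reading the operator is commutative only up to sibling order and the choice of root label; an implementation that treats siblings as unordered would remove the first restriction.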
Algorithm for Complement Operator
// Input: two XML documents or two DOC trees (DOCn Tree, DOCm Tree)
// Output: DOCnm Tree = (DOCn Tree - DOCm Tree)
1 Start from the root node of DOCn Tree
2 If the root nodes of DOCn Tree and DOCm Tree have parent/child nodes
  2.1 Perform a depth-first traversal
  2.2 If DOCn Tree has a parent node that does not exist in DOCm Tree
    2.2.1 Set that parent node of DOCn Tree in the new DOCnm Tree
    2.2.2 While the parent node of DOCn Tree has a child node that does not exist in DOCm Tree
      2.2.2.1 Set that child node of DOCn Tree in DOCnm Tree
      2.2.2.2 If the child node of DOCn Tree has a leaf node that does not exist in DOCm Tree
        2.2.2.2.1 Set that leaf node of DOCn Tree in DOCnm Tree
      2.2.2.3 Set null in DOCnm Tree
    2.2.3 Repeat
  2.3 Set null in DOCnm Tree
3 Set the root node in DOCnm Tree and terminate
4 End/terminate
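The steps above can be sketched in Java as a recursive depth-first comparison. This is an illustration under assumptions rather than the authors' code: the class ComplementSketch is hypothetical, two nodes count as the same when their label and value agree, and when several siblings share a label the first matching one is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class ComplementSketch {
    /** Hypothetical node: a label, an atomic value (null for inner nodes), children. */
    static final class Node {
        final String label;
        final String value;
        final List<Node> children = new ArrayList<>();
        Node(String label, String value) { this.label = label; this.value = value; }
        Node add(Node child) { children.add(child); return this; }
    }

    /** Returns n - m: a tree rooted like n holding the nodes of n with no counterpart in m. */
    static Node complement(Node n, Node m) {
        Node out = new Node(n.label, n.value); // step 3: the root is always kept
        for (Node child : n.children) {
            Node match = findMatch(child, m);
            if (match == null) {
                // The whole subtree is missing from m, so it goes to the result.
                out.children.add(deepCopy(child));
            } else {
                // The node exists in m; keep it only if something below it is missing.
                Node rest = complement(child, match);
                if (!rest.children.isEmpty()) out.children.add(rest);
            }
        }
        return out;
    }

    private static Node findMatch(Node child, Node m) {
        for (Node c : m.children) {
            if (c.label.equals(child.label) && Objects.equals(c.value, child.value)) return c;
        }
        return null;
    }

    private static Node deepCopy(Node n) {
        Node copy = new Node(n.label, n.value);
        for (Node c : n.children) copy.children.add(deepCopy(c));
        return copy;
    }
}
```

Because the result is built from n only, swapping the arguments generally gives a different tree, which matches the slide's remark that complement is not commutative.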
Projection Algebra Operator (unary, input one tree,
output one tree): Example
[Figure: tree Doc1 (root collection1, with object1 and object3) and its projection Doc1p, which keeps only object3 and its descendants objectInfor3, objectInfor4, imageinfor and referenceinfo]
Eliminates nodes other than those specified
Projection of object3
Algebra Operators (continued)
• Select (unary, input one tree, output one tree)
– Filters nodes according to a predicate
• Expose (unary, input one tree, output one tree)
– Retrieves specific elements/nodes given by
parent/child boundaries
• Vertex (unary, input one tree, output one tree)
– Creates the vertex encompassing all nodes created
by the expose operator
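A minimal sketch of the select operator follows, assuming the predicate is applied to individual nodes. The class SelectSketch is hypothetical. So that the output is still a tree, a node is kept when it satisfies the predicate or when it lies on the path to a node that does; the root is always kept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class SelectSketch {
    /** Hypothetical node: a label, an atomic value (null for inner nodes), children. */
    static final class Node {
        final String label;
        final String value;
        final List<Node> children = new ArrayList<>();
        Node(String label, String value) { this.label = label; this.value = value; }
        Node add(Node child) { children.add(child); return this; }
    }

    /** Select: copy the tree, keeping matching nodes and the ancestors that lead to them. */
    static Node select(Node root, Predicate<Node> predicate) {
        Node out = new Node(root.label, root.value);
        for (Node child : root.children) {
            Node kept = select(child, predicate);
            // Keep the child if it matches itself or still has matching descendants.
            if (predicate.test(child) || !kept.children.isEmpty()) {
                out.children.add(kept);
            }
        }
        return out;
    }
}
```

For example, a predicate that tests for the label Info_id with value 10 leaves only the path from the root down to that leaf.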
Algorithm for Expose Operator
// Input: one DOC tree or one XML document
// Output: one DOC tree or one XML document
1 Start at the entry point, which is the root node
2 Perform a depth-first traversal
  2.1 If the parameter is equal to the specific node to be exposed
    2.1.1 Return the specific node
    2.1.2 Set the specific node in the new tree
  2.2 If the exposed element does not exist then terminate
3 End/terminate
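The steps above can be sketched as an iterative depth-first search that returns the first node with the requested label as a new tree. The class ExposeSketch is hypothetical, and matching on the label alone is an assumption; the slides speak of parent/child boundaries, which may involve more than a single label.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

final class ExposeSketch {
    /** Hypothetical node: a label, an atomic value (null for inner nodes), children. */
    static final class Node {
        final String label;
        final String value;
        final List<Node> children = new ArrayList<>();
        Node(String label, String value) { this.label = label; this.value = value; }
        Node add(Node child) { children.add(child); return this; }
    }

    /** Expose: depth-first search for 'label'; the match is copied into a new tree. */
    static Optional<Node> expose(Node root, String label) {
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root); // step 1: the entry point is the root node
        while (!stack.isEmpty()) {
            Node current = stack.pop();
            if (current.label.equals(label)) {
                return Optional.of(deepCopy(current)); // steps 2.1.1 and 2.1.2
            }
            // Push children in reverse so they are visited in document order.
            for (int i = current.children.size() - 1; i >= 0; i--) {
                stack.push(current.children.get(i));
            }
        }
        return Optional.empty(); // step 2.2: the element does not exist
    }

    private static Node deepCopy(Node n) {
        Node copy = new Node(n.label, n.value);
        for (Node c : n.children) copy.children.add(deepCopy(c));
        return copy;
    }
}
```

Returning an empty Optional is a design choice of this sketch; to keep the closure property strictly, the missing case could instead be modelled as an empty tree.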
Results
• Developed
– Domain specific algebra
– Tree algebra
– Algebraic relationships
• Universal, similarity, equivalence, subsumption
– Algebraic operators
• Join, union, complement, project, select, expose,
vertex
– Closure – output is always a tree
Verification
• All operators:
– Presented as algorithms
– Implemented in Java
• Case study:
– Virtual museum application
– Implemented code used to meet the museum's requirements
Further Work
• Investigate
– Extent to which limitations in the operators affect usability
– Does domain need extending?
• Further experimentation
– Examine feedback from museum study
– Look at further areas