A TREE BASED ALGEBRA FRAMEWORK FOR XML DATA SYSTEMS
Ali El bekai, Nick Rossiter
School of Informatics, Northumbria University
Email: [email protected], [email protected]

Overview
• Framework in algebra for processing XML data
• Review related work
• Develop a simple algebra, called TA (Tree Algebra), for processing, storing and manipulating XML data as trees
• Describe input and output of the algebraic operators
• Define the syntax of relationships/operators and their semantics in terms of algorithms
• Examples are given in the domain-specific XML query language
• Discuss closure and application

Related Work
• IBM (Beech & Rys, 1999)
• Lore (McHugh et al 1997)
• YATL (Christophides et al 2000)
• Niagara (Galanis et al 2001)
• AT&T (W3C)
• TAX (Jagadish et al 2001)
• Problems identified in complexity and generality

Tree Algebra
• True tree
  – Each node has one parent but may have many children
  – Root node
• Leaves of tree
  – Correspond to different sources
  – Object-relational
• Two types of operators
  – Algebraic operators
  – Relational operators

Concepts in Tree Model
• Root (ultimate ancestor or parent)
• Node (parent or child)
• Edge (link from a parent to a child)
• Leaf (atomic values, nodes with no children)
• Path (sequence of edges between nodes)
• Descendants (all successor nodes for a node)
• Ancestors (all parent nodes for a node)

Mappings
• XML document → tree
• Element → node (root, parent, child)
• Leaf → child node, atomic values
• Attribute → function, values

Example XML Tree
[Figure: example XML tree for Doc: collection. Edges are labelled pc (parent-child) or ad (ancestor-descendant). Root: the collection element; object1 and object3 are sub-elements. Nodes shown include objid, objNumber, objectInfor1 to objectInfor4 (each with Info_id and des_date), imageinfor (Img_id, F_format) and referenceinfo (ref_id, title, type).]
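The tree-model concepts above (root, node, edge, leaf, path, descendants, ancestors) can be sketched in Java, the language the slides say the operators were implemented in. This is a minimal illustration only: the class names `TreeNode` and `TreeModelSketch`, the method names, and the tree fragment built in `example()` are assumptions made for this sketch, not the authors' code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type; the slides do not show the authors' Java classes.
class TreeNode {
    final String label;   // element or attribute name
    final String value;   // atomic value for a leaf, null for inner nodes
    TreeNode parent;      // null for the root
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String label, String value) {
        this.label = label;
        this.value = value;
    }

    // Adds a parent-child (pc) edge and returns the child for chaining.
    TreeNode add(TreeNode child) {
        child.parent = this;
        children.add(child);
        return child;
    }

    // Leaf: a node with no children.
    boolean isLeaf() {
        return children.isEmpty();
    }

    // Ancestors: every node on the way up to the root, nearest first.
    List<TreeNode> ancestors() {
        List<TreeNode> out = new ArrayList<>();
        for (TreeNode n = parent; n != null; n = n.parent) {
            out.add(n);
        }
        return out;
    }

    // Descendants: all successor nodes, collected depth-first.
    List<TreeNode> descendants() {
        List<TreeNode> out = new ArrayList<>();
        for (TreeNode c : children) {
            out.add(c);
            out.addAll(c.descendants());
        }
        return out;
    }

    // Path: labels from the root down to this node, joined with '/'.
    String path() {
        return parent == null ? label : parent.path() + "/" + label;
    }
}

class TreeModelSketch {
    // Builds a small fragment loosely based on the example tree (illustrative values).
    static TreeNode example() {
        TreeNode root = new TreeNode("collection", null);
        TreeNode object1 = root.add(new TreeNode("object1", null));
        TreeNode info1 = object1.add(new TreeNode("objectInfor1", null));
        info1.add(new TreeNode("Info_id", "10"));
        info1.add(new TreeNode("des_date", "12.10.98"));
        root.add(new TreeNode("object3", null));
        return root;
    }

    public static void main(String[] args) {
        TreeNode root = example();
        TreeNode infoId = root.children.get(0).children.get(0).children.get(0);
        System.out.println(infoId.path());
        System.out.println(infoId.ancestors().size());
        System.out.println(root.descendants().size());
    }
}
```

Keeping a parent reference on each node makes the ancestor-descendant (ad) relationship cheap to follow in both directions, at the cost of having to maintain it whenever an edge is added.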
Algebraic Relationships
• Comparison of two trees
• Universal (unary)
  – Defines tree containing all information
• Similarity (binary)
  – Two trees have the same structure
• Equivalence (binary)
  – Two trees are indistinguishable
• Subsumption (binary)
  – One tree is subsumed in another

Example Equivalence Relationship
[Figure: Doc3 (collection3) ~ Doc4 (collection4). Both trees show object1 and object3 with objectInfor1 and objectInfor2 nodes carrying Info_id and desc_date values (10, 12.10.98; 20, 12.12.98).]
XML tree Collection3 is equivalent to Collection4: same node structure, no mismatch in content

Example Subsumption Relationship
[Figure: Doc3 (collection3) beside Doc1 (collection1). collection3 shows objectInfor1 and objectInfor2; collection1 shows those nodes together with objectInfor3, objectInfor4, imageinfor and referenceinfo.]
Collection3 is part of collection1 (structure and content)

Algebraic Operators for Trees
• Join (binary, input two trees, output one tree, commutative, associative)
  – Joined on a predicate
• Union (binary, input two trees, output one tree, commutative, associative, disjoint)
  – Summing trees together
• Complement (binary, input two trees, output one tree, not commutative, not associative)
  – Nodes in one tree not found in another

Algorithm for Complement Operator
// Input: two XML documents or two DOC trees (DOCn Tree, DOCm Tree)
// Output: DOCnm Tree = (DOCn Tree - DOCm Tree)
1 Start from root node DOCn
2 If root node DOCn Tree and root node DOCm Tree have parent/child nodes
  2.1 Perform depth-first algorithm
  2.2 If DOCn Tree has parent node not existing in DOCm Tree
    2.2.1 set parent node DOCn Tree to the new DOCnm Tree
    2.2.2 while parent node DOCn Tree has child node not existing in DOCm Tree
      2.2.2.1 set child node DOCn Tree to DOCnm Tree
      2.2.2.2 if child node DOCn Tree has leaf node not existing in DOCm Tree
        2.2.2.2.1 set leaf node DOCn Tree to DOCnm Tree
      2.2.2.3 set null to DOCnm Tree
    2.2.3 repeat
  2.3 set null to DOCnm Tree
3 Set root node to DOCnm Tree and terminate
4 end/terminate

Projection Algebra Operator (unary, input one tree, output one tree): Example
[Figure: Doc1 (collection1, with object1 and object3) and its projection Doc1p, which keeps only object3 with objectInfor3, objectInfor4, imageinfor and referenceinfo.]
Eliminates nodes other than those specified
Projection of object3

Algebra Operators (continued)
• Select (unary, input one tree, output one tree)
  – Filters nodes according to a predicate
• Expose (unary, input one tree, output one tree)
  – Retrieves specific elements/nodes given by parent/child boundaries
• Vertex (unary, input one tree, output one tree)
  – Creates the vertex encompassing all nodes created by the expose operator

Algorithm for Expose Operator
// Input: one DOC tree or one XML document
// Output: one DOC tree or one XML document
1 start with entry point, which is the root node
2 perform depth-first algorithm
  2.1 if parameter is equal to the specific node to expose
    2.1.1 return the specific node
    2.1.2 set specific node in the new tree
  2.2 if exposed element does not exist then terminate
3 end/terminate

Results
• Developed
  – Domain specific algebra
  – Tree algebra
  – Algebraic relationships
    • Universal, similarity, equivalence, subsumption
  – Algebraic operators
    • Join, union, complement, project, select, expose, vertex
  – Closure: output is always a tree

Verification
• All operators:
  – Presented as algorithms
  – Implemented in Java
• Case study:
  – Virtual museum application
  – Implemented code used to satisfy the museum's requirements

Further Work
• Investigate
  – Extent to which limitations in operators affect usability
  – Does domain need extending?
• Further experimentation
  – Examine feedback from museum study
  – Look at further areas
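The complement and expose algorithms described earlier in the slides can be sketched in Java. This is one possible reading, not the authors' implementation. It assumes that a node "exists in" the other tree when a sibling with the same label (and, for leaves, the same value) is found under the matched parent, and that the complement keeps the ancestors of any surviving node so that the output is still a tree. The names `TNode`, `TreeOperatorsSketch`, `complement`, `expose`, `doc1` and `doc3` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical node type used only for this sketch.
class TNode {
    final String label;   // element name
    final String value;   // atomic value for a leaf, null for inner nodes
    final List<TNode> children = new ArrayList<>();

    TNode(String label, String value, TNode... kids) {
        this.label = label;
        this.value = value;
        for (TNode k : kids) {
            children.add(k);
        }
    }

    // Assumed matching rule: same label and same value (null equals null).
    boolean matches(TNode other) {
        return label.equals(other.label) && Objects.equals(value, other.value);
    }
}

class TreeOperatorsSketch {
    // Complement (n - m): the parts of n not found in m, or null if nothing remains.
    static TNode complement(TNode n, TNode m) {
        if (m == null || !n.matches(m)) {
            return copy(n);                       // whole subtree is missing from m
        }
        TNode out = new TNode(n.label, n.value);  // kept only if a child survives
        for (TNode child : n.children) {
            TNode partner = null;
            for (TNode candidate : m.children) {
                if (child.matches(candidate)) {
                    partner = candidate;
                    break;
                }
            }
            TNode diff = complement(child, partner);  // depth-first descent
            if (diff != null) {
                out.children.add(diff);
            }
        }
        return out.children.isEmpty() ? null : out;
    }

    // Expose: depth-first search for the first node with the given label;
    // returns a copy of its subtree as a new tree, or null if it does not exist.
    static TNode expose(TNode root, String label) {
        if (root.label.equals(label)) {
            return copy(root);
        }
        for (TNode child : root.children) {
            TNode hit = expose(child, label);
            if (hit != null) {
                return hit;
            }
        }
        return null;
    }

    // Deep copy, so that operator results never share nodes with their inputs.
    static TNode copy(TNode n) {
        TNode out = new TNode(n.label, n.value);
        for (TNode c : n.children) {
            out.children.add(copy(c));
        }
        return out;
    }

    // Illustrative inputs: doc3 is contained in doc1.
    static TNode doc1() {
        return new TNode("collection", null,
                new TNode("object1", null, new TNode("Info_id", "10")),
                new TNode("object3", null, new TNode("Info_id", "03")));
    }

    static TNode doc3() {
        return new TNode("collection", null,
                new TNode("object1", null, new TNode("Info_id", "10")));
    }

    public static void main(String[] args) {
        TNode diff = complement(doc1(), doc3());
        System.out.println(diff.children.get(0).label);
        System.out.println(complement(doc3(), doc1()) == null);
        System.out.println(expose(doc1(), "object3").children.size());
    }
}
```

Swapping the two arguments of `complement` gives a different result here, which is consistent with the slides describing the operator as not commutative. Returning a copied subtree (or null for an empty result) is a design choice of this sketch for keeping the output a tree.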