Journal of Computational Information Systems 3:5 (2007) 1-6
Available at http://www.JofCI.org

XML Functional Dependencies based on Constraint-tree and Their Relationship with XML Keys

Teng LV 1,2, Ping YAN 1,†
1 College of Mathematics and System Science, Xinjiang University, Urumqi 830046, China
2 Teaching and Research Section of Computer, Artillery Academy of PLA, Hefei 230031, China

Abstract

A concept of functional dependency in XML documents based on constraint-trees is proposed. The definition captures the tree structure of XML documents and considers a more general situation of XML functional dependencies based on sub-trees with constraint conditions, which overcomes the shortcomings of related definitions of XML functional dependencies. The relationship between XML functional dependencies and XML keys is also discussed.

Keywords: XML; Functional Dependency; Key

1. Introduction

XML (eXtensible Markup Language) [1] has become the de facto standard for data exchange on the World Wide Web and is widely used in many fields. Although XML has many advantages, such as being flexible, extensible, and self-describing, it is hard to express semantic information in XML because XML provides little mechanism for semantics. It is therefore necessary to study this problem in the XML research field. One topic in XML semantics is functional dependencies, which are fundamental to other XML research fields, such as normalizing XML documents [2, 3], querying XML documents [4], and mapping between XML documents and other data forms [5-7]. Several schema languages for XML documents have been proposed, such as XML Schema [8] and DTD (Document Type Definition) [9]. DTDs are widely used in many XML documents and supported by many applications and product providers.
Although the theory of functional dependencies for relational databases is mature, there is no comparably mature and systematic theory for XML, because XML is new compared with relational databases and because relational schemas and XML schemas differ considerably in structure.

Related work. The theory of functional dependencies [10~12] for relational databases cannot be applied directly to XML documents because the two differ significantly in structure: relational schemas are flat, while XML schemas are nested. The XML research community has followed two major approaches to defining XML functional dependencies. The first approach is based on paths in XML documents, as in Refs. [13~18]. Unfortunately, these definitions do not deal with the tree-structured situation proposed in this paper. The second approach is based on sub-graphs or sub-trees in XML documents, as in Ref. [19], but it does not deal with the tree-structured situation with constraint conditions proposed in this paper. Ref. [20] deals with XML functional dependencies with constraint conditions, but without specifying what kinds of constraints are allowed. The concept of XML keys is proposed in Refs. [21-23].

† Corresponding author. Email addresses: [email protected] (T. Lv), [email protected] (P. Yan).
1553-9105/ Copyright © 2007 Binary Information Press, June 2007
Teng Lv et al. / Journal of Computational Information Systems 3:3 (2007) 1-6

In this paper, we give the definition of XML functional dependencies based on constraint-trees. The definition overcomes the shortcomings of previous definitions in the following respects: (1) it captures the tree structure of XML documents; (2) it considers a more general situation of XML functional dependencies based on sub-trees with constraint conditions in XML documents. More discussion can be found in Sub-section 2.2. The rest of the paper is organized as follows.
Section 2 gives a motivating example and the necessary notation, defines XML functional dependencies, and presents their XML expression and their relationship with XML keys. Section 3 concludes the paper and points out directions for future work.

2. XML constraint-tree-based functional dependency (xCTFD)

2.1. A motivating example

Example 1. Consider the following DTD D1, which describes information about some courses: the name of a course, the pairs (each a male and a female student) taking the course, and an element community that indicates the course community to which the course belongs. We suppose that two courses are in the same course community if they are taken by a same pair, i.e., the two courses are similar in having the same students. Moreover, all courses that have this similarity form a course community.

<!ELEMENT courses (course*)>
<!ELEMENT course (pair*,community)>
<!ATTLIST course name CDATA #REQUIRED>
<!ELEMENT pair (he,she)>
<!ELEMENT he (#PCDATA)>
<!ELEMENT she (#PCDATA)>
<!ELEMENT community (#PCDATA)>

We illustrate the structure of DTD D1 in Fig. 1, which shows only the necessary information of the DTD for clarity.

Fig. 1 A tree-structured DTD D1 (figure not reproduced: courses has children course*; course has attribute name and children pair* and community; pair has children he and she)

Example 2. Fig. 2 is an XML tree T1 conforming to DTD D1, which contains 4 courses ("c1", "c2", "c3" and "c4").

Fig. 2 An XML tree T1 conforming to DTD D1 (figure not reproduced: its leaves, in the order in which they appear, are the names c1, c2, c3, c4; the communities com1, com1, com1, com2; and the (he, she) pairs Tom/Jane, Jim/Mary, Tom/Jane, Jim/Tina, Tom/Tina, Jim/Mary, Tom/Mary, Jim/Jane)

A path p in D=(E, A, P, R, r) is defined to be p = ω_1.….ω_n, where
(1) ω_1 = r;
(2) ω_i ∈ P(ω_{i-1}), i ∈ [2,…,n-1];
(3) ω_n ∈ P(ω_{n-1}) if ω_n ∈ E and P(ω_n) ≠ Φ, or ω_n = S if ω_n ∈ E and P(ω_n) = Φ, or ω_n ∈ R(ω_{n-1}) if ω_n ∈ A.
As the conditions indicate, E is the set of element types, A is the set of attributes, P gives the child element types of an element type, R gives the attributes of an element type, r is the root, and S stands for text content. Let paths(D) = {p | p is a path in D}. Similarly, we can define a path in a part of a DTD.

Definition 1. Given a DTD D, suppose v is a vertex of D. A v-subtree is a tree rooted at v in D. Similarly, we can define a path in the v-subtree as a part of a path starting from vertex v. If a v-subtree contains all vertices that can be reached from the root v through all the paths in the v-subtree, then it is called a full v-subtree in D.

2.2. Definition of xCTFD

Definition 2. An XML constraint-tree-based functional dependency (xCTFD) has the form {v : X |C → Y}, where v is a vertex of DTD D, X and Y are v-subtrees of D, and C is a constraint condition on X. An XML tree T conforming to DTD D satisfies xCTFD {v : X |C → Y} if, for any two pre-images W1 and W2 of a full v-subtree of D in T, the projections W1(Y) = W2(Y) whenever condition C is satisfied, where C has the following form:

{ v'-subtree1 = v'-subtree2 | v' is a vertex of DTD D, v'-subtree1 ⊆ W1(X), and v'-subtree2 ⊆ W2(X) }.

More complicated conditions can of course be defined, but in this paper we consider only this specific type of condition C in xCTFDs, which is the most common and useful constraint in real XML applications. The concepts of pre-image and projection used in the above definition are the same as those in graph theory, so we do not elaborate on them here.

Example 3. There is an xCTFD {course : X |C → Y} in XML tree T1 (Fig. 2), where X (Fig. 3) is the course-subtree of DTD D1 with leaves "he" and "she", Y (Fig. 4) is the course-subtree of DTD D1 with leaf "community", and C is the condition that some pair is equal in the two courses, i.e., { pair-subtree1 = pair-subtree2 | pair-subtree1 ⊆ W1(X) and pair-subtree2 ⊆ W2(X) }.
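As a minimal sketch, the satisfaction test of Definition 2 with the condition C of Example 3 can be written in Python. The encoding is hypothetical and not part of the paper: each pre-image of the full course-subtree is flattened to a dictionary, the grouping of the (he, she) pairs into the courses c1 to c4 is an assumption read from Fig. 2, and the names T1, condition_c, violations and satisfies_xctfd are introduced only for illustration.

```python
from itertools import combinations

# Hypothetical flat encoding of XML tree T1 (one dict per course pre-image).
# The assignment of pairs to courses is an assumption read from Fig. 2.
T1 = [
    {"name": "c1", "pairs": {("Tom", "Jane"), ("Jim", "Mary")}, "community": "com1"},
    {"name": "c2", "pairs": {("Tom", "Jane"), ("Jim", "Tina")}, "community": "com1"},
    {"name": "c3", "pairs": {("Tom", "Tina"), ("Jim", "Mary")}, "community": "com1"},
    {"name": "c4", "pairs": {("Tom", "Mary"), ("Jim", "Jane")}, "community": "com2"},
]


def condition_c(w1, w2):
    """Condition C of Example 3: W1(X) and W2(X) contain an equal pair-subtree."""
    return bool(w1["pairs"] & w2["pairs"])


def violations(courses):
    """Course name pairs that satisfy C but differ on the projection to Y."""
    return [
        (w1["name"], w2["name"])
        for w1, w2 in combinations(courses, 2)
        if condition_c(w1, w2) and w1["community"] != w2["community"]
    ]


def satisfies_xctfd(courses):
    """True if W1(Y) == W2(Y) for every two pre-images that satisfy C."""
    return not violations(courses)
```

Under this encoding the check succeeds on T1: c1 and c2 share the pair (Tom, Jane), c1 and c3 share the pair (Jim, Mary), all three carry com1, and c4 shares no pair with any other course. Changing the community of c3 to com2 would make violations report the pair (c1, c3).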
Fig. 3 A course-subtree X of DTD D1 (figure not reproduced: course with child pair*, which has children he and she)

Fig. 4 A course-subtree Y of DTD D1 (figure not reproduced: course with child community)

The semantic meaning of the above xCTFD is that two courses belong to the same course community if some pair is equal in the two courses. The intuition is that if two courses are taken by a same pair, then the two courses are similar in having the same students. Moreover, all courses that have this similarity form a course community.

The above xCTFD cannot be expressed by earlier related XML functional dependencies. For example, path-based XML FDs [2,3,13~18] can only express the above xCTFD in the following FD form:

{courses.course.pair → courses.course.community},

which just says that a pair determines a community. Ref. [17] can only express the above xCTFD in the form {course : X → Y} without condition C, which says that the whole set of pairs determines a community. Ref. [18] does not specify the formal form of the constraint condition C. So they do not capture the exact semantics of the xCTFD {course : X |C → Y} defined here, which says that a pair in a set of pairs determines a community.

2.3. XML form of xCTFD

An xCTFD {v : X |C → Y} can be expressed in an XML form. We give the DTD of xCTFDs as follows (xCTFD.dtd):

<?xml version="1.0" ?>
<!ELEMENT xmlFDs (xCTFD*)>
<!ELEMENT xCTFD (v?,v-subtreeX,v-subtreeY,conditionC?)>
<!ATTLIST xCTFD xCTFDid ID #REQUIRED>
<!ELEMENT v (vNode)>
<!ELEMENT vNode (#PCDATA)>
<!ELEMENT v-subtreeX (InternalNodes*,LeaveNodes+)>
<!ELEMENT v-subtreeY (InternalNodes*,LeaveNodes+)>
<!ELEMENT InternalNodes (#PCDATA)>
<!ELEMENT LeaveNodes (#PCDATA)>
<!ELEMENT conditionC (#PCDATA)>

Element type xmlFDs is the set of all xCTFDs.
Element type xCTFD is a specific xCTFD. It has an attribute xCTFDid (a unique ID) that identifies the xCTFD, an element type v that indicates the vertex v, two element types v-subtreeX and v-subtreeY that indicate the v-subtree X and the v-subtree Y, respectively (in which the LeaveNodes elements specify the leaf nodes of the two sub-trees), and an element type conditionC that indicates the condition C.

Example 4. Each xCTFD can be expressed in XML form, so xCTFDs have the same uniform format as XML documents. For example, the xCTFD {course : X |C → Y} in XML tree T1 can be expressed as the following XML file (xCTFD1.xml):

<?xml version="1.0" ?>
<!DOCTYPE xmlFDs SYSTEM "xCTFD.dtd">
<xmlFDs>
  <xCTFD xCTFDid="xCTFD100">
    <v>
      <vNode> course </vNode>
    </v>
    <v-subtreeX>
      <InternalNodes> pair </InternalNodes>
      <LeaveNodes> he </LeaveNodes>
      <LeaveNodes> she </LeaveNodes>
    </v-subtreeX>
    <v-subtreeY>
      <LeaveNodes> community </LeaveNodes>
    </v-subtreeY>
    <conditionC> there exists a pair that is equal in X </conditionC>
  </xCTFD>
</xmlFDs>

2.4. The relationship between xCTFD and XML keys

In this section, we discuss the relationship between xCTFDs and XML keys. We first give the definition of an XML key based on tree structure:

Definition 3. An XML key has the form {v : Y(X)}, where v is a vertex of DTD D, and X and Y are v-subtrees of D. An XML tree T conforming to DTD D satisfies key {v : Y(X)} if, for any two pre-images W1 and W2 of a full v-subtree of D in T, the projections W1(Y) = W2(Y) whenever W1(X) = W2(X). If v is the root of an XML document or is null, then key {v : Y(X)} is simplified as {Y(X)} and is called a global XML key, which means that the key is satisfied in the whole XML tree; otherwise, it is called a local XML key, which means that the key is satisfied in a sub-tree rooted at the vertex v.
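The test in Definition 3 can be sketched in the same spirit. This is an illustrative Python sketch under stated assumptions, not the authors' procedure: pre-images are encoded as dictionaries, projections are passed in as plain functions, equality of projections is taken to be equality of the encoded values, and the course c5 as well as the names satisfies_key, proj_pairs and proj_community are invented for the example.

```python
from itertools import combinations


def satisfies_key(images, proj_x, proj_y):
    """Definition 3: W1(Y) == W2(Y) must hold whenever W1(X) == W2(X)."""
    return all(
        proj_y(w1) == proj_y(w2)
        for w1, w2 in combinations(images, 2)
        if proj_x(w1) == proj_x(w2)
    )


def proj_pairs(w):
    """Projection onto X: the set of (he, she) pairs of a course."""
    return w["pairs"]


def proj_community(w):
    """Projection onto Y: the community of a course."""
    return w["community"]


# Hypothetical pre-images of the full course-subtree.
courses = [
    {"name": "c1", "pairs": frozenset({("Tom", "Jane"), ("Jim", "Mary")}), "community": "com1"},
    {"name": "c2", "pairs": frozenset({("Tom", "Jane"), ("Jim", "Tina")}), "community": "com1"},
]

# Invented course with exactly the pairs of c1 but a different community.
duplicate = {"name": "c5", "pairs": frozenset({("Tom", "Jane"), ("Jim", "Mary")}), "community": "com2"}

key_holds = satisfies_key(courses, proj_pairs, proj_community)
key_holds_with_duplicate = satisfies_key(courses + [duplicate], proj_pairs, proj_community)
```

Here key_holds is true because no two courses agree on the whole projection onto X, and key_holds_with_duplicate is false because c1 and c5 agree on X but differ on Y. Unlike the condition C of Example 3, the key compares the entire projection W(X), not a single pair-subtree.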
From the definitions of XML key and xCTFD, it is easy to obtain the relationship between them, as the following theorem shows:

Theorem 1. An XML tree satisfies an XML key {v : Y(X)} iff it satisfies xCTFD {v : X |C → Y} where the condition C is null, i.e., xCTFD {v : X → Y}.

Intuitively, without a condition C the xCTFD requires W1(Y) = W2(Y) whenever W1(X) = W2(X), which is exactly the requirement of Definition 3.

3. Conclusions and future work

Functional dependencies are very important semantic information in XML documents and are fundamental to other related XML research topics such as normalizing XML documents and query optimization. This paper extends the theory of functional dependencies from relational databases to XML and proposes a formal definition of functional dependencies in XML documents based on constraint-trees. The XML functional dependencies in our work can express more semantic information and can be applied to more general situations in XML documents.

Future work includes developing complete inference rules for XML functional dependencies. Another interesting direction is to elaborate the condition in the proposed xCTFD definition.

Acknowledgement

This work is supported by Natural Science Foundation of China (No.60563001), Natural Science Foundation of Anhui Province (Key Technologies of Data Integration based on XML), College Science & Research Plan Project of Xinjiang Uighur Autonomous Region (No.XJEDU2004S04), and Science Research Foundation for Young Teachers of Xinjiang University (No.QN040101). The authors are grateful to the anonymous reviewers for their constructive comments.

References

[1] T. Bray, J. Paoli, et al. Extensible Markup Language (XML), third edition. http://www.w3.org/TR/REC-xml.
[2] Teng Lv, Ning Gu, and Ping Yan. Normal forms for XML documents. Information and Software Technology, 2004, 46(12): 839-846.
[3] Marcelo Arenas and Leonid Libkin. A Normal Form for XML Documents. Symposium on Principles of Database Systems (PODS'02), Madison, Wisconsin, U.S.A. ACM Press, 2002, pp. 85-96.
[4] Alin Deutsch and Val Tannen. Querying XML with Mixed and Redundant Storage. Technical Report MS-CIS-02-01 (2002).
[5] Teng Lv and Ping Yan. Mapping DTDs to relational schemas with semantic constraints. Information and Software Technology, 2006, 48(4): 245-252.
[6] D. Lee, M. Mani, and W. W. Chu. Schema conversion methods between XML and relational models. Knowledge Transformation for the Semantic Web, Frontiers in Artificial Intelligence and Applications, Vol. 95, IOS Press, 2003, pp. 1-17.
[7] S. Lu, Y. Sun, M. Atay, et al. A new inlining algorithm for mapping XML DTDs to relational schemas. ER workshops 2003, Springer, Lecture Notes in Computer Science, Vol. 2814, 2003, pp. 366-377.
[8] XML Schema Part 0: Primer Second Edition. W3C Recommendation, http://www.w3.org/TR/2004/REC-xmlschema-0-20041028/.
[9] W3C XML Specification DTD. http://www.w3.org/XML/1998/06/xmlspec-report-19980910.htm, Jun. 1998.
[10] Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, Reading, Massachusetts, 1995.
[11] C. S. Hara and S. B. Davidson. Reasoning about nested functional dependencies. In: Proc. of ACM Symp. on Principles of Database Systems (PODS), Philadelphia: ACM Press, 1999, pp. 91-100.
[12] W. Y. Mok, Y. K. Ng, and D. W. Embley. A normal form for precisely characterizing redundancy in nested relations. ACM Trans. Database Syst., 1996, 21(1): 77-106.
[13] M. Vincent, J. Liu, and C. Liu. Strong functional dependencies and their application to normal forms in XML. ACM Transactions on Database Systems, 2004, 29(3): 445-462.
[14] Mong Li Lee, Tok Wang Ling, and Wai Lup Low. Designing Functional Dependencies for XML. In: VIII Conference on Extending Database Technology (EDBT'02), Springer, 2002, pp. 124-141.
[15] Millist Vincent and Jixue Liu. Functional dependencies for XML. In: Proc. of 5th Asian-Pacific Web Conference (APWeb 2003), Lecture Notes in Computer Science, Vol. 2642, Springer, 2003, pp. 22-34.
[16] Jixue Liu, Millist Vincent, and Chengfei Liu. Local XML functional dependencies. Proc. of WIDM'03, pp. 23-28.
[17] J. Liu, M. Vincent, and C. Liu. Functional dependencies from relational to XML. Ershov Memorial Conference 2003, pp. 531-538.
[18] Ping Yan and Teng Lv. Functional Dependencies in XML Documents. Proc. of APWeb 2006 workshop. Lecture Notes in Computer Science (LNCS), Springer, 2006, Vol. 3842, pp. 29-37.
[19] Sven Hartmann and Sebastian Link. More functional dependencies for XML. In: Proc. of ADBIS 2003, LNCS 2798. Germany: Springer, 2003, pp. 355-369.
[20] Teng Lv and Ping Yan. XML Constraint-tree-based functional dependencies. Proc. of 2006 IEEE International Conference on e-Business Engineering (ICEBE2006), IEEE Computer Society Press, 2006, pp. 224-228.
[21] Peter Buneman, Susan Davidson, Wenfei Fan, et al. Keys for XML. Computer Networks, 2002, 39(5): 473-487.
[22] Peter Buneman, Wenfei Fan, J. Simeon, et al. Constraints for semistructured data and XML. ACM SIGMOD Record, 2001, 30(1): 47-54.
[23] Peter Buneman, Susan Davidson, Wenfei Fan, et al. Reasoning about keys for XML. Lecture Notes in Computer Science (LNCS), Springer, 2001, Volume 2397: pp. 133-148.