Journal of Computational Information Systems3:5(2007) 1-6
Available at http://www.JofCI.org
XML Functional Dependencies based on Constraint-tree and Their
Relationship with XML Keys
Teng LV 1,2, Ping YAN 1,†
1 College of Mathematics and System Science, Xinjiang University, Urumqi 830046, China
2 Teaching and Research Section of Computer, Artillery Academy of PLA, Hefei 230031, China
Abstract
A concept of functional dependency in XML documents based on constraint-tree is proposed. The definition proposed in
this paper captures the tree-structured characteristics of XML documents and considers a more general situation
of XML functional dependencies based on sub-trees with some constraint conditions in XML documents, which overcomes
the shortcomings of related definitions of XML functional dependencies. The relationship between XML functional
dependencies and XML keys is also discussed.
Keywords: XML; Functional Dependency; Key
1. Introduction
XML (eXtensible Markup Language) [1] has become the de facto standard for data exchange on the World
Wide Web and is widely used in many fields. Although XML has many advantages, such as being flexible,
extensible, and self-explanatory, it is hard for XML to express semantic information, as XML provides few
mechanisms for semantics. It is therefore necessary to study this problem in the XML research field. One topic of
XML semantics is functional dependency, which is fundamental to other XML research fields, such as
normalizing XML documents [2, 3], querying XML documents [4], and mapping between XML documents
and other data forms [5-7]. Several schema languages for XML documents have been proposed, such as XML Schema [8]
and DTD (Document Type Definition) [9]. DTDs are widely used in many XML documents and supported
by many applications and product providers. Although the theory of functional dependencies in the relational
database world has matured, there is no such mature and systematic theory for the XML world, because XML is
new compared with relational databases and there are many structural differences between relational schemas and
XML schemas.
Related work. The theory of functional dependencies [10-12] for relational databases cannot be
directly applied to XML documents, as there are significant differences in their structures: relational models
are flat while XML schemas are nested. In the XML research community, there are two major approaches
to defining XML functional dependencies. The first approach is based on paths in XML documents, such as
Refs. [13-18]. Unfortunately, these do not deal with the tree-structured situation proposed in this paper.
The second approach is based on sub-graphs or sub-trees in XML documents, such as Ref. [19], but it does
not deal with the tree-structured situation with constraint conditions proposed in this paper. Ref. [20]
deals with XML functional dependencies with constraint conditions, but without specifying what kind of
constraints are allowed. For XML keys, Refs. [21-23] propose the concept of XML keys.
† Corresponding author.
Email addresses: [email protected] (T. Lv), [email protected] (P. Yan).
1553-9105/ Copyright © 2007 Binary Information Press
June, 2007
Teng Lv et al. /Journal of Computational Information Systems 3:3 (2007) 1-6
In this paper, we give the definition of XML functional dependencies based on constraint-tree. The
definition proposed in this paper overcomes the shortcomings of previous definitions in the following
aspects: (1) it captures the tree-structured characteristics of XML documents; (2) it considers a
more general situation of XML functional dependencies based on sub-trees with some constraint conditions
in XML documents. More discussion can be found in Sub-section 2.2.
The rest of the paper is organized as follows. Section 2 gives a motivating example and some notations as
preliminaries, and then gives the definition of XML functional dependencies, their XML expression, and
their relationship with XML keys. Section 3 concludes the paper and points out directions for future work.
2. XML constraint-tree-based functional dependency (xCTFD)
2.1. A motivating example
Example 1. Consider the following DTD D1, which describes information about some courses, including
the name of a course, the pairs (each a male and a female student) taking the course, and an element community which
indicates which course community the course is in. We suppose that two courses are in the same course
community if the two courses are taken by the same pair, i.e., the two courses are similar in that
they share some students. Moreover, all courses with this similarity form a course community.
<!ELEMENT courses (course*)>
<!ELEMENT course (pair*,community)>
<!ATTLIST course name CDATA #REQUIRED>
<!ELEMENT pair (he,she)>
<!ELEMENT he (#PCDATA)>
<!ELEMENT she (#PCDATA)>
<!ELEMENT community (#PCDATA)>
We illustrate the structure of DTD D1 in Fig. 1, which just shows the necessary information of DTD for
clarity.
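[Figure: tree of DTD D1. Root courses has child course (*); course has attribute name and children pair (*) and
community; pair has children he and she.]
Fig. 1 A tree-structured DTD D1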
Example 2. Fig. 2 is an XML tree T1 conforming to DTD D1, which says that there are 4 courses (“c1”,
“c2”, “c3” and “c4”).
[Figure: XML tree T1. Root courses has four course children named c1, c2, c3 and c4; each course has two pair
children (each with he and she leaves) and one community child. Leaf (he, she) values in extraction order:
(Tom, Jane), (Jim, Mary); (Tom, Jane), (Jim, Tina); (Tom, Tina), (Jim, Mary); (Tom, Mary), (Jim, Jane).
Community values in extraction order: com1, com1, com1, com2.]
Fig. 2 An XML tree T1 conforming to DTD D1
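For readers who want to experiment with Example 2, the tree T1 can be written down as an XML document. The following Python sketch shows one possible serialization and a small helper that reads it back. The assignment of the (he, she) values and community values to courses c1 to c4 follows the left-to-right order of the leaves in Fig. 2 and should be treated as an assumption; the names T1_XML and course_summary are hypothetical.

```python
import xml.etree.ElementTree as ET

# One possible serialization of XML tree T1 conforming to DTD D1.
# Assumption: leaf values belong to courses in the order they appear in Fig. 2.
T1_XML = """
<courses>
  <course name="c1">
    <pair><he>Tom</he><she>Jane</she></pair>
    <pair><he>Jim</he><she>Mary</she></pair>
    <community>com1</community>
  </course>
  <course name="c2">
    <pair><he>Tom</he><she>Jane</she></pair>
    <pair><he>Jim</he><she>Tina</she></pair>
    <community>com1</community>
  </course>
  <course name="c3">
    <pair><he>Tom</he><she>Tina</she></pair>
    <pair><he>Jim</he><she>Mary</she></pair>
    <community>com1</community>
  </course>
  <course name="c4">
    <pair><he>Tom</he><she>Mary</she></pair>
    <pair><he>Jim</he><she>Jane</she></pair>
    <community>com2</community>
  </course>
</courses>
"""


def course_summary(xml_text):
    """Return {course name: (set of (he, she) pairs, community value)}."""
    root = ET.fromstring(xml_text.strip())
    summary = {}
    for course in root.findall("course"):
        pairs = {(p.findtext("he"), p.findtext("she")) for p in course.findall("pair")}
        summary[course.get("name")] = (pairs, course.findtext("community"))
    return summary


if __name__ == "__main__":
    for name, (pairs, community) in course_summary(T1_XML).items():
        print(name, sorted(pairs), community)
```

Note that ElementTree only checks well-formedness; it does not validate the document against DTD D1.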
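A path p in D = (E, A, P, R, r) is defined to be $p = \omega_1.\cdots.\omega_n$, where (1) $\omega_1 = r$;
(2) $\omega_i \in P(\omega_{i-1})$, $i \in [2, \ldots, n-1]$;
(3) $\omega_n \in P(\omega_{n-1})$ if $\omega_n \in E$ and $P(\omega_n) \neq \Phi$, or $\omega_n = S$ if $\omega_n \in E$ and
$P(\omega_n) = \Phi$, or $\omega_n \in R(\omega_{n-1})$ if $\omega_n \in A$. Let
$paths(D) = \{p \mid p \text{ is a path in } D\}$. Similarly, we can define a path in a part of a DTD.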
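To make the path definition concrete, the following Python sketch enumerates paths for DTD D1. The components of D = (E, A, P, R, r) are not spelled out in this section, so the sketch assumes a common reading: E is the set of element types, A the set of attributes, P maps an element type to its child element types, R maps an element type to its attributes, r is the root, and S stands for the string (#PCDATA) type. It also assumes that every prefix of a path is itself a path. The function name paths and the dictionary encoding are hypothetical.

```python
# Hypothetical encoding of DTD D1 as D = (E, A, P, R, r).
E = {"courses", "course", "pair", "he", "she", "community"}
A = {"name"}
P = {
    "courses": ["course"],
    "course": ["pair", "community"],
    "pair": ["he", "she"],
    "he": [],
    "she": [],
    "community": [],
}
R = {"course": ["name"]}
r = "courses"


def paths(P, R, r):
    """Enumerate the paths of a non-recursive DTD as dot-separated strings."""
    result = []

    def walk(prefix, node):
        path = prefix + [node]
        result.append(".".join(path))
        # A path may end in an attribute of the last element.
        for attr in R.get(node, []):
            result.append(".".join(path + [attr]))
        children = P.get(node, [])
        # An element without child elements is followed by the string type S.
        if not children:
            result.append(".".join(path + ["S"]))
        for child in children:
            walk(path, child)

    walk([], r)
    return result


if __name__ == "__main__":
    for p in paths(P, R, r):
        print(p)
```

The recursion terminates only because D1 is non-recursive; a recursive DTD would need a depth bound or cycle detection.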
Definition 1. Given a DTD D, suppose v is a vertex of D. A v-subtree is a tree rooted at v in D.
Similarly, we can define a path in the v-subtree as the part of a path starting from vertex v. If a v-subtree
contains all vertices that can be reached from the root v through the paths starting from v in D, then it is called a
full v-subtree in D.
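As an illustration of Definition 1, the sketch below builds the full course-subtree of DTD D1 and checks whether a smaller tree is a course-subtree of it. It is a minimal sketch under assumptions: the DTD is encoded by the hypothetical dictionaries P (child element types) and R (attributes), trees are nested dictionaries, and the function names full_subtree and is_v_subtree are not from the paper.

```python
# Hypothetical encoding of DTD D1: child element types and attributes.
P = {
    "courses": ["course"],
    "course": ["pair", "community"],
    "pair": ["he", "she"],
    "he": [],
    "she": [],
    "community": [],
}
R = {"course": ["name"]}


def full_subtree(v, P, R):
    """Return the full v-subtree as a nested dict {vertex: {child: {...}}}."""
    children = {attr: {} for attr in R.get(v, [])}
    for child in P.get(v, []):
        children.update(full_subtree(child, P, R))
    return {v: children}


def is_v_subtree(candidate, full):
    """Check that candidate has the same root as full and only uses its edges."""
    (root, kids), = candidate.items()
    (full_root, full_kids), = full.items()
    if root != full_root:
        return False
    return all(
        k in full_kids and is_v_subtree({k: sub}, {k: full_kids[k]})
        for k, sub in kids.items()
    )


FULL_COURSE = full_subtree("course", P, R)
# A course-subtree with leaves he and she, and one with leaf community.
X = {"course": {"pair": {"he": {}, "she": {}}}}
Y = {"course": {"community": {}}}

if __name__ == "__main__":
    print(FULL_COURSE)
    print(is_v_subtree(X, FULL_COURSE), is_v_subtree(Y, FULL_COURSE))
```

Both X and Y are course-subtrees, but neither is full, since each omits vertices reachable from course.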
2.2. Definition of xCTFD
Definition 2. An XML constraint-tree-based functional dependency (xCTFD) has the form
$\{v : X|_C \rightarrow Y\}$, where v is a vertex of DTD D, X and Y are v-subtrees of D, and C is a constraint
condition on X. An XML tree T conforming to DTD D satisfies the xCTFD $\{v : X|_C \rightarrow Y\}$ if, for any two
pre-images $W_1$ and $W_2$ of a full v-subtree of D in T, the projections satisfy $W_1(Y) = W_2(Y)$ whenever
condition C is satisfied, where C is defined in the following form:
$\{\exists\, v'\text{-subtree}_1 = v'\text{-subtree}_2 \mid v' \text{ is a vertex of DTD } D,\ v'\text{-subtree}_1 \in W_1(X), \text{ and } v'\text{-subtree}_2 \in W_2(X)\}$.
Of course, more complicated conditions can be defined, but in this paper we only consider this specific type of
condition C in xCTFDs, which is the most common and useful constraint in real XML applications.
The concepts of pre-image and projection used in the above definition are the same as those in graph
theory, so we do not elaborate on them here.
Example 3. There is an xCTFD $\{\mathit{course} : X|_C \rightarrow Y\}$ in XML tree T1 (Fig. 2), where X (Fig. 3) is the
course-subtree with leaves “he” and “she” of DTD D1, Y (Fig. 4) is the course-subtree with leaf
“community” of DTD D1, and C is the condition that there exists an equal pair, i.e.,
$\{\exists\, \mathit{pair}\text{-subtree}_1 = \mathit{pair}\text{-subtree}_2 \mid \mathit{pair}\text{-subtree}_1 \in W_1(X) \text{ and } \mathit{pair}\text{-subtree}_2 \in W_2(X)\}$.
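[Figure: course-subtree X. Root course has child pair (*); pair has children he and she.]
Fig. 3 A course-subtree X of DTD D1
[Figure: course-subtree Y. Root course has child community.]
Fig. 4 A course-subtree Y of DTD D1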
The semantic meaning of the above xCTFD is that two courses belong to the same course community if there
exists a pair that is equal in the two courses. The intuitive meaning implied by this xCTFD is that if two courses are taken
by the same pair, then the two courses are similar in that they share some students. Moreover, all courses
with this similarity form a course community.
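The satisfaction test for this xCTFD can be sketched in a few lines of Python. The sketch makes several simplifying assumptions that are not part of the paper: each course is reduced to its projections W(X), modeled as a set of (he, she) tuples, and W(Y), modeled as the community string; the data dictionary T1 follows the leaf order of Fig. 2; and the names condition_c, violations and satisfies_xctfd are hypothetical.

```python
from itertools import combinations

# Assumed content of XML tree T1: course name -> (W(X), W(Y)),
# where W(X) is the set of (he, she) pairs and W(Y) is the community.
T1 = {
    "c1": ({("Tom", "Jane"), ("Jim", "Mary")}, "com1"),
    "c2": ({("Tom", "Jane"), ("Jim", "Tina")}, "com1"),
    "c3": ({("Tom", "Tina"), ("Jim", "Mary")}, "com1"),
    "c4": ({("Tom", "Mary"), ("Jim", "Jane")}, "com2"),
}


def condition_c(x1, x2):
    """Condition C: some pair-subtree of W1(X) equals some pair-subtree of W2(X)."""
    return bool(x1 & x2)


def violations(tree):
    """Return the course pairs that satisfy C but disagree on the community."""
    return [
        (n1, n2)
        for (n1, (x1, y1)), (n2, (x2, y2)) in combinations(tree.items(), 2)
        if condition_c(x1, x2) and y1 != y2
    ]


def satisfies_xctfd(tree):
    """True if W1(Y) = W2(Y) for every two courses that satisfy condition C."""
    return not violations(tree)


if __name__ == "__main__":
    print(satisfies_xctfd(T1))
    # Moving c2 to another community breaks the dependency,
    # because c1 and c2 share the pair (Tom, Jane).
    broken = dict(T1, c2=(T1["c2"][0], "com2"))
    print(violations(broken))
```

Under these assumptions the pairwise comparison is quadratic in the number of courses, which is adequate for illustrating the definition but not tuned for large documents.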
The above xCTFD cannot be expressed by earlier related XML functional dependencies. For example,
path-based XML FDs [2, 3, 13-18] can only express the above xCTFD in the following FD form:
$\{\mathit{courses.course.pair} \rightarrow \mathit{courses.course.community}\}$, which just says that a pair determines a community. Ref.
[19] can only express the above xCTFD in the form $\{\mathit{course} : X \rightarrow Y\}$ without condition C, which says that
the whole set of pairs determines a community. Ref. [20] does not specify the formal form of the constraint
condition C. So they do not capture the exact semantics of the xCTFD $\{\mathit{course} : X|_C \rightarrow Y\}$ defined here,
which says that a pair in a set of pairs determines a community.
2.3. XML form of xCTFD
An xCTFD $\{v : X|_C \rightarrow Y\}$ can be expressed in XML form. We give the DTD of xCTFD as
follows (xCTFD.dtd):
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT xmlFDs (xCTFD*)>
<!ELEMENT xCTFD (v?,v-subtreeX,v-subtreeY,conditionC?)>
<!ATTLIST xCTFD xCTFDid ID #REQUIRED>
<!ELEMENT v (vNode)>
<!ELEMENT vNode (#PCDATA)>
<!ELEMENT v-subtreeX (InternalNodes*,LeaveNodes+)>
<!ELEMENT v-subtreeY (InternalNodes*,LeaveNodes+)>
<!ELEMENT InternalNodes (#PCDATA)>
<!ELEMENT LeaveNodes (#PCDATA)>
<!ELEMENT conditionC (#PCDATA)>
Element type xmlFDs is the set of all xCTFDs. Element type xCTFD is a specific xCTFD, which has an
attribute xCTFDid (a unique ID) to identify the xCTFD, an element type v to indicate the vertex v, two element
types v-subtreeX and v-subtreeY to indicate the v-subtree X and the v-subtree Y, respectively (in which the elements
LeaveNodes are used to specify the leaf nodes of the two sub-trees), and an element type conditionC to
indicate the condition C.
Example 4. Each xCTFD can be expressed in XML form, so xCTFDs have the same uniform format as XML
documents. For example, the xCTFD $\{\mathit{course} : X|_C \rightarrow Y\}$ in XML tree T1 can be expressed as the
following XML file (xCTFD1.xml):
<?xml version="1.0" ?>
<!-- The DOCTYPE name must match the root element xmlFDs. -->
<!DOCTYPE xmlFDs SYSTEM "xCTFD.dtd">
<xmlFDs>
  <!-- An ID value must be an XML name, so it cannot start with a digit. -->
  <xCTFD xCTFDid="xCTFD100">
    <v>
      <vNode> course </vNode>
    </v>
    <v-subtreeX>
      <InternalNodes> pair </InternalNodes>
      <LeaveNodes> he </LeaveNodes>
      <LeaveNodes> she </LeaveNodes>
    </v-subtreeX>
    <v-subtreeY>
      <LeaveNodes> community </LeaveNodes>
    </v-subtreeY>
    <conditionC> there exists a pair that is equal in X </conditionC>
  </xCTFD>
</xmlFDs>
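An application that consumes such a file would first load the xCTFDs into memory. The following Python sketch shows one way to do this with the standard library. It is not part of the paper: the XCTFD dataclass, the function parse_xctfds and the ID value used in the embedded copy of the example are illustrative, and the DOCTYPE line is omitted because ElementTree does not validate against a DTD.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Embedded copy of the example xCTFD (illustrative ID value, no DOCTYPE).
XCTFD1_XML = """
<xmlFDs>
  <xCTFD xCTFDid="xCTFD100">
    <v><vNode> course </vNode></v>
    <v-subtreeX>
      <InternalNodes> pair </InternalNodes>
      <LeaveNodes> he </LeaveNodes>
      <LeaveNodes> she </LeaveNodes>
    </v-subtreeX>
    <v-subtreeY>
      <LeaveNodes> community </LeaveNodes>
    </v-subtreeY>
    <conditionC> there exists a pair that is equal in X </conditionC>
  </xCTFD>
</xmlFDs>
"""


@dataclass
class XCTFD:
    fd_id: str
    vertex: str
    x_internal: list
    x_leaves: list
    y_internal: list
    y_leaves: list
    condition: str


def _texts(parent, tag):
    """Collect the stripped text of all child elements with the given tag."""
    return [(e.text or "").strip() for e in parent.findall(tag)]


def parse_xctfds(xml_text):
    """Parse an xmlFDs document into a list of XCTFD records."""
    root = ET.fromstring(xml_text.strip())
    fds = []
    for fd in root.findall("xCTFD"):
        x, y = fd.find("v-subtreeX"), fd.find("v-subtreeY")
        fds.append(XCTFD(
            fd_id=fd.get("xCTFDid"),
            vertex=(fd.findtext("v/vNode") or "").strip(),  # v is optional
            x_internal=_texts(x, "InternalNodes"),
            x_leaves=_texts(x, "LeaveNodes"),
            y_internal=_texts(y, "InternalNodes"),
            y_leaves=_texts(y, "LeaveNodes"),
            condition=(fd.findtext("conditionC") or "").strip(),  # optional
        ))
    return fds


if __name__ == "__main__":
    for fd in parse_xctfds(XCTFD1_XML):
        print(fd)
```

Validating the file against xCTFD.dtd would require a validating parser, which the Python standard library does not provide.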
2.4. The relationship between xCTFD and XML keys
In this section, we discuss the relationship between xCTFDs and XML keys. We first give the definition
of an XML key based on tree structure:
Definition 3. An XML key has the form $\{v : Y(X)\}$, where v is a vertex of DTD D, and X and Y are
v-subtrees of D. An XML tree T conforming to DTD D satisfies the key $\{v : Y(X)\}$ if, for any two
pre-images $W_1$ and $W_2$ of a full v-subtree of D in T, the projections satisfy $W_1(Y) = W_2(Y)$ whenever
$W_1(X) = W_2(X)$. If v is the root of an XML document or is null, then the key $\{v : Y(X)\}$ is
simplified as $\{Y(X)\}$ and is called a global XML key, which means that the key is satisfied in the whole
XML tree; otherwise, it is called a local XML key, which means that the key is satisfied in a sub-tree rooted
at the vertex v.
From the definitions of XML key and xCTFD, it is easy to obtain the relationship between them,
as the following theorem shows:
Theorem 1. An XML tree satisfies an XML key $\{v : Y(X)\}$ iff it satisfies the xCTFD $\{v : X|_C \rightarrow Y\}$
where the condition C is null, i.e., the xCTFD $\{v : X \rightarrow Y\}$.
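Theorem 1 can be illustrated with a small Python sketch. It assumes, as the theorem requires, that a null condition C reduces the premise to plain equality of the X-projections. Each pre-image is again modeled by its projections (a set of (he, she) tuples for X and a community string for Y); the function names and the three sample trees are hypothetical and only serve the illustration.

```python
from itertools import combinations


def satisfies_fd(tree, condition):
    """W1(Y) = W2(Y) must hold whenever condition(W1(X), W2(X)) holds."""
    return all(
        y1 == y2
        for (x1, y1), (x2, y2) in combinations(tree.values(), 2)
        if condition(x1, x2)
    )


def null_c(x1, x2):
    """Null condition C: the premise is W1(X) = W2(X)."""
    return x1 == x2


def shares_pair(x1, x2):
    """Condition C of Example 3: the two projections share an equal pair."""
    return bool(x1 & x2)


def satisfies_key(tree):
    """Key {v : Y(X)}: equal X-projections imply equal Y-projections."""
    seen = {}
    for x, y in tree.values():
        if seen.setdefault(frozenset(x), y) != y:
            return False
    return True


# Hypothetical sample trees: course name -> (W(X), W(Y)).
T_OK = {"c1": ({("Tom", "Jane")}, "com1"), "c2": ({("Tom", "Jane")}, "com1")}
T_BAD = {"c1": ({("Tom", "Jane")}, "com1"), "c2": ({("Tom", "Jane")}, "com2")}
T_WEAK = {
    "c1": ({("Tom", "Jane"), ("Jim", "Mary")}, "com1"),
    "c2": ({("Tom", "Jane"), ("Jim", "Tina")}, "com2"),
}

if __name__ == "__main__":
    for tree in (T_OK, T_BAD, T_WEAK):
        print(satisfies_key(tree), satisfies_fd(tree, null_c),
              satisfies_fd(tree, shares_pair))
```

On all three trees the key test and the xCTFD test with null C agree, as Theorem 1 states. The tree T_WEAK additionally shows that the key is weaker than the xCTFD of Example 3: its two courses have different pair sets, so the key holds, yet they share the pair (Tom, Jane) and lie in different communities.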
3. Conclusions and future work
Functional dependencies are very important semantic information in XML documents and are
fundamental to other related XML research topics such as normalizing XML documents and query
optimization. This paper extends the theory of functional dependencies from the relational database world to the
XML world and proposes a formal definition of functional dependencies in XML documents based on
constraint-tree. The XML functional dependencies in our work can express more semantic
information and can be applied in more general situations in XML documents.
Future work includes finding a complete set of inference rules for XML functional
dependencies. Another interesting direction is to elaborate the condition in the proposed xCTFD definition.
Acknowledgement
This work is supported by the Natural Science Foundation of China (No.60563001), the Natural Science
Foundation of Anhui Province (Key Technologies of Data Integration based on XML), the College Science &
Research Plan Project of Xinjiang Uighur Autonomous Region (No.XJEDU2004S04), and the Science Research
Foundation for Young Teachers of Xinjiang University (No.QN040101). The authors are grateful to the
anonymous reviewers who made constructive comments.
References
[1] T. Bray, J. Paoli, et al. Extensible Markup Language (XML) third edition. http://www.w3.org/TR/REC-xml.
[2] Teng Lv, Ning Gu, and Ping Yan. Normal forms for XML documents. Information and Software Technology,
2004, 46(12): 839-846.
[3] Marcelo Arenas and Leonid Libkin. A Normal Form for XML Documents. Symposium on Principles of Database
Systems (PODS'02), Madison, Wisconsin, U.S.A. ACM Press, 2002, pp.85-96.
[4] Alin Deutsch and Val Tannen. Querying XML with Mixed and Redundant Storage. Technical Report
MS-CIS-02-01 (2002).
[5] Teng Lv and Ping Yan. Mapping DTDs to relational schemas with semantic constraints. Information and
Software Technology, 2006, 48(4): 245-252.
[6] D. Lee, M. Mani, and W. W. Chu. Schema conversion methods between XML and relational models. Knowledge
Transformation for the Semantic Web, Frontiers in Artificial Intelligence and Applications, Vol. 95, IOS Press,
2003, pp.1-17.
[7] S. Lu, Y. Sun, M. Atay, et al. A new inlining algorithm for mapping XML DTDs to relational schemas. ER
workshops 2003, Springer, Lecture Notes in Computer Science, Vol. 2814, 2003, pp.366-377.
[8] XML Schema Part 0: Primer Second Edition. W3C Recommendation,
http://www.w3.org/TR/2004/REC-xmlschema-0-20041028/.
[9] W3C XML Specification DTD. http://www.w3.org/XML/1998/06/xmlspec-report-19980910.htm, Jun, 1998.
[10] Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, Reading,
Massachusetts, 1995.
[11] C. S. Hara and S. B. Davidson. Reasoning about nested functional dependencies. In: Proc. of ACM Symp. on
Principles of Database Systems (PODS), Philadelphia: ACM Press, 1999, pp.91-100.
[12] W. Y. Mok, Y. K. Ng, and D. W. Embley. A normal form for precisely characterizing redundancy in nested
relations. ACM Trans. Database Syst., 1996, 21(1): 77-106.
[13] M. Vincent, J. Liu, and C. Liu. Strong functional dependencies and their application to normal forms in XML.
ACM Transactions on Database Systems, 2004, 29(3): 445-462.
[14] Mong Li Lee, Tok Wang Ling, and Wai Lup Low. Designing Functional Dependencies for XML. In: VIII
Conference on Extending Database Technology (EDBT'02), Springer, 2002, pp.124-141.
[15] Millist Vincent and Jixue Liu. Functional dependencies for XML. In: Proc. of 5th Asian-Pacific Web Conference
(APWeb 2003), Lecture Notes in Computer Science, Vol. 2642, Springer, 2003, pp.22-34.
[16] Jixue Liu, Millist Vincent, and Chengfei Liu. Local XML functional dependencies. Proc. of WIDM'03, pp.23-28.
[17] J. Liu, M. Vincent, and C. Liu. Functional dependencies from relational to XML. Ershov Memorial Conference
2003, pp.531-538.
[18] Ping Yan and Teng Lv. Functional Dependencies in XML Documents. Proc. of APWeb 2006 workshop. Lecture
Notes in Computer Science (LNCS), Springer, 2006, Vol. 3842, pp.29-37.
[19] Sven Hartmann and Sebastian Link. More functional dependencies for XML. In: Proc. of ADBIS 2003, LNCS
2798. Germany: Springer, 2003, pp.355-369.
[20] Teng Lv and Ping Yan. XML Constraint-tree-based functional dependencies. Proc. of 2006 IEEE International
Conference on e-Business Engineering (ICEBE2006), IEEE Computer Society Press, 2006, pp.224-228.
[21] Peter Buneman, Susan Davidson, Wenfei Fan, et al. Keys for XML. Computer Networks, 2002, 39(5): 473-487.
[22] Peter Buneman, Wenfei Fan, J. Simeon, et al. Constraints for semistructured data and XML. ACM SIGMOD
Record, 2001, 30(1): 47-54.
[23] Peter Buneman, Susan Davidson, Wenfei Fan, et al. Reasoning about keys for XML. Lecture Notes in Computer
Science (LNCS), Springer, 2001, Volume 2397: pp.133-148.