XML Constraints
Wenfei Fan
University of Edinburgh
and
Bell Laboratories
Outline of Part IV
XML Specifications: types and integrity constraints
Specification of XML constraints:
– keys, foreign keys, FDs
– absolute vs. relative constraints
Analysis of XML constraints
– Consistency analysis
– Implication analysis
Applications of XML constraints, and research issues
– Relational storage of XML data via constraint propagation
– Schema-directed XML integration
– Normal forms, query optimization, updates, data cleaning . . .
Introduction to XML specification
XML Specification:
– types
– integrity constraints
– the need for XML constraints
XML data - an example
Rooted, node-labeled tree
elements: db, province, capital, city, subtree/sub-document
elements/subelements, e.g., the capital child of province
@attributes: @name, @inProvince, carrying text
text nodes, with text but no label, e.g., “Hasselt”
[Figure: example XML tree. The root db has province and capital children; nodes shown include @name, city, capital and @inProvince, with text values “Limburg”, “Hasselt” and “others”.]
XML specification: DTD (type)
Production: constrains the subelement list of each element
<!ELEMENT db (province+, capital+)>
<!ELEMENT province (city*, capital)>
Attributes: uniquely identified by name for each element, unordered
province: @name, capital: @inProvince
XML specification: integrity constraints
Keys and foreign keys (vs. relational constraints):
key: the value of a @name uniquely identifies a province
province.@name → province
capital.@inProvince → capital
FK: @inProvince of a capital references @name of a province
capital.@inProvince ⊆ province.@name
XML specification
A type (DTD) D
A set Σ of integrity constraints
Example:
DTD D: structure of the document, vs. types in a PL
<!ELEMENT db (province+, capital+)>
<!ELEMENT province (city*, capital)>
province.@name, capital.@inProvince
Constraints Σ: defined in terms of data values across elements
province.@name → province
capital.@inProvince → capital
capital.@inProvince ⊆ province.@name
Why XML constraints?
Supported by W3C XML standard, XML Schema
In databases (supported by SQL standard), constraints are:
an essential part of the semantics of data,
fundamental to conceptual design,
useful for choosing efficient storage and access methods,
central to update anomaly prevention,
data cleaning …
In the XML setting: constraints have proved useful in
database storage of XML data (via constraint propagation),
schema-directed database publishing/integration in XML,
XML query optimization and formulation,
design theory for XML specifications: normal forms
data cleaning, …
Data exchange on the Web: XML publishing
[Figure: relational sources DB1 and DB2 are published on the Web through an XML view Q; the published XML is governed by a DTD and constraints.]
All members of a community (or industry) agree on a schema and
exchange data w.r.t. the schema: e-commerce, health-care, ...
Schema-Directed XML Publishing/Integration:
mapping data from traditional databases to XML
satisfying the predefined DTD and constraints
Data exchange on the Web: XML shredding
[Figure: XML data on the Web, with XML keys, is shredded into relational databases DB1 and DB2; the XML keys are propagated to relational FDs.]
XML shredding:
mapping XML data to relations
relational design: normalization via constraint propagation from
XML to relations
– optimal relational storage of XML data
– semantic connection: query/update optimization
XML constraints
Specification of XML constraints:
– keys, foreign keys, FDs
– absolute vs. relative constraints
The limitations of the XML standard (DTD)
<!ATTLIST country name ID #REQUIRED>
<!ATTLIST province capital ID #REQUIRED>
<!ATTLIST capital inProvince IDREF #REQUIRED>
Scoping:
– ID unique within the entire document (like oids), while a key
needs only to uniquely identify a tuple within a relation
– IDREF untyped: one has no control over what it points to. You point to something, but you don’t know what it is!
<student id="01" name="Saddam" taking="qsx"/>
<student id="02" name="Bush" taking="qsx 01"/>
<course id="qsx"/>
The limitations of the XML standard (DTD)
keys may need to be multi-attribute (composite), while IDs must be single-attribute (unary)
enroll (sid: string, cid: string, grade:string)
a relation may have multiple keys, while an element can have at
most one ID (primary)
ID/IDREF can only be defined in a DTD, while XML data may
not come with a DTD/schema
ID/IDREF, even relational keys/foreign keys, fail to capture the
semantics of hierarchical data – will be seen shortly
A mixture of relational keys and object identities (oids)
Mild extensions of relational constraints do not work for XML!
Absolute constraints
Absolute keys and foreign keys are to hold on the entire document.
province.@name → province
capital.@inProvince → capital
capital.@inProvince ⊆ province.@name
Extensions of relational counterparts
Absolute keys and foreign keys [PODS’00, 01, JACM]
key: τ[X] → τ.
An XML document satisfies the key iff
∀x, y ∈ ext(τ) (⋀ l ∈ X (x.l = y.l) → x = y)
foreign key (FK): a combination of an inclusion constraint τ1[X] ⊆ τ2[Y] and a key τ2[Y] → τ2.
A document satisfies the FK iff it satisfies the key and
∀x ∈ ext(τ1) ∃y ∈ ext(τ2) (x[X] = y[Y])
– τ, τ1, τ2: element types; X, Y: sets (lists) of attributes;
– ext(τ): the set of τ elements in an XML document.
Equality issue:
(string) value equality: when comparing attributes
node identity: when comparing XML elements
Unary keys and foreign keys: defined in terms of a single attribute.
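The key and foreign-key semantics above can be checked directly on a small document. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree; the helper names (ext, attr_values, satisfies_key, satisfies_foreign_key) and the sample document are made up for illustration, and every element is assumed to carry the attributes in question.

```python
import xml.etree.ElementTree as ET


def ext(root, tag):
    """ext(tag): every element labeled `tag` in the document, root included."""
    return list(root.iter(tag))


def attr_values(elem, attrs):
    """The tuple x[X] of attribute values of one element."""
    return tuple(elem.get(a) for a in attrs)


def satisfies_key(root, tag, attrs):
    """tag[attrs] -> tag: no two distinct elements agree on all of attrs."""
    seen = set()
    for elem in ext(root, tag):
        value = attr_values(elem, attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def satisfies_foreign_key(root, tag1, attrs1, tag2, attrs2):
    """tag1[attrs1] is included in tag2[attrs2], and tag2[attrs2] is a key."""
    if not satisfies_key(root, tag2, attrs2):
        return False
    targets = {attr_values(e, attrs2) for e in ext(root, tag2)}
    return all(attr_values(e, attrs1) in targets for e in ext(root, tag1))


# Hypothetical document: the second capital references a missing province.
doc = ET.fromstring(
    '<db>'
    '<province name="Limburg"/>'
    '<province name="Antwerp"/>'
    '<capital inProvince="Limburg"/>'
    '<capital inProvince="Namur"/>'
    '</db>'
)
key_ok = satisfies_key(doc, "province", ["name"])
fk_ok = satisfies_foreign_key(doc, "capital", ["inProvince"], "province", ["name"])
print(key_ok, fk_ok)  # True False
```

Attribute tuples are compared by string value, while elements are distinguished by node identity, which mirrors the equality distinction made on this slide.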
Relative constraints [WWW’01, PODS’02,SICOMP]
An XML tree specifies countries, provinces, province capitals.
What is a key for a province?
What does @inProvince of a capital reference?
[Figure: XML tree rooted at db with country children (@name “Belgium”, “Holland”). Each country has province children (@name “Limburg”) and capital children (“Hasselt”, “Maastricht”) carrying @inProvince (“Limburg”).]
Examples of relative constraints
Relative constraints: on a subdocument rooted at a country:
key:
country (province.@name → province)
country (capital.@inProvince → capital)
FK:
country (capital.@inProvince ⊆ province.@name)
Absolute: on the entire document: country.@name → country
Relative keys and foreign keys
key: τ(τ1[X] → τ1). A document satisfies the key iff
∀c ∈ ext(τ) ∀y, z ∈ ext(τ1)
((y ≺ c) ∧ (z ≺ c) ∧ ⋀ l ∈ X (y.l = z.l) → y = z)
foreign key (FK): τ(τ1[X] ⊆ τ2[Y]) and a key τ(τ2[Y] → τ2).
A document satisfies the FK iff it satisfies the key and
∀c ∈ ext(τ) ∀y ∈ ext(τ1) ((y ≺ c) →
∃z ∈ ext(τ2) ((z ≺ c) ∧ y[X] = z[Y]))
where
(y ≺ c): y is a descendant of c (y in the subtree rooted at c);
τ: context type;
ext(τ): the set of τ elements in an XML document.
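To see how the context type changes the outcome, here is a small sketch of a relative key check, again assuming xml.etree.ElementTree. The function name satisfies_relative_key and the two-country document are illustrative assumptions, not part of the original material.

```python
import xml.etree.ElementTree as ET


def satisfies_relative_key(root, context, tag, attrs):
    """context(tag[attrs] -> tag): the key must hold inside each context subtree."""
    for c in root.iter(context):
        seen = set()
        for elem in c.iter(tag):
            if elem is c:
                continue  # only proper descendants of the context node count
            value = tuple(elem.get(a) for a in attrs)
            if value in seen:
                return False
            seen.add(value)
    return True


# Hypothetical document: two countries, each with a province named Limburg.
doc = ET.fromstring(
    '<db>'
    '<country name="Belgium"><province name="Limburg"/></country>'
    '<country name="Holland"><province name="Limburg"/></country>'
    '</db>'
)
relative_ok = satisfies_relative_key(doc, "country", "province", ["name"])
absolute_ok = satisfies_relative_key(doc, "db", "province", ["name"])
print(relative_ok, absolute_ok)  # True False
```

Using the root type db as the context gives the absolute reading of the same key, which fails here because both provinces are named Limburg.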
Relative vs. Absolute
Absolute constraints are a special case of relative ones:
country.@name → country is the same as db (country.@name → country)
absolute: a fixed context type -- the root type r
Absolute constraints are scoped within the entire document;
whereas relative ones within the context of a subdocument.
country (province.@name → province)
country (capital.@inProvince → capital)
country (capital.@inProvince ⊆ province.@name)
country.@name → country
Together they specify constraints on the entire document
Beyond relational constraints; important for hierarchically
structured data: XML, scientific databases, biomedical data, ...
Define keys with path expressions
XML data is hierarchically structured!
“name” as a key for employees of companies only: target set is
identified with a path expression: //company//employee
XML data is semistructured: it may not have a DTD/schema!
– key paths may be missing or have multiple occurrences
key specification should be independent of types
[Figure: XML tree rooted at db with company, government and university subtrees. Employee elements occur at different depths (for example under dept) and carry name or @id; a name may have firstName and lastName children.]
Path expressions
Path expression: navigating XML trees
A simple yet powerful path language:
q ::= ε | l | q/q | //
ε: empty path
l: tag
q/q: concatenation
//: descendants and self – recursively descending downward
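The four productions above are enough to write a tiny evaluator. This is a rough sketch assuming xml.etree.ElementTree, covering element tags only (attribute steps such as @id are not handled); the function name evaluate and the sample document are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET


def evaluate(node, path):
    """Return the elements reached from `node` by following `path`."""
    current = [node]
    # Tokens are either the descendant step "//" or a tag; a single "/" only
    # separates steps and is skipped by the tokenizer.
    for step in re.findall(r"//|[^/]+", path):
        if step == "//":
            reached = [d for n in current for d in n.iter()]  # descendants and self
        else:
            reached = [child for n in current for child in n if child.tag == step]
        # Drop duplicates by node identity while keeping document order.
        seen, current = set(), []
        for elem in reached:
            if id(elem) not in seen:
                seen.add(id(elem))
                current.append(elem)
    return current


# Hypothetical document in the spirit of the company/employee example.
doc = ET.fromstring(
    "<db>"
    "<company><dept><employee/></dept><employee/></company>"
    "<government><employee/></government>"
    "</db>"
)
company_employees = evaluate(doc, "//company//employee")
all_employees = evaluate(doc, "//employee")
print(len(company_employees), len(all_employees))  # 2 3
```

The empty path returns the starting node itself, which matches the ε production.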
Absolute path constraints [WWW’01]
Absolute key: (Q, {P1, . . ., Pk})
Path expressions Q, Pi: XPath, regular path expressions, …
target path Q: to identify a target set [[Q]] of nodes on which the
key is defined (vs. relation)
a set of key paths {P1, . . ., Pk}: to provide an identification for
nodes in [[Q]] (vs. key attributes)
semantics: for any two nodes in [[Q]], if they have all the key
paths and agree on them by value equality (existential), then
they must be the same node (value equality and node identity)
Examples:
(//company//employees, {name, phone}) -- composite key
(//company//employees, {//@id}) -- multiple keys
(//., {@id}) -- capturing ID attributes in DTDs
Value equality on trees
Two nodes are value equal iff
either they are text nodes (PCDATA) with the same value;
or they are attributes with the same tag and the same value;
or they are elements having the same tag and their children are
pairwise value equal
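The recursive definition above translates almost line by line into code. The sketch below assumes xml.etree.ElementTree, where text is stored on the element rather than as a separate node, so it compares tag, attributes, stripped text and children in order, and it ignores tail text in mixed content. The function name value_equal and the sample data are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def value_equal(a, b):
    """Tree value equality: same tag, same attributes, same text, equal children."""
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if (a.text or "").strip() != (b.text or "").strip():
        return False
    if len(a) != len(b):
        return False
    # Children are compared pairwise, in document order.
    return all(value_equal(x, y) for x, y in zip(a, b))


# Hypothetical document with two value-equal names and one different name.
doc = ET.fromstring(
    "<db>"
    "<person><name><firstName>George</firstName><lastName>Bush</lastName></name></person>"
    "<person><name><firstName>George</firstName><lastName>Bush</lastName></name></person>"
    "<person><name>JohnDoe</name></person>"
    "</db>"
)
n1, n2, n3 = doc.iter("name")
print(value_equal(n1, n2), n1 is n2, value_equal(n1, n3))  # True False False
```

The first two names are value equal but are still distinct nodes, which is exactly the gap between value equality and node identity that key semantics relies on.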
[Figure, “E.g.: two value-equal names”: db tree with person elements carrying @phone (“123-4567”, “234-5678”) and name children; two of the names have firstName “George” and lastName “Bush”.]
Capturing the semistructured nature
independent of types
no structural requirement: tolerating missing/multiple paths
(person, {name})
(person, {name, @phone})
[Figure: db tree with four person elements; name and @phone (“123-4567”, “234-5678”) are missing on some persons and repeated on others. Names shown are the text “JohnDoe” and firstName/lastName pairs “George”, “Bush”.]
Relative path constraints [WWW’01]
Relative key: (Q, K)
path Q, called the context path, identifies a set [[Q]] of nodes;
K = (Q’, {P1, . . ., Pk}) is a key on sub-documents rooted at nodes in [[Q]] (relative to Q).
Example.
(//country, (province, {@capital}))
(//country, {@name}) -- absolute key
Absolute keys are a special case of relative keys:
(Q, K) when Q is the empty path
Similarly for foreign keys
Specifying XML constraints is more involved than specifying their relational counterparts
Keys and foreign keys in XML Schema
key: (Q,
{P1, . . ., Pk} )
Path expressions Q, Pi: fragments of XPath
Uniqueness and existence: for each node x in [[Q]] and each i in [1, k], there exists a unique node yi reached via Pi, and yi is either a text node or an attribute
Foreign keys: (Q, {P1, . . ., Pk}) ⊆ (S, {S1, . . ., Sk}), where (S, {S1, . . ., Sk}) is a key
Uniqueness and existence: both Pi and Si
The uniqueness and existence condition complicates the
consistency and implication analyses
Absolute constraint
Other constraints for XML
Functional dependencies: {P1, . . ., Pk} → {S1, . . ., Sk}
Generalizations of relational FDs – for deriving an extension of
relational-schema normal forms
Absolute constraints [Arenas and Libkin, PODS’02]
XICs: ∀x1 … xn (B(x1, …, xn) → ∨ i ∈ [1, l] ∃y1 … yk Ci(x1, …, xn, y1, …, yk))
Generalization of relational embedded constraints
B, Ci: conjunction of simple XPath expressions
Subsuming relative keys and foreign keys (Deutsch and Tannen,
[KRDB’01])
Constraint analysis
Analysis of XML constraints
– Consistency analysis
– Implication analysis
– Absolute, relative, path-expression constraints
Consistency of XML specifications
Given D: a DTD
Σ: a set of integrity constraints over D
Consistency: Is there an XML document that both conforms to D and satisfies Σ?
One wants to know whether XML specifications make sense!
Run-time check: attempts to validate documents with (D, Σ).
This would not tell us whether repeated failures are due to a bad
specification or problems with the documents
static analysis is desirable
An inconsistent specification
The specification with D and Σ is inconsistent!
DTD D:
<!ELEMENT db (province+, capital+)>
<!ELEMENT province (city*, capital)>
province.@name, capital.@inProvince
Constraints Σ:
province.@name → province
capital.@inProvince → capital
capital.@inProvince ⊆ province.@name
In contrast, one can specify keys and foreign keys in SQL without
worrying about their consistency with schema.
Cardinality constraints by keys, foreign keys
Constraints Σ:
province.@name → province
capital.@inProvince → capital
capital.@inProvince ⊆ province.@name
Notation:
ext(τ): the set of τ elements in an XML document
ext(τ.l): the set of l attribute values of all τ elements
|ext(province.@name)| = |ext(province)|
|ext(capital.@inProvince)| = |ext(capital)|
|ext(capital.@inProvince)| ≤ |ext(province.@name)|
|ext(capital)| ≤ |ext(province)|
Cardinality constraints imposed by DTDs
DTD D: <!ELEMENT db (province+, capital+)>
<!ELEMENT province (city*, capital)>
Variables:
Xprovince: the number of province elements under the root
Xcapital: the number of capital subelements of the root
Ycapital: the number of capital subelements of province elements
Xprovince ≥ 1, Xcapital ≥ 1
|ext(province)| = Xprovince, Xprovince = Ycapital
|ext(capital)| = Xcapital + Ycapital
|ext(capital)| > |ext(province)|
The interaction
Contradiction:
From the constraints Σ: |ext(capital)| ≤ |ext(province)|
From the DTD D: |ext(capital)| > |ext(province)|
Thus there exists NO XML document that both conforms to D and satisfies Σ.
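The clash between the two families of cardinality constraints can be replayed numerically. The sketch below is only a bounded search over the variables Xprovince and Xcapital, so it illustrates the contradiction rather than proving it; the function name find_witness, the search limit and the min_root_capitals parameter are assumptions introduced here.

```python
def find_witness(limit=50, min_root_capitals=1):
    """Search small counts satisfying both the DTD and the constraint cardinalities."""
    for x_province in range(1, limit + 1):                   # db has province+
        for x_capital in range(min_root_capitals, limit + 1):
            y_capital = x_province                           # one capital per province
            n_province = x_province
            n_capital = x_capital + y_capital
            if n_capital <= n_province:                      # required by the keys and FK
                return (x_province, x_capital)
    return None


# With capital+ under db, no counts work: n_capital >= n_province + 1.
# If db needed no capital child (hypothetical variant), (1, 0) would do.
print(find_witness(), find_witness(min_root_capitals=0))  # None (1, 0)
```

The second call is a made-up variant in which the root needs no capital child; it shows that the inconsistency comes from the interaction of capital+ with the key and foreign key, not from either part alone.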
Consistency analysis [PODS’01, 02, JACM, SICOMP]
Trivial for relational databases: given any schema and keys,
foreign keys, one can always find a nonempty instance of the
schema satisfying the constraints.
Hard for XML: XML specifications may not be consistent!
– Both DTDs and constraints impose cardinality constraints
– The interaction between these two classes of cardinality
constraints is rather complicated.
Consistency analysis of XML constraints
Theorem: The consistency problem is
undecidable for multi-attribute absolute keys and foreign keys;
NP-complete for unary absolute keys and foreign keys, even for
primary keys (primary: at most one key for each element type);
in NEXPTIME for primary multi-attribute absolute keys and
unary foreign keys
in 2NEXPTIME and PSPACE-hard for unary absolute regular
keys and foreign keys (target path: β.τ, where β is a regular path
expression and τ an element type; key paths: attributes)
undecidable for relative keys and foreign keys, even when all
the constraints are unary and primary.
As opposed to the trivial analysis of the relational counterpart.
Proof ideas
Multi-attribute constraints: reduction from the implication
problem for functional and inclusion dependencies in RDBs.
Unary keys and foreign keys:
– a nontrivial encoding of DTDs and unary constraints in terms
of linear integer constraints (O(n² log n)-time);
– polynomially equivalent to LIP, linear integer programming
Multi-attribute primary keys and unary foreign keys:
– polynomially equivalent to the Prequadratic Diophantine
Problem (PDE): satisfiability of linear integer constraints and
prequadratic constraints of the form x ≤ y · z;
– the precise complexity of PDE, a restriction of Hilbert’s
10th problem, is open -- nontrivial.
Proof idea for relative constraints
Theorem: The consistency problem is undecidable for relative keys
and foreign keys, even when all the constraints are unary and
are under the primary key restriction.
As opposed to the NP complexity of its absolute counterpart.
Proof idea: reduction from Hilbert’s 10th problem.
Diophantine equation problem:
P1 (x1, …, xk) = Q1 (x1, …, xk) + c1
...
Pn (x1, …, xk) = Qn (x1, …, xk) + cn
More on regular-expression constraints
XML data is hierarchically structured:
define @eid as a key of employees of companies and schools;
define @taughtBy as a foreign key of students referencing @eid
of school employees.
[Figure: db tree with university, government and company subtrees. Employee elements (@eid) and student elements (@taughtBy) occur at various depths, some under dept.]
Examples of regular constraints
Key:
(university._* + company._*).employee.@eid → (university._* + company._*).employee
FK:
_*.student.@taughtBy ⊆ university._*.employee.@eid
_: wildcard that matches any label
_*: the Kleene closure of _
Regular path expression
Vertical regular expressions:
β ::= ε | τ | _ | β.β | β + β | β*
ε: empty word;
τ: element type;
_: wildcard;
“., +, *”: concatenation, disjunction, Kleene star
Example: (university._* + company._*).employee
university._*.employee
nodes(β.τ): the set of τ elements in an XML document that are
reachable from the root by following β.τ
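One way to experiment with nodes(β.τ) is to translate the vertical regular expression into an ordinary regular expression over root-to-node label paths. The sketch below assumes Python's re module and xml.etree.ElementTree, writes each label followed by "/", and leaves the root label out of the path (as in the example expressions, which start below db). The names to_regex and nodes, and the sample document, are hypothetical; the ε symbol itself is not supported.

```python
import re
import xml.etree.ElementTree as ET


def to_regex(beta):
    """Translate a vertical regular expression into a Python regex."""
    parts = []
    for tok in re.findall(r"[A-Za-z][\w-]*|[_.+*()]", beta):
        if tok == "_":
            parts.append(r"(?:[^/]+/)")      # wildcard: any single label
        elif tok == ".":
            continue                         # concatenation is juxtaposition
        elif tok == "+":
            parts.append("|")                # disjunction
        elif tok == "*":
            parts.append("*")                # Kleene star on the previous group
        elif tok == "(":
            parts.append("(?:")
        elif tok == ")":
            parts.append(")")
        else:
            parts.append("(?:" + re.escape(tok) + "/)")  # element type
    return re.compile("".join(parts))


def nodes(root, beta):
    """Elements whose label path from the root (root excluded) matches beta."""
    pattern = to_regex(beta)
    result = []

    def walk(elem, path):
        if pattern.fullmatch(path):
            result.append(elem)
        for child in elem:
            walk(child, path + child.tag + "/")

    walk(root, "")
    return result


# Hypothetical document; the eid values are made up.
doc = ET.fromstring(
    "<db>"
    "<university><dept><employee eid='e1'/><student taughtBy='e1'/></dept></university>"
    "<company><employee eid='e2'/></company>"
    "<government><employee eid='e3'/></government>"
    "</db>"
)
staff = nodes(doc, "(university._* + company._*).employee")
everyone = nodes(doc, "_*.employee")
print([e.get("eid") for e in staff])  # ['e1', 'e2']
```

With the prefix _* the function returns every employee in the document, so it behaves like ext(employee) on this example.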
Regular expression constraints
key: β.τ[X] → β.τ. A document satisfies the key iff
∀x, y ∈ nodes(β.τ) (⋀ l ∈ X (x.l = y.l) → x = y)
foreign key: β1.τ1[X] ⊆ β2.τ2[Y], and a key β2.τ2[Y] → β2.τ2
A document satisfies the FK iff it satisfies the key and
∀x ∈ nodes(β1.τ1) ∃y ∈ nodes(β2.τ2) (x[X] = y[Y])
where nodes(β.τ): the set of τ elements reachable from the root by
following β.τ
Regular: an extension of absolute constraints
Example:
Key:
(university._* + company._*).employee.@eid
(university._* + company._*).employee
FK:
_*.student.@taughtBy university._*.employee.@eid
Observation:
nodes(_*.τ) = ext(τ)
Recall absolute constraints:
key: τ[X] → τ is expressed as
_*.τ[X] → _*.τ
foreign key: τ1[X] ⊆ τ2[Y], τ2[Y] → τ2 is expressed as
_*.τ1[X] ⊆ _*.τ2[Y],
_*.τ2[Y] → _*.τ2
Consistency analysis of regular constraints
Corollary: The consistency problem is undecidable for multi-attribute regular keys and foreign keys.
Theorem: It is decidable in 2NEXPTIME and is PSPACE-hard for
unary regular constraints.
2NEXPTIME: an involved encoding in terms of LIP
regular expressions in a DTD interact with (vertical) regular path
expressions: reduce DTD to a simple normal form
regular path expressions interact with each other: introduce
exponentially many variables for all boolean combinations
encoding “reachability” (nodes(β.τ)) of a path expression: tag
variables with states of finite state automata
Some tractable cases
Restrictions on constraints.
Theorem: For multi-attribute relative keys only, the consistency
problem is in linear time for arbitrary DTDs.
Recall relative keys: country (province.@name → province)
In contrast, due to the existence and uniqueness condition:
Theorem: It is intractable for unary keys alone in XML Schema.
Restrictions on DTDs:
Theorem: When the DTD is fixed, the consistency problem is in PTIME
for absolute unary keys and foreign keys.
In practice, the DTD is designed once, but constraints are written
in stages: constraints are added incrementally.
Implication analysis [PODS’00, 01, 02, DBPL’01]
Given D: a DTD
Σ: a set of constraints expressed in C
φ: a property (a constraint of C)
Implication (C): Is it the case that for any XML document, if it
conforms to D and satisfies Σ, then it must satisfy φ?
C: a constraint language
The need for studying implication:
data integration: constraint checking at virtual views
optimization of XML queries and XML relational storage
design theory for XML specifications: normalization
Some complexity results for implication analysis
Theorem: The implication problem is
undecidable for multi-attribute absolute keys and foreign keys,
and for unary relative keys and foreign keys;
PSPACE-hard for unary regular absolute keys and foreign keys;
coNP-complete for unary absolute keys and foreign keys.
coNP-hard for XML-Schema unary keys
in linear time for absolute multi-attribute keys;
in PTIME for arbitrary absolute keys and foreign keys when the
DTD is fixed, and
in PTIME for relative path keys in the absence of DTDs
The analysis of XML constraints is far more intricate than its
relational counterpart
Applications
Application of XML constraints, and open problems
– Constraint propagation
– Schema-directed XML integration
– Normal form
– Query rewriting/optimization
– Update processing
– Data cleaning
– ...
XML shredding: relational storage of XML data
[Figure: XML data on the Web, with XML keys, is shredded into relational databases DB1 and DB2; the XML keys are propagated to relational FDs.]
XML shredding:
mapping XML data to relations
relational design: normalization
– optimal relational storage of XML data
– semantic connection: query/update optimization
Example: XML constraints
(//book, {isbn}) -- isbn is an (absolute) key of book
(//book, (chapter, {number})) -- number is a key of chapter relative to book
(//book, (title, { })) -- each book has a unique title
[Figure: db tree with book elements. Labels shown include isbn, title, chapter, number, section and text; values shown include “XML”, “DTD”, “XPath”, “1”, “6” and “10”.]
Mapping from XML to a predefined relation
Predefined RDB: chapter(bookTitle, chapterNum, chapterTitle)
Mapping: for each book, extract its title, and the numbers and
titles of all its chapters
Predefined relational key: (bookTitle, chapterNum)
Can the XML data be mapped to the RDB without violating the key?
A safe mapping
Now change the relational schema to
RDB: chapter(isbn, chapterNum, chapterTitle)
The relation can be populated without any violation. Why?
The relational key (isbn, chapterNum) for chapter is implied
(entailed) by the keys on the original XML data:
(//book, {isbn}),
(//book, (chapter, {number})),
(//book, (title, { }))
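The difference between the two target schemas can be tried out on a toy document. The sketch below assumes xml.etree.ElementTree and treats isbn, title and number as child elements with text; the helpers shred and key_holds, the isbn values and the chapter titles are all made up for illustration. The document satisfies the three XML keys, yet only the isbn-based relation respects its relational key.

```python
import xml.etree.ElementTree as ET


def shred(root, book_field):
    """Rows (book_field value, chapterNum, chapterTitle), one per chapter."""
    rows = []
    for book in root.iter("book"):
        book_value = book.findtext(book_field)   # direct child of book
        for chapter in book.findall("chapter"):
            rows.append(
                (book_value, chapter.findtext("number"), chapter.findtext("title"))
            )
    return rows


def key_holds(rows, key_positions):
    """True if no two distinct rows agree on all key positions."""
    seen = set()
    for row in set(rows):                        # a relation is a set of tuples
        key = tuple(row[i] for i in key_positions)
        if key in seen:
            return False
        seen.add(key)
    return True


# Hypothetical data: two different books that share the title "XML".
doc = ET.fromstring(
    "<db>"
    "<book><isbn>111</isbn><title>XML</title>"
    "<chapter><number>1</number><title>DTD</title></chapter>"
    "<chapter><number>6</number><title>XPath</title></chapter></book>"
    "<book><isbn>222</isbn><title>XML</title>"
    "<chapter><number>1</number><title>XPath</title></chapter></book>"
    "</db>"
)
by_title_ok = key_holds(shred(doc, "title"), (0, 1))
by_isbn_ok = key_holds(shred(doc, "isbn"), (0, 1))
print(by_title_ok, by_isbn_ok)  # False True
```

A single passing document proves nothing in general, of course; the point of propagation analysis is to establish the key for every document that satisfies the XML keys.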
Constraint Propagation [ICDE’03, JCSS]
Input:
– a set K of XML keys (context and target path: a fragment of
XPath, key paths: attributes)
– a predefined relational schema S,
– a mapping f from XML to S (XPath, projection, join, union)
– and a relational functional dependency FD over S
Output:
is the FD propagated from K via f? I.e., does FD hold
over the DB f(T) for any XML document T that satisfies K?
Theorem: The constraint propagation problem is in PTIME.
Checking the consistency of a predefined relational schema for
storing XML data
XML schema/DTD is not required – K is the only semantics
Deriving relational schema for storing XML
One wants to find a “good” relational schema to store:
chapter(isbn, bookTitle, author, chapterNum, chapterTitle)
What is a good schema? In normal form: BCNF, 3NF, …
Prevent update anomaly (the relational theory)
Efficient storage, query optimization …
But how to find a normalized design?
Constraint propagation and normalization
From the given XML keys:
(//book, {isbn}),
(//book, (chapter, {number})),
(//book, (title, { }))
one can derive functional dependencies:
isbn → bookTitle,
isbn, chapterNum → chapterTitle
Normalize the relation by using these functional dependencies:
chapter(isbn, bookTitle, author, chapterNum, chapterTitle)
is decomposed into
book(isbn, bookTitle),
chapter(isbn, chapterNum, chapterTitle),
author(isbn, author)
The new schema is in BCNF!
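The BCNF claim can be double-checked with the textbook attribute-closure test. This is a sketch of the standard relational check, not of the propagation algorithm itself; the function names closure and is_bcnf are assumptions, and the exhaustive subset enumeration is only practical for small relations like these.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under FDs given as (lhs, rhs) tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_bcnf(relation, fds):
    """Every attribute set that determines something new must be a superkey."""
    relation = set(relation)
    for size in range(1, len(relation)):
        for subset in combinations(sorted(relation), size):
            reach = closure(subset, fds) & relation
            if reach != set(subset) and reach != relation:
                return False
    return True


fds = [
    (("isbn",), ("bookTitle",)),
    (("isbn", "chapterNum"), ("chapterTitle",)),
]
universal = ["isbn", "bookTitle", "author", "chapterNum", "chapterTitle"]
decomposed = [
    ["isbn", "bookTitle"],
    ["isbn", "chapterNum", "chapterTitle"],
    ["isbn", "author"],
]
print(is_bcnf(universal, fds))                   # False
print(all(is_bcnf(r, fds) for r in decomposed))  # True
```

The universal relation fails because isbn determines bookTitle without being a superkey, while each of the three decomposed relations passes.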
Computing minimum cover of propagated FDs
Input: a set K of XML keys, and a mapping f from XML to a
universal schema U
Output: a minimum cover F of all the functional dependencies
(FDs) propagated from the XML keys K via f
– F is a cover (a set of FDs): any FD propagated from K via f is
implied by F
– F is minimum: F contains no redundant FDs, i.e., no FD in
F is entailed by the other FDs in F.
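The "no redundant FDs" condition can be illustrated on an explicit FD set. The sketch below is the generic relational step of dropping FDs entailed by the rest; it does not compute which FDs are propagated from XML keys, and it is not the PTIME algorithm referred to on this slide. The names closure and nonredundant_cover and the third, redundant FD are assumptions added for the example.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under FDs given as (lhs, rhs) tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def nonredundant_cover(fds):
    """Drop every FD that is entailed by the FDs remaining in the cover."""
    cover = list(fds)
    i = 0
    while i < len(cover):
        lhs, rhs = cover[i]
        rest = cover[:i] + cover[i + 1:]
        if set(rhs) <= closure(lhs, rest):
            cover = rest          # redundant: entailed by the others
        else:
            i += 1
    return cover


fds = [
    (("isbn",), ("bookTitle",)),
    (("isbn", "chapterNum"), ("chapterTitle",)),
    (("isbn", "chapterNum"), ("bookTitle",)),  # hypothetical redundant FD
]
cover = nonredundant_cover(fds)
print(len(cover))  # 2
```

The third FD is removed because isbn → bookTitle already entails it, leaving the two FDs derived from the book and chapter keys.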
Theorem: There is a PTIME algorithm for computing a minimum
cover of propagated FDs.
Normalize relational schema for storing/querying XML data!
Research issues
For general constraints/mapping languages: undecidable
if the mapping language is relationally complete (selection,
projection, join, union, difference), even for XML keys alone
if both XML keys and foreign keys are considered, even for the
identity “transformation”
Open:
To identify (a) practical mapping languages and (b) practical
XML constraints that allow efficient constraint propagation
Constraint propagation from relations to XML
– Information preserving (lossless) data exchange
– Query/update rewriting/optimization
– Overcoming incompleteness of source data (foreign keys)
XML publishing/integration
[Figure: relational sources DB1 and DB2 are published on the Web through an XML view Q; the published XML is governed by a DTD and constraints.]
All members of a community (or industry) agree on a schema and
exchange data w.r.t. the schema: e-commerce, health-care, ...
Schema-directed XML Publishing/Integration:
mapping data from traditional databases to XML
satisfying the predefined DTD and constraints
Schema-directed integration [SIGMOD’03]
[Figure: multiple, distributed sources (DBs) are integrated into an XML view governed by a DTD and constraints.]
Schema-directed: XML view conforming to a schema (D, Σ)
– D: a DTD
– Σ: a set of XML constraints (relative keys, foreign keys)
Attribute Integration Grammar (AIG)
DTD-directed view definition: recursive, nondeterministic
Inherited and synthesized attributes
Constraint compilation: automatically captures integrity
constraints and DTD in a uniform framework
XML normal forms
3NF, BCNF?
Extensions of (nested) relational normal forms, via XML FDs
– M. Arenas and L. Libkin. A Normal Form for XML Documents,
[PODS 02]. XNFs, decomposition algorithms, complexity, …
– M. Vincent, J. Liu and C. Liu. Strong functional dependencies and
their application to normal forms in XML. [TODS 29(3), 2004]
– X. Wu, T.W. Ling, S. Lee, M. Lee, G. Dobbie. NF-SS: A Normal
Form for Semistructured Schema. [ER (Workshops) 2001]
Research issues for XML normal forms
Implication analysis: more intriguing than relational FDs
Relative functional dependencies: hierarchical nature of XML
“Right” normal form for XML: to prevent update anomalies?
– XML data is often “static”: update anomalies?
– XML data is typically stored in RDBMS
– When XML data is updated, it is done through RDBMS
– Redundancy often helps, e.g., performance and reliability
– Normal form: a right class of constraints to assure “lossless”
shredding into relations of certain normal form
Unfortunately, no previous work has studied this
Run-time analysis: incremental constraint checking
Input: XML tree T, constraints Σ, update ∆T, where T satisfies Σ
Question: does (T + ∆T) satisfy Σ?
∆X . Code generator: incremental checking. Lucent applications
M. Benedikt, G. Brun, J. Gibson, R. Kuss and A. Ng. Automated
update management for XML integrity constraints. [PLANX’02]
Application of incremental techniques for attribute grammar
M. Abrao, B. Bouchou, M. Alves, D. Laurent, M. Musicante.
Incremental Constraint Checking for XML Documents [XSym’04]
Research issues:
Complexity of incremental constraint checking
XML editors: broken link detection and repair
Incremental checking techniques for XML data stored in RDBMS
Query rewriting and optimization
Query translation from XQuery to SQL: XML data stored in RDBMS
– encode XICs and XQuery in relational queries and constraints
– extensions of chase and backchase
A. Deutsch and V. Tannen
– Reformulation of XML Queries and Constraints [ICDT’03]
– MARS: A System for Publishing XML from Mixed and Redundant
Storage [VLDB’03]
R. Krishnamurthy, R. Kaushik, J. Naughton. Efficient XML-to-SQL
Query Translation: Where to Add the Intelligence? [VLDB 2004]
Research issues:
Rewriting queries over (recursive security) views of XML data
Query optimization for (compressed) XML data in native store
Data cleaning
Input: XML tree T, constraints Σ, DTD D
Question: if T does not satisfy D + Σ, find a repair T’ such that (a) T’
satisfies D + Σ, and (b) the distance between T and T’ is minimal
(update operations: insert, delete, modify)
G. Flesca, F. Furfaro, S. Greco, E. Zumpano. Repairs and Consistent
Answers for XML Data with Functional Dependencies [XSym’03]
Research issues:
Effective techniques for repairing integrated XML data: conflicts
and inconsistencies may emerge as violations of constraints.
– Various constraint languages,
– XML schema
Automated tools for repairing Web pages: broken links
Summary
Specification of XML constraints:
– absolute vs. relative, path constraints: XML data is
hierarchical and semi-structured
– mild extensions of relational constraints are not sufficient
Consistency and implication analysis of XML constraints
– DTDs interact with XML constraints
– far more intricate than their relational counterparts
Applications of XML constraints
– XML storage, query, update, integration, cleaning, …
– many practical issues remain to be explored
References
In addition to the papers mentioned earlier
Keys for XML
Computer Networks, Volume 39(5), August 2002, pp 473 - 487.
P. Buneman, S. Davidson, W. Fan, C. Hara, W. Tan
On XML Integrity Constraints in the Presence of DTDs
Journal of the ACM (JACM), 49(3), pp 368 - 406, May 2002.
Wenfei Fan and Leonid Libkin
On Verifying Consistency of XML Specifications
PODS 2002
Marcelo Arenas, Wenfei Fan and Leonid Libkin
What's Hard about XML Schema Constraints?
DEXA 2002
Marcelo Arenas, Wenfei Fan and Leonid Libkin
References
Propagating XML Constraints to Relations
JCSS, 73(3):316-361, May 2007.
Susan Davidson, Wenfei Fan, and Carmem Hara
Capturing both Types and Constraints in Data Integration
SIGMOD, 2003
M. Benedikt, C. Chan, W. Fan, J. Freire, and R. Rastogi
XML Constraints: Specification, Analysis, and Applications
LAAIC, 2005
Wenfei Fan
Containment and Integrity Constraints for XPath
KRDB 2001
Alin Deutsch, Val Tannen