XEM: Managing the Evolution of XML Documents

Authors: Hong Su, Diane Kramer, Li Chen, Kajal Claypool
and Elke A. Rundensteiner
Presented by: Li Shuhong
Fall 2001 CS401
Nov. 20, 2001
Motivation:
• XML has become increasingly popular as a data exchange format on the Web.
DTDs play a role similar to that of types in programming languages and
schemas in database systems.
• Many systems use the given DTD to construct a fixed relational schema.
This schema then serves as the structure into which XML documents
conforming to that DTD are loaded.
• Change is a fundamental aspect of persistent information and data-centric
systems. Most current XML management systems do not provide adequate
support for such changes.
Motivating Example of XML Changes
<!ELEMENT article (title, author+, related-work?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (name)>
<!ATTLIST author id ID #REQUIRED>
<!ELEMENT name (firstname, lastname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT related-work (monograph)*>
<!ELEMENT monograph (title, editor)>
<!ELEMENT editor EMPTY>
<!ATTLIST editor name CDATA #IMPLIED>
<article>
  <title>XML Evolution Manager</title>
  <author id="dk">
    <name>
      <firstname>Diane</firstname>
      <lastname>Kramer</lastname>
    </name>
  </author>
  <author id="er">
    <name>
      <firstname>Elke</firstname>
      <lastname>Rundensteiner</lastname>
    </name>
  </author>
  <related-work>
    <monograph>
      <title>Modern database systems</title>
      <editor name="Won Kim"/>
    </monograph>
  </related-work>
</article>
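To make "conforms to the DTD" concrete, here is a minimal Python sketch (not part of XEM) that checks the sample document against article.dtd. It assumes a hypothetical, simplified encoding of the DTD in which a content model is only a sequence of child names with optional ?, +, * quantifiers; text content and attributes are ignored. The names CONTENT_MODELS, model_to_regex and is_valid are illustrative; a real system would use a full DTD validator.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified encoding of article.dtd (sequences only).
# "#PCDATA" and "EMPTY" are both treated as "no child elements".
CONTENT_MODELS = {
    "article": "(title, author+, related-work?)",
    "title": "#PCDATA",
    "author": "(name)",
    "name": "(firstname, lastname)",
    "firstname": "#PCDATA",
    "lastname": "#PCDATA",
    "related-work": "(monograph)*",
    "monograph": "(title, editor)",
    "editor": "EMPTY",
}

def model_to_regex(model: str) -> re.Pattern:
    """Compile a simple sequence content model into a regex over 'tag,' tokens."""
    if model in ("#PCDATA", "EMPTY"):
        return re.compile("")  # only the empty child sequence matches
    outer = ""
    if model[-1] in "?+*":  # quantifier on the whole group, e.g. (monograph)*
        model, outer = model[:-1], model[-1]
    parts = []
    for item in model.strip("()").split(","):
        item = item.strip()
        quant = ""
        if item[-1] in "?+*":
            item, quant = item[:-1], item[-1]
        parts.append(f"(?:{re.escape(item)},){quant}")
    return re.compile(f"(?:{''.join(parts)}){outer}")

def is_valid(elem: ET.Element) -> bool:
    """Check elem's child sequence against its content model, recursively."""
    children = "".join(f"{child.tag}," for child in elem)
    if model_to_regex(CONTENT_MODELS[elem.tag]).fullmatch(children) is None:
        return False
    return all(is_valid(child) for child in elem)

SAMPLE = """<article>
  <title>XML Evolution Manager</title>
  <author id="dk"><name><firstname>Diane</firstname><lastname>Kramer</lastname></name></author>
  <author id="er"><name><firstname>Elke</firstname><lastname>Rundensteiner</lastname></name></author>
  <related-work>
    <monograph><title>Modern database systems</title><editor name="Won Kim"/></monograph>
  </related-work>
</article>"""

root = ET.fromstring(SAMPLE)
print(is_valid(root))  # True
```

Translating each content model into a regular expression over child tags mirrors how DTD content models are defined (as regular expressions over element names), which keeps the check to a few lines.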
Example:
• Removal of the element <editor name="Won Kim"/> from the monograph.
• An XML change support system would need to ensure that:
– 1. The resulting DTD is valid.
– 2. All existing XML documents are changed to conform to the modified DTD.
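• Result of the example: since editor is a required child of monograph, this
data change also requires a DTD change (making editor optional); that DTD
change by itself requires no changes to the existing XML data.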
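The ordering in this example can be checked mechanically: deleting editor under the original DTD leaves the document invalid, while relaxing the DTD first keeps existing data valid and then admits the deletion. The sketch below is a minimal illustration under stated assumptions: monograph's content model is written directly as a regular expression over child tags (a hypothetical encoding), only monograph elements are checked, and the document is a trimmed copy of the sample. The function monographs_conform is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical encoding: monograph's content model as a regex over "tag," tokens.
ORIGINAL = re.compile(r"title,editor,")      # (title, editor)
RELAXED = re.compile(r"title,(?:editor,)?")  # (title, editor?)

def monographs_conform(root: ET.Element, model: re.Pattern) -> bool:
    """True if every <monograph> under root matches the given content model."""
    return all(
        model.fullmatch("".join(f"{c.tag}," for c in m)) is not None
        for m in root.iter("monograph")
    )

doc = ET.fromstring(
    "<article><related-work><monograph><title>Modern database systems</title>"
    '<editor name="Won Kim"/></monograph></related-work></article>'
)

# Step 1: DTD change. Existing data already conforms to the relaxed model,
# so no data has to be modified.
assert monographs_conform(doc, ORIGINAL)
assert monographs_conform(doc, RELAXED)

# Step 2: data change. Remove the editor from the first monograph.
mono = doc.find("related-work/monograph[1]")
mono.remove(mono.find("editor"))

print(monographs_conform(doc, ORIGINAL))  # False: invalid under the old DTD
print(monographs_conform(doc, RELAXED))   # True: valid under the relaxed DTD
```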
Problems with Current XML Management Systems:
• To make updates, users must be aware of the underlying storage system and
of the mapping between XML, the DTD, and the underlying storage model;
otherwise the desired XML transformation may not match the change actually
made to the system.
XML Data Model & DTD Data Model
• An XML document is composed of nested tagged elements, their attributes, and subelements. It may have an associated schema, such as a DTD.
• A Document Type Definition (DTD) allows properties or constraints to be
defined on elements and attributes.
• A DTD can be modeled as a graph G = (N, p, l), where N is the set of
nodes, p is the parent function representing the edges of the graph, and
l is the labeling function that maps each node to a tuple of its properties.
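A minimal Python sketch of G = (N, p, l) for a fragment of article.dtd, under assumptions not stated on the slide: p maps each node to a single parent (so an element name used in two places, such as title, becomes two distinct nodes), and l records only the element name and its quantifier, with "1" meaning exactly once. DTDGraph and its methods are hypothetical names, and attribute nodes are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

@dataclass
class DTDGraph:
    """G = (N, p, l) for a DTD (hypothetical encoding).

    p: node -> parent node (None for the root); these pairs are the edges of G.
    l: node -> property tuple, here (element name, quantifier), an assumed subset.
    """
    p: Dict[str, Optional[str]] = field(default_factory=dict)
    l: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    @property
    def N(self) -> Set[str]:
        # The node set is exactly the domain of the parent function.
        return set(self.p)

    def add(self, node: str, parent: Optional[str], name: str, quant: str = "1") -> None:
        self.p[node] = parent
        self.l[node] = (name, quant)

    def children(self, node: str) -> List[str]:
        # Invert the parent function to list a node's children.
        return [n for n, parent in self.p.items() if parent == node]

# A fragment of article.dtd; "title" appears twice, so it gets two nodes.
g = DTDGraph()
g.add("article", None, "article")
g.add("article/title", "article", "title")
g.add("article/author", "article", "author", "+")
g.add("article/related-work", "article", "related-work", "?")
g.add("article/related-work/monograph", "article/related-work", "monograph", "*")
g.add("article/related-work/monograph/title", "article/related-work/monograph", "title")
g.add("article/related-work/monograph/editor", "article/related-work/monograph", "editor")

print(len(g.N))  # 7
print([g.l[c] for c in g.children("article/related-work/monograph")])
# [('title', '1'), ('editor', '1')]
```

Storing p as a dictionary makes "the parent function" literal: each node has at most one parent, and the edge set is simply the set of (node, p[node]) pairs.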
Graph Representation of Article.dtd
Taxonomy and Semantics of XML Change Primitives
• The paper presents a taxonomy of XML change primitives and defines their
semantics. These primitives fall into two categories:
– 1. Primitives pertaining to the DTD.
– 2. Primitives pertaining to XML data.
• The primitives have the following characteristics:
– Complete
– Minimal: each primitive is atomic
– Sound: they preserve consistency and integrity
• Example:
– changeQuant(monograph, [1,2], "?");
– destroyDataEl(“article/related-work/monograph[1]/editor”);
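The slides do not give the signatures of these primitives, and the meaning of the [1,2] argument of changeQuant is not explained, so the following is only a sketch with hypothetical functions change_quant and destroy_data_el. It assumes the DTD is stored as a dictionary from element to (child, quantifier) pairs and that the data path is XPath-like with the root element as its first step. It mirrors the example: the DTD change is applied first, then the editor element is deleted.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD encoding: element -> list of (child, quantifier),
# where "" means "exactly once".
dtd = {"monograph": [("title", ""), ("editor", "")]}

def change_quant(dtd, element, child, quant):
    """changeQuant-style DTD primitive: set the quantifier of one child."""
    dtd[element] = [(c, quant if c == child else q) for c, q in dtd[element]]

def destroy_data_el(root, path):
    """destroyDataEl-style data primitive: delete the element at `path`,
    an XPath-like path whose first step names the root element."""
    first, _, rest = path.partition("/")
    if first != root.tag or not rest:
        raise ValueError(f"path must name a descendant of <{root.tag}>")
    parent_path, _, step = rest.rpartition("/")
    parent = root.find(parent_path) if parent_path else root
    target = parent.find(step) if parent is not None else None
    if target is None:
        raise KeyError(path)
    parent.remove(target)

doc = ET.fromstring(
    "<article><related-work><monograph><title>Modern database systems</title>"
    '<editor name="Won Kim"/></monograph></related-work></article>'
)

# DTD change first (editor becomes optional), then the data change.
change_quant(dtd, "monograph", "editor", "?")
destroy_data_el(doc, "article/related-work/monograph[1]/editor")

print(dtd["monograph"])  # [('title', ''), ('editor', '?')]
print(ET.tostring(doc, encoding="unicode"))
# <article><related-work><monograph><title>Modern database systems</title></monograph></related-work></article>
```

ElementTree's limited XPath support accepts the positional step monograph[1], which lets the path in the slide's destroyDataEl example be used almost verbatim; the root step is stripped because find() evaluates paths relative to the root element.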
Advantages:
• Identifies the lack of generic support for change in XML management systems.
• Provides a system to specify changes at both the DTD and the XML data level.
• Introduces the notion of constraint checking to ensure structural consistency.
• Expresses desired transformations independently of the underlying storage system.
• Describes a working XML Evolution Management prototype system:
MARROW
Disadvantages:
• The system only supports simple, predefined schema evolution operations.