Chapter 3 Querying XML

Querying XML:
XQuery, XPath, and SQL/XML in Context
Authors: Jim Melton and Stephen Buxton
Publisher: Morgan Kaufmann
Publication Year: 2006
Lecturer: Kyungpook National University
School of EECS
Lab. of Database Systems
Young Chul Park
Preface
Issues
1. Querying XML documents
: How to locate information in documents that are marked up using XML and
how to find and extract that information in repositories of such documents.
2. Repositories of XML documents
XML : XML document  XML fragment
Who should read this book.
Software engineers who have to design and build applications that use XML and that access documents
and data presented in an XML form.
How the book is organized.
Part I, “XML: Documents and Data”
Part II, “Metadata and XML”
Part III, “Managing XML for Querying”
Part IV, “Querying XML”
Part V, “Querying and the World Wide Web”
W3C : World Wide Web Consortium.
2007-7/KNU
Querying XML: XQuery, XPath, and SQL/XML in Context
Part I XML: Documents and Data



Chapter 1 XML
Chapter 2 Querying
Chapter 3 Querying XML
Chapter 1 XML
1.1 Introduction
XML - the Extensible Markup Language –
1.2 Adding Markup to Data
1.2.1 Raw Data
Example 1-1 movie, Raw Data
An American Werewolf in London1981LandisJohnFolseyGeorge,
Jr.GuberPeterPetersJon98NaughtonDavidmaleDavid
KesslerAgutterJennyfemaleAlex Price
1.2.2 Separating Fields
Example 1-2 movie, Fields Separated by Commas
An American Werewolf in London,1981,Landis,John,Folsey,George\,
Jr.,Guber,Peter,Peters,Jon,98,Naughton,David,male,David
Kessler,Agutter,Jenny,female,Alex Price
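The backslash before the comma in "George\, Jr." is what keeps that comma from being read as a field separator. A minimal Python sketch of this, assuming the backslash is meant as an escape character (an assumption read off the example) and joining the wrapped record onto one line:

```python
import csv
import io

# The record from Example 1-2, joined onto a single line.
record = (
    "An American Werewolf in London,1981,Landis,John,Folsey,George\\, Jr.,"
    "Guber,Peter,Peters,Jon,98,Naughton,David,male,David Kessler,"
    "Agutter,Jenny,female,Alex Price"
)

# escapechar tells the reader to treat "\," as a literal comma, not a separator.
fields = next(csv.reader(io.StringIO(record), escapechar="\\"))

print(len(fields))  # number of fields recovered
print(fields[5])    # the given name that contains a comma
```

Even with the fields separated, nothing in the record says which field is a title and which is a family name; that is the gap the next two steps close.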
1.2.3 Grouping Fields Together
Example 1-3 movie, Grouped Fields
,
,An American Werewolf in London$,
,1981$,
,
,Landis$
,John$
$,
,
,Folsey$,
,George, Jr.$,
$,
,
,Guber$,
,Peter$,
$,
,
,Peters$,
,Jon$,
$,
,98$,
,
,Agutter$,
,Jenny$,
,female$,
,Alex Price$,
$,
$,
1.2.4 Naming Fields
A way to name each field: “<tagname>” … “</tagname>”
Example 1-4 movie, Fields Grouped and Named
<movie>
<title>An American Werewolf in London</title>
<yearReleased>1981</yearReleased>
<director>
<familyName>Landis</familyName>
<givenName>John</givenName>
</director>
<producer>
<familyName>Folsey</familyName>
<givenName>George, Jr.</givenName>
<otherNames></otherNames>
</producer>
<producer>
<familyName>Guber</familyName>
<givenName>Peter</givenName>
<otherNames></otherNames>
</producer>
<producer>
<familyName>Peters</familyName>
<givenName>Jon</givenName>
<otherNames></otherNames>
</producer>
<runningTime>98</runningTime>
<cast>
<familyName>Agutter</familyName>
<givenName>Jenny</givenName>
<maleOrFemale>female</maleOrFemale>
<character>Alex Price</character>
</cast>
</movie>
1.2.5 A Structural Map of the Data : DTDs (Document Type Definition) and XML Schemas
Example 1-5 A DTD for movie
<!ELEMENT movie (title, yearReleased, director, producer+, runningTime, cast+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT yearReleased (#PCDATA)>
<!ELEMENT director (familyName, givenName, otherNames?)>
<!ELEMENT producer (familyName, givenName, otherNames?)>
<!ELEMENT runningTime (#PCDATA)>
<!ELEMENT cast (familyName, givenName, otherNames?, maleOrFemale, character)>
<!ELEMENT familyName (#PCDATA)>
<!ELEMENT givenName (#PCDATA)>
<!ELEMENT otherNames (#PCDATA)>
<!ELEMENT maleOrFemale (#PCDATA)>
<!ELEMENT character (#PCDATA)>
Example 1-6 An XML Schema for movie
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified">
<xs:element name="familyName" type="xs:string"/>
<xs:element name="givenName" type="xs:string"/>
<xs:element name="movie">
<xs:complexType>
<xs:sequence>
<xs:element name="title" type="xs:string"/>
<xs:element name="yearReleased">
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:minInclusive value="1900"/>
<xs:maxInclusive value="2100"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="director">
<xs:complexType>
<xs:sequence>
<xs:element ref="familyName"/>
<xs:element ref="givenName"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="producer" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element ref="familyName"/>
<xs:element ref="givenName"/>
<xs:element name="otherNames" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="runningTime" type="xs:integer"/>
<xs:element name="cast" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element ref="familyName"/>
<xs:element ref="givenName"/>
<xs:element name="maleOrFemale">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="male"/>
<xs:enumeration value="female"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="character" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
1.2.6 Markup and Meaning
1.2.7 Why XML?
Record: field  XML: element
XML parser
Mapping == Modeling
XSLT
XML-aware query engine
In general, you should use XML for semantic markup and leave it to a reporting/publishing tool (such as
XSLT) to map “meaning” into presentation.
1.3 XML-Based Markup Languages : say something about Structure and Semantics.
For the meaning of the tags, you need to look at the human language documentation.
Digital Weather Markup Language
XBRL – Extensible Business Reporting Language
Dublin Core : defines a markup language for catalog metadata.
Table 1-1 Dublin Core Metadata Definition, Sample
DocBook : defines a set of XML tags for use in creating marked-up books, articles, and documentation.
Example 1-7 DocBook Example Document
1.4 XML Data
1.4.1 Structured Data
1.4.2 Unstructured Data : text, AVI (audio, video, image), and spatial data
1.4.3 Messages
1.4.4 XML Data - Summary
Table 1-2 XML Data
===================================================================================================
          Structured (data)              Messages (data)           Unstructured (documents)
===================================================================================================
Fields    Small, well defined            Small, well defined       Large, mainly text
Record    Large                          Small                     Large
Storage   Persistent, database           Nonpersistent, memory     Persistent, database or files
Query     Across large numbers,          Across a single message   Across large numbers of documents
          with complex relationships/                              with search for meaning
          constraints
===================================================================================================
1.5 Some Other Ways to Represent Data
1.5.1 SQL – Structure Only : Figure 1-1 movie, SQL (Relational) Representation
1.5.2 Presentation Languages – Presentation Only :
(1) roff, troff, groff (C++ version of roff produced by GNU)
(2) TeX/LaTeX (3) PostScript (4) PDF (Portable Document Format)
Example 1-8 troff Markup to Start a New Paragraph
Example 1-9 PostScript Markup to Print Hello, world!
1.5.3 SGML : the Standard Generalized Markup Language, ISO Standard 8879.
Important features
• Descriptive markup: the idea that markup should not be procedural.
• Document type: the first language to define a DTD (Document Type Definition).
• Data independence: Enable faithful sharing of documents across different hardware and
software platforms.
1.5.4 HTML
• HTML is much more forgiving than XML.
• XML is all about marking up the meaning of the data whereas HTML has drifted toward
presentation markup.
XHTML 1.0, XHTML transitional, XHTML strict.
1.6 Chapter Summary
Chapter 2 Querying
2.1 Introduction
A database query :
a select query : a data retrieval query
an action query : insertion, deletion, and updating.
2.2 Querying Traditional Data
Traditional data : simple data types such as integers, dates, and short strings.
2.2.1 The Relational Model and SQL
 SQL-92
Example 2-1 Projection
SELECT title FROM movies;
Example 2-2 Selection
SELECT * FROM movies WHERE runningTime < 100;
Example 2-3 Projection and Selection
SELECT title FROM movies WHERE runningTime < 100;
Example 2-4 Union
SELECT * FROM directors
UNION
SELECT ID, familyName, givenName FROM producers;
Example 2-5 Join
SELECT movies.ID, movies.title, movies.yearReleased, directors.familyName, directors.givenName
FROM movies, directors
WHERE movies.director = directors.id;
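To try Examples 2-3 and 2-5 against real tables, the following Python sketch loads two rows into an in-memory SQLite database. The table layouts are inferred from the column names used in the examples, and the ID values and the running time of the second movie are illustrative assumptions rather than data from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE directors (id INTEGER PRIMARY KEY, familyName TEXT, givenName TEXT);
    CREATE TABLE movies (
        ID INTEGER PRIMARY KEY, title TEXT, yearReleased INTEGER,
        runningTime INTEGER, director INTEGER REFERENCES directors(id)
    );
    INSERT INTO directors VALUES (1, 'Landis', 'John'), (2, 'Besson', 'Luc');
    -- The running time 126 is an illustrative value.
    INSERT INTO movies VALUES
        (1, 'An American Werewolf in London', 1981, 98, 1),
        (2, 'The Fifth Element', 1997, 126, 2);
""")

# Example 2-3: projection (title only) combined with selection (runningTime < 100).
short_titles = conn.execute(
    "SELECT title FROM movies WHERE runningTime < 100"
).fetchall()

# Example 2-5: join each movie to its director.
# ORDER BY is added here only to make the row order deterministic.
joined = conn.execute(
    "SELECT movies.ID, movies.title, movies.yearReleased, "
    "directors.familyName, directors.givenName "
    "FROM movies, directors WHERE movies.director = directors.id "
    "ORDER BY movies.ID"
).fetchall()

print(short_titles)
for row in joined:
    print(row)
```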
2.2.2 Extensions to SQL
Rows and Columns with Single-Value Cells – Too Simplistic
Object-Relational Storage
Object Extensions to SQL  SQL:1999 (formerly referred to as SQL3)
Extensions include support for:
– User-defined types
– Type constructors for row types, reference types, and collection types.
– User-defined functions and procedures
– LOBs (large objects)
2.3 Querying Nontraditional Data
Nontraditional data : data that cannot be represented naturally as numbers and dates and strings, such as documents and
pictures and movie clips.
2.3.1 Metadata Approach
One approach is to store the nontraditional data as an opaque chunk of data and add metadata. Once we have the data
in a LOB, we can store it in a database table and add metadata in other columns in the table. Then we can query the
metadata to find a particular instance of a LOB or to find out information about an instance of a LOB.
There are several ways to create the metadata.
– Some formats have metadata embedded in them. Text formats (PDF, Microsoft Word) generally contain some
automatically generated metadata (the author’s name, document title, last-modified date, etc.) and some
metadata that can be added by the author. This metadata can be extracted programmatically and written into
database columns as the data is inserted. For example, Oracle’s interMedia product will extract metadata from
most common document, audio, video, and image formats and make that metadata available for query in columns
of a table.
– Whoever publishes the data (that is, inserts the data into a database) can add metadata via an application. A CMS
(content management system) will allow the publisher to add all kinds of metadata at various stages of the
publishing process.
– Some interesting programs can produce meaningful metadata for text documents automatically, even when that
metadata does not exist explicitly in the document.
The metadata approach requires manual or programmatic effort to produce the metadata, then some design to figure
out how to store the metadata, and finally some programming to create an application that will query the metadata in an
application-specific way.
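A small sketch of the metadata approach, using Python's sqlite3 module as a stand-in for a database with LOB support. The table name, column names, and row contents are hypothetical, and the byte strings are placeholders rather than real PDF files.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    "id INTEGER PRIMARY KEY, content BLOB, "
    "author TEXT, title TEXT, lastModified TEXT)"
)

# content is the opaque LOB; the remaining columns hold metadata about it.
rows = [
    (1, b"%PDF-placeholder-1", "A. Author", "Report on XML", "2006-05-01"),
    (2, b"%PDF-placeholder-2", "B. Writer", "Notes on SQL", "2007-01-15"),
]
conn.executemany("INSERT INTO documents VALUES (?, ?, ?, ?, ?)", rows)

# The query looks only at metadata columns; the LOB is returned but never inspected.
found = conn.execute(
    "SELECT id, content FROM documents WHERE author = ? AND lastModified >= ?",
    ("B. Writer", "2007-01-01"),
).fetchall()

print(found)
```

The query can only answer questions that someone anticipated when choosing the metadata columns, which is the design effort the paragraph above refers to.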
2.3.2 Objects Approach
Object technology offers the potential of storing nontraditional data items in a nonopaque way – to “open the box” and
treat, say, a PDF document as a PDF document rather than as an opaque LOB. All we need to do is define an object type
to represent the PDF-formatted data, and some methods that make sense for PDF. Then we can query the actual
document instead of querying its metadata.
Oracle’s interMedia can extract metadata that is embedded in, e.g., a picture. interMedia does that by creating an object
type for the picture format and querying that object to extract useful metadata.
The objects approach requires the definition of an object type, with methods, for each kind of data to be queried, plus an
application to query that data.
2.3.3 Markup Approach
An XML document that can be described and queried with standard tools.
You can use markup to represent metadata that has been created or extracted. Adobe has taken this approach.
2.3.4 Querying Content
Metadata, Objects, and Markup approaches : On querying nontraditional data, these approaches extract (or at least
surface) traditional, structured data that tells us something about the nontraditional content so that we can query that
traditional data.
There is another approach – to query the nontraditional data directly, in a way that is appropriate to that kind of data.
E.g., query text data using full-text queries.
E.g., direct multimedia search.
2.4 Chapter Summary
Chapter 3 Querying XML
3.1 Introduction
XML is quite different from relational data, and it offers its own special challenges and opportunities for the query writer.
Querying XML data is different from querying relational data – it requires navigating around a tree structure that may
or may not be well defined (in structure and in type). Also XML arbitrarily mixes data, metadata, and representational
annotation (though the latter is frowned on).
We argue that knowledge of document structure and data types is a good thing and that XQuery 1.0 and XPath 2.0 will
be the most important languages for querying XML.
3.2 Navigating an XML Document
Since an XML document is by nature hierarchical, it can be represented easily in a tree diagram.
movie (attribute myStars="5")
    title: An American Werewolf in London
    yearReleased: 1981
    director
        familyName: Landis
        givenName: John
    producer
        familyName: Guber
        givenName: Peter
    runningTime: 98
Figure 3-1 movie Document as an XML tree
– //title : “at any point in the XML tree, return all nodes with the name title”
– /movie/title : “return the title by starting at the root node and navigating to the title element that
is a child of the movie element”
// : “at any point in the XML tree”
The leading “/” : “start at the top of the XML tree”
Any other “/” (a step) : “go down one level in the XML tree, i.e., select the children of the current node”
Note that the “top of the XML tree” is not the element named “movie”; it is a notional node above the element named
“movie.”
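The two expressions can be tried on the Figure 3-1 document with Python's standard xml.etree.ElementTree module, with one caveat: ElementTree supports only a subset of XPath and evaluates paths relative to an element, so it does not expose the notional node above “movie”. The sketch below works around that by checking the name of the top-level element explicitly; that workaround is an assumption of this sketch, not part of XPath.

```python
import xml.etree.ElementTree as ET

MOVIE_XML = """
<movie myStars="5">
  <title>An American Werewolf in London</title>
  <yearReleased>1981</yearReleased>
  <director><familyName>Landis</familyName><givenName>John</givenName></director>
  <producer><familyName>Guber</familyName><givenName>Peter</givenName></producer>
  <runningTime>98</runningTime>
</movie>
"""

movie = ET.fromstring(MOVIE_XML)  # the top-level element, not the node above it

# //title : title elements at any depth below the starting element.
anywhere = [t.text for t in movie.findall(".//title")]

# /movie/title : the top-level element must be named "movie";
# from there, take the children named "title".
from_top = [t.text for t in movie.findall("title")] if movie.tag == "movie" else []

print(anywhere)
print(from_top)
```

Both lists hold the same single title here because this document has only one title element; in a document with titles at several depths, only the first expression would find them all.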
3.2.1 Walking the XML tree
To traverse or walk an XML tree, you need to be able to express the following five basic concepts:
1. The top node – in XPath, this is called the root node (an imaginary node that sits above the
topmost node) and is represented by a leading “/”.
2. The current position – in XPath, this is called the context node and is represented by “.”.
3. The node directly above the current position – in XPath, this is called a reverse step (specifically,
the parent) and is represented by “..”.
4. The nodes directly below the current position – in XPath, this is a step, and is represented by a “/”
(a step separator), typically followed by a condition.
5. A condition – in XPath, this can be a node test or a predicate list. A node test is used to test either
the name of the node or its kind (element, attribute, comment, etc). A predicate tests either the
position of the node as the N-th child (child nodes are numbered starting from 1) or it tests the
value of the node.
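The five concepts can be sketched with Python's xml.etree.ElementTree, which implements a limited subset of XPath. The small movies document is an assumption modeled on the examples in this chapter, and ElementTree starts from the top-level element rather than from the imaginary root node above it.

```python
import xml.etree.ElementTree as ET

MOVIES_XML = """
<movies>
  <movie myStars="3"><title>The Fifth Element</title></movie>
  <movie myStars="5"><title>An American Werewolf in London</title></movie>
</movies>
"""

# 1. The top: ElementTree hands back the top-level element (<movies>);
#    the imaginary root node above it is not exposed by Element objects.
movies = ET.fromstring(MOVIES_XML)

# 2. The context node "." selects the element the path starts from.
context = movies.find(".")

# 3. The reverse step ".." selects the parent: here, the parent of every title.
parents = movies.findall(".//title/..")

# 4. A step down selects children of the current node.
children = movies.findall("./movie")

# 5. Conditions: a node test by name ("movie", "title") plus a positional
#    predicate; positions are numbered starting from 1.
first_title = movies.find("./movie[1]/title")

print(context is movies)
print([p.tag for p in parents])
print(len(children))
print(first_title.text)
```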
movies
    movie (attribute myStars="3")
        title: The Fifth Element
        yearReleased: 1997
        director
            familyName: Besson
            givenName: Luc
        ...
    movie (attribute myStars="5")
        title: An American Werewolf in London
        yearReleased: 1981
        director
            familyName: Landis
            givenName: John
        ...
Figure 3-2 movies Document Tree
(1) A Simple Walk Down the Tree
: Selecting and filtering a sequence of nodes.
Example 3-1 A Simple Walk Down the Tree
English query
Find the titles of all movies.
XPath Expression
/movies/movie/title
Expression
1. / - select the notional top node in the tree.
2. /movies - select the child node(s) named “movies”.
3. /movies/ - select the child node(s) of /movies.
4. /movies/movie - select the nodes named “movie”.
5. /movies/movie/ - select the child node(s) of /movies/movie.
6. /movies/movie/title - select the nodes named “title”.
Note that the result of evaluating the XPath expression “/movies/movie/title” is not a string containing the titles of all
movies in the document – it’s a sequence of nodes (title element nodes). If you want to do anything with the results, you
need to serialize the results, i.e., convert the results from the data model of your query language into something you can
read or print. When you serialize a sequence of title element nodes, it’s reasonable to take the string value of each node,
along with some representation of the element tag (“title”).
If you need to convert the node sequence into a sequence of strings, you can use
/movies/movie/title/text()
to pull out the text nodes, but even then you may need to do some more work to map those text nodes into something
your host language will understand.
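The difference between a sequence of nodes, its serialization, and a sequence of strings can be seen in a short Python sketch using xml.etree.ElementTree. The two-movie document is an assumption based on Figure 3-2, and the path is written relative to the movies element because ElementTree does not accept a leading “/” on an element.

```python
import xml.etree.ElementTree as ET

MOVIES_XML = """
<movies>
  <movie myStars="3"><title>The Fifth Element</title></movie>
  <movie myStars="5"><title>An American Werewolf in London</title></movie>
</movies>
"""

movies = ET.fromstring(MOVIES_XML)

# /movies/movie/title, written relative to the <movies> element.
title_nodes = movies.findall("movie/title")

# The result is a list of element nodes, not strings.
print(title_nodes)

# Serializing keeps the element tags; strip() drops any whitespace after the element.
serialized = [ET.tostring(t, encoding="unicode").strip() for t in title_nodes]
print(serialized)

# Rough counterpart of /movies/movie/title/text(): ElementTree has no separate
# text nodes, so the text is read from each element's .text attribute.
strings = [t.text for t in title_nodes]
print(strings)
```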
(2) Adding a Value Predicate
Example 3-2 Adding a Value Predicate
English query
Find the titles of all 5-star movies.
XPath Expression
/movies/movie[@myStars=5]/title
Expression
1. / - select the notional top node in the tree.
2. /movies - select the child node(s) named “movies”.
3. /movies/ - select the child node(s) of /movies.
4. /movies/movie - select the nodes named “movie”.
5. /movies/movie[@myStars=5] - from the sequence of movie nodes, select all
those nodes where the value of the attribute named “myStars” equals 5.
6. /movies/movie[@myStars=5]/title - from the sequence of movie nodes where
“myStars” equals 5, select just the child nodes with the name “title”.
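A sketch of the value predicate in Python's xml.etree.ElementTree, on an assumed two-movie document. ElementTree's XPath subset only compares an attribute with a quoted string, so the predicate is written as [@myStars='5']; the second form emulates the numeric comparison that @myStars=5 performs in full XPath.

```python
import xml.etree.ElementTree as ET

MOVIES_XML = """
<movies>
  <movie myStars="3"><title>The Fifth Element</title></movie>
  <movie myStars="5"><title>An American Werewolf in London</title></movie>
</movies>
"""

movies = ET.fromstring(MOVIES_XML)

# String comparison on the attribute value, then one step down to title.
five_star = [t.text for t in movies.findall("movie[@myStars='5']/title")]

# Emulated numeric comparison: "5" and "5.0" would both count as 5 here.
five_star_numeric = [
    m.findtext("title")
    for m in movies.findall("movie")
    if float(m.get("myStars", "nan")) == 5
]

print(five_star)
print(five_star_numeric)
```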
(3) Adding a Positional Predicate
Example 3-3 Adding a Positional Predicate
English query
Find the title of the 5th movie.
XPath Expression
/movies/movie[5]/title
Expression
1. /movies/movie - select the sequence of movie element nodes under each movies element node.
2. /movies/movie[5] - from the sequence of movie nodes, select the node in position 5.
3. /movies/movie[5]/title - from the 5th movie node, select the element child named “title”.
“Informally, document order is the order in which nodes appear in the XML serialization of a document.”
In general, data-centric XML documents, such as movies, do not rely on document order. On the other hand,
document-centric XML documents, such as books, articles, and papers, rely heavily on document order. Without document order,
XML authors would have to number every chapter, section, paragraph, bolded term, etc., and explicitly order every
query.
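A positional predicate depends entirely on document order, which the following Python sketch makes visible. It builds a movies element with six placeholder movies (the titles are invented for illustration) and selects the fifth with xml.etree.ElementTree, whose positions are 1-based like XPath's while Python list indexes are 0-based.

```python
import xml.etree.ElementTree as ET

# Build <movies> with six placeholder <movie> children, in document order.
movies = ET.Element("movies")
for n in range(1, 7):
    movie = ET.SubElement(movies, "movie")
    ET.SubElement(movie, "title").text = f"Placeholder title {n}"

# /movies/movie[5]/title, written relative to the <movies> element.
fifth = movies.findall("movie[5]/title")

# The same node reached through a 0-based Python list index.
fifth_by_index = movies.findall("movie")[4].find("title")

# last() is also supported as a position.
last_title = movies.find("movie[last()]/title")

print(fifth[0].text)
print(last_title.text)
```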
(4) The Context Item
Example 3-4 The Context Item
English query
Find the titles of movies that contain the string “Werewolf”.
XPath Expression
/movies/movie/title[contains(., "Werewolf")]
Expression
1. /movies/movie/title - select the sequence of nodes that represent all titles under movie under
movies.
2. /movies/movie/title[contains(., "Werewolf")] - filter the sequence of title nodes by the condition
‘contains(., "Werewolf")’, where ‘contains’ is a built-in function. The first parameter to ‘contains’
is the context item “.”.
Note that this is equivalent to:
/movies/movie[contains(title, "Werewolf")]/title
The context item (“.”) indicates the current node being considered, as the predicate is applied to each title element node
in turn.
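Python's xml.etree.ElementTree does not implement the contains() function, so the sketch below emulates the predicate with an ordinary Python test applied to each title in turn; the loop variable plays the role of the context item. The document is an assumed two-movie sample.

```python
import xml.etree.ElementTree as ET

MOVIES_XML = """
<movies>
  <movie myStars="3"><title>The Fifth Element</title></movie>
  <movie myStars="5"><title>An American Werewolf in London</title></movie>
</movies>
"""

movies = ET.fromstring(MOVIES_XML)

# Emulates /movies/movie/title[contains(., "Werewolf")]:
# t stands for ".", and itertext() gives the string value of the node.
titles = [
    t for t in movies.findall("movie/title")
    if "Werewolf" in "".join(t.itertext())
]

# Emulates /movies/movie[contains(title, "Werewolf")]/title:
# filter the movies first, then step down to title.
titles_alt = [
    t
    for m in movies.findall("movie")
    if "Werewolf" in (m.findtext("title") or "")
    for t in m.findall("title")
]

print([t.text for t in titles])
```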
(5) Up the Tree and Down Again
Example 3-5 Up the Tree and Down Again
English query
Find the running times of movies where the title contains the string “Werewolf”.
XPath Expression
/movies/movie/runningTime[contains(../title, "Werewolf")]
Expression
1. /movies/movie/runningTime - select all runningTime nodes under movie under movies.
2. /movies/movie/runningTime[contains(../title, "Werewolf")] - filter the runningTime nodes by
looking back up the tree (“..”) and testing whether the parent of the runningTime node has a
child called “title” that contains the string “Werewolf”.
Note that this is equivalent to:
/movies/movie[contains(title, "Werewolf")]/runningTime
(this is also a better style for an XPath expression).
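The two styles can be compared in a Python sketch. ElementTree elements carry no pointer to their parent, so the “up the tree” version first has to build a child-to-parent map, which hints at why filtering at the movie level is the tidier style. The sample document is assumed, and the running time of 126 for the second movie is an illustrative value.

```python
import xml.etree.ElementTree as ET

MOVIES_XML = """
<movies>
  <movie><title>The Fifth Element</title><runningTime>126</runningTime></movie>
  <movie><title>An American Werewolf in London</title><runningTime>98</runningTime></movie>
</movies>
"""

movies = ET.fromstring(MOVIES_XML)

# Child-to-parent map, needed to emulate the ".." step from a runningTime node.
parent_of = {child: parent for parent in movies.iter() for child in parent}

# Emulates /movies/movie/runningTime[contains(../title, "Werewolf")].
up_and_down = [
    rt.text
    for rt in movies.findall("movie/runningTime")
    if "Werewolf" in (parent_of[rt].findtext("title") or "")
]

# Emulates /movies/movie[contains(title, "Werewolf")]/runningTime.
filter_first = [
    rt.text
    for m in movies.findall("movie")
    if "Werewolf" in (m.findtext("title") or "")
    for rt in m.findall("runningTime")
]

print(up_and_down)
print(filter_first)
```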
(6) Comparison in Different Parts of the Tree
Example 3-6 Comparison in Different Parts of the Tree
English query
Find the titles of all movies where the director is also the producer.
XPath Expression
/movies/movie/title[../director/familyName=../producer/familyName]
Expression
1. /movies/movie/title - select all title nodes under movie under movies.
2. /movies/movie/title[../director/familyName=../producer/familyName] - for each title, look up the
tree and down again to find the familyName under director under movie, and again to find the
familyName under producer under movie, and retrieve the titles where these two are equal.
Note this is equivalent to:
/movies/movie[director/familyName=producer/familyName]/title
For simplicity, we assume that directors and producers can be uniquely identified by their family names.
Comparison of sequences might be defined in a number of ways, including:
1. The condition holds if any director’s name matches any producer’s name (existential comparison).
2. The condition holds if any director’s name matches all producers’ names.
3. The condition holds if the sequences are identical – i.e., same number of names, same names, in the same order.
4. The condition holds if the first director’s name matches the first producer’s name.
XPath (1.0 and 2.0) uses the first definition of “=” when comparing sequences.
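The existential definition can be contrasted with the “identical sequences” definition in a few lines of Python. The helper name general_equals and the second movie (a placeholder in which one producer shares the director's family name) are inventions for this sketch.

```python
import xml.etree.ElementTree as ET

MOVIES_XML = """
<movies>
  <movie>
    <title>An American Werewolf in London</title>
    <director><familyName>Landis</familyName></director>
    <producer><familyName>Folsey</familyName></producer>
    <producer><familyName>Guber</familyName></producer>
    <producer><familyName>Peters</familyName></producer>
  </movie>
  <movie>
    <title>Placeholder Movie</title>
    <director><familyName>Smith</familyName></director>
    <producer><familyName>Jones</familyName></producer>
    <producer><familyName>Smith</familyName></producer>
  </movie>
</movies>
"""

def general_equals(left, right):
    """Definition 1: true if any value on the left equals any value on the right."""
    return any(l == r for l in left for r in right)

movies = ET.fromstring(MOVIES_XML)
existential, identical = [], []
for m in movies.findall("movie"):
    directors = [e.text for e in m.findall("director/familyName")]
    producers = [e.text for e in m.findall("producer/familyName")]
    if general_equals(directors, producers):
        existential.append(m.findtext("title"))
    if directors == producers:  # definition 3: same names, same order
        identical.append(m.findtext("title"))

print(existential)
print(identical)
```

Under the existential definition the placeholder movie matches even though one of its producers has a different name; under definition 3 neither movie matches.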
What about comparing nodes rather than values?
An element node might contain just text (like familyName), or it might be a complex element, containing subelements
(like movie). If you want to know whether two nodes are equal, you might choose:
1. Two nodes are equal if their string values (all the text content between the start and end tags of the element) are
equal.
2. Two nodes are equal if they have the same children, in the same order, and those children are equal.
3. Two nodes are equal if they are the same node – that is, you are not comparing two different nodes that happen to
have the same content, but you are comparing the exact same node with itself (i.e., the same node from the same
document).
XPath (1.0 and 2.0) uses the first definition of “=” when comparing nodes, which can lead to some odd results. XQuery
1.0 and XPath 2.0 introduced the deep-equal() function, so you can make the comparison in the second definition, and the
“is” operator, to enable the comparison in the third definition.
XQuery 1.0 and XPath 2.0 also introduced a new set of comparison operators (eq, ne, lt, le, gt, and ge) for comparing
values rather than sequences. These new operators are called value comparison operators to distinguish them from the
general comparison operators (=, !=, <, <=, >, and >=).
When querying XML, we are often dealing with sequences (ordered lists) rather than single items, and the items in a
sequence may be values (strings, integers, dates, …) or nodes (elements, complex elements such as those containing child
elements, attributes, …), or a mixture of values and nodes.
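The three notions of node equality can be illustrated in Python. The functions string_value and deep_equal below are simplified stand-ins written for this sketch (deep_equal ignores details such as trailing whitespace and comments, so it is not a full implementation of the XPath 2.0 function), and the sample document is constructed to show one of the “odd results” of comparing string values.

```python
import copy
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<people>"
    "<director><familyName>Landis</familyName><givenName>John</givenName></director>"
    "<producer><givenName>Landis</givenName><familyName>John</familyName></producer>"
    "</people>"
)

def string_value(elem):
    """Definition 1: all text content between the start and end tags."""
    return "".join(elem.itertext())

def deep_equal(a, b):
    """Definition 2 (simplified): same name, attributes, text, and children in order."""
    return (
        a.tag == b.tag
        and a.attrib == b.attrib
        and (a.text or "") == (b.text or "")
        and len(a) == len(b)
        and all(deep_equal(x, y) for x, y in zip(a, b))
    )

director = doc.find("director")
producer = doc.find("producer")
clone = copy.deepcopy(director)

same_string = string_value(director) == string_value(producer)  # odd but true
same_structure = deep_equal(director, producer)                 # structures differ
clone_deep_equal = deep_equal(director, clone)                  # equal content
clone_identical = director is clone                             # definition 3: different nodes
self_identical = director is doc.find("director")               # the very same node

print(same_string, same_structure, clone_deep_equal, clone_identical, self_identical)
```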
3.2.2 Some Additional Wrinkles
book (attribute Title="Querying XML")
    chapter
        title: Querying
        body
            section
                title: Introduction
                para
                    text: This is the
                    emph: first
                    text: paragraph of
                    link (attribute Href="http://mkp.com"): the book
                    text: .
                ...
            section
                title: ...
                para: ...
            ...
    chapter
        title: ...
        body: ...
    ...
Figure 3-3 book Document Tree
(1) Find an Element by Name
Example 3-7 Find an Element by Name
English query
Find all nodes named “title.”
XPath Expression
//title
Expression
1. // - select all nodes in the document (select the root node and all its descendants).
2. //title - from those nodes, select the nodes named “title”.
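In Python's xml.etree.ElementTree the same two-stage idea (visit every node, keep those named “title”) is what Element.iter does. The book document below is a cut-down version of Figure 3-3 assumed for this sketch.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """
<book Title="Querying XML">
  <chapter>
    <title>Querying</title>
    <body>
      <section>
        <title>Introduction</title>
        <para>This is the <emph>first</emph> paragraph of <link Href="http://mkp.com">the book</link>.</para>
      </section>
    </body>
  </chapter>
</book>
"""

book = ET.fromstring(BOOK_XML)

# iter("title") walks the element and all its descendants in document order
# and yields only the elements named "title".
all_titles = [t.text for t in book.iter("title")]

# The path form gives the same elements for this document.
via_path = [t.text for t in book.findall(".//title")]

print(all_titles)
```

The Title attribute on book is not returned: it is an attribute rather than an element, and names are case-sensitive.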
(2) Attributes vs. Elements
Attributes are different from elements (they have different properties), and XML query languages should and do treat
elements and attributes differently.
For example, attributes have no implicit order, they cannot have children, and they do not have a parent-child
relationship with the element they appear in.
(3) Mixed-Content Models and Text Nodes
Data-centric XML documents
: Figure 3-2. It has data only at its leaf nodes, and each leaf node has exactly one piece of data.
Document-centric XML documents
: Figure 3-3. Element tags are sprinkled throughout the text.
/book/chapter[1]/body/section[1]/para[1]/text()
returns a sequence of the text nodes, i.e., the sequence (“This is the”, “paragraph of”, “.”)
string(/book/chapter[1]/body/section[1]/para[1])
returns all the character data between the start and end tags for this paragraph (and no
attribute data), i.e., “This is the first paragraph of the book.”
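Mixed content can be explored with Python's xml.etree.ElementTree, which has no text-node objects: the text directly inside an element is split between its own .text and the .tail of each child. The sketch uses the para element name shown in Figure 3-3 and an assumed cut-down book document.

```python
import xml.etree.ElementTree as ET

BOOK_XML = (
    '<book Title="Querying XML"><chapter><title>Querying</title><body><section>'
    "<title>Introduction</title>"
    '<para>This is the <emph>first</emph> paragraph of '
    '<link Href="http://mkp.com">the book</link>.</para>'
    "</section></body></chapter></book>"
)

book = ET.fromstring(BOOK_XML)
para = book.find("chapter[1]/body/section[1]/para[1]")

# Counterpart of .../para[1]/text(): only the text that is a direct child of para.
text_nodes = [para.text] + [child.tail for child in para]
text_nodes = [t for t in text_nodes if t]

# Counterpart of string(.../para[1]): all character data, including the text
# inside emph and link, but no attribute values.
string_value = "".join(para.itertext())

print(text_nodes)
print(string_value)
```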
(4) Querying the Structure Only
/*/*[1]
: returns the title by starting at the root node and navigating to the first child of the top-level element,
without naming any element (Figure 3-1).
(5) Building Up a Result Set, typically in XML
XPath is limited in this area – you need XQuery (or XSLT) to build up XML result sets.
(6) Documents, Collections, Elements
An XML query on documents like the movies document needs to return fine-grained results, i.e., results that are subtrees
at any level (including leaves).
3.2.3 Summary – Things to Consider
3.3 What Do You Know about Your Data?
(1) Knowing about Structure
It is very difficult to query data if you don’t know how it’s laid out.
(2) Knowing about Data Types
If you want to include conditions in your query, in general you’ll need to know about the data
types you are dealing with.
(3) Knowing about the Semantics of the Data
You need to know the semantics of the XML data in order to write sensible queries.
In fact, XML documents typically contain content plus metadata plus marked-up content.
At best, tag names merely give some hints about what the data represents.
An XML document on its own is not self-describing, but an XML document plus a DTD or XML
Schema plus an XML language definition does fully describe the data.
3.4 Some Ways to Query XML Today
(1) Document Object Model (DOM)
: defines an interface to the data and structure of an XML (or HTML) document so that a program can navigate
and manipulate them. Using the DOM API, you can write a program to return values of named elements/attributes or to
walk the XML tree and return values of elements and/or attributes at specified positions in the tree. The DOM API also
supports manipulation of the tree – inserting and deleting elements and attributes.
(2) Simple API for XML (SAX)
: is an event-based API for XML, for use with Java and other languages. To write a SAX program you will need
to obtain a SAX XML parser and then register an event handler to define a callback method for elements, for text, and
for comments. SAX is a serial access API, which means you cannot go back up the tree, or rearrange nodes, as you can
with DOM.
(3) Streaming API for XML (StAX)
: is a Java pull parsing API. That is, StAX lets you pull the next item in the document as it parses. You (the
calling program) decide when to pull the next item (whereas with an event-based parser, it’s the parser that decides when
to cause the calling program to take some action). StAX also lets you write XML to an output stream.
(4) XQuery : is a language defined by the W3C specifically for querying XML data.
(5) SQL/XML
: provides an API for querying XML data in a SQL environment, using XPath and XQuery to query the XML
structure and values.
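The first three styles can be compared side by side in Python, whose standard library includes a DOM implementation (xml.dom.minidom), a SAX parser (xml.sax), and a pull parser (xml.dom.pulldom). The pull parser is used here only as an analog of StAX, which is a Java API; the sample document, the TitleHandler class, and the text_of helper are assumptions of this sketch.

```python
import xml.dom.minidom
import xml.dom.pulldom
import xml.sax

XML = (
    "<movies>"
    "<movie><title>The Fifth Element</title></movie>"
    "<movie><title>An American Werewolf in London</title></movie>"
    "</movies>"
)

def text_of(node):
    """Concatenate the text nodes that are direct children of a DOM node."""
    return "".join(c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE)

# DOM: the whole tree is built in memory, then navigated freely.
dom = xml.dom.minidom.parseString(XML)
dom_titles = [text_of(n) for n in dom.getElementsByTagName("title")]

# SAX: the parser drives; callbacks fire in document order, with no going back.
class TitleHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self.titles.append("")

    def characters(self, content):
        if self._in_title:
            self.titles[-1] += content

    def endElement(self, name):
        if name == "title":
            self._in_title = False

handler = TitleHandler()
xml.sax.parseString(XML.encode("utf-8"), handler)
sax_titles = handler.titles

# Pull: the calling program asks for the next event when it is ready.
pull_titles = []
events = xml.dom.pulldom.parseString(XML)
for event, node in events:
    if event == xml.dom.pulldom.START_ELEMENT and node.tagName == "title":
        events.expandNode(node)  # read ahead to the matching end tag
        pull_titles.append(text_of(node))

print(dom_titles)
print(sax_titles)
print(pull_titles)
```

All three produce the same list; the difference lies in who controls the traversal and how much of the document must be held in memory at once.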
3.5 Chapter Summary
: XQuery 1.0 and XPath 2.0 will be the standard languages for querying XML.