XML, XML Schema, XPath and XQuery
Slides collated from various sources, many from Dan Suciu at Univ. of Washington
XML
W3C standard to complement HTML
• origins: structured text SGML
• motivation:
– HTML describes presentation
– XML describes content
• HTML4.0 ∈ XML ⊂ SGML
• http://www.w3.org/TR/2000/REC-xml-20001006 (version 2, 10/2000)
CS561 - Spring 2005.
From HTML to XML
HTML describes the presentation
HTML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i>
Abiteboul, Hull, Vianu
<br> Addison Wesley, 1995
<p> <i> Data on the Web </i>
Abiteboul, Buneman, Suciu
<br> Morgan Kaufmann, 1999
XML
<bibliography>
<book> <title> Foundations… </title>
<author> Abiteboul </author>
<author> Hull </author>
<author> Vianu </author>
<publisher> Addison Wesley </publisher>
<year> 1995 </year>
</book>
…
</bibliography>
XML describes the content
XML Terminology
• tags: book, title, author, …
• start tag: <book>, end tag: </book>
• elements: <book>…</book>, <author>…</author>
• elements are nested
• empty element: <red></red>, abbreviated <red/>
• an XML document: single root element
well-formed XML document: every start tag has a matching end tag and elements are properly nested
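A minimal sketch of a well-formedness check, assuming Python's standard xml.etree.ElementTree parser is an acceptable stand-in for "an XML parser". The helper name is_well_formed and the sample strings are illustrative, not from the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<book><title>Foundations</title><author>Hull</author></book>"
bad_nesting = "<book><title>Foundations</book></title>"  # tags overlap
two_roots = "<book/><book/>"                              # no single root

for sample in (good, bad_nesting, two_roots):
    print(is_well_formed(sample))
```

The parser rejects both overlapping tags and a second root element, which are the two conditions listed above.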
More XML: Attributes
<book price="55" currency="USD">
<title> Foundations of Databases </title>
<author> Abiteboul </author>
…
<year> 1995 </year>
</book>
attributes are alternative ways to represent data
More XML: Oids and References
<person id="o555"> <name> Jane </name> </person>
<person id="o456"> <name> Mary </name>
<children idref="o123 o555"/>
</person>
<person id="o123" mother="o456"><name>John</name>
</person>
oids and references in XML are just syntax
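To illustrate "just syntax": a plain parser does not follow id/idref values, so the application has to resolve them itself. The sketch below assumes a wrapping <people> root (not on the slide, added so the fragment is a single document) and uses hypothetical helper names children_of and mother_of.

```python
import xml.etree.ElementTree as ET

doc = """
<people>
  <person id="o555"><name>Jane</name></person>
  <person id="o456"><name>Mary</name>
    <children idref="o123 o555"/>
  </person>
  <person id="o123" mother="o456"><name>John</name></person>
</people>
"""
root = ET.fromstring(doc)

# The parser treats id/idref as ordinary attributes; build the index by hand.
by_id = {p.get("id"): p for p in root.iter("person")}


def children_of(person_id):
    """Names of the persons referenced by the idref list of <children>."""
    kids = by_id[person_id].find("children")
    if kids is None:
        return []
    return [by_id[ref].findtext("name").strip() for ref in kids.get("idref").split()]


def mother_of(person_id):
    """Name of the person referenced by the mother attribute, if any."""
    ref = by_id[person_id].get("mother")
    return by_id[ref].findtext("name").strip() if ref is not None else None


print(children_of("o456"))
print(mother_of("o123"))
```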
XML Namespaces
• http://www.w3.org/TR/REC-xml-names (1/99)
• name ::= [prefix:]localpart
<book xmlns:isbn="www.isbn-org.org/def">
<title> … </title>
<number> 15 </number>
<isbn:number> …. </isbn:number>
</book>
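A small sketch of how a namespace-aware parser tells <number> and <isbn:number> apart, assuming Python's xml.etree.ElementTree. The element contents ("15", "placeholder") are made-up values; the namespace URI is the one from the slide.

```python
import xml.etree.ElementTree as ET

doc = """
<book xmlns:isbn="www.isbn-org.org/def">
  <title>some title</title>
  <number>15</number>
  <isbn:number>placeholder</isbn:number>
</book>
"""
root = ET.fromstring(doc)

# Prefix-to-URI map used only for lookups; the prefix itself is arbitrary.
ns = {"isbn": "www.isbn-org.org/def"}

plain = root.find("number")               # unqualified name
qualified = root.find("isbn:number", ns)  # name in the isbn namespace

print(plain.tag)
print(qualified.tag)
```

ElementTree stores the qualified name as {URI}localpart, so the two elements never collide even though both have the local part number.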
XML Namespaces
• syntactic: <number> , <isbn:number>
• semantic: provide URL for schema
<tag xmlns:mystyle="http://…">
…
defined here
<mystyle:title> … </mystyle:title>
<mystyle:number> …
</tag>
XML Schemas
• http://www.w3.org/TR/xmlschema-1 (10/2000)
• generalizes DTDs
• uses XML syntax
• two documents: structure and datatypes
– http://www.w3.org/TR/xmlschema-1
– http://www.w3.org/TR/xmlschema-2
• XML-Schema is complex
XML Schemas
<xsd:element name="paper" type="papertype"/>
<xsd:complexType name="papertype">
  <xsd:sequence>
    <xsd:element name="title" type="xsd:string"/>
    <xsd:element name="author" minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element name="year"/>
    <xsd:choice>
      <xsd:element name="journal"/>
      <xsd:element name="conference"/>
    </xsd:choice>
  </xsd:sequence>
</xsd:complexType>
DTD: <!ELEMENT paper (title,author*,year,(journal|conference))>
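The DTD content model above is a regular expression over child element names. As a rough sketch (not a schema validator), the same model can be checked with Python's re module against the sequence of child tags. The names PAPER_MODEL and matches_paper_model are illustrative assumptions.

```python
import re
import xml.etree.ElementTree as ET

# DTD content model: (title, author*, year, (journal | conference)).
# Each child tag is followed by one space so that names cannot run together.
PAPER_MODEL = re.compile(r"title (author )*year (journal |conference )")


def matches_paper_model(xml_text: str) -> bool:
    """Check only the order and number of the children of <paper>."""
    paper = ET.fromstring(xml_text)
    child_names = "".join(child.tag + " " for child in paper)
    return paper.tag == "paper" and PAPER_MODEL.fullmatch(child_names) is not None


ok = ("<paper><title>t</title><author>a</author><author>b</author>"
      "<year>1999</year><journal>j</journal></paper>")
no_year = "<paper><title>t</title><author>a</author><journal>j</journal></paper>"

print(matches_paper_model(ok))
print(matches_paper_model(no_year))
```

This checks structure only; datatypes such as xsd:string are outside the sketch.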
Elements vs. Types in XML Schema
<xsd:element name="person">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="address" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
<xsd:element name="person" type="ttt"/>
<xsd:complexType name="ttt">
  <xsd:sequence>
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="address" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>
DTD: <!ELEMENT person (name,address)>
Elements vs. Types in XML Schema
• Types:
– Simple types (integers, strings, ...)
– Complex types (regular expressions, like in DTDs)
• Element-type-element alternation:
– Root element has a complex type
– That type is a regular expression of elements
– Those elements have their complex types...
– ...
– On the leaves we have simple types
Local and Global Types in XML Schema
• Local type:
<xsd:element name="person">
[define locally the person’s type]
</xsd:element>
• Global type:
<xsd:element name="person" type="ttt"/>
<xsd:complexType name="ttt">
[define here the type ttt]
</xsd:complexType>
Global types: can be reused in other elements
Local vs. Global Elements in XML Schema
• Local element:
<xsd:complexType name="ttt">
  <xsd:sequence>
    <xsd:element name="address" type="..."/>...
  </xsd:sequence>
</xsd:complexType>
• Global element:
<xsd:element name="address" type="..."/>
<xsd:complexType name="ttt">
  <xsd:sequence>
    <xsd:element ref="address"/> ...
  </xsd:sequence>
</xsd:complexType>
Global elements: like in DTDs
Regular Expressions in XML Schema
Recall the element-type-element alternation:
<xsd:complexType name="....">
[regular expression on elements]
</xsd:complexType>
Regular expressions:
• <xsd:sequence> A B C </...> = A B C
• <xsd:choice> A B C </...> = A | B | C
• <xsd:group> A B C </...> = (A B C)
• <xsd:... minOccurs="0" maxOccurs="unbounded"> .. </...> = (...)*
• <xsd:... minOccurs="0" maxOccurs="1"> .. </...> = (...)?
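The occurrence rules can be sketched as a small function that turns minOccurs/maxOccurs into a regular-expression quantifier. The function name occurs_to_regex is hypothetical; the defaults (1 and 1) are the XML Schema defaults for both attributes.

```python
def occurs_to_regex(body: str, min_occurs: int = 1, max_occurs="1") -> str:
    """Render a particle with minOccurs/maxOccurs as a regular expression.

    `max_occurs` is an integer or the string "unbounded", as in XML Schema.
    """
    if max_occurs == "unbounded":
        if min_occurs == 0:
            return f"({body})*"
        if min_occurs == 1:
            return f"({body})+"
        return f"({body}){{{min_occurs},}}"
    max_occurs = int(max_occurs)
    if (min_occurs, max_occurs) == (1, 1):
        return f"({body})"
    if (min_occurs, max_occurs) == (0, 1):
        return f"({body})?"
    return f"({body}){{{min_occurs},{max_occurs}}}"


print(occurs_to_regex("A B C", 0, "unbounded"))
print(occurs_to_regex("A B C", 0, 1))
```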
Derived Types by Extensions
<complexType name="Address">
<sequence> <element name="street" type="string"/>
<element name="city" type="string"/>
</sequence>
</complexType>
<complexType name="USAddress">
<complexContent>
<extension base="ipo:Address">
<sequence> <element name="state" type="ipo:USState"/>
<element name="zip" type="positiveInteger"/>
</sequence>
</extension>
</complexContent>
</complexType>
Corresponds to inheritance
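A loose analogy in Python, assuming dataclasses as a stand-in for complex types: the derived class appends its fields after the inherited ones, just as the extension appends state and zip to the base sequence. Typing state as str (instead of ipo:USState) and the sample values are simplifying assumptions.

```python
from dataclasses import dataclass, fields


@dataclass
class Address:
    street: str
    city: str


@dataclass
class USAddress(Address):
    # Fields added by the extension come after the inherited ones.
    state: str
    zip: int


addr = USAddress(street="1 Main St", city="Springfield", state="IL", zip=62701)

print([f.name for f in fields(USAddress)])
print(isinstance(addr, Address))
```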
Keys in XML Schema
XML:
<purchaseReport>
<regions>
<zip code="95819">
<part number="872-AA" quantity="1"/>
<part number="926-AA" quantity="1"/>
<part number="833-AA" quantity="1"/>
<part number="455-BX" quantity="1"/>
</zip>
<zip code="63143">
<part number="455-BX" quantity="4"/>
</zip>
</regions>
<parts>
<part number="872-AA">Lawnmower</part>
<part number="926-AA">Baby Monitor</part>
<part number="833-AA">Lapis Necklace</part>
<part number="455-BX">Sturdy Shelves</part>
</parts>
</purchaseReport>
XML Schema:
<key name="NumKey">
<selector xpath="parts/part"/>
<field xpath="@number"/>
</key>
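A sketch of what the NumKey constraint asks for, assuming a one-field key whose field is an attribute. The function key_violations is a hypothetical helper, and the report is an abbreviated copy of the one above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

report = ET.fromstring("""
<purchaseReport>
  <regions>
    <zip code="95819">
      <part number="872-AA" quantity="1"/>
      <part number="455-BX" quantity="1"/>
    </zip>
    <zip code="63143">
      <part number="455-BX" quantity="4"/>
    </zip>
  </regions>
  <parts>
    <part number="872-AA">Lawnmower</part>
    <part number="455-BX">Sturdy Shelves</part>
  </parts>
</purchaseReport>
""")


def key_violations(root, selector, attribute):
    """Return (missing, duplicates) for a key on `attribute` of the selected nodes."""
    values = [e.get(attribute) for e in root.findall(selector)]
    missing = sum(v is None for v in values)            # existence
    counts = Counter(v for v in values if v is not None)
    duplicates = sorted(v for v, n in counts.items() if n > 1)  # uniqueness
    return missing, duplicates


print(key_violations(report, "parts/part", "number"))
print(key_violations(report, "regions/zip/part", "number"))
```

The selector matters: the part numbers are unique under parts/part, but the same number may repeat under regions/zip/part, so a key there would be violated.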
Keys in XML Schema
• In general, two flavors:
<key name="someDummyNameHere">
  <selector xpath="p"/>
  <field xpath="p1"/>
  <field xpath="p2"/>
  . . .
  <field xpath="pk"/>
</key>
<unique name="someDummyNameHere">
  <selector xpath="p"/>
  <field xpath="p1"/>
  <field xpath="p2"/>
  . . .
  <field xpath="pk"/>
</unique>
Note: all XPath expressions “start” at the element currently being defined
Each field must identify a single node
Keys in XML Schema
• Unique = guarantees uniqueness
• Key = guarantees uniqueness and existence
• All XPath expressions are “restricted”:
– /a/b | /a/c OK for selector
– //a/b/*/c OK for field
• Note: better than DTD’s ID mechanism
Keys in XML Schema
• Examples
Recall: each person must have a single forename and a single surname
<key name="fullName">
<selector xpath=".//person"/>
<field xpath="forename"/>
<field xpath="surname"/>
</key>
<unique name="nearlyID">
<selector xpath=".//*"/>
<field xpath="@id"/>
</unique>
Foreign Keys in XML Schema
• Example
<keyref name="personRef" refer="fullName">
<selector xpath=".//personPointer"/>
<field xpath="@first"/>
<field xpath="@last"/>
</keyref>
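A sketch of the check behind personRef, assuming the fullName key (forename, surname) from the previous example. The sample document and its names are invented for illustration; only the element and attribute names come from the slides.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<people>
  <person><forename>Jane</forename><surname>Doe</surname></person>
  <person><forename>John</forename><surname>Roe</surname></person>
  <personPointer first="Jane" last="Doe"/>
  <personPointer first="Mary" last="Major"/>
</people>
""")

# Values of the referenced key: one (forename, surname) pair per person.
key_values = {(p.findtext("forename"), p.findtext("surname"))
              for p in root.iter("person")}

# Values of the keyref fields: one (@first, @last) pair per personPointer.
references = [(r.get("first"), r.get("last")) for r in root.iter("personPointer")]

# A keyref is violated by every reference with no matching key value.
dangling = [ref for ref in references if ref not in key_values]
print(dangling)
```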
XPATH
XPath
• Goal: select (access) some of the nodes of a document
• Main XPath construct: axis navigation
• An XPath path consists of one or more navigation steps, separated by /
• Navigation step: axis + node-test + predicates
• Examples
– /descendant::node()/child::author
– /descendant::node()/child::author[parent::node()/attribute::booktitle = "XML"][2]
• XPath also offers shortcuts
– no axis means child
– // means /descendant-or-self::node()/
XPath - Child axis navigation
• author is shorthand for child::author. Examples:
– aaa -- all the child nodes labeled aaa (1,3)
– aaa/bbb -- all the bbb grandchildren of aaa children (4)
– */bbb -- all the bbb grandchildren of any child (4,6)
[Figure: tree below the context node, with numbered nodes 1 (aaa), 2 (ccc), 3 (aaa), 4 (bbb), 5 (aaa), 6 (bbb), 7 (ccc)]
– . -- the context node
– / -- the root node
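The child-axis examples can be tried with the XPath subset of Python's xml.etree.ElementTree. The exact tree shape below is an assumption chosen to agree with the numbered results quoted above (nodes 1 to 7 are encoded as id attributes); the root name ctx is made up.

```python
import xml.etree.ElementTree as ET

# Assumed tree: the context node has children 1, 2, 3 and grandchildren 4..7.
ctx = ET.fromstring("""
<ctx>
  <aaa id="1"><bbb id="4"/><aaa id="5"/></aaa>
  <ccc id="2"><bbb id="6"/></ccc>
  <aaa id="3"><ccc id="7"/></aaa>
</ctx>
""")


def ids(path):
    """Evaluate `path` from the context node and return the matched node ids."""
    return [node.get("id") for node in ctx.findall(path)]


print(ids("aaa"))       # aaa children
print(ids("aaa/bbb"))   # bbb grandchildren below aaa children
print(ids("*/bbb"))     # bbb grandchildren below any child
```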
XPath - Child axis navigation
– /doc -- all the doc children of the root
– ./aaa -- all the aaa children of the context node (equivalent to aaa)
– text() -- all the text children of the context node
– node() -- all the children of the context node (includes text nodes, but not attribute nodes, which are not children)
– .. -- parent of the context node
– .// -- the context node and all its descendants
– // -- the root node and all its descendants
– //text() -- all the text nodes in the document
Predicates
– *[2] -- the second child element of the context node
– chapter[5] -- the fifth chapter child of the context node
– *[last()] -- the last child element of the context node
– chapter[title="introduction"] -- the chapter children of the context node that have one or more title children whose string-value is "introduction" (the string-value is the concatenation of all the text on descendant text nodes)
– person[.//firstname = "joe"] -- the person children of the context node that have in their descendants a firstname element with string-value "joe"
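A few of these predicates can be tried in Python's xml.etree.ElementTree, which implements only a subset of XPath (positional predicates on a named tag, last(), and tag='text'). The sample book and the helper titles are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring("""
<book>
  <chapter><title>introduction</title></chapter>
  <chapter><title>data model</title></chapter>
  <chapter><title>queries</title></chapter>
</book>
""")


def titles(path):
    """Evaluate `path` from <book> and return the titles of the matches."""
    return [chapter.findtext("title") for chapter in book.findall(path)]


print(titles("chapter[2]"))                     # second chapter child
print(titles("chapter[last()]"))                # last chapter child
print(titles("chapter[title='introduction']"))  # test on a child's string-value
```

Predicates that navigate inside the brackets, such as person[.//firstname = "joe"], are outside ElementTree's subset and need a full XPath engine.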
Axis navigation
• So far, nearly all our expressions have moved us down by moving to child nodes. Exceptions were
– . -- stay where you are
– / -- go to the root
– // -- all descendants of the root
– .// -- all descendants of the context node
• XPath has several axes: ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self
– Some of these (self, parent) describe single nodes, others describe sequences of nodes.
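A sketch of four of these axes over element nodes only, assuming Python's xml.etree.ElementTree. ElementTree keeps no parent pointers, so the sketch builds a child-to-parent map first; the tiny document and all function names are illustrative. The document root node above the top element is not modelled.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b><d/><e/></b><c><f/></c></a>")

# Child -> parent map, needed for the upward and sideways axes.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """ancestor axis: parent, grandparent, ... up to the top element."""
    out = []
    while node in parent_of:
        node = parent_of[node]
        out.append(node)
    return out


def descendants(node):
    """descendant axis: all elements below node, in document order."""
    return list(node.iter())[1:]


def following_siblings(node):
    """following-sibling axis: later children of the same parent."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[siblings.index(node) + 1:]


def preceding_siblings(node):
    """preceding-sibling axis: earlier children of the same parent."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[:siblings.index(node)]


def tags(nodes):
    return [n.tag for n in nodes]


d = root.find("b/d")
print(tags(ancestors(d)), tags(following_siblings(d)), tags(descendants(root)))
```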
XPath Navigation Axes
[Figure: the navigation axes relative to a context node: ancestor, preceding-sibling, following-sibling, self, child, preceding, attribute, following, namespace, descendant]
XPath abbreviated syntax
(nothing)    child::
@            attribute::
//           /descendant-or-self::node()/
.            self::node()
.//          descendant-or-self::node()/
..           parent::node()
/            (document root)
Query Languages - XQuery
Summary of XQuery
• FLWR expressions
• FOR and LET expressions
• Collections and sorting
Resources
XQuery: A Query Language for XML, Chamberlin, Florescu, et al.
W3C recommendation: www.w3.org/TR/xquery/
XQuery
• Based on Quilt
(which is based on XML-QL)
• http://www.w3.org/TR/xquery/ (2/2001)
• XML Query data model (ordered)
FLWR (“Flower”) Expressions
FOR ... LET... FOR... LET...
WHERE...
RETURN...
XQuery
Find all book titles published after 1995:
FOR $x IN document("bib.xml")/bib/book
WHERE $x/year > 1995
RETURN $x/title
Result:
<title> abc </title>
<title> def </title>
<title> ghi </title>
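For readers who know Python better than XQuery, the FOR / WHERE / RETURN query reads much like a list comprehension. This is an analogy, not XQuery; the inline bib data is invented so that the result matches the titles shown above.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""
<bib>
  <book><title>abc</title><year>1996</year></book>
  <book><title>old</title><year>1990</year></book>
  <book><title>def</title><year>1999</year></book>
  <book><title>ghi</title><year>2001</year></book>
</bib>
""")

titles = [
    book.find("title")                     # RETURN $x/title
    for book in bib.findall("book")        # FOR $x IN .../bib/book
    if int(book.findtext("year")) > 1995   # WHERE $x/year > 1995
]

print([t.text for t in titles])
```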
XQuery
For each author of a book by Morgan
Kaufmann, list all books she published:
FOR $a IN distinct(document("bib.xml")
/bib/book[publisher="Morgan Kaufmann"]/author)
RETURN <result>
$a,
FOR $t IN document("bib.xml")/bib/book[author=$a]/title
RETURN $t
</result>
distinct = a function that eliminates duplicates
XQuery
Result:
<result>
<author>Jones</author>
<title> abc </title>
<title> def </title>
</result>
<result>
<author> Smith </author>
<title> ghi </title>
</result>
XQuery
• FOR $x IN expr -- binds $x to each element in the list expr
• LET $x := expr -- binds $x to the entire list expr
– Useful for common subexpressions and for aggregations
XQuery
<big_publishers>
FOR $p IN distinct(document("bib.xml")//publisher)
LET $b := document("bib.xml")//book[publisher = $p]
WHERE count($b) > 100
RETURN $p
</big_publishers>
count = a (aggregate) function that returns the number of elements
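A Python analogy for the FOR / LET / WHERE count pattern, with the threshold as a parameter so that a three-book sample can exercise it. The function big_publishers and the data are illustrative assumptions, not XQuery.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""
<bib>
  <book><title>a</title><publisher>P1</publisher></book>
  <book><title>b</title><publisher>P1</publisher></book>
  <book><title>c</title><publisher>P2</publisher></book>
</bib>
""")


def big_publishers(bib, threshold):
    """Publishers with more than `threshold` books."""
    names = sorted({p.text for p in bib.iter("publisher")})   # distinct(...//publisher)
    result = []
    for name in names:                                        # FOR $p IN ...
        books = [b for b in bib.iter("book")                  # LET $b := ...//book[publisher = $p]
                 if any(p.text == name for p in b.findall("publisher"))]
        if len(books) > threshold:                            # WHERE count($b) > threshold
            result.append(name)                               # RETURN $p
    return result


print(big_publishers(bib, 1))
```

The LET binding holds the whole list of books for one publisher, which is what makes the aggregate count possible.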
XQuery
Find books whose price is larger than average:
LET $a := avg(document("bib.xml")/bib/book/@price)
FOR $b IN document("bib.xml")/bib/book
WHERE $b/@price > $a
RETURN $b
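The same query as a Python analogy: the average is computed once over the whole collection (the LET), then each book is compared against it (the FOR). Prices and titles are made-up sample data.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""
<bib>
  <book price="55"><title>x</title></book>
  <book price="40"><title>y</title></book>
  <book price="25"><title>z</title></book>
</bib>
""")

books = bib.findall("book")

# LET $a := avg(.../bib/book/@price): one value for the whole collection.
average = sum(float(b.get("price")) for b in books) / len(books)

# FOR $b IN .../bib/book WHERE $b/@price > $a RETURN $b
expensive = [b for b in books if float(b.get("price")) > average]

print(average, [b.findtext("title") for b in expensive])
```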
XQuery
Summary:
• FOR-LET-WHERE-RETURN = FLWR
FOR/LET Clauses → list of tuples → WHERE Clause → list of tuples → RETURN Clause → instance of XQuery data model
FOR vs. LET
FOR
• Binds node variables → iteration
LET
• Binds collection variables → one value
FOR vs. LET
FOR $x IN document("bib.xml")/bib/book
RETURN <result> $x </result>
Returns:
<result> <book>...</book></result>
<result> <book>...</book></result>
<result> <book>...</book></result>
...
LET $x := document("bib.xml")/bib/book
RETURN <result> $x </result>
Returns:
<result> <book>...</book>
<book>...</book>
<book>...</book>
...
</result>
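The difference can be mimicked in Python with plain strings standing in for the book elements (an analogy only): FOR wraps each item separately, LET wraps the whole collection once.

```python
# Stand-ins for the <book> elements selected by /bib/book.
books = ["<book>A</book>", "<book>B</book>", "<book>C</book>"]

# FOR $x IN books RETURN <result> $x </result>: one result per book.
for_results = [f"<result>{book}</result>" for book in books]

# LET $x := books RETURN <result> $x </result>: one result for all books.
let_result = "<result>" + "".join(books) + "</result>"

print(len(for_results))
print(let_result)
```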
Collections in XQuery
• Ordered and unordered collections
– /bib/book/author = an ordered collection
– distinct(/bib/book/author) = an unordered collection
• LET $a := /bib/book -- $a is a collection
• $b/author -- a collection (several authors...)
RETURN <result> $b/author </result>
Returns:
<result> <author>...</author>
<author>...</author>
<author>...</author>
...
</result>
Sorting in XQuery
<publisher_list>
FOR $p IN distinct(document("bib.xml")//publisher)
RETURN <publisher> <name> $p/text() </name> ,
FOR $b IN document("bib.xml")//book[publisher = $p]
RETURN <book>
$b/title ,
$b/@price
</book> SORTBY(price DESCENDING)
</publisher> SORTBY(name)
</publisher_list>
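A Python analogy for the nested sort: publishers ordered by name, and within each publisher the books ordered by descending price. The function publisher_list returns plain tuples rather than XML, and the data is invented for the example.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""
<bib>
  <book price="30"><title>t1</title><publisher>Zeta</publisher></book>
  <book price="50"><title>t2</title><publisher>Alpha</publisher></book>
  <book price="70"><title>t3</title><publisher>Alpha</publisher></book>
</bib>
""")


def publisher_list(bib):
    """[(publisher name, [(title, price), ...])], sorted as in the query."""
    out = []
    names = sorted({p.text for p in bib.iter("publisher")})   # distinct, then SORTBY(name)
    for name in names:
        books = [b for b in bib.iter("book") if b.findtext("publisher") == name]
        # SORTBY(price DESCENDING) on the inner result
        books.sort(key=lambda b: float(b.get("price")), reverse=True)
        out.append((name, [(b.findtext("title"), float(b.get("price"))) for b in books]))
    return out


print(publisher_list(bib))
```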
Sorting in XQuery
• Sorting arguments: refer to name space of RETURN clause, not FOR clause
• To sort on an element you don’t want to display, first return it, then remove it with an additional query.
If-Then-Else
FOR $h IN //holding
RETURN <holding>
$h/title,
IF $h/@type = "Journal"
THEN $h/editor
ELSE $h/author
</holding> SORTBY (title)
Existential Quantifiers
FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES
contains($p, "sailing")
AND contains($p, "windsurfing")
RETURN $b/title
Universal Quantifiers
FOR $b IN //book
WHERE EVERY $p IN $b//para SATISFIES
contains($p, "sailing")
RETURN $b/title
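SOME and EVERY correspond closely to Python's any() and all(). The sketch below is an analogy over invented data, with substring tests standing in for contains(); note that all() over an empty list is True, which matches EVERY over an empty sequence.

```python
import xml.etree.ElementTree as ET

library = ET.fromstring("""
<library>
  <book><title>both</title>
    <para>sailing and windsurfing</para><para>sailing only</para></book>
  <book><title>split</title>
    <para>sailing</para><para>windsurfing</para></book>
  <book><title>sail</title>
    <para>sailing</para><para>more sailing</para></book>
</library>
""")


def para_texts(book):
    """String-values of all para descendants ($b//para)."""
    return ["".join(p.itertext()) for p in book.iter("para")]


# WHERE SOME $p IN $b//para SATISFIES contains(...) AND contains(...)
some_titles = [b.findtext("title") for b in library.iter("book")
               if any("sailing" in t and "windsurfing" in t for t in para_texts(b))]

# WHERE EVERY $p IN $b//para SATISFIES contains($p, "sailing")
every_titles = [b.findtext("title") for b in library.iter("book")
                if all("sailing" in t for t in para_texts(b))]

print(some_titles)
print(every_titles)
```

The book titled split has both words but never in the same paragraph, so it fails the SOME test: the quantified variable ranges over one paragraph at a time.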