CIS 550
Handout 7 -- XPATH and Quilt
CIS550 Handout 6
XPath
• Primary goal: to give access to selected nodes of a given
document
• XPath's main construct: axis navigation
• An XPath path consists of one or more navigation steps,
separated by /
• A navigation step is a triplet: axis + node-test + list of
predicates
• Examples
– /descendant::node()/child::author
– /descendant::node()/child::author[parent::node()/attribute::booktitle = "XML"][2]
• XPath also offers some shortcuts
– no axis means child
– // means /descendant-or-self::node()/
• XPath/XSL-T quickref
http://www.mulberrytech.com/quickref/index.html
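As a rough illustration of the // shortcut, the following Python sketch uses the standard-library xml.etree.ElementTree module, which implements only a small subset of XPath (abbreviated paths, no explicit axes). The sample document and its id attributes are made up for illustration, and the root element stands in for the document root. The sketch checks that the abbreviated path selects the same elements as a hand-written expansion of descendant-or-self::node()/child::author.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the id attributes only label the nodes.
DOC = """
<bib>
  <book booktitle="XML">
    <author id="a1">first</author>
    <author id="a2">second</author>
  </book>
  <article>
    <author id="a3">third</author>
  </article>
</bib>
"""
root = ET.fromstring(DOC)

# Abbreviated syntax: no axis means child, // means descendant-or-self::node()/
abbreviated = root.findall(".//author")

# Hand expansion: visit every node on descendant-or-self, keep author children.
expanded = [child for node in root.iter() for child in node if child.tag == "author"]

abbrev_ids = sorted(a.get("id") for a in abbreviated)
expanded_ids = sorted(a.get("id") for a in expanded)
print(abbrev_ids == expanded_ids)
```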
XPath- child axis navigation
• author is shorthand for child::author. Examples:
– aaa -- all the child nodes labeled aaa (1,3)
– aaa/bbb -- all the bbb grandchildren reached through aaa children (4)
– */bbb -- all the bbb grandchildren reached through any child (4,6)
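[Figure: a tree rooted at the context node whose nodes are numbered 1-7 and labeled aaa, bbb or ccc; the numbers in parentheses above refer to these nodes.]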
– . -- the context node
– / -- the root node
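The child-axis examples above can be tried out with Python's xml.etree.ElementTree. The tree below is hypothetical: the original figure did not survive, so the nodes are arranged in one way that is consistent with the stated results (1,3), (4) and (4,6). The attribute n carries the node number.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree; ctx plays the role of the context node.
ctx = ET.fromstring("""
<ctx>
  <aaa n="1"><bbb n="4"/><aaa n="5"/></aaa>
  <ccc n="2"><bbb n="6"/></ccc>
  <aaa n="3"><ccc n="7"/></aaa>
</ctx>
""")

def numbers(path):
    """Evaluate path relative to the context node and return node numbers."""
    return [int(e.get("n")) for e in ctx.findall(path)]

print(numbers("aaa"))      # aaa children of the context node: [1, 3]
print(numbers("aaa/bbb"))  # bbb children of aaa children: [4]
print(numbers("*/bbb"))    # bbb children of any child: [4, 6]
```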
XPath- child axis navigation (cont)
– /doc -- all the doc children of the root
– ./aaa -- all the aaa children of the context node
(equivalent to aaa)
– text() -- all the text children of the context node
– node() -- all the children of the context node (includes
text nodes, but not attribute nodes, which are not children)
– .. -- parent of the context node
– .// -- the context node and all its descendants
– // -- the root node and all its descendants
– //para -- all the para nodes in the document
– //text() -- all the text nodes in the document
– @font -- the font attribute node of the context node
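A small sketch of some of these expressions in Python's xml.etree.ElementTree follows. ElementTree has no text() or @font steps, so text and attributes are read with the Python accessors itertext() and get() instead; the sample document is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with one chapter and two para elements.
root = ET.fromstring(
    '<doc><chapter font="serif">'
    "<para>one</para><para>two <b>bold</b></para>"
    "</chapter></doc>"
)

paras = root.findall(".//para")            # like //para
para_parents = root.findall(".//para/..")  # like //para/..
all_text = list(root.itertext())           # roughly //text(), in document order
font = root.find("chapter").get("font")    # value of chapter/@font

print(len(paras), [p.tag for p in para_parents], all_text, font)
```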
Predicates
– [2] -- the second child node of the context node
– chapter[5] -- the fifth chapter child of the context node
– [last()] -- the last child node of the context node
– chapter[title="introduction"] -- the chapter children of the
context node that have one or more title children whose
string-value is "introduction" (the string-value is the
concatenation of all the text on descendant text nodes)
– person[.//firstname = "joe"] -- the person children of the
context node that have in their descendants a firstname
element with string-value "joe"
– From the XPath specification:
NOTE: If $x is bound to a node set then $x = "foo" does not mean
the same as not ($x != "foo") ...
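The predicates above and the note on node-set comparison can be sketched in Python. The book document is hypothetical. The helpers ns_eq and ns_ne are illustrative names, not library functions; they model the existential reading of = and != on node sets (true if some node satisfies the comparison), with a node set represented as a list of string-values.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with three chapters.
book = ET.fromstring(
    "<book>"
    "<chapter><title>introduction</title></chapter>"
    "<chapter><title>data <i>models</i></title></chapter>"
    "<chapter><title>queries</title></chapter>"
    "</book>"
)

def string_value(elem):
    """Concatenate the text of all descendant text nodes."""
    return "".join(elem.itertext())

def titles(path):
    """String-values of the title children of the chapters selected by path."""
    return [string_value(c.find("title")) for c in book.findall(path)]

second = titles("chapter[2]")
last = titles("chapter[last()]")
intro = titles("chapter[title='introduction']")

def ns_eq(values, s):
    """Node-set = string: true if SOME node has string-value s."""
    return any(v == s for v in values)

def ns_ne(values, s):
    """Node-set != string: true if SOME node has a string-value other than s."""
    return any(v != s for v in values)

x = titles("chapter")
print(ns_eq(x, "introduction"), not ns_ne(x, "introduction"))
```

With three different titles the two tests disagree: some title equals "introduction", but not all of them do. They also disagree on the empty node set, where the first is false and the second is true.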
Unions of Path Expressions
• employee | consultant -- the union of the employee and
consultant nodes that are children of the context node
• For some reason person/(employee | consultant) -- as in regular
path expressions -- is not allowed
• However person/node()[boolean(employee | consultant)] is
allowed!!
• From the XPATH specification:
– The boolean function converts its argument to a boolean as
follows:
• a number is true if and only if it is neither positive or negative zero
nor NaN
• a node-set is true if and only if it is non-empty
• a string is true if and only if its length is non-zero
• an object of a type other than the four basic types is converted to a
boolean in a way that is dependent on that type
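The conversion rules quoted above can be sketched as a Python function. This is only a model under stated assumptions: the name xpath_boolean is made up, node sets are represented as Python lists or tuples, and the type-dependent rule for other objects is not modeled.

```python
import math

def xpath_boolean(value):
    """Model of the XPath boolean() conversion for the four basic types."""
    if isinstance(value, bool):          # already a boolean
        return value
    if isinstance(value, (int, float)):  # number: false for +0, -0 and NaN
        is_nan = isinstance(value, float) and math.isnan(value)
        return not (value == 0 or is_nan)
    if isinstance(value, str):           # string: true iff non-empty
        return len(value) > 0
    if isinstance(value, (list, tuple)): # node set: true iff non-empty
        return len(value) > 0
    raise TypeError("conversion for this type is not modeled in this sketch")

print(xpath_boolean(-0.0), xpath_boolean("false"), xpath_boolean([]))
```

Note that the non-empty string "false" converts to true, one of the coercions that makes XPath comparisons easy to misread.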
Axis navigation
• So far, nearly all our expressions have moved us down the tree by
moving to child nodes. Exceptions were
– . -- stay where you are
– / -- go to the root
– // -- all descendants of the root
– .// -- all descendants of the context node
• All other expressions have been abbreviations for child::…
e.g. child::para; child is an example of an axis
• XPath has several axes: ancestor, ancestor-or-self, attribute,
child, descendant, descendant-or-self, following, following-sibling,
namespace, parent, preceding, preceding-sibling, self
– Some of these (self, parent) describe single nodes, others
describe sequences of nodes.
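A few of these axes can be computed by hand in Python. ElementTree elements keep no parent pointer, so this sketch first builds a parent map; the tiny document and the function names are assumptions chosen for illustration, and the sibling axes are returned in document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree: a has children b and c; b has d and e; c has f.
root = ET.fromstring("<a><b><d/><e/></b><c><f/></c></a>")
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestor(node):
    """Ancestors of node, nearest first."""
    out = []
    while node in parent_of:
        node = parent_of[node]
        out.append(node)
    return out

def descendant(node):
    """All descendants of node in document order (node itself excluded)."""
    return list(node.iter())[1:]

def siblings(node):
    parent = parent_of.get(node)
    return [] if parent is None else list(parent)

def preceding_sibling(node):
    sibs = siblings(node)
    return sibs[: sibs.index(node)] if sibs else []

def following_sibling(node):
    sibs = siblings(node)
    return sibs[sibs.index(node) + 1 :] if sibs else []

def tags(nodes):
    return [n.tag for n in nodes]

b, e = root.find("b"), root.find("b/e")
print(tags(ancestor(e)), tags(descendant(root)))
print(tags(preceding_sibling(e)), tags(following_sibling(b)))
```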
XPath Navigation Axes
(merci, Arnaud)
[Figure: diagram of the navigation axes around a context node: ancestor, preceding-sibling, following-sibling, self, child, preceding, attribute, following, namespace, descendant.]
XPath abbreviated syntax
Abbreviation   Expansion
(nothing)      child::
@              attribute::
//             /descendant-or-self::node()/
.              self::node()
.//            self::node()/descendant-or-self::node()/
..             parent::node()
/              (document root)
XPath
• Reasonably widely adopted -- in XML-Schema and
query languages.
• Incomparable with regular path expressions: neither more
expressive nor less expressive (e.g. can't do (ab)* )
• Particularly messy in some areas:
– defining order of results
– overloading of operations
• e.g. [chapter/title = "Introduction"]
• why not [ "Introduction" IN chapter/title] ?
Quilt
proposed by Chamberlin, Robie and Florescu
(from the authors’ slides)
• Leverage the most effective features of several existing
and proposed query languages
• Design a small, clean, implementable language
• Cover the functionality required by all the XML Query use
cases in a single language
• Write queries that fit on a slide
• Design a quilt, not a camel
Quilt/Kweelt URLs
Quilt (the language)
http://www.almaden.ibm.com/cs/people/chamberlin/quilt_lncs.pdf
Kweelt (the implementation)
http://db.cis.upenn.edu/Kweelt/
http://db.cis.upenn.edu/Kweelt/useCases
(examples in these notes)
Quilt = XPath + “comprehension” syntax
• XML-QL
where     <pattern> in <XML-expression>      (bind variables)
          <pattern> in <XML-expression>
          …
          <condition>
construct <expression>                       (use variables)
• Quilt
for       x in <XPath-expression>            (bind variables)
          y in <XPath-expression>
          …
where     <condition>                        (use variables)
return    <expression>
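The for / where / return pattern has the same shape as a list comprehension, which is one way to read the word "comprehension" here. A minimal Python analogy follows; the lists xs and ys are hypothetical stand-ins for the node sequences that two XPath expressions would return.

```python
# Hypothetical bindings standing in for the results of two XPath expressions.
xs = [1, 2, 3]
ys = [10, 20]

# for x in xs, y in ys   (bind variables)
# where x * 10 == y      (use variables in a condition)
# return (x, y)          (use variables to construct the result)
result = [(x, y) for x in xs for y in ys if x * 10 == y]
print(result)  # [(1, 10), (2, 20)]
```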
Examples of Quilt
(from http://db.cis.upenn.edu/Kweelt/useCases/R/Q1.qlt )
Relational data -- two DTDs:
<?xml version="1.0" ?>
<!DOCTYPE items [
<!ELEMENT items (item_tuple*)>
<!ELEMENT item_tuple (itemno, description, offered_by, start_date?,
                      end_date?, reserve_price? )>
<!ELEMENT itemno (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT offered_by (#PCDATA)>
<!ELEMENT start_date (#PCDATA)>
<!ELEMENT end_date (#PCDATA)>
<!ELEMENT reserve_price (#PCDATA)>
]>
<?xml version="1.0" ?>
<!DOCTYPE bids [
<!ELEMENT bids (bid_tuple*)>
<!ELEMENT bid_tuple (userid, itemno, bid, bid_date)>
<!ELEMENT userid (#PCDATA)>
<!ELEMENT itemno (#PCDATA)>
<!ELEMENT bid (#PCDATA)>
<!ELEMENT bid_date (#PCDATA)>
]>
The data
<items>
<item_tuple>
<itemno>1001</itemno>
<description>Red Bicycle</description>
<offered_by>U01</offered_by>
<start_date>1999-01-05</start_date>
<end_date>1999-01-20</end_date>
<reserve_price>40</reserve_price>
</item_tuple>
<item_tuple>
<itemno>1002</itemno>
<description>Motorcycle</description>
<offered_by>U02</offered_by>
<start_date>1999-02-11</start_date>
<end_date>1999-03-15</end_date>
<reserve_price>500</reserve_price>
</item_tuple>
…
</items>
<bids>
<bid_tuple>
<userid>U02</userid>
<itemno>1001</itemno>
<bid>35</bid>
<bid_date>99-01-07</bid_date>
</bid_tuple>
<bid_tuple>
<userid>U04</userid>
<itemno>1001</itemno>
<bid>40</bid>
<bid_date>99-01-08</bid_date>
</bid_tuple>
…
</bids>
Query 1
FUNCTION date()
{
  "1999-02-01"
}
<result>
(
  FOR $i IN document("items.xml")//item_tuple
  WHERE $i/start_date LEQ date()
    AND $i/end_date GEQ date()
    AND contains($i/description, "Bicycle")
  RETURN
    <item_tuple>
      $i/itemno ,
      $i/description
    </item_tuple> SORTBY (itemno)
)
</result>
Notes:
– simple function definitions (date() above)
– XPath expressions are shown in orange on the original slide
– dates are formatted so that lexicographic ordering gives the right
result
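To make the evaluation of Query 1 concrete, here is a rough Python equivalent using xml.etree.ElementTree. It is a sketch rather than a Quilt implementation: it runs only over the two items shown on the data slide, and the date is passed as a parameter (an assumption, replacing the date() function) so that a non-empty result can be shown. The name q1 is made up.

```python
import xml.etree.ElementTree as ET

# The two item_tuple elements shown on the data slide.
ITEMS = """
<items>
  <item_tuple>
    <itemno>1001</itemno><description>Red Bicycle</description>
    <offered_by>U01</offered_by><start_date>1999-01-05</start_date>
    <end_date>1999-01-20</end_date><reserve_price>40</reserve_price>
  </item_tuple>
  <item_tuple>
    <itemno>1002</itemno><description>Motorcycle</description>
    <offered_by>U02</offered_by><start_date>1999-02-11</start_date>
    <end_date>1999-03-15</end_date><reserve_price>500</reserve_price>
  </item_tuple>
</items>
"""
items = ET.fromstring(ITEMS)

def q1(items, today):
    """Bicycles on offer on the given date, sorted by item number."""
    rows = []
    for i in items.iter("item_tuple"):           # FOR $i IN ...//item_tuple
        start = i.findtext("start_date")
        end = i.findtext("end_date")
        desc = i.findtext("description", default="")
        if start is None or end is None:         # both dates are optional in the DTD
            continue
        # ISO-style dates compare correctly as plain strings.
        if start <= today <= end and "Bicycle" in desc:
            rows.append((i.findtext("itemno"), desc))
    return sorted(rows, key=lambda row: row[0])  # SORTBY (itemno)

print(q1(items, "1999-02-01"))  # []
print(q1(items, "1999-01-10"))  # [('1001', 'Red Bicycle')]
```

On this two-item excerpt the slide's date, 1999-02-01, selects nothing; the items in the printed output of Q1 come from the part of the data that is elided on the slide.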
Output from Q1
<?xml version="1.0" ?>
<result>
<item_tuple>
<itemno> 1003 </itemno>
<description> Old Bicycle </description>
</item_tuple>
<item_tuple>
<itemno> 1007 </itemno>
<description> Racing Bicycle </description>
</item_tuple>
</result>
Query Q2
For all bicycles, list the item number, description, and
highest bid (if any), ordered by item number.
<result>
(
  FOR $i IN document("items.xml")//item_tuple
  LET $b := document("bids.xml")//bid_tuple[itemno = $i/itemno]
  WHERE contains($i/description, "Bicycle")
  RETURN
    <item_tuple>
      $i/itemno ,
      $i/description ,
      IF ($b) THEN
        <high_bid> NumFormat("#####.##", max(-1, $b/bid)) </high_bid>
      ELSE ""
    </item_tuple> SORTBY (itemno)
)
</result>
Notes:
– use of variable in XPath (the predicate [itemno = $i/itemno])
– lots of coercion
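The LET clause binds $b to a whole set of bids per item, which is then tested for emptiness and aggregated. A rough Python rendering over the items and bids shown on the data slide follows; q2 is a made-up name, the item elements are cut down to the two fields the query uses, and the number formatting of the original query is not reproduced.

```python
import xml.etree.ElementTree as ET

# Excerpts of the data slide, reduced to the fields used by the query.
items = ET.fromstring(
    "<items>"
    "<item_tuple><itemno>1001</itemno><description>Red Bicycle</description></item_tuple>"
    "<item_tuple><itemno>1002</itemno><description>Motorcycle</description></item_tuple>"
    "</items>"
)
bids = ET.fromstring(
    "<bids>"
    "<bid_tuple><userid>U02</userid><itemno>1001</itemno><bid>35</bid></bid_tuple>"
    "<bid_tuple><userid>U04</userid><itemno>1001</itemno><bid>40</bid></bid_tuple>"
    "</bids>"
)

def q2(items, bids):
    """Item number, description and highest bid (or None) for each bicycle."""
    out = []
    for i in items.iter("item_tuple"):
        itemno = i.findtext("itemno")
        desc = i.findtext("description", default="")
        # LET $b := ...//bid_tuple[itemno = $i/itemno]
        b = [t for t in bids.iter("bid_tuple") if t.findtext("itemno") == itemno]
        if "Bicycle" not in desc:                # WHERE contains(...)
            continue
        # IF ($b) THEN max(...) ELSE nothing: an empty set counts as false.
        high = max(float(t.findtext("bid")) for t in b) if b else None
        out.append((itemno, desc, high))
    return sorted(out, key=lambda row: row[0])   # SORTBY (itemno)

print(q2(items, bids))  # [('1001', 'Red Bicycle', 40.0)]
```

Only the two bids shown on the data slide are used here, so the highest bid for item 1001 comes out as 40; the value 55 in the slide's output presumably reflects bids in the elided part of the data.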
Output from Q2
<result>
<item_tuple>
<itemno> 1001 </itemno>
<description> Red Bicycle </description>
<high_bid> 55 </high_bid>
</item_tuple>
<item_tuple>
<itemno> 1003 </itemno>
<description> Old Bicycle </description>
<high_bid> 20 </high_bid>
</item_tuple>
<item_tuple>
<itemno> 1007 </itemno>
<description> Racing Bicycle </description>
<high_bid> 225 </high_bid>
</item_tuple>
<item_tuple>
<itemno> 1008 </itemno>
<description> Broken Bicycle </description>
</item_tuple>
</result>
Query Q3
Find cases where a user with a rating worse (alphabetically greater
than "C" ) offers an item with a reserve price of more than 1000.
<result>
(
  FOR $u IN document("users.xml")//user_tuple,
      $i IN document("items.xml")//item_tuple
  WHERE $u/rating GT 'C'
    AND $i/reserve_price GT 1000
    AND $i/offered_by = $u/userid
  RETURN
    <warning>
      <user_name>$u/name/text()</user_name>,
      <user_rating>$u/rating/text()</user_rating>,
      <item_description>$i/description/text()</item_description>,
      $i/reserve_price
    </warning>
)
</result>
Note: comparing sets with singletons. Same rules as in XPath? In this
case the DTD gives uniqueness.
Quilt -- Attributes and IDs
<?xml version="1.0" ?>
<!DOCTYPE census [
<!ELEMENT census (person*)>
<!ELEMENT person (person*)>
<!ATTLIST person
  name   ID    #REQUIRED
  spouse IDREF #IMPLIED
  job    CDATA #IMPLIED >
]>
<census>
  <person name = "Bill" job = "Teacher">
    <person name = "Joe" job = "Painter" spouse = "Martha">
      <person name = "Sam" job = "Nurse">
        <person name = "Fred" job = "Senator" spouse = "Jane">
        </person>
      </person>
      <person name = "Karen" job = "Doctor" spouse = "Steve">
      </person>
    </person>
    <person name = "Mary" job = "Pilot">
      <person name = "Susan" job = "Pilot" spouse = "Dave">
      </person>
    </person>
  </person>
  <person name = "Frank" job = "Writer">
    <person name = "Martha" job = "Programmer" spouse = "Joe">
      <person name = "Dave" job = "Athlete" spouse = "Susan">
      </person>
    </person>
    ...
  </person>
</census>
Query Q1
Find Martha's spouse:
FOR $m IN document("census.xml")//person[@name="Martha"]
RETURN shallow($m/@spouse->{person@name})
Notes:
– The shallow function strips an element of its subelements.
– Dereferencing: $m/@spouse->{person@name}. A hack: Kweelt does not
read the DTD.
Query Q6
Find Bill's grandchildren.
<result>
(
FOR $b IN document("census.xml")//person[@name = "Bill"] ,
$c IN $b/person | $b/@spouse->{person@name}/person ,
$g IN $c/person | $c/@spouse->{person@name}/person
RETURN
shallow($g)
)
</result>
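The dereferencing step used in the two census queries can be emulated in Python by indexing person elements on their name attribute, which is roughly the information the {person@name} hint supplies. The sketch below is an approximation under assumptions: the helper names are made up, the census excerpt omits whatever the "..." elides, and results are not re-sorted into document order.

```python
import xml.etree.ElementTree as ET

# The census data from the slide, without the elided part.
census = ET.fromstring("""
<census>
  <person name="Bill" job="Teacher">
    <person name="Joe" job="Painter" spouse="Martha">
      <person name="Sam" job="Nurse">
        <person name="Fred" job="Senator" spouse="Jane"/>
      </person>
      <person name="Karen" job="Doctor" spouse="Steve"/>
    </person>
    <person name="Mary" job="Pilot">
      <person name="Susan" job="Pilot" spouse="Dave"/>
    </person>
  </person>
  <person name="Frank" job="Writer">
    <person name="Martha" job="Programmer" spouse="Joe">
      <person name="Dave" job="Athlete" spouse="Susan"/>
    </person>
  </person>
</census>
""")

# Index playing the role of the ID attribute declared in the DTD.
by_name = {p.get("name"): p for p in census.iter("person")}

def deref_spouse(p):
    """Emulate $p/@spouse->{person@name}: follow the IDREF if it resolves."""
    target = by_name.get(p.get("spouse", ""))
    return [] if target is None else [target]

def shallow(p):
    """Copy of p with its attributes but without its subelements."""
    return ET.Element(p.tag, dict(p.attrib))

def children_incl_spouse(p):
    """Emulate $p/person | $p/@spouse->{person@name}/person."""
    kids = list(p.findall("person"))
    for s in deref_spouse(p):
        kids += [k for k in s.findall("person") if all(k is not x for x in kids)]
    return kids

martha_spouse = [shallow(s).attrib for s in deref_spouse(by_name["Martha"])]
grandchildren = [
    g.get("name")
    for c in children_incl_spouse(by_name["Bill"])
    for g in children_incl_spouse(c)
]
print(martha_spouse)
print(grandchildren)
```

On this excerpt Martha's spouse is Joe, and Bill's grandchildren include Dave, who is reached only through Joe's spouse link to Martha.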
Status of XML types
• DTDs -- widely used, but limited
– lack of base types
– untyped pointers (IDs and IDREFs)
– no tuple types (hence no record subtyping or inheritance)
• XML-schema -- lots of hoopla, but
– not stable
– too complex
• Others: RDF (not really types for XML), SOX,
Relax, Schematron
• Opinions:
– None of these is good for database design.
– Something new is needed (some core of XML-schema)
Status of XML Query languages
• None of them are really typed (by a DTD or
anything else).
• Type errors show up as empty answers
• XML-QL probably the most elegant, but too
powerful.
• XSL and descendants are working (in IE 5)
• Quilt -- nice extension of XPath, but XPath is
quite complex.
• Nothing like an “algebra” for any of these (though
some ideas are now emerging)
• Nothing like database optimization yet exists.
• Do we need something simpler?
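The point that type errors show up as empty answers can be seen even with a simple path evaluator. In this Python sketch (xml.etree.ElementTree, with a one-item excerpt of the items data), a misspelled element name that the DTD would rule out raises no error; the query just returns nothing.

```python
import xml.etree.ElementTree as ET

items = ET.fromstring(
    "<items><item_tuple><itemno>1001</itemno></item_tuple></items>"
)

ok = items.findall(".//item_tuple/itemno")
# item_no is not declared in the DTD, but nothing checks the path against it.
typo = items.findall(".//item_tuple/item_no")

print(len(ok), len(typo))  # 1 0
```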