XPath - GUC MET

Semi-Structured Data and the Web
Topics:
Querying XML Documents: XPath
Prof. Dr. Slim Abdennadher
© S. Abdennadher
Querying XML Documents
• Querying XML data means to:
– identify nodes
– test further properties of these nodes
– operate on the matches
– construct the answers as XML documents
• For XML, XQuery plays the role that SQL plays for relational databases:
– XPath is an embedded sublanguage used to locate and test nodes
– XQuery iterates over the selected parts, operates on them, and
constructs the answers
XPath Navigation
XPath
• XPath is a syntax for addressing parts of an XML document.
• XPath uses path expressions to navigate in XML documents.
• XPath is based on the Unix directory notation
– In a Unix directory tree: /slim/Slides/CSEN604/Lecture6
– In an XML tree: /bib/book/year
• The navigation formalism was specified as W3C XPath in 1999:
http://www.w3.org/TR/xpath
• XPath is the building block for other W3C standards:
– XSL Transformations: XSLT
– XML Link: XLink
– XML Pointer: XPointer
– XML Query: XQuery
Example of XPath Queries
<bib>
  <book>
    <publisher> Springer </publisher>
    <author> Thom Fruehwirth </author>
    <author>
      <first-name> Slim </first-name>
      <last-name> Abdennadher </last-name>
    </author>
    <title> Essentials of Constraint Programming </title>
    <year> 2001 </year>
  </book>
  <book price='55'>
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>
Selecting Nodes – XPath Examples
• Query: All year elements that are direct subelements of book:
/bib/book/year
• Result:
<year> 2001 </year>
<year> 1998 </year>
• Query: All year elements that are direct subelements of paper:
/bib/paper/year
• Result: empty (there were no papers in the document)
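The two queries above can be tried out with a short Python sketch. It assumes the standard-library module xml.etree.ElementTree, which implements only a subset of XPath and evaluates paths relative to the element they are called on, so /bib/book/year is written as book/year starting from the bib element. The document is an abridged copy of the example, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the bibliography document from the example slide.
BIB = """
<bib>
  <book>
    <title> Essentials of Constraint Programming </title>
    <year> 2001 </year>
  </book>
  <book price='55'>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>
"""

bib = ET.fromstring(BIB)  # the root element <bib>

# /bib/book/year: year elements that are children of book children of bib.
years = [y.text.strip() for y in bib.findall("book/year")]

# /bib/paper/year: there is no paper element, so nothing matches.
papers = bib.findall("paper/year")

print(years)   # ['2001', '1998']
print(papers)  # []
```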
Selecting Nodes – XPath Examples
• Query: all author elements in the current document:
//author
• Result:
<author> Thom Fruehwirth </author>
<author>
<first-name> Slim </first-name>
<last-name> Abdennadher </last-name>
</author>
<author> Jeffrey D. Ullman </author>
• Query: Select all first-name elements that are descendants of the bib
element, no matter where they are under the bib element:
/bib//first-name
• Result:
<first-name> Slim </first-name>
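A small sketch of the descendant queries //author and /bib//first-name, again assuming Python's xml.etree.ElementTree. In that module a descendant search must be written with a leading dot (.//author); the document below is an abridged copy of the example.

```python
import xml.etree.ElementTree as ET

BIB = """
<bib>
  <book>
    <author> Thom Fruehwirth </author>
    <author>
      <first-name> Slim </first-name>
      <last-name> Abdennadher </last-name>
    </author>
  </book>
  <book price='55'>
    <author> Jeffrey D. Ullman </author>
  </book>
</bib>
"""

bib = ET.fromstring(BIB)

# //author: author elements at any depth of the document.
authors = bib.findall(".//author")

# /bib//first-name: first-name elements anywhere below bib.
first_names = [e.text.strip() for e in bib.findall(".//first-name")]

print(len(authors))  # 3
print(first_names)   # ['Slim']
```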
Selecting Nodes – Attribute Nodes
• Query: Find the prices of all books:
/bib/book/@price
• Result: '55'
• @price means that price must be an attribute, not a child element.
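The attribute query can be approximated in Python as follows. This is a sketch under the assumption that xml.etree.ElementTree is used: that module cannot return attribute nodes from a path, so the code selects the book elements that have a price attribute and then reads the value with get().

```python
import xml.etree.ElementTree as ET

BIB = """
<bib>
  <book>
    <title> Essentials of Constraint Programming </title>
  </book>
  <book price='55'>
    <title> Principles of Database and Knowledge Base Systems </title>
  </book>
</bib>
"""

bib = ET.fromstring(BIB)

# book[@price] keeps only the books that carry a price attribute;
# get("price") then reads the attribute value, as /bib/book/@price would.
prices = [book.get("price") for book in bib.findall("book[@price]")]

print(prices)  # ['55']
```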
Selecting Nodes
Expression   Description
nodename     Selects all child elements of the current node named nodename
/            Selects from the root node
//           Selects matching nodes at any depth below the current node,
             no matter where they are
.            Selects the current node
..           Selects the parent of the current node
@            Selects attributes
Note that a path starting with a slash / is always an absolute path,
evaluated from the document root.
• bib: selects all bib elements that are children of the current node.
• /bib: selects the root element bib.
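The expressions in the table can be exercised with a short sketch, assuming Python's xml.etree.ElementTree with the bib element as the current node. The module supports only relative paths, so the absolute forms are written with a leading dot.

```python
import xml.etree.ElementTree as ET

BIB = """
<bib>
  <book>
    <title> Essentials of Constraint Programming </title>
    <year> 2001 </year>
  </book>
  <book price='55'>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>
"""

bib = ET.fromstring(BIB)  # current node: the <bib> element

current = bib.findall(".")               # .     the current node itself
children = bib.findall("book")           # book  child elements named book
anywhere = bib.findall(".//year")        # //    year elements at any depth
parents = bib.findall(".//year/..")      # ..    parents of those year elements
with_price = bib.findall("book[@price]") # @     books with a price attribute

print([e.tag for e in current])  # ['bib']
print([e.tag for e in parents])  # ['book', 'book']
print(len(with_price))           # 1
```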
Selecting Unknown Nodes – XPath Examples
• Query: Select all child elements of all author elements:
//author/*
• Result:
<first-name> Slim </first-name>
<last-name> Abdennadher </last-name>
• Query: Select all book elements which have any attributes:
//book[@*]
• Result:
<book price='55'>
<publisher> Freeman </publisher>
<author> Jeffrey D. Ullman </author>
<title> Principles of Database and Knowledge Base Systems </title>
<year> 1998 </year>
</book>
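A sketch of the two wildcard queries, assuming Python's xml.etree.ElementTree. The module supports the element wildcard * but not the attribute wildcard @*, so //book[@*] is emulated here with a plain Python test on each book's attribute dictionary.

```python
import xml.etree.ElementTree as ET

BIB = """
<bib>
  <book>
    <author> Thom Fruehwirth </author>
    <author>
      <first-name> Slim </first-name>
      <last-name> Abdennadher </last-name>
    </author>
  </book>
  <book price='55'>
    <author> Jeffrey D. Ullman </author>
  </book>
</bib>
"""

bib = ET.fromstring(BIB)

# //author/*: every child element of every author element.
author_children = [e.tag for e in bib.findall(".//author/*")]

# //book[@*]: emulated, keep the books whose attribute dictionary is not empty.
books_with_attributes = [b for b in bib.iter("book") if b.attrib]

print(author_children)                  # ['first-name', 'last-name']
print(books_with_attributes[0].attrib)  # {'price': '55'}
```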
Selecting Unknown Nodes
XPath wildcards can be used to select unknown XML elements.
Wildcard   Description
*          Matches any element node
@*         Matches any attribute node
node()     Matches any node of any kind
XPath – Functions
• Query:
/bib/book/author/text()
• Result:
Thom Fruehwirth
Jeffrey D. Ullman
• Slim Abdennadher does not appear because his author element contains
first-name and last-name child elements rather than text of its own.
• Node tests and functions in XPath:
– text(): matches text nodes
– node(): matches any node of any kind (for example element and text nodes)
– name(): returns the name of the current node
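The text() query can be emulated in Python. This is a sketch, not a full XPath implementation: xml.etree.ElementTree has no text() node test, so the hypothetical helper text_children collects the text stored directly inside an element and drops whitespace-only pieces.

```python
import xml.etree.ElementTree as ET

BIB = """
<bib>
  <book>
    <author> Thom Fruehwirth </author>
    <author>
      <first-name> Slim </first-name>
      <last-name> Abdennadher </last-name>
    </author>
  </book>
  <book price='55'>
    <author> Jeffrey D. Ullman </author>
  </book>
</bib>
"""

def text_children(elem):
    """Return the non-blank text pieces that are direct children of elem."""
    # ElementTree stores text before the first child in .text and text
    # after each child in that child's .tail.
    pieces = [elem.text] + [child.tail for child in elem]
    return [p.strip() for p in pieces if p and p.strip()]

bib = ET.fromstring(BIB)

# Emulates /bib/book/author/text()
texts = [t for author in bib.findall("book/author") for t in text_children(author)]

print(texts)  # ['Thom Fruehwirth', 'Jeffrey D. Ullman']
```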
Selecting Specific Nodes – Qualifiers
• Predicates are used to find a specific node or a node that contains a
specific value.
• Predicates are always embedded in square brackets.
• Query: Select the first book element that is child of the bib element:
/bib/book[1]
• Query: Select the first three book elements that are children of the bib
element:
/bib/book[position()<4]
• Query: Select all the book elements that have a price attribute with a
value greater than 50:
/bib/book[@price>'50']
• Query: Select all the title elements of the book elements of the bib
element that have a price attribute with a value greater than 50:
/bib/book[@price>'50']/title
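The predicates above can be sketched in Python, assuming xml.etree.ElementTree. The module supports the positional predicate book[1] directly, but neither position() nor numeric comparisons, so those two are emulated with list slicing and an explicit conversion to a number.

```python
import xml.etree.ElementTree as ET

BIB = """
<bib>
  <book>
    <title> Essentials of Constraint Programming </title>
  </book>
  <book price='55'>
    <title> Principles of Database and Knowledge Base Systems </title>
  </book>
</bib>
"""

bib = ET.fromstring(BIB)

# /bib/book[1]: supported directly; positions start at 1.
first = bib.findall("book[1]")

# /bib/book[position()<4]: emulated by taking the first three matches.
first_three = bib.findall("book")[:3]

# /bib/book[@price>'50']/title: emulated with a numeric comparison in Python.
expensive = [b for b in bib.findall("book[@price]") if float(b.get("price")) > 50]
titles = [b.findtext("title").strip() for b in expensive]

print(first[0].findtext("title").strip())  # Essentials of Constraint Programming
print(len(first_three))                    # 2
print(titles)  # ['Principles of Database and Knowledge Base Systems']
```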
More on Qualifiers
• Query: Select all the author elements that have a first-name element:
/bib/book/author[first-name]
• Result:
<author>
<first-name> Slim </first-name>
<last-name> Abdennadher </last-name>
</author>
• XPath expressions in conditions have existential semantics:
– An XPath expression used as a condition is true if its result set is
not empty.
• XPath expressions in conditions are not limited to simple properties of
an object; they are path expressions that are evaluated with respect to
the current context node.
• Example: //country[.//city/name='Cairo']/name
returns the names of all countries in which a city named Cairo is
located.
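Existential semantics can be made explicit with a small sketch, assuming Python's xml.etree.ElementTree. The predicate author[first-name] is supported by the module; the second list spells out the same condition by hand, treating a non-empty result list as true.

```python
import xml.etree.ElementTree as ET

BIB = """
<bib>
  <book>
    <author> Thom Fruehwirth </author>
    <author>
      <first-name> Slim </first-name>
      <last-name> Abdennadher </last-name>
    </author>
  </book>
  <book price='55'>
    <author> Jeffrey D. Ullman </author>
  </book>
</bib>
"""

bib = ET.fromstring(BIB)

# /bib/book/author[first-name], using the built-in predicate.
with_first_name = bib.findall("book/author[first-name]")

# The same condition by hand: the path first-name is evaluated relative to
# each author, and the condition holds if its result list is not empty.
by_hand = [a for a in bib.findall("book/author") if a.findall("first-name")]

print(len(with_first_name))        # 1
print(with_first_name == by_hand)  # True
```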
More on Qualifiers
• Note that in conditions .//city and //city are different: .//city
searches below the current context node, whereas //city searches the
whole document.
• Example: //country[//city/name='Cairo']/name
returns the names of all countries, provided that some city named Cairo
occurs anywhere in the document.
• Query: Select all the book elements that were written by an author who
is younger than 25:
/bib/book[author/@age < '25']
• When an element is compared with a value, its text content is used
implicitly. For an element that contains only text,
//country[name = 'Egypt']
is equivalent to
//country[name/text() = 'Egypt']
• The Boolean connectives and, or, and not() can be used in conditions.
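The difference between .//city and //city inside a condition can be illustrated with a sketch. The country document below is hypothetical (the slides do not show one), and because xml.etree.ElementTree does not support nested paths in predicates, both conditions are emulated in Python with the illustrative helper city_names.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented only for this illustration.
WORLD = """
<world>
  <country>
    <name>Egypt</name>
    <province><city><name>Cairo</name></city></province>
  </country>
  <country>
    <name>Germany</name>
    <city><name>Ulm</name></city>
  </country>
</world>
"""

world = ET.fromstring(WORLD)

def city_names(start):
    """Names of all cities found at any depth below the start node."""
    return [n.text for n in start.findall(".//city/name")]

# //country[.//city/name='Cairo']/name: condition evaluated below each country.
relative = [c.findtext("name") for c in world.iter("country")
            if "Cairo" in city_names(c)]

# //country[//city/name='Cairo']/name: condition evaluated from the root,
# so it is either true for every country or for none.
absolute = [c.findtext("name") for c in world.iter("country")
            if "Cairo" in city_names(world)]

# //country[name = 'Egypt']: the text content of name is compared implicitly.
egypt = world.findall(".//country[name='Egypt']")

print(relative)    # ['Egypt']
print(absolute)    # ['Egypt', 'Germany']
print(len(egypt))  # 1
```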
Absolute and Relative Path
• Paths that start with a name are relative paths that are evaluated
against the current context node. In
//country[name = 'Egypt']
the path name is evaluated relative to each country element.
• Paths that start with / or // are absolute paths.
• The | operator combines the results of several paths in one XPath
expression.
• Query: Select all the title and publisher elements of all book
elements:
//book/title | //book/publisher
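The union query can be emulated in Python. This is a sketch assuming xml.etree.ElementTree, which has no | operator: the two paths are evaluated separately and the matches are then listed in document order, without duplicates.

```python
import xml.etree.ElementTree as ET

BIB = """
<bib>
  <book>
    <publisher> Springer </publisher>
    <title> Essentials of Constraint Programming </title>
  </book>
  <book price='55'>
    <publisher> Freeman </publisher>
    <title> Principles of Database and Knowledge Base Systems </title>
  </book>
</bib>
"""

bib = ET.fromstring(BIB)

titles = bib.findall(".//book/title")
publishers = bib.findall(".//book/publisher")

# //book/title | //book/publisher: a union is a set of nodes, so collect
# the matches in a set and walk the tree once to restore document order.
selected = set(titles) | set(publishers)
union = [e for e in bib.iter() if e in selected]

print([e.tag for e in union])
# ['publisher', 'title', 'publisher', 'title']
```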
XPath Navigation
• Starting from a current node, it is possible to navigate in an XML tree
in several directions.
• Navigation can be done along 13 axes:
ancestor
ancestor-or-self
attribute
child
descendant
descendant-or-self
following
following-sibling
namespace
parent
preceding
preceding-sibling
self
• So far we have only used the abbreviated forms of child,
descendant-or-self (//), attribute (@), parent (..), and self (.).
XPath Navigation
Navigation Path
• A navigation path is of the form:
/step/step/...
• The result of each step is a set of nodes that serve as input for the next
step.
• A step is of the form
axisname::nodetest[predicate]
– an axis: defines the tree-relationship between the selected nodes and
the current node
– a node-test: identifies a node within an axis
– zero or more predicates (to further refine the selected node-set)
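To make the three parts of a step concrete, the following sketch splits a step string into axis, node test, and predicates. The function split_step and its regular expression are illustrative assumptions; they handle only simple steps without nested brackets and are not a full XPath parser. The default axis child and the @ abbreviation for the attribute axis follow the XPath specification.

```python
import re

# axisname::nodetest[predicate]..., where the axis and predicates are optional.
STEP = re.compile(
    r"^(?:(?P<axis>[a-z-]+)::)?(?P<test>[^\[\]]+)(?P<preds>(?:\[[^\[\]]*\])*)$"
)

def split_step(step):
    """Split one location step into (axis, node test, list of predicates)."""
    match = STEP.match(step)
    if match is None:
        raise ValueError(f"not a simple XPath step: {step!r}")
    axis = match.group("axis") or "child"  # child is the default axis
    test = match.group("test")
    if match.group("axis") is None and test.startswith("@"):
        axis, test = "attribute", test[1:]  # @price abbreviates attribute::price
    predicates = re.findall(r"\[([^\[\]]*)\]", match.group("preds"))
    return axis, test, predicates

print(split_step("child::book[@price>50][1]"))  # ('child', 'book', ['@price>50', '1'])
print(split_step("descendant::text()"))         # ('descendant', 'text()', [])
print(split_step("year"))                       # ('child', 'year', [])
print(split_step("@price"))                     # ('attribute', 'price', [])
```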
Navigation Path – Examples
• Query: child::book
• Meaning: Selects all book nodes that are children of the current node. It
corresponds to book.
• Query: attribute::price
• Meaning: Selects the price attribute of the current node. It corresponds
to @price.
• Query: child::*
• Meaning: Selects all children of the current node
• Query: child::text()
• Meaning: Selects all text child nodes of the current node.
• Query: descendant::book
• Meaning: Selects all book descendants of the current node.
• Query: ancestor-or-self::book
• Meaning: Selects all book ancestors of the current node, and the current
node as well if it is a book node.
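Some of these axes can be emulated in Python. This is a sketch assuming xml.etree.ElementTree, which does not accept the axisname:: syntax and keeps no parent links, so a parent map is built first. The helper ancestor_or_self and the dictionary parent_of are illustrative names.

```python
import xml.etree.ElementTree as ET

BIB = """
<bib>
  <book>
    <author>
      <first-name> Slim </first-name>
      <last-name> Abdennadher </last-name>
    </author>
  </book>
  <book price='55'>
    <author> Jeffrey D. Ullman </author>
  </book>
</bib>
"""

bib = ET.fromstring(BIB)

# Map every element to its parent so that upward axes can be followed.
parent_of = {child: parent for parent in bib.iter() for child in parent}

def ancestor_or_self(node, tag):
    """Emulates ancestor-or-self::tag, listing matches from node upwards."""
    found = []
    while node is not None:
        if node.tag == tag:
            found.append(node)
        node = parent_of.get(node)  # None once the root has been passed
    return found

child_books = bib.findall("book")             # child::book from bib
descendant_books = bib.findall(".//book")     # descendant::book from bib
price = bib.findall("book")[1].get("price")   # attribute::price on the 2nd book

first_name = bib.find(".//first-name")
enclosing = ancestor_or_self(first_name, "book")

print(len(child_books), len(descendant_books))  # 2 2
print(price)                                    # 55
print(enclosing == [child_books[0]])            # True
```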