XML Querying and Views

Helena Galhardas
DEI IST
(slides based on the course CIS 550 – Database &
Information Systems, Univ. Pennsylvania, Zachary Ives)
Agenda


Recalling XML Querying
Views
XQuery’s Basic Form



Has an analogous form to SQL’s
SELECT..FROM..WHERE..GROUP BY..ORDER BY
The model: bind nodes (or node sets) to variables;
operate over each legal combination of bindings;
produce a set of nodes
“FLWOR” statement [note case sensitivity!]:
for {iterators that bind variables}
let {collections}
where {conditions}
order by {order-conditions}
(older version was “SORTBY”)
return {output constructor}
“Iterations” in XQuery
A series of (possibly nested) FOR statements assigning the results
of XPaths to variables
for $root in document("http://my.org/my.xml")
for $sub in $root/rootElement,
    $sub2 in $sub/subElement, …



Something like a template that pattern-matches, produces a
“binding tuple”
For each of these, we evaluate the WHERE and possibly output
the RETURN template
The document() or doc() function specifies an input file as a URI
Older versions of the language used “document”; the current name is “doc”,
but which one is accepted depends on your XQuery implementation
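The binding-tuple model described above can be mimicked in plain Python as a rough sketch. This is not XQuery: it uses the standard-library ElementTree module, an inline document standing in for http://my.org/my.xml, and an invented attribute n, all of which are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the document at http://my.org/my.xml.
XML = """<rootElement>
  <subElement n="1"/>
  <subElement n="2"/>
  <subElement n="3"/>
</rootElement>"""


def binding_tuples(doc):
    """Yield one (sub, sub2) tuple per legal combination of bindings."""
    root_el = doc.getroot()
    # for $sub in $root/rootElement
    subs = [root_el] if root_el.tag == "rootElement" else []
    for sub in subs:
        # $sub2 in $sub/subElement
        for sub2 in sub.findall("subElement"):
            yield (sub, sub2)


doc = ET.ElementTree(ET.fromstring(XML))

# WHERE filters the binding tuples; RETURN builds one output per survivor.
result = [
    sub2.get("n")
    for (sub, sub2) in binding_tuples(doc)
    if int(sub2.get("n")) >= 2
]
print(result)
```

Each nested loop plays the role of one iterator in the FOR clause, and the list comprehension applies the WHERE condition and RETURN template to every tuple.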
Example XML Data
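[Figure: tree view of the example DBLP document. Legend of node kinds: root, p-i (processing instruction), element, attribute, text. The root has an ?xml p-i and a dblp element. dblp contains a mastersthesis element (attributes mdate, key; children author, title, year, school) and an article element (attributes mdate, key; children editor, title, journal, volume, year, ee, ee). Leaf values shown: 2002…, ms/Brown92, Kurt P…., PRPL…, 1992, Univ…., 2002…, tr/dec/…, Paul R., The…, Digital…, SRC…, 1997, db/labs/dec, http://www.]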
Two XQuery Examples
<root-tag> {
  for $p in document("dblp.xml")/dblp/proceedings,
      $yr in $p/yr
  where $yr = "1999"
  return <proc> {$p} </proc>
} </root-tag>

for $i in document("dblp.xml")/dblp/inproceedings[author/text() = "John Smith"]
return <smith-paper>
  <title>{ $i/title/text() }</title>
  <key>{ $i/@key }</key>
  { $i/crossref }
</smith-paper>
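For readers more familiar with a general-purpose language, the second query can be approximated in Python. This is only a sketch under assumptions: the tiny DBLP fragment, its keys and titles are invented, and the key is stored as element text rather than as an attribute node as XQuery would do.

```python
import copy
import xml.etree.ElementTree as ET

# Invented sample data; the real dblp.xml is far larger.
DBLP = """<dblp>
  <inproceedings key="conf/x/Smith99">
    <author>John Smith</author>
    <title>Paper A</title>
    <crossref>conf/x/1999</crossref>
  </inproceedings>
  <inproceedings key="conf/y/Doe00">
    <author>Jane Doe</author>
    <title>Paper B</title>
  </inproceedings>
</dblp>"""


def smith_papers(dblp):
    """Build one <smith-paper> element per paper authored by John Smith."""
    out = []
    for i in dblp.findall("inproceedings"):
        # Predicate [author/text() = "John Smith"]: true if any author matches.
        if "John Smith" not in [a.text for a in i.findall("author")]:
            continue
        paper = ET.Element("smith-paper")
        ET.SubElement(paper, "title").text = i.findtext("title")
        # Simplification: the key value is copied as text.
        ET.SubElement(paper, "key").text = i.get("key")
        for crossref in i.findall("crossref"):
            paper.append(copy.deepcopy(crossref))
        out.append(paper)
    return out


papers = smith_papers(ET.fromstring(DBLP))
for p in papers:
    print(ET.tostring(p, encoding="unicode"))
```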
Another Example
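[Figure: tree view of a second example document. Legend of node kinds: root, p-i, element, attribute, text. The root has an ?xml p-i, a mastersthesis element (attribute key; children author, title, year, school) and a universities element containing university elements with name and country children. Leaf values shown: ms/Brown92, Kurt P…., PRPL…, 1999, Univ…., Univ…., USA, …]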
What If Order Doesn’t Matter?
By default:



SQL is unordered
XQuery is ordered everywhere!
But unordered queries can often be answered much faster
XQuery has a way of telling the query engine to
avoid preserving order:

unordered {
for $x in (mypath) …
}
Querying & Defining Metadata –
Can’t Do This in SQL
Can get a node’s name by querying node-name():
for $x in document("dblp.xml")/dblp/*
return node-name($x)
Can construct elements and attributes using computed names:
for $x in document("dblp.xml")/dblp/*,
    $year in $x/year,
    $title in $x/title/text()
return element {node-name($x)} {
    attribute {concat("year-", $year)} { $title }
}
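The idea of computing element and attribute names from the data can be sketched in Python as well. The sample document below is invented, and ElementTree is used only as a convenient stand-in for an XQuery engine.

```python
import xml.etree.ElementTree as ET

# Invented sample data with two kinds of publication elements.
DBLP = """<dblp>
  <article><title>The Title</title><year>1997</year></article>
  <mastersthesis><title>A Thesis</title><year>1992</year></mastersthesis>
</dblp>"""

dblp = ET.fromstring(DBLP)

# Metadata query: the names of all children of dblp.
names = [x.tag for x in dblp]


def rebuild(dblp):
    """Emit one element per (x, year, title) binding, with computed names."""
    out = []
    for x in dblp:                                # /dblp/*
        for year in x.findall("year"):
            for title in x.findall("title"):
                el = ET.Element(x.tag)            # element name taken from $x
                el.set("year-" + year.text, title.text)  # computed attribute name
                out.append(el)
    return out


elements = rebuild(dblp)
print(names)
print([el.attrib for el in elements])
```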
XQuery Wrap-up



XQuery is very SQL-like, but in some ways
cleaner and more orthogonal
It is based on paths and binding tuples, with
collections and trees as its first-class objects
See www.w3.org/TR/xquery/ for more details
on the language
A Problem

We frequently want to reference data in a way that
differs from the way it’s stored





XML data → HTML, text, etc.
Relational data → XML data
Relational data → Different relational representation
XML data → Different XML representation
Generally, these can all be thought of as different
views over the data


A view is a named query
Let’s start with a special presentation language for XML → HTML
XSL(T): XML → “Other Stuff”

XSL (Extensible Stylesheet Language) is actually divided
into two parts:




XSL-FO: formatting for XML
XSLT: a special transformation language
We’ll leave XSL-FO for you to read about at www.w3.org,
if you’re interested
XSLT is actually able to convert from XML → HTML,
which is how many people do their formatting today


Products like Apache Cocoon generally translate XML →
HTML on the server side
Your browser will do XML → HTML on the client side
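To give a feel for what an XML → HTML transformation does, here is a small Python sketch. It is not XSLT and does not use an XSLT processor; it only imitates the "match each element, emit a template" idea with ElementTree, on invented sample data and with a hypothetical to_html helper.

```python
import xml.etree.ElementTree as ET

# Invented sample data.
DBLP = """<dblp>
  <article><title>The Title</title><year>1997</year></article>
  <mastersthesis><title>A Thesis</title><year>1992</year></mastersthesis>
</dblp>"""


def to_html(dblp):
    """Render every publication under dblp as one HTML list item."""
    ul = ET.Element("ul")
    for pub in dblp:
        li = ET.SubElement(ul, "li")
        li.text = f"{pub.findtext('title')} ({pub.findtext('year')})"
    return ET.tostring(ul, encoding="unicode")


html = to_html(ET.fromstring(DBLP))
print(html)
```

A real stylesheet would express the same mapping declaratively, with one template per element kind.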
Other Forms of Views
XSLT is a language primarily designed for
going from XML → non-XML
Obviously, we can do XML → XML in XQuery
… Or relations → relations
… What about relations → XML and XML →
relations?
Let’s start with XML → XML, relations →
relations
Views in SQL and XQuery
A view is a named query
We use the name of the view to invoke the query
(treating it as if it were the relation it returns)
SQL:
CREATE VIEW V(A,B,C) AS
SELECT A,B,C FROM R
WHERE R.A = '123'
Using the view:
SELECT *
FROM V, R
WHERE V.B = 5
AND V.C = R.C
XQuery:
declare function V() as element(content)* {
  for $r in doc("R")/root/tree,
      $a in $r/a, $b in $r/b, $c in $r/c
  where $a = "123"
  return <content>{$a, $b, $c}</content>
}
Using the view:
for $v in V(),
    $r in doc("R")/root/tree
where $v/b = $r/b
return $v
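The SQL half of this slide can be tried out with Python's built-in sqlite3 module. The following is a minimal sketch: the column types and the three sample rows of R are assumptions, chosen only so that the view and the query over it return something.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schema and sample rows for the base relation R.
    CREATE TABLE R (A TEXT, B INTEGER, C INTEGER);
    INSERT INTO R VALUES ('123', 5, 7), ('123', 6, 7), ('999', 5, 7);

    -- The view definition from the slide.
    CREATE VIEW V(A, B, C) AS
        SELECT A, B, C FROM R WHERE R.A = '123';
""")

# Use the view by name, as if it were a stored relation.
rows = conn.execute("""
    SELECT V.A, V.B, V.C, R.A, R.B
    FROM V, R
    WHERE V.B = 5 AND V.C = R.C
    ORDER BY R.A, R.B
""").fetchall()

for row in rows:
    print(row)
```

The query never mentions how V is computed; the engine substitutes the view definition when it runs the query.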

What’s Useful about Views
Providing security/access control


We can assign users permissions on different views
Can select or project so we only reveal what we want!
Can be used as relations in other queries

Allows the user to query things that make more sense
Describe transformations from one schema (the base
relations) to another (the output of the view)


The basis of converting from XML to relations or vice versa
This will be incredibly useful in data integration, discussed
soon…
Allow us to define recursive queries
Materialized vs. Virtual Views

A virtual view is a named query that is actually recomputed every time – it is merged with the
referencing query
CREATE VIEW V(A,B,C) AS
SELECT A,B,C FROM R
WHERE R.A = '123'

SELECT *
FROM V, R
WHERE V.B = 5 AND V.C = R.C

A materialized view is one that is computed once and
its results are stored as a table




Think of this as a cached answer
These are incredibly useful!
Techniques exist for using materialized views to answer other
queries
Materialized views are the basis of relating tables in different
schemas
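The difference between the two kinds of view can be observed with sqlite3. SQLite has no native materialized views, so this sketch emulates one with an ordinary table filled by CREATE TABLE ... AS SELECT; the table names V_virtual and V_materialized and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R (A TEXT, B INTEGER, C INTEGER);
    INSERT INTO R VALUES ('123', 5, 7);

    -- Virtual view: only the query text is stored.
    CREATE VIEW V_virtual AS
        SELECT A, B, C FROM R WHERE A = '123';

    -- Emulated materialized view: the query result is stored as a table.
    CREATE TABLE V_materialized AS
        SELECT A, B, C FROM R WHERE A = '123';
""")


def count(name):
    """Return the number of rows currently visible through a view or table."""
    return conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]


# Change the base relation after both views were defined.
conn.execute("INSERT INTO R VALUES ('123', 6, 8)")
virtual_count = count("V_virtual")            # recomputed from R on each use
materialized_count = count("V_materialized")  # cached answer, now stale

# A full refresh recomputes the cached answer from scratch.
conn.executescript("""
    DELETE FROM V_materialized;
    INSERT INTO V_materialized SELECT A, B, C FROM R WHERE A = '123';
""")
refreshed_count = count("V_materialized")
print(virtual_count, materialized_count, refreshed_count)
```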
Views Should Stay Fresh


Views (sometimes called intensional
relations) behave, from the perspective of a
query language, exactly like base relations
(extensional relations)
But there’s an association that should be
maintained:


If tuples change in the base relation, they should
change in the view (whether it’s materialized or
not)
If tuples change in the view, that should reflect in
the base relation(s)
View Maintenance and
the View Update Problem


There exist algorithms to incrementally recompute a
materialized view when the base relations change
We can try to propagate view changes to the base
relations

However, there are lots of views that aren’t easily
updatable:
R:  A B    S:  B C    R⋈S:  A B C
    1 2        2 4          1 2 4
    2 2        2 3          1 2 3
                            2 2 4
                            2 2 3
Delete a tuple from R⋈S?

We can ensure views are updatable
by enforcing certain constraints (e.g., no aggregation),
but this limits the kinds of views we can have
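Why the join view on this slide is hard to update can be checked with a few lines of Python. The sketch assumes R = {(1,2), (2,2)} and S = {(2,4), (2,3)} as read off the slide's tables, and picks (1, 2, 4) as the view tuple to delete purely for illustration.

```python
R = {(1, 2), (2, 2)}   # R(A, B), assumed contents
S = {(2, 4), (2, 3)}   # S(B, C), assumed contents


def join(r, s):
    """Natural join of R(A, B) and S(B, C) on B."""
    return {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}


view = join(R, S)
target = (1, 2, 4)          # hypothetical tuple to delete from the view
wanted = view - {target}    # the view we would like to see afterwards

# Option 1: delete the contributing tuple from R.
after_r = join(R - {(1, 2)}, S)
# Option 2: delete the contributing tuple from S.
after_s = join(R, S - {(2, 4)})

# Tuples that disappear from the view although nobody asked for it.
side_effects_r = wanted - after_r
side_effects_s = wanted - after_s
print(side_effects_r, side_effects_s)
```

Both options remove an extra view tuple, so neither base-table change implements the requested deletion exactly.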
Views as a Bridge between Data
Models
A claim made several times:
“XML can’t represent anything that can’t be
expressed in the relational model”
If this is true, then we must be able to represent
XML in relations
Store a relational view of XML
(or create an XML view of relations)
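One simple way to store a relational view of XML is a single "node" relation with one row per element or attribute. The sketch below is only one possible encoding, not the scheme these slides prescribe: the relation node(id, parent, tag, value), the "@" prefix for attributes, and the sample document are all assumptions.

```python
import itertools
import sqlite3
import xml.etree.ElementTree as ET

# Invented sample document.
XML = ("<dblp><article key='tr/1'><title>The Title</title>"
       "<year>1997</year></article></dblp>")


def shred(root):
    """Flatten an XML tree into (id, parent, tag, value) rows."""
    ids = itertools.count()
    rows = []

    def visit(el, parent_id):
        node_id = next(ids)
        value = (el.text or "").strip() or None
        rows.append((node_id, parent_id, el.tag, value))
        for name, attr_value in el.attrib.items():
            rows.append((next(ids), node_id, "@" + name, attr_value))
        for child in el:
            visit(child, node_id)

    visit(root, None)
    return rows


rows = shred(ET.fromstring(XML))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id INTEGER, parent INTEGER, tag TEXT, value TEXT)")
conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?)", rows)

# The path /dblp/article/title becomes a self-join on the parent column.
titles = conn.execute("""
    SELECT t.value
    FROM node AS a JOIN node AS t ON t.parent = a.id
    WHERE a.tag = 'article' AND t.tag = 'title'
""").fetchall()
print(titles)
```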
An Important Set of Questions
Views are incredibly powerful formalisms for describing
how data relates: fn: rel × … × rel → rel
(Or XML × XML → XML, or rel × rel → XML, ...)
Can I define a view recursively?

Why might this be useful in the XML construction case?
When should the recursion stop?
Suppose we have two views, v1 and v2


How do I know whether they represent the same data?
If v1 is materialized, can we use it to compute v2?

This is fundamental to query optimization and data
integration, as we’ll see later
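One common answer to "when should the recursion stop?" is: when applying the view definition once more produces no new tuples (a fixpoint). The Python sketch below illustrates this with a recursively defined reachability view over a hypothetical edge relation; the relation names and data are assumptions.

```python
def transitive_closure(edges):
    """Recursive view: reach(x, y) holds if there is a path from x to y.

    Base case:      reach(x, y) if edge(x, y)
    Recursive case: reach(x, z) if reach(x, y) and edge(y, z)
    """
    reach = set(edges)
    while True:
        # Apply the recursive rule once to everything derived so far.
        new = {(x, z) for (x, y) in reach for (y2, z) in edges if y == y2}
        new -= reach
        if not new:          # fixpoint: nothing new was derived, so stop
            return reach
        reach |= new


edges = {("a", "b"), ("b", "c"), ("c", "d")}
closure = transitive_closure(edges)
print(sorted(closure))

# The loop also terminates on cyclic data, because the set of possible
# tuples is finite and tuples are only ever added.
cyclic = transitive_closure({(1, 2), (2, 1)})
```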
Reasoning about Queries and Views
SQL and XQuery are a bit too complex to reason
about directly

Some features of SQL make reasoning about
queries undecidable
We need an elegant way of describing views
(let’s assume a relational model for now)



Should be declarative
Should be less complex than SQL
Doesn’t need to support all of SQL – aggregation,
for instance, may be more than we need