Answering Tree Pattern Queries Using Views
Laks V.S. Lakshmanan, Hui (Wendy) Wang, and Zheng (Jessica) Zhao
University of British Columbia
Vancouver, BC
Amazon.com
Outline
• Motivation
• Problems Studied
• Without schema
• With schema
• Recursive schemas
• Related Work
• Summary & Future Work

Motivation 1/3

• Integration of existing data sources.
• Local as view (LAV) – one of the well-known approaches.
• Each source = a materialized view over some global database.
• Answer to query over global DB = answer to query using (materialized) views.

Motivation 2/3
Source = View “//Trials//Trial” over some DB containing clinical data – trials, their status, patient data, etc.
Contents of the source:
<Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial>
<Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial>
<Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial>
Consider query Q: //Trials[//Status]//Trial over the (unknown) original DB.
How can and should we answer it using above source?
One possible original DB:
<PharmaLab> (1)
  <Trials @type=“T1”> (2)
    <Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial>
    <Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial>
  </Trials>
  <Trials @type=“T2”> (13)
    <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial>
  </Trials>
</PharmaLab>
Motivation 3/3
Q: //Trials[//Status]//Trial
View V: //Trials//Trial
Contained rewriting: V ◦ “●[//Status]”
Applied to the three Trial elements (3), (11) and (14) stored in the source, it returns { (3) }.
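The example can be replayed in a few lines of Python. This is a minimal sketch, not the authors' implementation: node numbers are encoded as hypothetical `id` attributes, the content elided with “…” on the slide is simply left out, and the helper `has_status` is a name introduced here for illustration.

```python
import xml.etree.ElementTree as ET

# One possible original DB from the slide. Node numbers are kept as "id"
# attributes (an encoding chosen for this sketch); elided content is omitted.
DB = """
<PharmaLab id="1">
  <Trials id="2" type="T1">
    <Trial id="3">
      <Patient id="4">John Doe</Patient>
      <Status id="10">Complete</Status>
    </Trial>
    <Trial id="11"><Patient id="12">Jen Bloe</Patient></Trial>
  </Trials>
  <Trials id="13" type="T2">
    <Trial id="14"><Patient id="15">Mary Moore</Patient></Trial>
  </Trials>
</PharmaLab>
"""
root = ET.fromstring(DB)


def has_status(elem):
    """True if elem has a Status descendant, i.e. the predicate [//Status]."""
    return elem.find(".//Status") is not None


# The source: materialized view V = //Trials//Trial.
view = [t for trials in root.iter("Trials") for t in trials.iter("Trial")]

# Contained rewriting V ◦ "●[//Status]": filter the view trees only.
rewriting_ids = [t.get("id") for t in view if has_status(t)]

# Q = //Trials[//Status]//Trial evaluated directly on this particular DB.
q_ids = [
    t.get("id")
    for trials in root.iter("Trials")
    if has_status(trials)
    for t in trials.iter("Trial")
]

print(rewriting_ids)  # ['3']
print(q_ids)          # ['3', '11']
```

On this DB the query itself also returns Trial (11), but that answer depends on the Trials parent, which the view does not expose. The rewriting therefore returns only (3), which is contained in the answer to Q.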
Problems Studied 1/3

• Equivalent Rewriting: Given Q and views V, find an equivalent rewriting of Q using V, i.e., an expression E s.t. V ◦ E ≡ Q over all possible input DBs.
  Appropriate for query optimization.
• Contained Rewriting: Given Q and V, find an expression E s.t. V ◦ E ⊆ Q over all possible input DBs, and V ◦ E is maximal among all such rewritings.
  Most appropriate for information integration [Halevy, Lenzerini, Pottinger & Halevy].
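Reading V ◦ E as "evaluate V on the input database D, then evaluate E on the result", the two notions can be spelled out as below. The evaluation notation E(V(D)) and the symbol for the class of input DBs are notational assumptions made for this sketch; they do not appear on the slides.

```latex
\begin{align*}
\text{Equivalent rewriting:}\quad & \forall D \in \mathcal{D}:\; E(V(D)) = Q(D) \\
\text{Contained rewriting:}\quad & \forall D \in \mathcal{D}:\; E(V(D)) \subseteq Q(D) \\
\text{Maximality:}\quad & \text{there is no } E' \text{ with } V \circ E \subsetneq V \circ E' \subseteq Q
\end{align*}
```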
Problems Studied 2/3
• No Schema: Given Q and V, find a maximally contained rewriting (MCR) of Q using V.
• With Schema: Given Q and V, and a schema prescribing possible input DBs, find a maximally contained rewriting of Q using V.
• Focus: Tree Pattern Queries (XP{/, //, [ ]}). Schema without cycles, union, and recursion.
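A tree pattern in this fragment can be pictured as a small tree whose edges are either child (/) or descendant (//) edges. The sketch below tests whether such a pattern embeds at a given data node. The names `Node`, `Pattern`, `descendants` and `matches` are hypothetical, and the distinguished (answer) node is ignored, so this illustrates the query class only, not the rewriting algorithm.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """A node of the data tree."""
    tag: str
    children: List["Node"] = field(default_factory=list)


@dataclass
class Pattern:
    """A tree pattern node; each edge is ("/", child) or ("//", child)."""
    tag: str
    edges: List[Tuple[str, "Pattern"]] = field(default_factory=list)


def descendants(n: Node) -> Iterator[Node]:
    """Yield all proper descendants of n in document order."""
    for c in n.children:
        yield c
        yield from descendants(c)


def matches(p: Pattern, n: Node) -> bool:
    """True if pattern p embeds into the data tree with its root mapped to n."""
    if p.tag != n.tag:
        return False
    for axis, child in p.edges:
        candidates = n.children if axis == "/" else descendants(n)
        if not any(matches(child, c) for c in candidates):
            return False
    return True


# Pattern a[//b]/c, rooted at an a-node.
pattern = Pattern("a", [("//", Pattern("b")), ("/", Pattern("c"))])
doc_yes = Node("a", [Node("x", [Node("b")]), Node("c")])
doc_no = Node("a", [Node("c")])

print(matches(pattern, doc_yes))  # True
print(matches(pattern, doc_no))   # False
```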

Problems Studied 3/3

Given Q & V: R ≡ V ◦ E ⊆ Q, where R is the rewriting query and E is the compensation query.
[Figure: tree pattern for //a[//b]/c, drawn with root //a and nodes b and c.]
Want MCR in the absence and in the presence of a schema.
Without Schema 1/6

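Question 1: Does an MCR always exist?
[Figure: tree patterns V (nodes /a, b, c), Q1 (nodes /b, d) and Q2 (nodes /a, b, d); a marker identifies the distinguished (answer) node of each pattern.]
No MCR for Q1 and for Q2. What went wrong?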
Without Schema 2/6
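[Figure: view V (//Trials, Trial) and query Q (//Trials (1), Status (2), Patient (3)), with a mapping f from Q to V. Labels in the figure: “f – useful embedding”, “Unfulfilled obligations”, “Clip Away Tree (CAT)”, and the compensation query E (Trial, Status (2), Patient (3)).]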
Without Schema 3/6
Theorem: Q, V – tree pattern queries.
Then Q is answerable using V iff there
is a useful embedding from Q to V.
Testing Existence of MCR:
[Figure: patterns Q and V, both rooted at //a, with numbered nodes labelled a, b, c, d, e and annotated with sets such as 1:{2}, 2:{} and 1:{2,3}, 2:{3}.]
Without Schema 4/6
• Two embeddings – corresponding irredundant CRs.
[Figure: the two embeddings and the two resulting contained rewritings, each rooted at //a over nodes a, b, c, d, e.]
• Need union (∪) for expressing MCR!
Without Schema 5/6
• Can test existence of MCR in poly time.
• However, MCRs can be exponentially large (closure issue).
[Figure: view V (//a, a, b, c) and query Q (rooted at //a, over nodes a, b, c, d, e).]
How many irredundant CRs are possible?
[Figure: V and Q as above, with the contained rewritings built up step by step; the labels a/b and a/b/c/e appear next to them.]
MCR = union of exponential # CRs in the worst case!
Without Schema 6/6
Summary:
• Can test existence of MCR in poly time.
• Exact characterization.
• MCR may be union of exponentially many CRs in the worst case.
• Algorithm for generating MCR.

With Schema 1/6
• Given Query Q, view V, schema S.
• Infer all constraints C implied by S.
• Chase V w.r.t. C.
• Look for MCR of Q w.r.t. chased view.

With Schema 2/6
[Figure: schema graph over Auctions, Auction, open_auction, closed_auction, bids, person, item and name, with edges labelled ?, * or +.]
E.g. constraints:
• c_a (closed_auction) has ≤ 1 bids child.
• Every Auction having a person desc also has an item desc.
• Every path from Auction to name goes via bids.
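Two of these constraint types can be read off a schema graph mechanically. The sketch below is a simplified illustration, not the inference algorithm from the talk. The `SCHEMA` dictionary is an assumed reading of the figure (the cardinalities on the edges into `name` in particular are guesses), and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Schema = Dict[str, List[Tuple[str, str]]]

# Assumed reading of the schema figure: parent -> [(child, cardinality)].
# "1" means exactly one; the edges into "name" are guesses for this sketch.
SCHEMA: Schema = {
    "Auctions": [("Auction", "*")],
    "Auction": [("open_auction", "?"), ("closed_auction", "?")],
    "open_auction": [("bids", "+")],
    "closed_auction": [("bids", "?")],
    "bids": [("person", "+"), ("item", "+")],
    "person": [("name", "1")],
    "item": [("name", "1")],
}


def reachable(schema: Schema, src: str, dst: str,
              banned: Optional[str] = None) -> bool:
    """Depth-first search from src to dst that never expands `banned`."""
    stack, seen = [src], set()
    while stack:
        cur = stack.pop()
        if cur == dst:
            return True
        if cur in seen or cur == banned:
            continue
        seen.add(cur)
        stack.extend(child for child, _ in schema.get(cur, []))
    return False


def every_path_goes_via(schema: Schema, src: str, dst: str, mid: str) -> bool:
    """True if dst is reachable from src, but not once mid is removed."""
    return reachable(schema, src, dst) and not reachable(schema, src, dst, mid)


def at_most_one_child(schema: Schema, parent: str, child: str) -> bool:
    """True if the parent -> child edge has cardinality ? or 1."""
    return any(c == child and card in ("?", "1")
               for c, card in schema.get(parent, []))


print(every_path_goes_via(SCHEMA, "Auction", "name", "bids"))    # True
print(every_path_goes_via(SCHEMA, "Auction", "name", "person"))  # False
print(at_most_one_child(SCHEMA, "closed_auction", "bids"))       # True
```

The reachability test relies on the schema being acyclic only for its interpretation, not for termination: the `seen` set keeps the search finite even on a recursive schema.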
With Schema 3/6
[Figure: view V (nodes //Auction, o_a, c_a, bids, bids) and query Q (nodes //Auction, bids, bids, person, item, name), shown next to the schema. In the second frame V carries additional nodes person (p), item (i) and name (n).]
With Schema 4/6
[Figure: query Q (nodes //Auction, bids, bids, person, item, name) next to the view with nodes //Auction, o_a, c_a, bids, bids, person (p), item (i), name (n).]
MCR = identity query.
With Schema 5/6
Another Example:
[Figure: schema graph over Auctions, Auction, open_auction, closed_auction, bids, buyer, person, item and name, with edges labelled ?, * or +; query Q (nodes //Auction, item, name) and view V (nodes //Auction, person). Later frames add item and name nodes to the view.]
How to answer Q using V?
So what’s the compensation query?
MCR = V ◦ “●//name”
With Schema 6/6

Challenges and Highlights:
• Naïve chase can explode; make the chase context aware.
• Exact characterization of schema w/o recursion and union in terms of constraints.
• Efficient algo. for inferring the constraints.
• Efficient algo. for chase.
• And for finding MCR.
• MCR is unique, if it exists.

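Recursive Schemas 1/2
[Figure: recursive schema over a, b, c and d with edges labelled * or ?; view V and query Q, both rooted at //a over b, c and d nodes.]
What is the MCR?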
Recursive Schemas 2/2
[Figure: the schema, V and Q as on the previous slide, together with four contained rewritings rooted at //a.]
• MCR = union of four CRs.
• Behavior similar to no schema.
Related Work 1/2
• QAV for relational – huge body of work [Halevy 01].
• Regular path queries and semistructured DBs [Grahne&Thomo 03, Calvanese 00, Papakonstantinou&Vassalos 99].
• Equivalent rewrites for fragments of XQuery and XPath [Deutsch&Tannen 03, Tang&Zhou 05, Xu&Ozsoyoglu 05].

Related Work 2/2
• Key differences b/w equivalent & contained rewriting:
  – Unique rewriting (even w/o schema).
  – MCR may involve union of (possibly exponentially many) CRs.
  – Study of contained rewriting in presence of schema.
• Lot of work on semantic caching [Chen+ 02], heuristics for using materialized views for optimizing XPath [Balmin+ 04], mining views worth materializing, XPath containment, … .

Summary & Future Work 1/2
• QAV using (maximally) contained rewriting (for information integration).
• Without schema: existence, characterization, closure, generation of MCR.
• With Schema: extract essence using constraints, chase, similar problems as above.
• Impact of recursion.
• Experiments.

Summary & Future Work 2/2
• Impact of wildcard, disjunction, order …
• Impact of union, recursion, …
• Other integration models (e.g., GLAV)
• QAV for XQuery.
