Answering Tree Pattern
Queries Using Views
Laks V.S. Lakshmanan, Hui (Wendy) Wang, and
Zheng (Jessica) Zhao
University of British Columbia
Vancouver, BC
Amazon.com
Outline
Motivation
Problems Studied
Without schema
With schema
Recursive schemas
Related Work
Summary & Future Work
Motivation 1/3
Integration of existing data sources.
Local as view (LAV) – one of the well-known
approaches.
Each source = a materialized view over
some global database.
Answer to query over global DB = answer to
query using (materialized) views.
Motivation 2/3
<Trial> (3) <Patient> (4) John Doe </Patient> …
<Status> (10) Complete </Status> </Trial>
<Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial>
<Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial>
Source = View “//Trials//Trial” over some DB containing
clinical data – trials, their status, patient data, etc.
Consider query Q: //Trials[//Status]//Trial over the (unknown)
original DB.
How can and should we answer it using above source?
One possible original DB, of which the source above is the view //Trials//Trial:
<PharmaLab> (1)
  <Trials @type=“T1”> (2)
    <Trial> (3) <Patient> (4) John Doe </Patient> …
    <Status> (10) Complete </Status> </Trial>
    <Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial>
  </Trials>
  <Trials @type=“T2”> (13)
    <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial>
  </Trials>
</PharmaLab>
Motivation 3/3
Contained rewriting: //Trials//Trial ◦ “●[//Status]”
Applied to the view's answer over the possible original DB of the previous slide, ●[//Status] keeps only the Trial elements that have a Status descendant:
{ (3) }
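The example above can be replayed as a small sketch. It is only an illustration under assumptions: the `id` attributes are a hypothetical encoding of the node numbers on the slide, the elided content ("…") is left out, and the helper names `view`, `compensation`, and `query` are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the original DB; the id attributes mirror the
# node numbers on the slide, and elided content is simply left out.
DB = """
<PharmaLab id="1">
  <Trials id="2" type="T1">
    <Trial id="3"><Patient id="4">John Doe</Patient><Status id="10">Complete</Status></Trial>
    <Trial id="11"><Patient id="12">Jen Bloe</Patient></Trial>
  </Trials>
  <Trials id="13" type="T2">
    <Trial id="14"><Patient id="15">Mary Moore</Patient></Trial>
  </Trials>
</PharmaLab>
"""

def view(root):
    """V = //Trials//Trial: all Trial descendants of any Trials element."""
    return root.findall(".//Trials//Trial")

def compensation(trials):
    """E = ●[//Status]: keep the view answers that have a Status descendant."""
    return [t for t in trials if t.find(".//Status") is not None]

def query(root):
    """Q = //Trials[//Status]//Trial, evaluated directly on the original DB."""
    answers = []
    for trials in root.iter("Trials"):
        if trials.find(".//Status") is not None:
            answers.extend(trials.iter("Trial"))
    return answers

root = ET.fromstring(DB)
rewriting_ids = [t.get("id") for t in compensation(view(root))]
query_ids = [t.get("id") for t in query(root)]
print(rewriting_ids)  # answers of V ◦ E
print(query_ids)      # answers of Q on this particular DB
```

On this DB the rewriting returns only node 3, while Q itself also returns node 11. The rewriting is contained in Q but not equivalent to it, because the view has discarded the Trials ancestor that node 11 shares with the Status element.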
Problems Studied 1/3
Equivalent Rewriting: Given Q and views V,
find an equivalent rewriting of Q using V, i.e.,
an expression E s.t. V◦E ≡ Q, over all possible
input DBs.
Appropriate for query optimization.
Contained Rewriting: Given Q and V, find an
expression E s.t. V◦E ⊆ Q over all possible
input DBs, and V◦E is maximal among all such
rewritings.
Most appropriate for information integration
[Halevy, Lenzerini, Pottinger & Halevy].
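The two notions can be written side by side. The notation $Q(D)$ for the answer of $Q$ on an input DB $D$ is an assumption of this sketch, not taken from the slides.

```latex
\begin{align*}
\text{equivalent rewriting:} \quad & \forall D:\ (V \circ E)(D) = Q(D) \\
\text{contained rewriting:}  \quad & \forall D:\ (V \circ E)(D) \subseteq Q(D)
\end{align*}
```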
Problems Studied 2/3
No Schema: Given Q and V, find a
maximally contained rewriting (MCR) of
Q using V.
With Schema: Given Q and V, and a
schema prescribing possible input DBs,
find a maximally contained rewriting of
Q using V.
Focus: Tree Pattern Queries (XP{/, //, [ ]}).
Schema without cycles, union, and
recursion.
Problems Studied 3/3
Given Q & V:
R ≡ V ◦ E ⊆ Q.
R: rewriting query. E: compensation query.
[Figure: example tree patterns; recoverable labels: //a[//b]/c, //a, b, c.]
Want MCR in the absence and in the
presence of a schema.
Without Schema 1/6
Question 1: Does an MCR always exist?
[Figure: tree patterns V (/a, b, c), Q1 (/b, d) and Q2 (/a, b, d), each with its distinguished (answer) node marked.]
No MCR for Q1 and for Q2.
What went wrong?
Without Schema 2/6
[Figure: view V (//Trials, Trial), query Q (//Trials (1), Status (2), Patient (3)) and compensation E (Trial, Status (2), Patient (3)).]
f – useful embedding
Unfulfilled obligations
Clip Away Tree (CAT)
Without Schema 3/6
Theorem: Q, V – tree pattern queries.
Then Q is answerable using V iff there
is a useful embedding from Q to V.
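The slides do not spell out the definition of a useful embedding, so the sketch below shows only the classical total embedding (pattern homomorphism) between two tree patterns, as a hypothetical starting point. The `Node` class, the function names, and the example patterns are all assumptions made for illustration. The sketch also ignores the distinguished node.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    axis: str = "/"  # edge from the parent: "/" (child) or "//" (descendant)
    children: list = field(default_factory=list)

def descendants(v):
    """Yield all proper descendants of pattern node v."""
    for c in v.children:
        yield c
        yield from descendants(c)

def maps_to(q, v):
    """True if the subpattern rooted at q can be embedded at node v."""
    if q.label != v.label:
        return False
    for qc in q.children:
        if qc.axis == "/":
            # A child edge of Q must land on a child edge of V.
            targets = [vc for vc in v.children if vc.axis == "/"]
        else:
            # A descendant edge of Q may land on any proper descendant in V.
            targets = descendants(v)
        if not any(maps_to(qc, t) for t in targets):
            return False
    return True

def embeds(q_root, v_root):
    """Root-to-root embedding of pattern Q into pattern V."""
    if q_root.axis == "/" and v_root.axis != "/":
        return False
    return maps_to(q_root, v_root)

# Hypothetical patterns: V = //Trials/Trial/Status
v = Node("Trials", "//", [Node("Trial", "/", [Node("Status", "/")])])
q1 = Node("Trials", "//", [Node("Status", "//")])  # //Trials[//Status]
q2 = Node("Trials", "//", [Node("Status", "/")])   # //Trials[Status]
print(embeds(q1, v), embeds(q2, v))
```

A useful embedding, as the "unfulfilled obligations" and "Clip Away Tree" labels suggest, presumably relaxes this total map to a partial one. That relaxation is not modelled here.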
Testing Existence of MCR:
[Figure: patterns Q and V with numbered nodes (labels //a, a, b, c, d, e), annotated with label sets such as 1:{2}, 2:{} and 1:{2,3}, 2:{3}, 2:{6}, 3:{4}, 4:{5}, 6:{7}.]
Without Schema 4/6
Two embeddings – corresponding
irredundant CRs.
[Figure: tree patterns Q and V (labels //a, a, b, c, d, e) and the two corresponding CRs, annotated “need for expressing MCR!”]
Without Schema 5/6
Can test existence of MCR in poly time.
However, MCRs can be exponentially
large (closure issue).
[Figure: view V (//a, a, b, c) and query Q (//a, a, a, b, b, c, d, c, e).]
How many irredundant CRs are possible?
MCR = union of an exponential number of CRs in the worst case!
[Figure: the CRs built from V; recoverable labels: a/b, a/b/c/e, c, e.]
Without Schema 6/6
Summary:
Can test existence of MCR in poly time.
Exact characterization.
MCR may be union of exponentially many
CRs in the worst case.
Algorithm for generating MCR.
With Schema 1/6
Given Query Q, view V, schema S.
Infer all constraints C implied by S.
Chase V w.r.t. C.
Look for MCR of Q w.r.t. chased view.
With Schema 2/6
[Figure: schema graph over Auctions, Auction, open_auction, closed_auction, bids, person, item, name, with edges annotated *, ? and +.]
E.g. constraints:
• c_a has ≤ 1 bids child.
• Every Auction having a person desc also has an item desc.
• Every path from Auction to name goes via bids.
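A constraint of the third kind can be checked on a schema graph by a reachability test: the target must be reachable from the source, and must stop being reachable once the intermediate node is removed. The sketch below illustrates only this idea and is not the inference algorithm of the talk. The `SCHEMA` dictionary is a hypothetical reading of the figure, and it ignores the *, ? and + annotations.

```python
def reachable(schema, start, goal, removed=None):
    """Depth-first search over the schema graph, skipping the `removed` node."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen or node == removed:
            continue
        seen.add(node)
        stack.extend(schema.get(node, []))
    return False

def every_path_goes_via(schema, source, target, via):
    """True if target is reachable from source, but not once `via` is removed."""
    return (reachable(schema, source, target)
            and not reachable(schema, source, target, removed=via))

# Hypothetical schema graph: element name -> possible child element names.
SCHEMA = {
    "Auctions": ["Auction"],
    "Auction": ["open_auction", "closed_auction"],
    "open_auction": ["bids"],
    "closed_auction": ["bids"],
    "bids": ["person", "item"],
    "person": ["name"],
    "item": ["name"],
}

print(every_path_goes_via(SCHEMA, "Auction", "name", "bids"))
print(every_path_goes_via(SCHEMA, "Auction", "name", "person"))
```

The `seen` set keeps the search terminating on graphs with cycles as well, although the talk restricts this part to schemas without recursion.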
With Schema 3/6
[Figure: view V and query Q as tree patterns next to the schema graph. Recoverable labels: V: //Auction, o_a, c_a, bids, bids, with nodes person (p), item (i) and name (n) added in the second build; Q: //Auction, bids, bids, person, item, name.]
With Schema 4/6
[Figure: query Q (//Auction, bids, bids, person, item, name) next to the view (//Auction, o_a, c_a, bids, bids) extended with person (p), item (i) and name (n).]
MCR = identity query.
With Schema 5/6
Another Example:
[Figure: schema graph over Auctions, Auction, open_auction, closed_auction, item, bids, buyer, person, name, with edges annotated *, + and ?; query Q (//Auction, item, name); view V (//Auction, person).]
How to answer Q using V?
So what’s the compensation query?
MCR = V ◦ “●//name”
With Schema 6/6
Challenges and Highlights:
Naïve chase can explode. Make chase context aware.
Exact characterization of schema w/o
recursion and union in terms of constraints.
Efficient algo. for inferring the constraints.
Efficient algo. for chase.
And for finding MCR.
MCR is unique, if it exists.
Recursive Schemas 1/2
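[Figure: recursive schema graph over a, b, c, d (edges annotated *, *, ?), with view V and query Q, both rooted at //a with b, c and d nodes.]
What is the MCR?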
Recursive Schemas 2/2
[Figure: the schema, V and Q from the previous slide, followed by four CRs shown one at a time, each rooted at //a with b, c and d nodes.]
MCR = union of four CRs.
Behavior similar to no schema.
Related Work 1/2
QAV for relational – huge body of work
[Halevy 01].
Regular path queries and semistructured DBs
[Grahne&Thomo 03, Calvanese 00,
Papakonstantinou&Vassalos 99].
Equivalent rewrites for fragments of
XQuery and XPath [Deutsch&Tannen
03, Tang&Zhou 05, Xu&Ozsoyoglu 05].
Related Work 2/2
Key differences b/w equivalent &
contained rewriting:
Unique rewriting (even w/o schema).
MCR may involve union of (possibly
exponentially many) CRs.
Study of contained rewriting in
presence of schema.
Lot of work on semantic caching [Chen+
02], heuristics for using materialized
views for optimizing XPath [Balmin+ 04],
mining views worth materializing, XPath
containment, … .
Summary & Future Work 1/2
QAV using (maximally) contained
rewriting (information integration).
Without schema: existence,
characterization, closure, generation of
MCR.
With Schema: extract essence using
constraints, chase, similar problems as
above.
Impact of recursion.
Experiments.
Summary & Future Work 2/2
Impact of wildcard, disjunction, order …
Impact of union, recursion, …
Other integration models (e.g., GLAV)
QAV for XQuery.