Answering Tree Pattern Queries Using Views

Laks V.S. Lakshmanan, Hui (Wendy) Wang, and Zheng (Jessica) Zhao
University of British Columbia, Vancouver, BC
Amazon.com

Outline
Motivation
Problems Studied
Without schema
With schema
Recursive schemas
Related Work
Summary & Future Work

Motivation 1/3
Integration of existing data sources.
Local as view (LAV): one of the well-known approaches.
Each source = a materialized view over some global database.
Answer to query over global DB = answer to query using (materialized) views.

Motivation 2/3
Source = view “//Trials//Trial” over some DB containing clinical data: trials, their status, patient data, etc.

<Trial> (3)
  <Patient> (4) John Doe </Patient>
  …
  <Status> (10) Complete </Status>
</Trial>
<Trial> (11)
  <Patient> (12) Jen Bloe </Patient>
  …
</Trial>
<Trial> (14)
  <Patient> (15) Mary Moore </Patient>
  …
</Trial>

Consider query Q: //Trials[//Status]//Trial over the [unknown] original DB. How can and should we answer it using the above source?

One possible original DB:

<PharmaLab> (1)
  <Trials @type=“T1”> (2)
    <Trial> (3)
      <Patient> (4) John Doe </Patient>
      …
      <Status> (10) Complete </Status>
    </Trial>
    <Trial> (11)
      <Patient> (12) Jen Bloe </Patient>
      …
    </Trial>
  </Trials>
  <Trials @type=“T2”> (13)
    <Trial> (14)
      <Patient> (15) Mary Moore </Patient>
      …
    </Trial>
  </Trials>
</PharmaLab>

Motivation 3/3
Q: //Trials[//Status]//Trial
[Figure: Q posed against the possible original DB shown above, next to the result of the view //Trials//Trial.]
[Figure: the view //Trials//Trial returns Trial elements (3), (11), and (14); the possible original DB is repeated beside it.]

Motivation 3/3
Contained rewriting: //Trials//Trial ◦ “●[//Status]”
Applied to the view result, it returns { (3) }.

Problems Studied 1/3
Equivalent Rewriting: Given Q and views V, find an equivalent rewriting of Q using V, i.e., an expression E s.t. V◦E ≡ Q over all possible input DBs. Appropriate for query optimization.
Contained Rewriting: Given Q and V, find an expression E s.t. V◦E ⊆ Q over all possible input DBs, and V◦E is maximal among all such rewritings. Most appropriate for information integration [Halevy, Lenzerini, Pottinger & Halevy].

Problems Studied 2/3
No Schema: Given Q and V, find a maximally contained rewriting (MCR) of Q using V.
With Schema: Given Q and V, and a schema prescribing possible input DBs, find a maximally contained rewriting of Q using V.
Focus: Tree Pattern Queries (XP^{/, //, [ ]}). Schema without cycles, union, and recursion.

Problems Studied 3/3
Given Q & V: R ≡ V ◦ E ⊆ Q, where R is the rewriting query and E is the compensation query.
[Figure: example patterns //a[//b]/c and //a, with nodes b and c.]
Want MCR in the absence and in the presence of a schema.

Without Schema 1/6
Question 1: Does an MCR always exist?
[Figure: tree patterns for view V (nodes /a, b, c) and queries Q1 (nodes /b, d) and Q2.]
No MCR for Q1 and for Q2. What went wrong?
[Figure legend for Q2: distinguished (answer) node.]

Without Schema 2/6
[Figure: view V (//Trials, Trial), query Q (//Trials (1), Status (2), Patient (3), Trial), and expression E (Status (2), Patient (3)). Labels: f, a useful embedding; unfulfilled obligations; Clip Away Tree (CAT).]

Without Schema 3/6
Theorem: Let Q and V be tree pattern queries. Then Q is answerable using V iff there is a useful embedding from Q to V.
Testing existence of MCR:
[Figure: patterns Q and V over nodes a, b, c, d, e with numbered nodes and label sets such as 1:{2}, 2:{} and 1:{2,3}, 2:{3}.]

Without Schema 4/6
Two embeddings, with corresponding irredundant CRs.
[Figure: patterns rooted at //a over nodes a, b, c, d, e.]
Union needed for expressing MCR!

Without Schema 5/6
Can test existence of MCR in poly time.
However, MCRs can be exponentially large (closure issue).
[Figure: view V and query Q rooted at //a over nodes a, b, c, d, e; successive builds add CRs, with labels a/b and a/b/c/e.]
How many irredundant CRs are possible?
MCR = union of exponential # CRs in the worst case!

Without Schema 6/6
Summary:
Can test existence of MCR in poly time.
Exact characterization.
MCR may be union of exponentially many CRs in the worst case.
Algorithm for generating MCR.

With Schema 1/6
Given query Q, view V, schema S:
Infer all constraints C implied by S.
Chase V w.r.t. C.
Look for MCR of Q w.r.t. chased view.

With Schema 2/6
[Figure: schema graph over Auctions, Auction, open_auction, closed_auction, bids, person, item, name, with edge labels *, ?, +.]
E.g. constraints:
• c_a has ≤ 1 bids child.
• Every Auction having a person desc also has an item desc.
• Every path from Auction to name goes via bids.

With Schema 3/6
[Figure: patterns V and Q rooted at //Auction over nodes o_a, c_a, bids, person, item, name, shown alongside the schema graph.]
With Schema 4/6
[Figure: two patterns rooted at //Auction, one labelled Q, over nodes o_a, c_a, bids, person, item, name, p, i, n.]
MCR = identity query.

With Schema 5/6
Another example:
[Figure: schema graph over Auctions, Auction, open_auction, closed_auction, item, bids, buyer, person, name, with edge labels *, +, ?; query Q (//Auction, item, name); view V (//Auction, person).]
How to answer Q using V?
So what’s the compensation query?
MCR = V ◦ “●//name”

With Schema 6/6
Challenges and Highlights:
Naïve chase can explode. Make chase context aware.
Exact characterization of schema w/o recursion and union in terms of constraints.
Efficient algo. for inferring the constraints.
Efficient algo. for chase. And for finding MCR.
MCR is unique, if it exists.

Recursive Schemas 1/2
[Figure: recursive schema over a, b, c, d with edge labels *, ?; view V and query Q rooted at //a.]
What is the MCR?

Recursive Schemas 2/2
[Figure: same schema, V, and Q; successive builds add CRs rooted at //a over nodes b, c, d.]
MCR = union of four CRs.
Behavior similar to no schema.

Related Work 1/2
QAV for relational: huge body of work [Halevy 01].
Regular path queries and semistructured DBs [Grahne&Thomo 03, Calvanese 00, Papakonstantinou&Vassalos 99].
Equivalent rewrites for fragments of XQuery and XPath [Deutsch&Tannen 03, Tang&Zhou 05, Xu&Ozsoyoglu 05].

Related Work 2/2
Key differences b/w equivalent & contained rewriting:
Unique rewriting (even w/o schema).
MCR may involve union of (possibly exponentially many) CRs.
Study of contained rewriting in presence of schema.
Lot of work on semantic caching [Chen+ 02], heuristics for using materialized views for optimizing XPath [Balmin+ 04], mining views worth materializing, XPath containment, …

Summary & Future Work 1/2
QAV using (maximally) contained rewriting (for information integration).
Without schema: existence, characterization, closure, generation of MCR.
With schema: extract essence using constraints, chase, similar problems as above.
Impact of recursion.
Experiments.

Summary & Future Work 2/2
Impact of wildcard, disjunction, order, …
Impact of union, recursion, …
Other integration models (e.g., GLAV).
QAV for XQuery.
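Worked example (illustrative only): the clinical-trials example from the Motivation slides can be replayed in a few lines of Python to see why V ◦ “●[//Status]” is contained in Q but not equivalent to it. This is a minimal sketch, not the authors' algorithm: the Node class, the helper names, and the hard-coded evaluator for these two patterns are assumptions introduced here, and the children elided as “…” on the slides are simply left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    nid: int          # node id as printed on the slides, e.g. (3)
    tag: str
    children: list = field(default_factory=list)


def descendants(n):
    """Yield all proper descendants of n in document order."""
    for c in n.children:
        yield c
        yield from descendants(c)


def has_desc(n, tag):
    """True if n has a proper descendant with the given tag."""
    return any(d.tag == tag for d in descendants(n))


def trials_trial(root, need_status=False):
    """Evaluate //Trials//Trial, or //Trials[//Status]//Trial if need_status.

    Hypothetical helper hard-coded for the two patterns on the slides.
    Returns a dict {node id: Trial node}.
    """
    found = {}
    for t in [root, *descendants(root)]:
        if t.tag == "Trials" and (not need_status or has_desc(t, "Status")):
            for d in descendants(t):
                if d.tag == "Trial":
                    found[d.nid] = d
    return found


# "One possible original DB" from the slides (elided children omitted).
db = Node(1, "PharmaLab", [
    Node(2, "Trials", [
        Node(3, "Trial", [Node(4, "Patient"), Node(10, "Status")]),
        Node(11, "Trial", [Node(12, "Patient")]),
    ]),
    Node(13, "Trials", [
        Node(14, "Trial", [Node(15, "Patient")]),
    ]),
])

# What the source exposes: the materialized view V = //Trials//Trial.
view = trials_trial(db)

# Q evaluated directly on the (normally unknown) original DB.
q_answer = set(trials_trial(db, need_status=True))

# Contained rewriting: apply the compensation "●[//Status]" to each view tree.
cr_answer = {nid for nid, t in view.items() if has_desc(t, "Status")}

print(sorted(view), sorted(q_answer), sorted(cr_answer))
```

On this particular DB, Q also returns Trial (11), because its Trials ancestor has a Status descendant through Trial (3). The view result alone cannot reveal that, so the rewriting returns only (3): it is sound on every DB consistent with the view, but not complete.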