A Scalable Algorithm for Answering Queries Using Views
Rachel Pottinger
Qualifying Exam, October 29, 1999
Advisor: Alon Levy

Answering Queries Using Views
- Problem: answer a query by accessing views instead of the original relations
- Useful in data integration and query optimization
- NP-complete
- Many papers on the subject, but no empirical testing of the algorithms

Data Integration: Query Reformulation
- Data sources are precomputed views
- Views are not complete
- Goal: get as many answers as possible given the views
- Many data sources:
  - Ford cars: dealer prices, sticker prices, inventory
  - Cheap cars: prices, manufacturer
  - Used cars: prices, dealer, year

Data Integration Example
Query: find the prices of cars that we can buy at cost.
Query (over the database relations):
  Q(cost) :- dealercost(car, cost) & stickerprice(car, cost)
Views:
  V1(price1, price2) :- dealercost(car, price1) & stickerprice(car, price2) & maker(car, "Ford")
  V2(cost) :- dealercost(car, cost) & stickerprice(car, cost) & cheap(car)
In V1, price1 and price2 are distinguished (head) variables; car is existential.
Conjunctive rewritings:
  Q'1(cost) :- V1(cost, cost)
  Q'2(cost) :- V2(cost)
Together, Q'1 and Q'2 form the maximally contained rewriting.

Outline
- Previous algorithms:
  - Bucket Algorithm [Levy, Rajaraman, Ordille, 1996]
  - Inverse Rules [Duschka, Genesereth, 1997]
- Minimum Necessary Connections (MiniCon) Algorithm
- Experimental evaluation
- Extension to arithmetic comparisons
- Conclusions and future work

The Bucket Algorithm
- Introduced as part of the Information Manifold
- Treats query subgoals individually

Bucket Algorithm: Populating Buckets
For each subgoal in the query, place the relevant views in that subgoal's bucket.
Inputs:
  Q(x) :- r1(x, y) & r2(y, x)
  V1(a) :- r1(a, b)
  V2(d) :- r2(c, d)
  V3(f) :- r1(f, g) & r2(g, f)
Buckets:
  r1(x, y): V1(x), V3(x)
  r2(y, x): V2(x), V3(x)

Combining Buckets
For every combination in the Cartesian product of the buckets, check whether it is contained in the query. The Bucket Algorithm checks all possible combinations.
Candidate rewritings:
  Q'1(x) :- V1(x) & V2(x)
  Q'2(x) :- V1(x) & V3(x)
  Q'3(x) :- V3(x) & V2(x)
  Q'4(x) :- V3(x) & V3(x)
Of these, only Q'1 fails the containment check: its expansion r1(x, b) & r2(c, x) does not force b = c, so it need not satisfy the query's join on y.

Inverse Rules
- Part of the Info Master system
- Inverse rules show how to obtain database tuples from the view tuples
- Cannot be extended to interpreted predicates
- Stops earlier than the Bucket Algorithm

Creating Inverse Rules
For each view V(X) :- r1(X1) & … & rn(Xn), and for each j = 1, …, n, form an inverse rule rj(Xj) :- V(X), replacing each existential variable of the view by a Skolem function of the view's head variables.
Inputs:
  V1(a) :- r1(a, b)
  V2(d) :- r2(c, d)
  V3(f) :- r1(f, g) & r2(g, f)
Inverse rules (sfV1, sfV2, sfV3 are Skolem functions):
  IR1: r1(a, sfV1(a)) :- V1(a)
  IR2: r2(sfV2(d), d) :- V2(d)
  IR3: r1(f, sfV3(f)) :- V3(f)
  IR4: r2(sfV3(f), f) :- V3(f)

Combining Inverse Rules
At query time, evaluate the query over the inverse rules.
Query: Q(x) :- r1(x, y) & r2(y, x)
View tuples: V1(g), V2(h), V3(j), V3(m)
Expansion:
  r1(g, sfV1(g)), r2(sfV2(h), h),
  r1(j, sfV3(j)), r2(sfV3(j), j),
  r1(m, sfV3(m)), r2(sfV3(m), m)
Evaluating Q over the expansion yields x = j and x = m.

Unfolding Rules Before Tuples
Q(x) :- r1(x, y) & r2(y, x), with inverse rules IR1, IR2, IR3, IR4
- Use unification to see whether a rewriting is contained in the query
- No containment check necessary

The MiniCon Algorithm
- Concentrate on variables rather than subgoals to create MiniCon Descriptions (MCDs)
- Combine MCDs that overlap only on distinguished view variables
- No containment check!
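To make the contrast with MiniCon's "no containment check" concrete, the Bucket Algorithm's two phases for the running example (populate one bucket per query subgoal, then test every combination in the Cartesian product for containment) can be sketched in Python. This is a minimal sketch under stated assumptions, not the Information Manifold implementation: the atom/rule encoding, the helper names (`populate_buckets`, `expand`, `contained`, `bucket_algorithm`) and the fresh-variable naming are all hypothetical, and constants and interpreted predicates are ignored.

```python
from itertools import count, product

# Hypothetical encoding (not from the talk): an atom is (predicate, args) and a
# rule is (head_args, body_atoms). Variables are plain strings.
QUERY = (("x",), [("r1", ("x", "y")), ("r2", ("y", "x"))])
VIEWS = {
    "V1": (("a",), [("r1", ("a", "b"))]),
    "V2": (("d",), [("r2", ("c", "d"))]),
    "V3": (("f",), [("r1", ("f", "g")), ("r2", ("g", "f"))]),
}
_fresh = count()


def unify_atom(q_args, v_args):
    """Map query args onto view args position by position; None on conflict."""
    m = {}
    for q, v in zip(q_args, v_args):
        if m.setdefault(q, v) != v:
            return None
    return m


def populate_buckets(query, views):
    head, body = query
    buckets = []
    for pred, q_args in body:
        bucket = []
        for name, (v_head, v_body) in views.items():
            for v_pred, v_args in v_body:
                if v_pred != pred or len(v_args) != len(q_args):
                    continue
                m = unify_atom(q_args, v_args)
                # Distinguished query variables must land on distinguished view variables.
                if m is None or any(m[q] not in v_head for q in q_args if q in head):
                    continue
                inv = {v: q for q, v in m.items()}
                # Unmapped view head variables get fresh (hypothetical) names.
                args = tuple(inv.get(h, f"_f{next(_fresh)}") for h in v_head)
                bucket.append((name, args))
        buckets.append(bucket)
    return buckets


def expand(rewriting, views):
    """Unfold view atoms into base atoms, renaming existential view variables apart."""
    atoms = []
    for name, args in rewriting:
        v_head, v_body = views[name]
        sub = dict(zip(v_head, args))
        tag = next(_fresh)
        atoms += [(p, tuple(sub.get(a, f"{a}_{tag}") for a in a_args)) for p, a_args in v_body]
    return atoms


def contained(query, body):
    """Backtracking search for a containment mapping from the query into `body`."""
    head, q_body = query

    def search(i, m):
        if i == len(q_body):
            return True
        pred, args = q_body[i]
        for p, b_args in body:
            if p != pred or len(b_args) != len(args):
                continue
            m2 = dict(m)
            if all(m2.setdefault(a, b) == b for a, b in zip(args, b_args)) and search(i + 1, m2):
                return True
        return False

    # Head variables must map to themselves (rewriting head = query head).
    return search(0, {h: h for h in head})


def bucket_algorithm(query, views):
    return [combo for combo in product(*populate_buckets(query, views))
            if contained(query, expand(combo, views))]


for combo in bucket_algorithm(QUERY, VIEWS):
    print(" & ".join(f"{n}({', '.join(a)})" for n, a in combo))
```

Run as-is, it should print the bodies of the three candidates that pass the check (Q'2, Q'3 and Q'4 in the slide's numbering). Every one of them is checked by an explicit search, which is the cost MiniCon is designed to avoid.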
MiniCon Description Formation
Form all MiniCon Descriptions (MCDs), each of which maps together all the query variables that have to be mapped together.
Inputs:
  Q(x) :- r1(x, y) & r2(y, x)
  V1(a) :- r1(a, b)
  V2(d) :- r2(c, d)
  V3(f) :- r1(f, g) & r2(g, f)
MCDs:
  View: V3   Mapping: x → f, y → g   Subgoals mapped: 1, 2
V1 and V2 yield no MCD: y would map to an existential view variable (b or c), so every query subgoal containing y would have to be covered by that same view, which neither view can do.

MiniCon Combination
Take all combinations of MCDs that
- map disjoint sets of subgoals, and
- together map all subgoals of the query.
MCDs:
  View: V3   Mapping: x → f, y → g   Subgoals mapped: 1, 2
Rewriting: Q'(x) :- V3(x)

Experimental Evaluation
Tested the performance and scale-up of:
- Bucket Algorithm
- Inverse Rules, extended with unification
- MiniCon Algorithm
MiniCon was at least as good in all cases, and much better in some.
Results are shown for chain queries, e.g.:
  Q(a) :- r1(a, b) & r2(b, c) & r3(c, d) & r4(d, e)

Many Rewritings
Chain queries with 5 subgoals and all variables distinguished.
[Figure: time (sec) vs. number of views (1 to 11) for MiniCon, Inverse Rules, and Bucket]

Structured Query and Views
Chain queries with 10 subgoals and 2 distinguished variables.
[Figure: time (sec) vs. number of views (0 to 400) for MiniCon, Inverse Rules, and Bucket]

Few Rewritings, Less Structured Views
Chain queries with 2 distinguished variables, a query of length 12, and views of lengths 2, 3, and 4.
[Figure: time (sec) vs. number of views (0 to 150) for MiniCon and Inverse Rules]

Extension: Interpreted Predicates
- The problem is undecidable in general
- We looked at subgoals of the form var < constant or var > constant
- If a query variable maps to an existential view variable, require that the view's interpreted predicates imply the query's
- Example:
  Q(x) :- r1(x, y) & y > 17
  V1(a) :- r1(a, b) & b > 18
  Since b > 18 implies b > 17, V1 can be used.
- Guaranteed to be sound

Interpreted Predicate Results
[Figure: chain queries with two distinguished variables, 10 subgoals, and 5 variables constrained; time (sec) vs. number of views (0 to 400) for MiniCon IP and MiniCon]
[Figure: chain queries with all variables distinguished, 5 subgoals, and 5 variables constrained; time (sec) vs. number of views (1 to 9) for MiniCon IP and MiniCon]

Future Work
- Query optimization:
  - Look for the fastest answer to the query
  - Assume that all views are complete
  - Require equivalent rewritings
  - Need to allow overlap on the subgoals mapped
- A fuller comparison of interpreted predicates

Conclusions
- The scalability of previous algorithms is now understood
- Developed the MiniCon Algorithm
- First experimental comparison of algorithms for answering queries using views
- Extensions to binding patterns and interpreted predicates
- New maximally contained rewriting form

Maximally Contained Rewritings
Q' is a maximally contained rewriting of a query Q using the views $\mathcal{V} = V_1, \ldots, V_n$ if:
(1) for any database D and extensions $v_1, \ldots, v_n$ of the views such that $v_i \subseteq V_i(D)$ for $1 \le i \le n$, we have $Q'(v_1, \ldots, v_n) \subseteq Q(D)$; and
(2) there is no other query $Q_1$ such that, for every such D and $v_1, \ldots, v_n$, both $Q'(v_1, \ldots, v_n) \subseteq Q_1(v_1, \ldots, v_n)$ and $Q_1(v_1, \ldots, v_n) \subseteq Q(D)$, with the first containment strict for at least one database.

Containment Checks
- $Q_1 \subseteq Q_2$ if, for every database, the answer to $Q_1$ is a subset of the answer to $Q_2$
- m is a containment mapping from Vars(Q2) to Vars(Q1) if:
  - m maps every subgoal in the body of Q2 to a subgoal in the body of Q1, and
  - m maps the head of Q2 to the head of Q1
- For conjunctive queries, $Q_1 \subseteq Q_2$ holds exactly when such a mapping exists

Inverse Rules With Unification
- Find all inverse rules that match each query subgoal; place them in the bucket for that subgoal
- For each rule in the first bucket:
  - For each other subgoal i, attempt to unify the rules so far with all elements in the bucket for i
  - If we cannot unify with anything in that bucket, break out of the loop; otherwise, recurse

Correctness Requirements
- We need both soundness and completeness
- A rewriting is sound if there is a valid containment mapping from the variables of the query to the variables of the rewriting's expansion (each view replaced by its definition)
- For completeness, we need only check rewritings with no more subgoals than the query

Extensions to XML
- Need to choose a query language
- Containment checks should still hold
- Need to check that restructured elements are distinguished
- May scale even better relative to Inverse Rules and the Bucket Algorithm
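The MCD formation and combination steps for Q(x) :- r1(x, y) & r2(y, x) can be sketched in Python to show why only V3 produces an MCD. This is a simplified sketch, not the full MiniCon Algorithm: it omits head homomorphisms and does not minimize MCDs, and the encoding and helper names (`extend`, `close`, `form_mcds`, `combine`, `to_rewriting`) are hypothetical. The key rule it implements is the one on the slides: once a query variable maps to an existential view variable, every query subgoal mentioning that variable must be covered by the same view.

```python
from itertools import count

# Hypothetical encoding, as in the running example: atom = (predicate, args).
QUERY = (("x",), [("r1", ("x", "y")), ("r2", ("y", "x"))])
VIEWS = {
    "V1": (("a",), [("r1", ("a", "b"))]),
    "V2": (("d",), [("r2", ("c", "d"))]),
    "V3": (("f",), [("r1", ("f", "g")), ("r2", ("g", "f"))]),
}


def extend(m, q_args, v_args):
    """Extend mapping m (query var -> view var) so q_args map onto v_args, or None."""
    m = dict(m)
    for q, v in zip(q_args, v_args):
        if m.setdefault(q, v) != v:
            return None
    return m


def close(q_head, q_body, v_head, v_body, m, covered):
    """Grow one candidate MCD until the conditions hold; return all completions."""
    # Distinguished query variables must map to distinguished view variables.
    if any(q in q_head and v not in v_head for q, v in m.items()):
        return []
    # A variable mapped to an existential view variable drags in every query
    # subgoal that mentions it; this view must cover those subgoals too.
    for k, (pred, args) in enumerate(q_body):
        if k not in covered and any(a in m and m[a] not in v_head for a in args):
            results = []
            for v_pred, v_args in v_body:
                if v_pred == pred and len(v_args) == len(args):
                    m2 = extend(m, args, v_args)
                    if m2 is not None:
                        results += close(q_head, q_body, v_head, v_body, m2, covered | {k})
            return results
    return [(m, covered)]


def form_mcds(query, views):
    q_head, q_body = query
    mcds = set()
    for name, (v_head, v_body) in views.items():
        for i, (pred, args) in enumerate(q_body):
            for v_pred, v_args in v_body:
                if v_pred != pred or len(v_args) != len(args):
                    continue
                m = extend({}, args, v_args)
                if m is None:
                    continue
                for mm, cov in close(q_head, q_body, v_head, v_body, m, frozenset({i})):
                    mcds.add((name, frozenset(mm.items()), cov))
    return sorted(mcds, key=lambda d: (d[0], sorted(d[2]), sorted(d[1])))


def combine(mcds, n_subgoals):
    """Choose MCDs whose covered subgoal sets are disjoint and cover the whole query."""
    full = frozenset(range(n_subgoals))
    out = []

    def search(start, covered, chosen):
        if covered == full:
            out.append(chosen)
            return
        for i in range(start, len(mcds)):
            if not (mcds[i][2] & covered):
                search(i + 1, covered | mcds[i][2], chosen + [mcds[i]])

    search(0, frozenset(), [])
    return out


def to_rewriting(query, views, chosen):
    q_head, _ = query
    fresh = count()
    atoms = []
    for name, mapping, _ in chosen:
        inv = {v: q for q, v in mapping}
        # View head variables not reached by the mapping get fresh (hypothetical) names.
        args = [inv.get(h, f"_z{next(fresh)}") for h in views[name][0]]
        atoms.append(f"{name}({', '.join(args)})")
    return f"Q'({', '.join(q_head)}) :- {' & '.join(atoms)}"


mcds = form_mcds(QUERY, VIEWS)
for chosen in combine(mcds, len(QUERY[1])):
    print(to_rewriting(QUERY, VIEWS, chosen))  # Q'(x) :- V3(x)
```

Because the combination step only joins MCDs with disjoint covered subgoals, no containment check is run; V1 and V2 are discarded during formation rather than after enumerating combinations.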
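The interpreted-predicate rule (if a query variable maps to an existential view variable, the view's comparisons must imply the query's) can be sketched for the restricted case the talk considers: single-variable comparisons against constants. Assumptions: the `Cmp` type and the helpers `implies` and `comparisons_ok` are hypothetical; each view comparison is tested on its own, so the check is sound but can miss implications that rely on an integer domain or on unsatisfiable view constraints.

```python
from typing import NamedTuple


class Cmp(NamedTuple):
    """A comparison `var op const`, with op restricted to '<' or '>'."""
    var: str
    op: str
    const: float


def implies(premises, goal):
    """Sound check that some premise on the same variable and direction implies goal."""
    for p in premises:
        if p.var == goal.var and p.op == goal.op:
            # v > c' with c' >= c implies v > c; v < c' with c' <= c implies v < c.
            if (goal.op == ">" and p.const >= goal.const) or (goal.op == "<" and p.const <= goal.const):
                return True
    return False


def comparisons_ok(query_cmps, mapping, view_head, view_cmps):
    """Query comparisons on variables mapped to existential view variables must be implied."""
    for c in query_cmps:
        v = mapping.get(c.var)
        if v is None or v in view_head:
            # Assumption: unmapped or distinguished variables are not checked here;
            # a comparison on a distinguished variable could be added to the rewriting.
            continue
        if not implies(view_cmps, Cmp(v, c.op, c.const)):
            return False
    return True


# Slide example: Q(x) :- r1(x, y) & y > 17 and V1(a) :- r1(a, b) & b > 18.
mapping = {"x": "a", "y": "b"}
print(comparisons_ok([Cmp("y", ">", 17)], mapping, ("a",), [Cmp("b", ">", 18)]))  # True
print(comparisons_ok([Cmp("y", ">", 17)], mapping, ("a",), [Cmp("b", ">", 16)]))  # False
```

With b > 16 the view may return tuples where b is 17, which the query excludes; because b is existential, the rewriting has no way to filter them, so the view must be rejected.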