A Scalable Algorithm for Answering Queries Using Views

Rachel Pottinger
Qualifying Exam
October 29, 1999
Advisor: Alon Levy
1
Answering Queries Using
Views
Problem: answer queries by accessing views instead of the original relations
Useful in data integration and query optimization
NP-complete
Many papers on the subject
No previous empirical testing of algorithms
2
Data Integration:
Query Reformulation
Data sources are pre-calculated views
Views are not complete
Get the most answers possible given the views
Many data sources
Car sale information
Ford cars
- dealer prices
- sticker prices
- inventory
Cheap cars
- prices
- manufacturer
Used cars
- prices
- dealer
- year
3
Data Integration Example
Query: find the prices of cars that we can buy at cost
Query (over the database relations):
Q(cost) :- dealercost(car, cost) & stickerprice(car, cost)
Views:
V1(price1, price2) :- dealercost(car, price1) & stickerprice(car, price2) & maker(car, “Ford”)
V2(cost) :- dealercost(car, cost) & stickerprice(car, cost) & cheap(car)
Distinguished variables appear in a view head (price1, price2, cost); existential variables appear only in the body (car)
Conjunctive rewritings:
Q’1(cost) :- V1(cost, cost)
Q’2(cost) :- V2(cost)
Maximally contained rewriting: the union of Q’1 and Q’2
4
Outline
Previous algorithms
Bucket Algorithm [Levy, Rajaraman, Ordille, 1996]
Inverse rules [Duschka, Genesereth, 1997]
Minimum Necessary Connections (MiniCon) Algorithm
Experimental evaluation
Extension to arithmetic comparisons
Conclusions and future work
5
The Bucket Algorithm
Introduced as part of the Information Manifold system
Treats subgoals individually
6
Bucket Algorithm:
Populating buckets
For each subgoal in the query, place
relevant views in the subgoal’s bucket
Inputs:
Q(x):- r1(x,y) & r2(y,x)
V1(a):-r1(a,b)
V2(d):-r2(c,d)
V3(f):- r1(f,g) & r2(g,f)
Buckets:
r1(x,y): V1(x), V3(x)
r2(y,x): V2(x), V3(x)
7
Combining Buckets
For every combination in the Cartesian product of the buckets, check containment in the query
The Bucket Algorithm checks all possible combinations
Buckets:
r1(x,y): V1(x), V3(x)
r2(y,x): V2(x), V3(x)
Candidate rewritings (first atom from the r1(x,y) bucket, second from the r2(y,x) bucket):
Q’1(x) :- V1(x) & V2(x) 
Q’2(x) :- V1(x) & V3(x) 
Q’3(x) :- V3(x) & V2(x) 
Q’4(x) :- V3(x) & V3(x) 
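The combination step can be sketched as follows. This is an illustrative sketch under stated assumptions: atoms are encoded as (predicate, args) tuples, the function names expand, is_contained, and combine_buckets are hypothetical, and the step in which the Bucket Algorithm tries adding equalities to rescue a failing candidate is left out.

```python
from itertools import count, product


def expand(view_atoms, views):
    """Replace each view atom by the view body, renaming existential variables apart."""
    fresh = count()
    atoms = []
    for name, args in view_atoms:
        v_head, v_body = views[name]
        sub = dict(zip(v_head, args))
        n = next(fresh)
        for pred, v_args in v_body:
            atoms.append((pred, tuple(sub.get(a, f"{a}_{n}") for a in v_args)))
    return atoms


def is_contained(head, atoms, query):
    """True if the query (head, atoms) is contained in `query`.

    Searches for a containment mapping from `query` into (head, atoms).
    """
    q_head, q_body = query
    start = {}
    for q_var, var in zip(q_head, head):
        if start.setdefault(q_var, var) != var:
            return False

    def search(i, mapping):
        if i == len(q_body):
            return True
        pred, args = q_body[i]
        for a_pred, a_args in atoms:
            if a_pred != pred or len(a_args) != len(args):
                continue
            trial = dict(mapping)
            if all(trial.setdefault(q, a) == a for q, a in zip(args, a_args)):
                if search(i + 1, trial):
                    return True
        return False

    return search(0, start)


def combine_buckets(query, buckets, views):
    """Keep the combinations (one view atom per bucket) whose expansion is contained in the query."""
    q_head, _ = query
    return [
        list(combo)
        for combo in product(*buckets)
        if is_contained(q_head, expand(combo, views), query)
    ]


QUERY = (("x",), [("r1", ("x", "y")), ("r2", ("y", "x"))])
VIEWS = {
    "V1": (("a",), [("r1", ("a", "b"))]),
    "V2": (("d",), [("r2", ("c", "d"))]),
    "V3": (("f",), [("r1", ("f", "g")), ("r2", ("g", "f"))]),
}
BUCKETS = [
    [("V1", ("x",)), ("V3", ("x",))],  # bucket for r1(x,y)
    [("V2", ("x",)), ("V3", ("x",))],  # bucket for r2(y,x)
]
REWRITINGS = combine_buckets(QUERY, BUCKETS, VIEWS)
```

Under this sketch, V1(x) & V2(x) is rejected because its expansion r1(x,b) & r2(c,x) does not join the two subgoals on y, while the three combinations that include V3 pass the containment check.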
8
Inverse Rules
Part of the Infomaster system
Inverse rules show how to get database tuples from the views
Cannot be extended to interpreted predicates
Stops earlier than the Bucket Algorithm
9
Creating Inverse Rules
For each view V(X) :- r1(X1) & … & rn(Xn), and for each j = 1, …, n, form an inverse rule:
rj(X’j) :- V(X)
where X’j is Xj with each existential variable replaced by a Skolem term over X
Inputs:
V1(a) :- r1(a,b)
V2(d) :- r2(c,d)
V3(f) :- r1(f,g) & r2(g,f)
Inverse Rules:
IR1 r1(a, sfV1(a)) :- V1(a)
IR2 r2(sfV2(d), d) :- V2(d)
IR3 r1(f, sfV3(f)) :- V3(f)
IR4 r2(sfV3(f), f) :- V3(f)
sfV1, sfV2, and sfV3 are Skolem functions standing in for the existential view variables
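Generating the inverse rules is mechanical, as the following sketch suggests. The encoding and the name inverse_rules are assumptions for illustration. The sketch names one Skolem function per existential variable (for example sfV3_g), which is an assumption made so that views with several existential variables stay correct; the slide can write sfV3 because each view there has a single existential variable.

```python
def inverse_rules(name, head, body):
    """Return one (rule_head, rule_body) pair per subgoal of the view.

    Head variables are kept; every existential variable is replaced by a
    Skolem term over the view's head variables.
    """
    skolem_args = ",".join(head)
    rules = []
    for pred, args in body:
        new_args = tuple(
            a if a in head else f"sf{name}_{a}({skolem_args})" for a in args
        )
        rules.append(((pred, new_args), (name, tuple(head))))
    return rules


# V3(f) :- r1(f,g) & r2(g,f)
V3_RULES = inverse_rules("V3", ("f",), [("r1", ("f", "g")), ("r2", ("g", "f"))])
for (pred, args), (view, view_args) in V3_RULES:
    print(f"{pred}({', '.join(args)}) :- {view}({', '.join(view_args)})")
```

Both rules produced for V3 share the same Skolem term, which is what later lets the two subgoals of the query join on y.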
10
Combining Inverse Rules
At query time, query over rules
Query:
Q(x) :- r1(x,y) & r2(y,x)
+ Inverse Rules:
IR1 r1(a, sfV1(a)) :- V1(a)
IR2 r2(sfV2(d), d) :- V2(d)
IR3 r1(f, sfV3(f)) :- V3(f)
IR4 r2(sfV3(f), f) :- V3(f)
+ Tuples:
V1(g), V2(h), V3(j), V3(m)
= Expansion:
r1(g,sfV1(g)), r2(sfV2(h),h), r1(j,sfV3(j)), r2(sfV3(j),j), r1(m,sfV3(m)), r2(sfV3(m),m)
11
Unfolding rules before
tuples
Q(x) :- r1(x,y) & r2(y,x)
r1(x,y): IR1, IR3
r2(y,x): IR2, IR4
Use unification to see if rewriting is contained
in the query
No containment check necessary
12
The MiniCon Algorithm
Concentrate on variables rather than subgoals
to create MiniCon Descriptions (MCDs)
Combine MCDs that only overlap on
distinguished view variables
No containment check!
13
MiniCon Description
Formation
Form all MiniCon Descriptions (MCDs) that map
all query variables that have to be mapped
together
Inputs:
Q(x) :-r1(x,y) & r2(y,x)
V1(a):-r1(a,b)
V2(d):-r2(c,d)
V3(f):- r1(f,g) & r2(g,f)
MCDs:
view    mapping         subgoals mapped
V3      x → f, y → g    1, 2
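A simplified version of MCD formation can be sketched as below. It captures only the rule stated on this slide: once a query variable maps to an existential view variable, every subgoal using that variable must be covered by the same view. The names extend and form_mcds are hypothetical, head homomorphisms on the view are not modeled, and subgoals are numbered from 0 here rather than from 1 as on the slide.

```python
def extend(mapping, q_atom, v_atom):
    """Extend a query-variable -> view-variable mapping so q_atom maps onto v_atom, or return None."""
    (q_pred, q_args), (v_pred, v_args) = q_atom, v_atom
    if q_pred != v_pred or len(q_args) != len(v_args):
        return None
    trial = dict(mapping)
    for q_var, v_var in zip(q_args, v_args):
        if trial.setdefault(q_var, v_var) != v_var:
            return None
    return trial


def form_mcds(query, view):
    """Return a set of (mapping, covered_subgoal_indices) pairs for one view."""
    q_head, q_body = query
    v_head, v_body = view
    found = set()

    def close(mapping, covered):
        # Distinguished query variables must map to distinguished view variables.
        if any(q in q_head and v not in v_head for q, v in mapping.items()):
            return
        for i, (pred, args) in enumerate(q_body):
            if i in covered:
                continue
            # A subgoal that uses a variable mapped to an existential view
            # variable has to be covered by the same view.
            if any(a in mapping and mapping[a] not in v_head for a in args):
                for v_atom in v_body:
                    trial = extend(mapping, (pred, args), v_atom)
                    if trial is not None:
                        close(trial, covered | {i})
                return
        found.add((frozenset(mapping.items()), covered))

    for i, q_atom in enumerate(q_body):
        for v_atom in v_body:
            start = extend({}, q_atom, v_atom)
            if start is not None:
                close(start, frozenset({i}))
    return found


QUERY = (("x",), [("r1", ("x", "y")), ("r2", ("y", "x"))])
V1 = (("a",), [("r1", ("a", "b"))])
V2 = (("d",), [("r2", ("c", "d"))])
V3 = (("f",), [("r1", ("f", "g")), ("r2", ("g", "f"))])
MCDS_V3 = form_mcds(QUERY, V3)
```

V1 and V2 produce no MCD in this sketch because y would map to an existential variable while the other subgoal containing y cannot be covered by the same view.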
14
MiniCon Combination
Take all combinations of MCDs that
- map disjoint sets of subgoals
- map all subgoals of the query
MCDs:
view    mapping         subgoals mapped
V3      x → f, y → g    1, 2
Rewriting: Q’(x):-V3(x)
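The combination rule amounts to finding sets of MCDs that partition the query's subgoals, which the following sketch enumerates. The name combine_mcds and the (view atom, subgoal set) encoding are assumptions; the second data set, with views A through D, is hypothetical and only illustrates a case with more than one combination.

```python
def combine_mcds(mcds, num_subgoals):
    """Return every list of view atoms whose MCDs cover subgoals 1..num_subgoals exactly once.

    mcds: list of (view_atom, frozenset_of_subgoal_numbers).
    """
    results = []

    def extend(chosen, covered, start):
        if len(covered) == num_subgoals:
            results.append(chosen)
            return
        for i in range(start, len(mcds)):
            atom, subgoals = mcds[i]
            if covered & subgoals:
                continue  # overlapping MCDs are never combined
            extend(chosen + [atom], covered | subgoals, i + 1)

    extend([], frozenset(), 0)
    return results


# The slide's example: a single MCD for V3 covering subgoals 1 and 2.
SLIDE_REWRITINGS = combine_mcds([("V3(x)", frozenset({1, 2}))], 2)

# Hypothetical MCDs over a three-subgoal query.
OTHER_REWRITINGS = combine_mcds(
    [
        ("A", frozenset({1, 2})),
        ("B", frozenset({3})),
        ("C", frozenset({2, 3})),
        ("D", frozenset({1})),
    ],
    3,
)
```

No containment check appears anywhere in the sketch; disjointness and full coverage are the only tests applied.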
15
Experimental Evaluation
Tested performance and scale up of:
Bucket Algorithm
Inverse Rules extended with unification
MiniCon Algorithm
MiniCon at least as good in all cases, much
better in some
Show results for chain queries:
Q(a):-r1(a,b), r2(b,c), r3(c,d), r4(d,e)
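Chain queries of any length follow the same pattern as the one above. A small generator, sketched here with the hypothetical name chain_query and single-letter variable names (so at most 25 subgoals), makes the shape explicit; it is not the workload generator used in the experiments.

```python
from string import ascii_lowercase


def chain_query(n, distinguished=("a",)):
    """Build a chain query with n subgoals: r1(a,b), r2(b,c), ..."""
    if not 1 <= n < len(ascii_lowercase):
        raise ValueError("n must be between 1 and 25")
    body = ", ".join(
        f"r{i + 1}({ascii_lowercase[i]},{ascii_lowercase[i + 1]})" for i in range(n)
    )
    return f"Q({','.join(distinguished)}):-{body}"


print(chain_query(4))
```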
16
Many Rewritings
Chain queries with 5 subgoals and all variables
distinguished
[Plot: Time (sec) vs. Number of Views (1 to 11) for MiniCon, Inverse, and Bucket]
17
structured query and
views
Chain queries with 10 subgoals and 2
distinguished variables
[Plot: Time (sec) vs. Number of Views (0 to 400) for MiniCon, Inverse, and Bucket]
18
Few rewritings, less
structured views
Chain queries; 2 variables distinguished,
query of length 12, views of lengths 2, 3, and 4
[Plot: Time (sec) vs. Number of Views (0 to 150) for MiniCon and Inverse]
19
Extension:
Interpreted Predicates
Problem is in general undecidable
We looked at subgoals of the form:
var < constant or var > constant
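If a query variable maps to an existential view variable, require that the interpreted predicates of the view imply those of the query
Ex: Q(x) :- r1(x,y), y > 17
V1(a) :- r1(a,b), b > 18
Interpreted predicates: y > 17 and b > 18; y maps to b, and b > 18 implies b > 17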
Guaranteed to be sound
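For comparisons of the form var < constant or var > constant, the implication test reduces to comparing the two constants. The sketch below assumes each predicate is encoded as an (operator, constant) pair on the same variable after mapping; the name implies is hypothetical.

```python
def implies(view_pred, query_pred):
    """True if the view's comparison implies the query's comparison.

    Each predicate is (op, constant) with op in {"<", ">"}, both applied to
    the same variable once the query variable is mapped to the view variable.
    """
    (v_op, v_const), (q_op, q_const) = view_pred, query_pred
    if v_op not in ("<", ">") or q_op not in ("<", ">"):
        raise ValueError("only '<' and '>' are supported")
    if v_op != q_op:
        return False
    if v_op == ">":
        return v_const >= q_const
    return v_const <= q_const


# The slide's example: b > 18 in the view, y > 17 in the query, y mapped to b.
USABLE = implies((">", 18), (">", 17))
# A weaker view predicate, b > 16, would not guarantee y > 17.
NOT_USABLE = implies((">", 16), (">", 17))
```

Rejecting a view whenever the implication fails keeps the rewriting sound, at the possible cost of missing some answers.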
20
Interpreted Predicate
Results
[Plot: Time (sec) vs. Number of Views (0 to 400) for MiniCon IP and MiniCon. Chain queries with two distinguished variables, 10 subgoals, and 5 variables constrained]
[Plot: Time (sec) vs. Number of Views (1 to 9) for MiniCon IP and MiniCon. Chain queries with all variables distinguished, 5 subgoals, and 5 variables constrained]
21
Future Work
Query Optimization
Look for the fastest answer to query
Assume that all views are complete
Require equivalent rewritings
Need to allow overlap on subgoals mapped
A fuller comparison of interpreted
predicates
22
Conclusions
Scalability of previous algorithms understood
MiniCon Algorithm invented
First experimental comparison of algorithms
for answering queries using views
Extensions to binding patterns, interpreted
predicates
New maximally contained rewriting form
23
Maximally contained
Rewritings
Q’ is a maximally contained rewriting of a query Q using the views V = V1, …, Vn if
For any database D, and extensions v1, …, vn of the views such that vi ⊆ Vi(D), 1 ≤ i ≤ n,
Q’(v1, …, vn) ⊆ Q(D)
There is no other query Q1 such that
(1) Q’(v1, …, vn) ⊆ Q1(v1, …, vn)
(2) Q1(v1, …, vn) ⊆ Q(D), and there exists at least one database for which (1) is a strict subset
24
Containment Checks
Q1 ⊆ Q2 if, for every database, the answer to Q1 is a subset of the answer to Q2
m is a containment mapping from
Vars(Q2) to Vars(Q1) if
m maps every subgoal in the body of Q2 to a
subgoal in the body of Q1
m maps the head of Q2 to the head of Q1
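The two conditions above translate directly into a backtracking search. This sketch assumes conjunctive queries without constants, encoded as (head_args, body) with atoms as (predicate, args); the name find_containment_mapping is hypothetical.

```python
def find_containment_mapping(q2, q1):
    """Return a mapping from Vars(Q2) to Vars(Q1) witnessing Q1 ⊆ Q2, or None."""
    (head2, body2), (head1, body1) = q2, q1
    if len(head2) != len(head1):
        return None
    # Condition: m maps the head of Q2 to the head of Q1.
    start = {}
    for v2, v1 in zip(head2, head1):
        if start.setdefault(v2, v1) != v1:
            return None

    def search(i, mapping):
        if i == len(body2):
            return mapping
        pred, args = body2[i]
        # Condition: m maps every subgoal of Q2 to some subgoal of Q1.
        for pred1, args1 in body1:
            if pred1 != pred or len(args1) != len(args):
                continue
            trial = dict(mapping)
            if all(trial.setdefault(a, b) == b for a, b in zip(args, args1)):
                result = search(i + 1, trial)
                if result is not None:
                    return result
        return None

    return search(0, start)


# Q(x) :- r1(x,y) & r2(y,x)
Q = (("x",), [("r1", ("x", "y")), ("r2", ("y", "x"))])
# Expansion of Q'(x) :- V3(x)
EXP_V3 = (("x",), [("r1", ("x", "g")), ("r2", ("g", "x"))])
# Expansion of Q'(x) :- V1(x) & V2(x)
EXP_V1_V2 = (("x",), [("r1", ("x", "b")), ("r2", ("c", "x"))])

MAPPING = find_containment_mapping(Q, EXP_V3)
NO_MAPPING = find_containment_mapping(Q, EXP_V1_V2)
```

A mapping exists for the V3 expansion (x to x, y to g), so that rewriting is contained in Q; none exists for the V1 and V2 expansion, because y would have to map to both b and c.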
25
Inverse Rules With Unification
Find all Inverse Rules that match each
query subgoal; place in bucket for that
subgoal
For each rule in the first bucket
For each other subgoal, i, attempt to unify
the rules so far with all elements in the
bucket for i
If we cannot unify with anything in that
bucket, break out of loop, otherwise,
recurse
26
Correctness requirements
We need both soundness and
completeness
A sound rewriting has a valid containment
mapping from the variables of the query to
the variables of the view
For completeness we need only to check
rewritings of length less than or equal to that
of the query
27
Extensions to XML
Need to choose a query language
Containment checks should still hold
Need to check to make sure that
restructured elements are distinguished
May even be more scalable vs Inverse
Rules, Bucket Algorithm
28