
Local as View:
Some refinements
• IM: Filtering irrelevant sources
• Views with restricted access patterns
• A summary of IM
2005
lav-ii
1
IM: Filtering irrelevant sources
When there are many sources, it is important to weed out those
that are irrelevant to a query
Comparison constraints can help (e.g., qu >= w98)
What more can be done?
The IM (Information Manifold) system proposes introducing
classes, organized in a class hierarchy,
into source descriptions
Example: [figure: class hierarchy]
Classes shown: carForSale; car; usedCar, newCar; AmericanCar, EuropeanCar, JapaneseCar; GermanCar, ItalianCar, FrenchCar
Dashed lines (--) in the figure mark disjoint classes
Additionally, the global schema contains a relation
details(car, year, mileage, price, sellerContact)
abbreviated as details(c, y, mi, p, s)
(we will also abbreviate class names)
The views:
v1(c, y, mi, p, s) :- details(c, y, mi, p, s), cFSale(c), uCar(c), y >= 1990
v2(c, y, p, s) :- details(c, y, mi, p, s), cFSale(c), EurCar(c)
v3(c, y, p, s) :- details(c, y, mi, p, s), cFSale(c), uCar(c), p >= $25000 // luxury cars
v4(c, y, p) :- details(c, y, mi, p, s), cFSale(c), uCar(c), y <= 1980 // vintage cars
v5(c, y, p, s) :- details(c, y, mi, p, s), cFSale(c), nCar(c), c = Toyota
Assume a query:
Q: q(c, mi, y, p, s) :- details(c, y, mi, p, s), cFSale(c), JCar(c),
y >= 1992, p <= $12000
Some candidate rewritings will be rejected, since they are
inconsistent with Q
When a view is considered for consistency with Q:
• v4 will be discarded: y <= 1980, y >= 1992 is inconsistent
• v3 will be discarded: p >= $25000, p <= $12000 is inconsistent
• v2 will be discarded: EurCar(c), JCar(c) is inconsistent
• v5: depends on what is known about the relationship between
Toyota and the various car classes
Reasoning about disjointness of classes (given a hierarchy as
above) is easy and efficient
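The filtering step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the IM implementation: the parent links (for instance GermanCar under EurCar), the two disjoint sibling groups, and the names Desc, disjoint and consistent are all hypothetical. v5 is left out because its status depends on what is known about Toyota.

```python
from dataclasses import dataclass, field
import math

# Assumed encoding of the class hierarchy: child -> parent (hypothetical).
PARENT = {
    "uCar": "car", "nCar": "car",
    "AmCar": "car", "EurCar": "car", "JCar": "car",
    "GermanCar": "EurCar", "ItalianCar": "EurCar", "FrenchCar": "EurCar",
}
# Assumed groups of pairwise disjoint sibling classes.
DISJOINT_GROUPS = [{"uCar", "nCar"}, {"AmCar", "EurCar", "JCar"}]


def ancestors(cls):
    """Return cls together with all of its superclasses."""
    out = {cls}
    while cls in PARENT:
        cls = PARENT[cls]
        out.add(cls)
    return out


def disjoint(a, b):
    """True if a and b lie below two different members of a disjoint group."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    for group in DISJOINT_GROUPS:
        xs, ys = anc_a & group, anc_b & group
        if any(x != y for x in xs for y in ys):
            return True
    return False


@dataclass
class Desc:
    classes: set                                # class atoms on the car variable c
    bounds: dict = field(default_factory=dict)  # variable -> (low, high), closed


def consistent(view, query):
    """A view is kept only if its classes and bounds can hold together with Q's."""
    if any(disjoint(a, b) for a in view.classes for b in query.classes):
        return False
    for var, (vlo, vhi) in view.bounds.items():
        qlo, qhi = query.bounds.get(var, (-math.inf, math.inf))
        if max(vlo, qlo) > min(vhi, qhi):       # empty intersection of intervals
            return False
    return True


views = {
    "v1": Desc({"cFSale", "uCar"}, {"y": (1990, math.inf)}),
    "v2": Desc({"cFSale", "EurCar"}),
    "v3": Desc({"cFSale", "uCar"}, {"p": (25000, math.inf)}),
    "v4": Desc({"cFSale", "uCar"}, {"y": (-math.inf, 1980)}),
}
Q = Desc({"cFSale", "JCar"}, {"y": (1992, math.inf), "p": (-math.inf, 12000)})

kept = [name for name, v in views.items() if consistent(v, Q)]
print(kept)
```

Both checks are cheap: the class test walks up the hierarchy once per class, and the comparison test intersects one interval per variable.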
The true story (a side trip):
IM uses a (PTIME) Description Logic for source description
A DL is a formalism that describes
classes and binary relationships
intensionally.
For example, a class can be given by a name (e.g. JCar) or by an
expression that describes its properties:
cheapJCar :- uCar and JCar and price < $9000
A DL also contains containment and disjointness axioms for
class expressions (containment is called subsumption in DL jargon)
To be useful, a DL needs to support containment and disjointness queries on classes and membership queries on individuals;
answering them is an inference problem
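To make the inference problem concrete, here is a toy subsumption test in Python. It covers only a tiny fragment chosen for illustration (a conjunction of class names plus one strict upper bound on price, with no hierarchy or axioms), and the names ClassExpr and subsumes are hypothetical; a real DL reasoner handles far richer expressions.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class ClassExpr:
    names: frozenset                # conjunction of class names
    price_below: float = math.inf   # strict upper bound on price


def subsumes(general, specific):
    """True if every instance of `specific` must be an instance of `general`.

    Sound for this toy fragment only: `specific` must mention every
    conjunct of `general` and have an upper bound at least as tight.
    """
    return (general.names <= specific.names
            and specific.price_below <= general.price_below)


# cheapJCar :- uCar and JCar and price < $9000
cheapJCar = ClassExpr(frozenset({"uCar", "JCar"}), 9000)
JCar = ClassExpr(frozenset({"JCar"}))

print(subsumes(JCar, cheapJCar))   # JCar subsumes cheapJCar
print(subsumes(cheapJCar, JCar))   # but not the other way round
```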
Many DLs are known
Complexity (for subsumption) ranges from polynomial (rare), to
NP-complete, to EXPTIME-complete, to undecidable
Recent interest focuses on using DLs for the Semantic Web
The W3C OWL standard is essentially a DL
(this use is essentially the same as in IM)
That is all on DLs
Views with restricted access patterns
Many sources do not support full SQL:
• They are legacy systems, e.g.
– finger on UNIX accepts email, returns other attributes
– A bibliography source requires an author or a title as input, but does not
accept a year as input
• They do not want to disclose all their data, e.g.,
– a carSale source will not present all the cars it has for sale
– An airline requires origin and destination as input for flight info
The questions:
• How do we describe such sources?
• What are good rewritings, and how do we find them?
Restricted sources can be described by binding patterns
Two equivalent styles (there are more sophisticated schemes):
Example: assume global relations
email(F, L, E), office(F, L, O), phone(O, P)
(F-first, L-last, E-email, O-office, P-phone)
The views are finger and userId, described as follows:
• Adding $ to attributes that must be given as input
finger(F, L, $E, O, P) :- email(F, L, E), office(F, L, O), phone(O, P)
userId($O, E) :- office(F, L, O), email(F, L, E)
• Using b, f strings as adornments on predicates, where b means bound (i.e., input) and f means free
finger^ffbff(F, L, E, O, P) :- email(F, L, E), office(F, L, O), phone(O, P)
userId^bf(O, E) :- office(F, L, O), email(F, L, E)
Example, cont'd:
Q: q^bf(O, F) :- office(F, L, O) (or q($O, F) :- office(F, L, O))
• Cannot be answered by using finger – it requires E as input
• Cannot be answered by using userId – it does not return F
The following is a good rewriting:
q’(O, F):- userId(O, E), finger(F, L, E, O, P)
For two reasons:
• It is executable with respect to the sources: executing the body
left-to-right respects the access restrictions
(O for userId comes from the query; E for finger comes from userId)
• Its expansion is contained in the query (check!)
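The first reason can be checked mechanically. The sketch below is a minimal Python illustration, assuming each sub-goal is encoded as a pair of a b/f adornment string and an argument list; the function name executable and this encoding are hypothetical.

```python
def executable(body, bound):
    """Check that executing `body` left to right respects every binding pattern.

    body  -- list of (adornment, args) sub-goals, e.g. ("bf", ["O", "E"])
    bound -- variables whose values the query supplies up front
    """
    bound = set(bound)
    for adornment, args in body:
        # Arguments in 'b' positions must already have a value.
        needed = {a for a, mode in zip(args, adornment) if mode == "b"}
        if not needed <= bound:
            return False
        # After the call, every argument of the sub-goal is bound.
        bound |= set(args)
    return True


userId = ("bf", ["O", "E"])
finger = ("ffbff", ["F", "L", "E", "O", "P"])

print(executable([userId, finger], {"O"}))  # the body of q' in the order given
print(executable([finger, userId], {"O"}))  # finger first: E is not yet known
```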
These two reasons are a characterization of a good rewriting:
• It is executable with respect to the sources: executing the body
left-to-right respects the access restrictions
• Its expansion is contained in the query (check!)
Indeed
• If it is not a contained rewriting, then being executable is no
good
• Being contained but not executable is also no good
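The containment condition can be checked by expanding the rewriting with the view definitions and looking for a containment mapping from the query into the expansion. The Python sketch below assumes plain CQs (variables only, no comparisons); the function name containment_mapping and the fresh variables F1, L1 introduced when expanding userId are hypothetical choices for illustration.

```python
def containment_mapping(q_head, q_body, e_head, e_body):
    """Look for a mapping of q's variables to e's variables that sends
    q's head to e's head and each atom of q's body to some atom of e's body.
    If one exists, e is contained in q (for CQs without comparisons)."""
    if len(q_head) != len(e_head):
        return None
    mapping = {}
    for s, t in zip(q_head, e_head):
        if mapping.setdefault(s, t) != t:
            return None

    def extend(i, m):
        if i == len(q_body):
            return m
        pred, args = q_body[i]
        for epred, eargs in e_body:
            if epred != pred or len(eargs) != len(args):
                continue
            m2 = dict(m)
            # Map argument by argument; fail on any clash with earlier choices.
            if all(m2.setdefault(a, b) == b for a, b in zip(args, eargs)):
                found = extend(i + 1, m2)
                if found is not None:
                    return found
        return None

    return extend(0, mapping)


# Query: q(O, F) :- office(F, L, O)
q_head, q_body = ["O", "F"], [("office", ["F", "L", "O"])]

# Expansion of q'(O, F) :- userId(O, E), finger(F, L, E, O, P)
e_head = ["O", "F"]
e_body = [
    ("office", ["F1", "L1", "O"]), ("email", ["F1", "L1", "E"]),   # from userId
    ("email", ["F", "L", "E"]), ("office", ["F", "L", "O"]),       # from finger
    ("phone", ["O", "P"]),
]

print(containment_mapping(q_head, q_body, e_head, e_body))
```

The mapping found sends office(F, L, O) in the query to the office atom contributed by finger, which confirms that the expansion of q' is contained in the query.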
The IM approach:
After a rewriting is found to be consistent and contained, it is
checked for executability: can the sub-goals in the body
be ordered so that the input required by each is supplied by
the query or by the sub-goals to its left?
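One simple way to search for such an ordering is a greedy loop: repeatedly pick any sub-goal whose required inputs are already bound. This works because the set of bound variables only grows, so a sub-goal that is callable now stays callable later. The Python sketch below is an illustration under that reasoning, not necessarily the procedure IM uses; the name executable_order and the triple encoding of sub-goals are hypothetical.

```python
def executable_order(subgoals, bound):
    """Return an executable ordering of `subgoals`, or None if there is none.

    subgoals -- list of (name, adornment, args), e.g. ("userId", "bf", ["O", "E"])
    bound    -- variables supplied by the query
    """
    bound = set(bound)
    remaining = list(subgoals)
    order = []
    while remaining:
        for goal in remaining:
            _name, adornment, args = goal
            if all(a in bound for a, mode in zip(args, adornment) if mode == "b"):
                order.append(goal)
                bound |= set(args)      # outputs of this call become available
                remaining.remove(goal)
                break
        else:
            return None                 # no remaining sub-goal can be called
    return order


body = [
    ("finger", "ffbff", ["F", "L", "E", "O", "P"]),
    ("userId", "bf", ["O", "E"]),
]
order = executable_order(body, {"O"})
print([name for name, _, _ in order])
```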
A summary of IM
• Introduced (with other concurrent systems) the notion of
LAV and query rewriting using views
• Also, detailed source descriptions using DLs
• An efficient algorithm for finding contained and
executable rewritings
• Worked well, for about 100 sources
Here is a graph from the paper
But:
• The fact that a contained rewriting needs a number of
views at most the number of atoms in the query has been
proved only for CQs without
– comparisons
– access restrictions
– constraints on the global db
Does it hold for these cases? (See the userId/finger example above, where a query with one atom needs a rewriting with two views.)
For access restricted sources, it has been proved that for
equivalent rewritings one needs at most n+m views, where
n is the number of atoms in the query, m is the number of
different variables in it
The proof does not hold for contained rewritings
• Even for "pure" CQs, is the bucket algorithm guaranteed to
find all rewritings?
The answers to all these questions are negative!
• The bucket algorithm does not find all rewritings
• For the more general cases, longer rewritings are needed;
actually, there may be an infinite number of them, with no
bound on length
There is a need for another approach