Decidable Containment of Recursive Queries

Diego Calvanese, Giuseppe De Giacomo, Moshe Y. Vardi
presented by Axel Polleres
http://www.dis.uniroma1.it/pub/calvanes/calv-degi-vard-ICDT-2003.pdf
Query Containment
• Checking whether, for every database, the result of
one query is necessarily a subset of the result of
another one
• Important for information integration, query
rewriting, verification, cooperative answering,
integrity checking, etc.
Conjunctive Queries vs. full
Datalog
• A conjunctive query is a query of the form:
ans(X0) :- r1(X1), r2(X2), …, rn(Xn).
where the Xi = (x1i, …, xni) range over a set of variables
{u1, …, uk} and the variables in X0 are called distinguished
variables.
In SQL, these are often called S(elect)P(roject)J(oin) queries.
Containment of conjunctive queries is decidable!
In fact, it is NP-complete [14].
Proof sketch (membership in NP): A conjunctive query Q1 is contained in Q2 iff
there is a containment mapping from (the variables in) Q2 to (the variables
in) Q1. Guessing and checking that homomorphism is clearly in NP.
NP-hardness can also be shown (e.g. by reduction
from “exact cover”, cf. [])
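The containment-mapping test from the proof sketch can be illustrated with a small brute-force search. This is only a sketch: the query encoding (a head tuple plus a list of atom tuples), the convention that variables start with an uppercase letter, and the names `is_var` and `contained_in` are assumptions made for this example, not notation from the paper. The search tries every assignment of Q2's variables to terms of Q1, which takes exponential time in the worst case.

```python
from itertools import product

def is_var(term):
    # Assumed convention for this sketch: variables start with an uppercase letter.
    return term[:1].isupper()

def substitute(h, terms):
    # Apply mapping h to a sequence of terms; constants are left unchanged.
    return tuple(h.get(t, t) for t in terms)

def contained_in(q1, q2):
    """True if q1 is contained in q2, i.e. a containment mapping from q2 to q1 exists.

    A query is a pair (head, body): head is a tuple of terms and
    body is a list of atoms written as tuples (predicate, term, ...).
    """
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False
    vars2 = sorted({t for a in body2 for t in a[1:] if is_var(t)}
                   | {t for t in head2 if is_var(t)})
    terms1 = sorted({t for a in body1 for t in a[1:]} | set(head1))
    atoms1 = set(body1)
    # Guess a mapping h from the variables of q2 to the terms of q1 ...
    for image in product(terms1, repeat=len(vars2)):
        h = dict(zip(vars2, image))
        # ... and check it: heads must coincide, and every atom of q2
        # must be mapped onto some atom of q1.
        if substitute(h, head2) != tuple(head1):
            continue
        if all((a[0],) + substitute(h, a[1:]) in atoms1 for a in body2):
            return True
    return False

# ans(X) :- r(X,Y), r(Y,Z).   (X starts a path of length 2)
q_two_steps = (("X",), [("r", "X", "Y"), ("r", "Y", "Z")])
# ans(X) :- r(X,Y).           (X has an outgoing edge)
q_one_step = (("X",), [("r", "X", "Y")])
```

With these two example queries, `contained_in(q_two_steps, q_one_step)` holds, while the converse does not, since no mapping sends both atoms of the longer query onto the single atom `r(X,Y)`.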
Full Datalog vs CQ:
• Full Datalog adds union and recursion to CQs
Containment is undecidable
• Undecidability can be shown by reduction from
containment for context-free grammars [22]
So, CQ and Full Datalog span two extremes
But …not all is lost! There are interesting classes in
between!
Decidable containment Problems:
• Containment of monadic Datalog (all rule heads
use a single variable) is decidable
• Checking containment of full Datalog in nonrecursive Datalog is decidable in exponential
time
• Checking containment of non-recursive
Datalog in full Datalog is decidable
in triple exponential time, i.e. $O(2^{2^{2^n}})$
– When the non-recursive query is unfolded then
“only” double exponential.
In this paper:
Regular Path Queries:
• Query containment in the context of
conceptual graphs (e.g. RDF graphs),
namely for Regular Path Queries (RPQs), i.e.:
• Asking for all pairs of objects in a graph that are
connected by a path conforming to a regular expression,
i.e.:
E(x,y)
… where E is a regular
expression over the edge labels of the graph
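How E(x,y) can be answered on a concrete graph is sketched below. To avoid regular-expression parsing, the sketch assumes E is already given as a nondeterministic finite automaton; the encodings of the graph (a set of labeled triples) and of the automaton (a transition dictionary), as well as the name `eval_rpq`, are assumptions made for this example. The search explores pairs (graph node, automaton state) breadth-first from every start node.

```python
from collections import deque

def eval_rpq(edges, nfa, start, finals):
    """All pairs (x, y) joined by a path whose label word the NFA accepts.

    edges:  set of (source, label, target) triples of the graph
    nfa:    dict mapping (state, label) to a set of successor states
    start:  initial NFA state; finals: set of accepting NFA states
    """
    nodes = {s for s, _, _ in edges} | {t for _, _, t in edges}
    out = {}
    for s, label, t in edges:
        out.setdefault(s, []).append((label, t))
    answers = set()
    for x in nodes:
        # Breadth-first search in the product of graph and automaton.
        seen = {(x, start)}
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            if state in finals:
                answers.add((x, node))
            for label, nxt in out.get(node, []):
                for state2 in nfa.get((state, label), ()):
                    if (nxt, state2) not in seen:
                        seen.add((nxt, state2))
                        queue.append((nxt, state2))
    return answers

# Hypothetical example: E = knows . knows*  (one or more "knows" edges)
example_edges = {("a", "knows", "b"), ("b", "knows", "c"), ("c", "worksAt", "d")}
example_nfa = {(0, "knows"): {1}, (1, "knows"): {1}}
example_answers = eval_rpq(example_edges, example_nfa, 0, {1})
```

A 2RPQ could be handled in the same style by also following edges backwards under an inverse label; that extension is left out here.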
Refinement:
- 2RPQs: “inverse” is allowed in traversal of
UC2RPQs:
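• A conjunctive 2-way regular path query (C2RPQ) of arity n is a
query of the form:
$Q(x_1, \ldots, x_n) \leftarrow E_1(y_1, y_2) \wedge \cdots \wedge E_m(y_{2m-1}, y_{2m})$
where $E_1, \ldots, E_m$ are 2RPQs.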
• UC2RPQs are then unions of conjunctive 2-way regular path
queries (C2RPQs) with the same arity. Here, the answer set to
Note that CQs (with only binary body predicates) are just a special case of C2RPQs!
Containment of Datalog in a
UC2RPQ:
• For a Datalog program Π, an IDB
predicate Q and a database G (over the EDB
predicates), we define:
i.e. the set of facts about Q (the fixpoint) that can be
obtained by applying the rules in Π. Then:
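The fixpoint of facts derivable from a database can be computed by naive bottom-up evaluation, as in the following sketch. The rule encoding (pairs of a head atom and a list of body atoms), the uppercase-variable convention and the function names are assumptions for this example; it is a textbook-style naive evaluation, not the construction used in the paper.

```python
def is_var(term):
    # Assumed convention for this sketch: variables start with an uppercase letter.
    return term[:1].isupper()

def match(atom, fact, subst):
    """Extend subst so that atom matches fact; return None if impossible."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    subst = dict(subst)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if subst.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return subst

def fixpoint(rules, edb):
    """Apply all rules repeatedly until no new fact is derived."""
    facts = set(edb)
    while True:
        new = set()
        for head, body in rules:
            substs = [{}]
            for atom in body:
                # Join the partial substitutions with all facts matching this atom.
                substs = [s2 for s in substs for f in facts
                          if (s2 := match(atom, f, s)) is not None]
            for s in substs:
                new.add((head[0],) + tuple(s.get(t, t) for t in head[1:]))
        if new <= facts:
            return facts
        facts |= new

# Hypothetical program: transitive closure of edge.
tc_rules = [
    (("path", "X", "Y"), [("edge", "X", "Y")]),
    (("path", "X", "Z"), [("edge", "X", "Y"), ("path", "Y", "Z")]),
]
tc_edb = {("edge", "a", "b"), ("edge", "b", "c")}
tc_path_facts = {f for f in fixpoint(tc_rules, tc_edb) if f[0] == "path"}
```

Restricting the result to one IDB predicate, as done for `path` in the last line, gives the set of facts about that predicate.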
Containment of Datalog in a
Union of Conjunctive Queries:
• Idea: Reuse of variables is allowed, as long
as the occurrences are not “connected” in the
tree. So, we can build proof trees with a
bounded number of variables, num_var(Π): twice the
maximum, over the rules r of Π, of the number num_var(r)
of variables occurring in IDB atoms of r.
• A proof tree is then simply an expansion tree
that only uses variables from {x1,…,xnum_var(Π)}
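The bound num_var(Π) is easy to compute for a concrete program. The sketch below assumes one particular reading of the slide: num_var(r) counts the distinct variables occurring in the IDB atoms of rule r, with the head included, and num_var(Π) is twice the maximum over all rules. The rule encoding and function names are likewise assumptions for this example.

```python
def is_var(term):
    # Assumed convention for this sketch: variables start with an uppercase letter.
    return term[:1].isupper()

def num_var_rule(rule, idb):
    """Number of distinct variables in the IDB atoms of a rule (head included)."""
    head, body = rule
    return len({t for atom in [head] + body if atom[0] in idb
                for t in atom[1:] if is_var(t)})

def num_var(rules):
    """Twice the maximum of num_var_rule over all rules of the program."""
    idb = {head[0] for head, _ in rules}  # IDB predicates are those defined by rules
    return 2 * max(num_var_rule(r, idb) for r in rules)

# Hypothetical program: transitive closure of edge.
tc_rules = [
    (("path", "X", "Y"), [("edge", "X", "Y")]),
    (("path", "X", "Z"), [("edge", "X", "Y"), ("path", "Y", "Z")]),
]
```

For this program the second rule has the IDB atoms path(X,Z) and path(Y,Z), hence three variables, so under the stated reading proof trees would use variables from {x1,…,x6}.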
Containment of Datalog in a
Union of Conjunctive Queries:
• Approach: the notion of a containment mapping is generalized to
Datalog and to UC2RPQs by expansions of Datalog programs:
can be defined via an infinite sequence of conjunctive queries:
• Let trees(Q, Π) be the set of expansion trees for predicate Q,
with a rule labeling each node, such that the children of a node N
are labeled with rules whose heads correspond to
the IDB atoms in the body of the rule of N, and leaves are labeled with rules
having only EDB predicates in their bodies. Note that trees(Q, Π) can
be infinite.
Intuition: Π is contained in a union of conjunctive queries
if there is a containment mapping from some disjunct of the union
to each expansion tree
in trees(Q, Π). … not yet, since the number of variables and hence the
number of node labels is unbounded.
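To make the "infinite sequence of conjunctive queries" concrete, the following sketch enumerates the expansions of a goal atom obtained from expansion trees of bounded height. It is an illustration under simplifying assumptions, not the paper's construction: rule heads contain only distinct variables, rules contain no constants, and fresh variables are produced by appending a counter to the variable name. The encoding and the name `expansions` are made up for this example.

```python
from itertools import count, product

def expansions(rules, goal, depth, fresh=None):
    """Bodies (lists of EDB atoms) of the expansions of `goal` that use
    at most `depth` nested rule applications."""
    if fresh is None:
        fresh = count()
    idb = {head[0] for head, _ in rules}
    if goal[0] not in idb:
        return [[goal]]          # EDB atom: nothing to unfold
    if depth == 0:
        return []                # IDB atom, but no rule applications left
    results = []
    for head, body in rules:
        if head[0] != goal[0]:
            continue
        n = next(fresh)
        # Head variables are replaced by the goal's terms; every other
        # variable of the rule gets a fresh name for this rule application.
        ren = dict(zip(head[1:], goal[1:]))
        inst = [(a[0],) + tuple(ren.get(t, f"{t}_{n}") for t in a[1:])
                for a in body]
        # Unfold every body atom and combine the alternatives.
        choices = [expansions(rules, a, depth - 1, fresh) for a in inst]
        for combo in product(*choices):
            results.append([atom for part in combo for atom in part])
    return results

# Hypothetical program: transitive closure of edge.
tc_rules = [
    (("path", "X", "Y"), [("edge", "X", "Y")]),
    (("path", "X", "Z"), [("edge", "X", "Y"), ("path", "Y", "Z")]),
]
tc_expansions = expansions(tc_rules, ("path", "X", "Y"), 2)
```

For the transitive-closure program, depth 2 yields the one-edge and the two-edge path query; larger depths add ever longer paths, which is why the set of expansions is infinite in general.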
Connected variables in proof
trees:
• To reconstruct an expansion tree for a given proof
tree, we need to distinguish among occurrences of
variables:
• Let g1, g2 be nodes in a proof tree. We call
occurrences x1, x2 of a variable x in the rules labeling
g1 and g2, respectively, connected if every rule on
the path from g1 to g2 (except maybe the lowest
common ancestor g0) has an occurrence of x in the
head.
• We say that an occurrence x of a variable x in τ is a
distinguished occurrence if it is connected to an
occurrence of x in the head of the root of τ .
Containment of Datalog in a
Union of Conjunctive Queries:
A strong containment mapping from a conjunctive query ϕ to a
proof tree τ is a containment mapping h from ϕ to τ with:
– h maps distinguished occurrences in ϕ to distinguished
occurrences in τ, and
– if x1 and x2 are two occurrences of a variable x in ϕ, then the
occurrences h(x1) and h(x2) in τ are connected.
Then:
This can be similarly exploited for
C2RPQs
• An expansion of a C2RPQ
is a CQ of the form:
In the rest of the paper…
• The authors show how to check this
condition using tree-automata:
• Idea: The set of proof trees for a Datalog program Π with a goal
predicate Q can be described by a nondeterministic tree
automaton (doubly exponential in the size of Π), accepting
exactly the proof trees. …
• concluding:
Conclusions
• Adding transitive closure to CQs does not increase the
upper bound for containment of Datalog
(2EXPTIME matches the upper bound for containment in
unions of conjunctive queries) [25]
• However, whether this upper bound is tight is not
clear; the authors conjecture that it is
• (a lower bound of EXPSPACE follows from containment
of UC2RPQs in UC2RPQs [34])
• Observe: Containment in the other direction is already
undecidable for RPQs [22]
Questions/Interesting for WSMO/L
• How do the proof obligations we need relate to
RPQs/2RPQs/UC2RPQs?
• How do RPQs/2RPQs/UC2RPQs relate to OWL
DL/Light/Flight and rule extensions thereof?
• Decidable, yes, but hardly scalable? Not
necessarily, if queries/programs are of moderate
size.
• We need more use cases to show what kinds of
containment we need!
Important references from the paper