lecture4d

CS589 Principles of DB Systems
Fall 2008
Lecture 4d: Recursive Datalog with Negation –
What is the query answer defined to be?
Lois Delcambre
[email protected]
503 725-2405
Goals for today
- Briefly discuss negative occurrences of variables with universal quantification in domain calculus.
- Discuss proofs of equivalence/test 1, as desired.
- Introduce the problem with recursion and negation in Datalog.
- Introduce stratification – a syntactic restriction that avoids the problem.
- Mention several ways to define the semantics of a Datalog program.
CS 510 Principles of DB Systems, Fall 2006 © Lois Delcambre, Dave Maier
Consult handout from Ramakrishnan/Gehrke
- We want values in the query answer to come from the DB or from constants in the query.
- In addition, if we use an existential quantifier, we want the value substituted for the quantified variable to come from the DB or from the constants in the query.
- Finally, if we use a universal quantifier, we want to find any value that makes the formula false by checking only the tuples that use values from the DB or the constants in the query.
What about the 3rd bullet?
A typical tuple calculus query with universal quantification:
{ S | S ∈ Sailors ∧ ∀B ∈ Boats (B.color=‘red’ → ∃R ∈ Reserves (S.sid=R.sid ∧ R.bid=B.bid)) }
which is equivalent to:
{ S | S ∈ Sailors ∧ ∀B ∈ Boats (¬(B.color=‘red’) ∨ ∃R ∈ Reserves (S.sid=R.sid ∧ R.bid=B.bid)) }
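The equivalence between the implication form and the disjunction form can be checked on a small instance. The sketch below is only an illustration: the contents of Sailors, Boats and Reserves are hypothetical sample data, and the helper names `reserved` and `implies` are not from the lecture. Note that both quantifiers range only over tuples stored in the relations, which is the point of the third bullet.

```python
# Hypothetical sample data (not from the lecture).
sailors = [{"sid": 1}, {"sid": 2}]
boats = [{"bid": 101, "color": "red"},
         {"bid": 102, "color": "green"},
         {"bid": 103, "color": "red"}]
reserves = [{"sid": 1, "bid": 101}, {"sid": 1, "bid": 103},
            {"sid": 2, "bid": 101}, {"sid": 2, "bid": 102}]

def reserved(s, b):
    # Existential quantifier: R ranges over the tuples of Reserves only.
    return any(r["sid"] == s["sid"] and r["bid"] == b["bid"] for r in reserves)

def implies(p, q):
    # Truth table of p -> q, written out explicitly.
    return {(True, True): True, (True, False): False,
            (False, True): True, (False, False): True}[(p, q)]

# Universal quantifier: B ranges over the tuples of Boats only.
with_implication = [s["sid"] for s in sailors
                    if all(implies(b["color"] == "red", reserved(s, b)) for b in boats)]
with_disjunction = [s["sid"] for s in sailors
                    if all((not b["color"] == "red") or reserved(s, b) for b in boats)]
```

On this data both lists contain only sailor 1, who reserved both red boats; sailor 2 is missing boat 103.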
Proving Safe Datalog is contained in Allowable Domain Calculus
Induction on the number of rules with the answer predicate in the head.
Base case: zero rules. (The book assumes that the answer predicate is a DB relation name.)
Inductive step:
- Introduce additional variables, and introduce xi=yj and xi=v (for repeated variables and constants).
- Introduce “∃zi” for every variable that is in the body but not in the head.
- Introduce R1(…)^R2(…)^R3(…) for the body.
- Find all other rules with the same head, construct a body for each (as above), and create one big disjunction.
Proving Allowable Domain Calculus is contained in Relational Algebra
- Induction on the number of logical connectors in the allowed domain calculus formula.
- Minimal set of logical connectors (¬, v, ∃).
- For each R(A1, …, Am), construct an expression that gives the active domain of R.
- Do that for all R. Construct a domain calculus expression Fdom(x) that comprises the union of all such domains and the constants from the query.
- { x1, …, xn | F ^ Fdom(x1) ^ … ^ Fdom(xn) }
- We know how to construct rel. alg. expressions for Fdom. We form the cross product of RelDom(F) n times and intersect it with the rel. alg. expression we need to express F – the original expression in Q.
Rest of the proof sketch
Base case: zero logical connectors. Then F is just one relation predicate. The rel. alg. expression is π…(σ…R), to accommodate any constants or repeated variables in R(x1, …, xn) and to account for R having more variables than the desired query answer.
Induction: Assume it’s true for q logical connectors.
F1 v F2: use π…(E1 ∩ RelDom(F1)^(n-m)) ∪ π…(E2 ∩ RelDom(F2)^(n-k))
¬F: use RelDom(F)^n – E
∃x (F): use π…(E)
Several Datalog languages
- Datalog – one rule, no negation, no recursion (conjunctive queries).
- Datalog – multiple rules, no negation, no recursion.
- Datalog – multiple rules, no negation, with recursion. Recursion but not relationally complete.
- Datalog – multiple rules, with negation, no recursion. Relationally complete but no recursion.
- Datalog – multiple rules, with negation, with recursion. Relationally complete with recursion, but some queries are ambiguous!
[Slide callouts beside the list: “Our focus in Unit 3”, “Our focus in Unit 1”, “Our focus in Unit 3”]
Datalog program and its dependency graph
Given a DB with two relations:
Topics(Topic) and Interests(Person,Topic)
Result(a) :- Interests(a,b), ¬Diff(a).
Diff(a) :- Prod(a,b), ¬Interests(a,b).
Prod(a,b) :- Interests(a,c), Topics(b).
Draw an arrow from each body predicate to the corresponding head predicate.
Acyclic dependency graph = not recursive.
[Dependency graph: Topics → Prod; Interests → Prod; Prod → Diff; Interests → Diff; Interests → Result; Diff → Result]
Another example
Anc(x,y) :- Parent-of(x,y).
Anc(x,z) :- Anc(x,y), Parent-of(y,z).
A recursive Datalog program has a cycle in its dependency graph.
[Dependency graph: Parent-of → Anc; Anc → Anc]
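The dependency-graph test for recursion can be sketched in a few lines. This is a minimal illustration, assuming each rule is represented only by its head predicate name and the list of predicate names in its body (negation signs are ignored here); the function names `dependency_graph` and `is_recursive` are hypothetical.

```python
def dependency_graph(rules):
    """rules: list of (head, [body predicate names]).
    Returns the set of edges (body predicate, head predicate)."""
    return {(b, h) for h, body in rules for b in body}

def is_recursive(rules):
    """True when the dependency graph contains a cycle."""
    edges = dependency_graph(rules)
    closure = set(edges)
    while True:
        # Extend every known path by one edge until nothing new appears.
        new = {(a, d) for (a, b) in closure for (c, d) in edges if b == c} - closure
        if not new:
            break
        closure |= new
    # A predicate that can reach itself lies on a cycle.
    return any(a == b for a, b in closure)

# The Topics/Interests program from the lecture (predicate names only).
interests_program = [
    ("Result", ["Interests", "Diff"]),
    ("Diff", ["Prod", "Interests"]),
    ("Prod", ["Interests", "Topics"]),
]
# The ancestor program from the lecture.
anc_program = [
    ("Anc", ["Parent-of"]),
    ("Anc", ["Anc", "Parent-of"]),
]
```

Here `is_recursive(interests_program)` is `False` and `is_recursive(anc_program)` is `True`, matching the two graphs drawn on the slides.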
Now consider Datalog with recursion &
negation
Person(Dan). (one tuple in base relation)
Student(x) :- Person(x), Employee(x).
Employee(x) :- Person(x), Student(x).
What is the query answer?
What does the dependency graph look like?
Person(Dan). (one tuple in base relation)
Student(x) :- Person(x), ¬Employee(x).
Employee(x) :- Person(x), ¬Student(x).
This is a recursive program.
[Dependency graph: Person → Student; Person → Employee; Employee → Student; Student → Employee]
More about this dependency graph
Person(Dan). (one tuple in base relation)
Student(x) :- Person(x), ¬Employee(x).
Employee(x) :- Person(x), ¬Student(x).
Cycle in dependency graph → recursion.
Label negative predicates in dependency graph.
[Dependency graph: Person → Student; Person → Employee; Employee → Student (¬); Student → Employee (¬)]
Now consider recursion & negation
Person(Dan). (one tuple in base relation)
Student(x) :- Person(x), ¬Employee(x).
Employee(x) :- Person(x), ¬Student(x).
We have a problem when a cycle in a dependency graph includes (at least) one negative edge.
[Dependency graph: Person → Student; Person → Employee; Employee → Student (¬); Student → Employee (¬)]
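One way to see the problem is to enumerate the models of the Student/Employee program by brute force. The sketch below assumes Person(Dan) is fixed as true and treats each rule as a logical implication; the names `is_model`, `candidates` and `minimal_models` are illustrative only.

```python
from itertools import combinations

atoms = ["Student(Dan)", "Employee(Dan)"]

def is_model(m):
    """m is a set of derived facts; Person(Dan) is a base fact and always true."""
    person = True
    student = "Student(Dan)" in m
    employee = "Employee(Dan)" in m
    # Student(x) :- Person(x), not Employee(x).
    rule1 = (not (person and not employee)) or student
    # Employee(x) :- Person(x), not Student(x).
    rule2 = (not (person and not student)) or employee
    return rule1 and rule2

# All subsets of the two derived atoms.
candidates = [frozenset(c) for k in range(len(atoms) + 1)
              for c in combinations(atoms, k)]
models = [m for m in candidates if is_model(m)]
# A model is minimal when no other model is a proper subset of it.
minimal_models = [m for m in models if not any(other < m for other in models)]
```

The enumeration finds two incomparable minimal models, one containing only Student(Dan) and one containing only Employee(Dan), so the program alone does not determine a single answer.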
Another example with
negation & recursion
Node(a).
Node(b).
Node(c).
Node(d). (These are facts from the database.)
Arc(a,b).
Arc(c,d).
Reachable(a).
Reachable(y) :- Reachable(x), Arc(x,y).
Unreachable(x) :- Node(x), ¬Reachable(x).
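For this program there is a natural evaluation order: compute Reachable completely, then apply the negation. The following sketch assumes that order (Reachable first, Unreachable afterwards) and uses Python sets for the relations; the variable names are illustrative.

```python
# Facts from the program above.
node = {"a", "b", "c", "d"}
arc = {("a", "b"), ("c", "d")}

# Reachable(a).  Reachable(y) :- Reachable(x), Arc(x,y).
reachable = {"a"}
while True:
    new = {y for (x, y) in arc if x in reachable} - reachable
    if not new:
        break  # no new tuples, so Reachable is complete
    reachable |= new

# Unreachable(x) :- Node(x), not Reachable(x).
# Evaluated only after Reachable has stopped growing.
unreachable = {x for x in node if x not in reachable}
```

With these facts, `reachable` ends up as {a, b} and `unreachable` as {c, d}. Evaluating the negation before Reachable is complete would wrongly place b in Unreachable.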
Stratification – for Datalog with negation and recursion
A stratification of a Datalog program P is a partition of P into strata (layers) such that, for every rule H :- A1, …, Am, ¬B1, …, ¬Bq:
- All the rules that define H are in the same stratum.
- Every positive predicate in the rule is in a stratum <= the stratum of the rule: Strata(Ai) <= Strata(H).
- Every negative predicate in the rule is in a stratum < the stratum of the rule: Strata(Bi) < Strata(H).
Exercise: Define a stratification for this
datalog program.
Node(a).
Node(b).
Node(c).
Node(d). (These are facts from the database.)
Arc(a,b).
Arc(c,d).
Reachable(a).
Reachable(y) :- Reachable(x), Arc(x,y).
Unreachable(x) :- Node(x), Reachable(x).
CS 510 Principles of DB Systems, Fall 2006 © Lois Delcambre, Dave Maier
17
Can you define a stratification for this
program?
Person(Dan). (one tuple in base relation)
Student(x) :- Person(x), Employee(x).
Employee(x) :- Person(x), Student(x).
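The stratification conditions can be checked mechanically. The sketch below is one possible approach, not necessarily the one used in the course: it assumes rules are given as (head, positive body predicates, negative body predicates), numbers strata from 0, and raises a predicate's stratum until both conditions hold. If a stratum number would reach the number of predicates, some cycle passes through a negative edge and no stratification exists. The function name `stratify` is hypothetical.

```python
def stratify(rules):
    """rules: list of (head, positive_body, negative_body), each body a list
    of predicate names. Returns {predicate: stratum number}, or None when
    the program cannot be stratified."""
    preds = {h for h, _, _ in rules} | {p for _, pos, neg in rules for p in pos + neg}
    stratum = {p: 0 for p in preds}
    changed = True
    while changed:
        changed = False
        for head, pos, neg in rules:
            # Strata(Ai) <= Strata(H) and Strata(Bi) < Strata(H).
            needed = max([stratum[head]]
                         + [stratum[p] for p in pos]
                         + [stratum[q] + 1 for q in neg])
            if needed > stratum[head]:
                if needed >= len(preds):
                    return None  # a cycle contains a negative edge
                stratum[head] = needed
                changed = True
    return stratum

reach_program = [
    ("Reachable", [], []),                      # Reachable(a).
    ("Reachable", ["Reachable", "Arc"], []),
    ("Unreachable", ["Node"], ["Reachable"]),
]
student_program = [
    ("Student", ["Person"], ["Employee"]),
    ("Employee", ["Person"], ["Student"]),
]
```

For `reach_program` the result puts Node, Arc and Reachable in stratum 0 and Unreachable in stratum 1. For `student_program` the function returns `None`, because Student and Employee each need a strictly higher stratum than the other.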
Comments
- Stratified Datalog programs have a unique answer.
- Not all Datalog programs can be stratified.
Slide repeated from Unit 1 (emphasis added)
Datalog with recursion
Faculty (f-id, name, major-professor)
Academic-descendant(x,y) :- Faculty(x,a,y).
Academic-descendant(x,z) :- Academic-descendant(x,y), Faculty(y,b,z).
How does this Datalog program (without negation) get
evaluated?
Fire all rules (from right to left) until you don’t
produce any new tuples in Academic-descendant.
Note each Datalog rule is independent. The variable
names in separate rules have no connection.
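The "fire all rules until no new tuples appear" procedure can be sketched as a simple loop. The Faculty tuples below are hypothetical sample data, and the function name `academic_descendant` is illustrative; the loop is a naive evaluation that re-fires both rules on every pass.

```python
# Hypothetical Faculty(f-id, name, major-professor) tuples.
faculty = {(1, "Ann", 2), (2, "Bob", 3)}

def academic_descendant(faculty):
    ad = set()
    while True:
        # Academic-descendant(x,y) :- Faculty(x,a,y).
        derived = {(x, y) for (x, _a, y) in faculty}
        # Academic-descendant(x,z) :- Academic-descendant(x,y), Faculty(y,b,z).
        derived |= {(x, z) for (x, y) in ad
                    for (y2, _b, z) in faculty if y == y2}
        if derived <= ad:
            return ad  # nothing new was produced: fixed point reached
        ad |= derived

result = academic_descendant(faculty)
```

On this data the first pass yields (1,2) and (2,3), the second adds (1,3), and the third produces nothing new, so the loop stops. Because the rules contain no negation, each pass can only add tuples, which is why the loop is guaranteed to terminate on a finite database.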
Semantics of a Datalog Program
- The fixed point of the “immediate consequence” operator, applied to the Datalog program.
- The minimal model for the Datalog program.
Plus others, particularly for Datalog with negation.
Comments
- For Datalog with recursion, but NO negation:
  - The minimal model is unique.
  - The minimal model is always the intersection of all the models.
  - The minimal model is the same as the fixed point of the immediate consequence operator.
  - This language is monotonic (rules only add facts).
- For Datalog with recursion & negation:
  - There may not be a unique minimal model.