A Dichotomy in the Complexity of Deletion Propagation with

A Dichotomy in the Complexity of Deletion Propagation
with Functional Dependencies
Benny Kimelfeld
IBM Research–Almaden, San Jose, CA 95120, USA
[email protected]
ABSTRACT
a maximum number of user-file access permissions remain.
This is exactly the deletion-propagation problem, where the
view is defined by the following conjunctive query (CQ):
A classical variant of the view-update problem is deletion propagation, where tuples from the database are deleted in order to
realize a desired deletion of a tuple from the view. This operation
may cause a (sometimes necessary) side effect—deletion of additional tuples from the view, besides the intentionally deleted one.
The goal is to propagate deletion so as to maximize the number
of tuples that remain in the view. In this paper, a view is defined
by a self-join-free conjunctive query (sjf-CQ) over a schema with
functional dependencies. A condition is formulated on the schema
and view at hand, and the following dichotomy in complexity is
established. If the condition is met, then deletion propagation
is solvable in polynomial time by an extremely simple algorithm
(very similar to the one observed by Buneman et al.). If the condition is violated, then the problem is NP-hard, and it is even
hard to realize an approximation ratio that is better than some
constant; moreover, deciding whether there is a side-effect-free
solution is NP-complete. This result generalizes a recent result
by Kimelfeld et al., who ignore functional dependencies. For the
class of sjf-CQs, it also generalizes a result by Cong et al., stating that deletion propagation is in polynomial time if keys are
preserved by the view.
1.
Access(u, f ) :− GroupUser(g, u), GroupFile(g, f )
(1)
Formally, for a CQ Q (the view definition), the input for
the deletion-propagation problem consists of a database instance I and a tuple a ∈ Q(I). A solution is an subinstance J of I (i.e., J is obtained from I by deleting facts)
such that a ∈
/ Q(J), and an optimal solution maximizes
the number of tuples in Q(J). The side effect is the set
(Q(I) \ {a}) \ Q(J). Buneman et al. [3] identify some classes
of CQs (e.g., projection-free CQs) for which a straightforward algorithm produces an optimal solution in polynomial
time. But those tractable classes are highly restricted, as
Buneman et al. show that even for a CQ as simple as the
above Access(u, f ), testing whether there is a solution that
is side-effect free is NP-complete. Therefore, finding an
optimal solution minimizing the side effect is NP-hard for
Access(u, f ). In a more recent work, Kimelfeld et al. [13]
considered views that are defined by a self-join-free CQ (sjfCQ), and proved a dichotomy theorem implying that the
views for which the straightforward algorithm is suboptimal
are exactly those for which deletion propagation is NP-hard.
Later, we discuss that dichotomy theorem in more detail.
Conceivably, one could settle for a solution that is just
approximately optimal. This paper follows the notion of
approximation of Kimelfeld et al. [13], who view the optimization problem as that of maximization (of the number of
tuples that remain in the view) rather than minimization (of
the side effect). As noted by Buneman et al. [3], minimization of the side effect does not give rise to approximations,
at least for Access(u, f ), since then an approximation (with
any ratio) can be used to solve the NP-complete problem of
testing for the existence of a side-effect-free solution. In fact,
this argument is true not just for Access(u, f ), but actually
for the whole class of sjf-CQs with intractable deletion propagation [13]. On the other hand, in the case of maximization there is always a constant-factor approximation when
the view defined by an sjf-CQ [13].
Going back to the Access(u, f ) view defined in (1), suppose now that we have the restriction that each user belongs to at most one group, and that each file is associated with at most one group. It is not difficult to show
that then, deletion propagation is in polynomial time, since
to delete a tuple (user, file) from the view, it is enough
to consider (and select the best out of) two possible solutions: one obtained by deleting the single fact of the form
GroupUser(g, user), and the other by deleting the single
fact of the form GroupFile(g, file). Actually, this example is a special case of the result of Cong et al. [5], stating
INTRODUCTION
A classical problem in database management is that of
view update [1, 5, 6, 10–12]: translate an update operation
on the view to an update of the source database, so that
the update on the view is properly realized. A special case
of this problem is that of deletion propagation in relational
databases: given an undesired tuple in the view (defined by
some monotonic query), delete some tuples from the source
relations (where such tuples are referred to as facts), so that
the undesired tuple disappears from the view. The database
resulting from this deletion of tuples is called a solution. The
side effect is the set of deleted view tuples that are different
from the undesired one. If only the undesired tuple gets
to be deleted from the view, then the solution is side-effect
free. A solution that is side-effect free does not necessarily
exist, and therefore, the task is relaxed to that of minimizing
the side effect [3, 5, 13]. That is, the problem is to delete
tuples from the source relations so that the undesired tuple
disappears from the view, but as many as possible other
tuples remain there.
Following is a simple example by Cui and Widom [7]
(also referenced and used later by Buneman et al. [3] and
Kimelfeld et al. [13]). Suppose that we have two relations,
GroupUser(group, user) and GroupFile(group, file), representing memberships of users in groups and access permissions
of groups to files, respectively. A user u can access the file
f if u belongs to a group that can access f ; that is, there
is some g, such that GroupUser(g, u) and GroupFile(g, f ).
Suppose that we want to restrict the access of a specific user
to a specific file, by eliminating some user-group or group-file
pairings. Furthermore, we would like to do so in a way that
1
atoms are connected by an edge whenever they share at
least one existential variable. The tractability condition in
the dichotomy theorem of Kimelfeld et al. [13] says that for
each connected component P of G∃ (Q) there is an atom that
contains all head variables of P . For Q = Access(u, f ), the
graph G∃ (Q) has only one connected component (containing
the two atoms). Since no atom contains both u and f , the
tractability condition of Kimelfeld et al. [13] is violated, thus
MaxDPhS, Qi is NP-hard. Now suppose that S contains the
fd file → group (i.e., every file belongs to one group). Then
the existential variable g is functionally determined by the
head variables; that is, in a mapping from Q to the database,
if we know what the head variables are mapped to, then we
also know what g is mapped to. So, we treat g as if it was
a head variable. But now, the edge of G∃ (Q) disappears
(since the two atoms no longer share an existential variable), and we get two connected components, each having
one node. The tractability condition of Kimelfeld et al. [13]
trivially holds then, and so, MaxDPhS, Qi is now in polynomial time. Lastly, suppose that instead of file → group, the
schema S contains the fd group → user (i.e., every group has
at most one user). Although the atom GroupFile(g, f ) does
not contain both head variables (u and f ), it functionally
determines both of them. This implies that the tractability
condition of this paper is satisfied, and hence, MaxDPhS, Qi
is again in polynomial time.
Proving the tractable part of the dichotomy is fairly simple. However, the proof of the hardness part is nontrivial,
and entails a fairly intricate construction and case analysis.
A central concept in the proof is that of graph separation.
In Section 5, we review the proof, which is restricted to the
NP-completeness of FreeDPhS, Qi. In Section 6 we discuss
the adaptation of the proof to hardness of approximation
for MaxDPhS, Qi; the discussion is brief, as this adaptation
uses ideas similar to the ones used by Kimelfeld et al. [13].
The work reported here is restricted to a special setting
within view update: deletion propagation for sjf-CQs in the
presence of fds, with the goal of preserving as many tuples of
the view as possible. Other aspects of deletion propagation,
and view update in general, have been studied in the literature. For example, deletion propagation was studied with
the goal of minimizing the source side effect [3, 5], namely,
finding a solution with a minimal number of missing facts.
A significant research effort has been invested on defining
the semantics of (and notions of correctness for) general
view-update operations (e.g., deletion and insertion of tuples) [1, 6, 10, 12]. For example, Cosmadakis and Papadimitriou [6] introduced a requirement for a view to keep intact
a complement view (which has also been explored more recently by Lechtenbörger and Vossen [15]) that, intuitively,
contains the information ignored by the view.
Several dichotomies in the complexity of operations involving CQs (and sjf-CQs) were proved in the literature.
For example, Dalvi and Suciu [9] considered query evaluation over probabilistic databases with independent tuples,
and classified the sjf-CQs into those that can be evaluated in
polynomial time and those that are #P-hard. Later, Dalvi
et al. [8] extended this result to the class of unions of conjunctive queries (allowing self joins). More recently, Meliou
et al. [19] showed a dichotomy (polynomial time vs. NPcompleteness) in the complexity of computing the degree
of responsibility of a database tuple for an answer in the
result of an sjf-CQ. Examples of dichotomy theorems that
that when the view (defined by a CQ) preserves keys, deletion propagation is in polynomial time. Now what if each
file is associated with one group, but a user may belong to
any number of groups (as commonly practiced by operating systems)? It is not difficult to show that, here also, the
problem is in polynomial time. Actually, it turns out that
every nontrivial key constraint suffices for the tractability
of deletion propagation for Access(u, f ); moreover, there is
one very simple algorithm, which we later refer to as the
unirelation algorithm, that realizes the polynomial time for
each of these key constraints.
The above example illustrates the fact that functional dependencies (e.g., key constraints, {conference, year} →chair,
etc.), abbreviated fds, have an immediate and direct effect
on the complexity of deletion propagation. Moreover, this
effect is on one direction: if for a certain view definition the
problem is in polynomial time when disregarding the fds,
it continues to be in polynomial time when fds are taken
into account; however, if the problem is hard without the
fds, it may no longer be hard when fds are accounted for.
Therefore, the class of views that are known to be tractable
for deletion propagation would expand as a result of the
investigation of the problem in the presence of functional
dependencies. This what we do in this paper.
More formally, the setting is as follows. The view is
defined by an sjf-CQ Q over a schema S that may have
functional dependencies. (Exact definitions of sjf-CQs and
schemas are in Section 2.) In the problem MaxDPhS, Qi
(where DP stands for “Deletion Propagation”), the input
consists of a database instance I over S and a tuple a ∈ Q(I).
A solution is an subinstance J of I (i.e., J is obtained from
I by deleting facts) such that a ∈
/ Q(J). The goal is to
find an optimal solution, which is a solution that maximizes the number of tuples in Q(J). The decision problem
FreeDPhS, Qi has the same input (i.e., I and a), and the
goal is to decide whether there is a solution that is side-effect
free, that is, a subinstance J of I with Q(J) = Q(I) \ {a}.
The main result of this paper is a generalization of the
dichotomy theorem of Kimelfeld et al. [13] to accommodate
functional dependencies. More precisely, in Section 3 we
phrase a syntactic condition on (the fds of) the schema S and
the sjf-CQ Q at hand. We call this condition the tractability
condition. Then, the following is proved (Theorem 6.1).
• If the tractability condition is met, then MaxDPhS, Qi
(as well as FreeDPhS, Qi) is solvable in polynomial
time by a straightforward algorithm. (Section 3.)
• Otherwise, MaxDPhS, Qi is NP-hard, even to approximate better than some constant smaller than one;
moreover, FreeDPhS, Qi is NP-complete. (Section 5.)
The straightforward algorithm in the positive side of the dichotomy is called the unirelation algorithm, and is described
in Section 3. This algorithm is very similar to the “trivial”
algorithm used by Kimelfeld et al. [13] and originally observed by Buneman et al. [3].
To give the flavor of our tractability condition, we illustrate it on a very simple example, namely the CQ Access(u, f )
defined in (1). (More involved examples are in the body of
the paper.) In this CQ, u and f are head variables while
g is an existential variable. Suppose first that the underlying schema S has no fds. In the existential-connectivity
graph of Q, denoted G∃ (Q), each atom is a node, and two
2
2.2 Conjunctive Queries
involve CQs and key constraints are within the topic of consistent query answering. Kolaitis and Pema [14] proved a
dichotomy (polynomial time vs. coNP-hardness) in the complexity of computing the consistent answers of a Boolean
sjf-CQ with exactly two atoms. Finally, Maslowski and Wijsen [17] showed a dichotomy (polynomial vs. #P-hardness)
in the complexity of counting the database repairs that satisfy a Boolean sjf-CQ. We are not aware of any dichotomy
theorem that applies to sjf-CQs and general functional dependencies like as our main theorem (Theorem 6.1).
The classical use case for deletion propagation is database
administration: actual translation of a deletion in the view
to deletion in the source relations. Contemporary applications provide further motivation—database debugging. As
an example, Information Extraction (IE) programs, which
can be highly complicated, often operate on unclean tables
and dictionaries that by themselves may be produced by
means of prior IE programs [16, 20]. When an erroneous
result is detected in the output, the developer wishes to
find the cause of that result (e.g., wrong dictionary entries),
which may require significant manual effort in complex systems (e.g., the MIDAS system [4]). We can view the set
of source tuples suggested for deletion as a debugging suggestion, or alternatively as a cause of the undesired result.
That cause is meaningful in the sense that it is focused on
the undesired result and has a small (if any) effect on the
other results. This notion of causality is different from that
of Meliou et al. [18], which actually corresponds to deletion
propagation with minimal source side effect. Of course, IE
programs entail more than CQs. Nevertheless, we believe
that the case of sjf-CQs is already challenging enough, and
our hope is that insights on them will lead to understanding
of the studied problems in more complex settings.
Due to space limitations, the proofs are in the appendix.
2.
We fix an infinite set Var of variables, and assume that
Cnsts and Var are disjoint sets. We usually denote variables
by lowercase letters from the end of the Latin alphabet (e.g.,
x, y and z). We use the Datalog style for denoting a conjunctive query (abbrev. CQ): a CQ over a schema S is an
expression of the form Q(y) :− ϕ(x, y, c), where x and y
are tuples of variables (from Var), c is a tuple of constants
(from Cnsts), and ϕ(x, y, c) is a conjunction of atomic formulas φi (x, y, c) over S; an atomic formula is also called an
atom. We may write just Q(y), or even just Q, if ϕ(x, y, c)
is irrelevant. We denote by atoms(Q) the set of atoms of Q.
We usually write ϕ(x, y, c) by simply listing atoms(Q). We
make the requirement that each variable occurs at most once
in x and y, and no variable occurs in both x and y. Furthermore, we require every variable of y to occur (at least
once) in ϕ(x, y, c). The arity of Q, denoted arity(Q), is the
length of the tuple y.
Let Q(y) :− ϕ(x, y, c) be a CQ. A variable in x is called
an existential variable, and a variable in y is called a head
variable. We use Var∃ (Q) and Varh (Q) to denote the sets of
existential variables and head variables of Q, respectively.
Similarly, if φ is an atom of Q, then Var∃ (φ) and Varh (φ) denote the sets of existential and head variables, respectively,
that occur in φ. We denote by Var(Q) and Var(φ) the unions
Var∃ (Q) ∪ Varh (Q) and Var∃ (φ) ∪ Varh (φ), respectively.
In general, this work is restricted to CQs that are self-join
free, which means that each relation symbol occurs at most
once in the CQ. We use sjf-CQ as an abbreviation of selfjoin-free CQ. Let Q(y) :− ϕ(x, y, c) be an sjf-CQ over the
schema S = (R, ∆). Without loss of generality, we assume
that the set of relation symbols in Q is exactly the set of
relation symbols of S. For a relation symbol R of S, the
unique atom over R is denoted by φR (x, y, c) or just φR .
Let S be a schema, let Q(y) be a CQ over S, and let I
be an instance over S. An assignment for Q is a mapping
µ : Var(Q) → Cnsts. For an assignment µ for Q, the tuple
µ(y) is the one obtained from y by replacing every head
variable y with the constant µ(y). Similarly, for an atom φ
of Q, the fact µ(φ) is the one obtained from φ by replacing
every variable z with the constant µ(z). A match for Q in
I is an assignment µ for Q, such that µ(φ) is a fact of I for
all φ ∈ atoms(Q). We denote by M(Q, I) the set of all the
matches for Q in I. If µ is a match for Q in I, then µ(y) is
called an answer (for Q in I). The result of evaluating Q
over I, denoted Q(I), is the set of all the answers for Q in
I; that is:
SETTING AND BACKGROUND
This section describes the formal setting for this paper,
and provides some needed background.
2.1 Schemas and Functional Dependencies
In this paper, a schema S is a pair (R, ∆), where R =
{R1 , . . . , Rn } is a finite set of relation symbols and ∆ is a
set of functional dependencies. We abbreviate functional
dependency as fd. Each relation symbol Ri ∈ R has an arity,
denoted arity(R), which is a natural (nonzero) number. An
fd δ ∈ ∆ has the form Ri : A → B, where Ri ∈ R and A
and B are subsets of {1, . . . , arity(Ri )}. We use Ri : A → j
and Ri : j ′ → j as shorthand notations of Ri : A → {j} and
Ri : {j ′ } → {j}, respectively.
We assume an infinite set Cnsts of constants. A database
over a schema S is called an instance (of S), and its values
are taken from Cnsts. Formally, an instance I over a schema
S = (R, ∆) consists of a finite relation RiI ⊆ Cnstsarity(Ri )
for each relation symbol Ri ∈ R, such that all the fds of S
are satisfied; that is, for each fd Ri : A → B in ∆ and tuples
t and u in RiI , if t and u agree on (i.e., have the same values
for) the indices of A, then they also agree on the indices of
B. If I is an instance over S and t is a tuple in RiI , then
we say that Ri (t) is a fact of I. Notationally, we view an
instance I as the set of its facts. As an example, J ⊆ I
means that RiJ ⊆ RiI for all Ri ∈ R, in which case we say
that J is subinstance of I. (Note that if I satisfies ∆, then
every subinstance of I satisfies ∆ as well.)
def
Q(I) = {µ(y) | µ ∈ M(Q, I)}
Over a given schema, let Q(y) be a CQ, let I be an instance, and let b be an answer. A fact f ∈ I is said to be
(Q, b)-useful (in I) if there is an atom φ ∈ atoms(Q) and a
match µ for Q in I, such that µ(y) = b and µ(φ) = f . A
fact f ∈ I is said to be Q-useful (in I) if f is (Q, b)-useful
for some answer b.
Example 2.1. An important CQ in this work is the sjfCQ Q⋆ which is the same as the CQ Access(u, f ) defined
in (1) (and discussed in the introduction), up to renaming
of relation symbols and variables. The schema is S⋆ , and it
has two binary relation symbols R1 and R2 and no functional
dependencies. The sjf-CQ Q⋆ over S⋆ is defined as follows.
Q⋆ (y1 , y2 ) :− R1 (x, y1 ), R2 (x, y2 )
3
(2)
an answer a⋆ = (a1 , a2 ). To obtain a solution, we need to
delete from I ⋆ facts, such that for all constants c where both
R1 (c, a1 ) and R2 (c, a2 ) are in I ⋆ , at least one of the two is
deleted. In MaxDPhS⋆ , Q⋆ i, the goal is to have as many
as possible pairs remaining in Q⋆ (I ⋆ ) after the deletion. In
FreeDPhS⋆ , Q⋆ i, the goal is to determine whether we can
delete facts from I ⋆ , so that every original answer, except
for a⋆ , remains an answer. Buneman et al. [3] proved the
following.
The atoms of Q⋆ are φR1 = R1 (x, y1 ) and φR2 = R2 (x, y2 ).
In the examples of this article, Ri and Rj are assumed to
be different symbols when i 6= j. In particular, schema(Q⋆ )
consists of two distinct binary relation symbols: R1 and R2 .
Note that Q⋆ is an sjf-CQ, since R1 and R2 are different
relation symbols. There is only one existential variable in
Q⋆ , namely x, and the two head variables are y1 and y2 .
Hence Var∃ (Q) = {x} and Varh (Q) = {y1 , y2 }. Furthermore,
Var∃ (φR1 ) = {x} and Varh (φR1 ) = {y1 }.
Theorem 2.3. [3] FreeDPhS⋆ , Q⋆ i is NP-complete (and
hence, MaxDPhS⋆ , Q⋆ i is NP-hard).
2.2.1 CQ Functional Dependencies
Let S = (R, ∆) be a schema, and let Q be a CQ over S.
If µ is an assignment for Q and Z is a subset of Var(Q),
then µ[Z] denotes the restriction of µ to the variables in Z
(i.e., Z is the domain of µ[Z], and µ[Z](z) = µ(z) for all
z ∈ Z). Let Z1 and Z2 be subsets of Var(Q). We denote
by Q : Z1 → Z2 the functional dependency stating that for
every instance I over S and matches µ and µ′ of Q in I we
have the following:
Kimelfeld et al. [13] extended Theorem 2.3 to a dichotomy
theorem, which we discuss in the next section.
Another problem that falls in our setting is that of deletion propagation for key-preserving views [5], which is described as follows. Let S = (R, ∆) be a schema. For a
relation symbol R ∈ R, a primary key for R is a set K
of indices in {1, . . . , arity(R)}, such that ∆ contains the fd
K → {1, . . . , arity(R)}. We say that a CQ Q over S is key
preserving if for every φ ∈ atoms(Q) there is a key Kφ for
Rφ , such that φ has no existential variables in the positions
of Kφ . Cong et al. [5] proved the following.
µ[Z1 ] = µ′ [Z1 ] ⇒ µ[Z2 ] = µ′ [Z2 ]
Note that when ∆ is empty (e.g., as in the setting of [3,13]),
then Q : Z1 → Z2 is equivalent to Z1 ⊇ Z2 . As usual, we
may write Q : Z → z instead of Q : Z → {z}. The image of
Z, denoted img(Z), is the set of all the variables z ∈ Var(Q),
such that Q : Z → z. Note that Z ⊆ img(Z). If φ is an
atom of Q, then we use img(φ) as a shorthand notation of
img(Var(φ)).
Theorem 2.4. [5] Let S be a schema and let Q be a CQ
over S, such that Q is key preserving. Then MaxDPhS, Qi
(as well as FreeDPhS, Qi) is solvable in polynomial time.
In the end of Section 3, we will show how Theorem 2.4
follows from the results of this paper in the case where Q is
self-join free.
Example 2.2. Let S = (R, ∆) be such that R contains
two binary relation symbols R1 , R2 and two ternary relation
symbols S and T . In addition, suppose that ∆ contains the
following fds:
R1 : 1 → 2
R2 : 1 → 2
2.4 Head Domination and Dichotomy
Let S be a schema, and let Q be a CQ over a S. The
existential-connectivity graph of Q, denoted G∃ (Q), is the
undirected graph that has atoms(Q) as the set of nodes, and
that has an edge {φ1 , φ2 } whenever φ1 and φ2 have at least
one existential variable in common (in notation, Var∃ (φ1 ) ∩
Var∃ (φ2 ) 6= ∅). Let φ be an atom of Q, and let P be a set of
atoms of Q. We denote by Var(P ) the set of all the variables
that occur in P , by Varh (P ) the set Var(P ) ∩ Varh (Q), and
by Var∃ (P ) the set Var(P ) ∩ Var∃ (Q).
S : {1, 2} → 3
Let Q be the following CQ.
Q(y1 , y2 , y3 ) :−
R1 (x1 , y1 ), R2 (x2 , y2 ), S(y1 , y3 , x), T (x, x1 , x2 )
Then we have img({x1 }) = {x1 , y1 } due to the fd R1 :
1 → 2, img({x2 }) = {x2 , y2 } due to R2 : 1 → 2, and
img({y1 , y3 }) = {y1 , y3 , x} due to S : {1, 2} → 3. In addition, we have img(φT ) = {x, x1 , x2 , y1 , y2 } due to the fds
R1 : 1 → 2 and R2 : 1 → 2. (Recall that φT is the atom
T (x, x1 , x2 ).)
Example 2.5. The left side of Figure 1 shows the graph
G∃ (Q) for the CQ Q of Example 2.2. Observe that there is no
edge between R1 (x1 , y1 ) and S(y1 , y3 , x) since the two atoms
do not share an existential variable (they share y1 , which is a
head variable). The graph G∃ (Q) is connected, and its single
connected component P satisfies Varh (P ) = {y1 , y2 , y3 } and
Var∃ (P ) = {x, x1 , x2 }.
2.3 Deletion Propagation
Let S be a schema, and let Q be a CQ. The problem of
maximizing the view in deletion propagation, with Q as the
view definition, is denoted by MaxDPhS, Qi and is defined
as follows. The input consists of an instance I over S, and
a tuple a ∈ Q(I). A solution (for I and a) is a subinstance
J ⊆ I, such that a ∈
/ Q(J). The goal is to find an optimal
solution, which is a solution J that maximizes |Q(J)|; that
is, J is such that |Q(J)| ≥ |Q(J ′ )| for all solutions J ′ . The
problem of determining whether there is a side-effect-free solution, denoted FreeDPhS, Qi, is that of deciding whether
there is a solution in which we lose exactly a; that is, given
the input I and a, decide whether there is a subinstance
J ⊆ I with Q(J) = Q(I) \ {a}.
As an example, consider the schema S⋆ and the CQ Q⋆ of
Example 2.1. The input for the problems MaxDPhS⋆ , Q⋆ i
and FreeDPhS⋆ , Q⋆ i consists of an instance I ⋆ over S⋆ and
Kimelfeld et al. [13] defined the head-domination property
of a CQ.
Definition 2.6. (Head Domination [13]) A CQ Q
has the head-domination property if for every connected
component P of G∃ (Q) there is an atom φ of Q such that
Varh (P ) ⊆ Var(φ).
Note that in the definition, the atom φ that satisfies Varh (P ) ⊆
Var(φ) is not necessarily in P . Examples follow.
Example 2.7. A CQ that has head domination is the
following:
Q′ (y1 , y2 , y3 ) :− R1 (x, y1 ), R2 (x, y2 ), S(y1 , y2 ), T (y2 , y3 )
4
R1 (x1 , y1 )
R2 (x2 , y2 )
T (x, x1 , x2 )
S(y1 , y3 , x)
R2 (x2 , y2 )
MaxDPhS, Qi to be solvable in polynomial time. In Section 5, we prove that the condition is also necessary (under
standard complexity assumptions). We begin by describing
the unirelation algorithm, which is an extremely simple algorithm that produces a (not-necessarily optimal) solution
given the input I and a for MaxDPhS, Qi.
P1
R1 (x1 , y1 )
T (x, x1 , x2 )
S(y1 , y3 , x)
P2
3.1 The Unirelation Algorithm
Let S be a schema, and let Q be a CQ over S. Buneman
et al. [3] observed the following simple algorithm (also called
the trivial algorithm [13]) for producing a solution, given the
input I and a for MaxDPhS, Qi. For each φ ∈ atoms(Q),
construct the candidate solution Jφ by deleting from I every fact that can be obtained from φ by an assignment that
agrees with a on the head variables; then, select the Jφ with
the maximal |Q(Jφ )|. The unirelation algorithm, denoted
UniRelhS, Qi and depicted in Figure 2, is essentially the same
algorithm, except that we delete only facts that are (Q, a)useful (that is, the facts we delete are always of the form
µ(φ) where µ ∈ M(Q, I) is such that µ(y) = a). The following straightforward proposition states the fact that the
unirelation algorithm is correct and terminates in polynomial time.
Figure 1: The graphs G∃ (Q) (left) and G∃ (Q+ ) (right)
for the CQ Q of Example 2.2
Note that G∃ (Q′ ) has three connected components: P1 =
{φR1 , φR2 }, P2 = {φS }, P3 = {φT }. Since Varh (P1 ) ⊆
Var(φS ), Varh (P2 ) ⊆ Var(φS ) and Varh (P3 ) ⊆ Var(φT ), we
get that Q′ has indeed head domination.
As a negative example, consider again the CQ Q of Example 2.2. As shown in Example 2.5 (through Figure 1),
G∃ (Q) has only one connected component. Since no atom
contains all the head variables of Q, we get that Q does not
have head domination.
Kimelfeld et al. [13] proved the following theorem, which
states a dichotomy in the complexities of MaxDPhS, Qi and
FreeDPhS, Qi in the absence of fds.
Proposition 3.1. UniRelhS, Qi(I, a) returns a solution in
polynomial time.
Theorem 2.8. [13] Let S = (R, ∆) be a schema and let Q
be an sjf-CQ over S.
It follows immediately from the results of Kimelfeld et
al. [13] that if Q is an sjf-CQ with head domination, then
UniRelhS, Qi produces an optimal solution. It also follows
immediately that in the absence of fds, head domination is
a necessary condition for the optimality of UniRelhS, Qi; but
this is no longer true when fds are present, as the following
example shows.
1. If Q has head domination, then MaxDPhS, Qi can be
solved in polynomial time.
2. If ∆ = ∅ and Q does not have head domination, then
FreeDPhS, Qi is NP-complete.
In this paper, we extend the dichotomy of Theorem 2.8 to
account for fds (Theorem 6.1). As we will show, this extension is nontrivial, and requires arguments, concepts and constructions that are significantly different from those used for
proving Theorem 2.8. Specifically, we will define a tractability condition, which is a syntactic condition on the schema S
and a CQ Q at hand, such that fds are taken into account to
weaken the property of head domination. We will show that
the tractability condition gives a dichotomy similar to that
of Theorem 2.8, that is, its satisfaction implies tractability
of MaxDPhS, Qi and its violation implies NP-completeness
of FreeDPhS, Qi. (Of course, the tractability condition is
equivalent to head domination in the absence of fds.)
Actually, the result of Kimelfeld et al. [13] is stronger than
Theorem 2.8. In the tractable part (item 1), the problem
MaxDPhS, Qi is solved by an extremely simple algorithm
that has been observed already by Buneman et al. [3]. We
will show that the tractable side in the current paper is
solved by essentially the same algorithm, with a slight modification. In the intractable part of Theorem 2.8 (item 2),
Kimelfeld et al. further show some hardness of approximation for MaxDPhS, Qi. To simplify the presentation, we
ignore approximation through most of this paper, except for
Section 6 where we show that a similar hardness of approximation is in the intractable side of our dichotomy as well.
3.
Example 3.2. Consider again the CQ Q⋆ over S⋆ (Example 2.1), but now let us use the schema S⋆0 , which is the
same as S⋆ except that it has the fd R1 : 1 → 2. We
claim that, given input I ⋆ and a⋆ = (a1 , a2 ), the execution
of UniRelhS⋆0 , Q⋆ i(I ⋆ , a⋆ ) returns an optimal solution. As
proof, we will show that for φ = R2 (x, y2 ) the solution Jφ ,
constructed by the algorithm, is optimal. Let Jopt be an optimal solution. It is enough to show every fact R2 (c, a2 ) that
is (Q⋆ , a⋆ )-useful in I ⋆ is not Q-useful in Jopt (which then
implies that Q(Jopt ) ⊆ Q(Jφ )). Indeed, if Jopt contains both
R1 (c, b1 ) and R2 (c, a2 ), then the fd implies that b1 = a1 ,
since I ⋆ contains also R1 (c, a1 ) (or otherwise R2 (c, a2 ) is not
(Q⋆ , a⋆ )-useful in I ⋆ ). This means that Q(Jopt ) contains a⋆ ,
which contradicts the fact that Jopt is a solution.
3.2 Functional Head Domination
The idea that is exploited in Example 3.2 to show the optimality of UniRelhS⋆0 , Q⋆ i is the following. Although φR2 does
not contain both y1 and y2 (which are the head variables of
the single connected component of G∃ (Q⋆ )), img(φR2 ) contains them both. So intuitively, we can treat φR2 as if it
contains both y1 and y2 . This idea is formalized next.
Definition 3.3. (Functional Head Domination) Let
S be a schema, and let Q be a CQ over S. We say that Q
has functional head domination if for every connected component P of G∃ (Q) there is an atom φ ∈ atoms(Q), such
that Varh (P ) ⊆ img(φ).
TRACTABILITY
In this section, we phrase the tractability condition on
a schema S and an sjf-CQ Q over S. Also in this section we will prove that the condition is indeed sufficient for
5
which means that given an answer in Q(I ⋆), every corresponding assignment maps x to the same value. In effect,
this means that x can be treated as a head (rather than existential) variable, which may change the structure of G∃ (Q)
and can potentially lead to functional head domination. Indeed, if x becomes a head variable in Q⋆ , then no existential
variables remain in Q⋆ , and then (functional) head domination trivially holds. The following lemma formalizes this
intuition. The proof is straightforward, hence omitted.
Algorithm UniRelhS, Qi(I, a)
1: J ← ∅
2: for all φ ∈ atoms(Q) do
3:
Jφ ← I
4:
Remove from Jφ every (Q, a)-useful fact over Rφ
5:
if |Q(Jφ )| > |Q(J)| then
6:
J ← Jφ
7: return J
Lemma 3.7. Let S be a schema, and let Q(y) be a CQ
over S. Suppose that z is a variable in img(Varh (Q)), and
let Q′ be the CQ Q(y, z) (obtained by appending z to the
head). For all instances I over S, there is a (polynomialtime-computable) function β : Q(I) → Q′ (I), such that β is
a bijection from Q(J) to Q′ (J) for every J ⊆ I.
Figure 2: The unirelation algorithm
As an example, the CQ Q⋆ over the schema S⋆0 of Example 3.2 has functional head domination, since img(φR2 )
contains both y1 and y2 (as previously explained). Another
example follows.
For a CQ Q over a schema S, we denote by Q+ the CQ
that is obtained by appending (in some order) all the existential variables in Var∃ (Q) ∩ img(Varh (Q)) to the head of
Q. As an example, for the CQ Q = Q⋆ over the schema S⋆1
of Example 3.6, the CQ Q+ is:
Example 3.4. Consider the schema S and CQ Q of Example 2.2. Recall that the left side of Figure 1 shows the
graph G∃ (Q). Since G∃ (Q) is connected and no atom φ of
Q is such that img(φ) contains all the head variables, we
conclude that Q does not have functional head domination.
Now, let Q′ be obtained from Q by adding x to the head
(so the head variables of Q′ are x and the yi ). The right
side of Figure 1 shows G∃ (Q′ ). For the connected component P1 we have Varh (P1 ) = {x, y1 , y2 } ⊆ img(φT ), and
Varh (P2 ) = {y1 , y3 } ⊆ img(φS ). Thus, Q′ has functional
head domination.
Q+ (y1 , y2 , x) :− R1 (x, y1 ), R2 (x, y2 )
By repeatedly applying Lemma 3.7 we get the following.
Lemma 3.8. Let S be a schema, and let Q be a CQ over
S. Given an instance I over S, there is a (polynomial-timecomputable) function β : Q(I) → Q+ (I), such that β is a
bijection from Q(J) to Q+ (J) for every J ⊆ I.
As an immediate conclusion of Lemma 3.8, we get the
following lemma.
The following theorem extends the idea illustrated in Example 3.2: functional head domination is a sufficient condition for MaxDPhS, Qi to be solvable in polynomial time.
This theorem follows directly from Lemma 3.11 and Theorem 3.12 that are given later in this section.
Lemma 3.9. Let Q be a CQ over a schema S.
• There is a polynomial-time reduction from the problem
FreeDPhS, Qi to the problem FreeDPhS, Q+ i and
vice versa.
• There is a polynomial-time, approximation-preserving
reduction from MaxDPhS, Qi to MaxDPhS, Q+ i and
vice versa.
Theorem 3.5. Let S be a schema, and let Q be an sjfCQ over S. If Q has functional head domination, then
UniRelhS, Qi optimally solves MaxDPhS, Qi.
Nevertheless, functional head domination is still not a necessary condition for the optimality of the unirelation algorithm, as the following example shows.
3.3 Tractability Condition
We can now define our tractability condition.
Example 3.6. Consider again the CQ Q⋆ over S⋆ (Example 2.1), but now let us use the schema S⋆1 , which is the
same as S⋆ except that it has the fd R1 : 2 → 1 (recall
that in Example 3.2 we had R1 : 1 → 2). We again claim
that, given input I ⋆ and a⋆ = (a1 , a2 ), the execution of
UniRelhS⋆0 , Q⋆ i(I ⋆ , a⋆ ) returns an optimal solution. To see
that, let Jopt be an optimal solution. If none of the facts
R2 (c, a2 ) that are (Q⋆ , a⋆ )-useful in I ⋆ are Q-useful in Jopt ,
we are done as in Example 3.2. Otherwise, take such a fact
R2 (c, a2 ), and we will show that none of the facts R1 (c′ , a1 )
that are (Q⋆ , a⋆ )-useful in I ⋆ are Q⋆ -useful in Jopt (which
means that Q(Jopt ) ⊆ Q(Jφ ) for φ = R1 (x, y1 )). Indeed,
if R1 (c′ , a1 ) is a counterexample, then R1 : 2 → 1 implies c′ = c, since we know that I ⋆ contains R1 (c, a1 ) (or
otherwise R2 (c, a2 ) is not (Q⋆ , a⋆ )-useful in I ⋆ ), and then
a⋆ ∈ Q⋆ (Jopt ), which is of course a contradiction.
Definition 3.10. (Tractability Condition) For an sjfCQ Q over a schema S, the tractability condition is that Q+
has functional head domination.
Later on, we show that the tractability condition indeed
implies tractability. But first, the following proposition states
that this condition is weaker than functional head domination, which in turn is weaker than head domination.
Proposition 3.11. Let Q be a CQ over a schema S.
1. If Q has head domination, then Q has functional head
domination.
2. If Q has functional head domination, then Q+ has
functional head domination.
Furthermore, none of the two reverse implications hold.
Following is the main result for this section, and it says
that when the tractability condition is satisfied (and in the
absence of self joins), the unirelation algorithm is optimal.
The optimality of the unirelation algorithm in Example 3.6
is due to the fact that the existential variable x is functionally dependent on the head variables (i.e., x ∈ img({y1 , y2 })),
6
Theorem 3.12. Let Q be an sjf-CQ over S. If Q+ has
functional head domination, then UniRelhS, Qi optimally solves
MaxDPhS, Qi (in polynomial time).
y1
x1
x2
y2
y
Next, we illustrate applications of Theorem 3.12.
Figure 3: Gv (Q) in Example 4.1
Example 3.13. Consider again the schema S and CQ
Q of Example 2.2, where it is stated that the existential
variable x is in img({y1 , y2 }). So x ∈ img(Varh (Q)). The
reader can verify that none of x1 and x2 are in img(Varh (Q)).
Therefore, the CQ Q+ is the following:
The next theorem states some properties of fd-separation.
This theorem is important for this work, as it is used repeatedly in the proofs for the results of the following section.
Q(y1 , y2 , y3 , x) :−
R1 (x1 , y1 ), R2 (x2 , y2 ), S(y1 , y3 , x), T (x, x1 , x2 )
Theorem 4.2. Let Q be a CQ over a schema S, let Z ⊆
Var(Q) be a set of variables, and let z ∈ Var(Q) be a variable.
The following hold.
1. fdSep(Z, z) is closed under fds; that is:
Hence, Q+ is actually the CQ Q′ of Example 3.4, where it is
shown to have functional head domination. Therefore, the
tractability condition is satisfied, and hence, Theorem 3.12
says that MaxDPhS, Qi is in polynomial time.
img(fdSep(Z, z)) = fdSep(Z, z)
2. fdSep(Z, z) is closed under fd-separation by subsets;
that is, for all Z ′ ⊆ Var(Q) we have:
As another example, we will show how Theorem 2.4 (by
Cong et al. [5]) follows from Theorem 3.12 for the class of
sjf-CQs. Let S be a schema and let Q be an sjf-CQ over
S, such that Q is key preserving. It is straightforward to
show that Q being key preserving implies that Q+ has no
existential variables. Hence, G∃ (Q) has no edges, and so Q+
has (functional) head domination. Then, Theorem 3.12 says
that MaxDPhS, Qi is in polynomial time.
4.
Z ′ ⊆ fdSep(Z, z) ⇒ fdSep(Z ′ , z) ⊆ fdSep(Z, z)
3. Z determines z whenever Z separates from z a set that
determines z; that is, for all Z ′ ⊆ Var(Q) we have:
`
´
z ∈ img(Z ′ ) ∧ Z ′ ⊆ fdSep(Z, z) ⇒ z ∈ img(Z)
5. HARDNESS
In this section, we prove that for an sjf-CQ Q over a
schema S, if the tractability condition is violated then the
problem FreeDPhS, Qi is NP-complete. In the next section, we also show a hardness-of-approximation result for
MaxDPhS, Qi. We will discuss the main lemmas involved
in the proof, while complementary intermediate results are
in the appendix.
Similarly to the proof of hardness for Theorem 2.8 [13], we
prove hardness here by a reduction from FreeDPhS⋆ , Q⋆ i,
where S⋆ and Q⋆ are those defined in Example 2.1. (Nevertheless, the reductions here are significantly more involved
than those of [13].) We first set some notation and assumptions that we will hold for the remainder of this section.
FD-SEPARATION
Before we proceed to proving of the hardness side of our
dichotomy, we introduce the concept of fp-separation, which
plays a significant role in our proofs.
Let Q be a CQ over a schema S. The variable co-occurrence
graph of Q, denoted Gv (Q), is the graph that has Var(Q) as
the set of nodes, and an edge {z1 , z2 } whenever z1 and z2
co-occur in the same atom of Q (i.e., {z1 , z2 } ⊆ Var(φ) for
some φ ∈ atoms(Q)). Consider two variables z1 , z2 ∈ Var(Q)
and a subset Z of Var(Q). We say that Z separates z1 from
z2 if every path of Gv (Q) connecting z1 and z2 visits one or
more nodes in Z. Observe that as a special case, if z1 ∈ Z
then Z separates z1 from every other variable z2 . Moreover,
if Z separates z1 from itself then z1 is necessarily in Z.
Let Z be a subset of Var(Q), and let z1 , z2 ∈ Var(Q) be
two variables. We say that Z fd-separates z1 from z2 if
img(Z) separates z1 from z2 . We also say that an atom φ
of Q fd-separates z1 from z2 if Var(Q) fd-separates z1 from
z2 . We denote by fdSep(Z, z) (respectively, fdSep(φ, z)) the
set of all the variables z ′ of Q, such that Z (respectively, φ)
fd-separates z from z ′ .
5.1 Notation and Assumptions
For the remainder of this section, we fix a schema S =
(R, ∆) and an sjf-CQ Q over S, such that Q+ does not have
functional head domination. Membership of FreeDPhS, Qi
in NP is straightforward (nondeterministically choose a subinstance and test whether it is a side-effect-free solution). So
our goal is to prove that FreeDPhS, Qi is NP-hard. By
Lemma 3.9, it suffices to prove that FreeDPhS, Q+ i is NPhard. Therefore, to simplify the presentation we will assume
that Q = Q+ . We fix a component P of G∃ (Q), such that
none of the atoms φ of Q satisfy Varh (P ) ⊆ img(Q); such P
exists, since Q does not have functional head domination.
For the sake of presentation, we rephrase S⋆ and Q⋆ for a
clear distinction from S and Q. Specifically, we will assume
that the two relation symbols of S⋆ are R1⋆ and R2⋆ (rather
than R1 and R2 as in Example 2.1), and that the CQ Q⋆
over S⋆ is the following sjf-CQ:
Example 4.1. Let S be a schema with three ternary relation symbols R1 , R2 and S, and the fd S : {1, 3} → 2.
Consider the following CQ over S:
Q(y1 , y2 , y) :− R1 (y1 , y, x1 ), S(x1 , y, x2 ), R2 (x2 , y, y2 )
The graph Gv (Q) is depicted in Figure 3. The node x1
does not fd-separate y1 from y2 , since the path y1 — y — y2
does not contain any node in img({x1 }) = {x1 }. Similarly,
x2 does not fd-separate y1 from y2 . However, {x1 , x2 } fdseparates y1 from y2 , since every path between y1 and y2
visits a node in img({x1 , x2 }) = {x1 , x2 , y}. Finally, observe
that φS separating y1 from y2 .
Q⋆ (y1⋆ , y2⋆ ) :− R1⋆ (x⋆ , y1⋆ ), R2⋆ (x⋆ , y2⋆ )
It is known that FreeDPhS⋆ , Q⋆ i is NP-complete [3]. Recall
that S⋆ has no fds. Also recall that our goal is to devise
a reduction from FreeDPhS⋆ , Q⋆ i to FreeDPhS, Qi. So,
7
in the remaining of this section we fix the input I ⋆ and
a⋆ for FreeDPhS⋆ , Q⋆ i. We assume that a⋆ is the pair
(✁, ✄). Our reduction will construct the input I and a for
the problem FreeDPhS, Qi.
Recall from Section 4 that fdSep(φ, y) is the set of all variables z ∈ Var(Q), such that Var(φ) fd-separates y from z
(i.e., img(φ) separates y from z in the graph Gv (φ)).
Example 5.2. We continue with our running example,
which is introduced in Example 5.1. Figure 5 shows the
graph Gv (Q). The grey shadow should be ignored for now;
we discuss it later in this section. Recall that P consists of
all the atoms of Q, except for V (x3 , y2 , y3 ).
Suppose that the parameters of the template construction
are y1 , y2 , φ1 = R1 (x1 , x) and φ2 = R2 (x, x2 ). (Later,
we discuss the manner in which these specific parameters
are chosen.) The sets img(φ1 ) and img(φ2 ) are surrounded
by boxes with dotted edges. For each variable z of Q, the
corresponding node contains Θ(z) below z. As an example,
consider the variable x′ . As shown in the figure, Θ(x′ ) =
hx⋆ , y1⋆ , y ⋆ i. The occurrence of x⋆ is due to the fact that x′
is an existential variable in P ; that of y1⋆ is due to the fact
that φ2 does not fd-separate x′ from y1 , as evidenced by the
path x′ — x1 — y1 , and similar is the explanation for y2⋆ .
As another example, note that Θ(x′′ ) = hx⋆ , ✁, ✄i, since
φ2 (resp., φ1 ) fd-separates x′′ from y1 (resp., y2 ), as the
edges that are incident to x′′ connect x′′ to variables in
the intersection img(φ1 ) ∩ img(φ2 ). Finally, for x3 we have
Θ(x3 ) = h⋄, ✁, y2⋆ i, where the reason for the occurrence of ⋄
is that x3 is not in Var∃ (P ); besides that, the explanations
for ✁ and y2⋆ are like those for x′′ and x′ , respectively.
Example 5.1. A running example in this section is over
the following CQ.
Q(y1 , y2 , y3 ) :− S1 (y1 , x1 ), R1 (x1 , x), R2 (x, x2 ), S2 (x2 , y2 ),
T (x1 , x′ , x2 ), U (x, x′′ , y3 ), V (x3 , y2 , y3 )
The schema S has the relation symbols of Q (with the corresponding arities). In addition, S has the following fds.
S1 : 2 → 1
S2 : 1 → 2
U :1→3
Observe that Q = Q+ since no existential variable belongs to img({y1 , y2 , y3 }). The graph G∃ (Q) has two connected components, where one consists of just V (x3 , y2 , y3 )
and the other, which we fix here as P , consists of all remaining atoms. It is easy to verify that no atom φ of Q is such
that img(φ) contains all of y1 , y2 and y3 (which are the head
variables in P ); hence the tractability condition is violated.
As example input for FreeDPhS⋆ , Q⋆ i, we will use the
instance I ⋆ with the following facts:
R1 (0, ✁)
R2 (0, ✄)
R1 (0, a)
R2 (0, b)
R1 (1, ✁)
R2 (1, ✄)
R1 (1, c)
Let µ be a match in M(Q⋆ , I ⋆ ). The assignment Θµ is
the same as Θ, except that in the image, every occurrence
of x⋆ , y1⋆ and y2⋆ is replaced with µ(x⋆ ), µ(y1⋆ ) and µ(y2⋆ ),
respectively. We call Θµ an instantiation of Θ. The template
construction builds the instance IΘ that is the union of the
Θµ (φ) over all matches µ ∈ M(Q⋆ , I ⋆ ) and atoms φ of Q.
The matches for Q⋆ in I ⋆ are shown in the leftmost column in the table of Figure 4 (the other columns are discussed later); as a shorthand notation, the figure uses the
triple c, b1 , b2 (e.g., 0, ✁, b) to specify the match that maps
x⋆ , y1⋆ and y2⋆ to c, b1 and b2 , respectively. Note that
Q(I ⋆ ) = {(✁, ✄), (✁, b), (a, ✄), (a, b), (c, ✄)}. The matches
that produce a⋆ = (✁, ✄) are repeated (for later discussion)
in the bottom of the column. The reader can verify that
there is no side-effect-free solution for I ⋆ and a⋆ .
Example 5.3. We continue with our running example.
For presentation sake, in our examples we may write just
cb1 b2 instead of hc, b1 , b2 i (e.g., ⋄✁✄ instead of h⋄, ✁, ✄i).
Recall that Figure 5 shows the values that Θ assigns the
variables to. Each of the top six rows in the table of Figure 4
corresponds to one of the six matches µ ∈ M(Q⋆ , I ⋆ ). (We
later discuss the bottom two rows.) The leftmost element
(under M(Q⋆ , I ⋆ )) is µ, as explained in Example 5.1. The
cell under the atom φ contains the values of the fact Θµ (φ).
As an example, if µ is the match represented by 0, ✁, b and
φ is S1 (y1 , x1 ), then Θµ (φ) = S(⋄✁✄, 0✁✄). So, the top six
rows in the table depict IΘ (with some facts repeated).
5.2 Template Construction
As a first step in the reduction from FreeDPhS⋆ , Q⋆ i
to FreeDPhS, Qi, we define a template construction, which
produces an instance IΘ over S. In turn, IΘ will be a subinstance of the final instance I built in the reduction. The
construction is as follows.
Recall that we are given the input I ⋆ and a⋆ = (✁, ✄) for
FreeDPhS⋆ , Q⋆ i. Also recall that Q⋆ uses three variables:
x⋆ , y1⋆ , y2⋆ . We fix a constant ⋄ that does not appear is I ⋆ .
We first define a template assignment Θ, which maps every
variable z ∈ Var(Q) to a triple Θ(z) = hw, w1 , w2 i, where w
is either the variable x⋆ or the constant ⋄, w1 is either the
variable y1⋆ or the constant ✁, and w2 is either the variable
y2⋆ or the constant ✄.
More particularly, the entire template construction is parameterized by two variables y1 , y2 ∈ Varh (P ) and two atoms
φ1 , φ2 ∈ P . In the next section, we discuss in detail the way
these parameters are chosen (which is critical for the correctness of the reduction); for now, assume that they are
already present. Given the parameters y1 , y2 , φ1 and φ2 ,
the triple Θ(z), where z ∈ Var(Q), is constructed as follows.
• Begin with Θ(z) = hx⋆ , y1⋆ , y2⋆ i.
• If z ∈
/ Var∃ (P ), then replace x⋆ with ⋄.
• If z ∈ fdSep(φ2 , y1 ), then replace y1⋆ with ✁.
• If z ∈ fdSep(φ1 , y2 ), then replace y2⋆ with ✄.
Next, we discuss some basic properties of the template
construction, which hold regardless of the choice of the parameters φi and yi . The following lemma states that IΘ is
an instance of S (i.e., all fds are satisfied).
Lemma 5.4. IΘ satisfies every fd of ∆.
The next lemma, which is central to the correctness of our
construction, states that there are no matches for Q in IΘ
except for the instantiations of Θ (used for constructing IΘ ).
Lemma 5.5. M(Q, IΘ ) = {Θµ | µ ∈ M(Q⋆ , I ⋆ )}.
The tuple a (playing as input for FreeDPhS, Qi) is the
one obtained from y by replacing each variable y with the
triple h⋄, ✁, ✄i. As as example, with our shorthand notation
the tuple a in our running example is (⋄✁✄, ⋄✁✄, ⋄✁✄).
8
M(Q⋆ , I ⋆ ) S1 (y1 , x1 ) R1 (x1 , x) R2 (x, x2 )
0, ✁, ✄
⋄✁✄, 0✁✄ 0✁✄, 0✁✄ 0✁✄, 0✁✄
0, ✁, b
⋄✁✄, 0✁✄ 0✁✄, 0✁✄ 0✁✄, 0✁b
0, a, ✄
⋄a✄, 0a✄ 0a✄, 0✁✄ 0✁✄, 0✁✄
0, a, b
⋄a✄, 0a✄ 0a✄, 0✁✄ 0✁✄, 0✁b
1, ✁, ✄
⋄✁✄, 1✁✄ 1✁✄, 1✁✄ 1✁✄, 1✁✄
1, c, ✄
⋄c✄, 1c✄ 1c✄, 1✁✄ 1✁✄, 1✁✄
0✁✄, ♥φ
0
1✁✄, ♥φ
1
⋄✁✄, 0✁✄
⋄✁✄, 1✁✄
0, ✁, ✄
1, ✁, ✄
♥φ
0 , 0✁✄
♥φ
1 , 1✁✄
S2 (x2 , y2 )
T (x1 , x′ , x2 )
U (x, x′′ , y3 )
0✁✄, ⋄✁✄ 0✁✄, 0✁✄, 0✁✄ 0✁✄, 0✁✄, ⋄✁✄
0✁b, ⋄✁b 0✁✄, 0✁b, 0✁b 0✁✄, 0✁✄, ⋄✁✄
0✁✄, ⋄✁✄ 0a✄, 0a✄, 0✁✄ 0✁✄, 0✁✄, ⋄✁✄
0✁b, ⋄✁b
0a✄, 0ab, 0✁b 0✁✄, 0✁✄, ⋄✁✄
1✁✄, ⋄✁✄ 1✁✄, 1✁✄, 1✁✄ 1✁✄, 1✁✄, ⋄✁✄
1✁✄, ⋄✁✄ 1c✄, 1c✄, 1✁✄ 1✁✄, 1✁✄, ⋄✁✄
0✁✄, ⋄✁✄ 0✁✄, 0✁✄, 0✁✄
1✁✄, ⋄✁✄ 1✁✄, 1✁✄, 1✁✄
φ
φ
♥φ
0 , ♥0 , ♥0
φ
φ
♥1 , ♥1 , ♥φ
1
V (x3 , y2 , y3 )
Q(I)
⋄✁✄, ⋄✁✄, ⋄✁✄ ⋄✁✄, ⋄✁✄, ⋄✁✄
⋄✁b, ⋄✁b, ⋄✁✄ ⋄✁✄, ⋄✁b, ⋄✁✄
⋄✁✄, ⋄✁✄, ⋄✁✄ ⋄a✄, ⋄✁✄, ⋄✁✄
⋄✁b, ⋄✁b, ⋄✁✄
⋄a✄, ⋄✁b, ⋄✁✄
⋄✁✄, ⋄✁✄, ⋄✁✄ ⋄✁✄, ⋄✁✄, ⋄✁✄
⋄✁✄, ⋄✁✄, ⋄✁✄ ⋄c✄, ⋄✁✄, ⋄✁✄
φ
♥φ
0 , ⋄✁✄, ♥0
φ
♥1 , ⋄✁✄, ♥φ
1
⋄✁✄, ⋄✁✄, ♥φ
0
⋄✁✄, ⋄✁✄, ♥φ
1
Figure 4: The instance I constructed in the running example (introduced in Example 5.1)
the existence of a side-effect-free solution for IΘ and a, and
on the other direction, existence of a side-effect-free solution
JΘ for IΘ and a implies the existence of a side-effect-free
solution for I ⋆ and a⋆ provided that JΘ does not miss any
of facts over the encounters.
x′
hx⋆ , y1⋆ , y2⋆ i
x1
x
x2
hx⋆ , y1⋆ , ✄i
hx⋆ , ✁, ✄i
hx⋆ , ✁, y2⋆ i
y1
y3
y2
h⋄, y1⋆ , ✄i
h⋄, ✁, ✄i
h⋄, ✁, y2⋆ i
img(φ1 )
x′′
x3
hx , ✁, ✄i
h⋄, ✁, y2⋆ i
⋆
Lemma 5.7. Suppose that P has two non-encounters γ1
and γ2 , such that Θ(γ1 ) contains y1⋆ and Θ(γ2 ) contains y2⋆ .
Then, the following are equivalent.
1. There is a side-effect-free solution for I ⋆ and a⋆ .
2. There is a side-effect-free solution JΘ for IΘ and a,
such that RφJΘ = RφIΘ for all encounters φ.
img(φ2 )
In the next section we branch into cases, based on properties of S and Q, and show how we build on Lemma 5.7 to
complete the reduction in each of these cases.
Figure 5: Gv (Q) in the running example (introduced
in Example 5.1)
5.3 Cases
We now describe the different cases we consider. In addition, for each case we specify the parameters we use for the
template construction.
We use the following terminology. A set Z of variables is
simultaneously determined if there is an atom φ of Q, such
that Z ⊆ img(φ). To variables z1 and z2 are fd-separable if
there is an atom φ of Q with the following properties:
• φ fd-separates y1 from y2 .
• img(φ) contains neither y1 nor y2 .
Up to now, we showed how to construct IΘ and a from the
input I ⋆ and a⋆ for FreeDPhS⋆ , Q⋆ i, given parameters for
the template construction. One would wonder whether we
are done in the sense that we can use IΘ and a directly as
input for FreeDPhS⋆ , Q⋆ i. As stated by Lemma 5.7 below,
it is true that existence of a side-effect-free solution for I ⋆
and a⋆ implies the existence of a side-effect-free solution for
IΘ and a. In one of the cases we consider later we show
that the other direction is also true; however, this is not the
general case: There may be a side-effect-free solution for IΘ
and a even if no such a solution exists for I ⋆ and a⋆ , as the
following example shows.
Case 1: P has two head variables that are not simultaneously determined.
The variables y1 and y2 will act as parameters for the
template construction. With y1 and y2 chosen, we branch
into two sub-cases, where in each sub-case we choose the
parameters φ1 and φ2 differently.
Example 5.6. Consider again the instances I ⋆ and IΘ ,
and the tuples a⋆ and a, constructed in our running example.
Recall that IΘ is depicted in the top part of the table in Figure 4. Recall from Example 5.1 that there is no side-effectfree solution for I ⋆ and a⋆ . But there is a side-effect-free
solution for I and a. To see that, observe (in Figure 4) that
the facts T (0✁✄, 0✁✄, 0✁✄) and T (1✁✄, 1✁✄, 1✁✄) are
used only in the matches that produce a. From Lemma 5.5
we conclude that by removing these two facts from IΘ we
get a solution that is side-effect free.
Case 1.a: y1 and y2 are fd-separable.
In Case 1.a, we choose an atom φ, such that φ fd-separates
y1 from y2 and img(φ) contains neither y1 nor y2 .
Lemma 5.8. In Case 1.a, φ ∈ P .
Now, both parameters φ1 and φ2 are chosen to be φ. Recall that φ1 and φ2 are required to be in P , so this choice is
justified by Lemma 5.8,
The “problem” illustrated in Example 5.6 is that one of the
atoms, namely T (x1 , x′ , x2 ), is such that Θ(φ) contains both
y1⋆ and y2⋆ . Indeed, in the example both of y1⋆ and y2⋆ appear
in Θ(x′ ). An atom φ of Q such that Θ(φ) contains both y1⋆
and y2⋆ (not necessarily in the image of the same variable)
is called an encounter. Following is a key lemma that will
later guide us in the further definition of the reduction. This
lemma shows that, under some condition, the reduction thus
far is “correct” up to the issue of Example 5.6. That is,
existence of a side-effect-free solution for I ⋆ and a⋆ implies
Example 5.9. Consider the following CQ.
Q(y1 , y2 , y3 ) :− S1 (x′′1 , y1 , x′1 ), R1 (x′1 , y3 , x1 ),
T (x1 , x2 ), R2 (x2 , y3 , x′2 ), S2 (x′2 , y2 , x′′2 )
The schema S has the relation symbols of Q (with the corresponding arities). In addition, S has the following fds.
R1 : 3 → 2
9
S1 : 3 → 2
S2 : 1 → 2
y1
h⋄, y1⋆ , ✄i
x′1
hx⋆ , y1⋆ , ✄i
x′′1
x′′2
hx⋆ , y1⋆ , ✄i
hx⋆ , ✁, y2⋆ i
x1
x2
hx⋆ , ✁, ✄i
hx⋆ , ✁, ✄i
y2
x′′1
y1
h⋄, ✁, y2⋆ i
h⋄, y1⋆ , ✄i
x′2
x′1
hx⋆ , ✁, y2⋆ i
x′′2
hx⋆ , y1⋆ , ✄i
hx⋆ , y1⋆ , ✄i
y3
hx⋆ , ✁, y2⋆ i
x1
x2
hx⋆ , y1⋆ , ✄i
hx⋆ , ✁, y2⋆ i
y2
h⋄, ✁, y2⋆ i
x′2
hx⋆ , ✁, y2⋆ i
y3
h⋄, ✁, ✄i
h⋄, ✁, ✄i
img(φ1 )
img(φ2 )
img(φ1 ) = img(φ2 )
Figure 7: Case 1.b—Gv (Q) in Example 5.12
Figure 6: Case 1.a—Gv (Q) in Example 5.9
Example 5.12. Consider again the CQ Q and schema S
of Example 5.9, except that now we assume that the fds of
S are the following.
It is easy to verify that G∃ (Q) is connected (i.e., has just
one connected component), and that no atom φ satisfies
{y1 , y2 } ⊆ img(φ). Therefore, y1 and y2 are not simultaneously determined, which means that we are in Case 1.
Figure 6 shows the graph Gv (Q). Surrounded by a rectangle (with dotted edges) is the set img(φT ), where φT is the
atom T (x1 , x2 ). As can be seen in the figure, neither y1 nor
y2 appear in img(φT ), and img(φT ) separates y1 from y2 .
Therefore, y1 and y2 are fd-separable, which means that we
are in Case 1.a. As φ1 and φ2 we select the atom φT (actually, φT is the only possible choice). Finally, having chosen
all the parameters of the template construction, the box of
each variable z in Figure 6 contains the triple Θ(z).
R1 : {3, 2} → 1
R2 : {1, 2} → 3
S1 : 3 → 2
S2 : 1 → 2
Like in Example 5.9, y1 and y2 are not simultaneously determined, which means that we are in Case 1. Figure 7 shows
the graph Gv (Q). Unlike Example 5.9, now the set img(φT )
contains just x1 and x2 , and as can be seen in Figure 7 the
atom φT no longer fd-separates y1 from y2 . Actually, it is
easy to see that for each atom φ of Q it holds that img(φ)
either contains one of y1 and y2 or does not separate y1 from
y2 . We are therefore in Case 1.b.
The atoms φ with y1 ∈ img(φ) are φR1 and φS1 , namely,
R1 (x′1 , y3 , x1 ) and S1 (x′′1 , y1 , x′1 ), respectively. We have the
following:
Observe the following for the CQ Q over the schema S in
Example 5.9 (with Gv (Q) depicted in Figure 6). First, none
of the atoms are encounters. Second, Θ(y1 ) contains y1⋆ and
Θ(y2 ) contains y2⋆ . The following lemma shows that these
observations always hold in Case 1.a. This lemma is useful,
since it suffices for completing the reduction in Case 1.a.
fdSep(φR1 , y2 ) = {x′′1 , y1 , x′1 , x1 , y3 }
fdSep(φS1 , y2 ) = {x′′1 , y1 , x′1 }
Since |fdSep(φR1 , y2 )| > |fdSep(φS1 , y2 )|, we choose φ1 =
φR1 . We similarly choose φ2 = φR2 . In Figure 7, the sets
img(φ1 ) and img(φ2 ) are surrounded by polygons with dotted edges. Finally, with all parameters chosen, the box of
each variable z in Figure 6 contains the triple Θ(z).
Lemma 5.10. In Case 1.a, the following hold.
• There are no encounters.
• For each i ∈ {1, 2}, the tuple Θ(yi ) contains yi⋆ .
So, for the instance I of our reduction from the problem
FreeDPhS⋆ , Q⋆ i to FreeDPhS, Qi we simply choose I =
IΘ . Based on Lemma 5.10, we can now apply Lemma 5.7
with γ1 and γ2 being any atoms in P that contain y1 and
y2 , respectively.
Note that in Example 5.12, the atom φT (i.e., T (x1 , x2 ))
is an encounter, since Θ(x1 ) contains y1⋆ and Θ(x2 ) contains
y2⋆ . In particular, we cannot just set I = IΘ as in Case 1.a:
similarly to Example 5.6 it can be shown that there is always
a side-effect-free solution for IΘ and a, no matter what I ⋆
and a⋆ are. In the following section, we show how to deal
with encounters for both this case and Case 2, which we
discuss next.
Corollary 5.11. In Case 1a, there is a side-effect-free
solution for I ⋆ and a⋆ if and only if there is a side-effect-free
solution for I and a.
Case 2: Every two head variables of P are simultaneously determined.
Still within Case 1, the next case we consider is the complement of Case 1.a.
In Case 2, the parameters of the template construction
are chosen as follows. We first look at the sets Z ⊆ Varh (P )
that are simultaneously determined by an atom of P , and
among those we select a Z1 with a maximal cardinality. Note
that Z1 is a strict subset of P , due to our assumption that
P violates the tractability condition. Next, among those
sets Z that are not contained in Z1 , we again select a Z2
with a maximal cardinality. Note that neither Z1 ⊆ Z2 nor
Z2 ⊆ Z1 hold. So, as y1 we select an element of Z1 \ Z2 , and
as y2 we select an element of Z2 \ Z1 . Finally, we consider
the atoms φ ∈ P with img P
h (φ) = Z1 , and select φ1 to
be a φ with a maximal fdSep(φ, y2 ). Similarly, we consider
Case 1.b: y1 and y2 are not fd-separable.
In Case 1.b, we choose the atoms φ1 , φ2 ∈ P (used as
parameters of the template construction) as follows. The
atom φ1 is such that y1 ∈ img(φ1 ) (hence, y2 ∈
/ img(φ1 )
since we are in Case 1) and fdSep(φ1 , y2 ) has the maximal
number of elements among the atoms φ ∈ P that satisfy
y1 ∈ img(φ). (Actually, it is enough for fdSep(φ1 , y2 ) not
be contained in any fdSep(φ, y2 ) for those atoms φ.) The
atom φ2 is selected similarly, except that y1 and y2 swap
roles. This manner of selecting φ1 and φ2 is crucial for the
correctness of our reduction.
10
the atoms φ ∈ P with img P
h (φ) = Z2 , and select φ2 to be
a φ with a maximal fdSep(φ, y1 ). We now have all four
parameters needed for the template construction. Again,
this specific manner of selecting the parameters is needed
for the correctness of our reduction.
to I the fact νcφ (γ) (unless it is already in I). Of course, we
need to show that I is indeed an instance over S (i.e., the
fds are satisfied).
Example 5.13. An example of Case 2 is the CQ Q over
the schema S in our running example. Indeed, y1 and y2 are
simultaneously determined (by φT ), and so are y1 and y3 (by
φR1 ) as well as y2 and y3 (by φR2 ). In the parameter setting,
Z1 is {y1 , y3 }, Z2 is {y2 , y3 }, and the chosen parameters are
y1 , y2 , φR1 and φR2 . Recall that Figure 5 shows Gv (Q) along
with Θ(z) for each z ∈ Var(Q). Observe that T (x1 , x′ , x2 ) is
an encounter (since Θ(x′ ) contains both y1⋆ and y2⋆ ).
Example 5.15. We continue with our running example,
which falls in Case 2. There is a single encounter: φT =
T (x1 , x′ , x2 ). As shown in Figure 4, there are two matches in
Ma⋆ (Q⋆ , I ⋆ ): the one of the first row maps x⋆ to 0, and the
one of the fifth row maps x⋆ to 1; let us call these matches µ0
and µ1 , respectively. In Figure 5, the set img(φT ) is shaded
by a grey polytope. Since img(φT ) does not separate any
node from y3 , except for the nodes of img(φT ) itself, we have
ν0φ (z) = ♥φ0 T and ν1φ (z) = ♥φ1 T for each variable z outside
img(φT ), and for each z ∈ img(φT ) we have ν0φ (z) = Θµ0 (z)
and ν1φ (z) = Θµ1 (z). The facts that are added to construct
I are the two in the bottom of Figure 4.
Lemma 5.14. In Cases 1.b and 2, I satisfies each fd of S.
As shown in Example 5.13 (and previously in Example 5.6),
in Case 2 there may be encounters, and we deal with those
in the next section.
5.4 Handling Encounters
In the proof of correctness for the reduction, we prove the
following intermediate result. For each µ ∈ Ma⋆ (Q⋆ , I ⋆ )
with µ(x⋆ ) = c, the mapping νcφ maps φ to Θµ (φ), and
moreover, gives rise to a unique new answer b ∈ Q(I) (that
contains at least one occurrence of ♥φc ). See Figure 4 for
illustration. In turn, this intermediate result is used for
proving the following lemma.
Our approach to dealing with encounters (in Cases 1.b
and 2) is as follows. Suppose that φ is an encounter. Let
Ma⋆ (Q⋆ , I ⋆ ) be the set of all matches µ ∈ M(Q⋆ , I ⋆ ) with
µ(y1⋆ ) = ✁ and µ(y2⋆ ) = ✄ (that is, µ(y1⋆ , y2⋆ ) = a⋆ ). As
illustrated in Example 5.6, we can always obtain a sideeffect-free solution for IΘ and a by deleting from IΘ all the
facts of the form Θµ (φ) where µ ∈ Ma⋆ (Q⋆ , I ⋆ ). The idea
is to make each such Θµ (φ) necessary for a side-effect-free
solution. This is done by adding to IΘ facts so that in the
resulting instance I, there are new answers for which every
Θµ (φ) is needed.1 Now, if there is a side-effect-free solution
for I ⋆ and a⋆ then there is still a side-effect-free solution for I
and a. Moreover, if there is a side-effect-free solution J for I
and a, then J necessarily contains all the facts Θµ (φ) where
µ ∈ Ma⋆ (Q⋆ , I ⋆ ). We can then easily assume that JΘ = J ∩
IΘ contains all the facts of IΘ over the encounters. Finally,
by applying Lemma 5.7 we get a side-effect-free solution for
I ⋆ and a⋆ . Next, we give the formal details.
Let φ be an encounter, let µ be a match in Ma⋆ (Q⋆ , I ⋆ ),
and suppose that µ(x⋆ ) = c. We define the assignment νcφ
for Q, as follows. In Case 1.b:
(
Θµ (z) if z ∈ fdSep(φ, y1 ) ∩ fdSep(φ, y2 );
def
φ
νc (z) =
♥φc
otherwise.
And in Case 2:
(
Θµ (z)
def
φ
νc (z) =
♥φc
Lemma 5.16. In Cases 1.b and 2, the atom φi (i = 1, 2)
is a non-encounter, and Θ(φi ) contains yi⋆ . Moreover, the
following are equivalent.
1. There is a side-effect-free solution JΘ for IΘ and a,
such that RφJΘ = RφIΘ for all encounters φ.
2. There is a side-effect-free solution for I and a.
The correctness of the reduction (in Cases 1.b and 2) is
obtained by combining Lemmas 5.7 and 5.16.
Corollary 5.17. In Cases 1.b and 2, there is a sideeffect-free solution for I ⋆ and a⋆ if and only if there is a
side-effect-free solution for I and a.
5.5 Summary and Conclusion
To summarize this section, we fixed S and Q that violate the tractability condition, and showed a reduction from
FreeDPhS⋆ , Q⋆ i to FreeDPhS, Qi. Specifically, we took
the input I ⋆ and a⋆ for FreeDPhS⋆ , Q⋆ i and constructed
the input I and a for FreeDPhS, Qi. We started with
I = IΘ , which we built by means of the assignments Θµ ,
using the template assignment Θ; to deal with encounters,
we added tuples to I using the assignments νcφ . The correctness of the reduction has been stated in Corollaries 5.11
and 5.17. Combined with the observation that the construction of I and a takes polynomial time, we get the main result
of this section.
T
if z ∈ y∈Varh (p) fdSep(φ, y);
otherwise.
Here, ♥φc is a fresh
T new constant that is unique to φ and c.
Note that z ∈ y∈Varh (p) fdSep(φ, y) means that in Gv (Q),
the set img(φ) separates z from every head variable y of P .
As an example, consider Q and S from Example 5.12,
which fall in Case 1.b. There is a single encounter, namely
φT = T (x1 , x2 ). In Figure 7, the set img(φT ) is shaded by a
grey rectangle. Since every variable z ∈
/ {x1 , x2 } is reachable
from y1 (and y2 ) by a path that does not visit img(φT ), we
have νcφ (z) = ♥φc for each z ∈
/ {x1 , x2 }, while νcφ (z) = Θµ (z)
for each z ∈ {x1 , x2 }. Later, Example 5.15 discusses Case 2.
Finally, to construct I we start with I = IΘ , and for each
encounter φ, assignment νcφ and atom γ ∈ atoms(Q), we add
Theorem 5.18. If Q+ does not have functional head domination, then FreeDPhS, Qi is NP-complete.
6. MAIN RESULT
In this section, we state the dichotomy established in
the previous two sections. Moreover, we strengthen the dichotomy by considering (hardness of) approximation.
1
An idea in this spirit was used for proving the dichotomy theorem of [13], but naturally, details greatly differ due to the fds.
11
6.1 Dichotomy
sfj-CQs, then there are no tractable cases except for those
where an algorithm as straightforward as the unirelation algorithm already handles. Many natural and practically motivated directions are left for future work. Among them are
exploring CQs with self joins (with or without fds), utilizing fds to generalize and improve upon approximation algorithms [13], handling additional constraints (like general
equality-generating dependences [2] and foreign keys) and
investigating deletion of multiple tuples (a.k.a. group deletion [5]).
As a conclusion of Theorems 3.12 and 5.18, we get the
following dichotomy theorem, which states that for an sjfCQ Q over a schema S the complexity of MaxDPhS, Qi is
precisely determined by the tractability condition (i.e., Q+
has functional head domination): when held, the problem is
solvable in polynomial time by the (very simple) unirelation
algorithm; when violated, the problem is NP-hard, since
FreeDPhS, Qi is NP-complete.
Theorem 6.1. Let Q be an sjf-CQ over a schema S.
Acknowledgments
• If Q+ has functional head domination, then UniRelhS, Qi
solves MaxDPhS, Qi (in polynomial-time).
The author is grateful to Jan Vondrák for helpful and insightful discussions, and to Phokion G. Kolaitis for being a
source of education and inspiration for this work.
• If Q+ does not have functional head domination, then
FreeDPhS, Qi is NP-complete.
8. REFERENCES
6.2 Hardness of Approximation
[1] F. Bancilhon and N. Spyratos. Update semantics of relational
views. ACM Trans. Database Syst., 6(4):557–575, 1981.
[2] C. Beeri and M. Y. Vardi. A proof procedure for data
dependencies. J. ACM, 31(4):718–741, 1984.
[3] P. Buneman, S. Khanna, and W. C. Tan. On propagation of
deletions and annotations through views. In PODS, pages
150–158, 2002.
[4] D. Burdick, M. A. Hernández, H. Ho, G. Koutrika,
R. Krishnamurthy, L. Popa, I. Stanoi, S. Vaithyanathan, and
S. R. Das. Extracting, linking and integrating data from public
sources: A financial case study. IEEE Data Eng. Bull.,
34(3):60–67, 2011.
[5] G. Cong, W. Fan, and F. Geerts. Annotation propagation
revisited for key preserving views. In CIKM, pages 632–641,
2006.
[6] S. S. Cosmadakis and C. H. Papadimitriou. Updates of
relational views. J. ACM, 31(4):742–760, 1984.
[7] Y. Cui and J. Widom. Run-time translation of view tuple
deletions using data lineage. Technical report, Stanford
University, 2001. http://dbpubs.stanford.edu:8090/pub/2001-24.
[8] N. N. Dalvi, K. Schnaitter, and D. Suciu. Computing query
probability with incidence algebras. In PODS, pages 203–214,
2010.
[9] N. N. Dalvi and D. Suciu. Efficient query evaluation on
probabilistic databases. VLDB J., 16(4):523–544, 2007.
[10] U. Dayal and P. A. Bernstein. On the correct translation of
update operations on relational views. ACM Trans. Database
Syst., 7(3):381–416, 1982.
[11] R. Fagin, J. D. Ullman, and M. Y. Vardi. On the semantics of
updates in databases. In PODS, pages 352–365. ACM, 1983.
[12] A. M. Keller. Algorithms for translating view updates to
database updates for views involving selections, projections,
and joins. In PODS, pages 154–163. ACM, 1985.
[13] B. Kimelfeld, J. Vondrák, and R. Williams. Maximizing
conjunctive views in deletion propagation. In PODS, pages
187–198, 2011.
[14] P. G. Kolaitis and E. Pema. A dichotomy in the complexity of
consistent query answering for queries with two atoms. In press,
2011.
[15] J. Lechtenbörger and G. Vossen. On the computation of
relational view complements. ACM Trans. Database Syst.,
28(2):175–208, 2003.
[16] B. Liu, L. Chiticariu, V. Chu, H. V. Jagadish, and F. Reiss.
Automatic rule refinement for information extraction. PVLDB,
3(1):588–597, 2010.
[17] D. Maslowski and J. Wijsen. On counting database repairs. In
LID, pages 15–22, 2011.
[18] A. Meliou, W. Gatterbauer, J. Y. Halpern, C. Koch, K. F.
Moore, and D. Suciu. Causality in databases. IEEE Data Eng.
Bull., 33(3):59–67, 2010.
[19] A. Meliou, W. Gatterbauer, K. F. Moore, and D. Suciu. The
complexity of causality and responsibility for query answers
and non-answers. PVLDB, 4(1):34–45, 2010.
[20] E. Riloff and R. Jones. Learning dictionaries for information
extraction by multi-level bootstrapping. In AAAI/IAAI, pages
474–479, 1999.
For a number α ∈ [0, 1], a solution J is α-optimal if
|Q(J)| ≥ α · |Q(K)| for all solutions K. Kimelfeld et al. [13]
showed that for an sjf-CQ Q over a schema S without fds,
when MaxDPhS, Qi is NP-hard, it cannot be approximated
better than some constant αQ ∈ (0, 1) unless P = NP. For
that, they showed a PTAS reduction from MaxDPhS⋆ , Q⋆ i
to MaxDPhS, Qi (that is, the reduction is such that for each
α there is αQ , such that an αQ -approximation for the generated instance of MaxDPhS, Qi gives an α-approximation
for the original instance of MaxDPhS⋆ , Q⋆ i), rather than an
ordinary reduction from FreeDPhS⋆ , Q⋆ i to FreeDPhS, Qi
as given here. Nevertheless, using ideas very similar to those
in the construction of Kimelfeld et al., we can adjust the reduction of Section 5 to a PTAS reduction.2 As a result,
we get the following addition to the hardness part of Theorem 6.1.
Theorem 6.2. Let Q be an sjf-CQ over a schema S. If
Q+ does not have functional head domination, then there is
a constant αQ such that it is NP-hard to αQ -approximate
MaxDPhS, Qi.
Combining Theorems 6.1 and 6.2, we conclude the following about MaxDPhS, Qi for an sjf-CQ Q over a schema S.
Either the problem is straightforward, or our ability to even
approximate it is limited.
The dependency of the constant αQ on Q is already known
to be unavoidable [13]: for each constant α ∈ (0, 1) there
is an sjf-CQ Qα over a schema S without fds, such that
Qα does not have (functional) head domination, and yet,
MaxDPhS, Qi is α-approximable in polynomial time (and
by the unirelation algorithm). In other words, there is no
global α that can replace all the αQ in Theorem 6.2.
7.
CONCLUSIONS
The dichotomy theorem proved in this paper shows how
exactly functional dependencies extend the class of sjf-CQs
where maximum deletion propagation is in polynomial time.
On the negative side, it shows that if optimal solutions (or
arbitrarily good approximations, e.g., PTAS) are sought for
2
We gave a reduction from FreeDPhS⋆ , Q⋆ i to FreeDPhS, Qi,
rather than a PTAS one from MaxDPhS⋆ , Q⋆ i to MaxDPhS, Qi,
since it greatly simplifies the presentation, yet still, entails essentially all of the novel ideas.
12
APPENDIX
A.
A.1
1. Var(φ) ⊆ fdSep(Varh (P ), zP ).
2. If zP ∈ img(φ) then zP ∈ img(Varh (P )).
3. If zP ∈ img(φ) and Q = Q+ , then zP ∈ Varh (P ).
Proof. We first prove Part 1. Let zφ be a variable in
Var(φ). We need to show that Varh (P ) fd-separates zP
from zφ . Suppose, by way of contradiction, that Gv (Q)
has a path p between zφ and zP , such that none of the
nodes in p are in Varh (P ). Suppose that p is the path
zP = z1 — z2 — . . . — zn = zφ . We will prove, by induction on i, that each zi is in Var∃ (P ). This is enough,
since zφ ∈ Var∃ (P ) implies that φ ∈ P , in contradiction to
the assumption of the proposition.
For the basis, i = 1, the claim zi ∈ Var∃ (P ) is obvious
from our assumption that z1 = zP ∈ Var(P ) and p has no
variables in Varh (P ). For the inductive step, if zi ∈ Var∃ (P )
then zi ∈ Var∃ (φ′ ) for some atom φ′ in P . The fact that
Gv (Q) contains an edge between zi and zi+1 means that
some atom φ′′ contains both zi and zi+1 . But then, φ′ and
φ′′ share the existential variable zi , and therefore, φ′′ is also
in P . It thus follows that zi+1 ∈ Var(P ), and since p has
no nodes in Varh (P ), we conclude that zi+1 ∈ Var∃ (P ), as
required.
We now prove Part 2. Suppose that zP ∈ img(φ) (which is
the same as zP ∈ img(Var(φ))). From Part 1 of the lemma
we know that Var(φ) ⊆ fdSep(Varh (P ), zP ). So now, we
can use Part 3 of Theorem 4.2 with zP , Var(φ) and Varh (P )
playing the roles of z, Z ′ and Z, respectively. That gives us
zP ∈ img(Varh (P )), as claimed.
Part 3 follows immediately from Part 2, since Q = Q+
and zP ∈ img(Varh (P )) implies that zP is a head variable
(hence, in Var(P ) ∩ Varh (Q) = Varh (P )).
PROOFS FOR SECTIONS 3 AND 4
Proof of Theorem 4.2
Theorem 4.2.
Let Q be a CQ over a schema S, let
Z ⊆ Var(Q) be a set of variables, and let z ∈ Var(Q) be a
variable. The following hold.
1. fdSep(Z, z) is closed under fds; that is:
img(fdSep(Z, z)) = fdSep(Z, z)
2. fdSep(Z, z) is closed under fd-separation by subsets; that
is, for all Z ′ ⊆ Var(Q) we have:
Z ′ ⊆ fdSep(Z, z) ⇒ fdSep(Z ′ , z) ⊆ fdSep(Z, z)
3. Z determines z whenever Z separates from z a set that
determines z; that is, for all Z ′ ⊆ Var(Q) we have:
`
´
z ∈ img(Z ′ ) ∧ Z ′ ⊆ fdSep(Z, z) ⇒ z ∈ img(Z)
Proof. We begin with Part 1. We need to show that
img(fdSep(Z, z)) ⊆ fdSep(Z, z). By way of contradiction,
suppose otherwise. Let I be the set of facts that consists
of the union of two sets I1 and I2 of facts: I1 is obtained
from atoms(Q) by replacing each variable with a constant
c1 , and I2 is obtained similarly to I1 , except that each variable not in fdSep(Z, z) is replaced with a different constant
c2 6= c1 . So there are (at least) two matches for Q in I,
where these matches agree exactly on fdSep(Z, z). Since
img(fdSep(Z, z)) 6⊆ fdSep(Z, z), there is a violation of an fd,
say R : A → i. From the construction of I it follows that
there is an atom φ over R, such that the set ZA of variables
of φ in the positions of A is contained in fdSep(Z, z), and the
variable zi of φ in the ith position is not in fdSep(Z, z). Now,
if ZA ⊆ img(Z) then R : A → i implies that zi ∈ img(Z),
and then zi ∈ fdSep(Z, z), which contradicts our assumption. Therefore, at least one variable in ZA , say zA , is not
in img(Z). But then, by definition there is a path in Gv (Q)
from z to zi , such that the path does not visit any node in
img(Z); by adding to this path the edge {zi , zA } (and this
edge exists since zi and zA co-occur in φ), we get a path in
Gv (Q) from z to zA , such that the path does not visit any
node in img(Z). This contradicts the assumption that zA is
fd-separated from z.
We now prove Part 2. Let Z ′ ⊆ fdSep(Z, z) and let z ′ be a
node in fdSep(Z ′ , z). We need to show that z ′ ∈ fdSep(Z, z).
Let p be a path in Gv (Q) between z and z ′ . We have to prove
that p contains a node in img(Z). Since z ′ ∈ fdSep(Z ′ , z),
we have that p contains some node z ′′ ∈ img(Z ′ ). From
Part 1 we know that img(Z ′ ) ⊆ fdSep(Z, z). Therefore,
z ′′ ∈ fdSep(Z, z), which means that every path connecting z
and z ′′ contains a node in img(Z). In particular, p contains a
node in img(Z) (since p contains both z and z ′′ ), as required.
Finally, we prove Part 3. So suppose that Z ′ is such that
z ∈ img(Z ′ ) and Z ′ ⊆ fdSep(Z, z). From Part 1 we know
that z ∈ fdSep(Z, z). In particular, the one-node path that
consists of only z visits a node in img(Z), or in other words,
z ∈ img(Z), as claimed.
A.2
Proposition 3.11. Let Q be a CQ over a schema S.
1. If Q has head domination, then Q has functional head
domination.
2. If Q has functional head domination, then Q+ has functional head domination.
Furthermore, none of the two reverse implications hold.
Proof. Item 1 is straightforward. The fact that none of
the two reverse implications hold is shown in Examples 3.2
and 3.6, respectively. It remains to prove item 2, which is
done next.
Suppose that Q has functional head domination. We need
to show that Q+ has functional head domination. Let PQ+
be a connected component of G∃ (Q+ ). An easy observation is that every edge of G∃ (Q+ ) is also an edge of G∃ (Q).
In particular, PQ+ is contained in some connected component PQ of G∃ (Q). Let φQ be an atom of Q, such that
Varh (PQ ) ⊆ img(φQ ). We complete the proof by showing
that Varh (PQ+ ) ⊆ img(φQ ) as well.
Let z be a head variable of PQ+ . We need to show that
z ∈ img(φQ ). If z is a head variable of Q, then z is a head
variable of PQ , and therefore z ∈ img(φQ ). So suppose that
z is an existential variable of Q. Then z is in PQ (since z
is in PQ+ ) and z ∈ img(Varh (Q)) (by the definition of Q+ ).
From Part 1 of Proposition A.1 it follows that Varh (PQ ) fdseparates z from every head variable z ′ ∈ Varh (Q)\Varh (PQ ).
Of course, Varh (PQ ) fd-separates z from every head variable
in Varh (PQ ). We conclude that Varh (Q) ⊆ fdSep(Varh (PQ ), z).
From Part 3 of Theorem 4.2, with img(φQ ) and Varh (Q)
playing the roles of Z and Z ′ , respectively, we conclude that
z ∈ img(Varh (PQ )). Finally, from the fact that Varh (PQ ) ⊆
img(φQ ) we get that z ∈ img(φQ ), as claimed.
Proofs for Section 3
Proposition A.1. Let Q be a CQ over a schema S, and
let P be a connected component of G∃ (Q). Let φ be an atom
in atoms(Q) \ P , and let zP be a variable in Var(P ).
13
Theorem 3.12.
Let Q be an sjf-CQ over S. If Q+
has functional head domination, then UniRelhS, Qi optimally
solves MaxDPhS, Qi (in polynomial time).
Let µ be a match in M(Q⋆ , I ⋆ ). If µ(x⋆ ) = c and µ(yi⋆ ) =
bi for i = 1, 2, then we may write Θcb1 ,b2 instead of Θµ .
Observe that whenever y is a head variable, the first element w of Θ(y) is ⋄; hence, Θ(y) (where y is the sequence
of head variables of Q) does not mention x⋆ . Therefore,
if µ1 , µ2 ∈ M(Q, IΘ ) are such that µ1 (yi⋆ ) = µ2 (yi⋆ ) for
i = 1, 2, then Θµ1 (y) = Θµ2 (y). If µ ∈ M(Q, IΘ ) is such
that µ(y1⋆ ) = b1 and µ(y2⋆ ) = b2 , then Θµ (y) is also denoted
by Θb1 ,b2 (y).
Finally, if z is a variable such that Θ(z) does not include y1⋆ , then we Θcb1 ,b2 (z) = Θcb′ ,b2 (z) for all b1 and b′1 .
1
Therefore, in that case we may write just Θcb2 (z) instead of
c
Θb1 ,b2 (z). Similarly, if z does not include y2⋆ , then we may
write Θcb1 (z) instead of Θcb1 ,b2 (z). We use the same convention for an atom γ such that Θ(γ) does not mention y1⋆ or y2⋆ ;
that is, if γ does not include y1⋆ , then we may write Θcb2 (γ)
instead of Θcb1 ,b2 (γ), and if γ does not include y2⋆ , then we
may write Θcb1 (γ) instead of Θcb1 ,b2 (γ).
Proof. Suppose that Q+ has functional head domination. Using Lemma 3.8, it is enough to show that the algorithm UniRelhS, Q+ i (that actually acts exactly the same
as UniRelhS, Qi) optimally solves MaxDPhS, Q+ i. Suppose
that P1 , . . . , Pk are the connected components of G∃ (Q+ ).
We assume that Varh (Pi ) is the set Varh (Q+ ) ∩ Var(Pi ) (that
is, the context is Q+ rather than Q). For each Pi (i =
1, . . . , k), let φi ∈ atoms(Q+ ) be such that img(φi ) contains
Varh (Pi ). Those φi exist, since we assume that Q+ has functional head domination.
Consider the input I and a for MaxDPhS, Q+ i, and let
Jopt be an optimal solution. We will prove that there is
some j ∈ {1, . . . , k}, such that none of the (Q+ , a)-useful
facts over Rφj is Q+ -useful in Jopt . This implies that the intermediate solution Jφj constructed by the unirelation algorithm is such that Q+ (Jopt ) ⊆ Q+ (Jφj ), thus |Q+ (Jopt )| ≤
|Q+ (Jφj )|. In particular, the solution J returned by the
unirelation algorithm satisfies |Q+ (Jopt )| ≤ |Q+ (J)|.
Suppose, by way of contradiction, that for all i ∈ {1, . . . , k}
there is a fact fi in Rφi that is (Q+ , a)-useful in I and is
also Q+ -useful in Jopt . We will show that a ∈ Q+ (Jopt ),
which contradicts the fact that Jopt is a solution. For each
i ∈ {1, . . . , k}, define two assignments µai and µi as follows.
B.1.2 Proofs
Lemma 5.4. IΘ satisfies every fd of ∆.
Proof. It is enough to show that every fd R : A → i of S
(having a single index i on the right-hand side) is satisfied.
So let R : A → i be an fd of S, and let f and f ′ be two facts
over R, such that agree on all the attributes in A. We need
to show that f and f ′ agree on the ith position as well.
Consider the atom φR (over R), and let ZA be the set
of variables of φR in the positions of A. We use Θ(ZA )
denote the set {Θ(z) | z ∈ ZA }. Let zi be the variable of
φR in the ith position. Observe that R : A → i implies that
zi ∈ img(ZA ). Now, let hw, w1 , w2 i and hw′ , w1′ , w2′ i be the
values in the ith positions of f and f ′ , respectively. We need
to prove that w = w′ and wi = wi′ (i = 1, 2), and we do so
as follows.
To prove that w = w′ , suppose, by way of contradiction,
that w 6= w′ . Then Θ(zi ) contains x⋆ (in the leftmost coordinate). In particular, from the definition of Θ it follows
that φR is in P , and that zi is an existential variable. Now,
since f and f ′ agree on A, then none of the triples in Θ(ZA )
contain x⋆ , which means that each variable in ZA is a head
variable (as φR ∈ P ). But then, this contradicts our assumption that Q = Q+ .
For proving that w1 = w1′ , suppose, by way of contradiction, that w1 6= w1′ . Then Θ(zi ) contains y1⋆ (hence
zi ∈
/ fdSep(φ2 , y1 )) and Θ(z) does not contain y1⋆ for each
z ∈ ZA (hence ZA ⊆ fdSep(φ2 , y1 )). But then, we get a
contradiction to Part 1 of Theorem 4.2 with Var(φ2 ) and y1
playing the roles of Z and z, respectively.
The proof that w2 = w2′ is symmetrical to that for w1 =
′
w1 . We conclude that f = f ′ , as claimed.
• µai ∈ M(Q+ , I), µai (φi ) = fi , and µai (y) = a. Note
that such µai exists, since fi is (Q+ , a)-useful in I.
• µi ∈ M(Q+ , Jopt ) and µi (φi ) = fi . Note that such µi
exists, since fi is Q+ -useful in Jopt .
Let i and j be two different integers in {1, . . . , k}. We first
show that µi and µj agree on the variables in Var(Pi ) ∩
Var(Pj ). So let z ∈ Var(Pi ) ∩ Var(Pj ) be given. Then z is
necessarily in Varh (Q+ ), since Pi and Pj do not share existential variables of Q+ . Therefore, z ∈ Varh (Pi ) ∩ Varh (Pj ),
and hence, in both img(φi ) and img(φj ). We need to show
that µi (z) = µj (z). So, we have the following:
• µi (z) = µai (z) since µi and µai agree on Var(φi );
• µai (z) = µaj (z) since µai and µaj agree on Varh (Q+ );
• µaj (z) = µj (z) since µj and µaj agree on Var(φj ).
So, we build an assignment µ for Q+ , as follows. For z ∈
Var(Q+ ), we set µ(z) = µi (z) if i is such that z ∈ Var(Pi ).
Note that µ is a match for Q+ in Jopt , since each µi is a
match for Q+ in Jopt .
We complete the proof by showing that µ(y) = a. Let y
be a head variable of Q+ , and suppose that y ∈ Varh (Pi ).
Then µ(y) = µi (y) by the construction of µ, and µi (y) =
µai (y) since y ∈ img(φ) and the matches µi and µai agree
on Var(φi ). Hence, µ(y) = µai (y). We conclude that for all
y ∈ Varh (Q+ ) there exists i ∈ {1, . . . , k}, such that µ(y) =
µai (y). Since each µai is such that µai (y) = a, we get that
µ(y) = a, as claimed.
B.
B.1
Before we prove Lemma 5.5, we need the following lemma.
Lemma B.1. Let p be a path in Gv (Q), and let j be one
of 1, 2 or 3. Suppose that the jth element of Θ is the same
on all the nodes of p. Then for all ν ∈ M(Q, IΘ ), the jth
element of ν is the same on all the nodes of p.
PROOFS FOR SECTION 5
Proofs for Section 5.2
Proof. It is enough to show the claim for each edge of p.
Let {z, z ′ } be an edge of p, and suppose that the jth element
of Θ(z) is the same as the jth element of Θ(z ′ ). Since {z, z ′ }
is an edge, there is an atom φ that contains both z and z ′ .
And since ν is a match, we get that ν(φ) is a fact of IΘ .
B.1.1 Notation
We first give some notation that will be used throughout
this section.
14
ν(y1 ) = Θb1 ,b2 (y1 ). On the other hand, ν(y1 ) has b′1 as
its second element since ν(y1 ) = Θcb′ ,b′ (y1 ). Therefore, we
1 2
get that b1 = b′1 . We similarly prove that b2 = b′2 . This
concludes the proof for the “only if” direction.
We now prove the “if” direction. Suppose that both Θcb1 (γ1 )
and Θcb2 (γ2 ) are Q-useful in JΘ . From the construction of IΘ
we conclude that I ⋆ contains both R1 (c, b1 ) and R2 (c, b2 ).
Therefore, Θcb1 ,b2 is a match for Q in IΘ . We will show that
Θcb1 ,b2 is a match for Q in JΘ as well. Since Θcb1 ,b2 (y) =
Θb1 ,b2 (y), this would imply the required Θb1 ,b2 (y) ∈ Q(JΘ ).
To prove that Θcb1 ,b2 ∈ M(Q, JΘ ), we need to show that
c
Θb1 ,b2 (φ) is in JΘ for all φ ∈ atoms(Q). So, let φ ∈ atoms(Q)
be given. If φ is an encounter, then Θcb1 ,b2 (φ) is in JΘ by
the assumption of the lemma. So, suppose that Θ(φ) does
not contain y2⋆ . (The case where Θ(φ) does not contain y1⋆
is handled symmetrically.) Let ν1 be a match for Q in JΘ ,
such that ν1 (γ1 ) = Θcb1 (γ1 ). Note that such ν1 exists, since
Θcb1 (γ1 ) is Q-useful in JΘ . From Lemma 5.5 we conclude
′
that ν1 = Θcb′ ,b′ for some c′ , b′1 and b′2 . Now, recall that
1 2
Θ(γ1 ) contains y1⋆ by assumption, and that Θ(γ1 ) contains
x⋆ since γ1 is in P (and hence, has variables in Var∃ (P )).
′
Therefore, Θcb1 (γ1 ) = Θcb′ ,b′ (γ1 ) implies that c = c′ and
1 2
b1 = b′1 . Since Θcb1 ,b2 (φ) does not contain y2⋆ , the definition
of Θcb1 ,b2 implies that Θcb1 ,b2 (φ) is equal to Θcb1 ,b′ (φ), which
2
is ν1 (φ). And since ν1 ∈ M(Q, J ⋆ ), we conclude that ν1 (φ),
c
and hence Θb1 ,b2 (φ), is in JΘ , as claimed.
The construction of IΘ is such that every fact is obtained
by replacing elements of Θ with constants; in particular, in
the creation of the fact ν(φ) it replaces the jth element of
z and z ′ with the same constant. We conclude that the jth
element of ν(z) is the same as the jth element of ν(z ′ ), as
required.
Lemma 5.5. M(Q, IΘ ) = {Θµ | µ ∈ M(Q⋆ , I ⋆ )}.
Proof. Let ν be a match for Q in IΘ . We need to show
that ν = Θµ for some µ ∈ M(Q⋆ , I ⋆ ). Let ρ1 and ρ2 be
atoms of P that contain y1 and y2 , respectively. Note that
such ρ1 and ρ2 exist since we assume that both parameters
y1 and y2 (as well as the parameters φ1 and φ2 ) occur in P .
Recall from the definition of Θ that for each variable
z ∈ Var∃ (P ), the first element in the triple Θ(z) is x⋆ .
Since P is a connected component of G∃ (Q), the subgraph
of Gv (Q) that is induced by Var∃ (P ) is connected. Hence,
from Lemma B.1 it follows ν(x) has the same first element
on all the existential variables x of P . Clearly, the atoms ρ1
and ρ2 contain existential variables, since they are both in
P . So, let c be the first element of ν(x) on all the existential
variables x of P . Let µ be the mapping that maps x⋆ to c,
maps y1⋆ to the second element of ν(y1 ), and maps y2⋆ to the
third element of ν(y2 ). The template construction implies
that µ is necessarily in M(Q⋆ , I ⋆ ). We complete the proof
by showing that ν = Θµ .
So, let z ∈ Var(Q) be a variable. We need to show that
ν(z) = Θµ (z). Let us denote ν(z) = hd, d1 , d2 i and Θµ (z) =
he, e1 , e2 i. So, we will show that d = e and ei = di (i = 1, 2).
Suppose that Θ(z) = hw, w1 , w2 i.
We first prove that d = e. If z ∈
/ Var∃ (P ), then w = ⋄,
and then both d and w are ⋄. Otherwise w = x⋆ and e = c.
Then, d = c follows from the way c is selected above.
We now prove that d1 = e1 . If w1 = ✁, then d1 =
✁ = e1 . Otherwise, z ∈
/ fdSep(φ2 , y1 ), which means that
there is a path p in Gv (Q) that connects z to y1 and does
not visit img(φ2 ). But then, none of the nodes of p are in
fdSep(φ2 , y1 ), and hence, each of these nodes has y1⋆ as its
second element. Lemma B.1 then implies that ν as the same
second element on all the variables of p. In particular, d1
is the same as the second element of ν(y1 ), which is µ(y1⋆ ),
which is e1 . The proof of d2 = e2 is similar.
Lemma 5.7. Suppose that P has two non-encounters γ1
and γ2 , such that Θ(γ1 ) contains y1⋆ and Θ(γ2 ) contains y2⋆ .
Then, the following are equivalent.
1. There is a side-effect-free solution for I ⋆ and a⋆ .
2. There is a side-effect-free solution JΘ for IΘ and a,
such that RφJΘ = RφIΘ for all encounters φ.
Proof. This lemma is weaker than the next lemma, which
we need for later use.
Lemma B.3. Suppose that P has two non-encounters γ1
and γ2 , such that Θ(γ1 ) contains y1⋆ and Θ(γ2 ) contains y2⋆ .
Then, the following are equivalent.
1. There is a side-effect-free solution for I ⋆ and a⋆ .
2. There is a side-effect-free solution JΘ for IΘ and a,
such that RφJΘ = RφIΘ for all φ ∈ atoms(Q) \ {γ1 , γ2 }.
3. There is a side-effect-free solution JΘ for IΘ and a,
such that RφJΘ = RφIΘ for all encounters φ.
Lemma B.2. Let JΘ be a subinstance of IΘ with the property that RφJΘ = RφIΘ for all encounters φ. Suppose that
γ1 , γ2 ∈ P are non-encounters such that Θ(γ1 ) contains y1⋆
and Θ(γ2 ) contains y2⋆ . For all Θb1 ,b2 (y) ∈ Q(IΘ ) we have
Θb1 ,b2 (y) ∈ Q(JΘ ) if and only if there is a constant c, such
that some Θcb1 (γ1 ) and Θcb2 (γ2 ) are both Q-useful in JΘ .
Proof. To prove the lemma, we will use the following
observation. Let (c, b1 , b2 ) and (c′ , b′1 , b′2 ) be triples of constants. Then (c, b1 ) = (c′ , b′1 ) if and only if Θcb1 (γ1 ) =
′
Θcb′ (γ1 ); this is true because Θ(γ1 ) contains both x⋆ (since
1
γ1 ∈ P ) and y1⋆ (by assumption). Similarly, (c, b2 ) = (c′ , b′2 )
′
if and only if Θcb2 (γ2 ) = Θcb′ (γ2 ). We now proceed to prove
2
the lemma.
1 ⇒ 2. Suppose that J ⋆ is a side-effect-free solution for
⋆
I and a⋆ . We will construct a solution JΘ for IΘ and a,
and show that JΘ is side-effect free. The solution J ⋆ is
constructed as follows. Start with JΘ = IΘ . Then:
• For all R1⋆ (c, b1 ) ∈ I ⋆ \ J ⋆ , remove Θcb1 (γ1 ) from JΘ .
• For all R2⋆ (c, b2 ) ∈ I ⋆ \ J ⋆ , remove Θcb2 (γ2 ) from JΘ .
To see that JΘ is a solution, and furthermore side-effect
free, we will show that (b1 , b2 ) ∈ Q⋆ (J ⋆ ) if and only if
Θb1 ,b2 (y) ∈ Q(JΘ ). For the “only if” direction, suppose that
Proof. We first prove the “only if” direction. Suppose
that Θb1 ,b2 (y) ∈ Q(JΘ ), and consider a match ν for Q in
JΘ , such that ν(y) = Θb1 ,b2 (y). We will show a constant c
and two facts Θcb1 (γ1 ) and Θcb2 (γ2 ) that are both Q-useful in
JΘ . We apply Lemma 5.5, and assume that ν = Θcb′ ,b′ for
1 2
some constants c, b′1 and b′2 . We will show that b′1 = b1 and
′
c
b2 = b2 . This would imply that Θb1 ,b2 is a match for Q in
JΘ , which (by definition) would give us the desired Θcb1 (γ1 )
and Θcb2 (γ2 ) (i.e., such that are Q-useful in JΘ ).
As y1⋆ occurs in the image of Θ, we conclude that φ2 does
not fd-separate y1 from itself, which is equivalent to saying
that y1 ∈
/ img(φ2 ). In particular, Θ(y1 ) has y1⋆ as its second
element. Therefore, ν(y1 ) has b1 as its second element since
15
B.3
(b1 , b2 ) ∈ Q⋆ (J ⋆ ). So there is a constant c, such that both
R1⋆ (c, b1 ) and R2⋆ (c, b2 ) are in J ⋆ . From the observation in
the beginning of the lemma we conclude that we have not
removed from JΘ any fact in the image Θcb1 ,b2 . Therefore,
Θb1 ,b2 (y) = Θcb1 ,b2 (y) ∈ Q(JΘ ). For the “if” direction, suppose that Θb1 ,b2 (y) ∈ Q(JΘ ). From Lemma B.2 we get that
there is a constant c, such that both Θcb1 (γ1 ) and Θcb2 (γ2 )
are Q-useful in JΘ . In particular, JΘ contains both Θcb1 (γ1 )
and Θcb2 (γ2 ), which implies that both R1⋆ (c, b1 ) and R2⋆ (c, b2 )
are in J ⋆ . Therefore, (b1 , b2 ) ∈ Q⋆ (JΘ ).
2 ⇒ 3. This direction is straightforward, since we assume
that neither γ1 nor γ2 are encounters.
3 ⇒ 1. We will construct a solution J ⋆ as follows. Start
with J ⋆ = I ⋆ . Then:
• For each fact R1⋆ (c, b1 ) ∈ I ⋆ , if Θcb1 (γ1 ) is not Q-useful
in JΘ , then remove R1⋆ (c, b1 ) from J ⋆ .
• For each fact R2⋆ (c, b2 ) ∈ I ⋆ , if Θcb2 (γ2 ) is not Q-useful
in JΘ , then remove R2⋆ (c, b2 ) from J ⋆ .
To see that J ⋆ is a solution, and furthermore side-effect free,
we will show that Θb1 ,b2 (y) ∈ Q(JΘ ) if and only if (b1 , b2 ) ∈
Q⋆ (J ⋆ ). For the “only if” direction, suppose that Θb1 ,b2 (y) ∈
Q(JΘ ). From Lemma B.2 we have that there is a constant
c, such that both Θcb1 (γ1 ) and Θcb2 (γ2 ) are Q-useful in JΘ .
Therefore, we removed none of R1⋆ (c, b1 ) and R2⋆ (c, b2 ) to
construct J ⋆ , which implies that (b1 , b2 ) ∈ Q⋆ (J ⋆ ). For
the “if” direction, suppose that (b1 , b2 ) ∈ Q⋆ (J ⋆ ). Then for
some c we have that J ⋆ contains both R1⋆ (c, b1 ) and R2⋆ (c, b2 ).
Observe that Θcb1 ,b2 is a match for Q in IΘ , and from the
construction we conclude that both Θcb1 (γ1 ) and Θcb2 (γ2 )
remain Q-useful in JΘ (otherwise J ⋆ would miss at least
one of the two facts). Finally, from Lemma B.2 get that
Θb1 ,b2 ∈ Q(J ⋆ ), as claimed.
B.2
Proofs for Section 5.4
B.3.1 Notation and Assumption
In the proofs of this section, we use the following terminology. Let φ be an encounter, and let µ be a match in
Ma⋆ (Q⋆ , I ⋆ ) with µ(x⋆ ) = c. We call νcφ an encounter assignment. A variable z ∈ Var(Q) where νcφ (z) = ♥φc is called
a heart variable of νcφ .
We denote by Ma (Q, I) be the set the matches ν ∈ M(Q, I),
such that ν(y) = a.
Throughout this section, unless specified otherwise we assume that we are in either Case 1.b or Case 2.
B.3.2 Proofs
Lemma 5.14.
In Cases 1.b and 2, I satisfies each fd of
S.
Proof. We first prove the lemma for Case 1.b. Suppose, by way of contradiction, that I does not satisfy some
fd R : A → i. Lemma 5.4 states that IΘ satisfies ∆. In
particular, IΘ satisfies R : A → i. So, the violation of
R : A → i occurs during the consideration of the encounters
(Section 5.4). Suppose that the first violation of R : A → i
is when adding to I the fact f = νcφ (φR ), where φ is an
encounter and µ ∈ Ma⋆ (Q⋆ , I ⋆ ) is such that µ(x⋆ ) = c.
(Recall that φR is Q’s atom over R.)
Since the addition of f causes the (first) violation of R :
A → i, by the time f is added I already contains a fact
f ′ 6= f over R, such that f and f ′ agree on the positions of
A by not on the ith position. Note that φR cannot have a
constant in the ith position, or otherwise the construction
of I would not allow for disagreement on the ith position of
R. Let ZA be the set of variables of φR in the positions of
A, and let zi be the variable of φR in the ith position. Note
that the fd R : A → i implies that zi ∈ img(ZA ).
Note that f is the only tuple of R that may contain the
constant ♥φc . Since f and f ′ agree on ZA and f ′ does not
contain ♥φc , we get that νcφ (z) 6= ♥φc for all z ∈ ZA . So from
the definition of νcφ (z) we conclude the following:
• z ∈ fdSep(φ, y1 ) for all z ∈ ZA , or in other words,
Proofs for Section 5.3
Lemma 5.8. In Case 1.a, φ ∈ P .
Proof. Suppose, by way of contradiction, that φ ∈
/ P.
From the fact that P contains both y1 and y2 we conclude
that in Gv (Q) there is a path p between y1 and y2 , such that
all the intermediate nodes of p are existential variables of
P . Since φ fd-separates y1 from y2 , and neither y1 nor y2
are in img(φ), we conclude that img(φ) contains at least one
existential variable of P . But that stands in contradiction
to Part 3 of Proposition A.1.
ZA ⊆ fdSep(φ, y1 ) .
νcφ (z)
= Θµ (z) for all z ∈ ZA .
•
So we have zi ∈ img(ZA ) and ZA ⊆ fdSep(φ, y1 ), which
implies that zi ∈ img(fdSep(φ, y1 )). Theorem 4.2 (Part 1)
implies that img(fdSep(φ, y1 )) = fdSep(φ, y1 ), and hence,
zi ∈ fdSep(φ, y1 ). A symmetrical proof shows that zi ∈
fdSep(φ, y2 ). By the definition of νcφ , we have zi = Θµ (zi ).
Therefore, the fact Θµ (φR ) (which belongs to IΘ ) agrees
with f on all of the positions in A ∪ {i}. But then, the
violation of R : A → i existed before the addition of f : it
has been already realized by the facts Θµ (φR ) and f ′ . This
contradicts our choice of f as the first fact that causes the
violation of R : A → i, as required.
For Case 2, the proof is similar. The only difference is
that instead of proving zi ∈ fdSep(φ, y) just for y ∈ {y1 , y2 },
we prove zi ∈ fdSep(φ, y) for all y ∈ Varh (p).
Lemma 5.10. In Case 1.a, the following hold.
• There are no encounters.
• For each i ∈ {1, 2}, the tuple Θ(yi ) contains yi⋆ .
Proof. We begin with Part 1. Suppose, by way of contradiction, that γ is an encounter (i.e., Θ(γ) contains both
y1⋆ and y2⋆ ). From the construction of Θ we conclude that
Gv (Q) contains a path p1 from y1 to some variable z1 of γ,
and a path p2 from y2 to some variable z2 of γ, such that
none of p1 and p2 visit any node of img(φ). But then, by
combining p1 , p2 and the edge {z1 , z2 } (which is not needed
if z1 = z2 ), we get a path between y1 and y2 that does not
visit any node of img(φ). This contradicts our assumption
that φ fd-separates y1 from y2 .
The fact that Θ(y1 ) contains y1⋆ is due to the assumption
that y1 ∈
/ img(φ2 ) (recall that φ2 = φ), which implies that
y1 ∈
/ fdSep(φ2 , y1 ). The proof that Θ(y2 ) contains y2⋆ is
symmetrical.
Lemma B.4. For i = 1, 2, the atom φi is a non-encounter,
and Θ(φi ) contains yi⋆ .
Proof. We will prove that Θ(φ1 ) contains y1⋆ but not
y2⋆ ; the proof that Θ(φ2 ) contains y2⋆ but not y1⋆ is by a
symmetrical argument.
16
of the nodes of p′ are in img(φ). Again, this means that
all the nodes of p′ are heart variables, which implies that p′
is a path of G♥ . In particular, G♥ contains a path from a
variable of γ (namely z ′ ) to y. Finally, we similarly show
that G♥ contains a path from a variable of γ to y ′ .
Suppose, by way of contradiction, that y1⋆ is not in Θ(φ1 ).
The definition of Θ implies that Var(φ1 ) ⊆ fdSep(φ2 , y1 ).
From Part 1 of Theorem 4.2 we get that img(Var(φ)) ⊆
fdSep(φ2 , y1 ). In particular, y1 ∈ fdSep(φ2 , y1 ) since y1 ∈
img(φ1 ) (by our choice of the parameters y1 and φ1 in each
of the two cases). But, the only way y1 ∈ fdSep(φ2 , y1 ) can
hold is if y1 ∈ img(φ2 ) (since then the path that contains
just y1 visits a node of img(φ2 )). This is a contradiction,
since we know (from our choice of the parameters in each of
the two cases) that y1 ∈
/ img(φ2 ).
The fact that y2⋆ does not occur in Θ(φ1 ) is obvious from
the definition of Θ, since φ1 fd-separates every variable of
itself from y2 .
Lemma B.6. Let νcφ be an encounter assignment. Both
Var(φ1 ) and Var(φ2 ) contain heart variables of νcφ .
Proof. We first prove the lemma for Case 1.b. We will
prove that Var(φ1 ) contains heart variables of νcφ (and the
proof for φ2 is by a symmetrical argument). Suppose, by
way of contradiction, that Var(φ1 ) does not contain any
heart variables of νcφ . Then, from the construction of νcφ
we conclude that each variable of φ1 must be fdSep(φ, y1 ).
That is, Var(φ) ⊆ fdSep(φ, y1 ). From Part 1 of Theorem 4.2
we conclude that img(φ1 ) ⊆ fdSep(φ, y1 ), and in particular, y1 ∈ fdSep(φ, y1). Of course, y1 ∈ fdSep(φ, y1) means
that y1 ∈ img(φ). From the fact that we are in Case 1 we
conclude that y2 ∈
/ img(φ).
Next, we prove that fdSep(φ1 , y2 ) is a strict subset of
fdSep(φ, y2 ). The fact that Var(φ1 ) does not contain a heart
variable of νcφ also means that each variable of φ1 is in
fdSep(φ, y2 ) (i.e., Var(φ1 ) ⊆ fdSep(φ, y2 )). Then, Part 2
of Theorem 4.2 implies that fdSep(φ1 , y2 ) ⊆ fdSep(φ, y2 ).
Now, since φ is an encounter, some variable zφ of φ is such
that Θ(zφ ) contains y2⋆ . The definition of Θ implies that
zφ ∈
/ fdSep(φ1 , y2 ). But zφ ∈ fdSep(φ, y2 ) since zφ ∈ Var(φ).
So fdSep(φ1 , y2 ) is a strict subset of fdSep(φ, y2 ), as claimed.
Next, we show that φ ∈ P . Suppose otherwise (by way
of contradiction). The fact that P is a connected component of G∃ (Q) implies that each existential variable of P is
connected to y2 in Gv (Q) through a path that contains only
existential variables of P . Part 3 of Proposition A.1 implies
that img(φ) has no existential variables of P (recall that
Q = Q+ is assumed). We conclude that φ1 (which is in P
and hence has existential variables of P ) has a variable x
that is connected to y2 through a path that visits no nodes
in img(φ), which contradicts our previous conclusion that
Var(φ1 ) ⊆ fdSep(φ, y2 ).
We showed that both y1 ∈ img(φ) and φ ∈ P hold, and
yet, fdSep(φ1 , y2 ) is a strict subset of fdSep(φ, y2 ). This
contradicts the choice of φ1 as one where fdSep(φ1 , y2 ) has
the maximal number of elements among the atoms φ ∈ P
that satisfy y1 ∈ img(φ).
We now consider Case 2, and first prove that Var(φ1 ) contains heart variables of νcφ . Suppose, by way of contradiction, that Var(φ1 ) does not contain any heart variables of νcφ .
Let y be a head variable of P , such that y ∈
/ img(φ). Using
an argument similar to one given in the proof for Case 1.b
we can show that φ ∈ P , or otherwise we get a contradiction
to the fact that Var(φ1 ) ⊆ fdSep(φ, y).
Suppose first that there is some head variable y ∈ img(φ1 ),
such that y ∈
/ img(φ). Then y is a heart variable of φ.
Let µ ∈ Ma⋆ (Q⋆ , I ⋆ ) be a match with µ(x⋆ ) = c. So we
have νcφ (y) = ♥φc , and of course, Θcµ (y) 6= ♥φc . Hence, y ∈
img(φ1 ) (together with Lemma 5.14) implies that νcφ (φ1 ) 6=
Θµ (φ1 ). From the definition of νcφ we conclude that φ1 must
have heart variables, which is a contradiction.
P
Now suppose that img P
h (φ1 ) ⊆ img h (φ). From the fact
P
that |img h (φ1 )| is maximal among the atoms of P , we conP
clude that img P
h (φ1 ) = img h (φ). In particular, y1 ∈ img(φ)
and y2 ∈
/ img(φ). Since φ1 has no heart variables, every
path of Gv (Q) from a variable of φ1 to y2 visits img(φ), or
Lemma B.5. Let νcφ be an encounter assignment. The
heart variables of νcφ induce a connected subgraph of Gv (Q).
Proof. We first prove the lemma for Case 1.b. We consider three cases, depending on properties of φ.
Case a: y1 ∈ img(φ). Let z be a heart variable of νcφ . By
the construction of νcφ we have z ∈
/ fdSep(φ, y1 )∩fdSep(φ, y2 ).
Note that z ∈ fdSep(φ, y1 ), since y1 ∈ img(φ). Therefore,
z∈
/ fdSep(φ, y2 ). This means that Gv (Q) contains a path p
between z and y2 , such that none of the nodes (variables)
of p are in img(φ). But then, none of the nodes of p are
in fdSep(φ, y2 ), which implies that p consists of only heart
variables. We conclude that in the subgraph induced by the
heart variables, each node is connected to y2 , and hence,
this subgraph is connected.
Case b: y2 ∈ img(φ). The proof for this case is by an
argument symmetrical to that of Case a.
Case c: img(φ) contains neither y1 nor y2 . Here, we
use the assumption that such φ does not fd-separate y1 from
y2 , which means that there is a path p12 between y1 and y2 ,
such that p12 does not contain any node in img(φ). Since z is
a heart variable, either z ∈
/ fdSep(φ, y1 ) or z ∈
/ fdSep(φ, y2 ),
and hence, Gv (Q) contains a path p that connects z to either
y1 or y2 , and p does not contain any node in img(φ). Hence,
the combination of p and p12 implies that in the subgraph
induced by the heart variables, z is connected to y1 (and to
y2 ). Hence, this induced subgraph is connected.
We now consider Case 2. Let G♥ be the subgraph of
Gv (Q) that is induced by the heart variables of νcφ . Let z
be a heart variable of νcφ . By definition, z ∈
/ fdSep(φ, y) for
some y ∈ Varh (P ). Hence, Gv (Q) contains a path p between
z and y, such that p does not visit a node of img(φ). Note
that none of the nodes of p are in fdSep(φ, y); therefore, all
the nodes of p are heart variables. We conclude that p is
a path in G♥ , and hence, z is connected to y in G♥ . Of
course, y is not in img(φ). So, it is enough to show that
every two variables y and y ′ where y, y ′ ∈ Varh (p) \ img(φ)
are connected by a path in G♥ . We prove that next.
Suppose that y and y ′ are two variables in Varh (p)\img(φ).
It is enough to show that there is an atom γ, such that G♥
has a path from a variable of γ to y, and a path from a
variable of γ to y ′ (this is enough, since those variables of γ
are either the same or connected by an edge in G∃ (Q)). We
choose γ to be an atom of Q, such that y, y ′ ∈ img(γ); due
to the case we currently consider (Case 2), such γ necessarily
exists. If Var(γ) ⊆ fdSep(φ, y), then Part 1 of Theorem 4.2
implies that y is in fdSep(φ, y), which is impossible since
y ∈
/ img(φ). Hence, there is some variable z ′ of γ, such
that Gv (Q) contains a path p′ between z ′ and y, and none
17
in other words, Var(φ1 ) ⊆ fdSep(φ, y2 ). From Part 1 of Theorem 4.2 we conclude that fdSep(φ1 , y2 ) ⊆ fdSep(φ, y2 ). But
then, the fact that φ is an encounter implies that Var(φ) 6⊆
fdSep(φ1 , y2 ), and hence, we get that fdSep(φ, y2 ) is a strict
superset of fdSep(φ1 , y2 ). Bit this contradicts our choice of
φ1 as one with a maximal cardinality for fdSep(φ, y2 ).
The proof that Var(φ2 ) contains heart variables of νcφ is
similar to the proof for φ1 . Specifically, in the case where we
P
P
assume that img P
h (φ2 ) ⊆ img h (φ), we have that img h (φ2 ) ⊆
P
P
P
img h (φ) implies that img h (φ2 ) = img h (φ); this is true since
P
in the selection of φ1 and φ2 we had img P
h (φ) 6⊆ img h (φ1 )
P
as y2 ∈ img P
(φ)
and
φ
is
such
that
|img
(φ
)|
is
of
max2
2
h
h
imal cardinality among those γ ∈ P satisfying img P
h (γ) 6⊆
img P
h (φ1 ).
Proof. From Lemmas B.7 and B.8 (and due to the fact
that a has no heart constants in it), we conclude that every
match ν ∈ Ma (Q, I) is an instantiation of Θ, that is, ν is
equal to Θµ for some µ ∈ M(Q⋆ , I ⋆ ).
Observe that Θ(y1 ) (where i = 1, 2) contains y1⋆ , since
y1 ∈
/ img(φ2 ). Similarly, Θ(y2 ) contains y2⋆ . We conclude
that for each ν ∈ M(Q, I), both y1⋆ and y2⋆ appear in ν(y).
In particular, due to our choice of a, for µ ∈ M(Q⋆ , I ⋆ ) we
have Θµ (y) = a if and only if µ(y1⋆ ) = ✁ and µ(y2⋆ ) = ✄.
This completes the proof.
Lemma 5.16. In Cases 1.b and 2, the atom φi (i = 1, 2)
is a non-encounter, and Θ(φi ) contains yi⋆ . Moreover, the
following are equivalent.
1. There is a side-effect-free solution JΘ for IΘ and a,
such that RφJΘ = RφIΘ for all encounters φ.
2. There is a side-effect-free solution for I and a.
Lemma B.7. Every match for Q in I is either an instantiation of Θ or an encounter assignment.
Proof. Let ν be a match for Q in I. If no heart appears
in the image of ν, then ν is a match for Q in IΘ , and from
Lemma 5.5 we conclude that ν is an instantiation of Θ. So,
suppose that the image of ν contains some ♥φc .
We first show that the image of ν does not contain any
′
φ′
♥c′ that is different from ♥φc . So, let ♥φc′ be another heart in
the image of ν. Lemmas B.5 and B.6 imply that in Gv (Q),
each heart variable of νcφ is connected to some variable of
φ1 through a path p that contains only heart variables of
νcφ . Since no fact of I contains two different hearts (by the
construction of I), the image of all the nodes z of p are such
that ν(z) = ♥φc . In particular, ν(φ1 ) contains ♥φc . A similar
′
argument shows that ν(φ1 ) contains ♥φc′ . Again, no fact of
′
I contains two different hearts, and therefore, ♥φc = ♥φc′ .
Next, we will show that ν(z) = νcφ (z) for all variables z
of φ. Lemma B.5 and the way I is constructed imply that
every heart variable of νcφ is such that ν(z) = ♥φc . Hence,
ν(z) = νcφ (z) whenever z is a heart variable of νcφ . Let
µ ∈ Ma⋆ (Q⋆ , I ⋆ ) be a match with µ(x⋆ ) = c. It remains
to show that νcφ (z ′ ) = Θµ (z ′ ) whenever z ′ is not a heart
variable of νcφ . Let ν ′ be obtained from ν by replacing ν(z)
with Θµ (z) whenever z is a heart variable of νcφ . From the
construction of νcφ it follows that each fact in the image
of ν ′ is a fact of IΘ , and hence, ν ′ is a match for Q in
IΘ . From Lemma 5.5 we conclude that ν ′ is Θµ′ for some
µ′ ∈ M(Q⋆ , I ⋆ ). We will show that µ = µ′ . Lemma B.6
states that φ1 contains a heart variable of νcφ . Since only
one fact over Rφ1 contains ♥φc , we conclude that ν(φ1 ) =
ν ′ (φ1 ). Now, Lemma B.4 shows that Θ(φ1 ) mentions y1⋆ , and
therefore, µ(y1⋆ ) = µ′ (y1⋆ ). We similarly show that µ(y2⋆ ) =
µ′ (y2⋆ ). Finally, since φ1 is in P we conclude that Θ(φ1 )
contains x⋆ , and hence, µ(x⋆ ) = µ′ (x⋆ ). It follows that
µ = µ′ , and hence, ν ′ = Θµ . In particular, νcφ (z ′ ) = Θµ (z ′ )
whenever z ′ is not a heart variable of νcφ , as claimed.
Proof. The fact that φi (i = 1, 2) is a non-encounter and
Θ(φi ) contains yi⋆ is stated by Lemma B.4.
1 ⇒ 2. We assume that JΘ is a side-effect-free solution
for IΘ and a, such that RφJΘ = RφIΘ for all encounters φ.
Building on Lemma B.3 (specifically, the equivalence between 2 and 3), we can assume that RφJΘ = RφIΘ for all
φ ∈ atoms(Q)\{φ1 , φ2 }. Here, we implicitly use Lemma B.4,
which implies that in Lemma B.3 we can use φ1 and φ2 in
place of γ1 and γ2 , respectively.
Let J be obtained from I by removing every fact in IΘ \JΘ .
We will prove that J is side-effect free. Lemma B.9 implies
that J is a solution, since a is obtained only through matches
that require facts only from JΘ (and we have a ∈
/ Q(JΘ )).
Lemma B.6 implies that we have not lost any answer of Q(I)
of the form νcφ (y), since none of the νcφ use facts among the
removed ones. Furthermore, we have not lost any answer of
the form Θµ (y), except for a, since each such Θµ (y) is in
Q(JΘ ). Finally, from Lemma B.7 we conclude that Q(J) =
Q(I) \ {a}, that is, J is side-effect free, as claimed.
2 ⇒ 1. Let J be a side-effect-free solution for I and a.
Without loss of generality, we assume that J is economical in
the sense that it does not miss any fact of I unless this fact is
(Q, a)-useful in I. Clearly, every solution can be transformed
into an economical one. Let JΘ be the subinstance J ∩ IΘ .
We will show that JΘ is side-effect-free solution for IΘ and
a, and that RφJΘ = RφIΘ for all encounters φ.
To show that JΘ is side-effect free, let b 6= a be an answer
in Q(IΘ ), and we need to show that b ∈ Q(JΘ ). We know
that b ∈ Q(J), since J is side-effect free (and Q(IΘ ) ⊆
Q(I)). From Lemmas B.7 and B.8 we get that the matches
for Q in J that produce b use facts only from IΘ . Since all
these facts are in JΘ , we get that b ∈ Q(JΘ ), as claimed.
Let φ be an encounter. We complete the proof by proving
that RφJΘ = RφIΘ . Let f be a fact of RφIΘ . We will show
that f ∈ J. Suppose first that f = Θc✁,✄ (φ) for some c.
Lemmas B.8 and B.7 imply that every νcφ is match for Q
in J (since J is side-effect free). In particular, f is in J,
since f = νcφ (φ) (which follows from the construction of νcφ
in both Case 1.b and Case 2). Now suppose that f is a fact
of the form Θb1 ,b2 (φ) where (b1 , b2 ) 6= (✁, ✄). Then f does
not participate in any Θµ with µ ∈ Ma⋆ (Q⋆ , I ⋆ ), since φ is
an encounter. But then, Lemma B.9 implies that f is not
(Q, a)-useful in I. And since J is economical, J contains
every fact that is not (Q, a)-useful, and in particular, the
fact f .
Lemma B.8. Let νcφ be an encounter assignment. The
answer νcφ (y) in Q(I) has one or more occurrences of ♥φc .
Proof. We begin with Case 1.b. Our assumption in
this case implies that either y1 or y2 (or both) is not in
img(φ). This means that either y1 ∈
/ fdSep(φ, y1 ) or y2 ∈
/
fdSep(φ, y2 ). Therefore, either νcφ (y1 ) = ♥φc or νcφ (y2 ) = ♥φc .
We now consider Case 2. Let y be a head variable in
Varh (P ) \ img(φ). Then y ∈
/ fdSep(φ, y), and therefore, y is
a heart variable of φ, which means that νcφ (y) = ♥φc .
Lemma B.9. Ma (Q, I) = {Θµ | µ ∈ Ma⋆ (Q⋆ , I ⋆ )}
18