A Dichotomy in the Complexity of Deletion Propagation with Functional Dependencies

Benny Kimelfeld
IBM Research–Almaden, San Jose, CA 95120, USA
[email protected]

ABSTRACT

A classical variant of the view-update problem is deletion propagation, where tuples from the database are deleted in order to realize a desired deletion of a tuple from the view. This operation may cause a (sometimes necessary) side effect: deletion of additional tuples from the view, besides the intentionally deleted one. The goal is to propagate deletion so as to maximize the number of tuples that remain in the view. In this paper, a view is defined by a self-join-free conjunctive query (sjf-CQ) over a schema with functional dependencies. A condition is formulated on the schema and view at hand, and the following dichotomy in complexity is established. If the condition is met, then deletion propagation is solvable in polynomial time by an extremely simple algorithm (very similar to the one observed by Buneman et al.). If the condition is violated, then the problem is NP-hard, and it is even hard to realize an approximation ratio that is better than some constant; moreover, deciding whether there is a side-effect-free solution is NP-complete. This result generalizes a recent result by Kimelfeld et al., who ignore functional dependencies. For the class of sjf-CQs, it also generalizes a result by Cong et al., stating that deletion propagation is in polynomial time if keys are preserved by the view.

a maximum number of user-file access permissions remain. This is exactly the deletion-propagation problem, where the view is defined by the following conjunctive query (CQ):

Access(u, f) :− GroupUser(g, u), GroupFile(g, f)    (1)

Formally, for a CQ Q (the view definition), the input for the deletion-propagation problem consists of a database instance I and a tuple a ∈ Q(I).
A solution is a subinstance J of I (i.e., J is obtained from I by deleting facts) such that a ∉ Q(J), and an optimal solution maximizes the number of tuples in Q(J). The side effect is the set (Q(I) \ {a}) \ Q(J). Buneman et al. [3] identify some classes of CQs (e.g., projection-free CQs) for which a straightforward algorithm produces an optimal solution in polynomial time. But those tractable classes are highly restricted, as Buneman et al. show that even for a CQ as simple as the above Access(u, f), testing whether there is a solution that is side-effect free is NP-complete. Therefore, finding an optimal solution minimizing the side effect is NP-hard for Access(u, f). In a more recent work, Kimelfeld et al. [13] considered views that are defined by a self-join-free CQ (sjf-CQ), and proved a dichotomy theorem implying that the views for which the straightforward algorithm is suboptimal are exactly those for which deletion propagation is NP-hard. Later, we discuss that dichotomy theorem in more detail.

Conceivably, one could settle for a solution that is just approximately optimal. This paper follows the notion of approximation of Kimelfeld et al. [13], who view the optimization problem as that of maximization (of the number of tuples that remain in the view) rather than minimization (of the side effect). As noted by Buneman et al. [3], minimization of the side effect does not give rise to approximations, at least for Access(u, f), since then an approximation (with any ratio) can be used to solve the NP-complete problem of testing for the existence of a side-effect-free solution. In fact, this argument is true not just for Access(u, f), but actually for the whole class of sjf-CQs with intractable deletion propagation [13]. On the other hand, in the case of maximization there is always a constant-factor approximation when the view is defined by an sjf-CQ [13].
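To make the notions of solution and side effect concrete, the following minimal Python sketch evaluates the Access view of (1) over a small instance and checks one candidate subinstance. The relation contents and the helper name access_view are hypothetical, chosen only for illustration; they are not part of the formal development.

```python
def access_view(group_user, group_file):
    """Evaluate Access(u, f) :- GroupUser(g, u), GroupFile(g, f)."""
    return {
        (u, f)
        for (g, u) in group_user
        for (g2, f) in group_file
        if g == g2
    }


# Hypothetical instance I (facts are (group, user) and (group, file) pairs).
group_user = {("staff", "ann"), ("staff", "bob"), ("admin", "ann")}
group_file = {("staff", "report"), ("admin", "report"), ("admin", "keys")}

view = access_view(group_user, group_file)
a = ("ann", "report")  # the view tuple we want to delete

# Candidate subinstance J: delete one GroupUser fact and one GroupFile fact.
j_user = group_user - {("staff", "ann")}
j_file = group_file - {("admin", "report")}
view_j = access_view(j_user, j_file)

# J is a solution if a is no longer in the view.
is_solution = a not in view_j
# The side effect is (Q(I) \ {a}) \ Q(J).
side_effect = (view - {a}) - view_j

print(is_solution, side_effect)
```

In this particular instance the candidate happens to be side-effect free; in general such a solution need not exist, which is what makes the maximization problem interesting.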
Going back to the Access(u, f ) view defined in (1), suppose now that we have the restriction that each user belongs to at most one group, and that each file is associated with at most one group. It is not difficult to show that then, deletion propagation is in polynomial time, since to delete a tuple (user, file) from the view, it is enough to consider (and select the best out of) two possible solutions: one obtained by deleting the single fact of the form GroupUser(g, user), and the other by deleting the single fact of the form GroupFile(g, file). Actually, this example is a special case of the result of Cong et al. [5], stating INTRODUCTION A classical problem in database management is that of view update [1, 5, 6, 10–12]: translate an update operation on the view to an update of the source database, so that the update on the view is properly realized. A special case of this problem is that of deletion propagation in relational databases: given an undesired tuple in the view (defined by some monotonic query), delete some tuples from the source relations (where such tuples are referred to as facts), so that the undesired tuple disappears from the view. The database resulting from this deletion of tuples is called a solution. The side effect is the set of deleted view tuples that are different from the undesired one. If only the undesired tuple gets to be deleted from the view, then the solution is side-effect free. A solution that is side-effect free does not necessarily exist, and therefore, the task is relaxed to that of minimizing the side effect [3, 5, 13]. That is, the problem is to delete tuples from the source relations so that the undesired tuple disappears from the view, but as many as possible other tuples remain there. Following is a simple example by Cui and Widom [7] (also referenced and used later by Buneman et al. [3] and Kimelfeld et al. [13]). 
Suppose that we have two relations, GroupUser(group, user) and GroupFile(group, file), representing memberships of users in groups and access permissions of groups to files, respectively. A user u can access the file f if u belongs to a group that can access f; that is, there is some g, such that GroupUser(g, u) and GroupFile(g, f). Suppose that we want to restrict the access of a specific user to a specific file, by eliminating some user-group or group-file pairings. Furthermore, we would like to do so in a way that

atoms are connected by an edge whenever they share at least one existential variable. The tractability condition in the dichotomy theorem of Kimelfeld et al. [13] says that for each connected component P of G∃(Q) there is an atom that contains all head variables of P. For Q = Access(u, f), the graph G∃(Q) has only one connected component (containing the two atoms). Since no atom contains both u and f, the tractability condition of Kimelfeld et al. [13] is violated, thus MaxDP⟨S, Q⟩ is NP-hard.

Now suppose that S contains the fd file → group (i.e., every file belongs to one group). Then the existential variable g is functionally determined by the head variables; that is, in a mapping from Q to the database, if we know what the head variables are mapped to, then we also know what g is mapped to. So, we treat g as if it were a head variable. But now, the edge of G∃(Q) disappears (since the two atoms no longer share an existential variable), and we get two connected components, each having one node. The tractability condition of Kimelfeld et al. [13] trivially holds then, and so, MaxDP⟨S, Q⟩ is now in polynomial time.

Lastly, suppose that instead of file → group, the schema S contains the fd group → user (i.e., every group has at most one user). Although the atom GroupFile(g, f) does not contain both head variables (u and f), it functionally determines both of them.
This implies that the tractability condition of this paper is satisfied, and hence, MaxDP⟨S, Q⟩ is again in polynomial time.

Proving the tractable part of the dichotomy is fairly simple. However, the proof of the hardness part is nontrivial, and entails a fairly intricate construction and case analysis. A central concept in the proof is that of graph separation. In Section 5, we review the proof, which is restricted to the NP-completeness of FreeDP⟨S, Q⟩. In Section 6 we discuss the adaptation of the proof to hardness of approximation for MaxDP⟨S, Q⟩; the discussion is brief, as this adaptation uses ideas similar to the ones used by Kimelfeld et al. [13].

The work reported here is restricted to a special setting within view update: deletion propagation for sjf-CQs in the presence of fds, with the goal of preserving as many tuples of the view as possible. Other aspects of deletion propagation, and view update in general, have been studied in the literature. For example, deletion propagation was studied with the goal of minimizing the source side effect [3, 5], namely, finding a solution with a minimal number of missing facts. A significant research effort has been invested in defining the semantics of (and notions of correctness for) general view-update operations (e.g., deletion and insertion of tuples) [1, 6, 10, 12]. For example, Cosmadakis and Papadimitriou [6] introduced a requirement for a view to keep intact a complement view (which has also been explored more recently by Lechtenbörger and Vossen [15]) that, intuitively, contains the information ignored by the view.

Several dichotomies in the complexity of operations involving CQs (and sjf-CQs) were proved in the literature. For example, Dalvi and Suciu [9] considered query evaluation over probabilistic databases with independent tuples, and classified the sjf-CQs into those that can be evaluated in polynomial time and those that are #P-hard. Later, Dalvi et al. 
[8] extended this result to the class of unions of conjunctive queries (allowing self joins). More recently, Meliou et al. [19] showed a dichotomy (polynomial time vs. NP-completeness) in the complexity of computing the degree of responsibility of a database tuple for an answer in the result of an sjf-CQ. Examples of dichotomy theorems that

that when the view (defined by a CQ) preserves keys, deletion propagation is in polynomial time. Now what if each file is associated with one group, but a user may belong to any number of groups (as commonly practiced by operating systems)? It is not difficult to show that, here also, the problem is in polynomial time. Actually, it turns out that every nontrivial key constraint suffices for the tractability of deletion propagation for Access(u, f); moreover, there is one very simple algorithm, which we later refer to as the unirelation algorithm, that realizes the polynomial time for each of these key constraints.

The above example illustrates the fact that functional dependencies (e.g., key constraints, {conference, year} → chair, etc.), abbreviated fds, have an immediate and direct effect on the complexity of deletion propagation. Moreover, this effect is in one direction: if for a certain view definition the problem is in polynomial time when disregarding the fds, it continues to be in polynomial time when fds are taken into account; however, if the problem is hard without the fds, it may no longer be hard when fds are accounted for. Therefore, the class of views that are known to be tractable for deletion propagation would expand as a result of the investigation of the problem in the presence of functional dependencies. This is what we do in this paper.

More formally, the setting is as follows. The view is defined by an sjf-CQ Q over a schema S that may have functional dependencies. (Exact definitions of sjf-CQs and schemas are in Section 2.)
In the problem MaxDP⟨S, Q⟩ (where DP stands for “Deletion Propagation”), the input consists of a database instance I over S and a tuple a ∈ Q(I). A solution is a subinstance J of I (i.e., J is obtained from I by deleting facts) such that a ∉ Q(J). The goal is to find an optimal solution, which is a solution that maximizes the number of tuples in Q(J). The decision problem FreeDP⟨S, Q⟩ has the same input (i.e., I and a), and the goal is to decide whether there is a solution that is side-effect free, that is, a subinstance J of I with Q(J) = Q(I) \ {a}.

The main result of this paper is a generalization of the dichotomy theorem of Kimelfeld et al. [13] to accommodate functional dependencies. More precisely, in Section 3 we phrase a syntactic condition on (the fds of) the schema S and the sjf-CQ Q at hand. We call this condition the tractability condition. Then, the following is proved (Theorem 6.1).

• If the tractability condition is met, then MaxDP⟨S, Q⟩ (as well as FreeDP⟨S, Q⟩) is solvable in polynomial time by a straightforward algorithm. (Section 3.)
• Otherwise, MaxDP⟨S, Q⟩ is NP-hard, even to approximate better than some constant smaller than one; moreover, FreeDP⟨S, Q⟩ is NP-complete. (Section 5.)

The straightforward algorithm in the positive side of the dichotomy is called the unirelation algorithm, and is described in Section 3. This algorithm is very similar to the “trivial” algorithm used by Kimelfeld et al. [13] and originally observed by Buneman et al. [3].

To give the flavor of our tractability condition, we illustrate it on a very simple example, namely the CQ Access(u, f) defined in (1). (More involved examples are in the body of the paper.) In this CQ, u and f are head variables while g is an existential variable. Suppose first that the underlying schema S has no fds.
In the existential-connectivity graph of Q, denoted G∃(Q), each atom is a node, and two

2.2 Conjunctive Queries

involve CQs and key constraints are within the topic of consistent query answering. Kolaitis and Pema [14] proved a dichotomy (polynomial time vs. coNP-hardness) in the complexity of computing the consistent answers of a Boolean sjf-CQ with exactly two atoms. Finally, Maslowski and Wijsen [17] showed a dichotomy (polynomial vs. #P-hardness) in the complexity of counting the database repairs that satisfy a Boolean sjf-CQ. We are not aware of any dichotomy theorem that applies to sjf-CQs and general functional dependencies like our main theorem (Theorem 6.1).

The classical use case for deletion propagation is database administration: actual translation of a deletion in the view to deletion in the source relations. Contemporary applications provide further motivation: database debugging. As an example, Information Extraction (IE) programs, which can be highly complicated, often operate on unclean tables and dictionaries that by themselves may be produced by means of prior IE programs [16, 20]. When an erroneous result is detected in the output, the developer wishes to find the cause of that result (e.g., wrong dictionary entries), which may require significant manual effort in complex systems (e.g., the MIDAS system [4]). We can view the set of source tuples suggested for deletion as a debugging suggestion, or alternatively as a cause of the undesired result. That cause is meaningful in the sense that it is focused on the undesired result and has a small (if any) effect on the other results. This notion of causality is different from that of Meliou et al. [18], which actually corresponds to deletion propagation with minimal source side effect. Of course, IE programs entail more than CQs.
Nevertheless, we believe that the case of sjf-CQs is already challenging enough, and our hope is that insights on them will lead to understanding of the studied problems in more complex settings. Due to space limitations, the proofs are in the appendix. 2. We fix an infinite set Var of variables, and assume that Cnsts and Var are disjoint sets. We usually denote variables by lowercase letters from the end of the Latin alphabet (e.g., x, y and z). We use the Datalog style for denoting a conjunctive query (abbrev. CQ): a CQ over a schema S is an expression of the form Q(y) :− ϕ(x, y, c), where x and y are tuples of variables (from Var), c is a tuple of constants (from Cnsts), and ϕ(x, y, c) is a conjunction of atomic formulas φi (x, y, c) over S; an atomic formula is also called an atom. We may write just Q(y), or even just Q, if ϕ(x, y, c) is irrelevant. We denote by atoms(Q) the set of atoms of Q. We usually write ϕ(x, y, c) by simply listing atoms(Q). We make the requirement that each variable occurs at most once in x and y, and no variable occurs in both x and y. Furthermore, we require every variable of y to occur (at least once) in ϕ(x, y, c). The arity of Q, denoted arity(Q), is the length of the tuple y. Let Q(y) :− ϕ(x, y, c) be a CQ. A variable in x is called an existential variable, and a variable in y is called a head variable. We use Var∃ (Q) and Varh (Q) to denote the sets of existential variables and head variables of Q, respectively. Similarly, if φ is an atom of Q, then Var∃ (φ) and Varh (φ) denote the sets of existential and head variables, respectively, that occur in φ. We denote by Var(Q) and Var(φ) the unions Var∃ (Q) ∪ Varh (Q) and Var∃ (φ) ∪ Varh (φ), respectively. In general, this work is restricted to CQs that are self-join free, which means that each relation symbol occurs at most once in the CQ. We use sjf-CQ as an abbreviation of selfjoin-free CQ. Let Q(y) :− ϕ(x, y, c) be an sjf-CQ over the schema S = (R, ∆). 
Without loss of generality, we assume that the set of relation symbols in Q is exactly the set of relation symbols of S. For a relation symbol R of S, the unique atom over R is denoted by φR(x, y, c) or just φR.

Let S be a schema, let Q(y) be a CQ over S, and let I be an instance over S. An assignment for Q is a mapping µ : Var(Q) → Cnsts. For an assignment µ for Q, the tuple µ(y) is the one obtained from y by replacing every head variable y with the constant µ(y). Similarly, for an atom φ of Q, the fact µ(φ) is the one obtained from φ by replacing every variable z with the constant µ(z). A match for Q in I is an assignment µ for Q, such that µ(φ) is a fact of I for all φ ∈ atoms(Q). We denote by M(Q, I) the set of all the matches for Q in I. If µ is a match for Q in I, then µ(y) is called an answer (for Q in I). The result of evaluating Q over I, denoted Q(I), is the set of all the answers for Q in I; that is:

2. SETTING AND BACKGROUND

This section describes the formal setting for this paper, and provides some needed background.

2.1 Schemas and Functional Dependencies

In this paper, a schema S is a pair (R, ∆), where R = {R1, . . . , Rn} is a finite set of relation symbols and ∆ is a set of functional dependencies. We abbreviate functional dependency as fd. Each relation symbol Ri ∈ R has an arity, denoted arity(Ri), which is a natural (nonzero) number. An fd δ ∈ ∆ has the form Ri : A → B, where Ri ∈ R and A and B are subsets of {1, . . . , arity(Ri)}. We use Ri : A → j and Ri : j′ → j as shorthand notations of Ri : A → {j} and Ri : {j′} → {j}, respectively.

We assume an infinite set Cnsts of constants. A database over a schema S is called an instance (of S), and its values are taken from Cnsts.
Formally, an instance I over a schema S = (R, ∆) consists of a finite relation Ri^I ⊆ Cnsts^arity(Ri) for each relation symbol Ri ∈ R, such that all the fds of S are satisfied; that is, for each fd Ri : A → B in ∆ and tuples t and u in Ri^I, if t and u agree on (i.e., have the same values for) the indices of A, then they also agree on the indices of B. If I is an instance over S and t is a tuple in Ri^I, then we say that Ri(t) is a fact of I. Notationally, we view an instance I as the set of its facts. As an example, J ⊆ I means that Ri^J ⊆ Ri^I for all Ri ∈ R, in which case we say that J is a subinstance of I. (Note that if I satisfies ∆, then every subinstance of I satisfies ∆ as well.)

Q(I) ≝ {µ(y) | µ ∈ M(Q, I)}

Over a given schema, let Q(y) be a CQ, let I be an instance, and let b be an answer. A fact f ∈ I is said to be (Q, b)-useful (in I) if there is an atom φ ∈ atoms(Q) and a match µ for Q in I, such that µ(y) = b and µ(φ) = f. A fact f ∈ I is said to be Q-useful (in I) if f is (Q, b)-useful for some answer b.

Example 2.1. An important CQ in this work is the sjf-CQ Q⋆ which is the same as the CQ Access(u, f) defined in (1) (and discussed in the introduction), up to renaming of relation symbols and variables. The schema is S⋆, and it has two binary relation symbols R1 and R2 and no functional dependencies. The sjf-CQ Q⋆ over S⋆ is defined as follows.

Q⋆(y1, y2) :− R1(x, y1), R2(x, y2)    (2)

an answer a⋆ = (a1, a2). To obtain a solution, we need to delete from I⋆ facts, such that for all constants c where both R1(c, a1) and R2(c, a2) are in I⋆, at least one of the two is deleted. In MaxDP⟨S⋆, Q⋆⟩, the goal is to have as many as possible pairs remaining in Q⋆(I⋆) after the deletion. In FreeDP⟨S⋆, Q⋆⟩, the goal is to determine whether we can delete facts from I⋆, so that every original answer, except for a⋆, remains an answer. Buneman et al. [3] proved the following.

The atoms of Q⋆ are φR1 = R1(x, y1) and φR2 = R2(x, y2).
In the examples of this article, Ri and Rj are assumed to be different symbols when i ≠ j. In particular, schema(Q⋆) consists of two distinct binary relation symbols: R1 and R2. Note that Q⋆ is an sjf-CQ, since R1 and R2 are different relation symbols. There is only one existential variable in Q⋆, namely x, and the two head variables are y1 and y2. Hence Var∃(Q⋆) = {x} and Varh(Q⋆) = {y1, y2}. Furthermore, Var∃(φR1) = {x} and Varh(φR1) = {y1}.

Theorem 2.3. [3] FreeDP⟨S⋆, Q⋆⟩ is NP-complete (and hence, MaxDP⟨S⋆, Q⋆⟩ is NP-hard).

Kimelfeld et al. [13] extended Theorem 2.3 to a dichotomy theorem, which we discuss in the next section. Another problem that falls in our setting is that of deletion propagation for key-preserving views [5], which is described as follows. Let S = (R, ∆) be a schema. For a relation symbol R ∈ R, a primary key for R is a set K of indices in {1, . . . , arity(R)}, such that ∆ contains the fd K → {1, . . . , arity(R)}. We say that a CQ Q over S is key preserving if for every φ ∈ atoms(Q) there is a key Kφ for Rφ, such that φ has no existential variables in the positions of Kφ. Cong et al. [5] proved the following.

2.2.1 CQ Functional Dependencies

Let S = (R, ∆) be a schema, and let Q be a CQ over S. If µ is an assignment for Q and Z is a subset of Var(Q), then µ[Z] denotes the restriction of µ to the variables in Z (i.e., Z is the domain of µ[Z], and µ[Z](z) = µ(z) for all z ∈ Z). Let Z1 and Z2 be subsets of Var(Q). We denote by Q : Z1 → Z2 the functional dependency stating that for every instance I over S and matches µ and µ′ of Q in I we have the following:

µ[Z1] = µ′[Z1] ⇒ µ[Z2] = µ′[Z2]

Note that when ∆ is empty (e.g., as in the setting of [3, 13]), then Q : Z1 → Z2 is equivalent to Z1 ⊇ Z2. As usual, we may write Q : Z → z instead of Q : Z → {z}. The image of Z, denoted img(Z), is the set of all the variables z ∈ Var(Q), such that Q : Z → z. Note that Z ⊆ img(Z).
If φ is an atom of Q, then we use img(φ) as a shorthand notation of img(Var(φ)).

Example 2.2. Let S = (R, ∆) be such that R contains two binary relation symbols R1, R2 and two ternary relation symbols S and T. In addition, suppose that ∆ contains the following fds:

R1 : 1 → 2    R2 : 1 → 2    S : {1, 2} → 3

Let Q be the following CQ.

Q(y1, y2, y3) :− R1(x1, y1), R2(x2, y2), S(y1, y3, x), T(x, x1, x2)

Then we have img({x1}) = {x1, y1} due to the fd R1 : 1 → 2, img({x2}) = {x2, y2} due to R2 : 1 → 2, and img({y1, y3}) = {y1, y3, x} due to S : {1, 2} → 3. In addition, we have img(φT) = {x, x1, x2, y1, y2} due to the fds R1 : 1 → 2 and R2 : 1 → 2. (Recall that φT is the atom T(x, x1, x2).)

Theorem 2.4. [5] Let S be a schema and let Q be a CQ over S, such that Q is key preserving. Then MaxDP⟨S, Q⟩ (as well as FreeDP⟨S, Q⟩) is solvable in polynomial time.

At the end of Section 3, we will show how Theorem 2.4 follows from the results of this paper in the case where Q is self-join free.

2.4 Head Domination and Dichotomy

Let S be a schema, and let Q be a CQ over S. The existential-connectivity graph of Q, denoted G∃(Q), is the undirected graph that has atoms(Q) as the set of nodes, and that has an edge {φ1, φ2} whenever φ1 and φ2 have at least one existential variable in common (in notation, Var∃(φ1) ∩ Var∃(φ2) ≠ ∅). Let φ be an atom of Q, and let P be a set of atoms of Q. We denote by Var(P) the set of all the variables that occur in P, by Varh(P) the set Var(P) ∩ Varh(Q), and by Var∃(P) the set Var(P) ∩ Var∃(Q).

Example 2.5. The left side of Figure 1 shows the graph G∃(Q) for the CQ Q of Example 2.2. Observe that there is no edge between R1(x1, y1) and S(y1, y3, x) since the two atoms do not share an existential variable (they share y1, which is a head variable). The graph G∃(Q) is connected, and its single connected component P satisfies Varh(P) = {y1, y2, y3} and Var∃(P) = {x, x1, x2}.
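The images and the existential-connectivity graph of Example 2.2 can be reproduced mechanically. The following Python sketch is an illustration only: it computes img(Z) by the standard attribute-closure iteration applied to the variables of the atoms, under the assumption (not proved here) that this closure coincides with img for a constant-free sjf-CQ such as the one in Example 2.2. The names ATOMS, HEAD, FDS, img and existential_components are hypothetical.

```python
from itertools import combinations

# Atoms of the CQ Q of Example 2.2: relation symbol -> tuple of variables.
ATOMS = {
    "R1": ("x1", "y1"),
    "R2": ("x2", "y2"),
    "S": ("y1", "y3", "x"),
    "T": ("x", "x1", "x2"),
}
HEAD = {"y1", "y2", "y3"}
# fds as (relation, left positions, right positions), 1-based as in the text.
FDS = [("R1", {1}, {2}), ("R2", {1}, {2}), ("S", {1, 2}, {3})]


def img(z, atoms=ATOMS, fds=FDS):
    """Closure of the variable set z under the fds (assumed to equal img(z))."""
    closure = set(z)
    changed = True
    while changed:
        changed = False
        for rel, lhs, rhs in fds:
            terms = atoms[rel]
            if all(terms[i - 1] in closure for i in lhs):
                new = {terms[j - 1] for j in rhs} - closure
                if new:
                    closure |= new
                    changed = True
    return closure


def existential_components(atoms=ATOMS, head=HEAD):
    """Connected components of the existential-connectivity graph."""
    def ex_vars(rel):
        return set(atoms[rel]) - set(head)

    comps = [{rel} for rel in atoms]
    merged = True
    while merged:
        merged = False
        for c1, c2 in combinations(comps, 2):
            # Merge two components if some pair of atoms shares an
            # existential variable.
            if any(ex_vars(r1) & ex_vars(r2) for r1 in c1 for r2 in c2):
                comps.remove(c1)
                comps.remove(c2)
                comps.append(c1 | c2)
                merged = True
                break
    return comps


print(img({"x1"}))
print(img(ATOMS["T"]))
print(existential_components())
```

Passing head=HEAD | {"x"} to existential_components treats x as a head variable, which splits the single component into two.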
2.3 Deletion Propagation

Let S be a schema, and let Q be a CQ. The problem of maximizing the view in deletion propagation, with Q as the view definition, is denoted by MaxDP⟨S, Q⟩ and is defined as follows. The input consists of an instance I over S, and a tuple a ∈ Q(I). A solution (for I and a) is a subinstance J ⊆ I, such that a ∉ Q(J). The goal is to find an optimal solution, which is a solution J that maximizes |Q(J)|; that is, J is such that |Q(J)| ≥ |Q(J′)| for all solutions J′. The problem of determining whether there is a side-effect-free solution, denoted FreeDP⟨S, Q⟩, is that of deciding whether there is a solution in which we lose exactly a; that is, given the input I and a, decide whether there is a subinstance J ⊆ I with Q(J) = Q(I) \ {a}.

As an example, consider the schema S⋆ and the CQ Q⋆ of Example 2.1. The input for the problems MaxDP⟨S⋆, Q⋆⟩ and FreeDP⟨S⋆, Q⋆⟩ consists of an instance I⋆ over S⋆ and

Kimelfeld et al. [13] defined the head-domination property of a CQ.

Definition 2.6. (Head Domination [13]) A CQ Q has the head-domination property if for every connected component P of G∃(Q) there is an atom φ of Q such that Varh(P) ⊆ Var(φ).

Note that in the definition, the atom φ that satisfies Varh(P) ⊆ Var(φ) is not necessarily in P. Examples follow.

Example 2.7. A CQ that has head domination is the following:

Q′(y1, y2, y3) :− R1(x, y1), R2(x, y2), S(y1, y2), T(y2, y3)

[Figure 1 drawing: two graphs over the nodes R1(x1, y1), R2(x2, y2), T(x, x1, x2) and S(y1, y3, x); the right-hand graph is partitioned into the components P1 and P2.]

MaxDP⟨S, Q⟩ to be solvable in polynomial time. In Section 5, we prove that the condition is also necessary (under standard complexity assumptions). We begin by describing the unirelation algorithm, which is an extremely simple algorithm that produces a (not-necessarily optimal) solution given the input I and a for MaxDP⟨S, Q⟩.

3.1 The Unirelation Algorithm

Let S be a schema, and let Q be a CQ over S. Buneman et al. 
[3] observed the following simple algorithm (also called the trivial algorithm [13]) for producing a solution, given the input I and a for MaxDP⟨S, Q⟩. For each φ ∈ atoms(Q), construct the candidate solution Jφ by deleting from I every fact that can be obtained from φ by an assignment that agrees with a on the head variables; then, select the Jφ with the maximal |Q(Jφ)|. The unirelation algorithm, denoted UniRel⟨S, Q⟩ and depicted in Figure 2, is essentially the same algorithm, except that we delete only facts that are (Q, a)-useful (that is, the facts we delete are always of the form µ(φ) where µ ∈ M(Q, I) is such that µ(y) = a). The following straightforward proposition states the fact that the unirelation algorithm is correct and terminates in polynomial time.

Proposition 3.1. UniRel⟨S, Q⟩(I, a) returns a solution in polynomial time.

It follows immediately from the results of Kimelfeld et al. [13] that if Q is an sjf-CQ with head domination, then UniRel⟨S, Q⟩ produces an optimal solution. It also follows immediately that in the absence of fds, head domination is a necessary condition for the optimality of UniRel⟨S, Q⟩; but this is no longer true when fds are present, as the following example shows.

Figure 1: The graphs G∃(Q) (left) and G∃(Q+) (right) for the CQ Q of Example 2.2

Note that G∃(Q′) has three connected components: P1 = {φR1, φR2}, P2 = {φS}, P3 = {φT}. Since Varh(P1) ⊆ Var(φS), Varh(P2) ⊆ Var(φS) and Varh(P3) ⊆ Var(φT), we get that Q′ has indeed head domination. As a negative example, consider again the CQ Q of Example 2.2. As shown in Example 2.5 (through Figure 1), G∃(Q) has only one connected component. Since no atom contains all the head variables of Q, we get that Q does not have head domination.

Kimelfeld et al. [13] proved the following theorem, which states a dichotomy in the complexities of MaxDP⟨S, Q⟩ and FreeDP⟨S, Q⟩ in the absence of fds.

Theorem 2.8. [13] Let S = (R, ∆) be a schema and let Q be an sjf-CQ over S.
1. 
If Q has head domination, then MaxDP⟨S, Q⟩ can be solved in polynomial time.
2. If ∆ = ∅ and Q does not have head domination, then FreeDP⟨S, Q⟩ is NP-complete.

In this paper, we extend the dichotomy of Theorem 2.8 to account for fds (Theorem 6.1). As we will show, this extension is nontrivial, and requires arguments, concepts and constructions that are significantly different from those used for proving Theorem 2.8. Specifically, we will define a tractability condition, which is a syntactic condition on the schema S and a CQ Q at hand, such that fds are taken into account to weaken the property of head domination. We will show that the tractability condition gives a dichotomy similar to that of Theorem 2.8, that is, its satisfaction implies tractability of MaxDP⟨S, Q⟩ and its violation implies NP-completeness of FreeDP⟨S, Q⟩. (Of course, the tractability condition is equivalent to head domination in the absence of fds.)

Actually, the result of Kimelfeld et al. [13] is stronger than Theorem 2.8. In the tractable part (item 1), the problem MaxDP⟨S, Q⟩ is solved by an extremely simple algorithm that has been observed already by Buneman et al. [3]. We will show that the tractable side in the current paper is solved by essentially the same algorithm, with a slight modification. In the intractable part of Theorem 2.8 (item 2), Kimelfeld et al. further show some hardness of approximation for MaxDP⟨S, Q⟩. To simplify the presentation, we ignore approximation through most of this paper, except for Section 6 where we show that a similar hardness of approximation is in the intractable side of our dichotomy as well.

Example 3.2. Consider again the CQ Q⋆ over S⋆ (Example 2.1), but now let us use the schema S⋆0, which is the same as S⋆ except that it has the fd R1 : 1 → 2. We claim that, given input I⋆ and a⋆ = (a1, a2), the execution of UniRel⟨S⋆0, Q⋆⟩(I⋆, a⋆) returns an optimal solution.
As proof, we will show that for φ = R2(x, y2) the solution Jφ, constructed by the algorithm, is optimal. Let Jopt be an optimal solution. It is enough to show that every fact R2(c, a2) that is (Q⋆, a⋆)-useful in I⋆ is not Q⋆-useful in Jopt (which then implies that Q⋆(Jopt) ⊆ Q⋆(Jφ)). Indeed, if Jopt contains both R1(c, b1) and R2(c, a2), then the fd implies that b1 = a1, since I⋆ contains also R1(c, a1) (or otherwise R2(c, a2) is not (Q⋆, a⋆)-useful in I⋆). This means that Q⋆(Jopt) contains a⋆, which contradicts the fact that Jopt is a solution.

3.2 Functional Head Domination

The idea that is exploited in Example 3.2 to show the optimality of UniRel⟨S⋆0, Q⋆⟩ is the following. Although φR2 does not contain both y1 and y2 (which are the head variables of the single connected component of G∃(Q⋆)), img(φR2) contains them both. So intuitively, we can treat φR2 as if it contains both y1 and y2. This idea is formalized next.

Definition 3.3. (Functional Head Domination) Let S be a schema, and let Q be a CQ over S. We say that Q has functional head domination if for every connected component P of G∃(Q) there is an atom φ ∈ atoms(Q), such that Varh(P) ⊆ img(φ).

3. TRACTABILITY

In this section, we phrase the tractability condition on a schema S and an sjf-CQ Q over S. Also in this section we will prove that the condition is indeed sufficient for

which means that given an answer in Q(I⋆), every corresponding assignment maps x to the same value. In effect, this means that x can be treated as a head (rather than existential) variable, which may change the structure of G∃(Q) and can potentially lead to functional head domination. Indeed, if x becomes a head variable in Q⋆, then no existential variables remain in Q⋆, and then (functional) head domination trivially holds. The following lemma formalizes this intuition. The proof is straightforward, hence omitted.
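The unirelation algorithm of Section 3.1 can be sketched in Python for the special case of Q⋆. This is a minimal illustration under stated assumptions, not the general algorithm: the query is hard-coded, the instance below is hypothetical (chosen to satisfy the fd R1 : 1 → 2 of Example 3.2), and the helper names evaluate, useful_facts, unirel and optimum are introduced here only for the example. The brute-force optimum is included to compare against, and is exponential in the instance size.

```python
from itertools import chain, combinations


def evaluate(r1, r2):
    """Q*(y1, y2) :- R1(x, y1), R2(x, y2)."""
    return {(y1, y2) for (x, y1) in r1 for (x2, y2) in r2 if x == x2}


def useful_facts(r1, r2, a):
    """The (Q*, a)-useful facts of R1 and of R2."""
    a1, a2 = a
    xs = {x for (x, y1) in r1 if y1 == a1} & {x for (x, y2) in r2 if y2 == a2}
    return {(x, a1) for x in xs}, {(x, a2) for x in xs}


def unirel(r1, r2, a):
    """Try one candidate per atom; keep the one with the largest view."""
    u1, u2 = useful_facts(r1, r2, a)
    candidates = [(r1 - u1, r2), (r1, r2 - u2)]
    best = (set(), set())  # J <- empty instance
    for j1, j2 in candidates:
        if len(evaluate(j1, j2)) > len(evaluate(*best)):
            best = (j1, j2)
    return best


def subsets(facts):
    facts = list(facts)
    return chain.from_iterable(
        combinations(facts, k) for k in range(len(facts) + 1)
    )


def optimum(r1, r2, a):
    """Brute force: the largest |Q(J)| over all solutions J."""
    best = 0
    for s1 in subsets(r1):
        for s2 in subsets(r2):
            view = evaluate(set(s1), set(s2))
            if a not in view:
                best = max(best, len(view))
    return best


# Hypothetical instance; each first component of R1 has a single partner,
# so the fd R1 : 1 -> 2 holds.
R1 = {("c1", "a1"), ("c2", "a1"), ("c3", "b1")}
R2 = {("c1", "a2"), ("c1", "b2"), ("c2", "a2"), ("c3", "a2")}
A = ("a1", "a2")

J = unirel(R1, R2, A)
print(len(evaluate(*J)), optimum(R1, R2, A))
```

On this instance the candidate obtained by deleting the useful R2 facts is selected, in line with the argument of Example 3.2.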
Algorithm UniRel⟨S, Q⟩(I, a)
  J ← ∅
  for all φ ∈ atoms(Q) do
    Jφ ← I
    Remove from Jφ every (Q, a)-useful fact over Rφ
    if |Q(Jφ)| > |Q(J)| then
      J ← Jφ
  return J

Figure 2: The unirelation algorithm

Lemma 3.7. Let S be a schema, and let Q(y) be a CQ over S. Suppose that z is a variable in img(Varh(Q)), and let Q′ be the CQ Q(y, z) (obtained by appending z to the head). For all instances I over S, there is a (polynomial-time-computable) function β : Q(I) → Q′(I), such that β is a bijection from Q(J) to Q′(J) for every J ⊆ I.

As an example, the CQ Q⋆ over the schema S⋆0 of Example 3.2 has functional head domination, since img(φR2) contains both y1 and y2 (as previously explained). Another example follows.

Example 3.4. Consider the schema S and CQ Q of Example 2.2. Recall that the left side of Figure 1 shows the graph G∃(Q). Since G∃(Q) is connected and no atom φ of Q is such that img(φ) contains all the head variables, we conclude that Q does not have functional head domination. Now, let Q′ be obtained from Q by adding x to the head (so the head variables of Q′ are x and the yi). The right side of Figure 1 shows G∃(Q′). For the connected component P1 we have Varh(P1) = {x, y1, y2} ⊆ img(φT), and Varh(P2) = {y1, y3} ⊆ img(φS). Thus, Q′ has functional head domination.

For a CQ Q over a schema S, we denote by Q+ the CQ that is obtained by appending (in some order) all the existential variables in Var∃(Q) ∩ img(Varh(Q)) to the head of Q. As an example, for the CQ Q = Q⋆ over the schema S⋆1 of Example 3.6, the CQ Q+ is:

Q+(y1, y2, x) :− R1(x, y1), R2(x, y2)

By repeatedly applying Lemma 3.7 we get the following.

Lemma 3.8. Let S be a schema, and let Q be a CQ over S. Given an instance I over S, there is a (polynomial-time-computable) function β : Q(I) → Q+(I), such that β is a bijection from Q(J) to Q+(J) for every J ⊆ I.

As an immediate conclusion of Lemma 3.8, we get the following lemma.
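The unirelation algorithm of Figure 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the paper's implementation: a fact over Rφ is taken to be (Q, a)-useful when it is the image of φ under some match of Q that produces a, queries are evaluated by a naive join, and the instance, the relation contents and all function names are hypothetical.

```python
def matches(atoms, inst):
    """All assignments of the query variables that map every atom to a fact."""
    partial = [{}]
    for rel, args in atoms:
        extended = []
        for mu in partial:
            for fact in inst.get(rel, set()):
                cand = dict(mu)
                if all(cand.setdefault(v, c) == c for v, c in zip(args, fact)):
                    extended.append(cand)
        partial = extended
    return partial


def answers(head, atoms, inst):
    """Q(I): the head tuples produced by the matches."""
    return {tuple(mu[v] for v in head) for mu in matches(atoms, inst)}


def unirel(head, atoms, inst, a):
    """For each atom, delete the (Q, a)-useful facts of its relation only,
    and keep the candidate that leaves the most answers in the view."""
    best = {rel: set() for rel in inst}          # J <- empty instance
    useful_matches = [mu for mu in matches(atoms, inst)
                      if tuple(mu[v] for v in head) == a]
    for rel, args in atoms:
        useful = {tuple(mu[v] for v in args) for mu in useful_matches}
        cand = {r: set(facts) for r, facts in inst.items()}
        cand[rel] -= useful
        if len(answers(head, atoms, cand)) > len(answers(head, atoms, best)):
            best = cand
    return best


# Hypothetical instance for Q*(y1, y2) :- R1(x, y1), R2(x, y2), without fds.
HEAD = ("y1", "y2")
Q_STAR = [("R1", ("x", "y1")), ("R2", ("x", "y2"))]
INSTANCE = {
    "R1": {("g1", "u1"), ("g1", "u2"), ("g2", "u1")},
    "R2": {("g1", "f1"), ("g2", "f1"), ("g2", "f2")},
}
J = unirel(HEAD, Q_STAR, INSTANCE, ("u1", "f1"))
```

On this fd-free instance the returned J is a solution that keeps one answer, whereas deleting R1(g1, u1) together with R2(g2, f1) keeps two. This is consistent with the fact that optimality of the algorithm is claimed only under the conditions discussed in this section.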
Lemma 3.9. Let Q be a CQ over a schema S.
• There is a polynomial-time reduction from the problem FreeDP⟨S, Q⟩ to the problem FreeDP⟨S, Q+⟩ and vice versa.
• There is a polynomial-time, approximation-preserving reduction from MaxDP⟨S, Q⟩ to MaxDP⟨S, Q+⟩ and vice versa.

The following theorem extends the idea illustrated in Example 3.2: functional head domination is a sufficient condition for MaxDP⟨S, Q⟩ to be solvable in polynomial time. This theorem follows directly from Proposition 3.11 and Theorem 3.12 that are given later in this section.

Theorem 3.5. Let S be a schema, and let Q be an sjf-CQ over S. If Q has functional head domination, then UniRel⟨S, Q⟩ optimally solves MaxDP⟨S, Q⟩.

Nevertheless, functional head domination is still not a necessary condition for the optimality of the unirelation algorithm, as the following example shows.

Example 3.6. Consider again the CQ Q⋆ over S⋆ (Example 2.1), but now let us use the schema S⋆1, which is the same as S⋆ except that it has the fd R1 : 2 → 1 (recall that in Example 3.2 we had R1 : 1 → 2). We again claim that, given input I⋆ and a⋆ = (a1, a2), the execution of UniRel⟨S⋆1, Q⋆⟩(I⋆, a⋆) returns an optimal solution. To see that, let Jopt be an optimal solution. If none of the facts R2(c, a2) that are (Q⋆, a⋆)-useful in I⋆ are Q⋆-useful in Jopt, we are done as in Example 3.2. Otherwise, take such a fact R2(c, a2), and we will show that none of the facts R1(c′, a1) that are (Q⋆, a⋆)-useful in I⋆ are Q⋆-useful in Jopt (which means that Q⋆(Jopt) ⊆ Q⋆(Jφ) for φ = R1(x, y1)). Indeed, if R1(c′, a1) is a counterexample, then R1 : 2 → 1 implies c′ = c, since we know that I⋆ contains R1(c, a1) (or otherwise R2(c, a2) is not (Q⋆, a⋆)-useful in I⋆), and then a⋆ ∈ Q⋆(Jopt), which is of course a contradiction.

3.3 Tractability Condition
We can now define our tractability condition.

Definition 3.10.
(Tractability Condition) For an sjfCQ Q over a schema S, the tractability condition is that Q+ has functional head domination. Later on, we show that the tractability condition indeed implies tractability. But first, the following proposition states that this condition is weaker than functional head domination, which in turn is weaker than head domination. Proposition 3.11. Let Q be a CQ over a schema S. 1. If Q has head domination, then Q has functional head domination. 2. If Q has functional head domination, then Q+ has functional head domination. Furthermore, none of the two reverse implications hold. Following is the main result for this section, and it says that when the tractability condition is satisfied (and in the absence of self joins), the unirelation algorithm is optimal. The optimality of the unirelation algorithm in Example 3.6 is due to the fact that the existential variable x is functionally dependent on the head variables (i.e., x ∈ img({y1 , y2 })), 6 Theorem 3.12. Let Q be an sjf-CQ over S. If Q+ has functional head domination, then UniRelhS, Qi optimally solves MaxDPhS, Qi (in polynomial time). y1 x1 x2 y2 y Next, we illustrate applications of Theorem 3.12. Figure 3: Gv (Q) in Example 4.1 Example 3.13. Consider again the schema S and CQ Q of Example 2.2, where it is stated that the existential variable x is in img({y1 , y2 }). So x ∈ img(Varh (Q)). The reader can verify that none of x1 and x2 are in img(Varh (Q)). Therefore, the CQ Q+ is the following: The next theorem states some properties of fd-separation. This theorem is important for this work, as it is used repeatedly in the proofs for the results of the following section. Q(y1 , y2 , y3 , x) :− R1 (x1 , y1 ), R2 (x2 , y2 ), S(y1 , y3 , x), T (x, x1 , x2 ) Theorem 4.2. Let Q be a CQ over a schema S, let Z ⊆ Var(Q) be a set of variables, and let z ∈ Var(Q) be a variable. The following hold. 1. 
fdSep(Z, z) is closed under fds; that is:
img(fdSep(Z, z)) = fdSep(Z, z)
2. fdSep(Z, z) is closed under fd-separation by subsets; that is, for all Z′ ⊆ Var(Q) we have:
Z′ ⊆ fdSep(Z, z) ⇒ fdSep(Z′, z) ⊆ fdSep(Z, z)
3. Z determines z whenever Z separates from z a set that determines z; that is, for all Z′ ⊆ Var(Q) we have:
(z ∈ img(Z′) ∧ Z′ ⊆ fdSep(Z, z)) ⇒ z ∈ img(Z)

Hence, Q+ is actually the CQ Q′ of Example 3.4, where it is shown to have functional head domination. Therefore, the tractability condition is satisfied, and hence, Theorem 3.12 says that MaxDP⟨S, Q⟩ is in polynomial time.

As another example, we will show how Theorem 2.4 (by Cong et al. [5]) follows from Theorem 3.12 for the class of sjf-CQs. Let S be a schema and let Q be an sjf-CQ over S, such that Q is key preserving. It is straightforward to show that Q being key preserving implies that Q+ has no existential variables. Hence, G∃(Q+) has no edges, and so Q+ has (functional) head domination. Then, Theorem 3.12 says that MaxDP⟨S, Q⟩ is in polynomial time.

5. HARDNESS
In this section, we prove that for an sjf-CQ Q over a schema S, if the tractability condition is violated then the problem FreeDP⟨S, Q⟩ is NP-complete. In the next section, we also show a hardness-of-approximation result for MaxDP⟨S, Q⟩. We will discuss the main lemmas involved in the proof, while complementary intermediate results are in the appendix. Similarly to the proof of hardness for Theorem 2.8 [13], we prove hardness here by a reduction from FreeDP⟨S⋆, Q⋆⟩, where S⋆ and Q⋆ are those defined in Example 2.1. (Nevertheless, the reductions here are significantly more involved than those of [13].) We first set some notation and assumptions that we will hold for the remainder of this section.

4. FD-SEPARATION
Before we proceed to proving the hardness side of our dichotomy, we introduce the concept of fd-separation, which plays a significant role in our proofs. Let Q be a CQ over a schema S.
The variable co-occurrence graph of Q, denoted Gv(Q), is the graph that has Var(Q) as the set of nodes, and an edge {z1, z2} whenever z1 and z2 co-occur in the same atom of Q (i.e., {z1, z2} ⊆ Var(φ) for some φ ∈ atoms(Q)). Consider two variables z1, z2 ∈ Var(Q) and a subset Z of Var(Q). We say that Z separates z1 from z2 if every path of Gv(Q) connecting z1 and z2 visits one or more nodes in Z. Observe that as a special case, if z1 ∈ Z then Z separates z1 from every other variable z2. Moreover, if Z separates z1 from itself then z1 is necessarily in Z. Let Z be a subset of Var(Q), and let z1, z2 ∈ Var(Q) be two variables. We say that Z fd-separates z1 from z2 if img(Z) separates z1 from z2. We also say that an atom φ of Q fd-separates z1 from z2 if Var(φ) fd-separates z1 from z2. We denote by fdSep(Z, z) (respectively, fdSep(φ, z)) the set of all the variables z′ of Q, such that Z (respectively, φ) fd-separates z from z′.

5.1 Notation and Assumptions
For the remainder of this section, we fix a schema S = (R, ∆) and an sjf-CQ Q over S, such that Q+ does not have functional head domination. Membership of FreeDP⟨S, Q⟩ in NP is straightforward (nondeterministically choose a subinstance and test whether it is a side-effect-free solution). So our goal is to prove that FreeDP⟨S, Q⟩ is NP-hard. By Lemma 3.9, it suffices to prove that FreeDP⟨S, Q+⟩ is NP-hard. Therefore, to simplify the presentation we will assume that Q = Q+. We fix a component P of G∃(Q), such that none of the atoms φ of Q satisfy Varh(P) ⊆ img(φ); such P exists, since Q does not have functional head domination. For the sake of presentation, we rephrase S⋆ and Q⋆ for a clear distinction from S and Q. Specifically, we will assume that the two relation symbols of S⋆ are R1⋆ and R2⋆ (rather than R1 and R2 as in Example 2.1), and that the CQ Q⋆ over S⋆ is the following sjf-CQ:

Example 4.1. Let S be a schema with three ternary relation symbols R1, R2 and S, and the fd S : {1, 3} → 2.
Consider the following CQ over S:

Q(y1, y2, y) :− R1(y1, y, x1), S(x1, y, x2), R2(x2, y, y2)

The graph Gv(Q) is depicted in Figure 3. The node x1 does not fd-separate y1 from y2, since the path y1 - y - y2 does not contain any node in img({x1}) = {x1}. Similarly, x2 does not fd-separate y1 from y2. However, {x1, x2} fd-separates y1 from y2, since every path between y1 and y2 visits a node in img({x1, x2}) = {x1, x2, y}. Finally, observe that φS fd-separates y1 from y2.

Q⋆(y1⋆, y2⋆) :− R1⋆(x⋆, y1⋆), R2⋆(x⋆, y2⋆)

It is known that FreeDP⟨S⋆, Q⋆⟩ is NP-complete [3]. Recall that S⋆ has no fds. Also recall that our goal is to devise a reduction from FreeDP⟨S⋆, Q⋆⟩ to FreeDP⟨S, Q⟩. So, in the remainder of this section we fix the input I⋆ and a⋆ for FreeDP⟨S⋆, Q⋆⟩. We assume that a⋆ is the pair (✁, ✄). Our reduction will construct the input I and a for the problem FreeDP⟨S, Q⟩.

Recall from Section 4 that fdSep(φ, y) is the set of all variables z ∈ Var(Q), such that Var(φ) fd-separates y from z (i.e., img(φ) separates y from z in the graph Gv(Q)).

Example 5.2. We continue with our running example, which is introduced in Example 5.1. Figure 5 shows the graph Gv(Q). The grey shadow should be ignored for now; we discuss it later in this section. Recall that P consists of all the atoms of Q, except for V(x3, y2, y3). Suppose that the parameters of the template construction are y1, y2, φ1 = R1(x1, x) and φ2 = R2(x, x2). (Later, we discuss the manner in which these specific parameters are chosen.) The sets img(φ1) and img(φ2) are surrounded by boxes with dotted edges. For each variable z of Q, the corresponding node contains Θ(z) below z. As an example, consider the variable x′. As shown in the figure, Θ(x′) = ⟨x⋆, y1⋆, y2⋆⟩.
The occurrence of x⋆ is due to the fact that x′ is an existential variable in P ; that of y1⋆ is due to the fact that φ2 does not fd-separate x′ from y1 , as evidenced by the path x′ — x1 — y1 , and similar is the explanation for y2⋆ . As another example, note that Θ(x′′ ) = hx⋆ , ✁, ✄i, since φ2 (resp., φ1 ) fd-separates x′′ from y1 (resp., y2 ), as the edges that are incident to x′′ connect x′′ to variables in the intersection img(φ1 ) ∩ img(φ2 ). Finally, for x3 we have Θ(x3 ) = h⋄, ✁, y2⋆ i, where the reason for the occurrence of ⋄ is that x3 is not in Var∃ (P ); besides that, the explanations for ✁ and y2⋆ are like those for x′′ and x′ , respectively. Example 5.1. A running example in this section is over the following CQ. Q(y1 , y2 , y3 ) :− S1 (y1 , x1 ), R1 (x1 , x), R2 (x, x2 ), S2 (x2 , y2 ), T (x1 , x′ , x2 ), U (x, x′′ , y3 ), V (x3 , y2 , y3 ) The schema S has the relation symbols of Q (with the corresponding arities). In addition, S has the following fds. S1 : 2 → 1 S2 : 1 → 2 U :1→3 Observe that Q = Q+ since no existential variable belongs to img({y1 , y2 , y3 }). The graph G∃ (Q) has two connected components, where one consists of just V (x3 , y2 , y3 ) and the other, which we fix here as P , consists of all remaining atoms. It is easy to verify that no atom φ of Q is such that img(φ) contains all of y1 , y2 and y3 (which are the head variables in P ); hence the tractability condition is violated. As example input for FreeDPhS⋆ , Q⋆ i, we will use the instance I ⋆ with the following facts: R1 (0, ✁) R2 (0, ✄) R1 (0, a) R2 (0, b) R1 (1, ✁) R2 (1, ✄) R1 (1, c) Let µ be a match in M(Q⋆ , I ⋆ ). The assignment Θµ is the same as Θ, except that in the image, every occurrence of x⋆ , y1⋆ and y2⋆ is replaced with µ(x⋆ ), µ(y1⋆ ) and µ(y2⋆ ), respectively. We call Θµ an instantiation of Θ. The template construction builds the instance IΘ that is the union of the Θµ (φ) over all matches µ ∈ M(Q⋆ , I ⋆ ) and atoms φ of Q. 
The matches for Q⋆ in I⋆ are shown in the leftmost column in the table of Figure 4 (the other columns are discussed later); as a shorthand notation, the figure uses the triple c, b1, b2 (e.g., 0, ✁, b) to specify the match that maps x⋆, y1⋆ and y2⋆ to c, b1 and b2, respectively. Note that Q⋆(I⋆) = {(✁, ✄), (✁, b), (a, ✄), (a, b), (c, ✄)}. The matches that produce a⋆ = (✁, ✄) are repeated (for later discussion) at the bottom of the column. The reader can verify that there is no side-effect-free solution for I⋆ and a⋆.

Example 5.3. We continue with our running example. For the sake of presentation, in our examples we may write just cb1b2 instead of ⟨c, b1, b2⟩ (e.g., ⋄✁✄ instead of ⟨⋄, ✁, ✄⟩). Recall that Figure 5 shows the values that Θ assigns to the variables. Each of the top six rows in the table of Figure 4 corresponds to one of the six matches µ ∈ M(Q⋆, I⋆). (We later discuss the bottom two rows.) The leftmost element (under M(Q⋆, I⋆)) is µ, as explained in Example 5.1. The cell under the atom φ contains the values of the fact Θµ(φ). As an example, if µ is the match represented by 0, ✁, b and φ is S1(y1, x1), then Θµ(φ) = S1(⋄✁✄, 0✁✄). So, the top six rows in the table depict IΘ (with some facts repeated).

5.2 Template Construction
As a first step in the reduction from FreeDP⟨S⋆, Q⋆⟩ to FreeDP⟨S, Q⟩, we define a template construction, which produces an instance IΘ over S. In turn, IΘ will be a subinstance of the final instance I built in the reduction. The construction is as follows. Recall that we are given the input I⋆ and a⋆ = (✁, ✄) for FreeDP⟨S⋆, Q⋆⟩. Also recall that Q⋆ uses three variables: x⋆, y1⋆, y2⋆. We fix a constant ⋄ that does not appear in I⋆. We first define a template assignment Θ, which maps every variable z ∈ Var(Q) to a triple Θ(z) = ⟨w, w1, w2⟩, where w is either the variable x⋆ or the constant ⋄, w1 is either the variable y1⋆ or the constant ✁, and w2 is either the variable y2⋆ or the constant ✄.
More particularly, the entire template construction is parameterized by two variables y1, y2 ∈ Varh(P) and two atoms φ1, φ2 ∈ P. In the next section, we discuss in detail the way these parameters are chosen (which is critical for the correctness of the reduction); for now, assume that they are already given. Given the parameters y1, y2, φ1 and φ2, the triple Θ(z), where z ∈ Var(Q), is constructed as follows.
• Begin with Θ(z) = ⟨x⋆, y1⋆, y2⋆⟩.
• If z ∉ Var∃(P), then replace x⋆ with ⋄.
• If z ∈ fdSep(φ2, y1), then replace y1⋆ with ✁.
• If z ∈ fdSep(φ1, y2), then replace y2⋆ with ✄.

Next, we discuss some basic properties of the template construction, which hold regardless of the choice of the parameters φi and yi. The following lemma states that IΘ is an instance of S (i.e., all fds are satisfied).

Lemma 5.4. IΘ satisfies every fd of ∆.

The next lemma, which is central to the correctness of our construction, states that there are no matches for Q in IΘ except for the instantiations of Θ (used for constructing IΘ).

Lemma 5.5. M(Q, IΘ) = {Θµ | µ ∈ M(Q⋆, I⋆)}.

The tuple a (serving as input for FreeDP⟨S, Q⟩) is the one obtained from the head tuple y by replacing each variable with the triple ⟨⋄, ✁, ✄⟩. As an example, with our shorthand notation the tuple a in our running example is (⋄✁✄, ⋄✁✄, ⋄✁✄).
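The construction of Θ can be sketched in Python for the running example. This is an illustrative sketch only, not part of the paper. It assumes that img(Z) is the closure of Z under the fds and that fdSep(Z, z) consists of img(Z) together with every variable that cannot be reached from z in Gv(Q) once img(Z) is removed. The strings "x*", "y1*" and "y2*" stand for x⋆, y1⋆ and y2⋆, and all function and constant names are hypothetical.

```python
def img(variables, atoms, fds):
    """Close a set of variables under the fds, read through the atoms of Q."""
    closed = set(variables)
    changed = True
    while changed:
        changed = False
        for rel, args in atoms:
            for lhs, rhs in fds.get(rel, []):
                if all(args[i - 1] in closed for i in lhs) and args[rhs - 1] not in closed:
                    closed.add(args[rhs - 1])
                    changed = True
    return closed


def fd_sep(Z, z, atoms, fds):
    """Variables z2 such that every path of Gv(Q) from z to z2 visits img(Z)."""
    blocked = img(Z, atoms, fds)
    all_vars = {v for _, args in atoms for v in args}
    if z in blocked:
        return all_vars
    reach, stack = {z}, [z]
    while stack:
        u = stack.pop()
        for _, args in atoms:
            if u in args:
                for v in args:
                    if v not in blocked and v not in reach:
                        reach.add(v)
                        stack.append(v)
    return all_vars - reach


def theta(z, exist_P, y1, y2, phi1, phi2, atoms, fds):
    """The triple assigned to z by the template assignment."""
    w = "x*" if z in exist_P else "⋄"
    w1 = "✁" if z in fd_sep(phi2, y1, atoms, fds) else "y1*"
    w2 = "✄" if z in fd_sep(phi1, y2, atoms, fds) else "y2*"
    return (w, w1, w2)


# Running example (Example 5.1); fds S1 : 2 -> 1, S2 : 1 -> 2, U : 1 -> 3.
ATOMS = [("S1", ("y1", "x1")), ("R1", ("x1", "x")), ("R2", ("x", "x2")),
         ("S2", ("x2", "y2")), ("T", ("x1", "x'", "x2")),
         ("U", ("x", "x''", "y3")), ("V", ("x3", "y2", "y3"))]
FDS = {"S1": [((2,), 1)], "S2": [((1,), 2)], "U": [((1,), 3)]}
EXIST_P = {"x1", "x", "x2", "x'", "x''"}      # existential variables of P
PHI1, PHI2 = ("x1", "x"), ("x", "x2")         # Var(R1(x1, x)), Var(R2(x, x2))

THETA = {z: theta(z, EXIST_P, "y1", "y2", PHI1, PHI2, ATOMS, FDS)
         for _, args in ATOMS for z in args}
```

With these inputs the sketch assigns, for instance, ("x*", "y1*", "y2*") to x′ and ("⋄", "✁", "y2*") to x3, in agreement with the triples reported for Figure 5.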
M(Q⋆, I⋆) | S1(y1, x1) | R1(x1, x) | R2(x, x2) | S2(x2, y2) | T(x1, x′, x2) | U(x, x′′, y3) | V(x3, y2, y3) | Q(I)
--- | --- | --- | --- | --- | --- | --- | --- | ---
0, ✁, ✄ | ⋄✁✄, 0✁✄ | 0✁✄, 0✁✄ | 0✁✄, 0✁✄ | 0✁✄, ⋄✁✄ | 0✁✄, 0✁✄, 0✁✄ | 0✁✄, 0✁✄, ⋄✁✄ | ⋄✁✄, ⋄✁✄, ⋄✁✄ | ⋄✁✄, ⋄✁✄, ⋄✁✄
0, ✁, b | ⋄✁✄, 0✁✄ | 0✁✄, 0✁✄ | 0✁✄, 0✁b | 0✁b, ⋄✁b | 0✁✄, 0✁b, 0✁b | 0✁✄, 0✁✄, ⋄✁✄ | ⋄✁b, ⋄✁b, ⋄✁✄ | ⋄✁✄, ⋄✁b, ⋄✁✄
0, a, ✄ | ⋄a✄, 0a✄ | 0a✄, 0✁✄ | 0✁✄, 0✁✄ | 0✁✄, ⋄✁✄ | 0a✄, 0a✄, 0✁✄ | 0✁✄, 0✁✄, ⋄✁✄ | ⋄✁✄, ⋄✁✄, ⋄✁✄ | ⋄a✄, ⋄✁✄, ⋄✁✄
0, a, b | ⋄a✄, 0a✄ | 0a✄, 0✁✄ | 0✁✄, 0✁b | 0✁b, ⋄✁b | 0a✄, 0ab, 0✁b | 0✁✄, 0✁✄, ⋄✁✄ | ⋄✁b, ⋄✁b, ⋄✁✄ | ⋄a✄, ⋄✁b, ⋄✁✄
1, ✁, ✄ | ⋄✁✄, 1✁✄ | 1✁✄, 1✁✄ | 1✁✄, 1✁✄ | 1✁✄, ⋄✁✄ | 1✁✄, 1✁✄, 1✁✄ | 1✁✄, 1✁✄, ⋄✁✄ | ⋄✁✄, ⋄✁✄, ⋄✁✄ | ⋄✁✄, ⋄✁✄, ⋄✁✄
1, c, ✄ | ⋄c✄, 1c✄ | 1c✄, 1✁✄ | 1✁✄, 1✁✄ | 1✁✄, ⋄✁✄ | 1c✄, 1c✄, 1✁✄ | 1✁✄, 1✁✄, ⋄✁✄ | ⋄✁✄, ⋄✁✄, ⋄✁✄ | ⋄c✄, ⋄✁✄, ⋄✁✄
0, ✁, ✄ | ⋄✁✄, 0✁✄ | 0✁✄, ♥φ0 | ♥φ0, 0✁✄ | 0✁✄, ⋄✁✄ | 0✁✄, 0✁✄, 0✁✄ | ♥φ0, ♥φ0, ♥φ0 | ♥φ0, ⋄✁✄, ♥φ0 | ⋄✁✄, ⋄✁✄, ♥φ0
1, ✁, ✄ | ⋄✁✄, 1✁✄ | 1✁✄, ♥φ1 | ♥φ1, 1✁✄ | 1✁✄, ⋄✁✄ | 1✁✄, 1✁✄, 1✁✄ | ♥φ1, ♥φ1, ♥φ1 | ♥φ1, ⋄✁✄, ♥φ1 | ⋄✁✄, ⋄✁✄, ♥φ1

Figure 4: The instance I constructed in the running example (introduced in Example 5.1)

the existence of a side-effect-free solution for IΘ and a, and on the other direction, existence of a side-effect-free solution JΘ for IΘ and a implies the existence of a side-effect-free solution for I⋆ and a⋆ provided that JΘ does not miss any of the facts over the encounters.

[Figure 5 node labels: Θ(x′) = ⟨x⋆, y1⋆, y2⋆⟩; Θ(x1) = ⟨x⋆, y1⋆, ✄⟩; Θ(x) = ⟨x⋆, ✁, ✄⟩; Θ(x2) = ⟨x⋆, ✁, y2⋆⟩; Θ(y1) = ⟨⋄, y1⋆, ✄⟩; Θ(y3) = ⟨⋄, ✁, ✄⟩; Θ(y2) = ⟨⋄, ✁, y2⋆⟩; Θ(x′′) = ⟨x⋆, ✁, ✄⟩; Θ(x3) = ⟨⋄, ✁, y2⋆⟩. Dotted boxes mark img(φ1) and img(φ2).]

Lemma 5.7. Suppose that P has two non-encounters γ1 and γ2, such that Θ(γ1) contains y1⋆ and Θ(γ2) contains y2⋆. Then, the following are equivalent.
1. There is a side-effect-free solution for I⋆ and a⋆.
2. There is a side-effect-free solution JΘ for IΘ and a, such that Rφ^JΘ = Rφ^IΘ for all encounters φ.

In the next section we branch into cases, based on properties of S and Q, and show how we build on Lemma 5.7 to complete the reduction in each of these cases.
Figure 5: Gv(Q) in the running example (introduced in Example 5.1)

5.3 Cases
We now describe the different cases we consider. In addition, for each case we specify the parameters we use for the template construction. We use the following terminology. A set Z of variables is simultaneously determined if there is an atom φ of Q, such that Z ⊆ img(φ). Two variables z1 and z2 are fd-separable if there is an atom φ of Q with the following properties:
• φ fd-separates z1 from z2.
• img(φ) contains neither z1 nor z2.

Up to now, we showed how to construct IΘ and a from the input I⋆ and a⋆ for FreeDP⟨S⋆, Q⋆⟩, given parameters for the template construction. One would wonder whether we are done in the sense that we can use IΘ and a directly as input for FreeDP⟨S, Q⟩. As stated by Lemma 5.7 below, it is true that existence of a side-effect-free solution for I⋆ and a⋆ implies the existence of a side-effect-free solution for IΘ and a. In one of the cases we consider later we show that the other direction is also true; however, this is not the general case: there may be a side-effect-free solution for IΘ and a even if no such solution exists for I⋆ and a⋆, as the following example shows.

Case 1: P has two head variables y1 and y2 that are not simultaneously determined. The variables y1 and y2 will act as parameters for the template construction. With y1 and y2 chosen, we branch into two sub-cases, where in each sub-case we choose the parameters φ1 and φ2 differently.

Example 5.6. Consider again the instances I⋆ and IΘ, and the tuples a⋆ and a, constructed in our running example. Recall that IΘ is depicted in the top part of the table in Figure 4. Recall from Example 5.1 that there is no side-effect-free solution for I⋆ and a⋆. But there is a side-effect-free solution for IΘ and a. To see that, observe (in Figure 4) that the facts T(0✁✄, 0✁✄, 0✁✄) and T(1✁✄, 1✁✄, 1✁✄) are used only in the matches that produce a.
From Lemma 5.5 we conclude that by removing these two facts from IΘ we get a solution that is side-effect free. Case 1.a: y1 and y2 are fd-separable. In Case 1.a, we choose an atom φ, such that φ fd-separates y1 from y2 and img(φ) contains neither y1 nor y2 . Lemma 5.8. In Case 1.a, φ ∈ P . Now, both parameters φ1 and φ2 are chosen to be φ. Recall that φ1 and φ2 are required to be in P , so this choice is justified by Lemma 5.8, The “problem” illustrated in Example 5.6 is that one of the atoms, namely T (x1 , x′ , x2 ), is such that Θ(φ) contains both y1⋆ and y2⋆ . Indeed, in the example both of y1⋆ and y2⋆ appear in Θ(x′ ). An atom φ of Q such that Θ(φ) contains both y1⋆ and y2⋆ (not necessarily in the image of the same variable) is called an encounter. Following is a key lemma that will later guide us in the further definition of the reduction. This lemma shows that, under some condition, the reduction thus far is “correct” up to the issue of Example 5.6. That is, existence of a side-effect-free solution for I ⋆ and a⋆ implies Example 5.9. Consider the following CQ. Q(y1 , y2 , y3 ) :− S1 (x′′1 , y1 , x′1 ), R1 (x′1 , y3 , x1 ), T (x1 , x2 ), R2 (x2 , y3 , x′2 ), S2 (x′2 , y2 , x′′2 ) The schema S has the relation symbols of Q (with the corresponding arities). In addition, S has the following fds. R1 : 3 → 2 9 S1 : 3 → 2 S2 : 1 → 2 y1 h⋄, y1⋆ , ✄i x′1 hx⋆ , y1⋆ , ✄i x′′1 x′′2 hx⋆ , y1⋆ , ✄i hx⋆ , ✁, y2⋆ i x1 x2 hx⋆ , ✁, ✄i hx⋆ , ✁, ✄i y2 x′′1 y1 h⋄, ✁, y2⋆ i h⋄, y1⋆ , ✄i x′2 x′1 hx⋆ , ✁, y2⋆ i x′′2 hx⋆ , y1⋆ , ✄i hx⋆ , y1⋆ , ✄i y3 hx⋆ , ✁, y2⋆ i x1 x2 hx⋆ , y1⋆ , ✄i hx⋆ , ✁, y2⋆ i y2 h⋄, ✁, y2⋆ i x′2 hx⋆ , ✁, y2⋆ i y3 h⋄, ✁, ✄i h⋄, ✁, ✄i img(φ1 ) img(φ2 ) img(φ1 ) = img(φ2 ) Figure 7: Case 1.b—Gv (Q) in Example 5.12 Figure 6: Case 1.a—Gv (Q) in Example 5.9 Example 5.12. Consider again the CQ Q and schema S of Example 5.9, except that now we assume that the fds of S are the following. 
It is easy to verify that G∃ (Q) is connected (i.e., has just one connected component), and that no atom φ satisfies {y1 , y2 } ⊆ img(φ). Therefore, y1 and y2 are not simultaneously determined, which means that we are in Case 1. Figure 6 shows the graph Gv (Q). Surrounded by a rectangle (with dotted edges) is the set img(φT ), where φT is the atom T (x1 , x2 ). As can be seen in the figure, neither y1 nor y2 appear in img(φT ), and img(φT ) separates y1 from y2 . Therefore, y1 and y2 are fd-separable, which means that we are in Case 1.a. As φ1 and φ2 we select the atom φT (actually, φT is the only possible choice). Finally, having chosen all the parameters of the template construction, the box of each variable z in Figure 6 contains the triple Θ(z). R1 : {3, 2} → 1 R2 : {1, 2} → 3 S1 : 3 → 2 S2 : 1 → 2 Like in Example 5.9, y1 and y2 are not simultaneously determined, which means that we are in Case 1. Figure 7 shows the graph Gv (Q). Unlike Example 5.9, now the set img(φT ) contains just x1 and x2 , and as can be seen in Figure 7 the atom φT no longer fd-separates y1 from y2 . Actually, it is easy to see that for each atom φ of Q it holds that img(φ) either contains one of y1 and y2 or does not separate y1 from y2 . We are therefore in Case 1.b. The atoms φ with y1 ∈ img(φ) are φR1 and φS1 , namely, R1 (x′1 , y3 , x1 ) and S1 (x′′1 , y1 , x′1 ), respectively. We have the following: Observe the following for the CQ Q over the schema S in Example 5.9 (with Gv (Q) depicted in Figure 6). First, none of the atoms are encounters. Second, Θ(y1 ) contains y1⋆ and Θ(y2 ) contains y2⋆ . The following lemma shows that these observations always hold in Case 1.a. This lemma is useful, since it suffices for completing the reduction in Case 1.a. fdSep(φR1 , y2 ) = {x′′1 , y1 , x′1 , x1 , y3 } fdSep(φS1 , y2 ) = {x′′1 , y1 , x′1 } Since |fdSep(φR1 , y2 )| > |fdSep(φS1 , y2 )|, we choose φ1 = φR1 . We similarly choose φ2 = φR2 . 
In Figure 7, the sets img(φ1) and img(φ2) are surrounded by polygons with dotted edges. Finally, with all parameters chosen, the box of each variable z in Figure 7 contains the triple Θ(z).

Lemma 5.10. In Case 1.a, the following hold.
• There are no encounters.
• For each i ∈ {1, 2}, the tuple Θ(yi) contains yi⋆.

So, for the instance I of our reduction from the problem FreeDP⟨S⋆, Q⋆⟩ to FreeDP⟨S, Q⟩ we simply choose I = IΘ. Based on Lemma 5.10, we can now apply Lemma 5.7 with γ1 and γ2 being any atoms in P that contain y1 and y2, respectively.

Note that in Example 5.12, the atom φT (i.e., T(x1, x2)) is an encounter, since Θ(x1) contains y1⋆ and Θ(x2) contains y2⋆. In particular, we cannot just set I = IΘ as in Case 1.a: similarly to Example 5.6 it can be shown that there is always a side-effect-free solution for IΘ and a, no matter what I⋆ and a⋆ are. In the following section, we show how to deal with encounters for both this case and Case 2, which we discuss next.

Corollary 5.11. In Case 1.a, there is a side-effect-free solution for I⋆ and a⋆ if and only if there is a side-effect-free solution for I and a.

Still within Case 1, the next case we consider is the complement of Case 1.a.

Case 2: Every two head variables of P are simultaneously determined. In Case 2, the parameters of the template construction are chosen as follows. We first look at the sets Z ⊆ Varh(P) that are simultaneously determined by an atom of P, and among those we select a Z1 of maximal cardinality. Note that Z1 is a strict subset of Varh(P), due to our assumption that P violates the tractability condition. Next, among those sets Z that are not contained in Z1, we again select a Z2 of maximal cardinality. Note that neither Z1 ⊆ Z2 nor Z2 ⊆ Z1 holds. So, as y1 we select an element of Z1 \ Z2, and as y2 we select an element of Z2 \ Z1. Finally, we consider the atoms φ ∈ P with img^P_h(φ) = Z1, and select φ1 to be a φ with a maximal fdSep(φ, y2).
Similarly, we consider Case 1.b: y1 and y2 are not fd-separable. In Case 1.b, we choose the atoms φ1 , φ2 ∈ P (used as parameters of the template construction) as follows. The atom φ1 is such that y1 ∈ img(φ1 ) (hence, y2 ∈ / img(φ1 ) since we are in Case 1) and fdSep(φ1 , y2 ) has the maximal number of elements among the atoms φ ∈ P that satisfy y1 ∈ img(φ). (Actually, it is enough for fdSep(φ1 , y2 ) not be contained in any fdSep(φ, y2 ) for those atoms φ.) The atom φ2 is selected similarly, except that y1 and y2 swap roles. This manner of selecting φ1 and φ2 is crucial for the correctness of our reduction. 10 the atoms φ ∈ P with img P h (φ) = Z2 , and select φ2 to be a φ with a maximal fdSep(φ, y1 ). We now have all four parameters needed for the template construction. Again, this specific manner of selecting the parameters is needed for the correctness of our reduction. to I the fact νcφ (γ) (unless it is already in I). Of course, we need to show that I is indeed an instance over S (i.e., the fds are satisfied). Example 5.13. An example of Case 2 is the CQ Q over the schema S in our running example. Indeed, y1 and y2 are simultaneously determined (by φT ), and so are y1 and y3 (by φR1 ) as well as y2 and y3 (by φR2 ). In the parameter setting, Z1 is {y1 , y3 }, Z2 is {y2 , y3 }, and the chosen parameters are y1 , y2 , φR1 and φR2 . Recall that Figure 5 shows Gv (Q) along with Θ(z) for each z ∈ Var(Q). Observe that T (x1 , x′ , x2 ) is an encounter (since Θ(x′ ) contains both y1⋆ and y2⋆ ). Example 5.15. We continue with our running example, which falls in Case 2. There is a single encounter: φT = T (x1 , x′ , x2 ). As shown in Figure 4, there are two matches in Ma⋆ (Q⋆ , I ⋆ ): the one of the first row maps x⋆ to 0, and the one of the fifth row maps x⋆ to 1; let us call these matches µ0 and µ1 , respectively. In Figure 5, the set img(φT ) is shaded by a grey polytope. 
Since img(φT ) does not separate any node from y3 , except for the nodes of img(φT ) itself, we have ν0φ (z) = ♥φ0 T and ν1φ (z) = ♥φ1 T for each variable z outside img(φT ), and for each z ∈ img(φT ) we have ν0φ (z) = Θµ0 (z) and ν1φ (z) = Θµ1 (z). The facts that are added to construct I are the two in the bottom of Figure 4. Lemma 5.14. In Cases 1.b and 2, I satisfies each fd of S. As shown in Example 5.13 (and previously in Example 5.6), in Case 2 there may be encounters, and we deal with those in the next section. 5.4 Handling Encounters In the proof of correctness for the reduction, we prove the following intermediate result. For each µ ∈ Ma⋆ (Q⋆ , I ⋆ ) with µ(x⋆ ) = c, the mapping νcφ maps φ to Θµ (φ), and moreover, gives rise to a unique new answer b ∈ Q(I) (that contains at least one occurrence of ♥φc ). See Figure 4 for illustration. In turn, this intermediate result is used for proving the following lemma. Our approach to dealing with encounters (in Cases 1.b and 2) is as follows. Suppose that φ is an encounter. Let Ma⋆ (Q⋆ , I ⋆ ) be the set of all matches µ ∈ M(Q⋆ , I ⋆ ) with µ(y1⋆ ) = ✁ and µ(y2⋆ ) = ✄ (that is, µ(y1⋆ , y2⋆ ) = a⋆ ). As illustrated in Example 5.6, we can always obtain a sideeffect-free solution for IΘ and a by deleting from IΘ all the facts of the form Θµ (φ) where µ ∈ Ma⋆ (Q⋆ , I ⋆ ). The idea is to make each such Θµ (φ) necessary for a side-effect-free solution. This is done by adding to IΘ facts so that in the resulting instance I, there are new answers for which every Θµ (φ) is needed.1 Now, if there is a side-effect-free solution for I ⋆ and a⋆ then there is still a side-effect-free solution for I and a. Moreover, if there is a side-effect-free solution J for I and a, then J necessarily contains all the facts Θµ (φ) where µ ∈ Ma⋆ (Q⋆ , I ⋆ ). We can then easily assume that JΘ = J ∩ IΘ contains all the facts of IΘ over the encounters. Finally, by applying Lemma 5.7 we get a side-effect-free solution for I ⋆ and a⋆ . 
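The observation that deleting the facts Θµ(φ) of an encounter always yields a side-effect-free solution for IΘ and a can be replayed on the running example. The Python sketch below is illustrative only: the triples of Θ are transcribed from Figure 5, the six matches of Q⋆ from Figure 4, the strings "x*", "y1*" and "y2*" stand for x⋆, y1⋆ and y2⋆, the query is evaluated by a naive join, and all names are hypothetical.

```python
THETA = {  # template assignment, transcribed from Figure 5
    "y1": ("⋄", "y1*", "✄"), "y2": ("⋄", "✁", "y2*"), "y3": ("⋄", "✁", "✄"),
    "x1": ("x*", "y1*", "✄"), "x": ("x*", "✁", "✄"), "x2": ("x*", "✁", "y2*"),
    "x'": ("x*", "y1*", "y2*"), "x''": ("x*", "✁", "✄"), "x3": ("⋄", "✁", "y2*"),
}
ATOMS = [("S1", ("y1", "x1")), ("R1", ("x1", "x")), ("R2", ("x", "x2")),
         ("S2", ("x2", "y2")), ("T", ("x1", "x'", "x2")),
         ("U", ("x", "x''", "y3")), ("V", ("x3", "y2", "y3"))]
HEAD = ("y1", "y2", "y3")
# Matches of Q* in I*, as triples (value of x*, value of y1*, value of y2*).
STAR_MATCHES = [(0, "✁", "✄"), (0, "✁", "b"), (0, "a", "✄"),
                (0, "a", "b"), (1, "✁", "✄"), (1, "c", "✄")]


def instantiate(mu, z):
    """Theta_mu(z): replace x*, y1*, y2* in Theta(z) by their values under mu."""
    c, b1, b2 = mu
    sub = {"x*": c, "y1*": b1, "y2*": b2}
    return tuple(sub.get(w, w) for w in THETA[z])


def build_template_instance():
    """I_Theta: the union of Theta_mu(phi) over all matches mu and atoms phi."""
    inst = {rel: set() for rel, _ in ATOMS}
    for mu in STAR_MATCHES:
        for rel, args in ATOMS:
            inst[rel].add(tuple(instantiate(mu, z) for z in args))
    return inst


def answers(inst):
    """Q(inst), computed by a naive join over the atoms of Q."""
    partial = [{}]
    for rel, args in ATOMS:
        extended = []
        for mu in partial:
            for fact in inst[rel]:
                cand = dict(mu)
                if all(cand.setdefault(v, c) == c for v, c in zip(args, fact)):
                    extended.append(cand)
        partial = extended
    return {tuple(mu[v] for v in HEAD) for mu in partial}


A = (("⋄", "✁", "✄"),) * 3                      # the tuple a
I_THETA = build_template_instance()
FULL = answers(I_THETA)

# Delete the facts Theta_mu(T(x1, x', x2)) for the matches mu that produce a*.
ENCOUNTER_FACTS = {tuple(instantiate(mu, z) for z in ("x1", "x'", "x2"))
                   for mu in STAR_MATCHES if mu[1:] == ("✁", "✄")}
J = {rel: set(facts) for rel, facts in I_THETA.items()}
J["T"] -= ENCOUNTER_FACTS
```

In this sketch, answers(J) equals FULL without a, so J is side-effect free even though no side-effect-free solution exists for I⋆ and a⋆. This is the situation that the added facts described in this section are meant to rule out.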
Next, we give the formal details. Let φ be an encounter, let µ be a match in Ma⋆(Q⋆, I⋆), and suppose that µ(x⋆) = c. We define the assignment νcφ for Q, as follows. In Case 1.b:

\[
\nu_c^{\varphi}(z) \stackrel{\text{def}}{=}
\begin{cases}
\Theta_\mu(z) & \text{if } z \in \mathrm{fdSep}(\varphi, y_1) \cap \mathrm{fdSep}(\varphi, y_2); \\
\heartsuit_c^{\varphi} & \text{otherwise.}
\end{cases}
\]

And in Case 2:

\[
\nu_c^{\varphi}(z) \stackrel{\text{def}}{=}
\begin{cases}
\Theta_\mu(z) & \text{if } z \in \bigcap_{y \in \mathrm{Var}_h(P)} \mathrm{fdSep}(\varphi, y); \\
\heartsuit_c^{\varphi} & \text{otherwise.}
\end{cases}
\]

Here, ♥φc is a fresh new constant that is unique to φ and c. Note that z ∈ ⋂_{y∈Varh(P)} fdSep(φ, y) means that in Gv(Q), the set img(φ) separates z from every head variable y of P.

Lemma 5.16. In Cases 1.b and 2, the atom φi (i = 1, 2) is a non-encounter, and Θ(φi) contains yi⋆. Moreover, the following are equivalent.
1. There is a side-effect-free solution JΘ for IΘ and a, such that Rφ^JΘ = Rφ^IΘ for all encounters φ.
2. There is a side-effect-free solution for I and a.

The correctness of the reduction (in Cases 1.b and 2) is obtained by combining Lemmas 5.7 and 5.16.

Corollary 5.17. In Cases 1.b and 2, there is a side-effect-free solution for I⋆ and a⋆ if and only if there is a side-effect-free solution for I and a.

5.5 Summary and Conclusion
To summarize this section, we fixed S and Q that violate the tractability condition, and showed a reduction from FreeDP⟨S⋆, Q⋆⟩ to FreeDP⟨S, Q⟩. Specifically, we took the input I⋆ and a⋆ for FreeDP⟨S⋆, Q⋆⟩ and constructed the input I and a for FreeDP⟨S, Q⟩. We started with I = IΘ, which we built by means of the assignments Θµ, using the template assignment Θ; to deal with encounters, we added tuples to I using the assignments νcφ. The correctness of the reduction has been stated in Corollaries 5.11 and 5.17. Combined with the observation that the construction of I and a takes polynomial time, we get the main result of this section.

As an example, consider Q and S from Example 5.12, which fall in Case 1.b. There is a single encounter, namely φT = T(x1, x2). In Figure 7, the set img(φT) is shaded by a grey rectangle.
Since every variable z ∈ / {x1 , x2 } is reachable from y1 (and y2 ) by a path that does not visit img(φT ), we have νcφ (z) = ♥φc for each z ∈ / {x1 , x2 }, while νcφ (z) = Θµ (z) for each z ∈ {x1 , x2 }. Later, Example 5.15 discusses Case 2. Finally, to construct I we start with I = IΘ , and for each encounter φ, assignment νcφ and atom γ ∈ atoms(Q), we add Theorem 5.18. If Q+ does not have functional head domination, then FreeDPhS, Qi is NP-complete. 6. MAIN RESULT In this section, we state the dichotomy established in the previous two sections. Moreover, we strengthen the dichotomy by considering (hardness of) approximation. 1 An idea in this spirit was used for proving the dichotomy theorem of [13], but naturally, details greatly differ due to the fds. 11 6.1 Dichotomy sfj-CQs, then there are no tractable cases except for those where an algorithm as straightforward as the unirelation algorithm already handles. Many natural and practically motivated directions are left for future work. Among them are exploring CQs with self joins (with or without fds), utilizing fds to generalize and improve upon approximation algorithms [13], handling additional constraints (like general equality-generating dependences [2] and foreign keys) and investigating deletion of multiple tuples (a.k.a. group deletion [5]). As a conclusion of Theorems 3.12 and 5.18, we get the following dichotomy theorem, which states that for an sjfCQ Q over a schema S the complexity of MaxDPhS, Qi is precisely determined by the tractability condition (i.e., Q+ has functional head domination): when held, the problem is solvable in polynomial time by the (very simple) unirelation algorithm; when violated, the problem is NP-hard, since FreeDPhS, Qi is NP-complete. Theorem 6.1. Let Q be an sjf-CQ over a schema S. Acknowledgments • If Q+ has functional head domination, then UniRelhS, Qi solves MaxDPhS, Qi (in polynomial-time). 
The author is grateful to Jan Vondrák for helpful and insightful discussions, and to Phokion G. Kolaitis for being a source of education and inspiration for this work. • If Q+ does not have functional head domination, then FreeDPhS, Qi is NP-complete. 8. REFERENCES 6.2 Hardness of Approximation [1] F. Bancilhon and N. Spyratos. Update semantics of relational views. ACM Trans. Database Syst., 6(4):557–575, 1981. [2] C. Beeri and M. Y. Vardi. A proof procedure for data dependencies. J. ACM, 31(4):718–741, 1984. [3] P. Buneman, S. Khanna, and W. C. Tan. On propagation of deletions and annotations through views. In PODS, pages 150–158, 2002. [4] D. Burdick, M. A. Hernández, H. Ho, G. Koutrika, R. Krishnamurthy, L. Popa, I. Stanoi, S. Vaithyanathan, and S. R. Das. Extracting, linking and integrating data from public sources: A financial case study. IEEE Data Eng. Bull., 34(3):60–67, 2011. [5] G. Cong, W. Fan, and F. Geerts. Annotation propagation revisited for key preserving views. In CIKM, pages 632–641, 2006. [6] S. S. Cosmadakis and C. H. Papadimitriou. Updates of relational views. J. ACM, 31(4):742–760, 1984. [7] Y. Cui and J. Widom. Run-time translation of view tuple deletions using data lineage. Technical report, Stanford University, 2001. http://dbpubs.stanford.edu:8090/pub/2001-24. [8] N. N. Dalvi, K. Schnaitter, and D. Suciu. Computing query probability with incidence algebras. In PODS, pages 203–214, 2010. [9] N. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. VLDB J., 16(4):523–544, 2007. [10] U. Dayal and P. A. Bernstein. On the correct translation of update operations on relational views. ACM Trans. Database Syst., 7(3):381–416, 1982. [11] R. Fagin, J. D. Ullman, and M. Y. Vardi. On the semantics of updates in databases. In PODS, pages 352–365. ACM, 1983. [12] A. M. Keller. Algorithms for translating view updates to database updates for views involving selections, projections, and joins. In PODS, pages 154–163. ACM, 1985. 
[13] B. Kimelfeld, J. Vondrák, and R. Williams. Maximizing conjunctive views in deletion propagation. In PODS, pages 187–198, 2011. [14] P. G. Kolaitis and E. Pema. A dichotomy in the complexity of consistent query answering for queries with two atoms. In press, 2011. [15] J. Lechtenbörger and G. Vossen. On the computation of relational view complements. ACM Trans. Database Syst., 28(2):175–208, 2003. [16] B. Liu, L. Chiticariu, V. Chu, H. V. Jagadish, and F. Reiss. Automatic rule refinement for information extraction. PVLDB, 3(1):588–597, 2010. [17] D. Maslowski and J. Wijsen. On counting database repairs. In LID, pages 15–22, 2011. [18] A. Meliou, W. Gatterbauer, J. Y. Halpern, C. Koch, K. F. Moore, and D. Suciu. Causality in databases. IEEE Data Eng. Bull., 33(3):59–67, 2010. [19] A. Meliou, W. Gatterbauer, K. F. Moore, and D. Suciu. The complexity of causality and responsibility for query answers and non-answers. PVLDB, 4(1):34–45, 2010. [20] E. Riloff and R. Jones. Learning dictionaries for information extraction by multi-level bootstrapping. In AAAI/IAAI, pages 474–479, 1999. For a number α ∈ [0, 1], a solution J is α-optimal if |Q(J)| ≥ α · |Q(K)| for all solutions K. Kimelfeld et al. [13] showed that for an sjf-CQ Q over a schema S without fds, when MaxDPhS, Qi is NP-hard, it cannot be approximated better than some constant αQ ∈ (0, 1) unless P = NP. For that, they showed a PTAS reduction from MaxDPhS⋆ , Q⋆ i to MaxDPhS, Qi (that is, the reduction is such that for each α there is αQ , such that an αQ -approximation for the generated instance of MaxDPhS, Qi gives an α-approximation for the original instance of MaxDPhS⋆ , Q⋆ i), rather than an ordinary reduction from FreeDPhS⋆ , Q⋆ i to FreeDPhS, Qi as given here. Nevertheless, using ideas very similar to those in the construction of Kimelfeld et al., we can adjust the reduction of Section 5 to a PTAS reduction.2 As a result, we get the following addition to the hardness part of Theorem 6.1. Theorem 6.2. 
Let Q be an sjf-CQ over a schema S. If Q+ does not have functional head domination, then there is a constant αQ such that it is NP-hard to αQ -approximate MaxDPhS, Qi. Combining Theorems 6.1 and 6.2, we conclude the following about MaxDPhS, Qi for an sjf-CQ Q over a schema S. Either the problem is straightforward, or our ability to even approximate it is limited. The dependency of the constant αQ on Q is already known to be unavoidable [13]: for each constant α ∈ (0, 1) there is an sjf-CQ Qα over a schema S without fds, such that Qα does not have (functional) head domination, and yet, MaxDPhS, Qi is α-approximable in polynomial time (and by the unirelation algorithm). In other words, there is no global α that can replace all the αQ in Theorem 6.2. 7. CONCLUSIONS The dichotomy theorem proved in this paper shows how exactly functional dependencies extend the class of sjf-CQs where maximum deletion propagation is in polynomial time. On the negative side, it shows that if optimal solutions (or arbitrarily good approximations, e.g., PTAS) are sought for 2 We gave a reduction from FreeDPhS⋆ , Q⋆ i to FreeDPhS, Qi, rather than a PTAS one from MaxDPhS⋆ , Q⋆ i to MaxDPhS, Qi, since it greatly simplifies the presentation, yet still, entails essentially all of the novel ideas. 12 APPENDIX A. A.1 1. Var(φ) ⊆ fdSep(Varh (P ), zP ). 2. If zP ∈ img(φ) then zP ∈ img(Varh (P )). 3. If zP ∈ img(φ) and Q = Q+ , then zP ∈ Varh (P ). Proof. We first prove Part 1. Let zφ be a variable in Var(φ). We need to show that Varh (P ) fd-separates zP from zφ . Suppose, by way of contradiction, that Gv (Q) has a path p between zφ and zP , such that none of the nodes in p are in Varh (P ). Suppose that p is the path zP = z1 — z2 — . . . — zn = zφ . We will prove, by induction on i, that each zi is in Var∃ (P ). This is enough, since zφ ∈ Var∃ (P ) implies that φ ∈ P , in contradiction to the assumption of the proposition. 
For the basis, i = 1, the claim zi ∈ Var∃ (P ) is obvious from our assumption that z1 = zP ∈ Var(P ) and p has no variables in Varh (P ). For the inductive step, if zi ∈ Var∃ (P ) then zi ∈ Var∃ (φ′ ) for some atom φ′ in P . The fact that Gv (Q) contains an edge between zi and zi+1 means that some atom φ′′ contains both zi and zi+1 . But then, φ′ and φ′′ share the existential variable zi , and therefore, φ′′ is also in P . It thus follows that zi+1 ∈ Var(P ), and since p has no nodes in Varh (P ), we conclude that zi+1 ∈ Var∃ (P ), as required. We now prove Part 2. Suppose that zP ∈ img(φ) (which is the same as zP ∈ img(Var(φ))). From Part 1 of the lemma we know that Var(φ) ⊆ fdSep(Varh (P ), zP ). So now, we can use Part 3 of Theorem 4.2 with zP , Var(φ) and Varh (P ) playing the roles of z, Z ′ and Z, respectively. That gives us zP ∈ img(Varh (P )), as claimed. Part 3 follows immediately from Part 2, since Q = Q+ and zP ∈ img(Varh (P )) implies that zP is a head variable (hence, in Var(P ) ∩ Varh (Q) = Varh (P )). PROOFS FOR SECTIONS 3 AND 4 Proof of Theorem 4.2 Theorem 4.2. Let Q be a CQ over a schema S, let Z ⊆ Var(Q) be a set of variables, and let z ∈ Var(Q) be a variable. The following hold. 1. fdSep(Z, z) is closed under fds; that is: img(fdSep(Z, z)) = fdSep(Z, z) 2. fdSep(Z, z) is closed under fd-separation by subsets; that is, for all Z ′ ⊆ Var(Q) we have: Z ′ ⊆ fdSep(Z, z) ⇒ fdSep(Z ′ , z) ⊆ fdSep(Z, z) 3. Z determines z whenever Z separates from z a set that determines z; that is, for all Z ′ ⊆ Var(Q) we have: ` ´ z ∈ img(Z ′ ) ∧ Z ′ ⊆ fdSep(Z, z) ⇒ z ∈ img(Z) Proof. We begin with Part 1. We need to show that img(fdSep(Z, z)) ⊆ fdSep(Z, z). By way of contradiction, suppose otherwise. 
Let I be the set of facts that consists of the union of two sets I1 and I2 of facts: I1 is obtained from atoms(Q) by replacing each variable with a constant c1 , and I2 is obtained similarly to I1 , except that each variable not in fdSep(Z, z) is replaced with a different constant c2 6= c1 . So there are (at least) two matches for Q in I, where these matches agree exactly on fdSep(Z, z). Since img(fdSep(Z, z)) 6⊆ fdSep(Z, z), there is a violation of an fd, say R : A → i. From the construction of I it follows that there is an atom φ over R, such that the set ZA of variables of φ in the positions of A is contained in fdSep(Z, z), and the variable zi of φ in the ith position is not in fdSep(Z, z). Now, if ZA ⊆ img(Z) then R : A → i implies that zi ∈ img(Z), and then zi ∈ fdSep(Z, z), which contradicts our assumption. Therefore, at least one variable in ZA , say zA , is not in img(Z). But then, by definition there is a path in Gv (Q) from z to zi , such that the path does not visit any node in img(Z); by adding to this path the edge {zi , zA } (and this edge exists since zi and zA co-occur in φ), we get a path in Gv (Q) from z to zA , such that the path does not visit any node in img(Z). This contradicts the assumption that zA is fd-separated from z. We now prove Part 2. Let Z ′ ⊆ fdSep(Z, z) and let z ′ be a node in fdSep(Z ′ , z). We need to show that z ′ ∈ fdSep(Z, z). Let p be a path in Gv (Q) between z and z ′ . We have to prove that p contains a node in img(Z). Since z ′ ∈ fdSep(Z ′ , z), we have that p contains some node z ′′ ∈ img(Z ′ ). From Part 1 we know that img(Z ′ ) ⊆ fdSep(Z, z). Therefore, z ′′ ∈ fdSep(Z, z), which means that every path connecting z and z ′′ contains a node in img(Z). In particular, p contains a node in img(Z) (since p contains both z and z ′′ ), as required. Finally, we prove Part 3. So suppose that Z ′ is such that z ∈ img(Z ′ ) and Z ′ ⊆ fdSep(Z, z). From Part 1 we know that z ∈ fdSep(Z, z). 
In particular, the one-node path that consists of only z visits a node in img(Z), or in other words, z ∈ img(Z), as claimed. A.2 Proposition 3.11. Let Q be a CQ over a schema S. 1. If Q has head domination, then Q has functional head domination. 2. If Q has functional head domination, then Q+ has functional head domination. Furthermore, none of the two reverse implications hold. Proof. Item 1 is straightforward. The fact that none of the two reverse implications hold is shown in Examples 3.2 and 3.6, respectively. It remains to prove item 2, which is done next. Suppose that Q has functional head domination. We need to show that Q+ has functional head domination. Let PQ+ be a connected component of G∃ (Q+ ). An easy observation is that every edge of G∃ (Q+ ) is also an edge of G∃ (Q). In particular, PQ+ is contained in some connected component PQ of G∃ (Q). Let φQ be an atom of Q, such that Varh (PQ ) ⊆ img(φQ ). We complete the proof by showing that Varh (PQ+ ) ⊆ img(φQ ) as well. Let z be a head variable of PQ+ . We need to show that z ∈ img(φQ ). If z is a head variable of Q, then z is a head variable of PQ , and therefore z ∈ img(φQ ). So suppose that z is an existential variable of Q. Then z is in PQ (since z is in PQ+ ) and z ∈ img(Varh (Q)) (by the definition of Q+ ). From Part 1 of Proposition A.1 it follows that Varh (PQ ) fdseparates z from every head variable z ′ ∈ Varh (Q)\Varh (PQ ). Of course, Varh (PQ ) fd-separates z from every head variable in Varh (PQ ). We conclude that Varh (Q) ⊆ fdSep(Varh (PQ ), z). From Part 3 of Theorem 4.2, with img(φQ ) and Varh (Q) playing the roles of Z and Z ′ , respectively, we conclude that z ∈ img(Varh (PQ )). Finally, from the fact that Varh (PQ ) ⊆ img(φQ ) we get that z ∈ img(φQ ), as claimed. Proofs for Section 3 Proposition A.1. Let Q be a CQ over a schema S, and let P be a connected component of G∃ (Q). Let φ be an atom in atoms(Q) \ P , and let zP be a variable in Var(P ). 13 Theorem 3.12. 
Let Q be an sjf-CQ over S. If Q+ has functional head domination, then UniRelhS, Qi optimally solves MaxDPhS, Qi (in polynomial time). Let µ be a match in M(Q⋆ , I ⋆ ). If µ(x⋆ ) = c and µ(yi⋆ ) = bi for i = 1, 2, then we may write Θcb1 ,b2 instead of Θµ . Observe that whenever y is a head variable, the first element w of Θ(y) is ⋄; hence, Θ(y) (where y is the sequence of head variables of Q) does not mention x⋆ . Therefore, if µ1 , µ2 ∈ M(Q, IΘ ) are such that µ1 (yi⋆ ) = µ2 (yi⋆ ) for i = 1, 2, then Θµ1 (y) = Θµ2 (y). If µ ∈ M(Q, IΘ ) is such that µ(y1⋆ ) = b1 and µ(y2⋆ ) = b2 , then Θµ (y) is also denoted by Θb1 ,b2 (y). Finally, if z is a variable such that Θ(z) does not include y1⋆ , then we Θcb1 ,b2 (z) = Θcb′ ,b2 (z) for all b1 and b′1 . 1 Therefore, in that case we may write just Θcb2 (z) instead of c Θb1 ,b2 (z). Similarly, if z does not include y2⋆ , then we may write Θcb1 (z) instead of Θcb1 ,b2 (z). We use the same convention for an atom γ such that Θ(γ) does not mention y1⋆ or y2⋆ ; that is, if γ does not include y1⋆ , then we may write Θcb2 (γ) instead of Θcb1 ,b2 (γ), and if γ does not include y2⋆ , then we may write Θcb1 (γ) instead of Θcb1 ,b2 (γ). Proof. Suppose that Q+ has functional head domination. Using Lemma 3.8, it is enough to show that the algorithm UniRelhS, Q+ i (that actually acts exactly the same as UniRelhS, Qi) optimally solves MaxDPhS, Q+ i. Suppose that P1 , . . . , Pk are the connected components of G∃ (Q+ ). We assume that Varh (Pi ) is the set Varh (Q+ ) ∩ Var(Pi ) (that is, the context is Q+ rather than Q). For each Pi (i = 1, . . . , k), let φi ∈ atoms(Q+ ) be such that img(φi ) contains Varh (Pi ). Those φi exist, since we assume that Q+ has functional head domination. Consider the input I and a for MaxDPhS, Q+ i, and let Jopt be an optimal solution. We will prove that there is some j ∈ {1, . . . , k}, such that none of the (Q+ , a)-useful facts over Rφj is Q+ -useful in Jopt . 
This implies that the intermediate solution Jφj constructed by the unirelation algorithm is such that Q+ (Jopt ) ⊆ Q+ (Jφj ), thus |Q+ (Jopt )| ≤ |Q+ (Jφj )|. In particular, the solution J returned by the unirelation algorithm satisfies |Q+ (Jopt )| ≤ |Q+ (J)|. Suppose, by way of contradiction, that for all i ∈ {1, . . . , k} there is a fact fi in Rφi that is (Q+ , a)-useful in I and is also Q+ -useful in Jopt . We will show that a ∈ Q+ (Jopt ), which contradicts the fact that Jopt is a solution. For each i ∈ {1, . . . , k}, define two assignments µai and µi as follows. B.1.2 Proofs Lemma 5.4. IΘ satisfies every fd of ∆. Proof. It is enough to show that every fd R : A → i of S (having a single index i on the right-hand side) is satisfied. So let R : A → i be an fd of S, and let f and f ′ be two facts over R, such that agree on all the attributes in A. We need to show that f and f ′ agree on the ith position as well. Consider the atom φR (over R), and let ZA be the set of variables of φR in the positions of A. We use Θ(ZA ) denote the set {Θ(z) | z ∈ ZA }. Let zi be the variable of φR in the ith position. Observe that R : A → i implies that zi ∈ img(ZA ). Now, let hw, w1 , w2 i and hw′ , w1′ , w2′ i be the values in the ith positions of f and f ′ , respectively. We need to prove that w = w′ and wi = wi′ (i = 1, 2), and we do so as follows. To prove that w = w′ , suppose, by way of contradiction, that w 6= w′ . Then Θ(zi ) contains x⋆ (in the leftmost coordinate). In particular, from the definition of Θ it follows that φR is in P , and that zi is an existential variable. Now, since f and f ′ agree on A, then none of the triples in Θ(ZA ) contain x⋆ , which means that each variable in ZA is a head variable (as φR ∈ P ). But then, this contradicts our assumption that Q = Q+ . For proving that w1 = w1′ , suppose, by way of contradiction, that w1 6= w1′ . 
Then Θ(zi ) contains y1⋆ (hence zi ∈ / fdSep(φ2 , y1 )) and Θ(z) does not contain y1⋆ for each z ∈ ZA (hence ZA ⊆ fdSep(φ2 , y1 )). But then, we get a contradiction to Part 1 of Theorem 4.2 with Var(φ2 ) and y1 playing the roles of Z and z, respectively. The proof that w2 = w2′ is symmetrical to that for w1 = ′ w1 . We conclude that f = f ′ , as claimed. • µai ∈ M(Q+ , I), µai (φi ) = fi , and µai (y) = a. Note that such µai exists, since fi is (Q+ , a)-useful in I. • µi ∈ M(Q+ , Jopt ) and µi (φi ) = fi . Note that such µi exists, since fi is Q+ -useful in Jopt . Let i and j be two different integers in {1, . . . , k}. We first show that µi and µj agree on the variables in Var(Pi ) ∩ Var(Pj ). So let z ∈ Var(Pi ) ∩ Var(Pj ) be given. Then z is necessarily in Varh (Q+ ), since Pi and Pj do not share existential variables of Q+ . Therefore, z ∈ Varh (Pi ) ∩ Varh (Pj ), and hence, in both img(φi ) and img(φj ). We need to show that µi (z) = µj (z). So, we have the following: • µi (z) = µai (z) since µi and µai agree on Var(φi ); • µai (z) = µaj (z) since µai and µaj agree on Varh (Q+ ); • µaj (z) = µj (z) since µj and µaj agree on Var(φj ). So, we build an assignment µ for Q+ , as follows. For z ∈ Var(Q+ ), we set µ(z) = µi (z) if i is such that z ∈ Var(Pi ). Note that µ is a match for Q+ in Jopt , since each µi is a match for Q+ in Jopt . We complete the proof by showing that µ(y) = a. Let y be a head variable of Q+ , and suppose that y ∈ Varh (Pi ). Then µ(y) = µi (y) by the construction of µ, and µi (y) = µai (y) since y ∈ img(φ) and the matches µi and µai agree on Var(φi ). Hence, µ(y) = µai (y). We conclude that for all y ∈ Varh (Q+ ) there exists i ∈ {1, . . . , k}, such that µ(y) = µai (y). Since each µai is such that µai (y) = a, we get that µ(y) = a, as claimed. B. B.1 Before we prove Lemma 5.5, we need the following lemma. Lemma B.1. Let p be a path in Gv (Q), and let j be one of 1, 2 or 3. 
Suppose that the jth element of Θ is the same on all the nodes of p. Then for all ν ∈ M(Q, IΘ ), the jth element of ν is the same on all the nodes of p. PROOFS FOR SECTION 5 Proofs for Section 5.2 Proof. It is enough to show the claim for each edge of p. Let {z, z ′ } be an edge of p, and suppose that the jth element of Θ(z) is the same as the jth element of Θ(z ′ ). Since {z, z ′ } is an edge, there is an atom φ that contains both z and z ′ . And since ν is a match, we get that ν(φ) is a fact of IΘ . B.1.1 Notation We first give some notation that will be used throughout this section. 14 ν(y1 ) = Θb1 ,b2 (y1 ). On the other hand, ν(y1 ) has b′1 as its second element since ν(y1 ) = Θcb′ ,b′ (y1 ). Therefore, we 1 2 get that b1 = b′1 . We similarly prove that b2 = b′2 . This concludes the proof for the “only if” direction. We now prove the “if” direction. Suppose that both Θcb1 (γ1 ) and Θcb2 (γ2 ) are Q-useful in JΘ . From the construction of IΘ we conclude that I ⋆ contains both R1 (c, b1 ) and R2 (c, b2 ). Therefore, Θcb1 ,b2 is a match for Q in IΘ . We will show that Θcb1 ,b2 is a match for Q in JΘ as well. Since Θcb1 ,b2 (y) = Θb1 ,b2 (y), this would imply the required Θb1 ,b2 (y) ∈ Q(JΘ ). To prove that Θcb1 ,b2 ∈ M(Q, JΘ ), we need to show that c Θb1 ,b2 (φ) is in JΘ for all φ ∈ atoms(Q). So, let φ ∈ atoms(Q) be given. If φ is an encounter, then Θcb1 ,b2 (φ) is in JΘ by the assumption of the lemma. So, suppose that Θ(φ) does not contain y2⋆ . (The case where Θ(φ) does not contain y1⋆ is handled symmetrically.) Let ν1 be a match for Q in JΘ , such that ν1 (γ1 ) = Θcb1 (γ1 ). Note that such ν1 exists, since Θcb1 (γ1 ) is Q-useful in JΘ . From Lemma 5.5 we conclude ′ that ν1 = Θcb′ ,b′ for some c′ , b′1 and b′2 . Now, recall that 1 2 Θ(γ1 ) contains y1⋆ by assumption, and that Θ(γ1 ) contains x⋆ since γ1 is in P (and hence, has variables in Var∃ (P )). ′ Therefore, Θcb1 (γ1 ) = Θcb′ ,b′ (γ1 ) implies that c = c′ and 1 2 b1 = b′1 . 
Since Θcb1 ,b2 (φ) does not contain y2⋆ , the definition of Θcb1 ,b2 implies that Θcb1 ,b2 (φ) is equal to Θcb1 ,b′ (φ), which 2 is ν1 (φ). And since ν1 ∈ M(Q, J ⋆ ), we conclude that ν1 (φ), c and hence Θb1 ,b2 (φ), is in JΘ , as claimed. The construction of IΘ is such that every fact is obtained by replacing elements of Θ with constants; in particular, in the creation of the fact ν(φ) it replaces the jth element of z and z ′ with the same constant. We conclude that the jth element of ν(z) is the same as the jth element of ν(z ′ ), as required. Lemma 5.5. M(Q, IΘ ) = {Θµ | µ ∈ M(Q⋆ , I ⋆ )}. Proof. Let ν be a match for Q in IΘ . We need to show that ν = Θµ for some µ ∈ M(Q⋆ , I ⋆ ). Let ρ1 and ρ2 be atoms of P that contain y1 and y2 , respectively. Note that such ρ1 and ρ2 exist since we assume that both parameters y1 and y2 (as well as the parameters φ1 and φ2 ) occur in P . Recall from the definition of Θ that for each variable z ∈ Var∃ (P ), the first element in the triple Θ(z) is x⋆ . Since P is a connected component of G∃ (Q), the subgraph of Gv (Q) that is induced by Var∃ (P ) is connected. Hence, from Lemma B.1 it follows ν(x) has the same first element on all the existential variables x of P . Clearly, the atoms ρ1 and ρ2 contain existential variables, since they are both in P . So, let c be the first element of ν(x) on all the existential variables x of P . Let µ be the mapping that maps x⋆ to c, maps y1⋆ to the second element of ν(y1 ), and maps y2⋆ to the third element of ν(y2 ). The template construction implies that µ is necessarily in M(Q⋆ , I ⋆ ). We complete the proof by showing that ν = Θµ . So, let z ∈ Var(Q) be a variable. We need to show that ν(z) = Θµ (z). Let us denote ν(z) = hd, d1 , d2 i and Θµ (z) = he, e1 , e2 i. So, we will show that d = e and ei = di (i = 1, 2). Suppose that Θ(z) = hw, w1 , w2 i. We first prove that d = e. If z ∈ / Var∃ (P ), then w = ⋄, and then both d and w are ⋄. Otherwise w = x⋆ and e = c. 
Then, d = c follows from the way c is selected above. We now prove that d1 = e1 . If w1 = ✁, then d1 = ✁ = e1 . Otherwise, z ∈ / fdSep(φ2 , y1 ), which means that there is a path p in Gv (Q) that connects z to y1 and does not visit img(φ2 ). But then, none of the nodes of p are in fdSep(φ2 , y1 ), and hence, each of these nodes has y1⋆ as its second element. Lemma B.1 then implies that ν as the same second element on all the variables of p. In particular, d1 is the same as the second element of ν(y1 ), which is µ(y1⋆ ), which is e1 . The proof of d2 = e2 is similar. Lemma 5.7. Suppose that P has two non-encounters γ1 and γ2 , such that Θ(γ1 ) contains y1⋆ and Θ(γ2 ) contains y2⋆ . Then, the following are equivalent. 1. There is a side-effect-free solution for I ⋆ and a⋆ . 2. There is a side-effect-free solution JΘ for IΘ and a, such that RφJΘ = RφIΘ for all encounters φ. Proof. This lemma is weaker than the next lemma, which we need for later use. Lemma B.3. Suppose that P has two non-encounters γ1 and γ2 , such that Θ(γ1 ) contains y1⋆ and Θ(γ2 ) contains y2⋆ . Then, the following are equivalent. 1. There is a side-effect-free solution for I ⋆ and a⋆ . 2. There is a side-effect-free solution JΘ for IΘ and a, such that RφJΘ = RφIΘ for all φ ∈ atoms(Q) \ {γ1 , γ2 }. 3. There is a side-effect-free solution JΘ for IΘ and a, such that RφJΘ = RφIΘ for all encounters φ. Lemma B.2. Let JΘ be a subinstance of IΘ with the property that RφJΘ = RφIΘ for all encounters φ. Suppose that γ1 , γ2 ∈ P are non-encounters such that Θ(γ1 ) contains y1⋆ and Θ(γ2 ) contains y2⋆ . For all Θb1 ,b2 (y) ∈ Q(IΘ ) we have Θb1 ,b2 (y) ∈ Q(JΘ ) if and only if there is a constant c, such that some Θcb1 (γ1 ) and Θcb2 (γ2 ) are both Q-useful in JΘ . Proof. To prove the lemma, we will use the following observation. Let (c, b1 , b2 ) and (c′ , b′1 , b′2 ) be triples of constants. 
Then (c, b1 ) = (c′ , b′1 ) if and only if Θcb1 (γ1 ) = ′ Θcb′ (γ1 ); this is true because Θ(γ1 ) contains both x⋆ (since 1 γ1 ∈ P ) and y1⋆ (by assumption). Similarly, (c, b2 ) = (c′ , b′2 ) ′ if and only if Θcb2 (γ2 ) = Θcb′ (γ2 ). We now proceed to prove 2 the lemma. 1 ⇒ 2. Suppose that J ⋆ is a side-effect-free solution for ⋆ I and a⋆ . We will construct a solution JΘ for IΘ and a, and show that JΘ is side-effect free. The solution J ⋆ is constructed as follows. Start with JΘ = IΘ . Then: • For all R1⋆ (c, b1 ) ∈ I ⋆ \ J ⋆ , remove Θcb1 (γ1 ) from JΘ . • For all R2⋆ (c, b2 ) ∈ I ⋆ \ J ⋆ , remove Θcb2 (γ2 ) from JΘ . To see that JΘ is a solution, and furthermore side-effect free, we will show that (b1 , b2 ) ∈ Q⋆ (J ⋆ ) if and only if Θb1 ,b2 (y) ∈ Q(JΘ ). For the “only if” direction, suppose that Proof. We first prove the “only if” direction. Suppose that Θb1 ,b2 (y) ∈ Q(JΘ ), and consider a match ν for Q in JΘ , such that ν(y) = Θb1 ,b2 (y). We will show a constant c and two facts Θcb1 (γ1 ) and Θcb2 (γ2 ) that are both Q-useful in JΘ . We apply Lemma 5.5, and assume that ν = Θcb′ ,b′ for 1 2 some constants c, b′1 and b′2 . We will show that b′1 = b1 and ′ c b2 = b2 . This would imply that Θb1 ,b2 is a match for Q in JΘ , which (by definition) would give us the desired Θcb1 (γ1 ) and Θcb2 (γ2 ) (i.e., such that are Q-useful in JΘ ). As y1⋆ occurs in the image of Θ, we conclude that φ2 does not fd-separate y1 from itself, which is equivalent to saying that y1 ∈ / img(φ2 ). In particular, Θ(y1 ) has y1⋆ as its second element. Therefore, ν(y1 ) has b1 as its second element since 15 B.3 (b1 , b2 ) ∈ Q⋆ (J ⋆ ). So there is a constant c, such that both R1⋆ (c, b1 ) and R2⋆ (c, b2 ) are in J ⋆ . From the observation in the beginning of the lemma we conclude that we have not removed from JΘ any fact in the image Θcb1 ,b2 . Therefore, Θb1 ,b2 (y) = Θcb1 ,b2 (y) ∈ Q(JΘ ). For the “if” direction, suppose that Θb1 ,b2 (y) ∈ Q(JΘ ). 
From Lemma B.2 we get that there is a constant c, such that both Θ_{b1}^c(γ1) and Θ_{b2}^c(γ2) are Q-useful in JΘ. In particular, JΘ contains both Θ_{b1}^c(γ1) and Θ_{b2}^c(γ2), which implies that both R1⋆(c, b1) and R2⋆(c, b2) are in J⋆. Therefore, (b1, b2) ∈ Q⋆(J⋆).

2 ⇒ 3. This direction is straightforward, since we assume that neither γ1 nor γ2 is an encounter.

3 ⇒ 1. We will construct a solution J⋆ as follows. Start with J⋆ = I⋆. Then:
• For each fact R1⋆(c, b1) ∈ I⋆, if Θ_{b1}^c(γ1) is not Q-useful in JΘ, then remove R1⋆(c, b1) from J⋆.
• For each fact R2⋆(c, b2) ∈ I⋆, if Θ_{b2}^c(γ2) is not Q-useful in JΘ, then remove R2⋆(c, b2) from J⋆.
To see that J⋆ is a solution, and furthermore side-effect free, we will show that Θ_{b1,b2}(y) ∈ Q(JΘ) if and only if (b1, b2) ∈ Q⋆(J⋆). For the "only if" direction, suppose that Θ_{b1,b2}(y) ∈ Q(JΘ). From Lemma B.2 we have that there is a constant c, such that both Θ_{b1}^c(γ1) and Θ_{b2}^c(γ2) are Q-useful in JΘ. Therefore, we removed neither R1⋆(c, b1) nor R2⋆(c, b2) to construct J⋆, which implies that (b1, b2) ∈ Q⋆(J⋆). For the "if" direction, suppose that (b1, b2) ∈ Q⋆(J⋆). Then for some c we have that J⋆ contains both R1⋆(c, b1) and R2⋆(c, b2). Observe that Θ_{b1,b2}^c is a match for Q in IΘ, and from the construction we conclude that both Θ_{b1}^c(γ1) and Θ_{b2}^c(γ2) remain Q-useful in JΘ (otherwise J⋆ would miss at least one of the two facts). Finally, from Lemma B.2 we get that Θ_{b1,b2}(y) ∈ Q(JΘ), as claimed.

B.3 Proofs for Section 5.4

B.3.1 Notation and Assumption

In the proofs of this section, we use the following terminology. Let φ be an encounter, and let µ be a match in M_{a⋆}(Q⋆, I⋆) with µ(x⋆) = c. We call ν_c^φ an encounter assignment. A variable z ∈ Var(Q) where ν_c^φ(z) = ♥_c^φ is called a heart variable of ν_c^φ. We denote by M_a(Q, I) the set of matches ν ∈ M(Q, I) such that ν(y) = a.
Throughout this section, unless specified otherwise, we assume that we are in either Case 1.b or Case 2.

B.3.2 Proofs

Lemma 5.14. In Cases 1.b and 2, I satisfies each fd of S.

Proof. We first prove the lemma for Case 1.b. Suppose, by way of contradiction, that I does not satisfy some fd R : A → i. Lemma 5.4 states that IΘ satisfies ∆. In particular, IΘ satisfies R : A → i. So, the violation of R : A → i occurs during the consideration of the encounters (Section 5.4). Suppose that the first violation of R : A → i is when adding to I the fact f = ν_c^φ(φR), where φ is an encounter and µ ∈ M_{a⋆}(Q⋆, I⋆) is such that µ(x⋆) = c. (Recall that φR is Q's atom over R.) Since the addition of f causes the (first) violation of R : A → i, by the time f is added I already contains a fact f′ ≠ f over R, such that f and f′ agree on the positions of A but not on the ith position. Note that φR cannot have a constant in the ith position, or otherwise the construction of I would not allow for disagreement on the ith position of R. Let ZA be the set of variables of φR in the positions of A, and let zi be the variable of φR in the ith position. Note that the fd R : A → i implies that zi ∈ img(ZA).

Note that f is the only tuple of R that may contain the constant ♥_c^φ. Since f and f′ agree on ZA and f′ does not contain ♥_c^φ, we get that ν_c^φ(z) ≠ ♥_c^φ for all z ∈ ZA. So from the definition of ν_c^φ(z) we conclude the following:
• z ∈ fdSep(φ, y1) for all z ∈ ZA, or in other words, ZA ⊆ fdSep(φ, y1).
• ν_c^φ(z) = Θµ(z) for all z ∈ ZA.
So we have zi ∈ img(ZA) and ZA ⊆ fdSep(φ, y1), which implies that zi ∈ img(fdSep(φ, y1)). Theorem 4.2 (Part 1) implies that img(fdSep(φ, y1)) = fdSep(φ, y1), and hence, zi ∈ fdSep(φ, y1). A symmetrical proof shows that zi ∈ fdSep(φ, y2). By the definition of ν_c^φ, we have ν_c^φ(zi) = Θµ(zi). Therefore, the fact Θµ(φR) (which belongs to IΘ) agrees with f on all of the positions in A ∪ {i}. But then, the violation of R : A → i existed before the addition of f: it has been already realized by the facts Θµ(φR) and f′. This contradicts our choice of f as the first fact that causes the violation of R : A → i, as required.

For Case 2, the proof is similar. The only difference is that instead of proving zi ∈ fdSep(φ, y) just for y ∈ {y1, y2}, we prove zi ∈ fdSep(φ, y) for all y ∈ Varh(P).

Proofs for Section 5.3

Lemma 5.8. In Case 1.a, φ ∈ P.

Proof. Suppose, by way of contradiction, that φ ∉ P. From the fact that P contains both y1 and y2 we conclude that in Gv(Q) there is a path p between y1 and y2, such that all the intermediate nodes of p are existential variables of P. Since φ fd-separates y1 from y2, and neither y1 nor y2 is in img(φ), we conclude that img(φ) contains at least one existential variable of P. But that stands in contradiction to Part 3 of Proposition A.1.

Lemma 5.10. In Case 1.a, the following hold.
• There are no encounters.
• For each i ∈ {1, 2}, the tuple Θ(yi) contains yi⋆.

Proof. We begin with Part 1. Suppose, by way of contradiction, that γ is an encounter (i.e., Θ(γ) contains both y1⋆ and y2⋆). From the construction of Θ we conclude that Gv(Q) contains a path p1 from y1 to some variable z1 of γ, and a path p2 from y2 to some variable z2 of γ, such that neither p1 nor p2 visits any node of img(φ). But then, by combining p1, p2 and the edge {z1, z2} (which is not needed if z1 = z2), we get a path between y1 and y2 that does not visit any node of img(φ). This contradicts our assumption that φ fd-separates y1 from y2.

The fact that Θ(y1) contains y1⋆ is due to the assumption that y1 ∉ img(φ2) (recall that φ2 = φ), which implies that y1 ∉ fdSep(φ2, y1). The proof that Θ(y2) contains y2⋆ is symmetrical.

Lemma B.4. For i = 1, 2, the atom φi is a non-encounter, and Θ(φi) contains yi⋆.

Proof. We will prove that Θ(φ1) contains y1⋆ but not y2⋆; the proof that Θ(φ2) contains y2⋆ but not y1⋆ is by a symmetrical argument.
16 of the nodes of p′ are in img(φ). Again, this means that all the nodes of p′ are heart variables, which implies that p′ is a path of G♥ . In particular, G♥ contains a path from a variable of γ (namely z ′ ) to y. Finally, we similarly show that G♥ contains a path from a variable of γ to y ′ . Suppose, by way of contradiction, that y1⋆ is not in Θ(φ1 ). The definition of Θ implies that Var(φ1 ) ⊆ fdSep(φ2 , y1 ). From Part 1 of Theorem 4.2 we get that img(Var(φ)) ⊆ fdSep(φ2 , y1 ). In particular, y1 ∈ fdSep(φ2 , y1 ) since y1 ∈ img(φ1 ) (by our choice of the parameters y1 and φ1 in each of the two cases). But, the only way y1 ∈ fdSep(φ2 , y1 ) can hold is if y1 ∈ img(φ2 ) (since then the path that contains just y1 visits a node of img(φ2 )). This is a contradiction, since we know (from our choice of the parameters in each of the two cases) that y1 ∈ / img(φ2 ). The fact that y2⋆ does not occur in Θ(φ1 ) is obvious from the definition of Θ, since φ1 fd-separates every variable of itself from y2 . Lemma B.6. Let νcφ be an encounter assignment. Both Var(φ1 ) and Var(φ2 ) contain heart variables of νcφ . Proof. We first prove the lemma for Case 1.b. We will prove that Var(φ1 ) contains heart variables of νcφ (and the proof for φ2 is by a symmetrical argument). Suppose, by way of contradiction, that Var(φ1 ) does not contain any heart variables of νcφ . Then, from the construction of νcφ we conclude that each variable of φ1 must be fdSep(φ, y1 ). That is, Var(φ) ⊆ fdSep(φ, y1 ). From Part 1 of Theorem 4.2 we conclude that img(φ1 ) ⊆ fdSep(φ, y1 ), and in particular, y1 ∈ fdSep(φ, y1). Of course, y1 ∈ fdSep(φ, y1) means that y1 ∈ img(φ). From the fact that we are in Case 1 we conclude that y2 ∈ / img(φ). Next, we prove that fdSep(φ1 , y2 ) is a strict subset of fdSep(φ, y2 ). The fact that Var(φ1 ) does not contain a heart variable of νcφ also means that each variable of φ1 is in fdSep(φ, y2 ) (i.e., Var(φ1 ) ⊆ fdSep(φ, y2 )). 
Then, Part 2 of Theorem 4.2 implies that fdSep(φ1 , y2 ) ⊆ fdSep(φ, y2 ). Now, since φ is an encounter, some variable zφ of φ is such that Θ(zφ ) contains y2⋆ . The definition of Θ implies that zφ ∈ / fdSep(φ1 , y2 ). But zφ ∈ fdSep(φ, y2 ) since zφ ∈ Var(φ). So fdSep(φ1 , y2 ) is a strict subset of fdSep(φ, y2 ), as claimed. Next, we show that φ ∈ P . Suppose otherwise (by way of contradiction). The fact that P is a connected component of G∃ (Q) implies that each existential variable of P is connected to y2 in Gv (Q) through a path that contains only existential variables of P . Part 3 of Proposition A.1 implies that img(φ) has no existential variables of P (recall that Q = Q+ is assumed). We conclude that φ1 (which is in P and hence has existential variables of P ) has a variable x that is connected to y2 through a path that visits no nodes in img(φ), which contradicts our previous conclusion that Var(φ1 ) ⊆ fdSep(φ, y2 ). We showed that both y1 ∈ img(φ) and φ ∈ P hold, and yet, fdSep(φ1 , y2 ) is a strict subset of fdSep(φ, y2 ). This contradicts the choice of φ1 as one where fdSep(φ1 , y2 ) has the maximal number of elements among the atoms φ ∈ P that satisfy y1 ∈ img(φ). We now consider Case 2, and first prove that Var(φ1 ) contains heart variables of νcφ . Suppose, by way of contradiction, that Var(φ1 ) does not contain any heart variables of νcφ . Let y be a head variable of P , such that y ∈ / img(φ). Using an argument similar to one given in the proof for Case 1.b we can show that φ ∈ P , or otherwise we get a contradiction to the fact that Var(φ1 ) ⊆ fdSep(φ, y). Suppose first that there is some head variable y ∈ img(φ1 ), such that y ∈ / img(φ). Then y is a heart variable of φ. Let µ ∈ Ma⋆ (Q⋆ , I ⋆ ) be a match with µ(x⋆ ) = c. So we have νcφ (y) = ♥φc , and of course, Θcµ (y) 6= ♥φc . Hence, y ∈ img(φ1 ) (together with Lemma 5.14) implies that νcφ (φ1 ) 6= Θµ (φ1 ). 
From the definition of νcφ we conclude that φ1 must have heart variables, which is a contradiction. P Now suppose that img P h (φ1 ) ⊆ img h (φ). From the fact P that |img h (φ1 )| is maximal among the atoms of P , we conP clude that img P h (φ1 ) = img h (φ). In particular, y1 ∈ img(φ) and y2 ∈ / img(φ). Since φ1 has no heart variables, every path of Gv (Q) from a variable of φ1 to y2 visits img(φ), or Lemma B.5. Let νcφ be an encounter assignment. The heart variables of νcφ induce a connected subgraph of Gv (Q). Proof. We first prove the lemma for Case 1.b. We consider three cases, depending on properties of φ. Case a: y1 ∈ img(φ). Let z be a heart variable of νcφ . By the construction of νcφ we have z ∈ / fdSep(φ, y1 )∩fdSep(φ, y2 ). Note that z ∈ fdSep(φ, y1 ), since y1 ∈ img(φ). Therefore, z∈ / fdSep(φ, y2 ). This means that Gv (Q) contains a path p between z and y2 , such that none of the nodes (variables) of p are in img(φ). But then, none of the nodes of p are in fdSep(φ, y2 ), which implies that p consists of only heart variables. We conclude that in the subgraph induced by the heart variables, each node is connected to y2 , and hence, this subgraph is connected. Case b: y2 ∈ img(φ). The proof for this case is by an argument symmetrical to that of Case a. Case c: img(φ) contains neither y1 nor y2 . Here, we use the assumption that such φ does not fd-separate y1 from y2 , which means that there is a path p12 between y1 and y2 , such that p12 does not contain any node in img(φ). Since z is a heart variable, either z ∈ / fdSep(φ, y1 ) or z ∈ / fdSep(φ, y2 ), and hence, Gv (Q) contains a path p that connects z to either y1 or y2 , and p does not contain any node in img(φ). Hence, the combination of p and p12 implies that in the subgraph induced by the heart variables, z is connected to y1 (and to y2 ). Hence, this induced subgraph is connected. We now consider Case 2. Let G♥ be the subgraph of Gv (Q) that is induced by the heart variables of νcφ . 
Let z be a heart variable of νcφ . By definition, z ∈ / fdSep(φ, y) for some y ∈ Varh (P ). Hence, Gv (Q) contains a path p between z and y, such that p does not visit a node of img(φ). Note that none of the nodes of p are in fdSep(φ, y); therefore, all the nodes of p are heart variables. We conclude that p is a path in G♥ , and hence, z is connected to y in G♥ . Of course, y is not in img(φ). So, it is enough to show that every two variables y and y ′ where y, y ′ ∈ Varh (p) \ img(φ) are connected by a path in G♥ . We prove that next. Suppose that y and y ′ are two variables in Varh (p)\img(φ). It is enough to show that there is an atom γ, such that G♥ has a path from a variable of γ to y, and a path from a variable of γ to y ′ (this is enough, since those variables of γ are either the same or connected by an edge in G∃ (Q)). We choose γ to be an atom of Q, such that y, y ′ ∈ img(γ); due to the case we currently consider (Case 2), such γ necessarily exists. If Var(γ) ⊆ fdSep(φ, y), then Part 1 of Theorem 4.2 implies that y is in fdSep(φ, y), which is impossible since y ∈ / img(φ). Hence, there is some variable z ′ of γ, such that Gv (Q) contains a path p′ between z ′ and y, and none 17 in other words, Var(φ1 ) ⊆ fdSep(φ, y2 ). From Part 1 of Theorem 4.2 we conclude that fdSep(φ1 , y2 ) ⊆ fdSep(φ, y2 ). But then, the fact that φ is an encounter implies that Var(φ) 6⊆ fdSep(φ1 , y2 ), and hence, we get that fdSep(φ, y2 ) is a strict superset of fdSep(φ1 , y2 ). Bit this contradicts our choice of φ1 as one with a maximal cardinality for fdSep(φ, y2 ). The proof that Var(φ2 ) contains heart variables of νcφ is similar to the proof for φ1 . 
Specifically, in the case where we P P assume that img P h (φ2 ) ⊆ img h (φ), we have that img h (φ2 ) ⊆ P P P img h (φ) implies that img h (φ2 ) = img h (φ); this is true since P in the selection of φ1 and φ2 we had img P h (φ) 6⊆ img h (φ1 ) P as y2 ∈ img P (φ) and φ is such that |img (φ )| is of max2 2 h h imal cardinality among those γ ∈ P satisfying img P h (γ) 6⊆ img P h (φ1 ). Proof. From Lemmas B.7 and B.8 (and due to the fact that a has no heart constants in it), we conclude that every match ν ∈ Ma (Q, I) is an instantiation of Θ, that is, ν is equal to Θµ for some µ ∈ M(Q⋆ , I ⋆ ). Observe that Θ(y1 ) (where i = 1, 2) contains y1⋆ , since y1 ∈ / img(φ2 ). Similarly, Θ(y2 ) contains y2⋆ . We conclude that for each ν ∈ M(Q, I), both y1⋆ and y2⋆ appear in ν(y). In particular, due to our choice of a, for µ ∈ M(Q⋆ , I ⋆ ) we have Θµ (y) = a if and only if µ(y1⋆ ) = ✁ and µ(y2⋆ ) = ✄. This completes the proof. Lemma 5.16. In Cases 1.b and 2, the atom φi (i = 1, 2) is a non-encounter, and Θ(φi ) contains yi⋆ . Moreover, the following are equivalent. 1. There is a side-effect-free solution JΘ for IΘ and a, such that RφJΘ = RφIΘ for all encounters φ. 2. There is a side-effect-free solution for I and a. Lemma B.7. Every match for Q in I is either an instantiation of Θ or an encounter assignment. Proof. Let ν be a match for Q in I. If no heart appears in the image of ν, then ν is a match for Q in IΘ , and from Lemma 5.5 we conclude that ν is an instantiation of Θ. So, suppose that the image of ν contains some ♥φc . We first show that the image of ν does not contain any ′ φ′ ♥c′ that is different from ♥φc . So, let ♥φc′ be another heart in the image of ν. Lemmas B.5 and B.6 imply that in Gv (Q), each heart variable of νcφ is connected to some variable of φ1 through a path p that contains only heart variables of νcφ . Since no fact of I contains two different hearts (by the construction of I), the image of all the nodes z of p are such that ν(z) = ♥φc . 
In particular, ν(φ1 ) contains ♥φc . A similar ′ argument shows that ν(φ1 ) contains ♥φc′ . Again, no fact of ′ I contains two different hearts, and therefore, ♥φc = ♥φc′ . Next, we will show that ν(z) = νcφ (z) for all variables z of φ. Lemma B.5 and the way I is constructed imply that every heart variable of νcφ is such that ν(z) = ♥φc . Hence, ν(z) = νcφ (z) whenever z is a heart variable of νcφ . Let µ ∈ Ma⋆ (Q⋆ , I ⋆ ) be a match with µ(x⋆ ) = c. It remains to show that νcφ (z ′ ) = Θµ (z ′ ) whenever z ′ is not a heart variable of νcφ . Let ν ′ be obtained from ν by replacing ν(z) with Θµ (z) whenever z is a heart variable of νcφ . From the construction of νcφ it follows that each fact in the image of ν ′ is a fact of IΘ , and hence, ν ′ is a match for Q in IΘ . From Lemma 5.5 we conclude that ν ′ is Θµ′ for some µ′ ∈ M(Q⋆ , I ⋆ ). We will show that µ = µ′ . Lemma B.6 states that φ1 contains a heart variable of νcφ . Since only one fact over Rφ1 contains ♥φc , we conclude that ν(φ1 ) = ν ′ (φ1 ). Now, Lemma B.4 shows that Θ(φ1 ) mentions y1⋆ , and therefore, µ(y1⋆ ) = µ′ (y1⋆ ). We similarly show that µ(y2⋆ ) = µ′ (y2⋆ ). Finally, since φ1 is in P we conclude that Θ(φ1 ) contains x⋆ , and hence, µ(x⋆ ) = µ′ (x⋆ ). It follows that µ = µ′ , and hence, ν ′ = Θµ . In particular, νcφ (z ′ ) = Θµ (z ′ ) whenever z ′ is not a heart variable of νcφ , as claimed. Proof. The fact that φi (i = 1, 2) is a non-encounter and Θ(φi ) contains yi⋆ is stated by Lemma B.4. 1 ⇒ 2. We assume that JΘ is a side-effect-free solution for IΘ and a, such that RφJΘ = RφIΘ for all encounters φ. Building on Lemma B.3 (specifically, the equivalence between 2 and 3), we can assume that RφJΘ = RφIΘ for all φ ∈ atoms(Q)\{φ1 , φ2 }. Here, we implicitly use Lemma B.4, which implies that in Lemma B.3 we can use φ1 and φ2 in place of γ1 and γ2 , respectively. Let J be obtained from I by removing every fact in IΘ \JΘ . We will prove that J is side-effect free. 
Lemma B.9 implies that J is a solution, since a is obtained only through matches that require facts only from JΘ (and we have a ∈ / Q(JΘ )). Lemma B.6 implies that we have not lost any answer of Q(I) of the form νcφ (y), since none of the νcφ use facts among the removed ones. Furthermore, we have not lost any answer of the form Θµ (y), except for a, since each such Θµ (y) is in Q(JΘ ). Finally, from Lemma B.7 we conclude that Q(J) = Q(I) \ {a}, that is, J is side-effect free, as claimed. 2 ⇒ 1. Let J be a side-effect-free solution for I and a. Without loss of generality, we assume that J is economical in the sense that it does not miss any fact of I unless this fact is (Q, a)-useful in I. Clearly, every solution can be transformed into an economical one. Let JΘ be the subinstance J ∩ IΘ . We will show that JΘ is side-effect-free solution for IΘ and a, and that RφJΘ = RφIΘ for all encounters φ. To show that JΘ is side-effect free, let b 6= a be an answer in Q(IΘ ), and we need to show that b ∈ Q(JΘ ). We know that b ∈ Q(J), since J is side-effect free (and Q(IΘ ) ⊆ Q(I)). From Lemmas B.7 and B.8 we get that the matches for Q in J that produce b use facts only from IΘ . Since all these facts are in JΘ , we get that b ∈ Q(JΘ ), as claimed. Let φ be an encounter. We complete the proof by proving that RφJΘ = RφIΘ . Let f be a fact of RφIΘ . We will show that f ∈ J. Suppose first that f = Θc✁,✄ (φ) for some c. Lemmas B.8 and B.7 imply that every νcφ is match for Q in J (since J is side-effect free). In particular, f is in J, since f = νcφ (φ) (which follows from the construction of νcφ in both Case 1.b and Case 2). Now suppose that f is a fact of the form Θb1 ,b2 (φ) where (b1 , b2 ) 6= (✁, ✄). Then f does not participate in any Θµ with µ ∈ Ma⋆ (Q⋆ , I ⋆ ), since φ is an encounter. But then, Lemma B.9 implies that f is not (Q, a)-useful in I. And since J is economical, J contains every fact that is not (Q, a)-useful, and in particular, the fact f . Lemma B.8. 
Let νcφ be an encounter assignment. The answer νcφ (y) in Q(I) has one or more occurrences of ♥φc . Proof. We begin with Case 1.b. Our assumption in this case implies that either y1 or y2 (or both) is not in img(φ). This means that either y1 ∈ / fdSep(φ, y1 ) or y2 ∈ / fdSep(φ, y2 ). Therefore, either νcφ (y1 ) = ♥φc or νcφ (y2 ) = ♥φc . We now consider Case 2. Let y be a head variable in Varh (P ) \ img(φ). Then y ∈ / fdSep(φ, y), and therefore, y is a heart variable of φ, which means that νcφ (y) = ♥φc . Lemma B.9. Ma (Q, I) = {Θµ | µ ∈ Ma⋆ (Q⋆ , I ⋆ )} 18
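As a rough illustration of two notions that the proofs above rely on, namely satisfaction of an fd R : A → i and paths of Gv(Q) that avoid img(φ), the following Python sketch may help. It is not part of the paper. The function names satisfies_fd and fd_sep, the encoding of facts as tuples, and the encoding of the variable graph as an adjacency dictionary are all assumptions made for illustration. The function fd_sep follows the reading of fdSep(φ, y) that the proofs suggest (a variable z belongs to it when every path from z to y visits a node of img(φ), endpoints included); the formal definition appears earlier in the paper and may differ in detail. The set passed as blocked stands in for img(φ) and is supplied by hand here rather than computed from fds.

```python
from collections import deque


def satisfies_fd(facts, lhs_positions, rhs_position):
    """Check an fd R : A -> i over a relation given as a collection of tuples.

    Returns False as soon as two facts agree on all positions in
    lhs_positions (the set A) but differ on rhs_position (the position i).
    """
    seen = {}
    for fact in facts:
        key = tuple(fact[p] for p in lhs_positions)
        value = fact[rhs_position]
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


def fd_sep(graph, blocked, target):
    """Assumed reading of fdSep: variables whose every path to target hits blocked.

    graph   : dict mapping each variable to the set of its neighbours
    blocked : set of variables standing in for img(phi) (hypothetical input)
    target  : the variable y
    """
    if target in blocked:
        # Every path ends at target, which is itself blocked.
        return set(graph)
    # Breadth-first search from target through unblocked variables only.
    reachable = {target}
    queue = deque([target])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in blocked and v not in reachable:
                reachable.add(v)
                queue.append(v)
    # Variables not reached have no path to target that avoids blocked.
    return set(graph) - reachable


# Variable graph of Access(u, f) :- GroupUser(g, u), GroupFile(g, f):
# two variables are adjacent when they occur together in an atom.
access_graph = {"g": {"u", "f"}, "u": {"g"}, "f": {"g"}}
separated_from_f = fd_sep(access_graph, {"g"}, "f")
```

In this toy graph, blocking g cuts u off from f, which mirrors how the proofs argue that a variable outside fdSep(φ, y) has a path to y consisting only of variables outside img(φ).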