Dichotomies in Ontology-Mediated Querying with the Guarded Fragment André Hernich Carsten Lutz University of Liverpool United Kingdom University of Bremen Germany [email protected] Fabio Papacchini [email protected] Frank Wolter University of Liverpool United Kingdom University of Liverpool United Kingdom [email protected] [email protected] ABSTRACT We study the complexity of ontology-mediated querying when ontologies are formulated in the guarded fragment of first-order logic (GF). Our general aim is to classify the data complexity on the level of ontologies where query evaluation w.r.t. an ontology O is considered to be in PT IME if all (unions of conjunctive) queries can be evaluated in PT IME w.r.t. O and CO NP-hard if at least one query is CO NP-hard w.r.t. O. We identify several large and relevant fragments of GF that enjoy a dichotomy between PT IME and CO NP, some of them additionally admitting a form of counting. In fact, almost all ontologies in the BioPortal repository fall into these fragments or can easily be rewritten to do so. We then establish a variation of Ladner’s Theorem on the existence of NP-intermediate problems and use this result to show that for other fragments, there is provably no such dichotomy. Again for other fragments (such as full GF), establishing a dichotomy implies the Feder-Vardi conjecture on the complexity of constraint satisfaction problems. We also link these results to Datalog-rewritability and study the decidability of whether a given ontology enjoys PT IME query evaluation, presenting both positive and negative results. Keywords Ontology-Based Data Access; Query Answering; Dichotomies 1. INTRODUCTION In Ontology-Mediated Querying, incomplete data is enriched with an ontology that provides domain knowledge, enabling more complete answers to queries [47, 10, 34]. This paradigm has recently received a lot of interest, a significant fraction of the research being concerned with the (data) complexity of querying [46, 14] and, closely related, with the rewritability of ontology-mediated queries into more conventional database query languages [16, 26, 28, 31, 27]. A particular emphasis has been put on designing ontology languages that result in PT IME data complexity, and in delinPermission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. PODS’17, May 14 – 19, 2017, Chicago, IL, USA. c 2017 Copyright held by the owner/author(s). Publication rights licensed to ACM. ISBN 978-1-4503-4198-1/17/05. . . $15.00 DOI: http://dx.doi.org/10.1145/3034786.3056108 eating these from the CO NP-hard cases. This question and related ones have given rise to a considerable array of ontology languages, including many description logics (DLs) [3, 37] and a growing number of classes of tuple-generating dependencies (TGDs), also known as Datalog± and as existential rules [13, 45]. A general and uniform framework is provided by the guarded fragment (GF) of first-order logic and extensions thereof, which subsume many of the mentioned ontology languages [5, 6]. In practical applications, ontologies often need to use language features that are only available in computationally expensive ontology languages, but do so in a way such that one may hope for hardness to be avoided. This observation has led to a more fine-grained study of data complexity than on the level of ontology languages, initiated in [42], where the aim is to classify the complexity of individual ontologies while quantifying over the actual query: query evaluation w.r.t. an ontology O is in PT IME if every CQ can be evaluated in PT IME w.r.t. O and it is CO NP-hard if there is at least one CQ that is CO NP-hard to evaluate w.r.t. O. In this way, one can identify tractable ontologies within ontology languages that are, in general, computationally hard. Note that an even more fine-grained approach is taken in [11], where one aims to classify the complexity of each pair (O, q) with O an ontology and q an actual query. Both approaches are reasonable, the first one being preferable when the queries to be answered are not fixed at the design time of the ontology; this is actually often the case because ontologies are typically viewed as general purpose artifacts to be used in more than a single application. In this paper, we follow the former approach. The main aim of this paper is to identify fragments of GF (and of extensions of GF with different forms of counting) that result in a dichotomy between PT IME and CO NP when used as an ontology language and that cover as many real-world ontologies as possible, considering conjunctive queries (CQs) and unions thereof (UCQs) as the actual query language. We also aim to provide insight into which fragments of GF (with and without counting) do not admit such a dichotomy, to understand the relation between PT IME data complexity and rewritability into Datalog (with inequality in rule bodies, in case we start from GF with equality or counting), and to clarify whether it is decidable whether a given ontology has PT IME data complexity. Note that we concentrate on fragments of GF because for the full guarded fragment, proving a dichotomy between PT IME and CO NP implies the long-standing Feder-Vardi conjecture on constraint satisfaction problems [23] which indicates that it is very difficult to obtain (if it holds at all). In particular, we concentrate on the fragment of GF that is invariant under disjoint unions, which we call uGF, and on fragments thereof and their ex- No Dichotomy CSP-Hard (Datalog6= 6= P TIME) Dichotomy (Datalog6= = P TIME) uGF− 2 (2, f ) uGF2 (1, =) uGF− (1, =) uGF2 (2) ALC depth 3 [42] uGF(1) uGF− 2 (2) uGF2 (1, f ) uGC− 2 (1, =) ALCHIQ depth 1 ALCIF ` depth 2 ALCF depth 3 [42] ALCF ` depth 2 ALCHIF depth 2 Figure 1: Summary of the results—Number in brackets indicates depth, f presence of partial functions, ·2 restriction to two variables, ·− restricts outermost guards to be equality, F globally function roles, F` concepts (≤ 1 R). tension with forms of counting. Invariance under disjoint unions is a fairly mild restriction that is shared by many relevant ontology languages, and it admits a very natural syntactic characterization. Our results are summarized in Figure 1. We first explain the fragments shown in the figure and then survey the obtained results. A uGF ontology is a set of sentences of the form ∀~ x(R(~ x) → ϕ(~ x)) where R(~x) is a guard (possibly equality) and ϕ(~ x) is a GF formula that does not contain any sentences as subformulas and in which equality is not used as a guard. The depth of such a sentence is the quantifier depth of ϕ(~ x) (and thus the outermost universal quantifier is not counted). A main parameter that we vary is the depth, which is typically very small in real world ontologies. In Figure 1, the depth is the first parameter displayed in brackets. As usual, the subscript ·2 indicates the restriction to two variables while a superscript ·− means that the guard R(~ x) in the outermost universal quantifier can only be equality, = means that equality is allowed (in non-guard positions), f indicates the ability to declare binary relation symbols to be interpreted as partial functions, and GC2 denotes the two variable guarded fragment extended with counting quantifiers, see [32, 48]. While guarded fragments are displayed in black, description logics (DLs) are shown in grey and smaller font size. We use standard DL names except that ‘F’ denotes globally functional roles while ‘F` ’ refers to counting concepts of the form (≤ 1 R). We do not explain DL names here, but refer to the standard literature [4]. The bottommost part of Figure 1 displays fragments for which there is a dichotomy between PT IME and CO NP, the middle part shows fragments for which such a dichotomy implies the FederVardi conjecture (from now on called CSP-hardness), and the topmost part is for fragments that provably have no dichotomy (unless PT IME = NP). The vertical lines indicate that the linked results are closely related, often indicating a fundamental difficulty in further generalizing an upper bound. For example, uGF− (1, =) enjoys a dichotomy while uGF2 (1, =) is CSP-hard, which demonstrates that generalizing the former result by dropping the restriction that the outermost quantifier has to be equality (indicated by ·− ) is very challenging (if it is possible at all).1 Our positive results are thus optimal in many ways. All results hold both when CQs and when UCQs are used as the actual query; in this context, it is interesting to note that there is a GF ontology (which is not an uGF ontology) for which CQ answering is in PT IME while UCQ-answering is CO NP-hard. In the cases which enjoy a dichotomy, we also show that PT IME query evaluation coincides with rewritability into Datalog (with inequality in the rule bodies if we start from a fragment 1 A tentative proof of the Feder-Vardi conjecture has very recently been announced in [49], along with an invitation to the research community to verify its validity. with equality or counting). In contrast, for all fragments that are CSP-hard or have no dichotomy, these two properties do provably not coincide. This is of course independent of whether or not the Feder-Vardi conjecture holds. For ALCHIQ ontologies of depth 1, we also show that it is decidable and E XP T IME-complete whether a given ontology admits PT IME query evaluation (equivalently: rewritability into Datalog6= ). For uGC− 2 (1, =), we show a NE XP T IME upper bound. For ALC ontologies of depth 2, we establish NE XP T IME-hardness. The proof indicates that more sophisticated techniques are needed to establish decidability, if the problem is decidable at all (which we leave open). To understand the practical relevance of our results, we have analyzed 411 ontologies from the BioPortal repository [52]. After removing all constructors that do not fall within ALCHIF , an impressive 405 ontologies turned out to have depth 2 and thus belong to a fragment with dichotomy (sometimes modulo an easy complexity-preserving rewriting). For ALCHIQ, still 385 ontologies had depth 1 and so belonged to a fragment with dichotomy. As a concrete and simple example, consider the two uGC− 2 (1)ontologies O1 = {∀x (Hand(x) → ∃=5 y hasFinger(x, y))} O2 = {∀x (Hand(x) → ∃y (hasFinger(x, y) ∧ Thumb(y)))} which both enjoy PT IME query evaluation (and thus rewritability into Datalog6= ), but where query evaluation w.r.t. the union O1 ∪O2 is CO NP-hard. Note that such subtle differences cannot be captured when data complexity is studied on the level of ontology languages, at least when basic compositionality conditions are desired. We briefly highlight some of the techniques used to establish our results. An important role is played by the notions of materializability and unraveling tolerance of an ontology O. Materializability means that for every instance D, there is a universal model of D and O, defined in terms of query answers rather than in terms of homomorphisms (which, as we show, need not coincide in our context). Unraveling tolerance means that the ontology cannot distinguish between an instance and its unraveling into a structure of bounded treewidth. While non-materializability of O implies that query evaluation w.r.t. O is CO NP-hard, unraveling tolerance of O implies that query evaluation w.r.t. O is in PT IME (in fact, even rewritable into Datalog). To establish dichotomies, we prove for the relevant fragments that materializability implies unraveling tolerance which, depending on the fragment, can be technically rather subtle. To prove CSP-hardness or non-dichotomies, very informally speaking, we need to express properties in the ontology that a (positive existential) query cannot ‘see’. This is often very subtle and can often be achieved only partially. While the latter is not a major problem for CSP-hardness (where we need to deal with CSPs that ‘admit precoloring’ and are known to behave essentially in the same way as traditional CSPs), it poses serious challenges when proving non-dichotomy. To tackle this problem, we establish a variation of Ladner’s theorem on NP-intermediate problems such that instead of the word problem for NP Turing machines, it speaks about the run fitting problem, which is to decide whether a given partially described run of a Turing machine (which corresponds to a precoloring in the CSP case) can be extended to a full run that is accepting. Also our proofs of decidability of whether an ontology admits PT IME query evaluation are rather subtle and technical, involving e.g. mosaic techniques. Due to space constraints, throughout the paper we defer proof details to the appendix. Related Work. Ontology-mediated querying has first been considered in [40, 15]; other important papers include [16, 12, 5]. It is a form of reasoning under integrity constraints, a traditional topic in database theory, see e.g. [9, 8] and references therein, and it is also related to deductive databases, see e.g. the monograph [44]. Moreover, ontology-mediated querying has drawn inspiration from query answering under views [17, 18]. In recent years, there has been significant interest in complete classification of the complexity of hard querying problems. In the context of ontology-mediated querying, relevant references include [42, 11, 41]. In fact, this paper closes a number of open problems from [42] such as that ALCI ontologies of depth two enjoy a dichotomy and that materializability (and thus PT IME complexity and Datalog-rewritability) is decidable in many relevant cases. Other areas of database theory where complete complexity classifications are sought include consistent query answering [36, 35, 24, 21], probabilistic databases [51], and deletion propagation [33, 25]. 2. PRELIMINARIES We assume an infinite set ∆D of data constants, an infinite set ∆N of labeled nulls disjoint from ∆D , and a set Σ of relation symbols containing infinitely many relation symbols of any arity ≥ 1. A (database) instance D is a non-empty set of facts R(a1 , . . . , ak ), where R ∈ Σ, k is the arity of R, and a1 , . . . , ak ∈ ∆D . We generally assume that instances are finite, unless otherwise specified. An interpretation A is a non-empty set of atoms R(a1 , . . . , ak ), where R ∈ Σ, k is the arity of R, and a1 , . . . , ak ∈ ∆D ∪ ∆N . We use sig(A) and dom(A) to denote the set of relation symbols and, respectively, constants and labelled nulls in A. We always assume that sig(A) is finite while dom(A) can be infinite. Whenever convenient, interpretations A are presented in the form (A, (RA )R∈sig(A) ) where A = dom(A) and RA is a k-ary relation on A for each R ∈ sig(A) of arity k. An interpretation A is a model of an instance D, written A |= D, if D ⊆ A. We thus make a strong open world assumption (interpretations can make true additional facts and contain additional constants and nulls) and also assume standard names (every constant in D is interpreted as itself in A). Note that every instance is also an interpretation. Assume A and B are interpretations. A homomorphism h from A to B is a mapping from dom(A) to dom(B) such that R(a1 , . . . , ak ) ∈ A implies R(h(a1 ), . . . , h(ak )) ∈ B for all a1 , . . . , ak ∈ dom(A) and R ∈ Σ of arity k. We say that h preserves a set D of constants and labelled nulls if h(a) = a for all a ∈ D and that h is an isomorphic embedding if it is injective and R(h(a1 ), . . . , h(ak )) ∈ B entails R(a1 , . . . , ak ) ∈ A. An interpretation A ⊆ B is a subinterpretation of B if R(a1 , . . . , ak ) ∈ B and a1 , . . . , ak ∈ dom(A) implies R(a1 , . . . , ak ) ∈ A; if dom(A) = A, we denote A by B|A and call it the subinterpretation of B induced by A. Conjunctive queries (CQs) q of arity k take the form q(~ x) ← φ, where ~ x = x1 , . . . , xk is the tuple of answer variables of q, and φ is a conjunction of atomic formulas R(y1 , . . . , yn ) with R ∈ Σ of arity n and y1 , . . . , yn variables. As usual, all variables in ~ x must occur in some atom of φ. Any CQ q(~ x) ← φ can be regarded as an instance Dq , often called the canonical database of q, in which each variable y of φ is represented by a unique data constant ay , and that for each atom R(y1 , . . . , yk ) in φ contains the atom R(ay1 , . . . , ayk ). A tuple ~a = (a1 , . . . , ak ) of constants is an answer to q(x1 , . . . , xk ) in A, in symbols A |= q(~a), if there is a homomorphism h from Dq to A with h(ax1 , . . . , axk ) = ~a. A union of conjunctive queries (UCQ) q takes the form q1 (~ x), . . . , qn (~ x), where each qi (~ x) is a CQ. The qi are called disjuncts of q. A tuple ~a of constants is an answer to q in A, denoted by A |= q(~a), if ~a is an answer to some disjunct of q in A. We now introduce the fundamentals of ontology-mediated querying. An ontology language L is a set of first-order sentences over signature Σ (that is, function symbols are not allowed) and an L-ontology O is a finite set of sentences from L. We introduce various concrete ontology languages throughout the paper, including fragments of the guarded fragment and descriptions logics. An interpretation A is a model of an ontology O, in symbols A |= O, if it satisfies all its sentences. An instance D is consistent w.r.t. O if there is a model of D and O. An ontology-mediated query (OMQ) is a pair (O, q), where O is an ontology and q a UCQ. The semantics of an ontology-mediated query is given in terms of certain answers, defined next. Assume that q has arity k and D is an instance. Then a tuple ~a of length k in dom(D) is a certain answer to q on an instance D given O, in symbols O, D |= q(~a), if A |= q(~a) for all models A of D and O. The query evaluation problem for an OMQ (O, q(~ x)) is to decide, given an instance D and a tuple ~a in D, whether O, D |= q(~a). We use standard notation for Datalog programs (a brief introduction is given in the appendix). An OMQ (O, q(~ x)) is called Datalog-rewritable if there is a Datalog program Π such that for all instances D and ~a ∈ dom(D), O, D |= q(~a) iff D |= Π(~a). Datalog6= -rewritability is defined accordingly, but allows the use of inequality in the body of Datalog rules. We are mainly interested in the following properties of ontologies. Definition 1 Let O be an ontology and Q a class of queries. Then • Q-evaluation w.r.t. O is in PT IME if for every q ∈ Q, the query evaluation problem for (O, q) is in PT IME. • Q-evaluation w.r.t. O is Datalog-rewritable (resp. Datalog6= rewritable) if for every q ∈ Q, the query evaluation problem for (O, q) is Datalog-rewritable (resp. Datalog6= -rewritable). • Q-evaluation w.r.t. O is CO NP-hard if there is a q ∈ Q such that the query evaluation problem for (O, q) is CO NP-hard. 2.1 Ontology Languages As ontology languages, we consider fragments of the guarded fragment (GF) of FO, the two-variable guarded fragment of FO with counting, and DLs. Recall that GF formulas [1] are obtained by starting from atomic formulas R(~ x) over Σ and equalities x = y and then using the boolean connectives and guarded quantifiers of the form ∀~ y (α(~ x, ~ y ) → ϕ(~ x, ~ y )), ∃~ y (α(~ x, ~ y ) ∧ ϕ(~ x, ~ y )) where ϕ(~x, ~ y ) is a guarded formula with free variables among ~ x, ~ y and α(~x, ~ y ) is an atomic formula or an equality x = y that contains all variables in ~ x, ~ y . The formula α is called the guard of the quantifier. In ontologies, we only allow GF sentences ϕ that are invariant under disjoint unions, that is, for all families Bi , i ∈ I, of interpretations with mutually disjoint domains, the following holds: S Bi |= ϕ for all i ∈ I if, and only if, i∈I Bi |= ϕ. We give a syntactic characterization of GF sentences that are invariant under disjoint unions. Denote by openGF the fragment of GF that consists of all (open) formulas whose subformulas are all open and in which equality is not used as a guard. The fragment uGF of GF is the set of sentences obtained from openGF by a single guarded universal quantifier: if ϕ(~ y ) is in openGF, then ∀~ y (α(~ y ) → ϕ(~ y )) is in uGF, where α(~ y ) is an atomic formula or an equality y = y that contains all variables in ~ y . We often omit equality guards in uGF sentences of the form ∀y(y = y → ϕ(y)) and simply write ∀yϕ. A uGF ontology is a finite set of sentences in uGF. Theorem 1 A GF sentence is invariant under disjoint unions iff it is equivalent to a uGF sentence. P ROOF. The direction from right to left is straightforward. For the converse direction, observe that every GF sentence is equivalent to a Boolean combination of uGF sentences. Now assume that ϕ is a GF sentence and invariant under disjoint unions. Let cons(ϕ) be the set of all sentences χ in uGF with ϕ |= χ. By compactness of FO it is sufficient to show that cons(ϕ) |= ϕ. If this is not the case, take a model A0 of cons(ϕ) refuting ϕ and take for any sentence ψ in uGF that is not in cons(ϕ) an interpretation A¬ψ satisfying ϕ and refuting ψ. Let A1 be the disjoint union of all A¬ψ . By preservation of ϕ under disjoint unions, A1 satisfies ϕ. By reflection of ϕ for disjoint unions, the disjoint union A of A0 and A1 does not satisfy ϕ. Thus A1 satisfies ϕ and A does not satisfy ϕ but by construction A and A1 satisfy the same sentences in uGF. This is impossible since ϕ is equivalent to a Boolean combination of uGF sentences. The following example shows that some very simple Boolean combinations of uGF sentences are not invariant under disjoint unions. Example 1 Let OUCQ/CQ OMat/PTime = {(∀x(A(x) ∨ B(x)) ∨ ∃xE(x)} = {∀xA(x) ∨ ∀xB(x)} Then OMat/PTime is not preserved under disjoint unions since D1 = {A(a)} and D2 = {B(b)} are models of OMat/PTime but D1 ∪ D2 refutes OMat/PTime ; OUCQ/CQ does not reflect disjoint unions since the disjoint union of D01 = {E(a)} and D02 = {F (b)} is a model of OUCQ/CQ but D02 refutes OUCQ/CQ . We will use these ontologies later to explain why we restrict this study to fragments of GF that are invariant under disjoint unions. When studying uGF ontologies, we are going to vary several parameters. The depth of a formula ϕ in openGF is the nesting depth of guarded quantifiers in ϕ. The depth of a sentence ∀~ y (α(~ y ) → ϕ(~ y )) in uGF is the depth of ϕ(~ y ), thus the outermost guarded quantifier is not counted. The depth of a uGF ontology is the maximum depth of its sentences. We indicate restricted depth in brackets, writing e.g. uGF(2) to denote the set of all uGF sentences of depth at most 2. Example 2 The sentence ∀xy(R(x, y) → (A(x) ∨ ∃zS(y, z))) is in uGF(1) since the openGF formula A(x) ∨ ∃zS(y, z) has depth 1. For every GF sentence ϕ, one can construct in polynomial time a conservative extension ϕ0 in uGF(1) by converting into Scott normal form [29]. Thus, the satisfiability and CQ-evaluation problems for full GF can be polynomially reduced to the corresponding problem for uGF(1). We use uGF− to denote the fragment of uGF where only equality guards are admitted in the outermost universal quantifier applied to an openGF formula. Thus, the sentence in Example 2 (1) is a uGF sentence of depth 1, but not a uGF− sentence of depth 1. It is, however, equivalent to the following uGF− sentence of depth 1: ∀x(∃y((R(y, x) ∧ ¬A(y)) → ∃zS(x, z))) An example of a uGF sentence of depth 1 that is not equivalent to a uGF− sentence of depth 1 is given in Example 3 below. Intuitively, uGF sentences of depth 1 can be thought of as uGF− sentences of ‘depth 1.5’ because giving up ·− allows an additional level of ‘real’ quantification (meaning: over guards that are not forced to be equality), but only in a syntactically restricted way. The two-variable fragment of uGF is denoted with uGF2 . More precisely, in uGF2 we admit only the two fixed variables x and y and disallow the use of relation symbols of arity exceeding two. We also consider two extensions of uGF2 with forms of counting. First, uGF2 (f ) denotes the extension of uGF2 with function symbols, that is, an uGF2 (f ) ontology is a finite set of uGF2 sentences and of functionality axioms ∀x∀y1 ∀y2 ((R(x, y1 )∧R(x, y2 )) → (y1 = y2 )) [29]. Second, we consider the extension uGC2 of uGF2 with counting quantifiers. More precisely, the language openGC2 is defined in the same way as the two-variable fragment of openGF, but in addition admits guarded counting quantifiers [48, 32]: if n ∈ , {z1 , z2 } = {x, y}, and α(z1 , z2 ) ∈ {R(z1 , z2 ), R(z2 , z1 )} for some R ∈ Σ and ϕ(z1 , z2 ) is in openGC2 , then ∃≥n z1 (α(z1 , z2 )∧ ϕ(z1 , z2 )) is in openGC2 . The ontology language uGC2 is then defined in the same way as uGF2 , using openGC2 instead of openGF2 . The depth of formulas in uGC2 is defined in the expected way, that is, guarded counting quantifiers and guarded quantifiers both contribute to it. The above restrictions can be freely combined and we use the obvious names to denote such combinations. For example, uGF− 2 (1, f ) denotes the two-variable fragment of uGF with function symbols and where all sentences must have depth 1 and the guard of the outermost quantifier must be equality. Note that uGF admits equality, although in a restricted way (only in non-guard positions, with the possible exception of the guard of the outermost quantifier). We shall also consider fragments of uGF that admit no equality at all except as a guard of the outermost quantifier. To emphasize that the restricted use of equality is allowed, we from now on use the equality symbol in brackets whenever equality is present, as in uGF(=), uGF− (1, =), and uGC− 2 (1, =). Conversely, uGF, uGF− (1), and uGC− 2 (1) from now on denote the corresponding fragments where equality is only allowed as a guard of the outermost quantifier. N Description logics are a popular family of ontology languages that are related to the guarded fragments of FO introduced above. We briefly review the basic description logic ALC, further details on this and other DLs mentioned in this paper can be found in the appendix and in [4]. DLs generally use relations of arity one and two, only. An ALC concept is formed according to the syntax rule C, D ::= A | > | ⊥ | ¬C | C u D | C t D | ∃R.C | ∀R.C where A ranges over unary relations and R over binary relations. An ALC ontology O is a finite set of concept inclusions C v D, with C and D ALC concepts. The semantics of ALC concepts C can be given by translation to openGF formulas C ∗ (x) with one free variable x and two variables overall. A concept inclusion C v ∗ ∗ D then translates to the uGF− 2 sentence ∀x(C (x) → D (x)). The depth of an ALC concept is the maximal nesting depth of ∃R and ∀R. The depth on an ALC ontology is the maximum depth of concepts that occur in it. Thus, every ALC ontology of depth n is a uGF− 2 ontology of depth n. When translating into uGF2 instead of into uGF− 2 , the depth might decrease by one because one can exploit the outermost quantifier (which does not contribute to the depth). A more detailed description of the relationship between DLs and fragments of uGF is given in the appendix. Example 3 The ALC concept inclusion ∃S.A v ∀R.∃S.B has depth 2, but is equivalent to the uGF2 (1) sentence ∀xy(R(x, y) → ((∃S.A)∗ (x) → (∃S.B)∗ (y)) Note that for any ontology O in any DL considered in this paper one can construct in a straightforward way in polynomial time a conservative extension O∗ of O of depth one. In fact, many DL algorithms for satisfiability or query evaluation assume that the ontology is of depth one and normalized. We also consider the extensions of ALC with inverse roles R− (denoted in the name of the DL by the letter I), role inclusions R v S (denoted by H), qualified number restrictions (≥ n R C) (denoted by Q), partial functions as defined above (denoted by F), and local functionality expressed by (≤ 1R) (denoted by F` ). The depth of ontologies formulated in these DLs is defined in the obvious way. Thus, ALCHIQ ontologies (which admit all the constructors introduced above) translate into uGC− 2 ontologies, preserving the depth. For any syntactic object O (such as an ontology or a query), we use |O| to denote the number of symbols needed to write O, counting relation names, variable names, and so on as a single symbol and assuming that numbers in counting quantifiers and DL number restrictions are coded in unary. 2.2 Guarded Tree Decompositions We introduce guarded tree decompositions and rooted acyclic queries [30]. A set G ⊆ dom(A) is guarded in the interpretation A if G is a singleton or there are R ∈ Σ and R(a1 , . . . , ak ) ∈ A such that G = {a1 , . . . , ak }. By S(A), we denote the set of all guarded sets in A. A tuple (a1 , . . . , ak ) ∈ Ak is guarded in A if {a1 , . . . , ak } is a subset of some guarded set in A. A guarded tree decomposition of A is a triple (T, E, bag) with (T, E) an acyclic undirected graph and bag a function that assigns to every t ∈ T a set bag(t) of atoms such that A|dom(bag(t)) = bag(t) and S 1. A = t∈T bag(t); 2. dom(bag(t)) is guarded for every t ∈ T ; 3. {t ∈ T | a ∈ dom(bag(t))} is connected in (T, E), for every a ∈ dom(A). We say that A is guarded tree decomposable if there exists a guarded tree decomposition of A. We call (T, E, bag) a connected guarded tree decomposition (cg-tree decomposition) if, in addition, (T, E) is connected (i.e., a tree) and dom(bag(t)) ∩ dom(bag(t0 )) 6= ∅ for all (t, t0 ) ∈ E. In this case, we often assume that (T, E) has a designated root r, which allows us to view (T, E) as a directed tree whenever convenient. A CQ q ← φ is a rooted acyclic query (rAQ) if there exists a cgtree decomposition (T, E, bag) of the instance Dq with root r such that dom(bag(r)) is the set of answer variables of q. Note that, by definition, rAQs are non-Boolean queries. Example 4 The CQ q(x) ← φ, φ = R(x, y) ∧ R(y, z) ∧ R(z, x) is not an rAQ since Dq is not guarded tree decomposable. By adding the conjunct Q(x, y, z) to φ one obtains an rAQ. We will frequently use the following construction: let D be an instance and G a set of guarded sets in D. Assume that BG , G ∈ G, are interpretations such that dom(BG ) ∩ dom(D) = G and dom(BG1 ) ∩ dom(BG2 ) = G1 ∩ G2 for any two distinct guarded sets G1 and G2 in G. Then the interpretation [ B=D∪ BG G∈G is called the interpretation obtained from D by hooking BG to D for all G ∈ G. If the BG are cg-tree decomposable interpretations with dom(bag(r)) = G for the root r of a (fixed) cg-tree decomposition of BG , then B is called a forest model of D defined using G. If G is the set of all maximal guarded sets in D, then we call B simply a forest model of D. The following result can be proved using standard guarded tree unfolding [29, 30]. Lemma 1 Let O be a uGF(=) or uGC2 (=) ontology, D a possibly infinite instance, and A a model of D and O. Then there exists a forest model B of D and O and a homomorphism h from B to A that preserves dom(D). 3. MATERIALIZABILITY We introduce and study materializability of ontologies as a necessary condition for query evaluation to be in PT IME. In brief, an ontology O is materializable if for every instance D, there is a model A of O and D such that for all queries, the answers on A agree with the certain answers on D given O. We show that this sometimes, but not always, coincides with existence of universal models defined in terms of homomorphisms. We then prove that in uGF(=) and uGC2 (=), non-materializability implies CO NP-hard query answering while this is not the case for GF. Using these results, we further establish that in uGF(=) and uGC2 (=), query evalution w.r.t. ontologies to be in PT IME, Datalog6= -rewritable, and CO NP-hard does not depend on the query language, that is, all these properties agree for rAQs, CQs, and UCQs. Again, this is not the case for GF. Definition 2 (Materializability) Let O be an FO(=)-ontology, Q a class of queries, and M a class of instances. Then • an interpretation B is a Q-materialization of O and an instance D if it is a model of O and D and for all q(~ x) ∈ Q and ~a in dom(D), B |= q(~a) iff O, D |= q(~a). • O is Q-materializable for M if for every instance D ∈ M that is consistent w.r.t. O, there is a Q-materialization of O and D. If M is the class of all instances, we simply speak of Qmaterializability of O. We first observe that the materializability of ontologies does not depend on the query language (although concrete materializations do). Theorem 2 Let O be a uGF(=) or uGC2 (=) ontology and M a class of instances. Then the following conditions are equivalent: 1. O is rAQ-materializable for M; 2. O is CQ-materializable for M; 3. O is UCQ-materializable for M. P ROOF. The only non-trivial implication is (1) ⇒ (2). It can be proved by using Lemma 1 and showing that if A is a rAQmaterialization of an ontology O and an instance D, then any forest model B of O and D which admits a homomorphism to A that preserves dom(D) is a CQ-materialization of O and D. Because of Theorem 2, we from now on speak of materializability without reference to a query language and of materializations instead of UCQ-materializations (which are then also CQmaterializations and rAQ-materializations). A notion closely related to materializations are (homomorphically) universal models as used e.g. in data exchange [22, 20]. A model of an ontology O and an instance D is hom-universal if there is a homomorphism preserving dom(D) into any model of O and D. We say that an ontology O admits hom-universal models if there is a hom-universal model for O and any instance D. It is well-known that hom-universal models are closely related to what we call UCQ-materializations. In fact, in many DLs and in uGC2 (=), materializability of an ontology O coincides with O admitting hom-universal models (although for concrete models, being hom-universal is not the same as being a materialization). We show in the appendix that this is not the case for ontologies in uGF(2) (with three variables). The proof also shows that admitting homuniversal models is not a necessary condition for query evaluation to be in PT IME (in contrast to materializability). Lemma 2 A uGC2 (=) ontology is materializable iff it admits homuniversal models. This does not hold for uGF(2) ontologies. The following theorem links materializability to computational complexity, thus providing the main reason for our interest into this notion. The proof is by reduction of 2+2-SAT [50], a variation of a related proof from [42]. Theorem 3 Let O be an FO(=)-ontology that is invariant under disjoint unions. If O is not materializable, then rAQ-evaluation w.r.t. O is CO NP-hard. nulls) and each qi is a rAQ. This is similar to squid decompositions in [12], but more subtle due to the presence of subqueries that are not connected to any answer variable of q. Similar constructions are used also to deal with Datalog6= -rewritability and with CO NPhardness. The ontology OUCQ/CQ from Example 1 shows that Theorem 4 does not hold for GF ontologies, even if they use only a single variable and are of depth 1 up to an outermost universal quantifier with an equality guard. Lemma 3 CQ-evaluation w.r.t. OUCQ/CQ is in PT IME and UCQevaluation w.r.t. OUCQ/CQ is CO NP-hard. The lower bound essentially follows the construction in the proof of Theorem 3 and the upper bound is based on a case analysis, depending on which relations occur in the CQ and in the input instance. 4. UNRAVELLING TOLERANCE While materializability of an ontology is a necessary condition for PT IME query evaluation in uGF(=) and uGC2 (=), we now identify a sufficient condition called unravelling tolerance that is based on unravelling instances into cg-tree decomposable instances (which might be infinite). In fact, unravelling tolerance is even a sufficient condition of Datalog6= -rewritability and we will later establish our dichotomy results by showing that, for the ontology languages in question, materializability implies unravelling tolerance. We start with introducing suitable forms of unravelling (also called guarded tree unfolding, see [30] and references therein). The uGF-unravelling Du of an instance D is constructed as follows. Let T (D) be the set of all sequences t = G0 G1 · · · Gn where Gi , 0 ≤ i ≤ n, are maximal guarded sets of D and (a) Gi 6= Gi+1 , (b) Gi ∩ Gi+1 6= ∅, and (c) Gi−1 6= Gi+1 . This remains true when ‘in PT IME’ is replaced with ‘Datalog6= rewritable’ and with ‘CO NP-hard’ (and with ‘Datalog-rewritable’ if O is a uGF ontology). In the following, we associate eachSt ∈ T (D) with a set of atoms bag(t). Then we define Du as t∈T (D) bag(t) and note that (T (D), E, bag) is a cg-tree decomposition of Du where (t, t0 ) ∈ E if t0 = tG for some G. Set tail(G0 · · · Gn ) = Gn . Take an infinite supply of copies of any d ∈ dom(D). We set e↑ = d if e is a copy of d. We define bag(t) (up to isomorphism) proceeding by induction on the length of the sequence t. For any t = G, bag(t) is an instance whose domain is a set of copies of d ∈ G such that the mapping e 7→ e↑ is an isomorphism from bag(G) onto the subinstance D|G of D induced by G. To define bag(t0 ) for t0 = tG0 when tail(t) = G, take for any d ∈ G0 \ G a fresh copy d0 of d and define bag(t0 ) with domain {d0 | d ∈ G0 \ G} ∪ {e ∈ bag(t) | e↑ ∈ G0 ∩ G)} such that the mapping e 7→ e↑ is an isomorphism from bag(t0 ) onto D|G0 . The following example illustrates the construction of Du . Example 5 (1) Consider the instance D depicted below with the maximal guarded sets G1 , G2 , G3 . Then the unravelling Du of D consists of three isomorphic chains (we depict only one such chain): G1 G3 P ROOF. By Theorem 3, we can concentrate on ontologies that are materializable. For the non-trivial implication of Point 3 by Point 1, we exploit materializability to rewrite UCQs into a finite V disjunction of queries q ∧ i qi where q is a “core CQ” that only needs to be evaluated over the input instance D (ignoring labeled (2) Next consider the instance D depicted below which has the shape of a tree of depth one with root a and has three maximal We remark that, in the proof of Theorem 3, we use instances and rAQs that use additional fresh (binary) relation symbols, that is, relation symbols that do not occur in O. The ontology OMat/PTime from Example 1 shows that Theorem 3 does not hold for GF ontologies, even if they are of depth 1 and use only a single variable. In fact, OMat/PTime is not CQ-materializable, but CQ-evaluation is in PT IME (which is both easy to see). Theorem 4 For all uGF(=) and uGC2 (=) ontologies O, the following are equivalent: 1. rAQ-evaluation w.r.t. O is in PT IME; 2. CQ-evaluation w.r.t. O is in PT IME; 3. UCQ-evaluation w.r.t. O is in PT IME. G2 G3 G1 G2 G3 G2 guarded sets G1 , G2 , G3 . Then the unravelling Du of D consists of three isomorphic trees of depth one of infinite outdegree (again we depict only one): G1 G2 a G3 G1 ... a G3 G3 G2 G1 G2 By construction, the mapping h : e 7→ e↑ is a homomorphism from Du onto D and the restriction of h to any guarded set G is an isomorphism. It follows that for any uGF(=) ontology O, UCQ q(~ x), and ~a in Du , if O, Du |= q(~a), then O, D |= q(h(~a)). This implication does not hold for ontologies in the guarded fragment with functions or counting. To see this, let O = {∀x(∃≥4 yR(x, y) → A(x))} Then O, Du |= A(a) for the instance D from Example 5 (2) but O, D 6|= A(a). For this reason the uGF-unravelling is not appropriate for the guarded fragment with functions or counting. By replacing Condition (c) by the stronger condition (c0 ) Gi ∩ Gi−1 6= Gi ∩ Gi+1 , we obtain an unravelling that we call uGC2 -unravelling and that we apply whenever all relations have arity at most two. One can show that the uGC2 -unravelling of an instance preserves the number of R-successors of constants in D and that, in fact, the implication ‘O, Du |= q(~a) ⇒ O, D |= q(h(~a))’ holds for every uGC2 (=) ontology O, UCQ q(~ x), and tuple ~a in the uGC2 -unravelling Du of D. We are now ready to define unravelling tolerance. For a maximal guarded set G in D, the copy in bag(G) of a tuple ~a = (a1 , . . . , ak ) in G is the unique ~b = (b1 , . . . , bk ) in dom(bag(G)) such that b↑i = ai for 1 ≤ i ≤ k. Definition 3 A uGF(=) (resp. uGC2 (=)) ontology O is unravelling tolerant if for every instance D, every rAQ q(~ x), and every tuple ~a in D such that the set G of elements of ~a is maximally guarded in D the following are equivalent: 1. O, D |= q(~a); 2. O, Du |= q(~b) where ~b is the copy of ~a in bag(G) where Du is the uGF-unravelling (resp. the uGC2 -unravelling) of D. We have seen above that the implication (2) ⇒ (1) in Definition 3 holds for every uGF(=) and uGC2 (=) ontology and every UCQ. Note that it is pointless to define unravelling tolerance using the implication (1) ⇒ (2) for UCQs or CQs that are not acyclic. The following example shows that (1) ⇒ (2) does not always hold for rAQs. Example 6 Consider the uGF ontology O that contains the sentences ∀x X(x) → (∃y(R(x, y) ∧ X(y)) → E(x)) with X ∈ {A, ¬A} and ∀x E(x) → ((R(x, y) ∨ R(y, x)) → E(y)) For instances D not using A, O states that E(a) is entailed for all a ∈ dom(D) that are R-connected to some R-cycle in D with an odd number of constants. Thus, for the instance D from Example 5 (1) we have O, D |= E(a) for every a ∈ dom(D) but O, Du 6|= E(a) for any a ∈ dom(Du ). We now show that, as announced, unraveling tolerance implies that query evaluation is Datalog6= -rewritable. Theorem 5 For all uGF(=) and uGC2 (=) ontologies O, unravelling tolerance of O implies that rAQ-evaluation w.r.t. O is Datalog6= -rewritable (and Datalog-rewritable if O is formulated in uGF). P ROOF. We sketch the proof for the case that O is a uGF(=) ontology; similar constructions work for the other cases. Suppose that O is unravelling tolerant, and that q(~ x) is a rAQ. We construct a Datalog6= program Π that, given an instance D, computes the certain answers ~a of q on D given O, where w.l.o.g. we can restrict our attention to answers ~a such that the set G of elements of ~a is maximally guarded in D. By unravelling tolerance, it is enough to check if O, Du |= q(~b), where ~b is the copy of ~a in bag(G) and Du is the uGF-unravelling of D. The Datalog6= program Π assigns to each maximally guarded tuple ~a = (a1 , . . . , ak ) in D a set of types. Here, a type is a maximally consistent set of uGF formulas with free variables in x1 , . . . , xk , where the variable xi represents the element ai . It can be shown that we only need to consider types with formulas of the form φ or ¬φ, where φ is obtained from a subformula of O or q by substituting a variable in x1 , . . . , xk for each of its free variables, or φ is an atomic formula in the signature of O, q with free variables in x1 , . . . , xk . In particular, the set of all types is finite. We further restrict our attention to types θ that are realizable in some model of O, i.e., there is a model B(θ) of O containing all elements of ~a that is a model of each formula in θ under the interpretation xi 7→ ai . The Datalog6= program Π ensures the following: 1. for any two maximally guarded tuples ~a = (a1 , . . . , ak ), ~b = (b1 , . . . , bl ) in D that share an element, and any type θ assigned to ~a there is a type θ0 assigned to ~b that is compatible to θ (intuitively, the two types agree on all formulas that only talk about elements shared by ~a and ~b); 2. a tuple ~a = (a1 , . . . , ak ) is an answer to Π if all types assigned to ~a contain q(x1 , . . . , xk ), or some maximally guarded tuple ~b in D has no type assigned to it. It can be shown that ~a is an answer to Π iff O, Du |= q(~a). The interesting part is the “if” part. Suppose ~a = (a1 , . . . , ak ) is not an answer to Π. Then, each maximally guarded tuple ~b in D is assigned to at least one type, and for some type θ∗ assigned to ~a we have q(x1 , . . . , xk ) ∈ / θ∗ . We use this type assignment to label each maximally guarded tuple ~b of Du with a type θ~b so that (1) for each maximally guarded tuple ~c of Du that shares an element with ~b the two types θ~ and θ~c are compatible; and (2) θ∗ = θ~a∗ , where b ~a∗ is the copy of ~a in G = {a1 , . . . , ak }. We can now show that the interpretation A obtained from Du by hooking B(θ~b ) to Du , for all maximally guarded tuples ~b of Du , is a model of O and Du with A 6|= q(~a∗ ). 5. DICHOTOMIES We prove dichotomies between PT IME and CO NP for query evaluation in the five ontology languages displayed in the bottommost part of Figure 1. In fact, the dichotomy is even between Datalog6= -rewritability and CO NP. The proof establishes that for ontologies O formulated in any of these languages, CQ-evaluation w.r.t. O is Datalog6= -rewritable iff it is in PT IME iff O is unravelling tolerant iff O is materializable for the class of (possibly infinite) cg-tree decomposable instances iff O is materializable and that, if none of this is the case, CQ-evaluation w.r.t. O is CO NPhard. The main step towards the dichotomy result is provided by the following theorem. bag(t0 ) and such that ĥt,t0 (a)↑ = a↑ for all a ∈ dom(Du ). (This is trivial in the example above.) It is for this property that we need that Du is obtained from D using maximal guarded sets only and the assumption that Gi−1 6= Gi+1 . It follows that if tail(t) = tail(t0 ) then the same rAQs are entailed at bag(t) and bag(t0 ) in Du . Theorem 6 Let O be an ontology formulated in one of uGF(1), − uGF− (1, =), uGF− 2 (2), uGC2 (1, =), or an ALCHIF ontology of depth 2. If O is materializable for the class of (possibly infinite) cg-tree decomposable instances D with sig(D) ⊆ sig(O), then O is unravelling tolerant. 2. Homomorphism preservation: if there is a homomorphism h from instance D to instance D0 then O, D |= q(~a) entails O, D0 |= q(h(~a)). Ontologies in uGF(1) and uGF− 2 (2) have this property as they do not use equality nor counting. Because of homomorphism preservation the answers in Du to rAQs are invariant under moving from Du to Du+ . Note that the remaining ontology languages in Theorem 6 do not have this property. P ROOF. We sketch the proof for uGF(1) and uGF− 2 (2) ontologies O and then discuss the remaining cases. Assume that O satisfies the precondition from Theorem 6. Let D be an instance and Du its uGF unravelling. Let ~a be a tuple in a maximal guarded set G in D and ~b be the copy in dom(bag(G)) of ~a. Further let q be an rAQ such that O, Du 6|= q(~b). We have to show that O, D 6|= q(~a). Using the condition that O is materializable for the class of cg-tree decomposable instances D with sig(D) ⊆ sig(O), it can be shown that there exists a materialization B of O and Du . By Lemma 1 we may assume that B is a forest model which is obtained from Du by hooking cg-tree decomposable Bbag(t) to maximal guarded bag(t) in Du . Now we would like to obtain a model of O and the original D by hooking for any maximal guarded G in D the interpretation Bbag(G) to D rather than to Du . However, the resulting model is then not guaranteed to be a model of O. The following example illustrates this. Let O contain Now using that Du+ is materializable w.r.t. O one can uniformize a materialization Bu+ of Du+ that is a forest model in such a way that the automorphisms ĥt,t0 for tail(t) = tail(t0 ) extend to automorphisms of the resulting model Bu∗ which also still satisfies O. In the example, after uniformization all chains will behave in the same way in the sense that every node receives an S-successor not in A. We then obtain a forest model B∗ of D by hooking the in∗ a) terpretations Bu∗ bag(G) to the maximal guarded sets G in D. (B , ~ u∗ ~ ∗ and (B , b) are guarded bisimilar. Thus B is a model of O and B∗ 6|= q(~a), as required. For uGF− (1, =) and uGC− 2 (1, =) the intermediate step of constructing Du+ is not required as sentences have smaller depth and no uniformization is needed to satisfy the ontology in the new model. For ALCHIF ontologies of depth 2 uniformization by constructing Du+ is needed and has to be done carefully to preserve functionality when adding copies of entailed rAQs to Du . ∀x∃y(S(x, y) ∧ A(y)), We can now prove our main dichotomy result. and for ϕ(x) = ∃z(S(x, z) ∧ ¬A(z)) ∀xy(R(x, y) → (ϕ(x) → ϕ(y)) Thus in every model of O each node has an S-successor in A and having an S-successor that is not in A is propagated along R. O is unravelling tolerant. Consider the instance D from Example 5 (1) depicted here again with the maximal guarded sets G1 , G2 , G3 . B A A ¬A A ¬A A G1 G3 G2 A A G2 G3 1. O is materializable; 2. O is materializable for the class of cg-tree decomposable instances D with sig(D) ⊆ sig(O); A A ¬A A ¬A A 3. O is unravelling tolerant; ¬A A 4. query evaluation w.r.t. O is Datalog6= -rewritable (and Datalog-rewritable if O is formulated in uGF); D G1 Theorem 7 Let O be an ontology formulated in one of uGF(1), − uGF− (1, =), uGF− 2 (2), uGC2 (1, =), or an ALCHIF ontology of depth 2. Then the following conditions are equivalent (unless PT IME = NP): ¬A A We have seen that the unravelling Du of D consists of three chains. An example of a forest model B of O and Du is given in the figure. Even in this simple example a naive way of hooking the models BGi , i = 1, 2, 3, to the original instance D will lead to an interpretation not satisfying O as the propagation condition for Ssuccessors not in A will not be satisfied. To ensure that we obtain a model of O we first define a new instance Du+ ⊇ Du by adding to each maximal guarded set in Du a copy of any entailed rAQ. The following facts are needed for this to work: 1. Automorphisms: for any t, t0 ∈ T (D) with tail(t) = tail(t0 ) there is an automorphism ĥt,t0 of Du mapping bag(t) onto 5. query evaluation w.r.t. O is in PT IME. Otherwise, query evaluation w.r.t. O is CO NP-hard. P ROOF. (1) ⇒ (2) is not difficult to establish by a compactness argument. (2) ⇒ (3) is Theorem 6. (3) ⇒ (4) is Theorem 5. (4) ⇒ (5) is folklore. (5) ⇒ (1) is Theorem 3 (assuming PT IME 6= NP). The qualification ‘with sig(D) ⊆ sig(O)’ in Point 2 of Theorem 7 can be dropped without compromising the correctness of the theorem, and the same is true for Theorem 6. It will be useful, though, in the decision procedures developed in Section 8. 6. CSP-HARDNESS We establish the four CSP-hardness results displayed in the middle part of Figure 1, starting with a formal definition of CSPhardness. In addition, we derive from the existence of CSPs in PT IME that are not Datalog definable the existence of ontologies in any of these languages with PT IME query evaluation that are not Datalog6= rewritability. Let A be an instance. The constraint satisfaction problem CSP(A) is to decide, given an instance D, whether there is a homomorphism from D to A, which we denote with D → A. In this context, A is called the template of CSP(A). We will generally and w.l.o.g. assume that relations in sig(A) have arity at most two and that the template A admits precoloring, that is, for each a ∈ dom(A), there is a unary relation symbol Pa such that Pa (b) ∈ A iff b = a [19]. It is known that for every template A, there is a template A0 of this form such that CSP(A) is polynomially equivalent to CSP(A0 ) [39]. We use coCSP(A) to denote the complement of CSP(A). Definition 4 Let L be an ontology language and Q a class of queries. Then Q-evaluation w.r.t. L is CSP-hard if for every template A, there exists an L ontology O such that 1. there is a q ∈ Q such that coCSP(A) polynomially reduces to evaluating the OMQ (O, q) and 2. for every q ∈ Q, evaluating the OMQ (O, q) is polynomially reducible to coCSP(A). It can be verified that a dichotomy between PT IME and CO NP for Q-evaluation w.r.t. L ontologies implies a dichotomy between PT IME and NP for CSPs, a notorious open problem known as the Feder-Vardi conjecture [23, 7], when Q-evaluation w.r.t. L is CSPhard. As noted in the introduction, a tentative proof of the conjecture has recently been announced, but at the time this article is published, its status still remains unclear. The following theorem summarizes our results on CSP-hardness. We formulate it for CQs, but remark that due to Theorem 4, a dichotomy between PT IME and CO NP for any of the mentioned ontology languages and any query language from the set {rAQ, CQ, UCQ} implies the Feder-Vardi conjecture. Theorem 8 For any of the following ontology languages, CQevaluation w.r.t. L is CSP-hard: uGF2 (1, =), uGF2 (2), uGF2 (1, f ), and the class of ALCF ` ontologies of depth 2. P ROOF. We sketch the proof for uGF2 (1, =) and then indicate the modifications needed for uGF2 (1, f ) and ALCF ` ontologies of depth 2. For uGF2 (2), the result follows from a corresponding result in [42] for ALC ontologies of depth 3. Let A be a template and assume w.l.o.g. that A admits precoloring. Let Ra be a binary relation for each a ∈ dom(A), and set ϕ6= a (x) ϕ= a (x) = ∃y(Ra (x, y) ∧ ¬(x = y)) = ∃y(Ra (x, y) ∧ (x = y)) Then O contains ^ _ 6= 6= ¬(ϕ6= ϕa (x)) ∀x( a (x) ∧ ϕa0 (x)) ∧ a6=a0 a ∀x(A(x) → ¬ϕ6= when A(a) 6∈ A a (x)) 6= 0 ∀xy(R(x, y) → ¬(ϕ6= a (x) ∧ ϕa0 (y))) when R(a, a ) 6∈ A = ∀xϕa (x) for all a ∈ dom(A) where A and R range over symbols in sig(A) of the respective arity. A formula ϕ6= a (x) being true at a constant c in an instance D means that c is mapped to a ∈ dom(A) by a homomorphism from D to A. The first sentence in O thus ensures that every node in D is mapped to exactly one node in A and the second and third set of sentences ensure that we indeed obtain a homomorphism. The last set of sentences enforces that ϕ= a (x) is true at every constant c. This makes the disjunction in the first sentence ‘invisible’ to the query (in which inequality is not available), thus avoiding that O is CO NP-hard for trivial reasons. In the appendix, we show that O satisfies Conditions 1 and 2 from Definition 4 where the query q used in Condition 1 is q ← N (x) with N a fresh unary relation. For uGF2 (1, f ), state that a binary relation F is a function and that ∀xF (x, x). Now replace in O the formulas ϕ6= a (x) by ∃y(Ra (x, y) ∧ ¬F (x, y)) and ϕ= a (x) by ∃y(Ra (x, y) ∧ F (x, y)). For ALCF ` of depth 2, replace in O the formulas ϕ6= a (x) by ≥2 ∃ yRa (x, y) and ϕ= a (x) by ∃yRa (x, y). The resulting ontology is equivalent to a ALCF ` ontology of depth 2. It is known that for some templates A, CSP(A) is in PT IME while coCSP(A) is not Datalog6= -definable [23]. Then CQ-evaluation w.r.t. the ontologies O constructed from A in the proof of Theorem 4 is in PT IME, but not Datalog6= -rewritable. Theorem 9 In any of the following ontology languages L there exist ontologies with PT IME CQ-evaluation which are not Datalog6= rewritable: uGF2 (1, =), uGF2 (2), uGF2 (1, f ), and the class of ALCF ` ontologies of depth 2. The ontology languages in Theorem 5 thus behave provably different from the languages for which we proved a dichotomy in Section 5, since there PT IME query evaluation and Datalog6= rewritability coincide. 7. NON-DICHOTOMY ABILITY AND UNDECID- We show that ontology languages that admit sentences of depth 2 as well as functions symbols tend to be computationally problematic as they do neither enjoy a dichotomy between PT IME and CO NP nor decidability of meta problems such as whether query evaluation w.r.t. a given ontology O is in PT IME, Datalog6= rewritable, or CO NP-hard, and whether O is materializable. We actually start with these undecidability results. Theorem 10 For the ontology languages uGF− 2 (2, f ) and ALCIF ` of depth 2, it is undecidable whether for a given ontology O, 1. query evaluation w.r.t. O is in PT IME, Datalog6= -rewritable, or CO NP-hard (unless PT IME = NP); 2. O is materializable. P ROOF. The proof is by reduction of the undecidable finite rectangle tiling problem. To establish both Points 1 and 2, it suffices to exhibit, for any such tiling problem P, an ontology OP such that if P admits a tiling, then OP is not materializable and thus query evaluation w.r.t. OP is CO NP-hard and if P admits no tiling, then query evaluation w.r.t. OP is Datalog6= -rewritable and thus materializable (unless PT IME = NP). The rectangle to be tiled is represented in input instances using the binary relations X and Y , and OP declares these relations and their inverses to be functional. The main idea in the construction of OP is to verify the existence of a properly tiled grid in the input instance by propagating markers from the top right corner to the lower left corner. During the propagation, one makes sure that grid cells close (that is, the XY-successor coincides with the YX-successor) and that there is a tiling that satisfies the constraints in P. Once the existence of a properly tiled grid is completed, a disjunction is derived by OP to achieve non-materializability and CO NP-hardness. The challenge is to implement this construction such that when P has no solution (and thus the verification of a properly tiled grid can never complete), OP is Datalog6= rewritable. In fact, achieving this involves a lot of technical subtleties. A central issue is how to implement the markers (as formulas with one free variable) that are propagated through the grid during the verification. The markers must be designed in a way so that they cannot be ‘preset’ in the input instance as this would make it possible to prevent the verification of a (possibly defective) part of the input. In ALCIF ` , we use formulas of the form ∃=1 yP (x, y) while additionally stating in OP that ∀x∃yP (x, y). Thus, the choice is only between whether a constant has exactly one P -successor (which means that the marker is set) or more than one P -successor (which means that the marker is not set). Clearly, this difference is invisible to queries and we cannot preset a marker in an input instance in the sense that we make it true at some constant. We can, however, easily make the marker false at a constant c by adding two P -successors to c in the input instance. It seems that this effect, which gives rise to many technical complications, can only be avoided by using marker formulas with higher quantifier depth which would result in OP not falling within ALCIF ` depth 2. For uGF− 2 (2, f ) we work with ¬∃y(P (x, y) ∧ ¬F (x, y)), where F is a function for which we state ∀xF (x, x) (as in the CSP encoding). Full proof details can be found in the appendix. We only mention that closing of a grid cell is verified by using marker formulas as second-order variables. Theorem 11 For the ontology languages uGF− 2 (2, f ) and ALCIF ` of depth 2, there is no dichotomy between PT IME and CO NP (unless PT IME = CO NP). By Ladner’s theorem [38], there is a non-deterministic polynomial time Turing machine (TM) whose word problem is neither in PT IME nor NP-hard (unless PT IME = CO NP). Ideally, we would like to reduce the word problem of such TMs to prove Theorem 11. However, this does not appear to be easily possible, for the following reason. In the reduction, we use a grid construction and marker formulas as in the proof of Theorem 10, with the grid providing the space in which the run of the TM is simulated and markers representing TM states and tape symbols. We cannot avoid that the markers can be preset either positively or negatively in the input (depending on the marker formulas we choose), which means that some parts of the run are not ‘free’, but might be predetermined or at least constrained in some way. We solve this problem by first establishing an appropriate variation of Ladner’s theorem. We consider non-deterministic TMs M with a single one-sided infinite tape. Configurations of M are represented by strings vqw, where q is the state, and v and w are the contents of the tape to the left and to the right of the tape head, respectively. A partial configuration of M is obtained from a configuration γ of M by replacing some or all symbols of γ by a wildcard symbol ?. A partial configuration γ̃ matches a configuration γ if it has the same length and agrees with γ on all non-wildcard symbols. A partial run of M is a finite sequence γ̃0 , . . . , γ̃m of partial configurations of M of the same length. It is a run if each γ̃i is a configuration, and it matches a run γ0 , . . . , γn if m = n and each γ̃i matches γi . A run is accepting if its last configuration has an accepting state. Note that runs need not start in any specific configuration (unless specified by a partial run that they extend). The run fitting problem for M is to decide whether a given partial run of M matches some accepting run of M . It is easy to see that for any TM M , the run fitting problem for M is in NP. We prove the following result in the appendix by a careful adaptation of the proof of Ladner’s theorem given in [2]. Theorem 12 There is a non-deterministic Turing machine whose run fitting problem is neither in PT IME nor NP-hard (unless PT IME = NP). Now Theorem 11 is a consequence of the following lemma. Lemma 4 For every Turing machine M , there is a uGF− 2 (2, f ) ontology O and an ALCIF ` ontology O of depth 2 such that the following hold, where N is a distinguished unary relation: 1. there is a polynomial reduction of the run fitting problem for M to the complement of evaluating the OMQ (O, q ← N (x)); 2. for every UCQ q, evaluating the OMQ (O, q) is polynomially reducible to the complement of the run fitting problem for M . To establish Lemma 4, we re-use the ontology OP from the proof of Theorem 10, using a trivial rectangle tiling problem. When the existence of the grid has been verified, instead of triggering a disjunction as before, we now start a simulation of M on the grid. For both ALCIF ` and uGF− 2 (2, f ), we represent states q and tape symbols G using the same formulas as in the CSP encoding of homomorphisms. Thus, for ALCIF ` we use formulas ∃≥2 yq(x, y) and ∃≥2 yG(x, y), respectively, using q and G as binary relations. Note that here the encoding ∃=1 yq(x, y) from the tiling problem does not work because states and tape symbols can be positively preset in the input instance rather than negatively, which is in correspondence with the run fitting problem. 8. DECISION PROBLEMS We study the decidability and complexity of the problem to decide whether a given ontology admits PT IME query evaluation. Realistically, we can only hope for positive results in cases where there is a dichotomy between PT IME and CO NP: first, we have shown in Section 7 that for cases with provably no such dichotomy, meta problems are typically undecidable; and second, it does not seem very likely that in the CSP-hard cases, one can decide whether an ontology admits PT IME query evaluation without resolving the dichotomy question and thus solving the Feder-Vardi conjecture. Our main results are E XP T IME-completeness of deciding PT IMEquery evalutation of ALCHIQ ontologies of depth one (the same complexity as for satisfiability) and a NE XP T IME upper bound for uGC− 2 (1, =) ontologies. Note that, in both of the considered languages, PT IME-query evalutation coincides with rewritability into Datalog6= . We remind the reader that according to our experiments, a large majority of real world ontologies are ALCHIQ ontologies of depth 1. We also show that for ALC ontologies of depth 2, the mentioned problem is NE XP T IME-hard. Since the ontology languages relevant here admit at most binary relations, an interpretation B is cg-tree decomposable if and only if the undirected graph GB = {{a, b} | R(a, b) ∈ B, a 6= b} is a tree. For simplicity, we speak of tree interpretations and of tree instances, defined likewise. The outdegree of B is the outdegree of GB . Theorem 13 For uGC− 2 (1, =) ontologies, deciding whether query evaluation w.r.t. a given ontology is in PT IME (equivalently: rewritable into Datalog6= ) is in NE XP T IME. For ALCHIQ ontologies of depth 1, this problem is in E XP T IME-complete. The main insight underlying the proof of Theorem 13 is that for ontologies formulated in the mentioned languages, materializability (which by Theorem 7 coincides with PT IME query evaluation) already follows from the existence of materializations for tree instances of depth 1. We make this precise in the following lemma. Given a tree interpretation B and a ∈ dom(B), define the 1neighbourhood B≤1 a of a in B as B|X , where X is the union of all guarded sets in B that contain a. B is a bouquet with root a if B≤1 = B and it is irreflexive if there exists no atom of the form a R(b, b) in B. Lemma 5 Let O be a uGC− 2 (1, =) ontology (resp. an ALCHIQ ontology of depth 1). Then O is materializable iff O is materializable for the class of all (respectively, all irreflexive) bouquets D of outdegree ≤ |O| with sig(D) ⊆ sig(O). P ROOF. We require some notation. An instance D is called Osaturated for an ontology O if for all facts R(~a) with ~a ⊆ dom(D) such that O, D |= R(~a) it follows that R(~a) ∈ D. For every O and instance D there exists a unique minimal (w.r.t. set-inclusion) O-saturated instance DO ⊇ D. We call DO the O-saturation of D. It is easy to see that there is a materialization of O and D if and only if there is a materialization of O and the O-saturation of D. We first prove Lemma 5 for uGC− 2 (1, =) ontologies O and without the condition on the outdegree. Let Σ0 = sig(O) and assume that O is materializable for the class of all Σ0 -bouquets. By Theorem 7 it suffices to prove that O is materializable for the class of Σ0 -tree instances. Fix a Σ0 -tree instance D that is consistent w.r.t. O. We may assume that D is O-saturated. Note that a forest model materialization B of an ontology O and an O-saturated instance F consists of F and tree interpretations Ba , a ∈ dom(F), that are hooked to F at a. Take for any a ∈ dom(D) the bouquet D≤1 with root a and hook to D at a the interpretation Ba a that is hooked to D≤1 a at a in a forest model materialization B of ≤1 D≤1 a and O (such a forest model materialization exists since Da is materializable). Denote by A the resulting interpretation. Using the condition that O is a uGC− 2 (1, =) ontology it is not difficult to prove that A is a materialization of O and D. We now prove the restriction on the outdegree. Assume O is given. Let D be a bouquet with root a of minimal outdegree such that there is no materialization of O and D. We show that the outdegree of D does not exceed |O|. Assume the outdegree of D is at least three (otherwise we are done). We may assume that D is Osaturated. Take for any formula χ = ∃≥n z1 α(z1 , z2 ) ∧ ϕ(z1 , z2 ) that occurs as a subformula in O the set Zχ of all b 6= a such that D |= α(b, a) ∧ ϕ(b, a). Let Zχ0 = Zχ if |Zχ | ≤ n + 1; otherwise let Zχ0 be a subset of Zχ of cardinality n + 1. Let D0 be the restriction D|Z of D to the union Z of all Zχ0 and {a}. We show that there exists no materialization of D0 and O. Assume for a proof by contradiction that there is a materialization B of D0 . Let B0 be the union of D∪B and the interpretations Bb , b ∈ dom(D)\(Z∪{a}), that are hooked to D|{a,b} at b in a forest model materialization of D|{a,b} . We show that B0 is a materialization of D and O (and thus derive a contradiction). Using the condition that D is O-saturated one can show that the restriction B0|dom(D) of B0 to dom(D) coincides with D. Using the condition that O has depth 1 it is now easy to show that B0 is a model of O. It is a materialization of D and O since it is composed of materializations of subinstances of D and O. The proof that irreflexive bouquets are sufficient for ontologies of depth 1 in ALCHIQ is similar to the proof above and uses the fact that one can always unravel models of ALCHIQ ontologies into irreflexive tree models. We now develop algorithms that decide PT IME query evaluation by checking the conditions given in Lemma 5, starting with the (easier) case of ALCHIQ. Let D be a bouquet with root a. Call a bouquet B ⊇ D a 1-materialization of O and D if • there exists a model A of O and D such that B = A≤1 a ; • for any model A of D and O there exists a homomorphism from B to A that preserves dom(D). It turns out that, when checking materializability, not only is it sufficient to consider bouquets instead of unrestricted instances, but additionally one can concentrate on 1-materializations of bouquets. Lemma 6 Let O be an ALCHIQ ontology of depth 1. If for all irreflexive bouquets D that are consistent w.r.t. O, of outdegree ≤ |O|, and satisfy sig(D) ⊆ sig(O) there is a 1-materialization of O and D, then O is materializable for the class of all such bouquets. P ROOF. For brevity, we call an irreflexive bouquet F relevant if it is consistent w.r.t. O, of outdegree ≤ |O| and satisfies sig(F) ⊆ sig(O). An irreflexive 1-materializability witness (F, a, B) consists of a relevant irreflexive bouquet F with root a and a 1-materialization B of F w.r.t. O. One can show that B is an irreflexive tree interpretation. Now let D be a relevant irreflexive bouquet with root a and assume that D is 1-materializable w.r.t. O. We have to show that there exists a materialization of O and D. Note that for every relevant irreflexive bouquet F, there is a 1-materializability witness (F, a, B). We construct the desired materialization step-bystep using these pairs also memorizing sets of frontier elements that have to be expanded in the next step. We start with the irreflexive 1-materializability witness (D, a, B) and set B0 = B and F0 = dom(B) \ {a}. Then we construct a sequence of irreflexive tree interpretations B0 ⊆ B1 ⊆ . . . and frontier sets Fi+1 ⊆ dom(Bi+1 ) \ dom(Bi ) inductively as follows: given Bi and Fi , take for any b ∈ Fi its predecessor a in Bi and an irreflexive 1-materializability witness (Bi|{a,b} , b, Bb ) and set [ [ Bi+1 := Bi ∪ Bb Fi+1 := dom(Bb ) \ {b} b∈Fi b∈Fi ∗ Let B be the union of all Bi . We show that B is a materialization of O and D. B is a model of O by construction since O is an ALCHIQ ontology of depth 1. Consider a model A of O and D. It suffices to construct a homomorphism h from B∗ to A that preserves dom(D). We may assume that A is an irreflexive tree interpretation. We construct h as the limit of a sequence h0 , . . . of homomorphisms from Bi to A. By definition, there exists a homomorphism h0 from B0 to A≤1 a preserving dom(D). Now, inductively, assume that hi is a homomorphism from Bi to A. Assume c has been added to Bi in the construction of Bi+1 . Then there exists b ∈ Fi and its predecessor a in Bi such that c ∈ dom(Bb ) \ {b}, where Bb is the irreflexive tree interpretation that has been added to Bi as the last component of the irreflexive model pair (Bi|{a,b} , b, Bb ). But then, as Bb is a 1-materialization of Bi|{a,b} and hi is injective on Bi|{a,b} (since A is irreflexive), we can expand the homomorphism hi to a homomorphism to A with domain dom(Bi ) ∪ {c}. Thus, we can expand hi to a homomorphism from Bi+1 to A. Lemma 5 and Lemma 6 imply that an ALCHIQ ontology O of depth 1 enjoys PT IME query evaluation if and only if all irreflexive bouquets D that are consistent w.r.t. O, of outdegree ≤ |O|, and satisfy sig(D) ⊆ sig(O) have a 1-materialization w.r.t. O. The latter condition can be checked in deterministic exponential time since the satisfiability problem for ALCHIQ ontologies is in E XP T IME. Moreover, there are only exponentially many relevant bouquets. We have thus proved the E XP T IME upper bound in Theorem 13. A matching lower bound can be proved by a straightforward reduction from satisfiability. The following example shows that, in contrast to ALCHIQ depth 1, for uGC− 2 (1, =) the existence of 1-materializations does not guarantee materializability of bouquets. Example 7 We use ∃6= yW (x, y) to abbreviate ∃y(W (x, y)∧(x 6= y)) and likewise for ∃6= yW (y, x). Let S, S 0 , R, R0 be binary relation symbols and O the uGF− 2 (1, =) ontology O that contains ∀x S(x, x) → (R(x, x) → (∃6= yR(x, y) ∨ ∃6= yS(x, y))) ∀x(∃6= yW (y, x) → ∃yW 0 (x, y)) where (W, W 0 ) range over {(R, R0 ), (S, S 0 )}. Observe that for the instance D = {(S(a, a), R(a, a)} and the Boolean UCQ q ← R0 (x, y) ∨ S 0 (x, y), we have O, D |= q. Also, for qR ← R0 (x, y) and qS ← S 0 (x, y) we have O, D 6|= qR and O, D 6|= qS . Thus, O is not materializable. It is, however, easy to show that for every bouquet D there exists a 1-materialization of D w.r.t. O. In uGC− 2 (1, =), we thus have to check unrestricted materializability of bouquets, instead of 1-materializability. In fact, it suffices to consider materializations that are tree interpretations. To decide the existence of such materialization, we use a mosaic approach. In each mosaic piece, we essentially record a 1-neighborhood of the materialization, a 1-neighborhood of a model of the bouquet and ontology, and a homomorphism from the former to the latter. We then identify certain conditions that characterize when a set of mosaics can be assembled into a materialization in a way that is similar to the model construction in the proof of Lemma 6. There are actually two different kinds of mosaic pieces that we use, with one kind of piece explicitly addressing reflexive loops which, as illustrated by Example 7, are the reason why we cannot work with 1-materializations. The decision procedure then consists of guessing a set of mosaics and verifying that the required conditions are satisfied. Details are in the appendix. Theorem 13 only covers ontology languages of depth 1. It would be desirable to establish decidability also for ontology languages of depth 2 that enjoy a dichotomy between PT IME and CO NP, such as uGF− 2 (2). The following example shows that this requires more sophisticated techniques than those used above. In particular, materializability of bouquets does not imply materializability. Example 8 We give a family of ALC-ontologies (On )n≥0 of depth 2 such that each On is materializable for the class of tree interpretations of depth at most 2n − 1 while it is not materializable. The idea is that any instance D that witnesses non-materializability of On must contain an R-chain of length 2n , R a binary relation. The presence of this chain is verified by propagating a marker upwards along the chain. To avoid that O is 1-materializable, we represent this marker by a universally quantified formula and also hide some other unary predicates in the same way. For each unary predicate P , let HP (x) denote the formula ∀y(S(x, y) → P (y)) and include in On the sentence ∀x∃y(S(x, y) ∧ P (y)). The remaining sentences in On are: X 1 (x) ∧ · · · ∧ X n (x) → HV (x) Xi (x) ∧ ∃R.(Xi (y) ∧ Xj (y)) → Hoki (x) X i (x) ∧ ∃R.(X i (y) ∧ Xj (y)) → Hoki (x) Xi (x) ∧ ∃R.(X i (y) ∧ X1 (y) ∧ · · · ∧ Xi−1 (y)) → Hoki (x) X i (x) ∧ ∃R.(Xi (y) ∧ X1 (y) ∧ · · · ∧ X i−1 (y)) → Hoki (x) Hok1 (x) ∧ · · · ∧ Hokn (x) ∧ ∃R.HV (y) → HV (x) ∃R.Xi (y) ∧ ∃R.X i (y) → ⊥ X1 (x) ∧ · · · ∧ Xn (x) ∧ HV (x) → B1 (x) ∨ B2 (x) where x is universally quantified, ∃R.ϕ(y) is an abbreviation for ∃y(R(x, y)∧ϕ(y)), and i ranges over 1..n. Note that X1 , . . . , Xn and X 1 , . . . , Xn represent a binary counter and that lines two to five implement incrementation of this counter. The second last formula is necessary to avoid that multiple successors of a node interact in undesired ways. On instances that contain no R-chain of length 2n , a materialization can be constructed by a straightforward chase procedure. We also observe that the ideas from Example 8 gives rise to a NE X P T IME lower bound. Theorem 14 For ALC ontologies of depth 2, deciding whether query-evaluation is in PT IME is NE XP T IME-hard (unless P TIME = CO NP). We remark that the decidability of PT IME query evaluation of ALC ontologies of depth 2 remains open. 9. CONCLUSION Perhaps the most surprizing result of our analysis is that it is possible to escape Ladner’s Theorem and prove a PT IME/CO NP dichotomy for query evaluation for rather large subsets of the guarded fragment that cover almost all practically relevant DL ontologies. This result comes with a characterization of PT IME query evaluation in terms of materializability and unravelling tolerance, with the guarantee that PT IME query evaluation coincides with Datalog6= rewritability, and with decidability of (and complexity results for) meta problems such as deciding whether a given ontology enjoys PT IME query evaluation. Our study also shows that when we increase the expressive power in seemingly harmless ways, then often there is provably no PT IME/CO NP dichotomy or one obtains CSP-hardness. The proof of the non-dichotomy results comes with a variation of Ladner’s Theorem that could prove useful in other contexts where some form of precoloring of the input is unavoidable, such as in consistent query answering [43]. There are a number of interesting future research questions. The main open questions regarding dichotomies are whether the PT IME/CO NP dichotomy can be generalized from uGF− 2 (2) to uGF− (2) and whether the CSP-hardness results can be sharpened to either CSP-equivalence results (this is known for ALC ontologies of depth 3 [42]) or to non-dichotomy results. Also of interest is the complexity of deciding PT IME query evaluation for uGF(1), where the characterization of PT IME query evaluation via hom-universal models fails. Improving our current complexity results to tight complexity bounds for P TIME query evaluation for ALCHIF ontologies of depth 2 and uGF− 2 (2) ontologies appears to be challenging as well. It would also be interesting to study the case where invariance under disjoint union is not guaranteed (as we have observed, the complexities of CQ and UCQ evaluation might then diverge), and to add the ability to declare in an ontology that a binary relation is transitive. 10. ACKNOWLEDGMENTS André Hernich, Fabio Papacchini, and Frank Wolter were supported by EPSRC UK grant EP/M012646/1. Carsten Lutz was supported by ERC CoG 647289 CODA. 11. REFERENCES [1] H. Andréka, I. Németi, and J. van Benthem. Modal languages and bounded fragments of predicate logic. J. Philosophical Logic, 27(3):217–274, 1998. [2] S. Arora and B. Barak. Computational Complexity - A Modern Approach. Cambridge University Press, 2009. [3] A. Artale, D. Calvanese, R. Kontchakov, and M. Zakharyaschev. The DL-Lite family and relations. J. of Artifical Intelligence Research, 36:1–69, 2009. [4] F. Baader, Deborah, D. Calvanese, D. L. McGuiness, D. Nardi, and P. F. Patel-Schneider, editors. The Description Logic Handbook. Cambridge University Press, 2003. [5] V. Bárány, G. Gottlob, and M. Otto. Querying the guarded fragment. Logical Methods in Computer Science, 10(2), 2014. [6] V. Bárány, B. ten Cate, and L. Segoufin. Guarded negation. J. ACM, 62(3):22:1–22:26, 2015. [7] L. Barto. Constraint satisfaction problem and universal algebra. SIGLOG News, 1(2):14–24, 2014. [8] C. Beeri and P. A. Bernstein. Computational problems related to the design of normal form relational schemas. ACM Trans. Database Syst., 4(1):30–59, 1979. [9] C. Beeri and M. Y. Vardi. A proof procedure for data dependencies. J. ACM, 31(4):718–741, 1984. [10] M. Bienvenu and M. Ortiz. Ontology-mediated query answering with data-tractable description logics. In Proc. of Reasoning Web, pages 218–307, 2015. [11] M. Bienvenu, B. ten Cate, C. Lutz, and F. Wolter. Ontology-based data access: A study through disjunctive datalog, csp, and MMSNP. ACM Trans. Database Syst., 39(4):33:1–33:44, 2014. [12] A. Calì, G. Gottlob, and M. Kifer. Taming the infinite chase: Query answering under expressive relational constraints. J. Artif. Intell. Res. (JAIR), 48:115–174, 2013. [13] A. Calì, G. Gottlob, and A. Pieris. Towards more expressive ontology languages: The query answering problem. Artif. Intell., 193:87–128, 2012. [14] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Data complexity of query answering in description logics. Artificial Intelligence, 195:335–360, 2013. [15] D. Calvanese, G. De Giacomo, and M. Lenzerini. On the decidability of query containment under constraints. In Proc. of PODS, pages 149–158, 1998. [16] D. Calvanese, G. D. Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. Autom. Reasoning, 39(3):385–429, 2007. [17] D. Calvanese, G. D. Giacomo, M. Lenzerini, and M. Y. Vardi. View-based query processing and constraint satisfaction. In LICS, pages 361–371, 2000. [18] D. Calvanese, G. D. Giacomo, M. Lenzerini, and M. Y. Vardi. View-based query containment. In PODS, pages 56–67, 2003. [19] D. Cohen and P. Jeavons. The complexity of constraint languages, chapter 8. Elsevier, 2006. [20] A. Deutsch, A. Nash, and J. B. Remmel. The chase revisited. In M. Lenzerini and D. Lembo, editors, Proc. of PODS, pages 149–158. ACM, 2008. [21] R. Fagin, B. Kimelfeld, and P. G. Kolaitis. Dichotomies in the complexity of preferred repairs. In Proc. of PODS, pages 3–15, 2015. [22] R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. Data exchange: semantics and query answering. Theor. Comput. Sci., 336(1):89–124, 2005. [23] T. Feder and M. Y. Vardi. The computational structure of monotone monadic SNP and constraint satisfaction: A study through datalog and group theory. SIAM J. Comput., 28(1):57–104, 1998. [24] G. Fontaine. Why is it hard to obtain a dichotomy for consistent query answering? ACM Trans. Comput. Log., 16(1):7:1–7:24, 2015. [25] C. Freire, W. Gatterbauer, N. Immerman, and A. Meliou. The complexity of resilience and responsibility for self-join-free conjunctive queries. PVLDB, 9(3):180–191, 2015. [26] G. Gottlob, S. Kikot, R. Kontchakov, V. V. Podolskii, T. Schwentick, and M. Zakharyaschev. The price of query rewriting in ontology-based data access. Artif. Intell., 213:42–59, 2014. [27] G. Gottlob, M. Manna, and A. Pieris. Polynomial rewritings for linear existential rules. In Proc. of IJCAI, pages 2992–2998, 2015. [28] G. Gottlob, G. Orsi, and A. Pieris. Query rewriting and optimization for ontological databases. ACM Trans. Database Syst., 39(3):25:1–25:46, 2014. [29] E. Grädel. On the restraining power of guards. J. Symb. Log., 64(4):1719–1742, 1999. [30] E. Grädel and M. Otto. The freedoms of (guarded) bisimulation. In Johan van Benthem on Logic and Information Dynamics, pages 3–31. 2014. [31] M. Kaminski, Y. Nenov, and B. C. Grau. Datalog rewritability of disjunctive datalog programs and non-horn ontologies. Artif. Intell., 236:90–118, 2016. [32] Y. Kazakov. A polynomial translation from the two-variable guarded fragment with number restrictions to the guarded fragment. In Proc. of JELIA, pages 372–384, 2004. [33] B. Kimelfeld. A dichotomy in the complexity of deletion propagation with functional dependencies. In M. Benedikt, M. Krötzsch, and M. Lenzerini, editors, Proc. of PODS, pages 191–202. ACM, 2012. [34] R. Kontchakov and M. Zakharyaschev. An introduction to description logics and query rewriting. In Proc. of Reasoning Web, pages 195–244, 2014. [35] P. Koutris and D. Suciu. A dichotomy on the complexity of consistent query answering for atoms with simple keys. In Proc. of ICDT, pages 165–176. OpenProceedings.org, 2014. [36] P. Koutris and J. Wijsen. The data complexity of consistent query answering for self-join-free conjunctive queries under primary key constraints. In Proc. of PODS, pages 17–29, 2015. [37] M. Krötzsch. OWL 2 profiles: An introduction to lightweight ontology languages. In Proc. of Reasoning Web, pages 112–183, 2012. [38] R. E. Ladner. On the structure of polynomial time reducibility. J. ACM, 22(1):155–171, 1975. [39] B. Larose and P. Tesson. Universal algebra and hardness results for constraint satisfaction problems. Theor. Comput. Sci., 410(18):1629–1647, 2009. [40] A. Y. Levy and M. Rousset. Combining horn rules and description logics in CARIN. Artif. Intell., 104(1-2):165–209, 1998. [41] C. Lutz, I. Seylan, and F. Wolter. Ontology-based data access with closed predicates is inherently intractable(sometimes). In Proc. of IJCAI, pages 1024–1030, 2013. [42] C. Lutz and F. Wolter. Non-uniform data complexity of query answering in description logics. In Proc. of KR, 2012. [43] C. Lutz and F. Wolter. On the relationship between consistent query answering and constraint satisfaction problems. In Proc. of ICDT, pages 363–379, 2015. [44] J. Minker, editor. Foundations of Deductive Databases and Logic Programming. Elsevier, 1988. [45] M. Mugnier and M. Thomazo. An introduction to ontology-based query answering with existential rules. In Proc. of Reasoning Web, pages 245–278, 2014. [46] M. Ortiz, D. Calvanese, and T. Eiter. Data complexity of query answering in expressive description logics via tableaux. Journal of Automated Reasoning, 41(1):61–98, 2008. [47] A. Poggi, D. Lembo, D. Calvanese, G. D. Giacomo, M. Lenzerini, and R. Rosati. Linking data to ontologies. J. Data Semantics, 10:133–173, 2008. [48] I. Pratt-Hartmann. Complexity of the guarded two-variable fragment with counting quantifiers. J. Log. Comput., 17(1):133–155, 2007. [49] A. Rafiey, J. Kinne, and T. Feder. Dichotomy for digraph homomorphism problems. CoRR, abs/1701.02409, 2017. [50] A. Schaerf. On the complexity of the instance checking problem in concept languages with existential quantification. J. of Intel. Inf. Systems, 2:265–278, 1993. [51] D. Suciu, D. Olteanu, C. Ré, and C. Koch. Probabilistic Databases. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2011. [52] P. L. Whetzel, N. F. Noy, N. H. Shah, P. R. Alexander, C. Nyulas, T. Tudorache, and M. A. Musen. BioPortal: enhanced functionality via new web services from the national center for biomedical ontology to access and use ontologies in software applications. Nucleic acids research, 39(suppl 2):W541–W545, 2011. APPENDIX A. INTRODUCTION TO DESCRIPTION LOGIC We give a brief introduction to the syntax and semantics of DLs and establish their relationship to the guarded fragment of FO. We consider the DL ALC and its extensions by inverse roles, role inclusions, qualified number restrictions, functional roles, and local functionality. Recall that ALC-concepts are constructed according to the rule C, D := > | ⊥ | A | C u D | C t D | ¬C | ∃R.C | ∀R.C where A ranges over unary relations and R ranges over binary relations. DLs extended by inverse roles (denoted in the name of a DL by the letter I) admit, in addition, inverse relations denoted by R− , with R a relation. Thus, in ALCI inverse relations can be used in place of relations in any ALC concept. DLs extended by qualified number restrictions (denoted by Q) admit concepts of the form (≥ n R C), (= n R C), and (≤ n R C), where n ≥ 1 is a natural number, R is a relation or an inverse relation (if inverse relations are in the original DL), and C is a concept. When extending a DL with local functionality (denoted by F` ) one can use only number restrictions of the form (≤ 1 R >) in which R is a relation or an inverse relation (if inverse relations are in the original DL). We abbreviate (≤ 1 R >) with (≤ 1R) and use (= 1R) as an abbreviation for (∃R.>) u (≤ 1R) and (≥ 2R) as an abbreviation for (∃R.>) u ¬(≤ 1R). In DLs, ontologies are formalized as finite sets of concept inclusions C v D, where C, D are concepts in the respective language. We use C ≡ D as an abbreviation for C v D and D v C. In the DLs extended with functionality (denoted by F) one can use functionality assertions func(R), where R is a relation or an inverse relation (if present in the original DL). Such an R is interpreted as a partial function. Extending a DL with role inclusions (denoted by H) allows one to use expressions of the form R v S, where R and S are relations or inverse relations (if present in the original DL), and which state that R is a subset of S. The semantics of DLs is given by interpretations A. The interpretation C A of a concept C in an interpretation A is defined inductively as follows: >A = dom(A) AA = {a ∈ dom(A) | A(a) ∈ A} (C u D)A = C A ∩ DA (∃R.C)A (∀R.C)A (≥ n R C)A (≤ n R C)A (= n R C)A = {a ∈ dom(A) | = {a ∈ dom(A) | ∃a0 ∀a0 ⊥A = ∅ (¬C)A = dom(A) \ C A (C t D)A = C A ∪ DA : R(a, a0 ) ∈ A and a0 ∈ C A } : R(a, a0 ) ∈ A implies a0 ∈ AA } = {a ∈ dom(A) | |{b | R(a, b) ∈ A and b ∈ C A }| ≥ n} = {a ∈ dom(A) | |{b | R(a, b) ∈ A and b ∈ C A }| ≤ n} = {a ∈ dom(A) | |{b | R(a, b) ∈ A and b ∈ C A }| = n} Then A satisfies a concept inclusion C v D if C A ⊆ DA . Alternatively, one can define the semantics of DLs by translating them into FO; the following table gives such a translation: >∗ (x) = > ⊥∗ (x) = ⊥ A∗ (x) = A(x) (¬C)∗ (x) = ¬(C ∗ (x)) (C u D)∗ (x) = C ∗ (x) ∧ D∗ (x) (C t D)∗ (x) = C ∗ (x) ∨ D∗ (x) (∃R.C)∗ (x) = ∃y (R(x, y) ∧ C ∗ (y)) (∀R.C)∗ (x) = ∀y (R(x, y) → C ∗ (y)) (≥ n R C)∗ (x) = ∃≥n y(R(x, y) ∧ C ∗ (y)) We observe the following relationships between DLs and fragments of the guarded fragment. For a DL L and fragment L0 of the guarded fragment we say that an L ontology O can be written as an L0 ontology if the translation given above translates O into an L0 ontology. Lemma 7 The following inclusions hold: 1. Every ALCHI ontology can be written as a uGF2 ontology. If the ontology has depth 2, then it can be written as a uGF− 2 (2) ontology. 2. Every ALCHIF ontology can be written as a uGF− 2 (f ) ontology. 3. Every ALCHIQ ontology can be written as a uGC2 ontology. If the ontology has depth 1, then it can be written as a uGC− 2 (1) ontology. B. INTRODUCTION TO DATALOG We give a brief introduction to the notation used for Datalog. A datalog6= rule ρ takes the form S(~ x) ← R1 (~ x1 ) ∧ · · · ∧ Rm (~ xm ) where S is a relation symbol, m ≥ 1, and R1 , . . . , Rm are either relation symbols or the symbol 6= for inequality. We call S(~ x) the head of ρ and R1 (~ x1 ) ∧ · · · ∧ Rm (~ xm ) its body. Every variable in the head of ρ is required to occur in its body. We call a datalog6= rule that does not use inequality a datalog rule. A Datalog6= program is a finite set Π of datalog6= rules with a selected goal relation symbol goal that does not occur in rule bodies in Π and only in goal rules of the form goal(~ x) ← R1 (~ x1 ) ∧ · · · ∧ Rm (~ xm ). The arity of Π is the arity of its goal relation. A Datalog program is a Datalog6= program not using inequality. For every instance D and Datalog6= program Π, we call a model A of D a model of Π if A is a model of all FO sentences ∀~ x∀~ x1 · · · ∀~ xm (R1 (~ x1 ) ∧ · · · ∧ Rm (~ xm ) → S(~ x)) with S(~ x) ← R1 (~ x1 ) ∧ · · · ∧ Rm (~ xm ) ∈ Π. We set D |= Π(~a) if goal(~a) ∈ A for all models A of D and Π. C. PROOFS FOR SECTION 2 Following the notation introduced after Theorem 1 we denote the guarded fragment with equality by GF(=) and uGF with equality in non-guard positions by uGF(=). In contrast, GF and uGF denote the corresponding languages without equality in non-guard positions. We use this notation in the formulation of Theorem 1 below. Theorem 1 (restated) (1) A sentence in GF(=) is invariant under disjoint unions iff it is equivalent to a sentence in uGF(=). (2) A sentence in GF is invariant under disjoint unions iff it is equivalent to a sentence in uGF. P ROOF. It remains to prove that every sentence in GF(=) is equivalent to a Boolean combination of sentences in uGF(=). Assume a sentence ϕ in GF(=) is given. First, we may assume that all subformulas ∃y(x = y ∧ ψ(x, y)) of ϕ are rewritten as ¬∀y(x = y → ψ(x, y)). Then let ψ = ∀y(x = y → ψ 0 (x, y)) be a subformula of ϕ with free variable x. It follows that ψ ≡ ψ 0 (x, x). As ψ 0 (x, x) is a GF(=) formula, any occurrence of ψ in ϕ can be replaced by ψ 0 (x, x). Let ϕ0 be the result of substituting all subformulas of the form of ψ in ϕ with the corresponding equivalent GF(=) formula. It follows that ϕ ≡ ϕ0 (note that ∀x∀y(x = y → ψ 0 (x, y)) can be rewritten as ∀x(x = x → ∀y(x = y → ψ 0 (x, y)))). A sentence ψ in ϕ is a simple sentence if there is no strict subformula ψ 0 of ψ which is a sentence. For the second step, let ϕ0 = ϕ0 , and let ϕi = (ϕi−1 [ψ/>] ∧ ψ) ∨ (ϕi−i [ψ/⊥] ∧ ¬ψ) for some simple sentence ψ of ϕi−1 not already substituted in any step j < i − 1. As at each step any simple sentence is a GF(=) sentence with guard R(~ x) or x = x and the guarded formula is an openGF formula, then any simple sentence is either the negation of a uGF(=) sentence or a uGF(=) sentence. Let ϕn be the resulting sentence. It follows that ϕ ≡ ϕn , and ϕn is a Boolean combination of uGF(=) sentences. For the proof of Lemma 1 we require the notion of guarded bisimulations that characterizes the expressive power of GF(=) [30]. To apply guarded bisimulations to characterize the fragment uGF(=) of GF(=) we modify the notion slightly by demanding that in the back-and-forth condition only overlapping guarded sets are used. To cover uGC2 (=) we also consider guarded bisimulations preserving the number of guarded sets of the same type that contain a fixed singleton set. Assume A and B are interpretations. A set I of partial isomorphisms p : ~a 7→ ~b between guarded tuples ~a and ~b in A and B, respectively, is called a connected guarded bisimulations if the following holds for all p : ~a 7→ ~b ∈ I: • for every guarded tuple ~a0 in A which has a non-empty intersection with ~a there exists a guarded tuple ~b0 in B and p0 : ~a0 7→ ~b0 ∈ I that coincides with p on the intersection of ~a and ~a0 . • for every guarded tuple ~b0 in B which has a non-empty intersection with ~b there exists a guarded tuple ~a0 in A and p0 : ~a0 7→ ~b0 ∈ I that coincides with p on the intersection of ~b and ~b0 . We say that (A, ~a) and (B, ~b) are connected guarded bisimilar if there exists a connected guarded bisimulation between A and B containing ~a 7→ ~b. Theorem 15 If (A, ~a) and (B, ~b) are connected guarded bisimilar and ϕ(~x) is a formula in openGF, then A |= ϕ(~a) iff B |= ϕ(~b). For uGC2 (=) the guarded sets have cardinality at most two. To preserve counting we require the following modified version of the definition above. A set I of partial isomorphisms p : ~a 7→ ~b between guarded tuples ~a = (a1 , a2 ) and ~b = (b1 , b2 ) in A and B, respectively, is called a counting connected guarded bisimulations if the following holds for all p : (a1 , a2 ) 7→ (b1 , b2 ) ∈ I: • for every finite number of distinct guarded tuples (a1 , a02 ) in A there exist (at least) the same number of guarded tuples (b1 , b02 ) in B and p0 : (a1 , a02 ) 7→ (b1 , b02 ) ∈ I. • for every finite number of distinct guarded tuples (a01 , a2 ) in A there exist (at least) the same number of guarded tuples (b01 , b2 ) in B and p0 : (a01 , a2 ) 7→ (b01 , b2 ) ∈ I. • for every finite number of distinct guarded tuples (b1 , b2 ) in B there exist (at least) the same number of guarded tuples (a1 , a02 ) in A and p0 : (a1 , a02 ) 7→ (b1 , b02 ) ∈ I. • for every finite number of distinct guarded tuples (b01 , b2 ) in B there exist (at least) the same number of guarded tuples (a01 , a2 ) in A and p0 : (a01 , a2 ) 7→ (b01 , b2 ) ∈ I. We say that (A, ~a) and (B, ~b) are counting connected guarded bisimilar if there exists a counting connected guarded bisimulation between A and B containing ~a 7→ ~b. Theorem 16 If (A, ~a) and (B, ~b) are counting connected guarded bisimilar and ϕ(~ x) is a formula in openGC2 , then A |= ϕ(~a) iff B |= ϕ(~b). Lemma 1 (restated) Let O be a uGF(=) or uGC2 (=) ontology, D a possibly infinite instance, and A a model of D and O. Then there exists a forest-model B of D and O such that there exists a homomorphism h from B to A that preserves dom(D). P ROOF. Assume first a uGF(=) ontology O, an instance D, and a model A of O and D are given. Let G be a maximal guarded set in D. We unfold A at G into a cg-tree decomposable interpretation BG as follows: let T (G) be the undirected tree with nodes t = G0 G1 · · · Gn , where Gi , 0 ≤ i ≤ n, are maximal guarded sets in A, G0 = G, G1 6⊆ dom(D), and (a) Gi 6= Gi+1 , (b) Gi ∩ Gi+1 6= ∅, and (c) Gi−1 6= Gi+1 . Let (t, t0 ) ∈ E if t0 = tF for some maximal guarded F . Set tail(G0 · · · Gn ) = Gn . Take an infinite supply of copies of any d ∈ dom(A). We assume all copies are in ∆D . We set e↑ = d if e is a copy of d. Define instances bag(t) for t ∈ T (G) inductively as follows. bag(G) is an instance whose domain is a set of copies of elements d ∈ G such that the mapping e 7→ e↑ is an isomorphism from bag(G) onto A|G . Assume bag(t) has been defined, tail(t) = F , and bag(t0 ) has not yet been defined for t0 = tF 0 ∈ T (G). Then take for any d ∈ F 0 \ F a fresh copy d0 of d and define bag(t0 ) with domain {d0 | d ∈ F 0 \ F } ∪ {e ∈ bag(t) | e↑ ∈ F 0 ∩ F )} such that the mapping e 7→ e↑ is an 0 isomorphism from S bag(t ) onto A|F 0 . Let BG = t∈T (G) bag(t). Now hook BG to D at G by identifying dom(bag(G)) with G using the isomorphism e 7→ e↑ and let B be the union of all BG with G a maximal guarded set in D. It is not difficult to prove the following properties of B: 1. (T (G), E, bag) is a cg-tree decomposition of BG , for every maximal guarded set G in D; 2. The mapping h : e 7→ e↑ is a homomorphism from B to A which preserves dom(D) and is an isomorphism on each bag(t) with t ∈ T (G) for some maximal guarded set G in D. 3. For any maximal guarded set G = {a1 , . . . , ak } in D, there is a connected guarded bisimulation between (B, (a1 , . . . , ak )) and (A, (a1 , . . . , ak )). It follows from Theorem 15 that B is a model of D and O and thus as required. Assume now that O is a uGC2 (=) ontology, that D is an instance, and that A is a model of O and D. We can assume that D and A use unary and binary relation symbols only. We have to preserve the number of guarded sets of a given type intersecting with a singleton and therefore the unfolding is slightly different from the case without guarded counting quantifiers. Let c ∈ dom(D). We unfold A at c into a cg-tree decomposable B{c} . Let T (c) be the undirected tree with nodes t = {c}G1 · · · Gn , where Gi , 1 ≤ i ≤ n, are maximal guarded sets in A, c ∈ G1 and G1 6⊆ dom(D), and (a) Gi 6= Gi+1 , (b) Gi ∩ Gi+1 6= ∅, and (c) Gi−1 ∩ Gi 6= Gi+1 ∩ Gi . Let (t, t0 ) ∈ E if t0 = tF for some maximal guarded F in D. Observe that Condition (c) is stronger than the corresponding Condition (c) in the construction of forest models for uGF ontologies. The new condition ensures that we do not introduce more successors of the same type when we unfold an interpretation. Take an infinite supply of copies of any d ∈ dom(A). We set, as before, e↑ = d if e is a copy of d. Define instances bag(t) for t ∈ T (c) inductively as follows. Let c0 be a copy of c and set bag({c}) = {P (c0 ) | P (c) ∈ A}. Assume bag(t) has been defined, tail(t) = F , and bag(t0 ) has not yet been defined for t0 = tF 0 ∈ T (c). Take for the unique d ∈ F 0 \ F a fresh copy d0 of d and define bag(t0 ) with domain {d0 } ∪ {e ∈ bag(t) | e↑ ∈ F 0 ∩ F )} such that the mapping e 7→S e↑ is an isomorphism from bag(t0 ) onto A|F 0 . Let B{c} = t∈T (c) bag(t). Now hook B{c} to D at {c} by identifying c0 with c and let [ B = {R(~a) ∈ A | ~a ⊆ dom(D)} ∪ B{c} . c∈dom(D) We can, of course, present the interpretation B as an interpretation obtained from hooking interpretations BG with G maximal guarded sets to D by choosing for each c a single maximal guarded set f (c) with c ∈ f (c) and setting for G = {c1 , c2 }, [ BG = {R(~a) ∈ A | ~a ⊆ G} ∪ B{c} f (c)=G It is not difficult to prove the following properties of B: 1. (T (c), E, bag) is a guarded tree decomposition of B{c} , for every c ∈ dom(D); 2. The mapping h : e 7→ e↑ is a homomorphism from B to A which preserves dom(D) and is an isomorphism on each bag(t) with t ∈ T (c) for some c ∈ dom(D). 3. For any maximal guarded set G = {a1 , a2 } in D, there is a counting connected guarded bisimulation between (B, (a1 , a2 )) and (A, (a1 , a2 )). It follows from Theorem 16 that B is a model of O and D and thus as required. D. PROOFS FOR SECTION 3 Theorem 2 (restated) Let O be a uGF(=) or uGC2 (=) ontology and M a class of instances. Then the following conditions are equivalent: P ROOF. The equivalence (2) ⇔ (3) is straightforward. We show (1) ⇒ (2). Assume O is rAQ-materializable for M and assume the instance D ∈ M is consistent w.r.t. O. Let A be a rAQmaterialization of O and D. Consider a forest model B of D and O such that there is a homomorphism from B to A preserving dom(D) (Lemma 1). Recall that B is obtained from D by hooking cg-tree decomposable BG to any maximal guarded set G in D. We show that B is a CQ-materialization of O and D. To this end it is sufficient to prove that for any finite subinterpretation B0 of B and any model A0 of O and D there exists a homomorphism h from B0 to A0 that preserves dom(D) ∩ dom(B0 ). We may assume that for any maximal guarded set G in D, if dom(B0 ) ∩ G 6= ∅, then G ⊆ dom(B0 ) and that B0 ∩ BG is connected if nonempty. But then we can regard every B0 ∩ BG as an rAQ qG with answer variables G. From B |= qG (~b) for ~b a suitable tuple containing all b ∈ G we obtain A |= qG (~b) since B is a rAQ-materialization of D and O. Let hG be the homomorphism that witnesses B |= qG (~b). Then hG is a homomorphism from B0 ∩ BG to A that preserves G. The union h of all hG , G a maximal guarded set in D, is the desired homomorphism from B0 to A0 preserving dom(D)∩dom(B0 ). Lemma 2 (restated) A uGC2 (=) ontology is materializable iff it admits hom-universal models. This does not hold for uGF(2) ontologies. P ROOF. The direction “⇐” is straightforward. Conversely, assume that O is materializable. Let D be an instance that is consistent w.r.t. O. By Lemma 1 there exists a forest model B of D and O that is a CQ-materialization of O and D. Using a straightforward selective filtration procedure one can show that there exists B0 ⊆ B such that B0 is a model of O and D and for any guarded set G the number of guarded sets G0 with G ∩ G0 6= ∅ is finite. We show that for any model A of O and D there is a homomorphism from B0 into A that preserves dom(D). Let A be a model of O and D. Again we may assume that A is a forest model such that for any guarded set G the number of guarded sets G0 with G ∩ G0 6= ∅ is finite. Now, for any finite subset F of dom(B0 ) there is a homomorphism hF from B0|F to A preserving dom(D) ∩ F . Let Fm be the set of all d ∈ dom(B0 ) such that there is a sequence of at most m guarded sets G0 , . . . , Gm with Gi ∩ Gi+1 6= ∅ for i < m, G S0 ∩ dom(D) 6= ∅, 0and d ∈ Gm . Then each Fm is finite and m≥0 Fm = dom(B ). Now we can construct the homomorphism h from B0 to A as the limit of homomorphisms hFnm from B0|Fnm to A preserving dom(D) for an infinite sequence n0 < n1 · · · using a standard pigeonhole argument. We construct an ontology O that expresses that every element of a unary relation C is in the centre of a ‘partial wheel’ represented by a ternary relation W . The wheel can be generated by turning right or left. The two resulting models are homomorphically incomparable but cannot be distinguished by answers to CQs. Thus, neither of the two models is hom-universal but one can ensure that O is materializable. As a first attempt to construct O we take unary relations L (turn left) and R (turn right), the following sentence that says that one has to choose between L and R when generating the wheel with cente C: ∀x C(x) → ((L(x) ∨ R(x)) ∧ ∃y1 , y2 W (x, y1 , y2 )) 1. O is rAQ-materializable for M; and the following sentences that generates the wheel accordingly: ∀x, y, z W (x, y, z) → (L(x) → ∃z 0 W (x, z, z 0 )) 3. O is UCQ-materializable for M. ∀x, y, z W (x, y, z) → (R(x) → ∃y 0 W (x, y 0 , y)) 2. O is CQ-materializable for M; Clearly this ontology does not admit hom-universal models. Note, however, that is also not materializable since for the instance D = {C(a)} we have O, D |= L(a) ∨ R(a) but we neither have O, D |= L(a) nor O, D |= R(a). The first step to ensure materializability is to replace L(x) by ∃y(gen(x, y) ∧ ¬L(x)) and R(x) by ∃y(gen(x, y) ∧ ¬R(x)) in the sentences above and also add ∀x∃y(gen(x, y) ∧ L(x)) and ∀x∃y(gen(x, y) ∧ R(x)) to O. Then a CQ ‘cannot detect’ whether one satisfies the disjunct ∃y(gen(x, y) ∧ ¬R(x)) or the disjunct ∃y(gen(x, y) ∧ ¬L(x)) at a given node x in C. Note that the resulting ontology is still not materializable: if W (a, b, c) ∈ D then CQs can detect whether one introduces a c0 with W (a, c, c0 ) ∈ A or a b0 with W (a, b0 , b) ∈ A. To deal with this problem we ensure that a wheel has to be generated from W (a, b, c) only if b, c are not in the instance D. In detail, we construct O as follows. We use unary relations A, L, and R and binary relations aux and gen. For a binary relation S and unary relation B we abbreviate ∀S.B ∃S.¬B(x) := ∀y(S(x, y) → B(y)) := ∃y(S(x, y) ∧ ¬B(y)) Now let O state that every node has aux-successors in A, and gensuccessors in L and R: ∀x∃y(aux(x, y) ∧ A(y)) and ∀x∃y(gen(x, y) ∧ L(y)) ∀x∃y(gen(x, y) ∧ R(y)) We abbreviate W 0 (x, y, z) := W (x, y, z) ∧ ∀aux.A(y) ∧ ∀aux.A(z) Next we introduce the disjunction that determines whether one generates the wheel by turning left or right, as indicated above: ∀x C(x) → (∃gen.¬L(x) ∨ ∃gen.¬R(x)) The following axiom starts the generation of the wheel. Observe that we use W 0 (x, y1 , y2 ) rather than W (x, y1 , y2 ) to ensure that new individuals are created for y1 and y2 : ∀x C(x) → ∃y1 , y2 W 0 (x, y1 , y2 ) Finally, we turn either left or right: ∀x, y, z W 0 (x, y, z) → (∃gen.¬L(x) → ∃z 0 W 0 (x, z, z 0 )) ∀x, y, z W 0 (x, y, z) → (∃gen.¬R(x) → ∃y 0 W 0 (x, y 0 , y)) This finishes the definition of O. O is a uGF(2) ontology. To show that it is materializable observe that for any instance D we can ensure that in a model A of D and O for all a ∈ dom(D) there exists b with aux(a, b) ∈ A and A(b) 6∈ A. Thus, the wheel generation sentences apply only to W (a, b, c) with b, c 6∈ dom(A). It is now easy to prove that CQ evaluation w.r.t. O is in PT IME. On the other hand, for D = {C(a)} there does not exist a model B of O and D such that there is a homomorphism from B into every model of O and D that preserves dom(D). In the proof of Lemma 3 (and related proofs below), it will be more convenient to work with a certain disjunction property instead of with materializability. We now introduce this property and show the equivalence of the two notions. Let Q be a class of nonBoolean connected CQs. An ontology O has the Q-disjunction property if for all instances D, queries q1 (~ x1 ), . . . , qn (~ xn ) ∈ Q and d~1 , . . . , d~n in D with d~i of the same length as ~ xi : if O, D |= q1 (d~1 ) ∨ . . . ∨ qn (d~n ), then there exists i ≤ n such that O, D |= qi (d~i ). Theorem 17 Let Q be a class CQs and O an FO(=)-ontology. Then O is Q-materializable iff O has the Q-disjunction property. P ROOF. For the nontrivial “⇐” direction, let D be an intance that is consistent w.r.t. O and such that there is no Qmaterialization of O and D. Then the set of FO-sentences O∪D∪Γ is not satisfiable, where ~ | O, D 6|= q(d), ~ d~ ⊆ dom(D), q ∈ Q} Γ = {¬q(d) In fact, any satisfying interpretation would be a Q-materialization of O. By compactness, there is a finite subset Γ0 of Γ such that O ∪ D ∪ Γ0 is not satisfiable, i.e. _ ~ O, D |= q(d) 0 ~ ¬q(d)∈Γ ~ for all q(d) ~ ∈ Γ0 . Thus, O lacks By definition of Γ0 , O, D 6|= q(d) the Q-disjunction property. Theorem 3 (restated) Let O be an FO(=)-ontology that is invariant under disjoint unions. If O is not materializable, then rAQevaluation w.r.t. O is CO NP-hard. P ROOF SKETCH . It was proved in [42] that if an ALC ontology O is not ELIQ-materializable, then ELIQ-evaluation w.r.t. O is CO NP-hard, where an ELIQ is a rAQ q(~ x) such that the associated instance Dq(~x) viewed as an undirected graph is a tree (instead of cg-tree decomposable) with a single answer variable at the root.2 The proof is by reduction from 2+2-SAT, the variant of propositional satisfiability where the input is a set of clauses of the form (p1 ∨ p2 ∨ ¬n1 ∨ ¬n2 ), each p1 , p2 , n1 , n2 a propositional letter or a truth constant [50]. The proof of Theorem 3 can be obtained from the proof in [42] by minor modifications, which we sketch in the following. The proof crucially exploits that if O is not rAQ-materializable, then by Theorem 17 it does not have the rAQ-disjunction property. In fact, we take an instance D, rAQs (resp. ELIQs) q1 (~ x1 ), . . . , qn (~ xn ) ∈ Q, and elements d~1 , . . . , d~n of D that witness failure of the disjunction property, copy them an appropriate number of times, and use the resulting set of gadgets to choose a truth value for the variables in the input 2+2-SAT formula. The fact that O is invariant under disjoint unions ensures that the choice of truth values for different variables is independent. A main difference between ELIQs and rAQs is that rAQs can have more than one answer variable. A straightforward way to handle this is to replace certain binary relations from the reduction in [42] with relations of higher arity (these are ‘fresh’ relations introduced in the reduction, that is, they do not occur in O). To deal with a rAQ of arity k, one would use a k + 1-ary relation. However, with a tiny bit of extra effort (and if desired), one can replace these relations with k binary relations. We now turn to the proof of Theorem 4. Theorem 4 (restated) Let O be a uGF(=) or uGC2 (=) ontology. The following are equivalent: 1. UCQ-evaluation w.r.t. O is in PT IME iff CQ-evaluation w.r.t. O is in PT IME iff rAQ-evaluation w.r.t. O is in PT IME. 2. UCQ-evaluation w.r.t. O is Datalog6= -rewritable iff CQevaluation w.r.t. O is Datalog6= -rewritable iff rAQ-evaluation w.r.t. O is Datalog6= -rewritable. If O is a uGF ontology, then the same is true for Datalog-rewritability. 2 In the context of ALC, relation symbols are at most binary and thus it should be clear what ‘tree’ means. 3. UCQ-evaluation w.r.t. O is CO NP-hard iff CQ-evaluation w.r.t. O is CO NP-hard iff rAQ-evaluation w.r.t. O is CO NPhard. The “only if” directions of the first two parts of Theorem 4 and the “if” direction of the third part are trivial. For the non-trivial “if” directions of part 1 and V 2 we decompose UCQs into finite disjunctions of queries q ∧ i qi where q is a “core CQ” that only needs to be evaluated over the input instance D (ignoring labeled nulls) and each qi is a rAQ. This is similar to squid decompositions in [12], but more subtle due to the presence of subqueries that are not connected to any answer variable of q. The same decomposition of UCQs is also used in the “only if” direction of part 3. The following definition formally introduces decompositions of UCQs. Definition 5 Let O be an ontology and let q(~ x) be a UCQ. A decomposition of q(~x) w.r.t. O is a finite set Q of pairs (φ(~ y ), C), where • φ(~ y ) is a conjunction of atoms (possibly equality atoms) such that ~ y contains all variables in ~ x; • C is a finite set of CQs q(~z), where Dq is cg-tree decomposable; such that the following are equivalent for each instance D and tuple ~a ∈ dom(D)|~x| : 1. O, D |= q(~a). 2. For each model A of D and O there exists a pair (φ(~ y ), C) in Q and an assignment π of elements in dom(D) to the variables in ~ y such that • π(~x) = ~a, • D |= φ(π(~ y )), and • A |= q 0 (π(~z)) for each q 0 (~z) in C. The decomposition is strong if for each pair (φ(~ y ), C), the set C consists of rAQs. We aim to prove that for every uGF(=) or uGC2 (=) ontology O and each UCQ q(~x) there exists a strong decomposition of q(~ x) w.r.t. O. As an intermediate step, we prove: Lemma 8 For each uGF(=) or uGC2 (=) ontology O and each UCQ q(~x) there is a decomposition Q of q(~ x) w.r.t. O. Moreover, if O is an uGC2 (=) ontology, then each rAQ that occurs in a pair of Q has a single free variable. P ROOF. The construction is similar to that of squid decompositions in [12]. We first construct Q and then show that it is a decomposition of q(~x) w.r.t. O. To simplify the presentation, we will assume that each element that occurs in Dq0 for a CQ q 0 is identified with the variable it represents. Consider the set I of all tuples (q 0 , f, V, H, (Ti )ki=1 ), where • q 0 is a disjunct of q; • f : dom(Dq0 ) → dom(Dq0 ) is an idempotent mapping such that for each answer variable x of q 0 we have that f (x) is an answer variable; • V ⊆ f (dom(Dq0 )) and V contains all variables in f (~ x); • H, T1 , . . . , Tn form a partition of f (Dq0 ), and each atom in H only contains variables in V ; • each Ti is cg-tree decomposable; • if O is a uGC2 (=) ontology, then each dom(Ti ) contains at most one variable from V ; • k is at most the number of atoms in q 0 . Clearly, I is finite. Each tuple I ∈ I contributes exactly one pair pI to Q, which we describe next. Let I = (q 0 , f, V, H, (Ti )ki=1 ) ∈ I. Define φ(~ y ) to be the conjunction of all atoms of H and all equality atoms x = f (x) for each variable x in ~ x with f (x) 6= x. Furthermore, for each i ∈ {1, . . . , k}, let qi (~zi ) be the CQ whose atoms are the atoms in Ti , and whose tuple ~zi of answer variables consists of all variables in dom(Ti ) that also occur in V . Let pI = (φ(~ y ), {qi (~zi ) | 1 ≤ i ≤ k}). We show that Q = {pI | I ∈ I} is a decomposition of q(~ x) w.r.t. O. First note that each pair in Q has the form (φ(~ y ), C), where φ(~ y) is a conjunction of atomic formulas that contains all variables of ~ x, and C is a finite set of CQs q 0 (~z) whose instance Dq0 is cg-tree decomposable. Furthermore, if O is formulated in uGC2 (=), then each CQ in C has at most one answer variable. Next we establish the equivalence between conditions 1 and 2 from Definition 5. For the direction from 2 to 1, suppose that for each model A of D and O there exists a pair (φ(~ y ), C) ∈ Q and an assignment π of elements in dom(D) to the variables in ~ y such that π(~ x) = ~a, D |= φ(π(~ y )), and A |= q 0 (π(~z)) for each q 0 (~z) ∈ C. By construction of (φ(~ y ), C) this implies O, D |= q. For the direction from 2 to 1, suppose that O, D |= q(~a), and consider a model A of D and O. By Lemma 1, there is a forestmodel B of D and O and a homomorphism h from B to A that preserves dom(D). Let [ B=D∪ BG , G∈G where G is the set of all maximal guarded subsets of D, and for each BG there is a cg-tree decomposition of BG whose root is labeled with a bag with domain G. Now, O, D |= q(~a) implies B |= q(~a), so there is a disjunct q 0 (~ x) of q(~ x) and a homomorphism π from Dq0 to B with π(~ x) = ~a. Note that π is the composition of • an idempotent mapping f : dom(Dq0 ) → dom(Dq0 ) such that f (~ x) contains only variables from ~ x, and • an injective mapping g from f (dom(Dq0 )) to B. Let • V := {f (x) ∈ dom(Dq0 ) | g(f (x)) ∈ dom(D)}; S • H := {f (α) | α ∈ Dq0 , g(f (α)) ∈ D \ G∈G BG }; • TG := {f (α) | α ∈ Dq0 , g(f (α)) ∈ BG } for G ∈ G. Here, for each atom α = R(x1 , . . . , xk ) and mapping h with domain {x1 , . . . , xk } we use the notation h(α) to denote the atom R(h(x1 ), . . . , h(xk )). Let T1 , . . . , Tk be an enumeration of all connected components of the sets TG , G ∈ G. Then each Ti is cg-tree decomposable, and k is bounded by the number of atoms in q 0 . Now, I = (q 0 (~ x), f, V, H, (Ti )ki=1 ) ∈ I, and pI = (φ(~ y ), C) ∈ Q. It is straightforward to verify that D |= φ(π(~ y )) and B |= q 0 (π(~z)) for each q 0 (~z) ∈ C. Since h is a homomorphism from B to A that preserves dom(D), we obtain that π 0 := h ◦ π is an assignment of elements in dom(D) to the variables in ~ y with π 0 (~ x) = ~a, D |= φ(π 0 (~ y )), and A |= q 0 (π 0 (~z)) for each q 0 (~z) ∈ C. Altogether, this completes the proof that Q is a decomposition of q(~ x) w.r.t. O. Next we show that for uGF(=) or uGC2 (=) ontologies O we can transform any decomposition of a UCQ q(~ x) w.r.t. O into a strong decomposition of q(~ x) w.r.t. O. This transformation is based on Lemma 9 below. Given a uGF(=) or uGC2 (=) ontology O and a decomposition Q of a UCQ w.r.t. O, the lemma tells us that for each model A of D and O, each pair (φ(~ y ), C) ∈ Q, each assignment π of elements in dom(D) to the variables in ~ y , and each q 0 (~z) ∈ C, it suffices to consider homomorphisms from Dq0 to A whose image contains an element at a bounded distance from an element in dom(D). Let us first make more precise what we mean by the distance between elements in an instance. Definition 6 Let A be an instance. • The Gaifman graph of an instance A is the undirected graph whose vertex set is dom(A) and which has an edge between any two distinct a, b ∈ dom(A) if a and b occur together in an atom of A. • Given a, b ∈ dom(A), the distance from a to b in A, denoted by distA (a, b), is the length of a shortest path from a to b in the Gaifman graph of A, or ∞ if there exists no such path. • Given A, B ⊆ dom(A), the distance from A to B in A is defined as distA (A, B) := mina∈A,b∈B distA (a, b). Lemma 9 Let O be a uGF(=) or uGC2 (=) ontology, and let Q be a decomposition of a UCQ q0 (~ x) w.r.t. O. Then there exists an integer d ≥ 0 such that the following is true for all instances D and all tuples ~a ∈ dom(D)|~x| . If O, D |= q0 (~a), then for each model A of D and O there exists a pair (φ(~ y ), C) in Q and an assignment π of elements in dom(D) to the variables in ~ y such that • π(~x) = ~a, • D |= φ(π(~ y )), and • for each q(~z) ∈ C there is a homomorphism h from Dq to A such that h(~z) = π(~z) and the distance from dom(h(Dq )) to dom(D) in A is at most d. P ROOF. For each model A of D and O and each positive integer d, let Qd (A) be the set of all triples (φ(~ y ), C, π) such that (φ(~ y ), C) ∈ Q and π is an assignment of elements in dom(D) to the variables in ~ y such that • π(~x) = ~a, • D |= φ(π(~ y )), and • for each q(~z) ∈ C there is a homomorphism h from Dq to A such that h(~z) = π(~z) and the distance from dom(h(Dq )) to dom(D) in A is at most d. We show that there exists a constant d ≥ 0 with the following property: if O, D |= q0 (~a), then for each model A of D and O we have Qd (A) 6= ∅. The constant d depends on the number of C-types, which we define next. Let Q0 be the set of all Boolean CQs that occur in the set C of some pair (φ(~ y ), C) ∈ Q. For the definition of C-types we assume that each q ∈ Q0 is a GF sentence (if O is formulated in uGF(=)), or a GC2 sentence (if O is formulated in uGC2 (=)). We can assume this w.l.o.g. because for each q ∈ Q0 the instance Dq has a cg-tree decomposition, which can be used to construct a GF or GC2 sentence that is equivalent to q. Now, let cl(O, q) be the smallest set satisfying: • O ∪ Q0 ⊆ cl(O, q); • cl(O, q) contains one atomic formula R(~ x) for each relation symbol R that occurs in O or q, where ~ x is a tuple of distinct variables; • cl(O, q) is closed under subformulas and single negation, where the subformulas of a formula φ = Q~ x ψ with a quantifier Q are φ itself and ψ. Given a set C ⊆ ∆D , a C-type is a set of formulas φ(~c), where φ(~ x) ∈ cl(O, q) and ~c is a tuple of constants in C of the appropriate length. Let w be the maximum arity of a relation symbol in O or q, and define τ to be the number of C-types for a set C of size 2w. Note that τ depends only on O and q, and is bounded by w 2|cl(O,q)|·(2w) . Fix d := s + d∗ with d∗ := τ 2 + 1, where s is the maximum number of atoms in a non-Boolean CQ that occurs in C for some pair (φ(~ y , C) ∈ Q. This finishes the definition of the constant d. Suppose now that O, D |= q0 (~a). We aim to show that each model A of D and O satisfies Qd (A) 6= ∅. Let A be a model of D and O. By Lemma 1, there is a forest-model B of D and O and a homomorphism g from B to A that preserves dom(D). We show that Qd (B) 6= ∅. Since g preserves CQs and does not increase distances (i.e., for all a, b ∈ dom(B) we have distA (g(a), g(b)) ≤ distB (a, b)), this implies Qd (A) 6= ∅, as desired. For a contradiction, suppose that Qd (B) = ∅. We use B to construct a model of D and O that does not satisfy q0 (~a). This contradicts O, D |= q0 (~a), and finishes the proof. The following is the core of the construction. Claim. Let C be a forest-model of D and O and let e ≥ d be such that Qe (C) = ∅. Then there is a forest-model C0 of D and O with Qe+1 (C0 ) = ∅, and C0 agrees with C on all elements at distance at most e − d∗ + 1 from dom(D). Proof. Recall the definition of the set Q0 given at the beginning of the proof of Lemma 9. Let X be the set of all q ∈ Q0 such that for each homomorphism h from Dq to C, distC (dom(h(Dq )), dom(D)) ≥ e + 1. For each instance C0 and each integer e0 , let He0 (C0 ) be the set of all pairs (q, h) such that q ∈ X and h is a homomorphism from Dq to C0 with distC0 (dom(h(Dq )), dom(D)) ≤ e0 . Pick a forest-model C0 of D and O such that 1. He (C0 ) is empty; 2. He+1 (C0 ) is inclusion-minimal; 3. for each (φ(~ y ), C) ∈ Q, each assignment π with π(~ x) = ~a and D |= φ(π(~ y )), and each non-Boolean q(~z) ∈ C we have C |= q(π(~z)) iff C0 |= q(π(~z)); 4. C0 agrees with C on all elements at distance at most e − d + 1 from dom(D). Such a model exists since C is a forest-model of D and O that satisfies conditions 1 and 3. We show that He+1 (C0 ) = ∅. This implies Qe+1 (C0 ) = ∅ as follows. Suppose, for a contradiction, that He+1 (C0 ) = ∅ and Qe+1 (C0 ) 6= ∅. Then any (φ(~ y ), C, π) ∈ Qe+1 (C0 ) satisfies π(~ x) = ~a, D |= φ(π(~ y )), and C |= q(π(~z)) for all non-Boolean q(~z) ∈ C (by point 3). But then, there is a Boolean q ∈ C with q ∈ X (as otherwise (φ(~ y ), C, π) ∈ Qe (C)). Since by (φ(~ y ), C, π) ∈ Qe+1 (C0 ) there exists a homomorphism h from Dq to C0 such that the distance from dom(h(Dq )) to dom(D) is at most e+1, we obtain (q, h) ∈ He+1 (C0 ) = ∅, a contradiction. Hence, it suffices to show that He+1 (C0 ) = ∅. For a contradiction, suppose that He+1 (C0 ) 6= ∅. We construct a new forest-model C00 of D and O that satisfies conditions 1, 3, and 4, but with He+1 (C00 ) ( He+1 (C0 ). This will contradict that He+1 (C0 ) is inclusion-minimal. The construction is based on a pumping argument. S By definition we have C0 = D ∪ G∈G C0G , where G is the set of all maximal guarded subsets of D, and for each G ∈ G there is a cg-tree decomposition (TG , EG , bagG ) of C0G whose root rG satisfies dom(bagG (rG )) = G. For each node t ∈ TG , let bag∗G (t) be the union of bagG (t0 ), where t0 ranges over the descendants of t in (TG , EG ), including t. Observe that for each (q, h) ∈ He+1 (C0 ) there is a G ∈ G and a node t ∈ TG at depth at least e in the tree (TG , EG ) such that h(Dq ) ⊆ bag∗G (t). Indeed, since distC0 (dom(h(Dq )), dom(D)) ≥ e + 1, (1) the set h(Dq ) does not contain any constant from dom(D). Moreover, since Dq is connected, there must be a G ∈ G and a node t ∈ TG such that h(Dq ) ⊆ bag∗G (t). This node t can be chosen to have depth at least e in (TG , EG ), because otherwise we could construct a path of length at most e from an element in dom(h(Dq )) to an element in dom(D) in the Gaifman graph of C0 , contradicting (1). Pick any G ∈ G and t ∈ TG at depth e in (TG , EG ) such that for some (q, h) ∈ He+1 (C0 ) we have h(Dq ) ⊆ bag∗G (t). Let t0 , t1 , . . . , td∗ be the last d∗ + 1 nodes on the path from rG to t in (TG , EG ), and define Ci := dom(bagG (ti )) as well as Ci+ := Ci−1 ∪ Ci . For each i ∈ {1, . . . , d∗ }, define the type and extended type of ti as follows: |~ x| θi := {φ(~a) | φ(~ x) ∈ cl(O, q), ~a ∈ Ci , C0 |= φ(~a)}, x) ∈ cl(O, q), ~a ∈ (Ci+ )|~x| , C0 |= φ(~a)}. θi+ := {φ(~a) | φ(~ Note that θi is a Ci -type and θi+ is a Ci+ -type. By our choice of d∗ , there are 1 ≤ i < j ≤ d∗ and a bijective mapping f : Ci+ → Cj+ with f (θi+ ) = θj+ and f (θi ) = θj , where f (θi+ ) is obtained from θi+ by replacing each constant a that occurs in it by f (a), and likewise for f (θi ). In particular, f is an isomorphism from bagG (ti ) to bagG (tj ). We extend f to an injective mapping with domain dom(bag∗G (ti )) such that each element that does not occur in Ci is mapped to an element in ∆D \ dom(C0 ). We are now ready to construct C00 . First, define C00G := (C0G \ bag∗G (tj )) ∪ f (bag∗G (ti )). Note that a cg-tree decomposition of C00G can be obtained from (TG , EG , bagG ) by removing the subtree of (TG , EG ) rooted at tj , adding a fresh copy of the subtree rooted at ti , making its root a child of tj−1 , and renaming each element a that occurs in the bag of a node in the new subtree to f (a). The root of this cg-tree decomposition is the root of (TG , EG , bagG ), and is therefore labeled with a bag with domain G. We let [ C00 := D ∪ C00G ∪ C0G0 . G0 ∈G\{G} It is straightforward to verify that C00 is a forest-model of D and O that has the desired properties. For point 4, note that C00 agrees with C0 on all elements whose distance from dom(D) in C00 is at most the depth of tj in (TG , EG ), which is at least e − d∗ + 1. Note that point 3 follows from point 4, because e − d∗ + 1 ≥ s + 1 and s was chosen in such a way that any non-Boolean CQ that occurs in C for some (φ(~ y ), C) ∈ Q and that has a match into C00 has a match into the subinstance of C00 induced by all elements at distance at most s from dom(D). y To complete the proof of Lemma 9, we apply the claim repeatedly to B to obtain a sequence B0 , B1 , B2 , . . . of forest models Bi of D and O such that for each i ≥ 0, • Qd+i (Bi ) = ∅; • Bi and Bi+1 agree on all elements at distance at most s + i + 1 from elements in dom(D). By compactness, we obtain a forest-model B0 of D and O with Qe (B0 ) = ∅ for all integers e ≥ 0. Since Q is a decomposition of q0 (~ x) w.r.t. O, this implies B0 6|= q0 (~a), as desired. We now use Lemma 9 to show that any decomposition of a UCQ q(~ x) w.r.t. a uGF(=) or uGC2 (=) ontology O can be turned into a strong decomposition of q(~ x) w.r.t. O. Lemma 10 Let O be a uGF(=) or uGC2 (=) ontology, and let Q be a decomposition of a UCQ q(~ x) w.r.t. O. Then there is a strong decomposition of q(~ x) w.r.t. O. P ROOF. Fix the constant d from Lemma 9. Consider a pair p = (φ(~ x), C) ∈ Q, and let q1 (~z1 ), . . . , qn (~zn ) be an enumeration of all CQs in C. For each i ∈ {1, . . . , n}, we define a set Ci of rAQs as follows. If qi (~zi ) is non-Boolean, then qi (~ x) is an rAQ and we define Ci := {qi (~ x)}. Otherwise, if qi is Boolean, let qi ← φ. Then, Ci consists of all rAQs of the form q 0 (x) ← R1 (~ y1 ) ∧ · · · ∧ Re (~ ye ) ∧ φ, for e ≤ d, where • x occurs in ~ y1 ; • each Ri is an at least binary relation symbol in O; • ~ ye contains a variable y in one of the atoms of φ, but no other variable from φ occurs in any of the ~ yi . Let Q0p be the set of all pairs (φ(~ x), {qi0 (~zi ) | 1 ≤ i ≤ n}), where qi0 (~zi ) ∈ Ci for each i ∈ {1, . . . , n}. By the construction of Q0p and our choice of d, it follows that the following are equivalent for each model A of D and O: 1. there exists an assignment π such that π(~ x) = ~a, D |= φ(π(~ y )), and A |= qi (π(~zi )) for each 1 ≤ i ≤ n; 2. there exists a pair (φ(~ x), C 0 ) ∈ Q0p and an assignment π such that π(~ x) = ~a, D |= φ(π(~ y )), and A |= q 0 (π(~z)) for each 0 0 q (~z) ∈ C . S Altogether, this implies that Q0 := p∈Q Q0p is a strong decomposition of q(~ x) w.r.t. O. Corollary 1 For each uGF(=) or uGC2 (=) ontology O and each UCQ q(~ x) there is a strong decomposition Q of q(~ x) w.r.t. O. Moreover, if O is an uGC2 (=) ontology, then each rAQ that occurs in a pair of Q has a single free variable. We are now ready to give the proof of Theorem 4. P ROOF OF T HEOREM 4. PART 1 AND 2: Since the “only if” directions are trivial, it suffices to focus on the direction from rAQ-evaluation to UCQ-evaluation. Suppose that rAQ-evaluation w.r.t. O is in PT IME (resp., Datalog6= -rewritable), and let q(~ x) be a UCQ. We show that evaluating q(~ x) w.r.t. O is in PT IME (resp., Datalog6= -rewritable). Let Q be a strong decomposition of q(~ x) w.r.t. O, which exists by Corollary 1. Then for all instances D and all tuples ~a over dom(D) of length |~x| we have O, D |= q(~a) iff the following is true: There exists (φ(~ y ), C) ∈ Q and an assignment π of elements in dom(D) to the variables in ~ y with 1. π(~x) = ~a, 3. O, D |= q 0 (π(~z)) for each rAQ q 0 (~z) in C. Indeed, since rAQ-evaluation w.r.t. O is in PT IME (this also holds if rAQ-evaluation w.r.t. O is Datalog6= -rewritable), we may assume that O is rAQ-materializable (by Theorem 3). Pick a rAQmaterialization A of O and D. Then condition 2 of Definition 5 means that there exists a pair (φ(~ y ), C) ∈ Q and an assignment π of elements in dom(D) to the variables in ~ y such that π(~ x) = ~a, D |= φ(π(~ y )), and A |= q 0 (π(~z)) for each rAQ q 0 (~z) in C. Since A is a rAQ-materialization of O and D, we can replace A |= q 0 (π(~z)) in the third condition by O, D |= q 0 (π(~z)), which yields (2). If rAQ-evaluation w.r.t. O is in PT IME, then (2) yields a polynomial time procedure for evaluating q w.r.t. O. Moreover, if rAQ-evaluation w.r.t. O is Datalog6= -rewritable, then we can construct a Datalog6= program for evaluating q w.r.t. O as follows. Let Q0 be the set of rAQs that occur in some pair of Q. For each q 0 ∈ Q0 , let Πq0 be a Datalog6= program that evaluates q 0 w.r.t. O. Without loss of generality we assume that the intensional predicates used in different programmes Πq0 and Πq00 are disjoint, and that the goal predicate of Πq0 is goalq0 . Now let Π be the Datalog6= program containing the rules of all programs Πq0 , for q 0 ∈ Q0 , and the following rule for each (φ(~ y ), C) ∈ Q: ^ goal(~x) ← φ(~ y) ∧ goalq0 (~z). q 0 (~ z )∈C Note that if each Πq0 is a Datalog program, then Π is a Datalog program as well. Using (2) it is straightforward to verify that for all instances D we have D |= Π(~a) iff O, D |= q(~a). PART 3: The “if” directions are trivial, so we focus on the direction from UCQ-evaluation to rAQ-evaluation. Suppose that UCQevaluation w.r.t. O is CO NP-hard, and let q(~ x) be a UCQ that witnesses this. We show that rAQ-evaluation w.r.t. O is CO NP-hard via a polynomial-time reduction from evaluating q w.r.t. O. We start by describing a translation of instances D and tuples ~a, and will show afterwards that this translation is a polynomial-time reduction from evaluating q w.r.t. O to evaluating a suitably chosen rAQ w.r.t. O. Let Q be a strong decomposition of q(~ x) w.r.t. O. Fix an enu0 meration q10 (~z1 ), . . . , qm (~zm ) of all rAQs that occur in some pair of Q, and let ki be the length of ~zi . Without loss of generality, we can assume that each Dqi0 is consistent w.r.t. O. We use fresh relation symbols R, S, and Ti (1 ≤ i ≤ m), where R and S are binary, and Ti is (ki + 1)-ary. Note that each of these relation symbols is at most binary in the case that O is a uGC2 (=) ontology. Given an instance D and a tuple ~a ∈ dom(D)|~x| , we compute a new instance D̃ by adding the following atoms to D. First, we add all atoms of Dqi0 , for each i ∈ {1, . . . , m}, where we assume w.l.o.g. that the domain of Dqi0 is disjoint from that of D and Dqj0 for j 6= i. Let ~ci be the tuple of elements in Dqi0 that represents the tuple ~zi of the answer variables of qi0 . Next, we add the following atoms for each pair p = (φ(~ y ), C) ∈ Q, and for each assignment π of elements in dom(D) to the variables in ~ y that satisfies π(~ x) = ~a and D |= φ(π(~ y )): • R(a0 , ap ); • Ti (ap,π , π(~zi )) for each i ∈ {1, . . . , m} with qi0 ∈ C; • Ti (ap,π , ~ci ) for each i ∈ {1, . . . , m} with qi0 ∈ / C. (2) 2. D |= φ(π(~ y )), and • S(ap , ap,π ); Since Q is of constant size, the instance D̃ can be computed in time polynomial in the size of D. It is now straightforward to verify that O, D |= q(~a) holds iff D̃, O |= q̃(a0 ), where q̃(x) is the rAQ q̃(x) ← R(x, y) ∧ S(y, z) ∧ m ^ i=1 Ti (z, ~ ui ) ∧ qi0 (~ ui ) . Since evaluating q(~ x) w.r.t. O is CO NP-hard, we conclude that evaluating q̃(x) w.r.t. O is CO NP-hard. E. PROOFS FOR SECTION 4 Theorem 5 (restated) For all uGF(=) and uGC2 (=) ontologies O, unravelling tolerance of O implies that rAQ-evaluation w.r.t. O is Datalog6= -rewritable (and Datalog-rewritable if O is formulated in uGF). P ROOF. Let q be an rAQ. We show that answering q w.r.t. O is Datalog6= -rewritable. We provide separate proofs for the case that O is a uGF(=) ontology (Part 1) and for the case that O is a uGC2 (=) ontology (Part 2). Part 1: O is a uGF(=) ontology. To simplify the presentation, we will regard q as an openGF formula, which we can do w.l.o.g. because q has a connected guarded tree decomposition whose root is labeled by the answer variables of q. Let cl(O, q) be the smallest set satisfying: • O ∪ {q} ⊆ cl(O, q); • cl(O, q) contains one atomic formula R(~ x) for each relation symbol R that occurs in O or q, where ~ x is a tuple of distinct variables; • cl(O, q) contains a single equality atom x = y, where x and y are distinct variables; • cl(O, q) is closed under subformulas and single negation, where the subformulas of a formula φ = Q~ x ψ with a quantifier Q are φ itself and ψ. Let w be the maximum arity of a relation symbol in O or q, and let x1 , . . . , x2w−1 be distinct variables that do not occur in O or q. Given a tuple ~ x over X = {x1 , . . . , x2w−1 }, an ~ x-type θ is a maximal consistent set of formulas, where each formula in θ is obtained from a formula φ(~ y ) ∈ cl(O, q) by substituting a variable from ~ x for each variable in ~ y , and one formula in θ is a relational atomic formula containing all the variables in θ. If θi is an ~ xi -type for each i ∈ {1, 2}, then θ1 and θ2 are compatible if they agree on all formulas containing only the variables that occur both in ~ x1 and ~ x2 . An ~ x-type θ is realizable in an interpretation A if there is an assignment π of elements in dom(A) to the variables in ~ x such that A |= φ(π(~ y )) for each φ(~ y ) ∈ θ. In this case we also say that ~a = π(~ x) realizes θ in A. Let tp(~ x) be the set of all ~ x-types that are realizable in some model of O. Since O and q are fixed, each tp(~ x) is of constant size and can be computed in constant time using a standard satisfiability procedure for the guarded fragment [29]. We now describe a Datalog6= program Π for evaluating q w.r.t. O. For each l ∈ {1, . . . , w}, each l-tuple ~ x over X, and each Θ ⊆ tp(~ x), the program uses an intensional l-ary relation symbol PΘl . Intuitively, PΘl (~a) encodes an assignment of possible types (namely those in Θ) for ~a in a model of O. In the description below, l1 and l2 range over integers in {1, . . . , w}, and ~ x1 and ~ x2 range over tuples over X of length l1 and l2 , respectively, such that ~ x2 contains at least one variable from ~ x1 . Let k be the arity of q. The program contains the following rules: 1. PΘl1 (~x1 ) ← R(~ y ) ∧ α(~z), where Θ is the set of all θ ∈ tp(~ x1 ) that contain R(~ y ) and α(~z), α is an atomic formula (possibly an equality) or an inequality, and ~ y contains exactly the variables from ~x1 ; 2. PΘl1 (~x1 ) ← PΘl11 (~ x1 ) ∧ PΘl22 (~ x2 ), where Θi ⊆ tp(~ xi ), and Θ is the set of all θ1 ∈ Θ1 such that there exists a θ2 ∈ Θ2 that is compatible with θ1 ; x1 ), where Θi ⊆ tp(~ x1 ); x1 ) ∧ PΘl12 (~ 3. PΘl11 ∩Θ2 (~x1 ) ← PΘl11 (~ x1 ), where Θ ⊆ tp(~ x1 ), and 4. goal(xi1 , . . . , xik ) ← PΘl1 (~ q(xi1 , . . . , xik ) ∈ θ for each θ ∈ Θ; 5. goal(x1 , . . . , xk ) ← P∅l (~ y ), where l ≤ w and ~ y is a tuple of l variables from X distinct from x1 , . . . , xk . It remains to show that Π is a Datalog6= -rewriting for q w.r.t. O. To this end, let D be an instance. We show that O, D 6|= q(~a) iff D 6|= Π(~a). For the “only if” direction, assume O, D 6|= q(~a), and let B be a model of D and O with B 6|= q(~a). Consider a tuple ~b = (b1 , . . . , bl ) ∈ dom(D)l for some l ∈ {1, . . . , w}. Note that if ~b is not guarded in B, then for all tuples ~ x ∈ X l and all Θ ⊆ tp(~ x) the l ~ program Π does not derive PΘ (b) on input D. This is because of the rules in line 1, which require ~b to be guarded in order to derive y = (y1 , . . . , yl ) ∈ X l , define θy~ (~b) to be PΘl (~b). For each tuple ~ the set of all formulas obtained from a formula φ(z1 , . . . , zn ) ∈ cl(O, q) and 1 ≤ i1 , . . . , in ≤ l with B |= φ(bi1 , . . . , bin ) by substituting yij for each zj . Since ~b is guarded in D, θy~ (~b) contains a relational atomic formula that contains all variables from ~ y , hence θy~ (~b) is a ~ y -type. Furthermore, θy~ (~b) is realizable in B, which implies θy~ (~b) ∈ tp(~ y ). By induction on rule applications of Π one can show that for each guarded tuple ~b in dom(B) of length l, each tuple ~ y ∈ X l , and each Θ ⊆ tp(~ y ) such that PΘl (~b) is derivable by Π on D we have θy~ (~a) ∈ Θ. Since B 6|= q(~a), for each guarded tuple ~b = (b1 , . . . , bl ) in D with ~a = (bi1 , . . . , bik ) and each ~ y = (y1 , . . . , yl ) ∈ X l we have q(yi1 , . . . , yik ) ∈ / θy~ (~b). Altogether, this implies that D 6|= Π(~a). For the “if” direction, assume D 6|= Π(~a). Let G~a be a maximal guarded set of D containing the elements of ~a. Recall the definition of the uGF(=)-unravelling of D from Section 4. We show that O, Du 6|= q(~b), where ~b is the copy of ~a in bag(G~a ), which implies O, D 6|= q(~a) by unravelling tolerance of O. More precisely, we construct a model B of Du and O with B 6|= q(~b). Observe that for each l ∈ {1, . . . , w}, each ~c ∈ dom(D)l , each Θ ⊆ tp(~x), and each bijective mapping f : X → X, the program Π derives PΘl (~c) iff it derives Pfl (Θ) (~c), where f (Θ) is the f (~ x)type obtained from Θ by substituting f (x) for each variable x in ~ x. In other words, the sets Θ of types that Π derives for each tuple ~c do not depend on the choice of the variables ~ x. Observe also that for each guarded tuple ~c in D there is a unique minimal Θ(~c) ⊆ l tp(x1 , . . . , xl ) such that Π derives PΘ(~ c). Since Π does not c) (~ derive goal(~a), we must have Θ(~c) 6= ∅ for all guarded tuples ~c in D. Furthermore, if ~c = (c1 , . . . , cl ) and ~a = (ci1 , . . . , cik ), then q(xi1 , . . . , xik ) ∈ / θ for some θ ∈ Θ(~c). Let us denote the set of such types in Θ(~c) by Θ¬q (~c). To construct the desired model B, we first assign to each maximally guarded tuple ~c of Du a type in Θ(~c) as follows. Recall the definition of the tree T (D) and the bags bag(t), t ∈ T (D), used in the definition of the uGF-unravelling in Section 4. For each node t of T (D), let Gt := dom(bag(t)) and ~ct a |Gt |-tuple of all elements in Gt . We inductively assign to each node t of T (D) a (x1 , . . . , x|Gt | )-type θt as follows. If t = G for a maximal guarded subset G of D, pick a type θt ∈ Θ¬q (~ct ). For the induction step, suppose that t = t0 G and tail(t0 ) = G0 . Let ~ct0 = (c01 , . . . , c0m ), ~ct = (c1 , . . . , cn ), and define a bijective mapping f : X → X such that for all i ∈ {1, . . . , n}, • if ci = c0j , then f (xi ) = xj ; • if ci ∈ / {c01 , . . . , c0m }, then f (xi ) ∈ / {x1 , . . . , xm }. Since θt0 ∈ Θ(~ct0 ), there exists a type θt ∈ Θ(~ct ) such that θt0 and f (θt ) are compatible. This follows from the rules in line 2 of the definition of Π. Finally, for each node t of T (D), pick a model Bt of O such that ~ct realizes θt in Bt . Without loss of generality we assume that dom(Bt ) ∩ dom(Du ) = Gt for each node t of T (D), and dom(Bt ) ∩ dom(Bt0 ) = Gt ∩ Gt0 for every two distinct t, t0 ∈ T (D) Let B be the interpretation obtained from Du by hooking Bt to Du for each node t of T (D): [ B := Du ∪ Bt . t∈T (D) u Clearly, B is a model of D . It remains to show that B is a model of O with B 6|= q(~b). Claim. For all openGF formulas φ(~ x), all guarded tuples ~c of B, and all nodes t ∈ T (D) with ~c ⊆ dom(Bt ), Bt |= φ(~c) ⇐⇒ B |= φ(~c). (3) Proof. By induction on the structure of φ. For the base case assume that φ is an atomic formula. Let ~c be a guarded tuple of B and let t ∈ T (D) be such that ~c ⊆ dom(Bt ). Then, Bt |= φ(~c) implies B |= φ(~c) because Bt ⊆ B. For the converse assume that B |= φ(~c). Since φ is atomic, this implies Bt0 |= φ(~c) for some t0 ∈ T (D) with ~c ⊆ dom(Bt0 ). By the choice of the types θt and θt0 , we obtain Bt |= φ(~c). For the inductive step, we distinguish the following cases: Case 1: φ(~ x) = ¬φ0 (~ x). For each guarded tuple ~c of B and each node t ∈ T (D) with ~c ⊆ dom(Bt ), we have Bt |= φ(~c) ⇐⇒ Bt 6|= φ0 (~c) ⇐⇒ B 6|= φ0 (~c) ⇐⇒ B |= φ(~c), where the second equivalence follows from the induction hypothesis. Case 2: φ(~ x) = φ1 (~ x1 )∧φ2 (~ x2 ). Consider a guarded tuple ~c of B and a node t ∈ T (D) with ~c ⊆ dom(Bt ). Let ~ci be the projection of ~c onto the positions of ~ x that contain a variable from ~ xi . Then, Bt |= φ(~c) ⇐⇒ Bt |= φi (~ci ) for each i ∈ {1, 2} ⇐⇒ B |= φi (~ci ) for each i ∈ {1, 2} ⇐⇒ B |= φ(~c), where the second equivalence follows from the induction hypothesis. Case 3: φ(~ x) = ∀~ y (α(~ x, ~ y ) → ψ(~ x, ~ y )). Consider a guarded tuple ~c of B and a t ∈ T (D) with ~c ⊆ dom(Bt ). We first prove that Bt |= φ(~c) implies B |= φ(~c). Suppose that ~ for some tuple d. ~ Since α is Bt |= φ(~c), and let B |= α(~c, d) 0 ~ atomic, we have Bt0 |= α(~c, d) for some node t ∈ T (D) with ~c, d~ ⊆ dom(Bt0 ). Note that ~c contains only values in dom(Bt ) ∩ dom(Bt0 ) and that ~c is non-empty (because φ is open). By the choice of the types θt and θt0 and since Bt |= φ(~c), we obtain ~ The induction hypothesis Bt0 |= φ(~c). Hence, Bt0 |= ψ(~c, d). ~ now yields B |= ψ(~c, d). We have thus shown that B |= φ(~c). ~ For the converse, assume B |= φ(~c), and let Bt |= α(~c, d) ~ for some tuple d. Since α is atomic and Bt ⊆ B, we have ~ and therefore B |= ψ(~c, d). ~ Since ~c, d~ is guarded B |= α(~c, d), and contained in dom(Bt ), the induction hypothesis implies Bt |= ~ This shows that Bt |= φ(~c). ψ(~c, d). Case 4: φ(~x) = ∃~ y (α(~ x, ~ y ) ∧ ψ(~ x, ~ y )). Consider a guarded tuple ~c of B and a node t ∈ T (D) with ~c ⊆ dom(Bt ). If Bt |= φ(~c), then there exists a tuple d~ ⊆ dom(Bt ) such that ~ and Bt |= ψ(~c, d). ~ By the induction hypothesis, Bt |= α(~c, d) ~ ~ and hence B |= φ(~c). this implies B |= α(~c, d) and B |= ψ(~c, d), Conversely, if B |= φ(~c), then there exists a tuple d~ such that ~ and B |= ψ(~c, d). ~ Since α is atomic, there exists B |= α(~c, d) a node t0 ∈ T (D) with ~c, d~ ⊆ dom(Bt0 ), and hence by the in~ and Bt0 |= ψ(~c, d), ~ duction hypothesis we obtain Bt0 |= α(~c, d) and therefore Bt0 |= φ(~c). By the choice of θt and θt0 , we obtain Bt |= φ(~c). y Using the claim we now show that B is a model of O with B 6|= q(~b). By construction we have θt ∈ Θ¬q (~ct ), where t = G~a . This implies Bt 6|= q(~b), and using the claim we obtain B 6|= q(~b). To prove that B is a model of O, let ψ = ∀~ x(α(~ x) → φ(~ x)) be a sentence in O, and let ~c be a tuple with B |= α(~c). Since α is atomic, ~c is a guarded tuple in B. In particular, since B |= α(~c), there must be a node t0 ∈ T (D) with ~c ⊆ dom(Bt0 ) and Bt0 |= α(~c). In fact, by the choice of the types assigned to the maximally guarded tuples of Du , every node t ∈ T (D) with ~c ⊆ dom(Bt ) must satisfy Bt |= α(~c). Now let t be a node in T (D) such that ~c ⊆ dom(Bt ). Since Bt |= α(~c) and Bt is a model of O, we get Bt |= φ(~c), so by (3) we have B |= φ(~c). Therefore, B |= ψ. This holds for all sentences in O, hence B is a model of O. If O is a uGF-ontology, we obtain a Datalog-rewriting from Π by removing inequalities from types and the rules in line 1. Part 2: O is a uGC2 (=) ontology. Analogous to part 1 we regard q as an openGC2 formula, which we can do w.l.o.g. because q has a cg-tree decomposition where the domain of each bag consists of at most two elements and whose root bag has the answer variables of q as its domain. We define cl(O, q), types, and the corresponding notion of realization as in part 1. Since in the context of uGC2 (=) we work over an at most binary signature, we will only use types over one or two variables. The sets tp(x) and tp(x, y) of these types can be computed in constant time using a satisfiability procedure for the two-variable guarded counting fragment [48]. Given a ~ x0 -type θ0 and a set Θi ⊆ tp(~ xi ) for each i ∈ {1, . . . , `} we write θ0 (Θi )`i=1 iff there exists a θi ∈ Θi for each i ∈ {1, . . . , `} S such that `i=0 θi is realizable in a model of O. Furthermore, if (x, y) and (u, v) are pairs of distinct variables and Θ ⊆ tp(u, v), then we write Θx,y for the set of all types in tp(x, y) that can be obtained from a type in Θ by renaming u to x and v to y. We now turn to the description of the Datalog6= program Π for evaluating q w.r.t. O. The program uses distinct variables x, y, z0 , z1 , z2 , . . . , zN τ 2τ . Here, τ is one more than the number of types of two distinct variables, and N is the largest integer n such that a formula of the form ∃≥n x φ occurs in cl(O, q), or 1 if there is no such formula in cl(O, q). For each i ∈ {1, 2} and Θ ⊆ tp(u, v), Π uses an intensional binary relation symbol PΘ with the sameVintended interpretation as the symbols PΘl in part 1. Let neq` := 0≤i<j≤` zi 6= zj . Then Π consists of the following rules: 1. PΘ (x, y) ← R(~v ) ∧ α(w), ~ where Θ is the set of all types in tp(x, y) containing R(~v ) and α(w), ~ α is an atomic formula (including an equality) or an inequality, and ~v contains both x and y; V 2. PΘ (x, z0 ) ← li=0 PΘi (x, zi )∧neq` , where ` ≤ N τ 2τ , Θi ⊆ tp(x, zi ) for i = 0, 1, . . . , `, and Θ consists of all θ0 ∈ Θ0 with θ0 (Θi )`i=1 ; V 3. PΘ1 ∩Θ2 (x, y) ← 2i=1 PΘi (x, y) with Θi ⊆ tp(x, y); 4. goal(~v ) ← PΘ (x, y), where Θ ⊆ tp(x, y), and q(~v ) ∈ θ for each θ ∈ Θ; 5. goal(~v ) ← P∅ (x). In addition, for each set Θ ⊆ tp(x, y), the program contains PΘx,zi (x, y) ← PΘ (x, y) and PΘ (x, z0 ) ← PΘx,z0 (x, z0 ) to rename variables in types, and PΘy,x (y, x) ← PΘ (x, y) to swap positions of variables. Intuitively, Π computes for each guarded tuple (a, b) of an instance D a set Θ(a, b) of (x, y)-types θ0 that contain all information about the atomic formulas that hold in D|{a,b} , and for each collection (a, c1 ), . . . , (a, c` ) of ` ≤ N τ 2τ guarded tuples of D that have a non-empty intersection with (a, b) there are types S θi ∈ Θ(a, ci ) such that `i=0 (θi )x,zi is realizable in a model of O. The renaming of the variables in each θi is necessary to account for the overlap of the tuples (a, b) and ~ci . Π derives goal(~c) if some Θ(a, b) is empty (which will happen if D is inconsistent w.r.t. O) or if Θ(a, b) contains q(~v ) for some (a, b), where the ith variable in ~v is x if the ith position of ~a is a, and it is y otherwise. It remains to show that Π is a Datalog6= -rewriting for q w.r.t. O. To this end, let D be an instance. We show that O, D 6|= q(~a) iff D 6|= Π(~a). The “only if” direction is similar to part 1. For the “if” direction, assume D 6|= Π(~a). Recall the definition of the uGF2 -unravelling of D from Section 4, and let G~a be a maximal guarded set of D containing the elements of ~a. As in part 1, it suffices to construct a model B of Du and O with B 6|= q(~b), where ~b is the copy of ~a in bag(G~a ). Observe that for each guarded tuple ~c in D there is a unique minimal set Θ(~c) ⊆ tp(x, y) such that Π derives PΘ(~c) (~c) on input D. Since Π does not derive goal(~a), we must have Θ(~c) 6= ∅ for all guarded tuples ~c in D. Furthermore, if ~a = f (~v ) for a mapping from the variables in ~v to the constants in ~c, then q(~v ) ∈ / θ for some θ ∈ Θ(~c). Let us denote the set of such types in Θ(~c) by Θ¬q (~c). To construct the desired model B, we first assign to each maximally guarded tuple ~c of Du a type in Θ(~c↑ ) and a model At of O as follows. Recall the definition of the tree T (D) and the bags bag(t), for t ∈ T (D), used in the definition of the uGF2 -unravelling in Section 4. We inductively assign to each node t of T (D) a (x, y)type θt as follows. If t = {a, b} for a maximal guarded set {a, b} in D, let θt be any type in Θ¬q (a, b). For the induction step, assume that we have assigned a type θt to t ∈ T (D), but that θt0 is undefined for each node t0 = tG in T (D). Let tail(t) = {a, b}. We distinguish the following two cases. Case 1: There exists a t0 ∈ T (D) with t = t0 {a, b}. In this case, there is only one kind of successor node of t in T (D): nodes of the form t{a, c}, where a does not occur in tail(t0 ). Let c1 , . . . , cn be an enumeration of all elements of dom(D) such that ti := t{a, ci } is a node in T (D). Denote by ∼ the equivalence relation on {c1 , . . . , cn } with ci ∼ cj iff Θ(a, ci ) = Θ(a, cj ). There are s ≤ 2τ equivalence classes w.r.t. ∼. Let E1 , . . . , Es be an enumeration of these classes. For each i ∈ {1, . . . , s}, pick an arbitrary subset Ei0 of Ei of size N τ , or the entire set if Ei has fewer than N τ elements. Let c01 , . . . , c0` be an enumeration of the elements in E10 ∪ · · · ∪ Es0 . Then, ` ≤ N τ 2τ . Since θt ∈ Θ(a, b), the rules in line 2 of the definition of Π ensure θt (Θi )`i=1 , where Θi := {θx,zi | θ ∈ Θ(a, c0i )}. Hence, for each i ∈ {1, . . . , `} there is a type θi ∈ Θ(a, c0i ) such that S θt ∪ `i=1 (θi )x,zi is realizable in a model of O. Let θt{a,c0i } := θi for each i ∈ {1, . . . , `}. It remains to assign types to the elements in Ei \ Ei0 , for each i ∈ {1, . . . , s}. By construction, we have Ei \ Ei0 6= ∅ only if |Ei0 | = N τ . Thus there must be at least one type θi∗ that is assigned to at least N of the nodes t{a, ci }. We define θt{a,c} := θi∗ for each c ∈ Ei \ Ei0 . This finishes the assignment of types θt0 to all successor nodes ∗ t0 of t. SnNote that by our choice of N and the type θi , the set θ := θt ∪ i=1 (θti )x,zi is realizable in a model At of O. In fact, At can be chosen such that the assignment πt with πt (x)↑ = a and πt (zi )↑ = ci realizes θ in At . Case 2: t = {a, b}. In this case, there are two kinds of successor nodes of t: nodes of the form t{a, c} and nodes of the form t{b, c}. Let t1 , . . . , tn be an enumeration of all nodes of the form t{a, c}, and let u1 , . . . , um be an enumeration of all nodes of the form t{b, c}. We assign types θti to each node ti as in Case 1. We do the same for all successors nodes uj , but here we have to use the rule PΘy,x (y, x) ← PΘ (x, y) which implies that Θ(b, θ ∈ Θ(a, b)}. One can now show that S a) = {θy,x | S m 0 θ := θt ∪ n (θ ) ∪ t x,z i i i=1 i=1 (θui )y,zi is realizable in a model At of O, and that this model can be chosen in such a way that the assignment πt with πt (x)↑ = a, πt (zi )↑ ∈ tail(ti ) \ {a} and πt (uj )↑ ∈ tail(uj ) \ {b} realizes θ in At . This concludes the induction step. We are now ready to construct B. Note that any two At and At0 with t0 a neighbor of t in T (D) coincide on all atoms that only involve elements of dom(Du ). Let B0 be the collection of all atoms that occur in some At and only involve elements of dom(Du ). For each a ∈ dom(Du ) we now attach the following structure B{a} to B0 . Pick any t ∈ T (D) such that a ∈ dom(bag(t)). Using the construction in the second part of the proof of Lemma 1, we unfold At at a into a cg-decomposable model B{a} , where we identify a and its neighbors b with the copies of a and b in the bag of the root and its neighbors. We assume that all other elements have been renamed so that they do not occur in dom(Du ). Let [ B := B0 ∪ B{a} . a∈dom(Du ) u It is not difficult to prove that B is a model of D and O with B 6|= q(~b). F. PROOFS FOR SECTION 5 Theorem 6 (restated) Let O be either a uGF(1), uGF− (1, =), − uGF− 2 (2), uGC2 (1, =), or ALCHIF ontology of depth 2. If O is materializable for (possibly infinite) cg-tree decomposable instances D with sig(D) ⊆ sig(O), then O is unravelling tolerant. P ROOF. We first give the proof for the ontology languages − uGF(1), uGF− 2 (2) and uGF (1, =). Assume such an ontology O is given. Let D be an instance and Du its uGF-unravelling. Let G0 be a maximal guarded set in D, ~a in G0 , ~b the copy of ~a in bag(G0 ), and q an rAQ. We have to show that O, D |= q(~a) implies O, Du |= q(~b). Define an equivalence relation ∼ on T (D) by setting t ∼ t0 if tail(t) = tail(t0 ). Recall that for any t, t0 ∈ T (D) with t ∼ t0 the mapping ht,t0 that sends every e ∈ dom(bag(t)) to the unique f ∈ dom(bag(t0 )) with e↑ = f ↑ is an isomorphism from bag(t) to bag(t0 ). Call ht,t0 the canonical isomorphism from bag(t) onto bag(t0 ). By construction of the undirected graph (T (D), E), for any t, t0 ∈ T (D) with t ∼ t0 there is an automorphism it,t0 of (T (D), E) such that it,t0 (t) = t0 and it,t0 (s) ∼ s for every s ∈ T (D). it,t0 is uniquely determined on the connected component of t in (T (D), E) and induces the extended canonical automorphism S ĥt,t0 of Du by setting ĥt,t0 = s∈T (D) hs,it,t0 (s) . Fact 1. If t, t0 ∈ T (D) such that t ∼ t0 then ĥt,t0 is an automorphism of Du . Our aim now is to construct a materialization B of O and Du that is a forest model such that the automorphism ĥt,t0 can be lifted to an automorphism of B. Once this is done it is straightforward to construct a materialization B0 of O and D by hooking to any maximal guarded set G of D a copy of Bbag(G) , the guarded tree decomposable subinstances of B hooked to bag(G) in B. Then (B0 , ~a) and (B, ~b) are guarded bisimilar and if O, Du 6|= q(~b) then O, D 6|= q(~a) follows. Let B be a materialization of O and Du . (To show that B exists let red(Du ) be the sig(O)-reduct of D. As O is invariant under disjoint unions and materializable for the class of (possibly infinite) cg-tree decomposable instances D with sig(D) ⊆ sig(O) there exists a materialization Bred of red(Du ). Clearly {R | R(~a) ∈ Bred } ⊆ sig(O). Now let B = Bred ∪ {R(~a) ∈ Du | R 6∈ sig(O)} One can show that B is a materialization of Du and O.) Fact 1 entails the following. Fact 2. For all t, t0 with t ∼ t0 and ~e in dom(bag(t)) and any rAQ q(~ x) such that ~ x has the same length as ~e: B |= q(~e) ⇔ We now distinguish two cases. B |= q(ht,t0 (~e)) Case 1. O is a uGF(1) or a uGF− 2 (2) ontology. The following observation is crucial for the proof (and does not hold for languages with equality such as uGF− (1, =)). Observation 1. If there is a homomorphism h from an instance D to an instance D0 then O, D |= q(~a) implies O, D0 |= q(h(~a)) for any CQ q and ~a in dom(D). We hook in Du to any bag(t) with t ∈ T (D) a copy of any rAQ q (regarded as an instance) from which there is a homomorphism into B that is injective on the root bag of q and maps it into dom(bag(t)). In more detail, let Xt be the set of all pairs (π, Dq ) such that Dq is an instance corresponding to an rAQ q and π is a homomorphism from Dq to B mapping the constants dx corresponding to answer variables x of q to distinct π(dx ) ∈ dom(bag(t)). By renaming constants in Dq we obtain an instance Dq,π isomorphic to Dq such that π(dx ) = dx and such that dom(Dq,π ) ∩ dom(Du ) is the set of all dx with x an answer variable of q. Now let [ [ Dt = Dq,π , Du+ = Du ∪ Dt (q,π)∈Xt t∈T (D) u+ The following properties of D tion: follow directly from the defini- 1. For any t, t0 ∈ T (D) with t ∼ t0 there is an isomorphism from Dt onto Dt0 that extends the canonical isomorphism ht,t0 ; 2. there is a homomorphism from Du+ into B preserving dom(Du ). Thus, by Observation 1, any materialization of Du+ is a materialization of Du and for every CQ q(~ x) and d~ in Du of the same length as ~ x: ~ O, Du+ |= q(d) ⇔ ~ O, Du |= q(d) Now take a materialization Bu+ of Du+ and O that is a forest model of Du+ and O. Thus Bu+ is obtained from Du+ by hooking cg-tree decomposable models Bu+ of Dt to every bag(t) with t ∈ t T (D). By Point 1 and Fact 1, we have the following: Fact 3. For any t, t0 ∈ T (D) with t ∼ t0 the canonical automorphism ĥt,t0 extends to an isomorphism from Bu+ |dom(Dt ) onto Bu+ . |dom(D 0 ) t Fact 4. For any t, t0 ∈ T (D) with t ∼ t0 and any finite subinstance A of Bu+ there exists an isomorphic embedding of A into t Bu+ extending the canonical automorphism ĥt,t0 . |dom(D 0 ) t We prove Fact 4. By Fact 3 it suffices to prove that for any t ∈ T (D) and any finite subinstance A of Bu+ there exists an isot morphic embedding of A into Bu+ preserving dom(bag(t)). |dom(Dt ) But this follows from the fact that Bu+ is a materialization of Du (Point (2)): then there is an isomorphism from A to some Dπ,q used in the construction of Du+ which preserves dom(bag(t)). Fix such a Dπ,q . It remains to be proved that there does not exist any R(~a) with ~a in dom(Dπ,q ) such that R(~a) ∈ Bu+ \ Dπ,q . But using the fact that A is a subinstance of the model Bu+ of O and Du+ isomorphic to Dπ,q one can easily construct a model of Du+ and O that contains no R(~a) 6∈ Dπ,q with ~a in dom(Dπ,q ). Thus Bu+ contains no such R(~a) since Bu+ is a materialization of O and Du+ . This finishes the proof of Fact 4. Fact 4 easily generalizes to Bu+ (by Fact 3): for any t, t0 ∈ T (D) with t ∼ t0 and any finite subinstance A of Bu+ there exists an isomorphic embedding of A into Bu+ extending the canonical automorphism ĥt,t0 . We now uniformize Bu+ . For each t ∈ T (D) with t not a guarded set in D we hook at bag(t) to Du an isomoru+ phic copy Bu∗ t of the interpretation Bbag(G) with t ∼ G and reu+ move Bt . Denote the resulting model by Bu∗ . Bu∗ is uniform in the sense that for any two t, t0 ∈ T (D), the cg-tree decomposable models hooked to bag(t) and bag(t0 ) in Bu∗ are isomorphic. We now show that Bu∗ is a materialization of D and O. ∼ For a ∈ dom(Bu∗ the corresponding t ) and t ∼ G denote by a element of Bu+ such that for a ∈ bag(t) we have a↑ = (a∼ )↑ . bag(G) Fact 5. Bu∗ is a materialization of O and Du . We show that Bu∗ is a model of O. Then Bu∗ is a materialization of O and Du since it is a model of Du and since Bu∗ is obtained from the materialization Bu+ of O and Du by replacing certain interpretations that are hooked to bag(t) by interpretations that are hooked to bag(t0 ) for t, t0 ∈ T (D) with t ∼ t0 which preserves the answers to rAQs (use Fact 1). Consider first a sentence ϕ of the form ∀~ y (R(~ y ) → ψ(~ y )) in O, where ψ is a formula in openGF of depth one. We show that Bu∗ |= ϕ. Let Bu∗ |= R(~a) for some ~a = (a1 , . . . , ak ) in dom(Bu∗ ). Then a1 , . . . , ak are contained in Bu∗ for some t t ∈ T (D). We show that Bu∗ |= ψ(~a) iff Bu+ |= ψ(~a∼ ) where ∼ u ~a∼ = (a∼ 1 , . . . , ak ). Assume t ∼ G. If a1 , . . . , ak 6∈ dom(D ), then this is clear by construction since the truth of ψ(~a) then only u∗ depends on the subinterpretation Bu∗ and this is isomort of B u+ phic to the subinterpretation Bbag(G) of the model Bu+ of O. Now assume that {a1 , . . . , ak }∩dom(Du ) = Z 6= ∅. By Fact 4, for any guarded set F in Bu∗ with G0 ∩ Z 6= ∅ there exists a guarded set F 0 in Bu∗ such that there exists an isomorphism h from Bu∗ |F onto u∗ B|F 0 that extends the canonical automorphism ĥt,G . The converse ∼ u direction holds as well: let Z 0 = {a∼ 1 , . . . , ak } ∩ dom(D ). By Fact 4, for any guarded set F 0 in Bu∗ with F 0 ∩ Z 0 6= ∅ there exists a guarded set F in Bu∗ such that there exists an isomorphism u∗ h from Bu∗ |F 0 onto B|F that extends the canonical automorphism ĥG,t .Thus Bu∗ |= ψ(~a) iff Bu+ |= ψ(~a∼ ) follows. For sentences ϕ of the form ∀xψ(x) in O, where ψ(x) is an openGF formula of depth two using at most binary relations the argument is similar using the fact that guarded sets {a, b} are either completely contained in dom(Du ) or contain at most one element from dom(Du ). This finishes the proof of Fact 5. Finally we hook for any maximal guarded G in D the interpretation Bu+ bag(G) to G in the original instance D and obtain a forest model B+ . It is straightforward to prove that for any maximal guarded set G in D, any tuple ~e containing all elements of G, and the copy f~ of ~e in bag(G) there is a connected guarded bisimulation between (Bu∗ , f~) and (B+ , ~e). It follows that B+ is a model of O and D and that B+ 6|= q(~a) if Bu∗ 6|= q(~b). Case 2. O is a uGF− (1, =) ontology. In this case the construction is simpler as we do not modify B further. There is no need to manipulate B as we are in a fragment of depth one in which the outermost universal quantifier is guarded by an equality. Define B+ by adding to D • all atoms R(a↑1 , . . . , a↑k ) such that R(a1 , . . . , ak ) ∈ B and a1 , . . . , ak ∈ dom(Du ); • for any a ∈ dom(D) for a fixed copy a0 of a in Du a copy Ba0 of B that is hooked to D at a by identifying a0 and a. Using Fact 1 and the condition that O is a uGF− (1, =) ontology it is straightforward to prove that B+ is a model of O and D. Also by Fact 1 and construction B+ 6|= q(~a) if B 6|= q(~b). We now assume that O is a uGC− 2 (1, =) or a ALCHIF ontology of depth 2. Let D be an instance, G0 be a maximal guarded set in D, ~a in G0 , ~b the copy of ~a in bag(G0 ), and q an rAQ. We have to show that O, D |= q(~a) implies O, Du |= q(~b). Observe that now we have to consider the uGC2 -unravelling rather than the uGFunravelling of D. First we establish again the existence of certain automorphisms of Du . In this case, however, they are not induced by automorphisms of the tree (T (D), E) but are determined directly on the interpretation. Recall that for any t, t0 ∈ T (D) with t ∼ t0 the mapping ht,t0 that sends every e ∈ dom(bag(t)) to the unique f ∈ dom(bag(t0 )) with e↑ = f ↑ is an isomorphism from bag(t) to bag(t0 ). Call ht,t0 the canonical isomorphism from bag(t) onto bag(t0 ). One can easily prove the existence of an extended canonical automorphism ĥt,t0 of Du that extends ht,t0 . Fact 1. If t, t0 ∈ T (D) such that t ∼ t0 there is an automorphism ĥt,t0 of Du that extends ht,t0 . Let B be a materialization of O and Du . Fact 1 entails the following. Fact 2. For all t, t0 with t ∼ t0 and ~e in dom(bag(t)) and any rAQ q(~ x) such that ~ x has the same length as ~e: B |= q(~e) ⇔ B |= q(ht,t0 (~e)) We now distinguish two cases. uGC− 2 (1, =) Case 3. O is a ontology. This case is similar to Case 2. No further modification of B is needed. We may assume that B is obtained from Du by hooking cg-tree decomposable interpretations Bc to c for every c ∈ dom(Du ) and by adding atoms R(c, d) to Du for distinct c, d ∈ dom(Du ). Define B+ by adding to D • all atoms R(a↑1 , . . . , a↑k ) such that R(a1 , . . . , ak ) ∈ B and a1 , . . . , ak ∈ dom(Du ); • for any a ∈ dom(D) for a fixed copy a0 of a in Du a copy of Ba0 that is hooked to D at a by identifying a0 and a. Using Fact 1 and the condition that O is a uGC− 2 (1, =) ontology it is straightforward to prove that B+ is a model of O and D. Also by Fact 1 and construction B+ 6|= q(~a) if B 6|= q(~b). Case 4. O is a ALCHIF ontology of depth 2. The proof that follows is similar to Case 1, but one cannot hook to any bag(t) a copy of any rAQ from which there is a homomorphism into B that is injective on the root bag of q and maps it into dom(bag(t)) as this can lead to violations of functionality. Two modifications are needed: firstly, we do not independently hook interpretations to bags bag(t) with two elements in Du . This is to avoid violations of functionality due to guarded sets G1 and G2 with G1 ∩ G2 = {d} when the interpretations we hook to G1 and G2 independently add an R-successor to d for a function R (this has already been done in Case 3). Secondly, we cannot hook arbitrarily many rAQs to bags as this will lead to violations of functionality as well. We may again assume that B is obtained from Du by hooking cg-tree decomposable interpretations Bc to c for every c ∈ dom(Du ) and by adding atoms R(c, d) to Du for distinct c, d ∈ dom(Du ). Observe that GBc = {{a, b} | R(a, b) ∈ Bc , a 6= b} is an undirected tree. We call c its root and in this proof call such an interpretation a tree interpretation with root c. We have to modify Du to be able to uniformize. For any c ∈ dom(Du ) we define the tree instance Dc with root c as follows. Let Dq be the instance corresponding to an rAQ q = q(x) ← φ with a single answer variable x and a single additional variable y such that there is an injective homomorphism h from Dq to B mapping x to some c in dom(Du ) and such that neither R(h(x), h(y)) ∈ B nor R(h(y), h(x)) ∈ B for any R that is functional in O. Then Dc contains a copy of Dq obtained by identifying the variable x with c. Set [ Du+ = {R(~a) ∈ B | ~a ⊆ dom(Du )} ∪ Dc . c∈dom(Du ) The following properties of Du+ follow directly from the definition and standard properties of ALCHIF : 1. For any c, d ∈ dom(Du ) with c↑ = d↑ there is an isomorphism from Dc onto Dd mapping c to d; 2. there is a homomorphism from Du+ into B preserving dom(Du ) and functionality. Thus, in particular, any materialization of Du+ is a materialization of Du and for every CQ q(~x) and d~ in Du of the same length as ~ x: ~ O, Du+ |= q(d) ⇔ ~ O, Du |= q(d) Now take a materialization Bu+ of Du+ and O. Thus Bu+ is obtained from Du+ by hooking tree-interpretations Bu+ that are c models of Dc to every c ∈ dom(Du ). By Point 1 and Fact 1 and the properties of ALCHIF we have the following: Fact 3. For any c, d ∈ dom(Du ) with c↑ = d↑ , c ∈ dom(bag(t)), and d ∈ dom(bag(t0 )) such that t ∼ t0 the canonical automorphism u+ ĥt,t0 extends to an isomorphism from Bu+ |dom(Dc ) onto B|dom(Dd ) . The following fact can now be proved by modifying in a straightforward way the proof of Fact 4 above. Fact 4. For any c, d ∈ dom(Du ) with c↑ = d↑ and any e ∈ u+ dom(Bu+ c ) there exists an isomorphic embedding of B|{c,e} into u+ B|dom(Dd ) . Let for c, d ∈ dom(Du ), c ∼ d if c↑ = d↑ . Fix for any equivalence class [c] = {d | d ∼ c} a unique c∼ ∈ [c]. We now uniformize Bu+ . For each d ∈ dom(Du ) we hook at d to Du an isomorphic u+ ↑ ↑ copy Bu∗ d of the interpretation Bc∼ with c∼ = d and remove u∗ u∗ Bu+ . Denote the resulting model by B . B is uniform in the d u sense that for any c ∼ d the interpretations Bu∗ c hooked to c in D u and Bu∗ hooked to d in D are isomorphic. One can now prove d similary to the proof of Fact 5 above the following: Fact 5. Bu∗ is a materialization of O and Du . It remains construct the materialization B+ of D. To this end we hook to any c ∈ dom(D) the tree interpretation Bu∗ d with d↑ = c. In addition we include in B+ all R(a↑1 , . . . , a↑k ) such that R(a1 , . . . , ak ) ∈ Bu+ and a1 , . . . , ak ∈ dom(Du ). Using Fact 5 one can show that B+ is a model of O and D such that B+ 6|= q(~a) if B 6|= q(~b). G. PROOFS FOR SECTION 6 Theorem 8 (restated) For any of the following ontology languages, CQ-evaluation w.r.t. L is CSP-hard: uGF2 (1, =), uGF2 (2), uGF2 (1, f ), and the class of ALCF ` ontologies of depth 2. P ROOF. We provide additional details of the proof for uGF2 (1, =). Recall that O contains ^ _ 6= 6= ∀x( ¬(ϕ6= ϕa (x)) a (x) ∧ ϕa0 (x)) ∧ a6=a0 a ∀x(A(x) → ¬ϕ6= when A(a) 6∈ A a (x)) 6= ∀xy(R(x, y) → ¬(ϕ6= (x) ∧ ϕ (y))) when R(a, a0 ) 6∈ A a a0 = ∀xϕa (x) for all a ∈ dom(A) where A and R range over symbols in sig(A) of the respective arity. We first show that coCSP(A) polynomially reduces to the query evaluation problem for (O, q ← N (x)). Assume D with sig(D) ⊆ sig(A) is given and let D0 = D ∪ {Ra (d, d0 ) | Pa (d) ∈ A}, where the relations Pa ∈ sig(A) determine the precoloring and d0 is a fresh labelled null for each d ∈ dom(D). We show that D → A iff O, D0 6|= q. First let h be a homomorphism from D to A. Define a model B of D0 and O by adding to D0 for any d ∈ dom(D) with h(d) = a and infinite chain Ra (d0,d , d1,d ), Ra (d1,d , d2,d ), . . . with d0,d = d and fresh labelled nulls di,d for i > 0. Also add Ra (d, d) to D for all d ∈ dom(D) and all labelled nulls used in the chains. Using the definition of O it is not difficult to show that B is a model of O and D0 . Thus O, D0 6|= q, as required. Now assume that O, D0 6|= q. Then there is a model B of O and D0 such that B 6|= q. Define a mapping h from D to A by setting h(d) = a iff there exists d0 with d0 6= d and Ra (d, d0 ) ∈ B. Using the definition of O it is not difficult to show that h is well defined and a homomorphism. This finishes the proof of the polynomial reduction of coCSP(A) to the query evaluation problem for (O, q ← N (x)). Now we show that for any rAQ q the query evaluation problem for (O, q) can be polynomially reduced to coCSP(A). We first show that there is a polynomial reduction of the problem whether an instance D is consistent w.r.t. O to CSP(A). Assume D is given. Let D• be the sig(A)-reduct of D extended with Pa (d) for any d with Ra (d, d0 ) ∈ D for some d0 6= d. Using the definition of O one can show that D is consistent w.r.t. O iff D• → A. ~ iff D is not consistent w.r.t. O or D0 |= q(d) ~ Now O, D |= q(d) where D0 = D ∪ {Ra (d, d) | a ∈ dom(A), d ∈ dom(D)}. The latter problem is in PT IME. H. D d d1 d3 X Y P Y X P d2 R2 d1 d d4 R2 X Y d3 R 1 Y X d4 R 2 d2 R1 R1 PROOFS FOR SECTION 7 Theorem 10 (restated) For the ontology languages uGF− 2 (2, f ) and ALCIF ` of depth 2, it is undecidable whether for a given ontology O, 1. query evaluation w.r.t. O is in PT IME, Datalog6= -rewritable, or CO NP-hard (unless PT IME = NP); 2. O is materializable. As discussed in the main part of the paper, we prove Theorem 10 in two steps: we first construct an ontology Ocell that marks lower left corners of cells and then we construct an ontology OP that marks the lower left corner of grids that represent a solution to a rectangle tiling problem P. We construct the ontologies in ALCIF ` . Thus, in addition to ALCI concepts we use concepts of the form (≤ 1R), (= 1R), and (≥ 2R). The proof is given using DL notation. Marking the lower left corner of grid cells. Let X and Y be binary relations and X − , Y − their inverses in ALCI. Using the sentences − Figure 2: D 6|= cell(d) ⇒ O, D 6|= (= 1P )(d) Details are given below. Resolving the first issue is more involved. There are two reasons why the converse does not hold. First, we might have X(d, d1 ), Y (d1 , d3 ), X(d, d2 ), Y (d2 , d4 ) ∈ D with d3 6= d4 but both d3 and d4 have already two R2 -successors in D. Then the marker (= 1P ) is entailed without the cell being closed at d (i.e, without D |= cell(d)). Second, we might have an odd cycle of mutually distinct e0 , e1 , . . . , en ∈ D such that each ei reaches e(i+1) mod n+1 via a Y − X − Y X-path in D, for i = 0, 1, . . . , n. Figure 3 illustrates this for n = 2. Then, since in at least two neighbouring ei , e(i+1) mod n+1 the same concept (= 1Ri ) is enforced, the marker (= 1P ) is enforced at some node d from which ei and e(i+1) mod 3 are reachable along XY and Y X-paths, respectively, without satisfying cell(d). We resolve both problems by enforcing that D is not consistent w.r.t. our ontology if such constellations appear. > v (≤ 1Z) for all Z ∈ {X, Y, X , Y − } we ensure that in any instance D that is consistent w.r.t. our ontology the relations X and Y as well as their inverses X − and Y − are functional in D in the sense that R(d, d0 ), R(d, d00 ) ∈ D implies d0 = d00 for all R ∈ {X, Y, X − , Y − }. For an instance D and d ∈ dom(D) we write D |= cell(d) if there exist d1 , d2 , d3 with X(d, d1 ), Y (d1 , d3 ), Y (d, d2 ), X(d2 , d3 ) ∈ D. Since X and Y are functional in D, D |= cell(d) implies d3 = d4 if X(d, d1 ), Y (d1 , d3 ), X(d, d2 ), Y (d2 , d4 ) ∈ D. As a marker for all d such that D |= cell(d) we use the concept (= 1P ) for a binary relation P . For P and all binary relations R introduced below we add the inclusion > v ∃R.> to our ontology so that when building models one can only choose between having exactly one R-successor or at least two R-successors. To do the marking we use concepts (= 1R1 ) and (= 1R2 ) with binary relation symbols R1 , R2 as ‘second-order variables’, ensure that all nodes in D are contained in (= 1R1 ) t (= 1R2 ), and then state (as a first attempt) that G ∃X.∃Y.(= 1Ri ) u ∃Y.∃X.(= 1Ri ) v (= 1P ) i=1,2 Clearly, if D |= cell(d) then O, D |= (= 1P )(d) for the resulting ontology O. Conversely, the idea is that if D 6|= cell(d) and X(d, d1 ), Y (d, d2 ), Y (d1 , d3 ), X(d2 , d4 ) ∈ D but d3 6= d4 , then one can extend D by adding a single R1 -successor and two R2 successors to d3 , a single R2 -successor and two R1 -successors to d4 , and two P -successors to d and thus obtain a model B of O and D in which d 6∈ (= 1P )B , see Figure 2. In general, however, this does not work and the latter sentence has depth 3. The depth issue is easily resolved by introducing auxiliary binary relation symbols RiX , RiY , RiXY and RiY X , i = 1, 2, and replacing concepts such as ∃X.∃Y.(= 1Ri ) by (= 1RiXY ) and the sentences (= 1RiXY ) ≡ ∃X.(= 1RiY ) and (= 1RiY ) ≡ ∃Y.(= 1Ri ) Y X X Y Y e0 e2 e1 X X Y Y X Y X Figure 3: D |= (= 1P )(d) 6⇒ D |= cell(d) In detail, we construct an ontology Ocell that uses in addition to X, Y, X − , Y − the set AUXcell of binary relations P, Ri , RiW , where i ∈ {1, 2} and W ranges over a set of words over the alphabet {X, Y, X − , Y − } we define below. The RiW serve as auxiliary symbols to avoid sentences of depth larger than two. No unary relations are used. To ensure that CQ-evaluation is Datalog6= rewritable w.r.t. Ocell we include in Ocell the concept inclusions > v ∃Q.> for all binary relations Q ∈ AUXcell . If an instance D is consistent w.r.t. Ocell , then its materialization adds a certain number of Q-successors to any d ∈ dom(D) to satisfy > v ∃Q.> for Q ∈ AUXcell . The remaining sentences in Ocell only influence the number of Q-successors that have to be added and thus do not influence the certain answers to CQs. In fact, we will have the following equivalence ~ iff {> v ∃Q.> | Q ∈ AUXcell }, D |= q(d) ~ Ocell , D |= q(d) for any CQ q and D that is consistent w.r.t. Ocell . Define for any non-empty word W over {X, Y, X − , Y − } the set ∃W (= 1Ri ) of sentences inductively by setting for Z ∈ {X, Y, X − , Y − }: ∃Z (= 1Ri ) = {(= 1RiZ ) ≡ ∃Z.(= 1Ri )} ∃ZW (= 1Ri ) = {(= 1RiZW ) ≡ ∃Z.(= 1RiW )} ∪ ∃W (= 1Ri ) Thus, ∃W (= 1Ri ) states that the unique d0 reachable from d along a W -path has exactly one Ri -successor iff d has exactly one RiW successor. Now Ocell is defined as follows. 1. Functionality of X, Y, X − and Y − is stated using > v (≤ 1Z) − for X, Y, X , Y . 2. All nodes have exactly one R1 successor or exactly one R2 successor: > v (= 1R1 ) t (= 1R2 ) 3. If all nodes reachable along an XY -path and a Y X-path have exactly one R1 and exactly one R2 -successor, then the marker (= 1P ) is set: l (= 1RiXY ) u (= 1RiY X ) v (= 1P ) i=1,2 4. For i = 1, 2, the concept (= 1Ri ) is true at least at every third node on the cycles in D introduced above: (= 1RjCC ) v (= 1Ri ) t (= 1RiC ) t (= 1RiCC ) for {i, j} = {1, 2} and C = X − Y − XY 5. If (= 1R1 ) and (= 1R2 ) are both true in a node in D then they are both true in all ‘neighbouring’ nodes in D: (= 1R1Y − − Y − XY ) u (= 1R2X X−Y X ) u (= 1R2Y − − for R12 := (= 1R1 ) u (= 1R2 ) • E is of the form e0 ≤ · · · ≤ en with ei 6= ej for all i 6= j, or • E is a cycle e0 ≤ · · · ≤ en with ei = ej iff {i, j} = {0, n} for all i 6= j. − (= 1R1X We thus assume in what follows that all three conditions are satisfied. Clearly, they can be encoded in Datalog6= . Moreover, by Point 2 we can assume that D is saturated for the sentences ∃W (= 1R) in the sense that if (= 1RZW ) ≡ ∃Z.(= 1RW ) ∈ Ocell then for any Z(d, d0 ) ∈ D the following holds: d has at least two RZW successors in D iff d0 has at least two RW -successors in D. Now let e1 ≤ e2 iff there are X(d, d1 ), Y (d1 , e1 ), Y (d, d2 ), X(d2 , e2 ) ∈ D. Let e1 ∼ e2 iff e1 ≤ e2 or e2 ≤ e1 and let ∼∗ be the smallest equivalence relation containing ∼. For any equivalence class E w.r.t. ∼∗ either Y − XY ) v R12 X−Y X ) v R12 6. The auxiliary sentences ∃W (= 1Ri ) for all relations RiW used above. Lemma 11 The ontology Ocell has the following properties for all instances D: 1. for all d ∈ dom(D): Ocell , D |= (= 1P )(d) iff D is not consistent w.r.t. Ocell or D |= cell(d); moreover, if D is consistent w.r.t. Ocell , then there exists a materialization B of D and Ocell such that d ∈ (= 1P )B iff d ∈ dom(B) and D |= cell(d); 2. If all binary relations are functional in D, then D is consistent w.r.t. Ocell ; 3. CQ-evaluation w.r.t Ocell is Datalog6= -rewritable. P ROOF. We first derive a necessary and sufficient condition for consistency of instances D w.r.t. Ocell . Lemma 11 then follows in a straightforward way. It is easy to see that if any of the following conditions is not satisfied, then D is not consistent w.r.t. Ocell : • all X, Y, X − , Y − are functional in D; • D is consistent w.r.t. the sentences ∃W (= 1Ri ) in Ocell ; • if D |= cell(d), then d has at most one P -successor in D. Thus, if E is not a singleton {e} with e ≤ e, we can partition E into two sets E1 and E2 (with one of them possibly empty) such that (†) there are no three e ≤ e0 ≤ e00 in the same Ei . Now set for any equivalence class E and {i, j} = {1, 2}, Ej0 = {d ∈ E | D |= (≥ 2Ri )(d)} Claim 1. D is consistent w.r.t. Ocell iff the following conditions hold for all equivalence classes E: (a) if E = {e} with e ≤ e then e 6∈ E10 ∪ E20 ; (b) otherwise, there exists a partition E1 , E2 of E with Ei ⊇ Ei0 satisfying (†). Moreover, if (a) and (b) hold, then a materialization B satisfying the conditions of Lemma 11 (1) exists. (⇒) First assume that Point (a) does not hold for some E = {e} with e ≤ e. Then D is not consistent w.r.t. Ocell by the axioms given under (2) and (4) since it is not possible to satisfy (= 1Ri ) in e if e ∈ Ej0 (i 6= j). Now assume that (b) holds. So there exists E that has either at least two elements or e 6≤ e if E = {e} but there exists no partition E1 , E2 of E with Ei ⊇ Ei0 satisfying (†). Then the axioms under (4) cannot be satisfied without having at least one node in E that is in both (= 1R1 ) and (= 1R2 ). But then by the axioms under (5) all nodes in E are in (= 1R1 ) and in (= 1R2 ) which implies that E10 = E20 = ∅. This contradicts our assumption that there is no partition E1 , E2 of E with Ei ⊇ Ei0 satisfying (†). (⇐) Assume (a) and (b) hold for every equivalence class E. For E = {e} with e ≤ e we can thus construct the relevant part of a model B of D and Ocell such that e has exactly one Ri -successor for i = 1, 2 and also exactly one P -successor. Thus axiom (2) is satisfied. All e that are not members of such an equivalence class are given at least two P -successors. For any equivalence class with at least two members or E = {e} with e 6≤ e we can construct the relevant part of B such that each d ∈ Ei has exactly one Ri successor and each d ∈ E \ Ei has at least two Ri -successors. As E1 and E2 are mutually disjoint, the axioms under (5) are satisfied. As (†) is satisfied, the axioms under (4) are satisfied. As E1 ∪ E2 contains E, the axioms under (2) are satisfied. The conditions (a) and (b) can be encoded in a Datalog6= program in a straightforward way and thus there is a Datalog6= -program checking consistency of an instance D w.r.t. Ocell . Datalog6= rewritability of CQ-evaluation w.r.t. Ocell now follows from the observation that ~ iff {> v ∃Q.> | Q ∈ AUXcell }, D |= q(d) ~ Ocell , D |= q(d) for any CQ q, D that is consistent w.r.t. Ocell , and any d~ in D. This finishes the construction and analysis of Ocell . Marking the lower left corner of grids. We now encode the rectangle tiling problem. Recall that an instance of the finite rectangle tiling problem is given by a triple P = (T, H, V ) with T a non-empty, finite set of tile types including an initial tile Tinit to be placed on the lower left corner and nowhere else and a final tile Tfinal to be placed on the upper right corner and nowhere else, H ⊆ T × T a horizontal matching relation, and V ⊆ T × T a vertical matching relation. A tiling for (T, H, V ) is a map f : {0, . . . , n}×{0, . . . , m} → T such that n, m ≥ 0, f (0, 0) = Tinit , f (n, m) = Tfinal , (f (i, j), f (i + 1, j)) ∈ H for all i < n, and (f (i, j), f (i, j + 1)) ∈ V for all i < m. We say that P admits a tiling if there exists a map f that is a tiling for P. It is undecidable whether an instance of the finite rectangle tiling problem admits a tiling. Now let P = (T, H, V ) with T = {T1 , . . . , Tp }. We regard the tile types in T as unary relations and take the binary relations symbols X, Y, X − , Y − from above and an additional set AUXgrid of binary relations F, F X , F Y , U, R, L, D, and A. The ontology OP is defined by taking Ocell and adding the sentences > v ∃Q.> for all Q ∈ AUXgrid as well as all sentences in Figure 4 to it, where (Ti , Tj , T` ) range over all triples from T such that (Ti , Tj ) ∈ H and (Ti , T` ) ∈ V : We discuss the intuition behind the sentences of OP . The relations X and Y are used to represent horizontal and vertical adjacency of points in a rectangle. The concepts (= 1X) of OP serve the following puroposes: • (= 1U ), (= 1R), (= 1L), and (= 1D) mark the upper, right, left, and lower (‘down’) border of the rectangle. • The concept (= 1F ) is propagated through the grid from the upper right corner where Tfinal holds to the lower left one where TinitiaL holds, ensuring that every position of the grid is labeled with at least one tile type, that the horizontal and vertical matching conditions are satisfied, and that the grid cells are closed (indicated by (= 1P ) from the ontology Ocell ). • The relations F X and F Y are used to avoid depth 3 sentences in the same way as the relations RiW are used to avoid such sentences in the construction of Ocell . • Finally, when the lower left corner of the grid is reached, the concept (= 1A) is set as a marker. We write D |= grid(d) if there is a tiling f for P with domain {0, . . . , n} × {0, . . . , m} and a mapping γ : {0, . . . , n} × {0, . . . , m} → dom(D) with γ(0, 0) = d such that • for all j < n, k ≤ m: T (γ(j, k)) ∈ D iff T = f (j, k); • for all b1 , b2 ∈ dom(D): X(b1 , b2 ) ∈ D iff there are j < n, k ≤ m such that (b1 , b2 ) = (γ(j, k), γ(j + 1, k)); • for all b1 , b2 ∈ dom(D): Y (b1 , b2 ) ∈ D iff there are j ≤ n, k < m such that (b1 , b2 ) = (γ(j, k), γ(j, k + 1)). • g is closed: if d ∈ ran(γ) and Z(d, d0 ) ∈ D for some Z ∈ {X, Y, X − , Y − }, then d0 ∈ ran(γ). We then call d the root of the n × m-grid with witness function γ for P. The following result can now be proved using Lemma 11. Lemma 12 The ontology OP has the following properties for all instances D: 1. for all d ∈ dom(D): OP , D |= (= 1A)(d) iff D is not consistent w.r.t. OP or D |= grid(d); moreover, if D is consistent w.r.t. OP , then there exists a materialization B of D and OP such that d ∈ (= 1A)B iff d ∈ dom(B) and D |= grid(d); 2. If D |= grid(d) with witness γ such that dom(D) = ran(γ), and all relations are functional in D then D is consistent w.r.t. OP ; 3. CQ-evaluation w.r.t OP is Datalog6= -rewritable. We now use Lemma 12 to prove the undecidability result. Let O = OP ∪ {(= 1A) v B1 t B2 }, where B1 and B2 are unary relations. Lemma 13 (1) If P admits a tiling, then O is not materializable and CQ-evaluation w.r.t. O is CO NP-hard. (2) If P does not admit a tiling, then CQ-evaluation w.r.t. O is Datalog6= -rewritable. P ROOF. (1) Consider a tiling f for P with domain {0, . . . , n}× {0, . . . , m}. Regard the pairs in {0, . . . , n} × {0, . . . , m} as constants. Let D contain X((i, j), (i + 1, j)), for i < n and j ≤ m, Y ((i, j), Y (i, j + 1)) for i ≤ n and j < m, and T (i, j) for f (i, j) = T for i ≤ n and j ≤ m. Then D is consistent w.r.t. OP and O, D |= (= 1A)(0, 0), by Lemma 12. Thus OP , D |= B1 (0, 0) ∨ B2 (0, 0) but O, D 6|= B1 (0, 0) and O, D 6|= B2 (0, 0). Thus O is not materializable and so CQ-evaluation is CO NP-hard. (2) Assume P does not admit a tiling. Any instance D is consistent w.r.t. O iff it is consistent w.r.t. OP and, by Lemma 12, if OP , D |= (= 1A)(d) for some d ∈ dom(D), then D is not con~ iff OP , D |= q(d) ~ holds for sistent w.r.t. OP . Thus, O, D |= q(d) ~ Thus CQ-evaluation w.r.t. O is Datalog6= every CQ q and all d. rewritable by Lemma 12. Lemma 13 implies Theorem 10 for ALCIF ` ontologies of depth 2. For uGF− 2 (2, f ) we modify the construction of Ocell and OP as follows: • The relations X, Y, X − , Y − are defined as functions and it is stated that X − and Y − are the inverse of X and Y , respectively. • For any relation symbol R in OP distinct from X, Y, X − , Y − we introduce a function F , state ∀xF (x, x), and replace by > v ∃R.> ∀x∃y(R(x, y) ∧ F (x, y)). • We replace all occurrences of (= {X, Y, X − , Y − } in OP by 1R) for R 6∈ ¬∃y(R(x, y) ∧ ¬F (x, y)) Now Lemma 11 and Lemma 12 still hold for the resulting ontologies Ocell and OP if (= 1P ) and (= 1A) are replaced by ¬∃y(P (x, y) ∧ ¬F (x, y)) and ¬∃y(A(x, y) ∧ ¬F (x, y)), respectively. We now come to the proof of Theorem 11. We first give a more detailed definition of the run fitting problem. Tfinal ∃X.((= 1U ) u (= 1F ) u Tj ) u Ti ∃Y.((= 1R) u (= 1F ) u T` ) u Ti v (= 1F ) u (= 1U ) u (= 1R) v (= 1U ) u (= 1F ) v (= 1R) u (= 1F ) ∃Y.(= 1F ) v (= 1F Y ) ∃X.(= 1F ) v (= 1F X ) ∃X.(Tj u (= 1F ) u (= 1F Y ))u ∃Y.(T` u (= 1F ) u (= 1F X )) u (= 1P ) u Ti (= 1F ) u Tinit t 1≤s<t≤p Ts u Tt v (= 1F ) v (= 1A) u (= 1D) u (= 1L) v ⊥ (= 1U ) v ∀Y.⊥ (= 1R) v ∀X.⊥ (= 1U ) v ∀X.(= 1U ) (= 1D) v ∀Y − .⊥ (= 1L) v ∀X − .⊥ (= 1D) v ∀X.(= 1D) (= 1R) v ∀Y.(= 1R) (= 1L) v ∀Y.(= 1L) Figure 4: Additional Axioms of OP We consider non-deterministic Turing machines (TMs, for short). A TM M is represented by a tuple (Q, Σ, ∆, q0 , qa ), where Q is a finite set of states, Σ is a finite alphabet, ∆ ⊆ Q × Σ × Q × Σ × {L, R} is the transition relation, and q0 , qa ∈ Q are the start state and accepting state, respectively. The tape of M is assumed to be one-sided infinite, that is, it has a left-most cell and extends infinitely to the right. The set of strings accepted by M is denoted by L(M ). A configuration of M is represented by a string vqw, where q is the state, v is the inscription of the tape to the left of the head, and w is the inscription of the tape to the right of the head in the configuration (as usual, we omit all but possibly a finite number of trailing blanks). The configuration is accepting if q = qa . A run of M is represented by a finite sequence γ0 , . . . , γn of configurations of M with |γ0 | = · · · = |γn |. We assume that the accepting state has no successor states. A run is accepting if its last configuration is accepting. Definition 7 Let M = (Q, Σ, Γ, ∆, q0 , qa ) be a TM. • A partial configuration of M is a string γ̃ over Q ∪ Σ ∪ {?} such that there is at most one i ∈ {1, . . . , n} with γ̃[i] ∈ Q. Here, γ[i] denotes the symbol that occurs at the i-th position of γ. A configuration γ matches γ̃ if |γ| = |γ̃| and for each i ∈ {1, . . . , n} with γ̃[i] 6= ? we have γ[i] = γ̃[i]. • A partial run of M is a sequence γ̃ = (γ̃0 , γ̃1 , . . . , γ̃m ) of partial configurations γ̃i of M such that γ̃0 starts with q0 , has no other occurrence of a state q ∈ Q, and |γ̃0 | = · · · = |γ̃m |. A run γ0 , γ1 , . . . , γn of M matches γ̃ if m = n and γi matches γ̃i , for each i ∈ {0, 1, . . . , m}. Definition 8 The run fitting problem for a TM M , denoted by RF(M ), is defined as follows: Given a partial run γ̃ of M , decide whether there is an accepting run of M that matches γ̃. It is easy to see that RF(M ) is in NP for every TM M . The following theorem states that the complexity of RF(M ) may be intermediate between PT IME and NP-complete, provided PT IME 6= NP. Theorem 12 (restated) If PT IME 6= NP, then there is a TM whose run fitting problem is neither in PT IME nor NP-complete. To prove Theorem 12, we now construct such a TM. The construction is a modification of the construction from Impagliazzo’s version of the proof of Ladner’s Theorem [38], as presented in [2, Theorem 3.3]. Fix a polynomial-time TM MSAT for SAT. For a monotone polynomial-time computable function H : N → N to be specified later, let MH be a polynomial-time TM that works as follows on a given input v: 1. Check that there is an integer n ≥ 0 such that v is the unary H(n) representation of nH(n) (i.e., v = 1n ). If such an n does not exist, then reject v. 2. Guess an input w of length n for MSAT . 3. Generate the initial configuration γ of MSAT on input w. 4. Run MSAT from γ, and accept v iff MSAT accepts w. We refer to the first three steps as the initialization phase. We now define the function H : N → N. Fix a polynomial time computable enumeration M0 , M1 , M2 , . . . of deterministic TMs such that all runs of Mi on inputs of length n terminate after at most i · ni steps, and for each problem A in PT IME there are infinitely many i such that L(Mi ) is in A.3 Then, H(n) is defined as min {i < log log n | for all strings z of length ≤ log n, Mi accepts z iff z ∈ RF(MH )}, or as log log n if the minimum does not exist. It is not hard to see that H is well-defined, and that there is a deterministic polynomialtime Turing machine that, given a positive integer n in unary, outputs H(n). For details, we refer to [2]. This finishes the construction of MH . Lemma 15 below shows that RF(MH ) has the desired properties, namely that RF(MH ) is neither in PT IME nor NP-complete, unless PT IME = NP. It uses the following auxiliary lemma. Lemma 14 • If RF(MH ) is in PT IME, then H(n) = O(1). 3 It is easy to construct a deterministic polynomial-time Turing machine that, given an integer i ≥ 0, outputs a deterministic Turing machine Mi such that the sequence (Mi )i≥0 has the desired properties. For instance, let Mi0 be the i-th deterministic Turing machine in lexicographic order under some string encoding of Turing machines, and add a clock to Mi0 that stops the computation of Mi0 after at most i · ni steps (and rejects if Mi0 did not accept yet). • If RF(MH ) is not in PT IME, then limn→∞ H(n) = ∞. P ROOF. The proof is as in [2]. We provide a proof for the sake of completeness. Assume first that RF(MH ) is in PT IME. Then, there is an index i i such that L(Mi ) = RF(MH ). Now, for all n > 22 , we have i < log log n and thus H(n) ≤ i by the definition of H. It follows i that H(n) ≤ max{H(m) | m ≤ 22 + 1}, and therefore H(n) = O(1). Next assume that RF(MH ) is not in PT IME. For a contradiction, suppose that limn→∞ H(n) 6= ∞. Since H is monotone, this means that there are integers n0 , i ≥ 0 such that H(n) = i for all integers n ≥ n0 . Let n ≥ n0 . By the definition of H, we have that Mi agrees with RF(MH ) on all strings of length at most log n. Since this holds for all n ≥ n0 , we conclude that Mi decides RF(MH ). But then, RF(MH ) is in PT IME, which contradicts our initial assumption that RF(MH ) is not in PT IME. We now prove the main lemma, which concludes the proof of Theorem 12. Lemma 15 If PT IME 6= NP, then RF(MH ) is neither in PT IME nor NP-complete. P ROOF. “RF(MH ) is not in PT IME”: For a contradiction, suppose that RF(MH ) is in PT IME. By Lemma 14, there is a constant c ≥ 0 such that for all integers n ≥ 0 we have H(n) ≤ c. Suppose that on inputs of length n, MSAT makes at most p(n) := nk + k steps. Then, the following is a polynomial-time many-one reduction from SAT to RF(MH ), implying PT IME = NP, and therefore contradicting the lemma’s assumption. Given an input x of length n for MSAT : h 1. Compute h := H(n) and w := 1n . 2. Output the partial run γ̃0 , . . . , γ̃i+p(n) of MH such that: • γ̃0 , . . . , γ̃i corresponds to the initialization phase of MH on input w that generates the start configuration of MSAT on input x. In particular, γ̃0 , . . . , γ̃i are complete configurations of MH , and γ̃0 = q0 w and γ̃i = q00 x, where q0 and q00 are the start states of MH and MSAT , respectively; • γ̃i+1 , . . . , γ̃i+p(n) consist entirely of stars. Note that the partial run γ̃0 , . . . , γ̃i+p(n) can be easily computed by simulating the initialization phase MH on input w, where in step 2 of the initialization phase we “guess” the input string x given as input to the reduction. Then, we pad the sequence of configurations corresponding to the initialization phase by p(n) partial configurations, each consisting of exactly p(n) stars. “RF(MH ) is not NP-complete”: Suppose, to the contrary, that RF(MH ) is NP-complete. Then there is a polynomial-time manyone reduction f from SAT to RF(MH ). Using f , we construct a polynomial-time many-one reduction g from SAT to SAT such that for all sufficiently large strings x we have |g(x)| < |x|. This implies that SAT can be solved in polynomial time, and contradicts PT IME 6= NP. Consider an input x for SAT. Since f is a many-one reduction from SAT to RF(MH ), we have f (x) = γ̃0 #γ̃1 # · · · #γ̃m for some partial run γ̃ := (γ̃0 , γ̃1 , . . . , γ̃m ) (4) of MH . Moreover, x ∈ SAT iff there is an accepting run of MH that matches γ̃. By the construction of MH , an accepting run of MH on an input y can only exist if there is an integer n ≥ 0 such H(n) that y = 1n . Note also that the length of y has to be bounded by |γ̃0 |. Define N := {n ∈ N | nH(n) ≤ |γ̃0 |}. Then, as argued above, the following are equivalent: 1. x ∈ SAT; 2. γ̃0 #γ̃1 # · · · #γ̃m ∈ RF(MH ); 3. there is an n ∈ N such that there is an accepting run of MH on H(n) input 1n that matches γ̃. In what follows, we show how to compute in polynomial time, for each n ∈ N , a propositional formula φn such that: • φn is satisfiable if and only if there is an accepting run of MH H(n) on input 1n that matches γ̃; • |φn | ≤ |x| |N | − 2 for all n ∈ N (if x is large enough). Then, the following function g is a polynomial-time many-one reduction from SAT to SAT: _ g(x) := φn . n∈N Assuming a suitable encoding of propositional formulas, the size of g(x) is then bounded by |x| − 1 for large enough x. Thus, g is the desired length-reducing polynomial-time self-reduction of SAT. It remains to construct φn , for all n ∈ N . C ONSTRUCTION OF φn . Fix n ∈ N . By the construction of H(n) MH , any accepting run of MH on input 1n has to start with the initialization phase. The first step of the initialization phase is H(n) . deterministic, and checks whether the input has the form 1n Thus, we can complete γ̃ in polynomial time to a partial run of MH where the first step of the initialization phase is completely specified. If this is not possible due to constraints imposed by γ̃, then we know that the desired accepting run does not exist, and we can output a trivial unsatisfiable formula φn . Otherwise, let γ̃˜ = (γ̃˜0 , γ̃˜1 , . . . , γ̃˜m ) be the resulting partial run of MH . It remains to construct a formula φn that is satisfiable iff there is an accepting run of MH that ˜ matches γ̃. ˜ Let i ≥ 0 be such that γ̃˜0 , . . . , γ̃˜i Let us take a closer look at γ̃. corresponds to the first step of the initialization phase of MH on H(n) input 1n . In particular, for each j ∈ {0, 1, . . . , i}, γ̃˜j is a completely specified configuration. It is possible to specify MH in such a way that the second and third step of the initialization phase of H(n) MH on input 1n take exactly n computation steps combined, and that any configuration after the initialization phase uses space at most n. Thus, without loss of generality we can assume: 1. |γ̃˜j | ≤ n for all j ∈ {i + 1, . . . , m}; 2. m − i − n is bounded by the running time of MSAT on inputs of length n. Let h be a polynomial-time computable function that, given γ̃˜i+1 # · · · #γ̃˜m , outputs a propositional formula that is satisfiable iff there is an accepting run of MH that starts in the second step of the initialization phase of MH in a configuration matching γ̃˜i+1 , and that matches γ̃˜i+1 , . . . , γ̃˜m . Let φn := h(γ̃˜i+1 # · · · #γ̃˜m ). This finishes the construction of φn . It is immediate from the construction of φn that φn is satisfiable H(n) if and only if there is an accepting run of MH on input 1n that matches γ̃. It remains to prove that the length of φn is bounded by |x|/|N | − 2. B OUNDING THE SIZE OF φn . Let p be a polynomial such that for all strings z, MSAT makes at most p(|z|) steps on input z, and both |f (z)| and |h(z)| are bounded by p(|z|). Since, as mentioned above, m−i−n is bounded by the running time of MSAT on inputs of length n, we have Suppose that there is an n ∈ N such that q(n) > |x|/|N | − 2. Then, 1 k |x| −2−k . n> |N | This implies nH(n) ≥ ≥ Since moreover |γ̃˜j | ≤ n for all j ∈ {i + 1, . . . , m}, we have |φn | ≤ |h(γ̃˜i+1 # · · · #γ̃˜m )| ≤ p(|γ̃˜i+1 # · · · #γ̃˜m |) Hence, for some polynomial q depending only on MSAT , f , and h. It remains to show that for all n ∈ N we have q(n) ≤ |x|/|N | − 2 if x is sufficiently large. Claim. There is a polynomial r(`) > 0 depending only on MH |x| ≥ r(|x|). such that for sufficiently large x we have |N | Proof. Recall that N consists of all integers n ≥ 0 such that nH(n) ≤ |γ̃0 |. Since γ̃0 is part of f (x), whose overall length is bounded by p(|x|), we have nH(n) ≤ p(|x|). Now, for all integers ` ≥ 0, define N (`) := {n ∈ N | nH(n) ≤ p(`)}. Then, N ⊆ N (|x|). We show that for all constants c ∈ (0, 1) there is an integer λc ≥ 0 such that for all integers ` ≥ λc we have |N (`)| ≤ `c . This implies the claim.4 Fix a constant c ∈ (0, 1) and L := {` ∈ N | |N (`)| > `c }. For each ` ∈ L, there is an n` ∈ N (`) with n` ≥ `c . Thus, by the definition of N (`) and the monotonicity of H, for each ` ∈ L we have H(n` ) ≤ p(`). (5) Now, since lim`→∞ H(`) = ∞ (by Lemma 14), we have lim`→∞ cH(`c ) = ∞. Hence, there is an integer λc ≥ 0 such c that for all ` ≥ λc we have `c·H(` ) > p(`). This implies that for c all ` ≥ λc we have |N (`)| ≤ ` (otherwise, we would violate (5)). y Assume that q(n) = nk +k. Let r be a polynomial as guaranteed by the claim. In what follows, we will assume that x is large enough so that: |x| |N | ≥ r(|x|); this can be satisfied by the previous claim. 1 k 2. (r(|x|) − 2 − k)H (r(|x|)−2−k) /k > p(|x|); this is possible since lim`→∞ H(`) = ∞ by Lemma 14. √ 4 Set r(`) := `1−c for some c ∈ (0, 1) (e.g., r(`) = `). Then, |x| ≥ |N |x| ≥ r(|x|) if |x| ≥ λc . |N | (|x|)| 1. |x| −2−k |N | |x| −2−k |N | 1 k k 1 H (r(|x|)−2−k) k ! k ≤ p(|x|) − nH(n) |φn | ≤ q(n) ≤ n` k where the last two inequalities follow from the monotonicity of H. Consequently, |γ̃˜i+1 # · · · #γ̃˜m | ≤ |γ̃˜0 # · · · #γ̃˜m | − |γ̃˜0 # · · · #γ̃˜i | ≤ p( (p(n) + n) · (n + 1) ). ) H(n) ≥ (r(|x|) − 2 − k) > p(|x|), ≤ p( (m − i) · (n + 1) ) c |x| −2−k |N | H m − i ≤ p(n) + n. `c·H(` < p(|x|) − p(|x|), which is the desired contradiction. Lemma 4 (restated) For every Turing machine M , there is a uGF− 2 (2, f ) ontology O and an ALCIF ` ontology O of depth 2 such that the following hold, where N is a distinguished unary relation: 1. there is a polynomial reduction of the run fitting problem for M to the complement of evaluating the OMQ (O, q ← N (x)); 2. for every UCQ q, evaluating the OMQ (O, q) is polynomially reducible to the complement of the run fitting problem for M . P ROOF. We give the proof for ALCIF ` ontologies of depth 2. The proof for uGF− 2 (2, f ) is obtained by modifying the ALCIF ` ontology in the same way as in the proof of Theorem 10 by replacing, for example, (≥ 2R) by ∃y(R(x, y) ∧ ¬F (x, y)). Assume M = (Q, Γ, ∆, q0 , qa ) is given. The instances D we use to represent partial runs and that provide the space for simulating matching runs are n × m X, Y -grids with Tinit written in the lower left corner, Tfinal written in the upper right corner, and B (for blank) written everywhere else. To re-use the notation and results from the proof of Theorem 10 we regard such a structure as a tiling with tile types T = {B, Tfinal , Tinit }. Then the ontology OG for G = (T, H, V ) and H V = {(B, B), (B, Tfinal ), (Tinitial , B)} = {(B, B), (B, Tfinal ), (Tinitial , B)} checks whether an instance represents a grid structure. We now construct the set OM of sentences that encode runs of M that match a partial run. For any D, the simulation of a run is triggered at a constant d exactly if OG , D |= (= 1A)(d). OM uses in addition to the binary relations in OG binary relations q ∈ Q that occur in concepts (≥ 2q) that indicate that M is in state q. It also uses binary relations G for G ∈ Γ that occur in concepts (≥ 2G) that indicate that G is writte on the corresponding cell of the tage. The sentences of OM are now as follows: • The grid in which the lower left corner is marked with (= 1A) is colored with (= 1A), q0 is the first symbol of the first configuration and no state occurs later in the first configuration: (= 1A) v ∀X.(= 1A) u ∀Y.(= 1A) , (= 1A) u Tinit v (≥ 2q0 ) , (= 1A) u (= 1D) u (≥ 2q) v (= 1L) The remaining sentences are all relativized to (= 1A) and so apply to constants in a grid only. • every grid point is colored with some (≥ 2G) for G ∈ Γ or (≥ 2q) for q ∈ Q: G G (= 1A) v (≥ 2G) t (≥ 2q) G∈Γ q∈Q • The machine M cannot be in two different states at the same time and no two distinct symbols can be written on the same cell. For all mutually distinct H1 , H2 ∈ Q ∪ Γ: (= 1A) u (≥ 2H1 ) u (≥ 2H2 ) v ⊥, • For any triple G0 qG1 ∈ Γ×Q×Γ let S(G0 qG1 ) denote the set of all possible successor triples S1 S2 S3 ∈ (Q × Γ × Γ) ∪ (Γ × Γ × Q) according to the transition relation ∆ of M . To avoid sentences of depth larger than two we introduce for the words W ∈ {X, XX} and for S ∈ Q ∪ Γ fresh binary relations S W and the sentences X (= 1A) u (≥ 2S ) ≡ (= 1A) u ∃X.(≥ 2S) , (= 1A) u (≥ 2S XX ) ≡ (= 1A) u ∃X.(≥ 2S X ) and then add (= 1A) u (≥ 2G0 ) u (≥ 2q X ) u (≥ 2GXX )v 1 G ∃Y.(≥ 2S1 ) u (≥ 2S2X ) u (≥ 2S3XX ) S1 S2 S3 ∈S(G0 qG1 ) to OM . • the symbol written on a cell does not change if the head is more than one cell away. For all G, G1 , G2 ∈ Γ: − ≥2 ≥2 (= 1A) u ∀X.G≥2 v ∀Y.G≥2 1 u ∀X .G2 u G := (≥ 2Gi ) with i ∈ {1, 2} or empty. for G≥2 i • the final state cannot be a state distinct from the accepting state qa . For all q ∈ Q \ {qa }: (= 1A) u (≥ 2q) v ∃Y.> • Finally, let AUXM denote the set of fresh binary relations used above and add > v ∃Q.> to OM for all Q ∈ AUXM . This finishes the definition of OM . Let O = OG ∪ OM . We show that O is as required. Let N be a fresh unary relation symbol. Then an instance D is consistent w.r.t. O if O, D 6|= q for the Boolean query q ← N (x). It therefore suffices to provide a polynomial reduction of the run fitting problem for M to the problem whether an instance D is consistent w.r.t. O. Assume that a partial run γ̃ = (γ̃0 , γ̃1 , . . . , γ̃m ) of partial configurations γ̃i of M such that γ̃0 starts with q0 and |γ̃0 | = · · · = |γ̃m | = n + 1 is given. We define an instance D with D |= grid(0, 0) which encodes the partial run. Thus we regard (i, j) with 0 ≤ i ≤ n and 0 ≤ j ≤ m as constants and D contains the assertions X((i, j), (i + 1, j)), Tinit (0, 0), Y ((i, j), (i, j + 1)), Tfinal (n, m) and B(i, j) for (i, j) 6∈ {(0, 0), (n, m)}. In addition, we include in D the atoms S((i, j), d1i,j ), S((i, j), d2i,j ) for distinct fresh constants d1i,j and d2i,j for all i, j such that γ̃j [i] = S and S 6= ?. It is now straightforward to show that D is consistent w.r.t. O iff there is an accepting run of M that matches γ̃. For the converse direction, we have to provide for every UCQ q a polynomial reduction of the query evaluation problem for (O, q) to the complement of the run fitting problem for M . To this end observe that the following two conditions are equivalent for any UCQ q(~ x), instance D, and tuple ~a: 1. O, D |= q(~a) 2. D is not consistent w.r.t. O or {> v ∃Q.> | Q ∈ AUX}, D |= q(~a) where AUX = AUXcell ∪ AUXgrid ∪ AUXM . As the latter problem is in PT IME it suffices to provide a polynomial reduction of the problem whether an instance D is consistent w.r.t. O to the run fitting problem for M . Assume D is given. First decide in polynomial time whether D is consistent w.r.t. OG (Lemma 12). If not, we are done. If yes, let G be the set of all d ∈ dom(D) such that D |= grid(d). Assume without loss of generality that G = {0, . . . , k}. Then we find natural numbers ni , mi such that each i ∈ G is the root of an ni × mi -grid with witness function βi for G. By Lemma 12, there is a materialization B of D and OG such that d ∈ G iff d ∈ (= 1A)B . Next we check in polynomial time that D is consistent w.r.t. the union of OG and the sentences of the form (= 1A) u (≥ 2S X ) ≡ (= 1A) u ∃X.(≥ 2S), (= 1A) u (≥ 2S XX ) ≡ (= 1A) u ∃X.(≥ 2S X ) in OM . If this is not the case we are done. If this is the case we assume that D is saturated in the sense that if, for example, (= 1A) u (≥ 2S X ) ≡ (= 1A) u ∃X.(≥ 2S) is in OM then for any d ∈ (= 1A)B and X(d, d0 ) ∈ D the following holds: d has at least two S X -successors in D iff d0 has at least two S-successors in D. Now for each i ∈ G we define the sequences of strings i γ̃ i = (γ̃0i , . . . , γ̃m ) i by setting for 0 ≤ r ≤ ni and S ∈ Γ ∪ Q, γ̃ri [j] = S iff βi (j, r) has at least two S-successors in D If any of these strings is not a partial configuration, then D is not consistent w.r.t. O and we are done. Otherwise each γ̃ i is a partial run of M . It is now straightforward to show that D is consistent w.r.t. O iff for each i ∈ G there exists an accepting run of M that matches γ̃ i which provides us with a polynomial reduction of the consistency problem for instances D to the run fitting problem for M. I. PROOFS FOR SECTION 8 We start by introducing the basic notions used in this section in more detail. An interpretation D is O-saturated if for all atoms R(~a) with ~a ⊆ dom(D) such that O, D |= R(~a) it follows that R(~a) ∈ D. For every O and instance D there exists a unique minimal (w.r.t. set-inclusion) O-saturated instance DO ⊇ D. We call DO the O-saturation of D. The following lemma states basic properties of O-saturated instances. Lemma 16 Let D ⊆ D0 be instances with D0|dom(D) = D and let O be a uGC2 (=) ontology. 1. There exists a materialization of O and D iff there exists a materialization of O and the O-saturation of D. 2. If B is a materialization of O and D and D is O-saturated, then B|dom(D) = D. 3. If D0 is O-saturated, then D is O-saturated. We also require a lemma about stronger forms of unravelling tolerance and forest models that one can employ for ALCHIQ ontologies. Call a model B of an instance D an irreflexive forest model if it is obtained from D by adding atoms of the form R(a, b) with a 6= b in dom(D) to D and hooking irreflexive tree interpretations Ba to every a ∈ dom(D). The following variations of Lemma 1 and Theorem 7 can be proved by modifying the proofs in a straightforward way taking into account the limited expressive power of ALCHIQ. Lemma 17 Let O be an ALCHIQ ontology of depth 1. Then the following hold: 1. Let A be a model of O and D. Then there exists an irreflexive forest model B of O and D such that there exists a homomorphism h from B to A that preserves dom(D). 2. O is materializable iff O is materializable for the class of all irreflexive tree instance D with dom(D) ⊆ sig(O). To characterize materializability in uGC− 2 (1, =) we admit 1materializability witnesses with loops and require that they satisfy additional closure conditions which ensure that one can construct a materialization in a step-by-step fashion, as in the proof of Lemma 6. To formulate the conditions we employ a ‘mosaic technique’ and introduce mosaic pieces consisting of a 1materializability witness and a homomorphism into a bouquet. Given a set of such mosaic pieces satisfying certain closure conditions we can then construct the materialization of a bouquet stepby-step using the 1-materializability witnesses and, simultaneously, construct a homomorphism into any other model of the bouquet (and thereby prove that we have indeed constructed a materialization). The resulting procedure for checking materializability will be in NE XP T IME as we can first guess an exponential size set of mosaic pieces and then confirm in exponential time that it satisfies our conditions. A 1-model pair is a tuple (F, a, B) such that • F is a bouquet with root a that is consistent w.r.t. O, of outdegree ≤ |O| and such that sig(D) ⊆ sig(O); • B is a bouquet with root a of outdegree ≤ 2|O| that is a model of F such that there exists a model A of F and O with A≤1 a = B. A 1-model pair (F, a, B) is a 1-materializability witness if B is a 1-materialization of F w.r.t. O. We require two different types of mosaic pieces. First, an injective hom-pair takes the form (F, a, B) →h (F0 , a0 , B) such that • (F, a, B) is a 1-materializability witness and (F0 , a0 , B0 ) is a 1-model pair; 1. If (F, a, B) ∈ M and F0 is a bouquet with root a0 such that there is a bijective homomorphism h0 from F to F0 mapping a to a0 , and B0 is an interpretation such that (F0 , a0 , B0 ) is a 1-model pair, then there exists an extension h of h0 such that (F, a, B) →h (F0 , a0 , B0 ) ∈ H. 2. If (F, a, B) →h (F0 , a0 , B0 ) ∈ H or (F, a, B) →h (B0 , a0 ) ∈ E with b ∈ dom(B) \ dom(F) and h(a) = h(b) = a0 , then there exist h0 and B00 such that (B|{a,b} , b, B00 ) →h0 (B0 , a0 ) ∈ E. Table 1: Conditions for M , H, and E in materializability check for uGC− 2 (1, =) • h is a homomorphism from B to B0 and its restriction to dom(F) is a bijective homomorphism onto F0 mapping a to a0 . Thus, in an injective hom-pair the 1-materializability witness is a piece of the materialization we wish to construct and the 1-model pair is a piece of the model into which we wish to homomorphically embed the materialization. Injective hom-pairs assume the homomorphic embedding one wants to extend (i.e., the restriction of h to dom(F)) is injective. To deal with non-injective embeddings (as indicated by Example 7) we also consider contracting hom-pairs which take the form (F, a, B) →h (B0 , a0 ) where • (F, a, B) is a 1-materializability witness with dom(F) = {a, b}; • (B0 , a0 ) is a bouquet with root a0 of outdegree at most 2|O| 0 such that there exists a model A of O with A≤1 a0 = B ; • h is a homomorphism from B to B0 with h(a) = h(b) = a0 . Similarly to injective hom-pairs, in a contracting hom-pair the 1materializability witness is a piece of the materialization we wish to construct and the second component is a piece of the model into which we wish to homomorphically embed the materialization. The following lemma now provides a N EXP T IME decision procedure for materializability of uGC− 2 (1, =) ontologies. Lemma 18 Let O be a uGC− 2 (1, =) ontology. Then O is materializable iff there exist 1. a set M of 1-materializability witnesses containing exactly one 1-materializability witness for every bouquet F of outdegree ≤ |O| with sig(F) ⊆ sig(O) that is consistent relative to O; 2. sets H of injective hom-pairs and E of contracting hom-pairs whose first components are all in M such that the conditions of Table 1 hold. P ROOF. Using Lemma 2 one can show that sets M , H, and E satisfying the conditions of Table 1 exist if O is materializable. Conversely, let the sets M , H, and E satisfy the conditions given in Lemma 18. Assume a bouquet D of outdegree ≤ |O| with sig(D) ⊆ sig(O) and root a is given. Take the 1-materializability witness (D, a, B) ∈ M . As in the proof of Lemma 6 we construct a sequence of interpretations B0 , B1 , . . . and sets Fi ⊆ dom(Bi ) of frontier elements (but now using only 1-materializability witnesses in M ): set B0 := B and F0 = dom(B) \ dom(D). If Bi and Fi have been constructed, then take for any b ∈ Fi its (unique) predecessor a and a 1-materializability witness (Bi|{a,b} , b, Bb ) ∈ S M and set Bi+1 := Bi ∪ b∈Fi Bb . Let B∗ be the union of all Bi . We show that B is a materialization. B is a model of O by construction since O is a uGC− 2 (1, =) ontology. Consider a model A of O and D. We construct a homomorphism h from B∗ to A preserving dom(D) as the limit of a sequence h0 , . . . of homomorphisms from Bi to A. We may assume that the outdegree of A is ≤ 2|O|. By definition, there exists a homomorphism from B0 to A≤1 a preserving D. Now, inductively, we ensure in each step that the homomorphisms hi satisfy the following conditions for all Bi and Fi , all b ∈ Fi and the predecessor a of b in Bi−1 : (a) Assume hi (a) 6= hi (b). Then, for the 1-materializability witness (B∗|{a,b} , b, Bb ) ∈ M there exists a homomorphism hb such that (B∗|{a,b} , b, Bb ) →hb (A|{hi (a),hi (b)} , hi (b), A≤1 hi (b) ) ∈H (b) Assume hi (a) = hi (b). Then, for the 1-materializability witness (B∗|{a,b} , b, Bb ) ∈ M there exists a homomorphism hb such that (B∗|{a,b} , b, Bb ) →hb (A≤1 hi (b) , hi (b)) ∈ E Assume hi with the properties (a) and (b) has been constructed. Then we take for all b ∈ Fi and the predecessor a of b the homomorphism hb determined by (a) and, respectively, (b) and set [ hi+1 := hi ∪ hb b∈Fi Using the properties of M , H, and E given in Table 1 it is not difficult to show that hi+1 again has the properties (a) and (b) above. We prove Theorem 14, that is, for ALC ontologies of depth 2, deciding whether query-evaluation is in P TIME is NE XP T IME-hard (unless P TIME = CO NP). The proof is by reduction of the complement of a NE XP T IME-complete tiling problem in which the aim is to tile a 2n × 2n -grid. An instance of this problem is given by a tuple P = (T , t0 , H, V ) where T is a set of tile types, t0 ∈ T is a distinguished tile to be placed on position (0, 0) of the grid, and H and V are horizontal and vertical matching conditions. A solution to P is a function τ : 2n × 2n → T such that • if τ (i, j) = t and τ (i + 1, j) = t0 then (t, t0 ) ∈ H, for all i < 2n − 1, j < 2n , 0 0 • if τ (i, j) = t and τ (i, j + 1) = t then (t, t ) ∈ V , for all i < 2n , j < 2n − 1, • τ (0, 0) = t0 . Let P = (T , t0 , H, V ). We construct an ALC ontology O of depth 2 such that P has a solution iff O is not materializable. The idea is that O verifies the existence of an r-chain in the input instance whose elements represent the grid positions along with a tiling, row by row from left to right, starting at the lower left corner and ending at the upper right corner. The positions in the grid are represented in binary by the concept names X1 , . . . , X2n in the instance where X1 , . . . , Xn indicate the horizontal position and Xn+1 , . . . , X2n the vertical position. The tiling is represented by unary relations Tt , t ∈ T . O verifies the chain by propagating a marker bottom up and while doing this, it verifies the horizontal matching condition. When the top of the chain is reached, O generates another r-chain using existential quantifiers. On that chain, a violation of the vertical matching condition is guessed using disjunction, that is, the position where the violation occurs and the tiles involved in it. If the tiling on the first chain is defective, the guesses can be made such that the second chain homomorphically maps into the first one, and then the second chain is ‘invisible’ to the query. Otherwise, the query can ‘see’ the second chain and because of the disjunctions involved in building it, the instance has no materialization w.r.t. O. Clearly, the second case can occur only if P has a solution. We now assemble the ontology O. To avoid that O is trivially not materializable, we have to hide the marker propagated up the chain in the input and all markers used on the existentially generated chain. We thus use concepts of the form ∀s.A instead of concept names A, additionally stating in O that > v ∃s.A. For any concept name A, we use HA as an abbreviation of ∀s.A. 1. Set up hiding of concept names: for all A {V, ok1 , . . . , ok2n , D, L, Y1 , . . . , Y2n , Y 1 , . . . , Y 2n , Z1 , . . . , Zn , Z 1 , . . . , Z n } ∪ {St | t ∈ T }, ∈ > v ∃s.A 2. The initial position starts the propagation: for all t ∈ T , X 1 u · · · u X 2n u Tt v HV 3. Tiles are mutually exclusive: for all distinct t, t0 ∈ T : Tt u Tt0 v ⊥ 4. The propagation proceeds upwards, checking the horizontal matching condition: F Xi u ∃r.Xi u 1≤j<i ∃r.Xj v Hoki F X i u ∃r.X i u 1≤j<i ∃r.Xj v Hoki d Xi u ∃r.X i u 1≤j<i ∃r.X j v Hoki d X i u ∃r.Xi u 1≤j<i ∃r.X j v Hoki Hok1 u · · · u Hokn u X i u ∃r.(HV u Tt ) u Tt0 v HV ∃r.Xi u ∃r.X i v ⊥ where i ranges over 1..2n and (t, t0 ) over H; the use of X i in the third last line prevents the counter from wrapping around at maximum value. The last inclusion is necessary to avoid that multiple successors that carry different concept name labels interact in undesired ways. 5. When the maximum value is reached, we make an extra step in the instance and then generate the first object on an existential chain that represents a tiling defect: where ∃r.(X1 u · · · u X2n u HV ) v ∃r.C C = HM u HY1 u · · · u HY2n 6. We continue building the chain; in every step, we decide whether we have a defect here (HD ) or defer it to later (HL ); we can defer at most to the left-most position of the second row: HM v (HD u CD ) t HL HM u HY 1 u · · · u HY n u HYn+1 u HY n+2 u · · · u HY 2n v HD where CD is a concept to be defined later. 7. When defering the defect to later, we keep going and maintain the Y -counter, decrementing it: HL d HL u 1≤j≤i HY i d HL u HYi u 1≤j<i HY i F HL u HYi u 1≤j<i HYi F HL u HY i u 1≤j<i HYi where i ranges over 1..2n. v ∃r.HM v ∀r.HYi v ∀r.HY i v ∀r.HYi v ∀r.HY i 8. When implementing the defect, we guess two V -incompatible tiles, start a new counter, travel exactly 2n steps, and verify the tiles. The above concept CD is G (Tt u HSt0 ) CD = HZ1 u · · · u HZn u (t,t0 )∈V / and we add the following concept inclusions: HD u HZi d HD u 1≤j≤i HZ i d HD u HZi u 1≤j<i HZ i F HD u HZi u 1≤j<i HZi F HD u HZ i u 1≤j<i HZi HD u HZi u HSt HD u HZ 1 u · · · u HZ n u HSt where i ranges over 1..n. v ∃r.(HM u HD ) v ∀r.HZi v ∀r.HZ i v ∀r.HZi v ∀r.HZ i v ∀r.HSt v Tt We now establish correctness of the reduction. A tiling chain is an instance D that consists of the following assertions: • r(ai , ai+1 ) for 0 ≤ i < 22n − 1 • Xj (ai ) whenever the j-th bit of i is one • X j (ai ) whenever the j-th bit of i is zero • exactly one Tt (ai ) for each i, t ∈ T , such that when Tt (ai ), Tt0 (ai+1 ) ∈ D, then (t, t0 ) ∈ H. We say that the tiling chain D is defective if there is an i ≤ 22n − (2n + 1) such that Tt (ai ), Tt0 (ai+2n ) ∈ D and (t, t0 ) ∈ / V. Lemma 19 P has a solution iff O is not materializable. P ROOF. (sketch) “if”. Assume that P has no solution. Take an instance D. We have to construct a CQ-materialization A of D and O. To do this, start with D viewed as an interpretation A. To satisfy the concept inclusions in Points 1-4, which all fall within monadic Datalog, apply a standard chase procedure. To satisfy the concept inclusions in Point 5 to 8, consider every a ∈ (∃r.(X1 u · · · u X2n u HV ))I . The concept HV = ∀s.V cannot be made true by an atom in an instance and, consequently, it was made true at a by the chase. Analyzing the concept inclusions in O, one can show that there must thus be a tiling chain C ⊆ D whose last element a22n −1 is a. Since there is no solution, C must be defective. When generating the existential chain, we can thus make the guesses in a way such that the chain homomorphically maps into C, thus into D. It can be verified that this yields the desired CQ-materialization of O and D. “only if”. Assume that P has a solution. Let D be the tiling chain that represents it. We show that there is no materialization of O and D. In fact, O propagates the HV marker all the way up to the last element a = a22n −1 of D, where then an existential chain is generated. Because of the use of disjunctions to generate all different kinds of violations of the vertical tiling conditions, there are clearly at least two chains that can be generated and that are incomparable in terms of homomorphisms. Moreover, since D represents a proper tiling, none of the chains homomorphically maps to D. Consequently, we can find CQs q1 (x), . . . , qm (x) such that O, D |= q1 (a) ∨ · · · ∨ qm (a), but O, D 6|= qi (a) for any i. Thus, O does not have the disjunction property and consequently is not materializable.
© Copyright 2026 Paperzz