Dichotomies in Ontology-Mediated Querying with the Guarded

Dichotomies in Ontology-Mediated Querying with the
Guarded Fragment
André Hernich
Carsten Lutz
University of Liverpool
United Kingdom
University of Bremen
Germany
[email protected]
Fabio Papacchini
[email protected]
Frank Wolter
University of Liverpool
United Kingdom
University of Liverpool
United Kingdom
[email protected]
[email protected]
ABSTRACT
We study the complexity of ontology-mediated querying when ontologies are formulated in the guarded fragment of first-order logic
(GF). Our general aim is to classify the data complexity on the
level of ontologies where query evaluation w.r.t. an ontology O is
considered to be in PT IME if all (unions of conjunctive) queries
can be evaluated in PT IME w.r.t. O and CO NP-hard if at least one
query is CO NP-hard w.r.t. O. We identify several large and relevant fragments of GF that enjoy a dichotomy between PT IME and
CO NP, some of them additionally admitting a form of counting. In
fact, almost all ontologies in the BioPortal repository fall into these
fragments or can easily be rewritten to do so. We then establish a
variation of Ladner’s Theorem on the existence of NP-intermediate
problems and use this result to show that for other fragments, there
is provably no such dichotomy. Again for other fragments (such as
full GF), establishing a dichotomy implies the Feder-Vardi conjecture on the complexity of constraint satisfaction problems. We also
link these results to Datalog-rewritability and study the decidability of whether a given ontology enjoys PT IME query evaluation,
presenting both positive and negative results.
Keywords
Ontology-Based Data Access; Query Answering; Dichotomies
1.
INTRODUCTION
In Ontology-Mediated Querying, incomplete data is enriched
with an ontology that provides domain knowledge, enabling more
complete answers to queries [47, 10, 34]. This paradigm has recently received a lot of interest, a significant fraction of the research
being concerned with the (data) complexity of querying [46, 14]
and, closely related, with the rewritability of ontology-mediated
queries into more conventional database query languages [16, 26,
28, 31, 27]. A particular emphasis has been put on designing ontology languages that result in PT IME data complexity, and in delinPermission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than the
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from [email protected].
PODS’17, May 14 – 19, 2017, Chicago, IL, USA.
c 2017 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ISBN 978-1-4503-4198-1/17/05. . . $15.00
DOI: http://dx.doi.org/10.1145/3034786.3056108
eating these from the CO NP-hard cases. This question and related
ones have given rise to a considerable array of ontology languages,
including many description logics (DLs) [3, 37] and a growing
number of classes of tuple-generating dependencies (TGDs), also
known as Datalog± and as existential rules [13, 45]. A general and
uniform framework is provided by the guarded fragment (GF) of
first-order logic and extensions thereof, which subsume many of
the mentioned ontology languages [5, 6].
In practical applications, ontologies often need to use language
features that are only available in computationally expensive ontology languages, but do so in a way such that one may hope for hardness to be avoided. This observation has led to a more fine-grained
study of data complexity than on the level of ontology languages,
initiated in [42], where the aim is to classify the complexity of individual ontologies while quantifying over the actual query: query
evaluation w.r.t. an ontology O is in PT IME if every CQ can be
evaluated in PT IME w.r.t. O and it is CO NP-hard if there is at least
one CQ that is CO NP-hard to evaluate w.r.t. O. In this way, one can
identify tractable ontologies within ontology languages that are, in
general, computationally hard. Note that an even more fine-grained
approach is taken in [11], where one aims to classify the complexity
of each pair (O, q) with O an ontology and q an actual query. Both
approaches are reasonable, the first one being preferable when the
queries to be answered are not fixed at the design time of the ontology; this is actually often the case because ontologies are typically
viewed as general purpose artifacts to be used in more than a single
application. In this paper, we follow the former approach.
The main aim of this paper is to identify fragments of GF (and
of extensions of GF with different forms of counting) that result in
a dichotomy between PT IME and CO NP when used as an ontology
language and that cover as many real-world ontologies as possible,
considering conjunctive queries (CQs) and unions thereof (UCQs)
as the actual query language. We also aim to provide insight into
which fragments of GF (with and without counting) do not admit
such a dichotomy, to understand the relation between PT IME data
complexity and rewritability into Datalog (with inequality in rule
bodies, in case we start from GF with equality or counting), and to
clarify whether it is decidable whether a given ontology has PT IME
data complexity. Note that we concentrate on fragments of GF because for the full guarded fragment, proving a dichotomy between
PT IME and CO NP implies the long-standing Feder-Vardi conjecture on constraint satisfaction problems [23] which indicates that
it is very difficult to obtain (if it holds at all). In particular, we
concentrate on the fragment of GF that is invariant under disjoint
unions, which we call uGF, and on fragments thereof and their ex-
No Dichotomy
CSP-Hard
(Datalog6= 6= P TIME)
Dichotomy
(Datalog6= = P TIME)
uGF−
2 (2, f )
uGF2 (1, =)
uGF− (1, =)
uGF2 (2)
ALC depth 3 [42]
uGF(1)
uGF−
2 (2)
uGF2 (1, f )
uGC−
2 (1, =)
ALCHIQ depth 1
ALCIF ` depth 2
ALCF depth 3 [42]
ALCF ` depth 2
ALCHIF depth 2
Figure 1: Summary of the results—Number in brackets indicates depth, f presence of partial functions, ·2 restriction to two variables, ·− restricts outermost guards to be equality, F globally function roles, F` concepts (≤ 1 R).
tension with forms of counting. Invariance under disjoint unions
is a fairly mild restriction that is shared by many relevant ontology
languages, and it admits a very natural syntactic characterization.
Our results are summarized in Figure 1. We first explain the fragments shown in the figure and then survey the obtained results. A
uGF ontology is a set of sentences of the form ∀~
x(R(~
x) → ϕ(~
x))
where R(~x) is a guard (possibly equality) and ϕ(~
x) is a GF formula
that does not contain any sentences as subformulas and in which
equality is not used as a guard. The depth of such a sentence is the
quantifier depth of ϕ(~
x) (and thus the outermost universal quantifier is not counted). A main parameter that we vary is the depth,
which is typically very small in real world ontologies. In Figure 1,
the depth is the first parameter displayed in brackets. As usual, the
subscript ·2 indicates the restriction to two variables while a superscript ·− means that the guard R(~
x) in the outermost universal
quantifier can only be equality, = means that equality is allowed
(in non-guard positions), f indicates the ability to declare binary
relation symbols to be interpreted as partial functions, and GC2 denotes the two variable guarded fragment extended with counting
quantifiers, see [32, 48]. While guarded fragments are displayed
in black, description logics (DLs) are shown in grey and smaller
font size. We use standard DL names except that ‘F’ denotes globally functional roles while ‘F` ’ refers to counting concepts of the
form (≤ 1 R). We do not explain DL names here, but refer to the
standard literature [4].
The bottommost part of Figure 1 displays fragments for which
there is a dichotomy between PT IME and CO NP, the middle part
shows fragments for which such a dichotomy implies the FederVardi conjecture (from now on called CSP-hardness), and the topmost part is for fragments that provably have no dichotomy (unless
PT IME = NP). The vertical lines indicate that the linked results are
closely related, often indicating a fundamental difficulty in further
generalizing an upper bound. For example, uGF− (1, =) enjoys
a dichotomy while uGF2 (1, =) is CSP-hard, which demonstrates
that generalizing the former result by dropping the restriction that
the outermost quantifier has to be equality (indicated by ·− ) is very
challenging (if it is possible at all).1 Our positive results are thus
optimal in many ways. All results hold both when CQs and when
UCQs are used as the actual query; in this context, it is interesting
to note that there is a GF ontology (which is not an uGF ontology) for which CQ answering is in PT IME while UCQ-answering
is CO NP-hard. In the cases which enjoy a dichotomy, we also show
that PT IME query evaluation coincides with rewritability into Datalog (with inequality in the rule bodies if we start from a fragment
1
A tentative proof of the Feder-Vardi conjecture has very recently
been announced in [49], along with an invitation to the research
community to verify its validity.
with equality or counting). In contrast, for all fragments that are
CSP-hard or have no dichotomy, these two properties do provably
not coincide. This is of course independent of whether or not the
Feder-Vardi conjecture holds.
For ALCHIQ ontologies of depth 1, we also show that it
is decidable and E XP T IME-complete whether a given ontology
admits PT IME query evaluation (equivalently: rewritability into
Datalog6= ). For uGC−
2 (1, =), we show a NE XP T IME upper bound.
For ALC ontologies of depth 2, we establish NE XP T IME-hardness.
The proof indicates that more sophisticated techniques are needed
to establish decidability, if the problem is decidable at all (which
we leave open).
To understand the practical relevance of our results, we have
analyzed 411 ontologies from the BioPortal repository [52]. After removing all constructors that do not fall within ALCHIF ,
an impressive 405 ontologies turned out to have depth 2 and thus
belong to a fragment with dichotomy (sometimes modulo an easy
complexity-preserving rewriting). For ALCHIQ, still 385 ontologies had depth 1 and so belonged to a fragment with dichotomy.
As a concrete and simple example, consider the two uGC−
2 (1)ontologies
O1 = {∀x (Hand(x) → ∃=5 y hasFinger(x, y))}
O2 = {∀x (Hand(x) → ∃y (hasFinger(x, y) ∧ Thumb(y)))}
which both enjoy PT IME query evaluation (and thus rewritability
into Datalog6= ), but where query evaluation w.r.t. the union O1 ∪O2
is CO NP-hard. Note that such subtle differences cannot be captured
when data complexity is studied on the level of ontology languages,
at least when basic compositionality conditions are desired.
We briefly highlight some of the techniques used to establish our
results. An important role is played by the notions of materializability and unraveling tolerance of an ontology O. Materializability means that for every instance D, there is a universal model of
D and O, defined in terms of query answers rather than in terms
of homomorphisms (which, as we show, need not coincide in our
context). Unraveling tolerance means that the ontology cannot distinguish between an instance and its unraveling into a structure of
bounded treewidth. While non-materializability of O implies that
query evaluation w.r.t. O is CO NP-hard, unraveling tolerance of O
implies that query evaluation w.r.t. O is in PT IME (in fact, even
rewritable into Datalog). To establish dichotomies, we prove for
the relevant fragments that materializability implies unraveling tolerance which, depending on the fragment, can be technically rather
subtle. To prove CSP-hardness or non-dichotomies, very informally speaking, we need to express properties in the ontology that
a (positive existential) query cannot ‘see’. This is often very subtle and can often be achieved only partially. While the latter is not
a major problem for CSP-hardness (where we need to deal with
CSPs that ‘admit precoloring’ and are known to behave essentially
in the same way as traditional CSPs), it poses serious challenges
when proving non-dichotomy. To tackle this problem, we establish
a variation of Ladner’s theorem on NP-intermediate problems such
that instead of the word problem for NP Turing machines, it speaks
about the run fitting problem, which is to decide whether a given
partially described run of a Turing machine (which corresponds to
a precoloring in the CSP case) can be extended to a full run that
is accepting. Also our proofs of decidability of whether an ontology admits PT IME query evaluation are rather subtle and technical,
involving e.g. mosaic techniques.
Due to space constraints, throughout the paper we defer proof
details to the appendix.
Related Work. Ontology-mediated querying has first been considered in [40, 15]; other important papers include [16, 12, 5]. It is
a form of reasoning under integrity constraints, a traditional topic
in database theory, see e.g. [9, 8] and references therein, and it is
also related to deductive databases, see e.g. the monograph [44].
Moreover, ontology-mediated querying has drawn inspiration from
query answering under views [17, 18]. In recent years, there has
been significant interest in complete classification of the complexity of hard querying problems. In the context of ontology-mediated
querying, relevant references include [42, 11, 41]. In fact, this
paper closes a number of open problems from [42] such as that
ALCI ontologies of depth two enjoy a dichotomy and that materializability (and thus PT IME complexity and Datalog-rewritability)
is decidable in many relevant cases. Other areas of database theory
where complete complexity classifications are sought include consistent query answering [36, 35, 24, 21], probabilistic databases
[51], and deletion propagation [33, 25].
2.
PRELIMINARIES
We assume an infinite set ∆D of data constants, an infinite set
∆N of labeled nulls disjoint from ∆D , and a set Σ of relation symbols containing infinitely many relation symbols of any arity ≥ 1.
A (database) instance D is a non-empty set of facts R(a1 , . . . , ak ),
where R ∈ Σ, k is the arity of R, and a1 , . . . , ak ∈ ∆D . We generally assume that instances are finite, unless otherwise specified.
An interpretation A is a non-empty set of atoms R(a1 , . . . , ak ),
where R ∈ Σ, k is the arity of R, and a1 , . . . , ak ∈ ∆D ∪ ∆N .
We use sig(A) and dom(A) to denote the set of relation symbols and, respectively, constants and labelled nulls in A. We always assume that sig(A) is finite while dom(A) can be infinite.
Whenever convenient, interpretations A are presented in the form
(A, (RA )R∈sig(A) ) where A = dom(A) and RA is a k-ary relation
on A for each R ∈ sig(A) of arity k. An interpretation A is a
model of an instance D, written A |= D, if D ⊆ A. We thus make
a strong open world assumption (interpretations can make true additional facts and contain additional constants and nulls) and also
assume standard names (every constant in D is interpreted as itself
in A). Note that every instance is also an interpretation.
Assume A and B are interpretations. A homomorphism h
from A to B is a mapping from dom(A) to dom(B) such that
R(a1 , . . . , ak ) ∈ A implies R(h(a1 ), . . . , h(ak )) ∈ B for all
a1 , . . . , ak ∈ dom(A) and R ∈ Σ of arity k. We say that h preserves a set D of constants and labelled nulls if h(a) = a for all
a ∈ D and that h is an isomorphic embedding if it is injective and
R(h(a1 ), . . . , h(ak )) ∈ B entails R(a1 , . . . , ak ) ∈ A. An interpretation A ⊆ B is a subinterpretation of B if R(a1 , . . . , ak ) ∈
B and a1 , . . . , ak ∈ dom(A) implies R(a1 , . . . , ak ) ∈ A; if
dom(A) = A, we denote A by B|A and call it the subinterpretation of B induced by A.
Conjunctive queries (CQs) q of arity k take the form q(~
x) ← φ,
where ~
x = x1 , . . . , xk is the tuple of answer variables of q, and φ
is a conjunction of atomic formulas R(y1 , . . . , yn ) with R ∈ Σ of
arity n and y1 , . . . , yn variables. As usual, all variables in ~
x must
occur in some atom of φ. Any CQ q(~
x) ← φ can be regarded as
an instance Dq , often called the canonical database of q, in which
each variable y of φ is represented by a unique data constant ay ,
and that for each atom R(y1 , . . . , yk ) in φ contains the atom
R(ay1 , . . . , ayk ). A tuple ~a = (a1 , . . . , ak ) of constants is an answer to q(x1 , . . . , xk ) in A, in symbols A |= q(~a), if there is a homomorphism h from Dq to A with h(ax1 , . . . , axk ) = ~a. A union
of conjunctive queries (UCQ) q takes the form q1 (~
x), . . . , qn (~
x),
where each qi (~
x) is a CQ. The qi are called disjuncts of q. A tuple
~a of constants is an answer to q in A, denoted by A |= q(~a), if ~a is
an answer to some disjunct of q in A.
We now introduce the fundamentals of ontology-mediated
querying. An ontology language L is a set of first-order sentences
over signature Σ (that is, function symbols are not allowed) and an
L-ontology O is a finite set of sentences from L. We introduce various concrete ontology languages throughout the paper, including
fragments of the guarded fragment and descriptions logics. An interpretation A is a model of an ontology O, in symbols A |= O, if
it satisfies all its sentences. An instance D is consistent w.r.t. O if
there is a model of D and O.
An ontology-mediated query (OMQ) is a pair (O, q), where O is
an ontology and q a UCQ. The semantics of an ontology-mediated
query is given in terms of certain answers, defined next. Assume
that q has arity k and D is an instance. Then a tuple ~a of length k
in dom(D) is a certain answer to q on an instance D given O, in
symbols O, D |= q(~a), if A |= q(~a) for all models A of D and O.
The query evaluation problem for an OMQ (O, q(~
x)) is to decide,
given an instance D and a tuple ~a in D, whether O, D |= q(~a).
We use standard notation for Datalog programs (a brief introduction is given in the appendix). An OMQ (O, q(~
x)) is called
Datalog-rewritable if there is a Datalog program Π such that for
all instances D and ~a ∈ dom(D), O, D |= q(~a) iff D |= Π(~a).
Datalog6= -rewritability is defined accordingly, but allows the use of
inequality in the body of Datalog rules. We are mainly interested
in the following properties of ontologies.
Definition 1 Let O be an ontology and Q a class of queries. Then
• Q-evaluation w.r.t. O is in PT IME if for every q ∈ Q, the query
evaluation problem for (O, q) is in PT IME.
• Q-evaluation w.r.t. O is Datalog-rewritable (resp. Datalog6= rewritable) if for every q ∈ Q, the query evaluation problem
for (O, q) is Datalog-rewritable (resp. Datalog6= -rewritable).
• Q-evaluation w.r.t. O is CO NP-hard if there is a q ∈ Q such
that the query evaluation problem for (O, q) is CO NP-hard.
2.1
Ontology Languages
As ontology languages, we consider fragments of the guarded
fragment (GF) of FO, the two-variable guarded fragment of FO
with counting, and DLs. Recall that GF formulas [1] are obtained
by starting from atomic formulas R(~
x) over Σ and equalities x = y
and then using the boolean connectives and guarded quantifiers of
the form
∀~
y (α(~
x, ~
y ) → ϕ(~
x, ~
y )),
∃~
y (α(~
x, ~
y ) ∧ ϕ(~
x, ~
y ))
where ϕ(~x, ~
y ) is a guarded formula with free variables among ~
x, ~
y
and α(~x, ~
y ) is an atomic formula or an equality x = y that contains all variables in ~
x, ~
y . The formula α is called the guard of the
quantifier.
In ontologies, we only allow GF sentences ϕ that are invariant
under disjoint unions, that is, for all families Bi , i ∈ I, of interpretations with mutually disjoint domains,
the following holds:
S
Bi |= ϕ for all i ∈ I if, and only if, i∈I Bi |= ϕ. We give a
syntactic characterization of GF sentences that are invariant under
disjoint unions. Denote by openGF the fragment of GF that consists of all (open) formulas whose subformulas are all open and in
which equality is not used as a guard. The fragment uGF of GF
is the set of sentences obtained from openGF by a single guarded
universal quantifier: if ϕ(~
y ) is in openGF, then ∀~
y (α(~
y ) → ϕ(~
y ))
is in uGF, where α(~
y ) is an atomic formula or an equality y = y
that contains all variables in ~
y . We often omit equality guards in
uGF sentences of the form ∀y(y = y → ϕ(y)) and simply write
∀yϕ. A uGF ontology is a finite set of sentences in uGF.
Theorem 1 A GF sentence is invariant under disjoint unions iff it
is equivalent to a uGF sentence.
P ROOF. The direction from right to left is straightforward. For
the converse direction, observe that every GF sentence is equivalent
to a Boolean combination of uGF sentences. Now assume that ϕ is
a GF sentence and invariant under disjoint unions. Let cons(ϕ) be
the set of all sentences χ in uGF with ϕ |= χ. By compactness of
FO it is sufficient to show that cons(ϕ) |= ϕ. If this is not the case,
take a model A0 of cons(ϕ) refuting ϕ and take for any sentence
ψ in uGF that is not in cons(ϕ) an interpretation A¬ψ satisfying ϕ
and refuting ψ. Let A1 be the disjoint union of all A¬ψ . By preservation of ϕ under disjoint unions, A1 satisfies ϕ. By reflection of
ϕ for disjoint unions, the disjoint union A of A0 and A1 does not
satisfy ϕ. Thus A1 satisfies ϕ and A does not satisfy ϕ but by
construction A and A1 satisfy the same sentences in uGF. This is
impossible since ϕ is equivalent to a Boolean combination of uGF
sentences.
The following example shows that some very simple Boolean combinations of uGF sentences are not invariant under disjoint unions.
Example 1 Let
OUCQ/CQ
OMat/PTime
= {(∀x(A(x) ∨ B(x)) ∨ ∃xE(x)}
= {∀xA(x) ∨ ∀xB(x)}
Then OMat/PTime is not preserved under disjoint unions since D1 =
{A(a)} and D2 = {B(b)} are models of OMat/PTime but D1 ∪ D2
refutes OMat/PTime ; OUCQ/CQ does not reflect disjoint unions since
the disjoint union of D01 = {E(a)} and D02 = {F (b)} is a model
of OUCQ/CQ but D02 refutes OUCQ/CQ . We will use these ontologies
later to explain why we restrict this study to fragments of GF that
are invariant under disjoint unions.
When studying uGF ontologies, we are going to vary several
parameters. The depth of a formula ϕ in openGF is the nesting depth of guarded quantifiers in ϕ. The depth of a sentence
∀~
y (α(~
y ) → ϕ(~
y )) in uGF is the depth of ϕ(~
y ), thus the outermost
guarded quantifier is not counted. The depth of a uGF ontology is
the maximum depth of its sentences. We indicate restricted depth in
brackets, writing e.g. uGF(2) to denote the set of all uGF sentences
of depth at most 2.
Example 2 The sentence
∀xy(R(x, y) → (A(x) ∨ ∃zS(y, z)))
is in uGF(1) since the openGF formula A(x) ∨ ∃zS(y, z) has
depth 1.
For every GF sentence ϕ, one can construct in polynomial time a
conservative extension ϕ0 in uGF(1) by converting into Scott normal form [29]. Thus, the satisfiability and CQ-evaluation problems
for full GF can be polynomially reduced to the corresponding problem for uGF(1).
We use uGF− to denote the fragment of uGF where only equality
guards are admitted in the outermost universal quantifier applied to
an openGF formula. Thus, the sentence in Example 2 (1) is a uGF
sentence of depth 1, but not a uGF− sentence of depth 1. It is,
however, equivalent to the following uGF− sentence of depth 1:
∀x(∃y((R(y, x) ∧ ¬A(y)) → ∃zS(x, z)))
An example of a uGF sentence of depth 1 that is not equivalent to a
uGF− sentence of depth 1 is given in Example 3 below. Intuitively,
uGF sentences of depth 1 can be thought of as uGF− sentences
of ‘depth 1.5’ because giving up ·− allows an additional level of
‘real’ quantification (meaning: over guards that are not forced to
be equality), but only in a syntactically restricted way.
The two-variable fragment of uGF is denoted with uGF2 . More
precisely, in uGF2 we admit only the two fixed variables x and y
and disallow the use of relation symbols of arity exceeding two. We
also consider two extensions of uGF2 with forms of counting. First,
uGF2 (f ) denotes the extension of uGF2 with function symbols,
that is, an uGF2 (f ) ontology is a finite set of uGF2 sentences and
of functionality axioms ∀x∀y1 ∀y2 ((R(x, y1 )∧R(x, y2 )) → (y1 =
y2 )) [29]. Second, we consider the extension uGC2 of uGF2 with
counting quantifiers. More precisely, the language openGC2 is defined in the same way as the two-variable fragment of openGF, but
in addition admits guarded counting quantifiers [48, 32]: if n ∈ ,
{z1 , z2 } = {x, y}, and α(z1 , z2 ) ∈ {R(z1 , z2 ), R(z2 , z1 )} for
some R ∈ Σ and ϕ(z1 , z2 ) is in openGC2 , then ∃≥n z1 (α(z1 , z2 )∧
ϕ(z1 , z2 )) is in openGC2 . The ontology language uGC2 is then
defined in the same way as uGF2 , using openGC2 instead of
openGF2 . The depth of formulas in uGC2 is defined in the expected
way, that is, guarded counting quantifiers and guarded quantifiers
both contribute to it.
The above restrictions can be freely combined and we use
the obvious names to denote such combinations. For example,
uGF−
2 (1, f ) denotes the two-variable fragment of uGF with function symbols and where all sentences must have depth 1 and the
guard of the outermost quantifier must be equality. Note that uGF
admits equality, although in a restricted way (only in non-guard positions, with the possible exception of the guard of the outermost
quantifier). We shall also consider fragments of uGF that admit no
equality at all except as a guard of the outermost quantifier. To emphasize that the restricted use of equality is allowed, we from now
on use the equality symbol in brackets whenever equality is present,
as in uGF(=), uGF− (1, =), and uGC−
2 (1, =). Conversely, uGF,
uGF− (1), and uGC−
2 (1) from now on denote the corresponding
fragments where equality is only allowed as a guard of the outermost quantifier.
N
Description logics are a popular family of ontology languages
that are related to the guarded fragments of FO introduced above.
We briefly review the basic description logic ALC, further details
on this and other DLs mentioned in this paper can be found in the
appendix and in [4]. DLs generally use relations of arity one and
two, only. An ALC concept is formed according to the syntax rule
C, D ::= A | > | ⊥ | ¬C | C u D | C t D | ∃R.C | ∀R.C
where A ranges over unary relations and R over binary relations.
An ALC ontology O is a finite set of concept inclusions C v D,
with C and D ALC concepts. The semantics of ALC concepts C
can be given by translation to openGF formulas C ∗ (x) with one
free variable x and two variables overall. A concept inclusion C v
∗
∗
D then translates to the uGF−
2 sentence ∀x(C (x) → D (x)).
The depth of an ALC concept is the maximal nesting depth of ∃R
and ∀R. The depth on an ALC ontology is the maximum depth of
concepts that occur in it. Thus, every ALC ontology of depth n is
a uGF−
2 ontology of depth n. When translating into uGF2 instead
of into uGF−
2 , the depth might decrease by one because one can
exploit the outermost quantifier (which does not contribute to the
depth). A more detailed description of the relationship between
DLs and fragments of uGF is given in the appendix.
Example 3 The ALC concept inclusion ∃S.A v ∀R.∃S.B has
depth 2, but is equivalent to the uGF2 (1) sentence
∀xy(R(x, y) → ((∃S.A)∗ (x) → (∃S.B)∗ (y))
Note that for any ontology O in any DL considered in this paper
one can construct in a straightforward way in polynomial time a
conservative extension O∗ of O of depth one. In fact, many DL
algorithms for satisfiability or query evaluation assume that the ontology is of depth one and normalized.
We also consider the extensions of ALC with inverse roles R−
(denoted in the name of the DL by the letter I), role inclusions
R v S (denoted by H), qualified number restrictions (≥ n R C)
(denoted by Q), partial functions as defined above (denoted by F),
and local functionality expressed by (≤ 1R) (denoted by F` ). The
depth of ontologies formulated in these DLs is defined in the obvious way. Thus, ALCHIQ ontologies (which admit all the constructors introduced above) translate into uGC−
2 ontologies, preserving the depth.
For any syntactic object O (such as an ontology or a query), we
use |O| to denote the number of symbols needed to write O, counting relation names, variable names, and so on as a single symbol
and assuming that numbers in counting quantifiers and DL number
restrictions are coded in unary.
2.2
Guarded Tree Decompositions
We introduce guarded tree decompositions and rooted acyclic
queries [30]. A set G ⊆ dom(A) is guarded in the interpretation
A if G is a singleton or there are R ∈ Σ and R(a1 , . . . , ak ) ∈ A
such that G = {a1 , . . . , ak }. By S(A), we denote the set of all
guarded sets in A. A tuple (a1 , . . . , ak ) ∈ Ak is guarded in A if
{a1 , . . . , ak } is a subset of some guarded set in A. A guarded tree
decomposition of A is a triple (T, E, bag) with (T, E) an acyclic
undirected graph and bag a function that assigns to every t ∈ T a
set bag(t) of atoms such that A|dom(bag(t)) = bag(t) and
S
1. A = t∈T bag(t);
2. dom(bag(t)) is guarded for every t ∈ T ;
3. {t ∈ T | a ∈ dom(bag(t))} is connected in (T, E), for every
a ∈ dom(A).
We say that A is guarded tree decomposable if there exists a
guarded tree decomposition of A. We call (T, E, bag) a connected guarded tree decomposition (cg-tree decomposition) if, in
addition, (T, E) is connected (i.e., a tree) and dom(bag(t)) ∩
dom(bag(t0 )) 6= ∅ for all (t, t0 ) ∈ E. In this case, we often assume that (T, E) has a designated root r, which allows us to view
(T, E) as a directed tree whenever convenient.
A CQ q ← φ is a rooted acyclic query (rAQ) if there exists a cgtree decomposition (T, E, bag) of the instance Dq with root r such
that dom(bag(r)) is the set of answer variables of q. Note that, by
definition, rAQs are non-Boolean queries.
Example 4 The CQ
q(x) ← φ,
φ = R(x, y) ∧ R(y, z) ∧ R(z, x)
is not an rAQ since Dq is not guarded tree decomposable. By
adding the conjunct Q(x, y, z) to φ one obtains an rAQ.
We will frequently use the following construction: let D be an
instance and G a set of guarded sets in D. Assume that BG ,
G ∈ G, are interpretations such that dom(BG ) ∩ dom(D) = G
and dom(BG1 ) ∩ dom(BG2 ) = G1 ∩ G2 for any two distinct
guarded sets G1 and G2 in G. Then the interpretation
[
B=D∪
BG
G∈G
is called the interpretation obtained from D by hooking BG to D
for all G ∈ G. If the BG are cg-tree decomposable interpretations
with dom(bag(r)) = G for the root r of a (fixed) cg-tree decomposition of BG , then B is called a forest model of D defined using
G. If G is the set of all maximal guarded sets in D, then we call
B simply a forest model of D. The following result can be proved
using standard guarded tree unfolding [29, 30].
Lemma 1 Let O be a uGF(=) or uGC2 (=) ontology, D a possibly
infinite instance, and A a model of D and O. Then there exists a
forest model B of D and O and a homomorphism h from B to A
that preserves dom(D).
3.
MATERIALIZABILITY
We introduce and study materializability of ontologies as a necessary condition for query evaluation to be in PT IME. In brief,
an ontology O is materializable if for every instance D, there is a
model A of O and D such that for all queries, the answers on A
agree with the certain answers on D given O. We show that this
sometimes, but not always, coincides with existence of universal
models defined in terms of homomorphisms. We then prove that in
uGF(=) and uGC2 (=), non-materializability implies CO NP-hard
query answering while this is not the case for GF. Using these results, we further establish that in uGF(=) and uGC2 (=), query
evalution w.r.t. ontologies to be in PT IME, Datalog6= -rewritable,
and CO NP-hard does not depend on the query language, that is, all
these properties agree for rAQs, CQs, and UCQs. Again, this is not
the case for GF.
Definition 2 (Materializability) Let O be an FO(=)-ontology, Q
a class of queries, and M a class of instances. Then
• an interpretation B is a Q-materialization of O and an instance
D if it is a model of O and D and for all q(~
x) ∈ Q and ~a in
dom(D), B |= q(~a) iff O, D |= q(~a).
• O is Q-materializable for M if for every instance D ∈ M that
is consistent w.r.t. O, there is a Q-materialization of O and D.
If M is the class of all instances, we simply speak of Qmaterializability of O.
We first observe that the materializability of ontologies does not
depend on the query language (although concrete materializations
do).
Theorem 2 Let O be a uGF(=) or uGC2 (=) ontology and M a
class of instances. Then the following conditions are equivalent:
1. O is rAQ-materializable for M;
2. O is CQ-materializable for M;
3. O is UCQ-materializable for M.
P ROOF. The only non-trivial implication is (1) ⇒ (2). It can
be proved by using Lemma 1 and showing that if A is a rAQmaterialization of an ontology O and an instance D, then any forest
model B of O and D which admits a homomorphism to A that preserves dom(D) is a CQ-materialization of O and D.
Because of Theorem 2, we from now on speak of materializability without reference to a query language and of materializations instead of UCQ-materializations (which are then also CQmaterializations and rAQ-materializations).
A notion closely related to materializations are (homomorphically) universal models as used e.g. in data exchange [22, 20]. A
model of an ontology O and an instance D is hom-universal if
there is a homomorphism preserving dom(D) into any model of
O and D. We say that an ontology O admits hom-universal models if there is a hom-universal model for O and any instance D.
It is well-known that hom-universal models are closely related to
what we call UCQ-materializations. In fact, in many DLs and in
uGC2 (=), materializability of an ontology O coincides with O admitting hom-universal models (although for concrete models, being
hom-universal is not the same as being a materialization). We show
in the appendix that this is not the case for ontologies in uGF(2)
(with three variables). The proof also shows that admitting homuniversal models is not a necessary condition for query evaluation
to be in PT IME (in contrast to materializability).
Lemma 2 A uGC2 (=) ontology is materializable iff it admits homuniversal models. This does not hold for uGF(2) ontologies.
The following theorem links materializability to computational
complexity, thus providing the main reason for our interest into this
notion. The proof is by reduction of 2+2-SAT [50], a variation of a
related proof from [42].
Theorem 3 Let O be an FO(=)-ontology that is invariant under
disjoint unions. If O is not materializable, then rAQ-evaluation
w.r.t. O is CO NP-hard.
nulls) and each qi is a rAQ. This is similar to squid decompositions
in [12], but more subtle due to the presence of subqueries that are
not connected to any answer variable of q. Similar constructions
are used also to deal with Datalog6= -rewritability and with CO NPhardness.
The ontology OUCQ/CQ from Example 1 shows that Theorem 4
does not hold for GF ontologies, even if they use only a single
variable and are of depth 1 up to an outermost universal quantifier
with an equality guard.
Lemma 3 CQ-evaluation w.r.t. OUCQ/CQ is in PT IME and UCQevaluation w.r.t. OUCQ/CQ is CO NP-hard.
The lower bound essentially follows the construction in the proof
of Theorem 3 and the upper bound is based on a case analysis,
depending on which relations occur in the CQ and in the input instance.
4.
UNRAVELLING TOLERANCE
While materializability of an ontology is a necessary condition
for PT IME query evaluation in uGF(=) and uGC2 (=), we now
identify a sufficient condition called unravelling tolerance that is
based on unravelling instances into cg-tree decomposable instances
(which might be infinite). In fact, unravelling tolerance is even a
sufficient condition of Datalog6= -rewritability and we will later establish our dichotomy results by showing that, for the ontology languages in question, materializability implies unravelling tolerance.
We start with introducing suitable forms of unravelling (also
called guarded tree unfolding, see [30] and references therein). The
uGF-unravelling Du of an instance D is constructed as follows.
Let T (D) be the set of all sequences t = G0 G1 · · · Gn where Gi ,
0 ≤ i ≤ n, are maximal guarded sets of D and
(a) Gi 6= Gi+1 ,
(b) Gi ∩ Gi+1 6= ∅, and
(c) Gi−1 6= Gi+1 .
This remains true when ‘in PT IME’ is replaced with ‘Datalog6= rewritable’ and with ‘CO NP-hard’ (and with ‘Datalog-rewritable’
if O is a uGF ontology).
In the following, we associate eachSt ∈ T (D) with a set of atoms
bag(t). Then we define Du as t∈T (D) bag(t) and note that
(T (D), E, bag) is a cg-tree decomposition of Du where (t, t0 ) ∈ E
if t0 = tG for some G.
Set tail(G0 · · · Gn ) = Gn . Take an infinite supply of copies of
any d ∈ dom(D). We set e↑ = d if e is a copy of d. We define
bag(t) (up to isomorphism) proceeding by induction on the length
of the sequence t. For any t = G, bag(t) is an instance whose
domain is a set of copies of d ∈ G such that the mapping e 7→ e↑
is an isomorphism from bag(G) onto the subinstance D|G of D
induced by G. To define bag(t0 ) for t0 = tG0 when tail(t) = G,
take for any d ∈ G0 \ G a fresh copy d0 of d and define bag(t0 ) with
domain {d0 | d ∈ G0 \ G} ∪ {e ∈ bag(t) | e↑ ∈ G0 ∩ G)} such that
the mapping e 7→ e↑ is an isomorphism from bag(t0 ) onto D|G0 .
The following example illustrates the construction of Du .
Example 5 (1) Consider the instance D depicted below with the
maximal guarded sets G1 , G2 , G3 . Then the unravelling Du of
D consists of three isomorphic chains (we depict only one such
chain):
G1
G3
P ROOF. By Theorem 3, we can concentrate on ontologies that
are materializable. For the non-trivial implication of Point 3 by
Point 1, we exploit materializability
to rewrite UCQs into a finite
V
disjunction of queries q ∧ i qi where q is a “core CQ” that only
needs to be evaluated over the input instance D (ignoring labeled
(2) Next consider the instance D depicted below which has the
shape of a tree of depth one with root a and has three maximal
We remark that, in the proof of Theorem 3, we use instances and
rAQs that use additional fresh (binary) relation symbols, that is,
relation symbols that do not occur in O.
The ontology OMat/PTime from Example 1 shows that Theorem 3
does not hold for GF ontologies, even if they are of depth 1 and use
only a single variable. In fact, OMat/PTime is not CQ-materializable,
but CQ-evaluation is in PT IME (which is both easy to see).
Theorem 4 For all uGF(=) and uGC2 (=) ontologies O, the following are equivalent:
1. rAQ-evaluation w.r.t. O is in PT IME;
2. CQ-evaluation w.r.t. O is in PT IME;
3. UCQ-evaluation w.r.t. O is in PT IME.
G2
G3
G1
G2
G3
G2
guarded sets G1 , G2 , G3 . Then the unravelling Du of D consists
of three isomorphic trees of depth one of infinite outdegree (again
we depict only one):
G1
G2
a
G3
G1
...
a
G3
G3
G2
G1
G2
By construction, the mapping h : e 7→ e↑ is a homomorphism
from Du onto D and the restriction of h to any guarded set G is
an isomorphism. It follows that for any uGF(=) ontology O, UCQ
q(~
x), and ~a in Du , if O, Du |= q(~a), then O, D |= q(h(~a)). This
implication does not hold for ontologies in the guarded fragment
with functions or counting. To see this, let
O = {∀x(∃≥4 yR(x, y) → A(x))}
Then O, Du |= A(a) for the instance D from Example 5 (2) but
O, D 6|= A(a). For this reason the uGF-unravelling is not appropriate for the guarded fragment with functions or counting. By
replacing Condition (c) by the stronger condition
(c0 ) Gi ∩ Gi−1 6= Gi ∩ Gi+1 ,
we obtain an unravelling that we call uGC2 -unravelling and that we
apply whenever all relations have arity at most two. One can show
that the uGC2 -unravelling of an instance preserves the number of
R-successors of constants in D and that, in fact, the implication
‘O, Du |= q(~a) ⇒ O, D |= q(h(~a))’ holds for every uGC2 (=)
ontology O, UCQ q(~
x), and tuple ~a in the uGC2 -unravelling Du
of D.
We are now ready to define unravelling tolerance. For a maximal
guarded set G in D, the copy in bag(G) of a tuple ~a = (a1 , . . . , ak )
in G is the unique ~b = (b1 , . . . , bk ) in dom(bag(G)) such that
b↑i = ai for 1 ≤ i ≤ k.
Definition 3 A uGF(=) (resp. uGC2 (=)) ontology O is unravelling tolerant if for every instance D, every rAQ q(~
x), and every
tuple ~a in D such that the set G of elements of ~a is maximally
guarded in D the following are equivalent:
1. O, D |= q(~a);
2. O, Du |= q(~b) where ~b is the copy of ~a in bag(G)
where Du is the uGF-unravelling (resp. the uGC2 -unravelling)
of D.
We have seen above that the implication (2) ⇒ (1) in Definition 3
holds for every uGF(=) and uGC2 (=) ontology and every UCQ.
Note that it is pointless to define unravelling tolerance using the
implication (1) ⇒ (2) for UCQs or CQs that are not acyclic. The
following example shows that (1) ⇒ (2) does not always hold for
rAQs.
Example 6 Consider the uGF ontology O that contains the sentences
∀x X(x) → (∃y(R(x, y) ∧ X(y)) → E(x))
with X ∈ {A, ¬A} and
∀x E(x) → ((R(x, y) ∨ R(y, x)) → E(y))
For instances D not using A, O states that E(a) is entailed for all
a ∈ dom(D) that are R-connected to some R-cycle in D with an
odd number of constants. Thus, for the instance D from Example 5
(1) we have O, D |= E(a) for every a ∈ dom(D) but O, Du 6|=
E(a) for any a ∈ dom(Du ).
We now show that, as announced, unraveling tolerance implies that
query evaluation is Datalog6= -rewritable.
Theorem 5 For all uGF(=) and uGC2 (=) ontologies O, unravelling tolerance of O implies that rAQ-evaluation w.r.t. O is
Datalog6= -rewritable (and Datalog-rewritable if O is formulated
in uGF).
P ROOF. We sketch the proof for the case that O is a uGF(=)
ontology; similar constructions work for the other cases. Suppose
that O is unravelling tolerant, and that q(~
x) is a rAQ. We construct
a Datalog6= program Π that, given an instance D, computes the
certain answers ~a of q on D given O, where w.l.o.g. we can restrict
our attention to answers ~a such that the set G of elements of ~a is
maximally guarded in D. By unravelling tolerance, it is enough to
check if O, Du |= q(~b), where ~b is the copy of ~a in bag(G) and
Du is the uGF-unravelling of D.
The Datalog6= program Π assigns to each maximally guarded
tuple ~a = (a1 , . . . , ak ) in D a set of types. Here, a type is a
maximally consistent set of uGF formulas with free variables in
x1 , . . . , xk , where the variable xi represents the element ai . It can
be shown that we only need to consider types with formulas of the
form φ or ¬φ, where φ is obtained from a subformula of O or q by
substituting a variable in x1 , . . . , xk for each of its free variables, or
φ is an atomic formula in the signature of O, q with free variables
in x1 , . . . , xk . In particular, the set of all types is finite. We further
restrict our attention to types θ that are realizable in some model of
O, i.e., there is a model B(θ) of O containing all elements of ~a that
is a model of each formula in θ under the interpretation xi 7→ ai .
The Datalog6= program Π ensures the following:
1. for any two maximally guarded tuples ~a = (a1 , . . . , ak ), ~b =
(b1 , . . . , bl ) in D that share an element, and any type θ assigned
to ~a there is a type θ0 assigned to ~b that is compatible to θ (intuitively, the two types agree on all formulas that only talk about
elements shared by ~a and ~b);
2. a tuple ~a = (a1 , . . . , ak ) is an answer to Π if all types assigned
to ~a contain q(x1 , . . . , xk ), or some maximally guarded tuple ~b
in D has no type assigned to it.
It can be shown that ~a is an answer to Π iff O, Du |= q(~a).
The interesting part is the “if” part. Suppose ~a = (a1 , . . . , ak )
is not an answer to Π. Then, each maximally guarded tuple ~b in D
is assigned to at least one type, and for some type θ∗ assigned to ~a
we have q(x1 , . . . , xk ) ∈
/ θ∗ . We use this type assignment to label
each maximally guarded tuple ~b of Du with a type θ~b so that (1) for
each maximally guarded tuple ~c of Du that shares an element with
~b the two types θ~ and θ~c are compatible; and (2) θ∗ = θ~a∗ , where
b
~a∗ is the copy of ~a in G = {a1 , . . . , ak }. We can now show that
the interpretation A obtained from Du by hooking B(θ~b ) to Du ,
for all maximally guarded tuples ~b of Du , is a model of O and Du
with A 6|= q(~a∗ ).
5.
DICHOTOMIES
We prove dichotomies between PT IME and CO NP for query
evaluation in the five ontology languages displayed in the bottommost part of Figure 1. In fact, the dichotomy is even between
Datalog6= -rewritability and CO NP. The proof establishes that for
ontologies O formulated in any of these languages, CQ-evaluation
w.r.t. O is Datalog6= -rewritable iff it is in PT IME iff O is unravelling tolerant iff O is materializable for the class of (possibly infinite) cg-tree decomposable instances iff O is materializable and
that, if none of this is the case, CQ-evaluation w.r.t. O is CO NPhard. The main step towards the dichotomy result is provided by
the following theorem.
bag(t0 ) and such that ĥt,t0 (a)↑ = a↑ for all a ∈ dom(Du ).
(This is trivial in the example above.) It is for this property that
we need that Du is obtained from D using maximal guarded
sets only and the assumption that Gi−1 6= Gi+1 . It follows that
if tail(t) = tail(t0 ) then the same rAQs are entailed at bag(t)
and bag(t0 ) in Du .
Theorem 6 Let O be an ontology formulated in one of uGF(1),
−
uGF− (1, =), uGF−
2 (2), uGC2 (1, =), or an ALCHIF ontology
of depth 2. If O is materializable for the class of (possibly infinite)
cg-tree decomposable instances D with sig(D) ⊆ sig(O), then O
is unravelling tolerant.
2. Homomorphism preservation: if there is a homomorphism h
from instance D to instance D0 then O, D |= q(~a) entails
O, D0 |= q(h(~a)). Ontologies in uGF(1) and uGF−
2 (2) have
this property as they do not use equality nor counting. Because
of homomorphism preservation the answers in Du to rAQs are
invariant under moving from Du to Du+ . Note that the remaining ontology languages in Theorem 6 do not have this property.
P ROOF. We sketch the proof for uGF(1) and uGF−
2 (2) ontologies O and then discuss the remaining cases. Assume that O satisfies the precondition from Theorem 6. Let D be an instance and
Du its uGF unravelling. Let ~a be a tuple in a maximal guarded set
G in D and ~b be the copy in dom(bag(G)) of ~a. Further let q be an
rAQ such that O, Du 6|= q(~b). We have to show that O, D 6|= q(~a).
Using the condition that O is materializable for the class of cg-tree
decomposable instances D with sig(D) ⊆ sig(O), it can be shown
that there exists a materialization B of O and Du .
By Lemma 1 we may assume that B is a forest model which
is obtained from Du by hooking cg-tree decomposable Bbag(t) to
maximal guarded bag(t) in Du . Now we would like to obtain a
model of O and the original D by hooking for any maximal guarded
G in D the interpretation Bbag(G) to D rather than to Du . However,
the resulting model is then not guaranteed to be a model of O. The
following example illustrates this. Let O contain
Now using that Du+ is materializable w.r.t. O one can uniformize
a materialization Bu+ of Du+ that is a forest model in such a way
that the automorphisms ĥt,t0 for tail(t) = tail(t0 ) extend to automorphisms of the resulting model Bu∗ which also still satisfies O.
In the example, after uniformization all chains will behave in the
same way in the sense that every node receives an S-successor not
in A. We then obtain a forest model B∗ of D by hooking the in∗
a)
terpretations Bu∗
bag(G) to the maximal guarded sets G in D. (B , ~
u∗ ~
∗
and (B , b) are guarded bisimilar. Thus B is a model of O and
B∗ 6|= q(~a), as required.
For uGF− (1, =) and uGC−
2 (1, =) the intermediate step of constructing Du+ is not required as sentences have smaller depth and
no uniformization is needed to satisfy the ontology in the new
model. For ALCHIF ontologies of depth 2 uniformization by
constructing Du+ is needed and has to be done carefully to preserve functionality when adding copies of entailed rAQs to Du .
∀x∃y(S(x, y) ∧ A(y)),
We can now prove our main dichotomy result.
and for ϕ(x) = ∃z(S(x, z) ∧ ¬A(z))
∀xy(R(x, y) → (ϕ(x) → ϕ(y))
Thus in every model of O each node has an S-successor in A and
having an S-successor that is not in A is propagated along R. O
is unravelling tolerant. Consider the instance D from Example 5
(1) depicted here again with the maximal guarded sets G1 , G2 , G3 .
B
A
A
¬A
A
¬A
A
G1
G3
G2
A
A
G2
G3
1. O is materializable;
2. O is materializable for the class of cg-tree decomposable instances D with sig(D) ⊆ sig(O);
A
A
¬A
A
¬A
A
3. O is unravelling tolerant;
¬A
A
4. query evaluation w.r.t. O is Datalog6= -rewritable
(and Datalog-rewritable if O is formulated in uGF);
D
G1
Theorem 7 Let O be an ontology formulated in one of uGF(1),
−
uGF− (1, =), uGF−
2 (2), uGC2 (1, =), or an ALCHIF ontology
of depth 2. Then the following conditions are equivalent (unless
PT IME = NP):
¬A
A
We have seen that the unravelling Du of D consists of three chains.
An example of a forest model B of O and Du is given in the figure. Even in this simple example a naive way of hooking the models BGi , i = 1, 2, 3, to the original instance D will lead to an
interpretation not satisfying O as the propagation condition for Ssuccessors not in A will not be satisfied. To ensure that we obtain a
model of O we first define a new instance Du+ ⊇ Du by adding to
each maximal guarded set in Du a copy of any entailed rAQ. The
following facts are needed for this to work:
1. Automorphisms: for any t, t0 ∈ T (D) with tail(t) = tail(t0 )
there is an automorphism ĥt,t0 of Du mapping bag(t) onto
5. query evaluation w.r.t. O is in PT IME.
Otherwise, query evaluation w.r.t. O is CO NP-hard.
P ROOF. (1) ⇒ (2) is not difficult to establish by a compactness
argument. (2) ⇒ (3) is Theorem 6. (3) ⇒ (4) is Theorem 5. (4)
⇒ (5) is folklore. (5) ⇒ (1) is Theorem 3 (assuming PT IME 6=
NP).
The qualification ‘with sig(D) ⊆ sig(O)’ in Point 2 of Theorem 7
can be dropped without compromising the correctness of the theorem, and the same is true for Theorem 6. It will be useful, though,
in the decision procedures developed in Section 8.
6.
CSP-HARDNESS
We establish the four CSP-hardness results displayed in the middle part of Figure 1, starting with a formal definition of CSPhardness. In addition, we derive from the existence of CSPs in
PT IME that are not Datalog definable the existence of ontologies in
any of these languages with PT IME query evaluation that are not
Datalog6= rewritability.
Let A be an instance. The constraint satisfaction problem
CSP(A) is to decide, given an instance D, whether there is a homomorphism from D to A, which we denote with D → A. In
this context, A is called the template of CSP(A). We will generally and w.l.o.g. assume that relations in sig(A) have arity at
most two and that the template A admits precoloring, that is, for
each a ∈ dom(A), there is a unary relation symbol Pa such that
Pa (b) ∈ A iff b = a [19]. It is known that for every template A,
there is a template A0 of this form such that CSP(A) is polynomially equivalent to CSP(A0 ) [39]. We use coCSP(A) to denote the
complement of CSP(A).
Definition 4 Let L be an ontology language and Q a class of
queries. Then Q-evaluation w.r.t. L is CSP-hard if for every template A, there exists an L ontology O such that
1. there is a q ∈ Q such that coCSP(A) polynomially reduces to
evaluating the OMQ (O, q) and
2. for every q ∈ Q, evaluating the OMQ (O, q) is polynomially
reducible to coCSP(A).
It can be verified that a dichotomy between PT IME and CO NP
for Q-evaluation w.r.t. L ontologies implies a dichotomy between
PT IME and NP for CSPs, a notorious open problem known as the
Feder-Vardi conjecture [23, 7], when Q-evaluation w.r.t. L is CSPhard. As noted in the introduction, a tentative proof of the conjecture has recently been announced, but at the time this article is
published, its status still remains unclear.
The following theorem summarizes our results on CSP-hardness.
We formulate it for CQs, but remark that due to Theorem 4,
a dichotomy between PT IME and CO NP for any of the mentioned ontology languages and any query language from the set
{rAQ, CQ, UCQ} implies the Feder-Vardi conjecture.
Theorem 8 For any of the following ontology languages, CQevaluation w.r.t. L is CSP-hard:
uGF2 (1, =), uGF2 (2),
uGF2 (1, f ), and the class of ALCF ` ontologies of depth 2.
P ROOF. We sketch the proof for uGF2 (1, =) and then indicate
the modifications needed for uGF2 (1, f ) and ALCF ` ontologies
of depth 2. For uGF2 (2), the result follows from a corresponding
result in [42] for ALC ontologies of depth 3.
Let A be a template and assume w.l.o.g. that A admits precoloring. Let Ra be a binary relation for each a ∈ dom(A), and set
ϕ6=
a (x)
ϕ=
a (x)
= ∃y(Ra (x, y) ∧ ¬(x = y))
= ∃y(Ra (x, y) ∧ (x = y))
Then O contains
^
_ 6=
6=
¬(ϕ6=
ϕa (x))
∀x(
a (x) ∧ ϕa0 (x)) ∧
a6=a0
a
∀x(A(x) → ¬ϕ6=
when A(a) 6∈ A
a (x))
6=
0
∀xy(R(x, y) → ¬(ϕ6=
a (x) ∧ ϕa0 (y))) when R(a, a ) 6∈ A
=
∀xϕa (x)
for all a ∈ dom(A)
where A and R range over symbols in sig(A) of the respective
arity. A formula ϕ6=
a (x) being true at a constant c in an instance D
means that c is mapped to a ∈ dom(A) by a homomorphism from
D to A. The first sentence in O thus ensures that every node in D
is mapped to exactly one node in A and the second and third set
of sentences ensure that we indeed obtain a homomorphism. The
last set of sentences enforces that ϕ=
a (x) is true at every constant
c. This makes the disjunction in the first sentence ‘invisible’ to the
query (in which inequality is not available), thus avoiding that O
is CO NP-hard for trivial reasons. In the appendix, we show that O
satisfies Conditions 1 and 2 from Definition 4 where the query q
used in Condition 1 is q ← N (x) with N a fresh unary relation.
For uGF2 (1, f ), state that a binary relation F is a function
and that ∀xF (x, x). Now replace in O the formulas ϕ6=
a (x) by
∃y(Ra (x, y) ∧ ¬F (x, y)) and ϕ=
a (x) by ∃y(Ra (x, y) ∧ F (x, y)).
For ALCF ` of depth 2, replace in O the formulas ϕ6=
a (x) by
≥2
∃ yRa (x, y) and ϕ=
a (x) by ∃yRa (x, y). The resulting ontology
is equivalent to a ALCF ` ontology of depth 2.
It is known that for some templates A, CSP(A) is in PT IME while
coCSP(A) is not Datalog6= -definable [23]. Then CQ-evaluation
w.r.t. the ontologies O constructed from A in the proof of Theorem 4 is in PT IME, but not Datalog6= -rewritable.
Theorem 9 In any of the following ontology languages L there exist ontologies with PT IME CQ-evaluation which are not Datalog6= rewritable: uGF2 (1, =), uGF2 (2), uGF2 (1, f ), and the class of
ALCF ` ontologies of depth 2.
The ontology languages in Theorem 5 thus behave provably different from the languages for which we proved a dichotomy in
Section 5, since there PT IME query evaluation and Datalog6= rewritability coincide.
7.
NON-DICHOTOMY
ABILITY
AND
UNDECID-
We show that ontology languages that admit sentences of depth
2 as well as functions symbols tend to be computationally problematic as they do neither enjoy a dichotomy between PT IME
and CO NP nor decidability of meta problems such as whether
query evaluation w.r.t. a given ontology O is in PT IME, Datalog6= rewritable, or CO NP-hard, and whether O is materializable. We
actually start with these undecidability results.
Theorem 10 For the ontology languages uGF−
2 (2, f ) and
ALCIF ` of depth 2, it is undecidable whether for a given ontology O,
1. query evaluation w.r.t. O is in PT IME, Datalog6= -rewritable, or
CO NP-hard (unless PT IME = NP);
2. O is materializable.
P ROOF. The proof is by reduction of the undecidable finite rectangle tiling problem. To establish both Points 1 and 2, it suffices to
exhibit, for any such tiling problem P, an ontology OP such that
if P admits a tiling, then OP is not materializable and thus query
evaluation w.r.t. OP is CO NP-hard and if P admits no tiling, then
query evaluation w.r.t. OP is Datalog6= -rewritable and thus materializable (unless PT IME = NP).
The rectangle to be tiled is represented in input instances using the binary relations X and Y , and OP declares these relations
and their inverses to be functional. The main idea in the construction of OP is to verify the existence of a properly tiled grid in the
input instance by propagating markers from the top right corner
to the lower left corner. During the propagation, one makes sure
that grid cells close (that is, the XY-successor coincides with the
YX-successor) and that there is a tiling that satisfies the constraints
in P. Once the existence of a properly tiled grid is completed,
a disjunction is derived by OP to achieve non-materializability
and CO NP-hardness. The challenge is to implement this construction such that when P has no solution (and thus the verification
of a properly tiled grid can never complete), OP is Datalog6= rewritable. In fact, achieving this involves a lot of technical subtleties.
A central issue is how to implement the markers (as formulas
with one free variable) that are propagated through the grid during the verification. The markers must be designed in a way so
that they cannot be ‘preset’ in the input instance as this would
make it possible to prevent the verification of a (possibly defective) part of the input. In ALCIF ` , we use formulas of the form
∃=1 yP (x, y) while additionally stating in OP that ∀x∃yP (x, y).
Thus, the choice is only between whether a constant has exactly
one P -successor (which means that the marker is set) or more than
one P -successor (which means that the marker is not set). Clearly,
this difference is invisible to queries and we cannot preset a marker
in an input instance in the sense that we make it true at some constant. We can, however, easily make the marker false at a constant
c by adding two P -successors to c in the input instance. It seems
that this effect, which gives rise to many technical complications,
can only be avoided by using marker formulas with higher quantifier depth which would result in OP not falling within ALCIF `
depth 2. For uGF−
2 (2, f ) we work with ¬∃y(P (x, y) ∧ ¬F (x, y)),
where F is a function for which we state ∀xF (x, x) (as in the CSP
encoding).
Full proof details can be found in the appendix. We only mention
that closing of a grid cell is verified by using marker formulas as
second-order variables.
Theorem 11 For the ontology languages uGF−
2 (2, f ) and
ALCIF ` of depth 2, there is no dichotomy between PT IME and
CO NP (unless PT IME = CO NP).
By Ladner’s theorem [38], there is a non-deterministic polynomial time Turing machine (TM) whose word problem is neither in
PT IME nor NP-hard (unless PT IME = CO NP). Ideally, we would
like to reduce the word problem of such TMs to prove Theorem 11.
However, this does not appear to be easily possible, for the following reason. In the reduction, we use a grid construction and marker
formulas as in the proof of Theorem 10, with the grid providing
the space in which the run of the TM is simulated and markers representing TM states and tape symbols. We cannot avoid that the
markers can be preset either positively or negatively in the input
(depending on the marker formulas we choose), which means that
some parts of the run are not ‘free’, but might be predetermined or
at least constrained in some way. We solve this problem by first
establishing an appropriate variation of Ladner’s theorem.
We consider non-deterministic TMs M with a single one-sided
infinite tape. Configurations of M are represented by strings vqw,
where q is the state, and v and w are the contents of the tape to
the left and to the right of the tape head, respectively. A partial
configuration of M is obtained from a configuration γ of M by
replacing some or all symbols of γ by a wildcard symbol ?. A
partial configuration γ̃ matches a configuration γ if it has the same
length and agrees with γ on all non-wildcard symbols. A partial
run of M is a finite sequence γ̃0 , . . . , γ̃m of partial configurations
of M of the same length. It is a run if each γ̃i is a configuration,
and it matches a run γ0 , . . . , γn if m = n and each γ̃i matches γi .
A run is accepting if its last configuration has an accepting state.
Note that runs need not start in any specific configuration (unless
specified by a partial run that they extend). The run fitting problem
for M is to decide whether a given partial run of M matches some
accepting run of M . It is easy to see that for any TM M , the run
fitting problem for M is in NP. We prove the following result in the
appendix by a careful adaptation of the proof of Ladner’s theorem
given in [2].
Theorem 12 There is a non-deterministic Turing machine whose
run fitting problem is neither in PT IME nor NP-hard (unless
PT IME = NP).
Now Theorem 11 is a consequence of the following lemma.
Lemma 4 For every Turing machine M , there is a uGF−
2 (2, f )
ontology O and an ALCIF ` ontology O of depth 2 such that the
following hold, where N is a distinguished unary relation:
1. there is a polynomial reduction of the run fitting problem for M
to the complement of evaluating the OMQ (O, q ← N (x));
2. for every UCQ q, evaluating the OMQ (O, q) is polynomially
reducible to the complement of the run fitting problem for M .
To establish Lemma 4, we re-use the ontology OP from the proof
of Theorem 10, using a trivial rectangle tiling problem. When the
existence of the grid has been verified, instead of triggering a disjunction as before, we now start a simulation of M on the grid.
For both ALCIF ` and uGF−
2 (2, f ), we represent states q and tape
symbols G using the same formulas as in the CSP encoding of homomorphisms. Thus, for ALCIF ` we use formulas ∃≥2 yq(x, y)
and ∃≥2 yG(x, y), respectively, using q and G as binary relations.
Note that here the encoding ∃=1 yq(x, y) from the tiling problem
does not work because states and tape symbols can be positively
preset in the input instance rather than negatively, which is in correspondence with the run fitting problem.
8.
DECISION PROBLEMS
We study the decidability and complexity of the problem to decide whether a given ontology admits PT IME query evaluation. Realistically, we can only hope for positive results in cases where
there is a dichotomy between PT IME and CO NP: first, we have
shown in Section 7 that for cases with provably no such dichotomy,
meta problems are typically undecidable; and second, it does not
seem very likely that in the CSP-hard cases, one can decide whether
an ontology admits PT IME query evaluation without resolving the
dichotomy question and thus solving the Feder-Vardi conjecture.
Our main results are E XP T IME-completeness of deciding PT IMEquery evalutation of ALCHIQ ontologies of depth one (the same
complexity as for satisfiability) and a NE XP T IME upper bound for
uGC−
2 (1, =) ontologies. Note that, in both of the considered languages, PT IME-query evalutation coincides with rewritability into
Datalog6= . We remind the reader that according to our experiments,
a large majority of real world ontologies are ALCHIQ ontologies
of depth 1. We also show that for ALC ontologies of depth 2, the
mentioned problem is NE XP T IME-hard.
Since the ontology languages relevant here admit at most binary
relations, an interpretation B is cg-tree decomposable if and only
if the undirected graph GB = {{a, b} | R(a, b) ∈ B, a 6= b} is
a tree. For simplicity, we speak of tree interpretations and of tree
instances, defined likewise. The outdegree of B is the outdegree
of GB .
Theorem 13 For uGC−
2 (1, =) ontologies, deciding whether query
evaluation w.r.t. a given ontology is in PT IME (equivalently:
rewritable into Datalog6= ) is in NE XP T IME. For ALCHIQ ontologies of depth 1, this problem is in E XP T IME-complete.
The main insight underlying the proof of Theorem 13 is that for
ontologies formulated in the mentioned languages, materializability (which by Theorem 7 coincides with PT IME query evaluation)
already follows from the existence of materializations for tree instances of depth 1. We make this precise in the following lemma.
Given a tree interpretation B and a ∈ dom(B), define the 1neighbourhood B≤1
a of a in B as B|X , where X is the union of
all guarded sets in B that contain a. B is a bouquet with root a if
B≤1
= B and it is irreflexive if there exists no atom of the form
a
R(b, b) in B.
Lemma 5 Let O be a uGC−
2 (1, =) ontology (resp. an ALCHIQ
ontology of depth 1). Then O is materializable iff O is materializable for the class of all (respectively, all irreflexive) bouquets D of
outdegree ≤ |O| with sig(D) ⊆ sig(O).
P ROOF. We require some notation. An instance D is called Osaturated for an ontology O if for all facts R(~a) with ~a ⊆ dom(D)
such that O, D |= R(~a) it follows that R(~a) ∈ D. For every O
and instance D there exists a unique minimal (w.r.t. set-inclusion)
O-saturated instance DO ⊇ D. We call DO the O-saturation of
D. It is easy to see that there is a materialization of O and D if and
only if there is a materialization of O and the O-saturation of D.
We first prove Lemma 5 for uGC−
2 (1, =) ontologies O and without the condition on the outdegree. Let Σ0 = sig(O) and assume
that O is materializable for the class of all Σ0 -bouquets. By Theorem 7 it suffices to prove that O is materializable for the class
of Σ0 -tree instances. Fix a Σ0 -tree instance D that is consistent
w.r.t. O. We may assume that D is O-saturated. Note that a forest model materialization B of an ontology O and an O-saturated
instance F consists of F and tree interpretations Ba , a ∈ dom(F),
that are hooked to F at a. Take for any a ∈ dom(D) the bouquet D≤1
with root a and hook to D at a the interpretation Ba
a
that is hooked to D≤1
a at a in a forest model materialization B of
≤1
D≤1
a and O (such a forest model materialization exists since Da
is materializable). Denote by A the resulting interpretation. Using
the condition that O is a uGC−
2 (1, =) ontology it is not difficult to
prove that A is a materialization of O and D.
We now prove the restriction on the outdegree. Assume O is
given. Let D be a bouquet with root a of minimal outdegree such
that there is no materialization of O and D. We show that the outdegree of D does not exceed |O|. Assume the outdegree of D is at
least three (otherwise we are done). We may assume that D is Osaturated. Take for any formula χ = ∃≥n z1 α(z1 , z2 ) ∧ ϕ(z1 , z2 )
that occurs as a subformula in O the set Zχ of all b 6= a such that
D |= α(b, a) ∧ ϕ(b, a). Let Zχ0 = Zχ if |Zχ | ≤ n + 1; otherwise
let Zχ0 be a subset of Zχ of cardinality n + 1. Let D0 be the restriction D|Z of D to the union Z of all Zχ0 and {a}. We show that
there exists no materialization of D0 and O. Assume for a proof by
contradiction that there is a materialization B of D0 . Let B0 be the
union of D∪B and the interpretations Bb , b ∈ dom(D)\(Z∪{a}),
that are hooked to D|{a,b} at b in a forest model materialization of
D|{a,b} . We show that B0 is a materialization of D and O (and thus
derive a contradiction). Using the condition that D is O-saturated
one can show that the restriction B0|dom(D) of B0 to dom(D) coincides with D. Using the condition that O has depth 1 it is now
easy to show that B0 is a model of O. It is a materialization of D
and O since it is composed of materializations of subinstances of
D and O.
The proof that irreflexive bouquets are sufficient for ontologies
of depth 1 in ALCHIQ is similar to the proof above and uses the
fact that one can always unravel models of ALCHIQ ontologies
into irreflexive tree models.
We now develop algorithms that decide PT IME query evaluation
by checking the conditions given in Lemma 5, starting with the
(easier) case of ALCHIQ. Let D be a bouquet with root a. Call a
bouquet B ⊇ D a 1-materialization of O and D if
• there exists a model A of O and D such that B = A≤1
a ;
• for any model A of D and O there exists a homomorphism from
B to A that preserves dom(D).
It turns out that, when checking materializability, not only is it sufficient to consider bouquets instead of unrestricted instances, but
additionally one can concentrate on 1-materializations of bouquets.
Lemma 6 Let O be an ALCHIQ ontology of depth 1. If for all
irreflexive bouquets D that are consistent w.r.t. O, of outdegree ≤
|O|, and satisfy sig(D) ⊆ sig(O) there is a 1-materialization of O
and D, then O is materializable for the class of all such bouquets.
P ROOF. For brevity, we call an irreflexive bouquet F relevant if it is consistent w.r.t. O, of outdegree ≤ |O| and satisfies sig(F) ⊆ sig(O). An irreflexive 1-materializability witness
(F, a, B) consists of a relevant irreflexive bouquet F with root a
and a 1-materialization B of F w.r.t. O. One can show that B is an
irreflexive tree interpretation.
Now let D be a relevant irreflexive bouquet with root a and assume that D is 1-materializable w.r.t. O. We have to show that
there exists a materialization of O and D. Note that for every
relevant irreflexive bouquet F, there is a 1-materializability witness (F, a, B). We construct the desired materialization step-bystep using these pairs also memorizing sets of frontier elements
that have to be expanded in the next step. We start with the irreflexive 1-materializability witness (D, a, B) and set B0 = B
and F0 = dom(B) \ {a}. Then we construct a sequence of irreflexive tree interpretations B0 ⊆ B1 ⊆ . . . and frontier sets
Fi+1 ⊆ dom(Bi+1 ) \ dom(Bi ) inductively as follows: given Bi
and Fi , take for any b ∈ Fi its predecessor a in Bi and an irreflexive 1-materializability witness (Bi|{a,b} , b, Bb ) and set
[
[
Bi+1 := Bi ∪
Bb
Fi+1 :=
dom(Bb ) \ {b}
b∈Fi
b∈Fi
∗
Let B be the union of all Bi . We show that B is a materialization of O and D. B is a model of O by construction since O
is an ALCHIQ ontology of depth 1. Consider a model A of O
and D. It suffices to construct a homomorphism h from B∗ to
A that preserves dom(D). We may assume that A is an irreflexive tree interpretation. We construct h as the limit of a sequence
h0 , . . . of homomorphisms from Bi to A. By definition, there
exists a homomorphism h0 from B0 to A≤1
a preserving dom(D).
Now, inductively, assume that hi is a homomorphism from Bi to
A. Assume c has been added to Bi in the construction of Bi+1 .
Then there exists b ∈ Fi and its predecessor a in Bi such that
c ∈ dom(Bb ) \ {b}, where Bb is the irreflexive tree interpretation
that has been added to Bi as the last component of the irreflexive
model pair (Bi|{a,b} , b, Bb ). But then, as Bb is a 1-materialization
of Bi|{a,b} and hi is injective on Bi|{a,b} (since A is irreflexive), we
can expand the homomorphism hi to a homomorphism to A with
domain dom(Bi ) ∪ {c}. Thus, we can expand hi to a homomorphism from Bi+1 to A.
Lemma 5 and Lemma 6 imply that an ALCHIQ ontology O of
depth 1 enjoys PT IME query evaluation if and only if all irreflexive bouquets D that are consistent w.r.t. O, of outdegree ≤ |O|,
and satisfy sig(D) ⊆ sig(O) have a 1-materialization w.r.t. O.
The latter condition can be checked in deterministic exponential
time since the satisfiability problem for ALCHIQ ontologies is
in E XP T IME. Moreover, there are only exponentially many relevant bouquets. We have thus proved the E XP T IME upper bound in
Theorem 13. A matching lower bound can be proved by a straightforward reduction from satisfiability.
The following example shows that, in contrast to ALCHIQ
depth 1, for uGC−
2 (1, =) the existence of 1-materializations does
not guarantee materializability of bouquets.
Example 7 We use ∃6= yW (x, y) to abbreviate ∃y(W (x, y)∧(x 6=
y)) and likewise for ∃6= yW (y, x). Let S, S 0 , R, R0 be binary relation symbols and O the uGF−
2 (1, =) ontology O that contains
∀x S(x, x) → (R(x, x) → (∃6= yR(x, y) ∨ ∃6= yS(x, y)))
∀x(∃6= yW (y, x) → ∃yW 0 (x, y))
where (W, W 0 ) range over {(R, R0 ), (S, S 0 )}. Observe that for
the instance D = {(S(a, a), R(a, a)} and the Boolean UCQ
q ← R0 (x, y) ∨ S 0 (x, y),
we have O, D |= q. Also, for qR ← R0 (x, y) and qS ← S 0 (x, y)
we have O, D 6|= qR and O, D 6|= qS . Thus, O is not materializable. It is, however, easy to show that for every bouquet D there
exists a 1-materialization of D w.r.t. O.
In uGC−
2 (1, =), we thus have to check unrestricted materializability of bouquets, instead of 1-materializability. In fact, it suffices
to consider materializations that are tree interpretations. To decide
the existence of such materialization, we use a mosaic approach.
In each mosaic piece, we essentially record a 1-neighborhood of
the materialization, a 1-neighborhood of a model of the bouquet
and ontology, and a homomorphism from the former to the latter.
We then identify certain conditions that characterize when a set of
mosaics can be assembled into a materialization in a way that is
similar to the model construction in the proof of Lemma 6. There
are actually two different kinds of mosaic pieces that we use, with
one kind of piece explicitly addressing reflexive loops which, as illustrated by Example 7, are the reason why we cannot work with
1-materializations. The decision procedure then consists of guessing a set of mosaics and verifying that the required conditions are
satisfied. Details are in the appendix.
Theorem 13 only covers ontology languages of depth 1. It would
be desirable to establish decidability also for ontology languages of
depth 2 that enjoy a dichotomy between PT IME and CO NP, such
as uGF−
2 (2). The following example shows that this requires more
sophisticated techniques than those used above. In particular, materializability of bouquets does not imply materializability.
Example 8 We give a family of ALC-ontologies (On )n≥0 of
depth 2 such that each On is materializable for the class of tree interpretations of depth at most 2n − 1 while it is not materializable.
The idea is that any instance D that witnesses non-materializability
of On must contain an R-chain of length 2n , R a binary relation.
The presence of this chain is verified by propagating a marker upwards along the chain. To avoid that O is 1-materializable, we
represent this marker by a universally quantified formula and also
hide some other unary predicates in the same way. For each unary
predicate P , let HP (x) denote the formula ∀y(S(x, y) → P (y))
and include in On the sentence ∀x∃y(S(x, y) ∧ P (y)). The remaining sentences in On are:
X 1 (x) ∧ · · · ∧ X n (x) → HV (x)
Xi (x) ∧ ∃R.(Xi (y) ∧ Xj (y)) → Hoki (x)
X i (x) ∧ ∃R.(X i (y) ∧ Xj (y)) → Hoki (x)
Xi (x) ∧ ∃R.(X i (y) ∧ X1 (y) ∧ · · · ∧ Xi−1 (y)) → Hoki (x)
X i (x) ∧ ∃R.(Xi (y) ∧ X1 (y) ∧ · · · ∧ X i−1 (y)) → Hoki (x)
Hok1 (x) ∧ · · · ∧ Hokn (x) ∧ ∃R.HV (y) → HV (x)
∃R.Xi (y) ∧ ∃R.X i (y) → ⊥
X1 (x) ∧ · · · ∧ Xn (x) ∧ HV (x) → B1 (x) ∨ B2 (x)
where x is universally quantified, ∃R.ϕ(y) is an abbreviation for
∃y(R(x, y)∧ϕ(y)), and i ranges over 1..n. Note that X1 , . . . , Xn
and X 1 , . . . , Xn represent a binary counter and that lines two to
five implement incrementation of this counter. The second last formula is necessary to avoid that multiple successors of a node interact in undesired ways. On instances that contain no R-chain of
length 2n , a materialization can be constructed by a straightforward chase procedure.
We also observe that the ideas from Example 8 gives rise to a NE X P T IME lower bound.
Theorem 14 For ALC ontologies of depth 2, deciding whether
query-evaluation is in PT IME is NE XP T IME-hard (unless
P TIME = CO NP).
We remark that the decidability of PT IME query evaluation of ALC
ontologies of depth 2 remains open.
9.
CONCLUSION
Perhaps the most surprizing result of our analysis is that it is
possible to escape Ladner’s Theorem and prove a PT IME/CO NP dichotomy for query evaluation for rather large subsets of the guarded
fragment that cover almost all practically relevant DL ontologies.
This result comes with a characterization of PT IME query evaluation in terms of materializability and unravelling tolerance, with the
guarantee that PT IME query evaluation coincides with Datalog6= rewritability, and with decidability of (and complexity results for)
meta problems such as deciding whether a given ontology enjoys
PT IME query evaluation. Our study also shows that when we increase the expressive power in seemingly harmless ways, then often there is provably no PT IME/CO NP dichotomy or one obtains
CSP-hardness. The proof of the non-dichotomy results comes with
a variation of Ladner’s Theorem that could prove useful in other
contexts where some form of precoloring of the input is unavoidable, such as in consistent query answering [43].
There are a number of interesting future research questions.
The main open questions regarding dichotomies are whether the
PT IME/CO NP dichotomy can be generalized from uGF−
2 (2) to
uGF− (2) and whether the CSP-hardness results can be sharpened
to either CSP-equivalence results (this is known for ALC ontologies of depth 3 [42]) or to non-dichotomy results. Also of interest is the complexity of deciding PT IME query evaluation for
uGF(1), where the characterization of PT IME query evaluation via
hom-universal models fails. Improving our current complexity results to tight complexity bounds for P TIME query evaluation for
ALCHIF ontologies of depth 2 and uGF−
2 (2) ontologies appears
to be challenging as well. It would also be interesting to study the
case where invariance under disjoint union is not guaranteed (as we
have observed, the complexities of CQ and UCQ evaluation might
then diverge), and to add the ability to declare in an ontology that a
binary relation is transitive.
10.
ACKNOWLEDGMENTS
André Hernich, Fabio Papacchini, and Frank Wolter were supported by EPSRC UK grant EP/M012646/1. Carsten Lutz was supported by ERC CoG 647289 CODA.
11.
REFERENCES
[1] H. Andréka, I. Németi, and J. van Benthem. Modal
languages and bounded fragments of predicate logic. J.
Philosophical Logic, 27(3):217–274, 1998.
[2] S. Arora and B. Barak. Computational Complexity - A
Modern Approach. Cambridge University Press, 2009.
[3] A. Artale, D. Calvanese, R. Kontchakov, and
M. Zakharyaschev. The DL-Lite family and relations. J. of
Artifical Intelligence Research, 36:1–69, 2009.
[4] F. Baader, Deborah, D. Calvanese, D. L. McGuiness,
D. Nardi, and P. F. Patel-Schneider, editors. The Description
Logic Handbook. Cambridge University Press, 2003.
[5] V. Bárány, G. Gottlob, and M. Otto. Querying the guarded
fragment. Logical Methods in Computer Science, 10(2),
2014.
[6] V. Bárány, B. ten Cate, and L. Segoufin. Guarded negation. J.
ACM, 62(3):22:1–22:26, 2015.
[7] L. Barto. Constraint satisfaction problem and universal
algebra. SIGLOG News, 1(2):14–24, 2014.
[8] C. Beeri and P. A. Bernstein. Computational problems
related to the design of normal form relational schemas.
ACM Trans. Database Syst., 4(1):30–59, 1979.
[9] C. Beeri and M. Y. Vardi. A proof procedure for data
dependencies. J. ACM, 31(4):718–741, 1984.
[10] M. Bienvenu and M. Ortiz. Ontology-mediated query
answering with data-tractable description logics. In Proc. of
Reasoning Web, pages 218–307, 2015.
[11] M. Bienvenu, B. ten Cate, C. Lutz, and F. Wolter.
Ontology-based data access: A study through disjunctive
datalog, csp, and MMSNP. ACM Trans. Database Syst.,
39(4):33:1–33:44, 2014.
[12] A. Calì, G. Gottlob, and M. Kifer. Taming the infinite chase:
Query answering under expressive relational constraints. J.
Artif. Intell. Res. (JAIR), 48:115–174, 2013.
[13] A. Calì, G. Gottlob, and A. Pieris. Towards more expressive
ontology languages: The query answering problem. Artif.
Intell., 193:87–128, 2012.
[14] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and
R. Rosati. Data complexity of query answering in description
logics. Artificial Intelligence, 195:335–360, 2013.
[15] D. Calvanese, G. De Giacomo, and M. Lenzerini. On the
decidability of query containment under constraints. In Proc.
of PODS, pages 149–158, 1998.
[16] D. Calvanese, G. D. Giacomo, D. Lembo, M. Lenzerini, and
R. Rosati. Tractable reasoning and efficient query answering
in description logics: The DL-Lite family. J. Autom.
Reasoning, 39(3):385–429, 2007.
[17] D. Calvanese, G. D. Giacomo, M. Lenzerini, and M. Y.
Vardi. View-based query processing and constraint
satisfaction. In LICS, pages 361–371, 2000.
[18] D. Calvanese, G. D. Giacomo, M. Lenzerini, and M. Y.
Vardi. View-based query containment. In PODS, pages
56–67, 2003.
[19] D. Cohen and P. Jeavons. The complexity of constraint
languages, chapter 8. Elsevier, 2006.
[20] A. Deutsch, A. Nash, and J. B. Remmel. The chase revisited.
In M. Lenzerini and D. Lembo, editors, Proc. of PODS,
pages 149–158. ACM, 2008.
[21] R. Fagin, B. Kimelfeld, and P. G. Kolaitis. Dichotomies in
the complexity of preferred repairs. In Proc. of PODS, pages
3–15, 2015.
[22] R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. Data
exchange: semantics and query answering. Theor. Comput.
Sci., 336(1):89–124, 2005.
[23] T. Feder and M. Y. Vardi. The computational structure of
monotone monadic SNP and constraint satisfaction: A study
through datalog and group theory. SIAM J. Comput.,
28(1):57–104, 1998.
[24] G. Fontaine. Why is it hard to obtain a dichotomy for
consistent query answering? ACM Trans. Comput. Log.,
16(1):7:1–7:24, 2015.
[25] C. Freire, W. Gatterbauer, N. Immerman, and A. Meliou. The
complexity of resilience and responsibility for self-join-free
conjunctive queries. PVLDB, 9(3):180–191, 2015.
[26] G. Gottlob, S. Kikot, R. Kontchakov, V. V. Podolskii,
T. Schwentick, and M. Zakharyaschev. The price of query
rewriting in ontology-based data access. Artif. Intell.,
213:42–59, 2014.
[27] G. Gottlob, M. Manna, and A. Pieris. Polynomial rewritings
for linear existential rules. In Proc. of IJCAI, pages
2992–2998, 2015.
[28] G. Gottlob, G. Orsi, and A. Pieris. Query rewriting and
optimization for ontological databases. ACM Trans.
Database Syst., 39(3):25:1–25:46, 2014.
[29] E. Grädel. On the restraining power of guards. J. Symb. Log.,
64(4):1719–1742, 1999.
[30] E. Grädel and M. Otto. The freedoms of (guarded)
bisimulation. In Johan van Benthem on Logic and
Information Dynamics, pages 3–31. 2014.
[31] M. Kaminski, Y. Nenov, and B. C. Grau. Datalog
rewritability of disjunctive datalog programs and non-horn
ontologies. Artif. Intell., 236:90–118, 2016.
[32] Y. Kazakov. A polynomial translation from the two-variable
guarded fragment with number restrictions to the guarded
fragment. In Proc. of JELIA, pages 372–384, 2004.
[33] B. Kimelfeld. A dichotomy in the complexity of deletion
propagation with functional dependencies. In M. Benedikt,
M. Krötzsch, and M. Lenzerini, editors, Proc. of PODS,
pages 191–202. ACM, 2012.
[34] R. Kontchakov and M. Zakharyaschev. An introduction to
description logics and query rewriting. In Proc. of Reasoning
Web, pages 195–244, 2014.
[35] P. Koutris and D. Suciu. A dichotomy on the complexity of
consistent query answering for atoms with simple keys. In
Proc. of ICDT, pages 165–176. OpenProceedings.org, 2014.
[36] P. Koutris and J. Wijsen. The data complexity of consistent
query answering for self-join-free conjunctive queries under
primary key constraints. In Proc. of PODS, pages 17–29,
2015.
[37] M. Krötzsch. OWL 2 profiles: An introduction to lightweight
ontology languages. In Proc. of Reasoning Web, pages
112–183, 2012.
[38] R. E. Ladner. On the structure of polynomial time
reducibility. J. ACM, 22(1):155–171, 1975.
[39] B. Larose and P. Tesson. Universal algebra and hardness
results for constraint satisfaction problems. Theor. Comput.
Sci., 410(18):1629–1647, 2009.
[40] A. Y. Levy and M. Rousset. Combining horn rules and
description logics in CARIN. Artif. Intell.,
104(1-2):165–209, 1998.
[41] C. Lutz, I. Seylan, and F. Wolter. Ontology-based data access
with closed predicates is inherently intractable(sometimes).
In Proc. of IJCAI, pages 1024–1030, 2013.
[42] C. Lutz and F. Wolter. Non-uniform data complexity of
query answering in description logics. In Proc. of KR, 2012.
[43] C. Lutz and F. Wolter. On the relationship between consistent
query answering and constraint satisfaction problems. In
Proc. of ICDT, pages 363–379, 2015.
[44] J. Minker, editor. Foundations of Deductive Databases and
Logic Programming. Elsevier, 1988.
[45] M. Mugnier and M. Thomazo. An introduction to
ontology-based query answering with existential rules. In
Proc. of Reasoning Web, pages 245–278, 2014.
[46] M. Ortiz, D. Calvanese, and T. Eiter. Data complexity of
query answering in expressive description logics via
tableaux. Journal of Automated Reasoning, 41(1):61–98,
2008.
[47] A. Poggi, D. Lembo, D. Calvanese, G. D. Giacomo,
M. Lenzerini, and R. Rosati. Linking data to ontologies. J.
Data Semantics, 10:133–173, 2008.
[48] I. Pratt-Hartmann. Complexity of the guarded two-variable
fragment with counting quantifiers. J. Log. Comput.,
17(1):133–155, 2007.
[49] A. Rafiey, J. Kinne, and T. Feder. Dichotomy for digraph
homomorphism problems. CoRR, abs/1701.02409, 2017.
[50] A. Schaerf. On the complexity of the instance checking
problem in concept languages with existential quantification.
J. of Intel. Inf. Systems, 2:265–278, 1993.
[51] D. Suciu, D. Olteanu, C. Ré, and C. Koch. Probabilistic
Databases. Synthesis Lectures on Data Management.
Morgan & Claypool Publishers, 2011.
[52] P. L. Whetzel, N. F. Noy, N. H. Shah, P. R. Alexander,
C. Nyulas, T. Tudorache, and M. A. Musen. BioPortal:
enhanced functionality via new web services from the
national center for biomedical ontology to access and use
ontologies in software applications. Nucleic acids research,
39(suppl 2):W541–W545, 2011.
APPENDIX
A.
INTRODUCTION TO DESCRIPTION
LOGIC
We give a brief introduction to the syntax and semantics of DLs
and establish their relationship to the guarded fragment of FO. We
consider the DL ALC and its extensions by inverse roles, role inclusions, qualified number restrictions, functional roles, and local
functionality. Recall that ALC-concepts are constructed according
to the rule
C, D := > | ⊥ | A | C u D | C t D | ¬C | ∃R.C | ∀R.C
where A ranges over unary relations and R ranges over binary relations. DLs extended by inverse roles (denoted in the name of a
DL by the letter I) admit, in addition, inverse relations denoted by
R− , with R a relation. Thus, in ALCI inverse relations can be
used in place of relations in any ALC concept. DLs extended by
qualified number restrictions (denoted by Q) admit concepts of the
form (≥ n R C), (= n R C), and (≤ n R C), where n ≥ 1 is
a natural number, R is a relation or an inverse relation (if inverse
relations are in the original DL), and C is a concept. When extending a DL with local functionality (denoted by F` ) one can use only
number restrictions of the form (≤ 1 R >) in which R is a relation
or an inverse relation (if inverse relations are in the original DL).
We abbreviate (≤ 1 R >) with (≤ 1R) and use (= 1R) as an
abbreviation for (∃R.>) u (≤ 1R) and (≥ 2R) as an abbreviation
for (∃R.>) u ¬(≤ 1R).
In DLs, ontologies are formalized as finite sets of concept inclusions C v D, where C, D are concepts in the respective language.
We use C ≡ D as an abbreviation for C v D and D v C. In the
DLs extended with functionality (denoted by F) one can use functionality assertions func(R), where R is a relation or an inverse
relation (if present in the original DL). Such an R is interpreted as
a partial function. Extending a DL with role inclusions (denoted by
H) allows one to use expressions of the form R v S, where R and
S are relations or inverse relations (if present in the original DL),
and which state that R is a subset of S.
The semantics of DLs is given by interpretations A. The interpretation C A of a concept C in an interpretation A is defined
inductively as follows:
>A = dom(A)
AA = {a ∈ dom(A) | A(a) ∈ A}
(C u D)A = C A ∩ DA
(∃R.C)A
(∀R.C)A
(≥ n R
C)A
(≤ n R
C)A
(= n R
C)A
= {a ∈ dom(A) |
= {a ∈ dom(A) |
∃a0
∀a0
⊥A = ∅
(¬C)A = dom(A) \ C A
(C t D)A = C A ∪ DA
: R(a, a0 ) ∈ A and a0 ∈ C A }
: R(a, a0 ) ∈ A implies a0 ∈ AA }
= {a ∈ dom(A) | |{b | R(a, b) ∈ A and b ∈ C A }| ≥ n}
= {a ∈ dom(A) | |{b | R(a, b) ∈ A and b ∈ C A }| ≤ n}
= {a ∈ dom(A) | |{b | R(a, b) ∈ A and b ∈ C A }| = n}
Then A satisfies a concept inclusion C v D if C A ⊆ DA . Alternatively, one can define the semantics of DLs by translating them
into FO; the following table gives such a translation:
>∗ (x) = >
⊥∗ (x) = ⊥
A∗ (x) = A(x)
(¬C)∗ (x) = ¬(C ∗ (x))
(C u D)∗ (x) = C ∗ (x) ∧ D∗ (x) (C t D)∗ (x) = C ∗ (x) ∨ D∗ (x)
(∃R.C)∗ (x) = ∃y (R(x, y) ∧ C ∗ (y))
(∀R.C)∗ (x) = ∀y (R(x, y) → C ∗ (y))
(≥ n R C)∗ (x) = ∃≥n y(R(x, y) ∧ C ∗ (y))
We observe the following relationships between DLs and fragments
of the guarded fragment. For a DL L and fragment L0 of the
guarded fragment we say that an L ontology O can be written as
an L0 ontology if the translation given above translates O into an
L0 ontology.
Lemma 7 The following inclusions hold:
1. Every ALCHI ontology can be written as a uGF2 ontology. If
the ontology has depth 2, then it can be written as a uGF−
2 (2)
ontology.
2. Every ALCHIF ontology can be written as a uGF−
2 (f ) ontology.
3. Every ALCHIQ ontology can be written as a uGC2 ontology.
If the ontology has depth 1, then it can be written as a uGC−
2 (1)
ontology.
B.
INTRODUCTION TO DATALOG
We give a brief introduction to the notation used for Datalog. A
datalog6= rule ρ takes the form
S(~
x) ← R1 (~
x1 ) ∧ · · · ∧ Rm (~
xm )
where S is a relation symbol, m ≥ 1, and R1 , . . . , Rm are either
relation symbols or the symbol 6= for inequality. We call S(~
x) the
head of ρ and R1 (~
x1 ) ∧ · · · ∧ Rm (~
xm ) its body. Every variable in
the head of ρ is required to occur in its body. We call a datalog6=
rule that does not use inequality a datalog rule. A Datalog6= program is a finite set Π of datalog6= rules with a selected goal relation
symbol goal that does not occur in rule bodies in Π and only in goal
rules of the form goal(~
x) ← R1 (~
x1 ) ∧ · · · ∧ Rm (~
xm ). The arity of
Π is the arity of its goal relation. A Datalog program is a Datalog6=
program not using inequality.
For every instance D and Datalog6= program Π, we call a
model A of D a model of Π if A is a model of all FO sentences ∀~
x∀~
x1 · · · ∀~
xm (R1 (~
x1 ) ∧ · · · ∧ Rm (~
xm ) → S(~
x)) with
S(~
x) ← R1 (~
x1 ) ∧ · · · ∧ Rm (~
xm ) ∈ Π. We set D |= Π(~a) if
goal(~a) ∈ A for all models A of D and Π.
C.
PROOFS FOR SECTION 2
Following the notation introduced after Theorem 1 we denote
the guarded fragment with equality by GF(=) and uGF with equality in non-guard positions by uGF(=). In contrast, GF and uGF
denote the corresponding languages without equality in non-guard
positions. We use this notation in the formulation of Theorem 1
below.
Theorem 1 (restated) (1) A sentence in GF(=) is invariant under
disjoint unions iff it is equivalent to a sentence in uGF(=).
(2) A sentence in GF is invariant under disjoint unions iff it is
equivalent to a sentence in uGF.
P ROOF. It remains to prove that every sentence in GF(=) is
equivalent to a Boolean combination of sentences in uGF(=). Assume a sentence ϕ in GF(=) is given. First, we may assume
that all subformulas ∃y(x = y ∧ ψ(x, y)) of ϕ are rewritten
as ¬∀y(x = y → ψ(x, y)). Then let ψ = ∀y(x = y →
ψ 0 (x, y)) be a subformula of ϕ with free variable x. It follows
that ψ ≡ ψ 0 (x, x). As ψ 0 (x, x) is a GF(=) formula, any occurrence of ψ in ϕ can be replaced by ψ 0 (x, x). Let ϕ0 be the result of
substituting all subformulas of the form of ψ in ϕ with the corresponding equivalent GF(=) formula. It follows that ϕ ≡ ϕ0 (note
that ∀x∀y(x = y → ψ 0 (x, y)) can be rewritten as ∀x(x = x →
∀y(x = y → ψ 0 (x, y)))).
A sentence ψ in ϕ is a simple sentence if there is no strict subformula ψ 0 of ψ which is a sentence. For the second step, let ϕ0 = ϕ0 ,
and let
ϕi = (ϕi−1 [ψ/>] ∧ ψ) ∨ (ϕi−i [ψ/⊥] ∧ ¬ψ)
for some simple sentence ψ of ϕi−1 not already substituted in any
step j < i − 1. As at each step any simple sentence is a GF(=)
sentence with guard R(~
x) or x = x and the guarded formula is an
openGF formula, then any simple sentence is either the negation of
a uGF(=) sentence or a uGF(=) sentence. Let ϕn be the resulting
sentence. It follows that ϕ ≡ ϕn , and ϕn is a Boolean combination
of uGF(=) sentences.
For the proof of Lemma 1 we require the notion of guarded bisimulations that characterizes the expressive power of GF(=) [30]. To
apply guarded bisimulations to characterize the fragment uGF(=)
of GF(=) we modify the notion slightly by demanding that in the
back-and-forth condition only overlapping guarded sets are used.
To cover uGC2 (=) we also consider guarded bisimulations preserving the number of guarded sets of the same type that contain a
fixed singleton set. Assume A and B are interpretations. A set I of
partial isomorphisms p : ~a 7→ ~b between guarded tuples ~a and ~b in
A and B, respectively, is called a connected guarded bisimulations
if the following holds for all p : ~a 7→ ~b ∈ I:
• for every guarded tuple ~a0 in A which has a non-empty intersection with ~a there exists a guarded tuple ~b0 in B and
p0 : ~a0 7→ ~b0 ∈ I that coincides with p on the intersection
of ~a and ~a0 .
• for every guarded tuple ~b0 in B which has a non-empty intersection with ~b there exists a guarded tuple ~a0 in A and p0 : ~a0 7→
~b0 ∈ I that coincides with p on the intersection of ~b and ~b0 .
We say that (A, ~a) and (B, ~b) are connected guarded bisimilar if
there exists a connected guarded bisimulation between A and B
containing ~a 7→ ~b.
Theorem 15 If (A, ~a) and (B, ~b) are connected guarded bisimilar
and ϕ(~x) is a formula in openGF, then A |= ϕ(~a) iff B |= ϕ(~b).
For uGC2 (=) the guarded sets have cardinality at most two. To
preserve counting we require the following modified version of the
definition above. A set I of partial isomorphisms p : ~a 7→ ~b between guarded tuples ~a = (a1 , a2 ) and ~b = (b1 , b2 ) in A and B,
respectively, is called a counting connected guarded bisimulations
if the following holds for all p : (a1 , a2 ) 7→ (b1 , b2 ) ∈ I:
• for every finite number of distinct guarded tuples (a1 , a02 ) in A
there exist (at least) the same number of guarded tuples (b1 , b02 )
in B and p0 : (a1 , a02 ) 7→ (b1 , b02 ) ∈ I.
• for every finite number of distinct guarded tuples (a01 , a2 ) in A
there exist (at least) the same number of guarded tuples (b01 , b2 )
in B and p0 : (a01 , a2 ) 7→ (b01 , b2 ) ∈ I.
• for every finite number of distinct guarded tuples (b1 , b2 ) in B
there exist (at least) the same number of guarded tuples (a1 , a02 )
in A and p0 : (a1 , a02 ) 7→ (b1 , b02 ) ∈ I.
• for every finite number of distinct guarded tuples (b01 , b2 ) in B
there exist (at least) the same number of guarded tuples (a01 , a2 )
in A and p0 : (a01 , a2 ) 7→ (b01 , b2 ) ∈ I.
We say that (A, ~a) and (B, ~b) are counting connected guarded
bisimilar if there exists a counting connected guarded bisimulation
between A and B containing ~a 7→ ~b.
Theorem 16 If (A, ~a) and (B, ~b) are counting connected guarded
bisimilar and ϕ(~
x) is a formula in openGC2 , then A |= ϕ(~a) iff
B |= ϕ(~b).
Lemma 1 (restated) Let O be a uGF(=) or uGC2 (=) ontology,
D a possibly infinite instance, and A a model of D and O. Then
there exists a forest-model B of D and O such that there exists a
homomorphism h from B to A that preserves dom(D).
P ROOF. Assume first a uGF(=) ontology O, an instance D, and
a model A of O and D are given. Let G be a maximal guarded set
in D. We unfold A at G into a cg-tree decomposable interpretation
BG as follows: let T (G) be the undirected tree with nodes t =
G0 G1 · · · Gn , where Gi , 0 ≤ i ≤ n, are maximal guarded sets in
A, G0 = G, G1 6⊆ dom(D), and (a) Gi 6= Gi+1 , (b) Gi ∩ Gi+1 6=
∅, and (c) Gi−1 6= Gi+1 . Let (t, t0 ) ∈ E if t0 = tF for some
maximal guarded F . Set tail(G0 · · · Gn ) = Gn . Take an infinite
supply of copies of any d ∈ dom(A). We assume all copies are in
∆D . We set e↑ = d if e is a copy of d. Define instances bag(t)
for t ∈ T (G) inductively as follows. bag(G) is an instance whose
domain is a set of copies of elements d ∈ G such that the mapping
e 7→ e↑ is an isomorphism from bag(G) onto A|G . Assume bag(t)
has been defined, tail(t) = F , and bag(t0 ) has not yet been defined
for t0 = tF 0 ∈ T (G). Then take for any d ∈ F 0 \ F a fresh copy
d0 of d and define bag(t0 ) with domain {d0 | d ∈ F 0 \ F } ∪ {e ∈
bag(t) | e↑ ∈ F 0 ∩ F )} such that the mapping e 7→ e↑ is an
0
isomorphism from
S bag(t ) onto A|F 0 .
Let BG = t∈T (G) bag(t). Now hook BG to D at G by identifying dom(bag(G)) with G using the isomorphism e 7→ e↑ and let
B be the union of all BG with G a maximal guarded set in D. It is
not difficult to prove the following properties of B:
1. (T (G), E, bag) is a cg-tree decomposition of BG , for every
maximal guarded set G in D;
2. The mapping h : e 7→ e↑ is a homomorphism from B to A
which preserves dom(D) and is an isomorphism on each bag(t)
with t ∈ T (G) for some maximal guarded set G in D.
3. For any maximal guarded set G = {a1 , . . . , ak } in D, there
is a connected guarded bisimulation between (B, (a1 , . . . , ak ))
and (A, (a1 , . . . , ak )).
It follows from Theorem 15 that B is a model of D and O and thus
as required.
Assume now that O is a uGC2 (=) ontology, that D is an instance, and that A is a model of O and D. We can assume that
D and A use unary and binary relation symbols only. We have
to preserve the number of guarded sets of a given type intersecting with a singleton and therefore the unfolding is slightly different from the case without guarded counting quantifiers. Let
c ∈ dom(D). We unfold A at c into a cg-tree decomposable B{c} .
Let T (c) be the undirected tree with nodes t = {c}G1 · · · Gn ,
where Gi , 1 ≤ i ≤ n, are maximal guarded sets in A, c ∈ G1
and G1 6⊆ dom(D), and (a) Gi 6= Gi+1 , (b) Gi ∩ Gi+1 6= ∅,
and (c) Gi−1 ∩ Gi 6= Gi+1 ∩ Gi . Let (t, t0 ) ∈ E if t0 = tF
for some maximal guarded F in D. Observe that Condition (c) is
stronger than the corresponding Condition (c) in the construction
of forest models for uGF ontologies. The new condition ensures
that we do not introduce more successors of the same type when
we unfold an interpretation. Take an infinite supply of copies of
any d ∈ dom(A). We set, as before, e↑ = d if e is a copy of
d. Define instances bag(t) for t ∈ T (c) inductively as follows.
Let c0 be a copy of c and set bag({c}) = {P (c0 ) | P (c) ∈ A}.
Assume bag(t) has been defined, tail(t) = F , and bag(t0 ) has
not yet been defined for t0 = tF 0 ∈ T (c). Take for the unique
d ∈ F 0 \ F a fresh copy d0 of d and define bag(t0 ) with domain {d0 } ∪ {e ∈ bag(t) | e↑ ∈ F 0 ∩ F )} such that the mapping e 7→S e↑ is an isomorphism from bag(t0 ) onto A|F 0 . Let
B{c} = t∈T (c) bag(t). Now hook B{c} to D at {c} by identifying c0 with c and let
[
B = {R(~a) ∈ A | ~a ⊆ dom(D)} ∪
B{c} .
c∈dom(D)
We can, of course, present the interpretation B as an interpretation obtained from hooking interpretations BG with G maximal
guarded sets to D by choosing for each c a single maximal guarded
set f (c) with c ∈ f (c) and setting for G = {c1 , c2 },
[
BG = {R(~a) ∈ A | ~a ⊆ G} ∪
B{c}
f (c)=G
It is not difficult to prove the following properties of B:
1. (T (c), E, bag) is a guarded tree decomposition of B{c} , for every c ∈ dom(D);
2. The mapping h : e 7→ e↑ is a homomorphism from B to A
which preserves dom(D) and is an isomorphism on each bag(t)
with t ∈ T (c) for some c ∈ dom(D).
3. For any maximal guarded set G = {a1 , a2 } in D,
there is a counting connected guarded bisimulation between
(B, (a1 , a2 )) and (A, (a1 , a2 )).
It follows from Theorem 16 that B is a model of O and D and thus
as required.
D.
PROOFS FOR SECTION 3
Theorem 2 (restated) Let O be a uGF(=) or uGC2 (=) ontology
and M a class of instances. Then the following conditions are
equivalent:
P ROOF. The equivalence (2) ⇔ (3) is straightforward. We show
(1) ⇒ (2). Assume O is rAQ-materializable for M and assume
the instance D ∈ M is consistent w.r.t. O. Let A be a rAQmaterialization of O and D. Consider a forest model B of D
and O such that there is a homomorphism from B to A preserving dom(D) (Lemma 1). Recall that B is obtained from D by
hooking cg-tree decomposable BG to any maximal guarded set G
in D. We show that B is a CQ-materialization of O and D. To this
end it is sufficient to prove that for any finite subinterpretation B0
of B and any model A0 of O and D there exists a homomorphism h
from B0 to A0 that preserves dom(D) ∩ dom(B0 ). We may assume
that for any maximal guarded set G in D, if dom(B0 ) ∩ G 6= ∅,
then G ⊆ dom(B0 ) and that B0 ∩ BG is connected if nonempty.
But then we can regard every B0 ∩ BG as an rAQ qG with answer
variables G. From B |= qG (~b) for ~b a suitable tuple containing all
b ∈ G we obtain A |= qG (~b) since B is a rAQ-materialization of D
and O. Let hG be the homomorphism that witnesses B |= qG (~b).
Then hG is a homomorphism from B0 ∩ BG to A that preserves G.
The union h of all hG , G a maximal guarded set in D, is the desired
homomorphism from B0 to A0 preserving dom(D)∩dom(B0 ).
Lemma 2 (restated) A uGC2 (=) ontology is materializable iff it
admits hom-universal models. This does not hold for uGF(2) ontologies.
P ROOF. The direction “⇐” is straightforward. Conversely, assume that O is materializable. Let D be an instance that is consistent w.r.t. O. By Lemma 1 there exists a forest model B of D and
O that is a CQ-materialization of O and D. Using a straightforward selective filtration procedure one can show that there exists
B0 ⊆ B such that B0 is a model of O and D and for any guarded
set G the number of guarded sets G0 with G ∩ G0 6= ∅ is finite. We
show that for any model A of O and D there is a homomorphism
from B0 into A that preserves dom(D). Let A be a model of O
and D. Again we may assume that A is a forest model such that for
any guarded set G the number of guarded sets G0 with G ∩ G0 6= ∅
is finite. Now, for any finite subset F of dom(B0 ) there is a homomorphism hF from B0|F to A preserving dom(D) ∩ F . Let Fm
be the set of all d ∈ dom(B0 ) such that there is a sequence of at
most m guarded sets G0 , . . . , Gm with Gi ∩ Gi+1 6= ∅ for i < m,
G
S0 ∩ dom(D) 6= ∅, 0and d ∈ Gm . Then each Fm is finite and
m≥0 Fm = dom(B ). Now we can construct the homomorphism
h from B0 to A as the limit of homomorphisms hFnm from B0|Fnm
to A preserving dom(D) for an infinite sequence n0 < n1 · · · using a standard pigeonhole argument.
We construct an ontology O that expresses that every element of
a unary relation C is in the centre of a ‘partial wheel’ represented
by a ternary relation W . The wheel can be generated by turning
right or left. The two resulting models are homomorphically incomparable but cannot be distinguished by answers to CQs. Thus,
neither of the two models is hom-universal but one can ensure that
O is materializable. As a first attempt to construct O we take unary
relations L (turn left) and R (turn right), the following sentence that
says that one has to choose between L and R when generating the
wheel with cente C:
∀x C(x) → ((L(x) ∨ R(x)) ∧ ∃y1 , y2 W (x, y1 , y2 ))
1. O is rAQ-materializable for M;
and the following sentences that generates the wheel accordingly:
∀x, y, z W (x, y, z) → (L(x) → ∃z 0 W (x, z, z 0 ))
3. O is UCQ-materializable for M.
∀x, y, z W (x, y, z) → (R(x) → ∃y 0 W (x, y 0 , y))
2. O is CQ-materializable for M;
Clearly this ontology does not admit hom-universal models. Note,
however, that is also not materializable since for the instance
D = {C(a)} we have O, D |= L(a) ∨ R(a) but we neither
have O, D |= L(a) nor O, D |= R(a). The first step to ensure materializability is to replace L(x) by ∃y(gen(x, y) ∧ ¬L(x))
and R(x) by ∃y(gen(x, y) ∧ ¬R(x)) in the sentences above and
also add ∀x∃y(gen(x, y) ∧ L(x)) and ∀x∃y(gen(x, y) ∧ R(x)) to
O. Then a CQ ‘cannot detect’ whether one satisfies the disjunct
∃y(gen(x, y) ∧ ¬R(x)) or the disjunct ∃y(gen(x, y) ∧ ¬L(x)) at a
given node x in C. Note that the resulting ontology is still not materializable: if W (a, b, c) ∈ D then CQs can detect whether one
introduces a c0 with W (a, c, c0 ) ∈ A or a b0 with W (a, b0 , b) ∈ A.
To deal with this problem we ensure that a wheel has to be generated from W (a, b, c) only if b, c are not in the instance D. In
detail, we construct O as follows. We use unary relations A, L,
and R and binary relations aux and gen. For a binary relation S
and unary relation B we abbreviate
∀S.B
∃S.¬B(x)
:= ∀y(S(x, y) → B(y))
:= ∃y(S(x, y) ∧ ¬B(y))
Now let O state that every node has aux-successors in A, and gensuccessors in L and R:
∀x∃y(aux(x, y) ∧ A(y))
and
∀x∃y(gen(x, y) ∧ L(y)) ∀x∃y(gen(x, y) ∧ R(y))
We abbreviate
W 0 (x, y, z) := W (x, y, z) ∧ ∀aux.A(y) ∧ ∀aux.A(z)
Next we introduce the disjunction that determines whether one generates the wheel by turning left or right, as indicated above:
∀x C(x) → (∃gen.¬L(x) ∨ ∃gen.¬R(x))
The following axiom starts the generation of the wheel. Observe
that we use W 0 (x, y1 , y2 ) rather than W (x, y1 , y2 ) to ensure that
new individuals are created for y1 and y2 :
∀x C(x) → ∃y1 , y2 W 0 (x, y1 , y2 )
Finally, we turn either left or right:
∀x, y, z W 0 (x, y, z) → (∃gen.¬L(x) → ∃z 0 W 0 (x, z, z 0 ))
∀x, y, z W 0 (x, y, z) → (∃gen.¬R(x) → ∃y 0 W 0 (x, y 0 , y))
This finishes the definition of O. O is a uGF(2) ontology. To
show that it is materializable observe that for any instance D we
can ensure that in a model A of D and O for all a ∈ dom(D)
there exists b with aux(a, b) ∈ A and A(b) 6∈ A. Thus, the wheel
generation sentences apply only to W (a, b, c) with b, c 6∈ dom(A).
It is now easy to prove that CQ evaluation w.r.t. O is in PT IME. On
the other hand, for D = {C(a)} there does not exist a model B
of O and D such that there is a homomorphism from B into every
model of O and D that preserves dom(D).
In the proof of Lemma 3 (and related proofs below), it will be
more convenient to work with a certain disjunction property instead of with materializability. We now introduce this property and
show the equivalence of the two notions. Let Q be a class of nonBoolean connected CQs. An ontology O has the Q-disjunction
property if for all instances D, queries q1 (~
x1 ), . . . , qn (~
xn ) ∈ Q
and d~1 , . . . , d~n in D with d~i of the same length as ~
xi : if O, D |=
q1 (d~1 ) ∨ . . . ∨ qn (d~n ), then there exists i ≤ n such that O, D |=
qi (d~i ).
Theorem 17 Let Q be a class CQs and O an FO(=)-ontology.
Then O is Q-materializable iff O has the Q-disjunction property.
P ROOF. For the nontrivial “⇐” direction, let D be an intance that is consistent w.r.t. O and such that there is no Qmaterialization of O and D. Then the set of FO-sentences O∪D∪Γ
is not satisfiable, where
~ | O, D 6|= q(d),
~ d~ ⊆ dom(D), q ∈ Q}
Γ = {¬q(d)
In fact, any satisfying interpretation would be a Q-materialization
of O. By compactness, there is a finite subset Γ0 of Γ such that
O ∪ D ∪ Γ0 is not satisfiable, i.e.
_
~
O, D |=
q(d)
0
~
¬q(d)∈Γ
~ for all q(d)
~ ∈ Γ0 . Thus, O lacks
By definition of Γ0 , O, D 6|= q(d)
the Q-disjunction property.
Theorem 3 (restated) Let O be an FO(=)-ontology that is invariant under disjoint unions. If O is not materializable, then rAQevaluation w.r.t. O is CO NP-hard.
P ROOF SKETCH . It was proved in [42] that if an ALC ontology O is not ELIQ-materializable, then ELIQ-evaluation w.r.t. O is
CO NP-hard, where an ELIQ is a rAQ q(~
x) such that the associated
instance Dq(~x) viewed as an undirected graph is a tree (instead of
cg-tree decomposable) with a single answer variable at the root.2
The proof is by reduction from 2+2-SAT, the variant of propositional satisfiability where the input is a set of clauses of the form
(p1 ∨ p2 ∨ ¬n1 ∨ ¬n2 ), each p1 , p2 , n1 , n2 a propositional letter
or a truth constant [50]. The proof of Theorem 3 can be obtained
from the proof in [42] by minor modifications, which we sketch in
the following.
The proof crucially exploits that if O is not rAQ-materializable,
then by Theorem 17 it does not have the rAQ-disjunction property. In fact, we take an instance D, rAQs (resp. ELIQs)
q1 (~
x1 ), . . . , qn (~
xn ) ∈ Q, and elements d~1 , . . . , d~n of D that witness failure of the disjunction property, copy them an appropriate
number of times, and use the resulting set of gadgets to choose a
truth value for the variables in the input 2+2-SAT formula. The fact
that O is invariant under disjoint unions ensures that the choice of
truth values for different variables is independent. A main difference between ELIQs and rAQs is that rAQs can have more than one
answer variable. A straightforward way to handle this is to replace
certain binary relations from the reduction in [42] with relations of
higher arity (these are ‘fresh’ relations introduced in the reduction,
that is, they do not occur in O). To deal with a rAQ of arity k, one
would use a k + 1-ary relation. However, with a tiny bit of extra
effort (and if desired), one can replace these relations with k binary
relations.
We now turn to the proof of Theorem 4.
Theorem 4 (restated) Let O be a uGF(=) or uGC2 (=) ontology.
The following are equivalent:
1. UCQ-evaluation w.r.t. O is in PT IME iff CQ-evaluation w.r.t. O
is in PT IME iff rAQ-evaluation w.r.t. O is in PT IME.
2. UCQ-evaluation w.r.t. O is Datalog6= -rewritable iff CQevaluation w.r.t. O is Datalog6= -rewritable iff rAQ-evaluation
w.r.t. O is Datalog6= -rewritable. If O is a uGF ontology, then
the same is true for Datalog-rewritability.
2
In the context of ALC, relation symbols are at most binary and
thus it should be clear what ‘tree’ means.
3. UCQ-evaluation w.r.t. O is CO NP-hard iff CQ-evaluation
w.r.t. O is CO NP-hard iff rAQ-evaluation w.r.t. O is CO NPhard.
The “only if” directions of the first two parts of Theorem 4 and
the “if” direction of the third part are trivial. For the non-trivial
“if” directions of part 1 and
V 2 we decompose UCQs into finite disjunctions of queries q ∧ i qi where q is a “core CQ” that only
needs to be evaluated over the input instance D (ignoring labeled
nulls) and each qi is a rAQ. This is similar to squid decompositions
in [12], but more subtle due to the presence of subqueries that are
not connected to any answer variable of q. The same decomposition of UCQs is also used in the “only if” direction of part 3. The
following definition formally introduces decompositions of UCQs.
Definition 5 Let O be an ontology and let q(~
x) be a UCQ. A decomposition of q(~x) w.r.t. O is a finite set Q of pairs (φ(~
y ), C),
where
• φ(~
y ) is a conjunction of atoms (possibly equality atoms) such
that ~
y contains all variables in ~
x;
• C is a finite set of CQs q(~z), where Dq is cg-tree decomposable;
such that the following are equivalent for each instance D and tuple
~a ∈ dom(D)|~x| :
1. O, D |= q(~a).
2. For each model A of D and O there exists a pair (φ(~
y ), C) in Q
and an assignment π of elements in dom(D) to the variables in
~
y such that
• π(~x) = ~a,
• D |= φ(π(~
y )), and
• A |= q 0 (π(~z)) for each q 0 (~z) in C.
The decomposition is strong if for each pair (φ(~
y ), C), the set C
consists of rAQs.
We aim to prove that for every uGF(=) or uGC2 (=) ontology
O and each UCQ q(~x) there exists a strong decomposition of q(~
x)
w.r.t. O. As an intermediate step, we prove:
Lemma 8 For each uGF(=) or uGC2 (=) ontology O and each
UCQ q(~x) there is a decomposition Q of q(~
x) w.r.t. O. Moreover,
if O is an uGC2 (=) ontology, then each rAQ that occurs in a pair
of Q has a single free variable.
P ROOF. The construction is similar to that of squid decompositions in [12]. We first construct Q and then show that it is a decomposition of q(~x) w.r.t. O. To simplify the presentation, we will
assume that each element that occurs in Dq0 for a CQ q 0 is identified with the variable it represents. Consider the set I of all tuples
(q 0 , f, V, H, (Ti )ki=1 ), where
• q 0 is a disjunct of q;
• f : dom(Dq0 ) → dom(Dq0 ) is an idempotent mapping such
that for each answer variable x of q 0 we have that f (x) is an
answer variable;
• V ⊆ f (dom(Dq0 )) and V contains all variables in f (~
x);
• H, T1 , . . . , Tn form a partition of f (Dq0 ), and each atom in H
only contains variables in V ;
• each Ti is cg-tree decomposable;
• if O is a uGC2 (=) ontology, then each dom(Ti ) contains at
most one variable from V ;
• k is at most the number of atoms in q 0 .
Clearly, I is finite. Each tuple I ∈ I contributes exactly one pair
pI to Q, which we describe next.
Let I = (q 0 , f, V, H, (Ti )ki=1 ) ∈ I. Define φ(~
y ) to be the
conjunction of all atoms of H and all equality atoms x = f (x)
for each variable x in ~
x with f (x) 6= x. Furthermore, for each
i ∈ {1, . . . , k}, let qi (~zi ) be the CQ whose atoms are the atoms in
Ti , and whose tuple ~zi of answer variables consists of all variables
in dom(Ti ) that also occur in V . Let
pI = (φ(~
y ), {qi (~zi ) | 1 ≤ i ≤ k}).
We show that Q = {pI | I ∈ I} is a decomposition of q(~
x) w.r.t.
O.
First note that each pair in Q has the form (φ(~
y ), C), where φ(~
y)
is a conjunction of atomic formulas that contains all variables of
~
x, and C is a finite set of CQs q 0 (~z) whose instance Dq0 is cg-tree
decomposable. Furthermore, if O is formulated in uGC2 (=), then
each CQ in C has at most one answer variable.
Next we establish the equivalence between conditions 1 and 2
from Definition 5. For the direction from 2 to 1, suppose that for
each model A of D and O there exists a pair (φ(~
y ), C) ∈ Q and an
assignment π of elements in dom(D) to the variables in ~
y such that
π(~
x) = ~a, D |= φ(π(~
y )), and A |= q 0 (π(~z)) for each q 0 (~z) ∈ C.
By construction of (φ(~
y ), C) this implies O, D |= q.
For the direction from 2 to 1, suppose that O, D |= q(~a), and
consider a model A of D and O. By Lemma 1, there is a forestmodel B of D and O and a homomorphism h from B to A that
preserves dom(D). Let
[
B=D∪
BG ,
G∈G
where G is the set of all maximal guarded subsets of D, and for each
BG there is a cg-tree decomposition of BG whose root is labeled
with a bag with domain G. Now, O, D |= q(~a) implies B |= q(~a),
so there is a disjunct q 0 (~
x) of q(~
x) and a homomorphism π from
Dq0 to B with π(~
x) = ~a. Note that π is the composition of
• an idempotent mapping f : dom(Dq0 ) → dom(Dq0 ) such that
f (~
x) contains only variables from ~
x, and
• an injective mapping g from f (dom(Dq0 )) to B.
Let
• V := {f (x) ∈ dom(Dq0 ) | g(f (x)) ∈ dom(D)};
S
• H := {f (α) | α ∈ Dq0 , g(f (α)) ∈ D \ G∈G BG };
• TG := {f (α) | α ∈ Dq0 , g(f (α)) ∈ BG } for G ∈ G.
Here, for each atom α = R(x1 , . . . , xk ) and mapping h with domain {x1 , . . . , xk } we use the notation h(α) to denote the atom
R(h(x1 ), . . . , h(xk )). Let T1 , . . . , Tk be an enumeration of all
connected components of the sets TG , G ∈ G. Then each Ti is
cg-tree decomposable, and k is bounded by the number of atoms in
q 0 . Now,
I = (q 0 (~
x), f, V, H, (Ti )ki=1 ) ∈ I,
and pI = (φ(~
y ), C) ∈ Q. It is straightforward to verify that D |=
φ(π(~
y )) and B |= q 0 (π(~z)) for each q 0 (~z) ∈ C. Since h is a
homomorphism from B to A that preserves dom(D), we obtain
that π 0 := h ◦ π is an assignment of elements in dom(D) to the
variables in ~
y with π 0 (~
x) = ~a, D |= φ(π 0 (~
y )), and A |= q 0 (π 0 (~z))
for each q 0 (~z) ∈ C.
Altogether, this completes the proof that Q is a decomposition
of q(~
x) w.r.t. O.
Next we show that for uGF(=) or uGC2 (=) ontologies O we
can transform any decomposition of a UCQ q(~
x) w.r.t. O into a
strong decomposition of q(~
x) w.r.t. O. This transformation is based
on Lemma 9 below. Given a uGF(=) or uGC2 (=) ontology O
and a decomposition Q of a UCQ w.r.t. O, the lemma tells us that
for each model A of D and O, each pair (φ(~
y ), C) ∈ Q, each
assignment π of elements in dom(D) to the variables in ~
y , and
each q 0 (~z) ∈ C, it suffices to consider homomorphisms from Dq0
to A whose image contains an element at a bounded distance from
an element in dom(D). Let us first make more precise what we
mean by the distance between elements in an instance.
Definition 6 Let A be an instance.
• The Gaifman graph of an instance A is the undirected graph
whose vertex set is dom(A) and which has an edge between
any two distinct a, b ∈ dom(A) if a and b occur together in an
atom of A.
• Given a, b ∈ dom(A), the distance from a to b in A, denoted
by distA (a, b), is the length of a shortest path from a to b in the
Gaifman graph of A, or ∞ if there exists no such path.
• Given A, B ⊆ dom(A), the distance from A to B in A is defined as distA (A, B) := mina∈A,b∈B distA (a, b).
Lemma 9 Let O be a uGF(=) or uGC2 (=) ontology, and let Q
be a decomposition of a UCQ q0 (~
x) w.r.t. O. Then there exists an
integer d ≥ 0 such that the following is true for all instances D
and all tuples ~a ∈ dom(D)|~x| .
If O, D |= q0 (~a), then for each model A of D and O there exists
a pair (φ(~
y ), C) in Q and an assignment π of elements in dom(D)
to the variables in ~
y such that
• π(~x) = ~a,
• D |= φ(π(~
y )), and
• for each q(~z) ∈ C there is a homomorphism h from Dq to A
such that h(~z) = π(~z) and the distance from dom(h(Dq )) to
dom(D) in A is at most d.
P ROOF. For each model A of D and O and each positive integer d, let Qd (A) be the set of all triples (φ(~
y ), C, π) such that
(φ(~
y ), C) ∈ Q and π is an assignment of elements in dom(D) to
the variables in ~
y such that
• π(~x) = ~a,
• D |= φ(π(~
y )), and
• for each q(~z) ∈ C there is a homomorphism h from Dq to A
such that h(~z) = π(~z) and the distance from dom(h(Dq )) to
dom(D) in A is at most d.
We show that there exists a constant d ≥ 0 with the following
property: if O, D |= q0 (~a), then for each model A of D and O we
have Qd (A) 6= ∅.
The constant d depends on the number of C-types, which we
define next. Let Q0 be the set of all Boolean CQs that occur in the
set C of some pair (φ(~
y ), C) ∈ Q. For the definition of C-types we
assume that each q ∈ Q0 is a GF sentence (if O is formulated in
uGF(=)), or a GC2 sentence (if O is formulated in uGC2 (=)). We
can assume this w.l.o.g. because for each q ∈ Q0 the instance Dq
has a cg-tree decomposition, which can be used to construct a GF
or GC2 sentence that is equivalent to q. Now, let cl(O, q) be the
smallest set satisfying:
• O ∪ Q0 ⊆ cl(O, q);
• cl(O, q) contains one atomic formula R(~
x) for each relation
symbol R that occurs in O or q, where ~
x is a tuple of distinct
variables;
• cl(O, q) is closed under subformulas and single negation, where
the subformulas of a formula φ = Q~
x ψ with a quantifier Q are
φ itself and ψ.
Given a set C ⊆ ∆D , a C-type is a set of formulas φ(~c), where
φ(~
x) ∈ cl(O, q) and ~c is a tuple of constants in C of the appropriate
length. Let w be the maximum arity of a relation symbol in O or
q, and define τ to be the number of C-types for a set C of size
2w. Note that
τ depends only on O and q, and is bounded by
w
2|cl(O,q)|·(2w) . Fix
d := s + d∗
with d∗ := τ 2 + 1,
where s is the maximum number of atoms in a non-Boolean CQ
that occurs in C for some pair (φ(~
y , C) ∈ Q. This finishes the
definition of the constant d.
Suppose now that O, D |= q0 (~a). We aim to show that each
model A of D and O satisfies Qd (A) 6= ∅. Let A be a model of D
and O. By Lemma 1, there is a forest-model B of D and O and a
homomorphism g from B to A that preserves dom(D). We show
that Qd (B) 6= ∅. Since g preserves CQs and does not increase
distances (i.e., for all a, b ∈ dom(B) we have distA (g(a), g(b)) ≤
distB (a, b)), this implies Qd (A) 6= ∅, as desired.
For a contradiction, suppose that Qd (B) = ∅. We use B to
construct a model of D and O that does not satisfy q0 (~a). This
contradicts O, D |= q0 (~a), and finishes the proof. The following is
the core of the construction.
Claim. Let C be a forest-model of D and O and let e ≥ d be such
that Qe (C) = ∅. Then there is a forest-model C0 of D and O with
Qe+1 (C0 ) = ∅, and C0 agrees with C on all elements at distance at
most e − d∗ + 1 from dom(D).
Proof. Recall the definition of the set Q0 given at the beginning of
the proof of Lemma 9. Let X be the set of all q ∈ Q0 such that for
each homomorphism h from Dq to C,
distC (dom(h(Dq )), dom(D)) ≥ e + 1.
For each instance C0 and each integer e0 , let He0 (C0 ) be the set of
all pairs (q, h) such that q ∈ X and h is a homomorphism from Dq
to C0 with
distC0 (dom(h(Dq )), dom(D)) ≤ e0 .
Pick a forest-model C0 of D and O such that
1. He (C0 ) is empty;
2. He+1 (C0 ) is inclusion-minimal;
3. for each (φ(~
y ), C) ∈ Q, each assignment π with π(~
x) = ~a
and D |= φ(π(~
y )), and each non-Boolean q(~z) ∈ C we have
C |= q(π(~z)) iff C0 |= q(π(~z));
4. C0 agrees with C on all elements at distance at most e − d + 1
from dom(D).
Such a model exists since C is a forest-model of D and O that
satisfies conditions 1 and 3. We show that He+1 (C0 ) = ∅. This
implies Qe+1 (C0 ) = ∅ as follows. Suppose, for a contradiction,
that He+1 (C0 ) = ∅ and Qe+1 (C0 ) 6= ∅. Then any (φ(~
y ), C, π) ∈
Qe+1 (C0 ) satisfies π(~
x) = ~a, D |= φ(π(~
y )), and C |= q(π(~z))
for all non-Boolean q(~z) ∈ C (by point 3). But then, there is a
Boolean q ∈ C with q ∈ X (as otherwise (φ(~
y ), C, π) ∈ Qe (C)).
Since by (φ(~
y ), C, π) ∈ Qe+1 (C0 ) there exists a homomorphism h
from Dq to C0 such that the distance from dom(h(Dq )) to dom(D)
is at most e+1, we obtain (q, h) ∈ He+1 (C0 ) = ∅, a contradiction.
Hence, it suffices to show that He+1 (C0 ) = ∅.
For a contradiction, suppose that He+1 (C0 ) 6= ∅. We construct
a new forest-model C00 of D and O that satisfies conditions 1, 3,
and 4, but with He+1 (C00 ) ( He+1 (C0 ). This will contradict that
He+1 (C0 ) is inclusion-minimal. The construction is based on a
pumping argument.
S
By definition we have C0 = D ∪ G∈G C0G , where G is the set
of all maximal guarded subsets of D, and for each G ∈ G there
is a cg-tree decomposition (TG , EG , bagG ) of C0G whose root rG
satisfies dom(bagG (rG )) = G. For each node t ∈ TG , let bag∗G (t)
be the union of bagG (t0 ), where t0 ranges over the descendants of t
in (TG , EG ), including t.
Observe that for each (q, h) ∈ He+1 (C0 ) there is a G ∈ G and
a node t ∈ TG at depth at least e in the tree (TG , EG ) such that
h(Dq ) ⊆ bag∗G (t). Indeed, since
distC0 (dom(h(Dq )), dom(D)) ≥ e + 1,
(1)
the set h(Dq ) does not contain any constant from dom(D). Moreover, since Dq is connected, there must be a G ∈ G and a node
t ∈ TG such that h(Dq ) ⊆ bag∗G (t). This node t can be chosen to
have depth at least e in (TG , EG ), because otherwise we could construct a path of length at most e from an element in dom(h(Dq ))
to an element in dom(D) in the Gaifman graph of C0 , contradicting (1).
Pick any G ∈ G and t ∈ TG at depth e in (TG , EG ) such that
for some (q, h) ∈ He+1 (C0 ) we have h(Dq ) ⊆ bag∗G (t). Let
t0 , t1 , . . . , td∗ be the last d∗ + 1 nodes on the path from rG to t
in (TG , EG ), and define Ci := dom(bagG (ti )) as well as Ci+ :=
Ci−1 ∪ Ci . For each i ∈ {1, . . . , d∗ }, define the type and extended
type of ti as follows:
|~
x|
θi := {φ(~a) | φ(~
x) ∈ cl(O, q), ~a ∈ Ci , C0 |= φ(~a)},
x) ∈ cl(O, q), ~a ∈ (Ci+ )|~x| , C0 |= φ(~a)}.
θi+ := {φ(~a) | φ(~
Note that θi is a Ci -type and θi+ is a Ci+ -type. By our choice of
d∗ , there are 1 ≤ i < j ≤ d∗ and a bijective mapping f : Ci+ →
Cj+ with f (θi+ ) = θj+ and f (θi ) = θj , where f (θi+ ) is obtained
from θi+ by replacing each constant a that occurs in it by f (a),
and likewise for f (θi ). In particular, f is an isomorphism from
bagG (ti ) to bagG (tj ). We extend f to an injective mapping with
domain dom(bag∗G (ti )) such that each element that does not occur
in Ci is mapped to an element in ∆D \ dom(C0 ).
We are now ready to construct C00 . First, define
C00G := (C0G \ bag∗G (tj )) ∪ f (bag∗G (ti )).
Note that a cg-tree decomposition of C00G can be obtained from
(TG , EG , bagG ) by removing the subtree of (TG , EG ) rooted at
tj , adding a fresh copy of the subtree rooted at ti , making its root a
child of tj−1 , and renaming each element a that occurs in the bag
of a node in the new subtree to f (a). The root of this cg-tree decomposition is the root of (TG , EG , bagG ), and is therefore labeled
with a bag with domain G. We let
[
C00 := D ∪ C00G ∪
C0G0 .
G0 ∈G\{G}
It is straightforward to verify that C00 is a forest-model of D and O
that has the desired properties. For point 4, note that C00 agrees with
C0 on all elements whose distance from dom(D) in C00 is at most
the depth of tj in (TG , EG ), which is at least e − d∗ + 1. Note that
point 3 follows from point 4, because e − d∗ + 1 ≥ s + 1 and s was
chosen in such a way that any non-Boolean CQ that occurs in C for
some (φ(~
y ), C) ∈ Q and that has a match into C00 has a match into
the subinstance of C00 induced by all elements at distance at most s
from dom(D).
y
To complete the proof of Lemma 9, we apply the claim repeatedly to B to obtain a sequence B0 , B1 , B2 , . . . of forest models
Bi of D and O such that for each i ≥ 0,
• Qd+i (Bi ) = ∅;
• Bi and Bi+1 agree on all elements at distance at most s + i + 1
from elements in dom(D).
By compactness, we obtain a forest-model B0 of D and O with
Qe (B0 ) = ∅ for all integers e ≥ 0. Since Q is a decomposition of
q0 (~
x) w.r.t. O, this implies B0 6|= q0 (~a), as desired.
We now use Lemma 9 to show that any decomposition of a UCQ
q(~
x) w.r.t. a uGF(=) or uGC2 (=) ontology O can be turned into a
strong decomposition of q(~
x) w.r.t. O.
Lemma 10 Let O be a uGF(=) or uGC2 (=) ontology, and let Q
be a decomposition of a UCQ q(~
x) w.r.t. O. Then there is a strong
decomposition of q(~
x) w.r.t. O.
P ROOF. Fix the constant d from Lemma 9. Consider a pair p =
(φ(~
x), C) ∈ Q, and let q1 (~z1 ), . . . , qn (~zn ) be an enumeration of
all CQs in C. For each i ∈ {1, . . . , n}, we define a set Ci of rAQs
as follows. If qi (~zi ) is non-Boolean, then qi (~
x) is an rAQ and we
define Ci := {qi (~
x)}. Otherwise, if qi is Boolean, let qi ← φ.
Then, Ci consists of all rAQs of the form
q 0 (x) ← R1 (~
y1 ) ∧ · · · ∧ Re (~
ye ) ∧ φ,
for e ≤ d, where
• x occurs in ~
y1 ;
• each Ri is an at least binary relation symbol in O;
• ~
ye contains a variable y in one of the atoms of φ, but no other
variable from φ occurs in any of the ~
yi .
Let Q0p be the set of all pairs (φ(~
x), {qi0 (~zi ) | 1 ≤ i ≤ n}), where
qi0 (~zi ) ∈ Ci for each i ∈ {1, . . . , n}. By the construction of Q0p
and our choice of d, it follows that the following are equivalent for
each model A of D and O:
1. there exists an assignment π such that π(~
x) = ~a, D |= φ(π(~
y )),
and A |= qi (π(~zi )) for each 1 ≤ i ≤ n;
2. there exists a pair (φ(~
x), C 0 ) ∈ Q0p and an assignment π such
that π(~
x) = ~a, D |= φ(π(~
y )), and A |= q 0 (π(~z)) for each
0
0
q (~z) ∈ C .
S
Altogether, this implies that Q0 := p∈Q Q0p is a strong decomposition of q(~
x) w.r.t. O.
Corollary 1 For each uGF(=) or uGC2 (=) ontology O and each
UCQ q(~
x) there is a strong decomposition Q of q(~
x) w.r.t. O.
Moreover, if O is an uGC2 (=) ontology, then each rAQ that occurs in a pair of Q has a single free variable.
We are now ready to give the proof of Theorem 4.
P ROOF OF T HEOREM 4. PART 1 AND 2: Since the “only if”
directions are trivial, it suffices to focus on the direction from
rAQ-evaluation to UCQ-evaluation. Suppose that rAQ-evaluation
w.r.t. O is in PT IME (resp., Datalog6= -rewritable), and let q(~
x) be
a UCQ. We show that evaluating q(~
x) w.r.t. O is in PT IME (resp.,
Datalog6= -rewritable).
Let Q be a strong decomposition of q(~
x) w.r.t. O, which exists by Corollary 1. Then for all instances D and all tuples ~a over
dom(D) of length |~x| we have O, D |= q(~a) iff the following is
true:
There exists (φ(~
y ), C) ∈ Q and an assignment π of elements in dom(D) to the variables in ~
y with
1. π(~x) = ~a,
3. O, D |= q 0 (π(~z)) for each rAQ q 0 (~z) in C.
Indeed, since rAQ-evaluation w.r.t. O is in PT IME (this also holds
if rAQ-evaluation w.r.t. O is Datalog6= -rewritable), we may assume that O is rAQ-materializable (by Theorem 3). Pick a rAQmaterialization A of O and D. Then condition 2 of Definition 5
means that there exists a pair (φ(~
y ), C) ∈ Q and an assignment π
of elements in dom(D) to the variables in ~
y such that π(~
x) = ~a,
D |= φ(π(~
y )), and A |= q 0 (π(~z)) for each rAQ q 0 (~z) in C. Since A
is a rAQ-materialization of O and D, we can replace A |= q 0 (π(~z))
in the third condition by O, D |= q 0 (π(~z)), which yields (2).
If rAQ-evaluation w.r.t. O is in PT IME, then (2) yields a polynomial time procedure for evaluating q w.r.t. O.
Moreover, if rAQ-evaluation w.r.t. O is Datalog6= -rewritable,
then we can construct a Datalog6= program for evaluating q w.r.t. O
as follows. Let Q0 be the set of rAQs that occur in some pair of Q.
For each q 0 ∈ Q0 , let Πq0 be a Datalog6= program that evaluates
q 0 w.r.t. O. Without loss of generality we assume that the intensional predicates used in different programmes Πq0 and Πq00 are
disjoint, and that the goal predicate of Πq0 is goalq0 . Now let Π be
the Datalog6= program containing the rules of all programs Πq0 , for
q 0 ∈ Q0 , and the following rule for each (φ(~
y ), C) ∈ Q:
^
goal(~x) ← φ(~
y) ∧
goalq0 (~z).
q 0 (~
z )∈C
Note that if each Πq0 is a Datalog program, then Π is a Datalog
program as well. Using (2) it is straightforward to verify that for
all instances D we have D |= Π(~a) iff O, D |= q(~a).
PART 3: The “if” directions are trivial, so we focus on the direction from UCQ-evaluation to rAQ-evaluation. Suppose that UCQevaluation w.r.t. O is CO NP-hard, and let q(~
x) be a UCQ that witnesses this. We show that rAQ-evaluation w.r.t. O is CO NP-hard
via a polynomial-time reduction from evaluating q w.r.t. O.
We start by describing a translation of instances D and tuples ~a,
and will show afterwards that this translation is a polynomial-time
reduction from evaluating q w.r.t. O to evaluating a suitably chosen
rAQ w.r.t. O.
Let Q be a strong decomposition of q(~
x) w.r.t. O. Fix an enu0
meration q10 (~z1 ), . . . , qm
(~zm ) of all rAQs that occur in some pair
of Q, and let ki be the length of ~zi . Without loss of generality, we
can assume that each Dqi0 is consistent w.r.t. O. We use fresh relation symbols R, S, and Ti (1 ≤ i ≤ m), where R and S are binary,
and Ti is (ki + 1)-ary. Note that each of these relation symbols is
at most binary in the case that O is a uGC2 (=) ontology.
Given an instance D and a tuple ~a ∈ dom(D)|~x| , we compute
a new instance D̃ by adding the following atoms to D. First, we
add all atoms of Dqi0 , for each i ∈ {1, . . . , m}, where we assume
w.l.o.g. that the domain of Dqi0 is disjoint from that of D and Dqj0
for j 6= i. Let ~ci be the tuple of elements in Dqi0 that represents the
tuple ~zi of the answer variables of qi0 . Next, we add the following
atoms for each pair p = (φ(~
y ), C) ∈ Q, and for each assignment π
of elements in dom(D) to the variables in ~
y that satisfies π(~
x) = ~a
and D |= φ(π(~
y )):
• R(a0 , ap );
• Ti (ap,π , π(~zi )) for each i ∈ {1, . . . , m} with qi0 ∈ C;
• Ti (ap,π , ~ci ) for each i ∈ {1, . . . , m} with qi0 ∈
/ C.
(2)
2. D |= φ(π(~
y )), and
• S(ap , ap,π );
Since Q is of constant size, the instance D̃ can be computed in time
polynomial in the size of D.
It is now straightforward to verify that O, D |= q(~a) holds iff
D̃, O |= q̃(a0 ), where q̃(x) is the rAQ
q̃(x) ← R(x, y) ∧ S(y, z) ∧
m
^
i=1
Ti (z, ~
ui ) ∧ qi0 (~
ui ) .
Since evaluating q(~
x) w.r.t. O is CO NP-hard, we conclude that
evaluating q̃(x) w.r.t. O is CO NP-hard.
E.
PROOFS FOR SECTION 4
Theorem 5 (restated) For all uGF(=) and uGC2 (=) ontologies
O, unravelling tolerance of O implies that rAQ-evaluation w.r.t. O
is Datalog6= -rewritable (and Datalog-rewritable if O is formulated
in uGF).
P ROOF. Let q be an rAQ. We show that answering q w.r.t. O
is Datalog6= -rewritable. We provide separate proofs for the case
that O is a uGF(=) ontology (Part 1) and for the case that O is a
uGC2 (=) ontology (Part 2).
Part 1: O is a uGF(=) ontology. To simplify the presentation,
we will regard q as an openGF formula, which we can do w.l.o.g.
because q has a connected guarded tree decomposition whose root
is labeled by the answer variables of q. Let cl(O, q) be the smallest
set satisfying:
• O ∪ {q} ⊆ cl(O, q);
• cl(O, q) contains one atomic formula R(~
x) for each relation
symbol R that occurs in O or q, where ~
x is a tuple of distinct
variables;
• cl(O, q) contains a single equality atom x = y, where x and y
are distinct variables;
• cl(O, q) is closed under subformulas and single negation, where
the subformulas of a formula φ = Q~
x ψ with a quantifier Q are
φ itself and ψ.
Let w be the maximum arity of a relation symbol in O or q, and
let x1 , . . . , x2w−1 be distinct variables that do not occur in O or
q. Given a tuple ~
x over X = {x1 , . . . , x2w−1 }, an ~
x-type θ is
a maximal consistent set of formulas, where each formula in θ is
obtained from a formula φ(~
y ) ∈ cl(O, q) by substituting a variable
from ~
x for each variable in ~
y , and one formula in θ is a relational
atomic formula containing all the variables in θ. If θi is an ~
xi -type
for each i ∈ {1, 2}, then θ1 and θ2 are compatible if they agree
on all formulas containing only the variables that occur both in ~
x1
and ~
x2 . An ~
x-type θ is realizable in an interpretation A if there is
an assignment π of elements in dom(A) to the variables in ~
x such
that A |= φ(π(~
y )) for each φ(~
y ) ∈ θ. In this case we also say that
~a = π(~
x) realizes θ in A. Let tp(~
x) be the set of all ~
x-types that are
realizable in some model of O. Since O and q are fixed, each tp(~
x)
is of constant size and can be computed in constant time using a
standard satisfiability procedure for the guarded fragment [29].
We now describe a Datalog6= program Π for evaluating q w.r.t.
O. For each l ∈ {1, . . . , w}, each l-tuple ~
x over X, and each
Θ ⊆ tp(~
x), the program uses an intensional l-ary relation symbol
PΘl . Intuitively, PΘl (~a) encodes an assignment of possible types
(namely those in Θ) for ~a in a model of O. In the description
below, l1 and l2 range over integers in {1, . . . , w}, and ~
x1 and ~
x2
range over tuples over X of length l1 and l2 , respectively, such that
~
x2 contains at least one variable from ~
x1 . Let k be the arity of q.
The program contains the following rules:
1. PΘl1 (~x1 ) ← R(~
y ) ∧ α(~z), where Θ is the set of all θ ∈ tp(~
x1 )
that contain R(~
y ) and α(~z), α is an atomic formula (possibly an
equality) or an inequality, and ~
y contains exactly the variables
from ~x1 ;
2. PΘl1 (~x1 ) ← PΘl11 (~
x1 ) ∧ PΘl22 (~
x2 ), where Θi ⊆ tp(~
xi ), and Θ
is the set of all θ1 ∈ Θ1 such that there exists a θ2 ∈ Θ2 that is
compatible with θ1 ;
x1 ), where Θi ⊆ tp(~
x1 );
x1 ) ∧ PΘl12 (~
3. PΘl11 ∩Θ2 (~x1 ) ← PΘl11 (~
x1 ), where Θ ⊆ tp(~
x1 ), and
4. goal(xi1 , . . . , xik ) ← PΘl1 (~
q(xi1 , . . . , xik ) ∈ θ for each θ ∈ Θ;
5. goal(x1 , . . . , xk ) ← P∅l (~
y ), where l ≤ w and ~
y is a tuple of l
variables from X distinct from x1 , . . . , xk .
It remains to show that Π is a Datalog6= -rewriting for q w.r.t. O.
To this end, let D be an instance. We show that O, D 6|= q(~a) iff
D 6|= Π(~a).
For the “only if” direction, assume O, D 6|= q(~a), and let B
be a model of D and O with B 6|= q(~a). Consider a tuple ~b =
(b1 , . . . , bl ) ∈ dom(D)l for some l ∈ {1, . . . , w}. Note that if ~b is
not guarded in B, then for all tuples ~
x ∈ X l and all Θ ⊆ tp(~
x) the
l ~
program Π does not derive PΘ (b) on input D. This is because of
the rules in line 1, which require ~b to be guarded in order to derive
y = (y1 , . . . , yl ) ∈ X l , define θy~ (~b) to be
PΘl (~b). For each tuple ~
the set of all formulas obtained from a formula φ(z1 , . . . , zn ) ∈
cl(O, q) and 1 ≤ i1 , . . . , in ≤ l with B |= φ(bi1 , . . . , bin ) by
substituting yij for each zj . Since ~b is guarded in D, θy~ (~b) contains
a relational atomic formula that contains all variables from ~
y , hence
θy~ (~b) is a ~
y -type. Furthermore, θy~ (~b) is realizable in B, which
implies θy~ (~b) ∈ tp(~
y ). By induction on rule applications of Π
one can show that for each guarded tuple ~b in dom(B) of length
l, each tuple ~
y ∈ X l , and each Θ ⊆ tp(~
y ) such that PΘl (~b) is
derivable by Π on D we have θy~ (~a) ∈ Θ. Since B 6|= q(~a), for
each guarded tuple ~b = (b1 , . . . , bl ) in D with ~a = (bi1 , . . . , bik )
and each ~
y = (y1 , . . . , yl ) ∈ X l we have q(yi1 , . . . , yik ) ∈
/ θy~ (~b).
Altogether, this implies that D 6|= Π(~a).
For the “if” direction, assume D 6|= Π(~a). Let G~a be a maximal
guarded set of D containing the elements of ~a. Recall the definition
of the uGF(=)-unravelling of D from Section 4. We show that
O, Du 6|= q(~b), where ~b is the copy of ~a in bag(G~a ), which implies
O, D 6|= q(~a) by unravelling tolerance of O. More precisely, we
construct a model B of Du and O with B 6|= q(~b).
Observe that for each l ∈ {1, . . . , w}, each ~c ∈ dom(D)l , each
Θ ⊆ tp(~x), and each bijective mapping f : X → X, the program
Π derives PΘl (~c) iff it derives Pfl (Θ) (~c), where f (Θ) is the f (~
x)type obtained from Θ by substituting f (x) for each variable x in ~
x.
In other words, the sets Θ of types that Π derives for each tuple ~c
do not depend on the choice of the variables ~
x. Observe also that
for each guarded tuple ~c in D there is a unique minimal Θ(~c) ⊆
l
tp(x1 , . . . , xl ) such that Π derives PΘ(~
c). Since Π does not
c) (~
derive goal(~a), we must have Θ(~c) 6= ∅ for all guarded tuples ~c in
D. Furthermore, if ~c = (c1 , . . . , cl ) and ~a = (ci1 , . . . , cik ), then
q(xi1 , . . . , xik ) ∈
/ θ for some θ ∈ Θ(~c). Let us denote the set of
such types in Θ(~c) by Θ¬q (~c).
To construct the desired model B, we first assign to each maximally guarded tuple ~c of Du a type in Θ(~c) as follows. Recall
the definition of the tree T (D) and the bags bag(t), t ∈ T (D),
used in the definition of the uGF-unravelling in Section 4. For each
node t of T (D), let Gt := dom(bag(t)) and ~ct a |Gt |-tuple of all
elements in Gt . We inductively assign to each node t of T (D) a
(x1 , . . . , x|Gt | )-type θt as follows. If t = G for a maximal guarded
subset G of D, pick a type θt ∈ Θ¬q (~ct ). For the induction step,
suppose that t = t0 G and tail(t0 ) = G0 . Let ~ct0 = (c01 , . . . , c0m ),
~ct = (c1 , . . . , cn ), and define a bijective mapping f : X → X
such that for all i ∈ {1, . . . , n},
• if ci = c0j , then f (xi ) = xj ;
• if ci ∈
/ {c01 , . . . , c0m }, then f (xi ) ∈
/ {x1 , . . . , xm }.
Since θt0 ∈ Θ(~ct0 ), there exists a type θt ∈ Θ(~ct ) such that θt0
and f (θt ) are compatible. This follows from the rules in line 2 of
the definition of Π.
Finally, for each node t of T (D), pick a model Bt of O such
that ~ct realizes θt in Bt . Without loss of generality we assume
that dom(Bt ) ∩ dom(Du ) = Gt for each node t of T (D), and
dom(Bt ) ∩ dom(Bt0 ) = Gt ∩ Gt0 for every two distinct t, t0 ∈
T (D) Let B be the interpretation obtained from Du by hooking
Bt to Du for each node t of T (D):
[
B := Du ∪
Bt .
t∈T (D)
u
Clearly, B is a model of D . It remains to show that B is a model
of O with B 6|= q(~b).
Claim. For all openGF formulas φ(~
x), all guarded tuples ~c of B,
and all nodes t ∈ T (D) with ~c ⊆ dom(Bt ),
Bt |= φ(~c) ⇐⇒ B |= φ(~c).
(3)
Proof. By induction on the structure of φ. For the base case assume
that φ is an atomic formula. Let ~c be a guarded tuple of B and
let t ∈ T (D) be such that ~c ⊆ dom(Bt ). Then, Bt |= φ(~c)
implies B |= φ(~c) because Bt ⊆ B. For the converse assume that
B |= φ(~c). Since φ is atomic, this implies Bt0 |= φ(~c) for some
t0 ∈ T (D) with ~c ⊆ dom(Bt0 ). By the choice of the types θt and
θt0 , we obtain Bt |= φ(~c).
For the inductive step, we distinguish the following cases:
Case 1: φ(~
x) = ¬φ0 (~
x). For each guarded tuple ~c of B and each
node t ∈ T (D) with ~c ⊆ dom(Bt ), we have
Bt |= φ(~c) ⇐⇒ Bt 6|= φ0 (~c)
⇐⇒ B 6|= φ0 (~c) ⇐⇒ B |= φ(~c),
where the second equivalence follows from the induction hypothesis.
Case 2: φ(~
x) = φ1 (~
x1 )∧φ2 (~
x2 ). Consider a guarded tuple ~c of B
and a node t ∈ T (D) with ~c ⊆ dom(Bt ). Let ~ci be the projection
of ~c onto the positions of ~
x that contain a variable from ~
xi . Then,
Bt |= φ(~c) ⇐⇒ Bt |= φi (~ci ) for each i ∈ {1, 2}
⇐⇒ B |= φi (~ci ) for each i ∈ {1, 2}
⇐⇒ B |= φ(~c),
where the second equivalence follows from the induction hypothesis.
Case 3: φ(~
x) = ∀~
y (α(~
x, ~
y ) → ψ(~
x, ~
y )). Consider a guarded tuple
~c of B and a t ∈ T (D) with ~c ⊆ dom(Bt ).
We first prove that Bt |= φ(~c) implies B |= φ(~c). Suppose that
~ for some tuple d.
~ Since α is
Bt |= φ(~c), and let B |= α(~c, d)
0
~
atomic, we have Bt0 |= α(~c, d) for some node t ∈ T (D) with
~c, d~ ⊆ dom(Bt0 ). Note that ~c contains only values in dom(Bt ) ∩
dom(Bt0 ) and that ~c is non-empty (because φ is open). By the
choice of the types θt and θt0 and since Bt |= φ(~c), we obtain
~ The induction hypothesis
Bt0 |= φ(~c). Hence, Bt0 |= ψ(~c, d).
~
now yields B |= ψ(~c, d). We have thus shown that B |= φ(~c).
~
For the converse, assume B |= φ(~c), and let Bt |= α(~c, d)
~
for some tuple d. Since α is atomic and Bt ⊆ B, we have
~ and therefore B |= ψ(~c, d).
~ Since ~c, d~ is guarded
B |= α(~c, d),
and contained in dom(Bt ), the induction hypothesis implies Bt |=
~ This shows that Bt |= φ(~c).
ψ(~c, d).
Case 4: φ(~x) = ∃~
y (α(~
x, ~
y ) ∧ ψ(~
x, ~
y )). Consider a guarded tuple
~c of B and a node t ∈ T (D) with ~c ⊆ dom(Bt ).
If Bt |= φ(~c), then there exists a tuple d~ ⊆ dom(Bt ) such that
~ and Bt |= ψ(~c, d).
~ By the induction hypothesis,
Bt |= α(~c, d)
~
~ and hence B |= φ(~c).
this implies B |= α(~c, d) and B |= ψ(~c, d),
Conversely, if B |= φ(~c), then there exists a tuple d~ such that
~ and B |= ψ(~c, d).
~ Since α is atomic, there exists
B |= α(~c, d)
a node t0 ∈ T (D) with ~c, d~ ⊆ dom(Bt0 ), and hence by the in~ and Bt0 |= ψ(~c, d),
~
duction hypothesis we obtain Bt0 |= α(~c, d)
and therefore Bt0 |= φ(~c). By the choice of θt and θt0 , we obtain
Bt |= φ(~c).
y
Using the claim we now show that B is a model of O with B 6|=
q(~b). By construction we have θt ∈ Θ¬q (~ct ), where t = G~a . This
implies Bt 6|= q(~b), and using the claim we obtain B 6|= q(~b).
To prove that B is a model of O, let ψ = ∀~
x(α(~
x) → φ(~
x)) be
a sentence in O, and let ~c be a tuple with B |= α(~c). Since α is
atomic, ~c is a guarded tuple in B. In particular, since B |= α(~c),
there must be a node t0 ∈ T (D) with ~c ⊆ dom(Bt0 ) and Bt0 |=
α(~c). In fact, by the choice of the types assigned to the maximally
guarded tuples of Du , every node t ∈ T (D) with ~c ⊆ dom(Bt )
must satisfy Bt |= α(~c). Now let t be a node in T (D) such that
~c ⊆ dom(Bt ). Since Bt |= α(~c) and Bt is a model of O, we get
Bt |= φ(~c), so by (3) we have B |= φ(~c). Therefore, B |= ψ.
This holds for all sentences in O, hence B is a model of O.
If O is a uGF-ontology, we obtain a Datalog-rewriting from Π
by removing inequalities from types and the rules in line 1.
Part 2: O is a uGC2 (=) ontology. Analogous to part 1 we regard q
as an openGC2 formula, which we can do w.l.o.g. because q has a
cg-tree decomposition where the domain of each bag consists of at
most two elements and whose root bag has the answer variables of
q as its domain. We define cl(O, q), types, and the corresponding
notion of realization as in part 1. Since in the context of uGC2 (=)
we work over an at most binary signature, we will only use types
over one or two variables. The sets tp(x) and tp(x, y) of these
types can be computed in constant time using a satisfiability procedure for the two-variable guarded counting fragment [48]. Given a
~
x0 -type θ0 and a set Θi ⊆ tp(~
xi ) for each i ∈ {1, . . . , `} we write
θ0
(Θi )ì=1 iff there exists a θi ∈ Θi for each i ∈ {1, . . . , `}
S
such that ì=0 θi is realizable in a model of O. Furthermore, if
(x, y) and (u, v) are pairs of distinct variables and Θ ⊆ tp(u, v),
then we write Θx,y for the set of all types in tp(x, y) that can be
obtained from a type in Θ by renaming u to x and v to y.
We now turn to the description of the Datalog6= program Π
for evaluating q w.r.t. O. The program uses distinct variables
x, y, z0 , z1 , z2 , . . . , zN τ 2τ . Here, τ is one more than the number
of types of two distinct variables, and N is the largest integer n
such that a formula of the form ∃≥n x φ occurs in cl(O, q), or 1
if there is no such formula in cl(O, q). For each i ∈ {1, 2} and
Θ ⊆ tp(u, v), Π uses an intensional binary relation symbol PΘ
with the sameVintended interpretation as the symbols PΘl in part 1.
Let neq` := 0≤i<j≤` zi 6= zj . Then Π consists of the following
rules:
1. PΘ (x, y) ← R(~v ) ∧ α(w),
~ where Θ is the set of all types in
tp(x, y) containing R(~v ) and α(w),
~ α is an atomic formula
(including an equality) or an inequality, and ~v contains both x
and y;
V
2. PΘ (x, z0 ) ← li=0 PΘi (x, zi )∧neq` , where ` ≤ N τ 2τ , Θi ⊆
tp(x, zi ) for i = 0, 1, . . . , `, and Θ consists of all θ0 ∈ Θ0 with
θ0
(Θi )ì=1 ;
V
3. PΘ1 ∩Θ2 (x, y) ← 2i=1 PΘi (x, y) with Θi ⊆ tp(x, y);
4. goal(~v ) ← PΘ (x, y), where Θ ⊆ tp(x, y), and q(~v ) ∈ θ for
each θ ∈ Θ;
5. goal(~v ) ← P∅ (x).
In addition, for each set Θ ⊆ tp(x, y), the program contains
PΘx,zi (x, y) ← PΘ (x, y) and PΘ (x, z0 ) ← PΘx,z0 (x, z0 ) to rename variables in types, and PΘy,x (y, x) ← PΘ (x, y) to swap
positions of variables.
Intuitively, Π computes for each guarded tuple (a, b) of an instance D a set Θ(a, b) of (x, y)-types θ0 that contain all information about the atomic formulas that hold in D|{a,b} , and for
each collection (a, c1 ), . . . , (a, c` ) of ` ≤ N τ 2τ guarded tuples
of D that have a non-empty intersection with (a, b) there are types
S
θi ∈ Θ(a, ci ) such that ì=0 (θi )x,zi is realizable in a model of O.
The renaming of the variables in each θi is necessary to account for
the overlap of the tuples (a, b) and ~ci . Π derives goal(~c) if some
Θ(a, b) is empty (which will happen if D is inconsistent w.r.t. O)
or if Θ(a, b) contains q(~v ) for some (a, b), where the ith variable
in ~v is x if the ith position of ~a is a, and it is y otherwise.
It remains to show that Π is a Datalog6= -rewriting for q w.r.t. O.
To this end, let D be an instance. We show that O, D 6|= q(~a) iff
D 6|= Π(~a).
The “only if” direction is similar to part 1. For the “if” direction,
assume D 6|= Π(~a). Recall the definition of the uGF2 -unravelling
of D from Section 4, and let G~a be a maximal guarded set of D
containing the elements of ~a. As in part 1, it suffices to construct a
model B of Du and O with B 6|= q(~b), where ~b is the copy of ~a in
bag(G~a ).
Observe that for each guarded tuple ~c in D there is a unique
minimal set Θ(~c) ⊆ tp(x, y) such that Π derives PΘ(~c) (~c) on input
D. Since Π does not derive goal(~a), we must have Θ(~c) 6= ∅ for
all guarded tuples ~c in D. Furthermore, if ~a = f (~v ) for a mapping
from the variables in ~v to the constants in ~c, then q(~v ) ∈
/ θ for some
θ ∈ Θ(~c). Let us denote the set of such types in Θ(~c) by Θ¬q (~c).
To construct the desired model B, we first assign to each maximally guarded tuple ~c of Du a type in Θ(~c↑ ) and a model At of O as
follows. Recall the definition of the tree T (D) and the bags bag(t),
for t ∈ T (D), used in the definition of the uGF2 -unravelling in
Section 4. We inductively assign to each node t of T (D) a (x, y)type θt as follows. If t = {a, b} for a maximal guarded set {a, b}
in D, let θt be any type in Θ¬q (a, b). For the induction step, assume that we have assigned a type θt to t ∈ T (D), but that θt0 is
undefined for each node t0 = tG in T (D). Let tail(t) = {a, b}.
We distinguish the following two cases.
Case 1: There exists a t0 ∈ T (D) with t = t0 {a, b}. In this case,
there is only one kind of successor node of t in T (D): nodes of the
form t{a, c}, where a does not occur in tail(t0 ). Let c1 , . . . , cn be
an enumeration of all elements of dom(D) such that ti := t{a, ci }
is a node in T (D). Denote by ∼ the equivalence relation on
{c1 , . . . , cn } with ci ∼ cj iff Θ(a, ci ) = Θ(a, cj ). There are
s ≤ 2τ equivalence classes w.r.t. ∼. Let E1 , . . . , Es be an enumeration of these classes. For each i ∈ {1, . . . , s}, pick an arbitrary
subset Ei0 of Ei of size N τ , or the entire set if Ei has fewer than
N τ elements. Let c01 , . . . , c0` be an enumeration of the elements in
E10 ∪ · · · ∪ Es0 . Then, ` ≤ N τ 2τ .
Since θt ∈ Θ(a, b), the rules in line 2 of the definition of Π ensure θt
(Θi )ì=1 , where Θi := {θx,zi | θ ∈ Θ(a, c0i )}. Hence,
for each i ∈ {1, . . . , `} there is a type θi ∈ Θ(a, c0i ) such that
S
θt ∪ ì=1 (θi )x,zi is realizable in a model of O. Let θt{a,c0i } := θi
for each i ∈ {1, . . . , `}.
It remains to assign types to the elements in Ei \ Ei0 , for each
i ∈ {1, . . . , s}. By construction, we have Ei \ Ei0 6= ∅ only if
|Ei0 | = N τ . Thus there must be at least one type θi∗ that is assigned
to at least N of the nodes t{a, ci }. We define θt{a,c} := θi∗ for
each c ∈ Ei \ Ei0 .
This finishes the assignment of types θt0 to all successor nodes
∗
t0 of t.
SnNote that by our choice of N and the type θi , the set θ :=
θt ∪ i=1 (θti )x,zi is realizable in a model At of O. In fact, At
can be chosen such that the assignment πt with πt (x)↑ = a and
πt (zi )↑ = ci realizes θ in At .
Case 2: t = {a, b}. In this case, there are two kinds of successor nodes of t: nodes of the form t{a, c} and nodes of the
form t{b, c}. Let t1 , . . . , tn be an enumeration of all nodes of
the form t{a, c}, and let u1 , . . . , um be an enumeration of all
nodes of the form t{b, c}. We assign types θti to each node ti
as in Case 1. We do the same for all successors nodes uj , but
here we have to use the rule PΘy,x (y, x) ← PΘ (x, y) which implies that Θ(b,
θ ∈ Θ(a, b)}. One can now show that
S a) = {θy,x | S
m
0
θ := θt ∪ n
(θ
)
∪
t
x,z
i
i
i=1
i=1 (θui )y,zi is realizable in a model
At of O, and that this model can be chosen in such a way that
the assignment πt with πt (x)↑ = a, πt (zi )↑ ∈ tail(ti ) \ {a} and
πt (uj )↑ ∈ tail(uj ) \ {b} realizes θ in At .
This concludes the induction step.
We are now ready to construct B. Note that any two At and At0
with t0 a neighbor of t in T (D) coincide on all atoms that only involve elements of dom(Du ). Let B0 be the collection of all atoms
that occur in some At and only involve elements of dom(Du ). For
each a ∈ dom(Du ) we now attach the following structure B{a} to
B0 . Pick any t ∈ T (D) such that a ∈ dom(bag(t)). Using the
construction in the second part of the proof of Lemma 1, we unfold
At at a into a cg-decomposable model B{a} , where we identify a
and its neighbors b with the copies of a and b in the bag of the root
and its neighbors. We assume that all other elements have been
renamed so that they do not occur in dom(Du ). Let
[
B := B0 ∪
B{a} .
a∈dom(Du )
u
It is not difficult to prove that B is a model of D and O with
B 6|= q(~b).
F.
PROOFS FOR SECTION 5
Theorem 6 (restated) Let O be either a uGF(1), uGF− (1, =),
−
uGF−
2 (2), uGC2 (1, =), or ALCHIF ontology of depth 2. If O
is materializable for (possibly infinite) cg-tree decomposable instances D with sig(D) ⊆ sig(O), then O is unravelling tolerant.
P ROOF. We first give the proof for the ontology languages
−
uGF(1), uGF−
2 (2) and uGF (1, =). Assume such an ontology O
is given. Let D be an instance and Du its uGF-unravelling. Let
G0 be a maximal guarded set in D, ~a in G0 , ~b the copy of ~a in
bag(G0 ), and q an rAQ. We have to show that O, D |= q(~a) implies O, Du |= q(~b).
Define an equivalence relation ∼ on T (D) by setting t ∼ t0 if
tail(t) = tail(t0 ). Recall that for any t, t0 ∈ T (D) with t ∼ t0
the mapping ht,t0 that sends every e ∈ dom(bag(t)) to the unique
f ∈ dom(bag(t0 )) with e↑ = f ↑ is an isomorphism from bag(t)
to bag(t0 ). Call ht,t0 the canonical isomorphism from bag(t) onto
bag(t0 ). By construction of the undirected graph (T (D), E), for
any t, t0 ∈ T (D) with t ∼ t0 there is an automorphism it,t0 of
(T (D), E) such that it,t0 (t) = t0 and it,t0 (s) ∼ s for every s ∈
T (D). it,t0 is uniquely determined on the connected component of
t in (T (D), E) and induces the extended canonical automorphism
S
ĥt,t0 of Du by setting ĥt,t0 = s∈T (D) hs,it,t0 (s) .
Fact 1. If t, t0 ∈ T (D) such that t ∼ t0 then ĥt,t0 is an automorphism of Du .
Our aim now is to construct a materialization B of O and Du that
is a forest model such that the automorphism ĥt,t0 can be lifted
to an automorphism of B. Once this is done it is straightforward
to construct a materialization B0 of O and D by hooking to any
maximal guarded set G of D a copy of Bbag(G) , the guarded tree
decomposable subinstances of B hooked to bag(G) in B. Then
(B0 , ~a) and (B, ~b) are guarded bisimilar and if O, Du 6|= q(~b) then
O, D 6|= q(~a) follows.
Let B be a materialization of O and Du . (To show that B exists
let red(Du ) be the sig(O)-reduct of D. As O is invariant under
disjoint unions and materializable for the class of (possibly infinite)
cg-tree decomposable instances D with sig(D) ⊆ sig(O) there
exists a materialization Bred of red(Du ). Clearly {R | R(~a) ∈
Bred } ⊆ sig(O). Now let
B = Bred ∪ {R(~a) ∈ Du | R 6∈ sig(O)}
One can show that B is a materialization of Du and O.)
Fact 1 entails the following.
Fact 2. For all t, t0 with t ∼ t0 and ~e in dom(bag(t)) and any rAQ
q(~
x) such that ~
x has the same length as ~e:
B |= q(~e)
⇔
We now distinguish two cases.
B |= q(ht,t0 (~e))
Case 1. O is a uGF(1) or a uGF−
2 (2) ontology. The following
observation is crucial for the proof (and does not hold for languages
with equality such as uGF− (1, =)).
Observation 1. If there is a homomorphism h from an instance D
to an instance D0 then O, D |= q(~a) implies O, D0 |= q(h(~a)) for
any CQ q and ~a in dom(D).
We hook in Du to any bag(t) with t ∈ T (D) a copy of any
rAQ q (regarded as an instance) from which there is a homomorphism into B that is injective on the root bag of q and maps it into
dom(bag(t)). In more detail, let Xt be the set of all pairs (π, Dq )
such that Dq is an instance corresponding to an rAQ q and π is a homomorphism from Dq to B mapping the constants dx corresponding to answer variables x of q to distinct π(dx ) ∈ dom(bag(t)). By
renaming constants in Dq we obtain an instance Dq,π isomorphic
to Dq such that π(dx ) = dx and such that dom(Dq,π ) ∩ dom(Du )
is the set of all dx with x an answer variable of q. Now let
[
[
Dt =
Dq,π , Du+ = Du ∪
Dt
(q,π)∈Xt
t∈T (D)
u+
The following properties of D
tion:
follow directly from the defini-
1. For any t, t0 ∈ T (D) with t ∼ t0 there is an isomorphism from
Dt onto Dt0 that extends the canonical isomorphism ht,t0 ;
2. there is a homomorphism from Du+ into B preserving
dom(Du ). Thus, by Observation 1, any materialization of Du+
is a materialization of Du and for every CQ q(~
x) and d~ in Du
of the same length as ~
x:
~
O, Du+ |= q(d)
⇔
~
O, Du |= q(d)
Now take a materialization Bu+ of Du+ and O that is a forest
model of Du+ and O. Thus Bu+ is obtained from Du+ by hooking
cg-tree decomposable models Bu+
of Dt to every bag(t) with t ∈
t
T (D). By Point 1 and Fact 1, we have the following:
Fact 3. For any t, t0 ∈ T (D) with t ∼ t0 the canonical automorphism ĥt,t0 extends to an isomorphism from Bu+
|dom(Dt ) onto
Bu+
.
|dom(D 0 )
t
Fact 4. For any t, t0 ∈ T (D) with t ∼ t0 and any finite subinstance A of Bu+
there exists an isomorphic embedding of A into
t
Bu+
extending
the canonical automorphism ĥt,t0 .
|dom(D 0 )
t
We prove Fact 4. By Fact 3 it suffices to prove that for any
t ∈ T (D) and any finite subinstance A of Bu+
there exists an isot
morphic embedding of A into Bu+
preserving
dom(bag(t)).
|dom(Dt )
But this follows from the fact that Bu+ is a materialization of Du
(Point (2)): then there is an isomorphism from A to some Dπ,q
used in the construction of Du+ which preserves dom(bag(t)). Fix
such a Dπ,q . It remains to be proved that there does not exist any
R(~a) with ~a in dom(Dπ,q ) such that R(~a) ∈ Bu+ \ Dπ,q . But
using the fact that A is a subinstance of the model Bu+ of O and
Du+ isomorphic to Dπ,q one can easily construct a model of Du+
and O that contains no R(~a) 6∈ Dπ,q with ~a in dom(Dπ,q ). Thus
Bu+ contains no such R(~a) since Bu+ is a materialization of O
and Du+ . This finishes the proof of Fact 4.
Fact 4 easily generalizes to Bu+ (by Fact 3): for any t, t0 ∈
T (D) with t ∼ t0 and any finite subinstance A of Bu+ there exists
an isomorphic embedding of A into Bu+ extending the canonical
automorphism ĥt,t0 . We now uniformize Bu+ . For each t ∈ T (D)
with t not a guarded set in D we hook at bag(t) to Du an isomoru+
phic copy Bu∗
t of the interpretation Bbag(G) with t ∼ G and reu+
move Bt . Denote the resulting model by Bu∗ . Bu∗ is uniform
in the sense that for any two t, t0 ∈ T (D), the cg-tree decomposable models hooked to bag(t) and bag(t0 ) in Bu∗ are isomorphic. We now show that Bu∗ is a materialization of D and O.
∼
For a ∈ dom(Bu∗
the corresponding
t ) and t ∼ G denote by a
element of Bu+
such
that
for
a
∈
bag(t)
we
have a↑ = (a∼ )↑ .
bag(G)
Fact 5. Bu∗ is a materialization of O and Du .
We show that Bu∗ is a model of O. Then Bu∗ is a materialization of O and Du since it is a model of Du and since Bu∗ is
obtained from the materialization Bu+ of O and Du by replacing
certain interpretations that are hooked to bag(t) by interpretations
that are hooked to bag(t0 ) for t, t0 ∈ T (D) with t ∼ t0 which
preserves the answers to rAQs (use Fact 1).
Consider first a sentence ϕ of the form ∀~
y (R(~
y ) → ψ(~
y ))
in O, where ψ is a formula in openGF of depth one. We show
that Bu∗ |= ϕ. Let Bu∗ |= R(~a) for some ~a = (a1 , . . . , ak )
in dom(Bu∗ ). Then a1 , . . . , ak are contained in Bu∗
for some
t
t ∈ T (D). We show that Bu∗ |= ψ(~a) iff Bu+ |= ψ(~a∼ ) where
∼
u
~a∼ = (a∼
1 , . . . , ak ). Assume t ∼ G. If a1 , . . . , ak 6∈ dom(D ),
then this is clear by construction since the truth of ψ(~a) then only
u∗
depends on the subinterpretation Bu∗
and this is isomort of B
u+
phic to the subinterpretation Bbag(G) of the model Bu+ of O. Now
assume that {a1 , . . . , ak }∩dom(Du ) = Z 6= ∅. By Fact 4, for any
guarded set F in Bu∗ with G0 ∩ Z 6= ∅ there exists a guarded set
F 0 in Bu∗ such that there exists an isomorphism h from Bu∗
|F onto
u∗
B|F 0 that extends the canonical automorphism ĥt,G . The converse
∼
u
direction holds as well: let Z 0 = {a∼
1 , . . . , ak } ∩ dom(D ). By
Fact 4, for any guarded set F 0 in Bu∗ with F 0 ∩ Z 0 6= ∅ there exists a guarded set F in Bu∗ such that there exists an isomorphism
u∗
h from Bu∗
|F 0 onto B|F that extends the canonical automorphism
ĥG,t .Thus Bu∗ |= ψ(~a) iff Bu+ |= ψ(~a∼ ) follows.
For sentences ϕ of the form ∀xψ(x) in O, where ψ(x) is an
openGF formula of depth two using at most binary relations the
argument is similar using the fact that guarded sets {a, b} are either
completely contained in dom(Du ) or contain at most one element
from dom(Du ). This finishes the proof of Fact 5.
Finally we hook for any maximal guarded G in D the interpretation Bu+
bag(G) to G in the original instance D and obtain a forest
model B+ . It is straightforward to prove that for any maximal
guarded set G in D, any tuple ~e containing all elements of G, and
the copy f~ of ~e in bag(G) there is a connected guarded bisimulation between (Bu∗ , f~) and (B+ , ~e). It follows that B+ is a model
of O and D and that B+ 6|= q(~a) if Bu∗ 6|= q(~b).
Case 2. O is a uGF− (1, =) ontology. In this case the construction
is simpler as we do not modify B further. There is no need to
manipulate B as we are in a fragment of depth one in which the
outermost universal quantifier is guarded by an equality. Define
B+ by adding to D
• all atoms R(a↑1 , . . . , a↑k ) such that R(a1 , . . . , ak ) ∈ B and
a1 , . . . , ak ∈ dom(Du );
• for any a ∈ dom(D) for a fixed copy a0 of a in Du a copy Ba0
of B that is hooked to D at a by identifying a0 and a.
Using Fact 1 and the condition that O is a uGF− (1, =) ontology it
is straightforward to prove that B+ is a model of O and D. Also
by Fact 1 and construction B+ 6|= q(~a) if B 6|= q(~b).
We now assume that O is a uGC−
2 (1, =) or a ALCHIF ontology of depth 2. Let D be an instance, G0 be a maximal guarded set
in D, ~a in G0 , ~b the copy of ~a in bag(G0 ), and q an rAQ. We have
to show that O, D |= q(~a) implies O, Du |= q(~b). Observe that
now we have to consider the uGC2 -unravelling rather than the uGFunravelling of D. First we establish again the existence of certain
automorphisms of Du . In this case, however, they are not induced
by automorphisms of the tree (T (D), E) but are determined directly on the interpretation. Recall that for any t, t0 ∈ T (D) with
t ∼ t0 the mapping ht,t0 that sends every e ∈ dom(bag(t)) to
the unique f ∈ dom(bag(t0 )) with e↑ = f ↑ is an isomorphism
from bag(t) to bag(t0 ). Call ht,t0 the canonical isomorphism from
bag(t) onto bag(t0 ). One can easily prove the existence of an extended canonical automorphism ĥt,t0 of Du that extends ht,t0 .
Fact 1. If t, t0 ∈ T (D) such that t ∼ t0 there is an automorphism
ĥt,t0 of Du that extends ht,t0 .
Let B be a materialization of O and Du . Fact 1 entails the following.
Fact 2. For all t, t0 with t ∼ t0 and ~e in dom(bag(t)) and any rAQ
q(~
x) such that ~
x has the same length as ~e:
B |= q(~e)
⇔
B |= q(ht,t0 (~e))
We now distinguish two cases.
uGC−
2 (1, =)
Case 3. O is a
ontology. This case is similar to
Case 2. No further modification of B is needed. We may assume
that B is obtained from Du by hooking cg-tree decomposable interpretations Bc to c for every c ∈ dom(Du ) and by adding atoms
R(c, d) to Du for distinct c, d ∈ dom(Du ). Define B+ by adding
to D
• all atoms R(a↑1 , . . . , a↑k ) such that R(a1 , . . . , ak ) ∈ B and
a1 , . . . , ak ∈ dom(Du );
• for any a ∈ dom(D) for a fixed copy a0 of a in Du a copy of
Ba0 that is hooked to D at a by identifying a0 and a.
Using Fact 1 and the condition that O is a uGC−
2 (1, =) ontology it
is straightforward to prove that B+ is a model of O and D. Also
by Fact 1 and construction B+ 6|= q(~a) if B 6|= q(~b).
Case 4. O is a ALCHIF ontology of depth 2. The proof that
follows is similar to Case 1, but one cannot hook to any bag(t)
a copy of any rAQ from which there is a homomorphism into B
that is injective on the root bag of q and maps it into dom(bag(t))
as this can lead to violations of functionality. Two modifications
are needed: firstly, we do not independently hook interpretations to
bags bag(t) with two elements in Du . This is to avoid violations of
functionality due to guarded sets G1 and G2 with G1 ∩ G2 = {d}
when the interpretations we hook to G1 and G2 independently add
an R-successor to d for a function R (this has already been done in
Case 3). Secondly, we cannot hook arbitrarily many rAQs to bags
as this will lead to violations of functionality as well.
We may again assume that B is obtained from Du by hooking cg-tree decomposable interpretations Bc to c for every c ∈
dom(Du ) and by adding atoms R(c, d) to Du for distinct c, d ∈
dom(Du ). Observe that GBc = {{a, b} | R(a, b) ∈ Bc , a 6= b}
is an undirected tree. We call c its root and in this proof call such an
interpretation a tree interpretation with root c. We have to modify
Du to be able to uniformize. For any c ∈ dom(Du ) we define the
tree instance Dc with root c as follows. Let Dq be the instance corresponding to an rAQ q = q(x) ← φ with a single answer variable
x and a single additional variable y such that there is an injective
homomorphism h from Dq to B mapping x to some c in dom(Du )
and such that neither R(h(x), h(y)) ∈ B nor R(h(y), h(x)) ∈ B
for any R that is functional in O. Then Dc contains a copy of Dq
obtained by identifying the variable x with c. Set
[
Du+ = {R(~a) ∈ B | ~a ⊆ dom(Du )} ∪
Dc .
c∈dom(Du )
The following properties of Du+ follow directly from the definition
and standard properties of ALCHIF :
1. For any c, d ∈ dom(Du ) with c↑ = d↑ there is an isomorphism
from Dc onto Dd mapping c to d;
2. there is a homomorphism from Du+ into B preserving
dom(Du ) and functionality. Thus, in particular, any materialization of Du+ is a materialization of Du and for every CQ
q(~x) and d~ in Du of the same length as ~
x:
~
O, Du+ |= q(d)
⇔
~
O, Du |= q(d)
Now take a materialization Bu+ of Du+ and O. Thus Bu+ is
obtained from Du+ by hooking tree-interpretations Bu+
that are
c
models of Dc to every c ∈ dom(Du ). By Point 1 and Fact 1 and
the properties of ALCHIF we have the following:
Fact 3. For any c, d ∈ dom(Du ) with c↑ = d↑ , c ∈ dom(bag(t)),
and d ∈ dom(bag(t0 )) such that t ∼ t0 the canonical automorphism
u+
ĥt,t0 extends to an isomorphism from Bu+
|dom(Dc ) onto B|dom(Dd ) .
The following fact can now be proved by modifying in a straightforward way the proof of Fact 4 above.
Fact 4. For any c, d ∈ dom(Du ) with c↑ = d↑ and any e ∈
u+
dom(Bu+
c ) there exists an isomorphic embedding of B|{c,e} into
u+
B|dom(Dd ) .
Let for c, d ∈ dom(Du ), c ∼ d if c↑ = d↑ . Fix for any equivalence
class [c] = {d | d ∼ c} a unique c∼ ∈ [c]. We now uniformize
Bu+ . For each d ∈ dom(Du ) we hook at d to Du an isomorphic
u+
↑
↑
copy Bu∗
d of the interpretation Bc∼ with c∼ = d and remove
u∗
u∗
Bu+
.
Denote
the
resulting
model
by
B
.
B
is
uniform
in the
d
u
sense that for any c ∼ d the interpretations Bu∗
c hooked to c in D
u
and Bu∗
hooked
to
d
in
D
are
isomorphic.
One
can
now
prove
d
similary to the proof of Fact 5 above the following:
Fact 5. Bu∗ is a materialization of O and Du .
It remains construct the materialization B+ of D. To this end
we hook to any c ∈ dom(D) the tree interpretation Bu∗
d with
d↑ = c. In addition we include in B+ all R(a↑1 , . . . , a↑k ) such
that R(a1 , . . . , ak ) ∈ Bu+ and a1 , . . . , ak ∈ dom(Du ). Using
Fact 5 one can show that B+ is a model of O and D such that
B+ 6|= q(~a) if B 6|= q(~b).
G.
PROOFS FOR SECTION 6
Theorem 8 (restated) For any of the following ontology languages,
CQ-evaluation w.r.t. L is CSP-hard: uGF2 (1, =), uGF2 (2),
uGF2 (1, f ), and the class of ALCF ` ontologies of depth 2.
P ROOF. We provide additional details of the proof for
uGF2 (1, =). Recall that O contains
^
_ 6=
6=
∀x(
¬(ϕ6=
ϕa (x))
a (x) ∧ ϕa0 (x)) ∧
a6=a0
a
∀x(A(x) → ¬ϕ6=
when A(a) 6∈ A
a (x))
6=
∀xy(R(x, y) → ¬(ϕ6=
(x)
∧
ϕ
(y)))
when R(a, a0 ) 6∈ A
a
a0
=
∀xϕa (x)
for all a ∈ dom(A)
where A and R range over symbols in sig(A) of the respective
arity. We first show that coCSP(A) polynomially reduces to the
query evaluation problem for (O, q ← N (x)). Assume D with
sig(D) ⊆ sig(A) is given and let
D0 = D ∪ {Ra (d, d0 ) | Pa (d) ∈ A},
where the relations Pa ∈ sig(A) determine the precoloring and d0
is a fresh labelled null for each d ∈ dom(D). We show that D → A
iff O, D0 6|= q. First let h be a homomorphism from D to A. Define
a model B of D0 and O by adding to D0 for any d ∈ dom(D) with
h(d) = a and infinite chain
Ra (d0,d , d1,d ), Ra (d1,d , d2,d ), . . .
with d0,d = d and fresh labelled nulls di,d for i > 0. Also add
Ra (d, d) to D for all d ∈ dom(D) and all labelled nulls used in the
chains. Using the definition of O it is not difficult to show that B is
a model of O and D0 . Thus O, D0 6|= q, as required. Now assume
that O, D0 6|= q. Then there is a model B of O and D0 such that
B 6|= q. Define a mapping h from D to A by setting h(d) = a iff
there exists d0 with d0 6= d and Ra (d, d0 ) ∈ B. Using the definition
of O it is not difficult to show that h is well defined and a homomorphism. This finishes the proof of the polynomial reduction of
coCSP(A) to the query evaluation problem for (O, q ← N (x)).
Now we show that for any rAQ q the query evaluation problem
for (O, q) can be polynomially reduced to coCSP(A). We first
show that there is a polynomial reduction of the problem whether
an instance D is consistent w.r.t. O to CSP(A). Assume D is given.
Let D• be the sig(A)-reduct of D extended with Pa (d) for any d
with Ra (d, d0 ) ∈ D for some d0 6= d. Using the definition of O
one can show that D is consistent w.r.t. O iff D• → A.
~ iff D is not consistent w.r.t. O or D0 |= q(d)
~
Now O, D |= q(d)
where D0 = D ∪ {Ra (d, d) | a ∈ dom(A), d ∈ dom(D)}. The
latter problem is in PT IME.
H.
D
d
d1
d3
X
Y
P
Y
X
P
d2
R2
d1
d
d4
R2
X
Y d3 R 1
Y
X d4 R 2
d2
R1
R1
PROOFS FOR SECTION 7
Theorem 10 (restated) For the ontology languages uGF−
2 (2, f )
and ALCIF ` of depth 2, it is undecidable whether for a given
ontology O,
1. query evaluation w.r.t. O is in PT IME, Datalog6= -rewritable, or
CO NP-hard (unless PT IME = NP);
2. O is materializable.
As discussed in the main part of the paper, we prove Theorem 10 in
two steps: we first construct an ontology Ocell that marks lower left
corners of cells and then we construct an ontology OP that marks
the lower left corner of grids that represent a solution to a rectangle
tiling problem P. We construct the ontologies in ALCIF ` . Thus,
in addition to ALCI concepts we use concepts of the form (≤ 1R),
(= 1R), and (≥ 2R). The proof is given using DL notation.
Marking the lower left corner of grid cells. Let X and Y be
binary relations and X − , Y − their inverses in ALCI. Using the
sentences
−
Figure 2: D 6|= cell(d) ⇒ O, D 6|= (= 1P )(d)
Details are given below. Resolving the first issue is more involved.
There are two reasons why the converse does not hold. First, we
might have X(d, d1 ), Y (d1 , d3 ), X(d, d2 ), Y (d2 , d4 ) ∈ D with
d3 6= d4 but both d3 and d4 have already two R2 -successors in D.
Then the marker (= 1P ) is entailed without the cell being closed at
d (i.e, without D |= cell(d)). Second, we might have an odd cycle
of mutually distinct e0 , e1 , . . . , en ∈ D such that each ei reaches
e(i+1) mod n+1 via a Y − X − Y X-path in D, for i = 0, 1, . . . , n.
Figure 3 illustrates this for n = 2. Then, since in at least two neighbouring ei , e(i+1) mod n+1 the same concept (= 1Ri ) is enforced,
the marker (= 1P ) is enforced at some node d from which ei and
e(i+1) mod 3 are reachable along XY and Y X-paths, respectively,
without satisfying cell(d). We resolve both problems by enforcing
that D is not consistent w.r.t. our ontology if such constellations
appear.
> v (≤ 1Z)
for all Z ∈ {X, Y, X , Y − } we ensure that in any instance
D that is consistent w.r.t. our ontology the relations X and Y
as well as their inverses X − and Y − are functional in D in
the sense that R(d, d0 ), R(d, d00 ) ∈ D implies d0 = d00 for all
R ∈ {X, Y, X − , Y − }. For an instance D and d ∈ dom(D)
we write D |= cell(d) if there exist d1 , d2 , d3 with X(d, d1 ),
Y (d1 , d3 ), Y (d, d2 ), X(d2 , d3 ) ∈ D. Since X and Y are functional in D, D |= cell(d) implies d3 = d4 if X(d, d1 ), Y (d1 , d3 ),
X(d, d2 ), Y (d2 , d4 ) ∈ D. As a marker for all d such that D |=
cell(d) we use the concept (= 1P ) for a binary relation P . For
P and all binary relations R introduced below we add the inclusion > v ∃R.> to our ontology so that when building models
one can only choose between having exactly one R-successor or
at least two R-successors. To do the marking we use concepts
(= 1R1 ) and (= 1R2 ) with binary relation symbols R1 , R2 as
‘second-order variables’, ensure that all nodes in D are contained
in (= 1R1 ) t (= 1R2 ), and then state (as a first attempt) that
G
∃X.∃Y.(= 1Ri ) u ∃Y.∃X.(= 1Ri ) v (= 1P )
i=1,2
Clearly, if D |= cell(d) then O, D |= (= 1P )(d) for the resulting ontology O. Conversely, the idea is that if D 6|= cell(d) and
X(d, d1 ), Y (d, d2 ), Y (d1 , d3 ), X(d2 , d4 ) ∈ D but d3 6= d4 , then
one can extend D by adding a single R1 -successor and two R2 successors to d3 , a single R2 -successor and two R1 -successors to
d4 , and two P -successors to d and thus obtain a model B of O and
D in which d 6∈ (= 1P )B , see Figure 2. In general, however, this
does not work and the latter sentence has depth 3. The depth issue
is easily resolved by introducing auxiliary binary relation symbols
RiX , RiY , RiXY and RiY X , i = 1, 2, and replacing concepts such
as ∃X.∃Y.(= 1Ri ) by (= 1RiXY ) and the sentences
(= 1RiXY ) ≡ ∃X.(= 1RiY ) and (= 1RiY ) ≡ ∃Y.(= 1Ri )
Y
X
X
Y
Y
e0 e2
e1
X
X
Y
Y
X
Y
X
Figure 3: D |= (= 1P )(d) 6⇒ D |= cell(d)
In detail, we construct an ontology Ocell that uses in addition
to X, Y, X − , Y − the set AUXcell of binary relations P, Ri , RiW ,
where i ∈ {1, 2} and W ranges over a set of words over the alphabet {X, Y, X − , Y − } we define below. The RiW serve as auxiliary symbols to avoid sentences of depth larger than two. No
unary relations are used. To ensure that CQ-evaluation is Datalog6= rewritable w.r.t. Ocell we include in Ocell the concept inclusions
> v ∃Q.>
for all binary relations Q ∈ AUXcell . If an instance D is consistent w.r.t. Ocell , then its materialization adds a certain number
of Q-successors to any d ∈ dom(D) to satisfy > v ∃Q.> for
Q ∈ AUXcell . The remaining sentences in Ocell only influence the
number of Q-successors that have to be added and thus do not influence the certain answers to CQs. In fact, we will have the following
equivalence
~ iff {> v ∃Q.> | Q ∈ AUXcell }, D |= q(d)
~
Ocell , D |= q(d)
for any CQ q and D that is consistent w.r.t. Ocell . Define for any
non-empty word W over {X, Y, X − , Y − } the set ∃W (= 1Ri ) of
sentences inductively by setting for Z ∈ {X, Y, X − , Y − }:
∃Z (= 1Ri )
= {(= 1RiZ ) ≡ ∃Z.(= 1Ri )}
∃ZW (= 1Ri )
= {(= 1RiZW ) ≡ ∃Z.(= 1RiW )}
∪ ∃W (= 1Ri )
Thus, ∃W (= 1Ri ) states that the unique d0 reachable from d along
a W -path has exactly one Ri -successor iff d has exactly one RiW successor. Now Ocell is defined as follows.
1. Functionality of X, Y, X − and Y − is stated using
> v (≤ 1Z)
−
for X, Y, X , Y .
2. All nodes have exactly one R1 successor or exactly one R2 successor:
> v (= 1R1 ) t (= 1R2 )
3. If all nodes reachable along an XY -path and a Y X-path have
exactly one R1 and exactly one R2 -successor, then the marker
(= 1P ) is set:
l
(= 1RiXY ) u (= 1RiY X ) v (= 1P )
i=1,2
4. For i = 1, 2, the concept (= 1Ri ) is true at least at every third
node on the cycles in D introduced above:
(= 1RjCC ) v (= 1Ri ) t (= 1RiC ) t (= 1RiCC )
for {i, j} = {1, 2} and C = X − Y − XY
5. If (= 1R1 ) and (= 1R2 ) are both true in a node in D then they
are both true in all ‘neighbouring’ nodes in D:
(= 1R1Y
−
−
Y − XY
) u (= 1R2X
X−Y X
) u (= 1R2Y
−
−
for R12 := (= 1R1 ) u (= 1R2 )
• E is of the form e0 ≤ · · · ≤ en with ei 6= ej for all i 6= j, or
• E is a cycle e0 ≤ · · · ≤ en with ei = ej iff {i, j} = {0, n} for
all i 6= j.
−
(= 1R1X
We thus assume in what follows that all three conditions are satisfied. Clearly, they can be encoded in Datalog6= . Moreover, by
Point 2 we can assume that D is saturated for the sentences ∃W (=
1R) in the sense that if (= 1RZW ) ≡ ∃Z.(= 1RW ) ∈ Ocell then
for any Z(d, d0 ) ∈ D the following holds: d has at least two RZW successors in D iff d0 has at least two RW -successors in D. Now let
e1 ≤ e2 iff there are X(d, d1 ), Y (d1 , e1 ), Y (d, d2 ), X(d2 , e2 ) ∈
D. Let e1 ∼ e2 iff e1 ≤ e2 or e2 ≤ e1 and let ∼∗ be the smallest equivalence relation containing ∼. For any equivalence class E
w.r.t. ∼∗ either
Y − XY
) v R12
X−Y X
) v R12
6. The auxiliary sentences ∃W (= 1Ri ) for all relations RiW used
above.
Lemma 11 The ontology Ocell has the following properties for all
instances D:
1. for all d ∈ dom(D): Ocell , D |= (= 1P )(d) iff D is not consistent w.r.t. Ocell or D |= cell(d); moreover, if D is consistent
w.r.t. Ocell , then there exists a materialization B of D and Ocell
such that d ∈ (= 1P )B iff d ∈ dom(B) and D |= cell(d);
2. If all binary relations are functional in D, then D is consistent
w.r.t. Ocell ;
3. CQ-evaluation w.r.t Ocell is Datalog6= -rewritable.
P ROOF. We first derive a necessary and sufficient condition for
consistency of instances D w.r.t. Ocell . Lemma 11 then follows in
a straightforward way. It is easy to see that if any of the following
conditions is not satisfied, then D is not consistent w.r.t. Ocell :
• all X, Y, X − , Y − are functional in D;
• D is consistent w.r.t. the sentences ∃W (= 1Ri ) in Ocell ;
• if D |= cell(d), then d has at most one P -successor in D.
Thus, if E is not a singleton {e} with e ≤ e, we can partition E
into two sets E1 and E2 (with one of them possibly empty) such
that
(†) there are no three e ≤ e0 ≤ e00 in the same Ei .
Now set for any equivalence class E and {i, j} = {1, 2},
Ej0 = {d ∈ E | D |= (≥ 2Ri )(d)}
Claim 1. D is consistent w.r.t. Ocell iff the following conditions
hold for all equivalence classes E:
(a) if E = {e} with e ≤ e then e 6∈ E10 ∪ E20 ;
(b) otherwise, there exists a partition E1 , E2 of E with Ei ⊇ Ei0
satisfying (†).
Moreover, if (a) and (b) hold, then a materialization B satisfying
the conditions of Lemma 11 (1) exists.
(⇒) First assume that Point (a) does not hold for some E = {e}
with e ≤ e. Then D is not consistent w.r.t. Ocell by the axioms
given under (2) and (4) since it is not possible to satisfy (= 1Ri ) in
e if e ∈ Ej0 (i 6= j). Now assume that (b) holds. So there exists E
that has either at least two elements or e 6≤ e if E = {e} but there
exists no partition E1 , E2 of E with Ei ⊇ Ei0 satisfying (†). Then
the axioms under (4) cannot be satisfied without having at least one
node in E that is in both (= 1R1 ) and (= 1R2 ). But then by the
axioms under (5) all nodes in E are in (= 1R1 ) and in (= 1R2 )
which implies that E10 = E20 = ∅. This contradicts our assumption
that there is no partition E1 , E2 of E with Ei ⊇ Ei0 satisfying (†).
(⇐) Assume (a) and (b) hold for every equivalence class E. For
E = {e} with e ≤ e we can thus construct the relevant part of a
model B of D and Ocell such that e has exactly one Ri -successor
for i = 1, 2 and also exactly one P -successor. Thus axiom (2) is
satisfied. All e that are not members of such an equivalence class
are given at least two P -successors. For any equivalence class with
at least two members or E = {e} with e 6≤ e we can construct
the relevant part of B such that each d ∈ Ei has exactly one Ri successor and each d ∈ E \ Ei has at least two Ri -successors. As
E1 and E2 are mutually disjoint, the axioms under (5) are satisfied.
As (†) is satisfied, the axioms under (4) are satisfied. As E1 ∪ E2
contains E, the axioms under (2) are satisfied.
The conditions (a) and (b) can be encoded in a Datalog6= program
in a straightforward way and thus there is a Datalog6= -program
checking consistency of an instance D w.r.t. Ocell . Datalog6= rewritability of CQ-evaluation w.r.t. Ocell now follows from the observation that
~ iff {> v ∃Q.> | Q ∈ AUXcell }, D |= q(d)
~
Ocell , D |= q(d)
for any CQ q, D that is consistent w.r.t. Ocell , and any d~ in D.
This finishes the construction and analysis of Ocell .
Marking the lower left corner of grids. We now encode the rectangle tiling problem. Recall that an instance of the finite rectangle tiling problem is given by a triple P = (T, H, V ) with T a
non-empty, finite set of tile types including an initial tile Tinit to
be placed on the lower left corner and nowhere else and a final
tile Tfinal to be placed on the upper right corner and nowhere else,
H ⊆ T × T a horizontal matching relation, and V ⊆ T × T a
vertical matching relation. A tiling for (T, H, V ) is a map f :
{0, . . . , n}×{0, . . . , m} → T such that n, m ≥ 0, f (0, 0) = Tinit ,
f (n, m) = Tfinal , (f (i, j), f (i + 1, j)) ∈ H for all i < n, and
(f (i, j), f (i, j + 1)) ∈ V for all i < m. We say that P admits a
tiling if there exists a map f that is a tiling for P. It is undecidable
whether an instance of the finite rectangle tiling problem admits a
tiling.
Now let P = (T, H, V ) with T = {T1 , . . . , Tp }. We regard
the tile types in T as unary relations and take the binary relations
symbols X, Y, X − , Y − from above and an additional set AUXgrid
of binary relations F, F X , F Y , U, R, L, D, and A. The ontology
OP is defined by taking Ocell and adding the sentences
> v ∃Q.>
for all Q ∈ AUXgrid as well as all sentences in Figure 4 to it, where
(Ti , Tj , T` ) range over all triples from T such that (Ti , Tj ) ∈ H
and (Ti , T` ) ∈ V :
We discuss the intuition behind the sentences of OP . The relations X and Y are used to represent horizontal and vertical adjacency of points in a rectangle. The concepts (= 1X) of OP serve
the following puroposes:
• (= 1U ), (= 1R), (= 1L), and (= 1D) mark the upper, right,
left, and lower (‘down’) border of the rectangle.
• The concept (= 1F ) is propagated through the grid from the
upper right corner where Tfinal holds to the lower left one where
TinitiaL holds, ensuring that every position of the grid is labeled
with at least one tile type, that the horizontal and vertical matching conditions are satisfied, and that the grid cells are closed
(indicated by (= 1P ) from the ontology Ocell ).
• The relations F X and F Y are used to avoid depth 3 sentences
in the same way as the relations RiW are used to avoid such
sentences in the construction of Ocell .
• Finally, when the lower left corner of the grid is reached, the
concept (= 1A) is set as a marker.
We write D |= grid(d) if there is a tiling f for P with domain {0, . . . , n} × {0, . . . , m} and a mapping γ : {0, . . . , n} ×
{0, . . . , m} → dom(D) with γ(0, 0) = d such that
• for all j < n, k ≤ m: T (γ(j, k)) ∈ D iff T = f (j, k);
• for all b1 , b2 ∈ dom(D): X(b1 , b2 ) ∈ D iff there are j < n,
k ≤ m such that (b1 , b2 ) = (γ(j, k), γ(j + 1, k));
• for all b1 , b2 ∈ dom(D): Y (b1 , b2 ) ∈ D iff there are j ≤ n,
k < m such that (b1 , b2 ) = (γ(j, k), γ(j, k + 1)).
• g is closed: if d ∈ ran(γ) and Z(d, d0 ) ∈ D for some Z ∈
{X, Y, X − , Y − }, then d0 ∈ ran(γ).
We then call d the root of the n × m-grid with witness function γ
for P. The following result can now be proved using Lemma 11.
Lemma 12 The ontology OP has the following properties for all
instances D:
1. for all d ∈ dom(D): OP , D |= (= 1A)(d) iff D is not consistent w.r.t. OP or D |= grid(d); moreover, if D is consistent
w.r.t. OP , then there exists a materialization B of D and OP
such that d ∈ (= 1A)B iff d ∈ dom(B) and D |= grid(d);
2. If D |= grid(d) with witness γ such that dom(D) = ran(γ), and
all relations are functional in D then D is consistent w.r.t. OP ;
3. CQ-evaluation w.r.t OP is Datalog6= -rewritable.
We now use Lemma 12 to prove the undecidability result. Let O =
OP ∪ {(= 1A) v B1 t B2 }, where B1 and B2 are unary relations.
Lemma 13 (1) If P admits a tiling, then O is not materializable
and CQ-evaluation w.r.t. O is CO NP-hard.
(2) If P does not admit a tiling, then CQ-evaluation w.r.t. O is
Datalog6= -rewritable.
P ROOF. (1) Consider a tiling f for P with domain {0, . . . , n}×
{0, . . . , m}. Regard the pairs in {0, . . . , n} × {0, . . . , m} as constants. Let D contain
X((i, j), (i + 1, j)),
for i < n and j ≤ m,
Y ((i, j), Y (i, j + 1))
for i ≤ n and j < m, and
T (i, j)
for f (i, j) = T for i ≤ n and j ≤ m. Then D is consistent w.r.t. OP and O, D |= (= 1A)(0, 0), by Lemma 12. Thus
OP , D |= B1 (0, 0) ∨ B2 (0, 0) but O, D 6|= B1 (0, 0) and O, D 6|=
B2 (0, 0). Thus O is not materializable and so CQ-evaluation is
CO NP-hard.
(2) Assume P does not admit a tiling. Any instance D is consistent w.r.t. O iff it is consistent w.r.t. OP and, by Lemma 12, if
OP , D |= (= 1A)(d) for some d ∈ dom(D), then D is not con~ iff OP , D |= q(d)
~ holds for
sistent w.r.t. OP . Thus, O, D |= q(d)
~ Thus CQ-evaluation w.r.t. O is Datalog6= every CQ q and all d.
rewritable by Lemma 12.
Lemma 13 implies Theorem 10 for ALCIF ` ontologies of depth 2.
For uGF−
2 (2, f ) we modify the construction of Ocell and OP as
follows:
• The relations X, Y, X − , Y − are defined as functions and it is
stated that X − and Y − are the inverse of X and Y , respectively.
• For any relation symbol R in OP distinct from X, Y, X − , Y −
we introduce a function F , state ∀xF (x, x), and replace
by
> v ∃R.>
∀x∃y(R(x, y) ∧ F (x, y)).
• We replace all occurrences of (=
{X, Y, X − , Y − } in OP by
1R) for R
6∈
¬∃y(R(x, y) ∧ ¬F (x, y))
Now Lemma 11 and Lemma 12 still hold for the resulting ontologies Ocell and OP if (= 1P ) and (= 1A) are replaced by
¬∃y(P (x, y) ∧ ¬F (x, y)) and ¬∃y(A(x, y) ∧ ¬F (x, y)), respectively.
We now come to the proof of Theorem 11. We first give a more
detailed definition of the run fitting problem.
Tfinal
∃X.((= 1U ) u (= 1F ) u Tj ) u Ti
∃Y.((= 1R) u (= 1F ) u T` ) u Ti
v (= 1F ) u (= 1U ) u (= 1R)
v (= 1U ) u (= 1F )
v (= 1R) u (= 1F )
∃Y.(= 1F ) v (= 1F Y )
∃X.(= 1F ) v (= 1F X )
∃X.(Tj u (= 1F ) u (= 1F Y ))u
∃Y.(T` u (= 1F ) u (= 1F X )) u (= 1P ) u Ti
(= 1F ) u Tinit
t
1≤s<t≤p
Ts u Tt
v (= 1F )
v (= 1A) u (= 1D) u (= 1L)
v
⊥
(= 1U ) v ∀Y.⊥ (= 1R) v ∀X.⊥ (= 1U ) v ∀X.(= 1U )
(= 1D) v ∀Y − .⊥ (= 1L) v ∀X − .⊥ (= 1D) v ∀X.(= 1D)
(= 1R) v ∀Y.(= 1R)
(= 1L) v ∀Y.(= 1L)
Figure 4: Additional Axioms of OP
We consider non-deterministic Turing machines (TMs, for short).
A TM M is represented by a tuple (Q, Σ, ∆, q0 , qa ), where Q is
a finite set of states, Σ is a finite alphabet, ∆ ⊆ Q × Σ × Q ×
Σ × {L, R} is the transition relation, and q0 , qa ∈ Q are the start
state and accepting state, respectively. The tape of M is assumed
to be one-sided infinite, that is, it has a left-most cell and extends
infinitely to the right. The set of strings accepted by M is denoted
by L(M ). A configuration of M is represented by a string vqw,
where q is the state, v is the inscription of the tape to the left of
the head, and w is the inscription of the tape to the right of the
head in the configuration (as usual, we omit all but possibly a finite
number of trailing blanks). The configuration is accepting if q =
qa . A run of M is represented by a finite sequence γ0 , . . . , γn of
configurations of M with |γ0 | = · · · = |γn |. We assume that the
accepting state has no successor states. A run is accepting if its last
configuration is accepting.
Definition 7 Let M = (Q, Σ, Γ, ∆, q0 , qa ) be a TM.
• A partial configuration of M is a string γ̃ over Q ∪ Σ ∪ {?}
such that there is at most one i ∈ {1, . . . , n} with γ̃[i] ∈ Q.
Here, γ[i] denotes the symbol that occurs at the i-th position
of γ. A configuration γ matches γ̃ if |γ| = |γ̃| and for each
i ∈ {1, . . . , n} with γ̃[i] 6= ? we have γ[i] = γ̃[i].
• A partial run of M is a sequence γ̃ = (γ̃0 , γ̃1 , . . . , γ̃m ) of
partial configurations γ̃i of M such that γ̃0 starts with q0 , has
no other occurrence of a state q ∈ Q, and |γ̃0 | = · · · = |γ̃m |.
A run γ0 , γ1 , . . . , γn of M matches γ̃ if m = n and γi matches
γ̃i , for each i ∈ {0, 1, . . . , m}.
Definition 8 The run fitting problem for a TM M , denoted by
RF(M ), is defined as follows: Given a partial run γ̃ of M , decide whether there is an accepting run of M that matches γ̃.
It is easy to see that RF(M ) is in NP for every TM M . The
following theorem states that the complexity of RF(M ) may be intermediate between PT IME and NP-complete, provided PT IME 6=
NP.
Theorem 12 (restated) If PT IME 6= NP, then there is a TM whose
run fitting problem is neither in PT IME nor NP-complete.
To prove Theorem 12, we now construct such a TM. The construction is a modification of the construction from Impagliazzo’s
version of the proof of Ladner’s Theorem [38], as presented in [2,
Theorem 3.3].
Fix a polynomial-time TM MSAT for SAT. For a monotone
polynomial-time computable function H : N → N to be specified
later, let MH be a polynomial-time TM that works as follows on a
given input v:
1. Check that there is an integer n ≥ 0 such that v is the unary
H(n)
representation of nH(n) (i.e., v = 1n
). If such an n does
not exist, then reject v.
2. Guess an input w of length n for MSAT .
3. Generate the initial configuration γ of MSAT on input w.
4. Run MSAT from γ, and accept v iff MSAT accepts w.
We refer to the first three steps as the initialization phase.
We now define the function H : N → N. Fix a polynomial time
computable enumeration M0 , M1 , M2 , . . . of deterministic TMs
such that all runs of Mi on inputs of length n terminate after at most
i · ni steps, and for each problem A in PT IME there are infinitely
many i such that L(Mi ) is in A.3 Then, H(n) is defined as
min {i < log log n | for all strings z of length ≤ log n,
Mi accepts z iff z ∈ RF(MH )},
or as log log n if the minimum does not exist. It is not hard to see
that H is well-defined, and that there is a deterministic polynomialtime Turing machine that, given a positive integer n in unary, outputs H(n). For details, we refer to [2].
This finishes the construction of MH . Lemma 15 below shows
that RF(MH ) has the desired properties, namely that RF(MH ) is
neither in PT IME nor NP-complete, unless PT IME = NP. It uses
the following auxiliary lemma.
Lemma 14
• If RF(MH ) is in PT IME, then H(n) = O(1).
3
It is easy to construct a deterministic polynomial-time Turing machine that, given an integer i ≥ 0, outputs a deterministic Turing
machine Mi such that the sequence (Mi )i≥0 has the desired properties. For instance, let Mi0 be the i-th deterministic Turing machine in lexicographic order under some string encoding of Turing
machines, and add a clock to Mi0 that stops the computation of Mi0
after at most i · ni steps (and rejects if Mi0 did not accept yet).
• If RF(MH ) is not in PT IME, then limn→∞ H(n) = ∞.
P ROOF. The proof is as in [2]. We provide a proof for the sake
of completeness.
Assume first that RF(MH ) is in PT IME. Then, there is an index
i
i such that L(Mi ) = RF(MH ). Now, for all n > 22 , we have
i < log log n and thus H(n) ≤ i by the definition of H. It follows
i
that H(n) ≤ max{H(m) | m ≤ 22 + 1}, and therefore H(n) =
O(1).
Next assume that RF(MH ) is not in PT IME. For a contradiction,
suppose that limn→∞ H(n) 6= ∞. Since H is monotone, this
means that there are integers n0 , i ≥ 0 such that H(n) = i for
all integers n ≥ n0 . Let n ≥ n0 . By the definition of H, we
have that Mi agrees with RF(MH ) on all strings of length at most
log n. Since this holds for all n ≥ n0 , we conclude that Mi decides
RF(MH ). But then, RF(MH ) is in PT IME, which contradicts our
initial assumption that RF(MH ) is not in PT IME.
We now prove the main lemma, which concludes the proof of
Theorem 12.
Lemma 15 If PT IME 6= NP, then RF(MH ) is neither in PT IME
nor NP-complete.
P ROOF. “RF(MH ) is not in PT IME”: For a contradiction, suppose that RF(MH ) is in PT IME. By Lemma 14, there is a constant
c ≥ 0 such that for all integers n ≥ 0 we have H(n) ≤ c. Suppose
that on inputs of length n, MSAT makes at most p(n) := nk + k
steps. Then, the following is a polynomial-time many-one reduction from SAT to RF(MH ), implying PT IME = NP, and therefore
contradicting the lemma’s assumption.
Given an input x of length n for MSAT :
h
1. Compute h := H(n) and w := 1n .
2. Output the partial run γ̃0 , . . . , γ̃i+p(n) of MH such that:
• γ̃0 , . . . , γ̃i corresponds to the initialization phase of MH on
input w that generates the start configuration of MSAT on
input x. In particular, γ̃0 , . . . , γ̃i are complete configurations
of MH , and γ̃0 = q0 w and γ̃i = q00 x, where q0 and q00 are
the start states of MH and MSAT , respectively;
• γ̃i+1 , . . . , γ̃i+p(n) consist entirely of stars.
Note that the partial run γ̃0 , . . . , γ̃i+p(n) can be easily computed by
simulating the initialization phase MH on input w, where in step
2 of the initialization phase we “guess” the input string x given as
input to the reduction. Then, we pad the sequence of configurations
corresponding to the initialization phase by p(n) partial configurations, each consisting of exactly p(n) stars.
“RF(MH ) is not NP-complete”: Suppose, to the contrary, that
RF(MH ) is NP-complete. Then there is a polynomial-time manyone reduction f from SAT to RF(MH ). Using f , we construct
a polynomial-time many-one reduction g from SAT to SAT such
that for all sufficiently large strings x we have |g(x)| < |x|. This
implies that SAT can be solved in polynomial time, and contradicts
PT IME 6= NP.
Consider an input x for SAT. Since f is a many-one reduction
from SAT to RF(MH ), we have
f (x) = γ̃0 #γ̃1 # · · · #γ̃m
for some partial run
γ̃ := (γ̃0 , γ̃1 , . . . , γ̃m )
(4)
of MH . Moreover, x ∈ SAT iff there is an accepting run of MH
that matches γ̃. By the construction of MH , an accepting run of
MH on an input y can only exist if there is an integer n ≥ 0 such
H(n)
that y = 1n
. Note also that the length of y has to be bounded
by |γ̃0 |. Define
N := {n ∈ N | nH(n) ≤ |γ̃0 |}.
Then, as argued above, the following are equivalent:
1. x ∈ SAT;
2. γ̃0 #γ̃1 # · · · #γ̃m ∈ RF(MH );
3. there is an n ∈ N such that there is an accepting run of MH on
H(n)
input 1n
that matches γ̃.
In what follows, we show how to compute in polynomial time,
for each n ∈ N , a propositional formula φn such that:
• φn is satisfiable if and only if there is an accepting run of MH
H(n)
on input 1n
that matches γ̃;
• |φn | ≤
|x|
|N |
− 2 for all n ∈ N (if x is large enough).
Then, the following function g is a polynomial-time many-one reduction from SAT to SAT:
_
g(x) :=
φn .
n∈N
Assuming a suitable encoding of propositional formulas, the size of
g(x) is then bounded by |x| − 1 for large enough x. Thus, g is the
desired length-reducing polynomial-time self-reduction of SAT. It
remains to construct φn , for all n ∈ N .
C ONSTRUCTION OF φn . Fix n ∈ N . By the construction of
H(n)
MH , any accepting run of MH on input 1n
has to start with
the initialization phase. The first step of the initialization phase is
H(n)
.
deterministic, and checks whether the input has the form 1n
Thus, we can complete γ̃ in polynomial time to a partial run of
MH where the first step of the initialization phase is completely
specified. If this is not possible due to constraints imposed by γ̃,
then we know that the desired accepting run does not exist, and we
can output a trivial unsatisfiable formula φn . Otherwise, let
γ̃˜ = (γ̃˜0 , γ̃˜1 , . . . , γ̃˜m )
be the resulting partial run of MH . It remains to construct a formula φn that is satisfiable iff there is an accepting run of MH that
˜
matches γ̃.
˜ Let i ≥ 0 be such that γ̃˜0 , . . . , γ̃˜i
Let us take a closer look at γ̃.
corresponds to the first step of the initialization phase of MH on
H(n)
input 1n
. In particular, for each j ∈ {0, 1, . . . , i}, γ̃˜j is a completely specified configuration. It is possible to specify MH in such
a way that the second and third step of the initialization phase of
H(n)
MH on input 1n
take exactly n computation steps combined,
and that any configuration after the initialization phase uses space
at most n. Thus, without loss of generality we can assume:
1. |γ̃˜j | ≤ n for all j ∈ {i + 1, . . . , m};
2. m − i − n is bounded by the running time of MSAT on inputs of
length n.
Let h be a polynomial-time computable function that, given
γ̃˜i+1 # · · · #γ̃˜m , outputs a propositional formula that is satisfiable
iff there is an accepting run of MH that starts in the second step of
the initialization phase of MH in a configuration matching γ̃˜i+1 ,
and that matches γ̃˜i+1 , . . . , γ̃˜m . Let
φn := h(γ̃˜i+1 # · · · #γ̃˜m ).
This finishes the construction of φn .
It is immediate from the construction of φn that φn is satisfiable
H(n)
if and only if there is an accepting run of MH on input 1n
that
matches γ̃. It remains to prove that the length of φn is bounded by
|x|/|N | − 2.
B OUNDING THE SIZE OF φn . Let p be a polynomial such that
for all strings z, MSAT makes at most p(|z|) steps on input z, and
both |f (z)| and |h(z)| are bounded by p(|z|). Since, as mentioned
above, m−i−n is bounded by the running time of MSAT on inputs
of length n, we have
Suppose that there is an n ∈ N such that q(n) > |x|/|N | − 2.
Then,
1
k
|x|
−2−k
.
n>
|N |
This implies
nH(n) ≥
≥
Since moreover |γ̃˜j | ≤ n for all j ∈ {i + 1, . . . , m}, we have
|φn | ≤ |h(γ̃˜i+1 # · · · #γ̃˜m )|
≤ p(|γ̃˜i+1 # · · · #γ̃˜m |)
Hence,
for some polynomial q depending only on MSAT , f , and h. It remains to show that for all n ∈ N we have q(n) ≤ |x|/|N | − 2 if x
is sufficiently large.
Claim. There is a polynomial r(`) > 0 depending only on MH
|x|
≥ r(|x|).
such that for sufficiently large x we have |N
|
Proof. Recall that N consists of all integers n ≥ 0 such that
nH(n) ≤ |γ̃0 |. Since γ̃0 is part of f (x), whose overall length is
bounded by p(|x|), we have nH(n) ≤ p(|x|).
Now, for all integers ` ≥ 0, define
N (`) := {n ∈ N | nH(n) ≤ p(`)}.
Then, N ⊆ N (|x|). We show that for all constants c ∈ (0, 1) there
is an integer λc ≥ 0 such that for all integers ` ≥ λc we have
|N (`)| ≤ `c . This implies the claim.4
Fix a constant c ∈ (0, 1) and L := {` ∈ N | |N (`)| > `c }. For
each ` ∈ L, there is an n` ∈ N (`) with n` ≥ `c . Thus, by the
definition of N (`) and the monotonicity of H, for each ` ∈ L we
have
H(n` )
≤ p(`).
(5)
Now, since lim`→∞ H(`) = ∞ (by Lemma 14), we have
lim`→∞ cH(`c ) = ∞. Hence, there is an integer λc ≥ 0 such
c
that for all ` ≥ λc we have `c·H(` ) > p(`). This implies that for
c
all ` ≥ λc we have |N (`)| ≤ ` (otherwise, we would violate (5)).
y
Assume that q(n) = nk +k. Let r be a polynomial as guaranteed
by the claim. In what follows, we will assume that x is large enough
so that:
|x|
|N |
≥ r(|x|); this can be satisfied by the previous claim.
1
k
2. (r(|x|) − 2 − k)H (r(|x|)−2−k) /k > p(|x|); this is possible
since lim`→∞ H(`) = ∞ by Lemma 14.
√
4
Set r(`) := `1−c for some c ∈ (0, 1) (e.g., r(`) = `). Then,
|x|
≥ |N |x|
≥ r(|x|) if |x| ≥ λc .
|N |
(|x|)|
1.

|x|
−2−k
|N |
|x|
−2−k
|N |

1
k
k
1
H (r(|x|)−2−k) k
!
k
≤ p(|x|) − nH(n)
|φn | ≤ q(n)
≤ n`
k
where the last two inequalities follow from the monotonicity of H.
Consequently,
|γ̃˜i+1 # · · · #γ̃˜m | ≤ |γ̃˜0 # · · · #γ̃˜m | − |γ̃˜0 # · · · #γ̃˜i |
≤ p( (p(n) + n) · (n + 1) ).
)
H(n)
≥ (r(|x|) − 2 − k)
> p(|x|),
≤ p( (m − i) · (n + 1) )
c
|x|
−2−k
|N |
H
m − i ≤ p(n) + n.
`c·H(`
< p(|x|) − p(|x|),
which is the desired contradiction.
Lemma 4 (restated) For every Turing machine M , there is a
uGF−
2 (2, f ) ontology O and an ALCIF ` ontology O of depth
2 such that the following hold, where N is a distinguished unary
relation:
1. there is a polynomial reduction of the run fitting problem for M
to the complement of evaluating the OMQ (O, q ← N (x));
2. for every UCQ q, evaluating the OMQ (O, q) is polynomially
reducible to the complement of the run fitting problem for M .
P ROOF. We give the proof for ALCIF ` ontologies of depth 2.
The proof for uGF−
2 (2, f ) is obtained by modifying the ALCIF `
ontology in the same way as in the proof of Theorem 10 by replacing, for example, (≥ 2R) by ∃y(R(x, y) ∧ ¬F (x, y)).
Assume M = (Q, Γ, ∆, q0 , qa ) is given. The instances D we
use to represent partial runs and that provide the space for simulating matching runs are n × m X, Y -grids with Tinit written in the
lower left corner, Tfinal written in the upper right corner, and B (for
blank) written everywhere else. To re-use the notation and results
from the proof of Theorem 10 we regard such a structure as a tiling
with tile types T = {B, Tfinal , Tinit }. Then the ontology OG for
G = (T, H, V ) and
H
V
= {(B, B), (B, Tfinal ), (Tinitial , B)}
= {(B, B), (B, Tfinal ), (Tinitial , B)}
checks whether an instance represents a grid structure. We now
construct the set OM of sentences that encode runs of M that match
a partial run. For any D, the simulation of a run is triggered at a
constant d exactly if OG , D |= (= 1A)(d). OM uses in addition
to the binary relations in OG binary relations q ∈ Q that occur
in concepts (≥ 2q) that indicate that M is in state q. It also uses
binary relations G for G ∈ Γ that occur in concepts (≥ 2G) that
indicate that G is writte on the corresponding cell of the tage. The
sentences of OM are now as follows:
• The grid in which the lower left corner is marked with (= 1A)
is colored with (= 1A), q0 is the first symbol of the first configuration and no state occurs later in the first configuration:
(= 1A) v ∀X.(= 1A) u ∀Y.(= 1A) ,
(= 1A) u Tinit v (≥ 2q0 ) ,
(= 1A) u (= 1D) u (≥ 2q) v (= 1L)
The remaining sentences are all relativized to (= 1A) and so
apply to constants in a grid only.
• every grid point is colored with some (≥ 2G) for G ∈ Γ or
(≥ 2q) for q ∈ Q:
G
G
(= 1A) v
(≥ 2G) t
(≥ 2q)
G∈Γ
q∈Q
• The machine M cannot be in two different states at the same
time and no two distinct symbols can be written on the same
cell. For all mutually distinct H1 , H2 ∈ Q ∪ Γ:
(= 1A) u (≥ 2H1 ) u (≥ 2H2 ) v ⊥,
• For any triple G0 qG1 ∈ Γ×Q×Γ let S(G0 qG1 ) denote the set
of all possible successor triples S1 S2 S3 ∈ (Q × Γ × Γ) ∪ (Γ ×
Γ × Q) according to the transition relation ∆ of M . To avoid
sentences of depth larger than two we introduce for the words
W ∈ {X, XX} and for S ∈ Q ∪ Γ fresh binary relations S W
and the sentences
X
(= 1A) u (≥ 2S ) ≡ (= 1A) u ∃X.(≥ 2S) ,
(= 1A) u (≥ 2S XX ) ≡ (= 1A) u ∃X.(≥ 2S X )
and then add
(= 1A) u (≥ 2G0 ) u (≥ 2q X ) u (≥ 2GXX
)v
1
G
∃Y.(≥ 2S1 ) u (≥ 2S2X ) u (≥ 2S3XX )
S1 S2 S3 ∈S(G0 qG1 )
to OM .
• the symbol written on a cell does not change if the head is more
than one cell away. For all G, G1 , G2 ∈ Γ:
−
≥2
≥2
(= 1A) u ∀X.G≥2
v ∀Y.G≥2
1 u ∀X .G2 u G
:= (≥ 2Gi ) with i ∈ {1, 2} or empty.
for G≥2
i
• the final state cannot be a state distinct from the accepting state
qa . For all q ∈ Q \ {qa }:
(= 1A) u (≥ 2q) v ∃Y.>
• Finally, let AUXM denote the set of fresh binary relations used
above and add
> v ∃Q.>
to OM for all Q ∈ AUXM .
This finishes the definition of OM . Let O = OG ∪ OM . We show
that O is as required.
Let N be a fresh unary relation symbol. Then an instance D is
consistent w.r.t. O if O, D 6|= q for the Boolean query q ← N (x).
It therefore suffices to provide a polynomial reduction of the run
fitting problem for M to the problem whether an instance D is
consistent w.r.t. O.
Assume that a partial run γ̃ = (γ̃0 , γ̃1 , . . . , γ̃m ) of partial
configurations γ̃i of M such that γ̃0 starts with q0 and |γ̃0 | =
· · · = |γ̃m | = n + 1 is given. We define an instance D with
D |= grid(0, 0) which encodes the partial run. Thus we regard
(i, j) with 0 ≤ i ≤ n and 0 ≤ j ≤ m as constants and D contains
the assertions
X((i, j), (i + 1, j)),
Tinit (0, 0),
Y ((i, j), (i, j + 1)),
Tfinal (n, m)
and B(i, j) for (i, j) 6∈ {(0, 0), (n, m)}. In addition, we include
in D the atoms
S((i, j), d1i,j ),
S((i, j), d2i,j )
for distinct fresh constants d1i,j and d2i,j for all i, j such that γ̃j [i] =
S and S 6= ?. It is now straightforward to show that D is consistent
w.r.t. O iff there is an accepting run of M that matches γ̃.
For the converse direction, we have to provide for every UCQ q
a polynomial reduction of the query evaluation problem for (O, q)
to the complement of the run fitting problem for M . To this end
observe that the following two conditions are equivalent for any
UCQ q(~
x), instance D, and tuple ~a:
1. O, D |= q(~a)
2. D is not consistent w.r.t. O or
{> v ∃Q.> | Q ∈ AUX}, D |= q(~a)
where AUX = AUXcell ∪ AUXgrid ∪ AUXM .
As the latter problem is in PT IME it suffices to provide a polynomial reduction of the problem whether an instance D is consistent
w.r.t. O to the run fitting problem for M . Assume D is given.
First decide in polynomial time whether D is consistent w.r.t. OG
(Lemma 12). If not, we are done. If yes, let G be the set of all
d ∈ dom(D) such that D |= grid(d). Assume without loss of generality that G = {0, . . . , k}. Then we find natural numbers ni , mi
such that each i ∈ G is the root of an ni × mi -grid with witness
function βi for G. By Lemma 12, there is a materialization B of
D and OG such that d ∈ G iff d ∈ (= 1A)B . Next we check in
polynomial time that D is consistent w.r.t. the union of OG and the
sentences of the form
(= 1A) u (≥ 2S X ) ≡ (= 1A) u ∃X.(≥ 2S),
(= 1A) u (≥ 2S XX ) ≡ (= 1A) u ∃X.(≥ 2S X )
in OM . If this is not the case we are done. If this is the case
we assume that D is saturated in the sense that if, for example,
(= 1A) u (≥ 2S X ) ≡ (= 1A) u ∃X.(≥ 2S) is in OM then for
any d ∈ (= 1A)B and X(d, d0 ) ∈ D the following holds: d has at
least two S X -successors in D iff d0 has at least two S-successors
in D. Now for each i ∈ G we define the sequences of strings
i
γ̃ i = (γ̃0i , . . . , γ̃m
)
i
by setting for 0 ≤ r ≤ ni and S ∈ Γ ∪ Q,
γ̃ri [j] = S iff βi (j, r) has at least two S-successors in D
If any of these strings is not a partial configuration, then D is not
consistent w.r.t. O and we are done. Otherwise each γ̃ i is a partial
run of M . It is now straightforward to show that D is consistent
w.r.t. O iff for each i ∈ G there exists an accepting run of M that
matches γ̃ i which provides us with a polynomial reduction of the
consistency problem for instances D to the run fitting problem for
M.
I.
PROOFS FOR SECTION 8
We start by introducing the basic notions used in this section in
more detail. An interpretation D is O-saturated if for all atoms
R(~a) with ~a ⊆ dom(D) such that O, D |= R(~a) it follows that
R(~a) ∈ D. For every O and instance D there exists a unique
minimal (w.r.t. set-inclusion) O-saturated instance DO ⊇ D. We
call DO the O-saturation of D. The following lemma states basic
properties of O-saturated instances.
Lemma 16 Let D ⊆ D0 be instances with D0|dom(D) = D and let
O be a uGC2 (=) ontology.
1. There exists a materialization of O and D iff there exists a materialization of O and the O-saturation of D.
2. If B is a materialization of O and D and D is O-saturated, then
B|dom(D) = D.
3. If D0 is O-saturated, then D is O-saturated.
We also require a lemma about stronger forms of unravelling tolerance and forest models that one can employ for ALCHIQ ontologies. Call a model B of an instance D an irreflexive forest model
if it is obtained from D by adding atoms of the form R(a, b) with
a 6= b in dom(D) to D and hooking irreflexive tree interpretations
Ba to every a ∈ dom(D). The following variations of Lemma 1
and Theorem 7 can be proved by modifying the proofs in a straightforward way taking into account the limited expressive power of
ALCHIQ.
Lemma 17 Let O be an ALCHIQ ontology of depth 1. Then the
following hold:
1. Let A be a model of O and D. Then there exists an irreflexive
forest model B of O and D such that there exists a homomorphism h from B to A that preserves dom(D).
2. O is materializable iff O is materializable for the class of all
irreflexive tree instance D with dom(D) ⊆ sig(O).
To characterize materializability in uGC−
2 (1, =) we admit 1materializability witnesses with loops and require that they satisfy additional closure conditions which ensure that one can construct a materialization in a step-by-step fashion, as in the proof
of Lemma 6. To formulate the conditions we employ a ‘mosaic technique’ and introduce mosaic pieces consisting of a 1materializability witness and a homomorphism into a bouquet.
Given a set of such mosaic pieces satisfying certain closure conditions we can then construct the materialization of a bouquet stepby-step using the 1-materializability witnesses and, simultaneously,
construct a homomorphism into any other model of the bouquet
(and thereby prove that we have indeed constructed a materialization). The resulting procedure for checking materializability will
be in NE XP T IME as we can first guess an exponential size set of
mosaic pieces and then confirm in exponential time that it satisfies
our conditions.
A 1-model pair is a tuple (F, a, B) such that
• F is a bouquet with root a that is consistent w.r.t. O, of outdegree ≤ |O| and such that sig(D) ⊆ sig(O);
• B is a bouquet with root a of outdegree ≤ 2|O| that is a model
of F such that there exists a model A of F and O with A≤1
a = B.
A 1-model pair (F, a, B) is a 1-materializability witness if B
is a 1-materialization of F w.r.t. O. We require two different
types of mosaic pieces. First, an injective hom-pair takes the form
(F, a, B) →h (F0 , a0 , B) such that
• (F, a, B) is a 1-materializability witness and (F0 , a0 , B0 ) is a
1-model pair;
1. If (F, a, B) ∈ M and F0 is a bouquet with root a0 such that there is
a bijective homomorphism h0 from F to F0 mapping a to a0 , and B0
is an interpretation such that (F0 , a0 , B0 ) is a 1-model pair, then there
exists an extension h of h0 such that (F, a, B) →h (F0 , a0 , B0 ) ∈
H.
2. If (F, a, B) →h (F0 , a0 , B0 ) ∈ H or (F, a, B) →h (B0 , a0 ) ∈ E
with b ∈ dom(B) \ dom(F) and h(a) = h(b) = a0 , then there exist
h0 and B00 such that (B|{a,b} , b, B00 ) →h0 (B0 , a0 ) ∈ E.
Table 1: Conditions for M , H, and E in materializability check
for uGC−
2 (1, =)
• h is a homomorphism from B to B0 and its restriction to
dom(F) is a bijective homomorphism onto F0 mapping a to a0 .
Thus, in an injective hom-pair the 1-materializability witness is a
piece of the materialization we wish to construct and the 1-model
pair is a piece of the model into which we wish to homomorphically
embed the materialization. Injective hom-pairs assume the homomorphic embedding one wants to extend (i.e., the restriction of h
to dom(F)) is injective. To deal with non-injective embeddings (as
indicated by Example 7) we also consider contracting hom-pairs
which take the form (F, a, B) →h (B0 , a0 ) where
• (F, a, B) is a 1-materializability witness with dom(F) =
{a, b};
• (B0 , a0 ) is a bouquet with root a0 of outdegree at most 2|O|
0
such that there exists a model A of O with A≤1
a0 = B ;
• h is a homomorphism from B to B0 with h(a) = h(b) = a0 .
Similarly to injective hom-pairs, in a contracting hom-pair the 1materializability witness is a piece of the materialization we wish
to construct and the second component is a piece of the model
into which we wish to homomorphically embed the materialization. The following lemma now provides a N EXP T IME decision
procedure for materializability of uGC−
2 (1, =) ontologies.
Lemma 18 Let O be a uGC−
2 (1, =) ontology. Then O is materializable iff there exist
1. a set M of 1-materializability witnesses containing exactly one
1-materializability witness for every bouquet F of outdegree ≤
|O| with sig(F) ⊆ sig(O) that is consistent relative to O;
2. sets H of injective hom-pairs and E of contracting hom-pairs
whose first components are all in M
such that the conditions of Table 1 hold.
P ROOF. Using Lemma 2 one can show that sets M , H, and E
satisfying the conditions of Table 1 exist if O is materializable.
Conversely, let the sets M , H, and E satisfy the conditions given
in Lemma 18. Assume a bouquet D of outdegree ≤ |O| with
sig(D) ⊆ sig(O) and root a is given. Take the 1-materializability
witness (D, a, B) ∈ M . As in the proof of Lemma 6 we construct
a sequence of interpretations B0 , B1 , . . . and sets Fi ⊆ dom(Bi )
of frontier elements (but now using only 1-materializability witnesses in M ): set B0 := B and F0 = dom(B) \ dom(D). If Bi
and Fi have been constructed, then take for any b ∈ Fi its (unique)
predecessor a and a 1-materializability witness (Bi|{a,b} , b, Bb ) ∈
S
M and set Bi+1 := Bi ∪ b∈Fi Bb . Let B∗ be the union of all
Bi . We show that B is a materialization. B is a model of O by
construction since O is a uGC−
2 (1, =) ontology. Consider a model
A of O and D. We construct a homomorphism h from B∗ to A
preserving dom(D) as the limit of a sequence h0 , . . . of homomorphisms from Bi to A. We may assume that the outdegree of A is
≤ 2|O|. By definition, there exists a homomorphism from B0 to
A≤1
a preserving D. Now, inductively, we ensure in each step that
the homomorphisms hi satisfy the following conditions for all Bi
and Fi , all b ∈ Fi and the predecessor a of b in Bi−1 :
(a) Assume hi (a) 6= hi (b). Then, for the 1-materializability witness (B∗|{a,b} , b, Bb ) ∈ M there exists a homomorphism hb
such that
(B∗|{a,b} , b, Bb )
→hb (A|{hi (a),hi (b)} , hi (b), A≤1
hi (b) )
∈H
(b) Assume hi (a) = hi (b). Then, for the 1-materializability witness (B∗|{a,b} , b, Bb ) ∈ M there exists a homomorphism hb
such that
(B∗|{a,b} , b, Bb ) →hb (A≤1
hi (b) , hi (b)) ∈ E
Assume hi with the properties (a) and (b) has been constructed.
Then we take for all b ∈ Fi and the predecessor a of b the homomorphism hb determined by (a) and, respectively, (b) and set
[
hi+1 := hi ∪
hb
b∈Fi
Using the properties of M , H, and E given in Table 1 it is not
difficult to show that hi+1 again has the properties (a) and (b)
above.
We prove Theorem 14, that is, for ALC ontologies of depth 2,
deciding whether query-evaluation is in P TIME is NE XP T IME-hard
(unless P TIME = CO NP). The proof is by reduction of the complement of a NE XP T IME-complete tiling problem in which the aim
is to tile a 2n × 2n -grid. An instance of this problem is given by a
tuple P = (T , t0 , H, V ) where T is a set of tile types, t0 ∈ T is a
distinguished tile to be placed on position (0, 0) of the grid, and H
and V are horizontal and vertical matching conditions. A solution
to P is a function τ : 2n × 2n → T such that
• if τ (i, j) = t and τ (i + 1, j) = t0 then (t, t0 ) ∈ H, for all
i < 2n − 1, j < 2n ,
0
0
• if τ (i, j) = t and τ (i, j + 1) = t then (t, t ) ∈ V , for all
i < 2n , j < 2n − 1,
• τ (0, 0) = t0 .
Let P = (T , t0 , H, V ). We construct an ALC ontology O of
depth 2 such that P has a solution iff O is not materializable.
The idea is that O verifies the existence of an r-chain in the input instance whose elements represent the grid positions along with
a tiling, row by row from left to right, starting at the lower left
corner and ending at the upper right corner. The positions in the
grid are represented in binary by the concept names X1 , . . . , X2n
in the instance where X1 , . . . , Xn indicate the horizontal position
and Xn+1 , . . . , X2n the vertical position. The tiling is represented
by unary relations Tt , t ∈ T . O verifies the chain by propagating
a marker bottom up and while doing this, it verifies the horizontal
matching condition. When the top of the chain is reached, O generates another r-chain using existential quantifiers. On that chain,
a violation of the vertical matching condition is guessed using disjunction, that is, the position where the violation occurs and the
tiles involved in it. If the tiling on the first chain is defective, the
guesses can be made such that the second chain homomorphically
maps into the first one, and then the second chain is ‘invisible’ to
the query. Otherwise, the query can ‘see’ the second chain and because of the disjunctions involved in building it, the instance has no
materialization w.r.t. O. Clearly, the second case can occur only if
P has a solution.
We now assemble the ontology O. To avoid that O is trivially
not materializable, we have to hide the marker propagated up the
chain in the input and all markers used on the existentially generated chain. We thus use concepts of the form ∀s.A instead of
concept names A, additionally stating in O that > v ∃s.A. For
any concept name A, we use HA as an abbreviation of ∀s.A.
1. Set up hiding of concept names:
for all A
{V, ok1 , . . . , ok2n , D, L, Y1 , . . . , Y2n , Y 1 , . . . , Y 2n ,
Z1 , . . . , Zn , Z 1 , . . . , Z n } ∪ {St | t ∈ T },
∈
> v ∃s.A
2. The initial position starts the propagation: for all t ∈ T ,
X 1 u · · · u X 2n u Tt v HV
3. Tiles are mutually exclusive: for all distinct t, t0 ∈ T :
Tt u Tt0 v ⊥
4. The propagation proceeds upwards, checking the horizontal
matching condition:
F
Xi u ∃r.Xi u 1≤j<i ∃r.Xj v Hoki
F
X i u ∃r.X i u 1≤j<i ∃r.Xj v Hoki
d
Xi u ∃r.X i u 1≤j<i ∃r.X j v Hoki
d
X i u ∃r.Xi u 1≤j<i ∃r.X j v Hoki
Hok1 u · · · u Hokn u X i u
∃r.(HV u Tt ) u Tt0 v HV
∃r.Xi u ∃r.X i v ⊥
where i ranges over 1..2n and (t, t0 ) over H; the use of X i in
the third last line prevents the counter from wrapping around at
maximum value. The last inclusion is necessary to avoid that
multiple successors that carry different concept name labels interact in undesired ways.
5. When the maximum value is reached, we make an extra step in
the instance and then generate the first object on an existential
chain that represents a tiling defect:
where
∃r.(X1 u · · · u X2n u HV ) v ∃r.C
C = HM u HY1 u · · · u HY2n
6. We continue building the chain; in every step, we decide
whether we have a defect here (HD ) or defer it to later (HL );
we can defer at most to the left-most position of the second row:
HM v (HD u CD ) t HL
HM u HY 1 u · · · u HY n u
HYn+1 u HY n+2 u · · · u HY 2n v HD
where CD is a concept to be defined later.
7. When defering the defect to later, we keep going and maintain
the Y -counter, decrementing it:
HL
d
HL u 1≤j≤i HY i
d
HL u HYi u 1≤j<i HY i
F
HL u HYi u 1≤j<i HYi
F
HL u HY i u 1≤j<i HYi
where i ranges over 1..2n.
v ∃r.HM
v ∀r.HYi
v ∀r.HY i
v ∀r.HYi
v ∀r.HY i
8. When implementing the defect, we guess two V -incompatible
tiles, start a new counter, travel exactly 2n steps, and verify the
tiles. The above concept CD is
G
(Tt u HSt0 )
CD = HZ1 u · · · u HZn u
(t,t0 )∈V
/
and we add the following concept inclusions:
HD u HZi
d
HD u 1≤j≤i HZ i
d
HD u HZi u 1≤j<i HZ i
F
HD u HZi u 1≤j<i HZi
F
HD u HZ i u 1≤j<i HZi
HD u HZi u HSt
HD u HZ 1 u · · · u HZ n u HSt
where i ranges over 1..n.
v ∃r.(HM u HD )
v ∀r.HZi
v ∀r.HZ i
v ∀r.HZi
v ∀r.HZ i
v ∀r.HSt
v Tt
We now establish correctness of the reduction. A tiling chain is an
instance D that consists of the following assertions:
• r(ai , ai+1 ) for 0 ≤ i < 22n − 1
• Xj (ai ) whenever the j-th bit of i is one
• X j (ai ) whenever the j-th bit of i is zero
• exactly one Tt (ai ) for each i, t ∈ T , such that when
Tt (ai ), Tt0 (ai+1 ) ∈ D, then (t, t0 ) ∈ H.
We say that the tiling chain D is defective if there is an i ≤ 22n −
(2n + 1) such that Tt (ai ), Tt0 (ai+2n ) ∈ D and (t, t0 ) ∈
/ V.
Lemma 19 P has a solution iff O is not materializable.
P ROOF. (sketch) “if”. Assume that P has no solution. Take an
instance D. We have to construct a CQ-materialization A of D and
O. To do this, start with D viewed as an interpretation A. To satisfy
the concept inclusions in Points 1-4, which all fall within monadic
Datalog, apply a standard chase procedure. To satisfy the concept
inclusions in Point 5 to 8, consider every a ∈ (∃r.(X1 u · · · u
X2n u HV ))I . The concept HV = ∀s.V cannot be made true by
an atom in an instance and, consequently, it was made true at a by
the chase. Analyzing the concept inclusions in O, one can show
that there must thus be a tiling chain C ⊆ D whose last element
a22n −1 is a. Since there is no solution, C must be defective. When
generating the existential chain, we can thus make the guesses in
a way such that the chain homomorphically maps into C, thus into
D. It can be verified that this yields the desired CQ-materialization
of O and D.
“only if”. Assume that P has a solution. Let D be the tiling
chain that represents it. We show that there is no materialization
of O and D. In fact, O propagates the HV marker all the way up
to the last element a = a22n −1 of D, where then an existential
chain is generated. Because of the use of disjunctions to generate all different kinds of violations of the vertical tiling conditions,
there are clearly at least two chains that can be generated and that
are incomparable in terms of homomorphisms. Moreover, since
D represents a proper tiling, none of the chains homomorphically
maps to D. Consequently, we can find CQs q1 (x), . . . , qm (x) such
that O, D |= q1 (a) ∨ · · · ∨ qm (a), but O, D 6|= qi (a) for any i.
Thus, O does not have the disjunction property and consequently
is not materializable.

Download Report

Dichotomies in Ontology-Mediated Querying with the Guarded

Paperzz.com

Your Paperzz