Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16)
Query-Driven Repairing of Inconsistent DL-Lite Knowledge Bases
Meghyn Bienvenu
CNRS, Univ. Montpellier, Inria
Montpellier, France
Camille Bourgaux
Univ. Paris-Sud, CNRS
Orsay, France
François Goasdoué
Univ. Rennes 1, CNRS
Lannion, France

Abstract
We consider the problem of query-driven repairing of inconsistent DL-Lite knowledge bases: query answers are computed under inconsistency-tolerant semantics, and the user provides feedback about which answers are erroneous or missing. The aim is to find a set of ABox modifications (deletions and additions), called a repair plan, that addresses as many of the defects as possible. After formalizing this problem and introducing different notions of optimality, we investigate the computational complexity of reasoning about optimal repair plans and propose interactive algorithms for computing such plans. For deletion-only repair plans, we also present a prototype implementation of the core components of the algorithm.
1 Introduction
Ontology-mediated query answering (OMQA) is a promising
recent approach to data access in which conceptual knowledge provided by an ontology is exploited when querying
incomplete data (see [Bienvenu and Ortiz, 2015] for a survey). As efficiency is a primary concern, significant research
efforts have been devoted to identifying ontology languages
with favorable computational properties. The DL-Lite family
of description logics (DLs) [Calvanese et al., 2007], which
underlies the OWL 2 QL profile [Motik et al., 2012], has garnered significant interest as it allows OMQA to be reduced,
via query rewriting, to standard database query evaluation.
Beyond efficiency, it is important for OMQA systems to
be robust to inconsistencies stemming from errors in the
data. Inspired by work on consistent query answering in
databases [Bertossi, 2011], several inconsistency-tolerant semantics have been developed for OMQA, with the aim of
providing meaningful answers in the presence of inconsistencies. Of particular relevance to the present paper are the
brave semantics [Bienvenu and Rosati, 2013], which returns
all query answers that are supported by some internally consistent set of facts, and the more conservative IAR semantics
[Lembo et al., 2010] that requires that facts in the support not
belong to any minimal inconsistent subset. Both semantics
have appealing computational properties: for most DL-Lite
dialects, including the dialect DL-LiteR considered in this paper, conjunctive query answering is tractable in data complexity and can be implemented using query rewriting techniques [Lembo et al., 2011; Bienvenu and Rosati, 2013].

While inconsistency-tolerant semantics are essential for returning useful results when consistency cannot be achieved, they by no means replace the need for tools for improving data quality. That is why in this paper we propose a complementary approach that exploits user feedback about query results to identify and correct errors. We consider the following scenario: a user interacts with an OMQA system, posing conjunctive queries and receiving the results, which are sorted into the possible answers (i.e., those holding under the weaker brave semantics) and the (almost) sure answers (those holding under IAR semantics). When reviewing the results, the user can indicate that some of the retrieved answer tuples are erroneous, whereas other tuples should definitely be considered answers. Ideally, the unwanted tuples should not be returned as possible (brave) answers, and all of the desired tuples should be found among the sure (IAR) answers. The aim is thus to construct a set of atomic changes (deletions and additions of facts), called a repair plan, that achieves as many of these objectives as possible, subject to the constraint that the changes must be validated by the user.

There are several reasons to use queries to guide the repairing process. First, we note that it is typically impossible (for lack of time or information) to clean the entire dataset, and it is therefore reasonable to focus the effort on the parts of the data that are most relevant to users' needs. In the database arena, this observation has inspired work on integrating entity resolution into the querying process [Altwaijry et al., 2013]. Second, expert users may have a good idea of which answers are expected for queries concerning their area of expertise, and thus queries provide a natural way of identifying flaws. Indeed, Kontokostas et al. (2014) recently proposed to use queries to search for errors and help evaluate linked data quality. Finally, even non-expert users may notice anomalies when examining query results, and it would be a shame not to capitalize on this information and, in this way, help distribute the costly and time-consuming task of improving data quality, as argued in [Bergman et al., 2015].

The contributions of this paper are as follows. In Section 3, we formalize query-driven repairing problems and illustrate the main challenges, in particular, the fact that there may not exist any repair plan that resolves all identified errors. This leads us to introduce in Section 4 different notions of optimal repair plan. Adopting DL-LiteR as the ontology language, we study the complexity of reasoning about the different kinds of optimal repair plan and provide interactive algorithms for constructing such plans. In Section 5, we focus on the important special case of deletion-only repair plans, for which all of the optimality notions coincide. We take advantage of the more restricted search space to improve the general approach, and we analyze the complexity of the decision problems used in our algorithm. Finally, in Section 6, we present preliminary experiments on our implementation of the core components of the algorithm for the deletion-only case. We conclude with a discussion of related and future work. Omitted proofs and additional information about the experiments can be found in [Bienvenu et al., 2016b].
2 Preliminaries

Following the presentation of [Bienvenu et al., 2016a], we recall the basics of DLs and inconsistency-tolerant semantics.

Syntax  A DL knowledge base (KB) consists of an ABox and a TBox, both constructed from a set NC of concept names (unary predicates), a set NR of role names (binary predicates), and a set NI of individuals (constants). The ABox (dataset) is a finite set of concept assertions A(a) and role assertions R(a, b), where A ∈ NC, R ∈ NR, and a, b ∈ NI. The TBox (ontology) is a finite set of axioms whose form depends on the particular DL. In DL-LiteR, TBox axioms are either concept inclusions B ⊑ C or role inclusions P ⊑ S built according to the following syntax (where A ∈ NC and R ∈ NR):

B := A | ∃P    C := B | ¬B    P := R | R⁻    S := P | ¬P

Semantics  An interpretation has the form I = (Δ^I, ·^I), where Δ^I is a non-empty set and ·^I maps each a ∈ NI to a^I ∈ Δ^I, each A ∈ NC to A^I ⊆ Δ^I, and each R ∈ NR to R^I ⊆ Δ^I × Δ^I. The function ·^I is extended to general concepts and roles in the standard way, e.g. (R⁻)^I = {(d, e) | (e, d) ∈ R^I} and (∃P)^I = {d | ∃e : (d, e) ∈ P^I}. An interpretation I satisfies an inclusion G ⊑ H if G^I ⊆ H^I; it satisfies A(a) (resp. R(a, b)) if a^I ∈ A^I (resp. (a^I, b^I) ∈ R^I). We call I a model of K = (T, A) if I satisfies all axioms in T and all assertions in A. A KB is consistent if it has a model, and an ABox A is T-consistent if the KB (T, A) is consistent.

Example 1. As a running example, we consider a simple KB Kex = (Tex, Aex) about the university domain that contains concepts for postdoctoral researchers (Postdoc), professors (Pr) of two levels of seniority (APr, FPr), PhD holders (PhD), and graduate courses (GrC), as well as roles linking advisors to their students (Adv), instructors to their courses (Teach), and students to the courses they attend (TakeC). The ABox Aex provides information about an individual a:

Tex = {Postdoc ⊑ PhD, Pr ⊑ PhD, Postdoc ⊑ ¬Pr, FPr ⊑ Pr, APr ⊑ Pr, APr ⊑ ¬FPr, ∃Adv ⊑ Pr}
Aex = {Postdoc(a), APr(a), Adv(a, b), Teach(a, c)}

Observe that Aex is Tex-inconsistent.
Queries  We focus on conjunctive queries (CQs), which take the form q(~x) = ∃~y ψ(~x, ~y), where ψ is a conjunction of atoms of the forms A(t) or R(t, t′), with t, t′ individuals or variables from ~x ∪ ~y. A CQ is called Boolean (BCQ) if it has no free variables (i.e. ~x is the empty tuple). Given a CQ q with free variables ~x = (x1, ..., xk) and a tuple of individuals ~a = (a1, ..., ak), we use q(~a) to denote the BCQ resulting from replacing each xi by ai. A tuple ~a is a certain answer to q over K, written K ⊨ q(~a), iff q(~a) holds in every model of K. When we use the generic term query, we mean a CQ.

Causes and Conflicts  A cause for a BCQ q w.r.t. a KB K = (T, A) is a minimal T-consistent subset C ⊆ A such that (T, C) ⊨ q. We use causes(q, K) to refer to the set of causes for q. A conflict for K is a minimal T-inconsistent subset of A, and confl(K) denotes the set of conflicts for K. When K is a DL-LiteR KB, every conflict for K has at most two assertions. We can thus define the set of conflicts of a set of assertions C ⊆ A as follows:

confl(C, K) = {β | ∃α ∈ C, {α, β} ∈ confl(K)}

Inconsistency-Tolerant Semantics  A repair of K = (T, A) is an inclusion-maximal subset of A that is T-consistent. We consider two previously studied inconsistency-tolerant semantics based upon repairs. Under IAR semantics, a tuple ~a is an answer to q over K, written K ⊨IAR q(~a), just in the case that (T, B∩) ⊨ q(~a), where B∩ is the intersection of all repairs of K (equivalently, B∩ contains some cause for q(~a)). If there exists some repair B such that (T, B) ⊨ q(~a) (equivalently: causes(q(~a), K) ≠ ∅), then ~a is an answer to q under brave semantics, written K ⊨brave q(~a).

Example 2. There are two repairs of the example KB Kex:

{Postdoc(a), Teach(a, c)}    and    {APr(a), Adv(a, b), Teach(a, c)}

Evaluating the queries q1 = ∃y Teach(x, y) and q2 = Pr(x) on Kex yields the following results: Kex ⊨IAR q1(a), Kex ⊭IAR q2(a), but Kex ⊨brave q2(a). Indeed, intersecting the repairs yields {Teach(a, c)}, which contains a cause for q1(a) but no cause for q2(a). On the other hand, the second repair contains two causes for q2(a) (namely {APr(a)} and {Adv(a, b)}), which shows Kex ⊨brave q2(a).

In DL-LiteR, CQ answering under IAR or brave semantics is in P w.r.t. data complexity (i.e. in the size of the ABox) [Lembo et al., 2010; Bienvenu and Rosati, 2013].
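For illustration, here is a minimal brute-force Python sketch of these semantics on the running example (ours, not the CQAPri implementation used later): assertions are plain strings, the binary conflicts are given, and the causes of each answer are supplied rather than computed by query rewriting.

```python
from itertools import combinations

# Running example: ABox assertions as strings; conflicts as 2-element
# frozensets (in DL-LiteR every conflict has at most two assertions).
ABOX = frozenset({"Postdoc(a)", "APr(a)", "Adv(a,b)", "Teach(a,c)"})
CONFLICTS = {frozenset({"Postdoc(a)", "APr(a)"}),
             frozenset({"Postdoc(a)", "Adv(a,b)"})}

def consistent(assertions, conflicts):
    # T-consistent iff no conflict is fully contained in the set
    return not any(c <= assertions for c in conflicts)

def repairs(abox, conflicts):
    # inclusion-maximal T-consistent subsets of the ABox (brute force)
    subsets = [frozenset(s) for k in range(len(abox) + 1)
               for s in combinations(abox, k)]
    cons = [s for s in subsets if consistent(s, conflicts)]
    return [s for s in cons if not any(s < t for t in cons)]

def holds_brave(causes):
    # brave semantics: some repair supports q, i.e. q has a cause
    return bool(causes)

def holds_iar(causes, reps):
    # IAR semantics: the intersection of all repairs contains a cause
    core = frozenset.intersection(*reps)
    return any(c <= core for c in causes)

reps = repairs(ABOX, CONFLICTS)
q1_causes = [frozenset({"Teach(a,c)"})]                       # q1(a)
q2_causes = [frozenset({"APr(a)"}), frozenset({"Adv(a,b)"})]  # q2(a)
print(holds_iar(q1_causes, reps))    # True
print(holds_iar(q2_causes, reps))    # False
print(holds_brave(q2_causes))        # True
```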
3 Query-Driven Repairing

A user poses questions to a possibly inconsistent KB and receives the sets of possible answers (i.e. those holding under brave semantics) and almost sure answers (those holding under IAR semantics). When examining the results, he detects some unwanted answers, which should not have been retrieved, and identifies wanted answers, which should be present. To fix the detected problems and improve the quality of the data, the objective is to modify the ABox in such a way that the unwanted answers do not hold under the (more liberal) brave semantics and the wanted answers hold under the more cautious IAR semantics.
A first way of repairing the data is to delete assertions from
the ABox that lead to undesirable consequences, either because they contribute to the derivation of an unwanted answer
or because they conflict with causes of some wanted answer.
Example 3. Reconsider the KB K = (Tex, Aex), and suppose a is an unwanted answer for Pr(x) but a wanted answer for PhD(x). Deleting the assertions APr(a) and Adv(a, b) achieves the objectives, since (Tex, {Postdoc(a), Teach(a, c)}) ⊭brave Pr(a) and (Tex, {Postdoc(a), Teach(a, c)}) ⊨IAR PhD(a).
The next example shows that, due to data incompleteness,
it can also be necessary to add new assertions.
Example 4. Consider K = (Tex, {APr(a)}) with the same wanted and unwanted answers as in Ex. 3. The assertion APr(a) has to be removed to satisfy the unwanted answer, but then there remains no cause for the wanted answer. This is because the only cause of PhD(a) in K contains an erroneous assertion: there is no 'good' reason in the initial ABox for PhD(a) to hold. A solution is for the user to add a cause he knows for PhD(a), such as Postdoc(a).
We now provide a formal definition of the query-driven repairing problem investigated in this paper.
Definition 1. A query-driven repairing problem (QRP) consists of a KB K = (T, A) to repair and two sets W, U of BCQs that K should entail (W) or not entail (U). A repair plan (for A) is a pair R = (E₋, E₊) such that E₋ ⊆ A and E₊ ∩ A = ∅; if E₊ = ∅, we say that R is deletion-only.

The sets U and W correspond to the unwanted and wanted answers in our scenario: q(~a) ∈ U (resp. W) means that ~a is an unwanted (resp. wanted) answer for q. Slightly abusing terminology, we will use the term unwanted (resp. wanted) answers to refer to the BCQs in U (resp. W). We say that a repair plan (E₋, E₊) addresses all defects of a QRP (K, W, U) if the KB K′ = (T, (A \ E₋) ∪ E₊) is such that K′ ⊨IAR q for every q ∈ W, and K′ ⊭brave q for every q ∈ U.
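A repair plan and the addresses-all-defects test translate directly into code; the sketch below assumes entailment oracles holds_iar and holds_brave (taking an ABox and a BCQ) are supplied by the caller, e.g. built along the lines of the earlier snippet.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RepairPlan:
    removals: frozenset    # E-, a subset of the ABox
    additions: frozenset   # E+, disjoint from the ABox

def apply_plan(abox, plan):
    # K' = (T, (A \ E-) U E+)
    return (frozenset(abox) - plan.removals) | plan.additions

def addresses_all_defects(abox, plan, wanted, unwanted,
                          holds_iar, holds_brave):
    # wanted/unwanted: sets of BCQs; the two oracles map (abox, q) to bool
    new_abox = apply_plan(abox, plan)
    return (all(holds_iar(new_abox, q) for q in wanted)
            and not any(holds_brave(new_abox, q) for q in unwanted))
```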
The next example shows that by considering several answers at the same time, we can exploit the interaction between
the different answers to reduce the search space.
Example 5. Consider the KB K = (Tex, A) with ABox A = {Pr(a), APr(b), FPr(b), Teach(a, c), Teach(b, c), GrC(c), TakeC(s, c)}. It is easy to see that K is inconsistent, and its two repairs are obtained by removing either APr(b) or FPr(b). Evaluating the queries q1(x) = PhD(x) and q2(x) = ∃y,z Pr(x) ∧ Teach(x, y) ∧ GrC(y) ∧ TakeC(z, y) over this KB yields:

K ⊨brave q1(b)    K ⊨brave q2(b)    K ⊨IAR q2(a)

We consider the QRP (K, W, U) with wanted answers W = {q1(b), q2(a)} and unwanted answers U = {q2(b)}. Two deletion-only repair plans address all defects: {APr(b), Teach(b, c)} and {FPr(b), Teach(b, c)}. Indeed, we must delete exactly one of APr(b) and FPr(b) for q1(b) to be entailed under IAR semantics, and we cannot remove GrC(c) or TakeC(s, c) without losing the wanted answer q2(a). Thus, the only way to get rid of q2(b) is to delete Teach(b, c). If we consider only U (i.e. W = ∅), there are additional possibilities such as {GrC(c)} and {TakeC(s, c)}, and there is no evidence that Teach(b, c) has to be deleted.
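On toy instances such as Example 5, the search space can be explored by brute force; the sketch below (same assumed oracles as above) enumerates deletion-only plans addressing all defects, and on Example 5 its smallest results should be {APr(b), Teach(b, c)} and {FPr(b), Teach(b, c)}.

```python
from itertools import combinations

def deletion_only_plans(abox, wanted, unwanted, holds_iar, holds_brave):
    # brute-force search for sets E- whose removal addresses all
    # defects; exponential in |A|, so for illustration only
    abox = frozenset(abox)
    plans = []
    for k in range(len(abox) + 1):              # smallest plans first
        for removals in map(frozenset, combinations(abox, k)):
            rest = abox - removals
            if (all(holds_iar(rest, q) for q in wanted)
                    and not any(holds_brave(rest, q) for q in unwanted)):
                plans.append(removals)
    return plans
```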
If we want to avoid introducing new errors, a fully automated repairing process is impossible: we need the user to validate every assertion that is removed or added, so that we remove (resp. add) only assertions that are false (resp. true).

Example 6. Reconsider the problem from Ex. 5, and suppose that the user knows that TakeC(s, c) is false and every other assertion in A is true. An automatic repairing will remove the true assertion Teach(b, c). The problem is due to the absence of a 'good' cause for the wanted answer q2(a) in A.
Since we will be studying an interactive repairing process, in which users must validate changes, we will also need to formalize the user's knowledge. For the purposes of this paper, we assume that the user's knowledge is consistent with the considered TBox T, and so can be captured as a set Muser of models of T. Instead of using Muser directly, it will be more convenient to work with the function user induced from Muser that assigns truth values to BCQs in the obvious way: user(q) = true if q is true in every I ∈ Muser, user(q) = false if q is false in every I ∈ Muser, and user(q) = unknown otherwise. We will assume throughout the paper the following truthfulness condition: user(q) = false for every q ∈ U, and user(q) = true for every q ∈ W.

We now formalize the requirement that repair plans only contain changes that are sanctioned by the user.

Definition 2. A repair plan (E₋, E₊) is validatable w.r.t. user just in the case that user(α) = false for every α ∈ E₋ and user(α) = true for every α ∈ E₊. (In what follows, we often omit 'w.r.t. user' and leave it implicit.)
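In an implementation, the user function can only be observed through the user's explicit feedback, so it is natural to represent it as a partial truth assignment; a sketch of Definition 2 under that (hypothetical) representation:

```python
def user_value(assertion, known):
    # known: partial map from assertions to True/False; absent = unknown
    return known.get(assertion, "unknown")

def validatable(plan, known):
    # Definition 2: every removal known-false, every addition known-true
    return (all(known.get(a) is False for a in plan.removals)
            and all(known.get(a) is True for a in plan.additions))
```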
Unfortunately, it may be the case that there is no validatable repair plan addressing all defects. This may happen, for
instance, if the user knows some answer is wrong but cannot
pinpoint which assertion is at fault, as we illustrate next.
Example 7. Consider the QRP given by:

K = (Tex, {FPr(a), Teach(a, c), GrC(c)})
U = {∃x Pr(a) ∧ Teach(a, x) ∧ GrC(x)},  W = {Pr(a)}

Suppose that user(FPr(a)) = false, user(Teach(a, c)) = unknown, user(GrC(c)) = unknown, and user(APr(a)) = true. It is not possible to satisfy the wanted and unwanted answers at the same time, since adding the true assertion APr(a) creates a cause for the unwanted answer that does not contain any assertion α with user(α) = false: the user does not know which of Teach(a, c) and GrC(c) is erroneous.
As validatable repair plans addressing all defects are not
guaranteed to exist, our aim will be to find repair plans that
are optimal in the sense that they address as many of the defects as possible, subject to the constraint that the changes
must be validated by the user.
4 Optimal Repair Plans
To compare repair plans, we consider the answers from U and W that are satisfied by the resulting KBs, where:
• q ∈ U is satisfied by K if K ⊭brave q;
• q ∈ W is satisfied by K if there exists C ∈ causes(q, K) such that confl(C, K) = ∅ and there is no α ∈ C with user(α) = false.

Remark 1. Observe that for q ∈ W to be satisfied by K, we require not only that K ⊨IAR q, but also that there exists a cause for q that does not contain any assertions known to be false, i.e. K ⊨IAR q should hold 'for a good reason'. We impose this additional requirement to avoid counterintuitive situations, e.g. preferring repair plans that remove fewer false assertions in order to retain a conflict-free (but erroneous) cause for a wanted answer.
We say that a repair plan R = (E₋, E₊) satisfies q ∈ U ∪ W if the KB K_R = (T, (A \ E₋) ∪ E₊) satisfies q, and we use S(R) (resp. SU(R), SW(R)) to denote the sets of answers (resp. unwanted answers, wanted answers) satisfied by R.

Two repair plans R and R′ can be compared w.r.t. the sets of unwanted and wanted answers that they satisfy: for A ∈ {U, W}, we define the preorder ⪯A by setting R ⪯A R′ iff SA(R) ⊆ SA(R′), and obtain the corresponding strict order (≺A) and equivalence relation (∼A) in the usual way. If the two criteria are equally important, we can combine them using the Pareto principle: R ⪯{U,W} R′ iff R ⪯U R′ and R ⪯W R′. Alternatively, we can use the lexicographic method to give priority either to the wanted answers (⪯W,U) or the unwanted answers (⪯U,W): R ⪯A,B R′ iff R ≺A R′, or R ∼A R′ and R ⪯B R′, where {A, B} = {U, W}.

For each of the preceding preference relations ⪯, we can define the corresponding notions of ⪯-optimal repair plan.
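Since these preorders compare plans only through the sets of answers they satisfy, they are immediate to implement once S_U and S_W are computable; a sketch, with the satisfaction functions passed in as assumed callables returning Python sets:

```python
def pareto_leq(r1, r2, sat_u, sat_w):
    # R1 <=_{U,W} R2  iff  S_U(R1) ⊆ S_U(R2) and S_W(R1) ⊆ S_W(R2)
    return sat_u(r1) <= sat_u(r2) and sat_w(r1) <= sat_w(r2)

def lex_leq(r1, r2, sat_a, sat_b):
    # R1 <=_{A,B} R2: strictly better on A (proper subset), or
    # A-equivalent (equal sets) and <=_B on the second criterion
    a1, a2 = sat_a(r1), sat_a(r2)
    return a1 < a2 or (a1 == a2 and sat_b(r1) <= sat_b(r2))
```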
Definition 3. A repair plan (E₋, E₊) is globally (resp. locally) ⪯-optimal w.r.t. user iff it is validatable w.r.t. user and there is no other validatable repair plan (E′₋, E′₊) such that (E₋, E₊) ≺ (E′₋, E′₊) (resp. such that E₋ ⊆ E′₋, E₊ ⊆ E′₊, and (E₋, E₊) ≺ (E′₋, E′₊)).

Globally ⪯-optimal repair plans are those that are maximal with respect to the preference relation ⪯, whereas locally ⪯-optimal repair plans are those that cannot be improved in the ordering ⪯ by adding further assertions to E₋ or E₊.

Remark 2. If a repair plan is validatable and addresses all defects of a QRP, then it is globally ⪯U-optimal. If it additionally satisfies every q ∈ W (ensuring that there is a 'good' cause for every q ∈ W), then it is globally ⪯-optimal for every ⪯ ∈ {⪯W, ⪯{U,W}, ⪯U,W, ⪯W,U}.

The following example illustrates the difference between local and global optimality.

Example 8. Consider the QRP ((Tex, A), W, U) where

A = {Teach(a, e), Adv(a, b), TakeC(b, c), TakeC(b, e), GrC(e)}
W = {∃x Teach(a, x), ∃x TakeC(b, x) ∧ GrC(x)}
U = {∃x,y Teach(a, x) ∧ Adv(a, y) ∧ TakeC(y, x) ∧ GrC(x)}

Suppose that user(Teach(a, e)) = user(GrC(e)) = false, user(α) = unknown for the other α ∈ A, and the user knows that Teach(a, c), Teach(a, d) and GrC(c) are true. It can be verified that the repair plan R1 = ({Teach(a, e), GrC(e)}, {Teach(a, c)}) satisfies the first answer in W and the (only) answer in U. It is locally ⪯{U,W}-optimal, since the only way to satisfy the second wanted answer would be to add GrC(c), which would create a cause for the unwanted answer that could not be repaired by removing additional assertions, as the user does not know which of Adv(a, b) and TakeC(b, c) is false. However, R1 is not globally ⪯{U,W}-optimal because R2 = ({Teach(a, e), GrC(e)}, {Teach(a, d), GrC(c)}) satisfies all answers in W ∪ U.

In order to gain a better understanding of the computational properties of the different ways of ranking repair plans, we study the complexity of deciding if a given repair plan is optimal w.r.t. the different criteria. Since the validatability of a repair plan depends on user, in this section, we measure the complexity w.r.t. |A|, |U|, |W|, as well as the size of the set

True^rel_user = {α ∈ True_user | there exists q ∈ W such that α ∈ C for some C ∈ causes(q, A ∪ True_user)}

where True_user = {α | user(α) = true}. We make the reasonable assumption that True_user (hence True^rel_user) is finite.

Theorem 1. Deciding if a repair plan is globally ⪯-optimal is coNP-complete for ⪯ ∈ {⪯{U,W}, ⪯U,W, ⪯W,U}, and in P for ⪯ ∈ {⪯W, ⪯U}. Deciding if a repair plan is locally ⪯-optimal is in P for ⪯ ∈ {⪯U, ⪯W, ⪯{U,W}, ⪯U,W, ⪯W,U}.

For the coNP upper bounds, we note that to show that R is not ⪯-optimal (for ⪯ ∈ {⪯{U,W}, ⪯U,W, ⪯W,U}), we can guess another repair plan R′ and verify in P that both plans are validatable and that R′ satisfies more answers than R. The lower bounds are by reduction from (variants of) UNSAT.

To establish the tractability results from Theorem 1, we provide characterizations of optimal plans in terms of the notion of satisfiability of answers, defined next.

Definition 4. An answer q ∈ U ∪ W is satisfiable if there exists a validatable repair plan that satisfies q. We say that q is satisfiable w.r.t. a validatable repair plan R = (E₋, E₊) if there exists a validatable repair plan R′ = (E′₋, E′₊) such that E₋ ⊆ E′₋, E₊ ⊆ E′₊, q ∈ S(R′), and R ⪯{U,W} R′.

Proposition 1. Deciding if an answer is satisfied, satisfiable, or satisfiable w.r.t. a repair plan is in P.

Combining Prop. 1 with the following characterizations yields polynomial-time procedures for optimality testing.

Proposition 2. A validatable repair plan R is:
• globally ⪯U- (resp. ⪯W-) optimal iff it satisfies every satisfiable q ∈ U (resp. q ∈ W);
• locally ⪯U,W-optimal iff it is locally ⪯{U,W}-optimal iff it satisfies every q ∈ U ∪ W that is satisfiable w.r.t. R;
• locally ⪯W,U-optimal iff it satisfies every satisfiable q ∈ W and every q ∈ U that is satisfiable w.r.t. R.

Our complexity analysis reveals that the notions of global optimality based upon the preference relations ⪯{U,W}, ⪯U,W, and ⪯W,U have undesirable computational properties: even when provided with all relevant user knowledge, it is intractable to decide whether a given plan is optimal. Moreover, while globally ⪯U- (resp. ⪯W-) optimal plans can be interactively constructed in a monotonic fashion by removing further false assertions (resp. and adding further true assertions), building a globally optimal plan for a preference relation that involves both U and W may require backtracking over answers already satisfied (cf. the situation in Example 8).
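Proposition 2 reduces the tractable optimality tests to per-answer checks; for instance, the first item becomes a one-liner given oracles for satisfaction and satisfiability (both in P by Proposition 1):

```python
def globally_optimal(plan, answers, satisfied_by, satisfiable):
    # Prop. 2 (first item): a validatable plan is globally <=_U-optimal
    # (resp. <=_W-optimal) iff it satisfies every satisfiable answer in
    # U (resp. W); pass the corresponding answer set and oracles.
    return all(satisfied_by(plan, q) for q in answers if satisfiable(q))
```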
For the preceding reasons, we target validatable repair plans that are both globally optimal for ⪯U or ⪯W (depending on which is preferred) and locally optimal for ⪯{U,W}. In Fig. 1, we give an interactive algorithm OptRPU for building such a repair plan when U is preferred; if W is preferred, we use the algorithm OptRPW obtained by removing Step C.6 from OptRPU. The algorithms terminate provided the user knows only a finite number of assertions that may be inserted. In this case, the algorithms output optimal repair plans:

Theorem 2. The output of OptRPU (resp. OptRPW) is globally ⪯U- (resp. ⪯W-) and locally ⪯{U,W}-optimal.

ALGORITHM OptRPU
Input: QRP (K = (T, A), U, W)    Output: repair plan
A. E₋ ← ∅, E₊ ← ∅
B. Display the assertions of ⋃_{q ∈ U∪W} causes(q, K) and ⋃_{q ∈ W, C ∈ causes(q,K)} confl(C, K)
   1. Ask the user to mark all false (F) and true (T) assertions
   2. E₋ ← E₋ ∪ F ∪ confl(T, K)
C. While W′ = W \ SW(E₋, E₊) ≠ ∅: q ← first(W′)
   1. Ask the user for true assertions Tq (not already provided) to complete (or create) a cause for q
   2. If Tq = ∅ (nothing to add): W ← W \ {q}, go to C
   3. E₊ ← E₊ ∪ Tq, E₋ ← E₋ ∪ confl(Tq, (T, A ∪ Tq))
   4. Show the assertions of every cause C of q such that Tq ∩ C ≠ ∅, together with their conflicts; the user indicates false and true assertions F′, T′: E₋ ← E₋ ∪ F′ ∪ confl(T′, K)
   5. Show the assertions of the causes of every q′ ∈ U in (A \ E₋) ∪ E₊; the user gives false assertions F″: E₋ ← E₋ ∪ F″
   6. If there is q″ ∈ U such that (T, (A \ E₋) ∪ E₊) ⊨brave q″ and (T, A \ E₋) ⊭brave q″: E₊ ← E₊ \ Tq (revert E₊)
D. Return (E₋, E₊)

Figure 1: Algorithm for constructing a globally ⪯U- and locally ⪯{U,W}-optimal repair plan

Proof. We give the proof for OptRPU. First observe that at every point during the execution of the algorithm, the current repair plan is validatable, since only true assertions are added to E₊ and only false assertions are added to E₋ (they are either marked as false by the user or are in conflict with assertions that have been marked as true).

Step B adds to E₋ all assertions known to be false that belong to a cause of some q ∈ U ∪ W or a conflict of some cause of q ∈ W. Thus, at the end of this step, E₋ satisfies every satisfiable answer in U, that is, every answer in U every cause of which contains at least one false assertion. Hence (E₋, E₊) is globally ⪯U-optimal at the end of Step B. Moreover, every false assertion that occurs in a cause or conflict of a cause of a wanted answer has been removed, so if q ∈ W is not satisfied at this point, then it has no cause without any conflict in A \ {α | user(α) = false}.

The purpose of Step C is to add new true assertions to create causes for the wanted answers not satisfied after Step B, while preserving SU(E₋, E₊). For every q ∈ W, while q is not satisfied, the user is asked to input true assertions to complete a cause for q in Step C.1. If he is unable to do so, at Step C.2, we remove q from W (since it cannot be satisfied w.r.t. user); otherwise, we update E₋ and E₊ using Tq (C.3). Note that since Tq contains only true assertions, we can remove its conflicts without affecting already satisfied wanted answers; this step is necessary because Tq may conflict with assertions of A that are not involved in the causes and conflicts presented at Step B. In Step C.4, we remove false assertions appearing in a new cause for q or its conflicts (such assertions may not have been examined in Step B). Step C.5 removes false assertions of new causes of unwanted answers, and Step C.6 undoes the addition of Tq if it affects SU(E₋, E₊). Thus, at the end of Step C, for every wanted answer, either it is satisfied, or the user is unable to supply a cause that does not deteriorate SU(E₋, E₊). It follows that (E₋, E₊) is locally ⪯{U,W}-optimal.
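The control flow of Fig. 1 is easy to prototype; the following skeleton (our sketch, with ask standing in for any user interface and the reasoning primitives causes, confl, and sat_w assumed to come from an OMQA system) covers Steps A-C and elides the bookkeeping of Steps C.4-C.6:

```python
def opt_rp_u(abox, wanted, unwanted, causes, confl, sat_w, ask):
    # Skeleton of OptRPU (Fig. 1); ask(prompt, options) returns the set
    # of assertions the user marks (options=None means free input).
    e_minus, e_plus = set(), set()
    # Step B: display causes of all answers and, for wanted answers,
    # the conflicts of their causes; collect false/true markings.
    shown = set()
    for q in list(wanted) + list(unwanted):
        for c in causes(q):
            shown |= c
    for q in wanted:
        for c in causes(q):
            shown |= confl(c)
    false = ask("mark the false assertions", shown)
    true = ask("mark the true assertions", shown - false)
    e_minus |= false
    for t in true:                # B.2: conflicts of true assertions
        e_minus |= confl({t})     # must themselves be false
    # Step C: complete causes for wanted answers not yet satisfied.
    wanted = list(wanted)
    while True:
        todo = [q for q in wanted if not sat_w(q, e_minus, e_plus)]
        if not todo:
            return e_minus, e_plus
        q = todo[0]
        t_q = ask(f"true assertions completing a cause for {q}", None)
        if not t_q:
            wanted.remove(q)      # C.2: q cannot be satisfied w.r.t. user
            continue
        e_plus |= t_q             # C.3; Steps C.4-C.6 (examining new
        ...                       # causes/conflicts, reverting additions
                                  # that help unwanted answers) elided
```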
5 Optimal Deletion-Only Repair Plans

In this section, we restrict our attention to constructing optimal deletion-only repair plans. In this simpler setting, all of the previously introduced notions of optimality collapse into the one characterized in the following proposition.

Proposition 3. A validatable deletion-only plan is optimal iff it satisfies every q ∈ U such that every C ∈ causes(q, K) has some α ∈ C with user(α) = false, and every q ∈ W for which there exists C ∈ causes(q, K) such that user(α) ≠ false for every α ∈ C and user(β) = false for every β ∈ confl(C, K).

Constructing such repair plans can be done with one of the preceding algorithms, omitting Step C, which adds facts. However, it is possible to further assist the user by taking advantage of the fact that subsets of the ABox whose removal addresses all defects of the QRP can be automatically identified and then interactively transformed into optimal repair plans. We call such subsets potential solutions.

An assertion is said to be relevant if it appears in a cause of some q ∈ U ∪ W or in the conflicts of a cause of some q ∈ W. If an assertion α appears in every potential solution, then either user(α) = false, or there is no validatable potential solution. We call such assertions necessarily false. If α appears in no potential solution, it is necessary to keep it in A to retrieve some wanted answers under IAR semantics, so either user(α) ≠ false, or it is not possible to satisfy all wanted answers. We call such assertions necessarily nonfalse.

When a potential solution does not exist, a minimal correction subset of wanted answers (MCSW) is an inclusion-minimal subset W′ ⊆ W such that removing W′ from W yields a QRP with a potential solution. Because of the truthfulness condition, we know that the absence of a potential solution means that some wanted answers are supported only by causes containing erroneous assertions (otherwise the wanted and unwanted answers would be contradictory, which would violate the truthfulness condition). Moreover, since removing all such answers from W yields the existence of a potential solution, there exists an MCSW which contains only such answers, which we call an erroneous MCSW. This is why MCSWs can help identify the wanted answers that cannot be satisfied by a deletion-only repair plan.
Theorem 3. For complexity w.r.t. |A|, |U| and |W|, deciding if a potential solution exists is NP-complete, deciding if an assertion is necessarily (non)false is coNP-complete, and deciding if W′ ⊆ W is an MCSW is BH₂-complete.

The lower bounds are proven by reduction from propositional (un)satisfiability and related problems. For the upper bounds, we construct in polynomial time a propositional CNF φ = φU ∧ φW with:

φU = ⋀_{q ∈ U} ⋀_{C ∈ causes(q,K)} ⋁_{α ∈ C} x_α

φW = ⋀_{q ∈ W} ⋁_{C ∈ causes(q,K)} w_C  ∧  ⋀_{q ∈ W} ⋀_{C ∈ causes(q,K)} ⋀_{α ∈ C} (¬w_C ∨ ¬x_α)  ∧  ⋀_{q ∈ W} ⋀_{C ∈ causes(q,K)} ⋀_{β ∈ confl(C,K)} (¬w_C ∨ x_β)

which has the following properties:
• there exists a potential solution iff φ is satisfiable (satisfying assignments correspond to potential solutions);
• α is necessarily false iff φ ∧ ¬x_α is unsatisfiable;
• α is necessarily nonfalse iff φ ∧ x_α is unsatisfiable;
• there exist disjoint subsets S, H of the clauses in φ such that the MCSWs correspond to the minimal correction subsets (MCSs) of S w.r.t. H, i.e. the subsets M ⊆ S such that (i) (S \ M) ∪ H is satisfiable, and (ii) (S \ M′) ∪ H is unsatisfiable for every M′ ⊊ M.
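This encoding is straightforward to reproduce; the sketch below builds φ over variables x_α ("α is removed") and w_C ("cause C is chosen for its wanted answer") and, instead of SAT4J, uses a brute-force satisfiability check that only suits toy instances. Queries and assertions are assumed to be strings.

```python
from itertools import product

def build_phi(causes_u, causes_w, confl):
    # causes_u / causes_w: dict mapping a query to its list of causes
    # (frozensets of assertion strings); confl(C) gives C's conflicts.
    # A literal is a ((tag, ...), polarity) pair: ("x", a) reads
    # "assertion a is removed", ("w", q, i) "the i-th cause of q is kept".
    clauses = []
    for q, cs in causes_u.items():            # phi_U: break every cause
        for c in cs:
            clauses.append([(("x", a), True) for a in c])
    for q, cs in causes_w.items():            # phi_W: pick a cause ...
        clauses.append([(("w", q, i), True) for i in range(len(cs))])
        for i, c in enumerate(cs):
            for a in c:                       # ... keep it intact ...
                clauses.append([(("w", q, i), False), (("x", a), False)])
            for b in confl(c):                # ... remove its conflicts
                clauses.append([(("w", q, i), False), (("x", b), True)])
    return clauses

def satisfiable(clauses, units=()):
    # brute-force SAT check (stand-in for SAT4J); units pins literals
    cls = [list(c) for c in clauses] + [[lit] for lit in units]
    vs = sorted({v for cl in cls for v, _ in cl})
    for bits in product([False, True], repeat=len(vs)):
        model = dict(zip(vs, bits))
        if all(any(model[v] == pol for v, pol in cl) for cl in cls):
            return True
    return False

def necessarily_false(a, clauses):
    return not satisfiable(clauses, [(("x", a), False)])

def necessarily_nonfalse(a, clauses):
    return not satisfiable(clauses, [(("x", a), True)])
```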
We present in Fig. 2 an algorithm OptDRP for computing optimal deletion-only repair plans. Within the algorithm, we denote by R(K, U, W, A′) (resp. Nf(K, U, W, A′), N¬f(K, U, W, A′)) the set of assertions from A′ ⊆ A that are relevant (resp. necessarily false, necessarily nonfalse) for the QRP (K, U, W) when deletions are allowed only in A′ (the set A′ will be used to store assertions whose truth value is not yet determined). The general idea is that the algorithm incrementally builds a set of assertions that are false according to the user. It aids the user by suggesting assertions to remove, or wanted answers that might not be satisfiable when there is no potential solution, while taking into account the knowledge the user has already provided. If there exists a potential solution, the algorithm computes the necessarily (non)false assertions and asks the user either to validate them or to input false and nonfalse assertions to justify why they cannot be validated, and then to input further true or false assertions if the current set of false assertions does not address all defects. When a potential solution is found, the user has to verify that each wanted answer has a cause that does not contain any false assertion. If there does not exist a potential solution at some point, either initially or after some user inputs, the algorithm looks for an erroneous MCSW by computing all MCSWs, then showing for each of them the assertions involved in the causes of each query of the MCSW. If there is a query which has a cause without any false assertion, the MCSW under examination is not erroneous, nor are the other MCSWs that contain that query. Otherwise, the MCSW is erroneous and its queries are removed from W, and we return to the case where a potential solution exists.
ALGORITHM OptDRP
Input: QRP (K = (T, A), U, W)    Output: repair plan
(Note: below, K is a macro for (T, A \ E₋), using the current E₋.)
A. K₀ ← K, A′ ← A, E₋ ← ∅
B. If a potential solution for (K, U, W) exists in A′:
   1. R ← R(K, U, W, A′), Nf ← Nf(K, U, W, A′), N¬f ← N¬f(K, U, W, A′)
   2. If the user validates user(α) = false for every α ∈ Nf and user(α) ≠ false for every α ∈ N¬f:
      a. E₋ ← E₋ ∪ Nf, A′ ← A′ \ (Nf ∪ N¬f)
      b. If E₋ is a potential solution for (K₀, U, W):
         i. For each q ∈ W: the user gives all false assertions F ⊆ ⋃_{C ∈ causes(q,K), confl(C,K) = ∅} C, E₋ ← E₋ ∪ F
         ii. If E₋ is still a potential solution: output E₋
         iii. Else: A′ ← A′ \ E₋, go to B
      c. Else: the user selects some F, T ⊆ R \ (Nf ∪ N¬f)
         i. If F = T = ∅ (nothing left to input): return E₋
         ii. Else: E₋ ← E₋ ∪ F ∪ confl(T, K), A′ ← A′ \ (E₋ ∪ T), go to B
   3. Else: the user gives F ⊆ {α ∈ N¬f | user(α) = false} and NF ⊆ {α ∈ Nf | user(α) ≠ false} with F ∪ NF ≠ ∅, E₋ ← E₋ ∪ F, A′ ← A′ \ (E₋ ∪ NF)
C. Search for an MCSW containing only answers that are supported only by erroneous causes:
   1. 𝓜 ← MCSWs(K, U, W, A′), ordered by size
   2. While an erroneous MCSW has not been found and 𝓜 ≠ ∅:
      a. M ← first(𝓜)
      b. For every q ∈ M:
         i. the user selects F, T ⊆ ⋃_{C ∈ causes(q,K)} C
         ii. E₋ ← E₋ ∪ F ∪ confl(T, K), A′ ← A′ \ (E₋ ∪ T)
         iii. If a cause for q contains no false assertion: 𝓜 ← 𝓜 \ {M′ ∈ 𝓜 | q ∈ M′}, go to C.2
      c. MCSW found: W ← W \ M, go to B.1
   3. No MCSW found: do Step B of OptRPU, output E₋

Figure 2: Algorithm for optimal deletion-only repair plans
Theorem 4. The algorithm OptDRP always terminates, and it outputs an optimal deletion-only repair plan.

Proof idea. Termination follows from the fact that every time we return to Step B, something has been added to E₋ or deleted from W, and nothing is ever removed from E₋ or added to W. Since we only add false assertions to E₋, the output plan is validatable. If the algorithm ends at Step B.2.b.ii, then E₋ satisfies every answer characterized in Prop. 3. Indeed, since E₋ is a potential solution, it satisfies every unwanted answer. Moreover, the answers removed from W at Step C.2.c do not fulfill the conditions of Prop. 3, since all their causes contain some false assertion, and for every remaining q ∈ W, we ensure that there is a conflict-free cause of q that contains no false assertions. If the algorithm ends at Step B.2.c.i, the user was asked to indicate false or true assertions at Step B.2.c and was not able to input anything, so the user has deleted all false assertions he knows among the relevant assertions, and thus it is not possible to improve the current repair plan further. A similar argument applies if the algorithm ends at Step C.3.
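The MCSW search of Step C can likewise be prototyped by brute force over subsets of W, reusing build_phi and satisfiable from the earlier sketch (meaningful only when the full QRP has no potential solution):

```python
from itertools import combinations

def mcsws(causes_u, causes_w, confl):
    # inclusion-minimal W' ⊆ W whose removal restores a potential
    # solution; reuses build_phi/satisfiable from the previous sketch
    queries = list(causes_w)
    found = []
    for k in range(len(queries) + 1):
        for sub in map(frozenset, combinations(queries, k)):
            if any(m <= sub for m in found):
                continue                     # a subset already suffices
            rest = {q: cs for q, cs in causes_w.items() if q not in sub}
            if satisfiable(build_phi(causes_u, rest, confl)):
                found.append(sub)
    return found
```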
To avoid overwhelming the user with relevant assertions at Step B.2.c, it is desirable to reduce the number of assertions presented at a time. This leads us to propose two improvements to the basic algorithm. First, we can divide QRPs into independent subproblems. Two answers are considered dependent if their causes (and, in the case of wanted answers, the conflicts of their causes) share some assertion. Independent sets of answers do not interact, so they can be handled separately. Second, at Step B.2.c, the assertions can be presented in small batches. Borrowing ideas from work on reducing user effort in interactive revision, we can use a notion of impact to determine the order of presentation of assertions. Indeed, deleting or keeping an assertion may force us to delete or keep other assertions to get a potential solution. Relevant assertions can be sorted using two scores that express the impact of being declared false or true. For the impact of an assertion α being false, we use the number of assertions that become necessarily (non)false if α is deleted. The impact of α being true also takes into account the fact that the conflicts of α can be marked as false: we consider the number of assertions that are in conflict with α or become necessarily (non)false when we disallow α's removal. We can rank assertions by the minimum of the two scores, using their sum to break ties.
6 Preliminary Experiments

We report on experiments made on core components of the above OptDRP algorithm. We focused on measuring the time to decide whether a potential solution exists (Step B), to compute the necessarily (non)false and relevant assertions (Step B.1), to rank the relevant assertions w.r.t. their impact (Step B.2.c), and to find the MCSWs (Step C).

The components were developed in Java using the CQAPri system (www.lri.fr/~bourgaux/CQAPri) to compute query answers under IAR and brave semantics, with their causes, and the KB's conflicts. We used SAT4J (www.sat4j.org) to solve the (UN)SAT reductions in Section 5.

We borrowed from the CQAPri benchmark [Bienvenu et al., 2016a], available at the URL above: (i) its TBox, which is the DL-LiteR version of the Lehigh University Benchmark [Lutz et al., 2013] augmented with constraints allowing for conflicts; (ii) its c5 and c29 ABoxes, with ~10 million assertions and a ratio of assertions involved in conflicts of, respectively, 5% (which we found realistic) and 29%; and (iii) its queries q1, q2, q3, q4, shown below. We slightly modified the original queries by changing some constants or variables in order to obtain dependent answers (whose causes and conflicts of causes share some assertions). We built 13 QRPs per ABox by adding more and more answers of these queries to U or W; the size of U ∪ W varies from 8 to 121.

q1 = ∃y Person(x) ∧ takesCourse(x, y) ∧ Person(G131) ∧ GraduateCourse(y) ∧ takesCourse(G131, y)
q2 = ∃x Employee(x) ∧ memberOf(x, D2) ∧ degreeFrom(x, y)
q3 = ∃y teacherOf(x, y) ∧ degreeFrom(x, U532)
q4 = ∃z Employee(x) ∧ degreeFrom(x, U532) ∧ memberOf(x, z) ∧ Employee(y) ∧ degreeFrom(y, U532) ∧ memberOf(y, z)

In all of our experiments, deciding if a potential solution exists, as well as computing the set of relevant assertions, takes just a few milliseconds. The difficulty of computing the necessarily (non)false assertions correlates with the number of relevant assertions induced by the QRP. For the c5 QRPs, involving 85 to 745 relevant assertions, this computation took between 30ms and 544ms, while it took 24ms to 1333ms for the c29 QRPs, involving 143 to 1404 relevant assertions. While these times seem reasonable in practice, ranking the remaining relevant assertions based on their impact is time-consuming, as it requires a number of calls to the SAT solver quadratic in the number of assertions: it took less than 10s up to ~150 assertions, less than 5min up to ~480 assertions, and up to 25min for 825 assertions. For all of the QRPs we built, computing the MCSWs takes a few milliseconds; somewhat surprisingly, we always found at most one MCSW.
7 Discussion

The problem of modifying DL KBs to ensure (non)entailments of assertions and/or axioms has been investigated in many works, see e.g. [De Giacomo et al., 2009; Calvanese et al., 2010; Gutierrez et al., 2011].

Our framework is inspired by that of [Jiménez-Ruiz et al., 2011], in which a user specifies two sets of axioms that should or should not be entailed by a KB. Repair plans are introduced as pairs of sets of axioms to remove and add to obtain an ontology satisfying these requirements. Deletion-only repair plans are studied in [Jiménez-Ruiz et al., 2009], where heuristics based on the confidence and the size of the plan are used to help the user choose a plan among all minimal plans.

When axiom (in)validation can be partially automated, ranking axioms by their potential impact reduces the effort of manual revision [Meilicke et al., 2008; Nikitina et al., 2012]. In our setting, we believe that validating sets of necessarily (non)false assertions requires less effort than hunting for false assertions among all relevant assertions, leading us to propose a similar notion of impact to rank the assertions to be examined.

Compared to prior work, distinguishing features of our framework are the specification of changes at the level of CQ answers, the use of inconsistency-tolerant semantics, and the introduction of optimality measures to handle situations in which not all objectives can be achieved.

In future work, two aspects of our approach deserve further attention. First, when insertions are needed, it would be helpful to provide users with suggestions of assertions to add. The framework of query abduction [Calvanese et al., 2013], which was recently extended to inconsistent KBs [Du et al., 2015], could provide a useful starting point. Second, our experiments revealed the difficulty of ranking relevant assertions, so we plan to develop optimized algorithms for computing impact and explore alternative definitions of impact.
Acknowledgements

This work was supported by contract ANR-12-JS02-007-01.

References

[Altwaijry et al., 2013] Hotham Altwaijry, Dmitri V. Kalashnikov, and Sharad Mehrotra. Query-driven approach to entity resolution. Proceedings of the VLDB Endowment (PVLDB), 6(14):1846–1857, 2013.

[Bergman et al., 2015] Moria Bergman, Tova Milo, Slava Novgorodov, and Wang-Chiew Tan. QOCO: A query oriented data cleaning system with oracles. PVLDB, 8(12):1900–1911, 2015.

[Bertossi, 2011] Leopoldo E. Bertossi. Database Repairing and Consistent Query Answering. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2011.

[Bienvenu and Ortiz, 2015] Meghyn Bienvenu and Magdalena Ortiz. Ontology-mediated query answering with data-tractable description logics. In Lecture Notes of the 11th International Reasoning Web Summer School, volume 9203 of LNCS, pages 218–307. Springer, 2015.

[Bienvenu and Rosati, 2013] Meghyn Bienvenu and Riccardo Rosati. Tractable approximations of consistent query answering for robust ontology-based data access. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), 2013.

[Bienvenu et al., 2016a] Meghyn Bienvenu, Camille Bourgaux, and François Goasdoué. Explaining inconsistency-tolerant query answering over description logic knowledge bases. In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI), 2016.

[Bienvenu et al., 2016b] Meghyn Bienvenu, Camille Bourgaux, and François Goasdoué. Query-driven repairing of inconsistent DL-Lite knowledge bases (long version with appendix). Technical Report 1585, LRI, Orsay, France. Available at https://www.lri.fr/~bibli/Rapports-internes/2016/RR1585.pdf, 2016.

[Calvanese et al., 2007] Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, and Riccardo Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. Journal of Automated Reasoning (JAR), 39(3):385–429, 2007.

[Calvanese et al., 2010] Diego Calvanese, Evgeny Kharlamov, Werner Nutt, and Dmitriy Zheleznyakov. Evolution of DL-Lite knowledge bases. In Proceedings of the 9th International Semantic Web Conference (ISWC), 2010.

[Calvanese et al., 2013] Diego Calvanese, Magdalena Ortiz, Mantas Simkus, and Giorgio Stefanoni. Reasoning about explanations for negative query answers in DL-Lite. Journal of Artificial Intelligence Research (JAIR), 48:635–669, 2013.

[De Giacomo et al., 2009] Giuseppe De Giacomo, Maurizio Lenzerini, Antonella Poggi, and Riccardo Rosati. On instance-level update and erasure in description logic ontologies. Journal of Logic and Computation, 19(5):745–770, 2009.

[Du et al., 2015] Jianfeng Du, Kewen Wang, and Yi-Dong Shen. Towards tractable and practical ABox abduction over inconsistent description logic ontologies. In Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI), 2015.

[Gutierrez et al., 2011] Claudio Gutierrez, Carlos A. Hurtado, and Alejandro A. Vaisman. RDFS update: From theory to practice. In Proceedings of the 8th European Semantic Web Conference (ESWC), 2011.

[Jiménez-Ruiz et al., 2009] Ernesto Jiménez-Ruiz, Bernardo Cuenca Grau, Ian Horrocks, and Rafael Berlanga Llavori. Ontology integration using mappings: Towards getting the right logical consequences. In Proceedings of the 6th European Semantic Web Conference (ESWC), 2009.

[Jiménez-Ruiz et al., 2011] Ernesto Jiménez-Ruiz, Bernardo Cuenca Grau, Ian Horrocks, and Rafael Berlanga Llavori. Supporting concurrent ontology development: Framework, algorithms and tool. Data & Knowledge Engineering Journal (DKE), 70(1):146–164, 2011.

[Kontokostas et al., 2014] Dimitris Kontokostas, Patrick Westphal, Sören Auer, Sebastian Hellmann, Jens Lehmann, Roland Cornelissen, and Amrapali Zaveri. Test-driven evaluation of linked data quality. In Proceedings of the 23rd International Conference on World Wide Web (WWW), 2014.

[Lembo et al., 2010] Domenico Lembo, Maurizio Lenzerini, Riccardo Rosati, Marco Ruzzi, and Domenico Fabio Savo. Inconsistency-tolerant semantics for description logics. In Proceedings of the 4th International Conference on Web Reasoning and Rule Systems (RR), 2010.

[Lembo et al., 2011] Domenico Lembo, Maurizio Lenzerini, Riccardo Rosati, Marco Ruzzi, and Domenico Fabio Savo. Query rewriting for inconsistent DL-Lite ontologies. In Proceedings of the 5th International Conference on Web Reasoning and Rule Systems (RR), 2011.

[Lutz et al., 2013] Carsten Lutz, Inanç Seylan, David Toman, and Frank Wolter. The combined approach to OBDA: Taming role hierarchies using filters. In Proceedings of the 12th International Semantic Web Conference (ISWC), 2013.

[Meilicke et al., 2008] Christian Meilicke, Heiner Stuckenschmidt, and Andrei Tamilin. Supporting manual mapping revision using logical reasoning. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI), 2008.

[Motik et al., 2012] Boris Motik, Bernardo Cuenca Grau, Ian Horrocks, Zhe Wu, Achille Fokoue, and Carsten Lutz. OWL 2 Web Ontology Language profiles. W3C Recommendation, 11 December 2012. Available at http://www.w3.org/TR/owl2-profiles/.

[Nikitina et al., 2012] Nadeschda Nikitina, Sebastian Rudolph, and Birte Glimm. Interactive ontology revision. Journal of Web Semantics (JWS), 12:118–130, 2012.