
Computational Intelligence, Volume 20, Number 2, 2004
UTILITY FUNCTIONS FOR CETERIS PARIBUS PREFERENCES
MICHAEL MCGEACHIE
Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology,
Cambridge, MA
JON DOYLE
Department of Computer Science, North Carolina State University, Raleigh, NC
Ceteris paribus (all-else equal) preference statements concisely represent preferences over outcomes or goals
in a way natural to human thinking. Although deduction in a logic of such statements can compare the desirability of
specific conditions or goals, many decision-making methods require numerical measures of degrees of desirability.
To permit ceteris paribus specifications of preferences while providing quantitative comparisons, we present an
algorithm that compiles a set of qualitative ceteris paribus preferences into an ordinal utility function. Our algorithm
is complete for a finite universe of binary features. Constructing the utility function can, in the worst case, take time
exponential in the number of features, but common independence conditions reduce the computational burden. We
present heuristics using utility independence and constraint-based search to obtain efficient utility functions.
Key words: qualitative decision theory, rationality, ceteris paribus preferences.
1. INTRODUCTION
Some works on what one can call qualitative decision theories (Doyle and Thomason
1999) study reasoning and algorithmic methods over logical languages of qualitative preference rankings, especially methods that work by decomposing outcomes into their component
qualities or features and then reasoning about the relative desirability of alternatives possessing different features (see, for example, Doyle, Shoham, and Wellman 1991; Bacchus and
Grove 1996; La Mura and Shoham 1999). These theories provide mechanisms for stating
conditions under which possible outcomes in one set are preferred to outcomes in another set
by a given decision maker. Quantitative approaches to decision theory, in contrast, typically
start with numeric valuations of each attribute, direct estimation of a decision maker’s utility
function, or detailed estimations of probabilities of different event occurrences given certain
actions.
In many domains, qualitative rankings of desirability provide a more natural and appropriate starting point than quantitative rankings (Wellman and Doyle 1991). As with qualitative
representations of probabilistic information (Wellman 1990), the primary qualitative relationships often immediately suggest themselves. Primary qualitative relationships also
remain unchanged despite fluctuating details of how one condition trades off against others,
and can determine some decisions without detailed knowledge of such tradeoffs. For example, in the domain of configuration problems one seeks to assemble a complicated system
according to constraints described in terms of qualitative attributes, such as the type of a
computer motherboard determining the kind of processors possible. A user of a configuration system might prefer fast processors over slow ones without any special concern for the exact speeds.
Purely qualitative comparisons do not suffice in all cases, especially when the primary
attributes of interest are quantitative measures, or when outcomes differ in respect to some
easily measurable quantity. For example, in standard economic models the utility of money
is proportional to the logarithm of the amount of money. Quantitative comparisons also are
needed when the decision requires assessing probabilistic expectations, as in maximizing
expected utility. Such expectations require cardinal as opposed to ordinal measures of desirability. In other cases, computational costs drive the demand for quantitative comparisons.
© 2004 Blackwell Publishing, 350 Main Street, Malden, MA 02148, USA, and 9600 Garsington Road, Oxford OX4 2DQ, UK.
For example, systems manipulating lists of alternatives sorted by desirability might be best
organized by computing degrees of desirability at the start and then working with these easily sorted “keys” throughout the rest of the process, rather than repeatedly re-determining
pairwise comparisons.
These considerations suggest that different decision-making processes place different
demands on the representation of preference information, just as demands on representations
of other types of information vary across activities. Although some processes might get by
with a purely quantitative or purely qualitative representation, one expects that the activities
of automated rational agents will require both representations for different purposes, and will need to transfer preference information from one representation to the other. In formulating a
new decision, for example, the agent might need to reason like a decision analyst, starting by
identifying general preference information relevant to the decision at hand, continuing by assembling this information into a quantitative form, and finishing by applying the quantitative construction to test problems or to the actual problem. These applications might reveal flaws
in the decision formulation. Minor flaws might require only choice of a different quantitative
representation of the qualitative information, much as a physical scientist might adjust parameters in a differential equation. Major flaws might require augmenting or emending the
general qualitative information. All in all, both forms of representation seem required for
different parts of the process, so we look for means to support such activities.
In this work, we give methods for translating preference information from a qualitative
representation to a quantitative representation. This process preserves many of the desirable
properties of both representations. We maintain an intuitive, flexible, under-constrained, and
expressive input language from our qualitative representation. From our quantitative representation, we gain an ordinal utility function: a function u that takes as input an assignment m of values to features F and returns a real number u(m) such that if m is logically preferred to m′, then u(m) > u(m′). Because we focus on ordinal utilities, the utility function does not provide meaningful relative magnitudes; e.g., if u(m) > u(m′), neither u(m) − u(m′) nor u(m)/u(m′) need be a meaningful quantity. As such, the functions we construct are unsuitable for certain applications that compare ratios of utilities or compute probabilistic expectations of utilities.
We will consider a propositional language L augmented with an ordering relation used
to express preferences over propositional combinations of a set F of elementary features.
Given a set C of preferences in L over the subset F(C) of features appearing in C, and M the
set of all models of F(C), we compute a utility function u : M → R such that the preference
ordering implied by u is consistent with C. We give a more formal explanation of this task
after first providing some background about preferences and introducing some notation.
Wellman and Doyle (1991) have observed that human preferences for many types of goals can be interpreted as qualitative representations of preferences. Doyle, Shoham, and Wellman (1991) present a theoretical formulation of such preferences in terms of ceteris paribus, i.e., all-else-equal, preferences. Ceteris paribus relations
express a preference over sets of possible worlds. We consider all possible worlds (or outcomes) to be describable by some (large) set F of binary features. Then each ceteris paribus
preference statement specifies a preference over some features of outcomes while ignoring
the remaining features. The specified features are instantiated to either true or false, while
the ignored features are “fixed,” or held constant. A ceteris paribus preference might be “we
prefer programming tutors receiving an A in Software Engineering to tutors not receiving an
A, other things being equal.” In this example, we can imagine a universe of computer science
tutors, each describable by some set of binary features F. Perhaps F = {Graduated, SoftwareEngineering A, ComputerSystems A, Cambridge resident, Willing to work on Tuesdays,
. . .}. The preferences expressed above state that, for a particular computer science tutor,
TABLE 1. Properties of Possible Computer Science Tutors

    Feature                       Alice    Bob      Carol
    Graduated                     False    False    True
    A in Software Engineering     True     False    False
    A in Computer Systems         True     True     False
    Cambridge resident            True     True     True
    Will work Tuesdays            False    False    True
    ...                           ...      ...      ...
they are more desirable if they received an A in the Software Engineering course, all other features being equal. Specifically, this makes the statement that a tutor Alice, of the form shown in Table 1, is preferred to another tutor Bob, also in Table 1, assuming the elided features are identical, since they differ only on the feature over which we have expressed a preference (grade in Software Engineering). The ceteris paribus preference makes no statement about the relationship between tutor Alice and tutor Carol because they differ with regard to other features.
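As an illustrative sketch (not part of the paper's formalism), the comparison just described can be written as a small program over the Table 1 features; the feature names and the dictionary encoding are our own assumptions.

```python
# Hypothetical encoding of Table 1: a model is a dict feature -> bool.
FEATURES = ["Graduated", "SE_A", "CS_A", "Cambridge", "Tuesdays"]

alice = {"Graduated": False, "SE_A": True,  "CS_A": True,  "Cambridge": True, "Tuesdays": False}
bob   = {"Graduated": False, "SE_A": False, "CS_A": True,  "Cambridge": True, "Tuesdays": False}
carol = {"Graduated": True,  "SE_A": False, "CS_A": False, "Cambridge": True, "Tuesdays": True}

def cp_prefers(m1, m2, feature="SE_A"):
    """True iff m1 is preferred to m2 under the single ceteris paribus
    rule 'feature is better true than false': m1 has the feature,
    m2 lacks it, and all other features agree."""
    others_equal = all(m1[f] == m2[f] for f in FEATURES if f != feature)
    return others_equal and m1[feature] and not m2[feature]

print(cp_prefers(alice, bob))    # → True  (differ only on SE_A)
print(cp_prefers(alice, carol))  # → False (other features differ, no comparison)
```

Note that the rule stays silent on Alice versus Carol, exactly as in the text: the "all else equal" condition fails.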
1.1. Approach
In broad terms, we generate an ordinal utility function from a set C of ceteris paribus
preferences over features F in the following steps. First we examine the preferences stated
in C and determine which features might be assumed utility independent of which other
features. Utility independence (UI) is the idea that it makes sense to talk of the utility
contributed by some features independent of the values assumed by other features. This
provides computational benefits, since utility independent sets of features can be considered
without regard to the other features. Next, again using C, we can define subutility functions
for each utility independent set of features. Such methods are based on representing the
preorders consistent with the preferences C by building a graph over assignments to the
features. Finally, to assign relative weights of satisfaction of different features, we solve a
linear programming problem. In the end, we have built a utility function u that can be used
to quickly evaluate the utility of different assignments to values of F.
Since the work of Doyle and Wellman, other researchers have proposed methods of computing with different representations of ceteris paribus preferences. The works of Bacchus and Grove (1995) and La Mura and Shoham (1999) use quantitative preference statements, in contrast to our qualitative preference statements. Both systems use an adaptation of Bayesian networks to utility as their computation paradigm. As heavily quantitative representations do, their representations require a priori knowledge of many of the utilities and many of the probabilities of combinations of states. Although our method and the methods of Bacchus and Grove (1995) and La Mura and Shoham (1999) all model ceteris paribus preference statements, the efforts are quite different. Their approaches assume that subutility functions are known, and that these can be combined into a utility function. Our work endeavors to build subutility functions from the raw preference statements themselves.
Our work is similar to the work of Boutilier, Bacchus, and Brafman (2001), who compute an additive decomposition utility function from quantified preferences. Earlier work by
Boutilier et al. (1999) uses qualitative conditional ceteris paribus preferences, and presents
some strong heuristics for determining if one outcome is preferred to another, but uses a very
restricted representation for ceteris paribus preferences. Their methodology allows complicated logical conditions to determine when a preference holds, but the preference itself is limited to a single feature, and the representation cannot express a statement such as p ≻ q for arbitrary formulae p and q. Our language of preference statements, in contrast, can reference more than one feature at a time and express general comparisons. As we will show, their method has
similar computational costs to our own methods, when our methods operate under favorable
conditions. Since our method allows arbitrary preference statements as input, we sometimes
run into cases that are computationally intractable.
We now give a formal account of the ceteris paribus formulation we will use, then a
statement of our task using that formalism.
1.2. A Formal “All Else Equal” Logic
We employ a restricted logical language L, patterned after (Doyle et al. 1991). We
simplify the presentation by using only the standard logical operators ¬ (negation) and ∧
(conjunction) to construct finite sentences over a set of atoms A, thus requiring one to translate
sentences using disjunction, implication, and equivalence into (possibly larger) equivalent
logical sentences using only negation and conjunction. Each atom a ∈ A corresponds to a
feature f ∈ F, a space of binary features describing possible worlds. We write f (a) for the
feature corresponding to atom a. By literals(A) we denote the atoms of A and their negations;
literals(A) = A ∪ {¬a | a ∈ A}. A complete consistent set of literals m is a model. That is, m is a model iff exactly one of a and ¬a is in m, for each a ∈ A. We use M for the set of all models of L.
A model of L assigns truth values to all atoms of L, and therefore to all formulas in L
and all features in F. We write f i (m) for the truth value assigned to feature f i by model m.
A model satisfies a sentence p of L if the truth values m assigns to the atoms of p make p
true. We write m |= p when m satisfies p. We define a proposition expressed by a sentence
p, by [p] = {m ∈ M | m |= p}.
A preference order is a complete preorder (reflexive and transitive relation) ⪰ over M. When m ⪰ m′, we say that m is weakly preferred to m′. If m ⪰ m′ but not m′ ⪰ m, we write m ≻ m′ and say that m is strictly preferred to m′. If m ⪰ m′ and m′ ⪰ m, then we say m is indifferent to m′, written m ∼ m′.
The support of a sentence p is the minimal set of atoms determining the truth of p, denoted s(p). The support of p is the same as the set of atoms appearing in an irredundant sum-of-products sentence logically equivalent to p. Two models m and m′ are equivalent modulo p if they are the same outside the support of p. Formally, m ≡ m′ mod p iff
m\(literals(s(p))) = m′\(literals(s(p))).
Model modification is defined as follows. The set of model modifications of m making p true, written m[p], consists of those models satisfying p which assign the same truth values to atoms outside the support of p as m does. That is,
m[p] = {m′ ∈ [p] | m′ ≡ m mod p}.
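The definition of m[p] can be sketched computationally. In this hedged example, a model is a Python dict from atoms to truth values, and the sentence p is supplied as a predicate together with its support s(p), since computing the minimal support is a separate problem; all names here are illustrative, not from the paper.

```python
from itertools import product

def model_modifications(m, p, support):
    """Return m[p]: all models that satisfy p and agree with m on
    every atom outside the support of p."""
    atoms = sorted(support)
    result = []
    for values in product([False, True], repeat=len(atoms)):
        m2 = dict(m)                   # copy: atoms outside s(p) are kept
        m2.update(zip(atoms, values))  # vary only the atoms in s(p)
        if p(m2):
            result.append(m2)
    return result

m = {"a": True, "b": False, "c": True}
p = lambda mdl: mdl["a"] and not mdl["b"]   # sentence a ∧ ¬b, support {a, b}
mods = model_modifications(m, p, {"a", "b"})
print(mods)  # → [{'a': True, 'b': False, 'c': True}]
```

Only atoms in the support are varied; the value of c is inherited from m, mirroring the "other things being equal" reading.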
A statement of desire is an expression of ceteris paribus preferences. Desires are defined in terms of model modification. We write p ⪰ q when p is desired at least as much as q. Formally, we interpret this as p ⪰ q if and only if for all m in M, m′ ∈ m[p ∧ ¬q], and m″ ∈ m[¬p ∧ q], we have m′ ⪰ m″. This is just a statement that p is desired over q exactly when any model making p true and q false is weakly preferred to any model making p false and q true, whenever the two models assign the same truth values to all atoms not in the
support of p or of q. If p is weakly preferred to q and there is some pair of models m′, m″, where m′ makes p true and q false, m″ makes p false and q true, m′ and m″ assign the same truth values to atoms outside the support of p and q, and m′ is strictly preferred to m″, we instead have a strict preference for p over q, written p ≻ q.
1.3. Preference Specification
Ceteris paribus preferences are specified by a set of preference rules using the language L. A preference rule is any statement of the form p ⪰ q or p ≻ q, where p and q are statements in L.
If a preference rule c implies that m ≻ m′, for m, m′ ∈ M, then we write m ≻_c m′. A set of preference rules C is said to be consistent just in case for all m, m′ ∈ M, it is not the case that both m is strictly preferred to m′ and m′ is strictly preferred to m.
Given a set C of ceteris paribus preference rules, let [C] be the set of all weak preference
orders over models such that the weak preference order satisfies the preference specification
C.
1.4. Conditional Ceteris Paribus Preferences
Conditional ceteris paribus preferences can be represented as well. A conditional ceteris paribus preference is a preference of the form: if r then p ⪰ q, where p, q, and r are statements in L. This is taken to mean that p ⪰ q, a ceteris paribus preference, holds just in case r holds. Such a preference is equivalent to the unconditional ceteris paribus preference r ∧ p ⪰ r ∧ q whenever the support of r is disjoint from that of p and q, but need not be equivalent when these supports overlap (Doyle et al. 1991). Partly for this reason, conditional ceteris paribus preferences have attracted attention as an important extension of unconditional ceteris paribus preferences (Boutilier et al. 1999). In the following, however, we regard unconditional ceteris paribus preference statements as able to capture most of the conditional ceteris paribus preferences of interest, and use the correspondence between conditional ceteris paribus preferences and our simplified ceteris paribus preferences in L to obtain results that apply to both types when conditions and preferences do not share support.
1.5. Utility Functions
We have described the language L in the preceding section. A utility function u : M → R,
maps each model in M to a real number. Each utility function implies a particular preorder
over the models M by reflecting the ordering of utility values onto models. We denote the
preorder implied by u with p(u). Given a finite set C of ceteris paribus preferences in a
language L over a set of atoms A representing features in F, we attempt to find a utility
function u such that p(u) ∈ [C].
We will carry out this task by demonstrating several methods for constructing ordinal utility functions u; this involves demonstrating two things. First, we show what the function u computes: how it takes a model and produces a number. Second, we show how to compute u given a finite set of ceteris paribus preferences C. These methods are sound and complete. We then demonstrate heuristic means of making the computation of u more efficient.
Before we discuss the function u, we define another representation of ceteris paribus preferences, which we refer to as feature vector representation. We discuss this in the following
section.
2. FEATURE VECTOR REPRESENTATION
The utility construction methods developed in following sections employ an intermediate
representation of preferences in terms of simple rules that relate paired patterns of specified
and “do not care” feature values. We will first define this “feature vector” representation. Then
we will show how to convert between statements in the new representation and statements in
the previously discussed logical representation (of Section 1.2). Our translation is designed
so that it can be implemented automatically and used as a “preprocessing step” in systems
computing with ceteris paribus preference statements. Finally we provide proofs of the
expressive equivalence of the two representations.
Let C be a finite set of preference rules. We define F(C) ⊆ F to be the set of features
corresponding to the atoms in the union of the support of each rule in C. That is:
F(C) = {f(a) | a ∈ ⋃_{c∈C} s(c)}.
Because each preference rule mentions only finitely many features, F(C) is also finite, and
we write N to mean |F(C)|.
We construct utility functions representing the constraints in C in terms of model features.
Features not specified in any rule in C are not relevant to computing the utility of a model, since there is no preference information about them in the set C. Accordingly, we focus our
attention on F(C).
2.1. Definition
We define the “feature vector representation” relative to an enumeration V =
( f 1 , . . . , f N ) of F(C).
We define the language Lr(V) of feature vector rules in terms of a language L(V) of propositions over the ternary alphabet {0, 1, ∗}.
A statement in L(V) consists of a sequence of N letters drawn from this alphabet, so that L(V) consists of words of length N over {0, 1, ∗}. For example, if V = (f1, f2, f3), we have ∗10 ∈ L(V). Given a statement p ∈ L(V) and a feature f ∈ F(C), we write f(p) for the value assigned to f in p. In particular, if f = Vi, then f(p) = pi.
A feature vector rule in Lr(V) is an expression p ≻ q in which p, q ∈ L(V) have matching ∗ values. That is, p ≻ q is in Lr(V) just in case pi = ∗ if and only if qi = ∗ for all 1 ≤ i ≤ N. For example, if V = (f1, f2, f3), Lr(V) contains the expression ∗10 ≻ ∗00 but not the expression ∗10 ≻ 0∗0. We refer to the statement in L(V) left of the ≻ symbol in a rule r as the left-hand side of r, and denote it LHS(r). We define the right-hand side RHS(r) analogously. Thus p = LHS(p ≻ q) and q = RHS(p ≻ q).
We regard statements of L(V) containing no ∗ letters as models of L(V), and write M(V)
to denote the set of all such models. We say a model m satisfies s, written m |= s, just in case m assigns the same truth value as s does to each non-∗ feature of s. That is, m |= s iff f(m) = f(s) for each f ∈ F(C) such that f(s) ≠ ∗. For example, 0011 satisfies
both ∗0∗1 and 00∗∗.
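Satisfaction in L(V) reduces to a positionwise check, as in this small sketch, where strings over {0, 1, *} stand in for statements and models; the function name is our own.

```python
def satisfies(m, s):
    """m |= s: the model m (a 0/1 string) agrees with the statement s
    (a string over 0/1/*) at every non-* position."""
    assert len(m) == len(s)
    return all(sv == "*" or mv == sv for mv, sv in zip(m, s))

print(satisfies("0011", "*0*1"))  # → True
print(satisfies("0011", "00**"))  # → True
print(satisfies("0011", "*1**"))  # → False
```

The first two calls reproduce the example above: 0011 satisfies both ∗0∗1 and 00∗∗.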
We project models in M to models in M(V) by a mapping α:
Definition 1 (Model projection). The translation α : M → M(V) is defined for each m ∈ M by α(m) = m′, m′ ∈ M(V), where for all fi ∈ V:
fi(α(m)) = 1 if fi ∈ m,
fi(α(m)) = 0 if ¬fi ∈ m.
This projection induces an equivalence relation on M, and we write [m] to mean the set
of models in M mapped to the same model in M(V) as m:
[m] = {m′ ∈ M | α(m′) = α(m)}.    (1)
The translation α specifies that m and m′ must assign the same truth values to features that appear in L(V), but on features not appearing therein there is no restriction. When the feature vector of L(V) comprises the full set of features F, there is a one-to-one correspondence between models of L(V) and models of L.
We say that a pair of models (m, m′) of L(V) satisfies a rule r in Lr(V), and write (m, m′) |= r, if m satisfies LHS(r), m′ satisfies RHS(r), and m, m′ have the same values for those features represented by ∗ in r, that is, mi = mi′ for each 1 ≤ i ≤ N such that LHS(r)i = ∗. For example, (100, 010) |= 10∗ ≻ 01∗, but (101, 010) ⊭ 10∗ ≻ 01∗.
The meaning [r] of a rule r in Lr(V) is the set of all preference orders ⪰ over M such that for each m, m′ ∈ M, if (α(m), α(m′)) |= r, then m ≻ m′. The meaning of a set R of rules consists of the set of preference orders consistent with each rule in the set, that is, [R] = ⋂_{r∈R} [r]. Thus a rule ∗∗01 ≻ ∗∗10 represents four specific preferences:
0001 ≻ 0010
0101 ≻ 0110
1001 ≻ 1010
1101 ≻ 1110.
Note that this says nothing at all about the preference relationship between, e.g., 0101 and 1010.
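The expansion of a rule into its concrete preferences can be sketched as follows; the helper name expand_rule is our own, and the example reproduces the four preferences listed above.

```python
from itertools import product

def expand_rule(lhs, rhs):
    """Expand a feature vector rule lhs > rhs (strings over 0/1/* with
    matching * positions) into the concrete model-pair preferences it
    represents: both sides fill the * positions identically, so a rule
    with k starred features yields 2^k pairs."""
    stars = [i for i, ch in enumerate(lhs) if ch == "*"]
    assert all(rhs[i] == "*" for i in stars)    # matching * values
    pairs = []
    for fill in product("01", repeat=len(stars)):
        def fill_in(pattern):
            out = list(pattern)
            for i, ch in zip(stars, fill):
                out[i] = ch
            return "".join(out)
        pairs.append((fill_in(lhs), fill_in(rhs)))
    return pairs

print(expand_rule("**01", "**10"))
# → [('0001', '0010'), ('0101', '0110'), ('1001', '1010'), ('1101', '1110')]
```

Because the starred positions are filled identically on both sides, no pair such as (0101, 1010) is ever produced, matching the remark above.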
The support features of a statement p in L(V), written s(p), are exactly those features of p assigned value either 0 or 1; these represent the least set of features needed to determine whether a model of L(V) satisfies p. The support features of a rule r in Lr(V), denoted s(r), are the features in s(LHS(r)). The definition of Lr(V) implies that s(LHS(r)) = s(RHS(r)).
2.2. L(V) Compared to L
These two languages of ceteris paribus preference have similar expressive power. Many
of the terms used in the definition of the feature vector representation have direct analogs in
the ceteris paribus representation of Doyle and Wellman. We now show a translation from
a set C of ceteris paribus rules into sets R of feature vector rules in a way that guarantees
compatibility of meaning in the sense that [R] ⊆ [C]. This translation will allow us to develop algorithms and methods in the following sections that use the language L(V), knowing that our conclusions hold over statements in L. We then give the opposite translation.
2.3. Forward Translation
The translation involves considering models restricted to subsets of features. We write M[S] to denote the set of models over a feature set S ⊆ F, so that M = M[F]. If m ∈ M[S] and S′ ⊆ S, we write m↾S′ to denote the restriction of m to S′, that is, the model m′ ∈ M[S′] assigning the same values as m to all features in S′. We say that a model m ∈ M[S] satisfies a model m′ ∈ M[S′], written m |= m′, just in case S′ ⊆ S and m′ = m↾S′.
A set of rules R in the feature vector representation is compatible with a rule c = pc ⪰ qc in the ceteris paribus representation of the previous section just in case [R] ⊆ [c]. If m1 ≻_c m2, this means that for some r ∈ R we have (m1′, m2′) |= r, where m1, m2 are models of L and m1′, m2′ are models of L(V), such that m1 ∈ [m1′] and m2 ∈ [m2′]. We give a construction for such an r from an arbitrary c. Before delving into the details of our construction, we define two important translation functions.
The characteristic statement σ (m) of a model m is the statement in L(V) that indicates
the same truth values on features as m does. Similarly, the characteristic model µ( p) of a
statement p ∈ L(V) is the model in M[s( p)] that indicates the same truth values on features
as p does. We give formal definitions below.
Definition 2 (Characteristic statement σ). For any S ⊆ F(C), σ is a function σ : M[S] → L(V). Let m be a model of S. We set fi(σ(m)) as follows:
fi(σ(m)) = 1 iff fi(m) = true,
fi(σ(m)) = 0 iff fi(m) = false,
fi(σ(m)) = ∗ iff fi ∉ S.
An important property of σ is as follows. If a is a model of M[S] and m a model in M(V), then m satisfies a implies that m satisfies σ(a). More formally,
m |= a → m |= σ(a).    (2)
This can be seen by considering each of the N elements of F(C) one at a time. Let S ⊆ F(C). If fi ∈ S, then a assigns a truth value to fi, which is consistent with fi(m) and fi(σ(a)). If fi ∉ S, then a does not assign fi a truth value, nor does σ(a) require a particular truth value for fi(m) for a model m to satisfy σ(a). Note that if fi ∈ (F \ F(C)), it is (again) not required to be of a particular truth value for a model to satisfy σ(a).
Definition 3 (Characteristic model µ). Let p be a statement in L(V). Then the characteristic
model µ( p) of a statement p in L(V) is the model in M[s( p)] defined by
µ( p) = { f | f ( p) = 1} ∪ {¬ f | f ( p) = 0}.
Note that for m ∈ M we have m |= µ(α(m)), that is, µ(α(m)) = m↾F(C). This follows directly from the definitions of model satisfaction, in a similar fashion to the proof of equation (2).
We translate a single ceteris paribus rule c ∈ L into a set of feature vector rules R via the support of c. If c is of the form pc ⪰ qc, where pc, qc are sentences in L, then models that satisfy pc ∧ ¬qc are preferred to models that satisfy ¬pc ∧ qc, other things being equal. For brevity, let sc = s(pc ∧ ¬qc) ∪ s(¬pc ∧ qc), so that sc ⊆ F(C) is the set of
support features for each rule r ∈ R, and consider models in M[sc ]. Let Wl be the set of
such models satisfying pc ∧ ¬qc , that is,
Wl = {w ∈ M[sc ] | w |= pc ∧ ¬qc },
and define Wr as the corresponding set of models satisfying the right-hand side,
Wr = {w ∈ M[sc ] | w |= ¬ pc ∧ qc }.
We construct new sets Wl′ and Wr′ of statements in L(V) from Wl and Wr by augmenting each member and translating into L(V). We define these new sets by
Wl′ = {w ∈ L(V) | w = σ(w′), w′ ∈ Wl},
Wr′ = {w ∈ L(V) | w = σ(w′), w′ ∈ Wr}.
Note that the members of Wl′ and Wr′ are of length |F(C)|, while those of Wl and Wr are of size |sc|.
We now construct a set of rules in Lr(V) to include a rule for each pair of augmented statements, that is, the set
R = {wl ≻ wr | wl ∈ Wl′, wr ∈ Wr′}.
This completes the translation.
Consider a simple example. In the following, we assume V = (f1, f2, f3, f4). A ceteris paribus rule might be of the form f2 ∧ ¬f4 ≻ f3. This expresses the preference for models satisfying
(f2 ∧ ¬f4) ∧ ¬f3
over models satisfying
¬(f2 ∧ ¬f4) ∧ f3,
other things being equal. Note sc = {f2, f3, f4}. Then, following the above construction, Wl = {{f2, ¬f3, ¬f4}}, and Wr = {{f2, f3, f4}, {¬f2, f3, f4}, {¬f2, f3, ¬f4}}. In this case we translate these into three feature vector rules:
∗100 ≻ ∗111
∗100 ≻ ∗011
∗100 ≻ ∗010.
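The example translation can be reproduced by a short sketch. Here the two sides pc ∧ ¬qc and ¬pc ∧ qc are hand-coded as predicates over sc = {f2, f3, f4}, and σ pads every feature outside the support with '*'; all identifiers are illustrative, not from the paper.

```python
from itertools import product

V = ["f1", "f2", "f3", "f4"]
sc = ["f2", "f3", "f4"]          # support of the rule f2 ∧ ¬f4 > f3

def sigma(model):
    """Characteristic statement σ(m): 0/1 where m assigns a value, * elsewhere."""
    return "".join(str(int(model[f])) if f in model else "*" for f in V)

def models_of(pred):
    """All models in M[sc] satisfying the predicate pred."""
    return [dict(zip(sc, vals))
            for vals in product([False, True], repeat=len(sc))
            if pred(dict(zip(sc, vals)))]

left  = lambda m: (m["f2"] and not m["f4"]) and not m["f3"]   # pc ∧ ¬qc
right = lambda m: not (m["f2"] and not m["f4"]) and m["f3"]   # ¬pc ∧ qc

# One rule per pair of augmented statements, as in the construction.
R = [(sigma(wl), sigma(wr))
     for wl in models_of(left) for wr in models_of(right)]
print(R)  # → [('*100', '*010'), ('*100', '*011'), ('*100', '*111')]
```

The three rules produced are exactly those listed above, up to ordering.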
The above exposition and construction give us the necessary tools to state and prove the following lemma.
Lemma 1 (Feature vector representation of ceteris paribus rules). Given a ceteris paribus rule c in L, and any models m1, m2 of L where m1 ≻_c m2, there exists a finite set of rules R in Lr(V) such that there exist models m1′, m2′ of L(V) and r ∈ R with (m1′, m2′) |= r, such that m1 ∈ [m1′] and m2 ∈ [m2′].
Proof. Without loss of generality, we take c = pc ≻ qc. The case of c = pc ⪰ qc is the same. Using the definition of a ceteris paribus rule, we note that Lemma 1 is satisfied when
∀m ∈ M, m′ ∈ m[pc ∧ ¬qc], m″ ∈ m[¬pc ∧ qc] →
∃r ∈ R, m′ |= LHS(r), m″ |= RHS(r).    (3)
We show that if m, m′, m″ are such that m′ ∈ m[pc ∧ ¬qc] and m″ ∈ m[¬pc ∧ qc], then our translation above constructs an r such that m′ |= LHS(r) and m″ |= RHS(r). Given such m, m′, m″, from the definition of m′ ∈ m[pc ∧ ¬qc], we have m′ |= pc ∧ ¬qc. Let wa be the member of Wl such that wa = m′↾sc, and wb the member of Wr such that wb = m″↾sc. By definition of restriction, m′ |= wa and m″ |= wb.
Let wa′ be in Wl′ such that wa′ = σ(wa). Since m′ |= wa, by equation (2) we have m′ |= wa′. Since wa′ = LHS(r) for some r in R according to the construction, we have m′ |= LHS(r) for some r. A similar argument shows that m″ |= RHS(r′) for some (possibly different) r′. Since ∀a ∈ Wl′, ∀b ∈ Wr′, ∃r : a = LHS(r), b = RHS(r), we choose r such that m′ |= LHS(r) = wa′ and m″ |= RHS(r) = σ(wb). This completes the proof of Lemma 1.
Thus, we have shown, for a given rule c = pc ≻ qc in the ceteris paribus representation, that there exists a set of rules R in the feature vector representation such that (m1, m2) |= r ∈ R iff there exists m such that m1 ∈ m[pc ∧ ¬qc] and m2 ∈ m[¬pc ∧ qc]. Thus, the construction preserves the meaning of a rule in the ceteris paribus representation.
2.4. Reverse Translation
We give a translation from one rule in Lr(V) to one rule in L. This is a construction taking a rule r in Lr(V) into a ceteris paribus rule c in L, such that if (m1, m2) |= r, there exist c = pc ≻ qc and m such that m1 ∈ m[pc ∧ ¬qc] and m2 ∈ m[¬pc ∧ qc]; and similarly for rules using "⪰". Suppose we have a general feature vector rule r = LHS(r) ≻ RHS(r). This means that models satisfying LHS(r) are preferred to those satisfying RHS(r), all else equal. A ceteris paribus rule is a comparison of formulae a ≻ b where a, b are formulae in the language L. Thus, we must look for formulae a, b such that
a ∧ ¬b ≡ µ(LHS(r))    (4)
and
¬a ∧ b ≡ µ(RHS(r)).
In the following, m denotes a model in M. Consider the sets of models [LHS(r)] = {m | α(m) |= LHS(r)} and [RHS(r)] = {m | α(m) |= RHS(r)}. Note that [LHS(r)] and [RHS(r)] are disjoint. Disjointness follows because the support features of LHS(r) equal the support features of RHS(r), while LHS(r) ≠ RHS(r). [LHS(r)] ⊆ [RHS(r)]ᶜ follows from disjointness, where [RHS(r)]ᶜ = {m | α(m) ⊭ RHS(r)} is the complement of [RHS(r)]. Thus,

[LHS(r)] ∩ [RHS(r)]ᶜ = [LHS(r)]

and

[RHS(r)] ∩ [LHS(r)]ᶜ = [RHS(r)].

This suggests a solution to equation (4). We let a = µ(LHS(r)), so we have [a] = [LHS(r)], and similarly b = µ(RHS(r)), with [b] = [RHS(r)]. Then we have

[a ∧ ¬b] = [a] ∩ [b]ᶜ = [a]

and

[¬a ∧ b] = [a]ᶜ ∩ [b] = [b],

as required. Thus, our ceteris paribus rule c is a ≻ b.
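To make the construction concrete, the following sketch (our own illustration, not the paper's code) builds a and b as the conjunctions of literals µ(LHS(r)) and µ(RHS(r)) over the rule's support features; the bit-string pattern encoding and the feature names are assumptions of this sketch.

```python
def rule_to_ceteris_paribus(lhs, rhs, names):
    """Reverse-translation sketch: from a feature vector rule LHS(r) > RHS(r),
    written as patterns over {'0', '1', '*'}, build formulae a = mu(LHS(r))
    and b = mu(RHS(r)) as conjunctions of literals over the support features.
    The resulting ceteris paribus rule is a > b."""
    def mu(pattern):
        # one literal per specified (non-'*') feature
        lits = [name if c == '1' else '¬' + name
                for c, name in zip(pattern, names) if c != '*']
        return ' ∧ '.join(lits)
    return mu(lhs), mu(rhs)
```

For instance, the rule 1000 ≻ 0111 over features f1, …, f4 yields a = f1 ∧ ¬f2 ∧ ¬f3 ∧ ¬f4 and b = ¬f1 ∧ f2 ∧ f3 ∧ f4.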
Lemma 2 (Ceteris paribus rule expressed by Lr(V)). Given a rule r in Lr(V), for each pair of models (m1, m2) |= r, there exists a ceteris paribus rule c = pc ≻ qc, or c = pc ≽ qc, with pc, qc in L, such that there exists an m such that m1 ∈ m[pc ∧ ¬qc], m2 ∈ m[¬pc ∧ qc].
Proof. The proof follows from the construction. We must show that for each pair (m1, m2) |= r, with r in Lr(V), our construction creates a rule c = pc ≻ qc, with pc, qc in L, such that there exists m such that m1 ∈ m[pc ∧ ¬qc], m2 ∈ m[¬pc ∧ qc]. The argument for c = pc ≽ qc is similar.
Let m1, m2 be models of L such that (α(m1), α(m2)) |= r. We know that α(m1) |= LHS(r) and α(m2) |= RHS(r). This implies that
m 1 is in [LHS(r )] and m 2 is in [RHS(r )]. Since we define a = µ(LHS(r )), [a] = [LHS(r )]
follows from the definition of µ. Similarly, [b] = [RHS(r )], so we have m 1 ∈ [a], m 2 ∈ [b].
m 1 ∈ [a ∧ ¬b] and m 2 ∈ [¬a ∧ b] follows from [a] = [a ∧ ¬b], [b] = [¬a ∧ b].
Now we show that m 1 , m 2 are the same outside the support of [a ∧ ¬b] and [¬a ∧ b],
respectively. Note that any two logically equivalent statements have the same support, which
follows from how we define support. Thus, s(LHS(r )) = s(a) = s(a ∧ ¬b), and s(RHS(r )) =
s(b) = s(¬a ∧ b). Note also, from the definition of the feature vector representation, that s(LHS(r)) = s(RHS(r)), as a consequence of the requirement that the support features of LHS(r) be the same as the support features of RHS(r). Thus s([a ∧ ¬b]) = s([¬a ∧ b]). We are given that (m1, m2) |= r, which implies that m1, m2 are the same outside the support of LHS(r) and RHS(r). Thus, m1, m2 are the same outside the support of [a ∧ ¬b]. This implies there exists
m such that m 1 ∈ m[a ∧ ¬b], m 2 ∈ m[¬a ∧ b], which completes the proof. We note that
m = m 1 or m = m 2 satisfies the condition on m.
2.5. Summary
We have shown translations from the feature vector representation to the ceteris paribus
representation, and a similar translation from the ceteris paribus representation to the feature
vector representation. Combining Lemmas 1 and 2 we can state the following theorem:
Theorem 1 (Expressive equivalence). Any preference over models representable in either the ceteris paribus representation or the feature vector representation can be expressed in the other representation.
Using this equivalence and the construction established in Section 2.3, we define an
extension of the translation function σ (defined in Definition 1) to sets of ceteris paribus
preference rules in L. This translation converts a set of ceteris paribus preference rules to
a set of feature vector representation statements. That is, σ : C → C ∗ , where C is a set of
ceteris paribus rules, and C ∗ is a set of rules in Lr (V). This translation is accomplished
exactly as described in Section 2.3.
Despite the demonstrated equivalence, the two preference representations are not equal
in every way. Clearly, the translation from the ceteris paribus representation can result in
the creation of more than one feature vector representation rule. This leads us to believe that
the language of the ceteris paribus representation is more complicated than the other. This
is naturally the case, since we can have complicated logical sentences in the language L in
our ceteris paribus rules. Thus, it can take many more rules in Lr(V) to represent a rule in L.
This gives us some further insight.
The length, in characters, of a ceteris paribus rule in the logical language L, is arbitrary.
The length of a rule in Lr (V) is always 2 ∗ |V|. We might also ask how quickly we can
tell if a model m of L(V) satisfies a statement r in Lr (V). This verification is essentially
a string parsing problem, where we check elements of m against elements of r . Thus, the
determination of satisfaction takes time proportional to the length of the formula.
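The linear-time satisfaction check can be sketched as follows (a hedged illustration assuming models are bit strings and rule sides are patterns over {'0', '1', '*'}; this encoding is ours, not the paper's):

```python
def satisfies(model, side):
    """m |= side: every specified (non-'*') position of the pattern must
    match the model; time is linear in the length of the formula."""
    return all(c == '*' or c == x for c, x in zip(side, model))

def pair_satisfies(m1, m2, lhs, rhs):
    """(m1, m2) |= r: m1 matches LHS(r), m2 matches RHS(r), and the two
    models agree on every feature outside the rule's support."""
    return (satisfies(m1, lhs) and satisfies(m2, rhs)
            and all(x == y for x, y, c in zip(m1, m2, lhs) if c == '*'))
```

Both checks are single passes over strings of length |V|, matching the complexity claim above.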
3. SOME SIMPLE ORDINAL UTILITY FUNCTIONS
In this section, we illustrate simple utility functions, u : M(V) → R consistent with
an input set of feature vector representation preferences C ∗ where each rule r ∈ C ∗ is a
statement in Lr (V). It is possible to combine this with the translation defined in the previous
section. Thus, we may take as input a set of ceteris paribus preference rules C, and then let
C ∗ = σ (C). Similarly, one can use these functions u to define utility functions over models in
L by composition with the model-projection mapping α. Specifically, one finds the utility of
a model m ∈ M by computing the projection α(m) ∈ M(V) and using one of the functions
defined in the following.
We consider a model graph, G(C ∗ ), a directed graph which will represent preferences
expressed in C ∗ . Each node in the graph represents one of the possible models over the features
F(C). The graph G(C∗) always has exactly 2^|F(C)| nodes. The graph has two different kinds
of directed edges. Each directed edge in G(C ∗ ), es (m 1 , m 2 ) from source m 1 to sink m 2 ,
exists if and only if (m 1 , m 2 ) |= r for some strict preference rule r ∈ C ∗ . Similarly, an edge
ew (m 1 , m 2 ) from source m 1 to sink m 2 , exists if and only if (m 1 , m 2 ) |= r for some weak
preference rule r ∈ C ∗ . Each edge, es and ew , is an explicit representation of a preference
for the source over the sink. It is possible to consider each rule r and consider each pair of
models (m 1 , m 2 ) |= r , and create an edge e for each such pair.
After this process is completed, we can determine if mi is preferred to mj according to C∗ by looking for a path from mi to mj in G(C∗) and a path from mj to mi. If both paths exist, and are composed entirely of weak edges ew, then we can conclude that mi ∼ mj. If only a path exists from mi to mj, or a path exists using at least one strict edge, then we can conclude that mi ≻ mj according to C∗. Similarly, if only a path exists from mj to mi, or one using at least one strict edge, then we can conclude that mj ≻ mi according to C∗. If neither path exists, then a preference has not been specified between mi and mj in C∗. Cycles in the model graph are consistent if all the edges in the cycle are weak edges (ew). In this case, all the models in the cycle are similarly preferred to each other, and should be assigned the same utility. If there are cycles using strict edges (es), this would indicate that some model is preferred to itself according to C∗, which we consider inconsistent. Thus, it is possible to construct the graph, and then look for cycles, with the presence of strict-cycles indicating that the input preferences are inconsistent.
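A minimal sketch of this construction and query procedure (our own encoding, not the paper's implementation: models are bit strings, rules are (LHS, RHS) pattern pairs; since the two models of a satisfying pair agree outside the rule's support, the sink of each edge is determined by its source):

```python
from itertools import product

def build_model_graph(n, strict_rules, weak_rules):
    """Build the model graph G(C*) over all 2^n models. An edge (m1, m2)
    exists iff (m1, m2) |= r for some rule r = (LHS, RHS): m1 matches LHS
    on the support, and m2 is m1 with the support features set to the RHS
    values, so the two models agree everywhere else."""
    models = [''.join(bits) for bits in product('01', repeat=n)]
    def edges(rules):
        es = set()
        for lhs, rhs in rules:
            for m1 in models:
                if all(c == '*' or c == x for c, x in zip(lhs, m1)):
                    m2 = ''.join(x if c == '*' else c for c, x in zip(rhs, m1))
                    es.add((m1, m2))
        return es
    return models, edges(strict_rules), edges(weak_rules)

def _reach(src, dst, edge_set):
    """DFS reachability over a set of directed edges."""
    adj = {}
    for u, v in edge_set:
        adj.setdefault(u, []).append(v)
    stack, seen = [src], {src}
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False

def compare(m_i, m_j, strict_edges, weak_edges):
    """Compare two models as described above: paths both ways using only
    weak edges give indifference; a one-way path gives strict preference;
    paths both ways involving a strict edge signal inconsistent input."""
    both = strict_edges | weak_edges
    fwd, bwd = _reach(m_i, m_j, both), _reach(m_j, m_i, both)
    if fwd and bwd:
        if _reach(m_i, m_j, weak_edges) and _reach(m_j, m_i, weak_edges):
            return '~'
        return 'inconsistent'
    return '>' if fwd else ('<' if bwd else 'unspecified')
```

For example, with one strict rule 1∗ ≻ 0∗ and a pair of opposite weak rules ∗1 ≽ ∗0 and ∗0 ≽ ∗1 (expressing indifference on the second feature), models 11 and 10 lie on a weak cycle and compare as indifferent, while 11 is strictly preferred to 00.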
3.1. Graphical Utility Functions
In this section, we will define four different graphical utility functions (GUFs). Each GUF
will use the same graph, a model graph G(C ∗ ), but will define different measures of utility
using this graph. We will define GUFs that compute the utility of a model m by checking the length of the longest path originating at m, the number of descendants (or ancestors) of m in G(C∗), the length of the longest path originating elsewhere and ending at m, or the rank of m in the topological-sorted order of G(C∗).
We present a summary of each GUF in Table 2.
Definition 4 (Minimizing graphical utility function). Given the input set of ceteris paribus
preferences C ∗ , and the corresponding model graph G(C ∗ ), the Minimizing GUF is the
utility function uM where uM (m) is equal to the number of unique nodes on the longest path,
including cycles, originating from m in G(C ∗ ).
The intuition is that in a graph, the distance between nodes, measured in unique nodes, can indicate their relative utilities. Note that if neither m1 ≻ m2 nor m2 ≻ m1 in C∗, then we do not
TABLE 2. Four Different Ordinal Graphical Utility Functions (GUFs)

Function name     Symbol   Method
Minimizing GUF    uM       Longest path from m
Descendant GUF    uD       Number of descendants of m
Maximizing GUF    uX       Longest path ending at m
Topological GUF   uT       Rank of m in topological-sorted order
require u(m1) = u(m2). Only when m1 ≽ m2 and m2 ≽ m1 in C∗ do we require u(m1) = u(m2).
This implies members of a cycle should receive the same utility. However, in a graph with cycles the concept of longest path needs more discussion. Clearly, several infinitely long paths can be generated by looping forever on different cycles. We mean the longest nonbacktracking path wherein all nodes on any cycle the path intersects are included on the path. It is important that each member of a cycle be counted once among the "unique nodes" on this longest path, since this assures that a set of nodes on the same cycle will receive the same utility. For example, suppose the rules C∗ are such that only the following relations are implied over models:
m1 ≽ m2
m2 ≽ m3
m3 ≻ m4
m3 ≽ m5
m5 ≽ m2
m4 ≻ m6.
(5)
It is clear that m 2 , m 3 , m 5 form a cycle (see Figure 1). The longest path from m 1 clearly
goes through this cycle, in fact this path visits nodes m 1 , m 2 , m 3 , m 4 , m 6 . However, since
this path intersects the cycle {m 2 , m 3 , m 5 }, we add all nodes on the cycle to the path, in
this case only the node m 5 . Thus, the “longest path” we are interested in from m 1 passes
FIGURE 1. Model graph for preferences in equation (5).
FIGURE 2. Model graph for preferences in equation (6).
through all six nodes, and we have uM(m1) = 6. Similarly, the other nodes receive utility as
follows:
u M (m 1 ) = 6
u M (m 2 ) = 5
u M (m 3 ) = 5
u M (m 4 ) = 2
u M (m 5 ) = 5
u M (m 6 ) = 1.
Consider the relation between unrelated models provided by the minimizing GUF. Our different graphical utility functions define different relationships for such models; in fact, the relationship is somewhat arbitrary. Consider the following example. Suppose the rules in C∗ are such that only the following pairs of models satisfy any rule in C∗:
m1 ≻ m2
m3 ≻ m4
m4 ≻ m5
m4 ≻ m6.
(6)
The relationship between several pairs of models (see Figure 2), for example m1 and m3, is unspecified. A utility function is free to order these two any way convenient, and the utility function will still be consistent with the input preferences. The minimizing GUF gives the following values to the models:
following values to the models:
u M (m 1 ) = 2
u M (m 2 ) = 1
u M (m 3 ) = 3
u M (m 4 ) = 2
u M (m 5 ) = 1
u M (m 6 ) = 1.
An interesting property of this utility function is that it is minimizing, that is, it assigns minimal
utility to models that are not explicitly preferred to other models according to the preference
set. Thus, suppose there exists an m 7 in the above domain, about which no preferences are
specified. This model will receive minimal utility (1) from the utility function.
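One way to realize the minimizing GUF's treatment of cycles is to collapse each cycle (strongly connected component) into a single weighted node and take heaviest paths in the resulting DAG. This is our own sketch, not the authors' implementation; it treats strict and weak edges uniformly, assuming the input is consistent so that cycles contain only weak edges.

```python
from collections import defaultdict

def sccs(nodes, edges):
    """Kosaraju's algorithm: map each node to its strongly connected
    component id, and group component ids to their member sets."""
    adj, radj = defaultdict(list), defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)
    seen, order = set(), []
    def dfs1(u):
        seen.add(u)
        for v in adj[u]:
            if v not in seen:
                dfs1(v)
        order.append(u)              # record finish order
    for node in nodes:
        if node not in seen:
            dfs1(node)
    comp = {}
    def dfs2(u, c):
        comp[u] = c
        for v in radj[u]:
            if v not in comp:
                dfs2(v, c)
    c = 0
    for node in reversed(order):     # decreasing finish time, reversed graph
        if node not in comp:
            dfs2(node, c)
            c += 1
    groups = defaultdict(set)
    for node, ci in comp.items():
        groups[ci].add(node)
    return comp, groups

def minimizing_guf(nodes, edges):
    """u_M(m): unique nodes on the longest path from m, where every node of
    any cycle the path meets is counted once (each cycle collapses to one
    weighted node of the condensation DAG)."""
    comp, groups = sccs(nodes, edges)
    dag = defaultdict(set)
    for u, v in edges:
        if comp[u] != comp[v]:
            dag[comp[u]].add(comp[v])
    memo = {}
    def heaviest(c):                 # total weight of heaviest path from c
        if c not in memo:
            memo[c] = len(groups[c]) + max((heaviest(d) for d in dag[c]),
                                           default=0)
        return memo[c]
    return {m: heaviest(comp[m]) for m in nodes}
```

On the preferences of equation (5), where m2, m3, m5 form a cycle, this reproduces the utilities listed above.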
Definition 5 (Descendant graphical utility function). Given the input set of ceteris paribus
preferences C ∗ , and the corresponding model graph G(C ∗ ), the Descendant GUF is the
utility function uD where uD(m) is equal to the total number of unique nodes on any path originating from m in G(C∗).
Definition 6 (Maximizing graphical utility function). Given the input set of ceteris paribus
preferences C ∗ , and the corresponding model graph G(C ∗ ), we let max(G(C ∗ )) be the length
of the longest path in G(C ∗ ). The maximizing GUF is the utility function u X where u X (m)
is equal to max(G(C ∗ )) minus the number of unique nodes on the longest path originating at
any node other than m and ending at m in G(C ∗ ).
Definition 7 (Topological sort graphical utility function). Given the input set of strict ceteris paribus preferences C∗, and the corresponding model graph G(C∗), let n = 2^|F(C)| be the number of nodes in G(C∗). The topological sort GUF is the utility function uT where uT(m) is equal to n minus the rank of m in the topological-sorted order of G(C∗).
Each of the above utility functions is ordinal. Each function except the topological sort GUF handles both weak and strict preferences. Our maximizing GUF must make use of the same technique for paths intersecting a cycle that we use for the minimizing GUF. The descendant function has no such difficulty with cycles, but note that each node counts as one of its own descendants.
Each function behaves differently toward models not mentioned by the preferences. We have already discussed the "minimizing" property, where a utility function gives very low or zero utility to models not mentioned by the preferences. These are models that are neither preferred to other models nor have other models preferred to them. We therefore have no information about them, and, as we have mentioned, are free to order them as convenient while preserving the ordinal property of the utility function. In contrast to "minimizing," we call a function "maximizing" when it assigns high utility to models about which we have no information.
Consider the example given in equation (6). Under the descendant GUF (Definition 5), we get the following utilities for m1, . . . , m7:
u D (m 1 ) = 2
u D (m 2 ) = 1
u D (m 3 ) = 4
u D (m 4 ) = 3
u D (m 5 ) = 1
u D (m 6 ) = 1
u D (m 7 ) = 1.
We give slightly higher utility to m3, m4 than under Definition 4, since the former counts all descendants, while the latter counts only the longest path.
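The descendant GUF is a simple reachability count; a minimal sketch (our own, again over an explicit node/edge encoding rather than the paper's):

```python
def descendant_guf(nodes, edges):
    """Descendant GUF (Definition 5): u_D(m) = number of unique nodes
    reachable from m, counting m as one of its own descendants."""
    adj = {m: [] for m in nodes}
    for u, v in edges:
        adj[u].append(v)
    def reachable(m):
        stack, seen = [m], {m}
        while stack:
            for v in adj[stack.pop()]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen
    return {m: len(reachable(m)) for m in nodes}
```

On the preferences of equation (6), with the unmentioned model m7 included, this reproduces the values just listed.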
The maximizing graphical utility function, Definition 6, gives the following values:
u X (m 1 ) = 3
u X (m 2 ) = 2
u X (m 3 ) = 3
u X (m 4 ) = 2
u X (m 5 ) = 1
u X (m 6 ) = 1
u X (m 7 ) = 3.
For some applications, it can be useful to find groups of models for which no model is clearly better; these models are called nondominated, or Pareto-optimal. The maximizing GUF assigns each such nondominated model maximal utility. Thus, it assigns uX(m7) = 3. In some sense, this gives the function a speculative characteristic, since it is willing to assign high utility to models which are unmentioned in the input preferences.
Topological sort gives different answers depending on how it is implemented. However,
each model gets a unique utility value. It could give the following utility values for the
example:
u T (m 1 ) = 7
u T (m 2 ) = 6
u T (m 3 ) = 5
u T (m 4 ) = 4
u T (m 5 ) = 3
u T (m 6 ) = 2
u T (m 7 ) = 1.
This is a good example of how the utility of models can vary hugely under different utility
functions, while each utility function is consistent with C ∗ .
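One possible implementation of the topological sort GUF uses Kahn's algorithm; this is our own sketch, assuming all edges are strict, and since the exact values depend on how ties are broken, only the ordinal relationships are guaranteed.

```python
from collections import defaultdict, deque

def topological_guf(nodes, edges):
    """Topological-sort GUF: u_T(m) = n minus the rank of m in one
    topological order (Kahn's algorithm). A leftover node indicates a
    strict cycle, i.e. inconsistent input preferences."""
    adj = defaultdict(list)
    indeg = {m: 0 for m in nodes}
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    queue = deque(m for m in nodes if indeg[m] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("strict cycle: inconsistent preferences")
    n = len(nodes)
    return {m: n - rank for rank, m in enumerate(order)}
```

On the example of equation (6), any run assigns the seven models the distinct values 1 through 7 while respecting every stated preference, though the particular assignment varies with the tie-breaking order, as noted above.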
Now that we have discussed some of the properties and behaviors of each of the GUFs
defined above, we prove their consistency.
Theorem 2 (Consistency of minimizing GUF). Given the input set of ceteris paribus preferences C ∗ , and the corresponding model graph G(C ∗ ), the utility function that assigns u M (m)
equal to the number of unique nodes on the longest path originating from m in G(C ∗ ) is
consistent with C ∗ .
Proof. Let uM(m) be equal to the number of unique nodes on the longest path originating from m in G(C∗). For uM to be consistent with C∗, we require p(uM) ∈ [C∗]. Specifically, this requires that uM(m1) ≥ uM(m2) whenever m1 ≽ m2 according to C∗, and uM(m1) > uM(m2) whenever m1 ≻ m2 according to C∗. Choose a pair m1, m2 such that m1 ≻ m2 according to C∗. By construction of G(C∗), there exists an edge from m1 to m2 in G(C∗). Thus, uM(m1) > uM(m2), because there exists a path from m1 that contains m2, and therefore contains the longest path from m2, plus at least one node, namely the node m1. If m1 ≽ m2, then it is possible that there is both a path from m1 to m2 and a path from m2 to m1. In this case, these two paths define a cycle containing both m1 and m2. Since m1 and m2 lie on the same cycle, uM(m1) = uM(m2), since any longest path accessible from one model is also accessible from the other.
Theorem 3 (Consistency of descendant GUF). Given the input set of ceteris paribus preferences C∗, and the corresponding model graph G(C∗), the utility function that assigns uD(m) equal to the total number of unique nodes on any path originating from m in G(C∗) is consistent with C∗.
Proof. Following the proof of Theorem 2, if we know that m1 ≻ m2 according to C∗, then by construction of G(C∗), there exists an edge from m1 to m2 in G(C∗). Since m2 is on a path from m1, m1 has at least one more descendant than m2, namely, m1 itself. Therefore uD(m1) > uD(m2). If it is a weak edge from m1 to m2, then there might also be a path from m2 to m1, in which case both models have the same set of descendants, and uD(m1) = uD(m2).
Theorem 4 (Consistency of maximizing GUF). Given the input set of ceteris paribus preferences C ∗ , and the corresponding model graph G(C ∗ ), we let max(G(C ∗ )) be the length
of the longest path in G(C ∗ ). The utility function that assigns u X (m) equal to max(G(C ∗ ))
minus the number of unique nodes on the longest path originating at any node other than m
and ending at m in G(C ∗ ) is consistent with C ∗ .
Proof. Omitted.
Theorem 5 (Consistency of topological sort GUF). Given the input set of strict ceteris
paribus preferences C∗, and the corresponding model graph G(C∗), let n = 2^|F(C)| be the
number of nodes in G(C ∗ ). The utility function that assigns u T (m) equal to n minus the rank
of m in the topological-sorted order of G(C ∗ ) is consistent with C ∗ .
Proof. Omitted.
3.2. Complexity
The utility functions outlined in the previous section, while conceptually simple, have
worst-case complexity exponential in the number of relevant features N = |F(C)|.
As noted earlier, the model graph G(C′) has 2^N nodes, but this exponential size does not in
itself imply exponential cost in computing utility functions because the utility functions derive
from graph edges rather than graph nodes. The descendant utility function u D , for example,
requires counting the number of descendants of nodes, a number at worst linear in the number
of edges. The other utility functions measure the number of ancestors, or the longest path
from or to a node. Clearly counting the number of ancestors is the same computational burden
as counting the number of descendants. Computing the longest path originating at a node
and ending elsewhere has the same bound, since searching all descendants can determine
the longest path. Accordingly, the number of edges in the model graph provides a basic
complexity measure for these utility computations.
In fact, a simple and familiar example shows that the model graph can contain a number of edges exponential in the size of F(C). Suppose, for instance, that F(C) consists of four features and that the derived intermediate preference rules C′ consist of those displayed in Table 3. These rules order all models lexicographically in a complete linear order, the same ordering we give models if we interpret them as binary representations of the integers from 0 to 15. The longest path through G(C′) has length 2^|F(C)|, so the number of edges is exponential in |C′| = |F(C)|. One should note that this example does not imply utility dependence among the features, but it does imply that the preferences over some features dominate the preferences over other features. Moreover, the example does not show that derivation of a utility function must take exponential time, because lexicographic utility functions can be expressed in much simpler ways than counting path length. The true complexity of this problem remains an open question.
In fact, one can trade computation cost between construction and evaluation of the utility
function. The evaluation of specific utility values can be reduced by significant preprocessing
in the function-construction stage. Clearly the utility value of m ∈ M(V) could be cached
TABLE 3. Lexicographic Preference Rules

∗∗∗1 ≻ ∗∗∗0
∗∗10 ≻ ∗∗01
∗100 ≻ ∗011
1000 ≻ 0111
at the corresponding node in G(C′), using, for example, Dijkstra's all-paths algorithm. Alternatively, values for a smaller number of nodes might be cached. The computation of a utility value could then proceed by traversing G(C′) until each branch of the search from the starting node reaches a node with a cached utility value, and then computing the desired node value from these cached values and the graph portion traversed in reaching them.
For instance, one might compute a utility function akin to the descendant utility function
by keeping a static set of nodes with cached utility values. If one finds k separate branches
of the search starting from a node m and terminating in k nodes with cached values u(m i )
for i = 1, . . . , k, and traverses a total of t nodes along these branches, we assign
u(m) = 1 + t + Σ_{i=1}^{k} u(m_i).
This value can differ from the descendant utility value because two cached nodes might share some descendants. Reducing evaluation cost from exponential levels can be achieved by cutting the number of nodes searched before reaching cached nodes to log(|G(C′)|), but as the lexicographic example shows, this can require caching |G(C′)|/log|G(C′)| nodes, implying an exponential level of preprocessing cost.
Alternatively, one might calculate utility values by repetitively computing G(C′). One might do this by searching for a path from a model m0 by finding rules r ∈ C′ such that (m0, m1) |= r, where m1 is arbitrary. The proof of Theorem 2 implies that an edge e(m0, m1) in G(C′) exists if and only if (m0, m1) |= r for some r ∈ C′. Thus, searching the list of rules in C′ for a pair (m0, m1) |= r for some r ∈ C′ is equivalent to following an edge e(m0, m1) in G(C′). Then one recursively looks for rules r such that (m1, m2) |= r, and then (m2, m3), and so on, such that the search becomes a traversal of the graph G(C′). Each branch of the search terminates when (mk−1, mk) |= r for some rule r ∈ C′, but (mk, mk+1) ⊭ s for all rules s in C′ and all mk+1. We then know that there exists a path from m0 with length k; if k is the length of the longest such path, one can then assign u(m0) = k.
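A sketch of this rule-driven traversal (our encoding again: models as bit strings, rules as pattern pairs; because models related by a rule agree outside its support, each rule yields at most one successor per model, and we assume strict, acyclic preferences so the recursion terminates):

```python
def successors(m, rules):
    """All m1 with (m, m1) |= r for some rule r: m matches LHS(r) on the
    support, and m1 takes the RHS(r) values there, copying m elsewhere."""
    out = set()
    for lhs, rhs in rules:
        if all(c == '*' or c == x for c, x in zip(lhs, m)):
            out.add(''.join(x if c == '*' else c for c, x in zip(rhs, m)))
    return out

def longest_path_utility(m, rules, memo=None):
    """u(m) = length (in edges) of the longest rule-path from m, computed
    by recursive traversal with memoization; no explicit graph is built."""
    if memo is None:
        memo = {}
    if m not in memo:
        memo[m] = max((1 + longest_path_utility(s, rules, memo)
                       for s in successors(m, rules)), default=0)
    return memo[m]
```

On the lexicographic rules of Table 3, each rule application is a binary decrement, so this assigns every model its integer value, e.g. u(1111) = 15 and u(0000) = 0.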
In the following sections, we present a number of heuristics using the concept of utility independence (UI) that reduce the complexity of evaluating u(m).
4. GENERALIZED ADDITIVE INDEPENDENCE
Utility Independence is a property that obtains when the contribution to utility of some
features can be determined without knowledge of the values assigned to other features. More
precisely, if a set of features Si is utility independent of a disjoint set of features S j , then the
utility given to Si does not depend on the values assumed by the features of S j . We give some
general background regarding UI presently. In the following subsections, we demonstrate
how UI expedites our calculation of utility from ceteris paribus preferences. We give a more
general treatment here than was given in McGeachie and Doyle (2002). For these discussions,
we assume that C ∗ = σ (C), for a set C of strict ceteris paribus preference rules. We write Si ,
for 1 ≤ i ≤ k, to denote subsets of the feature set F(C) such that each Si is utility independent
of the complementary set of features F(C)\Si .
For a given set Si, define a partial utility function ûi for features Si that assigns a utility to all models m of features Si. For a given set Si, and given that we have a function ûi defined, we will call a subutility function a function ui : M(V) → R such that ∀m, ui(m) = ûi(m↾Si). We will write ui(·) for the subutility function for Si.
We use the logical connective "∧" between models, m1 ∧ m2, as shorthand for µ(m1) ∪ µ(m2). That is, m1 ∧ m2 is a model of the combined support features of m1 and m2. To talk
about properties of the values assigned by m1 over the set S1 and the values assigned by m2 to S2, we would discuss m1↾S1 ∧ m2↾S2.
4.1. Definition
A set of features Si is UI of a disjoint set of features Sj with F(C) = (Si ∪ Sj), if and only if, for all m1↾Sj, m2↾Si, m3↾Si, and m4↾Sj,

((m2↾Si ∧ m1↾Sj) ≻ (m3↾Si ∧ m1↾Sj)) ⇒ ((m2↾Si ∧ m4↾Sj) ≻ (m3↾Si ∧ m4↾Sj)).
(7)
We call this UI; Keeney and Raiffa (1976) call it preferential independence, and use a different
notation.
Note that utility dependence is not symmetric. It is possible for a set of features Si to be
utility dependent upon a disjoint set of features S j , while S j is independent of Si .
Since UI generally simplifies the structure of the corresponding utility functions, we
assume that features are utility independent whenever there is no evidence to the contrary.
Recall that our aim is to find a utility function u such that p(u) ∈ [C] for a set of ceteris paribus preferences C. For example, suppose C = ∅; then [C] is the set of all preorders over F, and we are free to choose any utility function u over F, particularly one in which all features are utility independent of all other features.
In the following, we will try to establish a partition S of F(C), where for each Si ∈ S,
we find a set Ti ⊆ F(C) such that Si is utility dependent only upon values of features in Ti .
We will show that such a decomposition allows us to find a utility function u of the form
u(m) = Σ_{i=1}^{Q} t_i u_i(m),
(8)

where each ui is a partial utility function for the features Ti ∪ Si, and each ti is a positive constant.
4.2. Methods
The condition for UI in implication (7) is stronger than we require. Instead of satisfying
implication (7), we would rather assume that all features are utility independent of all other
features and then look for cases where this assumption is explicitly controverted by the
preferences C. With this approach in mind, we investigate how to determine when two
feature sets must be utility dependent.
One method of establishing utility dependence of features in Si on features in Sj is to identify models m1, m2, m3, m4 in M(V) and rules r1, r2 such that implication (7) does not hold. The correspondence is analogous to implication (7): we want all of the following conditions to hold:

(m2↾Si ∧ m1↾Sj) |= LHS(r1),
(m3↾Si ∧ m1↾Sj) |= RHS(r1),
(m2↾Si ∧ m4↾Sj) |= RHS(r2),
(m3↾Si ∧ m4↾Sj) |= LHS(r2).
(9)
Although conditions (9) are conditions upon the preorders in [C], in some cases such
m 1 , m 2 , m 3 , m 4 , and Si , S j can be read from the support feature sets of a pair of rules in C ∗ .
Specifically, since S j is a subset of the support features of r1 , and m 1 restricted to S j satisfies
both LHS(r1 ) restricted to S j and RHS(r1 ) restricted to S j , we can look for rules of this form.
Lexically, these rules are easy to recognize: some of the features specified in the rule have the same value on the left- and right-hand sides of the rule. Then we look for another rule, r2, with the same property: there is a set of features S′j that is a subset of the support features of r2, and there is an m4 that satisfies both LHS(r2) restricted to S′j and RHS(r2) restricted to S′j, with the additional restriction that S′j is a subset of Sj. Again, we are looking for a rule that specifies the same values for the same features on the left- and right-hand sides, but these features must be a subset of those features we found for r1. If the previous conditions hold, we have found suitable m1, m4, and Sj satisfying conditions (7) and (9).
We then check that r1, r2 are such that an Si can be found that is disjoint from Sj and a subset of the support features of both r1 and r2, and that an m2 can be found that satisfies both LHS(r1) restricted to Si and RHS(r2) restricted to Si. Here, we are looking for a preference over models restricted to Si that switches with the values assigned to Sj. Again, this is easy to check for, by doing lexical comparisons on the left- and right-hand sides of the support feature sets of rules. If all the preceding holds for some Si, Sj, then Si is utility dependent on Sj. We are assured that some feature in Si is utility dependent on some feature in Sj, since condition (7) is violated. Accordingly, we assume that Si is utility dependent on Sj.
Consider an example. Let V = {f1, f2, f3, f4, f5} and

r1 = ∗0001 ≻ ∗1101,
(10)

r2 = ∗∗100 ≻ ∗∗000
(11)

be rules in Lr(V). We let Sj be the features of the intersection of µ(LHS(r1)) and µ(RHS(r1)). Thus we let Sj = {f4, f5}, and m1↾Sj = 01. This satisfies our condition on m1: that m1 |= LHS(r1)↾Sj and m1 |= RHS(r1)↾Sj. Then we consider r2. Similarly, we consider the features of the intersection of µ(LHS(r2)) and µ(RHS(r2)), which is {f4, f5}. Since this happens to be the same as Sj, we let S′j = Sj. Then m4↾S′j = 00, and we indeed have m1↾Sj ≠ m4↾S′j.
Next, to choose Si, we check whether there are features shared between µ(RHS(r1)) ∩ µ(LHS(r2)) and µ(RHS(r2)) ∩ µ(LHS(r1)). Since Si must be disjoint from Sj, we exclude features in Sj from this check. We see that f3 is shared in these two sets, thus we let Si = {f3}. Then m2↾Si = 0, m3↾Si = 1, and we verify that m2↾Si ≠ m3↾Si. This process constitutes evidence that Si, Sj, m1, m2, m3, m4 cannot satisfy implication (7), and therefore that Si is utility dependent on Sj.
We now define a function, CheckDependence, that takes as input two rules r1, r2 ∈ Lr(V) and gives as output two possibly empty sets of features Sd, Su ⊆ F(C) such that the features Sd are known to be utility dependent on the features Su.
Definition 8 (CheckDependence). Given as input two rules r1, r2 ∈ Lr(V), output two possibly empty sets of features Sd, Su ⊆ F(C) such that features Sd are known to be utility dependent on features Su.
Pseudocode listing of function CheckDependence:
1. ssame1 ← { f | f(s(LHS(r1))) = f(s(RHS(r1))) }
2. ssame2 ← { f | f(s(LHS(r2))) = f(s(RHS(r2))) }
3. Su ← ssame1 ∩ ssame2
4. scross1 ← { f | f(s(LHS(r1))) = f(s(RHS(r2))) }
5. scross2 ← { f | f(s(RHS(r1))) = f(s(LHS(r2))) }
6. sintersect ← scross1 ∩ scross2
7. Sd ← { f ∈ sintersect | f(LHS(r1)) ≠ f(RHS(r1)) }
In the above definition Sd is indirectly related to Su since they are both functions of the
specified features of r1 and r2 . The first variable, ssame1 , holds the support features that are the
are assigned the same value on the LHS and RHS of r1 . ssame2 performs the same calculation
for r2 . Su is the intersection of these two. Thus, Su is a set of features that have the same value
on the LHS and RHS of r1 , and on r2 . The variable scross1 holds the support features which
are assigned the same value on the LHS of r1 and the RHS of r2 . scross2 holds the support
features which are assigned the same value on the RHS of r1 and the LHS of r2 . sintersect holds
the intersection of scross1 and scross2 . Sd is the subset of sintersect that switches values from one
rule to the other. Thus Sd is the set of features over which r1 and r2 express a preference
over models that switches depending on the value of the features in Su . If Sd and Su are both
nonempty, we discover that features Sd are utility dependent on features Su .
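The listing above can be sketched in Python. Here a rule is a hypothetical (lhs, rhs) pair of pattern strings over {'0', '1', '*'}, with both sides of a rule assumed to specify the same features; the representation and names are our own, not the authors'.

```python
def support(rule):
    """Indices of features given a definite value (not '*') by the rule."""
    lhs, rhs = rule
    return {i for i in range(len(lhs)) if lhs[i] != '*' and rhs[i] != '*'}

def check_dependence(r1, r2):
    """Sketch of CheckDependence (Definition 8).

    Returns (Sd, Su): features Sd found utility dependent on features Su.
    A dependence is reported only when both returned sets are nonempty."""
    l1, h1 = r1
    l2, h2 = r2
    s1, s2 = support(r1), support(r2)
    common = s1 & s2
    # Support features holding the same value on both sides of each rule.
    ssame1 = {f for f in s1 if l1[f] == h1[f]}
    ssame2 = {f for f in s2 if l2[f] == h2[f]}
    su = ssame1 & ssame2
    # Support features agreeing across the two rules
    # (LHS of one against RHS of the other).
    scross1 = {f for f in common if l1[f] == h2[f]}
    scross2 = {f for f in common if h1[f] == l2[f]}
    sintersect = scross1 & scross2
    # Dependent features switch values from one side of r1 to the other.
    sd = {f for f in sintersect if l1[f] != h1[f]}
    return sd, su
```

For example, on the rules 110 ≻ 111 and 001 ≻ 000 (which reappear later in this section), the sketch reports f3 (index 2) dependent on {f1, f2} (indices {0, 1}).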
Using function CheckDependence, we can compute a partition S of F(C) into utility-independent feature sets and an associated set of feature sets T such that each feature set
Si ∈ S is utility dependent upon a (possibly empty) feature set Ti ∈ T . The main idea is that
sets of features Si , S j ⊆ F(C), where utility of Si depends on S j can be detected by doing
preprocessing on C ∗ , by simply running CheckDependence on each pair of rules to see if a
preference for one feature changes with the value assigned to another feature. If this is the
case, we note that Si is utility dependent on S j , by adding S j to the set Ti .
We define a function UIDecomposition that takes as input a set of features F(C) and
set of rules C ∗ in Lr (V), and outputs two sets of feature sets, S, T such that for each set of
features Si ∈ S, features Si are known to be utility dependent on features Ti ∈ T .
Definition 9 (Utility independence decomposition). Takes input a set of features F(C) and
set of rules C ∗ in Lr (V), and outputs two sets of feature sets, S, T .
1. Initialize S to a set of singleton sets, where each set contains exactly one feature from
F(C). That is, we have S1 = { f 1 }, S2 = { f 2 }, . . . , S N = { f N }.
2. Initialize a set T to a set of empty sets, that is, we have T1 = {}, T2 = {}, . . . , TN = {}.
3. For each pair of rules r1, r2 ∈ C∗, run function CheckDependence on r1, r2. If CheckDependence returns some nonempty set of features F utility dependent on some nonempty
set of features F′, then for each fi ∈ F, if fi ∈ Si, add the features F′ to the set Ti.
4. For each pair of sets Ti , T j ∈ T , if either Ti ⊆ T j or T j ⊆ Ti , then merge sets Si and S j ,
and merge sets Ti and T j . Merging sets Si and S j results in replacing the value of Si with
Si ∪ S j and setting the value of S j to the empty set; similarly for merging Ti , T j .
5. Output sets S and T .
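Steps 1 through 5 of this definition can be sketched as follows. Rather than re-deriving dependencies, the sketch takes them precomputed as (F, F_on) pairs, as CheckDependence would report over all pairs of rules; in step 4 we merge only cells with nonempty dependence sets, our reading of the definition that keeps unrelated cells (whose Ti are empty) from being merged trivially. Names are ours.

```python
def ui_decomposition(features, dependencies):
    """Sketch of UIDecomposition (Definition 9). `features` lists feature
    names; `dependencies` is a list of (F, F_on) pairs meaning "features F
    are utility dependent on features F_on". Returns (S, T)."""
    S = [{f} for f in features]        # step 1: singleton partition
    T = [set() for _ in features]      # step 2: empty dependence sets
    # Step 3: record each discovered dependence against the cell(s) of S
    # containing the dependent features.
    for F, F_on in dependencies:
        for i in range(len(S)):
            if S[i] & F:
                T[i] |= F_on
    # Step 4: merge cells whose (nonempty) dependence sets are nested.
    merged = True
    while merged:
        merged = False
        for i in range(len(S)):
            for j in range(i + 1, len(S)):
                if S[i] and S[j] and T[i] and T[j] and (T[i] <= T[j] or T[j] <= T[i]):
                    S[i] |= S[j]
                    T[i] |= T[j]
                    S[j], T[j] = set(), set()
                    merged = True
    # Step 5: output the surviving (nonempty) cells.
    pairs = [(s, t) for s, t in zip(S, T) if s]
    return [s for s, _ in pairs], [t for _, t in pairs]
```

A dependence of {f1, f2} on {f3} gives both singleton cells the same dependence set, so they merge into one cell {f1, f2} with T = {f3}.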
The function UIDecomposition is fast and guaranteed to work when the preferences are
of a restricted form. If the input preferences C∗ are such that they contain no ∗ letters and
such that if m1 ≻C∗ m2 then there exists r ∈ C∗ such that (m1, m2) |= r, then UIDecomposition
will find all entailed utility dependence between features. We state as much in the following
theorem.
Theorem 6 (Independence construction). If the input preferences C∗ are such that they
contain no ∗ letters and such that if m1 ≻C∗ m2 then there exists r ∈ C∗ such that (m1, m2) |= r,
then the UIDecomposition function computes a partition S of the features F(C) such that
each set Si ∈ S is utility independent of the features in F(C)\{Si ∪ Ti}, and further, for each
Si, no set Ti′ ⊂ Ti exists such that Si is utility independent of F(C)\Ti′.
Proof. First note that it is clear that S is a partition because the algorithm starts out with
a partition, and a merge of two sets Si , S j ∈ S is the only operation performed on S, and
it is an operation which preserves the property of being a partition. Further, it is clear that
Si is independent of F\Ti, because the algorithm adds all features upon which Si is utility
dependent to the set Ti.
Now we show that for each Si, no set Ti′ ⊂ Ti exists such that Si is utility independent
of F(C)\Ti′. First suppose Ti′ is such a set of features. Since the set Ti is the result of our
method, and Si ∈ S, we know that there were some rules in C that caused Fi ⊂ Ti to be added
to set Ti. Without loss of generality, assume a ⊆ Si was shown to be utility dependent on Fi.
Thus, there is some pair of rules r1, r2, and some models m1, m2, m3, m4 such that equations
(9) are satisfied. That is, we have:

(m2 ↾ a) ∧ (m1 ↾ Fi) |= LHS(r1),
(m3 ↾ a) ∧ (m1 ↾ Fi) |= RHS(r1),
(m2 ↾ a) ∧ (m4 ↾ Fi) |= RHS(r2),
(m3 ↾ a) ∧ (m4 ↾ Fi) |= LHS(r2).
However, this is in direct conflict with the definition of utility independence, as given in
implication (7). Thus we have reached a contradiction, and conclude that our supposition,
that Ti′ exists, is false.
It remains to discuss what happens when the input preferences C ∗ do not adhere to
the constraints of Theorem 6. Let us consider an example of preferences over f 1 , f 2 , f 3
that imply utility dependence amongst the features but are not registered as such by the
CheckDependence function:
r1 = 1∗0 ≻ 0∗1
r2 = 01∗ ≻ 11∗
r3 = ∗01 ≻ ∗10
r4 = 010 ≻ 000.
Consider the transitivity of preference over models: r1 and r2 imply 110 ≻ 011 ≻ 111, and
r3 and r4 imply 001 ≻ 010 ≻ 000. Thus, it is clear that 110 ≻ 111 and 001 ≻ 000. Let

r5 = 110 ≻ 111,
r6 = 001 ≻ 000.
If rules r5, r6 were in the rule set C∗, then CheckDependence would conclude that f3 is
utility dependent upon f1 and f2. There exists, however, a consistent utility function over
{r1 , r2 , r3 , r4 } with f 3 utility dependent upon f 2 and f 1 ; and, further, there is no consistent
utility function treating f 3 as utility independent of f 2 and f 1 .
Rules r1 through r4 illustrate the phenomenon that utility dependence can be the result of
any implied ordering over models. Thus a complete utility-dependence discovery algorithm
for C∗ would have to check each pair of models m1 ≻C∗ m2 using the CheckDependence
function. We note that enumerating all such pairs of models is of worst-case time complexity
exponential in the size of the feature set, |F(C)|. Thus, while a complete version of the
UIDecomposition function is possible in principle, it is not practical.
Although dependencies indicated by implied orderings are certainly possible, one might
expect them to occur infrequently in practical applications. The preferences implying the
orderings, of course, are provided in explicit form by humans. Many transitivities will be
obvious to the human informant, who will then have to state the transitivities explicitly
to express the intended preferences. In particular, a user wishing to state a subordering
among the values of some attribute using three conditions p1, p2, and p3 will need to
explicitly state the three rules p1 ≻ p2, p2 ≻ p3, and p1 ≻ p3, due to the intransitivity of
ceteris paribus preference. One can expect that similar attribute-value orderings comprise
a significant fraction of all transitivities. The need to state the transitive aspects of these
orderings explicitly thus minimizes the potential for hidden dependencies. More generally,
human users are quite good at recognizing when utility dependence exists in their preference
structures (see, for example, Keeney and Raiffa 1976), and so will often have a reasonable
idea of dependence and independence when formulating preferences, and will include
statements that explicitly indicate the dependence.
In any event, utility dependence undetected by function UIDecomposition can cause an
error in a later step of the algorithm, but we describe heuristics that correct the problem in
many cases (see Section 5.7).
Computationally, function UIDecomposition is an all-pairs process on the preference
rules C ∗ . For each pair of rules r1 , r2 , the overlap of s(r1 ) and s(r2 ) must be computed. Then,
using function CheckDependence, we check if this overlap admits models of L(V) such
that implication 7 does not hold. Thus, function UIDecomposition makes O(|C∗|²) calls to
CheckDependence. Function CheckDependence takes time linear in the number of features,
N, so UIDecomposition requires O(N · |C∗|²) work.
5. UTILITY CONSTRUCTION
We now describe how to compute one utility function consistent with the set of input
ceteris paribus preferences, C. We take the partition of F(C) : S = {S1 , S2 , . . . , S Q } discussed in Section 4.2, and the corresponding set of sets of features T = {T1 , T2 , . . . , TQ },
where each set of features Si is utility dependent on the features Ti and utility independent
of F(C)\Ti. We define a set of sets S′ = {S1′, S2′, . . . , SQ′} such that Si′ = (Si ∪ Ti). We will
look for an additive utility function that is a linear combination of subutility functions ui,
each subutility a separate function of a particular Si′. We associate a scaling parameter ti > 0
with each ui such that the utility of a model is
u(m) = Σ_{i=1}^{Q} ti ui(m).    (12)
We refer to this as a generalized additive utility function. We will argue that this is consistent
with a set of preference rules C∗, where C∗ ∈ [C] is a set of rules in Lr(V). We have two tasks:
to craft the subutility functions u i , and to choose the scaling constants ti . We will accomplish
these two tasks in roughly the following way. We will show how a rule can be restricted to
a set Si . This is essentially a shortening of a rule to a particular set of features, in the same
manner as we have defined restriction of models to a set of features. By restricting rules to
the sets Si , we can use these shortened forms of the rules to make GUFs for the partial utility
functions u i . Rules that are consistent in general can be locally inconsistent when restricted to
different feature sets. These inconsistencies can be phrased as constraints and resolved using
a boolean constraint satisfaction solver (SAT). To assign values to scaling parameters ti , we
will define a set of linear inequalities which constrain the variables ti . The linear inequalities
can then be solved using standard methods for solving linear programming problems. The
solutions to the inequalities are the values for the scaling parameters ti . Along the way we
will show some helpful heuristics.
We first describe the subutility functions, and then their linear combination into a full
utility function.
5.1. Subutility Functions
Our task is to define subutility functions uSi(m) = ûSi(m ↾ Si) that take as input the
values for the features F(C) and return an integer. We define such a function relative to
some rules r ∈ C∗. We say that a subutility function uSi is ε-consistent with r ∈ C∗ if
uSi(m1) ≥ ε + uSi(m2) whenever (m1, m2) |= r and (m1 ↾ Si) ≠ (m2 ↾ Si), where ε > 0. In
the following, we generally let ε = 1 and prove results for 1-consistency. In general, it is
not necessary to use ε = 1, but using 1 will make some calculations simpler. We show that
1-consistency implies the consistency of u with r.
Theorem 7 (Subutilities). Given rules C∗ and a mutually utility-independent partition S of
F(C), if each ui is 1-consistent with r, then

u(m) = Σ_{i=1}^{Q} ti ui(m),

where ti > 0, is consistent with r.
Proof. Given in Appendix.
We show how to generate a 1-consistent subutility function by constructing a restricted
model graph, G i (R) for a set of rules R in Lr (V). G i (R) is a multigraph with directed edges,
where each model m in M(Si ) is a node in the graph; recall we have defined M(V) to be
the set of models of the features listed in V. Let m1, m2 be models in M(Si) with m1 ≠ m2.
There exists one directed edge in the restricted model graph Gi(R) from node m1 to node m2
for each distinct rule r ∈ R and pair of models m1′, m2′ ∈ M(V) such that (m1′, m2′) |= r,
(m1′ ↾ Si) = m1, and (m2′ ↾ Si) = m2. For example, if (m1′, m2′) |= r1 and (m1′, m2′) |= r2,
there exist two edges from node m1 to m2. We label each edge with the rule r that causes it.
The interpretation of an edge e(m1, m2) in Gi(R) is a strict preference for m1 over m2. The construction of
this graph G i (R) parallels the construction of the general model graph G, as described in
Section 3.
Note that if a rule r ∈ R has Si\s(r) ≠ ∅, then the rule makes more than one edge in the
model graph Gi(R). Specifically, a rule r makes 2^{|Si\s(r)|} edges.
With a restricted model graph G i (R) defined, there is still the matter of defining a utility
function on this graph. It appears we could choose any of the ordinal utility functions described
in Section 3 to be the utility function for û i , and preserve further results in this section. For
example, we could use the minimizing GUF for û i . Recall that in this ordering, each node has
utility equal to the length of the longest path originating at the node. If we use the minimizing
GUF, then we want û i (m Si ) to return the length of the longest path starting at node (m Si )
in graph G i (R).
Lemma 3 (Cycle-free subutility). Given a set of rules R ⊆ C ∗ and a set of features Si ⊆ F(C)
such that the restricted model graph G i (R) is cycle-free, and û i (m Si ) is the minimizing
GUF over G i (R), the subutility function u i (m) = û i (m Si ) for Si is 1-consistent with all
r ∈ R.
Proof. The proof is given in the Appendix.
FIGURE 3. Model graph for set Si = {f2, f3}. Nodes are labeled with binary representation of values they
assign to features f2, f3.
We claim but do not prove that similar results hold when û i is any of the other three
graphical utility functions defined in Section 3.
Consider an example. Suppose we have the following set R of preferences over
f1, f2, f3, f4:

Rule  Preference
r1    ∗000 ≻ ∗011
r2    ∗∗0∗ ≻ ∗∗1∗
r3    1∗∗∗ ≻ 0∗∗∗

Suppose that with these three preference rules we are interested in their restriction to
the set Si = {f2, f3}. Then the restricted model graph is shown in Figure 3. Note that we
annotate each edge with the rule that causes it, and that r3 makes no edges in Gi(R). Using
this figure, we can define the partial utility function ûi(m ↾ Si) as follows:

m ↾ Si   ûi(m ↾ Si)
00       1
01       0
10       1
11       0
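This example can be reproduced with a short sketch. The rule representation as pattern strings is our own assumption, and `satisfies` treats features unspecified by a rule as held fixed, per the ceteris paribus reading; all names are ours.

```python
from functools import lru_cache
from itertools import product

def matches(model, pattern):
    return all(p == '*' or m == p for m, p in zip(model, pattern))

def satisfies(m1, m2, rule):
    """(m1, m2) |= r: m1 matches LHS, m2 matches RHS, and the two models
    agree on every feature the rule leaves unspecified."""
    lhs, rhs = rule
    return (matches(m1, lhs) and matches(m2, rhs)
            and all(a == b for a, b, l, r in zip(m1, m2, lhs, rhs)
                    if l == '*' and r == '*'))

def restricted_graph(rules, si, n):
    """Edges of the restricted model graph G_i(R): nodes are models of Si,
    with one edge per rule and ordered pair of restrictions it induces."""
    restrict = lambda m: ''.join(m[i] for i in si)
    models = [''.join(bits) for bits in product('01', repeat=n)]
    edges = set()
    for rule in rules:
        for m1 in models:
            for m2 in models:
                if satisfies(m1, m2, rule) and restrict(m1) != restrict(m2):
                    edges.add((restrict(m1), restrict(m2), rule))
    return edges

def minimizing_guf(nodes, edges):
    """Minimizing GUF: utility of a node is the length of the longest path
    starting there (the graph is assumed cycle-free)."""
    succ = {v: [w for (a, w, _) in edges if a == v] for v in nodes}
    @lru_cache(maxsize=None)
    def longest(v):
        return max((1 + longest(w) for w in succ[v]), default=0)
    return {v: longest(v) for v in nodes}
```

Running this on the three rules above with Si = {f2, f3} (indices 1 and 2) yields the table {00: 1, 01: 0, 10: 1, 11: 0}, and r3 contributes no edges, as in Figure 3.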
5.2. Conflicting Rules
Although we assume that the input preferences C∗ are all strict preferences and have
no conflicting rules, it is possible that a restricted model graph Gi(C∗) for Si ⊂ F(C) has
strict-edge cycles in it even though the model graph G(C∗) for F(C) does not. Since we are
creating subutility functions for use in a generalized additive utility function, we cannot allow
cycles; cycles would break many of the heuristics we will define.
Consider a set of rules R ⊆ C∗ and a set S of feature sets Si ⊆ F(C). The rules R conflict
if there exists some feature set Si ∈ S such that no subutility function ui for Si can be
1-consistent with all r ∈ R. When such a feature set Si exists, we say that R conflict on Si or
that R conflict on ui.
Lemma 4 (Conflict cycles). Given a set of strict preference rules R ⊆ C ∗ and a set S of
feature sets of F(C), the rules R conflict if and only if the rules R imply a cycle in any
restricted model graph G i (R) of some set of features Si ∈ S.
Proof. The proof is given in the Appendix.
Note that if the rules R imply a cycle in the (unrestricted) model graph G(R) over all
features F(C), then the rules R represent contradictory preferences.
We define the restriction of a rule r to a set of features S, written r ↾ S, similarly to restricting
a model to a set of features. The restriction of a rule r to S is another rule in the feature vector
representation, but over fewer features:

r ↾ S = (LHS(r) ↾ S) ≻ (RHS(r) ↾ S).
Consider rules x, y in Lr(V), with V = (f1, f2, f3, f4), with x = ∗01∗ ≻ ∗10∗ and y = ∗∗01 ≻ ∗∗10. If we examine the restriction to feature three, x ↾ {f3} and y ↾ {f3}, then we have
x ↾ {f3} = 1 ≻ 0 and y ↾ {f3} = 0 ≻ 1. Thus, when restricted to f3, it is inconsistent to assert
both x and y. In these cases, we say that x, y conflict on f3.
We call a set of rules Ri the relevant rule set for Si if Ri is the set of rules r ∈ C∗ such
that (m1, m2) |= r ⇒ (m1 ↾ Si) ≠ (m2 ↾ Si). The rules Ri may or may not conflict on Si.
If the rules Ri do not conflict on Si, we say that Ri is conflict free. If some subset of the
rules Ri do conflict on Si, we resolve conflicts by choosing a subset of rules R i ⊆ Ri that is
conflict free. We choose some of the conflicting rules of Ri and exclude them from the set
R i. How to choose which rules to exclude from R i is a central point of discussion in the rest
of the section. For now, just note that our choice of rules to remove influences our choice of
scaling parameters ti.
We define two properties of conflicting rule sets on Ri. Suppose there are several sets of
conflicting rules Y1, Y2, . . . , YK, where each is composed of rules r ∈ Ri, for a feature set
Si. A set of conflicting rules Yk is minimal if for any r ∈ Yk, Yk\r implies a proper subset of the
cycles implied by Yk on Si. The set of conflicting rule sets Y = {Y1, Y2, . . . , YK} is minimal
and complete if each conflicting subset Ri′ of Ri is represented by some Yk ∈ Y such that for
any r ∈ Yk, Yk\r implies a proper subset of the cycles implied by Ri′.
Theorem 8 (Conflict-free rule set). Given a relevant rule set Ri for Si , and a minimal,
complete set of conflicting rule sets Y for Ri , and
R i = Ri \{r1 , r2 , . . . , r K },
(13)
where the rj are any rules such that r1 ∈ Y1, r2 ∈ Y2, . . . , rK ∈ YK, then R i is conflict free.
Proof. We will assume the opposite and arrive at a contradiction. Suppose that R i is not
conflict free; then by Lemma 4, R i implies a cycle in Gi(R i) of Si. Since R i ⊂ Ri, the same
cycle is implied by Ri. By minimality and completeness of Y, there is some set of rules
Yj ∈ Y such that Yj implies the cycles implied by Ri. By minimality of Yj, if R i implies
the same cycle as Yj, then R i must contain the rules Yj. But the rules Yj cannot all be in R i,
because rj ∈ Yj and we have defined rj ∉ R i in equation (13). Thus, we have arrived at a
contradiction, and therefore shown that R i is conflict free.
Lemma 4 states that conflict sets in R are equivalent to cycles in the restricted model
graph G i (R). Thus, we can find minimal sets of conflicting rules in R by using a cycle
detection algorithm on the graph G i (R). We can annotate the edges in G i (R) with the rule
r ∈ R that implies the edge. This way, it is simple to translate a cycle to a set of rules. Note
that if we find all cycles of G i (R), we have found a minimal and complete set of conflicting
rules in R. Since one set of rules R may cause more than one cycle in a given G i (R), there
may be multiple identical conflicting sets of rules. Thus, when translating a set of cycles to
a set of sets of conflicting rules, it may be more efficient to prune away duplicate cycles, but
this is not strictly necessary.
We can construct R i satisfying equation (13) given minimal conflict sets Yk and relevant
rule sets Ri. Note that there are several possible values of R i satisfying this equation, since
we can choose any r1, r2, . . . , rK such that r1 ∈ Y1, r2 ∈ Y2, . . . , rK ∈ YK. Since a rule rk can
be a member of more than one conflict set Yk, the cardinality of R i can vary with different
choices of rk. It is generally advantageous, though not necessary, for R i to have as high a
cardinality as possible.
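This conflict-resolution step can be sketched in a few lines, with rules abstracted to names and cycles found by brute-force DFS, which is adequate for the small restricted graphs in these examples. The greedy choice of which rule to exclude is ours; Theorem 8 permits any choice.

```python
def find_cycle_rule_sets(nodes, edges):
    """Collect the rule sets labeling the simple cycles of a small labeled
    multigraph; edges are (src, dst, rule) triples. Exponential in general,
    fine for small restricted model graphs."""
    cycles = set()
    def dfs(start, v, visited, rules_on_path):
        for (a, b, r) in edges:
            if a != v:
                continue
            if b == start:
                cycles.add(frozenset(rules_on_path + [r]))
            elif b not in visited:
                dfs(start, b, visited | {b}, rules_on_path + [r])
    for v in nodes:
        dfs(v, v, {v}, [])
    return cycles

def conflict_free_subset(rules, cycle_rule_sets):
    """Exclude one rule from every conflicting set, per equation (13).
    Greedily excluding the rule appearing in the most unbroken cycles tends
    to keep the remaining subset large."""
    unbroken = [set(y) for y in cycle_rule_sets]
    excluded = set()
    while unbroken:
        counts = {}
        for y in unbroken:
            for r in y:
                counts[r] = counts.get(r, 0) + 1
        worst = max(sorted(counts), key=lambda r: counts[r])
        excluded.add(worst)
        unbroken = [y for y in unbroken if worst not in y]
    return [r for r in rules if r not in excluded]
```

On the conflicting graph of the next example this finds the conflict sets {r1, r4} and {r2, r4}, excludes r4, and keeps {r1, r2}.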
Given a conflict-free R i, we can construct the restricted model graph Gi(R i). This model
graph is cycle free, so we can construct a subutility function for Si using any of the graphical
utility functions given in Section 3 over the restricted model graph Gi(R i).
Theorem 9 (Subutility 1-consistency). Given a set of features Si ⊆ F(C) and a corresponding
conflict-free relevant rule set R i, the subutility function ui using a graphical utility function
from Section 3 based on the restricted model graph Gi(R i) is 1-consistent with all rules
r ∈ R i.

Proof. Since R i is conflict free, by Lemma 4, Gi(R i) is cycle free. Then by Lemma 3, ui is
1-consistent with R i.
We call a subutility function satisfying Theorem 9 a graphical subutility function for Si .
Note that û i and u i may or may not be 1-consistent with r ∈ (Ri \R i ), and in general we assume
that these functions are not 1-consistent with such r . For notational convenience, we say that a
subutility function u i as described in Theorem 9 agrees with rules r ∈ R i , and that u i disagrees
with rules r ∈ (Ri \R i ). 1-Consistency is implied by agreement, but not the converse, because
agreement is a relation between a specific type of subutility function and a rule.
Consider an extension of our previous example. Suppose we have rules R over the features
f 1 , f 2 , f 3 , f 4 as follows:
Rule  Preference
r1    ∗000 ≻ ∗011
r2    ∗∗0∗ ≻ ∗∗1∗
r3    1∗∗∗ ≻ 0∗∗∗
r4    1∗11 ≻ 0∗00
(the same three rules we had before, plus rule r4 ). Again, let us restrict our attention to the
set of features Si = { f 2 , f 3 }. Then using these four rules, we obtain a model graph as shown
in Figure 4. Note that this model graph contains cycles.
FIGURE 4. Model graph Gi(R) for set Si = {f2, f3}. Nodes are labeled with binary representation of values
they assign to features f2, f3.

Using any simple cycle-detection algorithm, we note that Gi(R) has the following cycles:

Cycle  Rules
Y1     {r1, r4}
Y2     {r2, r4}
Y3     {r2, r4}

Note that rules r2, r4 cause two separate cycles, one involving models 00 and 01 and another
involving models 11 and 10; this results in cycle Y2 being identical to cycle Y3. To obtain a
cycle-free subset of R, we choose some rule in each cycle Yi and exclude it from the new set
R i. In this case, suppose we choose rule r4. Then we have R i = {r1, r2} (r3 is excluded because
s(r3) ∩ Si = ∅; r3 is not in the relevant rule set of Si). We then can construct Gi(R i), which
happens to be identical to the graph shown in Figure 3.
We can then construct a partial utility function ûi(m ↾ Si) from Gi(R i), exactly as before:

m ↾ Si   ûi(m ↾ Si)
00       1
01       0
10       1
11       0
5.3. Consistency Condition
Recall that we have defined what it means for a utility function u to be consistent with
a set of strict preferences C∗ as p(u) ∈ [C∗]. This is equivalent to having u(m1) > u(m2)
whenever m1 ≻C∗ m2. This translates into the following condition on the subutility functions
ui. For each rule r ∈ C∗ and each pair of models (m1, m2) |= r, we have the following:

Σ_{i=1}^{Q} ti ui(m1) > Σ_{i=1}^{Q} ti ui(m2);    (14)

that is, if we have a rule r implying that m1 ≻ m2, then the utility given by u to m1 must be
greater than that given to m2. For it to be otherwise would mean our utility function u was not
faithfully representing the set of preferences C. Let Ri be the relevant rule set for Si. Examine
one subutility function, ui, and pick some r ∉ Ri. By the definition of Lr(V), if (m1, m2) |= r,
both m1, m2 assign the same truth values to the features in Si. That is, (m1 ↾ Si) = (m2 ↾ Si).
This implies ui(m1) = ui(m2). So we can split the summations of equation (14), giving

Σ_{i: r∈Ri} ti ui(m1) + Σ_{i: r∉Ri} ti ui(m1) > Σ_{i: r∈Ri} ti ui(m2) + Σ_{i: r∉Ri} ti ui(m2),    (15)

and then simplify:

Σ_{i: r∈Ri} ti ui(m1) > Σ_{i: r∈Ri} ti ui(m2).    (16)
Given a set S = {S1 , . . . , S Q } of subsets of F(C), corresponding relevant rule sets Ri for
each Si ∈ S, conflict-free rule sets R i for each Ri , and subutility functions u i 1-consistent
with R i , define Sa (r ) and Sd (r ) as follows. For a rule r , Sa (r ) is the set of indices i such that
r ∈ R i , and Sd (r ) is the set of indices i such that r ∈ (Ri \R i ).
Theorem 10 (Rule consistency). Given a set S = {S1, . . . , SQ} of subsets of F(C), corresponding relevant rule sets Ri for each Si ∈ S, conflict-free rule sets R i for each Ri, and
subutility functions ui 1-consistent with R i, a generalized additive utility function u is consistent with a rule r if for all (m1, m2) |= r we have

Σ_{i∈Sa(r)} ti (ui(m1) − ui(m2)) > Σ_{j∈Sd(r)} tj (uj(m2) − uj(m1)).    (17)
Proof. The proof is given in the Appendix.
There are two important consequences of this theorem.
Corollary 1 (Total agreement). A partial utility function u i for the features Si must be
1-consistent with r on Si if s(r ) ⊆ Si .
In this case we have exactly one index i such that r ∈ Ri . Thus, we have either |Sa (r )| = 1
and |Sd (r )| = 0 or the opposite, |Sa (r )| = 0 and |Sd (r )| = 1. It is easy to see in the second
case that inequality (17) cannot be satisfied, since ti is a positive quantity and for i ∈ Sd (r ),
u i (m 2 ) − u i (m 1 ) is also a positive quantity. Thus, if a rule’s specified features overlap with
only one subutility, then that subutility must be consistent with the rule.
Corollary 2 (Minimal agreement). In an additive decomposition utility function u consistent
with C ∗ , each rule r ∈ C ∗ must be 1-consistent with some subutility function u i .
By assumption, u satisfies Theorem 10, so u must satisfy inequality (17). If some rule r is
not 1-consistent with any subutility function u i , then we have |Sa (r )| = 0 and |Sd (r )| ≥ 1,
and inequality (17) will not be satisfied.
Using the above theorem and corollaries, a number of algorithms are possible to produce
a consistent utility function. We discuss these in the following subsection.
5.4. Choosing Scaling Parameters
Once we have a collection of utility-independent sets Si , we create subutility functions
as described in Section 5.1. We can then choose scaling parameters ti based on which rules
disagree with each partial utility function ûi. There are several possible strategies for choosing
these parameters. We will first give the simplest scenario, follow that with an optimization,
and then present the general method.
5.5. No Conflicting Rules
Given a set of rules C ∗ , a set S of subsets of F(C), and corresponding relevant rule sets
Ri , the sets Ri may or may not have conflicting rules. The simplest possible scenario is when
there are no conflicting rules on any sets Si ∈ S, and each u i is 1-consistent with each rule
r ∈ Ri . If this is the case, consistency of u becomes easy to achieve. Consider that inequality
(17) becomes:
Σ_{i∈Sa(r)} ti (ui(m1) − ui(m2)) > 0,
since the set of inconsistent subutility indices, Sd (r ), is empty. By the definition of consistency
with r , for i ∈ Sa (r ), u i (m 1 ) − u i (m 2 ) is always a positive quantity. Thus, we are free to choose
any positive values we wish for ti . For simplicity, choose ti = 1, 1 ≤ i ≤ Q.
This brief inquiry lets us state the following theorem.
Theorem 11 (Simple scaling parameters). Given a set of feature sets S = {S1, S2, . . . , SQ}
and a set of feature vector rules C∗ in L∗, with Ri the set of relevant rules for Si and each ui
1-consistent with each r ∈ Ri, then
u(m) = Σ_{i=1}^{Q} ui(m)    (18)
is an ordinal utility function consistent with C ∗ .
Proof. Given in the Appendix.

As a corollary, under the same conditions, each of the GUFs can be made consistent with C∗.
Corollary 3 (No conflicting rules). Given a set of feature sets S = {S1 , S2 , . . . , S Q } and a
set of feature vector rules C ∗ in L ∗ , with Ri the set of relevant rules for Si and Ri conflict
free for all i, then
u(m) = Σ_{i=1}^{Q} ui(m),    (19)
where each u i is one of the GUFs given in Section 3 over the restricted model graph G i (Ri ),
is an ordinal utility function consistent with C ∗ .
Proof. By assumption, Ri is conflict free, so the restricted model graph G i (Ri ) is cycle free.
By Theorem 9 each u i is 1-consistent with each r ∈ Ri . By Theorem 11, u is consistent with
C ∗.
Let us return to our example. We have f1, f2, f3, f4. Let us use R equal to:

Rule  Preference
r1    ∗000 ≻ ∗011
r2    ∗∗0∗ ≻ ∗∗1∗
r3    1∗∗∗ ≻ 0∗∗∗
This time, let us consider the utility-independence decomposition S = {S1 , S2 , S3 } where
S1 = { f 1 }, S2 = { f 2 , f 3 }, and S3 = { f 4 }. For each set of features we calculate the relevant
rule set: R1 = {r3 }, R2 = {r1 , r2 }, and R3 = {r1 }. Using these we can construct model graphs
as shown in Figure 5.
FIGURE 5. Model graphs Gi(Ri) for sets S1 = {f1}, S2 = {f2, f3}, and S3 = {f4}.
Using those graphs, we can define partial utility functions as follows:

m ↾ S1   û1(m ↾ S1)
0        0
1        1

m ↾ S2   û2(m ↾ S2)
00       1
01       0
10       1
11       0

m ↾ S3   û3(m ↾ S3)
0        1
1        0
Then, since each of these is conflict free,
u(m) = u 1 (m) + u 2 (m) + u 3 (m)
is a utility function consistent with R.
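As a check, this example can be verified exhaustively in a short sketch: summing the three subutility tables with every ti = 1 yields a utility that strictly orders m1 above m2 for every pair of models satisfying a rule. The encoding of rules as pattern strings is our own assumption.

```python
from itertools import product

def matches(model, pattern):
    return all(p == '*' or m == p for m, p in zip(model, pattern))

def satisfies(m1, m2, rule):
    """(m1, m2) |= r: m1 matches LHS, m2 matches RHS, and the models agree
    on every feature the rule leaves unspecified."""
    lhs, rhs = rule
    return (matches(m1, lhs) and matches(m2, rhs)
            and all(a == b for a, b, l, r in zip(m1, m2, lhs, rhs)
                    if l == '*' and r == '*'))

# Subutility tables from the worked example: S1 = {f1}, S2 = {f2, f3}, S3 = {f4}.
u1 = {'0': 0, '1': 1}
u2 = {'00': 1, '01': 0, '10': 1, '11': 0}
u3 = {'0': 1, '1': 0}

def u(m):
    # Theorem 11: with no conflicting rules, every scaling parameter ti = 1.
    return u1[m[0]] + u2[m[1:3]] + u3[m[3]]

rules = [("*000", "*011"), ("**0*", "**1*"), ("1***", "0***")]
models = [''.join(bits) for bits in product('01', repeat=4)]
consistent = all(u(m1) > u(m2)
                 for rule in rules
                 for m1 in models for m2 in models
                 if satisfies(m1, m2, rule))
```

Here `consistent` comes out True, confirming that u = u1 + u2 + u3 respects all three rules.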
5.6. Simple Conflicting Rules
Given a set S of feature sets of F(C) and corresponding relevant rule sets Ri ⊆ C ∗
containing conflicting rules, the conflicts can interact in different ways. In some cases, the
conflicts are such that they admit an elegant solution. The intuitive idea is as follows: if r
disagrees with some set of subutility functions Sd (r ), then the weight (scaling parameters ti )
assigned to those subutility functions must be less than the weight assigned to some subutility
function u r that agrees with r . The weight of that one agreement must be larger than the sum
of all the disagreements.
The following example illustrates a configuration of relevant rule sets Ri that can cause
this to occur, and how to assign the scaling parameters ti. Given the usual set of feature sets
S, relevant rule sets Ri, conflict-free rule sets R i, and subutility functions ui 1-consistent with
R i, choose some r ∈ C∗. Suppose there is exactly one subutility function 1-consistent with
r, ur, and let Sd(r) be the set of indices of subutility functions ui such that (m1, m2) |= r ⇒
(m1 ↾ Si) ≠ (m2 ↾ Si) and ui is not 1-consistent with r. We are interested in finding assignments
to the scaling parameters ti such that the generalized additive utility function u is consistent with
r. To this end, we examine the values of ui(m1) − ui(m2) when (m1, m2) |= r.
We have u r (m 1 ) > u r (m 2 ); and for all i ∈ Sd (r ) we have u i (m 1 ) < u i (m 2 ) for all pairs
(m 1 , m 2 ) |= r . Let max(u i ) represent the maximum value returned by u i for any models
m. Clearly, max(u i ) ≥ u i (m 2 ) for all i ∈ Sd (r ) and all (m 1 , m 2 ) |= r . Similarly, u i (m 1 ) ≥ 0.
Thus, we can bound the expression u i (m 1 ) − u i (m 2 ) as follows:
u i (m 1 ) − u i (m 2 ) ≥ − max(u i ).
(20)
Recall that Theorem 10 states that for u to be consistent with r, we must have

Σ_{i∈Sa(r)} ti (ui(m1) − ui(m2)) > Σ_{j∈Sd(r)} tj (uj(m2) − uj(m1)).
In this case we have Σ_{i∈Sa(r)} ti (ui(m1) − ui(m2)) = tr (ur(m1) − ur(m2)), and for i ∈ Sd(r),
ui(m2) − ui(m1) ≤ max(ui). Substituting this into the above condition implies that u is
consistent with r if

tr (ur(m1) − ur(m2)) > Σ_{i∈Sd(r)} ti max(ui).

Since ur is 1-consistent with r, ur(m1) − ur(m2) ≥ 1, so u is consistent with r if

tr > Σ_{i∈Sd(r)} ti max(ui).
Note that this provides an explicit condition on the value of tr for u to be consistent with r .
We can distill the intuition behind the above example into a technique appropriate when
there are few rules per utility-independent set Si. Given the set of feature sets S and rules
C∗, we define an ordering of S called Ŝ, where Ŝ = (Ŝ1, Ŝ2, . . . , ŜQ) and each Ŝi ∈ S has
corresponding relevant rule set Ri. Ŝ is conflict free if R1 is conflict free and, for all
i: 2 ≤ i ≤ Q,

Ri \ ∪_{j=1}^{i−1} Rj

is conflict free. That is, Ŝ is conflict free if the first set is conflict free, and subsequent sets
are either conflict free or conflict free disregarding rules appearing in earlier sets.
Lemma 5 (Conflict-free ordering). Given rules r ∈ C∗, a set of feature sets S of F(C),
a conflict-free ordering Ŝ of S, relevant rule sets Ri for Ŝi ∈ Ŝ, each ui a subutility function
1-consistent with {Ri \ ∪_{j=1}^{i−1} Rj}, Sa(r) the set of indices of subutility functions ui
1-consistent with r, and Sd(r) the set of indices of subutility functions ui such that (m1, m2) |= r ⇒
(m1 ↾ Ŝi) ≠ (m2 ↾ Ŝi) and ui is not 1-consistent with r, then, for all r, there exists j ∈ Sa(r)
such that j < k for all k ∈ Sd(r).

Proof. Choose some rule r. r is in relevant rule sets Ri for i ∈ {Sa(r) ∪ Sd(r)}. Let j be the
first such rule set according to Ŝ; that is, for all k ≠ j, k ∈ {Sa(r) ∪ Sd(r)} implies j < k.
We are given that u_j is 1-consistent with {R_j \ ∪_{l=1}^{j−1} R_l}, so by definition of Sa(r), j ∈ Sa(r). Since j < k for all k ∈ {Sa(r) ∪ Sd(r)}, j < k for all k ∈ Sd(r). This proves the lemma.

Theorem 12 (Lexicographic ordering). Given rules r ∈ C*, a set of feature sets S of F(C), a conflict-free ordering Ŝ of S, relevant rule sets R_i for Ŝ_i ∈ Ŝ, each u_i a subutility function 1-consistent with {R_i \ ∪_{j=1}^{i−1} R_j}, Sa(r) the set of indices of subutility functions u_i 1-consistent with r, and Sd(r) the set of indices of subutility functions u_i such that (m_1, m_2) |= r implies that m_1 and m_2 differ on Ŝ_i and u_i is not 1-consistent with r, then

    u(m) = ∑_{i=1}^{Q} t_i u_i(m),    (21)

where t_Q = 1 and, for all i: 1 ≤ i < Q,

    t_i = 1 + ∑_{j=i+1}^{Q} t_j max(u_j),    (22)

is a utility function consistent with C*.
Proof. Given in Appendix.
It should be clear that we can look for a conflict-free ordering Ŝ in the following manner. Given a set of feature sets S and corresponding rule sets R_i, we check whether any of the R_i are conflict free. If so, that S_i is the first set in the conflict-free ordering, Ŝ_1. Let R̂_1 denote the relevant rule set for Ŝ_1. We then check whether any R_i\R̂_1 is conflict free. If so, we pick one such S_i and let it be Ŝ_2. We proceed iteratively, finding further Ŝ_i, until we have either found a conflict-free ordering or exhausted the candidate sets without finding one. If we do find a conflict-free ordering, we can then define graphical subutility functions based on that ordering, and assign scaling parameters t_i as given in equation (22). The result is a utility function consistent with C* (by Theorem 12).
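The greedy search just described, together with the lexicographic scaling of equation (22), can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the names are ours, and `has_conflict` stands in for the model-graph cycle test described earlier.

```python
def conflict_free_ordering(sets_with_rules, has_conflict):
    """Greedy search for a conflict-free ordering.

    sets_with_rules: list of (feature_set_label, relevant_rule_set R_i) pairs.
    has_conflict(label, rules): assumed oracle returning True iff the model
    graph of `rules` restricted to that feature set contains a cycle.
    Returns the ordering as a list of labels, or None on failure."""
    remaining = list(sets_with_rules)
    ordering, used = [], set()
    while remaining:
        pick = next((c for c in remaining
                     if not has_conflict(c[0], set(c[1]) - used)), None)
        if pick is None:
            return None  # no set is conflict free after discounting earlier rules
        ordering.append(pick[0])
        used |= set(pick[1])
        remaining.remove(pick)
    return ordering


def scaling_parameters(max_subutils):
    """Equation (22): t_Q = 1 and t_i = 1 + sum_{j > i} t_j * max(u_j),
    where max_subutils[i] is max(u_i) for position i of the ordering."""
    Q = len(max_subutils)
    t = [0.0] * Q
    t[-1] = 1.0
    for i in range(Q - 2, -1, -1):
        t[i] = 1.0 + sum(t[j] * max_subutils[j] for j in range(i + 1, Q))
    return t
```

On the five-rule example below, with conflicts {r_1, r_4} and {r_1, r_5} on S_3 and none on S_1, this greedy search returns the ordering (S_1, S_2, S_3), and `scaling_parameters([1, 1, 0])` yields t = (2, 1, 1), matching the worked example.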
We now return to our previous four-rule example and add a fifth rule. Let our set of rules R be as follows:

    Rule    Preference
    r_1     *000 ≻ *011
    r_2     **0* ≻ **1*
    r_3     1*** ≻ 0***
    r_4     1*11 ≻ 0*00
    r_5     **01 ≻ **10
Again, assume that we have S = {S1 , S2 , S3 } and that S1 = { f 1 }, S2 = { f 2 , f 3 }, and S3 = { f 4 }.
Then compute the relevant rule sets R_i. We have:

    Set    Rules
    R_1    {r_3, r_4}
    R_2    {r_1, r_2, r_4, r_5}
    R_3    {r_1, r_4, r_5}
We then check each R_i to see if it is conflict free. We can do this by building model graphs G_i(R_i). In this case, we discover that R_1 is conflict free and that R_2 and R_3 are not.

FIGURE 6. Model graphs G_i(R_i) for sets S_1 = {f_1}, S_2 = {f_2, f_3}, and S_3 = {f_4}.

Thus, we look for a conflict-free ordering Ŝ. Since R_1 is conflict free, we set Ŝ_1 = S_1. We then check if R_2\R_1 or R_3\R_1 is conflict free. In this case, we have

    R_2\R_1 = {r_1, r_2, r_5},
    R_3\R_1 = {r_1, r_5},

and R_2\R_1 is conflict free. Thus, we set Ŝ_2 = S_2, and check if R_3\{R_1 ∪ R_2} is conflict free. In this case R_3\{R_1 ∪ R_2} = ∅, which, of course, entails no conflicts on S_3. We assign Ŝ_3 = S_3. We can now construct model graphs G_i for each of the sets in the conflict-free ordering. These are shown in Figure 6.

We can create subutility functions u_i based on the model graphs G_i given in Figure 6, and get the following:

    m_S1   û_1(m_S1)        m_S2   û_2(m_S2)        m_S3   û_3(m_S3)
    0      0                00     1                0      0
    1      1                01     0                1      0
                            10     1
                            11     0

We then have the following utility function:

    u(m) = t_1 u_1(m) + t_2 u_2(m) + t_3 u_3(m),
and it remains to assign values to the t_i. Using equation (22), we set t_3 = 1. Then

    t_2 = 1 + t_3 · max(u_3) = 1 + 1 · 0 = 1,

and

    t_1 = 1 + t_2 · max(u_2) + t_3 · max(u_3) = 1 + 1 · 1 + 1 · 0 = 2.

And so our utility function is

    u(m) = 2u_1(m) + u_2(m) + u_3(m).
If we cannot find a conflict-free ordering of S, then we must use a more complicated
strategy for assigning scaling parameters ti . This is discussed in the following subsection.
5.7. Scaling Parameter Assignment by Constraint Satisfaction
When we have a set of feature sets S of F(C) without a conflict-free ordering, we
can construct a constraint satisfaction problem from the set of conflicts and use a general
boolean SAT-solver to find a solution. The satisfiability problem will be constructed so that
it represents the rule conflicts on the UI feature sets, and a solution to the constraint problem
will determine which subutility functions agree and disagree with each rule.
Given rules C* and a set of feature sets S of F(C) with corresponding relevant rule sets R_i, we define a boolean satisfiability problem P(C*, S) in conjunctive normal form (CNF), a conjunction of disjunctions. Let Y_ik be a minimal set of conflicting rules on R_i, and let Y_i = {Y_i1, Y_i2, . . . , Y_iK} be a minimal and complete set of conflicts for S_i. Let Y be the set of all such Y_ik, and let

    R_c = ∪_i ∪_k Y_ik,

the set of all rules involved in any conflict. The boolean variable z_ij in P(C*, S) represents a pair (i, r_j), where r_j ∈ C* and i is an index of S_i ∈ S. The truth value of z_ij is interpreted as stating whether or not subutility function u_i agrees with rule r_j ∈ R_c. Let X_j = {l | (m_1, m_2) |= r_j implies that m_1 and m_2 differ on S_l} denote the set of indices of UI sets that overlap with r_j.

Conjuncts in P(C*, S) are of one of two forms: those representing the subutility functions in a set X_j, and those representing conflicts between rules of a particular cycle Y_ik.
Definition 10 (Rule-set clauses). Given rules C*, a set of feature sets S of F(C) with corresponding relevant rule sets R_i, R_c the set of all rules involved in any conflict, and X_j the set of indices of UI sets overlapping with r_j ∈ R_c, all clauses of the form

    ∨_{i∈X_j} z_ij    (23)

are the rule-set clauses of P(C*, S).

The rule-set clauses of P(C*, S) represent the possible subutility functions a rule might agree with. Corollary 2 states that each rule r ∈ C* must be 1-consistent with some subutility function u_i; accordingly, in a solution to P(C*, S), one of these variables must be true.
Definition 11 (Conflict clauses). Given rules C*, a set of feature sets S of F(C) with corresponding relevant rule sets R_i, Y_i = {Y_i1, Y_i2, . . . , Y_iK} a minimal and complete set of conflicts for S_i, R_c the set of all rules involved in any conflict, and X_j the set of indices of UI sets overlapping with r_j ∈ R_c, all clauses of the form

    ∨_{j : r_j∈Y_ik} ¬z_ij    (24)

are the conflict clauses of P(C*, S).

The conflict clauses of P(C*, S) represent a particular conflicting set of rules on a particular UI set S_i. At least one of the rules in Y_ik must disagree with S_i.
Combining rule-set clauses and conflict clauses, P(C*, S) is the conjunction of all such clauses. Thus

    P(C*, S) = (∧_{j=1}^{Q} ∨_{i∈X_j} z_ij) ∧ (∧_{Y_ik∈Y} ∨_{j : r_j∈Y_ik} ¬z_ij).
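The construction of P(C*, S) follows directly from Definitions 10 and 11; it can be sketched as below. This is an illustrative sketch (function and variable names are ours): a clause is a list of literals, and a literal pairs a variable (i, j), standing for z_ij, with a polarity.

```python
def build_cnf(conflicts, overlap):
    """Build the CNF problem P(C*, S).

    conflicts: dict mapping UI-set index i to the list of minimal
               conflicting rule sets Y_ik found on R_i.
    overlap:   dict mapping rule index j to X_j, the indices of the
               UI sets that rule r_j overlaps.
    Returns a list of clauses; each literal is ((i, j), polarity)."""
    rules_in_conflict = {j for ys in conflicts.values() for y in ys for j in y}
    clauses = []
    # Rule-set clauses (23): each conflicting rule must agree with
    # some subutility function whose UI set it overlaps.
    for j in sorted(rules_in_conflict):
        clauses.append([((i, j), True) for i in sorted(overlap[j])])
    # Conflict clauses (24): in each minimal conflict Y_ik on S_i,
    # at least one rule must disagree with S_i.
    for i in sorted(conflicts):
        for y in conflicts[i]:
            clauses.append([((i, j), False) for j in sorted(y)])
    return clauses
```

On the six-rule example given later in this section, this construction yields the six rule-set clauses of (27) and the seven conflict clauses of (26).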
Once we have computed P(C*, S), we can use a solver to arrive at a solution: an assignment of true or false to each variable z_ij in P(C*, S). Note that there need not be a solution.
Given a solution to P(C*, S), we need to translate it back into a construction for the subutility functions over the S_i. Using the definition of the propositional variables z_ij, we look at the truth values assigned to each z_ij in the solution. This shows which subutility functions disagree with which rules. Let f be the set of all z_ij assigned the value false in the solution. We define rule sets R'_i ⊆ R_i for each relevant rule set R_i as follows. For all S_i ∈ S,

    R'_i = R_i \ {r_j | z_ij ∈ f}.    (25)

Simply put, we construct conflict-free relevant rule sets R'_i ⊆ R_i from the solution to P(C*, S); we can then construct u_i 1-consistent with R'_i.
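Equation (25) is straightforward to implement; a minimal sketch (names ours), treating variables absent from the assignment as agreement:

```python
def conflict_free_rule_sets(relevant, assignment):
    """Equation (25): drop from each R_i the rules whose z_ij is false.

    relevant:   dict mapping UI-set index i to the rule indices in R_i.
    assignment: dict mapping variable pairs (i, j) ~ z_ij to booleans;
                pairs not mentioned are treated as agreeing."""
    return {i: {j for j in rules if assignment.get((i, j), True)}
            for i, rules in relevant.items()}
```

Applied to the six-rule example solved later in this section, this yields R'_1 = {r_3, r_4}, R'_2 = {r_1, r_2, r_5, r_6}, and R'_3 = {r_4, r_5}.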
We now show that the rule sets R'_i constructed according to a solution are conflict free.

Lemma 6 (Conflict clause solution). Given a solution to P(C*, S) and relevant rule sets R_i for each S_i ∈ S, the rule sets R'_i ⊆ R_i,

    R'_i = R_i \ {r_j | z_ij ∈ f},

are conflict free.

Proof. We proceed by contradiction. Choose some S_i. Assume R'_i has a conflict Y_Ri ⊆ R'_i. Since Y is a complete set of conflicts, Y_Ri ∈ Y. Thus, there exists a conflict clause ∨_{j : r_j∈Y_Ri} ¬z_ij in P(C*, S), so one of these variables z_ij ∈ f. Call this variable z_ix. By definition of R'_i, when z_ix ∈ f, rule r_x ∉ R'_i. This contradicts r_x ∈ R'_i, so we have arrived at a contradiction and must conclude that R'_i is conflict free.
Theorem 13 (Correctness of SAT-formulation). Given a set of feature sets S of F(C), if P(C*, S) has no solution, then the preferences C* are not consistent with a utility function u of the form u(m) = ∑_i t_i u_i(m).
FIGURE 7. Model graphs G_i(R_i) for sets S_1 = {f_1}, S_2 = {f_2, f_3}, and S_3 = {f_4}.
Proof. The proof is given in the Appendix.
If we return to our previous example, and add another rule to the set R, we have an
example illustrating our SAT-formulation methods. Let
    Rule    Preference
    r_1     *000 ≻ *011
    r_2     **0* ≻ **1*
    r_3     1*** ≻ 0***
    r_4     1*11 ≻ 0*00
    r_5     **01 ≻ **10
    r_6     000* ≻ 110*
be our set of rules. Again, assume that we have S = {S1 , S2 , S3 } and that S1 = { f 1 }, S2 =
{ f 2 , f 3 }, and S3 = { f 4 }. Then the model graphs are as shown in Figure 7.
Using the model graphs, we can list the cycles found in each graph:

    Set    Conflicts
    S_1    Y_11 = {r_3, r_6}, Y_12 = {r_4, r_6}
    S_2    Y_21 = {r_1, r_4}, Y_22 = {r_2, r_4}, Y_23 = {r_4, r_5}
    S_3    Y_31 = {r_1, r_4}, Y_32 = {r_1, r_5}
This lets us list the conflict clauses for P(C*, S):

    (¬z_1,3 ∨ ¬z_1,6) ∧ (¬z_1,4 ∨ ¬z_1,6) ∧ (¬z_2,1 ∨ ¬z_2,4) ∧ (¬z_2,2 ∨ ¬z_2,4)
    ∧ (¬z_2,4 ∨ ¬z_2,5) ∧ (¬z_3,1 ∨ ¬z_3,4) ∧ (¬z_3,1 ∨ ¬z_3,5).    (26)

Recall that the first number in the subscript indicates the set S_i and the second indicates the rule r_j. By looking at the relevant rule sets for each S_i, we can state the rule-set clauses:

    (z_2,1 ∨ z_3,1) ∧ (z_2,2) ∧ (z_1,3) ∧ (z_1,4 ∨ z_2,4 ∨ z_3,4) ∧ (z_2,5 ∨ z_3,5) ∧ (z_1,6 ∨ z_2,6).    (27)
Clauses (26) and (27) comprise the whole satisfiability problem. One possible solution is as follows:

    Variable    Assignment
    z_1,3       True
    z_1,4       True
    z_1,6       False
    z_2,1       True
    z_2,2       True
    z_2,4       False
    z_2,5       True
    z_2,6       True
    z_3,1       False
    z_3,4       True
    z_3,5       True
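As a sanity check, we can encode clauses (26) and (27) directly, with the pair (i, j) standing for z_i,j, and verify that the tabulated assignment satisfies them. This is a small sketch; the encoding is ours:

```python
# The assignment from the table above, keyed by (set index i, rule index j).
assignment = {(1, 3): True, (1, 4): True, (1, 6): False,
              (2, 1): True, (2, 2): True, (2, 4): False, (2, 5): True, (2, 6): True,
              (3, 1): False, (3, 4): True, (3, 5): True}

# Clauses (26): each literal is (variable, polarity); False means negated.
conflict_clauses = [
    [((1, 3), False), ((1, 6), False)], [((1, 4), False), ((1, 6), False)],
    [((2, 1), False), ((2, 4), False)], [((2, 2), False), ((2, 4), False)],
    [((2, 4), False), ((2, 5), False)], [((3, 1), False), ((3, 4), False)],
    [((3, 1), False), ((3, 5), False)]]

# Clauses (27).
rule_set_clauses = [
    [((2, 1), True), ((3, 1), True)], [((2, 2), True)], [((1, 3), True)],
    [((1, 4), True), ((2, 4), True), ((3, 4), True)],
    [((2, 5), True), ((3, 5), True)], [((1, 6), True), ((2, 6), True)]]

def satisfies(clauses, asg):
    # A CNF formula holds when every clause has a satisfied literal.
    return all(any(asg[v] == pol for v, pol in clause) for clause in clauses)
```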
We translate this into agreement and disagreement of rules with sets as follows:

    Rule    S_1    S_2    S_3
    r_1     –      a      d
    r_2     –      a      –
    r_3     a      –      –
    r_4     a      d      a
    r_5     –      a      a
    r_6     d      a      –

where 'a' indicates agreement and 'd' disagreement. Using this solution, we can build model graphs using only the agreeing rules. These are shown in Figure 8.
Suppose there is no solution. One method of proceeding is to merge two feature sets S_i, S_j. We can define S' = S, except that S' contains the set S_k, where S_k = S_i ∪ S_j, and S' does not contain either of S_i and S_j. Using S', we can obtain another satisfiability problem, P(C*, S'). This new problem may have a solution or it may not. In the case that it does not, we may iteratively try merging feature sets and solving the resulting satisfiability problems.
FIGURE 8. Model graphs G_i(R_i) for sets S_1 = {f_1}, S_2 = {f_2, f_3}, and S_3 = {f_4}.
Merging feature sets is an operation that does not create conflicts where there were none, as
we demonstrate in the following theorem.
Theorem 14 (Set merging). If C ∗ is an inconsistent preference set over features Sk , then
there does not exist any Si , S j with Si ∪ S j = Sk such that C ∗ restricted to Si is consistent
and C ∗ restricted to S j is consistent.
Proof. The proof is given in the Appendix.
By the contrapositive, Theorem 14 implies that joining two sets S_i, S_j together to form S_k will not result in a set with conflicts if S_i, S_j are conflict free. Thus, this theorem implies that iteratively merging feature sets is guaranteed to result in a solvable satisfiability problem when C* is consistent. If the preferences C* are consistent, then by definition the set F(C) is conflict free, and iterative set-merging must terminate, at the latest, when all features have been merged into one set identical to F(C). How best to choose which feature sets to merge is a topic we leave for future research.
By merging UI sets, the computation of u(m) becomes less efficient, and we spend
computational effort to evaluate several different satisfiability problems. In general, merging
feature sets allows us to represent more complicated preferences, such as utility dependence
(including dependencies missed by function CheckDependence), and nonlinear tradeoff ratios
(see Section 5.9). Merging feature sets is a technique that applies to other representations of ceteris paribus preferences as well. For example, Boutilier et al. (2001) mention that merging feature sets allows their representation to express ceteris paribus preferences that would be inexpressible without merging.
Even in the case that a solution to P(C*, S) exists, we are not guaranteed that it leads to a consistent utility function. There is still the matter of setting the scaling parameters t_i. In doing so, we must be careful to satisfy inequality (17). Therefore, for each rule r ∈ R_i\R'_i, we keep a list of linear inequalities, I, that must be satisfied by choosing appropriate scaling parameters t_i. We discuss this in the following subsection.
5.8. Linear Inequalities
The solution to a satisfiability problem P(C*, S) determines how to construct conflict-free rule sets R'_i ⊆ R_i for each feature set S_i ∈ S. However, we must then set the scaling parameters t_i so that inequality (17) is satisfied. We can do so by building a list of constraints on the values that the scaling parameters may assume while still making inequality (17) true. These constraints are linear inequalities relating the relative importance of the parameters t_i.

Our list I of inequalities should ensure that each pair (m_1, m_2) that satisfies r ∈ C* is consistent with the total utility function. By inequality (17), given rules C*, a set of feature sets S of F(C) with corresponding relevant rule sets R_i, and conflict-free R'_i ⊆ R_i, let Sa(r) be the set of indices for which r ∈ R'_i, and let Sd(r) be the set of indices for which r ∈ (R_i\R'_i); then for each (m_1, m_2) |= r:

    ∑_{i∈Sa(r)} t_i u_i(m_1) + ∑_{i∈Sd(r)} t_i u_i(m_1) > ∑_{i∈Sa(r)} t_i u_i(m_2) + ∑_{i∈Sd(r)} t_i u_i(m_2).    (28)

That is, the utility function u must assign higher utility to m_1 than to m_2. We can simplify these inequalities as follows:

    ∑_{i∈{Sa(r)∪Sd(r)}} t_i (u_i(m_1) − u_i(m_2)) > 0.    (29)
Note that the inequality need not contain terms for i ∉ {Sa(r) ∪ Sd(r)}, since, by definition of (m_1, m_2) |= r, m_1 and m_2 agree on S_i and u_i(m_1) − u_i(m_2) = 0. Furthermore, note that many of the linear inequalities are redundant. A simple intuition tells us that the important inequalities are the boundary conditions: the constraints that induce maximum or minimum values for the parameters t_i. Consider the set of model pairs (m_1, m_2) such that (m_1, m_2) |= r. Some pairs provide the minimum and maximum values of u_i(m_1) − u_i(m_2) for some i ∈ {Sa(r) ∪ Sd(r)}. Call minmax(r, i) the set of maximum and minimum values of u_i(m_1) − u_i(m_2) for model pairs satisfying r. This set has at most two elements and at least one. Let M(r) be the set of model pairs as follows:

    M(r) : (m_1, m_2) |= r,  u_i(m_1) − u_i(m_2) ∈ minmax(r, i)  ∀i ∈ {Sa(r) ∪ Sd(r)}.

Each model pair in the set M(r) causes u_i(m_1) − u_i(m_2) to achieve either its minimum or its maximum value on every subutility function either agreeing or disagreeing with r. For concreteness, stipulate that each (m_1, m_2) ∈ M(r) has f_j(m_1) = 0 and f_j(m_2) = 0 for j ∉ {Sa(r) ∪ Sd(r)}. We can now define the irredundant system of linear inequalities as an inequality of the form given in equation (29) for each rule r and each model pair (m_1, m_2) ∈ M(r).
Definition 12 (Inequalities). Given rules C*, a set of feature sets S of F(C) with corresponding relevant rule sets R_i, conflict-free R'_i ⊆ R_i with each R'_i ∈ R', where Sa(r) is the set of indices for which r ∈ R'_i, Sd(r) is the set of indices for which r ∈ (R_i\R'_i), and each subutility function u_i is 1-consistent with all r ∈ R'_i, then I(C*, S, R') is the set of linear inequalities

    ∑_{i∈{Sa(r)∪Sd(r)}} t_i (u_i(m_1) − u_i(m_2)) > 0

for all r in some (R_i\R'_i) and for all model pairs in M(r).

By definition of the scaling parameters t_i of a generalized additive utility function u, t_i > 0 for all 1 ≤ i ≤ Q. Thus, a solution to I(C*, S, R') is an assignment of positive numbers to each t_i.
Lemma 7 (Sufficiency of I(C*, S, R')). Given I(C*, S, R') and a solution to it, for all r in some (R_i\R'_i) and for all model pairs (m_1, m_2) |= r,

    ∑_{i∈{Sa(r)∪Sd(r)}} t_i (u_i(m_1) − u_i(m_2)) > 0.

Proof. Given in Appendix.
Theorem 15 (Inequalities). If the system of linear inequalities I(C*, S, R') has a solution, this solution corresponds to a utility function u consistent with C*.
Proof. The proof is given in the Appendix.
We can solve the system of linear inequalities I(C*, S, R') using any linear inequality solver. We note that it is possible to phrase this as a linear programming problem, and use any of a number of popular linear programming techniques to find scaling parameters t_i. If we rewrite the inequalities in the form

    ∑_{i=1}^{Q} t_i (u_i(m_1) − u_i(m_2)) ≥ ε,    (30)

where ε is small and positive, then we can solve this system of inequalities by optimizing the following linear program. The task of the linear program is to minimize a new variable, t_0, subject to the constraints

    t_0 + ∑_{i=1}^{Q} t_i (u_i(m_1) − u_i(m_2)) ≥ ε

for each (m_1, m_2) as described above, and the constraint that t_0 ≥ 0 (Chvátal 1983). We must be careful that ε is sufficiently smaller than 1, or we may not find solutions that we could otherwise discover. Given small enough ε, this linear program has the property that it always has a solution, obtained by making t_0 very large. However, only when the optimal value of t_0 is 0 is there a solution to the original system of linear inequalities.
Let us continue our previous example where we left off. We have three preference rules that disagree with some subutility functions. These are:

    Rule    Preference
    r_1     *000 ≻ *011
    r_4     1*11 ≻ 0*00
    r_6     000* ≻ 110*
Since r_1 has preferences over S_2 and S_3, we need to add linear inequalities for all model pairs (m_1, m_2) in M(r_1), that is, all combinations of minimum and maximum values for u_2(m_1) − u_2(m_2) and u_3(m_1) − u_3(m_2). Below we enumerate the members of the sets minmax(r_1, 2), minmax(r_1, 3), and M(r_1):

    minmax(r_1, 2) = {2}
    minmax(r_1, 3) = {−1}
    M(r_1) = {((00, 0), (01, 1))}.

The members of M(r_1) are written using the notation ((m_1', m_1''), (m_2', m_2'')), where m_1' and m_2' are models of S_2 and m_1'' and m_2'' are models of S_3. Thus the preference rule r_1 contributes this inequality to our set I(C*, S, R'):

    2t_2 − t_3 ≥ 1.    (31)
Now consider rule r_4, which intersects all three feature sets S_1, S_2, S_3. We have the following members of minmax(r_4, i) and M(r_4):

    minmax(r_4, 1) = {1}
    minmax(r_4, 2) = {−2, −1}
    minmax(r_4, 3) = {1}
    M(r_4) = {((1, 01, 1), (0, 00, 0)), ((1, 11, 1), (0, 10, 0))}.

Each member of M(r_4) corresponds to a linear inequality:

    t_1 − 2t_2 + t_3 ≥ 1,
    t_1 − t_2 + t_3 ≥ 1.    (32)
Finally, examine r_6. We have the following members of minmax(r_6, i) and M(r_6):

    minmax(r_6, 1) = {−1}
    minmax(r_6, 2) = {1}
    M(r_6) = {((0, 00), (1, 10))}.

Each member of M(r_6) corresponds to a linear inequality:

    −t_1 + t_2 ≥ 1.    (33)
Combining inequalities (31)–(33) and adding a slack variable t_0, we have the following linear programming problem:

    t_0 + 2t_2 − t_3 ≥ 1
    t_0 + t_1 − 2t_2 + t_3 ≥ 1
    t_0 + t_1 − t_2 + t_3 ≥ 1
    t_0 − t_1 + t_2 ≥ 1
    t_0 ≥ 0.
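This linear program is small enough to hand to an off-the-shelf solver. As an illustration (not the authors' implementation), SciPy's `linprog` can solve the example system; SciPy expects constraints in the form A_ub @ x ≤ b_ub, so each "≥ 1" row is negated:

```python
from scipy.optimize import linprog

# Variables: x = [t0, t1, t2, t3]; minimize t0 subject to the four
# constraints above, each negated into "<=" form.
A_ub = [[-1,  0, -2,  1],   # t0 + 2*t2 - t3        >= 1
        [-1, -1,  2, -1],   # t0 + t1 - 2*t2 + t3   >= 1
        [-1, -1,  1, -1],   # t0 + t1 - t2 + t3     >= 1
        [-1,  1, -1,  0]]   # t0 - t1 + t2          >= 1
b_ub = [-1, -1, -1, -1]
c = [1, 0, 0, 0]            # objective: minimize the slack variable t0
bounds = [(0, None)] * 4    # t0 >= 0; scaling parameters nonnegative
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
```

The optimal value of t_0 is 0, confirming that the original system is feasible; the solution given below, (t_1, t_2, t_3) = (2, 3, 5), is one feasible point, though the solver may return a different one.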
The above problem can then be solved using any linear programming technique. One solution is as follows:

    t_0 = 0,  t_1 = 2,  t_2 = 3,  t_3 = 5,

and the corresponding utility function is therefore

    u(m) = 2u_1(m) + 3u_2(m) + 5u_3(m),

where the subutility functions are defined using the model graphs shown in Figure 7.
We can bound the total number of linear inequalities in I(C*, S, R') as follows. Given that r ∈ (R_i\R'_i) for some i, it contributes at most

    2^{|Sa(r)∪Sd(r)|}    (34)

inequalities to the set of inequalities. This is a simple combinatoric argument: r contributes exactly one inequality for each combination of minimum and maximum values on each set S_i. For each set, a rule can have both a minimum and a maximum (though they may not be distinct). This suffices to give the bound stated in condition (34).
We have mentioned that I(C*, S, R') might have no solution. We discuss this situation in the following subsection.
5.9. Nonlinear Tradeoff Ratios
In this section, we discuss how to deal with nonlinear tradeoffs between satisfaction of
one feature set and satisfaction of another feature set.
A tradeoff ratio is the amount of satisfaction of Si a decision maker is willing to sacrifice
to gain an improvement of one unit of satisfaction of S j . We define the tradeoff ratio between
Si and S j as a positive number ρ such that, for fixed subutility functions u i , u j , for Si , S j ,
we have
u(m) = ρ ∗ u i (m) + u j (m) + κ,
(35)
where κ is a constant that depends on m and represents the values returned by other subutility
functions on input m.
A tradeoff ratio may not exist. It may be the case that the amount of S_i a decision maker will sacrifice for one unit of S_j depends on the amount of S_i. For example, in economics it is common to model the utility of money (u_i) as proportional to the logarithm of the amount of money. Thus the more money a decision maker has, the less utility additional money provides. Intuitively, the more money a person has (satisfaction of S_i), the more willing they are to spend $100 on a meal (satisfaction of S_j).
The basic problem is as follows. A well-known result from utility theory (Keeney and Raiffa 1976) states that mutual utility independence implies the existence of a utility function defined as an additive combination of subutility functions. While we use a somewhat different conception of utility independence (generalized additive independence), we might still hope for such a guarantee in our case. However, our subutility functions are of a restricted form (described below); thus, there are certain types of preferences, representable with ceteris paribus preference statements, that our utility decomposition cannot represent.

We call this problem the nonlinear tradeoff ratio problem, because we view it as a problem of taking two approximately linear functions, u_i and u_j, and specifying a tradeoff ratio between them. The graphical subutility functions are limited in the following way. For the minimizing GUFs using strict preferences, for each integer n ∈ {1, 2, . . . , max(u_i)}, there exists some model m such that u_i(m) = n. The other GUFs have a similar problem: if m_1 ≻ m_2, the size of (u_i(m_1) − u_i(m_2)) is arbitrary. There is no way of specifying, within one subutility function, that some model is "much more preferred" to another model, or that some model is "ten times as desirable" as another model. This is a natural shortcoming of our formalism of qualitative ceteris paribus preference statements.
It is perhaps remarkable that qualitative ceteris paribus preference statements can concisely imply particular nonlinear tradeoffs between subutility functions. Ceteris paribus preferences of the following form imply conditions on the tradeoff ratio. A preference rule r implies a bound on the tradeoff ratio of feature sets S_i, S_j if:

- r specifies values over at least two UI feature sets S_i, S_j;
- there exists (m_1, m_2) |= r with u_i(m_2) > u_i(m_1) and u_j(m_1) > u_j(m_2), for u_i, u_j subutility functions for S_i, S_j;
- m_1, m_2 are such that ∑_k u_k(m_1) = ∑_k u_k(m_2) for k ≠ i, j.

The bound that r implies on the tradeoff ratio ρ is as follows. With m_1, m_2 as given above, we have

    ρ · u_i(m_1) + u_j(m_1) + κ > ρ · u_i(m_2) + u_j(m_2) + κ,

where κ represents ∑_k u_k(m_1) = ∑_k u_k(m_2) for k ≠ i, j. This implies

    (u_j(m_1) − u_j(m_2)) > ρ (u_i(m_2) − u_i(m_1)).

So

    ρ < (u_j(m_1) − u_j(m_2)) / (u_i(m_2) − u_i(m_1)).
Note that the quantities (u_i(m_2) − u_i(m_1)) and (u_j(m_1) − u_j(m_2)) are positive. Clearly, there can be several constraints on the tradeoff ratio for S_i and S_j, and they may or may not admit a solution. When these constraints do not admit a solution (e.g., ρ > 3 and ρ < 2), we can fix this by defining S_k = S_i ∪ S_j as a new utility-independent set in, and removing S_i, S_j from, the set S of UI feature sets.
We give here a simple example of such preferences. Suppose the sets S_i = {f_1, f_2} and S_j = {f_3} are utility independent. The following preferences

    010 ≻ 000
    100 ≻ 010
    001 ≻ 000

imply that 10 ≻ 01 ≻ 00 on S_i, and on S_j we have 1 ≻ 0. To establish a tradeoff ratio between features S_i and features S_j, we can state

    010 ≻ 001,    (36)

implying that satisfaction of S_i is more important than satisfaction of S_j (it pairs a desirable value of S_i with an undesirable value of S_j and states this is preferred to the opposite). However, it is possible to present several other preferences that specify different satisfaction ratios for different levels of satisfaction of S_i and S_j. Consider

    011 ≻ 100,    (37)

where we pair a desirable value for S_i with an undesirable value for S_j and compare it to a desirable value for S_j combined with an undesirable value for S_i. This specifies that increasing S_j from 0 to 1 is more important than increasing S_i from 01 to 10. Asserting the preferences in both equations (36) and (37) at the same time is consistent, but it is not representable by our utility functions. This is because we have a linear combination of linear subutility functions.
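To see the inconsistency concretely, we can derive the bounds on ρ that preferences (36) and (37) impose, using the subutility values û_i(00) = 0, û_i(01) = 1, û_i(10) = 2 and û_j(0) = 0, û_j(1) = 1 implied by the preferences above. This is a sketch; the helper function and the model-as-string encoding are ours:

```python
from fractions import Fraction

# Subutility values implied by 10 ≻ 01 ≻ 00 on S_i and 1 ≻ 0 on S_j.
u_i = {"00": 0, "01": 1, "10": 2}
u_j = {"0": 0, "1": 1}

def rho_bound(m1, m2):
    """Bound on rho implied by the rule m1 ≻ m2 (models written f1 f2 f3,
    with S_i = {f1, f2} and S_j = {f3}). From
        rho*u_i(m1) + u_j(m1) > rho*u_i(m2) + u_j(m2)
    we get a lower bound on rho when u_i gains and u_j loses, and an
    upper bound when u_i loses and u_j gains."""
    di = u_i[m1[:2]] - u_i[m2[:2]]   # change in S_i satisfaction
    dj = u_j[m1[2]] - u_j[m2[2]]     # change in S_j satisfaction
    if di > 0 and dj < 0:
        return (">", Fraction(-dj, di))   # rho > -dj/di
    if di < 0 and dj > 0:
        return ("<", Fraction(dj, -di))   # rho < dj/(-di)
    return None                           # rule places no bound on rho

bound36 = rho_bound("010", "001")   # from preference (36)
bound37 = rho_bound("011", "100")   # from preference (37)
```

Preference (36) forces ρ > 1 while preference (37) forces ρ < 1, so no single linear tradeoff ratio exists; merging S_i and S_j is the remedy described in the text.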
A nonlinear tradeoff ratio between two feature sets S_i, S_j can be fixed by merging the sets S_i, S_j. We can define S' = S, except that S' contains the set S_k, where S_k = S_i ∪ S_j, and S' does not contain either of S_i or S_j, just as we described in Section 5.7. We leave it to future work to find more efficient ways of solving the nonlinear tradeoff ratio problem.
5.10. Complexity

The running times of the SAT-solver and linear-inequality-solver steps can be prohibitive. We consider here the time taken for both steps, for some different preference structures. We will discuss the following cases: when each feature f_i ∈ F(C) is UI of every other feature, when none are UI of any other feature, and when each UI feature set is of size equal to the log of the number of features in F(C). There are two different complexity questions to ask. One is the time and space required to construct u. The other is the time and space required to evaluate u(m) on a particular model m.
In the following, we assume that |F(C)| = N and that the number of utility-independent sets in the partition S is k. The time required for the SAT solver depends on both the number of clauses and the number of boolean variables appearing in the SAT problem. The number of boolean variables is bounded by |C*| · k, and the number of clauses depends on the number of conflicts among rules, upon which we do not have a good bound. The SAT clauses are of two forms. The first type ensures that every rule agrees with some UI set. Thus, these clauses have one variable per UI set S_i such that (m_1, m_2) |= r implies that m_1 and m_2 differ on S_i. We upper-bound this quantity by k. The second type ensures that at least one rule of a conflict is ignored. These clauses have a number of boolean variables equal to the number of rules
involved in the conflict. The number of rules in each conflict is bounded by |C*| and by 2^{|S_i|}, since there can be only 2^{|S_i|} different rules per utility-independent set. In pathological cases, the number of conflicts could be as high as |C*|!, if every rule conflicts with every other set of rules. However, it is difficult to imagine such scenarios. In particular, we assume the preferences C* are consistent, and they will therefore have a low number of conflicts. Still, we have no good bound on the number of conflicts, and since there is no known polynomial-time algorithm for the satisfiability of our CNF formula, the SAT portion of the algorithm will have worst-case exponential time cost unless we can bound the number of conflicts and rules by O(log(|F(C)|)). Simple counterexamples exist that have O(|F(C)|) conflicts.
Thus, the efficiency of our algorithm depends on heuristic methods to both (1) reduce the number of conflicts and (2) quickly solve SAT problems. In the first case, we speculate that careful translation of the domain into features can reduce the number of conflicts, and work in progress investigates this intuition. In the second case, our preliminary implementation suggests that fast randomized SAT-solvers (e.g., WalkSAT; Selman, Levesque, and Mitchell 1992) quickly solve our constraint problems.
We have already shown that the number of linear inequalities contributed by a rule is less than 2^{|Sa(r)∪Sd(r)|}. More precisely, for each rule, there is one inequality for each combination of the maximum and minimum of u_i(m_1) − u_i(m_2) for each subutility function. For the special case where i is such that |S_i\s(r)| = 0, i contributes exactly one term, since the minimum and maximum are the same. Thus each rule contributes

    ∏_{i∈Sa(r)∪Sd(r)} { 2 if |S_i\s(r)| ≥ 1;  1 if |S_i\s(r)| = 0 }

linear inequalities.
First we consider two extreme cases. Suppose that every feature is found to be utility
dependent upon every other feature, so that the partition S is S1 = { f 1 , f 2 , . . . , f N }, and
k = 1. Then there can be no conflicts, since every rule must agree with u 1 . Thus, there are
no SAT clauses and no linear inequalities needed. However, the evaluation time of u(m) is
the time taken to traverse a model graph of all N features. We argued in Section 3 that this
is worst-case exponential in N .
On the other hand, suppose that every feature is found to be utility independent of every other feature, so that k = N. The number of inequalities per rule is then exactly one, and the total number of inequalities equals |C*|. The evaluation time of u(m) is order N, since we evaluate N graphs of size 2.
Now suppose we have a more balanced utility decomposition. Suppose each utility-independent set is of size log N. Then k = N/(log N). The number of inequalities per rule is less than 2^{|Sa(r)∪Sd(r)|}, and since |Sa(r) ∪ Sd(r)| ≤ N/log N, this is less than 2^{N/log N}. Thus the total number of inequalities is less than |C*| · 2^{N/log N}. The evaluation of u(m) requires traversing k graphs, each of size 2^{log N}. Thus evaluating u(m) is order N²/(log N).
If the size of the utility-independent sets grows larger than log N, then we will not have polynomial-time evaluation of u(m). In addition, the number of possible linear inequalities goes up exponentially with the number of utility-independent sets each rule overlaps with.

When we check rules to see if nonlinear tradeoff ratios are required, we calculate several subutility values u_i(m) for each preference statement. This is no more costly computationally than other aspects of our algorithm. Calculating u_i(m) is potentially O(2^{|S_i|}), and we might have to calculate this O(2^{|S_i\s(r)|}) times for each rule r. Our standard complexity assumption is that our algorithm is efficient if |S_i| is of reasonable size. When |S_i| is of size log N, then this step is on the order of O(|C*| · N²), where |C*| is the number of input preferences.
Thus, without severely impacting performance of utility construction, we can check rules to
determine when nonlinearity occurs.
In conclusion, given a favorable decomposition of the feature space into utility-independent sets, and few conflicts, we can create a utility function in time O(max_r(2^|Sa(r)∪Sd(r)|) + |C∗|² + max_i 2^|Si|). The first term is for the number of inequalities, the second term is for the utility independence computation, and the third term is for actually constructing the subutility functions u_i. We can then evaluate u(m) in time O(max_i 2^|Si|).
5.11. Summary
In this section, we have presented a number of methods of assigning scaling parameters to the utility function u(m) = Σ_i t_i u_i(m) and choosing which subutility functions should be 1-consistent with which rules. Some methods are faster than others. However, the methods presented here are not complete: they may fail to produce a utility function from ceteris paribus preferences C∗. When the methods fail, it is always possible to join partition elements S_i, S_j ∈ S together to perform constraint satisfaction and linear programming on a smaller problem. Work in progress addresses optimal choice of which feature sets to join together.
We note that a lexicographic ordering over a subset of the features is a strong heuristic. It is possible to arrange as many of the features as possible into a lexicographic ordering, then use the SAT-based conflict resolution system and linear inequalities to assign the remaining utility-independent sets and scaling parameters. However, this method remains to be analyzed. We should not assume it is entirely correct; it places unnecessary constraints on the linear inequalities, which may prevent our linear-inequalities algorithm from finding a solution.
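One way to realize the lexicographic part of such a hybrid scheme is through the scaling parameters themselves. Assuming each subutility takes values in a known range [0, B] (our assumption; the construction below is a standard weighting trick, not a method from this paper, and `lexicographic_weights` is a name of our choosing), geometrically growing weights make every higher-priority set dominate all lower-priority ones:

```python
def lexicographic_weights(k, bound):
    """Scaling parameters t_i realizing a lexicographic ordering over k
    subutility functions whose values lie in [0, bound]: each weight
    strictly exceeds the largest total the lower-priority terms can reach."""
    return [(bound + 1) ** (k - 1 - i) for i in range(k)]

t = lexicographic_weights(3, 9)
print(t)   # -> [100, 10, 1]
# A one-point advantage in the highest-priority subutility outweighs any
# combination of values in the remaining ones: 100 > 9*10 + 9*1 = 99.
```

The remaining, non-lexicographic sets would then get their weights from the linear-inequality stage as usual.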
The techniques in this section apply to sets of strict preferences and not to sets of weak
preferences. In future work, we plan to adapt the methods presented here to allow a mix of
weak and strict preferences.
6. A COMPLETE ALGORITHM
In the preceding, we have described several parts of an algorithm for computing with
qualitative ceteris paribus preference statements. This has given us enough tools to accomplish our goal: to find a utility function u such that p(u) ∈ [C] for a set of ceteris paribus
preferences C. In the following, we reiterate how these techniques can be woven together
into a complete algorithm.
Our algorithm has several steps. First, the algorithm takes as input a set C of ceteris
paribus preference statements in the language L, presented in Section 1.2, and a type of
graphical utility function g (from Section 3). These statements are logical statements over a
space of features F. The algorithm outputs a utility function u consistent with C. The steps
of the algorithm are as follows:
1. Compute the set of relevant features F(C), which is the support of C.
2. Compute C∗ = σ(C). The translation σ(C) is defined in Section 2.3 and converts a set of preferences C in language L to a set of preferences C∗ in the language L(V).
3. Compute a partition S = {S_1, S_2, . . . , S_Q} of F(C) into utility-independent feature sets, and compute T = {T_1, T_2, . . . , T_Q} such that S_i is utility dependent on T_i and utility independent of F(C)\T_i, using function IUDecomposition, as discussed in Section 4.2. Let S′ = {S′_1, S′_2, . . . , S′_Q} be such that S′_i = (S_i ∪ T_i).
4. Construct relevant rule sets R_i for each S′_i. R_i is defined to be the set of rules r ∈ C∗ such that (m_1, m_2) |= r ⇒ (m_1 ↓ S′_i) ≠ (m_2 ↓ S′_i).
5. Construct the restricted model graph G i (Ri ) for each set of rules Ri .
6. Compute a minimal and complete set Yi of cycles Yik for each graph G i (Ri ) such that
each cycle Yik is a set of rules from Ri .
7. Construct the satisfiability problem P(C ∗ , S) from all cycles Yik .
8. Find a solution to P(C ∗ , S).
9. Choose conflict-free rule sets R̄_i ⊆ R_i using the solution of P(C∗, S), as given in equation (13).
10. Construct cycle-free restricted model graphs G_i(R̄_i).
11. Define each subutility function u_i to be the graphical subutility function of type g based on G_i(R̄_i).
12. Use Definition 12 to construct I (C ∗ , S, R), a system of linear inequalities relating the
parameters ti .
13. Solve I (C ∗ , S, R) for each ti using linear programming.
(a.) If I(C∗, S, R) has a solution, pick a solution, and use the solution's values for u_i and t_i to construct and output a utility function u(m) = Σ_i t_i u_i(m).
(b.) If I (C ∗ , S, R) has no solution, construct u to be the GUF of type g based on G(C ∗ ),
and output u.
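The thirteen steps above can be exercised end to end on a drastically simplified instance. Everything in this sketch is our own illustrative scaffolding: rules are single-feature preferences (so the partition is into singletons, conflict cycles reduce to a feature preferred both true and false, and the SAT and linear-programming stages degenerate to trivial choices); the paper's general machinery appears only in the comments.

```python
def build_utility(rules):
    # Step 1: relevant features F(C) = support of the rules.
    F = sorted({f for f, _ in rules})
    # Steps 2-3: with single-feature rules, every feature is utility
    # independent of the rest, so the partition is into singletons.
    S = [[f] for f in F]
    # Step 4: relevant rule set R_i = rules mentioning the set's feature.
    R = [[(f, v) for f, v in rules if f in Si] for Si in S]
    # Steps 5-6: the restricted model graph has nodes {0, 1}; a conflict
    # cycle appears iff a feature is preferred both true and false.
    cycles = [len({v for _, v in Ri}) > 1 for Ri in R]
    # Steps 7-9: the SAT stage would select which rules to drop to break
    # each cycle; here we simply discard every rule of a conflicted set.
    R_bar = [[] if cyc else Ri for Ri, cyc in zip(R, cycles)]
    # Steps 10-11: graphical subutility on the (two-node) cycle-free graph.
    def make_ui(Ri):
        prefs = dict(Ri)
        return lambda m: sum(1 for f, v in prefs.items() if m[f] == v)
    subutils = [make_ui(Ri) for Ri in R_bar]
    # Steps 12-13: with disjoint single-feature rules, the linear
    # inequalities are satisfied by t_i = 1 for every set.
    t = [1.0] * len(S)
    return lambda m: sum(ti * ui(m) for ti, ui in zip(t, subutils))

u = build_utility([("a", 1), ("b", 0)])
print(u({"a": 1, "b": 0}))   # -> 2.0 (satisfies both preferences)
print(u({"a": 0, "b": 1}))   # -> 0.0 (satisfies neither)
```

The real algorithm differs at every step, of course: the partition is computed from the preferences, the graphs can be exponentially large, and the SAT and LP stages do genuine work.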
This algorithm is correct and complete, and it is possible to prove as much by reference to the preceding theorems. We do so now.
Theorem 16 (Completeness). Given a set of ceteris paribus preferences C, the above algorithm produces a utility function u such that p(u) ∈ [C].
Proof. Theorem 1 shows that C ∗ = σ (C) represents the same preference as C. Therefore,
any utility function u such that p(u) ∈ [C ∗ ] satisfies p(u) ∈ [C]. By Theorem 6, the partition
S is a partition of F(C). Lemma 4 implies that a minimal and complete set of conflicts Yi
for rule set Ri can be computed by performing cycle-detection on the restricted model graph
G i (Ri ). By Definitions 10 and 11, S, R, Y , are enough to construct a satisfiability problem
P(C∗, S). By Lemma 4, the solution to P(C∗, S) allows choosing each R̄_i conflict free. By Lemma 4, each restricted model graph G_i(R̄_i) is cycle free. It is then possible to build and solve a set of linear inequalities I(C∗, S, R̄), as given in Definition 12. If I(C∗, S, R̄) has a solution, then this solution provides values for each t_i. By Theorem 15, u(m) = Σ_i t_i u_i(m) is a utility function consistent with C∗. If I(C∗, S, R̄) has no solution, then the algorithm outputs u, the graphical utility function of type g based on G(C∗). By Theorems 2–4, this is a utility function consistent with C∗.
7. IMPLEMENTATION
We implemented the general form of the algorithm laid out in this work. The implementation is done in C# and available from www.medg.lcs.mit.edu/people/~mmcgeach. The implementation has an interface allowing atoms, models, rules, and preference operators to be defined. The internal representations of models are strings, or sometimes bit vectors packed into integers when it is known the models will be less than 32 bits long (as is appropriate for a model restricted to a subutility function; if the subutility functions have more than 32 features, the algorithms are entirely intractable). We translated Java code for solving SATs in
CNF and for solving linear programming problems. We note that much translation between
Java and C# can be done automatically. Our SAT solver was a translation into C# of a translation into Java by Daniel Jackson,1 of the original WalkSat algorithm written by Henry Kautz
and Bart Selman in C (Selman et al. 1992). Similarly, we translated a Linear Programming
package called JLAPACK,2 which is a Java translation of the popular C library CLAPACK,
which is itself a translation of the popular Fortran LAPACK (Anderson et al. 1990).
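The bit-vector packing of models mentioned above is straightforward. The following sketch (in Python rather than the implementation's C#, with function names of our own choosing) shows packing a small binary model into a machine word and restricting it to a subutility function's feature set:

```python
def pack(model_bits):
    """Pack a list of 0/1 feature values (fewer than 32) into one int,
    with feature 0 as the low-order bit."""
    assert len(model_bits) < 32
    word = 0
    for i, b in enumerate(model_bits):
        word |= (b & 1) << i
    return word

def restrict(word, feature_indices):
    """Project a packed model onto a subutility function's feature set,
    yielding a smaller packed model over just those features."""
    out = 0
    for j, i in enumerate(feature_indices):
        out |= ((word >> i) & 1) << j
    return out

m = pack([1, 0, 1, 1])        # model with features 0, 2, 3 true -> 13
print(restrict(m, [0, 2]))    # features 0 and 2 are both true -> 3
```

Restricted models then index directly into per-set structures, which is why the representation only pays off when each subutility's feature set is small.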
Our informal experience with the implementation and running the algorithm on random
and constructed test problems indicates several trends. The first is that many examples exhibit
a great degree of utility independence, and this causes our algorithm to function quite well.
The second is that the cases that have many complicated preference rules lead to difficult SAT
problems and large systems of linear inequalities. The third is that most of these “difficult”
cases turn out to be cases of inconsistent preferences in the first place, but verifying the
inconsistency is quite difficult. Determining if preferences are inconsistent is computationally taxing: we have no method better than exponential complexity (in the number of features), and we believe the exact complexity of the problem is unknown.
8. RELATED WORK
Our work has some general characteristics that set it apart from other work in Artificial
Intelligence dealing with utility theory. In general, the literature usually assumes that some
human-centric method is used to assess a human decision maker’s preferences and utilities
over a feature space (such as described in Keeney and Raiffa 1976). Our work produces such
a function from preference statements. Further, our work contains a method for automatically
computing utility independence, which is generally either assumed or gathered directly from
the decision maker. And finally, no other work actually computes a numeric utility function
from purely qualitative preferences. On the other hand, our work does not treat probabilistic
actions, and therefore does not compute the expected utility of an action or outcome. Researchers who are concerned to treat probability and compute expected utilities frequently
spend most of their effort on such issues.
Many researchers have defined logics of preference and desire (Shoham 1997; van der
Torre and Weydert 1998a; La Mura and Shoham 1999), several of them even define logics
of ceteris paribus statements (Tan and Pearl 1994a, 1994b; Bacchus and Grove 1995, 1996;
Boutilier et al. 1999). We mention these in the following sections.
8.1. Related Systems
van der Torre and Weydert (1998a) pursue the natural idea of giving goals semantics
in terms of desires, but employ a different interpretation from the ceteris paribus semantics
used by Wellman and Doyle (1991) to interpret goals. Following Boutilier (1994), van der
Torre and Weydert define a conditional goal as a → b meaning if a obtains, then try to
achieve b. They are concerned to elevate goals to the semantic level of desires, and propose
a system wherein they assign each goal an a priori utility. They then add the utilities of
the goals together when several goals are accomplished simultaneously, and allow goals to
have negative impact if they are not satisfied. In van der Torre and Weydert (1998b), they
allow desires to have strength parameters that determine which desire takes precedence when
1 Available at http://sdg.lcs.mit.edu/~dnj/walksat/.
2 Siddhartha Chatterjee, The Harpoon Project, Available at http://www.cs.unc.edu/Research/HARPOON/jlapack/.
two desires conflict. In van der Torre and Weydert (2001) and Lang, van der Torre, and Weydert (2002), they mature their conception of building a utility function from
quantified desires. They present several compelling examples of their logic capturing the
preferences underlying intuitive experiences (Sue wants to marry John, and to marry Bill,
but not simultaneously; I wish not to have a fence, but wish to have a dog, and if I have a dog
then I wish to have a fence).
Lang, van der Torre, and Weydert construct utility functions from desires from a very
different starting point than in our construction. From the start, they assume each goal has
utility independent of every other goal, and what is more, that this utility is known. Thus,
they have desire one worth 3, desire two worth 5, and desire three worth 7.3, and the utility
of an outcome is just the sum of the desires that are true in the outcome. In our construction
from ceteris paribus preferences, such knowledge comes only after much analysis of the set
of preferences, rather than as something demanded of the human user.
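In miniature, the weighted-desire scheme just described looks like the following sketch (the identifiers are ours; the weights are the illustrative values above):

```python
# Each desire carries a fixed a priori utility; an outcome's utility is
# simply the sum of the weights of the desires it satisfies.
desires = {"one": 3.0, "two": 5.0, "three": 7.3}

def outcome_utility(satisfied):
    """Utility of an outcome, given the set of desires it makes true."""
    return sum(desires[d] for d in satisfied)

print(outcome_utility({"one", "three"}))   # 3.0 + 7.3 -> 10.3
```

The contrast with our construction is that here the additive decomposition and the numeric weights are inputs, whereas we derive both from the qualitative preference statements.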
As may be evident, this system falls prey to many of the problems of quantitative knowledge representation systems. Small variations in the parameter values render several of the
examples counter-intuitive. By their own admission (van der Torre and Weydert 2001, p. 299),
“the ad hoc adjustment of parameters clearly shows the advantages and drawbacks of our
approach.” They regard fine-grained control over the form of the utility function as an advantage of their approach. However, the necessity to resort to such fine-grained control in
analyzing intuitive examples is a disadvantage.
La Mura and Shoham use a very different representation, called expected utility networks (Shoham 1997; La Mura and Shoham 1999), wherein they reason about utilities the
way researchers have previously reasoned about probabilities. They discuss additive utility
decomposition, wherein a subutility is called a “factor” which is a constant utility amount
contributed to a situation if the situation makes the factor true (similar to the work of Lang,
van der Torre, and Weydert). With a utility decomposition in hand, they define “Bayes Nets
for utilities,” and develop some powerful reasoning methods using these nets. These reasoning methods work in a manner similar to Bayesian network methods, in which an expected
utility can be computed from the topology of the network and the probability and utility
functions present at each node.
8.2. Ceteris Paribus Reasoning Systems
Tan and Pearl (1994a, 1994b) use conditional ceteris paribus comparatives. While these
do use a ceteris paribus interpretation of preferences, they condition the statement on some
set of worlds. Thus the ceteris paribus rule only holds if some other logical condition (or
set of worlds) is also true. They assume each desire is quantified, α ≻ β by δ, such that the utility of α exceeds that of β by at least δ. This quantification aside, their unconditional
ceteris paribus semantics are equivalent to those used here. In Tan and Pearl (1994a) they are
concerned to deal with the specificity of the rules, and allow a more specific rule to override
a more general rule, which is a difficult problem our work does not address. Tan and Pearl do
not deal with the computation of preference queries, that is, which outcomes are preferred to
which other outcomes.
Bacchus and Grove have presented a slightly different conception of ceteris paribus preference specification (Bacchus and Grove 1996) and algorithms for computing with it (Bacchus
and Grove 1995). Similar to La Mura and Shoham (1999), their computation paradigm is
an adaptation of Bayesian Networks to representing utility. Their use of conditional additive utility independence of sets of different features to define graphical representations is
similar to our use of unconditional additive utility independence, but their graph represents
dependencies between features. They then compute a simplified form of the utility function
from the cliques of the graph. Our work uses the preferences themselves to compute utility
independence, and the preferences to define the form of the utility function.
Bacchus and Grove (1996) contrast their preference representation with that of Doyle
et al. (1991). Bacchus and Grove use a conditional expected utility measure, the expected
utility divided by the total probability of occurrence, where if the sum of expected utility of
states satisfying φ is greater than those satisfying ¬φ then there is a preference for φ over ¬φ.
Their representation requires a priori knowledge of all the utilities and all the probabilities
of states satisfying φ and ¬φ.
Although Bacchus and Grove’s aims are similar to ours in that they try to compute utility
functions with ceteris paribus preferences, their approach differs from our own in several
major ways. Firstly, they do not treat qualitative preferences, although they exploit qualitative
representations of preferential independence and dependence. Secondly, they assume the
independence information as a given rather than seek to infer it from preference statements.
Thirdly, they do not discuss methods for constructing subutility information. Fourthly, their
work is complicated by their use of conditional additive independence (independence may
fail sometimes) and the involvement of Bayes nets for probability computation.
In Boutilier et al. (1997, 1999), Boutilier, Brafman, Geib, Hoos, and Poole propose a system of conditional ceteris paribus preference statements. They use the conditions on the
ceteris paribus statements to define a graph structure, called a CP-Net. Each node is a feature,
and is annotated with a table describing the user’s (human decision maker’s) qualitative
preferences given every combination of parent values.
A CP-Net allows complicated conditions on a ceteris paribus preference, but the preference itself must be restricted to a single utility-independent attribute. For comparison,
our method allows both complicated conditions (Section 1.4) and complicated preferences
spanning several utility-independent feature sets. The nodes in a CP-Net are annotated with
tables of utilities exponential in the number of features involved. This is equivalent to our
(worst-case) exponential time requirement to evaluate a subutility function for a UI feature
set. Nodes in the CP-Net must then be traversed to compute a dominance query. In Boutilier
et al. (1999) they provide some (weak) search methods for this traversal, and argue for some strong heuristics. In many cases, their heuristics suffice to assure the search is backtrack
free (and thus polynomial). Further, Domshlak and Brafman show that CP-Nets with a variety of structural properties can be searched in worst-case polynomial time (Domshlak and
Brafman 2002). In most cases, computing with a CP-Net is linear in the number of features. Thus, the complexity of comparing two models is similar to the complexity of
computing the utility of a model in our system, if we compile each subutility function into
an (exponential-size) look-up table.
Recent work on CP-Nets has combined a CP-Net with a quantitative utility net, allowing the utility of a model to be computed with the new structure (Boutilier et al. 2001). Therein they give substantial treatment of how to elicit preferences from human decision makers, and how to massage these preferences to conform to the “UCP-Net”
representation. Interestingly, they obtain a series of constraints on the tradeoff weights between utility-independent feature sets, then use linear programming to solve for the tradeoff
parameters. In this framework, it could be said that a UCP-Net solves a similar problem to
our own: that it takes qualitative ceteris paribus preference statements and computes a utility
function for these, but it does so with the addition of a quantitative utility net.
Another interesting extension of CP-Nets is due to Brafman and Domshlak (2002). They
provide another type of arc to the network, indicating a tradeoff ratio between the connected
nodes. The arc is then annotated with a complete preference ordering of all the possible
combinations of the features joined by the arc. The resulting network is termed a TCP-Net.
Adding tradeoff ratios actually addresses one of the great limitations of CP-Nets: that each of
a decision maker’s preferences may refer only to a single utility-independent feature. In our
representation such preferences can be used to indicate a tradeoff ratio between features. Thus,
it is clear that TCP-Nets are able to represent many more possible preference preorders over
a domain than CP-Nets can, but the exact relation of TCP-Nets to ceteris paribus preferences
of Doyle et al. (1991) is unknown.
9. FUTURE WORK
Future work extending the system proposed herein can be divided into two groups. The
first is work using the current ceteris paribus representation and formalism. The second is
work that alters or augments the representation presented in some way.
There are a few avenues of future research that we mentioned briefly before. We can
attempt to synthesize the treatment of weak preferences in Section 3 with the way we build
subutility functions. It may be possible to just pretend all weak preferences are actually strict preferences, and use constraint satisfaction to choose one or the other weak preference when we have two asserting m ⪰ m′ and m′ ⪰ m. Or we can allow these both to be satisfied by allowing a cycle in u_i from m to m′. If we allow a cycle from m to m′, that is, m ∼ m′, we must define this u_i to disagree with any strict preference that asserts either that m ≻ m′ or that m′ ≻ m.
We might explore the possible uses of partial conflict-free orderings of mutually utility-independent feature sets. It seems unlikely in general that entire conflict-free orderings will be
found, but more likely that partial conflict-free orderings would arise. It may be straightforward to assign a partial lexicographic ordering to a partial conflict-free ordering, then order
the remaining sets using constraint satisfaction. This approach is likely to complicate the
definitions and use of linear inequalities.
Nonlinear scaling could be addressed in a more intricate manner. Composing nonlinear
functions with the subutility functions could remedy the issue. Or, conditioning the value of
the scaling parameter by the value of the subutility function would achieve the same result. In
either case, the important task will be determining the exact form of the nonlinearity required.
We plan to extend the representation of qualitative preferences given herein in several
ways. Firstly, we might give a treatment of nonbinary discrete valued features. For this,
the work of Wellman on Qualitative Probabilistic Networks (Wellman 1990) might be an
appropriate starting point. The natural extension of that is to treat continuous-valued features.
From a purely speculative standpoint, treating continuous valued variables could require
a richer (and less qualitative) preference specification language. One extension could be a
representation similar to partial differential equations. The decision maker might specify a
partial relation between two features, such as “A is 10-times as desirable as B.” This has a
natural representation in terms of partial differential equations.
Alternatively, our representation could be fractured into several representations of various
decision theoretic concepts. Our feature vector language mixes together representations of
utility independence, tradeoff ratios between mutually utility-independent feature sets, and
preferences within a feature set. Clearly some of the power of the representation presented
herein comes from this combination, but it may be possible to achieve faster reasoning
methods by splitting the representations. However, this direction would bring our work much
closer to the work of Bacchus and Grove, and Boutilier et al. as described in Section 8.2.
10. CONCLUSIONS
Although much work remains to be done in the area of reasoning with ceteris paribus
preferences, the work demonstrated herein has provided some answers. We have shown
a translation between the logical representation of ceteris paribus preferences and a new,
feature-vector representation of ceteris paribus preferences. The feature-vector representation makes it simple to construct a model graph representing a particular set of preferences.
This graph makes the definition of numeric utility functions simple, although of worst-case
exponential complexity. We have then shown how additive utility independence between
features can be inferred from the structure of preference statements in the feature-vector representation. Finally, we demonstrated some heuristic methods that can be used in the presence
of utility independence to achieve better performance on computing the numeric utility of a
model, according to a fixed set of preferences.
The work herein has provided several reasoning methods using the ceteris paribus representation of Doyle et al. (1991), for which there previously were no explicit methods
developed in detail. There have been other representations of preferences, some of which
have well-developed reasoning methods. Future applications involving the automatic specification of preferences, and computation with them, may now choose to use the representation
method of Doyle, Shoham, and Wellman and the reasoning methods presented in this paper.
ACKNOWLEDGMENTS
The authors wish to thank Howard Shrobe and Peter Szolovits for their assistance and advice. We also thank the anonymous reviewers for their extremely helpful comments. This work
was supported in part by DARPA under contract F30602-99-1-0509. Michael McGeachie is
supported in part by a training grant from the National Library of Medicine, and a grant from
the Pfizer corporation.
REFERENCES
ANDERSON, E., Z. BAI, C. BISCHOF, J. DEMMEL, et al. 1990. LAPACK: A portable linear algebra library for high-performance computers. In Proceedings of Supercomputing. Knoxville.
BACCHUS, F., and A. GROVE. 1995. Graphical models for preference and utility. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann, San Francisco, CA, pp. 3–19.
BACCHUS, F., and A. GROVE. 1996. Utility independence in a qualitative decision theory. In Proceedings of
the Fifth International Conference on Knowledge Representation and Reasoning. Morgan Kaufmann, San
Francisco, CA, pp. 542–552.
BOUTILIER, C. 1994. Toward a logic for qualitative decision theory. In Proceedings of the Fourth International
Conference on Principles of Knowledge Representation and Reasoning (KR-94). Bonn, pp. 76–86.
BOUTILIER, C., F. BACCHUS, and R. L. BRAFMAN. 2001. UCP-networks: A directed graphical representation of
conditional utilities. In Proceedings of Seventeenth Conference on Uncertainty in Artificial Intelligence.
Seattle, pp. 55–64.
BOUTILIER, C., R. BRAFMAN, C. GEIB, and D. POOLE. 1997. A constraint-based approach to preference elicitation
and decision making. In Working Papers of the AAAI Spring Symposium on Qualitative Preferences in
Deliberation and Practical Reasoning. Edited by J. Doyle and R. H. Thomason. AAAI, Menlo Park, CA, pp.
19–28.
BOUTILIER, C., R. I. BRAFMAN, H. H. HOOS, and D. POOLE. 1999. Reasoning with conditional ceteris paribus
preference statements. In Proceedings of Uncertainty in Artificial Intelligence 1999 (UAI-99).
BRAFMAN, R. I., and C. DOMSHLAK. 2002. Introducing variable importance tradeoffs into CP-nets. In Proceedings
of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI-02). Morgan Kaufmann, San
Francisco, CA.
CHVÁTAL, V. 1983. Linear Programming. W. H. Freeman and Company, New York.
DOMSHLAK, C., and R. I. BRAFMAN. 2002. Cp-nets—reasoning and consistency testing. In Proceedings of the
Eighth International Conference on Principles of Knowledge Representation and Reasoning (KR-02). Morgan Kaufmann, San Francisco, CA, pp. 121–132.
DOYLE, J., Y. SHOHAM, and M. P. WELLMAN. 1991. A logic of relative desire (preliminary report). In Proceedings
of the Sixth International Symposium on Methodologies for Intelligent Systems. Edited by Z. Ras. Lecture
Notes in Computer Science. Berlin, Springer-Verlag, pp.16–31.
DOYLE, J., and R. H. THOMASON. 1999. Background to qualitative decision theory. AI Magazine, 20(2):55–68.
KEENEY, R., and H. RAIFFA. 1976. Decisions with Multiple Objectives: Preferences and Value Tradeoffs. Wiley and Sons, New York.
LA MURA, P., and Y. SHOHAM. 1999. Expected utility networks. In Proceedings of Fifteenth Conference on
Uncertainty in Artificial Intelligence, pp. 366–373.
LANG, J., L. VAN DER TORRE, and E. WEYDERT. 2002. Utilitarian desires. Autonomous Agents and Multi-Agent
Systems, 5:329–363.
MCGEACHIE, M., and J. DOYLE. 2002. Efficient utility functions for ceteris paribus preferences. In AAAI Eighteenth National Conference on Artificial Intelligence. Edmonton, Alberta.
SELMAN, B., H. J. LEVESQUE, and D. MITCHELL. 1992. A new method for solving hard satisfiability problems.
In Proceedings of the Tenth National Conference on Artificial Intelligence. Edited by P. Rosenbloom and P.
Szolovits. AAAI Press, Menlo Park, CA, pp. 440–446.
SHOHAM, Y. 1997. A mechanism for reasoning about utilities (and probabilities): Preliminary report. In Working
Papers of the AAAI Spring Symposium on Qualitative Preferences in Deliberation and Practical Reasoning.
Edited by J. Doyle and R. H. Thomason. AAAI, Menlo Park, CA, pp. 85–93.
TAN, S.-W., and J. PEARL. 1994a. Qualitative decision theory. In AAAI-94. AAAI Press, Menlo Park, CA.
TAN, S.-W., and J. PEARL. 1994b. Specification and evaluation of preferences for planning under uncertainty.
In KR94. Edited by J. Doyle, E. Sandewall, and P. Torasso. Morgan Kaufmann, San Francisco, CA,
pp. 530–539.
VAN DER TORRE, L., and E. WEYDERT. 1998a. Goals, desires, utilities and preferences. In Proceedings of the ECAI’98 Workshop on Decision Theory meets Artificial Intelligence.
VAN DER TORRE, L., and E. WEYDERT. 1998b. Goals, desires, utilities and preferences. In Proceedings of the ECAI’98 Workshop on Decision Theory meets Artificial Intelligence.
VAN DER TORRE, L., and E. WEYDERT. 2001. Parameters for utilitarian desires in a qualitative decision theory. Applied Intelligence, 14:285–301.
WELLMAN, M., and J. DOYLE. 1991. Preferential semantics for goals. In Proceedings of the Ninth National
Conference on Artificial Intelligence. Edited by T. Dean and K. McKeown. AAAI Press. Menlo Park, CA,
pp. 698–703.
WELLMAN, M. P. 1990. Fundamental concepts of qualitative probabilistic networks. Artificial Intelligence,
44:257–303.
APPENDIX: PROOFS OF THEOREMS
Theorem 7 (Subutilities). Given rules C∗ and a mutually utility-independent partition S of F(C), if each u_i is 1-consistent with r, then

u(m) = Σ_{i=1}^{Q} t_i u_i(m),

where t_i > 0, is consistent with r.
Proof. By definition of 1-consistency of u_i with r, u_i(m_1) − u_i(m_2) ≥ 1 when (m_1, m_2) |= r and (m_1 ↓ S_i) ≠ (m_2 ↓ S_i). Since t_i > 0, we have

Σ_{i : (m_1↓S_i) ≠ (m_2↓S_i)} t_i (u_i(m_1) − u_i(m_2)) > 0.    (38)

By definition of a subutility function u_i, if (m_1 ↓ S_i) = (m_2 ↓ S_i) then u_i(m_1 ↓ S_i) = u_i(m_2 ↓ S_i), so we have

Σ_{i : (m_1↓S_i) = (m_2↓S_i)} t_i (u_i(m_1) − u_i(m_2)) = 0.    (39)

Combining equation (39) and inequality (38), we have

Σ_{i : (m_1↓S_i) ≠ (m_2↓S_i)} t_i (u_i(m_1) − u_i(m_2)) + Σ_{i : (m_1↓S_i) = (m_2↓S_i)} t_i (u_i(m_1) − u_i(m_2)) > 0.

The above is equivalent to

Σ_{i=1}^{Q} t_i (u_i(m_1) − u_i(m_2)) > 0.

So we have

Σ_{i=1}^{Q} t_i u_i(m_1) > Σ_{i=1}^{Q} t_i u_i(m_2)

and

u(m_1) > u(m_2)

whenever (m_1, m_2) |= r. This is the definition of consistency of u with r. Thus, this completes the proof.
Lemma 3 (Cycle-free subutility). Given a set of rules R ⊆ C∗ and a set of features S_i ⊆ F(C) such that the restricted model graph G_i(R) is cycle free, and û_i(m ↓ S_i) is the minimizing GUF over G_i(R), the subutility function u_i(m) = û_i(m ↓ S_i) for S_i is 1-consistent with all r ∈ R.

Proof. We must show that u_i(m_1) ≥ 1 + u_i(m_2) whenever (m_1, m_2) |= r and (m_1 ↓ S_i) ≠ (m_2 ↓ S_i). First let û_i be the minimizing GUF (from Section 3) defined over G_i(R). Then pick some r ∈ R and some pair (m_1, m_2) |= r where (m_1 ↓ S_i) ≠ (m_2 ↓ S_i). By definition of G_i(R), there exists an edge e(m_1 ↓ S_i, m_2 ↓ S_i); thus û_i(m_1 ↓ S_i) ≥ 1 + û_i(m_2 ↓ S_i), because there exists a path from (m_1 ↓ S_i) that contains (m_2 ↓ S_i), and therefore contains the longest path from (m_2 ↓ S_i), plus at least one node, namely the node (m_1 ↓ S_i). So we have

û_i(m_1 ↓ S_i) ≥ 1 + û_i(m_2 ↓ S_i)
⇒ u_i(m_1) ≥ 1 + u_i(m_2).

Since this holds for all (m_1, m_2) |= r and for all r ∈ R, this proves the lemma.
Lemma 4 (Conflict cycles). Given a set of strict preference rules R ⊆ C ∗ and a set S of
feature sets of F(C), the rules R conflict if and only if the rules R imply a cycle in any
restricted model graph G i (R) of some set of features Si ∈ S.
Proof. If the rules R imply a cycle in G_i(R), then there exists some cycle of nodes m_j in G_i(R). Call this cycle Y = (m_1, m_2, . . . , m_k, m_1). By the definition of a model graph, an edge e(m_j, m_{j+1}) exists if and only if m_j ≻_R m_{j+1}. Thus, for a subutility function u_i to be consistent with R, the following would have to hold:

u_i(m_1) > u_i(m_2) > · · · > u_i(m_k) > u_i(m_1),

which is not possible, because u_i(m_1) > u_i(m_1) is impossible. Thus, if R implies a cycle in some G_i(R), then there is no subutility function u_i consistent with R.
In the other direction, we show that if the rules R conflict on S_i, then they imply a cycle in G_i(R). We assume the opposite and work toward a contradiction. Suppose R conflict on S_i but do not imply a cycle in G_i(R); then G_i(R) is cycle-free. We define the subutility function u_i to be the minimizing GUF based on the graph G_i(R). By Lemma 3, u_i is 1-consistent with R. This is a contradiction, so we have
shown that when R conflict on S_i they imply a cycle in G_i(R). This proves the lemma.

Theorem 10 (Rule consistency). Given a set S = {S_1, ..., S_Q} of subsets of F(C), corresponding relevant rule sets R_i for each S_i ∈ S, conflict-free rule sets R̄_i for each R_i, and subutility functions u_i 1-consistent with R̄_i, a generalized additive utility function u is consistent with a rule r if for all (m_1, m_2) ⊨ r, we have

$$\sum_{i \in S_a(r)} t_i\,(u_i(m_1) - u_i(m_2)) > \sum_{j \in S_d(r)} t_j\,(u_j(m_2) - u_j(m_1)). \tag{40}$$
Proof. Recall that u is consistent with r if u(m_1) > u(m_2) whenever m_1 ≻_r m_2. Inequality (40) is equivalent to

$$\sum_{i \in S_a(r) \cup S_d(r)} t_i\,(u_i(m_1) - u_i(m_2)) > 0.$$

Note that the set of indices S_a(r) ∪ S_d(r) is the set of all indices i for which r ∈ R_i. Thus, the above is equivalent to

$$\sum_{i: r \in R_i} t_i\,(u_i(m_1) - u_i(m_2)) > 0,$$

and we can split the summation, so we have

$$\sum_{i: r \in R_i} t_i\,u_i(m_1) > \sum_{i: r \in R_i} t_i\,u_i(m_2).$$
The above is the same as inequality (16), which is equivalent to inequality (15), which in turn is equivalent to inequality (14), and since we define a generalized additive utility function u as

$$u(m) = \sum_{i=1}^{Q} t_i\,u_i(m),$$
we can substitute this definition of u(m) for the summations in inequality (14) to obtain

$$u(m_1) > u(m_2)$$

whenever (m_1, m_2) ⊨ r, which is the definition of consistency of u with r.
Theorem 11 (Simple scaling parameters). Given a set of feature sets S = {S_1, S_2, ..., S_Q} and a set of feature vector rules C* in L*, with R_i the set of relevant rules for S_i, and each u_i 1-consistent with each r ∈ R_i, then

$$u(m) = \sum_{i=1}^{Q} u_i(m) \tag{41}$$

is an ordinal utility function consistent with C*.
Proof. Fix a rule r ∈ C*. If each u_i is 1-consistent with all rules in R_i, then if r ∈ R_i, for all (m_1, m_2) ⊨ r we have u_i(m_1) − u_i(m_2) > 0. So we have

$$\sum_{i: r \in R_i} (u_i(m_1) - u_i(m_2)) > 0,$$

and since for r ∉ R_i we have u_i(m_1) = u_i(m_2) for all (m_1, m_2) ⊨ r, it follows that

$$\sum_{i=1}^{Q} (u_i(m_1) - u_i(m_2)) > 0,$$

or

$$\sum_{i=1}^{Q} u_i(m_1) - \sum_{i=1}^{Q} u_i(m_2) > 0.$$

Substituting the definition of u(m) from equation (41), we have

$$u(m_1) - u(m_2) > 0.$$

Since we chose r arbitrarily, this holds for any r ∈ C*. Thus, u as defined in equation (41) is a utility function consistent with C*.
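Theorem 11 can be checked on a toy instance. In the sketch below (rule, models, and subutility values are invented), u_1 is 1-consistent with the rule and u_2 is unaffected by it, so the unweighted sum of equation (41) orders the models correctly:

```python
# A rule r with model pair (m1, m2) |= r.
# u1 is relevant and 1-consistent with r: it drops by at least 1 from m1 to m2.
# u2 is irrelevant to r: (m1|S2) = (m2|S2), so its value is unchanged.
u1 = {"m1": 2, "m2": 0}   # u1(m1) - u1(m2) >= 1
u2 = {"m1": 5, "m2": 5}   # equal on both models

def u(model):
    # Equation (41): unweighted sum of subutilities.
    return u1[model] + u2[model]

print(u("m1"), u("m2"))  # 7 5 -- u(m1) > u(m2), as Theorem 11 requires
```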
Theorem 12 (Lexicographic ordering). Given rules r ∈ C*, a set of feature sets S of F(C), a conflict-free ordering Ŝ of S, relevant rule sets R_i for Ŝ_i ∈ Ŝ, each u_i a subutility function 1-consistent with {R_i \ ∪_{j=1}^{i−1} R_j}, S_a(r) the set of indices of subutility functions u_i 1-consistent with r, and S_d(r) the set of indices of subutility functions u_i such that (m_1, m_2) ⊨ r ⇒ (m_1 ↾ Ŝ_i) ≠ (m_2 ↾ Ŝ_i) and u_i is not 1-consistent with r, then

$$u(m) = \sum_{i=1}^{Q} t_i\,u_i(m), \tag{42}$$

where t_Q = 1 and, for all i with 1 ≤ i < Q,

$$t_i = 1 + \sum_{j=i+1}^{Q} t_j \max(u_j), \tag{43}$$

u is a utility function consistent with C*.
Proof. Pick some r ∈ C*. We are given that the subutility functions indicated by S_a(r) are 1-consistent with r and those indicated by S_d(r) are not. For any pair of models (m_1, m_2) ⊨ r, we must show u(m_1) > u(m_2). We will do so by showing that u(m_1) − u(m_2) > 0. From equation (42), we have

$$u(m_1) - u(m_2) = \sum_{i=1}^{Q} t_i\,u_i(m_1) - \sum_{i=1}^{Q} t_i\,u_i(m_2).$$

For i ∉ S_a(r) ∪ S_d(r), we have u_i(m_1) = u_i(m_2) for any models (m_1, m_2) ⊨ r. Thus, we can remove these terms from our summations, and we have

$$u(m_1) - u(m_2) = \sum_{i \in S_a(r) \cup S_d(r)} t_i\,u_i(m_1) - \sum_{i \in S_a(r) \cup S_d(r)} t_i\,u_i(m_2).$$

Since S_a(r) is disjoint from S_d(r), we can rearrange the terms in the summations:

$$u(m_1) - u(m_2) = \sum_{i \in S_a(r)} t_i\,(u_i(m_1) - u_i(m_2)) + \sum_{i \in S_d(r)} t_i\,(u_i(m_1) - u_i(m_2)).$$

By Lemma 5, there exists x ∈ S_a(r) such that x < y for all y ∈ S_d(r). Since u_i(m_1) − u_i(m_2) > 0 for all i ∈ S_a(r), we can bound u(m_1) − u(m_2) by

$$u(m_1) - u(m_2) \geq t_x\,(u_x(m_1) - u_x(m_2)) + \sum_{i \in S_d(r)} t_i\,(u_i(m_1) - u_i(m_2)).$$

Similarly, u_i(m_1) − u_i(m_2) ≥ −max(u_i) for all i ∈ S_d(r), so the above can be bounded by

$$u(m_1) - u(m_2) \geq t_x\,(u_x(m_1) - u_x(m_2)) - \sum_{i \in S_d(r)} t_i \max(u_i).$$

Since u_x is 1-consistent with r, u_x(m_1) − u_x(m_2) ≥ 1. Then

$$u(m_1) - u(m_2) \geq t_x - \sum_{i \in S_d(r)} t_i \max(u_i).$$

Since x < y for all y ∈ S_d(r), we have S_d(r) ⊆ {x+1, x+2, ..., Q}, and it follows that

$$u(m_1) - u(m_2) \geq t_x - \sum_{i=x+1}^{Q} t_i \max(u_i).$$

Substituting the definition of t_x from equation (43), we have

$$u(m_1) - u(m_2) \geq 1 + \sum_{i=x+1}^{Q} t_i \max(u_i) - \sum_{i=x+1}^{Q} t_i \max(u_i),$$

and simplifying gives

$$u(m_1) - u(m_2) \geq 1.$$

Since we chose r arbitrarily, this derivation holds for all r ∈ C*. Thus, u as given in equation (42) is consistent with C*. This proves the theorem.
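The scaling parameters of equation (43) follow from a backward recurrence: t_Q = 1, and each earlier t_i strictly exceeds the largest possible total contribution of all later subutilities, which is what makes the ordering lexicographic. A sketch with hypothetical max(u_i) values:

```python
def lexicographic_weights(max_u):
    """Equation (43): t_Q = 1; t_i = 1 + sum_{j > i} t_j * max(u_j).
    max_u[i] is the maximum value of subutility u_{i+1} (0-indexed)."""
    q = len(max_u)
    t = [0] * q
    t[-1] = 1
    for i in range(q - 2, -1, -1):
        t[i] = 1 + sum(t[j] * max_u[j] for j in range(i + 1, q))
    return t

max_u = [3, 2, 4]          # hypothetical max(u_1), max(u_2), max(u_3)
t = lexicographic_weights(max_u)
print(t)  # [15, 5, 1]
# A gain of 1 in u_1 outweighs any combined loss in u_2 and u_3:
assert t[0] > t[1] * max_u[1] + t[2] * max_u[2]
```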
Theorem 13 (Correctness of SAT-formulation). Given a set of feature sets S of F(C), if P(C*, S) has no solution, then the preferences C* are not consistent with a utility function u of the form u(m) = Σ_i t_i u_i(m).
Proof. If there is no solution to P(C ∗ , S), it means that in all possible assignments of
truth values to variables, some clause is not satisfied. By definition of the propositional
variables in P(C ∗ , S), the truth assignments correspond directly to rule-subutility function
1-consistency. The clauses in P(C ∗ , S) are of two forms: rule-set clauses (Definition 10), and
conflict clauses (Definition 11). By assumption, there is no solution to P(C ∗ , S). Thus, in
each possible assignment of truth values to the variables in P(C ∗ , S), either there exists some
rule-set clause that is not satisfied, or there exists some conflict clause that is not satisfied. In
terms of the interpretation given to the variables z_ij, if a rule-set clause is not satisfied then there is some rule r_j that is not 1-consistent with the subutility function u_i for any S_i. In the other case, if a conflict clause is not satisfied, then there exists some cycle Y_ik where each r_j ∈ Y_ik is included in the set R̄_i (by the definition of R̄_i, equation (25)); thus R̄_i is not conflict-free.
Thus we must treat two cases. Suppose the first case holds: in any assignment of truth
values to the variables z_ij in P(C*, S), there is some rule r_j that is not 1-consistent with any subutility function u_i. Thus, by Corollary 2, a utility function u(m) = Σ_i t_i u_i(m) is not
consistent with r j . By definition of consistency, u is not consistent with C ∗ . This holds for
any generalized additive utility function u based on the utility-dependence partition S.
The second case is that in any assignment of truth values to the variables z_ij, there is some conflict clause C_{R̄_i} = ∨_{j: r_j ∈ Y_ik} ¬z_ij in which all z_ij are true. We show that this case reduces to the first case. By the definition of P(C*, S), for each r_j ∈ C_{R̄_i} there is a rule-set clause C_j = ∨_{i ∈ X_j} z_ij. If there were some z_xj ∈ C_j with x ≠ i and z_xj = true, we could construct a solution to P(C*, S) by making z_ij false and leaving z_xj true. But by assumption there is no solution to P(C*, S), so we must conclude that z_xj = false for all z_xj ∈ C_j with x ≠ i. Therefore we make z_ij false, and then z_xj = false for all z_xj ∈ C_j, so we now have an assignment of truth values of the first case.
This shows that if there is no solution to the SAT problem P(C*, S), then there is no generalized additive utility function consistent with the stated preferences C* and the set of feature sets S.
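On small instances, P(C*, S) can be checked by brute-force enumeration. The sketch below uses invented clauses and encodes each variable z_ij as a pair (i, j); rule-set clauses require each rule to be 1-consistent with some subutility, while conflict clauses forbid putting an entire cycle inside one R̄_i:

```python
from itertools import product

def solve_sat(variables, clauses):
    """Brute-force SAT: each clause is a list of (variable, polarity)
    literals. Returns a satisfying assignment dict, or None."""
    for values in product([False, True], repeat=len(variables)):
        assign = dict(zip(variables, values))
        if all(any(assign[v] == pol for v, pol in clause) for clause in clauses):
            return assign
    return None

# Hypothetical instance: rules r1, r2, each assignable to subutility 1 or 2;
# variable (i, j) plays the role of z_ij ("r_j is 1-consistent with u_i").
variables = [(1, 1), (1, 2), (2, 1), (2, 2)]
clauses = [
    [((1, 1), True), ((2, 1), True)],    # rule-set clause for r1
    [((1, 2), True), ((2, 2), True)],    # rule-set clause for r2
    [((1, 1), False), ((1, 2), False)],  # conflict clause: r1, r2 cycle in G_1
]
model = solve_sat(variables, clauses)
print(model is not None)  # True: e.g. assign r1 to u_2 and r2 to u_1
```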
Theorem 14 (Set merging). If C ∗ is an inconsistent preference set over features Sk , then
there does not exist any Si , S j with Si ∪ S j = Sk such that C ∗ restricted to Si is consistent
and C ∗ restricted to S j is consistent.
Proof. A set of preferences C* over S_k is inconsistent exactly when the model graph G(S_k) has one or more cycles. Let Y = {r_1, r_2, ..., r_k} be such a cycle. Since Y is a cycle, there exist models m_1, m_2, ..., m_k such that

$$(m_1, m_2) \models r_1,\quad (m_2, m_3) \models r_2,\quad \ldots,\quad (m_{k-1}, m_k) \models r_{k-1},\quad (m_k, m_1) \models r_k.$$
Let S ⊆ S_k be any set of features. Let S′ be S if some r ∈ Y has nonempty (r ↾ S); otherwise let S′ = S_k \ S. In either case, some r ∈ Y has nonempty (r ↾ S′). By the definition of model restriction and rule restriction, we have
$$((m_1 \restriction S'), (m_2 \restriction S')) \models (r_1 \restriction S'),\quad ((m_2 \restriction S'), (m_3 \restriction S')) \models (r_2 \restriction S'),\quad \ldots,\quad ((m_k \restriction S'), (m_1 \restriction S')) \models (r_k \restriction S').$$
Thus we have exhibited a cycle on S′ caused by the preferences C. Let S_i = S′ and S_j = S_k \ S′. Then we have shown, for any S_i, S_j such that S_i ∪ S_j = S_k, that there is a conflict in S_i.

Lemma 7 (Sufficiency of I(C*, S, R)). Given I(C*, S, R) and a solution to it, for all r in some (R_i \ R̄_i) and for all model pairs (m_1, m_2) ⊨ r,

$$\sum_{i \in S_a(r) \cup S_d(r)} t_i\,(u_i(m_1) - u_i(m_2)) > 0.$$
Proof. Let x_i be the maximum value of u_i(m_1) − u_i(m_2) and y_i the minimum value of u_i(m_1) − u_i(m_2); then the following inequalities are members of I(C*, S, R):

$$x_i t_i + \sum_{j=1,\,j \neq i}^{Q} t_j x_j > 0, \qquad x_i t_i + \sum_{j=1,\,j \neq i}^{Q} t_j y_j > 0,$$

$$y_i t_i + \sum_{j=1,\,j \neq i}^{Q} t_j x_j > 0, \qquad y_i t_i + \sum_{j=1,\,j \neq i}^{Q} t_j y_j > 0.$$
Since we have a solution to I(C*, S, R), we have an assignment of values to t_1, ..., t_Q such that the above hold. Clearly the following hold for any (u_i(m_1) − u_i(m_2)) with x_i ≥ (u_i(m_1) − u_i(m_2)) ≥ y_i:

$$(u_i(m_1) - u_i(m_2))\,t_i + \sum_{j=1,\,j \neq i}^{Q} t_j x_j > 0,$$

$$(u_i(m_1) - u_i(m_2))\,t_i + \sum_{j=1,\,j \neq i}^{Q} t_j y_j > 0.$$

Since this holds for any choice of i with 1 ≤ i ≤ Q, this proves the lemma.
Theorem 15 (Inequalities). If the system of linear inequalities I(C*, S, R) has a solution, this solution corresponds to a utility function u consistent with C*.

Proof. Each subutility function u_i, 1 ≤ i ≤ Q, is fixed. By Lemma 5.8, for a given rule r in some (R_i \ R̄_i), we are assured that if (m_1, m_2) ⊨ r then Σ_i t_i u_i(m_1) > Σ_i t_i u_i(m_2), for
all i ∈ S_a(r) ∪ S_d(r). Since S_a(r) and S_d(r) are disjoint, we can rearrange the summations and obtain

$$\sum_{i \in S_a(r)} t_i\,(u_i(m_1) - u_i(m_2)) > \sum_{j \in S_d(r)} t_j\,(u_j(m_2) - u_j(m_1)).$$

Then by Theorem 5.3, u is consistent with r.
If r is not in any (R_i \ R̄_i), then for all i such that r ∈ R_i, u_i is 1-consistent with r. For j such that r ∉ R_j, (m_1, m_2) ⊨ r implies that (m_1 ↾ S_j) = (m_2 ↾ S_j), so all such u_j are trivially 1-consistent with r. Since all subutility functions are 1-consistent with r, we have

$$\sum_i t_i\,(u_i(m_1) - u_i(m_2)) > 0,$$

and u is consistent with r. Thus, u is consistent with all r in some (R_i \ R̄_i) and with all r not in any (R_i \ R̄_i), so u is consistent with all r ∈ C*. This proves the theorem.
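The role of the extreme values x_i and y_i in Lemma 7 can be checked numerically. The sketch below (bounds and weights are hypothetical) verifies a slightly stronger condition than the lemma's four inequalities: every corner of the box [y_1, x_1] × ... × [y_Q, x_Q] gives a positive weighted sum, so any realizable vector of subutility differences between the bounds does too:

```python
from itertools import product

def corners_hold(t, x, y):
    """Check extreme-point inequalities in the spirit of I(C*, S, R):
    every choice of x_j or y_j per coordinate must give a positive sum."""
    bounds = list(zip(x, y))
    return all(sum(tj * c for tj, c in zip(t, corner)) > 0
               for corner in product(*bounds))

# Hypothetical: two subutilities; u_i(m1) - u_i(m2) ranges in [y_i, x_i].
x = [2, 1]    # maxima of the differences
y = [1, -3]   # minima of the differences
t = [4, 1]    # candidate scaling parameters

print(corners_hold(t, x, y))  # True: worst corner is 4*1 + 1*(-3) = 1 > 0
```

With t = [1, 1] the worst corner would give 1 − 3 = −2, so that candidate would be rejected; a linear-programming solver plays this role at scale.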