Incremental Decision Making Under Risk with the

Incremental Decision Making Under Risk with the
Weighted Expected Utility Model
Hugo Gilbert, Nawal Benabbou, Patrice Perny, Olivier Spanjaard, Paolo
Viappiani
To cite this version:
Hugo Gilbert, Nawal Benabbou, Patrice Perny, Olivier Spanjaard, Paolo Viappiani. Incremental Decision Making Under Risk with the Weighted Expected Utility Model. International Joint
Conference on Artificial Intelligence (IJCAI’17). 2017.
HAL Id: hal-01515989
http://hal.upmc.fr/hal-01515989
Submitted on 28 Apr 2017
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Incremental Decision Making Under Risk
with the Weighted Expected Utility Model
Hugo Gilbert1,2 , Nawal Benabbou1,2 , Patrice Perny1,2 , Olivier Spanjaard1,2 , Paolo Viappiani1,2
1
Sorbonne Universités, UPMC Univ Paris 06, UMR 7606, LIP6, F-75005, Paris, France
2
CNRS, UMR 7606, LIP6, F-75005, Paris, France
{hugo.gilbert,nawal.benabbou,patrice.perny,olivier.spanjaard,paolo.viappiani}@lip6.fr
Abstract
This paper deals with decision making under risk
with the Weighted Expected Utility (WEU) model,
which is a model generalizing expected utility and
providing stronger descriptive possibilities. We address the problem of identifying, within a given set
of lotteries, a (near-)optimal solution for a given decision maker consistent with the WEU theory. The
WEU model is parameterized by two real-valued
functions. We propose here a new incremental elicitation procedure to progressively reduce the imprecision about these functions until a robust decision
can be made. We also give experimental results
showing the practical efficiency of our method.
1
Introduction
This work is devoted to decision making under risk. In this
framework, a Decision Maker (DM) is faced with numerous
risky prospects given as probability distributions over a finite
number of possible outcomes (i.e. lotteries) and must find
the alternative that suits her best. In this context, preference
modeling and preference elicitation are the key steps that an
intelligent system must realize to advise the DM or to predict
her choices. While preference modeling yields a mathematical model describing specific behaviors, preference elicitation
amounts to fitting the parameters of this model through interactions with the DM. These two tasks are acknowledged to
be very complex, especially due to the fact that uncertainty
affects the DM’s judgements.
The most popular criterion in decision theory is given by
the Expected Utility (EU) model [von Neumann and Morgenstern, 1947]. In this model, the DM’s preferences over
the set X of possible outcomes are modeled using a utility
function u that assigns a numerical value u(x) to each consequence x ∈ X . Then, a lottery p is preferred
to another
P
lottery q iff V (p) ≥ V (q), where V (r) = x∈X r(x)u(x)
for any lottery r and r(x) denotes the probability of x in lottery r. Depending on the concavity or convexity of the utility function, the expected utility maximization will enable to
model risk-averse or risk-seeking behaviors. Nevertheless,
despite its intuitive appeal and its axiomatic justification, the
EU model does not make it possible to account for some rational decision behaviors frequently observed in practice. An
example of such impossibility is the so-called Allais’ paradox
[1953]. We recall below a simple version of this paradox due
to Kahneman and Tversky [1979]:
Example 1. Consider the following lotteries p, q, p0 , q 0 :
x
0$ 3000$ 4000$
p(x)
0
1
0
q(x)
0.2
0
0.8
p0 (x) 0.75
0.25
0
q 0 (x)
0.8
0
0.2
In this example Kahneman and Tversky observed that most
people prefer lottery p to lottery q but prefer lottery q 0 to lottery p0 . However, lottery p0 (resp. q 0 ) is just the mixture of
lottery p (resp. q) and a sure amount of 0$ with a probability of 0.25 and 0.75 respectively. Thus, those preferences
violate the Von Neumann and Morgenstern independence axiom which holds in EU theory. This makes it impossible to
account for such preferences with the EU model.
One model that encompasses EU to account for Allais’
paradox is the Weighted Expected Utility (WEU) model,
developed by Chew [1983]. The WEU model relies on
two functions u and w defined on X . These functions are
then lifted
Pfrom outcomes to lotteries byPusing expectations:
u(p) = x∈X p(x)u(x) and w(p) = x∈X p(x)w(x). In
the WEU model, the preference % is defined by1 :
p % q ⇔ u(p)w(q) − u(q)w(p) ≥ 0
(1)
and u(p)w(q) − u(q)w(p) is interpreted as a signed intensity of preference between p and q. It is worth noting that
if w is constant then WEU boils down to EU. Moreover,
WEU is known to be an instance of Skew Symmetric Bilinear
(SSB) utility which was axiomatized and developed by Fishburn [1982; 1983]. However, SSB theory does not ensure
transitivity, which can be seen as a weakness from a prescriptive point of view. Interestingly, enforcing transitivity in SSB
theory necessarily leads to WEU [Fishburn, 1983].
One key issue in preference modeling with WEU is the
elicitation of the two functions u and w to fit the value system
of a given DM. To the best of our knowledge, no previous
work has tackled the elicitation of the WEU model.
Two different elicitation approaches can be distinguished.
On the one hand, one can perform a full elicitation using a
1
Note that a strict inequality corresponds to a strict preference
systematic sequence of queries aiming at completely specifying the decision model. However, this precise elicitation of
functions u and w would certainly require a prohibitive number of questions. Moreover, the full elicitation of the model is
generally not necessary to identify the optimal choice within
a given set of lotteries.
For this reason we focus on the incremental approach in
which preference queries are selected one by one in order to
efficiently reduce the imprecision over the parameters of the
model until a near optimal choice is identified. This approach
has already proved to be quite efficient in various contexts
(see e.g., [Chajewska et al., 2000] and [Wang and Boutilier,
2003] for EU, [Perny et al., 2016] for Rank Dependent Utility or [Nguyen et al., 2014] for the incremental elicitation of
attacker payoffs in security games).
The paper is organized as follows: Section 2 introduces
some formal background on WEU and motivates our interest
for this model. In Section 3, we give preliminary results that
are used in our elicitation procedure which is presented in
Section 4. Finally, our numerical tests are given in Section 5.
x
p(x)
q(x)
Background on WEU and Motivation
We now present some important features of the WEU model.
Accounting for Allais’ paradox. First, we show that the
WEU model enables to account for Allais’ Paradox.
Example 2. We rescale the possible gains from interval
[0, 4000] to interval [0, 1]. Consider the pair√of functionals
(u, w) defined by u(x) = x2 and w(x) = 1 − x, leading to
the following values:
u
w
p
0.5625
0.1340
q
0.8
0.2
p0
0.1406
0.7835
q0
0.2
0.8
We can easily check that the WEU model using these two
functions is compatible with Allais’ paradox:
• p q since u(p)w(q) − u(q)w(p) ≈ 0.005 > 0.
• q 0 p0 since u(q 0 )w(p0 ) − u(p0 )w(q 0 ) ≈ 0.044 > 0.
Attitude towards risk. The components of the WEU
model can be chosen to account for risk-averse or riskseeking behaviors as in the EU model. As established by
Nakamura [1989] a DM consistent with WEU is risk-averse
(resp. risk-seeking) iff r(x, y) is positive (resp. negative) for
all x, y ∈ X with x 6= y where:
du
(y)(x − y) + u(y) − u(x)]
dx
dw
+ u(y)[w(x) −
(y)(x − y) − w(y)]
dx
r(x, y) =w(y)[
Those conditions can easily be met. For instance, if functions u and w are positive, it is sufficient to choose a concave
(resp. convex) u function and a convex (resp. concave) w
function to obtain a risk-averse (resp. risk-seeking) behavior.
We illustrate this point in the following example.
Example 3. Consider the following lotteries p and q:
25$
1
0
100$
0
0.25
Note that p is the sure lottery where one obtains the expectancy
of lottery q. Let u1 (x) = (x/100)2 , u2 (x) =
p
x/100 and wi=1,2 (x) = 1 − x/100. Let %1 (resp. %2 )
denote the preference relation induced by (u1 , w1 ) (resp.
(u2 , w2 )). Then %1 represents a risk-seeking behavior
whereas %2 represents a risk-averse behavior as illustrated
by the preferences expressed over lotteries p and q:
q 1 p since u1 (p)w1 (q) − u1 (q)w1 (p) ≈ −0.14
p 2 q since u2 (p)w2 (q) − u2 (q)w2 (p) ≈ 0.19
Interpretation of WEU. To give another insight on WEU,
let us rewrite Equation 1 in the following ways (if w > 0):
X
x∈X
2
0$
0
0.75
u(p)/w(p) ≥ u(q)/w(q)
X
w(x)p(x)
w(x)q(x)
P
P
v(x) ≥
v(x)
w(y)p(y)
y∈X
y∈X w(y)q(y)
x∈X
where v(x) is defined as the ratio u(x)/w(x). Hence, lottery
p is preferred to lottery q with the WEU model iff V (p) ≥
V (q), where for any lottery r:
X
w(x)r(x)
u(r)
P
=
v(x)
V (r) =
w(r)
y∈X w(y)r(y)
x∈X
This last formulation makes it possible to interpret functions
w and v. While function v can be interpreted as a utility function representing the extend to which an outcome is satisfactory, function w can be interpreted as a probability distortion
function where the distortion depends on the amounts that
are at stake in the lottery. This distortion of probabilities enables the violation of independence exposed by Allais’ paradox. Note that, in this respect, the WEU model is close to
the Rank-Dependent Utility (RDU) model [Quiggin, 1993]
which is another extension of EU that enables to account
for Allais’ paradox by introducing a distortion of probabilities. However, an important difference between RDU and
WEU can be made regarding their relative position w.r.t the
betweenness axiom (discussed below).
Betweenness. The betweenness axiom [Chew, 1989] states
that the agent has no strict preference or aversion for randomization. If betweenness holds, then for any pair of lotteries
(p, q), any mixture of p and q is ranked between p and q w.r.t
%. More formally, the betweenness axiom states that:
∀p, q, ∀λ ∈ (0, 1), if p % q, then p % (p, λ; q, (1 − λ)) % q
where (p, λ; q, (1 − λ)) denotes the mixture where lotteries
p and q are obtained with probabilities λ and 1 − λ. Verifying betweenness, which is a relaxation of the independence
axiom, ensures several desirable properties. In particular, it
ensures that the preferred element in a finite set L is also the
preferred element of the convex hull2 of L. This implies that
2
the convex hull of L is the set of lotteries obtained by mixing
the elements of L.
we don’t need to investigate mixtures of alternatives to find
the optimal decision. Besides, betweenness has also some
nice consequences in game theory , concerning e.g. the existence of Nash equilibria (see [Chew, 1989]).
To sum up the WEU model is a risk sensitive model that extends the EU model with enhanced descriptive abilities while
conserving desirable normative properties.
3
Preliminary Results
Let P be the set of all lotteries defined on the outcome space
X = [0, 1]. We assume that the DM has to select a lottery
among a finite set L ⊂ P. Within L, each lottery p will
be
Pndenoted (x1 , p1 ; . . . ; xn , pn ), where pi = p(xi ) > 0 and
i=1 pi = 1. In this work, we suppose that the DM is consistent with WEU theory. Stated differently, his preferences are
defined by Equation 1 for a specific pair (u, w) of functions.
We now make two natural assumptions on the DM’s preferences. We notably assume that the DM is not always indifferent, imposing that the highest outcome 1 is strictly preferred
to the lowest one 0. Furthermore, given two outcomes x and y
with x ≥ y, we assume that the signed intensity of preference
between x and any other outcome z is at least as high as the
one between y and z. In the WEU model, these hypotheses
can be written as follows (see Eq. 1):
h1 : u(1)w(0) − u(0)w(1) > 0
h2 : x ≥ y ⇒ ∀z ∈ X , (u(x)−u(y))w(z) ≥ u(z)(w(x)−w(y))
We now derive some properties that hold for u and w in
this context. The first one is a consequence of the following
uniqueness theorem:
Theorem 1 ([Fishburn, 1983]). Suppose is a nonempty
asymmetric weak order on P and (u, w) is a pair of linear
functionals on P such that: ∀p, q ∈ P, p q ⇔ u(p)w(q) >
u(q)w(p). Then, for any pair (u0 , w0 ) of linear functionals on
P, the following properties are equivalent:
1. ∀p, q ∈ P, p q ⇔ u0 (p)w0 (q) > u0 (q)w0 (p)
2. There exist a, b, c, d ∈ R such that u0 = au + bw, w0 =
cu + dw and ad − bc > 0.
We can now state the following result:
Proposition 1. Suppose is a nonempty asymmetric weak
order on P and (u, w) is a pair of linear functionals on P
such that: ∀p, q ∈ P, p q ⇔ u(p)w(q) > u(q)w(p).
Then, if h1 holds, we can build from u and w another pair of
functionals (u0 , w0 ) such that:
• ∀p, q ∈ P, p q ⇔ u0 (p)w0 (q) > u0 (q)w0 (p)
• u0 (1) = w0 (0) = 1 and u0 (0) = w0 (1) = 0
Proof. Let ∆ := u(1)w(0) − u(0)w(1), which is strictly
positive due to h1 . Let (u0 , w0 ) be the pair of linear functionals defined by u0 = au + bw and w0 = cu + dw,
where a = w(0)/∆, b = −u(0)/∆, c = −w(1)/∆ and
d = u(1)/∆. It is easy to check that u0 (0) = w0 (1) = 0
and u0 (1) = w0 (0) = 1, as required. To conclude the proof, we
need to prove that we have p q iff u0 (p)w0 (q) > u0 (q)w0 (p)
for all p ∈ q ∈ P. Since ad − bc = 1/∆ > 0, we know that
this property is verified using Theorem 1.
This proposition enables us to conclude that the DM’s
preferences can be modeled using a pair (u, w) such that
u(0) = w(1) = 0 and u(1) = w(0) = 1; note that these
values somehow reflects the antisymmetric roles that functions u and w play in the WEU model. Interestingly, setting
those values in this way combined with hypothesis h2 imposes monotonicity conditions on u and w:
Proposition 2. Suppose is a nonempty asymmetric weak
order on P. Moreover, suppose (u, w) is a pair of linear
functionals on P such that:
• ∀p, q ∈ P, p q ⇔ u(p)w(q) > u(q)w(p)
• u(1) = w(0) = 1 and u(0) = w(1) = 0
Then h2 holds iff u is nondecreasing and w is nonincreasing.
Proof. ( ⇒ ) Let x, y ∈ X be such that x ≥ y. By definition
of h2 , we have (u(x) − u(y))w(z) ≥ (w(x) − w(y))u(z)
for all z in X . In particular, by taking z = 0 and z = 1, we
obtain u(x) − u(y) ≥ 0 and w(x) − w(y) ≤ 0 respectively.
(⇐ ) Assume that u is nondecreasing and w is nonincreasing.
In that case, for all x, y ∈ X such that x ≥ y, we have u(x) −
u(y) ≥ 0 and w(x) − w(y) ≤ 0. Moreover, since u(0) = 0
and w(1) = 1, we know that u(z) and w(z) are positive for all
z ∈ X . Therefore, we necessarily have (u(x) − u(y))w(z) ≥
u(z)(w(x) − w(y)).
This proposition also ensures that the DM is rational in the
sense that any outcome x is necessarily preferred to any outcome y such that x ≥ y when assuming h1 and h2 .
We remark that Proposition 2 applies to the couple (u, w)
introduced in Example 2 which shows that imposing h1 and
h2 does not prevent to account for Allais’ paradox.
4
The Elicitation Method
Within WEU theory the preferences of the DM are completely characterized by the pair (u, w), which is initially unknown. To assist the DM in her choice, we propose an incremental elicitation procedure aiming to reduce the indetermination of functions u and w in order to discriminate the elements in L. At any stage of the process, we manage two sets
of functions U and W that represent all possible functions u
and w given the preference information collected so far. We
will iteratively generate carefully chosen preference queries
in such a way that collected preference statements enable to
reduce either the set U or the set W.
To initiate the elicitation process, we consider two reference outcomes x+ and x− such that x+ 1 0 x− , i.e.,
x+ and x− must be chosen outside the range of X . The first
step of the elicitation process is to elicit the values u(x− ) and
w(x+ ). The elicitation can be performed using the following
lotteries: pα = (1, α; x− , 1 − α) and qβ = (x+ , β; 0, 1 − β).
We ask the probabilities α0 and β0 for which the following
indifference relations hold: pα0 ∼ 0 and qβ0 ∼ 1. Note
that probability equivalence queries are standard tools used
in decision making under uncertainty. The answers to those
queries yield the following equalities:
u(pα0 )w(0) − u(0)w(pα0 ) = 0 ⇔ u(x− ) = −α0 /(1 − α0 )
u(qβ0 )w(1) − u(1)w(qβ0 ) = 0 ⇔ w(x+ ) = −(1 − β0 )/β0
Note that both u(x− ) and w(x+ ) are strictly negative, which
is consistent with the nondecreasingness (resp. nonincreasingness) of u (resp. w). Once values u(x− ) and w(x+ )
are known, we consider two types of queries, denoted by
uQuery(α, r) and wQuery(α, r), both parametrized by a lottery r and by a probability value α.
uQuery(α, r) asks the DM to compare the compound lottery rα = (r, α; x− , 1 − α) to a sure gain of 0. If rα % 0,
then we deduce a new constraint as follows:
rα % 0 ⇔ u(rα )w(0) − u(0)w(rα ) ≥ 0
⇔ u(rα ) ≥ 0 since u(0) = 0 and w(0) = 1
⇔ αu(r) − α0 (1 − α)/(1 − α0 ) ≥ 0
X
⇔α
r(x)u(x) − α0 (1 − α)/(1 − α0 ) ≥ 0 (2)
x
If the DM prefers the sure gain, then the inequality is reversed
and in both cases we obtain a constraint on the value u(r).
wQuery(α, r) asks the DM to compare lottery rα =
(x+ , α; r, 1 − α) to a sure gain of 1. If rα % 1, then we
have to impose:
rα % 1 ⇔ u(rα )w(1) − u(1)w(rα ) ≥ 0
⇔ w(rα ) ≤ 0 since u(1) = 1 and w(1) = 0
⇔ α(β0 − 1)/β0 + (1 − α)w(r) ≤ 0
X
⇔ α(β0 − 1)/β0 + (1 − α)
r(x)w(x) ≤ 0 (3)
x
Similarly, we have to reverse the inequality if we observe
1 % rα instead. Thus, we are able to derive some constraints
on the values of functions u and w that are completely decoupled. This point will reveal paramount for the efficiency
of the elicitation process. Those constraints are then used to
reduce the uncertainty attached to the sets U and W.
To identify a near-optimal lottery with few queries, our approach has to focus on the relevant parts of functions u and
w. This is achieved by using a criterion that indicates which
lotteries should be further questioned in order to generate an
highly informative constraint (i.e., that enables to identify as
quickly as possible a near-optimal lottery in L). To choose
the next lottery to be asked, we use the maxmin criterion.
Maxmin Optimization. Under preference uncertainty, one
may be interested in the lottery p∗ that performs best in the
worst-case scenario. Since the value of any lottery p ∈ L is
equal to u(p)/w(p) with the WEU model, the maxmin lottery
p∗ is formally defined by:
u(p)
p∗ ∈ arg max
min
(u,w)∈U ×W w(p)
p∈L
The key observation to determine p∗ is that the constraints
obtained when asking uQueries are completely decoupled
from the ones obtained when asking wQueries. Therefore,
minimizing u(p)/w(p) for a fixed lottery p amounts to minimizing u(p) over U and maximizing w(p) over W.
In order to obtain a guarantee on the quality of p∗ with
respect to the other options in L, we now wish to determine an upper bound UB(q, p∗ ) on how much another lottery
q ∈ L could be preferred to p∗ (i.e., how large u(q)w(p∗ ) −
u(p∗ )w(q) can be). For any p ∈ L, we denote by u(p)
and u(p) (resp. w(p) and w(p)) the minimal and maximal values of u(p) for u ∈ U (resp. w(p) for w ∈ W).
Given two lotteries p, q ∈ L, it can easily be checked
that u(q)w(p) − u(p)w(q) ≥ u(q)w(p) − u(p)w(q) for all
(u, w) ∈ U × W, and therefore one can set UB(q, p∗ ) =
u(q)w(p∗ ) − u(p∗ )w(q). Thus, an upper bound on how bad
the choice of p∗ could be is:
UB(p∗ ) =
max
q∈L\{p∗ }
UB(q, p∗ )
For some given sets U and W, it might be the case that
UB(p∗ ) is too large, meaning that recommending p∗ is a bad
decision in the worst-case scenario. In this situation, we collect new preference queries to reduce U and W until UB(p∗ )
becomes smaller than a given acceptable threshold; in particular, if UB(p∗ ) = 0, then p∗ is necessarily the best option.
Note that the obtained guarantee is looser than the one obtained by minmax regret [Savage, 1951] as:
u(q)w(p) − u(p)w(q) ≥
max
(u,w)∈U ×W
(u(q)w(p) − u(p)w(q))
However, the minmax regret induces quadratic optimization
problems (instead of linear ones) which are relatively difficult
to optimize in general.
Query Generation Strategy. At each step of the elicitation
process, we compute both a maxmin lottery p∗ and a challenger maxmax lottery p∗ defined by:
p∗ ∈ arg max UB(q, p∗ ).
q∈L\{p∗ }
By definition, lottery p∗ is the one that may induce the
largest loss when recommending p∗ . To decrease UB(p∗ ) =
UB(p∗ , p∗ ) = u(p∗ )w(p∗ ) − u(p∗ )w(p∗ ) as much as possible, we first compute max{u(p∗ ) − u(p∗ ), w(p∗ ) − w(p∗ ),
u(p∗ ) − u(p∗ ), w(p∗ ) − w(p∗ )}. Then, depending on which
is maximum, we ask either, respectively, uQuery(α, p∗ ),
wQuery(α, p∗ ), uQuery(α, p∗ ), or wQuery(α, p∗ ) with an appropriate value α. We now discuss the choice of α so as to
obtain the most informative query.
Consider a query of type wQuery(α, p) for a given p. We
want to choose α so that the imprecision on w(p) is reduced
as much as possible in the worst-case scenario of answers.
Therefore, we choose α in order to obtain a query3 allowing
w(p) to be compared to (w(p) + w(p))/2. This strategy enables to reduce the imprecision on w(p) by 50% regardless of
the DM’s answer. The procedure is similar for choosing α in
queries of type uQuery.
Compact Representation of u and w. A first approach
for representing functions u and w leads to the introduction
of two variables representing u(z) and w(z) for any relevant consequence z (i.e. the consequences that appear at
least once in the set L of lotteries to be compared). However, this approach becomes more and more inefficient as
the number and/or the size of lotteries increase. It directly
3
In this case, the exact expression of α is:
α := (w(p) + w(p))(w(p) + w(p) − 2w(x+ ))
impacts the number of variables defining the set of admissible functions u and w but also the number of constraints
that must be considered to enforce the desired monotony
of those two functions. For this reason, we adopt a more
compact representation consisting in approximating functions u and w by monotone spline functions [Ramsay, 1988;
Perny et al., 2016].
Spline functions are piecewise polynomials whose pieces
connect with a high degree of smoothness. They have a large
capacity to approximate complex shapes and guarantee that
smooth curves will be derived from smooth data (for more
details, see e.g., [Beatty and Barsky, 1987]). On a given interval [a, b], a spline function f is defined as a linear combination of m P
basis spline functions fi , i ∈ {1, . . . , m},
m
i.e. f (x) =
i=1 λi fi (x). The definition of the basis
spline functions is based on a subdivision of [a, b] into k
parts [ξj , ξj+1 ], j ∈ {1, . . . , k − 1}, with ξ1 = a and
ξk = b. The basis spline functions are piecewise polynomials
on [a, b] as they are polynomials on each interval [ξj , ξj+1 ),
j ∈ {1, . . . , k − 1}; therefore, f is also piecewise polynomial on [a, b] by construction. Moreover, adjacent polynomials have matching derivatives (insuring the smoothness). In
this work, we use cubic splines (defined on [a, b] = [0, 1]) as
they are known to offer a good compromise between smoothness and flexibility (see e.g., [Beatty and Barsky, 1987]).
When one wants to enforce the monotonicity of function
f , a standard approach is to use a basis of monotone spline
functions, namely I-splines [Ramsay, 1988] denoted by Ii ,
i ∈ {1, . . . , m}. For illustrative purpose, Fig. 1 shows a basis composed of six cubic I-spline functions, together with a
spline function generated from this basis.
•••••••
•••
•••
••
••
•
•
0.8
••
••
••
••
•
••
••
0.6 I1
I2
I3
I4 ••••
I5
I6
•
••••
••
••
•••
•••••
••
•
•
•
••••
0.4
••••
•••••
•••••
••••••
•
•
•
••
•••
••
0.2
••
••
•
••
•
0•
0
0.2
0.4
0.6
0.8
1
1
Figure 1: The six cubic I-splines associated with the subdivision of [0, 1] = [0, .3] ∪ [.3, .5] ∪ [.5, .6] ∪ [.6, 1], together
with function f (x) = 0.3I1 + 0.2I3 + 0.5I5 (dotted line).
As function u must be non-decreasing with u(0) = 0 and
u(1) = 1, we define u as a convex combination of I-splines:
Pm
u(x) = i=1 λui Ii (x)
(4)
Pm u
u
where λi ≥ 0 for all i ∈ {1, . . . , m} and i=1 λi = 1.
Similarly, since w is a non-increasing function with w(0) = 1
and w(1) = 0, we define w by:
Pm
w(x) = 1 − i=1 λw
(5)
i Ii (x)
P
m
w
where λw
i ≥ 0 for all i ∈ {1, . . . , m} and
i=1 λi = 1.
Equations 4 and 5 provide a compact definition of functions
u and w using only 2m parameters, a number which remains
constant as the number of lotteries increase. Without any
knowledge of the P
DM’s preferences, function u can be any
m
function in U = { i=1 λui Ii , (λu1 , . . . , λum ) ∈ U }, where:
Pm
U = {(λu1 , . . . , λum ) ∈ [0, 1]m , i=1 λui = 1}
Similarly,
be any element of the set W =
Pm w can initially
w
w
{1 − i=1 λw
i Ii , (λ1 , . . . , λm ) ∈ W }, where:
Pm w
w
m
W = {(λw
1 , . . . , λm ) ∈ [0, 1] ,
i=1 λi = 1}
We now explain how this spline representation is used in
our elicitation procedure. To this end, we show how values
u(p), u(p), w(p) and w(p) are computed for any lottery p.
By using our spline representation in Equation 2 (resp. 3), we
observe that any uQuery (resp. wQuery) yields a linear constraint on parameters λui (resp. λw
i ). Those linear constraints
are then imposed to elements of U and W which are therefore
characterized by linear constraints. Remark that sets U and
W are two convex polyhedra that implicitly represent U and
W at any time of the process.
Moreover, note that, for any lottery p, u(p) (resp. w(p)) is
linear in parameters λui (resp. λw
i ).Thus computing u(p) and
w(p) can be achieved by solving the following linear programs (LPs):

m

X
X

u
 min
λ
p(x)Ii (x)
i
Pu λu1 ,...,λum i=1
x



(λu1 , . . . , λum ) ∈ U

m

X
X

 min
λw
p(x)Ii (x)
i
w
w
Pw λ1 ,...,λm i=1
x



w
(λw
1 , . . . , λm ) ∈ W
where LP Pu (resp. Pw ) is used to minimize u(p) (resp. maximize w(p)). Note that Pw is here stated as a minimization
problem because max{1−x} = 1−min{x}. Values u(p) and
w(p) can similarly be obtained by changing the minimization
problems into maximization problems.
5
Experiments
We carried out numerical tests in order to assess both the average number of queries required by our approach and the
quality of the returned lottery. Interactions with the DM are
simulated by generating answers to queries using WEU with
two hidden, randomly generated, functions uh and wh . To
model uh and wh , we use splines generated by a basis of
m = 12 cubic I-spline functions as defined in Eq. 4, 5.
Number of queries. Our query generation strategy (described in the previous section and denoted by LA hereafter)
is characterized by both the rule for selecting the lottery on
which a query is asked and the rule for choosing the probability parameter α. In order to assess the impact of each
of these rules, we investigate other strategies relaxing either
the former and/or the latter. The different elicitation strategies obtained are denoted by LA, LA and LA, where L
(resp. L) means that the queried lottery is chosen as in our
strategy (resp. is chosen randomly) and A (resp. A) means
that parameter α is chosen as in our strategy (resp. is chosen randomly). Finally, we also consider strategy OA that,
∗+
1+
+
U B LA (p∗ , p∗ )
U B LA (p∗ , p∗ ) +
∗ ++
U B OA (p∗ , p∗ )
U B LA (p∗ , p∗ )
+
LA ∗
∗
+
U
B
(p
,
p
)
∗
0.8
+++
+
∗
++
++++
++++++
∗
++++++
++++++++
0.6
∗
+++
∗
+++++++
∗
+++
∗∗
∗∗
∗∗
0.4
∗∗
∗∗∗
∗∗
∗∗∗
∗∗∗
∗∗∗∗
0.2
∗∗∗∗
∗∗∗∗∗∗∗
∗∗∗∗∗∗
∗∗∗
0
1×
5
10
15
20
25
30
35
40
45
50
×
LLA (p∗ )
LOA (p∗ )
×
0.6
×
×
0.4
×
×
××
×
××
××
×××
××××
×××××××
×××××××××××××××××××
××
5
10
15
20
25
30
35
40
45
50
0.2
0
0
U B LA (p∗ , p∗ )
U B OA (p∗ , p∗ )
×
0.8 ×
0
(a) Reduction of U B(p∗ , p∗ ) as the number of queries (b) Reduction of U B(p∗ , p∗ ) and L(p∗ ) as the number
increases (results averaged over 50 runs).
of queries increases (results averaged over 50 runs).
+
+++++++++++++++++++++++++×
++++++
+++
+++++++++++++++++++++++++++++++++++++++++++++++++++++
×××××××××××××××××××××××××××××××××
+++
×××
+
×
×
+
×
+
0.8
××
+
××
××
×
+
×
××
+
×××
0.6
×××
×
×
+
××
××
××
0.4 +
××××××
×
×
×
×
×
0.2 +
max +
×
×
××
min ×
×
×××
+×××××××××××××××××
0×
0
0.2
0.4
0.6
0.8
1
1
u
u
+++++++++++++++++++
1×
+++
+++
max +
+++
++
+++++
min ×
×
+++
0.8
++
+++++++
++
×
++
0.6
+
+
×
+
+
+
×
+
0.4
×
+
+
××
+
×××××××××××××××××××××××××××××××××××××××××
+
××
++
×××
+
××××××××××++++++++++++++
0.2
+++++++++++++++++
×××××
+++++
××××
××××××××××××××××
×××××
×××
×××
+
×
0
0
0.2
0.4
0.6
0.8
1
w
w
(c) Minimal and maximal values that can be taken by the (d) Minimal and maximal values that can be taken by the
spline representing uh at the end of one run.
spline representing wh at the end of one run
instead of asking a query on the lotteries p∗ and p∗ , asks
a query on the outcome (hence the O) in their support for
which the value-imprecision is the highest; those queries are
easier to answer for a DM as lotteries can be quite complex
objects. We denote by UBS (p∗ , p∗ ) the upper bound obtained with strategy S, S ∈ {LA, LA, LA, LA}. Fig. 2a
plots the decrease of UBS (p∗ , p∗ ) according to the number
of queries averaged over 50 randomly generated sets L of
possible lotteries. Each set L contains 1000 lotteries such
that no stochastic dominance relation exist between them.
The support of each lottery has a size generated uniformly
in {1, . . . , 10} and consists of values generated uniformly in
(0, 1). One observes that UBLA (p∗ , p∗ ) decreases efficiently;
for instance, UBLA (p∗ , p∗ ) ≤ 0.05 after only 15 queries. If
queries are issued on outcomes (strategy OA), the performance is slightly lower and 22 queries are required to reach
UBOA (p∗ , p∗ ) = 0.05. By contrast, the other strategies S are
not able to reduce effectively UBS (p∗ , p∗ ).
Interest of the incremental approach. In Fig. 2c and 2d,
we plot the model that has been learned with our strategy on
one run of 50 queries. The lines umin and umax (resp. wmin
and wmax ) give the minimal and maximal values that can be
taken by the spline function u (resp. w) approximating uh
(resp. wh ) in each point according to the answers of the DM.
Interestingly enough, it appears that functions u and w are far
from being fully elicited, while the estimated loss is already
tiny and lottery p∗ is almost surely optimal. This illustrates
the main interest of the incremental approach that interweaves
elicitation and optimization.
Quality of the returned lottery. We recall that the bound
UBS (p∗ , p∗ ) is a pessimistic estimate of the actual loss
LS (p∗ ) that would be entailed by stopping the elicitation process and choosing lottery p∗ :
LS (p∗ ) = maxq∈L\{p∗ } {uh (q)wh (p∗ ) − uh (p∗ )wh (q)}.
Fig. 2b evaluates how far UBS (p∗ , p∗ ) is from LS (p∗ ) for
S ∈ {LA, OA} (the two best strategies w.r.t. the number of
queries). We observe that during the entire elicitation process,
LS (p∗ ) is much lower than UBS (p∗ , p∗ ) for both S = LA
and S = OA. In fact, LS (p∗ ) reaches 0 after only 10 queries
for both elicitation strategies. This shows that, even if one
Other experiments. We tested how our approach is affected by varying the size of L and the maximum number
of branches in the lotteries. Whichever strategy S is used, we
did not observe a significant increase in the number of queries
required to decrease efficiently U B S (p∗ , p∗ ).
decides an early interruption of the elicitation process (w.r.t.
the estimate UBS (p∗ , p∗ )), the chosen lottery p∗ is likely to
be optimal w.r.t. uh and wh .
Computation time. We monitored the computation time required by our approach as it is crucial that the DM does not
wait a prohibitively long time between two queries4 . Even for
a large set L consisting of 1400 lotteries, the overall computation time for 30 queries is 12sec. and, for instance, the 20th
query requires less than 0.5sec (average times on 50 runs).
4
Implementation in Java using Gurobi 5.6.3 for the LPs. Times
are wall-clock times on a 2.4 GHz Intel Core i5 machine with 8G
main memory.
6
Conclusion
We proposed an incremental elicitation procedure for the
WEU model to solve the problem of identifying, within a
given set of lotteries, a near-optimal solution. The efficiency
of the method relies on a redefinition of functions u and w
as monotone spline functions, which considerably reduces
the elicitation burden while keeping a high descriptive power.
Moreover, we have shown that, contrary to RDU, WEU has
the strong advantage that functions u and w can easily be
elicited concurrently in an incremental elicitation setting.
References
[Allais, 1953] M. Allais. Le comportement de l’homme rationnel devant le risque: Critique des postulats et axiomes
de l’ecole americaine. Econometrica, 21(4):pp. 503–546,
1953.
[Beatty and Barsky, 1987] J.C. Beatty and B.A. Barsky. An
introduction to splines for use in computer graphics and
geometric modeling. Morgan Kaufmann, 1987.
[Chajewska et al., 2000] U. Chajewska, D. Koller, and
R. Parr. Making rational decisions using adaptive utility
elicitation. In AAAI/IAAI, pages 363–369, 2000.
[Chew, 1983] S.H. Chew. A generalization of the quasilinear
mean with applications to the measurement of income inequality and decision theory resolving the Allais paradox.
In Econometrica, 1983.
[Chew, 1989] S.H. Chew. Axiomatic utility theories with
the betweenness property. Annals of operations Research,
19(1):273–298, 1989.
[Fishburn, 1982] P.C. Fishburn. Nontransitive measurable
utility. Journal of Mathematical Psychology, 26:31–67,
1982.
[Fishburn, 1983] P.C. Fishburn. Transitive measurable utility. Journal of Economic Theory, 31(2):293–317, 1983.
[Kahneman and Tversky, 1979] D. Kahneman and A. Tversky. Prospect theory: An analysis of decisions under risk.
Econometrica, pages 263–291, 1979.
[Nakamura, 1989] Y. Nakamura. Risk attitudes for nonlinear
measurable utility. Annals of Operations Research, 19:pp.
311–333, 1989.
[Nguyen et al., 2014] T.H. Nguyen, A. Yadav, B. An,
M. Tambe, and C. Boutilier. Regret-based optimization
and preference elicitation for Stackelberg security games
with uncertainty. In Proceedings of AAAI 2014, pages
756–762, 2014.
[Perny et al., 2016] P.
Perny,
P.
Viappiani,
and
A. Boukhatem.
Incremental preference elicitation
for decision making under risk with the rank-dependent
utility model. In Proceedings of UAI 2016, July 2016.
[Quiggin, 1993] J. Quiggin. Generalized Expected Utility
Theory: The Rank-Dependent Model. Kluwer, 1993.
[Ramsay, 1988] J.O. Ramsay. Monotone regression splines
in action. Statistical science, pages 425–441, 1988.
[Savage, 1951] L. Savage. The Theory of Statistical Decision. In Journal of the American Statistical Association,
1951.
[von Neumann and Morgenstern, 1947] J. von Neumann and
O. Morgenstern. Theory of games and economic behaviour. Princeton University Press, 1947.
[Wang and Boutilier, 2003] T. Wang and C. Boutilier. Incremental Utility Elicitation with the Minimax Regret Decision Criterion. In Proceedings of IJCAI’03, pages 309–
316, 2003.

Download Report

Incremental Decision Making Under Risk with the

Paperzz.com

Your Paperzz