An Automated Agent for Bilateral Negotiation with Bounded Rational Agents with Incomplete Information∗

Raz Lin and Sarit Kraus†
Computer Science Department
Bar-Ilan University
Ramat-Gan, Israel
{linraz,sarit}@cs.biu.ac.il

Jonathan Wilkenfeld†
Department of Government and Politics
University of Maryland
College Park, Maryland
[email protected]

James Barry
Center for International Development and Conflict Management
University of Maryland
College Park, Maryland
[email protected]

∗ The primary author of this paper is a student.
† This author is also affiliated with the University of Maryland Institute of Advanced Computer Studies.
ABSTRACT
Many day-to-day tasks require negotiation, mostly under
conditions of incomplete information. In particular, the opponent’s exact tradeoff between different offers is usually
unknown. We propose a model of an automated negotiation
agent capable of negotiating with a bounded rational agent
(and in particular, against humans) under conditions of incomplete information. Although we test our agent in one
specific domain, the agent’s architecture is generic; thus it
can be adapted to any domain as long as the negotiators’
preferences can be expressed in additive utilities. Our results indicate that the agent played significantly better than
human counterparts in some cases, while in the other cases
there was no significant difference between the results of the
agent and the human players.
Categories and Subject Descriptors
I.2.11 [Artificial Intelligence]: Distributed Artificial Intelligence—Intelligent agents
General Terms
Negotiation
1. INTRODUCTION
Many day-to-day tasks require negotiation, often in a bilateral context. In this setting two agents are negotiating
over one or several issues in order to reach consensus [9, 11,
13]. The sides might have conflicting interests, expressed in
their utility functions, and they might also cooperate in order to reach an agreement beneficial for both sides [2, 4, 12,
18, 19]. While models for bilateral negotiations have been
extensively explored, the setting of bilateral negotiation between an automated agent and a bounded rational agent is
an open problem for which an adequate solution has not
yet been found. Despite this, the advantages of an automated agent with such capabilities cannot be overstated. Such an agent could help train people in negotiation, and it could represent humans in e-commerce environments and in other daily-life activities. Our proposed automated agent makes a significant contribution in this respect.
An even more difficult problem occurs when incomplete
information is added into this environment. That is, there
is also a lack of information regarding some parameters of
the negotiation. Although this problem is not new for researchers in the fields of Multi-Agent Systems (MAS) and
Game Theory, most research makes simplifying assumptions
that do not necessarily apply in genuine negotiations. For instance, [8, 17] focus on bilateral negotiation where both sides
have complete information regarding both the state of the
world and each other’s utilities. Others, like [8, 9, 10, 13],
discuss bilateral or multi-agent, multi-issue negotiations with incomplete information. The incomplete information is dealt with using heuristics, genetic algorithms, influence diagrams or sequential equilibrium. However, none of these approaches takes into account the bounded rationality of the
opponent.
We consider a setting of bilateral negotiation with incomplete information between an automated agent and a
bounded rational agent. The incomplete information is expressed as uncertainty regarding the utility preferences of
the opponent. The negotiation itself involves multi-attribute issues and time constraints. During the negotiation
process, one agent gains utility over time, while the other
loses. If no agreement is reached by the given deadline a
status-quo outcome is enforced. We propose a model of an
automated agent for this type of negotiation by using a qualitative decision making approach, adjusting the approach
described in [2, 3, 7, 20] to incomplete information settings.
We conducted three sets of experiments in which we matched
our automated agent against (a) human negotiators, (b) an
automated agent following an equilibrium strategy, and (c)
itself, that is, the same agent model playing both
sides. By analyzing the results of the experiments we conducted, we show that our automated agent is capable of negotiating efficiently and reaching multi-attribute agreements
in such an environment. When playing one of the sides in
the negotiation our agent reached significantly better results
than the human players, and also allowed both sides to reach
an agreement significantly faster. On the other hand, when the automated agent played the other side, it did not reach significantly better results than the human player, but it did not reach worse results either; there was no significant difference between the results of the human players and the automated agent in that role.
The rest of the paper is organized as follows. Section 2
provides an overview of bilateral negotiation with incomplete information. Section 3 gives an overview of decision making techniques. Section 4 presents the design of our automated agent, including its belief-updating and decision
making mechanisms. Section 5 describes the experimental
setting and methodology and reviews the results. Finally,
Section 6 provides a summary and discusses future work.
2. PROBLEM DESCRIPTION
We consider a bilateral negotiation in which two agents
negotiate to reach an agreement on conflicting issues. The
negotiation can end either when (a) the negotiators reach a
full agreement, (b) one of the agents opts out, thus forcing
the termination of the negotiation with an opt-out outcome
denoted OPT, or (c) a predefined deadline, denoted dl, is reached, in which case a partial agreement, if one was reached, is implemented and, if no agreement was reached, a status quo outcome, denoted SQ, is implemented. Let I denote the set
of issues in the negotiation, oi the set of values for each i ∈ I
and O = o1 × o2 × . . . × o|I| the set of values for all issues.
Since we allow partial agreements, ∅ ∈ oi for each i ∈ I. An offer is then denoted as a vector $\vec{o} \in O$. It is assumed that
the agents can take actions during the negotiation process
until it terminates. Let Time denote the set of time periods
in the negotiation, that is Time = {0, 1, ..., dl}. One agent
gains as time passes, while the other agent loses.
In each period t ∈ Time of the negotiation, if the negotiation has not terminated earlier, each agent can propose a
possible agreement, and the opponent can either accept the
offer, reject it or make a counter-offer. Each agent can either propose an agreement which consists of all the issues in
the negotiation, or a partial agreement. Each agent can also
send queries, promises to accept an agreement, comments or
make threats. As opposed to the model of alternating offers
[18], each agent can perform up to M interactions with the
opponent agent at a given time period. Thus, an agent must
worry that its opponent may opt out in any time period.
As we deal with incomplete information, we assume that
there is a finite set of agent types. These types are associated
with different additive utility functions. Formally, we denote
the possible types of the agents Types = {1, . . . , k}. We
refer to an agent whose utility function is ul as an agent of
type l, and ul : {O ∪ SQ ∪ OPT} × Time → R. Each
agent is given its exact utility function. The agent is also
aware of the set of possible utility functions of the opponent.
However, the exact utility function of the rival is private
information. Our agent has some probabilistic belief about
the type of the other agent. This belief may be updated
over time during the negotiation process (for example, using
Bayes' formula).
3. DECISION MAKING TECHNIQUES
Different approaches are used to depict an agent's behavior or strategy in the course of making decisions. Some use
the notion of qualitative decision making [2, 3, 7, 20], while
others use the traditional quantitative methods [5, 12, 14,
18, 19], even when dealing with incomplete information and
bounded rational agents. One shortcoming of the traditional
quantitative representation of probability and utility is that
it does not take into account irrational choices. On the
other hand, qualitative decision theory seeks to handle unforeseen events and partial information without relying on
numerical utility functions [6].
In the field of game theory, the problem of modeling strategies for negotiation with incomplete information was originally confronted using the notion of Bayesian Nash equilibrium and sequential equilibrium [12, 18, 19]. However,
these notions require that both sides follow the equilibrium
and it is known that bounded rational agents in general,
and humans in particular, do not necessarily follow equilibrium strategies [1]. It is not even clear whether they play as
expected utility maximizers (see for example [11]). To confront this issue, other notions, still under the quantitative
equilibrium paradigm, have been suggested, for example the
trembling hand equilibrium [18, 19] and the "standard logit model" [5].
In our environment we assume that a bounded rational
agent (a) does not necessarily follow equilibrium strategies,
and (b) is not an expected utility maximizer. Thus, we
need new techniques to confront this problem. Techniques
which make these assumptions generally fall in the category
of qualitative decision making [2, 3, 7, 20]. In the remainder
of this section we present an overview of those techniques.
Brafman and Tennenholtz [3] describe two approaches to qualitative decision theory, both of which provide qualitative tools. The first approach deals with making decisions that are consistent with classical decision theory, but replaces the probabilities and utilities of the quantitative approach with non-quantitative descriptions of beliefs and preferences. The second approach deals with representing beliefs and preferences that do not necessarily conform to the axioms of classical decision theory. Thus, the
second approach does not require consistency with the principle of expected utility maximization. An example of such
usage is the maximin criteria as described in [20]. Tennenholtz [20] assumes that both agents follow the maximin
strategy. That is, if Sj denotes the set of strategies available for agent j, and uj denotes the utility function for that
agent, we can compute the maximin value. In the case of
two agents, agent 1 and agent 2, for example, the maximin
value of agent 1 is given by:
$$\max_{s \in S_1} \min_{r \in S_2} u_1(s, r) \qquad (1)$$
Dubois et al. [7] also describe the maximin criterion as a pessimistic attitude to decision making for a risk-averse agent. That is, they show a counterpart to von Neumann-Morgenstern expected utility theory which requires only ordinal
scales. These ordinal scales are used for preference and
uncertainty, and are equipped with the maximum, minimum and an order reversing operation. Using these qualitative settings, both the incomplete information regarding
the state of the world and the decision-maker’s preferences
on the actions are represented. Adapting, modifying and
incorporating the maximin criteria in our agent can help it
cope with incomplete information, and this is the basis for
our mechanism for generating offers, as described in Section
4.1.
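As a minimal illustration (ours, not taken from [20]), the maximin value of agent 1 over finite strategy sets can be computed directly from Equation (1):

```python
def maximin_value(strategies_1, strategies_2, u1):
    """Maximin value of agent 1 (Equation 1): the highest utility agent 1 can
    guarantee itself no matter which strategy agent 2 chooses."""
    return max(min(u1(s, r) for r in strategies_2) for s in strategies_1)

# Example with two strategies per agent, given as a payoff function for agent 1.
payoff = {('s1', 'r1'): 3, ('s1', 'r2'): 0, ('s2', 'r1'): 2, ('s2', 'r2'): 2}
print(maximin_value(['s1', 's2'], ['r1', 'r2'], lambda s, r: payoff[(s, r)]))  # -> 2
```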
Luce [14] presents another notion that replaces the expected utility, though it does not fall in the category of
qualitative decision theory. Luce’s model of choice describes
behavior that is stochastic in a systematic way. Thus, this
model of choice can be used when dealing with incomplete
information by assigning to each alternative a probability of
being chosen. In essence, if A is the space of alternatives,
then the model uses Luce numbers instead of utility (or payoff). That is, the decision maker attaches to each alternative
a ∈ A a non-negative number v(a). Then, an action a∗ is
chosen from A with a certain probability. This probability
is calculated by the following equation:
P
v(a∗ )
a∈A v(a)
(2)
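To make Equation (2) concrete, the following small sketch (our illustration, not code from [14]) computes Luce numbers from a table of non-negative values and draws an alternative with the corresponding probability:

```python
import random

def luce_numbers(values):
    """Luce number of each alternative: v(a) divided by the sum over all alternatives."""
    total = sum(values.values())
    return {a: v / total for a, v in values.items()}

def luce_choice(values):
    """Choose an alternative a* with probability v(a*) / sum_a v(a), as in Equation (2)."""
    numbers = luce_numbers(values)
    alternatives = list(numbers)
    return random.choices(alternatives, weights=[numbers[a] for a in alternatives])[0]

# Three alternatives valued 2, 6 and 7 are chosen with probabilities 2/15, 6/15 and 7/15.
print(luce_numbers({'x': 2, 'y': 6, 'z': 7}))
```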
Combining the maximin criterion and the notion of Luce numbers allows us to deal with bounded rational agents and with the incomplete information of the negotiation. The transformation of the utility function into an ordinal scale, achieved by using the maximin criterion, makes it easier to rank the possible agreements and to act based on those rankings. This is a desired property, since we believe that bounded rational agents perceive agreements on an ordinal ranking scale and act according to that scale. Incorporating
this in our agent design allows the automated agent to act
based on those assumptions.
4. AGENT DESIGN
Due to the settings in which our agent operates - incomplete information and the bounded rationality of the opponent - our agent is built with two mechanisms: (a) a decision
making mechanism, and (b) a mechanism for updating beliefs.
For the decision making mechanism we incorporate a qualitative decision making model, rather than quantitative decision making. In using such a model we do not necessarily
assume that the opponent agent will follow the equilibrium
strategy or try to be a utility maximizer. Also, this model
is better suited for cases in which the utility or preferences
are unknown but can be expressed in ordinal scales or as
preference relations [3, 7].
For the mechanism for updating beliefs, the formal Bayesian
updating rule is used. This rule allows us to compensate for
incomplete information and enables good adaptation in a negotiation with time-constraints. In such a negotiation there
are no past interactions to learn from and not enough time
periods to build a complete model, such as neural networks
[17] or genetic algorithms [13]. Using the Bayesian updating
rule we can update the believed type of the opponent and at
each time period act as if the opponent is of a certain type.
We describe both mechanisms in the following subsections.
4.1 The Qualitative Decision Making Component (QDM)
Below we describe the qualitative strategy we follow. While
we provide some theoretical foundations for our approach,
we demonstrate its effectiveness using simulations with human subjects in an environment of incomplete information, as
described in Section 5.
The qualitative decision making component takes into account the agent’s utility function, as well as the believed
type of the opponent. This data is used both for deciding
whether to accept or reject an offer and for generating an
offer, which will be proposed to the opponent in the next
time period. In our settings, although several offers can be
proposed in each time period, we restrict our agent to making a single offer per period, because our mechanism for generating offers only produces one
distinct offer at a given time period. Also, in our domain,
our agent never chooses the option of opting out of the negotiation, which is assigned a low utility value. Instead, the
agent proposes offers until the negotiation terminates, and
in the worst case achieves the status-quo outcome. This is
due to the fact that the opting-out option was not integrated
as part of the values for the issues in the negotiation.
4.1.1 Generating Offers
We assume that, at each time period t, the automated
agent has a belief about the type of its opponent. This
believed type, denoted by BT (t), is the one that the agent
presumes to be the most probable for the opponent. The
agent uses the utility function associated with that type in
all of the calculations in that time period. In Section 4.2 we
describe in detail the elicitation of this belief.
We denote by $u_{opp}^{BT(t)}$ the utility function associated with the believed type of the opponent at time t. From this utility function, our automated agent derives the Luce numbers [14] associated with each offer. Since the Luce number is calculated based on a given utility function, we denote the Luce number of an offer derived from the opponent's believed utility at time t by $lu_{opp}(\text{offer} \mid BT(t))$, and the Luce number of an offer derived from the agent's own utility simply as $lu(\text{offer})$. Property 1 states an important relation regarding Luce numbers that we will use.
Property 1 (Luce Number Relation). For every two
offers x and y and agent j, where luj (x) and luj (y) denote
the Luce numbers associated with offers x and y respectively,
if uj (x) ≥ uj (y) then luj (x) ≥ luj (y) .
We then use a qualitative decision strategy to generate
the next offer. The strategy uses an adapted optimistic criterion, as described in Equation (1). We denote our function
by QO(t) (standing for Qualitative Offer), where t ∈ Time
is the time of the offer. In general, the strategy selects an
offer o in time t such that:
$$QO(t) = \arg\max_{o \in O} \min\{\alpha, \beta\} \qquad (3)$$

where $\alpha = u_j(o, t)$ is the utility of offer $o$ for our agent $j$ and $\beta = [lu_{opp}(o \mid BT(t)) + lu(o)] \cdot u_{opp}^{BT(t)}(o, t)$.
Intuitively, instead of simply taking into consideration
only the utility value of an offer for both sides, the utility value of the opponent is multiplied by a given factor.
This factor encapsulates the sum of Luce numbers of the
offer for both sides. Using this factor enables us to also take
into account the probabilities that the given offer will be
accepted by either side, using the same ordinal scale. Thus, the opponent's utility is weighted up when both sides have high probabilities of accepting the offer and weighted down when those probabilities are low, and only then is it compared to the utility of the automated agent when selecting the next offer to be proposed.
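The following sketch (ours, not the authors' implementation) puts Equation (3) together with the Luce numbers; utilities are assumed non-negative, the time argument is suppressed for brevity, and the believed opponent utility stands in for $u_{opp}^{BT(t)}$:

```python
def luce_numbers(utility):
    """Luce number of each offer under the given utility (assumes non-negative values)."""
    total = sum(utility.values())
    return {o: u / total for o, u in utility.items()}

def qualitative_offer(own_utility, believed_opp_utility):
    """QO: argmax over offers o of min{u_own(o), (lu_opp(o) + lu_own(o)) * u_opp(o)}."""
    lu_own = luce_numbers(own_utility)
    lu_opp = luce_numbers(believed_opp_utility)

    def score(o):
        alpha = own_utility[o]                                     # our own utility
        beta = (lu_opp[o] + lu_own[o]) * believed_opp_utility[o]   # weighted opponent utility
        return min(alpha, beta)

    return max(own_utility, key=score)

print(qualitative_offer({'A': 2, 'B': 6, 'C': 7}, {'A': 4, 'B': 11, 'C': 9}))  # -> 'C'
```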
The next subsection deals with the question of when the
agent should accept an incoming offer or reject it.
4.1.2 Accepting Offers
The agent needs to decide what to do when it receives an
offer from its opponent, offeropp , at time t − 1. Obviously,
if the utility of offeropp is higher than or equal to the utility
of the offer our agent is about to propose at the next time
period, it should accept it now, and not wait for the next
time period, in which there is a chance that the opponent
will reject that offer. Formally, if we refer to the automated
agent as agent 1 and the bounded rational agent as agent
2, if u1 (offeropp ) ≥ u1 (QO(t)) then our agent accepts the
offer. Otherwise, our agent should not immediately rule out
accepting the offer it has just received. Instead, it should
take into consideration the possibility that its counter offer
will be accepted or rejected by the opponent. This is done by
comparing the opponent's utility (actually, the believed utility of the opponent) from its original offer with the opponent's utility from our counter offer. If the difference is
lower than a given threshold T (in the simulations, T was set
to 0.05), then there is a high probability that the opponent
will be indifferent between its original offer and our counter
offer, so our agent will reject the offer and propose a counter
offer at time t (taking a risk that the offer will be rejected),
since the counter offer has a better utility value for our agent.
If the difference is greater than the threshold, i.e., there is
a higher probability that the opponent will not accept our
counter offer, our agent will accept the opponent’s offer with
a given probability, which is attached to each outcome. To
this end we define the rank number, which is associated with
each offer and a given utility function u, denoted rank(offer).
The rank number of an offer is calculated by ordering all
offers on an ordinal scale between 1 and |O| according to
their utility values, and dividing the offer’s ordering number
by |O|. Formally, if | u2 (QO(t)) − u2 (offeropp ) | ≤ T the
agent rejects the offer and proposes QO(t). Otherwise, it
accepts the offer with a probability that is given by the rank
number rank(offeropp ), and rejects it and counters with offer QO(t)
with probability 1 − rank(offeropp ). The intuition behind
this is to enable the agent also to accept agreements based
on their relative values, on an ordinal scale of [0..1], and not
based on their absolute values.
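The acceptance rule above can be summarized by the following sketch (our illustration only; the utility under which the rank number is computed is left implicit in the text, so taking it under our own agent's utility is an assumption):

```python
import random

def respond_to_offer(offer_opp, offer_qo, u1, u2, rank1, threshold=0.05):
    """Decide on an incoming offer (sketch of Section 4.1.2).

    u1: our agent's utility, u2: the believed utility of the opponent,
    offer_qo: the counter offer QO(t) we would otherwise propose,
    rank1(o): rank number of o in [0, 1] (assumed here to be taken under u1).
    """
    # Accept immediately if the incoming offer is at least as good as our counter offer.
    if u1(offer_opp) >= u1(offer_qo):
        return 'accept'
    # If the opponent is believed to be nearly indifferent between the two offers,
    # take the risk: reject and propose our own offer.
    if abs(u2(offer_qo) - u2(offer_opp)) <= threshold:
        return ('reject', offer_qo)
    # Otherwise accept with probability rank1(offer_opp); reject and counter otherwise.
    if random.random() < rank1(offer_opp):
        return 'accept'
    return ('reject', offer_qo)
```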
In the next subsection we demonstrate that our proposed
solution also conforms to some of the properties of the Nash
bargaining solution. This gives us the theoretical basis for
the usage of our technique in bilateral negotiation, and for
the assumption that the offers proposed by our automated
agent will also be considered for acceptance by the opponent.
4.1.3 The Nash Bargaining Solution
We adopt from Luce and Raiffa [15] the definitions of a bargaining problem and the Nash bargaining solution. We denote by B = (u1(·), u2(·)) the bargaining problem with two utilities, u1 and u2. The Nash bargaining solution (note that the solution is not the offer itself, but rather the payoff of the offer to be made, or of several offers in cases where they have the same payoff) is defined by several characteristics and is usually designed for a mediator in an environment with complete information. A bargaining (or negotiation) solution f should satisfy symmetry, efficiency, invariance and independence of irrelevant alternatives, wherein:
1. Symmetry, that is, if both players have the same bargaining power, then neither player will have any reason
to accept an agreement which yields lower payoff for it
than for its opponent. For example, for the solution to
be symmetric, it should not depend on the agent who
started the negotiation process.
2. Efficiency, that is, two rational agents will not agree
on an agreement if its utility is lower for both of them
than another possible agreement. This solution is said
to be Pareto-optimal.
3. Invariance, that is, for all equivalent problems B and B′, where B′ = (α1 + β1 · u1(·), α2 + β2 · u2(·)) with α1, α2 ∈ R and β1, β2 ∈ R+, the solution is the same, f(B) = f(B′). That is, two positive affine transformations can
be applied on the utility functions of both agents and
the solution will remain the same.
4. Independence of irrelevant alternatives, the solution
f(B) = f(B′) whenever B′ ⊆ B and f(B) ⊆ B′. That
is, if new agreements are added to the problem in such
a manner that the status quo remains unchanged, either the original solution is unchanged or it becomes
one of the new agreements.
It was shown by Nash [16] that the only solution that satisfies all of these properties is the one that maximizes the product of the
agents’ utilities. However, as we stated, the Nash solution
is usually designed for a mediator who is trying to mediate
between two agents. Since we propose a model for an automated agent who negotiates with bounded rational agents
following the QO function (Equation 3), our solution cannot
satisfy all of these properties. To this end, we modified the
independence of irrelevant alternatives property to allow for
a set of possible solutions instead of one unique solution:
• Property 4a (Independence of irrelevant alternatives
solutions) A negotiation solution f satisfies independence of irrelevant alternatives solutions if the set of
all possible solutions of f (B) is equal to the set of
all possible solutions of f(B′) whenever B′ ⊆ B and f(B) ⊆ B′.
Proving that our agent's strategy for proposing offers conforms to those properties is important since, although the agent should maximize its own utility, it should also find agreements that would be acceptable to its opponent. That is, those agreements should have the properties of symmetry, efficiency and independence of irrelevant alternatives solutions. We also show that in certain cases QO generates agreements which are better for the automated agent than agreements that would have been generated by following the Nash solution. Below we state only the propositions; the proofs (which are based on the Luce number relation and the assumption that the utility of the opponent is known) are beyond the scope of this paper.
Proposition 4.1. QO satisfies symmetry (Property 1),
efficiency (Property 2) and independence of irrelevant alternatives solutions (Property 4a). QO satisfies invariance
(Property 3) under the following conditions:
1. u1(o) < u2(o) ⇔ u′1(o) < u′2(o) (this condition is applicable in domains such as ours, in which there are contradicting preferences)

2. ∀o, p ∈ O: u1(o) > u2(p) ⇒ u′1(o) > u′2(p)
It is also important to show that our proposed solution
is better than the worst outcome. Thus, we have the next
proposition:
Proposition 4.2. QO always generates an agreement which
is better than the worst outcome for both agents.
The last proposition indicates that there might be some
solutions which are better for one agent and also good for
the opponent, but are not part of the Nash solution:
Proposition 4.3. Let the agreement given by the Nash
solution be x. If there exists an agreement y where u1 (y) >
u1 (x) and u1 (y) < (lu1 (y)+lu2 (y))u2 (y) then QO’s solution
will be y rather than x.
To clarify the last proposition we give an example, illustrated by Table 1. Table 1 states the utility of both agents
for three different agreements. Assuming the current agent
is agent 1, while the Nash solution will lead to utilities of
(6,11) for both agents, the QO solution will lead to utilities
of (7,9). This solution does not satisfy the invariance property above, yet it is still a good agreement, and it is better for the current agent.
Table 1: Example: Utility of a Bargaining Problem

  u1(·)   u2(·)   (lu1(·) + lu2(·)) · u2(·)
    2       4       1.2
    6      11       9.441    (Nash solution)
    7       9       7.575    (QO solution)
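The numbers in Table 1 follow directly from the definitions; the sketch below (ours) recomputes the weighted opponent utilities and the Nash and QO selections for the three agreements:

```python
agreements = {'A': (2, 4), 'B': (6, 11), 'C': (7, 9)}   # (u1, u2) for each agreement
sum1 = sum(u1 for u1, _ in agreements.values())          # 15
sum2 = sum(u2 for _, u2 in agreements.values())          # 24

def weighted_opp(u1, u2):
    """(lu1(o) + lu2(o)) * u2(o), the third column of Table 1."""
    return (u1 / sum1 + u2 / sum2) * u2

nash = max(agreements, key=lambda o: agreements[o][0] * agreements[o][1])
qo = max(agreements, key=lambda o: min(agreements[o][0], weighted_opp(*agreements[o])))
print(nash, qo)   # B (utilities (6, 11)) and C (utilities (7, 9)), as in Table 1
```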
We next examined how time affects our proposed qualitative function, i.e., which solutions will be proposed as time passes. We demonstrated that if both agents have the same constant discount rate, that is, uj(o, t) = δ^t · uj(o, 0) with 0 < δ < 1, QO will generate the same solution at each time unit. On the other hand, if the effect of time is characterized as fixed losses/gains, that is, uj(o, t) = uj(o, 0) + t · C, where C ≠ 0 is the time cost, we proved that QO might generate different solutions at each time period. The difference between the two cases is that under the constant discount rate the probabilities assigned to each offer, that is, the Luce numbers, remain the same, while under fixed losses/gains the probabilities change. Intuitively it means
that if the utility functions maintain the preference relation
between their values and also the probabilities which are
assigned to them, the QO solution will remain the same.
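To see why a constant discount rate leaves the QO solution unchanged, note the following short derivation (a sketch, assuming non-negative utilities so that the Luce numbers are well defined):

```latex
lu_j(o, t) \;=\; \frac{\delta^t u_j(o, 0)}{\sum_{a \in O} \delta^t u_j(a, 0)}
           \;=\; \frac{u_j(o, 0)}{\sum_{a \in O} u_j(a, 0)} \;=\; lu_j(o, 0),
\qquad 0 < \delta < 1 .
```

Both arguments of the min in Equation (3) are then scaled by the same positive constant δ^t, so the maximizing offer, and hence the QO solution, is the same at every time period. Under fixed losses/gains the additive term t · C does not cancel out of the Luce numbers, so the probabilities, and thus the selected offer, may change.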
4.2 The Bayesian Updating Rule Component
The Bayesian updating rule (BUR) is based on Bayes' Theorem and provides a practical learning algorithm. Bayesian
updating is the core component of the Bayesian Nash equilibrium ([18], p. 24-29), and it is used to deduce the current
state given a certain signal. We assert that there are several
types (or clusters) of possible agents. The bounded rational
agent should be matched to one such type. In each time
period, our automated agent consults the BUR component
in order to update its belief regarding the opponent’s type.
In the rest of this subsection we describe the functionality
of the BUR component.
Recall that there are k possible types for an agent. At
time t = 0 the prior probability of each type is equal, that is, P(type) = 1/k, ∀ type ∈ Types. Then, for each time period t we calculate the posterior probability of each of
the possible types, taking into account the history of the
negotiation. This is done incrementally after each offer is
received or accepted. Using the calculated probabilities, the
agent selects the type whose probability is the highest and
proposes an offer as if this is the opponent’s type. Formally,
at each time period t ∈ Time and for each type ∈ Types
and offert (the offer at time period t) we compute:
$$P(type \mid offer_t) = \frac{P(offer_t \mid type) \cdot P(type)}{P(offer_t)} \qquad (4)$$

where $P(offer_t) = \sum_{i=1}^{|Types|} P(offer_t \mid type_i) \cdot P(type_i)$. Since the Luce numbers actually assign probabilities to each offer, $P(offer_t \mid type)$ is computed using the Luce numbers.
Using the above equations we deduce the believed type of
the bounded rational agent for each time period t, denoted
as BT (t), using the following equation:
$$BT(t) = \arg\max_{type \in Types} P(type \mid offer_t), \quad \forall t \in Time \qquad (5)$$
Using this updating mechanism enables our belief-updating component to also conform to the following conditions,
which are generally imposed on an agent’s system of beliefs, and which are part of the conditions for a sequential
Bayesian equilibrium [18]: (a) consistency and (b) never
dissuaded once convinced. Consistency implies that agent
j’s belief should be consistent with its initial belief and with
the possible strategies of its opponents. These beliefs are updated by the agent whenever possible, while never dissuaded
once convinced implies that once an agent is convinced of its
opponent’s type with a probability of 1, or convinced that its
opponent cannot be of a specific type, it is never dissuaded
from its view. The simulation results indeed show that in more than 70% of the simulations the automated agent believed, either with probability 1 or with the highest probability amongst the possible types, that its opponent was of the correct type, and it then generated offers better suited to this type of agent.
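A minimal sketch of this belief update (our illustration; the Luce number of the received offer under each type's utility stands in for P(offer_t | type), as described above, and utilities are assumed non-negative):

```python
def update_type_belief(prior, offer, utilities_by_type):
    """One step of the Bayesian updating rule (Equations 4 and 5).

    prior:             dict mapping each type to its current probability P(type).
    utilities_by_type: dict mapping each type to that type's utility of every offer.
    """
    def luce(utilities, o):
        return utilities[o] / sum(utilities.values())

    likelihood = {t: luce(utilities_by_type[t], offer) for t in prior}   # P(offer_t | type)
    evidence = sum(likelihood[t] * prior[t] for t in prior)              # P(offer_t)
    posterior = {t: likelihood[t] * prior[t] / evidence for t in prior}  # Equation (4)
    believed_type = max(posterior, key=posterior.get)                    # Equation (5)
    return posterior, believed_type
```

Starting from the uniform prior P(type) = 1/k, the agent applies such an update after every received offer and then plans its next offer against the believed type BT(t).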
5. EXPERIMENTS
We developed a simulation environment which is adaptable such that any scenario and utility functions, expressed over multi-attribute issues, with or without the effect of time, can be used with no additional changes to the simulation interface or the automated agent.
Our automated agent can play either role in the negotiation,
while the human counterpart accesses the negotiation interface via a web address (http://nova.cs.biu.ac.il:8080/nego/).
The negotiation itself is conducted using a semi-formal language. Each agent constructs an offer by choosing values for the different issues; the offer is then rendered in plain English and sent to its counterpart.
To test the efficiency of our proposed agent, we conducted three sets of experiments on a specific negotiation domain. (To show that our agent is capable of negotiating in other domains as well, we loaded another domain and tested the agent; as expected, the agent performs as well as in the original domain, since only the utility functions play a role, not the scenario or the domain.) In the following subsections we describe our domain and our experimental methodology, and we review the results.
5.1 Experimental Domain
The experimental domain adheres to the problem definitions described in Section 2. In our scenario England and
Zimbabwe are negotiating in order to reach an agreement
growing out of the World Health Organization’s Framework
Convention on Tobacco Control, the world’s first public health
treaty. The principal goal of the convention is "to protect
present and future generations from the devastating health,
social, environmental and economic consequences of tobacco
consumption and exposure to tobacco smoke." Thus, the domain consists of two agents, England and Zimbabwe. There
are three possible agent types, and thus a set of six different utility functions was created. This set describes the
different types or different approaches towards the negotiation process and the other party. For example, type (a)
has a long term orientation regarding the final agreement,
type (b) has a short term orientation, and type (c) has a
compromise orientation.
Each negotiator was assigned its utility function at the
beginning of the negotiation but had incomplete information regarding the opponent’s utility. That is, the different
possible types of the opponent were public knowledge, but
the exact type of the opponent was unknown. The negotiation lasts at most 14 time periods, each with a duration of
two minutes. If an agreement is not reached by the deadline
then the negotiation terminates with a status quo outcome.
Each party can also opt out of the negotiation if it decides
that the negotiation is not proceeding in a favorable way.
Opting out by England means trade sanctions imposed by
England on Zimbabwe (including the ban on the import of
tobacco from Zimbabwe), while if Zimbabwe opts out then
it will boycott all British imports.
The leaders of both countries are about to meet at a long
scheduled summit. They must reach agreement on the following multi-attribute issues, constituting a total of 576 possible agreements: (a) the total amount to be deposited into
the Global Tobacco Fund to aid countries seeking to rid
themselves of economic dependence on tobacco production,
(b) impact on other aid programs, (c) trade issues, and (d)
creation of a forum to explore comparable arrangements for
other long-term health issues. While on the first two issues
there are contradicting preferences for England and Zimbabwe, for the last two issues there are several options which
might be preferred by both sides.
The first issue has an impact on England’s budget and on
the effectiveness of near-term and long-range economic benefits to Zimbabwe. The possible values are (a) $10 billion,
(b) $50 billion or (c) $100 billion. While the latter value is
the most beneficial for Zimbabwe, it is the most costly for
England.
The second issue affects the net cost to England and the
overall benefit to Zimbabwe. If other aid programs are reduced, the economic difficulties for Zimbabwe will increase.
The possible values are (a) no impact, (b) reduction equals
half the size of the fund, or (c) reduction equals the size
of the fund. The outcome of no impact on other aid programs minimizes negative consequences for Zimbabwe and
increases the total cost for England.
The third issue allows both countries to use trade policy to
extract concessions or provide incentives to the other party.
They can use restrictive trade barriers such as tariffs or they
can liberalize their trade policy by increasing imports from
the other party. There are both benefits and costs to these
policies: tariffs may increase revenue in the short run but
lead to higher prices for consumers and possible retaliation
by affected countries in the long run. Increasing imports
can cause problems for domestic industries, but it can also
lead to lower consumer costs and improved welfare. The
possible values are (a) England increases imports, (b) England reduces imports, (c) Zimbabwe increases tariffs, or (d)
Zimbabwe decreases tariffs.
Finally, the last issue relates to the precedent that may be
set by the Global Tobacco Fund. If the fund is established,
Zimbabwe will be highly motivated to apply the same approach to other global health issues. This would be very
costly to England, yet highly beneficial to Zimbabwe. The
possible values are (a) creation of fund, (b) creation of committee to discuss creation of fund or (c) creation of committee to develop agenda.
5.2 Experimental Methodology
In the first set of experiments we tested our automated
agent against human subjects. The experiment involved
44 simulations with human subjects, divided into 22 pairs.
Each simulation was divided into two parts: (i) negotiating
against another human subject, and (ii) negotiating against
the automated agent. The subjects did not know in advance
against whom they played. Also, in order not to bias the results as a consequence of the subjects getting familiar with
the domain and the situation, for exactly half of the subjects the first part of the simulation consisted of negotiating
with a human opponent, while the other half negotiated first
with the automated agent. The length of each part was at
most 28 minutes, after which a small break was given and
the second part started. The outcome of each negotiation is
either the reaching of a full agreement, opting out, or reaching the deadline (either with partial agreement or with a
Status Quo outcome).
The second set of experiments included 30 simulations
which tested QO, our automated agent, which follows the
qualitative decision making algorithm, against another automated agent which followed the Bayesian Equilibrium strategy. This set of experiments is important since it tests our
agent against another automated agent that incorporates a
commonly used approach in negotiations with incomplete
information. The results of this set of experiments provide strong reinforcement for our proposed model.
The last set of experiments was conducted to test how our
automated agent plays against itself, that is, its opponent
is another QO agent. To this end, another 30 simulations
were conducted. The following section describes the results
of all the experiments.
5.3 Research Questions and Results
Several research questions were raised in order to test the
efficiency of our automated agent. The answers to those research questions allowed us to test the goals of our design
of the automated agent. The main goal was to enable the
automated agent to achieve higher utility values at the end
of the negotiation than the human negotiator, playing the
same role, achieved, and to allow the negotiation process to
end faster as compared to negotiations without the automated agent. Another secondary goal was to test whether
the automated agent increased the social welfare of the outcome.

Table 2: Final negotiations utility values and sum of utility values

  Parameter           Avg       Stdev
  QZ vs. HE          -27.4      287.2
  HZ vs. HE         -260        395.6
  QE vs. HZ          150.5      270.8
  HE vs. HZ          288.0      237.1
  HZ vs. HE         -260.14     395.64
  HZ vs. QE          113.09     353.23
  HE vs. HZ          288.0      237.12
  HE vs. QZ          349.14     299.36
  Sum - HE vs. QZ    321.7      223.4
  Sum - HE vs. HZ     27.86     449.9
  Sum - HZ vs. QE    263.6      270.5
  Sum - HZ vs. HE     27.86     449.9

We state the research questions below, followed by the experimental results.
1. Are the final utility values the automated agent gained
higher than the final utility values gained by a human
player in the same role?
2. Are the final utility values gained by humans playing
against an automated agent higher than the final utility values gained by the same humans playing against
human counterparts?
3. Is the sum of the final utility values higher when a
human subject played against an automated agent as
opposed to playing against a human opponent?
4. Does playing against an automated agent cause the
negotiation to end faster than against a human player?
5. Is the total number of offers made in a negotiation with
an automated agent smaller than the total number of
offers exchanged between two human players?
6. Is the final QO agent utility value higher than that of
the Bayesian Equilibrium agent?
7. How does the QO agent play when facing another QO
agent playing the opposite role?
Table 2 summarizes the average utility values of all the
negotiations and the average of the sum of utility values in
all the experiments. HZ and HE denote the utility value
gained by the human playing the role of Zimbabwe or England, respectively, and QZ and QE denote the utility value
gained by the QO agent playing either role. The utility values ranged from -575 to 895 for the England role and from
-680 to 830 for the Zimbabwe role. The Status-Quo value at
time T0 was 150 for England and -610 for Zimbabwe. England had a fixed gain of 12 points per time period, while
Zimbabwe had a fixed loss of 16 points per time period.
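As a rough illustration of these time effects (our own arithmetic, assuming the fixed per-period gain/loss also applies to the status quo outcome), a negotiation that runs to the deadline of 14 time periods and ends with the status quo yields approximately:

```latex
u_{Eng}(SQ, 14) \approx 150 + 14 \cdot 12 = 318,
\qquad
u_{Zim}(SQ, 14) \approx -610 - 14 \cdot 16 = -834 .
```

This helps in interpreting the magnitudes of the values reported in Table 2 and later in this section.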
To test items 1 and 2 we examined the final utility values of all the
negotiations for each player. To test item 3 we examined
the sum of the final utility values. The results show that
when the automated agent played the role of Zimbabwe,
it achieved significantly higher utility values as opposed to
a human agent playing the same role (using paired t-test:
t(22) = 2.23, p < 0.03). On the other hand, when playing the role of England, there is no significant difference
between the utility values of the automated agent and the
human player, though the average utility value for the human was higher than for the automated agent. The results
also show that the average utility values when the human
played against another human are lower than the average
utility values when he played against an automated agent.
However, these results are significant only in the case in
which the human player played the role of Zimbabwe (using
paired t-test: t(22) = −3.43, p < 0.003).
Comparing the sum of utility values of both negotiators,
based on the role the automated agent played, we show that
this sum is significantly higher when the negotiations involved our automated agent (either when the automated
agent played the role of Zimbabwe or England), as opposed
to negotiations involving only human players (using 2-sample
t-test: t(22) = 2.74, p < 0.009 and t(22) = 2.11, p < 0.04).
Another important aspect of the negotiation is the outcome - whether a full agreement was reached or whether the
negotiation ended with no agreement (either status-quo or
opting out) or with a partial agreement. While only 50%
of the negotiations involving only humans ended with a full
agreement, 80% of the negotiations involving the automated
agent ended with a full agreement. Using Fisher's Exact
test we determined that there is a correlation between the
kind of the opponent agent (be it the automated agent or
the human) and the form of the final agreement (full, partial or none). The results show that there is a significantly
higher probability of reaching a full agreement when playing
against the automated agent (p < 0.006).
To test item 4 we examined the final time period of the
negotiations. All of the test results showed that in negotiations in which our automated agent was involved, the final time period was significantly earlier than in negotiations in which only humans were involved. The average final time period with two human negotiators was 11.36,
while the average final time period against an automated
agent playing the role of Zimbabwe and the role of England was 6.36 and 6.27 respectively. We used the 2-sample
Wilcoxon test to compare the final time periods
of our automated agent and the human agent when playing against the same opponent (for England role p < 0.001
and for Zimbabwe role p < 0.002). In addition, we used
the Wilcoxon signed rank test, to examine the final time
period achieved by the same human player in the two simulations in which he participated - once involving another
human player, and another involving an automated agent
(for England role p < 0.0004, for Zimbabwe role p < 0.001).
To test item 5 we examined the total number of offers
made and received in each negotiation. The average number of offers made and received in negotiations involving
the automated agent was 10.86 and 10.59, while in negotiations involving only human players the average number was 18.09. Using a 2-sample Wilcoxon test we show that significantly fewer offers were made when an automated agent was
involved (p < 0.007 and p < 0.004).
To test item 6 we conducted the second set of experiments. When the equilibrium agent played the role of Zimbabwe and the QO agent played the role of England, more
than 93% of the negotiations ended by the second time period. The average final utility values of the negotiations were
342.87 for the QO agent (playing the role of England) and
116.4 for the equilibrium agent. Those final utility values for
the QO agent were higher than any of the final utility values the QO agent reached when negotiating against a human player. The final utility values achieved by the QO agent were also higher than those of the human players who played the role of England (whether against another human player or against our automated agent). When the
equilibrium agent played the role of England and the QO agent the role of Zimbabwe, 90% of the
negotiations lasted until the last time period and eventually
ended with the Status Quo outcome. The rest of the negotiations ended with an agreement favoring the equilibrium
agent due to the probabilistic nature of the QO agent when
deciding whether to accept received offers. The average final
utility values were −791.8 for the QO agent and 372.1 for the
equilibrium agent. Though we did not run simulations of the
equilibrium agent against human agents, the utility values
of the opponent from the offers suggested by the equilibrium
agent are much lower than the final utility values of the human negotiations. Analyzing also the simulation process of
the human negotiations, we can deduce that without incorporating heuristics into the equilibrium agent, the human
players would not have accepted the offers proposed by it.
To test the last item (item 7) we conducted the third set
of experiments. The average utility values were 332.73 for the QO agent playing England and 133.43 for the QO agent playing Zimbabwe. These values are higher than any of the
average values gained by the human players when playing
against other humans. For the role of Zimbabwe, the results
are also significant (p < 0.0002). Also, the average utility
values for both QO agents were in the top 40%.
6. CONCLUSIONS
This paper outlines the design of an automated agent for
bilateral negotiation with bounded rational agents where
there is incomplete information regarding the utility preferences of the opponent. The agent incorporates a qualitative decision making mechanism. Three
sets of experiments were conducted to test the automated
agent against a bounded rational agent, against an agent
who follows a Bayesian equilibrium strategy and against itself. The results showed that our agent is indeed capable of
negotiating successfully with human counterparts and reaching efficient agreements. In addition, the results demonstrated that the agent played at least as well as, and in the
case of one of the two roles gained significantly higher utility values than, the human player. We also showed that the utility values achieved by our agent when playing against another automated agent that followed a Bayesian equilibrium strategy were higher. Though the experiments were
conducted on a specific domain, it is quite straightforward
to adapt the simulation environment to any other scenario.
Although much time was spent in designing the offer generation mechanism, the results showed that most of the agreements reached were offered by the human counterpart; our agent accepted those offers based on a probabilistic heuristic. A future research direction would be to improve the offer generation mechanism and the probabilistic heuristic and then test them by conducting the same
sets of experiments.
7. REFERENCES
[1] W. B. Arthur. Bounded rationality and inductive
behavior (the El Farol problem). American Economic
Review, 84:406–411, 1994.
[2] C. Boutilier, R. Das, J. O. Kephart, G. Tesauro, and
W. E. Walsh. Cooperative negotiation in autonomic
systems using incremental utility elicitation. In Proc. of
UAI-03, pages 89–97, 2003.
[3] R. I. Brafman and M. Tennenholtz. An axiomatic
treatment of three qualitative decision criteria.
Journal of the ACM, 47(3):452–482, 2000.
[4] C. Camerer, T. Ho, and K. Chong. Models of thinking,
learning and teaching in games. American Economic
Review, 93(2):192–195, 2003.
[5] C. M. Capra, J. K. Goeree, R. Gomez, and C. A. Holt.
Anomalous behavior in a traveler’s dilemma?
American Economic Review, 89(3):678–690, 1999.
[6] J. Doyle and R. H. Thomason. Background to
qualitative decision theory. AI Magazine, 20(2):55–68,
1999.
[7] D. Dubois, H. Prade, and R. Sabbadin.
Decision-theoretic foundations of qualitative
possibility theory. European Journal of Operational
Research, 128:459–478, 2001.
[8] P. Faratin, C. Sierra, and N. R. Jennings. Using
similarity criteria to make issue trade-offs in
automated negotiations. Artificial Intelligence,
142(2):205–237, 2002.
[9] S. Fatima and M. Wooldridge. An agenda-based framework for multi-issue negotiation. Artificial Intelligence, 152:1–45, 2004.
[10] S. Fatima, M. Wooldridge, and N. R. Jennings.
Bargaining with incomplete information. Annals of
Mathematics and Artificial Intelligence, 44(3):207–232,
2005.
[11] P. Hoz-Weiss, S. Kraus, J. Wilkenfeld, D. R.
Andersen, and A. Pate. An automated agent for
bilateral negotiations with humans. In Proc. of
AAAI/IAAI-02, pages 1000–1001, 2002.
[12] S. Kraus. Strategic Negotiation in Multiagent
Environments. MIT Press, Cambridge MA, USA,
2001.
[13] R. J. Lin and S. T. Chou. Bilateral multi-issue
negotiations in a dynamic environment. In Workshop
on Agent-Mediated Electronic Commerce V, 2003.
[14] R. D. Luce. Individual Choice Behavior: A Theoretical
Analysis. John Wiley and Sons, Inc., New York, 1959.
[15] R. D. Luce and H. Raiffa. Games and Decisions: Introduction and Critical Survey. John Wiley and Sons, Inc., New York, 1957.
[16] J. F. Nash. The bargaining problem. Econometrica,
18:155–162, 1950.
[17] M. Oprea. An adaptive negotiation model for
agent-based electronic commerce. Studies in
Informatics and Control, 11(3), 2002.
[18] M. J. Osborne and A. Rubinstein. A Course In Game
Theory. MIT Press, Cambridge MA, 1994.
[19] E. Rasmusen. Games and Information: An
Introduction to Game Theory. Blackwell Publishers,
2001.
[20] M. Tennenholtz. On stable social laws and qualitative
equilibrium for risk-averse agents. In Principles of
Knowledge Representation and Reasoning, pages
553–561, 1996.