Finding a Nash Equilibrium by Asynchronous Backtracking
Alon Grubshtein and Amnon Meisels*
Dept. of Computer Science,
Ben Gurion University of the Negev,
P.O.B 653, Be’er Sheva, 84105, Israel
{alongrub, am}@cs.bgu.ac.il
Abstract. Graphical Games are a succinct representation of multi agent interactions in which each participant interacts with a limited number of other agents.
The model resembles Distributed Constraint Optimization Problems (DCOPs)
including agents, variables, and values (strategies). However, unlike distributed
constraints, local interactions of Graphical Games take the form of small strategic games and the agents are expected to seek a Nash Equilibrium rather than a
cooperative minimal cost joint assignment.
The present paper models graphical games as a Distributed Constraint Satisfaction Problem with unique k-ary constraints in which each agent is only aware
of its part in the constraint. A proof that a satisfying solution to the resulting
problem is an ε-Nash equilibrium is provided, and an Asynchronous Backtracking algorithm is proposed for solving this distributed problem. The algorithm’s
completeness is proved and its performance is evaluated.
1 Introduction
In a typical Multi Agent setting, agents interact with one another to achieve some goal.
This goal may be a globally defined objective such as the minimization of total cost,
or a collection of personal goals such as maximizing the utility of each agent. In the
latter case, Game Theory predicts that the outcome should be a stable solution – an equilibrium – from which no agent cares to deviate. This notion of stability is a fundamental concept in the game theory literature and has been a focus of work in the field of Multi Agent Systems.
Graphical Games are a succinct representation of normal form games played over a
graph [6]. A graphical game is defined by a graph of agents that explicitly defines relations: if an edge eij exists then agents ai and aj interact with one another and therefore affect each other’s gains. The model exploits the locality of interactions among agents and enables one to
specify payoffs in terms of neighbors rather than in terms of the entire population of
agents. Two distributed algorithms inspired by Bayesian Networks were initially proposed to find an approximate equilibrium of graphical games, NashTree and NashProp [6, 11]. Both rely on a discretization scheme for the approximation, and it is proved in [11] that NashProp can be used to find an ε-Nash equilibrium on general graph interactions.

* The research was supported by the Lynn and William Frankel Center for Computer Sciences at Ben-Gurion University and by the Paul Ivanier Center for Robotics Research and Production Management.
The Graphical Games model is closely related to Distributed Constraint Optimization Problems (DCOPs) [9]. DCOPs are defined by agents with private variables, a finite
domain of values for each variable, and constraints mapping each joint assignment to
a non-negative cost. The values that a DCOP agent assigns to its variables are similar to the choice of strategy made by a Graphical Game agent, and the local interactions of the game resemble constraints. However, despite these similarities the two models are inherently different. Agents connected by a DCOP constraint share a single cost value for their joint assignments and have full knowledge of the constraint’s cost structure. In Graphical Games the agents have a personal valuation of each outcome. Furthermore, the standard solution of a DCOP is a joint assignment that minimizes the sum of costs and is not necessarily a stable point.
The present paper proposes a model for graphical games as a distributed constraints
problem and a new asynchronous algorithm for finding stable points. The model uses
constraints which are partially known to each participant to represent private valuation
of outcomes [3, 5]. First, Asymmetric DCOPs (ADCOPs) [5], an extension of standard DCOPs, provide a natural representation of game-like interactions: they specify different costs for each agent in a constraint, capturing the personal preferences and utilities of different agents. Next, the Partially Known Constraint (PKC) model [3], which focuses on asymmetric constraint satisfaction, is used to align a satisfying solution with an equilibrium.
By casting the ADCOP representation of a Graphical Game to an asymmetric satisfaction problem one can apply constraint reasoning techniques to find an equilibrium
of a multi agent problem. Following NashTree and NashProp [6, 11], the present paper presents a new Asynchronous Nash backTracking (ANT) algorithm for finding ε-Nash equilibria. This algorithm is inspired by the well known Asynchronous Backtracking algorithm (ABT) [14, 2] and its asymmetric single phase variant, ABT-1ph [3]. A proof that a satisfying solution to the revised problem is an ε-Nash equilibrium is provided.
Similar to other ABT variants, agents in the ANT algorithm exchange messages
specifying their current assignment, Nogood messages and requests for additional communication links (termination messages are not required). However, the ANT algorithm
searches through a high arity, asymmetric, distributed constraint satisfaction problem.
ANT is proven to find a globally satisfying solution to the graphical game problem – effectively an ε-Nash equilibrium.
One former attempt to apply constraint reasoning techniques to game theoretic equilibrium search gave up on the inherently distributed nature of graphical games. A centralized constraint solver is described in [13] where the authors generate very large
domains and complex constraints to represent game like structures. The approach of
[13] assumes that agents are willing to reveal private information to a third party which carries out the computation, and is therefore not suitable for a distributed multi agent interaction.
There have been earlier studies of limited versions of graphical games. Two methods for finding stable points of graphical games examine the relation between local minima and pure strategy equilibria [1, 8] (which are not guaranteed to exist). However, both works examine games played on symmetric DCOPs, which are a subset of a special class of games known as potential games, and do not extend to general form games. Another structured interaction is specified in [4], which attempts to find the “best” pure strategy equilibrium but is limited to tree-based interactions. In contrast to these works, the ANT algorithm always finds an ε-Nash equilibrium on general form graphs and general form games.
The remainder of the paper is organized as follows. Section 2 provides a detailed
description of ADCOPs, graphical normal form games and equilibrium solutions. Next comes a description, in Section 3, of the asymmetric satisfaction problem generated for finding an ε-Nash equilibrium. Section 4 presents the Asynchronous Nash backTracking algorithm (ANT) and proofs of its formal properties. Section 5 is devoted to an experimental evaluation of ANT and also provides a full description of NashProp’s backtracking (second) phase. This is, to the best of our knowledge, the first full account of the distributed backtracking phase of NashProp. Finally, Section 6 concludes the present work and discusses future directions.
2 Preliminaries
2.1 Asymmetric Distributed Constraints
Asymmetric Distributed Constraint Optimization Problems (ADCOPs) [5] define a constraint model in which a cost is specified for each agent in a constraint instead of a single
cost to all constrained agents. Unlike DCOPs, an assignment change which decreases
the cost incurred on one agent in an ADCOP is not guaranteed to decrease the cost
incurred on other agents in the constraint. Roughly speaking one can say that the costs
of asymmetric constraints are “on the agents” (the vertices in the constraint network)
rather than on the “constraints” (edges of the constraint network).
Formally, an ADCOP is a tuple ⟨A, X, D, R⟩, where A is a finite set of agents a1, ..., an and X is a finite set of variables x1, ..., xm. Each variable is held by exactly one agent, but an agent may hold more than one variable. D is a set of domains d1, ..., dm, specifying the possible assignments each variable may take. Finally, R is a set of asymmetric relations (constraints).
A value assignment, or simply an assignment, is a pair ⟨xi, v⟩ including a variable xi,
and its value v ∈ di . Following common practice we assume each agent holds exactly
one variable and use the two terms interchangeably. A partial assignment (PA) is a set
of value assignments, in which each variable appears at most once.
A constraint C ∈ R of an Asymmetric DCOP is defined over a subset of the variables X(C), and maps the joint values assigned to the variables to a vector of non-negative costs (where the j-th entry corresponds to the j-th variable in X(C)):

$$C : D_{i_1} \times D_{i_2} \times \cdots \times D_{i_k} \to \mathbb{R}_+^k$$
We say that a constraint is binary if it refers to a partial assignment with exactly
two variables, i.e., k = 2. A binary ADCOP is an ADCOP in which all constraints are
binary.
A complete assignment is a partial assignment that includes all the variables in X and, unless stated otherwise, an optimal solution is a complete assignment of minimal aggregated cost. In maximization problems, each constraint has utilities instead of costs
and a solution is a complete assignment of maximal aggregated utility.
For consistency, we also define the satisfaction variant of this problem (denoted
ADCSP). An ADCSP constraint takes the following form:

$$C : D_{i_1} \times D_{i_2} \times \cdots \times D_{i_k} \to \{0, 1\}^k$$

In this case, constraint C is said to be satisfied by a PA p if p includes value assignments to all variables of C and if C(p) is a tuple of all 1’s, i.e., all involved agents agree on its satisfiability. The solution of an ADCSP is a complete assignment which satisfies all constraints C ∈ R.
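To make the per-agent verdicts concrete, the following minimal Python sketch (our own illustration; all names are hypothetical and not part of any cited implementation) evaluates an ADCSP constraint by collecting each agent's private verdict and declaring the constraint satisfied only when the resulting tuple is all 1's:

# A sketch of asymmetric constraint evaluation in an ADCSP. Each agent
# contributes its own verdict on a joint assignment; the constraint is
# satisfied only if every verdict is 1.

def evaluate_adcsp_constraint(verdicts, partial_assignment):
    """verdicts: list of per-agent predicates, one per variable of C.
    partial_assignment: dict mapping variable -> value.
    Returns the tuple C(p) and whether C is satisfied by p."""
    tup = tuple(1 if v(partial_assignment) else 0 for v in verdicts)
    return tup, all(tup)

# Example: a binary constraint between x1 and x2 in which each side
# holds a different (private) requirement over the same joint values.
verdict_a1 = lambda p: p["x1"] != p["x2"]      # a1's side of C
verdict_a2 = lambda p: p["x2"] in ("i", "j")   # a2's side of C
print(evaluate_adcsp_constraint([verdict_a1, verdict_a2],
                                {"x1": "a", "x2": "i"}))
# -> ((1, 1), True): both agents agree the assignment is satisfying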
2.2 Normal form games
A normal form game is a model for multiple interacting parties or agents. These parties
plan their interaction and may act only once (simultaneously). The joint action of all
participants results in an outcome state which is, in general, evaluated differently by the
agents.
More formally, a normal form game includes a finite set of n players (agents) A, a non empty set S of actions or strategies, and a preference ordering over outcomes M. Specifically, for each agent ai, ui(x) ∈ M maps any joint strategy $x = \times_{a_i \in A} s_i$ to a utility (or cost) value. If ui(o1) > ui(o2) then agent ai prefers outcome o1 to o2. In the remainder of this paper we follow common practice and denote by x−i the joint action of all agents except for ai.
The most commonly used solution concept for strategic interactions is that of a Nash Equilibrium [10], defined as a joint assignment x* to all agents such that:

$$\forall a_i \in A : u_i(x^*_{-i}, x^*_i) \ge u_i(x^*_{-i}, x_i)$$

If agents can take non deterministic actions, or state a probability distribution over deterministic strategies as their action, then Nash’s classic theorem states that a mixed strategy Nash Equilibrium always exists (cf. [10, 12]).
2.3 Graphical Games
Graphical Games were first described by Kearns et al. in [6]. A graphical game is defined by a pair (G, M) where G is an n players interaction graph and M is the set of local game matrices mapping joint assignments to agents’ utilities. The space required for representing local game matrices is significantly smaller than that of the standard representation: instead of the $n|S|^n$ values required for the representation of the tables $u_i \in M$, a Graphical Game requires only $n|S|^{d+1}$ values (where d is the maximal degree of an agent). For example, with n = 10 agents, |S| = 2 strategies and maximal degree d = 3, this is 10 · 2^4 = 160 values instead of 10 · 2^10 = 10240. That is, in such games, only neighboring agents affect each other’s utility.
We denote ai’s set of neighbors by N(i) and, adhering to our previous notation, use xN(i) to represent the joint action of ai’s neighbors. It is easy to verify that a Graphical Game defined by a graph G and local games description M is very similar to an ADCOP (ADCOP constraints can provide a slightly more compact representation).
A discretization scheme for computing an approximate equilibrium is presented in [6]. The approximate equilibrium – an ε-Nash equilibrium – is a mixed strategy profile p* such that

$$\forall a_i \in A : u_i(p^*_{-i}, p^*_i) + \epsilon \ge u_i(p^*_{-i}, p_i)$$

The scheme presented in [6] constrains each agent’s value assignment to a discretized mixed strategy which is a multiple of some value τ that depends on the agents’ degree and the desired approximation bound ε. Two search algorithms for finding ε-Nash equilibria in graphical games were proposed in [6, 11].
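As an illustration of these definitions, the following hedged Python sketch enumerates the τ grid and tests whether a discretized mixed strategy is an ε-best response. It assumes two actions per agent, so a mixed strategy is the probability of playing action 0; expected_utility is a caller-supplied stand-in for ui, and all names are our own, not those of [6, 11]:

# A sketch of the epsilon-best-response test underlying the tau-grid
# discretization. A mixed strategy is restricted to multiples of tau.

def tau_grid(tau):
    """All discretized mixed strategies (multiples of tau in [0, 1])."""
    n = round(1.0 / tau)
    return [k * tau for k in range(n + 1)]

def is_eps_best_response(expected_utility, q_i, neighbors_profile,
                         tau, eps):
    """True iff u_i(q_i, x_N(i)) + eps >= u_i(q', x_N(i)) for every
    grid strategy q' of agent a_i."""
    u_current = expected_utility(q_i, neighbors_profile)
    return all(u_current + eps >= expected_utility(q, neighbors_profile)
               for q in tau_grid(tau))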
3 Nash ADCSP
A simple transformation is defined from a general ADCOP to a Nash Asymmetric Distributed Constraint Satisfaction Problem (Nash-ADCSP). The resulting problem can
be solved by a distributed constraint satisfaction algorithm capable of handling asymmetric, non-binary constraints. This section is concluded with a proof that a satisfying
solution to the Nash-ADCSP is an ε-Nash equilibrium.
Given an ADCOP representation of a general multi agent problem, a Nash-ADCSP
with the same set of agents and variables is constructed. Using the discretization scheme
described in [6] the domains of each agent are revised to represent distributions over
values. That is, each agent’s domain is a set of discretized mixed strategies, each a multiple of some τ defined according to the desired accuracy level ε (cf. [6]).
The new problem includes n = |A| constraints, each associated with exactly one agent. The arity of constraint Ci associated with agent ai equals ai’s degree + 1. The set of satisfying assignments (satisfying all agents in the constraint) includes all joint assignments of ai and its neighbors N(i) such that ai’s action yields a maximal gain to itself. That is, if $\langle v, x_{N(i)} \rangle \in d_i \times \prod_{k \in N(i)} d_k$ is a joint assignment:

$$C_i(v, x_{N(i)}) = \begin{cases} \langle 1, 1, \ldots, 1 \rangle & \text{if } u_i(v, x_{N(i)}) \ge u_i(v', x_{N(i)}) \;\; \forall v' \in d_i \\ \langle 1, \ldots, 0, \ldots, 1 \rangle & \text{otherwise (the zero is associated with } a_i) \end{cases}$$
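The constraint Ci can thus be read as a best-response indicator. A minimal Python sketch of this check follows (our own illustration; ui and di are supplied by the caller, and placing ai's entry first in the tuple is an arbitrary choice made here for simplicity):

# A sketch of the Nash-ADCSP constraint C_i: a_i's side of C_i is
# satisfied exactly when v is a best response (maximal utility) to the
# neighbors' joint assignment; every other agent is always satisfied.

def nash_constraint(u_i, d_i, v, x_neighbors):
    """Return the tuple C_i(v, x_N(i)); a_i's entry is placed first."""
    best = max(u_i(w, x_neighbors) for w in d_i)
    ai_entry = 1 if u_i(v, x_neighbors) >= best else 0
    return (ai_entry,) + (1,) * len(x_neighbors)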
As an example, consider the three agent binary interaction presented in the left hand
side of Figure 1. In this example there are two separate interactions: between a1 and a2
and between a2 and a3. Following game theory conventions, the agents’ asymmetric constraints stemming from these interactions are described by two cost bi-matrices. That is, each entry in the table corresponds to a joint action by the row and column player, where the row player’s evaluation of the joint action is given by the left hand cost value and the column player’s cost is given by the right hand cost value.
This problem is then cast to a Nash-ADCSP problem (right hand side of Figure
1) with three new asymmetric constraints C1, C2 and C3. Constraints C1 and C3 are the binary constraints associated with agents a1 and a3 and can only be satisfied when a1 and a3 best respond to a2’s action.
Fig. 1: (left) A simple ADCOP with three agents and two asymmetric constraints. Each agent has two values in its domain: d1 = {a, b}, d2 = {i, j} and d3 = {l, m}. An entry in the constraint represents the cost of each joint action (i.e., the cost of ⟨x1 = a, x2 = i⟩ is 7 for a1 and 2 for a2). (right) The resulting ADCSP with 3 asymmetric satisfaction constraints (C1, C2 and C3), of which C2 is a trinary constraint. A value of 1 in constraint Ci means that the action corresponding to the entry is ai’s best response.
Constraint C2 is a trinary constraint which is satisfied whenever a2 best responds to the joint action of a1 and a3. For example, a2’s best response (minimal cost) to the joint action ⟨x1 = a, x3 = m⟩ is the assignment ⟨x2 = i⟩, which takes the value of 1 (consistent) in C2.
In this example the assignment ⟨x1 = b, x2 = i, x3 = m⟩ is a pure strategy Nash equilibrium (costs are 3, 0 and 2 respectively). It is easy to verify that this joint assignment also satisfies all constraints of the new Nash-ADCSP problem.
Nash-ADCSP constraints have two noteworthy properties:
1. Constraint Ci’s satisfiability state is determined by agent ai only. Any other agent in the constraint always evaluates its state as “satisfied”, and this in turn implies that any agent aj ∈ N(i) is ignorant of Ci.
2. Given the joint assignment of all its neighbors, an agent ai can always satisfy its
constraint.
The relation between the satisfaction problem and an equilibrium is supplied by the
following proposition:
Proposition 1 Let p be a complete assignment to a Nash-ADCSP with discretized domains corresponding to an approximation value ε. Then p is a consistent assignment iff it is an ε-Nash equilibrium of the original problem.
Proof. Necessary: If p is consistent with all constraints then it is easy to see that, by definition, each agent best responds to the joint action of all its neighbors. Hence p is an equilibrium on the τ grid of the discretization scheme and an ε-equilibrium of the original problem.
Sufficient: Let p be an ε-Nash equilibrium of the original problem where each of the values taken by the agents exists on the discrete grid (i.e., $p \in \prod_{a_i \in A} D_i$). Each agent in an equilibrium must “best-respond” to its neighbors’ joint actions up to some ε. Therefore, each constraint Ci associated with ai is mapped to a tuple of all ones (otherwise the agent could swap its assignment for an alternative one in which its gain is higher with respect to the joint assignment of its neighbors, and p would not be an equilibrium) and all constraints are satisfied. □
The above proposition also implies that a solution to the constructed Nash ADCSP
must always exist (i.e. the problem is always satisfiable).
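Proposition 1 suggests a direct check for the pure strategy case: a complete assignment solves the Nash-ADCSP exactly when every agent's entry in its own constraint is 1. The sketch below reuses nash_constraint from the earlier snippet and is, again, purely illustrative:

# Illustrative check of Proposition 1 (pure strategies): the assignment
# is a Nash-ADCSP solution iff no agent's own entry in C_i is 0.

def is_solution(assignment, agents):
    """agents: iterable of (u_i, d_i, var, neighbor_vars) tuples."""
    for u_i, d_i, var, neigh in agents:
        x_n = tuple(assignment[n] for n in neigh)
        if nash_constraint(u_i, d_i, assignment[var], x_n)[0] == 0:
            return False  # a_i is not best responding: C_i unsatisfied
    return True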
4 Asynchronous Nash backTracking
Asynchronous Nash backTracking (ANT) is a new variant of ABT [14, 2, 3, 9] that searches through the space of the Nash-ADCSP problem. ANT extends the original ABT and its PKC variant, ABT-1ph, to support asymmetric non-binary satisfaction. It relies on the agents’ total order to cover the space of all joint assignments when attempting
to satisfy a constraint. That is, if an agent ai ’s assignment is not a best response to the
joint assignment of N (i), the lowest priority agent (with respect to the total ordering)
in Ci will change its assignment. If this assignment change remains inconsistent, then a
Nogood is generated and sent to the lowest priority agent from its list of higher priority
neighbors.
ANT’s pseudo code is presented below. It adopts the same notation as that described
in [2, 3] and handles Nogoods similarly. For details and an in depth discussion, the reader is referred to the description provided in [2, 3].
Each ANT agent maintains a local copy of the value assignments of which it is
aware (its agentView), a store of Nogoods (ngStore) kept as justifications for the removal of values from the current domain, and a boolean variable isLastAgentInConstraint taking the value TRUE if ai is the lowest priority agent in Ci. The list of lower priority neighbors is denoted by Γ+.
Similar to other ABT variants, three message types are used: ok?, ngd and adl.
ok? messages notify other agents of a new value assignment, ngd messages are used
to request the removal of values and provide an explanation for this request and adl
messages are used to request an additional link of information be generated between
two agents.
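The following Python dataclasses sketch the local state and message shapes implied by this description; they are an assumption-laden illustration, not the authors' implementation:

from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class Message:
    kind: str            # "ok?", "ngd" or "adl"
    sender: int          # index of the sending agent
    payload: Any         # an assignment for ok?, a Nogood for ngd, None for adl

@dataclass
class AntAgentState:
    agent_view: Dict[int, Any] = field(default_factory=dict)  # known assignments
    ng_store: List[Any] = field(default_factory=list)         # Nogood store
    is_last_agent_in_constraint: bool = False  # lowest priority agent in C_i?
    lower_neighbors: List[int] = field(default_factory=list)  # Gamma-plus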
An ANT agent begins by assigning itself a value and notifying its neighbors of this value assignment. The agents then react to messages received from others until quiescence is reached.
Asynchronous Nash backTracking - pseudocode

procedure ANT
    myValue ← chooseValue();
    send(ok?, neighbors, myValue);
    end ← false;
    while ¬end do
        msg ← getMsg();
        switch (msg.type)
            case ok? : ProcessInfo(msg);
            case ngd : ResolveConflict(msg);
            case adl : SetLink(msg);

procedure ProcessInfo(msg)
    updateAgentView(msg.sender ← msg.assignment);
    if ¬isConsistent(agentView, myValue) then
        if ¬isLastAgentInConstraint then
            ng ← NoGood(agentView, constrained neighbors);
            add(ng.lhs, self ← myValue);
            send(ngd, ng.rhs, ng);
        else
            checkAgentView();

procedure ResolveConflict(msg)
    ng ← msg.NoGood;
    if coherent(ng, Γ+ ∪ self) then
        addLinks(ng);
        ngStore.add(ng);
        myValue ← NULL;
        checkAgentView();
    else if coherent(ng, self) then
        send(ok?, msg.sender, myValue);

procedure SetLink(msg)
    neighbors.add(msg.sender);
    send(ok?, msg.sender, myValue);
When agent ai receives an ok? message it invokes the ProcessInfo procedure, which first updates its agentView, ngStore and current domain. If the agentView and ai’s assignment are inconsistent with Ci, the agent attempts to find an alternative assignment if it is the lowest priority agent in Ci, or generates a new Nogood addressed to the lowest priority agent otherwise. It should be noted that ANT agents consider a constraint Ci to be inconsistent only if all agents in Ci have reported their values and these values result in an unsatisfying solution (as described in Section 2.1).
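The consistency test this paragraph implies can be sketched as follows (illustrative names only; the utility function u_i and domain d_i stand in for ai's side of Ci):

# A sketch of the test implied above: C_i is judged inconsistent only
# when the agentView assigns a value to every neighbor in C_i and the
# current value is then not a best response.

def is_consistent(agent_view, my_value, my_neighbors, u_i, d_i):
    if any(n not in agent_view for n in my_neighbors):
        return True  # incomplete information: not (yet) inconsistent
    x_n = tuple(agent_view[n] for n in my_neighbors)
    best = max(u_i(w, x_n) for w in d_i)
    return u_i(my_value, x_n) >= best  # consistent iff a best response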
If ai receives a ngd message, it first verifies that the message is coherent with respect to ai and its lower priority neighbors. If it is, and additional links are needed, these are requested. Next, the Nogood is stored and the current assignment is removed, after which a consistent revised assignment is sought (checkAgentView). If the message is not coherent but ai’s current assignment is the same as that assumed in the ngd message, an ok? message is sent to the originator of the message.
When an adl message is received, the agent simply adds the originator to its list of neighbors and notifies it of its current assignment.
Asynchronous Nash backTracking - pseudocode, continued

procedure updateAgentView(assignment)
    add(assignment, agentView);
    for all ng ∈ ngStore do
        if ¬coherent(ng.lhs, agentView) then
            ngStore.remove(ng);
    currentDomain.makeConsistentWith(ngStore);

procedure checkAgentView
    if ¬isConsistent(agentView, myValue) then
        if (myValue ← chooseValue()) == NULL then
            backTrack();
        else
            send(ok?, neighbors, myValue);

procedure backTrack
    resolvedNG ← ngStore.solve();
    send(ngd, resolvedNG.rhs, resolvedNG);
    agentView.unassign(resolvedNG.rhs);
    updateAgentView(resolvedNG.rhs ← NULL);
    checkAgentView();

procedure chooseValue
    for all val ∈ currentDomain do
        remove(currentDomain, val);
        if isConsistent(agentView, val) then
            return val;
        else
            ng ← agentView ∩ constrainedNeighbors;
            if ¬isLastAgentInConstraint then
                add(ng.lhs, self ← val);
                add(ng.rhs, last agent in constraint);
                send(ok?, ng.rhs, val);
                send(ngd, ng.rhs, ng);
                return val;
            else
                add(ng.rhs, self ← val);
                ngStore.add(ng);
    return NULL;
ANT’s basic auxiliary functions, updateAgentView, checkAgentView and backTrack, are in parts simpler than those of ABT-1ph due to the Nash-ADCSP structure, which defines a single constraint for each agent. However, the value choosing mechanism must be handled with care to ensure the completeness of the algorithm. The chooseValue procedure iterates over all values in agent ai’s current domain. A candidate value is first removed from the current domain and its consistency is checked against the agentView. If it is not consistent, a new Nogood is generated with the values of all agents in Ci. If ai is the lowest priority agent in Ci this Nogood is stored and a new candidate value is examined. However, if ai is not the lowest priority agent in Ci the Nogood is updated to include ai as well, and is then sent to the lowest priority agent aj in Ci after an ok? message with the new value is sent to aj. This ensures that when aj processes the Nogood, ai’s value is coherent. Finally, it should be noted that the latter case returns a seemingly inconsistent value to ai. This step is required to ensure all possible joint values are examined, and stems from the fact that constraints are asymmetric. That is, in an asymmetric constraint, agents with priority lower than ai may change their assignments in a way that changes the consistency state of Ci.
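To make the subtlety explicit, here is a compressed, illustrative Python rendering of chooseValue; message passing is abstracted behind a send callback and the agent's state is a plain dict, so all names are assumptions rather than the paper's code:

# A non-last agent keeps a locally inconsistent value after forwarding
# a Nogood, because lower priority agents may still change C_i's state.

def choose_value(agent, send):
    while agent["domain"]:
        val = agent["domain"].pop(0)
        if agent["consistent"](agent["view"], val):
            return val
        ng = dict(agent["view"])              # Nogood over C_i's agents
        if not agent["is_last_in_constraint"]:
            ng[agent["id"]] = val             # add own (new) value to lhs
            last = agent["last_in_constraint"]  # lowest priority agent in C_i
            send("ok?", last, val)            # so the ngd arrives coherent
            send("ngd", last, ng)
            return val                        # seemingly inconsistent value
        agent["ng_store"].append(ng)          # last agent: store, try next
    return None                               # domain exhausted -> backTrack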
The following proposition establishes ANT’s formal properties:
Proposition 2 The ANT algorithm always finds an equilibrium. That is, it is sound, complete and terminates.
Proof. ANT reports a solution whenever quiescence is reached. In this case, all constraints are satisfied (otherwise, at least one Nogood would exist and quiescence would not be achieved). Therefore the reported solution is a globally satisfying solution and by Proposition 1 it is also an ε-Nash equilibrium.
ANT follows a search similar to that of ABT-1ph [3]. Its main difference is that it ensures non-binary constraints are indeed satisfied by a PA, through its Nogood handling mechanism and its chooseValue function.
An agent Ai receiving a nogood from an agent Aj of higher priority must be the
lowest priority agent in Cj . This nogood also includes all value assignments of the rest
of the agents involved in Cj . Ai can therefore select a consistent value (consistent with
its personal constraint Ci ) from its current domain and propose this value assignment
to all agents of Cj . Specifically, this new value assignment will be received by Aj via
an ok? message and its impact on the consistency state of Cj with respect to the new
PA will be examined. If the new PA is still inconsistent, a new nogood from Aj to Ai
will be generated and Ai will seek an alternative value. When Ai ’s domain is exhausted
it generates a new nogood by a similar mechanism to that of ABT-1ph and sends it
backward to a higher priority agent.
When Ai receives a nogood from a lower priority agent Ak, it must be due to a PA that is inconsistent with some constraint Cx (possibly Ci), for which Ak and all other lower priority agents have failed to find a satisfying joint assignment that includes Ai’s current value assignment. Ai will therefore attempt to pick an alternative value from its current domain, which will not necessarily satisfy its personal constraint Ci (see chooseValue). Despite being in a possible conflict state, the asymmetric constraint may eventually change its state due to a change of assignments by lower priority agents, and this step ensures that no value combination is overlooked.
The result of this process is that no agent changes its value to satisfy a constraint unless all value combinations of the lower priority agents of the same constraint have been exhausted, and no PA which may be extended to a full solution is overlooked.
Finally, due to the exhaustive search of PAs consistent with constraints and the Nogood mechanism (see [3, 2]), which guarantees that discarded parts of the search space accumulate, ANT must eventually terminate. □
5 Experimental evaluation
5.1 The NashProp algorithm
An immediate way of evaluating the performance of the ANT algorithm is to compare
it to the only other distributed algorithm for inferring an ε-Nash equilibrium in graphical games – the NashProp algorithm [11]. This turns out to be a non trivial task, because the NashProp algorithm has been presented only partially in [11].
The algorithm is described in two phases: a table passing phase and an assignment
passing one. The first phase is a form of an Arc Consistency procedure, in which every
agent ai exchanges a binary valued matrix T with each of its neighbors aj . An entry in
these tables corresponds to the joint action hxi = i, xj = ji (or vice versa) and takes the
value of 1 if and only if ai can find a joint action of its neighbors xN (i) =hxj1 , ..., xjk i
such that:
1. The entry corresponding to hxi , xjl i takes the value of 1 ∀l 1 ≤ l ≤ k.
2. The assignment xi = i is a best response to xN (i) .
This phase is proven to converge, and by a simple aggregation of tables one can hope to prune parts of the agent’s domain. It should be noted that this phase is essentially a form of pre-processing and will also be used by the ANT algorithm in our evaluation.
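For concreteness, a hedged sketch of one table update in this phase is given below; it brute-forces the two conditions above over the neighbors' joint actions, with utilities and tables represented as plain Python dicts (our own names, not those of [11]):

# One table update in the table passing phase: recompute T_{ij} so that
# an entry is 1 iff some joint action of a_i's neighbors agrees with
# v_j, is allowed by every current table, and makes v_i a best response.

from itertools import product

def update_table(i, j, domains, neighbors_i, tables, u_i):
    others = [l for l in neighbors_i if l != j]
    new_t = {}
    for v_i, v_j in product(domains[i], domains[j]):
        ok = False
        for combo in product(*(domains[l] for l in others)):
            x_n = dict(zip(others, combo))
            x_n[j] = v_j
            # condition 1: every pairwise entry is currently allowed
            if any(tables[(i, l)][(v_i, x_n[l])] == 0 for l in neighbors_i):
                continue
            # condition 2: v_i is a best response to x_N(i)
            if u_i(v_i, x_n) >= max(u_i(w, x_n) for w in domains[i]):
                ok = True
                break
        new_t[(v_i, v_j)] = 1 if ok else 0
    return new_t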
Details of the second phase of the NashProp algorithm were omitted from [11].
In fact, NashProp’s evaluation is based on experiments in which no backtracking was needed (Kearns et al. specify that backtracking was required in only 3% of the problems in their evaluation). The brief description specifies a simple synchronized procedure in which the initializing agent picks an assignment for itself and all its neighbors and then passes it on to one of the already assigned agents (if possible). The recipient then attempts to extend it to an equilibrium assignment. Once a complete assignment is found,
an additional pass is made, verifying that this joint assignment is indeed an equilibrium. The authors state that “The difficulty, of course, is that the inductive step of the
assignment-passing phase may fail... (in which case) we have reached a failure point
and must backtrack”. Unfortunately, the details of this backtracking are not specified, and a reconstruction that is sound, complete and terminating was required.
The following pseudo code provides an outline of NashProp’s assignment passing
phase. The algorithm proceeds in synchronous steps in which only one agent acts. The
first agent initializes the search by generating a cpa token – Current Partial Assignment
– to be passed between the agents. It then adds a joint assignment of itself and all
its neighbors (we use N (i)−cpa to denote all neighbors not on the cpa) to the cpa and
passes it to the next agent in the resulting order. An agent receiving a cpa first checks the
consistency of its assignment. That is, the agent verifies that its assigned value is a best
response to its neighbors’ assignments. If not all neighbors have values assigned to them,
then the agent attempts to assign new values which will be consistent with its current
assignment. Otherwise, if its assigned value does not correspond to a best response
action, the agent reassigns a value to itself and any of its lower priority neighbors. If no
consistent assignment can be made, a backtrack message is generated and passed to the
previous agent on the cpa. If a joint consistent assignment is found then the agent either passes the cpa to the next agent in the ordering or, if it is the last agent in the ordering, generates a backcheck message and passes it to the previous agent.
Upon receiving a backtrack message, the agent reassigns its own value and any
value of its lower priority neighbors. If the joint assignment is consistent the cpa is
moved forward and the search resumes. If it is not, a backtrack message is passed to the
previous agent. Finally, once a full assignment is reached agents pass the cpa with this
assignment backward to higher priority agents. If an agent encounters an inconsistent
assignment for itself, it generates a backtrack message and passes it to the last agent in
the ordering. When the initializing agent receives a consistent backcheck message,
the search is terminated and a solution is reported.
5.2 Evaluation
The performance of NashProp, ANT and ANT with AC (labeled ANT+AC) is measured in terms of Non Concurrent Constraint Checks (NCCCs) and network load, measured as the total number of messages passed between agents. The algorithms were implemented with the AgentZero framework [7], which provides an asynchronous execution environment for multiple agents solving distributed constraint problems. The source code of all experiments is available at http://www.cs.bgu.ac.il/~alongrub/files/code/ANT.
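For readers unfamiliar with the NCCC metric, it is commonly implemented with a per-agent counter that is incremented on each local constraint check and fast-forwarded on message receipt; the reported value is the maximum counter at termination. A generic sketch follows (not AgentZero's actual API):

# A generic NCCC counter: local checks add one; receiving a message
# raises the counter to the sender's value, since checks performed
# before that message was sent cannot be concurrent with later ones.

class NcccCounter:
    def __init__(self):
        self.count = 0
    def constraint_check(self):
        self.count += 1
    def on_receive(self, sender_count):
        self.count = max(self.count, sender_count)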
To avoid the exceedingly large domains required for finding accurate equilibria, the problems used for evaluating the algorithms comprised random interactions in which a pure strategy Nash equilibrium was guaranteed to exist. This allowed for a controlled domain size with no changes to the code.
The evaluation included two setups in which each agent was connected to 3 others.
The cost values of every agent in every constraint were drawn uniformly from the range 0 to 9, and were then updated to ensure that at least one pure strategy Nash equilibrium
existed. Each data point was averaged over 10 instances and a time out mechanism
limited the execution of a single run to 5 minutes.
The first setup included 7 agents and varied the domain sizes of all agents in the
range 2 .. 10. The number of NCCCs as a function of domain size is presented in Figure 2. Both variants of ANT provide a dramatic speedup over NashProp – roughly three orders of magnitude fewer NCCCs. The results also demonstrate that in this setting ANT+AC is less effective than ANT. The number of Non Concurrent Steps (NC-Steps), which approximates performance when communication time is significantly higher than an agent’s computation time, was slightly lower for ANT+AC than for ANT¹. This implies that the AC phase failed to significantly prune the agents’ domains, which can explain the NCCC overhead of ANT+AC.
The second setup, measuring performance as the number of agents increases, shows
a similar improvement over NashProp. In this setup, the domain size remained constant
at 3, and the number of agents was varied in the range 5 .. 15. Figure 3 presents the number of NCCCs in this setup. ANT+AC is faster than ANT by roughly one order of magnitude, and four orders of magnitude faster than NashProp, as the number of agents increases.
¹ Not presented herein due to lack of space.
NashProp, assignment passing phase

procedure NashPropII
    end ← false;
    if is initializing agent then
        cpa ← new CPA();
        cpa.assignValid(self ∪ N(i)−cpa);
        send(CPA, cpa.next(), cpa);
    while ¬end do
        msg ← getMsg();
        switch (msg.type)
            case CPA : ProcessCPA(msg.cpa);
            case BT : BackTrack(msg.cpa);
            case BC : BackCheck(msg.cpa);
            case STP : end ← true;

procedure ProcessCPA(cpa)
    if ¬isConsistent(cpa) then
        cpa.assignValid(self ∪ N(i)−cpa); // removes all lower priority assignments!
    else
        cpa.assignValid(N(i)−cpa);
    if ¬isConsistent(cpa) then
        currentDomain ← full domain;
        send(BT, cpa.prev(), cpa);
    else if isLast then
        send(BC, cpa.prev(), cpa);
    else
        send(CPA, cpa.next(), cpa);

procedure BackTrack(cpa)
    cpa.assignValid(self ∪ N(i)−cpa); // removes all lower priority assignments!
    if ¬isConsistent(cpa) then
        send(BT, cpa.prev(), cpa);
    else
        send(CPA, cpa.next(), cpa);

procedure BackCheck(cpa)
    if ¬isConsistent(cpa) then
        send(BT, lastAgent, cpa);
    else if is initializing agent then
        send(STP, all agents, null);
    else
        send(BC, cpa.prev(), cpa);
The algorithms’ network load in the second experiment is presented in Figure 4. Interestingly, the number of messages generated by NashProp is proportional to the number of NCCCs. That is, for every message sent, there were roughly 14 constraint checks (all constraint checks are non concurrent due to NashProp’s synchronous nature). This ratio remains constant throughout the entire experiment, indicating that NashProp’s performance is highly affected by the agents’ local environment – the number of adjacent agents and/or the size of their domains.
Fig. 2: Number of NCCCs as a function of the domain size
Fig. 3: Number of NCCCs as a function of the number of agents
In sharp contrast to NashProp’s constant ratio of NCCCs and network load, the
number of messages generated by ANT and ANT+AC does not seem to be aligned with
the number of NCCCs. One possible explanation for this discrepancy stems from the
combination of a high arity problem and asynchronous execution. If, for some reason,
the communication links are not steady and some agents are more active than others, it
is possible that a subset of the agents in a constraint exchange messages with each other
until the last agent involved in the constraint reacts to the current state. The messages
the agents exchange are correct with respect to their agent view and Nogood store, but
progress towards the global satisfaction goal can only be made after the final agent in
the constraint reacts. As a result, the ANT algorithm can generate redundant messages
which increase its communication cost. Nonetheless, ANT+AC almost always requires
fewer messages than NashProp.
6 Conclusions
The present paper explores the similarities between distributed constraint reasoning
and graphical games – a well established means for representing multi agent problems.
Fig. 4: Number of messages as a function of the number of agents
A general form graphical problem is first represented as an Asymmetric Distributed
Constraint Optimization Problem (ADCOP) which is capable of capturing the agents’
preferences over outcomes. Then, the ADCOP is transformed into a Nash-ADCSP with unique, asymmetric, high arity constraints. A satisfying solution to this Nash-ADCSP is proven to be an ε-Nash equilibrium of the graphical game.
The constraint based formulation of the graphical problem enables one to apply
constraint reasoning techniques to solve the underlying problem. Asynchronous Nash backTracking (ANT), a variant of ABT and ABT-1ph [14, 2, 3], is presented, together with a proof that it is always capable of finding ε-Nash equilibria.
The performance of ANT, and of ANT combined with the AC mechanism described in [11], is compared to that of NashProp, the only other distributed algorithm for finding ε-Nash equilibria on general form graphs [11]. The paper also presents the first (to the best of our knowledge) detailed outline of NashProp’s second phase. The results of our evaluation indicate a speedup of three to four orders of magnitude in favor of the ANT variants in terms of run-time as measured by NCCCs, with a number of messages which is generally not greater than that of NashProp.
The connection between distributed constraint reasoning and graphical games induces multiple directions for future work. The special constraint structure of a Nash-ADCSP hinders the performance of some algorithms, such as those applying forward checking. However, this structure can hopefully be utilized to find even more efficient distributed algorithms for graphical games. The structure of non random graphical games should also be investigated. Kearns et al. report that the AC phase of NashProp greatly reduced the agents’ domains. This phenomenon was not observed on the random instances examined in the present study, and it is our belief that this stems from the fact that the evaluation in [11] only considered a binary action space. Finally, the relation between non cooperative agents, privacy loss, and the agents’ ability to rationally manipulate an algorithm is another topic we intend to research in the near future.
Acknowledgment: The authors would like to thank Benny Lutati for his support throughout the experimental evaluation.
References
1. Krzysztof R. Apt, Francesca Rossi, and Kristen Brent Venable. A comparison of the notions
of optimality in soft constraints and graphical games. CoRR, abs/0810.2861, 2008.
2. Christian Bessière, Arnold Maestre, Ismel Brito, and Pedro Meseguer. Asynchronous backtracking without adding links: a new member in the ABT family. Artif. Intell., 161(1-2):7–24,
2005.
3. Ismel Brito, Amnon Meisels, Pedro Meseguer, and Roie Zivan. Distributed constraint satisfaction with partially known constraints. Constraints, 14(2):199–234, 2009.
4. Archie C. Chapman, Alessandro Farinelli, Enrique Munoz de Cote, Alex Rogers, and
Nicholas R. Jennings. A distributed algorithm for optimising over pure strategy Nash equilibria. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI
2010, Atlanta, Georgia, USA, July 2010.
5. Alon Grubshtein, Roie Zivan, Amnon Meisels, and Tal Grinshpoun. Local search for
distributed asymmetric optimization. In Proceedings of the 9th International Conference
on Autonomous Agents and Multi-Agent Systems (AAMAS’10), pages 1015–1022, Toronto,
Canada, May 2010.
6. Michael J. Kearns, Michael L. Littman, and Satinder P. Singh. Graphical models for game
theory. In UAI ’01: Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence, pages 253–260, San Francisco, CA, USA, 2001.
7. Benny Lutati, Inna Gontmakher, and Michael Lando. AgentZero – a framework for executing, analyzing and developing DCR algorithms. http://code.google.com/p/
azapi-test/.
8. Rajiv T. Maheswaran, Jonathan P. Pearce, and Milind Tambe. Distributed algorithms for
DCOP: A graphical-game-based approach. In Proceedings of Parallel and Distributed Computing Systems (PDCS’04), pages 432–439, September 2004.
9. Amnon Meisels. Distributed Search by Constrained Agents: Algorithms, Performance, Communication. Springer Verlag, 2007.
10. Noam Nisan, Tim Roughgarden, Éva Tardos, and Vijay V. Vazirani. Algorithmic Game
Theory. Cambridge University Press, 2007.
11. Luis E. Ortiz and Michael J. Kearns. Nash propagation for loopy graphical games. In NIPS
’02: Advances in Neural Information Processing Systems 15, pages 793–800, Vancouver,
British Columbia, Canada, 2002.
12. Martin J. Osborne and Ariel Rubinstein. A Course in Game Theory. The MIT Press, 1994.
13. Vishal Soni, Satinder P. Singh, and Michael P. Wellman. Constraint satisfaction algorithms
for graphical games. In 6th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2007), page 67, Honolulu, Hawaii, USA, May 2007.
14. M. Yokoo, E. H. Durfee, T. Ishida, and K. Kuwabara. Distributed constraint satisfaction
problem: Formalization and algorithms. IEEE Trans. on Data and Kn. Eng., 10:673–685,
1998.