Undefined 1 (2016) 1–5, IOS Press

Computing the Team–maxmin equilibrium in Single–Team Single–Adversary Team Games

Nicola Basilico^a, Andrea Celli^b, Giuseppe De Nittis^b,* and Nicola Gatti^b

^a Computer Science Department, University of Milan, via Comelico 39/41, 20135 Milan, Italy. E-mail: [email protected]
^b Department of Electronics, Information and Bioengineering, Politecnico di Milano, piazza Leonardo da Vinci 32, 20133 Milan, Italy. E-mail: {andrea.celli,giuseppe.denittis,nicola.gatti}@polimi.it

Abstract. A team game is a non–cooperative normal–form game in which some teams of players play against others. Team members share a common goal but, due to some constraints, they cannot act jointly. A real–world example is the protection of environments or complex infrastructures by different security agencies: they all protect the area with valuable targets, but they have to act individually since they cannot share their defending strategies (of course, they are aware of the presence of the other agents). Here, we focus on zero–sum team games with n players, where a team of n − 1 players plays against one single adversary. In these games, the most appropriate solution concept is the Team–maxmin equilibrium, i.e., the Nash equilibrium that ensures the team the highest payoff. We investigate the Team–maxmin equilibrium, characterizing the utility of the team and showing that it can be irrational. The problem of computing such an equilibrium is NP–hard and cannot be approximated within a factor of 1/n. The exact solution can only be found by global optimization. We propose two approximation algorithms: the former is a modified version of an already existing algorithm, the latter is a novel anytime algorithm. We computationally investigate such algorithms, providing bounds on the utility for the team. We experimentally evaluate the algorithms, analyzing their performance w.r.t. a global optimization approach, and evaluate the loss due to the impossibility of correlating.
Keywords: Team–maxmin equilibrium, Algorithmic game theory, Equilibrium computation

* Corresponding author. E-mail: [email protected]

© 2016 – IOS Press and the authors. All rights reserved

1. Introduction

Due to the recent spread of game theory techniques to solve real–world problems in strategic settings, e.g., economics or decision making, the computation of Nash equilibria and Maxmin strategies has become a very important topic in such fields [2]. Specifically, the study of how to compute the Nash equilibrium has been considered one of the most challenging problems of the last decade in Computer Science [3].

In non–cooperative constant–sum 2–player games, Maxmin strategies are among the most adopted solution concepts [12]. The rationale behind this solution concept is the minimization of the loss in the worst–case scenario. According to this interpretation, the players are usually referred to as the maximizer and the minimizer, the former aiming at maximizing her own utility, the latter acting to minimize the utility of the first player.

In real–world scenarios, several interactions counterpose groups of agents instead of single individuals. The agents in a group share the same goal but, even though they are aware of the presence of the other players, they may be forced to act on their own. A practical example is the protection of environments or infrastructures by different security agencies (customarily called Defenders in the related literature; see [7] for an example): they all aim at defending the area to be secured against malicious attackers, but they cannot communicate during the mission to coordinate their actions on the field. In these cases, a non–cooperative game among n players can be set up: some players constitute the guards team, the others form the attackers team.
The team players receive the same utility, say U_T, defined by taking into account the utilities of the different members. In these scenarios, the most suitable solution concept is the Team–maxmin equilibrium, which is actually the best Nash equilibrium of the game for the team, providing the team with the highest utility [13].

In the literature, there are just a few works that explore solution concepts in games with more than two players from an algorithmic point of view. In particular, we recall that, for 3–player games with binary utilities and n actions per player, it is NP–hard to approximate the minmax value and, by duality, the Maxmin value of each player within 3/n² [2].

Original contributions. Starting from a broader definition of team games, we restrict our attention to zero–sum single–team single–adversary team games (STSA–TG) with n players, where a team of n − 1 players plays against one single adversary, as defined in [13]. For this specific case, which is very common in practice, we provide properties that characterize the Team–maxmin equilibrium. We propose two algorithms to deal with its computation: the former is a modified version of the quasi–polynomial time algorithm¹ presented in [6], the latter is a novel anytime approximation algorithm. More precisely, the rest of the paper is organized as follows:

¹ An algorithm is called quasi–polynomial if it requires more than polynomial time but not yet exponential time. The worst–case running time of a quasi–polynomial time algorithm is 2^(O((log n)^k)), for some constant k.

– Section 2 introduces the preliminary notions of game theory that will be useful throughout the paper and the concept of Team–maxmin equilibrium;
– in Section 3 we investigate the Team–maxmin equilibrium, showing some properties of its utility and providing two algorithms to compute it, along with their computational analysis;
– in Section 4 we compare the quality of the solutions of our approximation algorithms w.r.t.
global optimization, which is able to return the optimal solution if no timeout is set. Furthermore, we evaluate the loss of the team, in terms of utility, due to the impossibility of correlating;
– Section 5 summarizes the main results of this work and suggests some problems and issues that should be faced and solved in the future.

2. Preliminaries

In this section, we introduce some game–theoretical preliminaries necessary for our study. More precisely, we initially introduce the definitions of game and team game and subsequently provide the definitions of Maxmin and Team–maxmin strategy.

2.1. Team games

In game theory, a normal–form game is a tuple (N, A, U) where:

– N = {1, 2, . . . , n} is the set of players;
– A = ×_{i∈N} A_i, where A_i = {a_1, a_2, . . . , a_{m_i}} is the set of player i's actions;
– U = {U_1, U_2, . . . , U_n}, where U_i : A → ℝ is the utility function of player i.

In this paper, we study only games in which m_i is finite for every player i. A strategy profile is defined as s = (s_1, s_2, . . . , s_n), where s_i : A_i → [0, 1] is player i's mixed strategy, subject to the constraint ∑_{a_i∈A_i} s_i(a_i) = 1. As customary in game–theoretical notation, we will use the subscript −i to denote the set containing all the players except player i.

A team T ⊆ N is a subset of players sharing the same utility function, which is maximal under inclusion. Formally, for any i, j ∈ N we say that i, j ∈ T if and only if U_i = U_j. We denote with U_T the utility of each player in team T. A team game is a normal–form game where at least one team is present. Notice that two degenerate cases of team games are possible: in the former, each team is composed of a single player, and therefore the number of teams equals the number of players; in the latter, there is only one team grouping together all the n players (such a game is customarily called a coordination game²).

² In coordination games the payoff of any outcome is the same for each player.

Team games capture situations in which players pursue equal objectives. In principle, the coordination of teammates can be of two forms: correlated, in which a correlating device decides a joint action (an action profile specifying one action per teammate) and then communicates to each teammate her action, and non–correlated, in which each player plays independently from the others. In the first case, the strategy of team T is said to be correlated. Given the set of team action profiles defined as A_T = ×_{i∈T} A_i, a correlated team strategy is defined as s_T : A_T → [0, 1] subject to the constraint that ∑_{a_T∈A_T} s_T(a_T) = 1. In other words, teammates can decide jointly their strategy and can synchronize its execution. In the second case, players are subject to the inability of correlating their actions and their strategies s_i are mixed, as defined above for a generic normal–form game. In other words, teammates can decide jointly their strategies, but they cannot synchronize their execution, and therefore each player independently draws an action from her strategy.

Dealing with teams in which teammates can correlate is easy from a computational point of view. Indeed, such a team is equivalent to a single player whose actions are the joint actions of the teammates. On the other hand, dealing with teams in which teammates cannot correlate is a challenging problem that is still open. In this work, we explore for the first time algorithmic methods for such a problem.

A particularly significant use case for team games can be found in the field of security games [1], a class of games that has received some interest in the last few years due to its usefulness in strategic security resource allocation problems. We report the following example, which can be called a team security game, modeling an adversarial scenario where multiple ranger teams from different countries deal with the task of protecting an environment from a gang of criminals (for a specific example of a security game of this type see, for example, [7]).

Example 1 (Team (security) game) A gang Γ of criminals is logging some ancient trees in a forest, which extends over two countries. In order to get rid of it, the countries have created a group of rangers, divided into two teams τ_1, τ_2, one for each country. For security reasons, the teams can share with the others the positions they can occupy to protect the trees but then, in order to keep radio silence, they cannot communicate when the action is undertaken, and so each team executes its route autonomously. The forest contains four trees, whose values are π(t_1) = 0.9, π(t_2) = 0.7, π(t_3) = 0.5, π(t_4) = 0.3. Γ can attack any tree and its attack is successful if it saws a tree t and fades away without being detected. Let us denote with p_ij the j–th position occupied by the i–th team. We have the following actions:

– p_11 allows τ_1 to protect t_1;
– p_12 allows τ_1 to protect t_2;
– p_21 allows τ_2 to protect t_3;
– p_22 allows τ_2 to protect t_4.

The payoffs are given as follows:

U_Γ = π(t_i) if the attack on tree t_i is successful, 0 otherwise;
U_{τ_1} = U_{τ_2} = −U_Γ / 2.

The game can now be formulated as follows (being the game zero–sum, we report only the values for the team members):

  τ_1, τ_2       Γ: t_1    t_2      t_3      t_4
  p_11, p_21     0        −0.35    0        −0.15
  p_11, p_22     0        −0.35    −0.25    0
  p_12, p_21     −0.45    0        0        −0.15
  p_12, p_22     −0.45    0        −0.25    0

2.2. Team–maxmin strategies

Let us start by recalling some basic game–theoretical notions that will come in handy. One interesting solution concept in constant–sum games is given by Maxmin strategies. Given an n–player constant–sum game, the Maxmin strategy for player i is defined as s_i = argmax_{s_i} min_{s_{−i}} U_i(s_i, s_{−i}).
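In the 2–player zero–sum case, this maxminimization can be solved exactly as a linear program. The following sketch (our own illustration, not the authors' implementation) uses SciPy's `linprog`; the 2×2 payoff matrix is the one of the example that follows:

```python
import numpy as np
from scipy.optimize import linprog

def maxmin_value(U):
    """Maxmin strategy and value of the row player in a 2-player zero-sum game.

    LP: max v  s.t.  v <= sum_a s(a) * U[a, b] for every column b,
        sum_a s(a) = 1, s >= 0.
    """
    n_rows, n_cols = U.shape
    # Variables are [s_1, ..., s_n, v]; linprog minimizes, so the objective is -v.
    c = np.zeros(n_rows + 1)
    c[-1] = -1.0
    # One inequality per column b:  v - s^T U[:, b] <= 0.
    A_ub = np.hstack([-U.T, np.ones((n_cols, 1))])
    b_ub = np.zeros(n_cols)
    # Probabilities sum to one; v itself is unconstrained.
    A_eq = np.hstack([np.ones((1, n_rows)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * n_rows + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:-1], res.x[-1]

s, v = maxmin_value(np.array([[0.0, 3.0], [2.0, 0.0]]))
# Row player mixes 0.4 / 0.6 and guarantees an expected utility of 1.2.
```

The same LP view underlies the polynomial-time computability of Maxmin strategies in 2–player zero–sum games discussed next.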
This strategy reflects player i's conservative behavior of aiming at the best worst case under the assumption that the other players will seek the lowest payoff for i. Let v be the value returned by the previous maxminimization; then the Maxmin strategy guarantees i at least a payoff of v, that is, U_i(s_i, s_{−i}) ≥ v for any s_{−i}. Maxmin strategies become particularly interesting in 2–player zero–sum games, due to their relation with another popular solution concept: the Nash equilibrium. A Nash equilibrium is a strategy profile s = (s_1, s_2, . . . , s_n) from which no player can gain more by unilaterally changing her strategy. In 2–player zero–sum games, where U_1 = −U_2, Maxmin strategy profiles (where both players adopt a Maxmin strategy) and Nash equilibria coincide and yield to player 1 the same game value v. Besides this, Maxmin strategies (and, consequently, Nash equilibria) are interchangeable, can be computed in polynomial time via linear programming, and always yield a rational game value (that is, a value that can be exactly represented in a computer) if the utility functions return rational values. In the following example, we show how the Maxmin strategy space can be represented depending on the strategies played by the two agents.

Example 2 (Maxmin strategy in a 2–player zero–sum game) Consider the following 2–player zero–sum strategic game (we report only U_1):

         a_3   a_4
  a_1    0     3
  a_2    2     0

Below, on the left, we report the strategy simplex of player 1 partitioned into subareas with the corresponding action player 2 would play. On the right, we also show how the expected utility of player 1 varies over the simplex given the strategy of player 2.

[Figure: player 2 best–responds with a_4 for s_1(a_1) < 0.4 and with a_3 for s_1(a_1) > 0.4; player 1's expected utility is maximized at s_1(a_1) = 0.4, where E[U_1] = 1.2.]

Maxmin strategies can be easily generalized to team games. To ease notation, let us assume that T = {1, . . . , t} and that s_T = (s_1, s_2, . . . , s_t). The Maxmin strategy for team T when teammates are not correlated is, analogously to the previous case, defined as

s_T = argmax_{s_1,...,s_t} min_{s_{−T}} U_T(s_1, . . . , s_t, s_{−T}),

where −T denotes the set of all the opponents of the team (possibly partitioned into teams as well). Team–maxmin strategies still represent reasonable ways to play a game but, as happens with 2–player general–sum games, they do not provide equilibrium states in the general–sum case. However, in their seminal work, von Stengel and Koller show the validity of useful theoretical properties in a particular case of team games: zero–sum, single–team, single–adversary team games (STSA–TG). In these games we have that N = T ∪ {n}, where the players in team T share the same utility function U_T, while the adversary n has utility function U_n = −(n − 1)U_T, since |T| = n − 1. A possible interpretation is that, if the team as a whole gets a payoff of p, then the adversary pays p/(n − 1) to each of the n − 1 members. The results provided by von Stengel and Koller can be summarized in the following two properties (we refer the reader to [13] for the formal statements and proofs):

Property 1 In any STSA–TG the Team–maxmin strategy s_T is always part of a Nash equilibrium, and this equilibrium is called Team–maxmin equilibrium.

Property 2 No Nash equilibrium different from the Team–maxmin equilibrium can make the team better off.

Since a Team–maxmin strategy always exists, from Properties 1 and 2 we obtain that any STSA–TG has a Team–maxmin equilibrium, which is also the best Nash equilibrium for the team. (Notice that, in case multiple Team–maxmin equilibria are present, they cannot be arbitrarily interchanged like Maxmin strategies in 2–player zero–sum games.)
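To make the uncorrelated definition concrete, the Team–maxmin value of a tiny game can be approximated by brute force over the product of the teammates' strategy simplices. The following sketch (our own illustration on a hypothetical 2–teammate game, not code from the paper) searches a grid over the two simplices and lets the adversary best-respond:

```python
import numpy as np

# Hypothetical STSA-TG: U[i, j, k] is the team payoff when teammate 1 plays i,
# teammate 2 plays j and the adversary plays k.
U = np.zeros((2, 2, 2))
U[0, 0, 0] = 1.0    # adversary action 0 is countered only by the profile (0, 0)
U[1, 1, 1] = 2.0    # adversary action 1 is countered only by the profile (1, 1)

def team_maxmin_grid(U, steps=1001):
    """Grid search over the product of the two teammates' strategy simplices,
    maximizing the team payoff against a best-responding adversary."""
    p = np.linspace(0.0, 1.0, steps)              # s_1(action 0)
    q = np.linspace(0.0, 1.0, steps)              # s_2(action 0)
    P, Q = np.meshgrid(p, q, indexing="ij")
    s1 = np.stack([P, 1.0 - P])                   # shape (2, steps, steps)
    s2 = np.stack([Q, 1.0 - Q])
    # Expected team payoff for every grid point and every adversary pure action.
    vals = np.einsum("ixy,jxy,ijk->kxy", s1, s2, U)
    worst = vals.min(axis=0)                      # adversary best response
    x, y = np.unravel_index(np.argmax(worst), worst.shape)
    return worst[x, y], p[x], q[y]

v, s1a, s2a = team_maxmin_grid(U)
```

On this instance the independent teammates can guarantee only about 0.343, while a correlated team randomizing over the two "winning" joint actions would guarantee 2/3, which illustrates the price of playing without correlation.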
Moreover, we cannot exploit a linear formulation to compute it and, instead, we need to rely on a non–linear, non–convex program in which we maximize the unconstrained variable v where, for every adversary's action a_n ∈ A_n, the following constraint must hold:

v − ∑_{a_T∈A_T} U_T(a_T, a_n) ∏_{i∈T} s_i(a_i) ≤ 0.   (1)

In the resulting non–linear program, the team players' strategies s_1, s_2, . . . , s_{n−1} and the lower bound v on the team's expected utility are the decision variables. In Constraints (1), a_T is a team action profile specifying an action for each team member, and a_i is the action played by team member i in action profile a_T. We call the optimal value of the above program the team game value and we denote it with v_T. Notice that, if we allow correlation between team members, the game becomes substantially equivalent to a 2–player zero–sum game. In such a case, the team plans and acts like a single player. Formulation–wise, this amounts to replacing the strategy vector s_1, s_2, . . . , s_{n−1} with a single strategy s′ over the actions in A_T. Then, we would replace in Constraints (1) the term ∏_{i∈T} s_i(a_i) with s′(a_T), obtaining a linear program yielding an optimal value greater than or equal to v_T (the ability of correlating their actions can only improve the team's revenue in the game). In the next section, we further develop the analysis of STSA–TGs by showing how other nice properties of their 2–player zero–sum counterparts cease to hold.

3. Problem Analysis

In this section we characterize the Team–maxmin equilibrium. More precisely, in Section 3.1, we analyze the rationality of the equilibrium value given a rational input and we computationally deal with the approximation of the equilibrium.
Due to this negative result, in Section 3.2 we provide two algorithms to deal with the problem of finding the Team–maxmin equilibrium: the former is an adaptation of the algorithm proposed in [6], the latter is a novel approximation algorithm.

3.1. Properties of the Team–maxmin equilibrium

We show that, even considering a team of only two players against one single adversary, the value of the game can be irrational and therefore cannot be exactly represented by a computer. Such a result and the corresponding proof are a consequence of [6].

Theorem 1 (Rational and irrational Maxmin value) The Team–maxmin value in an STSA–TG can be irrational.

Proof 1.1 We prove this by providing an example. Consider the following zero–sum STSA–TG where T = {1, 2} (we report the team's utilities; the adversary, player 3, selects the matrix):

  a_5:    a_3   a_4        a_6:    a_3   a_4
  a_1     1     0          a_1     0     0
  a_2     0     0          a_2     0     2

To compute the Team–maxmin strategy for T we solve the following problem:

max_{s_1(a_1), s_2(a_3)} v
s.t.  v ≤ s_1(a_1) s_2(a_3)
      v ≤ 2 (1 − s_1(a_1)) (1 − s_2(a_3))

We observe that the above problem is formulated considering only s_1(a_1) and s_2(a_3) as explicit variables. In fact, player 1 can play only two actions, namely a_1, a_2, and the probabilities with which such actions are played must sum to 1. Formally, s_1(a_1) + s_1(a_2) = 1, thus we can rewrite s_1(a_2) as s_1(a_2) = 1 − s_1(a_1). A similar observation can be made for player 2, which leads us to consider only s_2(a_3) as an explicit variable for player 2: s_2(a_4) = 1 − s_2(a_3). The maximum value is achieved when both inequalities hold as equalities. Thus, we can write:

s_1(a_1) s_2(a_3) = 2 (1 − s_1(a_1)) (1 − s_2(a_3)),

from which it follows:

s_1(a_1) = (2 s_2(a_3) − 2) / (s_2(a_3) − 2).

Now we write:

v = s_2(a_3) (2 s_2(a_3) − 2) / (s_2(a_3) − 2).

This expression is maximized for s_2(a_3) = 2 − √2 and the corresponding value is 6 − 4√2, which is irrational. □

Furthermore, we can show that even the problem of approximating the Team–maxmin value is hard.
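The closed-form maximum derived in the proof above can be sanity-checked numerically; a small verification script of ours (not part of the paper):

```python
import numpy as np

# Maximize f(s2) = s2 * (2*s2 - 2) / (s2 - 2) over s2 in [0, 1], i.e., the
# team's guaranteed payoff once s1(a1) is expressed in terms of s2(a3).
s2 = np.linspace(0.0, 1.0, 1_000_001)
f = s2 * (2.0 * s2 - 2.0) / (s2 - 2.0)
v_num = f.max()
s2_star = s2[f.argmax()]
# v_num ≈ 0.343146 = 6 - 4*sqrt(2), attained at s2_star ≈ 0.585786 = 2 - sqrt(2)
```

The numeric maximum matches the irrational value 6 − 4√2 up to grid resolution.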
Such a result comes as a direct consequence of the following theorem, taken from [6] (again, we refer the reader to that work for full technical details).

Theorem 2 (3–player Maxmin complexity) Approximating the Maxmin value (without correlation) in a 3–player game with m actions per player and binary payoffs within an additive error of 1/m^ε is NP–hard for any ε > 0.

We report two other results whose adaptations fit our problem. The following theorem gives a bound on the maximum number of actions the team will randomize over at the equilibrium.

Theorem 3 (Maxmin strategy support size) Given an STSA–TG where |A_n| = k, there is a Team–maxmin strategy (without correlation) for the team where each team member's support has a size that is not larger than k.

Proof sketch 3.1 Given a Maxmin strategy s of players N\{n} against player n such that at least one player, say player i, plays more than k actions with strictly positive probability, it is possible to show that there is a Maxmin strategy in which i plays no more than k actions. Given s, we keep the strategies of all the maximizers except player i fixed and find the Maxmin strategy of player i against player n, say s′_i. Since this is a 2–player game, the Maxmin strategy has a support with a size no larger than k. The strategy profile s′ obtained from s once the strategy of player i has been replaced with s′_i is a Maxmin strategy of players N\{n} against player n. Repeating the same procedure for all the maximizers, it is possible to find a Maxmin strategy in which the support of each player is upper bounded by k. □

Starting from these results, we now deal with the computation of the Team–maxmin equilibrium.

3.2. Computing the Team–maxmin equilibrium

We recall that computing the Team–maxmin equilibrium exactly cannot be done in polynomial time. Moreover, from Theorem 1 we know that the value to be computed may be irrational, while from Theorem 2 we know that it is hard to approximate the equilibrium within an additive error. In the following we describe three resolution methods for this problem: the first one is based on global optimization, which returns the optimal solution if no timeout is set, while the other two are based on ad hoc algorithms, which return an approximate solution.

3.2.1. Global optimization
The first available approach that can be exploited is to tackle the resolution of the Team–maxmin non–linear optimization problem with a global optimization solver. The problem is the one formulated in Section 2.2; we fully report it here for clarity:

max v
s.t.  v − ∑_{a_T∈A_T} U_T(a_T, a_n) ∏_{i∈T} s_i(a_i) ≤ 0   ∀a_n ∈ A_n
      ∑_{a_i∈A_i} s_i(a_i) = 1   ∀i ∈ T
      s_i(a_i) ≥ 0   ∀i ∈ T, a_i ∈ A_i

In our experiments we compute solutions to this problem with BARON [11], the best–performing global solver for continuous problems among all the existing optimization solvers [8].

3.2.2. A quasi–polynomial algorithm
In [6] a method for approximating minmax values is presented. Such a method can be naturally extended to the case we are considering: here, we present and discuss a variation of that algorithm adapted to solve our problem. Specifically, we propose a heuristic enumeration, which is a subset of the complete one proposed in [6]. The steps are reported in Algorithm 1 and work as follows.

Algorithm 1 HeuristicEnumeration
1: v* = −∞
2: for all i ∈ T do
3:   P_i = {V_i1, V_i2, . . . | V_i ⊆ A_i, |V_i| ≤ ⌈2 ln|A_i| / ε²⌉}
4: end for
5: C = ×_{i∈T} P_i
6: for all (V_1, V_2, . . . , V_{n−1}) ∈ C do
7:   for all i ∈ T do
8:     s_i(a_i) = 1/|V_i| if a_i ∈ V_i, 0 otherwise
9:   end for
10:  v* = max{v*, min_{s_n} U_T(s_1, s_2, . . . , s_{n−1})}
11: end for
12: return v*

For every team member i we compute P_i, defined as the collection of all the subsets of i's actions whose cardinality does not exceed ⌈2 ln|A_i| / ε²⌉ (Line 3), where ε ∈ (0, 1] is a parameter that defines the size of the enumeration. Next, we consider the Cartesian product C of all the P_i's (Line 5), where any of its elements, by definition, specifies one subset of actions V_i for each team member i. For each one of such elements (V_1, V_2, . . . , V_{n−1}) (Line 6) we consider a team strategy profile where each team member i uniformly randomizes over the actions specified by her associated V_i (Line 8). We compute the best–response value of the adversary (the minimizer) given such a team strategy profile, keeping track of the best value over C (Line 10), which will be returned as the final result of the algorithm.

Proposition 1 The solution returned by Algorithm 1 has an approximation bound that depends on the number of actions per player m.

Proof 3.2 First, notice that, since m is finite, the solution returned by Algorithm 1 cannot get arbitrarily close to the optimal solution. In particular, let us consider the following example of a bimatrix game, a degenerate case of an STSA–TG where the team is composed of only player 1:

         a_3   a_4
  a_1    1     0
  a_2    0     0.5

In this specific game, Algorithm 1 would consider the following supports: {a_1}, {a_2}, {a_1, a_2}, which, when player 1 randomizes uniformly over them, yield 0, 0, and 1/4, respectively. The Maxmin value of player 1 in this game is 1/3, meaning that the algorithm scores an actual additive error of 1/12. Clearly, having exhaustively considered all the supports in the game, a better additive error cannot be achieved. □

Moreover, the quasi–polynomial complexity makes the algorithm impractical even on relatively small instances. For example, an ε < 1/3 can only be adopted in team games with a number of actions greater than or equal to 10. However, HeuristicEnumeration would require a number of iterations of the order of 10^18 for an STSA–TG with a team of 2 members and 10 actions per player.

3.2.3. An anytime algorithm
Despite providing (bounded) approximation guarantees, the algorithm presented in the previous section is clearly not promising in terms of scalability. In Algorithm 2 we propose a method, which we call IteratedLP, for which we give an anytime implementation. It works by maintaining a current solution s^cur, which specifies a strategy for each team member and can be returned at any time. It is initialized (Line 1) with a starting solution ŝ which, in principle, can prescribe an arbitrary set of strategies for the team (e.g., uniform randomizations). Then, for each team member i (Line 3), we instantiate and solve the specified linear program (Line 4). The decision variables of this LP are v_i and, for each action a_i of player i, x_{a_i}. We maximize v_i subject to the upper bound given by the first constraint, which is a rewriting of Eq. (1) where we assume that the strategy of player i (relabeled with x to distinguish it) is a variable, while the strategies of the other team members are constants set to the values specified by the current solution. (Notice that, in the LP, a_j is the action that team member j plays in the team action profile a_T.)

Algorithm 2 IteratedLP
1: ∀i ∈ T, s_i^cur ← ŝ_i
2: repeat
3:   for all i ∈ T do
4:     maximize v_i
        s.t.  v_i ≤ ∑_{a_T∈A_T} x_{a_i} U_T(a_T, a_n) ∏_{j∈T\{i}} s_j^cur(a_j)   ∀a_n ∈ A_n
              ∑_{a_i∈A_i} x_{a_i} = 1
              x_{a_i} ≥ 0   ∀a_i ∈ A_i
5:   end for
6:   i′ = argmax_{i∈T} v_i*
7:   s_i^cur(a_i) = x*_{a_i} if i = i′, s_i^cur(a_i) otherwise
8: until convergence or timeout
9: return s^cur

The optimal solution of the LP is given by v_i* and x*, basically representing the Maxmin strategy of team member i once the strategies of her teammates have been fixed to the current solution. Once the LP has been solved for each i, the algorithm updates the current solution in the following way (Line 7): the strategy of the team member that obtained the best LP optimal value is replaced with the corresponding strategy from the LP. This process iterates until convergence or until some timeout is met.

Algorithm 2 is an iterated algorithm and, at each iteration, the value of the game increases (non–strictly) monotonically. We run it using multiple random restarts, that is, generating a set of different initial assignments ŝ (Line 1). Once convergence is achieved, we pass to the next random restart, generating a new strategy profile for the team. The resulting algorithm is polynomial per restart and probabilistically exact: it samples in the space of the strategy simplices and then iteratively solves the linear program several times. Being just probabilistically exact, we are pushed to study the approximation ratio of such an algorithm.

Theorem 4 Given ŝ such that, for each player i in the team, ŝ_i prescribes a uniform strategy, the worst–case approximation factor of Algorithm 2 cannot be better than 1/m, where m is the number of actions for each player.

Proof 4.1 To prove this, we proceed by constructing an instance where Algorithm 2 achieves a factor of exactly 1/m independently of the initial assignment ŝ. We provide detailed insights for the case of uniform ŝ and we describe the rationale behind its extension to the general case. Consider an STSA–TG where the team is composed of 2 members and each player has m actions. Moreover, consider the minimizer to have only one action available. The team utility is defined as U_T = I_m, where I_m is the m×m identity matrix. Let us consider, in Line 1 of Algorithm 2, a uniform ŝ prescribing each team member to play each action with probability 1/m. The resolution of the LP of Line 4, once the strategy of one team member is fixed to such a uniform distribution, would output the same uniform distribution for the other one. In other words, ŝ is a local maximum for our algorithm. Thus, the algorithm would return a strategy profile guaranteeing the team a payoff of m/m² = 1/m, whereas the Team–maxmin value amounts to 1 and can clearly be obtained by any team profile in which both team members play the same action in pure strategies. So the approximation factor achieved by the algorithm would be exactly 1/m. □

We conjecture that Theorem 4 holds in general. It should be possible to build, for any arbitrary initialization ŝ, a game in which Algorithm 2 has an approximation ratio of 1/m, but the problem is still open.

4. Experimental evaluation

In this section we provide an experimental comparison of the algorithms described in the previous section. In Section 4.1 we present our experimental setting, while in Section 4.2 we present and discuss the experimental results.

4.1. Experimental setting

Our experimental setting is based on instances of the RandomGames class generated by GAMUT [9] (the main testbed for game theory algorithms). Specifically, once a game instance is generated, we extract the utility function of player 1 and assign it to all the team members. Furthermore, in each generated game instance, the payoffs are between 0 and 1. We use game instances with a different number of players n and a different number of actions m for each player, with n ∈ {3, 4, 5}, as follows:

– m from 5 to 40 with step 5, and from 50 to 80 with step 10, for n = 3;
– m from 5 to 15 with step 5, for n = 4;
– m from 5 to 10 with step 5, for n = 5.

The algorithms are implemented in Python 2.7.6, adopting GUROBI 6.5.0 [5] for solving linear programs, and AMPL 20160310 [4] with BARON 14.4.0 [11,10] for solving global optimization programs. We set a timeout of 60 minutes for the resolution of each instance.
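The instance-generation scheme just described can be sketched as follows (a hypothetical NumPy re-implementation of ours; the experiments actually rely on GAMUT's RandomGames generator):

```python
import numpy as np

def random_stsa_instance(n_players, m_actions, seed=None):
    """Random STSA-TG in the spirit of the setup above: draw one payoff tensor
    with entries in [0, 1], share it among the n - 1 team members, and give
    the adversary the zero-sum complement U_n = -(n - 1) * U_T."""
    rng = np.random.default_rng(seed)
    # One entry per joint action profile of all n players.
    U_T = rng.uniform(0.0, 1.0, size=(m_actions,) * n_players)
    U_adv = -(n_players - 1) * U_T
    return U_T, U_adv

U_T, U_adv = random_stsa_instance(3, 5, seed=0)
```

Sharing player 1's utility among all teammates is exactly what turns a generic random game into a single-team instance.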
All the algorithms are executed on a computer with the following characteristics:
– CPU: Intel Xeon CPU E5–4610 v2 @ 2.30GHz;
– RAM: 128 GB;
– OS: Ubuntu 16.04.1 LTS.

4.2. Experimental results

4.2.1. Global optimization
The quality of the solutions returned by the global optimization solver BARON is reported in Figure 1. We report the ratio between the lower (a.k.a. primal) bound and the upper (a.k.a. dual) bound returned by BARON upon termination. When BARON finds the optimal solution (up to an accuracy of 10⁻⁹), the lower bound equals the upper bound, achieving a ratio of 1. This happens with 3 players up to 15 actions per player (except for some outliers), and with 4 and 5 players up to 5 actions. If the lower bound is strictly smaller than the upper bound, the solution returned by BARON may not be optimal, and the ratio is strictly smaller than 1. Interestingly, the ratio is rather high: it is always larger than 0.9 with 3 players (the boxplot shows a very low standard deviation of the results), while with 4 and 5 players the performance is worse, but the approximation ratio remains high, always above 0.7. Finally, let us notice that, in principle, the lower bound returned by BARON may be undefined; this happens whenever BARON terminates without finding any feasible solution. However, this never happens with our problem, since a feasible solution can be found easily, any strategy profile being feasible.

[Fig. 1. Analysis of the approximation ratio (BARON lower bound / BARON upper bound) obtained with BARON: (a) mean of the approximation ratio of the solutions returned by BARON; (b) boxplot of the approximation ratio of the solutions returned by BARON.]

4.2.2. IteratedLP
The mean of the approximation ratios between the value of the solutions obtained with IteratedLP and the lower bound returned by BARON is reported in Figure 2, as the number of players and the number of random restarts vary. In the plots, we report data only when the algorithm terminated within 60 minutes. For instance, with 5 players, IteratedLP terminates within 60 minutes only with 1 random restart, while with 4 players it terminates within 60 minutes only with 30 random restarts or fewer. The mean of the approximation ratios is always smaller than 1, showing that IteratedLP always performs worse than global optimization. This happens even with 3 players and 80 actions, where the trend suggests that, as the number of actions increases, the ratio increases, but it remains below 1. Surprisingly, the mean of the approximation ratio obtained with a single restart is rather high and increases as the number of actions increases (on the other hand, we know that in the worst case the solution quality may decrease in the number of actions). We report in Figure 3 the statistical significance of our results: the standard deviation is rather low, especially as the number of actions and the number of random restarts increase. The compute time of a single restart may be extremely long, preventing the termination of the algorithm with more restarts beyond 70 actions with 3 players. This is due to the long time needed to solve each single LP in IteratedLP, while the number of iterations is very low, as shown in Figure 4.
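The inner LP at the heart of IteratedLP can be sketched as follows. This is our reading of the algorithm, not the paper's implementation: the helper names are ours, and SciPy's `linprog` stands in for GUROBI. Fixing the teammate's strategy induces a two-player zero-sum game between the remaining team member and the adversary, whose maxmin strategy is a standard LP. On the coordination game used in the proof above, starting from the uniform teammate strategy reproduces the 1/m value of the local maximum.

```python
import numpy as np
from scipy.optimize import linprog

def maxmin_lp(M):
    """Maxmin strategy of the row player in the zero-sum matrix game M
    (m rows x k adversary columns): maximize v subject to
    x^T M[:, j] >= v for every column j, with x a probability vector.
    linprog minimizes, so the variables are [x_1, ..., x_m, v] and the
    objective is -v."""
    m, k = M.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0
    A_ub = np.hstack([-M.T, np.ones((k, 1))])    # v - x^T M[:, j] <= 0
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0                            # x sums to one
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(k), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)], method="highs")
    return res.x[:m], res.x[-1]

def iterated_lp_step(U, teammate):
    """One inner step of IteratedLP, in our reading: with the teammate's
    mixed strategy fixed, the remaining team member faces an induced
    two-player zero-sum game against the adversary and plays maxmin."""
    induced = np.einsum("ikj,k->ij", U, teammate)  # U indexed [me, teammate, adv]
    return maxmin_lp(induced)

# Coordination game from the proof above: the team earns 1 iff its two
# members pick the same action; the adversary's action is irrelevant here.
m = 3
U = np.zeros((m, m, m))
for a in range(m):
    U[a, a, :] = 1.0

strategy, value = iterated_lp_step(U, np.full(m, 1.0 / m))
```

With the uniform teammate, every induced payoff equals 1/m, so the LP value is exactly 1/m, matching the approximation factor established in the theorem.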
4.2.3. HeuristicEnumeration
The mean of the approximation ratio (defined as the ratio between the value of the solutions obtained with HeuristicEnumeration and the lower bound returned by BARON) is reported in Figure 2, as the number of players and ε vary. In the plots, we report data only when the algorithm terminated within the 60–minute deadline. We recall that HeuristicEnumeration is the only algorithm guaranteeing a theoretical bound on the optimality gap. The main result is that HeuristicEnumeration does not terminate within 60 minutes with ε < 1 from 25 actions onwards, even with 3 players. This shows that the algorithm can be practically applied only when ε = 1, but this does not provide any theoretical bound (indeed, since all the payoffs are between 0 and 1, any strategy profile has an additive gap no larger than 1). The approximation ratios of the solutions returned by HeuristicEnumeration are extremely low, being smaller than 0.25 except for the cases with very few actions. Finally, we report in Figure 3 the statistical significance. Also for HeuristicEnumeration, as with IteratedLP, the standard deviation is very low, showing that the performance of the algorithm is rather stable, independently of the specific game instance.

4.2.4. Algorithms comparison
We provide here a comparison of the performance of IteratedLP and HeuristicEnumeration, each with its best parametrization: for IteratedLP we set the number of random restarts to the largest number such that the algorithm terminates with 40 actions (i.e., 60 random restarts with 3 players, 30 with 4 players, and 1 with 5 players); for HeuristicEnumeration we set ε to the smallest value such that the algorithm terminates with 40 actions (i.e., ε = 1.0). The results, reported in Figure 5, show that IteratedLP outperforms HeuristicEnumeration on average in all the cases. It can be observed that the boxplots have no overlaps, showing that for no game instance, among those we generated, does HeuristicEnumeration outperform IteratedLP.
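The support-enumeration idea behind HeuristicEnumeration can be sketched as follows for a two-member team. This is our simplified reading; the paper's exact variant, and how ε maps to the support size k, may differ. Each team member is restricted to strategies uniform over a multiset of k actions, all such pairs are enumerated, and the pair with the best worst-case value against the adversary's pure responses is kept. Smaller additive gaps require larger k, which is why the enumeration blows up for ε < 1.

```python
import itertools

def k_uniform_strategies(m, k):
    """All strategies uniform over a multiset of k actions ("k-uniform")."""
    for multiset in itertools.combinations_with_replacement(range(m), k):
        s = [0.0] * m
        for a in multiset:
            s[a] += 1.0 / k
        yield tuple(s)

def heuristic_enumeration(U, m, k):
    """Enumerate k-uniform strategy pairs for a two-member team and keep
    the pair with the best worst-case value over the adversary's pure
    responses. U is indexed [member 1, member 2, adversary]."""
    best_value, best_pair = -1.0, None
    for s1 in k_uniform_strategies(m, k):
        for s2 in k_uniform_strategies(m, k):
            worst = min(
                sum(s1[i] * s2[j] * U[i][j][a]
                    for i in range(m) for j in range(m))
                for a in range(m)
            )
            if worst > best_value:
                best_value, best_pair = worst, (s1, s2)
    return best_value, best_pair

# Coordination game: the team earns 1 iff its members match.
m = 3
U = [[[1.0 if i == j else 0.0 for _a in range(m)]
      for j in range(m)] for i in range(m)]
value, _pair = heuristic_enumeration(U, m, k=1)  # k = 1: pure strategies only
```

Even with k = 1 the enumeration finds the coordinated pure profile with value 1 here, while the running time grows as the number of k-multisets squared, explaining the timeouts observed above.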
[Fig. 2. Mean performance of IteratedLP and HeuristicEnumeration: (a) 3–players IteratedLP; (b) 3–players HeuristicEnumeration; (c) 4–players IteratedLP; (d) 4–players HeuristicEnumeration; (e) 5–players IteratedLP; (f) 5–players HeuristicEnumeration.]

4.2.5. Price of non–correlation
Finally, we evaluate the price, in terms of utility, of a team acting in a non–correlated fashion. We do so by comparing the Maxmin value obtained when all the members of the team can play correlated strategies with the Team–maxmin value obtained when the members of the team play mixed strategies. The latter value is obtained by global optimization (more precisely, we use the lower bound returned by BARON). Figure 6 shows the mean values of the ratio (the value obtained with correlated strategies divided by the value obtained with mixed ones) and the corresponding boxplots for games with 3, 4 and 5 players. Focusing on Figure 6(a), we note that all the curves are increasing and, for 3–player games, the trend is asymptotic towards about 1.15.
This means that correlation always allows the team to achieve better results, both w.r.t. the number of players and the number of actions, but in random games the loss due to non–correlation is low.

[Fig. 3. Boxplots of the performance of IteratedLP and HeuristicEnumeration: (a) 3–players IteratedLP; (b) 3–players HeuristicEnumeration; (c) 4–players IteratedLP; (d) 4–players HeuristicEnumeration; (e) 5–players IteratedLP; (f) 5–players HeuristicEnumeration.]

5. Conclusions and future research

In the present work, for the first time, we analyzed the Team–maxmin equilibrium from a computational perspective. We introduced a definition that involves an arbitrary number of teams and is thus broader than the one presented in the literature. We focused on games with n players, where a team composed of n − 1 players plays against one single adversary. The Team–maxmin equilibrium being the best solution concept for these games, we reported its mathematical formulation and proposed two approximation algorithms to deal with its computation, HeuristicEnumeration and IteratedLP. HeuristicEnumeration is a variation of a quasi–polynomial algorithm already presented in the literature, while IteratedLP is a novel heuristic algorithm: an iterated approximation anytime algorithm, which improves its performance when given more computational time. We also provided bounds on the utility returned by IteratedLP. We evaluated these two approximation algorithms, comparing them with the value computed by the global optimization approach, which outperforms them on medium–sized instances, while IteratedLP is the best–performing algorithm on large instances. Finally, we compared the results of the global optimization solution with the correlated equilibrium, finding that correlation really gives the team the possibility to always achieve better results, independently of the number of players and of the actions played by each player.

In the future, we want to further analyze the IteratedLP algorithm, providing an accurate computational analysis of its running time. Moreover, we want to investigate the price of non–correlation, i.e., the utility loss due to the impossibility of communication between the team players, to obtain theoretical bounds. We will also study how to extend the results presented in this work to team games with two teams of more than one player each.
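The price of non–correlation discussed above can be reproduced in miniature. This is a sketch under our own construction: the game below and the helper names are hypothetical, and SciPy's LP solver stands in for the paper's toolchain. When the team can correlate, its joint randomization over action pairs turns it into a single "meta" player, so the correlated maxmin value is a standard two-player zero-sum LP, and it can strictly exceed what a fixed pair of independent mixed strategies guarantees.

```python
import numpy as np
from scipy.optimize import linprog

def correlated_maxmin(U):
    """Maxmin value when the two team members can correlate: a joint
    distribution y over action pairs reduces the team to one player,
    giving the LP max v s.t. y^T M[:, a] >= v for every adversary a."""
    m = U.shape[0]
    M = U.reshape(m * m, m)            # rows: pairs (i, j); columns: adversary
    rows, cols = M.shape
    c = np.zeros(rows + 1)
    c[-1] = -1.0                       # maximize v by minimizing -v
    A_ub = np.hstack([-M.T, np.ones((cols, 1))])   # v - y^T M[:, a] <= 0
    A_eq = np.zeros((1, rows + 1))
    A_eq[0, :rows] = 1.0               # y sums to one
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(cols), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * rows + [(None, None)], method="highs")
    return res.x[-1]

# Hypothetical game: the team scores iff its members match on an action
# the adversary did not guess.
m = 3
U = np.zeros((m, m, m))
for i in range(m):
    for a in range(m):
        if a != i:
            U[i, i, a] = 1.0

v_corr = correlated_maxmin(U)
# Worst-case value of the independent uniform profile, for comparison.
s = np.full(m, 1.0 / m)
v_indep = float(np.einsum("i,j,ija->a", s, s, U).min())
```

Here the correlated team can randomize uniformly over the diagonal pairs (i, i), guaranteeing (m − 1)/m, while the independent uniform profile only guarantees (m − 1)/m², illustrating the gap that Figure 6 measures on random instances.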
Finally, the application of such theoretical results to a real–world problem, e.g., security games, would be very important to show how these algorithms can be adapted to the high level of complexity of such scenarios.

[Fig. 4. Number of iterations done by IteratedLP per restart: (a) games with 3 players; (b) games with 4 players; (c) games with 5 players.]

References

[1] Nicola Basilico, Nicola Gatti, and Francesco Amigoni. Patrolling security games: Definition and algorithms for solving large instances with single patroller and single intruder. Artificial Intelligence, 184:78–123, 2012.
[2] C. Borgs, J. T. Chayes, N. Immorlica, A. T. Kalai, V. S. Mirrokni, and C. H. Papadimitriou. The myth of the folk theorem. Games and Economic Behavior, 70(1):34–43, 2010.
[3] Xiaotie Deng, Christos Papadimitriou, and Shmuel Safra. On the complexity of equilibria. In Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, pages 67–71. ACM, 2002.
[4] R. Fourer, D. M. Gay, and B. Kernighan. AMPL: A mathematical programming language. In Algorithms and Model Formulations in Mathematical Programming, pages 150–151. Springer-Verlag, New York, NY, USA, 1989.
[5] Gurobi Optimization, Inc. Gurobi optimizer reference manual, 2015.
[6] K. A. Hansen, T. D. Hansen, P. B. Miltersen, and T. B. Sørensen. Approximability and parameterized complexity of minmax values. In WINE, pages 684–695, 2008.
[7] Albert Xin Jiang, Ariel D. Procaccia, Yundi Qian, Nisarg Shah, and Milind Tambe. Defender (mis)coordination in security games. In AAAI, 2013.
[8] Arnold Neumaier, Oleg Shcherbina, Waltraud Huyer, and Tamás Vinkó. A comparison of complete global optimization solvers. Mathematical Programming, 103(2):335–356, 2005.
[9] Eugene Nudelman, Jennifer Wortman, Yoav Shoham, and Kevin Leyton-Brown. Run the GAMUT: A comprehensive approach to evaluating game-theoretic algorithms. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, Volume 2, pages 880–887. IEEE Computer Society, 2004.
[10] N. V. Sahinidis. BARON 14.4.0: Global Optimization of Mixed-Integer Nonlinear Programs, User's Manual, 2014.
[11] M. Tawarmalani and N. V. Sahinidis. A polyhedral branch-and-cut approach to global optimization. Mathematical Programming, 103:225–249, 2005.
[12] John von Neumann and Oskar Morgenstern. Theory of Games and Economic Behavior. Princeton University Press, 2007.
[13] Bernhard von Stengel and Daphne Koller. Team-maxmin equilibria. Games and Economic Behavior, 21(1):309–321, 1997.

[Fig. 5. Average ratios and boxplots of the comparison between IteratedLP and HeuristicEnumeration: (a) 3–players ratio comparison; (b) 3–players comparison boxplot; (c) 4–players ratio comparison; (d) 4–players comparison boxplot; (e) 5–players ratio comparison; (f) 5–players comparison boxplot.]

[Fig. 6. Inefficiency due to the non–correlation in team games: (a) non–correlation loss mean; (b) 3–players non–correlation loss boxplot; (c) 4–players non–correlation loss boxplot; (d) 5–players non–correlation loss boxplot.]