Undefined 1 (2016) 1–5, IOS Press

Computing the Team–maxmin equilibrium in Single–Team Single–Adversary Team Games

Nicola Basilico^a, Andrea Celli^b, Giuseppe De Nittis^b,* and Nicola Gatti^b

^a Computer Science Department, University of Milan, via Comelico 39/41, 20135 Milan, Italy. E-mail: [email protected]
^b Department of Electronics, Information and Bioengineering, Politecnico di Milano, piazza Leonardo da Vinci 32, 20133 Milan, Italy. E-mail: {andrea.celli,giuseppe.denittis,nicola.gatti}@polimi.it

Abstract. A team game is a non–cooperative normal–form game in which some teams of players play against others. Team members share a common goal but, due to some constraints, they cannot act jointly. A real–world example is the protection of environments or complex infrastructures by different security agencies: they all protect the area with valuable targets, but they have to act individually since they cannot share their defending strategies (of course, they are aware of the presence of the other agents). Here, we focus on zero–sum team games with n players, where a team of n − 1 players plays against one single adversary. In these games, the most appropriate solution concept is the Team–maxmin equilibrium, i.e., the Nash equilibrium that ensures the team the highest payoff. We investigate the Team–maxmin equilibrium, characterizing the utility of the team and showing that it can be irrational. The problem of computing such an equilibrium is NP–hard and cannot be approximated within a factor of 1/n. The exact solution can only be found by global optimization. We propose two approximation algorithms: the former is a modified version of an already existing algorithm, the latter is a novel anytime algorithm. We computationally investigate such algorithms, providing bounds on the utility for the team. We experimentally evaluate the algorithms, analyzing their performance w.r.t. a global optimization approach, and evaluate the loss due to the impossibility of correlating.
Keywords: Team–maxmin equilibrium, Algorithmic game theory, Equilibrium computation

* Corresponding author. E-mail: [email protected]

© 2016 – IOS Press and the authors. All rights reserved

1. Introduction

Due to the recent spread of game theory techniques to solve real–world problems in strategic settings, e.g., economics or decision making, the computation of Nash equilibria and Maxmin strategies has become a very important topic in such fields [2]. Specifically, the study of how to compute the Nash equilibrium has been considered one of the most challenging problems of the last decade in Computer Science [3].

In non–cooperative constant–sum 2–player games, Maxmin strategies are among the most adopted solution concepts [12]. The rationale behind this solution concept is the minimization of the loss in the worst–case scenario. According to this interpretation, the players are usually referred to as the maximizer and the minimizer, the former aiming at maximizing her own utility, the latter acting to minimize the utility of the first player.

In real–world scenarios, several interactions counterpose groups of agents instead of single individuals. The agents in a group share the same goal but, even though they are aware of the presence of the other players, they may be forced to act on their own. A practical example is the protection of environments or infrastructures by different security agencies (customarily called Defenders in the related literature; see [7] for an example): they all aim at defending the area to be secured against malicious attackers, but they cannot communicate during the mission to coordinate their actions on the field. In these cases, a non–cooperative game among n players can be set up: some players constitute the guards team, the others form the attackers team.
The team players receive the same utility, say U_T, defined by taking into account the utilities of the different members. In these scenarios, the most suitable solution concept is the Team–maxmin equilibrium, which is actually the best Nash equilibrium of the game for the team, providing the team with the highest utility [13].

In the literature, there are just a few works that explore solution concepts in games with more than two players from an algorithmic point of view. In particular, we recall that, for 3–player games with binary utilities and n actions per player, it is NP–hard to approximate the minmax value and, by duality, the Maxmin value of each player within 3/n² [2].

Original contributions. Starting from a broader definition of team games, we restrict our attention to zero–sum single–team single–adversary team games (STSA–TG) with n players, where a team of n − 1 players plays against one single adversary, as defined in [13]. For this specific case, which is very common in practice, we provide properties that characterize the Team–maxmin equilibrium. We propose two algorithms to deal with its computation: the former is a modified version of the quasi–polynomial time algorithm¹ presented in [6], the latter is a novel anytime approximation algorithm. More precisely, the rest of the paper is organized as follows:

¹ An algorithm is called quasi–polynomial if it requires more than polynomial time but not yet exponential time. The worst–case running time of a quasi–polynomial time algorithm is 2^(O((log n)^k)), for some constant k.

– Section 2 introduces the preliminary notions of game theory that will be useful throughout the paper and the concept of Team–maxmin equilibrium;
– in Section 3 we investigate the Team–maxmin equilibrium, showing some properties of its utility and providing two algorithms to compute it, along with their computational analysis;
– in Section 4 we compare the quality of the solutions of our approximation algorithms w.r.t.
global optimization, which is able to return the optimal solution if no timeout is set. Furthermore, we evaluate the loss of the team, in terms of utility, due to the impossibility of correlating;
– Section 5 summarizes the main results of this work and suggests some problems and issues that should be faced and solved in the future.

2. Preliminaries

In this section, we introduce some game–theoretical preliminaries necessary for our study. More precisely, we initially introduce the definitions of game and team game and subsequently provide the definitions of Maxmin and Team–maxmin strategy.

2.1. Team games

In game theory, a normal–form game is a tuple (N, A, U) where:

– N = {1, 2, . . . , n} is the set of players;
– A = ×_{i∈N} A_i, where A_i = {a_1, a_2, . . . , a_{m_i}} is the set of player i's actions;
– U = {U_1, U_2, . . . , U_n}, where U_i : A → ℝ is the utility function of player i.

In this paper, we study only games in which m_i is finite for every player i. A strategy profile is defined as s = (s_1, s_2, . . . , s_n), where s_i : A_i → [0, 1] is player i's mixed strategy, subject to the constraint ∑_{a_i∈A_i} s_i(a_i) = 1. As customary in game–theoretical notation, we will use the subscript −i to denote the set containing all the players except player i.

A team T ⊆ N is a subset of players sharing the same utility function, which is maximal under inclusion. Formally, for any i, j ∈ N we say that i, j ∈ T if and only if U_i = U_j. We denote with U_T the utility of each player in team T. A team game is a normal–form game where at least one team is present. Notice that two degenerate cases of team games are possible: in the former, each team is composed of a single player, and therefore the number of teams equals the number of players; in the latter, there is only one team grouping together all the n players (such a game is customarily called a coordination game²).

² In coordination games the payoff of any outcome is the same for each player.

Team games capture situations in which players pursue equal objectives. In principle, the coordination of teammates can be of two forms: correlated, in which a correlating device decides a joint action (an action profile specifying one action per teammate) and then communicates to each teammate her action, and non–correlated, in which each player plays independently from the others. In the first case, the strategy of team T is said to be correlated. Given the set of team action profiles defined as A_T = ×_{i∈T} A_i, a correlated team strategy is defined as s_T : A_T → [0, 1] subject to the constraint that ∑_{a_T∈A_T} s_T(a_T) = 1. In other words, teammates can decide jointly their strategy and can synchronize its execution. In the second case, players are subject to the inability of correlating their actions and their strategies s_i are mixed, as defined above for a generic normal–form game. In other words, teammates can decide jointly their strategies, but they cannot synchronize their execution, and therefore each player independently draws an action from her strategy.

Dealing with teams in which teammates can correlate is easy from a computational point of view. Indeed, such a team is equivalent to a single player whose actions are the joint actions of the teammates. On the other hand, dealing with teams in which teammates cannot correlate is a challenging problem that is still open. In this work, we explore for the first time algorithmic methods for such a problem.

A particularly significant use case for team games can be found in the field of security games [1], a class of games that has received some interest in the last few years due to its usefulness in strategic security resource allocation problems. We report the following example, which can be called a team security game, modeling an adversarial scenario where multiple ranger teams from different countries deal with the task of protecting an environment from a gang of criminals (for a specific example of a security game of this type see, for example, [7]).

Example 1 (Team (security) game) A gang Γ of criminals is logging some ancient trees in a forest, which extends over two countries. In order to get rid of it, the countries have created a group of rangers, divided into two teams τ_1, τ_2, one for each country. For security reasons, the teams can share with the others the positions they can occupy to protect the trees but then, in order to keep radio silence, they cannot communicate when the action is undertaken, and so each team executes its route autonomously. The forest contains four trees, whose values are π(t_1) = 0.9, π(t_2) = 0.7, π(t_3) = 0.5, π(t_4) = 0.3. Γ can attack any tree and its attack is successful if it saws a tree t and fades away without being detected. Let us denote with p_ij the j–th position occupied by the i–th team. We have the following actions:

– p_11 allows τ_1 to protect t_1;
– p_12 allows τ_1 to protect t_2;
– p_21 allows τ_2 to protect t_3;
– p_22 allows τ_2 to protect t_4.

The payoffs are given as follows:

U_Γ = π(t_i) if the attack on tree t_i is successful, 0 otherwise;
U_{τ_1} = U_{τ_2} = −U_Γ / 2.

The game can now be formulated as follows (being the game zero–sum, we report only the values for the team members):

  τ_1, τ_2       Γ: t_1    t_2      t_3      t_4
  p_11, p_21     0        −0.35    0        −0.15
  p_11, p_22     0        −0.35    −0.25    0
  p_12, p_21     −0.45    0        0        −0.15
  p_12, p_22     −0.45    0        −0.25    0

2.2. Team–maxmin strategies

Let us start by recalling some basic game–theoretical notions that will come in handy. One interesting solution concept in constant–sum games is given by Maxmin strategies. Given an n–player constant–sum game, the Maxmin strategy for player i is defined as s_i = argmax_{s_i} min_{s_{−i}} U_i(s_i, s_{−i}).
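In the 2–player zero–sum case, this maxminimization can be solved exactly as a linear program. The following sketch (our own illustration, not the authors' implementation) uses SciPy's `linprog`; the 2×2 payoff matrix is the one of the example that follows:

```python
import numpy as np
from scipy.optimize import linprog

def maxmin_value(U):
    """Maxmin strategy and value of the row player in a 2-player zero-sum game.

    LP: max v  s.t.  v <= sum_a s(a) * U[a, b] for every column b,
        sum_a s(a) = 1, s >= 0.
    """
    n_rows, n_cols = U.shape
    # Variables are [s_1, ..., s_n, v]; linprog minimizes, so the objective is -v.
    c = np.zeros(n_rows + 1)
    c[-1] = -1.0
    # One inequality per column b:  v - s^T U[:, b] <= 0.
    A_ub = np.hstack([-U.T, np.ones((n_cols, 1))])
    b_ub = np.zeros(n_cols)
    # Probabilities sum to one; v itself is unconstrained.
    A_eq = np.hstack([np.ones((1, n_rows)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * n_rows + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:-1], res.x[-1]

s, v = maxmin_value(np.array([[0.0, 3.0], [2.0, 0.0]]))
# Row player mixes 0.4 / 0.6 and guarantees an expected utility of 1.2.
```

The same LP view underlies the polynomial-time computability of Maxmin strategies in 2–player zero–sum games discussed next.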
This strategy reflects player i's conservative behavior of aiming at the best worst case under the assumption that the other players will seek the lowest payoff for i. Let v be the value returned by the previous maxminimization; then the Maxmin strategy guarantees i at least a payoff of v, that is, U_i(s_i, s_{−i}) ≥ v for any s_{−i}. Maxmin strategies become particularly interesting in 2–player zero–sum games, due to their relation with another popular solution concept: the Nash equilibrium. A Nash equilibrium is a strategy profile s = (s_1, s_2, . . . , s_n) from which no player can gain more by unilaterally changing her strategy. In 2–player zero–sum games, where U_1 = −U_2, Maxmin strategy profiles (where both players adopt a Maxmin strategy) and Nash equilibria coincide and yield to player 1 the same game value v. Besides this, Maxmin strategies (and, consequently, Nash equilibria) are interchangeable, can be computed in polynomial time via linear programming, and always yield a rational game value (that is, a value that can be exactly represented in a computer) if the utility functions return rational values. In the following example, we show how the Maxmin strategy space can be represented depending on the strategies played by the two agents.

Example 2 (Maxmin strategy in a 2–player zero–sum game) Consider the following 2–player zero–sum strategic game (we report only U_1):

         a_3   a_4
  a_1    0     3
  a_2    2     0

Below, on the left, we report the strategy simplex of player 1 partitioned into subareas with the corresponding action player 2 would play. On the right, we also show how the expected utility of player 1 varies over the simplex given the strategy of player 2.

[Figure: player 2 best–responds with a_4 for s_1(a_1) < 0.4 and with a_3 for s_1(a_1) > 0.4; player 1's expected utility is maximized at s_1(a_1) = 0.4, where E[U_1] = 1.2.]

Maxmin strategies can be easily generalized to team games. To ease notation, let us assume that T = {1, . . . , t} and that s_T = (s_1, s_2, . . . , s_t). The Maxmin strategy for team T when teammates are not correlated is, analogously to the previous case, defined as

s_T = argmax_{s_1,...,s_t} min_{s_{−T}} U_T(s_1, . . . , s_t, s_{−T}),

where −T denotes the set of all the opponents of the team (possibly partitioned into teams as well). Team–maxmin strategies still represent reasonable ways to play a game but, as happens with 2–player general–sum games, they do not provide equilibrium states in the general–sum case. However, in their seminal work, von Stengel and Koller show the validity of useful theoretical properties in a particular case of team games: zero–sum, single–team, single–adversary team games (STSA–TG). In these games we have that N = T ∪ {n}, where the players in team T share the same utility function U_T, while the adversary n has utility function U_n = −(n − 1)U_T, since |T| = n − 1. A possible interpretation is that, if the team as a whole gets a payoff of p, then the adversary pays p/(n − 1) to each of the n − 1 members. The results provided by von Stengel and Koller can be summarized in the following two properties (we refer the reader to [13] for the formal statements and proofs):

Property 1 In any STSA–TG the Team–maxmin strategy s_T is always part of a Nash equilibrium, and this equilibrium is called Team–maxmin equilibrium.

Property 2 No Nash equilibrium different from the Team–maxmin equilibrium can make the team better off.

Since a Team–maxmin strategy always exists, from Properties 1 and 2 we obtain that any STSA–TG has a Team–maxmin equilibrium, which is also the best Nash equilibrium for the team. (Notice that, in case multiple Team–maxmin equilibria are present, they cannot be arbitrarily interchanged like Maxmin strategies in 2–player zero–sum games.)
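To make the uncorrelated definition concrete, the Team–maxmin value of a tiny game can be approximated by brute force over the product of the teammates' strategy simplices. The following sketch (our own illustration on a hypothetical 2–teammate game, not code from the paper) searches a grid over the two simplices and lets the adversary best-respond:

```python
import numpy as np

# Hypothetical STSA-TG: U[i, j, k] is the team payoff when teammate 1 plays i,
# teammate 2 plays j and the adversary plays k.
U = np.zeros((2, 2, 2))
U[0, 0, 0] = 1.0    # adversary action 0 is countered only by the profile (0, 0)
U[1, 1, 1] = 2.0    # adversary action 1 is countered only by the profile (1, 1)

def team_maxmin_grid(U, steps=1001):
    """Grid search over the product of the two teammates' strategy simplices,
    maximizing the team payoff against a best-responding adversary."""
    p = np.linspace(0.0, 1.0, steps)              # s_1(action 0)
    q = np.linspace(0.0, 1.0, steps)              # s_2(action 0)
    P, Q = np.meshgrid(p, q, indexing="ij")
    s1 = np.stack([P, 1.0 - P])                   # shape (2, steps, steps)
    s2 = np.stack([Q, 1.0 - Q])
    # Expected team payoff for every grid point and every adversary pure action.
    vals = np.einsum("ixy,jxy,ijk->kxy", s1, s2, U)
    worst = vals.min(axis=0)                      # adversary best response
    x, y = np.unravel_index(np.argmax(worst), worst.shape)
    return worst[x, y], p[x], q[y]

v, s1a, s2a = team_maxmin_grid(U)
```

On this instance the independent teammates can guarantee only about 0.343, while a correlated team randomizing over the two "winning" joint actions would guarantee 2/3, which illustrates the price of playing without correlation.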
Moreover, we cannot exploit a linear formulation to compute it and, instead, we need to rely on a non–linear, non–convex program in which we maximize the unconstrained variable v where, for every adversary's action a_n ∈ A_n, the following constraint must hold:

v − ∑_{a_T∈A_T} U_T(a_T, a_n) ∏_{i∈T} s_i(a_i) ≤ 0.   (1)

In the resulting non–linear program, the team players' strategies s_1, s_2, . . . , s_{n−1} and the lower bound v on the team's expected utility are the decision variables. In Constraints (1), a_T is a team action profile specifying an action for each team member, and a_i is the action played by team member i in action profile a_T. We call the optimal value of the above program the team game value and we denote it with v_T. Notice that, if we allow correlation between team members, the game becomes substantially equivalent to a 2–player zero–sum game. In such a case, the team plans and acts like a single player. Formulation–wise, this amounts to replacing the strategy vector s_1, s_2, . . . , s_{n−1} with a single strategy s′ over the actions in A_T. Then, we would replace in Constraints (1) the term ∏_{i∈T} s_i(a_i) with s′(a_T), obtaining a linear program yielding an optimal value greater than or equal to v_T (the ability of correlating their actions can only improve the team's revenue in the game). In the next section, we further develop the analysis of STSA–TGs by showing how other nice properties of their 2–player zero–sum counterparts cease to hold.

3. Problem Analysis

In this section we characterize the Team–maxmin equilibrium. More precisely, in Section 3.1, we analyze the rationality of the equilibrium value given a rational input and we computationally deal with the approximation of the equilibrium.
Due to this negative result, in Section 3.2 we provide two algorithms to deal with the problem of finding the Team–maxmin equilibrium: the former is an adaptation of the algorithm proposed in [6], the latter is a novel approximation algorithm.

3.1. Properties of the Team–maxmin equilibrium

We show that, even considering a team of only two players against one single adversary, the value of the game can be irrational and therefore cannot be exactly represented by a computer. Such a result and the corresponding proof are a consequence of [6].

Theorem 1 (Rational and irrational Maxmin value) The Team–maxmin value in an STSA–TG can be irrational.

Proof 1.1 We prove this by providing an example. Consider the following zero–sum STSA–TG where T = {1, 2} (we report the team's utilities; the adversary, player 3, selects the matrix):

  a_5:    a_3   a_4        a_6:    a_3   a_4
  a_1     1     0          a_1     0     0
  a_2     0     0          a_2     0     2

To compute the Team–maxmin strategy for T we solve the following problem:

max_{s_1(a_1), s_2(a_3)} v
s.t.  v ≤ s_1(a_1) s_2(a_3)
      v ≤ 2 (1 − s_1(a_1)) (1 − s_2(a_3))

We observe that the above problem is formulated considering only s_1(a_1) and s_2(a_3) as explicit variables. In fact, player 1 can play only two actions, namely a_1, a_2, and the probabilities with which such actions are played must sum to 1. Formally, s_1(a_1) + s_1(a_2) = 1, thus we can rewrite s_1(a_2) as s_1(a_2) = 1 − s_1(a_1). A similar observation can be made for player 2, which leads us to consider only s_2(a_3) as an explicit variable for player 2: s_2(a_4) = 1 − s_2(a_3). The maximum value is achieved when both inequalities hold as equalities. Thus, we can write:

s_1(a_1) s_2(a_3) = 2 (1 − s_1(a_1)) (1 − s_2(a_3)),

from which it follows:

s_1(a_1) = (2 s_2(a_3) − 2) / (s_2(a_3) − 2).

Now we write:

v = s_2(a_3) (2 s_2(a_3) − 2) / (s_2(a_3) − 2).

This expression is maximized for s_2(a_3) = 2 − √2 and the corresponding value is 6 − 4√2, which is irrational. □

Furthermore, we can show that even the problem of approximating the Team–maxmin value is hard.
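The closed-form maximum derived in the proof above can be sanity-checked numerically; a small verification script of ours (not part of the paper):

```python
import numpy as np

# Maximize f(s2) = s2 * (2*s2 - 2) / (s2 - 2) over s2 in [0, 1], i.e., the
# team's guaranteed payoff once s1(a1) is expressed in terms of s2(a3).
s2 = np.linspace(0.0, 1.0, 1_000_001)
f = s2 * (2.0 * s2 - 2.0) / (s2 - 2.0)
v_num = f.max()
s2_star = s2[f.argmax()]
# v_num ≈ 0.343146 = 6 - 4*sqrt(2), attained at s2_star ≈ 0.585786 = 2 - sqrt(2)
```

The numeric maximum matches the irrational value 6 − 4√2 up to grid resolution.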
Such a result comes as a direct consequence of the following theorem, taken from [6] (again, we refer the reader to that work for full technical details).

Theorem 2 (3–player Maxmin complexity) Approximating the Maxmin value (without correlation) in a 3–player game with m actions per player and binary payoffs within an additive error of 1/m^ε is NP–hard for any ε > 0.

We report two other results whose adaptations fit our problem. The following theorem gives a bound on the maximum number of actions the team will randomize over at the equilibrium.

Theorem 3 (Maxmin strategy support size) Given an STSA–TG where |A_n| = k, there is a Team–maxmin strategy (without correlation) for the team where each team member's support has a size that is not larger than k.

Proof sketch 3.1 Given a Maxmin strategy s of players N\{n} against player n such that at least one player, say player i, plays more than k actions with strictly positive probability, it is possible to show that there is a Maxmin strategy in which i plays no more than k actions. Given s, we keep the strategies of all the maximizers except player i fixed and find the Maxmin strategy of player i against player n, say s′_i. Since this is a 2–player game, the Maxmin strategy has a support with a size no larger than k. The strategy profile s′ obtained from s once the strategy of player i has been replaced with s′_i is a Maxmin strategy of players N\{n} against player n. Repeating the same procedure for all the maximizers, it is possible to find a Maxmin strategy in which the support of each player is upper bounded by k. □

Starting from these results, we now deal with the computation of the Team–maxmin equilibrium.

3.2. Computing the Team–maxmin equilibrium

We recall that computing the Team–maxmin equilibrium exactly cannot be done in polynomial time. Moreover, from Theorem 1 we know that the value to be computed may be irrational, while from Theorem 2 we know that it is hard to approximate the equilibrium within an additive error. In the following we describe three resolution methods for this problem: the first one is based on global optimization, which returns the optimal solution if no timeout is set, while the other two are based on ad hoc algorithms, which return an approximate solution.

3.2.1. Global optimization
The first available approach that can be exploited is to tackle the resolution of the Team–maxmin non–linear optimization problem with a global optimization solver. The problem is the one formulated in Section 2.2; we fully report it here for clarity:

max v
s.t.  v − ∑_{a_T∈A_T} U_T(a_T, a_n) ∏_{i∈T} s_i(a_i) ≤ 0   ∀a_n ∈ A_n
      ∑_{a_i∈A_i} s_i(a_i) = 1   ∀i ∈ T
      s_i(a_i) ≥ 0   ∀i ∈ T, a_i ∈ A_i

In our experiments we compute solutions to this problem with BARON [11], the best–performing global solver for continuous problems among all the existing optimization solvers [8].

3.2.2. A quasi–polynomial algorithm
In [6] a method for approximating minmax values is presented. Such a method can be naturally extended to the case we are considering: here, we present and discuss a variation of that algorithm adapted to solve our problem. Specifically, we propose a heuristic enumeration, which is a subset of the complete one proposed in [6]. The steps are reported in Algorithm 1 and work as follows.

Algorithm 1 HeuristicEnumeration
1: v* = −∞
2: for all i ∈ T do
3:   P_i = {V_i1, V_i2, . . . | V_i ⊆ A_i, |V_i| ≤ ⌈2 ln|A_i| / ε²⌉}
4: end for
5: C = ×_{i∈T} P_i
6: for all (V_1, V_2, . . . , V_{n−1}) ∈ C do
7:   for all i ∈ T do
8:     s_i(a_i) = 1/|V_i| if a_i ∈ V_i, 0 otherwise
9:   end for
10:  v* = max{v*, min_{s_n} U_T(s_1, s_2, . . . , s_{n−1})}
11: end for
12: return v*

For every team member i we compute P_i, defined as the collection of all the subsets of i's actions whose cardinality does not exceed ⌈2 ln|A_i| / ε²⌉ (Line 3), where ε ∈ (0, 1] is a parameter that defines the size of the enumeration. Next, we consider the Cartesian product C of all the P_i's (Line 5), where any of its elements, by definition, specifies one subset of actions V_i for each team member i. For each one of such elements (V_1, V_2, . . . , V_{n−1}) (Line 6) we consider a team strategy profile where each team member i uniformly randomizes over the actions specified by her associated V_i (Line 8). We compute the best–response value of the adversary (the minimizer) given such a team strategy profile, keeping track of the best value over C (Line 10), which will be returned as the final result of the algorithm.

Proposition 1 The solution returned by Algorithm 1 has an approximation bound that depends on the number of actions per player m.

Proof 3.2 First, notice that, since m is finite, the solution returned by Algorithm 1 cannot get arbitrarily close to the optimal solution. In particular, let us consider the following example of a bimatrix game, a degenerate case of an STSA–TG where the team is composed of only player 1:

         a_3   a_4
  a_1    1     0
  a_2    0     0.5

In this specific game, Algorithm 1 would consider the following supports: {a_1}, {a_2}, {a_1, a_2}, which, when player 1 randomizes uniformly over them, yield 0, 0, and 1/4, respectively. The Maxmin value of player 1 in this game is 1/3, meaning that the algorithm scores an actual additive error of 1/12. Clearly, having exhaustively considered all the supports in the game, a better additive error cannot be achieved. □

Moreover, the quasi–polynomial complexity makes the algorithm impractical even on relatively small instances. For example, an ε < 1/3 can only be adopted in team games with a number of actions greater than or equal to 10. However, HeuristicEnumeration would require a number of iterations of the order of 10^18 for an STSA–TG with a team of 2 members and 10 actions per player.

3.2.3. An anytime algorithm
Despite providing (bounded) approximation guarantees, the algorithm presented in the previous section is clearly not promising in terms of scalability. In Algorithm 2 we propose a method, which we call IteratedLP, for which we give an anytime implementation. It works by maintaining a current solution s^cur, which specifies a strategy for each team member and can be returned at any time. It is initialized (Line 1) with a starting solution ŝ which, in principle, can prescribe an arbitrary set of strategies for the team (e.g., uniform randomizations). Then, for each team member i (Line 3), we instantiate and solve the specified linear program (Line 4). The decision variables of this LP are v_i and, for each action a_i of player i, x_{a_i}. We maximize v_i subject to the upper bound given by the first constraint, which is a rewriting of Eq. (1) where we assume that the strategy of player i (relabeled with x to distinguish it) is a variable, while the strategies of the other team members are constants set to the values specified by the current solution. (Notice that, in the LP, a_j is the action that team member j plays in the team action profile a_T.)

Algorithm 2 IteratedLP
1: ∀i ∈ T, s_i^cur ← ŝ_i
2: repeat
3:   for all i ∈ T do
4:     maximize v_i
        s.t.  v_i ≤ ∑_{a_T∈A_T} x_{a_i} U_T(a_T, a_n) ∏_{j∈T\{i}} s_j^cur(a_j)   ∀a_n ∈ A_n
              ∑_{a_i∈A_i} x_{a_i} = 1
              x_{a_i} ≥ 0   ∀a_i ∈ A_i
5:   end for
6:   i′ = argmax_{i∈T} v_i*
7:   s_i^cur(a_i) = x*_{a_i} if i = i′, s_i^cur(a_i) otherwise
8: until convergence or timeout
9: return s^cur

The optimal solution of the LP is given by v_i* and x*, basically representing the Maxmin strategy of team member i once the strategies of her teammates have been fixed to the current solution. Once the LP has been solved for each i, the algorithm updates the current solution in the following way (Line 7): the strategy of the team member that obtained the best LP optimal value is replaced with the corresponding strategy from the LP. This process iterates until convergence or until some timeout is met.

Algorithm 2 is an iterated algorithm and, at each iteration, the value of the game increases (non–strictly) monotonically. We run it using multiple random restarts, that is, generating a set of different initial assignments ŝ (Line 1). Once convergence is achieved, we pass to the next random restart, generating a new strategy profile for the team. The resulting algorithm is polynomial per restart and probabilistically exact: it samples in the space of the strategy simplices and then iteratively solves the linear program several times. Being just probabilistically exact, we are pushed to study the approximation ratio of such an algorithm.

Theorem 4 Given ŝ such that, for each player i in the team, ŝ_i prescribes a uniform strategy, the worst–case approximation factor of Algorithm 2 cannot be better than 1/m, where m is the number of actions for each player.

Proof 4.1 To prove this, we proceed by constructing an instance where Algorithm 2 achieves a factor of exactly 1/m independently of the initial assignment ŝ. We provide detailed insights for the case of uniform ŝ and we describe the rationale behind its extension to the general case. Consider an STSA–TG where the team is composed of 2 members and each player has m actions. Moreover, consider the minimizer to have only one action available. The team utility is defined as U_T = I_m, where I_m is the m×m identity matrix. Let us consider, in Line 1 of Algorithm 2, a uniform ŝ prescribing each team member to play each action with probability 1/m. The resolution of the LP of Line 4, once the strategy of one team member is fixed to such a uniform distribution, would output the same uniform distribution for the other one. In other words, ŝ is a local maximum for our algorithm. Thus, the algorithm would return a strategy profile guaranteeing the team a payoff of m/m² = 1/m, whereas the Team–maxmin value amounts to 1 and can clearly be obtained by any team profile in which both team members play the same action in pure strategies. So the approximation factor achieved by the algorithm would be exactly 1/m. □

We conjecture that Theorem 4 holds in general. It should be possible to build, for any arbitrary initialization ŝ, a game in which Algorithm 2 has an approximation ratio of 1/m, but the problem is still open.

4. Experimental evaluation

In this section we provide an experimental comparison of the algorithms described in the previous section. In Section 4.1 we present our experimental setting, while in Section 4.2 we present and discuss the experimental results.

4.1. Experimental setting

Our experimental setting is based on instances of the RandomGames class generated by GAMUT [9] (the main testbed for game theory algorithms). Specifically, once a game instance is generated, we extract the utility function of player 1 and assign it to all the team members. Furthermore, in each generated game instance, the payoffs are between 0 and 1. We use game instances with a different number of players n and a different number of actions m for each player, with n ∈ {3, 4, 5}, as follows:

– m from 5 to 40 with step 5, and from 50 to 80 with step 10, for n = 3;
– m from 5 to 15 with step 5, for n = 4;
– m from 5 to 10 with step 5, for n = 5.

The algorithms are implemented in Python 2.7.6, adopting GUROBI 6.5.0 [5] for solving linear programs, and AMPL 20160310 [4] with BARON 14.4.0 [11,10] for solving global optimization programs. We set a timeout of 60 minutes for the resolution of each instance.
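The instance-generation scheme just described can be sketched as follows (a hypothetical NumPy re-implementation of ours; the experiments actually rely on GAMUT's RandomGames generator):

```python
import numpy as np

def random_stsa_instance(n_players, m_actions, seed=None):
    """Random STSA-TG in the spirit of the setup above: draw one payoff tensor
    with entries in [0, 1], share it among the n - 1 team members, and give
    the adversary the zero-sum complement U_n = -(n - 1) * U_T."""
    rng = np.random.default_rng(seed)
    # One entry per joint action profile of all n players.
    U_T = rng.uniform(0.0, 1.0, size=(m_actions,) * n_players)
    U_adv = -(n_players - 1) * U_T
    return U_T, U_adv

U_T, U_adv = random_stsa_instance(3, 5, seed=0)
```

Sharing player 1's utility among all teammates is exactly what turns a generic random game into a single-team instance.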
All the algorithms are executed on a computer with the following characteristics:
– CPU: Intel Xeon CPU E5–4610 v2 @ 2.30GHz;
– RAM: 128 GB;
– OS: Ubuntu 16.04.1 LTS.

4.2. Experimental results

4.2.1. Global optimization
The quality of the solutions returned by the global optimization solver BARON is reported in Figure 1. We report the ratio between the lower (a.k.a. primal) bound and the upper (a.k.a. dual) bound returned by BARON upon termination. When BARON finds the optimal solution (up to an accuracy of 10⁻⁹), the lower bound equals the upper bound, achieving a ratio of 1. This happens with 3 players up to 15 actions per player (except for some outliers), and with 4 and 5 players up to 5 actions. If the lower bound is strictly smaller than the upper bound, the solution returned by BARON may not be optimal, and the ratio is strictly smaller than 1. Interestingly, the ratio is rather high: it is always larger than 0.9 with 3 players (the boxplot shows a very low standard deviation of the results), while with 4 and 5 players the performance is worse, but the approximation ratio remains high, always above 0.7. Finally, let us notice that, in principle, the lower bound returned by BARON may be undefined; this happens whenever BARON terminates without finding any feasible solution. However, this never happens with our problem, since a feasible solution can be found easily, any strategy profile being feasible.

[Fig. 1. Analysis of the approximation ratio (BARON lower bound / BARON upper bound) obtained with BARON: (a) mean of the approximation ratio of the solutions returned by BARON; (b) boxplot of the approximation ratio of the solutions returned by BARON.]

4.2.2. IteratedLP
The mean of the approximation ratios between the value of the solutions obtained with IteratedLP and the lower bound returned by BARON is reported in Figure 2, as the number of players and the number of random restarts vary. In the plots, we report data only when the algorithm terminated within 60 minutes. For instance, with 5 players, IteratedLP terminates within 60 minutes only with 1 random restart, while with 4 players it terminates within 60 minutes only with 30 random restarts or fewer. The mean of the approximation ratios is always smaller than 1, showing that IteratedLP always performs worse than global optimization. This happens even with 3 players and 80 actions, where the trend suggests that, as the number of actions increases, the ratio increases, but it remains below 1. Surprisingly, the mean of the approximation ratio obtained with a single restart is rather high and increases as the number of actions increases (on the other hand, we know that in the worst case the solution quality may decrease in the number of actions). We report in Figure 3 the statistical significance of our results: the standard deviation is rather low, especially as the number of actions and the number of random restarts increase. The compute time of a single restart may be extremely long, preventing the termination of the algorithm with more restarts beyond 70 actions with 3 players. This is due to the long time needed to solve each single LP in IteratedLP, while the number of iterations is very low, as shown in Figure 4.
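The inner LP at the heart of IteratedLP can be sketched as follows. This is our reading of the algorithm, not the paper's implementation: the helper names are ours, and SciPy's `linprog` stands in for GUROBI. Fixing the teammate's strategy induces a two-player zero-sum game between the remaining team member and the adversary, whose maxmin strategy is a standard LP. On the coordination game used in the proof above, starting from the uniform teammate strategy reproduces the 1/m value of the local maximum.

```python
import numpy as np
from scipy.optimize import linprog

def maxmin_lp(M):
    """Maxmin strategy of the row player in the zero-sum matrix game M
    (m rows x k adversary columns): maximize v subject to
    x^T M[:, j] >= v for every column j, with x a probability vector.
    linprog minimizes, so the variables are [x_1, ..., x_m, v] and the
    objective is -v."""
    m, k = M.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0
    A_ub = np.hstack([-M.T, np.ones((k, 1))])    # v - x^T M[:, j] <= 0
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0                            # x sums to one
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(k), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)], method="highs")
    return res.x[:m], res.x[-1]

def iterated_lp_step(U, teammate):
    """One inner step of IteratedLP, in our reading: with the teammate's
    mixed strategy fixed, the remaining team member faces an induced
    two-player zero-sum game against the adversary and plays maxmin."""
    induced = np.einsum("ikj,k->ij", U, teammate)  # U indexed [me, teammate, adv]
    return maxmin_lp(induced)

# Coordination game from the proof above: the team earns 1 iff its two
# members pick the same action; the adversary's action is irrelevant here.
m = 3
U = np.zeros((m, m, m))
for a in range(m):
    U[a, a, :] = 1.0

strategy, value = iterated_lp_step(U, np.full(m, 1.0 / m))
```

With the uniform teammate, every induced payoff equals 1/m, so the LP value is exactly 1/m, matching the approximation factor established in the theorem.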
4.2.3. HeuristicEnumeration
The mean of the approximation ratio (defined as the ratio between the value of the solutions obtained with HeuristicEnumeration and the lower bound returned by BARON) is reported in Figure 2, as the number of players and ε vary. In the plots, we report data only when the algorithm terminated within the 60–minute deadline. We recall that HeuristicEnumeration is the only algorithm guaranteeing a theoretical bound on the optimality gap. The main result is that HeuristicEnumeration does not terminate within 60 minutes with ε < 1 from 25 actions onwards, even with 3 players. This shows that the algorithm can be practically applied only when ε = 1, but this does not provide any theoretical bound (indeed, since all the payoffs are between 0 and 1, any strategy profile has an additive gap no larger than 1). The approximation ratios of the solutions returned by HeuristicEnumeration are extremely low, being smaller than 0.25 except for the cases with very few actions. Finally, we report in Figure 3 the statistical significance. Also for HeuristicEnumeration, as with IteratedLP, the standard deviation is very low, showing that the performance of the algorithm is rather stable, independently of the specific game instance.

4.2.4. Algorithms comparison
We provide here a comparison of the performance of IteratedLP and HeuristicEnumeration, each with its best parametrization: for IteratedLP we set the number of random restarts to the largest number such that the algorithm terminates with 40 actions (i.e., 60 random restarts with 3 players, 30 with 4 players, and 1 with 5 players); for HeuristicEnumeration we set ε to the smallest value such that the algorithm terminates with 40 actions (i.e., ε = 1.0). The results, reported in Figure 5, show that IteratedLP outperforms HeuristicEnumeration on average in all the cases. It can be observed that the boxplots have no overlaps, showing that for no game instance, among those we generated, does HeuristicEnumeration outperform IteratedLP.
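The support-enumeration idea behind HeuristicEnumeration can be sketched as follows for a two-member team. This is our simplified reading; the paper's exact variant, and how ε maps to the support size k, may differ. Each team member is restricted to strategies uniform over a multiset of k actions, all such pairs are enumerated, and the pair with the best worst-case value against the adversary's pure responses is kept. Smaller additive gaps require larger k, which is why the enumeration blows up for ε < 1.

```python
import itertools

def k_uniform_strategies(m, k):
    """All strategies uniform over a multiset of k actions ("k-uniform")."""
    for multiset in itertools.combinations_with_replacement(range(m), k):
        s = [0.0] * m
        for a in multiset:
            s[a] += 1.0 / k
        yield tuple(s)

def heuristic_enumeration(U, m, k):
    """Enumerate k-uniform strategy pairs for a two-member team and keep
    the pair with the best worst-case value over the adversary's pure
    responses. U is indexed [member 1, member 2, adversary]."""
    best_value, best_pair = -1.0, None
    for s1 in k_uniform_strategies(m, k):
        for s2 in k_uniform_strategies(m, k):
            worst = min(
                sum(s1[i] * s2[j] * U[i][j][a]
                    for i in range(m) for j in range(m))
                for a in range(m)
            )
            if worst > best_value:
                best_value, best_pair = worst, (s1, s2)
    return best_value, best_pair

# Coordination game: the team earns 1 iff its members match.
m = 3
U = [[[1.0 if i == j else 0.0 for _a in range(m)]
      for j in range(m)] for i in range(m)]
value, _pair = heuristic_enumeration(U, m, k=1)  # k = 1: pure strategies only
```

Even with k = 1 the enumeration finds the coordinated pure profile with value 1 here, while the running time grows as the number of k-multisets squared, explaining the timeouts observed above.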
[Fig. 2. Mean performance of IteratedLP and HeuristicEnumeration: (a) 3–players IteratedLP; (b) 3–players HeuristicEnumeration; (c) 4–players IteratedLP; (d) 4–players HeuristicEnumeration; (e) 5–players IteratedLP; (f) 5–players HeuristicEnumeration.]

4.2.5. Price of non–correlation
Finally, we evaluate the price, in terms of utility, of a team acting in a non–correlated fashion. We do so by comparing the Maxmin value obtained when all the members of the team can play correlated strategies with the Team–maxmin value obtained when the members of the team play mixed strategies. The latter value is obtained by global optimization (more precisely, we use the lower bound returned by BARON). Figure 6 shows the mean values of the ratio (the value obtained with correlated strategies divided by the value obtained with mixed ones) and the corresponding boxplots for games with 3, 4 and 5 players. Focusing on Figure 6(a), we note that all the curves are increasing and, for 3–player games, the trend is asymptotic towards about 1.15.
This means that correlation always allows the team to achieve better results, both w.r.t. the number of players and the number of actions, but in random games the loss due to non–correlation is low.

[Fig. 3. Boxplots of the performance of IteratedLP and HeuristicEnumeration: (a) 3–players IteratedLP; (b) 3–players HeuristicEnumeration; (c) 4–players IteratedLP; (d) 4–players HeuristicEnumeration; (e) 5–players IteratedLP; (f) 5–players HeuristicEnumeration.]

5. Conclusions and future research

In the present work, for the first time, we analyzed the Team–maxmin equilibrium from a computational perspective. We introduced a definition that involves an arbitrary number of teams and is thus broader than the one presented in the literature. We focused on games with n players, where a team composed of n − 1 players plays against one single adversary. The Team–maxmin equilibrium being the best solution concept for these games, we reported its mathematical formulation and proposed two approximation algorithms to deal with its computation, HeuristicEnumeration and IteratedLP. HeuristicEnumeration is a variation of a quasi–polynomial algorithm already presented in the literature, while IteratedLP is a novel heuristic algorithm: an iterated approximation anytime algorithm, which improves its performance when given more computational time. We also provided bounds on the utility returned by IteratedLP. We evaluated these two approximation algorithms, comparing them with the value computed by the global optimization approach, which outperforms them on medium–sized instances, while IteratedLP is the best–performing algorithm on large instances. Finally, we compared the results of the global optimization solution with the correlated equilibrium, finding that correlation really gives the team the possibility to always achieve better results, independently of the number of players and of the actions played by each player.

In the future, we want to further analyze the IteratedLP algorithm, providing an accurate computational analysis of its running time. Moreover, we want to investigate the price of non–correlation, i.e., the utility loss due to the impossibility of communication between the team players, to obtain theoretical bounds. We will also study how to extend the results presented in this work to team games with two teams of more than one player each.
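The price of non–correlation discussed above can be reproduced in miniature. This is a sketch under our own construction: the game below and the helper names are hypothetical, and SciPy's LP solver stands in for the paper's toolchain. When the team can correlate, its joint randomization over action pairs turns it into a single "meta" player, so the correlated maxmin value is a standard two-player zero-sum LP, and it can strictly exceed what a fixed pair of independent mixed strategies guarantees.

```python
import numpy as np
from scipy.optimize import linprog

def correlated_maxmin(U):
    """Maxmin value when the two team members can correlate: a joint
    distribution y over action pairs reduces the team to one player,
    giving the LP max v s.t. y^T M[:, a] >= v for every adversary a."""
    m = U.shape[0]
    M = U.reshape(m * m, m)            # rows: pairs (i, j); columns: adversary
    rows, cols = M.shape
    c = np.zeros(rows + 1)
    c[-1] = -1.0                       # maximize v by minimizing -v
    A_ub = np.hstack([-M.T, np.ones((cols, 1))])   # v - y^T M[:, a] <= 0
    A_eq = np.zeros((1, rows + 1))
    A_eq[0, :rows] = 1.0               # y sums to one
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(cols), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * rows + [(None, None)], method="highs")
    return res.x[-1]

# Hypothetical game: the team scores iff its members match on an action
# the adversary did not guess.
m = 3
U = np.zeros((m, m, m))
for i in range(m):
    for a in range(m):
        if a != i:
            U[i, i, a] = 1.0

v_corr = correlated_maxmin(U)
# Worst-case value of the independent uniform profile, for comparison.
s = np.full(m, 1.0 / m)
v_indep = float(np.einsum("i,j,ija->a", s, s, U).min())
```

Here the correlated team can randomize uniformly over the diagonal pairs (i, i), guaranteeing (m − 1)/m, while the independent uniform profile only guarantees (m − 1)/m², illustrating the gap that Figure 6 measures on random instances.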
Finally, the application of such theoretical results to a real–world problem, e.g., security games, would be very important to show how these algorithms can be adapted to the high level of complexity of such scenarios.

[Fig. 4. Number of iterations done by IteratedLP per restart: (a) games with 3 players; (b) games with 4 players; (c) games with 5 players.]

References

[1] Nicola Basilico, Nicola Gatti, and Francesco Amigoni. Patrolling security games: Definition and algorithms for solving large instances with single patroller and single intruder. Artificial Intelligence, 184:78–123, 2012.
[2] C. Borgs, J. T. Chayes, N. Immorlica, A. T. Kalai, V. S. Mirrokni, and C. H. Papadimitriou. The myth of the folk theorem. Games and Economic Behavior, 70(1):34–43, 2010.
[3] Xiaotie Deng, Christos Papadimitriou, and Shmuel Safra. On the complexity of equilibria. In Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, pages 67–71. ACM, 2002.
[4] R. Fourer, D. M. Gay, and B. Kernighan. AMPL: A mathematical programming language. In Algorithms and Model Formulations in Mathematical Programming, pages 150–151. Springer-Verlag, New York, NY, USA, 1989.
[5] Gurobi Optimization, Inc. Gurobi optimizer reference manual, 2015.
[6] K. A. Hansen, T. D. Hansen, P. B. Miltersen, and T. B. Sørensen. Approximability and parameterized complexity of minmax values. In WINE, pages 684–695, 2008.
[7] Albert Xin Jiang, Ariel D. Procaccia, Yundi Qian, Nisarg Shah, and Milind Tambe. Defender (mis)coordination in security games. In AAAI, 2013.
[8] Arnold Neumaier, Oleg Shcherbina, Waltraud Huyer, and Tamás Vinkó. A comparison of complete global optimization solvers. Mathematical Programming, 103(2):335–356, 2005.
[9] Eugene Nudelman, Jennifer Wortman, Yoav Shoham, and Kevin Leyton-Brown. Run the GAMUT: A comprehensive approach to evaluating game-theoretic algorithms. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, Volume 2, pages 880–887. IEEE Computer Society, 2004.
[10] N. V. Sahinidis. BARON 14.4.0: Global Optimization of Mixed-Integer Nonlinear Programs, User's Manual, 2014.
[11] M. Tawarmalani and N. V. Sahinidis. A polyhedral branch-and-cut approach to global optimization. Mathematical Programming, 103:225–249, 2005.
[12] John von Neumann and Oskar Morgenstern. Theory of Games and Economic Behavior. Princeton University Press, 2007.
[13] Bernhard von Stengel and Daphne Koller. Team-maxmin equilibria. Games and Economic Behavior, 21(1):309–321, 1997.

[Fig. 5. Average ratios and boxplots of the comparison between IteratedLP and HeuristicEnumeration: (a) 3–players ratio comparison; (b) 3–players comparison boxplot; (c) 4–players ratio comparison; (d) 4–players comparison boxplot; (e) 5–players ratio comparison; (f) 5–players comparison boxplot.]

[Fig. 6. Inefficiency due to the non–correlation in team games: (a) non–correlation loss mean; (b) 3–players non–correlation loss boxplot; (c) 4–players non–correlation loss boxplot; (d) 5–players non–correlation loss boxplot.]