Circumventing the Price of Anarchy: Leading Dynamics to Good Behavior

Maria-Florina Balcan¹   Avrim Blum²   Yishay Mansour³
¹ School of Computer Science, College of Computing, Georgia Institute of Technology
² Department of Computer Science, Carnegie Mellon University
³ Blavatnik School of Computer Science, Tel Aviv University
[email protected]   [email protected]   [email protected]

Abstract: Many natural games can have a dramatic difference between the quality of their best and worst Nash equilibria, even in pure strategies. Yet, nearly all work to date on dynamics shows only convergence to some equilibrium, especially within a polynomial number of steps. In this work we study how agents with some knowledge of the game might be able to quickly (within a polynomial number of steps) find their way to states of quality close to the best equilibrium. We consider two natural learning models in which players choose between greedy behavior and following a proposed good but untrusted strategy, and analyze two important classes of games in this context, fair cost-sharing and consensus games. Both games have extremely high Price of Anarchy and yet we show that behavior in these models can efficiently reach low-cost states.

Keywords: Dynamics in Games, Price of Anarchy, Price of Stability, Cost-sharing games, Consensus games, Learning from untrusted experts

1 Introduction

There has been substantial work in the machine learning, game theory, and (more recently) algorithmic game theory communities on understanding the overall behavior of multi-agent systems in which agents follow natural learning dynamics such as (randomized) best/better response and no-regret learning. For example, it is well known that in potential games, best-response dynamics, in which players take turns each making a best-response move to the current state of all the others, is guaranteed to converge to a pure-strategy Nash equilibrium [24, 27]. Significant effort has been spent recently on analyzing various properties of these dynamics and their variations, in particular on their convergence time [1, 26]. No-regret dynamics have also been long studied. For example, a well known general result that applies to any finite game is that if players each follow a "no-internal-regret" strategy, then the empirical distribution of play is guaranteed to approach the set of correlated equilibria of the game [17–19].

There has also been a lot of attention recently on fast convergence of both best response and regret minimization dynamics to states with cost comparable to the Price of Anarchy of the game [4, 15, 26, 28], whether or not that state is an equilibrium. This line of work is justified by the realization that in many cases the equilibrium nature of the system itself is less important, and what we care more about is having the dynamics reach a low-cost state; this is a position we adopt as well.

However, while the above results are quite general, the behavior or equilibrium reached by the given dynamics could be as bad as the worst equilibrium in the game (the worst pure-strategy Nash equilibrium in the case of best-response dynamics in potential games, and the worst correlated equilibrium for no-regret algorithms in general games). Even for specific efficient algorithms and even for natural potential games, in general no bounds better than the pure-strategy price of anarchy are known. On the other hand, many important potential games, including fair cost-sharing and consensus games, can have very high-cost Nash equilibria even though they always have low-cost equilibria as well; that is, their Price of Anarchy is extremely high and yet their Price of Stability is low (we discuss these games more in Section 1.1). Thus, a guarantee comparable with the cost of the worst Nash equilibrium can be quite unsatisfactory.

Unfortunately, in general there have been very few results showing natural dynamics that lead to low cost equilibria or behavior in games of this type, especially within a polynomial number of steps. For potential games, it is known that noisy best-response, in which players act in a "simulated-annealing" style manner (also known as log-linear learning), does have the property that the states of highest probability in the limit are those minimizing the global potential [8, 9, 23]. In fact, as temperature approaches 0, these are the only stochastically-stable states. However, as we show in Appendix A.2, reaching such states with non-negligible probability mass may easily require exponential time. For polynomial-time processes, Charikar et al. [11] show natural dynamics rapidly reaching low-cost behavior in a special case of undirected fair cost-sharing games; however, these results require specialized initial conditions and also do not apply to directed graphs.

In this paper we initiate a study of how to aid dynamics, beginning from arbitrary initial conditions, to reach states of cost close to the best Nash equilibrium. We propose a novel angle on this problem by considering whether providing more information to simple learning algorithms about the game being played can allow natural dynamics to reach such high quality states.
At a high level, there are two barriers to simple dynamics performing well. One is computational: for directed cost-sharing games, for instance, we do not even know of efficient centralized procedures for finding low-cost states in general. As an optimization problem this is the Directed Steiner Forest problem, and the best approximation factor known is min(n^{1/2+ε}, N^{4/5+ε}, m^{2/3} N^{ε}), where n is the number of players, N is the number of vertices, and m is the number of edges in the graph [12, 16]. The other barrier is incentive-based: even if a low-cost solution were known, there would still be the issue of whether players would individually want to play it, especially without knowing what other players will choose to do. For example, low-cost solutions might be known because people analyzing the specific instance being played might discover and publish low cost global behaviors. In this case, individual players might then occasionally test out their parts of these behaviors, using them as extra inputs to their own learning algorithm or adaptive dynamics to see if they do in fact provide benefit to themselves. The question then is: can this process allow low cost states to be reached, and in what kinds of games?

Motivated by this question, in this paper we develop techniques for understanding and influencing the behavior of natural dynamics in games with multiple equilibria, some of which may be of much higher social quality than others. In particular we consider a model in which a low cost global behavior is proposed, but individual players do not necessarily trust it (it may not be an equilibrium, and even if it is they do not know if others will follow it). Instead, players use some form of experts learning algorithm where one expert says to play the best response to the current state and another says to follow the proposed strategy. Our model imposes only very mild conditions on the learning algorithm each player uses: different players may use completely different algorithms for deciding among or updating probabilities on their two high-level choices. Assume that the players move in a random order. Will this produce a low-cost solution (even if it doesn't converge to anything)? We consider two variations of this model: a learn-then-decide model where players initially follow an "exploration" phase where they put roughly equal probability on each expert, followed by a "commitment" phase where based on their experience they choose one to use from then on, and a smoothly adaptive model where they slowly change their probabilities over time. Within each we analyze several important classes of games. In particular, our main results are that for both fair cost-sharing and consensus games, these processes lead to good quality behavior.

Our study is motivated by several lines of work. One is the above-mentioned work on noisy best-response dynamics, which reach high-quality states but only after exponentially many steps. Another is work on the value of altruism [29], which considers how certain players acting altruistically can help the system reach a better state. The last is the work in [5], which considers a central authority who can temporarily control a random subset of the players in order to get the system into a good state. That model is related to the model we consider here, but is much more rigid because it posits two classes of players (one that follows the given instructions and one that doesn't) and does not allow players to adaptively decide for themselves. (See Section 1.2 for more discussion.)

Figure 1: A directed cost-sharing game that models a setting where each player can choose either to drive its own car to work at a cost of 1, or share public transportation with others, splitting an overall cost of k. For any 1 < k < n, if players arrive one at a time and each greedily chooses a path minimizing its cost, then the cost of the equilibrium obtained is n, whereas OPT has cost only k.

1.1 Our Results

As described above we introduce and analyze two models for guiding dynamics to good equilibria, and within these models we prove strong positive results for two classes of games, fair cost sharing and consensus games. In n-player fair cost sharing games, players choose routes in a network and split the cost of edges they take with others using the same edge; see Section 3 for a formal definition. These games can model scenarios such as whether to drive one's own car to work or to share public transportation with others (see Figure 1 for a simple example) and can have equilibria as much as a factor of n worse than optimal, even though they are always guaranteed to have low-cost equilibria that are only O(log n) worse than optimal as well. In consensus games, players are nodes in a network who each need to choose a color, and they pay a cost for each neighbor of a different color than their own (see Section 4). These again can have a wide gap between worst and best Nash equilibrium (in this case, cost Ω(n²) for the worst versus 0 for the best).
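To make the gap in Figure 1 concrete, the following short Python sketch (ours, not from the paper; all names are illustrative) simulates players arriving one at a time and greedily choosing between a private edge of cost 1 and the shared edge of cost k.

import math

def greedy_arrival(n, k):
    # x counts the players currently on the shared edge of cost k.
    x = 0
    own = 0
    for _ in range(n):
        # Joining the shared edge costs its price split with its current users.
        share_cost = k / (x + 1)
        if share_cost < 1.0:
            x += 1
        else:
            own += 1          # drive own car at cost 1
    return own + (k if x > 0 else 0)

n, k = 20, 10                 # any 1 < k < n
print("greedy arrival cost:", greedy_arrival(n, k))   # prints 20 (= n)
print("OPT (everyone shares):", k)                    # prints 10 (= k)

Because the very first arrival already sees a share of k > 1, nobody ever joins the shared edge, matching the factor-n gap described in the caption.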
Our main results for these games are the following. For fair cost-sharing, we show that in the learn-then-decide model, so long as the exploration phase has sufficient (polynomial) length and the proposed strategy is near-optimal, the expected cost of the system reached is O(log(n) log(nm) OPT). Thus, this is only slightly larger than that given by the price of stability of the game. For the smoothly-adaptive model, if there are many players of each type (i.e., associated to each (s_i, t_i) pair) we can do even better, with high probability achieving cost O(log(nm) OPT), or even O(OPT) if the number of players of each type is high enough. Note that with many players of each type the price of anarchy remains Ω(n), though the price of stability becomes O(1).

For consensus games, we show that so long as players place probability β > 1/2 on the proposed optimal strategy, then with high probability play will reach the exact optimal behavior within a polynomial number of steps. Moreover, for certain natural graphs such as the line and the grid, any β > 0 is sufficient. Note that our results are actually stronger than those achievable in the more centralized model of [5], where one needs to make certain minimal degree assumptions on the graph as well.

In both our models it is an easy observation that for any game, if the proposed solution is a good equilibrium, then in the limit the system will eventually reach the equilibrium and stay there indefinitely. Our interest, however, is in polynomial-time behavior.

1.2 Related Work

Dynamics and Convergence to Equilibria: It is well known that in potential games, best-response dynamics, in which players take turns each making a best-response move to the current state of all the others, is guaranteed to converge to a pure-strategy Nash equilibrium [24, 27]. Significant effort has been spent recently on the convergence time of these dynamics [1, 26], with both examples of games in which such dynamics can take exponential time to converge, and results on fast convergence to states with cost comparable to the Price of Anarchy of the game [4, 15].

Another well known general result that applies to any finite game is that if players each follow a "no-internal-regret" strategy, then the empirical distribution of play is guaranteed to approach the set of correlated equilibria of the game [17–19]. In particular, for good no-regret algorithms, the empirical distribution will be an ε-correlated equilibrium after O(1/ε²) rounds of play [7]. A recent result of [20] analyzes a version of the weighted-majority algorithm in congestion games, showing that behavior converges to the set of weakly-stable equilibria. This implies performance better than the worst correlated equilibrium, but even in this case the guarantee is no better than the worst pure-strategy Nash equilibrium.
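As a rough illustration of the kind of weight update behind the regret-minimization algorithms cited above ([20, 22]), here is a generic multiplicative-weights step over two abstract experts; this is our own simplified sketch, not the specific algorithm analyzed in those papers.

import math

def multiplicative_weights_update(weights, losses, eta=0.1):
    # Scale each expert's weight down exponentially in its observed loss,
    # then renormalize to keep a probability distribution over experts.
    new = {e: w * math.exp(-eta * losses[e]) for e, w in weights.items()}
    z = sum(new.values())
    return {e: w / z for e, w in new.items()}

w = {"best_response": 0.5, "proposed_B": 0.5}
w = multiplicative_weights_update(w, {"best_response": 1.0, "proposed_B": 0.2})
print(w)   # probability shifts toward the expert that incurred the smaller loss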
Noisy best-response has been shown in the limit to reach states of minimum global potential [8, 9, 23], and thus provides strong positive results in games with a small gap between potential and cost. However, this convergence may take exponential time, even for fair cost-sharing games. In particular, we show in Appendix A.2 that if one makes many copies of the edges of cost 1 in the example of Figure 1, reaching a state with sufficiently many players on the shared path to induce others to follow along would take time exponential in k, even if the algorithm can arbitrarily vary its temperature parameter. For details see Appendix A.2.

An alternative line of work assumes that the system starts empty and players join one at a time. Charikar et al. [11] analyze fair cost sharing in this setting on an undirected graph where all players have a common sink. Their model has two phases: in the first, players enter one at a time and use a greedy algorithm to connect to the current tree, and in the second phase the players undergo best-response dynamics. They show that in this case a good equilibrium (one that is within only a polylog(n) factor of optimal) is reached. We discuss this result further in Appendix A. We remark here that for directed graphs, it is easy to construct simple examples where this process reaches an equilibrium that is Ω(n) from optimal (see Figure 1). For scheduling on unrelated machines with makespan cost function, the natural greedy algorithm to assign incoming jobs is an O(m) approximation and a more sophisticated online algorithm guarantees an O(log m) approximation [3]. This implies that a two-phase model, where in the first phase the jobs join one at a time and use the greedy algorithm (or that of [3]), and in the second phase they perform best response dynamics, achieves cost within a factor O(m) (or O(log m)) of optimal. This is in contrast to the unbounded Price of Anarchy in general.

Taxation: There has also been work on using taxes to improve the quality of behavior [13, 14]. Here the aim is, via taxes, to adjust the utilities of each player such that the only Nash equilibria in the new game correspond to optimal or near-optimal behavior in the original game. In contrast, our focus is to identify how dynamics can be made to reach a good result without changing the game or adjusting utilities, but rather by injecting more information into the system.

Public service advertising: Finally, the public service advertising model of [5] also uses the idea of a proposed strategy in order to move players into a good equilibrium. In the model of [5], the players select (randomly) only once between following the proposed strategy or not. Players then stick with their decision while those that decided not to follow the proposed strategy settle on some equilibrium for themselves (given that the other players' actions are fixed). Then, in the last phase, all players perform best response dynamics to converge to a final equilibrium (the convergence is guaranteed since the discussion is limited to potential games). In contrast, in our models the players continuously randomly re-select between following the proposed strategy or performing a best response. This continuous randomization makes the analysis of our dynamics technically more challenging. Conceptually, the benefit of our model is that the players are "symmetric" and can continuously switch between the two alternatives, which better models selfish behavior. This continuous process is what enables the players to both explore and exploit the two alternative actions.

2 A Formal Framework

2.1 Notation and Definitions

We start by providing general notation and definitions. A game is denoted by a tuple G = ⟨N, (S_i), (cost_i)⟩,
where N is a set of n players, S_i is the finite action space of player i ∈ N, and cost_i is the cost function of player i. The joint action space of the players is S = S_1 × ... × S_n. For a joint action s ∈ S we denote by s_{−i} the actions of players j ≠ i, i.e., s_{−i} = (s_1, ..., s_{i−1}, s_{i+1}, ..., s_n). The cost function of player i maps a joint action s ∈ S to a real non-negative number, i.e., cost_i : S → R⁺. Every game has an associated social cost function cost : S → R that maps a joint action to a real value. In the cases discussed in this paper the social cost is simply the sum of players' costs, i.e.,

cost(s) = Σ_{i=1}^{n} cost_i(s).

The optimal social cost is OPT(G) = min_{s∈S} cost(s). We sometimes overload notation and use OPT for a joint action s that achieves cost OPT(G).

Given a joint action s, the Best Response (BR) of player i is the set of actions BR_i(s) that minimizes its cost, given the other players' actions s_{−i}, i.e., BR_i(s_{−i}) = argmin_{a∈S_i} cost_i(a, s_{−i}). A joint action s ∈ S is a pure Nash Equilibrium (NE) if no player i ∈ N can benefit from unilaterally deviating to another action; namely, every player is playing a best response action in s, i.e., s_i ∈ BR_i(s_{−i}) for every i ∈ N. A best response dynamics is a process in which at each time step, some player which is not playing a best response switches its action to a best response action, given the current joint action. In this paper we focus on potential games, which have the property that any best response dynamics converges to a pure Nash equilibrium [24].

Let N(G) be the set of Nash equilibria of the game G. The Price of Anarchy (PoA) is defined as the ratio between the maximum cost of a Nash equilibrium and the social optimum, i.e., max_{s∈N(G)} cost(s)/OPT(G). The Price of Stability (PoS) is the ratio between the minimum cost of a Nash equilibrium and the social optimum, i.e., min_{s∈N(G)} cost(s)/OPT(G). For a class of games, the PoA and PoS are the maximum over all games in the class.

2.2 The Model

We now describe the formal model that we consider. Initially, players begin in some arbitrary state, which could be a high-cost equilibrium or even a state that is not an equilibrium at all. Next, an entity, perhaps the designer of the system or a player who has studied the system well, proposes some better global behavior B. B may or may not be an equilibrium behavior; our only assumption is that it have low overall cost. Now, players move one at a time in a random order. Each player, when it is their turn to move, chooses among two options. The first is to simply behave greedily and to make a best-response move to the current configuration. The second option is to follow their part of the given behavior B. These two high-level strategies (best-response or follow B) are viewed as two "experts" and the player then runs some learning algorithm aiming to learn which of these is most suitable for himself. Note that best-response is an abstract option; the specific action it corresponds to may change over time.

Because moves occur in an asynchronous manner, there are multiple reasonable ways to model the feedback each player gives to its learning algorithm: for instance, does it consider the average cost since the player's previous move or just the cost when it is the player's turn to go, or something in between? In addition, does it get to observe the cost of the action it did not take (the full information model) or only the action it chose (the bandit model)? To abstract away these issues, and even allow different players to address them differently, we consider here two models that make only very mild assumptions on the kind of learning and adaptation made by players.

Learn then Decide model: In this model, players follow an "exploration" phase where each time it is their turn to move, they flip a coin to decide whether to follow the proposed behavior B or to do a best-response move to the current configuration. We assume that the coin gives probability at least β to B, for some constant β > 0. Finally, after some common time T*, all players switch to an "exploitation" phase where they each commit, in an arbitrary way based on their past experience, to follow B or perform best response from then on. (The time T* is selected in advance.)
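The following Python sketch shows one way the Learn then Decide dynamic could be simulated, assuming the game is supplied through a best-response oracle; the commitment rule shown (stick with the more-used expert) is just one of the arbitrary rules the model permits, and all identifiers are ours.

import random

def learn_then_decide(n, state, best_response, proposed, T_star, beta=0.5, rng=random):
    # Exploration phase: players move in a random order; each move follows the
    # proposed behavior B with probability at least beta, else best-responds.
    used_B = [0] * n
    moves = [0] * n
    for _ in range(T_star):
        i = rng.randrange(n)
        moves[i] += 1
        if rng.random() < beta:
            state[i] = proposed[i]
            used_B[i] += 1
        else:
            state[i] = best_response(i, state)
    # Commitment phase: the model allows any rule based on past experience;
    # as a placeholder, each player commits to the expert it used more often.
    commit_to_B = [2 * used_B[i] >= moves[i] for i in range(n)]
    return state, commit_to_B

# Tiny demo with a dummy two-player game (all names and numbers are ours):
final, commits = learn_then_decide(
    n=2, state=[0, 0], best_response=lambda i, s: 0, proposed=[1, 1], T_star=100)
print(final, commits)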
The above model assumes some degree of coordination: a fixed time T* after which all players make their decisions. One could imagine instead each player i having its own time T_i* at which it commits to one of the experts, perhaps with the time itself depending on the player's experience. In the Smoothly-Adaptive model below we even more generally allow players to smoothly adjust their probabilities over time as they like, subject only to a constraint on the amount by which probabilities may change between time steps.

Smoothly Adaptive model: In this model, there are no separate exploration and exploitation phases. Instead, each player i maintains and adjusts a value p_i over time. When player i is chosen to move, it flips a coin of bias p_i to select between the proposed behavior or a best-response move, choosing B with probability p_i. We allow the players to use arbitrary adaptive learning algorithms to adjust these probabilities, with the sole requirement that learning proceed slowly. Specifically, using p_i^t to denote the value of p_i at time t, we require that

|p_i^t − p_i^{t+1}| ≤ Δ

for a sufficiently (polynomially) small quantity Δ, and furthermore that for all i, the initial probability p_i^0 ≥ p^0 for some overall constant 0 < p^0 < 1. Note that the algorithm may update p_i even in time steps at which it does not move. The learning algorithm may use any kind of feedback or weight-updating strategy it wants to (e.g., gradient descent [30, 31], multiplicative updating [10, 21, 22]) subject to this bounded step-size requirement. We say that the probabilities are (T, β)-good if for any time t ≤ T we have p_i^t > β for all i. (Note that if Δ < (p^0 − β)/T then clearly the probabilities are (T, β)-good.)

We point out that while one might at first think that any natural adaptive algorithm would learn to favor best-response (always decreasing p_i), this depends on the kind of feedback it uses. For instance, if the algorithm considers only its cost immediately after it moves, then indeed by definition best-response will appear better. However, if it considers its cost immediately before it moves (comparing that to what its cost would have been had it chosen the other alternative), or even the sum total cost since its previous move, then B might appear better. Our model allows users to update in any way they wish, so long as the updates are sufficiently gradual.

Finally, as mentioned in the introduction, in both our models it is an easy observation that for any game, if the proposed solution is a good equilibrium, then in the limit (as T* → ∞ in the Learn-then-Decide model or as Δ → 0 in the Smoothly Adaptive model) the system will eventually reach the equilibrium and stay there indefinitely. Our interest, however, is in polynomial-time behavior.

3 Fair Cost Sharing

The first class of games we study in this paper, because of its rich structure and wide gap between price of anarchy and price of stability, is that of fair cost sharing games. These games are defined as follows. We are given a graph G = (V, E), which can be directed or undirected, where each edge e ∈ E has a nonnegative cost c_e ≥ 0. There is a set N = {1, ..., n} of n players, where player i is associated with a source s_i and a sink t_i. The strategy set of player i is the set S_i of s_i–t_i paths. In an outcome of the game, each player i chooses a single path P_i ∈ S_i. Given a vector of players' strategies s = (P_1, ..., P_n), let x_e be the number of agents whose strategy contains edge e. In the fair cost sharing game the cost to agent i is

cost_i(s) = Σ_{e∈P_i} c_e / x_e,

and the goal of each agent is to connect its terminals with minimum total cost. The social cost of an outcome s = (P_1, ..., P_n) is defined to be

cost(P_1, ..., P_n) = Σ_{e∈∪_i P_i} c_e.
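For concreteness, here is a small helper (our own illustration, with hypothetical edge names) that computes the fair cost-sharing quantities just defined for a given strategy profile.

from collections import Counter

def edge_loads(paths):
    # x_e: number of players whose chosen path contains edge e.
    return Counter(e for path in paths for e in path)

def player_cost(i, paths, cost):
    x = edge_loads(paths)
    return sum(cost[e] / x[e] for e in paths[i])

def social_cost(paths, cost):
    used = set(e for path in paths for e in path)
    return sum(cost[e] for e in used)

# Toy instance in the spirit of Figure 1: two players, a shared edge of cost
# 1.5 and private edges of cost 1 each (names and numbers are ours).
cost = {"shared": 1.5, "car1": 1.0, "car2": 1.0}
paths = [["shared"], ["shared"]]
print(player_cost(0, paths, cost))   # 0.75 = 1.5 split two ways
print(social_cost(paths, cost))      # 1.5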
It is well known that fair cost sharing games are potential games [2, 24], and the price of anarchy in these games is Θ(n) while the price of stability is H(n) [2], where H(n) = Σ_{i=1}^{n} 1/i = Θ(log n). In particular, the potential function for these games is

Φ(s) = Σ_{e∈E} Σ_{x=1}^{x_e} c_e / x,

which satisfies the following inequality:

Fact 1 In fair cost sharing, for any s ∈ S we have: cost(s) ≤ Φ(s) ≤ H(n) · cost(s).
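Fact 1 is easy to check numerically; the sketch below (ours, with a made-up instance) evaluates Φ(s) and the social cost and verifies both inequalities.

from collections import Counter

def potential(paths, cost):
    # Phi(s) = sum over used edges e of c_e * H(x_e), H(x) = 1 + 1/2 + ... + 1/x.
    x = Counter(e for p in paths for e in p)
    return sum(cost[e] * sum(1.0 / j for j in range(1, x[e] + 1)) for e in x)

def social_cost(paths, cost):
    return sum(cost[e] for e in set(e for p in paths for e in p))

cost = {"shared": 4.0, "car1": 1.0, "car2": 1.0, "car3": 1.0}
paths = [["shared"], ["shared"], ["car3"]]
H_n = sum(1.0 / j for j in range(1, len(paths) + 1))
c, phi = social_cost(paths, cost), potential(paths, cost)
assert c <= phi <= H_n * c
print(c, phi, H_n * c)    # 5.0, 7.0, ~9.17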
Overview of the Results and Analysis: Our main results for fair cost sharing are the following. If we have many players of each type (the type of a player is determined by its source s_i and destination t_i), then in both the learn-then-decide and smoothly adaptive models we can show that with high probability, behavior will reach a state of cost within a logarithmic factor of OPT within a polynomial number of steps. Moreover, for the learn-then-decide model, even if we do not have many players of each type we can show that the expected cost at the end of the process will be low.

The high level idea of the analysis is that we first prove that so long as each player randomizes with probability near to 50/50, with high probability the overall cost of the system will drop to within a logarithmic factor of OPT in a polynomial number of steps; moreover, at that point both the best response and the proposed actions are pretty good strategies from the individual players' point of view. To finish the analysis in the "Learn then Decide" model, we show that in the remaining steps of the exploration phase the expected cost does not increase by much; using properties of the potential function, we then show that in the final "decision" round T*, the overall potential cannot increase substantially either, which in turn implies a bound on the increase in overall cost. For the adaptive model, one key difficulty in the analysis is to show that if the system reaches a state where the social cost is low and both abstract actions are pretty good for most players, the cost never goes high again. We are able to show that this indeed is the case as long as there are many players of each type, no matter how the players adjust their probabilities p_i^t or make their choice between best-response and following the proposed behavior.

3.1 The Main Arguments

We begin with the following key lemma that is useful in the analysis of both learn-then-decide and smoothly adaptive models. For ease of notation, we assume in this section that the proposed strategy B is the socially optimal behavior OPT, so we can identify P_i^OPT = P_i^B as the behavior proposed by B to player i. If B is different from OPT, then we simply lose the corresponding approximation factor.

Lemma 2 Consider a fair cost sharing game. There exists a T = poly(n) such that if the probabilities are (T, β)-good for constant β > 0, then with high probability the cost at time T will be at most O(OPT log(mn)). Moreover, if we have at least c log(nm) players of each (s_i, t_i) pair for sufficiently large constant c, then with high probability the cost at time T will be O(OPT).

Proof: We begin with the general case. Let n_e^opt denote the number of players who use edge e in OPT. We partition edges into two classes. We say an edge is a "high traffic" edge if n_e^opt > c log(nm), where c is a sufficiently large constant (c = 32/β suffices for the argument below). We say it is a "low traffic" edge otherwise. Define T_0 = 2n log n. With high probability, by time T_0 each player has had a chance to move at least once. We assume in the following that this indeed is the case. Note that as a crude bound, at this point the cost of the system is at most n² · OPT (each player will move to a path of cost-share at most OPT and therefore of actual cost at most n · OPT). Next, by Chernoff bounds and the union bound, our choice of c implies that with high probability each high-traffic edge e has at least β n_e^opt / 2 players on it at all time steps T ∈ [T_0, T_0 + n³]; in particular, Chernoff bounds imply that each of the at most mn³ events has probability at least 1 − e^{−β n_e^opt / 8} ≥ 1 − 1/(mn)^4. In the remaining analysis, we assume this indeed is the case as well. Let OPT_i denote the cost of player i in OPT, so that OPT = Σ_i OPT_i.

Our assumption above implies that for any time step T under consideration, if player i follows the proposed strategy P_i^OPT, its cost will be at most c log(nm) OPT_i. In particular, its cost on the low-traffic edges in P_i^OPT can be at most a factor c log(nm) larger than its cost on those edges under OPT, and its cost on high-traffic edges is at most a 2/β factor larger than its cost on those edges under OPT.

We now argue as follows. Let cost_T denote the cost of the system at time T. If cost_T ≥ 2c log(nm) OPT, then the expected cost of a random player is at least 2c log(nm) OPT / n. On the other hand, if player i is chosen to move at time T, from the above analysis its cost after the move (whether it chooses B or best response) will be at most c log(nm) OPT_i. The expected value of this quantity over players i chosen at random is at most c log(nm) OPT / n. Therefore, if cost_T ≥ 2c log(nm) OPT, the expected drop in potential at time T is at least c log(nm) OPT / n. Finally, since the cost at time T_0 was at most n² · OPT, which implies by Fact 1 that the value of the potential was at most n²(1 + log(n)) OPT, with high probability this cannot continue for more than O(n³) steps. Formally, we can apply Hoeffding-Azuma bounds for supermartingales as follows: let us define Q = c log(nm) OPT and

Δ_T = max(Φ_T − Φ_{T−1} + Q/n, −2Q),

and consider running this process stopping when cost_T < 2Q. Let X_T = Φ_0 + Δ_1 + ... + Δ_T. Then throughout the process we have

E[X_T | X_1, ..., X_{T−1}] ≤ X_{T−1}   and   |X_T − X_{T−1}| ≤ 2Q,

where the first inequality holds because our analysis showing an expected decrease in potential of at least Q/n is true even if we cap all decreases to a maximum of 2Q as in the definition of Δ_T. So, by Hoeffding-Azuma (see Theorem 10 in Appendix B), after n³ steps in the non-stopped process with high probability we would have X_T − X_0 ≤ (1/2) n² Q, which is not possible since by definition of X_T we have Φ_T ≤ Φ_0 + (X_T − X_0) − TQ/n, which would be negative. Therefore, with high probability stopping must occur before this time, as desired.

Finally, if we have at least c log(mn) players of each type, then there are no low-traffic edges and so we do not need to lose the c log(mn) factor in the argument.

We now present a second lemma, which will be used to analyze the "Learn then Decide" model.

Lemma 3 Consider a fair cost sharing game in the Learn-then-Decide model. If the cost of the system at time T_1 is O(OPT log(mn)), and T = T_1 + poly(n) < T*, then the expected value of the potential at time T is O(OPT log(mn) log(n)).

Proof: First, as argued in the proof of Lemma 2, with high probability for any player i and any time t ∈ [T_1, T], the cost for player i to follow the proposed strategy at time t is at most c log(nm) OPT_i for some constant c. Let us assume below that this is indeed the case.
Next, the above bound implies that if the cost at time t ∈ [T_1, T] is cost_t, then the expected decrease in potential caused by a random player moving (whether following the proposed strategy or performing best response) is at least (cost_t − c log(nm) OPT)/n; in particular, cost_t / n is the expected cost of a random player before its move, and c log(nm) OPT / n is an upper bound on the expected cost of a random player after its move. Note that this is an expectation over the randomness in the choice of player at time t, conditioned on the value of cost_t. In particular, since this holds true for any value of cost_t, we can take expectation over the entire process from time T_1 up to t, and we have that if E[cost_t] ≥ c log(nm) OPT then E[Φ_{t+1}] ≤ E[Φ_t], where Φ_t is the value of the potential at time step t. Finally, since Φ_t ≤ log(n) cost_t, this implies that if E[Φ_t] ≥ c log(nm) log(n) OPT then E[Φ_{t+1}] ≤ E[Φ_t]. Since for any value of cost_t we always have E[Φ_{t+1}] ≤ E[Φ_t] + c log(nm) OPT / n, this in turn implies by induction on t that E[Φ_t] ≤ c log(nm) log(n) OPT + c log(nm) OPT / n for all t ∈ [T_1, T], as desired.

Our main result in the "Learn then Decide" model is the following:

Theorem 4 For fair cost sharing games in the Learn then Decide model, a polynomial number of exploration steps T* is sufficient so that the expected cost at any time T′ ≥ T* is O(log(n) log(nm) OPT).

Proof: From Lemma 2, there exists T = poly(n) such that with high probability the cost of the system will be at most O(OPT log(mn)) at some time T_1 ∈ [T* − T, T*]. From Lemma 3, this implies the expected value of the potential at time T*, right before the final exploitation phase, is O(OPT log(mn) log(n)). Finally, we consider the decisions made at time T*. When a player chooses to make a best-response move, this can only decrease potential. When a player chooses B, this could increase potential. However, for any edge e in the proposed solution B, the total increase in potential caused by edge e over all players who have e in their proposed solution is at most c_e · H(n_e^opt) = O(c_e log n). This is because whenever a new player makes a decision to commit to following the proposed strategy and using edge e, all previous players who made that commitment after time T and whose proposed strategy uses edge e are still there. Thus, the total increase in potential after time T* is at most O(OPT log n). Since after all players have committed, potential can only decrease, this implies that the expected value of the potential at any time T′ ≥ T* is O(OPT log(mn) log(n)). Therefore, the expected cost at time T′ is at most this much as well, as desired.

We now use Lemma 2 to analyze fair cost sharing games in the adaptive learning model when the number of players n_i of each type (i.e., associated to each (s_i, t_i) pair) is large.

Theorem 5 Consider a fair cost sharing game in the adaptive learning model satisfying n_i = Ω(m) for all i. There exists a T_1 = poly(n) such that if the probabilities are (T_1, β)-good for constant β > 0, then with high probability, for all T ≥ T_1 the cost at time T is O(log(nm) OPT). Moreover, there exists a constant c such that if n_i ≥ max[m, c log(mn)] then with high probability, for all T ≥ T_1 the cost at time T is O(OPT).

Proof: First, by Lemma 2 we have that with high probability at some time T_0 ≤ T_1, the cost of the system reaches O(OPT log(mn)). The key to the argument now is to prove that once the cost becomes low, it will never become high again. To do this we use the fact that n_i is large for all i. Our argument, which follows the proof of Theorem 4.3 in [6], is as follows.
Let U be the set of all edges in use at time T_0, along with all edges used in B. In general, we will insert an edge into U if it is ever used from then on in the process, and we never remove any edge from U, even if it later becomes unused. Let c* be the total cost of all edges in U. So, c* is an upper bound on the cost of the current state, and the only way c* can increase is by some player choosing a best response path that includes some edge not in U. Now, notice that any time a best-response path for some (s_i, t_i) player uses such an edge, the total cost of all edges inserted into U is at most c*/n_i, because the current player can always choose to take the path used by the least-cost player of his type, and those n_i players are currently sharing a total cost of at most c*. Thus, any time new edges are added into U, c* increases by at most a multiplicative (1 + 1/n_i) factor. We can insert edges into U at most m times, so the final cost is at most

cost(T_0) · (1 + 1/n_{i*})^m,

where n_{i*} = min_i n_i. This implies that as long as n_{i*} = Ω(m) we have cost(T) = O(cost(T_0)) for all T ≥ T_0. Thus, overall the total cost remains O(OPT log(mn)).

If n_i ≥ max[m, c log(mn)], we simply use the improved guarantee provided by Lemma 2 to say that with high probability the cost of the system reaches O(OPT) within T_0 ≤ T_1 time steps, and then apply the charging argument as above in order to get the desired result.

Variations: One could imagine a variation on our framework where rather than proposing a fixed behavior B, one can propose a more complex strategy such as "follow B until time step T* and then follow B′" or "follow B until time step T* and then perform best-response". In the latter case, the Smoothly-Adaptive model with (T*, β)-good probabilities becomes essentially a special case of the Learn-then-Decide model and all results above for the Learn-then-Decide model go through immediately. On the other hand, this type of strategy requires a form of global coordination that one would prefer to avoid.

4 Consensus Games

Another interesting class of games we consider is that of consensus games. Here, players are vertices in an n-vertex graph G, and each have two choices of action, red or blue. Players incur a cost equal to the number of neighbors they have of different color, and the overall social cost is the sum of the costs of each player. The potential function for consensus games is simply half the social cost function, or equivalently the number of edges having endpoints of different color. While in these games optimal behavior is trivial to describe (all red or all blue for a total cost of 0) and is an equilibrium, they also have extremely poor equilibria as well. For example, consider two cliques of n/2 vertices each, with each node also having αn/2 neighbors in the other clique for some 0 < α < 1. In this case, there is a Nash equilibrium with one clique red and one clique blue, for an overall cost of Ω(n²).¹ This is substantially worse than optimal (either by an Ω(n²) or infinite factor, depending on whether one allows an additive offset or not).

¹ Intuitively one can think of this example as two countries, one using English units and one using the metric system, neither wanting to switch.

Unlike in the case of fair cost-sharing games, for consensus games there is no hope of quickly achieving near-optimal behavior for all β > 0. In particular, for any β < 1/2 there exists α > 0 in the example above such that if players choose the proposed strategy B with probability β, with high probability all nodes have more neighbors performing best response than following B. Thus, it is easy to see by induction that no matter what strategy B is, all best-response players will remain with their initial color. Moreover, this remains true for exponentially in n many steps.² On the other hand, this argument breaks down for β > 1/2. In fact, we will show that for β > 1/2, for any graph G, the system will in polynomial time reach optimal behavior (if B = OPT).

² Of course, in the limit as the number of time steps T → ∞, eventually we will observe a sequence in which all players select B, and if B is an equilibrium the system will then remain there forever. Thus if B is "all blue" the system will in the limit converge to optimal behavior. However, our interest is in polynomial time convergence.

It is interesting to compare this with the centralized model of [5] in which a central authority takes control of a random constant fraction of the players and aims to use them to guide the selfish behavior of the others. In that setting, even simple low-degree graphs can cause a problem. For instance, consider a graph G consisting of n/4 4-cycles, each in the initial equilibrium configuration red, red, blue, blue. If the authority controls a random β fraction of players (for any constant β < 1), with high probability a constant fraction of the 4-cycles contain no players under the authority's control and will remain in a high-cost state. On the other hand, it is easy to see that this specific example will perform well in both our learn-then-decide and smoothly-adaptive models. We show that in fact all graphs G perform well, though this requires care in the argument due to correlations that may arise.

We assume that the proposed behavior B is the optimal behavior "all blue" and prove that with high probability the configuration will reach this state after O(n log² n) steps. We begin with a simple preliminary lemma.

Lemma 6 If X_1, ..., X_d are {0,1}-valued random variables with Pr(X_i = 1) ≥ p, then no matter how they are correlated, Pr(MAJORITY(X_1, ..., X_d) = 1) ≥ 2p − 1.

Proof: The expected number of 1's is at least pd. It is also at most d · p_MAJ + (d/2) · (1 − p_MAJ), where p_MAJ = Pr(MAJORITY(X_1, ..., X_d) = 1). Solving, we get p_MAJ ≥ 2p − 1.

We now present our main result of this section.

Theorem 7 In both the Learn-then-Decide and Smoothly-Adaptive models, for any constant β > 1/2, if the proposed behavior B is optimal then with high probability play will become optimal in O(n log² n) steps. For the Smoothly-Adaptive model we assume behavior is (cn log² n, β)-good for sufficiently large constant c.

Proof: In both models the dynamics contain two random processes: the random order in which players move and the random coins flipped by the players to determine how they want to move. In order to prove the theorem it will be helpful to separate these two processes and consider them each in turn. First, consider and fix a random sequence S of players to move. Let T_1 denote the time by which all players have moved at least once according to S, and more generally let T_{t+1} denote the time by which all players have moved at least once since time T_t. Since players move i.i.d. in a random order, with high probability T_{t+1} ≤ T_t + 3n log(n) for all t = 1, ..., n. We assume in the following that this indeed is the case; in fact, we can allow S to be an arbitrary, adversarially-chosen sequence subject to this constraint.

Fixing S, we now consider the coin flips of the individual players. We will prove by induction that each player has probability at least q_t of being blue at time T_t (not necessarily independently), for q_t = (1 − γ)q_{t−1} + γ with γ = 2β − 1. Equivalently, we can write this as 1 − q_t = (1 − q_{t−1})(1 − γ). Since γ > 0 (because β > 1/2), this in turn implies that t = O(log n) is sufficient to reach 1 − q_t ≤ 1/n², meaning that with high probability all nodes are blue as desired.

We prove this bound on q_t as follows. Consider the nodes who move at times T_{t−1} + 1, T_{t−1} + 2, ..., T_t in order. When some node v moves, by induction each neighbor w of v has probability at least q_{t−1} of being blue (though these may not be independent).
By assumption, v chooses B with some probability β′ ≥ β > 1/2 and chooses best-response with probability 1 − β′. Therefore, the probability v becomes blue is at least:

β′ + (1 − β′) · Pr(majority of neighbors of v are blue)
  ≥ β′ + (1 − β′)(2q_{t−1} − 1)        (by Lemma 6)
  = (1 − (2β′ − 1)) q_{t−1} + (2β′ − 1)
  ≥ (1 − γ) q_{t−1} + γ                (for γ = 2β − 1),

as desired. Thus, with high probability all nodes are blue by time T_t for t = O(log n), and by our assumption on S this occurs within O(n log² n) steps.

The general result above requires β > 1/2, and as noted earlier there exist graphs G and initial configurations such that the process will fail for any β < 1/2. On the other hand, for several "nice" graphs such as the line or grid, any constant β > 0 is sufficient. For this we assume that best-response will only ask to switch color if the new color is strictly better than the current color.

Theorem 8 For the line and d-dimensional grid graphs (constant d), for any β > 0, if the proposed action B is optimal then with high probability play will reach optimal in poly(n) steps. For the case of the Smoothly-Adaptive model we assume behavior is (T, β)-good for T a sufficiently large polynomial in n.

Proof: Assume B is "all blue". On the line, if any two neighbors become blue, they will remain blue indefinitely. Similarly in the grid, if any d-dimensional cube becomes blue, the nodes in the cube will also remain blue indefinitely. On the line, any neighbors, in any two consecutive steps, have probability at least β²/n² of becoming blue, and on the d-dimensional grid, any cube, in any 2^d consecutive steps, has probability at least (β/n)^{2^d} of becoming all blue. Therefore with high probability all nodes become blue in a polynomial number of steps.
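A small simulation sketch (ours, with an arbitrary example graph and tie-breaking convention) of the consensus dynamic analyzed above: each selected node follows the proposed all-blue behavior with probability beta and otherwise best-responds to its neighbors' colors.

import random

def consensus_dynamics(adj, beta, steps, rng=random):
    # Arbitrary initial state: all nodes red (0); proposed behavior B is all blue (1).
    color = {v: 0 for v in adj}
    nodes = list(adj)
    for _ in range(steps):
        v = rng.choice(nodes)
        if rng.random() < beta:
            color[v] = 1                       # follow the proposed behavior B
        else:                                  # best response to neighbors' colors
            blue = sum(color[w] for w in adj[v])
            if 2 * blue > len(adj[v]):
                color[v] = 1
            elif 2 * blue < len(adj[v]):
                color[v] = 0
            # on a tie, keep the current color (switch only if strictly better)
    return color

# Two cliques of size n/2 with a few cross edges (graph and parameters are ours).
n = 40
adj = {v: [w for w in range(n) if w != v and (w < n // 2) == (v < n // 2)]
       for v in range(n)}
for v in range(0, n // 2, 4):
    adj[v].append(v + n // 2)
    adj[v + n // 2].append(v)
final = consensus_dynamics(adj, beta=0.6, steps=40 * n)
print(sum(final.values()), "of", n, "nodes blue")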
5 Conclusions and Future Directions

In this paper we initiate a study of how to aid dynamics, beginning from arbitrary initial conditions, to reach states of cost close to the best Nash equilibrium. We propose a novel angle on this problem by considering how providing more information to simple learning algorithms about the game being played can allow the dynamics to reach such low-cost states. We show that for fair cost-sharing and consensus games, proposing a good solution and allowing users to adaptively decide for themselves between that solution and best-response behavior will efficiently lead to near-optimal configurations, so long as users adapt sufficiently slowly. Both of these games have the property that, all by itself, random best response may easily end up in a high-cost state: cost Ω(n · OPT) for fair cost-sharing and cost Ω(n²) for consensus, whereas with this additional aid the states reached have cost only polylog(n) · OPT for cost-sharing and 0 for consensus.

Open questions and future directions: Our results for fair cost sharing games in the adaptive learning model hold only for the case when the number of players n_i of each type is large. One natural open question is whether similar positive results can be shown for the general case, i.e., when the number of players n_i of each type is arbitrary. It would also be interesting to broaden the types of learning algorithms the players can use. In particular, for the Smoothly Adaptive model, how large a learning rate Δ (or for the Learn-then-Decide model, how small a cutoff time T*) can one allow and still produce comparable positive results on the quality of the final outcome?

It would also be interesting to extend this model to the case of multiple proposed solutions, some of potentially much higher quality than others, and still show that these dynamics reach good states. In this case, one would need to make additional assumptions on the learning algorithms the players are using. In particular, if players are trying to decide between various learning dynamics (best response, regret minimization, noisy best response, etc.), and if they use an experts learning algorithm to decide between these actions, can we show that if following the best of the dynamics is good for all players, and if the experts learning algorithms have appropriate properties, then the players will do nearly as well as if they had used the best dynamics?

Finally, in both of the games we consider, players with the same profile (the same source and destination in the fair cost sharing case) receive the same global advice, which can also be easily communicated as a single message. It would be interesting to also analyze games where players with the same profile might receive different global advice.

One way to view our model is as follows. Starting from the original game, we create a meta-game in which each player is playing one of the two abstract actions, best response and the proposed strategy. We then show that if the players learn in the new meta-game, with restrictions only on the learning rate, then this results in good behavior in the original game. More generally, it would be interesting to further explore the idea of designing "meta games" on top of the actual games and showing that natural behavior in these meta games induces good behavior in the original game of interest.

Acknowledgements

This work was supported in part by ONR grant N00014-09-1-0751, by AFOSR grant FA9550-091-0538, by NSF grant CCF-0830540, by the IST Programme of the European Community under the PASCAL2 Network of Excellence IST-2007216886, by a grant from the Israel Science Foundation (grant No. 709/09) and by grant No.
2008321 from the United States-Israel Binational Science Foundation (BSF). This publication reflects the authors’ views only. References 12 [1] H. Ackermann, H. Röglin, and B. Vöcking. On the impact of combinatorial structure on conges- tion games. In Proc. 47th FOCS, 2006. [2] E. Anshelevich, A. Dasgupta, J. Kleinberg, E. Tardos, T. Wexler, and T. Roughgarden. The Price of Stability for Network Design with Fair Cost Allocation. In FOCS, 2004. [3] J. Aspnes, Y. Azar, A. Fiat, S. A. Plotkin, and O. Waarts. On-line routing of virtual circuits with applications to load balancing and machine scheduling. J. ACM (JACM), 44(3):486–504, 1997. [4] B. Awerbuch, Y. Azar, A. Epstein, and A. Skopalik. Fast convergence to nearly optimal solutions in potential games. In ACM Conference on Electronic Commerce, 2008. [5] M.-F. Balcan, A. Blum, and Y. Mansour. Improved Equilibria via Public Service Advertising. In ACM-SIAM Symposium on Discrete Algorithms, 2009. [6] M.-F. Balcan, A. Blum, and Y. Mansour. The Price of Uncertainty. In Tenth ACM Conference on Electronic Commerce, 2009. [7] A. Blum and Y. Mansour. Chapter 4: Learning, Regret-Minimization, and Equilibria. In N. Nisan, T. Roughgarden, E. Tardos, and V. V. Vazirani, editors, Algorithmic Game Theory. Cambridge University Press, 2007. [8] L. Blume. The statistical mechanics of strategic interaction. Games and Economic Behavior, 5:387– 424, 1993. [9] L. Blume. How noise matters. Games and Economic Behavior, 44:251–271, 2003. [10] N. Cesa-Bianchi, Y. Freund, D.P. Helmbold, D. Haussler, R.E. Schapire, and M.K. Warmuth. How to use expert advice. Journal of the ACM, 44(3):427–485, 1997. [11] M. Charikar, H. Karloff, C. Mathieu, J. Naor, and M. Saks. Online multicast with egalitarian cost sharing. In Proc. 20th SPAA, 2008. [12] C. Chekuri, G. Even, A. Gupta, and D. Segev. Set Connectivity Problems in Undirected Graphs and the Directed Steiner Network Problem. In ACMSIAM Symposium on Discrete Algorithms, 2008. [13] R. Cole, Y. Dodis, and T. Roughgarden. How much can taxes help selfish routing? In ACM-EC, 2003. [14] R. Cole, Y. Dodis, and T. Roughgarden. Pricing network edges for heterogeneous selfish users. In STOC, 2003. [15] A. Fanelli, M. Flamminia, and L. Moscardelli. Speed of convergence in congestion games under best responce dynamics. In Proc. 35th ICALP, 2008. [16] M. Feldman, G. Kortsarz, and Z. Nutov. Improved approximation algorithms for directed steiner forest. In ACM-SIAM Symposium on Discrete Algo- rithms, 2009. [17] D. Foster and R. Vohra. Calibrated learning and correlated equilibrium. Games and Economic Behavior, 21:40–55, 1997. [18] D. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior, 29:7–36, 1999. [19] S. Hart and A. Mas-Colell. A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68:1127–1150, 2000. [20] R. Kleinberg, G. Piliouras, and E. Tardos. Multiplicative updates outperform generic no-regret learning in congestion games. In Proc. 41th STOC, 2009. [21] N. Littlestone, P. M. Long, and M. K. Warmuth. On-line learning of linear functions. In Proc. of the 23rd Symposium on Theory of Computing, pages 465–475. ACM Press, New York, NY, 1991. See also UCSC-CRL-91-29. [22] N. Littlestone and M. K. Warmuth. The weighted majority algorithm. Information and Computation, 108(2):212–261, 1994. [23] J.R. Marden and J.S. Shamma. Revisiting loglinear learning: asynchrony, completeness, and payoff-based implementation. Submitted, September 2008. [24] D. Monderer and L.S. Shapley. 
Potential games. Games and Economic Behavior, 14:124–143, 1996.
[25] Rajeev Motwani and Prabhakar Raghavan. Randomized Algorithms. Cambridge University Press, 1997.
[26] N. Nisan, T. Roughgarden, E. Tardos, and V. Vazirani, editors. Algorithmic Game Theory. Cambridge, 2007.
[27] R.W. Rosenthal. A class of games possessing pure-strategy nash equilibria. Int. Journal Game Theory, 2(1), 1973.
[28] T. Roughgarden. Intrinsic Robustness of the Price of Anarchy. In STOC, 2009.
[29] Y. Sharma and D. P. Williamson. Stackelberg thresholds in network routing games or the value of altruism. In ACM Conference on Electronic Commerce, 2007.
[30] B. Widrow and M. E. Hoff. Adaptive switching circuits. 1960 IRE WESCON Convention Record, pages 96–104, 1960. Reprinted in Neurocomputing (MIT Press, 1988).
[31] M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning, 2003.

A Further Discussion of Other Dynamics

A.1 Players Entering One at a Time

As mentioned in Section 1.2, one form of natural dynamics that has been studied in potential games is where the system starts empty and players join one at a time. Charikar et al. [11] analyze this setting for fair cost sharing on an undirected graph where all players have a common sink. They consider a two-phase process. In Phase 1, the players arrive one by one and each connects to the root by greedily choosing a path minimizing its cost, i.e., each selects a greedy (best response) path relative to the selection of paths by the previous players. In Phase 2, players are allowed to change their paths in order to decrease their costs; namely, in the second step players play best response dynamics. Charikar et al. [11] show that, interestingly, the sum of the players' costs at the end of the first step will be within an O(log² n) factor of the cost of a socially optimal solution (which in this case is defined to be a minimum Steiner tree connecting the players to the root). This then implies that the cost of the Nash equilibrium achieved in the second step, as well as all states reached along the way, are O(log³ n) close to OPT.

Note that in the directed case the result above does not hold; in fact such dynamics can lead to very poor equilibria. Figure 1 shows an example where if the players arrive one by one and each connects to the root by greedily choosing a path minimizing its cost, then the cost of the equilibrium obtained can be much worse than the cost of OPT. The optimal solution, which is also a Nash equilibrium in this example, is (P_1, ..., P_n) where P_i = s_i → v → t for each i; however, the solution obtained if the players arrive one at a time and each connects to the root by greedily choosing a path minimizing its cost is (P_1′, ..., P_n′) where P_i′ = s_i → t for each player i. Clearly, cost(P_1′, ..., P_n′) = n, which is much worse than cost(OPT) = k. Moreover, if one modifies the example by making many copies of the edge of cost k, then even if behavior begins in a random initial configuration, with high probability each edge of cost k will have few players on it and so best-response behavior will lead to the equilibrium of cost n.

A.2 A Lower Bound for Noisy Best-Response

In noisy best-response dynamics (also called log-linear learning [23]), when it is player i's turn to move, it probabilistically chooses an action with a probability that decreases exponentially with the gap between the cost of that action and the cost of the best-response action. The rate of decrease is controlled by a temperature term τ, much like in simulated annealing. In fact, the dynamics can be viewed as a form of simulated annealing with the global potential as the objective function. At zero temperature, the dynamics is equivalent to standard (non-noisy) best response, and at infinite temperature the dynamics is completely random.
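For intuition, a single log-linear (noisy best-response) move could be sampled as in the following sketch; this is our own illustration of the rule just described, with made-up costs.

import math, random

def log_linear_choice(costs, tau, rng=random):
    # Pick an action with probability proportional to exp(-cost/tau); as
    # tau -> 0 this approaches best response, as tau -> infinity it is uniform.
    m = min(costs.values())
    weights = {a: math.exp(-(c - m) / tau) for a, c in costs.items()}
    r = rng.random() * sum(weights.values())
    for a, w in weights.items():
        r -= w
        if r <= 0:
            return a
    return a

# Example: choosing between a private edge of cost 1 and a shared edge whose
# current cost-share would be 2.5 (numbers are ours, for illustration).
print(log_linear_choice({"car": 1.0, "shared": 2.5}, tau=0.5))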
While it is known that with appropriate temperature control this process in the limit will stabilize at states of optimum global potential, we show here that there exist cost sharing instances such that no dynamics of this form can achieve expected cost o(n · OPT / log n) within a polynomial number of time steps. We begin with a definition capturing a broad range of dynamics of this form.

Definition 1 Let generalized noisy best response dynamics be any dynamics where players move one at a time in a random order, and when it is a given player i's turn to move, it probabilistically selects among its available actions. The sole requirement is that if action a is a worse response for player i than action b to the current state S, and furthermore a has been a worse response than b to all past states S′, then the probability of choosing a should be at most the probability of choosing b.

The above definition captures almost any natural individual learning-based dynamics. We now show that no dynamics of this form can achieve expected cost o(n · OPT / log n) within a polynomial number of time steps for fair cost-sharing.

Theorem 9 For the fair cost sharing game, no generalized noisy best response dynamics can achieve expected cost o(n · OPT / log n) within a polynomial number of time steps.

Proof: We consider a version of the "cars or public transit" example of Figure 1, but where each player has n cars (options of cost 1 that cannot be shared by others). For this problem, we can describe the evolution of the system as a random walk on a line, where the current position t indicates the number of players currently using the public transit (the shared edge of cost k). The exact probabilities in this walk depend on specifics of the dynamics and may even change over time, but one immediate fact is that so long as the walk has never reached t ≥ k, the shared edge of cost k is a worse response for any player than its edges of cost 1. Therefore, by the definition of generalized noisy best response, each player has at most a 1/n chance of choosing the shared edge, and at least a 1 − 1/n chance of choosing a private edge, when it is that player's turn to move. Since position t corresponds to a t/n fraction of players on the shared edge and a 1 − t/n fraction on private edges, this in turn implies that, given that the random walk is in position 1 ≤ t ≤ k − 1,

1. the probability p_{t,t+1} of moving to position t + 1 is at most (1/n)(1 − t/n), and
2. the probability p_{t,t−1} of moving to position t − 1 is at least (1 − 1/n)(t/n),

(with remaining probability 1 − p_{t,t+1} − p_{t,t−1} the walk remains in position t). In particular,

p_{t,t+1} / p_{t,t−1} ≤ (n − t) / (t(n − 1)) ≤ 1/t.

We now argue that the expected time for this walk to reach position k is superpolynomial in n for k = log n.
In particular, consider a simplified version of the above Markov chain where p_{t,t+1} / p_{t,t−1} = 1/t (rather than ≤ 1/t) and we delete all self-loops except at the origin (so p_{t,t+1} = 1/(t + 1) and p_{t,t−1} = t/(t + 1) for 1 ≤ t ≤ k − 1). Deleting self-loops can only decrease the expected time to reach position k, since it corresponds to simply ignoring time spent in self-loops, and the same for setting p_{t,t+1} / p_{t,t−1} = 1/t. So, it suffices to show the expected time for this simplified walk to reach position k is superpolynomial in n. For convenience, set p_{k,k−1} = 1.

We can now solve for the stationary distribution π of this chain. In particular, the simplified Markov chain is now equivalent to a random walk on an undirected multigraph with vertices v_0, v_1, ..., v_k having one edge between v_k and v_{k−1}, (k − 1) edges between v_{k−1} and v_{k−2}, (k − 1)(k − 2) edges between v_{k−2} and v_{k−3}, and in general (k − 1)(k − 2) ··· t edges between v_t and v_{t−1} for 1 ≤ t ≤ k − 1. In addition, node v_0 has (n − 1) · (k − 1)! edges in self-loops, since the probability p_{0,0} is at least (n − 1)/n. Therefore, since the stationary distribution of an undirected random walk is proportional to the degree of each node [25], we have that π_k ≤ 1/(n · (k − 1)!) < 1/k!. Lastly, since the expected time h_{kk} between consecutive visits to node v_k satisfies h_{kk} = 1/π_k by the Fundamental Theorem of Markov Chains, the expected time h_{0k} to reach v_k from v_0 is at least 1/π_k as well. So, the expected time to reach v_k is at least k!, which is superpolynomial in n for k = log(n) (or even k = ω(log n / log log n)).

Finally, the fact that the expected time to reach position k is superpolynomial in n implies that the probability of reaching position k within a polynomial number of time steps is less than 1/poly(n): specifically, if the walk has probability p of reaching position k in T time steps starting from position 0, then by the Markov property the expected time to reach position k is at most T/p. Moreover, so long as the walk has not yet reached position k, the cost of the system is Θ(n) = Θ(n · OPT/k). Thus, the expected cost of the system within a polynomial number of time steps is Ω(n · OPT / log n), as desired.

B Hoeffding-Azuma

For completeness we present here the Hoeffding-Azuma concentration bound for supermartingales.

Theorem 10 Suppose X_0, X_1, X_2, ... is a supermartingale, namely that E[X_k | X_1, ..., X_{k−1}] ≤ X_{k−1} for all k. Suppose also that for all k we have |X_k − X_{k−1}| ≤ C. Then, for any λ > 0 and any n ≥ 1,

Pr[X_n ≥ X_0 + λ] ≤ e^{−λ² / (2C²n)}.