1556 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 39, NO. 6, DECEMBER 2009

Attitude Adaptation in Satisficing Games

Matthew Nokleby, Student Member, IEEE, and Wynn Stirling

Abstract—Satisficing game theory offers an alternative to classical game theory that describes a flexible model of players' social interactions. Players' utility functions depend on other players' attitudes rather than simply on their actions. However, satisficing players with conflicting attitudes may enact dysfunctional behaviors, resulting in poor performance. We present an evolutionary method by which a population of players may adapt their attitudes to improve payoff. In addition, we extend the Nash-equilibrium concept to satisficing games, showing that the method leads players toward the equilibrium in their attitudes. We apply these ideas to the stag hunt—a simple game in which cooperation does not easily evolve from noncooperation. The evolutionary method provides two major contributions. First, satisficing players may improve their performance by adapting their attitudes. Second, numerical results demonstrate that cooperation in the stag hunt can emerge much more readily under the method we present than under traditional evolutionary models.

Index Terms—Adaptive systems, cooperative systems, game theory, replicator dynamics, satisficing games.

I. INTRODUCTION

GAME-THEORETIC models are often used to construct societies of artificial agents. Commonly, agents are modeled as players in a noncooperative game in which each player focuses solely on maximizing its individual payoff. The players' self-interest leads to Nash equilibria [3], which are strategy profiles such that no single player can improve its payoff by changing strategies. Unfortunately, self-interested behavior places significant limitations on the players' social interactions.
For example, it is often difficult to engender cooperation and other social behaviors with self-interested players. Indeed, the self-interest hypothesis has come under nearly continuous criticism since the inception of game theory [4]–[7]. Satisficing game theory [8] offers an alternative to noncooperative game theory. It was developed for the synthesis of artificial agents and focuses particularly on social interactions between players. The players' utilities are expressed as conditional mass functions, allowing them to consider the preferences of others rather than focusing solely on individual self-interest. Satisficing models have previously succeeded in overcoming the social hurdles presented by noncooperative game theory, allowing players to exhibit sophisticated social behaviors such as altruism, negotiation, and compromise [9].

Manuscript received May 11, 2007; revised April 3, 2008. First published June 23, 2009; current version published November 18, 2009. This work was supported in part by the U.S. Army Research Laboratory and the U.S. Army Research Office under Grant W911NF-07-1-0650. This work was presented in part at the 2006 IEEE World Congress on Computational Intelligence and the 2007 IEEE Symposium on Foundations of Computational Intelligence. This paper was recommended by Associate Editor E. Santos.

M. Nokleby was with the Department of Electrical and Computer Engineering, Brigham Young University, Provo, UT 84602 USA. He is now with the Department of Electrical and Computer Engineering, Rice University, Houston, TX 77251-1892 USA (e-mail: [email protected]).

W. Stirling is with the Department of Electrical and Computer Engineering, Brigham Young University, Provo, UT 84602 USA (e-mail: wynn@ee.byu.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSMCB.2009.2021013
However, satisficing game theory presents its own set of challenges. As in real-life social situations, satisficing communities may behave dysfunctionally. When players with incompatible attitudes are grouped together, they can choose incoherent behaviors that lead to poor performance. The stag hunt, a simple game originally suggested by Rousseau [10], underscores the difficulty of achieving cooperation under self-interest. As usually formalized, the game involves two hunters. They can catch a stag only if they hunt the stag together, but each can separately catch a (much smaller) hare. That is, a player earns maximum payoff if both players cooperate but risks failure if it attempts to cooperate while the other does not. Each player must individually decide between cooperation and noncooperation; thus, the game represents a useful model for the analysis of potentially cooperative behavior. For example, a group of workers deciding whether to strike fits the stag-hunt model: a large number of workers may achieve a significant benefit by striking, whereas a single worker who "strikes" alone incurs significant loss. Social dilemmas such as the stag hunt have been studied extensively by (among others) social scientists, economists, and biologists. A large body of recent work has focused on learning-based [11]–[13] and evolutionary [14]–[16] methods for achieving cooperation. In evolutionary game theory, which was pioneered by Maynard Smith [17], [18], populations of players make decisions by trial and error rather than by explicit utility maximization. Over time, natural selection favors individuals who earn higher payoff, altering the population's makeup. Large well-mixed populations are described by the replicator dynamics [19], which defines a system of ordinary differential equations that govern the evolution of the population. Under suitable conditions, the replicator dynamics drives the population to a Nash equilibrium.
The stag hunt presents considerable difficulties from an evolutionary perspective. Under the standard replicator dynamics, a population composed primarily of hare hunters cannot evolve into a group of stag hunters, although each player would benefit from cooperation. Skyrms posits a compelling reason for this failure: "For the Hare Hunters to decide to be Stag Hunters, each must change her beliefs about what the others will do. But rational choice based on game theory as usually conceived, has nothing to say about how or why such a change might take place" [20, emphasis in the original]. Motivated by Skyrms' conjecture, we explore methods by which "such a change" may take place in satisficing game theory. To do so, we attempt to bridge the gap between noncooperative and satisficing game theory by incorporating elements of noncooperative game theory into satisficing game theory. In a manner similar to [21] and [22], we present a method by which a population of players may modify their attitudes according to the game structure and the attitudes of other players. In our method, which employs the standard replicator dynamics, players whose attitudes result in higher payoffs reproduce more readily, causing their attitudes to dominate the population. The resulting model blends the two decision theories: players retain the conditional utility structure of satisficing game theory while improving payoff by evolutionary means. The dynamics leads the players toward the Nash equilibrium in the players' attitudes rather than in their actions. In Section II, we familiarize the reader with the basics of satisficing game theory. In Section III, we review the classical formulation of the stag hunt and its evolutionary difficulties.
We present a satisficing model for the stag hunt in Section IV. In Section V, we define the attitude equilibrium and present the attitude dynamics. We present experimental results in Section VI and compare the satisficing approach to other recent methods in evolutionary game theory. We give our conclusions in Section VII.

II. SATISFICING GAME THEORY

Although the simple, seemingly reasonable assumption of self-interest—also called individual rationality—has given rise to a rich and successful theory of games, narrow maximization may be too simple, particularly for describing social situations. As observed by Luce and Raiffa, "General game theory seems to be, in part, a sociological theory which does not include any sociological assumptions. . . it may be too much to ask that any sociology be derived from the single assumption of individual rationality" [4, p. 196]. Satisficing game theory provides an alternative to the classical framework. It presents a more elaborate structure that may be more useful in modeling social behaviors. Players may directly concern themselves with the preferences of others rather than explicitly attempting to maximize utility. We construct the satisficing framework by altering the structure of the players' utility functions. First, each player possesses two utilities: 1) one utility characterizing the benefits associated with taking an action and 2) another characterizing the costs. A satisficing player contents itself with a decision for which the benefits outweigh the costs, i.e., one that is "good enough" or satisficing.1 Second, the players' utility functions share a common syntax with probability mass functions, allowing probabilistic concepts, e.g., conditioning and independence, to be applied to players' preferences—albeit with a significantly different interpretation. The use of probability mass functions to describe a player's preferences rather than a random phenomenon is unusual and warrants further explanation.
A rigorous justification is given in [24], where it is shown that the use of mass functions as utilities guarantees several useful social properties concerning the reconciliation of group and individual preferences. Fortunately, however, the benefits of conditional utilities may also be appreciated intuitively. For two discrete random phenomena X and Y, where Y depends on X, we can express the probabilities for Y by the conditional mass function pY|X(y|x). The conditional mass function gives the hypothetical probabilities of Y: What would be the probability that Y = y if we knew that X took on some value x? If we know the probabilities for X = x, we can compute the marginal mass function according to the basic rules of probability theory, i.e., pY(y) = Σx pY|X(y|x) pX(x). The marginal probabilities for Y are influenced—but not entirely dictated—by the probabilities of X. Similarly, players' preferences may depend on the preferences of others, allowing their utilities (which we call social utilities) to be expressed as conditional mass functions. The conditional mass functions allow for hypothetical expressions of utility: What would Player 1's utilities be if Player 2 unilaterally preferred a particular action? We can compute Player 1's marginal utilities, i.e., the utilities used for decision making, by summing the conditional utilities over Player 2's actual preferences. This structure allows players to consider not only which actions other players may prefer but also how strong those preferences are.

1 Although they share similarities, satisficing game theory should not be confused with the concept of "bounded rationality" satisficing as introduced by Simon [23]. With Simon's satisficing, individuals search for suboptimal choices that meet a variable threshold or aspiration level, implicitly accounting for the cost of continued searching.
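The marginalization above is easy to sketch in code. The following is an illustrative example with made-up two-state distributions; none of the numbers come from this paper:

```python
# Marginal mass function p_Y(y) = sum_x p_{Y|X}(y|x) p_X(x),
# illustrated with arbitrary two-state distributions.

p_X = {0: 0.6, 1: 0.4}                       # hypothetical marginal for X
p_Y_given_X = {                              # hypothetical conditionals p_{Y|X}(y|x)
    0: {0: 0.9, 1: 0.1},                     # row for x = 0
    1: {0: 0.2, 1: 0.8},                     # row for x = 1
}

def marginal(p_cond, p_x):
    """Sum the conditional mass function against the marginal p_x."""
    ys = p_cond[next(iter(p_cond))].keys()
    return {y: sum(p_cond[x][y] * px for x, px in p_x.items()) for y in ys}

p_Y = marginal(p_Y_given_X, p_X)             # {0: 0.62, 1: 0.38}
```

The marginal for Y leans toward the outcome favored under the likelier value of X but is not dictated by it, which is exactly the "influenced, not dictated" relationship described above.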
Their utilities are influenced by others' preferences in a controlled manner, which does not require the players to discard their own preferences.

A. Formalization

First, define the set of players X = {1, 2, . . . , n}. Each player chooses a pure strategy ui ∈ Ui, where Ui is player i's pure-strategy set. A pure-strategy profile, which describes the actions of all of the players, is an n-dimensional vector u ∈ U, where U = U1 × U2 × · · · × Un is the pure-strategy space. As mentioned in the previous section, each player possesses two social utilities. To describe these social utilities, we define two "selves," or perspectives, from which each player may consider its actions [25]. The selecting self considers actions strictly in terms of their associated benefits, whereas the rejecting self considers actions only in terms of the costs incurred in implementing them. These selves are described by the selectability function pSi(ui) and the rejectability function pRi(ui), respectively. Social utilities are mass functions; thus, they are normalized across the pure-strategy sets and therefore describe the relative benefits and costs associated with each pure strategy in Ui. They also provide players with a formal definition of "good enough." A pure strategy is "good enough," or satisficing, if its relative benefits are at least as great as its relative costs. In the vernacular, we may view satisficing as "getting one's money's worth," as opposed to optimization, where players seek "the best and only the best." Although the former concept allows for a set of multiple actions that are "good enough," the latter is designed to produce a unique solution. We therefore define the individually satisficing set for player i as

Σi = {u ∈ Ui : pSi(u) ≥ q pRi(u)}, (1)

where q is the index of caution. Typically, q = 1, but we may adjust a player's definition of "good enough" by changing q. Setting q ≤ 1 ensures that Σi is nonempty.
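As a minimal sketch of (1), the individually satisficing set can be computed directly from the two mass functions; the function name and the numbers below are ours, chosen only for illustration:

```python
def satisficing_set(p_S, p_R, q=1.0):
    """Individually satisficing set (1): strategies with p_S(u) >= q * p_R(u)."""
    return {u for u in p_S if p_S[u] >= q * p_R[u]}

# Hypothetical two-strategy player: benefits favor 'a', costs favor 'b'.
p_S = {"a": 0.7, "b": 0.3}
p_R = {"a": 0.4, "b": 0.6}

sigma = satisficing_set(p_S, p_R)                 # {'a'}: only 'a' is "good enough"
sigma_cautious = satisficing_set(p_S, p_R, q=0.5) # lowering q enlarges the set
```

Because both mass functions sum to one, at least one strategy must satisfy pS(u) ≥ pR(u), which is why q ≤ 1 guarantees a nonempty set.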
We may combine the players’ individually satisficing sets by forming the satisficing Authorized licensed use limited to: Rice University. Downloaded on July 12,2010 at 20:55:22 UTC from IEEE Xplore. Restrictions apply. 1558 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 39, NO. 6, DECEMBER 2009 Fig. 1. Simple praxeic network. rectangle 12,...,n , which is defined as the Cartesian product, i.e., 12,...,n = Σ1 × Σ2 × · · · × Σn . Fig. 2. Praxeic network with “true” random variables. (2) The satisficing rectangle is the set of all strategy profiles that are simultaneously satisficing to each player. It is convenient to graphically express the relationship between players’ utilities. In probability theory, relationships between random variables are expressed in Bayesian networks [26]. Similarly, in satisficing game theory, the relationships between players’ utilities are expressed in praxeic networks.2 The praxeic network consists of a directed acyclic graph, where the nodes are the selecting and rejecting perspectives of each player, and the edges are the conditional utility functions. For example, consider the simple two-player community in Fig. 1. For each player, the rejecting preferences depend on the selecting preferences of the other player, whereas the selecting preferences are independent. Parenthetically, we note that praxeic networks also resemble the spatial evolutionary models in [15], [16], and [28]–[30]. In these models, graphical connections determine which players may interact during play. That is, individuals may only play with players to whom they are connected. In contrast, graphical connections in praxeic networks define how players influence each other in play. Both models describe, in some sense, the players’ social relationships. However, spatial evolutionary models describe which players can pair up in a game, whereas praxeic networks describe which players’ utilities can influence the utilities of others. 
In discussing the players’ social utilities, we retain the terminology of probability theory. In the community in Fig. 1, we refer to Player 1’s conditional rejectability function, which is denoted as pR1 |S2 (v1 |u2 ). As aforementioned, the conditional mass function expresses a hypothetical proposition, where the antecedent is the strategy favored by Player 2, and the consequent is the utility of Player 1. That is, if Player 2’s selecting preferences entirely favored strategy u2 , what would be Player 1’s rejectability for v1 ? As with probability mass functions, we may compute the marginal rejectability by summing over the conditionals, i.e., pR1 (v1 ) = u2 ∈U2 pR1 |S2 (v1 |u2 )pS2 (u2 ). The marginal utilities determine the individually satisficing sets and the satisficing rectangle. If a utility is independent (e.g., the selectability functions), its marginal is directly expressed without conditioning. By allowing conditioning in the players’ utilities, we implicitly assume that players have at least partial knowledge of each other’s utilities. Each player must have sufficient knowledge of other players’ utilities to compute its marginal utilities and find 2 The term praxeic is derived from praxeology, which refers to “the science of human conduct” or “the science of efficient action” [27]. its individually satisficing set. In the example community, each player must know the other player’s selectability function to compute its own rejectability. However, players do not consider each other’s actions in determining the individually satisficing sets; thus, they need not observe (or predict) each other’s choices. With the marginal and conditional utilities defined for the example community, we can form the interdependence function pS1 ,...,Sn R1 ,...,Rn (u1 , . . . , un , v1 , . . . , vn ), which is the joint mass function of all players’ selecting and rejecting preferences. 
By the chain rule of probability theory, the interdependence function for this example is

pS1S2R1R2(u1, u2, v1, v2) = pR1|S2(v1|u2) pR2|S1(v2|u1) pS1(u1) pS2(u2).

Satisficing games are characterized by the triple (X, U, pS1,...,Sn R1,...,Rn), where X is the set of players, U is the pure-strategy space, and pS1,...,Sn R1,...,Rn is the interdependence function. From this information, all necessary marginal utilities can be computed, and the satisficing rectangle can be determined. Finally, it is often useful to specify the players' social utilities in terms of variable parameters, which we refer to as the players' attitudes. The interpretation of the attitudes, of course, depends on the specific game being played, but in general, they express each player's temperament, which affects the degree to which its utilities depend on those of other players. For example, in the stag hunt, the players' attitudes will characterize their aversion to risk, which influences each player's willingness to engage in stag hunting.

B. Random Satisficing Games

Oftentimes, a player's utility will depend on random phenomena, resulting in expected utilities based on the distribution of the random event. Classical game theory requires that the probability distributions of the random phenomena not be influenced by the preferences of the players. In other words, a player's beliefs about a random event may affect its utilities, but not vice versa. In most cases, this restriction poses no difficulty. However, we may want to consider circumstances in which a player's subjective probability about an event depends on players' preferences. The conditional structure of social utilities provides for such a possibility. Since the utilities are mass functions, we can combine both probabilistic and preferential information into a single model. Fig. 2 illustrates a network that implements such a
model. This praxeic network is similar to the network in Fig. 1 in that it contains the same four vertices associated with the players' selecting and rejecting selves. However, we also include two random variables θ1 and θ2, which represent phenomena that are only probabilistically known to the players. This network describes both players whose preferences depend on random phenomena and random phenomena that depend on players' preferences. The dependencies in Fig. 1 still persist: R1 still depends—but indirectly, through θ2—on S2, and R2 still depends on S1, which now depends on θ1.

III. THE STAG HUNT

In the stag hunt, players choose between two pure strategies—hunt stag or hunt hare—denoted s and h, respectively. The payoff for playing each pure strategy depends on the action of the other player. If the other player hunts a stag, the payoff for hunting a stag is higher than that of hunting a hare. However, if the other player hunts a hare, stag hunting yields a low payoff. That is, the players must hunt together to catch the stag and obtain the higher payoff. The payoff for hunting a hare, on the other hand, is independent of the other player's choice. Each player can individually catch a hare and therefore can always opt for the modest but more secure payoff associated with consuming a hare. We quantitatively express the players' utilities in the payoff matrix in Table I.

TABLE I
PAYOFF MATRIX FOR A TWO-PLAYER STAG HUNT

               Hunt stag (s)   Hunt hare (h)
Hunt stag (s)     (4, 4)          (0, 3)
Hunt hare (h)     (3, 0)          (3, 3)

There are two pure-strategy Nash equilibria for the stag hunt: 1) (s, s) and 2) (h, h). If the players simultaneously hunt a stag or a hare, there is no incentive for either player to change actions. There is also a mixed-strategy equilibrium, in which each player invokes a randomized rule to choose between the two pure strategies.
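Both kinds of equilibrium can be checked directly from the payoffs in Table I; a small sketch (the helper names are ours, chosen for illustration):

```python
# Stag-hunt payoffs pi(own, other) from Table I.
def pi(own, other):
    if own == "h":                       # a hare is worth 3 regardless of the other
        return 3
    return 4 if other == "s" else 0      # a stag requires cooperation

# A symmetric profile (a, a) is a pure Nash equilibrium if deviating does not pay.
def is_equilibrium(a):
    deviation = "h" if a == "s" else "s"
    return pi(a, a) >= pi(deviation, a)

# Mixed equilibrium: the opponent's stag probability x must equalize the two
# expected payoffs, 4 * x = 3, giving x = 3/4.
x_star = 3 / 4
```

Both (s, s) and (h, h) pass the deviation check, and when the opponent hunts stag with probability 3/4, each pure strategy earns the same expected payoff of 3.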
We will study the mixed-strategy equilibrium in more detail later. Each pure-strategy equilibrium has its benefits. The (s, s) equilibrium is optimal in that it maximizes both players' payoffs. However, successful stag hunting requires the cooperation of the other player; thus, risk-averse players may instead choose to hunt hare. The (h, h) equilibrium is regarded as the risk-dominant equilibrium in the sense that the potential gains of deviating from hare hunting are less than the potential losses. At best, a hare hunter will increase its utility by one by switching to hunt stag, but at worst, it will decrease its utility by three. Thus, conservative—yet fully rational—players might choose to hunt hare. This dichotomy illustrates the fundamental issue of the stag hunt. Obviously, if each player had a certain assurance that the other player would hunt stag, everyone would cooperate.3 However, players do not have such an assurance under the usual model but must independently choose their actions. The players' actions then boil down to how much confidence each player has in the other's willingness to cooperate and how risk averse each player is. As mentioned by Skyrms, classical game theory has little to say about this topic. Indeed, the Nash equilibria do not tell us which actions the players will take. They simply imply that, once a pair of players is in either pure-strategy equilibrium, neither player will have an incentive to deviate. To study which equilibrium will result under different circumstances, we turn to evolutionary game theory [32], [33].

3 Interestingly, it is straightforward to show that, if the game is played sequentially (i.e., Player 1 makes its move, and then Player 2, who observes Player 1's choice, moves), mutual stag hunting becomes the unique subgame-perfect Nash equilibrium [31].

A. Replicator Dynamics

Replicator dynamics is the classic instantiation of evolutionary game theory.
It models the evolution of a population's strategies according to their ecological fitness. Consider a large population of players who are "programmed" to play a particular strategy, regardless of the other player's behavior, in a symmetric two-player game such as the stag hunt. The players are randomly paired up to play the game at each time step. Each player reproduces asexually4 according to its payoffs, i.e., the number of offspring that a player has is proportional to its payoff during the previous game. Players' strategies also "breed true," which means that offspring are programmed to the same pure strategy as their parents. We assume that the population is well mixed, giving each player an equal chance of being paired with any other player. For a symmetric two-player game where each player must choose some strategy in the pure-strategy set U, define the mixed-strategy simplex ΔU as the set of all mixed (randomized) strategies over U. If U contains m elements, we can characterize a mixed strategy as a nonnegative m-dimensional vector x that obeys the constraint Σi=1..m xi = 1. Each player's mixed strategy is probabilistically independent of the other player's. The interior of ΔU is the set of mixed strategies that assign nonzero probability to each pure strategy, i.e., int(ΔU) = {x ∈ ΔU : xi > 0, i ∈ {1, . . . , m}}. In the replicator dynamics, we interpret each element xi as the population share for i, i.e., the fraction of the population that plays the pure strategy i. That is, if we randomly draw an individual from the population described by x, the probability that it is programmed to play i is xi. At time t, the expected utility5 of a player who plays pure strategy i against a random member of the population is u(i, x(t)) = Σj=1..m π(i, j) xj(t), where π(i, j) represents the utility of playing pure strategy i against pure strategy j.
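The expected utility against a population state can be sketched directly; here we use the stag-hunt payoffs of Table I for concreteness (the dictionary encoding is our own):

```python
# u(i, x) = sum_j pi(i, j) * x_j, with pi(i, j) drawn from Table I.
PAYOFF = {("s", "s"): 4, ("s", "h"): 0, ("h", "s"): 3, ("h", "h"): 3}

def expected_utility(i, x):
    """Expected payoff of pure strategy i against population shares x."""
    return sum(PAYOFF[(i, j)] * share for j, share in x.items())

x = {"s": 0.75, "h": 0.25}         # population state
u_s = expected_utility("s", x)     # 4 * 0.75 = 3.0
u_h = expected_utility("h", x)     # 3.0: the hare payoff ignores the population
```

At a stag-hunting share of 3/4, the two strategies earn equal expected payoff, which is precisely the mixed-strategy equilibrium share noted earlier.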
As the players reproduce, the population shares described by x(t) vary, and the more successful strategies tend to dominate strategies that are poorly adapted to the evolving community. As the population size approaches infinity, we may invoke the law of large numbers, and the dynamics of the population shares becomes a system of m differential equations. We have

ẋi(t) = [u(i, x(t)) − u(x(t), x(t))] xi(t),  i ∈ {1, . . . , m}  (3)

where u(x(t), x(t)) is the population's average expected utility, i.e.,

u(x(t), x(t)) = Σi=1..m u(i, x(t)) xi(t) = Σi=1..m Σj=1..m π(i, j) xi(t) xj(t).

Intuitively, (3) tells us that a pure strategy's population share increases at time t if its expected utility is higher than the average expected utility across the population. It is shown in [32] that, if the initial conditions satisfy x(0) ∈ int(ΔU) (all pure strategies are represented in the initial conditions), any steady state of the dynamics is a Nash equilibrium in the players' strategies.

4 This does not contradict the fact that the players must pair off to play the game. Although they play the game pairwise, each player individually earns its payoff. The number of offspring that it produces is proportional only to its own payoff and is entirely independent of the other players'.

5 We use π to represent the utility (or payoff) when players use only pure strategies, whereas u represents the expected utility when mixed strategies are involved.

It should be noted that the standard replicator model describes a selection dynamics rather than a mutation dynamics. Players do not change strategies under this model; instead, the offspring of players whose strategies are suboptimal are overwhelmed by the offspring of more successful players. As time continues, the fraction of the population that plays suboptimal strategies becomes arbitrarily small. To account for random factors such as mutation, migration, and payoff fluctuations, several stochastic replicator models have been proposed [13], [14], [34], [35]. We examine the model in [14], which augments the standard replicator dynamics by introducing fixed mutation probabilities into the dynamics. The mutation probabilities are contained in the matrix W = [Wij], where Wij represents the probability that an individual playing strategy j spontaneously switches to strategy i. The mutation dynamics differs from (3) by the addition of a mutation term, i.e.,

ẋi(t) = [u(i, x(t)) − u(x(t), x(t))] xi(t) + Σj=1..m (Wij xj(t) − Wji xi(t)).  (4)

The dynamics for xi is altered by adding the rate at which players mutate into the population share xi (described by Σj Wij xj) and subtracting the rate at which players mutate out of the population share xi (described by Σj Wji xi). When the mutation probabilities are zero (W = I), (4) collapses to the standard replicator dynamics. In general, however, we are forced to give up the theoretical properties that were guaranteed under the standard replicator model: the steady-state behavior of the system no longer corresponds to the Nash equilibria, regardless of the initial conditions.

B. Stag-Hunt Replicator Dynamics

1) Standard Dynamics: For the stag hunt, the population is described by the 2-D vector x = (xs, xh). The payoff matrix (see Table I) shows that the payoff for a stag hunter is four when paired with another stag hunter, and it is zero when paired with a hare hunter. A stag hunter therefore gains an expected utility of u(s, x) = 4xs. The utility for hunting a hare is independent of the other player's actions; thus, u(h, x) = 3. The population's average expected payoff is given by u(x, x) = 4xs² − 3xs + 3. Since xs = 1 − xh, we can characterize the dynamics by examining only the stag-hunting share. Suppressing the time arguments, we get

ẋs = [u(s, x) − u(x, x)] xs = −4xs³ + 7xs² − 3xs.  (5)

Although nonlinearities prevent a closed-form solution, we can easily examine the qualitative behavior of the population. In Fig. 3, we show a direction field for the replicator dynamics, which gives the sign of the derivative as a function of xs. The stationary points, where ẋs = 0, occur at xs ∈ {0, 3/4, 1}. The point at xs = 3/4 corresponds to the aforementioned mixed-strategy Nash equilibrium. However, the mixed-strategy equilibrium is not stable; any deviation drives the dynamics to one of the pure-strategy points, which are asymptotically stable. We may regard xs = 3/4 as a boundary for the initial conditions of the population: if fewer than 75% of the population initially hunt stag, the dynamics quickly drives stag hunters to relative extinction; if more than 75% initially hunt stag, hare hunters die out. Although stag hunting prevails in a predominantly cooperative society, these dynamics cannot evolve cooperation from an initially noncooperative population.

Fig. 3. Direction field for the stag-hunt replicator dynamics.

2) Mutation Dynamics: Using the replicator model in (4), we add a probability of mutation into the stag-hunt dynamics, to see whether mutation helps a cooperative population evolve. We assume that the probability of mutating from stag hunting to hare hunting is identical to the probability of mutating from hare hunting to stag hunting. Consequently, we can parameterize the mutation matrix by a single mutation probability 0 ≤ α ≤ 1. We have

W = [ 1−α    α  ]
    [  α    1−α ].

The dynamics for xs becomes

ẋs = −4xs³ + 7xs² − 3xs + Wsh(1 − xs) − Whs xs = −4xs³ + 7xs² − 3xs + α(1 − 2xs).  (6)

The closed-form expression for the stationary points of the dynamics is quite unwieldy. So, in Fig.
4, we plot the direction field for the dynamics as a function of α and xs. When the mutation probabilities are small, the qualitative behavior of the solution does not change: there remain two stable stationary points, at which nearly all of the population hunts either stag or hare, and one unstable stationary point that defines the boundary between the stag-hunting and hare-hunting basins of attraction. The boundary point increases with the mutation rate, which suggests that mutation exacerbates the evolutionary difficulties of the stag hunt. For large mutation probabilities, the dynamics differs considerably, leaving a single stationary point to which the dynamics converges, independent of the initial conditions. Even with absurdly high mutation rates—in which evolution is governed more by mutation than by payoff—only a minority of the population hunts stag. Because the population size is infinite, the mutation replicator model defines a deterministic system, as in the standard dynamics. In contrast, finite populations, with random pairings and mutation, may spontaneously evolve cooperation from noncooperation. However, the moral of the story is that, on average, even finite populations rarely cooperate if they are large, well mixed, and composed of players that are preprogrammed to play a particular pure strategy.

Fig. 4. Direction field for the stag-hunt stochastic replicator dynamics.

Fig. 5. Praxeic network for the stag hunt.

Finally, as we have already discussed, there exist evolutionary models other than the replicator dynamics. In Section VI, we investigate the effects of more sophisticated evolutionary mechanisms on the stag hunt. For the time being, however, we focus on the underlying structure of the players' behavior.
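The qualitative claims above are easy to reproduce numerically. The sketch below integrates (5) and (6) with a crude forward-Euler step; the step size and horizon are arbitrary choices of ours:

```python
# Forward-Euler integration of the stag-hunt replicator dynamics:
# equation (5) when alpha = 0, and the mutation variant (6) when alpha > 0.
def simulate(xs, alpha=0.0, dt=0.01, steps=20_000):
    for _ in range(steps):
        xs += dt * (-4 * xs**3 + 7 * xs**2 - 3 * xs + alpha * (1 - 2 * xs))
        xs = min(max(xs, 0.0), 1.0)      # clamp against numerical drift
    return xs

low = simulate(0.70)               # below the 3/4 boundary: stag hunters die out
high = simulate(0.80)              # above the boundary: hare hunters die out
noisy = simulate(0.50, alpha=0.1)  # heavy mutation: single interior rest point
```

Initial stag-hunting shares below 3/4 collapse toward all-hare and shares above it grow toward all-stag, while substantial mutation settles the population at an interior point where only a minority hunts stag, matching the direction fields of Figs. 3 and 4.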
Our solution, which is based on satisficing game theory, affords a flexible structure for players' social interactions, increasing the possibility for cooperation even under simple evolutionary dynamics.

IV. THE SATISFICING STAG HUNT

In a two-player stag hunt, the set of players is $X = \{1, 2\}$, and each player has an identical pure-strategy set $U_i = \{s, h\}$, $i \in X$. In formulating a satisficing game, we are free to select an arbitrary structure for the praxeic network and to specify the conditional utilities as we see fit. We are then constrained only to follow the rules of probability in computing the marginal utilities that determine the players' behavior. Thus, the formulation of a satisficing game is a process of "designing" the conditional structure and examining the results to see whether the players' behavior makes sense. First, we give conceptual definitions of the selectability and rejectability preferences, which we will further clarify as we mathematically define the players' social utilities. What do we mean by "benefits" and "costs" for the players in the stag hunt? In our treatment, we consider selectability in terms of successful cooperation. To the extent that stag hunting can be successful, the selecting self prefers to hunt stag. We associate rejectability with the raw opportunity cost of an action, tempered by risk aversion. The opportunity cost of hunting a hare is the forgone payoff for catching a stag, and the opportunity cost of hunting a stag is the forgone payoff for catching a hare. Next, we define the interconnections between the four selves and form the praxeic network. Our model is illustrated in Fig. 5. In addition to the vertices that correspond to the selecting and rejecting selves, we include a vertex that corresponds to a binary random variable $\theta_s$, which accounts for the possibility of failure: even if both players hunt stag, success is not certain.
We use $\theta_s = 1$ to denote that a successful stag hunt is possible and $\theta_s = 0$ to denote that stag hunting will result in failure. To define the rejectability function for each agent, we must first define a normalized measure of opportunity cost. Let $\phi_i^s$ and $\phi_i^h$ denote the raw utility (in arbitrary units) of consuming a stag and a hare, respectively. By normalizing, the relative utility of hare hunting becomes $\mu_i = \phi_i^h/(\phi_i^h + \phi_i^s)$ for $i = 1, 2$. The relative utility of stag hunting is then $1 - \mu_i$. Given this definition, we may let $\phi_i^s = 4$ and $\phi_i^h = 3$, i.e., the payoff values in Table I, resulting in $\mu_i = 3/7$. However, we further wish to take into account the temperament of the players. As discussed in Section III, one central issue in the stag hunt is to determine what players of different risk-aversion levels should do. Therefore, we introduce a parameter $\rho_i$ that expresses the degree of player i's risk aversion. A player with $\rho_i = 1$ is risk neutral, a player with $\rho_i > 1$ is risk averse, and a player with $\rho_i < 1$ is payoff seeking and tends to ignore risk. We then define $\mu_i = \rho_i \phi_i^h/(\phi_i^h + \phi_i^s)$. Thus, $\mu_i$ reflects both a player's willingness to take risks and the relative utility of a stag and a hare. A maximally risk-averse player will hunt stag only if success is certain, whereas a fully payoff-seeking player will hunt stag regardless of the odds. To ensure a meaningful game, we still require that neither player ever prefers a hare to a stag, i.e., $\mu_i < 1/2$ for $i = 1, 2$. For convenience, we will simply refer to $\mu_i$ as player i's risk-aversion level, which parameterizes the player's attitudes. We define each player's rejectability function as

$$p_{R_i}(u_i) = \begin{cases} \mu_i, & \text{for } u_i = s \\ 1 - \mu_i, & \text{for } u_i = h \end{cases} \qquad (7)$$

which is an expression of the normalized opportunity cost of each action. The cost of hunting a stag is the relative hare-hunting utility, and vice versa.
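The risk-aversion level and the rejectability function (7) can be sketched directly (function names are ours, for illustration):

```python
# Risk-aversion level mu_i = rho_i * phi_h / (phi_h + phi_s) and the
# rejectability mass function of eq. (7).

def risk_aversion(rho, phi_s=4.0, phi_h=3.0):
    """mu_i for temperament rho_i; Table I gives phi_s = 4, phi_h = 3."""
    mu = rho * phi_h / (phi_h + phi_s)
    if not mu < 0.5:
        raise ValueError("require mu < 1/2: no player may prefer hare to stag")
    return mu

def rejectability(mu):
    """Eq. (7): p_R(s) = mu, p_R(h) = 1 - mu."""
    return {'s': mu, 'h': 1.0 - mu}

mu = risk_aversion(rho=1.0)   # risk-neutral player: mu = 3/7
p_R = rejectability(mu)       # normalized opportunity costs of s and h
```

A risk-neutral player thus assigns rejectability $3/7$ to stag hunting and $4/7$ to hare hunting, mirroring the normalized Table I utilities.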
Note that the players' rejecting selves do not depend on others' preferences, which allows us to define their marginal utilities directly. We next define the conditional distribution for $\theta_s$. The distribution of this random variable, which is conditioned on both players' rejecting selves, represents the probability that the players will successfully hunt stag. The distribution of $\theta_s$ incorporates both whether $R_1$ and $R_2$ reject cooperation and how likely the players are to catch a stag if they cooperate. We model the latter consideration by defining $0 \le \sigma \le 1$, which represents the probability of catching a stag, given that the players cooperate. It may reflect the number of stag in the environment, the players' hunting skills, or other external factors. If $R_1$ and $R_2$ reject hare hunting altogether, then the players will cooperate and successfully capture a stag with probability $\sigma$. We characterize this condition by defining

$$p_{\theta_s|R_1 R_2}(\vartheta_s|h, h) = \begin{cases} \sigma, & \text{for } \vartheta_s = 1 \\ 1 - \sigma, & \text{for } \vartheta_s = 0 \end{cases} \qquad (8)$$

where $\theta_s$ denotes the random variable, and $\vartheta_s$ denotes its realization. If, however, either player unilaterally rejects stag hunting, the probability of catching a stag is zero, yielding

$$p_{\theta_s|R_1 R_2}(\vartheta_s|s, s) = p_{\theta_s|R_1 R_2}(\vartheta_s|s, h) = p_{\theta_s|R_1 R_2}(\vartheta_s|h, s) = \begin{cases} 0, & \text{for } \vartheta_s = 1 \\ 1, & \text{for } \vartheta_s = 0. \end{cases} \qquad (9)$$

Notice that the players' preferences influence the probability of a random event, as discussed in Section II-B. Since the players' rejecting preferences affect their willingness to hunt stag, this conditional structure is justifiable. We compute the marginal mass function by summing over the conditioning variables, yielding

$$p_{\theta_s}(\vartheta_s) = \sum_{v_1, v_2} p_{\theta_s|R_1 R_2}(\vartheta_s|v_1, v_2)\, p_{R_1}(v_1)\, p_{R_2}(v_2) = \begin{cases} \sigma(1 - \mu_1)(1 - \mu_2), & \text{for } \vartheta_s = 1 \\ 1 - \sigma(1 - \mu_1)(1 - \mu_2), & \text{for } \vartheta_s = 0. \end{cases}$$
(10)

Based on (10), we see that, as the risk-aversion levels decrease, the probability of a successful stag hunt increases. If both players are completely payoff seeking ($\mu_1 = \mu_2 = 0$), the probability of a successful stag hunt is $\sigma$. Either player can reduce the chances of a successful hunt: as the risk aversion $\mu_i$ of either player increases, the probability of a successful stag hunt decreases.

Finally, we define the conditional selectability. Each player's selectability is influenced by the probability of a successful stag hunt. The selectability, as we have previously discussed, is tied to the benefits of cooperation: to the extent that a successful stag hunt is possible ($\theta_s = 1$), selectability favors stag hunting. The higher the probability of a successful stag hunt, the more beneficial it is to hunt stag. The corresponding conditional selectability function is

$$p_{S_i|\theta_s}(u_i|\vartheta_s) = \begin{cases} 1, & \text{for } u_i = s,\ \vartheta_s = 1 \\ 0, & \text{for } u_i = h,\ \vartheta_s = 1 \\ 0, & \text{for } u_i = s,\ \vartheta_s = 0 \\ 1, & \text{for } u_i = h,\ \vartheta_s = 0. \end{cases} \qquad (11)$$

The simple form of the conditionals allows us to express the marginal selectability as

$$p_{S_i}(u_i) = \begin{cases} \sigma(1 - \mu_1)(1 - \mu_2), & \text{for } u_i = s \\ 1 - \sigma(1 - \mu_1)(1 - \mu_2), & \text{for } u_i = h. \end{cases} \qquad (12)$$

A. Satisficing Rectangle

With all of the social utilities defined, we have completely characterized the players' utilities and can solve for the pure-strategy profiles that form the satisficing rectangle. As discussed in Section II, the satisficing rectangle is the set of pure-strategy profiles that are simultaneously satisficing to each player.

[Fig. 6. Satisficing rectangle regions for the stag hunt.]

In Fig. 6, we set $q = 1$ and plot the regions of the satisficing rectangle as functions of $\mu_1$ and $\mu_2$, which specify the players' attitudes. There are four possibilities. When both players have low risk aversion, $(s, s)$ is the unique strategy profile in the satisficing rectangle. If risk aversion is high in both players, $(h, h)$ results.
In the $(s, h)$ and $(h, s)$ regions, however, one player is strongly risk averse, whereas the other strongly seeks payoff, so one player tries to cooperate while the other does not. On the boundaries of the four regions, the satisficing rectangle contains multiple strategy profiles. These last two regions illustrate a unique feature of satisficing models. In the $(s, h)$ and $(h, s)$ regions, one player chooses to hunt hare, whereas the other player, who is aware of the first player's increased risk aversion, nevertheless stands by its post and attempts to hunt stag. Such dysfunctional behavior is a consequence of the structure of the utilities: the players' utilities depend on others' attitudes rather than on the strategies that they play. We hasten to note that dysfunctional behavior is not a failure per se of the satisficing model. Dysfunctional societies exist in practice, and we may interpret these regions as an acknowledgement that players with incompatible attitudes may act incoherently. However, in designing artificial systems, we typically prefer to avoid incoherent behaviors, regardless of whether they are sociologically justifiable. It seems unreasonable that incompatible players would continue to exhibit the same attitudes and enact the same incoherent strategies. Thus, we introduce the attitude dynamics, which provides a way for players to adapt their attitudes and avoid such dysfunctional behavior.

V. ATTITUDE DYNAMICS

To introduce the attitude equilibrium and the attitude dynamics, we first embellish the structure of the satisficing game. We endow each player with a classical utility function that is based solely on the strategy profile that the players implement.

Definition 1: An augmented satisficing game is a 5-tuple $(X, U, p_{S_1 \cdots S_n R_1 \cdots R_n}, A, \pi(u))$. The first three elements
are the set of players, the pure-strategy space, and the interdependence function, as before. In addition, we introduce the pure-attitude space $A = A_1 \times A_2 \times \cdots \times A_n$, which contains the attitudes that the players may exhibit. These attitudes are parameters in the players' social utilities and are different for each satisficing game. We also introduce $\pi(u)$, a vector payoff function that describes the raw payoff to the players for implementing the pure-strategy profile $u \in U$. To augment a satisficing game, the players' attitudes must be specified as distinct parameters in the players' social utilities. Furthermore, we must construct a raw payoff function that is separate from the social utilities. Constructing raw payoff functions may be difficult in practice. In a system of artificial agents, for example, the agents' objectives may be sufficiently complicated that it is impossible to define a simple payoff function for each agent. In a simple game like the stag hunt, the extension is straightforward. Each player's attitude is given by the risk-aversion level $\mu_i$, yielding a pure-attitude space of $A = [0, 1/2) \times [0, 1/2)$. The payoff function $\pi(u)$ is described by the payoff matrix in Table I. The augmented satisficing game describes a two-step mapping from attitudes to payoffs. The social utilities, which are determined by the interdependence function, map the players' attitudes to pure-strategy profiles.⁶ The payoff function then maps the pure-strategy profile to raw payoffs. Thus, in an augmented satisficing game, we may evaluate the raw utility of exhibiting a particular attitude. To simplify the notation, we will occasionally refer to $\pi(a)$, i.e., the payoff to the players for implementing the pure-strategy profile determined by the pure-attitude profile $a \in A$.
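The first step of this mapping, from attitudes to a strategy profile, can be sketched as follows. The region logic is our reconstruction, assuming the satisficing decision rule of Section II (not reproduced in this excerpt): a strategy $u$ is satisficing for player i when $p_{S_i}(u) \ge q\, p_{R_i}(u)$.

```python
# Attitudes (mu1, mu2) -> strategy profile via the satisficing rectangle.
# Assumption (ours): u is satisficing for player i iff p_S(u) >= q * p_R(u).

def selectability_s(mu1, mu2, sigma):
    """Marginal selectability of stag, eq. (12): sigma (1-mu1)(1-mu2)."""
    return sigma * (1 - mu1) * (1 - mu2)

def strategy_profile(mu1, mu2, sigma=1.0, q=1.0):
    """Generically unique profile in the satisficing rectangle (Fig. 6)."""
    p_s = selectability_s(mu1, mu2, sigma)
    u1 = 's' if p_s >= q * mu1 else 'h'   # compare p_S(s) with q * p_R(s)
    u2 = 's' if p_s >= q * mu2 else 'h'
    return u1 + u2

# Low mutual risk aversion cooperates, high mutual risk aversion defects,
# and mismatched attitudes land in a dysfunctional mixed region.
assert strategy_profile(0.05, 0.05) == 'ss'
assert strategy_profile(0.45, 0.45) == 'hh'
assert strategy_profile(0.05, 0.49) == 'sh'
```

Under this reconstruction the payoff-seeking player attempts to hunt stag even when its risk-averse partner defects, reproducing the dysfunctional $(s, h)$ and $(h, s)$ regions of Fig. 6.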
That is, we may think of an augmented satisficing game as a noncooperative game in which the players' payoffs are determined by the attitudes that they exhibit rather than the strategies that they play. We may also discuss mixed attitudes, which are probability distributions over the attitudes that the players exhibit. Denoting the cardinality of $A_i$ as $k_i$, the mixed attitude of player i is given by a (normalized and nonnegative) $k_i$-dimensional vector $z_i$. The discussion of mixed strategies in Section III-A applies directly to mixed attitudes. We assume that the players' mixed attitudes are probabilistically independent of each other. We denote player i's mixed-attitude simplex by $\Delta_i^a$. The mixed-attitude space is the Cartesian product $\Theta^a = \Delta_1^a \times \Delta_2^a \times \cdots \times \Delta_n^a$. A mixed-attitude profile is a vector of mixed attitudes $z = (z_1, z_2, \ldots, z_n) \in \Theta^a$. Since the players' mixed attitudes are independent, the probability that a pure-attitude profile is exhibited is equal to the product of the associated probabilities. Thus, player i's expected utility $u_i(z)$ when the players exhibit the mixed-attitude profile $z \in \Theta^a$ is

$$u_i(z) = \sum_{a \in A} \pi_i(a) \prod_{j=1}^{n} z_{j a_j} \qquad (13)$$

where $z_{j a_j}$ is the probability with which player j exhibits the pure attitude $a_j$. Now, given complete knowledge of the satisficing game and the other players' utilities, a player may consider changing its attitudes to increase its expected utility, which motivates the attitude equilibrium.

⁶We have glossed over the fact that, in general, the satisficing rectangle contains multiple pure-strategy profiles. For the stag hunt, this fact presents no problem, because the satisficing rectangle contains a single strategy profile almost everywhere. We will assume that, if necessary, the players employ a tie-breaking mechanism to select a unique strategy profile.

Definition 2: An attitude equilibrium is a mixed-attitude profile $z^* \in \Theta^a$ such that

$$u_i(z_1^*, \ldots, z_i^*, \ldots, z_n^*) \ge u_i(z_1^*, \ldots, z_i, \ldots, z_n^*) \qquad (14)$$

for each $z_i \in \Delta_i^a$ and for each $i \in X$.

The definition of the attitude equilibrium is essentially identical to that of the Nash equilibrium: no player can improve its expected utility by exhibiting a different mixed attitude. In fact, we may say that the attitude equilibrium is an equilibrium in the players' attitudes rather than in their strategies. Because of the analogy between the attitude equilibrium and the Nash equilibrium, many theoretical results apply.

Theorem 1: An attitude equilibrium exists for every augmented satisficing game with finite attitude spaces.

Proof: This result relies on the fact that any augmented satisficing game defines a classical noncooperative game in which $X$ is the set of players, $A$ takes the role of the pure-strategy space, and $\pi(a)$ is the payoff function. In [3], it is shown that any noncooperative game with a finite pure-strategy space has at least one Nash equilibrium, although it may exist only in mixed strategies. An attitude equilibrium is simply a Nash equilibrium in the players' attitudes; thus, one must exist for any augmented satisficing game with a finite pure-attitude space, although it may exist only in mixed attitudes.

Note that a finite attitude space is a sufficient, but not necessary, condition for the existence of an attitude equilibrium. For the stag hunt, although the attitude spaces are continuous, it is immediate that attitude equilibria exist in pure attitudes.

[Fig. 7. Attitude equilibrium regions for the stag hunt.]

In Fig. 7, the attitude equilibria are shown for several values of $\sigma$. If the players' pure-attitude profile lies in these regions, there is no incentive for either player to change attitudes. Consider the $(s, s)$ region of the satisficing rectangle. Here, both players receive the maximum payoff, and there is no incentive for either player to deviate.
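For finite attitude grids, the equilibrium condition of Definition 2 can be checked by brute force; for pure-attitude profiles it suffices to test pure-attitude deviations. The sketch below (ours) does this for the stag hunt at $\sigma = 1$, again using our reconstructed satisficing rule $p_S(u) \ge p_R(u)$.

```python
# Brute-force attitude-equilibrium check on a quantized attitude grid.
SIGMA = 1.0
GRID = [i / 20 for i in range(10)]          # pure attitudes in [0, 1/2)

def strategies(mu1, mu2):
    """Satisficing strategy pair (our reconstruction; see Fig. 6)."""
    p_s = SIGMA * (1 - mu1) * (1 - mu2)     # marginal selectability of s
    return ('s' if p_s >= mu1 else 'h',
            's' if p_s >= mu2 else 'h')

def payoff1(mu1, mu2):
    """Raw payoff to player 1 (Table I; (s, s) scaled by sigma)."""
    u1, u2 = strategies(mu1, mu2)
    if u1 == 's':
        return 4.0 * SIGMA if u2 == 's' else 0.0
    return 3.0

def is_attitude_equilibrium(mu1, mu2):
    """Definition 2 restricted to pure attitudes on the grid."""
    best1 = all(payoff1(m, mu2) <= payoff1(mu1, mu2) for m in GRID)
    best2 = all(payoff1(m, mu1) <= payoff1(mu2, mu1) for m in GRID)
    return best1 and best2

assert is_attitude_equilibrium(0.0, 0.0)        # mutual payoff seeking
assert not is_attitude_equilibrium(0.45, 0.45)  # (h, h) fails at sigma = 1
```

Consistent with Fig. 7, at $\sigma = 1$ a strongly risk-averse $(h, h)$ profile is not an equilibrium: either player can lower its own $\mu$ and pull the pair into $(s, s)$ for a payoff of 4 instead of 3.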
Notice, however, that only part of the $(h, h)$ region is an equilibrium: when player i's risk aversion $\mu_i$ is sufficiently low, it is possible for player j to move the group from mutual hare hunting to stag hunting by lowering its own $\mu_j$. Although $(h, h)$ is an equilibrium of the classical game, the satisficing model gives the players greater influence over each other's behavior, increasing the possibility for cooperation. As $\sigma$ increases, the size of the $(h, h)$ equilibrium region decreases, disappearing entirely when $\sigma = 1$. Finally, notice that the dysfunctional regions $(s, h)$ and $(h, s)$ contain no equilibria. In these regions, each player can improve its payoff by changing $\mu_i$ and forcing the game into either $(s, s)$ or $(h, h)$. The attitude equilibrium concept provides a useful juxtaposition of satisficing game theory and individual rationality: the social structure of the satisficing model decreases the attraction of mutual hare hunting, while the introduction of the classical payoff function gives players an incentive to adapt their attitudes and avoid the dysfunctional behaviors of the $(s, h)$ and $(h, s)$ regions.

If a large population of players adapts by trial-and-error experimentation, we can model the evolution of the players' attitudes by a straightforward application of the standard replicator dynamics. We again restrict our attention to symmetric two-player games. Thus, both players are described by the pure-attitude set $A$ and the payoff function $\pi(a)$. We require that $A$ is finite, and we denote the cardinality of $A$ as $m$. Define a normalized vector $z(t) = (z_1(t), z_2(t), \ldots, z_m(t))$, where $z_i(t)$ represents the population share that exhibits the ith pure attitude.
Similar to traditional games, we may describe the dynamics of the population shares by a system of m differential equations, i.e.,

$$\dot{z}_i(t) = \left[\pi(i, z(t)) - \pi(z(t), z(t))\right] z_i(t). \qquad (19)$$

By analogy with the standard formulation, $\pi(i, z(t))$ is the expected payoff for exhibiting the ith attitude against a random sample from the population, and $\pi(z(t), z(t)) = \sum_i \sum_j \pi(i, j) z_i(t) z_j(t)$ is the average expected payoff. Let $\Delta_A$ be the mixed-attitude simplex of $A$. As with mixed strategies, the interior of $\Delta_A$ is the set of all mixed attitudes that give nonzero probability to each pure attitude.

Theorem 2: Let $\xi(t, z(0))$ denote the solution of the attitude dynamics in (19) at time t with initial conditions $z(0)$. If $z(0) \in \mathrm{int}(\Delta_A)$ and $\lim_{t\to\infty} \xi(t, z(0)) = z^*$, then $z^*$ is an attitude equilibrium.

Proof: This result follows directly from the fact that an augmented satisficing game can be considered a classical game in which players choose attitudes rather than strategies. As mentioned in Section III-A, it is shown in [32] that, when the replicator dynamics is initialized with a mixed strategy in the interior of the mixed-strategy simplex, any steady state to which it converges is a Nash equilibrium. An attitude equilibrium is a Nash equilibrium in the players' attitudes; thus, the result holds for the attitude dynamics.

Note that Theorem 2 does not guarantee that a steady state will occur, even under well-behaved initial conditions. Rather, if a steady state results under suitable initial conditions, it must be an attitude equilibrium.

VI. RESULTS

A. Attitude Dynamics

To apply the attitude dynamics, we first quantize the values that $\mu$ may assume. Define $A = \{\nu_1, \nu_2, \ldots, \nu_{100}\}$, a set of 100 evenly spaced values of $\mu$ over $[0, 1/2)$. We initialize the population shares z according to an exponential distribution so that most players hunt hare, i.e., $z_i(0) \propto e^{-\lambda((1/2) - \nu_i)}$.

[Fig. 8. Joint attitude distribution for σ = 1 and λ = 10. (a) t = 0. (b) t = 30.]
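A minimal numerical sketch of (19) and of the exponential initialization above follows (forward Euler; names are ours). The payoff matrix over pure attitudes is left abstract so that any augmented game can be plugged in.

```python
import math

def attitude_step(z, pi, dt=0.1):
    """One forward-Euler step of eq. (19); pi[i][j] is pi(i, j)."""
    m = len(z)
    fit = [sum(pi[i][j] * z[j] for j in range(m)) for i in range(m)]  # pi(i, z)
    avg = sum(fit[i] * z[i] for i in range(m))                        # pi(z, z)
    return [z[i] + dt * (fit[i] - avg) * z[i] for i in range(m)]

def initial_shares(nu, lam):
    """z_i(0) proportional to exp(-lam * (1/2 - nu_i)), normalized."""
    w = [math.exp(-lam * (0.5 - v)) for v in nu]
    total = sum(w)
    return [x / total for x in w]

nu = [0.5 * i / 100 for i in range(100)]   # 100 evenly spaced attitudes
z0 = initial_shares(nu, lam=10)            # mass concentrated at high mu
```

Note that the Euler step preserves the simplex exactly: the share increments sum to $dt\,(\pi(z,z) - \pi(z,z)) = 0$, so no renormalization is needed.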
As we set $\lambda$ higher, the initial population is more risk averse and less willing to hunt stag. We use the payoff matrix in Table I to determine the raw payoff for exhibiting a particular pure-attitude profile $a = (\mu_1, \mu_2) \in A \times A$. If a is in the $(h, h)$ region of the satisficing rectangle (see Fig. 6), then the payoff to the first player is $\pi(\mu_1, \mu_2) = 3$. Similarly, the payoffs are $\pi(\mu_1, \mu_2) = 3$ and $\pi(\mu_1, \mu_2) = 0$ if a belongs to the $(h, s)$ and $(s, h)$ regions, respectively. Finally, $\pi(\mu_1, \mu_2) = 4\sigma$ if a is in the $(s, s)$ region.⁷

⁷We multiply by σ to account for the probability that the players succeed, given that they both hunt stag.

Because of the high dimensionality of the state space and the complexity of the players' utility functions, it is difficult to analyze the attitude dynamics in closed form. We cannot easily solve for stationary points or say much about the relative sizes of the basins of attraction, as we could under the (much simpler) standard replicator dynamics. Fortunately, we can specify meaningful initial conditions and numerically approximate the solution of the system of differential equations. We examine several scenarios in which the vast majority of the population hunts hare and discuss when it is possible to evolve a cooperative community.

First, we examine the dynamics with $\sigma = 1$. We initialize the population with $\lambda = 10$, so that more than 85% of the population hunts hare. Fig. 8(a) shows the initial joint probability mass function of the players and the four regions of the satisficing rectangle. The vertical axis shows the joint probability that a pair of players randomly selected from the population will end up at a particular point.
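Putting the pieces together, the following sketch (ours; the region logic is again the reconstructed rule $p_S \ge p_R$) quantizes $\mu$, builds the raw payoff matrix $\pi(\nu_i, \nu_j)$ described above, and Euler-integrates the attitude dynamics from the exponential initial condition. At $\sigma = 1$, the least risk-averse attitude reaches $(s, s)$ against every partner and so earns the maximal payoff $4\sigma$; its share therefore grows monotonically, which is the mechanism behind the convergence shown in Fig. 8.

```python
import math

SIGMA, LAM, M = 1.0, 10.0, 100
NU = [0.5 * i / M for i in range(M)]       # quantized attitudes in [0, 1/2)

def profile(mu1, mu2):
    """Satisficing strategy pair (reconstructed rule p_S >= p_R)."""
    p_s = SIGMA * (1 - mu1) * (1 - mu2)
    return ('s' if p_s >= mu1 else 'h', 's' if p_s >= mu2 else 'h')

def raw_payoff(mu1, mu2):
    """pi(mu1, mu2): 4*sigma for (s, s); 3 when player 1 hunts hare; else 0."""
    u1, u2 = profile(mu1, mu2)
    if u1 == 'h':
        return 3.0
    return 4.0 * SIGMA if u2 == 's' else 0.0

PI = [[raw_payoff(a, b) for b in NU] for a in NU]

# Exponential initial condition: most of the population is risk averse.
w = [math.exp(-LAM * (0.5 - v)) for v in NU]
total = sum(w)
z = [x / total for x in w]
z0_start = z[0]

for _ in range(200):                       # Euler steps of the dynamics (19)
    fit = [sum(PI[i][j] * z[j] for j in range(M)) for i in range(M)]
    avg = sum(f * x for f, x in zip(fit, z))
    z = [x + 0.1 * (f - avg) * x for f, x in zip(fit, z)]
```

After the run, the share of the most payoff-seeking attitude has grown at the expense of the risk-averse bulk, in line with the drift toward $(s, s)$ reported below.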
The players are drawn randomly and independently from the infinite population; thus, the joint probability is the product of the marginal probabilities given by z. That is, $\Pr(\mu_1 = \nu_i, \mu_2 = \nu_j) = z_i(t) z_j(t)$. Initially, almost all of the joint probability mass is in the mutual hare-hunting region. The dynamics, however, quickly pushes the population toward stag hunting. Within 30 iterations, almost the entire population is in the mutual stag-hunting region, with the most common values of $(\mu_i, \mu_j)$ being close to zero [see Fig. 8(b)]. This result is due to the fact that mutual cooperation is the only attitude equilibrium when $\sigma = 1$. For any positive finite $\lambda$, all steady-state population distributions lie entirely within the $(s, s)$ region.

Next, we lower $\sigma$ to see how the dynamics changes. Keeping the initial conditions the same, we let $\sigma = 0.925$, introducing the $(h, h)$ attitude equilibrium region. Now, more than 90% of the initial population hunts hare. This scenario yields a highly interesting result. The hare-hunting equilibrium initially dominates, and the population shares associated with the stag-hunting regions quickly diminish [see Fig. 9(a)]. We notice, however, that there are small migrations toward the boundaries of the decision regions. These players still predominantly hunt hare, but they are less risk averse. As evolution continues, a small concentration of players emerges around the boundaries of the four regions, as illustrated in Fig. 9(b). Players in this region are quite versatile: they hunt hare with risk-averse players, hunt stag with the payoff seekers, and only very rarely end up trying to hunt stag with a player who refuses to cooperate. This concentration of players slowly begins to dominate, causing many more players to hunt stag. Fig. 9(c) shows the population at t = 100. By this time, essentially all of the population is composed of moderately risk-averse but versatile players.
This truly emergent result provides an interesting insight into what constitutes "fitness" in a social system. In an uncertain scenario where both hare hunting and stag hunting are potentially dominant strategies, the most successful players are those who are flexible, i.e., players who can adapt their actions to the preferences of those around them. If we lower $\sigma$ much below 0.925, the dynamics fails to evolve the society toward cooperation for these initial conditions. This happens for two reasons: 1) the size of the $(s, s)$ region shrinks with decreasing $\sigma$, and 2) the expected payoff for exhibiting attitudes in the $(s, s)$ region decreases. However, even under unfavorable conditions in which a pair of stag hunters might fail, the satisficing model can evolve cooperation from noncooperation. In the satisficing model, cooperation can emerge even when less than 10% of the initial population hunts stag, a significant improvement over the standard replicator model, where more than 75% must initially hunt stag.

B. Spatial Evolutionary Models

For comparison, we also consider the stag hunt under the spatial evolutionary models in [15], [16], and [28]–[30], which have proven effective in promoting cooperation in social dilemmas. In [15], the stag hunt is studied particularly in terms of the relative benefit of mutual stag hunting.

[Fig. 9. Joint attitude distribution for σ = 0.925 and λ = 10. (a) t = 5. (b) t = 50. (c) t = 100.]

Here, we examine the question in terms of the initial population: What fraction of the population must initially hunt stag for cooperation to flourish? Spatial evolutionary models are described by undirected graphs, where each vertex represents a player, and each edge represents a social link between two players. As with the replicator dynamics, each player is preprogrammed to play a particular pure strategy.
However, in the spatial dynamics, a player may change strategies depending on the relative fitness of its neighbors. At each generation, players accrue payoff by playing a single instance of the game with each neighbor. After play, each player randomly selects a neighbor (possibly itself) with a probability proportional to the payoff accrued in the current round, adopting that player's pure strategy for the next round. We may interpret the spatial dynamics as an imitation dynamics, where a player imitates the behavior of its neighbors, or as a death–birth dynamics, where players "die" and give rise to a new generation whose strategies depend on the neighbors' relative fitness. Regardless of interpretation, for fully connected graphs, the dynamics converges to the standard replicator dynamics as the population size becomes large and the time between generations becomes small.

In the stag hunt, let $N_s(i)$ and $N_h(i)$ be the sets of player i's neighbors (including itself) that hunt stag and hare, respectively, and let $F(i)$ denote the payoff earned by player i during a single generation. Letting $|\cdot|$ denote the cardinality of a set, player i earns $F(i) = 4(|N_s(i)| - 1)$ if it hunts stag and $F(i) = 3(|N_s(i)| + |N_h(i)| - 1)$ if it hunts hare.⁸ Next, define $F_s(i) = \sum_{j \in N_s(i)} F(j)$ and $F_h(i) = \sum_{j \in N_h(i)} F(j)$, i.e., the respective sum payoffs of the stag- and hare-hunting neighbors. Finally, a neighbor is selected with a probability proportional to its fitness; thus, player i hunts stag during the next generation with probability $F_s(i)/(F_s(i) + F_h(i))$. The spatial dynamics is highly dependent on the structure of the graph used to model the population.
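The update rule above is easy to state in code. The sketch below (ours; the graph and names are illustrative) computes $F(i)$ and the next-round stag probability $F_s/(F_s + F_h)$; `adj[i]` lists i's neighbors excluding i itself, so the stag hunter's payoff $4(|N_s(i)| - 1)$ is simply 4 per stag-hunting neighbor.

```python
def fitness(i, adj, strat):
    """F(i): payoff from one game against each neighbor (Table I)."""
    if strat[i] == 's':
        return 4.0 * sum(strat[j] == 's' for j in adj[i])  # 4(|N_s(i)| - 1)
    return 3.0 * len(adj[i])                 # 3(|N_s(i)| + |N_h(i)| - 1)

def stag_probability(i, adj, strat):
    """Probability that player i hunts stag next round: F_s / (F_s + F_h)."""
    nbhd = adj[i] + [i]                      # N_s(i) and N_h(i) include i
    f_s = sum(fitness(j, adj, strat) for j in nbhd if strat[j] == 's')
    f_h = sum(fitness(j, adj, strat) for j in nbhd if strat[j] == 'h')
    total = f_s + f_h
    return f_s / total if total > 0 else 0.0

# Triangle of stag hunters: imitation keeps everyone hunting stag.
tri = [[1, 2], [0, 2], [0, 1]]
p = stag_probability(0, tri, ['s', 's', 's'])   # -> 1.0
```

On a path graph with a lone hare hunter between two isolated stag hunters, the hare hunter earns 6 while its neighbors earn 0, so it keeps hunting hare with probability 1, illustrating how sparse connectivity shapes the outcome.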
We construct our graphs according to the so-called "scale-free" models [36], in which the number of neighbors follows a power-law distribution. If $K_i$ is the random variable that describes the number of neighbors of player i, then each $K_i$ is independently and identically distributed according to $p_{K_i}(k) \propto k^{\gamma}$ for some constant $\gamma$. This distribution describes a heterogeneous and realistic model of social connectivity: many players have only a few neighbors, whereas a few players are heavily connected to the rest of the population. Scale-free models have been shown to improve the possibility of cooperation in social dilemmas [15].

⁸The (−1) term in each payoff accounts for the fact that, although $N_s(i)$ or $N_h(i)$ includes player i, the player does not pair with itself during play.

To evaluate the performance of the spatial dynamics, we construct graphs with 50 players, an average number of connections per player $z = E(K)$, and an initial fraction $x_s(0)$ of the population hunting stag. For each $(x_s(0), z)$ pair, we construct ten graphs, each of which is seeded with ten initial populations. After running the dynamics for 5000 generations, we record the steady-state behavior by averaging the fraction of stag hunters over an additional 500 generations.

[Fig. 10. Average steady-state stag-hunting fraction under spatial evolutionary dynamics.]

Fig. 10 shows the average results of our trials. For moderately low values of z, the spatial dynamics considerably improves the possibility for cooperation: a sizeable fraction of the steady-state population hunts stag even when only a quarter of the initial population cooperates. This result is consistent with previous studies of cooperation in spatial networks [15], [16]. When the average number of connections is small, cooperation emerges more readily. However, compared with the attitude dynamics, stag hunting does not come to dominate the population unless a solid majority of players initially cooperate.

VII.
CONCLUSION

In this paper, we have extended the theory of satisficing games by incorporating elements from noncooperative game theory. We augment the satisficing game with a standard utility function that describes the raw payoff to a player for exhibiting particular attitudes. The augmented framework yields an attitude equilibrium, in which no single player can improve its raw payoff by exhibiting different attitudes. The attitude equilibrium combines the merits of both satisficing and noncooperative game theory. The conditional utility structure allows players to consider others' preferences in making decisions, and the standard payoff function allows players to adapt their attitudes to avoid dysfunctional behavior.

The noncooperative elements of augmented satisficing games have allowed us to employ evolutionary game theory, in which adaptation occurs by trial and error. We define an attitude dynamics by applying the standard replicator dynamics to the attitudes that the players exhibit rather than the strategies that they play. The attitude dynamics models the evolution of players' attitudes according to the game and the attitudes of other players. Given appropriate initial conditions, the steady state of the dynamics is an attitude equilibrium.

We have presented a satisficing model for the stag hunt, a game in which it is difficult to evolve a cooperative population. Under the augmented satisficing framework, dysfunctional behavior vanishes: the attitude equilibria lie entirely within the regions where players either mutually hunt stag or mutually hunt hare. In addition, the attitude dynamics facilitates the evolution of cooperation by introducing strategic complexity into the dynamics. Instead of simply choosing whether to hunt stag, a player chooses a risk-aversion level, which governs its interaction with the rest of the population.
Under a wide variety of circumstances, the dynamics encourages the population to become less risk averse, allowing cooperation to flourish. Our results significantly outperform other evolutionary methods, including classic replicator models and recently proposed spatial evolutionary models. Finally, the theoretical properties borrowed from noncooperative game theory suggest that our results will generalize to large classes of games. In particular, any game with finite attitude spaces must have an attitude equilibrium, and any (properly initialized) steady state of the attitude dynamics is an attitude equilibrium. Although we cannot guarantee any specific results, we expect that the qualitative benefits of our approach will pertain to other games.

REFERENCES

[1] M. S. Nokleby and W. C. Stirling, "The stag hunt: A vehicle for evolutionary cooperation," in Proc. IEEE World Congr. Comput. Intell., Vancouver, BC, Canada, Jul. 2006, pp. 348–355.
[2] M. Nokleby and W. C. Stirling, "Attitude adaptation in satisficing games," in Proc. IEEE Symp. Foundations Comput. Intell., Honolulu, HI, Apr. 2007, pp. 331–338.
[3] J. F. Nash, "Noncooperative games," Ann. Math., vol. 54, no. 2, pp. 286–295, Sep. 1951.
[4] R. D. Luce and H. Raiffa, Games and Decisions. New York: Wiley, 1957.
[5] A. K. Sen, "Rational fools: A critique of the behavioral foundations of economic theory," in Scientific Models and Man, H. Harris, Ed. Oxford, U.K.: Clarendon, 1979, ch. 1.
[6] A. Tversky and D. Kahneman, "Rational choice and the framing of decisions," in Rational Choice, R. M. Hogarth and M. W. Reder, Eds. Chicago, IL: Univ. of Chicago Press, 1986.
[7] E. Sober and D. S. Wilson, Unto Others: The Evolution and Psychology of Unselfish Behavior. Cambridge, MA: Harvard Univ. Press, 1998.
[8] W. C.
Stirling, Satisficing Games and Decision Making: With Applications to Engineering and Computer Science. Cambridge, U.K.: Cambridge Univ. Press, 2003.
[9] J. K. Archibald, J. C. Hill, F. R. Johnson, and W. C. Stirling, “Satisficing negotiations,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 36, no. 1, pp. 4–18, Jan. 2006.
[10] P. Ordeshook, Game Theory and Political Theory: An Introduction. Cambridge, U.K.: Cambridge Univ. Press, 1986.
[11] E. Fehr and S. Gächter, “Altruistic punishment in humans,” Nature, vol. 415, no. 6868, pp. 137–140, Jan. 2002.
[12] C. Wedekind and M. Milinski, “Cooperation through image scoring in humans,” Science, vol. 288, no. 5467, pp. 850–852, May 2000.
[13] K. Tuyls and A. Nowé, “Evolutionary game theory and multiagent reinforcement learning,” Knowl. Eng. Rev., vol. 20, no. 1, pp. 63–90, Mar. 2005.
[14] M. Ruijgrok and T. W. Ruijgrok, Replicator Dynamics With Mutations for Games With a Continuous Strategy Space, 2005. arXiv:nlin/0505032v2.
[15] F. C. Santos, J. M. Pacheco, and T. Lenaerts, “Evolutionary dynamics of social dilemmas in structured heterogeneous populations,” Proc. Nat. Acad. Sci., vol. 103, no. 9, pp. 3490–3494, Feb. 2006.
[16] H. Ohtsuki, C. Hauert, E. Lieberman, and M. A. Nowak, “A simple rule for the evolution of cooperation on graphs and social networks,” Nature, vol. 441, no. 7092, pp. 502–505, May 2006.
[17] J. Maynard Smith, “The theory of games and the evolution of animal conflicts,” J. Theor. Biol., vol. 47, no. 1, pp. 209–221, Sep. 1974.
[18] J. Maynard Smith, Evolution and the Theory of Games. Cambridge, U.K.: Cambridge Univ. Press, 1982.
[19] P. Taylor and L. Jonker, “Evolutionarily stable strategies and game dynamics,” Math. Biosci., vol. 40, no. 2, pp. 145–156, 1978.
[20] B. Skyrms, “The stag hunt,” in Proc. Addresses APA, 2001, vol. 75, pp. 31–41. Presidential Address of the Pacific Division of the American Philosophical Association.
[21] W.
Güth, “An evolutionary approach to explaining cooperative behavior by reciprocal incentives,” Int. J. Game Theory, vol. 24, no. 4, pp. 323–344, Dec. 1995.
[22] W. Güth and H. Kliemt, “The indirect evolutionary approach,” Ration. Soc., vol. 10, no. 3, pp. 377–399, 1998.
[23] H. A. Simon, “A behavioral model of rational choice,” Q. J. Econ., vol. 69, no. 1, pp. 99–118, Feb. 1955.
[24] W. C. Stirling, “Social utility functions—Part 1—Theory,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 35, no. 4, pp. 522–532, Nov. 2005.
[25] I. Steedman and U. Krause, “Goethe’s Faust, Arrow’s possibility theorem and the individual decision maker,” in The Multiple Self, J. Elster, Ed. Cambridge, U.K.: Cambridge Univ. Press, 1985, ch. 8.
[26] J. Pearl, Probabilistic Reasoning in Intelligent Systems. San Mateo, CA: Morgan Kaufmann, 1988.
[27] The Compact Oxford English Dictionary, 2nd ed. J. A. H. Murray, H. Bradley, W. A. Craigie, and C. T. Onions, Eds. Oxford, U.K.: Clarendon, 1991.
[28] M. A. Nowak and R. M. May, “Evolutionary games and spatial chaos,” Nature, vol. 359, no. 6398, pp. 826–829, Oct. 1992.
[29] T. Killingback and M. Doebeli, “Spatial evolutionary game theory: Hawks and Doves revisited,” Proc. R. Soc. Lond. B, vol. 263, no. 1374, pp. 1135–1144, Sep. 1996.
[30] B. Skyrms and R. Pemantle, A Dynamic Model of Social Network Formation, 2004. arXiv:math/0404101v1.
[31] R. Selten, “Spieltheoretische Behandlung eines Oligopolmodells mit Nachfrageträgheit,” Zeitschrift für die gesamte Staatswissenschaft, vol. 121, pp. 301–324, 1965.
[32] J. W. Weibull, Evolutionary Game Theory. Cambridge, MA: MIT Press, 1995.
[33] J. Hofbauer and K. Sigmund, Evolutionary Games and Population Dynamics. Cambridge, U.K.: Cambridge Univ. Press, 1998.
[34] D. Foster and P. Young, “Stochastic evolutionary game dynamics,” Theor. Popul. Biol., vol. 38, no. 2, pp. 219–232, 1990.
[35] A. Cabrales, “Stochastic replicator dynamics,” Int. Econ. Rev., vol. 41, no. 2, pp. 451–482, 2000.
[36] A.
Barabási and R. Albert, “Emergence of scaling in random networks,” Science, vol. 286, no. 5439, pp. 509–512, Oct. 1999.

Matthew Nokleby (S’04) received the B.S. (cum laude) and M.S. degrees in electrical engineering from Brigham Young University, Provo, UT, in 2006 and 2008, respectively. He is currently working toward the Ph.D. degree in electrical engineering at Rice University, Houston, TX. His research interests include game theory and its applications to wireless communications.

Wynn Stirling received the B.A. (magna cum laude) degree in mathematics and the M.S. degree in electrical engineering from the University of Utah, Salt Lake City, in 1969 and 1971, respectively, and the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA, in 1983. From 1972 to 1975, he was with Rockwell International Corporation, Anaheim, CA. From 1975 to 1984, he was with ESL, Inc., Sunnyvale, CA, where he was responsible for the development of multivehicle trajectory reconstruction capabilities. In 1984, he joined the faculty of the Department of Electrical and Computer Engineering, Brigham Young University, Provo, UT, where he is currently a Professor. He is the author or coauthor of more than 70 publications. He is a coauthor of Mathematical Methods and Algorithms for Signal Processing (Prentice-Hall, 2000) and the author of the monograph Satisficing Games and Decision Making: With Applications to Engineering and Computer Science (Cambridge University Press, 2003). His research interests include multiagent decision theory, estimation theory, information theory, and stochastic processes. Dr. Stirling is a member of Phi Beta Kappa and Tau Beta Pi. He has served on the program committees of conferences on imprecise probability theory and multiagent decision theory.