Journal of Cognitive Systems Research 1 (2001) 221–239

Simple games as dynamic, coupled systems: randomness and other emergent properties

Action editor: Ron Sun

Robert L. West a,*, Christian Lebiere b

a Departments of Psychology and Cognitive Science, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario, Canada K1S 5B6
b Human–Computer Interaction Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA

*Corresponding author.

Received 28 September 2000; accepted 3 October 2000

Abstract

From a game theory perspective the ability to generate random behaviors is critical. However, psychological studies have consistently found that individuals are poor at behaving randomly. In this paper we investigated the possibility that the randomness mechanism lies not within the individual players but in the interaction between the players. Provided that players are influenced by their opponent's past behavior, their relationship may constitute a state of reciprocal causation [Cognitive Science 21 (1998) 461], in which each player simultaneously affects and is affected by the other player. The result would be a dynamic, coupled system. Using neural networks to represent the individual players in a game of paper, rock, and scissors, a model of this process was developed and shown to be capable of generating chaos-like behaviors as an emergent property. In addition, it was found that by manipulating the control parameters of the model, corresponding to the amount of working memory and the perceived values of different outcomes, the game could be biased in favor of one player over the other, an outcome not predicted by game theory. Human data were collected, and the results show that the model accurately describes human behavior. The results and the model are discussed in light of recent theoretical advances in dynamic systems theory and cognition. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: Game theory; Dynamic systems; Distributed cognition; Neural networks

1. Introduction

In game theory (von Neumann & Morgenstern, 1944), models of game playing generally involve a tension between choosing the best move and behaving unpredictably.[1] This is because, while certain moves may be better than others (i.e., have a higher payoff or a higher chance of success), if players always choose the best move they will be completely predictable to their opponents. Essentially, game theory is designed to specify the optimal balance between choosing the best move and behaving unpredictably. However, behaving unpredictably requires that the players be able to incorporate a random component into their behavior. Generally, this is modeled by assigning probabilities to the available moves and drawing at random from the resulting distribution, or by weighting the available moves and injecting random noise into the process of selecting between them.

[1] Game theory refers only to imperfect information games. Perfect information games, such as chess, where the best move can in principle be computed, are not covered by game theory.
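To make this standard assumption concrete, the following minimal sketch (ours, not from the paper; Python, with illustrative names) shows a player drawing a move from a mixed strategy, i.e., a probability distribution over the available moves:

```python
import random

def mixed_strategy_move(probs):
    """Draw one move from a mixed strategy given as {move: probability}.
    With the uniform distribution this is the game theory optimum for PRS."""
    moves, weights = zip(*probs.items())
    return random.choices(moves, weights=weights, k=1)[0]

# e.g., the uniform mixed strategy discussed below for paper, rock, scissors
move = mixed_strategy_move({"paper": 1/3, "rock": 1/3, "scissors": 1/3})
```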
But, in either case, there is an assumption that people make use of some sort of internal, randomizing process when playing games. Unfortunately, this key assumption is at odds with the fact that research has demonstrated that individuals are quite bad at behaving randomly (see Tune, 1964, and Wagenaar, 1972, for reviews). This is a serious problem because, if we are to take the standard game theory model seriously, we should expect people to be fairly good randomizers.

2. A dynamic/distributed perspective

In addition to being unsupported by the empirical evidence, the assumption that game players make use of an internal source of randomness supports a view of game players as highly isolated cognitive agents. This can be illustrated using the simple game of paper, rock, and scissors (henceforth, PRS).[2] In PRS, no move is preferred over the others, so the game theory solution is to play randomly: 1/3 paper, 1/3 rock, and 1/3 scissors. Notice that, in order to do this, the players do not need to pay attention to their opponent's moves; they need only pay attention to their own internal sense of randomness. In contrast, recent theoretical advances in cognitive science have shown that complex behaviors can arise from dynamic interactions between mind, body, and the environment (e.g., Bechtel, 1998; Clark, 1997, 1998, 1999; Hutchins, 1995; Port & Van Gelder, 1995).

[2] PRS is a two-player game. On each turn the players choose between the moves paper, rock, and scissors. The choice of moves is displayed simultaneously by the players, usually by means of a simple sign language. The winner for each turn is determined as follows: paper beats rock, rock beats scissors, and scissors beat paper. PRS, or variants of it, occurs in many different cultures under different names.

Applying this theoretical stance to game playing, the question that we pose is whether the function of generating randomness could be attributed to the dynamic interaction between the players rather than to separate mechanisms within the players, and what consequences this would have for how we view simple games. An important clue as to how this could work comes from psychological experiments on how people perform in tasks that involve an element of guessing. Psychological studies have clearly shown that under these conditions people almost invariably adopt the strategy of attempting to detect sequential dependencies. That is, people pay attention to previous results, they search for sequential dependencies, and they use this information in an attempt to predict the next trial. Studies show that when sequential dependencies exist, people are able to exploit them (e.g., Anderson, 1960; Estes, 1972; Restle, 1966; Rose & Vitz, 1966; Vitz & Todd, 1967), and that when they do not exist, people still use this strategy even though it produces sub-optimal results (e.g., Gazzaniga, 1998; Ward, 1973; Ward, Livingston & Li, 1988). For example, in a simple guessing task in which a signal has an 80% chance of appearing on the top part of a computer screen and a 20% chance of appearing on the bottom, people will fruitlessly search for sequential dependencies, resulting in a hit rate of roughly 68%, the rate expected when guessing probabilities match the signal probabilities (0.8 × 0.8 + 0.2 × 0.2 = 0.68), instead of the hit rate of 80% that could be achieved by always choosing the top part of the screen (Gazzaniga, 1998).

Clark (1997, 1998) refers to the situation in which two systems are coupled together in such a way that they drive each other's behavior as reciprocal causation.
In PRS, two players can be considered to be in a state of reciprocal causation if their outputs are influenced by their opponent's past behavior, as is the case if both players are using the strategy of detecting sequential dependencies. For example, player A's behavior would be based on A's beliefs about sequential dependencies in player B's outputs, which would be driven by B's behavior, which in turn is driven by A's behavior, and so on. One possibility suggested by treating the players as a coupled system is that random, or pseudo-random, behavior could be an emergent property of the interaction between them. In a game situation, the players' sequential dependencies (at least from the players' perspective) would change constantly as each player alters his or her outputs to exploit the other's sequential dependencies. Such a process could make it seem as though the players were internally generating random outputs.

In this paper we describe a dynamic model of two players, coupled together in a game of PRS. The model was designed to test the general idea laid out above. However, as Clark (1998) notes, reciprocal causation is often associated with "emergent behaviors whose quality and complexity far exceeds that which either subsystem could display in isolation." And, indeed, this is what we found. Although PRS is a very simple game, the model revealed an intricate interaction between players that was sensitive to cognitive manipulations.

3. Paper, rock and scissors

In terms of real world behaviors, PRS can be considered to represent the elemental game playing skill of guessing what your opponent will do next. From an evolutionary perspective, this skill would have been crucial for survival. For example, consider a cheetah chasing a gazelle. The gazelle can leap forward, to the right, or to the left. Therefore, in order to catch the gazelle, the cheetah must correctly choose whether to pounce straight ahead, to the right, or to the left. Similarly, in PRS you must guess what your opponent will do next in order to win. PRS is also a 'repeated game.' That is, except for the first move, the players have access to their opponent's previous moves (i.e., through memory). More generally, this is usually the case when animals and/or humans square off against each other in predator/prey competitions or in disputes over resources or mates (also in sports such as boxing or basketball). For example, in mammalian mating competitions, when two males face each other there is usually some preliminary movement, seemingly aimed at feeling out the opponent. Aside from ambushes, attacks rarely occur in the absence of some recent history of movement. Thus, although simple, PRS embodies basic and important game playing skills.

In game theory terms, PRS is a very simple, zero-sum game. Zero-sum games are games in which the interests of the players are completely opposed, in contrast to games in which cooperation is an option, such as the prisoner's dilemma. Von Neumann's (1928) minimax theorem shows that for all zero-sum games there is always an optimal strategy. As noted above, for PRS, this strategy is to play randomly: 1/3 paper, 1/3 rock, and 1/3 scissors. However, it is important to understand what is meant by the term 'optimal.' The game theory designation of 'optimal' is very abstract and connected to the idea that rational players are optimal players (see Samuelson, 1997, Chapter 1, for a discussion).
Essentially, the optimal strategy refers to an equilibrium representing the optimal balance between minimizing risk and maximizing gain, assuming that the opponent will also calculate and execute an optimal strategy. The important thing to note is that an optimal strategy will not necessarily maximize your chance of winning if your opponent does not use an optimal strategy. For example, if my PRS opponent plays scissors at an above chance rate, then I should play rock at an above chance rate to maximize my chance of winning. Maximal strategies are designed to maximally exploit specific instances of biased strategies,[3] such as the one described above. The problem with maximal strategies is that they require accurate knowledge of the opponent's biases, and you need to assume that your opponent will not change their strategy in response to your strategy. Game theory has traditionally avoided these issues by assuming there is insufficient information to use a maximizing approach (Samuelson, 1997). However, recent work in behavioral game theory (e.g., see Camerer, 1997, 1999) has begun to examine the effect of players learning about their opponents' strategies during the course of a game and revising their own strategies accordingly. Under these conditions, it is possible for the players to fall into equilibria in which incorrect views of each other's strategies persist, resulting in non-maximal play (e.g., the self-confirming equilibrium; see Fudenberg & Levine, 1998, for a discussion). Conceptually, the general process that we describe for producing random outputs in PRS is somewhat related. The players try to find a maximal strategy and continually fail, but in doing this they can be seen as being at an equilibrium at which they can be described as playing randomly from the point of view of an outside observer.

[3] Note that the term strategy, in this case, refers to all processes that influence the players' outputs. Therefore, an unconscious tendency to be biased would count as a strategy.

4. Description of the model

Our modeling approach was conceptually similar to Hutchins' (1991) method, in which cognitive agents were modeled as very simple neural networks in order to study the emergent properties arising from their interaction. PRS play involves the interaction between two cognitive agents. For purposes of modeling, these interactions were limited to outputting the symbols paper, rock, and scissors. The individual players were modeled as neural networks with a memory buffer to store their opponent's previous moves. The size of the memory buffer was variable. For example, a player might remember the last two moves, or only the last move. The networks were designed to predict the opponent's next move and produce the appropriate counter move. The inputs to the networks were the opponent's recent previous moves (stored in the memory buffer) and the output was the player's move for the current trial. The networks themselves were simple linear models (Rumelhart, Hinton & McClelland, 1986) with two layers of nodes (i.e., similar to perceptrons; Rosenblatt, 1962).

The model consisted of two such networks representing the two players in a PRS game. The networks were made as simple as possible. Each consisted of one layer for input and one for output (see Fig. 1). The output layer consisted of three nodes, one to represent each of paper, rock, and scissors.
The input layer consisted of a variable number of three-node sets. Each set represented the previous outputs of the opponent network at a particular lag, with the three nodes in each set again representing paper, rock, and scissors. Outputs were determined by summing the weights associated with the activated connections. The output node producing the highest sum was chosen as the output, with ties being resolved through random selection. Learning was accomplished through a simple scheme in which a win was rewarded by adding one to the activated connections leading to the node representing the output, and a loss was punished by subtracting one.

Fig. 1. A simple neural network model of a player's ability to detect sequential dependencies in paper, rock and scissors.

Throughout this paper the networks are referred to in terms of the number of lags processed. For example, a network that utilized only the previous trial would be a lag1 network, a network that considered the previous two trials would be a lag2 network, and so on. In all simulations, both networks began with all weights set to zero.

The use of a two-layer network is similar to Townsend and Busemeyer's (1995) dynamic model of decision making, in which a two-layer network was used to model the motivational valence associated with different choices. However, the main motivation behind the network choice was to keep the model as simple as possible. Our goal was not to explore different neural network structures, but rather to embody a sequential dependency detection strategy in the simplest and most direct way. In this regard, the model delineates the two principal factors that would drive any sequential dependency detection strategy: (1) the number of lags used for prediction, and (2) the goals of the system. The first factor we associated with working memory, or the amount of space in the buffer; this corresponds to how many lags back the system could remember. The goals of the system are reflected in the rewards and punishments. For example, a system rewarded for winning and punished for tying and losing is trying exclusively to win; a system that is punished for losing and neither punished nor rewarded for winning or tying does not care what happens as long as it does not lose; and so on. In PRS the goal is to win and to avoid losing, so wins were always rewarded and losses punished. But ties are somewhat ambiguous in this regard. On one hand, to tie is to fail to win, but on the other hand to tie is to avoid a loss. Therefore, we examined two conditions, one in which ties were punished and one in which ties were neither punished nor rewarded. The second configuration can be considered analogous to a patient player who does not care about ties but still wants to win.
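The architecture just described is simple enough to set down in a few lines. The sketch below is our illustration of that description, not the authors' code; the class and method names and the `tie_payoff` parameter are ours. It assumes one-hot coding of the opponent's last `lags` moves, outputs chosen by summing the weights on the activated connections (with ties between output nodes broken at random), and the reward/punishment scheme applied only to the activated connections leading to the chosen output node:

```python
import random

class PlayerNet:
    """Minimal sketch of one player: a two-layer linear network whose
    inputs are one-hot codes of the opponent's last `lags` moves and
    whose three output nodes are the possible moves."""

    MOVES = ("paper", "rock", "scissors")

    def __init__(self, lags, tie_payoff=-1):
        self.lags = lags
        # One weight from each input node (3 per lag) to each output node;
        # all weights start at zero, as in the simulations reported here.
        self.w = [[0] * 3 for _ in range(3 * lags)]
        self.payoffs = {"win": 1, "loss": -1, "tie": tie_payoff}
        self.buffer = []   # the opponent's most recent moves, newest first
        self.active = []   # input nodes activated on the current trial

    def choose(self):
        if len(self.buffer) < self.lags:       # not enough history yet
            return random.choice(self.MOVES)
        # Exactly one input node per lag is active: the (lag, move) pair.
        self.active = [3 * lag + self.MOVES.index(m)
                       for lag, m in enumerate(self.buffer)]
        sums = [sum(self.w[i][out] for i in self.active) for out in range(3)]
        best = max(sums)
        # Ties between output nodes are resolved through random selection.
        return self.MOVES[random.choice([out for out in range(3)
                                         if sums[out] == best])]

    def learn(self, my_move, outcome):
        # Reward or punish only the connections that produced the move.
        out = self.MOVES.index(my_move)
        for i in self.active:
            self.w[i][out] += self.payoffs[outcome]

    def observe(self, opponent_move):
        # Slide the opponent's newest move into the memory buffer.
        self.buffer = ([opponent_move] + self.buffer)[: self.lags]
```

Under this sketch, the configuration in which ties are treated as losses (used in Experiments 1 to 4 below) corresponds to the default `tie_payoff=-1`, while the patient player described above corresponds to `tie_payoff=0`.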
5. Evaluating the model

A dynamic system is "a system whose state changes over time in a way that depends on its current state according to some rule" (Port & Van Gelder, 1995). As long as a model satisfies this criterion it can be considered dynamic, although whether or not it possesses interesting emergent properties is another question. The concept of emergent properties has no official definition (see Clark, 1997, for a review and discussion of possible definitions), but there is general agreement that an emergent property exists "whenever interesting, non-centrally controlled behavior ensues as a result of the interactions of multiple simple components within a system" (Clark, 1997). Dynamic systems models are used to study how changes in the control parameters of the model alter the emergent properties of the model, which are expressed through global variables. Global variables are variables that refer to the state of the model as a whole. In our case, the global variable was the difference between the players' scores, and the control parameters were embodied in the various ways that the model could be set up (i.e., variations in the number of lags processed and in the goals of the players).

As Clark (1999) notes, modeling the interaction between a cognitive agent and its environment (including other agents in the environment) can be approached in two ways: through distributed cognition or through dynamic systems theory (DST). DST models are generally mathematical, employing nonlinear difference equations or differential equations to model the system as a whole. Generally speaking, the physical components of a system are not explicitly represented in a DST model (Van Gelder & Port, 1995). For example, in thermodynamic models the atoms are not represented. In contrast, distributed cognition extends the computational modeling approach traditionally used in cognitive science to distributed systems composed of the environment and various cognitive agents (Clark, 1999; Hutchins, 1995). This results in a very different type of model, in which the agents are explicitly represented as components within the system. In addition to nonlinear mathematical models, computational models that embody nonlinear processes can also be considered DST models. For example, neural network models can be considered a form of DST modeling if the networks embody nonlinear processes in the form of feedback loops (Van Gelder & Port, 1995). Although the two-layer, feed-forward networks used in this study did not individually contain feedback loops, when coupled together the result was a complex, nonlinear system involving two simultaneous feedback loops with information flowing in opposite directions. Thus, the coupled system can be considered a network-based DST model, constructed within a distributed cognition framework that identifies the players as components within the system (see Bechtel, 1998, for a discussion of computational DST models that contain explicit representations of their components).

The most common way of exploring dynamic systems models is by computer simulation. Applied to distributed cognitive systems, this involves modeling the individual agents, simulating the interaction, and looking for meaningful emergent properties (e.g., see Hutchins, 1991). The key to evaluating whether the emergent properties of a model provide a good account of the actual phenomena lies in the ability of the model to produce testable counterfactuals (Bechtel, 1998; Clark, 1997). By counterfactuals what is meant is that the model can predict what would happen under different conditions (i.e., different parameter settings). Without this, a dynamic model is essentially a description of the phenomena (Clark, 1997), although this can still be useful for demonstrating that a phenomenon can, in principle, be produced by a certain type of dynamic system (Van Gelder & Port, 1995).
For example, if the model in this paper were only able to produce random-like behaviors, this would demonstrate that it could, in principle, account for this behavior in humans; but that is all. However, if the model predicts different outcomes depending on the control parameter settings, then demonstrating that humans react in the same way to changes in the control parameters would indicate that the model captures a fundamental aspect of the process.

In actuality, some counterfactual claims cannot be tested. For example, it is impossible to run time backwards to test physics models concerning time, although it is possible to run the models backwards to create counterfactual scenarios. This problem also crops up in psychological DST models. For example, in dynamic models of limb movement it is not possible to test the effect of having eight fingers on each hand instead of five. However, this is generally not a problem in phenomenologically rich areas, such as body movement, where a dynamic model can prove its value by parsimoniously describing a wide range of naturally occurring behaviors with a single model (e.g., see Kelso, 1995). A simple behavior such as playing PRS does pose a problem, though, as there simply is not a wide range of naturally occurring PRS behaviors to test the model against. To deal with this we developed a methodology that allowed us to test the model by placing human subjects in counterfactual scenarios through a mixing of simulation and reality. The methodology, called counterfactual testing, involved running simulations to find distinct emergent properties associated with the interactions between different types of simulated players. Following this, the simulated player that was believed to model human behavior was replaced with real human players, who faced the same simulated opponents. The emergent properties of the human/computer games were then compared to those from the computer/computer games. If the emergent properties were the same, it would indicate that the humans interacted with their simulated opponents in a manner similar to the component they replaced.

6. Experiment 1: simulating random PRS play

Game theory makes two predictions concerning PRS. The first is that the expected result should be a tie. The second is that the outputs of the players are random or random-like. To model this, identical network models were played against each other. Intuitively, it is somewhat obvious why this would be expected to produce the expected outcome of a tie, since neither network had any advantage over the other. However, to produce random-like outputs the coupled system would need to behave in a chaos-like manner, since there was no independent source of noise. Generally, computer-simulated dynamic models capable of producing chaos-like outputs allow for a high number of decimal places to simulate continuous processes. Simulations of systems lacking such fine precision tend to produce limit cycles (i.e., simple repeating patterns) or converge to an equilibrium point and repeat the same output (e.g., see Ward & West, 1994). In fact, any closed dynamic system modeled using discrete rather than continuous data (i.e., all computer simulations of dynamic systems) will eventually settle into a repeating pattern, but with sufficient fidelity the length of the repeating pattern is astronomical. Thus the critical question was whether the very simple network models used in this study could produce a chaos-like effect[4] (for a discussion of the fact that simple, discrete, symbolic systems can produce complex dynamic behaviors, see Wells, 1998).

[4] The term chaos-like is used instead of chaos since truly chaotic systems, i.e. systems that never repeat, exist only in mathematics or the physical world. In this case, chaos-like is simply meant to refer to dynamic systems that appear to an observer to behave randomly.
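Coupling two such networks is then simply a loop in which each network's output becomes part of the other's next input. The following sketch (ours, reusing the hypothetical `PlayerNet` from the earlier sketch) shows the simulation structure, with the global variable of interest, the running score difference, recorded on every trial:

```python
def play(net_a, net_b, trials):
    """Couple two PlayerNet instances for a run of PRS and return the
    score difference (A minus B) after every trial."""
    beats = {"paper": "rock", "rock": "scissors", "scissors": "paper"}
    diff, history = 0, []
    for _ in range(trials):
        a, b = net_a.choose(), net_b.choose()
        if a == b:
            out_a = out_b = "tie"            # no score change on a tie
        elif beats[a] == b:
            out_a, out_b, diff = "win", "loss", diff + 1
        else:
            out_a, out_b, diff = "loss", "win", diff - 1
        net_a.learn(a, out_a)                # reward/punish active weights
        net_b.learn(b, out_b)
        net_a.observe(b)                     # each output becomes part of
        net_b.observe(a)                     # the other's next input
        history.append(diff)
    return history

# Experiment 1 setup: two identical networks, e.g. lag2 versus lag2
trace = play(PlayerNet(lags=2), PlayerNet(lags=2), trials=5000)
```

The same loop covers Experiment 2 below; the only change is constructing the two networks with unequal `lags` values.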
6.1. Results and discussion

Multiple games were run, pitting lag1 networks against lag1 networks, lag2 networks against lag2 networks, and lag3 networks against lag3 networks. As would be expected by symmetry, no individual network was able to systematically gain an advantage over its opponent. As can be seen from Fig. 2, which displays a lag1 versus lag1 game (plotted in terms of the difference in scores across trials), the coupled lag1 networks initially produced a chaos-like result resembling a random walk, but eventually settled to an equilibrium. However, as illustrated in Fig. 3, when the complexity of the system was increased to a lag2 versus lag2 game, the system was sufficiently complex to support the random walk behavior for a large number of trials (in fact, 5000 trials could be considered excessive for PRS). This result demonstrates that this type of coupled system is capable of generating chaos-like behaviors, and that the random component in game playing could, in principle, be generated through the interaction between players. This result is quite important as it demonstrates an avenue for resolving the conflict between the game theory claim that randomness is central to game playing and the psychological finding that people are poor at behaving randomly.

Fig. 2. The difference in score across trials for a lag1 network versus a lag1 network.

Fig. 3. The difference in score across trials for a lag2 network versus a lag2 network.
7. Experiment 2: simulating unequal PRS play

The results of Experiment 1 demonstrated that the coupled networks could mimic random behavior through a dynamic process. The next step was to alter the control parameters of the model to see if it could produce non-random behaviors. For the model in this paper, producing a non-random result in the form of a systematic advantage for one player would be important, as it would indicate that the model could produce PRS behaviors not predicted by game theory. In Experiment 1, the working memory capacity, expressed in terms of how many lags a network could hold in its buffer, was equal in each game. This experiment investigated the effect of unequal working memory capacities. From a game theory perspective this was a bit unusual, as it is generally assumed that the players have the same cognitive abilities, but from a psychological perspective we know that this is often not the case (e.g., due to individual differences and situational differences, such as stress or a high cognitive load). As in Experiment 1, ties were treated as losses.

7.1. Results and discussion

We simulated 10 games of 500 trials each of a lag2 network versus a lag1 network and found that the lag2 network was significantly more likely to have a higher final score than the lag1 network (P = 0.027, paired, two-tailed t-test). We also simulated 10 games of 30,000 trials each of a lag3 network versus a lag2 network and found a similar advantage for the higher lag network (P = 0.013, paired, two-tailed t-test). The high number of trials for the lag3 versus lag2 games was necessary to produce a significant difference, indicating that there are diminishing returns for using a higher number of lags. Fig. 4 displays representative results of four lag2 versus lag1 games. The random walk quality found in Experiment 1 is preserved, but the results appear as a random walk with a trend favoring the higher lag network. Thus the model produces a naturalistic-looking game in which the player with the larger working memory enjoys a systematic advantage.

Fig. 4. The difference in score across trials for several games of a lag2 network versus a lag1 network. The score differences were calculated as the lag2 network's score minus the lag1 network's score.

In terms of understanding the process that led to this result, it is important to keep in mind that the system that produced it was a coupled system. From a strict DST perspective, we should think of it as a single dynamic system, and not as two separate players. However, because each network comprised a distinct module of the system, it is possible to gain some insight into the nature of the interaction between them (Bechtel, 1998). Since we know that the networks used the strategy of learning sequential dependencies, we know that the interaction between them must have created sequential dependencies in the outputs of the lower lag network that the higher lag network could learn. It is also possible to determine whether or not these learned sequential dependencies were stable by looking at the connection weights across time. Fig. 5 displays representative results showing the change in connection weights across time for a lag2 network during a game against a lag1 network. As can be seen, no pattern emerges across time, indicating that the lag2 network was not learning a stable pattern of sequential dependencies (if it were, a stable pattern of differences between the weights would increase and become more evident across time).

Fig. 5. Relative connection weights across trials for a lag2 network during a game against a lag1 network. The graph shows the influence of the 'paper' input nodes (i.e., at lag1 and lag2) on the outputs, across trials. The relative weights were created by using the connection weight between the lag1 paper input node and the paper output node as a standard. The graph shows the difference between the weight of this connection and the weights of the other connections between the lag1 and lag2 paper input nodes and the output nodes. All of the connection weights displayed this pattern, indicating that nothing stable was learned.

Since nothing stable was learned, the lag2 network must have won by learning relatively short-lived sequential dependencies in the lag1 outputs. From this we can characterize the process as one of learning and unlearning relatively short-lived sequential dependencies. To more precisely determine the length of the sequential dependencies learned by the networks, we computed their frequency as a function of their length. For each input unit, a learned sequential dependency was defined as the number of trials over which the weight from that unit to a given output unit remained larger than the weights from that input unit to the other output units. We ran 1000 games of 5000 trials each of the lag2 network playing the lag1 network.
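This analysis can be stated concretely. The sketch below (ours; it assumes per-trial snapshots of a network's weight matrix in the format of the `PlayerNet` sketch above) measures a learned dependency as an unbroken run of trials on which one output node's weight from a given input node strictly exceeds that input node's other two weights:

```python
def dependency_lengths(weight_snapshots):
    """Given one weight matrix (a list of 3-element rows, one per input
    node) recorded after every trial, return the lengths of all learned
    sequential dependencies."""
    lengths = []
    n_inputs = len(weight_snapshots[0])
    for i in range(n_inputs):
        current, run = None, 0
        for w in weight_snapshots:
            row = w[i]
            top = max(range(3), key=lambda out: row[out])
            # A dependency requires a strict winner among the three outputs.
            dominant = top if row.count(row[top]) == 1 else None
            if dominant is not None and dominant == current:
                run += 1                      # the run continues
            else:
                if run > 0:
                    lengths.append(run)       # a run just ended
                current = dominant
                run = 1 if dominant is not None else 0
        if run > 0:
            lengths.append(run)
    return lengths
```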
Fig. 6 plots the number of learned sequential dependencies of length 1 to 100 as a percentage of all the learned sequential dependencies. Since the percentage is plotted against length on a log–log scale, the roughly linear nature of the curve suggests a power law distribution of frequencies. Such distributions are pervasive in natural settings (e.g., West & Salk, 1987). The curve decreases quickly as a function of length, with a power law exponent roughly equal to −2, which results in more than 90% of dependencies lasting less than 25 trials. As can be seen in Fig. 6, the lag2 network learned fewer short sequential dependencies and more long sequential dependencies than the lag1 network. A similar pattern was found for the lag3 network versus the lag2 network.

Fig. 6. Frequency of sequential dependencies by length for lag1 versus lag2 network games. The gray line represents the lag2 frequencies and the solid line represents the lag1 frequencies.

Overall, these results suggested that the advantage of the higher lag networks was due to their ability to detect and exploit longer sequential dependencies.
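As a rough consistency check (our arithmetic, not a computation reported in the paper), an exponent of −2 does place nearly all of the probability mass below 25 trials:

```latex
% fraction of learned dependencies with length L >= 25, given P(L) ~ L^{-2}
\frac{\sum_{L=25}^{\infty} L^{-2}}{\sum_{L=1}^{\infty} L^{-2}}
\approx \frac{\int_{24.5}^{\infty} x^{-2}\,dx}{\pi^{2}/6}
= \frac{1/24.5}{1.6449} \approx 0.025
```

That is, roughly 97% of dependencies would be expected to last fewer than 25 trials, comfortably consistent with the "more than 90%" figure.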
8. Experiment 3: humans versus the lag1 network

Experiment 2 produced counterfactual scenarios in which a larger working memory capacity led to a systematic advantage. The next step was to see if humans could also produce this type of behavior. A lag1 network opponent was used to create a condition that maximized the chance that human subjects could win, since a win by the computer could be explained in terms of the computer passively detecting naturally occurring sequential dependencies in subjects' outputs. To win against the network, the human players would need to get the network to generate the kind of sequential dependencies they could detect, without generating too many of the kind of sequential dependencies that the network could detect. However, if we assume that the lag2 network is a reasonable model of the human ability to detect sequential dependencies, then this should occur as an emergent property of the interaction, provided that human players actually use the strategy of detecting sequential dependencies to play PRS. Also, note that if the human players used the strategy of attempting to generate random outputs, this would result in a tie if they were able to be sufficiently random to prevent the computer from detecting any sequential dependencies, or a loss if they failed to achieve this standard. It is not possible to beat the lag1 network using this strategy.

8.1. Method

8.1.1. Subjects
The human subjects were nine volunteers from the University of British Columbia.

8.1.2. Apparatus
The experiment was conducted using a program written in Visual Basic. Subjects could use a mouse to click on three different icons to indicate their move (i.e., paper, rock, or scissors). Following this, they clicked on a button marked NEXT to reveal the computer's response. The subject's score, the computer's score, and the number of trials were displayed and updated on each trial.

8.1.3. Procedure
All subjects were required to play against a lag1 network for approximately 20 min. The number of trials varied according to each subject's speed and interest in continuing; all subjects played at least 300 trials. Subjects were instructed that the computer's responses were not random, that it was programmed to play like a human, and that it was possible to beat it. They were also told that the program was too complex to figure out and that the way to win was to play by intuition. As in Experiments 1 and 2, ties were treated as losses.

8.2. Results and discussion

The mean final scores were 173 for the humans and 150 for the computer. A paired t-test revealed that this difference was significant (P = 0.036, two-tailed), indicating that the humans were able to outplay the lag1 network. Fig. 7 displays the difference in scores between the subjects and the computer across trials. As can be seen, only one subject performed badly, while two performed at roughly a break-even level and six were clearly able to outplay the computer.

Fig. 7. The differences in score across trials for nine humans versus the lag1 network. The score differences were calculated as the human score minus the lag1 network score.

To get an idea of the general trend, the mean score difference was plotted and is displayed in Fig. 8 (to get an unbiased function, the data set was truncated at 300 trials to make the number of trials the same across subjects). A regression analysis on the data in Fig. 8 revealed a significant (P < 0.001) linear trend, indicating a systematic advantage in favor of the human subjects. A regression on the non-truncated data set also produced a significant, positive, linear trend (P < 0.001).

Fig. 8. Mean score differential across trials (the data set was truncated at 300 trials to make the number of trials the same across subjects).

Finally, to more directly compare the human results to the lag2 network, 100 games of 300 trials each between a lag2 network and a lag1 network were simulated. The simulations produced an average score difference of 10.47 (s.d. 13.04) in favor of the lag2 network. At 300 trials, the average score difference between the subjects and the lag1 network was 9.99 (s.d. 19.61), quite close to the value predicted by the simulation. Fig. 9 displays a percentage distribution of the differences in final scores for the humans versus the lag1 network and the lag2 network versus the lag1 network.

Fig. 9. A percentage-based distribution of final score differences for humans versus the lag1, and the lag2 versus the lag1. The score differences were calculated as the human score minus the lag1 score and the lag2 score minus the lag1 score.

In Experiment 2 the results suggested that the advantage of the lag2 network over the lag1 network was due to the ability of the lag2 network to learn longer lasting sequential dependencies. To examine this in terms of the human data we adopted a methodology called model tracing (Anderson, Kushmerick & Lebiere, 1993). This involved matching each subject with a lag2 network and once again simulating games between a lag2 network and a lag1 network. In each game we forced the lag2 network to make the same moves as those made by its human counterpart in the recorded game. The lag1 network was constrained in the same way. The results were analyzed in the same manner as in Experiment 2.
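In terms of the earlier sketches, model tracing amounts to replaying the recorded human moves through a learning network. The sketch below (ours; `PlayerNet` and `dependency_lengths` are the hypothetical helpers introduced earlier) forces the network to emit the human's recorded move on every trial while its weights update exactly as if it had chosen that move itself:

```python
def model_trace(human_moves, opponent_moves, lags=2):
    """Replay a recorded game through a network standing in for the
    subject, then analyze the dependencies it learned along the way."""
    beats = {"paper": "rock", "rock": "scissors", "scissors": "paper"}
    net = PlayerNet(lags=lags)
    snapshots = []
    for mine, theirs in zip(human_moves, opponent_moves):
        net.choose()                  # activates the input nodes only;
                                      # the chosen output is discarded
        if mine == theirs:
            outcome = "tie"
        elif beats[mine] == theirs:
            outcome = "win"
        else:
            outcome = "loss"
        net.learn(mine, outcome)      # forced move, normal learning
        net.observe(theirs)
        snapshots.append([row[:] for row in net.w])
    return dependency_lengths(snapshots)
```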
Fig. 10 plots the frequency of learned sequential dependencies as a function of length for the lag1 network and for the lag2 network standing in for the subjects. For comparison purposes, Fig. 10 also plots the frequencies generated by the lag1 and lag2 networks playing freely against each other in Experiment 2. The form of the distribution generated by the free-playing networks was well reproduced. However, because we found very few learned sequential dependencies lasting over 25 trials, we were unable to examine the range where the difference between the free-playing lag1 and lag2 networks was most pronounced (see Fig. 6). This was probably because sequential dependencies lasting over 25 trials are relatively rare and the model-tracing simulations were restricted to low numbers of trials and games. However, there was a tendency, discernible in Fig. 10, for the lag2 network to learn slightly fewer short sequential dependencies and slightly more long sequential dependencies than the lag1 network, as predicted by the free-playing results.

Fig. 10. Frequency of sequential dependencies by length for the model-tracing results (Experiment 3) and the free-playing results (Experiment 2). For the model-tracing results the circles represent the lag2 frequencies and the triangles represent the lag1 frequencies. For the free-playing results the dotted line represents the lag2 frequencies and the solid line represents the lag1 frequencies.

After the experiment, subjects were informally asked for any insights they had concerning their own play. Several subjects reported attempting to draw the computer into a vulnerable position and then exploit it, but when pressed for details could not provide them or described a strategy that would not work. Specifically, some subjects reported achieving success by repeating a pattern until the computer 'got used to it,' and then altering their pattern to exploit what the computer had learned. However, this simple strategy will not work because the lag1 network unlearns as fast as it learns. This is because the reward for winning and the punishment for losing were balanced (reward +1, punishment −1). Thus teaching the network a pattern and then exploiting the pattern leads to a break-even situation at best. To win, a player must present a pattern that, while being learned, simultaneously exploits earlier learning.

9. Experiment 4: humans versus the lag2 network

Experiment 4 examined what happened when subjects played PRS against a lag2 network. Following from the simulation results, if the mechanism that subjects used to play was functionally equivalent to a lag2 network, they should have had a 50/50 chance of winning. However, unlike the computer models, subjects may not always play perfectly. For example, they could experience lapses in attention, which could result in processing, on average, more than one lag but less than two. Thus it was possible that subjects could beat a lag1 network but fail to tie a lag2 network. On the other hand, if subjects were able to beat a lag2 network, as they beat the lag1 network, it would falsify the claim that humans play PRS in a manner similar to the lag2 network. Also, such a win could not be explained in terms of subjects playing at a lag3 level (or higher), as the number of trials involved would be insufficient to distinguish between a lag2 network and a lag3 network (recall from the simulation results that a lag3 network can be distinguished from a lag2 network only with an enormous number of trials). Therefore, a collective victory by human subjects against a lag2 network would indicate a fundamental problem with the model used in this study.

9.1. Method

9.1.1. Subjects
Eighteen subjects from the University of Hong Kong volunteered to play against the lag2 network.
Eight of these were tested on a one-by-one basis (as in Experiment 3), while the other 10 were tested as a group. For the group test a computer lab was used so that each subject had his or her own PC.

9.1.2. Procedure
The conditions, instructions, and apparatus were the same as in Experiment 3, except that the individually tested subjects were asked to play until either they or the lag2 network reached a score of 50, and the group-tested subjects were asked to play until either they or the lag2 network reached a score of 100. As in the previous experiments, ties were treated as losses by the network. Subjects were told to play at their own pace, and after approximately 20 min all subjects were stopped, regardless of how far they had gotten. Also, subjects were told that the computer was programmed to play quite well and that if they won only by a little it would demonstrate considerable ability on their part. This was done to avoid subjects becoming discouraged and losing concentration if they failed to gain a decisive advantage.

9.2. Results and discussion

Taking an average of the difference between subjects' final scores and the lag2 network's final scores produced a mean difference of −8.89 (s.d. 19.74). A paired t-test revealed that the difference in final score was significant (P = 0.036). Thus, these results were consistent with the claim that subjects play PRS in a manner similar to a lag2 network, but without the consistency of the computer. Alternatively, it was also possible that some subjects did play as well as the lag2 network but that the averaged results were dragged down by other subjects who did not. One factor that might enter here is that it is simply less fun and less motivating to play when you are not winning. Because it was evident from these results that at least some subjects were not playing in the same way as a lag2 network, we did not pursue model tracing.

10. Experiment 5: simulating different payoffs

Experiment 2 simulated the effect of altering the number of lags processed. In addition to this parameter, it was also possible to adjust the payoffs for wins, losses, and ties (i.e., the amounts that were allotted for rewarding and punishing the network). In Experiments 1 to 4 the networks were rewarded by adding one to the connection weights for winning, and punished by subtracting one for losing or tying. These weights were based on the evolutionary argument that a tie is a waste of resources and therefore undesirable. However, taking a less long-term view, it could also be argued that a tie is a neutral event and should therefore be neither punished nor rewarded. At first, it may seem that removing the punishment for ties should not affect the ability of a higher lag model to beat a lower lag model. After all, the effect is merely to prevent a network from learning to avoid ties; it should still learn to predict losses and wins. Viewed in this way, a lag2 network should still be able to beat a lag1 network, although the rate of winning could be slower due to more ties. This reasoning, however, applies to a single network attempting to detect a stable pattern of sequential dependencies. Since the results of Experiment 2 indicated that the interaction between two networks produces short-term sequential dependencies, it was not obvious how altering the payoffs would affect the behavior of the system. Therefore, as in Experiments 1 and 2, simulations were used to explore the dynamics of the model.
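In terms of the earlier `PlayerNet` sketch, the two treatments of ties differ only in the single tie payoff parameter, so an Experiment 5 game would be set up as follows (illustrative, using the hypothetical helpers from the earlier sketches):

```python
aggressive = PlayerNet(lags=2, tie_payoff=-1)  # ties punished like losses
passive = PlayerNet(lags=2, tie_payoff=0)      # ties neither punished nor rewarded
trace = play(aggressive, passive, trials=5000) # one 5000-trial game
```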
10.1. Results and discussion

All simulations were 5000 trials long. To aid in the discussion, a network that is punished for tying will be referred to as an aggressive network and a network that is neither punished nor rewarded for tying will be referred to as a passive network. In the first simulation, aggressive lag2 networks were played against passive lag2 networks. The result was a clear tendency for the aggressive network to win. Fig. 11 shows some representative results.

Fig. 11. The difference in score across trials for several games of an aggressive lag2 network versus a passive lag2 network. The score differences were calculated as the aggressive network's score minus the passive network's score.

Next, aggressive lag1 networks were played against passive lag2 networks. In this case the results were less stable, with some runs producing dramatic rise and fall patterns in the score differential. Overall, there was no evidence of a systematic advantage for the lag2 network over the lag1 network. One interesting characteristic of these results was a tendency for the lag1 network to go on winning streaks followed by less intense, but longer, losing streaks. This pattern was very obvious in some runs but not in others. Fig. 12 shows an example of a run with this result. Also evident in Fig. 12 is a fractal structure, in which the same pattern is repeated on different scales (fractal patterns often occur as an emergent property of dynamic systems).

Fig. 12. The difference in score across trials for an aggressive lag1 network versus a passive lag2 network. Note the fractal structure, in which the same pattern is repeated on different scales.

To gain a further understanding of how the aggressive lag2 network was able to beat the passive lag2 network, we performed the same analysis on the learned sequential dependencies as in Experiment 2. An aggressive lag2 network was played against a passive lag2 network for 1000 games of 5000 trials each. Fig. 13 plots the frequency of their learned sequential dependencies. The results show that the winning network, the aggressive lag2, learned more short sequential dependencies and fewer long ones than the losing network, the passive lag2. This result is the opposite of the Experiment 2 results, in which the winning network learned fewer short sequential dependencies and more long sequential dependencies than the losing network. Thus, learning longer sequential dependencies is not necessarily preferable, as it appeared from the Experiment 2 results. Instead, whether learning more long or more short dependencies is associated with winning depends on the characteristics of the networks involved.

Fig. 13. Frequency of sequential dependencies by length for aggressive lag2 versus passive lag2 network games. The gray line represents the aggressive lag2 frequencies and the solid line represents the passive lag2 frequencies.

11. Experiment 6: humans versus a less aggressive network

The results of Experiment 5 indicated that, in order to beat the aggressive lag1 network in Experiment 3, the human subjects would have needed to play in a way similar to an aggressive lag2 network. Since the results of Experiment 5 also indicated that an aggressive lag2 network can beat a passive lag2 network, it was predicted that human subjects would also be able to beat a passive lag2 network.

11.1. Method

11.1.1. Subjects
Twenty-two subjects from the University of Hong Kong volunteered to play against the passive lag2 network.
All of the subjects were tested simultaneously, using a computer lab, as in Experiment 4.

11.1.2. Procedure
The conditions, instructions, and apparatus were the same as in the group condition in Experiment 4. As in Experiment 4, subjects were asked to play until either they or the lag2 network reached a score of 100.

11.2. Results and discussion

Out of the 22 subjects, only six failed to win. The mean final score for the human subjects was 95.27 and the mean final score for the lag2 network was 84.14. A paired t-test revealed that this difference was significant (P = 0.009, two-tailed). As in Experiment 3, simulations were run to better compare subjects' performance with the lag2 network. In this experiment, subjects played an average of 287 trials, so 100 simulations of 287 trials each were run, playing an aggressive lag2 network against a passive lag2 network. The mean difference in final score for the simulation was 11.17 (s.d. 20.35) in favor of the aggressive network. For subjects versus the passive lag2 network, the mean difference in final score was 11.14 (s.d. 23.05), very close to that of the simulation. Fig. 14 displays a percentage distribution of the differences in final scores for the humans versus the passive lag2 network and the aggressive lag2 network versus the passive lag2 network.

Fig. 14. A percentage-based distribution of final score differences for humans versus the passive lag2, and the aggressive lag2 versus the passive lag2. The score differences were calculated as the human score minus the passive lag2 score and the aggressive lag2 score minus the passive lag2 score.

As in Experiment 3, we applied model tracing to further evaluate the results. In this case, this involved forcing an aggressive lag2 network to make the same moves as the human players in games against a passive lag2 network. The frequencies of learned sequential dependencies are plotted in Fig. 15, along with the results from the free-playing networks from Experiment 5. As in Experiment 3, we found very few of the relatively rare sequential dependencies lasting over 25 trials, probably due to the low numbers of games and trials. However, the model-tracing results do a good job of reproducing the frequency distributions of the free-playing networks within the shorter range.

Fig. 15. Frequency of sequential dependencies by length for the model-tracing results (Experiment 6) and the free-playing results (Experiment 5). For the model-tracing results the circles represent the aggressive lag2 frequencies and the triangles represent the passive lag2 frequencies. For the free-playing results the dotted line represents the aggressive lag2 frequencies and the solid line represents the passive lag2 frequencies.

12. Discussion

In this study we investigated a distributed cognition model of the interaction between players in a simple, zero-sum guessing game (i.e., PRS). Our results show that human subjects produce results highly similar to the aggressive lag2 network in two different counterfactual scenarios: playing an aggressive lag1 opponent and playing a passive lag2 opponent.
Although human subjects displayed a slight tendency to lose against an aggressive lag2 opponent, instead of tying as predicted by the simulations, this can be accounted for by considering factors such as concentration and motivation (see Experiment 4 for a discussion). The validity of our model is further strengthened by the fact that it was based, a priori, on psychological research describing how people behave under similar conditions.

12.1. Other models

Other models have also tried to describe how game theory solutions could arise in the behavior of humans and animals. These can be divided into three distinct, but interrelated, areas: evolutionary models, learning models, and psychological models. Evolutionary models are used primarily by biologists to explain how the appropriate probabilities for different actions could have developed. The players are computer-simulated automata that evolve specific strategic responses through the use of genetic algorithms. Explaining how game theory solutions could have evolved is important, as game theory has been successfully used to predict real animal behaviors, such as the competitive strategies of spiders and naked mole rats (see Pool, 1995, for a review). Consistent with this, it is possible to get computer simulations of simple automata to evolve to game theory solutions (Roth, 1996).

Similar to evolutionary models, the purpose of learning models is to explain how players acquire the probabilities associated with each move in a game. In these models players attempt to improve their strategies by learning through experience. However, because players may have incomplete information, they can evolve to equilibria that are non-optimal and also non-maximal (e.g., see Camerer & Ho, 1999; Claus & Boutilier, 1997; Fudenberg & Levine, 1998; Sun & Qi, 2000). This type of modeling is important because it can explain (a) how humans can acquire effective strategies for novel games without being able to perform the complex calculations involved in determining the optimal game theory strategy, and (b) why humans often do not use the optimal game theory strategy (Pool, 1995). The literature in these areas is large and diverse (e.g., see Camerer, 1997, 1999; Camerer & Ho, 1999; Claus & Boutilier, 1997; Erev & Roth, 1998; Fudenberg & Levine, 1998; Sun & Qi, 2000). However, we believe that the unique contribution of our model is that it explicitly avoids assuming that individual players use some sort of randomizing function to implement their strategy. Instead, we postulate that the randomizing function exists as an emergent property of the interaction. Typically, evolutionary models and learning models converge to a vision of the player as possessing a set of probabilities expressing the likelihood of each move and a means of randomly selecting moves according to these probabilities, which is difficult to reconcile with the fact that human players are poor at randomizing.

Psychological models seek to replace the objectively rational assumptions of game theory with psychologically realistic assumptions. This goal is supported by research in psychology and experimental economics demonstrating that human behavior systematically and predictably deviates from the rational course of action. Generally speaking, psychological models have attributed the human inability to behave randomly on demand to cognitive biases related to an incorrect understanding of randomness.
These biases have been explored extensively (e.g., Tversky & Kahneman, 1986; Treisman & Faulkner, 1987), and it is clear that they exist when people attempt to generate random outputs in isolation. However, our results indicate that people generate outputs in a different way when involved in an interactive game situation. In terms of studies testing people under actual game playing conditions, our results are consistent with Rapoport and Budescu (1992), who found that humans behave more randomly in a game situation than in isolation, and with Huettel and Lockhead (2000), who found that humans are strongly influenced by past trials when playing games.

Another related finding in this area is Gilovich, Vallone and Tversky's (1985) well-known paper, 'The hot hand in basketball.' This paper went beyond artificially created lab situations and attempted to show that people erroneously perceive sequential dependencies in real-life human games. Specifically, they tested the belief that professional basketball players are more likely to score if they have recently scored (in other words, that players are governed by sequential dependencies). The result was negative: they found no evidence for this in the task they studied (shooting sequences of baskets from the free throw line). This finding would be completely trivial if the claim were limited to their specific task. However, the notoriety of the study is due to the possibility that the result (i.e., people seeing sequential dependencies where they do not exist) can be extended to human game playing in general. With regard to this study, it is important to note that the statistical methodology used by Gilovich et al. (1985) is only valid for evaluating stable, long-term sequential dependencies and cannot be used to rule out the possibility that the players are able to detect short-term sequential dependencies.

12.2. Non-distributed, non-dynamic explanations

Our network models show that, under the right conditions, the interactions between players using the strategy of detecting sequential dependencies can put the players into a co-learning equilibrium in which one player enjoys an advantage that the other player cannot learn to avoid. The empirical results of this study also indicate that humans play PRS in a manner closely approximated by an aggressive lag2 network. To further evaluate this claim we need to consider the possibility that human players could win against the networks using a different type of strategy. The optimal game theory strategy of selecting moves at random will not work, as the expected outcome is a tie. However, it is possible to win by allowing the network to detect sequential dependencies that are, in reality, decoys drawing it into a predictable pattern of play. This strategy of using one's own pattern of responses to exert control over one's opponent's responses will be referred to as the decoy strategy. The decoy strategy is generally considered problematic as it evokes a recursive pattern of reasoning. For example, in PRS, if player A played rock on the first five trials then the opponent, player B, would be tempted to play paper on the sixth trial; but B would also know that A knows that B is tempted to play paper, and that therefore A would play scissors and B should counter with rock.
12.2. Non-distributed, non-dynamic explanations

Our network models show that, under the right conditions, the interactions between players using the strategy of detecting sequential dependencies can put the players into a co-learning equilibrium in which one player enjoys an advantage that the other player cannot learn to avoid. The empirical results of this study also indicate that humans play PRS in a manner closely approximated by an aggressive lag2 network. To further evaluate this claim we need to consider the possibility that human players could win against the networks using a different type of strategy. The optimal game theory strategy of selecting moves at random will not work, as the expected outcome is a tie. However, it is possible to win by allowing the network to detect sequential dependencies that are, in reality, decoys drawing it into a predictable pattern of play. This strategy of using one's own pattern of responses to exert control over one's opponent's responses will be referred to as the decoy strategy.

The decoy strategy is generally considered problematic because it evokes a recursive pattern of reasoning. For example, in PRS, if player A played rock on the first five trials then the opponent, player B, would be tempted to play paper on the 6th trial; but B would also know that A knows that B is tempted to play paper, and that therefore A would play scissors and B should counter with rock. B might further realize that if he could figure this out then so could A, and therefore that A would expect B to play rock and would counter with paper, and so on into an infinite regress. However, if the players know that their opponent is using a sequential dependency detection strategy (something the subjects did not know), the decoy strategy is tenable. For example, if the opponent learns faster than he unlearns, then the decoy strategy can be executed by repeating a pattern until the opponent learns it. Once the pattern is learned, the player can switch to a pattern that exploits responses based on the learned pattern. Because the learning process is shorter than the unlearning process, the cost of teaching the opponent a pattern (i.e., allowing the opponent to win in order to convince him the pattern is valid) is less than the advantage gained during the exploitation phase.
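As an illustration of this teach-then-exploit logic, consider the following sketch. The opponent here is a hypothetical frequency-counting learner, not one of the lag networks used in this study, and the learning and decay rates are arbitrary, chosen only so that learning is faster than unlearning.

```python
MOVES = ["paper", "rock", "scissors"]
BEATS = {"paper": "rock", "rock": "scissors", "scissors": "paper"}  # key beats value
COUNTER = {loser: winner for winner, loser in BEATS.items()}        # move that beats key

def payoff(a, b):
    """+1 if move a beats move b, -1 if it loses, 0 for a tie."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1

class SlowForgetter:
    """Hypothetical opponent: counters whichever of our moves currently has
    the largest decayed frequency count. A large increment and a small decay
    make it learn much faster than it unlearns."""
    def __init__(self, learn=1.0, decay=0.05):
        self.counts = dict.fromkeys(MOVES, 1.0)
        self.learn, self.decay = learn, decay

    def play(self):
        return COUNTER[max(self.counts, key=self.counts.get)]

    def update(self, our_move):
        for m in MOVES:
            self.counts[m] *= 1 - self.decay   # slow unlearning
        self.counts[our_move] += self.learn    # fast learning

opp, score, teach = SlowForgetter(), 0, "rock"
for _ in range(200):
    their = opp.play()
    if their == COUNTER[teach]:
        ours = COUNTER[their]   # opponent took the bait: beat its counter-move
    else:
        ours = teach            # keep feeding it the decoy pattern
    score += payoff(ours, their)
    opp.update(ours)
print("decoy player's net score over 200 trials:", score)
```

With these parameters the decoy player never does worse than a tie: it wins while the opponent counters the decoy move and ties while re-teaching it, so its net score stays positive.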
But if the opponent unlearns at least as fast as he learns, as the network models in this study did, then the decoy strategy is more difficult to implement. To win against the networks used in this study it is necessary to play according to a pattern that simultaneously exploits previously learned patterns while causing the opponent to learn a new pattern that can be exploited in the future. To do this a player would need to know how his opponent was detecting sequential dependencies, something our subjects did not know. Even with this knowledge, it is a fairly complex problem to reason out how to simultaneously set up and exploit the opponent. An alternative to reasoning out the problem is to mentally simulate the opponent's sequential dependency detection mechanism. However, this would involve a very heavy cognitive load, even without considering the time factor (the speed of the human players, on average, was approximately one move every 1 to 3 seconds). Also, we need to ask how a player would get the detailed information necessary to construct an accurate simulation. Having no knowledge about how their opponent would play, the only way the human players in this study could gain this knowledge would be by sampling how the computer played and/or by doing planned experiments. Again, it is difficult to believe that this is what the human players were doing.

Note how far we have gone from the simplicity of the network models in order to come up with a tenable explanation. Even if we assume that humans are capable of (1) deducing their opponent's strategy through sampling and/or experimentation and (2) using logic or mental simulations to determine their next move, we need to ask ourselves whether such an elaborate scheme makes sense from an evolutionary point of view. One of the arguments supporting a dynamical model of movement, as opposed to the classical model in which movements are calculated in the head and then executed, is that there is no time to calculate trajectories and forces when one is being chased by a bear; the systems must work in real time (see Clark, 1997, for a discussion of this point). Similarly, in a human-on-human fight (e.g., for mating privileges), there is no time to sample the opponent's behavior, form a model, and work out the appropriate response.

Finally, subjects' own accounts of how they played indicated no complex planning. It is interesting, though, that subjects did report vague notions of the decoy strategy (see Experiment 3). However, the most likely reason for this is that they were merely reporting their best guess as to how they were playing. In many popular games it is possible to fake out one's opponent. For example, in soccer you can fake going left and then go right to get around your opponent. This type of strategy is actually a very simple instantiation of the decoy strategy, so it is possible that subjects were merely drawing on their experiences of playing games to generate a hypothesis as to how they were playing against the networks. The fact that people can mistake this type of hypothesis for an insight into their actual behavior has been well documented (Nisbett & Wilson, 1977).

12.3. Game theory revisited

Game theory is a way of calculating optimal strategies, but the game theory strategy of playing randomly is not the best strategy to use against the network models in this study. For example, a better strategy would be to: (1) use a network-like, sequential dependency detection strategy; (2) play aggressively (i.e., treat ties as losses); and (3) process more lags than your opponent. As demonstrated in the experiments above, this will cause you to win, rather than tie as with the game theory strategy. However, the game theory strategy can still be considered an accurate description of the optimal strategy. Why is this so? Because the term 'optimal' refers to an equilibrium (Samuelson, 1997). As pointed out in the Introduction, if an opponent has a bias the game theory solution does not maximize your chance of winning. Instead, the maximal strategy is one that exploits the specific biases of your opponent. However, to do this, you must also play in a biased way, opening yourself up to exploitation. The game theory solution represents an equilibrium where both players have optimally maximized their advantages and minimized their risks. This is why the evolutionary models and the learning models tend to produce solutions similar to the game theory solution: the players, through trial and error, co-evolve to an equilibrium. If the networks were free to evolve, passive networks and networks with less working memory would be weeded out through competition. Given that there is not much of an advantage to going beyond a lag2 network in terms of memory (unless the games are extremely long), the result would be that all the networks would evolve to be lag2 (or higher), aggressive networks. As Experiment 1 demonstrated, lag2 versus lag2 games result in essentially random outputs from the players, as predicted by game theory.
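The flavor of these co-learning dynamics can be conveyed with a toy reconstruction. The players in this paper are neural networks; the table-based learner below is only a stand-in that we are assuming captures the relevant ingredients (a lag-based context and a tie value that makes play passive or aggressive). The learning rate and trial count are arbitrary, and whether one player dominates depends on such details; the sketch shows the setup rather than reproducing the experiments.

```python
import random
from collections import defaultdict

MOVES = ["paper", "rock", "scissors"]
BEATS = {"paper": "rock", "rock": "scissors", "scissors": "paper"}

def payoff(a, b):
    """+1 if move a beats move b, -1 if it loses, 0 for a tie."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1

class LagPlayer:
    """Illustrative stand-in for a lag network: the state is the opponent's
    last `lags` moves, and tie_value < 0 makes the player aggressive
    (ties are treated as losses)."""
    def __init__(self, lags, tie_value=0.0, rate=0.3):
        self.lags, self.tie_value, self.rate = lags, tie_value, rate
        self.history = []                   # opponent's past moves
        self.weights = defaultdict(float)   # (state, move) -> value estimate

    def state(self):
        return tuple(self.history[-self.lags:])

    def play(self):
        if len(self.history) < self.lags:
            return random.choice(MOVES)     # warm-up only
        s = self.state()
        return max(MOVES, key=lambda m: self.weights[(s, m)])

    def learn(self, own_move, opp_move):
        r = payoff(own_move, opp_move)
        if r == 0:
            r = self.tie_value              # aggressive players punish ties
        if len(self.history) >= self.lags:
            key = (self.state(), own_move)
            self.weights[key] += self.rate * (r - self.weights[key])
        self.history.append(opp_move)

def match(p1, p2, trials=5000):
    score = 0
    for _ in range(trials):
        m1, m2 = p1.play(), p2.play()
        score += payoff(m1, m2)
        p1.learn(m1, m2)
        p2.learn(m2, m1)
    return score

random.seed(0)
# An aggressive lag2 learner against a passive lag1 learner: in the paper's
# experiments this kind of pairing favoured the deeper, aggressive player.
print("lag2 aggressive vs. lag1 passive:", match(LagPlayer(2, -1.0), LagPlayer(1, 0.0)))
# Two matched aggressive lag2 learners: the paper found such games to yield
# random-looking play and roughly even scores.
print("lag2 aggressive vs. lag2 aggressive:", match(LagPlayer(2, -1.0), LagPlayer(2, -1.0)))
```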
The counterfactual scenarios also need to be understood in this context. Evolutionary models of game playing, as well as studies on animal populations (Pool, 1995), have demonstrated that evolution can, and may quite often, produce solutions close to the optimal game theory solutions. As noted above, these solutions are the result of the different players co-evolving. The counterfactual scenarios on the computer pit mechanisms that have not co-evolved against each other. However, it is unlikely, given individual differences and the ability to learn, that humans have all arrived at a perfect equilibrium for game playing. Indeed, game theorists generally regard game theory solutions as generalizations about a population, not as descriptions of every individual in the population. Thus, it is not unreasonable to propose that some people may be able to beat others at PRS, or other games, by virtue of the fact that they employ more working memory (e.g., by concentrating), or because they treat ties as losses while their opponent treats ties as neutral events. If this is true, it is interesting to consider the possibility that the effective use of working memory may be a more important determinant in sports such as boxing than pure speed or power.

Essentially, games played in the manner described in this paper fall outside the sphere of the standard game theory approach (i.e., Minimax theory in the case of PRS). As noted above, from an evolutionary perspective we would expect all the networks to evolve to aggressive, lag2 networks. Although this would result in the outcome predicted by game theory (i.e., random-like play), there is no guarantee that this will always be the case. This is also the case for the evolutionary models and the learning models discussed in the Introduction. The reason is that these models can co-evolve to a local rather than a global optimum. So we should not automatically assume that game theory will always provide the correct answer.

12.4. Counterfactual testing

Cognitive science has traditionally taken a mechanistic approach to modeling (i.e., breaking a system down into its component parts; Bechtel, 1998; Clark, 1997, p. 104). There are two principal advantages to this approach. The first is that understanding a system in terms of its components provides insight into how the system is actually constructed. The second is that, when applied to models that do not involve complex patterns of interactions between the components, it allows us to understand the sequential steps involved in the operation of the model and the specific function of each component at each step (Bechtel, 1998). However, since DST models generally involve very complex interactions between system components, they do not provide the second advantage (Clark, 1997; Bechtel, 1998). In fact, most DST models do not refer to the individual parts at all (e.g., models of thermodynamics refer to heat and pressure but not to individual atoms). Counterfactual testing provides a methodology for evaluating DST models that include humans as components (i.e., distributed cognition systems). It also provides a clear delineation of the parts of the system in terms of the agents (human and otherwise) involved in the task. However, because counterfactual testing is based on the global variable values of a DST system, it does not provide insight into the interaction between the agents. Characterizing the interaction can be accomplished to some degree by (1) deduction about what must be going on, based on knowledge of the simulated agents, and (2) other types of analysis (e.g., model tracing).

12.5. Neural networks

Is the lag2 network the best model of the human ability to detect sequential dependencies? As noted above, we strove to create the simplest possible model, so it is possible that a more complex model that has other applications could also produce these effects. Ultimately, extending the model in this paper to more complex games or other situations that involve guessing may prove that it is too limited. In this regard, work indicating that simple recurrent networks (Cleeremans & McClelland, 1991; Elman, 1990) can explain implicit memory tasks may be relevant. Like the networks used in this paper, simple recurrent networks use information from preceding trials to predict current trials, but instead of using a memory buffer the previous trials are represented within the network itself. Since the task (i.e., guessing what comes next) is similar, people should use the same mechanism. Future work should focus on finding a mechanism that can unify different tasks that involve this type of guessing.
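To make the contrast with a buffered lag network concrete, here is a minimal Elman-style network sketch (our own illustration after Elman, 1990, not a model from this paper). The previous hidden state is copied into context units, so sequential information is carried inside the network; backpropagation is truncated at one step, as in standard SRN training. The network sizes, training sequence, and learning rate are all arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N_IN, N_HID, N_OUT = 3, 8, 3                # one-hot move in, predicted move out
W_xh = rng.normal(0, 0.5, (N_HID, N_IN))
W_ch = rng.normal(0, 0.5, (N_HID, N_HID))   # weights from context (previous hidden) units
W_hy = rng.normal(0, 0.5, (N_OUT, N_HID))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# A repeating move sequence (0=paper, 1=rock, 2=scissors); note that the
# successor of a move is ambiguous without context, which is what the
# context units are for.
seq = [0, 1, 2, 2, 1] * 400
lr, context = 0.1, np.zeros(N_HID)
for prev, nxt in zip(seq, seq[1:]):
    x = np.eye(N_IN)[prev]
    h = sigmoid(W_xh @ x + W_ch @ context)
    y = softmax(W_hy @ h)
    target = np.eye(N_OUT)[nxt]
    dy = y - target                         # cross-entropy gradient at the logits
    dh = (W_hy.T @ dy) * h * (1 - h)
    W_hy -= lr * np.outer(dy, h)            # truncated backprop: context is
    W_xh -= lr * np.outer(dh, x)            # treated as a constant input
    W_ch -= lr * np.outer(dh, context)
    context = h                             # copy hidden state into context units

# Feed the next move in the sequence and inspect the prediction.
x = np.eye(N_IN)[seq[-1]]
h = sigmoid(W_xh @ x + W_ch @ context)
print(np.round(softmax(W_hy @ h), 2))       # typically peaks on the pattern's next move
```

Whether such a recurrent mechanism or an explicit lag buffer better describes human play is exactly the open question raised above.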
12.6. Conclusion

Our findings indicate that cognitive factors play a much larger role in simple games than has previously been thought. More specifically, we found that PRS is not a case of two players tossing out random outputs, but rather an intricate dance, in which the outcome is determined by the amount of working memory employed and the values assigned to the outcomes on each trial. The results also demonstrate that randomness can be understood as an emergent property of the interaction between players rather than an individual cognitive ability. This is significant as it offers an explanation of how individual humans could be poor at randomizing when this ability is so important from a game theory perspective.

Acknowledgements

This research was supported, in part, by a grant from the Natural Sciences and Engineering Research Council of Canada to RLW.

References

Anderson, N. H. (1960). Effect of first-order probability in a two choice learning situation. Journal of Experimental Psychology 59, 73–93.
Anderson, J. R., Kushmerick, N., & Lebiere, C. (1993). Navigation and conflict resolution. In: Anderson, J. R. (Ed.), Rules of the mind, Erlbaum, Hillsdale, NJ.
Bechtel, W. (1998). Representations and cognitive explanations: assessing the dynamicist's challenge in cognitive science. Cognitive Science 22 (3), 295–318.
Camerer, C. F. (1997). Progress in behavioral game theory. Journal of Economic Perspectives 11 (4), 167–188.
Camerer, C. F. (1999). Behavioral economics: reunifying psychology and economics. Proceedings of the National Academy of Sciences 96, 10575–10577.
Camerer, C. F., & Ho, T. H. (1999). Experience-weighted attraction learning in normal-form games. Econometrica 67, 827–874.
Clark, A. (1997). Being there: putting brain, body and world together again, MIT Press, Cambridge, MA.
Clark, A. (1998). The dynamic challenge. Cognitive Science 21 (4), 461–481.
Clark, A. (1999). Where brain, body, and world collide. Journal of Cognitive Systems Research 1, 5–17.
Claus, C., & Boutilier, C. (1997). The dynamics of reinforcement learning in cooperative multiagent systems. In: AAAI-97 Workshop on Multiagent Learning.
Cleeremans, A., & McClelland, J. L. (1991). Learning the structure of event sequences. Journal of Experimental Psychology: General 120, 235–253.
Elman, J. L. (1990). Finding structure in time. Cognitive Science 14, 179–211.
Erev, I., & Roth, A. E. (1998). Predicting how people play games: reinforcement learning in experimental games with unique, mixed strategy equilibria. American Economic Review 88 (4), 848–881.
Estes, W. K. (1972). Research and theory on the learning of probabilities. Journal of the American Statistical Association 67, 81–102.
Fudenberg, D., & Levine, D. K. (1998). The theory of learning in games, MIT Press, Cambridge, MA.
Gazzaniga, M. S. (1998). The split brain revisited. Scientific American, July, 50–55.
Gilovich, T., Vallone, R., & Tversky, A. (1985). The hot hand in basketball: on the misperception of random sequences. Cognitive Psychology 17, 295–314.
Huettel, S. A., & Lockhead, G. (2000). Psychologically rational choice: selection between alternatives in a multiple-equilibrium game. Journal of Cognitive Systems Research 1, 143–160.
Hutchins, E. (1991). The social organization of distributed cognition. In: Resnick, L. B., Levine, J. M., & Teasley, S. D. (Eds.), Perspectives on socially shared cognition, The American Psychological Association, Washington, DC.
Hutchins, E. (1995). Cognition in the wild, MIT Press, Cambridge, MA.
Kelso, J. S. (1995). Dynamic patterns: the self-organization of brain and behavior, MIT Press, Cambridge, MA.
Nisbett, R. E., & Wilson, T. D. (1977). Telling more than we can know: verbal reports on mental processes. Psychological Review 84 (3), 231–257.
Pool, R. (1995). Putting game theory to the test. Science 267, 1591–1593.
Port, R. F., & Van Gelder, T. (Eds.), (1995). Mind as motion, MIT Press, Cambridge, MA, Glossary, pp. 573–578.
Rapoport, A., & Budescu, D. V. (1992). Generation of random series in two-person strictly competitive games. Journal of Experimental Psychology: General 121 (3), 352–363.
Restle, F. (1966). Run structure and probability learning: disproof of Restle's model. Journal of Experimental Psychology 72, 382–389.
Rose, R. M., & Vitz, P. C. (1966). The role of runs of events in probability learning. Journal of Experimental Psychology 72, 751–760.
Rosenblatt, F. (1962). Principles of neurodynamics, Spartan, New York.
Roth, A. E. (1996). Comments on Tversky's 'Rational theory and constructive choice'. In: Arrow, K., Colombatto, E., Perlman, M., & Schmidt, C. (Eds.), The rational foundations of economic behavior, Macmillan, New York.
Rumelhart, D. E., Hinton, G. E., & McClelland, J. L. (1986). A general framework for parallel distributed processing. In: Rumelhart, D. E., & McClelland, J. L. (Eds.), Parallel distributed processing: explorations in the microstructure of cognition, MIT Press, Cambridge, MA, pp. 45–76.
Samuelson, L. (1997). Evolutionary games and equilibrium selection, MIT Press, Cambridge, MA.
Sun, R., & Qi, D. (2000). Rationality assumptions and optimality of co-learning. In: Proceedings of PRIMA'2000, Lecture Notes in Artificial Intelligence, Springer, Heidelberg.
Townsend, J. T., & Busemeyer, J. (1995). Dynamic representation of decision making. In: Port, R. F., & Van Gelder, T. (Eds.), Mind as motion, MIT Press, Cambridge, MA.
Treisman, M., & Faulkner, F. (1987). Generation of random sequences by human subjects: cognitive operations or psychophysical process. Journal of Experimental Psychology 116 (4), 337–355.
Tune, G. S. (1964). A brief survey of variables that influence random generation. Perceptual and Motor Skills 18, 705–710.
Tversky, A., & Kahneman, D. (1986). Judgement under uncertainty: heuristics and biases. In: Arkes, H. R., & Hammond, K. R. (Eds.), Judgement and decision making: an interdisciplinary reader, Cambridge University Press, Cambridge, pp. 38–55.
Van Gelder, T., & Port, R. F. (1995). It's about time: an overview of the dynamic approach to cognition. In: Port, R. F., & Van Gelder, T. (Eds.), Mind as motion, MIT Press, Cambridge, MA, pp. 1–44.
Vitz, P. C., & Todd, T. C. (1967). A model of learning for simple repeating binary patterns. Journal of Experimental Psychology 75, 108–117.
VonNeumann, J. (1928). Zur Theorie der Gesellschaftsspiele. Mathematische Annalen 100, 295–320.
VonNeumann, J., & Morgenstern, O. (1944). Theory of games and economic behaviour, Princeton University Press, Princeton, NJ.
Wagenaar, W. A. (1972). Generation of random sequences by human subjects: a critical survey of the literature. Psychological Bulletin 77, 65–72.
Ward, L. M. (1973). Use of Markov-encoded sequential information in numerical signal detection. Perception and Psychophysics 14, 337–342.
Ward, L. M., Livingston, J. W., & Li, J. (1988). On probabilistic categorization: the Markovian observer. Perception and Psychophysics 43, 125–136.
Ward, L. M., & West, R. L. (1994). On chaotic behaviour. Psychological Science 5, 232–236.
Wells, A. J. (1998). Turing's analysis of computation and theories of cognitive architecture. Cognitive Science 22 (3), 269–294.
West, B. J., & Salk, J. (1987). Complexity, organization and uncertainty. European Journal of Operational Research 30, 117–128.