Journal of Cognitive Systems Research 1 (2001) 221–239
www.elsevier.com/locate/cogsys
Simple games as dynamic, coupled systems: randomness and
other emergent properties
Action editor: Ron Sun
Robert L. West a,*, Christian Lebiere b
a Departments of Psychology and Cognitive Science, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario, Canada K1S 5B6
b Human–Computer Interaction Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
Received 28 September 2000; accepted 3 October 2000
Abstract
From a game theory perspective the ability to generate random behaviors is critical. However, psychological studies have
consistently found that individuals are poor at behaving randomly. In this paper we investigated the possibility that the
randomness mechanism lies not within the individual players but in the interaction between the players. Provided that players
are influenced by their opponent’s past behavior, their relationship may constitute a state of reciprocal causation [Cognitive
Science 21 (1998) 461], in which each player simultaneously affects and is affected by the other player. The result of this
would be a dynamic, coupled system. Using neural networks to represent the individual players in a game of paper, rock, and
scissors, a model of this process was developed and shown to be capable of generating chaos-like behaviors as an emergent
property. In addition, it was found that by manipulating the control parameters of the model, corresponding to the amount of
working memory and the perceived values of different outcomes, the game could be biased in favor of one player over
the other, an outcome not predicted by game theory. Human data was collected and the results show that the model
accurately describes human behavior. The results and the model are discussed in light of recent theoretical advances in
dynamic systems theory and cognition. © 2001 Elsevier Science B.V. All rights reserved.
Keywords: Game theory; Dynamic systems; Distributed cognition; Neural networks
1. Introduction
In game theory (VonNeumann & Morgenstern,
1944), models of game playing generally involve a
tension between choosing the best move and behaving unpredictably.¹ This is because, while certain moves may be better than others (i.e., have a higher payoff or a higher chance of success), if players always choose the best move they will be completely predictable to their opponents.

¹ Game theory refers only to imperfect information games. Perfect information games, such as chess, where the best move can in principle be computed, are not covered by game theory.

Essentially, game
theory is designed to specify the optimal balance
between choosing the best move and behaving
unpredictably. However, behaving unpredictably requires that the players be able to incorporate a
random component into their behavior. Generally,
this is modeled by assigning probabilities to the
available moves and drawing at random from the
resulting distribution, or by weighting the available
moves and injecting random noise into the process of
selecting between them. But, in either case, there is
an assumption that people make use of some sort of
internal, randomizing process when playing games.
Unfortunately, this key assumption is at odds with
the fact that research has demonstrated that individuals are quite bad at behaving randomly (see
Tune, 1964; Wagenaar, 1972 for reviews). This is a
serious problem because, if we are to take the
standard game theory model seriously, we should
expect people to be fairly good randomizers.
2. A dynamic/distributed perspective
In addition to being unsupported by the empirical
evidence, the assumption that game players make use
of an internal source of randomness supports a view
of game players as highly isolated cognitive agents.
This can be illustrated using the simple game of
paper, rock, and scissors (henceforth, PRS).² In PRS, no move is preferred over the others so the game theory solution is to play randomly, 1/3 paper, 1/3 rock, and 1/3 scissors. Notice that, in order to do
this, the players do not need to pay attention to their opponent's moves; they need only pay attention to
their own internal sense of randomness. In contrast,
recent theoretical advances in cognitive science have
shown that complex behaviors can arise from dynamic interactions between mind, body, and the
environment (e.g., Bechtel, 1998; Clark, 1997, 1998,
1999; Hutchins, 1995; Port & Van Gelder, 1995).
² PRS is a two-player game. On each turn the players choose
between the moves: paper, rock and scissors. The choice of moves
is displayed simultaneously by the players, usually by means of a
simple sign language. The winner for each turn is determined as
follows: paper beats rock, rock beats scissors, and scissors beat
paper. PRS, or variants of it, occurs in many different cultures
under different names.
Applying this theoretical stance to game playing, the
question that we pose is whether or not the function
of generating randomness could be attributed to the
dynamic interaction between the players rather than
to separate mechanisms within the players, and what
consequences this would have for how we view
simple games.
An important clue as to how this could work
comes from psychological experiments on how
people perform in tasks that involve an element of
guessing. Psychological studies have clearly shown
that under these conditions people almost invariably
adopt the strategy of attempting to detect sequential
dependencies. That is, people pay attention to previous results, they search for sequential dependencies,
and they use this information in an attempt to predict
the next trial. Studies show that when sequential
dependencies exist, people are able to exploit them
(e.g., Anderson, 1960; Estes, 1972; Restle, 1966;
Rose & Vitz, 1966; Vitz & Todd, 1967) and that
when they do not exist, people still use this strategy
even though it results in sub-optimal results (e.g.,
Gazzaniga, 1998; Ward, 1973; Ward, Livingston &
Li, 1988). For example, in a simple guessing task in
which a signal has an 80% chance of appearing on
the top part of a computer screen and a 20% chance
of appearing on the bottom, people will fruitlessly search for sequential dependencies, resulting in a hit rate of roughly 68% (the rate expected from probability matching: 0.8 × 0.8 + 0.2 × 0.2 = 0.68), instead of the hit rate of 80% that could be achieved by always choosing the top part of the screen (Gazzaniga, 1998).
Clark (1997, 1998) refers to the situation in which
two systems are coupled together in such a way that
they drive each other’s behavior as reciprocal causation. In PRS, two players can be considered in a state
of reciprocal causation if their outputs are influenced
by their opponent’s past behavior, as is the case if
both players are using the strategy of detecting
sequential dependencies. For example, player A’s
behavior would be based on A’s beliefs about
sequential dependencies in player B’s outputs, which
would be driven by B’s behavior, which in turn is
driven by A’s behavior, and so on. One possibility
suggested by treating the players as a coupled system
is that random, or pseudo-random, behavior could be
an emergent property of the interaction between
them. In a game situation, the players’ sequential
dependencies (at least from the players’ perspective)
would change constantly as each player alters his or
her outputs to exploit the other’s sequential dependencies. Such a process could make it seem as
though the players were internally generating random
outputs.
In this paper we describe a dynamic model of two
players, coupled together in a game of PRS. The
model was designed to test the general idea laid out
above. However, as Clark (1998) notes, reciprocal causation is often associated with "emergent behaviors whose quality and complexity far exceeds that which either subsystem could display in isolation." And, indeed, this is what we found. Although
PRS is a very simple game, the model revealed an
intricate interaction between players that was sensitive to cognitive manipulations.
3. Paper, rock and scissors
In terms of real world behaviors, PRS can be
considered to represent the elemental game playing
skill of guessing what your opponent will do next.
From an evolutionary perspective, this skill would
have been crucial for survival. For example, consider
a cheetah chasing a gazelle. The gazelle can leap
forward, to the right, or to the left. Therefore, in
order to catch the gazelle, the cheetah must correctly
choose whether to pounce straight ahead, to the right,
or to the left. Similarly, in PRS you must guess what
your opponent will do next in order to win. PRS is
also a ‘repeated game.’ That is, except for the first
move, the player has access to the opponent’s
previous moves (i.e., through memory). More generally, this is usually the case when animals and / or
humans square off against each other in predator /
prey competitions or in disputes over resources or
mates (also in sports such as boxing or basketball).
For example, in mammalian mating competitions,
when two males face each other there is usually
some preliminary movement, seemingly aimed at
feeling out the opponent. Aside from ambushes,
attacks rarely occur in the absence of some recent
history of movement. Thus, although simple, PRS
embodies basic and important game playing skills.
In game theory terms, PRS is a very simple,
zero-sum game. Zero-sum games are games in which
the interests of the players are completely opposed,
in contrast to games in which cooperation is an
option, such as the prisoner's dilemma. VonNeumann's (1928) Minimax Theorem shows that for all zero-sum games there is always an optimal strategy. As noted above, for PRS, this strategy is to play randomly: 1/3 paper, 1/3 rock, and 1/3 scissors. However, it is important to understand what is
meant by the term ‘optimal.’ The game theory
designation of ‘optimal’ is very abstract and connected to the idea that rational players are optimal
players (see Samuelson, 1997, Chapter 1, for a
discussion). Essentially, the optimal strategy refers to
an equilibrium representing the optimal balance
between minimizing risk and maximizing gain, assuming that the opponent will also calculate and
execute an optimal strategy. The important thing to
note is that an optimal strategy will not necessarily
maximize your chance of winning if your opponent
does not use an optimal strategy. For example, if my
PRS opponent plays scissors at an above chance rate,
then I should play rock at an above chance rate to
maximize my chance of winning.
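To make this concrete, here is a small worked example (ours, added for illustration), scoring a win as +1, a loss as −1, and a tie as 0. The expected payoff of playing rock against an opponent with known move probabilities is

\[ E[\text{rock}] = p_{\text{scissors}}(+1) + p_{\text{paper}}(-1) + p_{\text{rock}}(0), \]

so against an opponent who plays scissors half the time and paper and rock a quarter of the time each, E[rock] = 1/2 − 1/4 + 0 = 1/4, a positive expectation. By contrast, the uniform 1/3, 1/3, 1/3 strategy has an expected payoff of exactly zero against any opponent.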
Maximal strategies are designed to maximally
exploit specific instances of biased strategies,³ such
as the one described above. The problem with
maximal strategies is that they require accurate
knowledge of the opponent’s biases, and you need to
assume that your opponent will not change their
strategy in response to your strategy. Game theory
has traditionally avoided these issues by assuming
there is insufficient information to use a maximizing
approach (Samuelson, 1997). However, recent work
in behavioral game theory (e.g., see Camerer, 1997,
1999) has begun to examine the effect of players
learning about their opponent’s strategies during the
course of a game and revising their own strategies
based on this. Under these conditions, it is possible
for the players to fall into equilibriums in which
incorrect views of each other’s strategies persist,
resulting in non-maximal play (e.g., the self-confirming equilibrium, see Fudenberg & Levine, 1998, for
a discussion). Conceptually, the general process that
we describe for producing random outputs in PRS is
somewhat related. The players try to find a maximal
strategy and continually fail, but in doing this they can be seen as being at an equilibrium at which they can be described as playing randomly from the point of view of an outside observer.

³ Note that the term strategy, in this case, refers to all processes that influence the players' outputs. Therefore, an unconscious tendency to be biased would count as a strategy.
4. Description of the model
Our modeling approach was conceptually similar
to Hutchins’ (1991) method, in which cognitive
agents were modeled as very simple neural networks
in order to study the emergent properties arising
from their interaction. PRS play involves the interactions between two cognitive agents. For purposes of
modeling, these interactions were limited to outputting the symbols paper, rock, and scissors. The
individual players were modeled as neural networks
with a memory buffer to store their opponent’s
previous moves. The size of the memory buffer was
variable. For example, a player might remember the
last two moves, or only the last move. The networks
were designed to predict the opponent’s next move
and produce the appropriate counter move. The
inputs to the networks were the opponent’s recent
previous moves (stored in the memory buffer) and
the output was the player’s move for the current trial.
The networks themselves were simple linear
models (Rumelhart, Hinton & McClelland, 1986)
with two layers of nodes (i.e., similar to perceptrons
— Rosenblatt, 1962). The model consisted of two
neural networks representing two players in a PRS
game. The networks were made as simple as possible. Each consisted of one layer for input and one
for output (see Fig. 1). The output layer consisted of
three nodes, one to represent each of paper, rock, and
scissors. The input layer consisted of a variable
number of three node sets. Each set represented the
previous outputs of the opponent network at a
particular lag, with the three nodes in each set again
representing paper, rock, and scissors. Outputs were
determined by summing the weights associated with
the activated connections. The output node producing the highest sum was chosen as the output, with
ties being resolved through random selection. Learning was accomplished through a simple scheme in
which a win was rewarded by adding one to the
activated connections leading to the node representing the output, and a loss was punished by subtracting one. Throughout this paper the networks are referred to in terms of the number of lags processed. For example, a network that utilized only the previous trial would be a lag1 network, a network that considered the previous two trials would be a lag2 network, and so on. In all trials, both networks began with all weights set to zero.

Fig. 1. A simple neural network model of a player's ability to detect sequential dependencies in paper, rock and scissors.
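As a concrete illustration of these mechanics, the following Python sketch (our reconstruction, not the authors' original code; the class name LagNetwork and all other identifiers are ours) implements such a player:

import random

MOVES = ("paper", "rock", "scissors")
BEATS = {"paper": "rock", "rock": "scissors", "scissors": "paper"}  # key beats value

class LagNetwork:
    """Two-layer linear network: one three-node input set per lag, three output nodes."""

    def __init__(self, lags, tie_payoff=-1):
        self.lags = lags                      # size of the memory buffer
        self.tie_payoff = tie_payoff          # -1 punishes ties; 0 ignores them
        # weights[(lag, opponent_move)][output_move], all starting at zero
        self.weights = {(lag, m): {o: 0 for o in MOVES}
                        for lag in range(lags) for m in MOVES}
        self.buffer = []                      # opponent's recent moves, newest first

    def choose(self):
        if len(self.buffer) < self.lags:
            return random.choice(MOVES)       # no prediction without a full buffer
        # sum the weights on the connections activated by the buffer contents
        sums = {o: sum(self.weights[(lag, self.buffer[lag])][o]
                       for lag in range(self.lags)) for o in MOVES}
        best = max(sums.values())
        # the output node with the highest sum wins; ties broken at random
        return random.choice([o for o in MOVES if sums[o] == best])

    def update(self, my_move, opp_move):
        if len(self.buffer) == self.lags:
            if BEATS[my_move] == opp_move:
                delta = 1                     # win: reward the activated connections
            elif BEATS[opp_move] == my_move:
                delta = -1                    # loss: punish them
            else:
                delta = self.tie_payoff       # tie: depends on the player's goals
            for lag in range(self.lags):
                self.weights[(lag, self.buffer[lag])][my_move] += delta
        # push the opponent's latest move into the memory buffer
        self.buffer = [opp_move] + self.buffer[:self.lags - 1]

The tie_payoff parameter anticipates the two treatments of ties discussed below.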
The use of a two layer network is similar to
Townsend and Busemeyer’s (1995) dynamic model
of decision making, in which a two layer network
was used to model the motivational valence associated with different choices. However, the main
motivation behind the network choice was to keep
the model as simple as possible. Our goal was not to
explore different neural network structures, but
rather to embody a sequential dependency detection
strategy in the simplest and most direct way. In this
regard, the model delineates two principal factors
that would drive any sequential dependency detection strategy: (1) the number of lags used for
prediction, and (2) the goals of the system. The first
factor we associated with working memory, or the
amount of space in the buffer. This corresponds to
how many lags back the system could remember.
The goals of the system are reflected in the rewards
and punishments. For example, a system rewarded
for winning and punished for tying and losing is
trying exclusively to win, a system that is punished
for losing and neither punished nor rewarded for
winning or tying does not care what happens as long
as it does not lose, and so on. In PRS the goal is to
win and to avoid losing, so wins were always
rewarded and losses punished. But ties are somewhat
ambiguous in this regard. On one hand, to tie is to
fail to win, but on the other hand to tie is to avoid a
loss. Therefore, we examined two conditions, one in
which ties were punished and one in which ties were
neither punished nor rewarded. This second configuration can be considered analogous to a patient
player who does not care about ties but still wants to
win.
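In the sketch above, these goal settings reduce to the assumed tie_payoff parameter; for example:

# Two hypothetical configurations matching the tie treatments described above
impatient = LagNetwork(lags=2, tie_payoff=-1)  # ties punished, treated like losses
patient = LagNetwork(lags=2, tie_payoff=0)     # ties neither punished nor rewarded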
5. Evaluating the model
A dynamic system is "a system whose state changes over time in a way that depends on its current state according to some rule" (Port & Van Gelder, 1995). As long as a model satisfies this
criterion it can be considered dynamic, although
whether or not it possesses interesting emergent
properties is another question. The concept of
emergent properties has no official definition (see
Clark, 1997, for a review and discussion of possible
definitions), but there is a general agreement that an
emergent property exists "... whenever interesting, non-centrally controlled behavior ensues as a result of the interactions of multiple simple components within a system" (Clark, 1997). Dynamic systems
models are used to study how changes in the control
parameters of the model alter the emergent properties of the model, which are expressed through
global variables. Global variables are variables that
refer to the state of the model as a whole. In our
case, the global variable was the difference between
the players’ scores, and the control parameters were
embodied in the various ways that the model could
be set up (i.e., variations in the number of lags
processed and in the goals of the players).
As Clark (1999) notes, modeling the interaction
between a cognitive agent and its environment
(including other agents in the environment) can be
approached in two ways, through distributed cognition or through dynamic systems theory (DST). DST
models are generally mathematical, employing nonlinear difference equations or differential equations
to model the system as a whole. Generally speaking,
the physical components of a system are not explicitly represented in a DST model (Van Gelder & Port,
1995). For example, in thermodynamic models the
atoms are not represented. In contrast, distributed
cognition extends the computational modeling approach traditionally used in cognitive science to
distributed systems comprised of the environment
and various cognitive agents (Clark, 1999; Hutchins,
1995). This results in a very different type of model,
in which the agents are explicitly represented as
components within the system.
In addition to nonlinear, mathematical models,
computational models that embody nonlinear processes can also be considered as DST models. For
example, neural network models can be considered a
form of DST modeling if the networks embody
nonlinear processes in the form of feedback loops
(Van Gelder & Port, 1995). Although the two layer,
feed-forward networks used in this study did not
individually contain feedback loops, when coupled
together the result was a complex, nonlinear system
involving two simultaneous feedback loops with
information flows in opposite directions. Thus, the
coupled system can be considered as a network-based DST model, constructed within a distributed
cognition framework that identifies the players as
components within the system (see Bechtel, 1998,
for a discussion of computational DST models that
contain explicit representations of their components).
The most common way of exploring dynamic
systems models is by computer simulation. Applied
to distributed cognitive systems this involves modeling the individual agents, simulating the interaction,
and looking for meaningful emergent properties (e.g.,
see Hutchins, 1991). The key to evaluating whether
the emergent properties of a model provide a good
account of the actual phenomena lies in the ability of
the model to produce testable counterfactuals (Bechtel, 1998; Clark, 1997). By counterfactuals what is
meant is that the model can predict what would
happen under different conditions (i.e., different
parameter settings). Without this, a dynamic model is
essentially a description of the phenomena (Clark,
1997), although this can still be useful for demonstrating that a phenomenon can, in principle, be
produced by a certain type of dynamic system (Van
Gelder & Port, 1995). For example, if the model in
this paper were only able to produce random-like
behaviors this would demonstrate that it could, in
principle, account for this behavior in humans; but
that is all. However, if the model predicts different
outcomes depending on the control parameter set-
tings, then demonstrating that humans react in the
same way to changes in the control parameters
would indicate that the model captures a fundamental
aspect of the process.
In actuality, some counterfactual claims cannot be
tested. For example, it is impossible to run time
backwards to test physics models concerning time,
although it is possible to run the models backwards
to create counterfactual scenarios. This problem also
crops up in psychological DST models. For example,
in dynamic models of limb movement it is not
possible to test the effect of having eight fingers on
each hand instead of five. However, this is generally
not a problem in phenomenologically rich areas,
such as body movement, where a dynamic model can
prove its value by parsimoniously describing a wide
range of naturally occurring behaviors with a single
model (e.g., see Kelso, 1995). However, a simple
behavior, such as playing PRS, poses a problem, as
there simply is not a wide range of naturally
occurring PRS behaviors to test the model against.
To deal with this we developed a methodology
that allowed us to test the model by placing human
subjects in counterfactual scenarios through a mixing
of simulation and reality. The methodology, called
counterfactual testing, involved running simulations
to find distinct emergent properties associated with
the interactions between different types of simulated
players. Following this, the simulated player that was
believed to model human behavior was replaced with
real human players, who faced the same simulated
opponents. The emergent properties of the human/computer games were then compared to those from the computer/computer games. If the emergent properties were the same, it would indicate that, at least with respect to generating the emergent property, the humans interacted with their simulated opponents in a manner similar to the component they replaced.
6. Experiment 1: simulating random PRS play
Game theory makes two predictions concerning
PRS. The first is that the expected result should be a
tie. The second is that the outputs of the players are
random or random-like. To model this, identical
network models were played against each other.
Intuitively, it is somewhat obvious why this would be expected to produce a tie, since neither network had any advantage over the
other. However, to produce random-like outputs the
coupled system would need to behave in a chaos-like
manner since there was no independent source of
noise. Generally, computer simulated dynamic
models capable of producing chaos-like outputs
allow for a high number of decimal places to
simulate continuous processes. Simulations of systems lacking such fine precision tend to produce
limit cycles (i.e., simple repeating patterns) or converge to an equilibrium point and repeat the same
output (e.g., see Ward & West, 1994). In fact, any
closed dynamic system modeled using discrete rather
than continuous data (i.e., all computer simulations
of dynamic systems) will eventually settle into a
repeating pattern, but with sufficient fidelity the
length of the repeating pattern is astronomical. Thus
the critical question was whether the very simple
network models used in this study could produce a
chaos-like effect⁴ (for a discussion of the fact that simple, discrete, symbolic systems can produce complex dynamic behaviors, see Wells, 1998).

⁴ The term chaos-like is used instead of chaos since truly chaotic systems, i.e. systems that never repeat, exist only in mathematics or the physical world. In this case, chaos-like is simply meant to refer to dynamic systems that appear to an observer to behave randomly.
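Under the assumptions of the earlier sketch, the coupled system can be simulated with a short game loop (again ours, not the authors' code); plotting the returned score-difference series produces traces of the kind shown in Figs. 2 and 3:

def outcome(a, b):
    """+1 if move a beats move b, -1 if it loses, 0 for a tie."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1

def play(net1, net2, trials=5000):
    """Couple two LagNetwork players and track the global variable:
    the difference between the players' scores across trials."""
    diff, series = 0, []
    for _ in range(trials):
        m1, m2 = net1.choose(), net2.choose()  # moves revealed simultaneously
        net1.update(m1, m2)                    # each learns from the other's move
        net2.update(m2, m1)
        diff += outcome(m1, m2)
        series.append(diff)
    return series

# e.g., a lag2 versus lag2 game with ties treated as losses
series = play(LagNetwork(2), LagNetwork(2), trials=5000)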
6.1. Results and discussion
Multiple trials were run, pitting lag1 networks against lag1 networks, lag2 networks against lag2 networks, and lag3 networks against lag3 networks. As would be expected by symmetry, no individual network was able to systematically gain an advantage over its opponent. As can be seen from Fig. 2, which displays a lag1 versus lag1 game (plotted in terms of the difference in scores across trials), the coupled lag1 networks initially produced a chaos-like result resembling a random walk, but eventually settled to an equilibrium. However, as illustrated in Fig. 3, when the complexity of the system was increased to a lag2 versus a lag2 game the system was sufficiently complex to support the random walk behavior for a large number of trials (actually, 5000 trials could be considered excessive for PRS). This result demonstrates that this type of coupled system is capable of generating chaos-like behaviors, and that the random component in game playing could, in principle, be generated through the interaction between players. This result is quite important as it demonstrates an avenue for resolving the conflict between the game theory claim that randomness is central to game playing, and the psychological finding that people are poor at behaving randomly.

Fig. 2. The difference in score across trials for a lag1 network versus a lag1 network.

Fig. 3. The difference in score across trials for a lag2 network versus a lag2 network.

7. Experiment 2: simulating unequal PRS play

The results of Experiment 1 demonstrated that the coupled networks could mimic random behavior through a dynamic process. The next step was to alter the control parameters of the model to see if it could produce non-random behaviors. For the model in this paper, producing a non-random result in the form of a systematic advantage for one player would be important as it would indicate that the model could produce PRS behaviors not predicted by game theory. In Experiment 1, the working memory capacity, expressed in terms of how many lags a network could hold in its buffer, was equal in each game. This experiment investigated the effect of unequal working memory capacities. From a game theory perspective this was a bit unusual as it is generally assumed that the players have the same cognitive abilities, but from a psychological perspective we know that this is often not the case (e.g., due to individual differences and situational differences, such as stress or a high cognitive load). As in Experiment 1, ties were treated as losses.

7.1. Results and discussion

We simulated 10 games of 500 trials each of a lag2 network versus a lag1 network and found that the lag2 network was significantly more likely to have a higher final score than the lag1 network (P = 0.027, pairwise, two-tailed t-test). We also simulated 10 games of 30,000 trials each of a lag3 network versus a lag2 network and found a similar advantage for the higher lag network (P = 0.013, pairwise, two-tailed t-test). The high number of trials for the lag3 versus lag2 games was necessary to produce a significant difference, indicating that there are diminishing returns for using a higher number of lags. Fig. 4 displays representative results of four lag2 versus lag1 games. The random walk quality found in Experiment 1 is preserved, but the results appear as a random walk with a trend favoring the higher lag network. Thus the model produces a naturalistic-looking game in which the player with the larger working memory enjoys a systematic advantage.

Fig. 4. The difference in score across trials for several games of a lag2 network versus a lag1 network. The score differences were calculated as the lag2 network's score minus the lag1 network's score.
In terms of understanding the process that led to
this result, it is important to keep in mind that the
system that produced it was a coupled system. From
a strict DST perspective, we should think of it as a
single, dynamic system, and not as two separate
players. However, because each network comprised a
distinct module of the system it is possible to gain
some insight into the nature of the interaction
between them (Bechtel, 1998). Since we know that
the networks used the strategy of learning sequential
dependencies, we know that the interaction between
them must have created sequential dependencies in
the outputs of the lower lag network that the higher
lag network could learn. It is also possible to
determine whether or not these learned sequential
dependencies were stable by looking at the connection weights across time. Fig. 5 displays representative results showing the change in connection
weights across time for a lag2 network during a
game against a lag1 network. As can be seen, no
pattern emerges across time, indicating that the lag2
network was not learning a stable pattern of sequential dependencies (if it were, a stable pattern of differences between the weights would increase and become more evident across time). Since nothing stable was learned, the lag2 network must have won by learning relatively short-lived sequential dependencies in the lag1 outputs. From this we can characterize the process as one of learning and unlearning relatively short-lived sequential dependencies.

Fig. 5. Relative connection weights across trials for a lag2 network during a game against a lag1 network. The graph shows the influence of the 'paper' input nodes (i.e., at lag1 and lag2) on the outputs, across trials. The relative weights were created by using the connection weight between the lag1, paper input node and the paper output node as a standard. The graph shows the difference between the weight of this connection and the weight of the other connections between the lag1 and lag2 paper input nodes and the output nodes. All of the connection weights displayed this pattern, indicating that nothing stable was learned.
To more precisely determine the length of the
sequential dependencies learned by the networks, we
computed their frequency as a function of their
length. For each input unit, a learned sequential
dependency was defined as the number of trials in
which the weight from that unit to a given output
unit remained larger than the weights from that input
unit to the other output units. We ran 1000 games of
5000 trials each of the lag2 network playing the lag1
network. Fig. 6 plots the number of learned sequential dependencies of length 1 to 100 as a percentage
of all the learned sequential dependencies. Since the
percentage as a function of the length is plotted on a
log–log scale, the roughly linear nature of the curve
suggests a power law distribution of frequencies.
Such distributions are pervasive in natural settings
(e.g., West & Salk, 1987). The curve decreases
quickly as a function of length, with a power law
exponent roughly equal to −2, which results in more
than 90% of dependencies lasting less than 25 trials.
As can be seen in Fig. 6, the lag2 network learned
fewer short sequential dependencies and more long
sequential dependencies than the lag1 network. A
similar pattern was found for the lag3 network versus
the lag2 network. Overall, these results suggested that the advantage of the higher lag networks was due to the ability to detect and exploit longer sequential dependencies.

Fig. 6. Frequency of sequential dependencies by length for lag1 versus lag2 network games. The gray line represents the lag2 frequencies and the solid line represents the lag1 frequencies.
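A minimal sketch of this measurement, assuming per-trial snapshots of a network's weights were saved during the game (e.g., by appending copy.deepcopy(net.weights) to a list on each trial of the play loop above; all names are ours):

def dependency_lengths(snapshots, lags):
    """For each input unit, count runs of consecutive trials during which the
    weight to one output unit stays strictly larger than the weights to the
    other two output units; the run lengths are the learned sequential
    dependencies defined in the text."""
    lengths = []
    for lag in range(lags):
        for m in MOVES:
            current, run = None, 0
            for w in snapshots:               # one weight table per trial
                row = w[(lag, m)]
                top = max(row, key=row.get)
                strict = sum(v == row[top] for v in row.values()) == 1
                winner = top if strict else None
                if winner is not None and winner == current:
                    run += 1                  # the same output stays dominant
                else:
                    if run:
                        lengths.append(run)   # a dependency has ended
                    current, run = winner, (1 if winner else 0)
            if run:
                lengths.append(run)           # flush the final run
    return lengths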
8. Experiment 3: humans versus the lag1 network

Experiment 2 produced counterfactual scenarios in which a larger working memory capacity led to a systematic advantage. The next step was to see if humans could also produce this type of behavior. A lag1 network opponent was used to create a condition which maximized the chance that human subjects could win, since a win by the computer could be explained in terms of the computer passively detecting naturally occurring sequential dependencies in subjects' outputs. To win against the network the human players would need to get the network to generate the kind of sequential dependencies they could detect, without generating too many of the kind of sequential dependencies that the network could detect. However, if we assume that the lag2 network is a reasonable model of the human ability to detect sequential dependencies, then this should occur as an emergent property of the interaction, provided that human players actually use the strategy of detecting sequential dependencies to play PRS. Also, note that if the human players used the strategy of attempting to generate random outputs this would result in a tie if they were able to be sufficiently random to prevent the computer from detecting any sequential dependencies, or a loss if they failed to achieve this standard. It is not possible to beat the lag1 network using this strategy.

8.1. Method

8.1.1. Subjects
The human subjects were nine volunteers from the University of British Columbia.

8.1.2. Apparatus
The experiment was conducted using a program written in Visual Basic. Subjects could use a mouse to click on three different icons to indicate their move (i.e., paper, rock, or scissors). Following this they clicked on a button marked NEXT to reveal the computer's response. The subject's score, the computer's score, and the number of trials were displayed and updated on each trial.

8.1.3. Procedure
All subjects were required to play against a lag1 network for approximately 20 min. The number of trials varied according to each subject's speed and interest in continuing. All subjects played at least 300 trials. Subjects were instructed that the computer's responses were not random, that it was programmed to play like a human, and that it was possible to beat it. They were also told that the program was too complex to figure out and that the way to win was to play by intuition. As in Experiments 1 and 2, ties were treated as losses.
8.2. Results and discussion
The mean final scores were 173 for the humans
and 150 for the computer. A pairwise t-test revealed
that this difference was significant (P = 0.036, two-tailed), indicating that the humans were able to
outplay the lag1 network. Fig. 7 displays the difference in scores between the subjects and the computer
across trials. As we can see, only one subject
performed badly, while two performed at roughly a
breakeven level and six were clearly able to outplay
the computer. To get an idea of the general trend the
mean score difference was plotted and is displayed in
Fig. 8 (note: to get an unbiased function the data set
was truncated at 300 trials to make the number of
trials the same across subjects). A regression analysis
on the data in Fig. 8 revealed a significant (P < 0.001) linear trend, indicating a systematic advantage in favor of the human subjects. A regression on the non-truncated data set also produced a significant, positive, linear trend (P < 0.001). Finally, to more
directly compare the human results to the lag2
network, 100 games of 300 trials each between a
lag2 network and a lag1 network were simulated.
The simulations produced an average score difference of 10.47 (s.d. 13.04) in favor of the lag2
network. At 300 trials, the average score difference
between the subjects and the lag1 network was 9.99
(s.d. 19.61), quite close to the value predicted by the simulation. Fig. 9 displays a percent distribution of the differences in final scores for the humans versus the lag1 and the lag2 versus the lag1.

Fig. 7. The differences in score across trials for nine humans versus the lag1 network. The score differences were calculated as the human score minus the lag1 network score.

Fig. 8. Mean score differential across trials (the data set was truncated at 300 trials to make the number of trials the same across subjects).
In Experiment 2 the results suggested that the
advantage of the lag2 network over the lag1 network
was due to the ability of the lag2 network to learn
longer lasting sequential dependencies. To examine
this in terms of the human data we adopted a
methodology called model tracing (Anderson, Kushmerick & Lebiere, 1993). This involved matching
each subject with a lag2 network and once again
simulating games between a lag2 network and a lag1
network. In each game we forced the lag2 network to
make the same moves as those made by their human
counterpart in the recorded game. The lag1 network
was constrained in the same way. The results were analyzed in the same manner as in Experiment 2.

Fig. 9. A percentage based distribution of final score differences for humans versus the lag1, and the lag2 versus the lag1. The score differences were calculated as the human score minus the lag1 score and the lag2 score minus the lag1 score.
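Under the same assumptions as the earlier sketches, the model-tracing procedure might be implemented as follows (our reconstruction): both stand-in networks are forced to reproduce the recorded game while their weights are updated normally, so that their learned sequential dependencies can then be analyzed:

def trace_game(human_moves, lag1_moves):
    """Replay a recorded human versus lag1 game through stand-in networks.
    Neither network chooses freely; both learn as if they had produced
    the recorded moves."""
    stand_in = LagNetwork(lags=2)   # lag2 network standing in for the human
    original = LagNetwork(lags=1)   # lag1 network replaying its own moves
    for h, c in zip(human_moves, lag1_moves):
        stand_in.update(h, c)       # weights updated for the human's recorded move
        original.update(c, h)
    return stand_in, original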
Fig. 10 plots the frequency of learned sequential
dependencies as a function of length for the lag1
network and for the lag2 network standing in for the
subjects. For comparison purposes, Fig. 10 also plots
the frequencies generated by the lag1 and lag2
networks playing freely against each other in Experiment 2. The form of the distribution generated by the
free-playing networks was well reproduced. However, because we found very few learned sequential
dependencies lasting over 25 trials, we were unable
to examine the range where the difference between
the free-playing lag1 and lag2 networks was most
pronounced (see Fig. 6). This was probably because
the sequential dependencies lasting over 25 trials are
relatively rare and the model-tracing simulations were restricted to low numbers of trials and games. However, there was a tendency, discernable in Fig. 10, for the lag2 network to learn slightly fewer short sequential dependencies and slightly more long sequential dependencies than the lag1 network, as predicted by the free-playing results.

Fig. 10. Frequency of sequential dependencies by length for the model-tracing results (Experiment 3) and the free-playing results (Experiment 2). For the model-tracing results the circles represent the lag2 frequencies and the triangles represent the lag1 frequencies. For the free-playing results the dotted line represents the lag2 frequencies and the solid line represents the lag1 frequencies.
After the experiment, subjects were informally
asked for any insights they had concerning their own
play. Several subjects reported attempting to draw
the computer into a vulnerable position and then
exploit it, but when pressed for details could not
provide them or described a strategy that would not
work. Specifically, some subjects reported achieving
success by repeating a pattern until the computer 'got used to it,' and then altering their pattern to
exploit what the computer had learned. However,
this simple strategy will not work because the lag1
network unlearns as fast as it learns. This is because
the reward for winning and punishment for losing
were balanced (reward +1, punishment −1). Thus
teaching the network a pattern and then exploiting
the pattern leads to a breakeven situation at best. To
win, a player must present a pattern that, while being
learned, simultaneously exploits earlier learning.
9. Experiment 4: humans versus the lag2 network

Experiment 4 examined what happened when
subjects played PRS against a lag2 network. Following from the simulation results, if the mechanism
that subjects used to play was functionally equivalent
to a lag2 network they should have had a 50/50
chance of winning. However, unlike the computer
models, subjects may not always play perfectly. For
example, they could experience lapses in attention,
which could result in processing, on average, more
than one lag but less than two. Thus it was possible
that subjects could beat a lag1 network but fail to tie
a lag2 network. On the other hand, if subjects were
able to beat a lag2 network, as they beat the lag1
network, it would falsify the claim that humans play
PRS in a manner similar to the lag2 network. Also,
such a win could not be explained in terms of
subjects playing at a lag3 level (or higher) as the
number of trials involved would be insufficient to
distinguish between a lag2 network and a lag3
network (recall from the simulation results that a
lag3 network can be distinguished from a lag2
network only with an enormous number of trials).
Therefore, a collective victory by human subjects
against a lag2 network would indicate a fundamental
problem with the model used in this study.
9.1. Method
9.1.1. Subjects
Eighteen subjects from the University of Hong
Kong volunteered to play against the lag2 network.
Eight of these were tested on a one-by-one basis (similar to Experiment 3), while the other 10 were
tested as a group. For the group test a computer lab
was used so that each subject had his or her own PC.
9.1.2. Procedure
The conditions, instructions, and apparatus were
the same as in Experiment 3, except that the individually tested subjects were asked to play until
either they or the lag2 network reached a score of 50,
and the group tested subjects were asked to play until
either they or the lag2 network reached a score of
100. As in the previous experiments ties were treated
as losses by the network. Subjects were told to play
at their own pace and after approximately 20 min all
subjects were stopped, regardless of how far they
had gotten. Also, subjects were told that the computer was programmed to play quite well and that if
they won only by a little it would demonstrate
considerable ability on their part. This was done to
avoid subjects becoming discouraged and losing
concentration if they failed to gain a decisive advantage.
9.2. Results and discussion
Taking an average of the difference between subjects' final scores and the lag2 network's final scores produced a mean difference of −8.89 (s.d. 19.74). A paired t-test revealed that the difference in final score was significant (P = 0.036). Thus, these
results were consistent with the claim that subjects
play PRS in a manner similar to a lag2 network, but
without the consistency of the computer. Alternatively, it was also possible that some subjects did play as
well as the lag2 network but that the averaged results
were dragged down by other subjects who did not.
One factor that might enter here is that it is simply
less fun and less motivating to play when you are not
winning. Because it was evident from these results
that at least some subjects were not playing in the
same way as a lag2 network, we did not pursue
model-tracing.
10. Experiment 5: simulating different payoffs
Experiment 2 simulated the effect of altering the
number of lags processed. In addition to this parameter, it was also possible to adjust the payoffs for
wins, losses, and ties (i.e., the amounts that were
allotted for rewarding and punishing the network). In
Experiments 1 to 4 the networks were rewarded by
adding one to the connection weights for winning,
and punished by subtracting one for losing or tying.
These weights were based on the evolutionary
argument that a tie is a waste of resources and
therefore undesirable. However, taking a less long-term view, it could also be argued that a tie is a
neutral event and should therefore be neither
punished nor rewarded.
At first, it may seem that removing the punishment
for ties should not affect the outcome of a higher lag
model being able to beat a lower lag model. After
all, the effect is merely to prevent a network from learning to avoid ties; it should still learn to predict
losses and wins. Viewed in this way a lag2 network
should still be able to beat a lag1 network, although
the rate of winning could be slower due to more ties.
This reasoning, however, applies to a single network
attempting to detect a stable pattern of sequential
dependencies. Since the results of Experiment 2
indicated that the interaction between two networks
produces short-term sequential dependencies, it was
not obvious how altering the payoffs would affect
the behavior of the system. Therefore, as in Experiments 1 and 2, simulations were used to explore the
dynamics of the model.
10.1. Results and discussion
All simulations were 5000 trials long. To aid in
the discussion a network that is punished for tying
will be referred to as an aggressive network and a
network that is neither punished nor rewarded for
tying will be referred to as a passive network. In the
first simulation, aggressive lag2 networks were
played against passive lag2 networks. The result was
a clear tendency for the aggressive network to win.
Fig. 11 shows some representative results. Next,
aggressive lag1 networks were played against passive lag2 networks. In this case the results were less
stable, with some runs producing dramatic rise and
fall patterns in the score differential. Overall, there
was no evidence of a systematic advantage for the
lag2 network over the lag1 network. One interesting characteristic of these results was a tendency for the lag1 network to go on winning streaks followed by less intense, but longer losing streaks. This pattern was very obvious in some runs but not in others. Fig. 12 shows an example of a run with this result. Also evident in Fig. 12 is a fractal structure, in which the same pattern is repeated on different scales (fractal patterns often occur as an emergent property of dynamic systems).

Fig. 11. The difference in score across trials for several games of an aggressive lag2 network versus a passive lag2 network. The score differences were calculated as the aggressive network's score minus the passive network's score.

Fig. 12. The difference in score across trials for an aggressive lag1 network versus a passive lag2 network. Note the fractal structure, in which the same pattern is repeated on different scales.
To gain a further understanding of how the
aggressive lag2 network was able to beat the passive
lag2 network, we performed the same analysis on the
learned sequential dependencies as in Experiment 2.
An aggressive lag2 network was played against a
passive lag2 network for 1000 games of 5000 trials
each. Fig. 13 plots the frequency of their learned
sequential dependencies. The results show that the
winning network, the aggressive lag2, learned more
short sequential dependencies and fewer long ones
than the losing network, the passive lag2. This result
is the opposite of the Experiment 2 results, in which
the winning network learned fewer short sequential
dependencies and more long sequential dependencies
than the losing network. Thus, learning longer
sequential dependencies is not necessarily preferable,
as it appeared from the Experiment 2 results. Instead,
whether learning more long or more short dependencies is associated with winning depends on the
characteristics of the networks involved.
Fig. 13. Frequency of sequential dependencies by length for
aggressive lag2 versus passive lag2 network games. The gray line
represents the aggressive lag2 frequencies and the solid line
represents the passive lag2 frequencies.
11. Experiment 6: humans versus a less
aggressive network
The results of Experiment 5 indicated that in order
to beat the aggressive lag1 network in Experiment 3,
the human subjects would have needed to play in a
way similar to an aggressive lag2 network. Since the
results of Experiment 5 also indicated that an
aggressive lag2 network can beat a passive lag2
network, it was predicted that human subjects would
also be able to beat a passive lag2 network.
11.1. Method
11.1.1. Subjects
Twenty-two subjects from the University of Hong
Kong volunteered to play against the passive lag2
network. All of the subjects were tested simultaneously, using a computer lab, as in Experiment 4.
11.1.2. Procedure
The conditions, instructions, and apparatus were
the same as in the group condition in Experiment 4.
As in Experiment 4, subjects were asked to play until
either they or the lag2 network reached a score of
100.
11.2. Results and discussion
Out of the 22 subjects, only six failed to win. The
mean final score for the human subjects was 95.27
and the mean final score for the lag2 network was
84.14. A paired t-test revealed that this difference was significant (P = 0.009, two-tailed). As in Experiment 3, simulations were run to better compare subjects' performance with the lag2 network. In this experiment, subjects played an average of 287 trials, so 100 simulations of 287 trials each were run, playing an aggressive lag2 network against a passive lag2 network. The mean difference in final score for the simulation was 11.17 (s.d. 20.35) in favor of the aggressive network. For subjects versus the passive lag2 network the mean difference in final score was 11.14 (s.d. 23.05), very close to that of the simulation. Fig. 14 displays a percent distribution of the differences in final scores for the humans versus the passive lag2 and the aggressive lag2 versus the passive lag2.

Fig. 14. A percentage based distribution of final score differences for humans versus the passive lag2, and the aggressive lag2 versus the passive lag2. The score differences were calculated as the human score minus the passive lag2 score and the aggressive lag2 score minus the passive lag2 score.

Fig. 15. Frequency of sequential dependencies by length for the model-tracing results (Experiment 6) and the free-playing results (Experiment 5). For the model-tracing results the circles represent the aggressive lag2 frequencies and the triangles represent the passive lag2 frequencies. For the free-playing results the dotted line represents the aggressive lag2 frequencies and the solid line represents the passive lag2 frequencies.
As in Experiment 3, we applied model-tracing to
further evaluate the results. In this case, this involved
forcing an aggressive lag2 network to make the same
moves as the human players in games against a
passive lag2 network. The frequency of learned
sequential dependencies are plotted in Fig. 15, along
with the results from the free-playing networks from
Experiment 5. As in Experiment 3, we found very
few of the relatively rare sequential dependencies
lasting over 25 trials, probably due to the low
numbers of games and trials. However, the model-tracing results do a good job of reproducing the
frequency distributions of the free-playing networks
within the shorter range.
12. Discussion
In this study we investigated a distributed cognition model of the interaction between players in a
simple, zero-sum, guessing game (i.e., PRS). Our
results show that human subjects produce results
highly similar to the aggressive lag2 network for two
different counterfactual scenarios, playing an aggressive lag1 opponent and playing a passive lag2
opponent. Although human subjects displayed a
slight tendency to lose against an aggressive lag2
opponent, instead of tying as predicted by the
simulations, this can be accounted for by considering
factors such as concentration and motivation (see
Experiment 4 for a discussion). The validity of our
model is further strengthened by the fact that it was
based, a priori, on psychological research describing
how people behave under similar conditions.
12.1. Other models
Other models have also tried to describe how
game theory solutions could arise in the behavior of
humans and animals. These can be divided into three
distinct, but interrelated areas: evolutionary models,
learning models, and psychological models.
Evolutionary models are used primarily by biologists
to explain how the appropriate probabilities for
different actions could have developed. The players
are computer-simulated automata that evolve specific
strategic responses through the use of genetic algorithms. Explaining how game theory solutions could
have evolved is important as game theory has been
successfully used to predict real animal behaviors,
such as the competitive strategies of spiders and
naked mole rats (see Pool, 1995, for a review).
Consistent with this, it is possible to get computer
simulations of simple automata to evolve to game
theory solutions (Roth, 1996).
Similar to evolutionary models, the purpose of
learning models is to explain how players acquire
the probabilities associated with each move in a
game. In these models players attempt to improve
their strategies by learning through experience. However, because players may have incomplete information they can evolve to equilibriums that are
non-optimal and also non-maximal (e.g., see Camerer & Ho, 1999; Claus & Boutilier, 1997; Fudenberg & Levine, 1998; Sun & Qi, 2000). This type of
modeling is important because it can explain (a) how
humans can acquire effective strategies for novel
games without being able to perform the complex
calculations involved in determining the optimal
game theory strategy, and (b) why humans often do
not use the optimal game theory strategy (Pool,
1995). The literature in these areas is large and
diverse (e.g., see Camerer, 1997, 1999; Camerer &
Ho, 1999; Claus & Boutilier, 1997; Erev & Roth,
1998; Fudenberg & Levine, 1998; Sun & Qi, 2000).
However, we believe that the unique contribution of
our model is that it explicitly avoids assuming that
individual players use some sort of randomizing
function to implement their strategy. Instead, we
postulate that the randomizing function exists as an
emergent property of the interaction. Typically,
evolutionary models and learning models converge
to a vision of the player as possessing a set of
probabilities expressing the likelihood of each move
and a means of randomly selecting moves according
to these probabilities, which is difficult to reconcile
with the fact that human players are poor at randomizing.
Psychological models seek to replace the objectively rational assumptions of game theory with
psychologically realistic assumptions. This goal is
supported by research in psychology and experimental economics demonstrating that human behavior
systematically and predictably deviates from the
rational course of action. Generally speaking, psychological models have attributed the human inability to behave randomly on demand to cognitive biases
related to an incorrect understanding of randomness.
These have been explored extensively (e.g., Tversky & Kahneman, 1986; Treisman & Faulkner, 1987)
and it is clear that these biases exist when people
attempt to generate random outputs in isolation.
However, our results indicate that people generate
outputs in a different way when involved in an
interactive game situation. In terms of studies testing
people under actual game playing conditions, our
results are consistent with Rapoport and Budescu
(1992) who found that humans behave more randomly in a game situation than in isolation, and with
Huettel and Lockhead (2000) who found that
humans are strongly influenced by past trials when
playing games.
Another related finding in this area is Gilovich,
Vallone and Tversky’s (1985) well-known paper,
The hot hand in basketball. This paper went beyond
artificially created lab situations and attempted to
show that people erroneously perceive sequential
dependencies in real life human games. Specifically
they tested the belief that professional basketball
players are more likely to score if they have recently
scored (in other words, that players are governed by
sequential dependencies). The result was negative: they found no evidence for this in the task they
studied (shooting sequences of baskets from the free
throw line). This finding would be completely trivial
if the claim were limited to their specific task.
However, the notoriety of the study is due to the
possibility that the result (i.e., people seeing sequential dependencies where they do not exist) can be
extended to human game playing in general. With
regard to this study, it is important to note that the
statistical methodology used by Gilovich et al.
(1985) is only valid for evaluating stable, long-term
sequential dependencies and cannot be used to rule
out the possibility that the players are able to detect
short-term sequential dependencies.
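The point can be made concrete with a toy simulation (ours, not a reanalysis of the Gilovich et al. data; the regime length, transition probabilities, and window size below are arbitrary assumptions). The sketch generates a shooting record whose lag-one dependency reverses direction every 25 shots. A single global estimate of P(hit | previous hit), the kind of quantity a test for stable sequential dependencies is sensitive to, comes out near chance, while windowed estimates reveal strong short-term structure.

import random

random.seed(1)

# Toy shooting record: the lag-one dependency flips direction every 25
# shots (hot regime: P(hit | previous hit) = 0.7; cold regime: 0.3),
# while the overall hit rate stays near 0.5.  All values are assumptions.
shots = [random.random() < 0.5]
for t in range(1, 1000):
    hot = (t // 25) % 2 == 0
    p_hit = (0.7 if hot else 0.3) if shots[-1] else (0.3 if hot else 0.7)
    shots.append(random.random() < p_hit)

def p_hit_given_hit(seq):
    pairs = [(a, b) for a, b in zip(seq, seq[1:]) if a]
    return sum(b for _, b in pairs) / len(pairs)

# One global estimate averages the two regimes and looks like chance.
print("global P(hit | hit):", round(p_hit_given_hit(shots), 3))

# Windowed estimates expose the short-term structure the global test hides.
windows = [shots[i:i + 25] for i in range(0, len(shots), 25)]
local = [p_hit_given_hit(w) for w in windows if any(w[:-1])]
print("local P(hit | hit) from", round(min(local), 2),
      "to", round(max(local), 2))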
12.2. Non-distributed, non-dynamic explanations
Our network models show that, under the right
conditions, the interactions between players using the
strategy of detecting sequential dependencies can put
the players into a co-learning equilibrium in which
one player enjoys an advantage that the other player
cannot learn to avoid. The empirical results of this
study also indicate that humans play PRS in a
manner closely approximated by an aggressive lag2
network. To further evaluate this claim we need to
consider the possibility that human players could win
against the networks using a different type of
strategy. The optimal game theory strategy of selecting moves at random will not work, as the expected
outcome is a tie. However, it is possible to win by
allowing the network to detect sequential dependencies that are, in reality, decoys drawing it into a
predictable pattern of play. This strategy of using
one’s own pattern of responses to exert control over
one’s opponent’s responses will be referred to as the
decoy strategy.
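Before pursuing the decoy strategy it helps to have a concrete picture of what is being decoyed. The following Python sketch implements the bare logic of a lag2 sequential dependency detector: it tabulates the opponent's moves conditional on the opponent's two preceding moves and plays the counter to the most likely continuation. This count-based stand-in is our illustration only; the networks used in the experiments realize the same idea with learned connection weights and differ in detail.

import random
from collections import defaultdict

MOVES = ["paper", "rock", "scissors"]
COUNTER = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

class Lag2Detector:
    """Counts the opponent's moves conditional on their last two moves and
    counters the most likely continuation.  Ties between counts are broken
    deterministically by the order of MOVES."""

    def __init__(self):
        self.counts = defaultdict(lambda: {m: 0 for m in MOVES})
        self.history = []                    # opponent's past moves

    def choose(self):
        if len(self.history) < 2:
            return random.choice(MOVES)      # no two-move context yet
        table = self.counts[tuple(self.history[-2:])]
        predicted = max(MOVES, key=lambda m: table[m])
        return COUNTER[predicted]            # beat the predicted move

    def update(self, opp_move):
        if len(self.history) >= 2:
            self.counts[tuple(self.history[-2:])][opp_move] += 1
        self.history.append(opp_move)

player = Lag2Detector()
for opp_move in ["rock", "rock", "paper", "rock", "rock"]:
    player.update(opp_move)
# Having once seen (rock, rock) followed by paper, the detector now
# predicts paper in that context and answers with scissors.
print(player.choose())                       # scissors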
The decoy strategy is generally considered problematic as it evokes a recursive pattern of reasoning.
For example, in PRS, if player A played rock on the
first five trials then the opponent, player B, would be
tempted to play paper on the 6th trial, but B would
also know that A knows that B is tempted to play
paper and that therefore A would play scissors and B
should counter with rock. B might further realize that
if he could figure this out then so could A, and
therefore that A would expect B to play rock and
would counter with paper, and so on into an infinite
regress. However, if the players know that their
opponent is using a sequential dependency detection
strategy (something the subjects did not know), the
decoy strategy is tenable. For example, if the opponent learns faster than he unlearns, then the decoy
strategy can be executed by repeating a pattern until
the opponent learns it. Once the pattern is learned the
player can switch to a pattern that exploits responses
based on the learned pattern. Because the learning
process is shorter than the unlearning process, the
cost of teaching the opponent a pattern (i.e., allowing
the opponent to win in order to convince him the
pattern is valid) is less than the advantage gained
during the exploitation phase. But, if the opponent
unlearns at least as fast as he learns, as the network
models in this study did, then the decoy strategy is
more difficult to implement.
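The regress itself is easy to exhibit mechanically. Each level of "B knows that A knows..." applies the counter relation once more, and in PRS that relation cycles with period three, so iterated reasoning of this kind never settles on a move. A minimal sketch:

COUNTER = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

# Player A has shown rock five times running.  Level 0: B simply counters
# rock.  Each further level of "but A knows that B knows..." counters the
# previous conclusion; the counter map has no fixed point, so the chain
# cycles forever instead of converging.
move = "rock"
for level in range(7):
    move = COUNTER[move]
    print("level", level, "->", move)
# paper, scissors, rock, paper, scissors, rock, paper, ...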
To win against the networks used in this study it is
necessary to play according to a pattern that simultaneously exploits previously learned patterns while
causing the opponent to learn a new pattern that can
be exploited in the future. To do this a player would
need to know how his opponent was detecting
sequential dependencies, something our subjects did
not know. Even with this knowledge, it is a fairly
complex problem to reason out how to simultaneously set up and exploit the opponent. An alternative to
reasoning out the problem is to mentally simulate the
opponent’s sequential dependency detection mechanism. However, this would involve a very heavy
cognitive load, even without considering the time
factor (the speed of the human players, on average,
was approximately one move every 1 to 3 seconds).
Also, we need to ask how a player would get the
detailed information necessary to construct an accurate simulation. Having no knowledge about how
their opponent would play, the only way the human
players in this study could gain this knowledge
would be by sampling how the computer played
and/or by doing planned experiments. Again it is
difficult to believe that this is what the human
players were doing.
Note how far we have gone from the simplicity of
the network models in order to come up with a
tenable explanation. Even if we assume that humans
are capable of (1) deducing their opponent’s strategy
through sampling and/or experimentation and (2)
using logic or mental simulations to determine their
next move, we need to ask ourselves if such an
elaborate scheme makes sense from an evolutionary
point of view. One of the arguments supporting a
dynamical model of movement, as opposed to the
classical model in which movements are calculated
in the head and then executed, is that there is no time
to calculate trajectories and forces when one is being
chased by a bear. The systems must work in real
time (see Clark, 1997, for a discussion of this point).
Similarly, in a human-on-human fight (e.g., for
mating privileges), there is no time to sample the
opponent’s behavior, form a model, and work out the
appropriate response.
Finally, subjects’ own accounts of how they
played indicated no complex planning. It is interesting, though, that subjects did report vague notions of
the decoy strategy (see Experiment 3). However, the
most likely reason for this is that they were merely
reporting their best guess as to how they were
playing. In many popular games it is possible to
fake-out one’s opponent. For example, in soccer you
can fake going left and then go right to get around
your opponent. This type of strategy is actually a
very simple instantiation of the decoy strategy, so it
is possible that subjects were merely drawing on
their experiences of playing games to generate a
hypothesis as to how they were playing against the
networks. The fact that people can mistake this type
of hypothesis for an insight into their actual behavior
has been well documented (Nisbett & Wilson, 1977).
12.3. Game theory revisited
Game theory is a way of calculating optimal
strategies, but the game theory strategy of playing
randomly is not the best strategy to use against the
network models in this study. For example, a better
strategy would be to: (1) use a network-like sequential dependency detection strategy, (2) play aggressively (i.e., treat ties as losses), and (3) process more
lags than your opponent. As demonstrated in the
experiments above, this will cause you to win, rather
than tie as with the game theory strategy. However,
the game theory strategy can still be considered an
accurate description of the optimal strategy. Why is
this so? Because the term ‘optimal’ refers to an
equilibrium (Samuelson, 1997). As pointed out in
the Introduction, if an opponent has a bias the game
theory solution does not maximize your chance of
winning. Instead, the maximal strategy is one that
exploits the specific biases of your opponent. However, to do this, you must also play in a biased way,
opening yourself up to exploitation. The game theory
solution represents an equilibrium where both
players have optimally maximized their advantages
and minimized their risks. This is why the evolutionary models and the learning models tend to produce
solutions similar to the game theory solution; the
players, through trial and error, co-evolve to an
equilibrium. If the networks were free to evolve,
passive networks and networks with less working
memory would be weeded out through competition.
Given that there is not much of an advantage to
going beyond a lag2 network in terms of memory
(unless the games are extremely long), the result
would be that all the networks would evolve to be
lag2 (or higher), aggressive networks. As Experiment
1 demonstrated, lag2 versus lag2 games result in
essentially random outputs from the players, as
predicted by game theory.
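The equilibrium logic can be checked directly with expected values. In the sketch below (a simple calculation of ours; the rock-heavy distribution is an arbitrary example of a bias), every move earns exactly zero against a uniformly random opponent, so nothing can beat random play, whereas a biased opponent can be exploited, but only by committing to a biased, and therefore itself exploitable, reply. The tie_value parameter also shows how an aggressive player, who scores ties as losses, values the same outcomes differently.

MOVES = ["paper", "rock", "scissors"]
BEATEN_BY = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def expected_payoff(move, opp_dist, tie_value=0.0):
    """Per-trial expected payoff of move against a mixed strategy.
    tie_value = -1.0 models an aggressive player who treats ties as
    losses."""
    ev = 0.0
    for opp, p in opp_dist.items():
        if move == opp:
            ev += p * tie_value
        elif BEATEN_BY[opp] == move:
            ev += p                          # move beats opp
        else:
            ev -= p                          # move loses to opp
    return ev

uniform = {m: 1 / 3 for m in MOVES}
biased = {"rock": 0.5, "paper": 0.25, "scissors": 0.25}

for name, dist in [("uniform", uniform), ("rock-biased", biased)]:
    print(name, {m: round(expected_payoff(m, dist), 3) for m in MOVES})
# Against the uniform player every move is worth 0; against the biased
# player paper is worth +0.25, but playing paper reliably is itself a
# bias that an adaptive opponent could learn to exploit.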
The counterfactual scenarios also need to be
understood in this context. Evolutionary models of
game playing as well as studies on animal populations (Pool, 1995) have demonstrated that evolution
can, and may quite often, produce solutions close to
the optimal game theory solutions. As noted above,
these solutions are the result of the different players
co-evolving. The counterfactual scenarios on the
computer pit mechanisms that have not co-evolved
against each other. However, it is unlikely, given
individual differences and the ability to learn, that
humans have all arrived at a perfect equilibrium for
game playing. Indeed, game theorists generally
regard game theory solutions as generalizations
about a population, not as descriptions of every
individual in the population. Thus, it is not unreasonable to propose that some people may be able to beat
others at PRS, or other games, by virtue of the fact
that they employ more working memory (e.g., by
concentrating), or because they treat ties as losses
while their opponent treats ties as neutral events. If
this is true, it is interesting to consider the possibility
that the effective use of working memory may be a
more important determinant in sports such as boxing
than pure speed or power.
Essentially, games played in the manner described
in this paper fall outside the sphere of the standard
game theory approach (i.e., Minimax theory in the
case of PRS). As noted above, from an evolutionary
perspective we would expect all the networks to
evolve to aggressive, lag2 networks. Although this
would result in the outcome predicted by game
theory (i.e., random-like play), there is no guarantee
that this will always be the case. This is also the case
for the evolutionary models and the learning models
discussed in the Introduction. The reason is that
these models can co-evolve to a local rather than a
global optimum. So we should not automatically
assume that game theory will always provide the
correct answer.
12.4. Counterfactual testing
Cognitive science has traditionally taken a mechanistic approach to modeling (i.e., breaking a system
down into its component parts, Bechtel, 1998; Clark,
1997, p. 104). There are two principal advantages to
this approach. The first is that understanding a
system in terms of its components provides insight
into how the system is actually constructed. The
second is that, when applied to models that do not
involve complex patterns of interactions between the
components, it allows us to understand the sequential
steps involved in the operation of the model and the
specific function of each component at each step
(Bechtel, 1998). However, since DST models generally involve very complex interactions between
system components, they do not provide the second
advantage (Clark, 1997; Bechtel, 1998). In fact, most
DST models do not refer to the individual parts at all
(e.g., models of thermodynamics refer to heat and
pressure but not to individual atoms). Counterfactual
testing provides a methodology for evaluating DST
models that include humans as components (i.e.,
distributed cognition systems). It also provides a
clear delineation of the parts of the system in terms
of the agents (human and otherwise) involved in the
task. However, because counterfactual testing is
based on the global variable values of a DST system,
it does not provide insight into the interaction
between the agents. Characterizing the interaction
can be accomplished to some degree by (1) deduction about what must be going on, based on knowledge of the simulated agents, and (2) other types of
analysis (e.g., model tracing).
12.5. Neural networks

Is the lag2 network the best model of the human ability to detect sequential dependencies? As noted above, we strove to create the simplest possible model, so it is possible that a more complex model that has other applications could also produce these effects. Ultimately, extending the model in this paper to more complex games or other situations that involve guessing may prove that it is too limited. In this regard, work indicating that simple recurrent networks (Cleeremans & McClelland, 1991; Elman, 1990) can explain implicit memory tasks may be relevant. Like the networks used in this paper, simple recurrent networks use information from preceding trials to predict current trials, but instead of using a memory buffer the previous trials are represented within the network itself. Since the task (i.e., guessing what comes next) is similar, people should use the same mechanism. Future work should focus on finding a mechanism that can unify different tasks that involve this type of guessing.
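As a sketch of this distinction (our illustration; the layer sizes and untrained random weights are arbitrary, and the training step is omitted), the following Elman-style forward pass carries the past in a recurrent hidden state rather than in an explicit lag buffer:

import numpy as np

rng = np.random.default_rng(0)

# Minimal Elman-style network: the hidden state is fed back as context on
# the next step, so preceding trials are represented implicitly in the
# network's state instead of in a memory buffer of fixed lags.
n_in, n_hid, n_out = 3, 8, 3                 # one-hot PRS moves in and out
W_xh = rng.normal(0.0, 0.5, (n_hid, n_in))
W_hh = rng.normal(0.0, 0.5, (n_hid, n_hid))  # recurrent (context) weights
W_hy = rng.normal(0.0, 0.5, (n_out, n_hid))

def step(x, h_prev):
    h = np.tanh(W_xh @ x + W_hh @ h_prev)    # input combined with context
    return W_hy @ h, h                       # scores for the next move

h = np.zeros(n_hid)
for move in [0, 0, 1, 2, 0]:                 # a short run of observed moves
    y, h = step(np.eye(n_in)[move], h)
print("next-move scores after the run:", np.round(y, 2))
# With training (omitted here), the recurrent weights let the network pick
# up sequential dependencies without a fixed lag horizon.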
12.6. Conclusion
Our findings indicate that cognitive factors play a
much larger role in simple games than has previously
been thought. More specifically, we found that PRS
is not a case of two players tossing out random
outputs, but rather an intricate dance, in which the
outcome is determined by the amount of working
memory employed and the values assigned to the
outcomes on each trial. The results also demonstrate
that randomness can be understood as an emergent
property of the interaction between players rather
than an individual cognitive ability. This is significant as it offers an explanation of how individual
humans could be poor at randomizing when this
ability is so important from a game theory perspective.
Acknowledgements
This research was supported, in part, by a grant
from the Natural Sciences and Engineering Research
Council of Canada to RLW.
References
Anderson, N. H. (1960). Effect of first-order probability in a two-choice learning situation. Journal of Experimental Psychology 59, 73–93.
Anderson, J. R., Kushmerick, N., & Lebiere, C. (1993). Navigation and conflict resolution. In: Anderson, J. R. (Ed.), Rules
of the mind, Erlbaum, Hillsdale, NJ.
Bechtel, W. (1998). Representations and cognitive explanations:
assessing the dynamicist’s challenge in cognitive science.
Cognitive Science 22 (3), 295–318.
Camerer, C. F. (1997). Progress in behavioral game theory.
Journal of Economic Perspectives 11 (4), 167–188.
Camerer, C. F. (1999). Behavioral economics: reunifying psychology and economics. Proceedings of the National Academy
of Sciences 96, 10575–10577.
Camerer, C. F., & Ho, T. H. (1999). Experience weighted
attraction learning in normal-form games. Econometrica 67,
827–874.
Clark, A. (1997). Being there: putting brain, body and world
together again, MIT Press, Cambridge, MA.
Clark, A. (1998). The dynamic challenge. Cognitive Science
21 (4), 461–481.
Clark, A. (1999). Where brain, body, and world collide. Journal of
Cognitive Systems Research 1, 5–17.
Claus, C., & Boutilier, C. (1997). The dynamics of reinforcement
learning in cooperative multiagent systems. In: AAAI-97 Workshop on Multiagent Learning.
Cleeremans, A., & McClelland, J. L. (1991). Learning the
structure of event sequences. Journal of Experimental Psychology: General 120, 235–253.
Elman, J. L. (1990). Finding structure in time. Cognitive Science
14, 179–211.
Erev, I., & Roth, A. E. (1998). Predicting how people play games:
reinforcement learning in experimental games with unique,
mixed strategy equilibria. American Economic Review 88 (4),
848–881.
Estes, W. K. (1972). Research and theory on the learning of
probabilities. Journal of the American Statistical Association
67, 81–102.
Fudenberg, D., & Levine, D. K. (1998). The theory of learning in
games, MIT Press, Cambridge, MA.
Gazzaniga, M. S. (1998). The split brain revisited. Scientific
American July, 50–55.
Gilovich, T., Vallone, R., & Tversky, A. (1985). The hot hand in
basketball: on the misperception of random sequences. Cognitive Psychology 17, 295–314.
Huettel, S. A., & Lockhead, G. (2000). Psychologically rational
choice: selection between alternatives in a multiple-equilibrium
game. Journal of Cognitive Systems Research 1, 143–160.
Hutchins, E. (1991). The social organization of distributed cognition. In: Resnick, L. B., Levine, J. M., & Teasley, S. D. (Eds.),
Perspectives on socially shared cognition, The American Psychological Association, Washington, DC.
Hutchins, E. (1995). Cognition in the wild, MIT Press, Cambridge, MA.
Kelso, J. S. (1995). Dynamic patterns: the self-organization of
brain and behavior, MIT Press, Cambridge, MA.
Nisbett, R. E., & Wilson, T. D. (1977). Telling more than we can
know: verbal reports on mental processes. Psychological Review 84 (3), 231–257.
Pool, R. (1995). Putting game theory to the test. Science 267,
1591–1593.
Port, R. F., & Van Gelder, T. (Eds.), (1995). Mind as motion, MIT
Press, Cambridge, MA, Glossary, pp. 573–578.
Rapoport, A., & Budescu, D. V. (1992). Generation of random series in two-person strictly competitive games. Journal of Experimental Psychology: General 121 (3), 352–363.
Restle, F. (1966). Run structure and probability learning: disproof
of Restle’s model. Journal of Experimental Psychology 72,
382–389.
Rose, R. M., & Vitz, P. C. (1966). The role of runs of events in
probability learning. Journal of Experimental Psychology 72,
751–760.
Rosenblatt, F. (1962). Principles of neurodynamics, Spartan, New
York.
Roth, A. E. (1996). Comments on Tversky’s ‘Rational theory and constructive choice’. In: Arrow, K., Colombatto, E., Perlman, M., & Schmidt, C. (Eds.), The rational foundations of economic behavior, Macmillan, New York.
Rumelhart, D. E., Hinton, G. E., & McClelland, J. L. (1986). A
general framework for parallel distributed processing. In:
Rumelhart, D. E., & McClelland, J. L. (Eds.), Parallel distributed processing: explorations in the microstructure of cognition,
MIT Press, Cambridge, MA, pp. 45–76.
Samuelson, L. (1997). Evolutionary games and equilibrium
selection, MIT Press, Cambridge, MA.
Sun, R., & Qi, D. (2000). Rationality assumptions and optimality
of co-learning. In: Proceedings of PRIMA’2000, Lecture Notes
in Artificial Intelligence, Springer, Heidelberg.
Townsend, J. T., & Busemeyer, J. (1995). Dynamic representation
of decision making. In: Port, R. F., & Van Gelder, T. (Eds.),
Mind as motion, MIT Press, Cambridge, MA, pp. 1–44.
Treisman, M., & Faulkner, F. (1987). Generation of random sequences by human subjects: cognitive operations or psychophysical process? Journal of Experimental Psychology: General 116 (4), 337–355.
Tune, G. S. (1964). A brief survey of variables that influence random generation. Perceptual and Motor Skills 18, 705–710.
Tversky, A., & Kahneman, D. (1986). Judgment under uncertainty: heuristics and biases. In: Arkes, H. R., & Hammond, K. R. (Eds.), Judgment and decision making: an interdisciplinary reader, Cambridge University Press, Cambridge, pp. 38–55.
Van Gelder, T., & Port, R. F. (1995). It’s about time: an overview
of the dynamic approach to cognition. In: Port, R. F., & Van
Gelder, T. (Eds.), Mind as motion, MIT Press, Cambridge, MA,
pp. 1–44.
Vitz, P. C., & Todd, T. C. (1967). A model of learning for simple
repeating binary patterns. Journal of Experimental Psychology
75, 108–117.
VonNeumann, J. (1928). Zur Theorie der Gesellschaftsspiele.
Mathematische Annalen 100, 295–320.
VonNeumann, J., & Morgenstern, O. (1944). Theory of games and economic behavior, Princeton University Press, Princeton, NJ.
Wagenaar, W. A. (1972). Generation of random sequences by
human subjects: a critical survey of the literature. Psychological
Bulletin 77, 65–72.
Ward, L. M. (1973). Use of Markov-encoded sequential information in numerical signal detection. Perception and Psychophysics 14, 337–342.
Ward, L. M., Livingston, J. W., & Li, J. (1988). On probabilistic categorization: the Markovian observer. Perception and Psychophysics 43, 125–136.
Ward, L. M., & West, R. L. (1994). On chaotic behaviour.
Psychological Science 5, 232–236.
Wells, A. J. (1998). Turing’s analysis of computation and theories
of cognitive architecture. Cognitive Science 22 (3), 269–294.
West, B. J., & Salk, J. (1987). Complexity, organization and
uncertainty. European Journal of Operational Research 30,
117–128.