CAN WE RATIONALLY LEARN TO COORDINATE?¹

SANJEEV GOYAL AND MAARTEN JANSSEN
ABSTRACT. In this paper we examine the issue of whether individual rationality considerations are sufficient to guarantee that individuals will learn to coordinate. This question is central in any discussion of whether social phenomena (read: conventions) can be explained in terms of a purely individualistic approach. We argue that the positive answers to this general question that have been obtained in some recent work require assumptions which incorporate some convention. This conclusion may be seen as supporting the viewpoint of 'institutional individualism' in contrast to 'psychological individualism'.
KEY WORDS: Individual rationality, collective rationality, learning, coordination, conventions.
"We must remember that a pattern whether
of the past or of the future is always arbitrary or partial in that there could always
be a different one or a further elaboration
of the same one."
(C. Palliser, Quincunx, 1989, Book V: The
Maliphants, Part V).
1. INTRODUCTION
Many situations in everyday life can be understood in the framework of coordination problems. Familiar examples include which side of the road to drive on, whether or not to call back once the connection of a telephone conversation has suddenly broken and, in the context of economics, what type of commodity to carry in your pocket as a medium of exchange. The typical feature of these and other coordination problems is that the optimal action of every individual agent crucially depends on the action that is chosen by the other agent(s). Moreover, there are two or more (Nash equilibrium) combinations of actions at which each agent's action is optimal given the actions of the other agents. Finally, the agents are often indifferent between the different equilibrium combinations. Accordingly, there is a strong sense in which the best course of action is undetermined.
Experience suggests that people in 'real' coordination problems
are, on the whole, very well able to coordinate their actions. It
seems that for every particular problem some actions 'stand out'
compared to the other possible actions and that people tend to use
the actions that stand out. At the theoretical level, this has led to
investigations into the nature of 'salience' (Schelling, 1960) and
'conventions' (Lewis, 1969). The integration of these concepts into
the individualistic language of game theory in which the coordination problems are cast has turned out to be rather problematic, however. Two questions have attracted some attention in this
respect. First, how do the notions of salience and convention relate
to the view that individual agents choose their actions in a rational way? (See, e.g., Gauthier, 1975 and Gilbert, 1989.) Second,
can the notions of salience and convention themselves be grounded
on rationality principles? (See, e.g., Bacharach, 1991 and Sugden,
1992.)
One might argue that these questions are inherently difficult to
address in a static context and in this paper we take a slightly different perspective by investigating whether individual agents can
learn, in a rational way, to coordinate their actions if the coordination problem that they face is a repeated one. If this were the
case, the second question above could be answered in the affirmative, i.e., a convention could be said to emerge from a rational learning process. Another rationale for adopting a dynamic
perspective is that it is generally believed that although individual agents might not be able to solve a particular coordination
problem at once, they will learn to coordinate actions after some
time.
This paper will argue that individual rationality considerations
are not sufficient to ensure that individual agents learn to coordinate. On the contrary, some form of common background (convention) has to be assumed for rational learning to coordinate to take
place. The conclusion that we draw from this analysis for the individualistic program underlying much of the social sciences is in
line with the position defended by Agassi (1960); see also Janssen
(1993). He argues, in our view convincingly, against a psychological form of individualism and in favor of an institutional form of
individualism. According to the latter view, the aim of the individualistic program is "neither to assume the existence of all co-
CAN WE RATIONALLY LEARN TO COORDINATE?
31
ordination nor to explain all of them, but rather to assume the existence of some co-ordination in order to explain the existence of
some other co-ordination" (Agassi, 1960, p. 263). We will argue
that some conventions have to exist in order to explain the emergence of other conventions. In recent papers (Goyal and Janssen,
1993a,b) we have followed this approach in developing formal models to study questions of inertia and dynamic stability of conventions.
The above point of view might appear to be in conflict with the
general idea behind recent papers by Crawford and Haller (1990)
and Kalai and Lehrer (1993). We will discuss these papers in some
detail while developing our argument. Crawford and Haller argue
that there are optimal rules of learning to coordinate. These optimal rules themselves are, however, in general not unique and, in
our view, this non-uniqueness problem can only be resolved by
positing the existence of a coordinating convention at a higher level. Kalai and Lehrer, on the other hand, show that under some
restrictions on prior beliefs, Bayesian ('rational') learning leads
in the long run to Nash equilibrium. Applying this result to our
context might suggest that Bayesian learning leads to individual agents playing one of the pure Nash equilibria of the coordination game. We will argue that the restrictions on prior beliefs
that they require implicitly assume a conventional selection of priors.
Another conclusion one might draw from our analysis is that the
perfect rationality paradigm is not very powerful when it comes to the
study of interactive learning processes. Instead, one could resort to
models of boundedly rational behavior or evolutionary models. We
think recent developments (cf., Kandori et al., 1993; Young, 1993)
within these approaches are interesting, but they fall outside the
scope of the present paper. Here, we focus on the issue to what extent
learning by perfectly rational agents can ensure coordination.
In order to be able to state our argument with some care, we
have to give definitions of the basic terms that are used. This will be
done in Section 2. Section 3 contains the discussion of the problems
rational individual agents face while learning to coordinate. Sections
4 and 5 examine the papers by Crawford and Haller (1990) and Kalai
and Lehrer (1993a), respectively. Section 6 concludes with a brief
comparison of the two approaches.
2. NOTATION AND DEFINITIONS
In order to concentrate on issues of coordination we will consider
pure coordination games only. In a pure coordination game there is
no conflict of interest; the only aim of the players is to coordinate.
For notational simplicity we will restrict ourselves to the two-player case. Most of the time we also restrict attention to the simplest coordination game, namely the following two-action game:
             L        R
    T      (1,1)    (0,0)
    B      (0,0)    (1,1)

This game has two pure strategy Nash equilibria: (T, L) and (B, R).
In the discussion of Crawford and Haller (1990) in Section 4 we
will also consider a k-action coordination game, where the pay-off matrix is the k × k identity matrix. Denote the row player and the column player as 1 and 2, respectively. The two players are playing this game an infinite number of times. Let δ ∈ [0, 1) denote the common discount factor. The players observe all actions and the resulting pay-offs, i.e., history is completely observable. We also assume that players have perfect recall. The players cannot communicate except via their actions.² These assumptions and the rationality of the players are assumed to be common knowledge.
It is possible that the two players might choose one of the two coordination equilibria by chance. If, for example, in period t player 1 chooses T with probability 1/2 and player 2 chooses L with probability 1/2, then the chance of coordination in period t will be equal to 1/2. Moreover, if the two players keep playing these mixed strategies, then for any finite number of times the game is actually played, the chance that they have coordinated at every instance is strictly positive. This type of 'coordination' is unsatisfactory, however. What one wants is that coordination is ensured, not that there is a (small) probability that it might result. Also, it is not clear what the players learn when they constantly play this mixed strategy.
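To make the arithmetic concrete, the following minimal simulation (our illustration, not part of the original text) has both players mix 1/2-1/2 in every period; a single period then coordinates with probability 1/2, so the chance of having coordinated at every one of the first n periods is (1/2)^n, strictly positive but vanishing.

```python
import random

def coordinated_every_period(n_periods: int, rng: random.Random) -> bool:
    """Both players mix 1/2-1/2 each period; return True if every period
    happens to land on one of the coordinated pairs (T, L) or (B, R)."""
    for _ in range(n_periods):
        a1 = rng.choice("TB")
        a2 = rng.choice("LR")
        if (a1, a2) not in {("T", "L"), ("B", "R")}:
            return False
    return True

rng = random.Random(0)
n, trials = 5, 100_000
hits = sum(coordinated_every_period(n, rng) for _ in range(trials))
print(f"empirical P(coordinated in all {n} periods): {hits / trials:.4f}")
print(f"exact value (1/2)^{n}: {0.5 ** n:.4f}")
```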
In order to clarify what we mean by 'learning to coordinate', we introduce the following (standard) notation and definitions. A (behavior) strategy³ of player i is a function s_i : H → Δ(A_i), where H is the set of all possible histories, Δ(A_i) denotes the set of probability distributions over A_i, and A_1 = {T, B} and A_2 = {L, R}. Thus, a strategy specifies how a player randomizes over his actions after all possible histories. We denote by π_i(s_1, s_2) the discounted sum of expected pay-offs of player i if strategies s_1 and s_2 are played, i.e.,

    π_i(s_1, s_2) = (1 − δ) Σ_{t=1}^∞ δ^{t−1} E_{(s_1, s_2)}[π_i(a_1(t), a_2(t))],   i = 1, 2,

where a_i(t) ∈ A_i is the action of player i in period t and π_i(a_1(t), a_2(t)) is the stage-game pay-off of player i in period t.
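As a concrete reading of this pay-off criterion, the sketch below estimates π_i by Monte Carlo, truncating the infinite sum at a finite horizon. The code is our own illustration; the function names and the example strategy (the 'randomize until coordinated, then repeat' rule discussed in Section 4) are hypothetical constructions, not the authors'.

```python
import random

def discounted_payoff(strategy1, strategy2, delta=0.9, horizon=200, trials=2000, seed=1):
    """Monte Carlo estimate of (1 - delta) * sum_t delta**(t-1) * stage payoff,
    truncated at `horizon` periods (the discarded tail is geometrically small).
    A strategy maps the observed history (a list of action pairs) to an action."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        history, value = [], 0.0
        for t in range(1, horizon + 1):
            a1 = strategy1(history, rng)
            a2 = strategy2(history, rng)
            stage = 1.0 if (a1, a2) in {("T", "L"), ("B", "R")} else 0.0
            value += (1 - delta) * delta ** (t - 1) * stage
            history.append((a1, a2))
        total += value
    return total / trials

def randomize_then_repeat(own_actions):
    """'Randomize 1/2-1/2 until a coordinated pair occurs, then repeat your
    part of it forever' (the C&H-style rule discussed in Section 4)."""
    def play(history, rng):
        if history and history[-1] in {("T", "L"), ("B", "R")}:
            # repeat own component of the last (coordinated) pair
            return history[-1][0] if own_actions == ("T", "B") else history[-1][1]
        return rng.choice(own_actions)
    return play

s1 = randomize_then_repeat(("T", "B"))   # row player
s2 = randomize_then_repeat(("L", "R"))   # column player
print(f"estimated discounted pay-off: {discounted_payoff(s1, s2):.3f}")
```

For δ = 0.9 the estimate should come out near 1/(2 − δ) ≈ 0.91, since under this pair of strategies the first coordination arrives at a geometric time with parameter 1/2 and pays 1 in every period thereafter.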
Player i's belief about player −i's strategy is denoted by μ_i, where μ_i : H → Δ(A_{−i}). Such a belief specifies how player i expects player −i to randomize over his actions after any possible history. Note that the possibility of learning about the other player's strategy is incorporated in the belief.
Given the above definitions of strategies and beliefs, the notion of individually rational behavior that is employed in this paper is a traditional one. A strategy s_i is said to be individually rational if there is a belief μ_i such that (i) s_i maximizes the discounted sum of expected pay-offs given μ_i and (ii) μ_i is 'consistent' with all information gathered throughout history. For the moment, we restrict ourselves to a very general notion of consistency, according to which a belief should be such that, at any point in time, the observed history up to that time should have been possible given a player's own strategy and this belief. Other notions of consistency, like Bayesian updating, will be introduced later.
We are now ready to define the basic concept of 'coordination'.
Coordination is regarded as a property of a pair of strategies as
follows.
DEFINITION 1. A pair of strategies {s_1, s_2} is coordinated if for any ε > 0 there is a finite t*(ε) such that for all t > t*(ε),

    P[(a_1(t), a_2(t)) = (T, L) | (s_1, s_2)] > 1 − ε
or
    P[(a_1(t), a_2(t)) = (B, R) | (s_1, s_2)] > 1 − ε.

The expression P[(a_1(t), a_2(t)) = (T, L) | (s_1, s_2)] in the above definition has to be read as the probability that the action combination (T, L) is played in period t given that the players have chosen the strategy combination (s_1, s_2).
Two elements of this definition are worth emphasizing. First, we
say that a pair of strategies is coordinated only if one of the two pure
strategy equilibria is played from some finite time period onwards
with very high probability. A pair of strategies which results, for
example, in alternating between (T, L) and (B, R) therefore does
not satisfy this requirement. We have opted for this formulation,
because we want to investigate to what extent a convention to play
the stage game can emerge out of a process in which rational players
learn while playing the game. Second, we do not require that there
is a finite period after which the players always play the same pure
strategy equilibrium. This is done in order to take account of the
possibility that players randomize until a certain, somehow desired,
outcome (for example, (T, L)) in the stage game is obtained and
play pure strategies ever after (cf., Section 4). For every finite period there is always a (small) chance that they have not reached the
desired outcome if they choose to play such a strategy.
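The second feature of the definition can be checked numerically. In the sketch below (our illustration, under the assumption that both players aim at the specific outcome (T, L)), the probability that (T, L) is played in period t equals 1 − (3/4)^t, which exceeds any threshold 1 − ε from a finite t*(ε) onwards even though no finite period guarantees coordination.

```python
import random

def prob_TL_at_t(t, trials=50_000, seed=2):
    """Monte Carlo estimate of P[(a1(t), a2(t)) = (T, L)] when both players
    randomize 1/2-1/2 until (T, L) is first realized and then play their
    part of it forever (the strategy pair sketched above)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        locked = False
        pair = None
        for _ in range(t):
            pair = ("T", "L") if locked else (rng.choice("TB"), rng.choice("LR"))
            if pair == ("T", "L"):
                locked = True   # (T, L) reached: repeat it ever after
        hits += pair == ("T", "L")
    return hits / trials

for t in (1, 5, 10, 25):
    print(f"t = {t:2d}: simulated {prob_TL_at_t(t):.4f}, exact {1 - 0.75 ** t:.4f}")
```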
3. THE PROBLEM STATED
In the previous section we have not discussed the learning process
itself: how do rational individuals learn? It is clear that the strategies
that players can rationally choose do not change over time, if the
beliefs #i about future plays of the opponent that are consistent with
past observations do not change. This suggests that the problem of
learning in games is of the following type: what can the players infer
about the future on the basis of a finite number of observations? Can
they become more certain about future plays of the opponent the
more observations they have about the opponent's past play? This
is, of course, very similar to the well-known Problem of Induction.
In the present context, however, there is an additional problem. An agent should take into account the possibility that the opponent's play is not stationary, because she (the opponent) is also learning about the strategies he himself is playing and will try to adjust optimally.
Two notions of learning that are frequently employed in game theory and in economics are adaptive (Cournot) learning and 'stationary' (Bayesian) learning (cf., Eichberger et al., 1993).⁴ For the present purposes, both notions are unsatisfactory, because they assume in one way or another that an individual player takes the other player's stage game strategy as stationary. This is something the players know to be false, since they know their opponent is learning at the same time.
A notion of learning that is more satisfactory in this respect is the
notion of sophisticated learning introduced by Milgrom and Roberts
(1991). They show that the stage game strategies of what they regard
as sophisticated learners converge to the set of iteratively undominated strategies. Applying their results (especially Theorem 5) to
our context yields the conclusion that sophisticated learning cannot
ensure coordination. The idea, basically, is that an uncoordinated pair of strategies such as, for example, {s_1, s_2} = {BBB…, LLL…}, where BBB… stands for 'always B', is consistent with sophisticated learning, because at any time each player can consistently forecast that the other player will see that it is in his or her interest to switch actions the next time period, i.e., nothing prevents player 1 from expecting player 2 to play R the next period after having observed a finite number of L's. It is then rational to continue to play B.
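A bare-bones rendering of this possibility (our illustration; the forecasting rule is the one just described): each player best-responds to the hypothesis that the opponent, having observed the past miscoordination, will switch actions this period. Play then stays at (B, L) in every period.

```python
def best_response_row(expected_col):
    """Row player's stage-game best response: (T, L) and (B, R) pay 1."""
    return "T" if expected_col == "L" else "B"

def best_response_col(expected_row):
    """Column player's stage-game best response."""
    return "L" if expected_row == "T" else "R"

history = []
for _ in range(8):
    # Player 1 forecasts that player 2, who has so far played L, will switch
    # to R this period; player 2 forecasts that player 1, who has so far
    # played B, will switch to T. The forecasts are never confirmed, yet the
    # hypothesis 'the opponent switches next time' is available at every date.
    a1 = best_response_row("R")   # best response to the forecast R is B
    a2 = best_response_col("T")   # best response to the forecast T is L
    history.append((a1, a2))

print(history)  # [('B', 'L'), ('B', 'L'), ...]: never a coordination equilibrium
```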
One might wish to argue that the notion of sophisticated learning is not an appropriate notion of rational learning in the present context if {BBB…, LLL…} is a pair of strategies that is consistent with sophisticated learning. Indeed, it seems reasonable to assume that in an infinitely repeated game the players will reach one of the coordination equilibria at least once. So, let us consider the question of what the players are able to infer after having reached a pure strategy equilibrium of the stage game. We argue that the strategic uncertainty the individual players face in deciding upon their strategy at period t + 1 is basically the same whether or not they have reached a coordination equilibrium at period t. Suppose the players have played (T, L) in period t. For the players individually there is no way of knowing whether they have coordinated on a pair of strategies {s_1, s_2} = {TBTB…, LRLR…} or on {s_1, s_2} = {TTTT…, LLLL…} or on still other pairs of strategies that could possibly have resulted in the action combination (T, L) in period t. Accordingly, it is possible that one of the players thinks that they will alternate deterministically between (T, L) and (B, R), while the other player thinks that they will continue playing (T, L). This means that after (T, L) has occurred in period t, the players may find themselves at (B, L) in period t + 1. The same is true for any number of times they have reached a coordination equilibrium.
In other words, on the basis of only a finite number of observations the players cannot ascertain what the other player will do in the (near)
future, i.e., the players face the well-known problem that at any point
in time there is an infinite number of hypotheses that is consistent
with past observations (each of these hypotheses having different
implications for expected future play). We will call the problem that the players do not know which hypothesis to use for predicting the opponent's future play the Projection Problem (after Goodman, 1973).
The argument we are putting forward is very much like an argument recently made by Gilbert (1990). Gilbert (p. 10) argues that
"common knowledge of precedent as such will not by itself automatically generate expectation of conformity". The reason she offers
is that after having coordinated before, a rational agent will infer that
it is best to conform to precedent if and only if his opponent will do
likewise. He will realize that his opponent does not have an independent reason for conforming to precedent either. Therefore, there
is nothing inherently rational in conforming to precedent.
In this section we have argued that the essential problem in rational learning to coordinate lies in the difficulty of making inferences
about future play on the basis of past play. The fact that people often
seem to think they can learn from past experience might suggest
that our argument is too agnostic to provide an accurate account of
how rational agents behave in repeated coordination situations. This
motivates our discussion of two other approaches which try to ensure
coordination by imposing more structure on the interactive learning
process.
4. APPROACH I: EXTENDED RATIONALITY AND NO COMMON LANGUAGE
The approach considered in this section is based on a paper by
Crawford and Haller (1990), henceforth C&H, who extend the notion
of rationality in coordination games and introduce a 'no common
language' assumption. We will introduce their ideas below and look
at the consequences of their approach for the problems we have
indicated in the previous section.
Extended Rationality
C&H "maintain the working hypothesis that players play an optimal
... attainable strategy combination" (p. 580). They define an attainable strategy combination as a pair of strategies in which each player's 'undistinguished' actions enter his overall strategy in a symmetric way and in which players whose positions are 'undistinguished'
play identical strategies. An optimal attainable strategy combination
is an attainable strategy combination that maximizes both players'
repeated-game pay-offs. For the moment, we will assume that players have a common language in which they describe the game so
that the players' actions are distinguished and all strategy combinations are attainable. In this case, C&H's working hypothesis reduces
to the players choosing an optimal strategy combination, i.e., they
coordinate immediately.
The idea of players choosing an optimal strategy combination
is similar to a suggestion by Sugden (1991) who proposes that in
coordination games, players be regarded as a team in a game against
nature. As a member of a team, each rational player should individually think of a strategy combination which, if both players did their
part, would optimize their individual (and collective) pay-offs. We
will call this notion 'individual rationality in the extended sense',
because this notion defines rational behavior in such a way as to
include a belief of a 'good' strategy of the other player. This extended notion of individual rationality is, we think, quite natural in pure
coordination games, because players know that it is in their mutual
interest to coordinate. Hence, it is reasonable to expect 'good will'
on the part of the opponent. The notion of extended rationality is
a generalization of the Principle of Coordination to the context of
repeated games. This principle says that in coordination games with
a strict Pareto-best equilibrium, each player should do his or her part
in it (see, e.g., Gauthier, 1975 and Bacharach, 1991).
We now formally state the definition of 'extended individual rationality' and distinguish it from 'collective rationality'. We will make this distinction in order to be able to express what it means for players to choose a strategy combination.

DEFINITION 2. A strategy s_i is individually rational in the extended sense if it is part of a pair of strategies {s_i, s_{−i}} such that, for i = 1, 2,

    π_i(s_i, s_{−i}) ≥ π_i(s_i′, s_{−i}′)

for all strategies s_i′, s_{−i}′ of players i and −i.
DEFINITION 3. A pair of strategies {s_1, s_2} is collectively rational if

    π_i(s_1, s_2) ≥ π_i(s_1′, s_2′),   i = 1, 2,

for all strategies s_1′, s_2′ of players 1 and 2.
The difference between the two definitions is that only the second
definition requires some collective action to be undertaken. The first
definition requires the players individually to come up with a 'team'
solution (pair of strategies) that would be optimal for both. Each
player should then perform his part of the team solution. The two
definitions yield the same outcomes if and only if there is a unique
pair of strategies that is collectively rational. In this case, both players
will necessarily find the same solution to the 'team problem' and
the definition of extended individual rationality prescribes that the
players perform their part of this pair of strategies. Since, typically,
there are many pairs of strategies that are collectively rational, every
individual player of 'good will' can choose his or her part of a
different pair of such strategies. Accordingly, coordination is not
ensured.
The fact that in the case of a common language C&H's approach
yields a coordinated outcome from the first period onwards is due to
their implicit assumption that players choose a pair of strategies that
is collectively rational, i.e., our definition of collective rationality
coincides with their hypothesis that players play an optimal attainable strategy combination. This is exemplified by their saying that
players can use "their perfect recall and knowledge of the structure
of the game to maintain coordination forever once they locate a pair
of coordinated actions. They can do this either by repeating those
actions, or by alternating deterministically between them and the
other coordinated pair" (p. 575). As Crawford and Haller (and we)
assume that communication is not possible except by playing the
game, we think that players individually cannot maintain coordination once they have reached a coordination equilibrium of the stage
game. It is precisely because both 'repeat the same action as in the last period' and 'choose the action that you did not choose in the last period' are collectively rational that player 1 might think that player 2 will repeat her action, while player 2 might think that player 1 alternates his action deterministically. Thus, it is not certain that
rational players will maintain coordination once they have located a
pair of coordinated actions (see also Section 3).
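The non-uniqueness is easy to exhibit numerically. In the sketch below (our illustration), 'repeat' and 'alternate' are the two collectively rational continuation rules just mentioned; play remains coordinated after a first coordination at (T, L) exactly when the two players happen to follow the same rule.

```python
OTHER = {"T": "B", "B": "T", "L": "R", "R": "L"}
COORDINATED = {("T", "L"), ("B", "R")}

def continuation_path(rule1, rule2, start=("T", "L"), periods=6):
    """Trace play after a first coordination at `start`, each player following
    his own continuation rule ('repeat' or 'alternate'). Both rules are
    collectively rational, but coordination survives only if they match."""
    a1, a2 = start
    path = [start]
    for _ in range(periods):
        a1 = a1 if rule1 == "repeat" else OTHER[a1]
        a2 = a2 if rule2 == "repeat" else OTHER[a2]
        path.append((a1, a2))
    return path

for rule1 in ("repeat", "alternate"):
    for rule2 in ("repeat", "alternate"):
        ok = all(p in COORDINATED for p in continuation_path(rule1, rule2))
        print(f"player 1 {rule1:9s} / player 2 {rule2:9s} -> coordinated forever: {ok}")
```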
Of course, it might be that coordination results even in cases
where there is no unique solution that is collectively rational. This
can be the case, for example, when one 'team solution' is more
prominent than the others. This would, however, introduce a focal
point solution at the level of which pair of strategies to select from
the set of strategies that is collectively rational. This means that
C&H's approach only seems to take the coordination problem one
stage backwards.
No Common Language and No Common Labeling
A second aspect of C&H's paper is that they allow for situations
in which players do not have a common labeling to describe the
game. This assumption has the powerful implication that not all
strategy combinations are attainable. As an example of no common
labeling, one can think of the game described in Section 2 with the
modification that player 1 does not know what L and what R is for
player 2 and player 2 does not know what T and what B is for player
1. In this case, it makes no sense for a player to try to coordinate
on say (T, L), because he does not know whether T and L have the
same meaning for the other player. Hence, Definitions 2 and 3 have
to be modified in such a way that the set of strategy combinations
is restricted to the set of attainable ones. When there are multiple
attainable optimal strategy combinations, the same difficulties as
the ones described above arise. However, due to the restriction to
attainable strategies there are cases (see below) in which there is a
unique optimal strategy combination and, under the assumption of
extended rationality, individuals will do their part of this strategy
combination.
It will turn out that a distinction between 'no common language' and 'no common labeling' is useful.⁵ We will say that players do not have a common language if there is no one-to-one correspondence from one language to the other; they have no common labeling of terms if there exists such a one-to-one correspondence between every pair of terms.⁶ We will argue that a new set of problems arises if the assumption of no common language is taken seriously. On the other hand, for the no common labeling assumption C&H obtain a nice result for the 3 × 3 action game.
To illustrate the nature of this problem, we consider a reformulation of Goodman's 'grue' and 'bleen' example (Goodman, 1973). Suppose that player 1 describes the coordination game in terms of TOP and BOTTOM, while player 2's language describes the game in terms of TOTTOM and BOP. The peculiar feature of the two languages is that what TOP is for player 1 is, for player 2, TOTTOM until t and BOP after t. Similarly, what BOTTOM is for player 1 is for player 2 BOP until time t and TOTTOM after t. Reciprocally, what TOTTOM (BOP) is for the second player is for the first player TOP (BOTTOM) until t and BOTTOM (TOP) after time t.⁷
Let us now suppose that players have attained a stage game equilibrium for the first time in period t and that the pair of actions that have produced this equilibrium is (TOP, TOP) in player 1's language and (TOTTOM, TOTTOM) in player 2's language. In this case the focal point 'stick to the same action' does not work, because player 1 will persist with TOP and the second player will persist with TOTTOM and they will not coordinate in period t + 1. It is interesting to note that both players might accuse the other of changing actions!⁸
Let us now briefly consider the case of no common labeling. For the 3 × 3 action game, C&H propose the following strategy: "play each of your actions with equal probability in the first stage. If coordination results, maintain it by repeating your first-stage action. If not, rule out your first stage action and the action that would have yielded coordination given your partner's first action; then play the action not ruled out from the second stage onwards" (p. 585). It is relatively easy to see that this strategy combination is the unique attainable strategy combination that is collectively rational. Hence, individually rational players in the extended sense should play their part of this strategy combination.⁹ For the 2 × 2 case, C&H propose that the players randomize over their two actions with probabilities (1/2, 1/2) until a stage game equilibrium is reached and then continue to play these actions or alternate deterministically. This pair of strategies ensures coordination in the way it has been defined in Section 2. However, as the proposal of how to continue after an equilibrium is reached suffers from non-uniqueness, there is, again, no guarantee that individual players who are rational in the extended sense play their part of the same pair of strategies. Therefore, individual rationality in the extended sense does not ensure coordination in the 2 × 2 case. The non-uniqueness argument applies to all k × k action games, where k = 2, 4 and k ≥ 6 (cf., C&H, p. 585).
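A sketch of the quoted 3 × 3 rule (our own rendering in code, with the pay-off matrix taken to be the 3 × 3 identity matrix, as in Section 2) shows why it works: a first-stage miscoordination leaves both players with the same single action that is not ruled out, so coordination is certain from the second stage onwards.

```python
import random

def ch_3x3_stage_of_coordination(rng):
    """One run of the quoted C&H rule in the 3 x 3 identity-pay-off game.
    Returns the stage at which coordination is first achieved."""
    actions = {0, 1, 2}
    a1 = rng.choice((0, 1, 2))   # stage 1: play each action with equal probability
    a2 = rng.choice((0, 1, 2))
    if a1 == a2:
        return 1                 # coordinated; maintain it by repetition
    # Each player rules out his own first-stage action and the action that
    # would have coordinated with the partner's first action. For player 1
    # these are a1 and a2; for player 2 they are a2 and a1. Both players are
    # therefore left with the same unique third action:
    remaining = (actions - {a1, a2}).pop()
    # from stage 2 on, both play `remaining`, so coordination is certain
    return 2

rng = random.Random(3)
runs = [ch_3x3_stage_of_coordination(rng) for _ in range(100_000)]
print("coordinated in stage 1:", runs.count(1) / len(runs))            # about 1/3
print("coordinated by stage 2:", (runs.count(1) + runs.count(2)) / len(runs))  # 1.0
```

In the 2 × 2 game the analogous elimination leaves no action at all, so the rule must fall back on renewed randomization and on one of several equally good continuation rules; this is the non-uniqueness referred to above.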
Concluding Observations
C&H assume that players choose optimal attainable strategy combinations. In order to evaluate whether their analysis is based on purely
individualistic assumptions we have distinguished between 'individual rationality in the extended sense' and 'collective rationality'. In
the discussion above we have seen that for the 3 × 3 action case (and for the 5 × 5 action case with a low value of the discount factor), when
players do not have a common labeling of their actions, individual
rationality in the extended sense suffices to ensure that a convention
can emerge out of a rational learning process. In all other cases, however, extended individual rationality does not ensure coordination
(due to non-uniqueness). To resolve the non-uniqueness problem,
C&H implicitly resort to the non-individualistic assumption of 'collective rationality'. Another way out of the non-uniqueness problem
is to assume the pre-existence of a convention (or focal point) at
a higher level (the level corresponding to which pair of strategies
to select from the set of strategies that is collectively rational) to
explain the endogenous emergence of a new convention.
5. APPROACH II: BAYESIAN LEARNING IN THE INFINITE STAGE GAME
In a sequence of papers, Kalai and Lehrer (1993a, b), K&L for short, study the nature of Bayesian learning and its implications for equilibrium play in an infinitely repeated game context. They are concerned with learning repeated game strategies, and as these strategies are given and do not change over time, they overcome the above-mentioned difficulty of the possible non-stationarity of individual agents' stage game strategies. We will restrict the discussion of the results K&L obtain to the two-player coordination games under consideration.
The basic idea of K&L's paper is the following. At the start of the
repeated game, players have prior beliefs over the possible strategies that the opponent plays in the repeated game. Given these prior
beliefs, each player chooses the repeated game strategy that maximizes his discounted sum of pay-offs. The beliefs are updated using
Bayes' rule. The crucial assumption K&L use is that the combination of strategies that is actually played is absolutely continuous with respect to the players' prior beliefs.¹⁰ Roughly speaking, they show that players will eventually correctly predict the future play of their opponent if this assumption is satisfied. As players maximize discounted pay-offs, this implies that their strategies will eventually form a Nash equilibrium of the repeated game. The results they obtain are more easily explained if a condition stronger than absolute continuity, namely what they call the 'grain of truth' assumption, holds.¹¹ Therefore, we will consider and comment on this case first.
Suppose player 1's beliefs about player 2's strategy are as follows:

    P(always R) = α;    P(always L) = 1/2 − α;
    P(RLLL…) = 1/3;    P(RRLL…) = 1/9;    P(RRRL…) = 1/27;    and so on,

where, for example, P(RLLL…) = 1/3 is to be read as saying that the prior probability of observing R in the first period and L always after is equal to 1/3. Suppose, moreover, that player 2 plays 'always R'.
The 'grain of truth' assumption says that each player should attach a strictly positive prior probability to the actual strategy that is chosen by the opponent. Observe that the assumption is satisfied in the example considered here. K&L's argument works as follows. After observing R in period 1, player 1 knows that player 2 does not always play L. Bayesian updating requires that the probability mass of 1/2 − α is distributed proportionally to its weight over the strategies that are possibly played. After observing R in the first two periods, 5/6 − α is distributed in this way over the remaining possible strategies. It is clear that for any value of α > 0, after a sufficiently long time t, player 1 is almost sure that player 2 plays the strategy 'always R', and the chance of observing R in period t after having observed a finite number of R's up to time t − 1, P(R_t | R_1, …, R_{t−1}), will be arbitrarily close to 1 for large t. The optimal response to such a belief is for player 1 to play B in all subsequent (infinite) periods.
To see more clearly the importance of the 'grain of truth' assumption, suppose that, instead of the above beliefs, player 1's beliefs are slightly different:

    P(always L) = 1/2;    P(RLLL…) = 1/3;    P(RRLL…) = 1/9;
    P(RRRL…) = 1/27;    and so on.

In this case the grain of truth assumption is not satisfied and P(R_t | R_1, …, R_{t−1}) = 1/3 for any value of t. Accordingly, at the beginning of every period player 1 thinks (wrongly) that the chance that player 2 plays L in this period is larger than 1/2 and, consequently, he will play T. Thus, the players will never coordinate given these prior beliefs.
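The two updating computations can be reproduced directly. The sketch below (our illustration) uses the prior weights as reconstructed above, truncated at a large K for computability: with a grain of truth (α > 0) the predictive probability of R tends to 1, while for α = 0 it stays at 1/3 forever.

```python
def predictive_prob_R(t, alpha, K=2000):
    """P(player 2 plays R in period t | R observed in periods 1..t-1), t >= 2,
    under the prior of the text: P(always R) = alpha, P(always L) = 1/2 - alpha,
    P(exactly k R's and then L forever) = 3**-k for k >= 1 (truncated at K)."""
    tail = lambda m: sum(3.0 ** -k for k in range(m, K))
    surviving = alpha + tail(t - 1)  # hypotheses consistent with t-1 observed R's
    plays_R = alpha + tail(t)        # of these, all but 'k = t-1' predict R at t
    return plays_R / surviving

for t in (2, 5, 10, 20):
    with_grain = predictive_prob_R(t, alpha=0.05)
    without = predictive_prob_R(t, alpha=0.0)
    print(f"t = {t:2d}: grain of truth {with_grain:.4f}, no grain of truth {without:.4f}")
```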
The grain of truth assumption is, however, a very strong condition to impose on the prior beliefs. This is because a positive prior probability can be attached to a countably infinite number of strategies only, whereas there is an uncountable number of strategies from which the true strategy can be chosen. Any countable set of strategies is very small compared to the uncountable strategy set from which the true strategy is chosen. Thus, there are in principle a very large number of priors, each of which assigns a positive probability to a countable number of strategies. From this set of possible priors there is very little reason for individuals to choose a prior of the right sort, i.e., one that assigns strictly positive probability to the true strategy of the opponent.
The 'grain of truth' assumption can also be examined in relation to
the Problem of Induction and the Projection Problem. In Section 3 we
argued that the main difficulty for players in learning to coordinate
is the problem of what to infer about future play of the opponent,
given some observations about past play. In general, there is an
uncountable number of hypotheses consistent with a finite number
of observations. One can only hope to make some useful inference
about future play if one can argue that one of these hypotheses is
more likely to be true than the others. In the Bayesian approach,
the prior is responsible for the selection of the most reasonable
hypothesis given the available evidence. In the first example above, player 1 argues that a particular hypothesis, namely 'always R' after having observed R_1, …, R_{t−1}, is more likely to be true than any other hypothesis (for large t), because of the form of the prior. We have seen that this conclusion no longer holds in the second example. But what reasons can a player have for choosing a particular prior?
Recall that the aim of the present paper is to examine whether a
convention can emerge out of an interactive learning process between
rational players. The critical point then seems to be how players
choose their priors. One common answer here is that priors are subjective. However, the second example shows that the priors in K&L's
approach cannot be just any subjective priors. This is because subjective priors will, in general, not satisfy the grain of truth requirement.
One can interpret the drastic restriction implied by the grain of truth
assumption as saying that priors are of a conventional nature. This
conventional nature of priors could be defended by arguing that
players will not choose 'unusual' strategies and the players 'know'
that the other player will not do this, and so on. This means that
many of the possible strategies will not be chosen. The point here is
that the players should have this 'knowledge' in common, i.e., not
only should a player not choose 'unusual' strategies, the other player
should also know that he will not do so. In other words, there should
be a convention saying that certain observations are indicative of
particular strategies. Thus, K&L implicitly introduce a conventional
aspect in their analysis.
Can the above criticism be overcome by replacing the 'grain of truth' assumption by the 'absolute continuity' assumption? After all, it is the latter assumption that K&L really employ in their paper. The above criticism still goes through for all strategies with the property that some action combinations in the repeated game occur with strictly positive probability, and it is in such strategies that we are interested (cf., Definition 1). Let us consider, as an example, the strategy that Crawford and Haller propose for the 2 × 2 coordination game, i.e., 'randomize over your possible actions with probability 1/2 until you reach a coordination equilibrium, and continue to play your part of this equilibrium ever after'. This strategy combination¹² results with probability 1/4 in {(T, L), (T, L), …}, with probability 1/4 in {(B, R), (B, R), …}, with probability 1/16 in {(T, R), (T, L), (T, L), …}, with probability 1/16 in {(B, L), (B, R), (B, R), …}, and so on. Two things should be noted. First, there is a countably infinite number of action combinations that occurs with
positive probability and, second, there is an uncountable number of
action combinations that will not occur given this strategy combination. The absolute continuity assumption requires that the prior
beliefs assign positive probability to all action combinations that
occur with strictly positive probability. The simplest way to guarantee this, of course, is 'the grain of truth' assumption, but there are
other ways as well. The crucial point is, however, that one can choose
the priors over strategies only such that a countable number of action
combinations receive a strictly positive probability. The same argument as above can now be applied replacing 'strategies' by action
combinations: as, at the start of play, there are an uncountable number of potentially true action combinations that are consistent with
the available evidence and as any countable set of action combinations is very small compared to an uncountable set, it is very unlikely
that a player's belief assigns strictly positive probability to the true
actions that are played by the other player in the repeated game.
So, we conclude that the 'absolute continuity' assumption cannot
overcome the above criticism.
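For concreteness, the countable family of positive-probability play paths described above can be reproduced by simulation (a sketch of our own; the C&H rule is as quoted earlier): the most frequent path prefixes occur with weights 1/4, 1/4, 1/16, 1/16, …, and every other conceivable prefix has probability zero.

```python
import random
from collections import Counter

COORDINATED = {("T", "L"), ("B", "R")}

def path_prefix(rng, length=3):
    """First `length` action pairs under the C&H 2 x 2 rule: both players
    randomize 1/2-1/2 until a coordinated pair occurs, then repeat it forever."""
    path, locked = [], None
    for _ in range(length):
        pair = locked if locked else (rng.choice("TB"), rng.choice("LR"))
        if locked is None and pair in COORDINATED:
            locked = pair
        path.append(pair)
    return tuple(path)

rng = random.Random(4)
n = 200_000
counts = Counter(path_prefix(rng) for _ in range(n))
for path, c in counts.most_common(6):
    print(path, round(c / n, 4))   # ~0.25, 0.25, then ~0.0625 each
```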
More recent contributions to Bayesian learning, in particular Kalai and Lehrer (1994) and Lehrer and Smorodinsky (1993), try to overcome the restrictions imposed by the absolute continuity assumption. They show that 'weak' learning may take place when the beliefs are dispersed around the true strategy being played (in the sense that the beliefs assign a positive probability to any 'neighbourhood' of the true strategy being chosen). 'Weak learning' in this context means that a player is able to predict the opponent's play in the near future only. As a careful technical analysis of the topologies and notions of learning would be needed here, the reader is referred to the original papers for a precise statement of the results and the conditions under which they hold (see also Section 7.2 of Kalai and Lehrer, 1993a). For the purpose of this paper these recent contributions are interesting in that they show that the conventions needed to ensure coordination may be more vague than the conventions implied by the absolute continuity assumption.¹³
Our conclusion on the conventional nature of the selection of
priors is strengthened by the following observation. K&L show that
play will converge to the Nash equilibria of the repeated game, not
to the pure strategy Nash equilibria of the stage game. There are
very many equilibria of the repeated game and the equilibrium to
which one converges depends on the priors. Recall that we have
defined coordination as a property of a pair of strategies in such
a way that a convention to play the stage game emerges out of a
rational learning process (see Definition 1 above). Thus, even if the
'absolute continuity' assumption holds, there is no guarantee that
coordination is ensured. On the contrary, it may very well be that
the Bayesian players of K&L converge to an equilibrium in which
they, for example, alternate between (T, L) and (B, R), or worse, in
which they play the mixed strategy Nash equilibrium of the stage
game for ever. In order to obtain convergence to one of the pure
strategy Nash equilibria of the stage game, additional restrictions on
the priors may have to be imposed. This makes it even more unlikely
that players have the 'right' prior.
In concluding this section, we can say that K&L's approach circumvents the problem of the non-stationarity of stage game strategies by applying Bayesian learning to the repeated game strategies,
instead of to the stage game strategies. However, their approach has
its own problems, two of which are intimately related to the issues
we have discussed in the present paper. First, the Projection Problem
is 'solved' by attributing a selective prior to the players. The set of
opponent's strategies to which a positive prior probability can be
given is, however, very small with respect to the set of all possible
strategies. The above discussion has suggested that this amounts to a conventional selection of priors. Second, K&L show convergence to one of the Nash equilibria of the repeated game, not to
one of the pure strategy stage game equilibria. Thus, their results
do not satisfy the requirements of 'learning implies coordination'
which were developed in Definition 1 above.
6. CONCLUSION
In this paper we have examined the issue of whether individual rationality considerations are sufficient to guarantee that individuals will
learn to coordinate. This question is central in any discussion of
whether social phenomena (read: conventions) can be explained in
terms of a purely individualistic approach.
In the analysis we have paid special attention to recent work by
Crawford and Haller (1990) and Kalai and Lehrer (1993a,b). With
respect to C&H we have argued that the hypothesis that players
employ optimal attainable strategy combinations is, in general, non-individualistic. On the other hand, the K&L result on learning to
play Nash equilibrium is based on the crucial assumption of absolute continuity, which we have argued implicitly assumes that some
strategies are salient. This salience is not explained in individualistic
terms.
To summarize: we have argued that the positive answers obtained
to the general question posed above require assumptions which incorporate some convention. This conclusion may be seen as supporting
the viewpoint of 'institutional individualism' in contrast to 'psychological individualism'.
NOTES
¹ Earlier versions of this paper have been presented at a workshop on 'The Emergence and Stability of Institutions' (Louvain-la-Neuve, June 1991), at a workshop on Game Theory and Philosophy (Amsterdam, December 1991), at a seminar at Carnegie-Mellon University (November 1993) and at a conference on Epistemic Logic and The Theory of Games and Decisions (Marseille, January 1994). Comments by workshop and seminar participants, especially by André Orléan and Robert Sugden, and by Vincent Crawford, Ehud Kalai, Theo Kuipers, Nick Vriend and two anonymous referees are gratefully acknowledged.
² This is a reasonable assumption in some settings. Examples include telephone connections (see above), the paratroopers problem, or the electronic mail game (Rubinstein, 1989). Moreover, in the philosophical literature it has been argued that communication itself (language and truth telling) is a coordination problem (cf., Hodgson, 1967; Lewis, 1972; den Hartog, 1985).
³ For simplicity we will restrict ourselves to behavior strategies. It is well-known that, under the assumption of perfect recall, every mixed (hence, pure) strategy can be replaced by an equivalent behavior strategy (cf., Kuhn, 1953; Kalai and Lehrer, 1993a).
⁴ This latter notion is not to be confused with the Bayesian learning that takes place in Kalai and Lehrer (1993a), to which we return in Section 5.
⁵ C&H do not make this distinction. In private correspondence, Professor Crawford has pointed out that they introduced the term 'no common language' as a convenient synonym for what we call 'no common labeling'. Thus, the discussion on 'no common language' below should not be regarded as a criticism of their paper.
⁶ It is well-known that for almost any pair of languages there are many words for which no one-to-one correspondence exists. The 'proper' translation often depends on the context in which the term is used.
⁷ It is interesting to observe how closely the coordination problem is related to the problem of induction with which the players are confronted. On p. 120 of the monograph, Goodman (1973) argues that "the roots of inductive validity are to be found in our use of language". This indicates that it is even more difficult for the players to solve the coordination problem once the common language assumption is dropped.
⁸ The language problem is not resolved by assuming that there is mutual or common knowledge of the two languages. In this case the players do not know which language to use: each should use the language that the opponent is also using, but knows that the opponent uses the language that she thinks he will choose, and so on.
⁹ Note that the proposed pair of strategies does not ensure coordination from the second stage onwards if the players do not have a common language.
¹⁰ Recall that the actual strategies chosen induce a probability distribution over the set of possible outcomes (play paths) of the game. These strategies are absolutely continuous with respect to the beliefs if the players' prior beliefs assign a strictly positive probability to every outcome that occurs with strictly positive probability given the actual strategies chosen.
¹¹ In independent work, Nyarko (1990) has used a similar assumption on the support of agents' beliefs to demonstrate convergence to Nash equilibrium in a repeated game with quantity-setting firms.
¹² Recall that both players are advised to choose the same strategy.
¹³ We are grateful to one of the referees for this interpretation of the recent work on Bayesian learning.
REFERENCES
Agassi, J.: 1960, 'Methodological Individualism', British Journal of Sociology 11, 244-270.
Bacharach, M.: 1991, 'Conceptual Strategy Spaces', Oxford discussion paper.
Boland, L.: 1982, Foundations of Economic Method, London: Allen and Unwin.
Caldwell, B.: 1984, Beyond Positivism, London: Allen and Unwin.
Crawford, V. and Haller, H.: 1990, 'Learning How to Cooperate: Optimal Play in Repeated Coordination Games', Econometrica 58, 571-595.
Eichberger, J., Haller, H. and Milne, F.: 1993, 'Naive Bayesian Learning in 2 × 2 Games', Journal of Economic Behavior and Organization 22, 69-90.
Gauthier, D.: 1975, 'Coordination', Dialogue 14, 195-221.
Gilbert, M.: 1989, 'Rationality and Salience', Philosophical Studies 57, 61-77.
Gilbert, M.: 1990, 'Rationality, Coordination and Convention', Synthese 84, 1-21.
Goodman, N.: 1973, Fact, Fiction and Forecast (3rd Ed.), New York: Bobbs-Merrill Company.
Goyal, S. and Janssen, M.: 1993a, 'Dynamic Coordination Failures and the Efficiency of the Firm', Tinbergen Institute Research Paper No. 93-92; Journal of Economic Behavior and Organization (forthcoming).
Goyal, S. and Janssen, M.: 1993b, 'Non-exclusive Conventions and Social Coordination', Tinbergen Institute Research Paper No. 93-233.
Hartog, G. den: 1985, Wederkerige Verwachtingen [Mutual Expectations], Unpublished Ph.D. Thesis, University of Amsterdam (in Dutch).
Hodgson, D.H.: 1967, Consequences of Utilitarianism, Oxford: Oxford University Press.
Janssen, M.: 1993, Microfoundations: A Critical Inquiry, London: Routledge.
Kalai, E. and Lehrer, E.: 1993a, 'Rational Learning Leads to Nash Equilibrium', Econometrica 61, 1019-1045.
Kalai, E. and Lehrer, E.: 1993b, 'Private-Beliefs Equilibrium', Econometrica 61, 1231-1240.
Kalai, E. and Lehrer, E.: 1994, 'Weak and Strong Merging of Opinions', Journal of Mathematical Economics 23, 73-86.
Kandori, M., Mailath, G., and Rob, R.: 1993, 'Learning, Mutation and Long Run Equilibria in Games', Econometrica 61, 29-56.
Kuhn, H.W.: 1953, 'Extensive Games and the Problem of Information', in H.W. Kuhn and A.W. Tucker (Eds.), Contributions to the Theory of Games, Vol. II, pp. 193-216. Princeton: Princeton University Press.
Lehrer, E. and Smorodinsky, R.: 1993, 'Compatible Measures and Merging', Tel Aviv University, mimeo.
Lewis, D.: 1969, Convention, Cambridge (Mass.): Harvard University Press.
Lewis, D.: 1972, 'Utilitarianism and Truthfulness', Australasian Journal of Philosophy 50, 17-19.
Milgrom, P. and Roberts, J.: 1991, 'Adaptive and Sophisticated Learning in Normal Form Games', Games and Economic Behaviour 3, 82-100.
Nyarko, Y.: 1990, 'Bayesian Rationality and Learning without Common Priors', New York University, mimeo.
Rubinstein, A.: 1989, 'The Electronic Mail Game: Strategic Behavior under Almost Common Knowledge', American Economic Review 79, 385-391.
Schelling, T.: 1960, The Strategy of Conflict, Cambridge (Mass.): Harvard University Press.
Sugden, R.: 1991, 'Rational Choice: A Survey of Contributions from Economics and Philosophy', Economic Journal 101, 751-786.
Sugden, R.: 1992, 'Towards a Theory of Focal Points', University of East Anglia, discussion paper.
Young, H.P.: 1993, 'The Evolution of Conventions', Econometrica 61, 57-84.
Econometric Institute (S.G.) and
Department of Microeconomics (M.J.),
Erasmus University,
P.O. Box 1738,
3000 DR Rotterdam, The Netherlands