
Int J Game Theory (2000) 29:309–325
Two-person repeated games with finite automata
Abraham Neyman¹, Daijiro Okada²
1 Institute of Mathematics, The Hebrew University of Jerusalem, Givat Ram, Jerusalem 91904,
ISRAEL and SUNY at Stony Brook, Stony Brook, NY 11794-4384, USA
(email: [email protected])
2 Department of Economics, SUNY at Stony Brook, Stony Brook, NY 11794-4384, USA
(email: [email protected])
Received February 1997/revised version March 2000
Abstract. We study two-person repeated games in which a player with a restricted set of strategies plays against an unrestricted player. An exogenously given bound on the complexity of strategies, which is measured by the size of the smallest automata that implement them, gives rise to a restriction on the strategies available to a player.
We examine the asymptotic behavior of the set of equilibrium payoffs as the bound on the strategic complexity of the restricted player tends to infinity, but sufficiently slowly. Results from the study of the zero-sum case provide the individually rational payoff levels.
JEL classification: C73, C72.
Key words: repeated games, finite automata
1. Introduction
The object of this study is two-person non-zero-sum repeated games in which there is a bound on the complexity of strategies for only one of the players. Throughout the paper player 1 will be the restricted player. We employ automata to represent repeated game strategies. The complexity of a strategy is then defined to be the smallest number of states of an automaton required to implement it.
The specific models of repeated games studied are (1) the finitely repeated game $G^n(m(n))$ and (2) the $\lambda$-discounted game $G_\lambda(m(\lambda))$. Here, $m(\cdot)$ is the bound on the number of states of automata available to player 1, and it is a function of the number of repetitions, $n$, or of the discount factor, $\lambda$. We examine the set of limit points of equilibrium payoffs of $G^n(m(n))$ (resp. $G_\lambda(m(\lambda))$) as
$m(n) \to \infty$ ($n \to \infty$) (resp. $m(\lambda) \to \infty$ ($\lambda \to 1$)). The particular case considered in this paper is when $m(\cdot)$ grows ``sufficiently slowly.''¹ Formally, we will examine the cases in which
\[
\lim_{n \to \infty} \frac{m(n) \log m(n)}{n} = 0, \tag{1.1}
\]
and
\[
\lim_{\lambda \to 1} (1 - \lambda)\, m(\lambda) \log m(\lambda) = 0. \tag{1.2}
\]

¹ For example, condition (1.1) holds for all functions $m(n) = n^{\alpha}$ with $0 < \alpha < 1$, while it is violated for $m(n) = n$. In addition, the function $m(n) = n/\log n$, for which $m(n)/n \to 0$ ($n \to \infty$) but $m(n)/n^{\alpha} \to \infty$ ($n \to \infty$) for all $0 < \alpha < 1$, violates (1.1). The authors thank a referee for this comment.
We will show that, under these conditions on $m(\cdot)$, the Hausdorff limits of the sets of equilibrium payoffs, $\operatorname{Lim}_{n \to \infty} E(m(n))$ and $\operatorname{Lim}_{\lambda \to 1} E(m(\lambda))$, exist and coincide with the set of feasible payoffs above certain individually rational levels. For player 1 this level will be his maxmin payoff in the one-shot game where max ranges over his pure actions and min ranges over player 2's pure actions, and for player 2 it will be her minmax of the one-shot game where min ranges over player 1's pure actions and max ranges over her own pure actions. The determination of these individually rational levels, under the conditions (1.1) and (1.2), is provided by our analysis of the zero-sum case, Neyman and Okada (1999). In this paper, however, we explicitly construct player 2's strategy that effectively punishes the restricted player 1. This provides a simpler proof of a result originally proved in Neyman and Okada (1999) using the concept of entropy. See also Neyman and Okada (2000) for an alternative proof.
It will be seen that the equilibria we construct are in pure strategies and that the equilibrium paths are cyclic. Our result for finitely repeated games implies in particular that, in the finitely repeated prisoner's dilemma, the friendly, or nearly friendly, outcomes can be achieved in equilibrium when there is a bound on the strategic complexity of only one of the players. In addition, Folk-Theorem-type results like ours imply that in non-zero-sum games, being restricted in one's strategic possibilities is not necessarily detrimental, even against a powerful unrestricted player; only the punishment inflicted on the restricted player will be more severe.
Related literature includes Neyman (1999) and Papadimitriou and Yannakakis (1994), which contain several results on the asymptotic behavior of the set of equilibrium payoffs of two-person finitely repeated games when there are bounds on the strategic complexity of both players. These results encompass Neyman (1985)'s justification of cooperation in the finitely repeated prisoners' dilemma. For example, the main theorem of Neyman (1999) states that if the two bounds on the size of automata are subexponential as a function of $\min\{\text{the number of repetitions},\ \text{the larger bound}\}$, then the asymptotic folk theorem is obtained. More precisely, let $G$ be a two-person game in strategic form. Denote by $v^i$ the minmax payoff for player $i$ where min ranges over the other player's mixed actions and max ranges over $i$'s own pure actions. Let $G^n(m_1(n), m_2(n))$ be the $n$-fold repetition of $G$ in which player $i$'s strategies are restricted to those implementable by automata of size at most $m_i(n)$, a function of $n$. If the sequence of triples $(n, m_1(n), m_2(n))_{n=1}^{\infty}$ satisfies the conditions $\min\{m_1(n), m_2(n)\} \to \infty$ ($n \to \infty$) and
\[
\lim_{n \to \infty} \frac{\log\bigl(\max\{m_1(n), m_2(n)\}\bigr)}{\min\{n, m_1(n), m_2(n)\}} = 0,
\]
then the set of equilibrium payoff vectors of $G^n(m_1(n), m_2(n))$ converges to the set of payoff vectors which are feasible and give player $i$ at least $v^i$. Zemel (1989) contains results similar to Neyman (1985), but using modified finite automata which can send messages, in addition to those conveyed through the actions taken, during the play.
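As a concrete check of the subexponential condition (this numerical instance is ours, not taken from Neyman (1999)), consider equal bounds $m_1(n) = m_2(n) = 2^{\sqrt{n}}$. Then $\min\{m_1(n), m_2(n)\} \to \infty$, and since $2^{\sqrt{n}} \ge n$ for all large $n$,
\[
\frac{\log_2\bigl(\max\{m_1(n), m_2(n)\}\bigr)}{\min\{n, m_1(n), m_2(n)\}} = \frac{\sqrt{n}}{n} \longrightarrow 0 \quad (n \to \infty),
\]
so these bounds fall within the scope of the theorem, whereas exponential bounds such as $m_1(n) = m_2(n) = 2^{n}$ give a ratio bounded away from $0$.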
Similar results have been obtained for other classes of repeated games. Ben-Porath (1993) studies the undiscounted infinitely repeated games with finite automata. Let $G = ((A^i)_{i=1}^{N}, (r^i)_{i=1}^{N})$ be an $N$-person game in strategic form, and let
\[
v^i = \min_{q \in \prod_{j \neq i} \Delta(A^j)} \ \max_{a^i \in A^i} \ E_q\bigl(r^i(a^i, b)\bigr)
\]
and
\[
w^i = \max_{p \in \Delta(A^i)} \ \min_{b \in \prod_{j \neq i} A^j} \ E_p\bigl(r^i(a^i, b)\bigr),
\]
where $\Delta(X)$ denotes the set of probability distributions on a set $X$ and $E_\mu$ denotes the expectation with respect to the probability $\mu$. Consider the infinitely repeated game in which player $i$ has a complexity bound $m_i(k)$, parameterized by a positive integer $k$, with $m_1(k) \le \cdots \le m_N(k)$ and $m_1(k) \to \infty$ ($k \to \infty$). Denote this game by $G^{\infty}(m_1(k), \ldots, m_N(k))$ and the set of its equilibrium payoffs by $E^{\infty}(m_1(k), \ldots, m_N(k))$. One of his results asserts that if
\[
\lim_{k \to \infty} \frac{\log m_N(k)}{m_1(k)} = 0,
\]
then
(i) the set of feasible payoffs which give each player $i$ at least $v^i$ is included in $\liminf_{k \to \infty} E^{\infty}(m_1(k), \ldots, m_N(k))$, and
(ii) $\limsup_{k \to \infty} E^{\infty}(m_1(k), \ldots, m_N(k))$ is included in the set of feasible payoffs which give each player $i$ at least $w^i$.
Note that $v^i \ge w^i$. For two-person games we have $v^i = w^i$ (by the minimax theorem). Hence one can conclude that $\lim_{k \to \infty} E^{\infty}(m_1(k), m_2(k))$ exists and coincides with the set of equilibrium payoffs of the infinitely repeated game without complexity bounds (the Folk Theorem). This result crucially depends on the study of the two-person zero-sum case, which provides the individually rational levels $v^i$ and $w^i$. The exact asymptotics, i.e., the limit, if it exists, of $E^{\infty}(m_1(k), \ldots, m_N(k))$ for the $N$-person case is not known. See Section 4 of Neyman (1997). Lehrer (1988)
contains a similar result for two-person games with bounded recall. Also see Lehrer (1994) for the $N$-person case with bounded recall.
The main contribution of this paper is the determination of the individually rational payoff levels, together with the construction of equilibria which have certain robustness properties and can be applied to a wider variety of conditions on the order of magnitude of the complexity bound $m(\cdot)$.
The next section introduces the model of repeated games and finite automata. In Section 3 we construct player 2's strategy which will be used to punish player 1 in the equilibria constructed in the subsequent sections. The results on the asymptotics of the set of equilibrium payoff vectors are presented in Section 4 (the finitely repeated games) and Section 5 (the discounted games). Section 6 concludes the paper.
2. Repeated games and automata
Let $G = (A, B, h, k)$ be a two-person game in strategic form, where $A$ and $B$ are finite sets of actions and $h : A \times B \to \mathbf{R}$ and $k : A \times B \to \mathbf{R}$ are the payoff functions of players 1 and 2, respectively. We call $G$ the stage game. Throughout the paper we assume without loss of generality that all payoffs are nonnegative, i.e., $h(a,b) \ge 0$ and $k(a,b) \ge 0$ for all $(a,b) \in A \times B$. Denote the maxmin value of the stage game for player 1 in pure actions by $h^*$ and the minmax value for player 2 in pure actions by $k^*$, i.e.,
\[
h^* = \max_{a \in A} \min_{b \in B} h(a,b)
\]
and
\[
k^* = \min_{a \in A} \max_{b \in B} k(a,b).
\]
Also set $\|h\| = \max_{a,b} |h(a,b)|$ and $\|k\| = \max_{a,b} |k(a,b)|$.

Given $G = (A, B, h, k)$ we next describe a new game in which $G$ is played repeatedly (with complete information and standard signaling).

For each positive integer $n$, let $S_n$ (resp. $T_n$) be the set of mappings from $(A \times B)^{n-1}$ to $A$ (resp. to $B$), where $(A \times B)^0 = \{\emptyset\}$. A pure strategy of player 1 (resp. player 2) is an element of $S = \prod_n S_n$ (resp. $T = \prod_n T_n$). Equivalently, $S$ (resp. $T$) is the set of all mappings from the set of all finite histories $\bigcup_{n \ge 1} (A \times B)^{n-1}$ to $A$ (resp. $B$). A mixed strategy of player 1 (resp. player 2) is a probability distribution on $S$ (resp. $T$). The sets of mixed strategies are denoted by $\Delta(S)$ and $\Delta(T)$.

Every pair of pure strategies $(s,t)$ induces a play $\omega(s,t) = (\omega_l(s,t))_{l=1}^{\infty} \in (A \times B)^{\infty}$, where $\omega_l(s,t)$ is defined inductively by
\[
\omega_l(s,t) = (a_l, b_l) =
\begin{cases}
(s_1(\emptyset),\, t_1(\emptyset)) & \text{for } l = 1, \\
(s_l(\omega_1, \ldots, \omega_{l-1}),\, t_l(\omega_1, \ldots, \omega_{l-1})) & \text{for } l > 1.
\end{cases}
\]
Accordingly, every pair $(s,t)$ of mixed strategies induces a random play $\omega(s,t) = (\omega_l(s,t))_{l=1}^{\infty}$. We denote the corresponding probability distribution on the set of plays $(A \times B)^{\infty}$ by $P_{s,t}$ and the expectation with respect to $P_{s,t}$ by $E_{s,t}$.
For each positive integer $n$ we define the $n$-average payoff function of player 1, $h_n : S \times T \to \mathbf{R}$, by
\[
h_n(s,t) = \frac{1}{n} \sum_{l=1}^{n} h(\omega_l(s,t)).
\]
Also, for each $\lambda \in [0,1)$ we define the $\lambda$-discounted payoff function of player 1, $h_\lambda : S \times T \to \mathbf{R}$, by
\[
h_\lambda(s,t) = (1 - \lambda) \sum_{l=1}^{\infty} \lambda^{\,l-1} h(\omega_l(s,t)).
\]
The $n$-average and the $\lambda$-discounted payoff functions of player 2, $k_n$ and $k_\lambda$, are similarly defined. The
bilinear extensions of $h_n$, $h_\lambda$, $k_n$ and $k_\lambda$ to $\Delta(S) \times \Delta(T)$ are denoted by the same symbols. Thus, for example, $h_n(s,t) = E_{s,t}\bigl[\frac{1}{n} \sum_{l=1}^{n} h(a_l, b_l)\bigr]$.

In this paper we study two classes of repeated games differentiated by their payoff functions.

Finitely Repeated Game: $G^n = (S, T, h_n, k_n)$.
The $\lambda$-Discounted Game: $G_\lambda = (S, T, h_\lambda, k_\lambda)$.
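To fix the notation numerically, here is a small sketch (ours, not part of the paper) that evaluates the $n$-average and the $\lambda$-discounted payoff of a given play from its stream of stage payoffs; the infinite discounted sum is simply truncated by passing a sufficiently long prefix of the play.

    # Sketch (ours): n-average and lambda-discounted payoffs of a fixed play,
    # given the stream of stage payoffs h(omega_1), h(omega_2), ...

    def n_average(stage_payoffs, n):
        """h_n: the average of the first n stage payoffs."""
        return sum(stage_payoffs[:n]) / n

    def discounted(stage_payoffs, lam):
        """h_lambda: (1 - lam) * sum_{l >= 1} lam**(l-1) * h(omega_l).
        Stage payoffs beyond the end of the list are treated as 0, so pass a
        prefix long enough for lam**len(stage_payoffs) to be negligible."""
        return (1 - lam) * sum(lam ** l * x for l, x in enumerate(stage_payoffs))

    # A play whose stage payoffs alternate 0, 2, 0, 2, ...
    payoffs = [0.0 if l % 2 == 1 else 2.0 for l in range(1, 2001)]
    print(n_average(payoffs, 100))    # 1.0
    print(discounted(payoffs, 0.9))   # ~0.947 = 0.2 * 0.9 / (1 - 0.81)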
If two pure strategies of a player induce the same play against any pure strategy of the other player, they are said to be equivalent. For example, player 1's pure strategies $s$ and $s'$ are equivalent if $\omega_l(s,t) = \omega_l(s',t)$ for every pure strategy $t$ of player 2 and all stages $l = 1, 2, \ldots$. Extending this notion to mixed strategies, we say that two strategies of a player are equivalent if, against any strategy of the other player, they induce the same probability distribution over the plays of the repeated game.

Given the stage game $G = (A, B, h, k)$, an automaton of player 1 is defined by a four-tuple $M = \langle Q, q_1, f, g \rangle$. The first component $Q$ is a set of states, and $q_1 \in Q$ is an initial state. The third component is an action function, $f : Q \to A$, and the last component is a transition function, $g : Q \times B \to Q$. By the size of an automaton we mean the cardinality of its set of states, $|Q|$.

An automaton $M$ plays a repeated game as follows. At each stage $n$ it takes the action prescribed by $f$ for the current state, say $q_n$, i.e., $f(q_n)$; the current state is $q_1$ at the first stage. It then changes its state to $q_{n+1}$, specified by $g$ as a function of the current state $q_n$ and player 2's action $b_n$, that is, $q_{n+1} = g(q_n, b_n)$.

Every automaton $M$ induces a pure strategy $s$ for player 1 in a repeated game in the following manner. First, for any sequence of player 2's actions $b_1, \ldots, b_n$ ($n \ge 2$), define an extension of the transition function inductively by
\[
g(q; b_1, \ldots, b_n) = g\bigl(g(q; b_1, \ldots, b_{n-1}),\, b_n\bigr).
\]
Then for any history $\omega = ((a_1, b_1), \ldots, (a_n, b_n)) \in (A \times B)^n$, set
\[
s(\omega) = f\bigl(g(q_1; b_1, \ldots, b_n)\bigr),
\]
which is the action taken at stage $n+1$ ($n \ge 1$). At the first stage, $s(\emptyset) = f(q_1)$.
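To make the mechanics concrete, here is a minimal sketch (ours; the prisoner's-dilemma labels are illustrative and not from the paper) of an automaton $\langle Q, q_1, f, g \rangle$ and of the player-1 action sequence it induces against a fixed sequence of player 2's actions. Grim trigger is implemented with $|Q| = 2$.

    # Sketch (ours): an automaton M = (Q, q1, f, g) of player 1 and the action
    # sequence it induces against a given sequence of player 2's actions.

    class Automaton:
        def __init__(self, states, initial, action, transition):
            self.states = states      # Q
            self.q = initial          # current state; equals q1 before stage 1
            self.f = action           # f : Q -> A   (dict)
            self.g = transition       # g : Q x B -> Q   (dict on pairs)

        def play(self, opponent_actions):
            """Player 1's actions at stages 1, 2, ... against b_1, b_2, ..."""
            own = []
            for b in opponent_actions:
                own.append(self.f[self.q])      # act according to the current state
                self.q = self.g[(self.q, b)]    # then move to g(q, b)
            return own

    # Grim trigger in a prisoner's-dilemma labelling A = B = {"C", "D"}:
    # cooperate until player 2 defects once, then defect forever.  Size 2.
    grim = Automaton(
        states=["coop", "punish"],
        initial="coop",
        action={"coop": "C", "punish": "D"},
        transition={("coop", "C"): "coop", ("coop", "D"): "punish",
                    ("punish", "C"): "punish", ("punish", "D"): "punish"},
    )
    print(grim.play(["C", "C", "D", "C", "C"]))   # ['C', 'C', 'C', 'D', 'D']

In the paper's notation, the value of the induced strategy at a history $((a_1,b_1),\ldots,(a_n,b_n))$ depends only on $b_1,\ldots,b_n$ and equals $f(g(q_1; b_1,\ldots,b_n))$.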
Also, for every pure strategy $s \in S$ in a repeated game, there is an automaton that induces an equivalent strategy. If $s$ is equivalent to a pure strategy induced by an automaton, we say that $s$ is implementable by that automaton.

The size of the smallest automaton that implements a pure strategy serves as a measure of complexity of that strategy. To be more precise, for a given $s = (s_n) \in S$, we say that a finite history $\omega = ((a_1, b_1), \ldots, (a_l, b_l))$ is compatible with $s$ if $a_n = s_n((a_1, b_1), \ldots, (a_{n-1}, b_{n-1}))$ for every $n = 1, \ldots, l$. Also, for an arbitrary finite history $\omega$ of length $l$, define the induced strategy $s|\omega = ((s|\omega)_n)$ by $(s|\omega)_n(\omega') = s_{l+n}(\omega \omega')$ for each $\omega' \in (A \times B)^{n-1}$, where $\omega \omega'$ is the concatenation of $\omega$ and $\omega'$. The number of distinct, or nonequivalent, strategies induced by $s$ and finite histories compatible with $s$ can be considered as a measure of the complexity of implementing $s$. Indeed, it can be shown that
the size of the smallest automaton that implements $s$ equals the number of equivalence classes of $\{\, s|\omega \mid \omega \text{ is a finite history compatible with } s \,\}$. Kalai and Stanford (1988) provide an analogous result for full automata, whose transitions depend on the player's own action as well as the actions of the other players.

Henceforth, by the complexity of a pure strategy $s$ we mean the size of the smallest automaton that implements $s$. For each positive integer $m$, we denote by $S(m)$ the subset of $S$ consisting of those pure strategies of player 1 whose complexity is at most $m$.
3. Individually rational payoff levels

We present in this section a result which will be utilized in deriving much of the subsequent results. The situation under consideration is one in which player 1 is restricted to a finite set of pure strategies. The nature of this set is arbitrary. In particular, it may contain pure strategies which cannot be implemented by any finite automaton.

Theorem 3.1. For every finite subset $S_0$ of $S$ there exists $\hat t \in T$ such that for all $s \in S_0$:
(i) $\displaystyle h_n(s, \hat t) \le h^* + \frac{\|h\| \log_2 |S_0|}{n}$ for all $n = 1, 2, \ldots$,
and
(ii) $h_\lambda(s, \hat t) \le h^* + (1 - \lambda)\, \|h\| \log_2 |S_0|$ for all $\lambda \in [0, 1)$.
Proof: For each finite history $\omega = (\omega_1, \ldots, \omega_l)$, where $\omega_j = (a_j, b_j)$, let $S_0(\omega)$ be the set of strategies in $S_0$ that are compatible with $\omega$, i.e.,
\[
S_0(\omega) = \{\, s \in S_0 \mid s(\emptyset) = a_1, \text{ and } s(\omega_1, \ldots, \omega_{j-1}) = a_j \text{ for all } j = 2, \ldots, l \,\}.
\]
For each $a \in A$ let $S_0(\omega, a)$ be the set of strategies in $S_0(\omega)$ that take the action $a$ at the history $\omega$, i.e.,
\[
S_0(\omega, a) = \{\, s \in S_0(\omega) \mid s(\omega) = a \,\}.
\]
Clearly, if $a \neq a'$, then $S_0(\omega, a)$ and $S_0(\omega, a')$ are disjoint, and $\bigcup_{a \in A} S_0(\omega, a) = S_0(\omega)$.

Let $a(\omega)$ be an action of player 1 such that $|S_0(\omega, a(\omega))| \ge |S_0(\omega, a)|$ for all $a \in A$. Notice that if $a \neq a(\omega)$, then $|S_0(\omega, a)|$ is at most one half of $|S_0(\omega)|$: otherwise, $|S_0(\omega, a(\omega))| + |S_0(\omega, a)| \ge 2 |S_0(\omega, a)| > |S_0(\omega)|$, a contradiction. This implies that for every $(a,b) \in A \times B$ with $a \neq a(\omega)$, if $\omega' = (\omega_1, \ldots, \omega_l, (a,b))$, then
\[
|S_0(\omega')| \le \frac{|S_0(\omega)|}{2}.
\]
Define $\hat t \in T$ by $\hat t(\omega) \in \operatorname{argmin}_{b \in B} h(a(\omega), b)$. Take $s \in S_0$ arbitrarily and let $(\omega_1, \omega_2, \ldots)$ be the play generated by $(s, \hat t)$, $\omega_j = (a_j, b_j)$. Denote $\omega^l = (\omega_1, \ldots, \omega_l)$. Of course, $s$ is compatible with $\omega^l$ for every $l$, i.e., $s \in S_0(\omega^l)$. Therefore for all $n$,
\[
|S_0|\, 2^{-\sum_{l=1}^{n} I(a_l \neq a(\omega^{l-1}))} \ge |S_0(\omega^n)| \ge 1,
\]
where $I$ is the indicator function. This implies that $\sum_{l=1}^{n} I(a_l \neq a(\omega^{l-1})) \le \log_2 |S_0|$; that is, the number of stages at which player 1's action differs from $a(\omega^{l-1})$ is at most $\log_2 |S_0|$.

Now let $\theta = (\theta_1, \theta_2, \ldots)$ be a nonincreasing sequence of nonnegative real numbers such that $\sum_{l=1}^{\infty} \theta_l = 1$. Define $h_\theta : S \times T \to \mathbf{R}$ by $h_\theta(s,t) = \sum_{l=1}^{\infty} \theta_l h(\omega_l(s,t))$. Take $s \in S_0$. Then, since
\[
h(\omega_l) \le h^*\, I(a_l = a(\omega^{l-1})) + \|h\|\, I(a_l \neq a(\omega^{l-1}))
\]
for every $l = 1, 2, \ldots$ (indeed, when $a_l = a(\omega^{l-1})$ we have $b_l = \hat t(\omega^{l-1}) \in \operatorname{argmin}_{b \in B} h(a(\omega^{l-1}), b)$, so $h(\omega_l) = \min_{b \in B} h(a_l, b) \le h^*$), we have
\[
\begin{aligned}
h_\theta(s, \hat t) &\le \sum_{l=1}^{\infty} \theta_l \bigl( h^*\, I(a_l = a(\omega^{l-1})) + \|h\|\, I(a_l \neq a(\omega^{l-1})) \bigr) \\
&\le h^* + \|h\| \sum_{l=1}^{\infty} \theta_l\, I(a_l \neq a(\omega^{l-1})) \\
&\le h^* + \|h\| \sum_{l=1}^{\infty} \theta_1\, I(a_l \neq a(\omega^{l-1})) \\
&\le h^* + \theta_1\, \|h\| \log_2 |S_0|.
\end{aligned}
\]
(Recall our assumption $h(a,b) \ge 0$.) Note that if $\theta_l = 1/n$ for $l = 1, \ldots, n$ and $\theta_l = 0$ for $l > n$, then $h_\theta = h_n$. Hence (i). Also, if $\theta_l = (1 - \lambda)\lambda^{\,l-1}$, $l = 1, 2, \ldots$, then $h_\theta = h_\lambda$. This proves (ii). Q.E.D.
Remark 3.1: If $s$ and $s'$ in $S_0$ are equivalent, then for every finite history $\omega$ and every action $a \in A$, either both $s$ and $s'$ are in $S_0(\omega, a)$ or neither is in $S_0(\omega, a)$. Therefore one can replace $\log_2 |S_0|$ in the statement of Theorem 3.1 by $\log_2 |S_0/\!\sim|$, where $S_0/\!\sim$ is the set of equivalence classes of $S_0$.

Remark 3.2: Let $S_1, S_2, \ldots$ be a nondecreasing sequence of finite subsets of $S$. If $\log |S_n| / n \to 0$ as $n \to \infty$, then Theorem 3.1(i) implies that for every $\varepsilon > 0$ there is $n_0$ such that for each $n \ge n_0$ there is $t \in T$ for which $\max_{s \in S_n} h_n(s,t) \le h^* + \varepsilon$. A similar result is obtained from Theorem 3.1(ii) for the $\lambda$-discounted payoff by replacing the sequence $S_1, S_2, \ldots$ by a net $(S_\lambda \mid 0 \le \lambda < 1)$ and the condition $\log |S_n| / n \to 0$ as $n \to \infty$ by $(1 - \lambda) \log |S_\lambda| \to 0$ as $\lambda \to 1$.
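The strategy $\hat t$ of Theorem 3.1 is explicitly computable, and the halving argument in the proof translates directly into an algorithm. The following sketch is ours and only illustrates the idea, under the assumption that the finite set $S_0$ is given explicitly as a list of Python callables mapping a history (a tuple of $(a,b)$ pairs) to an action: it tracks the compatible subset $S_0(\omega)$, finds a majority action $a(\omega)$, and plays a stage-game minimizer of $h(a(\omega), \cdot)$.

    # Sketch (ours): the punishing strategy t_hat from the proof of Theorem 3.1.
    # S0: list of player-1 pure strategies, each a function history -> action.
    # A, B: lists of actions; h(a, b): player 1's stage payoff.

    def t_hat_action(S0, A, B, h, history):
        """Player 2's action after `history`, following the halving argument."""
        # S0(omega): strategies still compatible with the observed history.
        compatible = [s for s in S0
                      if all(s(history[:j]) == history[j][0]
                             for j in range(len(history)))]
        if not compatible:        # off-path safeguard; cannot occur against s in S0
            compatible = S0
        # a(omega): an action chosen by a maximal number of compatible strategies.
        votes = {a: sum(1 for s in compatible if s(history) == a) for a in A}
        a_majority = max(A, key=lambda a: votes[a])
        # Play a minimizer of player 1's stage payoff against a(omega).
        return min(B, key=lambda b: h(a_majority, b))

Whenever player 1's realized action differs from $a(\omega)$, the list of compatible strategies at least halves at the next call, which is exactly the source of the $\log_2 |S_0|$ term in the theorem.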
4. Finitely repeated game $G^n(m(n))$

In this section we study the modified version of the finitely repeated game, $G^n(m(n)) = (S(m(n)), T, h_n, k_n)$. The bound on the complexity of player 1's strategies, $m(\cdot)$, is a function of the number of repetitions $n$. Player 1 is allowed to use a mixed strategy provided that its support lies in $S(m(n))$. He can also use a behavioral strategy $\sigma : \bigcup_{l \ge 0} (A \times B)^l \to \Delta(A)$ as long as it is equivalent to a mixed strategy with support in $S(m(n))$.
A simple counting argument shows that the number of finite automata of size $m$ is at most $m^{Cm}$ for some positive constant $C$. Thus the number of equivalence classes of $S(m)$ is also bounded by $m^{Cm}$. The next lemma follows from Theorem 3.1(i), Remark 3.1 and Remark 3.2.
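For the reader's convenience, here is one way to obtain such a constant (a crude, standard estimate; the particular value of $C$ is not used anywhere). An automaton of size $m$ is specified by an action function $f : Q \to A$, a transition function $g : Q \times B \to Q$, and an initial state, so
\[
\#\{\text{automata of size } m\} \le |A|^{m} \cdot m^{m|B|} \cdot m \le m^{\,(\log_2|A| + |B| + 1)\, m} \quad \text{for } m \ge 2,
\]
i.e., $C = \log_2 |A| + |B| + 1$ will do; consequently $\log_2 |S(m)/\!\sim| \le C\, m \log_2 m$, which is the bound fed into Remark 3.2.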
Lemma 4.1. Suppose that $m(n) \log m(n) / n \to 0$ as $n \to \infty$. Then for every $\varepsilon > 0$, there is $n_0$ such that for each $n > n_0$ there is $t \in T$ such that
\[
h_n(s,t) \le h^* + \varepsilon \quad \text{for all } s \in S(m(n)).
\]
Remark 4.1: As an immediate corollary of this lemma, we obtain the following result concerning the asymptotics of the value of two-person zero-sum repeated games with finite automata, which was proved in Neyman and Okada (1999).² Consider $G = (A, B, h, k)$ to be a two-person zero-sum game, i.e., $k = -h$. Denote the value of $G^n(m(n))$ by $V^n(m(n))$.

² The previous proof utilized the notion of entropy.

Corollary 4.1. If $m(n) \log m(n) / n \to 0$ as $n \to \infty$, then $V^n(m(n)) \to h^*$ as $n \to \infty$.
Denote by $E^n(m(n))$ the set of (Nash) equilibrium payoff vectors of $G^n(m(n))$. The next theorem provides the asymptotics of $E^n(m(n))$ when $m(n)$ grows sufficiently slowly. The convergence of sets is with respect to the Hausdorff topology in $\mathbf{R}^2$. To state the theorem formally we need some more notation. Let $F$ be the convex hull of the set of payoffs feasible in pure actions of the stage game, that is, $F = \mathrm{Co}\{ (h(a,b), k(a,b)) \mid (a,b) \in A \times B \}$, and let
\[
\tilde F = \{\, (x,y) \in F \mid x \ge h^*,\ y \ge k^* \,\}.
\]
The set $\tilde F$ is nonempty. For example, let $a^* \in \operatorname{argmax}_{a \in A} [\min_{b \in B} h(a,b)]$ and $b^* \in \operatorname{argmax}_{b \in B} k(a^*, b)$. Then it is easily seen that $h(a^*, b^*) \ge h^*$ and $k(a^*, b^*) \ge k^*$. Note that the point $(h^*, k^*)$ does not necessarily belong to $F$ and thus it may not belong to $\tilde F$. For example, for the $2 \times 2$ stage game

          L       R
    T    0, 0    1, 2
    B    2, 1    0, 0

we have $(h^*, k^*) = (0, 1)$ and $F = \mathrm{Co}\{(0,0), (1,2), (2,1)\}$, so $(h^*, k^*) \notin F$. (Here $\tilde F = \mathrm{Co}\{(\tfrac{1}{2}, 1), (1,2), (2,1)\}$.) See Figure 1.
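As a quick check of these values (a routine computation, spelled out for convenience):
\[
h^* = \max_a \min_b h(a,b) = \max\{\min(0,1),\ \min(2,0)\} = 0, \qquad
k^* = \min_a \max_b k(a,b) = \min\{\max(0,2),\ \max(1,0)\} = 1,
\]
and the only point of $F = \mathrm{Co}\{(0,0), (1,2), (2,1)\}$ with first coordinate $0$ is $(0,0)$, so indeed $(h^*, k^*) = (0,1) \notin F$.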
Theorem 4.1. If $m(n) \to \infty$ and $m(n) \log m(n)/n \to 0$ as $n \to \infty$, and if there is $(x,y) \in \tilde F$ with $x > h^*$, then $E^n(m(n)) \to \tilde F$ as $n \to \infty$.
Fig. 1. $(h^*, k^*)$ may not be in $F$

As demonstrated in the next example, the conclusion of the theorem fails if the condition on $\tilde F$ is not satisfied.
Example (Neyman (1999)). Consider the $2 \times 2$ stage game given below.

          L       R
    T    1, 3    0, 4
    B    1, 1    1, 0

Observe that $h^* = k^* = 1$ and $\tilde F = \{ (1, y) \mid 1 \le y \le 3 \}$. For this game $E^n(m(n)) = \{(1,1)\}$ for every $n$ regardless of $m(n)$. To see this, first note that player 1 must receive 1 at every stage on any equilibrium path, and he can guarantee 1 with an automaton of size 1 (play B at every stage).

Suppose that, in some equilibrium $(s,t)$ of $G^n(m(n))$, player 2 received more than 1. Then $(T, L)$ must be played with positive probability at some stage on the equilibrium path. Let $\tilde n$ be the last stage at which $(T, L)$ is played with positive probability, i.e.,
\[
\tilde n = \max\{\, n' \mid 1 \le n' \le n,\ P_{s,t}\bigl((a_{n'}, b_{n'}) = (T, L)\bigr) > 0 \,\}.
\]
Define $\tilde t = (\tilde t_l) \in \Delta(T)$ by
\[
\tilde t_l(\omega) =
\begin{cases}
t_l(\omega) & \text{for } 1 \le l \le \tilde n - 1, \\
R & \text{for } l = \tilde n, \\
L & \text{for } l > \tilde n.
\end{cases}
\]
Then it is easily verified that $k_n(s, \tilde t) > k_n(s, t)$, contradicting the supposition that $(s,t)$ is an equilibrium. $\square$
We now turn to the proof of Theorem 4.1. Given a point $z \in \mathbf{R}^2$ and a nonempty compact set $Z \subset \mathbf{R}^2$, define $d(z, Z) = \min_{z' \in Z} \|z - z'\|$. Since $\tilde F$ and $E^n(m(n))$ are nonempty compact subsets of $F$, which is compact, the conclusion of the theorem, $E^n(m(n)) \to \tilde F$, is equivalent to
\[
\overline{\operatorname{Lim}}_{\,n \to \infty}\, E^n(m(n)) = \underline{\operatorname{Lim}}_{\,n \to \infty}\, E^n(m(n)) = \tilde F,
\]
where
\[
\overline{\operatorname{Lim}}_{\,n \to \infty}\, E^n(m(n)) = \{\, z \mid \forall \varepsilon > 0,\ \forall n_0,\ \exists n \ge n_0 \text{ such that } d(z, E^n(m(n))) < \varepsilon \,\}
\]
and
\[
\underline{\operatorname{Lim}}_{\,n \to \infty}\, E^n(m(n)) = \{\, z \mid \forall \varepsilon > 0,\ \exists n_0 \text{ such that } \forall n \ge n_0,\ d(z, E^n(m(n))) < \varepsilon \,\}.
\]
We will establish the identity of the three sets through a pair of claims. Note that the first claim requires neither $m(n) \to \infty$ ($n \to \infty$) nor the condition on $\tilde F$ present in the statement of Theorem 4.1.
Claim 4.1. If $m(n) \log m(n)/n \to 0$ as $n \to \infty$, then $\overline{\operatorname{Lim}}_{\,n \to \infty}\, E^n(m(n)) \subset \tilde F$.

Proof: Obviously, $E^n(m(n))$ is included in the set of payoff vectors achieved by mixed strategies, which in turn is a subset of $F$. As the set $F$ is closed, $\overline{\operatorname{Lim}}_{\,n \to \infty}\, E^n(m(n)) \subset F$.

Take $(x,y) \in \overline{\operatorname{Lim}}_{\,n \to \infty}\, E^n(m(n))$. First, the fact that player 1 can guarantee $h^*$ at every stage by using a pure action $a^* \in \operatorname{argmax}_{a \in A} [\min_{b \in B} h(a,b)]$ shows that $x \ge h^*$. Next, apply Lemma 4.1 to the two-person zero-sum game in which player 1's payoff is $-k$; its pure maxmin level for player 1 is
\[
\max_{a \in A} \min_{b \in B} \bigl(-k(a,b)\bigr) = -\min_{a \in A} \max_{b \in B} k(a,b) = -k^*.
\]
It follows that for every $\varepsilon > 0$ there is $n_0$ such that for each $n \ge n_0$ player 2 has a pure strategy $t$ such that
\[
k_n(s,t) \ge k^* - \varepsilon \quad \text{for all } s \in S(m(n)).
\]
Therefore, for every $n \ge n_0$, player 2 must receive at least $k^* - \varepsilon$ in any equilibrium of $G^n(m(n))$. This implies that $y \ge k^*$. Q.E.D.
Claim 4.2. If $m(n) \to \infty$, $m(n) \log m(n)/n \to 0$ as $n \to \infty$, and if there is $(x,y) \in \tilde F$ with $x > h^*$, then $\tilde F \subset \underline{\operatorname{Lim}}_{\,n \to \infty}\, E^n(m(n))$.

Proof: First, we deal with the case in which there is $(x,y)$ in $\tilde F$ with $x > h^*$ and $y > k^*$. To show $\tilde F \subset \underline{\operatorname{Lim}}_{\,n \to \infty}\, E^n(m(n))$ it suffices to show that, for every $\delta > 0$, the set $\tilde F_\delta = \{ (x,y) \in \tilde F \mid x > h^* + \delta,\ y > k^* + \delta \}$ is contained in $\underline{\operatorname{Lim}}_{\,n \to \infty}\, E^n(m(n))$.

Let $K = \max\{\|h\|, \|k\|\}$. Since we have assumed that the payoffs are nonnegative, it follows that $|r(a,b) - r(a',b')| \le K$ for $r = h, k$ and all $(a,b), (a',b') \in A \times B$. Fix a $\delta > 0$ for which $\tilde F_\delta \neq \emptyset$ and take $(x,y) \in \tilde F_\delta$. Let $\varepsilon > 0$ be sufficiently small so that $\varepsilon < \min\{1, K/4\}$, $x > h^* + 2\varepsilon$, and $y > k^* + 2\varepsilon$.

Let $(a_i, b_i) \in A \times B$, $i = 1, 2, 3$, be such that $(x,y)$ is a convex combination of $(h(a_i, b_i), k(a_i, b_i))$, $i = 1, 2, 3$. Thus there are $\alpha_i \ge 0$, $i = 1, 2, 3$, such that $\alpha_1 + \alpha_2 + \alpha_3 = 1$ and $(x,y) = \sum_{i=1}^{3} \alpha_i (h(a_i, b_i), k(a_i, b_i))$. Assume that $k(a_1, b_1) \le k(a_2, b_2) \le k(a_3, b_3)$ and, without loss of generality, $\alpha_3 > 0$. Let $d$ be a sufficiently large positive integer so that, by setting $d_1 = [\alpha_1 d]$, $d_2 = [\alpha_2 d]$, $d_3 = d - (d_1 + d_2)$, and $(\bar x, \bar y) = (1/d) \sum_{i=1}^{3} d_i (h(a_i, b_i), k(a_i, b_i))$, the following inequalities hold:
\[
\|(\bar x, \bar y) - (x, y)\| < \frac{\varepsilon}{2}, \tag{4.1}
\]
and
\[
d_3 (\bar y - k^*) > K. \tag{4.2}
\]
Note that $(\bar x, \bar y)$ converges to $(x,y)$ as $d$ tends to infinity, and thus (4.1) holds for a sufficiently large $d$. Also, since $\alpha_3 > 0$ implies that $d_3 \to \infty$ as $d \to \infty$, and since $\bar y \to y > k^*$, (4.2) holds for a sufficiently large $d$.
Let $b_4 \in B$ be a best response of player 2 to the action $a_3$ of player 1. Define a sequence of action pairs of length $d$, $\xi = (\xi_1, \ldots, \xi_d)$, by
\[
\xi_j =
\begin{cases}
(a_1, b_1) & \text{for } j = 1, \ldots, d_1, \\
(a_2, b_2) & \text{for } j = d_1 + 1, \ldots, d_1 + d_2, \\
(a_3, b_3) & \text{for } j = d_1 + d_2 + 1, \ldots, d.
\end{cases}
\]
(Recall that $d_1 + d_2 + d_3 = d$.) Define a sequence of action pairs $\omega = (\omega_1, \ldots, \omega_n) \in (A \times B)^n$ as follows. Let $q = [n/d]$. In the last $d$ stages, $\omega$ coincides with $\xi$ up to one stage before the end and then finishes with $(a_3, b_4)$:
\[
(\omega_{n-d+1}, \ldots, \omega_n) = (\xi_1, \ldots, \xi_{d-1}, (a_3, b_4)).
\]
From stage $n - qd + 1$ up to $n - d$, $\xi$ is repeated $q - 1$ times:
\[
(\omega_{n-qd+1}, \ldots, \omega_{n-d}) = \text{``}(\xi_1, \ldots, \xi_d) \text{ repeated } q - 1 \text{ times.''}
\]
Finally, in the first $n - qd$ stages, the tail part of $\xi$ is played:
\[
(\omega_1, \ldots, \omega_{n-qd}) = (\xi_{(q+1)d - n + 1}, \ldots, \xi_d).
\]
Notice that $(\omega_1, \ldots, \omega_{n-1})$ is $d$-periodic. Clearly, for every $p = 1, \ldots, n - 1$,
\[
\sum_{l=p+1}^{n} h(\omega_l) \ge (n - p)\, \bar x - dK. \tag{4.3}
\]
The assumption $k(a_1, b_1) \le k(a_2, b_2) \le k(a_3, b_3)$ and the choice of $b_4$ imply that for every $p < n$,
\[
\sum_{l=p+1}^{n} k(\omega_l) \ge (n - p)\, \bar y. \tag{4.4}
\]
Define a pair of pure strategies $(\tilde s, \tilde t)$ so that (i) they follow the path $\omega$ as long as the other player does so; (ii) if player 2 deviated from $\omega$, then $\tilde s$ takes a pure action $\tilde a \in \operatorname{argmin}_{a \in A} [\max_{b \in B} k(a,b)]$ at every stage afterward; while (iii) if player 1 deviated from $\omega$ for the first time at stage $p$, then $\tilde t$ starts playing a pure strategy $\hat t$ such that for every $s \in S(m(n))$ and $l = 1, 2, \ldots$,
\[
h_l(s, \hat t) \le h^* + \frac{\|h\| \log_2 |S(m(n))|}{l}.
\]
Theorem 3.1 ensures the existence of such a strategy.

The strategy $\tilde s$ is implementable by an automaton of size $d + 1$: $d$ states for playing the cyclic phase of $\omega$ and one for the punishment.³ So, if $n$ is large enough so that $m(n) > d$, then $\tilde s \in S(m(n))$. Since the play induced by $(\tilde s, \tilde t)$ is $\omega$, we have
\[
\bigl\| (h_n(\tilde s, \tilde t), k_n(\tilde s, \tilde t)) - (\bar x, \bar y) \bigr\| < \frac{dK}{n} < \frac{\varepsilon}{2} \quad \text{for } n > \frac{2dK}{\varepsilon}.
\]
It follows from (4.1), using the triangle inequality, that $(h_n(\tilde s, \tilde t), k_n(\tilde s, \tilde t))$ is within $\varepsilon$ of $(x,y)$ for sufficiently large $n$.

³ Although $\omega_n \neq \xi_d$, player 1's action in $\omega_n$, namely $a_3$, is the same as his action in $\xi_d$.
Next we show that no unilateral deviation from $(\tilde s, \tilde t)$ leads to a strict improvement of the deviator's payoff. We start with player 2. Take $t \in T$ and assume that the strategy $t$ deviates from the play $\omega$ at stage $p$. If $p \le n - d_3$, then the inequalities (4.2) and (4.4) imply that
\[
n \bigl( k_n(\tilde s, \tilde t) - k_n(\tilde s, t) \bigr) \ge -K + (n - p)(\bar y - k^*) \ge -K + d_3 (\bar y - k^*) > 0,
\]
while if $n - d_3 < p \le n$, recalling the choice of $b_4$,
\[
n \bigl( k_n(\tilde s, \tilde t) - k_n(\tilde s, t) \bigr) \ge (n - p)\, k(a_3, b_3) + k(a_3, b_4) - \bigl( k(a_3, b_4) + (n - p)\, k^* \bigr) = (n - p)\bigl( k(a_3, b_3) - k^* \bigr) \ge 0.
\]
Thus we conclude that player 2 cannot benefit from a deviation from $\omega$ at any stage. Let us turn to player 1. Take $s \in S(m(n))$ and suppose that $(s, \tilde t)$ results in player 1's deviation from the path $\omega$. The fact that $s \in S(m(n))$ implies that such a deviation must occur in the first $m(n)$ repetitions of the cycle $\omega_1, \ldots, \omega_d$, and hence in the first $m(n)\, d$ stages of the repeated game (an automaton with at most $m(n)$ states that follows $\omega$ for $m(n)$ full cycles must occupy the same state at the beginning of two different cycles, and, receiving the same cyclic sequence of player 2's actions, its play repeats from that point on and never leaves $\omega$). Thus assume that the deviation occurred at stage $p \le m(n)\, d$. Let $(\omega'_1, \ldots, \omega'_p)$ be the play induced by $(s, \tilde t)$ up to stage $p$ and set $s' = s | (\omega'_1, \ldots, \omega'_p)$. Then, by the construction of $\tilde t$, we have, recalling that the payoffs are assumed to be nonnegative,
\[
h_n(s, \tilde t) \le \frac{pK}{n} + \frac{n - p}{n}\, h_{n-p}(s', \hat t) \le \frac{pK}{n} + h^* + \frac{\|h\| \log_2 |S(m(n))|}{n},
\]
which would be less than $h^* + \varepsilon$ if, e.g.,
\[
n \ge \frac{2}{\varepsilon} \max\bigl\{ m(n)\, d\, K,\ \|h\| \log_2 |S(m(n))| \bigr\}. \tag{4.5}
\]
Our assumption on the order of magnitude of $m(n)$ guarantees that (4.5) holds for all sufficiently large $n$. Since $x > h^* + 2\varepsilon$ and $h_n(\tilde s, \tilde t)$ is within $\varepsilon$ of $x$, we have $h_n(s, \tilde t) < h_n(\tilde s, \tilde t)$. We have thus shown that $(\tilde s, \tilde t)$ is an equilibrium of $G^n(m(n))$ with a payoff vector within $\varepsilon$ of our target payoff vector $(x,y)$, provided that $n$ is large enough.
Next assume $\tilde F = \{ (x, k^*) \mid h_0 \le x \le h_1 \}$ where $h^* \le h_0 \le h_1$ and $h^* < h_1$. See Figure 2 for an example.

Fig. 2. $\tilde F = \{ (x, k^*) \mid h_0 \le x \le h_1 \}$, $h^* \le h_0 < h_1$

In this case there are two action pairs $(a,b)$ and $(a',b')$ such that $k(a,b) = k^*$, $h(a,b) \le h_0$ and $(h(a',b'), k(a',b')) = (h_1, k^*)$. Take $(x, k^*) \in \tilde F$ with $x > h^* + 2\varepsilon$, where $\varepsilon > 0$ is sufficiently small. Let $d$ be a sufficiently large positive integer and let $\xi = (\xi_1, \ldots, \xi_d) \in (A \times B)^d$ be such that (i) $\xi_j = (a,b)$ or $(a',b')$, and (ii) $\bigl| (1/d) \sum_{j=1}^{d} h(\xi_j) - x \bigr| < \varepsilon$. Let $n$ be a sufficiently large positive integer and define the path $\omega = (\omega_1, \ldots, \omega_n) \in (A \times B)^n$ by $(\omega_1, \ldots, \omega_{n-qd}) = ((a', b'), \ldots, (a', b'))$ and $(\omega_{n-qd+1}, \ldots, \omega_n) =$ ``$\xi$ repeated $q$ times,'' where $q = [n/d]$. Let $\tilde s$ be player 1's pure strategy that takes the action $\omega_l^1$ (player 1's component of $\omega_l$) at stage $l$ regardless of the past history. Let $\tilde t$ be player 2's pure strategy that follows the path $\omega$ as long as player 1 does so and, if player 1 deviated from $\omega$ at stage $l$ for the first time, immediately reverts to $\hat t$ against $S(m(n))$ as in the previous case.

Since player 2 receives $k^*$ at every stage when $(\tilde s, \tilde t)$ is played and it is the highest payoff for her in the stage game, she has no incentive to deviate from $\omega$ at any stage. An argument similar to the first case shows that player 1 cannot benefit from a deviation from $\omega$, provided that $n$ is sufficiently large.
Thus $(\tilde s, \tilde t)$ is an equilibrium of $G^n(m(n))$ with payoff within $\varepsilon$ of $(x, k^*)$ for sufficiently large $n$. This completes the proof. Q.E.D.

Claims 4.1, 4.2 and the fact that $\underline{\operatorname{Lim}}_{\,n \to \infty}\, E^n(m(n)) \subset \overline{\operatorname{Lim}}_{\,n \to \infty}\, E^n(m(n))$ establish Theorem 4.1.
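The cyclic path at the heart of Claim 4.2 is easy to assemble mechanically. The following sketch is ours and uses hypothetical action-pair labels and weights; it builds the cycle $\xi$ and the path $\omega$ as described in the proof (integer parts realized as floors) and checks the $d$-periodicity of $(\omega_1, \ldots, \omega_{n-1})$.

    # Sketch (ours): assembling the d-periodic path omega used in Claim 4.2.
    # The action pairs, weights, d and n below are illustrative choices only.

    from math import floor

    def build_cycle(pairs, weights, d):
        """xi: d_1 = [alpha_1 d] copies of pairs[0], d_2 = [alpha_2 d] copies of
        pairs[1], and the remaining d_3 = d - d_1 - d_2 copies of pairs[2]."""
        d1, d2 = floor(weights[0] * d), floor(weights[1] * d)
        return [pairs[0]] * d1 + [pairs[1]] * d2 + [pairs[2]] * (d - d1 - d2)

    def build_path(cycle, n, last_pair):
        """omega: the tail of xi, then full repetitions of xi, with the very
        last stage replaced by (a_3, b_4) = last_pair, as in the proof."""
        d, q = len(cycle), n // len(cycle)
        path = cycle[(q + 1) * d - n:] + cycle * q   # d-periodic through stage n-1
        path[-1] = last_pair
        return path

    pairs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
    cycle = build_cycle(pairs, weights=[0.25, 0.25, 0.5], d=8)
    omega = build_path(cycle, n=30, last_pair=("a3", "b4"))
    assert len(omega) == 30
    assert all(omega[l] == omega[l + 8] for l in range(30 - 1 - 8))  # d-periodicity

The strategy $\tilde s$ then simply walks through this path, with one extra state reserved for the punishment action $\tilde a$, which gives the automaton of size $d + 1$ used in the proof.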
5. The $\lambda$-discounted game $G_\lambda(m(\lambda))$

In this section we study a modified version of the $\lambda$-discounted game, $G_\lambda(m(\lambda)) = (S(m(\lambda)), T, h_\lambda, k_\lambda)$. We consider the bound on the complexity of player 1's strategies to be a function of the discount factor $\lambda$ such that $m(\lambda) \to \infty$ as $\lambda \to 1$. The next lemma is an analogue of Lemma 4.1.

Lemma 5.1. Suppose that $(1 - \lambda)\, m(\lambda) \log m(\lambda) \to 0$ as $\lambda \to 1$. Then for every $\varepsilon > 0$, there is $\lambda_0$ such that for each $\lambda \in [\lambda_0, 1)$ there is $t \in T$ such that
\[
h_\lambda(s, t) \le h^* + \varepsilon \quad \text{for all } s \in S(m(\lambda)).
\]

Corollary 5.1. Let $G$ be a two-person zero-sum game and $V_\lambda(m(\lambda))$ be the value of $G_\lambda(m(\lambda))$. If $(1 - \lambda)\, m(\lambda) \log m(\lambda) \to 0$ as $\lambda \to 1$, then $V_\lambda(m(\lambda)) \to h^*$ as $\lambda \to 1$.

Let us denote by $E_\lambda(m(\lambda))$ the set of (Nash) equilibrium payoff vectors of $G_\lambda(m(\lambda))$. The next theorem is an analogue of Theorem 4.1.

Theorem 5.1. If $(1 - \lambda)\, m(\lambda) \log m(\lambda) \to 0$ as $\lambda \to 1$ and if there is $(x,y) \in \tilde F$ with $x > h^*$ or $y > k^*$, then $E_\lambda(m(\lambda)) \to \tilde F$ as $\lambda \to 1$.
Proof: Define the sets $\overline{\operatorname{Lim}}_{\lambda \to 1}\, E_\lambda(m(\lambda))$ and $\underline{\operatorname{Lim}}_{\lambda \to 1}\, E_\lambda(m(\lambda))$ similarly to $\overline{\operatorname{Lim}}_{n \to \infty}\, E^n(m(n))$ and $\underline{\operatorname{Lim}}_{n \to \infty}\, E^n(m(n))$. An argument similar to the proof of Claim 4.1, together with Lemma 5.1, shows that $\overline{\operatorname{Lim}}_{\lambda \to 1}\, E_\lambda(m(\lambda)) \subset \tilde F$. Below we show that $\tilde F \subset \underline{\operatorname{Lim}}_{\lambda \to 1}\, E_\lambda(m(\lambda))$.

First, assume that there is a point $(x,y)$ in $\tilde F$ such that $x > h^*$ and $y > k^*$. As in the proof of Claim 4.2, fix $\delta > 0$ with $\tilde F_\delta \neq \emptyset$ and take $(x,y) \in \tilde F_\delta$. Let $\varepsilon > 0$ be sufficiently small so that $x > h^* + 4\varepsilon$ and $y > k^* + 4\varepsilon$. Let $d$ be a sufficiently large positive integer and let $\xi = (\xi_1, \ldots, \xi_d) \in (A \times B)^d$ be a finite sequence of action pairs such that
\[
\left\| \frac{1}{d} \sum_{j=1}^{d} (h(\xi_j), k(\xi_j)) - (x, y) \right\| < \frac{\varepsilon}{2}. \tag{5.1}
\]
Define a play $\omega = (\omega_1, \omega_2, \ldots)$ by $\omega_l = \xi_j$ if $l \equiv j \pmod{d}$. That is, $\omega$ consists of repetitions of the finite cycle $\xi$. For each positive integer $p$ let
\[
(x_p(\lambda), y_p(\lambda)) = (1 - \lambda) \sum_{l=p}^{\infty} \lambda^{\,l-p}\, (h(\omega_l), k(\omega_l)),
\]
and set $(x(\lambda), y(\lambda)) = (x_1(\lambda), y_1(\lambda))$. As $(\omega_l)_{l=1}^{\infty}$ is $d$-periodic, $(x_p(\lambda), y_p(\lambda))$ converges to $(1/d) \sum_{j=1}^{d} (h(\xi_j), k(\xi_j))$ as $\lambda \to 1$ for each $p$. This convergence is uniform in $p$. So we can take $\lambda$ sufficiently close to 1 so that for every $p = 1, 2, \ldots$,
\[
\left\| (x_p(\lambda), y_p(\lambda)) - \frac{1}{d} \sum_{j=1}^{d} (h(\xi_j), k(\xi_j)) \right\| < \frac{\varepsilon}{2}. \tag{5.2}
\]
Now we describe the equilibrium strategies $\tilde s \in S$ and $\tilde t \in T$. Player 1's strategy $\tilde s$ follows the play $\omega$ as long as player 2 does so. If player 2 deviated from $\omega$ at some stage, then from the next stage on $\tilde s$ takes a pure action $\tilde a \in \operatorname{argmin}_{a \in A} [\max_{b \in B} k(a,b)]$. Player 2's strategy $\tilde t$ also follows $\omega$ as long as player 1 does so. If player 1 deviated from $\omega$ at some stage, then at the next stage $\tilde t$ starts playing the pure strategy $\hat t$ constructed in the proof of Theorem 3.1 against player 1's strategy set $S(m(\lambda))$.

The strategy $\tilde s$ is implementable by an automaton of size at most $d + 1$. So for $\lambda$ sufficiently close to 1 so that $m(\lambda) > d$, we have $\tilde s \in S(m(\lambda))$. Note that $(h_\lambda(\tilde s, \tilde t), k_\lambda(\tilde s, \tilde t)) = (x(\lambda), y(\lambda))$. Thus, by (5.1) and (5.2), the strategy pair $(\tilde s, \tilde t)$ yields a payoff vector within $\varepsilon$ of $(x, y)$.
Take $s \in S(m(\lambda))$ and let $(\omega'_l)_{l=1}^{\infty}$ be the play induced by $(s, \tilde t)$. Then either $\omega_l = \omega'_l$ for all $l$, or there is a smallest $p$ such that $\omega_p \neq \omega'_p$. (Note that both $(\omega_l)$ and $(\omega'_l)$ are deterministic plays.) In the latter case, Lemma 5.1 implies that
\[
h_\lambda(s, \tilde t) < (1 - \lambda) \left( \sum_{l=1}^{p-1} \lambda^{\,l-1} h(\omega_l) + \lambda^{\,p-1} h(\omega'_p) + \frac{\lambda^{\,p}}{1 - \lambda}\, (h^* + \varepsilon) \right).
\]
It follows from (5.1), (5.2), and the assumption $x > h^* + 4\varepsilon$ that, for $\lambda$ sufficiently close to 1, $(h^* + \varepsilon) - x_{p+1}(\lambda) < -2\varepsilon$. Since $\lambda/(1 - \lambda) \to \infty$ as $\lambda \to 1$, we have $K + (\lambda/(1 - \lambda))\bigl( (h^* + \varepsilon) - x_{p+1}(\lambda) \bigr) < -\varepsilon$ for $\lambda$ sufficiently close to 1, and hence
\[
h_\lambda(s, \tilde t) - h_\lambda(\tilde s, \tilde t) = h_\lambda(s, \tilde t) - x(\lambda)
\le (1 - \lambda)\lambda^{\,p-1} \left( K + \frac{\lambda}{1 - \lambda}\bigl( (h^* + \varepsilon) - x_{p+1}(\lambda) \bigr) \right)
\le (1 - \lambda)\lambda^{\,p-1}(-\varepsilon) < 0.
\]
If $\lambda$ is sufficiently close to 1 so that $(1 - \lambda)\, d K \le \lambda^{d} \varepsilon$, then player 2 has no incentive to deviate from $\omega$ at any stage. Indeed, if player 2 deviated from $\omega$ for the first time in the $p$-th cycle, then the gain within the $p$-th cycle from the deviation is at most $(1 - \lambda)\lambda^{(p-1)d}\, d K$, while the loss she incurs from the punishment is at least $\lambda^{pd} \varepsilon$.

Thus we have shown that, for all $\lambda$ sufficiently close to 1, $(\tilde s, \tilde t)$ is an equilibrium of $G_\lambda(m(\lambda))$ that yields a payoff vector within $\varepsilon$ of $(x,y)$.
Now suppose that $\tilde F = \{ (h^*, y) \mid k_0 \le y \le k_1 \}$ where $k^* \le k_0 \le k_1$ and $k^* < k_1$. Then there are action pairs $(a,b)$ and $(a',b')$ such that $h(a,b) = h^*$, $k(a,b) \le k_0$ and $(h(a',b'), k(a',b')) = (h^*, k_1)$. For a given payoff vector $(x,y)$ in $\tilde F$ with $y > k^*$, define $\xi = (\xi_1, \ldots, \xi_d)$ as in (5.1) except that $\xi_j = (a,b)$ or $(a',b')$. Let $\omega$ be the cyclic play with the cycle $\xi$. Define a strategy pair $(\tilde s, \tilde t)$ as follows: $\tilde s$ is the same as above, and $\tilde t$ follows $\omega$ regardless of the history. Note that player 1 receives $h^*$ at every stage on the play $\omega$, and the assumption on $\tilde F$ implies that this is the highest payoff for him in the stage game. Thus player 1 cannot benefit by deviating from $\omega$. The same argument as above shows that player 2 has no incentive to deviate from $\omega$ provided that $\lambda$ is sufficiently close to 1. The proof for the case $\tilde F = \{ (x, k^*) \mid h_0 \le x \le h_1 \}$ with $h^* \le h_0 \le h_1$ and $h^* < h_1$ is similar and we omit it. Q.E.D.
For the following $2 \times 2$ game

          L       R
    T    0, 1    1, 0
    B    1, 0    0, 1

$(h^*, k^*) = (0, 1)$ and hence $\tilde F = \{(0, 1)\}$. It is clear, however, that player 1 must receive a strictly positive payoff in any equilibrium of the $\lambda$-discounted game. Thus one cannot dispense with the condition on $\tilde F$ in the statement of Theorem 5.1.
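Spelling out the values for this game (a routine check):
\[
h^* = \max\{\min(0,1),\ \min(1,0)\} = 0, \qquad k^* = \min\{\max(1,0),\ \max(0,1)\} = 1,
\]
while $F = \mathrm{Co}\{(0,1), (1,0)\}$, so $\tilde F = \{ (x,y) \in F \mid x \ge 0,\ y \ge 1 \} = \{(0,1)\}$; in particular there is no $(x,y) \in \tilde F$ with $x > h^*$ or $y > k^*$, so Theorem 5.1 does not apply.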
6. Concluding remarks
In the proof of Claim 4.2 we argued that player 1's deviation from the equilibrium path, if any, must occur at a very early stage of the repeated game, so that there are enough stages after the deviation for player 2's punishment to be effective. For this we only needed the condition on the order of magnitude of the complexity bound, $m(n) = o(n)$, which is weaker than our original condition $m(n) \log m(n) = o(n)$. The latter condition is needed to determine the individually rational levels; or rather, we know the individually rational levels only under this condition on $m(n)$. Suppose that we obtained a result that, for a particular sequence $(m(n))_{n=1}^{\infty}$ with $m(n) = o(n)$,
\[
\lim_{n \to \infty} V^n(m(n)) = \operatorname{Val}(G) = \min_{b \in \Delta(B)} \max_{a \in A} h(a, b)
\]
for every two-person zero-sum game $G = (A, B, h)$. (See Neyman (1997), Conjectures 1 and 2.) Then essentially the same proof as that of Theorem 4.1 shows that, for such a sequence $(m(n))_{n=1}^{\infty}$,
\[
E^n(m(n)) \to \Bigl\{ (x,y) \in F \,\Bigm|\, x \ge \min_{b \in \Delta(B)} \max_{a \in A} h(a,b),\ y \ge \min_{a \in \Delta(A)} \max_{b \in B} k(a,b) \Bigr\}
\]
as $n \to \infty$, provided that there is a feasible payoff vector in which player 1 receives strictly more than $\min_{b \in \Delta(B)} \max_{a \in A} h(a,b)$. A similar argument holds for the discounted games.
References
Ben-Porath E (1993) Repeated games with finite automata. Journal of Economic Theory 59:17–32
Kalai E, Stanford W (1988) Finite rationality and interpersonal complexity in repeated games. Econometrica 56:397–410
Lehrer E (1988) Repeated games with stationary bounded recall strategies. Journal of Economic Theory 46:130–144
Lehrer E (1994) Finitely many players with bounded recall in infinitely repeated games. Games and Economic Behavior 7:390–405
Neyman A (1985) Bounded complexity justifies cooperation in the finitely repeated prisoner's dilemma. Economics Letters 19:227–229
Neyman A (1997) Cooperation, repetition, and automata. In: Hart S, Mas-Colell A (eds) Cooperation: Game-Theoretic Approaches, vol. 155 of NATO ASI Series F, pp. 233–255. Springer Verlag
Neyman A (1999) Finitely repeated games with finite automata. Mathematics of Operations Research 23:513–552
Neyman A, Okada D (1999) Strategic entropy and complexity in repeated games. Games and Economic Behavior 29:191–223
Neyman A, Okada D (2000) Repeated games with bounded entropy. Games and Economic Behavior 30:228–247
Papadimitriou CH, Yannakakis M (1994) On complexity as bounded rationality: Extended abstract. In: STOC '94, pp. 726–733, Montreal, Quebec, Canada
Zemel E (1989) Small talk and cooperation: A note on bounded rationality. Journal of Economic Theory 49:1–9