Convergence of Best-response Dynamics
in Extensive-form Games
Zibo Xu¹
Singapore University of Technology and Design
First version: June 2013, revised: September 3, 2014
Abstract
This paper presents a collection of convergence results on best-response dynamics in extensive-form games. We prove that in all finite generic extensive-form
games of perfect information, every solution trajectory to the continuous-time
best-response dynamic converges to a Nash equilibrium component. We show the
robustness of this convergence in the sense that along any interior approximate
best-response trajectory, the evolving state is close to the set of Nash equilibria
most of the time. We also prove that in any perfect-information game in which
every play contains at most one decision node of each player, any interior approximate best-response trajectory converges to the backward-induction equilibrium.
Our final result concerns self-confirming equilibria in perfect-information games. If
each player always best responds to her conjecture of the current strategy profile,
and she updates her conjecture based only on observed moves, then the dynamic
will converge to the set of self-confirming equilibria.
JEL classification: C73, D83.
Keywords: Convergence, extensive form, perfect information, backward-induction
equilibrium, best-response dynamics, fictitious play, self-confirming equilibrium.
¹ The author is grateful to Sergiu Hart and Jörgen Weibull for many suggestions and discussions. The author also wishes to thank Carlos Alós-Ferrer, Itai Arieli, Ross Cressman, Itzhak
Gilboa, Josef Hofbauer, Christoph Kuzmics, Larry Samuelson, Bill Sandholm, Karl Schlag,
Tomas Sjöström, Tristan Tomala, and Eyal Winter for their comments. The author would like to
acknowledge financial support from the European Research Council under the European Community’s Seventh Framework Programme (FP7/2007-2013), ERC grant 249159, the Knut and
Alice Wallenberg Foundation, and Agence Nationale de la Recherche grant ANR-10-BLAN0112.
1 Introduction
The notion of best response lies at the heart of Nash equilibrium (Nash, 1950):
each player selects a best response to the other players’ strategies and no player
can benefit by unilaterally changing her strategy. If we consider a dynamic setting
where each player always chooses a best response to the current strategies of the
other players, does this dynamic always converge to a Nash equilibrium?
We adopt the definition of the continuous-time best-response dynamic, formulated with a constant revision rate and myopic optimization, as found in Matsui (1989),
Gilboa and Matsui (1991), Hofbauer (1995), and Balkenborg et al. (2013). In
such a dynamic, a state specifies the strategy profile of all players; the frequency
of a strategy increases only if it is a best response to the current state. The
continuous-time best-response dynamic has been analyzed in various classes of
games in strategic (or normal) form; see Hofbauer and Sigmund (1998) and Sandholm (2010). In particular, the convergence of a continuous-time best-response
dynamic to the set of Nash equilibria has been shown in Hofbauer (1995) for
two-player zero-sum games, in Benaim et al. (2005) for k-player identical interest
games, and in Berger (2005) for 2 × n games.
In the present paper, we consider extensive-form games of perfect information. In addition to its explicit representation of players’ sequential moves, each
perfect-information game admits a backward-induction equilibrium, and backward
induction itself is a process to determine a finite sequence of local best-response
moves; see Selten (1965, 1975). The present paper studies the predictive power
of the best-response dynamic in an extensive-form game of perfect information. In particular, we focus on the convergence results of the continuous-time best-response dynamic concerning the set of Nash equilibria or the backward-induction
equilibrium (or so-called subgame-perfect equilibrium) in an extensive-form game.
In a standard evolutionary setting, a state x introduced above, i.e., a strategy
profile, can be regarded as a population state, where each component xa represents
the population share of individuals who are programmed to play the pure strategy a. The payoffs represent the fitness of these population shares. Even without specifying an exact mathematical model, we can obtain some intuition by considering the following classical example, the two-player game of Figure 1, in the evolutionary setting.
This game has two Nash equilibria in pure strategies, (r1, r2) and (l1, l2), where the first one is the backward-induction equilibrium. Assume that for each player
there is a continuous population of individuals playing the game in that role.
The populations are distinct, and each individual plays a pure strategy of the
corresponding player.

[Figure 1: Best-response dynamics in a simple game. Player I chooses l1, ending the game with payoffs (1, 1), or r1; after r1, player II chooses between l2, with payoffs (0, 0), and r2, with payoffs (2, 1).]

Suppose that in the initial state x, everyone in population
I is playing l1 and xl2 , the share of population II playing l2 , is at least 1/2. Then,
on the one hand, population I has no incentive to convert to strategy r1 ; on the
other hand, since the proper subgame is unreached, population II is indifferent
between l2 and r2 .
If a best-response trajectory starts from a state where more individuals in population II are playing r2 than l2 , then population I’s best response strategy is r1 .
Once xr1 becomes positive, it will always be positive from that time on, and hence
the proper subgame will always be reached. (This follows from a fundamental
proposition of continuous-time best-response dynamics.) The strategy l2 is then
wiped out in the long run, and the trajectory converges to the backward-induction equilibrium. To summarize, a best-response dynamic always converges to the
set of Nash equilibria, and it converges to the subset of non-backward-induction
equilibria only if xr2 is always no more than 1/2.
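To make this intuition concrete, the following minimal Python sketch simulates an Euler discretization of the best-response dynamic in this game. The payoffs are those read from Figure 1 (l1 yields (1, 1); after r1, l2 yields (0, 0) and r2 yields (2, 1)), and the deterministic tie-breaking is one assumed selection from the best-response correspondence, not the paper's construction.

```python
# A minimal sketch: Euler discretization of the best-response dynamic
# x' = BR(x) - x in the game of Figure 1. State: x_r1 is the share of
# population I playing r1; x_r2 is the share of population II playing r2.
def simulate(x_r1, x_r2, dt=0.01, T=20.0):
    for _ in range(int(T / dt)):
        # Player I: u(l1) = 1, while u(r1) = 0 * x_l2 + 2 * x_r2.
        br1 = 1.0 if 2.0 * x_r2 > 1.0 else 0.0   # ties broken toward l1
        # Player II: r2 strictly beats l2 once the subgame is reached;
        # off the path (x_r1 == 0) she is indifferent and may drift.
        br2 = 1.0 if x_r1 > 0.0 else x_r2        # drift choice: stand still
        x_r1 += dt * (br1 - x_r1)
        x_r2 += dt * (br2 - x_r2)
    return x_r1, x_r2

print(simulate(0.0, 0.6))  # -> near (1, 1): the backward-induction play
print(simulate(0.0, 0.4))  # stays at (0, 0.4): an (l1, l2)-type equilibrium
```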
In the following general result for extensive-form games, we apply the generic
condition, i.e., no two pure strategy profiles that yield different outcomes generate
the same payoff for any player.
Main Result 1: The continuous-time best-response dynamic always converges
to the set of Nash equilibria in all finite generic extensive-form games of perfect
information.
In a general game, the predictive power of the continuous-time best-response
dynamic, or even any simple and natural dynamic process, is limited. Hofbauer
and Sigmund (1998, Theorem 8.6.1) show that there is no natural or reasonable
evolutionary dynamic process that leads to an equilibrium in every normal-form
game. This convergence problem has also been studied from the perspective of
natural adaptive learning rules. In uncoupled dynamics, a player’s strategy is independent of the payoff functions of the other players, but it need not be independent
of the player’s own payoff function or the other players’ strategies. Some common
uncoupled dynamics are best-response dynamics, better-response dynamics, and
fictitious play. Hofbauer and Sandholm (2002) and Hart and Mas-Colell (2003)
find that some continuous-time uncoupled dynamics can guarantee convergence to
the set of Nash equilibria in certain special classes of games, like zero-sum games,
potential games, dominance-solvable games, and more. Nevertheless, Hart and
Mas-Colell (2005) prove the general negative result that continuous-time uncoupled dynamics cannot guarantee Nash convergence in all games.
Cressman (2003) observes the possibly complicated dynamic process in a game
where each player can move more than once (see Comment at Figure 4), and
writes, “Convergence of best-response dynamics in extensive-form games of perfect information is an open problem, even in the case where there is a unique Nash
equilibrium component.” We dispense with the condition of a unique Nash equilibrium component in this open problem, and prove Main Result 1 in Section 3.
Cressman and Schlag (1998) prove that in any finite generic extensive-form game of perfect information, any interior solution trajectory to the replicator dynamic converges to a Nash equilibrium. However, this result applies only to interior trajectories. If a solution trajectory to the replicator dynamic
starts from an initial state of a pure strategy profile, then the trajectory will stay
at the initial state forever.
Unlike the replicator dynamic, the solution trajectory to the best-response
dynamic may not be unique. This is because the strategy adjustment rate may not be continuous, and there may be multiple best-response strategies for a player at
some time. When the state is in the basin of attraction to a Nash equilibrium
component, some player’s induced behavior strategy may remain unchanged in
a subgame off the equilibrium path. If that induced strategy generates multiple
best responses from other players, the chosen best-response strategies of those
players may “drift” freely. (See the game in Figure 3 for an example.) Hence, we
cannot guarantee the convergence of an (even interior) solution trajectory to any
particular Nash equilibrium in this case.
For generic perfect-information games, we can guarantee only the convergence
of the best-response dynamic to a Nash equilibrium component, which is a set of
Nash equilibria with the same outcome. In the present paper, we define a quasi-strategy as a class of pure strategies that generate the same outcome in the game
regardless of the strategies used by the other players. A quasi-strategy captures all
the relevant information for a dynamic process, if we ignore any move that cannot
be implemented under any strategy combination of all other players. Moreover,
for any conjecture of any player at unreached nodes in the original extensive-form
game, we could formulate it by quasi-strategies. Hence, we could also define a self-confirming equilibrium by quasi-strategies. The application of quasi-strategies is essential to our understanding of the dynamics as well as to the proof. Without it, we would face the dilemma illustrated in Hendon et al. (1996); see Section 7 for further discussion. For an extensive-form game of perfect information, we show in Appendix A the equivalence of a quasi-strategy with a strategy in the semi-reduced normal form. In this case, a quasi-strategy may be viewed as a strategy
of a reduced form associated with the game tree.
To show that the convergence of an interior best-response trajectory in our
Main Result 1 is robust, we consider an approximate best-response dynamic
adapted from the ε-accessible path defined in Gilboa and Matsui (1991). This
is a weaker version of the continuous-time best-response dynamic, in the sense
that a limitation is imposed on the players’ ability to recognize the current state.
Therefore, the direction at each point of the trajectory may not be a best response
to the current state, but rather a best response to a different interior state very close
to the current state. Although an interior approximate best-response trajectory
need not converge to the set of Nash equilibria (see Figure 5 for a counterexample),
we have the following result of convergence in frequency.
Main Result 2: In any finite generic extensive-form game of perfect information, along any interior solution trajectory to the approximate best-response
dynamic, the evolving state is close to the set of Nash equilibria most of the time.
In Section 4 we will formalize the definition of the approximate best-response
dynamic. Benaim and Weibull (2003) apply this convergence notion to a stochastic
model, and call the time fractions of convergence there the empirical visitation
rate. Young (2009) also applies this convergence notion to show that behavior
in so-called interactive trial-and-error learning comes close to a Nash equilibrium
most of the time.
For the technical part, we show that in the best-response dynamic the outcome
converges to a single play and that, moreover, that play is an equilibrium path.
For the approximate best-response dynamic, we still follow this approach and also
apply induction to the set of decision nodes in order to study the movement of
the solution trajectory. Note that the set of Nash equilibria may change in the
inductive step. To avoid discussion of such a possible change in the set of Nash
equilibria, we show that most of the time the outcome concentrates on a single
play, which turns out to be an equilibrium path.
The backward-induction equilibrium is traditionally regarded as the rational
solution in an extensive-form game of perfect information; see, e.g., Aumann (1995). Moreover, in a generic perfect-information game, the unique backward-induction outcome is the same as the one predicted by rationalizability notions such as extensive-form rationalizability, introduced in Pearce (1984), and the backward dominance procedure, introduced in Perea (2014). It is natural to ask whether a
best-response trajectory always converges to the backward-induction equilibrium
component. The short answer is no, as we will see in the game in Figure 5. However, we show in Section 5 the positive results below for the backward-induction
equilibrium in a game tree wherein no player moves more than once on a play. In
related literature, the stability of the backward-induction equilibrium in such a game is well justified in static models; see, e.g., Ben Porath (1997); similar evolutionary
dynamic results to Main Result 3 can be found in Hart (2002) for discrete-time
better-reply dynamics as well as in Cressman (2003) for replicator dynamics.
Main Result 3: For any finite generic extensive-form game of perfect information, from any interior initial state, we can find an approximate best-response
trajectory that converges to the backward-induction equilibrium. Moreover, in the
case where every play in the game tree contains at most one decision node of each
player, any interior solution trajectory to the approximate best-response dynamic
converges to the backward-induction equilibrium.
The best-response dynamic presented so far is constructed on the state space
of standard (or quasi-) strategy profiles. We could consider an alternative model
of continuous-time best-response dynamics in a learning process. At each time,
each player updates her conjecture about the strategies of all the other players.
On the basis of the initial conjecture, the current conjecture is generated by the
observations of the moves taken by all the other players in the past. At each time,
each player best responds to her conjecture, and the realized moves in the game
tree are observed by everyone. A new conjecture is thus generated and the process
goes on. The implementation of this learning process requires that the conjecture be updated only at currently connected decision nodes, and thus players can only learn from their observations in the extensive form of the game. A similar
dynamic model is used in Noldeke and Samuelson (1992) and Kuzmics (2004).
We show the following convergence result for this dynamic in Section 6.
Main Result 4: In any finite generic extensive-form game of perfect information, every solution trajectory to the continuous-time best-response dynamic
formulated upon conjectures converges to the set of self-confirming equilibria.
2 The Model
We adopt the standard definition of a finite extensive-form game of perfect
information; see, e.g., Kreps and Wilson (1982), Hart (1992), and Ritzberger
(2002). The main contribution here is the introduction of the quasi-strategy, i.e.,
a class of strategies with an equivalent outcome.
Given a set N of finitely many nodes, we define a partial-order binary relation ≺ on N that represents precedence. We further suppose an initial node n_0 that is a predecessor of all other nodes in N. Such a pair (N, ≺) defines a tree T, and we call n_0 the root of T. We define the immediate-predecessor function ψ : n' ↦ max{n : n ≺ n'} for all n' in N \ {n_0}. Let Ψ be the predecessor function Ψ : N → 2^N with
$$\Psi(n') = \{ n \in N : n \prec n' \}.$$
We denote by ψ^{-1} the immediate-successor function; thus ψ^{-1}(n) = {n' ∈ N : n = ψ(n')} for all n in N. The successor function Ψ^{-1} can be similarly deduced. We call a node n a terminal node if ψ^{-1}(n) = ∅, and write N_t := {n ∈ N : ψ^{-1}(n) = ∅}.
We say that a sequence {n1 , ..., ni } of nodes is a subplay in the tree T if
nj−1 = ψ(nj ) for all 1 < j ≤ i. If n1 = n0 and ni ∈ Nt , then it is called a
play. Denote the set of all plays by H. If a subplay has exactly two elements, then
it is called a move in the game.
We define below a k-player extensive-form game of perfect information on the
finite tree (N, ≺). Denote by {Λ0 , Λ1 , ..., Λk } a partition of Nd := N \ Nt , and call
it the assignment of decision nodes. The members of Λ0 are called chance nodes;
for each i ≤ k, the members of Λi are called the decision nodes of player i. Given
a node n ∈ Nd, we write λ(n) for the player who moves at this node; hence λ(n) = i if n ∈ Λ^i. For chance nodes, define τ : ψ^{-1}(Λ_0) → [0, 1] to be a probability distribution function such that
$$\sum_{n' \in \psi^{-1}(n)} \tau(n') = 1 \quad \forall n \in \Lambda_0. \tag{2.1}$$
We define a vector v = (v 1 , ..., v k ) such that each v i : H → R is a Bernoulli
function of player i for all 1 ≤ i ≤ k. Since there is a one-to-one correspondence between N_t and H, we may abuse the notation and write v^i(n) = v^i(h) when
n ∈ h ∩ Nt . We call the quadruple (T, N , τ, v) an extensive-form game of perfect
information.
Given an extensive-form game Γ of perfect information, for each player i, a pure
strategy ai assigns a successor ai (n) to each node n in Λi . Denote the set of pure
strategies of player i by A^i, and the set of pure-strategy profiles by A = ×_{i=1}^{k} A^i.
We denote the probability distribution over plays in game Γ under a pure-strategy profile a by a function ρ_a : H → [0, 1] with Σ_{h∈H} ρ_a(h) = 1. (Note that H is finite.) We say that a node n̄ is reached under a pure-strategy profile a if
$$\sum_{h \in H : \bar n \in h} \rho_a(h) > 0. \tag{2.2}$$
When Λ_0 = ∅, given such a pure-strategy profile a in A, we can find a unique play h = {n_0, n_1, ..., n_m} such that ρ_a(h) = 1.
The set of mixed strategies for player i is defined as
$$X^i := \Delta(A^i) = \left\{ \big(\sigma^i(a)\big)_{a \in A^i} : \sigma^i(a) \ge 0 \;\; \forall a \in A^i, \text{ and } \sum_{a \in A^i} \sigma^i(a) = 1 \right\}. \tag{2.3}$$
Hence a mixed strategy x^i is a vector of probabilities assigned to each pure strategy in A^i. The set of mixed-strategy profiles is denoted by X = ×_{i=1}^{k} X^i. We call the induced probability distribution of a mixed-strategy profile x over plays the outcome of x. Kuhn (1950, 1953) shows that in any game, for any player and any mixed strategy, there exists a behavior strategy that, against all strategy combinations of the other players, induces the same outcome as the mixed strategy does. Here, for player i, a behavior strategy π^i assigns to each decision node n ∈ Λ^i a probability measure on the set {(n, n') : n = ψ(n')} of moves. That is, a behavior strategy π^i may be viewed as a function from the set of all moves (n, n') with n ∈ Λ^i to [0, 1] such that, given any n ∈ Λ^i, π^i(n, n') ≥ 0 for each n' ∈ ψ^{-1}(n) and Σ_{n'∈ψ^{-1}(n)} π^i(n, n') = 1. Denote the set of behavior strategies of player i by Π^i, and the set of behavior-strategy profiles by Π = ×_{i=1}^{k} Π^i.
In game Γ, a pure-strategy profile a generates a payoff vector
$$u(a) = \sum_{h \in H} \rho_a(h) \, v(h).$$
We can linearly extend it to a mixed-strategy profile x:
$$u(x) = \sum_{a \in \operatorname{supp}(x)} \left( \prod_{i=1}^{k} x^i(a^i) \right) u(a). \tag{2.4}$$
A mixed-strategy profile x is a Nash equilibrium in game Γ if
$$u^i(x) \ge u^i(y^i, x^{-i})$$
for every i ≤ k and every y^i ∈ X^i, where x^{-i} := (x^j | 1 ≤ j ≤ k, j ≠ i). We denote by N E the set of all Nash equilibria.
A subtree rooted at a node n is the truncated tree (Ψ^{-1}(n) ∪ {n}, ≺). A subgame rooted at node n is the corresponding subtree with the projected assignment of decision nodes and the payoff functions. We denote this subgame by Γ_n, and denote the set of all nodes in Γ_n by N(Γ_n). For a behavior strategy π^i, we define the behavior strategy π^i(Γ_n) induced by π^i on Γ_n such that π^i(Γ_n) assigns to each decision node n̄ ∈ Λ^i ∩ N(Γ_n) the same probability measure as π^i does on the set {(n̄, n') : n̄ = ψ(n')} of moves. For a behavior-strategy profile π, we can similarly define the behavior-strategy profile π(Γ_n) induced by π on Γ_n.
A Nash equilibrium is a subgame-perfect equilibrium (or backward-induction
equilibrium) if it induces a Nash equilibrium in all subgames. Kuhn (1953) proves
that there always exists a pure-strategy backward-induction equilibrium in a finite
generic extensive-form game of perfect information, constructed from the terminal
nodes and going towards the root. Under the generic condition, the backward-induction equilibrium in Γ is unique.
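As a minimal illustration of this backward pass, the following Python sketch computes the backward-induction choices on a game tree; the nested (player, children)/terminal-tuple encoding of the tree is an assumed representation for illustration only, not the paper's notation.

```python
# A minimal sketch of backward induction. A node is either a terminal
# payoff tuple (one payoff per player) or a pair (player, children) with
# children a dict: move label -> child node.
def backward_induction(node, plan=None):
    """Return (plan, payoffs): the chosen move at every decision node
    (keyed by id) and the payoff vector the subtree induces."""
    if plan is None:
        plan = {}
    if isinstance(node, tuple):          # terminal node
        return plan, node
    player, children = node
    best_move, best = None, None
    for move, child in children.items():
        _, payoffs = backward_induction(child, plan)
        # genericity: the mover never faces a tie, so '>' suffices
        if best is None or payoffs[player] > best[player]:
            best_move, best = move, payoffs
    plan[id(node)] = best_move
    return plan, best

# The game of Figure 1: player 0 plays l1 -> (1, 1), or r1, after which
# player 1 plays l2 -> (0, 0) or r2 -> (2, 1).
game = (0, {"l1": (1, 1), "r1": (1, {"l2": (0, 0), "r2": (2, 1)})})
print(backward_induction(game)[1])  # -> (2, 1), via r1 and then r2
```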
For a mixed strategy profile x, we say that a node n̄ is reached under x if there
exists a pure strategy profile a with non-zero probability in x such that node n̄ is
reached under a. A realized play of a Nash equilibrium is also called an equilibrium
path.
In an extensive-form game, two different pure strategies of the same player
always induce the same probability distribution over plays, if they differ only at
unreached nodes (cf. Proposition 4.1 in Ritzberger (2002)). This observation
suggests a lower-dimensional representation of an extensive-form game. We call two pure strategies a^i_1 and a^i_2 of player i outcome equivalent, and write a^i_1 ∼ a^i_2, if with every combination a^{-i} of strategies of the other players the outcomes generated by these two strategies are the same, i.e.,
$$\rho_{(a^i_1, a^{-i})}(h) = \rho_{(a^i_2, a^{-i})}(h) \quad \forall h \in H, \; \forall a^{-i} \in A^{-i}. \tag{2.5}$$
Such a relationship of outcome equivalence generates for each player i a partition
B i of the set Ai . That means:
(i) The union of all subsets in B i equals Ai .
(ii) Given any bi in B i , for any two strategies ai1 , ai2 ∈ bi , (2.5) holds.
(iii) If a pair (ai1 , ai2 ) of strategies satisfies the property (2.5), then both ai1 and
ai2 belong to the same element in B i .
Thus, each bi is an equivalence class. We call B i the set of pure quasi-strategies
of player i, and B := ×_{i=1}^{k} B^i the set of pure quasi-strategy profiles. Given a pure quasi-strategy b^i of player i, if a^i(n) = n' for each a^i ∈ b^i at some node n, then we also write b^i(n) = n'. We denote the domain of b^i by N̄_{b^i}. Given
a pure quasi-strategy profile b = (b1 , ..., bk ), we can take a pure strategy profile
a = (a1 , a2 , ..., ak ) with ai ∈ bi for all 1 ≤ i ≤ k, and we define the payoff vector of
profile b as u(b) = u(a). The set of mixed quasi-strategies and the payoff vector
of a mixed quasi-strategy profile can be defined analogously to (2.3) and (2.4),
respectively.
In the example of game Γ̄ in Figure 2, a pure strategy of player I that includes the move α1 at the root must also specify the move she would play at the
bottom node. We do not, however, specify the move at the bottom node for the
corresponding quasi-strategy, as it is impossible to reach the bottom node in that
case. Hence, in this one-player game, there are only three quasi-strategies corresponding to α1 , α2 , and α3 , respectively. See Appendix A for a comparison between quasi-strategies and strategies in reduced normal form in an extensive-form
game: a quasi-strategy is defined on the realization of outcome, while a reduced
normal-form strategy concerns the payoff equivalence.

[Figure 2: Game Γ̄ — player I chooses α1 (payoff 7) at the root, or moves down to a second decision node of her own, where she chooses α2 (payoff 3) or α3 (payoff 4).]

Nash equilibrium can also be defined using quasi-strategies. When there is no ambiguity, we may simply call a quasi-strategy a strategy, and denote the set of mixed quasi-strategies
of player i by X i . (In particular, X i denotes the set of mixed quasi-strategies of
player i in all proofs.)
We can also define a quasi-strategy in the form of a behavior strategy. Recall that a standard behavior strategy π^i defines at each node in Λ^i a probability measure over the set of moves. We let a behavior quasi-strategy w^i assign a probability measure over moves only at each node n ∈ Λ^i with the property that, for any move (n̄, n') with n̄ ∈ Ψ(n) ∩ Λ^i and n' ∈ ψ^{-1}(n̄) ∩ (Ψ(n) ∪ {n}), w^i(n̄, n') > 0. That is, at each predecessor node n̄ of n where player i moves, w^i assigns a positive probability to the move from n̄ towards n. This definition is consistent with the property that a quasi-strategy only specifies the moves not precluded by previous moves. For w^i, we denote by D(w^i) the set of all those nodes n in the definition of w^i. Note that any node n in Λ^i with Ψ(n) ∩ Λ^i = ∅ is trivially in D(w^i). We denote the set of all behavior quasi-strategies of player i by W^i and the set of all behavior quasi-strategy profiles by W := ×_i W^i.
Given a mixed quasi-strategy x^i, we deduce below the unique equivalent behavior quasi-strategy w^i such that, against all combinations of strategies of the other players, x^i and w^i induce the same outcome. For each player i, any mixed quasi-strategy x^i induces a probability along each move (n_1, n_2) with n_1 ∈ D(w^i):
$$w^i(n_1, n_2) = \frac{\sum_{b^i : \, n_1 \in \bar N_{b^i}, \, b^i(n_1) = n_2} x^i(b^i)}{\sum_{n_2 \in \psi^{-1}(n_1)} \sum_{b^i : \, n_1 \in \bar N_{b^i}, \, b^i(n_1) = n_2} x^i(b^i)}. \tag{2.6}$$
The sequence (w^i(n_1, n_2))_{n_1 ∈ D(w^i), n_2 ∈ ψ^{-1}(n_1)} defines the equivalent behavior quasi-strategy.
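A minimal Python sketch of this conversion follows; it encodes a mixed quasi-strategy as a list of (b, probability) pairs, where each pure quasi-strategy b is a dict from the nodes of its domain N̄_b to the chosen successors (an assumed encoding for illustration).

```python
# A minimal sketch of (2.6): the behavior quasi-strategy equivalent to a
# mixed quasi-strategy x^i, computed by conditioning on reaching each node.
def behavior_quasi_strategy(x):
    w = {}
    nodes = {n for b, _ in x for n in b}          # nodes some b specifies
    for n in nodes:
        mass = {}                                  # successor -> total weight
        for b, prob in x:
            if n in b:                             # n lies in the domain of b
                mass[b[n]] = mass.get(b[n], 0.0) + prob
        total = sum(mass.values())                 # denominator of (2.6)
        if total > 0:                              # i.e., n is in D(w^i)
            for succ, m in mass.items():
                w[(n, succ)] = m / total
    return w

# A player mixing two pure quasi-strategies that differ at node "n1":
print(behavior_quasi_strategy([({"n1": "a"}, 0.25), ({"n1": "b"}, 0.75)]))
# -> {('n1', 'a'): 0.25, ('n1', 'b'): 0.75}
```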
3 Continuous-time Best-response Dynamics
Given a player i in a k-player extensive-form game Γ of perfect information, for any state, i.e., a mixed-strategy profile
$$x = (x^i)_{1 \le i \le k} \in \times_{i=1}^{k} \Delta(A^i), \tag{3.1}$$
let BR^i(x) be the set of best responses in mixed strategies for player i, i.e.,
$$BR^i(x) = \operatorname*{argmax}_{z^i \in \Delta(A^i)} u^i(z^i, x^{-i}). \tag{3.2}$$
We define the continuous-time best-response dynamic in the form
$$\dot x^i = g^i - x^i \tag{3.3}$$
for some g^i ∈ BR^i(x), with dots for time derivatives and suppressing time arguments. Hence,
$$\dot x^i \in BR^i(x) - x^i.$$
A continuous-time solution trajectory to the best-response dynamic (also called
a continuous-time best-response trajectory) is a mapping ξ : R × X → X which
to each initial state x0 ∈ X and time t ∈ R assigns the state ξ(t, x0 ) ∈ X with the
property (3.3). Since the best-response correspondence is upper hemicontinuous
with closed and convex values, a solution trajectory through any given initial state
exists, though it is not necessarily unique; see, e.g., Aubin and Cellina (1984).
For a subset S ⊆ X of mixed quasi-strategy profiles, denote the ε-neighbourhood of S by S[ε]. We apply the supremum norm here. That is, x ∈ S[ε] if and only if there exists an x̄ ∈ S such that, for all i with 1 ≤ i ≤ k and all a^i ∈ A^i,
$$|x^i(a^i) - \bar x^i(a^i)| \le \varepsilon.$$
Given a state x for an extensive-form game and a positive number ε, we may abuse the notation and write the ε-neighbourhood of x as x[ε] = {x}[ε].
Theorem 3.1. Given a finite generic extensive-form game of perfect information, the continuous-time best-response trajectory converges to a Nash equilibrium component. That is, given any ε > 0 and any initial state x_0, for any solution trajectory ξ(t, x_0) to the best-response dynamic, there exist a Nash equilibrium component EC and a time τ such that for all t > τ,
$$\xi(t, x_0) \in EC[\varepsilon].$$
Comment 1: For simplicity, we consider only extensive-form games without
chance nodes in the proof below. If we drop this condition, the result still holds
under the generic condition; see Appendix C for more details.
Comment 2: The continuous-time best-response dynamic is closely related
to fictitious play. The sequence ((x^i_t)_{1 ≤ i ≤ k})_{t ∈ N} is a discrete-time fictitious play if x^i_1 ∈ Δ(A^i) and
$$x^i_{t+1} = \frac{t \, x^i_t + g^i_t}{t + 1}$$
for some g^i_t ∈ BR^i(x_t), for all 1 ≤ i ≤ k and all t ≥ 1. Cressman (2003, Theorem 8.2.9) proves that a discrete-time fictitious play converges to N E in a finite generic extensive-form game of perfect information. From this recursion, it follows that x_{t+1} − x_t = (g_t − x_t)/(t + 1). If we equate this difference to ẋ|_t, we obtain a nonautonomous system due to the factor 1/(t+1). We reach the definition of the best-response dynamic if we ignore this time factor. In the other direction, after a rescaling of time in the best-response dynamic, which does not affect the shape of the trajectory, we reach the so-called continuous-time fictitious play:
$$\dot x^i \in \frac{BR^i(x) - x^i}{t}.$$
Theorem 3.1 also applies to the continuous-time fictitious play.
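For concreteness, here is a minimal Python sketch of the discrete-time fictitious-play recursion above, written for a two-player game in normal form (the extensive form adds nothing to the update rule itself); the Matching Pennies matrices are placeholder data, and the zero-sum setting is one in which fictitious play is known to converge in empirical frequencies.

```python
import numpy as np

# A minimal sketch of discrete-time fictitious play:
# x_{t+1} = (t * x_t + g_t) / (t + 1), with g_t a best reply to x_t.
def fictitious_play(A, B, steps=100000):
    m, n = A.shape
    x, y = np.full(m, 1.0 / m), np.full(n, 1.0 / n)
    for t in range(1, steps + 1):
        g = np.eye(m)[np.argmax(A @ y)]   # player 1's best reply to y
        h = np.eye(n)[np.argmax(x @ B)]   # player 2's best reply to x
        x = (t * x + g) / (t + 1)
        y = (t * y + h) / (t + 1)
    return x, y

# Matching Pennies: both empirical frequencies approach (1/2, 1/2).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(fictitious_play(A, -A))
```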
3.1 Sketch of the Proof
Since each standard strategy maps to a quasi-strategy, we can describe states and
the set of best responses in the dynamic by quasi-strategies. We apply quasi-strategies in the proof.
In an extensive-form game Γ, if a move is included in the set of best responses
in quasi-strategies for only a finite period of time, then xt (b) is decreasing to zero
for any quasi-strategy b including that move, where xt is the state in the solution
trajectory at time t. Therefore, we only need to focus on a play (n1 , n2 , ..., ni ) such
that each move (nj , nj+1 ) along the play is part of a best-response quasi-strategy
for infinitely long time, and we call such a play a perpetually realized play.
We can show the existence of a perpetually realized play by the finite structure
of the game tree. Moreover, due to the generic condition, the perpetually realized
play is unique for any solution trajectory. To see this, let us suppose that there
are l > 1 perpetually realized plays in the extensive-form game, and we fix two
of them, denoted by h1 and h2 , such that there is no node of a third perpetually
realized play in the subgame rooted at the last intersection node of h1 and h2 . We
denote that intersection node by n. From the generic condition, it follows that
either play h1 or h2 gives the player who moves at node n a higher payoff than the
one assigned to the other play. By the definition of the best-response dynamic,
at most one play between h1 and h2 qualifies as a perpetually realized play. It
further implies that the perpetually realized play is unique in the finite game,
and we denote it by hp . Therefore, the solution trajectory to the best-response
dynamic converges to the set of strategy profiles that generate hp .
To prove that the unique perpetually realized play is in fact an equilibrium
path, we first use Lemmata B.1 and B.2 in the Appendix to show that if a play h
is not an equilibrium path, then there exists at least one node n in h such that, in
any state where all players follow the play h, a deviation from the node n can give
the player who moves at n an extra payoff bounded away from zero. Hence, if hp
is not an equilibrium path, then by the generic condition there exists a node n in
hp such that, for every strategy profile with the outcome close enough to hp , the
player who moves at n can deviate from hp for a higher payoff. It further implies
that the probability that the play hp is followed is decreasing in the dynamic.
Yet, from our previous result of the perpetually realized play, we know that the
distribution of the outcome converges to hp . Hence hp must be an equilibrium
path.
Rigorously speaking, the convergence of the outcome to an equilibrium path does not by itself imply the convergence of the solution trajectory to the corresponding
equilibrium component. We show that from some time on, the behavior strategy
projection off the equilibrium path is never far from the projection of
a strategy profile in the equilibrium component.
3.2 Notations and Preliminary Results
If we would like to emphasize that a notation is with respect to a game G, then
we add (G) after the notation, e.g., N (G), Nt (G), and Λi (G).
For convenience in the proof, we sometimes write a best-response dynamic
(x_t)_{t≥0} for short, instead of the more explicit “a solution trajectory ξ(t, x_0) to the continuous-time best-response dynamic.”
In the generic k-player extensive-form game Γ, we put κ := |N_t|. For each player i, we enumerate the payoffs assigned to her at all terminal nodes in decreasing order as (v^i_{(1)}, ..., v^i_{(κ)}), so that the superscripts specify the players and v^i_{(m)} > v^i_{(n)} for all m < n, 1 ≤ i ≤ k. We can find a positive number δ < 1 such that for all i ≤ k and j ≤ κ,
$$(1 - \delta) v^i_{(j)} + \delta v^i_{(1)} < (1 - \delta) v^i_{(j-1)} + \delta v^i_{(\kappa)}. \tag{3.4}$$
That is, player i's preference over terminal nodes remains unchanged even if there is uncertainty of probability δ over which terminal node is finally reached. We fix such a δ for the proof later.
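Rearranging (3.4) shows how such a δ can be computed: the inequality for all j is equivalent to δ(v^i_{(1)} − v^i_{(κ)}) < (1 − δ)(v^i_{(j−1)} − v^i_{(j)}), so any δ below c_i/(c_i + v^i_{(1)} − v^i_{(κ)}) works for player i, where c_i is the smallest gap between her consecutive ordered payoffs. The Python sketch below implements this bound; the example payoffs are placeholders.

```python
# A minimal sketch of choosing a δ that satisfies (3.4) for every player.
def find_delta(payoffs_by_player):
    delta = 1.0
    for v in payoffs_by_player:
        v = sorted(v, reverse=True)                 # v_(1) >= ... >= v_(κ)
        gap = min(a - b for a, b in zip(v, v[1:]))  # c_i > 0 by genericity
        delta = min(delta, gap / (gap + v[0] - v[-1]))
    return delta / 2                                # strictly below the bound

print(find_delta([[7.0, 3.0, 4.0], [1.0, 0.0, 2.5]]))
```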
As a quasi-strategy represents a class of standard strategies in an extensive-form game, we can apply quasi-strategies to the dynamic in the proof of Theorem 3.1. Formally, from the definition of quasi-strategies of each player i, we may define a projection mapping s^i from A^i to B^i, and we replace a state x in (3.1) by
$$x = \left( \sum_{a^i : \, s^i(a^i) = b^i} x^i(a^i) \right)_{b^i \in B^i, \; 1 \le i \le k} \in \times_{i=1}^{k} \Delta(B^i), \tag{3.5}$$
and apply it in (3.2) and (3.3).
Given a move (n_1, n_2) with n_1 ∈ Λ^i and a behavior quasi-strategy w^i, if w^i(n_1, n_2) is defined (see (2.6)), we may simply say w^i(n_1, n_2) ≥ 0. We may also write w(n_1, n_2) instead of the more explicit w^i(n_1, n_2). For a play h = (n_1, ..., n_{|h|}) in Γ, we denote the probability that h is followed in game Γ under w_t, the equivalent behavior quasi-strategy profile of x_t at time t, by
$$\rho_{x_t}(h) := \begin{cases} \prod_{j=2}^{|h|} w_t(n_{j-1}, n_j) & \text{if } w_t(n_{j-1}, n_j) \ge 0 \;\; \forall \, 2 \le j \le |h|, \\ 0 & \text{otherwise.} \end{cases} \tag{3.6}$$
We sometimes write ρt (h) as a shorthand for ρxt (h).
For a pure quasi-strategy b^i of player i, we can characterize its domain N̄_{b^i} as follows:
$$n \in \bar N_{b^i} \tag{3.7}$$
if for all nodes n' ∈ Ψ(n) ∩ Λ^i, b^i(n') ∈ Ψ(n) ∪ {n}. This definition requires that, if player i moves towards a node n at every predecessor of n where she plays, then player i should also specify her move at node n in her quasi-strategy. Note that N̄_{b^i} = D(w^i) if w^i is the equivalent behavior quasi-strategy of b^i. We can then represent a pure quasi-strategy b^i of player i by a sequence of moves
$$S(b^i) := \left( (n, b^i(n)) : n \in \bar N_{b^i} \right). \tag{3.8}$$
For a mixed quasi-strategy x^i of player i, we let N̄_{x^i} := ∪_{b^i : x^i(b^i) > 0} N̄_{b^i}.
Given a subtree Γ_n, for a pure quasi-strategy b^i of player i, the continuation strategy b^i(Γ_n) is the subsequence
$$\left( (n', b^i(n')) : n' \in \bar N_{b^i} \cap N(\Gamma_n) \right)$$
in S(b^i). (Recall that N(Γ_n) contains all nodes in the subtree Γ_n.) We can then define a mixed continuation strategy x^i(Γ_n) as a probability distribution on {b^i(Γ_n)}_{b^i ∈ B^i} in a natural way.
At time t in the best-response dynamic defined in (3.3), for each player i, we let S̄^i_t := ∪_{b^i : g^i_t(b^i) > 0} S(b^i). Given a pair of nodes (n, n') with n = ψ(n') in Γ, we define the integral (rigorously speaking, the infinite lower Lebesgue integral)
$$f(n, n') := \int_0^{\infty} \mathbf{1}_{(n, n') \in \bar S^i_t} \, dt, \tag{3.9}$$
where player i = λ(n). (Recall that 1 is the indicator function.) Hence, f(n, n') is the total time that the move (n, n') is included in a best response used in the dynamic.
We give several preliminary lemmata below for a best-response dynamic (xt )t≥0
in a finite generic extensive-form game Γ of perfect information.
If the behavior quasi-strategy of player i at time t assigns a positive probability to a move (n, n'), and (n, n') is not included in the best-response quasi-strategy g^i_t, then the (mixed) continuation strategy of player i in the subgame Γ_{n'} remains unchanged at time t.

Lemma 3.2. For a move (n, n') with n ∈ Λ^i, if (n, n') ∉ S̄^i_t and w_t(n, n') > 0, then
$$\dot x^i(\Gamma_{n'})|_t = 0. \tag{3.10}$$
Proof. This lemma follows straightforwardly from the definition of the best-response dynamic in (3.3).
If a move (n, n') is never included in the best-response quasi-strategy g^i_t, then the probability of any pure quasi-strategy of player i including the move (n, n') decays exponentially in the dynamic.

Lemma 3.3. If ẋ^i(b^i)|_t < 0 for some pure quasi-strategy b^i at some time t ≥ 0, then there is at least one move (n, n') in S(b^i) but not in S̄^i_t. If there is a move (n, n') such that n ∈ Λ^i and (n, n') ∉ S̄^i_t for all t ≥ 0, then for any pure quasi-strategy b^i with (n, n') ∈ S(b^i),
$$x^i_t(b^i) = x^i_0(b^i) \, e^{-t} \quad \forall t \ge 0. \tag{3.11}$$

Proof. From the definition of the best-response dynamic in (3.3), we have the first desired conclusion. It also follows that for any b^i with (n, n') ∈ S(b^i) \ S̄^i_t, ẋ^i(b^i)|_t = −x^i_t(b^i). Solving this differential equation yields the second desired conclusion.
The following lemma considers the situation in which the behavior quasi-strategy at time t assigns a positive probability to each move departing from a node n̄ ∈ Λ^i, but only one of these moves, (n̄, n'), is included in the best-response quasi-strategy g^i_t.

Lemma 3.4. At any t̃ ≥ 0, given a node n̄ ∈ Λ^i, if there is a move (n̄, n') ∈ S̄^i_{t̃}, and if for all nodes n ∈ ψ^{-1}(n̄) \ {n'}, (n̄, n) ∉ S̄^i_{t̃} and w_{t̃}(n̄, n) > 0, then for each such node n,
$$\left. \left( \frac{w(\bar n, n')}{w(\bar n, n)} \right)' \right|_{\tilde t} = \frac{1}{w_{\tilde t}(\bar n, n)}, \tag{3.12}$$
where the prime denotes the derivative with respect to time.

Proof. From the definition of the best-response dynamic in (3.3), it follows that ẇ(n̄, n')|_{t̃} = 1 − w_{t̃}(n̄, n') and ẇ(n̄, n)|_{t̃} = −w_{t̃}(n̄, n) for any n ∈ ψ^{-1}(n̄) \ {n'}. Hence,
$$\left. \left( \frac{w(\bar n, n')}{w(\bar n, n)} \right)' \right|_{\tilde t} = \frac{\dot w(\bar n, n')|_{\tilde t} \, w_{\tilde t}(\bar n, n) - w_{\tilde t}(\bar n, n') \, \dot w(\bar n, n)|_{\tilde t}}{(w_{\tilde t}(\bar n, n))^2} = \frac{(1 - w_{\tilde t}(\bar n, n')) \, w_{\tilde t}(\bar n, n) + w_{\tilde t}(\bar n, n') \, w_{\tilde t}(\bar n, n)}{(w_{\tilde t}(\bar n, n))^2} = \frac{1}{w_{\tilde t}(\bar n, n)}.$$
3.3 Proof of Theorem 3.1
Under a best-response dynamic, we would like to find and then focus on a play
that is followed with probability close to one in the long run.
The following lemma shows that, departing from some node, if one move is included in the best-response strategy profile g_t for an infinitely long time while all other moves are not, then the ratio of the probability that the first move is followed in the best-response dynamic to the sum of the probabilities of all other moves increases to infinity.
Lemma 3.5. For a move (n̄, n') with f(n̄, n') = +∞, suppose a set N^{n̄} ⊆ ψ^{-1}(n̄) \ {n'} with f(n̄, n) < +∞ for all n in N^{n̄}. Then, for every ε > 0, there exists a time t̄ such that
$$\sum_{n \in N^{\bar n}} w_t(\bar n, n) < \varepsilon \, w_t(\bar n, n') \quad \forall t > \bar t. \tag{3.13}$$

Proof. Denote the player who moves at node n̄ by player i. From (3.9), it follows that for every µ > 0 there exists a time t(µ) such that
$$\sum_{n \in N^{\bar n}} \int_{t(\mu)}^{\infty} \mathbf{1}_{\{t : (\bar n, n) \in \bar S^i_t\}} \, dt < \mu.$$
The desired result then follows from (3.10) and (3.12).
Lemma 3.6. There exists a play h = (n_1, ..., n_{|h|}) in Γ such that f(n_i, n_{i+1}) = +∞ for all i with 1 ≤ i < |h|. We call such a play a perpetually realized play.

Proof. This follows from the definition of a quasi-strategy (note the definition of S(b^i) in (3.8)) and the finiteness of the game tree.

Recall that |N| is the number of nodes in Γ. We show below that the best-response trajectory must converge to the unique perpetually realized play.
Lemma 3.7. There is only one perpetually realized play h_r = (n_1, ..., n_{|h_r|}) in Γ. Given an ε > 0, we denote ε̂ := (δε)/(8|N|). For every ε > 0 there exists a time t(ε) such that
$$\rho_t(h_r) > 1 - \hat\varepsilon \tag{3.14}$$
for all t > t(ε).

Proof. We first prove by contradiction that there is only one perpetually realized play. Suppose that there are j > 1 perpetually realized plays. Then we can find two perpetually realized plays h_1 = (n^1_l)_{1 ≤ l ≤ |h_1|} and h_2 = (n^2_l)_{1 ≤ l ≤ |h_2|} such that the following properties hold for some integer c.
(i) We can fix a node n_c := n^1_c = n^2_c such that n^1_l = n^2_l for all 1 ≤ l ≤ c and n^1_{c+1} ≠ n^2_{c+1}.

(ii) For any other perpetually realized play h_r, (h_r \ {n_c}) ∩ N(Γ_{n_c}) = ∅.

From Lemma 3.5, we know that there exists a finite time t̄ such that
$$\forall t > \bar t, \;\; \forall j \in \{1, 2\}, \;\; \forall l > c, \quad w_t(n^j_l, n^j_{l+1}) > 1 - \hat\varepsilon.$$
Denote the player who moves at node n_c by player i. Given any t > t̄, for a pure quasi-strategy b^i with b^i(n^1_l) = n^1_{l+1} for all n^1_l ∈ h_1 ∩ Λ^i, it follows that
$$|N| \hat\varepsilon \, v^i_{(\kappa)} + (1 - |N| \hat\varepsilon) \, v^i(h_1) < u^i(b^i(\Gamma_{n_c}), x^{-i}_t(\Gamma_{n_c})) < |N| \hat\varepsilon \, v^i_{(1)} + (1 - |N| \hat\varepsilon) \, v^i(h_1), \tag{3.15}$$
where u^i(b^i(Γ_{n_c}), x^{-i}_t(Γ_{n_c})) is the expected payoff defined in Γ_{n_c} for the induced strategy b^i against the induced strategy profile x^{-i}_t in Γ_{n_c}. Similarly, given any t > t̄, for a pure quasi-strategy b̃^i with b̃^i(n^2_l) = n^2_{l+1} for all n^2_l ∈ h_2 ∩ Λ^i,
$$|N| \hat\varepsilon \, v^i_{(\kappa)} + (1 - |N| \hat\varepsilon) \, v^i(h_2) < u^i(\tilde b^i(\Gamma_{n_c}), x^{-i}_t(\Gamma_{n_c})) < |N| \hat\varepsilon \, v^i_{(1)} + (1 - |N| \hat\varepsilon) \, v^i(h_2). \tag{3.16}$$
Since |N| ε̂ < δ, we infer from (3.15), (3.16), and (3.4) that either
$$u^i(b^i(\Gamma_{n_c}), x^{-i}_t(\Gamma_{n_c})) > u^i(\tilde b^i(\Gamma_{n_c}), x^{-i}_t(\Gamma_{n_c})) \quad \forall t > \bar t$$
or
$$u^i(b^i(\Gamma_{n_c}), x^{-i}_t(\Gamma_{n_c})) < u^i(\tilde b^i(\Gamma_{n_c}), x^{-i}_t(\Gamma_{n_c})) \quad \forall t > \bar t.$$
Hence, the definition of the best-response dynamic implies that f(n_c, n^1_{c+1}) = +∞ and f(n_c, n^2_{c+1}) = +∞ cannot both hold. Therefore there are at most j − 1 perpetually realized plays, a contradiction.

Since there is only one perpetually realized play, (3.14) follows straightforwardly.
We still need to show that the perpetually realized play is an equilibrium path. The next lemma shows that there are infinitely many times at which the probability assigned to each move in the perpetually realized play by the behavior quasi-strategy profile is (weakly) increasing at the same time.

Lemma 3.8. Given any positive ε < 1/2^{|N|}, for the perpetually realized play h_r = (n_1, ..., n_{|h_r|}), there are infinitely many times t > t(ε), where t(ε) is defined in Lemma 3.7, such that
$$\dot w(n_j, n_{j+1})|_t \ge 0 \quad \forall j \text{ with } 1 \le j < |h_r|. \tag{3.17}$$
Proof. We apply Lemma D.1, which considers the more general definition of a µ-dynamic; see Definition 4.1. A best-response dynamic is a 0-dynamic. Note that the µ-dynamic in Lemma D.1 need not start from an interior state.
We show below that if the outcome in any state in the dynamic is close enough
to and moving towards a single play, then that play is an equilibrium path. Lemmata B.1 and B.2 are applied in the proof of the following lemma. Roughly
speaking, Lemma B.1 says that if a play h is not an equilibrium path, then there
exists at least one node n ∈ h (played by player λ(n)) such that whenever all other players follow play h, player λ(n) could receive a better payoff than v^{λ(n)}(h) by a deviation from h at node n; Lemma B.2 shows that player λ(n) can guarantee such a profit bounded away from zero.
For the k-player game Γ, let
$$c := \min_{1 \le i \le k} \; \min_{1 \le l \le \kappa - 1} \left( v^i_{(l)} - v^i_{(l+1)} \right), \tag{3.18}$$
and
$$\bar\varepsilon := \min_{1 \le i \le k} \frac{c}{2 v^i_{(1)} - 2 v^i_{(\kappa)}}. \tag{3.19}$$

Lemma 3.9. For any
$$\varepsilon < \bar\varepsilon, \tag{3.20}$$
if both (3.14) and (3.17) hold for the perpetually realized play h_r in Γ at some time t, then h_r is an equilibrium path in Γ.
Proof. Given the play h_r = (n_1, ..., n_{|h_r|}), we use the definition of W(h_r) in Appendix B, i.e., the set of behavior quasi-strategy profiles w with w(n_j, n_{j+1}) = 1 for all 1 ≤ j < |h_r|.

We consider a time t when both (3.14) and (3.17) hold. We then modify w_t to a strategy profile ŵ in W(h_r) such that ŵ(n', n'') = w_t(n', n'') for all moves (n', n'') with n' ∉ h_r, w_t(n', n'') ≥ 0, and λ(n') ≠ λ(ψ^{q̄}(n')), in which q̄ := min{q : ψ^q(n') ∈ h_r}. (Roughly speaking, the last condition means that the player who moves at node n' is not the player who can deviate from h_r towards node n'.) Each w_t corresponds to a unique ŵ.
Suppose that h_r is not an equilibrium path. We pick a deviation node n_j in h_r as defined in Lemma B.1 (roughly speaking, a node where the player can profit by not following the play h_r), and denote the player who moves at n_j by player i. We then take a pure quasi-strategy
$$g_{\hat w^{-i}} \in \operatorname*{argmax}_{b^i : \, n_j \in \bar N_{b^i}, \; b^i(n_j) \ne n_{j+1}} u^i(b^i, \hat w^{-i}),$$
i.e., a best response of player i to ŵ^{-i} under the constraint that the move at node n_j is not towards n_{j+1}. From (3.14), we may infer
$$u^i(w_t) < \varepsilon \, v^i_{(1)} + (1 - \varepsilon) \, v^i(h_r) \tag{3.21}$$
and
$$u^i(g_{\hat w^{-i}}, w^{-i}_t) > \varepsilon \, v^i_{(\kappa)} + (1 - \varepsilon) \, \hat u(\hat w^{-i}), \tag{3.22}$$
where û(ŵ^{-i}) := u^i(g_{ŵ^{-i}}, ŵ^{-i}). We also deduce from (B.4) in Lemma B.2 and (3.18) that
$$\hat u(\hat w^{-i}) - v^i(h_r) \ge c. \tag{3.23}$$
From (3.20), it follows that
$$\varepsilon < \frac{c}{2 v^i_{(1)} - v^i(h_r) - v^i_{(\kappa)}}.$$
Therefore
$$\varepsilon \left( v^i_{(1)} + \hat u(\hat w^{-i}) - v^i(h_r) - v^i_{(\kappa)} \right) < c. \tag{3.24}$$
From (3.21), (3.22), (3.23), and (3.24), it follows that
$$u^i(w_t) < u^i(g_{\hat w^{-i}}, w^{-i}_t).$$
Hence (3.17) does not hold at time t, a contradiction.
If (3.17) always holds from some time t > t(ε) on, then it follows from (3.10) and (3.11) that both (3.14) and (3.17) hold for h_r and the state is in the ε-neighborhood of the equilibrium component from that t on. Even if (3.17) is sometimes not true,
we show below that from some time on, the behavior off the equilibrium path
always agrees with the behavior of some state close to the equilibrium component.
Proof of Theorem 3.1. Recall the definitions of ε̂ and ε̄ in Lemma 3.7 and (3.19), respectively. Without loss of generality, we take an
$$\varepsilon < \min\{ 1/2^{|N|}, \bar\varepsilon \}.$$
By Lemmata 3.7, 3.8, and 3.9, there exists a time t̄ > t(ε) such that both (3.14) and (3.17) hold for an equilibrium path h. Denote the equilibrium component with equilibrium path h by EC. Therefore x_{t̄} ∈ EC[ε̂].

If there exists a time t̃ as the infimum of times t > t̄ such that (3.17) does not hold at time t, then there must exist a time t̂ as the infimum of times t > t̃ such that (3.17) holds again at time t. Without loss of generality, we assume t̂ − t̃ > 0. Note that for any move (n_j, n_{j+1}) in h, if ẇ(n_j, n_{j+1})|_t < 0 with t > t(ε), then ρ̇(h)|_t < 0; see Lemma D.1. In that case, by (3.11) as well as (D.2) and (D.3) in the proof of Lemma D.1 (for the case of µ = 0), we find that
$$\dot\rho(h)|_t < -(1 - 2|N| \hat\varepsilon) < -(1 - \varepsilon/4) \tag{3.25}$$
for all t with t̃ < t < t̂. We may also infer from (3.25) and Lemma 3.7 that
$$\left( 1 - \frac{\varepsilon}{4} \right) e^{-(\hat t - \tilde t)} > 1 - \hat\varepsilon. \tag{3.26}$$
To see this, note that for the function z(x) = (1 − ε/4)e^{-x}, we have ż|_x > −(1 − ε/4) for all x > 0. On the other hand, as ρ_t(h) > 1 − ε̂ for any t ∈ [t̃, t̂],
$$|\rho_{\hat t}(h) - \rho_{\tilde t}(h)| \le \hat\varepsilon.$$
We compare the function ρ_t(h) with z(x) and hence reach (3.26).

Note that the basic proposition (see (3.11)) of the continuous-time best-response dynamic implies that
$$|x^i_{t_1}(b^i) - x^i_{t_2}(b^i)| \le 1 - e^{-(\hat t - \tilde t)} < 1 - \left( 1 - \frac{\varepsilon}{4} \right) e^{-(\hat t - \tilde t)} + \frac{\varepsilon}{4} \quad \forall \, \tilde t < t_1 < t_2 < \hat t, \; \forall b^i \in B^i, \; \forall \, 1 \le i \le k.$$
We then infer from (3.26) that
$$|x^i_{t_1}(b^i) - x^i_{t_2}(b^i)| < \frac{\varepsilon}{4} + \hat\varepsilon < \frac{\varepsilon}{2} \quad \forall \, \tilde t < t_1 < t_2 < \hat t, \; \forall b^i \in B^i, \; \forall \, 1 \le i \le k.$$
Therefore x_t ∈ EC[ε] for all t with t̃ ≤ t ≤ t̂, and we observe that x_{t̂} ∈ EC[ε̂]. Iterating this argument over every subsequent interval on which (3.17) fails shows that x_t ∈ EC[ε] for all t ≥ t̄.
Comment 1: It is not true that every interior solution trajectory to the best-response dynamic converges to a single Nash equilibrium. (An interior solution trajectory requires that ρ_0(h) > 0 for all plays h; see (3.6).) Consider the game in Figure 3. Suppose that at time t = 0,
$$x^1(b^1_1) < 1, \quad \frac{x^1(b^1_2)}{x^1(b^1_2) + x^1(b^1_3)} = \frac{2}{3}, \quad \text{and} \quad 0 < x^2(b^2_2) \le \frac{1}{8}.$$
Then BR^1(x_0) = {b^1_1} and BR^2(x_0) = X^2. From x_0, we can find a solution trajectory (x_t)_{t≥0} with the following properties for all t ≥ 0:

(i) BR^1(x_t) = {b^1_1},

(ii) x^1_t(b^1_2) / (x^1_t(b^1_2) + x^1_t(b^1_3)) = 2/3 (this follows from (3.10)),
(iii) BR^2(x_t) = X^2,

(iv) x^2_t(b^2_2) ≤ 1/8.

[Figure 3: Not converging to one Nash equilibrium.]
This solution trajectory (xt )t≥0 converges to the backward-induction equilibrium
component, but x^2_t need not be fixed in the process, and hence the solution trajectory need not converge to any particular Nash equilibrium. The quasi-strategy is the coarsest strategy concept that can identify the converging
outcome, and hence the equilibrium component, in the dynamic; see Section 7 for
more discussion.
Comment 2: The time τ in Theorem 3.1 need not be bounded. See, e.g., the game in Figure 4, from Cressman (2003, p. 259), which also shows that the distance from the evolving state in the best-response trajectory to any equilibrium component may not always be decreasing, and that no equilibrium component is (Lyapunov) stable in the best-response dynamic.

We suppose that the initial state of a solution trajectory is (D, (4/5, 0, 1/5)), where the proportion distribution of population II over strategies is in the order (d, ad, aa). Then we can let g^1_t = D and x_t = x_0 until an arbitrary finite time t̄. However, we can still show the convergence of any best-response trajectory to a Nash equilibrium component in this game. To be specific, if g^1_{t̄} turns to AA and x^1_{t̄}(AA) > 0, then for any ε > 0 the trajectory enters N E[ε] in bounded time starting from t̄.
Comment 3: We use an infinite lower Lebesgue integral in (3.9) for general
dynamic processes. For instance, in the game in Figure 3, we can require that in
some interval [t_1, t_2], g^2_t = b^2_1 when t is a rational number, and g^2_t = b^2_2 when t is
an irrational number.
[Figure 4: A centipede game of length 4.]
Comment 4: Readers may find that the proof above has a similar flavor to the proof for the replicator dynamic in Cressman and Schlag (1998), which focuses on the integral ∫_0^∞ x^i_t(b^i) dt for some b^i. Yet, we cannot imitate their proof for the best-response dynamic. Even if ∫_0^∞ x^i_t(b^i) dt < ∞, it is possible that x^i(b^i) > 0 holds less and less frequently, but that x^i(b^i) is bounded from below when it does hold. In (3.9), we instead turn to the integral of the period of time during which some move is part of a best response.
4 Approximate Best-response Dynamics
Denote the set of interior states by int(X). Given a state x and a positive number µ, we put
$$CR^i(x[\mu]) = \bigcup_{x' \in x[\mu] \cap \operatorname{int}(X)} BR^i(x'),$$
so BR^i(x) ⊆ CR^i(x[µ]) ⊆ BR^i(x[µ]) when x ∈ int(X).
Definition 4.1. Consider a k-player extensive-form game Γ of perfect information with the set of pure strategies A^i for each player i, 1 ≤ i ≤ k. We call a continuous-time dynamic process a µ-dynamic in Γ if every state x is a k-dimensional vector (x^i)_i ∈ ×_i Δ(A^i), x_0 ∈ int(X), and the dynamic satisfies ẋ^i = g^i − x^i for some g^i ∈ CR^i(x[µ]) for all i with 1 ≤ i ≤ k. That is, for all i,
$$\dot x^i \in CR^i(x[\mu]) - x^i.$$
Any such µ-dynamic with µ > 0 is an approximate best-response dynamic.
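A minimal Python sketch of one Euler step of such a µ-dynamic follows. Drawing the perceived state uniformly from the µ-ball and renormalizing is only one assumed way of realizing a selection from CR^i(x[µ]), and `best_response` is an assumed helper returning some best reply (as a probability vector) to the perceived state.

```python
import numpy as np

# A minimal sketch: one Euler step of a µ-dynamic. Each player best
# responds to a perceived interior state within the µ-ball around x.
def mu_dynamic_step(x, best_response, mu, dt, rng):
    new_x = []
    for i, xi in enumerate(x):
        perceived = []
        for xj in x:
            p = np.clip(xj + rng.uniform(-mu, mu, xj.size), 1e-12, None)
            perceived.append(p / p.sum())     # keep it a point in int(X)
        g = best_response(i, perceived)       # some g^i in CR^i(x[µ])
        new_x.append(xi + dt * (g - xi))      # Euler step of x'^i = g^i - x^i
    return new_x
```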
Similarly to the continuous-time best-response dynamic, we can check that for
a generic perfect-information game and for any positive µ < 1, from any interior
initial state x0 , a solution trajectory ξ(t, x0 ) to the µ-dynamic exists.
In Definition 4.1, we require g^i ∈ CR^i(x[µ]), not g^i ∈ BR^i(x[µ]). This is because, when the solution trajectory starts from an interior state, player i would recognize a perceived state x' ∈ x[µ] as definitely different from the current state if x' were not an interior state. Indeed, if player i picks a boundary state x' such that some subgame is not reached, then she can choose any possible induced behavior strategy in that subgame for her best response. The player could then pick a state x' ∈ x[µ] ∩ int(X) again to obtain a finer set of best responses. In this way, for a generic extensive-form game under such dynamics formulated by CR^i(x[µ]), the proportion of any weakly dominated strategy is decreasing, which better fits the property of an interior best-response trajectory. From another perspective, we may view x' ∈ x[µ] as generated from some µ-perturbation of the target state x. If that perturbation has a continuous probability distribution on x[µ], e.g., the uniform distribution, then with probability zero will x' end up a boundary state.
4.1 Counterexample to Convergence to N E
We show below that a µ-dynamic with any µ > 0 need not converge to the set of
Nash equilibria in an extensive-form game of perfect information.
[Figure 5: A µ-dynamic need not converge to N E.]
For ease of exposition, the set of best responses of player i at time t is denoted by BR^i(x^{-i}_t[µ]), since it does not depend on player i's own component x^i_t.
In the game in Figure 5, each a_j or b_j denotes a quasi-strategy. The backward-induction equilibrium is (a_1, b_3) and the alternative pure-strategy Nash equilibrium is (a_2, b_1). (In the figure, the backward-induction moves are single-arrowed, and the final moves in the latter equilibrium are double-arrowed.) We denote the corresponding two Nash equilibrium components by BC and N C, respectively. If x_t ever approaches N E, then there exists a time t_0 such that x_{t_0} ∈ BC[µ] or x_{t_0} ∈ N C[µ]. We show below that it is possible that x_t ∉ BC[µ] ∪ N C[µ] infinitely often. Without loss
of generality, we let µ ≤ 1/100.
(i) x_{t_0} ∈ N C[µ]: we take a mixed quasi-strategy x̃^2 of player II such that x̃^2(b_1) = 1 − µ, x̃^2(b_3) = 99µ/100, and x̃^2(b_2) = µ/100. Then for any x ∈ N C, x̃^2 ∈ x^2[µ]. We suppose that
$$x_t \in NC[\mu] \quad \forall t \ge t_0, \tag{4.1}$$
which we show below to be in fact not true for some solution trajectories to the µ-dynamic. Under the assumption (4.1), we can let
$$g^1_t \in BR^1(\tilde x^2) \quad \forall t \ge t_0$$
in the µ-dynamic, and hence
$$g^1_t = a_4 \quad \forall t \ge t_0.$$
Then there exists a time t_1 ≥ t_0 such that x^1_{t_1}(a_3) + x^1_{t_1}(a_4) > 2/3. Then at some time before t_1, g^2 becomes b_3, and so g^1 can still be set as a_4. Hence, the solution trajectory is moving away from N C towards BC. We then infer that (4.1) cannot hold for some solution trajectories to the µ-dynamic.
(ii) x_{t_0} ∈ BC[µ]: we take a mixed quasi-strategy x̄^1 of player I such that x̄^1(a_1) = 1 − µ, x̄^1(a_3) = 99µ/100, and x̄^1(a_2) = x̄^1(a_4) = µ/200. Then for any x ∈ BC, x̄^1 ∈ x^1[µ]. We suppose that
$$x_t \in BC[\mu] \quad \forall t \ge t_0, \tag{4.2}$$
which we show below to be in fact not true for some solution trajectories to the µ-dynamic. Under the assumption (4.2), we can let
$$g^2_t \in BR^2(\bar x^1)$$
and hence g^2_t = b_2 in the µ-dynamic from t_0 until a time t_1 ≥ t_0 when
$$x^2_{t_1}(b_2) > 3/4. \tag{4.3}$$
We take another mixed quasi-strategy x̂^1 of player I such that x̂^1(a_1) = 1 − µ, x̂^1(a_2) = 99µ/100, and x̂^1(a_3) = x̂^1(a_4) = µ/200. Then for any x ∈ BC[µ], x̂^1 ∈ x^1[µ]. Under the assumption (4.2), we can then let
$$g^2_t \in BR^2(\hat x^1)$$
and hence g^2_t = b_1 from t_1 until a time t_2 ≥ t_1 when x^2_{t_2}(b_1) > 4/5. When x^2_t(b_1) > 4/5, a_1 ∉ BR^1(x^2_t[µ]). From (4.3) and (3.10), it follows that x^2_t(b_2)/x^2_t(b_3) > 3 for all t with t_1 ≤ t ≤ t_2. Then at some time before t_2, g^1 becomes a_2, and so g^2 can still be set as b_1. (Note that the best response of player II to a_2 is b_1.) Hence, the solution trajectory is moving away from BC towards N C. We then infer that (4.2) cannot hold for some solution trajectories to the µ-dynamic.
From another direction, we can show why the proof of Theorem 3.1 cannot be applied to an approximate best-response dynamic. We check Lemma 3.7 for the µ-dynamic in the game in Figure 5. For a solution trajectory that transits between N C[µ] and BC[µ] infinitely often, both the play whose last move agrees with b_2 and the play whose last move agrees with a_4 are perpetually realized plays. This is because when x^1(a_1) > 1 − µ, a µ-perturbation of x^1(a_3) and x^1(a_4) may decide whether b_2 or b_3 is the best response g^2 of player II. Similarly, when x^2(b_1) > 1 − µ, a µ-perturbation of x^2(b_2) and x^2(b_3) may decide whether a_2 or a_4 is the best response g^1 of player I. For an interior best-response trajectory, i.e., a solution trajectory to the 0-dynamic, if the play whose last move agrees with a_4 is a perpetually realized play, then
$$\lim_{t \to \infty} \frac{x^1_t(a_3)}{x^1_t(a_4)} = 0.$$
Therefore, the best response of player II cannot be b_2 after a sufficiently long period of time, and the play whose last move agrees with b_2 cannot be a perpetually realized play.
4.2 Result
In the game in Figure 5, although some solution trajectories to the µ-dynamic do not converge to N E, the relative proportion of the time spent outside N E must be small in the long run if µ is small. That means that most of the time the state is very close to some Nash equilibrium component. To see this, we give an example of a transition in the µ-dynamic from N C[cµ] to BC[cµ] for a positive constant c. We can first show that, given ε > 0, if a transition is triggered (cf. Case (i) above), then the transition time from N C[cµ] to BC[ε] is bounded, and we denote an upper bound by T_1. Note that for a sufficiently big c > 0 (depending on Γ), the µ-dynamic behaves as a best-response dynamic before the state enters BC[cµ]. We can also find that any evolving state in BC[ε] \ BC[cµ] in this transition is in the basin of attraction of BC[cµ] under the µ-dynamic. Furthermore, the transition time from BC[ε] to BC[cµ], denoted by T_2(µ), becomes longer as µ decreases. Therefore, the state is in BC[ε] for at least a period of T_2 before the next transition back to N C[ε], and
$$\lim_{\mu \to 0} \frac{T_1}{T_2(\mu)} = 0.$$
Theorem 4.2. Given any finite generic extensive-form game Γ of perfect information, for every ε > 0, there exist µ(ε) > 0 and T(ε) > 0 such that from any interior initial state x_0, for any solution trajectory ξ(t, x_0) to the µ(ε)-dynamic,
$$\frac{1}{T} \int_0^T \mathbf{1}_{\{t : \, \xi(t, x_0) \in NE[\varepsilon]\}} \, dt \ge 1 - \varepsilon \tag{4.4}$$
for all T ≥ T(ε).
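On a sampled trajectory, the left-hand side of (4.4) is just a time average of an indicator, as in the following minimal Python sketch; `dist_to_NE` is an assumed helper returning the sup-norm distance from a state to the set N E.

```python
# A minimal sketch of the convergence-in-frequency criterion (4.4):
# the fraction of (uniformly spaced) sample times the trajectory spends
# in the ε-neighbourhood N E[ε].
def visitation_rate(trajectory, dist_to_NE, eps):
    hits = sum(1 for state in trajectory if dist_to_NE(state) <= eps)
    return hits / len(trajectory)
```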
Comment: It is still unclear whether the trembled version of a best-response dynamic,
$$\dot x^i \in (BR^i(x) - x^i)[\mu],$$
weakly converges to N E in the sense of (4.4).
4.3 Sketch of the Proof
For simplicity, we only consider extensive-form games without chance nodes in
the proof. If we were to drop this condition, the result would still hold under the
generic condition.
We apply quasi-strategies in the proof. Given a finite extensive-form game Γ
of perfect information without chance nodes, we consider any pre-terminal node
ñ. In a µ-dynamic, for the player who moves at node ñ, we first suppose that one
of the following two cases holds.
1. For all t ≥ 0, the proportion of all pure quasi-strategies including a move at ñ is smaller than µ (Step I in Appendix E).

2. For all t ≥ 0, the proportion of all pure quasi-strategies including a dominated move at ñ is at most µ times the proportion of all pure quasi-strategies including the dominating move at ñ (Step II in Appendix E).
When we study the outcome in the game at time t, we note that
(i) for case 1 above, we may simply cut ñ from Γ, and ignore the outcome
through node ñ;
(ii) for case 2 above, we may simply replace the decision node ñ by a terminal
node with the same payoff vector assigned to the dominating move at ñ in
the original game Γ.
In either case, the induced outcome in the modified game is in the µ-neighbourhood
of the original outcome in Γ.
We define two tree operations for these two cases, respectively, in Appendix D. In either case, we can reduce the analysis of game Γ to a game Γ′ with |Nd(Γ′)| = |Nd(Γ)| − 1.
For these two cases, a naive approach would be to apply induction and suppose that the µ-dynamic converges in frequency to NE in Γ′; the base case of |Nd(Γ)| = 1 is trivial. However, NE(Γ) may differ from NE(Γ′), and hence the direct induction approach is not appropriate here. We first show instead
that if in an approximate best-response dynamic, the outcome is close to one play
most of the time, then the state is in a small neighborhood of N E most of the
time. This can be deduced from an intermediate result, Lemma 3.9, in the proof
of Theorem 3.1. We then use induction to generate an approximate best-response trajectory in Γ′ from the original trajectory in Γ, and the induction hypothesis is that the outcome of the generated trajectory is close to a play in Γ′ most of the time. For both Case 1 and Case 2 above, the inductive step is to show that the outcome of the original trajectory is also close to a play in Γ most of the time.
Moreover, there exists a lower bound TΓ such that for either Case 1 or Case 2,
any solution trajectory to a µ-dynamic with small enough µ satisfies the inductive
property for all finite periods of time longer than TΓ .
We finally synthesize the two cases, and study a general solution trajectory
to the approximate best-response dynamic in Γ. After adjusting the parameter µ
in both Case 1 and Case 2 above, we show that from some time on, the state is
always in either Case 1 or Case 2, though it can transit between these two cases
infinitely often. On the other hand, for a transition from Case 2 to Case 1, the
state has to satisfy Case 2 for a period of time much longer than TΓ . Hence, once
the state is in Case 2, the state is close to N E most of the time until it transits
to Case 1. If the state is rarely in Case 2, then it stays in Case 1 most of the time
and hence the state is still close to N E most of the time.
5 Convergence to the Backward-induction Equilibrium
Given a generic extensive-form game of perfect information, we denote the unique
backward-induction equilibrium by BIE. We show below that if each player can
move at only one node in the game, then an interior µ-dynamic with sufficiently
small µ always converges to BIE.
Theorem 5.1. Consider a finite generic extensive-form game Γ of perfect information in which every player can move at only one node, i.e., |Λ^i| = 1 for all players i. For every ε > 0, there exist µ(ε) > 0 and T(ε) > 0 such that from any interior initial state x0, for any solution trajectory ξ(t, x0) to the µ(ε)-dynamic on Γ,
$$\xi(t, x_0) \in BIE[\varepsilon]$$
for all t ≥ T(ε).
Remark: This theorem can also be applied to the case where µ = 0, i.e., an
interior best-response trajectory.
To see this result, note that a mixed strategy of a player is a probability distribution over all moves directed from one node, say n, for this player. If the continuation strategy profile in each subgame after n is close to the backward-induction equilibrium in that subgame, then the dominating move at n is the backward-induction move at n. This dominating move at n is the direction of the change for this player in the dynamic from then on.
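A numerical illustration may help; the game below is a hypothetical two-player example (not from the paper), with one decision node per player and generic payoffs, and the µ = 0 dynamic is integrated by Euler steps.

```python
# Minimal sketch: player I chooses L (payoffs (1, 0)) or R; after R,
# player II chooses l (payoffs (0, 2)) or r (payoffs (2, 1)).  Backward
# induction gives (L, l).  We integrate xdot = BR(x) - x from an interior
# state and watch both weights approach the backward-induction moves.
dt, T = 0.001, 15.0
x1_L, x2_l = 0.1, 0.1              # interior initial state
for _ in range(int(T / dt)):
    # player I: L yields 1, R yields 2 * Pr(II plays r)
    br1_L = 1.0 if 1.0 > 2.0 * (1.0 - x2_l) else 0.0
    # player II: l yields 2 > 1 from r, so l is always the local best reply
    br2_l = 1.0
    x1_L += dt * (br1_L - x1_L)
    x2_l += dt * (br2_l - x2_l)

print(x1_L, x2_l)                  # both near 1: the state enters BIE[eps]
```

Once x^2(l) exceeds 1/2, the root player's best response becomes the backward-induction move, exactly the mechanism described above.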
The following corollary follows straightforwardly by backward induction and
the definition of quasi-strategies.
Corollary 5.2. Let us suppose that in a finite generic extensive-form game Γ of perfect information, every play in the game tree contains at most one decision node of every player. Then, for every ε > 0, there exist µ(ε) > 0 and T(ε) > 0 such that from any interior initial state x0, for any solution trajectory ξ(t, x0) to the µ(ε)-dynamic in Γ,
$$\xi(t, x_0) \in BIE[\varepsilon]$$
for all t ≥ T(ε).
Remark: Every pure quasi-strategy is also a pure strategy if and only if every play in the game tree contains at most one decision node of every player. In this case, the backward-induction equilibrium can be defined in either standard strategies or quasi-strategies.
Given an extensive-form game of perfect information, the convergence of a
µ-dynamic with small µ to the backward-induction equilibrium is always possible.
Theorem 5.3. Given a finite generic extensive-form game Γ of perfect information and ε > 0, for any µ > 0 and from any initial state x0, there exist a solution trajectory ξ(t, x0) to the µ-dynamic and a number T > 0 such that
$$\xi(t, x_0) \in BIE[\varepsilon]$$
for all t > T.
When the probability of any non-backward-induction move at some node is
small enough, we can view this probability distribution as if the player plays the
backward-induction move with probability one at that node in an approximate
best-response dynamic. We then apply induction to complete the proof.
Comment: One may be tempted to prove the following claim for the best-response dynamic. Given a finite generic extensive-form game Γ of perfect information and ε > 0, there exists a number T(ε) > 0 such that from any interior initial state x0, there exists a solution trajectory ξ(t, x0) to the best-response dynamic with the property x_t ∈ BC[ε] for all t > T(ε). However, the game in Figure 5 provides a counterexample to this claim. Consider an initial state x0 = (x̂^1, x̂^2) with x̂^2(b_2) > 3/4, x̂^1(a_1) = 99/100 and x̂^1(a_2) = 99/10000. Following the analysis of x̂^1 in Step (ii) for that game in Section 4.1, we find that the solution trajectory to the best-response dynamic converges to the non-backward-induction equilibrium component.
6 Convergence to Self-confirming Equilibrium
In the study of evolutionary processes in extensive-form games, Nöldeke and Samuelson (1993) characterize each player by her strategy and conjecture. In their paper, a conjecture is essentially the player's belief about the probability distribution assigned by the behavior strategy profile over the set of moves at each unreached decision node. At any time, a player can observe only the probability distributions at reached decision nodes. Every player always best responds to the behavior strategy profile generated by the observed probability distributions at all reached decision nodes and by her conjecture at the other decision nodes.
We adapt this dynamic to our setup. At each time t, each player i is characterized by a quasi-strategy x_t^i ∈ X^i and a conjecture θ_t^i ∈ W^{-i}. The conjecture of player i specifies her belief about the strategy profile at time t. For the update rule of the conjecture, we first need to define the condition under which a node is connected. Given a node n different from the root n0 of the tree, we denote the subplay from n0 to n by (n0 = n_0, n_1, ..., n_l = n). Given the strategy profile x_t (recall that w_t denotes the equivalent behavior strategy profile) at time t, we say that node n is connected to player i at time t if for all j with 0 ≤ j < l,
$$\begin{cases} w_t(n_j, n_{j+1}) > 0 & \text{if } n_j \in \bigcup_{m \ne i} \Lambda^m, \\ w_t(n_j, n_{j+1}) > 0 \ \text{or} \ \dot w(n_j, n_{j+1})|_t > 0 & \text{if } n_j \in \Lambda^i. \end{cases}$$
We require that the root n0 be always connected to each player, and denote the set of nodes connected to player i at time t by C_t^i. At each t, the conjecture of player i at each decision node connected to player i is updated, i.e., θ_t^i(n, n′) = w_t(n, n′) for all moves (n, n′) with n in C_t^i \ Λ^i. Conjectures at disconnected nodes are unchanged. Therefore, θ_{t_1}^i(n, n′) = θ_{t_2}^i(n, n′) if n is disconnected to i throughout [t_1, t_2] and θ_{t_2}^i(n, n′) is defined. Note that the above definition of connectedness allows that, for a subgame Γ_n unreached at time t, i.e., with root n unreached, only a player i who has a pure quasi-strategy b^i such that node n is reached under (b^i, x_t^{-i}) can update her conjecture about the continuation strategy profile in Γ_n. Hence, it is possible that the subgame has never been reached, but player i has updated her conjecture in that subgame. In a more general sense, this dynamic enforces that conjectures are independent.
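The connectedness test translates into a short sketch; the tree encoding below (owner, w, wdot, path) is an assumed representation of ours, not notation from the paper.

```python
# Sketch of the connectedness test above: `path` lists the moves
# (n_j, n_{j+1}) from the root to n, `owner[n]` is the player moving at a
# decision node, `w[(n, m)]` the current behavior probability, and
# `wdot[(n, m)]` its time derivative.
def connected(path, i, owner, w, wdot):
    for (nj, nj1) in path:
        if owner[nj] != i:
            # other players must actually put weight on the move
            if w[(nj, nj1)] <= 0.0:
                return False
        else:
            # player i may either play the move or be moving towards it
            if w[(nj, nj1)] <= 0.0 and wdot[(nj, nj1)] <= 0.0:
                return False
    return True

# conjectures are then refreshed at every connected decision node n of
# other players: theta_i[(n, m)] = w[(n, m)]
```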
We suppose that each player has her initial conjecture at time 0. A state z = (x^i, θ^i)_{1≤i≤k} specifies the strategies and conjectures of all players. The set of best responses of player i is defined with respect to her conjecture, i.e.,
$$BR^i(z) = \operatorname*{argmax}_{q^i \in X^i} u^i(q^i, \theta^i). \qquad (6.1)$$
The best-response dynamic becomes
$$\dot x^i \in BR^i(z) - x^i, \qquad (6.2)$$
with θ being adjusted by the rule above.
Note that the best-response correspondence in (6.1) may fail to be upper hemi-continuous at countably many times t in the dynamic. For example, the discontinuity may arise when a subgame becomes reached: an extreme case is that for an interior state at t = 0, each player i may hold a conjecture completely different from w_0^{-i}. Such a discontinuity arises from the inconsistency between the conjecture and the true strategy profile in the state at time t. Since the best-response correspondence is closed, convex-valued, and upper hemi-continuous at all t ≥ 0 except countably many times, a solution trajectory (in the extended sense) ξ(t, z0) from any initial state z0 exists, though it is not necessarily unique. A solution trajectory may fail to satisfy (6.2) only on a measure-zero set of times.
If we apply such a dynamic in an extensive-form game, then we can analyze the dynamic in quasi-strategies, as both BR^i(z) and θ^i in (6.1) can be formulated in quasi-strategies. For the convergence result of this dynamic, we only need to consider non-interior solution trajectories. This is because a conjecture is irrelevant along interior solution trajectories, and once a node is reached at any time t, it will be reached at all times after t. (Note that being reached is slightly stronger than being connected.) For a non-interior solution trajectory, we can focus on the set of nodes that are eventually reached, and the problem is reduced to that of an interior dynamic. From the proof of Theorem 3.1, in particular from Lemmata B.1 and 3.9, we may infer that for this version of the best-response dynamic applied to a finite generic extensive-form game of perfect information, the outcome converges to an equilibrium path, and the limit state is in the set of self-confirming equilibria.
7 Further Comments
1. For a µ-dynamic with CR^i replaced by BR^i in Definition 4.1, I do not know whether Theorem 4.2 still holds. Note that the proportions of some weakly dominated strategies in this new µ-dynamic may not always be decreasing, and hence (E.13) in the proof does not always hold.
2. For a µ-dynamic in the present paper, we assume that a player may not fully recognize the current state, and her best response is against a point in the µ-neighborhood of the current state. We may also consider another version of approximate best-response dynamic: a player can always recognize the current state, but cannot fully control her behavior, in the sense that the direction of the trajectory can be guaranteed only in the µ-neighborhood of the current best response. It is therefore a trembling-hand version of the best-response dynamic, and we may formulate it as
$$\dot x^i \in BR^i(x)[\mu] - x^i \qquad (7.1)$$
for each player i in the game.
Note that the limits of the differential inclusion in (7.1) and of the one in Definition 4.1 coincide as µ decreases to zero. I conjecture that Theorem 4.2 is still true for the dynamic described in (7.1). Note that for this dynamic, Lemma 3.9 (necessary for the proof of Lemma D.2) cannot be applied directly, and (3.10) may not hold.
3. Hendon et al. (1996) claim that in extensive-form games, the (discrete-time) fictitious play defined in standard strategies is problematic. The quasi-strategies introduced in the present paper can help to eliminate the confusion. Take the game in Figure 3 in the present paper for example. Suppose that in the current state
$$x^1(b^1_1) = x^1(b^1_2) = x^1(b^1_3) \quad \text{and} \quad x^2(b^2_2) = 1/8.$$
Then a best response of player I in a standard strategy requires the left move at the top node and allows any distribution on moves at the bottom node. Hendon et al. argue that since in the current state the probability of the right move at the top node is positive for player I, there is a positive probability of reaching the bottom decision node. Conditional on the bottom node being reached, the local best response there is the left move. So the only best response of player I in standard strategies is (top left, bottom left). This inconsistency induces them to define a so-called sequential best reply in an extensive-form game. However, the above unique best response of player I given by sequential reasoning takes into account the part of player I's strategy that is to be adjusted in the future. My answer to the dilemma above is that the best response of player I should depend only on the strategy of player II. If player I's best-response strategy consists of the left move at the top node, then player I knows that, conditional on this move, there is no chance of reaching the bottom node, regardless of player II's strategy. Hence, player I does not need to specify a move at the bottom node in her best response. This observation is in fact a motivation of quasi-strategies in extensive-form games.
In addition to the above definition of sequential best reply, Hendon et al. (1996) also propose a definition of local best reply in the so-called agent-strategic form, where they assign an agent to each node played by the same player, and all agents of the same player have an identical utility function. Such a game is in fact also studied in Hart (2002) and Gorodeisky (2006), as well as in Theorem 5.1 in the present paper. In a finite generic extensive-form game of perfect information in agent-strategic form, an (interior) best-response dynamic or fictitious play, in either discrete or continuous time, converges to the backward-induction equilibrium.
Appendix A Terminology of Reduced Normal Form
There are two other well-known normal-form representations of an extensive-form game; see Ritzberger (2002). A normal-form game is semi-reduced (or pure-strategy reduced or quasi-reduced) if for all pairs (s^i_1, s^i_2) of strategies of player i and for all players i = 1, ..., k,
$$u(s^i_1, s^{-i}) = u(s^i_2, s^{-i}) \ \ \forall s^{-i} \ \Rightarrow\ s^i_1 = s^i_2.$$
A normal-form game is mixed-strategy reduced (or simply reduced) if for all strategies s̄^i of player i, all mixed strategies σ^i, and all players i = 1, ..., k,
$$u(\bar s^i, s^{-i}) = \sum_{s^i \in S^i} \sigma^i(s^i)\, u(s^i, s^{-i}) \ \ \forall s^{-i} \ \Rightarrow\ \sigma^i(\bar s^i) = 1.$$
That is, no s̄^i is payoff-equivalent to any convex combination of the elements in S^i \ {s̄^i}. (Recall σ^i in (2.3).) Given an extensive-form game Γ, we denote the sets of strategies for player i in semi-reduced normal form and in mixed-strategy reduced normal form by S^i and M^i, respectively. Recall that the set of quasi-strategies for player i is B^i. From their definitions, we may regard B^i ⊇ S^i ⊇ M^i for any player i in any extensive-form game Γ.
Recall that Λ0 = ∅ implies no chance node.
Lemma A.1. Given an extensive-form game of perfect information with no payoff tie for any player at any two terminal nodes, if Λ^0 = ∅, then for each player i, B^i ⊇ S^i implies that B^i = S^i.
Proof. When Λ^0 = ∅, for every player i and every two different quasi-strategies b^i_1, b^i_2 ∈ B^i, there exist two different plays h_1, h_2 and one strategy combination b^{-i} of the other players such that
$$\rho_{(b^i_1, b^{-i})}(h_1) = \rho_{(b^i_2, b^{-i})}(h_2) = 1,$$
i.e., (b^i_1, b^{-i}) and (b^i_2, b^{-i}) result in the different plays h_1 and h_2, respectively, in Γ. (Rigorously, given a quasi-strategy profile b = (b^1, ..., b^k) and a pure strategy profile a = (a^1, ..., a^k) such that a^i ∈ b^i for all i with 1 ≤ i ≤ k, we have ρ_b(h) defined as ρ_a(h) for all plays h in H.) From no payoff tie for player i at any two terminal nodes, it follows that
$$u(b^i_1, b^{-i}) \ne u(b^i_2, b^{-i}).$$
This is true for every pair of quasi-strategies b^i_1, b^i_2 ∈ B^i. Recall that the set of quasi-strategies is a partition of the pure strategies, and we reach the conclusion B^i = S^i.
Even if Λ^0 = ∅, it is not true that B^i = M^i for all players i in all generic extensive-form games of perfect information. In the one-player game Γ′ below, we denote B^1 = {b_1, b_2, b_3}, where u^1(b_1) = 3, u^1(b_2) = 0 and u^1(b_3) = 6. However, strategy b_1 can be replaced by the mixed strategy x^1 = {σ^1(b_1) = 0, σ^1(b_2) = 1/2, σ^1(b_3) = 1/2}. Thus, M^1 ≠ B^1.

[Figure 6: Γ′ with B^i ≠ M^i — player I chooses among three moves with payoffs 3, 0, and 6.]
Without the constraint on Λ^0, the conclusion in Lemma A.1 is no longer true. In the one-player game Γ′′ below, we denote B^1 = {b_1, b_2}. The strategies b_1 and b_2 lead to the same expected payoff, hence B^1 ≠ S^1.

[Figure 7: Γ′′ with B^i ≠ S^i — player I chooses b_1 (payoff 3) or b_2, which leads to a chance node selecting payoffs 0 and 6 with probability 1/2 each.]
Remark: The standard generic condition for an extensive-form game is that
no two pure strategy profiles that yield different outcomes generate the same payoff
for any player. In any extensive-form game under this condition, B i = S i for each
player i.
Appendix B Preliminary Lemmas for Theorem 3.1
We consider a finite generic k-player extensive-form game Γ of perfect information without chance nodes. Given a play h = (n_1, ..., n_{|h|}), we define W(h) to be the set of behavior quasi-strategy profiles w with w(n_j, n_{j+1}) = 1 for all 1 ≤ j < |h|.

Lemma B.1. If a play h = (n_1, ..., n_{|h|}) is not an equilibrium path in Γ, then there exists a node n_j in h such that for any behavior quasi-strategy profile w in W(h), there exists a pure quasi-strategy b^i ∈ B^i with i = λ(n_j), b^i(n_l) = n_{l+1} for all l < j, and b^i(n_j) ≠ n_{j+1} such that
$$u^i(b^i, w^{-i}) > u^i(w) = v^i(h). \qquad (B.1)$$
We call such a node n_j a deviation node in h.
Proof. Note that each Nash equilibrium component in Γ consists of a pure-strategy Nash equilibrium together with other Nash equilibria with the same equilibrium path.
We prove the desired conclusion by contradiction. For each node n_j in h with j < |h| − 1, we modify the subgame Γ_{n_j} to a game Γ̄_{n_j} by replacing the node n_{j+1} with a terminal node n_h that has the same payoff vector as the terminal node n_{|h|} in the original play h. (Note that the subgame Γ_{n_{j+1}} is not included in Γ̄_{n_j}.) For completeness, we let Γ̄_{n_{|h|−1}} = Γ_{n_{|h|−1}}.
If (B.1) does not hold, then for any n_j and the modified game Γ̄_{n_j}, there exists a strategy profile x^{−λ(n_j)}(Γ̄_{n_j}) to which the best response of player λ(n_j) at node n_j is the move (n_j, n_h). Note that for each node n ∉ h in Γ there exists a unique j such that n ∈ Γ̄_{n_j}. We can then generate a behavior quasi-strategy w̄ in Γ from the sequence (x^{−λ(n_j)}(Γ̄_{n_j}))_{j<|h|} such that
(i) w̄ ∈ W(h);
(ii) for every move (n, n′) in Γ with n ∉ h and n in some modified game Γ̄_{n_j}, w̄(n, n′) equals the probability of (n, n′) assigned by the behavior strategy profile equivalent to x^{−λ(n_j)}(Γ̄_{n_j}) in Γ̄_{n_j}.
In this way, we see that play h is the path of a Nash equilibrium x̄ in Γ, a contradiction.
Given a node n_j in a play h = (n_1, ..., n_{|h|}), we put
$$B_{n_j} := \{ b^{\lambda(n_j)} \in B^{\lambda(n_j)} : n_j \in \bar N_{b^{\lambda(n_j)}} \text{ and } b^{\lambda(n_j)}(n_j) \ne n_{j+1} \}. \qquad (B.2)$$
In other words, a pure quasi-strategy in B_{n_j} includes a move at node n_j which is not towards n_{j+1}. Suppose that player i plays at node n_j. Given a behavior strategy profile w^{-i}, we let
$$\hat u(w^{-i}) := \max_{b^i \in B_{n_j}} u^i(b^i, w^{-i}). \qquad (B.3)$$
Recall that κ is the number of terminal nodes in Γ.
Lemma B.2. Suppose a play h̄ = (n_1, ..., n_{|h̄|}) is not an equilibrium path in Γ; take a deviation node n_j in h̄, and suppose that player i moves at node n_j. Then,
$$\min_{w \in W(\bar h)} \hat u(w^{-i}) - v^i(\bar h) \ \ge\ \min_{1 \le i \le k} \ \min_{1 \le l \le \kappa - 1} \big( v^i_{(l)} - v^i_{(l+1)} \big), \qquad (B.4)$$
where û(w^{-i}) is defined in (B.3) with respect to n_j in h̄, and v^i_{(l)} is defined in Section 3.2.

Proof. It is straightforward to see that the left-hand side in (B.4) is bounded away from zero. We define a function f′: W^{-i}(h̄) → R such that f′(w^{-i}) = û(w^{-i}) − v^i(h̄). From Lemma B.1, f′(w^{-i}) > 0 for all w^{-i} in W^{-i}(h̄). Since f′ is continuous and W^{-i}(h̄) is compact, f′ is bounded away from 0.
The right-hand side in (B.4) is due to the fact that the set {v^i_{(l)}}_{1≤l≤κ, 1≤i≤k} of payoffs is finite.
Appendix C Games with Chance Nodes
For a generic perfect-information game Γ with chance nodes, the informal general idea of the proof of Theorem 3.1 is as follows. We first fix a subgame rooted at a chance node n such that n is the only chance node in the subgame Γ_n. We then check whether Γ_n includes a perpetually realized subplay (see Lemma 3.6 for the definition of a perpetually realized play). If yes, by Lemma 3.7, for each immediate successor node n′ of n, there is one and only one maximal perpetually realized subplay through node n′ in Γ_n. After checking all chance nodes n that have no successor chance nodes, we check a larger subgame Γ_n̄ in which n̄ is the only chance node that has not been checked in the previous step. If there is a chance node different from n̄ in Γ_n̄, Lemma 3.7 then implies that for each immediate successor node n′ of n̄, there is one and only one maximal perpetually realized subplay through node n′ in Γ_n̄. In this way, we can check all subgames rooted at a chance node in order, and finally show by Lemma 3.7 that in the original game Γ, there is a set of perpetually realized plays in which each pair splits up at a chance node. The proof then follows the one for the case of no chance nodes.
For a formal treatment, we consider a finite generic k-player extensive-form game Γ of perfect information, not necessarily without chance nodes, for Theorem 3.1. Without loss of generality, we assume that
$$\forall n \in \Lambda^0, \ \forall n' \in \psi^{-1}(n), \ \tau(n') > 0;$$
see (2.1) for the definition of τ(n′). Recall that n_0 is the root of Γ. We define a super play h_s to be a non-empty set of moves with the following properties:
(i) (n_1, n_2) ∈ h_s and n_1 ≠ n_0 ⇒ (ψ(n_1), n_1) ∈ h_s.
(ii) n_1 ∈ ⋃_{l=1}^k Λ^l and (ψ(n_1), n_1) ∈ h_s ⇒ ∃! n_2 ∈ ψ^{-1}(n_1) s.t. (n_1, n_2) ∈ h_s.
(iii) n_1 ∈ Λ^0 and (ψ(n_1), n_1) ∈ h_s ⇒ ∀ n′ ∈ ψ^{-1}(n_1), (n_1, n′) ∈ h_s.
(∃! denotes unique existence.) Therefore, a super play h_s includes all moves departing from any chance node in h_s. When Λ^0 = ∅, the sequence of nodes in a super play is a play as defined in Section 2.
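Properties (i)-(iii) translate directly into a simple traversal; the tree encoding below (children, chance, choice) is an assumed representation of ours, not notation from the paper.

```python
# Sketch of the super-play construction: starting from the root, follow
# exactly one chosen move at each player node (property (ii)) and every
# move at each chance node (property (iii)); property (i) holds because we
# only ever extend moves already reached from the root.
def super_play(root, children, chance, choice):
    moves, stack = set(), [root]
    while stack:
        n = stack.pop()
        if not children.get(n):        # terminal node: nothing to add
            continue
        nexts = children[n] if chance.get(n, False) else [choice[n]]
        for m in nexts:
            moves.add((n, m))
            stack.append(m)
    return moves
```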
We briefly show below the steps for handling chance nodes in the proof of Theorem 3.1. We first show that a perpetually realized super play h_s uniquely exists, much in the same spirit as the case of no chance nodes in Lemmata 3.6 and 3.7. Furthermore, we can replace (3.14) by
$$\forall h \subset h_s, \quad \rho_t(h) > \prod_{n \in h \cap \psi^{-1}(\Lambda^0)} \tau(n) - \varepsilon,$$
where h ⊂ h_s means that play h is contained in h_s. We proceed to the proof of Lemma 3.8. Replacing (3.17) by
$$\dot w(n_1, n_2)|_t \ge 0 \quad \forall (n_1, n_2) \in h_s \text{ with } n_1 \notin \Lambda^0,$$
we prove that each play contained in h_s is an equilibrium path. We thus reach a modified version of Lemma 3.9. The remaining details of the proof are analogous to the proof of the case Λ^0 = ∅.
We can also use the definition of a super play to prove Theorem 4.2 for general
extensive-form games with chance nodes.
Appendix D Notations, Operations, and Preliminary Results for Theorem 4.2
For convenience in the proof, we sometimes write "a µ-dynamic (x_t)_{t≥0}" for short, instead of the more explicit "a solution trajectory ξ(t, x0) to the continuous-time µ-dynamic". We apply quasi-strategies as in (3.5).
Recall that the set of decision nodes in a finite extensive-form game G is denoted by Nd(G). We let r(Γ) := |Nd(Γ)| for game Γ, and sometimes write r for short. For the results in this section, the µ-dynamic need not start from an interior initial state. Recall that the equivalent behavior quasi-strategy profile of x_t is denoted by w_t for all t ≥ 0.
The following lemma shows that if the solution trajectory concentrates on some play h in the game tree at some time t, and the probability that h is followed has a nonnegative growth rate at t, then the probability of each move in h assigned by the behavior quasi-strategy profile at t has a nonnegative growth rate.

Lemma D.1. Given a positive number ε < 1/2^{r+1}, for a µ-dynamic (x_t)_{t≥0} in the k-player game Γ with µ < ε/2, suppose that there exist a play h := (n_1, ..., n_{|h|}) and a time t_0 such that ρ_{t_0}(h) ≥ 1 − ε and ρ̇(h)|_{t_0} ≥ 0. Then ẇ(n_j, n_{j+1})|_{t_0} ≥ 0 for all j with 1 ≤ j < |h|.
Proof. First, note that when r = 1, i.e., when Γ contains only one decision node, the result is trivial.
We prove the case where r > 1 by contradiction. Suppose that the desired conclusion is not true. Then, at some t with
$$\rho_t(h) \ge 1 - \varepsilon > 1 - 1/2^{r+1},$$
there exists at least one n_j such that
$$\dot w(n_j, n_{j+1})|_t < 0. \qquad (D.1)$$
Without loss of generality, we assume that each decision node in h is played by a different player. It follows from ρ_t(h) > 1 − 1/2^{r+1} that w_t(n_j, n_{j+1}) > 1 − 1/2^{r+1} for all j with 1 ≤ j < |h|. From (D.1),
$$\Big( \prod_{j=1}^{|h|-1} w(n_j, n_{j+1}) \Big)' = \sum_{l=1}^{|h|-1} \Big( \dot w(n_l, n_{l+1}) \prod_{j \ne l} w(n_j, n_{j+1}) \Big), \qquad (D.2)$$
and the definition of the µ-dynamic, we may infer that
$$\dot\rho(h)|_t = \Big( \prod_{j=1}^{|h|-1} w(n_j, n_{j+1}) \Big)'\Big|_t \ \le\ (\varepsilon + \mu)(|h| - 2) - (1 - \varepsilon - \mu)^{|h|-1} \qquad (D.3)$$
$$\le\ \Big( \frac{1}{2^{r+1}} + \mu \Big)(r - 1) - \Big( 1 - \frac{1}{2^{r+1}} - \mu \Big)^r. \qquad (D.4)$$
Since µ < 1/2^{r+2},
$$\Big( \frac{1}{2^{r+1}} + \mu \Big)(r - 1) \le \frac{3}{16} \qquad (D.5)$$
and
$$\Big( 1 - \frac{1}{2^{r+1}} - \mu \Big)^r \ge \Big( 1 - \frac{3}{2^{r+2}} \Big)^r \ge \frac{169}{256} \qquad (D.6)$$
for all r with r > 1, the last bound in (D.6) being attained at r = 2.
From (D.4), (D.5), and (D.6) it follows that ρ̇(h)|_t < 0, a contradiction.
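The two bounds are easy to verify numerically; the check below simply evaluates (D.5) and (D.6) at the largest admissible µ.

```python
# With mu at its largest admissible value 1/2^(r+2), the positive term in
# (D.4) stays at or below 3/16 while the subtracted term stays at or above
# 169/256, so the bound on the growth rate of rho(h) is negative for r > 1.
for r in range(2, 12):
    mu = 1.0 / 2 ** (r + 2)
    pos = (1.0 / 2 ** (r + 1) + mu) * (r - 1)
    neg = (1.0 - 1.0 / 2 ** (r + 1) - mu) ** r
    assert pos <= 3.0 / 16 + 1e-12 and neg >= 169.0 / 256 - 1e-12
    print(r, pos - neg)                # always negative: the contradiction
```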
Recall that the set of plays in Γ is denoted by H. Given ε > 0 and a µ-dynamic (x_t)_{t≥0}, we put
$$S(\varepsilon) := \{ t : \rho_t(h) > 1 - \varepsilon \text{ and } \dot\rho(h)|_t \ge 0 \text{ for some } h \in H \}$$
and
$$Q(\varepsilon) := \{ t : \rho_t(h) > 1 - \varepsilon \text{ for some } h \in H \}. \qquad (D.7)$$
The following lemma for the ε-dynamic is analogous to Lemma 3.9 for the best-response dynamic.
Lemma D.2. Given game Γ, we take any ε with
$$0 < \varepsilon < \min\Big\{ \frac{1}{2^{r+2}}, \frac{\bar\varepsilon}{r+1} \Big\}, \qquad (D.8)$$
in which ε̄ is defined in (3.19). For any ε-dynamic (x_t)_{t≥0} on Γ, it follows that
$$t \in S(\varepsilon) \ \Rightarrow\ x_t \in NE[\varepsilon].$$
Proof. We apply the following adapted proof of Lemma 3.9 to show that for any ε with (D.8), the play h studied in Lemma D.1 is in fact an equilibrium path. If we consider the proof of Lemma 3.9 in the setup of the ε-dynamic, then for a state x with the equivalent behavior quasi-strategy profile w in W(h) (defined at the start of the proof of Lemma 3.9), a state x′ ∈ x[ε] implies that some moves may deviate away from h, but any such move cannot be followed with probability greater than ε. These moves may lead to a payoff vector different from the one assigned to h. Note that there are at most r nodes in the play h. We apply the proof of Lemma 3.9 but with ε in (3.21) and (3.22) replaced by (r + 1)ε.
The next theorem states that the convergence in frequency of the outcome implies the convergence in frequency of the ε-dynamic to NE.
Theorem D.3. Given game Γ, consider any ε with the property (D.8) and
$$\frac{2r\varepsilon}{(1 - 2\varepsilon)^r - 2(r-1)\varepsilon} \ <\ \frac{4r\varepsilon}{1 - 4r\varepsilon}. \qquad (D.9)$$
Given a positive number T′, if an ε-dynamic satisfies
$$\frac{1}{T} \int_0^T 1_{Q(\varepsilon)}\, dt \ge 1 - \varepsilon$$
for all T > T′, then
$$\frac{1}{T} \int_0^T 1_{\{t : x_t \in NE[\varepsilon]\}}\, dt \ge 1 - (4r + 1)\varepsilon \qquad (D.10)$$
for all T > T′.
Proof. For the given ε-dynamic in Γ, there exist a time t_1 and a play h := (n_1, ..., n_{|h|}) with ρ_{t_1}(h) ≥ 1 − ε and ρ̇(h)|_{t_1} > 0. Then, for any such pair (t_1, h), by Lemma D.1, ẇ(n_j, n_{j+1})|_{t_1} ≥ 0 for all j with 1 ≤ j < |h|. We put
$$t_2 := \min\{ t > t_1 : \rho_t(h) = \rho_{t_1}(h) \}. \qquad (D.11)$$
Note that, when ρ_t(h) ≥ 1 − ε at any t, w_t(n_j, n_{j+1}) ≥ 1 − ε for all j < |h|. From the definition of the ε-dynamic and (D.3), we further deduce that when ρ̇(h)|_t ≥ 0, ρ̇(h)|_t < 2rε; when ρ̇(h)|_t < 0, ρ̇(h)|_t < −(1 − 2ε)^r + 2(r − 1)ε. It follows from (D.11) and (D.9) that if t_2 is finite, then
$$\int_{t_1}^{t_2} 1_{\{t : \dot\rho(h)|_t < 0\}}\, dt < (4r\varepsilon)(t_2 - t_1).$$
We then apply Lemma D.2 to complete the proof.
Tree operations: Given a generic extensive-form game G of perfect information with r(G) = |Nd(G)| > 1, define the set of pre-terminal nodes in G to be
$$L(G) := \{ n \in N_d(G) : \Psi^{-1}(n) \subseteq N_t(G) \}, \qquad (D.12)$$
i.e., the subset of decision nodes whose successors are all terminal nodes. We take a node ñ ∈ L(G) and introduce below two operations on game G.
1. (Remove node ñ) Given the node ñ, we denote by n̂ the last node played by player λ(ñ) before ñ. That is,
$$\Psi(\tilde n) \cap \Lambda^{\lambda(\tilde n)} = \{\hat n\} \cup (\Psi(\hat n) \cap \Lambda^{\lambda(\tilde n)}).$$
(If such an n̂ does not exist, then we will not implement this operation on G in the proof.) We then denote by n′ the immediate successor node of n̂ in a play to ñ, i.e., n′ ∈ ψ^{-1}(n̂) ∩ (Ψ(ñ) ∪ {ñ}).
Without loss of generality, let us suppose that for each decision node n in G, |ψ^{-1}(n)| > 1. Recall the definition of v^i_{(l)} at the start of Section 3.2. We define a new game G̃ with r(G̃) ≤ r(G) − 1 by the following two steps.
(a) Remove the subgame G_{n′} together with the edge (n̂, n′).
(b) At node n̂, add two terminal nodes n̂_1 and n̂_2 such that v^i(n̂_1) = v^i_{(κ(G))} − 1 for all players i; v^i(n̂_2) = v^i_{(1)} + 1 for all players i ≠ λ(n̂), and v^{λ(n̂)}(n̂_2) = v^{λ(n̂)}_{(κ(G))} − 2.
The generated game G̃ is still a generic game.
2. (Replace ñ by a successor terminal node of ñ) Denote ψ(ñ) by n̄. In ψ^{-1}(ñ), denote by ñ′ the terminal node with the maximum payoff to the player who moves at node ñ, i.e.,
$$v^{\lambda(\tilde n)}(\tilde n') = \max_{n \in \psi^{-1}(\tilde n)} v^{\lambda(\tilde n)}(n). \qquad (D.13)$$
We define a new game Ḡ with r(Ḡ) = r(G) − 1 by the following two steps.
(a) Remove the subgame G_ñ and the edge (n̄, ñ). We arbitrarily index all remaining decision nodes as n^1, n^2, ..., n^{r(G)−1}. For each node n^j, we add two terminal nodes n^j_1 and n^j_2 to n^j with the property that
$$v^i(n^j_1) = v^i_{(\kappa(G))} - j \qquad (D.14)$$
for all players i; v^i(n^j_2) = v^i_{(1)} + j for all players i ≠ λ(n^j), and v^{λ(n^j)}(n^j_2) = v^{λ(n^j)}_{(κ(G))} − r(G) − j.
(b) Add a new terminal node ñ to n̄ with v_ñ = v_{ñ′}, where ñ′ is the terminal node denoted in (D.13) in G.
The generated game Ḡ is still a generic game.
We denote the sets of pure quasi-strategy profiles and mixed quasi-strategy profiles in G̃ by B̃ and X̃, respectively. Similarly, we denote B̄ and X̄ in Ḡ.
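The core of Tree Operation 2 admits a compact sketch; the dict encoding of nodes is an assumption of ours, and the bookkeeping terminal nodes n^j_1, n^j_2 of step (a) are omitted, since they only pad payoffs and never attract best responses.

```python
# Replace a pre-terminal node by the terminal child that maximizes the
# mover's payoff (step (b) of Tree Operation 2).  A node is a dict with
# 'player' and 'children', or just {'payoff': vector} at a terminal node.
def replace_preterminal(tree, path):
    node = tree
    for k in path[:-1]:                           # walk to the parent of n~
        node = node['children'][k]
    n_tilde = node['children'][path[-1]]
    mover = n_tilde['player']
    assert all('payoff' in c for c in n_tilde['children'])   # pre-terminal
    best = max(n_tilde['children'], key=lambda c: c['payoff'][mover])
    node['children'][path[-1]] = {'payoff': best['payoff']}  # new terminal
    return tree
```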
Appendix E Proof of Theorem 4.2
We prove Theorem 4.2 by induction on the number of decision nodes. Note that
x0 (and hence any xt with t ≥ 0) is an interior state.
Induction hypothesis: Given a natural number η ≥ 1 and any finite generic extensive-form game G of perfect information with r(G) ≤ η, for every ε > 0, there exist a positive number µ_G(ε) < ε and a number T_G(ε) > 0 such that for any interior initial state x0 and any µ_G(ε)-dynamic (x_t)_{t≥0} in G,
$$\frac{1}{T} \int_0^T 1_{\{t : x_t \in NE[\varepsilon]\}}\, dt \ge 1 - \varepsilon \qquad (E.1)$$
for all T ≥ T_G(ε). In the proofs later, we also apply a weaker version with (E.1) replaced by
$$\frac{1}{T} \int_0^T 1_{Q(\varepsilon)}\, dt \ge 1 - \varepsilon. \qquad (E.2)$$
Note that (E.1) trivially holds when r(G) = 1.
For the game Γ, we let r(Γ) = η + 1 > 1 and assume that the induction
hypothesis above is true for all generic perfect-information games G with r(G) ≤ η.
We pick a node ñ in L(Γ) (defined in (D.12)) and suppose that player i plays at
node ñ.
Step I [when the dynamic in the subgame Γ_ñ can be ignored due to (E.4) below]
For the game Γ̃ generated by Tree Operation 1 with respect to the node ñ, we consider below a pair of states x and x̃ for Γ and Γ̃ with equivalent behavior quasi-strategy profiles w and w̃, respectively. Recall the definition of w(n_1, n_2) in (2.6). For every player j ≠ i, we call x̃^j consistent with x^j if for every move (n, m) with n ∈ Λ^j(Γ̃), w̃(n, m) = w(n, m) whenever w(n, m) is defined. If x is an interior state, and for every move (n, m) included in both Γ and Γ̃, w(n, m) = w̃(n, m) (when both of them are defined), then we call x̃ an induced state of x. Thus, the outcomes for x and x̃ are the same on the subset {h ∈ H(Γ) : n′ ∉ h}. We denote the set of induced states of x by in(x) ⊂ X̃.
We show below that if the node ñ ∈ Λ^i is rarely included in the domain of any supporting strategy in the state x, then a best-response strategy of any player other than i in an induced state x̃ for Γ̃ is "almost" consistent with a best-response strategy of that player in the state x. We use the proposition that any achievable payoff of a player j ≠ i in the subgame Γ_{n′} is a convex combination of v^j(n̂_1) and v^j(n̂_2). (Recall that n′ is the immediate successor node of n̂ in the play to node ñ.) Recall from (3.7) that for a pure quasi-strategy b^i, N̄_{b^i} is denoted as the domain of b^i.
Lemma E.1. Given an interior state x for Γ, suppose that
$$\sum_{b^i : \tilde n \in \bar N_{b^i}} x^i(b^i) < \mu. \qquad (E.3)$$
Then, given any player j ≠ i and a best response g^j ∈ BR^j(x), for any state x̃ ∈ in(x) for Γ̃, there exist a state x̃′ ∈ x̃[µ] and a best response g̃^j ∈ BR^j(x̃′) such that g̃^j is consistent with g^j.

Proof. Recall that given any state x̃ ∈ in(x), for all nodes in N_d(Γ̃) \ {n̂}, the probability measures assigned in the equivalent behavior strategy profile w̃ are the same as those in w. We can show that given a state x and a player j ≠ i, there exists x̃′ ∈ in(x) with w̃′(n̂, n̂_1) + w̃′(n̂, n̂_2) = w(n̂, n′), and there is a g̃^j ∈ BR^j(x̃′) consistent with the given g^j ∈ BR^j(x). This conclusion can be deduced from the payoff vectors of the nodes n̂_1 and n̂_2. To be specific, given any behavior strategy w^j of player j, we can choose appropriate w̃′(n̂, n̂_1) and w̃′(n̂, n̂_2) such that the expected payoff of player j for the subgame Γ_{n̂} in Γ against the induced behavior strategy profile of w^{-j} is the same as the expected payoff for Γ_{n̂} in Γ̃ generated by the behavior strategy w̃^j consistent with w^j against the induced behavior strategy profile of w̃′^{-j}.
From (E.3), it follows that for any x̃ ∈ in(x), x̃′ ∈ x̃[µ].
In the following lemma, for a µ-dynamic (x_t)_{t≥0}, g_t is the direction of the solution trajectory at time t, as defined in Definition 4.1. Similarly, for a µ-dynamic (x̃_t)_{t≥0} in Γ̃, g̃_t is defined for the dynamic (x̃_t)_{t≥0}. The following lemma is an extension of the lemma above: given a µ-dynamic (x_t)_t with the property (E.3) at all times, we generate a 3µ-dynamic (x̃_t)_t in game Γ̃ such that both the outcome and the best-response strategy profile of (x̃_t)_t are close to those of (x_t)_t at all times.
Lemma E.2. Given an interior µ-dynamic (x_t)_{t≥0} on Γ, suppose that
$$\sum_{b^i : \tilde n \in \bar N_{b^i}} x_t^i(b^i) < \mu \quad \forall t \ge 0. \qquad (E.4)$$
Then, for the game Γ̃ generated by Tree Operation 1 with respect to node ñ, there exists a 3µ-dynamic (x̃_t)_{t≥0} in Γ̃ such that for all players j ≠ i, x̃_t^j is consistent with x_t^j, and x̃_t ∈ in(x_t[µ]) for all t ≥ 0. Moreover, for all players j ≠ i, g̃_t^j is consistent with g_t^j for all t ≥ 0; for player i, if ñ ∉ N̄_{g_t^i} at time t, then g̃_t^i = g_t^i.
Proof. We define a shadow process (z_t^i)_{t≥0} to (x_t^i)_{t≥0} on Γ such that z_0^i = x_0^i and ż^i = g̃^i − z^i, where g̃^i is defined as follows. We let g̃_t^i = g_t^i whenever ñ ∉ N̄_{g_t^i}. When ñ ∈ N̄_{g_t^i}, we can write g_t^i as g_t^i = c_1 x^i + c_2 x^{i′} with c_1 + c_2 = 1, c_1 > 0, c_2 > 0, ñ ∉ N̄_{x^i}, and ñ ∈ N̄_{x^{i′}}. So x^i is in X^i ∩ X̃^i. We then let
$$\tilde g_t^i = c_1 x^i + c_2 \tilde b^i, \qquad (E.5)$$
in which b̃^i can be arbitrarily chosen in B^i ∩ B̃^i. We show that, in any such shadow process (z_t^i)_{t≥0}, for any t ≥ 0,
$$|x_t^i(b^i) - z_t^i(b^i)| \le \mu \qquad (E.6)$$
for all b^i ∈ B^i, from the following observation.
Suppose that for a period of time [t_1, t_2], we let b̃_t^i in (E.5) be the same for all times t in [t_1, t_2] whenever g̃_t^i ≠ g_t^i. If we further suppose that z_{t_1}(b̃^i) = x_{t_1}(b̃^i), then z_{t_2}(b̃^i) > x_{t_2}(b̃^i). For the general case, we can infer that z_t(b̂^i) ≥ x_t(b̂^i) for all b̂^i ∈ B^i ∩ B̃^i and all t ≥ 0. Hence we reach (E.6).
From (E.6) and Lemma E.1, we complete the proof.
Given a µ-dynamic (x_t)_{t≥0} in Γ, we can thus find a 3µ-dynamic (x̃_t)_{t≥0} in Γ̃ whose outcome is always "close to" the outcome of (x_t)_{t≥0}. Since, by the induction hypothesis, the outcome of (x̃_t)_{t≥0} "weakly" converges to a play in Γ̃ as in (E.2), the outcome of (x_t)_{t≥0} also "weakly" converges to a play in Γ, and hence the µ-dynamic "weakly" converges to the set of Nash equilibria as in (E.1). Recall that r(G) denotes the number of decision nodes in the game G.
Lemma E.3. Given game Γ and ε > 0, there exist µ > 0 and T̄ > 0 with the following property. For any interior µ-dynamic in Γ with the property (E.4) for some node ñ in L(Γ), (D.10) holds for all T ≥ T̄.
Proof. Suppose that, given a positive ε < 1, for some µ with 0 < µ < 1 − ε and two states x and x̃ for Γ and Γ̃, respectively, the following four conditions hold:
(i) (E.3) holds;
(ii) for all j ≠ i, x̃^j is consistent with x^j;
(iii) x̃ ∈ in(x[µ]);
(iv) ρ(h) > 1 − ε for some h ∈ H(Γ).
(Recall that ρ(h) is the probability that play h is followed in state x; see (3.6).) Then, it follows that h ∈ H(Γ̃) and ρ̃(h) > 1 − ε − µ.
Take an ε_Γ that satisfies the two conditions (D.8) and (D.9) in Theorem D.3. Denote min{ε/2, ε_Γ/2} by ε_1. Since r(Γ̃) < r(Γ), we apply the induction hypothesis to Γ̃ and obtain µ_{Γ̃}(ε_1) < ε_1. Consider any positive µ ≤ µ_{Γ̃}(ε_1)/3. By (E.2) in the induction hypothesis with respect to the 3µ-dynamic on Γ̃ in Lemma E.2, we can find T_{Γ̃}(ε_1) such that in the interior µ-dynamic in Γ,
$$\frac{1}{T} \int_0^T 1_{Q(\varepsilon_1 + \mu)}\, dt > 1 - \varepsilon_1$$
for all T ≥ T_{Γ̃}(ε_1). Note that ε_1 + µ < min{ε, ε_Γ}. Theorem D.3 completes the proof.
Step II [when ñ can be viewed as a terminal node due to (E.7) and (E.8)]
For simplicity, we assume no chance nodes in Γ. The extension to general games follows from the generic condition.
Recall the game Γ̄ obtained by Tree Operation 2, and that player i moves at node ñ in Γ. (Whether ñ refers to the decision node in Γ or to the terminal node in Γ̄ will be clear from the context.) We define a function q : int(X) → int(X̄) such that if q(x) = x̄ ∈ X̄ and w and w̄ are the equivalent behavior strategy profiles of x and x̄, respectively, then for all moves (n_1, n_2) in both Γ and Γ̄, w(n_1, n_2) = w̄(n_1, n_2) (both may be undefined at the same time). Thus, q(x) is the canonical projection of the quasi-strategy profile x onto Γ̄. Moreover, w̄(n, n_1) = w̄(n, n_2) = 0 for all nodes n in N_d(Γ̄), where n_1 and n_2 are the terminal nodes denoted in step (a) in Tree Operation 2.
Lemmata E.4, E.5, and E.6 show that for any interior µ^4/4-dynamic (x_t)_{t≥0}, CR^l(x_t[µ^4/4]) ⊆ CR^l(q(x_t)[µ^2/2]) for all players l ≠ i.
Recall the constant δ defined for Γ in (3.4), the set S(b^i) in (3.8), and the node ñ′ denoted in (D.13). Recall that N_t(Γ̄) denotes the set of terminal nodes in Γ̄.
Denote by j the player who plays at node n̄ = ψ(ñ). (It is possible that j = i.) The lemma below considers in Γ two subplays with the same starting node, one ending at ñ and the other ending at a terminal node not in Γ_ñ. It is also supposed that all decision nodes except ñ in these two subplays are played by player j. The conjunction of conditions (E.7) and (E.8) below requires that the node ñ be highly likely to be followed by the terminal node ñ′ in Γ in the state x. If player j's (approximate) best response to state x requires all moves along one subplay, then for the projection state q(x) in Γ̄, any best response of player j to q(x) cannot include all moves in the other subplay. This follows from the generic condition in Γ and the fact that ñ′ gives player i the maximum payoff among all terminal nodes in Γ_ñ.
Lemma E.4. Given an interior state x for Γ and a positive number µ, suppose that
$$0 < \sum_{n \in \psi^{-1}(\tilde n) \setminus \{\tilde n'\}} \ \sum_{b^i : (\tilde n, n) \in S(b^i)} x^i(b^i) \ \le\ \frac{\mu^4}{4} \qquad (E.7)$$
and
$$\frac{\mu^2}{\delta} \ \le\ \sum_{b^i : \tilde n \in \bar N_{b^i}} x^i(b^i) \ <\ 1. \qquad (E.8)$$
Suppose further that in Γ̄ there are two subplays (n^1_1, n^1_2, ..., n^1_{k_1}) and (n^2_1, n^2_2, ..., n^2_{k_2}) with n^1_{k_1} = ñ ≠ n^2_{k_2}, n^2_{k_2} ∈ N_t(Γ̄), n^1_1 = n^2_1; except that n^1_{k_1} is not necessarily played by player j, all decision nodes in these two subplays are played by player j. Then, for any pure quasi-strategy g^j ∈ CR^j(x[µ^4/4]) in Γ and any pure quasi-strategy ḡ^j ∈ BR^j(q(x)) in Γ̄, the following statement is NOT true: for i = 1 or 2, (n^i_l, n^i_{l+1}) ∈ S(g^j) for all l < k_i, and (n^{3−i}_l, n^{3−i}_{l+1}) ∈ S(ḡ^j) for all l < k_{3−i}.
Remark: If (n^1_{k_1−1}, n^1_{k_1}) ∈ S(g^j), then n^1_{k_1} denotes the decision node ñ in Γ.
Proof. It follows from (E.7) and (E.8) that
$$w'(\tilde n, \tilde n') \ \ge\ \frac{\mu^2/\delta - \mu^4/2}{\mu^2/\delta} \ >\ 1 - \frac{\delta}{2}$$
for all w′ with the equivalent profile x′ ∈ x[µ^4/4]. We complete the proof by the definition of δ in (3.4).
Note that the set of pure quasi-strategies for any player l ≠ i in Γ is the same as that in Γ̄. The following two lemmata consider further the case where at least one decision node in the two subplays studied in Lemma E.4 is not played by player j, i.e., N̂ ≠ ∅ in Lemma E.6 below.
Recall from Section 3.2 that N̄_b denotes the domain of the pure quasi-strategy b, i.e., the set of origin nodes of all moves included in b.
Lemma E.5. Given an interior state x for Γ, for any player l ≠ i, if BR^l(x) ≠ BR^l(q(x)), then for any g^l ∈ BR^l(x), there exist a pure quasi-strategy ḡ^l ∈ BR^l(q(x)) and a unique node n^l such that ñ ∈ Ψ^{-1}(n^l), n^l ∈ N̄_{g^l} ∩ N̄_{ḡ^l}, and Ψ^{-1}(n^l) ∩ N̄_{g^l} ∩ N̄_{ḡ^l} = ∅.
Remark: Roughly speaking, the node n^l is where g^l and ḡ^l deviate from each other.
Proof. This follows from the definition of q(x) and the construction of the game Γ̄.
The following lemma shows that if the structure condition on game Γ in Lemma E.4 does not hold, but the conjunction of (E.7) and (E.8) holds in state x, then g^l is in fact a best response of player l in game Γ̄ for a state in the µ^2/4-neighborhood of q(x). In the proof, we make it possible to reach the newly added terminal nodes n_1 and n_2 of some original decision nodes in Γ, and such a modified state for game Γ̄ is more favorable to g^l than to ḡ^l.
Lemma E.6. We continue with the setting in Lemma E.5, suppose BR^l(x) ≠ BR^l(q(x)), and obtain the node n^l with respect to (g^l, ḡ^l). We define a set
$$\hat N := \{ \hat n \in \Psi^{-1}(n^l) \cap \Lambda^{-l} : (\Psi(\hat n) \cap \Psi^{-1}(n^l)) \subset \Lambda^l \}.$$
If N̂ ≠ ∅ and both (E.7) and (E.8) hold, then there exists an interior state z ∈ q(x)[µ^2/4] for Γ̄ such that g^l ∈ BR^l(z).
Remark: N̂ is the subset of Ψ^{-1}(n^l) that contains all first nodes not played by player l after n^l. For any two nodes n_1, n_2 ∈ N̂, n_1 ∉ Ψ(n_2) ∪ Ψ^{-1}(n_2).
Proof. From (E.7) and (E.8), it follows that in Γ, for the behavior quasi-strategy profile w equivalent to x,
$$w(\tilde n, \tilde n') > 1 - \mu^2/4. \qquad (E.9)$$
We first consider the case that g^l is a pure quasi-strategy. Since g^l ≠ ḡ^l, one and only one of the following statements is true:
$$g^l(n^l) \in \Psi(\tilde n) \cup \{\tilde n\} \ \text{ and } \ \bar g^l(n^l) \notin \Psi(\tilde n) \qquad (E.10)$$
or
$$\bar g^l(n^l) \in \Psi(\tilde n) \cup \{\tilde n\} \ \text{ and } \ g^l(n^l) \notin \Psi(\tilde n). \qquad (E.11)$$
We first suppose that (E.10) holds, and we generate a state z ∈ q(x)[µ^2/4] as follows. Denote the equivalent behavior quasi-strategy profile of q(x) by w̄. To generate z, for the equivalent behavior quasi-strategy profile w_z of z, we let w_z(n, n′) = w̄(n, n′) for all moves (n, n′) with n ∈ N_d(Γ̄) \ N̂ in Γ̄.
If N̂ ∩ Ψ(ñ) ≠ ∅, then from the remark above, we infer that there is only one node n̂ in N̂ ∩ Ψ(ñ). For this unique node n̂, we then let w_z(n̂, n̂_2) = µ^2/4, where n̂_2 is defined in step (a) in Tree Operation 2, and w_z(n̂, n̂′) = (1 − µ^2/4) w̄(n̂, n̂′) for all nodes n̂′ in ψ^{-1}(n̂) \ {n̂_2}.
Regardless of whether N̂ ∩ Ψ(ñ) ≠ ∅, for every node n̂ in N̂ \ Ψ(ñ), we let w_z(n̂, n̂_1) = µ^2/4, where n̂_1 is defined in step (a) in Tree Operation 2, and w_z(n̂, n̂′) = (1 − µ^2/4) w̄(n̂, n̂′) for all nodes n̂′ in ψ^{-1}(n̂) \ {n̂_1}. We then transfer the generated behavior quasi-strategy profile w_z to a state z in Γ̄.
If (E.11) holds, we apply a similar process to obtain z but with n̂_1 and n̂_2 swapped. In either case, we find u^l(g^l, z^{-l}) > u^l(ḡ^l, z^{-l}) in Γ̄ and hence g^l ∈ BR^l(z).
For the case that g^l is a mixed quasi-strategy, we construct w_z by a similar procedure as above, except that w_z(n̂, n̂_1) and w_z(n̂, n̂_2) are not always set to zero or µ^2/4. By an argument similar to (E.5), we can find a distribution on (w_z(n̂, n̂_1), w_z(n̂, n̂_2))_{n̂∈N̂} with w_z(n̂, n̂_1) + w_z(n̂, n̂_2) = µ^2/4 for all n̂ in N̂ such that for the generated state z equivalent to w_z, g^l ∈ BR^l(z).
In summary, g^l ∈ BR^l(z). We can check by (E.9) that z is in q(x)[µ^2/4]. The constructed state z is not an interior state for Γ̄, as w_z(n, n_1) = w_z(n, n_2) = 0 for all n ∈ N_d(Γ̄) \ N̂, but it is straightforward to modify it to an interior state with the required property above; for example, let w_z(n, n_1) = w_z(n, n_2) = µ̂ for a very small µ̂ > 0 for all n ∈ N_d(Γ̄) \ N̂ and then make appropriate modifications to w_z(n̂, n̂_1) and w_z(n̂, n̂_2) for all nodes in N̂.
In the next lemma, we apply Lemmata E.4 and E.6 in an approximate best-response dynamic.
Lemma E.7. Given a µ^4/4-dynamic (x_t)_{t≥0} in Γ, suppose that both (E.7) and (E.8) hold at all t ≥ 0. Then (q(x_t))_{t≥0} is a µ^2/2-dynamic in Γ̄.
Proof. Given any decision node n in Γ̄, from the payoff vectors of the nodes n_1 and n_2 and the definition of the approximate best-response dynamic, it follows that in any µ^2/2-dynamic (x̄_t)_{t≥0} from the initial state x̄_0 = q(x_0) in Γ̄,
$$\sum_{b : (n, n_1) \in S(b)} \bar x_t^{\lambda(n)}(b) = \sum_{b : (n, n_2) \in S(b)} \bar x_t^{\lambda(n)}(b) = 0$$
for all t ≥ 0.
Recall that player i moves at node ñ in Γ, and that if player i's best response requires a move at node ñ, then it must be towards node ñ′. When x̄_t = q(x_t) at any t ≥ 0, it is straightforward to see that for any pure quasi-strategy profile b ∈ CR(x_t) in Γ, q(b)^i is in CR^i(x̄_t) in Γ̄. Therefore, for any pure quasi-strategy profile b ∈ CR(x_t[µ^4/4]) in Γ, q(b)^i is in CR^i(x̄_t[µ^4/4]) in Γ̄.
Recall that player j moves at node n̄. For the player l = j with N̂ ≠ ∅ (defined in Lemma E.6), or a player l with l ≠ j, i, which implies N̂ ≠ ∅, we can always adjust w̄_t(n, n_1) and w̄_t(n, n_2) for the behavior quasi-strategy profile w̄_t equivalent to x̄_t = q(x_t) at some nodes n, as shown in the proof of Lemma E.6, such that if b ∈ CR(x_t[µ^4/4]) in Γ, then q(b)^l is in CR^l(x̄_t[µ^2/2]) in Γ̄. (The final adjustment consists of two parts: the distance of µ^4/4 in x_t[µ^4/4] and the adjustment shown in Lemma E.6.)
Note that if player l = j and N̂ = ∅, then we cannot adjust in that way to make q(b)^j ∈ CR^j(x̄_t[µ^2/2]), since player j's best response cannot include a move to a newly added terminal node in Γ̄, which gives a payoff worse than v^j_{(κ(Γ))} to player j. However, when N̂ = ∅, Lemma E.4 implies that for any pure quasi-strategy profile b ∈ CR(x_t) in Γ, q(b)^j is in CR^j(x̄_t) in Γ̄.
Given a µ^4/4-dynamic (x_t)_{t≥0} in Γ, we can thus find a µ^2/2-dynamic (x̄_t)_{t≥0} in Γ̄ whose outcome is always "close to" the outcome of (x_t)_{t≥0}. Since, by the induction hypothesis, the outcome of (x̄_t)_{t≥0} "weakly" converges to a play in Γ̄ as in (E.2), the outcome of (x_t)_{t≥0} also "weakly" converges to a play in Γ, and hence the µ^4/4-dynamic "weakly" converges to the set of Nash equilibria as in (E.1). Recall that r(G) denotes the number of decision nodes in the game G.
Lemma E.8. Given the game Γ and ε > 0, there exist µ̄ with 0 < µ̄ < ε and T̄ > 0 with the following property. For any interior µ^4/4-dynamic (x_t)_{t≥0} in Γ with µ < µ̄ and with the properties (E.7) and (E.8) at some node ñ ∈ L(Γ) for all t ≥ 0, (D.10) holds for all T ≥ T̄.
Proof. Take ε_Γ that satisfies the two conditions (D.8) and (D.9) in Theorem D.3. We denote min{ε/2, ε_Γ/2} by ε_2. Since r(Γ̄) < r(Γ), we apply the induction hypothesis to Γ̄ and obtain µ_{Γ̄}(ε_2) < ε_2. Take µ̄ := (2µ_{Γ̄}(ε_2))^{1/2}, and consider any positive µ < µ̄. Note that from (E.7) and (E.8), it follows that w_t(ñ, ñ′) > 1 − µ^2/4 for all t ≥ 0. Hence, if for the associated dynamic (x̄_t)_{t≥0} = (q(x_t))_{t≥0} we suppose that ρ̄_{t_0}(h̄) = p for the play h̄ up to the terminal node ñ in Γ̄ at some time t_0, then for (x_t)_{t≥0}, ρ_{t_0}(h) ≥ p(1 − µ^2/4) for the play h = h̄ ∪ {ñ′} in Γ. By (E.2) in the induction hypothesis with respect to a µ^2/2-dynamic in Γ̄ and Lemma E.7, we can find T_{Γ̄}(ε_2) such that for the µ^4/4-dynamic in Γ,
$$\frac{1}{T} \int_0^T 1_{Q(\varepsilon_2 + \mu^2/4)}\, dt > 1 - \varepsilon_2$$
for all T > T_{Γ̄}(ε_2). From µ^2/4 < µ_{Γ̄}(ε_2)/2 < ε_2/2, it follows that
$$\frac{1}{T} \int_0^T 1_{Q(3\varepsilon_2/2)}\, dt > 1 - \varepsilon_2$$
for all T > T_{Γ̄}(ε_2). Theorem D.3 completes the proof, as 3ε_2/2 ≤ min{3ε/4, 3ε_Γ/4}.
Step III [either the condition in Step I or the one in Step II holds at any time after a sufficiently long period of time]
We first give a corollary of Lemmata E.3 and E.8.
Corollary E.9. Given the game Γ and ε > 0, there exist µ̂ with 0 < µ̂ < ε and T̂ > 0 with the following property. For any interior µ^4/4-dynamic (x_t)_{t≥0} on Γ with µ < µ̂, and with either of the two conditions below holding at some node ñ ∈ L(Γ) for all t ≥ 0,
(i) (E.4), or
(ii) the conjunction of (E.7) and (E.8),
it follows that
$$\int_0^T 1_{\{t : x_t \in NE[\varepsilon]\}}\, dt \ge T(1 - \varepsilon) \qquad (E.12)$$
for all T ≥ T̂.
To prove Theorem 4.2, we observe that in the long run, the approximate best-response dynamic satisfies at least one condition in Corollary E.9. If it satisfies one condition for a sufficiently long period of time, then the dynamic "weakly" converges to NE during that period. We then only need to prove that such periods are sufficiently long, by Lemma 3.3, a basic proposition on the best-response dynamic. Recall from (3.7) that for a pure quasi-strategy b^i, N̄_{b^i} is denoted as the domain of b^i, and from (3.8) that S(b^i) denotes the set of moves included in b^i.
Proof of Theorem 4.2. Recall ñ ∈ L(Γ) and the two associated games Γ̃ and Γ̄ obtained by Tree Operations 1 and 2, respectively. Recall that in Γ node ñ is played by player i. We first note that in the long run, when player i needs to move at node ñ, she rarely picks a move other than (ñ, ñ′). More rigorously, given any interior approximate best-response dynamic (x_t)_{t≥0}, for any pure quasi-strategy g̃_t^i ∈ CR^i(x_t) at any time t, if ñ ∈ N̄_{g̃_t^i}, then g̃_t^i(ñ) must be ñ′. Therefore, for any number c > 0, there exists a time t̄ such that
$$\forall t \ge \bar t, \quad \sum_{n \in \psi^{-1}(\tilde n) \setminus \{\tilde n'\}} \ \sum_{b^i : (\tilde n, n) \in S(b^i)} x_t^i(b^i) \le c \qquad (E.13)$$
for all interior approximate best-response dynamics (x_t)_{t≥0}; see Lemma 3.3.
We take µ under the following constraints:
(i) µ < δ/2;
(ii) T̂ < ln δ − ln(2µ), where T̂ is defined in Corollary E.9;
(iii) µ < µ̂ in Corollary E.9.
We consider any interior µ^4/4-dynamic (x_t)_{t≥0} with µ taken under the above constraints, and then transform this dynamic to a dynamic (z_t)_{t≥0} with z_t = x_{t+t̄}, in which t̄ is defined in (E.13) with c replaced by µ^4/4. We define a sequence of times (t_k)_{k≥0} with t_0 = 0 such that for any k ≥ 0,
$$t_{2k+1} = \min\Big\{ t \ge t_{2k} : \sum_{b^i : \tilde n \in \bar N_{b^i}} z_t^i(b^i) \ge \mu \Big\}$$
and
$$t_{2k+2} = \min\Big\{ t \ge t_{2k+1} : \sum_{b^i : \tilde n \in \bar N_{b^i}} z_t^i(b^i) < \frac{\mu^2}{\delta} \Big\}.$$
Note that it is possible that there exists a natural number l such that t_k = +∞ for all k ≥ l. We know that during [t_{2k+1}, t_{2k+2}) for any k ≥ 0, Σ_{b^i : ñ ∈ N̄_{b^i}} z_t^i(b^i) has decreased from µ to µ^2/δ. From constraint (i) above, it follows that µ/2 > µ^2/δ. From Lemma 3.3, we may further infer a lower bound on the length of [t_{2k+1}, t_{2k+2}):
$$t_{2k+2} - t_{2k+1} \ >\ \min\Big\{ t > t_{2k+1} : \sum_{b^i : (\tilde n, \tilde n') \in S(b^i)} z_t^i(b^i) = \frac{\mu^2}{\delta} \Big\} - \min\Big\{ t > t_{2k+1} : \sum_{b^i : (\tilde n, \tilde n') \in S(b^i)} z_t^i(b^i) = \frac{\mu}{2} \Big\} \ >\ \ln\frac{\mu}{2} - \ln\frac{\mu^2}{\delta} \ =\ \ln\delta - \ln(2\mu). \qquad (E.14)$$
Recall constraints (ii) and (iii) above concerning µ. During [t_{2k+1}, t_{2k+2}) for any k ≥ 0, we can apply the condition of the conjunction of (E.7) and (E.8) in Corollary E.9. It then follows that during [t_{2k+1}, t_{2k+2}),
$$\frac{1}{t_{2k+2} - t_{2k+1}} \int_{t_{2k+1}}^{t_{2k+2}} 1_{\{t : z_t \in NE[\varepsilon]\}}\, dt > 1 - \varepsilon. \qquad (E.15)$$
From the definition of t_{2k+1}, it follows that, for any t with t_{2k} ≤ t < t_{2k+1},
$$\sum_{b^i : \tilde n \in \bar N_{b^i}} z_t^i(b^i) < \mu. \qquad (E.16)$$
We can then apply condition (E.4) in Corollary E.9 for any [t_{2k}, t_{2k+1}) with k ≥ 0. Moreover, by Corollary E.9, we can see from constraint (ii) above, (E.14), and (E.15) that during (t_{2k}, t_{2k+3}) for any k ≥ 0,
$$\frac{1}{t_{2k+3} - t_{2k}} \int_{t_{2k}}^{t_{2k+3}} 1_{\{t : z_t \notin NE[\varepsilon]\}}\, dt \ <\ \varepsilon + \frac{3\hat T}{t_{2k+2} - t_{2k+1}} \ <\ 3\varepsilon.$$
Therefore, in the original µ^4/4-dynamic (x_t)_{t≥0},
$$\frac{1}{T} \int_{\bar t}^{\bar t + T} 1_{\{t : x_t \in NE[\varepsilon]\}}\, dt \ge 1 - 3\varepsilon$$
for all T ≥ 2T̂. We finally let µ(4ε) := µ^4/4 and T(4ε) := t̄/ε + 2T̂. It follows that for any µ(4ε)-dynamic in Γ,
$$\frac{1}{T} \int_0^T 1_{\{t : x_t \in NE[4\varepsilon]\}}\, dt \ge 1 - 4\varepsilon$$
for all T ≥ T(4ε). We have thus shown that the induction hypothesis (E.1) also applies to the game Γ, and indeed to all generic extensive-form games of perfect information with η + 1 decision nodes.
Appendix F Proofs in Section 5
Proof of Theorem 5.1. We prove it by induction on the number of decision nodes, with the following induction hypothesis. Recall that in a game G, r(G) is the number of decision nodes.
Induction hypothesis: Given a natural number η ≥ 1, we consider any finite generic extensive-form game G of perfect information in which every player can move at only one node and r(G) ≤ η. For every ε̄ > 0, there exist a number µ_G(ε̄) with 0 < µ_G(ε̄) < ε̄ and a number T_G(ε̄) > 0 such that for any initial state x0 and every interior µ_G(ε̄)-dynamic (x_t)_{t≥0} in G,
$$x_t \in BIE[\bar\varepsilon] \qquad (F.1)$$
for all t ≥ T_G(ε̄). Note that this assumption trivially holds when r(G) = 1.
Given the game Γ with r(Γ) = η + 1 > 1, recall that the root of Γ is n_0. Without loss of generality, we assume that n_0 is not a chance node. We index the nodes in ψ^{-1}(n_0) by n_1, n_2, ..., n_l. So l < η. Recall δ defined in (3.4). We now take an ε′ with
$$0 < \varepsilon' < \frac{\min\{\delta, \varepsilon\}}{2\eta^2}. \qquad (F.2)$$
For each subgame Γ_{n_i} with 1 ≤ i ≤ l, we obtain µ_{Γ_{n_i}}(ε′) and T_{Γ_{n_i}}(ε′) defined in the induction hypothesis above for the game Γ_{n_i}. Let µ̄ := min_{1≤i≤l} µ_{Γ_{n_i}}(ε′) and T̄ := max_{1≤i≤l} T_{Γ_{n_i}}(ε′). For any µ̄-dynamic (x_t)_{t≥0} in Γ, by (F.1) in the induction hypothesis, we may infer that for all i with 1 ≤ i ≤ l, x_t(Γ_{n_i}) ∈ BIE(Γ_{n_i})[ε′] for any t ≥ T̄, where x_t(Γ_{n_i}) is the continuation strategy profile in the subgame Γ_{n_i} at time t in the µ̄-dynamic (see Section 3.2), and BIE(Γ_{n_i}) denotes the backward-induction equilibrium in the subgame Γ_{n_i}.
By (3.4), (F.2), and µ̄ < ε′, we observe that from time T̄ on, the best-response strategy for the player who moves at the root, i.e., λ(n_0), is the backward-induction move at the root. From Lemma 3.3, it follows that in Γ, x_t ∈ BIE[ε] for all t ≥ T̄ − ln ε. Hence, we can take µ̄ for µ(ε) and T̄ − ln ε for T(ε), respectively, with respect to the game Γ.
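The recursion in this proof mirrors plain backward induction on the subgames Γ_{n_i}; a minimal solver sketch (with an assumed tuple encoding of the tree, not the paper's notation) reads as follows.

```python
# Backward induction on a finite perfect-information tree: a decision node
# is (player, [children]); anything else is a terminal payoff tuple.
# Genericity (no payoff ties) makes the maximizer unique at every node.
def backward_induction(node):
    if not (isinstance(node, tuple) and len(node) == 2 and isinstance(node[1], list)):
        return node, []                 # terminal: payoff vector, no moves
    player, children = node
    solved = [backward_induction(c) for c in children]
    best = max(range(len(solved)), key=lambda j: solved[j][0][player])
    payoff, moves = solved[best]
    return payoff, [(player, best)] + moves

# Example: player 0 moves at the root, player 1 in the right subgame.
game = (0, [(1.0, 0.0), (1, [(0.0, 2.0), (2.0, 1.0)])])
print(backward_induction(game))         # ((1.0, 0.0), [(0, 0)])
```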
Proof of Theorem 5.3. We prove it by induction on the number of decision nodes, with the following induction hypothesis.
Induction hypothesis: Given a natural number η ≥ 1, any finite generic extensive-form game G of perfect information with r(G) = η, and ε′ > 0, for any initial state x0 and any µ > 0, there exist a µ-dynamic (x_t)_{t≥0} in G and a number T_G such that
$$x_t \in BIE[\varepsilon'] \quad \forall t > T_G. \qquad (F.3)$$
Note that the above assumption trivially holds when r(G) = 1.
We prove the desired result for the game Γ with r(Γ) = η + 1 > 1 under the assumption that the induction hypothesis above holds for all games G with r(G) = η. Recall from (D.12) that L(Γ) is the set of pre-terminal nodes in Γ. We pick a node ñ in L(Γ). Without loss of generality, we assume that ñ is not a chance node.
Given the initial state x0 in Γ and for any µ′ > 0, we can always find a µ′-dynamic (x_t)_{t≥0} with an associated finite time t_0 such that
$$\forall t \ge t_0, \quad 0 < \sum_{n \in \psi^{-1}(\tilde n) \setminus \{\tilde n'\}} \ \sum_{b^i : (\tilde n, n) \in S(b^i)} x_t^i(b^i) \ \le\ \frac{1}{2} \min\Big\{ \frac{\varepsilon}{2}, \mu' \Big\}, \qquad (F.4)$$
where node ñ′ is defined in Tree Operation 2 with respect to ñ in the game Γ̄. This can be done if in the µ′-dynamic we always make the node ñ reached in the chosen target state in CR(x_t[µ′]).
We first take a µ′-dynamic (x̂_t)_{0≤t≤t_0} with x̂_0 = x_0, µ′ = µ/2, and property (F.4). Put x̄_0 = q(x̂_{t_0}), where q is defined in Step II of the proof of Theorem 4.2. We apply the induction hypothesis in Γ̄ with respect to ε′ = ε/2, and obtain a µ/2-dynamic (x̄_t)_{t≥0} from the initial state x̄_0 and the associated T_Γ̄ with property (F.3). By the definition of the µ/2-dynamic in Γ̄, at any time t, ẋ̄^l|_t = ḡ_t^l − x̄_t^l, where ḡ_t^l ∈ CR^l(x̄_t[µ/2]) for all players l in Γ̄. From (F.4), we compare the game Γ with Γ̄, and construct a µ-dynamic (x_t)_{t≥0} in Γ such that
1. x_t = x̂_t for all t with 0 < t ≤ t_0;
2. ẋ^j|_t = ḡ^j_{t−t_0} − x_t^j for all players j ≠ λ(ñ) and for all t > t_0;
3. ẋ^i|_t = g_t^i − x_t^i, in which i = λ(ñ) and q(g_t)^i = ḡ_t^i, for all t > t_0.
From (F.3), we then infer that x_t ∈ BIE[ε] for all t > t_0 + T_Γ̄.
References
Aubin, J., and A. Cellina (1984): Differential Inclusions. Berlin: Springer.
Aumann, R. J. (1995): "Backward Induction and Common Knowledge of Rationality," Games and Economic Behavior, 8 (1), 6–19.
Balkenborg, D., C. Kuzmics, and J. Hofbauer (2013): "Refined Best-Response Correspondence and Dynamics," Theoretical Economics, 8 (1), 165–192.
Benaim, M., J. Hofbauer, and S. Sorin (2005): "Stochastic Approximations and Differential Inclusions, Part II: Applications," Mathematics of Operations Research, 31, 673–695.
Benaim, M., and J. W. Weibull (2003): "Deterministic Approximation of Stochastic Evolution in Games," Econometrica, 71, 873–903.
Ben Porath, E. (1997): "Rationality, Nash Equilibrium and Backward Induction in Perfect Information Games," Review of Economic Studies, 64, 23–46.
Berger, U. (2005): "Fictitious Play in 2×n Games," Journal of Economic Theory, 120, 139–154.
Cressman, R. (2003): Evolutionary Dynamics and Extensive Form Games. Cambridge (MA): MIT Press.
Cressman, R., and K. H. Schlag (1998): "The Dynamic (In)Stability of Backward Induction," Journal of Economic Theory, 83, 260–285.
Gilboa, I., and A. Matsui (1991): "Social Stability and Equilibrium," Econometrica, 59, 859–867.
Gorodeisky, Z. (2006): "Evolutionary Stability for Large Populations and Backward Induction," Mathematics of Operations Research, 31, 369–380.
Hart, S. (1992): Games in Extensive and Strategic Forms, in Handbook of Game Theory with Economic Applications, Edition 1, Volume 1, ed. by R. J. Aumann and S. Hart. North Holland: Elsevier, 19–40.
——— (2002): "Evolutionary Dynamics and Backward Induction," Games and Economic Behavior, 41, 227–264.
Hart, S., and A. Mas-Colell (2003): "Regret-Based Continuous-Time Dynamics," Games and Economic Behavior, 45, 375–394.
——— (2003): "Uncoupled Dynamics Do Not Lead to Nash Equilibrium," American Economic Review, 93, 1830–1836.
Hendon, E., H. J. Jacobsen, and B. Sloth (1996): "Fictitious Play in Extensive Form Games," Games and Economic Behavior, 15, 177–202.
Hofbauer, J. (1995): "Stability for the Best Response Dynamics," mimeo, University of Vienna.
Hofbauer, J., and W. H. Sandholm (2002): "On the Global Convergence of Stochastic Fictitious Play," Econometrica, 70, 2265–2294.
Hofbauer, J., and K. Sigmund (1998): Evolutionary Games and Population Dynamics. Cambridge: Cambridge University Press.
Kreps, D., and R. Wilson (1982): "Sequential Equilibria," Econometrica, 50, 863–894.
Kuhn, H. W. (1950): "Extensive Games," Proceedings of the National Academy of Sciences, 36 (10), 570–576.
——— (1953): Extensive Games and the Problem of Information, in Contributions to the Theory of Games II, ed. by H. W. Kuhn and A. W. Tucker, Annals of Mathematics Studies, Vol. 28. Princeton: Princeton University Press, 193–216.
Kuzmics, C. (2004): "Stochastic Evolutionary Stability in Extensive Form Games of Perfect Information," Games and Economic Behavior, 48 (2), 321–336.
Matsui, A. (1989): "Social Stability and Equilibrium," CMS-DMS No. 819, Northwestern University.
Nash, J. (1950): "Equilibrium Points in N-person Games," Proceedings of the National Academy of Sciences of the United States of America, 36, 48–49.
Nöldeke, G., and L. Samuelson (1993): "An Evolutionary Analysis of Backward and Forward Induction," Games and Economic Behavior, 5, 425–454.
Pearce, D. G. (1984): "Rationalizable Strategic Behavior and the Problem of Perfection," Econometrica, 52, 1029–1050.
Perea, A. (2014): "Belief in the Opponents' Future Rationality," Games and Economic Behavior, 83, 231–254.
Ritzberger, K. (2002): Foundations of Non-Cooperative Game Theory. Oxford: Oxford University Press.
Sandholm, W. H. (2010): Population Games and Evolutionary Dynamics. Cambridge (MA): MIT Press.
Selten, R. (1965): "Spieltheoretische Behandlung eines Oligopolmodells mit Nachfrageträgheit," Zeitschrift für die gesamte Staatswissenschaft, 121, 301–324.
——— (1975): "Re-examination of the Perfectness Concept for Equilibrium Points in Extensive Games," International Journal of Game Theory, 4, 25–55.
Young, H. P. (2009): "Learning by Trial and Error," Games and Economic Behavior, 65, 626–643.