Equilibrium Uniqueness with Gradual and Noisy
Adjustment∗
Ryota Iijima and Akitada Kasahara†
September 3, 2016
Abstract
We study a model of gradual adjustment in games, in which players can flexibly adjust
and monitor their positions up until a deadline. Players’ terminal and flow payoffs are
influenced by their positions. We show that, unlike in one-shot games, the equilibrium is
unique for a broad class of terminal payoffs when players’ actions are flexible enough in
the presence of (possibly small) noise. In a team-production model, the unique equilibrium approximately selects the efficient outcome when adjustment friction is small. We
also examine the welfare implications of such gradualness in applications, including team
production, hold-up problems, and dynamic contests.
∗
Results in Section 5 are based on our previous work “Gradual Adjustment and Equilibrium Uniqueness
under Noisy Monitoring,” which was awarded the 18th Moriguchi Prize by the Institute of Social and Economic
Research of Osaka University (Iijima and Kasahara, 2015a). We thank Drew Fudenberg and Akihiko Takahashi,
as well as Nageeb Ali, Ian Ball, Darrell Duffie, Mira Frick, George Georgiadis, Ben Golub, Johannes Hörner,
Yuhta Ishii, Matthew Jackson, Yuichiro Kamada, Kazuya Kamiya, Michihiro Kandori, David Laibson, Nicolas
Lambert, Rida Laraki, Jonathan Libgober, Bart Lipman, Chiara Margaria, Eric Maskin, Akihiko Matsui,
Stephen Morris, Nozomu Muto, Wojciech Olszewski, Daisuke Oyama, Larry Samuelson, Yuliy Sannikov, Andy
Skrzypacz, Bruno Strulovici, Tomasz Strzalecki, Takuo Sugaya, Masashi Toda, Zhe Wang, Daichi Yanagisawa,
Jeff Zwiebel, and seminar/conference participants at various places. All remaining errors are ours.
†
Iijima: Cowles Foundation for Research in Economics, Yale University, Email:[email protected], and
Kasahara: Graduate School of Business, Stanford University, Email:[email protected].
1 Introduction
In a variety of economic settings, strategic interactions have often been modeled as simple
one-shot decisions. Under this approach, time is not explicitly modeled and the outcome of a
game is immediately determined by the players’ once-and-for-all and simultaneous decisions.
However, this approach abstracts away from the fact that players often take a certain amount of
time to finalize their actions in real-world settings, such as investment or production. In these
situations, players may want to gradually adjust their positions as additional noisy information
arises. This paper examines the implications of such gradualness in games, emphasizing the
contrast with the one-shot modeling.
As a motivating example, consider several firms engaged in a joint project with a fixed end
date, such as the development of a new product with a pre-determined release date. Each
firm continuously invests to improve the quality of the product, all the way up to the deadline.
Demand for the new product on the release date is affected by the overall quality of the product
as well as exogenous shocks reflecting market conditions. If firms simultaneously choose their
total investment levels at the start, then they face a coordination problem in the form of
equilibrium multiplicity, for example when there are strategic complementarities.
The main insight of this paper is that this problem of multiplicity can be resolved by
explicitly modeling gradual adjustment. We show that equilibrium multiplicity disappears when
firms can incrementally adjust their investment levels in response to (possibly small) stochastic
shocks. To formalize this, we consider a class of finite-horizon stochastic games which can
accommodate a variety of applications, such as strategic investment, oligopoly competition,
public projects, and R&D races. Each player is endowed with a publicly observable state
variable, which describes her position in the game at each point in time until a deadline, and
takes an action each period to influence its transition subject to a convex adjustment cost. The
costs can depend on the state variables (and can thus also be interpreted as a net flow payoff),
and there is a lump-sum payoff that depends on the final values of the state variables at the
deadline. This model is parameterized by the length ∆ of each period (while the total duration
of the game is fixed); as ∆ shrinks, the players adjust their state variables more gradually, as
their actions each period have a smaller effect on the states. This is in contrast to the one-shot
case in which each action changes the game outcome in a significant manner.
In Section 3, we show that the model has a unique equilibrium if ∆ is small enough, as
long as the size of the noise does not vanish too quickly. The condition requires that the
noise size is sufficiently larger than players’ adjustment speed at a local (i.e., per-period) level.
However, the total noise size in the game can vanish as ∆ → 0, i.e., the limiting model can be
deterministic.1 The uniqueness result holds for a general class of terminal payoff functions, even
if, for example, they admit strategic complementarities so that one-shot modeling can easily
generate multiple equilibria. In addition, the unique equilibrium has the Markovian property,
that is, each player’s adjustment at each moment in time depends only on the current time and
state and is independent of past histories.
As we discuss in detail in Section 3.2, we identify the following key properties that underlie the uniqueness result: (i) gradual adjustment; (ii) noisy state evolution; (iii) no myopic strategic interdependence; and (iv) deadline. If any one of these features is absent, we can
construct counterexamples where equilibrium multiplicity arises. Without gradualness (i.e., ∆
not small), each player’s optimal current action is sensitive to her belief about her opponents’
current actions. This substantial strategic interdependence creates the potential for equilibrium
multiplicity. However, we show that this issue vanishes when the period length is small in the
presence of noise. Here it is important not only that other’s current actions have a small impact
on the state but also that their impact is diluted by the noise term. Moreover, the players’
continuation values are uniquely determined by the terminal payoffs at the deadline. As we
will explain in Section 3.1, these observations make it possible to uniquely specify the players’
optimal actions and continuation values backward from the deadline under small period length
and some amount of noise.
In Section 4, we focus on a team-production model in which players share the same terminal
payoffs as a function of total output produced by the players. Note that in the absence of gradualness (i.e., when ∆ is not small) or noise, we cannot rule out the possibility that players might coordinate on producing an inefficient outcome. The efficiency loss from this coordination failure can be non-negligible even when the (adjustment) cost is arbitrarily small. We show that,
provided the costs are small, this unique equilibrium under noise is approximately first-best.
Intuitively, this is because some noise realizations can substitute for current efforts on the part
of the players and promote subsequent efforts on their part. When the state is sufficiently close
to the socially efficient outcome near the deadline, the players can coordinate on the efficient
outcome with small costs and noise, which will unravel any inefficient equilibrium because
the players will anticipate that even small noise fluctuations will trigger repeated coordination
toward the efficient outcome.
In Section 5, we consider a continuous-time model in which aggregate noise follows a Brownian motion. We prove equilibrium uniqueness in this continuous-time model, based on an application of a recent unique-solution result in backward stochastic differential equations (BSDEs). The equilibrium strategy is also Markovian. We derive partial differential equations that describe the evolution of continuation values (subject to a smoothness requirement).
1
We note that equilibrium multiplicity is possible if we analyze the same deterministic model directly in continuous time.
A particular advantage of the Brownian-noise model is tractability. We solve a couple of
specific examples, and examine particular channels through which the gradualness of players’
decisions can improve their welfare relative to the one-shot environment. In Section 5.4.2, we
study a hold-up problem, in which a seller can improve the quality of a good but the price of
the good is fixed ex-ante (independently of the quality). While the one-shot benchmark case
predicts no investment in quality, we show that in our setting with gradual adjustment the seller
will choose positive investment levels. This illustrates a form of “commitment power” embedded
in the gradualness of the state. We solve for the equilibrium in closed form to quantify the
welfare gains relative to the one-shot case. Section 5.4.3 studies a model of a dynamic contest,
in which players exert costly effort to improve the quality of their own project, and a winner
is chosen by comparing all players’ quality levels at the deadline. In contrast to the one-shot
case, the players have flexibility in adjusting their effort depending on the current position even
before the deadline. This flexibility allows players to save on effort costs in situations where it
becomes relatively easy to predict who will win the contest. Due to this effect, the total welfare
can be improved under the gradual setting relative to the one-shot game, and we compute the
welfare gain numerically.
Section 6 discusses a list of possible extensions of the model. While players’ positions
and their transitions are continuous in the main model, in some situations positions are better
described as discrete variables with infrequent jumps, for example, technology adoption, market
participation, technological breakthrough, etc. In Section 6.2, we consider such a variation of
the main model in which the state-variable process approximates a Poisson process in the limit.
We again establish an equilibrium uniqueness result when the period length ∆ is sufficiently
small. The intuition is much like in the main model: while the impact of players' per-period actions is small in magnitude under the main model, it is small in probability under the discrete-jump model.
All proofs are relegated to the Appendix. In addition, a supplementary appendix contains
results for variations of the main model, as well as an introduction to the mathematical results
utilized in the proofs.
1.1 Related Literature
There are several studies on models with a deadline in which players can adjust their positions
gradually subject to some form of friction. One approach to modeling friction is with adjustment
costs, as in our paper. Caruana and Einav (2008b) consider a model of endogenous commitment
power in which the cost is increasing with time until the deadline.2 Their main focus is not on
coordination problems per se, and the model assumes a finite-period sequential-move structure,
which makes it possible to apply backward induction. In contrast, there is no such exogenously
specified ordering of players’ actions within each period in our model. Relatedly, Lipman and
Wang (2000) study finitely-repeated games with switching costs and short period length. They
obtain an equilibrium selection result in binary-coordination games while the equilibrium is
not unique for general payoffs. Caruana and Einav (2008a) study a continuous-time Cournot
competition model in which firms change the target production level before the market opens
subject to an adjustment cost. In contrast to the papers in this literature, we introduce (small) stochastic shocks to the adjustment process, which turn out to remove equilibrium multiplicity for a very general class of terminal payoffs.
Another typical modeling of friction is via stochastic timing of adjustment opportunities,
usually formulated with Poisson clocks. In Kamada and Kandori (2015), players’ Poisson
clocks are correlated, and, in contrast to our uniqueness results, they show that a broad range
of final outcomes can be sustained as equilibria even if the one-shot benchmark has a unique
equilibrium. Calcagno, Kamada, Lovo, and Sugaya (2014) consider independent Poisson clocks
and show that a unique equilibrium selects a Pareto efficient outcome for a certain class of
coordination games.3 Their efficiency selection result is consistent with our efficiency results
(though we impose a stronger symmetry assumption). A special case of our discrete-jump model with small adjustment costs can be seen as an approximation of the independent Poisson-clock models. Thus, one of the points our paper makes to this literature is that equilibrium uniqueness can obtain for general payoffs if players incur (possibly small) adjustment costs.
A main point of our paper is that local noise can smooth out strategic sensitivity. This is
reminiscent of the literature on global games and their variants such as Carlsson and van Damme
(1993), Frankel and Pauzner (2000), and Burdzy, Frankel, and Pauzner (2001). The literature
emphasizes the role of payoff/belief perturbations in equilibrium uniqueness,4 although the
logic for equilibrium uniqueness is different from ours. While the literature relies on strategic
complementarities and the existence of dominance regions, our result holds for a general class
of terminal payoffs.
2
Caruana, Einav, and Quint (2007) applied this framework to bargaining.
3
They provide an example of multiple equilibria in different classes of games. Lovo and Tomala (2015) prove the existence of Markov perfect equilibria in a more general framework, and Gensbittel, Lovo, Renault, and Tomala (2016) study zero-sum games. There are also papers that focus on specific applications such as bargaining (Ambrus and Lu, 2015) and auctions (Ambrus, Burns, and Ishii, 2014).
4
See Morris and Shin (2006) for a unified treatment of the various approaches in the literature including private signals, timing frictions, and local interactions.
The analysis in Section 5 demonstrates useful applications of BSDEs to dynamic games in
continuous time. It is known that if players control the drift of publicly observable diffusion
processes, then, for a fixed strategy profile, their continuation value can be described by a
sensitivity process based on the martingale representation theorem (e.g., DeMarzo and Sannikov
(2006)). In particular, when the game has a deadline, the continuation value process solves a
BSDE with a fixed terminal time (Hamadene and Lepeltier, 1995). This representation method,
either for a finite or infinite horizon, is useful because it enables a characterization of incentive
compatibility using the sensitivity process. In this paper we highlight another advantage of
this approach by using it to prove equilibrium uniqueness. This requires, first of all, a suitable
restriction of the class of games that makes it possible to write down BSDEs that describe
continuation values under all the equilibrium strategy profiles. This step involves a technical
subtlety because this leads to multidimensional non-Lipschitz BSDEs that do not have a unique
solution in general. We handle this issue by adapting a refinement of a recent result in stochastic
analysis by Cheridito and Nam (2015). Generally speaking, the approach taken here suggests
a useful recipe to prove equilibrium uniqueness beyond the particular models in this paper.5 In
addition, we apply the comparison principle of BSDEs to conduct comparative statics.
The model in Section 5 can be seen as a particular class of (stochastic) differential games
(see Dockner, Jorgensen, Long, and Sorger (2001) for a survey). In important economic settings such as capital accumulation or duopoly with sticky prices, there typically exist many
equilibria (Fudenberg and Tirole, 1983; Tsutsui and Mino, 1990). As observed in the literature,
non-smoothness (or discontinuities) of strategies is an important source of multiplicity (see also Section 3.1 in our context). Our results contribute to the literature by providing a uniqueness
result under Brownian noise. Papavassilopoulos and Cruz (1979) show that, in a class of deterministic models similar to ours, an equilibrium is unique within the class of Markov strategies
that are analytic functions of the state (if such an equilibrium exists).6 The approach using
BSDEs in this paper allows us to show equilibrium uniqueness among all strategies, including
those that are non-smooth and/or non-Markovian.7
5
To the best of our knowledge, this paper is the first to use BSDEs to obtain equilibrium uniqueness in
continuous-time games.
6
Wernerfelt (1987) obtains an analogous result in a stochastic setting. That is, equilibrium is unique within the class of Markov strategies that are C² in the state variables (if such an equilibrium exists).
7
There are more recent papers in the continuous-time game literature that examine the properties of Brownian signals in sustaining cooperation under imperfect monitoring (Fudenberg and Levine, 2007, 2009; Sannikov
and Skrzypacz, 2007; Faingold and Sannikov, 2011; Bohren, 2016). In contrast to their results, the current
paper proves equilibrium uniqueness even under perfect monitoring.
2 Model
We consider a model with a finite set of players N = {1, 2, . . . , n}, where each player has control over her own endowed “state.” As we will explain, the interpretation of the state variable
can vary according to the application, representing for instance accumulated capital, project
quality, investment positions in markets, or price levels. To formalize the idea that each player
can gradually adjust her state, we analyze a model with discrete periods t = 0, ∆, 2∆, ..., T
parameterized by the period length ∆ > 0. We assume for notational convenience that T is
divisible by ∆. Specifically, let X_t^i ∈ R denote player i's state at time t, which follows a process of the form
\[
X_{t+\Delta}^i = X_t^i + \Delta\left(\mu_t^i(X_t^i) + A_t^i\right) + r_\Delta\,\varepsilon_t^i, \tag{1}
\]
given the exogenously fixed initial state X_0^i, where (ε_t^i)_{t=0,...,T−∆} is a sequence of random variables that are identically and independently distributed with density φ^i. Here, r_∆ is the noise scaling factor which, as we will discuss, should shrink as ∆ → 0. In each period t = 0, ..., T − ∆, player i chooses her action A_t^i from a compact interval A^i after observing the past states (X_s^j)_{s=0,...,t−∆, j∈N} and actions (A_s^j)_{s=0,...,t, j∈N} of every player. Player i's action A_t^i plus the function µ_t^i : R → R determine the drift of X_t^i. Let X_t = (X_t^i)_{i=1,...,n} and ε_t = (ε_t^i)_{i=1,...,n} denote,
respectively, the state vector and noise vector in period t. Depending on the application, the noise terms capture either stochasticity in the environment that players face and/or slight imperfection of implementation ability (“trembling hand”) on the part of players (see also Section 6.2, which considers a model with discrete state variables).
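To fix ideas, the transition rule (1) is straightforward to simulate. The sketch below is a minimal illustration, not part of the model: the zero drift µ, the constant action path, the standard-normal noise, and the scaling r_∆ = ∆^{3/4} are all our own assumptions.

```python
import numpy as np

# Illustrative simulation of the state transition (1):
#   X_{t+dt} = X_t + dt * (mu(X_t) + A_t) + r_dt * eps_t.
# The drift mu, the constant action path, the normal noise, and the
# scaling r_dt = dt**0.75 are our own illustrative choices.
def simulate_state(x0, T=1.0, dt=0.01, mu=lambda x: 0.0, action=0.5, seed=0):
    rng = np.random.default_rng(seed)
    r_dt = dt ** 0.75          # shrinks as dt -> 0, yet r_dt / dt -> infinity
    n_steps = int(round(T / dt))
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        eps = rng.standard_normal()
        x[k + 1] = x[k] + dt * (mu(x[k]) + action) + r_dt * eps
    return x

path = simulate_state(x0=0.0)
print(len(path), path[-1])
```

With this scaling, each per-period increment is dominated by the noise term, even though (as discussed in Section 3) the aggregate noise over [0, T] can still vanish as ∆ → 0.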
Player i's total payoff is
\[
U^i(X_T) - \sum_{t=0}^{T-\Delta} c^i(A_t^i)\,\Delta, \tag{2}
\]
where U^i : R^n → R is the terminal payoff as a function of the vector of players' states, and c^i : A^i → R is a flow payoff function that includes the adjustment cost of changing states.
We specialize to a relatively simple setup to minimize notation, but Theorem 1, as well as Proposition 1 in Section 4, holds more generally. For example, c^i can depend on X_t, so that
it can be interpreted as flow payoffs instead of an adjustment cost. We discuss a list of possible
extensions of the model in Section 6. The section also considers a discrete-jump model where
state variables are discrete (Section 6.2).
Example 1 (Team production/project). Consider a team of workers who exert costly effort
to produce output, where Xti denotes the output (or, the quality of a project) produced by
individual i, which is also subject to randomness in the production process. In the last period,
player i receives U^i(X_T) = U(Σ_j X_T^j), where U(·) is the team payoff as a function of the total team output Σ_j X_T^j. Each player also incurs effort cost c^i from producing the individual output every period. Note that there are strategic complementarities in workers' output choices when U is (locally) convex.
Example 2 (Currency attack). Consider a version of the currency-attack model in Morris and Shin (1998) embedded in a real timeline. Multiple speculators try to attack the currency regime. At the terminal period, the currency regime collapses with probability P(Σ_j X_T^j), where Σ_j X_T^j is the aggregate short position accumulated by the speculators. The more the speculators sell the currency, the more likely the currency regime is to collapse at the end of the game, which can create equilibrium multiplicity as emphasized in the literature. The terminal payoff U^i(X_T) represents the expected return of i conditional on the final positions X_T. The convex adjustment costs c^i can be interpreted as portfolio adjustment costs under limited liquidity.
Example 3 (Investment in oligopoly). n firms decide how much capital A_t^i to invest each period to improve their dominance in the market. X_t^i denotes the total amount of investment made by firm i, which increases the flow revenue i receives each period (note that flow payoffs can depend on X_t; see Section 6.1). This fits the framework analyzed in Athey and Schmutzler (2001), which
is applied to several examples such as consumers with inertia (where Xti corresponds to the
number of “loyal customers”) and learning-by-doing (where Xti corresponds to the accumulated
experience that serves to lower the production cost). Caruana and Einav (2008a) consider a
related deterministic model in which competing firms revise production targets Xti subject to
adjustment costs.
Example 4 (Contest). Players compete for a fixed prize, and the value of the prize does not
depend on their effort. One example is an internal labor market where colleagues compete for
a promotion. The boss gives a promotion to the employee whose project receives the highest
evaluation. Each player chooses her costly effort level Ait at each time to improve the quality of
her project Xti , and the boss evaluates performance based on the final quality of each player’s
project at deadline T , XTi .
We assume the following conditions on U i , ci , µit and it .
Assumptions.
1. U i is bounded and Lipschitz continuous.8
8
For the uniqueness result (Theorem 1), it is sufficient to require U i to be locally Lipschitz continuous in Xi .
2. ci is continuously differentiable and strictly convex.
3. µit : R → R is Lipschitz continuous.
4. φ^i : R → R is twice continuously differentiable, and its first-order derivative is absolutely integrable, i.e., ∫_{−∞}^{∞} |(φ^i)′(ε)| dε < ∞.
We note that the last condition is satisfied by many standard parametric distributions, such
as the normal distribution. In addition, the condition is satisfied as long as the support of the
noise is compact and the density is continuously differentiable.
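As a quick sanity check on Assumption 4 (our own numerical illustration), for the standard normal density φ(x) = e^{−x²/2}/√(2π) we have φ′(x) = −xφ(x), so ∫|φ′(x)| dx = 2φ(0) = 2/√(2π) ≈ 0.798, which the trapezoidal approximation below confirms.

```python
import math

# Assumption 4 for the standard normal density: phi'(x) = -x * phi(x),
# so the integral of |phi'| equals 2 * phi(0) = 2 / sqrt(2*pi) < infinity.
phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
dphi = lambda x: -x * phi(x)

# Trapezoidal approximation of the integral of |phi'| over [-10, 10];
# the tails beyond this range are negligible.
n, lo, hi = 200_000, -10.0, 10.0
h = (hi - lo) / n
xs = [lo + k * h for k in range(n + 1)]
vals = [abs(dphi(x)) for x in xs]
integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

print(integral, 2 / math.sqrt(2 * math.pi))  # both ≈ 0.7979
```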
Remark 1. We mainly consider the static benchmark to be the case ∆ = T , so that players
make choices only once. Alternatively, one can consider a static game in which player i directly
chooses X i to receive payoff U i (X) without any cost. Our model with (1)-(2) can be seen as
an elaboration of this alternative static benchmark as well, in particular when the adjustment costs c^i are small. Under this interpretation, c^i is best interpreted as a switching cost that could arise due to various cognitive or physical constraints.
3 Equilibrium Uniqueness
The following is our main equilibrium uniqueness result.
Theorem 1. There exists M > 0 such that if lim_{∆→0} r_∆/∆ ≥ M, then for all sufficiently small ∆ > 0, there is a unique subgame-perfect equilibrium. Moreover, the strategies in this equilibrium are Markov (i.e., functions of (t, X_t) only) and pure.
To understand the condition lim_{∆→0} r_∆/∆ ≥ M in the result, note that in each period, the impact of each player's action is proportional to ∆, while the impact of noise is r_∆. Thus the condition means that, at a local level, the importance of players' actions relative to noise is bounded. However, this does not mean that the outcome of the game is mostly driven by stochastic noise realizations rather than players' actions. It is indeed possible that the aggregate noise level r_∆ Σ_{t=0}^{T} ε_t converges to 0 as ∆ → 0 when lim_∆ r_∆/√∆ = 0. On the other hand, when r_∆ is proportional to √∆, the noise process converges to a Brownian motion. We will study this continuous-time limit model with Brownian noise in Section 5.
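The scaling condition can be made concrete with a back-of-the-envelope computation. Below, r_∆ = ∆^{3/4} is our own illustrative choice: it satisfies lim_{∆→0} r_∆/∆ = ∞ (so the local condition of Theorem 1 holds for any M), while the standard deviation of the aggregate noise, r_∆√(T/∆) for i.i.d. standard-normal shocks, still vanishes as ∆ → 0.

```python
import math

# With r_dt = dt**0.75, the per-period noise dominates the per-period action
# (r_dt / dt -> infinity), yet the aggregate noise over [0, T] vanishes:
# for T/dt i.i.d. standard-normal shocks, sd(r_dt * sum eps_t) = r_dt * sqrt(T/dt).
T = 1.0
for dt in [0.1, 0.01, 0.001, 0.0001]:
    r = dt ** 0.75
    local_ratio = r / dt                # importance of noise vs. actions per period
    total_sd = r * math.sqrt(T / dt)    # equals dt**0.25 here
    print(f"dt={dt:>7}: r/dt={local_ratio:8.3f}, aggregate noise sd={total_sd:.4f}")
```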
The highlight of this uniqueness result is that we do not impose substantial restrictions on
U i . The one-shot benchmark with ∆ = T (and the alternative benchmark described in Remark
1) can easily have multiple equilibria, for example when U i admits strategic complementarities.
Theorem 1 claims that this equilibrium multiplicity vanishes generally when players adjust
state variables gradually with stochastic noise. In the next subsection, we will give intuition
for this result with a simple example.
3.1 Illustrative Example
In this subsection, we illustrate the intuition behind Theorem 1 with a team-project example (Example 1). Assume for simplicity µ(·) = 0 and a form of “equity partnership” in which U^i(X_T) = max{0, Σ_j X_T^j} holds (in a large interval that contains [X_0, X_0 + N T]). That is, players receive a positive benefit if their total output is positive and zero payoff otherwise (Figure 1). Also suppose A = [0, 1], c is strictly increasing with c(0) = 0, and T < −Σ_j X_0^j < N T. With this simple example, we start with a static model, then allow players to gradually adjust their actions, and finally add noise to the state variable.
Figure 1: Terminal payoff under equity partnership
Static case: Consider the static benchmark case of ∆ = T. That is, player i chooses effort level A_0^i only once and the final output is determined by Σ_j (X_0^j + T A_0^j). Set the noise level to zero, r_∆ = 0, to simplify the calculation. If c′(1) < 1, then there exists a Nash equilibrium with A_0^i = 1 for every i. The symmetric equilibrium payoff is Σ_j X_0^j + N T − T c(1) > 0. On the other hand, there is always a Nash equilibrium in which A_0^i = 0 for every i. In this case, the symmetric equilibrium payoff is 0. Note that the equilibrium outcome in the latter case is Pareto dominated by that in the former case.
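Both equilibria of this static benchmark can be checked numerically. The sketch below uses our own illustrative parameters, chosen to satisfy the stated conditions: N = 3, T = 1, c(a) = 0.4a² (so c′(1) = 0.8 < 1 and c(0) = 0), and Σ_j X_0^j = −2 (so T < −Σ_j X_0^j < NT). A grid search verifies that full effort and zero effort are each mutual best responses.

```python
import numpy as np

# Illustrative parameters (our own choices) satisfying the paper's conditions.
N, T = 3, 1.0
X0_sum = -2.0                      # T < -X0_sum < N*T
c = lambda a: 0.4 * a ** 2         # c'(1) = 0.8 < 1, c(0) = 0, strictly convex

def payoff(a_i, a_others_sum):
    """Player i's payoff in the static (Delta = T) benchmark with r_Delta = 0."""
    total = X0_sum + T * (a_i + a_others_sum)
    return max(0.0, total) - T * c(a_i)

grid = np.linspace(0.0, 1.0, 1001)

def best_response(a_others_sum):
    return grid[np.argmax([payoff(a, a_others_sum) for a in grid])]

br_high = best_response((N - 1) * 1.0)  # opponents all play 1 -> best reply 1
br_low = best_response(0.0)             # opponents all play 0 -> best reply 0
print(br_high, br_low)
```

At the full-effort profile a single deviation keeps total output positive, so the marginal benefit of effort (one unit per unit of effort) exceeds the marginal cost; at the zero-effort profile no unilateral deviation can push total output above zero, so effort is pure cost.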
Deterministic case: Equilibrium multiplicity similar to the static case remains even under gradual adjustment (i.e., ∆ ≈ 0) without noise (i.e., r_∆ = 0). For N ≥ 3 and for any value θ ∈ (−N + 1, −1), one can verify that the strategy profile in which player i chooses
\[
A_t^i = \begin{cases} 1 & \text{if } \sum_j X_t^j \ge (T-t)\theta, \\ 0 & \text{if } \sum_j X_t^j < (T-t)\theta, \end{cases}
\]
is a subgame-perfect equilibrium (for the details, see Section A in the Appendix). If the initial state is such that Σ_j X_0^j ≥ T θ, the players always choose the maximal effort A_t^i = 1 on the equilibrium path, and they achieve the equilibrium payoff Σ_j X_0^j + T N − T c(1) > 0. If not,
there is no effort at all, that is, A_t^i = 0, and the equilibrium payoff is 0. The logic here is analogous to the static case; players coordinate on either being productive or not, depending on whether the current state is higher than the threshold (T − t)θ, and no single player is able to influence other players' actions by any deviation. Note that the players' continuation values are discontinuous when the state X_t is on the threshold; thus we cannot uniquely pin down the value of θ, which leads to a continuum of multiple equilibria.9 Figures 2 and 3 illustrate two different equilibria, of which the former Pareto dominates the latter.
Figure 2: A “good” Markov perfect equilibrium. Σ_i X_0^i is located to the right of the threshold (dotted line), and team members choose A_t^i = 1 for all t. Since c′(1) < 1, the marginal benefit of effort always exceeds the marginal cost.
Main case: Now let us introduce stochastic noise and explain heuristically why equilibrium multiplicity vanishes with frequent adjustment and stochastic noise. First observe that the above multiplicity still remains if the noise size r_∆ is small. In the above example, small noise does not prevent players from perfectly coordinating on either full effort or no effort based on the threshold strategy. However, this is not the case if the noise size r_∆ is sufficiently larger than ∆; it becomes possible that some noise realizations push the state X_t across the threshold, which changes players' coordination incentives. Intuitively speaking, given that the ratio r_∆/∆ is sufficiently large, each player at a subgame is mostly reacting to the stochastic shock, and coordinating with other players' actions in the same period becomes relatively less important.
To elaborate on the above point about vanishing coordination incentives, consider a player’s
9
It is worthwhile to clarify that the multiplicity under the deterministic model does not rely on the kink in
the terminal payoff. Even when the terminal payoff is smooth, continuation values can be discontinuous due to
players’ strategic choice. Also, the point that discontinuities in continuation values can give rise to multiplicity
has been observed in the literature, for example by Fudenberg and Tirole (1983) in an infinite-horizon strategic investment game.
Figure 3: A “bad” Markov perfect equilibrium. Σ_i X_0^i is located to the left of the threshold (dotted line), and team members choose A_t^i = 0 for all t. Since a single player's deviation cannot move the total output over the threshold to the right region, the marginal benefit of effort is always zero.
incentive at the start of the last period t = T − ∆, which is to choose A_t^i to maximize the value of the subgame payoff
\[
V_t^{i,\Delta}(A_t) := -c^i(A_t^i)\,\Delta + \int U^i\left(X_t + A_t \Delta + r_\Delta \varepsilon_t\right)\varphi(\varepsilon_t)\, d\varepsilon_t
\]
given the opponents' actions (A_t^j)_{j≠i}. To examine their coordination incentives under small ∆, for convenience we now assume that U^i is twice differentiable with derivatives that are bounded. Then we can compute the strategic interaction effect by directly taking the cross derivative of the form
\[
\frac{\partial^2}{\partial A_t^i\,\partial A_t^j} V_t^{i,\Delta}(A_t) = \int (U^i)''\left(X_t + A_t \Delta + r_\Delta \varepsilon_t\right)\varphi(\varepsilon_t)\, d\varepsilon_t \cdot \Delta^2.
\]
Intuitively speaking, this is because each player's action influences the state to first order in ∆, while the interaction among those influences is of higher order (∆²). On the other hand, the second-order partial derivative ∂²V_t^{i,∆}/∂(A_t^i)² is on the order of ∆ because of the strict convexity of the adjustment cost, ∆c′′(A_t^i). The first-order condition implicitly defines i's best response against the opponents' actions at the subgame, and by the implicit function theorem, its slope is bounded by
\[
\left|\frac{\partial^2 V_t^{i,\Delta}}{\partial A_t^i\,\partial A_t^j}\right| \Big/ \left|\frac{\partial^2 V_t^{i,\Delta}}{\partial (A_t^i)^2}\right|. \tag{3}
\]
This ratio is close to 0 when ∆ is small, meaning that each player’s incentive is nearly insensitive
to her opponents’ actions, and thus the equilibrium actions at time t are uniquely determined
by the contraction mapping theorem.
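The order-of-magnitude argument can be checked numerically. The sketch below is our own illustration, not the paper's construction: it takes a smooth terminal payoff U(x) = sin(x) of the aggregate state, quadratic cost c(a) = 0.4a², standard-normal noise handled by Gauss–Hermite quadrature, and r_∆ = ∆, then estimates the cross and own second derivatives of V_t^{i,∆} by finite differences and confirms that the best-response slope bound (3) shrinks with ∆.

```python
import numpy as np

# Last-period subgame value for player i (two players, aggregate state x0):
#   V(a_i, a_j) = -c(a_i)*dt + E[ U(x0 + (a_i + a_j)*dt + r_dt*eps) ],  eps ~ N(0,1).
# U, c, x0, and r_dt = dt are illustrative choices; the Gaussian expectation
# is computed by Gauss-Hermite quadrature.
U = np.sin
c = lambda a: 0.4 * a ** 2
x0 = 0.5
nodes, weights = np.polynomial.hermite.hermgauss(60)

def V(a_i, a_j, dt):
    r = dt  # local noise ratio r_dt/dt held fixed at 1
    e_term = np.sum(weights * U(x0 + (a_i + a_j) * dt + r * np.sqrt(2.0) * nodes))
    return -c(a_i) * dt + e_term / np.sqrt(np.pi)

def br_slope_bound(dt, a=0.5, h=1e-2):
    # Finite-difference cross and own second derivatives at (a, a).
    cross = (V(a+h, a+h, dt) - V(a+h, a-h, dt)
             - V(a-h, a+h, dt) + V(a-h, a-h, dt)) / (4 * h * h)
    own = (V(a+h, a, dt) - 2 * V(a, a, dt) + V(a-h, a, dt)) / (h * h)
    return abs(cross / own)  # the bound (3)

for dt in [0.2, 0.05, 0.0125]:
    print(dt, br_slope_bound(dt))  # the bound shrinks roughly linearly in dt
```

Consistent with the text, the cross derivative scales like ∆² while the own second derivative scales like ∆, so the ratio vanishes linearly in ∆ and each player's best response becomes nearly insensitive to opponents' current actions.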
The basic idea is to repeat the same line of argument backward from the last period T − ∆ by replacing the terminal payoff U^i with the expected continuation payoff, denoted by W_{t+∆}^i, but this involves subtleties. While U^i was assumed to have second derivatives uniformly bounded in the previous paragraph, we cannot impose such a restriction here because continuation values vary endogenously; they can be discontinuous due to players' strategic incentives (in general, even when U^i is smooth).10 However, with the presence of noise, the expected continuation value ∫ W_{t+∆}^i(X_t + A_t∆ + r_∆ε_t) φ(ε_t) dε_t becomes smooth in (A_t^j)_{j∈N}. The logic here is that one can think of each action A_t^i as smoothly changing the density of ε_t^i, instead of deterministically changing the next state X_{t+∆} (a change of measure, formally speaking). Under the assumption that the local noise ratio r_∆/∆ is large, players' actions have only a limited impact on the noise densities, and thus we can derive bounds on the strategic interaction term analogous to those on (3) for all small enough ∆ > 0.
3.2
Key Assumptions for the Uniqueness
In this subsection, we identify and elaborate on four key assumptions behind Theorem 1:
(i) Gradual adjustment
(ii) Noisy state evolution
(iii) No instantaneous strategic dependence
(iv) Deadline
Gradual adjustment (i) means that a player’s decision at each time has only a small influence
on the value of the state variable. More specifically, this influence is of order ∆.11 If not,
we cannot ensure uniqueness in general, for the same reason that a one-shot game can have
multiple equilibria.
For (ii), recall that the continuation values can be very steep or discontinuous in the state without noise, which can lead to a continuum of equilibria even when $\Delta$ is small (e.g., Section 3.1). With enough local noise (i.e., a large $r_\Delta/\Delta$), such strategic sensitivities are smoothed out.
By (iii), we mean that the cross derivatives of flow payoffs in players' actions are zero (or, at least, small). The model in this paper is motivated by situations in which players influence others' payoffs by changing states,12 rather than by directly changing flow payoffs. Introducing
10 Another issue is that we want the derivatives to be bounded uniformly in $\Delta$. This is more difficult to ensure because the number of periods explodes as $\Delta \to 0$.
11 In the case of the discrete-jump model (Section 6.2), this influence is also of order $\Delta$ in expectation, and we prove that there exists a unique equilibrium if $\Delta$ is small enough.
12 Note also that flow payoffs can depend on $X_t$ (Section 6.1).
cross-derivative terms in flow payoffs can clearly create coordination problems at the flow payoff
level.
In the presence of a deadline (iv), combined with (i)-(iii), we can conduct a "backward induction" argument to construct a unique equilibrium. While we can allow the length of the game $T$ to be stochastic, possibly influenced by players' actions and states, we require its distribution to be uniformly bounded. (See Section 6.1, and also Supplementary Appendix B.2 for an example of multiple equilibria in the case of unbounded distributions.)
4
Efficiency with Team Production
In this section, we focus on the special case where symmetric players play a team production model (Example 1). Specifically, all players share the same adjustment cost $c^i(A_t^i) := c(A_t^i)$ and terminal payoff $U^i(\sum_i X_T^i) := U(\sum_i X_T^i)$, with $c$ and $U$ satisfying the assumptions in Section 2.
As we have seen in Section 3.1, there can be multiple equilibria that are Pareto ranked, and the welfare loss of coordinating on an inefficient equilibrium is non-negligible even with small adjustment costs. We show that in this team production model, players can achieve approximate efficiency at some equilibrium under small frictions (noise and adjustment cost). Combined with the uniqueness result in the previous section, this provides a condition under which players always attain approximate efficiency.
Proposition 1. Consider the team production model. Then, for any $\epsilon > 0$, there exists $\delta > 0$ such that if $\max\{\sup|c|,\, r_\Delta^2/\Delta\} \le \delta$, there is a subgame-perfect equilibrium in which the expected payoff of any player is greater than
$$U\Big(\max_{X \in \prod_{i\in N}(X_0^i + TA)} \sum_i X^i\Big) - \epsilon. \qquad (4)$$
That is, in (4) players approximately achieve the best terminal payoff among the outcomes that are collectively achievable (under the deterministic environment).13 The key step of the proof of Proposition 1 involves constructing a single-agent optimization problem in which a (hypothetical) planner chooses all the actions $(A_t^i)_{i\in N,\,t=0,\Delta,\dots,T-\Delta}$ to maximize $E[U(X_T) - \Delta\sum_{i,t} c(A_t^i)]$. We show that this value provides a lower bound for an equilibrium. Under small costs and noise, the optimal value in this planner's problem approximates (4).
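The planner's problem can be illustrated numerically. The following is our own toy version under stated assumptions (a hypothetical terminal payoff $U$, symmetric constant actions, and the deterministic small-noise limit); it shows the planner's value approaching the bound in (4) as the cost is scaled down.

```python
import numpy as np

def planner_value(cost_scale, N=2, T=1.0, X0=0.0, a_max=1.0):
    # Planner's problem for a symmetric team (deterministic, small-noise limit):
    # choose a common constant action a to maximize U(sum of X_T^i) - total cost.
    # U and c below are illustrative choices, not taken from the paper.
    U = lambda x: np.sin(0.5 * x) + 0.01 * x
    c = lambda a: cost_scale * a**2 / 2
    grid = np.linspace(0.0, a_max, 2001)
    values = U(N * (X0 + T * grid)) - N * T * c(grid)
    return values.max()

# Analogue of bound (4): terminal payoff at the best collectively achievable state.
bound = np.sin(0.5 * 2.0) + 0.01 * 2.0   # U evaluated at N*(X0 + T*a_max) = 2
print(bound - planner_value(1.0), bound - planner_value(0.001))
```

As the cost scale shrinks, the gap between the planner's value and the bound vanishes, mirroring the approximation step in the proof.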
Proposition 1 (efficiency) requires a small noise level $r_\Delta$ (relative to $\Delta$) to ensure small aggregate volatility of the game outcome. This is the opposite of Theorem 1 (uniqueness), where we needed enough noise (relative to $\Delta$) to smooth out strategic uncertainty. For an intermediate rate of noise, combining Theorem 1 and Proposition 1 yields the following corollary:
13 While we look at a small adjustment cost with a fixed terminal payoff, approximate efficiency (in terms of the ratio to the first-best payoff) also obtains when we look at a large terminal payoff relative to the cost.
Corollary 1. Consider the team production model. Assume that $\lim_{\Delta\to 0} r_\Delta/\Delta = \infty$ and $\lim_{\Delta\to 0} r_\Delta^2/\Delta = 0$. Then, for any $\epsilon > 0$, under sufficiently small $\Delta$, there is a unique subgame-perfect equilibrium, and its expected payoff for any player is greater than (4).
To informally explain why inefficient equilibria do not exist here, reconsider the example in Section 3.1. If we look at a period $t$ near the deadline at which the team output $\sum_i X_t^i$ is sufficiently high, every player finds it optimal to work hard. But they have no incentive to work if the output at $t$ is too low. At an earlier period $t' < t$, players have additional incentives to work harder because they anticipate that pushing up the team output could "encourage" other players to work harder at later periods (and this effect is guaranteed to be positive in the presence of noise). Given that the efficiency term in the terminal payoff $U$ is sufficiently large relative to the cost $c$, this effect accumulates as $|c|$ becomes small, so that the ex-ante equilibrium payoff becomes very efficient.
5
Continuous-time Limit with Brownian Noise
In this section, we analyze a continuous-time model with Brownian noise, which can be seen as a limit of the model in Section 3 as $\Delta \to 0$ where $r_\Delta$ is proportional to $\sqrt{\Delta}$.14
There are three main reasons for this exercise. First, Brownian noise is often used in applied work, and it is of independent interest to examine this case directly in continuous time. Second, while continuous-time models are usually computationally more tractable than discrete-time models, deterministic continuous-time models are in general (even under Markov strategy restrictions) not well-defined and lack equilibrium uniqueness (see footnote 16); introducing Brownian noise resolves both issues. Lastly, we stress a technical interest in applications of backward stochastic differential equations (BSDEs) to dynamic games. We prove equilibrium uniqueness and comparative statics based on BSDEs. The proof methods work even when the standard approach based on HJB equations is not applicable (see Section 5.3).
14 We have not formally investigated whether the (unique) equilibrium under $r_\Delta \propto \sqrt{\Delta}$ converges to the one under the continuous-time limit, partly because taking such a Brownian-noise limit is not necessary for our main results (in Section 3). Staudigl and Steg (2014) show the convergence of expected payoffs for a fixed strategy profile in a similar setup of repeated games with imperfect public monitoring, assuming a full-support signal distribution.
5.1
Setup
The model is formulated directly in continuous time. Player $i$ controls his own state variable as
$$dX_t^i = (\mu^i(X_t^i) + A_t^i)\,dt + \sigma^i dZ_t^i, \qquad (5)$$
for a fixed initial value $X_0^i$. As in the discrete-time model, $A_t^i \in A^i$ is the control chosen by player $i$ at time $t$.15 The control range $A^i$ is assumed to be a compact interval in $\mathbb{R}$. We assume that the Brownian shocks $dZ_t^i$ are independent across players, though this assumption can be relaxed.
Player $i$'s total payoff is
$$U^i(X_T) - \int_0^T c^i(A_t^i)\,dt.$$
We impose the same conditions on $c^i$ and $U^i$ as in the discrete-time model. In addition, we assume that each $\mu^i$ is bounded.
At time $t$, player $i$ observes the state $X_t$ but does not directly observe the control actions $A_t^j$ of the other players. Formally, player $i$'s (public) strategy is an $A^i$-valued stochastic process $A^i = (A_t^i)_{t\in[0,T]}$ that is progressively measurable with respect to $F^X$, the filtration generated by the public history of $X_t$. Note that this setup allows the players' actions to depend on past histories of the $X_t$ process.16
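For intuition about the dynamics in (5), a sample path of the controlled state can be generated with a standard Euler-Maruyama discretization. The strategy `a` below is an arbitrary Markov feedback rule of our own choosing, not an equilibrium strategy.

```python
import numpy as np

def simulate_path(a, mu, sigma, X0, T=1.0, n_steps=200, rng=None):
    # Euler-Maruyama discretization of dX_t = (mu(X_t) + a(t, X_t)) dt + sigma dZ_t.
    rng = rng or np.random.default_rng(0)
    dt = T / n_steps
    X = np.empty(n_steps + 1)
    X[0] = X0
    for k in range(n_steps):
        t = k * dt
        drift = mu(X[k]) + a(t, X[k])
        X[k + 1] = X[k] + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return X

# Example: mean-reverting drift with a simple Markov control pushing the state toward 1.
path = simulate_path(a=lambda t, x: 1.0 - x, mu=lambda x: -0.1 * x, sigma=0.5, X0=0.0)
```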
Under any profile of strategies $A = (A^i)_{i\in N}$, we can define player $i$'s continuation value at time $t$ as
$$W_t^i(A) = E\left[\,U^i(X_T) - \int_t^T c^i(A_s^i)\,ds \,\Big|\, F_t^X\right]. \qquad (6)$$
A profile of strategies A = (Ai )i∈N is a Nash equilibrium if each player maximizes her expected
payoff (6) at t = 0 given the other players’ strategies. Since the distribution of the publicly
observable Xt process has full support in our setup, the Nash equilibrium concept is identical
to its refinement known as public perfect equilibrium, which is frequently used in the literature
on dynamic games.
15 We use the so-called weak formulation, which is the standard way that a continuous-time game with imperfect public monitoring is formulated in the literature (see the Appendix for the formal description).
16 We assume imperfect observation only to avoid a well-known technical issue in continuous-time games: the outcomes of a game may not be well-defined for a fixed strategy profile, even under Markovian strategies (see other approaches, e.g., Simon and Stinchcombe (1989) and Bergin and MacLeod (1993), for careful restrictions on the strategy space). This setup is standard in dynamic games and contracts that model players who observe noisy public signals about the actions of the other players (e.g., Sannikov (2007)). Note also that we rule out mixed strategies in this model. Neither this nor imperfect monitoring is crucial for equilibrium uniqueness, because Theorem 1 in the discrete-time model also holds under mixed strategies and perfect monitoring.
5.2
Equilibrium Uniqueness
We follow the standard approach in characterizing the optimality of each player's strategy by using the martingale representation theorem.17 For any profile of strategies $A = (A^i)_{i\in N}$, there is a profile of processes $(\beta^i(A))_{i\in N}$ that are progressively measurable with respect to $F^X$ such that the evolution of player $i$'s continuation value satisfies
$$dW_t^i(A) = c^i(A_t^i)\,dt + \beta_t^i(A)\cdot dZ_t \qquad (7)$$
along with the terminal condition
$$W_T^i(A) = U^i(X_T). \qquad (8)$$
Here $\beta^i(A)$, whose existence and uniqueness are ensured by the martingale representation, is an $F^X$-progressively measurable $\mathbb{R}^n$-valued process such that $E\big[\int_0^T |\beta_t^i|^2 dt\big] < \infty$.18 Intuitively, $\beta_t^{i,k}(A)$ measures the sensitivity of player $i$'s continuation payoff to fluctuations in the $k$-th shock $Z_t^k$ at time $t$.
To simplify the notation, we introduce vector/matrix notation. Denote the drift vector and the covariance matrix of the state variable by $\mu(X_t) + A_t \in \mathbb{R}^n$ and $\Sigma \in \mathbb{R}^{n\times n}$, respectively.19 Rewriting $dX_t = (\mu(X_t)+A_t)dt + \Sigma dZ_t$ as $dZ_t = \Sigma^{-1}(dX_t - (\mu(X_t)+A_t)dt)$ and substituting the latter into equation (7), we obtain
$$dW_t^i(A) = \big(c^i(A_t^i) - \beta_t^i(A)\cdot\Sigma^{-1}(\mu(X_t)+A_t)\big)dt + \beta_t^i(A)\cdot\Sigma^{-1}dX_t.$$
To characterize the players' incentive compatibility constraints, we define player $i$'s response function $f^i: \mathbb{R}^{n\times n} \to A^i$ as
$$f^i(b) = \arg\max_{\alpha^i \in A^i}\; b^i\cdot\Sigma^{-1}\alpha^i - c^i(\alpha^i)$$
for each $b = (b^i)_{i\in N} \in \mathbb{R}^{n\times n}$. The maximand is taken from (the negative of) the drift term in the expression for the continuation value above. The function $f^i$ is well-defined because $c^i$ is strictly convex in $A^i$ by assumption. Using these, based on the standard argument we can characterize the optimality of strategies in terms of the following "local incentive compatibility
17 The application of the martingale representation theorem to stochastic control problems has a relatively long history. For example, see Davis (1979) for a historical survey and Pham (2010) for a textbook-style introduction.
18 In this paper we use $|\cdot|$ to denote both the Euclidean norm for vectors and the Frobenius norm for matrices.
19 More specifically, the $i$-th element of $\mu(X_t)+A_t$ is $\mu^i(X_t^i)+A_t^i$, and the $(i,j)$-element of $\Sigma$ is $\sigma^i$ if $i=j$ and zero otherwise.
condition" (Lemma 6 in the Appendix): $i$'s strategy $A^i$ is optimal against $A^{-i}$ if and only if
$$A_t^i = f^i(\beta_t(A)) \quad \text{almost surely for almost all } t. \qquad (9)$$
Lemma 1. A strategy profile $(A^i)_{i\in N}$ is a Nash equilibrium if and only if (7), (8), and (9) hold for some $F^X$-progressively measurable processes $W, \beta$ such that
$$E\Big[\sup_{0\le t\le T} |W_t|^2\Big] < \infty, \qquad E\Big[\int_0^T |\beta_t|^2 dt\Big] < \infty.$$
Combining equations (5), (7), (8), and (9) leads to a multi-dimensional backward stochastic differential equation (BSDE)
$$dW_t^i = \big(c^i(f^i(\beta_t)) - \beta_t^i\cdot\Sigma^{-1}(\mu(X_t) + f(\beta_t))\big)dt + \beta_t^i\cdot\Sigma^{-1}dX_t, \qquad W_T^i = U^i(X_T), \qquad (10)$$
under a probability measure for which the $X_t$ process is a standard Brownian motion. A solution $(W, \beta)$ to this BSDE describes the stochastic evolution of the continuation payoffs. Moreover, the control $A_t^i$ given by $f^i(\beta_t)$ is incentive compatible for player $i$ and is consistent with the evolution of the continuation payoffs.
While the class of BSDEs in (10) is non-standard in the literature, we build on the results
of Delarue (2002) and Cheridito and Nam (2015) to show that this BSDE has a unique solution
(W, β), which establishes our main claim of equilibrium uniqueness.
Theorem 2. There exists a unique Nash equilibrium. Moreover, this equilibrium has the Markovian property; that is, the continuation values and the strategies take the form $W_t^i(A) = w^i(t, X_t)$ and $A_t^i = a^i(t, X_t)$ for some function $w^i: [0,T]\times\mathbb{R}^n \to \mathbb{R}$ that is Lipschitz continuous in $x\in\mathbb{R}^n$ uniformly in $t\in[0,T]$ and some function $a^i: [0,T]\times\mathbb{R}^n \to A^i$.
5.3
Relation to Dynamic Programming Approach
Theorem 2 shows that the unique equilibrium is Markovian with respect to time and state
variables. This means that on the equilibrium path each player solves a Markov stochastic
control problem. A standard approach to this problem is to write down HJB equations (see,
e.g., Pham (2010)):
$$0 = \max_{a^i}\left(\frac{\partial w^i(t,x)}{\partial t} + \sum_{k=1}^{n} \frac{(\sigma^k)^2}{2}\,\frac{\partial^2 w^i(t,x)}{\partial (x^k)^2} + \sum_{k=1}^{n} \big(\mu^k(x) + a^k\big)\frac{\partial w^i(t,x)}{\partial x^k} - c^i(a^i)\right), \qquad (11)$$
with a terminal condition $w^i(T, x) = U^i(x)$. This nonlinear PDE describes how player $i$'s continuation value evolves over time in response to changes in $X_t$.
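To illustrate, the single-player analogue of (11) can be solved by an explicit finite-difference scheme marching backward from the terminal condition. This is our own sketch under simplifying assumptions (one state dimension, $\mu = 0$, quadratic cost $\bar c a^2/2$ with an unrestricted control range, for which the inner maximization in (11) equals $w_x^2/(2\bar c)$); it is not the scheme used for the simulations reported below.

```python
import numpy as np

def solve_hjb(U, sigma=1.0, cbar=1.0, T=0.2, n_x=81):
    # Explicit backward finite differences for the single-player version of (11):
    #   0 = w_t + (sigma^2/2) w_xx + max_a [ a * w_x - (cbar/2) a^2 ],
    # where the inner maximum equals w_x^2 / (2*cbar), with w(T, x) = U(x).
    x = np.linspace(-4.0, 4.0, n_x)
    dx = x[1] - x[0]
    dt = 0.4 * dx**2 / sigma**2          # respect the explicit-scheme stability bound
    n_t = int(np.ceil(T / dt))
    dt = T / n_t
    w = U(x).astype(float)
    for _ in range(n_t):                 # march backward from the terminal condition
        wx = np.gradient(w, dx)
        wxx = np.zeros_like(w)
        wxx[1:-1] = (w[2:] - 2.0 * w[1:-1] + w[:-2]) / dx**2
        w = w + dt * (0.5 * sigma**2 * wxx + wx**2 / (2.0 * cbar))
    return x, w                          # approximation of w(0, x)

x, w0 = solve_hjb(np.tanh)
```

The control option always adds value, so $w(0, \cdot)$ lies above the purely diffused terminal payoff; at the symmetric point $x = 0$ this lifts the value strictly above $U(0) = 0$.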
Although this dynamic programming approach might be more familiar to economists, the existence of a smooth solution to the nonlinear PDE is not easy to ensure.20 Moreover, this approach cannot allow for non-Markovian and non-smooth strategies, so it cannot rule out the existence of other equilibria.21 In the proof of Theorem 2, we avoid these issues by relying directly on the uniqueness property of the BSDE (10).
5.4
Examples
The continuous-time model with Brownian noise is more tractable than the discrete-time model because the BSDE (10) allows for convenient numerical analysis (Delong (2013), Ch. 5), in particular its reduction to the PDE in (11). In this subsection, we illustrate this usefulness through three specific applications. In particular, the latter two examples, a hold-up problem and a contest, illustrate channels through which gradualness improves players' welfare.
5.4.1
Team Production
As a first example, we consider a team-production model in which players share a common
terminal payoff that depends only on the sum of the state variables. From Corollary 1, we
know that players can always approximately achieve efficiency when frictions are small.
To examine the dependence of the equilibrium outcome distribution on the initial point when frictions are not necessarily small, we conducted numerical simulations with two players, the terminal payoff $U(x) = \sin(ax) + bx - cx^2$ with $a = 0.5$ and $b = c = 0.01$ (Figure 4), and the quadratic cost function $c(a) = \frac{\kappa}{2}a^2$. The terminal payoff function has two local maxima over $x \in [-10, 6]$, and we cannot rule out multiple equilibria in the static benchmark case, because the players can coordinate on "targeting" either of the locally maximum points. In particular, the players might coordinate on playing a Pareto-inefficient equilibrium by targeting the local maximum at $x \approx -8.30$ rather than the global maximum at $x \approx 3.28$.
Figure 5 shows a set of simulation results for the stated utility function with different initial points $X_0 := X_0^1 + X_0^2$ and cost levels $\kappa$. In each panel, we fix $X_0 \in \{-6, 0, -5.2\}$ and compare the simulated distribution of $X_T$ with $\kappa = 0.5$ (grey) and that with $\kappa = 0.2$ (black). For each parameter specification, we computed the equilibrium strategy profile, randomly generated $X_T$ 1000 times, and plotted the resulting values as a histogram. We found that when $X_0$ is very close to one of the local maxima of $U(X_T)$, as in Panels A and B, the players tend to coordinate on the closer of the two local maxima, regardless of the cost level. However, when the initial point is not very close to one of the local maxima, the cost level can significantly affect the distribution of $X_T$. In Panel C, although $X_0 = -5.2$ is closer to the inefficient local maximum, players try to push the state variable toward the global maximum in more than half of the simulations when the cost level is sufficiently small.22
20 In Iijima and Kasahara (2015a), we provide a sufficient condition for the existence of a smooth solution under stronger restrictions on $U^i$ and Inada conditions on $c^i$.
21 In general, non-Markovian strategies can be effective in generating equilibrium multiplicity even for finite-horizon games, as observed in the literature on repeated games as well as in more recent papers such as Kamada and Kandori (2015). Non-smoothness of continuation values is also a key source of equilibrium multiplicity.
Finally, we consider comparative statics of the equilibrium payoff assuming that U is monotone. We let A(κ, N ) denote the unique equilibrium under the cost parameter κ and team size
N.
Proposition 2. Consider the team production game in which U is increasing. Then W0 (A(κ, N ))
is decreasing in κ and increasing in N .
While this result may not be surprising in and of itself, it ensures that our “equilibrium
selection” has natural properties. In general, an arbitrary selection of an equilibrium can lead
to strange comparative statics results when we vary parameters locally. The proof is based
on the comparison theorem applied to the equilibrium BSDEs across different parameters. In
order to conduct the appropriate comparison, we need to ensure that βt is always positive in
the equilibrium. This is guaranteed by the co-monotonicity theorem (Chen, Kulperger, and
Wei, 2005; dos Reis and dos Reis, 2013), which is useful in checking the signs of sensitivity
processes βt of BSDEs in general.
5.4.2
Hold-Up Problem
In this example, we study a class of hold-up problems that highlight a form of “commitment
power” embedded in gradual adjustment (in contrast to one-shot adjustment). In particular,
we obtain a closed-form solution for the equilibrium strategies and values to calculate the
“efficiency gain” of gradualness relative to the static benchmark. This also illustrates the point
that equilibrium outcomes in the game with gradual adjustment (with small noise) can be very
different from those under the static benchmark model, even if the equilibrium of each model
is unique.
Consider a two-player game with a seller (player 1) and a buyer (player 2) with X0i = 0
for each i = 1, 2. The seller chooses investment A1t to improve the quality of a good Xt1
by incurring a flow cost c1 (A1t ). The buyer decides the amount of the purchase, Xt2 . His
22 We thank an anonymous referee for suggesting this simulation experiment.
Figure 4: Terminal payoff function with multiple local maxima ($U(X)$ plotted over $X \in [-10, 6]$).
Figure 5 (Panels A-C): Simulated distributions of $X_T$ with different initial points $X_0$ and cost levels $\kappa$. In each panel, we fix $X_0 \in \{-6, 0, -5.2\}$ and compare the simulated distributions of $X_T$ with $\kappa = 0.5$ (grey) and $0.2$ (black). For each parameter set, we simulated $X_T$ 1000 times and plotted them as a histogram. The other parameters are $\sigma = 1$, $N = 2$, and $T = 1$.
adjustment cost $c^2(A_t^2)$ captures the transaction costs of, or the frictions involved in loading, new goods, such as preparing the necessary facilities. The price per unit is exogenously fixed. The seller's terminal payoff $U^1$ is her revenue, and the buyer's terminal payoff $U^2$ is his utility of consumption. We make the following assumptions: (i) $\frac{\partial U^1}{\partial X^2} > 0$, that is, the seller's benefit is increasing in the amount of the buyer's purchase; (ii) $\frac{\partial U^1}{\partial X^1} = 0$, that is, the seller's revenue does not depend on the quality per se; (iii) $\frac{\partial^2 U^2}{\partial X^1 \partial X^2} > 0$; and (iv) $c^1$ is minimized at 0.
When the quality of a good is not contractible, incentivizing the seller to make a costly
investment is a difficult problem in general. The seller and buyer could negotiate the price of
the good after observing the final quality, but this approach may not be helpful when the buyer
gains bargaining power ex post. In this subsection, we examine an alternative scheme in which
the seller and buyer agree on a fixed price ex ante and the buyer gradually adjusts the scheduled
amount of the purchase while observing the seller’s quality investment thus far. In this setting,
the seller has an incentive to choose a positive level of investment under the dynamic model,
because it will encourage the buyer to purchase the good at a later time.23 This is in contrast
to the static benchmark case, where the seller chooses the dominant strategy, A10 = 0.
To quantify the welfare gain, we focus on the following quadratic payoffs to obtain a closed-form solution:
$$U^1(X) = X^2, \qquad U^2(X) = X^1 X^2, \qquad c^i(A^i) = \frac{\bar c_i}{2}(A^i)^2$$
for some $\bar c_1, \bar c_2 > 0$, with $A^i = \mathbb{R}$. In Proposition S1 in the Supplementary Appendix, we show that a linear equilibrium in a general quadratic game is obtained by solving a system of ordinary differential equations, which is independent of $\sigma$. Accordingly, we obtain the following result.
Proposition 3. Consider the hold-up problem with quadratic payoffs. Then there is a Nash equilibrium $A = (A^i)_{i=1,2}$ given by
$$A_t^1 = \underbrace{\frac{T-t}{\bar c_1 \bar c_2}}_{\text{cooperation term}}, \qquad A_t^2 = \underbrace{\frac{(T-t)^2}{2\bar c_1 (\bar c_2)^2}}_{\text{cooperation term}} + \frac{X_t^1}{\bar c_2}.$$
Moreover, the corresponding equilibrium payoffs are
$$W_0^1(A) = \frac{T^3}{3\bar c_1 (\bar c_2)^2}, \qquad W_0^2(A) = \frac{T^2\sigma^2}{4\bar c_2} + \frac{T^5}{8(\bar c_1)^2 (\bar c_2)^3}.$$
23 The logic is similar to that of Bohren (2016), who studies an infinite-horizon model in which a long-lived seller faces a continuum of short-lived buyers at each moment in time. The seller has an incentive to make an investment, since its effect is persistent over time, which convinces prospective future buyers to make purchases. In our model, the buyer is not myopic, and his forward-looking incentive is captured by the cooperation term.
The basic reason for the welfare improvement is that the seller acquires commitment power from the gradually evolving state, which encourages her investment. Also, anticipating the seller's positive investment incentive, the buyer has an additional incentive to increase the purchase amount. These effects are captured by the "cooperation terms" in the formula above, and they disappear as time $t$ approaches the deadline $T$.24
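The closed-form payoffs in Proposition 3 can be cross-checked by numerically integrating the stated strategies along the deterministic path (the seller's payoff does not depend on $\sigma$, and the buyer's formula is checked at $\sigma = 0$). The snippet below is our own verification sketch.

```python
import numpy as np

def holdup_payoffs(T=1.0, c1=1.0, c2=1.0, n=200_000):
    # Integrate the Proposition 3 strategies along the deterministic (sigma = 0) path:
    #   A1_t = (T - t)/(c1*c2),   A2_t = (T - t)^2/(2*c1*c2^2) + X1_t/c2.
    dt = T / n
    t = np.arange(n) * dt
    A1 = (T - t) / (c1 * c2)
    X1 = np.concatenate(([0.0], np.cumsum(A1 * dt)))[:-1]   # X1 at each grid time
    A2 = (T - t)**2 / (2 * c1 * c2**2) + X1 / c2
    X1T = np.sum(A1 * dt)
    X2T = np.sum(A2 * dt)
    W1 = X2T - np.sum(c1 / 2 * A1**2 * dt)         # seller: U1 = X2_T minus effort cost
    W2 = X1T * X2T - np.sum(c2 / 2 * A2**2 * dt)   # buyer:  U2 = X1_T * X2_T minus cost
    return W1, W2

W1, W2 = holdup_payoffs()
print(W1, W2)  # close to 1/3 and 1/8 for T = c1 = c2 = 1 and sigma = 0
```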
As an extreme form of the seller's commitment power, we can compute the Stackelberg equilibrium in the static benchmark game, where the seller chooses $A_0^1 = \frac{T}{\bar c_1 \bar c_2}$ and the buyer chooses $A_0^2 = \frac{T^2}{\bar c_1 (\bar c_2)^2}$. The seller's payoff is $\frac{T^3}{2\bar c_1 (\bar c_2)^2}$, which is always higher than the dynamic equilibrium payoff in Proposition 3. The buyer's payoff is $\frac{T^5}{2(\bar c_1)^2 (\bar c_2)^3}$, which can be lower than the dynamic equilibrium payoff if $\sigma$ is high. This captures the value of action flexibility in a dynamic setting that is missing in the static case. The payoffs in the static benchmark case are not affected even if a noise term $Z_T$ affects the state variables.
5.4.3
Dynamic Contest
As a final example in this section, we apply our framework to a contest (Example 4), in which players compete for a fixed prize whose value does not depend on their effort.25 We consider a two-player game where each player $i = 1, 2$ exerts costly effort $A_t^i$ at each time $t$ to improve the quality $X_t^i$ of her project, given the initial value $X_0^i$. We normalize the payoff for the winner of the prize to 1. We suppose the winner is stochastically determined with the following additive noise:
$$U^i(X_T) = \Pr[X_T^i + \theta^i > X_T^j + \theta^j],$$
where $\theta^i$ is a random element that affects the contest designer's judgment. Hence the terminal payoff for player $i$ is increasing in the final quality difference $X_T^i - X_T^j$. Assuming that the distribution of $\theta := \theta^1 - \theta^2$ admits a bounded density, the terminal payoff function $U^i$ is Lipschitz continuous.
In contrast to the static benchmark with one-shot effort choice, in this model players can adjust their effort gradually before the deadline, depending on the current quality difference. To examine this point, we numerically solved the equilibrium PDE (11) using the symmetric quadratic cost functions $c^i(a) = \frac{\bar c}{2}a^2$ with $A^i = \mathbb{R}$ and terminal payoffs with logit noise $U^i(X) = \frac{\exp[\gamma X^i]}{\exp[\gamma X^1] + \exp[\gamma X^2]}$, where $\gamma > 0$.26 The numerical simulation results indicate that the leader (i.e., the player with the higher value of the state variable) generally exerts greater effort
than the follower. The leader has an incentive to "discourage" the follower and induce her to exert less effort.
24 We note that the unbounded control ranges deviate from the original model formulation, and thus equilibrium uniqueness here is an open question.
25 See Konrad (2009) for an extensive survey of this field.
26 We used the unbounded control range in the simulation for simplicity.
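The logit specification is the familiar additive-noise contest success function: if each $\theta^i$ is i.i.d. standard Gumbel, the winning probability reduces to the logit form with $\gamma = 1$ (Gumbel noise with scale $1/\gamma$ yields general $\gamma$). A quick Monte Carlo check of this standard fact:

```python
import numpy as np

def logit_csf(x1, x2, gamma=1.0):
    # Logit contest success function for player 1, as in the simulations.
    return np.exp(gamma * x1) / (np.exp(gamma * x1) + np.exp(gamma * x2))

def mc_win_prob(x1, x2, n=1_000_000, seed=0):
    # Monte Carlo estimate of Pr[x1 + theta1 > x2 + theta2] with i.i.d. Gumbel theta^i.
    rng = np.random.default_rng(seed)
    theta1 = rng.gumbel(size=n)
    theta2 = rng.gumbel(size=n)
    return np.mean(x1 + theta1 > x2 + theta2)

x1, x2 = 0.7, 0.2
print(mc_win_prob(x1, x2), logit_csf(x1, x2))  # the two agree up to Monte Carlo error
```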
Relative to the static benchmark case, we observe that the expected total payoff for each player is higher under the dynamic setting (Figure 6). That is, the total cost of effort is lower in expectation. This is because the players tend to exert less effort under the dynamic model when the state difference becomes large (Figure 7).
Another observation from the simulations is that the equilibrium effort level $a^i(t, X)$ is monotone in $t$ (Figure 7). In particular, it is increasing in $t$ when competition is fierce, that is, when $X^1 \approx X^2$. To understand this, note that the benefit of exerting effort in such a situation is greater when $t$ is closer to the deadline, because the magnitude of the remaining Brownian noise is smaller. On the other hand, consider the case where $|X_t^1 - X_t^2|$ is large. Then the winner of the contest is essentially already determined if $t$ is close to the deadline; hence the players tend to stop choosing high effort levels, and this makes $a^i$ decreasing in $t$.27
Harris and Vickers (1987) study a related continuous-time model of contest games, called
the tug-of-war model.28 In this model, the players exert effort to influence the drift of their state
variables, as in the current paper. As in our model, in each state the leader always exerts more
effort than the follower. In contrast to our model, there is no fixed deadline; one of the players
wins, and the game ends if the difference between the values of the state variables reaches some
pre-established value. Harris and Vickers (1987) focus on time-stationary strategies, while our
model allows us to investigate the effects of the length of time remaining (before the deadline)
on equilibrium behavior.
27 The simulations suggest that $a^i$ tends to be increasing in $t$, especially when $\gamma$ is small. To interpret this, note that in more random contests (smaller $\gamma$), raising the level of effort has a greater marginal effect on the winning probability. Therefore, the players remain competitive even when the difference $|X^1 - X^2|$ is large.
28 Budd, Harris, and Vickers (1993) analyze a more general model. Cao (2014) also studies a model with asymmetric players and the effect of a handicapping policy. Yildirim (2005) analyzes a discrete-time contest model with multiple periods; in contrast to our results, there are multiple equilibria when the players are asymmetric, and the total effort is typically at least as great as that in the one-shot model. Seel and Strack (2015) consider a continuous-time model in which players solve optimal stopping problems with private observation of their own state variables.
Figure 6: Expected total payoff of the players, $w^1(0, X^1, X^2) + w^2(0, X^1, X^2)$, under different initial state differences $X_0^1 - X_0^2$. The total payoff under the dynamic model (solid line) exceeds that under the static model (dashed line) for each initial state. The other parameters are $\sigma = 1$ and $\gamma = 1$.
Figure 7: Time dynamics of player 1's optimal control policy in the dynamic contest game when player 1 is the leader. The solid lines are equilibrium strategies under the dynamic model: the effort level $a^1(t, X)$ is increasing in $t$ when $X_t^1$ and $X_t^2$ are close (left panel, with $X_t^1 - X_t^2 = 1$) but decreasing in $t$ otherwise (right panel, with $X_t^1 - X_t^2 = 2$). The dashed lines are the equilibrium effort levels under the static benchmark for different initial states. The other parameters are $\sigma = 1$ and $\gamma = 1$.
6
6.1
Extensions
Generalizations of the Main Model
While we have specialized to a relatively simple setup so far, Theorem 1, as well as Proposition 1 in Section 4, holds more generally. In this section, we briefly discuss some of these generalizations.29
State-dependent adjustment cost: ci can depend both on Ait and Xt as long as ci (At , Xt )
is Lipschitz continuous in Xt and the second derivative in Ait is bounded away from 0. This
generalization allows us to interpret ci as a general flow payoff rather than cost of adjusting X i .
For example, firms in dynamic oligopoly may earn flow profits as a function of interim capital
levels (Example 3).
Discontinuous payoffs:
The terminal payoff U i can be discontinuous at countably many
points as long as the noise density is sufficiently flat. See Section B.1 in the supplementary
appendix for the details. This allows for so-called threshold technologies in the context of public
good provision (Example 1). Also, in Example 2, the central bank may abandon the present
currency regime if and only if the aggregate short position exceeds a known threshold, which
leads to discontinuous terminal payoffs.
Aggregate state:
While each player is endowed with his own state variable in the main
model, we can relax this restriction by introducing (possibly several) aggregate state variables
whose drifts are determined as weighted sums of players’ actions. For example, the total output
produced by team members can be interpreted as a single aggregate state variable in a team
project (Example 1). This is also useful for analyzing anonymous markets where players do not
necessarily observe individual transaction levels.
Imperfect monitoring: While we have assumed that players can perfectly observe other players' past actions, this specification is irrelevant for the results, because the unique equilibrium in Theorem 1 is Markovian with respect to time and the current state vector. As long as players can observe the state variables, there is a unique perfect Bayesian equilibrium.
Random deadline: The deadline can be random as long as it is bounded. Moreover, it is possible to specify that the game ends before the deadline when a state variable hits a certain threshold. This generalization is related to Georgiadis (2015), who analyzes a team-project model without a deadline in continuous time.
Non-i.i.d. noise:
Theorem 1 holds even if noise is correlated across players and/or peri-
ods as long as the noise density φit+∆ (·|()0≤s≤t ) conditional on past realizations satisfies the
assumptions in Section 2. This generalization is important when the noise contains common
value components and/or time persistence, as in a macroeconomic or industry-wide shock.30

29 The uniqueness result in Section 5 can be proved under these generalizations, at least except for discontinuous payoffs and non-i.i.d. noise.
6.2 Discrete-Jump Model
So far we have focused on the environment in which the state variables and noise terms are continuous. Such a formalization does not adequately represent some applications where players’
states are naturally represented by discrete variables, for example when players choose whether
or not to adopt a new technology. In this section, we consider a variant of the main model by
incorporating state variables that follow discrete jump processes, and prove equilibrium uniqueness in this setup. The finding here serves to highlight the robustness of the general point made
in this paper and broaden the scope of economic applications as we will illustrate below.
As in the main model in Section 2, we consider a finite-horizon discrete-time game with N
players. Player i has a finite set of states denoted by X i ⊆ R. For example, the state could be
either "maintaining an old technology" or "adopting a new technology." In each state, player $i$ has action set $A^i = [0,1]^{X^i}$ and controls the transition intensity from the current state to another with costly effort $A^i_t \in A^i$. Here $A^i_t(x^i)$ represents the intensity of jumping into state $x^i$. More specifically, the transition probability from $X^i_t$ to $X^i_{t+\Delta}$ is given by
$$X^i_{t+\Delta} = \begin{cases} x^i \neq X^i_t & \text{with probability } \Delta A^i_t(x^i)\,\mu^i(x^i, X^i_t), \\ X^i_t & \text{otherwise,} \end{cases}$$
where $\mu^i(x, x') \geq 0$ for each $x, x' \in X^i$. This formulation allows for absorbing states when $\mu^i(x, x') = 0$ for $x' \neq x$. For example, $A^i_t$ can be interpreted as player $i$'s effort to adopt a new technology.
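As a quick illustration of this transition rule, the following sketch simulates one period of the jump dynamics. The binary state space, intensities, and all numbers are our own illustrative choices, not from the paper:

```python
import random

def step(x_cur, A, mu, states, delta):
    """One period of the jump dynamics: from the current state x_cur, jump to
    state x with probability delta * A[x] * mu[(x, x_cur)]; stay otherwise."""
    u = random.random()
    acc = 0.0
    for x in states:
        if x == x_cur:
            continue
        acc += delta * A[x] * mu[(x, x_cur)]
        if u < acc:
            return x
    return x_cur  # no jump this period

# Binary technology-adoption states as in Example 5 below: 0 = old, 1 = new.
# mu[(0, 1)] = 0 makes adoption irreversible. All numbers are illustrative.
states = [0, 1]
mu = {(1, 0): 1.0, (0, 1): 0.0}
A = {0: 0.0, 1: 0.7}     # effort directed at jumping into state 1
delta = 0.01             # per-period jump probability is delta * 0.7
```

Over many periods the agent jumps to state 1 with high probability and then stays there, since the reverse intensity is zero.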
As in the main model, player i's payoff is given as
$$U^i(X_T) - \sum_t \Delta c^i(A^i_t).$$
This model can also be interpreted as embedding a finite-strategy static game into a dynamic
setting by considering X i as the set of strategies in the static game. That is, players have
stochastic chances to change their actions until they play a one-shot game (see Calcagno, Kamada, Lovo, and Sugaya (2014)).
30 In the proof of Proposition 1, we rely on the fact that the total noise size $r_\Delta \sum_t \epsilon^i_t$ is small with a high probability. While this is ensured by small $\Delta$ under the i.i.d. assumption, we can achieve this even with correlated noise under an appropriate requirement, for example by invoking the law of large numbers for dependent variables. Even if it fails, we can redefine the approximate efficiency in Proposition 1 with the expected first-best payoff rather than the best terminal payoff.
Example 5 (Technology adoption). Each agent can acquire a new technology subject to a switching cost. The state $X^i_t$ takes binary values: $X^i_t = 1$ indicates that agent $i$ has already adopted the new technology, and $X^i_t = 0$ otherwise. Assume that $\mu(1, 0) > 0$ but $\mu(0, 1) = 0$, i.e., technology adoption is an irreversible action. The cost $c^i(A^i_t)$ is increasing in $A^i_t(1)$, so that it is costly to acquire the technology at a higher rate. Adopting new technologies typically involves complementarities, for example when $U^i$ is supermodular, which causes equilibrium multiplicity.
Example 6 (Market-platform choice). There are multiple marketplaces $M$ at which transactions take place. Traders select a market platform before the deadline, where $c^i(A^i)$ is seen as a (possibly small) cost of switching the market platform, and $X^i_t \in M$ represents the marketplace player $i$ is planning to enter at $t$. The payoff of entering each market $m \in M$ is increasing in the number of participants; that is, for $X^i = m$, $U^i(X^i, X^{-i}) \geq U^i(X^i, \tilde X^{-i})$ holds
whenever {j ∈ N : X j = m} ⊇ {j ∈ N : X̃ j = m}. As emphasized in the literature (e.g.,
Pagano (1989), Ellison and Fudenberg (2003)), such complementarity in market participation
can arise from various channels, and there are typically several equilibria in this type of game
under the one-shot formulation.
We maintain the same assumption on the adjustment cost $c^i$ as in Section 2, while we do not impose any restriction on the terminal payoff $U^i$.
As in the main model, we cannot rule out equilibrium multiplicity for the one-shot case
of ∆ = T , or more generally when ∆ is not small (for example when terminal payoffs admit
complementarities, as in the above examples). However, for small enough $\Delta$, we again obtain a unique equilibrium, as shown by the following:
Theorem 3. There is a unique subgame-perfect equilibrium in the discrete jump model if ∆ is
small enough. The strategies are Markovian, i.e., functions of $(t, X_t)$.
The intuition for equilibrium uniqueness is analogous to the main model. In the main
model, players adjust their positions every period by small amounts. In the current model,
their positions can jump but this happens only infrequently. Thus, in either case, the impact
of players’ actions per period is small in expectation. Also, at a more formal level, recall that
players' actions in the main model can be interpreted as changing the probability measure over state-variable paths, just as players' actions influence the states in the current model.
In a team production game, an approximate efficiency result similar to Proposition 1 holds
for the discrete-jump model as well. The intuition and the proof idea are analogous and thus
omitted. This result is in line with Calcagno, Kamada, Lovo, and Sugaya (2014), which selects
Pareto efficient outcomes in coordination games (see also the related literature section).
7
Conclusion
This paper examines the strategic role of players’ ability to gradually adjust their positions
over time. The equilibrium is unique if players’ actions are flexible enough in the presence
of noise, which can be arbitrarily small in the limit. The highlight is that we do not require
any substantial restrictions on terminal payoffs, and the adjustment of positions can be either
continuous or discrete. For certain situations, gradualness leads to higher efficiency relative
to the one-shot setting. We also considered a continuous-time version of the model (based on
Brownian noise) that is more tractable for applications, in particular because the equilibrium
is characterized by PDEs (subject to smoothness requirements) that facilitate applications to
specific problems.
Let us emphasize a few implications of our findings. First, our equilibrium uniqueness
results suggest that coordination problems in one-shot games might simply be an artifact of
abstracting away from the players’ abilities to make gradual adjustment. We think such adjustment is plausible in many instances of interest. Second, our framework is flexible so our
uniqueness results hold for a broad class of applications. While many papers in the finite-horizon-game literature analyze specific economic problems that feature interesting deadline
effects, our results provide a general modeling approach that conveniently yields equilibrium
uniqueness for general terminal payoffs with frequent actions. We also note that equilibrium
uniqueness is in general important for parameter estimation in empirical analysis.31 Third, as
explained in Section 1.1, our method of proof for the uniqueness result in Section 5 is useful in
continuous-time games with Brownian noise beyond the specific model of this paper. While the
theory of BSDEs has been fruitfully applied in finance and stochastic control and optimization,
it seems underutilized in the literature of dynamic games.
While we have focused on games with fixed payoff functions, the problem of optimal incentive
design is a natural next step, and our approach provides a useful guide for a designer to achieve
unique (rather than partial) implementation in multi-agent settings. Also, we have focused
on complete information games. In the presence of incomplete information, the evolution of
the states can convey some information about players’ information or unobservable economic
fundamentals, and such an extension of the model will allow one to study incentives for signaling
and/or experimentation.32
31 Relatedly, Caruana and Einav (2008a) use data on time-varying production target levels from U.S. auto manufacturers.
32 A recent paper by Bonatti, Cisternas, and Toikka (2016) considers a continuous-time model of dynamic oligopoly with a deadline and private information.
Appendix
A Details of Example 3.1
We argue that in the deterministic case the strategy profile in which player $i$ chooses
$$A^i_t = \begin{cases} 1 & \text{if } \sum_j X^j_t \geq (T-t)\theta, \\ 0 & \text{if } \sum_j X^j_t < (T-t)\theta \end{cases}$$
is a subgame perfect equilibrium for N ≥ 3 and for any value θ ∈ (−N + 1, −1). To see this,
we fix any period t ∈ {0, ∆, .., T − ∆} and current state Xt and confirm that no player has an
incentive to deviate from the prescribed strategy:
• Case 1: Xt ≥ (T − t)θ.
If there is no deviation, the terminal state is XT = Xt +(T −t)N , which is strictly positive.
After an arbitrary deviation on the part of player $i$ after time $t$, the state variable $X_{t'}$ at any $t' > t$ is strictly greater than $(T - t')\theta$ because $\theta > -N + 1$. Thus, the terminal
state after any deviation is still positive. Given this, the optimization problem reduces to
$$\max_{A^i}\ \Delta\sum_{s=t}^{T-\Delta}\big(A^i_s - c(A^i_s)\big),$$
and thus it is strictly optimal to choose $A^i_{t'} = 1$ at any period $t' \geq t$.
• Case 2: Xt < (T − t)θ.
If there is no deviation, the terminal state is $X_T = X_t$, which is strictly negative.
By a similar argument (now using $\theta < -1$), under an arbitrary deviation on the part of player $i$ after time $t$, the terminal state is strictly negative. Given this, the optimization problem
reduces to
$$\max_{A^i}\ \Delta\sum_{s=t}^{T-\Delta}\big(-c(A^i_s)\big),$$
and thus it is strictly optimal to choose $A^i_{t'} = 0$ at any time $t' \geq t$.
Finally, consider the case $r_\Delta = \kappa\Delta$ for some constant $\kappa$. If $\kappa$ is small, then for any $\Delta$, the above strategy profile remains a subgame-perfect equilibrium. This is because the players' incentive problems under deviations take the same forms as the ones above.
B Proofs of Results from Main Model

B.1 Proof of Theorem 1
We begin with a couple of preliminary lemmas that are used in the proof.
Lemma 2. Let $w : \mathbb{R} \to \mathbb{R}$ be an integrable function. Then the function $\tilde w : \mathbb{R} \to \mathbb{R}$ defined by $\tilde w(y) := \int w(y + r_\Delta \epsilon^i)\,\phi^i(\epsilon^i)\,d\epsilon^i$ is continuously differentiable, with its derivative given by $\tilde w'(y) = -\frac{1}{r_\Delta}\int w(y + r_\Delta \epsilon^i)\,(\phi^i)'(\epsilon^i)\,d\epsilon^i$.
Proof.
$$\begin{aligned}
\frac{d}{dy}\tilde w(y) &= \frac{d}{dy}\int w(y + r_\Delta \epsilon^i)\,\phi^i(\epsilon^i)\,d\epsilon^i \\
&= \frac{d}{dy}\int w(\tilde\epsilon^i)\,\phi^i\!\left(\frac{\tilde\epsilon^i - y}{r_\Delta}\right)\frac{d\tilde\epsilon^i}{r_\Delta} \\
&= -\frac{1}{r_\Delta^2}\int w(\tilde\epsilon^i)\,(\phi^i)'\!\left(\frac{\tilde\epsilon^i - y}{r_\Delta}\right)d\tilde\epsilon^i \\
&= -\frac{1}{r_\Delta}\int w(y + r_\Delta \epsilon^i)\,(\phi^i)'(\epsilon^i)\,d\epsilon^i,
\end{aligned}$$
where we used the change of variables $\tilde\epsilon^i := y + r_\Delta \epsilon^i$.
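As a numerical sanity check of Lemma 2 (not part of the original argument), one can compare its right-hand side against a finite-difference derivative. The Gaussian noise density and the integrable test function below are our own choices:

```python
import math

def phi(e):          # standard normal density (our illustrative noise choice)
    return math.exp(-e * e / 2) / math.sqrt(2 * math.pi)

def phi_prime(e):    # its derivative
    return -e * phi(e)

def integrate(f, lo=-8.0, hi=8.0, n=4000):
    """Composite midpoint rule; accurate enough for these smooth integrands."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

r = 0.3                                 # noise scale r_Delta
w = lambda x: 1.0 / (1.0 + x * x)       # an integrable test function

def w_tilde(y):
    return integrate(lambda e: w(y + r * e) * phi(e))

def lemma2_derivative(y):
    # Right-hand side of Lemma 2: -(1/r) * int w(y + r e) phi'(e) de
    return -(1.0 / r) * integrate(lambda e: w(y + r * e) * phi_prime(e))

y, h = 0.5, 1e-4
finite_diff = (w_tilde(y + h) - w_tilde(y - h)) / (2 * h)
assert abs(lemma2_derivative(y) - finite_diff) < 1e-3
```

For the Gaussian case this identity is Stein's lemma in disguise: $-\phi'(\epsilon) = \epsilon\phi(\epsilon)$.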
The following lemma is a minor modification of the standard result that ensures continuous
parametric dependence of fixed points under contraction maps.
Lemma 3. Let $Y, Z$ be metric spaces such that $Y$ is complete, and let $g : Y \times Z \to Y$. Suppose that (i) $g(\cdot, z)$ is a contraction map over $Y$ for each $z$, with modulus $\lambda \in (0,1)$ independent of $z$; (ii) $g(y, z)$ is Lipschitz continuous in $z$ with modulus $\kappa$ independent of $y$. Then there is a unique fixed point $y^*(z)$ for each $z$, and the map $z \mapsto y^*(z)$ is Lipschitz continuous.

Proof. Let $y^*(z)$ be the unique fixed point of each $g(\cdot, z)$. Then for any $z, z' \in Z$,
$$\begin{aligned}
\|y^*(z) - y^*(z')\| &= \|g(y^*(z), z) - g(y^*(z'), z')\| \\
&\leq \|g(y^*(z), z) - g(y^*(z'), z)\| + \|g(y^*(z'), z) - g(y^*(z'), z')\| \\
&\leq \lambda\|y^*(z) - y^*(z')\| + \kappa\|z - z'\|,
\end{aligned}$$
which ensures $\|y^*(z) - y^*(z')\| \leq \frac{\kappa}{1-\lambda}\|z - z'\|$.
Proof of Theorem 1. To simplify the notation, we only consider the two-player case. The general case can be proved analogously. Take any history ht at t = T − ∆ and let Xt denote the
state at $t$. After this history, each player i's equilibrium action $A^i_t(h_t)$ maximizes the payoff function $V^i_t(\cdot\,; X_t) : \prod_{j=1,2} A^j \to \mathbb{R}$ defined by
$$V^i_t(A_t; X_t) := \int U^i(X_t + \Delta A_t + r_\Delta \epsilon_t)\prod_{j=1,2}\phi^j(\epsilon^j_t)\,d\epsilon_t - \Delta c^i(A^i_t) = \int\!\!\int U^i(X_T)\,\phi^i(\epsilon^i_t)\,d\epsilon^i_t\,\phi^j(\epsilon^j_t)\,d\epsilon^j_t - \Delta c^i(A^i_t)$$
over $A^i$, given the other player's equilibrium action $A^j_t$, where $X_T = X_t + \Delta A_t + r_\Delta\epsilon_t$ (we will use this simplified notation several times in this proof). By Lemma 2, the first term is continuously differentiable in $A^i_t$. Note that we can write
$$U^i(X) = U^i(0, X^{-i}) + \int_0^{X^i}\frac{\partial}{\partial X^i}U^i(x^i, X^{-i})\,dx^i$$
because U i is absolutely continuous, where for each X −i , the partial derivative of U i (X i , X −i )
in X i is defined almost everywhere. Thus the derivative of Vti is calculated by
$$\begin{aligned}
\frac{\partial}{\partial A^i_t}V^i_t(A_t; X_t) &= \lim_{\delta\to 0}\int\!\!\int \frac{1}{\delta}\left(\int_{X^i_T}^{X^i_T + \Delta\delta} \frac{\partial}{\partial X^i} U^i(x^i, X^j_T)\,dx^i\right)\phi^i(\epsilon^i_t)\,d\epsilon^i_t\,\phi^j(\epsilon^j_t)\,d\epsilon^j_t - \Delta (c^i)'(A^i_t) \\
&= \lim_{\delta\to 0}\int\!\!\int \frac{1}{\delta}\left(U^i(X^i_T + \Delta\delta, X^j_T) - U^i(X_T)\right)\phi^i(\epsilon^i_t)\,d\epsilon^i_t\,\phi^j(\epsilon^j_t)\,d\epsilon^j_t - \Delta (c^i)'(A^i_t) \\
&= \Delta\int\!\!\int \frac{\partial}{\partial X^i}U^i(X_T)\,\phi^i(\epsilon^i_t)\,d\epsilon^i_t\,\phi^j(\epsilon^j_t)\,d\epsilon^j_t - \Delta (c^i)'(A^i_t),
\end{aligned}$$
where we used the Lebesgue convergence theorem in the third line since φi and the partial
derivatives of U i are bounded.
Applying Lemma 2 again, the second derivatives are derived as
$$\begin{aligned}
\frac{\partial^2}{\partial A^i_t\,\partial A^j_t}V^i_t(A_t; X_t) &= -\frac{\Delta^2}{r_\Delta}\int\!\!\int \frac{\partial}{\partial X^i}U^i(X_T)\,\phi^i(\epsilon^i_t)\,d\epsilon^i_t\,(\phi^j)'(\epsilon^j_t)\,d\epsilon^j_t, \\
\frac{\partial^2}{\partial (A^i_t)^2}V^i_t(A_t; X_t) &= -\frac{\Delta^2}{r_\Delta}\int\!\!\int \frac{\partial}{\partial X^i}U^i(X_T)\,(\phi^i)'(\epsilon^i_t)\,d\epsilon^i_t\,\phi^j(\epsilon^j_t)\,d\epsilon^j_t - \Delta (c^i)''(A^i_t).
\end{aligned}$$
Since $U^i$ is absolutely continuous in $X^i$, we can apply integration by parts to obtain
$$\begin{aligned}
&\left|\int \frac{\partial}{\partial X^i}U^i(X^i_t + \Delta A^i_t + r_\Delta\epsilon^i_t, X^j_T)\,\phi^i(\epsilon^i_t)\,d\epsilon^i_t\right| \\
&= \frac{1}{r_\Delta}\left|\Big[U^i(X^i_t + \Delta A^i_t + r_\Delta\epsilon^i, X^j_T)\,\phi^i(\epsilon^i)\Big]_{-\infty}^{\infty} - \int U^i(X^i_t + \Delta A^i_t + r_\Delta\epsilon^i_t, X^j_T)\,(\phi^i)'(\epsilon^i_t)\,d\epsilon^i_t\right| \\
&\leq \frac{1}{r_\Delta}\sup_{x\in\mathbb{R}^2}|U^i(x)|\left(\lim_{\epsilon^i\to\infty}\phi^i(\epsilon^i) + \lim_{\epsilon^i\to-\infty}\phi^i(\epsilon^i)\right) + \frac{1}{r_\Delta}\sup_{x\in\mathbb{R}^2}|U^i(x)|\int|(\phi^i)'(\epsilon^i_t)|\,d\epsilon^i_t \\
&= \frac{1}{r_\Delta}\sup_{x\in\mathbb{R}^2}|U^i(x)|\int|(\phi^i)'(\epsilon^i_t)|\,d\epsilon^i_t.
\end{aligned}$$
Thus
$$\left|\frac{\partial^2}{\partial A^i_t\,\partial A^j_t}V^i_t(A_t; X_t)\right| \leq \frac{\Delta^2}{r_\Delta^2}\,C$$
for all $i \in N$, $X_t \in \mathbb{R}^n$, and $A_t \in \prod_{i\in N}A^i$, where
$$C := \max_{j=1,2}\ \sup_{x\in\mathbb{R}^2,\,A^j}|U^j(x) - Tc^j(A^j)|\int|(\phi^1)'(\epsilon^1_t)|\,d\epsilon^1_t\int|(\phi^2)'(\epsilon^2_t)|\,d\epsilon^2_t < \infty$$
by assumption.
There is $M > 0$ such that if $\lim_{\Delta\to 0} r_\Delta^2/\Delta \geq M$, there is $\bar\Delta > 0$ such that if $\Delta < \bar\Delta$,
$$\frac{\left|\frac{\partial^2}{\partial A^i_t\,\partial A^j_t}V^i_t(A_t; X_t)\right|}{\left|\frac{\partial^2}{\partial (A^i_t)^2}V^i_t(A_t; X_t)\right|} \leq \frac{\frac{\Delta^2}{r_\Delta^2}C}{\Delta\bar c - \frac{\Delta^2}{r_\Delta^2}C} < \frac{1}{2}, \qquad (12)$$
where $\bar c := \min_{A^j,\,j=1,2}(c^j)''(A^j) > 0$.
Therefore, for $\Delta < \bar\Delta$, each player i's objective function is strictly concave in $A^i_t$. Under this concave payoff function, player i's best response against the other player's action at this subgame is always in pure strategies and is given by the FOC $\frac{\partial}{\partial A^i_t}V^i_t(A_t; X_t) = 0$ in the case of an interior solution, and otherwise by
$$A^i = \begin{cases}\max A^i & \text{if } \frac{\partial}{\partial A^i_t}V^i_t(\max A^i, A^j_t; X_t) > 0, \\[2pt] \min A^i & \text{if } \frac{\partial}{\partial A^i_t}V^i_t(\min A^i, A^j_t; X_t) < 0.\end{cases}$$
By the implicit function theorem, the slope of player i's best response can be bounded in absolute value by the inequality (12). Note that a pure-strategy equilibrium in this subgame is a fixed point of the map over $\prod_{i\in N}A^i$ defined by the best-response functions above. Since the map is a contraction, the equilibrium action profile after the history $h_t$ is unique. Since the payoff function is strictly concave, there is no mixed-strategy equilibrium in this subgame either.33 Furthermore, since $\bar\Delta$ depends only on the properties of $(U^i, c^i, \phi^i)_{i\in N}$, we can conclude that after any period $T - \Delta$ history, there is a unique equilibrium action profile that is played if $\Delta < \bar\Delta$.
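The contraction logic of the last-period subgame can be illustrated numerically. In the sketch below, the threshold terminal payoff, quadratic cost, grid-search best response, and all parameter values are our own stylized choices; the point is only that when $\Delta$ is small relative to the noise, best-response iteration from very different starting points converges to the same profile:

```python
import math

def Phi(x):  # standard normal cdf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

# Stylized last-period subgame: terminal payoff U^i = 1{X^1_T + X^2_T >= theta}
# (a discontinuous threshold payoff smoothed by Gaussian noise) with quadratic
# effort cost. All functional forms and numbers are our own choices.
s, theta = 0.0, 0.5        # current aggregate state and threshold
delta, r = 0.01, 0.3       # period length Delta and noise size r_Delta

def payoff(a, a_other):
    m = (s + delta * (a + a_other) - theta) / (r * math.sqrt(2))
    return Phi(m) - delta * a * a / 2

def best_response(a_other):
    grid = [k / 1000 for k in range(1001)]   # crude stand-in for the FOC
    return max(grid, key=lambda a: payoff(a, a_other))

# Iterating from the two extreme starting points converges to the same
# action, i.e., the subgame equilibrium is (numerically) unique.
a, b = 0.0, 1.0
for _ in range(50):
    a, b = best_response(a), best_response(b)
assert abs(a - b) < 0.01
```

Shrinking the noise $r$ toward zero while holding $\Delta$ fixed steepens the smoothed payoff and eventually breaks the contraction, matching the role of the flexibility condition.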
For $\Delta < \bar\Delta$ and $t = T - \Delta$, denote by $W^i(h_t) := E[U^i(X_t + \Delta A(h_t) + r_\Delta\epsilon_t)\,|\,h_t] - \Delta c^i(A^i(h_t))$ the continuation value after any period $t$ history $h_t$, where $A^i(h_t)$ is the unique equilibrium action after history $h_t$. Since each $A^i(h_t)$ depends only on the current state $X_t$ by the above construction, we can write the continuation values and equilibrium actions as functions of $X_t$ instead, i.e., $W^i(X_t)$ and $A^i(X_t)$.

Note that for any $X_t$ that is induced by some history $h_t$, $(A^1(X_t), A^2(X_t))$ is the unique fixed point of a contraction mapping. By the construction of $\bar\Delta$, the modulus of the contraction is uniformly bounded away from 1 in $X_t$. By Lemma 3, $A^i(X_t)$ is Lipschitz continuous in $X_t$. This also implies that $W^i(X_t)$ is Lipschitz continuous in $X_t$.
Consider period $t = T - 2\Delta$. Taking any period $t$ history $h_t$ and denoting by $X_t$ the current state,
$$V^i_t(A_t; X_t) := \int W^i(X_t + \Delta A_t + r_\Delta\epsilon_t)\prod_{j=1,2}\phi^j(\epsilon^j_t)\,d\epsilon_t - \Delta c^i(A^i_t).$$
Since we know that the function $W^i$ is Lipschitz continuous and bounded in $X_t$, we can repeat the same argument as above to calculate the second-order derivatives of the payoff functions. Since $|W^i(X)| \leq \max_{X, A^j}|U^j(X)| + T|c^j(A^j)|$, we can use the same constant $C$ as above to bound the derivative terms. Following the same argument, we can conclude that there is a unique action profile (in pure strategies) that is played after the history if $\Delta < \bar\Delta$, where $\bar\Delta$ is taken to be the same as above.

We can repeat the same procedure for earlier periods $t < T - 2\Delta$, which confirms that the model admits a unique subgame-perfect equilibrium, which is in pure strategies. Also, by construction, the strategies are Markovian (i.e., functions of $(t, X_t)$).
B.2 Proof of Proposition 1

Let $\bar c := \sup_{A\in A}|c(A)|$. We consider a hypothetical single-agent model where a single agent chooses $\hat A^i_t \in A$ for each $i$ and $t$. The final payoff is given as $U(\sum_i X^i_T) - \Delta\sum_{i,t}(c(\hat A^i_t) + \bar c)$.
33 The inequality (12) also ensures the total concavity condition of Rosen (1965), i.e., for each $i = 1, 2$,
$$\frac{\partial^2}{\partial (A^i_t)^2}V^i_t(A_t; X_t) + \sum_{j\neq i}\frac{\partial^2}{\partial A^i_t\,\partial A^j_t}V^i_t(A_t; X_t) < 0.$$
Thus, the equilibrium is also the unique correlated equilibrium (Ui (2008)).
Note that this problem has a symmetric optimal control policy that is Markovian, denoted by
Â∆ (t, Xt ), and the continuation value, denoted by Ŵ∆ (t, Xt ).
Lemma 4. In the team production model, there is a symmetric subgame-perfect equilibrium
strategy that is Markovian, denoted by A∆ (t, Xt ), and the corresponding continuation value,
denoted by W∆ (t, Xt ), such that
W∆ (t, Xt ) ≥ Ŵ∆ (t, Xt )
for any t and Xt .
Proof. We prove the lemma by backward induction.
1. Take any history in the team production model at $t = T - \Delta$, and let $X_t$ denote the corresponding state. In the hypothetical single-agent model at $(t, X_t)$, the symmetric optimal control policy $\hat A_\Delta(t, X_t)$ solves
$$\max_{A_t\in A}\ E\left[U\Big(X_t + N\Delta A_t + \sum_i r_\Delta\epsilon^i_t\Big)\right] - N\Delta c(A_t).$$
Thus, it also solves
$$\max_{A_t\in A}\ E\left[U\Big(X_t + (N-1)\Delta\hat A_\Delta(t, X_t) + \Delta A_t + \sum_i r_\Delta\epsilon^i_t\Big)\right] - \Delta c(A_t)$$
because the cost terms are linearly separable among players. Therefore,
$$A_\Delta(t, X_t) := \hat A_\Delta(t, X_t)$$
satisfies the incentive compatibility of players in the team production model at this subgame. Associated with this action profile at this subgame, we observe
$$W_\Delta(t, X_t) = \hat W_\Delta(t, X_t) + (N-1)\Delta\big(c(A_\Delta(t, X_t)) + \bar c\big) \geq \hat W_\Delta(t, X_t),$$
which uses $c(A_\Delta(t, X_t)) + \bar c \geq 0$.
2. Next, at t = T − 2∆, take any history in the team production model, and let Xt denote
the corresponding state. We consider a modified single problem in which the single agent
chooses At at t = T − 2∆ given W∆ (t, Xt ) at t = T − ∆, instead of Ŵ∆ (t, Xt ). In other
words, the continuation payoff of the single agent at the last period is replaced with that
of the team production model. We denote the optimal control policy by $\tilde A_\Delta(t, X_t)$ and its associated continuation payoff by $\tilde W_\Delta(t, X_t)$. We first observe that
$$A_\Delta(t, X_t) := \tilde A_\Delta(t, X_t)$$
satisfies the incentive compatibility condition at $t = T - 2\Delta$ because the two problems share the same continuation payoff at $t = T - \Delta$. Denote the associated (symmetric) continuation payoff by $W_\Delta(t, X_t)$. Then we can claim the following:
$$W_\Delta(t, X_t) \geq \tilde W_\Delta(t, X_t) \geq \hat W_\Delta(t, X_t).$$
The first inequality can be proved in the same manner as in step 1; we just need to replace $U$ with $W_\Delta(T - \Delta, \cdot)$ because the two models share the same continuation payoff at the last period. For the second inequality, we rely on the previous observation that $W_\Delta(T - \Delta, X_{T-\Delta}) \geq \hat W_\Delta(T - \Delta, X_{T-\Delta})$: the single agent could always take the same action in the original and modified problems, so the higher continuation payoff at each value of the state variable implies $\tilde W_\Delta(t, X_t) \geq \hat W_\Delta(t, X_t)$.
3. For t = T − 3∆, · · · , 0, we can continue this argument until the start of the game.
Lemma 5. For any $\epsilon > 0$, there exists $\delta > 0$ such that if $\max\{\sup|c|,\ r_\Delta^2/\Delta\} \leq \delta$, then
$$\hat W_\Delta(0, X_0) \geq \max_{X\in\prod_{i\in N}(X^i_0 + TA)} U\Big(\sum_i X^i\Big) - \epsilon.$$
Proof. Take
$$X^\star_T \in \arg\max_{X\in\prod_{i\in N}(X^i_0 + TA)} U\Big(\sum_i X^i\Big),$$
and consider the constant control policy
$$A^{i,\star}_t = \frac{X^{i,\star}_T - X^i_0}{T}.$$
This control policy provides a lower bound on $\hat W_\Delta(0, X_0)$:
$$\hat W_\Delta(0, X_0) \geq E\left[U\Big(\sum_i X^{i,\star}_T + \sum_{i,t} r_\Delta\epsilon^i_t\Big)\right] - 2NT\bar c. \qquad (13)$$
Since $U$ is bounded and Lipschitz continuous,
$$E\left[U\Big(\sum_i X^{i,\star}_T + \sum_{i,t} r_\Delta\epsilon^i_t\Big)\right] \longrightarrow U\Big(\sum_i X^{i,\star}_T\Big)$$
as $r_\Delta^2/\Delta$ goes to zero, by Chebyshev's inequality. In addition, the last term in (13) vanishes as the cost bound goes to zero, which completes the proof.
Proposition 1 is a consequence of Lemmas 4 and 5.
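The Chebyshev step in Lemma 5 turns on the fact that the accumulated noise $\sum_{i,t} r_\Delta\epsilon^i_t$ has variance $N T\, (r_\Delta^2/\Delta)$ for unit-variance shocks, which vanishes exactly when $r_\Delta^2/\Delta \to 0$. A Monte Carlo sketch with made-up parameter values (our own illustration, not from the paper):

```python
import random

random.seed(1)
N, T = 3, 1.0   # illustrative number of players and horizon

def total_noise_variance(delta, r, n_sims=5000):
    """Sample variance of sum_{i,t} r * eps^i_t over n_sims simulated paths,
    with T/delta periods and N players of i.i.d. standard normal shocks."""
    periods = int(T / delta)
    draws = [sum(r * random.gauss(0.0, 1.0) for _ in range(N * periods))
             for _ in range(n_sims)]
    mean = sum(draws) / n_sims
    return sum((d - mean) ** 2 for d in draws) / n_sims

for delta, r in [(0.1, 0.1), (0.02, 0.02)]:
    predicted = N * T * (r * r / delta)   # N * (T/delta) * r^2
    assert abs(total_noise_variance(delta, r) - predicted) < 0.2 * predicted
```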
B.3 Proof of Corollary 1
Proof. Corollary 1 is a consequence of Theorem 1 and Proposition 1.
C Proofs of Results from Continuous-time Limit Model with Brownian Noise

C.1 Proof of Lemma 1
We note that the model is constructed from a canonical probability space (Ω, F, P) where there
is no drift in X and changes in the probability measure depend on A = (Ai )i∈N . We denote
this new probability space by (Ω, F, PA ; F X ), which has been augmented with the filtration
generated by X, and the the corresponding expectation operator by EA . This construction is
standard and (often implicitly) used by most continuous-time models that feature imperfect
public monitoring.34
We begin with the following preliminary lemma (While the result is standard, we include
its proof for its completeness).
Lemma 6. Fix a strategy profile A = (Ai )i∈N . Then W0i (A) ≥ W0i (Âi , (Aj )j6=i ) for any strategy
Âi of player i if and only if Ait = f i (βt (A)) for almost all t ∈ [0, T ] almost surely, where β(A)
is defined as in equation (7).
Proof. Fix any strategy profile A = (Ai )i∈N . As discussed in Section 5.2, player i’s continuation
34
Chapter 5 of Cvitanic and Zhang (2012) discusses the technical details more extensively in the context of
dynamic contracts.
36
payoff under $A$ takes the form
$$\begin{aligned}
W^i_t(A) &= E^A\left[U^i(X_T) - \int_t^T c^i(A^i_s)\,ds \,\Big|\, F^X_t\right] \\
&= U^i(X_T) - \int_t^T c^i(A^i_s)\,ds - \int_t^T \beta^i_s(A)\cdot dZ_s(A) \\
&= U^i(X_T) - \int_t^T \left(c^i(A^i_s) - \beta^i_s(A)\cdot\Sigma^{-1}(\mu(X_s) + A_s)\right)ds - \int_t^T \beta^i_s(A)\cdot\Sigma^{-1}\,dX_s,
\end{aligned}$$
with $W^i_T(A) = U^i(X_T)$. The second equality follows by applying the extended martingale representation theorem to the conditional expectation of the total payoff (see, for example, Chapter 5 and Lemma 10.4.6 in Cvitanic and Zhang (2012)), and the third equality uses $dX_t = (\mu(X_t) + A_t)\,dt + \Sigma\,dZ_t$.
The pair $(W^i(A), \beta^i(A))$ is a solution to the one-dimensional BSDE (under the original probability space) with driver $-c^i(A^i_t) + \beta^i_t(A)\cdot\Sigma^{-1}(\mu(X_t) + A_t)$ and terminal condition $U^i(X_T)$. Note that we treat $(A^j)_{j\neq i}$ as exogenous processes in this BSDE formulation.
Take another strategy profile $\tilde A = (\tilde A^j)_{j\in N}$ in which player $i$ uses strategy $\tilde A^i_t = f^i(\beta_t(A))$ and $\tilde A^j = A^j$ for every $j \neq i$. Using the same argument as above, player i's continuation value takes the form
$$W^i_t(\tilde A) = U^i(X_T) - \int_t^T \left(c^i(\tilde A^i_s) - \beta^i_s(\tilde A)\cdot\Sigma^{-1}\big(\mu(X_s) + \tilde A_s\big)\right)ds - \int_t^T \beta^i_s(\tilde A)\cdot\Sigma^{-1}\,dX_s$$
with $W^i_T(\tilde A) = U^i(X_T)$. Likewise, $(W^i(\tilde A), \beta^i(\tilde A))$ is a solution to the one-dimensional BSDE with driver $-c^i(\tilde A^i_t) + \beta^i_t\cdot\Sigma^{-1}(\mu(X_t) + \tilde A_t)$ and terminal condition $U^i(X_T)$. Note that the driver
term satisfies the quadratic growth condition, because f i is bounded and Lipschitz continuous
(Lemma S4 in Supplementary Appendix). By the comparison theorem (Lemma S1), W0i (Ã) ≥
W0i (A) holds. This proves the “if” direction.
To prove the "only if" direction, consider the case in which $A^i_t \neq f^i(\beta_t)$ on a set of strictly positive $dt\otimes dP$ measure. Then, by the strict-inequality part of the comparison theorem (Lemma S1), $W^i_0(\tilde A) > W^i_0(A)$.
Proof of Lemma 1. To show the “if” direction, take any strategy profile A = (Ai )i∈N under
which conditions (5), (7), (8), and (9) hold for some $F^X$-progressively measurable processes $(W, \beta)$ such that $E[\sup_{0\leq t\leq T}|W_t|^2] < \infty$ and $E[\int_0^T|\beta_t|^2\,dt] < \infty$. As we have seen in the proof of Lemma 6, for player $i$, the pair of processes $(W^i(A), \beta^i(A))$ is the unique solution to a one-dimensional BSDE with driver $-c^i(A^i_t) + \beta^i_t\cdot\Sigma^{-1}(\mu(X_t) + A_t)$ and terminal condition $U^i(X_T)$. Thus
Wti = Wti (A) and βti = βti (A) (for almost all t almost surely). By Lemma 6 and the incentive compatibility condition (9), each player maximizes her expected total payoff. Thus A
is a Nash equilibrium.
To show the “only if” direction, take any Nash equilibrium A = (Ai )i∈N . Then the incentive
compatibility condition (9) holds by Lemma 6. The remaining conditions follow immediately
by setting Wti = Wti (A) and βti = βti (A).
C.2 Proof of Theorem 2
Proof. Take any solution $(W, \beta)$ of BSDE (10), if one exists. This is a solution to an FBSDE of the form35
$$\begin{aligned}
dW^i_t &= c^i(f^i(\beta_t))\,dt + \beta^i_t\cdot dZ_t, \qquad W^i_T = U^i(X_T), \quad i \in N, \\
dX_t &= (\mu(X_t) + f(\beta_t))\,dt + \Sigma\,dZ_t,
\end{aligned} \qquad (14)$$
under the probability space $(\Omega, \mathcal{G}, Q)$ in which $\mathcal{G} = F^X$ and the $Z_t$-process is a $d$-dimensional Brownian motion. The probability measure $Q$ is constructed from the original canonical probability space $(\Omega, F^X, P)$ by the Girsanov theorem, which ensures $F^Z \subseteq F^X$.
We apply Lemma S3 in Supplementary Appendix to show that a unique solution (X, W, β)
to FBSDE (14) exists. To check condition 1 in Lemma S3, first note that the drift term in the
expression for the Xt process, namely µ(Xt ) + f (βt ), is Lipschitz continuous in βt by Lemma S4
in Supplementary Appendix, which ensures the first inequality. The second inequality follows
by the Cauchy–Schwarz inequality and the Lipschitz continuity of $\mu(X_t) + f(\beta_t)$ in $X$ (Lemma
S4), and the boundedness of µ0 and Ai for each i ensures the third inequality. To check condition
2, note that the drift term in the expression for the Wti process, namely ci (f i (β)), is Lipschitz
continuous in (X, β) by Lemma S4 and the assumptions on ci . This term is independent of W ,
which ensures the second inequality. The third inequality follows from the boundedness of ci
for each i. Condition 3 is satisfied by the assumption on the terminal payoffs.
Thus, if there were another solution $(W', \beta')$ to BSDE (10), it would induce another solution $(X', W', \beta')$ to FBSDE (14), which must coincide with the solution $(X, W, \beta)$. Thus $(W_t, \beta_t) = (W'_t, \beta'_t)$ for almost all $t$ almost surely. By Theorem 2.1 of Cheridito and Nam (2015), the process $(W, \beta)$ is a solution to BSDE (10). This shows the existence of a unique Nash equilibrium by Lemma 1.
Finally, by Lemma S3, the unique solution (X, W, β) to the FBSDE with G = F Z takes the
35 In (14), $f(\beta_t)$ denotes a column vector whose $i$-th component is $f^i(\beta_t)$.
form Wt = w(t, Xt ), βt = b(t, Xt ) for some function w : [0, T ] × Rn → Rn that is uniformly
Lipschitz continuous in x ∈ Rn and some function b : [0, T ] × Rn → Rd×n .
C.3 Proof of Proposition 2
Proof. By symmetry of the players, at the unique equilibrium $A = (A^i)_{i\in N}$, every player chooses the same action, namely $A^i_t = f(\beta_t(A))$, where $f(\beta) := \arg\max_{\alpha\in A}\{\sigma^{-1}\beta\alpha - \kappa c(\alpha)\}$, at almost all $t$ almost surely. Thus the equilibrium BSDE (10) reduces to the one-dimensional BSDE
$$dW_t = -\left(N f(\beta_t)\sigma^{-1}\beta_t - \kappa c(f(\beta_t))\right)dt + \sigma^{-1}\beta_t\,dX_t, \qquad W_T = U(X_T). \qquad (15)$$
As a next step we observe that βt (A) ≥ 0 for almost all t almost surely if U is increasing.
To see this, first note that the driver term in (15) satisfies the quadratic growth condition, and
that the terminal payoff U (x) is bounded, Lipschitz continuous, and increasing. Therefore, the
claim follows by Corollary 3.5 and Remark 3.8 in dos Reis and dos Reis (2013).
For any $\beta \geq 0$, the term $\max_\alpha\{\sigma^{-1}\beta\alpha - \kappa c(\alpha)\}$ is decreasing in $\kappa$ by the envelope theorem. Let $f(\beta; \kappa) := \arg\max_\alpha\{\sigma^{-1}\beta\alpha - \kappa c(\alpha)\}$. Thus, for each $\beta \geq 0$, the driver term in the equilibrium BSDE (15),
$$(N-1)f(\beta; \kappa)\,\sigma^{-1}\beta + \max_\alpha\{\sigma^{-1}\beta\alpha - \kappa c(\alpha)\},$$
is decreasing in $\kappa$ and increasing in $N$. By Lemma S1 (the comparison theorem for one-dimensional quadratic BSDEs) in the Supplementary Appendix, $W_0(A(\kappa, N)) \geq W_0(A(\tilde\kappa, N))$ for any $\kappa \leq \tilde\kappa$ and $N$, and $W_0(A(\kappa, N)) \leq W_0(A(\kappa, \tilde N))$ for any $N \leq \tilde N$ and $\kappa$.
C.4 Proof of Proposition 3
In the Supplementary Appendix, we construct an equilibrium strategy profile for general quadratic payoffs $U^i = \sum_{k=1,2}(\alpha_{ik}(x^k)^2 + \beta_{ik}x^k) + \gamma_i x^1 x^2$ in Proposition S1. The equilibrium actions and continuation values take the form $a^i(t, x^1, x^2) = \frac{1}{\bar c_i}\big(2\alpha_{ii}(t)x^i + \beta_{ii}(t) + \gamma_i(t)x^j\big)$ and $w^i(t, x^1, x^2) = \sum_{k=i,j}\big(\alpha_{ik}(t)(x^k)^2 + \beta_{ik}(t)x^k\big) + \gamma_i(t)x^i x^j + \delta_i(t)$, where the coefficients are obtained by solving the system of ODEs (20) given in that proof. For the current game, we can directly verify the
solution
$$\begin{aligned}
&\alpha_{11}(t) = \alpha_{12}(t) = \gamma_1(t) = \alpha_{22}(t) = 0, \\
&\beta_{11}(t) = \frac{T-t}{\bar c_2}, \qquad \beta_{12}(t) = 1, \qquad \delta_1(t) = \frac{(T-t)^3}{3\bar c_1\bar c_2^2}, \\
&\alpha_{21}(t) = \frac{T-t}{2\bar c_2}, \qquad \beta_{22}(t) = \frac{(T-t)^2}{2\bar c_1\bar c_2}, \qquad \beta_{21}(t) = \frac{(T-t)^3}{2\bar c_1\bar c_2^2}, \\
&\gamma_2(t) = 1, \qquad \delta_2(t) = \frac{\sigma_1^2(T-t)^2}{4\bar c_2} + \frac{3(T-t)^5}{20\bar c_1^2\bar c_2^3},
\end{aligned}$$
where the boundary conditions (21) given in the proof of Proposition S1 take the form $\beta_{12}(T) = \gamma_2(T) = 1$, $\alpha_{11}(T) = \alpha_{12}(T) = \alpha_{21}(T) = \alpha_{22}(T) = \beta_{11}(T) = \beta_{21}(T) = \beta_{22}(T) = \gamma_1(T) = \delta_1(T) = \delta_2(T) = 0$.
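The coefficient functions can be sanity-checked against the boundary conditions at $t = T$ directly; a quick sketch (the cost and noise parameter values are arbitrary choices):

```python
# Closed-form coefficients of the quadratic-payoff equilibrium, evaluated at
# t = T, must satisfy beta12(T) = gamma2(T) = 1 and all others = 0.
T, c1, c2, sigma1 = 1.0, 2.0, 3.0, 0.5   # arbitrary illustrative parameters

def coeffs(t):
    s = T - t
    return {
        "alpha11": 0.0, "alpha12": 0.0, "gamma1": 0.0, "alpha22": 0.0,
        "beta11": s / c2, "beta12": 1.0, "delta1": s**3 / (3 * c1 * c2**2),
        "alpha21": s / (2 * c2), "beta22": s**2 / (2 * c1 * c2),
        "beta21": s**3 / (2 * c1 * c2**2),
        "gamma2": 1.0,
        "delta2": sigma1**2 * s**2 / (4 * c2) + 3 * s**5 / (20 * c1**2 * c2**3),
    }

end = coeffs(T)
assert end["beta12"] == 1.0 and end["gamma2"] == 1.0
assert all(end[k] == 0.0 for k in end if k not in ("beta12", "gamma2"))
```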
Therefore, the equilibrium strategies are
$$a^1(t, x^1, x^2) = \frac{T-t}{\bar c_1\bar c_2}, \qquad a^2(t, x^1, x^2) = \frac{(T-t)^2}{2\bar c_1\bar c_2^2} + \frac{x^1}{\bar c_2}.$$
For the continuation value functions,
$$\begin{aligned}
w^1(t, x^1, x^2) &= \frac{T-t}{\bar c_2}\,x^1 + x^2 + \frac{(T-t)^3}{3\bar c_1\bar c_2^2}, \\
w^2(t, x^1, x^2) &= \frac{T-t}{2\bar c_2}(x^1)^2 + \frac{(T-t)^3}{2\bar c_1\bar c_2^2}\,x^1 + \frac{(T-t)^2}{2\bar c_1\bar c_2}\,x^2 + x^1x^2 + \frac{\sigma_1^2(T-t)^2}{4\bar c_2} + \frac{3(T-t)^5}{20\bar c_1^2\bar c_2^3}.
\end{aligned}$$

D Proofs of Results from Discrete-Jump Model

D.1 Proof of Theorem 3
To simplify the notation, we only consider the two-player case; the general case can be proved analogously. Take any history $h_t$ at $t = T - \Delta$ and let $X_t$ denote the state at $t$. After this history, each player i's equilibrium action $A^i_t(h_t)$ maximizes the payoff function $V^i_t(\cdot\,; X_t) : \prod_{j=1,2}A^j \to \mathbb{R}$ defined by
$$V^i_t(A_t; X_t) := \sum_x p_i(x^i)\,p_j(x^j)\,U^i(x) - \Delta c^i(A^i_t)$$
over $A^i$, given the other player's equilibrium action $A^j_t$, where for each $i = 1, 2$ the transition probabilities $p_i(x^i)$ are specified by
$$p_i(x^i) = \begin{cases} 1 - \Delta\lambda\sum_{x^i\neq X^i_t} A^i_t(x^i)\,\mu_i(x^i, X^i_t) & \text{for } x^i = X^i_t, \\ \Delta\lambda A^i_t(x^i)\,\mu_i(x^i, X^i_t) & \text{if } x^i \neq X^i_t. \end{cases}$$
The derivative of $V^i_t$ is calculated as
$$\frac{\partial}{\partial A^i_t(x^i)}V^i_t(A_t; X_t) = \begin{cases} -\Delta c_i'(A^i_t(x^i)) & \text{for } x^i = X^i_t, \\ \lambda\Delta\sum_{x^j} p_j(x^j)\,\mu_i(x^i, X^i_t)\big(U^i(x) - U^i(X^i_t, x^j)\big) - \Delta c_i'(A^i_t(x^i)) & \text{if } x^i \neq X^i_t. \end{cases}$$
The cross derivatives are, for $x^i \neq X^i_t$,
$$\frac{\partial^2}{\partial A^i_t(x^i)\,\partial A^j_t(x^j)}V^i_t(A_t; X_t) = \begin{cases} 0 & \text{for } x^j = X^j_t, \\ \lambda^2\Delta^2\,\mu_j(x^j, X^j_t)\,\mu_i(x^i, X^i_t)\big(U^i(x) - U^i(X^i_t, x^j)\big) & \text{if } x^j \neq X^j_t, \end{cases}$$
and $\frac{\partial^2}{\partial A^i_t(x^i)\,\partial A^j_t(x^j)}V^i_t(A_t; X_t) = 0$ for $x^i = X^i_t$ and any $x^j$. On the other hand, the second own derivatives are
$$\frac{\partial^2}{(\partial A^i_t(x^i))^2}V^i_t(A_t; X_t) = -\Delta c_i''(A^i_t(x^i))$$
for each $x^i$.
Therefore, there is $\bar\Delta > 0$ such that if $\Delta < \bar\Delta$,
$$\frac{\left|\frac{\partial^2}{\partial A^i_t(x^i)\,\partial A^j_t(x^j)}V^i_t(A_t; X_t)\right|}{\left|\frac{\partial^2}{(\partial A^i_t(x^i))^2}V^i_t(A_t; X_t)\right|} \leq \frac{\Delta^2 C}{\Delta\bar c} < \frac{1}{|X^i|} \qquad (16)$$
for each $i = 1, 2$, where $\bar c := \min_{A^j,\,j=1,2} c_j''(A^j) > 0$ and
$$C := \lambda^2\max_{j=1,2,\ x,x',\ x^j,x^{j\prime},\ A^j,A^{j\prime}}\big(\mu_j(x^j, x^{j\prime})\big)^2\Big(|U^j(x) - U^j(x')| + T|c_j(A^j) - c_j(A^{j\prime})|\Big) > 0,$$
which is independent of $\Delta$ and of players' strategy profiles.
Thus, for $\Delta < \bar\Delta$, for each $i$, $X_t$, and $A^j$, $V^i_t(A^i_t, A^j; X_t)$ is strictly concave in $A^i_t$. Under this concave payoff function, player i's best response against the other's action at this subgame is always in pure strategies and is given by the FOC $\frac{\partial}{\partial A^i_t(x^i)}V^i_t(A_t; X_t) = 0$ in the case of an interior solution. By the implicit function theorem, the slope of player i's best response can be bounded in absolute value by (16). Note that a pure-strategy equilibrium in this subgame is a fixed point of the map over $\prod_{i\in N}A^i$ defined by the best-response functions above. Since the map is a contraction, the equilibrium action profile after the history $h_t$ is unique. Also, since the payoff function is strictly concave, there is no mixed-strategy equilibrium in this subgame either.

Furthermore, since $\bar\Delta$ depends only on the properties of $(U^i, \mu_i)_{i\in N}$, we can conclude that after any period $T - \Delta$ history, there is a unique equilibrium action profile that is played if $\Delta < \bar\Delta$.
¯ and t = T −∆, denote by W i (ht ) := E[U i (XT )|ht ]−∆ci (Ai (ht )) the continuation
For ∆ < ∆
value after any period t history ht , where Ai (ht ) is the unique equilibrium action after history
ht . Since each Ai (ht ) depends only only the current state Xt by the above construction, we
can write the continuation values and equilibrium actions as functions of (t, Xt ) instead, i.e.,
Wti (Xt ) and Ait (Xt ).
Consider period $t = T - 2\Delta$. Taking any period-$t$ history $h_t$ and denoting by $X_t$ the current state, define
\[
V_t^i(A_t; X_t) := \int W^i(X_t + \Delta A_t + r_\Delta\varepsilon_t)\prod_{j=1,2}\phi_j(\varepsilon_t^j)\,d\varepsilon_t - \Delta c_i(A_t^i).
\]
Since we know that the function $W^i$ is bounded in $X_t$, we can repeat the same argument as above to calculate the second-order derivatives of the payoff functions. Since $|W^i(X)| \le \max_{X,A^j} |U^j(X)| + T|c_i(A^j)|$, we can use the same constants $C, C'$ used above to bound the derivative terms. Following the same argument, we can conclude that there is a unique action profile (in pure strategies) that is played after the history if $\Delta < \bar\Delta$, where $\bar\Delta$ is taken to be the same as above.

We can repeat the same procedure for earlier periods $t < T - 2\Delta$, which confirms that the model admits a unique subgame-perfect equilibrium, which is in pure strategies. Also, by construction, the strategies are Markovian (i.e., functions of $(t, X_t)$).
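The contraction step in the argument above is constructive: at each date, iterating the joint best-response map converges to its unique fixed point. A stripped-down sketch, using a toy stage game of our own choosing rather than the model's payoffs:

```python
def best_response_fixed_point(br1, br2, a1=1.0, a2=-1.0, tol=1e-12, max_iter=10000):
    # Iterate the joint best-response map; when the map is a contraction
    # (best-response slopes bounded by a constant below 1, as in (16)),
    # the iteration converges to its unique fixed point.
    for _ in range(max_iter):
        n1, n2 = br1(a2), br2(a1)
        if abs(n1 - a1) + abs(n2 - a2) < tol:
            return n1, n2
        a1, a2 = n1, n2
    raise RuntimeError("best-response iteration did not converge")

# Toy stage game (hypothetical): payoff_i = -(a_i - k * a_j)^2 with |k| < 1,
# so player i's best response is k * a_j and its slope k makes the map a contraction.
k = 0.4
a1_star, a2_star = best_response_fixed_point(lambda aj: k * aj, lambda aj: k * aj)
# unique fixed point is (0, 0)
```

Because the slope of each best response is below one in absolute value, the starting point is irrelevant: any initial action profile is driven to the same, unique equilibrium of the stage.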
References
Ambrus, A., J. Burns, and Y. Ishii (2014): “Gradual bidding in eBay-like auctions,”
mimeo.
Ambrus, A., and S. E. Lu (2015): “A continuous-time model of multilateral bargaining,”
American Economic Journal: Microeconomics, 7(1), 208–249.
Athey, S., and A. Schmutzler (2001): “Investment and Market Dominance,” The RAND
Journal of Economics, 32(1), 1–26.
Bergin, J., and W. B. MacLeod (1993): “Continuous Time Repeated Games,” International Economic Review, 34(1), 21–37.
Bohren, A. (2016): “Using Persistence to Generate Incentives in a Dynamic Moral Hazard
Problem,” mimeo.
Bonatti, A., G. Cisternas, and J. Toikka (2016): “Dynamic Oligopoly with Incomplete
Information,” Review of Economic Studies.
Budd, C., C. Harris, and J. Vickers (1993): “A Model of the Evolution of Duopoly: Does
the Asymmetry between Firms Tend to Increase or Decrease?,” Review of Economic Studies,
60(3), 543–573.
Burdzy, K., D. Frankel, and A. Pauzner (2001): “Fast Equilibrium Selection by Rational
Players Living in a Changing World,” Econometrica, 69(1), 163–189.
Calcagno, R., Y. Kamada, S. Lovo, and T. Sugaya (2014): “Asynchronicity and Coordination in Common and Opposing Interest Games,” Theoretical Economics, 9, 409–434.
Cao, D. (2014): “Racing Under Uncertainty: A Boundary Value Problem Approach,” Journal
of Economic Theory, 151, 508–527.
Carlsson, H., and E. van Damme (1993): “Global Games and Equilibrium Selection,”
Econometrica, 61(5), 989–1018.
Caruana, G., and L. Einav (2008a): “Production Targets,” RAND Journal of Economics,
39(4), 990–1017.
(2008b): “A Theory of Endogenous Commitment,” Review of Economic Studies, 75(1),
99–116.
Caruana, G., L. Einav, and D. Quint (2007): “Multilateral bargaining with concession
costs,” Journal of Economic Theory, 132, 147 – 166.
Chen, Z., R. Kulperger, and G. Wei (2005): “A Comonotonic Theorem for BSDEs,”
Stochastic Processes and their Applications, 115(1), 41–54.
Cheridito, P., and K. Nam (2015): “Multidimensional quadratic and subquadratic BSDEs
with special structure,” Stochastics, 87(5), 871–884.
Cvitanic, J., and J. Zhang (2012): Contract Theory in Continuous-Time Models. Springer
Finance.
Davis, M. H. A. (1979): "Martingale Methods in Stochastic Control," pp. 85–117. Springer Berlin Heidelberg.
Delarue, F. (2002): “On the existence and uniqueness of solutions to FBSDEs in a nondegenerate case,” Stochastic Processes and their Applications, 99(2), 209–286.
Delong, L. (2013): Backward Stochastic Differential Equations with Jumps and Their Actuarial and Financial Applications: BSDEs with Jumps. Springer.
DeMarzo, P. M., and Y. Sannikov (2006): “Optimal Security Design and Dynamic Capital
Structure in a Continuous-Time Agency Model,” Journal of Finance, 61(6), 2681–2724.
Dockner, E. J., S. Jorgensen, N. V. Long, and G. Sorger (2001): Differential Games
in Economics and Management Science. Cambridge University Press.
dos Reis, G., and R. J. dos Reis (2013): “A note on comonotonicity and positivity of
the control components of decoupled quadratic FBSDE,” Stochastics and Dynamics, 13(4),
1350005.
Ellison, G., and D. Fudenberg (2003): “Knife-Edge or Plateau: When Do Market Models
Tip?,” The Quarterly Journal of Economics, 118(4), 1249–1278.
Faingold, E., and Y. Sannikov (2011): “Reputation in Continuous-Time Games,” Econometrica, 79(3), 773–876.
Frankel, D., and A. Pauzner (2000): “Resolving Indeterminacy in Dynamic Settings: The
Role of Shocks,” The Quarterly Journal of Economics, 115(1), 285–304.
Fudenberg, D., and D. Levine (2007): “Continuous time limits of repeated games with
imperfect public monitoring,” Review of Economic Dynamics, 10(2), 173–192.
(2009): “Repeated Games with Frequent Signals,” Quarterly Journal of Economics,
124, 233–265.
Fudenberg, D., and J. Tirole (1983): “Capital as a commitment: Strategic investment to
deter mobility,” Journal of Economic Theory, 31(2), 227–250.
Gensbittel, F., S. Lovo, J. Renault, and T. Tomala (2016): “Zero-sum Revision
Games,” mimeo.
Georgiadis, G. (2015): “Projects and Team Dynamics,” Review of Economic Studies, 82(1),
187–218.
Hamadene, S., and J.-P. Lepeltier (1995): “Backward equations, stochastic control and
zero-sum stochastic differential games,” Stochastics, 3(4), 221–231.
Harris, C., and J. Vickers (1987): “Racing with Uncertainty,” Review of Economic Studies,
54(1), 1–21.
Iijima, R., and A. Kasahara (2015a): “Gradual Adjustment and Equilibrium Uniqueness
under Noisy Monitoring,” ISER Discussion Paper No. 965.
Kamada, Y., and M. Kandori (2015): “Revision Games Part I: Theory,” mimeo.
Kobylanski, M. (2000): “Backward stochastic differential equations and partial differential
equations with quadratic growth,” Annals of Probability, 28(2), 558–602.
Konrad, K. A. (2009): Strategy and Dynamics in Contests. Oxford University Press.
Lipman, B. L., and R. Wang (2000): “Switching Costs in Frequently Repeated Games,”
Journal of Economic Theory, 93(2), 148–90.
Lockwood, B. (1996): “Uniqueness of Markov-perfect equilibrium in infinite-time affinequadratic differential games,” Journal of Economic Dynamics and Control, 20(5), 751–765.
Lovo, S., and T. Tomala (2015): “Markov Perfect Equilibria in Stochastic Revision Games,”
mimeo.
Morris, S., and H. Shin (2006): “Heterogeneity and Uniqueness in Interaction Games,”
in The Economy as an Evolving Complex System III, ed. by L. Blume, and S. Durlauf, pp.
207–242. Oxford University Press.
Morris, S., and H. S. Shin (1998): “Unique Equilibrium in a Model of Self-Fulfilling Currency Attacks,” The American Economic Review, 88(3), 587–597.
Pagano, M. (1989): “Trading Volume and Asset Liquidity,” The Quarterly Journal of Economics, 104(2), 255–274.
Papavassilopoulos, G. P., and J. B. Cruz (1979): “On the uniqueness of Nash strategies
for a class of analytic differential games,” Journal of Optimization Theory and Applications,
27(2), 309–314.
Pardoux, E., and S. Peng (1990): “Adapted solution of a backward stochastic differential
equation,” Systems and Control Letters, 14(1), 55–61.
Pham, H. (2010): Continuous-time Stochastic Control and Optimization with Financial Applications. Springer.
Rosen, J. B. (1965): “Existence and Uniqueness of Equilibrium Points for Concave N-Person
Games,” Econometrica, 33(3), 520–534.
Sannikov, Y. (2007): “Games with Imperfectly Observable Actions in Continuous Time,”
Econometrica, 75(5), 1285–1329.
Sannikov, Y., and A. Skrzypacz (2007): “Impossibility of Collusion under Imperfect Monitoring with Flexible Production,” American Economic Review, 97(5), 1794–1823.
Seel, C., and P. Strack (2015): "Continuous Time Contests with Private Information," Mathematics of Operations Research.
Simon, L. K., and M. B. Stinchcombe (1989): “Extensive Form Games in Continuous
Time: Pure Strategies,” Econometrica, 57(5), 1171–214.
Staudigl, M., and J.-H. Steg (2014): “On Repeated Games with Imperfect Public Monitoring: From Discrete to Continuous Time,” Center for Mathematical Economics Working
Papers, 525.
Tsutsui, S., and K. Mino (1990): “Nonlinear Strategies in Dynamic Duopolistic Competition
with Sticky Prices,” Journal of Economic Theory, 52, 136–161.
Ui, T. (2008): “Correlated Equilibrium and Concave Games,” International Journal of Game
Theory, 37(1), 1–13.
Wernerfelt, B. (1987): “Uniqueness of Nash Equilibrium for Linear-Convex Stochastic
Differential Game,” Journal of Optimization Theory and Applications, 53(1), 133–138.
Yildirim, H. (2005): “Contests with multiple rounds,” Games and Economic Behavior, 51,
213–227.
Supplementary Appendix (For Online Publication)
A Mathematical Foundations
A.1 Lipschitz and Quadratic BSDEs
This section introduces BSDEs and collects the relevant results.36
Fix any invertible d × d matrix Σ̃ throughout this subsection. Let X = {Xt }0≤t≤T be
a standard d-dimensional Brownian motion on a filtered probability space (Ω, F, P), where
F = {Ft }0≤t≤T is the augmented filtration generated by X, and T is a fixed finite horizon. We
denote by S2 the set of Rn -valued progressively measurable processes W over [0, T ] such that
E
2
sup |Wt |
< ∞,
0≤t≤T
and by H2 the set of Rn×d -valued progressively measurable processes β over [0, T ] such that
Z
E
T
|βt | dt < ∞.
2
0
We call $g: \Omega \times [0,T] \times \mathbb{R}^n \times \mathbb{R}^{n\times d} \to \mathbb{R}^n$ a driver if the following conditions hold:
- $g(\omega, t, w, b)$, written $g(t, w, b)$ for simplicity, is progressively measurable for all $w \in \mathbb{R}^n$ and $b \in \mathbb{R}^{n\times d}$;
- $g(t, 0, 0) \in \mathcal{S}^2(0,T)^n$.
We call ξ a terminal condition if it is an Rn -valued random variable such that E[|ξ|2 ] < ∞.
Definition S1. Let $\xi$ be a terminal condition, and let $g$ be a driver. A BSDE is a stochastic system of equations defined by37
\[
W_t = \xi + \int_t^T g(s, W_s, \beta_s)\,ds - \int_t^T \beta_s\cdot\tilde\Sigma\,dX_s, \qquad 0 \le t \le T.
\]
A solution to a BSDE is a pair $(W, \beta) \in \mathcal{S}^2 \times \mathcal{H}^2$ that satisfies the above system of equations. A BSDE is said to be one-dimensional if $n = 1$.
36 See, for example, Pham (2010). Note that the notation $(W, \beta)$ used here corresponds to $(Y, Z)$ in the standard notation.
37 In the standard notation, $\tilde\Sigma$ is usually an identity matrix. We chose to include this additional term, which does not change any of the results in this section, in order to make it easier to see their applicability to our model.
We often write the above BSDE in differential form as
\[
dW_t = -g(t, W_t, \beta_t)\,dt + \beta_t\cdot\tilde\Sigma\,dX_t, \qquad W_T = \xi.
\]
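For intuition, consider a trivial special case (our illustration, not from the text): a zero driver with a linear terminal condition. If $g \equiv 0$ and $\xi = a\cdot X_T$ for a constant vector $a\in\mathbb{R}^d$ (so $n = 1$), then
\[
W_t = \mathbb{E}[\xi \mid \mathcal{F}_t] = a\cdot X_t, \qquad \beta_t = a^\top\tilde\Sigma^{-1},
\]
since then $\beta_t\cdot\tilde\Sigma\,dX_t = a\cdot dX_t = dW_t$ and $W_T = \xi$; the martingale property of Brownian motion pins down $W$.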
It is well known that a BSDE has a unique solution $(W, \beta)$ if $g$ satisfies a uniform Lipschitz condition (Pardoux and Peng (1990)). However, the equilibrium BSDE (10) does not satisfy this condition, due to strategic feedback effects across players. Instead, its driver satisfies a quadratic growth condition. More specifically, a driver $g$ satisfies the quadratic growth condition if there is a constant $C > 0$ such that
\[
|g(t,w,b)| \le C(1 + |w| + |b|^2),
\]
\[
|g(t,w,b) - g(t,\tilde w,\tilde b)| \le C\Bigl(|w - \tilde w| + (|w| + |b| + |\tilde w| + |\tilde b|)\,|b - \tilde b|\Bigr)
\]
for all $\omega \in \Omega$, $t \in [0,T]$, $w, \tilde w \in \mathbb{R}^n$, and $b, \tilde b \in \mathbb{R}^{n\times d}$.
The following results for one-dimensional BSDEs are due to Kobylanski (2000).
Lemma S1 (Comparison Theorem). Consider a pair of one-dimensional BSDEs associated with terminal conditions and drivers $(\xi, g)$ and $(\tilde\xi, \tilde g)$ that satisfy the quadratic growth condition. Let $(W, \beta)$ and $(\tilde W, \tilde\beta)$ be the corresponding solutions. If $\xi \ge \tilde\xi$ almost surely and $g(t, W_t, \beta_t) \ge \tilde g(t, W_t, \beta_t)$ for almost all $t$ almost surely, then $W_t \ge \tilde W_t$ for all $t$ almost surely. If, in addition, $g(t, W_t, \beta_t) > \tilde g(t, W_t, \beta_t)$ holds on a set of strictly positive measure under $dt \otimes dP$, then $W_0 > \tilde W_0$.
Lemma S2 (One-dimensional Quadratic BSDE). Consider a one-dimensional BSDE in which
ξ is a terminal condition and g is a driver that satisfies the quadratic growth condition. Then
there exists a unique solution (W, β) of the BSDE.
A.2 Quadratic FBSDE
Fix a probability space $(\Omega, \mathcal{G}, Q)$ on which $\{\mathcal{G}_t\}_{0\le t\le T}$ satisfies the usual conditions and $(Z_t)_{0\le t\le T}$ is a standard $d$-dimensional $\mathcal{G}_t$-Brownian motion. Note that $\mathcal{G}_t$ can be larger than the filtration generated by the Brownian motion $Z_t$. A (Markovian) forward-backward stochastic differential equation (FBSDE) is a stochastic system of equations that takes the form
\begin{align}
dX_t &= F(t, X_t, W_t, \beta_t)\,dt + \tilde\Sigma\,dZ_t, & X_0 &= x_0 \nonumber\\
dW_t &= G(t, X_t, W_t, \beta_t)\,dt + \beta_t\cdot dZ_t, & W_T &= U(X_T) \tag{17}
\end{align}
for some $F: [0,T]\times\mathbb{R}^n\times\mathbb{R}^n\times\mathbb{R}^{n\times d}\to\mathbb{R}^n$, $G: [0,T]\times\mathbb{R}^n\times\mathbb{R}^n\times\mathbb{R}^{n\times d}\to\mathbb{R}^n$, $U: \mathbb{R}^n\to\mathbb{R}^n$, $x_0\in\mathbb{R}^n$, and invertible $\tilde\Sigma\in\mathbb{R}^{d\times d}$.
A solution $(X, W, \beta)$ to an FBSDE is an $\mathbb{R}^n\times\mathbb{R}^n\times\mathbb{R}^{n\times d}$-valued $\{\mathcal{G}_t\}$-progressively measurable process that satisfies (17) and
\[
E\left[\int_0^T \bigl(|X_t|^2 + |W_t|^2 + |\beta_t|^2\bigr)\,dt\right] < \infty.
\]
The following result is due to Delarue (2002).38
Lemma S3 (Non-degenerate FBSDE). Consider the FBSDE (17). Assume that there exists a constant $K > 0$ such that the following conditions are satisfied (for all $t, x, x', w, w', b, b'$):
1. $F$ is continuous in $x$ and satisfies
\[
|F(t,x,w,b) - F(t,x,w',b')| \le K(|w - w'| + |b - b'|),
\]
\[
(x - x')\cdot(F(t,x,w,b) - F(t,x',w,b)) \le K|x - x'|^2,
\]
\[
|F(t,x,w,b)| \le K(1 + |x| + |w| + |b|).
\]
2. $G$ is continuous in $w$ and satisfies
\[
|G(t,x,w,b) - G(t,x',w,b')| \le K(|x - x'| + |b - b'|),
\]
\[
(w - w')\cdot(G(t,x,w,b) - G(t,x,w',b)) \le K|w - w'|^2,
\]
\[
|G(t,x,w,b)| \le K(1 + |x| + |w| + |b|).
\]
3. $U$ is Lipschitz continuous and bounded, that is, it satisfies
\[
|U(x) - U(x')| \le K|x - x'|, \qquad |U(x)| \le K.
\]
Then there exists a unique solution (X, W, β) of the FBSDE. The solution takes the form
Wt = w(t, Xt ), βt = b(t, Xt ) for some function w : [0, T ] × Rn → Rn that is uniformly Lipschitz
continuous in x ∈ Rn , and some function b : [0, T ] × Rn → Rd×n .
38 Delarue (2002) uses a more general formulation that, in particular, allows the matrix $\tilde\Sigma$ to depend on $t$ and $X$. He requires a condition called uniform parabolicity: there exist $\nu, \nu' > 0$ such that for all $\xi \in \mathbb{R}^d$ and all $t, X$, $\nu \le \frac{1}{|\xi|^2}\xi'(\tilde\Sigma(t,X)\tilde\Sigma'(t,X))\xi \le \nu'$. To see that this condition is satisfied in the current framework, first note that $\tilde\Sigma\tilde\Sigma'$ has $d$ real positive eigenvalues $(\lambda_1, \cdots, \lambda_d)$, because $\tilde\Sigma$ is invertible, which guarantees that $\tilde\Sigma\tilde\Sigma'$ is positive definite. By the standard results in linear algebra, $0 < (\min_k \lambda_k)|\xi|^2 \le \xi'(\tilde\Sigma\tilde\Sigma')\xi \le (\max_k \lambda_k)|\xi|^2$ for any non-zero $\xi \in \mathbb{R}^d$.
Lemma S4. There is Cf > 0 such that |f i (β, X) − f i (β̃, X̃)| ≤ Cf (|β i − β̃ i | + |X − X̃|) for all
β, β̃, X, X̃.
Proof. Denote the control range by $\mathcal{A}^i = [\underline A^i, \bar A^i]$. Since $\beta^i\cdot\Sigma^{-1}\mu^i\,\alpha^i - c_i(\alpha^i, X)$ is strictly concave in $\alpha^i$, its unique maximizer $A^i$ is continuous in $\beta$ and $X$ and satisfies the first-order condition $\beta^i\cdot\Sigma^{-1}\mu^i = \frac{\partial c_i(A^i, X)}{\partial A^i}$ in the case of $A^i \in (\underline A^i, \bar A^i)$. Furthermore, this maximizer takes the form
\[
f^i(\beta, X) =
\begin{cases}
\bar A^i & \text{if } \beta^i\cdot\Sigma^{-1}\mu^i \ge \frac{\partial c_i(\bar A^i, X)}{\partial A^i}\\[4pt]
g^i(\beta, X) & \text{if } \beta^i\cdot\Sigma^{-1}\mu^i \in \left(\frac{\partial c_i(\underline A^i, X)}{\partial A^i}, \frac{\partial c_i(\bar A^i, X)}{\partial A^i}\right)\\[4pt]
\underline A^i & \text{if } \beta^i\cdot\Sigma^{-1}\mu^i \le \frac{\partial c_i(\underline A^i, X)}{\partial A^i},
\end{cases}
\]
where $g^i(\beta, X)$ is the unique value $A^i \in \mathcal{A}^i$ defined by the equality $\frac{\partial c_i(A^i, X)}{\partial A^i} - \beta^i\cdot\Sigma^{-1}\mu^i = 0$. Note that this value is uniquely determined, since $c_i$ is strictly convex in $A^i$. By the implicit function theorem, for $k = 1, \ldots, d$, the first-order partial derivatives of $g^i$ satisfy
\[
\left|\frac{\partial g^i}{\partial X^k}\right| = \frac{\left|\frac{\partial^2 c_i}{\partial A^i\,\partial X^k}\right|}{\frac{\partial^2 c_i}{(\partial A^i)^2}}, \qquad
\left|\frac{\partial g^i}{\partial \beta^{ik}}\right| = \frac{\left|e^k\cdot\Sigma^{-1}\mu^i\right|}{\frac{\partial^2 c_i}{(\partial A^i)^2}},
\]
where $\beta^{ik}$ denotes the $k$th element of the vector $\beta^i \in \mathbb{R}^n$, and $e^k$ is the $k$th unit vector in $\mathbb{R}^n$. Since we can choose constants $M, M' > 0$ uniformly in $(\beta, X)$ such that $\frac{\partial^2 c_i}{(\partial A^i)^2} > M$ and $\left|\frac{\partial^2 c_i}{\partial A^i\,\partial X^k}\right| < M'$ by assumption, this establishes the lemma.
B Additional Results
B.1 Discontinuous terminal payoff
In this subsection we consider the case where terminal payoffs can be discontinuous, and show that the uniqueness result (Theorem 1) extends if the noise distribution is approximately uniform. Formally, we impose the following requirement on $U^i$:

Assumption. $U^i$ is piecewise Lipschitz in $X^i$, i.e., for each $X^{-i}$ there exist $(\psi_k^i(X^{-i}))_{k=1,\ldots,K_i}$ such that $U^i(X^i, X^{-i})$ is Lipschitz continuous in $X^i \in \mathbb{R}\setminus\{\psi_k^i(X^{-i}) : k = 1, \ldots, K_i\}$. Also, assume that each $\psi_k^i(X^{-i})$ and $\sup\{\lim_n |U^i(X_n^i) - U^i(\psi^i(X^j))| : X_n^i \to \psi^i(X^j)\}$ are Lipschitz continuous and bounded in $X^{-i}$.
Theorem S1. Consider the model in Section 2 with the modified assumption above. Assume that $\lim_{\Delta} \frac{r_\Delta}{\Delta} = \infty$. Then there exists $\delta > 0$ such that if $\max\bigl\{\Delta, \sup_{i,\varepsilon}\frac{|\phi_i'(\varepsilon)|}{r_\Delta}\bigr\} \le \delta$, then there is a unique subgame-perfect equilibrium.
Proof. To simplify the notation, we consider only the two-player case and assume that $K_i(X_j) = 1$ for all $X_j$ (i.e., there is only one "jump" point in each player's terminal payoff). The general case can be proved analogously. Take any history $h_t$ at $t = T - \Delta$ and let $X_t$ denote the state at $t$. After this history, each player $i$'s equilibrium action $A_t^i(h_t)$ maximizes the payoff function $V_t^i(\cdot\,; X_t): \prod_{j=1,2}\mathcal{A}^j \to \mathbb{R}$ defined by
\begin{align*}
V_t^i(A_t; X_t) &= \int U^i(X_t + \Delta A_t + r_\Delta\varepsilon_t)\prod_{j=1,2}\phi_j(\varepsilon_t^j)\,d\varepsilon_t - \Delta c_i(A_t^i)\\
&= \int\left(\int_{-\infty}^{\frac{\psi^i(X_T^j)-X_t^i-\Delta A_t^i}{r_\Delta}} U^i(X_t + \Delta A_t + r_\Delta\varepsilon_t)\,\phi_i(\varepsilon_t^i)\,d\varepsilon_t^i\right)\phi_j(\varepsilon_t^j)\,d\varepsilon_t^j\\
&\quad+\int\left(\int^{\infty}_{\frac{\psi^i(X_T^j)-X_t^i-\Delta A_t^i}{r_\Delta}} U^i(X_t + \Delta A_t + r_\Delta\varepsilon_t)\,\phi_i(\varepsilon_t^i)\,d\varepsilon_t^i\right)\phi_j(\varepsilon_t^j)\,d\varepsilon_t^j - \Delta c_i(A_t^i)
\end{align*}
over $A_t^i$, given the current state $X_t$ and the other player's action $A_t^j$.

Let $G_i(X_T^j) := \lim_{X^i \nearrow \psi^i(X_T^j)} U^i(X^i) - U^i(\psi^i(X_T^j))$ denote the size of the jump at $X^i = \psi^i(X_T^j)$.
The first-order derivative takes the form
\begin{align*}
\frac{\partial}{\partial A_t^i} V_t^i(A_t; X_t) &= \Delta\int\left(\int_{-\infty}^{\frac{\psi^i(X_T^j)-X_t^i-\Delta A_t^i}{r_\Delta}}\frac{\partial}{\partial X^i}U^i(X_T)\,\phi_i(\varepsilon_t^i)\,d\varepsilon_t^i\right)\phi_j(\varepsilon_t^j)\,d\varepsilon_t^j\\
&\quad+\Delta\int\left(\int^{\infty}_{\frac{\psi^i(X_T^j)-X_t^i-\Delta A_t^i}{r_\Delta}}\frac{\partial}{\partial X^i}U^i(X_T)\,\phi_i(\varepsilon_t^i)\,d\varepsilon_t^i\right)\phi_j(\varepsilon_t^j)\,d\varepsilon_t^j\\
&\quad+\frac{\Delta}{r_\Delta}\int G_i(X_T^j)\,\phi_i\!\left(\frac{\psi^i(X_T^j)-\Delta A_t^i}{r_\Delta}\right)\phi_j(\varepsilon_t^j)\,d\varepsilon_t^j - \Delta c_i'(A_t^i),
\end{align*}
where we write $X_T = X_t + A_t\Delta + r_\Delta\varepsilon_t$ to simplify the notation.
The cross derivative can be derived as
\begin{align*}
\frac{\partial^2}{\partial A_t^i\,\partial A_t^j} V_t^i(A_t; X_t) &= \frac{\Delta^2}{r_\Delta}\int\left(\int_{-\infty}^{\frac{\psi^i(X_T^j)-X_t^i-\Delta A_t^i}{r_\Delta}}\frac{\partial}{\partial X^i}U^i(X_T)\,\phi_i(\varepsilon_t^i)\,d\varepsilon_t^i\right)(-\phi_j'(\varepsilon_t^j))\,d\varepsilon_t^j\\
&\quad+\frac{\Delta^2}{r_\Delta}\int\left(\int^{\infty}_{\frac{\psi^i(X_T^j)-X_t^i-\Delta A_t^i}{r_\Delta}}\frac{\partial}{\partial X^i}U^i(X_T)\,\phi_i(\varepsilon_t^i)\,d\varepsilon_t^i\right)(-\phi_j'(\varepsilon_t^j))\,d\varepsilon_t^j\\
&\quad+\frac{\Delta^2}{r_\Delta}\int G_i'(X_T^j)\,\phi_i\!\left(\frac{\psi^i(X_T^j)-\Delta A_t^i}{r_\Delta}\right)\phi_j(\varepsilon_t^j)\,d\varepsilon_t^j\\
&\quad+\frac{\Delta^2}{r_\Delta^2}\int G_i(X_T^j)\,\phi_i'\!\left(\frac{\psi^i(X_T^j)-\Delta A_t^i}{r_\Delta}\right)(\psi^i)'(X_T^j)\,\phi_j(\varepsilon_t^j)\,d\varepsilon_t^j,
\end{align*}
where the first two terms are derived as in the proof of Theorem 1. These terms are bounded in absolute value by a constant times $\frac{\Delta^2}{r_\Delta}$, as in the proof of Theorem 1.
Since $G_i$ is absolutely continuous, we can apply integration by parts to bound the third term of $\frac{\partial^2}{\partial A_t^i\,\partial A_t^j} V_t^i(A_t; X_t)$:
\begin{align*}
&\left|\frac{\Delta^2}{r_\Delta}\int G_i'(X_T^j)\,\phi_i\!\left(\frac{\psi^i(X_T^j)-\Delta A_t^i}{r_\Delta}\right)\phi_j(\varepsilon_t^j)\,d\varepsilon_t^j\right|\\
&=\left|\left[\frac{\Delta^2\,G_i(X_T^j)}{r_\Delta\,\Delta}\,\phi_i\!\left(\frac{\psi^i(X_T^j)-\Delta A_t^i}{r_\Delta}\right)\phi_j(\varepsilon_t^j)\right]_{-\infty}^{\infty}\right.\\
&\qquad\left.-\int G_i(X_T^j)\left(\frac{\Delta^2}{r_\Delta}\,\phi_i\!\left(\frac{\psi^i(X_T^j)-\Delta A_t^i}{r_\Delta}\right)(\phi_j)'(\varepsilon_t^j)+\frac{\Delta^3}{r_\Delta^2}\,\phi_i'\!\left(\frac{\psi^i(X_T^j)-\Delta A_t^i}{r_\Delta}\right)(\psi^i)'(X_T^j)\,\phi_j(\varepsilon_t^j)\right)d\varepsilon_t^j\right|\\
&\le\frac{\Delta}{r_\Delta}\sup_{x,x'\in\mathbb{R}^2}\left|G_i(x)\,\phi_i\!\left(\frac{\psi^i(x)-\Delta A_t^i}{r_\Delta}\right)\lim_{\varepsilon\to\infty}\phi_j(\varepsilon)-G_i(x')\,\phi_i\!\left(\frac{\psi^i(x')-\Delta A_t^i}{r_\Delta}\right)\lim_{\varepsilon\to-\infty}\phi_j(\varepsilon)\right|\\
&\qquad+\frac{\Delta^3}{r_\Delta^2}\sup_{x\in\mathbb{R}^2}\left|G_i(x)\,\phi_i'\!\left(\frac{\psi^i(x)-\Delta A_t^i}{r_\Delta}\right)(\psi^i)'(x)\right|\int|\phi_j(\varepsilon_t^j)|\,d\varepsilon_t^j\\
&\qquad+\frac{\Delta^2}{r_\Delta}\sup_{x\in\mathbb{R}^2}\left|G_i(x)\,\phi_i\!\left(\frac{\psi^i(x)-\Delta A_t^i}{r_\Delta}\right)\right|\int|(\phi_j)'(\varepsilon_t^j)|\,d\varepsilon_t^j,
\end{align*}
where the last step uses Hölder's inequality. The first term on the right-hand side is $0$ because $\lim_{\varepsilon\to\pm\infty}\phi_j(\varepsilon) = 0$. The remaining terms are bounded in absolute value by a constant times $\frac{\Delta^2}{r_\Delta}$.

Lastly, the fourth term of $\frac{\partial^2}{\partial A_t^i\,\partial A_t^j} V_t^i(A_t; X_t)$ is bounded in absolute value by $\frac{\Delta^2}{r_\Delta^2}\sup_\varepsilon|\phi_i'(\varepsilon)|$ times a constant.

By the same argument as in the proof of Theorem 1, one can show uniqueness of the equilibrium action profile in this subgame if $\frac{\sup_\varepsilon|\phi_i'(\varepsilon)|}{r_\Delta}$ is sufficiently small. The remaining steps are analogous.
B.2 Equilibrium Multiplicity without Deadline

We demonstrate that, in the continuous-time model in Section 5, equilibrium may not be unique if there is no deterministic bounded deadline $T$. To make this point, we assume that the deadline $T$ is the first arrival time of a Poisson process with fixed intensity $\lambda > 0$. For simplicity, here we focus on the two-player case with one-dimensional state $dX_t = \sum_{i=1,2} A_t^i\,dt + \sigma\,dZ_t$. Moreover, we assume that $c_i$ depends on both $A_t^i$ and $X_t$. As clarified in Section 6.1, the uniqueness result in Section 5 still holds even when $c_i$ depends on both $A_t^i$ and $X_t$ if $T$ is fixed.
Focusing on Markov strategies that depend only on the current state $X_t$, we can derive the following HJB equations:
\[
0 = \max_{A^i\in\mathcal{A}^i}\left[-c_i(x, A^i) + (w^i)'(x)\sum_{k=1,2}A^k + \frac{\sigma^2}{2}(w^i)''(x) + \lambda\bigl(U^i(x) - w^i(x)\bigr)\right],
\]
where $w^i(x)$ denotes the continuation value of player $i$ when the state $X_t$ is equal to $x$. These equations can be written as
\[
\lambda w^i(x) = \max_{A^i\in\mathcal{A}^i}\left[\lambda U^i(x) - c_i(x, A^i) + (w^i)'(x)\sum_{k=1,2}A^k + \frac{\sigma^2}{2}(w^i)''(x)\right].
\]
Therefore, the model is equivalent to the infinite-horizon model in which each player receives flow payoff $\lambda U^i(X_t) - c_i(X_t, A^i)$ and has discount rate $\lambda$.
It is known in the literature that infinite-horizon differential games can have multiple equilibria in Markov strategies even if flow payoffs are independent of opponents' actions (e.g., Tsutsui and Mino (1990)). As a particularly simple example, consider the game in which the terminal payoff $U^i$ is constantly zero and the flow payoff $-c_i$ is given by $-\frac{1}{2}(A_t^i)^2 + 4(A_t^j)^2 + \gamma(X_t)^2$ for some constant $\gamma > 0$. Based on Example 2 in Lockwood (1996), one can verify that there are two possible values of $\theta > 0$ such that the strategy profile of the form $A_t^i = -\theta X_t$ is a (Markov perfect) equilibrium of this game when $\gamma$ is small.39

This multiplicity contrasts with the uniqueness result in Section 5. Because there is no terminal-value condition on the continuation value $W_t^i$, the "backward induction" argument that drives the unique equilibrium in the original model does not work here.

39 Lockwood (1996) analyzes a deterministic model, in which $\sigma = 0$. Under quadratic payoffs, any Markov-strategy equilibrium profile that is linear in the state constitutes an equilibrium of the stochastic model as long as the transversality condition is satisfied.
B.3 Quadratic Games
We analyze a particular class of the continuous-time model in Section 5 with quadratic terminal payoff functions and cost functions:
\[
U^i(X) = \sum_{k=1,2}\bar\alpha_{ik}(X^k)^2 + \bar\beta_{ik}X^k + \bar\gamma_i X^1X^2, \qquad c_i(a) = \frac{\bar c_i}{2}a^2,
\]
where $\bar\alpha_{ik}, \bar\beta_{ik}, \bar\gamma_i, \bar c_i$ are fixed constants such that $\bar c_i > 0$. We also allow for unbounded controls40: $\mathcal{A}^i = \mathbb{R}$.
With these assumptions, the equilibrium PDEs (11) can be simplified to ODEs. Moreover,
Proposition S1 below states that an equilibrium is linear in the state variables, where the
coefficients follow ODEs. For simplicity, the proposition assumes the two-player case.
Proposition S1. Consider a quadratic game. Let $(\alpha_{ik}(t), \beta_{ik}(t), \gamma_i(t), \delta_i(t))_{i,k=1,2,\,t\in[0,T]}$ be a solution to the system of ordinary differential equations
\begin{align}
\dot\alpha_{ii}(t) &= -\frac{2(\alpha_{ii}(t))^2}{\bar c_i} - \frac{\gamma_i(t)\gamma_j(t)}{\bar c_j}, &
\dot\alpha_{ij}(t) &= -\frac{\gamma_i^2(t)}{2\bar c_i} - \frac{4\alpha_{ij}(t)\alpha_{jj}(t)}{\bar c_j}, \nonumber\\
\dot\beta_{ii}(t) &= -\frac{2\alpha_{ii}(t)\beta_{ii}(t)}{\bar c_i} - \frac{\beta_{jj}(t)\gamma_i(t)+\beta_{ij}(t)\gamma_j(t)}{\bar c_j}, &
\dot\beta_{ij}(t) &= -\frac{\beta_{ii}(t)\gamma_i(t)}{\bar c_i} - \frac{2(\alpha_{ij}(t)\beta_{jj}(t)+\alpha_{jj}(t)\beta_{ij}(t))}{\bar c_j}, \tag{18}\\
\dot\gamma_i(t) &= -\frac{2\alpha_{ii}(t)\gamma_i(t)}{\bar c_i} - \frac{2(\alpha_{ij}(t)\gamma_j(t)+\alpha_{jj}(t)\gamma_i(t))}{\bar c_j}, &
\dot\delta_i(t) &= -(\sigma_1)^2\alpha_{i1} - (\sigma_2)^2\alpha_{i2} - \frac{\beta_{ii}^2}{2\bar c_i} - \frac{\beta_{ij}\beta_{jj}}{\bar c_j} \nonumber
\end{align}
associated with the boundary conditions
\[
\alpha_{ik}(T) = \bar\alpha_{ik}, \quad \beta_{ik}(T) = \bar\beta_{ik}, \quad \gamma_i(T) = \bar\gamma_i, \quad \delta_i(T) = 0. \tag{19}
\]
Then a Nash equilibrium strategy takes the form
\[
a^i(t, x^1, x^2) = \frac{1}{\bar c_i}\bigl(2\alpha_{ii}(t)x^i + \beta_{ii}(t) + \gamma_i(t)x^j\bigr).
\]
Proof. We first solve for $(\alpha_{ik}(t), \beta_{ik}(t), \gamma_i(t), \delta_i(t))_{i,k=1,2,\,t\in[0,T]}$ from the system of ordinary differential equations (18) together with the boundary conditions (19).

40 Because of the tractability of quadratic games, they have been widely employed in applications. A drawback to this approach is that it involves unbounded terminal payoffs and unbounded control ranges, which violate the assumptions for our main theorems; the uniqueness of equilibrium is an open question.
Guess the functional form of the value function:
\[
w^i(t, x^1, x^2) = \sum_{k=i,j}\alpha_{ik}(t)(x^k)^2 + \beta_{ik}(t)x^k + \gamma_i(t)x^ix^j + \delta_i(t).
\]
Substituting this guess into the HJB equations (11),
\begin{align*}
0 &= \sum_{k=i,j}\dot\alpha_{ik}(x^k)^2 + \dot\beta_{ik}x^k + \dot\gamma_i x^ix^j + \dot\delta_i + (\sigma_1)^2\alpha_{i1} + (\sigma_2)^2\alpha_{i2} + \frac{1}{2\bar c_i}(2\alpha_{ii}x^i + \beta_{ii} + \gamma_ix^j)^2\\
&\quad+\frac{1}{\bar c_j}(2\alpha_{ij}x^j + \beta_{ij} + \gamma_ix^i)(2\alpha_{jj}x^j + \beta_{jj} + \gamma_jx^i)\\
&=(x^i)^2\left(\dot\alpha_{ii} + \frac{2\alpha_{ii}^2}{\bar c_i} + \frac{\gamma_i\gamma_j}{\bar c_j}\right) + (x^j)^2\left(\dot\alpha_{ij} + \frac{\gamma_i^2}{2\bar c_i} + \frac{4\alpha_{ij}\alpha_{jj}}{\bar c_j}\right) + x^ix^j\left(\dot\gamma_i + \frac{2\alpha_{ii}\gamma_i}{\bar c_i} + \frac{2(\alpha_{ij}\gamma_j + \alpha_{jj}\gamma_i)}{\bar c_j}\right)\\
&\quad+x^i\left(\dot\beta_{ii} + \frac{2\alpha_{ii}\beta_{ii}}{\bar c_i} + \frac{\beta_{jj}\gamma_i + \beta_{ij}\gamma_j}{\bar c_j}\right) + x^j\left(\dot\beta_{ij} + \frac{\beta_{ii}\gamma_i}{\bar c_i} + \frac{2(\alpha_{ij}\beta_{jj} + \alpha_{jj}\beta_{ij})}{\bar c_j}\right)\\
&\quad+\dot\delta_i + (\sigma_1)^2\alpha_{i1} + (\sigma_2)^2\alpha_{i2} + \frac{\beta_{ii}^2}{2\bar c_i} + \frac{\beta_{ij}\beta_{jj}}{\bar c_j},
\end{align*}
where we have omitted the argument $t$ from the coefficient functions. By the proposed ODEs, the right-hand side is zero for every $x^1$ and $x^2$. Also, the boundary condition holds with
\[
w^i(T, x^1, x^2) = \sum_{k=i,j}\bar\alpha_{ik}(x^k)^2 + \bar\beta_{ik}x^k + \bar\gamma_ix^ix^j,
\]
which verifies the solution.
Finally, by Theorem 2, the equilibrium action is given by
\[
a^i(t, x^1, x^2) = \frac{w_i^i(t, x^1, x^2)}{\bar c_i} = \frac{1}{\bar c_i}\bigl(2\alpha_{ii}(t)x^i + \beta_{ii}(t) + \gamma_i(t)x^j\bigr).
\]
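As a numerical companion of our own (not part of the paper), the coefficient system (18) can be integrated backward in time from the terminal condition (19). The sketch below uses a plain Euler scheme with hypothetical parameter values ($\bar c_i = 1$, $\sigma_k = 1$, and an illustrative terminal payoff). With zero $\bar\beta$ and $\bar\gamma$ and diagonal $\bar\alpha$, the system collapses to the scalar Riccati equation $\dot\alpha_{ii} = -2\alpha_{ii}^2/\bar c_i$, whose backward solution from $\bar\alpha_{ii} = -0.5$ over $T = 1$ is $\alpha_{ii}(0) = -0.25$, which the numerical output should match.

```python
import numpy as np

def rhs(alpha, beta, gamma, c=(1.0, 1.0), sigma=(1.0, 1.0)):
    # Right-hand side of system (18); alpha and beta are 2x2, gamma is length-2.
    dA = np.zeros((2, 2)); dB = np.zeros((2, 2)); dG = np.zeros(2); dD = np.zeros(2)
    for i in range(2):
        j = 1 - i
        dA[i][i] = -2 * alpha[i][i] ** 2 / c[i] - gamma[i] * gamma[j] / c[j]
        dA[i][j] = -gamma[i] ** 2 / (2 * c[i]) - 4 * alpha[i][j] * alpha[j][j] / c[j]
        dB[i][i] = (-2 * alpha[i][i] * beta[i][i] / c[i]
                    - (beta[j][j] * gamma[i] + beta[i][j] * gamma[j]) / c[j])
        dB[i][j] = (-beta[i][i] * gamma[i] / c[i]
                    - 2 * (alpha[i][j] * beta[j][j] + alpha[j][j] * beta[i][j]) / c[j])
        dG[i] = (-2 * alpha[i][i] * gamma[i] / c[i]
                 - 2 * (alpha[i][j] * gamma[j] + alpha[j][j] * gamma[i]) / c[j])
        dD[i] = (-sigma[0] ** 2 * alpha[i][0] - sigma[1] ** 2 * alpha[i][1]
                 - beta[i][i] ** 2 / (2 * c[i]) - beta[i][j] * beta[j][j] / c[j])
    return dA, dB, dG, dD

def solve_backward(alpha_bar, beta_bar, gamma_bar, T=1.0, n=20000):
    # Start from the terminal condition (19), then take Euler steps backward
    # in time: y(t - dt) = y(t) - dt * y'(t).
    alpha = np.array(alpha_bar, float); beta = np.array(beta_bar, float)
    gamma = np.array(gamma_bar, float); delta = np.zeros(2)
    dt = T / n
    for _ in range(n):
        dA, dB, dG, dD = rhs(alpha, beta, gamma)
        alpha = alpha - dt * dA; beta = beta - dt * dB
        gamma = gamma - dt * dG; delta = delta - dt * dD
    return alpha, beta, gamma, delta

# Hypothetical terminal payoff: alpha_bar = diag(-0.5, -0.5), beta_bar = 0, gamma_bar = 0.
a0, b0, g0, d0 = solve_backward([[-0.5, 0.0], [0.0, -0.5]],
                                [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
```

Given the time-0 coefficients, the proposition's time-0 strategy is $a^i(0, x^1, x^2) = (2\alpha_{ii}(0)x^i + \beta_{ii}(0) + \gamma_i(0)x^j)/\bar c_i$; in this symmetric example the cross coefficients stay at zero and only the own-state feedback survives.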