Equilibrium Uniqueness with Gradual and Noisy Adjustment*

Ryota Iijima and Akitada Kasahara†

September 3, 2016

Abstract

We study a model of gradual adjustment in games, in which players can flexibly adjust and monitor their positions up until a deadline. Players' terminal and flow payoffs are influenced by their positions. We show that, unlike in one-shot games, the equilibrium is unique for a broad class of terminal payoffs when players' actions are flexible enough in the presence of (possibly small) noise. In a team-production model, the unique equilibrium approximately selects the efficient outcome when adjustment friction is small. We also examine the welfare implications of such gradualness in applications, including team production, hold-up problems, and dynamic contests.

* Results in Section 5 are based on our previous work "Gradual Adjustment and Equilibrium Uniqueness under Noisy Monitoring," which was awarded the 18th Moriguchi Prize by the Institute of Social and Economic Research of Osaka University (Iijima and Kasahara, 2015a). We thank Drew Fudenberg and Akihiko Takahashi, as well as Nageeb Ali, Ian Ball, Darrell Duffie, Mira Frick, George Georgiadis, Ben Golub, Johannes Hörner, Yuhta Ishii, Matthew Jackson, Yuichiro Kamada, Kazuya Kamiya, Michihiro Kandori, David Laibson, Nicolas Lambert, Rida Laraki, Jonathan Libgober, Bart Lipman, Chiara Margaria, Eric Maskin, Akihiko Matsui, Stephen Morris, Nozomu Muto, Wojciech Olszewski, Daisuke Oyama, Larry Samuelson, Yuliy Sannikov, Andy Skrzypacz, Bruno Strulovici, Tomasz Strzalecki, Takuo Sugaya, Masashi Toda, Zhe Wang, Daichi Yanagisawa, Jeff Zwiebel, and seminar/conference participants at various places. All remaining errors are ours.

† Iijima: Cowles Foundation for Research in Economics, Yale University, Email: [email protected]. Kasahara: Graduate School of Business, Stanford University, Email: [email protected].

1 Introduction

In a variety of economic settings, strategic interactions have often been modeled as simple one-shot decisions. Under this approach, time is not explicitly modeled, and the outcome of a game is immediately determined by the players' once-and-for-all, simultaneous decisions. However, this approach abstracts away from the fact that, in real-world settings such as investment or production, players often take a certain amount of time to finalize their actions. In these situations, players may want to gradually adjust their positions as additional noisy information arises. This paper examines the implications of such gradualness in games, emphasizing the contrast with one-shot modeling.

As a motivating example, consider several firms engaged in a joint project with a fixed end date, such as the development of a new product with a pre-determined release date. Each firm continuously invests to improve the quality of the product, all the way up to the deadline. Demand for the new product on the release date is affected by the overall quality of the product as well as by exogenous shocks reflecting market conditions. If firms simultaneously choose their total investment levels at the start, then they face a coordination problem in the form of equilibrium multiplicity, for example when there are strategic complementarities. The main insight of this paper is that this problem of multiplicity can be resolved by explicitly modeling gradual adjustment. We show that equilibrium multiplicity disappears when firms can incrementally adjust their investment levels in response to (possibly small) stochastic shocks.
To formalize this, we consider a class of finite-horizon stochastic games that can accommodate a variety of applications, such as strategic investment, oligopoly competition, public projects, and R&D races. Each player is endowed with a publicly observable state variable, which describes her position in the game at each point in time until a deadline, and she takes an action each period to influence its transition, subject to a convex adjustment cost. The costs can depend on the state variables (and can thus also be interpreted as a net flow payoff), and there is a lump-sum payoff that depends on the final values of the state variables at the deadline. The model is parameterized by the length ∆ of each period (while the total duration of the game is fixed); as ∆ shrinks, the players adjust their state variables more gradually, as their actions each period have a smaller effect on the states. This is in contrast to the one-shot case, in which each action changes the game outcome in a significant manner.

In Section 3, we show that the model has a unique equilibrium if ∆ is small enough, as long as the size of the noise does not vanish too quickly. The condition requires that the noise size be sufficiently larger than the players' adjustment speed at a local (i.e., per-period) level. However, the total noise size in the game can vanish as ∆ → 0; i.e., the limiting model can be deterministic.[1] The uniqueness result holds for a general class of terminal payoff functions, even if, for example, they admit strategic complementarities, so that one-shot modeling can easily generate multiple equilibria. In addition, the unique equilibrium has the Markovian property: each player's adjustment at each moment in time depends only on the current time and state, independently of past histories.

[1] We note that equilibrium multiplicity is possible if we analyze the same deterministic model directly in continuous time.

As we discuss in detail in Section 3.2, we identify the following key properties that underlie the uniqueness result: (i) gradual adjustment; (ii) noisy state evolution; (iii) no myopic strategic interdependence; and (iv) a deadline. If any one of these features is absent, we can construct counterexamples in which equilibrium multiplicity arises. Without gradualness (i.e., with ∆ not small), each player's optimal current action is sensitive to her belief about her opponents' current actions. This substantial strategic interdependence creates the potential for equilibrium multiplicity. However, we show that this issue vanishes when the period length is small in the presence of noise. Here it is important not only that others' current actions have a small impact on the state but also that their impact is diluted by the noise term. Moreover, the players' continuation values are uniquely determined by the terminal payoffs at the deadline. As we explain in Section 3.1, these observations make it possible to uniquely pin down the players' optimal actions and continuation values backward from the deadline under a small period length and some amount of noise.

In Section 4, we focus on a team-production model in which players share the same terminal payoff as a function of the total output they produce. Note that in the absence of gradualness (i.e., of small ∆) or of noise, we cannot rule out the possibility that players coordinate on producing an inefficient outcome. The efficiency loss from this coordination failure can be non-negligible even when the (adjustment) cost is arbitrarily small. We show that, provided the costs are small, the unique equilibrium under noise is approximately first-best.
Intuitively, this is because some noise realizations can substitute for the players' current efforts and promote their subsequent efforts. When the state is sufficiently close to the socially efficient outcome near the deadline, the players can coordinate on the efficient outcome with small costs and noise; this unravels any inefficient equilibrium, because the players anticipate that even small noise fluctuations will trigger repeated coordination toward the efficient outcome.

In Section 5, we consider a continuous-time model in which aggregate noise follows a Brownian motion. We prove equilibrium uniqueness in this continuous-time model based on an application of a recent unique-solution result for backward stochastic differential equations (BSDEs). The equilibrium strategy is again Markovian. We derive partial differential equations that describe the evolution of continuation values (subject to a smoothness requirement). A particular advantage of the Brownian-noise model is tractability. We solve a couple of specific examples and examine particular channels through which the gradualness of players' decisions can improve their welfare relative to the one-shot environment.

In Section 5.4.2, we study a hold-up problem, in which a seller can improve the quality of a good but the price of the good is fixed ex ante (independently of the quality). While the one-shot benchmark predicts no investment in quality, we show that in our setting with gradual adjustment the seller chooses positive investment levels. This illustrates a form of "commitment power" embedded in the gradualness of the state. We solve for the equilibrium in closed form to quantify the welfare gains relative to the one-shot case.

Section 5.4.3 studies a model of a dynamic contest, in which players exert costly effort to improve the quality of their own projects, and a winner is chosen by comparing all players' quality levels at the deadline. In contrast to the one-shot case, the players have the flexibility to adjust their effort depending on their current positions before the deadline. This flexibility allows players to save on effort costs in situations where it becomes relatively easy to predict who will win the contest. Due to this effect, total welfare can be higher in the gradual setting than in the one-shot game, and we compute the welfare gain numerically.

Section 6 discusses a list of possible extensions of the model. While players' positions and their transitions are continuous in the main model, in some situations positions are better described as discrete variables with infrequent jumps, as in technology adoption, market participation, or technological breakthroughs. In Section 6.2, we consider such a variation of the main model, in which the state-variable process approximates a Poisson process in the limit. We again establish an equilibrium uniqueness result when the period length ∆ is sufficiently small. The intuition is much like in the main model: while the impact of players' per-period actions is small in quantities under the main model, it is small in probabilities under the discrete-jump model.

All proofs are relegated to the Appendix. In addition, a supplementary appendix contains results for variations of the main model, as well as an introduction to the mathematical results utilized in the proofs.
1.1 Related Literature

There are several studies of models with a deadline in which players can adjust their positions gradually, subject to some form of friction. One approach to modeling friction is with adjustment costs, as in our paper. Caruana and Einav (2008b) consider a model of endogenous commitment power in which the cost increases with time until the deadline.[2] Their main focus is not on coordination problems per se, and their model assumes a finite-period sequential-move structure, which makes it possible to apply backward induction. In contrast, there is no such exogenously specified ordering of players' actions within each period in our model. Relatedly, Lipman and Wang (2000) study finitely repeated games with switching costs and short period length. They obtain an equilibrium selection result in binary coordination games, while the equilibrium is not unique for general payoffs. Caruana and Einav (2008a) study a continuous-time Cournot competition model in which firms change their target production levels before the market opens, subject to an adjustment cost. In contrast to the papers in this literature, we introduce (small) stochastic shocks to the adjustment process, which turn out to remove equilibrium multiplicity for a very general class of terminal payoffs.

[2] Caruana, Einav, and Quint (2007) applied this framework to bargaining.

Another typical way of modeling friction is via stochastic timing of adjustment opportunities, usually formulated with Poisson clocks. In Kamada and Kandori (2015), players' Poisson clocks are correlated, and, in contrast to our uniqueness results, they show that a broad range of final outcomes can be sustained as equilibria even if the one-shot benchmark has a unique equilibrium. Calcagno, Kamada, Lovo, and Sugaya (2014) consider independent Poisson clocks and show that a unique equilibrium selects a Pareto-efficient outcome for a certain class of coordination games.[3] Their efficiency-selection result is consistent with our efficiency results (though we impose a stronger symmetry assumption). A special case of our discrete-jump model with small adjustment costs can be seen as an approximation of the independent-Poisson-clock models. Thus, one of the points our paper makes to this literature is that equilibrium uniqueness can obtain for general payoffs if players incur (possibly small) adjustment costs.

[3] They provide an example of multiple equilibria in different classes of games. Lovo and Tomala (2015) prove the existence of Markov perfect equilibria in a more general framework, and Gensbittel, Lovo, Renault, and Tomala (2016) study zero-sum games. There are also papers that focus on specific applications such as bargaining (Ambrus and Lu, 2015) and auctions (Ambrus, Burns, and Ishii, 2014).

A main point of our paper is that local noise can smooth out strategic sensitivity. This is reminiscent of the literature on global games and their variants, such as Carlsson and van Damme (1993), Frankel and Pauzner (2000), and Burdzy, Frankel, and Pauzner (2001). That literature emphasizes the role of payoff/belief perturbations in obtaining equilibrium uniqueness,[4] although its logic for uniqueness is different from ours. While that literature relies on strategic complementarities and the existence of dominance regions, our result holds for a general class of terminal payoffs.

[4] See Morris and Shin (2006) for a unified treatment of the various approaches in the literature, including private signals, timing frictions, and local interactions.
The analysis in Section 5 demonstrates useful applications of BSDEs to dynamic games in continuous time. It is known that if players control the drift of publicly observable diffusion processes, then, for a fixed strategy profile, their continuation values can be described by a sensitivity process based on the martingale representation theorem (e.g., DeMarzo and Sannikov (2006)). In particular, when the game has a deadline, the continuation value process solves a BSDE with a fixed terminal time (Hamadene and Lepeltier, 1995). This representation method, for either a finite or an infinite horizon, is useful because it enables a characterization of incentive compatibility using the sensitivity process. In this paper we highlight another advantage of this approach by using it to prove equilibrium uniqueness. This requires, first of all, a suitable restriction of the class of games that makes it possible to write down BSDEs describing continuation values under all equilibrium strategy profiles. This step involves a technical subtlety, because it leads to multidimensional non-Lipschitz BSDEs that do not have a unique solution in general. We handle this issue by adapting a refinement of a recent result in stochastic analysis by Cheridito and Nam (2015). Generally speaking, the approach taken here suggests a useful recipe for proving equilibrium uniqueness beyond the particular models in this paper.[5] In addition, we apply the comparison principle of BSDEs to conduct comparative statics.

[5] To the best of our knowledge, this paper is the first to use BSDEs to obtain equilibrium uniqueness in continuous-time games.

The model in Section 5 can be seen as a particular class of (stochastic) differential games (see Dockner, Jorgensen, Long, and Sorger (2001) for a survey). In important economic settings such as capital accumulation or duopoly with sticky prices, there typically exist many equilibria (Fudenberg and Tirole, 1983; Tsutsui and Mino, 1990). As observed in the literature, non-smoothness (or discontinuity) of strategies is an important source of multiplicity (see also Section 3.1 in our context). Our results contribute to the literature by providing a uniqueness result under Brownian noise. Papavassilopoulos and Cruz (1979) show that, in a class of deterministic models similar to ours, an equilibrium is unique within the class of Markov strategies that are analytic functions of the state (if such an equilibrium exists).[6] The approach using BSDEs in this paper allows us to show equilibrium uniqueness among all strategies, including those that are non-smooth and/or non-Markovian.[7]

[6] Wernerfelt (1987) obtains an analogous result in a stochastic setting: equilibrium is unique within the class of Markov strategies that are C² in the state variables (if such an equilibrium exists).

[7] There are more recent papers in the continuous-time game literature that examine the properties of Brownian signals in sustaining cooperation under imperfect monitoring (Fudenberg and Levine, 2007, 2009; Sannikov and Skrzypacz, 2007; Faingold and Sannikov, 2011; Bohren, 2016). In contrast to their results, the current paper proves equilibrium uniqueness even under perfect monitoring.
2 Model

We consider a model with a finite set of players $N = \{1, 2, \ldots, n\}$, where each player controls her own endowed "state." As we will explain, the interpretation of the state variable can vary with the application, representing for instance accumulated capital, project quality, investment positions in markets, or price levels. To formalize the idea that each player can gradually adjust her state, we analyze a model with discrete periods $t = 0, \Delta, 2\Delta, \ldots, T$, parameterized by the period length $\Delta > 0$. We assume for notational convenience that T is divisible by ∆. Specifically, let $X_t^i \in \mathbb{R}$ denote player i's state at time t, which follows a process of the form

$$X_{t+\Delta}^i = X_t^i + \Delta\big(\mu_t^i(X_t^i) + A_t^i\big) + r_\Delta\, \varepsilon_t^i, \qquad (1)$$

given the exogenously fixed initial state $X_0^i$, where $(\varepsilon_t^i)_{t=0,\ldots,T-\Delta}$ is a sequence of random variables that are independently and identically distributed with density $\phi^i$. Here, $r_\Delta$ is the noise scaling factor which, as we will discuss, should shrink as ∆ → 0. In each period $t = 0, \ldots, T-\Delta$, player i chooses her action $A_t^i$ from a compact interval $\mathcal{A}^i$ after observing the past states $(X_s^j)_{s=0,\ldots,t,\, j\in N}$ and past actions $(A_s^j)_{s=0,\ldots,t-\Delta,\, j\in N}$ of every player. Player i's action $A_t^i$, together with the function $\mu_t^i : \mathbb{R} \to \mathbb{R}$, determines the drift of $X_t^i$. Let $X_t = (X_t^i)_{i=1,\ldots,n}$ and $\varepsilon_t = (\varepsilon_t^i)_{i=1,\ldots,n}$ denote, respectively, the state vector and noise vector in period t. Depending on the application, the noise terms capture stochasticity in the environment that players face and/or slight imperfection in the players' ability to implement their choices ("trembling hand"); see also Section 6.2, which considers a model with discrete state variables.

Player i's total payoff is

$$U^i(X_T) - \sum_{t=0}^{T-\Delta} c^i(A_t^i)\,\Delta, \qquad (2)$$

where $U^i : \mathbb{R}^n \to \mathbb{R}$ is the terminal payoff as a function of the vector of players' states, and $c^i : \mathcal{A}^i \to \mathbb{R}$ is a flow payoff function that includes the adjustment cost of changing states. We specialize to a relatively simple setup to minimize notation, but Theorem 1, as well as Proposition 1 in Section 4, holds more generally. For example, $c^i$ can depend on $X_t$, so that it can be interpreted as a flow payoff instead of an adjustment cost. We discuss a list of possible extensions of the model in Section 6; that section also considers a discrete-jump model in which state variables are discrete (Section 6.2).

Example 1 (Team production/project). Consider a team of workers who exert costly effort to produce output, where $X_t^i$ denotes the output (or the quality of a project) produced by individual i, which is also subject to randomness in the production process. In the last period, player i receives $U^i(X_T) = U(\sum_j X_T^j)$, where $U(\cdot)$ is the team payoff as a function of the total team output $\sum_j X_T^j$. Each player also incurs an effort cost $c^i$ from producing individual output every period. Note that there are strategic complementarities in the workers' output choices when U is (locally) convex.
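As a point of reference for what follows, here is a minimal simulation sketch of the state dynamics (1) and the realized payoff (2) for a single player. The normal noise, quadratic cost, constant-effort policy, fixed noise scale, and terminal payoff below are illustrative choices (the payoff $\max\{0, x\}$ reappears in Section 3.1); none of them is imposed by the model.

```python
import numpy as np

# Minimal sketch of one player's state dynamics (1) and realized payoff (2).
# Normal noise, quadratic cost, and the constant-effort policy are illustrative.
rng = np.random.default_rng(0)

def simulate(X0, policy, T=1.0, dt=0.01, r=0.05, kappa=1.0,
             mu=lambda x: 0.0, U=lambda x: max(0.0, x)):
    """One path of X_{t+dt} = X_t + dt*(mu(X_t) + A_t) + r*eps_t; returns the payoff."""
    x, cost = X0, 0.0
    for k in range(int(T / dt)):
        A = policy(k * dt, x)            # action chosen from the interval [0, 1]
        eps = rng.standard_normal()      # i.i.d. noise draw with density phi
        x += dt * (mu(x) + A) + r * eps  # r plays the role of the scaling factor r_Delta
        cost += dt * 0.5 * kappa * A**2  # flow adjustment cost c(A) * Delta
    return U(x) - cost

# Average payoff under full effort, starting from X_0 = -0.5.
print(np.mean([simulate(-0.5, lambda t, x: 1.0) for _ in range(5_000)]))
```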
Example 2 (Currency attack). Consider a version of the currency-attack model of Morris and Shin (1998) embedded in a real timeline. Multiple speculators try to attack a currency regime. At the terminal period, the currency regime collapses with probability $P(\sum_j X_T^j)$, where $\sum_j X_T^j$ is the aggregate short position accumulated by the speculators. The more the speculators sell the currency, the more likely the regime is to collapse at the end of the game, which can create equilibrium multiplicity, as emphasized in the literature. The terminal payoff $U^i(X_T)$ represents the expected return of speculator i conditional on the final positions $X_T$. The convex adjustment costs $c^i$ can be interpreted as portfolio adjustment costs under limited liquidity.

Example 3 (Investment in oligopoly). n firms decide how much capital $A_t^i$ to invest each period to improve their dominance in the market. $X_t^i$ denotes the total amount of investment made by firm i, which increases the flow revenue i receives each period (note that flow payoffs can depend on $X_t$; see Section 6.1). This fits the framework analyzed in Athey and Schmutzler (2001), which is applied to several examples, such as consumers with inertia (where $X_t^i$ corresponds to the number of "loyal customers") and learning-by-doing (where $X_t^i$ corresponds to the accumulated experience that serves to lower the production cost). Caruana and Einav (2008a) consider a related deterministic model in which competing firms revise production targets $X_t^i$ subject to adjustment costs.

Example 4 (Contest). Players compete for a fixed prize, and the value of the prize does not depend on their effort. One example is an internal labor market in which colleagues compete for a promotion. The boss gives the promotion to the employee whose project receives the highest evaluation. Each player chooses her costly effort level $A_t^i$ at each time to improve the quality $X_t^i$ of her project, and the boss evaluates performance based on the final quality $X_T^i$ of each player's project at the deadline T.

We assume the following conditions on $U^i$, $c^i$, $\mu_t^i$, and $\varepsilon_t^i$.

Assumptions.
1. $U^i$ is bounded and Lipschitz continuous.[8]
2. $c^i$ is continuously differentiable and strictly convex.
3. $\mu_t^i : \mathbb{R} \to \mathbb{R}$ is Lipschitz continuous.
4. $\phi^i : \mathbb{R} \to \mathbb{R}$ is twice continuously differentiable, and its first-order derivative is absolutely integrable, i.e., $\int_{-\infty}^{\infty} |(\phi^i)'(\varepsilon_t^i)|\, d\varepsilon_t^i < \infty$.

[8] For the uniqueness result (Theorem 1), it is sufficient to require $U^i$ to be locally Lipschitz continuous in $X^i$.

We note that the last condition is satisfied by many standard parametric distributions, such as the normal distribution. In addition, the condition is satisfied whenever the support of the noise is compact and the density is continuously differentiable.

Remark 1. We mainly consider the static benchmark to be the case ∆ = T, so that players make choices only once. Alternatively, one can consider a static game in which player i directly chooses $X^i$ to receive payoff $U^i(X)$ without any cost. Our model with (1)-(2) can be seen as an elaboration of this alternative static benchmark as well, in particular when the adjustment costs $c^i$ are small. Under this interpretation, $c^i$ is best interpreted as a switching cost that can arise from various cognitive or physical constraints.

3 Equilibrium Uniqueness

The following is our main equilibrium uniqueness result.

Theorem 1. There exists M > 0 such that if $\lim_{\Delta\to 0} \frac{r_\Delta}{\Delta} \geq M$, then for all sufficiently small ∆ > 0, there is a unique subgame-perfect equilibrium. Moreover, the strategies in this equilibrium are Markov (i.e., functions of $(t, X_t)$ only) and pure.

To understand the condition $\lim_{\Delta\to 0} r_\Delta/\Delta \geq M$, note that in each period the impact of each player's action is proportional to ∆, while the impact of the noise is $r_\Delta$. Thus the condition means that, at a local level, the importance of players' actions relative to the noise is bounded. However, this does not mean that the outcome of the game is mostly driven by stochastic noise realizations rather than by players' actions. Indeed, the aggregate noise level $r_\Delta \sum_t \varepsilon_t$ can converge to 0 as ∆ → 0, namely whenever $\lim_\Delta \frac{r_\Delta}{\sqrt{\Delta}} = 0$. On the other hand, when $r_\Delta$ is proportional to $\sqrt{\Delta}$, the noise process converges to a Brownian motion; we study this continuous-time limit model with Brownian noise in Section 5.
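To see the scaling concretely, normalize the noise terms to unit variance (a normalization not imposed by the model). The variance of the aggregate noise is then

$$\operatorname{Var}\!\Big( r_\Delta \sum_{t=0,\Delta,\ldots,T-\Delta} \varepsilon_t \Big) \;=\; r_\Delta^2 \cdot \frac{T}{\Delta} \;=\; T\left(\frac{r_\Delta}{\sqrt{\Delta}}\right)^2,$$

which vanishes exactly when $r_\Delta/\sqrt{\Delta} \to 0$ and stays of order one when $r_\Delta \propto \sqrt{\Delta}$. Both regimes are compatible with the condition $\lim_{\Delta\to 0} r_\Delta/\Delta \geq M$ of Theorem 1; for example, $r_\Delta = \Delta^{3/4}$ satisfies it while the aggregate noise vanishes.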
The highlight of this uniqueness result is that we do not impose substantial restrictions on $U^i$. The one-shot benchmark with ∆ = T (and the alternative benchmark described in Remark 1) can easily have multiple equilibria, for example when $U^i$ admits strategic complementarities. Theorem 1 says that this equilibrium multiplicity vanishes in general when players adjust state variables gradually, subject to stochastic noise. In the next subsection, we give intuition for this result with a simple example.

3.1 Illustrative Example

In this subsection, we illustrate the intuition behind Theorem 1 with a team-project example (Example 1). Assume for simplicity $\mu(\cdot) = 0$ and a form of "equity partnership" in which $U^i(X_T) = \max\{0, \sum_j X_T^j\}$ holds (on a large interval that contains $[X_0, X_0 + NT]$). That is, players receive a positive benefit if their total output is positive and zero payoff otherwise (Figure 1). Also suppose $\mathcal{A} = [0, 1]$, c is strictly increasing with $c(0) = 0$, and $T < -\sum_j X_0^j < NT$. With this simple example, we start with a static model, then allow players to gradually adjust their actions, and finally add noise to the state variable.

Figure 1: Terminal payoff under equity partnership.

Static case: Consider the static benchmark case of ∆ = T. That is, player i chooses an effort level $A_0^i$ only once, and the final output is determined by $\sum_j (X_0^j + T A_0^j)$. Set the noise level $r_\Delta = 0$ to simplify the calculation. If $c'(1) < 1$, then there exists a Nash equilibrium with $A_0^i = 1$ for every i. The symmetric equilibrium payoff is $\sum_j X_0^j + NT - Tc(1) > 0$. On the other hand, there is always a Nash equilibrium in which $A_0^i = 0$ for every i; in this case, the symmetric equilibrium payoff is 0. Note that the equilibrium outcome in the latter case is Pareto dominated by that in the former case.

Deterministic case: Equilibrium multiplicity similar to the static case remains even under gradual adjustment (i.e., ∆ ≈ 0) without noise (i.e., $r_\Delta = 0$). For $N \geq 3$ and for any value $\theta \in (-N+1, -1)$, one can verify that the strategy profile in which player i chooses

$$A_t^i = \begin{cases} 1 & \text{if } \sum_j X_t^j \geq (T-t)\theta, \\ 0 & \text{if } \sum_j X_t^j < (T-t)\theta, \end{cases}$$

is a subgame-perfect equilibrium (for the details, see Section A in the Appendix). If the initial state is such that $\sum_j X_0^j \geq T\theta$, the players always choose the maximal effort $A_t^i = 1$ on the equilibrium path, and they achieve the equilibrium payoff $\sum_j X_0^j + TN - Tc(1) > 0$. If not, there is no effort at all, that is, $A_t^i = 0$, and the equilibrium payoff is 0. The logic is analogous to the static case: players coordinate on either being productive or not, depending on whether the current state is above the threshold $(T-t)\theta$, and no single player can influence the other players' actions by any deviation. Note that the players' continuation values are discontinuous when the state $X_t$ is on the threshold; thus we cannot uniquely pin down the value of θ, which leads to a continuum of equilibria.[9] Figures 2 and 3 illustrate two different equilibria, the former of which Pareto dominates the latter.

[9] It is worthwhile to clarify that the multiplicity under the deterministic model does not rely on the kink in the terminal payoff. Even when the terminal payoff is smooth, continuation values can be discontinuous due to players' strategic choices. The point that discontinuities in continuation values can give rise to multiplicity has also been observed in the literature, for example in Fudenberg and Tirole (1983) for an infinite-horizon strategic investment game.

Figure 2: A "good" Markov perfect equilibrium. $\sum_i X_0^i$ is located to the right of the threshold (dotted line), and team members choose $A_t^i = 1$ for all t. Since $c'(1) < 1$, the marginal benefit of effort always exceeds the marginal cost.

Figure 3: A "bad" Markov perfect equilibrium. $\sum_i X_0^i$ is located to the left of the threshold (dotted line), and team members choose $A_t^i = 0$ for all t. Since a single player's deviation cannot move the total output over the threshold to the right region, the marginal benefit of effort is always zero.
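Before turning to the noisy case, a quick numerical check of the static multiplicity described above. This is a sketch: the cost $c(a) = 0.4a^2$ (so that $c'(1) = 0.8 < 1$) and the parameter values are illustrative choices, not taken from the paper.

```python
import numpy as np

# Check that all-work and all-shirk are both Nash equilibria of the static
# (Delta = T) equity-partnership game of Section 3.1. Illustrative values:
# N = 3 players, T = 1, sum of initial states -2 (so T < -sum X0 < N*T).
N, T, sumX0 = 3, 1.0, -2.0
c = lambda a: 0.4 * a**2          # strictly convex, c(0) = 0, c'(1) = 0.8 < 1
U = lambda x: max(0.0, x)         # equity-partnership terminal payoff

def payoff(a_i, a_other):
    total = sumX0 + T * (a_i + (N - 1) * a_other)
    return U(total) - T * c(a_i)

grid = np.linspace(0.0, 1.0, 101)
for a_star in (0.0, 1.0):         # candidate symmetric profiles: all-shirk, all-work
    best_dev = max(payoff(a, a_star) for a in grid)
    print(f"a* = {a_star}: equilibrium payoff {payoff(a_star, a_star):.3f}, "
          f"best deviation payoff {best_dev:.3f}")   # equal => no profitable deviation
```

Both candidate profiles survive all unilateral deviations, confirming the Pareto-ranked multiplicity.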
Since c0 (1) < 1, the marginal benefit of effort always Main case:. Now let us introduce stochastic noise and explain heuristically why equilibrium multiplicity vanishes with frequent adjustment and stochastic noise. First observe that the above multiplicity still remains if the noise size r∆ is small. In the above example, small noise does not prevent players from perfectly coordinate on either full-effort or non-effort based on the threshold strategy. However, this is not the case if the noise size r∆ is sufficiently larger than ∆; it becomes possible that some noise realizations push the state Xt to cross the threshold, which changes players’ coordination incentives. Intuitively speaking, given that the ratio r∆ ∆ is sufficiently large, each player at a subgame is mostly reacting to the stochastic shock, and coordinating with other players’ actions in the same period become relatively less important. To elaborate on the above point about vanishing coordination incentives, consider a player’s 9 It is worthwhile to clarify that the multiplicity under the deterministic model does not rely on the kink in the terminal payoff. Even when the terminal payoff is smooth, continuation values can be discontinuous due to players’ strategic choice. Also, the point that discontinuities in continuation values can give rise to multiplicity has been observed in the literature, for example Fudenberg and Tirole (1983) under an infinite-horizon strategic investment game. 10 Figure 3: A “bad” Markov perfect equilibrium. i i X0 P is located to the left of the threshold (dotted = 0 for all t. Since a single player’s deviation cannot move the line), and team members choose total output over the threshold to the right region, the marginal benefit of effort is always zero. Ait incentive t the start of the last period t = T − ∆, which is to choose Ait to maximize the value of the subgame payoff Vti,∆ (At ) := −c i (Ait )∆ Z + U i (Xt + At ∆ + r∆ t ) φ(t )dt given the opponents’ actions (Ajt )j6=i . To examine their coordination incentives under small ∆, for convenience now we assume that U i is twice differentiable with derivatives that are bounded. Then we can compute the strategic interaction effect by directly taking the cross derivative of the form ∂2 i,∆ (At ) = j Vt i ∂At , ∂At Z (U i )00 (Xt + At ∆ + r∆ t ) φ(t )dt ∆2 . Intuitively speaking, this is because each player’s action influences the state to first order in ∆, and the interactions among their influences is of order less than ∆. On the other hand, the second-order partial derivative ∂2 V i,∆ ∂(Ait )2 t is on the order of ∆ because of the strict convexity of the adjustment cost ∆c00 (Ait ). The first-order condition implicitly defines i’s best response against the opponents’ actions at the subgame, and by the implicit function theorem, its slope is bounded by 2 ∂ V i,∆ ∂Ait ∂Ajt t 2 ∂ V i,∆ . ∂(Ait )2 t (3) This ratio is close to 0 when ∆ is small, meaning that each player’s incentive is nearly insensitive to her opponents’ actions, and thus the equilibrium actions at time t are uniquely determined 11 by the contraction mapping theorem. The basic idea is to repeat the same line of argument backward from the last period T − ∆ i by replacing the terminal payoff U i by the expected continuation payoff denoted by Wt+∆ , but this involves subtleties. 
3.2 Key Assumptions for the Uniqueness

In this subsection, we identify and elaborate on four key assumptions behind Theorem 1: (i) gradual adjustment; (ii) noisy state evolution; (iii) no instantaneous strategic interdependence; and (iv) a deadline. If any one of these is absent, equilibrium multiplicity can arise.

Gradual adjustment (i) means that a player's decision at each time has only a small influence on the value of the state variable; more specifically, this influence is of order ∆.[11] If not, we cannot ensure uniqueness in general, for the same reason that a one-shot game can have multiple equilibria. For (ii), recall that continuation values can be very steep or discontinuous in the state without noise, which can lead to a continuum of equilibria even when ∆ is small (e.g., Section 3.1). With enough local noise (i.e., a large ratio $r_\Delta/\Delta$), such strategic sensitivities are smoothed out. By (iii), we mean that the cross derivatives of flow payoffs in players' actions are zero (or at least small). The model in this paper is motivated by situations in which players influence others' payoffs by changing states,[12] rather than by directly changing flow payoffs; introducing cross-derivative terms in flow payoffs can clearly create coordination problems at the flow-payoff level. In the presence of a deadline (iv), we can conduct a "backward induction" argument to construct a unique equilibrium, combined with (i)-(iii). While we can allow the length T of the game to be stochastic, possibly influenced by players' actions and states, we need its distribution to be uniformly bounded (see Section 6.1, and also Supplementary Appendix B.2 for an example of multiple equilibria in the case of unbounded distributions).

[11] In the case of the discrete-jump model (Section 6.2), the influence is instead of order ∆ in expectation, and we prove that there exists a unique equilibrium if ∆ is small enough.

[12] Note also that flow payoffs can depend on $X_t$ (Section 6.1).

4 Efficiency with Team Production

In this section, we focus on the special case in which symmetric players play a team production model (Example 1). Specifically, all players share the same adjustment cost $c^i(A_t^i) := c(A_t^i)$ and terminal payoff $U^i(\sum_i X_T^i) := U(\sum_i X_T^i)$, with c and U satisfying the assumptions in Section 2. As we saw in Section 3.1, there can be multiple equilibria that are Pareto ranked, and the welfare loss from coordinating on an inefficient equilibrium is non-negligible even with small adjustment costs.
We show that in this team production model, players can achieve approximate efficiency at some equilibrium under small frictions (noise and adjustment cost). Combined with the uniqueness result of the previous section, this provides a condition under which players always attain approximate efficiency.

Proposition 1. Consider the team production model. Then, for any $\epsilon > 0$, there exists $\delta > 0$ such that if $\max\{\sup|c|,\; \frac{r_\Delta^2}{\Delta}\} \leq \delta$, there is a subgame-perfect equilibrium in which the expected payoff of any player is greater than

$$\max_{X \in \prod_{i\in N}(X_0^i + T\mathcal{A}^i)} U\Big(\sum_i X^i\Big) - \epsilon. \qquad (4)$$

That is, in (4) players approximately achieve the best terminal payoff among the outcomes that are collectively achievable (under the deterministic environment).[13] The key step of the proof of Proposition 1 is to construct a single-agent optimization problem in which a (hypothetical) planner chooses all the actions $(A_t^i)_{i\in N,\, t=0,\Delta,\ldots,T-\Delta}$ to maximize $E[U(X_T) - \Delta\sum_{i,t} c(A_t^i)]$. We show that the value of this problem provides a lower bound for an equilibrium payoff. Under small costs and noise, the optimal value in this planner's problem approximates (4).

[13] While we look at small adjustment costs with a fixed terminal payoff, approximate efficiency (in terms of the ratio to the first-best payoff) also obtains when we look at a large terminal payoff relative to the cost.

Proposition 1 (efficiency) requires a small noise level $r_\Delta$ (relative to ∆) to ensure small aggregate volatility of the game outcome, which is the opposite direction from Theorem 1 (uniqueness), where we needed enough noise (relative to ∆) to smooth out strategic uncertainty. For an intermediate rate of noise, combining Theorem 1 and Proposition 1 yields the following corollary.

Corollary 1. Consider the team production model. Assume that $\lim \frac{r_\Delta}{\Delta} = \infty$ and $\lim \frac{r_\Delta^2}{\Delta} = 0$. Then, for any $\epsilon > 0$, under sufficiently small ∆, there is a unique subgame-perfect equilibrium, and its expected payoff for any player is greater than (4).

To informally explain why inefficient equilibria do not exist here, reconsider the example in Section 3.1. In a period t near the deadline in which the team output $\sum_i X_t^i$ is sufficiently high, every player finds it optimal to work hard; but players have no incentive to work if the output at t is too low. At an earlier period $t' < t$, players have additional incentives to work harder, because they anticipate that pushing up the team output can "encourage" other players to work harder in later periods (and this effect is guaranteed to be positive in the presence of noise). Given that the efficiency term in the terminal payoff U is sufficiently large relative to the cost c, this effect accumulates as |c| becomes small, so that the ex-ante equilibrium payoff becomes very efficient.

5 Continuous-time Limit with Brownian Noise

In this section, we analyze a continuous-time model with Brownian noise, which can be seen as a limit of the model in Section 3 as ∆ → 0 with $r_\Delta$ proportional to $\sqrt{\Delta}$.[14] There are three main reasons for this exercise. First, Brownian noise is often used in applied work, and it is of independent interest to examine this case directly in continuous time. Second, while continuous-time models are usually more tractable computationally than discrete-time models, deterministic continuous-time models are in general not well-defined (even under Markov strategy restrictions) and lack equilibrium uniqueness (see footnote 16); introducing Brownian noise resolves both issues nicely.
Lastly, we stress a technical interest in applications of backward stochastic differential equations (BSDEs) to dynamic games. We prove equilibrium uniqueness and comparative statics based on BSDEs. The proof methods work even when the standard approach based on HJB equations is not applicable (see Section 5.3).

[14] We have not formally investigated whether the (unique) equilibrium under $r_\Delta \propto \sqrt{\Delta}$ converges to the one under the continuous-time limit, partly because taking such a Brownian-noise limit is not necessary for our main results (in Section 3). Staudigl and Steg (2014) show the convergence of expected payoffs for a fixed strategy profile in a similar setup of repeated games with imperfect public monitoring, assuming a full-support signal distribution.

5.1 Setup

The model is formulated directly in continuous time. Player i controls his own state variable as

$$dX_t^i = \big(\mu^i(X_t^i) + A_t^i\big)\,dt + \sigma^i\, dZ_t^i, \qquad (5)$$

for a fixed initial value $X_0^i$. As in the discrete-time model, $A_t^i \in \mathcal{A}^i$ is the control chosen by player i at time t.[15] The control range $\mathcal{A}^i$ is assumed to be a compact interval in $\mathbb{R}$. We assume that the Brownian shocks $dZ_t^i$ are independent across players, although this assumption can be relaxed. Player i's total payoff is

$$U^i(X_T) - \int_0^T c^i(A_t^i)\,dt.$$

We impose the same conditions on $c^i$ and $U^i$ as in the discrete-time model. In addition, we assume that each $\mu^i$ is bounded.

[15] We use the so-called weak formulation, which is the standard way a continuous-time game with imperfect public monitoring is formulated in the literature (see the Appendix for the formal description).

At time t, player i observes the state $X_t$ but does not directly observe the control actions $A_t^j$ of the other players. Formally, player i's (public) strategy is an $\mathcal{A}^i$-valued stochastic process $A^i = (A_t^i)_{t\in[0,T]}$ that is progressively measurable with respect to $\mathcal{F}^X$, the filtration generated by the public history of $X_t$. Note that this setup allows the players' actions to depend on past histories of the $X_t$ process.[16]

Under any profile of strategies $A = (A^i)_{i\in N}$, we can define player i's continuation value at time t as

$$W_t^i(A) = E\left[\, U^i(X_T) - \int_t^T c^i(A_s^i)\,ds \;\Big|\; \mathcal{F}_t^X \right]. \qquad (6)$$

A profile of strategies $A = (A^i)_{i\in N}$ is a Nash equilibrium if each player maximizes her expected payoff (6) at t = 0, given the other players' strategies. Since the distribution of the publicly observable $X_t$ process has full support in our setup, the Nash equilibrium concept is identical to its refinement known as public perfect equilibrium, which is frequently used in the literature on dynamic games.

[16] We assume imperfect observation only to avoid the well-known technical issue in continuous-time games: the outcome of a game may not be well-defined for a fixed strategy profile, even under Markovian strategies (see other approaches, e.g., Simon and Stinchcombe (1989) and Bergin and MacLeod (1993), for careful restrictions on the strategy space). This setup is standard in dynamic games and contracts that model players who observe noisy public signals about the actions of the other players (e.g., Sannikov (2007)). Note also that we rule out mixed strategies in this model. Neither this nor imperfect monitoring is a crucial assumption for equilibrium uniqueness, because Theorem 1 in the discrete-time model holds when allowing for mixed strategies and perfect monitoring as well.
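For concreteness, the continuation value (6) at t = 0 under a fixed Markov strategy profile can be estimated by simulating (5) with a standard Euler-Maruyama scheme. The sketch below is illustrative only: the policy, drift, and terminal payoff are placeholders, not objects from the model.

```python
import numpy as np

# Euler-Maruyama simulation of the controlled diffusion (5) and a Monte Carlo
# estimate of the time-0 value (6) under a fixed Markov policy a(t, x).
rng = np.random.default_rng(1)

def value_estimate(a, U, cost, X0=0.0, mu=lambda x: 0.0, sigma=1.0,
                   T=1.0, dt=1e-3, n_paths=20_000):
    X = np.full(n_paths, X0)
    total_cost = np.zeros(n_paths)
    for k in range(int(T / dt)):
        A = a(k * dt, X)                              # Markov control a(t, x)
        X = X + (mu(X) + A) * dt \
              + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
        total_cost += cost(A) * dt                    # accumulated flow cost
    return np.mean(U(X) - total_cost)                 # estimate of W_0

# Illustrative placeholders: constant unit effort, quadratic cost, smooth payoff.
print(value_estimate(a=lambda t, x: np.ones_like(x),
                     U=lambda x: np.tanh(x),
                     cost=lambda A: 0.5 * A**2))
```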
5.2 Equilibrium Uniqueness

We follow the standard approach of characterizing the optimality of each player's strategy using the martingale representation theorem.[17] For any profile of strategies $A = (A^i)_{i\in N}$, there is a profile of processes $(\beta^i(A))_{i\in N}$, progressively measurable with respect to $\mathcal{F}^X$, such that the evolution of player i's continuation value satisfies

$$dW_t^i(A) = c^i(A_t^i)\,dt + \beta_t^i(A)\cdot dZ_t \qquad (7)$$

along with the terminal condition

$$W_T^i(A) = U^i(X_T). \qquad (8)$$

Here $\beta^i(A)$, whose existence and uniqueness are ensured by the martingale representation, is an $\mathcal{F}^X$-progressively measurable $\mathbb{R}^n$-valued process such that $E\big[\int_0^T |\beta_t^i|^2\,dt\big] < \infty$.[18] Intuitively, $\beta_t^{i,k}(A)$ measures the sensitivity of player i's continuation payoff to fluctuations in the k-th shock $Z_t^k$ at time t.

[17] The application of the martingale representation theorem to stochastic control problems has a relatively long history; see Davis (1979) for a historical survey and Pham (2010) for a textbook-style introduction.

[18] In this paper we use |·| to denote both the Euclidean norm for vectors and the Frobenius norm for matrices.

To simplify notation, we introduce vector/matrix notation. Denote the drift vector and the covariance matrix of the state variable by $\mu(X_t) + A_t \in \mathbb{R}^n$ and $\Sigma \in \mathbb{R}^{n\times n}$, respectively.[19] Rewriting $dX_t = (\mu(X_t) + A_t)dt + \Sigma\, dZ_t$ as $dZ_t = \Sigma^{-1}\big(dX_t - (\mu(X_t) + A_t)dt\big)$ and substituting the latter into equation (7), we obtain

$$dW_t^i(A) = \Big( c^i(A_t^i) - \beta_t^i(A)\cdot\Sigma^{-1}\big(\mu(X_t) + A_t\big) \Big)dt + \beta_t^i(A)\cdot\Sigma^{-1}\,dX_t.$$

[19] More specifically, the i-th element of $\mu(X_t) + A_t$ is $\mu^i(X_t^i) + A_t^i$, and the (i,j)-element of Σ is $\sigma^i$ if i = j and zero otherwise.

To characterize the players' incentive compatibility constraints, we define player i's response function $f^i : \mathbb{R}^{n\times n} \to \mathcal{A}^i$ as

$$f^i(b) = \arg\max_{\alpha^i \in \mathcal{A}^i}\; b^i\cdot\Sigma^{-1}\alpha^i - c^i(\alpha^i)$$

for each $b = (b^i)_{i\in N} \in \mathbb{R}^{n\times n}$. The maximand is taken from (the negative of) the drift term in the expression for the continuation value above. The function $f^i$ is well-defined because $c^i$ is strictly convex by assumption. Using these, the standard argument characterizes the optimality of strategies in terms of the following "local incentive compatibility condition" (Lemma 6 in the Appendix): i's strategy $A^i$ is optimal against $A^{-i}$ if and only if

$$A_t^i = f^i(\beta_t(A)) \quad \text{almost surely, for almost all } t. \qquad (9)$$

Lemma 1. A strategy profile $(A^i)_{i\in N}$ is a Nash equilibrium if and only if (7), (8), and (9) hold for some $\mathcal{F}^X$-progressively measurable processes $W, \beta$ such that

$$E\Big[\sup_{0\le t\le T} |W_t|^2\Big] < \infty, \qquad E\Big[\int_0^T |\beta_t|^2\,dt\Big] < \infty.$$

Combining equations (5), (7), (8), and (9) leads to a multi-dimensional backward stochastic differential equation (BSDE)

$$dW_t^i = \Big( c^i(f^i(\beta_t)) - \beta_t^i\cdot\Sigma^{-1}\big(\mu(X_t) + f(\beta_t)\big) \Big)dt + \beta_t^i\cdot\Sigma^{-1}\,dX_t, \qquad W_T^i = U^i(X_T), \qquad (10)$$

where $f(\beta_t) = (f^j(\beta_t))_{j\in N}$, under a probability measure for which the $X_t$ process is a standard Brownian motion. A solution (W, β) to this BSDE describes the stochastic evolution of the continuation payoffs. Moreover, the control $A_t^i$ given by $f^i(\beta_t)$ is incentive compatible for player i and is consistent with the evolution of the continuation payoffs. While the class of BSDEs in (10) is non-standard in the literature, we build on the results of Delarue (2002) and Cheridito and Nam (2015) to show that this BSDE has a unique solution (W, β), which establishes our main claim of equilibrium uniqueness.

Theorem 2. There exists a unique Nash equilibrium. Moreover, this equilibrium has the Markovian property: the continuation values and the strategies take the form $W_t^i(A) = w^i(t, X_t)$ and $A_t^i = a^i(t, X_t)$ for some function $w^i : [0,T]\times\mathbb{R}^n \to \mathbb{R}$ that is Lipschitz continuous in $x \in \mathbb{R}^n$ uniformly in $t \in [0,T]$, and some function $a^i : [0,T]\times\mathbb{R}^n \to \mathcal{A}^i$.
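For a concrete sense of the response function defined above, take the quadratic cost $c^i(\alpha) = \frac{\bar c_i}{2}\alpha^2$ used in the examples below, and ignore the boundary of the control range for the moment (an illustrative simplification). Since Σ is diagonal, the first-order condition gives

$$\frac{b^{ii}}{\sigma^i} = \bar c_i\,\alpha^i \quad\Longrightarrow\quad f^i(b) = \frac{b^{ii}}{\bar c_i\,\sigma^i},$$

so in equilibrium each player's drift choice is proportional to the sensitivity of her own continuation value to her own state's shock. This linear form is what underlies the linear equilibrium of Proposition 3 below.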
5.3 Relation to Dynamic Programming Approach

Theorem 2 shows that the unique equilibrium is Markovian with respect to time and the state variables. This means that on the equilibrium path each player solves a Markov stochastic control problem. A standard approach to this problem is to write down the HJB equations (see, e.g., Pham (2010)):

$$0 = \max_{a^i}\left( \frac{\partial w^i(t,x)}{\partial t} + \sum_{k=1}^n \frac{(\sigma^k)^2}{2}\,\frac{\partial^2 w^i(t,x)}{(\partial x^k)^2} + \sum_{k=1}^n \big(\mu^k(x) + a^k\big)\frac{\partial w^i(t,x)}{\partial x^k} - c^i(a^i) \right), \qquad (11)$$

with terminal condition $w^i(T, x) = U^i(x)$. This non-linear PDE describes how player i's continuation value evolves over time in response to changes in $X_t$. Although this dynamic programming approach may be more familiar to economists, the existence of a smooth solution to the nonlinear PDE is not easy to ensure.[20] Moreover, it cannot accommodate non-Markovian and non-smooth strategies, so on its own it cannot rule out the existence of other equilibria.[21] For the proof of Theorem 2, we avoid these issues by relying directly on the uniqueness property of the BSDE (10).

[20] In Iijima and Kasahara (2015a), we provide a sufficient condition for the existence of a smooth solution under stronger restrictions on $U^i$ and Inada conditions on $c^i$.

[21] In general, non-Markovian strategies can be effective in generating equilibrium multiplicity even in finite-horizon games, as observed in the literature on repeated games as well as in more recent papers such as Kamada and Kandori (2015). Also note that non-smoothness of continuation values is a key source of equilibrium multiplicity as well.

5.4 Examples

The continuous-time model with Brownian noise is more tractable than the discrete-time model because the BSDE (10) allows for convenient numerical analysis (Delong (2013), Ch. 5), in particular via its reduction to the PDE (11). In this subsection, we illustrate this usefulness through three specific applications. In particular, the latter two examples, a hold-up problem and a contest, illustrate channels through which gradualness improves players' welfare.

5.4.1 Team Production

As a first example, we consider a team-production model in which players share a common terminal payoff that depends only on the sum of the state variables. From Corollary 1, we know that players can always approximately achieve efficiency when frictions are small. To examine how the equilibrium outcome distribution depends on the initial point when frictions are not necessarily small, we conducted numerical simulations with two players, the terminal payoff $U(x) = \sin(ax) + bx - cx^2$ with a = 0.5 and b = c = 0.01 (Figure 4), and the quadratic cost function $c(a) = \frac{\kappa}{2}a^2$. The terminal function has two local maxima over $x \in [-10, 6]$, and we cannot rule out multiple equilibria in the static benchmark case, because the players can coordinate on "targeting" either of the locally maximum points. In particular, the players might coordinate on playing a Pareto-inefficient equilibrium by targeting the local maximum at x ≈ −8.30 rather than the global maximum at x ≈ 3.28.

Figure 5 shows a set of simulation results for the stated utility function with different initial points $X_0 := X_0^1 + X_0^2$ and cost levels κ. In each panel, we fix $X_0 \in \{-6, 0, -5.2\}$ and compare the simulated distribution of $X_T$ with κ = 0.5 (grey) and with κ = 0.2 (black). For each parameter specification, we compute the equilibrium strategy profile, then randomly generate $X_T$ 1000 times and plot the resulting values as a histogram.
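The computation just described can be organized as a backward solve of the PDE (11). The following is a minimal sketch, not the authors' code, using an explicit finite-difference scheme. It additionally assumes, as the symmetry of the example suggests, that the common continuation value depends on the state only through the aggregate $x = X^1 + X^2$, whose volatility is $\sqrt{2}\sigma$; with the quadratic cost, each player's effort is then $w_x/\kappa$.

```python
import numpy as np

# Sketch: explicit finite-difference backward solve of the equilibrium PDE (11)
# in the symmetric team-production example, written in the aggregate x = X^1 + X^2.
a_, b_, c_ = 0.5, 0.01, 0.01
U = lambda x: np.sin(a_ * x) + b_ * x - c_ * x**2   # terminal payoff (Figure 4)
sigma, kappa, T = 1.0, 0.2, 1.0
x = np.linspace(-30.0, 30.0, 1201)                  # wide grid to mute boundary effects
dx = x[1] - x[0]
dt = 0.2 * dx**2 / sigma**2                         # small step, crude stability choice
w = U(x)                                            # terminal condition w(T, x) = U(x)
t = T
while t > 0.0:
    wx = np.gradient(w, dx)                         # w_x
    wxx = np.gradient(wx, dx)                       # w_xx (rough but adequate here)
    a_star = wx / kappa                             # each player's effort w_x / kappa
    # Backward step of w_t + sigma^2 w_xx + 2 a* w_x - (kappa/2) a*^2 = 0, where the
    # coefficient sigma^2 (not sigma^2/2) reflects the aggregate's variance 2 sigma^2.
    w += dt * (sigma**2 * wxx + 2.0 * a_star * wx - 0.5 * kappa * a_star**2)
    t -= dt
print(float(np.interp(-5.2, x, w)))                 # time-0 value at X_0 = -5.2
```

Sample paths of $X_T$ can then be generated by simulating the diffusion under the computed policy, as in the sketch of Section 5.1.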
We found that when $X_0$ is very close to one of the local maxima of $U(X_T)$, as in Panels A and B, the players tend to coordinate on the closer of the two local maxima, regardless of the cost level. However, when the initial point is not very close to one of the local maxima, the cost level can significantly affect the distribution of $X_T$. In Panel C, although $X_0 = -5.2$ is closer to the inefficient local maximum, the players push the state variable toward the global maximum in more than half of the simulations when the cost level is sufficiently small.[22]

[22] We thank an anonymous referee for suggesting this simulation experiment.

Figure 4: Terminal payoff function with multiple local maxima.

Figure 5: Simulated distributions of $X_T$ with different initial points $X_0$ and cost levels κ. In each panel, we fix $X_0 \in \{-6, 0, -5.2\}$ and compare the simulated distributions of $X_T$ with κ = 0.5 (grey) and 0.2 (black). For each parameter set, we simulated $X_T$ 1000 times and plotted them as a histogram. The other parameters are σ = 1, N = 2, and T = 1.

Finally, we consider comparative statics of the equilibrium payoff, assuming that U is monotone. We let A(κ, N) denote the unique equilibrium under the cost parameter κ and team size N.

Proposition 2. Consider the team production game in which U is increasing. Then $W_0(A(\kappa, N))$ is decreasing in κ and increasing in N.

While this result may not be surprising in and of itself, it ensures that our "equilibrium selection" has natural properties. In general, an arbitrary selection of an equilibrium can lead to strange comparative statics when parameters vary locally. The proof is based on the comparison theorem applied to the equilibrium BSDEs across different parameters. In order to conduct the appropriate comparison, we need to ensure that $\beta_t$ is always positive in equilibrium. This is guaranteed by the co-monotonicity theorem (Chen, Kulperger, and Wei, 2005; dos Reis and dos Reis, 2013), which is useful for checking the signs of the sensitivity processes $\beta_t$ of BSDEs in general.

5.4.2 Hold-Up Problem

In this example, we study a class of hold-up problems that highlight a form of "commitment power" embedded in gradual adjustment (in contrast to one-shot adjustment). In particular, we obtain a closed-form solution for the equilibrium strategies and values to calculate the "efficiency gain" of gradualness relative to the static benchmark. This also illustrates the point that equilibrium outcomes in the game with gradual adjustment (with small noise) can be very different from those under the static benchmark model, even if the equilibrium of each model is unique.

Consider a two-player game with a seller (player 1) and a buyer (player 2), with $X_0^i = 0$ for each i = 1, 2. The seller chooses investment $A_t^1$ to improve the quality $X_t^1$ of a good by incurring a flow cost $c^1(A_t^1)$. The buyer decides the amount of the purchase, $X_t^2$.
His adjustment cost $c^2(A_t^2)$ captures the transaction costs of, or the frictions involved in loading, new goods, such as preparing the necessary facilities. The price per unit is exogenously fixed. The seller's terminal payoff $U^1$ is her revenue, and the buyer's terminal payoff $U^2$ is his utility from consumption. We make the following assumptions: (i) $\frac{\partial U^1}{\partial X^2} > 0$, that is, the seller's benefit is increasing in the amount of the buyer's purchase; (ii) $\frac{\partial U^1}{\partial X^1} = 0$, that is, the seller's revenue does not depend on the quality per se; (iii) $\frac{\partial^2 U^2}{\partial X^1\,\partial X^2} > 0$; and (iv) $c^1$ is minimized at 0.

When the quality of a good is not contractible, incentivizing the seller to make a costly investment is in general a difficult problem. The seller and buyer could negotiate the price of the good after observing the final quality, but this approach may not be helpful when the buyer gains bargaining power ex post. In this subsection, we examine an alternative scheme in which the seller and buyer agree on a fixed price ex ante, and the buyer gradually adjusts the scheduled amount of the purchase while observing the seller's quality investment thus far. In this setting, the seller has an incentive to choose a positive level of investment under the dynamic model, because doing so encourages the buyer to purchase the good at a later time.[23] This is in contrast to the static benchmark case, where the seller chooses the dominant strategy $A_0^1 = 0$.

To quantify the welfare gain, we focus on the following quadratic payoffs, which admit a closed-form solution:

$$U^1(X) = X^2, \qquad U^2(X) = X^1 X^2, \qquad c^i(A^i) = \frac{\bar c_i}{2}(A^i)^2$$

for some $\bar c_1, \bar c_2 > 0$, with $\mathcal{A}^i = \mathbb{R}$. In Proposition S1 in the Supplementary Appendix, we show that a linear equilibrium in a general quadratic game is obtained from a system of ordinary differential equations, which is independent of σ. Accordingly, we obtain the following result.

Proposition 3. Consider the hold-up problem with quadratic payoffs. Then there is a Nash equilibrium $A = (A^i)_{i=1,2}$ given by

$$A_t^1 = \underbrace{\frac{T-t}{\bar c_1 \bar c_2}}_{\text{cooperation term}}, \qquad A_t^2 = \underbrace{\frac{(T-t)^2}{2\bar c_1 (\bar c_2)^2}}_{\text{cooperation term}} + \frac{X_t^1}{\bar c_2}.$$

Moreover, the corresponding equilibrium payoffs are as follows:

$$W_0^1(A) = \frac{T^3}{3\bar c_1 (\bar c_2)^2}, \qquad W_0^2(A) = \frac{T^5}{8(\bar c_1)^2 (\bar c_2)^3} + \frac{T^2\sigma^2}{4\bar c_2}.$$

[23] The logic is similar to that of Bohren (2016), who studies an infinite-horizon model in which a long-lived seller faces a continuum of short-lived buyers at each moment in time. The seller has an incentive to make an investment, since the effect is persistent over time, which convinces prospective future buyers to make purchases. In our model, the buyer is not myopic, and his forward-looking incentive is captured by the cooperation term.
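These closed forms can be checked numerically by simulating the stated linear equilibrium; the following Monte Carlo sketch (with arbitrary parameter values) reproduces the payoff formulas up to discretization and sampling error.

```python
import numpy as np

# Monte Carlo check of the closed-form payoffs in Proposition 3, simulating
# A^1_t = (T-t)/(c1*c2) and A^2_t = (T-t)^2/(2*c1*c2^2) + X^1_t/c2.
rng = np.random.default_rng(2)
c1, c2, sigma, T = 1.0, 1.0, 1.0, 1.0
dt, n = 5e-3, 40_000
X1, X2 = np.zeros(n), np.zeros(n)
cost1, cost2 = np.zeros(n), np.zeros(n)
for k in range(int(T / dt)):
    t = k * dt
    A1 = (T - t) / (c1 * c2)                       # seller's equilibrium effort
    A2 = (T - t)**2 / (2 * c1 * c2**2) + X1 / c2   # buyer's equilibrium purchase rate
    cost1 += 0.5 * c1 * A1**2 * dt
    cost2 += 0.5 * c2 * A2**2 * dt
    X1 = X1 + A1 * dt + sigma * np.sqrt(dt) * rng.standard_normal(n)
    X2 = X2 + A2 * dt + sigma * np.sqrt(dt) * rng.standard_normal(n)
W1 = np.mean(X2 - cost1)        # U^1(X) = X^2
W2 = np.mean(X1 * X2 - cost2)   # U^2(X) = X^1 X^2
print(f"W1: {W1:.3f} vs closed form {T**3 / (3 * c1 * c2**2):.3f}")
print(f"W2: {W2:.3f} vs closed form "
      f"{T**5 / (8 * c1**2 * c2**3) + T**2 * sigma**2 / (4 * c2):.3f}")
```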
The basic reason for the welfare improvement is that the seller acquires commitment power from the gradually evolving state, which encourages the seller's investment. Also, anticipating the seller's positive investment incentive, the buyer has an additional incentive to increase the purchase amount. These effects are captured by the "cooperation terms" in the above formulas, and they disappear as time t approaches the deadline T.[24]

As an extreme form of a seller's commitment power, we can compute the Stackelberg equilibrium of the static benchmark game, where the seller chooses $A_0^1 = \frac{T}{\bar c_1 \bar c_2}$ and the buyer best-responds with $A_0^2 = \frac{T^2}{\bar c_1 (\bar c_2)^2}$. The seller's payoff is $\frac{T^3}{2\bar c_1 (\bar c_2)^2}$, which is always higher than her dynamic equilibrium payoff in Proposition 3. The buyer's payoff is $\frac{T^5}{2(\bar c_1)^2 (\bar c_2)^3}$, which can be lower than his dynamic equilibrium payoff if σ is high. This captures the value of action flexibility in a dynamic setting that is missing in the static case. The payoffs in the static benchmark case are not affected even if a noise term $Z_T$ affects the state variables.

[24] We note that the unbounded control ranges deviate from the original model formulation, and thus equilibrium uniqueness here is an open question.

5.4.3 Dynamic Contest

As a final example in this section, we apply our framework to a contest (Example 4), in which players compete for a fixed prize whose value does not depend on their effort.[25] We consider a two-player game where each player i = 1, 2 exerts costly effort $A_t^i$ at each time t to improve the quality $X_t^i$ of her project, given the initial value $X_0^i$. We normalize the payoff for the winner of the prize to 1. We suppose the winner is determined stochastically, with the following additive noise:

$$U^i(X_T) = \Pr\big[X_T^i + \theta^i > X_T^j + \theta^j\big],$$

where $\theta^i$ is a random element that affects the contest designer's judgment. Hence the terminal payoff for player i is increasing in the final quality difference $X_T^i - X_T^j$. By assuming that the distribution of $\theta := \theta^1 - \theta^2$ admits a bounded density, the terminal payoff function $U^i$ becomes Lipschitz continuous.

[25] See Konrad (2009) for an extensive survey of this field.

In contrast to the static benchmark with a one-shot effort choice, in this model players can adjust their effort gradually, depending on the current quality difference before the deadline. To examine this point, we numerically solved the equilibrium PDE (11) using the symmetric quadratic cost functions $c^i(a) = \frac{\bar c}{2}a^2$ with $\mathcal{A}^i = \mathbb{R}$ and terminal payoffs with logit noise, $U^i(X) = \frac{\exp[\gamma X^i]}{\exp[\gamma X^1] + \exp[\gamma X^2]}$, where γ > 0.[26] The numerical simulation results indicate that the leader (i.e., the player with the higher value of the state variable) generally exerts greater effort than the follower: the leader has an incentive to "discourage" the follower and induce her to exert less effort.

[26] We used an unbounded control range in the simulation for simplicity.

Relative to the static benchmark case, we observe that the expected total payoff of each player is higher under the dynamic setting (Figure 6). That is, the total cost of effort is lower in expectation. This is because the players tend to exert less effort under the dynamic model when the state difference becomes large (Figure 7).

Another observation from the simulations is that the equilibrium effort level $a^i(t, X)$ is monotone in t (Figure 7). In particular, it is increasing in t when competition is fierce, that is, when $X^1 \approx X^2$. To understand this, note that the benefit of exerting effort in such a situation is greater when t is closer to the deadline, because the magnitude of the remaining Brownian noise is smaller. On the other hand, consider the case where $|X_t^1 - X_t^2|$ is large. Then the winner of the contest is essentially already determined if t is close to the deadline, hence the players tend to stop choosing high effort levels, and this makes $a^i$ decreasing in t.[27]

[27] The simulations suggest that $a^i$ tends to be increasing in t especially when γ is small. To interpret this, note that in more random contests (smaller γ), raising the level of effort has a greater marginal effect on the winning probability. Therefore, the players remain competitive even when the difference $|X^1 - X^2|$ is large.
In contrast to our model, there is no fixed deadline; one of the players wins, and the game ends if the difference between the values of the state variables reaches some pre-established value. Harris and Vickers (1987) focus on time-stationary strategies, while our model allows us to investigate the effects of the length of time remaining (before the deadline) on equilibrium behavior. 27 The simulations suggest that ai tends to be increasing in t, especially when γ is small. To interpret this, note that in more random contests (smaller γ), raising the level of effort has a greater marginal effect on the winning probability. Therefore, the players become competitive even when the difference |X 1 − X 2 | is large. 28 Budd, Harris, and Vickers (1993) analyze a more general model. Cao (2014) also studies a model with asymmetric players and the effect of a handicapping policy. Yildirim (2005) analyzes a discrete-time contest model with multiple periods. In contrast to our results, there are multiple equilibria when the players are asymmetric, and the total effort is typically at least as great as that in the one-shot model. Seel and Strack (2015) consider a continuous-time model in which players solve the optimal stopping problem with a private observation of the own state variables. 23 w1(0,X1,X2)+w2(0,X1,X2) 1 0.99 0.98 0.97 0.96 0.95 Dynamic Static 0.94 0.93 −2 −1 0 1 2 X1−X2 Figure 6: Expected total payoff of the players under different initial state differences X01 − X02 . The total payoff under the dynamic model (solid line) exceeds that under the static model (dashed line) for each initial state. The other parameters are σ = 1 and γ = 1. a1(t,0.5,−0.5) a1(t,1,−1) 0.2 0.13 Dynamic Static 0.195 0.125 0.19 0.12 0.185 0.115 0.18 0.11 0.175 0.105 0.17 0.165 0 0.25 0.5 t 0.75 0.1 1 0 0.25 0.5 t 0.75 1 Figure 7: Time dynamics of player 1’s optimal control policy in the dynamic contest game when player 1 is the leader. The solid lines are equilibrium strategies under the dynamic model: The effort level a1 (t, X) is increasing in t when Xt1 and Xt2 are close (left panel, with Xt1 −Xt2 = 1) but decreasing in t otherwise (right panel, with Xt1 − Xt2 = 2). The dashed lines are the equilibrium effort levels under the static benchmark for different initial states. The other parameters are σ = 1 and γ = 1. 24 6 6.1 Extensions Generalizations of the Main Model While we have specialized to a relatively simple setup so far, Theorem 1, as well as Proposition 1 in Section 4, hold more generally. In this section, we briefly discuss some of generalizations.29 State-dependent adjustment cost: ci can depend both on Ait and Xt as long as ci (At , Xt ) is Lipschitz continuous in Xt and the second derivative in Ait is bounded away from 0. This generalization allows us to interpret ci as a general flow payoff rather than cost of adjusting X i . For example, firms in dynamic oligopoly may earn flow profits as a function of interim capital levels (Example 3). Discontinuous payoffs: The terminal payoff U i can be discontinuous at countably many points as long as the noise density is sufficiently flat. See Section B.1 in the supplementary appendix for the details. This allows for so-called threshold technologies in the context of public good provision (Example 1). Also, in Example 2, the central bank may abandon the present currency regime if and only if the aggregate short position exceeds a known threshold, which leads to discontinuous terminal payoffs. 
Aggregate state: While each player is endowed with his own state variable in the main model, we can relax this restriction by introducing (possibly several) aggregate state variables whose drifts are determined as weighted sums of players’ actions. For example, the total output produced by team members can be interpreted as a single aggregate state variable in a team project (Example 1). This is also useful for analyzing anonymous markets where players do not necessarily observe individual transaction levels. Imperfect monitoring: While we have assumed that players can perfectly observe other players past actions, this specification is irrelevant for the results because the unique equilibrium in Theorem 1 is Markovian with respect to time and the current state vector. As long as players can observe the state variables, there is a unique perfect Bayesian equilibrium. Random deadline: The deadline can be random as long as it is bounded. Moreover, it is possible to specify that the game ends earlier than the deadline when a state variable hits a certain threshold. This generalization is related to Georgiadis (2015), who analyzes a teamproject model without a deadline in continuous time. Non-i.i.d. noise: Theorem 1 holds even if noise is correlated across players and/or peri- ods as long as the noise density φit+∆ (·|()0≤s≤t ) conditional on past realizations satisfies the assumptions in Section 2. This generalization is important when the noise contains common 29 The uniqueness result in Section 5 can be proved under these generalizations at least except for discontinuous payoffs and non-iid noise. 25 value components and/or time persistence as in a macroeconomic or industry-wide shock.30 6.2 Discrete-Jump Model So far we have focused on the environment in which the state variables and noise terms are continuous. Such a formalization does not adequately represent some applications where players’ states are naturally represented by discrete variables, for example when players choose whether or not to adopt a new technology. In this section, we consider a variant of the main model by incorporating state variables that follow discrete jump processes, and prove equilibrium uniqueness in this setup. The finding here serves to highlight the robustness of the general point made in this paper and broaden the scope of economic applications as we will illustrate below. As in the main model in Section 2, we consider a finite-horizon discrete-time game with N players. Player i has a finite set of states denoted by X i ⊆ R. For example, the state could be either “maintaining an old technology” or “adopting a new technology.” In each state, player i i has action set Ai = [0, 1]X and controls the transition intensity from the current state to another with costly effort Ait ∈ Ai . Here Ait (xi ) represents the intensity of jumping into state i xi . More specifically, the transition probability from Xti to Xt+∆ os given by i Xt+∆ x i = 6 Xti = X i t with probability ∆Ait (xi )µi (xi , Xti ), otherwise, where µi (x, x0 ) ≥ 0 for each x, x0 ∈ X i . This formulation allows for absorbing states when µi (x, x0 ) = 0 for x0 6= x. For example, Ait can be interpreted as player i’s effort to adopt a new technology. As in the main model, player i’s payoff is given as U i (XT ) − X ∆ci (Ait ). t This model can be also interpreted as embedding a finite-strategy static game into a dynamic setting by considering X i as the set of strategies in the static game. 
That is, players have stochastic chances to change their actions until they play a one-shot game (See Calcagno, Kamada, Lovo, and Sugaya (2014)). 30 In the proof of Proposition 1, we rely on the fact that the total noise size r∆ 2 r∆ i t t P is small with a high probability. While this is ensured by small ∆ under the i.i.d assumption, we can achieve this even with correlated noise under an appropriate requirement, for example by invoking the law of large numbers for dependent variables. Even if it fails, we can redefine the approximate efficiency in Proposition 1 with the expected first-best payoff rather than the best terminal payoff. 26 Example 5 (Technology adoption). Each agent can acquire a new technology subject to switching cost. The state Xti takes binary values, and Xti = 1 indicates that the agent i has already adopted the new technology, and Xti = 0 otherwise. Assume that µ(1, 0) > 0 but µ(0, 1) = 0, i.e., the technology adoption is an irreversible action. The cost ci (Ait ) is increasing in Ait (1), so that it is costly to acquire the technology at a higher rate. Adopting new technologies typically involves complementarities, for example when U i supermodular, which causes equilibrium multiplicity. Example 6 (Market-platform choice). There are multiple marketplaces M at which transactions take place. Traders select a market platform before the deadline, where ci (Ai ) is seen as a (possibly small) switching cost of changing the market platform, and Xti ∈ M represents the marketplace player i is planning to enter at t. The payoff of entering each market m ∈ M is increasing in the size of the participants, that is, for X i = m, U i (X i , X −i ) ≥ U i (X i , X̃ −i ) holds whenever {j ∈ N : X j = m} ⊇ {j ∈ N : X̃ j = m}. As emphasized in the literature (e.g., Pagano (1989), Ellison and Fudenberg (2003)), such complementarity in market participation can arise from various channels, and there are typically several equilibria in this type of game under the one-shot formulation. We maintain the same assumption on the adjustment cost ci as in the assumptions in Section 2, while do not impose any restriction on the terminal payoff U i . As in the main model, we cannot rule out equilibrium multiplicity for the one-shot case of ∆ = T , or more generally when ∆ is not small (for example when terminal payoffs admit complementarities, as in the above examples). However, for small enough ∆, we again achieve unique equilibrium as shown by the following: Theorem 3. There is a unique subgame-perfect equilibrium in the discrete jump model if ∆ is small enough. The strategies are Markovian, i.e, functions of (t, Xt ). The intuition for equilibrium uniqueness is analogous to the main model. In the main model, players adjust their positions every period by small amounts. In the current model, their positions can jump but this happens only infrequently. Thus, in either case, the impact of players’ actions per period is small in expectation. Also, at a more formal level, recall that players’ actions in the main model can be interpreted as changing probability measure of state variable paths, just as players’ actions influence the states in the current model. In a team production game, an approximate efficiency result similar to Proposition 1 holds for the discrete-jump model as well. The intuition and the proof idea are analogous and thus omitted. 
This result is in line with Calcagno, Kamada, Lovo, and Sugaya (2014), which selects Pareto efficient outcomes in coordination games (See also the related literature section). 27 7 Conclusion This paper examines the strategic role of players’ ability to gradually adjust their positions over time. The equilibrium is unique if players’ actions are flexible enough in the presence of noise, which can be arbitrarily small in the limit. The highlight is that we do not require any substantial restrictions of terminal payoffs, and adjustment of their positions can be either continuous or discrete. For certain situations, gradualness leads to higher efficiency relative to the one-shot setting. We also considered a continuous-time version of the model (based on Brownian noise) that is more tractable for applications, in particular because the equilibrium is characterized by PDEs (subject to smoothness requirement) that facilitate applications to specific problems. Let us emphasize a few implications of our findings. First, our equilibrium uniqueness results suggest that coordination problems in one-shot games might simply be an artifact of abstracting away from the players’ abilities to make gradual adjustment. We think such adjustment is plausible in many instances of interest. Second, our framework is flexible so our uniqueness results hold for a broad class of applications. While many papers in the finitehorizon-game literature analyze specific economic problems that feature interesting deadline effects, our results provide a general modeling approach that conveniently yields equilibrium uniqueness for general terminal payoffs with frequent actions. We also note that equilibrium uniqueness is in general important for parameter estimation in empirical analysis.31 Third, as explained in Section 1.1, our method of proof for the uniqueness result in Section 5 is useful in continuous-time games with Brownian noise beyond the specific model of this paper. While the theory of BSDEs has been fruitfully applied in finance and stochastic control and optimization, it seems underutilized in the literature of dynamic games. While we have focused on games with fixed payoff functions, the problem of optimal incentive design is a natural next step, and our approach provides a useful guide for a designer to achieve unique (rather than partial) implementation in multi-agent settings. Also, we have focused on complete information games. In the presence of incomplete information, the evolution of the states can convey some information about players’ information or unobservable economic fundamentals, and such an extension of the model will allow one to study incentives for signaling and/or experimentation.32 31 Relatedly Caruana and Einav (2008a) use data of time-varying production target levels from U.S. auto manufacturers. 32 A recent paper by Bonatti, Cisternas, and Toikka (2016) considers a continuos-time model of dynamic oligopoly with a deadline and private information. 28 Appendix A Details of Example 3.1 We argue that in the deterministic case the strategy profile in which player i chooses 1 if P X j ≥ (T − t)θ t j i At = P j 0 if j Xt < (T − t)θ is a subgame perfect equilibrium for N ≥ 3 and for any value θ ∈ (−N + 1, −1). To see this, we fix any period t ∈ {0, ∆, .., T − ∆} and current state Xt and confirm that no player has an incentive to deviate from the prescribed strategy: • Case 1: Xt ≥ (T − t)θ. If there is no deviation, the terminal state is XT = Xt +(T −t)N , which is strictly positive. 
After an arbitrary deviation on the part of player i after time t, the state variable Xt0 at any t0 > t is strictly greater than (T − t0 )θ because of θ < −N + 1. Thus, the terminal state after any deviation is still positive. Given this, the optimization problem reduces to max ∆ i A TX −∆ (Ais − c(Ais )), s=t and thus it is strictly optimal to choose Ait = 1 at any period t0 ≥ t. • Case 2: Xt < (T − t)θ. If there is no deviation, the terminal state is XT = Xt −(T −t)N , which is strictly negative. By the same argument as in Case 1, under an arbitrary deviation on the part of player i after time t, the terminal state is strictly negative. Given this, the optimization problem reduces to max ∆ i A TX −∆ (−c(Ais )), s=t and thus it is strictly optimal to choose Ait = 0 at any time t0 ≥ t. Finally, consider the case r∆ = κ∆ with some constant κ. If κ is small, then for any ∆, the above strategy profile remains to be a subgame-perfect equilibrium. This is because, the players’ incentive problems under deviations take the same forms as the above ones. 29 B Proofs of Results from Main Model B.1 Proof of Theorem 1 We begin with a couple of preliminary lemmas that are used in the proof. Lemma 2. Let w : R → R be an integrable function. Then the function w̃ : R → R defined R by w̃(y) := w(y + r∆ i )φi (i )di is continuously differentiable with its derivative given by R w̃0 (y) = − r1∆ w(y + r∆ i )φ0i (i )di Proof. Z d d w̃(y) = w(y + r∆ i )φi (i )di dy dy i Z ˜ − y di i d d˜ = w(˜)φi dy r∆ d˜i i Z 1 ˜ − y di i 0 = − w(˜)φi d˜ r∆ r∆ d˜i Z 1 = − w(y + r∆ i )φ0i (i )di r∆ where we used the change of variables by ˜i := y + r∆ i . The following lemma is a minor modification of the standard result that ensures continuous parametric dependence of fixed points under contraction maps. Lemma 3. Let Y, Z be metric spaces such that Y is complete, and g : Y × Z → Y . Suppose that (i) g(·, z) a contraction map over Y for each z with its module λ ∈ (0, 1) independent of z, (ii) g(y, z) is Lipschitz continuous in z with its module κ independent of y. Then there is a unique fixed point y ∗ (z) for each z, and the map z 7→ y ∗ (z) is Lipschitz continuous. Proof. Let y ∗ (z) be the unique fixed point of each g(·, z). Then for any z, z 0 ∈ Z, ky ∗ (z) − y ∗ (z 0 )k = kg(y ∗ (z), y ∗ (z 0 )) − g(y ∗ (z 0 ), z 0 )k ≤ kg(y ∗ (z), z) − g(y ∗ (z), z 0 )k + kg(y ∗ (z), z 0 ) − g(y ∗ (z), z)k ≤ ky ∗ (z) − y ∗ (z 0 )kλ + kz − z 0 kκ, κ which ensures ky ∗ (z) − y ∗ (z 0 )k ≤ kz − z 0 k 1−λ . Proof of Theorem 1. To simplify the notation, we only consider the two-player case. The general case can be proved analogously. Take any history ht at t = T − ∆ and let Xt denote the 30 state at t. After this history, each player i’s equilibrium action Ait (ht ) is to maximize the payoff Q function Vti (·; Xt ) : j=1,2 Ai → R defined by Vti (At ; Xt ) Z : = Y U i (Xt + ∆At + r∆ εt ) φj (jt )dεt − ∆ci (Ait ) j=1,2 Z Z = U i (XT )φi (it )dεit φj (jt )djt − ∆ci (Ait ) over Ai given the other player’s equilibrium action Ajt , where XT = Xt + ∆At + r∆ t (we will use this simplified notation several times in this proof). By Lemma 2 below, the first term is continuously differentiable in Ait . Note that we can write i −i i Z U (X) = U (0, X ) + 0 Xi ∂ U i (xi , X −i )dxi ∂X i because U i is absolutely continuous, where for each X −i , the partial derivative of U i (X i , X −i ) in X i is defined almost everywhere. Thus the derivative of Vti is calculated by ∂ i V (At ; Xt ) = lim δ→0 ∂Ait t Z Z 1 δ Z XTi +∆δ XTi ! U i (xi , XTj )dxi ! 
φi (it )dεit φj (jt )djt − ∆(ci )0 (Ait ) Z Z i i i 1 i i j i = lim U (XT + ∆δ, XT ) − U (XT ) φ (t )dεt φj (jt )djt − ∆(ci )0 (Ait ) δ→0 δ Z Z ∂ i i i i U (XT )φ (t )dεt φj (jt )djt − ∆(ci )0 (Ait ) = ∆ i ∂X where we used the Lebesgue convergence theorem in the third line since φi and the partial derivatives of U i are bounded. Applying Lemma 2 again, the second derivatives are derived as Z Z ∆ ∂ ∂2 i i i i i U (XT )φ (t )dεt (φj )0 (jt )djt , j Vt (At ; Xt ) = −∆ i i r ∂X ∂At ∂At ∆ Z Z 2 ∂ ∆ ∂ i i i 0 i i V (At ; Xt ) = −∆ U (XT )(φ ) (t )dεt φj (jt )djt − ∆(ci )00 (Ait ). i 2 t i ∂(At ) r∆ ∂X 31 Since U i is absolutely continuous in X i , we can apply the integration by parts to obtain Z ∂ j i i i i i i i U (X + ∆A + r ε , X )φ ( )d ∆ t t t t t T ∂X i Z i i j j i i i i ∞ i i i i i 0 i i = U (Xt + ∆At + r∆ ε , XT )φ (t ) −∞ − U (Xt + ∆At + r∆ ε , XT )(φ ) (t )dt Z i i i i 0 i i i ≤ sup |U (x) ilim φ ( ) − U (x ) i lim φ ( )| + sup |U (x)| |(φi )0 (it )|dit →∞ →−∞ x,x0 ∈R2 x∈R2 Z = sup |U i (x)| |(φi )0 (it )|dit . x∈R2 Thus ∂2 ∆2 i V (A ; X ) ∂Ai ∂Aj t t t ≤ r∆ C, t t for all i ∈ N , Xt ∈ Rn , and At ∈ Q i∈N Ai where j j j Z C := max sup |U (x) − T c (A )| j=1,2 x∈R2 ,Aj 1 0 |(φ ) (1t )|d1t Z |(φ2 )0 (2t )|d2t < ∞ by assumption. There is M > 0 such that if lim∆→0 r∆ ∆ ¯ > 0 such that if ∆ < ∆, ¯ ≥ M , there is ∆ 2 ∂ V i (A ; X ) ∆2 ∂Ait ∂Ajt t t t C 1 r∆ 2 ≤ ∂ V i (A ; X ) ∆c̄ − ∆2 C < 2 , (∂Ait )2 t t t r∆ (12) where c̄ := minAj , j=1,2 (cj )00 (Aj ) > 0. ¯ each player i’s objective function is strictly concave in Ai . Under Therefore, for ∆ < ∆, t this concave payoff function, player i’s best response against the other’s action at this subgame is always in a pure strategy and given by the FOC ∂ V i (At ; Xt ) ∂Ait t = 0 in the case of the interior solution, and otherwise i A = max Ai if ∂ V i (max Ai , Ajt ; Xt ) ∂Ait t >0 min Ai if ∂ V i (max Ai , Ajt ; Xt ) ∂Ait t <0 . By the implicit function theorem, the slope of player i’s best response can be bounded by the inequality (12) in absolute. Note that a pure strategy equilibrium in this subgame is a fixed Q point of the map over i∈N Ai defined by the best-response functions above. Since the map is a contraction, the equilibrium action profile after the history ht should be unique. Since the 32 payoff function is strictly concave, there is no mixed strategy equilibrium in this subgame as ¯ depends only on the properties of (U i , ci , φi )i∈N , we can conclude well.33 Furthermore, since ∆ that after any period T − ∆ histories, there is a unique equilibrium action profile that is played ¯ if ∆ < ∆. ¯ and t = T − ∆, denote by W i (ht ) := E[U i (Xt + ∆A(ht ) + r∆ t )|ht ] − ∆ci (Ai (ht )) For ∆ < ∆ the continuation value after any period t history ht , where Ai (ht ) is the unique equilibrium action after history ht . Since each Ai (ht ) depends only only the current state Xt by the above construction, we can write the continuation values and equilibrium actions as functions of Xt instead, i.e., W i (Xt ) and Ai (Xt ). Note that for any Xt that is induced by some history ht , (A1 (Xt ), A2 (Xt )) is a unique fixed ¯ the module of the contraction is point of a contraction mapping. By the construction of ∆, taken to be uniformly bounded away from 1 in Xt . By Lemma 3 below, Ai (Xt ) is Lipschitz continuous in Xti . This also implies that W i (Xt ) is Lipschitz continuous in Xt . Consider period t = T − 2∆. 
Taking any period t history ht and denoting by Xt the current state, Vti (At ; Xt ) Z := W i (Xt + ∆At + r∆ εt ) Y φj (jt )dεt − ∆ci (Ait ) j=1,2 Since we know that the function W i is Lipschitz continuous and bounded in Xt , we can repeat the same argument as above to calculate the second-order derivatives of the payoff functions. Since |W i (X)| ≤ maxX,Aj |U j (X)| + T |ci (Aj )|, we can use the same constants C used above to bound the derivative terms. Following the same argument, we can conclude that there is a ¯ where ∆ ¯ is unique action profile (in pure strategies) that is played after the history if ∆ < ∆ taken to be the same as the above. We can repeat the same procedure for earlier periods t < T − 2∆ repeatedly, which confirms that the model admits a unique subgame equilibrium, which is in pure strategies. Also, by construction, the strategies are Markovian (i.e, functions of (t, Xt )). B.2 Proof of Proposition 1 Let c̄ := supA∈A |c(A)|. We consider a hypothetical single agent model where a single agent P P chooses Âit ∈ A for each i and t. The final payoff is given as U ( i XTi ) − ∆ i,t (c(Ait ) + c̄). 33 The inequality (12) also ensures the total concavity condition of Rosen (1965), i.e., ∂2 i V (A ; X ) + V i (At ; Xt ) < 0. t t t i) t i ∂Aj ∂(A ∂A t t t i=1,2 X ∂2 Thus, the equilibrium is also a unique correlated equilibrium (Ui (2008)). 33 Note that this problem has a symmetric optimal control policy that is Markovian, denoted by Â∆ (t, Xt ), and the continuation value, denoted by Ŵ∆ (t, Xt ). Lemma 4. In the team production model, there is a symmetric subgame-perfect equilibrium strategy that is Markovian, denoted by A∆ (t, Xt ), and the corresponding continuation value, denoted by W∆ (t, Xt ), such that W∆ (t, Xt ) ≥ Ŵ∆ (t, Xt ) for any t and Xt . Proof. We prove the lemma by backward induction. 1. Take any history in the team production model at t = T − ∆, and let Xt denote the corresponding state. In the hypothetical single agent model at (t, Xt ), the symmetric optimal control policy Â∆ (t, Xt ) solves " # max E U (Xt + N ∆At + At ∈A X r∆ it ) − N ∆c(At ). i Thus, it also solves " # max E U (Xt + (N − 1)∆Â∆ (t, Xt ) + ∆At + At ∈A X r∆ it ) − ∆c(At ) i because the cost terms are linearly separable among players. Therefore, A∆ (t, Xt ) := Â∆ (t, Xt ) satisfies the incentive compatibility of players in the team production model at this subgame. Associated with this action profile at this subgame, we observe W∆ (t, Xt ) = Ŵ∆ (t, Xt ) + (N − 1)∆(c(A∆ (t, Xt )) + c̄) ≥ Ŵ∆ (t, Xt ), which used c(A∆ (t, Xt )) + c̄ ≥ 0. 2. Next, at t = T − 2∆, take any history in the team production model, and let Xt denote the corresponding state. We consider a modified single problem in which the single agent chooses At at t = T − 2∆ given W∆ (t, Xt ) at t = T − ∆, instead of Ŵ∆ (t, Xt ). In other words, the continuation payoff of the single agent at the last period is replaced with that 34 of the team production model. We denote the optimal control policy by Ã∆ (t, Xt ) and its associated continuation payoff by W̃∆ (t, Xt ). We first observe that A∆ (t, Xt ) := Ã∆ (t, Xt ) satisfies the incentive compatibility condition at t = T − 2∆ because the two problems share the same continuation payoff at t = T − ∆. Denote the associated (symmetric) continuation payoff by W∆ (t, Xt ). Then we can claim the following: W∆ (t, Xt ) ≥ W̃∆ (t, Xt ) ≥ Ŵ∆ (t, Xt ). The first inequality can be proved in the same manner as in the previous section. 
We just need to replace U with W∆ (T − ∆, ·) because the two models share the same continuation payoff at the last period. For the next inequality, we rely on the previous observation that W∆ (T − ∆, XT −∆ ) ≥ Ŵ∆ (T − ∆, XT −∆ ). The single agent could always take the same action in the original and modified problems. Therefore, the higher continuation payoff at each value of the state variable necessarily implies W̃∆ (t, Xt ) ≥ Ŵ∆ (t, Xt ). 3. For t = T − 3∆, · · · , 0, we can continue this argument until the start of the game. Lemma 5. For any > 0, there exists δ > 0 such that if max{sup |c|, 2 r∆ } ∆ ≤ δ, then ! Ŵ∆ (0, X0 ) ≥ X U Q max X∈ i∈N (X0i +T A) Xi − . i Proof. Take ! XT? ∈ arg max X∈ Q U X i i∈N (X0 +T A) Xi , i and consider a control policy Ai,? t = (XTi,? − X0i )∆ . T This control policy provides a lower bound of Ŵ∆ (t, Xt ) as " Ŵ∆ (0, X0 ) ≥ E U !# X X i,? + i X i,t 35 r∆ it − 2N T c̄. ∆ (13) Since U is bounded and Lipschitz continuous, " E U !# X X i,? + i as 2 r∆ ∆ X r∆ it ! −→ U i,t X X i,? i goes to zero and the Chebyshev applies. In addition, the last term in (13) vanishes when cost goes to zero, which ends the proof. Proposition 1 is a consequence of Lemmas 4 and 5. B.3 Proof of Corollary 1 Proof. Corollary 1 is a consequence of Theorem 1 and Proposition 1. C Proofs of Results from Continuous-time Limit Model with Brownian Noise C.1 Proof of Lemma 1 We note that the model is constructed from a canonical probability space (Ω, F, P) where there is no drift in X and changes in the probability measure depend on A = (Ai )i∈N . We denote this new probability space by (Ω, F, PA ; F X ), which has been augmented with the filtration generated by X, and the the corresponding expectation operator by EA . This construction is standard and (often implicitly) used by most continuous-time models that feature imperfect public monitoring.34 We begin with the following preliminary lemma (While the result is standard, we include its proof for its completeness). Lemma 6. Fix a strategy profile A = (Ai )i∈N . Then W0i (A) ≥ W0i (Âi , (Aj )j6=i ) for any strategy Âi of player i if and only if Ait = f i (βt (A)) for almost all t ∈ [0, T ] almost surely, where β(A) is defined as in equation (7). Proof. Fix any strategy profile A = (Ai )i∈N . As discussed in Section 5.2, player i’s continuation 34 Chapter 5 of Cvitanic and Zhang (2012) discusses the technical details more extensively in the context of dynamic contracts. 36 payoff under A takes the form Wti (A) A Z i T i (Ais )ds|FtX c U (XT ) − t Z T Z T i i i c (As )ds − βsi (A) · dZs (A) = U (XT ) − t t Z Z T −1 i i i i c (As ) − βs (A) · Σ (µ(Xs ) + As ) ds − = U (XT ) − = E t T βsi (A) · Σ−1 dXs , t with WTi (A) = U i (XT ). The second equality follows by applying the extended martingale representation theorem to the conditional expectation of the total payoff (see for example Chapter 5 and Lemma 10.4.6 in Cvitanic and Zhang (2012)), and the third equality uses dXt = (µ(Xt ) + At ) dt + ΣdZt . The pair (W i (A), β i (A)) is a solution to the one-dimensional BSDE (under the original probability space) with driver −ci (Ait )+βti (A)·Σ−1 (µ(Xt ) + At ) and terminal condition UTi (XT ). Note that we treat (Aj )j6=i as exogenous processes in this BSDE formulation. Take another strategy profile à = (Ãj )j∈N in which player i uses strategy Ãit = f i (βt (A)) and where Ãj = Aj for every j 6= i. 
Using the same argument as above, player i’s continuation value takes the form Wti (Ã) i Z = U (XT ) − T c i (Ãis ) − βsi (Ã) −1 ·Σ t µ(Xs ) + Ãs Z ds − T βsi (Ã) · Σ−1 dXs t with WTi (Ã) = U i (XT ). Likewise, (W i (Ã), β i (Ã)) is a solution to the one-dimensional BSDE with driver −ci (Ãit ) + βti · Σ−1 (µ(Xt ) + Ãt ) and terminal condition U i (XT ). Note that the driver term satisfies the quadratic growth condition, because f i is bounded and Lipschitz continuous (Lemma S4 in Supplementary Appendix). By the comparison theorem (Lemma S1), W0i (Ã) ≥ W0i (A) holds. This proves the “if” direction. To prove the “only if” direction, consider the case in which Ait 6= f i (βt ) for some strictly positive measure dt⊗dP. Then by the strict inequality part of the comparison theorem (Lemma S1), W0i (Ã) > W0i (A). Proof of Lemma 1. To show the “if” direction, take any strategy profile A = (Ai )i∈N under which conditions (5), (7), (8), and (9) hold for some F X -progressively measurable processes RT (W, β) such that E sup0≤t≤T |Wt |2 , E[ 0 |βt |2 dt] < ∞. As we have seen in the proof of Lemma 6, for player i, the pair of processes (W i (A), β i (A)) is a unique solution to one-dimensional BSDE with driver −ci (Ait ) + βti · Σ−1 (µ(Xt ) + At ) and terminal condition UTi (XT ). Thus 37 Wti = Wti (A) and βti = βti (A) (for almost all t almost surely). By Lemma 6 and the incentive compatibility condition (9), each player maximizes her expected total payoff. Thus A is a Nash equilibrium. To show the “only if” direction, take any Nash equilibrium A = (Ai )i∈N . Then the incentive compatibility condition (9) holds by Lemma 6. The remaining conditions follow immediately by setting Wti = Wti (A) and βti = βti (A). C.2 Proof of Theorem 2 Proof. Take any solution (W, β) of BSDE (10) if one exists. This is a solution to an FBSDE of the form35 dWti = ci (f i (βt ))dt + βti · dZt , WTi = U i (XT ), i ∈ N dXt = (µ(Xt ) + f (βt )) dt + ΣdZt , (14) under the probability space (Ω, G, Q) in which G = F X and the Zt -process is a d-dimensional Brownian motion. The probability measure Q is constructed from the original canonical probability space (Ω, F X , P) by the Girsanov theorem, which ensures F Z ⊆ F X . We apply Lemma S3 in Supplementary Appendix to show that a unique solution (X, W, β) to FBSDE (14) exists. To check condition 1 in Lemma S3, first note that the drift term in the expression for the Xt process, namely µ(Xt ) + f (βt ), is Lipschitz continuous in βt by Lemma S4 in Supplementary Appendix, which ensures the first inequality. The second inequality follows by the Cauchy-Schwartz inequality and the Lipschitz continuity of µ(Xt ) + f (βt ) in X (Lemma S4), and the boundedness of µ0 and Ai for each i ensures the third inequality. To check condition 2, note that the drift term in the expression for the Wti process, namely ci (f i (β)), is Lipschitz continuous in (X, β) by Lemma S4 and the assumptions on ci . This term is independent of W , which ensures the second inequality. The third inequality follows from the boundedness of ci for each i. Condition 3 is satisfied by the assumption on the terminal payoffs. Thus if there is another solution (W 0 , β 0 ) to BSDE (10), this induces another solution (X 0 , W 0 , β 0 ) to FBSDE (14) that coincides with the solution (X, W, β). Thus (Wt , βt ) = (Wt0 , βt0 ) for almost all t almost surely. By Theorem 2.1 of Cheridito and Nam (2015), the process (W, β) is a solution to BSDE (10). This shows existence of a unique Nash equilibrium by Lemma 1. 
Finally, by Lemma S3, the unique solution (X, W, β) to the FBSDE with G = F Z takes the 35 In the third line, f (βt ) denotes a column vector whose i’s component is f i (βt ). 38 form Wt = w(t, Xt ), βt = b(t, Xt ) for some function w : [0, T ] × Rn → Rn that is uniformly Lipschitz continuous in x ∈ Rn and some function b : [0, T ] × Rn → Rd×n . C.3 Proof of Proposition 2 Proof. By symmetry of the players, at the unique equilibrium A = (Ai )i∈N , every player chooses the same action, namely Ait = f (βt (A)), where f (β) := arg maxα∈A {σ −1 βα − κc(α)} at almost all t almost surely. Thus the equilibrium BSDE (10) reduces to a one-dimensional BSDE dWt = − N f (βt )σ −1 βt − κc(f (βt )) dt + σ −1 βt dXt , WT = U (XT ). (15) As a next step we observe that βt (A) ≥ 0 for almost all t almost surely if U is increasing. To see this, first note that the driver term in (15) satisfies the quadratic growth condition, and that the terminal payoff U (x) is bounded, Lipschitz continuous, and increasing. Therefore, the claim follows by Corollary 3.5 and Remark 3.8 in dos Reis and dos Reis (2013). For any β ≥ 0, the term maxα {Σ−1 βα − κc(α)} is decreasing in κ by the envelope theorem. Let f (β; κ) := arg maxα Σ−1 βα−κc(α). Thus, for each β ≥ 0, the driver term in the equilibrium BSDE (15), (N − 1)f (β; κ)Σ−1 β + max{Σ−1 βα − κc(α)}, α is decreasing in κ and increasing in N . By Lemma S1 (comparison theorem for one-dimensional quadratic BSDEs) in Supplementary Appendix , W0 (A(κ, N )) ≥ W0 (A(κ̃, N )) for any κ ≤ κ̃ and N , and W0 (A(κ, N )) ≤ W0 (A(κ, Ñ )) for any N ≤ Ñ and κ. C.4 Proof of Proposition 3 In Supplementary Appendix, we construct an equilibrium strategy profile for general quadratic P payoffs U i = k=1,2 (αik (xk )2 + βik xk ) + γi x1 x2 in Proposition S1. The equilibrium actions and continuation values take the form ai (t, x1 , x2 ) = c̄1i (2αii (t)xi +βii (t)+γi (t)xj ) and wi (t, x1 , x2 ) = P k 2 k + γi (t)xi xj + δi (t), where the coefficients are obtained by solving k=i,j αik (t)(x ) + βik (t)x the system of ODEs (20) given in that proof. For the current game, we can directly verify the 39 solution α11 (t) = α12 (t) = γ1 (t) = α22 (t) = 0, (T − t)3 T −t , β12 (t) = 1, δ1 (t) = , β11 (t) = c̄2 3c̄1 c̄22 T −t (T − t)2 (T − t)3 α21 (t) = , β22 (t) = , β21 (t) = , 2c̄2 2c̄1 c̄2 2c̄1 c̄22 σ 2 (T − t)2 3(T − t)5 , γ2 (t) = 1, δ2 (t) = 1 + 4c̄2 20c̄21 c̄32 where the boundary conditions (21) given in the proof of Proposition S1 take the form β12 (T ) = γ2 (T ) = 1, α11 (T ) = α12 (T ) = α21 (T ) = α22 (T ) = β11 (T ) = β21 (T ) = β22 (T ) = γ1 (T ) = δ1 (T ) = δ2 (T ) = 0. Therefore, the equilibrium strategies are a1 (t, x1 , x2 ) = T −t (T − t)2 x1 , a2 (t, x1 , x2 ) = + . c̄1 c̄2 2c̄1 (c̄2 )2 c̄2 For the continuation value functions, w1 (t, x1 , x2 ) = w2 (t, x1 , x2 ) = D D.1 T −t 1 (T − t)3 , x + x2 + c̄2 3c̄1 c̄22 T − t 1 2 (T − t)3 1 (T − t)2 2 σ12 (T − t)2 3(T − t)5 1 2 (x ) + x + x + x x + + . 2c̄2 2c̄1 c̄22 2c̄1 c̄2 4c̄2 20c̄21 c̄32 Proofs of Results from Discrete-Jump Model Proof of Theorem 3 To simplify the notation, we only consider the two-player case. The general case can be proved analogously. Take any history ht at t = T − ∆ and let Xt denote the state at t. 
After this history, each player i’s equilibrium action Ait (ht ) is to maximize the payoff function Vti (·; Xt ) : Q i j=1,2 A → R defined by Vti (At ; Xt ) : = X pi (xi )pj (xj )U i (x) − ∆ci (Ait ) x over Ai given the other player’s equilibrium action Ajt , where for each i = 1, 2 the transition probabilities pi (xi ) are specified by 40 pi (xi ) = 1 − ∆λ P xi 6=Xti Ait (xi )µi (xi , Xti ) for xi = Xti ∆λAi (xi )µ (x , X i ) t i i t i if x 6= . Xti The derivative of Vti is calculated by −∆c0 (Ai ) for xi = Xti ∂ t i i . V (At ; Xt ) = λ∆ P p (x )µ (xi , X i )(U i (x) − U i (X i , xj )) − ∆c0 (Ai )) if xi = ∂Ait (xi ) t i 6 X j i t t i t t xj j The cross derivatives are derived as, for xi 6= Xti , 0 for xj = Xtj ∂ i Vt (At ; Xt ) = . j λ2 ∆2 µ (xj , X j )µ (xi , X i )(U i (x) − U i (X i , xj )) if xj = ∂Ait (xi )∂Ajt (xj ) 6 Xt j i t t t 2 and ∂2 V i (At ; Xt ) ∂Ait (xi )∂Ajt (xj ) t = 0 for xi = Xti and xj . On the other hand, the second derivatives are calculated by ∂2 V i (At ; Xt ) = ∆c00i (Ait (xi )) i i 2 t (∂At (x )) for each xi . ¯ > 0 such that if ∆ < ∆, ¯ Therefore, there is ∆ ∂2 i V (A ; X ) j t t ∆2 C ∂Ait (xi )∂At (xj ) t 1 ≤ < ∂2 ∆c̄ |X i | (∂Ait (xi ))2 Vti (At ; Xt ) (16) for each i = 1, 2,, where c̄ := minAj ,j=1,2 c00j (Aj ) > 0 and C := λ2 max 0 j=1,2,x,x0 ,xj ,xj ,Aj ,Aj 0 0 (µj (xj , xj ))2 |U j (x) − U j (x0 )| + T |cj (Aj ) − cj (Aj )| > 0, 0 which is independent ∆ and players’ strategy profiles. ¯ for each i, Xti and Aj , V i (Ait , Aj ; Xt ) is strictly concave in Ait . Under this Thus, for ∆ < ∆, concave payoff function, player i’s best response against the other’s action at this subgame is always in a pure strategy and given by the FOC ∂ V i (At ; Xt ) ∂Ait (xi ) t = 0 in the case of the interior solution. By the implicit function theorem, the slope of player i’s best response can bounded by (16) in absolute. Note that a pure strategy equilibrium in this subgame is a fixed-point Q of the map over i∈N Ai defined by the best-response functions above. Since the map is a contraction, the equilibrium action profile after the history ht should be unique. Also note that since the payoff function is strictly concave, there is no mixed strategy equilibrium in this 41 subgame as well. ¯ depends only on the properties of (U i , µi )i∈N , we can conclude that Furthermore, since ∆ after any period T − ∆ histories, there is a unique equilibrium action profile that is played if ¯ ∆ < ∆. ¯ and t = T −∆, denote by W i (ht ) := E[U i (XT )|ht ]−∆ci (Ai (ht )) the continuation For ∆ < ∆ value after any period t history ht , where Ai (ht ) is the unique equilibrium action after history ht . Since each Ai (ht ) depends only only the current state Xt by the above construction, we can write the continuation values and equilibrium actions as functions of (t, Xt ) instead, i.e., Wti (Xt ) and Ait (Xt ). Consider period t = T − 2∆. Taking any period t history ht and denoting by Xt the current state, Vti0 (At ; Xt ) Z := W i (Xt + ∆At + r∆ εt ) Y φj (jt )dεt − ∆ci (Ait ) j=1,2 Since we know that the function W i is bounded in Xt , we can repeat the same argument as above to calculate the second-order derivatives of the payoff functions. Since |W i (X)| ≤ maxX,Aj |U j (X)| + T |ci (Aj )|, we can use the same constants C, C 0 used above to bound the derivative terms. Following the same argument, we can conclude that there is a unique action ¯ where ∆ ¯ is taken to be the profile (in pure strategies) that is played after the history if ∆ < ∆ same as the above. 
We can repeat the same procedure for earlier periods t < T − 2∆ repeatedly, which confirms that the model admits a unique subgame equilibrium, which is in pure strategies. Also, by construction, the strategies are Markovian (i.e, functions of (t, Xt )). References Ambrus, A., J. Burns, and Y. Ishii (2014): “Gradual bidding in eBay-like auctions,” mimeo. Ambrus, A., and S. E. Lu (2015): “A continuous-time model of multilateral bargaining,” American Economic Journal: Microeconomics, 7(1), 208–249. Athey, S., and A. Schmutzler (2001): “Investment and Market Dominance,” The RAND Journal of Economics, 32(1), 1–26. Bergin, J., and W. B. MacLeod (1993): “Continuous Time Repeated Games,” International Economic Review, 34(1), 21–37. 42 Bohren, A. (2016): “Using Persistence to Generate Incentives in a Dynamic Moral Hazard Problem,” mimeo. Bonatti, A., G. Cisternas, and J. Toikka (2016): “Dynamic Oligopoly with Incomplete Information,” Review of Economic Studies. Budd, C., C. Harris, and J. Vickers (1993): “A Model of the Evolution of Duopoly: Does the Asymmetry between Firms Tend to Increase or Decrease?,” Review of Economic Studies, 60(3), 543–573. Burdzy, K., D. Frankel, and A. Pauzner (2001): “Fast Equilibrium Selection by Rational Players Living in a Changing World,” Econometrica, 69(1), 163–189. Calcagno, R., Y. Kamada, S. Lovo, and T. Sugaya (2014): “Asynchronicity and Coordination in Common and Opposing Interest Games,” Theoretical Economics, 9, 409–434. Cao, D. (2014): “Racing Under Uncertainty: A Boundary Value Problem Approach,” Journal of Economic Theory, 151, 508–527. Carlsson, H., and E. van Damme (1993): “Global Games and Equilibrium Selection,” Econometrica, 61(5), 989–1018. Caruana, G., and L. Einav (2008a): “Production Targets,” RAND Journal of Economics, 39(4), 990–1017. (2008b): “A Theory of Endogenous Commitment,” Review of Economic Studies, 75(1), 99–116. Caruana, G., L. Einav, and D. Quint (2007): “Multilateral bargaining with concession costs,” Journal of Economic Theory, 132, 147 – 166. Chen, Z., R. Kulperger, and G. Wei (2005): “A Comonotonic Theorem for BSDEs,” Stochastic Processes and their Applications, 115(1), 41–54. Cheridito, P., and K. Nam (2015): “Multidimensional quadratic and subquadratic BSDEs with special structure,” Stochastics, 87(5), 871–884. Cvitanic, J., and J. Zhang (2012): Contract Theory in Continuous-Time Models. Springer Finance. Davis, M. H. A. D. M. (1979): Martingale methods in stochastic control pp. 85–117. Springer Berlin Heidelberg. Delarue, F. (2002): “On the existence and uniqueness of solutions to FBSDEs in a nondegenerate case,” Stochastic Processes and their Applications, 99(2), 209–286. 43 Delong, L. (2013): Backward Stochastic Differential Equations with Jumps and Their Actuarial and Financial Applications: BSDEs with Jumps. Springer. DeMarzo, P. M., and Y. Sannikov (2006): “Optimal Security Design and Dynamic Capital Structure in a Continuous-Time Agency Model,” Journal of Finance, 61(6), 2681–2724. Dockner, E. J., S. Jorgensen, N. V. Long, and G. Sorger (2001): Differential Games in Economics and Management Science. Cambridge University Press. dos Reis, G., and R. J. dos Reis (2013): “A note on comonotonicity and positivity of the control components of decoupled quadratic FBSDE,” Stochastics and Dynamics, 13(4), 1350005. Ellison, G., and D. Fudenberg (2003): “Knife-Edge or Plateau: When Do Market Models Tip?,” The Quarterly Journal of Economics, 118(4), 1249–1278. Faingold, E., and Y. 
Sannikov (2011): “Reputation in Continuous-Time Games,” Econometrica, 79(3), 773–876. Frankel, D., and A. Pauzner (2000): “Resolving Indeterminacy in Dynamic Settings: The Role of Shocks,” The Quarterly Journal of Economics, 115(1), 285–304. Fudenberg, D., and D. Levine (2007): “Continuous time limits of repeated games with imperfect public monitoring,” Review of Economic Dynamics, 10(2), 173–192. (2009): “Repeated Games with Frequent Signals,” Quarterly Journal of Economics, 124, 233–265. Fudenberg, D., and J. Tirole (1983): “Capital as a commitment: Strategic investment to deter mobility,” Journal of Economic Theory, 31(2), 227–250. Gensbittel, F., S. Lovo, J. Renault, and T. Tomala (2016): “Zero-sum Revision Games,” mimeo. Georgiadis, G. (2015): “Projects and Team Dynamics,” Review of Economic Studies, 82(1), 187–218. Hamadene, S., and J.-P. Lepeltier (1995): “Backward equations, stochastic control and zero-sum stochastic differential games,” Stochastics, 3(4), 221–231. Harris, C., and J. Vickers (1987): “Racing with Uncertainty,” Review of Economic Studies, 54(1), 1–21. Iijima, R., and A. Kasahara (2015a): “Gradual Adjustment and Equilibrium Uniqueness under Noisy Monitoring,” ISER Discussion Paper No. 965. 44 Kamada, Y., and M. Kandori (2015): “Revision Games Part I: Theory,” mimeo. Kobylanski, M. (2000): “Backward stochastic differential equations and partial differential equations with quadratic growth,” Annals of Probability, 28(2), 558–602. Konrad, K. A. (2009): Strategy and Dynamics in Contests. Oxford University Press. Lipman, B. L., and R. Wang (2000): “Switching Costs in Frequently Repeated Games,” Journal of Economic Theory, 93(2), 148–90. Lockwood, B. (1996): “Uniqueness of Markov-perfect equilibrium in infinite-time affinequadratic differential games,” Journal of Economic Dynamics and Control, 20(5), 751–765. Lovo, S., and T. Tomala (2015): “Markov Perfect Equilibria in Stochastic Revision Games,” mimeo. Morris, S., and H. Shin (2006): “Heterogeneity and Uniqueness in Interaction Games,” in The Economy as an Evolving Complex System III, ed. by L. Blume, and S. Durlauf, pp. 207–242. Oxford University Press. Morris, S., and H. S. Shin (1998): “Unique Equilibrium in a Model of Self-Fulfilling Currency Attacks,” The American Economic Review, 88(3), 587–597. Pagano, M. (1989): “Trading Volume and Asset Liquidity,” The Quarterly Journal of Economics, 104(2), 255–274. Papavassilopoulos, G. P., and J. B. Cruz (1979): “On the uniqueness of Nash strategies for a class of analytic differential games,” Journal of Optimization Theory and Applications, 27(2), 309–314. Pardoux, E., and S. Peng (1990): “Adapted solution of a backward stochastic differential equation,” Systems and Control Letters, 14(1), 55–61. Pham, H. (2010): Continuous-time Stochastic Control and Optimization with Financial Applications. Springer. Rosen, J. B. (1965): “Existence and Uniqueness of Equilibrium Points for Concave N-Person Games,” Econometrica, 33(3), 520–534. Sannikov, Y. (2007): “Games with Imperfectly Observable Actions in Continuous Time,” Econometrica, 75(5), 1285–1329. Sannikov, Y., and A. Skrzypacz (2007): “Impossibility of Collusion under Imperfect Monitoring with Flexible Production,” American Economic Review, 97(5), 1794–1823. Seel, C., and P. Strack (2015): “Continuous Time Contests with Private Information,” 45 Mathematics of Operations Research. Simon, L. K., and M. B. Stinchcombe (1989): “Extensive Form Games in Continuous Time: Pure Strategies,” Econometrica, 57(5), 1171–214. 
Staudigl, M., and J.-H. Steg (2014): “On Repeated Games with Imperfect Public Monitoring: From Discrete to Continuous Time,” Center for Mathematical Economics Working Papers, 525. Tsutsui, S., and K. Mino (1990): “Nonlinear Strategies in Dynamic Duopolistic Competition with Sticky Prices,” Journal of Economic Theory, 52, 136–161. Ui, T. (2008): “Correlated Equilibrium and Concave Games,” International Journal of Game Theory, 37(1), 1–13. Wernerfelt, B. (1987): “Uniqueness of Nash Equilibrium for Linear-Convex Stochastic Differential Game,” Journal of Optimization Theory and Applications, 53(1), 133–138. Yildirim, H. (2005): “Contests with multiple rounds,” Games and Economic Behavior, 51, 213–227. 46 Supplementary Appendix (For Online Publication) A Mathematical Foundations A.1 Lipschitz and Quadratic BSDEs This section introduces BSDEs and introduce relevant results.36 Fix any invertible d × d matrix Σ̃ throughout this subsection. Let X = {Xt }0≤t≤T be a standard d-dimensional Brownian motion on a filtered probability space (Ω, F, P), where F = {Ft }0≤t≤T is the augmented filtration generated by X, and T is a fixed finite horizon. We denote by S2 the set of Rn -valued progressively measurable processes W over [0, T ] such that E 2 sup |Wt | < ∞, 0≤t≤T and by H2 the set of Rn×d -valued progressively measurable processes β over [0, T ] such that Z E T |βt | dt < ∞. 2 0 We call g : Ω × [0, T ] × Rn × Rn×d → Rn a driver if the following conditions hold: - g(ω, t, w, b), written g(t, w, b) for simplicity, is progressively measurable for all w ∈ Rn and b ∈ Rn×d - g(t, 0, 0) ∈ S2 (0, T )n We call ξ a terminal condition if it is an Rn -valued random variable such that E[|ξ|2 ] < ∞. Definition S1. Let ξ be a terminal condition, and let g be a driver. A BSDE is a stochastic system of equations defined by37 Z T Z g(s, Ws , βs )ds − Wt = ξ + t T βs · Σ̃dXs , 0 ≤ t ≤ T. t A solution to a BSDE is a pair (W, β) ∈ S2 × H2 that satisfies the above system of equations. A BSDE is said to be one-dimensional if n = 1. 36 See, for example, Pham (2010). Note that the notation (W, β) used here corresponds to (Y, Z) in the standard notation. 37 In the standard notation, Σ̃ is usually an identity matrix. We chose to include this additional term, which does not change any of the results in this section, in order to make it easier to see their applicability to our model. 47 We often write the above BSDE in differential form as dWt = −g(t, Wt , βt )dt + βt · Σ̃dXt , WT = ξ. It is well-known that a BSDE has a unique solution (W, β) if g satisfies the uniform Lipschitz condition (Pardoux and Peng (1990)). However, since the equilibrium BSDE (10) does not satisfy the condition due to strategic feedback effects across players. Instead, its driver satisfies quadratic growth condition. More specifically, a driver g satisfies the quadratic growth condition if there is a constant C > 0 such that |g(t, w, b)| ≤ C(1 + |w| + |b|2 ), |g(t, w, b) − g(t, w̃, b̃)| ≤ C |w − w̃| + (|w| + |b| + |w̃| + |b̃|)|b − b̃| for all ω ∈ Ω, t ∈ [0, T ], w, w̃ ∈ Rn , and b, b̃ ∈ Rn×d . The following results for one-dimensional BSDEs are due to Kobylanski (2000). Lemma S 1 (Comparison Theorem). Consider a pair of one-dimensional BSDEs that are ˜ g̃) that satisfy the quadratic growth associated with drivers and terminal conditions (ξ, g) and (ξ, condition. Let (W, β) and (W̃ , β̃) be the corresponding solutions. If ξ ≥ ξ˜ almost surely and g(t, Wt , βt ) ≥ g̃(t, Wt , βt ) for almost all t almost surely, then Wt ≥ W̃t for all t almost surely. 
If, in addition, g(t, Wt , βt ) > g̃(t, Wt , βt ) holds for a strictly positive measure under dt ⊗ dP, then W0 > W̃0 . Lemma S2 (One-dimensional Quadratic BSDE). Consider a one-dimensional BSDE in which ξ is a terminal condition and g is a driver that satisfies the quadratic growth condition. Then there exists a unique solution (W, β) of the BSDE. A.2 Quadratic FBSDE Fix a probability space (Ω, G, Q) in which {Gt }0≤t≤T satisfies the usual conditions and (Zt )0≤t≤T is a standard d-dimensional Gt -Brownian motion. Note that Gt can be larger than the filtration generated by the Brownian motion Zt . A (Markovian) forward backward stochastic differential equation (FBSDE) is a stochastic system of equations that takes the form dXt = F (t, Xt , Wt , βt )dt + Σ̃dZt , X0 = x0 dWt = G(t, Xt , Wt , βt )dt + βt · dZt , WT = U (XT ) 48 (17) for some F : [0, T ] × Rn × Rn × Rn×d → Rn , G : [0, T ] × Rn × Rn × Rn×d → Rn , U : Rn → Rn , x0 ∈ Rn , and invertible Σ̃ ∈ Rd×d . A solution (X, W, β) to an FBSDE is an Rn ×Rn ×Rn×d -valued {Gt }-progressively measurable process that satisfies (17) and Z T 2 2 2 |Xt | + |Wt | + |βt | E dt < ∞. 0 The following result is due to Delarue (2002).38 Lemma S3 (Non-degenerate FBSDE). Consider the FBSDE (17). Assume that there exists a constant K > 0 such that the following conditions are satisfied (for all t, x, x0 , w, w0 , b, b0 ): 1. F is continuous in x and satisfies |F (t, x, w, b) − F (t, x, w0 , b0 )| ≤ K(|w − w0 | + |b − b0 |) (x − x0 ) · (F (t, x, w, b) − F (t, x0 , w, b)) ≤ K|x − x0 |2 |F (t, x, w, b)| ≤ K(1 + |x| + |w| + |b|) 2. G is continuous in w and satisfies |G(t, x, w, b) − G(t, x0 , w, b0 )| ≤ K(|x − x0 | + |b − b0 |) (w − w0 ) · (G(t, x, w, b) − G(t, x, w0 , b)) ≤ K|w − w0 |2 |G(t, x, w, b)| ≤ K(1 + |x| + |w| + |b|) 3. U is Lipschitz continuous and bounded, that is, it satisfies |U (x) − U (x0 )| ≤ K(|x − x0 |) |U (x)| ≤ K. Then there exists a unique solution (X, W, β) of the FBSDE. The solution takes the form Wt = w(t, Xt ), βt = b(t, Xt ) for some function w : [0, T ] × Rn → Rn that is uniformly Lipschitz continuous in x ∈ Rn , and some function b : [0, T ] × Rn → Rd×n . 38 Delarue (2002) uses a more general formulation that, in particular, allows the matrix Σ̃ to depend on t and X. He requires the condition called uniform parabolicity: There exist ν, ν 0 > 0 such that ∀ξ ∈ Rd , ν ≤ 1 0 0 0 |ξ|2 ξ (Σ̃(t, X)Σ̃ (t, X))ξ ≤ ν for all t, X. To see that this condition is satisfied in the current framework, first note that Σ̃Σ̃0 has d real positive eigenvalues (λ1 , · · · , λd ), because Σ̃ is invertible, which guarantees that Σ̃Σ̃0 is positive definite. By the standard results in linear algebra, 0 < (mink λk )|ξ|2 ≤ ξ 0 (Σ̃Σ̃0 )ξ ≤ (maxk λk )|ξ|2 for any non-zero ξ ∈ Rd . 49 Lemma S4. There is Cf > 0 such that |f i (β, X) − f i (β̃, X̃)| ≤ Cf (|β i − β̃ i | + |X − X̃|) for all β, β̃, X, X̃. Proof. Denote the control range by Ai = [Ai , Āi ]. Since β i · Σ−1 µi αi − ci (αi , X) is strictly concave in αi , its unique maximizer Ai is continuous in β and X and satisfies the first-order condition β i · Σ−1 µi = the form ∂ci (Ai ,X) ∂Ai in the case of Ai ∈ (Ai , Āi ). Furthermore, this maximizer takes ∂ci (Āi ,X) −1 i i Ā if β · Σ µ ≥ i ∂Ai i (Ai ,X) ∂ci (Āi ,X) f i (β, X) = g i (β, X) if β i · Σ−1 µi ∈ ( ∂c ∂A , ∂Ai ) i i (Ai ,X) Ai if β i · Σ−1 µi ≤ ∂c ∂A i where g i (β, X) is the unique value Ai ∈ Ai that is defined by the equality ∂ci (Ai ,X) −β i ·Σ−1 µi ∂Ai i = 0. Note that this value is uniquely determined, since ci is strictly convex in A . 
By the implicit function theorem, for k = 1, . . . , d, he first-order partial derivatives of g i satisfy i ∂ 2 ci ∂g ∂Ai ∂X k ∂X k = ∂ 2 ci , (∂Ai )2 i k −1 ∂g e · Σ µi ∂β ik = ∂ 2 ci , (∂Ai )2 where β ik denotes the kth element of the vector β i ∈ Rn , and ek is the kth unit vector in ∂ 2 ci Rn . Since we can choose constants M, M 0 > 0 uniformly in (β, X) such that (∂A i )2 > M and ∂ 2 ci ∂Ai ∂X k < M 0 by assumption, this establishes the Lemma. B B.1 Additional Results Discontinuous terminal payoff In this subsection we consider the case where terminal payoffs can be discontinuous, and show the uniqueness result (Theorem 1) if the noise distribution is approximately uniform. Formally we impose the following requirement on U i : Assumptions. U i is piecewise Lipschitz in X i , i.e., for each X −i there exist (ψki (X −i ))k=1,...,Ki such that U i (X i , X −i ) is Lipschitz continuous in X i ∈ R \ {ψki (X −i ) : k = 1, ..., Ki }. Also, assume that each ψki (X −i ) and sup{limn |U i (Xni ) − U i (ψ i (X j ))| : Xni → ψ i (X j } are Lipschitz continuous and bounded in X −i . Theorem S1. Consider the model in Section 2 with the modified assumption as above. Assume that lim∆ r∆ ∆ = ∞. Then there exists δ > 0 such that if max{∆, supi, 50 |φ0i (i )| } r∆ ≤ δ, then there is a unique subgame-perfect equilibrium. Proof. To simplify the notation, we only consider the two-player case and assume that Ki (Xj ) = 1 for all Xj (i.e., there is only one “jump” point in each player’s terminal payoff). The general case can be proved analogously. Take any history ht at t = T − ∆ and let Xt denote the state at t. After this history, each player i’s equilibrium action Ait (ht ) is to maximize the payoff Q function Vti (·; Xt ) : j=1,2 Ai → R defined by Vti (At ; Xt ) Z = U i (Xt + ∆At + r∆ εt ) Y φj (jt )dεt − ∆ci (Ait ) j=1,2 Z = Z j ψ i (X )−Xti −∆Ait T r∆ U i (Xt + ∆At + r∆ εt )φi (it )dεit φj (jt )djt −∞ Z Z + ! ∞ i j ψ i (X )−Xti −∆Ait T r∆ U (Xt + ∆At + r∆ εt )φi (it )dεit φj (jt )djt −∆ci (Ait ) over Ait given the current state Xt and the other’s action Ajt . Let Gi (XTj ) := limX i %ψi (X j ) U i (X i ) − U i (ψ i (XTj )) denote the amount of jump of at X i = T ψ i (XTj ). The first-order derivative takes the form Z Z j ψ i (X )−Xti −∆Ait T r∆ ∂ U i (XT )φi (it )dεit φj (jt )djt i ∂X −∞ ! Z Z ∞ ∂ +∆ U i (XT )φi (it )dεit φj (jt )djt j i ψ i (X )−Xti −∆Ait ∂X T r∆ !! Z ∆ ψ i (XTj ) − ∆Ait j + φj (jt )djt Gi (XT )φi r∆ r∆ ∂ i V (At ; Xt ) = ∆ ∂Ait t −∆c0i (Ait ), where we denote XT = Xt + At ∆ + r∆ t to simplify the notation. 51 The cross derivative can be derived as 2 2 ∂ ∆ i V (A ; X ) = t t t j r∆ ∂Ait ∂At Z Z ∆2 + r∆ Z ∆2 + r∆ Z ∆2 + 2 r∆ Z j ψ i (X )−Xti −∆Ait T r∆ ∂ U i (XT )φi (it )dεit (−φ0j (jt ))djt i ∂X −∞ ! Z ∞ ∂ U i (XT )φi (it )dεit (−φ0j (jt ))djt j i ψ i (X )−Xti −∆Ait ∂X T r∆ !! j i i ψ (X ) − ∆A t T G0i (XTj )φi φj (jt )djt r∆ ! ! j i ψ (X ) − ∆A i t T Gi (XTj )φ0i ψi0 (XTj ) φj (jt )djt r∆ where the first two terms are derived as in the proof of Theorem 1. These terms are bounded in absolute by a constant times ∆2 r∆ as in the proof of Theorem 1. Since Gi is absolutely continuous, we can apply the integration by parts to bound the third term of ∂2 V i (At ; Xt ) ∂Ait ∂Ajt t by Z ! j i i ∆ ψ (XT ) − ∆At j j 0 j j φ (t )dt Gi (XT )φi r∆ r∆ " ! #∞ ∆2 Gi (XTj ) ψ i (XTj ) − ∆Ait j j = φi φ (t ) r∆ ∆ r∆ −∞ Z ! ! ! 
j i i ∆ ∆2 ψ i (XTj ) − ∆Ait ψ (X ) − ∆A j j j j j t j 0 0 i 0 j T (φ ) (t ) + φi (ψ ) (XT ) φ (t ) dt + Gi (XT ) φi r∆ r∆ r∆ r∆ i ∆ ψ (x) − ∆Ait ψ i (x) − ∆Ait ≤ sup Gi (x)φi lim φj () − G(x0 )φi lim φj () →∞ →−∞ r∆ x,x0 ∈R2 r∆ r∆ Z ψ i (x) − ∆Ait ∆3 (ψi )0 (x) |φj (jt )|djt + 2 sup Gi (x)φ0i r∆ x∈R2 r∆ i Z 2 ψ (x) − ∆Ait ∆ j j 0 j sup Gi (x)φi + |(φ ) (t )|dt , r∆ x∈R2 r∆ 2 where the last step involved Holder’s inequality. The first term of the righthand side is 0 because of lim→±∞ φj () = 0. The remaining terms are bounded in absolute by a constant times Lastly, the fourth term of ∂2 V i (At ; Xt ) ∂Ait ∂Ajt t is bounded in absolute by ∆2 2 r∆ ∆2 . r∆ sup |φ0i ()| times a constant. By the same argument as in the proof of Theorem 1, one can show the uniqueness of equilibrium action profile in this subgame if sup |φ0i ()| r∆ are analogous. 52 is sufficiently small. The remaining steps B.2 Equilibrium Multiplicity without Deadline We demonstrate that in the continuous-time model in Section 5 equilibrium may not be unique if there is no deterministic bounded deadline T . To make this point, we assume that the deadline T is time of the first arrival of a Poisson process with fixed intensity λ > 0. For simplicity, P here we focus on the two-player case with one-dimensional state dXt = i=1,2 Ait dt + σdZt . Moreover, we assume that ci depends on both Ait and Xt . As clarified in Section 6.1, the uniqueness result in 5 still holds even when ci depends on both Ait and Xt if T is fixed. Focusing on Markov strategies that depend on only the current state Xt , we can derive the following HJB equations: " # σ Ak + (wi )00 (x) + λ U i (x) − wi (x) , −ci (x, Ai ) + (wi )0 (x) 0 = max 2 Ai ∈Ai k=1,2 X where wi (x) denote the continuation value of player i when the state Xt is equal to x. These equations can be written as " # σ i 00 A + (w ) (x) . λU (x) − c (x, A ) + (w ) (x) λw (x) = max 2 Ai ∈Ai k=1,2 i i i i i 0 X k Therefore, the model is equivalent to the infinite-horizon model in which each player receives flow payoff λU i (Xt ) − ci (Xt , Ai ) and has discount rate λ. It is known in the literature that infinite-horizon differential games can have multiple equilibria in Markov strategies even if flow payoffs are independent of opponents’ actions (e.g., Tsutsui and Mino (1990)). As a particularly simple example, consider the game in which the terminal payoff U i is constantly zero and the flow payoff −ci is given by − 12 (Ait )2 + 4(Ajt )2 + γ(Xt )2 for some constant γ > 0. Based on Example 2 in Lockwood (1996), one can verify that there are two possible values of θ > 0 such that the strategy profile of the form Ait = −θXt is a (Markov perfect) equilibrium of this game when γ is small.39 This multiplicity contrasts with the uniqueness result in Section 5. Because there is no terminal-value condition on the continuation value Wti , the “backward induction” argument which drives the unique equilibrium in the original model does not work here. 39 Lockwood (1996) analyzes a deterministic model, in which σ = 0. Under quadratic payoffs, any Markov strategy equilibrium profile that is linear in the state constitutes an equilibrium of the stochastic model as long as the transversality condition is satisfied. 53 B.3 Quadratic Games We analyze a particular class of the continuous-time model in Section 5 with quadratic terminal payoff functions and cost functions: U i (X) = X k=1,2 c̄i ᾱik (X k )2 + β̄ik X k + γ̄i X 1 X 2 , ci (a) = a2 , 2 where ᾱik , β̄ik , γ̄i , c̄i are fixed constants such that c̄i > 0. 
We also allow for unbounded controls40 : Ai = R. With these assumptions, the equilibrium PDEs (11) can be simplified to ODEs. Moreover, Proposition S1 below states that an equilibrium is linear in the state variables, where the coefficients follow ODEs. For simplicity, the proposition assumes the two-player case. Proposition S 1. Consider a quadratic game. Let (αik (t), βik (t), γi (t), δi (t))i,k=1,2,t∈[0,T ] be a solution to the system of ordinal differential equations 2(αii (t))2 γi (t)γj (t) γ 2 (t) 4αij (t)αjj (t) − , α̇ij (t) = − i − , c̄i c̄j 2c̄i c̄j 2αii (t)βii (t) βjj (t)γi (t) + βij (t)γj (t) βii (t) 2(αij (t)βjj (t) + αjj (t)βij (t)) β̇ii (t) = − − , β̇ij (t) = − γi (t) − , c̄i c̄j c̄i c̄j 2αii (t)γi (t) 2(αij (t)γj (t) + αjj (t)γi (t)) βii2 βij βjj 2 2 γ̇i (t) = − − , δ̇i (t) = −(σ1 ) αi1 − (σ2 ) αi2 − − c̄i c̄j 2c̄i c̄j α̇ii (t) = − (18) associated with the boundary conditions αik (T ) = ᾱik , β̄ik (T ) = βik , γ̄i (T ) = γi , δi (T ) = 0. (19) Then a Nash equilibrium strategy takes the form ai (t, x1 , x2 ) = 1 (2αii (t)xi + βii (t) + γi (t)xj ). c̄i Proof. We first solve (αik (t), βik (t), γi (t), δi (t))i,k=1,2,t∈[0,T ] by solving the system of ordinal differential equations 40 Because of the tractability of quadratic games, they have been widely employed in applications. A drawback to this approach is that it involves unbounded terminal payoffs and unbounded control ranges, which violate the assumptions for our main theorems; the uniqueness of equilibrium is an open question. 54 2(αii (t))2 γi (t)γj (t) γ 2 (t) 4αij (t)αjj (t) − , α̇ij (t) = − i − , c̄i c̄j 2c̄i c̄j 2αii (t)βii (t) βjj (t)γi (t) + βij (t)γj (t) βii (t) 2(αij (t)βjj (t) + αjj (t)βij (t)) β̇ii (t) = − − , β̇ij (t) = − γi (t) − , c̄i c̄j c̄i c̄j 2αii (t)γi (t) 2(αij (t)γj (t) + αjj (t)γi (t)) β2 βij βjj γ̇i (t) = − − , δ̇i (t) = −(σ1 )2 αi1 − (σ2 )2 αi2 − ii − c̄i c̄j 2c̄i c̄j α̇ii (t) = − (20) associated with the boundary conditions: αik (T ) = ᾱik , β̄ik (T ) = βik , γ̄i (T ) = γi , δi (T ) = 0. (21) Guess the functional form of value function by wi (t, x1 , x2 ) = X αik (t)(xk )2 + βik (t)xk + γi (t)xi xj + δi (t). k=i,j Substituting this guess into the HJB equations (11), 0= X k=i,j 1 (2αii xi + βii + γi xj )2 α̇ik (xk )2 + β̇ik xk + γ̇i xi xj + δ̇i + (σ1 )2 αi1 + (σ2 )2 αi2 + 2c̄i 1 (2αij xj + βij + γi xi )(2αjj xj + βjj + γj xi ) c̄j 2αii2 γi2 2αii γi γi γj 4αij αjj 2 i 2 j 2 i j =(x ) α˙ii + + + (x ) α˙ij + + + x x γ̇i + + (αij γj + αjj γi ) c̄i c̄j 2c̄j c̄j c̄i c̄j 2αii βii βjj γi + βij γj βii γi 2(αij βjj + αjj βij ) + xi β̇ii + + + xj β˙ij + + c̄i c̄j c̄i c̄j 2 βij βjj β + δ̇i + (σ1 )2 αi1 + (σ2 )2 αi2 + ii + 2c̄i c̄j + where we have omitted the argument t for coefficient functions. By the proposed ODE, the right hand side becomes zero for every x1 and x2 . Also, the boundary condition holds with wi (T, x1 , x2 ) = X ᾱik (xk )2 + β̄ik xk + γ̄i xi xj k=i,j which verifies the solution. 55 Finally, by Theorem 2, the equilibrium action is given by ai (t, x1 , x2 ) = wii (t, x1 , x2 ) 1 = (2αii (t)xi + βii (t) + γi (t)xj ). c̄i c̄i 56
© Copyright 2026 Paperzz