Non-Cooperative and Semi-Cooperative Differential Games

Wen Shen
Department of Mathematics, Penn State University
McAllister Building, University Park, PA
shen [email protected]

Abstract. In this paper we review some recent results on non-cooperative and semi-cooperative differential games. For the n-person non-cooperative games in one space dimension, we consider the Nash equilibrium solutions. When the system of Hamilton-Jacobi equations for the value functions is strictly hyperbolic, we show that the weak solution of a corresponding system of hyperbolic conservation laws determines an n-tuple of feedback strategies. These yield a Nash equilibrium solution to the non-cooperative differential game. However, in the multi-dimensional case, the system of Hamilton-Jacobi equations is generically elliptic, and therefore ill posed. In an effort to obtain meaningful stable solutions, we propose an alternative "semi-cooperative" pair of strategies for the two players, seeking a Pareto optimum instead of a Nash equilibrium. In this case, the corresponding Hamiltonian system for the value functions is always weakly hyperbolic.

Key words. Non-cooperative differential games, Nash equilibrium, system of Hamilton-Jacobi equations, hyperbolic system of conservation laws, BV solutions, optimal control theory, discontinuous ODE, ill-posed Cauchy problem.

AMS Subject Classifications. Primary 91A23, 49N70, 93B52, 35L65; Secondary 91A10, 49N90, 49N35, 49L20, 34A36.

1 Introduction

In this paper we review some recent results on non-cooperative differential games. Non-cooperative games provide a mathematical model for the behavior of two or more individuals, operating in the same environment with different (possibly conflicting) goals. In the case of n players, the evolution of the system is governed by a system of differential equations of the form

    ẋ(t) = Σ_{i=1}^n f_i(x, u_i),    x(τ) = y.    (1)

Here the real-valued map t → u_i(t) is the control implemented by the i-th player.
Together with (1) we consider the payoff functionals

    J_i = J_i(τ, y, u_1, ..., u_n) = g_i(x(T)) − ∫_τ^T h_i(x(t), u_i(t)) dt.    (2)

Notice that (2) is the sum of a terminal payoff g_i, depending on the state of the system at the final time T, and of a running cost h_i, incurred while implementing the control u_i. The goal of the i-th player is to maximize J_i.

For an example from economics, one can consider n companies which are planning to sell the same type of product. Let x_i(t) be the market share of the i-th company at time t. This can change in time, depending on the levels of advertising (u_1, ..., u_n) chosen by the various companies. At the final time T, when the products are being sold, the payoff J_i of the i-th company will depend on its market share x_i(T) and on its total advertising cost, as in (2).

A major step toward the understanding of non-cooperative games with several players was provided by the concept of Nash non-cooperative equilibrium, introduced by J. Nash [26]. Roughly speaking, a set of strategies (U_1*, ..., U_n*) constitutes a Nash equilibrium if, whenever one single player modifies his strategy (while the other players keep theirs unchanged), his own payoff does not increase. This concept was first formulated in the context of static games, where no time evolution is involved. It is natural to explore the relevance of Nash equilibria also in connection with differential games. Results on the existence of Nash equilibrium solutions for open-loop strategies can be found in [17,32]. In this case, each player has knowledge only of the initial state of the system; his strategy is thus a function of time only, say u_i = U_i(t). In this paper, we analyze the existence and stability of Nash equilibrium strategies in feedback (closed-loop) form. Here the players can directly observe the state x(t) of the system at every time t ∈ [0, T], and therefore their strategies depend on x.
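In a finite static game, Nash's definition can be checked by brute force: a strategy pair is an equilibrium if no unilateral deviation strictly increases the deviator's payoff. The following sketch (illustrative only, not taken from this paper) verifies that mutual defection is the unique Nash equilibrium of a standard prisoner's-dilemma payoff table.

```python
from itertools import product

# Payoffs for a standard prisoner's dilemma (illustrative numbers).
# payoff[(a1, a2)] = (payoff to player 1, payoff to player 2);
# actions: 0 = cooperate, 1 = defect.
payoff = {
    (0, 0): (3, 3), (0, 1): (0, 5),
    (1, 0): (5, 0), (1, 1): (1, 1),
}

def is_nash(a1, a2):
    """No unilateral deviation may strictly increase the deviator's payoff."""
    u1, u2 = payoff[(a1, a2)]
    best1 = max(payoff[(b, a2)][0] for b in (0, 1))
    best2 = max(payoff[(a1, b)][1] for b in (0, 1))
    return u1 >= best1 and u2 >= best2

equilibria = [a for a in product((0, 1), repeat=2) if is_nash(*a)]
print(equilibria)   # [(1, 1)]: the unique equilibrium is mutual defection
```

Note that (1,1) is a Nash equilibrium yet is Pareto-dominated by (0,0): this is the tension that motivates the semi-cooperative strategies of Sec. 3.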
More precisely, an n-tuple of feedback strategies u_i = U_i*(t, x), i = 1, ..., n, is called a Nash equilibrium solution if the following holds. For each i, if the i-th player chooses an alternative strategy U_i, while every other player j ≠ i sticks to his previous strategy U_j*, then the i-th payoff does not increase:

    J_i(τ, y, U_1*, ..., U_{i−1}*, U_i, U_{i+1}*, ..., U_n*) ≤ J_i(τ, y, U_1*, ..., U_{i−1}*, U_i*, U_{i+1}*, ..., U_n*).

Therefore, for the i-th player, the feedback strategy u_i = U_i*(t, x) provides the solution to the optimal control problem

    max_{u_i(·)}  g_i(x(T)) − ∫_τ^T h_i(x(t), u_i(t)) dt,    (3)

in connection with the system

    ẋ = f_i(x, u_i) + Σ_{j≠i} f_j(x, U_j*(t, x)).    (4)

Assuming that the dynamics of the system and the payoff functions are sufficiently regular, the problem can be attacked using tools from P.D.E. theory. As in the theory of optimal control, the basic objects of our study are the value functions V_i. Roughly speaking, V_i(τ, y) denotes the payoff expected by the i-th player, if the game were to start at time τ, with the system in the state x(τ) = y. As shown in [17, p. 292], these value functions satisfy a system of first order partial differential equations with terminal data:

    ∂V_i/∂t + H_i(x, ∇V_1, ..., ∇V_n) = 0,    V_i(T, x) = g_i(x),    i = 1, ..., n.    (5)

In the case of a two-person, zero-sum differential game, the value function is obtained from the scalar Bellman-Isaacs equation [17]. The analysis can thus rely on comparison principles and on the well-developed theory of viscosity solutions for Hamilton-Jacobi equations; see for example [2]. In our case, one has to study a highly nonlinear system of Hamilton-Jacobi equations. Previous results in this direction cover only particular examples, as in [3,13,14,27].

In the one-dimensional case, differentiating (5) one obtains a system of conservation laws for the gradient functions p_i := V_{i,x}, namely

    p_{i,t} + H_i(x, p)_x = 0.    (6)

Under the assumption of strict hyperbolicity (which is discussed in more detail in Sec. 2), the known results on systems of conservation laws can be applied. The theorem of Glimm [19], or its more general versions [4,21,24], then provides the existence of a global solution to the Hamilton-Jacobi equations for terminal data g_i whose gradients have sufficiently small total variation. The Nash feedback strategies can then be recovered from the gradients of the value functions. Establishing the optimality of these feedback strategies is a nontrivial task, due to lack of regularity. In Sec. 2, we prove the optimality by using the special structure of solutions of hyperbolic systems of conservation laws.

However, when the state space is multi-dimensional, the corresponding system of P.D.E.'s is generically not hyperbolic, and the Cauchy problem is not well posed. In Sec. 3, we study in detail a particular one-dimensional example where hyperbolicity fails, and construct a family of unstable, highly oscillatory solutions. Our conclusion is that the concept of Nash equilibrium is not appropriate for the study of feedback strategies for differential games in continuous time. Indeed, solutions are extremely sensitive to small perturbations of the data, so that the mathematical model has no predictive power.

To redress the situation, one possibility is to introduce some stochasticity in the system; see [18,25] and the references therein. The presence of random inputs, in the form of white noise, has a well-known stabilizing effect, since it transforms the system into a parabolic one. Another possibility, explained in more detail in Sec. 3, is to allow some degree of cooperation among the players.
As proved by Smale in connection with the repeated prisoner's dilemma [31], even if the players do not communicate with each other, over a period of time they can devise strategies converging to a Pareto optimum. In the setting of differential games, we prove that if these semi-cooperative strategies are implemented, then the system of P.D.E.'s for the value functions turns out to be always hyperbolic, at least in a weak sense. Partial cooperation thus removes the most severe instabilities found among Nash non-cooperative equilibrium solutions.

2 Feedback Nash Equilibrium to Non-Cooperative Differential Games

Consider a differential game for n players in one space dimension, with the simple form

    ẋ = f_0 + Σ_i u_i,    x(τ) = y.    (1)

Here the controls u_i can be any measurable, real-valued functions, while f_0 ∈ IR is a fixed constant. The payoff functionals are given by

    J_i = J_i(τ, y, u_1, ..., u_n) = g_i(x(T)) − ∫_τ^T h_i(u_i(t)) dt.    (2)

A key assumption, used throughout the paper, is that the cost functions h_i are smooth and strictly convex, with a positive second derivative:

    h_i''(ω) > 0.    (3)

The Hamiltonian functions H_i are then defined as follows. By (3), for any j = 1, ..., n and any given gradient p_j = V_{j,x} ∈ IR, there exists a unique optimal control value u_j*(p_j) such that

    p_j · u_j*(p_j) − h_j(u_j*(p_j)) = max_{ω∈IR} {p_j · ω − h_j(ω)} := φ_j(p_j).    (4)

Then

    H_i(p_1, ..., p_n) = p_i · (f_0 + Σ_j u_j*(p_j)) − h_i(u_i*(p_i)).    (5)

The corresponding Hamilton-Jacobi equation for V_i takes the form

    V_{i,t} + H_i(V_{1,x}, ..., V_{n,x}) = 0,    (6)

with data given at the terminal time t = T:

    V_i(T, x) = g_i(x),    i = 1, ..., n.    (7)

In turn, the gradients p_i := V_{i,x} of the value functions satisfy the system of conservation laws with terminal data

    p_{i,t} + H_i(p_1, ..., p_n)_x = 0,    p_i(T, x) = g_i'(x).    (8)

In recent years, considerable progress has been achieved in the understanding of weak solutions to hyperbolic systems of conservation laws in one space dimension. In particular, entropy admissible solutions with small total variation are known to be unique and to depend continuously on the initial data [7,8]. Moreover, they can be obtained as the unique limits of vanishing viscosity approximations [4]. We apply these results to prove the existence and stability of Nash equilibrium solutions in the context of differential games.

The key question is whether this system of conservation laws admits a solution. Moreover, is this solution unique? How is it affected by small perturbations of the data? Classical P.D.E. theory provides conditions under which the Cauchy problem is "well posed", i.e., it admits a unique solution depending continuously on the initial data. The basic requirement is that the system should be hyperbolic. For a given system of P.D.E.'s, hyperbolicity amounts to an algebraic condition on the matrices of coefficients and can be checked in practice.

2.1 Hyperbolicity conditions

We now describe the hyperbolicity conditions for the system of conservation laws (8). The Jacobian matrix A(p) of this system, with entries A_ij = ∂H_i/∂p_j, takes the form

            | ẋ          p_1 φ_2''   p_1 φ_3''   · · ·   p_1 φ_n''  |
            | p_2 φ_1''   ẋ          p_2 φ_3''   · · ·   p_2 φ_n''  |
    A(p) =  | p_3 φ_1''   p_3 φ_2''  ẋ           · · ·   p_3 φ_n''  |    (9)
            |   ...         ...        ...        ...      ...      |
            | p_n φ_1''   p_n φ_2''  p_n φ_3''   · · ·   ẋ          |

where φ_j(p_j) is defined in (4) and φ_j''(p_j) = −(h_j''(u_j*(p_j)))^{−1} is always negative. The system (8) is strictly hyperbolic at a point p = (p_1, ..., p_n) if the Jacobian matrix A(p) has n real distinct eigenvalues. Our first result provides a sufficient condition for this to happen.

Lemma 2.1. Assume that all components p_i, i = 1, ..., n, have the same sign. Moreover, assume that there are no distinct indices i ≠ j ≠ k such that

    p_i φ_i'' = p_j φ_j'' = p_k φ_k''.    (10)

Then the system (8) is strictly hyperbolic at p.
Moreover, all eigenvalues λ_i(p) of the matrix A(p) satisfy the inequality

    λ_i(p) ≠ ẋ = f_0 + Σ_j u_j*(p_j),    i = 1, ..., n.    (11)

We note that if all the p_i have the same sign, the eigenvalues are real. The further condition (10) ensures that they are all distinct. If this condition fails, say p_{i−1}φ_{i−1}'' = p_iφ_i'' = p_{i+1}φ_{i+1}'', then λ = −p_iφ_i'' becomes a multiple zero of det(B − λI), where B := A(p) − ẋ I. In this case, the system of conservation laws is only hyperbolic (not strictly hyperbolic). Furthermore, by (11) the characteristic wave speeds λ_i are all different from the speed ẋ at which the state of the system changes. For a proof of the above result, see [10]. We mention that, in the case of two-player games, the condition p_1 p_2 > 0 is also necessary for the strict hyperbolicity of the system. However, one can give an example of a three-player game where the system (8) is strictly hyperbolic even though p_1, p_2, p_3 do not all have the same sign.

2.2 Solutions of the hyperbolic system

Next, assume that the system of conservation laws (8) is strictly hyperbolic in a neighborhood of a point p* = (p_1*, ..., p_n*). In this case, assuming that the terminal conditions have small total variation, one can apply the following theorem (cf. [4,7,8,19]) and obtain the global existence and uniqueness of a weak solution.

Proposition 1. Assume that the flux function H : IR^n → IR^n is smooth and that, at some point p*, the Jacobian matrix A(p*) = DH(p*) has n real distinct eigenvalues. Then there exists δ > 0 for which the following holds. If

    ‖p̄(·) − p*‖_{L∞} < δ,    Tot.Var.{p̄} < δ,    (12)

then the Cauchy problem

    p_t + H(p)_x = 0,    p(0, x) = p̄(x)    (13)

admits a unique entropy weak solution p = p(t, x) defined for all t ≥ 0, obtained as the limit of vanishing viscosity approximations.

2.3 Optimal trajectory; solutions of a discontinuous O.D.E.
In general, a weak solution of the hyperbolic system of conservation laws (8) uniquely determines a family of discontinuous feedback controls U_i* = U_i*(t, x). Inserting these feedback controls in (1), we obtain the O.D.E. for the optimal trajectory:

    ẋ(t) = f_0 + Σ_{i=1}^n u_i*(p_i(t, x)),    x(τ) = y.    (14)

Note that the right-hand side of this O.D.E. is discontinuous, due to the discontinuities in the feedback controls U_i* = U_i*(t, x). In spite of this, the solution of the Cauchy problem (14) is unique and depends continuously on the initial data, thanks to the special structure of the BV solutions of hyperbolic systems of conservation laws. Indeed, every trajectory of (14) crosses transversally all lines of discontinuity of the functions p_i. Because of the bound on the total variation, the uniqueness result in [6] can thus be applied. We explain the ideas below.

First, we observe that the solution p = p(t, x) of (8) has bounded directional variation along a cone Γ, strictly separated from all characteristic directions. Indeed, by assumption, the matrix A(p*) has distinct eigenvalues λ_1* < λ_2* < · · · < λ_n*. By continuity, there exists ε > 0 such that, for all p in the ε-neighborhood Ω*_ε := {p ; |p − p*| ≤ ε}, the characteristic speeds range inside disjoint intervals:

    λ_j(p) ∈ [λ_j^−, λ_j^+].    (15)

Moreover, if p^−, p^+ ∈ Ω*_ε are two states connected by a j-shock, the speed λ_j(p^−, p^+) of the shock remains inside the interval [λ_j^−, λ_j^+]. Now consider an open cone of the form

    Γ := {(t, x) ; t > 0, a < x/t < b}.    (16)

Following [6], we define the directional variation of the function (t, x) → p(t, x) along the cone Γ as

    sup Σ_{i=1}^N |p(t_i, x_i) − p(t_{i−1}, x_{i−1})|,    (17)

where the supremum is taken over all finite sequences (t_0, x_0), (t_1, x_1), ..., (t_N, x_N) such that

    (t_i − t_{i−1}, x_i − x_{i−1}) ∈ Γ for every i = 1, ..., N;    (18)

see Fig. 1.
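The separation of characteristic speeds used here can be probed numerically on the matrix A(p) from (9). A minimal sketch, assuming quadratic costs h_j(ω) = ω²/2 so that each φ_j'' is the constant −1 (an illustrative choice, following the sign convention of the formula quoted after (9)): with all p_i of one sign and the products p_iφ_i'' pairwise distinct, the eigenvalues come out real, distinct, and different from ẋ, as Lemma 2.1 asserts.

```python
import numpy as np

def jacobian(p, c, xdot):
    """Jacobian A(p) from (9): diagonal entries xdot, off-diagonal
    entries A_ij = p_i * c_j, where c_j = phi_j''(p_j) < 0."""
    A = np.outer(p, c)             # entries p_i * c_j
    A[np.diag_indices(len(p))] = xdot
    return A

p = np.array([0.3, 0.7, 1.1])      # same sign, p_i * c_i pairwise distinct
c = np.array([-1.0, -1.0, -1.0])   # quadratic costs give phi_j'' = -1
xdot = 2.0

eig = np.linalg.eigvals(jacobian(p, c, xdot))
print(np.sort_complex(eig))        # real, distinct, and none equal to xdot
```

For two players, flipping the sign of one component of p makes the eigenvalues complex, which is exactly the loss of hyperbolicity discussed in Sec. 3.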
The next lemma shows that the weak solution p = p(t, x) has bounded directional variation along a suitable cone Γ.

Lemma 2.2. Let p = p(t, x) be an entropy weak solution of (13) taking values inside the domain Ω*_ε. Assume that λ_{k−1}^+ < a < b < λ_k^− for some k. Then p has bounded directional variation along the cone Γ in (16).

See [10] for a detailed proof. Together with Γ we now consider a strictly smaller cone, say

    Γ' := {(t, x) ; t > 0, a' < x/t < b'},    (19)

with a < a' < b' < b. A standard theorem in real analysis states that a BV function of one real variable admits left and right limits at every point. An analogous result for functions with bounded directional variation is proved in [10].

Figure 1: Directional variation along the cone Γ.

Lemma 2.3. Let p = p(t, x) be a function with bounded directional variation along the cone Γ in (16), and consider the smaller cone Γ' ⊂ Γ in (19), with a < a' < b' < b. Then at every point P = (t, x) there exist the directional limits

    p^+(P) := lim_{Q→P, Q−P∈Γ'} p(Q),    p^−(P) := lim_{Q→P, P−Q∈Γ'} p(Q).    (20)

Now, due to the transversality condition and the bound on the directional total variation, the result in [6] can be applied, providing the uniqueness and continuous dependence of trajectories of (14). We refer to [10] for more details.

2.4 Optimal feedback strategies

From the gradients of the value functions one can recover the Nash feedback strategies for the various players. To obtain an existence result for solutions of differential games, one has to show that, for each single player, the feedback strategy corresponding to the solution of the Hamilton-Jacobi system actually provides the optimal solution to the control problem (3)–(4). We remark that, if the value functions V_i were smooth, the optimality would be an immediate consequence of the equations. The main technical difficulty stems from the non-differentiability of these value functions.
In the literature on control theory, sufficient conditions for optimality have been obtained along two main directions. On one hand, there is the "regular synthesis" approach developed by Boltianskii [5], Brunovskii [12], and Sussmann and Piccoli [29]. In this case, one typically requires that the value function be piecewise C¹ and satisfy the Hamilton-Jacobi equations outside a finite or countable number of smooth manifolds M_i. On the other hand, one can use the Crandall-Lions theory of viscosity solutions, and show that the value function is the unique solution of the Hamilton-Jacobi equation in the viscosity sense [2].

None of these approaches is applicable in the present situation, because of the lack of regularity of both the value functions and the system itself. Indeed, each player now has to solve an optimal control problem for a system whose dynamics (determined by the feedbacks used by all the other players) is discontinuous. Our proof of optimality relies strongly on the special structure of BV solutions of hyperbolic systems of conservation laws. In particular, the solution has bounded directional variation along a cone Γ bounded away from all characteristic directions. As a consequence, the value functions V_i always admit a directional derivative in the directions of the cone Γ. For trajectories whose speed remains inside Γ, the optimality can thus be tested directly from the equations. An additional argument, using Clarke's generalized gradients [15], rules out the optimality of trajectories whose speed falls outside the above cone of directions. The following theorem is proved in [10].

Theorem 2.1. Consider the differential game (1)–(2), where the cost functions h_i are smooth and satisfy the convexity assumption (3). In connection with the functions φ_j at (4), let p* = (p_1*, ..., p_n*) be a point where the assumptions of Lemma 2.1 are satisfied.
Then there exists δ > 0 such that the following holds. If

    ‖g_i' − p_i*‖_{L∞} < δ,    Tot.Var.{g_i'(·)} < δ,    i = 1, ..., n,    (21)

then for any T > 0 the terminal value problem (8) has a weak solution p : [0, T] × IR → IR^n. The (possibly discontinuous) feedback controls U_j*(t, x) := u_j*(p_j(t, x)), implicitly defined by (4), provide a Nash equilibrium solution to the differential game. The trajectories t → x(t) depend Lipschitz continuously on the initial data (τ, y).

It is interesting to observe that the entropy admissibility conditions play no role in our analysis. For example, a solution of the system of conservation laws consisting of a single, non-entropic shock still determines a Nash equilibrium solution, provided that the amplitude of the shock is small enough. There is, however, a way to distinguish entropy solutions from all others that is also meaningful in the context of differential games. Indeed, entropy solutions are precisely the ones obtained as vanishing viscosity limits [4]. They can thus be derived from a stochastic differential game of the form

    dx = Σ_{i=1}^n f_i(x, u_i) dt + ε dw,

letting the white noise parameter ε → 0. Here dw formally denotes the differential of a Brownian motion. For a discussion of stochastic differential games we refer to [18].

3 Semi-Cooperative Differential Games

3.1 Lack of hyperbolicity in vector cases

Unfortunately, while strict hyperbolicity can usually be achieved in one space dimension, this is not the case in higher space dimensions. When the state of the system is described by a vector x ∈ IR^m, m ≥ 2, the system of Hamilton-Jacobi equations (5) for the value functions is generically not hyperbolic. For the reader's convenience, we recall here some basic definitions. The linear multi-dimensional system with constant coefficients

    v_t + Σ_{α=1}^m A_α v_{x_α} = 0    (1)

is said to be hyperbolic if, for each vector ξ = (ξ_1, ..., ξ_m) ∈ IR^m, the matrix

    A(ξ) := Σ_α ξ_α A_α    (2)

admits a basis of real eigenvectors [30].
We shall say that (1) is weakly hyperbolic if all the eigenvalues of A(ξ) are real, for every ξ ∈ IR^m. Next, given a point (x, p) = (x, p_1, ..., p_n) ∈ IR^{(1+n)m}, with x ∈ IR^m and p_i = ∇_x V_i = (p_{i1}, ..., p_{im}) ∈ IR^m, consider the linearized system

    ∂v_i/∂t + Σ_{j,α} (∂H_i/∂p_{jα})(x, p_1, ..., p_n) · ∂v_j/∂x_α = 0,    i = 1, ..., n,    (3)

where all derivatives are computed at the point (x, p). This is equivalent to (1), with

    (A_α)_{ij} = ∂H_i/∂p_{jα}(x, p_1, ..., p_n).    (4)

We now say that the system (5) is hyperbolic (weakly hyperbolic) on a domain Ω ⊆ IR^{(1+n)m} if, for every (x, p) ∈ Ω, the linearized system (3) is hyperbolic (weakly hyperbolic, respectively).

To understand why the hyperbolicity condition fails in a generic multi-dimensional situation, consider, for example, a two-player game on IR^m. In the scalar case, we have seen that the 2 × 2 system of Hamilton-Jacobi equations is not hyperbolic if the gradients of the value functions have opposite signs. In the multi-dimensional case, whenever ∇V_1, ∇V_2 ∈ IR^m are not parallel to each other, we can find a vector ξ such that ∇V_1 · ξ < 0 and ∇V_2 · ξ > 0; see Fig. 2. In this case, the eigenvalues of the corresponding matrix A(ξ) in (2) and (4) are complex, and the system is called elliptic.

Figure 2: Hyperbolicity fails generically in multi-space dimensions.

3.2 Ill-posedness of the Cauchy problem

When the system is not hyperbolic, the Cauchy problem (5) is ill posed. See [16,23] for recent discussions of this subject. Here we give a brief analysis of how vanishing viscosity approximations can fail to converge to a well-defined solution. For more details, see [11]. Consider a two-person non-cooperative differential game in one space dimension, with the simple dynamics

    ẋ = u_1 + u_2,    x(τ) = y,    (5)

and payoff functionals

    J_i = J_i(τ, y, u_1, u_2) = g_i(x(T)) − ∫_τ^T (u_i²/2) dt,    i = 1, 2.
Here u_i is the control implemented by the i-th player, while g_i is his terminal payoff. Let V_1, V_2 be the corresponding value functions, and call p_1 := V_{1,x} and p_2 := V_{2,x} their spatial derivatives. The corresponding optimal feedback control u_i* for the i-th player is

    u_i*(p_i) = arg max_ω {p_i · ω − ω²/2} = p_i,    (6)

and the Hamiltonian functions are

    H_i(p_1, p_2) = (p_1 + p_2) p_i − p_i²/2,    i = 1, 2.

Therefore p = (p_1, p_2) satisfies a 2 × 2 system of conservation laws, solved backward in time:

    p_{i,t} + H_i(p_1, p_2)_x = 0,    p_i(T, x) = g_{i,x}(x).

Setting τ = T − t, and still using t as the time variable, we obtain a more standard Cauchy problem, to be solved forward in time:

    p_{1,t} − (p_1²/2 + p_1 p_2)_x = 0,
    p_{2,t} − (p_2²/2 + p_1 p_2)_x = 0,    (7)

with the initial data

    p_1(0, x) = g_{1,x}(x),    p_2(0, x) = g_{2,x}(x).    (8)

The system (7) can be written in quasi-linear form:

    p_t − A(p) p_x = 0,    A(p) := | p_1 + p_2    p_1       |    (9)
                                   | p_2          p_1 + p_2 |

The eigenvalues of the matrix A(p) are real if p_1 p_2 ≥ 0, and complex if p_1 p_2 < 0. Throughout the following, we focus our attention on solutions with p_1 p_2 < 0, so that hyperbolicity fails. As a first step, we add a small viscosity and consider the parabolic system

    p_t^ε − A(p^ε) p_x^ε = ε p_xx^ε.    (10)

This system is related to a stochastic differential game with dynamics

    dx = (u_1 + u_2) dt + ε dω,

where ω denotes a standard Brownian motion, as in [18]. Observe that p^ε = (p_1^ε, p_2^ε) provides a solution to (10) if and only if

    p^ε(t, x) = p(t/ε, x/ε),

where p = (p_1, p_2) solves the system with unit viscosity

    p_{1,t} − (p_1²/2 + p_1 p_2)_x = p_{1,xx},
    p_{2,t} − (p_2²/2 + p_1 p_2)_x = p_{2,xx}.    (11)

To achieve an understanding of solutions of (10), it thus suffices to study the system (11). An interesting class of solutions of (11) are the traveling waves, having the form p(t, x) = P(x − σt). The function P : IR → IR² must then satisfy the second-order O.D.E.
    P'' = −[A(P) + σI] P',    (12)

where A = DH is the Jacobian matrix in (9) and I denotes the 2 × 2 identity matrix. Integrating (12) once, we obtain

    P' = (H(P̄) + σP̄) − (H(P) + σP),

where P̄ = (p̄_1, p̄_2) is some constant vector. We are particularly interested in periodic solutions of the O.D.E.

    p_1' = (p̄_1 p̄_2 + p̄_1²/2) − (p_1 p_2 + p_1²/2) − σ(p_1 − p̄_1),
    p_2' = (p̄_1 p̄_2 + p̄_2²/2) − (p_1 p_2 + p_2²/2) − σ(p_2 − p̄_2),    (13)

taking values inside the elliptic region where p_1 p_2 < 0. Linearizing (13) at the equilibrium point (p̄_1, p̄_2), one gets

    z_1' = −(p̄_1 + p̄_2 + σ) z_1 − p̄_1 z_2,
    z_2' = −p̄_2 z_1 − (p̄_1 + p̄_2 + σ) z_2.    (14)

Notice that, if one chooses σ = σ̄ := −p̄_1 − p̄_2, then the two eigenvalues

    λ_1, λ_2 = −(p̄_1 + p̄_2 + σ) ± i √(−p̄_1 p̄_2)

are purely imaginary. By the Hopf bifurcation theorem [28], for every δ > 0 sufficiently small there exists a value σ = σ(δ) ≈ σ̄ such that the corresponding system (13) has a periodic orbit passing through the point (p̄_1 + δ, p̄_2). In this way, we obtain a family of periodic orbits for the system (13), depending on the parameters p̄_1, p̄_2, and δ. If s → (p_1(s), p_2(s)) is any such orbit, then

    (p_1(t, x), p_2(t, x)) := (p_1(x − σt), p_2(x − σt))    (15)

yields a solution of the parabolic system (11) in the form of a periodic traveling wave. In turn, the functions

    (p_1^ε(t, x), p_2^ε(t, x)) = (p_1((x − σt)/ε), p_2((x − σt)/ε))    (16)

provide a solution to the system (10) with small viscosity. We now recall that, by (6), the corresponding dynamics of the system is

    ẋ(t) = u_1* + u_2* = p_1((x − σt)/ε) + p_2((x − σt)/ε).

In our construction, p_1 + p_2 ≈ p̄_1 + p̄_2 = −σ̄ ≈ −σ. As the viscosity parameter ε → 0+, along each trajectory the controls (u_1*, u_2*) = (p_1^ε, p_2^ε) are periodic functions of time, with fixed amplitude and with period approaching zero. Because of this oscillatory behavior, there is no strong limit in L¹.
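These spectral claims are easy to check numerically. The sketch below (with illustrative values only) verifies that A(p) in (9) has complex eigenvalues in the elliptic region p_1 p_2 < 0, and that the linearization (14) at σ = σ̄ has purely imaginary eigenvalues, which is precisely the setting for the Hopf bifurcation above.

```python
import numpy as np

p1, p2 = 1.0, -0.5           # a point in the elliptic region, p1*p2 < 0

# Jacobian A(p) of the flux in (9): eigenvalues (p1+p2) +- sqrt(p1*p2).
A = np.array([[p1 + p2, p1],
              [p2,      p1 + p2]])
eigA = np.linalg.eigvals(A)
print(eigA)                   # a complex-conjugate pair, since p1*p2 < 0

# Linearization (14) of the traveling-wave ODE, with sigma = -(p1+p2):
sigma = -(p1 + p2)
c = p1 + p2 + sigma           # = 0 for this choice of sigma
L = np.array([[-c,  -p1],
              [-p2, -c]])
eigL = np.linalg.eigvals(L)
print(eigL)                   # purely imaginary pair +- i*sqrt(-p1*p2)
```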
Yet, a weak limit exists and can be represented in terms of Young measures [30]. These oscillatory limits can be interpreted as chattering feedback controls. The limit trajectories cover the whole t-x plane. They all have the same constant speed, determined by the weak limit of p_1^ε + p_2^ε. A further analysis in [11] shows that these viscous traveling waves have almost the same instability properties as the constant states.

3.3 Transition from Nash equilibrium to a Pareto optimum

The eventual conclusion of our analysis is that, except for the one-dimensional case, the concept of Nash non-cooperative equilibrium is not appropriate for the study of games with complete information in continuous time. The highly unstable nature of the solutions makes it impossible to extract useful information from the mathematical model.

In the literature, various approaches have been proposed to overcome this basic difficulty. Following [3], one can study a special class of multi-dimensional games, with linear dynamics and quadratic cost functionals. In this case, the system of Hamilton-Jacobi equations (5) may be ill posed, but one can always find a unique solution within the set of quadratic polynomial functions. An alternative strategy is to add some random noise to the system. This leads to the analysis of a stochastic differential game [18,25], with dynamics described by

    dx = f(x, u_1, ..., u_n) dt + ε dw,

where dw is the differential of a Brownian motion. The corresponding system describing the value functions is now parabolic, and admits a unique smooth solution. However, one should be aware that, as ε → 0, the solutions become more and more unstable, and may not approach a well-defined limit. An entirely different approach was proposed in [11], where the authors explored the possibility of partial cooperation among the players.
To explain the heart of the matter, we first observe that the Hamiltonian functions at (5)–(4) are derived from the following instantaneous optimization problem. Given p_1, ..., p_n ∈ IR^m, the i-th player seeks a control value u_i which maximizes his instantaneous payoff

    Y_i = p_i · (f_0 + Σ_j u_j) − h_i(u_i).    (17)

In the case of two players, the set of possible payoffs (Y_1, Y_2) attainable as (u_1, u_2) range over IR^{2m} corresponds to the shaded region in Fig. 3. The Nash equilibrium strategy produces the payoffs at N, and corresponds to the Hamiltonian in (5)–(4). In this context, it is interesting to examine alternative strategies for the two players, resulting in different Hamiltonian functions.

If full cooperation were possible, then the players would simply choose the strategy that maximizes the sum Y_1 + Y_2 of the two payoffs, i.e., the point C in Fig. 3. In this case, u_1, u_2 can be regarded as components of one single control function. The optimization problem thus fits entirely within the framework of optimal control theory. The only additional issue arising from the differential game is the possible side payment that one of the players should make to the other, to preserve fairness.

Alternatively, the players may choose strategies u_1, u_2 corresponding to a Pareto optimum P; see [1] for basic definitions. In the case where the two players cannot communicate and are not allowed to make side payments, their behavior can still drift away from a Nash equilibrium and approach a Pareto optimum which improves both of their payoffs.

Figure 3: From Nash equilibrium to Pareto optimum.

For a game modeling an iterated prisoner's dilemma, Smale [31] introduced a class of "good" strategies, which induce the other player to cooperate. Asymptotically for large times, the outcome of the game thus drifts away from the Nash equilibrium, approaching a Pareto optimum.
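For a concrete feel of the gap between N and C, take quadratic costs h_i(u) = u²/2 with f_0 = 0 and scalar controls (illustrative choices, not from the paper). Each player's best reply in (17) is u_i = p_i, giving the Nash point N, while maximizing Y_1 + Y_2 gives u_1 = u_2 = p_1 + p_2, the point C; a short computation shows the combined payoff at C exceeds that at N by exactly (p_1² + p_2²)/2.

```python
def payoffs(p1, p2, u1, u2):
    """Instantaneous payoffs (17) with f0 = 0 and quadratic costs h_i(u) = u**2/2."""
    Y1 = p1 * (u1 + u2) - u1**2 / 2
    Y2 = p2 * (u1 + u2) - u2**2 / 2
    return Y1, Y2

p1, p2 = 1.0, -0.5

# Nash point N: each player maximizes his own Y_i, so u_i = p_i.
Y1n, Y2n = payoffs(p1, p2, p1, p2)

# Cooperative point C: maximize Y1 + Y2, so u1 = u2 = p1 + p2.
Y1c, Y2c = payoffs(p1, p2, p1 + p2, p1 + p2)

gain = (Y1c + Y2c) - (Y1n + Y2n)
print(gain, (p1**2 + p2**2) / 2)   # the two numbers agree: 0.625
```

The gain is strictly positive for every (p_1, p_2) ≠ (0, 0), so in this model case full cooperation always improves the combined payoff over the Nash point.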
It is remarkable that these strategies do not require any direct communication among the players. The same ideas are appropriate also in continuous time.

Real-life situations where such a transition happens can often be observed. For example, suppose you were invited to a party. You went there, and it turned out to be very boring. You wished you could leave, but it would look bad to be the first to leave, so you stayed. Many of the other guests had the same thoughts, and they stayed too. This is a Nash equilibrium of a non-cooperative game: you were not discussing leaving with the other guests, but everybody had the same intention. Then, if someone eventually got up and approached the host, many other guests would join him, so the first person would not look bad for being the first one to leave. Suddenly, everyone was leaving. This is a Pareto optimum. The key to the transition is the willingness of each player to cooperate, even without actual cooperation. A full description of such a transition in differential games can be found in [11].

To illustrate the main ideas, let (U_1^N, U_2^N) and (Y_1^N, Y_2^N) be the Nash strategies and gains of the two players, and let (U_1^P, U_2^P) and (Y_1^P, Y_2^P) be the strategies and gains at some Pareto optimum that both players consider fair. A strategy for player 1 is called a good strategy if the following three conditions hold.

(C1) If the gain of the first player is smaller than what he gets by playing the Nash strategy, then he leans back toward U_1^N.
(C2) If the second player is gaining more than his due profit Y_2^P, then the first player should again lean back toward U_1^N.
(C3) If the second player is cooperating, then the first player should approach the Pareto strategy.

Notice that the first two conditions say that player 1 should play "tough" whenever the other player is taking advantage of him.
The last condition implies that he should play “soft” when the game goes in his favor or when the other player is cooperating. For a given Pareto optimum (U1P , U2P ), the definition of a good strategy for player 2 is completely analogous. One can now envision a situation where each player adopts a partially cooperative strategy, based on the behavior of the other player. In [11], we showed that if both players adopt good strategies, then the outcome of the game will approach the Pareto optimum. 3.4 A smart anti-cheating strategy Assume that the first player adopts a good strategy, expecting the payoffs (Y1P , Y2P ). It is possible for the second player to “out-smart” him and gain more than his fair share Y2P . Indeed, assume p1 , p2 > 0, and two players are semicooperating, so the system is moving toward the Pareto optimum P. See Fig. 4. B E P F D G N Y2 Y1 A Figure 4: A possible “cheating cycle” by player 2. At a point D, close to P, player 2 can quickly change his control, moving close to the Nash equilibrium. The payoffs (Y1 , Y2 ) will move from Y D to Y E , favoring the second player. Of course player 1 doesn’t like it, and will move towards his Nash equilibrium. As a consequence, the payoff of the second player will now decrease. When the payoffs reach a point Y E such that Y2F = Y2P , player 2 decides to cooperate again, setting his control back to U2P , reaching the pointing G. Now the new payoffs (Y1G , Y2G ) are in favor of the first player, so he decides to cooperate, and the system is moving toward the Pareto optimum. The whole cycle can then be repeated. Notice now, in this cycle, player 2 can make the transition from D to E, and from F to G very quickly. Along the arc from E to F, player 2 gains more than his Non-Cooperative and Semi-Cooperative Differential Games 103 fair share Y2P , while along the arc from G to D, player 2 gains less than Y2P . 
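The response rule (C1)–(C3), combined with the anti-cheating idea of Section 3.4 (retreat to Nash quickly, restore cooperation slowly), can be caricatured as a discrete-time update of player 1's control. The function name, the threshold form of the conditions, and the two relaxation rates are illustrative assumptions, not the continuous-time formulation of [11]:

```python
def good_strategy_step(u1, y1, y2, u1_nash, u1_pareto,
                       y1_nash, y2_pareto,
                       tough_rate=0.5, soft_rate=0.05):
    """One update of player 1's control in a hypothetical
    discrete-time version of a good strategy."""
    # (C1): player 1 earns less than his Nash payoff, or
    # (C2): player 2 earns more than his fair share Y_2^P.
    # In either case play "tough": move quickly toward the Nash control.
    if y1 < y1_nash or y2 > y2_pareto:
        return u1 + tough_rate * (u1_nash - u1)
    # (C3): player 2 is cooperating, so approach the Pareto control,
    # but slowly: restoring cooperation too fast invites the cheating
    # cycle of Fig. 4.
    return u1 + soft_rate * (u1_pareto - u1)

# If player 2 grabs more than Y_2^P, player 1 snaps halfway back
# toward his Nash control in a single step:
u = good_strategy_step(u1=2.0, y1=1.0, y2=3.0, u1_nash=1.0,
                       u1_pareto=2.0, y1_nash=1.5, y2_pareto=2.0)
```

The asymmetry tough_rate >> soft_rate is the essential design choice: it shortens the arc where the cheater profits and lengthens the arc where he loses.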
If the time spent along the first arc is long compared with the time spent along the latter, this strategy will be profitable for player 2 in the long run. Based on this analysis, player 1 can design a counter-strategy, making his good strategy a smart anti-cheating one. In order to discourage the above behavior, player 1 should go back to his Nash strategy quickly when the other player is not cooperating, and approach the Pareto optimum slowly in the cooperative case. In other words, if player 2 tries to cheat, player 1 should not be too quick in restoring cooperation. By doing so, the long-run gain of player 2 will not exceed his fair share Y_2^P. Again, see [11] for details.

3.5 Weak hyperbolicity of the semi-cooperative games

If, for every given (p_1, p_2), the players choose control values u_i(p_1, p_2) which yield some Pareto optimum, we say that their strategy is semi-cooperative. When these strategies are implemented, the value functions satisfy a different system of Hamilton-Jacobi equations. Consider a two-person game with dynamics ẋ = u_1 + u_2. The instantaneous gain functionals Y_1, Y_2 become
\[
Y_1(p_1, p_2, u_1, u_2) = p_1 \cdot (u_1 + u_2) - h_1(u_1), \qquad
Y_2(p_1, p_2, u_1, u_2) = p_2 \cdot (u_1 + u_2) - h_2(u_2).
\]
Let U_i^P(p_1, p_2, s) denote the Pareto optimum that maximizes the combined payoff sY_1 + Y_2 for some s > 0, and write
\[
Y_i^P(p_1, p_2, s) \doteq Y_i\big(p_1, p_2,\; U_1^P(p_1, p_2, s),\; U_2^P(p_1, p_2, s)\big).
\]
Assume that the players adopt feedback strategies of the form
\[
u_1 = u_1^*(p_1, p_2), \qquad u_2 = u_2^*(p_1, p_2).
\]
Then the value functions V_1, V_2 satisfy the system of Hamilton-Jacobi equations
\[
V_{1,t} + H_1(\nabla_x V_1, \nabla_x V_2) = 0, \qquad
V_{2,t} + H_2(\nabla_x V_1, \nabla_x V_2) = 0,
\]
with
\[
H_1(p_1, p_2) = Y_1\big(p_1, p_2, u_1^*(p_1, p_2), u_2^*(p_1, p_2)\big), \qquad
H_2(p_1, p_2) = Y_2\big(p_1, p_2, u_1^*(p_1, p_2), u_2^*(p_1, p_2)\big).
\]
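For quadratic costs h_i(u) = u^2/2 (again an illustrative assumption) and scalar controls, the Pareto controls U_i^P and the Hamiltonians H_1, H_2 above can be written down explicitly, and the eigenvalues of the Jacobian of (H_1, H_2) with respect to (p_1, p_2) can be inspected numerically:

```python
import numpy as np

def pareto_controls(p1, p2, s):
    # Maximizing s*Y1 + Y2 with h_i(u) = u**2/2 gives the first-order
    # conditions  s*p1 + p2 = s*u1  and  s*p1 + p2 = u2.
    return p1 + p2 / s, s * p1 + p2

def hamiltonians(p1, p2, s):
    u1, u2 = pareto_controls(p1, p2, s)
    H1 = p1 * (u1 + u2) - u1**2 / 2
    H2 = p2 * (u1 + u2) - u2**2 / 2
    return H1, H2

def jacobian(p1, p2, s, eps=1e-6):
    # Central finite-difference Jacobian of (H1, H2) w.r.t. (p1, p2);
    # exact up to rounding here, since H1, H2 are quadratic in (p1, p2).
    J = np.empty((2, 2))
    for j, (d1, d2) in enumerate([(eps, 0.0), (0.0, eps)]):
        Hp = hamiltonians(p1 + d1, p2 + d2, s)
        Hm = hamiltonians(p1 - d1, p2 - d2, s)
        J[0, j] = (Hp[0] - Hm[0]) / (2 * eps)
        J[1, j] = (Hp[1] - Hm[1]) / (2 * eps)
    return J

# The discriminant (tr J)^2 - 4 det J vanishes: the two eigenvalues
# are real and coincide, i.e., the system is weakly hyperbolic.
J = jacobian(0.7, -1.3, s=2.0)
disc = np.trace(J)**2 - 4 * np.linalg.det(J)
```

In this example the discriminant is zero at every (p_1, p_2, s) tried, consistent with the general result recalled below from [11]; the sketch is a numerical check, not a proof.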
A remarkable fact, proved in [11], is that the above system is always weakly hyperbolic, for a very general class of strategies u_i^*(p_1, p_2), under the only assumption that they achieve Pareto optima (Y_1^P, Y_2^P). In particular, this includes the semi-cooperative strategy (u_1^*, u_2^*) = (U_1^P, U_2^P) considered in this paper. Weak hyperbolicity means that the Jacobian matrix of the Hamiltonian functions has real (possibly coinciding) eigenvalues. The result holds in both the one-dimensional and the multi-dimensional cases. This looks promising in connection with the Cauchy problem (5) for the value functions. Indeed, our semi-cooperative solutions will not experience the severe instabilities of the Nash solutions. It is thus expected that some existence theorem should hold in greater generality.

REFERENCES

[1] J.P. Aubin. Mathematical Methods of Game and Economic Theory, North-Holland, Amsterdam, 1979.
[2] M. Bardi and I. Capuzzo Dolcetta. Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations, Birkhäuser, Boston, 1997.
[3] T. Basar and G.J. Olsder. Dynamic Non-cooperative Game Theory, 2nd edition, Academic Press, London, 1995.
[4] S. Bianchini and A. Bressan. Vanishing viscosity solutions of nonlinear hyperbolic systems, Annals of Mathematics, to appear.
[5] V.G. Boltianskii. Sufficient conditions for optimality and the justification of the dynamic programming principle, SIAM J. Control Optim. 4 (1966), 326–361.
[6] A. Bressan. Unique solutions for a class of discontinuous differential equations, Proc. Amer. Math. Soc. 104 (1988), 772–778.
[7] A. Bressan. Hyperbolic Systems of Conservation Laws. The One-Dimensional Cauchy Problem, Oxford University Press, Oxford, 2000.
[8] A. Bressan, T.P. Liu and T. Yang. L1 stability estimates for n×n conservation laws, Arch. Rational Mech. Anal. 149 (1999), 1–22.
[9] A. Bressan and F. Priuli. Infinite horizon noncooperative differential games, J. Diff. Eq. 227 (2006), no. 1, 230–257.
[10] A. Bressan and W. Shen. Small BV solutions of hyperbolic noncooperative differential games, SIAM J. Control Optim. 43 (2004), 194–215.
[11] A. Bressan and W. Shen. Semi-cooperative strategies for differential games, Int. J. Game Th. 32 (2004), 561–593.
[12] P. Brunovsky. Existence of regular syntheses for general problems, J. Diff. Eq. 38 (1980), 317–343.
[13] P. Cardaliaguet and S. Plaskacz. Existence and uniqueness of a Nash equilibrium feedback for a simple non-zero-sum differential game, Int. J. Game Th. 32 (2003), 33–71.
[14] G.Q. Chen and A. Rustichini. The Riemann solution to a system of conservation laws, with application to a nonzero sum game, Contemp. Math. 100 (1988), 287–297.
[15] F.H. Clarke. Optimization and Nonsmooth Analysis, Wiley, New York, 1983.
[16] H. Frid and I.S. Liu. Oscillation waves in Riemann problems inside elliptic regions for conservation laws of mixed type, Z. Angew. Math. Phys. 46 (1995), 913–931.
[17] A. Friedman. Differential Games, Wiley-Interscience, New York, 1971.
[18] A. Friedman. Stochastic differential games, J. Diff. Eq. 11 (1972), 79–108.
[19] J. Glimm. Solutions in the large for nonlinear hyperbolic systems of equations, Comm. Pure Appl. Math. 18 (1965), 697–715.
[20] O. Hajek. Discontinuous differential equations I, J. Diff. Eq. 32 (1979), 149–170.
[21] T. Iguchi and P.G. LeFloch. Existence theory for hyperbolic systems of conservation laws with general flux-functions, Arch. Rational Mech. Anal. 168 (2003), 165–244.
[22] R. Isaacs. Differential Games, Wiley, New York, 1965.
[23] H.O. Kreiss and J. Yström. Parabolic problems which are ill-posed in the zero dissipation limit, Math. Comput. Model. 35 (2002), 1271–1295.
[24] T.P. Liu. Admissible solutions of hyperbolic conservation laws, Amer. Math. Soc. Memoir 240 (1981).
[25] P. Mannucci. Nonzero sum stochastic differential games with discontinuous feedback, SIAM J. Control Optim. 43 (2004), 1222–1233.
[26] J. Nash. Non-cooperative games, Ann. Math. (2) 54 (1951), 286–295.
[27] G.J. Olsder. On open- and closed-loop bang-bang control in nonzero-sum differential games, SIAM J. Control Optim. 40 (2001), 1087–1106.
[28] L. Perko. Differential Equations and Dynamical Systems, Springer-Verlag, 1991.
[29] B. Piccoli and H.J. Sussmann. Regular synthesis and sufficiency conditions for optimality, SIAM J. Control Optim. 39 (2000), 359–410.
[30] D. Serre. Systems of Conservation Laws I, II, Cambridge University Press, Cambridge, 2000.
[31] S. Smale. The prisoner's dilemma and dynamical systems associated to noncooperative games, in The Collected Papers of Stephen Smale, F. Cucker and R. Wong, Eds., World Scientific, 2000.
[32] E.M. Vaisbord and V.I. Zhukovskii. Introduction to Multi-player Differential Games and their Applications, Gordon and Breach Science Publishers, 1988.