Distributed Seeking of Time-Varying Nash Equilibrium for Non-Cooperative Games

Maojiao Ye and Guoqiang Hu

Abstract—In this note, we address a Nash equilibrium seeking problem for non-cooperative games. In contrast to previous works on Nash equilibrium seeking, the Nash equilibrium under consideration can be time-varying. A non-model-based seeking scheme is proposed to achieve time-varying Nash equilibrium seeking, in which each player updates its strategy by employing an extremum seeking method. The proposed Nash seeking scheme consists of a gradient estimation algorithm and a gradient search algorithm, which can be designed in a modular fashion. For symmetric quadratic games, the proposed Nash equilibrium seeking method enables the estimated strategy to globally asymptotically converge to the Nash equilibrium. For general quadratic games that are not necessarily symmetric, the estimated strategy converges to a neighborhood of the Nash equilibrium. For more general non-quadratic games that may admit multiple equilibria, local convergence to a neighborhood of the Nash equilibrium is proven.

Index Terms—Extremum seeking, non-cooperative games, time-varying Nash equilibrium seeking.

I. INTRODUCTION

Non-cooperative games have been widely applied to engineering systems subject to limited resources (e.g., see [1]–[7]). Related problems include resource allocation [1], power control in networks [2], defense strategy design for networks [3], intrusion detection [4], charging coordination among plug-in electric vehicles [5], energy management [6], [7], and so on. For the problems modeled by non-cooperative games mentioned above, an effective analysis tool is the Nash equilibrium concept.

Several approaches have been proposed in the literature to achieve Nash equilibrium seeking (e.g., see [8]–[15]). However, most of the existing methods, including best response [8], fictitious play [10], hypothesis testing [9], and regret testing [11], need model information [23] and consider only a time-invariant Nash equilibrium. Motivated by the desire to weaken the dependence on model information, the extremum seeking control (ESC) method [16]–[18] has been employed for Nash equilibrium seeking [19]–[23]. In [19], [20], a Nash seeking algorithm based on ESC and sliding mode control was designed to drive the system to the sliding boundary layer and keep it there thereafter. A multi-input ESC with stochastic perturbations was introduced in [21] to improve the performance of ESC for Nash equilibrium seeking. Based on the fact that determining a Nash equilibrium is equivalent to solving a bilinear non-convex optimization problem, an approach for bimatrix games was developed in [22].
In [23], averaging theory and a singular perturbation method were employed to enable local convergence to the Nash equilibrium based on first-order ESC. However, the existing works consider only a time-invariant Nash equilibrium.

In this note, we consider a time-varying Nash equilibrium seeking problem for N-player non-cooperative games without explicit model information, where each player's objective is to maximize its own payoff. A non-model-based seeking scheme is proposed to achieve time-varying Nash equilibrium seeking. The strategy of each player is updated by a new extremum seeking scheme, in which a gradient estimation algorithm and a gradient search algorithm are designed to ensure convergence to the extremum. With the proposed seeking scheme, the players are able to track the time-varying Nash equilibrium trajectory without explicit model information. In comparison with previous works on Nash equilibrium seeking, the main contributions of this note can be summarized as follows.

1) The N-player non-cooperative games under consideration admit a time-varying Nash equilibrium and time-varying payoff values. The proposed seeking scheme requires explicit model information neither on the Nash equilibrium trajectory nor on the payoff value at the Nash equilibrium.

2) The proposed seeking method consists of a gradient estimation part and a gradient search part that can be designed separately. This modular design provides more freedom in designing each part. For example, the averaging method that is usually necessary for extremum seeking analysis is not required, and methods from robust control can be leveraged to design the gradient search part.

3) The developed ESC-based seeking method enables the estimated strategy to converge to a neighborhood of the time-varying Nash equilibrium for both quadratic and non-quadratic games.

II. MOTIVATING EXAMPLE

The Nash equilibrium may be time-varying. Take the classical Cournot quantity game as an example. The participants of the game are two firms producing the same product, the strategies of the players are the quantities they produce, and the payoffs of the players are the profits they gain. In this game setting, the price of the product is p(Tot) = a(t) − Tot, where a(t) is a time-varying factor and Tot is the total quantity of product that the two firms produce, i.e., Tot = q_1 + q_2, where q_i, i ∈ {1, 2}, denotes the quantity produced by firm i. The profit of each player is Q_i = p(Tot)q_i − c_i(t)q_i, i ∈ {1, 2}, where c_i(t) represents the marginal cost, which may be time-varying. Based on the two profit functions, the Nash equilibrium of this game is given by (q*_1, q*_2) = ((2a_1(t) − a_2(t))/3, (−a_1(t) + 2a_2(t))/3), where a_1(t) = a(t) − c_1(t) and a_2(t) = a(t) − c_2(t). The Nash equilibrium is therefore time-varying.
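As a quick sanity check of the Cournot derivation above, the following Python sketch verifies that the stated closed form zeroes both first-order conditions at a sample time. The specific signals a(t), c1(t), c2(t) are illustrative assumptions; the note leaves them unspecified.

```python
import numpy as np

# Check the closed form (q1*, q2*) = ((2a1 - a2)/3, (-a1 + 2a2)/3),
# with a1 = a - c1 and a2 = a - c2, against the first-order conditions.
# The coefficient signals below are illustrative assumptions.
def a(t):  return 10.0 + np.sin(0.1 * t)
def c1(t): return 1.0 + 0.5 * np.cos(0.1 * t)
def c2(t): return 2.0

def nash(t):
    a1, a2 = a(t) - c1(t), a(t) - c2(t)
    return (2.0 * a1 - a2) / 3.0, (-a1 + 2.0 * a2) / 3.0

t = 3.7
q1, q2 = nash(t)
dQ1 = a(t) - 2.0 * q1 - q2 - c1(t)   # dQ1/dq1 for Q1 = (a - q1 - q2)q1 - c1*q1
dQ2 = a(t) - q1 - 2.0 * q2 - c2(t)   # dQ2/dq2 for Q2 = (a - q1 - q2)q2 - c2*q2
assert abs(dQ1) < 1e-9 and abs(dQ2) < 1e-9
```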
While most Nash equilibrium seeking algorithms need model information for implementation, ESC is a promising method for Nash equilibrium seeking without model information. Each player's update law under the first-order ESC-based Nash seeking scheme is [23]

\[
\dot{\hat q}_i = m_i \sin(\omega_i t)\, Q_i, \qquad q_i = \hat q_i + a_i \sin(\omega_i t), \qquad i \in \{1, 2\},
\]

where m_i is the integrator gain and a_i and ω_i are the amplitude and frequency of the dither signal of player i, respectively. However, the first-order ESC-based Nash equilibrium seeking scheme suffers from degraded performance if it is directly applied to the case of a time-varying Nash equilibrium. The simulation results generated by the first-order ESC method and by the proposed method are shown in Figs. 1 and 2, respectively. The trajectory generated by the seeking strategy built on first-order ESC displays a bounded error and large chattering in the time-varying case, whereas the trajectory generated by the proposed method displays much smaller error and much less chattering.

Fig. 1. Black solid and black dashed lines denote q*_1 and q*_2, respectively. The blue and red dashed lines denote q̂_1(t) and q̂_2(t) generated by the first-order ESC-based Nash seeking method, respectively.

Fig. 2. Black solid and black dashed lines denote q*_1 and q*_2, respectively. The blue and red dashed lines denote q̂_1(t) and q̂_2(t) generated by the proposed ESC-based Nash seeking method, respectively.
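For concreteness, here is a minimal Euler simulation sketch of the first-order ESC baseline above on the Cournot example, of the kind behind Fig. 1. All gains, dither parameters, and the time-varying coefficients are illustrative assumptions, not the values used in the note.

```python
import numpy as np

# First-order ESC baseline [23] on the Cournot game; illustrative values.
dt, T = 1e-3, 100.0
m   = np.array([5.0, 5.0])       # integrator gains m_i
amp = np.array([0.1, 0.1])       # dither amplitudes a_i
w   = np.array([30.0, 26.0])     # dither frequencies omega_i (distinct)

def a(t):  return 10.0 + np.sin(0.1 * t)                       # price intercept
def c(t):  return np.array([1.0 + 0.5 * np.cos(0.1 * t), 2.0]) # marginal costs

qhat = np.zeros(2)               # strategy estimates \hat q_i
for k in range(int(T / dt)):
    t = k * dt
    q = qhat + amp * np.sin(w * t)            # played quantities q_i
    profit = (a(t) - q.sum()) * q - c(t) * q  # payoffs Q_i
    qhat += dt * m * np.sin(w * t) * profit   # qhat_dot_i = m_i sin(w_i t) Q_i
# Plotting qhat against the closed-form NE reproduces the bounded error
# and chattering described above.
```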
III. PROBLEM STATEMENT

To facilitate the subsequent design and convergence analysis, several definitions related to the Nash equilibrium seeking problem for non-cooperative games are provided below.

Definition 1: The game under consideration is defined as Γ ≜ {N, (u_i)_{i∈N}, (Q_i)_{i∈N}}, where N is the set of N players, u_i(t) is the strategy of player i, U = {u_i | i ∈ N} ⊂ R^N denotes the strategy space, and Q_i is the payoff function of player i. Define Q_i ≜ Q_i(u_i(t), u_{−i}(t), ς_i(t)), where ς_i(t) is a time-varying unknown vector and u_{−i}(t) denotes the strategies of all players other than player i.

Definition 2: The strategy vector of the game defined in Definition 1 is said to be at the Nash equilibrium if no unilateral change of a player's strategy can increase that player's payoff, in the sense that Q_i(u*_i(t), u*_{−i}(t), ς_i(t)) ≥ Q_i(u_i(t), u*_{−i}(t), ς_i(t)) for all i ∈ N.

To simplify the notation, define û ≜ [û_1, û_2, ..., û_N]^T, u ≜ [u_1, u_2, ..., u_N]^T, u* ≜ [u*_1, u*_2, ..., u*_N]^T, and E_ξ ≜ [ξ_1, ξ_2, ..., ξ_N]^T, where ξ_i = û_i − u*_i(t), u_i = û_i + a_i sin(ω_i t), i ∈ N, and the parameter a_i is assumed to be small and positive.

Problem 1 (Nash equilibrium seeking with time-varying payoff functions): Consider a time-varying payoff function for each player in the game, Q_i(u_i(t), u_{−i}(t), ς_i(t)), i ∈ N. The mapping Q_i(u_i(t), u_{−i}(t), ς_i(t)), the Nash equilibrium trajectory u*(t), and the mapping at the Nash equilibrium, denoted by Q*_i(t) = Q_i(u*_i(t), u*_{−i}(t), ς_i(t)), are all unknown. Design a seeking control scheme such that the strategy vector û(t) tracks the time-varying Nash equilibrium u*(t).

Remark 1: In classical extremum seeking, an effective update law for û_i(t) is the continuous steepest descent method, which enables û_i(t) to converge to a constant optimal point using gradient measurements. However, when a time-varying Nash equilibrium is sought with classical extremum seeking methods, the error and chattering can be large. The simulation result shown in Fig. 1 verifies this point.

Assumption 1: The first three derivatives of the time-varying Nash equilibrium trajectory exist and are bounded, i.e., u̇*(t), ü*(t), u⃛*(t) ∈ L_∞.

Assumption 2: The non-cooperative games under consideration admit at least one Nash equilibrium at which ∂Q_i(u*(t), ς_i(t))/∂u_i = 0 and ∂²Q_i(u*(t), ς_i(t))/∂u_i² < 0 for all t and all i ∈ N.

Assumption 3: The first three partial derivatives of the payoff functions with respect to u_j(t), j ∈ N, exist and are bounded along the Nash equilibrium trajectory u*(t).

Assumption 4: The frequencies of the dither signals are chosen such that the dither signals vary much faster than the payoff functions and their partial derivatives with respect to u_j(t), j ∈ N, at û(t).

Assumption 5: Define

\[
A(t) = \begin{bmatrix}
\frac{\partial^2 Q_1(u^*(t),\varsigma_1)}{\partial u_1^2} & \frac{\partial^2 Q_1(u^*(t),\varsigma_1)}{\partial u_1\partial u_2} & \cdots & \frac{\partial^2 Q_1(u^*(t),\varsigma_1)}{\partial u_1\partial u_N}\\[2pt]
\frac{\partial^2 Q_2(u^*(t),\varsigma_2)}{\partial u_2\partial u_1} & \frac{\partial^2 Q_2(u^*(t),\varsigma_2)}{\partial u_2^2} & \cdots & \frac{\partial^2 Q_2(u^*(t),\varsigma_2)}{\partial u_2\partial u_N}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial^2 Q_N(u^*(t),\varsigma_N)}{\partial u_N\partial u_1} & \frac{\partial^2 Q_N(u^*(t),\varsigma_N)}{\partial u_N\partial u_2} & \cdots & \frac{\partial^2 Q_N(u^*(t),\varsigma_N)}{\partial u_N^2}
\end{bmatrix}.
\]

Then A is invertible, A^T A is bounded and positive definite, and −λ_max(Ak_1 + k_1 A^T) can be made positive and large by tuning a positive diagonal matrix k_1.

Remark 2: Since ∂²Q_i(u*(t), ς_i(t))/∂u_i² < 0 for all i ∈ N, if A and A^T are strictly diagonally dominant, this assumption is easily satisfied, for example, by choosing all the nonzero components of k_1 equal. Even if they are not strictly diagonally dominant, the assumption may still be satisfied.

IV. TIME-VARYING NASH EQUILIBRIUM SEEKING

In this section, we provide a Nash equilibrium tracking scheme as well as the stability analysis of the proposed scheme. The overall schematic outline is shown in Fig. 3. Each player employs the proposed extremum seeking method to update its strategy. In the proposed extremum seeking control scheme, sinusoidal dither signals are used to modulate the players' payoff functions, and a delay-based gradient estimation subsystem and a robust gradient search subsystem are designed for tracking the time-varying Nash equilibrium. In the following subsections, we first consider a special case, namely quadratic games, and then turn our attention to more general non-quadratic games.

Fig. 3. Schematic diagram for time-varying Nash equilibrium seeking.

A. Quadratic Games

The update law for player i in the game is designed as [28]

\[
\dot{\hat u}_i = k_{i1}\mu_i + \Phi_i - c_{i1}\hat u_i \tag{1}
\]
\[
\dot\Phi_i = c_{i2}k_{i1}\mu_i + k_{i2}\,\mathrm{sgn}(\mu_i) \tag{2}
\]

where μ_i(t) is generated by

\[
\mu_i(u_i, u_{-i}, \varsigma_i) = \frac{2}{a_i}\,\frac{1 - e^{-T_i s}}{T_i s}\,\bigl[Q_i(u_i, u_{-i}, \varsigma_i)\sin(\omega_i t)\bigr] \tag{3}
\]

and sgn(μ_i) = 1 if μ_i > 0, sgn(μ_i) = −1 if μ_i < 0, and sgn(μ_i) = 0 if μ_i = 0. The parameters k_{i1}, k_{i2}, c_{i1}, c_{i2} are positive control gains to be determined. The parameter T_i is a common multiple of the periods of the dither signals.

Remark 3: The proposed extremum seeking scheme is based on a modular design in which μ_i(t), defined in (3), is used to extract the gradient. The output of this gradient estimation part is fed into the gradient search part shown in (1) and (2). With this gradient estimation method, the convergence analysis can be conducted without the averaging method, which is the main analysis tool for classical extremum seeking methods.

Remark 4: The actual Nash trajectory is unknown and thus the error signal is unknown. Hence, the estimated gradient is used in the gradient search part, where the function sgn(·) is used to eliminate the effect of certain bounded terms so that asymptotic convergence can be achieved. Furthermore, the function sgn(·) enters the control law through an integrator, so the overall control law is continuous and admits less chattering [25] than discontinuous update laws such as sliding mode controllers.
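To make the scheme concrete, the sketch below implements one player's update law (1)-(2) together with the delay-based gradient estimate (3), realized as a moving average of Q_i sin(ω_i t) over one window T_i, which is the time-domain reading of the washout operator (1 − e^{−T_i s})/(T_i s). The Euler discretization, gain values, and class layout are illustrative assumptions.

```python
import numpy as np
from collections import deque

# One player's update law (1)-(2) driven by the delay-based gradient
# estimate (3). Discretization and structure are a sketch, not the
# note's implementation.
class Player:
    def __init__(self, ai, wi, Ti, k1, k2, c1, c2, dt):
        self.ai, self.wi, self.Ti, self.dt = ai, wi, Ti, dt
        self.k1, self.k2, self.c1, self.c2 = k1, k2, c1, c2
        self.uhat, self.Phi = 0.0, 0.0
        self.window = deque([0.0] * int(round(Ti / dt)))  # stores Q_i sin(w_i t)

    def strategy(self, t):
        # u_i = uhat_i + a_i sin(w_i t): dither-perturbed action to play
        return self.uhat + self.ai * np.sin(self.wi * t)

    def step(self, t, Qi):
        # (3): mu_i = (2 / (a_i T_i)) * integral_{t-T_i}^{t} Q_i sin(w_i tau) dtau
        self.window.append(Qi * np.sin(self.wi * t))
        self.window.popleft()
        mu = (2.0 / (self.ai * self.Ti)) * self.dt * sum(self.window)
        # (1)-(2): robust gradient search driven by the estimated gradient
        self.uhat += self.dt * (self.k1 * mu + self.Phi - self.c1 * self.uhat)
        self.Phi  += self.dt * (self.c2 * self.k1 * mu + self.k2 * np.sign(mu))
```

In closed loop, each player would play strategy(t), observe only its own payoff, and call step(t, Qi); no player needs the other players' payoffs or the game model.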
1) Game Analysis: For general quadratic games, the payoff functions are defined as

\[
Q_i(u_i, u_{-i}, \varsigma_i) = \frac12\sum_{j=1}^N\sum_{k=1}^N p_{ijk}(\varsigma_i)u_j u_k + \sum_{j=1}^N q_{ij}(\varsigma_i)u_j + s_i(\varsigma_i), \quad i \in N,
\]

where p_{ijk}, ṗ_{ijk}, and p̈_{ijk} are all bounded. Taking the partial derivative of Q_i with respect to u_i yields

\[
\frac{\partial Q_i}{\partial u_i}(u, \varsigma_i) = p_{iii}u_i(t) + \sum_{j=1, j\ne i}^N p_{iij}u_j(t) + q_{ii}, \quad i \in N.
\]

By Assumption 2, we have P u* + q = 0, where the matrix P(t) is defined as

\[
P = \begin{bmatrix}
p_{111} & p_{112} & \cdots & p_{11N}\\
p_{221} & p_{222} & \cdots & p_{22N}\\
\vdots & \vdots & & \vdots\\
p_{NN1} & p_{NN2} & \cdots & p_{NNN}
\end{bmatrix}
\]

and q = [q_{11}, q_{22}, ..., q_{NN}]^T is an N × 1 vector.

Lemma 1: Suppose that Assumption 2 holds and the matrix P(t) is invertible for all t. Then the Nash equilibrium of the quadratic game exists and is unique.

Proof: By Assumption 2, if P u* + q = 0 admits a solution, then the Nash equilibrium exists. Since the matrix P(t) is invertible, the Nash equilibrium exists and is unique with u* = −P^{−1}q.

Remark 5: With the uniqueness of the Nash equilibrium, a global stability result can be obtained for quadratic games.

Lemma 2: Suppose that Assumptions 2 and 4 hold and the frequencies of the dither signals are chosen such that ω_i ≠ ω_j, 2ω_i ≠ ω_j, and ω_i ≠ ω_j + ω_k for all i, j, k ∈ N, i ≠ j ≠ k. Then, for a quadratic game, μ_i(t) defined in (3) is related to the error signal by μ_i = Σ_{j=1}^N p_{iij}(ς_i)ξ_j.

Proof: From (3), we have

\[
\mu_i = \frac{2}{a_i}\,\frac{1-e^{-T_i s}}{T_i s}\bigl[Q_i(u_i,u_{-i},\varsigma_i)\sin(\omega_i t)\bigr]
= \frac{2}{a_i T_i}\int_{t-T_i}^{t}\Bigl(\frac12\sum_{j=1}^N\sum_{k=1}^N p_{ijk}(\varsigma_i)u_j(\tau)u_k(\tau) + \sum_{j=1}^N q_{ij}(\varsigma_i)u_j(\tau) + s_i(\varsigma_i)\Bigr)\sin(\omega_i\tau)\,d\tau
\]
\[
= \frac{2}{a_i T_i}\int_{t-T_i}^{t}\sin(\omega_i\tau)\Bigl[\frac12\sum_{j=1}^N\sum_{k=1}^N p_{ijk}(\varsigma_i)\bigl(u^*_j(\tau)u^*_k(\tau) + u^*_j(\tau)\xi_k + a_k u^*_j(\tau)\sin(\omega_k\tau) + \xi_j u^*_k(\tau) + \xi_j\xi_k + a_k\xi_j\sin(\omega_k\tau) + a_j\sin(\omega_j\tau)u^*_k(\tau) + a_j\sin(\omega_j\tau)\xi_k + a_j a_k\sin(\omega_j\tau)\sin(\omega_k\tau)\bigr) + \sum_{j=1}^N q_{ij}(\varsigma_i)\bigl(u^*_j(\tau) + \xi_j + a_j\sin(\omega_j\tau)\bigr) + s_i(\varsigma_i)\Bigr]d\tau. \tag{4}
\]

Based on Assumption 4, the variations of Q_i at û(t) are much slower than sin(ω_i t). Thus, μ_i(u_i, u_{-i}, ς_i) can be written as

\[
\mu_i = \frac{2}{a_i T_i}\int_{t-T_i}^{t}\Bigl(\sum_{j=1}^N p_{iij}(\varsigma_i)\bigl(a_i u^*_j(\tau)\sin^2(\omega_i\tau) + a_i\xi_j\sin^2(\omega_i\tau)\bigr) + q_{ii}(\varsigma_i)a_i\sin^2(\omega_i\tau)\Bigr)d\tau = \sum_{j=1}^N p_{iij}\xi_j.
\]

The detailed calculation leading to the above equation is omitted due to space limitations.

Remark 6: By Assumption 4, the delay-based gradient estimation module functions similarly to the averaging method, in which the relatively slowly varying components are regarded as constants for analysis [27]. Furthermore, the output μ_i of the gradient estimation part equals the gradient (∂Q_i/∂u_i)(û, ς_i) for i ∈ N; a numerical check is sketched after (5) below.

Remark 7: If T_i is not a common multiple of the periods of all dither signals but only a positive integer multiple of the period of player i's own dither signal, then the output of the gradient estimation module will have an approximation error. Through numerical simulation, we observe that the proposed method still works if |ω_i − ω_j|, |2ω_i − ω_j|, and |ω_i − (ω_j + ω_k)|, for i, j, k ∈ N, i ≠ j ≠ k, are not too small. However, the transient performance may be degraded to some extent due to the imperfect gradient estimation.

Substituting the result of Lemma 2 into the update laws (1) and (2), we get

\[
\dot{\hat u} = k_1 P E_\xi + \Phi - c_1\hat u, \qquad \dot\Phi = c_2 k_1 P E_\xi + k_2\,\mathrm{sgn}(P E_\xi) \tag{5}
\]

where Φ = [Φ_1, Φ_2, ..., Φ_N]^T and the bold symbols k_1, k_2, c_1, c_2 denote the diagonal matrices of the control gains k_{i1}, k_{i2}, c_{i1}, c_{i2}, respectively.
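As referenced in Remark 6, the following sketch numerically verifies Lemma 2 on a static two-player quadratic game: the windowed estimate (3), evaluated at a frozen û, should reproduce μ_i = Σ_j p_{iij}ξ_j, i.e., P E_ξ. All game coefficients, amplitudes, and frequencies are illustrative assumptions.

```python
import numpy as np

# Numerical check of Lemma 2 on a static 2-player quadratic game.
p = np.zeros((2, 2, 2))                      # p[i, j, k] with p[i] symmetric
p[0] = np.array([[-2.0, 0.25], [0.25, 0.3]])
p[1] = np.array([[0.3, 0.2], [0.2, -2.0]])
q = np.array([[1.0, 0.5], [0.2, -2.0]])      # q[i, j] = q_ij
P = np.array([p[0, 0], p[1, 1]])             # rows p_iij
ustar = -np.linalg.solve(P, np.diag(q))      # P u* + q = 0, q = [q_11, q_22]
xi = np.array([0.3, -0.2])                   # chosen error E_xi
uhat = ustar + xi

amp = np.array([0.05, 0.05])
w = 2.0 * np.pi * np.array([7.0, 5.0])       # w_i != w_j, 2 w_i != w_j
Ti = 1.0                                     # common multiple of both periods
n = 400000
tau = np.linspace(0.0, Ti, n, endpoint=False)
u = uhat[:, None] + amp[:, None] * np.sin(w[:, None] * tau)

for i in range(2):
    Qi = 0.5 * np.einsum('jk,jt,kt->t', p[i], u, u) + q[i] @ u
    mu = (2.0 / (amp[i] * Ti)) * (Qi * np.sin(w[i] * tau)).sum() * (Ti / n)
    print(mu, P[i] @ xi)                     # the two values should agree
```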
2) Stability Analysis for Quadratic Games: For the stability analysis, define the filtered signal E_η = d(P E_ξ)/dt + c_2 P E_ξ. Then the derivative of E_η(t) is

\[
\dot E_\eta = P\ddot E_\xi + c_2 P\dot E_\xi + c_2\dot P E_\xi + 2\dot P\dot E_\xi + \ddot P E_\xi
= P(\ddot{\hat u} - \ddot u^*(t)) + c_2 P\dot E_\xi + c_2\dot P E_\xi + 2\dot P\dot E_\xi + \ddot P E_\xi
\]
\[
= P\bigl(k_1 E_\eta + k_2\,\mathrm{sgn}(P E_\xi) - c_1\dot u^*(t) - \ddot u^*(t)\bigr) - P c_1\dot E_\xi + c_2 P\dot E_\xi + c_2\dot P E_\xi + 2\dot P\dot E_\xi + \ddot P E_\xi
= P\bigl(k_1 E_\eta + k_2\,\mathrm{sgn}(P E_\xi) - N_c(t)\bigr) + \tilde N(t) \tag{6}
\]

where N_c(t) = c_1 u̇*(t) + ü*(t) and Ñ(t) = (−Pc_1 + 2Ṗ)P^{−1}(E_η − c_2 P E_ξ − Ṗ E_ξ) + c_2(E_η − c_2 P E_ξ) + P̈ E_ξ.

Remark 8: By Assumption 1, u̇*(t), ü*(t), and u⃛*(t) are all bounded. Hence, N_c(t) and Ṅ_c(t) are bounded. For convenience, we denote the upper bounds of ‖N_c(t)‖ and ‖Ṅ_c(t)‖ by U_{Nc} and U_{Ṅc}, respectively. Since P, Ṗ, and P̈ are bounded, there exists a positive constant k such that ‖Ñ(t)‖ ≤ k‖E‖, where E = [E_ξ^T, E_η^T]^T, provided that P^{−1} is bounded.

A Special Class of Quadratic Games: The subsequent analysis considers a special case in which the following assumption is satisfied.

Assumption 6: The matrix P is symmetric negative definite.

Remark 9: Under this assumption, the game can be shown to be a potential game [26].

To facilitate the subsequent stability analysis of the closed-loop system, define an auxiliary function H(t) as

\[
H = \lambda_{\min}(k_2)\bigl\|P(0)E_\xi(0)\bigr\|_1 - E_\xi^T(0)P^T(0)N_c(0) - \int_0^t E_\eta^T\bigl[N_c(\tau) - k_2\,\mathrm{sgn}(P E_\xi)\bigr]d\tau. \tag{7}
\]

Lemma 3: Suppose that c_2 U_{Nc} + U_{Ṅc} − λ_min(c_2 k_2) ≤ 0 and U_{Nc} − λ_min(k_2) ≤ 0. Then H ≥ 0 under Assumption 1.

Proof: The proof follows from an analysis similar to that in [25].

Theorem 1: Suppose that Assumptions 1–4 and Assumption 6 hold, P^{−1} is bounded and differentiable, the control gain k_2 is selected such that H(t) ≥ 0, c_2 and k_1 are sufficiently large, and ω_i ≠ ω_j, 2ω_i ≠ ω_j, and ω_i ≠ ω_j + ω_k for all i, j, k ∈ N with i ≠ j ≠ k. Then the estimated strategy û(t) globally asymptotically converges to the time-varying Nash equilibrium.

Proof: Noting that −P^{−1} is bounded and positive definite, define V = (m/2)E_ξ^T P^T P E_ξ − (1/2)E_η^T P^{−1} E_η + H with m > 0 and X = [E_ξ^T, E_η^T, √H]^T. Then β_1‖X‖² ≤ V ≤ β_2‖X‖², where β_1 and β_2 are positive constants. Furthermore,

\[
\dot V = mE_\xi^T P^T\frac{d(P E_\xi)}{dt} - E_\eta^T P^{-1}\dot E_\eta - \frac12 E_\eta^T\frac{dP^{-1}}{dt}E_\eta + \dot H
\]
\[
= mE_\xi^T P^T(E_\eta - c_2 P E_\xi) - E_\eta^T P^{-1}\tilde N(t) + E_\eta^T\bigl(-k_1 E_\eta - k_2\,\mathrm{sgn}(P E_\xi) + N_c(t)\bigr) - E_\eta^T\bigl[N_c(t) - k_2\,\mathrm{sgn}(P E_\xi)\bigr] - \frac12 E_\eta^T\frac{dP^{-1}}{dt}E_\eta. \tag{8}
\]

Since ‖Ñ(t)‖ ≤ k‖E‖,

\[
\dot V \le -m\Bigl(\lambda_{\min}(c_2)\lambda_{\min}(P^T P) - \frac12\|P^T\|\Bigr)\|E\|^2 - \Bigl(\lambda_{\min}(k_1) - m\lambda_{\min}(c_2)\lambda_{\min}(P^T P) - \frac12\lambda_{\min}\Bigl(\frac{dP^{-1}}{dt}\Bigr)\Bigr)\|E_\eta\|^2 + k\|P^{-1}\|\|E\|\|E_\eta\|
\le -mk_x\|E\|^2 + \frac{k^2\|P^{-1}\|^2}{4k_y}\|E\|^2
\]

where k_x and k_y are defined in the subsequent remark, and the conclusion follows by choosing the control gains such that k_x, k_y > 0 and k_z = mk_x − k²‖P^{−1}‖²/(4k_y) > 0.

Remark 10: In this theorem, the parameters should be chosen such that H(t) > 0, k_x = λ_min(c_2)λ_min(P^T P) − (1/2)‖P^T‖ > 0, k_y = λ_min(k_1) − mλ_min(c_2)λ_min(P^T P) − (1/2)λ_min(dP^{−1}/dt) > 0, and k_z > 0. These conditions can be met by choosing c_2, k_1, and k_2 large enough.
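The gain conditions in Remark 10 are straightforward to check numerically for a given P. The sketch below evaluates k_x, k_y, k_z for a sample constant symmetric negative definite P (so the dP^{−1}/dt term vanishes), with scalars standing in for the diagonal gain matrices and an assumed value for the bound constant k of Remark 8, which is unknown in general. All numbers are illustrative.

```python
import numpy as np

# Feasibility check of the gain conditions in Remark 10; illustrative values.
P = np.array([[-2.0, 0.5], [0.5, -2.0]])   # constant symmetric negative definite
m, c2, k1 = 1.0, 3.0, 8.0                  # scalar stand-ins for diagonal gains
kbar = 2.0                                 # assumed bound k on ||N~|| (Remark 8)

lam = np.linalg.eigvalsh(P.T @ P).min()            # lambda_min(P^T P)
Pnorm = np.linalg.norm(P.T, 2)                     # ||P^T||
Pinv = np.linalg.norm(np.linalg.inv(P), 2)         # ||P^{-1}||
kx = c2 * lam - 0.5 * Pnorm
ky = k1 - m * c2 * lam                             # dP^{-1}/dt = 0 here
kz = m * kx - (kbar**2 * Pinv**2) / (4.0 * ky)
print(kx, ky, kz)                                  # all should come out positive
```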
More General Quadratic Games: In Theorem 1, we assumed that the matrix P(t) is symmetric negative definite, which yields a globally asymptotically stable result. If Assumption 6 is not satisfied, a uniformly ultimately bounded result can be obtained under a milder condition. Noting that for quadratic games P(t) = A(t), with A(t) defined in Assumption 5, the following conclusion can be derived.

Theorem 2: Suppose that Assumptions 1–5 hold, the matrix P^{−1} is bounded, and ω_i ≠ ω_j, 2ω_i ≠ ω_j, and ω_i ≠ ω_j + ω_k for all i, j, k ∈ N with i ≠ j ≠ k. Then the estimated strategy û(t) globally asymptotically converges to a neighborhood of the time-varying Nash equilibrium for suitably chosen control gains.

Proof: Define the Lyapunov function candidate V = (m/2)E_ξ^T P^T P E_ξ + (1/2)E_η^T E_η. With a positive m, we have β_11‖E‖² ≤ V ≤ β_22‖E‖², where β_11 and β_22 are positive constants. The time derivative of the Lyapunov function candidate is

\[
\dot V = mE_\xi^T P^T(E_\eta - c_2 P E_\xi) + E_\eta^T\tilde N(t) + E_\eta^T\bigl(P k_1 E_\eta + P k_2\,\mathrm{sgn}(P E_\xi) - P N_c(t)\bigr)
\]
\[
\le -m\Bigl(\lambda_{\min}(c_2)\lambda_{\min}(P^T P) - \frac12\|P^T\|\Bigr)\|E\|^2 + m\lambda_{\min}(c_2)\lambda_{\min}(P^T P)E_\eta^T E_\eta + \frac12\lambda_{\max}(P k_1 + k_1 P^T)E_\eta^T E_\eta + k\|E\|\|E_\eta\| + k_h\|E_\eta\|
\]

where k_h = ‖Pk_2‖ + ‖PN_c(t)‖. Define k_{xx} = λ_min(c_2)λ_min(P^T P) − (1/2)‖P^T‖ and k_{yy} = −(1/2)λ_max(Pk_1 + k_1P^T) − mλ_min(c_2)λ_min(P^T P). Then

\[
\dot V \le -m\Bigl(\lambda_{\min}(c_2)\lambda_{\min}(P^T P) - \frac12\|P^T\|\Bigr)\|E\|^2 - \Bigl(-\frac12\lambda_{\max}(P k_1 + k_1 P^T) - m\lambda_{\min}(c_2)\lambda_{\min}(P^T P)\Bigr)\|E_\eta\|^2 + k\|E\|\|E_\eta\| + k_h\|E_\eta\|
\le -\Bigl(mk_{xx} - \frac{k^2}{4k_{yy}}\Bigr)\|E\|^2 + k_h\|E_\eta\|.
\]

The positivity of k_{xx}, k_{yy}, and k_{zz} = mk_{xx} − k²/(4k_{yy}) can be ensured by choosing the control gains suitably, as described in the subsequent remark. Hence, V̇ ≤ −k_{zz}‖E‖² + k_h‖E‖ ≤ −k_{zz}(‖E‖ − k_h/(2k_{zz}))² + k_h²/(4k_{zz}). The ultimate bound can be made small by choosing the parameters such that k_{zz} is sufficiently large.

Remark 11: In this theorem, the parameters should be chosen to satisfy k_{xx} > 0, k_{yy} > 0, and k_{zz} = mk_{xx} − k²/(4k_{yy}) > 0. With a fixed m, k_{zz} can be made large by choosing c_2 and k_1 such that k_{xx} and k_{yy} are both large.

B. General Non-Quadratic Games

In this part, the update law for the gradient search subsystem is revised as

\[
\dot{\hat u}_i = k_{i1}\mu_i + \Phi_i - c_{i1}\hat u_i \tag{9}
\]
\[
\dot\Phi_i = c_{i2}k_{i1}\mu_i + k_{i2}\,\mathrm{lgn}(\mu_i) \tag{10}
\]

where lgn(·) is defined as

\[
\mathrm{lgn}(\theta) = \begin{cases}\mathrm{sgn}(\theta), & \text{if } |\theta| \ge \epsilon\\ \kappa(\theta, \epsilon), & \text{if } |\theta| < \epsilon.\end{cases} \tag{11}
\]

In (11), ε is a small positive parameter and κ(·) is chosen such that lgn(·) is continuous and differentiable and the partial derivative of lgn(θ) with respect to θ is bounded. This revision is made to ensure that the dynamics of the update law are continuous and differentiable. The gradient estimation subsystem remains the same as for quadratic games, i.e., μ_i = (2/a_i)((1 − e^{−T_i s})/(T_i s))[Q_i(u_i, u_{−i}, ς_i) sin(ω_i t)].
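The note does not specify κ. As one concrete possibility, the sketch below uses a cubic blend on |θ| < ε, which matches sgn(θ) at |θ| = ε with zero slope, making lgn continuously differentiable with a bounded derivative. The cubic form is an illustrative assumption; the note only states the properties κ must provide.

```python
import numpy as np

# One possible realization of lgn(.) in (11); the cubic kappa is assumed.
def lgn(theta, eps=0.01):
    if abs(theta) >= eps:
        return float(np.sign(theta))
    s = theta / eps
    return 0.5 * s * (3.0 - s * s)   # equals +-1 with zero slope at s = +-1
```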
To examine the effect of non-quadratic payoff functions on the convergence properties of the proposed method, we use a Taylor polynomial approximation to analyze the update law. In this note, we use the third-order approximation

\[
Q_i(u_i, u_{-i}, \varsigma_i) = Q_i(u^*, \varsigma_i) + \sum_{j=1}^{N}\frac{\partial Q_i(u^*(t),\varsigma_i)}{\partial u_j}\bigl(\xi_j + a_j\sin(\omega_j t)\bigr) + \frac{1}{2}\sum_{j=1}^{N}\frac{\partial^2 Q_i(u^*(t),\varsigma_i)}{\partial u_j^2}\bigl(\xi_j + a_j\sin(\omega_j t)\bigr)^2
\]
\[
+ \frac{1}{2}\sum_{j=1}^{N}\sum_{k=1,k\ne j}^{N}\frac{\partial^2 Q_i(u^*(t),\varsigma_i)}{\partial u_j\partial u_k}\bigl(\xi_j + a_j\sin(\omega_j t)\bigr)\bigl(\xi_k + a_k\sin(\omega_k t)\bigr) + \frac{1}{3!}\sum_{j=1}^{N}\frac{\partial^3 Q_i(u^*(t),\varsigma_i)}{\partial u_j^3}\bigl(\xi_j + a_j\sin(\omega_j t)\bigr)^3
\]
\[
+ \frac{1}{2!}\sum_{j=1}^{N}\sum_{k=1,k\ne j}^{N}\frac{\partial^3 Q_i(u^*(t),\varsigma_i)}{\partial u_j^2\partial u_k}\bigl(\xi_j + a_j\sin(\omega_j t)\bigr)^2\bigl(\xi_k + a_k\sin(\omega_k t)\bigr)
\]
\[
+ \frac{1}{3!}\sum_{j=1}^{N}\sum_{k=1,k\ne j}^{N}\sum_{l=1,l\ne k,l\ne j}^{N}\frac{\partial^3 Q_i(u^*(t),\varsigma_i)}{\partial u_j\partial u_k\partial u_l}\bigl(\xi_j + a_j\sin(\omega_j t)\bigr)\bigl(\xi_k + a_k\sin(\omega_k t)\bigr)\bigl(\xi_l + a_l\sin(\omega_l t)\bigr) + o\bigl(\max_{i\in N} a_i^4\bigr). \tag{12}
\]

Suppose that the frequencies of the dither signals are chosen such that ω_i ≠ ω_j, ω_i ≠ ω_j + ω_k, 2ω_i ≠ ω_j + ω_k, ω_i ≠ 2ω_j + ω_k, and 3ω_i ≠ ω_j for all i, j, k ∈ N with i ≠ j ≠ k. Substituting (12) into μ_i(u_i, u_{−i}, ς_i) = (2/a_i)((1 − e^{−T_i s})/(T_i s))[Q_i(u_i, u_{−i}, ς_i) sin(ω_i t)] yields

\[
\mu_i(u_i, u_{-i}, \varsigma_i) = \frac{2}{a_i}\,\frac{1-e^{-T_i s}}{T_i s}\bigl[Q_i(u_i, u_{-i}, \varsigma_i)\sin(\omega_i t)\bigr] = \frac{2}{a_i T_i}\int_{t-T_i}^{t} Q_i(u_i, u_{-i}, \varsigma_i)\sin(\omega_i\tau)\,d\tau
\]
\[
= \sum_{j=1}^{N}\frac{\partial^2 Q_i(u^*(t),\varsigma_i)}{\partial u_i\partial u_j}\xi_j + \Bigl(\frac{\xi_i^2}{2} + \frac{a_i^2}{8}\Bigr)\frac{\partial^3 Q_i(u^*(t),\varsigma_i)}{\partial u_i^3} + \sum_{j=1,j\ne i}^{N}\xi_i\xi_j\frac{\partial^3 Q_i(u^*(t),\varsigma_i)}{\partial u_i^2\partial u_j}
\]
\[
+ \sum_{j=1,j\ne i}^{N}\Bigl(\frac{\xi_j^2}{2} + \frac{a_j^2}{4}\Bigr)\frac{\partial^3 Q_i(u^*(t),\varsigma_i)}{\partial u_i\partial u_j^2} + \sum_{j=1,j\ne i}^{N}\sum_{k>j,k\ne i}\xi_j\xi_k\frac{\partial^3 Q_i(u^*(t),\varsigma_i)}{\partial u_i\partial u_j\partial u_k} + o\bigl(\max_{i\in N} a_i^3\bigr). \tag{13}
\]

The proposed extremum-seeking-based scheme tends to force μ_i, which is used as an approximation of the gradient of the payoff function, to zero. From (13), we see that μ_i is coupled with higher-order terms. To examine the effect of the higher-order terms on the equilibrium of the overall system, we postulate, similarly to [23, Th. 3], that

\[
\xi_i = \sum_{j=1}^{N} g_{ij}a_j + \sum_{j=1}^{N}\sum_{k\ge j} h_{ijk}a_j a_k + o\bigl(\max_{i\in N} a_i^3\bigr). \tag{14}
\]

By (13) and (14), it can be obtained that the equilibrium is [23] ξ_i = Σ_{j=1}^N h_{ijj}a_j² + o(max_{i∈N} a_i³), where h_{ijj} is the ith entry of H_j and H_j = −(1/4)A^{−1}f_j, with f_j an N × 1 vector defined by f_{ji} = (∂³Q_i/∂u_i∂u_j²)(u*(t), ς_i) if i ≠ j and f_{jj} = (1/2)(∂³Q_j/∂u_j³)(u*(t), ς_j).

Rewrite the closed-loop system as ξ̇_i = k_{i1}μ_i + Φ_i − c_{i1}ξ_i − c_{i1}u*_i − u̇*_i, Φ̇_i = c_{i2}k_{i1}μ_i + k_{i2} lgn(μ_i), where μ_i = (2/a_i)((1 − e^{−T_i s})/(T_i s))[Q_i(u, ς_i) sin(ω_i t)]. The linearization of this closed-loop system at ξ_i = Σ_{j=1}^N h_{ijj}a_j² + o(max_{i∈N} a_i³) is

\[
\dot\xi_i^1 = k_{i1}\Bigl(\sum_{j=1}^{N}\frac{\partial^2 Q_i(u^*(t),\varsigma_i)}{\partial u_i\partial u_j}\xi_j^1 + o\bigl(\max_{i\in N}a_i^3\bigr)\Bigr) + \Phi_i - c_{i1}\xi_i^1
\]
\[
\dot\Phi_i = c_{i2}k_{i1}\Bigl(\sum_{j=1}^{N}\frac{\partial^2 Q_i(u^*(t),\varsigma_i)}{\partial u_i\partial u_j}\xi_j^1 + o\bigl(\max_{i\in N}a_i^3\bigr)\Bigr) + k_{i2}\,\frac{\partial\,\mathrm{lgn}(\mu_i)}{\partial\mu_i}\Bigl(\sum_{j=1}^{N}\frac{\partial^2 Q_i(u^*(t),\varsigma_i)}{\partial u_i\partial u_j}\xi_j^1 + o\bigl(\max_{i\in N}a_i^3\bigr)\Bigr) \tag{15}
\]

where ξ_i^1 = ξ_i − (Σ_{j=1}^N h_{ijj}a_j² + o(max_{i∈N} a_i³)), i ∈ N.

Theorem 3: Consider the system in (9) and (10), and suppose that Assumptions 1–5 hold and the frequencies of the dither signals are chosen such that ω_i ≠ ω_j, ω_i ≠ ω_j + ω_k, 2ω_i ≠ ω_j + ω_k, ω_i ≠ 2ω_j + ω_k, and 3ω_i ≠ ω_j for all i, j, k ∈ N, i ≠ j ≠ k. Then û_i converges to a neighborhood of the time-varying Nash equilibrium trajectory, i.e., û_i → u*_i + Σ_{j=1}^N h_{ijj}a_j² + o(max_{i∈N} a_i³), if the control gains are suitably chosen and û_i(0) is sufficiently close to the Nash equilibrium.

Proof: The theorem can be shown by defining a filtered signal E_z = d(A E_ξ^1)/dt + c_2 A E_ξ^1, where E_ξ^1 = [ξ_1^1, ξ_2^1, ..., ξ_N^1]^T. The subsequent analysis is similar to that of Theorem 2. The Lyapunov function candidate can be defined as V = (m/2)E_ξ^{1T} A^T A E_ξ^1 + E_z^T E_z with m > 0. By a Lyapunov analysis, the theorem can be proved. The details are omitted due to space limitations.

V. NUMERICAL EXAMPLE

Consider a three-player game with payoff functions Q_1 = −(u_1 − (1/3)u_2 + (1/2)u_3 − ς_1 + 1)² + 2 + ς_2, Q_2 = −(−(1/4)u_1 + u_2 − (1/2)u_3 − ς_3 + 3)² + 1 + ς_4, and Q_3 = −(−(1/2)u_1 − (1/5)u_2 + u_3 + 5)² + 2 + ς_5, where ς_1, ς_2, ς_3, ς_4, and ς_5 are unknown time-varying signals. Solving this game gives the time-varying Nash equilibrium u*_1(t) = (108/121)ς_1 + (28/121)ς_3 + 8/121, u*_2(t) = (60/121)ς_1 + (150/121)ς_3 − 735/121, u*_3(t) = (6/11)ς_1 + (4/11)ς_3 − 68/11, and the payoff values of the three players at the Nash equilibrium are Q*_1(t) = 2 + ς_2, Q*_2(t) = 1 + ς_4, and Q*_3(t) = 2 + ς_5. The simulation results with ω_1 = 15, ω_2 = 9, ω_3 = 20, and a_i = 0.1, i ∈ {1, 2, 3}, are shown in Figs. 4 and 5. The simulation results show that û(t) reaches a neighborhood of the time-varying Nash equilibrium.

Fig. 4. Blue, red, and green solid lines denote the actual Nash equilibrium trajectories of players 1, 2, and 3, respectively. The blue, red, and green dashed lines denote the estimated strategies of players 1, 2, and 3, respectively, generated by the proposed seeking method.

Fig. 5. Blue, red, and green solid lines denote the actual payoff values at the Nash equilibrium trajectory for players 1, 2, and 3, respectively. The blue, red, and green dashed lines denote the output values of players 1, 2, and 3, respectively, generated by the proposed seeking method.
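The Nash equilibrium quoted above follows from stationarity of each payoff: setting each ∂Q_i/∂u_i to zero gives a linear system in u*. The sketch below rebuilds that system and cross-checks the stated closed form, with arbitrary sample values standing in for the unknown signals ς_1(t) and ς_3(t).

```python
import numpy as np

# Cross-check of the NE in Section V from the first-order conditions
# dQi/dui = 0, which read B u* = b(t). Sample values for the unknown
# signals varsigma_1, varsigma_3 are illustrative assumptions.
B = np.array([[ 1.0,     -1.0/3.0,  1.0/2.0],
              [-1.0/4.0,  1.0,     -1.0/2.0],
              [-1.0/2.0, -1.0/5.0,  1.0]])

def nash(s1, s3):
    b = np.array([s1 - 1.0, s3 - 3.0, -5.0])
    return np.linalg.solve(B, b)

s1, s3 = 0.7, -0.4                              # illustrative sample values
u = nash(s1, s3)
u_closed = np.array([108/121*s1 + 28/121*s3 + 8/121,
                     60/121*s1 + 150/121*s3 - 735/121,
                     6/11*s1 + 4/11*s3 - 68/11])
assert np.allclose(u, u_closed)                 # matches the stated closed form
```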
VI. CONCLUSION

In this note, an ESC-based scheme is proposed to achieve Nash equilibrium seeking for games with time-varying payoff values and a time-varying Nash equilibrium. By designing a delay-based gradient estimation method and a robust gradient search method, we obtain a seeking control scheme that tracks the Nash equilibrium without explicit model information. Quadratic games are considered first, followed by more general non-quadratic games. The estimated game strategy is shown to globally asymptotically converge to the unknown time-varying Nash equilibrium for symmetric quadratic games. For general quadratic games and non-quadratic games, we show that the seeking update law enables the estimated game strategy to converge to a neighborhood of the Nash equilibrium.

REFERENCES

[1] Y. Sarikaya, T. Alpcan, and O. Ercetin, "Resource allocation game for wireless networks with queue stability constraints," in Proc. IEEE Conf. Decision and Control and European Control Conf., 2011, pp. 3098–3103.
[2] C. Saraydar, N. Mandayam, and D. Goodman, "Efficient power control via pricing in wireless data networks," IEEE Trans. Commun., vol. 50, no. 2, pp. 291–303, Feb. 2002.
[3] A. Agah, S. Das, and K. Basu, "A non-cooperative game approach for intrusion detection in sensor networks," in Proc. IEEE Veh. Technol. Conf., 2004, pp. 2902–2906.
[4] Q. Zhu, C. Fung, R. Boutaba, and T. Basar, "A game-theoretic approach to rule sharing mechanism in networked intrusion detection systems: Robustness, incentives and security," in Proc. IEEE Conf. Decision and Control and European Control Conf., 2011, pp. 243–248.
[5] Z. Ma, D. Callaway, and I. Hiskens, "Decentralized charging control of large populations of plug-in electric vehicles," IEEE Trans. Control Syst. Technol., vol. 21, no. 1, pp. 67–78, Jan. 2013.
[6] A.-H. Mohsenian-Rad, V. W. S. Wong, J. Jatskevich, R. Schober, and A. Leon-Garcia, "Autonomous demand-side management based on game-theoretic energy consumption scheduling for the future smart grid," IEEE Trans. Smart Grid, vol. 1, no. 3, pp. 320–331, 2010.
[7] Q. Zhu, J. Zhang, P. Sauer, A. Garcia, and T. Basar, "A game-theoretic framework for control of distributed renewable-based energy resources in smart grids," in Proc. American Control Conf., 2012, pp. 3623–3628.
[8] A. Fiat, E. Koutsoupias, K. Ligett, Y. Mansour, and S. Olonetsky, "Beyond myopic best response," in Proc. ACM-SIAM Symp. Discrete Algorithms, 2012, pp. 993–1005.
[9] D. P. Foster and H. P. Young, "Learning, hypothesis testing, and Nash equilibrium," Games Econom. Behav., vol. 45, no. 1, pp. 73–96, 2003.
[10] J. Shamma and G. Arslan, "Dynamic fictitious play, dynamic gradient play, and distributed convergence to Nash equilibria," IEEE Trans. Autom. Control, vol. 50, no. 3, pp. 312–327, Mar. 2005.
[11] D. P. Foster and H. P. Young, "Regret testing: Learning to play Nash equilibrium without knowing you have an opponent," Theoret. Econom., vol. 1, pp. 341–367, 2006.
[12] I.-K. Cho and A. Matsui, "Learning aspiration in repeated games," J. Econom. Theory, vol. 124, no. 2, pp. 171–201, 2005.
[13] S. Li and T. Basar, "Distributed algorithms for the computation of noncooperative equilibria," Automatica, vol. 23, no. 4, pp. 523–533, 1987.
[14] A. Kannan and U. V. Shanbhag, "Distributed computation of equilibria in monotone Nash games via iterative regularization techniques," SIAM J. Optimiz., vol. 22, no. 4, pp. 1177–1205, 2012.
[15] I. Erev and A. E. Roth, "Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria," Amer. Econom. Rev., vol. 88, no. 4, pp. 848–881, 1998.
[16] M. Krstic and H. Wang, "Stability of extremum seeking feedback for general nonlinear dynamic systems," Automatica, vol. 36, no. 4, pp. 595–601, 2000.
[17] K. B. Ariyur and M. Krstic, Real-Time Optimization by Extremum-Seeking Control. New York, NY, USA: Wiley-Interscience, 2003.
[18] Y. Tan, D. Nesic, I. Mareels, and A. Astolfi, "On global extremum seeking in the presence of local extrema," Automatica, vol. 45, no. 1, pp. 245–251, 2009.
[19] Y. Pan, T. Acarman, and U. Ozguner, "Nash solution by extremum seeking control approach," in Proc. IEEE Conf. Decision and Control, 2002, pp. 329–334.
[20] Y. Pan and U. Ozguner, "Sliding mode extremum seeking control for linear quadratic dynamic game," in Proc. American Control Conf., 2004, pp. 614–619.
[21] S. Liu and M. Krstic, "Stochastic Nash equilibrium seeking for games with nonlinear payoffs," SIAM J. Control Optimiz., vol. 49, no. 4, pp. 1659–1679, 2011.
[22] A. Orlov and A. Strekalovskii, "Seeking the equilibrium situations in bimatrix games," Automat. Remote Control, vol. 65, no. 2, pp. 204–218, 2004.
[23] P. Frihauf, M. Krstic, and T. Basar, "Nash equilibrium seeking in noncooperative games," IEEE Trans. Autom. Control, vol. 57, no. 5, pp. 1192–1207, 2012.
[24] K. B. Ariyur and M. Krstic, Real-Time Optimization by Extremum-Seeking Control, 1st ed. New York, NY, USA: Wiley-Interscience, 2003.
[25] C. Makkar, G. Hu, W. G. Sawyer, and W. E. Dixon, "Lyapunov-based tracking control in the presence of uncertain nonlinear parameterizable friction," IEEE Trans. Autom. Control, vol. 52, no. 10, pp. 1988–1994, 2007.
[26] D. Monderer and L. S. Shapley, "Potential games," Games Econom. Behav., vol. 14, pp. 124–143, 1996.
[27] H. Khalil, Nonlinear Systems, 3rd ed. Upper Saddle River, NJ, USA: Prentice-Hall, 2002.
[28] M. Ye and G. Hu, "Distributed seeking of time-varying Nash equilibrium for non-cooperative games," in Proc. IEEE Int. Conf. Control Autom., 2013, pp. 1674–1679.