Distributed Seeking of Time-Varying Nash Equilibrium
for Non-Cooperative Games
Maojiao Ye and Guoqiang Hu
Abstract—In this note, we address a Nash equilibrium seeking
problem for non-cooperative games. In contrast to previous works
on Nash equilibrium seeking, the Nash equilibrium under consideration can be time-varying. A non-model-based seeking scheme
is proposed to achieve time-varying Nash equilibrium seeking,
where each player updates its strategy by employing an extremum
seeking method. The proposed Nash seeking scheme consists of a
gradient estimation algorithm and a gradient search algorithm,
which can be designed in a modular fashion. For symmetric
quadratic games, the proposed Nash equilibrium seeking method
enables the estimated strategy to globally asymptotically converge
to the Nash equilibrium. For general quadratic games that are not
necessarily symmetric, the estimated strategy converges to a neighborhood of the Nash equilibrium. For more general non-quadratic
games that may admit multiple equilibria, local convergence to the
Nash equilibrium is proven.
Index Terms—Extremum seeking, non-cooperative games,
time-varying Nash equilibrium seeking.
I. INTRODUCTION
Non-cooperative games have been widely applied to engineering
systems subject to limited resources (e.g., see [1]–[7]). Related problems include resource allocation [1], power control in networks [2], defence strategy design for networks [3], intrusion detection [4], charging
coordination among plug-in electric vehicles [5], energy management
[6], [7] and so on. For the problems modeled by non-cooperative
games mentioned above, an efficient analysis method is to use the
Nash equilibrium concept. Several approaches have been proposed
to achieve Nash equilibrium seeking in the literature (e.g., see [8]–
[15]). However, most of the existing methods including best response
[8], fictitious play [10], hypothesis testing [9] and regret testing [11]
need model information [23] and consider only time-invariant Nash
equilibrium.
Motivated by the desire to weaken the dependence on model information, the extremum seeking control (ESC) method [16]–[18] has
been employed for Nash equilibrium seeking [19]–[23]. In [19], [20],
a Nash seeking algorithm based on ESC and sliding mode control was
designed to drive the system to the sliding boundary layer and stay
there thereafter. A multi-input ESC with stochastic perturbations was
introduced to improve the performance of ESC on Nash equilibrium
seeking in [21]. Based on the fact that the determination of Nash
equilibrium and the solution of the bilinear non-convex optimization
problem are equivalent, an approach for bimatrix games was developed
in [22]. In [23], averaging theory and a singular perturbation method
were employed to enable local convergence to Nash equilibrium based
on first-order ESC. However, the existing works consider only time-invariant Nash equilibrium.
In this note, we consider a time-varying Nash equilibrium seeking
problem for N-player non-cooperative games without explicit model
information, where each player’s objective is to maximize its own payoff value. A non-model-based seeking scheme is proposed to achieve
time-varying Nash equilibrium seeking. The strategy of each player
is updated using a new extremum seeking scheme, where a gradient
estimation algorithm and a gradient search algorithm are proposed
to ensure convergence to the extremum. By utilizing the proposed
seeking scheme, the players are able to track the time-varying Nash
equilibrium trajectory without having explicit model information.
In comparison to previous works on Nash equilibrium seeking, the main contributions of this note can be summarized as follows. 1) The N-player non-cooperative games under consideration admit time-varying Nash equilibrium and payoff values. The proposed seeking scheme does not require explicit model information on either the Nash equilibrium trajectory or the payoff value at the Nash equilibrium.
2) The proposed seeking method to solve the time-varying Nash
seeking problem consists of a gradient estimation part and gradient
search part that can be designed separately. The modular design
provides more freedom to design each part. For example, the averaging
method that is usually necessary for extremum seeking analysis is not
required, and methods in the area of robust control can be leveraged to
design the gradient search part. 3) The developed ESC-based seeking method enables the estimated strategy to converge to a neighborhood of the time-varying Nash equilibrium for both quadratic and non-quadratic games.
II. MOTIVATING EXAMPLE
The Nash equilibrium may be time-varying. Take the classical
Cournot quantity game as an example. The participants of the game are two firms producing the same product; the strategies of the players are the quantities of the product they produce; and the payoffs of the players are the profits they gain.
In the game setting, the price of their product is $p(\mathrm{Tot}) = a(t) - \mathrm{Tot}$, where $a(t)$ is a time-varying factor and $\mathrm{Tot}$ is the total quantity of product that the two firms produce, i.e., $\mathrm{Tot} = q_1 + q_2$, where $q_i$, $i \in \{1, 2\}$, denotes the quantity of the product that firm $i$ produces. The profit of each player is $Q_i = p(\mathrm{Tot})q_i - c_i(t)q_i$, $i \in \{1, 2\}$, where $c_i(t)$ represents the marginal cost, which may be time-varying. Based on the two profit functions, the Nash equilibrium of this game can be given by $(q_1^*, q_2^*) = \left(\frac{2a_1(t) - a_2(t)}{3}, \frac{-a_1(t) + 2a_2(t)}{3}\right)$, where $a_1(t) = a(t) - c_1(t)$ and $a_2(t) = a(t) - c_2(t)$. The Nash equilibrium is time-varying.
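For illustration, the closed-form equilibrium above can be evaluated directly. The following minimal Python sketch computes $(q_1^*, q_2^*)$ at any time $t$; the particular signals $a(t)$, $c_1(t)$, $c_2(t)$ are assumed for illustration and are not specified in this note.

```python
import numpy as np

def cournot_nash(t, a=lambda t: 10.0 + np.sin(0.1 * t),      # a(t), assumed
                 c1=lambda t: 1.0 + 0.5 * np.cos(0.1 * t),    # c1(t), assumed
                 c2=lambda t: 2.0):                           # c2(t), assumed
    """Closed-form NE of the Cournot example with a1 = a - c1, a2 = a - c2."""
    a1, a2 = a(t) - c1(t), a(t) - c2(t)
    return (2.0 * a1 - a2) / 3.0, (-a1 + 2.0 * a2) / 3.0

print(cournot_nash(0.0))   # the equilibrium point drifts as t varies
```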
While most of the Nash equilibrium seeking algorithms need model
information for implementation, ESC is a promising method to enable
the Nash equilibrium seeking without model information. Each player's update law under the first-order ESC-based Nash seeking scheme
is [23]
$$\dot{\hat{q}}_i = m_i\sin(\omega_i t)Q_i, \qquad q_i = \hat{q}_i + a_i\sin(\omega_i t), \qquad i \in \{1, 2\}$$
where $m_i$ is the integrator gain, and $a_i$ and $\omega_i$ are the amplitude and frequency of the dither signal of player $i$, respectively. However, the first-order ESC-based Nash equilibrium seeking scheme suffers from degraded performance if directly applied to the case of a time-varying Nash equilibrium. The simulation results generated by the first-order ESC and the proposed method are shown in Figs. 1 and 2, respectively. It can be seen that the result generated by the seeking strategy built on first-order ESC displays bounded error and large chattering for the case of a time-varying Nash equilibrium. In contrast, the result generated by the proposed method displays much smaller error and much less chattering.

Fig. 1. The black solid and dashed lines denote $q_1^*$ and $q_2^*$, respectively; the blue and red dashed lines denote $\hat{q}_1(t)$ and $\hat{q}_2(t)$ generated by the first-order ESC-based Nash seeking method.

Fig. 2. The black solid and dashed lines denote $q_1^*$ and $q_2^*$, respectively; the blue and red dashed lines denote $\hat{q}_1(t)$ and $\hat{q}_2(t)$ generated by the proposed ESC-based Nash seeking method.
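To make the comparison concrete, the first-order ESC law recalled above can be simulated directly. The sketch below applies $\dot{\hat{q}}_i = m_i\sin(\omega_i t)Q_i$ with forward-Euler integration to the Cournot example; the gains, dither parameters, and time-varying signals are assumed values, not necessarily those used to produce Figs. 1 and 2.

```python
import numpy as np

def simulate_first_order_esc(T=200.0, dt=1e-3):
    m = np.array([2.0, 2.0])        # integrator gains m_i (assumed)
    amp = np.array([0.1, 0.1])      # dither amplitudes a_i (assumed)
    omega = np.array([15.0, 9.0])   # dither frequencies w_i (assumed)
    q_hat = np.zeros(2)             # strategy estimates
    hist = []
    for k in range(int(T / dt)):
        t = k * dt
        a_t = 10.0 + np.sin(0.1 * t)                       # a(t), assumed
        c = np.array([1.0 + 0.5 * np.cos(0.1 * t), 2.0])   # c_i(t), assumed
        q = q_hat + amp * np.sin(omega * t)   # perturbed strategies
        price = a_t - q.sum()                 # p(Tot) = a(t) - Tot
        Q = (price - c) * q                   # profits Q_i = p q_i - c_i q_i
        q_hat = q_hat + dt * m * np.sin(omega * t) * Q   # ESC integrator
        hist.append(q_hat.copy())
    return np.array(hist)

traj = simulate_first_order_esc()   # exhibits tracking error and chattering
```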
III. PROBLEM STATEMENT
To facilitate the subsequent design and convergence analysis, several definitions related to the Nash equilibrium seeking problem for non-cooperative games are provided below.
Definition 1: The game to be considered is defined as $\Gamma \triangleq \{\mathbb{N}, (u_i)_{i\in\mathbb{N}}, (Q_i)_{i\in\mathbb{N}}\}$, where $\mathbb{N}$ is the set of $N$ players, $u_i(t)$ is the strategy of player $i$, $U \subset \mathbb{R}^N = \{u_i \mid i \in \mathbb{N}\}$ denotes the strategy space, and $Q_i$ is the payoff function of player $i$. Define $Q_i \triangleq Q_i(u_i(t), u_{-i}(t), \varsigma_i(t))$, where $\varsigma_i(t)$ is a time-varying unknown vector and $u_{-i}(t)$ denotes the strategies of all the players other than player $i$.
Definition 2: The strategy vector of the game defined in Definition 1 is said to be at the Nash equilibrium if any unilateral change of a player's strategy does not increase its payoff value, in the sense that $Q_i(u_i^*(t), u_{-i}^*(t), \varsigma_i(t)) \ge Q_i(u_i(t), u_{-i}^*(t), \varsigma_i(t))$, $\forall i \in \mathbb{N}$.
To simplify the notation, we define $\hat{u} \triangleq [\hat{u}_1, \hat{u}_2, \cdots, \hat{u}_N]^T$, $u \triangleq [u_1, u_2, \cdots, u_N]^T$, $u^* \triangleq [u_1^*, u_2^*, \cdots, u_N^*]^T$, and $E_\xi \triangleq [\xi_1, \xi_2, \cdots, \xi_N]^T$, where $\xi_i = \hat{u}_i - u_i^*(t)$, $u_i = \hat{u}_i + a_i\sin(\omega_i t)$, $i \in \mathbb{N}$, and the parameter $a_i$ is assumed to be small and positive.
Problem 1 (Nash equilibrium seeking with time-varying payoff functions): Consider a time-varying payoff function for each player in the game, $Q_i(u_i(t), u_{-i}(t), \varsigma_i(t))$, $i \in \mathbb{N}$. The mapping $Q_i(u_i(t), u_{-i}(t), \varsigma_i(t))$, the Nash equilibrium trajectory $u^*(t)$, and the mapping at the Nash equilibrium, denoted by $Q_i^*(t) = Q_i(u_i^*(t), u_{-i}^*(t), \varsigma_i(t))$, are all unknown. Design a seeking control scheme such that the strategy vector $\hat{u}(t)$ tracks the time-varying Nash equilibrium $u^*(t)$.
Remark 1: In classical extremum seeking, an effective update law
for ûi (t) is the continuous steepest descent method, which enables
ûi (t) to converge to a constant optimal point using the gradient
measurements. However, in the case of time-varying Nash equilibrium
seeking using classic extremum seeking methods, the error and chattering can be large. The simulation result shown in Fig. 1 helps to verify
this point.
Assumption 1: The first three derivatives of the time-varying Nash equilibrium trajectory exist and are all bounded, i.e., $\dot{u}^*(t), \ddot{u}^*(t), \dddot{u}^*(t) \in \mathcal{L}_\infty$.
Assumption 2: The non-cooperative games under consideration admit at least one Nash equilibrium at which $\partial Q_i(u^*(t), \varsigma_i(t))/\partial u_i = 0$ and $\partial^2 Q_i(u^*(t), \varsigma_i(t))/\partial u_i^2 < 0$ for all $t$ and all $i \in \mathbb{N}$.
Assumption 3: The first three partial derivatives of the payoff functions with respect to uj (t) for j ∈ N exist and are all bounded at the
Nash equilibrium trajectory u∗ (t).
Assumption 4: The frequencies of the dither signals are such that
the variations of dither signals are much faster than the variations of
the payoff functions as well as their partial derivatives with respect to
uj (t) for j ∈ N at û(t).
Assumption 5: Define
$$A(t) = \begin{bmatrix}
\frac{\partial^2 Q_1(u^*(t),\varsigma_1)}{\partial u_1^2} & \frac{\partial^2 Q_1(u^*(t),\varsigma_1)}{\partial u_1\partial u_2} & \cdots & \frac{\partial^2 Q_1(u^*(t),\varsigma_1)}{\partial u_1\partial u_N} \\
\frac{\partial^2 Q_2(u^*(t),\varsigma_2)}{\partial u_1\partial u_2} & \frac{\partial^2 Q_2(u^*(t),\varsigma_2)}{\partial u_2^2} & \cdots & \frac{\partial^2 Q_2(u^*(t),\varsigma_2)}{\partial u_2\partial u_N} \\
\vdots & & \ddots & \vdots \\
\frac{\partial^2 Q_N(u^*(t),\varsigma_N)}{\partial u_1\partial u_N} & \cdots & & \frac{\partial^2 Q_N(u^*(t),\varsigma_N)}{\partial u_N^2}
\end{bmatrix}.$$
Then $A$ is invertible, $A^TA$ is bounded and positive definite, and $-\lambda_{\max}(Ak_1 + k_1A^T)$ can be made positive and large by tuning a positive diagonal matrix $k_1$.
Remark 2: Since $\partial^2 Q_i(u^*(t), \varsigma_i(t))/\partial u_i^2 < 0$ for all $i \in \mathbb{N}$, if $A$ and $A^T$ are strictly diagonally dominant, this assumption can be easily satisfied, for example, by choosing all the nonzero components of $k_1$ equal. Even if they are not strictly diagonally dominant, the assumption may still be satisfied.
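As a quick numerical illustration of Remark 2 (with an assumed matrix, not one derived from a particular game), the condition in Assumption 5 can be checked directly:

```python
import numpy as np

# Sketch: for a sample matrix A with negative diagonal and strict diagonal
# dominance, a scaled identity k1 makes -lambda_max(A k1 + k1 A^T) positive,
# and it grows as k1 is increased. The entries of A are assumed values.
A = np.array([[-2.0, 0.5, 0.3],
              [ 0.4, -3.0, 0.6],
              [ 0.2, 0.5, -2.5]])       # strictly diagonally dominant (assumed)
k1 = 5.0 * np.eye(3)                    # positive diagonal gain matrix
S = A @ k1 + k1 @ A.T
print(-np.linalg.eigvalsh(S).max())     # positive; scales linearly with k1
```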
IV. TIME-VARYING NASH EQUILIBRIUM SEEKING
In this section, we provide a Nash equilibrium tracking scheme
as well as the stability analysis of the proposed scheme. The overall
schematic outline is shown in Fig. 3. Each player employs the proposed
extremum seeking method to update its strategy. In the proposed
extremum seeking control scheme, sinusoidal dither signals are used to modulate the players' payoff functions, and a delay-based gradient estimation subsystem and a robust gradient search subsystem are designed for tracking the time-varying Nash equilibrium.

Fig. 3. Schematic diagram for time-varying Nash equilibrium seeking.

In the following subsections, we first consider a special case, i.e., quadratic games, and then focus our attention on more general non-quadratic games.
A. Quadratic Games
The update law for player $i$ in the game is designed as [28]
$$\dot{\hat{u}}_i = k_{i1}\mu_i + \Phi_i - c_{i1}\hat{u}_i \quad (1)$$
$$\dot{\Phi}_i = c_{i2}k_{i1}\mu_i + k_{i2}\,\mathrm{sgn}(\mu_i) \quad (2)$$
where $\mu_i(t)$ is generated by
$$\mu_i(u_i, u_{-i}, \varsigma_i) = \frac{2}{a_i}\,\frac{1 - e^{-T_i s}}{T_i s}\left[Q_i(u_i, u_{-i}, \varsigma_i)\sin(\omega_i t)\right] \quad (3)$$
and $\mathrm{sgn}(\mu_i) = 1$ if $\mu_i > 0$, $\mathrm{sgn}(\mu_i) = -1$ if $\mu_i < 0$, and $\mathrm{sgn}(\mu_i) = 0$ if $\mu_i = 0$. The parameters $k_{i1}$, $k_{i2}$, $c_{i1}$, $c_{i2}$ are positive control gains to be determined. The parameter $T_i$ is a common multiple of the periods of the dither signals.
Remark 3: The proposed extremum seeking scheme is based on
a modular design where μi (t) defined in (3) is used to extract the
gradient. The output of this gradient estimation part is fed into the
gradient search part (shown in (1) and (2)). By utilizing this gradient
estimation method, the convergence analysis can be conducted without
using the averaging method which is the main analysis tool for
classical extremum seeking methods.
Remark 4: The actual Nash trajectory is unknown and thus the
error signal is unknown. Hence, the estimated gradient is used in the
gradient search part where the function sgn(·) is used to eliminate
the effect of some bounded terms such that asymptotic convergence
can be achieved. Furthermore, the function sgn(·) is integrated in the
control law so that the overall control law is continuous and admits less
chattering [25] in comparison with discontinuous update laws such as
sliding mode controllers.
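For concreteness, the following Python sketch shows how one player's modular update, eqs. (1)-(3), could be realized in discrete time; all gains and parameters are assumed values. In line with Remark 7, the window length here is a multiple of the player's own dither period only, which introduces a small gradient-estimation error.

```python
import numpy as np
from collections import deque

class Player:
    """One player's update: delay-based gradient estimation (3)
    feeding the robust gradient search law (1)-(2)."""
    def __init__(self, omega, a=0.1, k1=8.0, k2=1.0, c1=0.1, c2=2.0, dt=1e-3):
        self.omega, self.a, self.dt = omega, a, dt
        self.Ti = 2 * np.pi / omega          # window: player's own period (cf. Remark 7)
        self.k1, self.k2, self.c1, self.c2 = k1, k2, c1, c2
        self.u_hat, self.Phi = 0.0, 0.0
        self.buf = deque(maxlen=int(round(self.Ti / dt)))  # holds Q_i(tau) sin(w_i tau)

    def strategy(self, t):
        # u_i = u_hat_i + a_i sin(w_i t)
        return self.u_hat + self.a * np.sin(self.omega * t)

    def step(self, t, Q):
        # mu_i ~ (2/(a_i T_i)) * integral over [t - T_i, t] of Q_i sin(w_i tau), eq. (3)
        self.buf.append(Q * np.sin(self.omega * t))
        mu = (2.0 / (self.a * self.Ti)) * self.dt * sum(self.buf)
        # d(u_hat_i)/dt = k_i1 mu_i + Phi_i - c_i1 u_hat_i, eq. (1)
        self.u_hat += self.dt * (self.k1 * mu + self.Phi - self.c1 * self.u_hat)
        # d(Phi_i)/dt = c_i2 k_i1 mu_i + k_i2 sgn(mu_i), eq. (2)
        self.Phi += self.dt * (self.c2 * self.k1 * mu + self.k2 * np.sign(mu))
```

A simulation loop would call strategy(t) for every player, evaluate each payoff $Q_i$ at the resulting strategy profile, and then call step(t, Q_i) for each player.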
1) Game Analysis: For general quadratic games, the payoff functions are defined as $Q_i(u_i, u_{-i}, \varsigma_i) = \frac{1}{2}\sum_{j=1}^{N}\sum_{k=1}^{N}p_{ijk}(\varsigma_i)u_ju_k + \sum_{j=1}^{N}q_{ij}(\varsigma_i)u_j + s_i(\varsigma_i)$, $i \in \mathbb{N}$, where $p_{ijk}$, $\dot{p}_{ijk}$, and $\ddot{p}_{ijk}$ are all bounded. Taking the partial derivative of $Q_i$ with respect to $u_i$ yields $\frac{\partial Q_i}{\partial u_i}(u, \varsigma_i) = p_{iii}u_i(t) + \sum_{j=1,j\neq i}^{N}p_{iij}u_j(t) + q_{ii}$, $i \in \mathbb{N}$. By Assumption 2, we have $Pu^* + q = 0$, where the matrix $P(t)$ is defined as
$$P = \begin{bmatrix} p_{111} & p_{112} & \cdots & p_{11N} \\ p_{221} & p_{222} & \cdots & p_{22N} \\ \vdots & & \ddots & \vdots \\ p_{NN1} & p_{NN2} & \cdots & p_{NNN} \end{bmatrix}$$
and $q = [q_{11}, q_{22}, \ldots, q_{NN}]^T$ is an $N \times 1$ vector.
Lemma 1: Suppose that Assumption 2 holds and the matrix $P(t)$ is invertible for all $t$. Then, the Nash equilibrium of the quadratic game exists and is unique.
Proof: By Assumption 2, if $Pu^* + q = 0$ admits a solution, then the Nash equilibrium exists. Since the matrix $P(t)$ is invertible, the Nash equilibrium exists and is unique, with $u^* = -P^{-1}q$.
Remark 5: With the uniqueness of the Nash equilibrium, a global stability result can be obtained for quadratic games.
Lemma 2: Suppose Assumptions 2 and 4 hold and the frequencies of the dither signals are chosen such that $\omega_i \neq \omega_j$, $2\omega_i \neq \omega_j$, and $\omega_i \neq \omega_j + \omega_k$ for all $i, j, k \in \mathbb{N}$, $i \neq j \neq k$. Then, for a quadratic game, $\mu_i(t)$ defined in (3) can be related to the error signal as $\mu_i = \sum_{j=1}^{N}p_{iij}(\varsigma_i)\xi_j$.
Proof: From (3), we have
$$\mu_i = \frac{2}{a_i}\,\frac{1 - e^{-T_i s}}{T_i s}\left[Q_i(u_i, u_{-i}, \varsigma_i)\sin(\omega_i t)\right] = \frac{2}{a_iT_i}\int_{t-T_i}^{t}\left[\frac{1}{2}\sum_{j=1}^{N}\sum_{k=1}^{N}p_{ijk}(\varsigma_i)u_j(\tau)u_k(\tau) + \sum_{j=1}^{N}q_{ij}(\varsigma_i)u_j(\tau) + s_i(\varsigma_i)\right]\sin(\omega_i\tau)\,d\tau$$
$$= \frac{2}{a_iT_i}\int_{t-T_i}^{t}\sin(\omega_i\tau)\Bigg[\frac{1}{2}\sum_{j=1}^{N}\sum_{k=1}^{N}p_{ijk}(\varsigma_i)\Big(u_j^*(\tau)u_k^*(\tau) + u_j^*(\tau)\xi_k + a_ku_j^*(\tau)\sin(\omega_k\tau) + \xi_ju_k^*(\tau) + \xi_j\xi_k + a_k\xi_j\sin(\omega_k\tau) + a_j\sin(\omega_j\tau)u_k^*(\tau) + a_j\sin(\omega_j\tau)\xi_k + a_ja_k\sin(\omega_j\tau)\sin(\omega_k\tau)\Big) + \sum_{j=1}^{N}q_{ij}(\varsigma_i)\big(u_j^*(\tau) + \xi_j + a_j\sin(\omega_j\tau)\big) + s_i(\varsigma_i)\Bigg]d\tau. \quad (4)$$
Based on Assumption 4, the variations of $Q_i$ at $\hat{u}(t)$ are much slower than $\sin(\omega_i t)$. Thus, $\mu_i(u_i, u_{-i}, \varsigma_i)$ can be written as
$$\mu_i = \frac{2}{a_iT_i}\int_{t-T_i}^{t}\sum_{j=1}^{N}p_{iij}(\varsigma_i)\left(a_iu_j^*(\tau)\sin^2(\omega_i\tau) + a_i\xi_j\sin^2(\omega_i\tau)\right) + q_{ii}(\varsigma_i)a_i\sin^2(\omega_i\tau)\,d\tau = \sum_{j=1}^{N}p_{iij}\xi_j.$$
The detailed calculation to arrive at the above equation is omitted due to space limitations.
Remark 6: By Assumption 4, the function of the delay-based gradient estimation module is similar to the averaging method, in which the relatively slowly varying components are regarded as constants for analysis [27]. Furthermore, the output of the gradient estimation part $\mu_i$ is equal to the gradient $(\partial Q_i/\partial u_i)(\hat{u}, \varsigma_i)$ for $i \in \mathbb{N}$.
Remark 7: If $T_i$ is not a common multiple of the periods of the dither signals but only a positive integer multiple of the period of player $i$'s dither signal, then the output of the gradient estimation module will have an approximation error. Through numerical simulation, we see that the proposed method still works if $|\omega_i - \omega_j|$, $|2\omega_i - \omega_j|$, and $|\omega_i - (\omega_j + \omega_k)|$, for $i, j, k \in \mathbb{N}$, $i \neq j \neq k$, are not too small. However, the transient performance may be degraded to some extent due to the imperfect gradient estimation.
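The relation in Lemma 2 can be checked numerically. The sketch below uses randomly generated quadratic payoff data (an assumed example, not the paper's); it freezes $\xi$ and $u^*$ over one window and compares the demodulated output with $\sum_j p_{iij}\xi_j$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, i = 3, 0
omega = np.array([15.0, 9.0, 20.0])    # satisfy the frequency conditions of Lemma 2
amp = np.full(N, 0.1)
P = rng.normal(size=(N, N, N))
P = (P + P.transpose(0, 2, 1)) / 2     # enforce p_ijk = p_ikj
q = rng.normal(size=(N, N))
u_star = rng.normal(size=N)
xi = 0.5 * rng.normal(size=N)
q[i, i] = -P[i, i] @ u_star            # NE condition: sum_j p_iij u*_j + q_ii = 0
Ti = 2 * np.pi                         # common multiple of all dither periods
tau = np.linspace(0.0, Ti, 200001)
u = (u_star + xi)[:, None] + amp[:, None] * np.sin(omega[:, None] * tau)
# Q_i along the window; the constant s_i term integrates to zero and is omitted
Qi = 0.5 * np.einsum('jk,jm,km->m', P[i], u, u) + q[i] @ u
dtau = tau[1] - tau[0]
mu = (2.0 / (amp[i] * Ti)) * np.sum(Qi[:-1] * np.sin(omega[i] * tau[:-1])) * dtau
print(mu, P[i, i] @ xi)                # the two values should nearly agree
```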
Substituting the result in Lemma 2 into the update laws (1) and (2), we get
$$\dot{\hat{u}} = k_1PE_\xi + \Phi - c_1\hat{u}, \qquad \dot{\Phi} = c_2k_1PE_\xi + k_2\,\mathrm{sgn}(PE_\xi) \quad (5)$$
where $\Phi = [\Phi_1, \Phi_2, \cdots, \Phi_N]^T$ and the bold notations $k_1$, $k_2$, $c_1$, $c_2$ denote the diagonal control gain matrices formed from $k_{i1}$, $k_{i2}$, $c_{i1}$, $c_{i2}$, respectively.
2) Stability Analysis for Quadratic Games: For stability analysis, define a filtered signal $E_\eta = d(PE_\xi)/dt + c_2PE_\xi$. Then, the derivative of $E_\eta(t)$ is
$$\dot{E}_\eta = P\ddot{E}_\xi + c_2P\dot{E}_\xi + c_2\dot{P}E_\xi + 2\dot{P}\dot{E}_\xi + \ddot{P}E_\xi = P\big(\ddot{\hat{u}} - \ddot{u}^*(t)\big) + c_2P\dot{E}_\xi + c_2\dot{P}E_\xi + 2\dot{P}\dot{E}_\xi + \ddot{P}E_\xi$$
$$= P\big(k_1E_\eta + k_2\,\mathrm{sgn}(PE_\xi) - c_1\dot{u}^*(t) - \ddot{u}^*(t)\big) - Pc_1\dot{E}_\xi + c_2P\dot{E}_\xi + c_2\dot{P}E_\xi + 2\dot{P}\dot{E}_\xi + \ddot{P}E_\xi = P\big(k_1E_\eta + k_2\,\mathrm{sgn}(PE_\xi) - N_c(t)\big) + \tilde{N}(t) \quad (6)$$
where $N_c(t) = c_1\dot{u}^*(t) + \ddot{u}^*(t)$ and $\tilde{N}(t) = (-Pc_1 + 2\dot{P})P^{-1}(E_\eta - c_2PE_\xi - \dot{P}E_\xi) + c_2(E_\eta - c_2PE_\xi) + \ddot{P}E_\xi$.
Remark 8: By Assumption 1, $\dot{u}^*(t)$, $\ddot{u}^*(t)$, $\dddot{u}^*(t)$ are all bounded. Hence, $N_c(t)$ and $\dot{N}_c(t)$ are bounded. For convenience, we denote the upper bounds of $\|N_c(t)\|$ and $\|\dot{N}_c(t)\|$ as $U_{N_c}$ and $U_{\dot{N}_c}$, respectively. Since $P$, $\dot{P}$, and $\ddot{P}$ are bounded, there exists a positive $k$ such that $\|\tilde{N}(t)\| \le k\|E\|$, where $E = [E_\xi^T\ E_\eta^T]^T$, if $P^{-1}$ is bounded.
A Special Class of Quadratic Games: The subsequent analysis
considers a special case where the following assumption is satisfied.
Assumption 6: The matrix P is symmetric negative definite.
Remark 9: Under this assumption, the game can be shown to be a
potential game [26].
To facilitate the subsequent stability analysis of the closed-loop system, define an auxiliary function $H(t)$ as
$$H = \lambda_{\min}(k_2)\left\|E_\eta^T(0)P(0)\right\|_1 - E_\eta^T(0)P(0)N_c(0) - \int_0^t E_\eta^T\left[N_c(\tau) - k_2\,\mathrm{sgn}(PE_\xi)\right]d\tau. \quad (7)$$
Lemma 3: Suppose that $c_2U_{N_c} + U_{\dot{N}_c} - \lambda_{\min}(c_2k_2) \le 0$ and $U_{N_c} - \lambda_{\min}(k_2) \le 0$. Then, $H \ge 0$ under Assumption 1.
Proof: The proof can be obtained by using an analysis similar to that in [25].
Theorem 1: Suppose that Assumptions 1–4 and Assumption 6 hold, $P^{-1}$ is bounded and differentiable, the control gain $k_2$ is selected such that $H(t) \ge 0$, $c_2$ and $k_1$ are large enough, and $\omega_i \neq \omega_j$, $2\omega_i \neq \omega_j$, and $\omega_i \neq \omega_j + \omega_k$ for all $i, j, k \in \mathbb{N}$ and $i \neq j \neq k$. Then, the estimated control input $\hat{u}(t)$ globally asymptotically converges to the time-varying Nash equilibrium.
Proof: Noting that $-P^{-1}$ is bounded and positive definite, define $V = \frac{m}{2}E_\xi^TP^TPE_\xi - \frac{1}{2}E_\eta^TP^{-1}E_\eta + H$ with $m > 0$, and $X = [E_\xi^T\ E_\eta^T\ \sqrt{H}]^T$. Then, $\beta_1\|X\|^2 \le V \le \beta_2\|X\|^2$, where $\beta_1$ and $\beta_2$ are positive constants. Furthermore,
$$\dot{V} = mE_\xi^TP^T\frac{d(PE_\xi)}{dt} - E_\eta^TP^{-1}\dot{E}_\eta + \dot{H} - \frac{1}{2}E_\eta^T\frac{dP^{-1}}{dt}E_\eta = mE_\xi^TP^T(E_\eta - c_2PE_\xi) - E_\eta^TP^{-1}\tilde{N}(t) + E_\eta^T\big(-k_1E_\eta - k_2\,\mathrm{sgn}(PE_\xi) + N_c(t)\big) - E_\eta^T\left[N_c(t) - k_2\,\mathrm{sgn}(PE_\xi)\right] - \frac{1}{2}E_\eta^T\frac{dP^{-1}}{dt}E_\eta. \quad (8)$$
Since $\|\tilde{N}(t)\| \le k\|E\|$,
$$\dot{V} \le -m\left[\lambda_{\min}(c_2)\lambda_{\min}(P^TP) - \frac{1}{2}\|P^T\|\right]\|E_\xi\|^2 - \left[\lambda_{\min}(k_1) - m\lambda_{\min}(c_2)\lambda_{\min}(P^TP) - \frac{1}{2}\lambda_{\min}\!\left(\frac{dP^{-1}}{dt}\right)\right]\|E_\eta\|^2 + k\|E\|\,\|E_\eta\|\,\|P^{-1}\| \le -mk_x\|E\|^2 + \frac{k^2\|P^{-1}\|^2}{4k_y}\|E\|^2$$
where $k_x$ and $k_y$ are defined in the subsequent remark, and the conclusion can be derived by choosing the control gains such that $k_x, k_y > 0$ and $k_z = mk_x - k^2\|P^{-1}\|^2/(4k_y) > 0$.
Remark 10: In this theorem, the parameters should be chosen such that $H(t) \ge 0$, $k_x = \lambda_{\min}(c_2)\lambda_{\min}(P^TP) - \frac{1}{2}\|P^T\| > 0$, $k_y = \lambda_{\min}(k_1) - m\lambda_{\min}(c_2)\lambda_{\min}(P^TP) - \frac{1}{2}\lambda_{\min}(dP^{-1}/dt) > 0$, and $k_z > 0$. These conditions can be achieved by choosing $c_2$, $k_1$, and $k_2$ large enough.
More General Quadratic Games: In Theorem 1, we assume that the matrix $P(t)$ is symmetric negative definite, by which we obtain a globally asymptotically stable result. If Assumption 6 is not satisfied, we can obtain a uniformly ultimately bounded result under a milder condition. Noticing that for quadratic games $P(t) = A(t)$, which is defined in Assumption 5, the following conclusion can be derived.
Theorem 2: Suppose that Assumptions 1–5 hold, the matrix $P^{-1}$ is bounded, and $\omega_i \neq \omega_j$, $2\omega_i \neq \omega_j$, and $\omega_i \neq \omega_j + \omega_k$ for all $i, j, k \in \mathbb{N}$ and $i \neq j \neq k$. Then, the estimated control input $\hat{u}(t)$ globally asymptotically converges to a neighborhood of the time-varying Nash equilibrium by suitably choosing the control gains.
Proof: Define a Lyapunov function candidate $V = \frac{m}{2}E_\xi^TP^TPE_\xi + \frac{1}{2}E_\eta^TE_\eta$. With a positive $m$, we have $\beta_{11}\|E\|^2 \le V \le \beta_{22}\|E\|^2$, where $\beta_{11}$ and $\beta_{22}$ are positive constants. The time derivative of the Lyapunov function candidate is
$$\dot{V} = mE_\xi^TP^T(E_\eta - c_2PE_\xi) + E_\eta^T\tilde{N}(t) + E_\eta^T\big(Pk_1E_\eta + Pk_2\,\mathrm{sgn}(PE_\xi) - PN_c(t)\big) \le -m\left[\lambda_{\min}(c_2)\lambda_{\min}(P^TP) - \frac{1}{2}\|P^T\|\right]\|E_\xi\|^2 + \left[m\lambda_{\min}(c_2)\lambda_{\min}(P^TP) + \frac{1}{2}\lambda_{\max}(Pk_1 + k_1P^T)\right]\|E_\eta\|^2 + k\|E\|\,\|E_\eta\| + k_h\|E_\eta\|$$
where $k_h = \|Pk_2\| + \|PN_c(t)\|$. Define $k_{xx} = \lambda_{\min}(c_2)\lambda_{\min}(P^TP) - \frac{1}{2}\|P^T\|$ and $k_{yy} = -\frac{1}{2}\lambda_{\max}(Pk_1 + k_1P^T) - m\lambda_{\min}(c_2)\lambda_{\min}(P^TP)$. Then
$$\dot{V} \le -mk_{xx}\|E_\xi\|^2 - k_{yy}\|E_\eta\|^2 + k\|E\|\,\|E_\eta\| + k_h\|E_\eta\| \le -\left(mk_{xx} - \frac{k^2}{4k_{yy}}\right)\|E\|^2 + k_h\|E_\eta\|.$$
The positiveness of $k_{xx}$, $k_{yy}$, and $k_{zz} = mk_{xx} - k^2/(4k_{yy})$ can be ensured by choosing the control gains suitably, as described in the subsequent remark. Hence, $\dot{V} \le -k_{zz}\|E\|^2 + k_h\|E\| \le -k_{zz}\left(\|E\| - \frac{k_h}{2k_{zz}}\right)^2 + \frac{k_h^2}{4k_{zz}}$. The ultimate bound can be made small by choosing the parameters such that $k_{zz}$ is sufficiently large.
Remark 11: In this theorem, the parameters should be chosen to satisfy the following inequalities: $k_{xx} > 0$, $k_{yy} > 0$, and $k_{zz} = mk_{xx} - k^2/(4k_{yy}) > 0$. With a fixed $m$, $k_{zz}$ can be made large by choosing $c_2$ and $k_1$ such that $k_{xx}$ and $k_{yy}$ are both large.
B. General Non-Quadratic Games
In this part, the update law for the gradient search subsystem is revised as
$$\dot{\hat{u}}_i = k_{i1}\mu_i + \Phi_i - c_{i1}\hat{u}_i \quad (9)$$
$$\dot{\Phi}_i = c_{i2}k_{i1}\mu_i + k_{i2}\,\mathrm{lgn}(\mu_i) \quad (10)$$
where $\mathrm{lgn}(\cdot)$ is defined as
$$\mathrm{lgn}(\theta) = \begin{cases} \mathrm{sgn}(\theta), & \text{if } |\theta| \ge \epsilon \\ \kappa(\theta, \epsilon), & \text{if } |\theta| < \epsilon. \end{cases} \quad (11)$$
In (11), $\epsilon$ is a small positive parameter and $\kappa(\cdot)$ is such that $\mathrm{lgn}(\cdot)$ is continuous and differentiable and the partial derivative of $\mathrm{lgn}(\theta)$ with respect to $\theta$ is bounded. We make this revision in this part to ensure that the dynamics of the update law are continuous and differentiable. The gradient estimation subsystem remains the same as for quadratic games, i.e., $\mu_i = \frac{2}{a_i}\frac{1 - e^{-T_i s}}{T_i s}[Q_i(u_i, u_{-i}, \varsigma_i)\sin(\omega_i t)]$.
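The note leaves the choice of $\kappa(\cdot)$ open. One admissible choice (an assumed example satisfying the stated continuity and bounded-derivative requirements, not a construction given in the note) is a cubic blend inside the boundary layer:

```python
import numpy as np

# Sketch of lgn(.) from (11) with kappa(theta, eps) = 1.5 r - 0.5 r^3, r = theta/eps.
# At |theta| = eps this matches sgn in both value (+-1) and slope (0), so lgn
# is C^1 with a derivative bounded by 1.5/eps.
def lgn(theta, eps=1e-2):
    theta = np.asarray(theta, dtype=float)
    r = np.clip(theta / eps, -1.0, 1.0)       # normalized value in the layer
    smooth = 1.5 * r - 0.5 * r**3             # kappa(theta, eps)
    return np.where(np.abs(theta) >= eps, np.sign(theta), smooth)
```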
To view the effect of non-quadratic payoff functions on the convergence properties of the proposed method, we use a Taylor polynomial approximation to analyze the update law. In this note, we use the third-order approximation
$$Q_i(u_i, u_{-i}, \varsigma_i) = Q_i(u^*, \varsigma_i) + \sum_{j=1}^{N}\frac{\partial Q_i(u^*(t),\varsigma_i)}{\partial u_j}\big(\xi_j + a_j\sin(\omega_j t)\big) + \frac{1}{2}\sum_{j=1}^{N}\frac{\partial^2 Q_i(u^*(t),\varsigma_i)}{\partial u_j^2}\big(\xi_j + a_j\sin(\omega_j t)\big)^2$$
$$+ \frac{1}{2}\sum_{j=1}^{N}\sum_{k=1,k\neq j}^{N}\frac{\partial^2 Q_i(u^*(t),\varsigma_i)}{\partial u_j\partial u_k}\big(\xi_j + a_j\sin(\omega_j t)\big)\big(\xi_k + a_k\sin(\omega_k t)\big) + \frac{1}{3!}\sum_{j=1}^{N}\frac{\partial^3 Q_i(u^*(t),\varsigma_i)}{\partial u_j^3}\big(\xi_j + a_j\sin(\omega_j t)\big)^3$$
$$+ \frac{1}{2!}\sum_{j=1}^{N}\sum_{k=1,k\neq j}^{N}\frac{\partial^3 Q_i(u^*(t),\varsigma_i)}{\partial u_j^2\partial u_k}\big(\xi_j + a_j\sin(\omega_j t)\big)^2\big(\xi_k + a_k\sin(\omega_k t)\big)$$
$$+ \frac{1}{3!}\sum_{j=1}^{N}\sum_{k=1,k\neq j}^{N}\sum_{l=1,l\neq k,l\neq j}^{N}\frac{\partial^3 Q_i(u^*(t),\varsigma_i)}{\partial u_j\partial u_k\partial u_l}\big(\xi_j + a_j\sin(\omega_j t)\big)\big(\xi_k + a_k\sin(\omega_k t)\big)\big(\xi_l + a_l\sin(\omega_l t)\big) + o\Big(\max_{i\in\mathbb{N}}a_i^4\Big). \quad (12)$$
Suppose that the frequencies of the dither signals are chosen according to $\omega_i \neq \omega_j$, $2\omega_i \neq \omega_j$, $\omega_i \neq \omega_j + \omega_k$, $2\omega_i \neq \omega_j + \omega_k$, $\omega_i \neq 2\omega_j + \omega_k$, and $3\omega_i \neq \omega_j$ for all $i, j, k \in \mathbb{N}$ and $i \neq j \neq k$. Substituting (12) into $\mu_i(u_i, u_{-i}, \varsigma_i) = \frac{2}{a_i}\frac{1 - e^{-T_i s}}{T_i s}[Q_i(u_i, u_{-i}, \varsigma_i)\sin(\omega_i t)]$ yields
$$\mu_i(u_i, u_{-i}, \varsigma_i) = \frac{2(1 - e^{-T_i s})}{a_iT_i s}\left[Q_i(u_i, u_{-i}, \varsigma_i)\sin(\omega_i t)\right] = \frac{2}{a_iT_i}\int_{t-T_i}^{t}Q_i(u_i, u_{-i}, \varsigma_i)\sin(\omega_i\tau)\,d\tau$$
$$= \sum_{j=1}^{N}\frac{\partial^2 Q_i(u^*(t),\varsigma_i)}{\partial u_i\partial u_j}\xi_j + \frac{\partial^3 Q_i(u^*(t),\varsigma_i)}{\partial u_i^3}\left(\frac{1}{2}\xi_i^2 + \frac{a_i^2}{8}\right) + \sum_{j=1,j\neq i}^{N}\frac{\partial^3 Q_i(u^*(t),\varsigma_i)}{\partial u_i^2\partial u_j}\xi_i\xi_j$$
$$+ \sum_{j=1,j\neq i}^{N}\left(\frac{\xi_j^2}{2} + \frac{a_j^2}{4}\right)\frac{\partial^3 Q_i(u^*(t),\varsigma_i)}{\partial u_i\partial u_j^2} + \sum_{j=1,j\neq i}^{N}\sum_{k>j,k\neq i}^{N}\frac{\partial^3 Q_i(u^*(t),\varsigma_i)}{\partial u_i\partial u_j\partial u_k}\xi_j\xi_k + o\Big(\max_{i\in\mathbb{N}}a_i^3\Big). \quad (13)$$
The proposed extremum-seeking-based scheme tends to force $\mu_i$, which is used as an approximation of the gradient of the payoff function, to zero. From (13), we know that $\mu_i$ is coupled with higher-order terms. To view the effect of the higher-order terms on the equilibrium of the overall system, we conduct a postulation similar to that in [23, Theorem 3]. Let
$$\xi_i = \sum_{j=1}^{N}g_{ij}a_j + \sum_{j=1}^{N}\sum_{k\ge j}^{N}h_{ijk}a_ja_k + o\Big(\max_{i\in\mathbb{N}}a_i^3\Big). \quad (14)$$
By (13) and (14), it can be obtained that the equilibrium is [23] $\xi_i = \sum_{j=1}^{N}h_{ijj}a_j^2 + o(\max_{i\in\mathbb{N}}a_i^3)$, where $h_{ijj}$ is the $i$th entry of $H_j$ and $H_j = -\frac{1}{4}A^{-1}f_j$, where $f_j$ is an $N \times 1$ vector with entries $f_{ji} = \frac{\partial^3 Q_i}{\partial u_i\partial u_j^2}(u^*(t), \varsigma_i)$ for $i \neq j$ and $f_{jj} = \frac{1}{2}\frac{\partial^3 Q_j}{\partial u_j^3}(u^*(t), \varsigma_j)$.
Rewrite the closed-loop system as $\dot{\xi}_i = k_{i1}\mu_i + \Phi_i - c_{i1}\xi_i - c_{i1}u_i^* - \dot{u}_i^*$, $\dot{\Phi}_i = c_{i2}k_{i1}\mu_i + k_{i2}\,\mathrm{lgn}(\mu_i)$, where $\mu_i = \frac{2}{a_i}\frac{1 - e^{-T_i s}}{T_i s}[Q_i(u, \varsigma_i)\sin(\omega_i t)]$. The linearized system for this closed-loop system at $\xi_i = \sum_{j=1}^{N}h_{ijj}a_j^2 + o(\max_{i\in\mathbb{N}}a_i^3)$ is
$$\dot{\xi}_{i1} = k_{i1}\sum_{j=1}^{N}\left[\frac{\partial^2 Q_i(u^*(t),\varsigma_i)}{\partial u_i\partial u_j} + o\Big(\max_{i\in\mathbb{N}}a_i^3\Big)\right]\xi_{j1} + \Phi_i - c_{i1}\xi_{i1}$$
$$\dot{\Phi}_i = c_{i2}k_{i1}\sum_{j=1}^{N}\left[\frac{\partial^2 Q_i(u^*(t),\varsigma_i)}{\partial u_i\partial u_j} + o\Big(\max_{i\in\mathbb{N}}a_i^3\Big)\right]\xi_{j1} + k_{i2}\sum_{j=1}^{N}\left[\frac{\partial\,\mathrm{lgn}(\mu_i)}{\partial\mu_i}\frac{\partial^2 Q_i(u^*(t),\varsigma_i)}{\partial u_i\partial u_j} + o\Big(\max_{i\in\mathbb{N}}a_i^3\Big)\right]\xi_{j1} \quad (15)$$
where $\xi_{i1} = \xi_i - \big(\sum_{j=1}^{N}h_{ijj}a_j^2 + o(\max_{i\in\mathbb{N}}a_i^3)\big)$, $i \in \mathbb{N}$.
Theorem 3: Consider the system in (9) and (10). Suppose that Assumptions 1–5 hold and the frequencies of the dither signals are chosen according to $\omega_i \neq \omega_j$, $2\omega_i \neq \omega_j$, $\omega_i \neq \omega_j + \omega_k$, $2\omega_i \neq \omega_j + \omega_k$, $\omega_i \neq 2\omega_j + \omega_k$, and $3\omega_i \neq \omega_j$ for all $i, j, k \in \mathbb{N}$, $i \neq j \neq k$. Then, $\hat{u}_i$ converges to a neighborhood of the time-varying Nash equilibrium trajectory, i.e., $\hat{u}_i \to u_i^* + \sum_{j=1}^{N}h_{ijj}a_j^2 + o(\max_{i\in\mathbb{N}}a_i^3)$, if the control gains are suitably chosen and $\hat{u}_i(0)$ is sufficiently close to the Nash equilibrium.
Proof: This theorem can be shown by defining a filtered signal $E_z = d(AE_{\xi 1})/dt + c_2AE_{\xi 1}$, where $E_{\xi 1} = [\xi_{11}, \xi_{21}, \ldots, \xi_{N1}]^T$. The subsequent analysis is similar to that of Theorem 2. The candidate Lyapunov function can be defined as $V = \frac{m}{2}E_{\xi 1}^TA^TAE_{\xi 1} + E_z^TE_z$ with $m > 0$. By Lyapunov analysis, the theorem can be proved. The details are omitted here due to space limitations.
V. NUMERICAL EXAMPLE
Consider a three-player game with payoff functions given by $Q_1 = -(u_1 - \frac{1}{3}u_2 + \frac{1}{2}u_3 - \varsigma_1 + 1)^2 + 2 + \varsigma_2$, $Q_2 = -(-\frac{1}{4}u_1 + u_2 - \frac{1}{2}u_3 - \varsigma_3 + 3)^2 + 1 + \varsigma_4$, and $Q_3 = -(-\frac{1}{2}u_1 - \frac{1}{5}u_2 + u_3 + 5)^2 + 2 + \varsigma_5$, where $\varsigma_1$, $\varsigma_2$, $\varsigma_3$, $\varsigma_4$, and $\varsigma_5$ are unknown time-varying signals.
Solving this game gives the time-varying Nash equilibrium $u_1^*(t) = \frac{108}{121}\varsigma_1 + \frac{28}{121}\varsigma_3 + \frac{8}{121}$, $u_2^*(t) = \frac{60}{121}\varsigma_1 + \frac{150}{121}\varsigma_3 - \frac{735}{121}$, $u_3^*(t) = \frac{6}{11}\varsigma_1 + \frac{4}{11}\varsigma_3 - \frac{68}{11}$, and the payoff values for the three players at the Nash equilibrium are $Q_1^*(t) = 2 + \varsigma_2$, $Q_2^*(t) = 1 + \varsigma_4$, and $Q_3^*(t) = 2 + \varsigma_5$. The simulation results with $\omega_1 = 15$, $\omega_2 = 9$, $\omega_3 = 20$, and $a_i = 0.1$, $i \in \{1, 2, 3\}$, are shown in Figs. 4 and 5. The simulation results show that $\hat{u}(t)$ reaches a neighborhood of the time-varying Nash equilibrium.

Fig. 4. The blue, red, and green solid lines denote the actual Nash equilibrium trajectories for players 1, 2, and 3, respectively; the blue, red, and green dashed lines denote the estimated control inputs of players 1, 2, and 3, respectively, generated by the proposed seeking method.
Fig. 5. The blue, red, and green solid lines denote the actual payoff values at the Nash equilibrium trajectory for players 1, 2, and 3, respectively; the blue, red, and green dashed lines denote the output values of players 1, 2, and 3, respectively, generated by the proposed seeking method.
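The closed-form equilibrium reported above can be verified by solving the first-order conditions $\partial Q_i/\partial u_i = 0$, which form a linear system. The sketch below, with arbitrary sample values for $\varsigma_1$ and $\varsigma_3$, confirms the stated expressions:

```python
import numpy as np

s1, s3 = 0.7, -1.2                       # sample values of varsigma_1, varsigma_3
# First-order conditions of Q1, Q2, Q3 written as B u = b
B = np.array([[ 1.0, -1/3,  1/2],
              [-1/4,  1.0, -1/2],
              [-1/2, -1/5,  1.0]])
b = np.array([s1 - 1.0, s3 - 3.0, -5.0])
u = np.linalg.solve(B, b)
u_closed = np.array([108*s1/121 + 28*s3/121 + 8/121,
                     60*s1/121 + 150*s3/121 - 735/121,
                     6*s1/11 + 4*s3/11 - 68/11])
print(np.allclose(u, u_closed))          # True: matches the stated u*(t)
```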
VI. CONCLUSION
In this note, an ESC-based scheme is proposed to achieve time-varying Nash equilibrium seeking with time-varying output and Nash
equilibrium. By designing a delay-based gradient estimation method
and a robust gradient search method, we propose a seeking control
scheme to track the Nash equilibrium without knowing the explicit
model information. Quadratic games are considered first, followed by more general non-quadratic games. The estimated game
strategy is shown to globally asymptotically converge to the unknown
time-varying Nash equilibrium for symmetric quadratic games. For
general quadratic games and non-quadratic games, we show that the
seeking control update law enables the estimated game strategy to
converge to a neighborhood of the Nash equilibrium.
REFERENCES
[1] Y. Sarikaya, T. Alpcan, and O. Ercetin, “Resource allocation game for
wireless networks with queue stability constraints,” in Proc. IEEE Conf.
Decision and Control and European Control Conf., 2011, pp. 3098–3103.
[2] C. Saraydar, N. Mandayam, and D. Goodman, “Efficient power control
via pricing in wireless data networks,” IEEE Trans. Commun., vol. 50,
no. 2, pp. 291–303, Feb. 2002.
[3] A. Agah, S. Das, and K. Basu, “A non-cooperative game approach for
intrusion detection in sensor networks,” in IEEE Veh. Technol. Conf.,
2004, pp. 2902–2906.
[4] Q. Zhu, C. Fung, R. Boutaba, and T. Basar, “A game-theoretic approach
to rule sharing mechanism in networked intrusion detection systems:
Robustness, incentives and security,” in Proc. IEEE Conf. Decision and
Control and European Control Conf., 2011, pp. 243–248.
[5] Z. Ma, D. Callaway, and I. Hiskens, “Decentralized charging control of
large populations of plug-in electric vehicles,” IEEE Trans. Control Syst.
Technol., vol. 21, no. 1, pp. 67–78, Jan. 2013.
[6] A. Rad, V. Wong, J. Jatskevich, R. Scholar, and A. Garcia, “Autonomous
demand-side management based on game-theoretic energy consumption
scheduling for the future smart grid,” IEEE Trans. Smart Grid, vol. 1,
no. 3, pp. 320–331, 2010.
3005
[7] Q. Zhu, J. Zhang, P. Sauer, A. Garcia, and T. Basar, “A game-theoretic
framework for control of distributed renewable-based energy resources in
smart grids,” in Proc. American Control Conf., 2012, pp. 3623–3628.
[8] A. Fiat, E. Koutsoupias, K. Ligett, Y. Mansour, and S. Olonetsky, “Beyond
myopic best response,” in Proc. ACM-SIAM Symp. Discrete Algorithms,
2012, pp. 993–1005.
[9] D. P. Foster and H. P. Young, “Learning, hypothesis testing, and Nash equilibrium,” Games and Econom. Behav., vol. 45, no. 1, pp. 73–96, 2003.
[10] J. Shamma and G. Arslan, “Dynamic fictitious play, dynamic gradient play, and distributed convergence to Nash equilibria,” IEEE Trans. Autom. Control, vol. 50, no. 3, pp. 312–327, Mar. 2005.
[11] D. P. Foster and H. P. Young, “Regret testing: Learning to play Nash equilibrium without knowing you have an opponent,” Theoret. Econom., vol. 1, pp. 341–367, 2006.
[12] I. Cho and A. Matsui, “Learning aspiration in repeated games,”
J. Econom. Theory, vol. 124, no. 2, pp. 171–201, 2005.
[13] S. Li and T. Basar, “Distributed algorithms for the computation of noncooperative equilibria,” Automatica, vol. 23, no. 4, pp. 523–533, 1987.
[14] A. Kannan and U. V. Shanbhag, “Distributed computation of equilibria in monotone Nash games via iterative regularization techniques,” SIAM J. Optimiz., vol. 22, no. 4, pp. 1177–1205, 2012.
[15] I. Erev and A. E. Roth, “Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria,” American Econom. Rev., vol. 88, no. 4, pp. 848–881, 1998.
[16] M. Krstic and H. Wang, “Stability of extremum seeking feedback
for general nonlinear dynamic systems,” Automatica, vol. 36, no. 4,
pp. 595–601, 2000.
[17] K. B. Ariyur and M. Krstic, Real-Time Optimization by Extremum-Seeking Control. New York, NY, USA: Wiley-Interscience, 2003.
[18] Y. Tan, D. Nesic, I. Mareels, and A. Astolfi, “On global extremum seeking
in the presence of local extrema,” Automatica, vol. 45, no. 1, pp. 245–251,
2009.
[19] Y. Pan, T. Acarman, and U. Ozguner, “Nash solution by extremum seeking
control approach,” in Proc. IEEE Conf. Decision and Control, 2002,
pp. 329–334.
[20] Y. Pan and U. Ozguner, “Sliding mode extremum seeking control for
linear quadratic dynamic game,” in Proc. American Control Conf., 2004,
pp. 614–619.
[21] S. Liu and M. Krstic, “Stochastic Nash equilibrium seeking for games with nonlinear payoffs,” SIAM J. Control Optimiz., vol. 49, no. 4, pp. 1659–1679, 2011.
[22] A. Orlov and A. Strekalovskii, “Seeking the equilibrium situations in
bimatrix games,” Automat. Remote Control, vol. 65, no. 2, pp. 204–218,
2004.
[23] P. Frihauf, M. Krstic, and T. Basar, “Nash equilibrium seeking in noncooperative games,” IEEE Trans. Autom. Control, vol. 57, no. 5, pp. 1192–
1207, 2012.
[24] K. B. Ariyur and M. Krstic, Real-Time Optimization by Extremum-Seeking Control, 1st ed. New York, NY, USA: Wiley-Interscience, 2003.
[25] C. Makkar, G. Hu, W. G. Sawyer, and W. E. Dixon, “Lyapunov-based
tracking control in the presence of uncertain nonlinear parameterizable friction,” IEEE Trans. Autom. Control, vol. 52, no. 10, pp. 1988–1994,
2007.
[26] D. Monderer and L. S. Shapley, “Potential games,” Games and Econom. Behav., vol. 14, pp. 124–143, 1996.
[27] H. Khalil, Nonlinear Systems, 3rd ed. Upper Saddle River, NJ, USA:
Prentice-Hall, 2002.
[28] M. Ye and G. Hu, “Distributed seeking of time-varying Nash equilibrium
for non-cooperative games,” in Proc. IEEE Int. Conf. Control Autom.,
2013, pp. 1674–1679.