ON VARIOUS EQUILIBRIUM SOLUTIONS FOR LINEAR
QUADRATIC NONCOOPERATIVE GAMES
DISSERTATION
Presented in Partial Fulfillment of the Requirements for
the Degree Doctor of Philosophy in the
Graduate School of The Ohio State University
By
Xu Wang, B.E., M.E.
*****
The Ohio State University
2007
Dissertation Committee:

Prof. Jose B. Cruz, Jr., Adviser
Prof. Kevin Passino
Prof. Andrea Serrani

Approved by

Adviser
Graduate Program in Electrical
and Computer Engineering
© Copyright by
Xu Wang
2007
ABSTRACT
Game theory has been widely used to model decision-making processes in economic, political, military, and engineering fields, especially when multiple players are involved, because it captures the nature of these multi-player problems: the determination of one player's control strategy is not only subject to the system state evolution but is also tightly coupled to the determination of the other players' strategies, and vice versa. This interweaving of actions, reactions, and counter-reactions among game players gives rise to the essential difference between game problems and control problems: the stability of the players' control strategies must be considered in game problems. In this dissertation, we categorize linear quadratic (LQ) games into three groups: definite, singular, and indefinite.
For singular LQ games: 1) a new equilibrium concept, the asymptotic ε-Nash equilibrium, is proposed for a two-player nonzero-sum game where each player has a control-free cost functional quadratic in the system states over an infinite horizon and each player's control strategy is constrained to be continuous linear state feedback; 2) based on each player's singular control problem, a group of algebraic equations in the system coefficients is found whose solution constitutes a partial state feedback asymptotic ε-Nash equilibrium for the singular LQ games. Conditions on the relationship between the initial states and the parameter ε are provided such that the asymptotic ε-Nash equilibrium becomes an ε-Nash equilibrium or an ordinary Nash equilibrium; 3) for a class of 2nd-order singular LQ games, the closed-form asymptotic ε-Nash equilibrium implemented by partial state feedback is explicitly found in terms of the system coefficients.
Robust equilibrium solutions for two-player asymmetric games with an additive uncertainty are studied: 1) regarding the uncertainty as a third player, a three-player noncooperative nonzero-sum game is formed, and each player's cost functional value resulting from the Nash equilibrium of this three-player game is less conservative than his/her individual rationality; 2) a robust equilibrium solution based on output feedback is provided to accommodate the situation where the players have asymmetric knowledge about the game evolution; 3) regarding the coalition of the original two players as one player and the uncertainty as another player, a two-player noncooperative nonzero-sum game is formed to find an un-improvable robust equilibrium for the original game.
Inverse problems for indefinite games are investigated (finding weighting matrices in the cost functional integral such that a given linear state feedback control strategy constitutes a Nash equilibrium solution): 1) a necessary and sufficient condition for the inverse problem is provided using a group of algebraic equations that are linear in the variables and the weighting matrices. Because of the linearity of the equations, the inverse problem is easier to solve than the direct problem: given the system and cost functionals, find the Nash equilibrium solution; 2) the inverse problem for a class of 2nd-order two-player LQ games is thoroughly discussed.
To My Family
ACKNOWLEDGMENTS
First, I would like to thank my advisor, Prof. Jose B. Cruz, Jr., a real gentleman and a true scholar, who led me into the interesting and challenging world of game theory. I thank him from the bottom of my heart for his wonderful guidance, critical feedback on my technical writing, patience, and for being such an understanding advisor. During group and individual meetings, which I have enjoyed more and more, Prof. Cruz has helped me refine my thoughts. The new ideas sparked during our discussions made this dissertation possible.
I am extremely grateful to Prof. Kevin Passino and Prof. Andrea Serrani for serving on both my candidacy exam and my oral defense committees. I particularly thank Prof. Passino for the tips for graduate students on his homepage, from which I benefited a lot. Special thanks go to Prof. Serrani for his generosity in allowing us to attend his research group's output regulation seminar and for his wonderful lectures on nonlinear and adaptive control.

I also would like to express my thanks to Ms. Stella E. Rubia for her delicious Thanksgiving and Christmas dinners. I thank my teammates Dr. Dan Shen, Dr. Mo Wei, Dr. Xiaohuan Tan, Dr. Dongxu Li, and Ziyi Sun for helpful discussions and comments.
Finally, I am forever indebted to my parents. It is a pity that my dearly beloved father could not see the completion of this dissertation: Dad, I know you were too tired; just have a good rest there. The love and care from my mother could not have been greater. I feel sorry that I could not take care of her and help her as she has always done for me. Last but not least, I really appreciate the love, encouragement, help, and patience of my dear husband, Dr. Zhimin Yang, who is my best friend and is always there for me. I owe him a lot, especially for the sacrifices he made in order to be together with me.

I am thankful to all my friends who have accompanied me throughout this journey.
VITA
July, 1973 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Born - Baicheng, Jilin, China
July, 1995 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B.E. in Electrical Engineering
Harbin Institute of Technology, China
July, 1997 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M.E. in Electrical Engineering
Harbin Institute of Technology, China
September 2002-present . . . . . . . . . . . . . . . . . . . . . University Fellow, DAGSI Fellow,
Graduate Research Associate,
Graduate Teaching Associate,
Dept. of Electrical and Computer Eng.
The Ohio State University
Columbus, OH
PUBLICATIONS
Research Publications
Xu Wang, Jose B. Cruz, Jr., “Asymptotic ε-Nash equilibrium for 2nd-order two-player nonzero-sum singular LQ games with decentralized control”. Submitted to IFAC 2008.

Xu Wang, Jose B. Cruz, Jr., “Asymptotic ε-Nash equilibrium for two-player nonzero-sum singular LQ games”. Submitted to Automatica.

Xu Wang, Jose B. Cruz, Jr., “When is linear state feedback a Nash equilibrium solution for LQ games?”. To be submitted.

Xu Wang, Jose B. Cruz, Jr., G. Chen, H. Chang, and E. Blasch, “Formation Control in Multi-player Pursuit Evasion Game with Superior Evaders”. In Proc. of SPIE Defense and Security Symposium, (Orlando, FL, USA), 6578:36, April 9-12, 2007.

Xu Wang, Jose B. Cruz, Jr., “Nash equilibrium for 2nd-order two-player non-zero sum LQ games with executable decentralized control strategies”. In Proc. 45th IEEE Conference on Decision and Control, (San Diego, CA, USA), pp. 1960-1965, Dec. 13-15, 2006.

Jose B. Cruz, Jr., G. Chen, D. Li, and Xu Wang, “Particle Swarm Optimization for Resource Allocation in UAV Cooperative Control”. In Proc. AIAA Guidance, Navigation, and Control Conference, CD-ROM, August 2004.

Jose B. Cruz, Jr., G. Chen, D. Garagic, X. Tan, D. Shen, M. Wei, Xu Wang, “Team Dynamics and Tactics for Mission Planning”. In Proc. 42nd IEEE Conference on Decision and Control, (Maui, Hawaii, USA), 4:3579-3584, Dec. 9-12, 2003.
FIELDS OF STUDY
Major Field: Electrical Engineering
TABLE OF CONTENTS

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

Chapters:

1. Introduction
   1.1 Motivation
   1.2 Literature Review
       1.2.1 Equilibrium concepts
       1.2.2 Riccati equation solution
       1.2.3 Singular control
       1.2.4 Control systems with uncertainties
       1.2.5 Output feedback
   1.3 Contributions and Organization of This Dissertation
   1.4 Notation and Abbreviation

2. Asymptotic ε-Nash Equilibrium for Nonzero-sum Singular LQ Games
   2.1 Introduction
   2.2 Problem Statement
   2.3 Approximate Solution by Cheap Control
   2.4 Asymptotic ε-Nash Equilibrium
       2.4.1 Singular Optimal Control Problem for Each Player
       2.4.2 Asymptotic ε-Nash Equilibrium by Decentralized Control
   2.5 More General Performance Indices
   2.6 Numeric Example
   2.7 Conclusion

3. Robust Equilibrium for Asymmetric Games by Output Feedback
   3.1 Introduction
   3.2 Problem Statement: 2-player LQ Game with Uncertainty
   3.3 Robust Output Feedback Equilibrium Strategy by 3-player LQ Game
       3.3.1 The Vector-valued Problem of the Uncertainty
       3.3.2 Output Feedback Nash Equilibrium of 3-player LQ Game
   3.4 Un-improvable Robust Output Feedback Equilibrium Strategy by 2-player LQ Game
   3.5 Conclusion

4. When Is Linear State Feedback a Nash Equilibrium Solution for LQ Games?
   4.1 Introduction
   4.2 nth-order m-player Nonzero-sum LQ Games
       4.2.1 Direct Problem
       4.2.2 Inverse Problem
   4.3 Inverse Problem for 2nd-order 2-player Nonzero-sum LQ Game
       4.3.1 2nd-order 2-player Nonzero-sum Game with Infinite Horizon
       4.3.2 2nd-order 2-player Nonzero-sum Game with Finite Horizon
   4.4 Extension to 2n-order LQ Games
       4.4.1 2n-order LQ Game with Infinite Horizon
       4.4.2 2n-order LQ Game with Finite Horizon
   4.5 Numeric Example
   4.6 Conclusion

5. Conclusion and Future Studies
   5.1 Summary
   5.2 Future Studies

Bibliography
LIST OF TABLES

1.1 Notation
1.2 Abbreviation
4.1 Conditions on the weighting matrices
LIST OF FIGURES

2.1 Stability illustration based on a 2-dimension system
2.2 System states versus time
2.3 Phase plot
2.4 Controls versus time
2.5 Relationship of Tε and ε when fixing x10 = 100, x20 = −70
2.6 Range of initial conditions obtaining ε-Nash equilibrium when fixing ε = 15
3.1 2-player non-cooperative game with uncertainty illustration
4.1 System state versus time
4.2 State phase
4.3 Control versus time
CHAPTER 1
INTRODUCTION
1.1 Motivation
In the real world, many systems involve multiple players, each trying to optimize his/her own performance index. Since the pioneering work of John von Neumann and Oskar Morgenstern [56] in 1944, game theory has been widely used to model decision-making processes in economic, political, military, and engineering fields, especially when multiple players are involved. Game theory captures the nature of these multi-player problems: the determination of one player's control strategy is not only subject to the system state evolution but is also tightly coupled to the determination of the other players' strategies, and vice versa. When there is only one player in the system, we have an ordinary control problem. The interweaving of actions, reactions, and counter-reactions among game players gives rise to the essential difference between game problems and control problems: the stability of the players' control strategies must be considered in game problems. A group of control strategies is called stable if it is an equilibrium solution. For a game problem, an equilibrium solution is desired because the system can settle down at this equilibrium and evolve according to a set of fixed control strategies.
Nash [55] in 1950 proposed an equilibrium definition, now known as the Nash equilibrium, for non-cooperative games in which all the players decide their own control strategies independently. A Nash equilibrium is self-enforcing because a player who unilaterally deviates from the equilibrium solution will not obtain a better result. The extreme case of non-cooperative games is the adversarial zero-sum game, where a Nash equilibrium becomes the so-called saddle point.
Although in the real world there are few purely linear systems, under some circumstances a system can be studied via a linear model. For example, due to the high speed and short time interval, the end game of a missile interception problem [81][80][9] can be described by a linear model. Moreover, the study of linear systems is usually a basis for that of nonlinear systems. James, Nichols, and Phillips [36] studied the least squares problem for linear time-invariant (LTI) systems by frequency-domain methods. Kalman [38] first brought the least squares optimization problem with linear differential constraints into control theory in 1960. Since then, optimization problems with quadratic running performance indices (integrals with quadratic terms in the system state and/or control variables) have received a great deal of attention and have formed a large category of optimal control problems, i.e., linear quadratic problems. With the flexible linear quadratic formulation: 1) the system terminal state can be tuned; 2) the system state trajectory can be shaped; 3) the control energy can be adjusted; 4) the (strong) convexity/concavity of the performance index functional with respect to some variables is often a sufficient condition for an optimal solution to exist; using a quadratic form, this condition can be easily verified, and sometimes a closed-form solution can be found.

Linear quadratic games have recently been a popular topic in game theory. For example, Basar discussed various discrete-time and continuous-time linear quadratic games in [8]. In order to describe a game clearly, we need to answer the following questions: who are the players; what are their performance indices; what information is available to each player's decision-making process at every time point; what kind of strategy can be applied by each player; what kind of relationship exists among the players; and how does the game evolve. The linear quadratic noncooperative games studied in this dissertation are described by
described by
ẋ = Ax +
m
X
Bi ui
(x(0) = x0 )
(1.1)
i=1
y i = Ci x
Z ∞
Ji (x0 , ui , u−i ) =
0
(i = 1, · · · , m)
!
m
X
uTj Rij uj dt
xT Qi x +
j=1
(1.2)
(i = 1, · · · , m)
(1.3)
In (1.1), the system state x ∈ R^n; there are m players, u_1, · · · , u_m, with u_i ∈ U_i ≜ {u : [0, ∞) → R^{m_i}} ⊂ C_{m_i}[0, ∞). Here C_{m_i}[0, ∞) denotes the set of all vector-valued functions whose m_i entries are continuous functions on the domain [0, ∞). U_i is the set of all admissible controls of player i. For example,

$$U \triangleq U_1 \times U_2 \times \cdots \times U_m$$

could be the set of all the players' controls such that the closed-loop system is stable. Each player's output y_i ∈ R^{r_i} and performance index J_i(x_0, u_i, u_{-i}) ∈ R are described by (1.2) and (1.3) respectively. In (1.3), the time argument t is suppressed; M^T denotes the transpose of a matrix M; u_{-i} = {u_1, u_2, · · · , u_{i−1}, u_{i+1}, · · · , u_m} denotes the collection of all the players' controls except that of player i. The system matrix A, the control distribution matrices B_i, the output matrices C_i, the state weighting matrices Q_i = Q_i^T, and the control energy weighting matrices R_{ij} = R_{ij}^T (i, j = 1, · · · , m) are all constant matrices of proper dimensions. B_i has full column rank; C_i has full row rank; (A, B_i) is stabilizable and (C_i, A) is detectable (i = 1, · · · , m).
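To make the formulation concrete, here is a small simulation sketch (illustrative only, not part of the dissertation): it integrates the closed-loop dynamics (1.1) under given linear state feedback gains u_i = −K_i x and approximates the costs (1.3) by quadrature over a long finite horizon.

```python
# A minimal sketch (illustrative, not from the dissertation): simulate the
# closed-loop dynamics (1.1) under linear state feedback u_i = -K_i x and
# approximate each cost (1.3) by numerical quadrature.
import numpy as np
from scipy.integrate import solve_ivp

def costs(A, B, Q, R, K, x0, T=50.0):
    """B, Q, K are per-player lists; R[i][j] weights u_j in J_i."""
    m = len(B)
    Acl = A - sum(B[i] @ K[i] for i in range(m))        # closed-loop drift
    sol = solve_ivp(lambda t, x: Acl @ x, (0, T), x0,
                    dense_output=True, rtol=1e-8, atol=1e-10)
    ts = np.linspace(0, T, 5000)
    xs = sol.sol(ts)                                     # shape (n, len(ts))
    J = []
    for i in range(m):
        integrand = np.einsum('kt,kl,lt->t', xs, Q[i], xs)   # x^T Q_i x
        for j in range(m):
            us = -K[j] @ xs
            integrand += np.einsum('kt,kl,lt->t', us, R[i][j], us)
        J.append(np.trapz(integrand, ts))                # finite-horizon proxy
    return J
```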
Denote by I_i(t) the information available to player i at time t. Based on I_i(t), player i needs to find a control strategy u_i from U_i as the optimal response to all the other players' controls u_{−i}. For example, when C_i ∈ R^{n×n} has full rank and I_i(t) = {y_i(t)}, player i actually has full information about the system state at time t and can construct a full state feedback control strategy.
Define z = (x^T, u_1^T, · · · , u_m^T)^T, Π_{x_i} = Q_i, Π_{u_i} = diag(R_{i1}, · · · , R_{ii}, · · · , R_{im}), and Π_i = diag(Π_{x_i}, Π_{u_i}). Then J_i(x_0, u_i, u_{−i}) in (1.3) can be rewritten as

$$J_i(x_0, u_i, u_{-i}) = \int_0^\infty z^T \Pi_i z \, dt \tag{1.3'}$$
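As a small illustrative helper (not from the dissertation), the block-diagonal Π_i of (1.3') can be assembled directly with SciPy:

```python
# Illustrative helper: assemble Pi_i = diag(Q_i, R_i1, ..., R_im) from (1.3').
from scipy.linalg import block_diag

def make_Pi(Qi, Ri):
    """Qi: state weight of player i; Ri: list [R_i1, ..., R_im]."""
    return block_diag(Qi, *Ri)
```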
Denote the m-player noncooperative game described by (1.1), (1.2), and (1.3') as G1. In this dissertation, without loss of generality, we study game problems where each player has a minimization problem; games where each player has a maximization problem can be studied similarly. A Nash equilibrium solution u* = (u_1*, · · · , u_m*) ∈ U is desired for game G1.
Definition 1.1. u* = (u_1*, · · · , u_m*) ∈ U is a Nash equilibrium solution for game G1 if we have

$$J_i(x_0, u_{-i}^*, u_i) \geq J_i(x_0, u_{-i}^*, u_i^*) \qquad (\forall u_i \in U_i;\ i = 1, \cdots, m) \tag{1.4}$$
When we have an infinite horizon game problem, stability of the closed-loop system
is the basic requirement. In this dissertation, we always require that U is the set of all
the players’ controls such that the closed-loop system is stable. We have the following
definitions in terms of each player’s minimization problem.
Definition 1.2. Noncooperative game G1 is well-posed if for a Nash equilibrium u* = (u_1*, · · · , u_m*) ∈ U we have J_i(x_0, u*) > −∞ (i = 1, · · · , m; ∀x_0 ∈ R^n).
It is only meaningful to study a well-posed game problem. In terms of the definiteness properties of Π_i and R_{ii}, G1 can be categorized into different classes according to the following definitions.

Definition 1.3. Game G1 is definite if Π_i ≥ 0 and R_{ii} > 0 for all i = 1, · · · , m.

Definition 1.4. Game G1 is singular if Π_i ≥ 0 and R_{ii} ≥ 0 for all i = 1, · · · , m, and there is at least one player, say player j ∈ {1, · · · , m}, whose R_{jj} ≥ 0 is singular.

Definition 1.5. Game G1 is indefinite if there is at least one player, say player i ∈ {1, · · · , m}, whose Π_i is indefinite.
It can be proved that a definite game is well-posed; it is the simplest of the three classes. Basar (Proposition 6.8 in [8]) provided a sufficient condition for the construction of the state feedback Nash equilibrium solution (1.6) for a definite game G1, based on the solution (θ_1, · · · , θ_m) to the coupled Riccati equations (1.5):

$$\begin{aligned}
&\left(A - \sum_{j=1}^{m} B_j R_{jj}^{-1} B_j^T \theta_j\right)^T \theta_i + \theta_i \left(A - \sum_{j=1}^{m} B_j R_{jj}^{-1} B_j^T \theta_j\right) \\
&\qquad + \sum_{j=1}^{m} \left(R_{jj}^{-1} B_j^T \theta_j\right)^T R_{ij} R_{jj}^{-1} B_j^T \theta_j + Q_i = 0 \qquad (i = 1, \cdots, m)
\end{aligned} \tag{1.5}$$

$$u_i^* = -R_{ii}^{-1} B_i^T \theta_i x \qquad (i = 1, \cdots, m) \tag{1.6}$$

In (1.6), x is the state of the closed-loop system (1.7):

$$\dot{x} = \left(A - \sum_{i=1}^{m} B_i R_{ii}^{-1} B_i^T \theta_i\right) x \qquad (x(0) = x_0) \tag{1.7}$$
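No general closed-form solution to (1.5) is known (as discussed next), but equilibrium candidates are often computed numerically. The following sketch is a minimal illustration, not a method from this dissertation: a Gauss-Seidel-style best-response iteration under the simplifying assumption R_ij = 0 for i ≠ j, in which case each player's sub-problem in (1.5) reduces to a standard algebraic Riccati equation; convergence is not guaranteed.

```python
# A minimal numerical sketch for (1.5)-(1.6), assuming R_ij = 0 for i != j so
# that each player's sub-problem is a standard ARE; this is an illustrative
# best-response iteration, not a general or guaranteed-convergent method.
import numpy as np
from scipy.linalg import solve_continuous_are

def coupled_riccati(A, B, Q, R, iters=200, tol=1e-10):
    """B, Q, R are per-player lists; returns the theta_i and gains K_i."""
    m, n = len(B), A.shape[0]
    theta = [np.zeros((n, n)) for _ in range(m)]
    for _ in range(iters):
        prev = [t.copy() for t in theta]
        for i in range(m):
            # Drift seen by player i: other players' current feedback applied.
            Acl = A - sum(B[j] @ np.linalg.solve(R[j], B[j].T) @ theta[j]
                          for j in range(m) if j != i)
            # Best response of player i: a standard ARE against Acl.
            theta[i] = solve_continuous_are(Acl, B[i], Q[i], R[i])
        if max(np.linalg.norm(theta[i] - prev[i]) for i in range(m)) < tol:
            break
    # Candidate equilibrium gains, cf. (1.6): u_i* = -R_ii^{-1} B_i^T theta_i x
    K = [np.linalg.solve(R[i], B[i].T) @ theta[i] for i in range(m)]
    return theta, K
```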
The absence of a solution theory for the coupled Riccati equations (1.5) is a bottleneck in the analysis of linear quadratic game problems. Up to now, to the best of our knowledge, there are no general results on the existence and uniqueness of an analytical solution to the coupled Riccati equations (1.5) that can be expressed explicitly in terms of the system coefficients, due to the cross-product terms θ_i B_j R_{jj}^{-1} B_j^T θ_j (i ≠ j). When there is only one player in the definite game, say u_1, we have the well-known linear quadratic regulator (LQR) problem, and Riccati equation (1.5) becomes

$$A^T \theta_1 + \theta_1 A + Q_1 - \theta_1 B_1 R_{11}^{-1} B_1^T \theta_1 = 0 \tag{1.8}$$

Kalman ([38], Theorem 6.6) proved that if (A, B_1) is completely controllable and (D_1, A) (where D_1^T D_1 = Q_1) is observable, there is a unique positive definite solution θ_1 to (1.8).
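As a quick sanity check on the single-player case, SciPy's standard ARE solver verifies (1.8) numerically; the system below is an arbitrary illustrative example, not one from the dissertation.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Arbitrary illustrative single-player data (not from the dissertation).
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B1 = np.array([[0.0], [1.0]])
Q1 = np.eye(2)
R11 = np.array([[1.0]])

theta1 = solve_continuous_are(A, B1, Q1, R11)
# The residual of (1.8) should be numerically zero.
res = A.T @ theta1 + theta1 @ A + Q1 \
      - theta1 @ B1 @ np.linalg.solve(R11, B1.T) @ theta1
print(np.linalg.norm(res))  # ~1e-14
```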
If R_{ii} is negative definite, G1 may not be well-posed. So in this dissertation we always assume that R_{ii} ≥ 0, and we will point out when R_{ii} > 0 and when R_{ii} = 0.

Some researchers have provided solutions to the Riccati equations (1.5) for some scalar definite games. Could we say more for more general cases, e.g., higher-order systems?
If G1 is singular and each player's control is constrained to be continuous due to physical limits, i.e., discontinuous controls such as impulsive inputs are not allowed, then exact singular optimal control may not be possible. What kind of new equilibrium solution should be developed to accommodate this situation and to serve as guidance for the derivation of each player's singular control?

When there is an additive uncertainty in (1.1) and the multiple (≥ 2) players have asymmetric information about the game, what kind of robust equilibrium solution can be developed such that the security level of the players' cost functional values is less conservative than the individual rationalities resulting from a minimax design?

For a minimization LQR control problem, it is usually difficult to explain the physical motivation if the state weighting matrix in the cost functional integral is not positive semi-definite. But for a noncooperative game this is meaningful, especially when the players have conflicting objectives. If we cannot solve the Riccati equations associated with linear quadratic game problems completely, could we study these games from another point of view: given a set of players' feedback gains, for what Q_i's and R_{ij}'s can these gains constitute a Nash equilibrium? The answer to this question can help reveal the degree of design freedom in these game problems.

In this dissertation, we will find the answers to these challenging problems.
1.2 Literature Review
In this section, the references related to the topics in this dissertation will be reviewed.
1.2.1 Equilibrium concepts
For non-cooperative games, we derive Nash equilibrium solutions; for cooperative games, we find Pareto solutions. These two equilibrium concepts are the most basic definitions. Many other equilibrium concepts have been developed, based on or independent of these two, in order to match the special requirements of some games or to overcome the shortcomings of existing equilibrium concepts. When there are more than two players in a non-cooperative game, the situation becomes more complex. A non-cooperative game does not mean that each player has to play in an adversarial way; cooperation may happen in a non-cooperative game even without prior agreements among the players. For example, in a multi-customer communication network game, when a communication traffic jam occurs, the later-logging customers will log off the network willingly, because the congestion would be exacerbated if they insisted on logging onto the network. So for multi-player (more than two players) non-cooperative games, the Nash equilibrium concept by itself is not enough to depict the stability of the solutions. Bernheim [10][11], Moreno [54], Konishi [45], and their respective co-workers developed coalition-proof Nash equilibrium solutions, which are stable with respect not only to the deviations of a single player or a single coalition but also to the coalition structure. Fudenberg and Levine [29] proposed the self-confirming equilibrium and the consistent self-confirming equilibrium for the following situation: in a game involving nature's move, players know their own performance indices, the distribution of nature's move, and the other players' strategy spaces, although they do not know which strategies the other players will use. Like a Nash equilibrium, a self-confirming equilibrium describes the situation where each player's strategy is the best response to his/her beliefs about the play of the other players. But a self-confirming equilibrium only requires that each player's beliefs about the other players' play be correct along the equilibrium path of play, while a Nash equilibrium requires that each player's beliefs about the other players' play be exactly correct. The consistent self-confirming equilibrium is a special case of the extensive-form correlated equilibrium introduced by Forges [26]. Voorneveld and co-workers [83] introduced the ideal equilibrium concept in order to derive the Pareto equilibria of multi-criterion games by solving the Nash equilibria of scalarized games. Based on scalar ε-optimality concepts, Loridan [51] and White [87] developed ε-solutions for multi-criterion optimization problems. A Nash equilibrium is self-enforcing because any deviation from the equilibrium solution gets the deviator a worse result. Zhukovskii [94] introduced an enforced-by-others equilibrium, called the Pareto equilibrium of objections and counter-objections, for non-cooperative games. First, this equilibrium is a Pareto solution when the game is regarded as a cooperative game, so it is un-improvable. Second, any player's deviation from the equilibrium point will be forced back to the equilibrium by the other players. The Pareto-efficient Nash equilibrium was studied by Papavassilopoulos [60] and Karavaev [41]. Being a solution to a multi-criterion optimization problem, a Pareto-efficient Nash equilibrium cannot be improved any further. Actually, any Nash equilibrium with Pareto efficiency is a Pareto equilibrium of objections and counter-objections, because every player in the non-cooperative game will have no objections to the Nash equilibrium. Wang and Cruz [84] proposed the asymptotic ε-Nash equilibrium concept for non-cooperative singular LQ games where each player's control is constrained to be continuous linear state feedback.
1.2.2 Riccati equation solution
In the 1960s, optimal control theory developed during the transition from classical control theory to modern state-space methods. In 1960, Kalman [38] introduced his theory (later named the Linear Quadratic Regulator) for time-varying multiple-input multiple-output linear systems with an integral performance index of quadratic form. This integral performance index described the output regulation error and the control energy. Using the calculus of variations, Kalman showed that the optimal control takes the form of linear feedback of the system states, and that the time-varying state feedback gains can be obtained by solving a Riccati equation backward in time to steady state. Actually, there are various kinds of Riccati equations (e.g., matrix/operator algebraic/differential/difference Riccati equations (ARE/DRE)) associated with optimization problems such as linear optimal control, optimal filtering, singular perturbation theory, and so on. For linear dynamic games with quadratic performance indices, if each player applies a control strategy of linear state feedback, the associated Riccati equations are coupled nonlinear equations. Mathematically speaking, there are no good tools for discussing the existence and uniqueness of the solutions to the Riccati equations associated with game problems; there are many papers in which the conclusions are given in terms of Riccati equations and the existence of the solutions is an imperative assumption, e.g., [17][18][16]. Based on the existence of an H∞-type Riccati equation solution and a comparison theorem, Abou-Kandil, Freiling, et al. [1] provided a sufficient condition for the global existence of a solution to the DRE of a two-player non-cooperative game. Although sufficient conditions are also provided in [1] for the ARE associated with a two-person non-cooperative linear quadratic game, the result is hard to apply because the implicit condition already involves the solutions to the Riccati equation. For simple cases, e.g., a scalar system [86][24] or when all the coefficient matrices are diagonal, sufficient conditions have been obtained. Papavassilopoulos and Olsder [63] studied for the first time the global existence of solutions of closed-loop Nash matrix Riccati equations, but the game studied in that paper has a very special form: the two players have the same control matrices in the system equation and the same control weighting matrices in the performance indices. Cruz and Chen [19] discussed a series solution to the Riccati equations associated with open-loop non-cooperative games. Using Brouwer's fixed-point theorem and a comparison theorem, Papavassilopoulos, Cruz, and co-workers [62][61] provided sufficient conditions for the local existence of the solutions to the coupled symmetric Riccati equations in nonzero-sum linear quadratic games. Wang and Cruz [85] for the first time explicitly solved the Riccati equations associated with a class of 2nd-order two-player nonzero-sum LQ games, and the corresponding closed-form Nash equilibrium solution can be implemented by decentralized control strategies.
1.2.3 Singular control
For dynamic optimal control systems, we have singular optimal control problems if the optimal solutions cannot be determined from a minimum or a maximum principle in terms of the Hamiltonians associated with the dynamic systems. Based on the theory of characteristics, Kelley [43] introduced a nonsingular transformation, now well known as Kelley's transformation, for Mayer problems linear in a single control variable. The result of the transformation is that the original singular optimization problem is turned into a regular optimization problem of reduced dimension, so that the Legendre-Clebsch condition can be applied. Butman [14] dealt with optimization problems where the cost function does not depend on the control vector. There, the coefficient matrix of the controls in the system model is state-independent and has full column rank, so that the state space at every time instant can be decomposed into two orthogonal subspaces: one spanned by the columns of the control coefficient matrix, the other its complement. As in [43], an optimization problem of lower dimension, with newly defined control variables appearing in the performance index, was studied in place of the original problem. Because the optimal state trajectory of a singular control problem (often called a singular arc) can only exist within some subspace of the state space, if the control variable is not constrained to a compact set, impulsive controls will be used at the initial time in order to bring the state trajectory immediately onto the singular arc. Speyer and Jacobson [74] studied the optimization problem where the control enters the integrand of the cost functional linearly. Kelley's transformation was applied to the accessory minimum problem (finding the control deviation that minimizes the second variation of the augmented cost functional) to obtain a lower-dimension regular optimal control problem. Ho [35] applied the above techniques to a singular stochastic optimal control problem. In the transformed lower-dimension regular optimal control problems in [43][14][74][35], the states with nonsingular weighting matrices in the performance indices were regarded as control variables. Singularly perturbed systems are another topic that has drawn a lot of attention. By a special coordinate basis concerning amplitude and time scaling, Saberi and Sannuti [67] transformed cheap and singular LQR systems into singularly perturbed systems. Thus the singularity in the
performance indices was transferred to the system dynamics, and the optimal solution was approximated to a desired degree of accuracy by studying slow and fast regulator problems.
1.2.4 Control systems with uncertainties
When an optimal control system involves some uncertainty, robust optimal control is desired. In classical control theory, gain and phase margins are used to design a robust controller for a family of controlled plants subject to parameter uncertainties or unknown inputs. 'Minimax' is a term originating from statistical decision theory in the 1950s [71]. In the 1960s, researchers (Salmon [68] and Witsenhausen [88]) began to use game theory to design robust minimax controllers. Bertsekas and Rhodes [12] designed minimax feedback controls for discrete-time systems with bounded uncertainties. In the 1980s, the H∞ control method was introduced for linear quadratic optimal control systems with additive disturbances, e.g., Zames [92]. In fact, H∞ control is also a worst-case design, or a minimax/maximin optimization problem, which can be regarded as the steady state of a zero-sum differential game between the controls and the disturbances. In Basar [7], H∞ optimal control by a dynamic game approach has been thoroughly studied. Since the 1980s, robust controllers by game-theoretic approaches have been widely used for various kinds of control systems: references [59][91][52] studied the nonlinear H∞ controller design problem; [79][15] addressed minimax controllers for stochastic systems; [42] described an application to a military air operation problem.
1.2.5 Output feedback
For a linear time-invariant system

$$\dot{x} = Ax + Bu, \qquad y = Cx$$

when a complete set of state variables is not available for the construction of control signals, output feedback can be an alternative. Also, for practical applications, as long as the overall system performance is not degraded too much, output feedback may outperform full state feedback in the following aspects: 1) less on-line computation; 2) fewer control loops, which implies fewer sensors and controllers; 3) lower cost and higher reliability. Output feedback problems are more general than state feedback problems. When the output matrix C is nonsingular, output feedback is equivalent to full state feedback.
Output feedback can be categorized into two main groups: direct output feedback (with time-varying or constant gains) and dynamic output feedback. For direct output feedback, the control signals are directly constructed as

$$u = K(t)y \tag{1.9}$$

If K(t) in (1.9) is time-independent, then (1.9) is static output feedback. For dynamic output feedback, the compensator can be constructed as

$$\dot{\xi} = E\xi + Fy \tag{1.10}$$

$$u = G\xi + Hy \tag{1.11}$$

where ξ ∈ R^l (l ≤ n), and E, F, G, and H are constant matrices of proper dimensions. When (C, A) is observable, with E = A + BK − LC, F = L, G = K, and H = 0, where K stabilizes (A, B) and L^T stabilizes (A^T, −C^T), (1.10) is actually a state observer, and the separation principle applies.
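This observer-based construction can be checked numerically. The sketch below uses made-up matrices and the u = −Kx sign convention (so the compensator dynamics become E = A − BK − LC, G = −K, H = 0) and confirms the separation principle; the pole locations are arbitrary.

```python
import numpy as np
from scipy.signal import place_poles

# Made-up example system (not from the dissertation); u = -K x convention.
A = np.array([[0.0, 1.0], [2.0, -1.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

K = place_poles(A, B, [-2.0, -3.0]).gain_matrix        # A - BK Hurwitz
L = place_poles(A.T, C.T, [-5.0, -6.0]).gain_matrix.T  # A - LC Hurwitz

E = A - B @ K - L @ C   # compensator dynamics (1.10): xi' = E xi + L y
# Closed loop of plant state x and observer state xi, with u = -K xi:
Acl = np.block([[A, -B @ K], [L @ C, E]])
# Separation principle: eigenvalues are those of A - BK plus those of A - LC.
print(sorted(np.linalg.eigvals(Acl).real))  # [-6, -5, -3, -2]
```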
In terms of Riccati-like equations, Levine et al. [49] studied 1) necessary conditions for static and dynamic output feedback for the optimal control of an LTI system with an infinite-time performance index, and 2) necessary and sufficient conditions for direct output feedback with time-varying gains for a linear time-varying system with a finite-time performance index. Lin and Hu [50] provided the optimal static output feedback for a linear discrete-time system in earthquake engineering by solving a group of algebraic equations. In [72], for a linear time-varying system with an additive bounded uncertainty, robust output feedback controllability was solved via an H∞ problem similar to that in [7]. [77] is a survey of static output feedback for LTI systems; it pointed out the essential difficulty: there exist no testable necessary and sufficient conditions for the stabilizability of a given system by static output feedback. Gao et al. [31] compared the control effects of optimal state feedback and optimal output feedback and investigated the relationship between state feedback gains and output feedback gains, described by a linear algebraic equation, in terms of initial state conditions. Using structured Lyapunov functions and linear matrix inequalities, Prempain and Postlethwaite [65] studied static output feedback that can stabilize a class of LTI systems and at the same time provide a sub-optimal solution to H∞ control. Larin [47] proposed an algorithm (including how to select the initial approximation) for the design of an optimal static output feedback controller for a periodic discrete-time system by transforming the periodic system into a stationary one.
Takaba and Katayama [78] used a model-matching technique in the frequency domain to solve an H2 problem in a transfer function setting for a multi-variable descriptor system. The optimal controller was given by solving two general algebraic Riccati equations (AREs), corresponding in state space to the descriptor variable feedback and the descriptor variable estimator. Grimble [33] found the optimal output feedback solution to an H2 problem by solving polynomial matrix equations of the z-domain transfer function, obtained from a Diophantine approach (an alternative to model matching). Combining nonlinear model predictive control and a high-gain observer, Findeisen et al. [25] proposed an output feedback stabilization scheme that achieves semi-global practical stability for uniformly completely observable nonlinear multiple-input multiple-output (MIMO) systems; the output feedback scheme was implemented in an open-loop-feedback manner. Based on the solutions to Riccati-type coupled equations, Dragan et al. [21] discussed H2 output feedback for stochastic linear systems subject to both Markov jumps and multiplicative white noise. Edwards et al. [22] studied dynamic output feedback min-max controllers for non-square linear systems with an additive uncertainty that has the same distribution matrix as the control input. Andrieu and Lahanier [5] studied dynamic output feedback for the guidance and autopilot law design of a nonlinear terminal missile interception problem.
1.3 Contributions and Organization of This Dissertation
The background and motivation of this dissertation are introduced in Chapter 1. The main contributions of this dissertation are as follows:

1. In Chapter 2, singular LQ games are solved.

• A new equilibrium concept, the asymptotic ε-Nash equilibrium, is proposed for a two-player nonzero-sum game where each player has a control-free cost functional quadratic in the system states over an infinite horizon and each player's control strategy is constrained to be continuous linear state feedback.

• Based on each player's singular control problem, a group of algebraic equations in the system coefficients is found whose solution constitutes a partial state feedback asymptotic ε-Nash equilibrium for the singular LQ games.
• Conditions on the relationship between the initial states and the parameter ε are provided such that the asymptotic ε-Nash equilibrium becomes an ε-Nash equilibrium or an ordinary Nash equilibrium.

• For a class of 2nd-order singular LQ games, the closed-form asymptotic ε-Nash equilibrium implemented by partial state feedback is explicitly found in terms of the system coefficients.

2. A robust equilibrium solution for two-player asymmetric games with an additive uncertainty is proposed in Chapter 3.

• Regarding the uncertainty as a third player, a three-player noncooperative nonzero-sum game is formed, and each player's cost functional value resulting from the Nash equilibrium of this three-player game is less conservative than their own individual rationality.

• A robust equilibrium solution based on output feedback is provided to accommodate the situation where the players have asymmetric knowledge about the game evolution.

• Regarding the coalition of the original two players as one player and the uncertainty as another player, a two-player noncooperative nonzero-sum game is formed to find the un-improvable robust equilibrium for the original game.

3. Chapter 4 discusses the inverse problem for indefinite games: finding weighting matrices in the cost functional integral such that given stabilizing control strategies (linear state feedback) constitute a Nash equilibrium solution.
• A necessary and sufficient condition for the inverse problem is provided using a group of algebraic equations that are linear in the variables and the weighting matrices. Because of the linearity of the equations, the inverse problem is easier to solve than the direct problem: given the system and cost functionals, find the Nash equilibrium solution.

• The inverse problem for a class of 2nd-order two-player LQ games is solved.

1.4 Notation and Abbreviation
x                  system state
x̂                  system state estimation
x̂i                 system state estimation by player i
ui                 player i's control variable
u−i                the collection of controls: (u1, · · · , ui−1, ui+1, · · · , um)
Ui                 the set of admissible controls of player i
U−i                the collection of control sets: U1 × · · · × Ui−1 × Ui+1 × · · · × Um
U                  the collection of all control sets: U1 × · · · × Um
z                  uncertainty
n                  system state dimension
m                  control dimension
mi                 player i's control dimension
r                  output dimension
ri                 player i's output dimension
A                  system matrix
B                  control distribution matrix
Bi                 player i's control distribution matrix
Di                 player i's observer gain
Ei                 player i's output matrix
Ki                 player i's state feedback gain
Fi                 player i's output feedback gain
I                  identity matrix
J                  performance index
Ji                 player i's performance index
Ci                 player i's state terminal weighting matrix
Q                  state weighting matrix
Qi                 player i's state weighting matrix
Rij                player i's control weighting matrix for uj
C^p_{i×j}[a, b]    collection of matrix-valued functions whose i × j entries are pth-order continuously differentiable over [a, b]
θ                  Riccati equation solution
θi                 Riccati equation solution for player i
λ                  eigenvalue
σ(M)               spectrum of matrix M
R^{i×j}            i × j real number space
C                  complex number space
C−                 complex numbers with negative real parts
| · |              vector norm
‖ · ‖              matrix norm

Table 1.1: Notation
ARE    algebraic Riccati equation
DRE    differential Riccati equation
LQ     linear quadratic
LQR    linear quadratic regulator
LTI    linear time invariant
RCLF   robust control Lyapunov function
HJB    Hamilton-Jacobi-Bellman

Table 1.2: Abbreviation
CHAPTER 2
ASYMPTOTIC ε-NASH EQUILIBRIUM FOR NONZERO-SUM
SINGULAR LQ GAMES
2.1 Introduction
For parameter optimization problems (which we can regard as static optimization problems, in comparison with dynamic optimization problems where the system evolution is described by differential or difference equations), if the Hessian matrix of the performance index with respect to the parameter is singular, then we have a singular optimization problem. In this situation, more information is needed in order to decide whether a stationary point (at which the Jacobian of the performance index with respect to the parameter is zero) is a local optimum. For dynamic optimal control systems, we have singular optimal control problems if the optimal solutions cannot be determined from a minimum or a maximum of the Hamiltonians associated with the minimum principle for dynamic systems. Consider the classical linear quadratic regulator problem (2.1), where x is the system state, u is the control variable, and the state weighting matrices P and Q in the cost functional are positive semi-definite:

$$\begin{aligned}
\dot{x} &= Ax + Bu \qquad (x(t_0) = x_0) \\
J(x_0, u) &= x_{t_f}^T P x_{t_f} + \int_{t_0}^{t_f} \left( x^T Q x + u^T R u \right) dt
\end{aligned} \tag{2.1}$$
If the control weighting matrix R in the cost functional is positive definite, (2.1) is a regular optimal control problem; if R is only positive semi-definite, (2.1) is a singular optimal control problem. An extreme case of the singular LQR problem is R = 0, i.e., the cost functional is control-free. In singular optimal control problems, the Hamiltonians may have non-unique optima, and we cannot find the possible optimal control candidates directly from the Weierstrass necessary condition, that is, that the partial derivative of the Hamiltonian with respect to the control variable be zero. A differential game is singular if some players in the game have singular optimal control problems. Singular problems are very difficult to solve, and few publications have been devoted to this topic.
In the scenario of non-cooperative games, all players' strategies must be derived simultaneously to obtain the Nash equilibrium solution, so each player is sensitive to changes in the other players' strategies. Thus singular games exhibit special phenomena and properties that cannot be found in singular control problems. References [3][53][40] dealt with singularity in zero-sum games. Amato and Pironti [3] studied a two-player linear quadratic game where singularity arose because of the semi-definiteness of one player's weighting matrix in the cost functional; the transformation method of [14] was applied to obtain a reduced-order nonsingular game. Regular and singular points, which are the basis for defining singular and switching surfaces, were defined in [53][40], and the discussions therein were based on the method of characteristics. Melikyan [53] applied the method of singular characteristics to study equivocal surfaces, which exist only in singular zero-sum games due to particular convex-concave Hamiltonians. Kamneva [40] studied an optimum-time differential game where the evolution of the system states was affected by two separate parts controlled by the two players respectively; the singular surfaces contained in this optimum-time game were dispersal, equivocal, and switching surfaces.
Comparing optimal control and nonzero-sum differential games, Starr and Ho pointed out in [76][75] the special difficulties in the latter: 1) the necessary conditions that must be satisfied by Nash equilibrium solutions are a set of partial differential equations, while the counterpart in optimal control is a set of ordinary differential equations; 2) the relationship between the open-loop and closed-loop solutions in optimal control is no longer guaranteed in game problems. Sarma and Prasad [70] discussed methods to categorize and construct different kinds of switching surfaces in N-person nonzero-sum differential games and pointed out that the dispersal surfaces in nonzero-sum games are more complicated than those in zero-sum games. Olsder [57] studied nonzero-sum differential games with saturation constraints on the control variables; open- and closed-loop bang-bang controls were provided and applied to nonlinear and linear examples, and singular surfaces were also discussed as a by-product, for a better understanding of the properties of the value functions.

The exact optimal control in singular optimal control problems, or the optimal strategies in singular games, may contain discontinuities where the optimal paths cross the switching surfaces and the Hamiltonians are non-smooth. For example, when the control variables are constrained to compact sets, the optimal path will be of bang, singular-arc, bang type; when impulses are allowed, the optimal path will be of impulse and singular-arc type. In this chapter, we discuss a class of two-player nonzero-sum linear quadratic games where each player has a singular optimal control problem (the cost functionals are quadratic only in the system states). The only constraint on the two players' strategies is that they must be continuous linear state feedback, which is practical from the point of view of real applications. Because of the exclusion of discontinuities in the players' strategies, exact singular control for each player may not be achievable. Thus we propose a new equilibrium concept: the asymptotic ε-Nash equilibrium. Note that the term 'asymptotic' here differs from that in Glizer [32]: in our study, 'asymptotic' is in the sense of time, while in [32] it relates to the expansion order of the approximate solution in the boundary layer method used to solve singularly perturbed Riccati equations, which describes the accuracy of the approximate solution. ε-optimality has been employed before to construct equilibrium solutions in game problems. For example, in Fudenberg and Levine [28], Zhukovskiy and Salukvadze [95], Xu and Mizukami [90], and Jimenez and Poznyak [37], ε-equilibria and ε-saddle points were discussed for non-cooperative games.
This chapter is organized as follows. The problem and several equilibrium definitions, including the new equilibrium concept of asymptotic ε-Nash equilibrium, are introduced in Section 2.2. An approximate solution by the cheap control method is introduced in Section 2.3. Exploiting the property that each player has a control-free cost functional, the singular control problem faced by each player is formulated in Section 2.4.1. Based on each player's singular control problem, the asymptotic ε-Nash equilibrium implemented by partial state feedback is proposed in Section 2.4.2; consequently, the closed-loop system becomes a decentralized system. Section 2.5 deals with performance indices of more general form. The conclusion is given in Section 2.7, following a numeric example in Section 2.6.
2.2 Problem Statement
A two-player linear game system is described by

$$\begin{aligned}
\dot{x}_1 &= A_{11} x_1 + A_{12} x_2 + B_1 u_1 \qquad (x_1(0) = x_{10}) \\
\dot{x}_2 &= A_{21} x_1 + A_{22} x_2 + B_2 u_2 \qquad (x_2(0) = x_{20})
\end{aligned} \tag{2.2}$$

$$J_i(x_0, u_1, u_2) = \int_0^\infty \left( x_j^T Q_{ij} x_j + x_i^T Q_{ii} x_i \right) dt \qquad (i, j = 1, 2;\ i \neq j) \tag{2.3}$$
where the state x_i ∈ R^{n_i} and each player's control u_i ∈ U_i = {u : R_{≥0} → R^{m_i}} ⊂ C_{m_i}[0, ∞) (m_i ≤ n_i, i = 1, 2). U_i implies that each player's control must be continuous. A_{11}, A_{12} ≠ 0, A_{21} ≠ 0, A_{22}, B_1, and B_2 are constant real matrices of proper dimensions. Assume (A_{ii}, A_{ij}) is stabilizable and rank(B_i) = m_i (i, j = 1, 2; i ≠ j). Define

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \quad \bar{B}_1 = \begin{pmatrix} B_1 \\ 0 \end{pmatrix}, \quad \bar{B}_2 = \begin{pmatrix} 0 \\ B_2 \end{pmatrix}, \quad x(t) = \begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix}$$

Further assume that (A, B̄_i) (i = 1, 2) is stabilizable. Player i chooses his/her own control strategy u_i (i = 1, 2) independently so that his/her performance index (2.3) is minimized.
The constant real state weighting matrices Q_{ii} > 0 (positive definite) and Q_{ij} ≥ 0 (positive semi-definite) (i, j = 1, 2; i ≠ j) are of proper dimensions. Decompose Q_{ij} as Q_{ij} = C_{ij}^T C_{ij} and assume that (C_{ij}, A_{jj}) is detectable (i, j = 1, 2; i ≠ j).

Besides the constraint of continuity, each player constructs his/her control strategy by the linear state feedback method described by (2.4):

$$u_i(t) = -K_i x(t) = -(k_{i1}\ k_{i2}) \left( x_1^T(t)\ x_2^T(t) \right)^T \tag{2.4}$$

The two players have perfect knowledge of the system model and of the structure of the other player's performance index, and at each time, each player has perfect information about the system state x(t). Denote the game described by (2.2)-(2.4) by G2.
Note that each player's control does not appear in the performance indices (2.3); thus each player faces a singular control problem. Because each player's control is constrained to be continuous and impulsive control cannot be used, the exact optimal control for each player's singular optimal control problem may not be attainable. So we introduce the 'ε-Nash equilibrium' and 'asymptotic ε-Nash equilibrium' concepts, based on the definition of ε-optimality. First define the truncated performance index as

$$J_i(t, x_0, u_1, u_2) = \int_t^\infty \left( x_j^T Q_{ij} x_j + x_i^T Q_{ii} x_i \right) d\tau \qquad (t \in [0, +\infty);\ i, j = 1, 2;\ i \neq j) \tag{2.5}$$
Definition 2.1. For any ε > 0, (u*_{1ε}, u*_{2ε}) is called an ε-Nash equilibrium solution for game G2 if we have

$$\begin{aligned}
J_1(x_0, u_{1\varepsilon}^*, u_{2\varepsilon}^*) &\leq J_1(x_0, u_1, u_{2\varepsilon}^*) + \varepsilon \qquad (\forall u_1 \in U_1) \\
J_2(x_0, u_{1\varepsilon}^*, u_{2\varepsilon}^*) &\leq J_2(x_0, u_{1\varepsilon}^*, u_2) + \varepsilon \qquad (\forall u_2 \in U_2)
\end{aligned} \tag{2.6}$$

Definition 2.2. For any ε > 0, (u*_{1Tε}, u*_{2Tε}) is called an asymptotic ε-Nash equilibrium solution for game G2 if there exists a finite number Tε ∈ [0, ∞) such that for all t ≥ Tε we have

$$\begin{aligned}
J_1(t, x_0, u_{1T_\varepsilon}^*, u_{2T_\varepsilon}^*) &\leq J_1(t, x_0, u_1, u_{2T_\varepsilon}^*) + \varepsilon \qquad (\forall u_1 \in U_1) \\
J_2(t, x_0, u_{1T_\varepsilon}^*, u_{2T_\varepsilon}^*) &\leq J_2(t, x_0, u_{1T_\varepsilon}^*, u_2) + \varepsilon \qquad (\forall u_2 \in U_2)
\end{aligned} \tag{2.7}$$
Definition 2.2 focuses on what will happen to game G2 from time point Tε on.
Remark 2.1. 1) If ε = 0 in (2.6), the ε-Nash equilibrium (2.6) coincides with the ordinary Nash equilibrium (1.4). 2) If Tε = 0 in (2.7), the asymptotic ε-Nash equilibrium (2.7) coincides with the ε-Nash equilibrium (2.6). 3) If Tε = ε = 0 in (2.7), the asymptotic ε-Nash equilibrium (2.7) coincides with the ordinary Nash equilibrium (1.4).

We now try to find the ε-Nash equilibrium (u*_{1ε}, u*_{2ε}) and the asymptotic ε-Nash equilibrium (u*_{1Tε}, u*_{2Tε}) for game G2, implemented by the linear state feedback strategy (2.4).
2.3 Approximate Solution by Cheap Control
By the method of cheap control, singular game G2 can be solved approximately by the following definite game:

$$\begin{aligned}
\dot{x}_1 &= A_{11} x_1 + A_{12} x_2 + B_1 u_1 \qquad (x_1(0) = x_{10}) \\
\dot{x}_2 &= A_{21} x_1 + A_{22} x_2 + B_2 u_2 \qquad (x_2(0) = x_{20}) \\
J_i(x_0, u_1, u_2) &= \int_0^\infty \left( x_j^T Q_{ij} x_j + x_i^T Q_{ii} x_i + \varepsilon u_i^T R_{ii} u_i \right) dt \qquad (i, j = 1, 2;\ i \neq j)
\end{aligned} \tag{2.8}$$

In (2.8), ε > 0 is an arbitrarily small number, and R_{ii} > 0. When ε = 0 in (2.8), game (2.8) is equivalent to game G2. As described in Section 1.1, a state feedback Nash equilibrium (1.6) for game (2.8) can be found by solving the Riccati equations (1.5) with R_{ii} and R_{ij} replaced by εR_{ii} and 0 respectively. The state feedback Nash equilibrium for the definite game (2.8) is then an approximate solution for the singular game G2.
2.4 Asymptotic ε-Nash Equilibrium
Observing the performance index (2.3), because of the positive definiteness of Q_{ii}, we can first regard x_i as a pseudo control for each player in order to solve his/her singular optimal control problem, as shown in Section 2.4.1. The asymptotic ε-Nash equilibrium implemented by a decentralized control strategy is then obtained in Section 2.4.2. At the same time, the asymptotic stability of the closed-loop system and the value of Tε are also provided.
2.4.1 Singular Optimal Control Problem for Each Player
When discussing the singular optimal control problem P_i faced by each player i, let us temporarily fix the other player's control u_j (i, j = 1, 2; i ≠ j):

$$P_i : \begin{cases} \dot{x}_j = A_{jj} x_j + A_{ji} x_i + B_j u_j \\ J_i(x_0, u_1, u_2) = \int_0^\infty \left( x_j^T Q_{ij} x_j + x_i^T Q_{ii} x_i \right) dt \end{cases} \qquad (i, j = 1, 2;\ i \neq j) \tag{2.9}$$
Now we can find player i's optimal control x_i by the dynamic programming method. First construct the Hamilton-Jacobi-Bellman equation (2.10) for player i:

$$\min_{x_i}\{H_i\} = 0 \qquad (i = 1, 2) \tag{2.10}$$

where the Hamiltonian H_i is defined as

$$H_i = \left(\frac{\partial V_i}{\partial x_j}\right)^T (A_{jj} x_j + A_{ji} x_i + B_j u_j) + x_j^T Q_{ij} x_j + x_i^T Q_{ii} x_i \qquad (i, j = 1, 2;\ i \neq j)$$

Assume the scalar function V_i(t, x) determined by (2.10) has the form V_i = x_j^T θ_i x_j, with θ_i a positive semi-definite matrix.
It can be shown that ∂²H_i/∂x_i² = 2Q_{ii} > 0, i.e., H_i is strictly convex with respect to x_i. This implies that there exists a unique control strategy x_i such that H_i is minimized. By the necessary condition (which here is also sufficient because H_i is strictly convex with respect to x_i) that the first-order derivative of the Hamiltonian H_i with respect to x_i equal zero, i.e.,

$$\frac{\partial H_i}{\partial x_i} = 2\left(A_{ji}^T + \left(\frac{\partial u_j}{\partial x_i}\right)^T B_j^T\right)\theta_i x_j + 2Q_{ii} x_i = 0$$

the pseudo control strategy x_i minimizing H_i must be of the following form:

$$x_i = -Q_{ii}^{-1}\left(A_{ji}^T + \left(\frac{\partial u_j}{\partial x_i}\right)^T B_j^T\right)\theta_i x_j \qquad (i, j = 1, 2;\ i \neq j) \tag{2.11}$$
2.4.2 Asymptotic ε-Nash Equilibrium by Decentralized Control
Substituting (2.11) into (2.10), we have

$$\begin{aligned}
&2x_j^T \theta_i \left[ A_{jj} x_j - A_{ji} Q_{ii}^{-1}\left(A_{ji}^T + \left(\frac{\partial u_j}{\partial x_i}\right)^T B_j^T\right)\theta_i x_j + B_j u_j \right] \\
&\quad + x_j^T \theta_i \left(A_{ji} + B_j \frac{\partial u_j}{\partial x_i}\right) Q_{ii}^{-1}\left(A_{ji}^T + \left(\frac{\partial u_j}{\partial x_i}\right)^T B_j^T\right)\theta_i x_j + x_j^T Q_{ij} x_j = 0 \\
&\hspace{10em} (i, j = 1, 2;\ i \neq j)
\end{aligned} \tag{2.12}$$
Because (2.11) is the desired state relationship for each player in order to minimize his/her own performance index, we now define two new variables:

$$s_i = x_i + Q_{ii}^{-1}\left(A_{ji}^T + \left(\frac{\partial u_j}{\partial x_i}\right)^T B_j^T\right)\theta_i x_j \qquad (i, j = 1, 2;\ i \neq j) \tag{2.13}$$

If s_i tends to zero, then (2.11) can be satisfied. For two matrices L_i such that the −L_i are Hurwitz, which will be determined later, let

$$\dot{s}_i = -L_i s_i \qquad (i = 1, 2) \tag{2.14}$$
i.e.
$$
\begin{aligned}
&\dot x_i + Q_{ii}^{-1}\left(A_{ji}^T + \left(\tfrac{\partial u_j}{\partial x_i}\right)^T B_j^T\right)\theta_i\,\dot x_j\\
&= \left(A_{ii}x_i + A_{ij}x_j + B_iu_i\right) + Q_{ii}^{-1}\left(A_{ji}^T + \left(\tfrac{\partial u_j}{\partial x_i}\right)^T B_j^T\right)\theta_i\left(A_{jj}x_j + A_{ji}x_i + B_ju_j\right)\\
&= -L_i\left[x_i + Q_{ii}^{-1}\left(A_{ji}^T + \left(\tfrac{\partial u_j}{\partial x_i}\right)^T B_j^T\right)\theta_i x_j\right]
\end{aligned}
\quad (i,j=1,2;\ i\neq j) \tag{2.15}
$$
System (2.14) is exponentially stable. Depending on the eigenvalues of the −Li's, si will converge to its equilibrium (the origin) with some convergence rate. From (2.12) and (2.15), we need to find θ1, θ2, u1, and u2. Observing (2.12), we know that if each player applies the partial state feedback strategy
$$
u_i(t) = -k_{ii}x_i(t) \quad (i=1,2) \tag{2.16}
$$
then (2.12) will be an equation of θi which only involves xj. Note that, from (2.16), we have ∂ui/∂xj = 0. After factoring out xj, (2.12) is equivalent to Riccati equation (2.17):
$$
\begin{aligned}
2\theta_1(A_{22} - B_2k_{22}) - \theta_1 A_{21}Q_{11}^{-1}A_{21}^T\theta_1 + Q_{12} &= 0\\
2\theta_2(A_{11} - B_1k_{11}) - \theta_2 A_{12}Q_{22}^{-1}A_{12}^T\theta_2 + Q_{21} &= 0
\end{aligned} \tag{2.17}
$$
At the same time, under the assumption (2.16), (2.15) becomes (2.18):
$$
\begin{aligned}
&\left[A_{11} - B_1k_{11} + Q_{11}^{-1}A_{21}^T\theta_1 A_{21} + L_1\right]x_1\\
&\quad + \left[A_{12} + Q_{11}^{-1}A_{21}^T\theta_1(A_{22} - B_2k_{22}) + L_1Q_{11}^{-1}A_{21}^T\theta_1\right]x_2 = 0\\[4pt]
&\left[A_{21} + Q_{22}^{-1}A_{12}^T\theta_2(A_{11} - B_1k_{11}) + L_2Q_{22}^{-1}A_{12}^T\theta_2\right]x_1\\
&\quad + \left[A_{22} - B_2k_{22} + Q_{22}^{-1}A_{12}^T\theta_2 A_{12} + L_2\right]x_2 = 0
\end{aligned} \tag{2.18}
$$
The necessary and sufficient condition for (2.15) to be satisfied for any x1 and x2 is that the coefficient of every state in (2.18) is zero, i.e. (2.19) and (2.20):
$$
\begin{aligned}
A_{11} - B_1k_{11} + Q_{11}^{-1}A_{21}^T\theta_1 A_{21} + L_1 &= 0\\
A_{12} + Q_{11}^{-1}A_{21}^T\theta_1(A_{22} - B_2k_{22}) + L_1Q_{11}^{-1}A_{21}^T\theta_1 &= 0
\end{aligned} \tag{2.19}
$$
$$
\begin{aligned}
A_{21} + Q_{22}^{-1}A_{12}^T\theta_2(A_{11} - B_1k_{11}) + L_2Q_{22}^{-1}A_{12}^T\theta_2 &= 0\\
A_{22} - B_2k_{22} + Q_{22}^{-1}A_{12}^T\theta_2 A_{12} + L_2 &= 0
\end{aligned} \tag{2.20}
$$
From (2.17), (2.19) and (2.20), we need to find −L1 (Hurwitz), −L2 (Hurwitz), θ1 > 0, θ2 > 0, k11 and k22. A numerical sketch of this step is given below.
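The following sketch (an added illustration, not from the original text) shows how the six equations can be attacked numerically in the scalar case n1 = n2 = m1 = m2 = 1. The data anticipate Example 2.1 in section 2.6; fsolve is an arbitrary choice of root finder, and the initial guess matters.

```python
# A numerical sketch for the scalar case of (2.17), (2.19), (2.20):
# six equations in theta1, theta2, L1, L2, k11, k22 (Example 2.1 data).
import numpy as np
from scipy.optimize import fsolve

A11, A22, A12, A21 = 0.2, 0.3, 3.0, 3.0
B1, B2 = 10.0, 4.0
Q11, Q22, Q12, Q21 = 0.1, 0.1, 4.0, 4.0

def equations(v):
    th1, th2, L1, L2, k11, k22 = v
    return [
        2*th1*(A22 - B2*k22) - th1**2 * A21**2 / Q11 + Q12,   # (2.17), 1st
        2*th2*(A11 - B1*k11) - th2**2 * A12**2 / Q22 + Q21,   # (2.17), 2nd
        A11 - B1*k11 + A21**2 * th1 / Q11 + L1,               # (2.19), 1st
        A12 + (A21*th1/Q11)*(A22 - B2*k22 + L1),              # (2.19), 2nd
        A21 + (A12*th2/Q22)*(A11 - B1*k11 + L2),              # (2.20), 1st
        A22 - B2*k22 + A12**2 * th2 / Q22 + L2,               # (2.20), 2nd
    ]

# a reasonable initial guess is needed; fsolve may find other roots otherwise
sol = fsolve(equations, x0=[0.05, 0.05, 50.0, 50.0, 5.0, 10.0])
th1, th2, L1, L2, k11, k22 = sol
assert min(th1, th2, L1, L2) > 0   # qualification: theta_i > 0, -L_i Hurwitz
print(np.round(sol, 4))            # ~ [0.0333 0.0333 55.5 55.5 5.87 14.7]
```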
So for singular game G2 we have the following theorem.

Theorem 2.1. If there exist qualified solutions −L1 (Hurwitz), −L2 (Hurwitz), θ1 > 0, θ2 > 0, k11 and k22 to the algebraic equations (2.17), (2.19) and (2.20), then the decentralized control strategy (2.16) constitutes an asymptotic ε-Nash equilibrium for game G2. Meanwhile, the closed-loop system is stable and Tε can be found by (2.43).
Proof. Suppose that the conditions in the theorem are satisfied. Define a Lyapunov function candidate as
$$
V(t) = x_1^T\theta_2 x_1 + x_2^T\theta_1 x_2 \tag{2.21}
$$
Figure 2.1: Stability illustration based on a two-dimensional system
Because of the assumptions θ1 > 0 and θ2 > 0, we know that this is a Lyapunov function. Considering (2.17), by some manipulation we can get its derivative with respect to time:
$$
\begin{aligned}
\dot V(t) &= x_1^T\left(-Q_{21} - \theta_2 A_{12}Q_{22}^{-1}A_{12}^T\theta_2\right)x_1 + x_2^T\left(-Q_{12} - \theta_1 A_{21}Q_{11}^{-1}A_{21}^T\theta_1\right)x_2\\
&\quad + s_2^TA_{12}^T\theta_2 x_1 + x_1^T\theta_2 A_{12}s_2 + s_1^TA_{21}^T\theta_1 x_2 + x_2^T\theta_1 A_{21}s_1
\end{aligned} \tag{2.22}
$$
Define the neighborhood Ω|si|≤δi (δi > 0) of the hyperplane si = 0 as
$$
\Omega_{|s_i|\le\delta_i} \triangleq \left\{(x_1,x_2):\ |s_i| = \left|x_i + Q_{ii}^{-1}A_{ji}^T\theta_i x_j\right| \le \delta_i\right\} \tag{2.23}
$$
Denote by |v| the L2-norm of a vector v. It can be proved that Ω|si|≤δi is invariant under the conditions in the theorem. Also define several neighborhoods around the origin as
$$
\begin{aligned}
N_x &\triangleq \{(x_1,x_2):\ |x| \le \bar\varepsilon\},\ \bar\varepsilon > 0 &\text{(2.24)}\\
N'_x &\triangleq \{(x_1,x_2):\ |x_1| \le \delta,\ |x_2| \le \delta\},\ \delta > 0 &\text{(2.25)}\\
N_{x_1} &\triangleq \{(x_1,x_2):\ |x_1| \le \delta\} &\text{(2.26)}\\
N_{x_2} &\triangleq \{(x_1,x_2):\ |x_2| \le \delta\} &\text{(2.27)}
\end{aligned}
$$
Select
$$
\delta = \min\left\{\frac{\sqrt2\,\bar\varepsilon}{2},\ \frac{\sqrt2\,\bar\varepsilon}{3\left\|Q_{11}^{-1}A_{21}^T\theta_1\right\|},\ \frac{\sqrt2\,\bar\varepsilon}{3\left\|Q_{22}^{-1}A_{12}^T\theta_2\right\|}\right\} \tag{2.28}
$$
$$
\delta_i \le \frac12\left\|Q_{ii}^{-1}A_{ji}^T\theta_i\right\|\delta \tag{2.29}
$$
Here ‖M‖ denotes the matrix 2-norm. Then we have
$$
N'_x \subset N_x \tag{2.30}
$$
Considering (2.13), the closed-loop system under (2.16) is
$$
\begin{aligned}
\dot x_1 &= (A_{11} - B_1k_{11})x_1 + A_{12}x_2 = (A_{11} - B_1k_{11})x_1 + A_{12}\left(s_2 - Q_{22}^{-1}A_{12}^T\theta_2 x_1\right)\\
\dot x_2 &= A_{21}x_1 + (A_{22} - B_2k_{22})x_2 = A_{21}\left(s_1 - Q_{11}^{-1}A_{21}^T\theta_1 x_2\right) + (A_{22} - B_2k_{22})x_2
\end{aligned} \tag{2.31}
$$
By (2.14) we have
$$
s_i(t) = e^{-L_i t}s_i(0) \qquad \left(i=1,2;\ t\ge 0;\ s_i(0) = x_i(0) + Q_{ii}^{-1}A_{ji}^T\theta_i x_j(0)\right) \tag{2.32}
$$
So there must exist a time T such that the state trajectory of (2.31) satisfies (2.33):
$$
x(t) \in \bigcap_{i=1}^{2}\Omega_{|s_i|\le\delta_i} \quad (t \ge T) \tag{2.33}
$$
Actually, we can select T as
$$
T = \max_{i=1,2} T_i, \qquad
T_i = \begin{cases}\dfrac{1}{\min\{\operatorname{Re}(\sigma(L_i))\}}\,\ln\dfrac{|s_i(0)|}{\delta_i}, & |s_i(0)| \neq 0\\[6pt] 0, & \text{else}\end{cases} \tag{2.34}
$$
In (2.34), the operator Re(·) gives the real part of a complex number, and σ(M) denotes the spectrum of a square matrix M. We consider three situations which cover the whole state space.
1) If x1 ∉ Nx1 and x2 ∉ Nx2 when t ≥ T, by (2.29) we know that
$$
\begin{aligned}
s_2^TA_{12}^T\theta_2 x_1 + x_1^T\theta_2 A_{12}s_2 - x_1^T\theta_2 A_{12}Q_{22}^{-1}A_{12}^T\theta_2 x_1 &< 0\\
s_1^TA_{21}^T\theta_1 x_2 + x_2^T\theta_1 A_{21}s_1 - x_2^T\theta_1 A_{21}Q_{11}^{-1}A_{21}^T\theta_1 x_2 &< 0
\end{aligned}
$$
Observing (2.22), we have
$$
\dot V(t) < 0 \quad (x_1 \notin N_{x_1},\ x_2 \notin N_{x_2},\ t \ge T)
$$
Under this situation, the state trajectory of (2.31) will approach the origin, and once x(t) enters the neighborhood Nx of the origin, x(t) will remain within Nx because Ω|si|≤δi is invariant.
2) If x1 ∈ Nx1 and x2 ∉ Nx2 when t ≥ T,
$$
\begin{aligned}
|s_2| = \left|x_2 + Q_{22}^{-1}A_{12}^T\theta_2 x_1\right| \le \delta_2
&\;\Longrightarrow\; |x_2| - \left|Q_{22}^{-1}A_{12}^T\theta_2 x_1\right| \le \delta_2\\
&\;\Longrightarrow\; |x_2| \le \left\|Q_{22}^{-1}A_{12}^T\theta_2\right\||x_1| + \delta_2 \le \left\|Q_{22}^{-1}A_{12}^T\theta_2\right\|\delta + \delta_2 \le \frac32\left\|Q_{22}^{-1}A_{12}^T\theta_2\right\|\delta \le \frac{\sqrt2}{2}\bar\varepsilon\\
&\;\Longrightarrow\; x \in N_x
\end{aligned}
$$
Under this situation, the state trajectory of (2.31) is already within the neighborhood Nx of the origin.
3) Similarly, if x1 ∉ Nx1 and x2 ∈ Nx2 when t ≥ T,
$$
\begin{aligned}
|s_1| = \left|x_1 + Q_{11}^{-1}A_{21}^T\theta_1 x_2\right| \le \delta_1
&\;\Longrightarrow\; |x_1| - \left|Q_{11}^{-1}A_{21}^T\theta_1 x_2\right| \le \delta_1\\
&\;\Longrightarrow\; |x_1| \le \left\|Q_{11}^{-1}A_{21}^T\theta_1\right\||x_2| + \delta_1 \le \left\|Q_{11}^{-1}A_{21}^T\theta_1\right\|\delta + \delta_1 \le \frac32\left\|Q_{11}^{-1}A_{21}^T\theta_1\right\|\delta \le \frac{\sqrt2}{2}\bar\varepsilon\\
&\;\Longrightarrow\; x \in N_x
\end{aligned}
$$
The state trajectory of (2.31) is within the neighborhood Nx of the origin.
Thus, the closed-loop system (2.31) is stable under the control strategy pair (2.16). Figure 2.1 illustrates the neighborhood definitions (2.24)-(2.27) for a two-dimensional system.
Now, let us find Tε. By (2.13) and (2.32), we have
$$
x_i(t) = s_i(t) - Q_{ii}^{-1}A_{ji}^T\theta_i x_j = e^{-L_i t}s_i(0) - Q_{ii}^{-1}A_{ji}^T\theta_i x_j \quad (i=1,2;\ t\ge 0) \tag{2.35}
$$
Then a truncated version of the cost functional in (2.9) becomes (2.36):
$$
\begin{aligned}
J_i(T_{\varepsilon i}, x_0, u_1, u_2) &= \int_{T_{\varepsilon i}}^{\infty}\left(x_j^TQ_{ij}x_j + x_i^TQ_{ii}x_i\right)dt \quad (i=1,2)\\
&= \int_{T_{\varepsilon i}}^{\infty}\left[x_j^TQ_{ij}x_j + \left(s_i - Q_{ii}^{-1}A_{ji}^T\theta_i x_j\right)^T Q_{ii}\left(s_i - Q_{ii}^{-1}A_{ji}^T\theta_i x_j\right)\right]dt\\
&= \int_{T_{\varepsilon i}}^{\infty}\left[x_j^T\left(Q_{ij} + \theta_i A_{ji}Q_{ii}^{-1}A_{ji}^T\theta_i\right)x_j - 2x_j^T\theta_i A_{ji}s_i + s_i^TQ_{ii}s_i\right]dt\\
&= \underbrace{\int_{T_{\varepsilon i}}^{\infty}\left[x_j^T\left(Q_{ij} + \theta_i A_{ji}Q_{ii}^{-1}A_{ji}^T\theta_i\right)x_j - 2x_j^T\theta_i A_{ji}s_i\right]dt}_{\text{part 1}}
+ \underbrace{\int_{T_{\varepsilon i}}^{\infty}s_i^T(t)Q_{ii}s_i(t)\,dt}_{\text{part 2}}
\end{aligned} \tag{2.36}
$$
Define Wi = x_j^Tθi x_j and note that
$$
\begin{aligned}
\dot W_i &= \dot x_j^T\theta_i x_j + x_j^T\theta_i\dot x_j\\
&= \left(A_{ji}x_i + (A_{jj} - B_jk_{jj})x_j\right)^T\theta_i x_j + x_j^T\theta_i\left(A_{ji}x_i + (A_{jj} - B_jk_{jj})x_j\right)\\
&= x_j^T\left[(A_{jj} - B_jk_{jj})^T\theta_i + \theta_i(A_{jj} - B_jk_{jj})\right]x_j + x_i^TA_{ji}^T\theta_i x_j + x_j^T\theta_i A_{ji}x_i
\end{aligned}
$$
By (2.35) and the first equation in (2.17), we have
$$
\begin{aligned}
\dot W_i &= x_j^T\left[(A_{jj} - B_jk_{jj})^T\theta_i + \theta_i(A_{jj} - B_jk_{jj}) - 2\theta_i A_{ji}Q_{ii}^{-1}A_{ji}^T\theta_i\right]x_j + s_i^TA_{ji}^T\theta_i x_j + x_j^T\theta_i A_{ji}s_i\\
&= x_j^T\left(-Q_{ij} - \theta_i A_{ji}Q_{ii}^{-1}A_{ji}^T\theta_i\right)x_j + s_i^TA_{ji}^T\theta_i x_j + x_j^T\theta_i A_{ji}s_i
\end{aligned} \tag{2.37}
$$
Note that x_j(∞) = 0. Integrating both sides of (2.37) from T_εi to ∞, we have
$$
0 - x_j^T(T_{\varepsilon i})\theta_i x_j(T_{\varepsilon i}) = \int_{T_{\varepsilon i}}^{\infty}\left[x_j^T\left(-Q_{ij} - \theta_i A_{ji}Q_{ii}^{-1}A_{ji}^T\theta_i\right)x_j + s_i^TA_{ji}^T\theta_i x_j + x_j^T\theta_i A_{ji}s_i\right]dt
$$
Thus we find the value of part 1 in (2.36), i.e.
$$
\text{part 1} = \int_{T_{\varepsilon i}}^{\infty}\left[x_j^T\left(Q_{ij} + \theta_i A_{ji}Q_{ii}^{-1}A_{ji}^T\theta_i\right)x_j - 2x_j^T\theta_i A_{ji}s_i\right]dt = x_j^T(T_{\varepsilon i})\theta_i x_j(T_{\varepsilon i}) \tag{2.38}
$$
and consequently (2.36) becomes
$$
J_i(T_{\varepsilon i}, x_0, u_1, u_2) = \int_{T_{\varepsilon i}}^{\infty}\left(x_j^TQ_{ij}x_j + x_i^TQ_{ii}x_i\right)dt
= x_j^T(T_{\varepsilon i})\theta_i x_j(T_{\varepsilon i}) + \underbrace{\int_{T_{\varepsilon i}}^{\infty}s_i^T(t)Q_{ii}s_i(t)\,dt}_{\text{part 2}} \tag{2.39}
$$
Denote
$$
\lambda_i = \min_{1\le k\le n_i}\lambda_{ik} \quad \left(0 < \lambda_{ik} \in \sigma\!\left(L_i + L_i^T\right)\right) \quad (i=1,2)
$$
$$
\rho_i = \max_{1\le k\le n_i}\rho_{ik} \quad \left(0 < \rho_{ik} \in \sigma(Q_{ii})\right) \quad (i=1,2)
$$
Denote by I the identity matrix of appropriate dimension. With the above notation we have Q_ii ≤ ρiI and −(L_i + L_i^T) ≤ −λiI. Then we can find a bound for part 2 in (2.36) as
$$
\begin{aligned}
\text{part 2} &= \int_{T_{\varepsilon i}}^{\infty}s_i^T(t)Q_{ii}s_i(t)\,dt
= s_i^T(0)\left(\int_{T_{\varepsilon i}}^{\infty}e^{-L_i^Tt}Q_{ii}e^{-L_it}\,dt\right)s_i(0) \quad (i=1,2)\\
&\le s_i^T(0)\left(\int_{T_{\varepsilon i}}^{\infty}e^{-L_i^Tt}\rho_iIe^{-L_it}\,dt\right)s_i(0)
= \rho_i\,s_i^T(0)\left(\int_{T_{\varepsilon i}}^{\infty}e^{-(L_i+L_i^T)t}\,dt\right)s_i(0)\\
&\le \rho_i\,s_i^T(0)\left(\int_{T_{\varepsilon i}}^{\infty}e^{-\lambda_iIt}\,dt\right)s_i(0)
= \rho_i\lambda_i^{-1}e^{-\lambda_iT_{\varepsilon i}}|s_i(0)|^2
\end{aligned} \tag{2.40}
$$
Combining (2.38) and (2.40), we have
$$
J_i(T_{\varepsilon i}, x_0, u_1, u_2) = \int_{T_{\varepsilon i}}^{\infty}\left(x_j^TQ_{ij}x_j + x_i^TQ_{ii}x_i\right)dt = \text{part 1} + \text{part 2}
\le x_j^T(T_{\varepsilon i})\theta_i x_j(T_{\varepsilon i}) + \rho_i\lambda_i^{-1}e^{-\lambda_iT_{\varepsilon i}}|s_i(0)|^2 \quad (i=1,2)
$$
Define
$$
J_{\varepsilon i}(T_{\varepsilon i}, x_0, u_1, u_2) = \text{part 2} \le \rho_i\lambda_i^{-1}e^{-\lambda_iT_{\varepsilon i}}|s_i(0)|^2 \tag{2.41}
$$
In order to obtain
$$
J_{\varepsilon i}(T_{\varepsilon i}, x_0, u_1, u_2) \le \varepsilon \quad (i=1,2)
$$
we need
$$
T_{\varepsilon i} \ge \lambda_i^{-1}\ln\frac{\rho_i|s_i(0)|^2}{\lambda_i\varepsilon} \quad (i=1,2)
$$
So we can select T_εi as
$$
T_{\varepsilon i} = \max\left\{0,\ \lambda_i^{-1}\ln\frac{\rho_i|s_i(0)|^2}{\lambda_i\varepsilon}\right\} \quad (i=1,2) \tag{2.42}
$$
Then Tε can be selected as
$$
T_\varepsilon = \max\{T_{\varepsilon 1}, T_{\varepsilon 2}\} \tag{2.43}
$$
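As a small added illustration, (2.42)-(2.43) translate directly into code; λi and ρi are the spectral quantities defined above and s0_i = |si(0)|.

```python
# A direct transcription of (2.42)-(2.43); the numerical values in the call
# below anticipate Example 2.1 in section 2.6.
import math

def t_epsilon(eps, s0, lam, rho):
    T = []
    for s, l, r in zip(s0, lam, rho):
        if s == 0.0:
            T.append(0.0)                                       # Corollary 2.2 case
        else:
            T.append(max(0.0, math.log(r * s**2 / (l * eps)) / l))  # (2.42)
    return max(T)                                               # (2.43)

# Example 2.1: s_i(0) = 30, lam_i = 2*55.5 = 111, rho_i = Q_ii = 0.1, eps = 15
print(t_epsilon(15.0, [30.0, 30.0], [111.0, 111.0], [0.1, 0.1]))   # -> 0.0
```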
Extensive simulations have verified (2.7). Here we give the proof for the case when s1(0) = s2(0) = 0. By Corollary 2.2 below we know that if the conditions in Theorem 2.1 are satisfied, (2.16) constitutes a Nash equilibrium for game G2, i.e. ε = 0 and Tε = 0 in (2.7), s1(t) = s2(t) = 0 (∀t ∈ (0, ∞)), and Ji(x0, u1 = −k11x1, u2 = −k22x2) = x_{j0}^Tθi x_{j0}.
Take player 1 for example. If player 1 applies a partial state feedback gain k̄11 with k̄11 ≠ k11, but player 2 still sticks to k22, the new closed-loop system with the same initial states is
$$
\begin{aligned}
\dot{\bar x}_1 &= \left(A_{11} - B_1\bar k_{11}\right)\bar x_1 + A_{12}\bar x_2 \quad (\bar x_1(0) = x_{10})\\
\dot{\bar x}_2 &= A_{21}\bar x_1 + (A_{22} - B_2k_{22})\bar x_2 \quad (\bar x_2(0) = x_{20})
\end{aligned} \tag{2.44}
$$
It may happen that for (k̄11, k22) we cannot find qualified solutions θi's and −Li's to the algebraic equations (2.17), (2.19) and (2.20). But observing (2.9), we find that, taking x̄1 as player 1's fake control, the singular control problem faced by player 1 remains the same. Also, from the first equation in (2.17), we know that θ̄1 = θ1. Thus the desired relationship between x̄1 and x̄2 for player 1 remains the same, i.e.
$$
\bar x_1 = -Q_{11}^{-1}A_{21}^T\theta_1\bar x_2 \tag{2.45}
$$
For one LTI system driven by two different stabilizing constant state feedback controls, the two system state trajectories will not intersect except at the starting point and the ending point, i.e. x(t) ≠ x̄(t) (∀t ∈ (0, ∞)) and
$$
\bar s_1(t) = \bar x_1 + Q_{11}^{-1}A_{21}^T\theta_1\bar x_2 \neq 0 \quad (\forall t \in (0,\infty)) \tag{2.46}
$$
By the same manipulation as for (2.36) and (2.39), we have
$$
\bar J_1(x_0, u_1 = -\bar k_{11}\bar x_1, u_2 = -k_{22}\bar x_2) = \int_0^\infty\left(\bar x_2^TQ_{12}\bar x_2 + \bar x_1^TQ_{11}\bar x_1\right)dt
= x_{20}^T\theta_1 x_{20} + \int_0^\infty \bar s_1^T(t)Q_{11}\bar s_1(t)\,dt \tag{2.47}
$$
By (2.46) and the positive definiteness of Q11, we have
$$
\int_0^\infty \bar s_1^T(t)Q_{11}\bar s_1(t)\,dt \ge 0
$$
and consequently
$$
\bar J_1(x_0, u_1 = -\bar k_{11}\bar x_1, u_2 = -k_{22}\bar x_2) \ge J_1(x_0, u_1 = -k_{11}x_1, u_2 = -k_{22}x_2) \tag{2.48}
$$
Thus (2.7) is verified.
Remark 2.2. 1) The difficulty of expressing solutions to the algebraic equations (2.17), (2.19) and (2.20) explicitly in terms of the coefficients Aij's, Bi's and Qij's (i, j = 1, 2) is the same as that of finding explicit expressions for solutions to the Riccati and Lyapunov equations in optimal control problems. So far there are no general results except for some special systems, e.g. scalar systems or systems with diagonal coefficient matrices. 2) The working mechanism of the Nash equilibrium solution (2.16) is as follows: first, under (2.16), the system state trajectory asymptotically reaches the desired state trajectory (2.11); the reaching rate, which is the same as the rate at which si converges to its equilibrium (the origin), depends on the spectrum of the two Hurwitz matrices −Li. After the desired state trajectory (2.11) is quickly approached, each player's performance index (2.3) can be optimized. 3) Under the Nash equilibrium solution (2.16), the closed-loop system is a decentralized system, which greatly reduces the system complexity.
Corollary 2.1. If
$$
0 < |s_i(0)| \le \sqrt{\frac{\lambda_i\varepsilon}{\rho_i}}
$$
and the conditions in Theorem 2.1 are satisfied, then the decentralized control strategy (2.16) constitutes an ε-Nash equilibrium for game G2.
Proof. By the first condition in Corollary 2.1, it is easy to obtain that Tε = 0.
Corollary 2.2. If |si(0)| = 0 (i = 1, 2) and the conditions in Theorem 2.1 are satisfied, then the decentralized control strategy (2.16) constitutes a Nash equilibrium for game G2 and
$$
J_i(x_0, u_1^*, u_2^*) = x_j^T(0)\theta_i x_j(0) \quad (i,j=1,2;\ i\neq j)
$$
Proof. By the first condition in Corollary 2.2, it is clear that Tε = 0, and observing (2.36) we have
$$
J_i(x_0, u_1, u_2) = \int_0^\infty\left(x_j^TQ_{ij}x_j + x_i^TQ_{ii}x_i\right)dt
= \underbrace{\int_0^\infty x_j^T\left(Q_{ij} + \theta_i A_{ji}Q_{ii}^{-1}A_{ji}^T\theta_i\right)x_j\,dt}_{\text{part 1}}
= x_j^T(0)\theta_i x_j(0) \quad (i=1,2)
$$
2.5 More General Performance Indices

If (2.3) has a cross term and is thus of the more general form shown in (2.49),
$$
J_i(x_0, u_1, u_2) = \int_0^\infty\left(x_j^TQ_{ij}x_j + 2x_i^TQ_{i3}x_j + x_i^TQ_{ii}x_i\right)dt \tag{2.49}
$$
where Q_ii > 0, Q_ij ≥ 0 and
$$
Q_i = \begin{pmatrix} Q_{i1} & Q_{i3}\\ Q_{i3}^T & Q_{i2} \end{pmatrix} \ge 0 \quad (i,j=1,2;\ i\neq j),
$$
then the optimal control problem Pi faced by player i becomes
$$
P_i:\quad
\begin{cases}
\dot x_j = A_{jj}x_j + A_{ji}x_i + B_ju_j\\[2pt]
J_i(x_0,u_1,u_2) = \int_0^\infty\left(x_j^TQ_{ij}x_j + 2x_i^TQ_{i3}x_j + x_i^TQ_{ii}x_i\right)dt
\end{cases} \tag{2.50}
$$
In order to apply the result in section 2.4, we need to transform (2.49) into the form (2.3). Define
$$
w_i = x_i + T_ix_j \quad (i,j=1,2;\ i\neq j) \tag{2.51}
$$
Then (2.49) becomes (2.52):
$$
\begin{aligned}
J_i(x_0,u_1,u_2) &= \int_0^\infty\left[x_j^TQ_{ij}x_j + 2(w_i - T_ix_j)^TQ_{i3}x_j + (w_i - T_ix_j)^TQ_{ii}(w_i - T_ix_j)\right]dt\\
&= \int_0^\infty\left[x_j^T\left(Q_{ij} - 2T_i^TQ_{i3} + T_i^TQ_{ii}T_i\right)x_j + 2w_i^TQ_{i3}x_j - 2w_i^TQ_{ii}T_ix_j + w_i^TQ_{ii}w_i\right]dt
\end{aligned}
\quad (i,j=1,2;\ i\neq j) \tag{2.52}
$$
Choose T_i = Q_{ii}^{-1}Q_{i3} (i = 1, 2); then (2.52) becomes (2.53):
$$
J_i(x_0,u_1,u_2) = \int_0^\infty\left[x_j^T\left(Q_{ij} - Q_{i3}^TQ_{ii}^{-1}Q_{i3}\right)x_j + w_i^TQ_{ii}w_i\right]dt \quad (i,j=1,2;\ i\neq j) \tag{2.53}
$$
Denote Q∗ij = Qij − Q_{i3}^TQ_{ii}^{-1}Q_{i3} and A∗ii = Aii − A_{ij}Q_{jj}^{-1}Q_{j3} (i, j = 1, 2; i ≠ j); then (2.50) becomes (2.54):
$$
P_i:\quad
\begin{cases}
\dot x_j = A^*_{jj}x_j + A_{ji}w_i + B_ju_j\\[2pt]
J_i(x_0,u_1,u_2) = \int_0^\infty\left(x_j^TQ^*_{ij}x_j + w_i^TQ_{ii}w_i\right)dt
\end{cases}
\quad (i,j=1,2;\ i\neq j) \tag{2.54}
$$
Following the same standard procedure as in section 2.4 to solve the optimal regulator problem for player i with wi as the fake control, we get (2.55), corresponding to (2.11):
$$
w_i = -Q_{ii}^{-1}\left(A_{ji}^T + \left(\frac{\partial u_j}{\partial w_i}\right)^T B_j^T\right)\theta_i x_j \quad (i,j=1,2;\ i\neq j) \tag{2.55}
$$
Substituting (2.55) into the corresponding Hamiltonian, we get
$$
\begin{aligned}
&2x_j^T\theta_i\!\left[A^*_{jj}x_j - A_{ji}Q_{ii}^{-1}\!\left(A_{ji}^T + \left(\tfrac{\partial u_j}{\partial w_i}\right)^T B_j^T\right)\theta_i x_j + B_ju_j\right]\\
&\quad + x_j^T\theta_i\!\left(A_{ji} + B_j\tfrac{\partial u_j}{\partial w_i}\right)Q_{ii}^{-1}\!\left(A_{ji}^T + \left(\tfrac{\partial u_j}{\partial w_i}\right)^T B_j^T\right)\theta_i x_j + x_j^TQ^*_{ij}x_j = 0
\end{aligned}
\quad (i,j=1,2;\ i\neq j) \tag{2.56}
$$
Similarly define
$$
s_i = w_i + Q_{ii}^{-1}\left(A_{ji}^T + \left(\frac{\partial u_j}{\partial w_i}\right)^T B_j^T\right)\theta_i x_j \quad (i,j=1,2;\ i\neq j) \tag{2.57}
$$
If si tends to zero, then (2.55) can be satisfied. For two matrices Li such that −Li are Hurwitz, let
$$
\dot s_i = -L_i s_i \quad (i=1,2) \tag{2.58}
$$
Substituting the expressions of wi and ẋi into (2.58), we have
$$
\dot w_i + Q_{ii}^{-1}\left(A_{ji}^T + \left(\tfrac{\partial u_j}{\partial w_i}\right)^T B_j^T\right)\theta_i\dot x_j
= -L_i\left[w_i + Q_{ii}^{-1}\left(A_{ji}^T + \left(\tfrac{\partial u_j}{\partial w_i}\right)^T B_j^T\right)\theta_i x_j\right] \quad (i,j=1,2;\ i\neq j)
$$
i.e.
$$
\begin{aligned}
&\left(A_{ii}x_i + A_{ij}x_j + B_iu_i\right) + Q_{ii}^{-1}\left[Q_{i3} + \left(A_{ji}^T + \left(\tfrac{\partial u_j}{\partial w_i}\right)^T B_j^T\right)\theta_i\right]\left(A_{jj}x_j + A_{ji}x_i + B_ju_j\right)\\
&= -L_i\left\{x_i + Q_{ii}^{-1}\left[Q_{i3} + \left(A_{ji}^T + \left(\tfrac{\partial u_j}{\partial w_i}\right)^T B_j^T\right)\theta_i\right]x_j\right\}
\end{aligned}
\quad (i,j=1,2;\ i\neq j) \tag{2.59}
$$
From (2.59), by the partial state feedback control design, we have
$$
\begin{aligned}
&\left[A_{11} - B_1k_{11} + Q_{11}^{-1}\left(Q_{13} + A_{21}^T\theta_1\right)A_{21} + L_1\right]x_1\\
&\quad + \left[A_{12} + Q_{11}^{-1}\left(Q_{13} + A_{21}^T\theta_1\right)(A_{22} - B_2k_{22}) + L_1Q_{11}^{-1}\left(Q_{13} + A_{21}^T\theta_1\right)\right]x_2 = 0\\[4pt]
&\left[A_{21} + Q_{22}^{-1}\left(Q_{23} + A_{12}^T\theta_2\right)(A_{11} - B_1k_{11}) + L_2Q_{22}^{-1}\left(Q_{23} + A_{12}^T\theta_2\right)\right]x_1\\
&\quad + \left[A_{22} - B_2k_{22} + Q_{22}^{-1}\left(Q_{23} + A_{12}^T\theta_2\right)A_{12} + L_2\right]x_2 = 0
\end{aligned} \tag{2.60}
$$
From (2.56) and (2.60), the six matrix equations corresponding to (2.17), (2.19) and (2.20) are, respectively,
$$
\begin{aligned}
2\theta_1(A^*_{22} - B_2k_{22}) - \theta_1 A_{21}Q_{11}^{-1}A_{21}^T\theta_1 + Q^*_{12} &= 0\\
2\theta_2(A^*_{11} - B_1k_{11}) - \theta_2 A_{12}Q_{22}^{-1}A_{12}^T\theta_2 + Q^*_{21} &= 0
\end{aligned} \tag{2.61}
$$
$$
\begin{aligned}
A_{11} - B_1k_{11} + Q_{11}^{-1}\left(Q_{13} + A_{21}^T\theta_1\right)A_{21} + L_1 &= 0\\
A_{12} + Q_{11}^{-1}\left(Q_{13} + A_{21}^T\theta_1\right)(A_{22} - B_2k_{22}) + L_1Q_{11}^{-1}\left(Q_{13} + A_{21}^T\theta_1\right) &= 0
\end{aligned} \tag{2.62}
$$
$$
\begin{aligned}
A_{21} + Q_{22}^{-1}\left(Q_{23} + A_{12}^T\theta_2\right)(A_{11} - B_1k_{11}) + L_2Q_{22}^{-1}\left(Q_{23} + A_{12}^T\theta_2\right) &= 0\\
A_{22} - B_2k_{22} + Q_{22}^{-1}\left(Q_{23} + A_{12}^T\theta_2\right)A_{12} + L_2 &= 0
\end{aligned} \tag{2.63}
$$
From (2.61)-(2.63), we need to find the qualified solutions −L1 (Hurwitz), −L2 (Hurwitz), θ1 > 0, θ2 > 0, k11 and k22. And Tε can still be found by (2.43).
2.6 Numeric Example

For a two-dimensional instantiation of game G2, we have the following corollary.

Corollary 2.3. If n1 = n2 = m1 = m2 = 1, A12 = A21 ≠ 0, Q11 = Q22 > 0, Q12 = Q21 > 0, and Q21/Q22 > 3, then the unique solution to (2.17), (2.19) and (2.20) is
$$
\begin{aligned}
\theta_1 &= \theta_2 = \frac{Q_{11}}{|A_{21}|} > 0\\
L_1 &= L_2 = \frac12\left(\frac{Q_{21}}{Q_{11}} - 3\right)|A_{21}| > 0\\
k_{ii} &= \frac{2A_{ii}Q_{11} + (Q_{21} - Q_{11})|A_{21}|}{2B_iQ_{11}} \quad (i=1,2)
\end{aligned} \tag{2.64}
$$
Proof. Considering the conditions in Corollary 2.3, from the second equation in (2.19) and the first equation in (2.20) we have
$$
\theta_1\left[(A_{22} - B_2k_{22}) + L_1\right] = \theta_2\left[(A_{11} - B_1k_{11}) + L_2\right] \tag{2.65}
$$
From the first equation in (2.19) and the second equation in (2.20) we have $A_{11} - B_1k_{11} = -L_1 - Q_{11}^{-1}A_{21}^2\theta_1$ and $A_{22} - B_2k_{22} = -L_2 - Q_{22}^{-1}A_{12}^2\theta_2$, respectively. Substituting these expressions into (2.65), we have
$$
L_1 = L_2 \tag{2.66}
$$
From the first equation in (2.17) and the second equation in (2.19) we have
$$
A_{12} + \frac12 Q_{11}^{-2}A_{21}^3\theta_1^2 - \frac12 Q_{11}^{-1}A_{21}Q_{12} + L_1Q_{11}^{-1}A_{21}\theta_1 = 0 \tag{2.67}
$$
Similarly, from the second equation in (2.17) and the first equation in (2.20) we have
$$
A_{21} + \frac12 Q_{22}^{-2}A_{12}^3\theta_2^2 - \frac12 Q_{22}^{-1}A_{12}Q_{21} + L_2Q_{22}^{-1}A_{12}\theta_2 = 0 \tag{2.68}
$$
By the symmetry of (2.67) and (2.68), we know that
$$
\theta_1 = \theta_2 \tag{2.69}
$$
By conclusions (2.66) and (2.69), it is easy to obtain (2.64).
Inspired by Corollary 2.3, we have the following corollary for a class of higher-dimensional systems.
Corollary 2.4. If n1 = n2 = m1 = m2 > 1, and the Aij's, Bi's and Qij's (i, j = 1, 2) are diagonal matrices which satisfy A12 = A21 nonsingular, Q11 = Q22 > 0, Q12 = Q21 > 0, and Q21Q22⁻¹ > 3I, then the unique solution to (2.17), (2.19) and (2.20) is
$$
\begin{aligned}
\theta_1 &= \theta_2 = Q_{11}|A_{21}|^{-1} > 0\\
L_1 &= L_2 = \frac12\left(Q_{21}Q_{11}^{-1} - 3I\right)|A_{21}| > 0\\
k_{ii} &= \frac12\left[2A_{ii}Q_{11} + (Q_{21} - Q_{11})|A_{21}|\right]B_i^{-1}Q_{11}^{-1} \quad (i=1,2)
\end{aligned} \tag{2.70}
$$
In Corollary 2.4, the absolute value of a diagonal matrix is formed by taking the absolute value of every diagonal entry.
Example 2.1. For a second-order system, the system parameters are A11 = 0.2, A22 = 0.3, A21 = A12 = 3, B1 = 10, B2 = 4; the parameters in the performance indices are Q11 = Q22 = 0.1, Q12 = Q21 = 4. Select ε = 15. For the initial conditions x10 = 100, x20 = −70, it can be calculated according to Corollary 2.3 that the unique solution to (2.17), (2.19) and (2.20) is θ1 = θ2 = 0.0333, L1 = L2 = 55.5, k11 = 5.87, k22 = 14.7, with Tε = 0. Correspondingly, the decentralized control strategy (2.16), which constitutes the ε-Nash equilibrium, becomes
$$
u_{1\varepsilon}(t) = -5.87\,x_1(t), \qquad u_{2\varepsilon}(t) = -14.7\,x_2(t) \tag{2.71}
$$
Under (2.71), the two eigenvalues of the closed-loop system are −61.5 and −55.5. The values of the two players' performance indices are, respectively,
$$
\begin{aligned}
J_1\!\left(x_0 = (100, -70)^T,\ u_1 = -5.87x_1,\ u_2 = -14.7x_2\right) &= 214.7\\
J_2\!\left(x_0 = (100, -70)^T,\ u_1 = -5.87x_1,\ u_2 = -14.7x_2\right) &= 335.1
\end{aligned}
$$
Figures 2.2-2.4 show the curves of the system states and control inputs versus time.
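The numbers above can be reproduced from the closed form (2.64); the short check below (an added illustration in plain numpy, not part of the original text) also recovers the closed-loop eigenvalues.

```python
# A quick numerical check of Example 2.1 against the closed form (2.64).
import numpy as np

A11, A22, A21, B1, B2 = 0.2, 0.3, 3.0, 10.0, 4.0
Q11, Q21 = 0.1, 4.0

theta = Q11 / abs(A21)                                    # theta1 = theta2
L = 0.5 * (Q21 / Q11 - 3.0) * abs(A21)                    # L1 = L2
k11 = (2*A11*Q11 + (Q21 - Q11) * abs(A21)) / (2*B1*Q11)
k22 = (2*A22*Q11 + (Q21 - Q11) * abs(A21)) / (2*B2*Q11)
print(theta, L, k11, k22)        # 0.0333..., 55.5, 5.87, 14.7

# closed-loop matrix under (2.71) and its eigenvalues
Acl = np.array([[A11 - B1*k11, A21], [A21, A22 - B2*k22]])
print(np.linalg.eigvals(Acl))    # approximately -61.5 and -55.5
```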
The parameter ε depicts the extent to which an ε-Nash equilibrium is close to the ordinary Nash equilibrium. Fixing the initial state conditions as x10 = 100, x20 = −70, figure 2.5 shows the relationship between Tε and ε for an asymptotic ε-Nash equilibrium. From figure 2.5 we see that the more we relax ε, the smaller Tε becomes, i.e. the faster we can reach an asymptotic ε-Nash equilibrium. Fixing ε = 15, figure 2.6 shows the initial state conditions which guarantee an ε-Nash equilibrium, i.e. Tε = 0. The shaded area in figure 2.6 is symmetric about x10 = −x20.
Figure 2.2: System states versus time
Figure 2.3: Phase plot
Figure 2.4: Controls versus time
Figure 2.5: Relationship of Tε and ε when fixing x10 = 100, x20 = −70
Figure 2.6: Range of initial conditions obtaining ε-Nash equilibrium when fixing ε = 15
2.7 Conclusion

For a two-player nonzero-sum game where each player has a control-free cost functional quadratic in the system states, we proposed a new equilibrium concept, the asymptotic ε-Nash equilibrium, to accommodate the constraint that each player's control strategy should be continuous linear state feedback. Based on each player's singular control problem, the asymptotic ε-Nash equilibrium, which is attained by partial state feedback and also guarantees stability of the decentralized closed-loop system, was obtained from a group of algebraic equations. A numerical example illustrated the various aspects of the theories in this chapter.
CHAPTER 3
ROBUST EQUILIBRIUM FOR ASYMMETRIC GAMES BY
OUTPUT FEEDBACK
3.1 Introduction
Most realistic systems are inevitably subject to uncertainties. Sometimes the uncertainties are caused by the nature of the systems themselves. For example, a system may have an uncertain structure; or, even if it has a deterministic structure, it may have some uncertain parameters. Sometimes, although a system has a deterministic structure and parameters, uncertainties are introduced by external unknown inputs, such as disturbances, measurement noise, etc. Figure 3.1 illustrates an example from a war, where there are two opposing sides, a blue force and a red force, and the uncertainty might be the civilians, whose inclination might change from one side to the other with time. Sometimes the system may be deterministic, but its model may not be precisely known, and it is modeled as a system affected by some uncertainty, e.g. [2]. If the uncertainties are subject to some statistical characterization, we can take mathematical expectations and thus reduce the stochastic system to a deterministic system. But in many cases the uncertainties do not yield to statistical analysis, and the only information we have about them is the range of their possible values or the function type used to describe them.
Figure 3.1: 2-player non-cooperative game with uncertainty illustration
How should the players act in a game with uncertainties? This is a real and significant problem worth our attention. Since the players do not exactly know the uncertainties in the game, it is natural for them to adopt robust strategies that guarantee their performance indices to be no worse than some performance bounds, which are also called performance guarantees. In control systems with uncertainties, the idea of Maxmin/Minimax is extensively used in order to obtain robust control. Similarly, in a game with uncertainties each player can employ the Maxmin/Minimax idea to obtain his/her strategy. Each player's performance index produced by such a strategy is called his/her individual rationality. The strategy produced by the Maxmin/Minimax idea is robust in the sense that no matter which uncertainties are realized in the game, each player's performance index will not be worse than the corresponding individual rationality. But note that a disadvantage of such strategies is that they may be very conservative.
One type of asymmetric game is a game where the players have asymmetric information. The leader-follower game [13] is a well-known example of an asymmetric information game: the follower can choose his/her action with knowledge of not only the system states but also the leader's action. Many military [69][66], economic and political [82][93][73][58][34] problems can be modeled as game problems with asymmetric information.

How should equilibrium solutions be developed for asymmetric information games, especially when there is an uncertainty in the game? This chapter is devoted to answering this question. After the problem statement in section 3.2, where each player needs to build his/her own control strategy by output feedback in order to accommodate the situation that the players have asymmetric information about the game, a robust output feedback equilibrium strategy and an un-improvable robust output feedback equilibrium strategy are provided in sections 3.3 and 3.4, respectively. Conclusions are given in section 3.5.
3.2 Problem Statement: 2-player LQ Game with Uncertainty

In this chapter we study a 2-player LQ game with an uncertainty, although the results can easily be extended to m-player games. A 2-player LQ game with an uncertainty is described by (3.1)-(3.4):
$$
\dot x = Ax + B_1u_1 + B_2u_2 + Bz \quad (x(0) = x_0) \tag{3.1}
$$
$$
y_i = E_ix \quad (i=1,2) \tag{3.2}
$$
$$
J_i(x_0,u_1,u_2,z) = \int_0^\infty\left(x^TQ_ix + \sum_{j=1}^{2}u_j^TR_{ij}u_j + z^TL_iz\right)dt \quad (i=1,2) \tag{3.3}
$$
Here x ∈ Rⁿ and yi ∈ R^{ri}; the two players' controls ui(t) ∈ Ui ⊂ R^{mi} are constructed by (3.4) based on their own information structures Ii(t) = {yi(t)}:
$$
u_i(t) = \gamma_i(y_i) \quad (i=1,2) \tag{3.4}
$$
The uncertainty z ∈ Z ⊂ R^{mz} is regarded as an unknown input, and it does not yield to any statistical analysis. It is known that the uncertainty is realized independently of each player's control strategy and can be assumed to be a linear function of the system state x. Although knowledge about the other player's control strategy is absent, each player knows the form of the other player's performance index. A, B1, B2, B, Ei, the Qi's, Rij's, and Li's are constant matrices of proper dimension. In particular, Qi, Rij, Rii > 0 and Li < 0 (i, j = 1, 2; i ≠ j) are symmetric matrices. If Ei = I, then player i has full information about the system state. For asymmetric information games, E1 ≠ E2.
Many researchers build the robust strategy (u1†, u2†) for game (3.1)-(3.4) by minimax design to obtain the individual rationality (3.5) for each player:
$$
J_i^\dagger = \min_{u_i}\max_{(u_{-i},\,z)} J_i(x_0,u_1,u_2,z) = \max_{u_{-i}} J_i(x_0,u_i^\dagger,u_{-i},z_i^\dagger) \quad (i=1,2) \tag{3.5}
$$
The robust strategy (u1†, u2†) is the most reliable strategy, for we always have
$$
\max_{u_{-i}} J_i(x_0,u_i^\dagger,u_{-i},z) \le \max_{u_{-i}} J_i(x_0,u_i^\dagger,u_{-i},z_i^\dagger) = J_i^\dagger \quad (\forall z\in Z)\ (i=1,2)
$$
At the same time, the robust strategy (u1†, u2†) is the most conservative strategy because it is designed independently for two uncertainties z1†, z2†, which might be different. But in the system (3.1), at every time there is only one uncertainty, which is the same for both players. So we instead find the common worst uncertainty for both players at each time point.
3.3 Robust Output Feedback Equilibrium Strategy by 3-player LQ Game

3.3.1 The Vector-valued Problem of the Uncertainty

To find the common worst uncertainty for both players is to find z* that makes J1(x0,u1,u2,z) and J2(x0,u1,u2,z) large, or, equivalently, to find z* to minimize
$$
J_3(x_0,u_1,u_2,z):\ \begin{cases}-J_1(x_0,u_1,u_2,z)\\ -J_2(x_0,u_1,u_2,z)\end{cases} \tag{3.6}
$$
Regarding the uncertainty as a third player, assumed to have full information about the system state so as to construct the control strategy (3.7), it faces the vector-valued minimization problem (3.6). When fixing (u1, u2), there are three different kinds of solution definitions [95] for the vector-valued problem (3.6):
$$
z = -K_zx \tag{3.7}
$$
Definition 3.1. z′ is a Pareto solution for (3.6) if there is no z ≠ z′, z ∈ Z, such that
$$
J_i(x_0,u_1,u_2,z') \le J_i(x_0,u_1,u_2,z) \quad (i=1,2) \tag{3.8}
$$
with at least one of the two inequalities strict.

A Pareto solution is also called an efficient, non-inferior, or non-dominated solution. Denote the set of all Pareto solutions as Zp.
Definition 3.2. z′ is a weak Pareto solution for (3.6) if there is no z ≠ z′, z ∈ Z, such that
$$
J_i(x_0,u_1,u_2,z') < J_i(x_0,u_1,u_2,z) \quad (i=1,2) \tag{3.9}
$$
Denote the set of all weak Pareto solutions as Zwp.
Definition 3.3. z′ is a proper Pareto solution for (3.6) if
1) it is a Pareto solution;
2) for each player i and each z ∈ Z such that Ji(x0,u1,u2,z) > Ji(x0,u1,u2,z′), there exists a constant M > 0 such that
$$
\frac{J_i(x_0,u_1,u_2,z) - J_i(x_0,u_1,u_2,z')}{J_j(x_0,u_1,u_2,z') - J_j(x_0,u_1,u_2,z)} \le M \quad (i,j=1,2;\ i\neq j) \tag{3.10}
$$
A proper Pareto solution is also called a Geoffrion solution. Denote the set of all proper Pareto solutions as ZG.

The relationship between these three solution sets is Zwp ⊃ Zp ⊃ ZG.
There is a scalar-weighting method [23][48] to derive a Geoffrion solution for (3.6).

Lemma 3.1. For a constant α ∈ (0, 1), if zG ∈ Z exists such that
$$
-\alpha J_1(x_0,u_1,u_2,z_G) - (1-\alpha)J_2(x_0,u_1,u_2,z_G)
= \min_{z\in Z}\left[-\alpha J_1(x_0,u_1,u_2,z) - (1-\alpha)J_2(x_0,u_1,u_2,z)\right] \tag{3.11}
$$
then zG is a Geoffrion solution for (3.6).

From Lemma 3.1, the two-criterion optimization problem (3.6) can be transformed into an optimization problem with only one objective function, (3.12):
$$
J_z(x_0,u_1,u_2,z) = \int_0^\infty\left(x^TQ_\alpha x + \sum_{i=1}^{2}u_i^TR_{i\alpha}u_i + z^TL_\alpha z\right)dt \tag{3.12}
$$
In (3.12), Qα = −αQ1 − (1−α)Q2, Riα = −αR1i − (1−α)R2i (i = 1, 2), and Lα = −αL1 − (1−α)L2. Note that Lα > 0. A small sketch of this weighting is given below.
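The following added illustration forms the weighted matrices of (3.12) mechanically; the inputs are placeholder symmetric matrices obeying the sign conventions stated above (in particular L1, L2 < 0, so that Lα > 0).

```python
# Forming the third player's weights in (3.12); placeholder data only.
import numpy as np

def uncertainty_weights(alpha, Q1, Q2, R11, R12, R21, R22, L1, L2):
    Q_a  = -alpha * Q1  - (1 - alpha) * Q2
    R1_a = -alpha * R11 - (1 - alpha) * R21   # weight on u1: R_{1,alpha}
    R2_a = -alpha * R12 - (1 - alpha) * R22   # weight on u2: R_{2,alpha}
    L_a  = -alpha * L1  - (1 - alpha) * L2
    return Q_a, R1_a, R2_a, L_a

# toy data: state weights may be indefinite, but L_i < 0 is required
Q_a, R1_a, R2_a, L_a = uncertainty_weights(
    0.5, np.diag([1.0, 2.0]), np.diag([2.0, 1.0]),
    np.eye(1), 0.1 * np.eye(1), 0.1 * np.eye(1), np.eye(1),
    -np.eye(1), -2 * np.eye(1))
assert np.all(np.linalg.eigvalsh(L_a) > 0)   # L_alpha > 0, as noted in the text
```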
3.3.2 Output Feedback Nash Equilibrium of 3-player LQ Game

By the derivation in section 3.3.1, we can construct a 3-player game as (3.13):
$$
\begin{aligned}
\dot x &= Ax + B_1u_1 + B_2u_2 + Bz \quad (x(0) = x_0)\\
y_i &= E_ix \quad (i=1,2)\\
J_i(x_0,u_1,u_2,z) &= \int_0^\infty\left(x^TQ_ix + \sum_{j=1}^{2}u_j^TR_{ij}u_j + z^TL_iz\right)dt \quad (i=1,2)\\
J_z(x_0,u_1,u_2,z) &= \int_0^\infty\left(x^TQ_\alpha x + \sum_{i=1}^{2}u_i^TR_{i\alpha}u_i + z^TL_\alpha z\right)dt
\end{aligned} \tag{3.13}
$$
We need to find a Nash equilibrium (u1^N, u2^N, zG) ∈ U1 × U2 × Z for game (3.13), i.e. (3.14):
$$
\begin{aligned}
J_1(x_0,u_1,u_2^N,z_G) &\ge J_1(x_0,u_1^N,u_2^N,z_G) \quad (\forall u_1\in U_1)\\
J_2(x_0,u_1^N,u_2,z_G) &\ge J_2(x_0,u_1^N,u_2^N,z_G) \quad (\forall u_2\in U_2)\\
J_z(x_0,u_1^N,u_2^N,z) &\ge J_z(x_0,u_1^N,u_2^N,z_G) \quad (\forall z\in Z)
\end{aligned} \tag{3.14}
$$

1) Dynamic Output Feedback Nash Equilibrium

If (E1, A) and (E2, A) are both observable, (u1^N, u2^N, zG) in (3.14) can be found by dynamic output feedback (3.15), based on the separation principle:
$$
u_i^N = -K_i\hat x^i \quad (i=1,2), \qquad z_G = -K_zx \tag{3.15}
$$
In (3.15), x̂^i is player i's estimate of the system state x, produced by
$$
\begin{aligned}
\dot{\hat x}^i &= (A - B_jK_j - BK_z)\hat x^i + B_iu_i + D_i(y_i - \hat y_i) \quad (\hat x^i(0) = \hat x_0^i)\\
\hat y_i &= E_i\hat x^i\\
u_i &= -K_i\hat x^i
\end{aligned}
\quad (i,j=1,2;\ i\neq j) \tag{3.16}
$$
In (3.16), the initial state x̂0^i can be chosen arbitrarily; Di is chosen so as to stabilize ((A − BjKj − BKz)^T, −Ei^T). With Di chosen properly, x̂^i can approach the real system state x very quickly; a design sketch is given below.
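As an added sketch of the observer design in (3.16): when the pair ((A − BjKj − BKz)^T, −Ei^T) is stabilizable, Di can be computed by dual pole placement. The plant data and pole locations below are illustrative assumptions only, not data from the text.

```python
# Choosing D_i in (3.16) by dual pole placement (illustrative data).
import numpy as np
from scipy.signal import place_poles

A  = np.array([[0.0, 1.0], [-2.0, -3.0]])
B1 = np.array([[0.0], [1.0]]); B2 = np.array([[1.0], [0.0]])
B  = np.array([[0.5], [0.5]])
K1 = np.array([[1.0, 0.5]]); K2 = np.array([[0.3, 0.2]]); Kz = np.array([[0.1, 0.1]])
E1 = np.array([[1.0, 0.0]])                  # player 1 measures x1 only

A_cl = A - B2 @ K2 - B @ Kz                  # drift of player 1's observer (3.16)
# stabilizing ((A_cl)^T, -E1^T) = placing the eigenvalues of A_cl - D1*E1
D1 = place_poles(A_cl.T, E1.T, [-8.0, -9.0]).gain_matrix.T
print(np.linalg.eigvals(A_cl - D1 @ E1))     # ~ -8, -9: fast state estimation
```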
We have the following theorem, which provides a sufficient condition for the construction of (3.15).

Theorem 3.1. Suppose there exists a symmetric matrix solution (θ1, θ2, θz) to Riccati equation (3.17):
$$
\begin{aligned}
0 &= \Big(A - \sum_{j=1}^{2}B_jR_{jj}^{-1}B_j^T\theta_j - BL_\alpha^{-1}B^T\theta_z\Big)^T\theta_i
+ \theta_i\Big(A - \sum_{j=1}^{2}B_jR_{jj}^{-1}B_j^T\theta_j - BL_\alpha^{-1}B^T\theta_z\Big)\\
&\quad + Q_i + \sum_{j=1}^{2}\theta_jB_jR_{jj}^{-1}R_{ij}R_{jj}^{-1}B_j^T\theta_j + \theta_zBL_\alpha^{-1}L_iL_\alpha^{-1}B^T\theta_z \quad (i=1,2)\\[4pt]
0 &= \Big(A - \sum_{j=1}^{2}B_jR_{jj}^{-1}B_j^T\theta_j - BL_\alpha^{-1}B^T\theta_z\Big)^T\theta_z
+ \theta_z\Big(A - \sum_{j=1}^{2}B_jR_{jj}^{-1}B_j^T\theta_j - BL_\alpha^{-1}B^T\theta_z\Big)\\
&\quad + Q_\alpha + \sum_{j=1}^{2}\theta_jB_jR_{jj}^{-1}R_{j\alpha}R_{jj}^{-1}B_j^T\theta_j + \theta_zBL_\alpha^{-1}B^T\theta_z
\end{aligned} \tag{3.17}
$$
and, furthermore, that the eigenvalues of the system matrix
$$
A - \sum_{i=1}^{2}B_iR_{ii}^{-1}B_i^T\theta_i - BL_\alpha^{-1}B^T\theta_z
$$
all have negative real parts. Based on the state observer (3.16) with D1 and D2 which make
$$
\begin{pmatrix} A - B_1K_1 - D_1E_1 & B_2K_2\\ B_1K_1 & A - B_2K_2 - D_2E_2 \end{pmatrix}
$$
stable, the dynamic output feedback Nash equilibrium (when x̂0 = x0) / dynamic output feedback asymptotic ε-Nash equilibrium (when x̂0 ≠ x0) (u1^N, u2^N, zG) for game (3.13) can be constructed by
$$
u_i^N = -R_{ii}^{-1}B_i^T\theta_i\hat x^i \quad (i=1,2), \qquad z_G = -L_\alpha^{-1}B^T\theta_zx \tag{3.18}
$$
Proof. The proof is similar to that for the sufficiency of Theorem 4.3. The time point Tε for the dynamic output feedback asymptotic ε-Nash equilibrium depends on the convergence rate of each player's observer.
2) Static Output Feedback Nash Equilibrium

u1^N and u2^N in (3.14) can also be constructed by direct output feedback with constant feedback gains (3.19):
$$
u_i^N = -F_iy_i \quad (i=1,2), \qquad z_G = -K_zx \tag{3.19}
$$
The following theorem provides a sufficient condition for (3.19).

Theorem 3.2. Suppose there exists a solution (θ1′, θ2′, θz′) to Riccati equation (3.20):
$$
\begin{aligned}
0 &= \Big(A - \sum_{j=1}^{2}B_jR_{jj}^{-1}B_j^T\theta_j'E_j - BL_\alpha^{-1}B^T\theta_z'\Big)^T\theta_i'E_i
+ E_i^T\theta_i'\Big(A - \sum_{j=1}^{2}B_jR_{jj}^{-1}B_j^T\theta_j'E_j - BL_\alpha^{-1}B^T\theta_z'\Big)\\
&\quad + Q_i + \sum_{j=1}^{2}E_j^T\theta_j'B_jR_{jj}^{-1}R_{ij}R_{jj}^{-1}B_j^T\theta_j'E_j + \theta_z'BL_\alpha^{-1}L_iL_\alpha^{-1}B^T\theta_z' \quad (i=1,2)\\[4pt]
0 &= \Big(A - \sum_{j=1}^{2}B_jR_{jj}^{-1}B_j^T\theta_j'E_j - BL_\alpha^{-1}B^T\theta_z'\Big)^T\theta_z'
+ \theta_z'\Big(A - \sum_{j=1}^{2}B_jR_{jj}^{-1}B_j^T\theta_j'E_j - BL_\alpha^{-1}B^T\theta_z'\Big)\\
&\quad + Q_\alpha + \sum_{j=1}^{2}E_j^T\theta_j'B_jR_{jj}^{-1}R_{j\alpha}R_{jj}^{-1}B_j^T\theta_j'E_j + \theta_z'BL_\alpha^{-1}B^T\theta_z'
\end{aligned} \tag{3.20}
$$
and, furthermore, that the eigenvalues of the system matrix
$$
A - \sum_{i=1}^{2}B_iR_{ii}^{-1}B_i^T\theta_i'E_i - BL_\alpha^{-1}B^T\theta_z'
$$
all have negative real parts. Then the static output feedback Nash equilibrium (u1^N, u2^N, zG) for game (3.13) can be constructed by
$$
u_i^N = -R_{ii}^{-1}B_i^T\theta_i'y_i \quad (i=1,2), \qquad z_G = -L_\alpha^{-1}B^T\theta_z'x \tag{3.21}
$$
Proof. The proof is similar to that for the sufficiency of Theorem 4.3.
Remark 3.1. 1) The Nash equilibrium solution (u1^N, u2^N, zG) for game (3.13) provides a robust equilibrium solution to game (3.1)-(3.4) in the following sense. When fixing u1 = u1^N and u2 = u2^N in (3.1), we have
$$
\alpha J_1(x_0,u_1^N,u_2^N,z) + (1-\alpha)J_2(x_0,u_1^N,u_2^N,z)
\le \alpha J_1(x_0,u_1^N,u_2^N,z_G) + (1-\alpha)J_2(x_0,u_1^N,u_2^N,z_G) \quad (\forall z\in Z)
$$
When fixing z = zG in (3.1), we have
$$
\begin{aligned}
J_1(x_0,u_1^N,u_2^N,z_G) &\le J_1(x_0,u_1,u_2^N,z_G) \quad (\forall u_1\in U_1)\\
J_2(x_0,u_1^N,u_2^N,z_G) &\le J_2(x_0,u_1^N,u_2,z_G) \quad (\forall u_2\in U_2)
\end{aligned}
$$
2) Compared with the individual rationality Ji† defined in (3.5), we have
$$
J_i(x_0,u_1^N,u_2^N,z_G) \le J_i^\dagger \quad (i=1,2)
$$
So the result of (u1^N, u2^N, zG) is not as conservative as the individual rationality resulting from each player's minimax design.
3) If Ei = I (i = 1, 2), (3.20) is exactly the same as (3.17). If there exist qualified solutions (θ1, θ2, θz) to (3.17) and (θ1′, θ2′, θz′) to (3.20), then the effect of the static output feedback equilibrium is the same as that of the state feedback equilibrium constructed by (3.18), that is,
$$
J_i\!\left(x_0,\ u_1^N = -R_{11}^{-1}B_1^T\theta_1'y_1,\ u_2^N = -R_{22}^{-1}B_2^T\theta_2'y_2,\ z_G = -L_\alpha^{-1}B^T\theta_z'x'\right)
= J_i\!\left(x_0,\ u_1^N = -R_{11}^{-1}B_1^T\theta_1x,\ u_2^N = -R_{22}^{-1}B_2^T\theta_2x,\ z_G = -L_\alpha^{-1}B^T\theta_zx\right) \quad (i=1,2) \tag{3.22}
$$
where the system states x′ and x in (3.22) are produced by (3.23) and (3.24), respectively:
$$
\dot x' = \Big(A - \sum_{i=1}^{2}B_iR_{ii}^{-1}B_i^T\theta_i'E_i - BL_\alpha^{-1}B^T\theta_z'\Big)x' \tag{3.23}
$$
$$
\dot x = \Big(A - \sum_{i=1}^{2}B_iR_{ii}^{-1}B_i^T\theta_i - BL_\alpha^{-1}B^T\theta_z\Big)x \tag{3.24}
$$
This can be proved by observing that the roles of θi′Ei and θz′ in (3.20) are the same as those of θi and θz in (3.17).
4) The parameter α depicts how the uncertainty penalizes the two players. For example, when α = 1 the uncertainty z puts its full penalty on player 1. The larger α is, the more the uncertainty is in favor of player 2. In real applications, the selection of α can be decided according to expert experience.
5) The θi′'s may not be symmetric.
3.4 Un-improvable Robust Output Feedback Equilibrium Strategy by 2-player LQ Game

Let u1 and u2 form a coalition. Regarding the coalition of u1 and u2, and the uncertainty z, as the two players, we can construct a 2-player game (3.25) whose Nash equilibrium solution will be an un-improvable robust equilibrium solution for game (3.1)-(3.4):
$$
\begin{aligned}
\dot x &= Ax + B_1u_1 + B_2u_2 + Bz \quad (x(0) = x_0)\\
y_i &= E_ix \quad (i=1,2)\\
J_{12}(x_0,u_1,u_2,z) &= \int_0^\infty\left(x^TQ_\beta x + \sum_{j=1}^{2}u_j^TR_{j\beta}u_j + z^TL_\beta z\right)dt\\
J_z(x_0,u_1,u_2,z) &= \int_0^\infty\left(x^TQ_\alpha x + \sum_{i=1}^{2}u_i^TR_{i\alpha}u_i + z^TL_\alpha z\right)dt
\end{aligned} \tag{3.25}
$$
In (3.25), Jz is as in (3.12), β ∈ (0, 1), Qβ = βQ1 + (1−β)Q2, Riβ = βR1i + (1−β)R2i, and Lβ = βL1 + (1−β)L2.

As in section 3.3, we can construct the Nash equilibrium ((u1^G, u2^G), zG) by both dynamic output feedback and static output feedback for game (3.25). Define
$$
B_{12} = (B_1\ B_2), \quad E_{12} = \begin{pmatrix}E_1\\E_2\end{pmatrix}, \quad
R_{12\beta} = \begin{pmatrix}R_{1\beta} & 0\\ 0 & R_{2\beta}\end{pmatrix}, \quad
R_{12\alpha} = \begin{pmatrix}R_{1\alpha} & 0\\ 0 & R_{2\alpha}\end{pmatrix}.
$$
Construct a state observer (3.26) based on the observations (y1, y2):
$$
\begin{aligned}
\dot{\hat x} &= \Big(A - \sum_{j=1}^{2}B_jK_j - BK_z\Big)\hat x + \sum_{j=1}^{2}D_j(y_j - \hat y_j) \quad (\hat x(0) = \hat x_0)\\
\hat y_i &= E_i\hat x, \quad u_i = -K_i\hat x \quad (i=1,2)\\
z &= -K_zx
\end{aligned} \tag{3.26}
$$
In (3.26), D1 and D2 are chosen such that
$$
\begin{pmatrix}D_1 & 0\\ 0 & D_2\end{pmatrix}
$$
can stabilize $\left((A - BK_z)^T,\ -\begin{pmatrix}E_1 & 0\\ 0 & E_2\end{pmatrix}^{\!T}\right)$.
Theorem 3.3. Suppose there exists a symmetric matrix solution (θ12, θz) to Riccati equation (3.27):
$$
\begin{aligned}
0 &= \left(A - B_{12}R_{12\beta}^{-1}B_{12}^T\theta_{12} - BL_\alpha^{-1}B^T\theta_z\right)^T\theta_{12}
+ \theta_{12}\left(A - B_{12}R_{12\beta}^{-1}B_{12}^T\theta_{12} - BL_\alpha^{-1}B^T\theta_z\right)\\
&\quad + Q_\beta + \theta_{12}B_{12}R_{12\beta}^{-1}B_{12}^T\theta_{12} + \theta_zBL_\alpha^{-1}L_\beta L_\alpha^{-1}B^T\theta_z\\[4pt]
0 &= \left(A - B_{12}R_{12\beta}^{-1}B_{12}^T\theta_{12} - BL_\alpha^{-1}B^T\theta_z\right)^T\theta_z
+ \theta_z\left(A - B_{12}R_{12\beta}^{-1}B_{12}^T\theta_{12} - BL_\alpha^{-1}B^T\theta_z\right)\\
&\quad + Q_\alpha + \theta_{12}B_{12}R_{12\beta}^{-1}R_{12\alpha}R_{12\beta}^{-1}B_{12}^T\theta_{12} + \theta_zBL_\alpha^{-1}B^T\theta_z
\end{aligned} \tag{3.27}
$$
and, furthermore, that the eigenvalues of the system matrix
$$
A - B_{12}R_{12\beta}^{-1}B_{12}^T\theta_{12} - BL_\alpha^{-1}B^T\theta_z
$$
all have negative real parts. Then, based on the state observer (3.26), the dynamic output feedback Nash equilibrium (when x̂0 = x0) / dynamic output feedback asymptotic ε-Nash equilibrium (when x̂0 ≠ x0) ((u1^G, u2^G), zG) for game (3.25) can be constructed by
$$
u_i^G = -R_{i\beta}^{-1}B_i^T\theta_{12}\hat x \quad (i=1,2), \qquad z_G = -L_\alpha^{-1}B^T\theta_zx \tag{3.28}
$$
Theorem 3.4. Suppose there exists a solution (θ12′, θz′) to Riccati equation (3.29):
$$
\begin{aligned}
0 &= \left(A - B_{12}R_{12\beta}^{-1}B_{12}^T\theta_{12}'E_{12} - BL_\alpha^{-1}B^T\theta_z'\right)^T\theta_{12}'E_{12}
+ E_{12}^T\theta_{12}'\left(A - B_{12}R_{12\beta}^{-1}B_{12}^T\theta_{12}'E_{12} - BL_\alpha^{-1}B^T\theta_z'\right)\\
&\quad + Q_\beta + E_{12}^T\theta_{12}'B_{12}R_{12\beta}^{-1}B_{12}^T\theta_{12}'E_{12} + \theta_z'BL_\alpha^{-1}L_\beta L_\alpha^{-1}B^T\theta_z'\\[4pt]
0 &= \left(A - B_{12}R_{12\beta}^{-1}B_{12}^T\theta_{12}'E_{12} - BL_\alpha^{-1}B^T\theta_z'\right)^T\theta_z'
+ \theta_z'\left(A - B_{12}R_{12\beta}^{-1}B_{12}^T\theta_{12}'E_{12} - BL_\alpha^{-1}B^T\theta_z'\right)\\
&\quad + Q_\alpha + E_{12}^T\theta_{12}'B_{12}R_{12\beta}^{-1}R_{12\alpha}R_{12\beta}^{-1}B_{12}^T\theta_{12}'E_{12} + \theta_z'BL_\alpha^{-1}B^T\theta_z'
\end{aligned} \tag{3.29}
$$
and, furthermore, that the eigenvalues of the system matrix
$$
A - B_{12}R_{12\beta}^{-1}B_{12}^T\theta_{12}'E_{12} - BL_\alpha^{-1}B^T\theta_z'
$$
all have negative real parts. Then the static output feedback Nash equilibrium ((u1^G, u2^G), zG) for game (3.25) can be constructed by
$$
u_i^G = -R_{i\beta}^{-1}B_i^T\theta_{12}'\begin{pmatrix}y_1\\y_2\end{pmatrix} \quad (i=1,2), \qquad z_G = -L_\alpha^{-1}B^T\theta_z'x \tag{3.30}
$$
Remark 3.2. 1) If α = β, then game (3.25) is actually a zero-sum game.
2) θ12′ may be asymmetric.
3) The larger β is, the more the coalition (u1, u2) is in favor of player 1.
4) The Nash equilibrium solution ((u1^G, u2^G), zG) for game (3.25) provides an un-improvable robust equilibrium solution to game (3.1)-(3.4) in the following sense. When fixing u1 = u1^G and u2 = u2^G in (3.1), we have
$$
\alpha J_1(x_0,u_1^G,u_2^G,z) + (1-\alpha)J_2(x_0,u_1^G,u_2^G,z)
\le \alpha J_1(x_0,u_1^G,u_2^G,z_G) + (1-\alpha)J_2(x_0,u_1^G,u_2^G,z_G) \quad (\forall z\in Z)
$$
When fixing z = zG in (3.1), we cannot find (u1, u2) ≠ (u1^G, u2^G), (u1, u2) ∈ U1 × U2, such that we simultaneously have
$$
\begin{aligned}
J_1(x_0,u_1^G,u_2^G,z_G) &\ge J_1(x_0,u_1,u_2,z_G)\\
J_2(x_0,u_1^G,u_2^G,z_G) &\ge J_2(x_0,u_1,u_2,z_G)
\end{aligned}
$$
with at least one of the two inequalities strict.
3.5 Conclusion

In this chapter, output feedback was used to form equilibrium solutions in order to accommodate the situation where players have asymmetric information about the game evolution. For the original 2-player game with an uncertainty: 1) parameterized by α, which represents the favoritism of the uncertainty toward the two players, a 3-player game was constructed whose Nash equilibrium solution is a robust equilibrium solution for the original game; 2) parameterized by α, which represents the favoritism of the uncertainty toward the two players, and β, which represents the favoritism within the two-player coalition, a 2-player game was constructed whose Nash equilibrium solution is an un-improvable robust equilibrium solution for the original game. The robust equilibrium solutions were built based on both the dynamic output feedback method and the static output feedback method.
CHAPTER 4
WHEN IS LINEAR STATE FEEDBACK A NASH EQUILIBRIUM
SOLUTION FOR LQ GAMES?
4.1 Introduction

Riccati equations play an important role in LQ games and LQ optimal control problems because each player's control strategy, or the optimal control, depends on the solution of the associated Riccati equations. In 1964, Kalman [39] introduced the inverse problem: given a linear single-input control system with a constant linear asymptotically stable state feedback control law, for what quadratic performance index is the control law optimal? The main result of [39] is a return difference condition in the frequency domain. Although Kalman hoped to derive explicit conditions for all inverse control problems, he pointed out that only for low-order systems is it possible to derive explicit conditions, and he gave examples of a second-order system and a third-order system. Cruz and Perkins [20][64] proposed the concept of the sensitivity matrix, whose inverse is a matrix generalization of the return difference, for the study of decreasing the effects of parameter variations on system behavior. Perkins and Cruz discussed the relationship between stability and parameter sensitivity in [64], again using the matrix return difference. Anderson [4] studied the inverse problem for multi-variable linear systems with additive noise at the input and output, and matrix return difference conditions were derived for testing optimality.
Because the optimal control is a linear constant feedback of the system states for a linear-quadratic time-invariant optimal system with infinite horizon, all the results in [39][20][64][4] could be, and were, stated in the frequency domain. For linear-quadratic optimal systems with finite horizons the linear feedback gains will in general be time varying, and the system matrices may be time varying as well, so there is no frequency domain result for the inverse problem. Here we provide a new theorem about the inverse problem of linear-quadratic optimal systems with finite horizons. It is a prelude to the main theme of this chapter. Before we do this, a theorem for the inverse problem of linear-quadratic optimal systems with infinite horizons is provided first for comparison.

A linear quadratic optimal control system with infinite horizon is described by (4.1), with system state x ∈ Rⁿ and control u ∈ Rᵐ; all the coefficients are constant matrices of proper dimension. A linear constant feedback (4.3) is applied in order to minimize the performance index (4.2). Problem (4.1)-(4.2) is the so-called optimal regulator problem. The first term within the integral of (4.2) represents the regulation error and the second term represents the control energy.
$$
\dot x = Ax + Bu \quad (x(0) = x_0) \tag{4.1}
$$
$$
J(x_0,u) = \int_0^\infty\left(x^TQx + u^TRu\right)dt \tag{4.2}
$$
$$
u(t) = -Kx \tag{4.3}
$$
For the inverse problem of the optimal regulator, we have the following theorem.
Theorem 4.1. For the system (4.1), a given asymptotically stable control law (4.3) will minimize the performance index (4.2) with some positive semi-definite Q and positive definite R if and only if Q and R satisfy (4.4):
$$
\begin{aligned}
0 &= RK - B^T\theta\\
0 &= (A - BK)^T\theta + \theta(A - BK) + Q + K^TRK
\end{aligned} \tag{4.4}
$$
where θ is a positive semi-definite matrix intermediate variable.

Alternatively, define the set
$$
\Gamma = \left\{(Q, R) \in \mathbb{R}^{n\times n}\times\mathbb{R}^{m\times m}:\ \text{(4.4) is satisfied with } Q \ge 0,\ R > 0,\ \theta \ge 0\right\}
$$
Then the asymptotically stable control law (4.3) is optimal if and only if Γ is not empty. A sketch of this membership test appears below.
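Because (4.4) is linear in θ, Q and R, testing whether a given pair (Q, R) belongs to Γ reduces to one Lyapunov solve plus a linear check. The sketch below (an added illustration on a double-integrator example, not from the original text) does exactly that; for this (Q, R), the gain K is the LQR gain, so the test passes.

```python
# Membership test for Gamma in Theorem 4.1 (illustrative plant data).
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[0.0, 1.0], [0.0, 0.0]])         # double integrator
B = np.array([[0.0], [1.0]])
Q = np.diag([1.0, 0.0])                        # Q >= 0
R = np.eye(1)                                  # R > 0
K = np.array([[1.0, np.sqrt(2.0)]])            # candidate stabilizing gain

Acl = A - B @ K
assert np.all(np.linalg.eigvals(Acl).real < 0)   # (4.3) asymptotically stable

# second equation of (4.4): Acl^T*theta + theta*Acl + Q + K^T R K = 0
theta = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))

in_gamma = (np.allclose(R @ K, B.T @ theta)              # first equation of (4.4)
            and np.all(np.linalg.eigvalsh(theta) >= -1e-9))  # theta >= 0
print("(Q, R) in Gamma:", in_gamma)                      # True
```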
Similarly, an LQ control system with finite horizon is described by (4.5) and (4.6):
$$
\dot x = A(t)x + B(t)u \quad (x(t_0) = x_0,\ t\in[t_0,t_f]) \tag{4.5}
$$
$$
J(x_0,u) = x^T(t_f)Cx(t_f) + \int_{t_0}^{t_f}\left(x^TQ(t)x + u^TR(t)u\right)dt \tag{4.6}
$$
$$
u(t) = -K(t)x \tag{4.7}
$$
In (4.5) and (4.6), every time-varying coefficient is a continuous matrix-valued function of time with proper dimension; in particular, B(t) and R(t) are continuously differentiable. A linear time-varying feedback control law (4.7) is desired to minimize (4.6), where the matrix-valued state feedback gain K(t): [t0, tf] → R^{m×n} is continuously differentiable.

Theorem 4.2. For system (4.5), a given time-varying control law (4.7) will minimize the performance index (4.6) with some positive semi-definite matrix Q(t) and positive definite R(t) if and only if Q(t) and R(t) satisfy (4.8):
$$
\begin{aligned}
0 &= R(t)K(t) - B^T(t)\theta(t)\\
-\dot\theta(t) &= (A(t) - B(t)K(t))^T\theta(t) + \theta(t)(A(t) - B(t)K(t)) + Q(t) + K^T(t)R(t)K(t)
\end{aligned}
\quad (\theta(t_f) = C,\ t\in[t_0,t_f]) \tag{4.8}
$$
where θ(t) ≥ 0 (∀t ∈ [t0, tf]) is an intermediate variable.

Define the set
$$
\Gamma' = \left\{(Q(t), R(t)):\ \text{(4.8) is satisfied with } Q(t)\ge 0,\ R(t) > 0,\ \theta(t)\ge 0,\ \forall t\in[t_0,t_f]\right\}
$$
Then control law (4.7) is optimal if and only if Γ′ is not empty.
Remark 4.1. 1) Γ and Γ′ may have infinitely many elements; 2) If Γ/Γ′ is empty, this means
that the given control law (4.3)/ (4.7) cannot minimize any performance index of the form
(4.2)/ (4.6).
Riccati solved a scalar differential equation which is quadratic in the unknown. Kalman [38] generalized this Riccati equation to the matrix case. Comparing the conditions (4.4) and (4.8) with the nonlinear Riccati equation, (4.4) and (4.8) are linear in the unknowns θ, Q and R. So the inverse problem of a linear-quadratic optimal control system is easier to solve than the direct problem of the same system. Compared with the Riccati equations associated with optimal control problems, the Riccati equations of linear-quadratic dynamic games are more difficult to solve because they are coupled nonlinear matrix equations containing cross terms: products of different players' Riccati equation solutions.

Because of the close relationship between robust stabilizability and optimality, several researchers have linked LQ differential games and robust control systems (e.g. H∞ control systems) together. For example, the researchers in [30], [44], [27], and [46] used LQ games to study minimax controller design methods for the original robust control problems. Note that the games in [30], [44], [27], and [46] are all two-player zero-sum adversarial games. Inspired by the idea that Kalman used in [38], inverse problems for games were proposed. The inverse problem for games can be stated as: for a given system and a given group of control strategies, for what performance indices is the given control strategy group an equilibrium solution? Fujii and Khargonekar [30] proposed equivalent conditions of the return difference conditions in the frequency domain for the inverse LQ problem in order to study H∞ control theory. In Kogan [44], so-called generalized return difference conditions in the frequency domain were presented for LQ differential games associated with several uncertain systems. Tang and Basar [79] designed minimax nonlinear controllers for a class of stochastic nonlinear systems that are also contaminated by an unknown disturbance; regarding the choice of special cost functions as part of the design problem, they solved the inverse problem for a class of nonlinear zero-sum games. Based on two kinds of relationship, namely 1) the relation between the existence of a robust control Lyapunov function (RCLF) for a control-affine system and robust stabilization via state feedback, and 2) the relation between robust control systems and zero-sum differential games, Freeman and Kokotovic [27] solved the inverse problem for nonlinear games using RCLFs. Krstic and Li [46] studied the inverse problem for nonlinear zero-sum games when there is a state-dependent weighting matrix for the control and a non-quadratic penalty for the disturbance in the performance index. Again, note that these authors only studied the inverse problems for two-player zero-sum adversarial games.
In this chapter, we discuss the inverse problem for general nth-order m-player nonzero-sum LQ differential games, which can be indefinite. Given a linear state feedback control strategy group, general conditions on the state and control weighting matrices are given as linear equations in order that the given control strategy group be a Nash equilibrium solution. Explicit conditions for the inverse problem of a 2nd-order 2-player nonzero-sum game with a full state feedback Nash equilibrium solution are thoroughly discussed in section 4.3, and the results are extended to a class of high-order 2-player non-cooperative games in section 4.4. In section 4.5, a numerical example is used to illustrate that the state weighting matrix in the integral of the performance index can be indefinite and still guarantee the existence of a Nash equilibrium solution, in vivid contrast with the fact that in LQ optimal control problems the state weighting matrix should be at least positive semi-definite to guarantee the existence of the optimal control. Conclusions are in section 4.6.
4.2 nth-order m-player Nonzero-sum LQ Games

An nth-order m-player nonzero-sum LQ differential game with infinite horizon is described by (4.9) and (4.10):
$$
\dot x = Ax + \sum_{i=1}^{m}B_iu_i \quad (x(0) = x_0) \tag{4.9}
$$
$$
J_i(x_0,u_i,u_{-i}) = \int_0^\infty\left(x^TQ_ix + \sum_{j=1}^{m}u_j^TR_{ij}u_j\right)dt \quad (R_{ii} > 0;\ i=1,\cdots,m;\ m\ge 2) \tag{4.10}
$$
Here x ∈ Rⁿ and ui ∈ R^{mi}. We require m ≥ 2 in order to discuss real game problems, whose special properties (not possessed by general optimal control problems) will be revealed. A and the Bi's are constant matrices of proper dimension with (A, diag(B1, ..., Bm)) stabilizable. Except for Rii > 0, the Qi's and Rij's (i, j = 1, ..., m) are only required to be symmetric constant matrices of proper dimension. So game (4.9)-(4.10) can be an indefinite game as defined in chapter 1. Assume that each player has a perfect state information structure.
A linear constant state feedback control strategy (4.11) is applied by each player to minimize his/her own performance index (4.10) independently:
$$
u_i(t) = -K_ix(t) = -\begin{pmatrix} k_{i\,11} & \cdots & k_{i\,1n}\\ \vdots & \ddots & \vdots\\ k_{i\,m_i1} & \cdots & k_{i\,m_in} \end{pmatrix}\left(x_1(t)\ \cdots\ x_n(t)\right)^T \quad (i=1,\cdots,m) \tag{4.11}
$$
The counterpart of game (4.9)-(4.10) with finite horizon is described by (4.12)-(4.13):
$$
\dot x = A(t)x + \sum_{i=1}^{m}B_i(t)u_i \quad (x(t_0) = x_0,\ t\in[t_0,t_f]) \tag{4.12}
$$
$$
J_i(x_0,u_i,u_{-i}) = x^T(t_f)C_ix(t_f) + \int_{t_0}^{t_f}\left(x^TQ_i(t)x + \sum_{j=1}^{m}u_j^TR_{ij}(t)u_j\right)dt
\quad (R_{ii}(t) > 0,\ \forall t\in[t_0,t_f];\ i=1,\cdots,m;\ m\ge 2) \tag{4.13}
$$
Again, except for Rii(t) > 0, the Ci's, Qi(t)'s, and Rij(t)'s (i, j = 1, ..., m) are only required to be symmetric real-valued matrices of proper dimension. The system state x(t) is available for each player to construct his/her linear state feedback control strategy (4.14):
$$
u_i(t) = -K_i(t)x(t) = -\begin{pmatrix} k_{i\,11}(t) & \cdots & k_{i\,1n}(t)\\ \vdots & \ddots & \vdots\\ k_{i\,m_i1}(t) & \cdots & k_{i\,m_in}(t) \end{pmatrix}\left(x_1(t)\ \cdots\ x_n(t)\right)^T \quad (i=1,\cdots,m) \tag{4.14}
$$
Theorem 4.3. 1) A stabilizing control strategy group (4.11) is a Nash equilibrium solution
for game (4.9)-(4.10) if and only if the system parameter matrices A and Bi ’s, the weighting matrices in performance index (4.10) Qi ’s, Rij ’s and feedback gains Ki ’s satisfy the
relations described by (4.15)-(4.17).
70
Rii−1 BiT θi = Ki
0=
A−
m
X
(i = 1, · · · , m)
!T
−1 T
Bj Rjj
Bj θj
j=1
m
X
+ Qi +
(4.15)
θi + θi
A−
−1
−1 T
θj Bj Rjj
Rij Rjj
Bj θj
j=1
λ∈C
∀λ ∈ σ A −
m
X
−1 T
Bj Rjj
Bj θj
j=1
(4.16)
(i = 1, · · · , m)
−1 T
Bj Rjj
Bj θj
j=1
m
X
!
!!
(4.17)
where the n by n symmetric matrices θi ’s are intermediate variables linking all the parameters.
2) A control strategy group (4.14) is a Nash equilibrium solution for game (4.12)-(4.13) if and only if the system parameter matrices A(t) and Bi(t)'s, the weighting matrices Qi(t)'s, Rij(t)'s and Ci's in the performance indices (4.13), and the feedback gains Ki(t)'s satisfy the relations described by (4.18)-(4.20). In (4.19), the time variable t is omitted.
$$
R_{ii}^{-1}(t)B_i^T(t)\theta_i(t) = K_i(t) \quad (i=1,\cdots,m) \tag{4.18}
$$
$$
-\dot\theta_i = \Big(A - \sum_{j=1}^{m}B_jR_{jj}^{-1}B_j^T\theta_j\Big)^T\theta_i + \theta_i\Big(A - \sum_{j=1}^{m}B_jR_{jj}^{-1}B_j^T\theta_j\Big)
+ Q_i + \sum_{j=1}^{m}\theta_jB_jR_{jj}^{-1}R_{ij}R_{jj}^{-1}B_j^T\theta_j \quad (i=1,\cdots,m) \tag{4.19}
$$
$$
\theta_i(t_f) = C_i \quad (i=1,\cdots,m) \tag{4.20}
$$
The n × n symmetric matrices θi(t) are intermediate variables linking all the parameters.
Proof. 1) The dynamic programming method is used for the proof of sufficiency. First introduce the Hamilton-Jacobi-Bellman (HJB) equations (4.21) for game (4.9)-(4.10):
$$
\frac{\partial V_i}{\partial t} + \min_{u_i}\{H_i\} = 0 \quad (i=1,\cdots,m) \tag{4.21}
$$
The Hamiltonian Hi is
$$
H_i = \left(\frac{\partial V_i}{\partial x}\right)^T\Big(Ax + \sum_{i=1}^{m}B_iu_i\Big) + x^TQ_ix + \sum_{j=1}^{m}u_j^TR_{ij}u_j \quad (i=1,\cdots,m)
$$
Vi(t, x) in (4.21) is a scalar function. It can be shown that ∂²Hi/∂ui² = 2Rii > 0, i.e. Hi is strictly convex with respect to ui. This implies that if there exists a control strategy ui such that Hi is minimized, then that control strategy is unique. Let ∂Hi/∂ui = Bi^T ∂Vi/∂x + 2Riiui = 0; then the control strategies that minimize Hi should be of the following form:
$$
u_i = -\frac12 R_{ii}^{-1}B_i^T\frac{\partial V_i}{\partial x} \quad (i=1,\cdots,m) \tag{4.22}
$$
Substituting the above control strategies into the HJB equations (4.21) gives the partial differential equation (4.23) for Vi:
$$
0 = \frac{\partial V_i}{\partial t} + \left(\frac{\partial V_i}{\partial x}\right)^T\!\left[Ax - \frac12\sum_{j=1}^{m}B_jR_{jj}^{-1}B_j^T\frac{\partial V_j}{\partial x}\right] + x^TQ_ix
+ \frac14\sum_{j=1}^{m}\left(\frac{\partial V_j}{\partial x}\right)^T\!B_jR_{jj}^{-1}R_{ij}R_{jj}^{-1}B_j^T\frac{\partial V_j}{\partial x} \quad (i=1,\cdots,m) \tag{4.23}
$$
Assume Vi(t, x) = x^Tθix, where θi is a symmetric matrix. Note that ∂Vi(t, x)/∂t = 0 and ∂Vi(t, x)/∂x = 2θix. Then (4.23) is equivalent to the ARE (4.16). If solutions θi to the Riccati equation (4.16) exist, then (4.23), and consequently (4.21), will be satisfied, and the Nash equilibrium can be constructed as in (4.24), where the state x is produced by the closed-loop system (4.25):
$$
u_i = -R_{ii}^{-1}B_i^T\theta_ix \quad (i=1,\cdots,m) \tag{4.24}
$$
$$
\dot x = \Big(A - \sum_{j=1}^{m}B_jR_{jj}^{-1}B_j^T\theta_j\Big)x \tag{4.25}
$$
(4.17) is required for the stability of the closed-loop system (4.25). Comparing (4.11) and (4.24), we have (4.15). So the sufficiency of part 1) is proved.
The necessity can be proved by the minimum principle method¹. Suppose (4.11) is a stabilizing Nash equilibrium solution for game (4.9)-(4.10). For player i, (4.9)-(4.10) becomes (4.26)-(4.27); construct the function (4.28):
$$
\dot x = Ax - \sum_{\substack{j=1\\ j\neq i}}^{m}B_jK_jx + B_iu_i \quad (x(0) = x_0) \tag{4.26}
$$
$$
J_i(x_0,u_i,u_{-i}) = \int_0^\infty\Big(x^TQ_ix + \sum_{\substack{j=1\\ j\neq i}}^{m}x^TK_j^TR_{ij}K_jx + u_i^TR_{ii}u_i\Big)dt \quad (i=1,\cdots,m) \tag{4.27}
$$
$$
H_i(x,u_i,p_i) = x^TQ_ix + \sum_{\substack{j=1\\ j\neq i}}^{m}x^TK_j^TR_{ij}K_jx + u_i^TR_{ii}u_i
+ p_i^T\Big(Ax - \sum_{\substack{j=1\\ j\neq i}}^{m}B_jK_jx + B_iu_i\Big) \tag{4.28}
$$
where pi is the costate. By the minimum principle, player i's optimal control must be of the form (4.29), and we also have (4.30) and (4.31):
$$
u_i^* = \arg\min_{u_i}H_i = -\frac12 R_{ii}^{-1}B_i^Tp_i \quad (i=1,\cdots,m) \tag{4.29}
$$
$$
\dot x = Ax - \sum_{\substack{j=1\\ j\neq i}}^{m}B_jK_jx - \frac12 B_iR_{ii}^{-1}B_i^Tp_i \tag{4.30}
$$
$$
\dot p_i = -2Q_ix - 2\sum_{\substack{j=1\\ j\neq i}}^{m}K_j^TR_{ij}K_jx - \Big(A - \sum_{\substack{j=1\\ j\neq i}}^{m}B_jK_j\Big)^Tp_i \quad (i=1,\cdots,m) \tag{4.31}
$$
Comparing (4.11) and (4.29), we have $u_i = -K_ix = -\tfrac12 R_{ii}^{-1}B_i^Tp_i$. Then there must exist an n × n constant matrix θi such that $p_i = 2\theta_ix$ and $K_i = R_{ii}^{-1}B_i^T\theta_i$. Here the state trajectory x(t) is produced by $\dot x = \big(A - \sum_{j=1}^{m}B_jR_{jj}^{-1}B_j^T\theta_j\big)x$.

¹The argument here can also be carried out by using the matrix minimum principle method in Athans [6] to minimize the cost function with respect to the control gain matrix. In order to apply the matrix minimum principle, a matrix differential equation which is equivalent to (4.9)-(4.10) should be constructed first. The interested reader should refer to Athans [6] for more details.
By the assumption that (4.11) is a stabilizing Nash equilibrium, we know that the matrix $\bar A = A - \sum_{j=1}^{m}B_jR_{jj}^{-1}B_j^T\theta_j = A - \sum_{j=1}^{m}B_jK_j$ must be Hurwitz and consequently nonsingular. From (4.30) and (4.31) we have
$$
\begin{aligned}
\dot p_i = 2\theta_i\dot x
&= 2\theta_i\Big(A - \sum_{j=1}^{m}B_jR_{jj}^{-1}B_j^T\theta_j\Big)x\\
&= -2Q_ix - 2\sum_{\substack{j=1\\ j\neq i}}^{m}\theta_j^TB_jR_{jj}^{-1}R_{ij}R_{jj}^{-1}B_j^T\theta_jx
- 2\Big(A - \sum_{\substack{j=1\\ j\neq i}}^{m}B_jR_{jj}^{-1}B_j^T\theta_j\Big)^T\theta_ix
\end{aligned} \tag{4.32}
$$
In (4.32), the second equality comes from (4.30) and the third is a result of (4.31). (4.32) implies (4.33), which is almost the same as (4.16) except that we do not know whether the θi's in (4.33) are symmetric or not:
$$
0 = \Big(A - \sum_{j=1}^{m}B_jR_{jj}^{-1}B_j^T\theta_j\Big)^T\theta_i + \theta_i\Big(A - \sum_{j=1}^{m}B_jR_{jj}^{-1}B_j^T\theta_j\Big)
+ Q_i + \sum_{j=1}^{m}\theta_j^TB_jR_{jj}^{-1}R_{ij}R_{jj}^{-1}B_j^T\theta_j \quad (i=1,\cdots,m) \tag{4.33}
$$
Because the last two terms on the right-hand side of (4.33) are symmetric, the sum of the first two terms must also be symmetric, i.e.
$$
\Big(A - \sum_{j=1}^{m}B_jR_{jj}^{-1}B_j^T\theta_j\Big)^T\theta_i + \theta_i\Big(A - \sum_{j=1}^{m}B_jR_{jj}^{-1}B_j^T\theta_j\Big)
= \theta_i^T\Big(A - \sum_{j=1}^{m}B_jR_{jj}^{-1}B_j^T\theta_j\Big) + \Big(A - \sum_{j=1}^{m}B_jR_{jj}^{-1}B_j^T\theta_j\Big)^T\theta_i^T \quad (i=1,\cdots,m) \tag{4.34}
$$
Define Pi = θi − θi^T, which is a skew-symmetric matrix, i.e. Pi = −Pi^T. Then from (4.34) we have (4.35):
$$
P_i\bar A + \bar A^TP_i = 0 \quad (i=1,\cdots,m) \tag{4.35}
$$
(4.35) is an algebraic Sylvester equation. If there does not exist a nontrivial skew-symmetric matrix solution to (4.35), then there does not exist an asymmetric solution to (4.33). Fortunately, Ā is a Hurwitz matrix and all its eigenvalues have negative real parts, so the intersection of the spectrum of Ā and that of −Ā^T must be empty. Then, by Theorem 1.1.3 in Abou-Kandil [1], the solution to (4.35) must be unique. Obviously, Pi = 0 is a solution to (4.35), and from the above analysis we know that Pi = 0 is the only solution to (4.35). Thus the solutions θi to (4.33), if they exist, must be symmetric, and then (4.33) is exactly the same as (4.16). The necessity is proved.
2) The proof for part 2 can be derived similarly, with only minor modifications. In the proof of sufficiency, the scalar function used for part 2 should be Vi(t, x) = x^Tθi(t)x, where the θi's are time-varying symmetric matrices with θi(tf) = Ci. Note that ∂Vi(t, x)/∂t = x^Tθ̇i(t)x and ∂Vi(t, x)/∂x = 2θi(t)x. This time the Riccati equation used should be the differential equation (4.19), and the Nash equilibrium can be constructed as (4.36):
$$
u_i = -R_{ii}^{-1}(t)B_i^T(t)\theta_i(t)x \quad (i=1,\cdots,m) \tag{4.36}
$$
In the proof of necessity, assume that (4.14) is a Nash equilibrium with $K_i(t_f) = R_{ii}^{-1}(t_f)B_i^T(t_f)C_i$. Let $u_i = -K_i(t)x = -\tfrac12 R_{ii}^{-1}(t)B_i^T(t)p_i(t)$ with $p_i(t_f) = 2C_ix(t_f)$. Assume $p_i(t) = 2\theta_i(t)x(t)$ and note that $\dot p_i(t) = 2\dot\theta_i(t)x(t) + 2\theta_i(t)\dot x(t)$. Then we have (4.37)-(4.39), with the time variable t omitted in (4.38):
$$
K_i(t) = R_{ii}^{-1}(t)B_i^T(t)\theta_i(t) \tag{4.37}
$$
$$
-\dot\theta_i = \Big(A - \sum_{j=1}^{m}B_jR_{jj}^{-1}B_j^T\theta_j\Big)^T\theta_i + \theta_i\Big(A - \sum_{j=1}^{m}B_jR_{jj}^{-1}B_j^T\theta_j\Big)
+ Q_i + \sum_{j=1}^{m}\theta_j^TB_jR_{jj}^{-1}R_{ij}R_{jj}^{-1}B_j^T\theta_j \quad (\theta_i(t_f) = C_i;\ i=1,\cdots,m) \tag{4.38}
$$
$$
\dot x = \Big(A(t) - \sum_{j=1}^{m}B_j(t)R_{jj}^{-1}(t)B_j^T(t)\theta_j(t)\Big)x \tag{4.39}
$$
Again, if the θi's in (4.38) are symmetric, then (4.38) will be exactly the same as (4.19). Because the last two terms on the right hand side of (4.38) are symmetric, we know that

$$\dot\theta_i + \Big(A - \sum_{j=1}^{m} B_j R_{jj}^{-1} B_j^T \theta_j\Big)^T \theta_i + \theta_i \Big(A - \sum_{j=1}^{m} B_j R_{jj}^{-1} B_j^T \theta_j\Big)$$

must be symmetric, i.e.
$$\dot\theta_i + \Big(A - \sum_{j=1}^{m} B_j R_{jj}^{-1} B_j^T \theta_j\Big)^T \theta_i + \theta_i \Big(A - \sum_{j=1}^{m} B_j R_{jj}^{-1} B_j^T \theta_j\Big) = \dot\theta_i^T + \theta_i^T \Big(A - \sum_{j=1}^{m} B_j R_{jj}^{-1} B_j^T \theta_j\Big) + \Big(A - \sum_{j=1}^{m} B_j R_{jj}^{-1} B_j^T \theta_j\Big)^T \theta_i^T \qquad (i = 1, \cdots, m) \qquad (4.40)$$
Define a time varying matrix Pi(t) = θi(t) − θi^T(t), which is skew symmetric for all t ∈ [t0, tf]. Denote Ā(t) = A(t) − Σ_{j=1}^m Bj(t) Rjj^{-1}(t) Bj^T(t) θj(t) = A(t) − Σ_{j=1}^m Bj(t) Kj(t). Then from (4.40), we have (4.41)
$$\dot P_i = -P_i \bar A(t) - \bar A^T(t) P_i \qquad (P_i(t_f) = 0;\; \forall t \in [t_0, t_f];\; i = 1, \cdots, m) \qquad (4.41)$$

(4.41) is a differential Sylvester equation and it has a unique solution (4.42):

$$P_i(t) = \Phi_{-\bar A^T}(t, t_0)\, P_i(t_0)\, \Phi_{-\bar A^T}^T(t, t_0) \qquad (\forall t \in [t_0, t_f];\; i = 1, \cdots, m) \qquad (4.42)$$
where Φ_{−Ā^T}(t, t0) is the state transition matrix of −Ā^T satisfying

$$\dot\Phi_{-\bar A^T}(t, t_0) = -\bar A^T(t)\, \Phi_{-\bar A^T}(t, t_0) \qquad (\Phi_{-\bar A^T}(t_0, t_0) = I;\; \forall t \in [t_0, t_f]) \qquad (4.43)$$
Because Pi(tf) = 0 and Φ_{−Ā^T}(t, t0) is nonsingular for all t ∈ [t0, tf], from (4.42) we must have Pi(t0) = 0 (i = 1, · · · , m) and consequently Pi(t) = 0 (∀t ∈ [t0, tf]; i = 1, · · · , m). So there is no nontrivial skew symmetric solution to (4.41) and θi(t) = θi^T(t) (∀t ∈ [t0, tf]). Thus the solutions θi(t) to (4.38), if they exist, must be symmetric, and (4.38) is exactly the same as (4.19). The proof of necessity for part 2 is completed.
Starting from different points of view, we have the following two different problems.
4.2.1 Direct Problem
To find the linear state feedback (4.11)/(4.14) such that it is the Nash equilibrium solution for the given game (4.9)-(4.10)/(4.12)-(4.13) is the direct or general problem discussed in many references [27], [30], [44], [46]. The direct problem for game (4.12)-(4.13) (similarly for game (4.9)-(4.10)) can be stated as: given A(t), Bi(t)'s, Qi(t)'s, Rij(t)'s and Ci's, find state feedback gains Ki*(t)'s such that (4.18)-(4.20) is satisfied. For the direct problem, we have the following theorem. Note that Basar and Olsder provided a sufficient condition ([8], Corollary 6.5) for the game (4.12)-(4.13) when all the weighting matrices in (4.13) are at least positive semi-definite.
Theorem 4.4. Game (4.9)-(4.10)/(4.12)-(4.13) can have a Nash equilibrium solution if and only if the ARE/DRE (4.16)/(4.19) has solutions θi, and the state feedback gains of the Nash equilibrium solution can be constructed by (4.15)/(4.18). Furthermore, for game (4.9)-(4.10), (4.17) is required in order to guarantee the stability of the closed-loop system.
Proof. The proof is similar to that for Theorem 4.3.
Remark 4.2. 1) In the linear quadratic optimal control problem (4.1), under the assumptions that Q ≥ 0, R > 0, (A, B) is stabilizable, and (D, A) is detectable (where Q = D^T D), the corresponding Riccati equation is solvable and, furthermore, the closed-loop system is asymptotically stable by Theorem 4.1 in [89]; 2) In the direct problem of game (4.9)-(4.10), the asymptotic stability of the closed-loop system is not guaranteed by the solvability of the Riccati equation (4.16). This is one of the reasons why game problems are more complicated than optimal control problems. Proposition 4.1 below provides a sufficient condition for the asymptotic stability of the closed-loop system when the state and control weighting matrices in (4.10) satisfy positive definite and positive semi-definite conditions. Otherwise, more concrete conditions are required for each specific game.
Proposition 4.1. In game (4.9)-(4.10), if Qi > 0, Rii > 0, and Rij ≥ 0 for all i, j = 1, · · · , m and if there is a solution θi > 0 (i = 1, · · · , m) of the Riccati equation (4.16), then the closed-loop system with system matrix A − Σ_{j=1}^m Bj Rjj^{-1} Bj^T θj is asymptotically stable.
Proof. Construct a Lyapunov candidate function (4.44) for the closed-loop system,

$$V(t, x) = \sum_{i=1}^{m} x^T(t)\, \theta_i\, x(t) \qquad (4.44)$$
Obviously, (4.44) is a positive definite function. Define

$$P_i = Q_i + \sum_{j=1}^{m} \theta_j B_j R_{jj}^{-1} R_{ij} R_{jj}^{-1} B_j^T \theta_j \qquad (i = 1, \cdots, m)$$
then under the above assumptions we have Pi > 0 (i = 1, · · · , m). Considering the Riccati equation (4.16), the time derivative of V(t, x) along the system trajectory (4.9) is V̇(t, x) = −Σ_{i=1}^m x^T(t) Pi x(t). By the positive definiteness of the Pi's, V̇(t, x) is negative definite and the closed-loop system is asymptotically stable.
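As a concrete illustration of Proposition 4.1, the following sketch (Python with NumPy/SciPy assumed; all numerical values are choices for this demo, not data from the dissertation) solves the scalar (n = 1, m = 2) version of the coupled Riccati equations (4.16) and confirms that a positive solution pair yields a stable closed loop.

```python
# Scalar (n = 1, m = 2) sketch of Proposition 4.1 (NumPy/SciPy assumed; the
# numbers are illustrative choices): solve the coupled Riccati equations (4.16)
# and confirm the closed loop is stable when the solutions are positive.
import numpy as np
from scipy.optimize import fsolve

a, b1, b2 = 1.0, 1.0, 1.0
q1, q2 = 3.0, 3.0                          # Q_i > 0
r11, r22, r12, r21 = 1.0, 1.0, 0.0, 0.0    # R_ii > 0, R_ij >= 0

def coupled_are(th):
    th1, th2 = th
    abar = a - b1**2 * th1 / r11 - b2**2 * th2 / r22    # closed-loop "A"
    e1 = 2 * abar * th1 + q1 + b1**2 * th1**2 / r11 + r12 * b2**2 * th2**2 / r22**2
    e2 = 2 * abar * th2 + q2 + b2**2 * th2**2 / r22 + r21 * b1**2 * th1**2 / r11**2
    return [e1, e2]

th1, th2 = fsolve(coupled_are, x0=[1.0, 1.0])
abar = a - b1**2 * th1 / r11 - b2**2 * th2 / r22
assert th1 > 0 and th2 > 0 and abar < 0    # positive solutions, stable closed loop
print(th1, th2, abar)                      # ~1.387, ~1.387, ~-1.775
```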
Remark 4.3. The conditions Qi > 0 (i = 1, · · · , m) in Proposition 4.1 can be relaxed to Qi = Di^T Di ≥ 0 (i = 1, · · · , m) with at least one i such that (Di, A) is detectable. The stability of the closed-loop system can then be guaranteed by an analysis similar to that of Theorem 4.1 in Wonham [89].
4.2.2 Inverse Problem
For game (4.9)-(4.10)/(4.12)-(4.13), given a group of control strategies (4.11)/(4.14) (for the infinite horizon case, (4.11) should be an asymptotically stabilizing control law), find the conditions on the state and control weighting matrices in (4.10)/(4.13) such that (4.11)/(4.14) is a Nash equilibrium for game (4.9)-(4.10)/(4.12)-(4.13). This is the inverse problem. The inverse problem for game (4.12)-(4.13) (similarly for game (4.9)-(4.10)) can be stated as: given system parameter matrices A(t), Bi(t)'s and state feedback gains Ki(t)'s, find real symmetric state and control weighting matrices Qi(t)'s, Rij(t)'s and Ci's such that the given state feedback strategy (4.14) is a Nash equilibrium solution for the constructed performance indices.
1) Infinite Horizon Game Problem
Theorem 4.5. Control strategy group (4.11), under which the closed-loop system with system matrix A − Σ_{i=1}^m Bi Ki is asymptotically stable, is a Nash equilibrium solution of game (4.9)-(4.10) for some Qi's, Rij's with Rii > 0 if and only if they satisfy (4.45)-(4.46):

$$0 = B_i^T \theta_i - R_{ii} K_i \qquad (i = 1, \ldots, m) \qquad (4.45)$$

$$0 = \Big(A - \sum_{j=1}^{m} B_j K_j\Big)^T \theta_i + \theta_i \Big(A - \sum_{j=1}^{m} B_j K_j\Big) + Q_i + \sum_{j=1}^{m} K_j^T R_{ij} K_j \qquad (i = 1, \ldots, m) \qquad (4.46)$$

where the θi's are real symmetric constant matrices.
For proof please refer to that of Theorem 4.3.
Substituting (4.45) into (4.46) (which gives θi Bi Ki = (Bi^T θi)^T Ki = Ki^T Rii Ki), we have

$$0 = \Big(A - \sum_{\substack{j=1 \\ j \neq i}}^{m} B_j K_j\Big)^T \theta_i + \theta_i \Big(A - \sum_{\substack{j=1 \\ j \neq i}}^{m} B_j K_j\Big) + Q_i + \sum_{\substack{j=1 \\ j \neq i}}^{m} K_j^T R_{ij} K_j - K_i^T R_{ii} K_i \qquad (i = 1, \ldots, m) \qquad (4.47)$$

Equations (4.45)-(4.46) are equivalent to (4.46)-(4.47).
Define a block diagonal matrix variable of the unknowns θi ’s, Qi ’s and Rij ’s with Rii >
0 as
0 θi
Yi = diag
, Qi , Ri1 , ... , Rii , ... , Rim
θi 0
P
Pm
which is of dimension 3n + m
m
by
3n
+
j
j=1
j=1 mj .
(i = 1, ..., m)
Define the other two parameter matrices as

$$S = \left[\Big(A - \sum_{j=1}^{m} B_j K_j\Big)^T,\; I,\; I,\; (K_1)^T,\; \ldots,\; (K_i)^T,\; \ldots,\; (K_m)^T\right]^T$$

$$S_i = \left[\Big(A - \sum_{\substack{j=1 \\ j \neq i}}^{m} B_j K_j\Big)^T,\; I,\; I,\; (K_1)^T,\; \ldots,\; \sqrt{-1}\,(K_i)^T,\; \ldots,\; (K_m)^T\right]^T \qquad (i = 1, \ldots, m)$$

which are both of dimension (3n + Σ_{j=1}^m mj) by n. Here I is the identity matrix of dimension n by n. Then (4.46) and (4.47) can be rewritten as
$$S^T Y_i S = 0 \qquad (i = 1, \ldots, m) \qquad (4.48)$$

$$S_i^T Y_i S_i = 0 \qquad (i = 1, \ldots, m) \qquad (4.49)$$

(4.48) and (4.49) are algebraic linear matrix equations in Yi.
Define the set of solutions to (4.48) as Ỹi = {Yi : S^T Yi S = 0} and the set of solutions to (4.49) as Ȳi = {Yi : Si^T Yi Si = 0}. Then for each player, the solution set to (4.48) and (4.49) is

$$Y_i = \tilde Y_i \cap \bar Y_i \qquad (4.50)$$

For player i, denote the set of all Qi and Rij which satisfy (4.50) as Γi. For game (4.9)-(4.10), the group of strategies in (4.11) is a Nash equilibrium strategy if and only if Γ = Γ1 × · · · × Γm is not empty.
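To make (4.48)-(4.49) concrete, the sketch below (Python with NumPy/SciPy assumed) assembles Y1, S and S1 for one consistent second-order instance (the data of the example in Section 4.5, with θ1 and Q1 as constructed there; all values are assumptions for this demo) and verifies that both linear matrix equations vanish. Note the factor √−1 on player i's gain in Si, which produces the −Ki^T Rii Ki term of (4.47).

```python
# Sketch (NumPy/SciPy assumed; data consistent with the Section 4.5 example)
# of the linear matrix equations (4.48)-(4.49) for player 1.
import numpy as np
from scipy.linalg import block_diag

A  = np.array([[1.0, -1.0], [2.0, 4.0]])
B1 = np.array([[1.0], [0.0]]); B2 = np.array([[0.0], [1.0]])
K1 = np.array([[2.0, -1.0]]); K2 = np.array([[2.0, 6.0]])
theta1 = np.array([[2.0, -1.0], [-1.0, 1.75]])       # solves (4.45)-(4.46)
Q1 = np.array([[0.0, -1.0], [-1.0, 6.0]])
R11, R12 = np.array([[1.0]]), np.array([[0.0]])      # R_11 > 0, R_12 chosen as 0

n = 2
Z = np.zeros((n, n))
Y1 = block_diag(np.block([[Z, theta1], [theta1, Z]]), Q1, R11, R12)   # 8 x 8

Abar  = A - B1 @ K1 - B2 @ K2              # sum over all j
Abar1 = A - B2 @ K2                        # sum excluding j = i = 1
I = np.eye(n)
S  = np.vstack([Abar,  I, I, K1, K2])                 # (3n + m1 + m2) x n
S1 = np.vstack([Abar1, I, I, 1j * K1, K2])            # sqrt(-1) on player 1's gain

assert np.allclose(S.T  @ Y1 @ S,  0)      # (4.48), i.e. (4.46)
assert np.allclose(S1.T @ Y1 @ S1, 0)      # (4.49), i.e. (4.47)
```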
2) Finite Horizon Game Problem
Theorem 4.6. Control strategy group (4.14) is a Nash equilibrium solution of game (4.12)-(4.13) for some Qi(t)'s, Rij(t)'s with Rii(t) > 0 if and only if they satisfy (4.51)-(4.52):

$$0 = B_i^T(t)\theta_i(t) - R_{ii}(t)K_i(t) \qquad (t \in [t_0, t_f];\; i = 1, \ldots, m) \qquad (4.51)$$

$$-\dot\theta_i = \Big(A - \sum_{j=1}^{m} B_j K_j\Big)^T \theta_i + \theta_i \Big(A - \sum_{j=1}^{m} B_j K_j\Big) + Q_i + \sum_{j=1}^{m} K_j^T R_{ij} K_j, \qquad \theta_i(t_f) = C_i \qquad (t \in [t_0, t_f];\; i = 1, \ldots, m) \qquad (4.52)$$

where the θi(t)'s are real symmetric time varying matrices.
For proof please refer to that of Theorem 4.3.
Substituting the terminal condition in (4.52) into (4.51), we get Bi^T(tf) Ci = Rii(tf) Ki(tf) for i = 1, · · · , m. Because Rii(tf) (i = 1, · · · , m) must be positive definite matrices, for the given Bi(tf) and Ki(tf), in order to find Ci we must have

$$\operatorname{rank}(B_i(t_f)) = \operatorname{rank}(B_i(t_f), K_i(t_f)) \qquad (i = 1, \ldots, m) \qquad (4.53)$$
Remark 4.4. 1) For each player i (i = 1, · · · , m), there are mi × n + n(n + 1)/2 equations and n(n + 1) + Σ_{j=1}^m mj(mj + 1)/2 variables in (4.45)-(4.46). Similarly, for each player i (i = 1, · · · , m), the numbers of equations and variables in (4.51)-(4.52) are respectively mi × n + n(n + 1) and 3n(n + 1)/2 + Σ_{j=1}^m mj(mj + 1)/2. Because mi ≤ n, the number of variables is always greater than the number of equations. So for a given system (4.9)/(4.12) and control strategy group (4.11)/(4.14), there may be infinitely many Qi's, Rij's and/or Ci's which satisfy the conditions (4.45)-(4.46)/(4.51)-(4.52).
2) Choose arbitrary positive definite Rii's first. Then from (4.45) and (4.51), we can determine some of the entries of the θi's (and also some of the entries of the Ci's for game (4.12)-(4.13)). Substituting these entries into (4.46) and (4.52), for arbitrarily selected Rij's and arbitrarily chosen undetermined entries of the θi's (whose terminal values in the Ci's are chosen correspondingly), the Qi's can always be determined by (4.54) for game (4.9)-(4.10) and (4.55) for game (4.12)-(4.13).

$$Q_i = -\Big(A - \sum_{j=1}^{m} B_j K_j\Big)^T \theta_i - \theta_i \Big(A - \sum_{j=1}^{m} B_j K_j\Big) - \sum_{j=1}^{m} K_j^T R_{ij} K_j \qquad (i = 1, \ldots, m) \qquad (4.54)$$

$$Q_i = -\dot\theta_i - \Big(A - \sum_{j=1}^{m} B_j K_j\Big)^T \theta_i - \theta_i \Big(A - \sum_{j=1}^{m} B_j K_j\Big) - \sum_{j=1}^{m} K_j^T R_{ij} K_j \qquad (i = 1, \ldots, m) \qquad (4.55)$$
3) Note that in (4.45)-(4.46) and (4.51)-(4.52), the Qi's and Rij's (i ≠ j) may be indefinite, whereas an indefinite Q is rarely physically motivated in general optimal control problems.

4) If the resultant Qi's and Rij's (and also Ci's for game (4.12)-(4.13)) are the same for all the players, then the non-cooperative game problem is actually a team problem.

5) If m = 2 and the resultant state and control weighting matrices satisfy Q1 = −Q2, R11 = −R21 and R12 = −R22 (and also C1 = −C2 for game (4.12)-(4.13)), then the two-player non-cooperative game is actually a zero-sum game.
Although (4.45)-(4.46) and (4.51)-(4.52) provide the conditions on the parameter matrices in (4.10)/(4.13), as Kalman pointed out in [39], explicit conditions on the Qi's, Rij's and/or Ci's can be found only for low-dimensional systems. In Section 4.3, we apply the above process to a second-order two-player game problem where a full state feedback Nash equilibrium solution is desired.
4.3 Inverse Problem for 2nd-order 2-player Nonzero-sum LQ Game

4.3.1 2nd-order 2-player Nonzero-sum Game with Infinite Horizon
A second order 2-player LQ game is described by (4.9)-(4.10) with n = 2, m = 2, m1 = m2 = 1 and

$$A = \begin{pmatrix} a_1 & a_2 \\ a_3 & a_4 \end{pmatrix} \;(a_2 \neq 0,\; a_3 \neq 0), \qquad B_1 = \begin{pmatrix} b_1 \\ 0 \end{pmatrix} \;(b_1 \neq 0), \qquad B_2 = \begin{pmatrix} 0 \\ b_2 \end{pmatrix} \;(b_2 \neq 0),$$

$$Q_i = \begin{pmatrix} q_{i1} & q_{i3} \\ q_{i3} & q_{i2} \end{pmatrix}, \qquad R_{11} = R_{22} = 1, \qquad R_{12} = r_{12}, \qquad R_{21} = r_{21},$$

i.e.

$$\dot x_1 = a_1 x_1 + a_2 x_2 + b_1 u_1 \quad (x_1(0) = x_{10}), \qquad \dot x_2 = a_3 x_1 + a_4 x_2 + b_2 u_2 \quad (x_2(0) = x_{20}) \qquad (4.56)$$

$$J_i(x_0, u_1, u_2) = \int_0^\infty \big(x^T Q_i x + u_i^2 + r_{ij} u_j^2\big)\, dt \qquad (i, j = 1, 2;\; i \neq j) \qquad (4.57)$$
(A, B1) and (A, B2) are both controllable. Each player applies a constant linear full state feedback control strategy (4.58):

$$u_i(t) = -K_i x(t) = -k_{i1} x_1(t) - k_{i2} x_2(t) \qquad (i = 1, 2) \qquad (4.58)$$

The resulting closed-loop system is (4.59):

$$\dot x_1 = (a_1 - b_1 k_{11}) x_1 + (a_2 - b_1 k_{12}) x_2 \quad (x_1(0) = x_{10}), \qquad \dot x_2 = (a_3 - b_2 k_{21}) x_1 + (a_4 - b_2 k_{22}) x_2 \quad (x_2(0) = x_{20}) \qquad (4.59)$$

Assume that under (4.58) the closed-loop system (4.59) is asymptotically stable. By this assumption, we know that

$$(a_1 - b_1 k_{11}) + (a_4 - b_2 k_{22}) < 0 \qquad (4.60)$$

$$(a_1 - b_1 k_{11})(a_4 - b_2 k_{22}) - (a_2 - b_1 k_{12})(a_3 - b_2 k_{21}) > 0 \qquad (4.61)$$
This implies that (a1 − b1k11) and (a4 − b2k22) cannot be zero simultaneously; if a1 − b1k11 = 0 or a4 − b2k22 = 0, then we must have a2 − b1k12 ≠ 0 and a3 − b2k21 ≠ 0; if a2 − b1k12 = 0 and/or a3 − b2k21 = 0, then we must have a1 − b1k11 ≠ 0 and a4 − b2k22 ≠ 0.
We need to find conditions on Q1 , Q2 , r12 and r21 such that (4.58) is a Nash equilibrium
solution. Denote the two symmetric constant matrices θi (i = 1, 2) in (4.45)-(4.46) for game (4.56)-(4.57) by

$$\theta_1 = \begin{pmatrix} \phi_1 & \phi_3 \\ \phi_3 & \phi_2 \end{pmatrix}, \qquad \theta_2 = \begin{pmatrix} \psi_1 & \psi_3 \\ \psi_3 & \psi_2 \end{pmatrix}$$

Theorem 4.7. The asymptotically stabilizing control strategy pair (4.58) is a Nash equilibrium solution for game (4.56)-(4.57) if and only if the weighting matrices in (4.57) satisfy conditions (4.62)-(4.63). Considering (4.60)-(4.61), Table 4.1 lists all the cases for (4.62)-(4.63).
$$\left\{\begin{aligned}
q_{11} &= -2\,[a_1 k_{11}/b_1 + (a_3 - b_2 k_{21})\,k_{12}/b_1] + k_{11}^2 - r_{12} k_{21}^2 \\
q_{12} &= -2\,[a_2 k_{12}/b_1 + (a_4 - b_2 k_{22})\,\alpha] + k_{12}^2 - r_{12} k_{22}^2 \\
q_{13} &= -(a_1 k_{12} + a_2 k_{11})/b_1 - (a_3 - b_2 k_{21})\,\alpha - (a_4 - b_2 k_{22})\,k_{12}/b_1 + k_{11} k_{12} - r_{12} k_{21} k_{22} \\
&\qquad \forall \alpha \in \mathbb{R}
\end{aligned}\right. \qquad (4.62)$$

$$\left\{\begin{aligned}
q_{21} &= -2\,[(a_1 - b_1 k_{11})\,\beta + a_3 k_{21}/b_2] - r_{21} k_{11}^2 + k_{21}^2 \\
q_{22} &= -2\,[(a_2 - b_1 k_{12})\,k_{21}/b_2 + a_4 k_{22}/b_2] - r_{21} k_{12}^2 + k_{22}^2 \\
q_{23} &= -(a_1 - b_1 k_{11})\,k_{21}/b_2 - (a_3 k_{22} + a_4 k_{21})/b_2 - (a_2 - b_1 k_{12})\,\beta - r_{21} k_{11} k_{12} + k_{21} k_{22} \\
&\qquad \forall \beta \in \mathbb{R}
\end{aligned}\right. \qquad (4.63)$$

$$\left\{\begin{aligned}
q_{11} &= -2\,[a_1 k_{11}/b_1 + (a_3 - b_2 k_{21})\,k_{12}/b_1] + k_{11}^2 - r_{12} k_{21}^2 \\
q_{12} &= -2 a_2 k_{12}/b_1 + k_{12}^2 - r_{12} k_{22}^2 \\
q_{13} &= -(a_1 k_{12} + a_2 k_{11})/b_1 - (a_4 - b_2 k_{22})\,k_{12}/b_1 + k_{11} k_{12} - r_{12} k_{21} k_{22}
\end{aligned}\right. \qquad (4.64)$$

$$\left\{\begin{aligned}
q_{11} &= -2\,[a_1 k_{11}/b_1 + (a_3 - b_2 k_{21})\,k_{12}/b_1] + k_{11}^2 - r_{12} k_{21}^2 \\
q_{13} &= -(a_1 k_{12} + a_2 k_{11})/b_1 - (a_4 - b_2 k_{22})\,k_{12}/b_1 + \frac{(a_3 - b_2 k_{21})\,(2 a_2 k_{12}/b_1 + q_{12} - k_{12}^2 + r_{12} k_{22}^2)}{2\,(a_4 - b_2 k_{22})} + k_{11} k_{12} - r_{12} k_{21} k_{22} \\
&\qquad \forall q_{12} \in \mathbb{R}
\end{aligned}\right. \qquad (4.65)$$
| Cases | Q1, r12 | Q2, r21 |
| a1 − b1k11 = 0, a2 − b1k12 ≠ 0, a3 − b2k21 ≠ 0, a4 − b2k22 ≠ 0 | (4.64) or (4.65) | (4.70) |
| a4 − b2k22 = 0, a1 − b1k11 ≠ 0, a2 − b1k12 ≠ 0, a3 − b2k21 ≠ 0 | (4.66) | (4.68) or (4.69) |
| a2 − b1k12 = 0, a3 − b2k21 ≠ 0, a1 − b1k11 ≠ 0, a4 − b2k22 ≠ 0 | (4.64) or (4.65) | (4.71) |
| a3 − b2k21 = 0, a2 − b1k12 ≠ 0, a1 − b1k11 ≠ 0, a4 − b2k22 ≠ 0 | (4.67) | (4.68) or (4.70) |
| a2 − b1k12 = 0, a3 − b2k21 = 0, a1 − b1k11 ≠ 0, a4 − b2k22 ≠ 0 | (4.67) | (4.71) |
| a1 − b1k11 ≠ 0, a2 − b1k12 ≠ 0, a3 − b2k21 ≠ 0, a4 − b2k22 ≠ 0 | (4.64) or (4.65) | (4.68) or (4.69) |

Table 4.1: Conditions on the weighting matrices
$$\left\{\begin{aligned}
q_{11} &= -2\,[a_1 k_{11}/b_1 + (a_3 - b_2 k_{21})\,k_{12}/b_1] + k_{11}^2 - r_{12} k_{21}^2 \\
q_{12} &= -2 a_2 k_{12}/b_1 + k_{12}^2 - r_{12} k_{22}^2 \\
&\qquad \forall q_{13} \in \mathbb{R}
\end{aligned}\right. \qquad (4.66)$$

$$\left\{\begin{aligned}
q_{11} &= -2 a_1 k_{11}/b_1 + k_{11}^2 - r_{12} k_{21}^2 \\
q_{13} &= -(a_1 k_{12} + a_2 k_{11})/b_1 - (a_4 - b_2 k_{22})\,k_{12}/b_1 + k_{11} k_{12} - r_{12} k_{21} k_{22} \\
&\qquad \forall q_{12} \in \mathbb{R}
\end{aligned}\right. \qquad (4.67)$$

$$\left\{\begin{aligned}
q_{21} &= -2 a_3 k_{21}/b_2 - r_{21} k_{11}^2 + k_{21}^2 \\
q_{22} &= -2\,[(a_2 - b_1 k_{12})\,k_{21}/b_2 + a_4 k_{22}/b_2] - r_{21} k_{12}^2 + k_{22}^2 \\
q_{23} &= -(a_1 - b_1 k_{11})\,k_{21}/b_2 - (a_3 k_{22} + a_4 k_{21})/b_2 - r_{21} k_{11} k_{12} + k_{21} k_{22}
\end{aligned}\right. \qquad (4.68)$$

$$\left\{\begin{aligned}
q_{22} &= -2\,[(a_2 - b_1 k_{12})\,k_{21}/b_2 + a_4 k_{22}/b_2] - r_{21} k_{12}^2 + k_{22}^2 \\
q_{23} &= -(a_1 - b_1 k_{11})\,k_{21}/b_2 - (a_3 k_{22} + a_4 k_{21})/b_2 + \frac{(a_2 - b_1 k_{12})\,(2 a_3 k_{21}/b_2 + q_{21} + r_{21} k_{11}^2 - k_{21}^2)}{2\,(a_1 - b_1 k_{11})} - r_{21} k_{11} k_{12} + k_{21} k_{22} \\
&\qquad \forall q_{21} \in \mathbb{R}
\end{aligned}\right. \qquad (4.69)$$

$$\left\{\begin{aligned}
q_{21} &= -2 a_3 k_{21}/b_2 - r_{21} k_{11}^2 + k_{21}^2 \\
q_{22} &= -2\,[(a_2 - b_1 k_{12})\,k_{21}/b_2 + a_4 k_{22}/b_2] - r_{21} k_{12}^2 + k_{22}^2 \\
&\qquad \forall q_{23} \in \mathbb{R}
\end{aligned}\right. \qquad (4.70)$$

$$\left\{\begin{aligned}
q_{22} &= -2 a_4 k_{22}/b_2 - r_{21} k_{12}^2 + k_{22}^2 \\
q_{23} &= -(a_1 - b_1 k_{11})\,k_{21}/b_2 - (a_3 k_{22} + a_4 k_{21})/b_2 - r_{21} k_{11} k_{12} + k_{21} k_{22} \\
&\qquad \forall q_{21} \in \mathbb{R}
\end{aligned}\right. \qquad (4.71)$$
Proof. By Theorem 4.5, after substituting B1, B2, R11 = R22 = 1 and (4.58) into (4.45), we get (b1φ1, b1φ3) = (k11, k12) and (b2ψ3, b2ψ2) = (k21, k22). This means that θi (i = 1, 2) must be of the form

$$\theta_1 = \begin{pmatrix} k_{11}/b_1 & k_{12}/b_1 \\ k_{12}/b_1 & \phi_2 \end{pmatrix}, \qquad \theta_2 = \begin{pmatrix} \psi_1 & k_{21}/b_2 \\ k_{21}/b_2 & k_{22}/b_2 \end{pmatrix} \qquad (4.72)$$

Note that bi ≠ 0 (i = 1, 2). In (4.72), the undetermined entries φ2 and ψ1 represent the design freedom. By (4.72), (4.46) for game (4.56)-(4.57) becomes (4.73)-(4.74)
$$\left\{\begin{aligned}
&2\,[(a_1 - b_1 k_{11})\,k_{11}/b_1 + (a_3 - b_2 k_{21})\,k_{12}/b_1] + q_{11} + k_{11}^2 + r_{12} k_{21}^2 = 0 \\
&(a_1 - b_1 k_{11})\,k_{12}/b_1 + (a_3 - b_2 k_{21})\,\phi_2 + (a_2 - b_1 k_{12})\,k_{11}/b_1 + (a_4 - b_2 k_{22})\,k_{12}/b_1 + q_{13} + k_{11} k_{12} + r_{12} k_{21} k_{22} = 0 \\
&2\,[(a_2 - b_1 k_{12})\,k_{12}/b_1 + (a_4 - b_2 k_{22})\,\phi_2] + q_{12} + k_{12}^2 + r_{12} k_{22}^2 = 0
\end{aligned}\right. \qquad (4.73)$$

$$\left\{\begin{aligned}
&2\,[(a_1 - b_1 k_{11})\,\psi_1 + (a_3 - b_2 k_{21})\,k_{21}/b_2] + q_{21} + r_{21} k_{11}^2 + k_{21}^2 = 0 \\
&(a_1 - b_1 k_{11})\,k_{21}/b_2 + (a_3 - b_2 k_{21})\,k_{22}/b_2 + (a_2 - b_1 k_{12})\,\psi_1 + (a_4 - b_2 k_{22})\,k_{21}/b_2 + q_{23} + r_{21} k_{11} k_{12} + k_{21} k_{22} = 0 \\
&2\,[(a_2 - b_1 k_{12})\,k_{21}/b_2 + (a_4 - b_2 k_{22})\,k_{22}/b_2] + q_{22} + r_{21} k_{12}^2 + k_{22}^2 = 0
\end{aligned}\right. \qquad (4.74)$$
Rearranging (4.73)-(4.74), we get (4.62)-(4.63), which explicitly give the expressions of Q1 and Q2. Considering (4.60)-(4.61), which guarantee the asymptotic stability of the closed-loop system, we obtain Table 4.1, which lists all the specific situations for (4.62)-(4.63). For example, (4.64) and (4.68) are obtained by setting the arbitrary variables in (4.62)-(4.63) to zero, i.e. α = β = 0. All the other conditions can be derived correspondingly.
4.3.2 2nd-order 2-player Nonzero-sum Game with Finite Horizon

The time varying version of game (4.56)-(4.57), a 2nd-order 2-player nonzero-sum game with finite horizon, is described by (4.75)-(4.76).

$$\dot x_1 = a_1(t) x_1 + a_2(t) x_2 + b_1(t) u_1 \quad (x_1(t_0) = x_{10},\; b_1(t) \neq 0,\; \forall t \in [t_0, t_f]), \qquad \dot x_2 = a_3(t) x_1 + a_4(t) x_2 + b_2(t) u_2 \quad (x_2(t_0) = x_{20},\; b_2(t) \neq 0,\; \forall t \in [t_0, t_f]) \qquad (4.75)$$

$$J_i(x_0, u_1, u_2) = x^T(t_f) C_i x(t_f) + \int_{t_0}^{t_f} \big(x^T Q_i(t) x + u_i^2 + r_{ij}(t) u_j^2\big)\, dt \qquad (i, j = 1, 2;\; i \neq j) \qquad (4.76)$$

Compared with (4.57), the extra terminal state weighting matrix in (4.76) is

$$C_i = \begin{pmatrix} c_{i1} & c_{i3} \\ c_{i3} & c_{i2} \end{pmatrix} \qquad (i = 1, 2)$$

Each player applies a linear time varying full state feedback strategy (4.77).

$$u_i(t) = -K_i(t) x(t) = -k_{i1}(t) x_1(t) - k_{i2}(t) x_2(t) \qquad (i = 1, 2) \qquad (4.77)$$

Suppose that in (4.75)-(4.76) and (4.77) all the time functions are continuous, and in particular bi(t) and kij(t) (i, j = 1, 2) are continuously differentiable.
Theorem 4.8. The linear time varying full state feedback strategy (4.77) is a Nash equilibrium solution of game (4.75)-(4.76) if and only if the conditions (4.78)-(4.79) on Q1(t), Q2(t), C1, C2, r12(t) and r21(t) are satisfied.
$$\left\{\begin{aligned}
q_{11} &= (\dot b_1 k_{11} - b_1 \dot k_{11})/b_1^2 - 2\,[a_1 k_{11}/b_1 + (a_3 - b_2 k_{21})\,k_{12}/b_1] + k_{11}^2 - r_{12} k_{21}^2 \\
q_{12} &= -\dot\alpha - 2\,[a_2 k_{12}/b_1 + (a_4 - b_2 k_{22})\,\alpha] + k_{12}^2 - r_{12} k_{22}^2 \\
q_{13} &= (\dot b_1 k_{12} - b_1 \dot k_{12})/b_1^2 - (a_1 k_{12} + a_2 k_{11})/b_1 - (a_3 - b_2 k_{21})\,\alpha - (a_4 - b_2 k_{22})\,k_{12}/b_1 + k_{11} k_{12} - r_{12} k_{21} k_{22} \\
c_{11} &= k_{11}(t_f)/b_1(t_f), \qquad c_{13} = k_{12}(t_f)/b_1(t_f) \\
&\qquad \forall c_{12} \in \mathbb{R};\; \forall \alpha(t): [t_0, t_f] \to \mathbb{R},\; \alpha \in C^1,\; \alpha(t_f) = c_{12} \qquad (\forall t \in [t_0, t_f])
\end{aligned}\right. \qquad (4.78)$$

$$\left\{\begin{aligned}
q_{21} &= -\dot\beta - 2\,[(a_1 - b_1 k_{11})\,\beta + a_3 k_{21}/b_2] - r_{21} k_{11}^2 + k_{21}^2 \\
q_{22} &= (\dot b_2 k_{22} - b_2 \dot k_{22})/b_2^2 - 2\,[(a_2 - b_1 k_{12})\,k_{21}/b_2 + a_4 k_{22}/b_2] - r_{21} k_{12}^2 + k_{22}^2 \\
q_{23} &= (\dot b_2 k_{21} - b_2 \dot k_{21})/b_2^2 - (a_1 - b_1 k_{11})\,k_{21}/b_2 - (a_3 k_{22} + a_4 k_{21})/b_2 - (a_2 - b_1 k_{12})\,\beta - r_{21} k_{11} k_{12} + k_{21} k_{22} \\
c_{23} &= k_{21}(t_f)/b_2(t_f), \qquad c_{22} = k_{22}(t_f)/b_2(t_f) \\
&\qquad \forall c_{21} \in \mathbb{R};\; \forall \beta(t): [t_0, t_f] \to \mathbb{R},\; \beta \in C^1,\; \beta(t_f) = c_{21} \qquad (\forall t \in [t_0, t_f])
\end{aligned}\right. \qquad (4.79)$$
Proof. Denote the two time varying symmetric matrices in (4.51)-(4.52) for game (4.75)-(4.76) by

$$\theta_1(t) = \begin{pmatrix} \phi_1(t) & \phi_3(t) \\ \phi_3(t) & \phi_2(t) \end{pmatrix}, \qquad \theta_2(t) = \begin{pmatrix} \psi_1(t) & \psi_3(t) \\ \psi_3(t) & \psi_2(t) \end{pmatrix}$$

By (4.51), we obtain (b1(t)φ1(t), b1(t)φ3(t)) = (k11(t), k12(t)) and (b2(t)ψ3(t), b2(t)ψ2(t)) = (k21(t), k22(t)). Because bi(t) ≠ 0 (∀t ∈ [t0, tf]; i = 1, 2), we have

$$\theta_1(t) = \begin{pmatrix} k_{11}(t)/b_1(t) & k_{12}(t)/b_1(t) \\ k_{12}(t)/b_1(t) & \phi_2(t) \end{pmatrix}, \qquad \theta_2(t) = \begin{pmatrix} \psi_1(t) & k_{21}(t)/b_2(t) \\ k_{21}(t)/b_2(t) & k_{22}(t)/b_2(t) \end{pmatrix} \qquad (4.80)$$

Then (4.52) for game (4.75)-(4.76) becomes (4.81)-(4.82), with the time variable t omitted for simplicity of notation.
$$\left\{\begin{aligned}
(\dot b_1 k_{11} - b_1 \dot k_{11})/b_1^2 &= 2\,[a_1 k_{11}/b_1 + (a_3 - b_2 k_{21})\,k_{12}/b_1] + q_{11} - k_{11}^2 + r_{12} k_{21}^2 \\
(\dot b_1 k_{12} - b_1 \dot k_{12})/b_1^2 &= (a_1 k_{12} + a_2 k_{11})/b_1 + (a_3 - b_2 k_{21})\,\phi_2 + (a_4 - b_2 k_{22})\,k_{12}/b_1 + q_{13} - k_{11} k_{12} + r_{12} k_{21} k_{22} \\
-\dot\phi_2 &= 2\,[a_2 k_{12}/b_1 + (a_4 - b_2 k_{22})\,\phi_2] + q_{12} - k_{12}^2 + r_{12} k_{22}^2 \\
c_{11} &= k_{11}(t_f)/b_1(t_f), \qquad c_{13} = k_{12}(t_f)/b_1(t_f), \qquad c_{12} = \phi_2(t_f) \qquad (\forall t \in [t_0, t_f])
\end{aligned}\right. \qquad (4.81)$$

$$\left\{\begin{aligned}
-\dot\psi_1 &= 2\,[(a_1 - b_1 k_{11})\,\psi_1 + a_3 k_{21}/b_2] + q_{21} + r_{21} k_{11}^2 - k_{21}^2 \\
(\dot b_2 k_{21} - b_2 \dot k_{21})/b_2^2 &= (a_1 - b_1 k_{11})\,k_{21}/b_2 + (a_3 k_{22} + a_4 k_{21})/b_2 + (a_2 - b_1 k_{12})\,\psi_1 + q_{23} + r_{21} k_{11} k_{12} - k_{21} k_{22} \\
(\dot b_2 k_{22} - b_2 \dot k_{22})/b_2^2 &= 2\,[(a_2 - b_1 k_{12})\,k_{21}/b_2 + a_4 k_{22}/b_2] + q_{22} + r_{21} k_{12}^2 - k_{22}^2 \\
c_{21} &= \psi_1(t_f), \qquad c_{23} = k_{21}(t_f)/b_2(t_f), \qquad c_{22} = k_{22}(t_f)/b_2(t_f) \qquad (\forall t \in [t_0, t_f])
\end{aligned}\right. \qquad (4.82)$$
Rearranging (4.81)-(4.82), we obtain (4.78)-(4.79). Notice that with different choices
of the arbitrary time functions α(t) and β(t) and the arbitrary numbers c12 and c21 , we will
get different specific conditions on the weighting matrices. For example, when we choose
α(t) = c12 and β(t) = c21 (∀t ∈ [t0 , tf ]), we get (4.83)-(4.84)
$$\left\{\begin{aligned}
q_{11} &= (\dot b_1 k_{11} - b_1 \dot k_{11})/b_1^2 - 2\,[a_1 k_{11}/b_1 + (a_3 - b_2 k_{21})\,k_{12}/b_1] + k_{11}^2 - r_{12} k_{21}^2 \\
q_{12} &= -2\,[a_2 k_{12}/b_1 + (a_4 - b_2 k_{22})\,c_{12}] + k_{12}^2 - r_{12} k_{22}^2 \\
q_{13} &= (\dot b_1 k_{12} - b_1 \dot k_{12})/b_1^2 - (a_1 k_{12} + a_2 k_{11})/b_1 - (a_3 - b_2 k_{21})\,c_{12} - (a_4 - b_2 k_{22})\,k_{12}/b_1 + k_{11} k_{12} - r_{12} k_{21} k_{22} \\
c_{11} &= k_{11}(t_f)/b_1(t_f), \qquad c_{13} = k_{12}(t_f)/b_1(t_f) \\
&\qquad \forall c_{12} \in \mathbb{R};\; \forall t \in [t_0, t_f]
\end{aligned}\right. \qquad (4.83)$$

$$\left\{\begin{aligned}
q_{21} &= -2\,[(a_1 - b_1 k_{11})\,c_{21} + a_3 k_{21}/b_2] - r_{21} k_{11}^2 + k_{21}^2 \\
q_{22} &= (\dot b_2 k_{22} - b_2 \dot k_{22})/b_2^2 - 2\,[(a_2 - b_1 k_{12})\,k_{21}/b_2 + a_4 k_{22}/b_2] - r_{21} k_{12}^2 + k_{22}^2 \\
q_{23} &= (\dot b_2 k_{21} - b_2 \dot k_{21})/b_2^2 - (a_1 - b_1 k_{11})\,k_{21}/b_2 - (a_3 k_{22} + a_4 k_{21})/b_2 - (a_2 - b_1 k_{12})\,c_{21} - r_{21} k_{11} k_{12} + k_{21} k_{22} \\
c_{23} &= k_{21}(t_f)/b_2(t_f), \qquad c_{22} = k_{22}(t_f)/b_2(t_f) \\
&\qquad \forall c_{21} \in \mathbb{R};\; \forall t \in [t_0, t_f]
\end{aligned}\right. \qquad (4.84)$$
Remark 4.5. 1) For the inverse problems of games (4.56)-(4.57) and (4.75)-(4.76), there will be infinitely many solutions. The reasons are: a) (4.62)-(4.63) and (4.78)-(4.79) only describe the relationship between the weighting matrices in the performance indices; b) the different choices of the design freedom, i.e. the arbitrary real numbers α and β in (4.62)-(4.63) and the arbitrary real numbers c12 and c21 and time functions α(t) and β(t) in (4.78)-(4.79), are another source of non-uniqueness.

2) If we restrict the weighting matrices in the integral of (4.76) to be time-invariant, then for the time varying state feedback strategy pair (4.77) there may be no solution to the inverse problem for game (4.75)-(4.76).
4.4 Extension to 2n-order LQ Games

4.4.1 2n-order LQ Game with Infinite Horizon

A 2n-order (n ≥ 1) two-player linear-quadratic game with infinite horizon is described by (4.85)-(4.86)

$$\dot X_1 = A_1 X_1 + A_2 X_2 + \bar B_1 U_1 \quad (X_1(0) = X_{10}), \qquad \dot X_2 = A_3 X_1 + A_4 X_2 + \bar B_2 U_2 \quad (X_2(0) = X_{20}) \qquad (4.85)$$

$$J_i(X_0, U_1, U_2) = \int_0^\infty \big(X^T Q_i X + U_i^T U_i + U_j^T R_{ij} U_j\big)\, dt \qquad (i, j = 1, 2;\; i \neq j), \qquad Q_i = \begin{pmatrix} Q_{i1} & Q_{i3} \\ Q_{i3} & Q_{i2} \end{pmatrix} \quad (i = 1, 2) \qquad (4.86)$$

where the system states Xi ∈ R^n (i = 1, 2) and X(t) = (X1^T, X2^T)^T. Each player's control Ui ∈ Ūi (i = 1, 2), where Ūi = {U: R≥0 → R^n}. Ai, B̄j, Qjk and Rlj (i = 1, 2, 3, 4; k = 1, 2, 3; j, l = 1, 2; j ≠ l) are all n × n diagonal matrices, and B̄1 and B̄2 are nonsingular. Player i (i = 1, 2) uses the linear time-invariant stabilizing full state feedback control strategy (4.87), in which Ki1 and Ki2 (i = 1, 2) are n × n diagonal matrices.

$$U_i = -\bar K_i X = -K_{i1} X_1 - K_{i2} X_2 \qquad (i = 1, 2) \qquad (4.87)$$
Corollary 4.1. For the 2n-order (n ≥ 1) two-player linear-quadratic game (4.85)-(4.86), (4.87) will be a Nash equilibrium solution if and only if the weighting matrices in (4.86) satisfy conditions (4.88)-(4.89).

$$\left\{\begin{aligned}
Q_{11} &= -2\,[A_1 K_{11} \bar B_1^{-1} + (A_3 - \bar B_2 K_{21}) K_{12} \bar B_1^{-1}] + K_{11}^2 - R_{12} K_{21}^2 \\
Q_{12} &= -2\,[A_2 K_{12} \bar B_1^{-1} + (A_4 - \bar B_2 K_{22})\,\Lambda] + K_{12}^2 - R_{12} K_{22}^2 \\
Q_{13} &= -(A_1 K_{12} + A_2 K_{11}) \bar B_1^{-1} - (A_3 - \bar B_2 K_{21})\,\Lambda - (A_4 - \bar B_2 K_{22}) K_{12} \bar B_1^{-1} + K_{11} K_{12} - R_{12} K_{21} K_{22} \\
&\qquad \text{for any diagonal matrix } \Lambda \in \mathbb{R}^{n \times n}
\end{aligned}\right. \qquad (4.88)$$

$$\left\{\begin{aligned}
Q_{21} &= -2\,[(A_1 - \bar B_1 K_{11})\,\mathrm{T} + A_3 K_{21} \bar B_2^{-1}] - R_{21} K_{11}^2 + K_{21}^2 \\
Q_{22} &= -2\,[(A_2 - \bar B_1 K_{12}) K_{21} \bar B_2^{-1} + A_4 K_{22} \bar B_2^{-1}] - R_{21} K_{12}^2 + K_{22}^2 \\
Q_{23} &= -(A_1 - \bar B_1 K_{11}) K_{21} \bar B_2^{-1} - (A_3 K_{22} + A_4 K_{21}) \bar B_2^{-1} - (A_2 - \bar B_1 K_{12})\,\mathrm{T} - R_{21} K_{11} K_{12} + K_{21} K_{22} \\
&\qquad \text{for any diagonal matrix } \mathrm{T} \in \mathbb{R}^{n \times n}
\end{aligned}\right. \qquad (4.89)$$
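Because every matrix in (4.88)-(4.89) is diagonal, the conditions decouple into n copies of the scalar conditions (4.62)-(4.63) along the diagonals. A minimal numerical sketch of this reduction follows (Python with NumPy assumed; the diagonal entries are arbitrary choices for the demo, not data from the dissertation).

```python
# Sketch (NumPy assumed; diagonal data chosen arbitrarily): the Q11 block of
# (4.88) computed with matrix algebra equals the scalar q11 formula of (4.62)
# applied entrywise along the diagonals.
import numpy as np

rng = np.random.default_rng(0)
n = 3
d = lambda: np.diag(rng.uniform(0.5, 2.0, n))       # random diagonal matrix
A1, A3, B1b, B2b, K11, K12, K21, R12 = [d() for _ in range(8)]

B1inv = np.linalg.inv(B1b)
Q11 = -2*(A1 @ K11 @ B1inv + (A3 - B2b @ K21) @ K12 @ B1inv) \
      + K11 @ K11 - R12 @ K21 @ K21                 # first line of (4.88)

# entrywise scalar version of q11 in (4.62) along the diagonal
a1, a3, b1, b2 = np.diag(A1), np.diag(A3), np.diag(B1b), np.diag(B2b)
k11, k12, k21, r12 = np.diag(K11), np.diag(K12), np.diag(K21), np.diag(R12)
q11 = -2*(a1*k11/b1 + (a3 - b2*k21)*k12/b1) + k11**2 - r12*k21**2

assert np.allclose(np.diag(Q11), q11)
```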
4.4.2 2n-order LQ Game with Finite Horizon

A 2n-order (n ≥ 1) two-player linear-quadratic game with finite horizon is described by (4.90)-(4.91)

$$\dot X_1 = A_1(t) X_1 + A_2(t) X_2 + \bar B_1(t) U_1 \quad (X_1(t_0) = X_{10}), \qquad \dot X_2 = A_3(t) X_1 + A_4(t) X_2 + \bar B_2(t) U_2 \quad (X_2(t_0) = X_{20}) \qquad (\forall t \in [t_0, t_f]) \qquad (4.90)$$

$$J_i(X_0, U_1, U_2) = X^T(t_f)\, \bar C_i\, X(t_f) + \int_{t_0}^{t_f} \big(X^T \bar Q_i(t) X + U_i^T U_i + U_j^T R_{ij}(t) U_j\big)\, dt \qquad (i, j = 1, 2;\; i \neq j) \qquad (4.91)$$

$$\bar Q_i(t) = \begin{pmatrix} Q_{i1}(t) & Q_{i3}(t) \\ Q_{i3}(t) & Q_{i2}(t) \end{pmatrix}, \qquad \bar C_i = \begin{pmatrix} C_{i1} & C_{i3} \\ C_{i3} & C_{i2} \end{pmatrix} \qquad (i = 1, 2)$$

Similarly, the parameter matrices Ai(t), B̄j(t), Qjk(t), Cjk and Rlj(t) (i = 1, 2, 3, 4; k = 1, 2, 3; j, l = 1, 2; j ≠ l) in (4.90)-(4.91) are all n × n diagonal matrices, and all the time varying matrices among them are continuous. B̄1(t) and B̄2(t) are continuously differentiable and nonsingular for all t ∈ [t0, tf]. Player i (i = 1, 2) applies the time varying linear full state feedback control strategy (4.92) to minimize his/her own performance index (4.91); in (4.92), the n × n diagonal matrices Ki1(t) and Ki2(t) are continuously differentiable.

$$U_i = -\bar K_i(t) X = -K_{i1}(t) X_1 - K_{i2}(t) X_2 \qquad (i = 1, 2) \qquad (4.92)$$
Corollary 4.2. Control strategy pair (4.92) is a Nash equilibrium solution of game (4.90)-(4.91) if and only if the weighting matrices Q̄i(t), C̄i and Rij(t) (i, j = 1, 2; i ≠ j) in (4.91) satisfy the conditions in (4.93)-(4.94), with the time variable omitted for simplicity of notation.

$$\left\{\begin{aligned}
Q_{11} &= (\dot{\bar B}_1 K_{11} - \bar B_1 \dot K_{11}) \bar B_1^{-2} - 2\,[A_1 K_{11} \bar B_1^{-1} + (A_3 - \bar B_2 K_{21}) K_{12} \bar B_1^{-1}] + K_{11}^2 - R_{12} K_{21}^2 \\
Q_{12} &= -\dot\Lambda - 2\,[A_2 K_{12} \bar B_1^{-1} + (A_4 - \bar B_2 K_{22})\,\Lambda] + K_{12}^2 - R_{12} K_{22}^2 \\
Q_{13} &= (\dot{\bar B}_1 K_{12} - \bar B_1 \dot K_{12}) \bar B_1^{-2} - (A_1 K_{12} + A_2 K_{11}) \bar B_1^{-1} - (A_3 - \bar B_2 K_{21})\,\Lambda - (A_4 - \bar B_2 K_{22}) K_{12} \bar B_1^{-1} + K_{11} K_{12} - R_{12} K_{21} K_{22} \\
C_{11} &= K_{11}(t_f) \bar B_1^{-1}(t_f), \qquad C_{13} = K_{12}(t_f) \bar B_1^{-1}(t_f) \\
&\qquad \text{for any diagonal constant matrix } C_{12} \in \mathbb{R}^{n \times n} \text{ and any continuously differentiable} \\
&\qquad \text{diagonal matrix } \Lambda(t): [t_0, t_f] \to \mathbb{R}^{n \times n} \text{ with } \Lambda(t_f) = C_{12} \qquad (\forall t \in [t_0, t_f])
\end{aligned}\right. \qquad (4.93)$$

$$\left\{\begin{aligned}
Q_{21} &= -\dot{\mathrm T} - 2\,[(A_1 - \bar B_1 K_{11})\,\mathrm{T} + A_3 K_{21} \bar B_2^{-1}] - R_{21} K_{11}^2 + K_{21}^2 \\
Q_{22} &= (\dot{\bar B}_2 K_{22} - \bar B_2 \dot K_{22}) \bar B_2^{-2} - 2\,[(A_2 - \bar B_1 K_{12}) K_{21} \bar B_2^{-1} + A_4 K_{22} \bar B_2^{-1}] - R_{21} K_{12}^2 + K_{22}^2 \\
Q_{23} &= (\dot{\bar B}_2 K_{21} - \bar B_2 \dot K_{21}) \bar B_2^{-2} - (A_1 - \bar B_1 K_{11}) K_{21} \bar B_2^{-1} - (A_3 K_{22} + A_4 K_{21}) \bar B_2^{-1} - (A_2 - \bar B_1 K_{12})\,\mathrm{T} - R_{21} K_{11} K_{12} + K_{21} K_{22} \\
C_{23} &= K_{21}(t_f) \bar B_2^{-1}(t_f), \qquad C_{22} = K_{22}(t_f) \bar B_2^{-1}(t_f) \\
&\qquad \text{for any diagonal constant matrix } C_{21} \in \mathbb{R}^{n \times n} \text{ and any continuously differentiable} \\
&\qquad \text{diagonal matrix } \mathrm{T}(t): [t_0, t_f] \to \mathbb{R}^{n \times n} \text{ with } \mathrm{T}(t_f) = C_{21} \qquad (\forall t \in [t_0, t_f])
\end{aligned}\right. \qquad (4.94)$$
Proofs for Corollaries 4.1 and 4.2 are similar to those for Theorems 4.7 and 4.8 respectively.
Remark 4.6. 1) Note that all matrices in (4.88)-(4.89) and (4.93)-(4.94) are diagonal so
that the order of matrices in the multiplication does not matter.
2) If we restrict the weighting matrices Q̄i(t) and Rij(t) (i, j = 1, 2; i ≠ j) in (4.91) to be time-invariant, we may not be able to find constant Λ and T from (4.93)-(4.94), and then there will be no solution to the inverse problem for game (4.90)-(4.91).
4.5 Numeric Example
We provide an example to illustrate an interesting point which can be easily observed
in the study of the inverse problem.
Suppose in the 2nd-order 2-player nonzero-sum game (4.56)-(4.57), the parameters are

$$A = \begin{pmatrix} 1 & -1 \\ 2 & 4 \end{pmatrix}, \qquad B_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad B_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$

and the given constant control strategy group is

$$u_1(t) = -2x_1(t) + x_2(t), \quad \text{i.e. } k_{11} = 2,\; k_{12} = -1; \qquad u_2(t) = -2x_1(t) - 6x_2(t), \quad \text{i.e. } k_{21} = 2,\; k_{22} = 6$$
(A, B1) is controllable and so is (A, B2). A is unstable with eigenvalues 2 and 3. The closed-loop system matrix is $\begin{pmatrix} -1 & 0 \\ 0 & -2 \end{pmatrix}$, so the closed-loop system is obviously asymptotically stable. According to conditions (4.67) and (4.71), if we choose r12 = r21 = 0, q12 = 6 and q21 = 8, then we get

$$Q_1 = \begin{pmatrix} 0 & -1 \\ -1 & 6 \end{pmatrix}, \qquad Q_2 = \begin{pmatrix} 8 & -6 \\ -6 & -8 \end{pmatrix}$$

The eigenvalues of Q1 are −0.1623 and 6.1623; the eigenvalues of Q2 are −10 and 10. So the resultant Q1 and Q2 are indefinite. This is in contrast with the requirement in many optimal control problems that the state weighting matrix in the integral of the performance index be positive semi-definite (or negative semi-definite) in order to have a meaningful optimal control problem.
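The construction behind these numbers can be reproduced directly. The sketch below (Python with NumPy assumed; the free parameter φ2 = 1.75 is one admissible choice for this demo) builds θ1 of the form (4.72), recovers Q1 from (4.54), checks the coupling condition (4.45), and confirms that Q1 is indefinite; Q2 follows from the same recipe with the free parameter ψ1.

```python
# Minimal sketch (NumPy assumed) of the inverse construction for player 1 in
# the example: theta1 per (4.72), Q1 per (4.54), coupling check per (4.45).
import numpy as np

A  = np.array([[1.0, -1.0], [2.0, 4.0]])
B1 = np.array([[1.0], [0.0]]); B2 = np.array([[0.0], [1.0]])
K1 = np.array([[2.0, -1.0]]); K2 = np.array([[2.0, 6.0]])
r12 = 0.0                                          # chosen cross weight, R11 = 1

Abar = A - B1 @ K1 - B2 @ K2                       # diag(-1, -2), Hurwitz
phi2 = 1.75                                        # free design parameter in (4.72)
theta1 = np.array([[2.0, -1.0], [-1.0, phi2]])     # first row fixed by (4.45)

Q1 = -Abar.T @ theta1 - theta1 @ Abar - K1.T @ K1 - r12 * (K2.T @ K2)   # (4.54)

assert np.allclose(B1.T @ theta1, K1)              # coupling condition (4.45)
print(Q1)                                          # [[0, -1], [-1, 6]] as in the text
print(np.linalg.eigvals(Q1))                       # ~ -0.1623 and 6.1623: indefinite
```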
For x10 = 1 and x20 = 0.5, Figures 4.1-4.3 show the plots of the system state and control variables.
4.6 Conclusion
Riccati equations are very important for linear-quadratic optimal control and game problems, but generally speaking they are difficult to solve. We discussed the inverse problem for general nth-order m-player nonzero-sum LQ differential games: given a group of linear state feedback strategies, they constitute a Nash equilibrium if and only if the state and control weighting matrices in the performance indices satisfy conditions that can be transformed into linear equations. Based on these general conditions, explicit conditions for the inverse problem of a 2nd-order 2-player nonzero-sum game were derived, and the results were extended to a class of higher-order 2-player non-cooperative games.
2-player non-cooperative games.
94
1
x1
x2
0.9
0.8
0.7
States
0.6
0.5
0.4
0.3
0.2
0.1
0
0
1
2
3
4
5
time (s)
6
7
Figure 4.1: System state versus time
95
8
9
10
0.5
0.45
0.4
0.35
x2
0.3
0.25
0.2
0.15
0.1
0.05
0
0
0.1
0.2
0.3
0.4
0.5
x1
0.6
Figure 4.2: State phase
96
0.7
0.8
0.9
1
0
u1
u2
−0.5
−1
−1.5
control
−2
−2.5
−3
−3.5
−4
−4.5
−5
0
1
2
3
4
5
time (s)
6
7
Figure 4.3: Control versus time
97
8
9
10
CHAPTER 5
CONCLUSION AND FUTURE STUDIES
5.1 Summary
Game theory has been widely applied to multi-player decision making processes because it captures the nature of these problems: the determination of one player's control strategy is tightly coupled to that of the other players' control strategies. Although there are hundreds of publications devoted to various game problems, many problems remain unexplored: 1) for singular games, the available literature discusses either zero-sum games or game problems where discontinuous controls are allowed; there is no literature addressing singular nonzero-sum games with the constraint that each player's control should be continuous state feedback; 2) many researchers have applied zero-sum games to minimax design problems (e.g. H∞ control) where the two opposing players are the controller and the uncertainty, but there are many real applications involving multiple (≥ 2) players besides the uncertainty, and new methods need to be investigated in order to produce robust strategies for these more complicated problems; 3) because of the bottleneck of solving the Riccati equations associated with game problems, equilibrium solutions can be formulated, but explicit solutions often cannot be found. Inverse game problems provide another point of view for analyzing game problems; through inverse problems, the degree of system design freedom can be revealed. There is no literature addressing the inverse problems for general indefinite games, which have more applications than definite game problems, especially in the non-cooperative game category.
This dissertation presents and solves three problems: equilibrium solutions for singular games by continuous state feedback; robust equilibrium solutions for multi-player asymmetric games; and the inverse game problem for indefinite games. The main contributions are as follows.

For singular LQ games where each player has a control-free cost functional quadratic in the system states over an infinite horizon and each player's control strategy is constrained to be continuous linear state feedback: 1) a new equilibrium concept, the asymptotic ε-Nash equilibrium, was proposed in terms of a two-player nonzero-sum game; 2) a partial state feedback asymptotic ε-Nash equilibrium was found by solving a group of algebraic equations of system coefficients, and conditions on the initial states and the variable ε were provided such that the asymptotic ε-Nash equilibrium will be an ε-Nash equilibrium or an ordinary Nash equilibrium; 3) for a class of 2nd-order singular LQ games, the asymptotic ε-Nash equilibrium implemented by partial state feedback was explicitly found in terms of system coefficients. All the results can be extended to games with multiple (≥ 2) players without technical difficulties.

Robust output feedback equilibrium solutions were proposed for two-player asymmetric games with an additive uncertainty: 1) regarding the uncertainty as a third player, a three-player noncooperative nonzero-sum game was formed whose Nash equilibrium is a robust equilibrium solution for the original two-player game; 2) regarding the coalition of the original two players as one player and the uncertainty as another player, a two-player noncooperative nonzero-sum game was formed whose Nash equilibrium solution provides an un-improvable robust equilibrium for the original game.
Inverse problems for indefinite games were proposed to reveal the degree of system design freedom: 1) a necessary and sufficient condition for the inverse problem was provided using a group of algebraic/differential equations linear in the variables and the weighting matrices; because of the linearity of the equations, an inverse problem is easier to solve than a direct problem; 2) the inverse problem for a class of 2nd-order two-player LQ games was thoroughly investigated.
5.2 Future Studies
Several problems are worth further investigation in order to extend the current results to more difficult or more general problems.

1. In singular game problems, asymptotic ε-Nash equilibrium solutions can be found by solving the nonlinear algebraic equations (2.17), (2.19) and (2.20). We provided explicit solutions to these equations for a class of 2nd-order systems. More work is needed on the existence and uniqueness of qualified solutions for general high-order games.

2. Riccati equation (4.16) may have multiple qualified solutions. More studies are needed to find methods that characterize all the qualified solutions, because the properties of these solutions affect the behavior of the closed-loop system. The constraint of closed-loop stability may not remove the multiplicity of qualified solutions to (4.16), and more criteria are needed in order to guarantee uniqueness.
3. To apply the results in Chapter 2 to a class of nonlinear games

$$\dot x = f(x) + g_1(x) u_1 + g_2(x) u_2 \quad (x \in \mathbb{R}^n;\; f(x_0) = g_1(x_0) = g_2(x_0) = 0), \qquad y_i = h_i(x) \in \mathbb{R} \quad (i = 1, 2),$$
$$J_i(x_0, u_1, u_2) = \int_0^\infty h_i^2(x)\, dt \quad (i = 1, 2) \qquad (5.1)$$

define

$$L_f^0 h_i(x) = h_i(x) \quad (i = 1, 2), \qquad L_f^j h_i(x) = \frac{\partial L_f^{j-1} h_i(x)}{\partial x}\, f(x) \quad (j \geq 1;\; i = 1, 2),$$
$$L_{g_k} L_f^j h_i(x) = \frac{\partial L_f^j h_i(x)}{\partial x}\, g_k(x) \quad (j \geq 0;\; i, k = 1, 2).$$

If system (5.1) has vector relative degree (γ1, γ2) with γ1 + γ2 = n and

$$L_{g_1} L_f^{\gamma_2 - 1} h_2(x) = L_{g_2} L_f^{\gamma_1 - 1} h_1(x) = 0,$$

then under the new coordinates (5.2) and the new control variables (5.3)

$$\xi = \big(\xi_1^1, \cdots, \xi_{\gamma_1}^1, \xi_1^2, \cdots, \xi_{\gamma_2}^2\big)^T = \big(h_1(x), \cdots, L_f^{\gamma_1 - 1} h_1(x), h_2(x), \cdots, L_f^{\gamma_2 - 1} h_2(x)\big)^T \qquad (5.2)$$

$$u_i = \big(v_i - L_f^{\gamma_i} h_i(x)\big) \big/ L_{g_i} L_f^{\gamma_i - 1} h_i(x) \qquad (i = 1, 2) \qquad (5.3)$$

system (5.1) is equivalent to (5.4)

$$\dot\xi = A\xi + b_1 v_1 + b_2 v_2, \qquad y_i = \xi_1^i \quad (i = 1, 2), \qquad J_i(x_0, u_1, u_2) = \int_0^\infty (\xi_1^i)^2\, dt \quad (i = 1, 2) \qquad (5.4)$$

where

$$A = \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix}, \qquad A_i = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix}_{\gamma_i \times \gamma_i},$$

$$b_1 = (0, \cdots, 0, \underbrace{1}_{\gamma_1\text{-th entry}}, 0, \cdots, 0)^T_{\,n \times 1}, \qquad b_2 = (0, \cdots, 0, 1)^T_{\,n \times 1}.$$

If the LQ game (5.4) can be solved using the results in Chapter 2, then the original nonlinear game (5.1) can be solved.
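A symbolic sketch of the required Lie-derivative computations follows (Python with SymPy assumed; the planar vector fields f, g1, g2 and outputs h1, h2 are hypothetical toy data with vector relative degree (1, 1) chosen only to illustrate the construction, so the vanishing conditions of (5.1) at x0 are not enforced here). For γ1 = γ2 = 1, the coordinates (5.2) reduce to ξ = (h1, h2) and the control (5.3) turns each output dynamics into ξ̇1^i = vi.

```python
# Hedged sketch (SymPy; toy data, not from the dissertation) of the Lie
# derivatives and the feedback-linearizing control (5.3) for gamma = (1, 1).
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
x = sp.Matrix([x1, x2])

def L(h, vf):                                   # Lie derivative of scalar h along vf
    return (sp.Matrix([h]).jacobian(x) * vf)[0]

f  = sp.Matrix([x2 - x1, -sp.sin(x1)])          # f(0) = 0
g1 = sp.Matrix([x1 + 1, 0])                     # hypothetical input fields with
g2 = sp.Matrix([0, x2 + 1])                     # L_{g1}h2 = L_{g2}h1 = 0
h1, h2 = x1, x2

assert L(h2, g1) == 0 and L(h1, g2) == 0        # the decoupling condition
v1, v2 = sp.symbols('v1 v2')
u1 = (v1 - L(h1, f)) / L(h1, g1)                # (5.3) with gamma_i = 1
u2 = (v2 - L(h2, f)) / L(h2, g2)

xdot = f + g1*u1 + g2*u2                        # closed loop in the new controls
assert sp.simplify(xdot[0] - v1) == 0           # xi_1^1-dot = v1, as in (5.4)
assert sp.simplify(xdot[1] - v2) == 0           # xi_1^2-dot = v2
```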
BIBLIOGRAPHY
[1] H. Abou-Kandil, G. Freiling, V. Ionescu, and G. Jank. Matrix Riccati equations in
control and systems theory. Birkhäuser Verlag, Boston, 2003.
[2] F. Amato, M. Mattei, and A. Pironti. Robust strategies for Nash linear quadratic
games under uncertain dynamics. In Proceedings of the 37th IEEE Conference on
Decision and Control, pages 1869–1970, 1998.
[3] F. Amato and A. Pironti. A note on singular zero-sum linear quadratic differential
games. In Proceedings of the 33rd IEEE Conference on Decision and Control, pages
1533–1535, Lake Buena Vista, FL, USA, 1994.
[4] B. D. O. Anderson. The testing for optimality of linear systems. International Journal
of Control, 4(1):29–40, 1966.
[5] V. Andrieu and H. P. Lahanier. Exoatmospheric interception problem solved using
output feedback law. Systems and Control Letters, 55:633–639, 2006.
[6] M. Athans. The matrix minimum principle. Information and Control, 11(6):592–606,
1968.
[7] T. Basar and P. Bernhard. H ∞ -optimal control and related minimax design problems:
a dynamic game approach. Birkhäuser Boston, 1991.
[8] T. Basar and G. J. Olsder. Dynamic non-cooperative game theory. Academic Press,
New York, Second edition, 1995.
[9] J. Z. Ben-Asher, S. Levinson, J. Shinar, and H. Weiss. Trajectory shaping in linear-quadratic pursuit-evasion games. Journal of Guidance, 27(6):1102–1105, 2004.
[10] B. D. Bernheim, B. Peleg, and M. D. Whinston. Coalition-proof Nash equilibrium 1:
concepts. Journal of Economic Theory, 42:1–12, 1987.
[11] B. D. Bernheim and M. D. Whinston. Coalition-proof Nash equilibrium 2: applications. Journal of Economic Theory, 42:13–29, 1987.
[12] D. P. Bertsekas and I. B. Rhodes. Sufficiently informative functions and the minimax
feedback control of uncertain dynamic systems. IEEE Transactions on Automatic
Control, AC-18:117–124, April 1973.
[13] M. Breton, F. Chauny, and G. Zaccour. Leader-follower dynamic game of new product
diffusion. Journal of Optimization Theory and Applications, 92(1):77–98, 1997.
[14] S. Butman. A method for optimizing control-free costs in systems with linear controllers. IEEE Transactions on Automatic Control, 13(5):554–556, 1968.
[15] C. D. Charalambous. Stochastic nonlinear minimax dynamic games with noisy measurements. IEEE Transactions on Automatic Control, 48(2):261–266, Feb. 2003.
[16] B. S. Chen and W. Zhang. Stochastic H 2 /H ∞ control with state-dependent noise.
IEEE Transactions on Automatic Control, 49(1):45–57, 2004.
[17] X. Chen and K. Zhou. Multi-objective filtering design. In Proceedings of IEEE
Conference on Electrical and Computer Engineering, pages 708–713, 1999.
[18] O. L V. Costa and E. F. Tuesta. Finite horizon quadratic optimal control and a separation principle for Markovian jump linear systems. IEEE Transactions on Automatic
Control, 48(10):1836–1842, 2003.
[19] J. B. Cruz and C. I. Chen. Series Nash solution of two person nonzero-sum linear-quadratic games. Journal of Optimization Theory and Applications, 7:240–257, 1971.
[20] J. B. Cruz Jr. and W. R. Perkins. A new approach to the sensitivity problem in multivariable feedback system design. IEEE Transactions on Automatic Control, AC-9:216–223, July 1964.
[21] V. Dragan, T. Morozan, and A. Stoica. H 2 optimal control for linear stochastic systems. Automatica, 40:1103–1113, 2004.
[22] C. Edwards, N. O. Lai, and S. K. Spurgeon. On discrete dynamic output feedback
min-max controllers. Automatica, 41:1783–1790, 2005.
[23] M. Ehrgott. Multicriteria optimization. Springer Verlag, Heidelberg, New York, 2000.
[24] J. Engwerda. Feedback Nash equilibria in the scalar infinite horizon LQ-game. Automatica, 36:135–139, 2000.
[25] R. Findeisen, L. Imsland, F. Allgöwer, and B. A. Foss. Output feedback stabilization
of constrained systems with nonlinear predictive control. International Journal of
Robust and Nonlinear Control, 13:211–227, 2003.
[26] F. Forges. An approach to communications equilibrium. Econometrica, 54:1375–
1386, 1985.
[27] R. A. Freeman and P. V. Kokotovic. Inverse optimality in robust stabilization. SIAM
Journal on Control and Optimization, 34(4):1365–1391, July 1996.
[28] D. Fudenberg and D. Levine. Limit games and limit equilibria. Journal of Economic
Theory, 38(2):261–279, 1986.
[29] D. Fudenberg and D. K. Levine.
61(3):523–545, May 1993.
Self-confirming equilibrium.
Econometrica,
[30] T. Fujii and P. P. Khargonekar. Inverse problems in H∞ control theory and linear-quadratic differential games. In Proceedings of the 27th IEEE Conference on Decision and Control, pages 26–31, 1988.
[31] F. Gao, W. Liu, V. Sreeram, and K. L. Teo. Characterization and selection of global
optimal output feedback gains for linear time-invariant systems. Optimal Control
Applications and Methods, 21:195–209, 2000.
[32] V. Y. Glizer. Asymptotic solution of zero-sum linear-quadratic differential game with
cheap control for minimizer. Nonlinear Differential Equations and Applications,
7(2):231–258, 2000.
[33] M. J. Grimble. Polynomial matrix solution of H 2 optimal control problem for statespace systems. Optimal Control Applications and Methods, 23:59–89, 2002.
[34] A. Heifetz and C. Ponsati. All in good time. International Journal of Game Theory,
35:521–538, 2007.
[35] Y. C. Ho. Linear stochastic singular control problems. Journal of Optimization Theory
and Applications, 9(1):24–31, 1972.
[36] H. M. James, N. B. Nichols, and R. S. Phillips. Theory of Servomechanisms.
McGRAW-Hill Book Company, Inc., New York, 1947.
[37] M. Jimenez and A. Poznyak. ε-equilibrium in LQ differential games with bounded uncertain disturbances: robustness of standard strategies and new strategies with adaptation. International Journal of Control, 79(7):786–797, July 2006.
[38] R. E. Kalman. Contributions to the theory of optimal control. Bol. Soc. Mat. Mex,
5:102–199, 1960.
[39] R. E. Kalman. When is a linear control system optimal? Transactions of the ASME.
Series D, Journal of Basic Engineering, pages 51–60, Mar. 1964.
[40] L. V. Kamneva. The sufficient conditions of stability for the value function of a
differential game in terms of singular points. Journal of Applied Mathematics and
Mechanics, 67(3):329–343, 2003.
[41] A. P. Karavaev. Pareto effectiveness of equilibria in active systems with the distributed
control. Automation and Remote Control, 63(12):1980–1995, 2002.
[42] I. N. Katz, H. Mukai, H. Schattler, and M. Zhang. Solution of a differential game
formulation of military air operations by the method of characteristics. In Proceedings
of American Control Conference, pages 168–175, Arlington, VA, USA, 2001.
[43] H. J. Kelley. A transformation approach to singular subarcs in optimal trajectory and
control problems. SIAM Journal on Control, 2(2):234–240, 1965.
[44] M. M. Kogan. Solution to an inverse problem of locally minimax control and its applications in robust control designs. In Proceedings of American Control Conference,
pages 25–27, 2001.
[45] H. Konishi, M. L. Breton, and S. Weber. On coalition-proof Nash equilibria in common agency games. Journal of Economic Theory, 85:122–139, 1999.
[46] M. Kristic and Z. Li. Inverse optimal design of input-to-state stabilizing nonlinear
controllers. IEEE Transactions on Automatic Control, 43(3):336–350, Mar. 1998.
[47] V. B. Larin. On static output-feedback stabilization of a periodic system. International
Applied Mechanics, 42(3):357–363, 2006.
[48] G. Leitmann. Cooperative and non-cooperative many-players differential games.
Springer Verlag, New York, Second edition, 1974.
[49] W. S. Levine, T. L. Johnson, and M. Athans. Optimal limited state variable feedback controllers for linear systems. IEEE Transactions on Automatic Control, AC-16(6):785–792, 1971.
[50] C. C. Lin and K. H. Lu. Optimal discrete-time structural control using direct output
feedback. Engineering Structures, 18(6):472–480, 1996.
[51] P. Loridan. ε-solutions in vector minimization problems. Journal of Optimization
Theory and Applications, 43(2):265–276, June 1984.
[52] W. M. McEneaney. Max-plus eigenvector representations for solution of nonlinear
H ∞ problems: basic concepts. IEEE Transactions on Automatic Control, 48(7):1150–
1163, July 2003.
[53] A. A. Melikyan. Singular characteristics of first order PDEs in optimal control and
differential games. Journal of Mathematical Sciences, 103(6):745–755, 2001.
[54] D. Moreno and J. Wooders. Coalition-proof equilibrium. Games and Economic Behavior, 7:80–112, 1996.
[55] J. F. Nash. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences, 36(1):48–49, 1950.
[56] J. V. Neumann and O. Morgenstern. Theory of Games and Economic Behavior.
Princeton, 1944.
[57] G. J. Olsder. On open- and closed-loop bang-bang control in nonzero-sum differential
games. SIAM Journal on Control and Optimization, 40(4):1087–1106, 2001.
[58] M. Ottaviani and F. Squintani. Naive audience and communication bias. International
Journal of Game Theory, 35:129–150, 2006.
[59] Z. Pan and T. Basar. Robustness of H ∞ controllers to nonlinear perturbations. In
Proceedings of the 32nd Conference on Decision and Control, pages 1638–1643, San
Antonlo, Texas, 1993.
[60] G. P. Papavassilopoulos. Cooperative outcomes of dynamic stochastic Nash games. In
Proceedings of the 28th IEEE Conference on Decision and Control, pages 186–191,
Tampa, Florida, Dec. 1989.
[61] G. P. Papavassilopoulos and J. B. Cruz. On the existence of solutions to coupled matrix Riccati differential equations in linear-quadratic Nash games. IEEE Transactions
on Automatic Control, AC-24(1):127–129, 1979.
[62] G. P. Papavassilopoulos, J. V. Medanic, and J. B. Cruz. On the existence of Nash
strategies and solutions to coupled Riccati equations in linear-quadratic games. Journal of Optimization Theory and Applications, 28(1):49–76, 1979.
[63] G. P. Papavassilopoulos and G. J. Olsder. On the linear-quadratic closed-loop no-memory Nash game. Journal of Optimization Theory and Applications, 42:551–560,
1984.
[64] W. R. Perkins and J. B. Cruz Jr. Feedback properties of linear regulators. IEEE
Transactions on Automatic Control, AC-16:649–664, Dec. 1971.
[65] E. Prempain and I. Postlethwaite. Static output feedback stabilisation with H ∞ performance for a class of plants. Systems and Control Letters, 43:159–166, 2001.
[66] R. W. Rauchhaus. Asymmetric information, mediation, and conflict management.
World Politics, 58:207–241, January 2006.
[67] A. Saberi and P. Sannuti. Cheap and singular controls for linear quadratic regulators.
IEEE Transactions on Automatic Control, AC-32(3):208–219, 1987.
[68] D. M. Salmon. Minimax controller design. IEEE Transactions on Automatic Control,
13(4):369–376, 1968.
[69] T. Sandler and D. G. Arce M. Terrorism and game theory. Simulation and Gaming,
34(3):319–337, September 2003.
[70] I. G. Sarma and U. R. Prasad. Switching surfaces in N -person differential games.
Journal of Optimization Theory and Applications, 10(3):160–177, 1972.
[71] L. J. Savage. The foundation of statistics. Wiley, New York, 1954.
[72] A. V. Savkin. Robust output feedback constrained controllability of uncertain linear
time-varying systems. Journal of Mathematical Analysis and Applications, 215:376–
387, 1997.
[73] U. Schwalbe. The core of a production economy with asymmetric information.
Metroeconomica, 54(4):385–403, 2003.
[74] J. L. Speyer and D. H. Jacobson. Necessary and sufficient conditions for optimality
for singular control problems; a transformation approach. Journal of Mathematical
Analysis and Applications, 33(1):163–187, 1971.
[75] A. W. Starr and Y. C. Ho. Further properties of nonzero-sum differential games.
Journal of Optimization Theory and Applications, 3(4):207–219, 1969.
[76] A. W. Starr and Y. C. Ho. Nonzero-sum differential games. Journal of Optimization
Theory and Applications, 3(3):184–206, 1969.
[77] V. L. Syrmos, C. T. Abdallah, P. Dorato, and K. Grigoriadis. Static output feedback-a
survey. Automatica, 33(2):125–137, 1997.
[78] K. Takaba and T. Katayama. H 2 output feedback control for descriptor systems.
Automatica, 34(7):841–850, 1998.
[79] C. Tang and T. Basar. Minimax nonlinear control under stochastic uncertainty constraints. In Proceedings of the 42nd IEEE Conference on Decision and Control, pages
1065–1070, Maui, Hawaii, USA, 2003.
[80] V. Turetsky. Upper bounds of the pursuer control based on a linear quadratic differential game. Journal of Optimization Theory and Applications, 121(1):163–191, April
2004.
[81] V. Turetsky and J. Shinar. Missile guidance laws based on pursuit-evasion game
formulations. Automatica, 39:607–618, 2003.
[82] O. Volij. Communication, credible improvements and the core of an economy with
asymmetric information. International Journal of Game Theory, 29:63–79, 2000.
[83] M. Voorneveld, S. Grahn, and M. Dufwenberg. Ideal equilibria in noncooperative
multi-criteria games. Mathematical Methods of Operations Research, 52:65–77,
2000.
[84] X. Wang and J. B. Cruz. Asymptotic ε-Nash equilibrium for 2nd -order two-player
nonzero-sum singular LQ games with decentralized control. Submitted to IFAC 2008.
[85] X. Wang and J. B. Cruz. Nash equilibrium for 2nd -order two-player non-zero sum LQ
games with executable decentralized control strategies. In Proceedings of the 45th
IEEE Conference on Decision and Control, pages 1960–1965, San Diego, CA, USA,
2006.
[86] A. J. T. M. Weeren, J. M. Schumacher, and J. C. Engwerda. Asymptotic analysis of
linear feedback Nash equilibria in nonzero-sum linear-quadratic differential games.
Journal of Optimization Theory and Applications, 101(3):693–722, June 1999.
[87] D. J. White. Epsilon efficiency. Journal of Optimization Theory and Applications,
49(2):319–337, May 1986.
[88] H. S. Witsenhausen. A minimax control problem for sampled linear systems. IEEE
Transactions on Automatic Control, AC-13:5–21, 1968.
[89] W. M. Wonham. On a matrix Riccati equation of stochastic control. SIAM Journal on
Control, 6(4):681–697, 1968.
[90] H. Xu and K. Mizukami. Infinite-horizon differential games of singularly perturbed
systems: a unified approach. Automatica, 33(2):273–276, 1997.
[91] C. Yung, Y. Lin, and F. Yeh. A family of nonlinear H ∞ output feedback controllers. In
Proceedings of American Control Conference, pages 3728–3729, Seattle, Washington,
1995.
[92] G. Zames and B. A. Francis. Feedback minimax sensitivity and optimal robustness.
IEEE Transactions on Automatic Control, 28:585–601, 1983.
[93] K. Zhu and J. P. Weyant. Strategic decisions of new technology adoption under
asymmetric information: a game-theoretic model. Decision Sciences, 34(4):643–675,
2003.
[94] V. I. Zhukovskii. Lyapunov functions in differential games. Taylor and Francis, 2003.
[95] V. I. Zhukovskiy and M. E. Salukvadze. The Vector-valued Maximin. Academic Press,
Boston, 1994.