
Stochastic Omega-Regular Games
Krishnendu Chatterjee
Electrical Engineering and Computer Sciences
University of California at Berkeley
Technical Report No. UCB/EECS-2007-122
http://www.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-122.html
October 8, 2007
Copyright © 2007, by the author(s).
All rights reserved.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission.
Stochastic ω-Regular Games
by
Krishnendu Chatterjee
B. Tech. (IIT, Kharagpur) 2001
M.S. (University of California, Berkeley) 2004
A dissertation submitted in partial satisfaction of the
requirements for the degree of
Doctor of Philosophy
in
Computer Science
in the
GRADUATE DIVISION
of the
UNIVERSITY of CALIFORNIA at BERKELEY
Committee in charge:
Professor Thomas A. Henzinger, Chair
Professor Christos Papadimitriou
Professor John Steel
Fall, 2007
The dissertation of Krishnendu Chatterjee is approved:
University of California at Berkeley
Fall, 2007
Stochastic ω-Regular Games
Copyright Fall, 2007
by
Krishnendu Chatterjee
Abstract
Stochastic ω-Regular Games
by
Krishnendu Chatterjee
Doctor of Philosophy in Computer Science
University of California at Berkeley
Professor Thomas A. Henzinger, Chair
We study games played on graphs with ω-regular conditions specified as parity, Rabin,
Streett or Müller conditions. These games have applications in the verification, synthesis,
modeling, testing, and compatibility checking of reactive systems. Important distinctions
between graph games are as follows: (a) turn-based vs. concurrent games, depending
on whether at a state of the game only a single player makes a move, or players make
moves simultaneously; (b) deterministic vs. stochastic, depending on whether the transition
function is a deterministic or a probabilistic function over successor states; and (c) zero-sum
vs. non-zero-sum, depending on whether the objectives of the players are strictly conflicting
or not.
We establish that the decision problems for turn-based stochastic zero-sum games
with Rabin, Streett, and Müller objectives are NP-complete, coNP-complete, and PSPACE-complete, respectively, substantially improving the previously known 3EXPTIME bound.
We also present strategy improvement style algorithms for turn-based stochastic Rabin and
Streett games. In the case of concurrent stochastic zero-sum games with parity objectives
we obtain a PSPACE bound, again improving the previously known 3EXPTIME bound. As a consequence, concurrent stochastic zero-sum games with Rabin, Streett, and Müller objectives can be solved in EXPSPACE, improving the previously known 4EXPTIME bound. We also present an elementary and combinatorial proof of the existence of memoryless ε-optimal strategies in concurrent stochastic games with reachability objectives, for all real ε > 0, where an ε-optimal strategy achieves the value of the game to within ε against all strategies of the opponent. We also use the proof techniques to present a strategy improvement style algorithm for concurrent stochastic reachability games.
We then go beyond ω-regular objectives and study the complexity of an important
class of quantitative objectives, namely, limit-average objectives. In limit-average games, the states of the graph are labeled with rewards and the goal is to maximize the long-run average of the rewards. We show that concurrent stochastic zero-sum games with limit-average objectives can be solved in EXPTIME.
Finally, we introduce a new notion of equilibrium, called secure equilibrium, in nonzero-sum games, which captures the notion of conditional competitiveness. We prove the
existence of unique maximal secure equilibrium payoff profiles in turn-based deterministic
games, and present algorithms to compute such payoff profiles. We also show how the
notion of secure equilibrium extends the assume-guarantee style of reasoning in the game
theoretic framework.
Professor Thomas A. Henzinger
Dissertation Committee Chair
To Maa (my mother) ...
Contents

List of Figures

List of Tables

1 Introduction

2 Definitions
   2.1 Game Graphs
      2.1.1 Turn-based probabilistic game graphs
      2.1.2 Concurrent game graphs
   2.2 Strategies
      2.2.1 Types of strategies
      2.2.2 Probability space and outcomes of strategies
   2.3 Objectives
   2.4 Game Values
   2.5 Determinacy
   2.6 Complexity of Games

3 Concurrent Games with Tail Objectives
   3.1 Tail Objectives
      3.1.1 Completeness of limit-average objectives
   3.2 Positive Limit-one Property
   3.3 Zero-sum Tail Games to Nonzero-sum Reachability Games
   3.4 Construction of ε-optimal Strategies for Müller Objectives
   3.5 Conclusion

4 Stochastic Müller Games
   4.1 Markov decision processes
      4.1.1 MDPs with reachability objectives
      4.1.2 MDPs with Müller objectives
      4.1.3 MDPs with Rabin and Streett objectives
   4.2 2½-player Games with Müller objectives
   4.3 Optimal Memory Bound for Pure Qualitative Winning Strategies
      4.3.1 Complexity of qualitative analysis
   4.4 Optimal Memory Bound for Pure Optimal Strategies
      4.4.1 Complexity of quantitative analysis
      4.4.2 The complexity of union-closed and upward-closed objectives
   4.5 An Improved Bound for Randomized Strategies
   4.6 Conclusion

5 Stochastic Rabin and Streett Games
   5.1 Qualitative Analysis of Rabin Games
   5.2 Strategy Improvement for 2½-player Rabin and Streett Games
      5.2.1 Key Properties
      5.2.2 Strategy Improvement Algorithm
   5.3 Randomized Algorithm
   5.4 Optimal Strategy Construction for Streett Objectives
   5.5 Conclusion

6 Concurrent Reachability Games
   6.1 Preliminaries
   6.2 Markov Decision Processes of Memoryless Strategies
   6.3 Existence of Memoryless ε-Optimal Strategies
      6.3.1 From value iteration to selectors
      6.3.2 From value iteration to optimal selectors
   6.4 Strategy Improvement
      6.4.1 The strategy-improvement algorithm
      6.4.2 Convergence
   6.5 Conclusion

7 Concurrent Limit-average Games
   7.1 Definitions
   7.2 Theory of Real-closed Fields and Quantifier Elimination
   7.3 Computation of Values in Concurrent Limit-average Games
      7.3.1 Sentence for the value of a concurrent limit-average game
      7.3.2 Algorithmic analysis
      7.3.3 Approximating the value of a concurrent limit-average game
   7.4 Conclusion

8 Concurrent Parity Games
   8.1 Strategy Complexity and Computational Complexity
   8.2 Conclusion

9 Secure Equilibria and Applications
   9.1 Non-zero-sum Games
   9.2 Secure Equilibria
   9.3 2-Player Non-Zero-Sum Games on Graphs
      9.3.1 Unique maximal secure equilibria
      9.3.2 Algorithmic characterization of secure equilibria
   9.4 Assume-guarantee Synthesis
      9.4.1 Co-synthesis
      9.4.2 Game Algorithms for Co-synthesis
   9.5 Conclusion

Bibliography
List of Figures

3.1 A simple Markov chain.
3.2 An illustration of the idea of Theorem 5.
3.3 A game with Büchi objective.
3.4 A concurrent Büchi game.
4.1 The sets of the construction.
4.2 The sets of the construction with forbidden edges.
5.1 Gadget for the reduction of 2½-player Rabin games to 2-player Rabin games.
5.2 The strategy sub-graph in Gσ.
5.3 The strategy sub-graph in Gπ.
5.4 Gadget for the reduction of 2½-player Streett games to 2-player Streett games.
6.1 An MDP with reachability objective.
9.1 A graph game with reachability objectives.
9.2 A graph game with Büchi objectives.
9.3 Mutual-exclusion protocol synthesis.
9.4 Peterson’s mutual-exclusion protocol.
List of Tables

5.1 Strategy complexity of 2½-player games and their sub-classes with ω-regular objectives, where ΣPM denotes the family of pure memoryless strategies, ΣPF denotes the family of pure finite-memory strategies, and ΣM denotes the family of randomized memoryless strategies.
5.2 Computational complexity of 2½-player games and their sub-classes with ω-regular objectives.
8.1 Strategy complexity of concurrent games with ω-regular objectives, where ΣPM denotes the family of pure memoryless strategies, ΣM denotes the family of randomized memoryless strategies, and ΣHI denotes the family of randomized history-dependent, infinite-memory strategies.
8.2 Computational complexity of concurrent games with ω-regular objectives.
Acknowledgements
I am deeply grateful to my advisor, Tom Henzinger, for his wonderful support and guidance
during my stay in Berkeley. Over the last five years he taught me all I know of research in
the field of verification. He taught me how to think about research problems, helped me make significant progress in the skills that are essential for a researcher, taught me how to write precisely and concisely, and even carefully and patiently helped me correct all my punctuation and grammatical mistakes. His enthusiasm, patience, brilliance, ability to concretize an ill-conceived idea into a precise problem, and knack for suggesting new ways to attack a problem when I got stuck will always remain a source of inspiration. His influence is present in every page of this thesis and will be present in everything I write in the future. I only wish that a small fraction of his abilities has rubbed off on me.
I am thankful to Luca de Alfaro, Rupak Majumdar and Marcin Jurdziński for
several collaborative works that appear in this dissertation. It was an absolute pleasure
to work with Luca, and from him I received innumerable intuitions on the behavior of
concurrent games that form a large part of the thesis. Research with Rupak is a truly
wonderful experience: his ability to suggest relevant and interesting new problems and
amazing sense of humor always made our research discussions a fun experience. I started
my research on graph games with Marcin and I am grateful to him for getting me interested in the topic of graph games and for explaining all the basics. Besides helping me in research, he also taught me a lot about how to clearly communicate an idea and patiently answer all questions (in my early days of research I surely had many stupid questions for him).
I was also fortunate to collaborate with Orna Kupferman and Nir Piterman, and I thank
them for sharing with me their knowledge of automata theory and teaching me a new way
to look at games via automata. I am also thankful to Jean-François Raskin, Laurent Doyen
and Radha Jagadeesan for fruitful research collaborations; it was a pleasure to work with
them. I am indebted to P.P. Chakrabarti and Pallab Dasgupta, who were my undergraduate
mentors in IIT Kharagpur and introduced me to the field of formal methods and taught me
all the basics in computer science. I feel simply lucky that such brilliant people brought me
so caringly and smoothly to the field of verification and games.
I am grateful to a lot of people who read several of the results that appear in this
thesis (as manuscripts or conference publications) and helped me with their comments to
improve the results and presentation. Kousha Etessami and Mihalis Yannakakis pointed
out a flaw in a statement of a result of chapter 8 and then helped to the extent of correctly
formulating the result (which currently appears in chapter 8). I am truly grateful to them.
Hugo Gimbert with his valuable comments helped me to make the results of chapter 3
precise. Abraham Neyman helped immensely with his comments on chapter 7; his comments
were extremely helpful in improving and formalizing the results.
Christos Papadimitriou taught us an amazing course on “Algorithms, Internet
and Game Theory” and reinforced my interest in games. I thank him for the course and
for serving on my thesis committee. George Necula taught us a course on “Programming
Languages” and illuminated several aspects of program verification; though I have not worked much in this field, he instilled in me a lot of interest, and I hope to pursue research in program verification in the future. I also thank him for serving on my qualifying exam committee. I am thankful to John Steel, who readily agreed to serve on my qualifying exam and thesis committee.
I thank all my friends who made my stay in Berkeley such a wonderful experience.
I already had old friends Arindam and Arkadeb and made many new friends. I had amazing
discussions with labmates Arindam, Slobodan, Vinayak, Satrajit and Arkadeb. I had an
excellent roommate Kaushik who helped me in many ways, and shared his vast knowledge
on cricket, tennis, movies, and so many other topics. The other highlight was our great
cricket sessions with Kaushik and Rahul. I had some great parties in the company of
Kaushik, Rahul, Pankaj, Arkadeb, Mohan Dunga, Satrajit, and many more friends. I had
a great time with some of my other close friends in Berkeley, such as Ambuj, Anurag,
Shanky, Vishnu, Ankit, Parag, Sanjeev, · · · . I was fortunate to meet Nilim and Antar-da
in Berkeley, who in my early years took care of me as their younger brother.
Two of my school teachers: Manjusree Mukherjee and Sreya Kana Mukherjee; and
two of my great friends: Binayak Roy and Abhijit Guria, will always remain as a source of
inspiration and I can always rely on them when I am in trouble. Binayak, in my college
days, taught me how to think about mathematics and without him nothing would have
been possible. I am thankful and grateful to them in too many ways to list.
Finally, my family has been an endless source of love, affection, support and motivation for me. My grand-parents, parents, Jetha, Jethima, Bad-di, Anju-di, Ranju-di,
Sikha-da, Pradip-da, Abhi, Hriti: all my family members in Calcutta and relatives in Purulia and other parts of Bengal gave me love beyond imagination, support and encouragement
at all stages of my PhD life. My inner strength is my mother and without her this thesis
would not have been possible. So I dedicate this thesis to her.
Chapter 1
Introduction
One-shot and stochastic games. The study of games provides theoretical foundations in several fields of mathematics and computer science. The simplest class of games consists of the “one-step” games: games with a single interaction between the agents, after which the game ends and the payoffs are decided (e.g., matrix games). However, a wide class of games progresses over time and in a stateful manner, where the current game depends on the history of interactions. The class of concurrent stochastic games [Sha53, Eve57], which are played over a finite state space in rounds, is a natural model for such games.
Infinite games. In this thesis we will consider nonterminating games of perfect-information
played on finite graphs. A nonterminating game proceeds for an infinite number of rounds.
The state of a game is a vertex of a graph. In each round, the state changes along an edge of
the graph to a successor vertex. Thus the outcome of the game, played for an infinite number of rounds, is an infinite path through the graph. We consider Boolean objectives
for the two players: for each player, the resulting infinite path is either winning or losing.
The winning sets of paths are assumed to be ω-regular [Tho97]. Depending on how the
winning sets are specified, we distinguish between parity, Rabin, Streett, and Müller games,
as well as some subclasses thereof. The class of parity, Rabin, Streett and Müller objectives
are canonical forms to express ω-regular objectives [Tho97]. Depending on the structure of
the graph, we distinguish between turn-based and concurrent games. In turn-based games,
the graph is partitioned into player-1 states and player-2 states: in player-1 states, player 1
chooses the successor vertex; and in player-2 states, player 2 chooses the successor vertex.
In concurrent games, in every round both players choose simultaneously and independently
from a set of available moves, and the combination of both choices determines the successor
vertex. Finally, we distinguish between deterministic and stochastic games: in stochastic
games, in every round the players’ moves determine a probability distribution on the possible
successor vertices, instead of determining a unique successor.
These games play a central role in several areas of computer science. One important application arises when the vertices and edges of a graph represent the states and transitions of a reactive system, and the two players represent controllable versus uncontrollable
decisions during the execution of the system. The synthesis problem (or control problem)
for reactive systems asks for the construction of a winning strategy in the corresponding
graph game. This problem was first posed independently by Alonzo Church [Chu62] and
Richard Büchi [Büc62] in settings that can be reduced to turn-based deterministic games
with ω-regular objectives. The problem was solved independently by Michael Rabin using
logics on trees [Rab69], and by Büchi and Lawrence Landweber using a more game-theoretic
approach [BL69]; it was later resolved using improved methods [GH82, McN93] and in different application contexts [RW87, PR89]. Game-theoretic formulations have proved useful
not only for synthesis, but also for the modeling [Dil89, ALW89], refinement [HKR02], verification [dAHM00b, AHK02], testing [BGNV05], and compatibility checking [dAH01] of
reactive systems. The use of ω-regular objectives is natural in these application contexts.
This is because the winning conditions of the games arise from requirements specifications
for reactive systems, and the ω-regular sets of infinite paths provide an important and
robust paradigm for such specifications [MP92]. However, both the restriction to deterministic games and the restriction to turn-based games are limiting in some respects: probabilistic transitions are useful to model uncertain behavior that is not strictly adversarial
[Var85, CY95], and concurrent choice is useful to model certain forms of synchronous interaction between reactive systems [dAHM00a, dAHM01]. The resulting concurrent stochastic
games have long been familiar to game theorists and mathematicians, sometimes under the
name of competitive Markov decision processes [FV97].
Qualitative and quantitative analysis. The central computational problem about a
game is the question of whether a player has a strategy for winning the game. However, in
stochastic graph games there are several degrees of “winning”: we may ask if a player has
a strategy that ensures a winning outcome of the game, no matter how the other player
resolves her choices (this is called sure winning); or we may ask if a player has a strategy
that achieves a winning outcome of the game with probability 1 (almost-sure winning);
or we may ask if the maximal probability with which a player can win is 1 in the limit,
defined as the supremum over all possible strategies of the infimum over all adversarial
strategies (limit-sure winning). While all three notions of winning coincide for turn-based
deterministic games [Mar75], and almost-sure winning coincides with limit-sure winning for
turn-based stochastic games [CJH03] (see Corollary 5 of chapter 4), all three notions are
different for concurrent games, even in the deterministic case [dAHK98]. This is because
for concurrent games, strategies that use randomization are more powerful than pure (i.e.,
nonrandomized) strategies. The computation of sure winning, almost-sure winning, and
limit-sure winning states is called the qualitative analysis of graph games. This is in contrast
to the quantitative analysis, which asks for computing for each state the maximal probability
with which a player can win in the limit, even if that limit is less than 1. For a fixed player,
the limit probability is called the sup-inf value, or the optimal value, or simply the value of
the game at a state. A strategy that achieves the optimal value is an optimal strategy, and
a strategy that ensures one of the three ways of winning is a sure (almost-sure; limit-sure)
winning strategy. Concurrent graph games are more difficult than turn-based graph games
for several reasons. In concurrent games, optimal strategies may not exist, but for every
real ε > 0, there may be a strategy that guarantees a winning outcome with a probability
that lies within ε of the optimal value [Eve57]. Moreover, ε-optimal or limit-sure winning
strategies may require infinite memory about the history of a game in order to prescribe
the next move of a player [dAH00]. By contrast, in the simplest scenarios —for example,
in the case of turn-based stochastic games with parity objectives— optimal and winning
strategies require neither randomization nor memory (see chapter 5); such pure memoryless
strategies can be implemented by control maps from states to moves.
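As a summary sketch, the three modes of winning can be written as follows, using notation that is made precise in Chapter 2 (Pr_s^{σ,π} is the probability measure induced by strategies σ and π from state s, Outcome(s, σ, π) the set of possible plays, and Φ the objective); this anticipates that chapter rather than adding anything new.

```latex
% The three modes of winning at a state s for objective \Phi
% (a summary sketch; notation is defined formally in Chapter 2).
\begin{align*}
\text{sure:}        &\quad \exists \sigma \in \Sigma\ \forall \pi \in \Pi.\ \mathrm{Outcome}(s,\sigma,\pi) \subseteq \Phi \\
\text{almost-sure:} &\quad \exists \sigma \in \Sigma\ \forall \pi \in \Pi.\ \Pr\nolimits_s^{\sigma,\pi}(\Phi) = 1 \\
\text{limit-sure:}  &\quad \sup_{\sigma \in \Sigma}\, \inf_{\pi \in \Pi}\, \Pr\nolimits_s^{\sigma,\pi}(\Phi) = 1
\end{align*}
```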
A game that has a winning strategy for one of the two players at every vertex
is called determined. There are two kinds of determinacy results for graph games. First,
the turn-based deterministic games have a qualitative determinacy, namely, determinacy
for sure winning: in every state of the game graph, one of the two players has a sure
winning strategy [Mar75]. Second, the turn-based stochastic games and the concurrent
games have a quantitative determinacy, that is, determinacy for optimal values: in every
state, the optimal values for both players add up to 1 [Mar98]. Both the sure-winning
determinacy result and the optimal-value determinacy result hold for all Borel objectives;
the sure-winning determinacy for turn-based deterministic games with Borel objectives
was established by Donald Martin [Mar75] and the optimal-value determinacy for Borel
objectives was established again by Donald Martin [Mar98] for a very general class of
games called Blackwell games, which include all games we consider in this thesis. For
concurrent games, however, there is no determinacy for sure winning: even if a concurrent
game is deterministic (i.e., nonstochastic) and the objectives are simple (e.g., single-step
reachability), neither player may have a strategy for sure winning [dAHK98]. Determinacy is
useful for solving games: when computing the sure winning states of a game, or the optimal
values, we can switch between the dual views of the two players whenever convenient.
Quantitative objectives. So far we have discussed qualitative objectives, i.e., an outcome of the game is assigned payoff either 0 or 1. The more general case of quantitative objectives consists of measurable functions that assign real-valued rewards to outcomes of a game. Several quantitative objectives have been studied by game theorists and also in the context of economics. The notable quantitative objectives are discounted reward and limit-average (or mean-payoff) objectives. In such games the states of the game graph are labeled with real-valued rewards: for discounted reward objectives the payoff is the discounted sum of the rewards, and for limit-average objectives the payoff is the long-run average of the rewards. Games with discounted reward objectives were introduced by Shapley [Sha53] and have been studied in economics and also in systems theory [dAHM03]. The limit-average objectives have also been studied extensively in game theory [MN81].
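As a sketch (the formal definitions appear in Chapter 7), for a play s₀ s₁ s₂ … over a graph whose states carry rewards r: S → ℝ and a discount factor 0 < λ < 1, the two payoffs can be written as follows; the liminf convention for the limit-average payoff is one common choice assumed here.

```latex
% Discounted reward and limit-average (mean-payoff) of a play
% s_0 s_1 s_2 \ldots; the liminf convention is an assumption.
\begin{align*}
\text{discounted:}    &\quad (1-\lambda) \sum_{i \ge 0} \lambda^i\, r(s_i) \\
\text{limit-average:} &\quad \liminf_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} r(s_i)
\end{align*}
```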
Nonzero-sum games. In nonzero-sum games, both players may be winning. In this case,
the notion of rational behavior of the players is captured by Nash equilibria: a pair of
strategies for the two players is a Nash equilibrium if neither player can increase her payoff
by unilaterally switching her strategy [Jr50]. In stochastic games Nash equilibria exist in
some special cases, and in the general setting, the existence of ε-Nash equilibria, for ε > 0,
is investigated. A pair of strategies for the two players is an ε-Nash equilibrium, for ε > 0,
if neither player can increase her payoff by at least ε by switching strategy. We now present
the fundamental results on stochastic games, and then state the main contribution of the
thesis.
Previous results on turn-based deterministic games. Sure determinacy for turn-based deterministic games with Borel objectives was established by a deep result of Martin [Mar75], who showed that for complementary objectives for the players, the sure winning sets of the two players form a partition of the state space. For the special case of Müller objectives, the result of Gurevich-Harrington [GH82] showed that finite-memory sure-winning strategies exist for each player from their respective sure-winning set.
In the case of Rabin objectives, the existence of pure memoryless sure-winning strategies was established in [EJ88], and the results of [EJ88] also proved that turn-based deterministic games with Rabin and Streett objectives are NP-complete and coNP-complete, respectively. Zielonka [Zie98] used a tree representation of Müller objectives (referred to as the Zielonka tree) and presented an elegant analysis of turn-based deterministic games with Müller objectives. Using an insightful analysis of Zielonka's result, [DJW97] presented an optimal memory bound for pure sure-winning strategies in turn-based deterministic Müller games. The complexity of turn-based deterministic games with Müller objectives was studied in [HD05], where the problem was shown to be PSPACE-complete. The algorithmic study of turn-based deterministic games has received much attention in the literature. A few notable results are as follows: (a) the small progress measure algorithm [Jur00], the strategy improvement algorithm [VJ00], and the subexponential-time algorithm [JPZ06] for parity games; (b) algorithms for Streett and Rabin games [Hor05, KV98, PP06]; and (c) algorithms for Müller
games [Zie98, HD05].
Previous results on concurrent games. The optimal value determinacy for one-shot
games is the famous minmax theorem of von Neumann, and such games can be solved in
polynomial time using linear programming. For concurrent games sure-determinacy does
not hold, and the optimal value determinacy for concurrent games with Borel objectives was
established by Martin [Mar98]. Concurrent games with qualitative reachability and more
general parity objectives have been studied in [dAHK98, dAH00]. The sure, almost-sure, and limit-sure winning states can be computed in polynomial time for reachability objectives [dAHK98], and for parity objectives the problems are in NP ∩ coNP [dAH00]. The values of concurrent games with parity objectives were characterized by quantitative
µ-calculus formulas in [dAM01], and from the characterization a 3EXPTIME algorithm
was obtained to solve concurrent parity games. The reduction of Rabin, Streett and Müller
objectives to parity objectives (an exponential reduction) [Tho97] and the algorithm for
parity objectives yield a 4EXPTIME algorithm to solve concurrent Rabin, Streett and
Müller games. For the special case of turn-based stochastic games the algorithm of [dAM01]
can be shown to work in 2EXPTIME for parity objectives, and thus one could obtain a
3EXPTIME algorithm for turn-based stochastic Rabin, Streett and Müller games.
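For the one-shot case mentioned above, the value and an optimal mixed strategy can be computed by a standard linear program; the following sketch is for a zero-sum matrix game with payoff matrix (r_{ij}), where the row player maximizes.

```latex
% A standard LP for the value v of a zero-sum matrix game (r_{ij}):
% find a mixed row strategy x maximizing the guaranteed payoff.
\begin{align*}
\max\ v \quad \text{subject to} \quad
\sum_{i} x_i\, r_{ij} \ge v \ \ \text{for every column } j, \qquad
\sum_{i} x_i = 1, \qquad x_i \ge 0.
\end{align*}
```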
Previous results on quantitative objectives. The determinacy of concurrent stochastic games with discounted reward objectives was proved in [Sha53], and the determinacy
for limit-average objectives was proved in [MN81]. The existence of pure memoryless optimal strategies for turn-based deterministic games with limit-average objectives was shown
in [EM79]; and for turn-based stochastic games in [LL69]. The existence of pure memoryless strategies in turn-based stochastic games with discounted reward objectives can be
proved from the results of [Sha53]; see [FV97] for analysis of various classes of games with
discounted reward and limit-average objectives. The complexity of turn-based deterministic limit-average games has been studied in [ZP96]; also see [FV97] for algorithms for
turn-based stochastic games with discounted reward and limit-average objectives.
Previous results on nonzero-sum games. The existence of Nash equilibria in one-shot concurrent games is the celebrated result of Nash [Jr50]. The computation of Nash equilibria in one-shot games is PPAD-complete [DGP06, CD06]; also see [EY07] for related complexity results. Nash's theorem holds when the strategy space is convex and compact. However, for infinite games the strategy space is not compact, and hence Nash's result does not immediately extend to infinite games. In fact, for concurrent zero-sum reachability games Nash equilibria (which in zero-sum games correspond to optimal strategies) need not exist. In such cases one investigates ε-Nash equilibria, and ε-Nash equilibria for all ε > 0 are the best one can achieve. Exact Nash equilibria do exist in discounted stochastic games [Fin64]. For concurrent nonzero-sum games with payoffs defined by Borel sets, surprisingly little is known. Secchi and Sudderth [SS01] showed that exact Nash equilibria do exist when all players have payoffs defined by closed sets (“safety
objectives”). For the special case of two-player games, the existence of ε-Nash equilibria, for every ε > 0, is known for limit-average objectives [Vie00a, Vie00b] and for parity objectives [Cha05]. The existence of ε-Nash equilibria in n-player concurrent games with objectives in higher levels of the Borel hierarchy is an intriguing open problem.
Organization and new results of the thesis. We now present the organization of the
thesis and the main results of each chapter.
1. (Chapter 2). The basic definitions of various classes of games, objectives, strategies,
and the formal notion of determinacy is presented in Chapter 2.
2. (Chapter 3). In Chapter 3 we consider concurrent games with tail objectives (which are a generalization of Müller objectives) and prove several basic properties, e.g., we show that if there is a state with positive value, then there is some state with value 1. The properties we prove are useful in the analysis of later chapters.
3. (Chapter 4). In Chapter 4 we study turn-based stochastic games with Müller objectives. The main results of the chapter are as follows:
• we prove an optimal memory bound for pure optimal strategies in turn-based
stochastic Müller games;
• we show that the qualitative and quantitative analysis of turn-based stochastic Müller games are both PSPACE-complete (improving the previously known 3EXPTIME bound); and
• we present an improved memory bound for randomized optimal strategies as
compared to pure optimal strategies.
4. (Chapter 5). In Chapter 5 we study turn-based stochastic games with Rabin and
Streett objectives. The main results of the chapter are as follows:
• we show that the qualitative and quantitative analysis of turn-based stochastic games with Rabin and Streett objectives are NP-complete and coNP-complete, respectively (improving the previously known 3EXPTIME bound); and
• we present a strategy improvement algorithm for turn-based stochastic Rabin
and Streett games.
5. (Chapter 6). In Chapter 6 we study concurrent games with reachability objectives.
We present an elementary and combinatorial proof of the existence of memoryless ε-optimal strategies in concurrent games with reachability objectives, for all ε > 0. In contrast, the previous proofs of the result relied on deep results from analysis (such as the analysis of Puiseux series) [FV97]. The proof techniques we develop also lead to a
strategy improvement algorithm for concurrent reachability games.
6. (Chapter 7). In Chapter 7 we study the complexity of concurrent games with limit-average objectives and show that these games can be solved in EXPTIME. It also
follows from our results that concurrent games with discounted reward objectives can
be solved in PSPACE. To the best of our knowledge this is the first complexity result
on the solution of concurrent limit-average games. Also the techniques used in the
chapter are useful in the analysis for Chapter 8.
7. (Chapter 8). In Chapter 8 we study the complexity of concurrent games with
parity objectives and show that the quantitative analysis of concurrent parity games
can be achieved in PSPACE (improving the previous 3EXPTIME bound); and as a
consequence obtain an EXPSPACE algorithm for Rabin, Streett and Müller objectives
(as compared to the previously known 4EXPTIME bound).
8. (Chapter 9). In Chapter 9 we study games that are not strictly competitive. We present a new notion of equilibrium, called secure equilibrium, that captures conditional competitiveness and the
notion of adversarial external choice. We show that the maximal secure equilibrium payoff profile is unique for turn-based deterministic games, and present algorithms to compute such payoff profiles for ω-regular objectives. We then illustrate its application in the synthesis of
independent processes: we show that the notion of secure equilibria generalizes the
assume-guarantee style of reasoning in the game theoretic framework.
The relevant open problems for each chapter are listed along with the concluding remarks of the respective chapter.
Related topics. In this thesis we consider games played on graphs with finite state
spaces, where each player has perfect information about the state of the game. We briefly
discuss several extensions of such games which have been studied in the literature.
Beyond games for reactive systems. We have only discussed games played on graphs that are mainly used in the analysis of reactive systems. However, graph games are widely used in several other areas of computer science, such as Ehrenfeucht–Fraïssé games in finite-model theory, and network congestion games and auctions in the analysis of the internet [Pap01]. We keep our discussion limited to games related to the verification of reactive
systems, and now describe several extensions in this context.
Partial-information games. In the class of partial-information games, players have only partial information about the state of the game. Such games are much harder to solve than perfect-information games: for example, 2-player partial-information turn-based games are 2EXPTIME-complete for reachability objectives [Rei79], and several problems related to partial-information turn-based games with more than 2 players become undecidable [Rei79]. The results in [CH05] present a close connection between a sub-class
of partial-information turn-based games and perfect-information concurrent games. The
algorithmic analysis of partial-information turn-based games with ω-regular objectives has
been studied in [CDHR06]. The complexity of partial-information Markov decision processes
has been studied in [PT87].
Infinite-state games. There are several extensions of games played on finite state space
to games played on infinite state space. The notable of them are pushdown games and
timed games. In pushdown games the state of the game encodes an unbounded amount of information about the pushdown store (or stack); such games have been studied
in [Wal96]; also see [Wal04] for a survey. Pushdown games with stochastic transitions have
been studied in [EY05, EY06]. The class of timed games are played on finite state graphs,
but in continuous time with discrete transitions. The modeling of time by clocks makes the games infinite-state, and such games are studied in [MPS95, dAFH+03].
Logic and games. The connection between logical quantifiers and games is deep and well-established. Game theory also provides a useful framework to study properties of sets. The results of Martin [Mar75, Mar98] establishing Borel determinacy for 2-player and concurrent games illuminate several key properties of sets. The close connection between logics on trees and 2-player games is well-exposed in [Tho97]. The µ-calculus is a logic of fixed points and is expressive enough to capture all ω-regular objectives [Koz83]. Emerson and
Jutla [EJ91] established the equivalence of µ-calculus model checking and solving 2-player
parity games. Quantitative µ-calculus has been proposed in [dAM01] to solve concurrent
games with parity objectives, and in [MM02] to solve 2½-player games with parity objectives.
The model checking algorithm for the alternating temporal logic ATL requires game solving
procedures as sub-routines [AHK02].
Relationships between games. The relationship between games is an intriguing area of
research. The notions of abstraction of games [HMMR00, HJM03, CHJM05], refinement
relations between games [AHKV98], and distances between games [dAHM03] have been
explored in the literature.
Chapter 2
Definitions
In this chapter we present the definitions of several classes of game graphs, strategies, and objectives, and the notions of values and equilibria. We start with the definition of
game graphs.
2.1 Game Graphs
We first define turn-based game graphs, and then the more general class of con-
current game graphs. We start with some preliminary notation. For a finite set A, a probability distribution on A is a function δ: A → [0, 1] such that Σ_{a∈A} δ(a) = 1. We write Supp(δ) = {a ∈ A | δ(a) > 0} for the support set of δ. We denote the set of probability distributions on A by Dist(A).
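A minimal Python sketch of this notation, representing a distribution as a dictionary; the helper names are illustrative, not from the thesis.

```python
# Probability distribution on a finite set A as a dict, with its support.
# `is_distribution` and `support` are illustrative names.

def is_distribution(delta):
    """Check that delta: A -> [0,1] sums to 1 (up to rounding)."""
    return all(0.0 <= p <= 1.0 for p in delta.values()) and \
        abs(sum(delta.values()) - 1.0) < 1e-9

def support(delta):
    """Supp(delta) = {a in A | delta(a) > 0}."""
    return {a for a, p in delta.items() if p > 0}

delta = {"a": 0.5, "b": 0.5, "c": 0.0}
assert is_distribution(delta)
assert support(delta) == {"a", "b"}
```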
2.1.1 Turn-based probabilistic game graphs

We consider several classes of turn-based games, namely, two-player turn-based probabilistic games (2½-player games), two-player turn-based deterministic games (2-player games), and Markov decision processes (1½-player games).
Turn-based probabilistic game graphs. A turn-based probabilistic game graph (or
2½-player game graph) G = ((S, E), (S1, S2, SP), δ) consists of a directed graph (S, E), a
partition of the vertex set S into three subsets S1 , S2 , SP ⊆ S, and a probabilistic transition
function δ: SP → Dist(S). The vertices in S are called states. The state space S is finite.
The states in S1 are player-1 states; the states in S2 are player-2 states; and the states in
SP are probabilistic states. For all states s ∈ S, we define E(s) = {t ∈ S | (s, t) ∈ E} to
be the set of possible successor states. We require that E(s) ≠ ∅ for every nonprobabilistic
state s ∈ S1 ∪ S2 , and that E(s) = Supp(δ(s)) for every probabilistic state s ∈ SP . At
player-1 states s ∈ S1 , player 1 chooses a successor state from E(s); at player-2 states
s ∈ S2 , player 2 chooses a successor state from E(s); and at probabilistic states s ∈ SP , a
successor state is chosen according to the probability distribution δ(s).
The turn-based deterministic game graphs (or 2-player game graphs) are the special case of the 2½-player game graphs with SP = ∅. The Markov decision processes (MDPs for short; or 1½-player game graphs) are the special case of the 2½-player game graphs with either S1 = ∅ or S2 = ∅. We refer to the MDPs with S2 = ∅ as player-1 MDPs, and to the
MDPs with S1 = ∅ as player-2 MDPs. A game graph that is both deterministic and an
MDP is called a transition system (or 1-player game graph): a player-1 transition system
has only player-1 states; a player-2 transition system has only player-2 states.
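The following Python sketch encodes a small 2½-player game graph and checks the two requirements above (E(s) ≠ ∅ for nonprobabilistic states, and E(s) = Supp(δ(s)) for probabilistic states); the class and field names are illustrative.

```python
# A minimal encoding of a 2 1/2-player game graph; names are illustrative.

class TurnBasedGameGraph:
    def __init__(self, edges, s1, s2, sp, delta):
        self.edges = edges          # set of pairs (s, t), i.e., E
        self.s1, self.s2, self.sp = s1, s2, sp
        self.delta = delta          # maps s in SP to {t: probability}
        for s in s1 | s2:
            assert self.succ(s), "E(s) must be nonempty"
        for s in sp:
            supp = {t for t, p in delta[s].items() if p > 0}
            assert self.succ(s) == supp, "E(s) = Supp(delta(s))"

    def succ(self, s):
        """E(s) = {t | (s, t) in E}."""
        return {t for (u, t) in self.edges if u == s}

# Example: "s" is a player-1 state, "p" a probabilistic state (coin flip).
g = TurnBasedGameGraph(
    edges={("s", "p"), ("s", "t"), ("p", "s"), ("p", "t"), ("t", "t")},
    s1={"s"}, s2={"t"}, sp={"p"},
    delta={"p": {"s": 0.5, "t": 0.5}})
```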
2.1.2 Concurrent game graphs
Concurrent game graphs. A concurrent game graph (or a concurrent game structure)
G = (S, A, Γ1 , Γ2 , δ) consists of the following components:
• A finite state space S.
• A finite set A of moves or actions.
• Two move assignments Γ1, Γ2: S → 2^A \ ∅. For i ∈ {1, 2}, the player-i move assignment
Γi associates with every state s ∈ S a nonempty set Γi (s) ⊆ A of moves available to
player i at state s.
• A probabilistic transition function δ: S × A × A → Dist(S). At every state s ∈ S,
player 1 chooses a move a1 ∈ Γ1 (s), and simultaneously and independently player 2
chooses a move a2 ∈ Γ2 (s). A successor state is then chosen according to the probability distribution δ(s, a1 , a2 ).
For all states s ∈ S and all moves a1 ∈ Γ1 (s) and a2 ∈ Γ2 (s), we define Succ(s, a1 , a2 ) =
Supp(δ(s, a1 , a2 )) to be the set of possible successor states of s when the moves a1 and a2
are chosen. For a concurrent game graph, we define the set of edges as E = {(s, t) ∈ S × S |
(∃a1 ∈ Γ1 (s))(∃a2 ∈ Γ2 (s))(t ∈ Succ(s, a1 , a2 ))}, and as with turn-based game graphs, we
write E(s) = {t | (s, t) ∈ E} for the set of possible successors of a state s ∈ S.
We distinguish the following special classes of concurrent game graphs. The concurrent game graph G is deterministic if |Succ(s, a1 , a2 )| = 1 for all states s ∈ S and all
moves a1 ∈ Γ1 (s) and a2 ∈ Γ2 (s). A state s ∈ S is a turn-based state if there exists a player
i ∈ {1, 2} such that |Γi (s)| = 1; that is, player i has no choice of moves at s. If |Γ2 (s)| = 1,
then s is a player-1 turn-based state; and if |Γ1 (s)| = 1, then s is a player-2 turn-based
state. The concurrent game graph G is turn-based if every state in S is a turn-based state.
Note that the turn-based concurrent game graphs are equivalent to the turn-based probabilistic game graphs: to obtain a 2½-player game graph from a turn-based concurrent game
graph G, for every player-i turn-based state s of G, where i ∈ {1, 2}, introduce |Γi (s)| many
probabilistic successor states of s. Moreover, the concurrent game graphs that are both
turn-based and deterministic are equivalent to the 2-player game graphs.
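As a small illustration (with hypothetical names), the following sketch builds a concurrent game graph for a matching-pennies-style interaction and derives Succ and E as defined above.

```python
# Sketch of a concurrent game graph; names are illustrative.

def succ(delta, s, a1, a2):
    """Succ(s, a1, a2) = Supp(delta(s, a1, a2))."""
    return {t for t, p in delta[(s, a1, a2)].items() if p > 0}

def edges(states, gamma1, gamma2, delta):
    """E = {(s, t) | some a1 in Gamma1(s), a2 in Gamma2(s) with t in Succ(s, a1, a2)}."""
    return {(s, t)
            for s in states
            for a1 in gamma1[s] for a2 in gamma2[s]
            for t in succ(delta, s, a1, a2)}

states = {"s", "win", "lose"}
gamma1 = {"s": {"h", "t"}, "win": {"h"}, "lose": {"h"}}
gamma2 = {"s": {"h", "t"}, "win": {"h"}, "lose": {"h"}}
delta = {("s", "h", "h"): {"win": 1.0}, ("s", "t", "t"): {"win": 1.0},
         ("s", "h", "t"): {"lose": 1.0}, ("s", "t", "h"): {"lose": 1.0},
         ("win", "h", "h"): {"win": 1.0}, ("lose", "h", "h"): {"lose": 1.0}}
assert ("s", "win") in edges(states, gamma1, gamma2, delta)
```

Note that "win" and "lose" are turn-based states in the sense above, since both players have a single move there.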
To measure the complexity of algorithms and problems, we need to define the size of game graphs. We do this for the case that all transition probabilities can be specified as rational numbers. Then the size of a concurrent game graph G is equal to the size of the probabilistic transition function δ, that is, |G| = Σ_{s∈S} Σ_{a1∈Γ1(s)} Σ_{a2∈Γ2(s)} Σ_{t∈S} |δ(s, a1, a2)(t)|, where |δ(s, a1, a2)(t)| denotes the space required to specify a rational probability value.
2.2 Strategies
When choosing their moves, the players follow recipes that are called strategies.
We define strategies both for 2½-player game graphs and for concurrent game graphs. On a concurrent game graph, the players choose moves from a set A of moves, while on a 2½-player game graph, they choose successor states from a set S of states. Hence, for 2½-player game graphs, we define the set of moves as A = S. For 2½-player game graphs, a player-1 strategy
prescribes the moves that player 1 chooses at the player-1 states S1 , and a player-2 strategy
prescribes the moves that player 2 chooses at the player-2 states S2 . For concurrent game
graphs, both players choose moves at every state, and hence for concurrent game graphs,
we define the sets of player-1 states and player-2 states as S1 = S2 = S.
Consider a game graph G. A player-1 strategy on G is a function σ: S* · S1 → Dist(A) that assigns to every nonempty finite sequence ~s ∈ S* · S1 of states ending in a player-1 state a probability distribution σ(~s) over the moves A. By following the strategy σ, whenever the history of a game played on G is ~s, player 1 chooses the next move according to the probability distribution σ(~s). A strategy must prescribe only available moves. Hence, for all state sequences ~s1 ∈ S* and all states s ∈ S1, if σ(~s1 · s)(a) > 0, then the following condition must hold: a ∈ E(s) for 2½-player game graphs G, and a ∈ Γ1(s) for concurrent game graphs G. Symmetrically, a player-2 strategy on G is a function π: S* · S2 → Dist(A) such that if π(~s1 · s)(a) > 0, then a ∈ E(s) for 2½-player game graphs G, and a ∈ Γ2(s) for concurrent game graphs G. We write Σ for the set of player-1 strategies,
and Π for the player-2 strategies on G. Note that |Π| = 1 if G is a player-1 MDP, and
|Σ| = 1 if G is a player-2 MDP.
2.2.1 Types of strategies
We classify strategies according to their use of randomization and memory.
Use of randomization. Strategies that do not use randomization are called pure. A
player-1 strategy σ is pure (or deterministic) if for all state sequences ~s ∈ S ∗ · S1 , there
exists a move a ∈ A such that σ(~s)(a) = 1. The pure strategies for player 2 are defined
analogously. We denote by ΣP the set of pure player-1 strategies, and by ΠP the set of pure
player-2 strategies. A strategy that is not necessarily pure is sometimes called randomized.
Use of memory. Strategies in general require memory to remember the history of a
game. The following alternative definition of strategies makes this explicit. Let M be a set
called memory. A player-1 strategy σ = (σu, σm) can be specified as a pair of functions: a
memory-update function σu : S ×M → M, which given the current state of the game and the
memory, updates the memory with information about the current state; and a next-move
function σm : S1 × M → Dist(A), which given the current state and the memory, prescribes
the next move of the player. The player-1 strategy σ is finite-memory if the memory M is
a finite set; and the strategy σ is memoryless (or positional) if the memory M is singleton,
i.e., |M | = 1. A finite-memory strategy remembers only a finite amount of information
about the infinitely many different possible histories of the game; a memoryless strategy is
independent of the history of the game and depends only on the current state of the game.
Note that a memoryless player-1 strategy can be represented as a function σ: S1 → Dist(A). A memoryless strategy σ is uniform memoryless if it is a uniform distribution over its support, i.e., for all states s we have σ(s)(a) = 0 if a ∉ Supp(σ(s)) and σ(s)(a) = 1/|Supp(σ(s))| if a ∈ Supp(σ(s)). We denote by ΣF the set of finite-memory player-1 strategies, and by ΣM and ΣUM the sets of memoryless and uniform memoryless player-1 strategies. The finite-memory player-2 strategies ΠF, the memoryless player-2 strategies
ΠM and uniform memoryless player-2 strategies ΠUM are defined analogously.
A pure finite-memory strategy is a pure strategy that is finite-memory; we write ΣPF =
ΣP ∩ ΣF for the pure finite-memory player-1 strategies, and ΠPF for the corresponding
player-2 strategies. A pure memoryless strategy is a pure strategy that is memoryless. The
pure memoryless strategies use neither randomization nor memory; they are the simplest
strategies we consider. Note that a pure memoryless player-1 strategy can be represented
as a function σ: S1 → A. We write ΣPM = ΣP ∩ ΣM for the pure memoryless player-1
strategies, and ΠPM for the corresponding class of simple player-2 strategies.
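A Python sketch of this taxonomy, with illustrative names: a memoryless strategy is lifted to a history-based one, and a finite-memory strategy is assembled from a memory-update function and a next-move function.

```python
# Sketch of the strategy taxonomy; names are illustrative.

def memoryless(table):
    """Lift sigma: S1 -> Dist(A) to a history-based strategy."""
    return lambda history: table[history[-1]]

def finite_memory(m0, update, next_move):
    """Replay the memory-update function over the history, then choose."""
    def sigma(history):
        m = m0
        for s in history[:-1]:
            m = update(s, m)
        return next_move(history[-1], m)
    return sigma

# A pure memoryless strategy: at state "s" always play move "h".
sigma = memoryless({"s": {"h": 1.0}})
assert sigma(("t", "s")) == {"h": 1.0}

# Finite memory: remember (one bit) whether "goal" was ever visited.
tau = finite_memory(False, lambda s, m: m or s == "goal",
                    lambda s, m: {"h": 1.0} if m else {"t": 1.0})
assert tau(("goal", "s")) == {"h": 1.0}
```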
2.2.2 Probability space and outcomes of strategies
A path of the game graph G is an infinite sequence ω = ⟨s0, s1, s2, …⟩ of states in S such that (sk, sk+1) ∈ E for all k ≥ 0. We denote the set of paths of G by Ω. Once a starting state s ∈ S and strategies σ ∈ Σ and π ∈ Π for the two players are fixed, the result of the game is a random walk in G, denoted ω_s^{σ,π}.
Probability space of strategies. Given a finite sequence x = ⟨s0, s1, …, sk⟩ of states, the cone for x is the set Cone(x) = {⟨s′0, s′1, …⟩ | (∀ 0 ≤ i ≤ k)(si = s′i)} of paths with prefix x. Let U be the set of cones for all finite paths of G. The set U is the set of basic open sets in S^ω. Let F be the Borel σ-field generated by U, i.e., the smallest collection of sets that contains U, contains Ω, and is closed under complementation, countable union, and countable intersection. Then (Ω, F) is a σ-algebra. Given strategies σ and π for player 1 and player 2, respectively, and a state s, we define a function µ_s^{σ,π}: U → [0, 1] as follows:
• Cones of length 1: µ_s^{σ,π}(Cone(s′)) = 1 if s = s′, and 0 otherwise.

• Cones of length greater than 1: given a finite sequence ω_{k+1} = ⟨s0, s1, …, sk, sk+1⟩, let ω_k = ⟨s0, s1, …, sk⟩ and

µ_s^{σ,π}(Cone(ω_{k+1})) = µ_s^{σ,π}(Cone(ω_k)) · Σ_{a1∈Γ1(sk), a2∈Γ2(sk)} δ(sk, a1, a2)(sk+1) · σ(ω_k)(a1) · π(ω_k)(a2).
The function µ_s^{σ,π} is a measure, and there is a unique extension of µ_s^{σ,π} to a probability measure on F (by the Carathéodory extension theorem [Bil95]). We denote this probability measure on F, induced by the strategies σ and π and the starting state s, by Pr_s^{σ,π}. Then (Ω, F, Pr_s^{σ,π}) is a probability space. An event Φ is a measurable set of paths, i.e., Φ ∈ F. Given an event Φ, Pr_s^{σ,π}(Φ) denotes the probability that the random walk ω_s^{σ,π} is in Φ. For a measurable function f: Ω → ℝ we denote by E_s^{σ,π}[f] the expectation of the function f under the probability measure Pr_s^{σ,π}(·). For i ≥ 0, we denote by Xi: Ω → S the random variable denoting the i-th state along a path, and by Y1,i and Y2,i the random variables denoting the actions played in the i-th round of the play by player 1 and player 2, respectively.
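A Python sketch of the recursive definition of µ_s^{σ,π} on cones for a concurrent game graph; the function and variable names are illustrative.

```python
# Cone probability by the recursion above; `delta` maps (s, a1, a2) to a
# successor distribution, and sigma/pi map histories to move distributions.

def cone_probability(s, x, delta, sigma, pi):
    """Probability that the random walk from s has prefix x."""
    if len(x) == 1:
        return 1.0 if x[0] == s else 0.0
    prefix, t = x[:-1], x[-1]
    step = sum(delta[(prefix[-1], a1, a2)].get(t, 0.0) * p1 * p2
               for a1, p1 in sigma(prefix).items()
               for a2, p2 in pi(prefix).items())
    return cone_probability(s, prefix, delta, sigma, pi) * step

# Matching pennies with uniformly random strategies for both players.
uniform = lambda history: {"h": 0.5, "t": 0.5}
delta = {("s", "h", "h"): {"w": 1.0}, ("s", "t", "t"): {"w": 1.0},
         ("s", "h", "t"): {"l": 1.0}, ("s", "t", "h"): {"l": 1.0}}
assert cone_probability("s", ("s", "w"), delta, uniform, uniform) == 0.5
```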
Outcomes of strategies. Consider two strategies σ ∈ Σ and π ∈ Π on a game graph G, and let ω = ⟨s0, s1, s2, …⟩ be a path of G. The path ω is (σ, π)-possible for a 2½-player game graph G if for every k ≥ 0 the following two conditions hold: if sk ∈ S1, then σ(s0 s1 … sk)(sk+1) > 0; and if sk ∈ S2, then π(s0 s1 … sk)(sk+1) > 0. The path ω is (σ, π)-possible for a concurrent game graph G if for every k ≥ 0, there exist moves a1 ∈ Γ1(sk) and a2 ∈ Γ2(sk) for the two players such that σ(s0 s1 … sk)(a1) > 0 and π(s0 s1 … sk)(a2) > 0 and sk+1 ∈ Succ(sk, a1, a2). Given a state s ∈ S and two strategies σ ∈ Σ and π ∈ Π, we denote by Outcome(s, σ, π) ⊆ Ω the set of (σ, π)-possible paths whose first state is s. Note that Outcome(s, σ, π) is a probability-1 event, i.e., Pr_s^{σ,π}(Outcome(s, σ, π)) = 1.
Given a game graph G and a player-1 strategy σ ∈ Σ, we write Gσ for the game
played on G under the constraint that player 1 follows the strategy σ. Analogously, given G
and a player-2 strategy π ∈ Π, we write Gπ for the game played on G under the constraint
that player 2 follows the strategy π. Observe that for a 2½-player game graph G or a
concurrent game graph G and a memoryless player-1 strategy σ ∈ Σ, the result Gσ is a
player-2 MDP. Similarly, for a player-2 MDP G and a memoryless player-2 strategy π ∈ Π,
the result Gπ is a Markov chain. Hence, if G is a 2½-player game graph or a concurrent
game graph and the two players follow memoryless strategies σ and π, then the result
Gσ,π = (Gσ )π is a Markov chain. Also the following observation will be used later. Given a
game graph G and a strategy in Σ∪Π with finite memory M, the strategy can be interpreted
as a memoryless strategy in the synchronous product G × M of the game graph G with the
memory M. Hence the above observation (on memoryless strategies) also extends to finite-memory strategies, i.e., if player 1 plays a finite-memory strategy σ, then Gσ is a player-2
MDP, and if both players follow finite-memory strategies, then we have a Markov chain.
2.3 Objectives
Consider a game graph G. Player-1 and player-2 objectives for G are measurable
sets Φ1 , Φ2 ⊆ Ω of winning paths for the two players: player i, for i ∈ {1, 2}, wins the game
played on the graph G with the objective Φi iff the infinite path in Ω that results from
playing the game, lies inside the set Φi . In the case of zero-sum games, the objectives of
the two players are strictly competitive, that is, Φ2 = Ω \ Φ1 . A general class of objectives
are the Borel objectives. A Borel objective Φ ⊆ Ω is a Borel set in the Cantor topology
on the set S ω of infinite state sequences (note that Ω ⊆ S ω ). An important subclass of
the Borel objectives are the ω-regular objectives, which lie in the first 2½ levels of the Borel hierarchy (i.e., in the intersection of Σ^0_3 and Π^0_3). The ω-regular objectives are of special
interest for the verification and synthesis of reactive systems [MP92]. In particular, the
following specifications of winning conditions for the players define ω-regular objectives,
and subclasses thereof [Tho97].
Reachability and safety objectives. A reachability specification for the game graph
G is a set T ⊆ S of states, called target states. The reachability specification T requires
that some state in T be visited. Thus, the reachability specification T defines the set Reach(T) = {⟨s0, s1, s2, …⟩ ∈ Ω | (∃ k ≥ 0)(sk ∈ T)} of winning paths; this set is called a reachability objective. A safety specification for G is likewise a set U ⊆ S of states;
they are called safe states. The safety specification U requires that only states in U be
visited. Formally, the safety objective defined by U is the set Safe(U) = {⟨s0, s1, …⟩ ∈ Ω | (∀ k ≥ 0)(sk ∈ U)} of winning paths. Note that reachability and safety are dual objectives: Safe(U) = Ω \ Reach(S \ U).
Büchi and coBüchi objectives. A Büchi specification for G is a set B ⊆ S of states,
which are called Büchi states. The Büchi specification B requires that some state in B be
visited infinitely often. For a path ω = ⟨s0, s1, s2, …⟩, we write Inf(ω) = {s ∈ S | sk = s for infinitely many k ≥ 0} for the set of states that occur infinitely often in ω. Thus, the Büchi objective defined by B is the set Büchi(B) = {ω ∈ Ω | Inf(ω) ∩ B ≠ ∅} of winning
paths. The dual of a Büchi specification is a coBüchi specification C ⊆ S, which specifies a
set of so-called coBüchi states. The coBüchi specification C requires that the states outside
C be visited only finitely often. Formally, the coBüchi defined by C is the set coBüchi(C) =
{ω ∈ Ω | Inf(ω) ⊆ C} of winning paths. Note that coBüchi(C) = Ω \ Büchi(S \ C). It is
also worth noting that reachability and safety objectives can be turned into both Büchi and
coBüchi objectives, by slightly modifying the game graph (for example if every target state
s ∈ T is made a sink state, then we have Reach(T ) = Büchi(T )).
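On ultimately periodic ("lasso") paths ⟨prefix · cycle^ω⟩, the set Inf(ω) is exactly the set of states on the cycle, so the objectives above reduce to finite checks. The following minimal Python sketch, with illustrative names, makes this concrete.

```python
# Sketch: checking the objectives on lasso paths prefix . cycle^omega,
# where Inf(omega) is exactly the set of states on the cycle.

def inf_states(cycle):
    return set(cycle)

def buechi(prefix, cycle, B):
    return bool(inf_states(cycle) & set(B))      # some Buechi state recurs

def co_buechi(prefix, cycle, C):
    return inf_states(cycle) <= set(C)           # eventually only C states

def reach(prefix, cycle, T):
    return bool(set(prefix + cycle) & set(T))    # some target state visited

def safe(prefix, cycle, U):
    return set(prefix + cycle) <= set(U)         # only safe states visited

# Example: the path s0 s1 (s2 s3)^omega.
assert buechi(['s0', 's1'], ['s2', 's3'], B={'s2'})
assert not safe(['s0', 's1'], ['s2', 's3'], U={'s0', 's1', 's2'})
assert reach(['s0', 's1'], ['s2', 's3'], T={'s1'})
```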
Rabin and Streett objectives. We use colors to define objectives independent of game graphs. For a set C of colors, we write [·]: C → 2^S for a function that maps each color to a set of states. Inversely, given a set U ⊆ S of states, we write [U] = {c ∈ C | [c] ∩ U ≠ ∅} for the set of colors that occur in U. Note that a state can have multiple colors. A Rabin objective is specified as a set P = {(e1, f1), …, (ed, fd)} of pairs of colors ei, fi ∈ C. Intuitively, the Rabin condition P requires that for some 1 ≤ i ≤ d, all states of color ei be visited finitely often and some state of color fi be visited infinitely often. Let [P] = {(E1, F1), …, (Ed, Fd)} be the corresponding set of so-called Rabin pairs, where Ei = [ei] and Fi = [fi] for all 1 ≤ i ≤ d. Formally, the set of winning plays is Rabin(P) = {ω ∈ Ω | ∃ 1 ≤ i ≤ d. (Inf(ω) ∩ Ei = ∅ ∧ Inf(ω) ∩ Fi ≠ ∅)}. Without loss of generality, we require that ⋃_{i∈{1,2,…,d}} (Ei ∪ Fi) = S. The parity (or Rabin-chain) objectives are the special case of Rabin objectives such that E1 ⊂ F1 ⊂ E2 ⊂ F2 ⊂ … ⊂ Ed ⊂ Fd. A Streett objective is again specified as a set P = {(e1, f1), …, (ed, fd)} of pairs of colors. The Streett condition P requires that for each 1 ≤ i ≤ d, if some state of color fi is visited infinitely often, then some state of color ei be visited infinitely often. Formally, the set of winning plays is Streett(P) = {ω ∈ Ω | ∀ 1 ≤ i ≤ d. (Inf(ω) ∩ Ei ≠ ∅ ∨ Inf(ω) ∩ Fi = ∅)}, for the set [P] = {(E1, F1), …, (Ed, Fd)} of so-called Streett pairs. Note that the Rabin and Streett objectives are dual, i.e., the complement of a Rabin objective is a Streett objective, and vice versa.
Parity objectives. A parity specification for G consists of a nonnegative integer d and a function p: S → {0, 1, 2, …, 2d}, which assigns to every state of G an integer between 0 and 2d. For a state s ∈ S, the value p(s) is called the priority of s. We assume without loss of generality that p^{-1}(j) ≠ ∅ for all 0 < j ≤ 2d; this implies that a parity specification is completely specified by the priority function p (and d does not need to be specified explicitly). The positive integer 2d + 1 is referred to as the number of priorities of p. The parity specification p requires that the minimum priority of all states that are visited infinitely often is even. Formally, the parity objective defined by p is the set Parity(p) = {ω ∈ Ω | min{p(s) | s ∈ Inf(ω)} is even} of winning paths. Note that for a parity objective Parity(p), the complementary objective Ω \ Parity(p) is again a parity objective: Ω \ Parity(p) = Parity(p + 1), where the priority function p + 1 is defined by (p + 1)(s) = p(s) + 1 for all states s ∈ S (if p^{-1}(0) = ∅, then use p − 1 instead of p + 1). This self-duality of parity objectives is often convenient when solving games. It is also worth noting that the Büchi objectives are parity objectives with two priorities (let p^{-1}(0) = B and p^{-1}(1) = S \ B), and the coBüchi objectives are parity objectives with three priorities (let p^{-1}(0) = ∅, p^{-1}(1) = S \ C, and p^{-1}(2) = C).
Parity objectives are also called Rabin-chain objectives, as they are a special case of Rabin objectives [Tho97]: if the sets of a Rabin specification P = {(E1, F1), …, (Ed, Fd)} form a chain E1 ⊊ F1 ⊊ E2 ⊊ F2 ⊊ … ⊊ Ed ⊊ Fd, then Rabin(P) = Parity(p) for the priority function p: S → {0, 1, …, 2d} that, for all 1 ≤ j ≤ d, assigns to each state in Ej \ F_{j−1} the priority 2j − 1, and to each state in Fj \ Ej the priority 2j, where F0 = ∅. Conversely, given a priority function p: S → {0, 1, …, 2d}, we can construct a chain E1 ⊊ F1 ⊊ … ⊊ E_{d+1} ⊊ F_{d+1} of Rabin sets such that Parity(p) = Rabin({(E1, F1), …, (E_{d+1}, F_{d+1})}) as follows: let E1 = ∅ and F1 = p^{-1}(0), and for all 2 ≤ j ≤ d + 1, let Ej = F_{j−1} ∪ p^{-1}(2j − 3) and Fj = Ej ∪ p^{-1}(2j − 2). Hence, the parity objectives are a subclass of the Rabin objectives that is closed under complementation. It follows that every parity objective is both a Rabin objective and a Streett objective. The parity objectives are of special interest, because every ω-regular objective can be turned into a parity objective by modifying the game graph (take the synchronous product of the game graph with a deterministic parity automaton that accepts the ω-regular objective) [Mos84].
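The priority-to-chain construction above is short enough to state as code. The following Python sketch (illustrative names; priorities are given as a dictionary) builds the chain of Rabin pairs exactly as in the preceding paragraph and evaluates the Rabin condition on a given Inf set.

```python
# Sketch of the parity-to-Rabin-chain construction: given a priority
# function p: S -> {0,...,2d}, build pairs (E_1, F_1), ..., (E_{d+1}, F_{d+1})
# with E_1 = {}, F_1 = p^{-1}(0), and, for 2 <= j <= d+1,
# E_j = F_{j-1} | p^{-1}(2j-3) and F_j = E_j | p^{-1}(2j-2).

def parity_to_rabin_chain(p, d):
    inv = lambda j: {s for s, pr in p.items() if pr == j}
    pairs = [(set(), inv(0))]                    # (E_1, F_1)
    for j in range(2, d + 2):
        E = pairs[-1][1] | inv(2 * j - 3)        # F_{j-1} plus odd priority
        F = E | inv(2 * j - 2)                   # E_j plus even priority
        pairs.append((E, F))
    return pairs

def rabin_accepts(pairs, inf_set):
    # Rabin condition: some pair (E, F) with Inf disjoint from E, meeting F.
    return any(not (inf_set & E) and (inf_set & F) for E, F in pairs)

# Example with d = 1 (priorities 0, 1, 2) over four states.
p = {'s0': 0, 's1': 1, 's2': 2, 's3': 1}
pairs = parity_to_rabin_chain(p, d=1)
# Inf = {s1, s2}: minimum priority 1 is odd, so parity (and Rabin) reject;
# Inf = {s2}: minimum priority 2 is even, so both accept.
assert not rabin_accepts(pairs, {'s1', 's2'})
assert rabin_accepts(pairs, {'s2'})
```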
Müller and upward-closed objectives. The most general form for defining ω-regular objectives is that of Müller specifications. A Müller specification for the game graph G is a set M ⊆ 2^S of sets of states. The sets in M are called Müller sets. The Müller specification M requires that the set of states that are visited infinitely often is one of the Müller sets. Formally, the Müller specification M defines the Müller objective Müller(M) = {ω ∈ Ω | Inf(ω) ∈ M}. Note that Rabin and Streett objectives are special cases of Müller objectives. The upward-closed objectives form a subclass of Müller objectives, with the restriction that the set M is upward-closed. Formally, a set UC ⊆ 2^S is upward-closed if the following condition holds: if U ∈ UC and U ⊆ Z ⊆ S, then Z ∈ UC. Given an upward-closed set UC ⊆ 2^S, the upward-closed objective is defined as the set UpClo(UC) = {ω ∈ Ω | Inf(ω) ∈ UC} of winning plays.
2.4 Game Values

Given a state s and an objective Ψ1 for player 1, the maximal probability with which player 1 can ensure that Ψ1 holds from s is the value of the game at s for player 1. Formally, given a game graph G with objectives Ψ1 for player 1 and Ψ2 for player 2, we define the value functions Val_1 and Val_2 for players 1 and 2, respectively, as follows:

Val_1^G(Ψ1)(s) = sup_{σ∈Σ} inf_{π∈Π} Pr_s^{σ,π}(Ψ1);
Val_2^G(Ψ2)(s) = sup_{π∈Π} inf_{σ∈Σ} Pr_s^{σ,π}(Ψ2).

If the game graph G is clear from the context, then we drop the superscript G. Given a game graph G, a strategy σ for player 1, and an objective Ψ1, we use the following notation:

Val_1^σ(Ψ1)(s) = inf_{π∈Π} Pr_s^{σ,π}(Ψ1).

Given a game graph G, a strategy σ for player 1 is optimal from state s for objective Ψ1 if

Val_1(Ψ1)(s) = Val_1^σ(Ψ1)(s) = inf_{π∈Π} Pr_s^{σ,π}(Ψ1).

Given a game graph G, a strategy σ for player 1 is ε-optimal, for ε ≥ 0, from state s for objective Ψ1 if

Val_1(Ψ1)(s) − ε ≤ inf_{π∈Π} Pr_s^{σ,π}(Ψ1).

Note that an optimal strategy is ε-optimal with ε = 0. The optimal and ε-optimal strategies for player 2 are defined analogously. Computing values, optimal strategies, and ε-optimal strategies is referred to as the quantitative analysis of games.
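In the simplest, one-shot setting, the sup/inf definition of the value can be computed directly: the value of a finite zero-sum matrix game is the solution of a small linear program. The following Python sketch uses the standard LP encoding for matrix games (it assumes numpy and scipy are available, and is an illustration only, not a construction from this dissertation).

```python
# Sketch: the value sup_x inf_y x^T A y of a one-shot zero-sum matrix game,
# computed by linear programming (row player maximizes).
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Maximize v subject to: for every column j, sum_i x_i A[i][j] >= v,
    with x a probability distribution over rows."""
    n, m = A.shape
    c = np.zeros(n + 1); c[-1] = -1.0                      # minimize -v
    A_ub = np.hstack([-A.T, np.ones((m, 1))])              # v - A[:,j].x <= 0
    b_ub = np.zeros(m)
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])  # sum x_i = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]              # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]

# Matching pennies with payoffs as win probabilities: the value is 1/2.
A = np.array([[1.0, 0.0], [0.0, 1.0]])
print(matrix_game_value(A))  # ~0.5
```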
Sure, almost-sure, positive, and limit-sure winning strategies. Given a game graph G with an objective Ψ1 for player 1, a strategy σ is a sure winning strategy for player 1 from a state s if for every strategy π of player 2 we have Outcome(s, σ, π) ⊆ Ψ1. A strategy σ is an almost-sure winning strategy for player 1 from a state s for the objective Ψ1 if for every strategy π of player 2 we have Pr_s^{σ,π}(Ψ1) = 1. A strategy σ is positive winning for player 1 from the state s for the objective Ψ1 if for every player-2 strategy π we have Pr_s^{σ,π}(Ψ1) > 0. A family Σ^C of strategies is limit-sure winning for player 1 from a state s for the objective Ψ1 if we have sup_{σ∈Σ^C} inf_{π∈Π} Pr_s^{σ,π}(Ψ1) = 1. The sure winning, almost-sure winning, positive winning, and limit-sure winning strategies for player 2 are defined analogously. Given a game graph G and an objective Ψ1 for player 1, the sure winning set Sure_1^G(Ψ1) for player 1 is the set of states from which player 1 has a sure winning strategy. Similarly, the almost-sure winning set Almost_1^G(Ψ1) for player 1 is the set of states from which player 1 has an almost-sure winning strategy, the positive winning set Positive_1^G(Ψ1) for player 1 is the set of states from which player 1 has a positive winning strategy, and the limit-sure winning set Limit_1^G(Ψ1) for player 1 is the set of states from which player 1 has limit-sure winning strategies. The sure winning set Sure_2^G(Ψ2), the almost-sure winning set Almost_2^G(Ψ2), the positive winning set Positive_2^G(Ψ2), and the limit-sure winning set Limit_2^G(Ψ2) with objective Ψ2 for player 2 are defined analogously. Again, if the game graph G is clear from the context we drop G from the superscript. It follows from the definitions that for all 2½-player and concurrent game graphs and all objectives Ψ1 and Ψ2, we have Sure_1(Ψ1) ⊆ Almost_1(Ψ1) ⊆ Limit_1(Ψ1) ⊆ Positive_1(Ψ1) and Sure_2(Ψ2) ⊆ Almost_2(Ψ2) ⊆ Limit_2(Ψ2) ⊆ Positive_2(Ψ2). A game is sure winning (resp. almost-sure winning and limit-sure winning) for player i, for i ∈ {1, 2}, if every state is sure winning (resp. almost-sure winning and limit-sure winning) for player i. Computing sure winning, almost-sure winning, positive winning, and limit-sure winning sets and strategies is referred to as the qualitative analysis of games.
Sufficiency of a family of strategies. Let C ∈ {P, M, F, PM, PF} and consider the family Σ^C of special strategies for player 1. We say that the family Σ^C suffices with respect to an objective Ψ1 on a class G of game graphs for

• sure winning if for every game graph G ∈ G, for every state s ∈ Sure_1(Ψ1) there is a player-1 strategy σ ∈ Σ^C such that for every player-2 strategy π ∈ Π we have Outcome(s, σ, π) ⊆ Ψ1;

• almost-sure winning if for every game graph G ∈ G, for every state s ∈ Almost_1(Ψ1) there is a player-1 strategy σ ∈ Σ^C such that for every player-2 strategy π ∈ Π we have Pr_s^{σ,π}(Ψ1) = 1;

• positive winning if for every game graph G ∈ G, for every state s ∈ Positive_1(Ψ1) there is a player-1 strategy σ ∈ Σ^C such that for every player-2 strategy π ∈ Π we have Pr_s^{σ,π}(Ψ1) > 0;

• limit-sure winning if for every game graph G ∈ G, for every state s ∈ Limit_1(Ψ1) we have sup_{σ∈Σ^C} inf_{π∈Π} Pr_s^{σ,π}(Ψ1) = 1;

• optimality if for every game graph G ∈ G, for every state s ∈ S there is a player-1 strategy σ ∈ Σ^C such that Val_1(Ψ1)(s) = inf_{π∈Π} Pr_s^{σ,π}(Ψ1);

• ε-optimality if for every game graph G ∈ G, for every state s ∈ S there is a player-1 strategy σ ∈ Σ^C such that Val_1(Ψ1)(s) − ε ≤ inf_{π∈Π} Pr_s^{σ,π}(Ψ1).

The notion of sufficiency for the size of finite-memory strategies is obtained by referring to the size of the memory M of the strategies. The notions of sufficiency of strategies for player 2 are defined analogously.
For sure winning, 1½-player and 2½-player games coincide with 2-player (turn-based deterministic) games where the random player (who chooses the successor at the probabilistic states) is interpreted as an adversary, i.e., as player 2. This is formalized by the proposition below.

Proposition 1 If a family Σ^C of strategies suffices for sure winning with respect to an objective Φ on all 2-player game graphs, then the family Σ^C suffices for sure winning with respect to Φ also on all 1½-player and 2½-player game graphs.

The following proposition states that randomization is not necessary for sure winning.

Proposition 2 If a family Σ^C of strategies suffices for sure winning with respect to a Borel objective Φ on all concurrent game graphs, then the family Σ^C ∩ Σ^P of pure strategies suffices for sure winning with respect to Φ on all concurrent game graphs.
2.5 Determinacy
The fundamental concept of rationality in zero-sum games is captured by the notion of optimal and ε-optimal strategies. The key result that establishes the existence of ε-optimal strategies, for all ε > 0, in zero-sum games is the determinacy result, which states that the sum of the values of the players is 1 at all states, i.e., for all states s ∈ S we have Val_1(Ψ1)(s) + Val_2(Ψ2)(s) = 1. The determinacy result implies the following equality:

sup_{σ∈Σ} inf_{π∈Π} Pr_s^{σ,π}(Ψ1) = inf_{π∈Π} sup_{σ∈Σ} Pr_s^{σ,π}(Ψ1).

The determinacy result also guarantees the existence of ε-optimal strategies, for all ε > 0, for both players. A deep result of Martin [Mar98] established that determinacy holds for all concurrent games with Borel objectives (see Theorem 1). A more refined notion of determinacy is sure determinacy, which states that for an objective Ψ1 we have Sure_1(Ψ1) = S \ Sure_2(Ω \ Ψ1). Sure determinacy holds for turn-based deterministic games with all Borel objectives [Mar75]; however, sure determinacy does not hold for 2½-player games and concurrent games.
Theorem 1 For all Borel objectives Ψ1 and the complementary objective Ψ2 = Ω \ Ψ1, the following assertions hold.

1. ([Mar75]) For all 2-player game graphs, the sure winning sets Sure_1(Ψ1) and Sure_2(Ψ2) form a partition of the state space, i.e., Sure_1(Ψ1) = S \ Sure_2(Ψ2), and the family of pure strategies suffices for sure winning.

2. ([Mar98]) For all concurrent game structures and for all states s we have Val_1(Ψ1)(s) + Val_2(Ψ2)(s) = 1.
Given a game graph G, let us denote by Σε(Ψ1) and Πε(Ψ2) the sets of ε-optimal strategies for player 1 for objective Ψ1 and for player 2 for objective Ψ2, respectively. We obtain the following corollary from Theorem 1.

Corollary 1 For all concurrent game structures, for all Borel objectives Ψ1 and the complementary objective Ψ2, for all ε > 0 we have Σε(Ψ1) ≠ ∅ and Πε(Ψ2) ≠ ∅.
2.6 Complexity of Games
We now summarize the main complexity results related to 2-player, 2½-player, and concurrent games with parity, Rabin, Streett, and Müller objectives. We first present the result for 2-player games.
Theorem 2 (Complexity of 2-player games) The problem of deciding whether a state s is a sure winning state, i.e., s ∈ Sure_1(Ψ1) for an objective Ψ1, is NP-complete for Rabin objectives and coNP-complete for Streett objectives [EJ88], and PSPACE-complete for Müller objectives [HD05].
We now state the main complexity results known for concurrent game structures. The basic results were proved for reachability and parity objectives (Theorem 3). By an exponential reduction of Rabin, Streett, and Müller objectives to parity objectives [Tho97], we obtain Corollary 2.
Theorem 3 The following assertions hold.

1. ([dAHK98]) For all concurrent game structures G, for all T ⊆ S, and for a state s ∈ S, whether Val_1(Reach(T))(s) = 1 can be decided in PTIME.

2. ([EY06]) For all concurrent game structures G, for all T ⊆ S, for a state s ∈ S, a rational α, and a rational ε > 0, whether Val_1(Reach(T))(s) ≥ α can be decided in PSPACE; and a rational interval [l, u] such that Val_1(Reach(T))(s) ∈ [l, u] and u − l ≤ ε can be computed in PSPACE.

3. ([dAH00]) For all concurrent game structures G, for all priority functions p, and for a state s ∈ S, whether Val_1(Parity(p))(s) = 1 can be decided in NP ∩ coNP.

4. ([dAM01]) For all concurrent game structures G, for all priority functions p, for a state s ∈ S, a rational α, and a rational ε > 0, whether Val_1(Parity(p))(s) ≥ α can be decided in 3EXPTIME; and a rational interval [l, u] such that Val_1(Parity(p))(s) ∈ [l, u] and u − l ≤ ε can be computed in 3EXPTIME.
Corollary 2 For all concurrent game structures G, for all Rabin, Streett, and Müller objectives Φ, for a state s ∈ S, a rational α, and a rational ε > 0, whether Val_1(Φ)(s) ≥ α can be decided in 4EXPTIME; a rational interval [l, u] such that Val_1(Φ)(s) ∈ [l, u] and u − l ≤ ε can be computed in 4EXPTIME; and whether Val_1(Φ)(s) = 1 can be decided in 2EXPTIME.
We now present the results for 2½-player game graphs. The results for 2½-player game graphs are obtained as follows: the qualitative analysis for reachability and parity objectives follows from the results on concurrent game structures; the result for the quantitative analysis for reachability objectives follows from the results of Condon [Con92]; and the result for the quantitative analysis for parity objectives follows from the results on concurrent games, but with an exponential improvement. The results are presented in Theorem 4, and the exponential reduction of Rabin, Streett, and Müller objectives to parity objectives [Tho97] yields Corollary 3.
Theorem 4 The following assertions hold.

1. ([dAHK98]) For all 2½-player game graphs G, for all T ⊆ S, and for a state s ∈ S, whether Val_1(Reach(T))(s) = 1 can be decided in PTIME.

2. ([Con92]) For all 2½-player game graphs G, for all T ⊆ S, for a state s ∈ S, and a rational α, whether Val_1(Reach(T))(s) ≥ α can be decided in NP ∩ coNP, and Val_1(Reach(T))(s) can be computed in EXPTIME.

3. ([dAH00]) For all 2½-player game graphs G, for all priority functions p, and for a state s ∈ S, whether Val_1(Parity(p))(s) = 1 can be decided in NP ∩ coNP.

4. ([dAM01]) For all 2½-player game structures G, for all priority functions p, for a state s ∈ S, a rational α, and a rational ε > 0, whether Val_1(Parity(p))(s) ≥ α can be decided in 2EXPTIME; and a rational interval [l, u] such that Val_1(Parity(p))(s) ∈ [l, u] and u − l ≤ ε can be computed in 2EXPTIME.
Corollary 3 For all 2½-player game graphs G, for all Rabin, Streett, and Müller objectives Φ, for a state s ∈ S, a rational α, and a rational ε > 0, whether Val_1(Φ)(s) ≥ α can be decided in 3EXPTIME; a rational interval [l, u] such that Val_1(Φ)(s) ∈ [l, u] and u − l ≤ ε can be computed in 3EXPTIME; and whether Val_1(Φ)(s) = 1 can be decided in 2EXPTIME.
Chapter 3

Concurrent Games with Tail Objectives

In this chapter we consider concurrent games with tail objectives,¹ i.e., objectives that are independent of the finite prefixes of traces, and show that the class of tail objectives is strictly richer than the class of ω-regular objectives. We develop new proof techniques to extend several properties of concurrent games with ω-regular objectives to concurrent games with tail objectives. We prove the positive limit-one property for tail objectives, which states that for all concurrent games, if the optimal value for a player is positive for a tail objective Φ at some state, then there is a state where the optimal value is 1 for Φ for that player. We also show that the optimal values of zero-sum games (with strictly conflicting objectives) with tail objectives can be related to equilibrium values of nonzero-sum games (with objectives that are not strictly conflicting) with simpler reachability objectives. A consequence of our analysis is a polynomial-time reduction of the quantitative analysis of tail objectives to the qualitative analysis for the subclass of one-player stochastic games (Markov decision processes). The properties we prove for the general class of concurrent games with tail objectives will be used in the later chapters for both concurrent and turn-based games with Müller objectives.

¹A preliminary version of the results of this chapter appeared in [Cha06, Cha07a].
3.1 Tail Objectives
The class of tail objectives is defined as follows.

Tail objectives. Informally, the class of tail objectives is the subclass of Borel objectives that are independent of all finite prefixes. An objective Φ is a tail objective if the following condition holds: a path ω ∈ Φ if and only if for all i ≥ 0, ωi ∈ Φ, where ωi denotes the path ω with the prefix of length i deleted. Formally, let G_i = σ(X_i, X_{i+1}, …) be the σ-field generated by the random variables X_i, X_{i+1}, ….² The tail σ-field T is defined as T = ⋂_{i≥0} G_i. An objective Φ is a tail objective if and only if Φ belongs to the tail σ-field T, i.e., the tail objectives are indicator functions of events A ∈ T.

Observe that Müller and parity objectives are tail objectives. Büchi and coBüchi objectives are special cases of parity objectives and hence tail objectives. Reachability objectives are not necessarily tail objectives, but for a set T ⊆ S of states, if every state s ∈ T is an absorbing state, then the objective Reach(T) is equivalent to Büchi(T) and hence is a tail objective. It may be noted that since σ-fields are closed under complementation, the class of tail objectives is closed under complementation. We give an example to show that the class of tail objectives is richer than the class of ω-regular objectives.³
Example 1 Let r be a reward function that maps every state s to a real-valued reward r(s), i.e., r: S → R. Given a reward function r, we define a function LimAvg_r: Ω → R as follows: for a path ω = ⟨s1, s2, s3, …⟩ we have LimAvg_r(ω) = lim inf_{n→∞} (1/n) Σ_{i=1}^{n} r(s_i), i.e., LimAvg_r(ω) is the long-run average of the rewards appearing in ω. For a constant c ∈ R consider the objective Φc defined as follows: Φc = {ω ∈ Ω | LimAvg_r(ω) ≥ c}. Intuitively, Φc accepts the set of paths such that the "long-run" average of the rewards in the path is at least the constant c. The "long-run" average condition is hard for the third level of the Borel hierarchy (see Subsection 3.1.1 for the Π^0_3-completeness proof) and cannot be expressed as an ω-regular objective. It may be noted that the "long-run" average of a path is independent of all finite prefixes of the path. Formally, the objectives Φc are tail objectives. Since the Φc are Π^0_3-hard objectives, it follows that tail objectives lie in higher levels of the Borel hierarchy than ω-regular objectives.

²We use σ for strategies and σ (boldface) for σ-fields.
³Our example shows that there are Π^0_3-hard objectives that are tail objectives. It is possible that tail objectives can express objectives in even higher levels of the Borel hierarchy than Π^0_3, which would make our results stronger.
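For ultimately periodic paths the function LimAvg_r of Example 1 is easy to evaluate, and the computation makes the prefix-independence visible. A minimal Python sketch, with illustrative names:

```python
# Sketch: on a lasso path prefix . cycle^omega, the lim inf average of
# rewards equals the mean reward over the cycle; the finite prefix is
# averaged away in the limit, as the tail property requires.

def lim_avg_lasso(prefix, cycle, r):
    return sum(r[s] for s in cycle) / len(cycle)

r = {'s0': 0.0, 's1': 1.0, 's2': 0.5}
# Path s0 (s1 s2)^omega: long-run average is (1.0 + 0.5) / 2 = 0.75.
print(lim_avg_lasso(['s0'], ['s1', 's2'], r))           # 0.75
print(lim_avg_lasso(['s0'] * 1000, ['s1', 's2'], r))    # 0.75, same value
```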
Notation. For ε > 0, an objective Φ for player 1, and the complementary objective Φ̄ = Ω \ Φ for player 2, we denote by Σε(Φ) and Πε(Φ̄) the sets of ε-optimal strategies for player 1 and player 2, respectively. Note that the quantitative determinacy of concurrent games is equivalent to the existence of ε-optimal strategies for objective Φ for player 1 and Φ̄ for player 2, for all ε > 0, at all states s ∈ S, i.e., for all ε > 0, Σε(Φ) ≠ ∅ and Πε(Φ̄) ≠ ∅ (Corollary 1). We refer to the analysis of computing the limit-sure winning states (the set of states s such that Val_1(Φ)(s) = 1) and ε-limit-sure winning strategies (ε-optimal strategies from the limit-sure winning states) as the qualitative analysis of objective Φ. We refer to the analysis of computing the values and the ε-optimal strategies as the quantitative analysis of objective Φ.
3.1.1 Completeness of limit-average objectives

Borel hierarchy. For a (possibly infinite) alphabet A, let A^ω and A* denote the sets of infinite and finite words over A, respectively. The finite Borel hierarchy (Σ^0_1, Π^0_1), (Σ^0_2, Π^0_2), (Σ^0_3, Π^0_3), … is defined as follows:

• Σ^0_1 = {W · A^ω | W ⊆ A*} is the set of open sets;

• for all n ≥ 1, Π^0_n = {A^ω \ L | L ∈ Σ^0_n} consists of the complements of sets in Σ^0_n;

• for all n ≥ 1, Σ^0_{n+1} = {⋃_{i∈N} L_i | ∀i ∈ N. L_i ∈ Π^0_n} is the set obtained by countable unions of sets in Π^0_n.
Definition 1 (Wadge game) Let A and B be two (possibly infinite) alphabets, and let X ⊆ A^ω and Y ⊆ B^ω. The Wadge game G_W(X, Y) is a two-player game between player 1 and player 2 played as follows. Player 1 first chooses a letter a0 ∈ A and then player 2 chooses a (possibly empty) finite word b0 ∈ B*; then player 1 chooses a letter a1 ∈ A and then player 2 chooses a word b1 ∈ B*, and so on. The play consists in writing a word w_X = a0 a1 … by player 1 and w_Y = b0 b1 … by player 2. Player 2 wins if and only if w_Y is infinite and (w_X ∈ X iff w_Y ∈ Y).
Definition 2 (Wadge reduction) Given alphabets A and B, a set X ⊆ A^ω is Wadge reducible to a set Y ⊆ B^ω, denoted X ≤_W Y, if and only if there exists a continuous function f: A^ω → B^ω such that X = f^{-1}(Y). If X ≤_W Y and Y ≤_W X, then X and Y are Wadge equivalent, denoted X ≡_W Y.
The notions of strategies and winners in Wadge games are defined similarly to those for games on graphs. Wadge games and Wadge reduction are related by the following result.
Proposition 3 ([Wad84]) Player 2 has a winning strategy in the Wadge game GW (X, Y )
iff X ≤W Y .
Wadge equivalence preserves the levels of the Borel hierarchy and yields the natural notion of completeness.
Proposition 4 If X ≡_W Y, then X and Y belong to the same level of the Borel hierarchy.

Definition 3 A set Y ∈ Σ^0_n (resp. Y ∈ Π^0_n) is Σ^0_n-complete (resp. Π^0_n-complete) if and only if X ≤_W Y for all X ∈ Σ^0_n (resp. X ∈ Π^0_n).
Our goal is to show that the lim inf objectives (defined in Example 1) are Π^0_3-hard. We first present some notation.

Notation. Let A be an alphabet and B = {b0, b1}. For a word w ∈ A* or w ∈ B* we denote by len(w) the length of w. For an infinite word w, or a finite word w with len(w) ≥ k, we denote by (w ↾ k) the prefix of length k of w. For a word w ∈ B^ω, or w ∈ B* with len(w) ≥ k, we let

avg(w ↾ k) = (number of b0's in (w ↾ k)) / k,

i.e., the average of b0's in (w ↾ k). For a finite word w we let avg(w) = avg(w ↾ len(w)). Let

Y = {w ∈ B^ω | lim inf_{k→∞} avg(w ↾ k) = 1} = ⋂_{i≥0} ⋃_{j≥0} ⋂_{k≥j} {w ∈ B^ω | avg(w ↾ k) ≥ 1 − 1/(i+1)}.
Hardness of Y. We will show that Y is Π^0_3-hard. To prove the result we consider an arbitrary X ∈ Π^0_3 and show that X ≤_W Y. A set X ⊆ A^ω in Π^0_3 is obtained as a countable intersection of countable unions of closed sets, i.e.,

X = ⋂_{i≥0} ⋃_{j≥0} (A^j · (F_i^j)^ω),

where F_i^j ⊆ A, and A^j denotes the set of words of length j in A*. We show that such an X is Wadge reducible to Y by showing that player 2 has a winning strategy in G_W(X, Y). In the reduction we will use the following notation: given a word w ∈ A*, let

sat(w) = {i | ∃j ≥ 0. w ∈ A^j · (F_i^j)*};
d(w) = max{l | ∀l′ ≤ l. l′ ∈ sat(w)} + 1.

For example, if sat(w) = {0, 1, 2, 4, 6, 7}, then d(w) = max{0, 1, 2} + 1 = 3. The play between player 1 and player 2 proceeds as follows:

Player 1: w_X = a_1 a_2 a_3 …, with a_i ∈ A for all i ≥ 1;
Player 2: w_Y = w_Y(1) w_Y(2) w_Y(3) …, with w_Y(i) ∈ B^+ for all i ≥ 1.
A winning strategy for player 2 is as follows: let the current prefix of w_X of length k be (w_X ↾ k) = a_1 a_2 … a_k, and let the current prefix of w_Y be w_Y(1) w_Y(2) … w_Y(k − 1); then the word w_Y(k) is generated satisfying the following conditions:

1. There exists ℓ ≤ len(w_Y(k)) such that

avg(w_Y(1) w_Y(2) … w_Y(k − 1) (w_Y(k) ↾ ℓ)) ≥ 1 − 1/d(w_X ↾ k);

for all ℓ1 ≤ ℓ,

avg(w_Y(1) … w_Y(k − 1) (w_Y(k) ↾ ℓ)) ≥ avg(w_Y(1) … w_Y(k − 1) (w_Y(k) ↾ ℓ1));

and for all ℓ2 such that ℓ ≤ ℓ2 ≤ len(w_Y(k)) we have

avg(w_Y(1) w_Y(2) … w_Y(k − 1) (w_Y(k) ↾ ℓ2)) ≥ 1 − 1/d(w_X ↾ k).

2. 1 − 1/d(w_X ↾ k) ≤ avg(w_Y(1) w_Y(2) … w_Y(k)) ≤ 1 − 1/(d(w_X ↾ k) + 1).

Intuitively, player 2 initially plays a sequence of b0's to ensure that the average of b0's crosses 1 − 1/d(w_X ↾ k), and then plays a sequence of b0's and b1's to ensure that the average of b0's in w_Y(1) w_Y(2) … w_Y(k) lies in the interval [1 − 1/d(w_X ↾ k), 1 − 1/(d(w_X ↾ k) + 1)], and that the average never falls below 1 − 1/d(w_X ↾ k) while generating w_Y(k) once it has crossed 1 − 1/d(w_X ↾ k). Clearly, player 2 has such a strategy. Given a word w_X ∈ A^ω, the corresponding word w_Y generated is an infinite word. Hence it remains to prove that w_X ∈ X if and only if w_Y ∈ Y. We prove the implications in both directions.
Claim 1 (w_X ∈ X ⇒ w_Y ∈ Y). Let w_X ∈ X; we show that w_Y ∈ Y. Given w_X ∈ X, we have ∀i ≥ 0. ∃j ≥ 0. w_X ∈ A^j · (F_i^j)^ω. Given i ≥ 0, let

j(i) = min{j ≥ 0 | w_X ∈ A^j · (F_i^j)^ω};    ĵ(i) = max{j(i′) | i′ ≤ i}.

Given i ≥ 0, for j = ĵ(i), for all k ≥ j we have (w_X ↾ k) ∈ A^j · (F_i^j)*. Consider the sequence (w_X ↾ j), (w_X ↾ j + 1), …: for all k ≥ j we have {i′ | i′ ≤ i} ⊆ sat(w_X ↾ k). Hence in the corresponding sequence of the word w_Y it is ensured that for all ℓ ≥ len(w_Y(1) w_Y(2) … w_Y(j)) we have avg(w_Y ↾ ℓ) ≥ 1 − 1/(i + 1). Hence lim inf_{n→∞} avg(w_Y ↾ n) ≥ 1 − 1/(i + 1). Since this holds for all i ≥ 0, let i → ∞ to obtain that lim inf_{n→∞} avg(w_Y ↾ n) ≥ 1, and hence = 1 (the equality follows as the average can never be more than 1). Hence w_Y ∈ Y.

Claim 2 (w_Y ∈ Y ⇒ w_X ∈ X). Let w_Y ∈ Y; we show that w_X ∈ X. Fix i ≥ 0. Since lim inf_{n→∞} avg(w_Y ↾ n) = 1, it follows that from some point on the average never falls below 1 − 1/(i + 1). Hence there exists j such that for all l ≥ j we have d(w_X ↾ l) ≥ i + 1 and hence {i′ | i′ ≤ i} ⊆ sat(w_X ↾ l). Hence for all l ≥ j we have (w_X ↾ l) ∈ A^j · (F_i^j)*, and thus we obtain that w_X ∈ A^j · (F_i^j)^ω, i.e., ∃j ≥ 0 such that w_X ∈ A^j · (F_i^j)^ω. Since this holds for all i ≥ 0, it follows that w_X ∈ X.
From Claim 1 and Claim 2 it follows that Y is Π^0_3-hard, and as an easy consequence the class Φc of objectives defined in Example 1 is Π^0_3-hard. Hence tail objectives contain Π^0_3-hard objectives, and since tail objectives are closed under complementation it also follows that tail objectives contain Σ^0_3-hard objectives.

Π^0_3-completeness. To prove Π^0_3-completeness for long-run average objectives, it now suffices to show that long-run average objectives can be expressed in Π^0_3. To achieve this we need to show, for a reward function r and for all reals β, how to express the following sets in Π^0_3:

(1) {ω ∈ Ω | LimAvg_r(ω) ≤ β};
(2) {ω ∈ Ω | LimAvg_r(ω) ≥ β}.
We now show how to express the above sets in Π^0_3. We prove the two cases below.

1. The expression for (1) is as follows:

{ω ∈ Ω | LimAvg_r(ω) ≤ β} = ⋂_{m≥1} ⋃_{n≥m} {ω | ω = ⟨s1, s2, s3, …⟩, (1/n) Σ_{i=1}^{n} r(s_i) ≤ β}.

It is easy to argue that for a fixed n the set {ω | ω = ⟨s1, s2, s3, …⟩, (1/n) Σ_{i=1}^{n} r(s_i) ≤ β} is an open set (a Σ^0_1 set), as the set of paths can be expressed as a union of cones. Hence it follows that the set of paths specified by (1) can be expressed in Π^0_2.

2. The expression for (2) is as follows:

{ω ∈ Ω | LimAvg_r(ω) ≥ β} = ⋂_{m≥1} ⋃_{n≥1} ⋂_{k≥n} {ω | ω = ⟨s1, s2, s3, …⟩, (1/k) Σ_{i=1}^{k} r(s_i) ≥ β − 1/m}.

To prove that the above expression is in Π^0_3 we show that for fixed m and k,

{ω | ω = ⟨s1, s2, s3, …⟩, (1/k) Σ_{i=1}^{k} r(s_i) ≥ β − 1/m}    (*)

is a closed set (i.e., in Π^0_1). Observe that once we prove this, it follows that the set

⋂_{k≥n} {ω | ω = ⟨s1, s2, s3, …⟩, (1/k) Σ_{i=1}^{k} r(s_i) ≥ β − 1/m}

is also a closed set, and hence (2) can be expressed in Π^0_3. To prove the desired claim we show that the complement of (*) is open, i.e., for fixed k and m we argue that the set

{ω | ω = ⟨s1, s2, s3, …⟩, (1/k) Σ_{i=1}^{k} r(s_i) < β − 1/m}

is an open set. Observe that the above set can be described by a union of cones of length k, and since cones are basic open sets the desired result follows.
3.2 Positive Limit-one Property
The positive limit-one property for concurrent games, for a class C of objectives, states that for all objectives Φ ∈ C and all concurrent games G, if there is a state s such that the value for player 1 is positive at s for the objective Φ, then there is a state s′ where the value for player 1 is 1 for the objective Φ. The property means that if a player can win with positive value from some state, then from some state she can win with value 1. The positive limit-one property was proved for parity objectives in [dAH00] and has been one of the key properties used in the algorithmic analysis of concurrent games with parity objectives (see Chapter 8). In this section we prove the positive limit-one property for concurrent games with tail objectives, and thereby extend the positive limit-one property from parity objectives to a richer class of objectives that subsumes several canonical ω-regular objectives. Our proof uses a result from measure theory and certain strategy constructions, whereas the proof for the subclass of parity objectives [dAH00] followed from complementation arguments for quantitative µ-calculus formulas. We first give an example showing that the positive limit-one property does not hold for all objectives, even for simpler classes of games.

Figure 3.1: A simple Markov chain (two states s0 and s1; from each state, the next state is s0 or s1 with probability 1/2 each).
Example 2 Consider the game shown in Fig 3.1, where at every state s we have Γ1(s) = Γ2(s) = {1} (i.e., the set of moves is a singleton at all states). From every state the next state is s0 or s1 with equal probability. Consider the "next-state" objective that specifies that the next state is s1; i.e., a play ω starting from a state s is winning if the first state of the play is s and the second state of the play is s1. Given this objective Φ for player 1, we have Val_1(Φ)(s0) = Val_1(Φ)(s1) = 1/2. Hence, although the value is positive at s0, there is no state with value 1 for player 1.
Notation. In the setting of concurrent games, the natural filtration sequence (F_n) for the stochastic process under any pair of strategies is defined as F_n = σ(X_1, X_2, …, X_n), i.e., the σ-field generated by the random variables X_1, X_2, …, X_n.

Conditional expectations. Given a σ-algebra H, the conditional expectation E[f | H] of a measurable function f is a random variable Z that satisfies the following properties: (a) Z is H-measurable, and (b) for all A ∈ H we have E[f·1_A] = E[Z·1_A], where 1_A is the indicator of the event A (see [Dur95] for details). Another key property of conditional expectation is the following: E[E[f | H]] = E[f] (again see [Dur95] for details).

Almost-sure convergence. Given a random variable X and a sequence (X_n)_{n≥0} of random variables, we write X_n → X almost-surely if Pr({ω | lim_{n→∞} X_n(ω) = X(ω)}) = 1, i.e., with probability 1 the sequence converges to X.

Lemma 1 (Lévy's 0-1 law) Suppose H_n ↑ H_∞, i.e., (H_n) is an increasing sequence of σ-fields and H_∞ = σ(⋃_n H_n). For all events A ∈ H_∞ we have

E[1_A | H_n] = Pr(A | H_n) → 1_A almost-surely (i.e., with probability 1),

where 1_A is the indicator function of the event A.

The proof of the lemma is available in the book of Durrett (pages 262-263) [Dur95]. An immediate consequence of Lemma 1 in the setting of concurrent games is the following lemma.

Lemma 2 (0-1 law in concurrent games) For all concurrent game structures G, for all events A ∈ F_∞ = σ(⋃_n F_n), for all strategies (σ, π) ∈ Σ × Π, and for all states s ∈ S, we have

Pr_s^{σ,π}(A | F_n) → 1_A almost-surely.

Intuitively, the lemma means that the conditional probability Pr_s^{σ,π}(A | F_n) converges almost-surely (i.e., with probability 1) to 0 or 1 (since indicator functions take values in {0, 1}). Note that the tail σ-field T is a subset of F_∞, i.e., T ⊆ F_∞, and hence the result of Lemma 2 holds for all A ∈ T.
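The convergence in Lemma 2 can be observed empirically on a simple absorbing Markov chain. The following Python sketch is an illustration only: it uses a fair random walk on {0, …, N} with both endpoints absorbing and the event A of absorption at N, for which the conditional probability Pr(A | F_n) is simply X_n/N; along any sampled path this quantity settles at 0 or 1.

```python
# Sketch: Pr(A | F_n) along one sampled path of a fair random walk on
# {0,...,N}, absorbed at 0 and N, with A = "absorbed at N". For the fair
# walk the absorption probability from state x is h(x) = x/N.
import random

def sample_conditional_probs(N=10, start=5, steps=60, seed=1):
    random.seed(seed)
    x, probs = start, []
    for _ in range(steps):
        if 0 < x < N:                      # walk until absorption
            x += random.choice((-1, 1))
        probs.append(x / N)                # Pr(A | F_n) = h(X_n)
    return probs

probs = sample_conditional_probs()
print(probs[:5], '...', probs[-3:])        # drifts to 0.0 or 1.0 and stays
```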
Objectives as indicator functions. Objectives Φ are identified with indicator functions Φ: Ω → {0, 1} defined as follows: Φ(ω) = 1 if ω ∈ Φ, and Φ(ω) = 0 otherwise.

Notation. Given strategies σ and π for player 1 and player 2, a tail objective Φ, a state s, and β > 0, let

H_n^{1,β}(σ, π, Φ) = {⟨s1, s2, …, sn, s_{n+1}, …⟩ | Pr_s^{σ,π}(Φ | ⟨s1, s2, …, sn⟩) ≥ 1 − β} = {ω | Pr_s^{σ,π}(Φ | F_n)(ω) ≥ 1 − β}

denote the set of paths ω such that the probability of satisfying Φ, given the strategies σ and π and the prefix of length n of ω, is at least 1 − β. Similarly, let

H_n^{0,β}(σ, π, Φ) = {⟨s1, s2, …, sn, s_{n+1}, …⟩ | Pr_s^{σ,π}(Φ | ⟨s1, s2, …, sn⟩) ≤ β} = {ω | Pr_s^{σ,π}(Φ | F_n)(ω) ≤ β}

denote the set of paths ω such that the probability of satisfying Φ, given the strategies σ and π and the prefix of length n of ω, is at most β. We often refer to prefixes of paths in H_n^{1,β} as histories in H_n^{1,β}, and analogously for H_n^{0,β}.

Proposition 5 For all concurrent game structures G, for all strategies σ and π for player 1 and player 2, respectively, for all tail objectives Φ, for all states s ∈ S, and for all β > 0 and ε > 0, there exists n such that Pr_s^{σ,π}(H_n^{1,β}(σ, π, Φ) ∪ H_n^{0,β}(σ, π, Φ)) ≥ 1 − ε.
Proof. Let f_n = Pr_s^{σ,π}(Φ | F_n). By Lemma 2, we have f_n → Φ almost-surely as n → ∞. Since almost-sure convergence implies convergence in probability, we have

∀β > 0. lim_{n→∞} Pr_s^{σ,π}({ω | |f_n(ω) − Φ(ω)| ≥ β}) = 0, i.e.,
∀β > 0. lim_{n→∞} Pr_s^{σ,π}({ω | |f_n(ω) − Φ(ω)| ≤ β}) = 1.

Since Φ is an indicator function, we have

∀β > 0. lim_{n→∞} Pr_s^{σ,π}({ω | f_n(ω) ≥ 1 − β or f_n(ω) ≤ β}) = 1, i.e.,
∀β > 0. lim_{n→∞} Pr_s^{σ,π}(H_n^{1,β}(σ, π, Φ) ∪ H_n^{0,β}(σ, π, Φ)) = 1.

Hence we have

∀β > 0. ∀ε > 0. ∃n0. ∀n ≥ n0. Pr_s^{σ,π}(H_n^{1,β}(σ, π, Φ) ∪ H_n^{0,β}(σ, π, Φ)) ≥ 1 − ε.

The result follows.
Lemma 3 (Always positive implies probability 1) Let α > 0 be a real constant. For all objectives Φ, all strategies σ and π, and all states s, if

f_n = Pr_s^{σ,π}(Φ | F_n) > α for all n, i.e., f_n(ω) > α almost-surely for all n,

then Pr_s^{σ,π}(Φ) = 1.

Proof. We show that for all ε > 0 we have Pr_s^{σ,π}(Φ) ≥ 1 − 2ε. Since ε > 0 is arbitrary, the result follows. Given ε > 0 and α > 0, we choose β such that 0 < β < α and 0 < β < ε. By Proposition 5 there exists n0 such that for all n ≥ n0 we have

Pr_s^{σ,π}({ω | f_n(ω) ≥ 1 − β or f_n(ω) ≤ β}) ≥ 1 − ε.

Since f_n(ω) ≥ α > β almost-surely for all n, we have Pr_s^{σ,π}({ω | f_n(ω) ≥ 1 − β}) ≥ 1 − ε, i.e., we have Pr_s^{σ,π}(Φ | F_n) ≥ 1 − β with probability at least 1 − ε. Hence we have

Pr_s^{σ,π}(Φ) = E_s^{σ,π}[Φ] = E_s^{σ,π}[E_s^{σ,π}[Φ | F_n]] ≥ (1 − β)·(1 − ε) ≥ 1 − 2ε.

Observe that we have used the property of conditional expectation to infer that E_s^{σ,π}[Φ] = E_s^{σ,π}[E_s^{σ,π}[Φ | F_n]]. The desired result follows.

Theorem 5 (Positive limit-one property) For all concurrent game structures G and all tail objectives Φ, if there exists a state s ∈ S such that Val_1(Φ)(s) > 0, then there exists a state s′ ∈ S such that Val_1(Φ)(s′) = 1.
Figure 3.2: An illustration of the idea of the proof of Theorem 5.

The basic idea of the proof. We prove the desired result by contradiction. We assume towards contradiction that from some state s we have Val_1(Φ)(s) = α > 0 and for all states s1 we have Val_1(Φ)(s1) ≤ η < 1. We fix ε-optimal strategies σ and π for player 1 and player 2, for sufficiently small ε > 0. By Proposition 5, for all 0 < β < 1, there exists n such that Pr_s^{σ,π}(H_n^{1,β} ∪ H_n^{0,β}) ≥ 1 − ε/4. The strategy π is modified to a strategy π̃ as follows: on histories in H_n^{0,β}, the strategy π̃ ignores the history of length n and switches to an ε/4-optimal strategy, and otherwise plays as π. By a suitable choice of β (depending on ε) we show that player 2 can ensure that the probability of satisfying Φ from s given σ is less than α − ε. This contradicts the facts that σ is an ε-optimal strategy and Val_1(Φ)(s) = α. The idea is illustrated in Fig 3.2. We now prove the result formally.
Proof. (of Theorem 5.) Assume towards contradiction that there exists a state s such that Val_1(Φ)(s) > 0, but for all states s′ we have Val_1(Φ)(s′) < 1. Let α = 1 − Val_1(Φ)(s) = Val_2(Φ̄)(s), where Φ̄ = Ω \ Φ is the objective of player 2. Since 0 < Val_1(Φ)(s) < 1, we have 0 < α < 1. Since Val_2(Φ̄)(s′) = 1 − Val_1(Φ)(s′) and for all states s′ we have Val_1(Φ)(s′) < 1, it follows that Val_2(Φ̄)(s′) > 0 for all states s′. Fix η such that 0 < η = min_{s′∈S} Val_2(Φ̄)(s′). Also observe that since Val_2(Φ̄)(s) = α < 1, we have η < 1. Let c be a constant such that c > 0 and α·(1 + c) = γ < 1 (such a constant exists as α < 1). Also let c1 > 1 be a constant such that c1·γ < 1 (such a constant exists since γ < 1); hence we have 1 − c1·γ > 0 and 1 − 1/c1 > 0. Fix ε > 0 and β > 0 such that

0 < 2ε < min{η/4, 2c·α, (η/4)·(1 − c1·γ)};    β < min{ε, 1/2, 1 − 1/c1}.    (3.1)

Fix ε-optimal strategies σε for player 1 and πε for player 2. Let H_n^{1,β} = H_n^{1,β}(σε, πε, Φ̄) and H_n^{0,β} = H_n^{0,β}(σε, πε, Φ̄). Consider n such that Pr_s^{σε,πε}(H_n^{1,β} ∪ H_n^{0,β}) ≥ 1 − ε/4 (such n exists by Proposition 5). Also observe that since β < 1/2 we have H_n^{1,β} ∩ H_n^{0,β} = ∅. Let

val = Pr_s^{σε,πε}(Φ̄ | H_n^{1,β}) · Pr_s^{σε,πε}(H_n^{1,β}) + Pr_s^{σε,πε}(Φ̄ | H_n^{0,β}) · Pr_s^{σε,πε}(H_n^{0,β}).

We have

val ≤ Pr_s^{σε,πε}(Φ̄) ≤ val + ε/4.    (3.2)

The first inequality follows since H_n^{1,β} ∩ H_n^{0,β} = ∅, and the second inequality follows since Pr_s^{σε,πε}(H_n^{1,β} ∪ H_n^{0,β}) ≥ 1 − ε/4. Since σε and πε are ε-optimal strategies we have α − ε ≤ Pr_s^{σε,πε}(Φ̄) ≤ α + ε. This, along with (3.2), yields

α − ε − ε/4 ≤ val ≤ α + ε.    (3.3)

Observe that Pr_s^{σε,πε}(Φ̄ | H_n^{1,β}) ≥ 1 − β and Pr_s^{σε,πε}(Φ̄ | H_n^{0,β}) ≤ β. Let q = Pr_s^{σε,πε}(H_n^{1,β}). Since Pr_s^{σε,πε}(Φ̄ | H_n^{1,β}) ≥ 1 − β, ignoring the term Pr_s^{σε,πε}(Φ̄ | H_n^{0,β}) · Pr_s^{σε,πε}(H_n^{0,β}) in val and using the second inequality of (3.3), we obtain (1 − β)·q ≤ α + ε. Since ε < c·α, β < 1 − 1/c1, and γ = α·(1 + c), we have

q ≤ (α + ε)/(1 − β) < α·(1 + c)/(1/c1) = c1·γ.    (3.4)

We construct a strategy π̂ε as follows: the strategy π̂ε follows the strategy πε for the first n − 1 stages; if a history in H_n^{1,β} is generated it continues to follow πε, and otherwise it ignores the history and switches to an ε-optimal strategy. Formally, for a history ⟨s1, s2, …, sk⟩ we have

π̂ε(⟨s1, s2, …, sk⟩) =
  πε(⟨s1, s2, …, sk⟩)   if k < n, or Pr_s^{σε,πε}(Φ̄ | ⟨s1, s2, …, sn⟩) ≥ 1 − β;
  π̃ε(⟨sn, …, sk⟩)      if k ≥ n and Pr_s^{σε,πε}(Φ̄ | ⟨s1, s2, …, sn⟩) < 1 − β,

where π̃ε is an ε-optimal strategy. Since π̂ε and πε coincide for the first n − 1 stages, we have Pr_s^{σε,π̂ε}(H_n^{1,β}) = Pr_s^{σε,πε}(H_n^{1,β}) and Pr_s^{σε,π̂ε}(H_n^{0,β}) = Pr_s^{σε,πε}(H_n^{0,β}). Moreover, since Φ̄ is a tail objective that is independent of the prefix of length n, since η ≤ min_{s′∈S} Val_2(Φ̄)(s′), and since π̃ε is an ε-optimal strategy, we have Pr_s^{σε,π̂ε}(Φ̄ | H_n^{0,β}) ≥ η − ε. Also observe that

Pr_s^{σε,π̂ε}(Φ̄ | H_n^{0,β}) ≥ η − ε = Pr_s^{σε,πε}(Φ̄ | H_n^{0,β}) + (η − ε − Pr_s^{σε,πε}(Φ̄ | H_n^{0,β}))
                             ≥ Pr_s^{σε,πε}(Φ̄ | H_n^{0,β}) + (η − ε − β),    (3.5)

since Pr_s^{σε,πε}(Φ̄ | H_n^{0,β}) ≤ β. Hence we have the following chain of inequalities:

Pr_s^{σε,π̂ε}(Φ̄)
  ≥ Pr_s^{σε,π̂ε}(Φ̄ | H_n^{1,β}) · Pr_s^{σε,π̂ε}(H_n^{1,β}) + Pr_s^{σε,π̂ε}(Φ̄ | H_n^{0,β}) · Pr_s^{σε,π̂ε}(H_n^{0,β})
  = Pr_s^{σε,πε}(Φ̄ | H_n^{1,β}) · Pr_s^{σε,πε}(H_n^{1,β}) + Pr_s^{σε,π̂ε}(Φ̄ | H_n^{0,β}) · Pr_s^{σε,π̂ε}(H_n^{0,β})
  ≥ Pr_s^{σε,πε}(Φ̄ | H_n^{1,β}) · Pr_s^{σε,πε}(H_n^{1,β}) + Pr_s^{σε,πε}(Φ̄ | H_n^{0,β}) · Pr_s^{σε,πε}(H_n^{0,β})
      + (η − ε − β) · (1 − q − ε/4)                  (since Pr_s^{σε,πε}(H_n^{0,β}) ≥ 1 − q − ε/4)
  = val + (η − ε − β) · (1 − q − ε/4)
  ≥ α − ε − ε/4 + (η − ε − β) · (1 − q − ε/4)        (first inequality of (3.3))
  > α − ε − ε/4 + (η − 2ε) · (1 − q − ε/4)           (since β < ε by (3.1))
  > α − ε − ε/4 + (η/2) · (1 − q − ε/4)              (since 2ε < η/4 by (3.1))
  > α − ε − ε/4 + (η/2) · (1 − c1·γ) − (η/2) · (ε/4) (since q < c1·γ by (3.4))
  > α − ε − ε/4 + 4ε − ε/8                           (since 2ε < (η/4)·(1 − c1·γ) by (3.1), and η ≤ 1)
  > α + ε.

The first equality follows since for histories in H_n^{1,β} the strategies πε and π̂ε coincide, and the second inequality uses (3.5). Hence we have Pr_s^{σε,π̂ε}(Φ̄) > α + ε, and therefore Pr_s^{σε,π̂ε}(Φ) < 1 − α − ε. This contradicts the facts that Val_1(Φ)(s) = 1 − α and σε is an ε-optimal strategy. The desired result follows.
Notation. We use the following notation for the rest of the chapter:

W_1^1 = {s | Val_1(Φ)(s) = 1};       W_2^1 = {s | Val_2(Φ̄)(s) = 1};
W_1^{>0} = {s | Val_1(Φ)(s) > 0};    W_2^{>0} = {s | Val_2(Φ̄)(s) > 0}.

By the determinacy of concurrent games with tail objectives, we have W_1^1 = S \ W_2^{>0} and W_2^1 = S \ W_1^{>0}. We have the following finer characterization of these sets.
Corollary 4 For all concurrent game structures G with tail objective Φ for player 1, the following assertions hold:

1. (a) if W_1^{>0} ≠ ∅, then W_1^1 ≠ ∅; and (b) if W_2^{>0} ≠ ∅, then W_2^1 ≠ ∅.

2. (a) if W_1^{>0} = S, then W_1^1 = S; and (b) if W_2^{>0} = S, then W_2^1 = S.

Proof. The first result is a direct consequence of Theorem 5. The second result is derived as follows: if W_1^{>0} = S, then by determinacy we have W_2^1 = ∅. Since W_2^1 = ∅, it follows from part 1 that W_2^{>0} = ∅, and hence W_1^1 = S. The result of part 2 shows that if a player has positive value at every state, then the value is 1 at all states.
Extension to countable state spaces. We first present an example showing that Corollary 4 (and hence also Theorem 5) does not extend directly to countable state spaces. Then we present the appropriate extension of Theorem 5 to countable state spaces.

Example 3 Consider a Markov chain defined on the countable state space S = S_N ∪ {t}, where S_N = {s_i | i = 0, 1, 2, …}. The transition probabilities are specified as follows: the state t is an absorbing state; from state s_i the next state is s_{i+1} with probability (1/2)^{1/2^i}, and the next state is t with the remaining probability. Consider the tail objective Φ = Büchi({t}). For a state s_i we have Val_2(Φ̄)(s_i) = (1/2)^{Σ_{j=i}^{∞} 1/2^j} = (1/2)^{1/2^{i−1}} < 1. That is, we have Val_1(Φ)(s) > 0 for all s ∈ S. Hence W_1^{>0} = S, but W_1^1 ≠ S.
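The values in Example 3 can be checked numerically: the probability of never reaching t from s_i is the infinite product ∏_{j≥i} (1/2)^{1/2^j}, which telescopes to the closed form used above. A small Python sketch (truncating the infinite product; illustrative only):

```python
# Sketch: numeric check of Example 3. Never reaching t from s_i has
# probability prod_{j>=i} (1/2)^(1/2^j) = (1/2)^(1/2^(i-1)) < 1.

def never_reach_t(i, terms=60):
    prob = 1.0
    for j in range(i, i + terms):          # truncated infinite product
        prob *= 0.5 ** (1.0 / 2 ** j)
    return prob

for i in range(4):
    closed_form = 0.5 ** (1.0 / 2 ** (i - 1))
    print(i, never_reach_t(i), closed_form)   # the two columns agree
```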
We now present the appropriate extension of Theorem 5 to countable state spaces.

Theorem 6 For all concurrent game structures G with countable state space and all tail objectives Φ, if there exists a state s ∈ S such that Val_1(Φ)(s) > 0, then sup_{s′∈S} Val_1(Φ)(s′) = 1.

Proof. The key difference from the proof of Theorem 5 is how the constants are fixed. Assume towards contradiction that there exists a state s such that Val_1(Φ)(s) > 0, but sup_{s′∈S} Val_1(Φ)(s′) < 1. Let α = 1 − Val_1(Φ)(s) = Val_2(Φ̄)(s). Since 0 < Val_1(Φ)(s) < 1, we have 0 < α < 1. Let η = inf_{s′∈S} Val_2(Φ̄)(s′); since sup_{s′∈S} Val_1(Φ)(s′) < 1 and Val_2(Φ̄)(s′) = 1 − Val_1(Φ)(s′) for all s′ ∈ S, we have η > 0. Also observe that since Val_2(Φ̄)(s) = α < 1, we have η < 1. Once the constant η is fixed, we can essentially follow the proof of Theorem 5 to obtain the desired result.
3.3 Zero-sum Tail Games to Nonzero-sum Reachability Games

In this section we relate the values of zero-sum games with tail objectives to the Nash equilibrium values of nonzero-sum games with reachability objectives. The result shows that the values of a zero-sum game with complex objectives can be related to equilibrium values of a nonzero-sum game with simpler objectives. We also show that for MDPs the value function for a tail objective Φ can be computed by computing the maximal probability of reaching the set of states with value 1. As an immediate consequence of this analysis, we obtain a polynomial-time reduction of the quantitative analysis of MDPs with tail objectives to the qualitative analysis. We first prove a limit-reachability property of ε-optimal strategies: the property states that for tail objectives, if the players play ε-optimal strategies, for small ε > 0, then the game reaches W_1^1 ∪ W_2^1 with high probability.

Theorem 7 (Limit-reachability) For all concurrent game structures G, for all tail objectives Φ for player 1, and for all ε′ > 0, there exists ε > 0 such that for all states s ∈ S and all ε-optimal strategies σε and πε, we have

Pr_s^{σε,πε}(Reach(W_1^1 ∪ W_2^1)) ≥ 1 − ε′.
Proof. By determinacy it follows that W_1^1 ∪ W_2^1 = S \ (W_1^{>0} ∩ W_2^{>0}). For a state s ∈ W_1^1 ∪ W_2^1 the result holds trivially. Consider a state s ∈ W_1^{>0} ∩ W_2^{>0} and let α = Val_2(Φ̄)(s). Observe that 0 < α < 1. Let η1 = min_{s∈W_2^{>0}} Val_2(Φ̄)(s) and η2 = max_{s∈W_1^{>0}} Val_2(Φ̄)(s), and let η = min{η1, 1 − η2}; note that 0 < η < 1. Given ε′ > 0, fix ε such that 0 < 2ε < min{η/2, η·ε′/12}. Fix any ε-optimal strategies σε and πε for player 1 and player 2, respectively. Fix β such that 0 < β < ε and β < 1/2. Let H_n^{1,β} = H_n^{1,β}(σε, πε, Φ̄) and H_n^{0,β} = H_n^{0,β}(σε, πε, Φ̄). Consider n such that Pr_s^{σε,πε}(H_n^{1,β} ∪ H_n^{0,β}) ≥ 1 − ε/4 (such n exists by Proposition 5); also, as β < 1/2, we have H_n^{1,β} ∩ H_n^{0,β} = ∅. Let

val = Pr_s^{σε,πε}(Φ̄ | H_n^{1,β}) · Pr_s^{σε,πε}(H_n^{1,β}) + Pr_s^{σε,πε}(Φ̄ | H_n^{0,β}) · Pr_s^{σε,πε}(H_n^{0,β}).

Similarly to inequality (3.2) of Theorem 5 we obtain

val ≤ Pr_s^{σε,πε}(Φ̄) ≤ val + ε/4.

Since σε and πε are ε-optimal strategies, similarly to inequality (3.3) of Theorem 5 we obtain α − ε − ε/4 ≤ val ≤ α + ε.

For W ⊆ S, let Reach_n(W) = {⟨s1, s2, s3, …⟩ | ∃k ≤ n. s_k ∈ W} denote the set of paths that reach W within n steps. We use the following notation: Reach̄_n(W_1^1) = Ω \ Reach_n(W_1^1), and Reach̄_n(W_2^1) = Ω \ Reach_n(W_2^1). Consider a strategy σ̂ε defined as follows: for histories in H_n^{1,β} ∩ Reach̄_n(W_2^1), σ̂ε ignores the history after stage n and follows an ε-optimal strategy σ̃ε; for all other histories it follows σε. Let z1 = Pr_s^{σε,πε}(H_n^{1,β} ∩ Reach̄_n(W_2^1)). Since η2 = max_{s∈W_1^{>0}} Val_2(Φ̄)(s), since player 1 switches to an ε-optimal strategy after histories of length n in H_n^{1,β} ∩ Reach̄_n(W_2^1), and since Φ̄ is a tail objective, it follows that for all ω = ⟨s1, s2, …, sn, s_{n+1}, …⟩ ∈ H_n^{1,β} ∩ Reach̄_n(W_2^1) we have Pr_s^{σ̂ε,πε}(Φ̄ | ⟨s1, s2, …, sn⟩) ≤ η2 + ε, whereas Pr_s^{σε,πε}(Φ̄ | ⟨s1, s2, …, sn⟩) ≥ 1 − β. Hence we have

val2 = Pr_s^{σ̂ε,πε}(Φ̄) ≤ Pr_s^{σε,πε}(Φ̄) − z1·(1 − β − η2 − ε) ≤ val + ε/4 − z1·(1 − β − η2 − ε),

since with probability z1 the decrease is at least 1 − β − η2 − ε. Since πε is an ε-optimal strategy we have val2 ≥ α − ε; and since val ≤ α + ε, we have the following inequalities:

z1·(1 − η2 − β − ε) ≤ 2ε + ε/4 < 3ε
⟹ z1 < 3ε/(η − β − ε)    (since η ≤ 1 − η2)
⟹ z1 < 3ε/(η − 2ε) < 6ε/η < ε′/4    (since β < ε, 2ε < η/2, and ε < η·ε′/24).

Consider a strategy π̂ε defined as follows: for histories in H_n^{0,β} ∩ Reach̄_n(W_1^1), π̂ε ignores the history after stage n and follows an ε-optimal strategy π̃ε; for all other histories it follows πε. Let z2 = Pr_s^{σε,πε}(H_n^{0,β} ∩ Reach̄_n(W_1^1)). Since η1 = min_{s∈W_2^{>0}} Val_2(Φ̄)(s), since player 2 switches to an ε-optimal strategy after histories of length n in H_n^{0,β} ∩ Reach̄_n(W_1^1), and since Φ̄ is a tail objective, it follows that for all ω = ⟨s1, s2, …, sn, s_{n+1}, …⟩ ∈ H_n^{0,β} ∩ Reach̄_n(W_1^1) we have Pr_s^{σε,π̂ε}(Φ̄ | ⟨s1, s2, …, sn⟩) ≥ η1 − ε, whereas Pr_s^{σε,πε}(Φ̄ | ⟨s1, s2, …, sn⟩) ≤ β. Hence we have

val1 = Pr_s^{σε,π̂ε}(Φ̄) ≥ Pr_s^{σε,πε}(Φ̄) + z2·(η1 − ε − β) ≥ val + z2·(η1 − ε − β),

since with probability z2 the increase is at least η1 − ε − β. Since σε is an ε-optimal strategy we have val1 ≤ α + ε; and since val ≥ α − ε − ε/4, we have the following inequalities:

z2·(η1 − β − ε) ≤ 2ε + ε/4 < 3ε
⟹ z2 < 3ε/(η − β − ε)    (since η ≤ η1)
⟹ z2 < ε′/4    (as in the derivation for z1).

Hence z1 + z2 ≤ ε′/2, and then we have

Pr_s^{σε,πε}(Reach(W_1^1 ∪ W_2^1))
  ≥ Pr_s^{σε,πε}(Reach_n(W_1^1 ∪ W_2^1) ∩ (H_n^{1,β} ∪ H_n^{0,β}))
  = Pr_s^{σε,πε}(Reach_n(W_1^1 ∪ W_2^1) ∩ H_n^{1,β}) + Pr_s^{σε,πε}(Reach_n(W_1^1 ∪ W_2^1) ∩ H_n^{0,β})
  ≥ Pr_s^{σε,πε}(Reach_n(W_2^1) ∩ H_n^{1,β}) + Pr_s^{σε,πε}(Reach_n(W_1^1) ∩ H_n^{0,β})
  ≥ Pr_s^{σε,πε}(H_n^{1,β}) + Pr_s^{σε,πε}(H_n^{0,β}) − (z1 + z2)
  ≥ 1 − ε/4 − ε′/2 ≥ 1 − ε′    (since ε ≤ ε′).

The result follows.
Theorem 7 proves the limit-reachability property for tail objectives under ε-optimal strategies, for small ε. We present examples showing that Theorem 7 does not hold for all objectives, nor for tail objectives with arbitrary strategies.

Example 4 Observe that in the game shown in Example 2 the objective is not a tail objective, and we have W_1^1 ∪ W_2^1 = ∅. Hence Theorem 7 need not hold for all objectives. Also consider the game shown in Fig 3.3. In this game s1 and s2 are absorbing states. At s0 the available moves for the players are as follows: Γ1(s0) = {a} and Γ2(s0) = {1, 2}. The transition function is as follows: if player 2 plays move 2, then the next state is s1 or s2 with equal probability, and if player 2 plays move 1, then the next state is s0. The objective of player 1 is Φ = Büchi({s0, s1}), i.e., to visit s0 or s1 infinitely often. We have W_1^1 = {s1} and W_2^1 = {s2}. Given the strategy π that always chooses move 1, the set W_1^1 ∪ W_2^1 is reached with probability 0; however, π is not an optimal or ε-optimal strategy for player 2 (for ε < 1/2). This shows that Theorem 7 need not hold if strategies other than ε-optimal ones are considered. In the game shown, for an optimal strategy of player 2 (e.g., the strategy that chooses move 2), the play reaches W_1^1 ∪ W_2^1 with probability 1.
Figure 3.3: A game with a Büchi objective (state s0 with moves a for player 1 and {1, 2} for player 2; under move 2 the play goes to the absorbing states s1 or s2 with probability 1/2 each).
The following example further illustrates Theorem 7.

Example 5 (Concurrent Büchi game) Consider the concurrent game shown in Fig 3.4. The available moves for the players at states s0 and s3 are {0, 1} and {0, 1, q}, respectively. At all other states the available moves of both players are singletons. The transitions are shown as labeled edges in the figure. The objective of player 1 is to visit s4 or s7 infinitely often, i.e., Büchi({s4, s7}). Observe that since at state s3 each player can choose the move q, the value for player 1 at state s3 (and hence at states s0, s1, s2, s4, s5, and s6) is 1/2. The value for player 1 is 1 at state s7 and 0 at state s8. Consider the strategy σ for player 1 defined as follows: (a) at state s0 it plays 0 and 1, each with probability 1/2, and remembers the move played as the move b; (b) at state s3, player 1 remembers the move c played by player 2 at s0 (since player 1 knows whether state s1 or s2 was visited, it can infer the move played by player 2 at s0); (c) at state s3 player 1 plays move b as long as player 2 plays move c, and otherwise player 1 plays the move q. Informally, player 1 plays its move uniformly at random at s0 (thereby disclosing it to player 2) and remembers the move of player 2. As long as player 2 follows her move, player 1 follows her move chosen in the first round; if player 2 deviates, then player 1 quits the game by playing q. A strategy π for player 2 can be defined similarly. Given the strategies σ and π, the play from s0 reaches s7 and s8 with probability 0. Moreover, the play satisfies Büchi({s4, s7}) with probability 1/2. However, observe that the strategy σ is not an optimal strategy. Given the strategy σ, consider the strategy π′ for player 2 defined as follows: π′ chooses 0 and 1 with probability 1/2 each at s0, and at s3, if the move c chosen at s0 matches the move b of player 1, then player 2 plays q (i.e., quits the game), and otherwise it follows π. Given the strategy π′, if player 1 follows σ, then Büchi({s4, s7}) is satisfied with probability only 1/4. In the game shown, if both players follow any pair of optimal strategies, then the game reaches s7 or s8 with probability 1.

Figure 3.4: A concurrent Büchi game (edges are labeled by joint moves of the two players; the Büchi states are s4 and s7).
Lemma 4 is immediate from Theorem 7.

Lemma 4 For all concurrent game structures G, for all tail objectives Φ for player 1 and Φ̄ for player 2, and for all states s ∈ S, we have

lim_{ε→0} sup_{σ∈Σε(Φ), π∈Πε(Φ̄)} Pr_s^{σ,π}(Reach(W_1^1 ∪ W_2^1)) = 1;
lim_{ε→0} sup_{σ∈Σε(Φ), π∈Πε(Φ̄)} Pr_s^{σ,π}(Reach(W_1^1)) = Val_1(Φ)(s);
lim_{ε→0} sup_{σ∈Σε(Φ), π∈Πε(Φ̄)} Pr_s^{σ,π}(Reach(W_2^1)) = Val_2(Φ̄)(s).
Consider a nonzero-sum reachability game G_R in which the states in W_1^1 ∪ W_2^1 are transformed into absorbing states and the objectives of both players are reachability objectives: the objective for player 1 is Reach(W_1^1) and the objective for player 2 is Reach(W_2^1). Note that the game G_R is not zero-sum in the following sense: there are infinite paths ω such that ω ∉ Reach(W_1^1) and ω ∉ Reach(W_2^1), and each player gets payoff 0 for such a path ω. We define ε-Nash equilibria of the game G_R and relate certain special ε-Nash equilibria of G_R to the values of G.
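The transformation from G to G_R is a one-line change to the transition function. A minimal Python sketch, in which the game is abstracted to a transition function returning a distribution over successors (all names are illustrative):

```python
# Sketch: building G_R from G by making every state in W_1^1 | W_2^1
# absorbing; all other transitions are left unchanged.

def make_absorbing(delta, absorbing_states):
    def delta_R(s, a1, a2):
        if s in absorbing_states:
            return {s: 1.0}                 # absorbing self-loop
        return delta(s, a1, a2)
    return delta_R

# Example with a toy transition function.
def delta(s, a1, a2):
    return {'s0': 0.5, 'w1': 0.5}

delta_R = make_absorbing(delta, absorbing_states={'w1', 'w2'})
print(delta_R('s0', 'a', 'b'))   # {'s0': 0.5, 'w1': 0.5}, as in G
print(delta_R('w1', 'a', 'b'))   # {'w1': 1.0}: transformed to absorbing
```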
Definition 4 (ε-Nash equilibrium in G_R) A strategy profile (σ*, π*) ∈ Σ × Π is an ε-Nash equilibrium at state s if the following two conditions hold:

Pr_s^{σ*,π*}(Reach(W_1^1)) ≥ sup_{σ∈Σ} Pr_s^{σ,π*}(Reach(W_1^1)) − ε;
Pr_s^{σ*,π*}(Reach(W_2^1)) ≥ sup_{π∈Π} Pr_s^{σ*,π}(Reach(W_2^1)) − ε.
Theorem 8 (Nash equilibrium of the reachability game G_R) The following assertion holds for the game G_R.

• For all ε > 0, there is an ε-Nash equilibrium (σ*_ε, π*_ε) ∈ Σ_ε(Φ) × Π_ε(Ω\Φ) such that for all states s we have

lim_{ε→0} Pr_s^{σ*_ε,π*_ε}(Reach(W_1^1)) = Val_1(Φ)(s);
lim_{ε→0} Pr_s^{σ*_ε,π*_ε}(Reach(W_2^1)) = Val_2(Ω\Φ)(s).
Proof. It follows from Lemma 4.
Note that in the case of MDPs the strategy of player 2 is trivial, i.e., player 2 has only one strategy. Hence in the context of MDPs we drop the strategy π of player 2. A specialization of Theorem 8 to MDPs yields Theorem 9.

Theorem 9 For all MDPs G_M, for all tail objectives Φ, for all states s we have

Val_1(Φ)(s) = sup_{σ∈Σ} Pr_s^σ(Reach(W_1^1)) = Val_1(Reach(W_1^1))(s).
Since the values of MDPs with reachability objectives can be computed in polynomial time (by linear programming) [Con92, FV97], our result presents a polynomial-time reduction of the quantitative analysis of tail objectives in MDPs to qualitative analysis.
3.4 Construction of ε-optimal Strategies for Müller Objectives
In this section we show that for Müller objectives, witnesses of ε-optimal strategies can be constructed as witnesses of certain limit-sure winning strategies that respect certain local conditions. A key notion that will play an important role in the construction of ε-optimal strategies is the notion of local optimality. Informally, a selector function ξ is a memoryless strategy, and the selector ξ is locally optimal if it is optimal in the one-step matrix game where each state s is assigned the reward value Val_1(Φ)(s). A locally optimal strategy is a strategy that consists of locally optimal selectors. A locally ε-optimal strategy is a strategy whose total deviation from locally optimal selectors is at most ε. We note that local ε-optimality and ε-optimality are very different notions: local ε-optimality consists in the approximation of locally optimal selectors, and a locally ε-optimal strategy provides no guarantee of yielding a probability of winning the game close to the optimal one.
Definition 5 (Selectors) A selector ξ for player i ∈ {1, 2} is a function ξ : S → Dist(A)
such that for all s ∈ S and a ∈ A, if ξ(s)(a) > 0, then a ∈ Γi (s). We denote by Λi the set
of all selectors for player i ∈ {1, 2}. Observe that selectors coincide with the definition of
memoryless strategies.
Definition 6 (Locally ε-optimal selectors and strategies) A selector ξ is locally optimal for objective Φ if for all s ∈ S and all a_2 ∈ Γ_2(s) we have

E_s^{ξ(s),a_2}[Val_1(Φ)(X_1)] ≥ Val_1(Φ)(s).

We denote by Λ^ℓ(Φ) the set of locally optimal selectors for objective Φ. A strategy σ is locally optimal for objective Φ if for every history ⟨s_0, s_1, ..., s_k⟩ we have σ(⟨s_0, s_1, ..., s_k⟩) ∈ Λ^ℓ(Φ), i.e., player 1 plays a locally optimal selector at every round of the play. We denote
by Σ^ℓ(Φ) the set of locally optimal strategies for objective Φ. A strategy σ_ε is locally ε-optimal for objective Φ if for every strategy π ∈ Π, for all k ≥ 1, and for all states s we have

Val_1(Φ)(s) − E_s^{σε,π}[Val_1(Φ)(X_k)] ≤ ε.
Observe that a strategy that at each round i chooses a locally optimal selector with probability at least (1 − ε_i), where Σ_{i=0}^∞ ε_i ≤ ε, is a locally ε-optimal strategy. We denote by Σ^ℓ_ε(Φ) the set of locally ε-optimal strategies for objective Φ.
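To see why the observation holds, note that the per-round deviations accumulate additively; the following is a sketch of the calculation (our reading of the observation, for instance with ε_i = ε/2^{i+1}). Since values lie in [0, 1], at round i the selector is locally optimal with probability at least 1 − ε_i, so the expected value can drop by at most ε_i:

E_s^{σ,π}[Val_1(Φ)(X_{i+1})] ≥ E_s^{σ,π}[Val_1(Φ)(X_i)] − ε_i.

Summing over rounds 0, ..., k − 1 yields

Val_1(Φ)(s) − E_s^{σ,π}[Val_1(Φ)(X_k)] ≤ Σ_{i=0}^{k−1} ε_i ≤ ε,

which is exactly the defining requirement of local ε-optimality.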
We first show that for all tail objectives, for all ε > 0, there exist strategies that are
ε-optimal and locally ε-optimal as well.
Lemma 5 For all tail objectives Φ, for all ε > 0,

1. Σ_{ε/2}(Φ) ⊆ Σ^ℓ_ε(Φ);
2. Σ_ε(Φ) ∩ Σ^ℓ_ε(Φ) ≠ ∅.
Proof. For ε > 0, fix an ε/2-optimal strategy σ for player 1. By definition σ is an ε-optimal strategy as well. We argue that σ ∈ Σ^ℓ_ε(Φ); this establishes both claims. Assume towards a contradiction that σ ∉ Σ^ℓ_ε(Φ), i.e., there exist a player-2 strategy π, a state s, and k ≥ 1 such that

Val_1(Φ)(s) − E_s^{σ,π}[Val_1(Φ)(X_k)] > ε.
Fix a strategy π* = (π + π̃) for player 2 as follows: play π for k steps, and then switch to an ε/4-optimal strategy π̃. Formally, for a history ⟨s_1, s_2, ..., s_n⟩ we have

π*(⟨s_1, s_2, ..., s_n⟩) = π(⟨s_1, s_2, ..., s_n⟩)          if n ≤ k;
π*(⟨s_1, s_2, ..., s_n⟩) = π̃(⟨s_{k+1}, s_{k+2}, ..., s_n⟩)   if n > k,

where π̃ is an ε/4-optimal strategy.
Since Φ is a tail objective, we have Pr_s^{σ,π*}(Φ) = Σ_{t∈S} Pr_t^{σ,π̃}(Φ) · Pr_s^{σ,π*}(X_k = t). Hence we obtain the following inequality:

Pr_s^{σ,π*}(Φ) = Σ_{t∈S} Pr_t^{σ,π̃}(Φ) · Pr_s^{σ,π*}(X_k = t)
             = Σ_{t∈S} Pr_t^{σ,π̃}(Φ) · Pr_s^{σ,π}(X_k = t)
             ≤ Σ_{t∈S} (Val_1(Φ)(t) + ε/4) · Pr_s^{σ,π}(X_k = t)   (since π̃ is an ε/4-optimal strategy)
             = E_s^{σ,π}[Val_1(Φ)(X_k)] + ε/4.

Hence we have

Pr_s^{σ,π*}(Φ) < (Val_1(Φ)(s) − ε) + ε/4 = Val_1(Φ)(s) − 3ε/4 < Val_1(Φ)(s) − ε/2.
Since by assumption σ is an ε/2-optimal strategy, we have a contradiction. This establishes the desired result.
Definition 7 (Perennial ε-optimal strategies) A strategy σ is a perennial ε-optimal strategy for objective Φ if it is ε-optimal for all states s, and for all histories ⟨s_1, s_2, ..., s_k⟩ and all strategies π ∈ Π for player 2 we have Pr_s^{σ,π}(Φ | ⟨s_1, s_2, ..., s_k⟩) ≥ Val_1(Φ)(s_k) − ε, i.e., for every history ⟨s_1, s_2, ..., s_k⟩, the probability to satisfy Φ given the history is within ε of the value at s_k. We denote by Σ^{PL}_ε(Φ) the set of perennial ε-optimal strategies for player 1 for objective Φ. The set of perennial ε-optimal strategies for player 2 is defined similarly, and we denote it by Π^{PL}_ε(Ω\Φ).
Existence of perennial ε-optimal strategies. The results of [dAM01] prove the existence of perennial ε-optimal strategies in concurrent games with parity objectives, for all ε > 0. Since Müller objectives can be reduced to parity objectives, the following proposition follows.

Proposition 6 For all concurrent game structures, for all Müller objectives Φ, for all ε > 0, we have Σ^{PL}_ε(Φ) ≠ ∅ and Π^{PL}_ε(Ω\Φ) ≠ ∅.
Lemma 6 For all concurrent game structures G, for all Müller objectives Φ for player 1 and Ω \ Φ for player 2, we have

inf_{σ∈Σ^{PL}_ε(Φ)} sup_{π∈Π} Pr_s^{σ,π}((Ω \ Φ) ∩ Safe(W_1^{>0} ∩ W_2^{>0})) = 0;

inf_{σ∈Σ_ε(Φ)} sup_{π∈Π} Pr_s^{σ,π}((Ω \ Φ) ∩ Safe(W_1^{>0} ∩ W_2^{>0})) = 0;

inf_{π∈Π^{PL}_ε(Ω\Φ)} sup_{σ∈Σ} Pr_s^{σ,π}(Φ ∩ Safe(W_1^{>0} ∩ W_2^{>0})) = 0;

inf_{π∈Π_ε(Ω\Φ)} sup_{σ∈Σ} Pr_s^{σ,π}(Φ ∩ Safe(W_1^{>0} ∩ W_2^{>0})) = 0.
Proof. We show that

inf_{σ∈Σ^{PL}_ε(Φ)} sup_{π∈Π} Pr_s^{σ,π}((Ω \ Φ) ∩ Safe(W_1^{>0} ∩ W_2^{>0})) = 0.

Since for all ε > 0 we have Σ^{PL}_ε(Φ) ⊆ Σ_ε(Φ), this suffices to prove the first two claims.
The result for the last two claims is symmetric. We prove the first claim as follows. Let W^{>0} = W_1^{>0} ∩ W_2^{>0}. Let η = min_{s∈W^{>0}} Val_1(Φ)(s), and observe that 0 < η < 1. Fix 0 < 2ε < η, and fix a perennial ε-optimal strategy σ ∈ Σ^{PL}_ε(Φ). Consider a strategy π ∈ Π for player 2. Since σ ∈ Σ^{PL}_ε(Φ), for all k ≥ 1 and all histories ⟨s_1, s_2, ..., s_k⟩ such that s_i ∈ W^{>0} for all i ≤ k, we have Pr_s^{σ,π}(Φ | ⟨s_1, s_2, ..., s_k⟩) ≥ η − ε > η/2. For a history ⟨s_1, s_2, ..., s_k⟩ such that there exists i ≤ k with s_i ∉ W^{>0}, we have Pr_s^{σ,π}(Reach(W_1^1 ∪ W_2^1) | ⟨s_1, s_2, ..., s_k⟩) = 1. Hence for all n we have Pr_s^{σ,π}(Φ ∪ Reach(W_1^1 ∪ W_2^1) | F_n) > η/2. Since η/2 > 0, by Lemma 3 we have Pr_s^{σ,π}(Φ ∪ Reach(W_1^1 ∪ W_2^1)) = 1, i.e., Pr_s^{σ,π}((Ω \ Φ) ∩ Safe(W^{>0})) = 0. The desired result follows.
Theorem 10 Given a concurrent game structure G with a tail objective Φ for player 1, let σ_ε ∈ Σ^ℓ_ε(Φ) be a locally ε-optimal strategy that is also ε-optimal from W_1^1 (i.e., for all s ∈ W_1^1 and all π we have Pr_s^{σε,π}(Φ) ≥ 1 − ε). If for all strategies π for player 2 and all states s we have Pr_s^{σε,π}((Ω \ Φ) ∩ Safe(W_1^{>0} ∩ W_2^{>0})) ≤ ε, then σ_ε is a 3ε-optimal strategy.
Proof. Let η_1 = max_{s∈W_2^{>0}} Val_1(Φ)(s). Without loss of generality we assume that the states in W_2^1 are converted to absorbing states, and player 2 wins if the play reaches W_2^1. Consider an arbitrary strategy π for player 2 and a state s ∈ W_1^{>0} ∩ W_2^{>0}, and let α = Val_1(Φ)(s). By the local ε-optimality of σ_ε, for all k ≥ 1 we have α − ε ≤ E_s^{σε,π}[Val_1(Φ)(X_k)]. Since for all s ∈ S we have Val_1(Φ)(s) ≤ 1, we have E_s^{σε,π}[Val_1(Φ)(X_k)] ≤ Pr_s^{σε,π}(X_k ∈ W_1^{>0}). Since W_2^1 is absorbing, the events {X_k ∈ W_1^{>0}} decrease to Safe(W_1^{>0}) as k → ∞, and hence

α − ε ≤ Pr_s^{σε,π}(Safe(W_1^{>0})) = 1 − Pr_s^{σε,π}(Reach(W_2^1)).

Hence we have Pr_s^{σε,π}(Reach(W_2^1)) ≤ 1 − α + ε, and thus Pr_s^{σε,π}((Ω \ Φ) ∩ Reach(W_2^1)) ≤ 1 − α + ε. Since σ_ε is ε-optimal from W_1^1, we have Pr_s^{σε,π}((Ω \ Φ) ∩ Reach(W_1^1)) ≤ ε. The above inequalities, together with the assumption of the theorem, yield the following inequality:

Pr_s^{σε,π}(Ω \ Φ) = Pr_s^{σε,π}((Ω \ Φ) ∩ Safe(W_1^{>0} ∩ W_2^{>0})) + Pr_s^{σε,π}((Ω \ Φ) ∩ Reach(W_2^1)) + Pr_s^{σε,π}((Ω \ Φ) ∩ Reach(W_1^1))
               ≤ ε + (1 − α + ε) + ε = 1 − α + 3ε.

Thus Pr_s^{σε,π}(Φ) ≥ α − 3ε. Since the above inequality holds for all π, we obtain that σ_ε is a 3ε-optimal strategy.
Lemma 6 shows that, for a Müller objective Φ, the ε-optimal strategies for player 1 are limit-sure winning against the player-2 objective (Ω \ Φ) ∩ Safe(W_1^{>0} ∩ W_2^{>0}). Theorem 10 shows that if a strategy for player 1 is ε-limit-sure winning against the player-2 objective (Ω \ Φ) ∩ Safe(W_1^{>0} ∩ W_2^{>0}), then local ε-optimality guarantees 3ε-optimality. This characterizes the ε-optimal strategies as the locally ε-optimal and ε-limit-sure winning strategies.
3.5 Conclusion
In this chapter we studied concurrent games with tail objectives. We proved the positive limit-one property and related the values of zero-sum tail games to Nash equilibria of non-zero-sum reachability games. We also presented a construction of ε-optimal strategies for Müller objectives. The computation of the sets W_1^1 and W_1^{>0}, and of the corresponding sets for player 2, for concurrent games and their sub-classes with tail objectives remains open. The more general problem of computing the value functions also remains open. We believe that algorithms for computing W_1^1 and W_1^{>0}, together with the properties we prove in this chapter, could lead to algorithms for computing value functions. The exact characterization of tail objectives in the Borel hierarchy also remains open.
Chapter 4

Stochastic Müller Games
In this chapter we study 2½-player games with Müller objectives.¹ We present an optimal memory bound for pure (deterministic) almost-sure and optimal strategies in 2½-player graph games with Müller conditions. In fact, we generalize the elegant analysis of [DJW97] to present an upper bound for optimal strategies in 2½-player graph games with Müller conditions that matches the lower bound for sure winning in 2-player games. We present the result for almost-sure strategies in Section 4.3, and then generalize it to optimal strategies in Section 4.4. The results developed also help us to precisely characterize the complexity of several classes of 2½-player Müller games. We show that the quantitative analysis of 2½-player games with Müller objectives is PSPACE-complete. We also show that for two special classes of Müller objectives (namely, union-closed and upward-closed objectives) the problem is coNP-complete. We also study memory bounds for randomized strategies. In the case of randomized strategies we improve the upper bound for almost-sure and optimal strategies as compared to pure strategies (Section 4.5). The problem of matching upper and lower bounds for almost-sure and optimal randomized strategies remains open. We start with some basic results on MDPs in the following section.

¹Preliminary versions of the results of this chapter appeared in [CdAH04] and [Cha07b, Cha07c].
4.1 Markov decision processes
We consider player-1 MDPs and hence only strategies for player 1. Let G = ((S, E), (S1, S2, SP), δ) with S2 = ∅ be a 1½-player game graph; since S2 = ∅, in the sequel of this section we drop S2 from game graphs. We present some basic results on MDPs (1½-player game graphs) together with some new results.
4.1.1 MDPs with reachability objectives
We first consider MDPs with reachability objectives. The following theorem states that the value function of MDPs with reachability objectives can be computed in polynomial time by linear programming. It also follows that pure memoryless optimal strategies exist for MDPs with reachability objectives.
Theorem 11 ([FV97]) Given a player-1 MDP G = ((S, E), (S1, SP), δ) and T ⊆ S, consider the following linear program. For every state s ∈ S there is a variable x_s; the objective function and the constraints are as follows:

min Σ_{s∈S} x_s subject to

x_s ≥ x_t                        for all t ∈ E(s), s ∈ S1;
x_s = Σ_{t∈E(s)} x_t · δ(s)(t)    for all s ∈ SP;
x_s = 1                          for all s ∈ T.

For all s ∈ S we have x_s = Val_1(Reach(T))(s).
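The linear program translates directly into code. The following is a minimal sketch (our illustration, not part of the original development) that solves it with SciPy's linprog on a hypothetical four-state MDP; the player-1 constraint x_s ≥ x_t is encoded as x_t − x_s ≤ 0, and the probabilistic and target constraints as equality rows.

import numpy as np
from scipy.optimize import linprog

# Toy player-1 MDP: state 0 (player 1) has edges to 1 and 3; state 1 is
# probabilistic (0.5 to target 2, 0.5 to sink 3); 2 and 3 are absorbing.
n = 4
S1, SP, T = {0, 2, 3}, {1}, {2}
E = {0: [1, 3], 1: [2, 3], 2: [2], 3: [3]}
delta = {1: {2: 0.5, 3: 0.5}}          # distributions at probabilistic states

A_ub, b_ub, A_eq, b_eq = [], [], [], []
for s in S1:
    for t in E[s]:
        row = np.zeros(n); row[t], row[s] = 1.0, -1.0    # x_t - x_s <= 0
        A_ub.append(row); b_ub.append(0.0)
for s in SP:
    row = np.zeros(n); row[s] = 1.0
    for t, p in delta[s].items():
        row[t] -= p                                      # x_s = sum_t p * x_t
    A_eq.append(row); b_eq.append(0.0)
for s in T:
    row = np.zeros(n); row[s] = 1.0                      # x_s = 1 on the target
    A_eq.append(row); b_eq.append(1.0)

res = linprog(c=np.ones(n), A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
print(res.x)   # approximately [0.5, 0.5, 1.0, 0.0], the reachability values

Minimizing Σ_s x_s selects the least solution of the constraints, which is why the optimum coincides with the value function.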
Theorem 12 ([FV97]) The family Σ^{PM} of pure memoryless strategies suffices for optimality for reachability objectives on all 1½-player game graphs (MDPs).
Almost-sure winning reachability property. Given an MDP G = ((S, E), (S1, SP), δ) and T ⊆ S, let U = Almost_1(Reach(T)) be the set of almost-sure winning states. For all states s ∈ (S \ U), the probability to reach T is less than 1 for every strategy σ. For a state s ∈ U ∩ SP there cannot be an edge to (S \ U): otherwise from s the game reaches (S \ U) with positive probability, and from (S \ U) the set T is reached with probability less than 1, contradicting s ∈ U = Almost_1(Reach(T)). Hence for all states s ∈ U ∩ SP we have E(s) ⊆ U. Moreover, for all states s ∈ U there is a path from s to a state in T. Hence we have the following characterization:

1. for all s ∈ (U \ T) there exists a state t ∈ U ∩ E(s) such that the (BFS) distance to T from t in the graph of G is smaller than the distance to T from s; we refer to choosing such a successor as shortening the distance to T; and

2. for all s ∈ (U \ T) ∩ SP we have E(s) ⊆ U.

Moreover, a pure memoryless strategy σ that at every state s ∈ (U \ T) ∩ S1 chooses a successor shortening the distance to T ensures that T is reached with probability 1.
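The characterization above yields a simple graph fixpoint for computing U = Almost_1(Reach(T)): repeatedly discard states from which T is unreachable inside the current candidate set, as well as probabilistic states with an edge leaving the candidate set. A minimal sketch, under the assumption that T is absorbing (the encoding is ours, for illustration):

def almost_sure_reach(S, SP, E, T):
    # Compute Almost_1(Reach(T)) for a player-1 MDP (a sketch).
    # E maps each state to its set of successors.
    U = set(S)
    while True:
        # backward reachability to T by paths that stay inside U
        R, frontier = set(T) & U, list(set(T) & U)
        while frontier:
            t = frontier.pop()
            for s in U - R:
                if t in E[s]:
                    R.add(s); frontier.append(s)
        # keep reaching states; drop probabilistic states with edges leaving U
        newU = {s for s in R if s not in SP or E[s] <= U}
        if newU == U:
            return U
        U = newU

The pure memoryless almost-sure winning strategy is then obtained as described above: at every player-1 state of U \ T, choose a successor in U that shortens the BFS distance to T.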
4.1.2 MDPs with Müller objectives

We now show that in MDPs with Müller objectives uniform (i.e., uniform over the support) randomized memoryless optimal strategies exist. We develop some facts on end components [CY90, dA97] that will be useful tools for the analysis of MDPs.
Definition 8 (End component) A set U ⊆ S of states is an end component if U is
δ-closed and the subgame graph G ↾ U is strongly connected.
We denote by E ⊆ 2^S the set of all end components of G. The next lemma states that, under any strategy (memoryless or not), with probability 1 the set of states visited infinitely often along a play is an end component. This lemma allows us to derive conclusions on the (infinite) set of plays in an MDP by analyzing the (finite) set of end components of the MDP.
Lemma 7 [CY90, dA97] For all states s ∈ S and strategies σ ∈ Σ, we have Pr_s^σ(Müller(E)) = 1.
For an end component U ∈ E, we denote by σ_U the randomized memoryless strategy that at each state s ∈ U ∩ S1 selects uniformly at random the states in E(s) ∩ U; we call this the uniform strategy for U. Under the uniform strategy for U, all states of U are visited infinitely often; this follows immediately from the fact that U, under σ_U, is a closed connected recurrent class of a Markov chain.

Lemma 8 For all end components U ∈ E and all states s ∈ U, we have Pr_s^{σ_U}(Müller({U})) = 1.
We now prove that for MDPs with Müller objectives randomized memoryless optimal strategies exist. We first state the result and then prove it.

Theorem 13 The family Σ^M of randomized memoryless strategies suffices for optimality with respect to Müller objectives on 1½-player game graphs.
Given a set M ⊆ 2^S of Müller sets, we denote by U = E ∩ M the set of end components that are Müller sets; these are the winning end components. Let T_end = ∪_{U∈U} U be their union. From Lemma 7 and Theorem 9 it follows that the maximal probability of satisfying the objective Müller(M) is equal to the maximal probability of reaching the union of the winning end components.

Lemma 9 For all 1½-player game graphs and all Müller objectives Müller(M) with M ⊆ 2^S, we have Val_1(Müller(M)) = Val_1(Reach(T_end)).
To construct a memoryless optimal strategy, we let U = {U_1, ..., U_k}, thus fixing an arbitrary order on the winning end components, and we define the rank of a state s ∈ T_end by r(s) = max{1 ≤ j ≤ k | s ∈ U_j}. We define a randomized memoryless strategy σ̂ as follows:

• On S \ T_end, the strategy σ̂ coincides with a pure memoryless optimal strategy for reaching T_end.

• At each state s ∈ T_end ∩ S1, the strategy σ̂ coincides with the strategy σ_{U_{r(s)}} (the uniform strategy for U_{r(s)}); that is, it selects uniformly at random the states in E(s) ∩ U_{r(s)} as successors.
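As an illustration, once the winning end components and a pure memoryless optimal reachability strategy are available, the construction of σ̂ is only a few lines. Both inputs are assumed here, and the encoding and the helper name reach_strategy are ours:

def sigma_hat(S1, E, winning_ecs, reach_strategy):
    # Randomized memoryless strategy of Theorem 13 (a sketch).
    # winning_ecs: list [U_1, ..., U_k] of winning end components (as sets);
    # reach_strategy: pure memoryless optimal strategy for Reach(T_end).
    rank = {}
    for j, Uj in enumerate(winning_ecs):
        for s in Uj:
            rank[s] = j            # later sets overwrite, so rank[s] is the max
    strategy = {}
    for s in S1:
        if s in rank:              # inside T_end: play the uniform strategy
            succ = list(E[s] & winning_ecs[rank[s]])
            strategy[s] = {t: 1.0 / len(succ) for t in succ}
        else:                      # outside T_end: head toward T_end
            strategy[s] = {reach_strategy[s]: 1.0}
    return strategy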
Once such a memoryless strategy is fixed, the MDP becomes a Markov chain MC_σ̂, with transition probabilities δ_σ̂ defined as follows:

δ_σ̂(s)(t) = σ̂(s)(t)   for s ∈ S1, t ∈ S;
δ_σ̂(s)(t) = δ(s)(t)    for s ∈ SP, t ∈ S.
The following lemma characterizes the closed connected recurrent classes of this Markov chain in the set T_end, stating that they are all winning end components.

Lemma 10 If C is a closed connected recurrent class of the Markov chain MC_σ̂, then either C ∩ T_end = ∅ or C ∈ U.
Proof. Let E_σ̂ = {(s, t) ∈ T_end × T_end | δ_σ̂(s)(t) > 0}. The closed connected recurrent classes of MC_σ̂ are the terminal strongly connected components of the graph (T_end, E_σ̂). The rank of the states along all paths in (T_end, E_σ̂) is nondecreasing. Hence each terminal strongly connected component C of (T_end, E_σ̂) must consist of states with the same rank, denoted r(C). Clearly, C ⊆ U_{r(C)}. To see that C = U_{r(C)}, note that in C player 1 follows the strategy σ_{U_{r(C)}}, which causes the whole of U_{r(C)} to be visited. Hence, as C is terminal, we have C = U_{r(C)}.
The optimality of the randomized memoryless strategy σ̂ is a simple consequence of Lemma 7. Hence we have the following lemma, which proves Theorem 13.

Lemma 11 For all states s ∈ S, we have Val_1(Müller(M))(s) = Pr_s^{σ̂}(Müller(M)).
4.1.3 MDPs with Rabin and Streett objectives
In this section we present polynomial-time algorithms for computing the values of MDPs with Rabin and Streett objectives. It follows from the results of Subsection 4.1.2 that it suffices to compute the winning end components and then solve an MDP with a reachability objective (which can be achieved in polynomial time by Theorem 11).
Winning end components. Consider a set P = {(E_1, F_1), ..., (E_d, F_d)} of Streett (resp. Rabin) pairs. An end component U is winning for the Streett objective if ∀i ∈ [1..d]. (U ∩ E_i ≠ ∅ ∨ U ∩ F_i = ∅), and winning for the Rabin objective if ∃i ∈ [1..d]. (U ∩ E_i = ∅ ∧ U ∩ F_i ≠ ∅).
Winning end components for Rabin objectives. In [dA97] it was shown that the winning end components for MDPs with Rabin objectives, and the set Almost_1(Rabin(P)), can be computed by d calls to a procedure computing the almost-sure winning states of an MDP with a Büchi objective. In [CJH03] we proved that the set of almost-sure winning states of an MDP with a Büchi objective can be computed in O(m · √m) time, where m is the number of edges. Hence we have the following result.
Theorem 14 For all MDPs G = ((S, E), (S1, SP), δ) with Rabin objective Rabin(P), where P = {(E_1, F_1), (E_2, F_2), ..., (E_d, F_d)}, the following assertions hold:

1. Almost_1(Rabin(P)) can be computed in O(d · m · √m) time, where m = |E|; and
2. Val_1(Rabin(P)) can be computed in polynomial time.
Winning end components for Streett objectives. We now present a polynomial-time algorithm for computing the maximal probability of satisfying a Streett condition in an MDP. We present a polynomial-time algorithm for computing T_end; the computation of the value then reduces to computing the values of an MDP with a reachability objective. To state the algorithm, we say that an end component U ⊆ S is maximal in V ⊆ S if U ⊆ V and there is no end component U′ with U ⊂ U′ ⊆ V. Given a set V ⊆ S, we denote by MaxEC(V) the set consisting of all maximal end components U with U ⊆ V. This set can be computed in quadratic time (in O(n · m) time for graphs with n states and m edges) with standard graph algorithms (see, e.g., [dA97]); a sketch appears after the algorithm below. The set T_end can be computed with the following algorithm.
L := MaxEC(S); D := ∅
while L ≠ ∅ do
    choose U ∈ L and let L := L \ {U}
    if ∀i ∈ [1..d]. (U ∩ E_i ≠ ∅ ∨ U ∩ F_i = ∅)
        then D := D ∪ {U}
        else choose i ∈ [1..d] such that U ∩ E_i = ∅ and U ∩ F_i ≠ ∅, and let L := L ∪ MaxEC(U \ F_i)
    end if
end while
Return: T_end = ∪_{U∈D} U.
It is easy to see that every state s ∈ S is considered as part of an end component in the else-branch of the above algorithm at most once for every 1 ≤ i ≤ d; hence, the algorithm runs in time polynomial in n, m, and d.
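The MaxEC computation mentioned above can be sketched via repeated strongly-connected-component decomposition: within a component, a probabilistic state with a successor outside the component, or any state with no successor inside it, cannot belong to an end component contained in that component, so such states are pruned and the decomposition is repeated. A minimal sketch of this standard scheme, assuming the networkx library (the data encoding is ours, for illustration):

import networkx as nx

def max_end_components(V, E, SP):
    # Maximal end components contained in V (a sketch).
    # E: dict mapping a state to its set of successors; SP: probabilistic states.
    work, mecs = [frozenset(V)], []
    while work:
        U = set(work.pop())
        G = nx.DiGraph()
        G.add_nodes_from(U)
        G.add_edges_from((s, t) for s in U for t in E[s] if t in U)
        for C in nx.strongly_connected_components(G):
            C = set(C)
            bad = {s for s in C
                   if (s in SP and not E[s] <= C)   # delta-closure would fail
                   or not (E[s] & C)}               # no successor inside C
            if bad:
                if C - bad:
                    work.append(frozenset(C - bad))
            elif len(C) > 1 or any(s in E[s] for s in C):
                mecs.append(C)                      # strongly connected and closed
    return mecs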
Theorem 15 For all MDPs G = ((S, E), (S1 , SP ), δ) with Streett objective Streett(P ),
where P = {(E1 , F1 ), (E2 , F2 ), . . . , (Ed , Fd )}, the following assertions hold:
1. Almost 1 (Streett(P )) can be computed in O(d · n · m) time, where n = |S| and m = |E|;
and
2. Val 1 (Streett(P )) can be computed in polynomial time.
4.2 2½-player Games with Müller objectives
We now present a slightly different notation for Müller objectives; it is consistent with the notation of [DJW97], and we adopt it in order to use the lower-bound results of [DJW97]. The results of this chapter (memory bounds and complexity results) also hold for the definitions used in Chapter 2.
Objectives. An objective for a player consists of an ω-regular set of winning plays Φ ⊆ Ω [Tho97]. In this chapter we study zero-sum games [FV97, RF91], where the objectives of the two players are complementary; that is, if the objective of one player is Φ, then the objective of the other player is the complementary set Ω \ Φ. We consider ω-regular objectives specified as Müller objectives. For a play ω = ⟨s_0, s_1, s_2, ...⟩, let Inf(ω) be the set {s ∈ S | s = s_k for infinitely many k ≥ 0} of states that appear infinitely often in ω. We use colors to define objectives, as in [DJW97]. A 2½-player game (G, C, χ, F ⊆ P(C)) consists of a 2½-player game graph G, a finite set C of colors, a partial function χ : S ⇀ C that assigns colors to some states, and a winning condition specified by a subset F of the power set P(C) of colors. The winning condition defines the subset Φ ⊆ Ω of winning plays

Müller(F) = {ω ∈ Ω | χ(Inf(ω)) ∈ F},

that is, the set of plays ω such that the set of colors appearing infinitely often in ω is in F.
Remarks. A winning condition F ⊆ P(C) has a split if there are sets C_1, C_2 ∈ F such that C_1 ∪ C_2 ∉ F. A winning condition F is a Rabin winning condition if it does not have a split, and it is a Streett winning condition if P(C) \ F does not have a split. These notions coincide with the Rabin and Streett winning conditions usually defined in the literature, i.e., as in Chapter 2 (see [Niw97, DJW97] for details).
Determinacy. For sure winning, 1½-player and 2½-player games coincide with 2-player (deterministic) games where the random player (who chooses the successor at the probabilistic states) is interpreted as an adversary, i.e., as player 2. Theorem 2 states the classical determinacy result for 2-player games with Müller objectives. Theorem 16 (obtained as a special case of Theorem 1) states the classical determinacy result for 2½-player game graphs with Müller objectives. It follows from Theorem 16 that for all Müller objectives Φ, for all ε > 0, there exists an ε-optimal strategy σ_ε for player 1 such that for all π and all s ∈ S we have Pr_s^{σε,π}(Φ) ≥ Val_1(Φ)(s) − ε.

Theorem 16 (Quantitative determinacy [Mar98]) For all 2½-player game graphs, for all Müller winning conditions F ⊆ P(C), and for all states s, we have Val_1(Müller(F))(s) + Val_2(Ω \ Müller(F))(s) = 1.
4.3 Optimal Memory Bound for Pure Qualitative Winning Strategies
In this section we present optimal memory bounds for pure strategies with respect to qualitative (almost-sure and positive) winning in 2½-player game graphs with Müller winning conditions. The result is obtained by a generalization of the result of [DJW97] and depends on the novel constructions of Zielonka [Zie98] for 2-player games. In [DJW97] the authors use an insightful analysis of Zielonka's construction to present an upper bound (and a matching lower bound) on the memory of sure winning strategies in 2-player games with Müller objectives. In this section we generalize the result of [DJW97] to show that the same upper bound holds for qualitative winning strategies in 2½-player games with Müller objectives. We now introduce some notation and the Zielonka tree of a Müller condition.
Notation. Let F ⊆ P(C) be a winning condition. For D ⊆ C we define (F ↾ D) ⊆ P(D) as the set {D′ ∈ F | D′ ⊆ D}. For a Müller condition F ⊆ P(C) we denote by F̄ the complementary condition, i.e., F̄ = P(C) \ F. Similarly, for an objective Φ we denote by Φ̄ the complementary objective, i.e., Φ̄ = Ω \ Φ.
Definition 9 (Zielonka tree of a winning condition [Zie98]) The Zielonka tree of a winning condition F ⊆ P(C), denoted Z_{F,C}, is defined inductively as follows:

1. If C ∉ F, then Z_{F,C} = Z_{F̄,C}, where F̄ = P(C) \ F.

2. If C ∈ F, then the root of Z_{F,C} is labeled with C. Let C_0, C_1, ..., C_{k−1} be all the maximal sets in {X ∉ F | X ⊆ C}. Then we attach to the root, as its subtrees, the Zielonka trees of F ↾ C_i, i.e., Z_{F↾C_i, C_i}, for i = 0, 1, ..., k − 1.

Hence the Zielonka tree is a tree whose nodes are labeled by sets of colors. A node of Z_{F,C} is a 0-level node if it is labeled with a set from F, and otherwise it is a 1-level node. In the sequel we write Z_F for Z_{F,C} when C is clear from the context.
Definition 10 (The number m_F of the Zielonka tree) Let F ⊆ P(C) be a winning condition and let Z_{F_0,C_0}, Z_{F_1,C_1}, ..., Z_{F_{k−1},C_{k−1}} be the subtrees attached to the root of the tree Z_{F,C}, where F_i = F ↾ C_i ⊆ P(C_i) for i = 0, 1, ..., k − 1. We define the number m_F inductively as follows:

m_F = 1                                        if Z_{F,C} does not have any subtrees;
m_F = max{m_{F_0}, m_{F_1}, ..., m_{F_{k−1}}}   if C ∉ F (1-level node);
m_F = Σ_{i=0}^{k−1} m_{F_i}                     if C ∈ F (0-level node).
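The recursion of Definitions 9 and 10 can be transcribed directly: for a node labeled D, the children are the maximal subsets of D whose membership in F differs from that of D, which covers both cases of Definition 9 uniformly. A small sketch with our own encoding, where F is a set of frozensets of colors:

from itertools import combinations

def subsets(D):
    D = list(D)
    return [frozenset(c) for r in range(len(D) + 1)
            for c in combinations(D, r)]

def zielonka_children(D, F):
    # maximal subsets of D whose F-membership differs from that of D
    cand = [X for X in subsets(D) if (X in F) != (D in F)]
    return [X for X in cand if not any(X < Y for Y in cand)]

def m_number(D, F):
    kids = zielonka_children(D, F)
    if not kids:
        return 1
    vals = [m_number(X, F) for X in kids]
    return sum(vals) if D in F else max(vals)   # 0-level: sum; 1-level: max

# Example: C = {a, b} and F = {{a}, {a, b}}, i.e., color a must repeat.
C = frozenset('ab')
F = {frozenset('a'), frozenset('ab')}
print(m_number(C, F))   # prints 1

The value 1 for this Büchi-like condition is consistent with the fact that memoryless sure winning strategies suffice for such conditions.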
We now state the results on the strategy complexity of Müller objectives and their subclasses.

Theorem 17 (Finite-memory and memoryless strategies) The following assertions hold.

1. ([GH82]). The family of pure finite-memory strategies suffices for sure winning with respect to Müller objectives on 2-player game graphs.

2. ([EJ88]). The family of pure memoryless strategies suffices for sure winning with respect to Rabin objectives on 2-player game graphs.
Theorem 18 (Size of memory for sure-winning strategies [DJW97]) For all 2-player game graphs and all Müller winning conditions F, pure sure winning strategies of size m_F suffice for sure winning for the objective Müller(F). Moreover, for every Müller winning condition F there exists a 2-player game graph such that every pure sure winning strategy for the objective Müller(F) requires memory of size at least m_F.
Our goal is to show that for all winning conditions F, pure finite-memory qualitative winning strategies of size m_F exist in 2½-player games. This proves the upper bound; the results of [DJW97] (Theorem 18) already established a matching lower bound for 2-player games. Together this establishes the optimal memory bound for qualitative winning strategies in 2½-player games. We start with the key notion of attractors, which will be crucial in our proofs.
Definition 11 (Attractors) Given a 2½-player game graph G, a set U ⊆ S of states such that G ↾ U is a subgame, and T ⊆ S, we define Attr_{1,P}(T, U) as follows:

T_0 = T ∩ U;  and for j ≥ 0,
T_{j+1} = T_j ∪ {s ∈ (S_1 ∪ S_P) ∩ U | E(s) ∩ T_j ≠ ∅} ∪ {s ∈ S_2 ∩ U | E(s) ∩ U ⊆ T_j};

and A = Attr_{1,P}(T, U) = ∪_{j≥0} T_j. We obtain Attr_{2,P}(T, U) by exchanging the roles of player 1 and player 2. A pure memoryless attractor strategy σ^A : (A \ T) ∩ S_1 → S for player 1 on A to T is as follows: for j > 0 and a state s ∈ (T_j \ T_{j−1}) ∩ S_1, the strategy chooses a successor σ^A(s) ∈ T_{j−1} (which exists by the definition of T_j).
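The attractor is a straightforward fixpoint to compute; the sketch below (our own illustrative encoding) also extracts a pure memoryless attractor strategy by recording, for each player-1 state added to the attractor, a successor lying one layer closer to T:

def attractor_1P(T, U, E, S1, S2):
    # Attr_{1,P}(T, U) with a pure memoryless attractor strategy (a sketch).
    # States outside S1 and S2 are probabilistic; E maps a state to its successors.
    U = set(U)
    A = set(T) & U
    strategy = {}
    changed = True
    while changed:
        changed = False
        for s in U - A:
            if s in S2:
                if E[s] & U <= A:         # player 2 cannot avoid A inside U
                    A.add(s); changed = True
            elif E[s] & A:                # a player-1 or random state can enter A
                if s in S1:
                    strategy[s] = next(iter(E[s] & A))
                A.add(s); changed = True
    return A, strategy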
Lemma 12 (Attractor properties) Let G be a 2½-player game graph and let U ⊆ S be a set of states such that G ↾ U is a subgame. For a set T ⊆ S of states, let Z = Attr_{1,P}(T, U). Then the following assertions hold.
1. G ↾ (U \ Z) is a subgame.

2. Let σ^Z be a pure memoryless attractor strategy for player 1. For all strategies π for player 2 in the subgame G ↾ U and for all states s ∈ U we have:

(a) if Pr_s^{σ^Z,π}(Reach(Z)) > 0, then Pr_s^{σ^Z,π}(Reach(T)) > 0; and
(b) if Pr_s^{σ^Z,π}(Büchi(Z)) > 0, then Pr_s^{σ^Z,π}(Büchi(T) | Büchi(Z)) = 1.
Proof. We prove the two parts below.

1. Subgame property. For a state s ∈ (U \ Z) ∩ (S_1 ∪ S_P) we have E(s) ∩ Z = ∅ (otherwise s would have been in Z), i.e., E(s) ∩ U ⊆ U \ Z. For a state s ∈ S_2 ∩ (U \ Z) we have E(s) ∩ (U \ Z) ≠ ∅ (otherwise s would have been in Z). It follows that G ↾ (U \ Z) is a subgame.
2. We now prove the two claims.

(a) Positive-probability reachability. Let δ_min = min{δ(s)(t) | s ∈ S_P, t ∈ S, δ(s)(t) > 0}, and observe that δ_min > 0. Let Z = ∪_{j≥0} T_j with T_0 = T, as in the definition of attractors. Consider the strategy σ^Z_{1,P} of both player 1 and the random player on Z defined as follows: player 1 follows an attractor strategy σ^Z on Z to T, and for s ∈ (T_j \ T_{j−1}) ∩ S_P the random player chooses a successor t ∈ T_{j−1}. Such a successor exists by definition, and such a choice is made in the game with probability at least δ_min. The strategy σ^Z_{1,P} ensures that for all states s ∈ Z and all strategies π for player 2 in G ↾ U, the set T ∩ U is reached within |Z| steps. Given that player 1 follows the attractor strategy σ^Z, the probability of the choices of σ^Z_{1,P} is at least δ_min^{|Z|}. It follows that the pure memoryless attractor strategy σ^Z ensures that for all states s ∈ Z and all strategies π for player 2 in G ↾ U we have

Pr_s^{σ^Z,π}(Reach(T)) ≥ (δ_min)^{|Z|} > 0.

The desired result follows.
(b) Almost-sure Büchi property. Given the pure memoryless attractor strategy σ^Z, if the set Z is visited ℓ times, then by the previous part T is reached at least once with probability at least 1 − (1 − δ_min^{|Z|})^ℓ, which goes to 1 as ℓ → ∞. Hence for all states s and strategies π in G ↾ U, given Pr_s^{σ^Z,π}(Büchi(Z)) > 0, we have Pr_s^{σ^Z,π}(Reach(T) | Büchi(Z)) = 1. Since, given the event that Z is visited infinitely often (i.e., Büchi(Z)), the set T is reached with probability 1 from all states, it follows that T is visited infinitely often with probability 1. Formally, for all states s and strategies π in G ↾ U, given Pr_s^{σ^Z,π}(Büchi(Z)) > 0, we have Pr_s^{σ^Z,π}(Büchi(T) | Büchi(Z)) = 1.

The result of the lemma follows.
Lemma 12 shows that the complement of an attractor is a subgame, and that a pure memoryless attractor strategy ensures the following: if the attractor of a set T is reached with positive probability, then T is reached with positive probability; and given that the attractor of T is visited infinitely often, T is visited infinitely often with probability 1. We now present the main result of this section (the upper bound on memory for qualitative winning strategies). A matching lower bound follows from the results of [DJW97] for 2-player games (see Theorem 20).
Theorem 19 (Qualitative forgetful determinacy) Let (G, C, χ, F) be a 2½-player game with Müller winning condition F for player 1. Let Φ = Müller(F), and consider the following sets:

W_1^{>0} = Positive_1(Φ);   W_1 = Almost_1(Φ);
W_2^{>0} = Positive_2(Φ̄);   W_2 = Almost_2(Φ̄).

The following assertions hold.
1. We have (a) W_1^{>0} ∪ W_2 = S and W_1^{>0} ∩ W_2 = ∅; and (b) W_2^{>0} ∪ W_1 = S and W_2^{>0} ∩ W_1 = ∅.

2. (a) Player 1 has a pure strategy σ with memory of size m_F such that for all states s ∈ W_1^{>0} and all strategies π for player 2 we have Pr_s^{σ,π}(Φ) > 0; and (b) player 2 has a pure strategy π with memory of size m_F̄ such that for all states s ∈ W_2 and all strategies σ for player 1 we have Pr_s^{σ,π}(Φ̄) = 1.

3. (a) Player 1 has a pure strategy σ with memory of size m_F such that for all states s ∈ W_1 and all strategies π for player 2 we have Pr_s^{σ,π}(Φ) = 1; and (b) player 2 has a pure strategy π with memory of size m_F̄ such that for all states s ∈ W_2^{>0} and all strategies σ for player 1 we have Pr_s^{σ,π}(Φ̄) > 0.
Proof. The first part of the result is a consequence of Theorem 16. We concentrate on the proof of part 2; the last part (part 3) follows from a symmetric argument.

The proof goes by induction on the structure of the Zielonka tree Z_{F,C} of the winning condition F. We assume that C ∉ F. The case C ∈ F can be proved by a similar argument: if C ∈ F, then we consider a fresh color ĉ ∉ C and the winning condition F̂ = F ⊆ P(C ∪ {ĉ}), for which C ∪ {ĉ} ∉ F̂. Hence we assume, without loss of generality, that C ∉ F, and let C_0, C_1, ..., C_{k−1} be the labels of the subtrees attached to the root C, i.e., C_0, C_1, ..., C_{k−1} are the maximal sets of colors that appear in F. We define by induction a non-decreasing sequence of sets (U_j)_{j≥0} as follows. Let U_0 = ∅, and for j > 0 define U_j as follows:
Figure 4.1: The sets of the construction.
1. A_j = Attr_{1,P}(U_{j−1}, S) and X_j = S \ A_j;

2. D_j = C \ C_{j mod k} and Y_j = X_j \ Attr_{2,P}(χ^{−1}(D_j), X_j);

3. let Z_j be the set of positive winning states for player 1 in the game (G ↾ Y_j, C_{j mod k}, χ, F ↾ C_{j mod k}), i.e., Z_j = Positive_1(Müller(F ↾ C_{j mod k})) in G ↾ Y_j; hence (Y_j \ Z_j) is almost-sure winning for player 2 in the subgame; and

4. U_j = A_j ∪ Z_j.
Figure 4.1 depicts these sets. The properties of attractors and of almost-sure winning states ensure that certain edges between the sets are forbidden; this is shown in Figure 4.2. We start with a few observations about the construction.
1. Observation 1. For all s ∈ S_2 ∩ Z_j, we have E(s) ⊆ Z_j ∪ A_j. This follows from the following case analysis.

• Since Y_j is the complement in X_j of the attractor set Attr_{2,P}(χ^{−1}(D_j), X_j), it follows that for all states s ∈ S_2 ∩ Y_j we have E(s) ∩ X_j ⊆ Y_j, and hence E(s) ⊆ Y_j ∪ A_j.
Figure 4.2: The sets of the construction with forbidden edges.
• Since player 2 can win almost-surely from the set Yj \ Zj , if a state s ∈ Yj ∩ S2
has an edge to Yj \ Zj , then s ∈ Yj \ Zj . Hence for s ∈ S2 ∩ Zj we have
E(s) ∩ (Yj \ Zj ) = ∅.
2. Observation 2. For all s ∈ Xj ∩ (S1 ∪ SP ) we have (a) E(s) ∩ Aj = ∅; else s would
have been in Aj ; and (b) if s ∈ Yj \ Zj , then E(s) ∩ Zj = ∅ (else s would have been
in Zj ).
3. Observation 3. For all s ∈ Yj ∩ SP we have E(s) ⊆ Yj .
We denote by F_i the winning condition F ↾ C_i, for i = 0, 1, ..., k − 1, and by F̄_i = P(C_i) \ F_i its complement. By the induction hypothesis on F_i = F ↾ C_{j mod k}, player 1 has a pure positive winning strategy of size m_{F_i} from Z_j, and player 2 has a pure almost-sure winning strategy of size m_{F̄_i} from Y_j \ Z_j. Let W = ∪_{j≥0} U_j. We show in Lemma 13 that player 1 has a pure positive winning strategy of size m_F from W, and then in Lemma 14 that player 2 has a pure almost-sure winning strategy of size m_F̄ from S \ W.
This completes the proof. We now prove Lemmas 13 and 14.
Lemma 13 Player 1 has a pure positive winning strategy of size mF from the set W .
Proof. By the induction hypothesis on U_{j−1}, player 1 has a pure positive winning strategy σ^U_{j−1} of size m_F from U_{j−1}. From the set A_j = Attr_{1,P}(U_{j−1}, S), player 1 has a pure memoryless attractor strategy σ^A_j that brings the game to U_{j−1} with positive probability (Lemma 12, part 2(a)), after which player 1 uses σ^U_{j−1} and ensures winning with positive probability from the set A_j. Let σ^Z_j be the pure positive winning strategy for player 1 on Z_j of size m_{F_i}, where i = j mod k. We now show that the combination of the strategies σ^U_{j−1}, σ^A_j and σ^Z_j ensures positive-probability winning for player 1 from U_j. If the play starts at a state s ∈ Z_j, then player 1 follows σ^Z_j. If the play stays in Y_j forever, then the strategy σ^Z_j ensures that player 1 wins with positive probability. By Observation 1 of Theorem 19, for all states s ∈ Y_j ∩ S_2 we have E(s) ⊆ Y_j ∪ A_j; hence if the play leaves Y_j, then player 2 must choose an edge to A_j. In A_j player 1 can use the attractor strategy σ^A_j followed by σ^U_{j−1} to ensure a positive-probability win. Hence if the play stays in Y_j forever with probability 1, then σ^Z_j ensures a positive-probability win, and if the play reaches A_j with positive probability, then σ^A_j followed by σ^U_{j−1} ensures a positive-probability win.
We now formally present the strategy σ^U_j defined on U_j. Let σ^Z_j = (σ^Z_{j,u}, σ^Z_{j,m}) be the strategy obtained from the induction hypothesis, defined on Z_j (and arbitrary elsewhere), of size m_{F_i} with i = j mod k, ensuring a positive-probability win on Z_j. Here σ^Z_{j,u} is the memory-update function and σ^Z_{j,m} is the next-move function of σ^Z_j. We assume the memory M_{F_i} of σ^Z_j to be the set {1, 2, ..., m_{F_i}}. The strategy σ^A_j : (A_j \ U_{j−1}) ∩ S_1 → A_j is a pure memoryless attractor strategy on A_j to U_{j−1}. The strategy σ^U_j is as follows: the memory-update function is

σ^U_{j,u}(s, m) = σ^U_{j−1,u}(s, m)   if s ∈ U_{j−1};
σ^U_{j,u}(s, m) = σ^Z_{j,u}(s, m)     if s ∈ Z_j, m ∈ M_{F_i};
σ^U_{j,u}(s, m) = 1                  otherwise;

and the next-move function is

σ^U_{j,m}(s, m) = σ^U_{j−1,m}(s, m)   if s ∈ U_{j−1} ∩ S_1;
σ^U_{j,m}(s, m) = σ^Z_{j,m}(s, m)     if s ∈ Z_j ∩ S_1, m ∈ M_{F_i};
σ^U_{j,m}(s, m) = σ^Z_{j,m}(s, 1)     if s ∈ Z_j ∩ S_1, m ∉ M_{F_i};
σ^U_{j,m}(s, m) = σ^A_j(s)           if s ∈ (A_j \ U_{j−1}) ∩ S_1.

The strategy σ^U_j formalizes the strategy described above and proves the result.
Lemma 14 Player 2 has a pure almost-sure winning strategy of size m_F̄ from the set S \ W.
Proof. Let ℓ ∈ N be such that ℓ mod k = 0 and W = U_{ℓ−1} = U_ℓ = U_{ℓ+1} = ... = U_{ℓ+k−1}. From the equality W = U_{ℓ−1} = U_ℓ we have Attr_{1,P}(W, S) = W. Let W̄ = S \ W. Then G ↾ W̄ is a subgame (by Lemma 12), and for all s ∈ W̄ ∩ (S_1 ∪ S_P) we have E(s) ⊆ W̄. The equality U_{ℓ+i−1} = U_{ℓ+i} implies that Z_{ℓ+i} = ∅ for all i = 0, 1, ..., k − 1. By the induction hypothesis, for all i = 0, 1, ..., k − 1, player 2 has a pure almost-sure winning strategy π^i of size m_{F̄_i} in the game (G ↾ Y_{ℓ+i}, C_i, χ, F ↾ C_i).

We now describe the construction of a pure almost-sure winning strategy π* for player 2 on W̄. For D_i = C \ C_i we denote by D̂_i = χ^{−1}(D_i) the set of states with colors in D_i. If the play starts at a state in Y_{ℓ+i}, for i = 0, 1, ..., k − 1, then player 2 uses the almost-sure winning strategy π^i. If the play leaves Y_{ℓ+i}, then the play must reach W̄ \ Y_{ℓ+i} = Attr_{2,P}(D̂_i, W̄), since player-1 and random states do not have edges to W. In Attr_{2,P}(D̂_i, W̄), player 2 plays a pure memoryless attractor strategy to reach the set D̂_i with positive probability. If the set D̂_i is reached, then a state in Y_{ℓ+((i+1) mod k)} or in Attr_{2,P}(D̂_{(i+1) mod k}, W̄) is reached. If Y_{ℓ+((i+1) mod k)} is reached, then π^{(i+1) mod k} is followed, and otherwise the pure memoryless attractor strategy to reach the set D̂_{(i+1) mod k} with positive probability is followed. Of course, the play may leave Y_{ℓ+((i+1) mod k)} and reach Y_{ℓ+((i+2) mod k)}, and then we repeat the reasoning, and so on. We analyze the various cases to prove that π* is almost-sure winning for player 2.
1. If the play finally settles in some Y_{ℓ+i}, for some i ∈ {0, 1, ..., k − 1}, then from this moment on player 2 follows π^i and ensures that the objective Φ̄ is satisfied with probability 1. Formally, for all states s ∈ W̄ and all strategies σ for player 1 we have Pr_s^{σ,π*}(Φ̄ | coBüchi(Y_{ℓ+i})) = 1. This holds for all i = 0, 1, ..., k − 1, and hence for all states s ∈ W̄ and all strategies σ for player 1 we have Pr_s^{σ,π*}(Φ̄ | ∪_{0≤i≤k−1} coBüchi(Y_{ℓ+i})) = 1.

2. Otherwise, for all i = 0, 1, ..., k − 1, the set W̄ \ Y_{ℓ+i} = Attr_{2,P}(D̂_i, W̄) is visited infinitely often. By Lemma 12, given that Attr_{2,P}(D̂_i, W̄) is visited infinitely often, the attractor strategy ensures that the set D̂_i is visited infinitely often with probability 1. Formally, for all states s ∈ W̄, all strategies σ for player 1, and all i = 0, 1, ..., k − 1, we have Pr_s^{σ,π*}(Büchi(D̂_i) | Büchi(W̄ \ Y_{ℓ+i})) = 1, and also Pr_s^{σ,π*}(Büchi(D̂_i) | ∩_{0≤i≤k−1} Büchi(W̄ \ Y_{ℓ+i})) = 1. It follows that for all states s ∈ W̄ and all strategies σ for player 1 we have Pr_s^{σ,π*}(∩_{0≤i≤k−1} Büchi(D̂_i) | ∩_{0≤i≤k−1} Büchi(W̄ \ Y_{ℓ+i})) = 1. Hence, with probability 1 the play visits states with colors not in C_i, for every i; hence the set of colors visited infinitely often is not contained in any C_i. Since C_0, C_1, ..., C_{k−1} are all the maximal sets of colors appearing in F, the set of colors visited infinitely often is not in F with probability 1, and hence player 2 wins almost-surely.

Hence it follows that for all strategies σ and all states s ∈ (S \ W) we have Pr_s^{σ,π*}(Φ̄) = 1.
s
To complete the proof we present precise description of the strategy π ∗ with memory of size
i ) be an almost-sure winning strategy for player 2 for the subgame on
mF . Let π i = (πui , πm
79
CHAPTER 4. STOCHASTIC MÜLLER GAMES
Yℓ+i with memory MF i . By definition we have mF =
Pk−1
i=0
mF i . Let MF =
Sk−1
i=0 (MF i
×
{i}). This set is not exactly the set {1, 2, . . . , mF }, but has the same cardinality (which
suffices for our purpose). We define the strategy π ∗ as follows:



πui (s, (m, i))
s ∈ Yℓ+i
∗
πu (s, (m, i)) =


(1, i + 1 mod k) otherwise.
∗
(s, (m, i)) =
πm



i (s, (m, i))


πm




π Li (s)







s i
s ∈ Yℓ+i
bi
s ∈ Li \ D
b i , si ∈ E(s) ∩ W .
s∈D
b i , W ); π Li is a pure memoryless attractor strategy on Li to D
b i , and
where Li = Attr2,P (D
si is a successor state of s in W (such a state exists since W induces a subgame). This
formally represents π ∗ and the size of π ∗ satisfies the required bound. Observe that the
disjoint sum of all MF i was required since Yℓ , Yℓ+1 , . . . , Yℓ+k−1 may not be disjoint and the
strategy π ∗ need to know which Yj the play is in.
It follows from the existence of pure finite-memory qualitative winning strategies that, for Müller objectives, the sets of almost-sure and limit-sure winning states coincide for 2½-player game graphs. This, along with Theorem 5 and Corollary 4, gives us the following corollary.

Corollary 5 For all 2½-player game graphs G and all Müller objectives Ψ_1, we have Almost_1(Ψ_1) = Limit_1(Ψ_1). Moreover, (a) if for all states s ∈ S we have Val_1(Ψ_1)(s) > 0, then Almost_1(Ψ_1) = S; and (b) if Almost_1(Ψ_1) = ∅, then Almost_2(Ω \ Ψ_1) = S.
Lower bound. In [DJW97] the authors show a matching lower bound for sure winning strategies in 2-player games (see Theorem 18). Note that in 2-player games any pure almost-sure winning or pure positive winning strategy is also a sure winning strategy. This observation, along with the result of [DJW97], gives the following result.
Theorem 20 (Lower bound [DJW97]) For all Müller winning conditions F ⊆ P(C), there exists a 2-player game (G, C, χ, F) (with a 2-player game graph G) such that every pure almost-sure and positive winning strategy for player 1 requires memory of size at least m_F, and every pure almost-sure and positive winning strategy for player 2 requires memory of size at least m_F̄.
4.3.1 Complexity of qualitative analysis
We now present algorithms to compute the almost-sure and positive winning states for Müller objectives Müller(F) in 2½-player games. We consider two cases: the case C ∈ F and the case C ∉ F. We present the algorithm for the latter case (which recursively calls the algorithm for the former case). Once the algorithm for the latter case is obtained, we show how it can be used iteratively to solve the former case.
Informal description of the algorithm. We present an algorithm to compute the positive winning set for player 1 and the almost-sure winning set for player 2 for a Müller objective Müller(F) for player 1 in 2½-player game graphs. We consider the case C ∉ F and refer to this algorithm as MüllerQualitativeWithoutC; the algorithm for the case C ∈ F is referred to as MüllerQualitativeWithC. The algorithm proceeds by iteratively removing positive winning sets for player 1; at iteration j the game graph is denoted G^j and its set of states S^j. The algorithm is described as Algorithm 1.
Correctness. If W_1 and W_2 are the outputs of Algorithm 1, then W_1 = Positive_1(Müller(F)) and W_2 = Almost_2(Müller(F̄)); the correctness follows from the correctness arguments of Theorem 19. We now present an algorithm to compute the almost-sure winning states Almost_1(Müller(F)) for player 1 and the positive winning states Positive_2(Müller(F̄)) for player 2, for Müller objectives Müller(F) with C ∉ F. Once this algorithm is presented, it is easy to exchange the roles of the players to obtain the algorithm MüllerQualitativeWithC.
Algorithm 1 MüllerQualitativeWithoutC
Input: a 2½-player game graph G and a Müller objective Müller(F) for player 1, with F ⊆ P(C) and C ∉ F.
Output: W_1 and W_2.
1. Let C_0, C_1, ..., C_{k−1} be the maximal sets that appear in F.
2. U_0 := ∅; j := 0; G^0 := G;
3. do {
   3.1 D_j := C \ C_{j mod k};
   3.2 Y_j := S^j \ Attr_{2,P}(χ^{−1}(D_j), S^j);
   3.3 (A^j_1, A^j_2) := MüllerQualitativeWithC(G^j ↾ Y_j, F ↾ C_{j mod k});
   3.4 if (A^j_1 ≠ ∅) then
       3.4.1 U_{j+1} := U_j ∪ Attr_{1,P}(U_j ∪ A^j_1, S^j);
       3.4.2 G^{j+1} := G ↾ (S \ U_{j+1});
       (otherwise U_{j+1} := U_j and G^{j+1} := G^j)
   3.5 j := j + 1;
   } while (j ≤ k ∨ ¬(j mod k = 0 ∧ j > k ∧ ∀i. j − k ≤ i ≤ j. A^i_1 = ∅));
4. return (W_1, W_2) := (U_j, S \ U_j).
The algorithm to compute the almost-sure winning states for player 1 for Müller objectives Müller(F) with C ∉ F proceeds as follows: it iteratively uses MüllerQualitativeWithoutC and runs for at most |S| iterations. At iteration j the algorithm computes the almost-sure winning set A^j_2 for player 2 in the current subgame G^j, and the set of states from which player 2 can reach A^j_2 with positive probability. This set is removed from the game graph, and the algorithm iterates on the smaller game graph. The algorithm is formally described as Algorithm 2.
Algorithm 2 MüllerQualitativeWithoutCIterative
Input: a 2½-player game graph G and a Müller objective Müller(F) for player 1, with F ⊆ P(C) and C ∉ F.
Output: W_1 and W_2.
1. Let C_0, C_1, ..., C_{k−1} be the maximal sets that appear in F.
2. X_0 := ∅; j := 0; G^0 := G;
3. do {
   3.1 (A^j_1, A^j_2) := MüllerQualitativeWithoutC(G^j, F);
   3.2 if (A^j_2 ≠ ∅) then
       3.2.1 X_{j+1} := X_j ∪ Attr_{2,P}(X_j ∪ A^j_2, S^0);
       3.2.2 G^{j+1} := G ↾ (S \ X_{j+1});
       (otherwise X_{j+1} := X_j and G^{j+1} := G^j)
   3.3 j := j + 1;
   } while (A^{j−1}_2 ≠ ∅);
4. return (W_1, W_2) := (S \ X_j, X_j).
Correctness. Let W_1 and W_2 be the outputs of Algorithm 2. We argue that W_1 = Almost_1(Müller(F)) and W_2 = Positive_2(Müller(F̄)). It is clear that W_2 ⊆ Positive_2(Müller(F̄)). We now argue that W_1 = Almost_1(Müller(F)), which completes the correctness argument. When the algorithm terminates, let the game graph be G^j; we have A^j_2 = ∅. Then in G^j player 1 wins with positive probability from all states. It follows from Theorem 5 and Corollary 5 that if a player wins a 2½-player game with positive probability from all states for a Müller objective, then the player wins with value 1 from all states, and in fact wins almost-surely from all states. It follows that W_1 = Almost_1(Müller(F)).
Time and space complexity. We now argue that the space requirements of the algorithms are polynomial. Let us denote the space recurrence of Algorithm 1 by S(n, c) for game graphs with n states and Müller objectives Müller(F) with c colors (i.e., F ⊆ P(C) with |C| = c). The recurrence satisfies S(n, c) = O(n) + S(n, c − 1) = O(n · c): the algorithm requires space for recursive calls with at least one less color (the term S(n, c − 1)) and O(n) space for the computation of the loop of the algorithm. This gives a PSPACE upper bound, and a matching lower bound (PSPACE-hardness) for the special case of 2-player game graphs is given in [HD05]. The result also improves the previous 2EXPTIME bound for the qualitative analysis of 2½-player games with Müller objectives (Corollary 3).
Theorem 21 (Algorithm and complexity) The following assertions hold.

1. Given a 2½-player game graph G and a Müller winning condition F, Algorithm 1 and Algorithm 2 compute almost-sure winning strategies and the almost-sure winning sets in O(((|S| + |E|) · d)^{h+1}) time and O(|S| · |C|) space, where d is the maximum degree of a state and h is the height of the Zielonka tree Z_F.

2. Given a 2½-player game graph G, a Müller objective Φ, and a state s, it is PSPACE-complete to decide whether s ∈ Almost_1(Φ).
4.4 Optimal Memory Bound for Pure Optimal Strategies
In this section we extend the sufficiency results for families of strategies from almost-sure winning to optimality, with respect to all Müller objectives. In the following we fix a 2½-player game graph G. We first present a useful proposition and then some definitions. Since Müller objectives are infinitary objectives (independent of finite prefixes), the following proposition is immediate.
Proposition 7 (Optimality conditions) For all Müller objectives Φ and for every s ∈ S the following conditions hold.

1. If s ∈ S_1, then for all t ∈ E(s) we have Val_1(Φ)(s) ≥ Val_1(Φ)(t), and for some t ∈ E(s) we have Val_1(Φ)(s) = Val_1(Φ)(t).

2. If s ∈ S_2, then for all t ∈ E(s) we have Val_1(Φ)(s) ≤ Val_1(Φ)(t), and for some t ∈ E(s) we have Val_1(Φ)(s) = Val_1(Φ)(t).

3. If s ∈ S_P, then Val_1(Φ)(s) = Σ_{t∈E(s)} Val_1(Φ)(t) · δ(s)(t).

Similar conditions hold for the value function Val_2(Ω \ Φ) of player 2.
Definition 12 (Value classes) Given a Müller objective Φ, for every real r ∈ [0, 1] the value class with value r, VC(Φ, r) = {s ∈ S | Val_1(Φ)(s) = r}, is the set of states with value r for player 1. For r ∈ [0, 1] we denote by VC(Φ, > r) = ∪_{q>r} VC(Φ, q) the union of the value classes greater than r, and by VC(Φ, < r) = ∪_{q<r} VC(Φ, q) the union of the value classes smaller than r.
Definition 13 (Boundary probabilistic states) Given a set U of states, a state s ∈ U ∩ S_P is a boundary probabilistic state for U if E(s) ∩ (S \ U) ≠ ∅, i.e., the probabilistic state has an edge out of the set U. We denote by Bnd(U) the set of boundary probabilistic states for U. For a value class VC(Φ, r) we denote by Bnd(Φ, r) the set of boundary probabilistic states of the value class VC(Φ, r).
Observation. It follows from Proposition 7 that for a state s ∈ Bnd(Φ, r) we have E(s) ∩ VC(Φ, > r) ≠ ∅ and E(s) ∩ VC(Φ, < r) ≠ ∅, i.e., the boundary probabilistic states have edges to both higher and lower value classes. It follows that for all Müller objectives Φ we have Bnd(Φ, 1) = ∅ and Bnd(Φ, 0) = ∅.
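Both value classes and boundary sets can be read off directly from the value function; a minimal sketch with our own encoding:

from collections import defaultdict

def value_classes(val):
    # Group states by value: returns the map r -> VC(Phi, r).
    classes = defaultdict(set)
    for s, r in val.items():
        classes[r].add(s)
    return dict(classes)

def boundary(U, E, SP):
    # Bnd(U): probabilistic states of U with an edge leaving U.
    return {s for s in U & SP if E[s] - U}

By the observation above, for a value class U = VC(Φ, r) with 0 < r < 1, every state in boundary(U, E, SP) has successors in both higher and lower value classes.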
Reduction of a value class. Given a set U of states such that U is δ-live, let Bnd(U) be the set of boundary probabilistic states for U. We denote by G_{Bnd(U)} the subgame G ↾ U in which every state in Bnd(U) is converted to an absorbing state (a state with only a self-loop). Since U is δ-live, G_{Bnd(U)} is a subgame. Given a value class VC(Φ, r), let Bnd(Φ, r) be the set of boundary probabilistic states in VC(Φ, r). We denote by G_{Bnd(Φ,r)} the subgame in which every boundary probabilistic state in Bnd(Φ, r) is converted to an absorbing state, and we write G_{Φ,r} = G_{Bnd(Φ,r)} ↾ VC(Φ, r); this is a subgame, since every value class is δ-live by Proposition 7, and it is δ-closed as all states in Bnd(Φ, r) are converted to absorbing states.
Lemma 15 (Almost-sure reduction) Let G be a 2½-player game graph and F ⊆ P(C) a Müller winning condition. Let Φ = Müller(F). For 0 < r < 1, the following assertions hold in the subgame G_{Φ,r}:

1. Player 1 wins almost-surely for the objective Φ ∪ Reach(Bnd(Φ, r)) from all states, i.e., Almost_1(Φ ∪ Reach(Bnd(Φ, r))) = VC(Φ, r).

2. Player 2 wins almost-surely for the objective Φ̄ ∪ Reach(Bnd(Φ, r)) from all states, i.e., Almost_2(Φ̄ ∪ Reach(Bnd(Φ, r))) = VC(Φ, r).
Proof. We prove the first part; the second part follows from a symmetric argument. The result is obtained by an argument by contradiction. Let 0 < r < 1, and let

q = max{Val_1(Φ)(t) | t ∈ E(s) \ VC(Φ, r), s ∈ VC(Φ, r) ∩ S_1};

that is, q is the maximum value of a successor state t of a player-1 state s ∈ VC(Φ, r) such that t is not in VC(Φ, r). By Proposition 7 we must have q < r. Hence if player 1 chooses to escape the value class VC(Φ, r), then player 1 gets to a state with value at most q < r. We consider the subgame G_{Φ,r}. Let U = VC(Φ, r) and Z = Bnd(Φ, r). Assume towards a contradiction that there exists a state s ∈ U such that s ∉ Almost_1(Φ ∪ Reach(Z)). Then we have s ∈ (U \ Z) and Val_2(Φ̄ ∩ Safe(U \ Z))(s) > 0. It follows from Theorem 5 (and also Corollary 5) that for all Müller objectives Ψ, if Val_2(Ψ)(s) > 0 for some state s, then for some state s_1 we have Val_2(Ψ)(s_1) = 1. Observe that in G_{Φ,r}
all states in Z are absorbing states, and hence the objective Φ̄ ∩ Safe(U \ Z) is equivalent to the objective Φ̄ ∩ coBüchi(U \ Z), which is a Müller objective. It follows that there exists a state s_1 ∈ (U \ Z) such that Val_2(Φ̄ ∩ Safe(U \ Z))(s_1) = 1. Hence there exists a strategy π̂ for player 2 in G_{Φ,r} such that for all strategies σ̂ for player 1 in G_{Φ,r} we have Pr_{s_1}^{σ̂,π̂}(Φ̄ ∩ Safe(U \ Z)) = 1. We now construct a strategy π* for player 2 as a combination of the strategy π̂ and a strategy in the original game G. By Martin's determinacy result (Theorem 16), for all ε > 0 there exists an ε-optimal strategy π_ε for player 2 in G such that for all s ∈ S and all strategies σ for player 1 we have

Pr_s^{σ,πε}(Φ̄) ≥ Val_2(Φ̄)(s) − ε.
Let r − q = α > 0, let ε = α/2, and consider an ε-optimal strategy π_ε for player 2 in G. The strategy π* in G is constructed as follows: for a history w that remains in U, player 2 follows π̂; if the history reaches (S \ U), then player 2 follows the strategy π_ε. Formally, for a history w = ⟨s_1, s_2, ..., s_k⟩ we have

π*(w) = π̂(w)                          if s_j ∈ U for all 1 ≤ j ≤ k;
π*(w) = π_ε(⟨s_j, s_{j+1}, ..., s_k⟩)   otherwise, where j = min{i | s_i ∉ U}.
We consider the case when the play starts at s_1. The strategy π* ensures the following: as long as the game stays in U, the strategy π̂ is followed, and given that the play stays in U, the strategy π̂ ensures with probability 1 that Φ̄ is satisfied and Bnd(Φ, r) is not reached. Hence if the game escapes U (i.e., player 1 chooses to escape U), then it reaches a state with value at most q for player 1. We consider an arbitrary strategy σ for player 1 and the following cases.
1. If Pr_{s_1}^{σ,π*}(Safe(U)) = 1, then we have Pr_{s_1}^{σ,π*}(Φ̄ ∩ Safe(U)) = Pr_{s_1}^{σ,π̂}(Φ̄ ∩ Safe(U)) = 1. Hence we also have Pr_{s_1}^{σ,π̂}(Φ̄) = 1, i.e., Pr_{s_1}^{σ,π*}(Φ) = 0.

2. If Pr_{s_1}^{σ,π*}(Reach(S \ U)) = 1, then the play reaches a state with value for player 1 at most q, and the strategy π_ε ensures that Pr_{s_1}^{σ,π*}(Φ) ≤ q + ε.

3. If Pr_{s_1}^{σ,π*}(Safe(U)) > 0 and Pr_{s_1}^{σ,π*}(Reach(S \ U)) > 0, then we condition on both events and obtain:

Pr_{s_1}^{σ,π*}(Φ) = Pr_{s_1}^{σ,π*}(Φ | Safe(U)) · Pr_{s_1}^{σ,π*}(Safe(U)) + Pr_{s_1}^{σ,π*}(Φ | Reach(S \ U)) · Pr_{s_1}^{σ,π*}(Reach(S \ U))
               ≤ 0 + (q + ε) · Pr_{s_1}^{σ,π*}(Reach(S \ U))
               ≤ q + ε.

The above inequalities are obtained as follows: given the event Safe(U), the strategy π* follows π̂ and ensures that Φ̄ is satisfied with probability 1 (i.e., Φ is satisfied with probability 0); otherwise the game reaches states where the value for player 1 is at most q, and then the analysis is similar to the previous case.

Hence for all strategies σ we have

Pr_{s_1}^{σ,π*}(Φ) ≤ q + ε = q + α/2 = r − α/2.

Hence we must have Val_1(Φ)(s_1) ≤ r − α/2. Since α > 0 and s_1 ∈ VC(Φ, r) (i.e., Val_1(Φ)(s_1) = r), we have a contradiction. The desired result follows.
Lemma 16 (Almost-sure to optimality) Let G be a 2½-player game graph and F ⊆ P(C) a Müller winning condition. Let Φ = Müller(F). Let σ be a strategy such that:

• σ is an almost-sure winning strategy from the almost-sure winning states (Almost_1(Φ) in G); and

• σ is an almost-sure winning strategy for the objective Φ ∪ Reach(Bnd(Φ, r)) in the game G_{Φ,r}, for all 0 < r < 1.

Then σ is an optimal strategy.
Proof. We prove the result for the case when σ is memoryless (randomized memoryless). When σ is a finite-memory strategy with memory M, the arguments can be repeated on the game G × M (the usual synchronous product of G and the memory M).
Consider the player-2 MDP G_σ with the objective Φ̄ (again a Müller objective) for player 2. In MDPs with Müller objectives randomized memoryless optimal strategies exist (Theorem 13). We fix a randomized memoryless optimal strategy π for player 2 in G_σ. Let W_1 = Almost_1(Φ) and W_2 = Almost_2(Φ̄). We consider the Markov chain G_{σ,π} and analyze the recurrent states of the Markov chain.
Recurrent states in G_{σ,π}. Let U be a closed, connected recurrent set in G_{σ,π} (i.e., U is a bottom strongly connected component in the graph of G_{σ,π}). Let q = max{r | VC(Φ, r) ∩ U ≠ ∅}, i.e., for all q′ > q we have VC(Φ, q′) ∩ U = ∅, or in other words VC(Φ, > q) ∩ U = ∅. For a state s ∈ U ∩ VC(Φ, q) we have the following cases.
1. If s ∈ S_1, then Supp(σ(s)) ⊆ VC(Φ, q). This is because in the game G_{Φ,q} the edges of player 1 consist of edges within the value class VC(Φ, q).

2. If s ∈ S_P and s ∈ Bnd(Φ, q), then it means that U ∩ VC(Φ, q′) ≠ ∅ for some q′ > q: this is because E(s) ∩ VC(Φ, > q) ≠ ∅ for s ∈ Bnd(Φ, q), and U is closed. This is not possible since by assumption on U we have U ∩ VC(Φ, > q) = ∅. Hence we have s ∈ S_P ∩ (U \ Bnd(Φ, q)), and E(s) ⊆ VC(Φ, q).

3. If s ∈ S_2, then since U ∩ VC(Φ, > q) = ∅, it follows by Proposition 7 that Supp(π(s)) ⊆ VC(Φ, q).
Hence for all s ∈ U ∩ VC(Φ, q), all successors of s in G_{σ,π} are in VC(Φ, q), and moreover U ∩ Bnd(Φ, q) = ∅, i.e., U is contained in a value class and does not intersect with the boundary probabilistic states. By the property of the strategy σ, if U ∩ (S \ W_2) ≠ ∅, then for all s ∈ U we have Pr^{σ,π}_s(Φ) = 1: this is because for all r > 0, the strategy σ is almost-sure winning for the objective Φ ∪ Reach(Bnd(Φ, r)) in G_{Φ,r}. Since σ is a fixed strategy
and π is optimal against σ, it follows that if Val_1(Φ)(s) < 1, then Pr^{σ,π}_s(Φ) < 1. Hence it follows that U ∩ (S \ (W_1 ∪ W_2)) = ∅. Hence the recurrent states of G_{σ,π} are contained in W_1 ∪ W_2, i.e., we have Pr^{σ,π}_s(Reach(W_1 ∪ W_2)) = 1. Since σ is an almost-sure winning strategy in W_1, we have Pr^{σ,π}_s(Φ̄) = Pr^{σ,π}_s(Reach(W_2)). Hence the strategy π maximizes the probability to reach W_2 in the MDP G_σ.
Analyzing reachability in G_σ. Since in G_σ player 2 maximizes the probability to reach W_2, we analyze the player-2 MDP G_σ with the objective Reach(W_2) for player 2. For every state s consider a real-valued variable x_s = 1 − Val_1(Φ)(s) = Val_2(Φ̄)(s). The following constraints are satisfied:

x_s = Σ_{t ∈ Supp(σ(s))} x_t · σ(s)(t)   for s ∈ S_1;
x_s = Σ_{t ∈ E(s)} x_t · δ(s)(t)   for s ∈ S_P;
x_s ≥ x_t   for s ∈ S_2 and t ∈ E(s);
x_s = 1   for s ∈ W_2.
The first equality follows since for all r ∈ [0, 1] and for all s ∈ S_1 ∩ VC(Φ, r) we have Supp(σ(s)) ⊆ VC(Φ, r). The next equality and the inequality follow from Proposition 7. Since the values of MDPs with reachability objectives are characterized as the least value vector satisfying the above constraints [FV97], it follows that for all s ∈ S and for all strategies π_1 ∈ Π we have

Pr^{σ,π_1}_s(Reach(W_2)) ≤ x_s = Val_2(Φ̄)(s).

Hence we have Pr^{σ,π_1}_s(Φ̄) ≤ Val_2(Φ̄)(s), i.e., Pr^{σ,π_1}_s(Φ) ≥ 1 − Val_2(Φ̄)(s) = Val_1(Φ)(s). Thus we obtain that σ is an optimal strategy.
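To make the least-fixed-point characterization concrete, the following is a minimal Python sketch (the data layout and function names are illustrative assumptions, not the dissertation's notation) that approximates the least solution of the above constraints by iterating the one-step operator from below; an exact rational solution can be obtained by linear programming as in [FV97]. Under the fixed strategy σ, player-1 states behave like probabilistic states, so only maximizing (player-2) and averaging states appear.

```python
def max_reach_values(states, target, succ, kind, prob, iters=1000):
    """kind[s] in {'max', 'avg'}: 'max' = player-2 (maximizing) state,
    'avg' = probabilistic state (or a player-1 state fixed by sigma)."""
    x = {s: (1.0 if s in target else 0.0) for s in states}
    for _ in range(iters):            # converges upward to the least fixed point
        for s in states:
            if s in target:
                continue
            if kind[s] == 'max':      # player 2 picks the best successor
                x[s] = max(x[t] for t in succ[s])
            else:                     # weighted average over successors
                x[s] = sum(prob[s][t] * x[t] for t in succ[s])
    return x

# Tiny illustrative MDP: from a, player 2 chooses b or the sink c; b reaches
# the target w2 with probability 1/2 and falls to c with probability 1/2.
states = ['a', 'b', 'c', 'w2']
succ = {'a': ['b', 'c'], 'b': ['w2', 'c'], 'c': ['c'], 'w2': ['w2']}
kind = {'a': 'max', 'b': 'avg', 'c': 'max', 'w2': 'max'}
prob = {'b': {'w2': 0.5, 'c': 0.5}}
print(max_reach_values(states, {'w2'}, succ, kind, prob))   # x_a = x_b = 0.5
```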
Müller reduction for G_{Φ,r}. Given a Müller winning condition F and the objective Φ = Müller(F), we consider the game G_{Φ,r} with the objective Φ ∪ Reach(Bnd(Φ, r)) for player 1. We present a simple reduction to a game with objective Φ. The reduction is achieved as follows: without loss of generality we assume F ≠ ∅, and let F ∈ F with F = {c^F_1, c^F_2, …, c^F_f}. We construct a game graph G̃_{Φ,r} with objective Φ for player 1 as follows: convert every state s_j ∈ Bnd(Φ, r) to a cycle U_j = {s_{j,1}, s_{j,2}, …, s_{j,f}} with χ(s_{j,i}) = c^F_i, i.e., once s_j is reached the cycle U_j is repeated forever, with χ(U_j) = F ∈ F. An almost-sure winning strategy in G_{Φ,r} with objective Φ ∪ Reach(Bnd(Φ, r)) is an almost-sure winning strategy in G̃_{Φ,r} with objective Φ, and vice versa. The present reduction along with Lemma 15 and Lemma 16 gives us Lemma 17. Observe that Lemma 15 ensures that strategies satisfying the conditions of Lemma 16 exist. Lemma 17 along with Theorem 19 gives Theorem 22, which generalizes Theorem 18.
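The rewriting of boundary states into cycles can be sketched as follows (a hedged Python illustration; the graph representation and function name are assumptions, not the dissertation's):

```python
def boundary_to_cycle(edges, color, boundary, F_colors):
    """edges: dict state -> set of successors; color: dict state -> color.
    Every state in `boundary` is replaced by a cycle whose set of colors is
    exactly F_colors = [c^F_1, ..., c^F_f], so being trapped in the cycle
    satisfies Müller(F)."""
    f = len(F_colors)
    redirect = {s: (s, 0) for s in boundary}     # s_j is replaced by (s_j, 0)
    new_edges = {u: {redirect.get(t, t) for t in ts}
                 for u, ts in edges.items() if u not in boundary}
    new_color = {u: c for u, c in color.items() if u not in boundary}
    for s in boundary:
        for i in range(f):
            new_color[(s, i)] = F_colors[i]
            new_edges[(s, i)] = {(s, (i + 1) % f)}   # repeat the cycle forever
    return new_edges, new_color
```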
Lemma 17 For all Müller winning conditions F, the following assertions hold.

1. If the family of pure finite-memory strategies of size ℓ^P_F suffices for almost-sure winning on 2 1/2-player game graphs for Müller(F), then the family of pure finite-memory strategies of size ℓ^P_F suffices for optimality on 2 1/2-player game graphs for Müller(F).

2. If the family of randomized finite-memory strategies of size ℓ^R_F suffices for almost-sure winning on 2 1/2-player game graphs for Müller(F), then the family of randomized finite-memory strategies of size ℓ^R_F suffices for optimality on 2 1/2-player game graphs for Müller(F).

Theorem 22 For all Müller winning conditions F, the family of pure finite-memory strategies of size m_F suffices for optimality on 2 1/2-player game graphs for the Müller objective Müller(F).
4.4.1 Complexity of quantitative analysis

In this section we consider the complexity of quantitative analysis of 2 1/2-player games with Müller objectives. We first prove some properties of the values of 2 1/2-player games with Müller objectives. We start with a lemma.
Lemma 18 For all 2 1/2-player game graphs, for all Müller objectives Φ, there exist optimal strategies σ and π for player 1 and player 2 such that the following assertions hold:

1. for all r ∈ (0, 1), for all s ∈ VC(Φ, r) we have Pr^{σ,π}_s(Reach(Bnd(Φ, r))) = 1;

2. for all s ∈ S we have

Pr^{σ,π}_s(Reach(W_1 ∪ W_2)) = 1;
Pr^{σ,π}_s(Reach(W_1)) = Val_1(Φ)(s);
Pr^{σ,π}_s(Reach(W_2)) = Val_2(Φ̄)(s);

where W_1 = Almost_1(Φ) and W_2 = Almost_2(Φ̄).
Proof. Consider an optimal strategy σ that satisfies the conditions of Lemma 16, and a strategy π that satisfies the analogous conditions for player 2. For all r ∈ (0, 1), the strategy σ is almost-sure winning for the objective Φ ∪ Reach(Bnd(Φ, r)) and the strategy π is almost-sure winning for the objective Φ̄ ∪ Reach(Bnd(Φ, r)), in the game G_{Φ,r}. Thus we obtain that for all r ∈ (0, 1), for all s ∈ VC(Φ, r) we have

Pr^{σ,π}_s(Φ ∪ Reach(Bnd(Φ, r))) = 1  and  Pr^{σ,π}_s(Φ̄ ∪ Reach(Bnd(Φ, r))) = 1.

It follows that for all r ∈ (0, 1), for all s ∈ VC(Φ, r) we have

Pr^{σ,π}_s(Reach(Bnd(Φ, r))) = 1.

From the above condition it easily follows that for all s ∈ S we have Pr^{σ,π}_s(Reach(W_1 ∪ W_2)) = 1. Since σ and π are optimal strategies, all the requirements of the second condition are fulfilled. Hence, the strategies σ and π are witness strategies to prove the desired result.
Characterizing values for 2 1/2-player Müller games. We now relate the values of 2 1/2-player game graphs with Müller objectives to the values of a Markov chain, on the same state space, with reachability objectives. Once the relationship is established we obtain a bound on the precision of the values. We use Lemma 18 to present two transformations to Markov chains.
Markov chain transformation. Given a 2 1/2-player game graph G = ((S, E), (S_1, S_2, S_P), δ) with a Müller objective Φ, let W_1 = Almost_1(Φ) and W_2 = Almost_2(Φ̄) be the sets of almost-sure winning states for the players. Let σ and π be optimal strategies for the players (obtained from Lemma 18) such that

1. for all r ∈ (0, 1), for all s ∈ VC(Φ, r) we have Pr^{σ,π}_s(Reach(Bnd(Φ, r))) = 1;

2. for all s ∈ S we have

Pr^{σ,π}_s(Reach(W_1 ∪ W_2)) = 1;
Pr^{σ,π}_s(Reach(W_1)) = Val_1(Φ)(s);
Pr^{σ,π}_s(Reach(W_2)) = Val_2(Φ̄)(s).
We first consider a Markov chain that mimics the stochastic process under σ and π. The Markov chain G̃ = (S, δ̃) = MC1(G, Φ) with the transition function δ̃ is defined as follows:

1. for s ∈ W_1 ∪ W_2 we have δ̃(s)(s) = 1;

2. for r ∈ (0, 1) and s ∈ VC(Φ, r) \ Bnd(Φ, r) we have δ̃(s)(t) = Pr^{σ,π}_s(Reach({t})) for t ∈ Bnd(Φ, r) (since for all s ∈ VC(Φ, r) we have Pr^{σ,π}_s(Reach(Bnd(Φ, r))) = 1, the transition function δ̃ at s is a probability distribution); and

3. for r ∈ (0, 1) and s ∈ Bnd(Φ, r) we have δ̃(s)(t) = δ(s)(t) for t ∈ S.

The Markov chain G̃ mimics the stochastic process under σ and π and yields the following lemma.
Lemma 19 For all 2 1/2-player game graphs G and all Müller objectives Φ, consider the Markov chain G̃ = MC1(G, Φ). Then for all s ∈ S we have Val_1(Φ)(s) = Pr_s(Reach(W_1)), that is, the value for Φ in G is equal to the probability to reach W_1 in the Markov chain G̃.
Second transformation. We now transform the Markov chain G̃ to another Markov chain Ĝ. We start with the observation that for r ∈ (0, 1), for all states s, t ∈ Bnd(Φ, r) in the Markov chain G̃ we have Pr_s(Reach(W_1)) = Pr_t(Reach(W_1)) = r. Moreover, for r ∈ (0, 1), every state s ∈ Bnd(Φ, r) has edges to higher and lower value classes. Hence for a state s ∈ VC(Φ, r) \ Bnd(Φ, r), if we choose a state t_r ∈ Bnd(Φ, r) and set the transition probability from s to t_r to 1, the probability to reach W_1 does not change. This motivates the following transformation: given a 2 1/2-player game graph G = ((S, E), (S_1, S_2, S_P), δ) with a Müller objective Φ, let W_1 = Almost_1(Φ) and W_2 = Almost_2(Φ̄) be the sets of almost-sure winning states for the players. The Markov chain Ĝ = (S, δ̂) = MC2(G, Φ) with the transition function δ̂ is defined as follows:

1. for s ∈ W_1 ∪ W_2 we have δ̂(s)(s) = 1;

2. for r ∈ (0, 1) and s ∈ VC(Φ, r) \ Bnd(Φ, r), pick t ∈ Bnd(Φ, r) and set δ̂(s)(t) = 1; and

3. for r ∈ (0, 1) and s ∈ Bnd(Φ, r) we have δ̂(s)(t) = δ(s)(t) for t ∈ S.
Observe that for δ_{>0} = {δ(s)(t) | s ∈ S_P, t ∈ S, δ(s)(t) > 0} and δ̂_{>0} = {δ̂(s)(t) | s ∈ S, t ∈ S, δ̂(s)(t) > 0}, we have δ̂_{>0} ⊆ δ_{>0} ∪ {1}, i.e., the transition probabilities in Ĝ are a subset of the transition probabilities in G (together with 1). Let

δ_u = max{q | δ(s)(t) = p/q for s ∈ S_P and δ(s)(t) > 0};
δ̂_u = max{q | δ̂(s)(t) = p/q for s ∈ S and δ̂(s)(t) > 0}.

Since δ̂_{>0} ⊆ δ_{>0} ∪ {1}, it follows that δ̂_u ≤ δ_u. The following lemma is immediate from Lemma 19 and the equivalence of the probabilities to reach W_1 in G̃ and Ĝ.
Lemma 20 For all 2 1/2-player game graphs G and all Müller objectives Φ, consider the Markov chain Ĝ = MC2(G, Φ). Then for all s ∈ S we have Val_1(Φ)(s) = Pr_s(Reach(W_1)), that is, the value for Φ in G is equal to the probability to reach W_1 in the Markov chain Ĝ.
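As an illustration, the transformation MC2 can be sketched in a few lines (hedged Python; the data layout is an assumption, not the dissertation's code):

```python
def mc2(states, delta, value, w1, w2, boundary):
    """delta: dict s -> {t: prob} for probabilistic states; value: dict
    s -> Val_1(Φ)(s); boundary: the union of the sets Bnd(Φ, r)."""
    hat = {}
    for s in states:
        if s in w1 or s in w2:
            hat[s] = {s: 1.0}                    # absorbing
        elif s in boundary:
            hat[s] = dict(delta[s])              # keep the original distribution
        else:
            t = next(u for u in boundary if value[u] == value[s])
            hat[s] = {t: 1.0}                    # jump within the value class
    return hat
```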
Lemma 21 is a result from [Con93] (Lemma 2 of [Con93]).

Lemma 21 ([Con93]) Let G = ((S, E), (S_1, S_2, S_P), δ) be a 2 1/2-player game graph with n states such that every state has at most two successors and for all s ∈ S_P and t ∈ E(s) we have δ(s)(t) = 1/2. Then for all R ⊆ S, for all s ∈ S we have

Val_1(Reach(R))(s) = p/q, where p, q are integers with p, q ≤ 4^{n−1}.

The results of [ZP96] showed that a 2 1/2-player game graph G = ((S, E), (S_1, S_2, S_P), δ) can be reduced to an equivalent 2 1/2-player game graph G̃ = ((S̃, Ẽ), (S̃_1, S̃_2, S̃_P), δ̃) such that every state s̃ ∈ S̃ has at most two successors, for all s̃ ∈ S̃_P and t̃ ∈ Ẽ(s̃) we have δ̃(s̃)(t̃) = 1/2, and |S̃| = 2 · |E| · log δ_u. Lemma 22 follows from this reduction and Lemma 21.
Lemma 22 ([ZP96]) Let G = ((S, E), (S_1, S_2, S_P), δ) be a 2 1/2-player game graph. Then for all R ⊆ S, for all s ∈ S we have

Val_1(Reach(R))(s) = p/q, where p, q are integers with p, q ≤ 4^{2·|E|·log δ_u} = δ_u^{4·|E|}.
Lemma 23 For all 2 1/2-player game graphs G = ((S, E), (S_1, S_2, S_P), δ) and all Müller objectives Φ, for all states s ∈ S \ (W_1 ∪ W_2) we have

Val_1(Φ)(s) = p/q, where p, q are integers with 0 < p < q ≤ δ_u^{4·|E|},

where W_1 and W_2 are the almost-sure winning states for player 1 and player 2, respectively.
Proof. Lemma 20 shows that the values of the game G can be related to the values of reaching a set of states in a Markov chain Ĝ defined on the same state space, and moreover δ̂_u ≤ δ_u. The bound then follows from Lemma 22 and the fact that Markov chains are a subclass of 2 1/2-player games.
Lemma 24 Let G = ((S, E), (S_1, S_2, S_P), δ) be a 2 1/2-player game with a Müller objective Φ. Let P = (V_0, V_1, V_2, …, V_k) be a partition of the state space S, and let r_0 > r_1 > r_2 > … > r_k be rational values such that the following conditions hold:

1. V_0 = Almost_1(Φ) and V_k = Almost_2(Φ̄);

2. r_0 = 1 and r_k = 0;

3. for all 1 ≤ i ≤ k − 1 we have Bnd(V_i) ≠ ∅ and V_i is δ-live;

4. for all 1 ≤ i ≤ k − 1 and all s ∈ S_2 ∩ V_i we have E(s) ⊆ ∪_{j≤i} V_j;

5. for all 1 ≤ i ≤ k − 1 we have V_i = Almost_1(Φ ∪ Reach(Bnd(V_i))) in G_{Bnd(V_i)};

6. let x_s = r_i for s ∈ V_i, and for all s ∈ S_P let x_s satisfy x_s = Σ_{t∈E(s)} x_t · δ(s)(t).

Then we have Val_1(Φ)(s) ≥ x_s for all s ∈ S.
Proof. Let σ be a finite-memory strategy with memory M such that (a) σ is almost-sure winning from V_0; and (b) for all 1 ≤ i ≤ k − 1, all s ∈ V_i, and all strategies π for player 2 in G_{Bnd(V_i)} we have Pr^{σ,π}_s(Φ ∪ Reach(Bnd(V_i))) = 1; such a strategy exists since condition 1 (V_0 = Almost_1(Φ)) and condition 5 are satisfied. Let π be a finite-memory counter-optimal strategy for player 2 in G_σ, i.e., π is optimal for player 2 for the objective Φ̄ in G_σ. We claim that for all 1 ≤ i ≤ k − 1 and for all s ∈ V_i we have Pr^{σ,π}_s(Reach(Bnd(V_i) ∪ ∪_{j<i} V_j)) = 1. To prove the claim, assume towards contradiction that for some 1 ≤ i ≤ k − 1 and s ∈ V_i we have Pr^{σ,π}_s(Reach(Bnd(V_i) ∪ ∪_{j<i} V_j)) < 1. Then, since condition 4 holds, we would have Pr^{σ,π}_s(Safe(V_i \ Bnd(V_i))) > 0. If Pr^{σ,π}_s(Safe(V_i \ Bnd(V_i))) > 0, then there must be a closed connected recurrent set C in G_{σ,π} such that C is contained in (V_i \ Bnd(V_i)) × M. Hence for states s̃ ∈ C we would have Pr^{σ,π}_{s̃}(Φ) = 1; this holds since we have Pr^{σ,π}_{s̃}(Φ ∪ Reach(Bnd(V_i))) = 1. This contradicts the facts that π is counter-optimal and V_i ∩ Almost_1(Φ) = ∅. Thus we obtain that for all 1 ≤ i ≤ k − 1 and
all s ∈ V_i we have Pr^{σ,π}_s(Reach(Bnd(V_i) ∪ ∪_{j<i} V_j)) = 1. It follows that for all s ∈ S we have Pr^{σ,π}_s(Reach(V_0 ∪ V_k)) = 1. By the ordering r_0 > r_1 > r_2 > … > r_k, condition 4, and condition 6, it follows that for all s ∈ S we have Pr^{σ,π}_s(Reach(V_k)) ≤ 1 − x_s; this follows by the analysis of the MDP G_σ with the reachability objective Reach(V_k) for player 2. Hence we have Pr^{σ,π}_s(Reach(V_0)) ≥ x_s. Since σ is almost-sure winning from V_0, we obtain that for all s ∈ S we have Val_1(Φ)(s) ≥ x_s. The desired result follows.
A PSPACE algorithm for quantitative analysis. We now present a PSPACE algorithm for the quantitative analysis of 2 1/2-player games with Müller objectives Müller(F). A PSPACE lower bound is already known for the qualitative analysis of 2-player games with Müller objectives [HD05]. To obtain an upper bound we present an NPSPACE algorithm. The algorithm is based on Lemma 24. Given a 2 1/2-player game G = ((S, E), (S_1, S_2, S_P), δ) with a Müller objective Φ, a state s, and a rational number r, the following assertion holds: if Val_1(Φ)(s) ≥ r, then there exists a partition P = (V_0, V_1, V_2, …, V_k) of S and rational values r_0 > r_1 > r_2 > … > r_k with r_i = p_i/q_i and p_i, q_i ≤ δ_u^{4·|E|}, such that the conditions of Lemma 24 are satisfied and s ∈ V_i with r_i ≥ r. The witness P is the value-class partition, and the rational values represent the values of the value classes. From the above observation we obtain the algorithm for quantitative analysis as follows: given a 2 1/2-player game graph G = ((S, E), (S_1, S_2, S_P), δ) with a Müller objective Φ, a state s, and a rational r, to verify that Val_1(Φ)(s) ≥ r the algorithm guesses a partition P = (V_0, V_1, V_2, …, V_k) of S and rational values r_0 > r_1 > r_2 > … > r_k with r_i = p_i/q_i and p_i, q_i ≤ δ_u^{4·|E|}, and then verifies that all the conditions of Lemma 24 are satisfied and that s ∈ V_i with r_i ≥ r. Observe that since the guesses of the rational values can be made with O(|G| · |S| · |E|) bits, the guess is polynomial in the size of the game. Conditions 1 and 5 of Lemma 24 can be verified in PSPACE by the PSPACE qualitative algorithms (see Theorem 21), and all the other conditions can be checked in polynomial time. Since NPSPACE = PSPACE, we obtain a PSPACE upper bound for the quantitative analysis of 2 1/2-player games with Müller
objectives. This improves the previous 3EXPTIME bound for the quantitative analysis of 2 1/2-player games with Müller objectives (Corollary 3).

Theorem 23 Given a 2 1/2-player game G, a Müller objective Φ, a state s, and a rational r in binary, it is PSPACE-complete to decide whether Val_1(Φ)(s) ≥ r.
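A hedged sketch of the verifier follows (illustrative Python; the data layout and the oracle `qualitative_ok` are assumptions). Conditions 2, 4, and 6 are checked directly in polynomial time, while the qualitative conditions 1, 3, and 5 are delegated to the assumed oracle, which stands for the PSPACE algorithms of Theorem 21.

```python
from fractions import Fraction

def check_witness(parts, S2, SP, succ, delta, qualitative_ok):
    """parts: list of (V_i, r_i) with V_i sets of states, r_i Fractions."""
    rs = [r for _, r in parts]
    if rs[0] != 1 or rs[-1] != 0:                       # condition 2
        return False
    if any(rs[i] <= rs[i + 1] for i in range(len(rs) - 1)):
        return False                                    # values strictly decrease
    cls = {s: i for i, (V, _) in enumerate(parts) for s in V}
    for i in range(1, len(parts) - 1):
        V = parts[i][0]
        if any(cls[t] > i for s in V & S2 for t in succ[s]):
            return False                                # condition 4
        if not qualitative_ok(i):                       # conditions 1, 3, 5 (oracle)
            return False
    x = {s: rs[cls[s]] for s in cls}                    # condition 6 below
    return all(x[s] == sum(Fraction(delta[s][t]) * x[t] for t in succ[s])
               for s in SP)
```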
4.4.2 The complexity of union-closed and upward-closed objectives

We now consider two special classes of Müller objectives, namely union-closed and upward-closed objectives. We will show that the quantitative analysis of both these classes of objectives in 2 1/2-player games under succinct representation is coNP-complete. We first define these conditions.
1. Union-closed and basis conditions. A Müller winning condition F ⊆ P(C) is union-closed if for all I, J ∈ F we have I ∪ J ∈ F. A basis condition B ⊆ P(C) specifies the winning condition F = {I ⊆ C | ∃B_1, B_2, …, B_k ∈ B. ∪_{1≤i≤k} B_i = I}. A Müller winning condition F can be specified as a basis condition only if F is union-closed.

2. Upward-closed and superset conditions. A Müller winning condition F ⊆ P(C) is upward-closed if for all I ∈ F and I ⊆ J ⊆ C we have J ∈ F. A superset condition U ⊆ P(C) specifies the winning condition F = {I ⊆ C | J ⊆ I for some J ∈ U}. A Müller winning condition F can be specified as a superset condition only if F is upward-closed. Any upward-closed condition is also union-closed.
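For illustration, the following hedged Python sketch (toy representation, not the dissertation's code) expands a basis condition and a superset condition into the explicit winning condition F:

```python
from itertools import chain, combinations

def powerset(C):
    C = list(C)
    return [frozenset(x) for x in
            chain.from_iterable(combinations(C, n) for n in range(len(C) + 1))]

def from_basis(B, C):
    # I is winning iff I is exactly a union of basis elements
    return {I for I in powerset(C) if I and
            I == frozenset().union(*[b for b in B if b <= I])}

def from_superset(U, C):
    # I is winning iff I contains some J in U
    return {I for I in powerset(C) if any(J <= I for J in U)}

C = {1, 2, 3}
B = [frozenset({1}), frozenset({2, 3})]
print(sorted(map(sorted, from_basis(B, C))))      # {1}, {2,3}, {1,2,3}
print(sorted(map(sorted, from_superset(B, C))))   # all supersets of {1} or {2,3}
```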
The results of [HD05] showed that basis and superset conditions are more succinct ways to represent union-closed and upward-closed conditions, respectively, than the explicit representation. The following proposition was also shown in [HD05] (see [HD05] for the formal description of the notions of succinctness and translatability).
Proposition 8 ([HD05]) A superset condition is polynomially translatable to an equivalent basis condition.
Strategy complexity for union-closed conditions. We observe that for a union-closed objective F, the Zielonka tree construction ensures that m_F = 1. Then from Theorem 22 we obtain that for objectives Müller(F), pure memoryless optimal strategies exist in 2 1/2-player game graphs, for union-closed conditions F.

Proposition 9 For all union-closed winning conditions F we have m_F = 1; and pure memoryless optimal strategies exist for the objective Müller(F) on all 2 1/2-player game graphs.
Complexity of basis and superset conditions. The results of [HD05] established that deciding the winner in 2-player games (that is, the qualitative analysis of 2-player game graphs) with union-closed and upward-closed conditions specified as basis and superset conditions is coNP-complete. The lower bound for the special case of 2-player games yields a coNP lower bound for the quantitative analysis of 2 1/2-player games with union-closed and upward-closed conditions specified as basis and superset conditions. We will prove a matching upper bound. We prove the upper bound for basis conditions, and by Proposition 8 the result also follows for superset conditions.
The upper bound for basis games. We present a coNP upper bound for the quantitative analysis of basis games. Given a 2 1/2-player game graph and a Müller objective Φ = Müller(F), where F is union-closed and specified as a basis condition defined by B, let s be a state and r a rational given in binary. The problem of whether Val_1(Φ)(s) ≥ r can be decided in coNP. We present a polynomial witness and a polynomial-time verification procedure for the case when the answer to the problem is "NO". Since F is union-closed, it follows from Proposition 9 that a pure memoryless optimal strategy π exists for player 2. The pure memoryless optimal strategy is the polynomial witness to the problem, and once π is fixed
we obtain a 1 1/2-player game graph G_π. To present a polynomial-time verification procedure, we present a polynomial-time algorithm to compute values in an MDP (or 1 1/2-player game) with a basis condition B.

Polynomial-time algorithm for MDPs with basis conditions. Given a 1 1/2-player game graph G, let E be the set of end components. Consider a basis condition B = {B_1, B_2, …, B_k} ⊆ P(C), and let F be the union-closed condition generated from B. The set of winning end components is U = E ∩ {F ⊆ S | χ(F) ∈ F}, and let T_end = ∪_{U∈U} U. It follows from the results of Subsection 4.1 that the value function in G can be computed by computing the maximal probability to reach T_end. Once the set T_end is computed, the value function for reachability objectives in 1 1/2-player game graphs can be computed in polynomial time by linear programming (Theorem 11). To complete the proof we present a polynomial-time algorithm to compute T_end.
Computing winning end components. The algorithm is as follows. Let B be the basis for the winning condition and G be the 1 1/2-player game graph. Initialize B_0 = B and repeat the following:

1. let X_i = ∪_{B∈B_i} χ^{-1}(B);

2. partition the set X_i into maximal end components MaxEC(X_i);

3. remove an element B of B_i such that χ^{-1}(B) is not wholly contained in a maximal end component, to obtain B_{i+1};

until B_i = B_{i−1}. When B_i = B_{i−1}, let X = X_i; then every maximal end component of X is a union of basis elements (every state of X belongs to χ^{-1}(B) for some basis element B ∈ B_i, and a basis element not contained in any maximal end component of X is removed in step 3). Moreover, any maximal end component of G which is a union of basis elements is a subset of a maximal end component of X, since the algorithm preserves such sets.
Hence we have X = T_end. The algorithm requires at most |B| iterations, and each iteration requires the decomposition of a 1 1/2-player game graph into its maximal end components, which can be achieved in O(|S| · |E|) time (see [dA97]). Hence the algorithm works in O(|B| · |S| · |E|) time.
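The iterative computation can be sketched as follows (hedged Python; networkx is used for strongly connected components, and the simplified maximal-end-component routine is an illustrative stand-in for the algorithm of [dA97]):

```python
import networkx as nx

def max_end_components(X, succ, SP):
    """Simplified MEC decomposition: an end component must keep all successors
    of a probabilistic state and at least one successor of a player-1 state."""
    work, mecs = [set(X)], []
    while work:
        Y = work.pop()
        g = nx.DiGraph()
        g.add_nodes_from(Y)
        g.add_edges_from((u, v) for u in Y for v in succ[u] if v in Y)
        for scc in nx.strongly_connected_components(g):
            bad = {u for u in scc
                   if (u in SP and not set(succ[u]) <= scc)
                   or not set(succ[u]) & scc}
            if not bad and (len(scc) > 1 or any(u in succ[u] for u in scc)):
                mecs.append(set(scc))
            elif bad and scc - bad:
                work.append(scc - bad)          # prune and re-decompose
    return mecs

def compute_t_end(states, succ, SP, color, basis):
    Bi = list(basis)                            # basis elements: sets of colors
    while True:
        X = {s for s in states if any(color[s] in B for B in Bi)}
        mecs = max_end_components(X, succ, SP)
        drop = next((B for B in Bi
                     if not any({s for s in X if color[s] in B} <= M
                                for M in mecs)), None)
        if drop is None:
            return X                            # X = T_end at the fixed point
        Bi.remove(drop)
```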
This completes the proof and yields the following result.

Theorem 24 Given a 2 1/2-player game graph and a Müller objective Φ = Müller(F), where F is a union-closed condition specified as a basis condition defined by B or an upward-closed condition specified as a superset condition U, a state s, and a rational r given in binary, it is coNP-complete to decide whether Val_1(Φ)(s) ≥ r.
4.5 An Improved Bound for Randomized Strategies

We now show that if a player plays randomized strategies, then the upper bound on memory for optimal strategies can be improved. We first present the notion of an upward-closed restriction of a Zielonka tree. The number m^U_F obtained from such a restricted tree is in general smaller than the number m_F obtained from the Zielonka tree, and we show that randomized strategies with memory of size m^U_F suffice for optimality.
Upward-closed restriction of the Zielonka tree. The upward-closed restriction of a Zielonka tree for a Müller winning condition F ⊆ P(C), denoted Z^U_{F,C}, is obtained by making upward-closed conditions leaves. Formally, we define Z^U_{F,C} inductively as follows:

1. if F is upward-closed, then Z^U_{F,C} is a leaf labeled F (i.e., it has no subtrees);

2. otherwise,

(a) if C ∉ F, then Z^U_{F,C} = Z^U_{F̄,C}, where F̄ = P(C) \ F;

(b) if C ∈ F, then the root of Z^U_{F,C} is labeled with C; and let C_0, C_1, …, C_{k−1} be all the maximal sets in {X ∉ F | X ⊆ C}; then we attach to the root, as its subtrees, the upward-closed restricted Zielonka trees of F ↾ C_i, i.e., Z^U_{F↾C_i, C_i}, for i = 0, 1, …, k − 1.

The number m^U_F for Z^U_{F,C} is defined exactly as the number m_F was defined for the tree Z_{F,C}.
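A hedged recursive sketch of the construction (illustrative Python, exponential in |C|, with F represented as a set of frozensets of colors):

```python
from itertools import chain, combinations

def subsets(C):
    C = list(C)
    return [frozenset(x) for x in
            chain.from_iterable(combinations(C, n) for n in range(len(C) + 1))]

def zielonka_upward(F, C):
    """Returns the tree Z^U_{F,C} as a pair (label, list of subtrees)."""
    C = frozenset(C)
    F = {I for I in F if I <= C}                     # restrict F to C
    if all(J in F for I in F for J in subsets(C) if I <= J):
        return (F, [])                               # rule 1: upward-closed leaf
    if C not in F:                                   # rule 2(a): complement
        return zielonka_upward({I for I in subsets(C) if I not in F}, C)
    non_winning = [X for X in subsets(C) if X not in F]
    maximal = [X for X in non_winning
               if not any(X < Y for Y in non_winning)]
    return (C, [zielonka_upward(F, Ci) for Ci in maximal])  # rule 2(b)

# F = {C} is upward-closed, so the restricted tree is a single leaf.
print(zielonka_upward({frozenset({1, 2, 3})}, {1, 2, 3}))
```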
We will prove that randomized strategies of size m^U_F suffice for optimality. To prove this result, we first prove that randomized strategies of size m^U_F suffice for almost-sure winning. The result then follows from Lemma 17. To prove the result for almost-sure winning we take a closer look at the proof of Theorem 19. The inductive proof shows that if the existence of randomized memoryless almost-sure winning strategies can be proved for 2 1/2-player games with the Müller winning conditions that appear at the leaves of the Zielonka tree, then the induction generalizes to give a bound as in Theorem 19. Hence to prove an upper bound of size m^U_F for almost-sure winning, it suffices to show that randomized memoryless strategies suffice for upward-closed Müller winning conditions. Lemma 25 proves this result and gives us Theorem 25.
Lemma 25 The family Σ^UM of randomized uniform memoryless strategies suffices for almost-sure winning with respect to upward-closed objectives on 2 1/2-player game graphs.
Proof. Consider a 2 1/2-player game graph G and the game (G, C, χ, F) with an upward-closed objective Φ = Müller(F) for player 1, i.e., F is upward-closed. Let W_1 = Almost_1(Φ) be the set of almost-sure winning states for player 1 in G. We have S \ W_1 = Positive_2(Φ̄), and hence any almost-sure winning strategy for player 1 ensures that from W_1 the set S \ W_1 is not reached with positive probability. Hence we need only consider strategies σ for player 1 such that for all w ∈ W_1^* and s ∈ W_1 we have Supp(σ(w · s)) ⊆ W_1. Consider the randomized memoryless strategy σ for player 1 that at a state s ∈ W_1 chooses all successors in W_1 uniformly at random. Observe that for a state s ∈ (S_2 ∪ S_P) ∩ W_1 we have E(s) ⊆ W_1; otherwise s would not have been in W_1. Consider the MDP G_σ ↾ W_1. Since it is
a player-2 MDP with the Müller objective Φ̄ for player 2, and randomized memoryless optimal strategies exist in MDPs (Theorem 13), we fix a memoryless counter-optimal strategy π for player 2 in G_σ ↾ W_1. Now consider the player-1 MDP G_π ↾ W_1, and consider a memoryless strategy σ′ in G_π ↾ W_1. We first present an observation: since the strategy σ chooses all successors in W_1 uniformly at random and for all s ∈ W_1 ∩ S_1 we have Supp(σ′(s)) ⊆ Supp(σ(s)), it follows that for every closed recurrent set U′ in the Markov chain G_{σ′,π} ↾ W_1 there is a closed recurrent set U in the Markov chain G_{σ,π} ↾ W_1 with U′ ⊆ U. We now prove that σ is an almost-sure winning strategy by showing that every closed recurrent set of states U in G_{σ,π} ↾ W_1 is winning for player 1, i.e., χ(U) ∈ F. Assume towards contradiction that there is a closed recurrent set U in G_{σ,π} ↾ W_1 with χ(U) ∉ F. Consider the player-1 MDP G_π ↾ W_1. Since randomized memoryless optimal strategies exist in MDPs (Theorem 13), we fix a memoryless counter-optimal strategy σ′ for player 1. By the observation above, for any closed recurrent set U′ in G_{σ′,π} such that U′ ∩ U ≠ ∅ we have U′ ⊆ U; moreover, χ(U′) ⊆ χ(U) and χ(U′) ∉ F, since F is upward-closed and χ(U) ∉ F. It then follows that player 2 wins with probability 1 from a non-empty set U′ (a closed recurrent set U′ ⊆ U) of states in the Markov chain G_{σ′,π}. Since π is a fixed strategy for player 2 and the strategy σ′ is counter-optimal for player 1, this contradicts that U′ ⊆ U ⊆ Almost_1(Φ). It follows that every closed recurrent set U in G_{σ,π} ↾ W_1 is winning for player 1, and the result follows.
Theorem 25 For all Müller winning conditions F, the family of randomized finite-memory strategies of size m^U_F suffices for optimality on 2 1/2-player game graphs for Müller(F).
Remark. In general m^U_F can be smaller than m_F. Consider for example C = {c_1, c_2, …, c_k} and the Müller winning condition F = {C} ⊆ P(C). We have m^U_F = 1, whereas m_F = |C|.
4.6 Conclusion
In this chapter we presented optimal memory bounds for pure almost-sure, positive, and optimal strategies for 2 1/2-player games with Müller winning conditions. We also presented improved memory bounds for randomized strategies. Unlike the results of [DJW97], our results do not extend to infinite-state games: for example, the results of [EY05] showed that even for 2 1/2-player pushdown games optimal strategies need not exist, and for ε > 0 even ε-optimal strategies may require infinite memory. For lower bounds on randomized strategies the constructions of [DJW97] do not work: in fact, for the family of games used for the lower bounds in [DJW97], randomized memoryless almost-sure winning strategies exist. It is known that there exist Müller winning conditions F ⊆ P(C) such that randomized almost-sure winning strategies may require memory |C|! [Maj03]. However, whether a matching lower bound of size m^U_F can be proved in general, or whether the upper bound of m^U_F can be improved and a matching lower bound proved for randomized strategies with memory, remains open.
Chapter 5

Stochastic Rabin and Streett Games
In this chapter we consider 2 1/2-player games with two canonical forms of Müller objectives, namely the Rabin and Streett objectives.¹ We will prove that the quantitative analysis of 2 1/2-player games with Rabin and Streett objectives is NP-complete and coNP-complete, respectively. We also present algorithms for both the qualitative and the quantitative analysis of 2 1/2-player games with Rabin and Streett objectives. We start with the qualitative analysis of 2 1/2-player games with Rabin objectives.

5.1 Qualitative Analysis of Rabin Games

In this section we present algorithms for the qualitative analysis of 2 1/2-player Rabin games. We present a reduction of 2 1/2-player Rabin games to 2-player Rabin games preserving the ability of player 1 to win almost-surely. The reduction thus makes all algorithms for 2-player Rabin games [PP06, KV98] readily available for the qualitative analysis of 2 1/2-player games with Rabin objectives.

¹ Preliminary versions of the results of this chapter appeared in [CJH04, CdAH05, CH06b].
Reduction. Given a 2 1/2-player game graph G = ((S, E), (S_1, S_2, S_P), δ), a set P = {e_1, f_1, …, e_d, f_d} of colors, and a color map [·]: S → 2^P \ {∅}, we construct a 2-player game graph Ḡ = ((S̄, Ē), (S̄_1, S̄_2)) together with a color map [·]: S̄ → 2^{P̄} \ {∅} for the extended color set P̄ = P ∪ {e_{d+1}, f_{d+1}}. The construction is specified as follows. For every nonprobabilistic state s ∈ S_1 ∪ S_2, there is a corresponding state s̄ ∈ S̄ such that (1) s̄ ∈ S̄_1 iff s ∈ S_1, (2) [s̄] = [s], and (3) (s̄, t̄) ∈ Ē iff (s, t) ∈ E. Every probabilistic state s ∈ S_P is replaced by the gadget shown in Figure 5.1. In the figure, diamond-shaped states are player-2 states (in S̄_2), and square-shaped states are player-1 states (in S̄_1). From the state s̄ with [s̄] = [s], the players play the following 3-step game in Ḡ. First, in state s̄ player 2 chooses a successor (s̃, 2k), for k ∈ {0, 1, …, d}. For every state (s̃, 2k) we have [(s̃, 2k)] = [s]. For k ≥ 1, in state (s̃, 2k) player 1 chooses from two successors: state (ŝ, 2k − 1) with [(ŝ, 2k − 1)] = e_k, or state (ŝ, 2k) with [(ŝ, 2k)] = f_k. The state (s̃, 0) has only one successor (ŝ, 0), with [(ŝ, 0)] = {f_1, f_2, …, f_d, f_{d+1}}. Note that no state in S̄ is labeled by the new color e_{d+1}. Finally, in each state (ŝ, j) the choice is between all states t̄ such that (s, t) ∈ E, and it belongs to player 1 if j is odd, and to player 2 if j is even.
We consider 2 1/2-player games played on the graph G with P = {(e_1, f_1), …, (e_d, f_d)} and the Rabin objective Rabin(P) for player 1. We denote by Ḡ = Tr1as(G) the 2-player game, with Rabin objective Rabin(P̄), where P̄ = {(e_1, f_1), …, (e_{d+1}, f_{d+1})}, as defined by the reduction above. Also, given a pure memoryless strategy σ̄ in the 2-player game Ḡ, the strategy σ = Tr1as(σ̄) in the 2 1/2-player game G is defined as follows:

σ(s) = t if and only if σ̄(s̄) = t̄, for all s ∈ S_1.
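The gadget for a probabilistic state can be sketched as follows (hedged Python; tuples such as ('~', s, 2k) and ('^', s, j) encode the states s̃ and ŝ, and pairs ('e', k), ('f', k) encode the colors; all names are illustrative assumptions):

```python
def gadget(s, succ_s, d, color_of_s):
    """Returns owner, color, and edge maps for the gadget of probabilistic
    state s, with successors succ_s and color set color_of_s."""
    owner, color, edges = {s: 2}, {}, {}
    edges[s] = [('~', s, 2 * k) for k in range(d + 1)]   # player 2 picks 2k
    for k in range(d + 1):
        owner[('~', s, 2 * k)] = 1                       # player 1 moves next
        color[('~', s, 2 * k)] = set(color_of_s)
        if k == 0:
            edges[('~', s, 0)] = [('^', s, 0)]           # unique successor
            owner[('^', s, 0)] = 2
            color[('^', s, 0)] = {('f', i) for i in range(1, d + 2)}
            edges[('^', s, 0)] = list(succ_s)
        else:
            edges[('~', s, 2 * k)] = [('^', s, 2 * k - 1), ('^', s, 2 * k)]
            owner[('^', s, 2 * k - 1)] = 1               # odd index: player 1
            owner[('^', s, 2 * k)] = 2                   # even index: player 2
            color[('^', s, 2 * k - 1)] = {('e', k)}
            color[('^', s, 2 * k)] = {('f', k)}
            edges[('^', s, 2 * k - 1)] = list(succ_s)
            edges[('^', s, 2 * k)] = list(succ_s)
    return owner, color, edges
```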
Figure 5.1: Gadget for the reduction of 2 1/2-player Rabin games to 2-player Rabin games.

Definition 14 (Winning s.c.c.s and end components) Let G be a 1-player game graph and Rabin(P) the objective for player 1, with P = {(e_1, f_1), (e_2, f_2), …, (e_d, f_d)} the
set of d pairs of colors. A strongly connected component (s.c.c.) C in G is winning for player 1 if there exists i ∈ {1, 2, …, d} such that C ∩ F_i ≠ ∅ and C ∩ E_i = ∅; otherwise C is winning for player 2. If G is an MDP with the set P of colors, then an end component C in G is winning for player 1 if there exists i ∈ {1, 2, …, d} such that C ∩ F_i ≠ ∅ and C ∩ E_i = ∅; otherwise C is winning for player 2.
Lemma 26 Given a 2 1/2-player game graph G with Rabin objective Rabin(P) for player 1, let Ū_1 and Ū_2 be the sure winning sets for players 1 and 2, respectively, in the 2-player game graph Ḡ = Tr1as(G) with the modified Rabin objective Rabin(P̄). Define the sets U_1 and U_2 in the original 2 1/2-player game graph G by U_1 = {s ∈ S | s̄ ∈ Ū_1} and U_2 = {s ∈ S | s̄ ∈ Ū_2}. Then the following assertions hold:

(a) U_1 ⊆ Almost_1(Rabin(P)); and

(b) if σ̄ is a pure memoryless sure winning strategy for player 1 from Ū_1 in Ḡ, then σ = Tr1as(σ̄) is an almost-sure winning strategy for player 1 from U_1 in G.
Proof. Consider a pure memoryless sure winning strategy σ̄ in the game Ḡ from every
state s̄ ∈ Ū_1; such a strategy exists by Theorem 17. Our goal is to establish that the pure memoryless strategy σ = Tr1as(σ̄) is an almost-sure winning strategy from every state in U_1.
Winning end components in (G ↾ U_1)_σ. We first prove that every end component in the player-2 MDP (G ↾ U_1)_σ is winning for player 1. We argue that if there is an end component in (G ↾ U_1)_σ that is winning for player 2, then we can construct an s.c.c. in the subgraph (Ḡ ↾ Ū_1)_{σ̄} that is winning for player 2. This will give a contradiction because σ̄ is a sure winning strategy for player 1 from the set Ū_1 in the 2-player Rabin game Ḡ. Let C be an end component in (G ↾ U_1)_σ that is winning for player 2. We denote by C̄ the set of states in the gadgets of states in C. Since C is a winning end component for player 2, for all i ∈ {1, 2, …, d} we have: if F_i ∩ C ≠ ∅, then C ∩ E_i ≠ ∅. Let us define the set I = {i_1, i_2, …, i_j} of indices i_k such that E_{i_k} ∩ C ≠ ∅. Thus for all i ∈ ({1, 2, …, d} \ I) we have F_i ∩ C = ∅. Note that I ≠ ∅, as every state has at least one color. We now construct a subgame in Ḡ_{σ̄} as follows:
• For a state s̄ ∈ C̄ ∩ S̄_2, keep all the edges (s̄, t̄) such that t ∈ C.

• For a state s ∈ C ∩ S_P, the subgame is defined as follows:

– At state s̄, choose the edges to the states (s̃, 2i) such that i ∈ I.

– For a state s ∈ U_1, let dis(s, C ∩ E_i) denote the shortest distance (BFS distance) from s to C ∩ E_i in the graph of (G ↾ U_1)_σ. At state (ŝ, 2i), which is a player-2 state, player 2 chooses a successor s_1 such that dis(s_1, C ∩ E_i) < dis(s, C ∩ E_i) (i.e., player 2 shortens the distance to the set C ∩ E_i in G), unless s ∈ E_i, in which case (ŝ, 2i) keeps all the edges ((ŝ, 2i), t̄) such that t ∈ C.
The construction is illustrated in Figure 5.2.

Figure 5.2: The strategy sub-graph in Ḡ_{σ̄}.

We now prove that every terminal s.c.c. (i.e., every bottom strongly connected component) in the subgame thus constructed in (Ḡ ↾ C̄)_{σ̄} is winning for player 2, where C̄ is the set of states in the gadgets of states in C. Consider
an arbitrary terminal s.c.c. Ȳ in the subgame constructed in (Ḡ ↾ C̄)_{σ̄}. It follows from the construction that for every i ∈ {1, 2, …, d} \ I we have F_i ∩ Ȳ = ∅. Suppose there exists an i ∈ I such that F_i ∩ Ȳ ≠ ∅; then we show that E_i ∩ Ȳ ≠ ∅. The claim follows from the following case analysis.
1. If there is at least one state (s̃, 2i) in Ȳ such that the strategy σ̄ chooses the successor (ŝ, 2i − 1), then since [(ŝ, 2i − 1)] = e_i we have E_i ∩ Ȳ ≠ ∅.

2. Otherwise, for every state (s̃, 2i) in Ȳ the strategy σ̄ for player 1 chooses the successor (ŝ, 2i). For a state s ∈ Ȳ, let dis(s, E_i) denote the shortest distance (BFS distance) from s to E_i in the graph of (G ↾ U_1)_σ. At state (ŝ, 2i), which is a player-2 state, player 2 chooses a successor s_1 such that dis(s_1, E_i) < dis(s, E_i) (i.e., player 2 shortens the distance to the set E_i), unless s ∈ E_i. Hence the terminal s.c.c. Ȳ must contain a state s̄ such that [s̄] = e_i, and therefore E_i ∩ Ȳ ≠ ∅.
We now consider the probability of staying in U_1. For every probabilistic state s ∈ S_P ∩ U_1, all of its successors are in U_1, i.e., E(s) ⊆ U_1. Otherwise, player 2 in the state s̄ of the game Ḡ can choose the successor (s̃, 0) and then a successor in its winning set Ū_2. This would again contradict the assumption that the strategy σ̄ is a sure winning strategy for
player 1 in the game Ḡ from the set Ū_1. Similarly, for every state s ∈ S_2 ∩ U_1 we must have E(s) ⊆ U_1. For all states s ∈ S_1 ∩ U_1 we have σ(s) ∈ U_1. Hence for all strategies π, for all states s ∈ U_1, with probability 1 the set of states visited infinitely often along the play ω^{σ,π}_s is an end component in U_1: this is because for all s ∈ U_1 and for all strategies π for player 2 we have Pr^{σ,π}_s(Safe(U_1)) = 1. It follows from Lemma 7 that, since every end component in (G ↾ U_1)_σ is winning for player 1, the strategy σ is an almost-sure winning strategy for player 1, and U_1 ⊆ Almost_1(Rabin(P)).
Notation for finite-memory strategies. Let π̄ be a finite-memory strategy for player 2 in the game Ḡ with finite memory M. The strategy π̄ can be considered as a memoryless strategy, denoted π̄* = MemLess(π̄), in Ḡ × M (the synchronous product of Ḡ with M) as follows: for s̄ ∈ S̄ and m ∈ M we have π̄*((s̄, m)) = (s̄′, m′), where (a) the memory-update function is π̄_u(s̄, m) = m′ and (b) the next-move function is π̄_m(s̄, m) = s̄′. A memoryless strategy π* in G × M corresponding to the strategy π̄* is defined in a similar fashion as σ = Tr1as(σ̄) was defined previously. From the strategy π* we can then easily define a finite-memory strategy π in G with memory M; we refer to this strategy as π = Tr2pos(π̄).
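A hedged sketch of the correspondence (illustrative Python; the class and function names are assumptions, not the dissertation's code):

```python
# Viewing a finite-memory strategy (next_move, memory_update) as a memoryless
# strategy on the synchronous product G x M.
class FiniteMemoryStrategy:
    def __init__(self, next_move, memory_update, initial_memory):
        self.next_move, self.memory_update = next_move, memory_update
        self.initial_memory = initial_memory

def memless(strategy):
    """Memoryless strategy on product states (s, m): one lookup, no history."""
    def pi_star(state_and_memory):
        s, m = state_and_memory
        return (strategy.next_move(s, m), strategy.memory_update(s, m))
    return pi_star

# Toy usage: two memory modes alternating between two successors.
fm = FiniteMemoryStrategy(next_move=lambda s, m: ('t1' if m == 0 else 't2'),
                          memory_update=lambda s, m: 1 - m,
                          initial_memory=0)
print(memless(fm)(('s', 0)))   # ('t1', 1)
```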
Lemma 27 Given a 2 1/2-player game graph G with Rabin objective Rabin(P) for player 1, let Ū_1 and Ū_2 be the sure winning sets for players 1 and 2, respectively, in the 2-player game graph Ḡ = Tr1as(G) with the modified Rabin objective Rabin(P̄). Define the sets U_1 and U_2 in the original 2 1/2-player game graph G by U_1 = {s ∈ S | s̄ ∈ Ū_1} and U_2 = {s ∈ S | s̄ ∈ Ū_2}. Then there exists a finite-memory strategy π for player 2 in the game G such that for all strategies σ for player 1 and all states s ∈ U_2 we have Pr^{σ,π}_s(Streett(P)) > 0, i.e., Almost_1(Rabin(P)) ⊆ (S \ U_2).
Proof. The proof idea is similar to the proof of Lemma 26. Consider a finite-memory sure winning strategy π̄ for player 2 in the game Ḡ ↾ Ū_2; such a strategy exists by Theorem 17. Let M be the memory of the strategy π̄ and let π = Tr2pos(π̄) be the corresponding strategy in
G. We denote by π* = MemLess(π) and π̄* the corresponding memoryless strategies of π in G × M and of π̄ in Ḡ × M, respectively. We first argue that every end component in the game (G ↾ U_2)_π is winning for player 2.
Winning end components in (G ↾ U_2)_π. Consider the product game (G × M ↾ U_2 × M) and the corresponding memoryless strategy π* of π in the game G × M. We first argue that every end component in (G × M ↾ U_2 × M)_{π*} is winning for player 2. Assume towards contradiction that C is an end component in (G × M ↾ U_2 × M)_{π*} that is winning for player 1. Then we construct an s.c.c. that is winning for player 1 in (Ḡ × M ↾ Ū_2 × M)_{π̄*}. This will give us a contradiction, since π̄ is a sure winning strategy for player 2 in Ḡ ↾ Ū_2. We describe the key steps to construct a winning s.c.c. C̄ for player 1 in (Ḡ × M ↾ Ū_2 × M)_{π̄*} from a winning end component C for player 1 in (G × M ↾ U_2 × M)_{π*}. Mainly we describe the strategy corresponding to a probabilistic state. If C is a winning end component for player 1, consider i to be the witness Rabin pair exhibiting that C is winning, i.e., C ∩ F_i ≠ ∅ and C ∩ E_i = ∅. The strategy for player 1 is as follows.
• If the strategy π̄* for player 2 at a state (s̄, m) chooses the successor ((s̃, 0), m′), then the following successor state is ((ŝ, 0), m′), and since [(ŝ, 0)] = {f_1, f_2, …, f_d, f_{d+1}}, player 1 ensures that a state in F_i is visited.

• If the strategy π̄* for player 2 at a state (s̄, m) chooses a successor ((s̃, 2i), m′), then player 1 chooses the successor ((ŝ, 2i), m′), where m, m′ ∈ M. Since [(ŝ, 2i)] = f_i, player 1 ensures that a state in F_i is visited.

• If the strategy π̄* for player 2 at a state (s̄, m) chooses a successor ((s̃, 2j), m′), for j ≠ i, then player 1 chooses the successor ((ŝ, 2j − 1), m′) and then a successor that shortens the distance to the set F_i, where m, m′ ∈ M. Since [(ŝ, 2j − 1)] = e_j ≠ e_i, player 1 ensures that no state in E_i is visited.
The construction is illustrated in Figure 5.3.

Figure 5.3: The strategy sub-graph in Ḡ_{π̄}.

Consider any terminal s.c.c. Ȳ in
the subgame thus constructed. The strategy for player 1 ensures that whenever a state s̄ with s ∈ S_P is visited in the subgame C̄, no state in E_i is visited. Since C ∩ E_i = ∅, it follows that Ȳ ∩ E_i = ∅. Moreover, the strategy for player 1 ensures that a state in F_i is always visited, i.e., Ȳ ∩ F_i ≠ ∅. Hence in the subgame of (Ḡ × M ↾ C̄ × M)_{π̄*} every terminal s.c.c. Ȳ is winning for player 1, i.e., F_i ∩ Ȳ ≠ ∅ and E_i ∩ Ȳ = ∅. However, this is a contradiction since π̄ is a sure winning strategy for player 2. Hence, all the end components in (G × M ↾ U_2 × M)_{π*} are winning for player 2.
Reachability to end components. We prove that for all states s ∈ U_2 × M and all strategies σ for player 1, the play ω^{σ,π*}_s stays in U_2 × M with positive probability. Since every end component in U_2 × M is a winning end component for player 2, the desired result is then established. We prove the above claim by contradiction. Let U_3 ⊆ U_2 × M be the set of states such that there is a strategy σ for player 1 such that for all states s ∈ U_3 the play ω^{σ,π*}_s reaches U_1 with probability 1. Assume towards contradiction that U_3 is non-empty. Note that (G × M ↾ U_2 × M)_{π*} is a finite-state MDP. It follows from the almost-sure reachability properties of MDPs (Subsection 4.1.1) that for all states s ∈ U_3 there is a successor in U_3 that shortens the distance to U_1, and for all probabilistic states s ∈ U_3 all the successors of
s are in U_3. Consider a strategy σ̄ in (Ḡ × M ↾ Ū_2 × M)_{π̄*} as follows:

1. for all states s̄ ∈ Ū_3 ∩ S̄_1, choose a successor in Ū_3 that shortens the distance to Ū_1;

2. for all states s̄ ∈ Ū_3 that belong to the gadget of a probabilistic state s ∈ S_P:

• at state (s̃, 2i), choose the successor (ŝ, 2i − 1), for i ≥ 1;
• at state (ŝ, 2i − 1), choose a successor that shortens the distance to Ū_1.
Given the strategies σ̄ and π̄*, there are two cases.

• If there is no cycle in Ū_3, then the play given the strategies σ̄ and π̄* reaches Ū_1, and player 1 wins from every state in Ū_1. Since π̄* is a sure winning strategy for player 2 from Ū_2 × M ⊇ Ū_3, we have a contradiction.

• If there is a cycle C̄ in Ū_3, then the strategy π̄* must choose the successor (s̃, 0) for some state s̄ ∈ C̄ belonging to the gadget of a probabilistic state. This follows because the strategy σ̄ ensures that from every state (s̃, 2i), for i ≥ 1, the distance to Ū_1 decreases. Hence C̄ ∩ F_{d+1} ≠ ∅. Since E_{d+1} = ∅, it follows that the cycle C̄ is winning for player 1. Since C̄ ⊆ Ū_3 ⊆ Ū_2 × M and π̄* is a winning strategy for player 2 from all states in Ū_2 × M, we have a contradiction.

Hence U_3 = ∅, and this completes the proof.
It follows from Lemma 26 and Lemma 27 that U_1 = Almost_1(Rabin(P)). The reduction of a 2 1/2-player game G to the 2-player game Ḡ blows up the states and edges of the states in S_P by a factor of d, and adds one more Rabin pair. This result readily allows us to use the algorithms for 2-player Rabin games for the qualitative analysis of 2 1/2-player Rabin games. Moreover, pure memoryless almost-sure winning strategies exist for 2 1/2-player Rabin games and finite-memory positive winning strategies exist for 2 1/2-player Streett games; and these strategies can be extracted from sure winning strategies in 2-player games.
Theorem 26 Given a 2 1/2-player game graph G with Rabin objective Rabin(P) for player 1, with d pairs, let n = |S| and m = |E|. Let Ū_1 and Ū_2 be the sure winning sets for players 1 and 2, respectively, in the 2-player game graph Ḡ = Tr1as(G) with the modified Rabin objective Rabin(P̄). Define the sets U_1 and U_2 in the original 2 1/2-player game graph G by U_1 = {s ∈ S | s̄ ∈ Ū_1} and U_2 = {s ∈ S | s̄ ∈ Ū_2}. Then the following assertions hold.

1. We have

U_1 = Almost_1(Rabin(P)); and
U_2 = S \ Almost_1(Rabin(P)) = {s | ∃π ∈ Π^PF. ∀σ ∈ Σ. Pr^{σ,π}_s(Streett(P)) > 0}.

2. The set Almost_1(Rabin(P)) can be computed in time TwoPlRabinGame(n·d, m·d, d+1), where TwoPlRabinGame(n·d, m·d, d+1) denotes the time complexity of an algorithm solving 2-player Rabin games with n·d states, m·d edges, and d+1 Rabin pairs.

3. If σ̄ is a pure memoryless sure winning strategy for player 1 from Ū_1 in Ḡ, then σ = Tr1as(σ̄) is an almost-sure winning strategy for player 1 from U_1 in G.
Theorem 27 The family Σ^PM of pure memoryless strategies suffices for almost-sure winning with respect to Rabin objectives on 2 1/2-player game graphs.
Almost-sure winning for Streett objectives. Almost-sure winning strategies for Streett objectives can also be obtained by reduction to 2-player games; the gadget of the reduction needs to be slightly modified. We describe below the modification of the gadget for a probabilistic state. We first take the gadget of the Rabin reduction without the edge to the state (s̃, 0). The starting state s̄ is now a player-1 state which can choose between the above gadget and the state (s̃, 0); the successor (ŝ, 0) of (s̃, 0) is now a player-1 state, and we have [(ŝ, 0)] = {e_1, e_2, …, e_d, e_{d+1}}. The successors of (ŝ, 0) are as defined in the reduction for almost-sure winning in Rabin games. The reduction gadget is illustrated in Figure 5.4.

Figure 5.4: Gadget for the reduction of 2 1/2-player Streett games to 2-player Streett games.

We refer to this reduction as Ḡ_2 = Tr2as(G). This reduction preserves almost-sure winning for
player 2 with the Streett objective. Moreover, from a finite-memory sure winning strategy π̄ in Ḡ_2 we can extract a finite-memory almost-sure winning strategy π in G. The mapping of the strategy π̄ to π is obtained in a similar fashion as for the previous reduction; we refer to this mapping of strategies as π = Tr2as(π̄).
Theorem 28 Given a 2 1/2-player game graph G with Rabin objective Rabin(P) for player 1, with d pairs, let n = |S| and m = |E|. Let Ū_1 and Ū_2 be the sure winning sets for players 1 and 2, respectively, in the 2-player game graph Ḡ = Tr2as(G) with the modified Rabin objective Rabin(P̄) for player 1. Define the sets U_1 and U_2 in the original 2 1/2-player game graph G by U_1 = {s ∈ S | s̄ ∈ Ū_1} and U_2 = {s ∈ S | s̄ ∈ Ū_2}. Then the following assertions hold.

1. We have

U_2 = Almost_2(Streett(P)); and
U_1 = S \ Almost_2(Streett(P)) = {s | ∃σ ∈ Σ^PM. ∀π ∈ Π. Pr^{σ,π}_s(Rabin(P)) > 0}.

2. The set Almost_2(Streett(P)) can be computed in time TwoPlRabinGame(n·d, m·d, d+1), where TwoPlRabinGame(n·d, m·d, d+1) denotes the time complexity of an algorithm solving 2-player Rabin games with n·d states, m·d edges, and d+1 Rabin pairs.

3. If π̄ is a pure finite-memory sure winning strategy for player 2 from Ū_2 in Tr2as(G), then the pure finite-memory strategy π = Tr2as(π̄) is an almost-sure winning strategy for player 2 in G.
Quantitative complexity. The existence of pure memoryless almost-sure winning strategies for Rabin objectives (Theorem 27) and Lemma 17 imply the existence of pure memoryless optimal strategies for Rabin objectives. This gives us the following results.

Theorem 29 The family Σ^PM of pure memoryless strategies suffices for optimality with respect to all Rabin objectives on 2 1/2-player game graphs.
Theorem 30 Given a 2 1/2-player game graph G, an objective Φ for player 1, a state s ∈ S, and a rational r ∈ R, the complexity of deciding whether Val_1(Φ)(s) ≥ r is as follows:

1. NP-complete if Φ is a Rabin objective;

2. coNP-complete if Φ is a Streett objective;

3. in NP ∩ coNP if Φ is a parity objective.

Proof.

1. Let G be a 2 1/2-player game with a Rabin objective Rabin(P) for player 1. Given a pure memoryless optimal strategy σ for player 1, the game G_σ is a player-2 MDP with a Streett objective for player 2. Since the values of MDPs with Streett objectives can be computed in polynomial time (Theorem 15), the problem is in NP. The NP-hardness proof follows from the fact that 2-player games with Rabin objectives are NP-hard (Theorem 2).

2. Follows immediately from the fact that Streett objectives are complementary to Rabin objectives.

3. Follows from the previous two completeness results, as a parity objective is both a Rabin objective and a Streett objective.
Theorem 30 improves the previous 3EXPTIME bound for the quantitative analysis of 2 1/2-player games with Rabin and Streett objectives (Corollary 3).
5.2 Strategy Improvement for 2 1/2-player Rabin and Streett Games

We first present a few key properties of 2 1/2-player games with Rabin objectives. We then use these properties to develop a strategy improvement algorithm for 2 1/2-player games with Rabin objectives.
5.2.1 Key Properties

Boundary probabilistic states. Given a set U of states, let Bnd(U) = {s ∈ U ∩ S_P | ∃t ∈ E(s). t ∉ U} be the set of boundary probabilistic states that have an edge out of U. Given a set U of states and a Rabin objective Rabin(P) for player 1, we define two transformations Trwin1(U) and Trwin2(U) of U as follows: every state s in Bnd(U) is converted to an absorbing state (a state with only a self-loop), and (a) in Trwin1(U) it is assigned the color f_1 and (b) in Trwin2(U) it is assigned the color e_1; i.e., every state in Bnd(U) is converted to a sure winning state for player 1 in Trwin1(U) and to a sure winning state for player 2 in Trwin2(U). Observe that if U is δ-live, then Trwin1(G ↾ U) and Trwin2(G ↾ U) are game graphs.
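A hedged sketch of Bnd(U) and the two transformations (illustrative Python; the data layout is an assumption, not the dissertation's code):

```python
def boundary(U, SP, succ):
    """Boundary probabilistic states of U: those with an edge out of U."""
    return {s for s in U & SP if any(t not in U for t in succ[s])}

def trwin(U, SP, succ, color, winner):
    """Trwin1 (winner=1) or Trwin2 (winner=2): boundary states become
    absorbing and are recolored f_1 or e_1, respectively."""
    bnd = boundary(U, SP, succ)
    new_succ = {s: ({s} if s in bnd else {t for t in succ[s] if t in U})
                for s in U}
    new_color = {s: (({('f', 1)} if winner == 1 else {('e', 1)})
                     if s in bnd else color[s]) for s in U}
    return new_succ, new_color   # the transformed game graph restricted to U
```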
Value classes. Given a Rabin objective Φ, for every real r ∈ R the value class with value r, VC(Φ, r) = {s ∈ S | Val_1(Φ)(s) = r}, is the set of states with value r for player 1. In the sequel we will drop the parameter Φ and write VC(r) to denote VC(Φ, r). It follows from Proposition 7 that for every r > 0, the value class VC(r) is δ-live. The following lemmas (easily obtained as specializations of Lemma 15 and Lemma 16) establish a connection between value classes, the transformations Trwin1 and Trwin2, and the almost-sure winning states.
Lemma 28 (Almost-sure winning reduction) The following assertions hold.

1. For every value class VC(r), for r > 0, the game Trwin1(G ↾ VC(r)) is almost-sure winning for player 1.

2. For every value class VC(r), for r < 1, the game Trwin2(G ↾ VC(r)) is almost-sure winning for player 2.

Lemma 29 (Optimal strategies) The following assertions hold.

1. If a strategy σ is an almost-sure winning strategy in the game Trwin1(G ↾ VC(r)), for every value class VC(r), then σ is an optimal strategy.

2. If a strategy π is an almost-sure winning strategy in the game Trwin2(G ↾ VC(r)), for every value class VC(r), then π is an optimal strategy.
It follows from Theorem 26 and Lemma 28 that for every value class VC(r) with r > 0, the game Tr1as(Trwin1(G ↾ VC(r))) is sure winning for player 1.

Properties of almost-sure winning states. The following lemma easily follows from Corollary 5.
Lemma 30 Given a 2 1/2-player game G and a Rabin objective Rabin(P), if Almost_1(Rabin(P)) = ∅, then Almost_2(Ω \ Rabin(P)) = S.
Property of MDPs with Streett objectives. The following lemma is obtained as a special case of Theorem 13.

Lemma 31 The family of randomized memoryless strategies suffices for optimality with respect to Streett objectives on MDPs.
5.2.2 Strategy Improvement Algorithm

We now present an algorithm to compute the values of 2 1/2-player games with Rabin objective Rabin(P) for player 1. By quantitative determinacy (Theorem 16) the algorithm also computes the values for the Streett objective Streett(P) for player 2. Recall that since pure memoryless optimal strategies exist for Rabin objectives, we will only consider pure memoryless strategies σ for player 1. We refer to the Rabin objective Rabin(P) for player 1 as Φ.
Restriction, values, and value classes of strategies. Given a strategy σ and a set U of states, we denote by (σ ↾ U) the restriction of the strategy σ to the set U, that is, a strategy that for every state in U follows the strategy σ. Given a player-1 strategy σ and the Rabin objective Φ, we denote the value of player 1 given the strategy σ by Val^σ_1(Φ)(s) = inf_{π∈Π} Pr^{σ,π}_s(Φ). Similarly, we define the value classes given strategy σ as VC^σ(r) = {s ∈ S | Val^σ_1(Φ)(s) = r}.
Ordering of strategies. We define an ordering relation ≺ on strategies as follows: given two strategies σ and σ′, we have σ ≺ σ′ if and only if for all states s we have Val^σ_1(Φ)(s) ≤ Val^{σ′}_1(Φ)(s), and for some state s we have Val^σ_1(Φ)(s) < Val^{σ′}_1(Φ)(s).
Strategy improvement step. Given a strategy σ for player 1, we describe a procedure Improve to "improve" the strategy for player 1. The procedure is described in Algorithm 3. An informal description of the procedure is as follows: given a strategy σ, the algorithm computes the values Val^σ_1(Φ)(s) for all states. Since σ is a pure memoryless strategy, Val^σ_1(Φ)(s) can be computed by solving the MDP G_σ with the Streett objective Ω \ Φ. If there is a state s ∈ S_1 at which the strategy can be "value improved", i.e., there is a state t ∈ E(s) with Val^σ_1(Φ)(t) > Val^σ_1(Φ)(s), then the strategy σ is modified by setting σ(s) to t. This is achieved in Step 2.1 of Improve. Otherwise, in every value class VC^σ(r), the strategy σ is "improved" for the game Tr1as(Trwin2(G ↾ VC^σ(r))) by solving the 2-player game Tr1as(Trwin2(G ↾ VC^σ(r))) with an algorithm for 2-player Rabin games.
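The value-improvement step (Step 2.1) can be sketched as follows (hedged Python; an illustration, not Algorithm 3 itself):

```python
def value_improve(sigma, val, S1, succ):
    """sigma: dict s -> chosen successor; val: dict s -> Val_1^sigma(Phi)(s).
    At player-1 states, switch to a successor with strictly higher value."""
    improved, I = dict(sigma), set()
    for s in S1:
        best = max(succ[s], key=lambda t: val[t])
        if val[best] > val[s]:
            improved[s], I = best, I | {s}   # the set I of improved states
    return improved, I
```

When the returned set I is empty, Improve instead performs the qualitative improvement by solving the 2-player Rabin games in each value class; that step is not sketched here.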
The complexity of Improve will be discussed in Lemma 36. In the algorithm, the strategy σ for player 1 is always a pure memoryless strategy (this is sufficient, because pure memoryless strategies suffice for optimality in 2 1/2-player games with Rabin objectives (Theorem 16)). Moreover, given a pure memoryless strategy σ, the game G_σ is a player-2 MDP, and by Lemma 31 there is a randomized memoryless counter-optimal strategy for player 2. Hence, after fixing a pure memoryless strategy for player 1, we only consider randomized memoryless strategies for player 2. We now define the notion of a Rabin winning set, and then present two propositions, which are useful in the correctness proof of the algorithm.
Rabin winning set. Consider a Rabin objective Rabin(P) and let [P] = {(E_1, F_1), (E_2, F_2), …, (E_d, F_d)} be the set of Rabin pairs. A set C ⊆ S is Rabin winning if there exists 1 ≤ i ≤ d such that C ∩ E_i = ∅ and C ∩ F_i ≠ ∅, i.e., for all plays ω, if Inf(ω) = C then ω ∈ Rabin(P).
Proposition 10 Given a strategy σ for player 1, for every state s ∈ VCσ (r) ∩ S2 , if
S
t ∈ E(s), then we have Val σ1 (Φ)(t) ≥ r, i.e., E(s) ⊆ q≥r VCσ (q).
Proof. The result is proved by contradiction. Suppose the assertion of the proposition fails,
i.e., there exist s and t ∈ E(s) such that s ∈ VC^σ(r) and Val^σ_1(Φ)(t) < r. Consider the
strategy π ∈ Π for player 2 that at s chooses the successor t, and from t ensures that Φ is
satisfied with probability at most Val^σ_1(Φ)(t) against strategy σ. Hence we have
Val^σ_1(Φ)(s) ≤ Val^σ_1(Φ)(t) < r. This contradicts s ∈ VC^σ(r). Hence player 2 can only
choose edges whose targets lie in equal or higher value classes.
Proposition 11 Given a strategy σ for player 1, for all strategies π ∈ Π^M for player 2, if
there is a closed connected recurrent class C in the Markov chain G_{σ,π} with C ⊆ VC^σ(r),
for r > 0, then C is Rabin winning.
Proof. The result is again proved by contradiction. Suppose the assertion of the proposition
fails, i.e., for some strategy π ∈ Π^M for player 2 and some r > 0, C is a closed connected
recurrent class in the Markov chain G_{σ,π} with C ⊆ VC^σ(r), and C is not Rabin winning.
Then player 2, by playing strategy π, ensures that for all states s ∈ C we have Pr^{σ,π}_s(Φ) = 0
(since C is not Rabin winning, and since C is a closed connected recurrent class, all states
in C are visited infinitely often). This contradicts C ⊆ VC^σ(r) and r > 0.
Lemma 32 Consider a strategy σ as input to Algorithm 3, and let σ′ be the output, i.e.,
σ′ = Improve(G, σ). If the set I in Step 2 of Algorithm 3 is non-empty, then we have
Val^{σ′}_1(Φ)(s) ≥ Val^σ_1(Φ)(s) for all s ∈ S;
Val^{σ′}_1(Φ)(s) > Val^σ_1(Φ)(s) for all s ∈ I.
Proof. Consider a switch of the strategy of player 1 from σ to σ′, as constructed in Step 2.1
of Algorithm 3. Consider a strategy π ∈ Π^M for player 2 and a closed connected recurrent
class C in G_{σ′,π} such that C ⊆ ∪_{r>0} VC^σ(r). Let z = max{r > 0 | C ∩ VC^σ(r) ≠ ∅},
that is, VC^σ(z) is the greatest value class with a nonempty intersection with C. A state
s ∈ VC^σ(z) ∩ C satisfies the following conditions:
1. If s ∈ S2, then for all t ∈ E(s), if π(s)(t) > 0, then t ∈ VC^σ(z). This follows because,
by Proposition 10, we have E(s) ⊆ ∪_{q≥z} VC^σ(q), and C ∩ VC^σ(q) = ∅ for q > z.
2. If s ∈ S1, then σ′(s) ∈ VC^σ(z). This follows because, by construction,
σ′(s) ∈ ∪_{q≥z} VC^σ(q), and C ∩ VC^σ(q) = ∅ for q > z. Also, since s ∈ VC^σ(z) and
σ′(s) ∈ VC^σ(z), it follows that σ′(s) = σ(s).
3. If s ∈ SP, then E(s) ⊆ VC^σ(z). This follows because, for s ∈ SP, if E(s) ⊈ VC^σ(z),
then E(s) ∩ ∪_{q>z} VC^σ(q) ≠ ∅. Since C is closed, and C ∩ VC^σ(q) = ∅ for q > z, the
claim follows.
It follows that C ⊆ VC^σ(z), and for all states s ∈ C ∩ S1 we have σ′(s) = σ(s). Hence, by
Proposition 11, we conclude that C is Rabin winning.
It follows that if player 1 switches to the strategy σ′, as constructed when Step 2.1
of Algorithm 3 is executed, then for all strategies π ∈ Π^M for player 2 the following assertion
holds: if there is a closed connected recurrent class C ⊆ S \ VC^σ(0) in the Markov chain
G_{σ′,π}, then C is Rabin winning for player 1. Hence, given strategy σ′, a counter-optimal
strategy for player 2 maximizes the probability of reaching VC^σ(0). We now analyze the
player-2 MDP G_{σ′} with the reachability objective Reach(VC^σ(0)) to establish the desired claim.
For simplicity we assume E ∩ (SP × SP) = ∅, i.e., a state s ∈ SP has edges only to S1 and S2
states (E(s) ⊆ S1 ∪ S2). We define variables w_s, for s ∈ S \ SP, as follows:
w_s = 1 − Val^σ_1(Φ)(s) for s ∈ (S \ I) \ SP;
w_s = 1 − Val^σ_1(Φ)(σ′(s)) for s ∈ I.
Observe that for s ∈ I we have w_s < 1 − Val^σ_1(Φ)(s). We now define variables x_s, for
s ∈ S, as follows:
x_s = w_s for s ∈ S1 ∪ S2;
x_s = Σ_{t∈E(s)} δ(s)(t) · w_t for s ∈ SP.
Observe that
x_s ≤ 1 − Val^σ_1(Φ)(s) for s ∈ S \ I;
x_s < 1 − Val^σ_1(Φ)(s) for s ∈ I.
The variables satisfy the following constraints:
x_s ≥ x_t for s ∈ S2, (s, t) ∈ E;
x_s = Σ_{t∈E(s)} δ(s)(t) · x_t for s ∈ SP;
x_s = x_{σ′(s)} for s ∈ S1;
x_s = 1 for s ∈ VC^σ(0).
Since the value for player 2 for reaching VC^σ(0) in the MDP G_{σ′} is the least value vector
satisfying the above constraints (see Theorem 11), it follows that the value for player 2 for
reaching VC^σ(0) at a state s is at most x_s, i.e., for all π ∈ Π we have
Pr^{σ′,π}_s(Reach(VC^σ(0))) ≤ x_s. Thus we obtain that for all s ∈ S and all strategies π ∈ Π
we have Pr^{σ′,π}_s(Ω \ Φ) ≤ x_s. Hence we have Val^{σ′}_1(Φ)(s) ≥ 1 − x_s, and the desired
result follows.
Lemma 33 Consider a strategy σ as input to Algorithm 3, and let σ′ be the output, i.e.,
σ′ = Improve(G, σ), such that σ′ ≠ σ. If the set I in Step 2 of Algorithm 3 is empty, then
1. for all states s we have Val^{σ′}_1(Φ)(s) ≥ Val^σ_1(Φ)(s); and
2. for some state s we have Val^{σ′}_1(Φ)(s) > Val^σ_1(Φ)(s).
Proof. It follows from Proposition 11 that for all strategies π ∈ Π^M for player 2, if C is
a closed connected recurrent class in G_{σ,π} and C ⊆ VC^σ(q), for q > 0, then C is Rabin
winning. Let σ′ be the strategy constructed from σ in Step 2.2 of Algorithm 3, and let U_r
be the set where σ is modified to obtain σ′. The strategy σ′ ↾ U_r is an almost-sure winning
strategy for player 1 in U_r in the subgame Tr^win_2(G ↾ VC^σ(r)). This follows from
Theorem 26, since σ′ ↾ U_r = Tr^as_1(σ^r ↾ Ū_r) and σ^r ↾ Ū_r is a sure winning strategy for
player 1 in Ū_r in the subgame Tr^as_1(Tr^win_2(G ↾ VC^σ(r))). It follows that if C is a closed
connected recurrent class in G_{σ′,π} and C ⊆ U_r, then C is Rabin winning. Arguments
similar to Lemma 32 show that the following assertion holds: for all strategies π ∈ Π^M for
player 2, if there is a closed connected recurrent class C ⊆ S \ VC^σ(0) in the Markov chain
G_{σ′,π}, then either (a) C ⊆ VC^σ(z) \ U_z, for some z > 0, or (b) C ⊆ U_z; in both cases C
is Rabin winning. Given the strategy σ′, we consider the player-2 MDP G_{σ′} and fix a
randomized memoryless optimal strategy π for player 2 in G_{σ′}. We have the following
case analysis.
1. If for all states s ∈ U_r ∩ S2 we have Supp(π(s)) ⊆ U_r, then, since σ′ ↾ U_r is an
almost-sure winning strategy for player 1, it follows that for all states s ∈ U_r we have
Pr^{σ′,π}_s(Φ) = 1. Since r < 1, for all s ∈ U_r we have Val^σ_1(Φ)(s) = r < 1 and
Val^{σ′}_1(Φ)(s) = 1 > r = Val^σ_1(Φ)(s). The desired claim of the lemma easily follows
in this case.
2. Otherwise, there exists a state s ∈ U_r such that Supp(π(s)) ∩ (S \ U_r) ≠ ∅. In the
present case we have
Supp(π(s)) ⊆ ∪_{q≥r} VC^σ(q);  Supp(π(s)) ∩ ∪_{q>r} VC^σ(q) ≠ ∅.  (5.1)
Observe that for all s ∈ S1, if s ∈ VC^σ(q), then σ′(s) ∈ VC^σ(q). We now consider
variables w_s, for s ∈ S1 ∪ S2, as follows:
w_s = 1 − Val^σ_1(Φ)(s) for s ∈ S1;
w_s = 1 − Σ_{t∈E(s)} π(s)(t) · Val^σ_1(Φ)(t) for s ∈ S2.
We now consider variables x_s, for s ∈ S, as follows:
x_s = w_s for s ∈ S1 ∪ S2;
x_s = Σ_{t∈E(s)} δ(s)(t) · w_t for s ∈ SP.
It follows that
x_s ≤ 1 − Val^σ_1(Φ)(s) for all s ∈ S;
x_s < 1 − Val^σ_1(Φ)(s) for some s ∈ S.
The variables satisfy the following constraints:
x_s = Σ_{t∈E(s)} π(s)(t) · x_t for s ∈ S2;
x_s = Σ_{t∈E(s)} δ(s)(t) · x_t for s ∈ SP;
x_s = x_{σ′(s)} for s ∈ S1;
x_s = 1 for s ∈ VC^σ(0).
Since the value for player 2 for reaching VC^σ(0) in the Markov chain G_{σ′,π} is the least
value vector satisfying the above constraints (this follows as a special case of Theorem 11),
it follows that the value for player 2 for reaching VC^σ(0) at a state s is at most x_s, i.e.,
Pr^{σ′,π}_s(Reach(VC^σ(0))) ≤ x_s. Thus we obtain that for all s ∈ S we have
Pr^{σ′,π}_s(Ω \ Φ) ≤ x_s. Since π is an optimal strategy against σ′, we have
Val^{σ′}_1(Φ)(s) ≥ 1 − x_s.
The desired result follows.
Lemma 32 and Lemma 33 yield Lemma 34.
Lemma 34 For a strategy σ, if σ ≠ Improve(G, σ), then σ ≺ Improve(G, σ).
Lemma 35 If σ = Improve(G, σ), then σ is an optimal strategy for player 1.
Proof. Let σ be a strategy such that σ = Improve(G, σ). Then the following conditions hold.
1. Fact 1. The strategy σ cannot be "value-improved," that is,
∀s ∈ S1. ∀t ∈ E(s). Val^σ_1(Φ)(t) ≤ Val^σ_1(Φ)(s);
and for all s ∈ S1 we have Val^σ_1(Φ)(s) = Val^σ_1(Φ)(σ(s)).
2. Fact 2. For all r < 1, the set of almost-sure winning states for player 1 in
Tr^win_2(G ↾ VC^σ(r)) is empty. By Lemma 30 it follows that, for all r < 1, all states in
Tr^win_2(G ↾ VC^σ(r)) are almost-sure winning for player 2.
Consider a finite-memory strategy π, with memory M, for player 2 such that π is
almost-sure winning in Tr^win_2(G ↾ VC^σ(r)) for all r < 1. Let U_{<1} = S \ VC^σ(1). Consider
a pure memoryless strategy σ′ of player 1, the Markov chain (G × M)_{σ′,π}, and a closed
connected recurrent set C in the Markov chain. It follows from arguments similar to
Lemma 32 that the set C is contained in some value class and does not intersect the
boundary probabilistic states, i.e., for some r ∈ [0, 1] we have C ⊆ (VC^σ(r) × M) and
C ∩ (Bnd(r) × M) = ∅. By the almost-sure winning property of π it follows that for
r < 1 the set C is almost-sure winning for player 2, i.e., if r < 1, then for all s ∈ C the
probability of satisfying Φ in the Markov chain is 0. Hence for all states s ∈ U_{<1} we have
Pr^{σ′,π}_s(Φ | Safe(U_{<1})) = 0, where Safe(U_{<1}) = {ω = ⟨s_0, s_1, ...⟩ | ∀k ≥ 0. s_k ∈ U_{<1}} denotes
the set of plays that only visit states in U_{<1}. Hence, given the strategy π, any counter-optimal
pure memoryless strategy for player 1 maximizes the probability of reaching VC^σ(1) in
the MDP (G × M)_π. From the fact that the strategy σ cannot be "value improved" (Fact 1)
and arguments similar to Lemma 32 (to analyze reachability in MDPs), it follows that
for all player-1 pure memoryless strategies σ′, all r < 1, and all states s ∈ VC^σ(r), we have
Pr^{σ′,π}_s(Φ) ≤ r. Since pure memoryless optimal strategies exist for player 1, it follows that
for all r ∈ [0, 1] and all states s ∈ VC^σ(r), we have Val_1(Φ)(s) ≤ r. Moreover, for all r ∈ [0, 1]
and all states s ∈ VC^σ(r), we have r = Val^σ_1(Φ)(s) ≤ Val_1(Φ)(s). This establishes the
optimality of σ.
Lemma 36 The procedure Improve can be computed in time
O(poly(n)) + n · O(TwoPlRabinGame(n·d, m·d, d+1)),
where poly is a polynomial function.
In Lemma 36 we denote by TwoPlRabinGame(n·d, m·d, d+1) the time complexity of an
algorithm for solving 2-player Rabin games with n·d states, m·d edges, and d+1 Rabin
pairs. Recall that the reduction Tr^as_1 blows up the states in, and the outgoing edges from,
SP by a factor of d, and adds a new Rabin pair. A call to Improve requires solving an MDP
with a Streett objective quantitatively (Step 1 of Improve; this can be achieved in polynomial
time by Theorem 15), and Step 2.2 requires solving at most n 2-player Rabin games (since
there can be at most n value classes). Lemma 36 follows. Also recall that by the results of
[PP06] we have
O(TwoPlRabinGame(n·d, m·d, d+1)) = O((m·d) · (n·d)^{d+2} · (d+1)!) = O(m · n^{d+2} · d^{d+3}).
A strategy-improvement algorithm using the Improve procedure is described in
Algorithm 4. Observe that it follows from Lemma 34 that if Algorithm 4 outputs a strategy
σ∗, then σ∗ = Improve(G, σ∗). The correctness of the algorithm follows from Lemma 35
and yields Theorem 31. Given an optimal strategy σ for player 1, the values for both
players can be computed in polynomial time by computing the values of the MDP G_σ (see
Theorem 15). Since there are at most (m/n)^n ≤ 2^{n·log n} possible pure memoryless
strategies, Algorithm 4 requires at most 2^{n·log n} iterations. This, along with Lemma 36,
gives us the following theorem.
Theorem 31 (Correctness of Algorithm 4) For every 2½-player game graph G and
Rabin objective Φ, the output σ∗ of Algorithm 4 is an optimal strategy for player 1. The
running time of Algorithm 4 is bounded by 2^{O(n·log n)} · O(m · n^{d+2} · d^{d+3}) if G has n states
and m edges, and Φ has d pairs.
5.3 Randomized Algorithm
We now present a randomized algorithm for 2½-player Rabin games, obtained by combining
an algorithm of Björklund et al. [BSV03] with the procedure Improve.
Games and improving subgames. Given l, m ∈ N, let G(l, m) be the class of 2½-player game
graphs with the set S1 of player-1 states partitioned into two sets as follows: (a) O1 = {s ∈
S1 | |E(s)| = 1}, i.e., the set of states with out-degree 1; and (b) O2 = S1 \ O1, with |O2| ≤ l
and Σ_{s∈O2} |E(s)| ≤ m. There is no restriction on player 2. Given a game G ∈ G(l, m), a
state s ∈ O2, and an edge e = (s, t), we define the subgame G̃_e by deleting all edges from s
other than the edge e. Observe that G̃_e ∈ G(l−1, m−|E(s)|), and hence also G̃_e ∈ G(l, m).
If σ is a strategy for player 1 in G ∈ G(l, m), then a subgame G̃ is σ-improving if some
strategy σ′ in G̃ satisfies σ ≺ σ′.
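Constructing the subgame G̃_e amounts to restricting the edge relation at a single player-1 state. A small Python sketch, assuming a hypothetical game representation in which game.edges maps each state to its list of successors:

    import copy

    def fix_edge(game, s, e):
        # Subgame of the text: delete all edges from the player-1 state s
        # other than e = (s, t); s then has out-degree 1, moving from O2 to O1.
        (_, t) = e
        sub = copy.deepcopy(game)
        sub.edges[s] = [t]
        return sub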
Informal description of Algorithm 5. The algorithm takes a 2½-player Rabin game and
an initial strategy σ_0, and proceeds in three steps. In Step 1, it constructs r pairs of
σ_0-improving subgames G̃ and corresponding improved strategies σ in G̃. This is achieved by
the procedure ImprovingSubgames. The parameter r will be chosen to obtain a suitable
complexity analysis. In Step 2, the algorithm selects uniformly at random one of the
improving subgames G̃ with corresponding strategy σ, and recursively computes an optimal
strategy σ∗ in G̃ with σ as the initial strategy. If the strategy σ∗ is optimal in the original
game G, then the algorithm terminates and returns σ∗. Otherwise it improves σ∗ by a
call to Improve, and continues at Step 1 with the improved strategy Improve(G, σ∗) as the
initial strategy.
The procedure ImprovingSubgames constructs a sequence of game graphs
G_0, G_1, ..., G_{r−l} with G_i ∈ G(l, l+i) such that all subgames G̃ of G_i are σ_0-improving.
The subgame G_{i+1} is constructed from G_i as follows: we compute an optimal
strategy σ^i in G_i, and if σ^i is optimal in G, then we have discovered an optimal strategy;
otherwise we construct G_{i+1} by adding to G_i any target edge e of Improve(G, σ^i), i.e.,
an edge required by the strategy Improve(G, σ^i) that is not present in the strategy σ^i.
The correctness of the algorithm can be seen as follows. Observe that every time
Step 1 is executed, the initial strategy is improved with respect to the ordering ≺ on
strategies. Since the number of strategies is bounded, the termination of the algorithm is
guaranteed. Step 3 of Algorithm 5 and Step 1.2.1 of procedure ImprovingSubgames ensure
that on termination of the algorithm the returned strategy is optimal. Lemma 37 bounds
the expected number of iterations of Algorithm 5. The analysis is similar to the results of
[BSV03].
Lemma 37 Algorithm 5 computes an optimal strategy. The expected number of iterations
T(·,·) of Algorithm 5 for a game G ∈ G(l, m) is bounded by the following recurrence:
T(l, m) ≤ Σ_{i=l}^{r} T(l, i) + T(l−1, m−2) + (1/r) · Σ_{i=1}^{r} T(l, m−i) + 1.
Proof. We justify every term of the right-hand side of the recurrence. The first term
represents the work of procedure ImprovingSubgames, which recursively calls Algorithm 5
to compute r pairs of σ_0-improving subgames and witness strategies. The second term
represents the work of the recursive call in Step 2 of Algorithm 5. The third term represents
the work as the average of the r equally likely choices in Step 3 of Algorithm 5. All the
subgames G_i can be partially ordered according to the values of their optimal strategies.
Since the algorithm only visits strategies that are improving with respect to the ≺ ordering,
subgames whose optimal strategies are equal, worse, or incomparable to the strategy σ∗ will
never be explored in the rest of the algorithm. In the worst case the algorithm selects the
worst r subgames, and in Step 3 it solves a game G ∈ G(l, m−i), for i = 1, 2, ..., r, each
with probability 1/r. This gives the bound of the recurrence.
For a game graph G with |S| = n, we obtain the bound m ≤ n². Using this
fact and an analysis of Kalai for linear programming, Björklund et al. [BSV03] showed that
m^{O(√(n/log n))} = 2^{O(√(n·log n))} is a solution to the recurrence of Lemma 37, obtained by
choosing r = max{n, m/2}. The above analysis, along with Lemma 36, yields Theorem 32.
Theorem 32 Given a 2½-player game graph G and a Rabin objective Rabin(P) with d pairs,
the values Val_1(Rabin(P))(s) can be computed for all states s ∈ S in expected time
2^{O(√(n·log n))} · O(TwoPlRabinGame(n·d, m·d, d+1)) = 2^{O(√(n·log n))} · O(n^{d+2} · d^{d+3} · m).
5.4 Optimal Strategy Construction for Streett Objectives
The algorithms above, Algorithm 4 and the randomized algorithm, compute values for
both player 1 and player 2 (i.e., both for Rabin and Streett objectives), but only construct
an optimal strategy for player 1 (i.e., the player with the Rabin objective). Since pure
memoryless optimal strategies exist for the Rabin player, it is much simpler to analyze and
obtain the values and an optimal strategy for player 1. We now show how, once these
values have been computed, we can obtain an optimal strategy for the Streett player as well.
We do this by computing sure winning strategies in 2-player games with Streett objectives.
Given a 2½-player game G with Rabin objective Φ for player 1 and the complementary
objective Ω \ Φ for player 2, we first compute Val_1(Φ)(s) for all states s ∈ S. An
optimal strategy π∗ for player 2 is constructed as follows: for each value class VC(r) with r < 1,
obtain a sure winning strategy π^r for player 2 in Tr^as_2(Tr^win_2(G ↾ VC(r))); in VC(r), the
strategy π∗ follows the strategy Tr^as_2(π^r). By Lemma 29 it follows that π∗ is an optimal
strategy and, given all the values, the construction of π∗ requires n calls to a procedure for
solving 2-player games with Streett objectives.
Theorem 33 Let G be a 2½-player game graph with n states and m edges, and let Φ
and Ω \ Φ be a Rabin and a Streett objective, respectively, with d pairs. Given the values
Val_1(Φ)(s) = 1 − Val_2(Ω \ Φ)(s) for all states s of G, an optimal strategy π∗ for player 2 can
be constructed in time n · O(TwoPlStreettGame(n·d, m·d, d+1)), where
TwoPlStreettGame(n·d, m·d, d+1) is the time complexity of an algorithm for solving 2-player
Streett games with n·d states, m·d edges, and d+1 Streett pairs.
Discussion on parity games. We briefly discuss the special case of parity games, and
then summarize the results. For the special case of 2½-player games with parity objectives,
an improved strategy-improvement algorithm (where the improvement step can be computed
in polynomial time) is given in [CH06a]. We summarize the complexity of strategies and
the computational complexity of 2½-player games with Müller objectives and their subclasses
in Table 5.1 and Table 5.2.
Table 5.1: Strategy complexity of 2½-player games and their subclasses with ω-regular
objectives, where Σ^PM denotes the family of pure memoryless strategies, Σ^PF the family
of pure finite-memory strategies, and Σ^M the family of randomized memoryless strategies.

Objectives          | 1-pl.       | 1½-pl.      | 2-pl. | 2½-pl.
Reachability/Safety | Σ^PM        | Σ^PM        | Σ^PM  | Σ^PM
Parity              | Σ^PM        | Σ^PM        | Σ^PM  | Σ^PM
Rabin               | Σ^PM        | Σ^PM        | Σ^PM  | Σ^PM
Streett             | Σ^PF / Σ^M  | Σ^PF / Σ^M  | Σ^PF  | Σ^PF
Müller              | Σ^PF / Σ^M  | Σ^PF / Σ^M  | Σ^PF  | Σ^PF
Table 5.2: Computational complexity of 2½-player games and their subclasses with ω-regular
objectives (Qual. = qualitative analysis, Quan. = quantitative analysis).

Objectives          | 1-pl. | 1½-pl. Qual. | 1½-pl. Quan. | 2-pl.         | 2½-pl. Qual.  | 2½-pl. Quan.
Reachability/Safety | PTIME | PTIME        | PTIME        | PTIME         | PTIME         | NP ∩ coNP
Parity              | PTIME | PTIME        | PTIME        | NP ∩ coNP     | NP ∩ coNP     | NP ∩ coNP
Rabin               | PTIME | PTIME        | PTIME        | NP-compl.     | NP-compl.     | NP-compl.
Streett             | PTIME | PTIME        | PTIME        | coNP-compl.   | coNP-compl.   | coNP-compl.
Müller              | PTIME | PTIME        | PTIME        | PSPACE-compl. | PSPACE-compl. | PSPACE-compl.
5.5 Conclusion
We conclude the chapter by stating the major open problems in the complexity analysis
of 2½-player games and their subclasses. The open problems are as follows:
1. to obtain a polynomial-time algorithm for 2-player parity games;
2. to obtain a polynomial-time algorithm for the quantitative analysis of 2½-player
reachability games; and
3. to obtain a polynomial-time algorithm for the quantitative analysis of 2½-player parity
games.
All of the above problems are in NP ∩ coNP, and no polynomial-time algorithm is known
for any of them.
Algorithm 3 Improve
Input: A 2½-player game graph G, a Rabin objective Φ for player 1,
and a pure memoryless strategy σ for player 1.
Output: A pure memoryless strategy σ′ for player 1 such that σ′ = σ or σ ≺ σ′.
[Step 1] Compute Val^σ_1(Φ)(s) for all states s.
[Step 2] Consider the set I = {s ∈ S1 | ∃t ∈ E(s). Val^σ_1(Φ)(t) > Val^σ_1(Φ)(s)}.
2.1 (Value improvement) if I ≠ ∅ then choose σ′ as follows:
    σ′(s) = σ(s) for s ∈ S1 \ I; and
    σ′(s) = t for s ∈ I, where t ∈ E(s) is such that Val^σ_1(Φ)(t) > Val^σ_1(Φ)(s).
2.2 (Qualitative improvement) else
    for each value class VC^σ(r) with r < 1 do
        Let G^r be the 2-player game Tr^as_1(Tr^win_2(G ↾ VC^σ(r))).
        Let Ū_r be the set of sure winning states for player 1 in G^r;
        let U_r be the corresponding set in G; and
        let σ^r be the sure winning strategy for player 1 in Ū_r.
        Choose σ′(s) = Tr^as_1(σ^r ↾ Ū_r)(s) for all states s in U_r; and
        σ′(s) = σ(s) for all states s in VC^σ(r) \ U_r.
return σ′.
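For orientation, the control flow of Improve translates into the following Python skeleton. Everything game-specific is left abstract: solve_mdp_streett (Step 1, the quantitative solution of the player-2 MDP G_σ), the transformations tr1_as and tr_win2, the 2-player Rabin solver, and the helpers that map the transformed game back to G (restrict, project, project_strategy) are assumed interfaces, not implementations.

    def improve(G, sigma, solve_mdp_streett, tr1_as, tr_win2, solve_2pl_rabin):
        # Step 1: val[s] = Val_1^sigma(Phi)(s), from the MDP G_sigma.
        val = solve_mdp_streett(G, sigma)
        # Step 2: states admitting a value improvement.
        I = [s for s in G.player1_states
             if any(val[t] > val[s] for t in G.edges[s])]
        new_sigma = dict(sigma)
        if I:
            # Step 2.1 (value improvement): switch to a better successor.
            for s in I:
                new_sigma[s] = max(G.edges[s], key=lambda t: val[t])
        else:
            # Step 2.2 (qualitative improvement): one 2-player Rabin game
            # per value class with value r < 1.
            for r in sorted({val[s] for s in G.states if val[s] < 1}):
                VC_r = [s for s in G.states if val[s] == r]
                Gr = tr1_as(tr_win2(G.restrict(VC_r)))       # hypothetical helpers
                U_bar, sigma_r = solve_2pl_rabin(Gr)         # sure-winning set/strategy
                for s in G.project(U_bar):                   # the set U_r in G
                    new_sigma[s] = G.project_strategy(sigma_r, s)  # hypothetical
        return new_sigma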
Algorithm 4 StrategyImprovementAlgorithm
Input: A 2½-player game graph G and a Rabin objective Φ for player 1.
Output: An optimal strategy σ∗ for player 1.
1. Choose an arbitrary pure memoryless strategy σ for player 1.
2. while σ ≠ Improve(G, σ) do σ = Improve(G, σ).
3. return σ∗ = σ.
Algorithm 5 RandomizedAlgorithm (2½-player Rabin games)
Input: a 2½-player game graph G ∈ G(l, m), a Rabin objective Rabin(P) for player 1,
and an initial strategy σ_0 for player 1.
Output: an optimal strategy σ∗ for player 1.
1. (Step 1) Collect a set I of r pairs (G̃, σ) of subgames G̃ of G and
   corresponding strategies σ in G̃ such that σ_0 ≺ σ.
   (This is achieved by the procedure ImprovingSubgames below.)
2. (Step 2) Select a pair (G̃, σ) from I uniformly at random.
   2.1 Find an optimal strategy σ∗ in G̃ by applying the algorithm recursively,
       with σ as the initial strategy.
3. (Step 3) if σ∗ is an optimal strategy in the original game G then return σ∗.
   else let σ = Improve(G, σ∗), and goto Step 1 with G and σ as the initial strategy.
procedure ImprovingSubgames
1. Construct a sequence G_0, G_1, ..., G_{r−l} of subgames with G_i ∈ G(l, l+i) as follows:
   1.1 G_0 is the game where each edge is fixed according to σ_0.
   1.2 Let σ^i be an optimal strategy in G_i;
       1.2.1 if σ^i is an optimal strategy in the original game G, then return σ^i.
       1.2.2 else let e be any target edge of Improve(G, σ^i);
             the subgame G_{i+1} is G_i with the edge e added.
2. return r subgames (fixing one of the r edges in G_{r−l}) and the associated strategies.
Chapter 6
Concurrent Reachability Games
In this chapter we present two results.¹ First, we present a simple proof of the fact
that in concurrent reachability games, for all ε > 0, memoryless ε-optimal strategies exist.
A memoryless strategy is independent of the history of the play, and an ε-optimal strategy
achieves the objective with probability within ε of the value of the game. In contrast to
previous proofs of this fact, which rely on the limit behavior of discounted games analyzed
with advanced Puiseux series techniques, our proof is elementary and combinatorial. Second,
we present a strategy-improvement (a.k.a. policy-iteration) algorithm for concurrent games
with reachability objectives.
¹ A preliminary version of this chapter appeared in [CdAH06b].
It has long been known that optimal strategies need not exist for concurrent
reachability games [Eve57], so one must settle for ε-optimality. It was also known that, for
ε > 0, there exist ε-optimal strategies that are memoryless, i.e., strategies that always
choose a probability distribution over moves that depends only on the current state, and
not on the past history of the play [FV97]. Unfortunately, the only previous proof of this
fact is rather complex. The proof considers discounted versions of reachability games,
where a play that reaches the target in k steps is assigned a value of α^k, for some discount
factor 0 < α ≤ 1, rather than value 1. It is possible to show that, for 0 < α < 1, memoryless
optimal strategies always exist. The result for the undiscounted (α = 1) case followed from
an analysis of the limit behavior of such optimal strategies as α → 1. The limit behavior is
studied with the help of results about the field of real Puiseux series [FV97]. This proof idea
works not only for reachability games, but also for total-reward games with nonnegative
rewards (see [FV97] again). A recent result [EY06] established the existence of memoryless
ε-optimal strategies for certain infinite-state (recursive) concurrent games; the proof
relies on results from analysis and analytic properties of certain power series. We show that
the existence of memoryless ε-optimal strategies for concurrent reachability games can be
established by more elementary means. Our proof relies only on combinatorial techniques
and on simple properties of Markov decision processes [Ber95, dA97]. As our proof is easily
accessible, we believe that the proof techniques we use will find future applications in game
theory.
Our proof of the existence of memoryless ε-optimal strategies, for all ε > 0, is built
upon a value-iteration scheme that converges to the value of the game [dAM01]. The
value-iteration scheme computes a sequence u_0, u_1, u_2, ... of valuations, where for i = 0, 1, 2, ...
each valuation u_i associates with each state s of the game a lower bound u_i(s) on the
value of the game, such that lim_{i→∞} u_i(s) converges to the value of the game at s. From
each valuation u_i, we can easily extract a memoryless randomized player-1 strategy, by
considering the (randomized) choice of moves for player 1 that achieves the maximal one-step
expectation of u_i. In general, a strategy σ_i obtained in this fashion is not guaranteed
to achieve the value u_i. We show that σ_i is guaranteed to achieve the value u_i if it is
proper, that is, if regardless of the strategy adopted by player 2, the game reaches with
probability 1 states that are either in the target, or that have no path leading to the target.
Next, we show how to extract from the sequence of valuations u_0, u_1, u_2, ... a sequence of
memoryless randomized player-1 strategies σ_0, σ_1, σ_2, ... that are guaranteed to be proper,
and thus achieve the values u_0, u_1, u_2, .... This proves the existence of memoryless ε-optimal
strategies for all ε > 0.
We then apply the techniques developed for the above proof to obtain a strategy-improvement
algorithm for concurrent reachability games. Strategy-improvement algorithms, also known
as policy-iteration algorithms in the context of Markov decision processes [Der70, Ber95],
compute a sequence of memoryless strategies σ′_0, σ′_1, σ′_2, ... such that, for all k ≥ 0:
(i) the strategy σ′_{k+1} is at all states no worse than σ′_k; (ii) if σ′_{k+1} = σ′_k, then σ′_k is
optimal; and (iii) for every ε > 0, we can find a k sufficiently large so that σ′_k is ε-optimal.
Computing a sequence of strategies σ_0, σ_1, σ_2, ... on the basis of the value-iteration scheme
from above does not yield a strategy-improvement algorithm, as condition (ii) may be violated:
there is no guarantee that a step of the value iteration leads to an improvement of the
strategy. We will show that the key to obtaining a strategy-improvement algorithm consists in
recomputing, at each iteration, the values of the player-1 strategy to be improved, and in
adopting a particular strategy-update rule, which ensures that all the strategies produced
are proper. Unlike previous proofs of strategy-improvement algorithms for concurrent games
[Con93, FV97], which relied on the analysis of discounted versions of the games, our analysis
is again purely combinatorial. In contrast to turn-based games [Con93], for concurrent
games we cannot guarantee the termination of the strategy-improvement algorithm. In fact,
there are games where optimal strategies do not exist, and we can guarantee the existence
of only ε-optimal strategies, for all ε > 0 [Eve57, dAHK98].
6.1 Preliminaries
Destinations of selectors and their memoryless strategies. Given a state s and
selectors ξ_1 and ξ_2 for the two players, we denote by
Succ(s, ξ_1, ξ_2) = ∪_{a_1∈Supp(ξ_1(s)), a_2∈Supp(ξ_2(s))} Succ(s, a_1, a_2)
the set of possible successors of s with respect to the selectors ξ_1 and ξ_2. We write ξ̄ for
the memoryless strategy that consists in playing the selector ξ forever.
Valuations. A valuation is a mapping v : S → [0, 1] associating a real number v(s) ∈ [0, 1]
with each state s. Given two valuations v, w : S → R, we write v ≤ w when v(s) ≤ w(s)
for all states s ∈ S. For an event A, we denote by Pr^{σ,π}(A) the valuation S → [0, 1] defined
for all states s ∈ S by Pr^{σ,π}(A)(s) = Pr^{σ,π}_s(A). Similarly, for a measurable function
f : Ω_s → [0, 1], we denote by E^{σ,π}(f) the valuation S → [0, 1] defined for all s ∈ S by
E^{σ,π}(f)(s) = E^{σ,π}_s(f).
Given a valuation v and two selectors ξ_1 ∈ Λ_1 and ξ_2 ∈ Λ_2, we define the valuations
Pre_{ξ_1,ξ_2}(v), Pre_{1:ξ_1}(v), and Pre_1(v) as follows, for all states s ∈ S:
Pre_{ξ_1,ξ_2}(v)(s) = Σ_{a,b∈A} Σ_{t∈S} v(t) · δ(s, a, b)(t) · ξ_1(s)(a) · ξ_2(s)(b);
Pre_{1:ξ_1}(v)(s) = inf_{ξ_2∈Λ_2} Pre_{ξ_1,ξ_2}(v)(s);
Pre_1(v)(s) = sup_{ξ_1∈Λ_1} inf_{ξ_2∈Λ_2} Pre_{ξ_1,ξ_2}(v)(s).
Intuitively, Pre_1(v)(s) is the greatest expectation of v that player 1 can guarantee at a
successor state of s. Note that, given a valuation v, the computation of Pre_1(v) reduces to
the solution of a zero-sum one-shot matrix game, and can be performed by linear programming.
Similarly, Pre_{1:ξ_1}(v)(s) is the greatest expectation of v that player 1 can guarantee at a
successor state of s by playing the selector ξ_1. Note that all of these operators on valuations
are monotonic: for two valuations v, w, if v ≤ w, then for all selectors ξ_1 ∈ Λ_1 and ξ_2 ∈ Λ_2
we have Pre_{ξ_1,ξ_2}(v) ≤ Pre_{ξ_1,ξ_2}(w), Pre_{1:ξ_1}(v) ≤ Pre_{1:ξ_1}(w), and Pre_1(v) ≤ Pre_1(w).
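The reduction of Pre_1(v)(s) to a one-shot matrix game can be made concrete. A self-contained Python sketch using scipy, assuming a hypothetical dictionary-based game representation (game.states, game.moves1, game.moves2, and game.delta[(s, a, b)] mapping successors to probabilities); the matrix M holds, for each pair of moves, the expectation of v at the successor:

    import numpy as np
    from scipy.optimize import linprog

    def matrix_game_value(M):
        # Value of the one-shot zero-sum game where the row player maximizes:
        # maximize z subject to sum_a x_a * M[a, b] >= z for every column b,
        # with x a probability distribution over rows.
        n_rows, n_cols = M.shape
        c = np.zeros(n_rows + 1); c[-1] = -1.0           # minimize -z
        A_ub = np.hstack([-M.T, np.ones((n_cols, 1))])   # z - x^T M[:, b] <= 0
        b_ub = np.zeros(n_cols)
        A_eq = np.ones((1, n_rows + 1)); A_eq[0, -1] = 0.0
        b_eq = np.array([1.0])                           # sum_a x_a = 1
        bounds = [(0, None)] * n_rows + [(None, None)]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
        return res.x[-1]

    def pre1(game, v):
        # Pre_1(v)(s): one matrix game per state s.
        out = {}
        for s in game.states:
            M = np.array([[sum(p * v[t] for t, p in game.delta[(s, a, b)].items())
                           for b in game.moves2[s]] for a in game.moves1[s]])
            out[s] = matrix_game_value(M)
        return out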
6.2 Markov Decision Processes of Memoryless Strategies
To develop our arguments, we need some facts about one-player versions of concurrent
stochastic games, known as Markov decision processes (MDPs) [Der70, Ber95]. For i ∈
{1, 2}, a player-i MDP (for short, i-MDP) is a concurrent game where, for all states s ∈ S,
we have |Γ_{3−i}(s)| = 1. Given a concurrent game G, if we fix a memoryless strategy
corresponding to a selector ξ_1 for player 1, the game is equivalent to a 2-MDP G_{ξ_1} with
the transition function
δ_{ξ_1}(s, a_2)(t) = Σ_{a_1∈Γ_1(s)} δ(s, a_1, a_2)(t) · ξ_1(s)(a_1),
for all s ∈ S and a_2 ∈ Γ_2(s). Similarly, if we fix selectors ξ_1 and ξ_2 for both players in a
concurrent game G, we obtain a Markov chain, which we denote by G_{ξ_1,ξ_2}.
End components. In an MDP, the sets of states that play a role equivalent to the closed
recurrent classes of Markov chains [Kem83] are called "end components" [CY95, dA97].
Definition 15 (End components) An end component of an i-MDP G, for i ∈ {1, 2}, is
a subset C ⊆ S of the states such that there is a selector ξ for player i so that C is a closed
recurrent class of the Markov chain G_ξ.
It is not difficult to see that an equivalent characterization of an end component C is the
following. For each state s ∈ C, there is a subset M_i(s) ⊆ Γ_i(s) of moves such that:
1. (closed) if a move in M_i(s) is chosen by player i at state s, then all successor states
that are obtained with nonzero probability lie in C; and
2. (recurrent) the graph (C, E), where E consists of the transitions that occur with
nonzero probability when moves in M_i(·) are chosen by player i, is strongly connected
(this characterization can be checked directly, as the sketch below illustrates).
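The following Python sketch checks the two conditions of this characterization for a candidate set C with move sets M(s); mdp.succ(s, a) is assumed to return the states reached from s under move a with nonzero probability (a hypothetical interface):

    def is_end_component(mdp, C, M):
        # (closed): every allowed move keeps the play inside C.
        C = set(C)
        edges = {s: set() for s in C}
        for s in C:
            for a in M[s]:
                succ = set(mdp.succ(s, a))
                if not succ <= C:
                    return False
                edges[s] |= succ
        # (recurrent): (C, edges) is strongly connected, i.e. every state
        # reaches and is reached from an arbitrary root.
        def bfs(adj, src):
            seen, frontier = {src}, [src]
            while frontier:
                u = frontier.pop()
                for t in adj[u]:
                    if t not in seen:
                        seen.add(t); frontier.append(t)
            return seen
        rev = {s: {u for u in C if s in edges[u]} for s in C}
        root = next(iter(C))
        return bfs(edges, root) == C and bfs(rev, root) == C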
The following theorem states that in a 2-MDP, for every strategy of player 2, the set of
states that are visited infinitely often is, with probability 1, an end component. Corollary 6
follows easily from Theorem 34.
Theorem 34 [dA97] For a player-1 selector ξ_1, let 𝒞 be the set of end components of a
2-MDP G_{ξ_1}. For all player-2 strategies π and all states s ∈ S, we have
Pr^{ξ̄_1,π}_s(Müller(𝒞)) = 1.
Corollary 6 For a player-1 selector ξ_1, let 𝒞 be the set of end components of a 2-MDP G_{ξ_1},
and let Z = ∪_{C∈𝒞} C be the set of states of all end components. For all player-2 strategies
π and all states s ∈ S, we have Pr^{ξ̄_1,π}_s(Reach(Z)) = 1.
MDPs with reachability objectives. Given a 2-MDP with a reachability objective
Reach(T) for player 2, where T ⊆ S, the values can be obtained as the solution of a linear
program [FV97]. The linear program has a variable x(s) for every state s ∈ S, and the
objective function and the constraints are as follows:
min Σ_{s∈S} x(s) subject to
x(s) ≥ Σ_{t∈S} x(t) · δ(s, a_2)(t) for all s ∈ S and a_2 ∈ Γ_2(s);
x(s) = 1 for all s ∈ T;
0 ≤ x(s) ≤ 1 for all s ∈ S.
The correctness of the above linear program for computing the values follows from [Der70,
FV97].
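This linear program can be handed directly to an off-the-shelf LP solver. A Python sketch with scipy, assuming a dictionary-based representation delta[(s, a2)] mapping successor states to probabilities:

    import numpy as np
    from scipy.optimize import linprog

    def mdp_reach_values(states, moves2, delta, T):
        # min sum_s x(s)  s.t.  x(s) >= sum_t x(t)*delta(s,a2)(t) for all a2,
        # x(s) = 1 on T, and 0 <= x(s) <= 1.
        idx = {s: i for i, s in enumerate(states)}
        A_ub, b_ub = [], []
        for s in states:
            for a2 in moves2[s]:
                row = np.zeros(len(states))
                row[idx[s]] -= 1.0
                for t, p in delta[(s, a2)].items():
                    row[idx[t]] += p        # sum_t p*x(t) - x(s) <= 0
                A_ub.append(row); b_ub.append(0.0)
        bounds = [(1.0, 1.0) if s in T else (0.0, 1.0) for s in states]
        res = linprog(np.ones(len(states)), A_ub=np.array(A_ub),
                      b_ub=np.array(b_ub), bounds=bounds)
        return {s: res.x[idx[s]] for s in states}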
6.3 Existence of Memoryless ε-Optimal Strategies
In this section we present an elementary proof of the existence of memoryless ε-optimal
strategies for concurrent reachability games, for all ε > 0 (optimal strategies need not exist
for concurrent games with reachability objectives [Eve57]). A proof of the existence of
memoryless optimal strategies for safety games can be found in [dAM01].
6.3.1 From value iteration to selectors
Consider a reachability game with target T ⊆ S. Let W2 = {s ∈ S | Val_1(Reach(T))(s) = 0}
be the set of states from which player 1 cannot reach the target with positive probability.
From [dAH00], we know that this set can be computed as W2 = lim_{k→∞} W2^k, where
W2^0 = S \ T, and for all k ≥ 0,
W2^{k+1} = {s ∈ S \ T | ∃a_2 ∈ Γ_2(s). ∀a_1 ∈ Γ_1(s). Succ(s, a_1, a_2) ⊆ W2^k}.
The limit is reached in at most |S| iterations. Note that player 2 has a strategy that confines
the game to W2, and that consequently all strategies are optimal for player 1 in W2, as they
realize the value 0 of the game. Therefore, without loss of generality, in the remainder we
assume that all states in W2 and T are absorbing.
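The fixpoint computation of W2 translates directly into code. A Python sketch, assuming hypothetical accessors moves1, moves2, and succ(s, a1, a2) on the game structure:

    def compute_W2(game, T):
        # W2^0 = S \ T; refine until a fixpoint is reached (at most |S| rounds).
        W = set(game.states) - set(T)
        while True:
            W_next = {s for s in set(game.states) - set(T)
                      if any(all(set(game.succ(s, a1, a2)) <= W
                                 for a1 in game.moves1[s])
                             for a2 in game.moves2[s])}
            if W_next == W:
                return W
            W = W_next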
Our first step towards proving the existence of memoryless ε-optimal strategies for
reachability games consists in considering a value-iteration scheme for the computation of
Val_1(Reach(T)). Let [T] : S → [0, 1] be the indicator function of T, defined by [T](s) = 1
for s ∈ T, and [T](s) = 0 for s ∉ T. Let u_0 = [T], and for all k ≥ 0, let
u_{k+1} = Pre_1(u_k).  (6.1)
Note that the classical equation assigns u_{k+1} = [T] ∨ Pre_1(u_k), where ∨ is interpreted as
the maximum in pointwise fashion. Since we assume that all states in T are absorbing, the
classical equation reduces to the simpler equation (6.1). From the monotonicity of Pre_1
it follows that u_k ≤ u_{k+1}, that is, Pre_1(u_k) ≥ u_k, for all k ≥ 0. The result of [dAM01]
establishes by a combinatorial argument that Val_1(Reach(T)) = lim_{k→∞} u_k, where the limit
is interpreted in pointwise fashion. For all k ≥ 0, let the player-1 selector ζ_k be a value-optimal
selector for u_k, that is, a selector such that Pre_1(u_k) = Pre_{1:ζ_k}(u_k). An ε-optimal strategy
σ^k for player 1 can be constructed by applying the sequence ζ_k, ζ_{k−1}, ..., ζ_1, ζ_0, ζ_0, ζ_0, ... of
selectors, where the last selector, ζ_0, is repeated forever. It is possible to prove by induction
on k that
inf_{π∈Π} Pr^{σ^k,π}(∃j ∈ [0..k]. X_j ∈ T) ≥ u_k.
As the strategies σ^k, for k ≥ 0, are not necessarily memoryless, this proof does not suffice for
showing the existence of memoryless ε-optimal strategies. On the other hand, the following
example shows that the memoryless strategy ζ̄_k does not necessarily guarantee the value
u_k.
Example 6 Consider the 1-MDP shown in Figure 6.1. At all states except s_3, the set of
available moves for player 1 is a singleton set. At s_3, the available moves for player 1 are
a and b. The transitions at the various states are shown in the figure. The objective of
player 1 is to reach the state s_0.
We consider the value-iteration procedure and denote by u_k the valuation after k
iterations. Writing a valuation u as the list of values (u(s_0), u(s_1), ..., u(s_4)), we have:
u_0 = (1, 0, 0, 0, 0);
u_1 = Pre_1(u_0) = (1, 0, 1/2, 0, 0);
u_2 = Pre_1(u_1) = (1, 0, 1/2, 1/2, 0);
u_3 = Pre_1(u_2) = (1, 0, 1/2, 1/2, 1/2);
u_4 = Pre_1(u_3) = u_3 = (1, 0, 1/2, 1/2, 1/2).
The valuation u_3 is thus a fixpoint.
[Figure 6.1: An MDP with reachability objective. States s_0, ..., s_4; at s_3 player 1 chooses between moves a and b; the probabilistic edges have probability 1/2.]
Now consider the selector ξ_1 for player 1 that chooses at state s_3 the move a with
probability 1. The selector ξ_1 is optimal with respect to the valuation u_3. However, if
player 1 follows the memoryless strategy ξ̄_1, then the play visits s_3 and s_4 alternately and
reaches s_0 with probability 0. Thus, ξ_1 is an example of a selector that is value-optimal but
not optimal.
On the other hand, consider any selector ξ′_1 for player 1 that chooses move b at
state s_3 with positive probability. Under the memoryless strategy ξ̄′_1, the set {s_0, s_1} of
states is reached with probability 1, and s_0 is reached with probability 1/2. Such a ξ′_1 is
thus an example of a selector that is both value-optimal and optimal.
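Example 6 is small enough to run the value iteration (6.1) directly. The sketch below encodes a transition structure consistent with the example and the computed valuations (s_2 goes to s_0 and s_1 with probability 1/2 each; move a at s_3 goes to s_4, move b goes to s_2; s_4 returns to s_3; this structure is an assumption read off from the figure). Since this is a 1-MDP, only player 1 has choices and Pre_1 reduces to a maximum over moves:

    # delta[s][move] is a distribution over successors.
    delta = {
        's0': {'-': {'s0': 1.0}},                  # target, absorbing
        's1': {'-': {'s1': 1.0}},                  # absorbing, non-target
        's2': {'-': {'s0': 0.5, 's1': 0.5}},
        's3': {'a': {'s4': 1.0}, 'b': {'s2': 1.0}},
        's4': {'-': {'s3': 1.0}},
    }
    u = {s: 1.0 if s == 's0' else 0.0 for s in delta}   # u_0 = [T]
    for _ in range(10):
        u = {s: max(sum(p * u[t] for t, p in dist.items())
                    for dist in delta[s].values())
             for s in delta}
    print(u)   # fixpoint: values 1, 0, 1/2, 1/2, 1/2 for s0, ..., s4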
In the example, the problem is that the strategy ξ̄_1 may cause player 1 to stay forever in
S \ (T ∪ W2) with positive probability. We call "proper" the strategies of player 1 that
guarantee reaching T ∪ W2 with probability 1.
Definition 16 (Proper strategies and selectors) A player-1 strategy σ is proper if for
all player-2 strategies π and all states s ∈ S \ (T ∪ W2), we have
Pr^{σ,π}_s(Reach(T ∪ W2)) = 1. A player-1 selector ξ_1 is proper if the memoryless player-1
strategy ξ̄_1 is proper.
We note that proper strategies are closely related to Condon's notion of a halting game
[Con92]: precisely, a game is halting iff all player-1 strategies are proper. We can check
whether a selector for player 1 is proper by considering only the pure selectors for player 2.
Lemma 38 Given a selector ξ_1 for player 1, the memoryless player-1 strategy ξ̄_1 is
proper iff for every pure selector ξ_2 for player 2 and all states s ∈ S, we have
Pr^{ξ̄_1,ξ̄_2}_s(Reach(T ∪ W2)) = 1.
Proof. We prove the contrapositive. Given a player-1 selector ξ_1, consider the 2-MDP G_{ξ_1}.
If ξ̄_1 is not proper, then by Theorem 34 there must exist an end component
C ⊆ S \ (T ∪ W2) in G_{ξ_1}. Then, from C, player 2 can avoid reaching T ∪ W2 by repeatedly
applying a pure selector ξ_2 that at every state s ∈ C deterministically chooses a move
a_2 ∈ Γ_2(s) such that Succ(s, ξ_1, a_2) ⊆ C. The existence of a suitable ξ_2(s) for all states
s ∈ C follows from the definition of end components.
The following lemma shows that the selector that chooses all available moves
uniformly at random is proper. This fact will be used later to initialize our strategy-improvement
algorithm.
Lemma 39 Let ξ_1^unif be the player-1 selector that at all states s ∈ S \ (T ∪ W2) chooses all
moves in Γ_1(s) uniformly at random. Then ξ_1^unif is proper.
Proof. Assume towards a contradiction that ξ_1^unif is not proper. From Theorem 34, in the
2-MDP G_{ξ_1^unif} there must be an end component C ⊆ S \ (T ∪ W2). Then, when player 1
follows the strategy ξ̄_1^unif, player 2 can confine the game to C. By the definition of ξ_1^unif,
player 2 can then ensure that the game does not leave C regardless of the moves chosen by
player 1, and thus for all strategies of player 1. This contradicts the fact that W2 contains
all states from which player 2 can ensure that T is not reached.
The following lemma shows that if the player-1 selector ζ_k computed by the value-iteration
scheme (6.1) is proper, then the player-1 strategy ζ̄_k guarantees the value u_k, for all k ≥ 0.
Lemma 40 Let v be a valuation such that Pre_1(v) ≥ v and v(s) = 0 for all states s ∈ W2.
Let ξ_1 be a selector for player 1 such that Pre_{1:ξ_1}(v) = Pre_1(v). If ξ_1 is proper, then for
all player-2 strategies π, we have Pr^{ξ̄_1,π}(Reach(T)) ≥ v.
Proof. Consider an arbitrary player-2 strategy π, and for k ≥ 0, let
v_k = E^{ξ̄_1,π}(v(X_k))
be the expected value of v after k steps under ξ̄_1 and π. By induction on k, we can prove
that v_k ≥ v for all k ≥ 0. In fact, v_0 = v, and for k ≥ 0 we have
v_{k+1} ≥ Pre_{1:ξ_1}(v_k) ≥ Pre_{1:ξ_1}(v) = Pre_1(v) ≥ v.
For all k ≥ 0 and s ∈ S, we can write v_k(s) as
v_k(s) = E^{ξ̄_1,π}_s(v(X_k) | X_k ∈ T) · Pr^{ξ̄_1,π}_s(X_k ∈ T)
       + E^{ξ̄_1,π}_s(v(X_k) | X_k ∈ S \ (T ∪ W2)) · Pr^{ξ̄_1,π}_s(X_k ∈ S \ (T ∪ W2))
       + E^{ξ̄_1,π}_s(v(X_k) | X_k ∈ W2) · Pr^{ξ̄_1,π}_s(X_k ∈ W2).
Since v(s) ≤ 1 for s ∈ T, the first term on the right-hand side is at most
Pr^{ξ̄_1,π}_s(X_k ∈ T). For the second term, we have
lim_{k→∞} Pr^{ξ̄_1,π}(X_k ∈ S \ (T ∪ W2)) = 0 by hypothesis, because
Pr^{ξ̄_1,π}(Reach(T ∪ W2)) = 1 and every state in T ∪ W2 is absorbing. Finally, the third
term on the right-hand side is 0, as v(s) = 0 for all states s ∈ W2. Hence, taking the limit
as k → ∞, we obtain
Pr^{ξ̄_1,π}(Reach(T)) = lim_{k→∞} Pr^{ξ̄_1,π}(X_k ∈ T) ≥ lim_{k→∞} v_k ≥ v,
where the last inequality follows from v_k ≥ v for all k ≥ 0. The desired result follows.
6.3.2 From value iteration to optimal selectors
Considering again the value-iteration scheme (6.1), since Val_1(Reach(T)) = lim_{k→∞} u_k, for
every ε > 0 there is a k such that u_k(s) ≥ u_{k−1}(s) ≥ Val_1(Reach(T))(s) − ε at all states
s ∈ S. Lemma 40 indicates that, in order to construct a memoryless ε-optimal strategy, we
need to construct from u_{k−1} a player-1 selector ξ_1 such that:
1. ξ_1 is value-optimal for u_{k−1}, that is, Pre_{1:ξ_1}(u_{k−1}) = Pre_1(u_{k−1}) = u_k; and
2. ξ_1 is proper.
To ensure the construction of a value-optimal, proper selector, we need some definitions.
For r > 0, the value class
U^k_r = {s ∈ S | u_k(s) = r}
consists of the states with value r under the valuation u_k. Similarly, we define
U^k_{⊲⊳r} = {s ∈ S | u_k(s) ⊲⊳ r}, for ⊲⊳ ∈ {<, ≤, ≥, >}. For a state s ∈ S, let
ℓ_k(s) = min{j ≤ k | u_j(s) = u_k(s)} be the entry time of s into U^k_{u_k(s)}, that is, the least
iteration j in which the state s has the same value as in iteration k. For k ≥ 0, we define the
player-1 selector η_k as follows: if ℓ_k(s) > 0, then
η_k(s) = η_{ℓ_k(s)}(s) = arg sup_{ξ_1∈Λ_1} inf_{ξ_2∈Λ_2} Pre_{ξ_1,ξ_2}(u_{ℓ_k(s)−1})(s);
otherwise, if ℓ_k(s) = 0, then η_k(s) = η_{ℓ_k(s)}(s) = ξ_1^unif(s) (this definition is arbitrary, and
it does not affect the remainder of the proof). In words, the selector η_k(s) is an optimal
selector for s at the iteration ℓ_k(s). It follows easily that u_k = Pre_{1:η_k}(u_{k−1}), that is, η_k is
also value-optimal for u_{k−1}, satisfying the first of the above conditions.
To conclude the construction, we need to prove that for k sufficiently large (namely,
for k such that u_k(s) > 0 at all states s ∈ S \ (T ∪ W2)), the selector η_k is proper. To this
end we use Theorem 34, and show that for sufficiently large k no end component of G_{η_k} is
entirely contained in S \ (T ∪ W2).² To reason about the end components of G_{η_k}, for a state
s ∈ S and a player-2 move a_2 ∈ Γ_2(s), we write
Succ_k(s, a_2) = ∪_{a_1∈Supp(η_k(s))} Succ(s, a_1, a_2)
for the set of possible successors of state s when player 1 follows the strategy η̄_k and player 2
chooses the move a_2.
² In fact, the result holds for all k, even though our proof, for the sake of a simpler argument, does not
show it.
Lemma 41 Let 0 < r ≤ 1 and k ≥ 0, and consider a state s ∈ S \ (T ∪ W2) such that
s ∈ U^k_r. For all moves a_2 ∈ Γ_2(s), we have:
1. either Succ_k(s, a_2) ∩ U^k_{>r} ≠ ∅,
2. or Succ_k(s, a_2) ⊆ U^k_r, and there is a state t ∈ Succ_k(s, a_2) with ℓ_k(t) < ℓ_k(s).
Proof. For convenience, let m = ℓ_k(s), and consider any move a_2 ∈ Γ_2(s).
• Consider first the case that Succ_k(s, a_2) ⊈ U^k_r. Then it cannot be that
Succ_k(s, a_2) ⊆ U^k_{≤r}; otherwise, for all states t ∈ Succ_k(s, a_2) we would have u_k(t) ≤ r,
and there would be at least one state t ∈ Succ_k(s, a_2) such that u_k(t) < r, contradicting
u_k(s) = r and Pre_{1:η_k}(u_{k−1}) = u_k. So it must be that Succ_k(s, a_2) ∩ U^k_{>r} ≠ ∅.
• Consider now the case that Succ_k(s, a_2) ⊆ U^k_r. Since u_m ≤ u_k, due to the monotonicity
of the Pre_1 operator and (6.1), we have u_{m−1}(t) ≤ r for all states t ∈ Succ_k(s, a_2).
From r = u_k(s) = u_m(s) = Pre_{1:η_k}(u_{m−1})(s), it follows that u_{m−1}(t) = r for all states
t ∈ Succ_k(s, a_2), implying that ℓ_k(t) < m for all states t ∈ Succ_k(s, a_2).
The above lemma states that under η_k, from each state s ∈ U^k_r with r > 0, we
are guaranteed a probability bounded away from 0 of either moving to a higher value class
U^k_{>r}, or of moving to states within the value class that have a strictly lower entry time.
Note that the states in the target set T all belong to the value class for value 1 with entry
time 0. This implies that every state in S \ W2 has a probability bounded away from zero
of reaching T in at most n = |S| steps, so that the probability of staying forever in
S \ (T ∪ W2) is 0. To prove this fact formally, we analyze the end components of G_{η_k} in
light of Lemma 41.
Lemma 42 For all k ≥ 0, if for all states s ∈ S \ W2 we have u_{k−1}(s) > 0, then for all
player-2 strategies π, we have Pr^{η̄_k,π}(Reach(T ∪ W2)) = 1.
Proof. Since every state s ∈ T ∪ W2 is absorbing, in view of Corollary 6 it suffices to show
that no end component of G_{η_k} is entirely contained in S \ (T ∪ W2). Towards a
contradiction, assume there is such an end component C ⊆ S \ (T ∪ W2). Then we have
C ⊆ U^k_{[r_1,r_2]} with C ∩ U^k_{r_2} ≠ ∅, for some 0 < r_1 ≤ r_2 ≤ 1, where
U^k_{[r_1,r_2]} = U^k_{≥r_1} ∩ U^k_{≤r_2} is the union of the value classes for all values in the interval
[r_1, r_2]. Consider a state s ∈ U^k_{r_2} ∩ C with minimal ℓ_k, that is, such that ℓ_k(s) ≤ ℓ_k(t) for
all other states t ∈ U^k_{r_2} ∩ C. From Lemma 41, it follows that for every move a_2 ∈ Γ_2(s)
there is a state t ∈ Succ_k(s, a_2) such that (i) either t ∈ U^k_{r_2} and ℓ_k(t) < ℓ_k(s), or
(ii) t ∈ U^k_{>r_2}. In both cases we obtain a contradiction: choosing a_2 inside the end
component (so that Succ_k(s, a_2) ⊆ C), case (i) contradicts the minimality of ℓ_k(s), and
case (ii) contradicts C ⊆ U^k_{≤r_2}.
The above lemma shows that η_k satisfies both requirements for optimal selectors
spelled out at the beginning of Section 6.3.2. Hence, η_k guarantees the value u_k. This proves
the existence of memoryless ε-optimal strategies for concurrent reachability games.
Theorem 35 (Memoryless ε-optimal strategies) For every ε > 0, memoryless
ε-optimal strategies exist for all concurrent games with reachability objectives.
Proof. Consider a concurrent reachability game with target T ⊆ S. Since
lim_{k→∞} u_k = Val_1(Reach(T)), for every ε > 0 we can find k ∈ N such that the following two
assertions hold:
max_{s∈S} (Val_1(Reach(T))(s) − u_{k−1}(s)) < ε;
min_{s∈S\W2} u_{k−1}(s) > 0.
By construction, Pre_{1:η_k}(u_{k−1}) = Pre_1(u_{k−1}) = u_k. Hence, from Lemma 40 and
Lemma 42, for all player-2 strategies π, we have Pr^{η̄_k,π}(Reach(T)) ≥ u_{k−1}, leading to
the result.
6.4 Strategy Improvement
In the previous section, we provided a proof of the existence of memoryless ε-optimal
strategies for all ε > 0, on the basis of a value-iteration scheme. In this section we present a
strategy-improvement algorithm for concurrent games with reachability objectives. The
algorithm produces a sequence of selectors γ_0, γ_1, γ_2, ... for player 1, such that:
1. for all i ≥ 0, we have Val^{γ̄_i}_1(Reach(T)) ≤ Val^{γ̄_{i+1}}_1(Reach(T));
2. lim_{i→∞} Val^{γ̄_i}_1(Reach(T)) = Val_1(Reach(T)); and
3. if there is i ≥ 0 such that γ_i = γ_{i+1}, then Val^{γ̄_i}_1(Reach(T)) = Val_1(Reach(T)).
Condition 1 guarantees that the algorithm computes a sequence of monotonically improving
selectors. Condition 2 guarantees that the value guaranteed by the selectors converges to
the value of the game, or equivalently, that for all ε > 0, there is a number i of iterations
such that the memoryless player-1 strategy γ̄_i is ε-optimal. Condition 3 guarantees that
if a selector cannot be improved, then it is optimal. Note that for concurrent reachability
games there may be no i ≥ 0 such that γ_i = γ_{i+1}, that is, the algorithm may fail to generate
an optimal selector. This is because there are concurrent reachability games that do not
admit optimal strategies, but only ε-optimal strategies for all ε > 0 [Eve57, dAHK98]. For
turn-based reachability games, it can easily be seen that our algorithm terminates with an
optimal selector.
We note that the value-iteration scheme of the previous section does not directly
yield a strategy-improvement algorithm. In fact, the sequence of player-1 selectors
η_0, η_1, η_2, ... computed in Section 6.3.1 may violate Condition 3: it is possible that for some
i ≥ 0 we have η_i = η_{i+1}, but η_i ≠ η_j for some j > i. This is because the scheme of
Section 6.3.1 is fundamentally a value-iteration scheme, even though a selector is extracted
from each valuation. The scheme guarantees that the valuations u_0, u_1, u_2, ... defined as in
(6.1) converge, but it does not guarantee that the selectors η_0, η_1, η_2, ... improve at each
iteration.
The strategy-improvement algorithm presented here shares an important connection
with the proof of the existence of memoryless ε-optimal strategies presented in the
previous section. Here, also, the key is to ensure that all generated selectors are proper.
Again, this is ensured by modifying the selectors, at each iteration, only where they can be
improved.
6.4.1 The strategy-improvement algorithm
Ordering of strategies. We let W2 be as in Section 6.3.1, and again we assume without
loss of generality that all states in W2 ∪ T are absorbing. We define a preorder ≺ on the
strategies for player 1 as follows: given two player-1 strategies σ and σ′, let σ ≺ σ′ if the
following two conditions hold: (i) Val^σ_1(Reach(T)) ≤ Val^{σ′}_1(Reach(T)); and
(ii) Val^σ_1(Reach(T))(s) < Val^{σ′}_1(Reach(T))(s) for some state s ∈ S. Furthermore, we
write σ ⪯ σ′ if either σ ≺ σ′ or σ = σ′.
Informal description of Algorithm 6. We now present the strategy-improvement
algorithm (Algorithm 6) for computing the values at all states in S \ (T ∪ W2). The algorithm
iteratively improves player-1 strategies according to the preorder ≺. The algorithm starts
with the random selector γ_0 = ξ_1^unif. At iteration i+1, the algorithm considers the
memoryless player-1 strategy γ̄_i and computes the valuation Val^{γ̄_i}_1(Reach(T)). Observe
that, since γ̄_i is a memoryless strategy, the computation of Val^{γ̄_i}_1(Reach(T)) involves
solving the 2-MDP G_{γ_i}. The valuation Val^{γ̄_i}_1(Reach(T)) is named v_i. At all states s
such that Pre_1(v_i)(s) > v_i(s), the memoryless strategy at s is modified to a selector that is
value-optimal for v_i. The algorithm then proceeds to the next iteration. If Pre_1(v_i) = v_i,
the algorithm stops and returns the memoryless strategy γ̄_i, which is optimal for player 1.
Unlike strategy-improvement algorithms for turn-based games (see [Con93] for a survey),
Algorithm 6 is not guaranteed to terminate, because the value of a reachability game may
not be rational.
6.4.2 Convergence
Lemma 43 Let γ_i and γ_{i+1} be the player-1 selectors obtained at iterations i and i+1 of
Algorithm 6. If γ_i is proper, then γ_{i+1} is also proper.
Proof. Assume towards a contradiction that γ_i is proper and γ_{i+1} is not. Let ξ_2 be a
pure selector for player 2 witnessing that γ_{i+1} is not proper. Then there exists a subset
C ⊆ S \ (T ∪ W2) such that C is a closed recurrent set of states in the Markov chain
G_{γ_{i+1},ξ_2}. Let I be the nonempty set of states where the selector is modified to obtain γ_{i+1}
from γ_i; at all other states, γ_i and γ_{i+1} agree.
Since γ_i and γ_{i+1} agree at all states other than the states in I, and γ_i is a proper
strategy, it follows that C ∩ I ≠ ∅. Let
U^i_r = {s ∈ S \ (T ∪ W2) | Val^{γ̄_i}_1(Reach(T))(s) = v_i(s) = r} be the value class with value
r at iteration i. For a state s ∈ U^i_r the following assertion holds: if
Succ(s, γ_i, ξ_2) ⊈ U^i_r, then Succ(s, γ_i, ξ_2) ∩ U^i_{>r} ≠ ∅. Let z = max{r | U^i_r ∩ C ≠ ∅},
that is, U^i_z is the greatest value class at iteration i with a nonempty intersection with the
closed recurrent set C. It easily follows that 0 < z < 1. Consider any state s ∈ I ∩ C, and
let s ∈ U^i_q. Since Pre_1(v_i)(s) > v_i(s), it follows that Succ(s, γ_{i+1}, ξ_2) ∩ U^i_{>q} ≠ ∅. Hence
we must have z > q, and therefore I ∩ C ∩ U^i_z = ∅. Thus, for all states s ∈ U^i_z ∩ C,
we have γ_i(s) = γ_{i+1}(s). Recall that z is the greatest value class at iteration i with a
nonempty intersection with C; hence U^i_{>z} ∩ C = ∅. Thus, for all states s ∈ C ∩ U^i_z, we
have Succ(s, γ_{i+1}, ξ_2) ⊆ U^i_z ∩ C. It follows that C ⊆ U^i_z. However, this gives us three
statements that together form a contradiction: C ∩ I ≠ ∅ (or else γ_i would not have been
proper), I ∩ C ∩ U^i_z = ∅, and C ⊆ U^i_z.
Lemma 44 For all i ≥ 0, the player-1 selector γ_i obtained at iteration i of Algorithm 6 is
proper.
Proof. By Lemma 39, γ_0 is proper. The result then follows from Lemma 43 and induction.
Lemma 45 Let γ_i and γ_{i+1} be the player-1 selectors obtained at iterations i and i+1
of Algorithm 6. Let I = {s ∈ S | Pre_1(v_i)(s) > v_i(s)}, where v_i = Val^{γ̄_i}_1(Reach(T)) and
v_{i+1} = Val^{γ̄_{i+1}}_1(Reach(T)). Then v_{i+1}(s) ≥ Pre_1(v_i)(s) for all states s ∈ S; and
therefore v_{i+1}(s) ≥ v_i(s) for all states s ∈ S, and v_{i+1}(s) > v_i(s) for all states s ∈ I.
Proof. Consider the valuations v_i and v_{i+1} obtained at iterations i and i+1, respectively,
and let w_i be the valuation defined by w_i(s) = 1 − v_i(s) for all states s ∈ S. Since γ_{i+1} is
proper (by Lemma 44), a counter-optimal strategy for player 2 that minimizes v_{i+1} is
obtained by maximizing the probability of reaching W2; in fact, there are no end
components in S \ (W2 ∪ T) in the 2-MDP G_{γ_{i+1}}. Let
w_{i+1}(s) = w_i(s) if s ∈ S \ I;
w_{i+1}(s) = 1 − Pre_1(v_i)(s) (< w_i(s)) if s ∈ I.
In other words, w_{i+1} = 1 − Pre_1(v_i), and we also have w_{i+1} ≤ w_i. We now show that
w_{i+1} is a feasible solution to the linear program for MDPs with the objective Reach(W2), as
described in Section 6.2. Since v_i = Val^{γ̄_i}_1(Reach(T)), it follows that for all states s ∈ S
and all moves a_2 ∈ Γ_2(s), we have
w_i(s) ≥ Σ_{t∈S} w_i(t) · δ_{γ_i}(s, a_2)(t).
For all states s ∈ S \ I, we have γ_i(s) = γ_{i+1}(s) and w_{i+1}(s) = w_i(s); since w_{i+1} ≤ w_i,
it follows that for all states s ∈ S \ I and all moves a_2 ∈ Γ_2(s), we have
w_{i+1}(s) ≥ Σ_{t∈S} w_{i+1}(t) · δ_{γ_{i+1}}(s, a_2)(t).
Since for s ∈ I the selector γ_{i+1}(s) is obtained as an optimal selector for Pre_1(v_i)(s), it
follows that for all states s ∈ I and all moves a_2 ∈ Γ_2(s), we have
w_{i+1}(s) ≥ Σ_{t∈S} w_i(t) · δ_{γ_{i+1}}(s, a_2)(t).
Since w_{i+1} ≤ w_i, for all states s ∈ I and all moves a_2 ∈ Γ_2(s), we have
w_{i+1}(s) ≥ Σ_{t∈S} w_{i+1}(t) · δ_{γ_{i+1}}(s, a_2)(t).
Hence w_{i+1} is a feasible solution to the linear program for MDPs with reachability
objectives. Since the reachability valuation for player 2 for Reach(W2) is the least solution
(observe that the objective function of the linear program is minimizing), it follows that
v_{i+1} ≥ 1 − w_{i+1} = Pre_1(v_i). Thus we obtain v_{i+1}(s) ≥ v_i(s) for all states s ∈ S, and
v_{i+1}(s) > v_i(s) for all states s ∈ I.
Theorem 36 (Strategy improvement) The following two assertions hold for Algorithm 6:
1. For all i ≥ 0, we have γ̄_i ⪯ γ̄_{i+1}; moreover, if γ̄_i = γ̄_{i+1}, then γ̄_i is an optimal
strategy.
2. lim_{i→∞} v_i = lim_{i→∞} Val^{γ̄_i}_1(Reach(T)) = Val_1(Reach(T)).
Proof. We prove the two parts as follows.
1. The assertion γ̄_i ⪯ γ̄_{i+1} follows from Lemma 45. If γ̄_i = γ̄_{i+1}, then Pre_1(v_i) = v_i,
indicating that v_i = Val_1(Reach(T)). From Lemma 44 it follows that γ_i is proper.
Since γ_i is proper, by Lemma 40 we have Val^{γ̄_i}_1(Reach(T)) ≥ v_i = Val_1(Reach(T)).
It follows that γ̄_i is optimal for player 1.
2. Let u_0 = [T]; we have u_0 ≤ v_0. For all k ≥ 0, by Lemma 45, we have
v_{k+1} ≥ [T] ∨ Pre_1(v_k). For all k ≥ 0, let u_{k+1} = [T] ∨ Pre_1(u_k). By induction we
conclude that for all k ≥ 0 we have u_k ≤ v_k. Moreover, v_k ≤ Val_1(Reach(T)); that
is, for all k ≥ 0 we have
u_k ≤ v_k ≤ Val_1(Reach(T)).
Since lim_{k→∞} u_k = Val_1(Reach(T)), it follows that
lim_{k→∞} Val^{γ̄_k}_1(Reach(T)) = lim_{k→∞} v_k = Val_1(Reach(T)).
The theorem follows.
6.5 Conclusion
In this chapter we presented an elementary and combinatorial proof of the existence of ε-optimal strategies in concurrent games with reachability objectives, for all ε > 0. We also presented a strategy improvement algorithm.
Algorithm 6 Strategy-Improvement Algorithm
Input: a concurrent game structure G with target set T.
0. Compute W_2 = {s ∈ S | Val_1(Reach(T))(s) = 0}.
1. Let γ_0 = ξ_1^{unif} and i = 0.
2. Compute v_0 = Val_1^{γ_0}(Reach(T)).
3. do {
   3.1. Let I = {s ∈ S \ (T ∪ W_2) | Pre_1(v_i)(s) > v_i(s)}.
   3.2. Let ξ_1 be a player-1 selector such that for all states s ∈ I, we have Pre_{1:ξ_1}(v_i)(s) = Pre_1(v_i)(s) > v_i(s).
   3.3. The player-1 selector γ_{i+1} is defined as follows: for each state s ∈ S, let γ_{i+1}(s) = γ_i(s) if s ∉ I, and γ_{i+1}(s) = ξ_1(s) if s ∈ I.
   3.4. Compute v_{i+1} = Val_1^{γ_{i+1}}(Reach(T)).
   3.5. Let i = i + 1.
} until I = ∅.
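To make the iteration structure concrete, the following is a minimal Python sketch of Algorithm 6. The helpers value_of, pre1, and optimal_selector_for — computing Val_1^γ(Reach(T)) by solving the 2-MDP induced by the selector γ, the one-step operator Pre_1, and a selector achieving Pre_1(v)(s), respectively — are assumptions of the sketch, not part of the algorithm's statement, as are the attributes game.states and game.uniform_selector.

```python
# A minimal sketch of Algorithm 6, with the value computations left to
# assumed helpers (see the lead-in above).

def strategy_improvement(game, T, W2, value_of, pre1, optimal_selector_for):
    gamma = {s: game.uniform_selector(s) for s in game.states}  # gamma_0
    v = value_of(gamma)                                         # v_0
    while True:
        pre_v = pre1(v)
        # I: states (outside T and W2) where one step of Pre_1 improves on v
        I = [s for s in game.states
             if s not in T and s not in W2 and pre_v[s] > v[s]]
        if not I:
            return gamma, v               # Pre_1(v) = v: gamma is optimal
        for s in I:                       # switch selectors only on I
            gamma[s] = optimal_selector_for(v, s)
        v = value_of(gamma)               # v_{i+1}
```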
Chapter 7
Concurrent Limit-average Games
In this chapter we will consider concurrent games with limit-average objectives.¹ The main result of this chapter is as follows: the value of a concurrent zero-sum game with limit-average payoff can be approximated to within ε in time exponential in a polynomial in the size of the game times a polynomial in log(1/ε), for all ε > 0. Our main technique is the characterization of values as semi-algebraic quantities [BK76, MN81]. We show that for a real number α, whether the value of a concurrent limit-average game at a state s is strictly greater than α can be expressed as a sentence in the theory of real-closed fields. Moreover, this sentence is polynomial in the size of the game and has a constant number of quantifier alternations. The theory of real-closed fields is decidable in time exponential in the size of the formula and doubly exponential in the quantifier alternation depth [Bas99]. This, together with binary search over the range of values, gives an algorithm exponential in a polynomial in the size of the game graph times a polynomial in log(1/ε) to approximate the value, for ε > 0. Our techniques combine several known results to provide the first complexity bound on the general problem of approximating the value of stochastic games with limit-average objectives.
¹ Preliminary versions of the results of this chapter appeared in [CMH07].
7.1 Definitions
We start with a few basic definitions.
Concurrent limit-average games. We consider zero-sum concurrent limit-average games, which consist of a concurrent game structure G = (S, A, Γ_1, Γ_2, δ) and a reward function r: S → ℝ that maps every state to a real-valued reward.
Size of a concurrent game. We now present a few notations that we will require to
precisely characterize the complexity of concurrent limit-average games. Given a concurrent
game G we use the following notations:
1. n = |S| is the number of states;
2. |δ| = Σ_{s∈S} |Γ_1(s)| · |Γ_2(s)| is the number of entries of the transition function.
Given a rational concurrent game (where all rewards and transition probabilities are rational) we use the following notations:
1. size(δ) = Σ_{s∈S} Σ_{a∈Γ_1(s)} Σ_{b∈Γ_2(s)} Σ_{t∈S} |δ(s, a, b)(t)|, where |δ(s, a, b)(t)| denotes the space to express δ(s, a, b)(t) in binary;
2. size(r) = Σ_{s∈S} |r(s)|, where |r(s)| denotes the space to express r(s) in binary;
3. |G| = size(G) = size(δ) + size(r).
The specification of a game G requires O(|G|) bits. Given a stochastic game with n states,
we assume without loss of generality that the state space of the stochastic game structure
is enumerated as natural numbers, S = {1, 2, . . . , n}, i.e., the states are numbered from 1
to n.
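As a small illustration of these size measures, the following sketch computes n, |δ|, and size(G) for a game whose transition function is stored as a nested dictionary delta[s][a][b] of successor distributions with Fraction probabilities (an assumed representation):

```python
from fractions import Fraction

def bits(q: Fraction) -> int:
    # space to express q in binary: bits of numerator plus denominator
    return q.numerator.bit_length() + q.denominator.bit_length()

def game_sizes(states, delta, rewards):
    n = len(states)
    # |delta| = sum over s of |Gamma_1(s)| * |Gamma_2(s)|; we assume every
    # move a in delta[s] carries the same set of player-2 moves
    num_entries = sum(len(delta[s]) * len(next(iter(delta[s].values())))
                      for s in states)
    size_delta = sum(bits(p) for s in states for a in delta[s]
                     for b in delta[s][a] for p in delta[s][a][b].values())
    size_r = sum(bits(Fraction(rewards[s])) for s in states)
    return n, num_entries, size_delta + size_r    # last item: |G|
```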
Limit-average objectives. For a reward function r and N ∈ ℕ consider the function Avg(r, N): Ω → ℝ defined as follows: for a play ω = ⟨s_0, s_1, s_2, ...⟩ we have

Avg(r, N)(ω) = (1/N) · Σ_{i=0}^{N−1} r(s_i);

i.e., it is the average of the first N rewards of the play. For a reward function r we consider
two functions LimAvgInf(r): Ω → ℝ and LimAvgSup(r): Ω → ℝ defined as follows: for a play ω = ⟨s_0, s_1, s_2, ...⟩ we have

LimAvgInf(r)(ω) = lim inf_{N→∞} Avg(r, N)(ω);
LimAvgSup(r)(ω) = lim sup_{N→∞} Avg(r, N)(ω).

In other words, these functions specify the "long-run" average of the rewards of the play. Also note that for all plays ω we have LimAvgInf(r)(ω) ≤ LimAvgSup(r)(ω).
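As a quick illustration of Avg(r, N), here is a small sketch on a finite play prefix (on infinite plays only the lim inf and lim sup are defined):

```python
def avg(rewards, play, N):
    """Average of the first N rewards along the play prefix."""
    return sum(rewards[s] for s in play[:N]) / N

# On an ultimately periodic play s0 (c1 ... ck)^omega, LimAvgInf and
# LimAvgSup coincide and equal the mean reward of the cycle c1 ... ck.
rewards = {0: 4.0, 1: 1.0, 2: 3.0}
play = [0, 1, 2, 1, 2, 1, 2]
print(avg(rewards, play, 5))   # (4 + 1 + 3 + 1 + 3) / 5 = 2.4
```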
Valuations and values. A valuation is a mapping v: S → ℝ, associating a real number v(s) with each state s. Given a state s ∈ S, we are interested in finding the maximal payoff that player 1 can ensure against all strategies for player 2, and the maximal payoff that player 2 can ensure against all strategies for player 1. We call this payoff the value of the game G at s for player i ∈ {1, 2}. The values for player 1 and player 2 are given by the valuations v_1: S → ℝ and v_2: S → ℝ, defined for all s ∈ S by
Val_1(LimAvg(r))(s) = sup_{σ∈Σ} inf_{π∈Π} E_s^{σ,π}[LimAvgInf(r)];
Val_2(LimAvg(r))(s) = sup_{π∈Π} inf_{σ∈Σ} E_s^{σ,π}[LimAvgSup(−r)].
Mertens and Neyman [MN81] established the determinacy of concurrent limit-average
games.
Theorem 37 ([MN81]) For all concurrent limit-average games, for all states s, we have
Val 1 (LimAvg(r))(s) + Val 2 (LimAvg(r))(s) = 0.
Stronger notion of existence of values [MN81]. The value for concurrent games exists
in a strong sense [MN81]: for all ε > 0, there exist σ* ∈ Σ and π* ∈ Π such that the following conditions hold:
1. for all σ and π we have

−ε + E_s^{σ,π*}[LimAvgSup(r)] ≤ E_s^{σ*,π}[LimAvgInf(r)] + ε;   (7.1)
2. for all ε_1 > 0, there exists N_0 such that for all σ and π, for all N ≥ N_0 we have

−ε_1 + E_s^{σ,π*}[Avg(r, N)] ≤ E_s^{σ*,π}[Avg(r, N)] + ε_1.   (7.2)
The condition (7.1) is equivalent to the following equality:

sup_{σ∈Σ} inf_{π∈Π} E_s^{σ,π}[LimAvgInf(r)] = inf_{π∈Π} sup_{σ∈Σ} E_s^{σ,π}[LimAvgSup(r)].

7.2 Theory of Real-closed Fields and Quantifier Elimination
Our main technique is to represent the value of a game as a formula in the theory of real-closed fields. We denote by R the real-closed field (ℝ, +, ·, 0, 1, ≤) of the reals with addition and multiplication. In the sequel we write "real-closed field" to denote the real-closed field R. An atomic formula is an expression of the form p < 0 or p = 0, where p is a (possibly) multi-variate polynomial with coefficients in the real-closed field. Coefficients are rationals or symbolic constants (e.g., the symbolic constant e stands for 2.71828...). We will consider the special case when only rational coefficients of the form q_1/q_2, where q_1, q_2 are integers, are allowed. A formula is constructed from atomic formulas by the grammar
ϕ ::= a | ¬a | ϕ ∧ ϕ | ϕ ∨ ϕ | ∃x.ϕ | ∀x.ϕ,
where a is an atomic formula, ¬ denotes complementation, ∧ denotes conjunction, ∨ denotes
disjunction, and ∃ and ∀ denote existential and universal quantification, respectively. We
use the standard abbreviations p ≤ 0 (for p < 0 ∨ p = 0), p ≥ 0 (for ¬(p < 0)), and p > 0 (for ¬(p ≤ 0)).
The semantics of formulas are given in a standard way. A variable x is free in the formula
ϕ if it is not in the scope of a quantifier ∃x or ∀x. A sentence is a formula with no free
variables. A formula is quantifier-free if it does not contain any existential or universal
quantifier. Two formulas ϕ1 and ϕ2 are equivalent if the set of free variables of ϕ1 and
ϕ2 are the same, and for every assignment to the free variables the formula ϕ1 is true if
and only if the formula ϕ2 is true. A formula ϕ admits quantifier elimination if there is an
algorithm to convert it to an equivalent quantifier-free formula. A quantifier elimination
algorithm takes as input a formula ϕ and returns an equivalent quantifier-free formula, if
one exists.
Tarski [Tar51] proved that every formula in the theory of the real-closed field admits quantifier elimination, and (by way of quantifier elimination) that there is an algorithm to decide the truth of a sentence ϕ in the theory of the real-closed field. The complexity of Tarski's algorithm has subsequently been improved, and we now present a result of Basu [Bas99] on the complexity of quantifier elimination for formulas in the theory of the real-closed field.
Complexity of quantifier elimination. We first define the length of a formula ϕ, and then define the size of a formula with rational coefficients. We denote the length and size of ϕ as len(ϕ) and size(ϕ), respectively. The length of a polynomial p is defined as the sum of the lengths of its constituent monomials plus the number of monomials in the polynomial. The length of a monomial is defined as its degree plus the number of variables plus 1 (for the coefficient). For example, the monomial (1/4) · x³ · y² · z has length 6 + 3 + 1 = 10.
Given a polynomial p, the length of both p < 0 and p = 0 is len(p) + 2. This defines the
length of an atomic formula a. The length of a formula ϕ is inductively defined as follows:
len(¬a) = len(a) + 1;
len(ϕ1 ∧ ϕ2 ) = len(ϕ1 ) + len(ϕ2 ) + 1;
len(ϕ1 ∨ ϕ2 ) = len(ϕ1 ) + len(ϕ2 ) + 1;
len(∃x.ϕ) = len(ϕ) + 2;
len(∀x.ϕ) = len(ϕ) + 2.
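The inductive definition translates directly into code. The following is a small sketch (Python 3.10+) that represents formulas as a tiny AST and computes len(ϕ); atomic formulas carry the precomputed length of their polynomial, since polynomial lengths are defined separately above.

```python
from dataclasses import dataclass

@dataclass
class Atom:              # p < 0 or p = 0, with len(p) precomputed
    len_p: int

@dataclass
class Not:               # negation of an atomic formula
    sub: Atom

@dataclass
class BinOp:             # conjunction or disjunction
    left: object
    right: object

@dataclass
class Quant:             # existential or universal quantification
    sub: object

def length(phi) -> int:
    match phi:
        case Atom(len_p):   return len_p + 2              # len(p) + 2
        case Not(sub):      return length(sub) + 1
        case BinOp(l, r):   return length(l) + length(r) + 1
        case Quant(sub):    return length(sub) + 2

# len(exists x. (p < 0 and q = 0)) with len(p) = 5, len(q) = 3:
print(length(Quant(BinOp(Atom(5), Atom(3)))))   # (5+2) + (3+2) + 1 + 2 = 15
```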
Observe that the length of a formula is defined for formulas that may contain symbolic
constants as coefficients. For a formula ϕ with rational coefficients we define its size as follows:
the size of ϕ, i.e., size(ϕ), is defined as the sum of len(ϕ) and the space required to specify
the rational coefficients of the polynomials appearing in ϕ in binary. We state a result
of Basu [Bas99] on the complexity of quantifier elimination for the real-closed field. The
following theorem is a specialization of Theorem 1 of [Bas99]; also see Theorem 14.14 and
Theorem 14.16 of [BPMF].
Theorem 38 [Bas99] Let d, k, m be nonnegative integers, X = {X1 , X2 , . . . , Xk } be a
set of k variables, and P = {p1 , p2 , . . . , pm } be a set of m polynomials over the set X
of variables, each of degree at most d and with coefficients in the real-closed field. Let
X_{[r]}, X_{[r−1]}, ..., X_{[1]} denote a partition of the set X of variables into r subsets such that the set X_{[i]} of variables has size k_i, i.e., k_i = |X_{[i]}| and Σ_{i=1}^{r} k_i = k. Let
Φ = (Qr X[r] ). (Qr−1 X[r−1] ). · · · .(Q2 X[2] ). (Q1 X[1] ). ϕ(p1 , p2 , . . . , pm )
be a sentence with r alternating quantifiers Q_i ∈ {∃, ∀} (i.e., Q_{i+1} ≠ Q_i), and
ϕ(p1 , p2 , . . . , pm ) is a quantifier-free formula with atomic formulas of the form pi ⊲⊳ 0,
where ⊲⊳ ∈ {<, >, =}. Let D denote the ring generated by the coefficients of the polynomials
in P. Then the following assertions hold.
1. There is an algorithm to decide the truth of Φ using

m^{Π_i (k_i + 1)} · d^{Π_i O(k_i)} · len(ϕ)
arithmetic operations (multiplication, addition, and sign determination) in D.
2. If D = Z (the set of integers) and the bit sizes of the coefficients of the polynomials
are bounded by γ, then the bit sizes of the integers appearing in the intermediate
computations of the truth of Φ are bounded by

γ · d^{Π_i O(k_i)}.
The result of part 1 of Theorem 38 holds for sentences with symbolic constants
as coefficients. The result of part 2 of Theorem 38 is for the special case of sentences
with only integer coefficients. Part 2 of Theorem 38 follows from the results of [Bas99],
but is not explicitly stated as a theorem there; for an explicit statement as a theorem, see
Theorem 14.14 and Theorem 14.16 of [BPMF].
Remark 1 Given two integers a and b, let |a| and |b| denote the space to express a and b in binary, respectively. The following assertions hold:
1. given the signs of a and b, the sign determination of a + b can be done in O(|a| + |b|) time, i.e., in linear time, and the sign determination of a · b can be done in O(1) time, i.e., in constant time;
2. addition of a and b can be done in O(|a| + |b|) time, i.e., in linear time; and
3. multiplication of a and b can be done in O(|a| · |b|) time, i.e., in quadratic time.
It follows from the above observations, along with Theorem 38, that if D = Z and the bit
sizes of the coefficients of the polynomials appearing in Φ are bounded by γ, then the truth
of Φ can be determined in time
m^{Π_i O(k_i + 1)} · d^{Π_i O(k_i)} · O(len(ϕ) · γ²).   (7.3)
7.3 Computation of Values in Concurrent Limit-average Games
The values in concurrent limit-average games can be irrational even if all rewards
and transition probability values are rational [RF91]. Hence, we can algorithmically only
approximate the values to within a precision ε, for ε > 0.
Discounted value functions. Let G be a concurrent limit-average game with reward function r. For a real β with 0 < β < 1, the β-discounted value function Val_1^β is defined as follows:

Val_1^β(s) = sup_{σ∈Σ} inf_{π∈Π} β · E_s^{σ,π}[ Σ_{i=1}^{∞} (1 − β)^i · r(X_i) ].

For a concurrent limit-average game G, the β-discounted value function Val_1^β is monotonic with respect to β in a neighborhood of 0 [MN81].
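For intuition about the β-discounted values (and as a contrast to the symbolic approach of this chapter), here is a minimal numerical sketch: Shapley-style value iteration, where each step solves the zero-sum matrix game with entries β·r(s) + (1−β)·Σ_t δ(s,a,b)(t)·v(t) — the expression underlying the polynomials u of the next subsection. The matrix-game value is obtained with scipy's LP solver; the representations delta[s][a][b] (a dict of successor probabilities) and rewards[s] are assumptions. This is only an approximation scheme, not the exact decision procedure developed below.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value of the zero-sum matrix game M (row player maximizes)."""
    m, n = M.shape
    c = np.zeros(m + 1); c[-1] = -1.0              # variables x_1..x_m, v; max v
    A_ub = np.hstack([-M.T, np.ones((n, 1))])      # v - sum_a x_a M[a,b] <= 0
    b_ub = np.zeros(n)
    A_eq = np.ones((1, m + 1)); A_eq[0, -1] = 0.0  # sum_a x_a = 1
    b_eq = np.ones(1)
    bounds = [(0, None)] * m + [(None, None)]
    return linprog(c, A_ub, b_ub, A_eq, b_eq, bounds=bounds).x[-1]

def discounted_values(states, rewards, delta, beta, iters=1000):
    v = {s: 0.0 for s in states}
    for _ in range(iters):                 # contraction mapping: converges
        new_v = {}
        for s in states:
            acts1 = sorted(delta[s])
            acts2 = sorted(delta[s][acts1[0]])   # same moves for all a (assumed)
            M = np.array([[beta * rewards[s] + (1 - beta) *
                           sum(p * v[t] for t, p in delta[s][a][b].items())
                           for b in acts2] for a in acts1])
            new_v[s] = matrix_game_value(M)
        v = new_v
    return v
```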
7.3.1 Sentence for the value of a concurrent limit-average game
We now describe how we can obtain a sentence in the theory of the real-closed
field that states that the value of a concurrent limit-average game at a given state is strictly
greater than α, for a real α. The sentence applies to the case where the rewards and the
transition probabilities are specified as symbolic or rational constants.
Formula for β-discounted value functions. Given a real α and a concurrent limit-average game G, we present a formula in the theory of the real-closed field to express that the β-discounted value Val_1^β(s) at a given state s is strictly greater than α, for 0 < β < 1. A
valuation v ∈ Rn is a vector of reals, and for 1 ≤ i ≤ n, the i-th component of v represents
the value v(i) for state i. For every state s ∈ S and for every move b ∈ Γ2 (s) we define a
polynomial u(s,b,1) for player 1 as a function of x ∈ Dist(Γ1 (s)), a valuation v and 0 < β < 1
as follows:

u_{(s,b,1)}(x, v, β) = β · Σ_{a∈Γ_1(s)} x(a) · r(s) + (1 − β) · Σ_{a∈Γ_1(s)} x(a) · Σ_{t∈S} δ(s, a, b)(t) · v(t) − v(s).
The polynomial u(s,b,1) consists of the variables β, and x(a) for a ∈ Γ1 (s), and v(t) for
t ∈ S. Observe that given a concurrent limit-average game, r(s) and δ(s, a, b)(t) for t ∈ S
and a ∈ Γ1 (s) are rational or symbolic constants given by the game graph, not variables.
The coefficients of the polynomial are r(s) and δ(s, a, b)(t) for a ∈ Γ1 (s) and t ∈ S. Hence
the polynomial has degree 3 and has 1+|Γ1 (s)|+n variables. Similarly, for s ∈ S, a ∈ Γ1 (s),
y ∈ Dist(Γ2 (s)), v ∈ Rn , and 0 < β < 1, we have polynomials u(s,a,2) defined by
u_{(s,a,2)}(y, v, β) = β · Σ_{b∈Γ_2(s)} y(b) · r(s) + (1 − β) · Σ_{b∈Γ_2(s)} y(b) · Σ_{t∈S} δ(s, a, b)(t) · v(t) − v(s).
The sentence stating that Val_1^β(s) is strictly greater than α is as follows. We have variables x_s(a) for s ∈ S and a ∈ Γ_1(s), variables y_s(b) for s ∈ S and b ∈ Γ_2(s), and variables v(1), v(2), ..., v(n). For simplicity we write x_s for the vector of variables x_s(a_1), x_s(a_2), ..., x_s(a_j), where Γ_1(s) = {a_1, a_2, ..., a_j}; y_s for the vector of variables y_s(b_1), y_s(b_2), ..., y_s(b_l), where Γ_2(s) = {b_1, b_2, ..., b_l}; and v for the vector of variables v(1), v(2), ..., v(n). The sentence is as follows:
Φ_β(s, α) = ∃x_1, ..., x_n. ∃y_1, ..., y_n. ∃v.
    Ψ(x_1, x_2, ..., x_n, y_1, y_2, ..., y_n)
    ∧ ⋀_{s∈S, b∈Γ_2(s)} u_{(s,b,1)}(x_s, v, β) ≥ 0
    ∧ ⋀_{s∈S, a∈Γ_1(s)} u_{(s,a,2)}(y_s, v, β) ≤ 0
    ∧ v(s) − α > 0;

where Ψ(x_1, x_2, ..., x_n, y_1, y_2, ..., y_n) specifies the constraints that x_1, x_2, ..., x_n and y_1, y_2, ..., y_n are valid randomized strategies, and is defined as follows:

Ψ(x_1, x_2, ..., x_n, y_1, y_2, ..., y_n) =
    ⋀_{s∈S} ((Σ_{a∈Γ_1(s)} x_s(a)) − 1 = 0) ∧ ⋀_{s∈S, a∈Γ_1(s)} x_s(a) ≥ 0
    ∧ ⋀_{s∈S} ((Σ_{b∈Γ_2(s)} y_s(b)) − 1 = 0) ∧ ⋀_{s∈S, b∈Γ_2(s)} y_s(b) ≥ 0.
The total number of polynomials in Φ_β(s, α) is 1 + Σ_{s∈S}(3·|Γ_1(s)| + 3·|Γ_2(s)| + 2) = O(|δ|).
In the above formula we treat β as a variable; it is a free variable in Φ_β(s, α). Given a concurrent limit-average game G, for all 0 < β < 1, the correctness of Φ_β(s, α) as a specification of Val_1^β(s) > α can be proved from the results of [Sha53]. Also observe that we have a formula in the existential theory of the real-closed field (the sub-class of the theory of the real-closed field where only the existential quantifier is used) that states Val_1^β(s) > α. Since the existential theory of the reals is decidable in PSPACE [Can88], we have the following result.
Theorem 39 Given a rational concurrent limit-average game G, a state s of G, a discount factor β, and a rational α, whether Val_1^β(s) > α can be decided in PSPACE.
Value of a game as limit of discounted games. The result of Mertens-Neyman [MN81]
established that the value of a concurrent limit-average game is the limit of the β-discounted
values, as β goes to 0. Formally, we have
Val_1(LimAvg(r))(s) = lim_{β→0⁺} Val_1^β(s).
Sentence for the value of a concurrent limit-average game. From the characterization of the value of a concurrent limit-average game as the limit of the β-discounted values
and the monotonicity property of the β-discounted values in a neighborhood of 0, we obtain
the following sentence Φ(s, α) stating that the value at state s is strictly greater than α.
In addition to variables for Φβ (s, α), we have the variables β and β1 . The sentence Φ(s, α)
specifies the expression
∃β1 > 0. ∀β ∈ (0, β1 ). Φβ (s, α),
and is defined as follows:

Φ(s, α) = ∃β_1. ∀β. ∃x_1, ..., x_n. ∃y_1, ..., y_n. ∃v.
    β_1 > 0
    ∧ ( β ≤ 0 ∨ β_1 − β ≤ 0
        ∨ ( Ψ(x_1, x_2, ..., x_n, y_1, y_2, ..., y_n)
            ∧ ⋀_{s∈S, b∈Γ_2(s)} u_{(s,b,1)}(x_s, v, β) ≥ 0
            ∧ ⋀_{s∈S, a∈Γ_1(s)} u_{(s,a,2)}(y_s, v, β) ≤ 0
            ∧ v(s) − α > 0 ) );
where Ψ(x_1, x_2, ..., x_n, y_1, y_2, ..., y_n) specifies the constraints that x_1, x_2, ..., x_n and
y1 , y2 , . . . , yn are valid randomized strategies (the same formula used for Φβ (s, α)). Observe that Φ(s, α) contains no free variable (i.e., the variables xs , ys , v, β1 , and β are
quantified). A similar sentence was used in [BK76] for values of discounted games. The
total number of polynomials in Φ(s, α) is O(|δ|); in addition to the O(|δ|) polynomials of
Φβ (s, α) there are 4 more polynomials in Φ(s, α). In the setting of Theorem 38 we obtain
the following bounds for Φ(s, α):
m = O(|δ|);   k = O(|δ|);   Π_i (k_i + 1) = O(|δ|);   r = O(1);   d = 3;   (7.4)
and hence we have

m^{Π_i (k_i + 1)} · d^{Π_i O(k_i)} = O(|δ|)^{O(|δ|)} = 2^{O(|δ|·log(|δ|))}.
Also observe that for a concurrent game G, the sum of the lengths of the polynomials
appearing in the sentence is O(|δ|). The present analysis along with Theorem 38 yields
Theorem 40. The result of Theorem 40 holds for concurrent limit-average games where the
transition probabilities and rewards are specified as symbolic constants.
Theorem 40 Given a concurrent limit-average game G with reward function r, a state s
of G, and a real α, there is an algorithm to decide whether Val_1(LimAvg(r))(s) > α using 2^{O(|δ|·log(|δ|))} · O(|δ|) arithmetic operations (addition, multiplication, and sign determination) in the ring generated by the set
{r(s) | s ∈ S} ∪ {δ(s, a, b)(t) | s, t ∈ S, a ∈ Γ1 (s), b ∈ Γ2 (s)} ∪ {α}.
7.3.2 Algorithmic analysis
For algorithmic analysis we consider rational concurrent games, i.e., concurrent
games such that r(s) and δ(s, a, b)(t) are rational for all states s, t ∈ S, and moves a ∈ Γ1 (s)
and b ∈ Γ2 (s). In the sequel we will only consider rational concurrent games. Given the
sentence Φ(s, α), which specifies that Val_1(LimAvg(r))(s) > α, we first reduce it to an equivalent sentence Φ̂(s, α) as follows.
• For every rational coefficient ℓ = q_1/q_2, where q_1, q_2 ∈ ℤ, appearing in Φ(s, α) we apply the following procedure:
1. introduce a new variable zℓ ;
2. replace ℓ by zℓ in Φ(s, α);
3. add a polynomial q2 · zℓ − q1 = 0 as a conjunct to the quantifier-free body of the
formula; and
4. existentially quantify zℓ in the block of existential quantifiers after quantifying
β1 and β.
Thus we add O(|δ|) variables and polynomials, and increase the degree of the polynomials in Φ(s, α) by 1. Also observe that the coefficients in Φ̂(s, α) are integers, and hence the ring D̂ generated by the coefficients in Φ̂(s, α) is ℤ. Similar to the bounds obtained in (7.4), in the setting of Theorem 38 we obtain the following bounds for Φ̂(s, α):
m̂ = O(|δ|);   k̂ = O(|δ|);   Π_i (k̂_i + 1) = O(|δ|);   r̂ = O(1);   d̂ = 4;
and hence

m̂^{Π_i O(k̂_i + 1)} · d̂^{Π_i O(k̂_i)} = O(|δ|)^{O(|δ|)} = 2^{O(|δ|·log(|δ|))}.
Also observe that the length of the sentence Φ̂(s, α) can be bounded by O(|δ|), and the sum of the bit sizes of the coefficients in Φ̂(s, α) can be bounded by O(|G| + |α|), where |α| is the space required to express α in binary. This along with (7.3) of Remark 1 yields the following result.
Theorem 41 Given a rational concurrent limit-average game G, a state s of G, and a
rational α, there is an algorithm that decides whether Val_1(LimAvg(r))(s) > α in time

2^{O(|δ|·log(|δ|))} · O(|δ|) · O(|G|² + |α|²) = 2^{O(|δ|·log(|δ|))} · O(|G|² + |α|²).
7.3.3 Approximating the value of a concurrent limit-average game
We now present an algorithm that approximates the value Val 1 (LimAvg(r))(s)
within a tolerance of ε > 0. The algorithm (Algorithm 7) is obtained by a binary search
technique along with the result of Theorem 41. Algorithm 7 works for the special case
of normalized rational concurrent games. We first define normalized rational concurrent
games and then present a reduction of rational concurrent games to normalized rational
concurrent games.
Normalized rational concurrent games. A rational concurrent game is normalized if
the reward function satisfies the following two conditions: (1) min{r(s) | s ∈ S} ≥ 0; and
(2) max{r(s) | s ∈ S} ≤ 1.
Reduction. We now present a reduction of rational concurrent games to normalized rational concurrent games, such that by approximating the values of normalized rational
concurrent games we can approximate the values of rational concurrent games. Given a
reward function r : S → R, let
M = max{abs(r(s)) | s ∈ S},
where abs(r(s)) denotes the absolute value of r(s). Without loss of generality we assume
M > 0. Otherwise, r(s) = 0 for all states s ∈ S, and hence Val 1 (LimAvg(r))(s) = 0 for
all states s ∈ S (i.e., the value function can be trivially computed). Consider the reward
function r⁺: S → [0, 1] defined as follows: for s ∈ S we have

r⁺(s) = (r(s) + M) / (2M).
The reward function r⁺ is normalized and the following assertion holds. Let Val_1(LimAvg(r)) and Val_1(LimAvg(r⁺)) denote the value functions for the reward functions r and r⁺, respectively. Then for all states s ∈ S we have

Val_1(LimAvg(r⁺))(s) = (Val_1(LimAvg(r))(s) + M) / (2M).
Hence it follows that for rationals α, l, and u, such that l ≤ u, we have

Val_1(LimAvg(r))(s) > α iff Val_1(LimAvg(r⁺))(s) > (α + M)/(2M);
Val_1(LimAvg(r⁺))(s) ∈ [l, u] iff Val_1(LimAvg(r))(s) ∈ [M·(2l − 1), M·(2u − 1)].
Given a rational ε > 0, to obtain an interval [l₁, u₁] such that u₁ − l₁ ≤ ε and Val_1(LimAvg(r))(s) ∈ [l₁, u₁], we first obtain an interval [l, u] such that u − l ≤ ε/(2M) and Val_1(LimAvg(r⁺))(s) ∈ [l, u]. From the interval [l, u] we obtain the interval [l₁, u₁] = [M·(2l − 1), M·(2u − 1)] such that Val_1(LimAvg(r))(s) ∈ [l₁, u₁] and u₁ − l₁ = 2·M·(u − l) ≤ ε. Hence it suffices to present the algorithm that approximates the values for normalized rational concurrent games.
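A minimal sketch of this reduction, using exact fractions to stay within the rational-game setting (the dictionary representation of the rewards is an assumption):

```python
from fractions import Fraction

def normalize(rewards):
    """Return (r_plus, M) with r_plus(s) = (r(s) + M) / (2M)."""
    M = max(abs(r) for r in rewards.values())
    if M == 0:                       # all rewards 0: value is trivially 0
        return dict(rewards), M
    return {s: (r + M) / (2 * M) for s, r in rewards.items()}, M

def denormalize_interval(l, u, M):
    """Map an interval [l, u] for r_plus back to one for r."""
    return M * (2 * l - 1), M * (2 * u - 1)

rewards = {1: Fraction(-3), 2: Fraction(5), 3: Fraction(0)}
r_plus, M = normalize(rewards)       # M = 5; r_plus = {1: 1/5, 2: 1, 3: 1/2}
```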
Running time of Algorithm 7. In Algorithm 7 we denote by Φ(s, m) the sentence specifying that Val_1(LimAvg(r))(s) > m; by Theorem 41 the truth of Φ(s, m) can be decided in time

2^{O(|δ|·log(|δ|))} · O(|G|² + |m|²),

for a concurrent game G, where |m| is the number of bits required to specify m. In Algorithm 7, the variables l and u are initially set to 0 and 1, respectively.
Algorithm 7 Approximating the value of a concurrent limit-average game
Input: a normalized rational concurrent limit-average game G,
a state s of G, and a rational value ε > 0 specifying the desired tolerance.
Output: a rational interval [l, u] such that u − l ≤ 2ε and Val_1(LimAvg(r))(s) ∈ [l, u].
1. l := 0; u := 1; m := 1/2;
2. repeat for ⌈log(1/ε)⌉ steps
   2.1. if Φ(s, m), then
      2.1.1. l := m; u := u; m := (l + u)/2;
   2.2. else
      2.2.1. l := l; u := m; m := (l + u)/2;
3. return [l, u];
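A minimal sketch of the binary search in Algorithm 7: phi stands for the decision oracle of Theorem 41 (an assumed callable answering whether Val_1(LimAvg(r))(s) > m), and exact fractions keep l, u, and m rational throughout.

```python
from fractions import Fraction
from math import ceil, log2

def approximate_value(phi, eps: Fraction):
    """Binary search on [0, 1] for a normalized game; returns (l, u)."""
    l, u = Fraction(0), Fraction(1)
    for _ in range(ceil(log2(1 / eps))):
        m = (l + u) / 2
        if phi(m):       # value > m: raise the lower bound
            l = m
        else:            # value <= m: lower the upper bound
            u = m
    return l, u
```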
Since the game is normalized, the initial values of l and u clearly provide lower and upper bounds on the
value, and provide starting bounds for the binary search. In each iteration of the algorithm,
in Steps 2.1.1 and 2.2.1, there is a division by 2. It follows that after i iterations l, u, and m can be expressed as q/2^i, where q is an integer with q ≤ 2^i. Hence l, u, and m can always be expressed in O(log(1/ε)) bits. The loop in Step 2 runs for ⌈log(1/ε)⌉ = O(log(1/ε)) iterations, and every iteration can be computed in time 2^{O(|δ|·log(|δ|))} · O(|G|² + log²(1/ε)). This gives the following theorem.
Theorem 42 Given a normalized rational concurrent limit-average game G, a state s of G, and a rational ε > 0, Algorithm 7 computes an interval [l, u] such that Val_1(LimAvg(r))(s) ∈ [l, u] and u − l ≤ 2ε, in time

2^{O(|δ|·log(|δ|))} · O(|G|²·log(1/ε) + log³(1/ε)).
The reduction from rational concurrent games to normalized concurrent games shows that for a rational concurrent game G and a rational tolerance ε > 0, to obtain an interval of length at most ε that contains the value Val_1(LimAvg(r))(s), it suffices to obtain an interval of length at most ε/(2M) that contains the value in the corresponding normalized game, where M = max{abs(r(s)) | s ∈ S}. Since M can be expressed in |G| bits, it follows that the size of the normalized game is O(|G|²). Given a tolerance ε > 0 for the rational concurrent game, we need to consider the tolerance ε/(2·M) for the normalized game. The above analysis along with Theorem 42 yields the following corollary (the corollary is obtained from Theorem 42 by substituting |G|² for |G|, and |G|·log(1/ε) for log(1/ε)).
Corollary 7 Given a rational concurrent limit-average game G, a state s of G, and a rational ε > 0, an interval [l, u] such that Val_1(LimAvg(r))(s) ∈ [l, u] and u − l ≤ 2ε can be computed in time

2^{O(|δ|·log(|δ|))} · O(|G|⁵·log(1/ε) + |G|³·log³(1/ε)).
Hence from Theorem 41 and Corollary 7 we obtain the following result.
Theorem 43 Given a rational concurrent limit-average game G, a state s of G, rational
ε > 0, and rational α, the following assertions hold.
1. (Decision problem) Whether Val_1(LimAvg(r))(s) > α can be decided in EXPTIME.
2. (Approximation problem) An interval [l, u] such that u − l ≤ 2ε and Val_1(LimAvg(r))(s) ∈ [l, u] can be computed in EXPTIME.
7.4 Conclusion
We showed that concurrent limit-average games can be solved in EXPTIME. Unfortunately, the only known lower bound on the complexity is PTIME-hardness, which follows from a reduction from alternating reachability. However, from the results of [EY06] it follows that the square-root-sum problem (which is not known to be in NP) can be reduced to the decision problem for concurrent limit-average games. Even for the simpler case of turn-based deterministic limit-average games no polynomial-time algorithm is known [ZP96], and the best known algorithm for turn-based stochastic limit-average games is exponential in the size of the game. In the case of turn-based stochastic games, pure memoryless optimal strategies exist [LL69] and the complexity of turn-based stochastic limit-average games is NP ∩ coNP. Since the number of pure memoryless strategies is at most exponential in the size of the game, there is an exponential-time algorithm to compute the values exactly (not only approximately) for turn-based stochastic limit-average games (also see the survey [NS03]). The main open problems are as follows.
1. Whether a PSPACE algorithm can be obtained for the decision problem or the approximation problem for concurrent limit-average games remains open.
2. Whether a polynomial-time algorithm can be obtained for turn-based stochastic limit-average games and turn-based deterministic limit-average games remains open.
Chapter 8
Concurrent Parity Games
In this chapter we consider concurrent zero-sum games with parity objectives.¹
Concurrent games are substantially more complex than turn-based games in several respects. To see this, consider the structure of optimal strategies. For turn-based stochastic
parity games pure memoryless optimal strategies exist. It is this observation that led to
the NP ∩ coNP results for turn-based parity games. By contrast, in concurrent games,
already for reachability objectives, players must in general play with randomized strategies.
Furthermore, optimal strategies may not exist: rather, for every real ε > 0, the players have
ε-optimal strategies. Even for relatively simple parity winning conditions, such as Büchi
conditions, ε-optimal strategies need both randomization and infinite memory [dAM01].
It is therefore not inconceivable that the complexity of concurrent parity games might be
considerably worse. The only known previous algorithm for computing the value of concurrent parity games is triple-exponential [dAM01]: it is shown in [dAM01] that the value
of concurrent parity games can be characterized as a fixpoint of expressions written in the quantitative µ-calculus, a quantitative extension of the ordinary µ-calculus [Koz83]. The triple-exponential algorithm was obtained via a reduction of the quantitative µ-calculus formula to the theory of the real-closed field, and then using decision procedures for the theory of reals with addition and multiplication [Tar51, Bas99]. This approach fails to provide concise witnesses for ε-optimal strategies. In [dAH00] it is shown that, given a parity game, the problem of deciding whether the value at a state is 1 is in NP ∩ coNP, and there exist concise witnesses for ε-optimal strategies, for ε > 0, for states with optimal value 1.

¹ Preliminary versions of the results of this chapter appeared in [CdAH06a].
In this chapter, we present concise witnesses for ε-optimal strategies, for ε > 0, for concurrent games with parity objectives. We then show that the values can be computed to any desired precision ε > 0 in PSPACE. Also, given a rational α, it can be decided in PSPACE whether the value at a state is greater than α. The basic idea behind the proof, which can no longer rely on the existence of pure memoryless optimal strategies, is as follows. Through a detailed analysis of the branching structure of the stochastic process of the game, we show that we can construct an ε-optimal strategy by stitching together strategies, one per value class. In each value class the witness is obtained as the witness constructed in [dAH00], which satisfies certain local conditions. This gives us a witness for an ε-optimal strategy. The decision procedure guesses and verifies the qualitative witnesses of [dAH00] in each value class, and the local optimality is checked by a formula in the existential theory of the real-closed field. This gives us an NPSPACE algorithm. A detailed analysis of our proof also gives us the following result. We show that in concurrent parity games there exists a sequence of ε-optimal strategies such that the limit of the ε-optimal strategies, for ε → 0, is a memoryless strategy. This result parallels the celebrated result of Mertens-Neyman [MN81] for concurrent games with limit-average objectives, which states that there exist ε-optimal strategies that in the limit coincide with some memoryless strategy (the memoryless strategy corresponds to the memoryless optimal strategies in the discounted games as the discount factor tends to 0). It may be noted that the memoryless strategy with which the ε-optimal strategies coincide in the limit is itself not necessarily ε-optimal.
8.1 Strategy Complexity and Computational Complexity
In this section we construct witnesses for perennial ε-optimal strategies. The
construction is based on a reduction to qualitative analysis.
Reduction to qualitative witness. Recall that a value class VC(Φ, r) is the set of states
s such that the value for player 1 is r. That is, for an objective Φ, we have VC(Φ, r) = {s |
Val 1 (Φ)(s) = r}. By VC(Φ, < r) we denote the set {s | Val 1 (Φ)(s) < r} and similarly we
use VC(Φ, > r) to denote the set {s | Val 1 (Φ)(s) > r}. Intuitively, we can picture the game
as a “quilt” of value classes. Two of the value classes correspond to values 1 (player 1 wins
with probability arbitrarily close to 1) and 0 (player 2 wins with probability arbitrarily close
to 1); the other value classes correspond to intermediate values. We construct a witness
for ε-optimal strategies in a piece-meal fashion. We first show that we can construct,
for each intermediate value class, a strategy that with probability arbitrarily close to 1
guarantees either leaving the class, or winning without leaving the class. Such a strategy
can be constructed using results from [dAH00], and has a concise witness. Second, we show
that the above strategy can be constructed so that when the class is left, it is left via a
locally ε-optimal selector. By stitching together the strategies constructed in this fashion
for the various value classes, we will obtain a single witness for the complete game. The
construction of a strategy in a value class relies on a reduction. We present a few notations and then the reduction.
Value class notations. Let G be a game graph with a parity objective Φ = Parity(p).
For a state s we define the set of allowable supports
OptSupp(s) = {γ ⊆ Γ_1(s) | ∃ξ_1^ℓ ∈ Λ^ℓ(Φ). Supp(ξ_1^ℓ) = γ},
to be the set of supports of locally optimal selectors. For every s ∈ S, we assume that we
have a fixed way to enumerate OptSupp(s) = {γ1 , γ2 , . . . , γk }. For a state s ∈ VC(Φ, r) and
γ ⊆ Γ1 (s), we define the following sets of move pairs: let B = Γ1 (s) \ γ,
Eq(s, γ) = {(a1 , a2 ) ∈ B × Γ2 (s) | Succ(s, a1 , a2 ) ⊆ VC(Φ, r)};
Neg(s, γ) = {(a_1, a_2) ∈ B × Γ_2(s) | Σ_{t∈S} δ(s, a_1, a_2)(t) · Val_1(Φ)(t) < Val_1(Φ)(s)};
Pos(s, γ) = {(a_1, a_2) ∈ B × Γ_2(s) | Succ(s, a_1, a_2) ∩ (S \ VC(Φ, r)) ≠ ∅ and Σ_{t∈S} δ(s, a_1, a_2)(t) · Val_1(Φ)(t) ≥ Val_1(Φ)(s)}.
Observe that (γ × Γ2 (s), Eq(s, γ), Pos(s, γ), Neg(s, γ)) forms a partition of Γ1 (s) × Γ2 (s).
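A small sketch of this classification for a fixed state s in VC(Φ, r), with the value function and transition function given as dictionaries (assumed representations; gamma is the chosen support γ):

```python
def classify(s, gamma, moves1, moves2, delta, val, r):
    """Partition (Gamma_1(s) \\ gamma) x Gamma_2(s) into Eq, Pos, Neg."""
    eq, pos, neg = [], [], []
    for a1 in moves1 - gamma:            # B = Gamma_1(s) \ gamma
        for a2 in moves2:
            dist = delta[(s, a1, a2)]    # successor distribution
            expected = sum(p * val[t] for t, p in dist.items())
            if all(val[t] == r for t in dist):
                eq.append((a1, a2))      # Succ(s,a1,a2) inside VC(Phi, r)
            elif expected < val[s]:
                neg.append((a1, a2))     # expected value drops below r
            else:
                pos.append((a1, a2))     # leaves the class, value preserved
    return eq, pos, neg
```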
Reduction. Let G = (S, A, Γ_1, Γ_2, δ) be a concurrent game structure with parity objective Φ = Parity(p) for player 1. Let ξ̃ associate a locally optimal selector to each γ ∈ OptSupp(s), i.e., we have ξ̃_γ(s) = ξ(s) such that ξ ∈ Λ^ℓ(Φ) and Supp(ξ(s)) = γ. Given the game structure G, the priority function p, the set of selectors ξ̃, and a value class V = VC(Φ, r), we construct a game graph G̃ = (S̃, Ã, Γ̃_1, Γ̃_2, δ̃) with a priority function p̃ as follows.
1. State space. Given a state s, let OptSupp(s) = {γ_1, γ_2, ..., γ_k}. Then we have

S̃ = {s̃ | s ∈ V} ∪ {w_1, w_2} ∪ {(s̃, i) | s ∈ V, i ∈ {1, 2, ..., |OptSupp(s)|}}.
2. Priority function.
(a) p̃(s̃) = p(s) for all s ∈ V.
(b) p̃((s̃, i)) = p(s) for all (s̃, i) ∈ S̃.
(c) p̃(w_1) = 0 and p̃(w_2) = 1.
3. Moves assignment.
(a) Γ̃_1(s̃) = {1, 2, ..., |OptSupp(s)|} and Γ̃_2(s̃) = {a_2}. Note that every s̃ ∈ S̃ is a player-1 turn-based state.
(b) Γ̃_1((s̃, i)) = {i} ∪ (Γ_1(s) \ γ_i) for i ∈ {1, 2, ..., k}, where OptSupp(s) = {γ_1, γ_2, ..., γ_k}, and Γ̃_2((s̃, i)) = Γ_2(s). At state (s̃, i) all the moves in γ_i are collapsed to one move i, and all the moves not in γ_i are still available.
4. Transition function.
(a) The states w_1 and w_2 are absorbing states. Observe that player 1 has value 1 at state w_1 and value 0 at state w_2 for the parity objective Parity(p̃).
(b) For all states s̃ ∈ S̃ we have δ̃(s̃, i, a_2)((s̃, i)) = 1. Hence at state s̃ player 1 can decide which element of OptSupp(s) to play, and if player 1 chooses move i the game proceeds to state (s̃, i).
(c) Transition function at state (s̃, i).
   i. (Case 1.) For a move a_2 ∈ Γ_2(s), if there is a move a_1 ∈ γ_i such that Succ(s, a_1, a_2) ∩ (S \ V) ≠ ∅, then δ̃((s̃, i), i, a_2)(w_1) = 1. The above transition specifies that for a move a_2 for player 2, if there is a move a_1 ∈ γ_i for player 1 such that the game G proceeds to a state not in V with positive probability, then in G̃ the game proceeds with probability 1 to the state w_1, which has value 1 for player 1.
   ii. (Case 2.) For a move a_2 ∈ Γ_2(s), if for every move a_1 ∈ γ_i we have Succ(s, a_1, a_2) ⊆ V, then δ̃((s̃, i), i, a_2)(s̃′) = Σ_{a_1∈γ_i} ξ̃_{γ_i}(s)(a_1) · δ(s, a_1, a_2)(s′), for s′ ∈ V.
   iii. (Case 3.) For move pairs (a_1, a_2) ∈ Eq(s, γ_i) we have δ̃((s̃, i), a_1, a_2)(s̃′) = δ(s, a_1, a_2)(s′), for s′ ∈ V.
   iv. (Case 4.) For move pairs (a_1, a_2) ∈ Pos(s, γ_i) we have δ̃((s̃, i), a_1, a_2)(w_1) = 1.
   v. (Case 5.) For move pairs (a_1, a_2) ∈ Neg(s, γ_i) we have δ̃((s̃, i), a_1, a_2)(w_2) = 1.
We use the following notation for the reduction: for a value class VC(Φ, r) we write (G̃, p̃) = VQR(G, r, ξ̃, p) to denote the reduction.
Proposition 12 Let G be a concurrent game structure with a parity objective Φ = Parity(p). For r > 0, consider the value class VC(Φ, r) and (G̃, p̃) = VQR(G, r, ξ̃, p). Consider the event

A = ⋃_{j=1}^{∞} {X_j = (s̃, i), Y_{1,j} = a_1, Y_{2,j} = a_2 | (s̃, i) ∈ S̃, (a_1, a_2) ∈ Neg(s, γ_i)}.

Let s ∈ V and consider the state s̃. For all strategies σ̃ and π̃ in G̃ we have

Pr_{s̃}^{σ̃,π̃}(Reach({w_2})) = Pr_{s̃}^{σ̃,π̃}(A).
Proof. We first observe that given a state s ∈ VC(Φ, r), for the state s̃, for all a_1 ∈ Γ̃_1(s̃) and a_2 ∈ Γ̃_2(s̃) we have Succ(s̃, a_1, a_2) ⊆ {(s̃, i) | i ∈ {1, 2, ..., |OptSupp(s)|}}. We now consider the following case analysis.
1. If player 1 plays move i at state (s̃, i), then since γ_i ∈ OptSupp(s), for all moves a_1 ∈ γ_i and all moves a_2 ∈ Γ_2(s), we have either (a) Succ(s, a_1, a_2) ⊆ VC(Φ, r), or (b) Succ(s, a_1, a_2) ∩ VC(Φ, > r) ≠ ∅. Hence for the move i of player 1 at state (s̃, i), for all moves a_2 ∈ Γ_2((s̃, i)) = Γ_2(s), we have: (a) if Succ(s, a_1, a_2) ⊆ VC(Φ, r) for all a_1 ∈ γ_i, then Succ((s̃, i), i, a_2) ⊆ S̃ \ {w_1, w_2}; (b) else Succ(s, a_1, a_2) ∩ VC(Φ, > r) ≠ ∅ for some a_1 ∈ γ_i, and then Succ((s̃, i), i, a_2) = {w_1}. That is, for all moves a_2 ∈ Γ_2(s) we have Succ((s̃, i), i, a_2) ⊆ S̃ \ {w_2}.
2. For move pairs (a_1, a_2) ∈ Eq(s, γ_i), we have Succ((s̃, i), a_1, a_2) ⊆ S̃ \ {w_1, w_2}.
3. For move pairs (a_1, a_2) ∈ Pos(s, γ_i), we have Succ((s̃, i), a_1, a_2) = {w_1}.
4. For move pairs (a_1, a_2) ∈ Neg(s, γ_i), we have Succ((s̃, i), a_1, a_2) = {w_2}.
It follows that the probability of reaching w2 is the probability of the event A. Hence the
result follows.
Strategy maps. We consider the reduction (G̃, p̃) = VQR(G, r, ξ̃, p), for r > 0, and define two strategy maps below.
1. Given a strategy σ_ε in the game structure G we construct a projected strategy σ̃_ε = t_1(σ_ε) in the game G̃ as follows:
• σ̃_ε(s̃_0, (s̃_0, i_0), s̃_1, (s̃_1, i_1), ..., (s̃_{k−1}, i_{k−1}), s̃_k)(j) = 1 if and only if γ_j = arg max_{γ∈OptSupp(s_k)} Σ_{a∈γ} σ_ε(s_0, s_1, ..., s_k)(a).
• σ̃_ε(s̃_0, (s̃_0, i_0), s̃_1, (s̃_1, i_1), ..., s̃_k, (s̃_k, j))(j) = Σ_{a∈γ_j} σ_ε(s_0, s_1, ..., s_k)(a), and for all a′ ∉ γ_j we have σ̃_ε(s̃_0, (s̃_0, i_0), s̃_1, (s̃_1, i_1), ..., s̃_k, (s̃_k, j))(a′) = σ_ε(s_0, s_1, ..., s_k)(a′).
2. Given a strategy σ_ε in the game structure G and a strategy π̃ in G̃, we define a strategy π = t_2(σ_ε, π̃) in the game structure G as follows:
• π(s_0, s_1, ..., s_k) = π̃(s̃_0, (s̃_0, i_0), s̃_1, (s̃_1, i_1), ..., s̃_k) such that for all 0 ≤ l ≤ k, we have σ̃_ε(s̃_0, (s̃_0, i_0), s̃_1, (s̃_1, i_1), ..., s̃_l)(i_l) = 1, where σ̃_ε = t_1(σ_ε).
Lemma 46 Let G be a concurrent game structure with a parity objective Φ = Parity(p). For r > 0, consider (G̃, p̃) = VQR(G, r, ξ̃, p). There exists a constant c such that for all ε > 0, for all locally ε-optimal strategies σ_ε in G, for all states s̃ ∈ S̃ \ {w_2}, and for all strategies π̃ in G̃, we have Pr_{s̃}^{σ̃_ε,π̃}(Reach({w_2})) ≤ c · ε, where σ̃_ε = t_1(σ_ε).
Proof. For ε > 0, consider a locally ε-optimal strategy σ_ε. Let

c_1 = min{Val_1(Φ)(s) − Σ_{t∈S} Val_1(Φ)(t) · δ(s, a_1, a_2)(t) | s ∈ S, a_1 ∈ Γ_1(s), a_2 ∈ Γ_2(s), Val_1(Φ)(s) − Σ_{t∈S} Val_1(Φ)(t) · δ(s, a_1, a_2)(t) > 0} > 0.

Since σ_ε is a locally ε-optimal strategy, it follows that the strategy σ̃_ε = t_1(σ_ε) satisfies that at every round j, at state (s̃, i), the move pairs (a_1, a_2) ∈ Neg(s, γ_i) are played with probability at most ε_j, with Σ_{j=1}^{∞} ε_j ≤ (1/c_1) · ε. With c = 1/c_1, we obtain that the probability of the event A (as defined in Proposition 12) is at most c · ε. The result then follows from Proposition 12.
Lemma 47 Let G be a concurrent game structure with a parity objective Φ = Parity(p). For r > 0, consider (G̃, p̃) = VQR(G, r, ξ̃, p). For all states s ∈ VC(Φ, r) the following assertions hold.
1. There exists a constant c such that for all locally ε-optimal and perennial ε-optimal strategies σ_ε ∈ Σ_ε^{PL}(Φ) ∩ Σ_ε^{ℓ}(Φ) in G, with 0 < ε < r/2, and for all strategies π̃ in G̃, we have Pr_{s̃}^{σ̃_ε,π̃}(Φ̃) ≥ 1 − c · ε, where Φ̃ = Parity(p̃) and σ̃_ε = t_1(σ_ε).
2. The state s̃ is a limit-sure winning state in G̃ for the objective Φ̃ = Parity(p̃); i.e., {s̃ | s ∈ VC(Φ, r)} ⊆ Limit_1^{G̃}(Parity(p̃)).
Proof. We prove both parts below.
1. For 0 < ε < r/2, consider a locally ε-optimal and perennial ε-optimal strategy σ_ε. Consider a strategy π̃ in G̃. We construct an extended strategy π for player 2 in G as follows: π = t_2(σ_ε, π̃). Since σ_ε is a perennial ε-optimal strategy, it follows that for all histories ⟨s_0, s_1, ..., s_n⟩ such that for all 0 ≤ i ≤ n, s_i ∈ VC(Φ, r), we have

Pr_s^{σ_ε,π}(Φ | ⟨s_0, s_1, ..., s_n⟩) ≥ r − ε ≥ r/2 > 0.

It follows that for all histories ⟨s_0, s_1, ..., s_n⟩ we have Pr_s^{σ_ε,π}(Φ ∪ Reach(S \ VC(Φ, r)) | ⟨s_0, s_1, ..., s_n⟩) ≥ r − ε ≥ r/2 > 0; i.e., for all n we have Pr_s^{σ_ε,π}(Φ ∪ Reach(S \ VC(Φ, r)) | F_n) ≥ r − ε ≥ r/2 > 0. It follows from Lemma 3 that Pr_s^{σ_ε,π}(Φ ∪ Reach(S \ VC(Φ, r))) = 1. Then by construction we obtain that Pr_{s̃}^{σ̃_ε,π̃}(Φ̃ ∪ Reach({w_1, w_2})) = 1. Since σ_ε is a locally ε-optimal strategy, by Lemma 46 there exists a constant c such that Pr_{s̃}^{σ̃_ε,π̃}(Reach({w_2})) ≤ c·ε, and hence we have Pr_{s̃}^{σ̃_ε,π̃}(Φ̃ ∪ Reach({w_1})) = Pr_{s̃}^{σ̃_ε,π̃}(Φ̃) ≥ 1 − c · ε. The result follows.
2. By Lemma 5 and Proposition 6 it follows that for all ε > 0 we have Σ_ε^{PL}(Φ) ∩ Σ_ε^{ℓ}(Φ) ≠ ∅. The desired result then follows from part (1).
Limit-sure witness [dAH00]. The witness strategy σ for limit-sure winning games constructed in [dAH00] consists of the following parts: a ranking function of the states, and
a ranking function of the actions at a state. The ranking functions were described by a
µ-calculus formula. The witness strategy σ at round k of a play, at a state s, plays the
actions of the least rank at s with positive-bounded probabilities and other actions with
vanishingly small probabilities (as a function of ε), in appropriate proportion as described by
the ranking function. Hence, the strategy σ can be described as
σ = (1 − εk )σℓ + εk · σd (εk ),
where σ_ℓ is a selector ξ such that Supp(ξ) is the set of actions with least rank, and
σd (εk ) denotes a selector with Supp(σd (εk )) = Γ1 \ Supp(σℓ ). Hence the strategy σ plays
the moves in Supp(σd (εk )) with vanishingly small probability as εk → 0. We denote by
limit-sure witness move set the set of actions with the least rank, i.e., Supp(σℓ ). It follows
from the above construction that as ε → 0, the limit-sure winning strategy σ converges to
the memoryless selector σℓ , i.e., the limit of the limit-sure witness strategy is a memoryless
strategy. The following lemma follows from the limit-sure witness strategy construction
in [dAH00]. Lemma 49 is also a direct consequence of the results of [dAH00]. Lemma 49
states that the set of limit-sure winning states of a concurrent game structure is independent
of the precise transition probabilities of the transition function and depends only on the support
of the transition function.
Lemma 48 Let G be a concurrent game structure with a parity objective Φ = Parity(p). For r > 0, consider (G̃, p̃) = VQR(G, r, ξ̃, p). For every state s̃ in G̃ there is a pure memoryless move j for player 1 and a limit-sure winning strategy σ such that Supp(σ)(s̃) = {j} and the limit-sure witness move set at (s̃, j) is {j}.
Proof. The existence of the pure memoryless move is a consequence of the fact that every state s̃ is a player-1 turn-based state, together with the witness construction in [dAH00].
Lemma 49 Let G_1 = (S, A, Γ_1, Γ_2, δ_1) and G_2 = (S, A, Γ_1, Γ_2, δ_2) be two concurrent game structures with the same set S of states, the same set A of moves, and the same move assignment functions Γ_1 and Γ_2. If for all s ∈ S, for all a_1 ∈ Γ_1(s) and a_2 ∈ Γ_2(s) we have Supp(δ_1(s, a_1, a_2)) = Supp(δ_2(s, a_1, a_2)), then for all parity objectives Φ the sets of limit-sure winning states in G_1 and G_2 coincide, i.e., Limit_1^{G_1}(Φ) = Limit_1^{G_2}(Φ).
Simplified construction. From Lemma 48 we conclude that in Lemma 47 it is possible to require that every state s̃ has a single successor and still have every state s̃ in G̃ limit-sure winning. From Lemma 49 we conclude that for the selectors ξ̃ of Lemma 47, the precise transition probabilities do not matter, and only the support matters. We will formalize the result in Lemma 50. We first present another reduction that is not restricted to value classes. The reduction is similar to the reduction VQR, but it takes a subset of moves for every state specified by a function f, and a partition of moves specified by Eq, Pos, and Neg. Given a game graph G and a priority function p, let V ⊆ S be a subset of states. Let f: S → 2^A \ ∅ be a function such that f(s) ⊆ Γ_1(s) for all s ∈ S, and let ξ_f be a selector such that for all s ∈ S we have Supp(ξ_f(s)) = f(s). For s ∈ V, let (Eq(s, f), Pos(s, f), Neg(s, f)) define a partition of the move set (Γ_1(s) \ f(s)) × Γ_2(s) such that for all (a_1, a_2) ∈ Eq(s, f) we have Succ(s, a_1, a_2) ⊆ V. We construct a game graph G̃ = (S̃, Ã, Γ̃_1, Γ̃_2, δ̃) with a priority function p̃ as follows.
1. State space. The state space is as follows: S̃ = V ∪ {w_1, w_2}.
2. Priority function.
(a) p̃(s) = p(s) for all s ∈ V.
(b) p̃(w_1) = 0 and p̃(w_2) = 1.
3. Moves assignment.
(a) Γ̃_1(s) = {1} ∪ (Γ_1(s) \ f(s)) and Γ̃_2(s) = Γ_2(s). At state s ∈ S̃ all the moves in f(s) are collapsed to the single move 1, and all the moves not in f(s) are still available.
4. Transition function.
(a) The states w_1 and w_2 are absorbing states. Observe that player 1 has value 1 at state w_1 and value 0 at state w_2 for the parity objective Parity(p̃).
(b) Transition function at state s.
   i. (Case 1.) For a move a_2 ∈ Γ_2(s), if there is a move a_1 ∈ f(s) such that Succ(s, a_1, a_2) ∩ (S \ V) ≠ ∅, then δ̃(s, 1, a_2)(w_1) = 1. The above transition specifies that for a move a_2 for player 2, if there is a move a_1 ∈ f(s) for player 1 such that the game G proceeds to a state not in V with positive probability, then in G̃ the game proceeds with probability 1 to the state w_1, which has value 1 for player 1.
   ii. (Case 2.) For a move a_2 ∈ Γ_2(s), if for every move a_1 ∈ f(s) we have Succ(s, a_1, a_2) ⊆ V, then δ̃(s, 1, a_2)(s′) = Σ_{a_1∈f(s)} ξ_f(s)(a_1) · δ(s, a_1, a_2)(s′), for s′ ∈ V.
   iii. (Case 3.) For move pairs (a_1, a_2) ∈ Eq(s, f) we have δ̃(s, a_1, a_2)(s′) = δ(s, a_1, a_2)(s′), for s′ ∈ V.
   iv. (Case 4.) For move pairs (a_1, a_2) ∈ Pos(s, f) we have δ̃(s, a_1, a_2)(w_1) = 1.
   v. (Case 5.) For move pairs (a_1, a_2) ∈ Neg(s, f) we have δ̃(s, a_1, a_2)(w_2) = 1.
The main difference from the reduction VQR is as follows: the function f (replacing OptSupp(s)) chooses a single subset of moves, and the functions Eq, Pos, and Neg are given. We refer to this reduction as follows: (G̃, p̃) = QRS(G, V, f, ξ_f, Eq, Pos, Neg, p).
From Lemma 47, Lemma 48, and Lemma 49 we obtain Lemma 50.
Lemma 50 Let G be a concurrent game structure with a parity objective Φ = Parity(p). There is a function f: S → 2^A \ ∅ such that for all s ∈ S we have f(s) ∈ OptSupp(s), and the following assertion holds. For a value class V = VC(Φ, r), for r > 0, let

Eq(s, f) = Eq(s, f(s));   Pos(s, f) = Pos(s, f(s));   Neg(s, f) = Neg(s, f(s));

for s ∈ V, where Eq, Pos, and Neg are as defined for value classes. For all selectors ξ_f such that for all s ∈ S we have Supp(ξ_f(s)) = f(s), in the game (G̃, p̃) = QRS(G, V, f, ξ_f, Eq, Pos, Neg, p) every state s is limit-sure winning for the objective Φ̃ = Parity(p̃), i.e., V ⊆ Limit_1^{G̃}(Parity(p̃)).
Given a concurrent game structure G and a strategy that ensures that certain action pairs are played with very small probabilities, we obtain a bound on the probability of reaching a set of states by considering an MDP. The construction of such an MDP is described below.
MDP construction for partitions. Given a concurrent game structure G = (S, A, Γ_1, Γ_2, δ), let T ⊆ S be a subset of states. Let
1. P = (V_0, V_1, ..., V_k) be a partition of S;
2. f: S → 2^A \ ∅ be a function such that f(s) ⊆ Γ_1(s) for all s ∈ S; and
3. ξ_f be a selector such that for all s ∈ S we have Supp(ξ_f(s)) = f(s).
For s ∈ V_i, let (Eq(s, f), Pos(s, f), Neg(s, f)) define a partition of (Γ_1(s) \ f(s)) × Γ_2(s) such that for all (a_1, a_2) ∈ Eq(s, f) we have Succ(s, a_1, a_2) ⊆ V_i. We now consider a player-2 MDP Ĝ = (S, Â, Γ̂_2, δ̂) as follows:
1. Γ̂_2(s) = ({1} × Γ_2(s)) ∪ Eq(s, f) ∪ Pos(s, f).
2. δ̂(s, (a_1, a_2))(t) = δ(s, a_1, a_2)(t), for t ∈ S and (a_1, a_2) ∈ Eq(s, f) ∪ Pos(s, f); and δ̂(s, (1, a_2))(t) = Σ_{a_1∈f(s)} δ(s, a_1, a_2)(t) · ξ_f(s)(a_1), for t ∈ S and a_2 ∈ Γ_2(s).
3. Â = ∪_{s∈S} Γ̂_2(s).
Intuitively, player 2 can choose moves in Eq(s, f) ∪ Pos(s, f), and if player 2 decides to play (1, a_2), then the process of playing the selector ξ_f against the move a_2 is mimicked. We will use the following notations:

Ĝ = M(G, f, P, ξ_f, Eq, Pos, Neg);
v̂(G, f, P, ξ_f, Eq, Pos, Neg, T)(s) = sup_{π̂∈Π̂} Pr_s^{π̂}(Reach(T));

i.e., we denote the MDP construction by M, and v̂(G, f, P, ξ_f, Eq, Pos, Neg, T)(s) denotes the value at s in the MDP Ĝ for reaching T. The following lemma shows that if a strategy plays action pairs in Neg(s, f) with very small probabilities, and maintains the ratio of the probabilities of the moves in f(s) as specified by ξ_f, then the maximal probability to reach T is bounded approximately by the maximal value to reach T in Ĝ.
Lemma 51 Let G = (S, A, Γ_1, Γ_2, δ) be a concurrent game structure and T ⊆ S. Let P = (V_0, V_1, ..., V_k) be a partition of S, f: S → 2^A \ ∅ a function such that f(s) ⊆ Γ_1(s) for all s ∈ S, and ξ_f a selector such that for all s ∈ S we have Supp(ξ_f(s)) = f(s). For s ∈ V_i, let (Eq(s, f), Pos(s, f), Neg(s, f)) define a partition of (Γ_1(s) \ f(s)) × Γ_2(s) such that for all (a_1, a_2) ∈ Eq(s, f) we have Succ(s, a_1, a_2) ⊆ V_i. Let σ be a strategy such that the following condition holds: for all histories w ∈ S* and all s ∈ S,

σ(w · s)(a) / (Σ_{a_1∈f(s)} σ(w · s)(a_1)) = ξ_f(s)(a),   for all a ∈ f(s);

i.e., the proportions of the probabilities of the actions in f(s) are the same as in ξ_f(s). Consider the event

A = ⋃_{s∈S} ⋃_{j=1}^{∞} {X_j = s, (Y_{1,j}, Y_{2,j}) ∈ Neg(s, f)}.

For ε > 0, for a strategy π for player 2 and a state s ∈ S, if Pr_s^{σ,π}(A) ≤ ε, then Pr_s^{σ,π}(Reach(T)) ≤ v̂(G, f, P, ξ_f, Eq, Pos, Neg, T)(s) + ε.
Proof. Let v̂(s) = v̂(G, f, P, ξ_f, Eq, Pos, Neg, T)(s), and let σ be a strategy satisfying the condition of the lemma: for all histories w ∈ S* and all s ∈ S,

σ(w · s)(a) / (Σ_{a_1∈f(s)} σ(w · s)(a_1)) = ξ_f(s)(a),   for all a ∈ f(s).

Then by the construction of the MDP Ĝ = M(G, f, P, ξ_f, Eq, Pos, Neg), it follows that for all s ∈ S and all π ∈ Π we have

Pr_s^{σ,π}(Reach(T) | A̅) ≤ sup_{π̂∈Π̂} Pr_s^{π̂}(Reach(T)) = v̂(s),

where A̅ is the complement of the event A; i.e., given the event A̅, which ensures that action pairs in Neg(s, f) are never played, the probability to reach T is bounded by the maximal probability to reach T in Ĝ. Hence for all s ∈ S and all π ∈ Π, if Pr_s^{σ,π}(A) ≤ ε, then we have
Pr_s^{σ,π}(Reach(T)) = Pr_s^{σ,π}(Reach(T) | A̅)·Pr_s^{σ,π}(A̅) + Pr_s^{σ,π}(Reach(T) | A)·Pr_s^{σ,π}(A) ≤ v̂(s) + ε.
The desired result follows.
The following two lemmas relate the value of a concurrent game structure with a
parity objective, with qualitative winning in partitions of the state space and reachability
to the limit-sure winning set.
Lemma 52 Given a concurrent game structure G with a parity objective Parity(p), let W_1 = Limit_1(Parity(p)) and W_2 = Limit_2(coParity(p)). Let P = (V_0, V_1, ..., V_k) be a partition of S, f: S → 2^A \ ∅ a function such that f(s) ⊆ Γ_1(s) for all s ∈ S, and ξ_f a selector such that for all s ∈ S we have Supp(ξ_f(s)) = f(s). For s ∈ V_i, let (Eq(s, f), Pos(s, f), Neg(s, f)) define a partition of (Γ_1(s) \ f(s)) × Γ_2(s) such that for all (a_1, a_2) ∈ Eq(s, f) we have Succ(s, a_1, a_2) ⊆ V_i. Suppose the following conditions hold.
1. Assumption 1. V_0 = W_1 and V_k = W_2.
2. Assumption 2. For all 1 ≤ i ≤ k − 1, for all s ∈ V_i:
   • for all a_2 ∈ Γ_2(s), if Succ(s, ξ_f, a_2) ∩ (S \ V_i) ≠ ∅, then Succ(s, ξ_f, a_2) ∩ (∪_{j<i} V_j) ≠ ∅; and
   • Succ(s, a_1, a_2) ∩ (∪_{j<i} V_j) ≠ ∅, for all (a_1, a_2) ∈ Pos(s, f).
3. Assumption 3. For all 1 ≤ i ≤ k − 1, every state s ∈ V_i is limit-sure winning in G̃_i for the objective Parity(p̃_i), where (G̃_i, p̃_i) = QRS(G, V_i, f, ξ_f, Eq, Pos, Neg, p); i.e., V_i ⊆ Limit_1^{G̃_i}(Parity(p̃_i)).
Then for all s ∈ S we have Val_2(coParity(p))(s) ≤ v̂(G, f, P, ξ_f, Eq, Pos, Neg, W_2)(s).
Proof. Given ε > 0, let σ_ε^0 be a strategy such that for all s ∈ V_0 and all strategies π we have Pr_s^{σ_ε^0,π}(Parity(p)) ≥ 1 − ε (such a strategy exists since, by Assumption 1, V_0 = W_1 = Limit_1(Parity(p))). Given ε > 0, for 1 ≤ i ≤ k − 1, let σ_ε^i be a strategy in G̃_i such that for all s ∈ V_i and all strategies π in G̃_i we have Pr_s^{σ_ε^i,π}(Parity(p̃_i)) ≥ 1 − ε (such a strategy exists by Assumption 3). Given ε > 0, fix a sequence ε_1, ε_2, ... such that for all j ≥ 1 we have ε_j > 0 and Σ_{j=1}^{∞} ε_j ≤ ε (e.g., set ε_j = ε/2^j). We construct a strategy σ_ε as follows. For a history w = ⟨s_0, s_1, ..., s_ℓ⟩ let us inductively define num(w) as follows: num(⟨s_0⟩) = 1, and

num(⟨s_0, s_1, ..., s_{ℓ−1}, s_ℓ⟩) = num(⟨s_0, s_1, ..., s_{ℓ−1}⟩)       if s_{ℓ−1}, s_ℓ ∈ V_i, for some i;
num(⟨s_0, s_1, ..., s_{ℓ−1}, s_ℓ⟩) = num(⟨s_0, s_1, ..., s_{ℓ−1}⟩) + 1   if s_{ℓ−1} ∈ V_i, s_ℓ ∈ V_j, V_i ≠ V_j.
That is, num(w) denotes the number of switches of partitions along w. The strategy σ_ε follows the strategy σ_ε^0 upon reaching V_0, and the strategy played upon reaching V_k is irrelevant (i.e., can be fixed arbitrarily). For a history w = ⟨s_0, s_1, ..., s_ℓ⟩ such that for all 0 ≤ j ≤ ℓ, s_j ∈ ∪_{1≤l≤k−1} V_l, the strategy σ_ε(w) is defined as follows: for a ∈ Γ_1(s) \ f(s) we have

σ_ε(w)(a) = σ_{ε_1}^i(w)(a)                      if s_0, ..., s_ℓ ∈ V_i;
σ_ε(w)(a) = σ_{ε_l}^i(⟨s_j, ..., s_ℓ⟩)(a)        otherwise, where s_j, ..., s_ℓ ∈ V_i, s_{j−1} ∉ V_i, num(w) = l;
and for a ∈ f(s) we have

σ_ε(w)(a) = σ_{ε_1}^i(w)(1) · ξ_f(s_ℓ)(a)               if s_0, ..., s_ℓ ∈ V_i;
σ_ε(w)(a) = σ_{ε_l}^i(⟨s_j, ..., s_ℓ⟩)(1) · ξ_f(s_ℓ)(a)  otherwise, where s_j, ..., s_ℓ ∈ V_i, s_{j−1} ∉ V_i, num(w) = l.
The strategy σ_ε, on entering a set V_i, ignores the history and switches to the strategy σ_{ε_l}^i, for histories w with num(w) = l.
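The bookkeeping of num(w) is straightforward; a tiny sketch (with classes mapping each state to the index of its partition class V_i, an assumed representation):

```python
def num(history, classes):
    """Number of partition switches along a history; num(<s0>) = 1."""
    switches = 1
    for prev, cur in zip(history, history[1:]):
        if classes[prev] != classes[cur]:   # crossed into a different V_j
            switches += 1
    return switches
```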
The following assertion holds: for all s ∈ S, for all strategies π, and for all histories w = ⟨s_0, s_1, ..., s_ℓ⟩ with s_ℓ ∈ V_i, for 1 ≤ i ≤ k − 1, we have

Pr_s^{σ_ε,π}(Parity(p) | A) ≥ 1 − ε_l ≥ 1 − ε,   (†)

where A = {w · ω | ω ∈ Safe(V_i)} and num(w) = l. The above property follows since the strategy σ_ε switches to a strategy σ_{ε_l}^i, and the strategy σ_{ε_l}^i ensures that if the game stays in V_i forever, then Parity(p) holds with probability at least 1 − ε_l. We now analyze the
game structures G̃_i, for 1 ≤ i ≤ k − 1. In G̃_i, for s ∈ V_i and (a_1, a_2) ∈ Neg(s, f), we have δ̃(s, a_1, a_2)(w_2) = 1, and w_2 is sure-winning for player 2. Since σ_{ε_l}^i ensures winning with probability 1 − ε_l in V_i, it follows that for all strategies π the action pairs from the set Neg(s, f) are played with total probability less than ε_l over all rounds of play, for histories w with num(w) = l. For (a_1, a_2) ∈ Eq(s, f), we have Supp(δ(s, a_1, a_2)) ⊆ V_i. By Assumption 2 the following conditions hold: (a) for all a_2 ∈ Γ_2(s), if Succ(s, ξ_f, a_2) ∩ (S \ V_i) ≠ ∅, then Succ(s, ξ_f, a_2) ∩ (∪_{j<i} V_j) ≠ ∅; and (b) Succ(s, a_1, a_2) ∩ (∪_{j<i} V_j) ≠ ∅, for all (a_1, a_2) ∈ Pos(s, f). Thus we obtain that for all s ∈ S, for all strategies π, and for all histories w = ⟨s_0, s_1, ..., s_ℓ⟩ with s_ℓ ∈ V_i, for 1 ≤ i ≤ k − 1, we have

Pr_s^{σ_ε,π}(Reach(∪_{j<i} V_j) | B) ≥ c > 0,
for some constant c, where B = {w · ω | ω ∈ Reach(S \ V_i)}. Since there are k partition classes, we obtain that for all s ∈ S, all strategies π, and all histories w = ⟨s_0, s_1, ..., s_ℓ⟩ with s_ℓ ∈ V_i, for 1 ≤ i ≤ k − 1, we have

Pr_s^{σ_ε,π}(Reach(V_0) | B) ≥ c_1 > 0,   (‡)

for some constant c_1 with 0 < c_1 ≤ c^k. For all s ∈ S, all strategies π, and all histories w = ⟨s_0, s_1, ..., s_ℓ⟩ with s_ℓ ∈ V_0 or s_ℓ ∈ V_k, we have

Pr_s^{σ_ε,π}(Reach(V_0 ∪ V_k) | w) = 1.   (§)
Hence it follows from (†), (‡), and (§) that for all s ∈ S, all strategies π, and all n > 0, we have

Pr_s^{σ_ε,π}(Parity(p) ∪ Reach(V_0 ∪ V_k) | F_n) ≥ c_1 > 0,

for some constant c_1. By Lemma 3, for all s ∈ S and all strategies π, we have

Pr_s^{σ_ε,π}(Parity(p) ∪ Reach(V_0 ∪ V_k)) = 1.

Since σ_ε plays σ^0_ε upon reaching V_0, it follows that for all s ∈ S and all strategies π we have Pr_s^{σ_ε,π}(coParity(p) | Reach(V_0)) ≤ ε. Since the strategy σ_ε ensures that action pairs in Neg(s, f) are played with probability at most ε, by Lemma 51 we obtain that for all s ∈ S and all strategies π we have Pr_s^{σ_ε,π}(Reach(W_2)) ≤ v̂(s) + ε, where v̂(s) = v̂(G, f, P, ξ_f, Eq, Pos, Neg, W_2)(s). It follows that for all s ∈ S and all strategies π we have

Pr_s^{σ_ε,π}(coParity(p)) ≤ Pr_s^{σ_ε,π}(Reach(W_2)) + ε ≤ (v̂(s) + ε) + ε = v̂(s) + 2 · ε.

Since ε > 0 is arbitrary, the desired result follows.
Lemma 53 Given a concurrent game structure G with a parity objective Parity(p), let W_1 = Limit_1(Parity(p)) and W_2 = Limit_2(coParity(p)). There exist a partition P = (V_0, V_1, ..., V_k) of S, a function f : S → 2^A \ ∅ such that f(s) ⊆ Γ_1(s) for all s ∈ S, and a selector ξ_f such that Supp(ξ_f(s)) = f(s) for all s ∈ S, for which the following conditions hold.

1. Condition 1. V_0 = W_1 and V_k = W_2.

2. Condition 2. For all 1 ≤ i ≤ k − 1 and all s ∈ V_i, there exists a partition (Eq(s, f), Pos(s, f), Neg(s, f)) of (Γ_1(s) \ f(s)) × Γ_2(s) such that Succ(s, a_1, a_2) ⊆ V_i for all (a_1, a_2) ∈ Eq(s, f), and the following assertions hold:
   • for all a_2 ∈ Γ_2(s), if Succ(s, ξ_f, a_2) ∩ (S \ V_i) ≠ ∅, then Succ(s, ξ_f, a_2) ∩ (∪_{j<i} V_j) ≠ ∅; and
   • Succ(s, a_1, a_2) ∩ (∪_{j<i} V_j) ≠ ∅ for all (a_1, a_2) ∈ Pos(s, f).

3. Condition 3. For all 1 ≤ i ≤ k − 1, every state s ∈ V_i is limit-sure winning in G̃_i for the objective Parity(p̃_i), where (G̃_i, p̃_i) = QRS(G, V_i, f, ξ_f, Eq, Pos, Neg, p); i.e., V_i ⊆ Limit_1^{G̃_i}(Parity(p̃_i)).

4. Condition 4. For all s ∈ S we have v̂(G, f, P, ξ_f, Eq, Pos, Neg, W_2)(s) ≤ Val_2(coParity(p))(s).
Proof. The witness partition P, the function f, the selector ξ_f, and the partitions Eq, Pos, and Neg are obtained as follows. The partition P is the value-class partition of S in decreasing order, i.e., (a) for 0 ≤ i ≤ k, the set V_i is a value-class; and (b) for s ∈ V_i and t ∈ V_j, if j < i, then Val_1(Parity(p))(t) > Val_1(Parity(p))(s). The function f is obtained as a witness satisfying the conditions of Lemma 50 such that f(s) ∈ OptSupp(s) for all s ∈ S. For a state s ∈ V_i, with 1 ≤ i ≤ k − 1, we set

Eq(s, f) = Eq(s, f(s));   Pos(s, f) = Pos(s, f(s));   Neg(s, f) = Neg(s, f(s));

where Eq, Pos, and Neg are as defined for value classes. By Lemma 50 we obtain that for all 1 ≤ i ≤ k − 1 we have V_i ⊆ Limit_1^{G̃_i}(Parity(p̃_i)), where (G̃_i, p̃_i) = QRS(G, V_i, f, ξ_f, Eq, Pos, Neg, p). The witness selector ξ_f is a locally optimal selector such that Supp(ξ_f(s)) = f(s) for all s ∈ S. Since ξ_f is a locally optimal selector, it follows that for all s ∈ V_i, with 1 ≤ i ≤ k − 1, and for all a_2 ∈ Γ_2(s), if Succ(s, ξ_f, a_2) ∩ (∪_{j>i} V_j) ≠ ∅ (i.e., it goes to a lower value-class), then Succ(s, ξ_f, a_2) ∩ (∪_{j<i} V_j) ≠ ∅ (i.e., it goes to a higher value-class). In other words, for all s ∈ V_i, with 1 ≤ i ≤ k − 1, and for all a_2 ∈ Γ_2(s), if Succ(s, ξ_f, a_2) ∩ (S \ V_i) ≠ ∅, then Succ(s, ξ_f, a_2) ∩ (∪_{j<i} V_j) ≠ ∅. For s ∈ V_i, with 1 ≤ i ≤ k − 1, and (a_1, a_2) ∈ Eq(s, f(s)), we have Succ(s, a_1, a_2) ⊆ V_i. For s ∈ V_i, with 1 ≤ i ≤ k − 1, and (a_1, a_2) ∈ Pos(s, f(s)), we have Succ(s, a_1, a_2) ∩ (S \ V_i) ≠ ∅, and

Σ_{t∈S} Val_1(Parity(p))(t) · δ(s, a_1, a_2)(t) ≥ Val_1(Parity(p))(s).

Since Succ(s, a_1, a_2) ∩ (S \ V_i) ≠ ∅, we must have Succ(s, a_1, a_2) ∩ VC(Parity(p), > r) ≠ ∅, where s ∈ VC(Parity(p), r); i.e., Succ(s, a_1, a_2) ∩ (∪_{j<i} V_j) ≠ ∅. It follows that Condition 1, Condition 2, and Condition 3 hold. We now prove Condition 4.
Let v(s) = Val_2(coParity(p))(s) for s ∈ S. Observe that v(s) = 1 for all s ∈ W_2. Hence to show the desired result, it suffices to show that in the MDP Ĝ = M̂(G, f, P, ξ_f, Eq, Pos, Neg), for all states s and all â ∈ Γ̂_2(s), we have

v(s) ≥ Σ_{t∈S} v(t) · δ̂(s, â)(t).

The inequality is proved by considering the following cases.

• For s ∈ S and â = (a_1, a_2) ∈ Eq(s, f(s)): for all t ∈ Succ(s, a_1, a_2) we have v(s) = v(t) (as s and t are in the same value-class). It follows that v(s) = Σ_{t∈S} v(t) · δ̂(s, â)(t).

• For s ∈ S and â = (a_1, a_2) ∈ Pos(s, f(s)): we have

1 − v(s) ≤ Σ_{t∈S} (1 − v(t)) · δ(s, a_1, a_2)(t),

i.e., v(s) ≥ Σ_{t∈S} v(t) · δ(s, a_1, a_2)(t). In other words, we have v(s) ≥ Σ_{t∈S} v(t) · δ̂(s, â)(t).

• For s ∈ S and â = (1, a_2): we have

Σ_{t∈S} v(t) · δ̂(s, â)(t) = Σ_{t∈S} Σ_{a_1∈f(s)} v(t) · δ(s, a_1, a_2)(t) · ξ_f(s)(a_1).

Since ξ_f is a locally optimal selector, for all s ∈ S and all a_2 ∈ Γ_2(s) we have

v(s) ≥ Σ_{t∈S} Σ_{a_1∈f(s)} v(t) · δ(s, a_1, a_2)(t) · ξ_f(s)(a_1).

Thus we have v(s) ≥ Σ_{t∈S} v(t) · δ̂(s, â)(t).

Hence the desired result follows.
Algorithm. Given a concurrent game structure G, a parity objective Parity(p), a real α, and a state s, to decide whether Val_1(Parity(p))(s) ≥ α it is sufficient (and possible by Lemma 53) to guess (P, f, ξ, Eq, Pos, Neg) such that P = (V_0, V_1, ..., V_k) is a partition of the state space, f : S → 2^A \ ∅ is a function such that f(s) ⊆ Γ_1(s) for all s ∈ S, ξ is a selector such that Supp(ξ(s)) = f(s) for all s ∈ S, and the following conditions hold:

1. V_0 = Limit_1(Parity(p)) and V_k = Limit_2(coParity(p));

2. for all 1 ≤ i ≤ k and all s ∈ V_i we have
   (a) Succ(s, a_1, a_2) ⊆ V_i for all (a_1, a_2) ∈ Eq(s, f);
   (b) Succ(s, a_1, a_2) ∩ (∪_{j<i} V_j) ≠ ∅ for all (a_1, a_2) ∈ Pos(s, f);
   (c) for all a_2 ∈ Γ_2(s), if Succ(s, ξ, a_2) ∩ (S \ V_i) ≠ ∅, then Succ(s, ξ, a_2) ∩ (∪_{j<i} V_j) ≠ ∅ (observe that it suffices to verify this condition for the selector ξ^U that at s plays all actions in f(s) uniformly at random, instead of the selector ξ);

3. for all 1 ≤ i ≤ k − 1, every state s ∈ V_i is limit-sure winning in (G̃_i, p̃_i) = QRS(G, V_i, f, ξ^U, Eq, Pos, Neg, p), where ξ^U is a selector that at a state s plays all actions in f(s) uniformly at random; and

4. 1 − α ≥ v̂(f, P, ξ, Eq, Pos, Neg, V_k)(s).

Since in each G̃_i we need to verify only that s is limit-sure winning, and limit-sure winning in concurrent games does not depend on the precise transition probabilities (Lemma 50), it is sufficient to verify the condition with ξ^U instead of ξ. The guess of the partition P and of f is polynomial in the size of the game, and the guess of ξ will be expressed by a sentence in the theory of reals. Once P and f are guessed, step 1 and step 2 can be checked in PSPACE (since for concurrent games, whether a state s ∈ Limit_1(Parity(p)) can be decided in NP ∩ coNP [dAH00]). We now present a sentence in the existential theory of the real-closed field (the fragment of the theory of the real-closed field in which only existential quantifiers are used) to guess ξ and verify the last condition:
∃x. ∃v.   ⋀_{s∈S, a∈Γ_1(s)} x_s(a) ≥ 0   ∧   ⋀_{s∈S} ( Σ_{a∈Γ_1(s)} x_s(a) = 1 )
      ∧   ⋀_{s∈S, a_1∈f(s)} x_s(a_1) > 0   ∧   ⋀_{s∈S, a_1∉f(s)} x_s(a_1) = 0
      ∧   ⋀_{s∈V_0} v(s) = 0   ∧   ⋀_{s∈V_k} v(s) = 1
      ∧   ⋀_{1≤i≤k} ⋀_{s,t∈V_i} v(s) = v(t)   ∧   ⋀_{1≤i<j≤k} ⋀_{s∈V_i, t∈V_j} v(s) < v(t)
      ∧   ⋀_{s∈S\(V_0∪V_k), a_2∈Γ_2(s)} v(s) ≥ Σ_{t∈S} Σ_{a_1∈f(s)} v(t) · δ(s, a_1, a_2)(t) · x_s(a_1)
      ∧   ⋀_{s∈S\(V_0∪V_k), (a_1,a_2)∈Pos(s,f)} v(s) ≥ Σ_{t∈S} v(t) · δ(s, a_1, a_2)(t)
      ∧   v(s) ≤ 1 − α.
The first line of constraints ensures that x is a selector, and the second line ensures that x is a selector with support f(s) for all states s ∈ S. The third line ensures that v(s) is defined correctly for states in V_0 and V_k. The fourth line ensures that for all 1 ≤ i ≤ k and all s, t ∈ V_i we have v(s) = v(t), and that if s ∈ V_i and t ∈ V_j with i < j, then v(s) < v(t) (i.e., the value of the game at s is greater than the value at t). The next two lines present the inequality constraints that guarantee that, with x as the selector, we have v̂(f, P, x, Eq, Pos, Neg, V_k)(s) ≤ v(s). The last constraint specifies that v(s) ≤ 1 − α.
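As an illustration of how such a sentence can be discharged in practice, the following minimal Python sketch encodes a fragment of these constraints, using the z3 SMT solver as a stand-in decision procedure for the existential theory of the reals. The state names, action set, transition probabilities, and the threshold α = 1/2 are all invented for the example and are not part of the construction above.

# A minimal sketch (Python + z3, assuming the z3-solver package).
# Hypothetical fragment: a state s with actions {a, b}, f(s) = {a}, and
# two successor states t0 (higher value-class) and t1 (lower value-class).
from z3 import Real, Solver, sat

x_a, x_b = Real('x_a'), Real('x_b')          # selector probabilities at s
v_s, v_t0, v_t1 = Real('v_s'), Real('v_t0'), Real('v_t1')

delta = {'t0': 0.5, 't1': 0.5}               # hypothetical delta(s, a, a2)(t)

solver = Solver()
solver.add(x_a >= 0, x_b >= 0, x_a + x_b == 1)   # x is a distribution
solver.add(x_a > 0, x_b == 0)                    # Supp(x) = f(s) = {a}
solver.add(v_t0 < v_s, v_s < v_t1)               # value-class ordering
# one-step inequality: v(s) >= sum_t v(t) * delta(s, a, a2)(t) * x(a)
solver.add(v_s >= (v_t0 * delta['t0'] + v_t1 * delta['t1']) * x_a)
solver.add(v_s <= 1 - 0.5)                       # v(s) <= 1 - alpha

if solver.check() == sat:
    print(solver.model())                        # a witness for x and v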
Since the existential theory of the reals is decidable in PSPACE [Can88], we obtain an NPSPACE algorithm to decide whether Val_1(Parity(p))(s) ≥ α. Since NPSPACE = PSPACE, there is a PSPACE algorithm to decide whether Val_1(Parity(p))(s) ≥ α. By applying the binary-search technique (as for Algorithm 7) we can approximate the value to a precision ε, for ε > 0, applying the decision procedure log(1/ε) times. Thus we have the following result.
Theorem 44 (Computational complexity) Given a concurrent game structure G, a parity objective Parity(p), a state s of G, a rational ε > 0, and a rational α, the following assertions hold.

1. (Decision problem). Whether Val_1(Parity(p))(s) ≥ α can be decided in PSPACE.

2. (Approximation problem). An interval [l, u] such that u − l ≤ 2ε and Val_1(Parity(p))(s) ∈ [l, u] can be computed in PSPACE.
The previous best known algorithm to approximate values is triple exponential in the size of the game graph and logarithmic in 1/ε [dAM01].
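To make the approximation scheme concrete, the following is a minimal Python sketch of the binary-search loop around the decision procedure; decide_value is a hypothetical oracle standing in for the PSPACE procedure above.

# A minimal sketch (Python) of the binary-search approximation.
# `decide_value(s, alpha)` is a hypothetical oracle answering the decision
# problem "is Val_1(Parity(p))(s) >= alpha?".

def approximate_value(decide_value, s, eps):
    """Return an interval [l, u] with u - l <= 2*eps containing the value."""
    low, high = 0.0, 1.0                    # values lie in [0, 1]
    while high - low > 2 * eps:
        mid = (low + high) / 2
        if decide_value(s, mid):            # Val_1(...)(s) >= mid
            low = mid
        else:
            high = mid
    return low, high

# Illustrative usage with a toy oracle whose hidden value is 2/3:
toy_oracle = lambda s, alpha: alpha <= 2/3
print(approximate_value(toy_oracle, "s0", 0.01))   # (0.65625, 0.671875)

The number of oracle calls is log(1/ε), matching the bound stated above.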
Strategy complexity. Lemma 52 and Lemma 53 show that witnesses for perennial ε-optimal strategies can be obtained by "stitching" (composing together) limit-sure winning strategies and locally optimal selectors across value classes. This characterization, along with results on the structure of limit-sure winning strategies, yields Theorem 45. From the results of [dAH00] it follows that there are limit-sure winning strategies that coincide in the limit with a memoryless selector σ_ℓ such that Supp(σ_ℓ) is the set of least-rank actions of the limit-sure witness. The witness construction of ε-optimal strategies presented above extends this result from limit-sure winning strategies to ε-optimal strategies (Theorem 45). Theorem 45 states that there exist ε-optimal strategies that in the limit coincide with a locally optimal selector, i.e., with a memoryless strategy given by locally optimal selectors. This parallels the results of Mertens-Neyman [MN81] for concurrent games with limit-average objectives.
Theorem 45 (Limit of ε-optimal strategies) For every ε > 0 and every parity objective Φ, there exists an ε-optimal strategy σ_ε such that the family of strategies σ_ε converges to a memoryless strategy σ with a locally optimal selector as ε → 0, i.e., lim_{ε→0} σ_ε = σ, where σ ∈ Σ_ℓ(Φ) and σ is memoryless.
Complexity of concurrent ω-regular games. The complexity results for concurrent games with the sure winning criterion follow from the results for 2-player games. Given a concurrent game of size |G| and a parity objective with d priorities, the almost-sure and limit-sure winning states can be computed in time O(|G|^{d+1}), and membership of a state in the almost-sure and limit-sure winning sets can be decided in NP ∩ coNP [dAH00]. We established that the values of concurrent games with parity objectives can be approximated within ε-precision in PSPACE, for ε > 0. A concurrent game with a Rabin or Streett objective with d pairs can be solved by transforming it into a game of size exponential in the original game, with a parity objective with O(d) priorities; the reduction is achieved using an index appearance record (IAR) construction [Tho95], which is an adaptation of the LAR construction of [GH82]. This conversion, together with the qualitative analysis of concurrent games with parity objectives, shows that the almost-sure and limit-sure winning states of concurrent games with Rabin and Streett objectives can be computed in EXPTIME. Moreover, the conversion, together with the quantitative analysis of concurrent games with parity objectives, yields an EXPSPACE bound for computing values within ε-precision for concurrent games with Rabin and Streett objectives. We summarize the results on strategy and computational complexity in Table 8.1 and Table 8.2.
8.2 Conclusion
In this chapter we studied the complexity of concurrent games with parity objectives, and as a consequence also obtained improved complexity results for concurrent games with Rabin, Streett, and Müller objectives. The interesting open problems are as follows:

1. The known lower bounds for the computation of the almost-sure and limit-sure winning sets for concurrent games with Rabin and Streett objectives are NP-hardness and coNP-hardness, respectively (inherited from the special case of 2-player games). The upper bounds are EXPTIME, and it is open to prove that the problems are NP-complete and coNP-complete for concurrent games with Rabin and Streett objectives, respectively.

2. Similarly, the known lower bounds for the quantitative analysis of concurrent games with Rabin and Streett objectives are NP-hardness and coNP-hardness, respectively. The upper bounds are EXPSPACE. It remains open to obtain NP and coNP algorithms for concurrent games with Rabin and Streett objectives, respectively, or even EXPTIME algorithms.
Table 8.1: Strategy complexity of concurrent games with ω-regular objectives, where Σ^PM denotes the family of pure memoryless strategies, Σ^M the family of randomized memoryless strategies, Σ^PF the family of pure finite-memory strategies, and Σ^HI the family of randomized history-dependent, infinite-memory strategies.

Objectives     | Sure   | Almost-sure | Limit-sure | ε-optimal
Safety         | Σ^PM   | Σ^PM        | Σ^PM       | Σ^M
Reachability   | Σ^PM   | Σ^M         | Σ^M        | Σ^M
coBüchi        | Σ^PM   | Σ^M         | Σ^M        | Σ^M
Büchi          | Σ^PM   | Σ^M         | Σ^HI       | Σ^HI
Parity         | Σ^PM   | Σ^HI        | Σ^HI       | Σ^HI
Rabin          | Σ^PM   | Σ^HI        | Σ^HI       | Σ^HI
Streett        | Σ^PF   | Σ^HI        | Σ^HI       | Σ^HI
Müller         | Σ^PF   | Σ^HI        | Σ^HI       | Σ^HI
Table 8.2: Computational complexity of concurrent games with ω-regular objectives.

Objectives     | Sure          | Almost-sure | Limit-sure | Values
Safety         | PTIME         | PTIME       | PTIME      | PSPACE
Reachability   | PTIME         | PTIME       | PTIME      | PSPACE
coBüchi        | PTIME         | PTIME       | PTIME      | PSPACE
Büchi          | PTIME         | PTIME       | PTIME      | PSPACE
Parity         | NP ∩ coNP     | NP ∩ coNP   | NP ∩ coNP  | PSPACE
Rabin          | NP-compl.     | EXPTIME     | EXPTIME    | EXPSPACE
Streett        | coNP-compl.   | EXPTIME     | EXPTIME    | EXPSPACE
Müller         | PSPACE-compl. | EXPTIME     | EXPTIME    | EXPSPACE
Chapter 9
Secure Equilibria and Applications
In this chapter we consider 2-player games with non-zero-sum objectives and show their application in synthesis.¹ In 2-player non-zero-sum games, Nash equilibria capture the options for rational behavior if each player attempts to maximize her payoff. In contrast to classical game theory, we consider lexicographic objectives: first, each player tries to maximize her own payoff, and then the player tries to minimize the opponent's payoff. Such objectives arise naturally in the verification of systems with multiple components. There, instead of proving that each component satisfies its specification no matter how the other components behave, it sometimes suffices to prove that each component satisfies its specification provided that the other components satisfy their specifications. We say that a Nash equilibrium is secure if it is an equilibrium with respect to the lexicographic objectives of both players. We prove that in graph games with Borel objectives there may be several Nash equilibria, but there is always a unique maximal payoff profile of a secure equilibrium. We show how this equilibrium can be computed in the case of ω-regular winning conditions. We then study the problem of synthesis of two independent processes, each with its own specification, and show how the notion of secure equilibria generalizes the assume-guarantee style of reasoning in a game-theoretic framework and leads to a more appropriate formulation of the synthesis problem.

¹This chapter contains results from [CHJ04, CH07].
9.1 Non-zero-sum Games
We consider 2-player non-zero-sum games, i.e., games that are not strictly competitive. A possible behavior of the two players is captured by a strategy profile (σ, π), where σ is a strategy of player 1 and π is a strategy of player 2. Classically, the behavior (σ, π) is considered rational if the strategy profile is a Nash equilibrium [Jr50], that is, if neither player can increase her payoff by unilaterally changing her strategy. Formally, let v_1^{σ,π} be the payoff of player 1 if the strategies (σ, π) are played, and let v_2^{σ,π} be the corresponding payoff of player 2. Then (σ, π) is a Nash equilibrium if (1) v_1^{σ,π} ≥ v_1^{σ′,π} for all player 1 strategies σ′, and (2) v_2^{σ,π} ≥ v_2^{σ,π′} for all player 2 strategies π′. Nash equilibria formalize a notion of rationality which is strictly internal: each player cares about her own payoff but does not in the least care (cooperatively or adversarially) about the other player's payoff.
Choosing among Nash equilibria. A classical problem is that many games have multiple Nash equilibria, and some of them may be preferable to others. For example, one might partially order the equilibria by (σ, π) ⪰ (σ′, π′) if both v_1^{σ,π} ≥ v_1^{σ′,π′} and v_2^{σ,π} ≥ v_2^{σ′,π′}. If a unique maximal Nash equilibrium exists in this order, then it is preferable for both players. However, maximal Nash equilibria may not be unique. In such cases external criteria, such as the sum of the payoffs of both players, have been used to evaluate different rational behaviors [Kre90, Owe95, vNM47]. These external criteria, which are based on a single preference order on strategy profiles, are usually cooperative, in that they capture social aspects of rational behavior. We define and study, instead, an adversarial external criterion for rational behavior. Put simply, we assume that each player attempts to minimize the other player's payoff as long as, by doing so, she does not decrease her own payoff. This yields two different preference orders on strategy profiles, one for each player. Between two strategy profiles (σ, π) and (σ′, π′), player 1 prefers (σ, π), denoted (σ, π) ⪰_1 (σ′, π′), if either v_1^{σ,π} > v_1^{σ′,π′}, or both v_1^{σ,π} = v_1^{σ′,π′} and v_2^{σ,π} ≤ v_2^{σ′,π′}. In other words, the preference order ⪰_1 of player 1 is lexicographic: the primary goal of player 1 is to maximize her own payoff; the secondary goal is to minimize the opponent's payoff. The preference order ⪰_2 of player 2 is defined symmetrically. We refer to rational behaviors under these lexicographic objectives as secure equilibria. (We do not know how to uniformly translate all games with lexicographic preference orders into games with a single objective for each player, such that the Nash equilibria of the translated games correspond to the secure equilibria of the original games.)
Secure equilibria. The two orders ⪰_1 and ⪰_2 on strategy profiles, which express the preferences of the two players, induce the following refinement of the notion of Nash equilibrium: a strategy profile (σ, π) is a secure equilibrium if (1) (v_1^{σ,π}, v_2^{σ,π}) ⪰_1 (v_1^{σ′,π}, v_2^{σ′,π}) for all player 1 strategies σ′, and (2) (v_1^{σ,π}, v_2^{σ,π}) ⪰_2 (v_1^{σ,π′}, v_2^{σ,π′}) for all player 2 strategies π′. Note that every secure equilibrium is a Nash equilibrium, but a Nash equilibrium need not be secure. The name "secure" equilibrium derives from the following equivalent characterization. We say that a strategy profile (σ, π) is secure if any rational deviation of player 2 (i.e., a deviation that does not decrease her payoff) does not decrease the payoff of player 1, and symmetrically, any rational deviation of player 1 does not decrease the payoff of player 2. Formally, a strategy profile (σ, π) is secure if for all player 2 strategies π′, if v_2^{σ,π′} ≥ v_2^{σ,π} then v_1^{σ,π′} ≥ v_1^{σ,π}, and for all player 1 strategies σ′, if v_1^{σ′,π} ≥ v_1^{σ,π} then v_2^{σ′,π} ≥ v_2^{σ,π}. The secure profile (σ, π) can thus be interpreted as a contract between the two players which enforces cooperation: any unilateral selfish deviation by one player cannot put the other player at a disadvantage if she follows the contract. It is not difficult to show that a strategy profile is a secure equilibrium iff it is both a secure profile and a Nash equilibrium. Thus, the secure equilibria are those Nash equilibria which represent enforceable contracts between the two players.
Motivation: verification of component-based systems. The motivation for our definitions comes from verification. There, one would like to prove that a component of a
system (player 1) can satisfy a specification no matter how the environment (player 2)
behaves [AHK02]. Classically, this is modeled as a strictly competitive (zero-sum) game,
where the environment’s objective is the complement of the component’s objective. However, the zero-sum model is often overly conservative, as the environment itself typically
consists of components, each with its own specification (i.e., objective). Moreover, the individual component specifications are usually not complementary; a common example is
that each component must maintain a local invariant. So a more appropriate approach is
to prove that player 1 can meet her objective no matter how player 2 behaves as long as
player 2 does not sabotage her own objective. In other words, classical correctness proofs
of a component assume absolute worst-case behavior of the environment, while it would
suffice to assume only relative worst-case behavior of the environment —namely, relative
to the assumption that the environment itself is correct (i.e., meets its specification). Such
relative worst-case reasoning, called assume-guarantee reasoning [AL95, AH99, NAT03], so
far has not been studied in the natural setting offered by game theory.
Existence and uniqueness of maximal secure equilibria. We will see that in general
games, such as matrix games, there may be multiple secure equilibrium payoff profiles, even
several incomparable maximal ones. We show that for 2-player games with Borel objectives,
which may have multiple maximal Nash equilibria, there always exists a unique maximal
secure equilibrium payoff profile. In other words, in graph games with Borel objectives
there is a compelling notion of rational behavior for each player, which is (1) a classical
Nash equilibrium, (2) an enforceable contract (“secure”), and (3) a guarantee of maximal
payoff for each player among all behaviors that achieve (1) and (2).
Figure 9.1: A graph game with reachability objectives.
Examples. Consider the game graph shown in Fig. 9.1. Player 1 chooses the successor
node at square nodes and her objective is to reach the target s4 . Player 2 chooses the
successor node at diamond nodes and her objective is to reach s3 or s4 , also a reachability
objective. There are two player 1 strategies: the strategy σ1 chooses the move s0 → s1 ,
and σ2 chooses s0 → s2 . There are also two player 2 strategies: the strategy π1 chooses
s1 → s3 , and π2 chooses s1 → s4 . The strategy profile (σ1 , π1 ) leads the game into s3 and
therefore gives the payoff profile (0,1), indicating that player 1 loses and player 2 wins (i.e.,
only player 2 reaches her target). The strategy profiles (σ1 , π2 ), (σ2 , π1 ), and (σ2 , π2 ) give
the payoffs (1,1), (0,0), and (0,0), respectively. Three of the four strategy profiles are Nash equilibria; the exception is (σ2, π2), from which player 1 can improve her payoff by switching to σ1. For example, in (σ1, π1) player 1 does not have an incentive to switch to strategy σ2 (which would still give her payoff 0), and neither does player 2 have an incentive to switch to π2 (she is already getting payoff 1). However, the strategy profile (σ1, π1) is not a secure equilibrium, because player 1 can lower player 2's payoff (from 1 to 0) without changing her own payoff by switching to strategy σ2. Similarly, the strategy profile (σ1, π2) is not secure, because player 2 can lower player 1's payoff without changing her own payoff by switching to π1. So if both players, in addition to maximizing their own payoff, also attempt to minimize the opponent's payoff, then the resulting payoff profile is unique, namely, (0,0). In other words, in this game, the only rational behavior for both players is to deny each other's objectives.
Figure 9.2: A graph game with Büchi objectives.
This is not always the case: sometimes it is beneficial for both players to cooperate to achieve their own objectives, with the result that both players win. Consider the game graph shown in Fig. 9.2. Both players have Büchi objectives: player 1 (square) wants to visit s0 infinitely often, and player 2 (diamond) wants to visit s4 infinitely often. If player 2 always chooses s1 → s0 and player 1 always chooses s2 → s4, then both players win. This Nash equilibrium is also secure: if player 1 deviates by choosing s2 → s0, then player 2 can "retaliate" by choosing s0 → s3; similarly, if player 2 deviates by choosing s1 → s2, then player 1 can retaliate by s2 → s3. It follows that for purely selfish motives (and not for some social reason), both players have an incentive to cooperate to achieve the maximal secure equilibrium payoff (1,1).
Outline and results. We first define the notion of secure equilibrium and give several
interpretations through alternative definitions. We then prove the existence and uniqueness
of maximal secure equilibria in graph games with Borel objectives. The proof is based on
the following classification of strategies. A player 1 strategy is called strongly winning if
it ensures that player 1 wins and player 2 loses (i.e., the outcome of the game satisfies
ϕ1 ∧ ¬ϕ2 ). A player 1 strategy is a retaliating strategy if it ensures that if player 2 wins,
then player 1 wins (i.e., the outcome satisfies ϕ2 → ϕ1 ). In other words, a retaliating
strategy for player 1 ensures that if player 2 causes player 1 to lose, then player 2 will
lose too. If both players follow retaliating strategies (σ, π), they may both win —in this
case, we say that (σ, π) is a winning pair of retaliating strategies— or they may both lose.
We show that at every node of a graph game with Borel objectives, either one of the two
players has a strongly winning strategy, or there is a pair of retaliating strategies. Based
on this insight, we give an algorithm for computing the secure equilibria in graph games
in the case that both players' objectives are ω-regular. We then consider the problem of synthesis of two independent processes, each with its own specification, and show that secure equilibria generalize the assume-guarantee style of reasoning in a game-theoretic framework and yield an appropriate formulation of the synthesis problem.
9.2 Secure Equilibria
In a secure game the objective of player 1 is to maximize her own payoff and then
minimize the payoff of player 2. Similarly, player 2 maximizes her own payoff and then
minimizes the payoff of player 1. We want to determine the best payoff that each player can
ensure when both players play according to these preferences. We formalize this as follows.
A strategy profile (σ, π) is a pair of strategies, where σ is a player 1 strategy and π is a
player 2 strategy. The strategy profile (σ, π) gives rise to a payoff profile (v_1^{σ,π}, v_2^{σ,π}), where v_1^{σ,π} is the payoff of player 1 if the two players follow the strategies σ and π, respectively, and v_2^{σ,π} is the corresponding payoff of player 2. We define the player 1 preference order ≺_1 and the player 2 preference order ≺_2 on payoff profiles lexicographically:

(v_1, v_2) ≺_1 (v_1′, v_2′)  iff  (v_1 < v_1′) ∨ (v_1 = v_1′ ∧ v_2 > v_2′);

that is, player 1 prefers a payoff profile which gives her a greater payoff, and if two payoff profiles match in the first component, then she prefers the payoff profile in which player 2's payoff is minimized. Symmetrically,

(v_1, v_2) ≺_2 (v_1′, v_2′)  iff  (v_2 < v_2′) ∨ (v_2 = v_2′ ∧ v_1 > v_1′).

Given two payoff profiles (v_1, v_2) and (v_1′, v_2′), we write (v_1, v_2) = (v_1′, v_2′) iff v_1 = v_1′ and v_2 = v_2′, and (v_1, v_2) ⪯_1 (v_1′, v_2′) iff either (v_1, v_2) ≺_1 (v_1′, v_2′) or (v_1, v_2) = (v_1′, v_2′). We define ⪯_2 analogously.
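As a small illustration, the lexicographic preference orders translate directly into code. The following Python sketch (with illustrative names) compares payoff profiles from each player's point of view:

# A minimal sketch (Python): the lexicographic preference orders on payoff
# profiles (v1, v2).

def prefers_1(p, q):
    """True iff p is strictly preferred to q by player 1 (i.e., q ≺_1 p):
    player 1 first maximizes her own payoff, then minimizes player 2's."""
    (p1, p2), (q1, q2) = p, q
    return p1 > q1 or (p1 == q1 and p2 < q2)

def prefers_2(p, q):
    """The symmetric order for player 2."""
    (p1, p2), (q1, q2) = p, q
    return p2 > q2 or (p2 == q2 and p1 < q1)

# e.g., player 1 prefers (1, 0) to (1, 1), and (1, 1) to (0, 0):
assert prefers_1((1, 0), (1, 1)) and prefers_1((1, 1), (0, 0))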
Definition 17 (Secure strategy profiles) A strategy profile (σ, π) is secure if the following two conditions hold:

∀π′. (v_1^{σ,π′} < v_1^{σ,π}) → (v_2^{σ,π′} < v_2^{σ,π});
∀σ′. (v_2^{σ′,π} < v_2^{σ,π}) → (v_1^{σ′,π} < v_1^{σ,π}).

A secure strategy profile thus ensures that if player 2 tries to decrease player 1's payoff, then player 2's payoff decreases as well, and vice versa.
Definition 18 (Secure equilibria) A strategy profile (σ, π) is a Nash equilibrium if (1) v_1^{σ,π} ≥ v_1^{σ′,π} for all player 1 strategies σ′, and (2) v_2^{σ,π} ≥ v_2^{σ,π′} for all player 2 strategies π′. A strategy profile is a secure equilibrium if it is both a Nash equilibrium and secure.
Proposition 13 (Equivalent characterization) The strategy profile (σ, π) is a secure equilibrium iff the following two conditions hold:

∀π′. (v_1^{σ,π′}, v_2^{σ,π′}) ⪯_2 (v_1^{σ,π}, v_2^{σ,π});
∀σ′. (v_1^{σ′,π}, v_2^{σ′,π}) ⪯_1 (v_1^{σ,π}, v_2^{σ,π}).
Proof. Consider a strategy profile (σ, π) which is a Nash equilibrium and secure. Since (σ, π) is a Nash equilibrium, for all player 2 strategies π′ we have v_2^{σ,π′} ≤ v_2^{σ,π}. Since (σ, π) is secure, for all π′ we have (v_1^{σ,π′} < v_1^{σ,π}) → (v_2^{σ,π′} < v_2^{σ,π}). It follows that for every player 2 strategy π′ the following condition holds:

(v_2^{σ,π′} = v_2^{σ,π} ∧ v_1^{σ,π} ≤ v_1^{σ,π′}) ∨ (v_2^{σ,π′} < v_2^{σ,π}).

Hence, for all π′, we have (v_1^{σ,π′}, v_2^{σ,π′}) ⪯_2 (v_1^{σ,π}, v_2^{σ,π}). The argument for the other case is symmetric. Thus neither player 1 nor player 2 has an incentive to switch from the strategy profile (σ, π) in order to increase the payoff profile according to her respective payoff-profile ordering.

Conversely, an equilibrium strategy profile (σ, π) with respect to the preference orders ⪯_1 and ⪯_2 is both a Nash equilibrium and a secure strategy profile.
Example 7 (Matrix games) A secure equilibrium need not exist in a matrix game. We give an example of a matrix game where no Nash equilibrium is secure. Consider the game M_1 below, where the row player can choose row 1 or row 2 (denoted r_1 and r_2, respectively), and the column player chooses between the two columns (denoted c_1 and c_2). The first component of the payoff is the row player's payoff, and the second component is the column player's payoff.

M_1 = [ (3, 3)  (1, 3) ]
      [ (3, 1)  (2, 2) ]

In this game the strategy profile (r_1, c_1) is the only Nash equilibrium. But (r_1, c_1) is not a secure strategy profile, because if the row player plays r_1, then the column player, by playing c_2, can still get payoff 3 and decrease the row player's payoff to 1.

In the game M_2 below, there are two Nash equilibria, namely (r_1, c_2) and (r_2, c_1), and the strategy profile (r_2, c_1) is a secure strategy profile as well. Hence the strategy profile (r_2, c_1) is a secure equilibrium. However, the strategy profile (r_1, c_2) is not secure.

M_2 = [ (0, 0)      (1, 0)     ]
      [ (1/2, 1/2)  (1/2, 1/2) ]
Multiple secure equilibria may exist, as is the case, for example, in a matrix game where all entries of the matrix are the same. We now present an example of a matrix game with multiple secure equilibria with different payoff profiles. Consider the following matrix game M_3. The strategy profiles (r_1, c_1) and (r_2, c_2) are both secure equilibria. The former has the payoff profile (2, 1), and the latter the payoff profile (1, 2). These two payoff profiles are incomparable: player 1 prefers the former, player 2 the latter. Hence, in this case, there is no unique maximal secure payoff profile.

M_3 = [ (2, 1)  (0, 0) ]
      [ (0, 0)  (1, 2) ]
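For two-by-two games with pure strategies, the definitions above can be checked by brute-force enumeration. The following Python sketch (illustrative, pure strategies only) verifies the claims about M_2 and M_3:

# A minimal sketch (Python): enumerate pure Nash and secure equilibria of a
# bimatrix game given as a dict {(row, col): (v1, v2)}.

def is_nash(game, rows, cols, r, c):
    v1, v2 = game[(r, c)]
    return all(game[(rr, c)][0] <= v1 for rr in rows) and \
           all(game[(r, cc)][1] <= v2 for cc in cols)

def is_secure(game, rows, cols, r, c):
    v1, v2 = game[(r, c)]
    # deviations that lower the other player's payoff must lower one's own
    return all(game[(r, cc)][1] < v2 for cc in cols if game[(r, cc)][0] < v1) and \
           all(game[(rr, c)][0] < v1 for rr in rows if game[(rr, c)][1] < v2)

def secure_equilibria(game, rows=(1, 2), cols=(1, 2)):
    return [(r, c) for r in rows for c in cols
            if is_nash(game, rows, cols, r, c) and is_secure(game, rows, cols, r, c)]

M2 = {(1, 1): (0, 0), (1, 2): (1, 0), (2, 1): (0.5, 0.5), (2, 2): (0.5, 0.5)}
M3 = {(1, 1): (2, 1), (1, 2): (0, 0), (2, 1): (0, 0), (2, 2): (1, 2)}
print(secure_equilibria(M2))   # [(2, 1)]  -- the secure equilibrium of M_2
print(secure_equilibria(M3))   # [(1, 1), (2, 2)]  -- two incomparable ones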
9.3 2-Player Non-Zero-Sum Games on Graphs
We consider 2-player infinite path-forming games played on graphs. We restrict
our attention to turn-based games and pure (i.e., non-randomized) strategies. In these
games, the class of pure strategies suffices for determinacy [Mar75], and, as we shall see,
for the existence of equilibria (both Nash and secure equilibria). Hence in this chapter we
consider pure strategies only. Given a state s ∈ S, a strategy σ of player 1, and a strategy
π of player 2, there is a unique play ω_{σ,π}(s) = ⟨s_0, s_1, s_2, ...⟩, which starts from s and satisfies, for all i ≥ 0: (a) if s_i ∈ S_1, then s_{i+1} = σ(s_0, s_1, ..., s_i); and (b) if s_i ∈ S_2, then s_{i+1} = π(s_0, s_1, ..., s_i).
We consider non-zero-sum games on graphs. For our purposes, a graph game
(G, s, ϕ1 , ϕ2 ) consists of a game graph G, say with state space S, together with a start state
s ∈ S and two Borel objectives ϕ1 , ϕ2 ⊆ S ω . The game starts at state s, player 1 pursues the
objective ϕ1 , and player 2 pursues the objective ϕ2 (in general, ϕ2 is not the complement
of ϕ1 ). Player i ∈ {1, 2} gets payoff 1 if the outcome of the game is a member of ϕi , and
she gets payoff 0 otherwise. In the following, we fix the game graph G and the objectives
ϕ1 and ϕ2 , but we vary the start state s of the game. Thus we parametrize the payoffs
by s: given strategies σ and π for the two players, we write v_i^{σ,π}(s) = 1 if ω_{σ,π}(s) ∈ ϕ_i, and v_i^{σ,π}(s) = 0 otherwise, for i ∈ {1, 2}. Similarly, we sometimes refer to Nash equilibria and
secure strategy profiles of the graph game (G, s, ϕ1 , ϕ2 ) as equilibria and secure profiles at
the state s.
In the following subsection, we investigate the existence and structure of secure
equilibria for the general class of graph games with Borel objectives. In the subsequent
subsection, we give a characterization of secure equilibria which can be used to compute
secure equilibria in the special case of ω-regular objectives.
9.3.1 Unique maximal secure equilibria
Consider a game graph G with state space S, and Borel objectives ϕ1 and ϕ2 for
the two players.
Definition 19 (Maximal secure equilibria) For v, w ∈ {0, 1}, we write SE_vw ⊆ S to denote the set of states s such that a secure equilibrium with the payoff profile (v, w) exists in the graph game (G, s, ϕ1, ϕ2); that is, s ∈ SE_vw iff there is a secure equilibrium (σ, π) at s such that (v_1^{σ,π}(s), v_2^{σ,π}(s)) = (v, w). Similarly, MS_vw ⊆ SE_vw denotes the set of states s such that the payoff profile (v, w) is a maximal secure equilibrium payoff profile at s; that is, s ∈ MS_vw iff (1) s ∈ SE_vw and (2) for all v′, w′ ∈ {0, 1}, if s ∈ SE_v′w′, then (v′, w′) ⪯_1 (v, w) and (v′, w′) ⪯_2 (v, w).
We now define the notions of strongly winning and retaliating strategies, which capture
the essence of secure equilibria. A strategy for player 1 is strongly winning if it ensures
that the objective of player 1 is satisfied and the objective of player 2 is not. A retaliating
strategy for player 1 ensures that for every strategy of player 2, if the objective of player 2
is satisfied, then the objective of player 1 is satisfied as well. We will show that every secure
equilibrium either contains a strongly winning strategy for one of the players, or it consists
of a pair of retaliating strategies.
Definition 20 (Strongly winning strategies) A strategy σ is strongly winning for
player 1 from a state s if she can ensure the payoff profile (1, 0) in the graph game
(G, s, ϕ1 , ϕ2 ) by playing the strategy σ. Formally, σ is strongly winning for player 1 from s
if for all player 2 strategies π, we have ωσ,π (s) ∈ (ϕ1 ∧¬ϕ2 ). The strongly winning strategies
for player 2 are defined symmetrically.
Definition 21 (Retaliating strategies) A strategy σ is a retaliating strategy for player 1
from a state s if for all player 2 strategies π, we have ωσ,π (s) ∈ (ϕ2 → ϕ1 ). Similarly, a
strategy π is a retaliating strategy for player 2 from s if for all player 1 strategies σ, we
have ωσ,π (s) ∈ (ϕ1 → ϕ2 ). We write Re 1 (s) and Re 2 (s) to denote the sets of retaliating
strategies for player 1 and player 2, respectively, from s. A strategy profile (σ, π) is a
retaliation strategy profile at a state s if both σ and π are retaliating strategies from s. The
retaliation strategy profile (σ, π) is winning at s if ωσ,π (s) ∈ (ϕ1 ∧ ϕ2 ). A strategy σ is a
winning retaliating strategy for player 1 at state s if there is a strategy π for player 2 such
that (σ, π) is a winning retaliation strategy profile at s.
Example 8 (Büchi-Büchi game) Recall the graph game shown in Fig. 9.2. Consider the
memoryless strategies of player 2 at state s0 . If player 2 chooses s0 → s3 , then player 2
does not satisfy her Büchi objective. If player 2 chooses s0 → s2 , then at state s2 player 1
chooses s2 → s0 , and hence player 1’s objective is satisfied, but player 2’s objective is not
satisfied. Thus, no memoryless strategy for player 2 can be a winning retaliating strategy
at s0 .
Now consider the strategy πg for player 2 which chooses s0 → s2 if between the
last two consecutive visits to s0 the state s4 was visited, and otherwise it chooses s0 → s3 .
Given this strategy, for every strategy of player 1 that satisfies player 1’s objective, player 2’s
objective is also satisfied. Let σg be the player 1 strategy that chooses s2 → s4 if between the
last two consecutive visits to s2 the state s0 was visited, and otherwise chooses s2 → s3 . The
strategy profile (σg , πg ) consists of a pair of winning retaliating strategies, as it satisfies the
Büchi objectives of both players. If instead, player 2 always chooses s0 → s3 , and player 1
always chooses s2 → s3 , we obtain a memoryless retaliation strategy profile, which is not
winning for either player: it is a Nash equilibrium at state s0 with the payoff profile (0, 0).
Finally, suppose that at s0 player 2 always chooses s2, and at s2 player 1 always chooses s0. This strategy profile is again a Nash equilibrium, with the payoff profile (1, 0) at s0, but it is not a retaliation strategy profile. This shows that at state s0 the Nash equilibrium payoff profiles (1, 0), (0, 0), and (1, 1) are all possible, but only (0, 0) and (1, 1) are secure.
Given a game graph G with state space S, and a set ϕ ⊆ S^ω of infinite paths, we define the sets of states from which player 1 or player 2, respectively, can win a zero-sum game with objective ϕ, as follows:

⟨⟨1⟩⟩_G(ϕ) = {s ∈ S | ∃σ ∈ Σ. ∀π ∈ Π. ω_{σ,π}(s) ∈ ϕ};
⟨⟨2⟩⟩_G(ϕ) = {s ∈ S | ∃π ∈ Π. ∀σ ∈ Σ. ω_{σ,π}(s) ∈ ϕ}.

The set of states from which the two players can cooperate to satisfy the objective ϕ is

⟨⟨1, 2⟩⟩_G(ϕ) = {s ∈ S | ∃σ ∈ Σ. ∃π ∈ Π. ω_{σ,π}(s) ∈ ϕ}.

We omit the subscript G when the game graph is clear from the context. Let s be a state in ⟨⟨1, 2⟩⟩(ϕ), and let (σ, π) be a strategy profile such that ω_{σ,π}(s) ∈ ϕ. Then we say that (σ, π) is a cooperative strategy profile at s.
Definition 22 (Characterization of states) For the given game graph G and Borel objectives ϕ1, ϕ2, we define the following four state sets in terms of strongly winning and retaliating strategies.

• The sets of states where player 1 or player 2, respectively, has a strongly winning strategy:

W10 = ⟨⟨1⟩⟩_G(ϕ1 ∧ ¬ϕ2);   W01 = ⟨⟨2⟩⟩_G(ϕ2 ∧ ¬ϕ1).

• The set of states where both players have retaliating strategies, and there exists a retaliation strategy profile whose strategies satisfy the objectives of both players:

W11 = {s ∈ S | ∃σ ∈ Re_1(s). ∃π ∈ Re_2(s). ω_{σ,π}(s) ∈ (ϕ1 ∧ ϕ2)}.

• The set of states where both players have retaliating strategies, and for every retaliation strategy profile neither the objective of player 1 nor the objective of player 2 is satisfied:

W00 = {s ∈ S | Re_1(s) ≠ ∅, Re_2(s) ≠ ∅, and ∀σ ∈ Re_1(s). ∀π ∈ Re_2(s). ω_{σ,π}(s) ∈ (¬ϕ1 ∧ ¬ϕ2)}.
We first show that the four sets W10 , W01 , W11 , and W00 form a partition of the state
space. In the zero-sum case, where ϕ2 = ¬ϕ1 , the sets W10 and W01 specify the winning
states for players 1 and 2, respectively; furthermore, W11 = ∅ by definition, and W00 = ∅ by
determinacy. We also show that for all v, w ∈ {0, 1}, we have MS vw = Wvw . It follows that
for 2-player graph games (1) secure equilibria always exist, and moreover, (2) there is always
a unique maximal secure equilibrium payoff profile. (Example 8 showed that there can be
multiple secure equilibria with different payoff profiles.) This result fully characterizes each state of a 2-player non-zero-sum graph game with Borel objectives by a maximal secure equilibrium payoff profile, just as the determinacy result fully characterizes the zero-sum case. The proof proceeds in several steps.
Lemma 54 W10 = {s ∈ S | Re 2 (s) = ∅} and W01 = {s ∈ S | Re 1 (s) = ∅}.
Proof. First, W10 ⊆ {s ∈ S | Re 2 (s) = ∅}, because a strongly winning strategy of player 1
—i.e., a strategy to satisfy ϕ1 ∧ ¬ϕ2 against every strategy of player 2— is a witness to
exhibit that there is no retaliating strategy for player 2. Second, it follows from Borel
determinacy that from each state s in S \ W10 there is a strategy π of player 2 to satisfy
¬ϕ1 ∨ ϕ2 against every strategy of player 1. The strategy π is a retaliating strategy for
player 2. Hence S \ W10 ⊆ {s ∈ S | Re_2(s) ≠ ∅}, and therefore W10 = {s ∈ S | Re_2(s) = ∅}. The proof that W01 = {s ∈ S | Re_1(s) = ∅} is symmetric.
Lemma 55 Consider the following two sets:
T1 = {s ∈ S | ∀σ ∈ Re 1 (s). ∀π ∈ Re 2 (s). ωσ,π (s) ∈ (¬ϕ1 ∧ ¬ϕ2 )}
T2 = {s ∈ S | ∀σ ∈ Re 1 (s). ∀π ∈ Re 2 (s). ωσ,π (s) ∈ (¬ϕ1 ∨ ¬ϕ2 )}
Then T1 = T2 .
Proof. The inclusion T1 ⊆ T2 follows from the fact that (¬ϕ1 ∧ ¬ϕ2 ) → (¬ϕ1 ∨ ¬ϕ2 ). We
show that T2 ⊆ T1 . By the definition of retaliating strategies, if σ is a retaliating strategy
of player 1, then for all strategies π of player 2, we have ωσ,π (s) ∈ (ϕ2 → ϕ1 ), and thus
ωσ,π (s) ∈ (¬ϕ1 → ¬ϕ2 ). Symmetrically, if π is a retaliating strategy of player 2, then for
all strategies σ of player 1, we have ωσ,π (s) ∈ (¬ϕ2 → ¬ϕ1 ). Hence, given a retaliation
strategy profile (σ, π), we have ωσ,π (s) ∈ (¬ϕ1 ∨ ¬ϕ2 ) iff ωσ,π (s) ∈ (¬ϕ1 ∧ ¬ϕ2 ). The lemma
follows.
Proposition 14 (State space partition) For all 2-player graph games with Borel objectives, the four sets W10 , W01 , W11 , and W00 form a partition of the state space.
Proof. It follows from Lemma 54 that
S \ (W10 ∪ W01) = {s ∈ S | Re_2(s) ≠ ∅ ∧ Re_1(s) ≠ ∅}.

It also follows that the sets W10, W01, W11, and W00 are disjoint. By definition, we have W00 ⊆ {s ∈ S | Re_1(s) ≠ ∅ ∧ Re_2(s) ≠ ∅} ⊆ S \ (W10 ∪ W01). Consider T_1 and T_2 as defined in Lemma 55. We have W00 = T_1, and by Lemma 54, we have T_2 ∪ W11 = S \ (W10 ∪ W01). It also follows that T_2 ∩ W11 = ∅, and hence T_2 = S \ (W10 ∪ W01) \ W11. Therefore, by Lemma 55,

T_2 = T_1 = W00 = S \ (W10 ∪ W01) \ W11.
The proposition follows.
Lemma 56 The following equalities hold: SE_00 ∩ SE_10 = ∅, SE_01 ∩ SE_10 = ∅, and SE_00 ∩ SE_01 = ∅.
Proof. Consider a state s ∈ SE 10 and a secure equilibrium (σ, π) at s. Since the strategy
profile is secure and player 2 receives the least possible payoff, it follows that for all player 2
strategies, the payoff for player 1 cannot decrease. Hence for all player 2 strategies π ′ , we
have ωσ,π′ (s) ∈ ϕ1 . So there is no Nash equilibrium at state s which assigns payoff 0 to
player 1. Hence we have SE 10 ∩ SE 01 = ∅ and SE 10 ∩ SE 00 = ∅. The argument to show
that SE 01 ∩ SE 00 = ∅ is similar.
Lemma 57 The following equalities hold: SE_11 ∩ SE_01 = ∅ and SE_11 ∩ SE_10 = ∅.
Proof. Consider a state s ∈ SE_11 and a secure equilibrium (σ, π) at s. Since the strategy profile is secure, it ensures that for all player 2 strategies π′, if ω_{σ,π′}(s) ∈ ¬ϕ1, then ω_{σ,π′}(s) ∈ ¬ϕ2. Hence s ∉ SE_01. Thus SE_11 ∩ SE_01 = ∅. The proof that SE_11 ∩ SE_10 = ∅ is analogous.
Lemma 58 The following equalities hold: MS_00 ∩ MS_01 = ∅, MS_00 ∩ MS_10 = ∅, MS_01 ∩ MS_10 = ∅, and MS_11 ∩ MS_00 = ∅.
Proof. The first three equalities follow from Lemmas 56 and 57. The last equality follows from the facts that (0, 0) ≺_1 (1, 1) and (0, 0) ≺_2 (1, 1): if s ∈ MS_11, then (0, 0) cannot be a maximal secure payoff profile at s.
Lemma 59 W10 = MS 10 and W01 = MS 01 .
Proof. Consider a state s ∈ MS 10 and a secure equilibrium (σ, π) at s. Since player 2
receives the least possible payoff and (σ, π) is a secure strategy profile, it follows that for all
strategies π ′ of player 2, we have ωσ,π′ (s) ∈ ϕ1 . Since (σ, π) is a Nash equilibrium, for all
strategies π ′ of player 2, we have ωσ,π′ (s) ∈ ¬ϕ2 . Thus MS 10 ⊆ W10 . Now consider a state
s ∈ W10 , and let σ be a strongly winning strategy of player 1 at s; that is, for all strategies
π of player 2, we have ωσ,π (s) ∈ (ϕ1 ∧ ¬ϕ2 ). For all strategies π of player 2, the strategy
profile (σ, π) is a secure equilibrium. Hence s ∈ SE 10 . Since (1, 0) is the greatest payoff
profile in the preference order for player 1, we have s ∈ MS 10 . Therefore W10 = MS 10 .
Symmetrically, W01 = MS 01 .
Lemma 60 W11 = MS 11 .
Proof. Consider a state s ∈ MS 11 , and let (σ, π) be a secure equilibrium at s. We prove
that σ ∈ Re 1 (s) and π ∈ Re 2 (s). Since (σ, π) is a secure strategy profile, for all strategies
π ′ of player 2, if ωσ,π′ (s) ∈ ¬ϕ1 , then ωσ,π′ (s) ∈ ¬ϕ2 . In other words, for all strategies π ′
of player 2, we have ωσ,π′ (s) ∈ (ϕ2 → ϕ1 ). Hence σ ∈ Re 1 (s). Symmetrically, π ∈ Re 2 (s).
Thus MS 11 ⊆ W11 . Consider a state s ∈ W11 , and let σ ∈ Re 1 (s) and π ∈ Re 2 (s) such that
ωσ,π (s) ∈ (ϕ1 ∧ ϕ2 ). A retaliation strategy profile is, by definition, a secure strategy profile.
Since the strategy profile (σ, π) assigns the greatest possible payoff to each player, it is a
Nash equilibrium. Therefore W11 ⊆ SE 11 ⊆ MS 11 .
Lemma 61 W00 = MS 00 .
Proof. It follows from Lemmas 56 and 58 that MS_00 = SE_00 \ SE_11 = SE_00 \ MS_11. We will use this fact to prove that W00 = MS_00. First, consider a state s ∈ MS_00. Then s ∉ (MS_11 ∪ MS_10 ∪ MS_01), which implies that s ∉ (W11 ∪ W10 ∪ W01). By Proposition 14, it follows that s ∈ W00. Thus MS_00 ⊆ W00.
Second, consider a state s ∈ W00 . We claim that there is a strategy σ of player 1
such that for all strategies π′ of player 2, we have ω_{σ,π′}(s) ∈ ¬ϕ2. Assume by way of contradiction that this is not the case. Then, by Borel determinacy, there is a player 2
strategy π ′′ such that for all player 1 strategies σ ′ , we have ωσ′ ,π′′ (s) ∈ ϕ2 . It follows that
either π ′′ is a strongly winning strategy for player 2, or a retaliating strategy such that
player 2 receives payoff 1. Hence s 6∈ W00 , which is a contradiction. Thus there is a player 1
strategy σ such that for all player 2 strategies π ′ , we have ωσ,π′ (s) ∈ ¬ϕ2 . Similarly, there
is a player 2 strategy π such that for all player 1 strategies σ ′ , we have ωσ′ ,π (s) ∈ ¬ϕ1 .
We claim that (σ, π) is a secure equilibrium. By the properties of σ, for every π ′ we have
ωσ,π′ (s) ∈ ¬ϕ2 . A similar argument holds for π as well. It follows that (σ, π) is a Nash
equilibrium. The strategy profile (σ, π) has the payoff profile (0, 0), which assigns the least
possible payoff to each player. Hence it is a secure strategy profile. Therefore s ∈ SE 00 .
Also, s ∈ W00 implies that s 6∈ W11 . Since W11 = MS 11 , we have s ∈ SE 00 \ MS 11 . Thus
W00 ⊆ MS 00 .
Theorem 46 (Unique maximal secure equilibria) At every state of a 2-player graph
game with Borel objectives, there exists a unique maximal secure equilibrium payoff profile.
Proof. From Lemmas 59, 60, and 61, it follows that for all i, j ∈ {0, 1}, we have MS ij = Wij .
Using Proposition 14, the theorem follows.
9.3.2 Algorithmic characterization of secure equilibria
We now give an alternative characterization of the state sets W00 , W01 , W10 ,
and W11 . The new characterization is useful to derive computational complexity results
for computing the four sets when player 1 and player 2 have ω-regular objectives. The
characterization itself, however, applies to all tail (prefix independent) objectives.
In this subsection we consider only tail objectives ϕ1 and ϕ2, for player 1 and player 2, respectively. It follows from the definitions that W10 = ⟨⟨1⟩⟩(ϕ1 ∧ ¬ϕ2) and W01 = ⟨⟨2⟩⟩(ϕ2 ∧ ¬ϕ1). Define A = S \ (W10 ∪ W01), the set of "ambiguous" states from which neither player has a strongly winning strategy. Let W_i = ⟨⟨i⟩⟩(ϕ_i), for i ∈ {1, 2}, be the winning sets of the two players, and let U_1 = W_1 \ W10 and U_2 = W_2 \ W01 be the sets of "weakly winning" states for players 1 and 2, respectively. Define U = U_1 ∪ U_2. Note that U ⊆ A.
Lemma 62 U ⊆ W11 .
Proof. Let s ∈ U1 . By the definition of U1 , player 1 has a strategy σ from the state s
to satisfy the objective ϕ1 , which is obviously a retaliating strategy, because ϕ1 implies
ϕ2 → ϕ1 . Again by the definition of U1 , we have s 6∈ W10 . Hence, by the determinacy
of zero-sum games, player 2 has a strategy π to satisfy the objective ¬(ϕ1 ∧ ¬ϕ2 ), which
is a retaliating strategy, because ¬(ϕ1 ∧ ¬ϕ2 ) is equivalent to ϕ1 → ϕ2 . Clearly, we have
ωσ,π (s) ∈ ϕ1 and ωσ,π (s) ∈ (ϕ1 → ϕ2 ), and hence ωσ,π (s) ∈ (ϕ1 ∧ ϕ2 ). The case of s ∈ U2
is symmetric.
Example 8 shows that in general U ⊊ W11. Given a game graph G = ((S, E), (S_1, S_2)) and a subset S′ ⊆ S of the states, we write G ↾ S′ to denote the subgraph induced by S′, that is, G ↾ S′ = ((S′, E ∩ (S′ × S′)), (S_1 ∩ S′, S_2 ∩ S′)). The following lemma characterizes the set W11.
Lemma 63 W11 = ⟨⟨1, 2⟩⟩_{G↾A}(ϕ1 ∧ ϕ2).

Proof. Let s ∈ ⟨⟨1, 2⟩⟩_{G↾A}(ϕ1 ∧ ϕ2). The case s ∈ U is covered by Lemma 62; so let s ∈ A \ U. Let (σ, π) be a cooperative strategy profile at s, that is, ω_{σ,π}(s) ∈ (ϕ1 ∧ ϕ2). Observe that if t ∈ A \ U, then t ∉ ⟨⟨1⟩⟩_G(ϕ1) and t ∉ ⟨⟨2⟩⟩_G(ϕ2). Hence, by the determinacy of zero-sum games, from every state t ∈ A \ U, player 1 (resp. player 2) has a strategy σ̄ (resp. π̄) to satisfy the objective ¬ϕ2 (resp. ¬ϕ1) from state t. We define the pair (σ + σ̄, π + π̄) of strategies from s as follows:

• When the play reaches a state t ∈ U, the players follow their winning retaliating strategies from t (it follows from Lemma 62 that U ⊆ W11).

• If the play has not yet reached the set U, then player 1 uses the strategy σ and player 2 uses the strategy π. If, however, player 2 deviates from the strategy π, then player 1 switches to the strategy σ̄ at the first state after the deviation; symmetrically, as soon as player 1 deviates from σ, player 2 switches to the strategy π̄.

It is easy to observe that both strategies σ + σ̄ and π + π̄ are retaliating strategies, and that ω_{σ+σ̄,π+π̄}(s) ∈ (ϕ1 ∧ ϕ2), because ω_{σ+σ̄,π+π̄}(s) = ω_{σ,π}(s). Hence s ∈ W11.

Conversely, let s ∉ ⟨⟨1, 2⟩⟩_{G↾A}(ϕ1 ∧ ϕ2). Then s ∉ W11, because for every strategy profile (σ, π), either ω_{σ,π}(s) ∈ ¬ϕ1 or ω_{σ,π}(s) ∈ ¬ϕ2.
By definition, the two sets W10 and W01 can be computed by solving two zero-sum
games with conjunctive objectives. Lemma 63 shows that the set W11 can be computed
by solving a model-checking (i.e., 1-player) problem for a conjunctive objective. Finally, it
follows from Proposition 14 that the set W00 can be obtained by set operations. This is
summarized in the following theorem.
Theorem 47 (Algorithmic characterization of secure equilibria) Consider a game graph G with Borel objectives ϕ1 and ϕ2 for the two players. The four sets W10, W01, W11, and W00 can be computed as follows:

W10 = ⟨⟨1⟩⟩_G(ϕ1 ∧ ¬ϕ2);
W01 = ⟨⟨2⟩⟩_G(ϕ2 ∧ ¬ϕ1);
W11 = ⟨⟨1, 2⟩⟩_{G↾A}(ϕ1 ∧ ϕ2);
W00 = S \ (W10 ∪ W01 ∪ W11);

where A = S \ (W10 ∪ W01).
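In code form, the characterization reduces the computation to three solver calls and set operations. The following Python sketch assumes hypothetical oracles solve_zero_sum and solve_cooperative for the sets ⟨⟨i⟩⟩_G and ⟨⟨1, 2⟩⟩_G, as well as hypothetical objective constructors And/Not and a graph-restriction helper:

# A minimal sketch (Python) of Theorem 47. All helper names are assumptions:
#   solve_zero_sum(G, i, obj)  ~ <<i>>_G(obj)     (zero-sum game solving)
#   solve_cooperative(G, obj)  ~ <<1,2>>_G(obj)   (1-player model checking)

def secure_equilibrium_sets(G, states, phi1, phi2,
                            solve_zero_sum, solve_cooperative,
                            restrict, And, Not):
    W10 = solve_zero_sum(G, 1, And(phi1, Not(phi2)))
    W01 = solve_zero_sum(G, 2, And(phi2, Not(phi1)))
    A = states - (W10 | W01)              # the "ambiguous" states
    W11 = solve_cooperative(restrict(G, A), And(phi1, phi2))
    W00 = states - (W10 | W01 | W11)      # the rest, by Proposition 14
    return W10, W01, W11, W00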
If the two objectives ϕ1 and ϕ2 are ω-regular, then we obtain the following corollary.
Corollary 8 (Computational complexity) Let n be the size of the game graph G.

• If ϕ1 and ϕ2 are parity objectives specified by priority functions, then the decision problem whether a given state lies in W10, or in W01, is coNP-complete; and whether a given state lies in W11, or in W00, can be decided in NP. The four sets W10, W01, W11, and W00 can be computed in time O(n^{d+1} · d!), where d is the maximal number of priorities in the two priority functions.

• If the two objectives ϕ1 and ϕ2 are specified as LTL (linear temporal logic) formulas, then deciding W10, W01, W11, and W00 is 2EXPTIME-complete. The four sets can be computed in time O((n · 2^{2^{ℓ·log ℓ}})^{2^ℓ}), where ℓ is the sum of the lengths of the two formulas.
Proof. If the objectives ϕ1 and ϕ2 are parity objectives, and d is the maximal number
of priorities in the two priority functions, then the conjunctions ϕ1 ∧ ¬ϕ2 , ϕ2 ∧ ¬ϕ1 and
ϕ1 ∧ ϕ2 can be expressed as Streett objectives [Tho97] with d pairs. The decision problem
for zero-sum games with Streett objectives is in co-NP [EJ88], the model-checking problem
for Streett objectives can be solved in polynomial time, and zero-sum games with Streett
objectives with d pairs can be solved in time O(nd+1 · d!) [PP06]. It follows that, for a
given state s, whether s ∈ W10 and whether s ∈ W01 can be decided in co-NP, and whether
s ∈ A for A = S \ (W01 ∪ W10 ) can be decided in NP. Given the set A, whether s ∈ W11
and whether s ∈ W00 can be decided in PTIME, by solving a model-checking problem with
Streett objectives. It follows from the results of [CHP07] that deciding the winner of a game with a conjunction of two parity objectives is coNP-hard; hence the coNP-completeness result follows. This establishes the first part of the corollary.
Since the decision problem for zero-sum games with LTL objectives is 2EXPTIME-complete [PR89], the 2EXPTIME lower bound is immediate. We obtain the matching upper bound as follows. Let ℓ be the sum of the lengths of the two LTL formulas ϕ1 and ϕ2. LTL formulas are closed under conjunction and negation, and hence ϕ1 ∧ ¬ϕ2 and ϕ2 ∧ ¬ϕ1 are LTL formulas of length ℓ + 2. An LTL formula of length ℓ can be converted into an equivalent nondeterministic Büchi automaton of size 2^ℓ [VW86], and the nondeterministic Büchi automaton can be converted into an equivalent deterministic parity automaton of size 2^{2^{ℓ·log ℓ}} with 2^ℓ priorities [Saf88]. The problem then reduces to solving the zero-sum parity games obtained as the synchronous product of the game graph and the deterministic parity automaton. Since zero-sum parity games can be solved in time O(n^d) for game graphs of size n and parity objectives with d priorities [Tho97], the upper bound follows.
9.4 Assume-guarantee Synthesis
In this section we study the synthesis of two independent processes and show how secure equilibria are useful in this scenario. The classical synthesis problem for reactive
systems asks, given a proponent process A and an opponent process B, to refine A so
that the closed-loop system A||B satisfies a given specification Φ. The solution of this
problem requires the computation of a winning strategy for proponent A in a game against
opponent B. We define and study the co-synthesis problem, where the proponent A consists
itself of two independent processes, A = A1 ||A2 , with specifications Φ1 and Φ2 , and the goal
is to refine both A1 and A2 so that A1 ||A2 ||B satisfies Φ1 ∧ Φ2 . For example, if the opponent
B is a fair scheduler for the two processes A1 and A2 , and Φi specifies the requirements of
mutual exclusion for Ai (e.g., starvation freedom), then the co-synthesis problem asks for
the automatic synthesis of a mutual-exclusion protocol.
We show that co-synthesis defined classically, with the processes A1 and A2 either
collaborating or competing, does not capture desirable solutions. Instead, the proper formulation of co-synthesis is the one where process A1 competes with A2 but not at the price of
violating Φ1 , and vice versa. We call this assume-guarantee synthesis and show that it can
be solved by computing secure-equilibrium strategies. In particular, from mutual-exclusion
requirements the assume-guarantee synthesis algorithm automatically computes Peterson’s
protocol.
We formally define the co-synthesis problem, using the automatic synthesis of a
mutual-exclusion protocol as a guiding example. More precisely, we wish to synthesize
two processes P1 and P2 so that the composite system P1 ||P2 ||R, where R is a scheduler
that arbitrarily but fairly interleaves the actions of P1 and P2 , satisfies the requirements
of mutual exclusion and starvation freedom for each process. We show that traditional
zero-sum game-theoretic formulations, where P1 and P2 either collaborate against R, or
unconditionally compete, do not lead to acceptable solutions. We then show that for the
non-zero-sum game-theoretic formulation, where the two processes compete conditionally,
there exists a unique winning secure-equilibrium solution, which corresponds exactly to
Peterson's mutual-exclusion protocol. In other words, Peterson's protocol can be synthesized automatically as the winning secure strategies of two players whose objectives are the
mutual-exclusion requirements. This is, to our knowledge, the first application of non-zero-sum games in the synthesis of reactive processes. It is also, to our knowledge, the first
application of Nash equilibria (in particular, the special kind called "secure") in system
design.
The new formulation of co-synthesis, with the two processes competing conditionally, is called assume-guarantee synthesis because, similar to assume-guarantee verification
(e.g., [AH99]), in attempting to satisfy her specification, each process makes the assumption
that the other process does not violate her own specification. The solution of the assume-guarantee synthesis problem can be obtained by computing secure equilibria in 3-player
games, with the three players P1 , P2 , and R.
Process P1:
do {
    flag[1]:=true; turn:=2;
    | while(flag[1]) nop;              (C1)
    | while(flag[2]) nop;              (C2)
    | while(turn=1) nop;               (C3)
    | while(turn=2) nop;               (C4)
    | while(flag[1] & turn=2) nop;     (C5)
    | while(flag[1] & turn=1) nop;     (C6)
    | while(flag[2] & turn=1) nop;     (C7)
    | while(flag[2] & turn=2) nop;     (C8)
    Cr1:=true;
    fin_wait;
    Cr1:=false;
    flag[1]:=false;
    wait[1]:=1;
    while(wait[1]=1)
    | nop;                             (C9)
    | wait[1]:=0;                      (C10)
} while(true)

Process P2 is symmetric: each round begins with flag[2]:=true; turn:=1; the guard
alternatives C1–C8 are the same; the critical section is signaled by Cr2 and released by
flag[2]:=false; and the waiting loop uses wait[2] with the same alternatives C9 and C10.
Figure 9.3: Mutual-exclusion protocol synthesis
9.4.1 Co-synthesis
In this section we define processes, refinement, schedulers, and specifications. We
consider the traditional co-operative [CE81] and strictly competitive [PR89, RW87] versions
of the co-synthesis problem; we refer to them as weak co-synthesis and classical co-synthesis,
respectively. We show the drawbacks of these formulations and then present a new formulation of co-synthesis, namely, assume-guarantee synthesis.
Variables, valuations, and traces. Let X be a finite set of variables such that each variable
x ∈ X has a finite domain Dx . A valuation θ on X is a function θ : X → ⋃_{x∈X} Dx that
assigns to each variable x ∈ X a value θ(x) ∈ Dx . We write Θ for the set of valuations on
X. A trace on X is an infinite sequence (θ0 , θ1 , θ2 , . . .) ∈ Θω of valuations on X. Given a
valuation θ ∈ Θ and a subset Y ⊆ X of the variables, we denote by θ ↾ Y the restriction
of the valuation θ to the variables in Y . Similarly, for a trace τ = (θ0 , θ1 , θ2 , . . .) on X, we
write τ ↾ Y = (θ0 ↾ Y, θ1 ↾ Y, θ2 ↾ Y, . . .) for the restriction of τ to the variables in Y . The
restriction operator is lifted to sets of valuations, and to sets of traces.
Processes and refinement. For i ∈ {1, 2}, a process Pi = (Xi , δi ) consists of a finite set Xi
of variables and a nondeterministic transition function δi : Θi → 2^{Θi} \ {∅}, where Θi is the
set of valuations on Xi . The transition function maps a present valuation to a nonempty
set of possible successor valuations. We write X = X1 ∪ X2 for the set of variables of
both processes; note that some variables may be shared by both processes. A refinement
of process Pi = (Xi , δi ) is a process Pi′ = (Xi′ , δi′ ) such that (1) Xi ⊆ Xi′ , and (2) for all
valuations θ ′ on Xi′ , we have δi′ (θ ′ ) ↾ Xi ⊆ δi (θ ′ ↾ Xi ). In other words, the refined process
Pi′ has possibly more variables than the original process Pi , and every possible update of
the variables in Xi by Pi′ is a possible update by Pi . We write Pi′ ⪯ Pi to denote that Pi′ is
a refinement of Pi . Given two refinements P1′ of P1 and P2′ of P2 , we write X ′ = X1′ ∪ X2′
for the set of variables of both refinements, and we denote the set of valuations on X ′ by
Θ′ .
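To make the refinement condition concrete, the following is a minimal executable sketch of these definitions (not part of the formal development): it represents a process explicitly as a pair of a variable list and a transition function on dict-valued valuations, and checks condition (2) above by brute force. The names valuations, restrict, and refines are ours.

    from itertools import product

    # A process is a pair (variables, delta): 'variables' is a list of
    # (name, domain) pairs, and delta maps a valuation (a dict) to a
    # non-empty list of successor valuations, as in the definition above.

    def valuations(variables):
        """Enumerate all valuations on a finite set of variables."""
        names = [n for n, _ in variables]
        for values in product(*(dom for _, dom in variables)):
            yield dict(zip(names, values))

    def restrict(theta, names):
        """The restriction of a valuation to the variables in 'names'."""
        return {x: v for x, v in theta.items() if x in names}

    def refines(p_ref, p_orig):
        """Check P' <= P: X is contained in X', and for every valuation
        theta' on X', delta'(theta') restricted to X is contained in
        delta(theta' restricted to X)."""
        (vars_ref, delta_ref), (vars_orig, delta_orig) = p_ref, p_orig
        names = {n for n, _ in vars_orig}
        if not names <= {n for n, _ in vars_ref}:
            return False
        for theta in valuations(vars_ref):
            allowed = delta_orig(restrict(theta, names))
            if any(restrict(s, names) not in allowed for s in delta_ref(theta)):
                return False
        return True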
Schedulers. Given two processes P1 and P2 , a scheduler R for P1 and P2 chooses at each
computation step whether it is process P1 ’s turn or process P2 ’s turn to update its variables.
Formally, the scheduler R is a function R : Θ∗ → {1, 2} that maps every finite sequence of
global valuations (representing the history of a computation) to i ∈ {1, 2}, signaling that
process Pi is next to update its variables. The scheduler R is fair if it assigns turns to both
P1 and P2 infinitely often; i.e., for all traces (θ0 , θ1 , θ2 , . . .) ∈ Θω , there exist infinitely many
j ≥ 0 and infinitely many k ≥ 0 such that R(θ0 , . . . , θj ) = 1 and R(θ0 , . . . , θk ) = 2. Given
two processes P1 = (X1 , δ1 ) and P2 = (X2 , δ2 ), a scheduler R for P1 and P2 , and a start
valuation θ0 ∈ Θ, the set of possible traces is [[(P1 || P2 || R)(θ0 )]] = {(θ0 , θ1 , θ2 , . . .) ∈ Θω |
∀j ≥ 0. R(θ0 , . . . , θj ) = i and θj+1 ↾ (X \ Xi ) = θj ↾ (X \ Xi ) and θj+1 ↾ Xi ∈ δi (θj ↾ Xi )}.
Note that during turns of one process Pi , the values of the private variables X \ Xi of the
other process remain unchanged.
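Continuing the sketch above (and reusing its restrict helper and process representation), the interleaving semantics of P1 || P2 || R can be prototyped as follows. The names successors, round_robin, and run are ours, and nondeterminism is resolved arbitrarily for illustration.

    def successors(theta, i, procs):
        """One step of (P1 || P2 || R) in which process Pi moves: the
        X_i part is updated via delta_i, all other variables are frozen."""
        vars_i, delta_i = procs[i]
        names_i = {n for n, _ in vars_i}
        result = []
        for local_succ in delta_i(restrict(theta, names_i)):
            succ = dict(theta)        # variables outside X_i stay unchanged
            succ.update(local_succ)   # the X_i part moves per delta_i
            result.append(succ)
        return result

    def round_robin(history):
        """A fair scheduler R: maps a history of valuations to 1 or 2."""
        return 1 + (len(history) % 2)

    def run(theta0, procs, scheduler, steps):
        """A finite prefix of one trace in [[(P1 || P2 || R)(theta0)]],
        taking the first listed successor at each nondeterministic choice."""
        history, theta = [theta0], theta0
        for _ in range(steps):
            i = scheduler(history)
            theta = successors(theta, i, procs)[0]
            history.append(theta)
        return history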
Specifications. A specification Φi for processes Pi is a set of traces on X; that is, Φi ⊆
Θω . We consider only ω-regular specifications [Tho97]. We define boolean operations on
specifications using logical operators such as ∧ (conjunction) and → (implication).
Weak co-synthesis. In all formulations of the co-synthesis problem that we consider, the
input to the problem is given as follows: two processes P1 = (X1 , δ1 ) and P2 = (X2 , δ2 ), two
specifications Φ1 for process 1 and Φ2 for process 2, and a start valuation θ0 ∈ Θ. The weak
co-synthesis problem is defined as follows: do there exist two processes P1′ = (X1′ , δ1′ ) and
P2′ = (X2′ , δ2′ ), and a valuation θ0′ ∈ Θ′ , such that (1) P1′ ⪯ P1 and P2′ ⪯ P2 and θ0′ ↾ X = θ0 ,
and (2) for all fair schedulers R for P1′ and P2′ , we have [[(P1′ || P2′ || R)(θ0′ )]] ↾ X ⊆ (Φ1 ∧ Φ2 ).
Example 9 (Mutual-exclusion protocol synthesis) Consider the two processes shown
in Fig. 9.3. Process P1 (listed first in the figure) places a request to enter its critical section
by setting flag[1]:=true, and the entry of P1 into the critical section is signaled by
Cr1:=true; and similarly for process P2 . The two variables flag[1] and flag[2] are boolean,
and in addition, both processes may use a shared variable turn that takes two values, 1 and
2. There are 8 possible conditions C1–C8 for a process to guard the entry into its critical
section. (Since a guard may check any subset of the three 2-valued variables, there are 256
possible guards, but all except 8 can be discharged immediately as not useful.) The figure
shows all 8×8 alternatives for the two processes; any refinement without
additional variables will choose a subset of these. Process P1 may stay in its critical section
for an arbitrary finite amount of time (indicated by fin wait), and then exit by setting
Cr1:=false; and similarly for process P2 . The while loop with the two alternatives C9
and C10 expresses the fact that a process may wait arbitrarily long (possibly infinitely long)
before a subsequent request to enter its critical section.
We use the notations □ and ◇ to denote always (safety) and eventually (reachability) specifications, respectively. The specification for process P1 consists of two parts:
a safety part Φ1^mutex = □¬(Cr1 = true ∧ Cr2 = true) and a liveness part
Φ1^prog = □(flag[1] = true → ◇(Cr1 = true)). The first part Φ1^mutex specifies that both
processes are not in their critical sections simultaneously (mutual exclusion); the second
part Φ1^prog specifies that if process P1 wishes to enter its critical section, then it will
eventually enter (starvation freedom). The specification Φ1 for process P1 is the conjunction
of Φ1^mutex and Φ1^prog. The specification Φ2 for process P2 is symmetric.
The answer to the weak co-synthesis problem for Example 9 is "Yes." A solution of the
weak co-synthesis formulation consists of two refinements P1′ and P2′ of the two given processes P1
and P2 , such that the composition of the two refinements satisfies the specifications Φ1 and
Φ2 for every fair scheduler. One possible solution is as follows: in P1′ , the alternatives C4
and C10 are chosen, and in P2′ , the alternatives C3 and C10 are chosen. This solution is not
satisfactory, because process P1 ’s starvation freedom depends on the fact that process P2
requests to enter its critical section infinitely often. If P2 were to make only a single request
to enter its critical section, then the progress part of Φ1 would be violated.
Classical co-synthesis. The classical co-synthesis problem is defined as follows: do there
exist two processes P1′ = (X1′ , δ1′ ) and P2′ = (X2′ , δ2′ ), and a valuation θ0′ ∈ Θ′ , such that
(1) P1′ ⪯ P1 and P2′ ⪯ P2 and θ0′ ↾ X = θ0 , and (2) for all fair schedulers R for P1′ and P2′ ,
we have (a) [[(P1′ || P2 || R)(θ0′ )]] ↾ X ⊆ Φ1 and (b) [[(P1 || P2′ || R)(θ0′ )]] ↾ X ⊆ Φ2 .
The answer to the classical co-synthesis problem for Example 9 is “No.” We will
argue later (in Example 10) why this is the case.
Assume-guarantee synthesis. We now present a new formulation of the co-synthesis
problem. The main idea is derived from the notion of secure equilibria. We refer to this
new formulation as the assume-guarantee synthesis problem; it is defined as follows: do
there exist two refinements P1′ = (X1′ , δ1′ ) and P2′ = (X2′ , δ2′ ), and a valuation θ0′ ∈ Θ′ , such
that (1) P1′ ⪯ P1 and P2′ ⪯ P2 and θ0′ ↾ X = θ0 , and (2) for all fair schedulers R for P1′ and
P2′ , we have (a) [[(P1′ || P2 || R)(θ0′ )]] ↾ X ⊆ (Φ2 → Φ1 ) and (b) [[(P1 || P2′ || R)(θ0′ )]] ↾ X ⊆
(Φ1 → Φ2 ) and (c) [[(P1′ || P2′ || R)(θ0′ )]] ↾ X ⊆ (Φ1 ∧ Φ2 ).

Process P1′:
do {
    flag[1]:=true; turn:=2;
    while (flag[2] & turn=2) nop;      (C8)
    Cr1:=true;
    fin_wait;
    Cr1:=false;
    flag[1]:=false;
    wait[1]:=1;
    while(wait[1]=1)
    | nop;                             (C9)
    | wait[1]:=0;                      (C10)
} while(true)

Process P2′:
do {
    flag[2]:=true; turn:=1;
    while (flag[1] & turn=1) nop;      (C6)
    Cr2:=true;
    fin_wait;
    Cr2:=false;
    flag[2]:=false;
    wait[2]:=1;
    while(wait[2]=1)
    | nop;                             (C9)
    | wait[2]:=0;                      (C10)
} while(true)
The answer to the assume-guarantee synthesis problem for Example 9 is “Yes.”
A solution P1′ and P2′ is shown in Fig. 9.4. We will argue the correctness of this solution
later (in Example 11). The two refined processes P1′ and P2′ constitute exactly Peterson's
solution to the mutual-exclusion problem. In other words, Peterson's protocol can be derived
automatically as an answer to the assume-guarantee synthesis problem for the requirements
of mutual exclusion and starvation freedom. The success of assume-guarantee synthesis for
the mutual-exclusion problem, together with the failure of the classical co-synthesis, suggests
that the classical formulation of co-synthesis is too strong.
9.4.2 Game Algorithms for Co-synthesis
We reduce the three formulations of the co-synthesis problem to problems about
games played on graphs with three players.
Game graphs. A 3-player game graph G = ((S, E), (S1 , S2 , S3 )) consists of a directed graph
(S, E) with a finite set S of states and a set E ⊆ S × S of edges, and a partition (S1 , S2 , S3 )
of the state space S into three sets. The states in Si are player-i states, for i ∈ {1, 2, 3},
and player i decides the successor at a state in Si . The notions of strategies and plays are
similar to those in the case of 2-player games. We denote by σi a strategy for player i and
by Σi the set of all strategies for player i, for i ∈ {1, 2, 3}. Given a start state s ∈ S and three
strategies σi ∈ Σi , one for each of the three players i ∈ {1, 2, 3}, there is a unique play,
denoted ωσ1,σ2,σ3(s) = (s0 , s1 , s2 , . . .), such that s0 = s and for all k ≥ 0, if sk ∈ Si , then
σi (s0 , s1 , . . . , sk ) = sk+1 ; this play is the outcome of the game starting at s given the three
strategies σ1 , σ2 , and σ3 .
Winning. An objective Ψ is a set of plays; i.e., Ψ ⊆ Ω. We extend the notion of winning
states to 3-player games; the notation is derived from [AHK02]. For an objective
Ψ, the set of winning states for player 1 in the game graph G is
⟨⟨1⟩⟩G(Ψ) = {s ∈ S | ∃σ1 ∈ Σ1 . ∀σ2 ∈ Σ2 . ∀σ3 ∈ Σ3 . ωσ1,σ2,σ3(s) ∈ Ψ};
a witness strategy σ1 for player 1 for the existential quantifier is referred to as a winning
strategy. The winning sets ⟨⟨2⟩⟩G(Ψ) and ⟨⟨3⟩⟩G(Ψ) for players 2 and 3 are defined analogously. The set of winning states for the team consisting of player 1 and player 2, playing
against player 3, is
⟨⟨1, 2⟩⟩G(Ψ) = {s ∈ S | ∃σ1 ∈ Σ1 . ∃σ2 ∈ Σ2 . ∀σ3 ∈ Σ3 . ωσ1,σ2,σ3(s) ∈ Ψ}.
The winning sets ⟨⟨I⟩⟩G(Ψ) for other teams I ⊆ {1, 2, 3} are defined similarly. The following
determinacy result follows from [GH82].
Theorem 48 (Finite-memory determinacy [GH82]) Let Ψ be an ω-regular objective,
let G be a 3-player game graph, and let I ⊆ {1, 2, 3} be a set of the players. Let J =
{1, 2, 3} \ I. Then (1) ⟨⟨I⟩⟩G(Ψ) = S \ ⟨⟨J⟩⟩G(¬Ψ), and (2) there exist finite-memory strategies
for the players in I such that against all strategies of the players in J, for all states
s ∈ ⟨⟨I⟩⟩G(Ψ), the play starting at s given the strategies lies in Ψ.
Game solutions to weak and classical co-synthesis. Given two processes P1 = (X1 , δ1 )
and P2 = (X2 , δ2 ), we define the 3-player game graph Ĝ = ((S, E), (S1 , S2 , S3 )) as follows:
let S = Θ × {1, 2, 3}; let Si = Θ × {i} for i ∈ {1, 2, 3}; and let E contain (1) all edges
of the form ((θ, 3), (θ, 1)) for θ ∈ Θ, (2) all edges of the form ((θ, 3), (θ, 2)) for θ ∈ Θ,
and (3) all edges of the form ((θ, i), (θ′, 3)) for i ∈ {1, 2} and θ′ ↾ Xi ∈ δi (θ ↾ Xi ) and
θ′ ↾ (X \ Xi ) = θ ↾ (X \ Xi ). In other words, player 1 represents process P1 , player 2
represents process P2 , and player 3 represents the scheduler. Given a play of the form
ω = ((θ0 , 3), (θ0 , i0 ), (θ1 , 3), (θ1 , i1 ), (θ2 , 3), . . .), where ij ∈ {1, 2} for all j ≥ 0, we write [ω]1,2
for the sequence of valuations (θ0 , θ1 , θ2 , . . .) in ω (ignoring the intermediate valuations at
player-3 states). A specification Φ ⊆ Θω defines the objective [[Φ]] = {ω ∈ Ω | [ω]1,2 ∈ Φ}.
In this way, the specifications Φ1 and Φ2 for the processes P1 and P2 provide the objectives
Ψ1 = [[Φ1 ]] and Ψ2 = [[Φ2 ]] for players 1 and 2, respectively. The objective for player 3
(the scheduler) is the fairness objective Ψ3 = Fair that both S1 and S2 are visited infinitely
often; i.e., Fair contains all plays (s0 , s1 , s2 , . . .) ∈ Ω such that sj ∈ S1 for infinitely many
j ≥ 0, and sk ∈ S2 for infinitely many k ≥ 0.
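A direct encoding of this construction, continuing the Python sketch above, looks as follows; build_game_graph is our name, and valuations are frozen into sorted item tuples so that they can serve as hashable state components.

    def build_game_graph(procs, all_vars):
        """The 3-player game graph G^: states are (valuation, player) pairs;
        player 3 (the scheduler) hands the turn to player 1 or 2, who then
        updates her own variables, as in the text."""
        frozen = lambda t: tuple(sorted(t.items()))
        states, edges = set(), set()
        for theta in valuations(all_vars):
            f = frozen(theta)
            states.update({(f, 1), (f, 2), (f, 3)})
            edges.add(((f, 3), (f, 1)))    # edge type (1): scheduler picks P1
            edges.add(((f, 3), (f, 2)))    # edge type (2): scheduler picks P2
            for i in (1, 2):               # edge type (3): Pi updates X_i
                for succ in successors(theta, i, procs):
                    edges.add(((f, i), (frozen(succ), 3)))
        return states, edges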
Proposition 15 Given two processes P1 = (X1 , δ1 ) and P2 = (X2 , δ2 ), two specifications
Φ1 for P1 and Φ2 for P2 , and a start valuation θ0 ∈ Θ, the answer to the weak co-synthesis
problem is "Yes" iff (θ0 , 3) ∈ ⟨⟨1, 2⟩⟩Ĝ(Fair → ([[Φ1 ]] ∧ [[Φ2 ]])); and the answer to the
classical co-synthesis problem is "Yes" iff both (θ0 , 3) ∈ ⟨⟨1⟩⟩Ĝ(Fair → [[Φ1 ]]) and (θ0 , 3) ∈
⟨⟨2⟩⟩Ĝ(Fair → [[Φ2 ]]).
Proof. We first note that for games with ω-regular objectives, finite-memory winning
strategies suffice (Theorem 48). The proof proceeds by the following case analysis.
1. Given a finite-memory strategy σ1 , a witness P1′ = (X1′ , δ1′ ) for the weak co-synthesis
problem can be obtained as follows: the variables X1′ \ X1 encode the finite-memory
information of the strategy σ1 , and the next-state function of the strategy is then
captured by a deterministic update function δ1′ . A similar construction holds for
player 2.
2. Given a witness P1′ = (X1′ , δ1′ ) for the weak co-synthesis problem, we first
observe that any deterministic restriction of P1′ (i.e., the transition function δ1′ made
deterministic) is also a witness to the weak co-synthesis problem. A witness strategy
σ1 in Ĝ is obtained as follows: the variables in X1′ \ X1 encode the finite-memory
information of σ1 , and the deterministic update is captured by the next-state function.
The construction of witness strategies for player 2 is similar.
The proof for the classical co-synthesis problem is similar.
Example 10 (Failure of classical co-synthesis) We now demonstrate the failure of
classical co-synthesis for Example 9. We show that for every strategy for process P1 , there
exist spoiling strategies for process P2 and the scheduler such that (1) the scheduler is fair
and (2) the specification Φ1 of process P1 is violated. With any fair scheduler, process P1
will eventually set flag[1]:=true. Whenever process P1 enters its critical section (setting Cr1:=true), the scheduler assigns a finite sequence of turns to process P2 . During
this sequence, process P2 enters its critical section: it may first choose the alternative C10
to return to the beginning of the main loop, then set flag[2]:=true; turn:=1; then
pass the guard C4 (since turn ≠ 2), and enter the critical section (setting Cr2:=true).
This violates the mutual-exclusion requirement Φ1^mutex of process P1 . On the other hand, if
process P1 never enters its critical section, this violates the starvation-freedom requirement
Φ1^prog of process P1 . Thus the answer to the classical co-synthesis problem is "No."
Game solution to assume-guarantee synthesis. We extend the notion of secure equilibria from 2-player games to 3-player games where player 3 can win unconditionally; i.e.,
⟨⟨3⟩⟩G(Ψ3 ) = S for the objective Ψ3 of player 3. In the setting of two processes and a
scheduler (player 3) with a fairness objective, the restriction that ⟨⟨3⟩⟩G(Ψ3 ) = S means
that the scheduler has a fair strategy from all states; this is clearly the case for Ψ3 = Fair.
(Alternatively, the scheduler may not be required to be fair; then Ψ3 is the set of all plays, and
the restriction is satisfied trivially.) We characterize the winning secure equilibrium states
and then establish the existence of finite-memory winning secure strategies (Theorem 50).
This will allow us to solve the assume-guarantee synthesis problem by computing winning
secure equilibria (Theorem 51).
Payoffs. In the following, we fix a 3-player game graph G and objectives Ψ1 , Ψ2 , and
Ψ3 for the three players such that ⟨⟨3⟩⟩G(Ψ3 ) = S. Since ⟨⟨3⟩⟩G(Ψ3 ) = S, any equilibrium
payoff profile will assign payoff 1 to player 3. Hence we focus on payoff profiles whose third
component is 1.
Payoff-profile ordering. The player-1 preference order ≺1 on payoff profiles is lexicographic:
(v1 , v2 , 1) ≺1 (v1′ , v2′ , 1) iff either (1) v1 < v1′ , or (2) v1 = v1′ and v2 > v2′ ; that is, player 1
prefers a payoff profile that gives her a greater payoff, and if two payoff profiles match in the
first component, then she prefers the payoff profile in which player 2's payoff is smaller;
this is the same preference order as defined for secure equilibria of two players. The preference
order for player 2 is symmetric. The preference order for player 3 is such that (v1 , v2 , 1) ≺3
(v1′ , v2′ , 1) iff v1 + v2 > v1′ + v2′ . Given two payoff profiles (v1 , v2 , v3 ) and (v1′ , v2′ , v3′ ), we write
(v1 , v2 , v3 ) = (v1′ , v2′ , v3′ ) iff vi = vi′ for all i ∈ {1, 2, 3}, and we write (v1 , v2 , v3 ) ⪯i (v1′ , v2′ , v3′ )
iff (v1 , v2 , v3 ) ≺i (v1′ , v2′ , v3′ ) or (v1 , v2 , v3 ) = (v1′ , v2′ , v3′ ).
Secure equilibria. A strategy profile (σ1 , σ2 , σ3 ) is a secure equilibrium at a state s ∈ S iff it is
a Nash equilibrium with respect to the preference orders ⪯1 , ⪯2 , and ⪯3 . For u, w ∈ {0, 1},
we write Suw1 ⊆ S for the set of states s such that a secure equilibrium with the payoff
profile (u, w, 1) exists at s; that is, s ∈ Suw1 iff there is a secure equilibrium (σ1 , σ2 , σ3 ) at
s with payoff profile (u, w, 1). Moreover, we write MS uw1 (G) ⊆ Suw1 for the set of states s
such that the payoff profile (u, w, 1) is a maximal secure equilibrium payoff profile at s;
that is, s ∈ MS uw1 (G) iff (1) s ∈ Suw1 , and (2) for all u′ , w′ ∈ {0, 1}, if s ∈ Su′w′1 , then
both (u′ , w′ , 1) ⪯1 (u, w, 1) and (u′ , w′ , 1) ⪯2 (u, w, 1). The states in MS 111 (G) are referred
to as winning secure equilibrium states, and the witnessing secure equilibrium strategies as
winning secure strategies.
Theorem 49 Let G be a 3-player game graph with objectives Ψ1 , Ψ2 , and Ψ3 for
the three players such that ⟨⟨3⟩⟩G(Ψ3 ) = S. Let
U1 = ⟨⟨1⟩⟩G(Ψ3 → Ψ1 ); and U2 = ⟨⟨2⟩⟩G(Ψ3 → Ψ2 );
Z1 = ⟨⟨1, 3⟩⟩G↾U1(Ψ1 ∧ Ψ3 ∧ ¬Ψ2 ); and Z2 = ⟨⟨2, 3⟩⟩G↾U2(Ψ2 ∧ Ψ3 ∧ ¬Ψ1 );
W = ⟨⟨1, 2⟩⟩G↾(S\(Z1∪Z2))(Ψ3 → (Ψ1 ∧ Ψ2 )).
Then the following assertions hold: (1) at all states in Z1 the only secure equilibrium payoff
profile is (1, 0, 1); (2) at all states in Z2 the only secure equilibrium payoff profile is (0, 1, 1);
and (3) W = MS 111 (G).
Proof. We prove parts (1) and (3); the proof of part (2) is similar to part (1).
Part (1). Since ⟨⟨3⟩⟩G(Ψ3 ) = S and Z1 ⊆ U1 = ⟨⟨1⟩⟩G(Ψ3 → Ψ1 ), it follows that any secure
equilibrium in Z1 has a payoff profile of the form (1, ·, 1). Since (1, 1, 1) ≺1 (1, 0, 1) and
(1, 1, 1) ≺3 (1, 0, 1), to prove uniqueness it suffices to show that player 1 and player 3 can fix
strategies to ensure the secure equilibrium payoff profile (1, 0, 1). Since Z1 = ⟨⟨1, 3⟩⟩G↾U1(Ψ1 ∧
Ψ3 ∧ ¬Ψ2 ), consider the strategy pair (σ1 , σ3 ) such that against all player-2 strategies σ2
and for all states s ∈ Z1 , we have ωσ1,σ2,σ3(s) ∈ (Ψ1 ∧ Ψ3 ∧ ¬Ψ2 ). The secure equilibrium
strategy pair (σ1∗ , σ3∗ ) for player 1 and player 3 (along with any strategy σ2 for player 2) is
constructed as follows.
1. The strategy σ1∗ is as follows: player 1 plays σ1 , and if player 3 deviates from σ3 , then
player 1 switches to a winning strategy for Ψ3 → Ψ1 . Such a strategy exists since
Z1 ⊆ U1 = ⟨⟨1⟩⟩G(Ψ3 → Ψ1 ).
2. The strategy σ3∗ is as follows: player 3 plays σ3 , and if player 1 deviates from σ1 ,
then player 3 switches to a winning strategy for Ψ3 . Such a strategy exists since
⟨⟨3⟩⟩G(Ψ3 ) = S.
Hence the objective of player 1 is satisfied whenever the objective of player 3 is satisfied. Thus
player 3 has no incentive to deviate. Similarly, player 1 also has no incentive to deviate.
The result follows.
Part (3). By Theorem 48 we have S \ W = ⟨⟨3⟩⟩G(Ψ3 ∧ (¬Ψ1 ∨ ¬Ψ2 )), and there is a
player-3 strategy σ3 that satisfies Ψ3 ∧ (¬Ψ1 ∨ ¬Ψ2 ) against all strategies of player 1 and
player 2. Hence the equilibrium (1, 1, 1) cannot exist in the complement of W , i.e.,
MS 111 (G) ⊆ W . We now show that in W there is a secure equilibrium with payoff profile
(1, 1, 1). The following construction completes the proof.
1. In W ∩ U1 , player 1 plays a winning strategy for the objective Ψ3 → Ψ1 , and player 2
plays a winning strategy for the objective (Ψ3 ∧ Ψ1 ) → Ψ2 . Observe that S \ Z1 =
⟨⟨2⟩⟩G(¬Ψ1 ∨ ¬Ψ3 ∨ Ψ2 ), and hence such a winning strategy exists for player 2.
2. In W ∩ (U2 \ U1 ), player 2 plays a winning strategy for the objective Ψ3 → Ψ2 , and
player 1 plays a winning strategy for the objective (Ψ2 ∧ Ψ3 ) → Ψ1 . Observe that S \ Z2 =
⟨⟨1⟩⟩G(¬Ψ2 ∨ ¬Ψ3 ∨ Ψ1 ), and hence such a winning strategy exists for player 1.
3. By Theorem 48 we have W \ U1 = ⟨⟨2, 3⟩⟩G(¬Ψ1 ∧ Ψ3 ) and W \ U2 = ⟨⟨1, 3⟩⟩G(¬Ψ2 ∧ Ψ3 ).
The strategy construction in W \ (U1 ∪ U2 ) is as follows: player 1 and player 2 play a
strategy pair (σ1 , σ2 ) to satisfy Ψ1 ∧ Ψ2 against all strategies of player 3, and player 3 plays
a winning strategy for Ψ3 ; if player 1 deviates, then player 2 and player 3 switch to
a strategy pair (σ̄2 , σ̄3 ) such that against all strategies of player 1 the objective Ψ3 ∧ ¬Ψ1
is satisfied; and if player 2 deviates, then player 1 and player 3 switch to a strategy pair
(σ̄1 , σ̄3 ) such that against all strategies of player 2 the objective Ψ3 ∧ ¬Ψ2 is satisfied.
Hence neither player 1 nor player 2 has an incentive to deviate according to the
preference orders ⪯1 and ⪯2 , respectively.
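Algorithmically, Theorem 49 reduces the computation of the maximal secure equilibrium states to five winning-set computations. The sketch below assumes a hypothetical oracle win(G, team, objective) that returns a team's winning set for an ω-regular objective, a graph-restriction operation restrict_graph, and objective combinators NOT, AND, IMPLIES; none of these is a real API, and G.states is likewise an assumed attribute.

    def max_secure_equilibria(G, win, restrict_graph, NOT, AND, IMPLIES,
                              psi1, psi2, psi3):
        """Z1, Z2, and W = MS_111(G) per Theorem 49, assuming a winning-set
        oracle 'win' and objective combinators (all hypothetical)."""
        U1 = win(G, {1}, IMPLIES(psi3, psi1))
        U2 = win(G, {2}, IMPLIES(psi3, psi2))
        Z1 = win(restrict_graph(G, U1), {1, 3}, AND(psi1, psi3, NOT(psi2)))
        Z2 = win(restrict_graph(G, U2), {2, 3}, AND(psi2, psi3, NOT(psi1)))
        rest = restrict_graph(G, G.states - (Z1 | Z2))
        W = win(rest, {1, 2}, IMPLIES(psi3, AND(psi1, psi2)))
        return Z1, Z2, W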
Alternative characterization of winning secure equilibria. In order to obtain a characterization of the set MS 111 (G) in terms of strategies, we extend the definition of retaliation
strategies to the case of three players. Given objectives Ψ1 , Ψ2 , and Ψ3 for the three
players, and a state s ∈ S, the sets of retaliation strategies for players 1 and 2 at s are
Re1 (s) = {σ1 ∈ Σ1 | ∀σ2 ∈ Σ2 . ∀σ3 ∈ Σ3 . ωσ1,σ2,σ3(s) ∈ ((Ψ3 ∧ Ψ2 ) → Ψ1 )};
Re2 (s) = {σ2 ∈ Σ2 | ∀σ1 ∈ Σ1 . ∀σ3 ∈ Σ3 . ωσ1,σ2,σ3(s) ∈ ((Ψ3 ∧ Ψ1 ) → Ψ2 )}.
Theorem 50 Let G be a 3-player game graph with objectives Ψ1 , Ψ2 , and Ψ3 for the
three players such that ⟨⟨3⟩⟩G(Ψ3 ) = S. Let U = {s ∈ S | ∃σ1 ∈ Re1 (s). ∃σ2 ∈ Re2 (s). ∀σ3 ∈
Σ3 . ωσ1,σ2,σ3(s) ∈ (Ψ3 → (Ψ1 ∧ Ψ2 ))}. Then U = MS 111 (G).
Proof. We first show that U ⊆ MS 111 (G). For a state s ∈ U , choose σ1 ∈ Re1 (s) and
σ2 ∈ Re2 (s) such that for all σ3 ∈ Σ3 , we have ωσ1,σ2,σ3(s) ∈ (Ψ3 → (Ψ1 ∧ Ψ2 )). Fixing the
strategies σ1 and σ2 for players 1 and 2, and a winning strategy for player 3, we obtain the
secure equilibrium payoff profile (1, 1, 1). We now show that MS 111 (G) ⊆ U . This follows
from the proof of Theorem 49. In Theorem 49 we proved that for all states s ∈ (S \ (Z1 ∪ Z2 )),
we have Re1 (s) ≠ ∅ and Re2 (s) ≠ ∅; and the winning secure strategies constructed for the
set W = MS 111 (G) are witness strategies proving that MS 111 (G) ⊆ U .
Observe that for ω-regular objectives, the winning secure strategies of Theorem 50 are finite-memory strategies. The existence of finite-memory winning secure strategies, together with an argument
similar to Proposition 15, establishes the following theorem.
Theorem 51 (Game solution of assume-guarantee synthesis) Given two processes
P1 = (X1 , δ1 ) and P2 = (X2 , δ2 ), two specifications Φ1 for P1 and Φ2 for P2 , and a
start valuation θ0 ∈ Θ, the answer to the assume-guarantee synthesis problem is "Yes" iff
(θ0 , 3) ∈ MS 111 (Ĝ) for the 3-player game graph Ĝ with the objectives Ψ1 = [[Φ1 ]], Ψ2 = [[Φ2 ]],
and Ψ3 = Fair.
Example 11 (Assume-guarantee synthesis of mutual-exclusion protocol) We
consider the 8 alternatives C1–C8 of process P1 , and the corresponding spoiling strategies
for process P2 and the scheduler to violate P1 ’s specification. We denote by [→] a switch
between the two processes (decided by the scheduler).
C1 The spoiling strategies for process P2 and the scheduler cause the following sequence of
updates:
P1 : flag[1]:=true; turn:=2; [→];
P2 : flag[2]:=true; turn:=1;
P2 : enters the critical section by passing the guard C8 (since turn ≠ 2).
After exiting its critical section, process P2 chooses the alternative C10
to return to the beginning of the main loop, sets flag[2]:=true; turn:=1;
and then the scheduler assigns the turn to process P1 , which cannot enter
its critical section. The scheduler then assigns the turn to P2 , and P2 again
enters the critical section by passing guard C8; this sequence is
repeated forever.
The same spoiling strategies work for choices C2, C3, C6 and C7.
C4 The spoiling strategies cause the following sequence of updates:
P2 : flag[2]:=true; turn:=1; [→];
P1 : flag[1]:=true; turn:=2; [→];
P2 : enters the critical section by passing the guard C3 (since turn ≠ 1).
After exiting its critical section, process P2 continues to choose the alternative C9 forever, and the scheduler alternates turns between P1 and
P2 ; and process P1 cannot enter its critical section.
The same spoiling strategies work for the choice C5.
C8 The spoiling strategies cause the following sequence of updates:
P2 : flag[2]:=true; turn:=1; [→];
P1 : flag[1]:=true; turn:=2; [→];
P2 : while(flag[2]) nop;
Then process P2 does not enter its critical section, and neither can process P1 enter.
In this case P2 cannot violate P1 ’s specification without violating her own specification.
It follows from this case analysis that no alternatives except C8 for process P1 can witness
a solution to the assume-guarantee synthesis problem. The alternative C8 for process P1
and the symmetric alternative C6 for process P2 provide winning secure strategies for both
processes. In this example, we considered refinements without additional variables; but in
general refinements can have additional variables.
9.5 Conclusion
We considered non-zero-sum graph games with lexicographically ordered objectives
for the players in order to capture adversarial external choice, where each player tries
to minimize the other player’s payoff as long as this does not decrease her own payoff.
We showed that these games have a unique maximal equilibrium for all Borel winning
conditions. This confirms that secure equilibria provide a good formalization of rational
behavior in the context of verifying component-based systems. We also showed the relevance
of secure equilibria to the co-synthesis problem. The extension of the notion of secure equilibria
to stochastic games and other quantitative settings is an interesting open problem.
Bibliography
[AH99] R. Alur and T.A. Henzinger. Reactive modules. In Formal Methods in System Design, pages 207–218. IEEE Computer Society Press, 1999.
[AHK02] R. Alur, T.A. Henzinger, and O. Kupferman. Alternating-time temporal logic. Journal of the ACM, 49:672–713, 2002.
[AHKV98] R. Alur, T.A. Henzinger, O. Kupferman, and M.Y. Vardi. Alternating refinement relations. In CONCUR'98, LNCS 1466, Springer, pages 163–178, 1998.
[AL95] M. Abadi and L. Lamport. Conjoining specifications. ACM Transactions on Programming Languages and Systems, 17(3):507–534, 1995.
[ALW89] M. Abadi, L. Lamport, and P. Wolper. Realizable and unrealizable specifications of reactive systems. In ICALP'89, LNCS 372, Springer, pages 1–17, 1989.
[Bas99] S. Basu. New results on quantifier elimination over real closed fields and applications to constraint databases. Journal of the ACM, 46(4):537–555, 1999.
[Ber95] D.P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific, 1995. Volumes I and II.
[BGNV05] A. Blass, Y. Gurevich, L. Nachmanson, and M. Veanes. Play to test. In FATES'05, 2005.
[Bil95] P. Billingsley, editor. Probability and Measure. Wiley-Interscience, 1995.
[BK76] T. Bewley and E. Kohlberg. The asymptotic theory of stochastic games. Mathematics of Operations Research, 1, 1976.
[BL69] J.R. Büchi and L.H. Landweber. Solving sequential conditions by finite-state strategies. Transactions of the AMS, 138:295–311, 1969.
[BPMF] S. Basu, R. Pollack, and M.-F. Roy. Algorithms in Real Algebraic Geometry. Springer-Verlag.
[BSV03] H. Björklund, S. Sandberg, and S. Vorobyov. A discrete subexponential algorithm for parity games. In STACS'03, LNCS 2607, Springer, pages 663–674, 2003.
[Büc62] J.R. Büchi. On a decision method in restricted second-order arithmetic. In E. Nagel, P. Suppes, and A. Tarski, editors, Proceedings of the First International Congress on Logic, Methodology, and Philosophy of Science 1960, pages 1–11. Stanford University Press, 1962.
[Can88] J. Canny. Some algebraic and geometric computations in PSPACE. In STOC'88, pages 460–467. ACM Press, 1988.
[CD06] X. Chen and X. Deng. Settling the complexity of 2-player Nash-equilibrium. In FOCS'06. IEEE, 2006. ECCC TR05-140.
[CdAH04] K. Chatterjee, L. de Alfaro, and T.A. Henzinger. Trading memory for randomness. In QEST'04, pages 206–217. IEEE, 2004.
[CdAH05] K. Chatterjee, L. de Alfaro, and T.A. Henzinger. The complexity of stochastic Rabin and Streett games. In ICALP'05, LNCS 3580, Springer, pages 878–890, 2005.
[CdAH06a] K. Chatterjee, L. de Alfaro, and T.A. Henzinger. The complexity of quantitative concurrent parity games. In SODA'06, pages 678–687. ACM-SIAM, 2006.
[CdAH06b] K. Chatterjee, L. de Alfaro, and T.A. Henzinger. Strategy improvement in concurrent reachability games. In QEST'06. IEEE, 2006.
[CDHR06] K. Chatterjee, L. Doyen, T.A. Henzinger, and J.F. Raskin. Algorithms for omega-regular games with imperfect information. In CSL'06, LNCS 4207, Springer, pages 287–302, 2006.
[CE81] E.M. Clarke and E.A. Emerson. Design and synthesis of synchronization skeletons using branching-time temporal logic. In Logic of Programs'81, pages 52–71, 1981.
[CH05] K. Chatterjee and T.A. Henzinger. Semiperfect-information games. In FSTTCS'05, LNCS 3821, Springer, 2005.
[CH06a] K. Chatterjee and T.A. Henzinger. Strategy improvement and randomized subexponential algorithms for stochastic parity games. In STACS'06, LNCS 3884, Springer, pages 512–523, 2006.
[CH06b] K. Chatterjee and T.A. Henzinger. Strategy improvement for stochastic Rabin and Streett games. In CONCUR'06, LNCS 4137, Springer, pages 375–389, 2006.
[CH07] K. Chatterjee and T.A. Henzinger. Assume-guarantee synthesis. In TACAS'07, LNCS 4424, Springer, pages 261–275, 2007.
[Cha05] K. Chatterjee. Two-player nonzero-sum ω-regular games. In CONCUR'05, LNCS 3653, Springer, pages 413–427, 2005.
[Cha06] K. Chatterjee. Concurrent games with tail objectives. In CSL'06, LNCS 4207, Springer, pages 256–270, 2006.
[Cha07a] K. Chatterjee. Concurrent games with tail objectives. Theoretical Computer Science, 2007. (To appear.)
[Cha07b] K. Chatterjee. Optimal strategy synthesis for stochastic Müller games. In FoSSaCS'07, LNCS 4423, Springer, pages 138–152, 2007.
[Cha07c] K. Chatterjee. Stochastic Müller games are PSPACE-complete. In FSTTCS'07, 2007. (To appear.)
[CHJ04] K. Chatterjee, T.A. Henzinger, and M. Jurdziński. Games with secure equilibria. In LICS'04, pages 160–169. IEEE, 2004.
[CHJM05] K. Chatterjee, T.A. Henzinger, R. Jhala, and R. Majumdar. Counterexample-guided planning. In UAI'05, pages 104–111. AUAI Press, 2005.
[CHP07] K. Chatterjee, T.A. Henzinger, and N. Piterman. Generalized parity games. In FoSSaCS'07, LNCS 4423, Springer, 2007.
[Chu62] A. Church. Logic, arithmetic, and automata. In Proceedings of the International Congress of Mathematicians, pages 23–35. Institut Mittag-Leffler, 1962.
[CJH03] K. Chatterjee, M. Jurdziński, and T.A. Henzinger. Simple stochastic parity games. In CSL'03, LNCS 2803, Springer, pages 100–113, 2003.
[CJH04] K. Chatterjee, M. Jurdziński, and T.A. Henzinger. Quantitative stochastic parity games. In SODA'04, ACM-SIAM, pages 114–123, 2004.
[CMH07] K. Chatterjee, R. Majumdar, and T.A. Henzinger. Stochastic limit-average games are in EXPTIME. International Journal of Game Theory, 2007. (To appear.)
[Con92] A. Condon. The complexity of stochastic games. Information and Computation, 96(2):203–224, 1992.
[Con93] A. Condon. On algorithms for simple stochastic games. In Advances in Computational Complexity Theory, volume 13 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 51–73. American Mathematical Society, 1993.
[CY90] C. Courcoubetis and M. Yannakakis. Markov decision processes and regular events. In ICALP'90, LNCS 443, Springer, pages 336–349, 1990.
[CY95] C. Courcoubetis and M. Yannakakis. The complexity of probabilistic verification. Journal of the ACM, 42(4):857–907, 1995.
[dA97] L. de Alfaro. Formal Verification of Probabilistic Systems. PhD thesis, Stanford University, 1997.
[dAFH+03] L. de Alfaro, M. Faella, T.A. Henzinger, R. Majumdar, and M. Stoelinga. The element of surprise in timed games. In CONCUR'03, LNCS 2761, Springer, pages 144–158, 2003.
[dAH00] L. de Alfaro and T.A. Henzinger. Concurrent omega-regular games. In LICS'00, pages 141–154. IEEE, 2000.
[dAH01] L. de Alfaro and T.A. Henzinger. Interface theories for component-based design. In EMSOFT'01, LNCS 2211, Springer, pages 148–165, 2001.
[dAHK98] L. de Alfaro, T.A. Henzinger, and O. Kupferman. Concurrent reachability games. In FOCS'98, pages 564–575. IEEE, 1998.
[dAHM00a] L. de Alfaro, T.A. Henzinger, and F.Y.C. Mang. The control of synchronous systems. In CONCUR'00, LNCS 1877, Springer, pages 458–473, 2000.
[dAHM00b] L. de Alfaro, T.A. Henzinger, and F.Y.C. Mang. Detecting errors before reaching them. In CAV'00, LNCS 1855, Springer, pages 186–201, 2000.
[dAHM01] L. de Alfaro, T.A. Henzinger, and F.Y.C. Mang. The control of synchronous systems, part II. In CONCUR'01, LNCS 2154, Springer, pages 566–580, 2001.
[dAHM03] L. de Alfaro, T.A. Henzinger, and R. Majumdar. Discounting the future in systems theory. In ICALP'03, LNCS 2719, Springer, pages 1022–1037, 2003.
[dAM01] L. de Alfaro and R. Majumdar. Quantitative solution of omega-regular games. In STOC'01, pages 675–683. ACM, 2001.
[Der70] C. Derman. Finite State Markovian Decision Processes. Academic Press, 1970.
[DGP06] C. Daskalakis, P.W. Goldberg, and C.H. Papadimitriou. The complexity of computing a Nash equilibrium. In STOC'06. ACM, 2006. ECCC TR05-115.
[Dil89] D.L. Dill. Trace Theory for Automatic Hierarchical Verification of Speed-independent Circuits. The MIT Press, 1989.
[DJW97] S. Dziembowski, M. Jurdziński, and I. Walukiewicz. How much memory is needed to win infinite games? In LICS'97, pages 99–110. IEEE, 1997.
[Dur95] R. Durrett. Probability: Theory and Examples. Duxbury Press, 1995.
[EJ88] E.A. Emerson and C. Jutla. The complexity of tree automata and logics of programs. In FOCS'88, pages 328–337. IEEE, 1988.
[EJ91] E.A. Emerson and C. Jutla. Tree automata, mu-calculus and determinacy. In FOCS'91, pages 368–377. IEEE, 1991.
[EM79] A. Ehrenfeucht and J. Mycielski. Positional strategies for mean payoff games. International Journal of Game Theory, 8(2):109–113, 1979.
[Eve57] H. Everett. Recursive games. In Contributions to the Theory of Games III, volume 39 of Annals of Mathematical Studies, pages 47–78, 1957.
[EY05] K. Etessami and M. Yannakakis. Recursive Markov decision processes and recursive stochastic games. In ICALP'05, LNCS 3580, Springer, pages 891–903, 2005.
[EY06] K. Etessami and M. Yannakakis. Recursive concurrent stochastic games. In ICALP'06 (2), LNCS 4052, Springer, pages 324–335, 2006.
[EY07] K. Etessami and M. Yannakakis. On the complexity of Nash equilibria and other fixed points. In FOCS'07. IEEE, 2007.
[Fin64] A.M. Fink. Equilibrium in a stochastic n-person game. Journal of Science of Hiroshima University, 28:89–93, 1964.
[FV97] J. Filar and K. Vrieze. Competitive Markov Decision Processes. Springer-Verlag, 1997.
[GH82] Y. Gurevich and L. Harrington. Trees, automata, and games. In STOC'82, pages 60–65. ACM, 1982.
[HD05] P. Hunter and A. Dawar. Complexity bounds for regular games. In MFCS'05, pages 495–506, 2005.
[HJM03] T.A. Henzinger, R. Jhala, and R. Majumdar. Counterexample-guided control. In ICALP'03, LNCS 2719, Springer, pages 886–902, 2003.
[HKR02] T.A. Henzinger, O. Kupferman, and S. Rajamani. Fair simulation. Information and Computation, 173:64–81, 2002.
[HMMR00] T.A. Henzinger, R. Majumdar, F.Y.C. Mang, and J.-F. Raskin. Abstract interpretation of game properties. In SAS'00, LNCS 1824, Springer, pages 220–239, 2000.
[Hor05] F. Horn. Streett games on finite graphs. In GDV'05, 2005.
[JPZ06] M. Jurdziński, M. Paterson, and U. Zwick. A deterministic subexponential algorithm for solving parity games. In SODA'06, pages 117–123. ACM-SIAM, 2006.
[Jr50] J.F. Nash, Jr. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences USA, 36:48–49, 1950.
[Jur00] M. Jurdziński. Small progress measures for solving parity games. In STACS'00, LNCS 1770, Springer, pages 290–301, 2000.
[Kem83] J.H. Kemeny. Finite Markov Chains. Springer, 1983.
[Koz83] D. Kozen. Results on the propositional µ-calculus. Theoretical Computer Science, 27(3):333–354, 1983.
[Kre90] D.M. Kreps. A Course in Microeconomic Theory. Princeton University Press, 1990.
[KV98] O. Kupferman and M.Y. Vardi. Weak alternating automata and tree automata emptiness. In STOC'98, pages 224–233. ACM, 1998.
[LL69] T.A. Liggett and S.A. Lippman. Stochastic games with perfect information and time average payoff. SIAM Review, 11:604–607, 1969.
[Maj03] R. Majumdar. Symbolic algorithms for verification and control. PhD thesis, UC Berkeley, 2003.
[Mar75] D.A. Martin. Borel determinacy. Annals of Mathematics, 102(2):363–371, 1975.
[Mar98] D.A. Martin. The determinacy of Blackwell games. The Journal of Symbolic Logic, 63(4):1565–1581, 1998.
[McN93] R. McNaughton. Infinite games played on finite graphs. Annals of Pure and Applied Logic, 65:149–184, 1993.
[MM02] A. McIver and C. Morgan. Games, probability, and the quantitative µ-calculus qµ. In LPAR'02, LNCS 2514, Springer, pages 292–310, 2002.
[MN81] J.F. Mertens and A. Neyman. Stochastic games. International Journal of Game Theory, 10:53–66, 1981.
[Mos84] A.W. Mostowski. Regular expressions for infinite trees and a standard form of automata. In 5th Symposium on Computation Theory, LNCS 208, Springer, pages 157–168, 1984.
[MP92] Z. Manna and A. Pnueli. The Temporal Logic of Reactive and Concurrent Systems: Specification. Springer-Verlag, 1992.
[MPS95] O. Maler, A. Pnueli, and J. Sifakis. On the synthesis of discrete controllers for timed systems. In STACS'95, LNCS 900, Springer-Verlag, pages 229–242, 1995.
[NAT03] N. Amla, E.A. Emerson, K. Namjoshi, and R. Trefler. Abstract patterns for compositional reasoning. In CONCUR'03, 2003.
[Niw97] D. Niwiński. Fixed-point characterization of infinite behavior of finite-state systems. Theoretical Computer Science, 189(1-2):1–69, 1997.
[NS03] A. Neyman and S. Sorin. Stochastic Games and Applications. Kluwer Academic Publishers, 2003.
[Owe95] G. Owen. Game Theory. Academic Press, 1995.
[Pap01] C.H. Papadimitriou. Algorithms, games, and the internet. In STOC'01, pages 749–753. ACM Press, 2001.
[PP06] N. Piterman and A. Pnueli. Faster solution of Rabin and Streett games. In LICS'06, pages 275–284. IEEE, 2006.
[PR89] A. Pnueli and R. Rosner. On the synthesis of a reactive module. In POPL'89, pages 179–190. ACM, 1989.
[PT87] C.H. Papadimitriou and J.N. Tsitsiklis. The complexity of Markov decision processes. Mathematics of Operations Research, 12:441–450, 1987.
[Rab69] M.O. Rabin. Automata on Infinite Objects and Church's Problem. Number 13 in Conference Series in Mathematics. American Mathematical Society, 1969.
[Rei79] J.H. Reif. Universal games of incomplete information. In STOC'79, pages 288–308. ACM, 1979.
[RF91] T.E.S. Raghavan and J.A. Filar. Algorithms for stochastic games — a survey. ZOR — Methods and Models of Operations Research, 35:437–472, 1991.
[RW87] P.J. Ramadge and W.M. Wonham. Supervisory control of a class of discrete-event processes. SIAM Journal of Control and Optimization, 25(1):206–230, 1987.
[Saf88] S. Safra. On the complexity of ω-automata. In Proceedings of the 29th Annual Symposium on Foundations of Computer Science, pages 319–327. IEEE Computer Society Press, 1988.
[Sha53] L.S. Shapley. Stochastic games. Proc. Nat. Acad. Sci. USA, 39:1095–1100, 1953.
[SS01] P. Secchi and W.D. Sudderth. Stay-in-a-set games. International Journal of Game Theory, 30:479–490, 2001.
[Tar51] A. Tarski. A Decision Method for Elementary Algebra and Geometry. University of California Press, Berkeley and Los Angeles, 1951.
[Tho95] W. Thomas. On the synthesis of strategies in infinite games. In STACS'95, LNCS 900, Springer, pages 1–13, 1995.
[Tho97] W. Thomas. Languages, automata, and logic. In Handbook of Formal Languages, volume 3, Beyond Words, chapter 7, pages 389–455. Springer, 1997.
[Var85] M.Y. Vardi. Automatic verification of probabilistic concurrent finite-state systems. In FOCS'85, pages 327–338. IEEE, 1985.
[Vie00a] N. Vieille. Two player stochastic games I: a reduction. Israel Journal of Mathematics, 119:55–91, 2000.
[Vie00b] N. Vieille. Two player stochastic games II: the case of recursive games. Israel Journal of Mathematics, 119:93–126, 2000.
[VJ00] J. Vöge and M. Jurdziński. A discrete strategy improvement algorithm for solving parity games. In CAV'00, LNCS 1855, Springer, pages 202–215, 2000.
[vNM47] J. von Neumann and O. Morgenstern. Theory of Games and Economic Behavior. Princeton University Press, 1947.
[VW86] M.Y. Vardi and P. Wolper. An automata-theoretic approach to automatic program verification. In LICS'86, pages 322–331. IEEE, 1986.
[Wad84] W.W. Wadge. Reducibility and Determinateness of Baire Spaces. PhD thesis, UC Berkeley, 1984.
[Wal96] I. Walukiewicz. Pushdown processes: Games and model checking. In CAV'96, LNCS 1102, Springer, pages 62–74, 1996.
[Wal04] I. Walukiewicz. A landscape with games in the background. In LICS'04, pages 356–366. IEEE, 2004.
[Zie98] W. Zielonka. Infinite games on finitely coloured graphs with applications to automata on infinite trees. Theoretical Computer Science, 200(1-2):135–183, 1998.
[ZP96] U. Zwick and M.S. Paterson. The complexity of mean payoff games on graphs. Theoretical Computer Science, 158:343–359, 1996.