1,2,3 - ENS Cachan

Games with imperfect information:
Theory and algorithms
Laurent Doyen
LSV, ENS Cachan & CNRS
3-coin game
Player 1
Player 2
• Player 1 does not see the coins but knows how many coins are on H
(imperfect information). Player 2 does see them (perfect information).
• Initially, two coins are on H. The game is played in rounds as follows:
Player 1 chooses a coin C in {1,2,3}. Player 2 flips C, then decides whether
to exchange the positions of the other two coins. He announces the
number of H to Player 1.
• Player 1 wins when all coins are on H, Player 2 wins when all coins are
on T or if the game never reaches 3 coins on H.
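The rules above can be sketched as a small transition function. This is only an illustrative model: the names `successors`, `observation` and `initial_cell` are hypothetical, and the initial cell assumes that Player 1 knows nothing beyond "exactly two coins are on H".

```python
from itertools import product

FLIP = {"H": "T", "T": "H"}

def successors(state, coin):
    """States that Player 2 can reach after Player 1 chooses `coin` (1, 2 or 3)."""
    i = coin - 1
    coins = list(state)
    coins[i] = FLIP[coins[i]]                 # Player 2 flips the chosen coin
    j, k = [x for x in range(3) if x != i]
    swapped = list(coins)
    swapped[j], swapped[k] = swapped[k], swapped[j]   # optional exchange
    return {tuple(coins), tuple(swapped)}

def observation(state):
    """What Player 1 is told: the number of coins on H."""
    return state.count("H")

# Assumption: Player 1 initially only knows that exactly two coins are on H.
initial_cell = {s for s in product("HT", repeat=3) if observation(s) == 2}

print(successors(("H", "H", "T"), 3))   # choosing the T coin leads to HHH
print(successors(("H", "H", "T"), 1))   # choosing an H coin leaves one H
```

From HHT, only coin 3 wins immediately; Player 1 cannot tell which of the three states in `initial_cell` is the real one.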
3-coin game
[Figure: a play of the 3-coin game passing through 2H, 1H and 0H. Losing for Player 1.]
3-coin game
[Figure: a play of the 3-coin game passing through 2H, 1H, 2H and 3H. Winning for Player 1.]
Content
• Game structures with imperfect information, to
model games such as the 3-coin game.
• Two variants: deterministic vs. randomized
strategies
• Algorithms to solve games with imperfect
information, i.e. to decide who is the winner, and
synthesize winning strategies (when they exist).
Sections 3 & 4 in the notes.
Game structures with
imperfect information
Imperfect information
• Games with perfect information make the strong assumption that
the players can observe the state of the game and the previous moves
before playing.
• This is often unrealistic in the design of reactive systems because
- processes have internal state not visible to other processes
(e.g. local variables).
- noisy sensors entail uncertainties on the state of the game.
Hidden variables = imperfect information.
Sensor uncertainty = imperfect information.
Game structures
Rounds
Games are played by two players for infinitely many rounds, initially in the location l_0.
Round i:
l_i is the current location
- Player 1 chooses an action σ_i
- Player 2 resolves the nondeterminism,
by choosing a successor l_{i+1}
such that (l_i, σ_i, l_{i+1}) is a transition
Round i+1 starts in l_{i+1}
Remark
[Figure: example game structure with transitions labelled a and b.]
See also Exercise 3 in the lecture notes.
Play, history
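A play is an infinite sequence of locations π = l_0 l_1 l_2 ...
such that l_0 is the initial location
and for all i ≥ 0 there is an action σ_i such that (l_i, σ_i, l_{i+1}) is a transition.
A history of a play is a finite prefix of the play.
We denote by π(j) the prefix of length j of the play π, and by Last(π(j)) its last location.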
Strategies
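A deterministic strategy for Player 1 is a function α that
maps histories to actions.
A deterministic strategy for Player 2 is a function β that maps a history π and an action σ to a location
such that: β(π, σ) is a σ-successor of the last location of π.
A strategy α
is memoryless if α(π) = α(π') for all histories π, π' with the same last location,
i.e., memoryless strategies depend only on the last location of the history.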
Outcome
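The outcome of a strategy α for Player 1 and a strategy β for Player 2
is the play l_0 l_1 l_2 ...
where σ_i = α(l_0 ... l_i) and l_{i+1} = β(l_0 ... l_i, σ_i) for all i ≥ 0.
This play is denoted outcome(G, α, β).
A play
is consistent with a strategy α
for Player 1 if it is outcome(G, α, β)
for some strategy β
for Player 2.
The set of plays consistent with α
is denoted Outcome_1(G, α) (similarly Outcome_2(G, β) for Player 2).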
Winning condition
A winning condition for a game is a set of
sequences of locations, i.e. a subset of L^ω, where L is the set of locations.
• A reachability objective Reach(T) is defined by a set T of target locations: the plays that visit T.
• A safety objective Safe(T) is defined by a set T of safe locations: the plays that never leave T.
Surely winning
Let G be a game structure and φ be a winning condition.
The strategy α for Player 1 is surely-winning in G for φ
if all plays consistent with α are in φ
(similarly for Player 2).
Memoryless strategies suffice to win perfect-information
games with reachability and safety objectives.
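For the perfect-information case, the classical way to compute the winning locations of a reachability objective is a backward (attractor) computation, which also yields a memoryless strategy. The sketch below is an illustration under assumptions: the encoding of the transition relation as a dictionary `delta` and the example game are hypothetical, not taken from the slides.

```python
def attractor(locations, actions, delta, target):
    """Locations from which Player 1 can force a visit to `target`.

    `delta` maps (location, action) to the set of successors among which
    Player 2 chooses. Returns the winning set and a memoryless strategy.
    """
    win = set(target)
    strategy = {}
    changed = True
    while changed:
        changed = False
        for l in locations:
            if l in win:
                continue
            for a in actions:
                succ = delta.get((l, a), set())
                # action a is good if every choice of Player 2 stays winning
                if succ and succ <= win:
                    win.add(l)
                    strategy[l] = a
                    changed = True
                    break
    return win, strategy

# Hypothetical example: location 3 is the target.
delta = {
    (0, "a"): {1, 2}, (0, "b"): {0},
    (1, "a"): {3},    (1, "b"): {0},
    (2, "a"): {0},    (2, "b"): {3},
    (3, "a"): {3},    (3, "b"): {3},
}
win, strategy = attractor([0, 1, 2, 3], "ab", delta, {3})
print(win, strategy)
```

The strategy only looks at the current location, which is why it needs perfect information: in location 1 it plays a, in location 2 it plays b.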
Observations
The observations form a partition of the set of locations.
Given a location l, we denote by obs(l) the (unique) observation o
such that l ∈ o.
While playing, only the observation of the current location
is visible to Player 1.
Observation-based strategies
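A strategy α
is observation-based if α(π) = α(π') for all histories π, π' that have the same sequence of observations.
Example: an observation-based strategy plays the same action
after two histories that differ only in locations with the same observation.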
Special cases
• Perfect-information game (observations are singletons)
• Blind game (one observation)
The universality problem for finite automata can be reduced
to solving a blind reachability game.
[Figure: the blind reachability game G_A obtained from an NFA A by adding #-transitions to the target from all rejecting states.]
A word w is rejected by the NFA A
iff w# is a winning strategy in the
blind reachability game G_A.
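The reduction can be sketched as follows. This is a minimal illustration, assuming a complete NFA (every state has a successor on every letter); the location names `TARGET` and `SINK`, and the choice to send accepting states to a sink on #, are assumptions made for the example.

```python
TARGET, SINK = "target", "sink"   # hypothetical names for the two extra locations

def blind_game(states, alphabet, delta, accepting):
    """Transition map of G_A: (location, action) -> set of successors."""
    game = {}
    for q in states:
        for a in alphabet:
            game[(q, a)] = set(delta.get((q, a), set()))
        # '#' leads to the target from rejecting states only
        game[(q, "#")] = {SINK} if q in accepting else {TARGET}
    for extra in (TARGET, SINK):
        for a in list(alphabet) + ["#"]:
            game[(extra, a)] = {extra}
    return game

def is_winning(game, initial, actions):
    """Does blindly playing `actions` reach the target whatever Player 2 does?"""
    cell = {initial}
    for a in actions:
        cell = set().union(*(game[(l, a)] for l in cell))
    return cell == {TARGET}

# Hypothetical NFA accepting the words over {a, b} that contain an 'a'.
nfa_delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {1}, (1, "b"): {1}}
game = blind_game({0, 1}, "ab", nfa_delta, accepting={1})
print(is_winning(game, 0, "bb#"))   # "bb" is rejected by the NFA
print(is_winning(game, 0, "ba#"))   # "ba" is accepted by the NFA
```

Since Player 1 is blind, a strategy is just a word; the automaton is universal exactly when no word of the form w# is winning.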
Imperfect information: discussion
• The games that we consider are asymmetric.
• Indeed, Player 1 has imperfect information while Player 2 has
perfect information.
• Nevertheless, making Player 2 weaker (with imperfect information)
would not help Player 1 to surely win.
• Indeed, it can be shown that counting strategies are sufficient for
spoiling deterministic strategies.
• See Theorem 4 in the lecture notes.
Imperfect information: discussion
• We consider observable objectives.
• Observable objectives can be viewed as sets of infinite sequences of observations.
• Observable reachability objectives are defined by a set of observations.
• See Exercise 9 for a generalization to non-observable objectives.
Example
Can Player 1 surely-win
with an observation-based
strategy ?
NO!
Fix an arbitrary strategy α for Player 1.
There is a strategy β for Player 2, defined in terms of α, that
is a spoiling strategy against α: the outcome of α and β is not winning for Player 1.
Similarly, Player 2 has no pure strategy to ensure that the target is never reached.
Since reaching the target and never reaching it are
complementary objectives, this shows that games
with imperfect information are not determined.
Memory may be necessary
[Figure: example game, annotated with "Play a" and "Play b".]
Player 1 needs memory for surely-winning
The knowledge of Player 1 provides information
about the history of the play.
Solving games with
imperfect information
Reduction to perfect information
After a finite prefix of a play, Player 1 has a
partial knowledge of the current state of the
game: a set of states, called a cell.
Initial knowledge: cell s_0 = {l_0}
Player 1 plays σ,
Player 2 chooses a successor l', and Player 1 sees its observation o.
Current knowledge: cell s' = post_σ(s) ∩ o, where post_σ(s) is the set of σ-successors of the locations in s
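The knowledge update can be written as a two-line function: take all successors of the current cell under the action played, then keep those compatible with the observation received. The dictionary encoding `delta` and the five-location example game are assumptions made for illustration.

```python
def post(delta, cell, action):
    """All locations reachable from some location of `cell` by `action`."""
    return set().union(*(delta[(l, action)] for l in cell))

def update(delta, cell, action, observation):
    """Knowledge of Player 1 after playing `action` and seeing `observation`."""
    return frozenset(post(delta, cell, action) & observation)

# Hypothetical game: from 0, Player 2 moves to 1 or 2 (same observation);
# 3 is the target and 4 is a losing sink.
delta = {
    (0, "a"): {1, 2}, (0, "b"): {1, 2},
    (1, "a"): {3},    (1, "b"): {4},
    (2, "a"): {4},    (2, "b"): {3},
    (3, "a"): {3},    (3, "b"): {3},
    (4, "a"): {4},    (4, "b"): {4},
}
cell = update(delta, {0}, "a", {1, 2})
print(cell)                                 # Player 1 cannot tell 1 from 2
print(update(delta, cell, "a", {3}))        # after seeing the target
```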
Subset construction
[Figure: a game with imperfect information and the game with perfect information obtained by the subset construction. The objective is reachability of a union of observations.]
Theorem
Player 1 is winning in the game (G, Obs) with imperfect information
if and only if Player 1 is winning in the game G' with perfect information.
See Exercise 8 for general reachability objectives.
Imperfect information
Games of imperfect information can be solved by a
reduction to games of perfect information.
[Diagram: the game (G, Obs) with imperfect information is turned by the subset construction, with an exponential blow-up, into a game G' with perfect information, whose winning region is computed by classical techniques. A direct symbolic algorithm computes the winning region from (G, Obs) while keeping G' implicit.]
Symbolic algorithm
Controllable predecessor: CPre maps a set of cells to a set of cells.
Winning cells for Reach(T): the least fixpoint of q ↦ {cells s ⊆ T} ∪ CPre(q).
Symbolic algorithm
[Figure: cells inside the observations Obs 1 and Obs 2.]
The union of two controllable cells is not necessarily controllable,
but…
Symbolic algorithm
If a cell s is controllable (i.e. winning for Player 1),
then all sub-cells s' ⊆ s are controllable:
copy the strategy from s.
Symbolic algorithm
The sets of cells computed by the fixpoint iterations are
downward-closed.
It is sufficient to keep only
the ⊆-maximal cells.
Antichains
Antichains
The antichain {{1,2,3},{3,4}}
represents the set of cells
{1,2,3}, {1,2}, {1,3}, {2,3}, {1}, {2}, {3}, {3,4}, {4}
i.e. the downward-closure of {{1,2,3},{3,4}}
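The example can be checked mechanically. The sketch below, with hypothetical function names, enumerates the non-empty cells represented by an antichain and shows that membership can be tested without ever building that set.

```python
from itertools import combinations

def downward_closure(antichain):
    """All non-empty cells that are subsets of some element of the antichain."""
    cells = set()
    for s in antichain:
        items = sorted(s)
        for r in range(1, len(items) + 1):
            cells.update(frozenset(c) for c in combinations(items, r))
    return cells

def member(cell, antichain):
    """Membership test that never builds the closure explicitly."""
    return any(cell <= s for s in antichain)

q = [frozenset({1, 2, 3}), frozenset({3, 4})]
print(len(downward_closure(q)))          # number of represented cells
print(member(frozenset({1, 3}), q))      # {1,3} is below {1,2,3}
print(member(frozenset({2, 4}), q))      # {2,4} is below no element
```

The closure has 9 cells but the antichain stores only 2; in general the gap can be exponential.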
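Structure of antichains
Membership: is a cell s in the set represented by an antichain q?
Yes iff s ⊆ s' for some s' ∈ q.
Inclusion: is the set represented by q included in the set represented by q'?
Yes iff every s ∈ q is a subset of some s' ∈ q'.
This defines a partial order on antichains.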
Structure of antichains
Union
The union of antichains q and q' is the set of
maximal elements of q ∪ q'.
Computing the union
is polynomial.
Structure of antichains
Intersection
The intersection of antichains q and q' is the set of
maximal elements of {s ∩ s' | s ∈ q, s' ∈ q'}.
Computing the intersection of a family of antichains
is exponential !
Structure of antichains
Independent set
(pairwise non-adjacent vertices)
Computing a largest independent set is NP-hard.
Consider a graph G = (V, E).
The sets of vertices that do not contain both endpoints of the edge (v,w) are
represented by the antichain {V \ {v}, V \ {w}}.
Hence, the maximal independent sets of G are defined by
the intersection of these antichains over all edges (v,w) ∈ E.
Computing this intersection
is exponential (unless P=NP)
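A small sketch of these operations, under the assumption that the union of antichains keeps the maximal elements of q ∪ q' and the intersection keeps the maximal pairwise intersections (function names are hypothetical). Each binary step is cheap, but the antichain can grow at every step when many antichains are intersected, which is where the exponential cost comes from.

```python
from functools import reduce

def maximal(cells):
    """Keep only the subset-maximal cells."""
    cells = set(cells)
    return {s for s in cells if not any(s < t for t in cells)}

def join(q1, q2):
    """Union of the represented sets: maximal elements of q1 ∪ q2."""
    return maximal(set(q1) | set(q2))

def meet(q1, q2):
    """Intersection of the represented sets, via pairwise intersections."""
    return maximal(s & t for s in q1 for t in q2)

def maximal_independent_sets(vertices, edges):
    """Intersect, over all edges (v, w), the antichains {V - {v}, V - {w}}."""
    V = frozenset(vertices)
    per_edge = [{V - {v}, V - {w}} for v, w in edges]
    return reduce(meet, per_edge, {V})

# Path graph 1 - 2 - 3.
print(maximal_independent_sets({1, 2, 3}, [(1, 2), (2, 3)]))
```

On the path 1 - 2 - 3 the result is the antichain {{1,3},{2}}, the two maximal independent sets of that graph.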
Structure of antichains
Antichains, partially ordered by inclusion of the sets of cells they represent, form
a complete lattice.
Symbolic algorithm
Controllable predecessor operator
CPre(q) = cells s
from which Player 1 has an action (σ)
such that for all observations o
the cell post_σ(s) ∩ o
chosen by Player 2
is in q
Symbolic algorithm
Controllable predecessor operator
If q is downward-closed, then CPre(q) is downward-closed.
CPre(·) preserves downward-closedness.
Antichains
Symbolic algorithms
Algorithm for solving reachability games
W_i = set of cells from which Player 1 can force a visit to the
target T
within at most i steps.
Fixpoint iteration: W_0 = cells contained in T, W_{i+1} = W_i ∪ CPre(W_i), until W_{i+1} = W_i = W*.
W* is the set of (maximal) winning cells in
G for reachability objective Reach(T).
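The fixpoint can be illustrated by a brute-force version that stores the sets of cells explicitly instead of as antichains, which is only feasible for tiny games. Everything here is a sketch under assumptions: the dictionary encoding, the example game, and the convention that an empty updated cell imposes no constraint are choices made for the illustration.

```python
from itertools import combinations

def cells_of(observations):
    """All non-empty subsets of each observation (brute force, tiny games only)."""
    return {frozenset(c) for o in observations
            for r in range(1, len(o) + 1) for c in combinations(sorted(o), r)}

def cpre(q, cells, actions, delta, observations):
    """Cells with an action whose every non-empty updated cell lies in q."""
    def good(s, a):
        post = set().union(*(delta[(l, a)] for l in s))
        return all(not (post & o) or frozenset(post & o) in q
                   for o in observations)
    return {s for s in cells if any(good(s, a) for a in actions)}

def solve_reachability(actions, delta, observations, target):
    cells = cells_of(observations)
    win = {s for s in cells if s <= target}              # W_0
    while True:
        new = win | cpre(win, cells, actions, delta, observations)
        if new == win:
            return win
        win = new

# Hypothetical game: Player 1 cannot distinguish locations 1 and 2.
delta = {
    (0, "a"): {1, 2}, (0, "b"): {1, 2},
    (1, "a"): {3},    (1, "b"): {4},
    (2, "a"): {4},    (2, "b"): {3},
    (3, "a"): {3},    (3, "b"): {3},
    (4, "a"): {4},    (4, "b"): {4},
}
observations = [frozenset({0}), frozenset({1, 2}), frozenset({3}), frozenset({4})]
winning = solve_reachability("ab", delta, observations, {3})
print(winning)
```

In this example the cells {1} and {2} are winning but their union {1,2} is not, so the initial cell {0} is not winning either.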
Antichain algorithms
Antichain algorithms have applications to solve automata-theoretic problems (using equivalence with blind games)
• Finite automata: language inclusion, universality,
etc.
[De Wulf,D,Henzinger,Raskin 06]
• Alternating Büchi automata: emptiness and
language inclusion.
[D,Raskin 07]
• LTL: satisfiability and model-checking.
[De Wulf,D,Maquet,Raskin 08]
http://www.antichains.be
[Figure: execution time (s) against number of states (up to 5000), comparing dk.brics and Alaska.]
Sure-winning - Summary
• Games with imperfect information are EXPTIME-complete (both for
reachability objective [CDHR07] and safety objective [BD08])
• Games with imperfect information with the notion of surely-winning
are not determined
• Finite memory is needed even for reachability objectives
• Knowledge-based subset construction allows us to construct
equivalent games with perfect information
• Antichains are adequate data-structures to handle underlying state
spaces
• Blind games and antichains are useful to obtain new efficient
algorithms for classical automata-theoretic problems
Almost-sure winning
3-coin game
Player 1
Player 2
• Player 1 does not see the coins but knows how many coins are on H
(imperfect information). Player 2 does see them (perfect information).
• Initially, two coins are on H. The game is played in rounds as follows:
Player 1 chooses a coin C in {1,2,3}. Player 2 flips C, then he decides to
exchange or not the position of the other two coins. He announces the
number of H to Player 1.
• Player 1 wins when all coins are on H, Player 2 wins when all coins are
on T or if the game never reaches 3 coins on H.
3-coin game
Without exchange
Winning strategy: try flipping each coin in turn.
3-coin game
Player 1
Player 2
• Player 1 cannot be surely-winning if Player 2 is allowed to exchange
coins.
• This is because against a fixed strategy of Player 1, Player 2 can
anticipate the next move of Player 1 and decide to exchange coins
accordingly to always avoid HHH.
3-coin game
Player 1
Player 2
• However, consider the following strategy for Player 1: choose uniformly at
random a coin to flip;
- With probability 1/3 the winning state HHH is reached;
- Otherwise, flip the same coin again, and start again.
• In the long run, Player 1 wins with probability 1, i.e. almost-surely.
• This example shows that randomized strategies are more powerful than
deterministic strategies to win reachability games with imperfect
information.
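Each attempt of this strategy succeeds with probability 1/3 whatever Player 2 does, so the probability of not having won after k attempts is (2/3)^k, which tends to 0. The simulation below is only a sanity check of that argument: the adversary exchanges coins at random, which is an assumption (a real Player 2 may use any strategy), and all names are hypothetical.

```python
import random

FLIP = {"H": "T", "T": "H"}

def play_round(state, coin, swap):
    """Player 2 flips `coin` (0-based) and, if `swap`, exchanges the other two."""
    coins = list(state)
    coins[coin] = FLIP[coins[coin]]
    if swap:
        j, k = [x for x in range(3) if x != coin]
        coins[j], coins[k] = coins[k], coins[j]
    return tuple(coins)

def rounds_until_win(rng, max_rounds=10_000):
    """Play the randomized strategy against a randomly swapping adversary."""
    state = ("H", "H", "T")
    for n in range(1, max_rounds + 1, 2):
        coin = rng.randrange(3)                       # uniform choice of a coin
        state = play_round(state, coin, rng.random() < 0.5)
        if state.count("H") == 3:
            return n
        # otherwise flip the same coin again to get back to two heads
        state = play_round(state, coin, rng.random() < 0.5)
    return None

rng = random.Random(0)
results = [rounds_until_win(rng) for _ in range(1000)]
print(all(r is not None for r in results))
```

The second flip of the same coin always restores a state with two coins on H, so the state TTT is never reached along the way.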
Strategies
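A randomized strategy for Player 1 is a function α
that maps histories to probability distributions over actions.
A randomized strategy for Player 2 is a function β that maps a history and an action to a probability distribution over locations
such that: only successors of the last location under that action have positive probability.
Given strategies α and β, and an initial location l, the probability of a finite
play l_0 l_1 ... l_n
with l_0 = l
is the product, over all steps i < n, of the probability that α and β together move from l_i to l_{i+1} after the history l_0 ... l_i.
The notion of observation-based strategy carries over to randomized strategies.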
Outcome
For a measurable set φ of plays, we denote by Pr_l^{α,β}(φ) the
probability that a play satisfies φ
when strategies α and β are used
from location l.
A strategy α for Player 1 is almost-surely winning for objective φ
from l
if for all strategies β for Player 2, we have: Pr_l^{α,β}(φ) = 1.
Note that our definition is again asymmetric.
While having perfect information does not help Player 2 in the case of
surely-winning, it makes Player 2 stronger in this probabilistic setting.
See [BGG09,GS09] for a symmetric model.
Example
Almost-surely winning strategy: play a and b uniformly at random
Almost-sure winning
Classical knowledge-based
subset construction ?
Does not preserve almost-sure
winning strategies !
We need to reduce the power of
Player 2.
Almost-sure winning
Extended subset construction: states (s,l)
- Set s tracks the knowledge of Player 1.
- Location l (the current location) keeps track of
Player 2’s choices.
Extended subset construction
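Given a game structure with imperfect information (G, Obs),
we construct the extended knowledge-based subset construction Knw(G) as
follows:
- states are pairs (s,l) where s is a cell and l ∈ s;
- the initial state is ({l_0}, l_0);
- there is a σ-transition from (s,l) to (s',l') if l' is a σ-successor of l and s' = post_σ(s) ∩ o, where o is the observation of l'.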
• In a state (s,l), the strategy of Player 1 should depend only on
knowledge s, not on the location l.
• Define an equivalence between states: (s,l) ≈ (s',l') iff s = s',
• and require Player 1 to play with equivalence-preserving strategies.
Extended subset construction
The equivalence ≈
induces equivalence of plays and histories, and equivalence-preserving
strategies: strategies that play the same way after equivalent histories.
Extended subset construction
Reachability objective Reach(T)
becomes Reach(T'), where T' is the set of states (s,l) with l ∈ T.
Player 1 has an observation-based almost-sure
winning strategy in G for Reach(T) if and only if
Player 1 has an equivalence-preserving almost-sure winning strategy in
Knw(G) for Reach(T').
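A sketch of how the states (s,l) could be generated, assuming that a transition updates the knowledge s as in the classical subset construction and moves l along an actual transition of G. The encoding and the example game are hypothetical and only meant to show the shape of the construction.

```python
from collections import deque

def extended_subset_construction(l0, actions, delta, observations):
    """Reachable states (s, l) and transitions of the extended construction."""
    init = (frozenset({l0}), l0)
    states, edges, todo = {init}, {}, deque([init])
    while todo:
        s, l = todo.popleft()
        for a in actions:
            post = set().union(*(delta[(m, a)] for m in s))
            succ = set()
            for o in observations:
                s2 = frozenset(post & o)
                # Player 2 picks an actual successor l2 of l inside s2
                succ |= {(s2, l2) for l2 in delta[(l, a)] if l2 in s2}
            edges[((s, l), a)] = succ
            for state in succ - states:
                states.add(state)
                todo.append(state)
    return states, edges

def equivalent(q1, q2):
    """Two states are equivalent when Player 1's knowledge is the same."""
    return q1[0] == q2[0]

delta = {
    (0, "a"): {1, 2}, (0, "b"): {1, 2},
    (1, "a"): {3},    (1, "b"): {4},
    (2, "a"): {4},    (2, "b"): {3},
    (3, "a"): {3},    (3, "b"): {3},
    (4, "a"): {4},    (4, "b"): {4},
}
observations = [frozenset({0}), frozenset({1, 2}), frozenset({3}), frozenset({4})]
states, edges = extended_subset_construction(0, "ab", delta, observations)
print(len(states))
```

The states ({1,2},1) and ({1,2},2) are equivalent, so an equivalence-preserving strategy must play the same distribution in both; playing a and b uniformly at random reaches the target with probability 1/2 from either.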
Algorithm
• Memoryless strategies are sufficient for Player 1 to win in the extended
subset construction:
Play uniformly at random all actions that do not leave the
almost-sure winning region Win.
In a state (s,l), let Allow(s,l) be the set of such actions.
• To be winning, the strategy needs to ensure with positive probability
that the target set is reached within some fixed number of steps.
The action Good(s,l) ∈ Allow(s,l) is a witness of this requirement.
Algorithm
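• Player 1 is almost-surely winning with an equivalence-preserving
strategy from the set Win ⊆ Q iff there exist functions Allow (giving a non-empty set of actions for each state of Win) and Good (giving one action of Allow(q) for each state q of Win)
such that
1. for all equivalent states q ≈ q' in Win, Allow(q) = Allow(q');
2. for all q in Win
and for all σ in Allow(q), all σ-successors of q are in Win;
3. in the graph (Win,E) with E = {(q,q') | q' is a Good(q)-successor of q},
all infinite paths visit a state in T'.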
Ranking computation
• Assume a set W is given.
[Figure: iterative computation inside W.]
Let PosReach(W) be the fixpoint of this iteration
1. Either W = PosReach(W) and then W = Win
2. or PosReach(W) ⊊ W and then, start again with W := PosReach(W)
Algorithm
• The set of states from which Player 1 is winning with an equivalence-preserving strategy is computed as follows:
W_0 = Q, W_{i+1} = PosReach(W_i), until W_{i+1} = W_i,
where PosReach(W_i) is computed by the ranking computation inside W_i.
• An almost-sure winning strategy is obtained by playing in q all actions
of Allow(q) uniformly at random.
Complexity
• The problem of deciding if a location is almost-sure winning for a
reachability objective is EXPTIME-complete (hence the algorithm based
on extended subset construction is worst-case optimal)
• For safety objective, sure-winning and almost-sure winning coincide
(hence, the problem is also EXPTIME-complete)
• Example: apply the almost-sure algorithm to the 3-coin example (see
also lecture notes)
Stochastic games
• We have considered game structures with non-probabilistic transitions.
• This is not restrictive in the context of imperfect information.
We show that a probabilistic state can be simulated by imperfect
information:
Stochastic games
Each player can unilaterally decide to simulate the probabilistic state
by playing uniformly at random:
Player 2 chooses states (s,0),(s,1),(s,2) uniformly at random
Player 1 chooses actions 0,1,2 uniformly at random
Stochastic games
Each player can unilaterally decide to simulate the probabilistic state
by playing all actions uniformly at random.
Hence, compared to the original game, no player can hope to
improve his probability to win.
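The claim can be checked by an exact computation. The gadget's figure did not survive, so the sketch below assumes one standard way to combine the two choices: Player 2 picks i, Player 1 independently picks j without seeing i, and the successor is number (i + j) mod 3. Under that assumption, the outcome is uniform as soon as one of the two players randomizes uniformly.

```python
from fractions import Fraction
from itertools import product

def outcome_distribution(p2, p1, n=3):
    """Distribution of (i + j) mod n when Player 2 draws i from p2 and
    Player 1 independently draws j from p1."""
    dist = [Fraction(0)] * n
    for i, j in product(range(n), repeat=2):
        dist[(i + j) % n] += p2[i] * p1[j]
    return dist

uniform = [Fraction(1, 3)] * 3
biased = [Fraction(1), Fraction(0), Fraction(0)]      # a deterministic choice
skewed = [Fraction(1, 2), Fraction(1, 2), Fraction(0)]

print(outcome_distribution(biased, uniform))   # Player 1 randomizes
print(outcome_distribution(uniform, skewed))   # Player 2 randomizes
```

Whatever the other player does, the uniform player enforces the uniform distribution over the three successors, so neither player can bias the simulated probabilistic state.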
Corollary
• Undecidability results for stochastic games with imperfect information
carry over to our model of games with imperfect information
• Threshold problem for probabilistic automata is undecidable
⇒ the maximal probability to win a reachability game with imperfect
information is not computable
• Emptiness of almost-sure coBüchi automata is undecidable
⇒ almost-sure winning for coBüchi objective is undecidable
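[Figure: the blind reachability game G_A obtained from a probabilistic automaton A by adding #-transitions to the target from all accepting states.]
A word w is accepted with
probability p iff w# is a
strategy winning with
probability p in the blind
reachability game G_A.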
Conclusion
• Games with imperfect information,
Algorithms for reachability objective (see lecture notes for other
objectives) using reduction to games with perfect information
• Memory and randomization are necessary to win
• For sure-winning, games with imp. info. are not determined
• Antichain algorithms, with applications in automata theory
Thank you !
Questions ?
