Imperfect Information in Games for Multi-Agent Systems

Internship Report, May 23, 2016 – July 29, 2016.
Raphaël Berthon (ENS Rennes, France)
Supervisors: Bastien Maubert and Aniello Murano (University of Naples Federico II, Italy)

Abstract. Problems involving agents arise in a wide range of fields: agents can represent robots in artificial intelligence, users in security, but also processors in computer architecture. A formal approach to these problems is to represent the world as a game structure, where the agents are the players, and a task involving the agents is a logical formula in a given language. In this framework, the model checking problem is to check whether the task (a formula) is satisfied on the model of the world (the graph of the game). A question that usually arises is whether a group of agents has a strategy to reach a given goal in the game. We study this problem for highly expressive logics, in the setting where agents have imperfect information about their world.

1 Introduction

Deciding whether a group of agents can fulfill a given task involving strategies (the task is represented by a logical formula) in a given setting (represented by a graph, the game structure) is a problem met both in AI and in computer security. The model checking problem is to check whether the task (the formula) holds on the model of the world (the graph of the game). Various logics exist to express questions about strategies. The more powerful the logic used to describe a goal, the more difficult it is to decide whether the goal can be reached [1]. We know the complexity bounds for most of these logics when players know exactly the state of the world at each step of the game [2–6]. This is not always the case: players may not be able to distinguish between some situations, or may forget information. To represent this, the notion of imperfect information has been introduced [7, 11].
A useful framework is perfect recall: agents recall everything they have seen, but they cannot distinguish between some of the states of the game. Very few results are known on logics dealing with strategies in the perfect recall framework. Deciding whether there is a winning strategy for a winning condition is undecidable in the general case [7], even for very weak logics like CTL. For these weak logics, some restrictions have been introduced to make the problem decidable [8, 9]. The complexity with perfect recall is almost always non-elementary (in fact, tower-exponential complete), and some optimal restrictions (such that any weaker restriction yields undecidability) are known [10].

Some useful logics: The logics we use are defined on Game Structures: graphs whose states are labeled with propositional variables, and where the joint decision of the set of players determines the transition taken from a given state (in fact, it depends on the history of all the states visited so far). We first present Linear Temporal Logic [2] (LTL). This logic includes boolean formulas over the variables of the current state, together with new operators. The next unary operator (denoted X) means that a given property will hold at the next step. The until binary operator (denoted U) means that a first property holds at every state until a second property eventually holds. Computation Tree Logic [3] (CTL) is a logic where the quantifier E (there exists a path) is used to say that the team composed of all the players in the game has a strategy to achieve a given goal, and A (for all paths) to say that whatever their strategies, they will reach this goal. An important remark is that the combination of E and ∧ can express that from a given state, two futures are possible. For example, we can require that the players have a way to make p true at the next state, and also a way to make p false at the next state. This is written EXp ∧ EX¬p.
This formula is satisfied if the current state has a transition leading to one state where p is true, and another leading to a state where p is false. As our formulas have to deal with all possible behaviors of the system, a given point in general has more than one possible future, and we shall view this set of possible futures as a tree. In this report, we focus on a generalization of CTL, Alternating-time Temporal Logic [4] (ATL), and some variations. In ATL, an existential quantifier ⟨⟨A⟩⟩ϕ can be used to say that the players in the set A can team up to fulfill the goal ϕ. Taking the team made of all the players in the game, we recover the operator E. Solving games with imperfect information and CTL or LTL objectives has been extensively studied, but apart from the general undecidability result, nothing was known for stronger logics like ATL under perfect recall. We first study ATL∗, an extension of ATL. Of the four logics presented above, only ATL∗ will be formally defined, but it is easy to understand LTL, CTL, and ATL as restrictions of this logic. In ATL, ⟨⟨A⟩⟩ can only precede specific kinds of formulas (strategy quantifiers can only be followed by X or U, and those cannot appear without a strategy quantifier). The ∗ indicates that this constraint is lifted. In general, when a logic is marked with ∗, there exists a constrained version of it.

Plan of the report: In this internship, we studied the model checking of more expressive logics dealing with strategies, in the special case where there is a hierarchy on players, as introduced by Peterson and Reif [11, 12]. This framework is defined in Section 2. We chose this specific restriction because almost all known decidable cases for weaker logics can be reduced to it [13, 14, 3, 15]. In Section 3, we show that ATL∗ model checking under the hierarchical assumption is decidable (in fact the result is more general).
We also consider ATL∗ with both strategy contexts (ATL∗sc [6]) and imperfect information. Strategy contexts are interesting because they can express Nash equilibria, which are widely used in artificial intelligence. To prove results on this logic, we defined a second logic equivalent to the first one, QCTL∗i (QCTL∗ [16] with imperfect information). These definitions are presented in Section 4. We prove the equivalence, and then the undecidability of both logics (even under the hierarchical assumption) by a reduction from MSO with equal level; these results are presented in Section 5. A decidability result was also obtained for ATLsc with imperfect information; it required introducing a new restriction. It is not described in detail in this report, and only mentioned in the conclusion.

2 Preliminaries

2.1 Game Structure

The logics presented earlier use games (or rather descriptions of what can happen in a given game) as their models. Games are represented by transition systems, where the choices of the players (and sometimes non-determinism) decide the next move.

Definition 1. A Game Structure [4] is a tuple G = ⟨k, Q, Π, π, (da)a∈{1,...,k}, δ⟩ such that:
– k ∈ N∗ is the number of players. Each player is designated by its index a ∈ {1, . . . , k}. We write Σ = {1, . . . , k}.
– Q is a finite set of states.
– Π is a finite set of propositional variables.
– π : Q → P(Π) is the labeling function: if q ∈ Q, then π(q) ⊆ Π is the set of propositional variables true at the state q.
– da : Q → N∗ is such that da(q) is the number of moves available to player a ∈ {1, . . . , k} at state q ∈ Q. The moves of a player at a given state are designated by their index i ∈ {1, . . . , da(q)}. For each state q, a move vector is a tuple ⟨j1, . . . , jk⟩ such that for every player a, we have ja ∈ {1, . . . , da(q)}. D is the move function: D(q) is the set {1, . . . , d1(q)} × · · · × {1, . . . , dk(q)} of move vectors at q.
– δ : Q × (N∗)^k → Q is the transition function: for a state q ∈ Q, if every player a ∈ Σ chooses ja (i.e. the players choose the move vector ⟨j1, . . . , jk⟩ ∈ D(q)), then the resulting state is δ(q, j1, . . . , jk).

Example 1. Figure 1 gives an example of a game structure, G1, with two players; we will use this example throughout the report. The transitions are labeled with move vectors in parentheses, the players' moves separated by commas. The propositional variables are not shown. Player 1 can only play + or ×; player 2 can play +, × or W.

[Fig. 1. G1, a Game Structure with two players, with states qi (initial), q=, q≠, qw, q> and q⊥.]

The symbol ∗ denotes that the transition can be taken whatever this player chooses. In this game, the first move decides non-deterministically whether the players go to state q= or q≠. From state q=, players 1 and 2 must play the same move to go to q>, otherwise they go to q⊥. The situation is reversed in q≠, where players 1 and 2 must choose different moves to go to q>. Finally, player 2 can force the game to wait a turn by playing W. In q=, this loops directly, but from q≠ the players first pass through qw and must play W another time to come back.

Now we define some basic notions on games. Let G be a Game Structure. A state qj is a successor of qi iff there is a move vector leading from qi to qj. A computation λ ∈ Q^ℕ is a sequence such that every λ[i+1] is a successor of λ[i]. A q-computation is a computation such that λ[0] = q. We use λ[0, i] to designate the (finite) prefix q0, q1, . . . , qi and λ[i, ∞] for the (infinite) suffix qi, qi+1, . . . A strategy fa : Q+ → N for player a ∈ Σ is a function mapping every finite prefix λ of a computation to a move. Thus, if λ ∈ Q+ and q is the last state of λ, fa(λ) ≤ da(q): the chosen move must be available.
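As an illustration, a Game Structure in the sense of Definition 1 can be encoded directly. The following Python sketch is our own encoding, not part of the report's formal development: moves are integer indices and the transition function δ is a dictionary.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class GameStructure:
    k: int          # number of players
    states: set     # Q
    labeling: dict  # pi: state -> set of true propositions
    d: dict         # d[(q, a)]: number of moves of player a at state q
    delta: dict     # delta[(q, (j1, ..., jk))]: resulting state

    def move_vectors(self, q):
        """D(q): all joint moves available at state q."""
        ranges = [range(1, self.d[(q, a)] + 1) for a in range(1, self.k + 1)]
        return list(product(*ranges))

    def successors(self, q):
        """States reachable from q in one step."""
        return {self.delta[(q, jv)] for jv in self.move_vectors(q)}

# A two-state toy game with two players, each with one move at each state:
g = GameStructure(
    k=2, states={"q0", "q1"}, labeling={"q0": set(), "q1": {"win"}},
    d={("q0", 1): 1, ("q0", 2): 1, ("q1", 1): 1, ("q1", 2): 1},
    delta={("q0", (1, 1)): "q1", ("q1", (1, 1)): "q1"},
)
```

The dictionary encoding makes the availability constraint fa(λ) ≤ da(q) easy to check: a move vector is legal exactly when each component lies in the range prescribed by d.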
The outcome of FA from q, denoted out(q, FA), is the set of q-computations that the players in A enforce when they start from q and follow the strategies in FA.

2.2 ATL∗

Syntax: We recall that the purpose of Alternating-time logic (here ATL∗) is to express conditions on strategies held by subsets of players. We use the same definition of ATL∗ as in [4], except that for the sake of simplicity we do not distinguish state formulas and path formulas.

Definition 2. Let Σ = {1, . . . , k} be a set of players and Π a finite set of propositional variables. An ATL∗ formula is given by the following grammar:

ϕ ::= p ∈ Π | ¬ϕ | ϕ ∨ ϕ | Xϕ | ϕUϕ | ⟨⟨A⟩⟩ϕ for all A ⊆ Σ

We only study closed formulas, i.e. formulas where all temporal operators are in the scope of some strategy operator (⟨⟨A⟩⟩ϕ or ⟦A⟧ϕ, as defined below).

Semantics: G, λ |= ϕ denotes that the path λ satisfies the formula ϕ in the structure G. When G is clear from the context, we omit it.

Definition 3. The semantics of ATL∗ is defined inductively as follows:
– λ |= p for propositions p ∈ Π iff p ∈ π(λ[0]).
– λ |= ¬ϕ iff λ ⊭ ϕ.
– λ |= ϕ1 ∨ ϕ2 iff λ |= ϕ1 or λ |= ϕ2.
– λ |= ⟨⟨A⟩⟩ϕ iff there exists FA, a set of strategies, one for each player in A, such that for all computations λ′ ∈ out(λ[0], FA), we have λ′ |= ϕ.
– λ |= Xϕ iff λ[1, ∞] |= ϕ.
– λ |= ϕ1Uϕ2 iff there exists a position i ≥ 0 such that λ[i, ∞] |= ϕ2 and for all j with 0 ≤ j < i, we have λ[j, ∞] |= ϕ1.

We will use the dual of ⟨⟨A⟩⟩, defined as ⟦A⟧ϕ := ¬⟨⟨A⟩⟩¬ϕ. As ⟨⟨A⟩⟩ϕ means that the players in A can cooperate to make ϕ true, the dual ⟦A⟧ϕ means that the players in A cannot cooperate to make ϕ false. We will also use the finally operator Fϕ := ⊤Uϕ, meaning that at some point in the future, ϕ will be true.
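To illustrate the temporal operators, here is a toy checker for X, U and F over a finite prefix of a computation (our own illustration, not part of the report). Real LTL semantics are over infinite computations, so this sketch is only faithful when the prefix is long enough to decide the formula.

```python
# Formulas are nested tuples; a trace is a list of sets of propositions
# true at each position (a finite prefix of a computation).

def holds(formula, trace, i=0):
    op = formula[0]
    if op == "prop":
        return formula[1] in trace[i]
    if op == "not":
        return not holds(formula[1], trace, i)
    if op == "or":
        return holds(formula[1], trace, i) or holds(formula[2], trace, i)
    if op == "X":      # next: the subformula holds at position i+1
        return i + 1 < len(trace) and holds(formula[1], trace, i + 1)
    if op == "U":      # until: phi2 at some j, phi1 everywhere before j
        for j in range(i, len(trace)):
            if holds(formula[2], trace, j):
                return all(holds(formula[1], trace, m) for m in range(i, j))
        return False
    if op == "F":      # finally: F phi is True U phi
        return any(holds(formula[1], trace, j) for j in range(i, len(trace)))
    raise ValueError(op)

trace = [{"p"}, {"p"}, {"q"}]
print(holds(("U", ("prop", "p"), ("prop", "q")), trace))  # p U q: True
```

Note that for U it suffices to test the first position where the second property holds: if the first property fails before that position, it also fails for every later witness.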
It is important to notice that in this semantics, when a new strategy operator is met, all previous strategies are forgotten, even the strategies of the players not quantified by this new operator. We sometimes write G, q |= ⟨⟨A⟩⟩ϕ (using q instead of λ) to say that a formula is true from a given state of the game: as the ⟨⟨A⟩⟩ operator considers a new set of outcomes, only the current state is relevant. We study the case where the strategy context is kept starting in Section 4.1.

Example 2. Take the structure G1 of Figure 1, and a variable lost that is true only on q⊥. Player 2 has a strategy such that lost is never true, namely to always play W: G1, qi |= ⟨⟨2⟩⟩¬F lost. Similarly, if state q> has a variable win set to true, players 1 and 2 can cooperate to make win true: player 1 always plays +, and player 2 plays + in state q= and × in state q≠. Thus G1, qi |= ⟨⟨1, 2⟩⟩F win.

2.3 Imperfect information

Perfect recall: Until now, we have presented games with perfect information, where players have full access to the current computation λ[0, n]. With imperfect information, players do not have access to the exact computation. In the case of perfect recall, players still recall all they observe, but they cannot distinguish between some states. To represent what the k players see, we use different sets of indexes, Q1, . . . , Qj. States of the game are elements of Q1 × · · · × Qj. Each player i has an observation set oi ⊆ {1, . . . , j}: they can only see these components of the states. Thus the observation function of player i is the projection on the components in oi, written πoi : Q1 × · · · × Qj → ∏l∈oi Ql. Two states v and v′ are indistinguishable for player i if they are identical after application of the observation function: v ≡i v′ iff πoi(v) = πoi(v′). This notation generalizes naturally to paths.
A strategy for player i is a function fi : (Q1 × · · · × Qj)+ → N. When players have imperfect information, we add the constraint that players must make the same strategic choices in indistinguishable situations: if λ[0, n] ≡i λ′[0, n], then fi(λ[0, n]) = fi(λ′[0, n]). The model checking problem becomes much harder, as we have to take into account what every player sees. In the general case, games with imperfect information and perfect recall are undecidable [11]. As ATL∗ with imperfect information (ATL∗i) can express the existence of winning strategies in games with imperfect information, this logic is also undecidable.

[Fig. 2. G1 under imperfect information: (a) q= and q≠ merged; (b) q=, q≠, and qw merged.]

Example 3. Figure 2 shows the game G1 when players have imperfect information. Some states are merged and players do not see the whole game: in Figure 2(a) only states q= and q≠ are merged; in Figure 2(b), q=, q≠ and qw are merged. Players know the "real" game; the subgames are only a convenient way to understand what they see during the game. A better representation of what a player with perfect recall actually sees would use a powerset construction, but there is no room for this graph in this report. With the same notation as in Example 2, with player 1 seeing the game as in Figure 2(a) and player 2 as in Figure 2(b), the formula ⟨⟨1, 2⟩⟩F win still holds at qi. A strategy for player 2 is to play W twice, and then play +. Player 1 will either see that they stayed twice in the merged state q=,≠, meaning that the "real" state is q=, and will play +; otherwise, player 1 will see that they went through state qw before coming back to q=,≠, meaning that the real state is q≠, and will play ×.
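The uniformity constraint on strategies can be illustrated concretely. The following sketch (our own encoding, not from the report, with 0-based component indices unlike the 1-based indexing above) checks that a strategy, given as a table from histories to moves, prescribes the same move on histories that project to the same observation.

```python
# States are tuples over Q1 x ... x Qj; obs is a player's observation
# set o_i, given as a set of (0-based) component indices.

def project(state, obs):
    """pi_{o_i}: keep only the components the player observes."""
    return tuple(state[l] for l in sorted(obs))

def observed(history, obs):
    """Apply the observation function to a whole history."""
    return tuple(project(s, obs) for s in history)

def is_uniform(strategy, obs):
    """strategy maps histories (tuples of states) to moves; return True
    iff indistinguishable histories always receive the same move."""
    seen = {}
    for history, move in strategy.items():
        key = observed(history, obs)
        if seen.get(key, move) != move:
            return False  # two indistinguishable histories, two moves
        seen[key] = move
    return True

h1, h2 = (("a", "x"),), (("a", "y"),)        # differ only in component 1
print(is_uniform({h1: 1, h2: 2}, obs={0}))   # sees only component 0: False
print(is_uniform({h1: 1, h2: 2}, obs={1}))   # sees the differing one: True
```

A strategy that fails this check exploits information the player does not have, which is exactly what the constraint fi(λ[0, n]) = fi(λ′[0, n]) forbids.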
Hierarchy on observations: As said before, games with imperfect information and perfect recall are undecidable. We now present a useful restriction. There is a hierarchy between the observations of the players [11, 12] if there is an order on the players such that players with smaller indexes see more than players with higher indexes. More formally, with k players, for every i < k, oi+1 ⊆ oi. With CTL or LTL winning conditions, when there is a hierarchy on players, games with imperfect information and perfect recall are decidable [8, 9]. Even though this condition is not optimal, it is very strong, as almost all known decidability results (including some optimal ones) on these logics with perfect recall reduce to a hierarchy. This assumption is also very natural, as agents often come with a hierarchy, whether in what robots can see, or in the privileges on a computer ranging from the simple user to the administrator.

3 Solving ATL∗i with perfect recall

3.1 Overview

Let ϕ be an ATL∗i formula and G = ⟨k, Q, Π, π, (da)a∈{1,...,k}, δ⟩ a game structure. To check whether ϕ holds on the game structure G, we define a function solve_ATL∗i which returns the set of states of Q where ϕ is satisfied. Let is_ltl be a function such that is_ltl(ϕ) returns true if ϕ is an LTL formula and false otherwise. For ϕ an LTL formula, we assume a function solve_LTL(ϕ, A, G) which returns the set of states of Q where the players in A have a winning strategy for the winning condition ϕ. It is worth noticing that when there is a hierarchy on the players, we know how to solve games with imperfect information, perfect recall, and LTL winning conditions. In fact, our result is more general, as it lifts all the decidable cases of games with LTL objectives and perfect recall [8, 9] to ATL∗i.

Theorem 1.
For all classes of game structures on which games with LTL objectives, imperfect information, and perfect recall are decidable, ATL∗i model checking is decidable.

To prove this, we flatten the formula with a bottom-up approach. We search for an innermost strategy quantifier, in a subformula of the form ⟨⟨A⟩⟩ϕ. The formula ϕ under this quantifier is an LTL formula, so we know how to find the states where the agents in A have a strategy to enforce ϕ. We add a new propositional variable p⟨⟨A⟩⟩ϕ to those states, to indicate that A can enforce ϕ there, and replace ⟨⟨A⟩⟩ϕ by p⟨⟨A⟩⟩ϕ in the formula. Since we consider closed formulas, applying this method inductively yields a boolean formula, which is easy to solve, for example with a function solve_bool. In our algorithm, we use the union of games G1 ∪ G2. We do not define it in the general case, but only for the games met in this algorithm. The games we consider are almost identical, differing only in their propositional variables: they have a shared subset of variables, and a subset unique to each game. We take the union of all these propositional variables. The shared variables have the same valuation in both games, and we keep it; each unique variable has a valuation defined in only one of the games, and we keep that valuation.
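The bottom-up flattening just described can be sketched in runnable form. This is an illustration of our own: formulas are nested tuples, the game is a plain dictionary, and solve_ltl is assumed as a black box returning the states where a coalition can enforce an LTL objective.

```python
# Sketch of the flattening behind solve_ATL*_i. Example formula:
# ("coalition", [1, 2], ("F", ("prop", "win"))).

_fresh = [0]  # counter used to generate fresh propositional variables

def is_ltl(phi):
    """True iff phi contains no strategy quantifier."""
    if phi[0] == "coalition":
        return False
    return all(is_ltl(s) for s in phi[1:] if isinstance(s, tuple))

def flatten(phi, game, solve_ltl):
    if phi[0] == "prop":
        return phi
    if phi[0] == "coalition":
        _, coalition, psi = phi
        if not is_ltl(psi):
            psi = flatten(psi, game, solve_ltl)   # psi is now pure LTL
        winning = solve_ltl(psi, coalition, game)
        _fresh[0] += 1
        name = f"p{_fresh[0]}"                    # fresh variable
        for q in winning:                         # label the winning states
            game["labeling"][q].add(name)
        return ("prop", name)
    # boolean and temporal operators: recurse on subformulas
    return (phi[0],) + tuple(
        flatten(s, game, solve_ltl) if isinstance(s, tuple) else s
        for s in phi[1:])

game = {"states": {"q0", "q1"}, "labeling": {"q0": set(), "q1": set()}}
stub = lambda psi, A, g: {"q1"}   # stand-in for the assumed LTL solver
phi = flatten(("coalition", [1], ("F", ("prop", "win"))), game, stub)
print(phi, game["labeling"]["q1"])  # ('prop', 'p1') {'p1'}
```

For simplicity this sketch mutates a single game in place rather than building the union G1 ∪ G2; since the fresh variables are pairwise distinct, the effect is the same as the union described above.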
3.2 Model Checking Algorithm

let flatten(ϕ, G) = match ϕ with
| ⟨⟨A⟩⟩ψ ->
    if is_ltl(ψ) then (ψ′, G′) := (ψ, G); sol := solve_LTL(ψ, A, G)
    else
      (ψ′, G′) := flatten(ψ, G)
      sol := solve_LTL(ψ′, A, G′)
    end
    Π′ := Π ∪ {pϕ}
    for q in Q do
      if q ∈ sol then π′(q) := π(q) ∪ {pϕ} else π′(q) := π(q) end
    done
    return (pϕ, G′ = ⟨k, Q, Π′, π′, (da)a∈{1,...,k}, δ⟩)
| ¬ψ ->
    (ψ′, G′) := flatten(ψ, G)
    return (¬ψ′, G′)
| ψ1 ∨ ψ2 ->
    (ψ1′, G1) := flatten(ψ1, G)
    (ψ2′, G2) := flatten(ψ2, G)
    return (ψ1′ ∨ ψ2′, G1 ∪ G2)
| Xψ ->
    (ψ′, G′) := flatten(ψ, G)
    return (Xψ′, G′)
| ψ1 U ψ2 ->
    (ψ1′, G1) := flatten(ψ1, G)
    (ψ2′, G2) := flatten(ψ2, G)
    return (ψ1′ U ψ2′, G1 ∪ G2)

let solve_ATL∗i(ϕ, G) =
  (ϕ′, G′) := flatten(ϕ, G)
  return solve_bool(ϕ′, ∅, G′)

In the proof, we say that (ϕ1, G1) is equivalent to (ϕ2, G2) iff G1 and G2 share the same set Q of states, and for all q ∈ Q, it holds that G1, q |= ϕ1 iff G2, q |= ϕ2.

3.3 Correctness of the algorithm

We have to show that q0 is in solve_ATL∗i(ϕ, G) iff G, q0 |= ϕ. For this, we prove the following lemma by induction:

Lemma 1. For G a game structure and ϕ an ATL∗i formula, (ϕ′, G′) = flatten(ϕ, G) is equivalent to (ϕ, G), and ϕ′ is an LTL formula.

The proof can be found in Appendix A.1.

4 Strategy Context: Some new logics

We have proved two important results on ATL∗ when strategy contexts are kept. First, this logic is undecidable, even with a hierarchy on players. Second, it becomes decidable if the players can only choose their strategies in a given order. In this report, we only prove the undecidability result. We first define the logic we study, ATL∗sc,i. Then we introduce a new logic, QCTL∗i. We show that it is equivalent to ATL∗sc,i, and then prove our results directly on it.

4.1 ATL∗sc,i

Overview: The difference between an ATL∗ formula and an ATL∗ formula with strategy contexts (denoted ATL∗sc) lies in the semantics of the formula, and more precisely in the semantics of the strategy quantifier.
When a new strategy quantifier ⟨⟨A⟩⟩ is met in ATL∗sc, the players not in A keep their strategy. To represent this, we add a context to our models, giving the current strategies of the players. We use a definition close to the one by Laroussinie and Markey [6]. In fact we study ATL∗sc with imperfect information and perfect recall, denoted ATL∗sc,i, and compare it to ATL∗ in the same framework.

Syntax: Let Σ = {1, . . . , k} be a set of players and Π a finite set of propositional variables.

Definition 4. The syntax of ATL∗ with strategy context (denoted ATL∗sc) is the same as that of ATL∗, and is given by the following grammar:

ϕ ::= p ∈ Π | ¬ϕ | ϕ ∨ ϕ | Xϕ | ϕUϕ | ⟨⟨a⟩⟩ϕ for all a ∈ Σ

We only allow quantification over one player at a time, but as the context is kept, this makes no difference with respect to quantification over sets of players; it only makes the proofs simpler. As for ATL∗, we only study closed formulas, i.e. formulas of the form ⟨⟨a⟩⟩ϕ or ⟦a⟧ϕ.

Semantics: First recall that the strategies of player i are the elements of F = N^((Q1×···×Qj)+). Strategies will in fact be elements of F∅ = F ∪ {∅}, where ∅ denotes that no strategy has been assigned to player i. A context is a set of strategies, one for each player: a context is an element of (F∅)^k. Updating a context C with a new strategy fa, denoted C ← fa, means that we replace the current strategy of player a in C by the strategy fa. We use C, G, λ |= ϕ to denote that in the context C, the computation λ satisfies the formula ϕ in the structure G. When G is clear from the context, we omit it. The semantics is almost the same as for ATL∗, except for ⟨⟨a⟩⟩. This operator is also the only one to use the strategy context: the other operators just pass the context along in the recursive definition of the semantics. For both these reasons, we only give the new definition of ⟨⟨a⟩⟩:

Definition 5.
The semantics of ⟨⟨a⟩⟩ϕ in ATL∗sc,i is:
– C, λ |= ⟨⟨a⟩⟩ϕ iff there exists fa, a strategy for player a, such that with C′ = C ← fa, for all computations λ′ ∈ out(λ[0], C′), we have C′, λ′ |= ϕ.

While no strategy context has been set, the context is ∅^k. We write ⟦a⟧ϕ for ¬⟨⟨a⟩⟩¬ϕ.

Example 4. Take the same problem as in Example 3, with game G1 of Figure 1, player 1 seeing as in Figure 2(a) and player 2 seeing as in Figure 2(b). We can use ATL∗sc,i to express properties that we cannot express in ATL∗i. For example, we can ask whether player 2 has a single strategy such that the players never lose (lost is never true, i.e. the players never reach q⊥), and such that if player 1 cooperates, they can win (reach win in q>). The strategy context lets us require that the strategy of player 2 not change between the left member of ∧ (player 2 plays alone) and the right one (players 1 and 2 play together). Formally, we try to model-check the formula ⟨⟨2⟩⟩(¬F lost ∧ ⟨⟨1⟩⟩F win) on the model G1, with player 1 unable to distinguish q= from q≠, and player 2 unable to distinguish q=, q≠, and qw. This formula does not hold; there is no such strategy: if player 2 waits and always plays W, then q> can never be reached, and if player 2 chooses to play either + or ×, then player 1 may refuse to cooperate and play the wrong move, sending them to q⊥.

As we have seen in the example, ATL∗sc,i is more powerful than ATL∗i. One interesting consequence is that ATL∗sc,i is powerful enough to express Nash equilibria:

Definition 6. Let A be a set of players, such that each player a ∈ A has an objective ϕa. There is a Nash equilibrium if the players have a strategy profile such that no individual player could change their strategy to obtain a better outcome. This is expressed by the following formula:
⟨⟨A⟩⟩ ⋀a∈A (¬ϕa ⇒ ¬⟨⟨a⟩⟩ϕa)

4.2 QCTL∗i

Overview: QCTL∗ consists of CTL∗ formulas enriched with quantifiers over propositional variables: ∃q.ϕ means that at each step, we decide the valuation of the propositional variable q. This logic is well known, and known to be equivalent to ATL∗sc [16]. We enrich it with imperfect information on propositional quantification: in QCTL∗i, we have different quantification operators ∃i q.ϕ. This way, for two different states and the paths that led to them, we can require the same decision for the value of q whenever the paths are indistinguishable for operator i. We show that this new logic is equivalent to ATL∗sc with imperfect information and perfect recall (ATL∗sc,i). The semantics of a QCTL∗i formula is given on a Kripke Structure. Like a game structure, a Kripke structure is a labeled graph, but without the notion of players. The logics presented earlier used games (or rather descriptions of what can happen in a given game) as their models; here, we only use computations, as there are no players.

Definition 7. A Kripke Structure is a tuple K = ⟨Q, Π, π, δ⟩ such that:
– Q is a finite set of states.
– Π is a finite set of propositional variables.
– π : Q → P(Π) is the labeling function: if q ∈ Q, then π(q) ⊆ Π is the set of propositional variables true at the state q.
– δ ⊆ Q × Q is a binary relation, the transition relation: for q, q′ ∈ Q, (q, q′) ∈ δ means that the transition from q to q′ is available.

Game structures can be seen as Kripke structures by forgetting the moves and the players. Most of the definitions given earlier still hold. We define the computation tree from a state q: it is the tree whose nodes are sequences of states, each representing a history and labeled with the propositions that hold in its last state. The computation tree can be seen as the unfolding of the structure K from the state q; we denote it by TK.
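The unfolding of a Kripke structure into its computation tree can be sketched as follows. The tree itself is infinite, so this sketch (our own encoding, not from the report) is bounded by a depth parameter; nodes are histories, labeled by the propositions of their last state.

```python
# delta: set of pairs (q, q') giving the transition relation;
# labeling: dict mapping each state to its set of true propositions.

def unfold(delta, labeling, q, depth):
    """Return {history: labels} for all histories from q up to `depth`."""
    tree = {(q,): labeling[q]}
    frontier = [(q,)]
    for _ in range(depth):
        nxt = []
        for hist in frontier:
            for (s, t) in delta:
                if s == hist[-1]:            # extend the history by one step
                    node = hist + (t,)
                    tree[node] = labeling[t]  # label = last state's props
                    nxt.append(node)
        frontier = nxt
    return tree

delta = {("a", "b"), ("b", "a")}
lab = {"a": {"p"}, "b": set()}
print(sorted(unfold(delta, lab, "a", 2)))  # histories up to length 3
```

Each key of the returned dictionary is one node of TK; the "application of λ[0, k] to T" used below simply selects the subtree rooted at the corresponding history.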
If T is a computation tree rooted at q, and λ[0, k] a finite q-computation in T, the application of λ[0, k] to T is the subtree of T rooted at the node obtained by taking at each step the transition proposed by λ; this represents a move along λ in the tree.

Syntax: Let Π be a finite set of propositional variables.

Definition 8. A Quantified CTL formula with imperfect information and perfect recall (QCTL∗i) is a formula given by the following grammar:

ϕ ::= p ∈ Π | ¬ϕ | ϕ ∨ ϕ | Xϕ | ϕUϕ | Eϕ | ∃i q.ϕ for all i ∈ {1, . . . , k}

We only allow closed formulas, i.e. formulas beginning with Eϕ or with Aϕ := ¬E¬ϕ.

Semantics: Let K = ⟨Q, Π, π, δ⟩ be a Kripke structure. The semantics is defined on the unfolding of this structure from a given state: we use T, the computation tree of K from q. In the framework of imperfect information with perfect recall, we take Q = Q1 × · · · × Qj. Let k be the number of distinguished existential quantifier operators ∃i; they are used to describe imperfect information. For each i ∈ {1, . . . , k}, let oi ⊆ {1, . . . , j} be an observation set, meaning that operator i can only see these components of the states: each operator i has an observation function πoi which gives the observable components of a state. We add that there is a hierarchy on the operators: oi+1 ⊆ oi. We write T, λ |= ϕ to denote that the computation λ satisfies the formula ϕ in the tree T. When the current path is not relevant, we just write T |= ϕ.

Definition 9. The semantics of QCTL∗i formulas is defined as:
– T, λ |= p for propositions p ∈ Π iff p ∈ π(λ[0]).
– T, λ |= ¬ϕ iff T, λ ⊭ ϕ.
– T, λ |= ϕ1 ∨ ϕ2 iff T, λ |= ϕ1 or T, λ |= ϕ2.
– T, λ |= Eϕ iff there exists a λ[0]-computation λ′ such that T, λ′ |= ϕ.
– T, λ |= Xϕ iff, with T′ the tree obtained from the application of λ[0] to T, we have T′, λ[1, ∞] |= ϕ.
– T, λ |= ϕ1Uϕ2 iff there exists a position i ≥ 0 such that, with T′ the tree obtained from the application of λ[0, i] to T, we have T′, λ[i, ∞] |= ϕ2, and for all j such that 0 ≤ j < i, with Tj′ the tree obtained from the application of λ[0, j] to T, we have Tj′, λ[j, ∞] |= ϕ1.
– T, λ |= ∃i p.ϕ iff there exists a valuation of p for each node of the computation tree (yielding a new tree T′) such that for all finite paths ρ, ρ′ in the computation tree, if πoi(ρ) = πoi(ρ′), then the valuation of p at ρ equals the valuation of p at ρ′, and in this modified tree, T′, λ |= ϕ.

5 Undecidability result on ATL∗sc,i

We only give a sketch of the undecidability proof.

5.1 Equivalence between ATL∗sc,i and QCTL∗i

First, we show the equivalence between our two logics, which can be expressed in two parts:

Proposition 1. For every ATL∗sc,i formula ϕ, there is a QCTL∗i formula ϕ′ such that G, q |= ϕ iff TG |= ϕ′.

Proposition 2. For every QCTL∗i formula ϕ, there is an ATL∗sc,i formula ϕ′ such that TG |= ϕ iff G, q |= ϕ′.

From ATL∗sc,i to QCTL∗i: As in Laroussinie and Markey [6], going from ATL∗sc,i to QCTL∗i is straightforward except for strategy quantifiers. When a strategy quantifier is encountered, we use a propositional quantifier with the same observation set as the quantified player, and add a formula ensuring both that the choices of this quantifier represent a strategy, and that when this strategy is followed, the translation of the formula under the strategy quantifier is true. The full translation is given in Appendix A.2.

From QCTL∗i to ATL∗sc,i: The translation of QCTL∗i into ATL∗sc,i is shorter but a bit less intuitive than the reverse [6]. We take k players, one per variable introduced by an existential quantifier, each with the same observation set as the corresponding quantifier.
Each player associated to a variable chooses its truth value, and we add one more player, k + 1, who decides the moves taken in the graph. The full translation is given in Appendix A.3.

5.2 Undecidability: Hierarchy on observations

Overview: We do not give the full formal undecidability proof here. It is known that QCTL∗ is equivalent to Monadic Second Order logic (MSO) interpreted on trees [16]. It is also known that MSO on trees becomes undecidable as soon as a binary predicate equal_level is added to the logic (the resulting logic is denoted MSOeq_lvl), where equal_level(q, q′) holds if and only if q and q′ are at the same height in the tree [16]. As we can already translate any MSO formula to QCTL∗i (using the same translation as for QCTL∗), we only show here that we can translate the equal_level predicate to QCTL∗i, even in the case where there is a hierarchy on players.

Proposition 3. For every MSOeq_lvl formula ϕ and tree T, there exists a QCTL∗i formula ϕ′ (which is effectively computable) such that T |= ϕ iff TK |= ϕ′.

Proof. We only need to introduce one existential quantifier: a blind quantifier ∃1, which can only see the number of steps taken so far (blind means that the observation set of 1 is ∅). We translate the equal_level predicate by adding a propositional variable pq to every state q, and by saying that this quantifier has a way to label the tree with a fresh variable x such that x is true whenever pρ or pρ′ is met, and x is true at most once on each path. This formula is verified on a tree if and only if ρ and ρ′ are at the same height in the tree, as the quantifier can only see the height of the nodes and must label indistinguishable nodes in the same way. The difficult part is not to make the formula true when the two nodes are at the same height; it is to make it false when they are not. More formally, we replace equal_level(ρ, ρ′) by:

∃1 x.AG [(pρ ⇒ x) ∧ (pρ′ ⇒ x) ∧ (x ⇒ XG¬x)]

Theorem 2.
Model checking of QCTL∗i with imperfect information, perfect recall, and hierarchy on quantifiers is undecidable.

Sketch of proof. In the proof of undecidability of MSO with equal_level, the undecidable fragment consists of formulas starting with a first existential quantifier under which the equal_level predicate appears; this fragment can be translated to QCTL∗i with a quantifier seeing everything followed by a blind quantifier. As a consequence, QCTL∗i with a hierarchy on quantifiers is undecidable. By the translation defined in Section 5.1, ATL∗sc,i is then undecidable, even with a hierarchy on players.

Corollary 1. Model checking of ATL∗sc,i with imperfect information, perfect recall, and hierarchy on players is undecidable.

Proof. By Proposition 2. In the translation of QCTL∗i to ATL∗sc,i, the hierarchy on the quantifiers becomes a hierarchy on players, and the assumptions of imperfect information and perfect recall are preserved. As QCTL∗i with the previous restrictions is undecidable and can be translated to ATL∗sc,i with imperfect information, perfect recall, and hierarchy on players, this second logic with these restrictions is also undecidable.

6 Conclusion

Overview: We have proved that the model checking problem with imperfect information, perfect recall, and hierarchy on players is decidable for ATL∗i and undecidable for ATL∗sc,i. The undecidability result is strong, but not optimal: the proof only uses formulas with an alternation depth of 1, and two players. This result does not rule out a decidable fragment expressive enough to reason about the existence of Nash equilibria.
Other results: We have also proved, but not presented here, that the model checking of ATL∗sc,i with perfect recall and hierarchy on players becomes decidable in the case where we also have a hierarchy on the order of quantification: the strategy quantifiers for the players knowing more must be nested deeper than the quantifiers for the players knowing less. We proved it on QCTL∗i (and obtained it on ATL∗sc,i by using Proposition 1). The decision method is very close to Kupferman and Vardi's thin trees method [3, 15]. We only used the narrow operator, and a projection operator close to the one used in the QCTL∗ decidability proof. This result is strong, as removing the hierarchy on quantifications leads to the previous undecidability result. In the proof, we had to suppose that propositional variables were visible to every player (so that no player confounds two states with different propositional variables). We do not know yet whether this assumption can be removed.

Future works: The decidability of the model checking of Nash equilibria with perfect recall, and the optimality of our decidability result on ATL∗sc,i, are two possible axes of work. A third one is imperfect information on Strategy Logic (SL). SL has been described in various ways [5], but very few results are known, and none of them use imperfect information. We hope that the logics and methods introduced in this report (QCTL∗i and a reduction from MSOeq_lvl on trees) may be used to obtain results on some restrictions of SL, for a well-chosen definition of imperfect information.

Acknowledgements. I am very grateful to Aniello Murano and Bastien Maubert, who involved themselves a lot during their supervision of my internship. I would also like to thank Antonio, Dario, and Giovanni for their warm welcome and their invaluable help.

References

1. Moshe Y. Vardi. Alternating automata and program verification. In Computer Science Today, pages 471–485. Springer, 1995.
2. Wolfgang Thomas. Automata on infinite objects.
Handbook of Theoretical Computer Science, Volume B, pages 133–191, 1990.
3. Orna Kupferman, Moshe Y. Vardi, and Pierre Wolper. An automata-theoretic approach to branching-time model checking. Journal of the ACM (JACM), 47(2):312–360, 2000.
4. Rajeev Alur, Thomas A. Henzinger, and Orna Kupferman. Alternating-time temporal logic. Journal of the ACM (JACM), 49(5):672–713, 2002.
5. Fabio Mogavero, Aniello Murano, and Moshe Y. Vardi. Reasoning about strategies. In LIPIcs-Leibniz International Proceedings in Informatics, volume 8. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2010.
6. François Laroussinie and Nicolas Markey. Augmenting ATL with strategy contexts. Information and Computation, 245:98–123, 2015.
7. Orna Kupferman and Moshe Y. Vardi. Church's problem revisited. Bulletin of Symbolic Logic, pages 245–263, 1999.
8. Dietmar Berwanger and Anup Basil Mathew. Infinite games with finite knowledge gaps. arXiv preprint arXiv:1411.5820, 2014.
9. Dietmar Berwanger, Anup Basil Mathew, and Marie Van den Bogaard. Hierarchical information patterns and distributed strategy synthesis. In International Symposium on Automated Technology for Verification and Analysis, pages 378–393. Springer, 2015.
10. Bernd Finkbeiner and Sven Schewe. Uniform distributed synthesis. In 20th Annual IEEE Symposium on Logic in Computer Science (LICS'05), pages 321–330. IEEE, 2005.
11. Gary Peterson, John Reif, and Salman Azhar. Lower bounds for multiplayer noncooperative games of incomplete information. Computers & Mathematics with Applications, 41(7):957–992, 2001.
12. Gary Peterson, John Reif, and Salman Azhar. Decision algorithms for multiplayer noncooperative games of incomplete information. Computers & Mathematics with Applications, 43(1):179–206, 2002.
13. Amir Pnueli and Roni Rosner. On the synthesis of a reactive module. In Proceedings of the 16th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 179–190. ACM, 1989.
14. Amir Pnueli and Roni Rosner.
Distributed reactive systems are hard to synthesize. In Foundations of Computer Science, 1990. Proceedings., 31st Annual Symposium on, pages 746–757. IEEE, 1990.
15. Orna Kupferman and Moshe Y. Vardi. Synthesizing distributed systems. In Logic in Computer Science, 2001. Proceedings. 16th Annual IEEE Symposium on, pages 389–398. IEEE, 2001.
16. François Laroussinie and Nicolas Markey. Quantified CTL: expressiveness and complexity. arXiv preprint arXiv:1411.4332, 2014.

A Appendix

A.1 Correctness of the algorithm solving ATL∗i with perfect recall

Lemma 1. For G a game structure and ϕ an ATL∗ formula, (ϕ′, G′) = flatten(ϕ, G) is equivalent to (ϕ, G), and ϕ′ is an LTL formula.

As solve_ATL∗(ϕ, G) returns the states of G on which flatten(ϕ, G) (which is an LTL formula) holds, if flatten(ϕ, G) is equivalent to (ϕ, G), then q0 is in solve_ATL∗(ϕ, G) iff G, q0 |= ϕ.

– If ϕ = ⟨⟨A⟩⟩ψ, with ψ an LTL formula, we have indeed that for all q ∈ Q, G, q |= ⟨⟨A⟩⟩ψ iff G′, q |= pϕ.
– Else, if ϕ = ⟨⟨A⟩⟩ψ with ψ not an LTL formula, we have by the induction hypothesis that (ψ′, G′) = flatten(ψ, G) is equivalent to (ψ, G). As ψ′ is an LTL formula, we are back to the first case.
– If ϕ = ¬ψ, as (ψ, G) and (ψ′, G′) = flatten(ψ, G) are equivalent, so are (¬ψ, G) and (¬ψ′, G′).
– If ϕ = ψ1 ∨ ψ2, we have (ψ′1, G1) and (ψ′2, G2) respectively equivalent to (ψ1, G) and (ψ2, G). As G1 is only G with added propositional variables, which can be made to appear only in ψ′1 (resp. ψ′2), we have that (ψ1 ∨ ψ2, G) is equivalent to (ψ′1 ∨ ψ′2, G1 ∪ G2).
– If ϕ = Xψ, the proof is exactly the same as for ¬ψ.
– If ϕ = ψ1 U ψ2, the proof is exactly the same as for ψ1 ∨ ψ2.

Thus, (ϕ′, G′) = flatten(ϕ, G) is equivalent to (ϕ, G), and ϕ′ is an LTL formula. As a consequence, G′, q |= ϕ′ iff G, q |= ϕ. As demonstrated earlier, this proves that a state q0 is in solve_ATL∗(ϕ, G) iff G, q0 |= ϕ.

A.2 Translation from ATL∗sc,i to QCTL∗i

Proposition 1.
For every ATL∗sc,i formula ϕ, there is a QCTL∗i formula ϕ′ such that G, q |= ϕ iff TG |= ϕ′.

The translation has to keep track of the current context C of strategies already chosen. The translation is not difficult for most of the operators, except ⟨⟨a⟩⟩ϕ. To translate ⟨⟨a⟩⟩ϕ, we introduce two QCTL∗i formulas. ϕstrat(a) introduces a first set of propositional variables, {m^a_1, …, m^a_k}, one for each possible move of player a. In each node of the tree, exactly one of them is true; this represents the move chosen by player a in the strategy being represented. ϕout(A) introduces a new variable, pout, which is only true on the paths given by the conjunction of the strategy described above and the strategies already in the context. Finally we add a formula stating that if we follow pout (our strategy and our context), then the translation of ϕ is true.

For the first operators, the translation (without distinguishing state formulas from path formulas) is the following:

p^C = p
(¬ϕ)^C = ¬ϕ^C
(Xϕ)^C = Xϕ^C
(ϕ1 ∨ ϕ2)^C = ϕ1^C ∨ ϕ2^C
(ϕ1 U ϕ2)^C = ϕ1^C U ϕ2^C

To translate ⟨⟨a⟩⟩ϕ, we introduce the always operator, Aϕ ≝ ¬E¬ϕ, whose meaning is that the formula ϕ is true on all possible paths going from the current state. We also define the finally operator, Fϕ ≝ ⊤Uϕ, and the one we use, globally: Gϕ ≝ ¬F¬ϕ. The globally operator means that a formula is true on all the states we meet along the current path.

We recall that da : Q → N∗ is such that da(q) is the number of moves available to player a at state q ∈ Q. Let A be a subset of the players Σ; at state q, the set of states reachable when each player a ∈ A plays a given move ma is denoted by Next(q, A, (ma)a∈A). Formally,

Next(q, A, (ma)a∈A) = ⋃_{(mi)i∉A, mi ∈ {1,…,di(q)}} δ(q, (ma)a∈{1,…,k}).

For m = (m_ia)a∈A, we write pm for the formula ⋀_{a∈A} m^a_{ia}.
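To make the definition of Next concrete, it can be read as an enumeration over the moves of the players outside A. The following is a minimal sketch of that enumeration, assuming moves are numbered 1..d(i, q) and the transition function is given as a dictionary; all names and encodings here are illustrative, not the report's notation.

```python
from itertools import product

def next_states(q, A, moves_A, d, delta, players):
    """Successors of state q when each player a in A plays moves_A[a],
    while the remaining players range over all their available moves.

    d(i, q)  -> number of moves of player i at q (numbered 1..d(i, q))
    delta    -> dict mapping (q, full_move_vector) to the successor state
    players  -> ordered tuple of all player indices
    """
    others = [i for i in players if i not in A]
    result = set()
    # Enumerate every combination of moves for the players outside A.
    for combo in product(*(range(1, d(i, q) + 1) for i in others)):
        move = dict(moves_A)
        move.update(zip(others, combo))
        vector = tuple(move[i] for i in players)  # full move vector, in player order
        result.add(delta[(q, vector)])
    return result
```

For instance, with two players each having two moves at q0, fixing player 1's move and letting player 2 range over its moves yields exactly the union in the definition above.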
We now define our new formulas:

ϕstrat(a) = AG ⋀_{q∈Q} [ pq ⇒ ⋁_{i∈{1,…,da(q)}} ( m^a_i ∧ ⋀_{j≠i} ¬m^a_j ) ]

ϕout(A) = pout ∧ AG(¬pout ⇒ AX¬pout) ∧ AG [ pout ⇒ ⋁_{q∈Q} ( pq ∧ ⋁_{m∈(di(q))i∈A} ( pm ∧ AX( ⋁_{q′∈Next(q,A,m)} pq′ ⇔ pout ) ) ) ]

Then, we can finally define our translation:

(⟨⟨a⟩⟩ϕ)^C = ∃^a m^a_1. … .m^a_k. pout. ( ϕstrat(a) ∧ ϕout(C ∪ {a}) ∧ A(Gpout ⇒ ϕ^{C∪{a}}) )

A.3 Translation from QCTL∗i to ATL∗sc,i

Proposition 2. For every QCTL∗i formula ϕ, there is an ATL∗sc,i formula ϕ′ such that TG |= ϕ iff G, q0 |= ϕ′.

With k the number of propositional variables under a strategy quantifier, take k + 1 players. The transitions are such that player k + 1 chooses the moves between the states q ∈ Q (the "real states"), and can choose to go to a state cq,i where player i chooses the move. These states represent the choice of whether at q the i-th variable is true (going to state pi) or false (going to state p⊥).

To translate a QCTL∗i formula defined on a Kripke structure K = ⟨Q, Π, π, δ⟩, we take the game G = ⟨k′, Q′, Π′, π′, (da)a∈{1,…,k′}, δ′⟩. With k the number of quantified variables in our QCTL∗i formula, we take k′ = k + 1 (each player i has the same observation set as the existential quantifier used to quantify Pi), and Q′ = Q ∪ {cq,i | i = 1, …, k, q ∈ Q} ∪ {pi | i = 1, …, k} ∪ {p⊥}. We add new variables: Π′ = Π ∪ {Pi | i = 1, …, k} ∪ {PQ}. Finally, we change π′ such that for q ∈ Q, π′(q) = π(q) ∪ {PQ}, for i ∈ {1, …, k}, π′(pi) = {Pi}, and otherwise π′(cq,i) = π′(p⊥) = ∅.

The changes to the transition function δ′ give us the numbers of moves available da; we only give the details for the transition function. Let M = (j1, …, jk+1) be a move vector. For each q ∈ Q, let q′1, …, q′m be the successors of q in K. We take δ′(q, M) = q′j for j ∈ {1, …, m} and M such that jk+1 = j. We also add, for each state q ∈ Q, a transition to cq,i for i ∈ {1, …, k} when in M we have jk+1 = m + i. Finally, we add that δ′(cq,i, M) = pi if ji = 1 and δ′(cq,i, M) = p⊥ if ji = 2.

We use the following translation for the QCTL∗i formula itself (note that the translation does not use the model), where ϕ~ denotes the translation of ϕ:

(¬ϕ)~ = ¬ϕ~
(ϕ1 ∨ ϕ2)~ = ϕ1~ ∨ ϕ2~
(ϕ1 U ϕ2)~ = ϕ1~ U ϕ2~
(Xϕ)~ = Xϕ~
(∃Pi.ϕ)~ = ⟨⟨{i}⟩⟩ϕ~
(Eϕ)~ = ⟨⟨{k + 1}⟩⟩(GPQ ∧ ϕ~)
p~ = ⟨⟨{k + 1}⟩⟩X⟨⟨{k + 1}⟩⟩Xp if p ∈ {P1, …, Pk}, and p~ = p otherwise
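Since this translation is purely syntactic, it can be sketched as a recursion over the formula tree. The following is an illustrative sketch only, using a hypothetical tuple-based encoding of formulas (operator name first) rather than the report's actual definitions.

```python
def translate(phi, k):
    """Syntactic translation of a QCTL*_i formula into an ATL*_sc,i formula.

    Formulas are tuples: ('atom', p), ('not', f), ('or', f, g), ('X', f),
    ('U', f, g), ('E', f), ('exists', i, f).  k is the number of quantified
    variables P1..Pk; player k + 1 is the one moving in the graph.
    """
    op = phi[0]
    if op == 'atom':
        p = phi[1]
        if p.startswith('P') and p[1:].isdigit():
            # Quantified variable: two moves of player k+1 cross the
            # gadget states c_{q,i} -> p_i / p_bot before p can be read.
            return ('coalition', [k + 1], ('X',
                    ('coalition', [k + 1], ('X', phi))))
        return phi
    if op == 'not':
        return ('not', translate(phi[1], k))
    if op in ('or', 'U'):
        return (op, translate(phi[1], k), translate(phi[2], k))
    if op == 'X':
        return ('X', translate(phi[1], k))
    if op == 'E':
        # E becomes a strategy quantifier for player k+1, constrained to
        # stay on the real states (G PQ).
        return ('coalition', [k + 1],
                ('and', ('G', ('atom', 'PQ')), translate(phi[1], k)))
    if op == 'exists':
        # Exists P_i becomes a strategy quantifier for player i.
        return ('coalition', [phi[1]], translate(phi[2], k))
    raise ValueError('unknown operator: %r' % op)
```

The recursion mirrors the table above case by case; only the atoms P1..Pk and the operators E and ∃ actually change shape.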