Energy distribution using multi-player games

an outgoing inventory of what we have and what remains to be done
Thomas Brihaye (1), Amit Kumar Dhar (2), Gilles Geeraerts (3),
Axel Haddad (1) (me), Benjamin Monmege (4)
(1) Université de Mons
(2) IIIT Allahabad
(3) Université de Bruxelles
(4) Aix-Marseille Université
European Project FP7-CASSTING
Our motivation
“The objective is to develop a
novel approach for analysing
and designing collective
adaptive systems in their
totality, by setting up a game
theoretic framework.”
The Nøvling case study
Each house:
• has a list of tasks
• is equipped with solar panels
• can use or sell its energy
• can buy the energy produced by
the other houses
• can buy energy from the
supplier
The Nøvling case study
Goal:
• Achieve all the tasks
• Minimise each house’s bill
• Minimise the energy bought from the supplier
How to model it?
• Game, multiple players
• Prices ⇒ costs and incomes (= negative costs)
• some goal to reach (perform all the tasks)
• Time sensitive costs and possible actions
• Simultaneous choices ⇒ Concurrency
Multi-player concurrent priced-timed games
(with positive and negative costs)!
What are we looking for?
A Nash equilibrium!
(if possible one that is not too costly)
i.e. strategies for all players such that no player can pay less by
deviating from his/her strategy.
Untimed versions: Min-Cost reachability games
• Several players
• Played on a graph, with costs (different for each player)
• Some nodes are targets
• Goal: reach a target and minimize the sum of the costs
• Can be concurrent or turn-based
[Figure: example min-cost reachability game on a graph with locations ℓ1 to ℓ7 and target ℓf; each edge carries a pair of costs, one per player, e.g. 1,1 or -7,0.]
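The definition above can be sketched in a few lines of Python. This is only an illustration: the graph below is a small hypothetical example (not the one from the figure), the names `play_cost`, `weights` and `targets` are made up here, and the convention that a play reaching no target costs infinity is an assumption.

```python
import math
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def play_cost(play: List[str],
              weights: Dict[Edge, Tuple[float, ...]],
              targets: Set[str]) -> Tuple[float, ...]:
    """Per-player cost of a play: sum of edge costs up to the first target.

    Assumption: a play that never reaches a target costs +infinity
    for every player.
    """
    n_players = len(next(iter(weights.values())))
    total = [0.0] * n_players
    if play and play[0] in targets:
        return tuple(total)
    for src, dst in zip(play, play[1:]):
        for p, c in enumerate(weights[(src, dst)]):
            total[p] += c
        if dst in targets:
            return tuple(total)
    return tuple(math.inf for _ in range(n_players))


# Hypothetical two-player game: each edge carries one cost per player.
weights = {
    ("l1", "l2"): (1, 1),
    ("l2", "lf"): (-3, -4),
    ("l1", "lf"): (8, 7),
}
targets = {"lf"}
```

Here the play l1 → l2 → lf is cheaper for both players than the direct edge l1 → lf.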
Simple priced-timed games
• Time elapses within an interval (usually [0, 1])
• all players are forced to play within this interval
• Rates for each location and each player
• In a location loc: cost for player p = (time spent in loc) × (rate of loc for p)
• Some locations are urgent = players cannot wait there before taking a
transition.
[Figure: example simple priced-timed game with locations ℓ1 to ℓ7 and target ℓf; locations and edges carry a pair of values, one per player, e.g. 6,-8 or 10,-8; two locations are marked "!".]
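As a minimal sketch of the cost rule above (time spent in a location times the rate of that location for the player), the following Python function accumulates the cost of a timed play. The function name, the data layout and the example rates are hypothetical; transition costs are left out, and the checks on urgency and on the [0, 1] interval simply mirror the bullets above.

```python
from typing import Dict, List, Set, Tuple


def timed_play_cost(steps: List[Tuple[str, float]],
                    rates: Dict[str, Tuple[float, ...]],
                    urgent: Set[str],
                    player: int,
                    horizon: float = 1.0) -> float:
    """Cost paid by `player` along a list of (location, delay) steps."""
    elapsed = 0.0
    total = 0.0
    for loc, delay in steps:
        if delay < 0:
            raise ValueError("delays must be non-negative")
        if loc in urgent and delay != 0:
            raise ValueError(f"{loc} is urgent: waiting is not allowed")
        elapsed += delay
        if elapsed > horizon:
            raise ValueError("the play leaves the time interval")
        # cost for player = time spent in loc x rate of loc for player
        total += delay * rates[loc][player]
    return total


# Hypothetical rates, one per player, and a play staying within [0, 1].
rates = {"l1": (2.0, -4.0), "l2": (1.0, 1.0), "l3": (4.0, 8.0)}
urgent = {"l2"}
steps = [("l1", 0.5), ("l2", 0.0), ("l3", 0.25)]
```

With these numbers, player 0 pays 0.5 × 2 + 0.25 × 4 = 2, while for player 1 the income earned in l1 exactly offsets the cost paid in l3.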
(General) Priced-timed games
• Several clocks x, y, . . .
• Time can elapse to +∞
• After each transition, some clocks can be reset
• Guards are added to the transitions
• they specify, for each clock, an interval in which the transition can
be triggered
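[Figure: example priced-timed game with locations ℓ1 to ℓ7 and target ℓf; locations and edges carry a pair of values, one per player; some transitions carry guards ("if x ⩾ 2", "if y ∈ [4, 7]") or a reset ("reset y"); two locations are marked "!".]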
How to tackle multi-player games? The Folk theorem!
The Folk theorem is a magical tool that turns solving any
concurrent multi-player game into solving turn-based two-player
zero-sum games (provided that the game is fully informed =
each player sees the actions of all the others)!
When not fully informed, a more involved version of the theorem still
holds, see the suspect games of [Bouyer Brenguier Markey
Ummels 2015].
Coalition games
Given a concurrent multi-player Game, a history (= the beginning of
a play), a cost, a player.
The coalition game Game_{player,cost}[history]:
• 2 players, player and coalition,
• is played on the same Game, starting after the history,
• player has the same actions as before, coalition controls the
actions of all the other players,
• player chooses its actions after coalition,
• the goal of player is to pay strictly less than cost,
• the goal of coalition is to make player pay ⩾ cost.
Example: The owner of a house claims to be able to pay less than 20€
a month for electricity, and the other households want to prove him
wrong!
Characterising Nash Equilibrium
A Nash equilibrium can be split in 3 parts:
• The outcome of the equilibrium = the play when everyone
follows their strategies.
• The retaliation strategies = what the strategies prescribe
when one player deviates.
• What remains of the strategies = what the strategies
prescribe when two or more players deviate.
First: the last part has no impact on the Nash property.
Then: the only goal of the retaliation strategies in a Nash equilibrium
is to prevent the deviating player from paying less.
Finally: after a deviation, all the other players share this same
objective!
⇒ when one player deviates, the others form a coalition...
⇒ the situation is equivalent to a coalition game!
The folk theorem
Given a Game and a play π, write cost_player(π) for the cost of π for player.
The play π is the outcome of a Nash equilibrium if and only if, for every
player and every history in which only player has deviated from π, coalition
wins the game
Game_{player, cost_player(π)}[history].
Note that when this condition holds, one can construct a Nash equilibrium!
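For turn-based min-cost reachability games, the condition can be sketched as a simple check along the play. This is an illustration under several assumptions, not the authors' implementation: costs are additive, the coalition-game values `value[p][v]` (the least cost player p can guarantee from v against everyone else) are already known and attained by optimal strategies, and it is then enough to test each first deviation. All names and the example game are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def is_nash_outcome(play: List[str],
                    owner: Dict[str, int],
                    weights: Dict[Edge, Tuple[float, ...]],
                    value: List[Dict[str, float]]) -> bool:
    """Check that no single deviation from `play` lets its author pay less."""
    n_players = len(next(iter(weights.values())))
    steps = list(zip(play, play[1:]))
    total = [sum(weights[e][p] for e in steps) for p in range(n_players)]
    paid = [0.0] * n_players  # cost accumulated along the history so far
    for src, dst in steps:
        p = owner[src]
        for (u, v), w in weights.items():
            if u == src and v != dst:
                # After deviating to v, player p can guarantee value[p][v].
                if paid[p] + w[p] + value[p][v] < total[p]:
                    return False
        for q in range(n_players):
            paid[q] += weights[(src, dst)][q]
    return True


# Hypothetical game: player 0 owns a and c, player 1 owns b, t is the target.
owner = {"a": 0, "b": 1, "c": 0}
weights = {
    ("a", "t"): (3, 3), ("a", "b"): (1, 1),
    ("b", "t"): (1, 1), ("b", "c"): (0, 0),
    ("c", "t"): (5, 0),
}
# Coalition-game values, computed by hand for this small example.
value = [
    {"a": 3, "b": 5, "c": 5, "t": 0},  # player 0 against player 1
    {"a": 3, "b": 0, "c": 0, "t": 0},  # player 1 against player 0
]
```

In this example the play a → t passes the check (player 1 can retaliate at b by moving to c), whereas a → b → t fails because player 1 would rather deviate to c and pay 1 instead of 2.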
Existence of Nash Equilibria
The Folk theorem is hidden behind many Nash equilibria existence
results, e.g. [Brihaye, De Pril, Schewe 2013], [Le Roux, Pauly 2016]:
they identify a large class of games in which a Nash equilibrium exists.
⇒ there always exist Nash equilibria in turn-based Min-Cost
reachability games with non-negative costs (De Pril, 2014)
(As usual, concurrency is not a friend of pure Nash equilibria, e.g. rock
paper scissors)
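The rock paper scissors remark can be checked by brute force. The sketch below enumerates all pure strategy profiles of the one-shot game and keeps those from which no player can pay less by deviating alone; the cost convention (the loser pays 1, the winner pays -1, a draw costs 0) is an assumption made for the illustration.

```python
from itertools import product
from typing import List, Tuple

ACTIONS = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def costs(a: str, b: str) -> Tuple[int, int]:
    """Cost pair (player 0, player 1) when they play a and b at the same time."""
    if a == b:
        return (0, 0)
    return (-1, 1) if BEATS[a] == b else (1, -1)


def pure_nash_equilibria() -> List[Tuple[str, str]]:
    """Profiles from which no player pays less by changing only their action."""
    equilibria = []
    for a, b in product(ACTIONS, repeat=2):
        c0, c1 = costs(a, b)
        stable0 = all(costs(a2, b)[0] >= c0 for a2 in ACTIONS)
        stable1 = all(costs(a, b2)[1] >= c1 for b2 in ACTIONS)
        if stable0 and stable1:
            equilibria.append((a, b))
    return equilibria
```

The list comes back empty: in every profile, whoever is not currently winning can switch to the action that beats the opponent's.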
A conjecture
prefix-linear: For every history h, there exist a_h ∈ ℝ and b_h ∈ ℝ+ such
that for every play p,
cost(h · p) = a_h + b_h · cost(p)
(i.e. history does not change the preference relation)
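As a worked instance, take the additive cost of min-cost reachability games and a history h that has not yet visited a target. Writing w(h) for the sum of the edge costs along h (a notation introduced here only for this example), the cost of the full play splits as follows, so this cost function is prefix-linear:

```latex
\mathrm{cost}(h \cdot p) = w(h) + \mathrm{cost}(p)
\quad\Longrightarrow\quad
a_h = w(h), \qquad b_h = 1 .
```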
Initial coalition games: coalition games with no history, starting in
any location.
Every turn-based game with:
• a prefix-linear cost function,
• such that all initial coalition games have optimal finite-memory
strategies,
has a Nash equilibrium.
(= first versions of [Brihaye, De Pril, Schewe 2013] and [Le Roux,
Pauly 2016])
Unfortunately...
When negative costs are allowed, even with no concurrency, and no
clocks...
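[Figure: turn-based two-player game with three locations A, B and C; edges carry the cost pairs 0,-1 and -1,0 (each appearing twice) and 0,0.]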
(Note that when the possible costs are bounded below, the situation is
equivalent to games with non-negative costs ⇒ there always exists a
Nash equilibrium.)
A construction that might work
1. Compute, for each player, a strategy ensuring the least possible
cost against a coalition of all other players (= solve a two-player
zero-sum game).
2. Consider the outcome of the constructed profile.
3. Check that all deviations satisfy the Folk theorem condition (= solve
more two-player zero-sum games).
4. If so, compute coalition strategies to apply in case of a deviation.
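Step 2 can be sketched as follows for a turn-based game. The sketch assumes the strategies obtained in step 1 are positional (one chosen successor per location), which need not hold in general; the names `outcome`, `owner` and `strategy` and the example are hypothetical.

```python
from typing import Dict, List, Optional, Set


def outcome(start: str,
            owner: Dict[str, int],
            strategy: List[Dict[str, str]],
            targets: Set[str]) -> Optional[List[str]]:
    """Follow every player's positional strategy from `start`.

    Returns the play up to the first target, or None when the profile
    loops forever without reaching a target.
    """
    play = [start]
    seen = {start}
    while play[-1] not in targets:
        current = play[-1]
        nxt = strategy[owner[current]][current]
        if nxt in seen:
            return None  # positional strategies: a repeated location means a loop
        seen.add(nxt)
        play.append(nxt)
    return play


# Hypothetical game: player 0 owns a and c, player 1 owns b, t is the target.
owner = {"a": 0, "b": 1, "c": 0}
profile = [{"a": "b", "c": "t"}, {"b": "t"}]
looping_profile = [{"a": "b", "c": "t"}, {"b": "a"}]
```

The play returned here is the candidate outcome that step 3 then tests against single-player deviations.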
2-player zero-sum games with non-negative costs
• Min-Cost reachability games: compute the value and optimal
strategies in polynomial time, furthermore there are optimal
positional strategies.
• Simple priced-timed games: compute the value and optimal
strategies in exponential time
• One-clock priced-timed games: compute the value in exponential
time
• Priced-timed games with 3 or more clocks: knowing whether
there is a strategy ensuring a cost smaller than a given bound is
undecidable.
(These results, and others on the subject, are due to many researchers; a
non-exhaustive list: Alur, Berendsen, Bernadsky, Bouyer, Brihaye, Bruyère,
Cassez, Chen, Fleury, Hansen, Ibsen-Jensen, Jansen, Jaziri, Khachiyan, Larsen,
Madhusudan, Markey, Miltersen, Raskin, Rasmussen, Rutkowski,…)
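For the first item (untimed min-cost reachability games with non-negative costs), the value can be approximated from above by a plain value iteration, sketched below. This is a simple illustration rather than one of the algorithms from the literature cited above: the minimiser wants to reach a target cheaply, the maximiser (the coalition) wants the opposite, and a location from which the target cannot be forced has value +infinity. The names and the example game are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def min_cost_values(nodes: List[str],
                    min_nodes: Set[str],
                    targets: Set[str],
                    weights: Dict[Edge, float]) -> Dict[str, float]:
    """Value iteration for a two-player zero-sum min-cost reachability game.

    Locations in `min_nodes` belong to the minimiser, all the others to
    the maximiser. Costs are assumed to be non-negative.
    """
    succ: Dict[str, List[Tuple[str, float]]] = {v: [] for v in nodes}
    for (u, v), w in weights.items():
        if w < 0:
            raise ValueError("this sketch only handles non-negative costs")
        succ[u].append((v, w))
    val = {v: (0.0 if v in targets else math.inf) for v in nodes}
    for _ in range(len(nodes)):
        new = {}
        for v in nodes:
            if v in targets:
                new[v] = 0.0
            elif not succ[v]:
                new[v] = math.inf  # dead end: the target is never reached
            else:
                options = [w + val[s] for s, w in succ[v]]
                new[v] = min(options) if v in min_nodes else max(options)
        if new == val:
            break
        val = new
    return val


# Hypothetical game: the minimiser owns a and c, the maximiser owns b.
nodes = ["a", "b", "c", "t"]
weights = {("a", "t"): 3, ("a", "b"): 1, ("b", "t"): 1,
           ("b", "c"): 0, ("c", "t"): 5}
values = min_cost_values(nodes, {"a", "c"}, {"t"}, weights)
```

In the example the minimiser prefers the direct edge from a (cost 3), because going through b lets the maximiser route the play via c for a total of 6.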
Positive and negative costs
• Min-Cost reachability games: compute the value and optimal
strategies in pseudo-polynomial time, furthermore there are
optimal finite-memory strategies.
• Simple priced-timed games: compute the value and optimal
strategies in exponential time
• One-clock priced-timed games: ???
(Joint work with Thomas Brihaye, Gilles Geeraerts, Engel Lefaucheux and
Benjamin Monmege)
Back to the case study
We modelled it with min-cost reachability games using PRISM and
were able to construct some equilibria that keep non-solar energy
consumption minimal and such that the average cost paid
by the houses is not too high.
A lot remains to be done
• What about 1-clock priced-timed games with positive and
negative weights?
• Is it difficult to look for equilibria in timed games?
• What about mixed strategies?
• (My favorite:) By using meta-strategies (e.g. corner-point
abstraction, non-standard arithmetic, profinite words), can we
regain the existence of some kind of Nash equilibria? (e.g. in the example
below or the “< 1 minute game”)
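[Figure: turn-based two-player game with three locations A, B and C; edges carry the cost pairs 0,-1 and -1,0 (each appearing twice) and 0,0.]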
• About case studies: can we refine the models and find more
efficient solutions?
• ...