Symblicit algorithms for optimal strategy synthesis in monotonic Markov decision processes

Aaron Bohy¹, Véronique Bruyère¹, Jean-François Raskin²

¹ Université de Mons
² Université Libre de Bruxelles

SYNT 2014 (3rd Workshop on Synthesis)
Overview (1/2)

Motivations:
• Markov decision processes with large state spaces
• Explicit enumeration exhausts the memory
• Symbolic representations such as MTBDDs are useful
• But (MT)BDDs are not easy to use for solving linear systems

Recent contributions of [WBB+10]:
• Symblicit algorithm
  • Mixes symbolic and explicit data structures
  • Expected mean-payoff in Markov decision processes
  • Using (MT)BDDs
Overview (2/2)

Our motivations:
• Antichains sometimes outperform BDDs (e.g. [WDHR06, DR07])
• Use antichains instead of (MT)BDDs in symblicit algorithms

Our contributions:
• New data structure: pseudo-antichains (an extension of antichains)
  • Closed under negation
• Monotonic Markov decision processes
• Two quantitative settings:
  • Stochastic shortest path (the focus of this talk)
  • Expected mean-payoff
• Two applications:
  • Automated planning
  • LTL synthesis

Full paper available on arXiv: abs/1402.1076
Table of contents
Definitions
Symblicit approach
Antichains and pseudo-antichains
Monotonic Markov decision processes
Applications
Conclusion and future work
[Section: Definitions]
Markov decision processes (MDPs)

• M = (S, Σ, P) where:
  • S is a finite set of states
  • Σ is a finite set of actions
  • P : S × Σ → Dist(S) is a stochastic transition function
• Cost function c : S × Σ → R>0
• (Memoryless) strategy λ : S → Σ (a minimal encoding of all of this is sketched below)

[Figure: example MDP with states s0, s1, s2, actions σ1, σ2, and transition probabilities such as 5/6, 1/6, 1/2, 4/5, 1/5, 1]
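To make the definition concrete, here is a minimal explicit encoding of an MDP with a cost function and a memoryless strategy. The state and action names (s0, s1, s2, σ1, σ2) loosely follow the example figure, but the exact probabilities and costs are illustrative placeholders, not taken from the slide.

```python
# Minimal explicit MDP encoding: states, actions, transition distributions, costs.
# Probabilities and costs below are illustrative only.
from fractions import Fraction as F

S = ["s0", "s1", "s2"]
A = ["sigma1", "sigma2"]

# P[(s, a)] is a distribution over successor states (must sum to 1).
P = {
    ("s0", "sigma1"): {"s1": F(1, 2), "s2": F(1, 2)},
    ("s0", "sigma2"): {"s0": F(5, 6), "s1": F(1, 6)},
    ("s1", "sigma1"): {"s2": F(1)},
    ("s2", "sigma1"): {"s2": F(4, 5), "s0": F(1, 5)},
}

# Cost of playing action a in state s (strictly positive).
c = {key: 1 for key in P}

# A memoryless strategy picks one enabled action per state.
lam = {"s0": "sigma2", "s1": "sigma1", "s2": "sigma1"}

# Sanity checks: every distribution sums to 1 and the strategy only uses enabled actions.
assert all(sum(dist.values()) == 1 for dist in P.values())
assert all((s, lam[s]) in P for s in S)
print("MDP with", len(S), "states and", len(P), "state-action pairs")
```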
Markov chains (MCs)

• MDP (S, Σ, P) with P : S × Σ → Dist(S)
  + strategy λ : S → Σ
  ⇒ induced MC (S, Pλ) with Pλ : S → Dist(S)
• Cost function c : S × Σ → R>0
  + strategy λ : S → Σ
  ⇒ induced cost function cλ : S → R>0 (see the sketch below)

[Figure: the MC induced on states s0, s1, s2 by fixing a strategy, with transition probabilities such as 1/2, 1/2, 4/5, 1/5, 1]
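A memoryless strategy turns the MDP into a Markov chain by keeping, in each state, only the transitions of the chosen action. A minimal self-contained sketch (the tiny MDP below is again illustrative):

```python
# Inducing a Markov chain (S, P_lambda) and a cost c_lambda from an MDP and a
# memoryless strategy: keep, in each state, only the transitions of the chosen action.

def induce_mc(P, c, lam):
    """P: dict (s, a) -> {successor: prob}; c: dict (s, a) -> cost; lam: dict s -> a."""
    P_lam = {s: P[(s, a)] for s, a in lam.items()}
    c_lam = {s: c[(s, a)] for s, a in lam.items()}
    return P_lam, c_lam

if __name__ == "__main__":
    # Illustrative MDP with the same shape as in the definition slide.
    P = {
        ("s0", "sigma1"): {"s1": 0.5, "s2": 0.5},
        ("s0", "sigma2"): {"s0": 5 / 6, "s1": 1 / 6},
        ("s1", "sigma1"): {"s2": 1.0},
        ("s2", "sigma1"): {"s2": 0.8, "s0": 0.2},
    }
    c = {key: 1.0 for key in P}
    lam = {"s0": "sigma1", "s1": "sigma1", "s2": "sigma1"}

    P_lam, c_lam = induce_mc(P, c, lam)
    print(P_lam["s0"])  # {'s1': 0.5, 's2': 0.5}
```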
Expected truncated sum

• Let Mλ = (S, Pλ) be an MC with cost function cλ
• Let G ⊆ S be a set of goal states
• TS_G(ρ = s0 s1 s2 …) = Σ_{i=0}^{n−1} cλ(s_i), where n is the first index s.t. s_n ∈ G
• E_λ^{TS_G}(s) = Σ_ρ Pλ(ρ) · TS_G(ρ), where ρ = s0 s1 … s_n ranges over the paths s.t. s0 = s, s_n ∈ G and s0, …, s_{n−1} ∉ G (the characterizing linear system is sketched below)
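When the strategy reaches G with probability 1, the expected truncated sums are the unique solution of a linear system: E(s) = 0 for s ∈ G and E(s) = cλ(s) + Σ_t Pλ(s)(t) · E(t) otherwise. A small self-contained sketch (the toy chain and costs are illustrative):

```python
# Expected truncated sum of an MC via its characterizing linear system:
#   E(s) = 0                              if s in G
#   E(s) = c(s) + sum_t P(s)(t) * E(t)    otherwise
import numpy as np

def expected_truncated_sum(states, P, c, goal):
    idx = {s: i for i, s in enumerate(states)}
    n = len(states)
    A = np.eye(n)
    b = np.zeros(n)
    for s in states:
        i = idx[s]
        if s in goal:
            continue  # row stays E(s) = 0
        b[i] = c[s]
        for t, p in P[s].items():
            A[i, idx[t]] -= p
    return dict(zip(states, np.linalg.solve(A, b)))

if __name__ == "__main__":
    states = ["s0", "s1", "g"]
    P = {"s0": {"s1": 0.5, "g": 0.5}, "s1": {"g": 1.0}, "g": {"g": 1.0}}
    c = {"s0": 1.0, "s1": 2.0, "g": 0.0}
    print(expected_truncated_sum(states, P, c, {"g"}))
    # E(s1) = 2, E(s0) = 1 + 0.5 * 2 = 2
```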
Stochastic shortest path (SSP)

• Let M = (S, Σ, P) be an MDP with cost function c
• Let G ⊆ S be a set of goal states
• λ∗ is optimal if E_{λ∗}^{TS_G}(s) = inf_{λ∈Λ} E_λ^{TS_G}(s) (see the optimality equations below)
• SSP problem: compute an optimal strategy λ∗
• Complexity and strategies [BT96]:
  • Solvable in polynomial time via linear programming
  • Memoryless optimal strategies exist
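Both the linear-programming solution and the strategy iteration of the next section rest on the standard Bellman optimality equations for the SSP. A sketch of the textbook formulation (cf. [BT96]), stated for strategies that reach G with probability 1:

```latex
% Bellman optimality equations characterizing the optimal values (with v^*(s) = 0 for s in G):
v^*(s) \;=\; \min_{\sigma \in \Sigma} \Big( c(s,\sigma) \;+\; \sum_{t \in S} P(s,\sigma)(t)\, v^*(t) \Big), \qquad s \notin G.
% An optimal memoryless strategy picks, in each state s, an action attaining this minimum.
```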
[Section: Symblicit approach]
Ingredients

• Strategy iteration algorithm [How60, BT96] (sketched below)
  • Generates a sequence of monotonically improving strategies
  • Two phases:
    • strategy evaluation, by solving a linear system
    • strategy improvement, at each state
  • Stops as soon as no further improvement can be made
  • Returns an optimal strategy together with its value function
• Bisimulation lumping [LS91, Buc94, KS60]
  • Applies to MCs
  • Gathers states that behave equivalently
  • Produces a (hopefully smaller) bisimulation quotient
  • We are interested in the largest bisimulation ∼L
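As a concrete reference, here is a minimal explicit-state sketch of strategy iteration for the SSP setting: evaluate the current strategy by solving its linear system, then improve greedily. Function and variable names are mine, and the sketch assumes that every strategy it visits reaches G with probability 1.

```python
# Strategy iteration for the stochastic shortest path, on an explicit MDP.
import numpy as np

def evaluate(states, P, c, lam, goal):
    """Solve E(s) = c(s, lam(s)) + sum_t P(s, lam(s))(t) E(t), with E = 0 on goal states."""
    idx = {s: i for i, s in enumerate(states)}
    A, b = np.eye(len(states)), np.zeros(len(states))
    for s in states:
        if s in goal:
            continue
        i = idx[s]
        b[i] = c[(s, lam[s])]
        for t, p in P[(s, lam[s])].items():
            A[i, idx[t]] -= p
    return dict(zip(states, np.linalg.solve(A, b)))

def improve(states, P, c, v, lam, goal):
    """Keep the current action unless some action is strictly better (ensures termination)."""
    new = dict(lam)
    for s in states:
        if s in goal:
            continue
        def one_step(a):
            return c[(s, a)] + sum(p * v[t] for t, p in P[(s, a)].items())
        best_a, best_val = lam[s], one_step(lam[s])
        for (s2, a), _ in P.items():
            if s2 == s and a != lam[s]:
                val = one_step(a)
                if val < best_val - 1e-9:
                    best_a, best_val = a, val
        new[s] = best_a
    return new

def strategy_iteration(states, P, c, lam, goal):
    while True:
        v = evaluate(states, P, c, lam, goal)
        new = improve(states, P, c, v, lam, goal)
        if new == lam:
            return lam, v
        lam = new

if __name__ == "__main__":
    # Toy instance: from s0, action "a" (cost 1 via s1) beats action "b" (cost 5 directly).
    states = ["s0", "s1", "g"]
    P = {("s0", "a"): {"s1": 1.0}, ("s0", "b"): {"g": 1.0},
         ("s1", "a"): {"g": 1.0}, ("g", "a"): {"g": 1.0}}
    c = {("s0", "a"): 1.0, ("s0", "b"): 5.0, ("s1", "a"): 1.0, ("g", "a"): 1.0}
    lam0 = {"s0": "b", "s1": "a", "g": "a"}
    lam, v = strategy_iteration(states, P, c, lam0, {"g"})
    print(lam, v)  # optimal: s0 -> a, expected cost 2 instead of 5
```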
Symblicit algorithm

• Mix of symbolic and explicit data structures (an explicit sketch of the Lump step follows the algorithm)

Algo 1 Symblicit(MDP M^S, cost function c^S, goal states G^S)
 1: n := 0, λ^S_n := InitialStrategy(M^S, G^S)
 2: repeat
 3:   (M^S_{λ_n}, c^S_{λ_n}) := InducedMCAndCost(M^S, c^S, λ^S_n)
 4:   (M^S_{λ_n,∼L}, c^S_{λ_n,∼L}) := Lump(M^S_{λ_n}, c^S_{λ_n})
 5:   (M_{λ_n,∼L}, c_{λ_n,∼L}) := Explicit(M^S_{λ_n,∼L}, c^S_{λ_n,∼L})
 6:   v_n := SolveLinearSystem(M_{λ_n,∼L}, c_{λ_n,∼L})
 7:   v^S_n := Symbolic(v_n)
 8:   λ^S_{n+1} := ImproveStrategy(M^S, λ^S_n, v^S_n)
 9:   n := n + 1
10: until λ^S_n = λ^S_{n−1}
11: return (λ^S_{n−1}, v^S_{n−1})

Key: the superscript S denotes a symbolic representation
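The Lump step computes the largest bisimulation of the induced MC. Below is a minimal explicit-state sketch of the usual signature-based partition refinement; the algorithm in the paper performs this step symbolically, and the names here are mine.

```python
# Signature-based partition refinement for Markov-chain bisimulation lumping:
# states stay together as long as they have the same cost and the same probability
# of moving into each block of the current partition (exact float comparison, for simplicity).

def lump(states, P, c):
    """P: dict s -> {t: prob}; c: dict s -> cost. Returns the stable partition (list of blocks)."""
    # Initial partition: group states by cost.
    by_cost = {}
    for s in states:
        by_cost.setdefault(c[s], set()).add(s)
    partition = [frozenset(b) for b in by_cost.values()]

    while True:
        block_of = {s: i for i, b in enumerate(partition) for s in b}

        def signature(s):
            sig = {}
            for t, p in P[s].items():
                sig[block_of[t]] = sig.get(block_of[t], 0.0) + p
            return (c[s], tuple(sorted(sig.items())))

        refined = {}
        for s in states:
            refined.setdefault(signature(s), set()).add(s)
        new_partition = [frozenset(b) for b in refined.values()]
        if len(new_partition) == len(partition):
            return new_partition
        partition = new_partition

if __name__ == "__main__":
    states = ["a", "b", "g"]
    P = {"a": {"g": 1.0}, "b": {"g": 1.0}, "g": {"g": 1.0}}
    c = {"a": 1.0, "b": 1.0, "g": 0.0}
    print(lump(states, P, c))  # 'a' and 'b' end up in the same block
```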
[Section: Antichains and pseudo-antichains]
Antichains

• Let (S, ⪯) be a semilattice with greatest lower bound
• A set α ⊆ S is an antichain if ∀s, s′ ∈ α, s ⋠ s′ and s′ ⋠ s
• The closure of α is ↓α = {s ∈ S | ∃a ∈ α, s ⪯ a}
• Example: α = {a1, a2}

[Figure: a closed set ↓α whose two maximal elements are a1 and a2]

• Canonical representation of a closed set by its (unique) set of maximal elements
• Efficient computation of closures of antichains w.r.t. union and intersection (see the sketch below)
• But antichains are not closed under negation
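A minimal sketch of antichain-represented closed sets for one concrete semilattice, namely the subsets of a finite universe ordered by ⊆, where the greatest lower bound is set intersection (this instantiation and all names are my own illustration):

```python
# Antichains over (2^U, ⊆): a downward-closed set is represented by its maximal elements.

def maximal(sets):
    """Keep only the ⊆-maximal elements: the canonical antichain of the closure."""
    sets = [frozenset(x) for x in sets]
    return {x for x in sets if not any(x < y for y in sets)}

def in_closure(s, antichain):
    """Is s in ↓antichain, i.e. is s ⊆ a for some a in the antichain?"""
    return any(frozenset(s) <= a for a in antichain)

def union(a1, a2):
    """Antichain representing ↓a1 ∪ ↓a2."""
    return maximal(a1 | a2)

def intersection(a1, a2):
    """Antichain representing ↓a1 ∩ ↓a2: pairwise meets (here, set intersections)."""
    return maximal({x & y for x in a1 for y in a2})

if __name__ == "__main__":
    a1 = maximal([{"p", "q"}, {"q", "r"}])
    a2 = maximal([{"q"}, {"p", "r"}])
    print(in_closure({"p"}, a1))   # True: {p} ⊆ {p, q}
    print(intersection(a1, a2))    # maximal elements of the intersection of the two closures
```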
Pseudo-elements

• Let (S, ⪯) be a semilattice with greatest lower bound
• A pseudo-element is a pair (x, α) where x ∈ S and α ⊆ S is an antichain such that x ∉ ↓α
• The pseudo-closure of (x, α) is ⇃(x, α) = {s ∈ S | s ⪯ x and s ∉ ↓α} = ↓{x} \ ↓α
• Example: (x, α) with α = {a1, a2}

[Figure: the pseudo-closure ⇃(x, α), i.e. the elements below x that are not below a1 or a2]

• (x, α) is in canonical form if ∀a ∈ α, a ≺ x (the canonical form is unique)
Pseudo-antichains

• A pseudo-antichain A is a set {(x_i, α_i) | i ∈ I} of pseudo-elements
• The pseudo-closure of A is ⇃A = ⋃_{i∈I} ⇃(x_i, α_i)
• A is a PA-representation of ⇃A (not unique)
• Any set can be PA-represented
• Efficient computation of pseudo-closures of pseudo-antichains w.r.t. union, intersection and negation (see the sketch below)
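Continuing the ⊆-instantiation used in the antichain sketch above (still my own illustration, not the paper's symbolic implementation): a pseudo-element (x, α) denotes the elements below x that are not below any a ∈ α, and complements of closed sets become representable.

```python
# Pseudo-elements over (2^U, ⊆): (x, alpha) denotes ↓{x} \ ↓alpha, i.e. the subsets
# of x that are not included in any element of the antichain alpha.

def in_closure(s, alpha):
    return any(frozenset(s) <= a for a in alpha)

def in_pseudo_closure(s, x, alpha):
    return frozenset(s) <= frozenset(x) and not in_closure(s, alpha)

def pseudo_closure_members(x, alpha):
    """Enumerate ⇃(x, alpha) explicitly (exponential; for illustration only)."""
    from itertools import chain, combinations
    x = frozenset(x)
    subsets = chain.from_iterable(combinations(sorted(x), r) for r in range(len(x) + 1))
    return [frozenset(s) for s in subsets if in_pseudo_closure(s, x, alpha)]

if __name__ == "__main__":
    U = frozenset({"p", "q", "r"})
    alpha = {frozenset({"p", "q"})}
    # The complement of ↓alpha inside 2^U is itself a pseudo-closure: ⇃(U, alpha).
    complement = pseudo_closure_members(U, alpha)
    print(sorted(map(sorted, complement)))
    # -> every subset of {p, q, r} except {}, {p}, {q}, {p, q}
```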
[Section: Monotonic Markov decision processes]
Monotonic properties

Intuition on a transition system (TS) (S, Σ, ∆) where:
• S: set of states
• Σ: set of actions
• ∆: transition function

A monotonic TS is a TS (S, Σ, ∆) s.t.:
• S is equipped with a partial order ⪯ s.t. (S, ⪯) is a semilattice
• ⪯ is compatible with ∆ (a brute-force check is sketched below), i.e. for all s, s′ ∈ S with s ⪯ s′ and all σ ∈ Σ:

[Diagram: if t′ = ∆(s′, σ) exists, then there exists t = ∆(s, σ) with t ⪯ t′]
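Under this reading of the compatibility diagram (my interpretation: whenever σ is enabled in the larger state, it is also enabled in the smaller one and the successors stay ordered), compatibility can be checked by brute force on a small explicit TS. A hypothetical sketch:

```python
# Brute-force compatibility check for a partial order and a (partial, deterministic)
# transition function: s ⪯ s' implies that, for every action σ with ∆(s', σ) defined,
# ∆(s, σ) is defined and ∆(s, σ) ⪯ ∆(s', σ).

def is_compatible(states, actions, delta, leq):
    """delta: dict (s, a) -> successor (may be partial); leq: function (s, t) -> bool."""
    for s in states:
        for sp in states:
            if not leq(s, sp):
                continue
            for a in actions:
                if (sp, a) not in delta:
                    continue
                if (s, a) not in delta or not leq(delta[(s, a)], delta[(sp, a)]):
                    return False
    return True

if __name__ == "__main__":
    # States are subsets ordered by inclusion; the single action removes "q" from the state.
    states = [frozenset(), frozenset({"q"}), frozenset({"p", "q"})]
    actions = ["drop_q"]
    delta = {(s, "drop_q"): s - {"q"} for s in states}
    print(is_compatible(states, actions, delta, lambda s, t: s <= t))  # True
```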
Monotonic Markov decision processes

Monotonic MDP:
• MDP whose underlying TS is monotonic

Remark:
• Every MDP can be seen as monotonic
• We are interested in MDPs built on state spaces that already come with a natural partial order

⇒ Pseudo-antichain based symblicit algorithm for monotonic MDPs
[Section: Applications]
STRIPS

A STRIPS instance is a tuple (P, I, M, O) where
• P is a finite set of propositional variables
• I ⊆ P is the subset of initial variables
• M ⊆ P is the subset of goal variables
• O is a finite set of operators o = (γ, (α, δ)) s.t.
  • γ ⊆ P is the guard of o
  • (α, δ), with α, δ ⊆ P, is the effect of o

[Diagram: from a state s with s ⊇ γ, the operator (γ, (α, δ)) leads to s′ = (s ∪ α) \ δ]

⇒ TS with monotonic properties

Planning from STRIPS [FN72]
• Find a sequence of operators leading from the initial state I to a goal state s ⊇ M (successor computation sketched below)
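A minimal sketch of the STRIPS semantics exactly as stated on the slide: an operator (γ, (α, δ)) is applicable in a state s (the set of currently true variables) iff s ⊇ γ, and then yields s′ = (s ∪ α) \ δ. The concrete instance in the demo is made up for illustration.

```python
# STRIPS semantics: states are sets of true propositional variables.

def applicable(s, op):
    gamma, (alpha, delta) = op
    return s >= gamma                      # s ⊇ γ

def apply_op(s, op):
    gamma, (alpha, delta) = op
    assert applicable(s, op)
    return (s | alpha) - delta             # s' = (s ∪ α) \ δ

def is_goal(s, M):
    return s >= M                          # goal states: s ⊇ M

if __name__ == "__main__":
    # Illustrative instance: switch on a lamp that requires power.
    I = frozenset({"power"})
    M = frozenset({"light"})
    switch_on = (frozenset({"power"}), (frozenset({"light"}), frozenset()))
    s = apply_op(I, switch_on)
    print(s, is_goal(s, M))  # state now contains 'power' and 'light'; goal reached -> True
```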
Stochastic STRIPS

• Extension of STRIPS with stochastic aspects [BL00]
  • Probability distribution over the effects of operators (see the sketch below)
  ⇒ Monotonic Markov decision process
• Cost function C : O → R>0
• Planning from stochastic STRIPS
  • Minimize the expected truncated sum, from I, up to a state s ⊇ M
  ⇒ Stochastic shortest path problem
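With probabilistic effects, each applicable operator induces a distribution over successor states, which plays the role of the MDP transition P(s, o). A small sketch; the effect lists, probabilities and variable names are invented for illustration.

```python
# Stochastic STRIPS operator: a guard plus a distribution over effects (alpha, delta).
# Applying operator o in a state s with s ⊇ γ yields successor (s ∪ α_i) \ δ_i with the
# probability attached to effect i — i.e. the MDP transition P(s, o).

def transition(s, op):
    gamma, effects = op                    # effects: list of (prob, (alpha, delta))
    assert s >= gamma
    dist = {}
    for prob, (alpha, delta) in effects:
        succ = frozenset((s | alpha) - delta)
        dist[succ] = dist.get(succ, 0.0) + prob
    return dist

if __name__ == "__main__":
    # "Build castle": succeeds with probability 0.8, gets washed away with probability 0.2.
    build = (frozenset({"on_beach"}),
             [(0.8, (frozenset({"castle"}), frozenset())),
              (0.2, (frozenset(), frozenset({"castle"})))])
    s = frozenset({"on_beach"})
    print(transition(s, build))
    # {frozenset({'on_beach', 'castle'}): 0.8, frozenset({'on_beach'}): 0.2}
```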
Experimental results

Planning from stochastic STRIPS with the pseudo-antichain (PA) symblicit algorithm, compared with an explicit implementation, on the moats-and-castles and monkey benchmarks (two families of eight instances each).

  example    E_λ^{TS_G}   |M^S|          #it   |∼L|   PA time    PA mem
  (3, 2)     35.75        4096           4     23     0.2        16.0
  (3, 3)     35.75        65536          5     43     1.6        17.3
  (3, 4)     35.75        1048576        6     57     17.8       21.7
  (3, 5)     36.00        16777216       7     88     272.1      37.5
  (5, 2)     35.75        65536          4     31     0.5        16.6
  (5, 3)     35.75        4194304        5     56     8.2        19.5
  (5, 4)     35.75        268435456      6     97     196.8      31.3
  (5, 5)     36.00        17179869184    7     152    7098.4     81.3

  (2, 5)     32.22        4096           3     49     1.8        17.3
  (2, 6)     32.22        16384          3     66     11.7       19.3
  (3, 3)     59.00        4096           3     84     15.3       20.2
  (3, 4)     52.00        32768          3     219    150.8      30.7
  (3, 5)     48.33        262144         3     357    740.2      49.1
  (3, 6)     48.33        2097152        3     595    11597.7    145.8
  (4, 2)     96.89        4096           3     132    43.7       26.5
  (4, 3)     78.67        65536          3     464    1594.5     82.2

Explicit implementation, time and mem columns as reported ("> 4000" marks runs exceeding the limit of 4000): 60.6, 20316.2, 133.7, 2966.8, 149.6, 14660.7, 173.6, 1626, > 4000, > 4000, > 4000, 2343, > 4000, > 4000, > 4000, 1202, 1706, 1205, 1611, > 4000, > 4000, 1211, > 4000
Expected mean-payoff with LTL synthesis

Results from [BBFR13]:
• Synthesis from LTL specifications with mean-payoff objectives
• Reduction to a 2-player safety game (SG)
  • between the system and its environment
  • equipped with a partial order (monotonic properties)

Goal: compute a worst-case winning strategy with good expected performance

Idea:
• Replace the environment by a probability distribution in the SG restricted to its winning states
  ⇒ Monotonic MDP
• Symblicit algorithm for the expected mean-payoff problem
• Implementation in Acacia+
Experimental results

Comparison with an MTBDD-based symblicit algorithm [VE13]

[Figure: execution time]
[Figure: memory consumption]

⇒ Monotonic MDPs are better handled by pseudo-antichains
[Section: Conclusion and future work]
Conclusion and future work

Summary:
• New data structure: pseudo-antichains
• Symblicit algorithms for monotonic MDPs with a natural partial order
  • Expected mean-payoff and stochastic shortest path
• Promising experimental results

Future work:
• Implementation of an MTBDD-based symblicit algorithm for the stochastic shortest path
• Apply pseudo-antichains in other contexts (e.g. model checking of probabilistic lossy channel systems)
Thank you!
Questions?
References I
Aaron Bohy, Véronique Bruyère, Emmanuel Filiot, and Jean-François Raskin.
Synthesis from LTL specifications with mean-payoff objectives.
In Nir Piterman and Scott A. Smolka, editors, TACAS, volume 7795 of Lecture Notes in
Computer Science, pages 169–184. Springer, 2013.
Avrim L. Blum and John C. Langford.
Probabilistic planning in the graphplan framework.
In Recent Advances in AI Planning, pages 319–332. Springer, 2000.
D. P. Bertsekas and J. N. Tsitsiklis.
Neuro-Dynamic Programming.
Athena Scientific, 1996.
Peter Buchholz.
Exact and ordinary lumpability in finite Markov chains.
Journal of applied probability, pages 59–75, 1994.
Laurent Doyen and Jean-François Raskin.
Improved algorithms for the automata-based approach to model-checking.
In Orna Grumberg and Michael Huth, editors, TACAS, volume 4424 of Lecture Notes in
Computer Science, pages 451–465. Springer, 2007.
Richard E. Fikes and Nils J. Nilsson.
STRIPS: A new approach to the application of theorem proving to problem solving.
Artificial intelligence, 2(3):189–208, 1972.
References II
Ronald A. Howard.
Dynamic Programming and Markov Processes.
John Wiley and Sons, 1960.
John G. Kemeny and J. L. Snell.
Finite Markov Chains.
Van Nostrand Company, Inc, 1960.
Kim G. Larsen and Arne Skou.
Bisimulation through probabilistic testing.
Inf. Comput., 94(1):1–28, 1991.
Christian von Essen.
Personal communication, 20-11-2013.
R. Wimmer, B. Braitling, B. Becker, E. M. Hahn, P. Crouzen, H. Hermanns, A. Dhama, and
O. E. Theel.
Symblicit calculation of long-run averages for concurrent probabilistic systems.
In QEST, pages 27–36. IEEE Computer Society, 2010.
Martin De Wulf, Laurent Doyen, Thomas A. Henzinger, and Jean-François Raskin.
Antichains: A new algorithm for checking universality of finite automata.
In Thomas Ball and Robert B. Jones, editors, CAV, volume 4144 of Lecture Notes in
Computer Science, pages 17–30. Springer, 2006.