Approximate Solutions for Factored Dec

Approximate Solutions for Factored
Dec-POMDPs with Many Agents
Frans A. Oliehoek, Shimon Whiteson, & Matthijs T.J. Spaan
AAMAS, Wednesday May 8, 2013
May 8, 2013
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
1 / 33
Visual Roadmap
model
model
background:
background:
FSPC
FSPC
Factored
Factored FSPC
FSPC
practical
practical factored
factored FFSPC
FFSPC
May 8, 2013
results
results
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
2 / 33
Multiagent decision making
under Uncertainty

Outcome Uncertainty

Partial Observability

Multiagent Systems: uncertainty about others
May 8, 2013
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
3 / 33
Formal Model

A Dec-POMDP









〈 S , A , PT , O , PO , R ,h〉
n agents
S – set of states
A – set of joint actions
PT – transition function
a=〈 a1, a2, ... ,an 〉
P(s '∣s , a)
O – set of joint observations
PO – observation function
o=〈o1, o2, ..., o n 〉
R – reward function
h – horizon (finite)
R (s , a)=E s ' R(s ,a , s ' )
May 8, 2013
P(o∣a , s ')
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
4 / 33
Factored Dec-POMDPs

state variables
or 'factors'
E.g., 'Firefighting Graph'

H houses h1...hH

H-1 agents
h1

actions: left or right

observations: flames or not
h2
1
2
3
4
h3
h4
t
May 8, 2013
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
5 / 33
Factored Dec-POMDPs

CPTs
(for each factor)
E.g., 'Firefighting Graph'

H houses h1...hH

H-1 agents

actions: left or right

observations: flames or not
1
2
3
4
h1
h1'
h2
h2'
h3
h 3'
h4
h 4'
t
May 8, 2013
t+1
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
6 / 33
Factored Dec-POMDPs

CPTs
(for each factor)
E.g., 'Firefighting Graph'

H houses h1...hH

H-1 agents
h1

actions: left or right

observations: flames or not
h1'
a1
h2
h2'
a2
1
2
3
4
h3
h 3'
a3
h4
t
May 8, 2013
h 4'
t+1
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
7 / 33
Factored Dec-POMDPs

CPTs
(for each factor)
E.g., 'Firefighting Graph'

H houses h1...hH

H-1 agents
h1

actions: left or right

observations: flames or not
h1'
o1
a1
h2
h2'
o2
a2
1
2
3
4
h3
h 3'
o3
a3
h4
t
May 8, 2013
h 4'
t+1
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
8 / 33
Factored Dec-POMDPs
reward for house 1

E.g., 'Firefighting Graph'

H houses h1...hH

H-1 agents
R1
h1

actions: left or right

observations: flames or not
h 1'
o1
a1
h2
h 2'
o2
a2
1
2
3
4
h3
h3'
o3
a3
h4
t
May 8, 2013
h4'
t+1
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
9 / 33
Factored Dec-POMDPs

E.g., 'Firefighting Graph'

H houses h1...hH

H-1 agents
R1
Expected reward for
house 1
h1

actions: left or right

observations: flames or not
h 1'
o1
a1
h2
h 2'
o2
a2
1
2
3
4
h3
h3'
o3
a3
h4
t
May 8, 2013
h4'
t+1
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
10 / 33
Background: FSPC – 1


Goal: scalability w.r.t. #agents
Forward-sweep policy computation (FSPC)



for each t=0,...,h-1
compute best joint decision rule δt
given past joint policy φt
May 8, 2013
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
11 / 33
Background: FSPC – 1


Goal: scalability w.r.t. #agents
Forward-sweep policy computation (FSPC)



for each t=0,...,h-1
compute best joint decision rule δt
given past joint policy φt
May 8, 2013
()
φ0
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
12 / 33
Background: FSPC – 1


Goal: scalability w.r.t. #agents
Forward-sweep policy computation (FSPC)



for each t=0,...,h-1
compute best joint decision rule δt
given past joint policy φt
May 8, 2013
()
φ0
δ0
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
13 / 33
Background: FSPC – 1


Goal: scalability w.r.t. #agents
Forward-sweep policy computation (FSPC)



for each t=0,...,h-1
compute best joint decision rule δt
given past joint policy φt
()
φ0
(δ0)
φ1
May 8, 2013
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
14 / 33
Background: FSPC – 1


Goal: scalability w.r.t. #agents
Forward-sweep policy computation (FSPC)



for each t=0,...,h-1
compute best joint decision rule δt
given past joint policy φt
()
φ0
(δ0)
φ1
δ1
May 8, 2013
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
15 / 33
Background: FSPC – 1


Goal: scalability w.r.t. #agents
Forward-sweep policy computation (FSPC)



for each t=0,...,h-1
compute best joint decision rule δt
given past joint policy φt
()
φ0
(δ0)
φ1
(δ0,δ1)
φ2
May 8, 2013
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
16 / 33
Background: FSPC – 2


Problem at stage t

select δ=<δ1,...,δn>

δi : histories → actions δti ( ⃗θti )=ati
As collaborative Bayesian games (CBGs)




φ0
B(φ0)
φ1
B(φ1)
φ2
B(φ2)
agents, actions
types θi ↔ histories
probabilities: P(θ)
payoffs: Q(θ,a)
May 8, 2013
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
17 / 33
Background: CBGs
1
2
3
ag. 2
Flames
Flames
agent 1

No Flames
No Flames
H2
H3
H2
H3
H1
+2
-1
...
...
H2
...
...
...
...
H1
…
...
...
...
H2
...
...
...
...
As collaborative Bayesian games (CBGs)




B(φ0)
B(φ1)
agents, actions
types θi ↔ histories
probabilities: P(θ)
payoffs: Q(θ,a)
May 8, 2013
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
B(φ2)
18 / 33
This paper: Factored FSPC
Factored
Factored FSPC
FSPC –– Basic
Basic idea:
idea:
exploit
exploit independence
independence between
between agents
agents in
in FSPC
FSPC

if value function factored
Q( x , ⃗θ , a)=∑e Q e ( x e , ⃗
θ e , ae )
→ replace CBGs with
Collaborative graphical Bayesian games (CGBGs)
May 8, 2013
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
19 / 33
This paper: Factored FSPC
agent 1
agent 3
agent 2

if value function factored
Total payoff is sum of components
→ replace CBGs with
Collaborative graphical Bayesian games (CGBGs)
May 8, 2013
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
20 / 33
Factored FSPC for Many Agents
Improving scalability w.r.t. the number of agents...
1) Approximate structure of Q*

Predetermined scope structure
2) Compute CGBG payoff functions

Transfer Planning (TP)
B(φ0)
B(φ1)
B(φ2)
3) Inference techniques to construct CGBGs

Extension of factored frontier [Murphy&Weiss UAI 2001]
4) Efficient solutions of CGBGs

Max-plus to ATI-FG [OWS UAI 2012]
May 8, 2013
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
21 / 33
Factored FSPC for Many Agents
B(φ0)
Improving scalability w.r.t. the number of agents...
1) Approximate structure of Q*

Predetermined scope structure
2) Compute CGBG payoff functions

Rest of
this talk
Transfer Planning (TP)
B(φ1)
B(φ2)
3) Inference techniques to construct CGBGs

Extension of factored frontier [Murphy&Weiss UAI 2001]
4) Efficient solutions of CGBGs

Max-plus to ATI-FG [OWS UAI 2012]
May 8, 2013
see paper
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
22 / 33
Accumulation of Reward over Time
Q e=1 ( x e , ⃗
θe , a e )
h1
h1
a1
h2
o1
h1
a1
h2
a2
h3
o2
o3
o1
a2
o2
a2
h3
a3
o3
h4
h4
h4
t=0
t=1
t=2
May 8, 2013
a1
h2
h3
a3
R1
a3
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
23 / 33
Accumulation of Reward over Time
R1
h1
h1
a1
h2
o1
h1
a1
h2
a2
h3
o2
o3
a2
o2
a2
h3
a3
o3
h4
h4
h4
t=0
t=1
t=2
May 8, 2013
a1
h2
h3
a3
o1
a3
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
24 / 33
Accumulation of Reward over Time
Still
Still factored,
factored, but
but
components
components not
not local!
local!
Q1
h1
h1
a1
h2
o1
h1
a1
h2
a2
h3
o2
o3
o1
a2
o2
a2
h3
a3
o3
h4
h4
h4
t=0
t=1
t=2
May 8, 2013
a1
h2
h3
a3
R1
a3
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
25 / 33
1) Predetermined Scope Structure

Use smaller scopes
Q1
h1
h1
a1
h2
a1
h2
a2
h3
o2
o3
o1
a1
h2
a2
h3
a3
May 8, 2013
o1
h1
o2
a2
h3
a3
o3
h4
h4
h4
t=0
t=1
t=2
a3
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
26 / 33
1) Predetermined Scope Structure

Use smaller scopes
Q1
h1
h1
a1
h2
a1
h2
a2
h3
o2
o3
o1
a1
Q3
h2
a2
h3
a3
May 8, 2013
o1
Q2
h1
o2
a2
h3
a3
o3
h4
h4
h4
t=0
t=1
t=2
a3
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Q4
27 / 33
2) Transfer Planning

Define a source problem for each component

involving fewer agents
h1
h1
a1
h2
a1
h2
a2
h3
o2
o3
o1
a2
o2
1
a1
2
3
Q3
h2
h3
a3

o1
Q2
h1
a2
h3
a3
o3
h4
h4
h4
t=0
t=1
t=2
a3
2
3
4
Solve those exactly or approximately (QMDP, etc.)
May 8, 2013
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
28 / 33
Results – Compared to Optimal
Fact.
Fact. FSPC
FSPC ++ TP
TP ++ QBG
QBG performs
performs optimally
optimally
1
2
May 8, 2013
3
4
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
29 / 33
Results – vs. Approximate
TP
TP achieves
achieves (near-)
(near-)
best
best value....
value....


...and
...and superior
superior
scaling
scaling behavior
behavior
FFSCP: TP vs ADP, NR
Non-factored FSPC, DICE [Oliehoek et al. 2008 Informatica]
May 8, 2013
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
30 / 33
Results – Many Agents
May 8, 2013
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
31 / 33
Conclusions

Factored FSPC with transfer planning:
approximates factored Dec-POMDPs with multiple abstractions
involving subsets of agents

Unprecedented scalability for this class


results up to 1000 agents
Future work:


scale to higher horizon
understand such abstractions


May 8, 2013
empirically verified near-optimal quality
formal understanding influence-based abstraction [AAAI' 12]
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
32 / 33
References







R. Emery-Montemerlo, G. Gordon, J. Schneider, and S. Thrun. Approximate solutions for
partially observable stochastic games with common payoffs. In AAMAS, 2004.
K. P. Murphy and Y. Weiss. The factored frontier algorithm for approximate inference in
DBNs. In UAI, 2001.
F. A. Oliehoek, J. F. Kooi, and N. Vlassis. The cross-entropy method for policy search in
decentralized POMDPs. Informatica, 32:341–357, 2008
F. A. Oliehoek, M. T. J. Spaan, S. Whiteson, and N. Vlassis. Exploiting locality of interaction in
factored Dec-POMDPs. In AAMAS, 2008.
F. A. Oliehoek, S. Witwicki, and L. P. Kaelbling. Influence-Based Abstraction for Multiagent
Systems. In AAAI, 2012.
F. A. Oliehoek, S. Whiteson, and M. T. J. Spaan. Exploiting structure in cooperative Bayesian
games. In UAI, 2012.
D. Szer, F. Charpillet, and S. Zilberstein. MAA*: A heuristic search algorithm for solving
decentralized POMDPs. In UAI, 2005.
May 8, 2013
Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
33 / 33