Approximate Solutions for Factored Dec-POMDPs with Many Agents Frans A. Oliehoek, Shimon Whiteson, & Matthijs T.J. Spaan AAMAS, Wednesday May 8, 2013 May 8, 2013 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 1 / 33 Visual Roadmap model model background: background: FSPC FSPC Factored Factored FSPC FSPC practical practical factored factored FFSPC FFSPC May 8, 2013 results results Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 2 / 33 Multiagent decision making under Uncertainty Outcome Uncertainty Partial Observability Multiagent Systems: uncertainty about others May 8, 2013 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 3 / 33 Formal Model A Dec-POMDP 〈 S , A , PT , O , PO , R ,h〉 n agents S – set of states A – set of joint actions PT – transition function a=〈 a1, a2, ... ,an 〉 P(s '∣s , a) O – set of joint observations PO – observation function o=〈o1, o2, ..., o n 〉 R – reward function h – horizon (finite) R (s , a)=E s ' R(s ,a , s ' ) May 8, 2013 P(o∣a , s ') Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 4 / 33 Factored Dec-POMDPs state variables or 'factors' E.g., 'Firefighting Graph' H houses h1...hH H-1 agents h1 actions: left or right observations: flames or not h2 1 2 3 4 h3 h4 t May 8, 2013 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 5 / 33 Factored Dec-POMDPs CPTs (for each factor) E.g., 'Firefighting Graph' H houses h1...hH H-1 agents actions: left or right observations: flames or not 1 2 3 4 h1 h1' h2 h2' h3 h 3' h4 h 4' t May 8, 2013 t+1 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 6 / 33 Factored Dec-POMDPs CPTs (for each factor) E.g., 'Firefighting Graph' H houses h1...hH H-1 agents h1 actions: left or right observations: flames or not h1' a1 h2 h2' a2 1 2 3 4 h3 h 3' a3 h4 t May 8, 2013 h 4' t+1 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 7 / 33 Factored Dec-POMDPs CPTs (for each factor) E.g., 'Firefighting Graph' H houses h1...hH H-1 agents h1 actions: left or right observations: flames or not h1' o1 a1 h2 h2' o2 a2 1 2 3 4 h3 h 3' o3 a3 h4 t May 8, 2013 h 4' t+1 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 8 / 33 Factored Dec-POMDPs reward for house 1 E.g., 'Firefighting Graph' H houses h1...hH H-1 agents R1 h1 actions: left or right observations: flames or not h 1' o1 a1 h2 h 2' o2 a2 1 2 3 4 h3 h3' o3 a3 h4 t May 8, 2013 h4' t+1 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 9 / 33 Factored Dec-POMDPs E.g., 'Firefighting Graph' H houses h1...hH H-1 agents R1 Expected reward for house 1 h1 actions: left or right observations: flames or not h 1' o1 a1 h2 h 2' o2 a2 1 2 3 4 h3 h3' o3 a3 h4 t May 8, 2013 h4' t+1 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 10 / 33 Background: FSPC – 1 Goal: scalability w.r.t. #agents Forward-sweep policy computation (FSPC) for each t=0,...,h-1 compute best joint decision rule δt given past joint policy φt May 8, 2013 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 11 / 33 Background: FSPC – 1 Goal: scalability w.r.t. #agents Forward-sweep policy computation (FSPC) for each t=0,...,h-1 compute best joint decision rule δt given past joint policy φt May 8, 2013 () φ0 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 12 / 33 Background: FSPC – 1 Goal: scalability w.r.t. #agents Forward-sweep policy computation (FSPC) for each t=0,...,h-1 compute best joint decision rule δt given past joint policy φt May 8, 2013 () φ0 δ0 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 13 / 33 Background: FSPC – 1 Goal: scalability w.r.t. #agents Forward-sweep policy computation (FSPC) for each t=0,...,h-1 compute best joint decision rule δt given past joint policy φt () φ0 (δ0) φ1 May 8, 2013 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 14 / 33 Background: FSPC – 1 Goal: scalability w.r.t. #agents Forward-sweep policy computation (FSPC) for each t=0,...,h-1 compute best joint decision rule δt given past joint policy φt () φ0 (δ0) φ1 δ1 May 8, 2013 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 15 / 33 Background: FSPC – 1 Goal: scalability w.r.t. #agents Forward-sweep policy computation (FSPC) for each t=0,...,h-1 compute best joint decision rule δt given past joint policy φt () φ0 (δ0) φ1 (δ0,δ1) φ2 May 8, 2013 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 16 / 33 Background: FSPC – 2 Problem at stage t select δ=<δ1,...,δn> δi : histories → actions δti ( ⃗θti )=ati As collaborative Bayesian games (CBGs) φ0 B(φ0) φ1 B(φ1) φ2 B(φ2) agents, actions types θi ↔ histories probabilities: P(θ) payoffs: Q(θ,a) May 8, 2013 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 17 / 33 Background: CBGs 1 2 3 ag. 2 Flames Flames agent 1 No Flames No Flames H2 H3 H2 H3 H1 +2 -1 ... ... H2 ... ... ... ... H1 … ... ... ... H2 ... ... ... ... As collaborative Bayesian games (CBGs) B(φ0) B(φ1) agents, actions types θi ↔ histories probabilities: P(θ) payoffs: Q(θ,a) May 8, 2013 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents B(φ2) 18 / 33 This paper: Factored FSPC Factored Factored FSPC FSPC –– Basic Basic idea: idea: exploit exploit independence independence between between agents agents in in FSPC FSPC if value function factored Q( x , ⃗θ , a)=∑e Q e ( x e , ⃗ θ e , ae ) → replace CBGs with Collaborative graphical Bayesian games (CGBGs) May 8, 2013 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 19 / 33 This paper: Factored FSPC agent 1 agent 3 agent 2 if value function factored Total payoff is sum of components → replace CBGs with Collaborative graphical Bayesian games (CGBGs) May 8, 2013 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 20 / 33 Factored FSPC for Many Agents Improving scalability w.r.t. the number of agents... 1) Approximate structure of Q* Predetermined scope structure 2) Compute CGBG payoff functions Transfer Planning (TP) B(φ0) B(φ1) B(φ2) 3) Inference techniques to construct CGBGs Extension of factored frontier [Murphy&Weiss UAI 2001] 4) Efficient solutions of CGBGs Max-plus to ATI-FG [OWS UAI 2012] May 8, 2013 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 21 / 33 Factored FSPC for Many Agents B(φ0) Improving scalability w.r.t. the number of agents... 1) Approximate structure of Q* Predetermined scope structure 2) Compute CGBG payoff functions Rest of this talk Transfer Planning (TP) B(φ1) B(φ2) 3) Inference techniques to construct CGBGs Extension of factored frontier [Murphy&Weiss UAI 2001] 4) Efficient solutions of CGBGs Max-plus to ATI-FG [OWS UAI 2012] May 8, 2013 see paper Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 22 / 33 Accumulation of Reward over Time Q e=1 ( x e , ⃗ θe , a e ) h1 h1 a1 h2 o1 h1 a1 h2 a2 h3 o2 o3 o1 a2 o2 a2 h3 a3 o3 h4 h4 h4 t=0 t=1 t=2 May 8, 2013 a1 h2 h3 a3 R1 a3 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 23 / 33 Accumulation of Reward over Time R1 h1 h1 a1 h2 o1 h1 a1 h2 a2 h3 o2 o3 a2 o2 a2 h3 a3 o3 h4 h4 h4 t=0 t=1 t=2 May 8, 2013 a1 h2 h3 a3 o1 a3 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 24 / 33 Accumulation of Reward over Time Still Still factored, factored, but but components components not not local! local! Q1 h1 h1 a1 h2 o1 h1 a1 h2 a2 h3 o2 o3 o1 a2 o2 a2 h3 a3 o3 h4 h4 h4 t=0 t=1 t=2 May 8, 2013 a1 h2 h3 a3 R1 a3 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 25 / 33 1) Predetermined Scope Structure Use smaller scopes Q1 h1 h1 a1 h2 a1 h2 a2 h3 o2 o3 o1 a1 h2 a2 h3 a3 May 8, 2013 o1 h1 o2 a2 h3 a3 o3 h4 h4 h4 t=0 t=1 t=2 a3 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 26 / 33 1) Predetermined Scope Structure Use smaller scopes Q1 h1 h1 a1 h2 a1 h2 a2 h3 o2 o3 o1 a1 Q3 h2 a2 h3 a3 May 8, 2013 o1 Q2 h1 o2 a2 h3 a3 o3 h4 h4 h4 t=0 t=1 t=2 a3 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents Q4 27 / 33 2) Transfer Planning Define a source problem for each component involving fewer agents h1 h1 a1 h2 a1 h2 a2 h3 o2 o3 o1 a2 o2 1 a1 2 3 Q3 h2 h3 a3 o1 Q2 h1 a2 h3 a3 o3 h4 h4 h4 t=0 t=1 t=2 a3 2 3 4 Solve those exactly or approximately (QMDP, etc.) May 8, 2013 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 28 / 33 Results – Compared to Optimal Fact. Fact. FSPC FSPC ++ TP TP ++ QBG QBG performs performs optimally optimally 1 2 May 8, 2013 3 4 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 29 / 33 Results – vs. Approximate TP TP achieves achieves (near-) (near-) best best value.... value.... ...and ...and superior superior scaling scaling behavior behavior FFSCP: TP vs ADP, NR Non-factored FSPC, DICE [Oliehoek et al. 2008 Informatica] May 8, 2013 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 30 / 33 Results – Many Agents May 8, 2013 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 31 / 33 Conclusions Factored FSPC with transfer planning: approximates factored Dec-POMDPs with multiple abstractions involving subsets of agents Unprecedented scalability for this class results up to 1000 agents Future work: scale to higher horizon understand such abstractions May 8, 2013 empirically verified near-optimal quality formal understanding influence-based abstraction [AAAI' 12] Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 32 / 33 References R. Emery-Montemerlo, G. Gordon, J. Schneider, and S. Thrun. Approximate solutions for partially observable stochastic games with common payoffs. In AAMAS, 2004. K. P. Murphy and Y. Weiss. The factored frontier algorithm for approximate inference in DBNs. In UAI, 2001. F. A. Oliehoek, J. F. Kooi, and N. Vlassis. The cross-entropy method for policy search in decentralized POMDPs. Informatica, 32:341–357, 2008 F. A. Oliehoek, M. T. J. Spaan, S. Whiteson, and N. Vlassis. Exploiting locality of interaction in factored Dec-POMDPs. In AAMAS, 2008. F. A. Oliehoek, S. Witwicki, and L. P. Kaelbling. Influence-Based Abstraction for Multiagent Systems. In AAAI, 2012. F. A. Oliehoek, S. Whiteson, and M. T. J. Spaan. Exploiting structure in cooperative Bayesian games. In UAI, 2012. D. Szer, F. Charpillet, and S. Zilberstein. MAA*: A heuristic search algorithm for solving decentralized POMDPs. In UAI, 2005. May 8, 2013 Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents 33 / 33
© Copyright 2026 Paperzz