Distributed Model Shaping for Scaling to Decentralized POMDPs with Hundreds of Agents
Prasanna Velagapudi, Pradeep Varakantham, Paul Scerri, Katia Sycara
D-TREMOR (AAMAS 2011)

Motivation
• 100s to 1000s of robots, agents, and people
• Complex, collaborative tasks
• Dynamic, uncertain environments
• Offline planning

Motivation
• Exploit three characteristics of these domains:
1. Explicit interactions
• Specific combinations of states and actions whose effects depend on more than one agent
2. Sparsity of interactions
• Many potential interactions could occur between agents
• Only a few will occur in any given solution
3. Distributed computation
• Each agent has access to local computation
• A centralized algorithm has access to 1 unit of computation; a distributed algorithm has access to N units

Review: Dec-POMDP
• Joint transition function: P(s′ | s, ⟨a₁, …, aₙ⟩)
• Joint reward function: R(s, ⟨a₁, …, aₙ⟩)
• Joint observation function: O(⟨ω₁, …, ωₙ⟩ | s′, ⟨a₁, …, aₙ⟩)

Distributed POMDP with Coordination Locales [Varakantham, et al 2009]
• A coordination locale (CL) pairs a time constraint with the relevant region of the joint state-action space:
CL = ⟨time constraint, region of joint state-action space⟩
• The time constraint captures the nature of the interaction (e.g., affects only the same time step, or affects any future time step)

D-TREMOR (extending TREMOR [Varakantham, et al 2009])
• Task Allocation: decentralized auction
• Local Planning: EVA POMDP solver
• Interaction Exchange: policy sub-sampling and coordination locale (CL) messages
• Model Shaping: prioritized/randomized reward and transition shaping

D-TREMOR: Task Allocation
• Assign "tasks" using a decentralized auction
– Greedy, nearest allocation (see the sketch below)
• Create a local, independent sub-problem for each agent
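The slides name the allocation step only as a greedy, nearest auction, so the following is a minimal sketch under that assumption; the function name greedy_nearest_allocation and the point-location task representation are illustrative placeholders, not part of D-TREMOR's published code.

```python
import math

def greedy_nearest_allocation(agents, tasks):
    """Greedy, nearest-first task allocation (hypothetical sketch).

    agents: dict mapping agent_id -> (x, y) position
    tasks:  dict mapping task_id  -> (x, y) position
    Returns a dict mapping agent_id -> task_id, at most one task per agent.
    """
    assignment = {}
    free_agents = set(agents)
    for task_id, task_pos in tasks.items():
        if not free_agents:
            break
        # Each free agent "bids" its distance to the task; the lowest bid wins.
        winner = min(free_agents, key=lambda a: math.dist(agents[a], task_pos))
        assignment[winner] = task_id
        free_agents.remove(winner)
    return assignment

# Example: two rescue agents and two victims.
print(greedy_nearest_allocation({"r1": (0, 0), "r2": (5, 5)},
                                {"v1": (1, 0), "v2": (4, 5)}))
# {'r1': 'v1', 'r2': 'v2'}
```

In the truly decentralized setting each agent would compute and broadcast its own bid rather than having one process iterate over all agents; this sketch only captures the greedy, nearest-wins outcome.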
D-TREMOR: Local Planning
• Solve each local sub-problem using an off-the-shelf algorithm (EVA)
• Result: locally-optimal policies

D-TREMOR: Interaction Exchange
• Finding PrCLi [Kearns 2002]:
– Evaluate the local policy over sampled runs
– Compute the frequency of the associated (si, ai) pairs
– Example: entered the corridor in 95 of 100 runs → PrCLi = 0.95
• Finding ValCLi [Kearns 2002]:
– Sample the local policy value with and without interactions, testing interactions independently
– Compute the change in value if the interaction occurred
– Example: no collision vs. collision → ValCLi = -7
• Send CL messages to teammates
• Sparsity → relatively small number of messages (a sampling sketch follows below)
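The slides describe PrCL and ValCL as sample-based estimates in the spirit of [Kearns 2002]; here is a minimal Monte Carlo sketch of that idea. The simulate_run and visits_locale interfaces are hypothetical stand-ins for the agent's local policy evaluator.

```python
def estimate_cl(simulate_run, visits_locale, num_runs=100):
    """Estimate an interaction's probability and value impact by sampling.

    simulate_run(force_interaction): executes the local policy once and
        returns (trajectory, total_reward); when force_interaction is True,
        the interaction's effects (e.g., a collision) are applied.
    visits_locale(trajectory): True if the run entered the CL's
        state-action region (e.g., the narrow corridor).
    """
    hits = 0
    total_with, total_without = 0.0, 0.0
    for _ in range(num_runs):
        traj, reward = simulate_run(force_interaction=False)
        if visits_locale(traj):
            hits += 1
        total_without += reward
        _, reward_forced = simulate_run(force_interaction=True)
        total_with += reward_forced
    pr_cl = hits / num_runs                           # e.g., 95/100 -> 0.95
    val_cl = (total_with - total_without) / num_runs  # e.g., -7 for a collision
    return pr_cl, val_cl

# Toy demo: a fixed "policy" that always enters the corridor, where a
# forced collision costs 7 reward.
demo = lambda force_interaction: (["corridor"], -7.0 if force_interaction else 0.0)
print(estimate_cl(demo, lambda traj: "corridor" in traj))
# (1.0, -7.0)
```

Each CL is tested independently, as the slide notes, and only the resulting (PrCL, ValCL) pairs are sent to teammates, which keeps messages small.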
D-TREMOR: Model Shaping
• Shape local model rewards and transitions based on remote interactions, mixing the interaction and independent model functions by the probability of interaction, e.g. for rewards:
R′(s, a) = PrCL · R_interaction(s, a) + (1 − PrCL) · R_independent(s, a)
• The transition function is shaped analogously (a sketch follows below)
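A minimal sketch of that mixture, assuming tabular reward models keyed by (state, action); the dict-based representation is an assumption for illustration, not D-TREMOR's actual data structure.

```python
def shape_rewards(r_independent, r_interaction, pr_cl):
    """Blend independent and interaction reward models, weighting the
    interaction model by the teammate-reported probability PrCL.

    r_independent: dict, (state, action) -> reward without the interaction
    r_interaction: dict, (state, action) -> reward if the interaction occurs
    pr_cl: estimated probability that the interaction occurs (from a CL message)
    """
    shaped = {}
    for sa, r_ind in r_independent.items():
        # State-action pairs outside the coordination locale keep their
        # independent reward; pairs inside it are blended.
        r_int = r_interaction.get(sa, r_ind)
        shaped[sa] = pr_cl * r_int + (1.0 - pr_cl) * r_ind
    return shaped

# Example: a corridor cell where a collision (reward -5.0) occurs with
# probability 0.95 according to a teammate's CL message.
print(shape_rewards({("corridor", "move"): -0.5},
                    {("corridor", "move"): -5.0}, 0.95))
# {('corridor', 'move'): -4.775}
```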
D-TREMOR: Local Planning (again)
• Re-solve the shaped local models to get new policies
• Result: new locally-optimal policies → new interactions

D-TREMOR: Adv. Model Shaping
• In practice, we run into three common issues faced by concurrent optimization algorithms:
– Slow convergence
– Oscillation
– Local optima
• We can alter our model shaping to mitigate these by reasoning about the types of interactions we have

D-TREMOR: Adv. Model Shaping
• Slow convergence → prioritization
– The majority of interactions are collisions
– Assign priorities to agents; only model-shape collision interactions for higher-priority agents
– From DPP: prioritization can quickly resolve collision interactions
– Similar properties hold for any purely negative interaction
• Negative interaction: every agent is guaranteed a lower-valued local policy if the interaction occurs

D-TREMOR: Adv. Model Shaping
• Oscillation → probabilistic shaping
– Often caused by time dynamics between agents:
• Agent 1 shapes based on Agent 2's old policy
• Agent 2 shapes based on Agent 1's old policy
– Each agent only applies model shaping with probability δ [Zhang 2005]
– Breaks cycles between agent policies (see the sketch below)
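A minimal sketch of this randomized backoff; the helper name maybe_shape and the value of δ are assumptions for illustration (the slides do not state the constant used).

```python
import random

def maybe_shape(model, shape_fn, delta=0.5, rng=random):
    """Apply model shaping only with probability delta [Zhang 2005].

    If two agents deterministically re-shape against each other's previous
    policies, they can cycle forever; skipping shaping with probability
    1 - delta lets one side's policy settle so the other can adapt to it.
    """
    if rng.random() < delta:
        return shape_fn(model)
    return model  # skip shaping this iteration; keep the current model
```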
D-TREMOR: Adv. Model Shaping
• Local optima → optimistic initialization
– Agents cannot detect mixed interactions (e.g., debris):
• Rescue agent policies can only improve if debris is cleared
• Cleaner agent policies can only worsen if they clear debris
– With PrCL and ValCL both low, each agent's locally optimal policy is to do nothing
(Figure: the rescue agent reasons "I'm not going near the debris", while the cleaner agent reasons "If no one is going through, I won't clear the debris")
– Fix: let each agent solve an initial model that uses an optimistic assumption about the interaction condition

Experimental Setup
• D-TREMOR policies:
– Max-joint-value
– Last iteration
• Comparison policies:
– Independent
– Optimistic
– Do-nothing
– Random
• Scaling dataset: 10 to 100 agents, random maps
• Density dataset: 100 agents, concentric ring maps
• 3 problems per condition, 20 planning iterations, 7-time-step horizon, 1 CPU per agent

Experimental Datasets
(Figure: example maps from the Scaling dataset and the Density dataset)

Experimental Results: Scaling
(Figure: performance of D-TREMOR policies vs. naïve policies as team size grows)

Experimental Results: Density
• Do-nothing does the best? Ignoring interactions = poor performance
• D-TREMOR rescues the most victims
• D-TREMOR does not resolve every collision

Experimental Results: Time
(Figure: planning time and number of CLs active per iteration)
• Why is planning time increasing? The increase in time is related to the number of CLs, not the number of agents

Conclusions
• D-TREMOR: decentralized planning for sparse Dec-POMDPs with many agents
• Demonstrated complete distributability, fast heuristic interaction detection, and local message exchange to achieve high scalability
• Empirical results in a simulated search-and-rescue domain
• D-TREMOR produces reasonable policies for 100-agent planning problems in under 6 hrs. (with some caveats)
– Partially-observable, uncertain world
– Multiple types of interactions and agents
• Improves over independent planning
• Resolved interactions in large problems
• Still some convergence/efficiency issues

Future Work
• Generalized framework for distributed planning under uncertainty through iterative message exchange
• Optimality/convergence bounds
• Reduce necessary communication
• Better search over task allocations
• Scaling to larger team sizes

Backup Slides

Motivation
• Scaling planning to large teams is hard:
– Need to plan (with uncertainty) for each agent in the team
– Agents must consider the actions of a growing number of teammates
– The full joint problem has NEXP complexity [Bernstein 2002]
• Optimality is going to be infeasible
• Find and exploit structure in the problem
• Make good plans in a reasonable amount of time

Related Work: Structured Dec-(PO)MDP planners
(Figure: approaches plotted on scalability vs. generality axes, here and on the following Related Work slides)
– JESP [Nair 2003]
– TD-Dec-POMDP [Witwicki 2010]
– EDI-CR [Mostafa 2009]
– SPIDER [Marecki 2009]
• Restrict generality slightly to get scalability
• High optimality

Related Work: Heuristic Dec-(PO)MDP planners
– TREMOR [Varakantham 2009]
– OC-Dec-MDP [Beynier 2005]
• Sacrifice optimality for scalability
• High generality

Related Work: Structured multiagent path planners
– DPC [Bhattacharya 2010]
– Optimal Decoupling [Van den Berg 2009]
• Sacrifice generality further to get scalability
• High optimality

Related Work: Heuristic multiagent path planners
– Dynamic Networks [Clark 2003]
– Prioritized Planning [Van den Berg 2005]
• Sacrifice optimality to get scalability

Related Work: Our approach
• Fix high scalability and generality
• Explore what level of optimality is possible

A Simple Rescue Domain
(Figure: grid map legend showing an unsafe cell, clearable debris, a rescue agent, a narrow corridor, a victim, and a cleaner agent)

A Simple (Large) Rescue Domain
(Figure: a large instance of the rescue domain)

Distributed POMDP with Coordination Locales (DPCL) [Varakantham, et al 2009]
• Often, interactions between agents are sparse
– A narrow corridor only fits one agent
– Debris is passable only if cleaned

Distributed, Iterative Planning
• Reduce the full joint problem into a set of smaller, independent sub-problems
• Inspiration: TREMOR [Varakantham 2009], JESP [Nair 2003]
• Solve independent sub-problems with a local algorithm
• Modify sub-problems to push locally optimal solutions towards a high-quality joint solution

Distributed Team REshaping of MOdels for Rapid execution (D-TREMOR)
• Task Allocation: reduce the full joint problem into a set of smaller, independent sub-problems (one for each agent)
• Local Planning: solve independent sub-problems with existing state-of-the-art algorithms
• Interaction Exchange + Model Shaping: modify sub-problems so that locally optimal solutions approach a high-quality joint solution

DPCL vs. other models
• EDI/EDI-CR: adds complex transition functions
• TD-Dec-MDP: allows simultaneous interaction (within an epoch)
• Factored MDP/POMDP: adds interactions that span epochs

D-TREMOR: Reward functions
– P_Debris = 0.9: probability that debris will not allow a robot to enter the cell
– P_ActionFailure = 0.2: probability of action failure
– P_ReboundAfterCollision = 0.5: probability that a robot will return to the same cell after a collision
– P_ObsSuccessOnFailure = 0.2: probability that success is observed if the action failed
– P_ObsSuccessOnSuccess = 0.8: probability that success is observed if the action succeeded
– R_Collision = -5.0: reward for a collision
– R_Observe = -0.25: reward for observing
– R_Move = -0.5: reward for moving
– R_Cleaning = 0.25: reward for cleaning debris
– R_Victim = 10.0: reward for saving a victim
– R_Unsafe = -1: reward for landing in an unsafe cell

Review: POMDP
• S: set of states
• A: set of actions
• Ω: set of observations
• T(s′ | s, a): transition function
• R(s, a): reward function
• O(ω | s′, a): observation function

Distributed POMDP with Coordination Locales [Varakantham, et al 2009]
• Extension of the Dec-POMDP model that modifies the joint transition and reward functions
• Coordination locales (CLs) represent interactions and implicitly construct the interaction functions:
CL = ⟨explicit time constraint, region of joint state-action space⟩

Proposed Approach: DIMS (Distributed Iterative Model Shaping)
• Task Allocation:
– Assign tasks to agents, reducing the search space each agent considers
– Define a local sub-problem for each robot: full SI-Dec-POMDP → local (independent) POMDP
– Any decentralized allocation mechanism works (e.g., auctions)
• Local Planning:
– Solve local sub-problems using an off-the-shelf centralized solver (a stock graph, MDP, or POMDP solver)
– Result: locally-optimal policy
• Interaction Exchange:
– Given the local policy, estimate the local probability and value of interactions
– Communicate the probability and value of relevant interactions to team members
– Sparsity → relatively small number of messages; lightweight local evaluation and low-bandwidth messaging
• Model Shaping:
– Modify local sub-problems to account for the presence of interactions (methods to alter the local problem to incorporate non-local effects)
• Reallocate tasks or re-plan using the modified local sub-problems, and iterate (a sketch of the full loop follows below)
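A minimal end-to-end sketch of the DIMS loop described above. Every helper here is a hypothetical placeholder standing in for the named component (the decentralized auction, the EVA solver, CL estimation, and shaping); none of them are real D-TREMOR APIs.

```python
import random

# --- Hypothetical stand-ins for the components named on the slides ---
def allocate(agents, tasks):
    """Stand-in for the decentralized auction (Task Allocation)."""
    return dict(zip(agents, tasks))

def solve_local_pomdp(model):
    """Stand-in for the off-the-shelf solver (Local Planning, e.g. EVA)."""
    return {"policy_for": model["task"], "shaped_by": list(model["cl_msgs"])}

def estimate_cls(agent, policy):
    """Stand-in for sampling PrCL/ValCL (Interaction Exchange)."""
    return [(agent, 0.0, 0.0)]  # (sender, PrCL, ValCL) placeholders

def shape(model, incoming):
    """Stand-in for reward/transition shaping (Model Shaping)."""
    model["cl_msgs"] = incoming
    return model

def dims(agents, tasks, iterations=20, delta=0.5):
    """Sketch of the allocate -> plan -> exchange -> shape loop."""
    assignment = allocate(agents, tasks)
    models = {a: {"task": t, "cl_msgs": []} for a, t in assignment.items()}
    policies = {}
    for _ in range(iterations):
        # Each agent plans against its own (possibly shaped) local model.
        policies = {a: solve_local_pomdp(m) for a, m in models.items()}
        # Agents exchange CL messages describing likely interactions.
        msgs = {a: estimate_cls(a, p) for a, p in policies.items()}
        for a in models:
            incoming = [m for b, ms in msgs.items() if b != a for m in ms]
            # Probabilistic shaping: apply with probability delta only.
            if random.random() < delta:
                models[a] = shape(models[a], incoming)
    return policies

print(dims(["r1", "c1"], ["victim", "debris"], iterations=2))
```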
Example: Interactions
(Figure: a rescue robot, a cleaner robot, debris, and a victim on a grid map; the rescue robot can reach the victim only if the cleaner robot clears the debris)

Example: Sparsity
(Figure: of the many potential interactions between agents on the map, only a few are active in any given solution)