Tractable Planning in Large Teams

Emerging team applications require the cooperation of hundreds of members (humans, robots, agents). Team members must complete complex, collaborative tasks in dynamic and uncertain environments. How can we plan effectively and tractably in these domains?

Distributed POMDPs with Coordination Locales (DPCLs)

This work uses the DPCL problem model [2]. Like Dec-POMDPs, DPCLs represent problems as sets of states, actions, and observations with joint transition, reward, and observation functions. DPCLs differ in that they factor the state space into global and per-agent local components, and they limit interactions among agents to coordination locales. A DPCL consists of:
• S: set of states
• A: set of actions
• Ω: set of observations
• P: joint transition function
• R: joint reward function
• O: joint observation function
• b0: initial belief state
• T: time horizon
While agents are not interacting, independent per-agent transition, reward, and observation functions are used; while agents are interacting, joint CL functions are used. A coordination locale CL = ⟨relevant region of the joint state-action space, nature of the time constraint (e.g., affects only the same time step vs. any future time step)⟩ defines a region of state-action space where the joint transition and reward functions are needed.

D-TREMOR: Distributed Team REshaping of Models for Rapid-execution

We extend the TREMOR [2] algorithm for solving DPCLs to produce D-TREMOR, a fully distributed solver that scales to problems with hundreds of agents. It approximates a DPCL as a set of single-agent POMDPs that are solved in parallel, then iteratively reshaped using messages that describe CL interactions between agent policies.

Scaling up from TREMOR [2] to D-TREMOR:
• Task allocation: TREMOR allocates roles with a Branch & Bound MDP; D-TREMOR uses a decentralized auction.
• Local planning: TREMOR computes policy solutions with EVA [3]; D-TREMOR runs independent EVA [3] solvers in parallel.
• Interaction detection: TREMOR evaluates the joint policy; D-TREMOR uses sampling and message passing.
• Coordination: TREMOR reward-shapes the local models; D-TREMOR reward-shapes the local models with convergence heuristics.

Evaluation in a Heterogeneous Rescue Robot Domain

Consider a team of robots planning to search a disaster area. Rescue agents can assist victims, while cleaner agents can clear otherwise untraversable debris. Robot observations and movements are subject to uncertainty. Agents can collide in narrow corridors (only one agent fits at a time) and with clearable debris (which blocks rescue agents but can be cleared by cleaner agents). Objective function: get rescue agents to as many victims as possible within a fixed time horizon while minimizing collisions and visits to unsafe cells.

[Example map of the rescue domain; legend: rescue agent, cleaner agent, victim, clearable debris, narrow corridor, unsafe cell.]

We evaluate D-TREMOR's performance on a number of these planning problems, with teams of up to 100 agents:
• D-TREMOR policies: max-joint-value and last-iteration
• Comparison policies: independent, optimistic, do-nothing, random
• Scaling dataset: 10 to 100 agents, random maps
• Density dataset: 100 agents, concentric-ring maps
• 3 problems per condition, 20 planning iterations, 7-time-step horizon, 1 CPU per agent

Distributed Interaction Detection using Sampling and Message Exchange

Each agent evaluates its local policy by sampling [1] and tests interactions independently:
• Finding the probability of a CL [1]: compute the frequency of the associated (si, ai); e.g., a policy that entered a corridor in 95 of 100 runs gives PrCL = 0.95.
• Finding the value of a CL [1]: sample the local policy value with and without the interaction, and compute the change in value if the interaction occurred; e.g., a collision gives ValCL = -7.
• Send CL messages to teammates. Because interactions are sparse, relatively few messages are needed.
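As a concrete illustration of this detection step, the sketch below estimates PrCL and ValCL for one locale via Monte Carlo rollouts of a toy local policy. It is a minimal sketch, not the authors' implementation: CLMessage, sample_rollout, estimate_cl, and the toy dynamics are all hypothetical stand-ins.

```python
import random
from dataclasses import dataclass

@dataclass
class CLMessage:
    """Hypothetical message describing one coordination locale (CL)."""
    cl_id: str
    prob: float    # Pr_CL: how often the local policy reaches the locale
    value: float   # Val_CL: change in local value if the interaction occurs

def sample_rollout(policy, horizon=7, n_states=5):
    """Toy stand-in for executing a local POMDP policy; returns (state, action) pairs."""
    state, traj = 0, []
    for _ in range(horizon):
        action = policy(state)
        traj.append((state, action))
        state = (state + action + random.choice([0, 1])) % n_states  # noisy movement
    return traj

def estimate_cl(cl_id, cl_region, penalty, policy, n_samples=100):
    """Estimate Pr_CL and Val_CL for one locale by sampling local rollouts [1]."""
    hits, v_with, v_without = 0, 0.0, 0.0
    for _ in range(n_samples):
        traj = sample_rollout(policy)
        base = float(len(traj))                       # toy per-step reward of +1
        entered = any(sa in cl_region for sa in traj)
        hits += entered
        v_without += base                             # value ignoring the interaction
        v_with += base + (penalty if entered else 0)  # value if the interaction occurs
    pr_cl = hits / n_samples                   # e.g., 95 of 100 runs -> Pr_CL = 0.95
    val_cl = (v_with - v_without) / n_samples  # e.g., a -7 collision penalty -> negative Val_CL
    return CLMessage(cl_id, pr_cl, val_cl)

if __name__ == "__main__":
    random.seed(0)
    corridor = {(s, 1) for s in range(5)}      # toy locale: taking action 1 anywhere
    policy = lambda s: 1 if s % 2 == 0 else 0  # toy deterministic local policy
    print(estimate_cl("corridor-3", corridor, penalty=-7.0, policy=policy))
```

Because each locale is tested against the agent's own policy alone, detection parallelizes across the team, and each agent broadcasts messages only for locales it actually reaches.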
Model Shaping

• Shape the rewards and transitions of each local model based on the remote interactions reported in teammates' CL messages: the shaped functions blend the independent model functions with the interaction model functions, weighted by the probability of interaction.
• Re-solve the shaped local models to get new policies.
• Result: the new locally optimal policies can create new interactions, so detection and shaping iterate.

In practice, we run into three common issues faced by concurrent optimization algorithms. We alter our model shaping to mitigate them by reasoning about the types of interactions we have (see the sketch below):
– Slow convergence: prioritization
– Oscillation: probabilistic shaping
– Local optima: optimistic policy initialization
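A minimal sketch of this shaping step is below, reusing the hypothetical CLMessage fields (prob, value) from the detection sketch; shape_rewards, cl_regions, and min_prob are likewise illustrative names, not the paper's API.

```python
def shape_rewards(reward, cl_regions, messages, min_prob=0.05):
    """Reshape one agent's local reward table from teammates' CL messages.

    reward:     dict mapping (state, action) -> independent local reward
    cl_regions: dict mapping cl_id -> set of (state, action) pairs in that locale
    messages:   CLMessage objects received from teammates
    """
    shaped = dict(reward)
    # Prioritization heuristic: shape high-impact locales first (counters slow
    # convergence) and skip locales the teammate rarely reaches.
    for msg in sorted(messages, key=lambda m: abs(m.prob * m.value), reverse=True):
        if msg.prob < min_prob:
            continue
        for sa in cl_regions[msg.cl_id]:
            r_indep = shaped[sa]              # independent model function
            r_joint = r_indep + msg.value     # interaction (joint CL) model function
            # Probabilistic shaping: blend the two functions, weighted by the
            # reported interaction probability, which damps oscillation.
            shaped[sa] = (1.0 - msg.prob) * r_indep + msg.prob * r_joint
    return shaped
```

After shaping, each agent re-solves its local POMDP (EVA [3] in our setting) to obtain a new policy; optimistic policy initialization acts at the start of this loop, biasing agents away from poor local optima.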
Results of Scaling Dataset

The number of agents and map size are varied while the density of debris, corridors, and unsafe cells is held constant. D-TREMOR rescues many more victims than the naïve policies (optimistic, do-nothing, random); ignoring interactions, as the independent and optimistic policies do, leads to poor performance. Increases in planning time are related to the number of CLs, not the number of agents.

Results of Density Dataset

Concentric rings of narrow corridors are added from the outside in on a map where the victims are at the center. D-TREMOR resolves many, but not all, collisions, while the independent and optimistic policies do not; the high risk of collision penalties makes do-nothing seem competitive.

Conclusions and Future Work

We introduce D-TREMOR, an approach that scales distributed planning under uncertainty to hundreds of agents using information exchange and model shaping. Results suggest competitive performance while improving scalability and reducing computational cost. We are working to further improve performance through better modeling of interaction dynamics and intelligent information dissemination between agents.

Acknowledgements

This research has been funded in part by AFOSR MURI grant FA9550-08-1-0356. This material is based upon work supported under a National Science Foundation Graduate Research Fellowship.

References

[1] M. Kearns, Y. Mansour, and A. Y. Ng. A sparse sampling algorithm for near-optimal planning in large Markov decision processes. Machine Learning, 2002.
[2] P. Varakantham, J. Kwak, M. Taylor, J. Marecki, P. Scerri, and M. Tambe. Exploiting coordination locales in distributed POMDPs via social model shaping. Proc. of ICAPS, 2009.
[3] P. Varakantham, R. T. Maheswaran, T. Gupta, and M. Tambe. Towards efficient computation of error bounded solutions in POMDPs: Expected value approximation and dynamic disjunctive beliefs. Proc. of IJCAI, 2007.