Tractable Planning in Large Teams

Emerging team applications require the cooperation of thousands of members (humans, robots, and software agents). Team members must complete complex, collaborative tasks in dynamic and uncertain environments. How can we plan effectively and tractably in these domains? We investigate two related approaches: Distributed Prioritized Planning (DPP) for multi-agent path planning, and L-TREMOR for distributed POMDPs.

Distributed Path Planning (DPP)

Consider a team of agents given start and goal locations and asked to find a set of collision-free paths through time and space. The goal is to get every agent from its start location to its goal location with no collisions between agents or with map obstacles. Agents can interact at any state and at any time step, and an interaction (a collision) is highly undesirable for the team.

Prioritized Planning [1]

One approach that has been shown effective for reasonably large teams is prioritized planning [1]. Agents are assigned an order, often using a heuristic such as distance to goal. Then, starting with the highest-priority (farthest) agent, plans are generated in sequence, and each agent's finished path is used as a dynamic obstacle by all subsequent agents. (A minimal sketch appears below.)
[Figure: example map, planning for 240 agents]

Distributed Prioritized Planning (DPP)

We present a distributed extension of this algorithm that may improve scalability in certain cases. In DPP, agents plan simultaneously and then exchange paths. If an agent receives a conflicting path of higher priority, it adds that path as a dynamic obstacle and replans. (A sketch of the round structure follows the prioritized-planning sketch below.)

Comparing centralized and distributed performance

[Figure: centralized vs. iterative distributed planning for four agents, A-D]
The two planners attain almost identical solutions, but the centralized prioritized planner is faster, even though the distributed planner uses far fewer iterations. This is because the distributed planner must sometimes replan difficult paths and does not use an incremental planner.
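The sketch below illustrates prioritized planning as described above, assuming a 4-connected grid, a space-time breadth-first search as the single-agent planner, and farthest-from-goal-first priorities. The grid encoding, agent dictionaries, and helper names are illustrative assumptions, not the implementation evaluated here.

from collections import deque

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def plan_path(start, goal, grid, reserved, max_t=100):
    # Space-time BFS. 'reserved' holds (cell, t) pairs claimed by
    # higher-priority agents; waiting in place is a legal move.
    rows, cols = len(grid), len(grid[0])
    frontier = deque([(start, 0, [start])])
    seen = {(start, 0)}
    while frontier:
        (r, c), t, path = frontier.popleft()
        if (r, c) == goal:
            return path
        if t >= max_t:
            continue
        for dr, dc in ((0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if grid[nr][nc] == '#':              # static map obstacle
                continue
            if ((nr, nc), t + 1) in reserved:    # dynamic obstacle
                continue
            if ((nr, nc), t + 1) not in seen:
                seen.add(((nr, nc), t + 1))
                frontier.append(((nr, nc), t + 1, path + [(nr, nc)]))
    return None  # no collision-free path within the horizon

def prioritized_plan(agents, grid):
    # Plan in priority order (farthest from goal first); each finished
    # path becomes a dynamic obstacle for all later agents. For brevity,
    # only the traversal is reserved; a full implementation would also
    # reserve the goal cell after arrival.
    order = sorted(agents, key=lambda a: -manhattan(a['start'], a['goal']))
    reserved, paths = set(), {}
    for agent in order:
        path = plan_path(agent['start'], agent['goal'], grid, reserved)
        paths[agent['id']] = path
        if path is not None:
            for t, cell in enumerate(path):
                reserved.add((cell, t))
    return paths

For example, prioritized_plan([{'id': 'A', 'start': (0, 0), 'goal': (2, 2)}, {'id': 'B', 'start': (2, 0), 'goal': (0, 2)}], ['...', '...', '...']) returns one timed path per agent on an empty 3x3 grid.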
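The next sketch shows the DPP round structure, reusing plan_path and manhattan from above. Synchronous rounds, a shared in-memory view of exchanged paths, and vertex-only conflict checks are simplifying assumptions; a real deployment would pass paths as network messages, also check swap conflicts, and tie-break equal priorities.

def conflicts(p, q):
    # Two timed paths conflict if they occupy the same cell at the same step.
    return any(c1 == c2 for c1, c2 in zip(p, q))

def dpp_plan(agents, grid, max_rounds=50):
    # Fixed priorities, e.g. farthest from goal first (larger = higher).
    priority = {a['id']: manhattan(a['start'], a['goal']) for a in agents}
    obstacles = {a['id']: set() for a in agents}  # per-agent dynamic obstacles
    paths = {}
    for _ in range(max_rounds):
        # 1. All agents plan simultaneously against their local obstacle sets.
        for a in agents:
            paths[a['id']] = plan_path(a['start'], a['goal'], grid,
                                       obstacles[a['id']])
        # 2. Agents exchange paths and look for conflicts.
        dirty = False
        for a in agents:
            for b in agents:
                if priority[b['id']] <= priority[a['id']]:
                    continue  # only yield to strictly higher priorities
                pa, pb = paths[a['id']], paths[b['id']]
                if pa and pb and conflicts(pa, pb):
                    # 3. Add the higher-priority path as a dynamic obstacle;
                    #    the lower-priority agent replans next round.
                    for t, cell in enumerate(pb):
                        obstacles[a['id']].add((cell, t))
                    dirty = True
        if not dirty:
            break  # conflict-free joint plan
    return paths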
Iterative, independent planning with social model shaping

One strategy that can be applied to both problems is iterative, independent planning coupled with social model shaping. While the specifics vary by domain, the general process can be broken into four basic steps (a skeleton of the loop appears after the L-TREMOR comparison below):

① Factor the problem and enumerate the set of possible interactions in the problem state. Create a set of functions that enable agents to plan independently, except when they are involved in an interaction.
② Compute independent plans and find potential interactions between agents. Each agent computes an independent plan using its local knowledge of the problem, then searches over all possible interactions to find the set that might involve it.
③ Exchange interactions. Once an agent knows which interactions it could be involved in, it communicates those interactions, and how it expects to be affected by them, to its teammates.
④ Use the exchanged information to improve the local model when replanning. Agents now have a better idea of which interactions could occur and how likely they are. They use this information to improve their local models and return to step ② to plan again.

L-TREMOR

We present Large-scale Teams REshaping of MOdels for Rapid-execution (L-TREMOR), a distributed version of the TREMOR [2] algorithm for solving distributed POMDPs with coordination locales (DPCLs). In DPCL problems, agents are typically able to act independently, except in certain sets of states known as coordination locales.

Example map: rescue domain

[Figure: rescue maps for N = 6, N = 10, and N = 100 (structurally similar to N = 10); legend: rescue agent, cleaner agent, victim, clearable debris, narrow corridor, unsafe cell]
Goal: get rescue agents to as many victims as possible within a fixed time horizon. Agents interact through narrow corridors (only one agent fits at a time) and clearable debris (which blocks rescue agents but can be cleared by cleaner agents).

Scaling up from TREMOR [2] to L-TREMOR

Component              TREMOR                                L-TREMOR
Role allocation        Branch & Bound MDP                    Decentralized auction
Policy solution        Independent EVA [3] solvers           Independent EVA [3] solvers
Interaction detection  Joint policy evaluation               Sampling & message passing
Coordination           Reward shaping of independent models  Reward shaping of independent models

[Figure: L-TREMOR messages]
(A sketch of locale-based reward shaping appears below.)
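The four steps above can be summarized as a planning loop. The skeleton below uses hypothetical method names (factor_problem, plan_independently, and so on) to show the control flow only; each domain supplies its own model, planner, and shaping rule.

def iterative_social_planning(agent, teammates, max_iters=10):
    # Step 1: build a local, factored model of the problem.
    model = agent.factor_problem()
    plan = None
    for _ in range(max_iters):
        # Step 2: plan independently, then search for interactions
        # that might involve this agent.
        plan = agent.plan_independently(model)
        mine = agent.find_potential_interactions(plan)
        # Step 3: tell teammates which interactions we may be part of
        # and how we expect to be affected; collect their messages.
        msgs = agent.exchange(mine, teammates)
        if not msgs:
            break  # no interactions left to resolve
        # Step 4: reshape the local model (e.g., reward shaping) and replan.
        model = agent.reshape_model(model, msgs)
    return plan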
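In L-TREMOR, step ④ takes the form of reward shaping of each agent's independent model. The sketch below shows one illustrative way to shape a reward function around a single coordination locale, assuming the agent has estimated, from teammates' messages, the probability that a teammate occupies the locale at each time step; the function names and penalty value are hypothetical.

def shape_reward(base_reward, locale_states, occupancy_prob,
                 collision_penalty=-10.0):
    # Returns a shaped reward R'(s, t) that discourages entering a
    # coordination locale (e.g., a narrow corridor) when a teammate is
    # likely to be there, so independent planners route around conflicts.
    def shaped(state, t):
        r = base_reward(state, t)
        if state in locale_states:
            # Expected penalty = P(teammate present at t) * collision cost.
            r += occupancy_prob.get(t, 0.0) * collision_penalty
        return r
    return shaped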
Preliminary Results

[Figure: empirically computed joint reward for L-TREMOR and an independent solution on three maps; cumulative planning time, normalized by team size]
Empirically computed joint reward for L-TREMOR and an independent solution on three different maps shows that improvement over the independent solution occurs, but it is sporadic and unstable. Comparing the reward agents expect to receive with the actual joint reward reveals a negative linear relationship between improvement over the independent solution and the error in estimating reward. When cumulative planning time (summed over all agents) is normalized by team size, it is evident that L-TREMOR scales close to linearly to large teams.

Conclusions and Future Work

In this work, we investigate two related approaches that scale distributed planning to hundreds of agents using information exchange and reward shaping. Preliminary work suggests that these techniques may provide competitive performance while improving scalability and reducing computational cost. We are working to further improve the performance of these systems through better modeling of their dynamics and more intelligent dissemination of information over the network.

Acknowledgements

This research has been funded in part by AFOSR MURI grant FA9550-08-1-0356. This material is based upon work supported under a National Science Foundation Graduate Research Fellowship.

References

[1] J. van den Berg and M. Overmars, "Prioritized Motion Planning for Multiple Robots," Proc. of IEEE/RSJ IROS, 2005.
[2] P. Varakantham, J. Kwak, M. Taylor, J. Marecki, P. Scerri, and M. Tambe, "Exploiting Coordination Locales in Distributed POMDPs via Social Model Shaping," Proc. of ICAPS, 2009.
[3] P. Varakantham, R. T. Maheswaran, T. Gupta, and M. Tambe, "Towards Efficient Computation of Error Bounded Solutions in POMDPs: Expected Value Approximation and Dynamic Disjunctive Beliefs," Proc. of IJCAI, 2007.