
Tractable Planning in Large Teams
Distributed Path Planning
Emerging team applications require the cooperation of
1000s of members (humans, robots, agents). Team
members must complete complex, collaborative tasks
in dynamic and uncertain environments. How can we
effectively and tractably plan in these domains?
Consider the problem of a team of agents given start and goal locations and asked to find a set of collision-free paths
through time and space. One approach that has been shown effective for reasonably large teams is prioritized planning[1].
We present a distributed extension to this algorithm that may improve scalability in certain cases.
Prioritized Planning[1]
In prioritized planning, agents are assigned some order, often using a heuristic such as distance to goal. Then, starting with the highest-priority (furthest from goal) agent, plans are generated in sequence, and each agent's path is used as a dynamic obstacle for all subsequent agents.
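A minimal sketch of this sequential loop, assuming a single-agent space-time planner plan_path and a distance heuristic (all names are illustrative, not the planner of [1]):

# Sketch of sequential prioritized planning: agents are ordered by distance to
# goal, then planned one at a time, treating every previously planned path as a
# dynamic obstacle for the agents that follow.
# plan_path(start, goal, obstacles) is an assumed single-agent space-time
# planner (e.g., A* over (x, y, t)); agents carry illustrative .id/.start/.goal.
def prioritized_plan(agents, plan_path, distance):
    ordered = sorted(agents, key=lambda a: distance(a.start, a.goal), reverse=True)
    dynamic_obstacles = []          # space-time paths of already-planned agents
    paths = {}
    for agent in ordered:           # highest priority (furthest from goal) first
        path = plan_path(agent.start, agent.goal, dynamic_obstacles)
        paths[agent.id] = path
        dynamic_obstacles.append(path)   # subsequent agents must avoid this path
    return paths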
Example Map: Planning for 240 agents
Goal: Get every agent on the team from a start location to a goal location, with no collisions between agents or with map obstacles. Agents can interact at any state, at any time step, and such an interaction (a collision) is highly undesirable for the team.
Distributed Prioritized Planning (DPP)
In DPP, agents plan simultaneously, then exchange paths. If an agent receives a conflicting path from an agent of higher priority, it adds that path as a dynamic obstacle and replans.
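A minimal sketch of one agent's DPP loop under this description; plan_path, broadcast, receive_paths, and paths_conflict are hypothetical helpers, and larger numbers are taken to mean higher priority:

# Sketch of distributed prioritized planning (DPP) from one agent's point of view.
# All helpers are illustrative assumptions: plan_path() is a single-agent
# space-time planner, broadcast()/receive_paths() exchange paths with teammates,
# and paths_conflict() tests two space-time paths for collision.
def dpp_agent_loop(me, plan_path, broadcast, receive_paths, paths_conflict, max_iters=50):
    dynamic_obstacles = {}           # agent id -> higher-priority path to avoid
    path = plan_path(me.start, me.goal, list(dynamic_obstacles.values()))
    for _ in range(max_iters):
        broadcast(me.id, me.priority, path)
        changed = False
        for other_id, other_priority, other_path in receive_paths():
            # Defer only to higher-priority agents whose paths conflict with ours.
            if other_priority > me.priority and paths_conflict(path, other_path):
                dynamic_obstacles[other_id] = other_path
                changed = True
        if not changed:
            break                    # no new conflicts: the current path is kept
        path = plan_path(me.start, me.goal, list(dynamic_obstacles.values()))
    return path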
Comparing centralized and distributed performance
While both planners attain almost identical solutions, the centralized prioritized planner is faster, even though the distributed planner uses far fewer iterations. This is because the distributed planner must sometimes replan difficult paths and does not use an incremental planner.
Iterative Distributed Planning
One strategy that can be applied to these problems is iterative, independent planning coupled with social model shaping. While the specifics vary by domain, the general process can be broken into a few basic steps (a sketch of the overall loop follows the list):
① Factor the problem and enumerate the set of interactions in the problem state. Create a set of functions that will enable agents to plan independently, except when they are involved in an interaction.
② Compute independent plans and find potential interactions between agents. Each agent computes an independent plan using its local knowledge of the problem. Using this plan, it can search over all possible interactions to find the set of interactions that might involve it.
③ Exchange interactions. Once an agent has some idea of which interactions it could be involved in, it communicates information about those interactions, and how it expects to be affected, to its teammates.
④ Use exchanged information to improve the local model when replanning. Having exchanged information, agents have a better idea of which interactions could occur and how likely they are. They use this information to improve their local models and return to step ② to plan again.
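The loop sketched below ties the four steps together for a single agent; every helper function is a hypothetical placeholder for the domain-specific component it names:

# Sketch of the iterative, independent planning loop (steps 1-4) for one agent.
# The helpers are placeholders for domain-specific pieces:
#   factor_interactions(problem)        -> enumerate possible interactions (step 1)
#   plan_independently(model)           -> plan from the agent's local model (step 2)
#   find_interactions(plan, candidates) -> interactions this plan may trigger (step 2)
#   exchange(info)                      -> send/receive interaction info (step 3)
#   shape_model(model, info)            -> reshape the local model, e.g. reward shaping (step 4)
def iterative_shaping_loop(model, problem, factor_interactions, plan_independently,
                           find_interactions, exchange, shape_model, num_rounds=10):
    candidates = factor_interactions(problem)            # step 1
    plan = None
    for _ in range(num_rounds):
        plan = plan_independently(model)                  # step 2
        mine = find_interactions(plan, candidates)        # step 2
        teammate_info = exchange(mine)                    # step 3
        model = shape_model(model, teammate_info)         # step 4, then replan
    return plan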
We present Large-scale Teams REshaping of MOdels for Rapid-execution (L-TREMOR), a distributed version of the
TREMOR[2] algorithm for solving distributed POMDPs with coordination locales (DPCLs). In DPCL problems, agents are
typically able to act independently, except in certain sets of states known as coordination locales.
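As a concrete (but assumed) illustration of this structure, a coordination locale might be represented as a small record of the states, agents, and expected effect of an interaction; the field names below are ours, not the DPCL formalism of [2]:

# One possible representation of a coordination locale: a small set of states in
# which a group of agents can affect each other (e.g., two agents meeting in a
# narrow corridor). Outside its coordination locales, an agent plans independently.
# Field names are illustrative assumptions, not the formalism of [2].
from dataclasses import dataclass

@dataclass(frozen=True)
class CoordinationLocale:
    states: frozenset        # joint states where the interaction can occur
    agents: frozenset        # agents that may participate in the interaction
    expected_effect: float   # estimated change in joint reward if it occurs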
Example Map: Rescue Domain (shown at N = 6, N = 10, and N = 100; the N = 100 map is structurally similar to N = 10). Map elements: rescue agents, cleaner agents, victims, clearable debris, narrow corridors, and unsafe cells.
Goal: Get rescue agents to as many victims as possible within a fixed time horizon.
Agents can interact through narrow corridors (only one agent can fit at a time) and
clearable debris (blocks rescue agents, but can be cleared by cleaner agents).
Scaling up from TREMOR[2] to L-TREMOR
Architecture comparison, TREMOR vs. L-TREMOR: role allocation (centralized Branch & Bound MDP vs. decentralized auction), policy solution (independent EVA[3] solvers), interaction detection (joint policy evaluation vs. sampling & message passing), and coordination (reward shaping of independent models via messages).
Preliminary Results
Empirically computed joint reward is shown for L-TREMOR and an independent solution on three different maps. The results show that improvement occurs, but it is sporadic and unstable. Comparing the reward expected by agents to the actual joint reward reveals a negative linear relationship between improvement over the independent solution and the error in estimating reward. When cumulative planning time (over all agents) is normalized by team size, it is evident that L-TREMOR scales nearly linearly to large team sizes.
Acknowledgements
This research has been funded in part by the AFOSR MURI
grant FA9550-08-1-0356. This material is based upon work
supported under a National Science Foundation Graduate
Research Fellowship.
Conclusions and Future Work
In this work, we investigate two related approaches to scale distributed planning into the hundreds of agents using
information exchange and reward-shaping. Preliminary work suggests that these techniques may provide competitive
performance while improving scalability and reducing computational cost. We are working to further improve performance of
these systems through better modeling of the dynamics of the systems and more intelligent dissemination of information
over the network.
References
[1] J. van den Berg and M. Overmars, "Prioritized Motion Planning for Multiple Robots," Proc. of IEEE/RSJ IROS, 2005.
[2] P. Varakantham, J. Kwak, M. Taylor, J. Marecki, P. Scerri, and M. Tambe, "Exploiting Coordination Locales in Distributed POMDPs via Social Model Shaping," Proc. of ICAPS, 2009.
[3] P. Varakantham, R. T. Maheswaran, T. Gupta, and M. Tambe, "Towards Efficient Computation of Error Bounded Solutions in POMDPs: Expected Value Approximation and Dynamic Disjunctive Beliefs," Proc. of IJCAI, 2007.