
Tractable Planning in Large Teams
Distributed POMDPs with Coordination Locales (DPCLs)
Emerging team applications require the cooperation of
100s of members (humans, robots, agents). Team
members must complete complex, collaborative tasks
in dynamic and uncertain environments. How can we
effectively and tractably plan in these domains?
This work uses the DPCL problem model[2]. DPCLs are similar to Dec-POMDPs in representing problems as sets of states,
actions, and observations with joint transition, reward, and observation functions. However, DPCLs differ in that they factor
the state space into global and per-agent local components, and interactions among agents are limited to coordination locales.
A DPCL is specified by:
• Set of states
• Set of actions
• Set of observations
• Joint transition function
• Joint reward function
• Joint observation function
• Initial belief state
• Set of coordination locales (CLs)

Coordination Locales define regions of state-action space where joint transition/reward functions are needed:
• Agents not interacting: use independent (per-agent) functions.
• Agents interacting: use joint CL functions.

CL = ⟨time constraint, relevant region of joint state-action space⟩
where the time constraint captures the nature of the interaction in time (e.g. affects only same-time, affects any future time).
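As a concrete, non-authoritative illustration, here is a minimal sketch of how these ingredients might be represented in code; the class and field names below are ours, not from [2]:

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

State = Tuple        # illustrative: (global component, agent-local component)
Action = str
Observation = str

@dataclass
class CoordinationLocale:
    """Region of joint state-action space where agents interact."""
    region: FrozenSet[Tuple[State, Action]]  # relevant (state, action) pairs
    same_time_only: bool                     # nature of the time constraint

@dataclass
class LocalModel:
    """Per-agent model used whenever the agent is outside every CL."""
    states: List[State]
    actions: List[Action]
    observations: List[Observation]
    transition: Callable[[State, Action, State], float]
    reward: Callable[[State, Action], float]
    observation_fn: Callable[[State, Action, Observation], float]
    initial_belief: Dict[State, float]

@dataclass
class DPCL:
    """Independent local models plus joint dynamics that apply only inside CLs."""
    local_models: List[LocalModel]                  # one per agent
    coordination_locales: List[CoordinationLocale]  # where joint functions replace local ones
```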
D-TREMOR: Distributed Team REshaping of Models for Rapid-execution

We extend the TREMOR[2] algorithm for solving DPCLs to produce D-TREMOR, a fully distributed solver that scales to problems with hundreds of agents. It approximates a DPCL as a set of single-agent POMDPs which are solved in parallel, then iteratively reshaped using messages that describe CL interactions between agent policies. The pipeline has four stages: Task Allocation, then repeated rounds of Local Planning, Interaction Exchange, and Model Shaping.
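Below is a rough, non-authoritative sketch of one agent's planning loop in this style. The callables passed in (local POMDP solver, CL detector, model shaper, message exchange) are hypothetical placeholders for the components named on this poster:

```python
def d_tremor_agent(local_model, solve_policy, detect_cls, shape_model,
                   exchange_messages, num_iterations=20):
    """One agent's view of the (sketched) D-TREMOR loop: plan on a local
    POMDP, detect likely CL interactions by sampling the policy, exchange
    CL messages with teammates, reshape the local model, and re-plan."""
    policy = solve_policy(local_model)              # e.g. an EVA-style solver [3]
    for _ in range(num_iterations):
        outgoing = detect_cls(local_model, policy)  # CL messages: (Pr_CL, Val_CL, region)
        incoming = exchange_messages(outgoing)      # send to / receive from teammates
        local_model = shape_model(local_model, incoming)
        policy = solve_policy(local_model)          # re-solve the shaped model
    return policy
```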
Scaling up from TREMOR[2] to D-TREMOR
• Role Allocation: Branch & Bound MDP (TREMOR) vs. decentralized auction (D-TREMOR)
• Policy Solution: independent EVA[3] solvers (both)
• Interaction Detection: joint policy evaluation (TREMOR) vs. sampling & message passing (D-TREMOR)
• Coordination: reward shaping of local models (TREMOR) vs. reward shaping of local models with convergence heuristics (D-TREMOR)

Evaluation in a Heterogeneous Rescue Robot Domain

Consider the problem of a team of robots planning to search a disaster area. Some robots can assist victims, while others can clear otherwise untraversable debris. Robot observations and movements are subject to uncertainty. We evaluate D-TREMOR's performance on a number of these planning problems, in teams of up to 100 agents.

Example Map: Rescue Domain (legend: Unsafe Cell, Clearable Debris, Narrow Corridor, Victim, Rescue Agent, Cleaner Agent)

Objective function: Get rescue agents to as many victims as possible within a fixed time horizon while minimizing collisions and visits to unsafe cells. Agents can collide in narrow corridors (only one agent can fit at a time) and with clearable debris (which blocks rescue agents, but can be cleared by cleaner agents). A reward sketch follows the setup list below.

• D-TREMOR policies:
  – Max-joint-value
  – Last iteration
• Comparison policies:
  – Independent
  – Optimistic
  – Do-nothing
  – Random
• Scaling dataset:
  – 10 to 100 agents
  – Random maps
• Density dataset:
  – 100 agents
  – Concentric ring maps
• 3 problems/condition
• 20 planning iterations
• 7 time step horizon
• 1 CPU per agent
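To make the objective function above concrete, here is a minimal per-step team reward sketch; the function name and all weights are hypothetical, chosen only for illustration:

```python
def team_step_reward(victims_reached, collisions, unsafe_cell_visits,
                     rescue_reward=10.0, collision_penalty=5.0, unsafe_penalty=1.0):
    """Illustrative per-step team reward for the rescue domain: reward rescue
    agents reaching victims within the horizon, penalize collisions and time
    spent in unsafe cells (all weights are made up for this sketch)."""
    return (rescue_reward * victims_reached
            - collision_penalty * collisions
            - unsafe_penalty * unsafe_cell_visits)
```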
Distributed Interaction Detection using Sampling and Message Exchange

Finding the probability of a CL[1]:
• Evaluate the local policy.
• Compute the frequency of the associated s_i, a_i.
  (e.g. entered corridor in 95 of 100 runs: Pr_CL = 0.95)

Finding the value of a CL[1]:
• Sample the local policy value with/without interactions.
  – Test interactions independently.
• Compute the change in value if the interaction occurred.
  (e.g. collision vs. no collision: Val_CL = -7)

• Send CL messages to teammates.
• Sparsity → relatively small number of messages.
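A minimal sketch of these sampling estimates, assuming a hypothetical sample_episode callable that executes the local policy once and returns the visited (state, action) pairs, plus return samplers for the with/without-interaction cases:

```python
def estimate_cl_probability(sample_episode, cl_region, num_samples=100):
    """Pr_CL: fraction of sampled local-policy executions that visit a
    (state, action) pair inside the coordination locale's region."""
    hits = sum(
        1 for _ in range(num_samples)
        if any((s, a) in cl_region for (s, a) in sample_episode())
    )
    return hits / num_samples            # e.g. corridor entered in 95/100 runs -> 0.95

def estimate_cl_value(sample_return_with, sample_return_without, num_samples=100):
    """Val_CL: estimated change in local policy value if the interaction occurs,
    testing this interaction independently of all others."""
    with_cl = sum(sample_return_with() for _ in range(num_samples)) / num_samples
    without_cl = sum(sample_return_without() for _ in range(num_samples)) / num_samples
    return with_cl - without_cl          # e.g. collision vs. no collision -> -7
```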
Improved model shaping of local agent models with convergence heuristics

• Shape local model rewards/transitions based on remote interactions.
• Re-solve the shaped local models to get new policies.
• Result: new locally-optimal policies → new interactions.
(Diagram: independent model functions vs. interaction model functions, weighted by the probability of interaction.)

In practice, we run into three common issues faced by concurrent optimization algorithms. We alter our model-shaping to mitigate these by reasoning about the types of interactions we have:
– Slow convergence → Prioritization
– Oscillation → Probabilistic shaping
– Local optima → Optimistic policy initialization
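As an illustration (not the paper's actual implementation), here is a sketch of reward shaping driven by received CL messages, with a probabilistic-acceptance knob standing in for the probabilistic shaping heuristic above; the message fields and names are assumptions:

```python
import random

def shape_local_reward(local_reward, cl_messages, accept_prob=1.0, seed=0):
    """Return a shaped reward: add the expected effect of remote interactions
    (Pr_CL * Val_CL) for (state, action) pairs inside each reported CL region.
    accept_prob < 1 applies each message only with some probability, a rough
    stand-in for the probabilistic shaping heuristic used to damp oscillation."""
    rng = random.Random(seed)
    active = [m for m in cl_messages if rng.random() < accept_prob]

    def shaped(state, action):
        bonus = sum(m["pr_cl"] * m["val_cl"]
                    for m in active if (state, action) in m["region"])
        return local_reward(state, action) + bonus

    return shaped
```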
Results of Scaling Dataset

Number of agents and map size are varied as the density of debris, corridors, and unsafe cells is held constant. (Plots compare D-TREMOR policies against naïve policies: optimistic, do-nothing, random.)
• Ignoring interactions (Independent & Optimistic) = poor performance.
• Increases in time are related to the number of CLs, not the number of agents.

Results of Density Dataset

Concentric rings of narrow corridors are added from the outside in on a map where victims are at the center. (Plots compare D-TREMOR against Independent & Optimistic.)
• D-TREMOR rescues many more victims.
• D-TREMOR resolves many, but not all, collisions.
• The high risk of collision penalties makes do-nothing seem competitive.

Conclusions and Future Work
We introduce D-TREMOR, an approach to scale distributed planning under uncertainty
into the hundreds of agents using information exchange and model-shaping. Results
suggest competitive performance while improving scalability and reducing computational
cost. We are working to further improve performance through better modeling of
interaction dynamics and intelligent information dissemination between agents.
Acknowledgements
This research has been funded in part by the AFOSR MURI grant FA9550-08-1-0356. This material is based upon work supported under a National Science Foundation Graduate Research Fellowship.

References
[1] M. Kearns, Y. Mansour, and A. Y. Ng. A sparse sampling algorithm for near-optimal planning in large Markov decision
processes. Machine Learning. 2002.
[2] P. Varakantham, J. Kwak, M. Taylor, J. Marecki, P. Scerri, and M. Tambe. Exploiting Coordination Locales in Distributed
POMDPs via Social Model Shaping. Proc. of ICAPS, 2009.
[3] P. Varakantham, R.T. Maheswaran, T. Gupta, and M. Tambe. Towards Efficient Computation of Error Bounded Solutions in
POMDPs: Expected Value Approximation and Dynamic Disjunctive Beliefs. Proc. of IJCAI, 2007.