
Distributed Model Shaping for Scaling to Decentralized POMDPs with Hundreds of Agents
Prasanna Velagapudi
Pradeep Varakantham
Paul Scerri
Katia Sycara
AAMAS 2011
Motivation
• 100s to 1000s of robots, agents, and people
• Complex, collaborative tasks
• Dynamic, uncertain environment
• Offline planning
Motivation
• Exploit three characteristics of these domains:
1. Explicit Interactions
• Specific combinations of states and actions whose effects depend on more than one agent
2. Sparsity of Interactions
• Many potential interactions could occur between agents
• Only a few will occur in any given solution
3. Distributed Computation
• Each agent has access to local computation
• A centralized algorithm has access to 1 unit of computation; a distributed algorithm has access to N units
Review: Dec-POMDP
P: joint transition function
R: joint reward function
O: joint observation function
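For reference, the full tuple form for n agents, reconstructed from the standard Dec-POMDP definition (the slide's own symbols were lost in extraction):

```latex
% Standard Dec-POMDP tuple for n agents (standard notation, assumed)
\langle S,\ \{A_i\}_{i=1}^{n},\ P,\ R,\ \{\Omega_i\}_{i=1}^{n},\ O \rangle,
\qquad
P(s' \mid s, \vec{a}), \quad R(s, \vec{a}), \quad O(\vec{\omega} \mid s', \vec{a})
```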
Distributed POMDP with Coordination Locales
[Varakantham, et al 2009]
CL = ⟨ time constraint, relevant region of the joint state-action space ⟩
• Time constraint: the nature of the time dependence (e.g., affects only the same time step, or affects any future time step)
• Region: the combinations of joint states and actions where the interaction applies
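To make this concrete, here is a minimal sketch of a CL record as it might be exchanged between agents. The field names are illustrative assumptions, not the paper's exact message format:

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Sketch of a Coordination Locale message (all field names are assumptions).
@dataclass(frozen=True)
class CoordinationLocale:
    time: int                  # time step the locale refers to
    same_time_only: bool       # time constraint: affects only the same time
                               # step (True) or any future time step (False)
    region: FrozenSet[Tuple]   # relevant (state, action) pairs of the joint space
    pr_cl: float = 0.0         # sender's estimated probability of the locale
    val_cl: float = 0.0        # sender's estimated change in value if it occurs
```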
D-TREMOR
(extending TREMOR [Varakantham, et al 2009])
• Task Allocation: decentralized auction
• Local Planning: EVA POMDP solver
• Interaction Exchange: policy sub-sampling and Coordination Locale (CL) messages
• Model Shaping: prioritized/randomized reward and transition shaping
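The four stages repeat on every agent until an iteration budget is exhausted. A minimal sketch of that control flow, assuming hypothetical stage implementations passed in as `stages` (an illustration, not the authors' code):

```python
# Per-agent D-TREMOR loop (sketch). `stages` bundles the four hypothetical
# stage functions; `comm` is an assumed messaging interface to teammates.
def d_tremor_agent(agent, comm, stages, iterations=20):
    model = stages.allocate_tasks(agent, comm)     # Task Allocation: auction
    policy = None
    for _ in range(iterations):
        policy = stages.solve_pomdp(model)         # Local Planning (e.g., EVA)
        outgoing = stages.estimate_cls(model, policy)   # Interaction Exchange
        comm.broadcast(outgoing)                   # send CL messages to teammates
        model = stages.shape_model(model, comm.receive_all())  # Model Shaping
    return policy
```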
D-TREMOR: Task Allocation
• Assign “tasks” using a decentralized auction
– Greedy, nearest allocation
• Create a local, independent sub-problem for each agent
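A sketch of greedy, nearest allocation under an assumed 2D distance metric. It is written from a global view for brevity; in the decentralized auction each agent broadcasts its bid (its distance to the task) and the lowest bid wins, producing the same outcome:

```python
# Greedy nearest allocation (sketch): each task goes to the closest agent.
def greedy_nearest_allocation(agents, tasks, dist):
    assignments = {agent_id: [] for agent_id in agents}
    for task in tasks:
        winner = min(agents, key=lambda a: dist(agents[a], task))
        assignments[winner].append(task)
    return assignments

# Example with two agents, two tasks, and Euclidean distance.
agents = {"r1": (0, 0), "r2": (5, 5)}
tasks = [(1, 0), (4, 6)]
dist = lambda p, t: ((p[0] - t[0]) ** 2 + (p[1] - t[1]) ** 2) ** 0.5
print(greedy_nearest_allocation(agents, tasks, dist))
# -> {'r1': [(1, 0)], 'r2': [(4, 6)]}
```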
D-TREMOR: Local Planning
• Solve using an off-the-shelf algorithm (EVA)
• Result: locally-optimal policies
D-TREMOR: Interaction Exchange
Find Pr_CLi and Val_CLi by sampling the local policy [Kearns 2002]:
• Pr_CLi: evaluate the local policy and compute the frequency of the associated s_i, a_i
– e.g., entered the corridor in 95 of 100 runs → Pr_CLi = 0.95
• Val_CLi: sample the local policy value with and without each interaction (testing interactions independently) and compute the change in value if the interaction occurs
– e.g., collision vs. no collision → Val_CLi = -7
• Send CL messages to teammates
– Sparsity → relatively small # of messages
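Both quantities can be estimated from rollouts, in the spirit of [Kearns 2002]. A minimal sketch, assuming a hypothetical `rollout(interaction=...)` simulator that returns the total reward and whether the run entered the CL's state-action region:

```python
# Monte Carlo estimation of Pr_CL and Val_CL (sketch; `rollout` is assumed).
def estimate_cl(rollout, n_runs=100):
    reached = 0
    total_with, total_without = 0.0, 0.0
    for _ in range(n_runs):
        reward, reached_cl = rollout(interaction=False)  # baseline run
        total_without += reward
        reached += reached_cl                            # count corridor entries
        reward_i, _ = rollout(interaction=True)          # same policy, interaction forced
        total_with += reward_i
    pr_cl = reached / n_runs                         # e.g., 95/100 -> 0.95
    val_cl = (total_with - total_without) / n_runs   # e.g., -7 for a collision
    return pr_cl, val_cl
```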
D-TREMOR: Model Shaping
• Shape local model rewards/transitions based on remote interactions, mixing the interaction and independent model functions by the probability of interaction:

R̂(s, a) = Pr_CL · R_int(s, a) + (1 − Pr_CL) · R_ind(s, a)

(and analogously for the transition function)
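A sketch of the reward half of that shaping step; the dictionary-based model representation is an assumption made for illustration:

```python
# Reward shaping as a probability-weighted mixture (sketch). Models are
# assumed to be dicts mapping (state, action) -> reward; r_interaction is
# the reward if the CL occurs (e.g., including a collision penalty).
def shape_rewards(r_independent, r_interaction, pr_cl, locale_region):
    shaped = dict(r_independent)
    for (s, a) in locale_region:     # only the (s, a) pairs named by the CL
        shaped[(s, a)] = (pr_cl * r_interaction[(s, a)]
                          + (1.0 - pr_cl) * r_independent[(s, a)])
    return shaped
```

The transition function is shaped the same way, mixing interaction and independent transition probabilities by Pr_CL.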
D-TREMOR: Local Planning (again)
• Re-solve the shaped local models to get new policies
• Result: new locally-optimal policies → new interactions
D-TREMOR: Adv. Model Shaping
• In practice, we run into three common issues faced by concurrent optimization algorithms:
– Slow convergence
– Oscillation
– Local optima
• We can alter our model shaping to mitigate these by reasoning about the types of interactions we have
D-TREMOR: Adv. Model Shaping
• Slow convergence → Prioritization
– The majority of interactions are collisions
– Assign priorities to agents; only model-shape collision interactions for higher-priority agents
– From DPP: prioritization can quickly resolve collision interactions, and similarly any purely negative interaction
• Negative interaction: every agent is guaranteed a lower-valued local policy if the interaction occurs
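A sketch of the prioritized rule; treating lower numbers as higher priority is our assumption:

```python
# Prioritized shaping (sketch): only shape for collision CLs announced by
# higher-priority agents, so exactly one side of each conflict yields.
def should_shape(my_priority, sender_priority, is_collision_cl):
    if not is_collision_cl:
        return True                       # non-collision CLs are always shaped
    return sender_priority < my_priority  # yield only to higher priority (assumed)
```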
D-TREMOR: Adv. Model Shaping
• Oscillation → Probabilistic shaping
– Often caused by timing dynamics between agents:
• Agent 1 shapes based on Agent 2's old policy
• Agent 2 shapes based on Agent 1's old policy
– Each agent only applies model shaping with probability δ [Zhang 2005]
– Breaks out of cycles between agent policies
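The randomized rule is essentially one line. A sketch, with the shaping step passed in as a function:

```python
import random

# Probabilistic shaping (sketch, after [Zhang 2005]): apply shaping only with
# probability delta, so agents stop reshaping in lockstep and cycles die out.
def maybe_shape(model, incoming_cls, shape_fn, delta=0.5):
    if random.random() < delta:
        return shape_fn(model, incoming_cls)  # e.g., the mixture shaping above
    return model                              # otherwise keep the current model
```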
D-TREMOR: Adv. Model Shaping
• Local Optima → Optimistic initialization
– Agents cannot detect mixed interactions (e.g. debris):
• Rescue agent policies can only improve if debris is cleared
• Cleaner agent policies can only worsen if they clear debris
– The result is a standoff: the rescue agent reports Pr_CL = low, Val_CL = low (“I'm not going near the debris”), and the cleaner reasons “if no one is going through the debris, I won't clear it”, since a low Val_CL makes doing nothing locally optimal
D-TREMOR: Adv. Model Shaping
• Local Optima → Optimistic initialization
– Agents cannot detect mixed interactions (e.g. debris)
• Rescue agent policies can only improve if debris is cleared
• Cleaner agent policies can only worsen if they clear debris
– Fix: let each agent solve an initial model that uses an optimistic assumption about the interaction condition
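A sketch of optimistic initialization for the debris example; the boolean flag in the model is a hypothetical stand-in for the interaction condition:

```python
# Optimistic initialization (sketch): the first model each agent solves
# assumes the favorable interaction outcome, so rescue agents plan paths
# through debris and cleaner agents see value in clearing it.
def optimistic_initial_model(base_model):
    model = dict(base_model)
    model["debris_cleared"] = True  # hypothetical flag: assume debris gets cleared
    return model
```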
Experimental Setup
• D-TREMOR policies: max-joint-value and last-iteration
• Scaling dataset: 10 to 100 agents, random maps
• Density dataset: concentric ring maps
• Comparison policies: Independent, Optimistic, Do-nothing, Random
• 3 problems/condition, 20 planning iterations, 7-time-step horizon, 1 CPU per agent

Takeaway: D-TREMOR produces reasonable policies for 100-agent planning problems in under 6 hrs (with some caveats).
Experimental Datasets
[Figure: Scaling dataset and Density dataset maps]
Experimental Results: Scaling
[Plot: D-TREMOR policies vs. naïve policies across team sizes]
Experimental Results: Density
[Plot] D-TREMOR rescues the most victims, but it does not resolve every collision.
Experimental Results: Time
[Plot: # of CLs active] The increase in planning time is related to the # of CLs, not the # of agents.
Conclusions
• D-TREMOR: decentralized planning for sparse Dec-POMDPs with many agents
• Demonstrated complete distributability, fast heuristic interaction detection, and local message exchange to achieve high scalability
• Empirical results in a simulated search-and-rescue domain
Future Work
• Generalized framework for distributed planning under uncertainty through iterative message exchange
• Optimality/convergence bounds
• Reduce necessary communication
• Better search over task allocations
• Scaling to larger team sizes
Motivation
• Scaling planning to large teams is hard
– Need to plan (under uncertainty) for each agent in the team
– Agents must consider the actions of a growing number of teammates
– The full joint problem is NEXP-complete [Bernstein 2002]
• Optimality is therefore infeasible
• Instead, find and exploit structure in the problem to make good plans in a reasonable amount of time
Experimental Results: Density
[Plot] Do-nothing does the best? Ignoring interactions = poor performance.
Experimental Results: Time
[Plot: planning time] Why is this increasing?
Related Work
Structured Dec-(PO)MDP planners: JESP [Nair 2003], TD-Dec-POMDP [Witwicki 2010], EDI-CR [Mostafa 2009], SPIDER [Marecki 2009]
• Restrict generality slightly to get scalability
• High optimality
[Chart: scalability vs. generality]
Related Work
Heuristic Dec-(PO)MDP planners: TREMOR [Varakantham 2009], OC-Dec-MDP [Beynier 2005]
• Sacrifice optimality for scalability
• High generality
[Chart: scalability vs. generality]
Related Work
Structured multiagent path planners: DPC [Bhattacharya 2010], Optimal Decoupling [Van den Berg 2009]
• Sacrifice generality further to get scalability
• High optimality
[Chart: scalability vs. generality]
Related Work
Heuristic multiagent path planners: Dynamic Networks [Clark 2003], Prioritized Planning [Van den Berg 2005]
• Sacrifice optimality to get scalability
[Chart: scalability vs. generality]
Related Work
Our approach:
• Fix high scalability and generality
• Explore what level of optimality is possible
[Chart: scalability vs. generality]
A Simple Rescue Domain
[Figure legend: unsafe cell, clearable debris, rescue agent, narrow corridor, victim, cleaner agent]
A Simple (Large) Rescue Domain
Distributed POMDP with Coordination Locales (DPCL)
• Often, interactions between agents are sparse
[Map callouts: the narrow corridor only fits one agent; the debris is passable if cleaned]
[Varakantham, et al 2009]
Distributed, Iterative Planning
• Reduce the full joint problem into a set of smaller, independent sub-problems
• Inspiration: TREMOR [Varakantham 2009] and JESP [Nair 2003]
• Solve independent sub-problems with a local algorithm
• Modify sub-problems to push locally optimal solutions towards a high-quality joint solution
Distributed Team REshaping of MOdels for Rapid execution (D-TREMOR)
• Reduce the full joint problem into a set of smaller, independent sub-problems (one for each agent) → Task Allocation
• Solve independent sub-problems with existing state-of-the-art algorithms → Local Planning
• Modify sub-problems such that locally optimal solutions approach a high-quality joint solution → Interaction Exchange + Model Shaping
Conclusions
D-TREMOR produces reasonable policies for 100-agent planning problems in under 6 hrs:
– Partially-observable, uncertain world
– Multiple types of interactions and agents
• Improves over independent planning
• Resolved interactions in large problems
• Still some convergence/efficiency issues
DPCL vs. other models
• EDI/EDI-CR
– Adds complex transition functions
• TD-Dec-MDP
– Allows simultaneous interaction (within epoch)
• Factored MDP/POMDP
– Adds interactions that span epochs
D-TREMOR: Reward functions
• P_Debris = 0.9: probability that a debris will not allow a robot to enter the cell
• P_ActionFailure = 0.2: probability of action failure
• P_ReboundAfterCollision = 0.5: probability that a robot will return to the same cell after a collision
• P_ObsSuccessOnFailure = 0.2: probability that success is observed if the action failed
• P_ObsSuccessOnSuccess = 0.8: probability that success is observed if the action succeeded
• R_Move = -0.5: reward for moving
• R_Observe = -0.25: reward for observing
• R_Cleaning = 0.25: reward for cleaning debris
• R_Victim = 10.0: reward for saving a victim
• R_Collision = -5.0: reward for a collision
• R_Unsafe = -1: reward for landing in an unsafe cell
Review: POMDP
S: set of states
A: set of actions
Ω: set of observations
T: transition function
R: reward function
O: observation function
Distributed POMDP with Coordination Locales
[Varakantham, et al 2009]
• Extension of the Dec-POMDP which modifies the joint transition and reward functions
• Coordination locales (CLs) represent interactions and implicitly construct the interaction functions:
CL = ⟨ explicit time constraint, relevant region of the joint state-action space ⟩
Proposed Approach: DIMS
Distributed Iterative Model Shaping
Task Allocation:
• Assign tasks to agents
• Reduce the search space considered by each agent
• Define a local sub-problem for each robot
[Diagram: Full SI-Dec-POMDP → Local (Independent) POMDP]
Proposed Approach: DIMS
Distributed Iterative Model Shaping
Local Planning:
• Solve local sub-problems using an off-the-shelf centralized solver
• Result: locally-optimal policy
Proposed Approach: DIMS
Distributed Iterative Model Shaping
Interaction Exchange:
• Given the local policy, estimate the local probability and value of interactions
• Communicate the local probability and value of relevant interactions to team members
• Sparsity → relatively small # of messages
Proposed Approach: DIMS
Distributed Iterative Model Shaping
Model Shaping:
• Modify local sub-problems to account for the presence of interactions
Proposed Approach: DIMS
Distributed Iterative Model Shaping
Iterate:
• Reallocate tasks or re-plan using the modified local sub-problem
Proposed Approach: DIMS
Distributed Iterative Model Shaping
Task Allocation: any decentralized allocation mechanism (e.g. auctions)
Local Planning: stock graph, MDP, or POMDP solver
Interaction Exchange: lightweight local evaluation and low-bandwidth messaging
Model Shaping: methods to alter the local problem to incorporate non-local effects
Example: Interactions
[Figure legend: rescue robot, debris, victim, cleaner robot]
Example: Sparsity