Generalized Strategies for Reward Information Sharing for Distributed Planning
Prasanna Velagapudi
AAMAS 2010 - Doctoral Symposium
Large Heterogeneous Teams
• 100s to 1000s of robots, agents, people
• Complex, collaborative tasks
• Dynamic, uncertain environment
• Joint planning intractable
Scaling Team Planning
• Independent planners: can’t account for teammates
• Existing work: needs specific structure or doesn’t scale to these sizes
– DPC, Prioritized Planning
– JESP, Factored MDP, ND-POMDP
Iterated Distributed Planning
1. Factor the problem, enumerate interactions
2. Compute independent plans & potential interactions
3. Exchange messages about interactions
4. Use exchanged information, improve local model
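The four steps above can be sketched as a loop. This is a minimal illustration, not the planners from the talk: the `Agent` class, the single shared `door` interaction, the priority rule, and the penalty of 10 are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    """Hypothetical agent that picks one route; routes through the door interact."""
    name: str
    priority: int              # lower number = higher priority (assumption)
    routes: dict               # step 1: local model, route -> (cost, uses_door)
    door_penalty: float = 0.0  # learned from teammates' messages

    def plan(self):
        # Step 2: plan independently against the current local model.
        def cost(route):
            base, uses_door = self.routes[route]
            return base + (self.door_penalty if uses_door else 0.0)
        return min(self.routes, key=cost)

    def message(self, route):
        # Step 3: report only the potential interaction, not the whole plan.
        return {"priority": self.priority, "uses_door": self.routes[route][1]}

    def update(self, messages):
        # Step 4: penalize the door if a higher-priority teammate intends to use it.
        blocked = any(m["uses_door"] and m["priority"] < self.priority for m in messages)
        new_penalty = 10.0 if blocked else 0.0
        changed = new_penalty != self.door_penalty
        self.door_penalty = new_penalty
        return changed

def iterate(agents, max_rounds=10):
    """Repeat steps 2 to 4 until no local model changes."""
    plans = {}
    for _ in range(max_rounds):
        plans = {a.name: a.plan() for a in agents}
        messages = [a.message(plans[a.name]) for a in agents]
        changed = [a.update(messages) for a in agents]  # a list, so every agent updates
        if not any(changed):
            break
    return plans

a = Agent("a", 0, {"door": (3.0, True), "long": (8.0, False)})
b = Agent("b", 1, {"door": (4.0, True), "long": (6.0, False)})
result = iterate([a, b])
```

Both agents initially prefer the door; after one exchange the lower-priority agent `b` switches to its longer route and the loop stops.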
A Tale of Two Distributed Planners
• Distributed Prioritized Planning (DPP)
• L-TREMOR
Distributed Prioritized Planning
Multiagent Path Planning
[Figure: path planning map with Start and Goal markers]
Prioritized Planning
• Assign priorities to agents based on path length
• Plan from highest priority to lowest priority
• Use previous agents as dynamic obstacles
[van den Berg, et al 2005]
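A minimal sketch of prioritized planning on a 4-connected grid. Assumptions not taken from the slides: agents with longer solo paths get higher priority, planning is a breadth-first search over (cell, time), agents rest on their goal after arriving, every agent can reach its goal, and all names (`at`, `plan`, `prioritized_plan`) are illustrative.

```python
from collections import deque

MOVES = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]  # wait + 4-connected moves

def at(path, t):
    """Cell occupied at time t; an agent rests on its last cell after arriving."""
    return path[min(t, len(path) - 1)]

def plan(grid, start, goal, obstacles, horizon=64):
    """Breadth-first search over (cell, time); earlier paths are dynamic obstacles."""
    last = max((len(p) for p in obstacles), default=1) - 1
    queue = deque([(start, 0, (start,))])
    seen = {(start, 0)}
    while queue:
        cell, t, path = queue.popleft()
        # Done once at the goal and no earlier path crosses the goal later on.
        if cell == goal and all(at(p, s) != goal
                                for p in obstacles for s in range(t, last + 1)):
            return list(path)
        if t >= horizon:
            continue
        for dr, dc in MOVES:
            nxt = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])):
                continue
            if grid[nxt[0]][nxt[1]] == "#" or (nxt, t + 1) in seen:
                continue
            vertex = any(at(p, t + 1) == nxt for p in obstacles)
            swap = any(at(p, t) == nxt and at(p, t + 1) == cell for p in obstacles)
            if not vertex and not swap:
                seen.add((nxt, t + 1))
                queue.append((nxt, t + 1, path + (nxt,)))
    return None

def prioritized_plan(grid, agents):
    """agents: {name: (start, goal)}. Plans from highest to lowest priority."""
    solo = {n: plan(grid, s, g, []) for n, (s, g) in agents.items()}
    order = sorted(agents, key=lambda n: -len(solo[n]))  # longer path first (assumption)
    paths = {}
    for name in order:
        start, goal = agents[name]
        paths[name] = plan(grid, start, goal, list(paths.values()))
    return order, paths

grid = ["....", "...."]
agents = {"A": ((0, 0), (0, 3)), "B": ((0, 3), (0, 0))}
order, paths = prioritized_plan(grid, agents)
```

In this example the two agents must swap ends of a two-row corridor; the lower-priority agent detours through the second row instead of colliding head-on.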
Distributed Prioritized Planning
Parallelizable & Equivalent
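One reading of "Parallelizable & Equivalent", sketched on a toy problem rather than on path planning. Assumptions: each agent's "plan" is a single slot picked from a preference list, a collision is two agents picking the same slot, list index is priority (0 is highest), every agent has enough preferences to find a free slot, and every agent recomputes its plan each round. The function names are illustrative, not the talk's implementation.

```python
def plan(prefs, blocked):
    """Toy planner: first preferred slot not taken by a higher-priority agent."""
    return next(s for s in prefs if s not in blocked)

def prioritized(prefs_by_agent):
    """Centralized baseline: agents plan one after another in priority order."""
    chosen = []
    for prefs in prefs_by_agent:
        chosen.append(plan(prefs, set(chosen)))
    return chosen, len(prefs_by_agent)  # one sequential planning step per agent

def distributed_prioritized(prefs_by_agent):
    """All agents plan in parallel each round, then broadcast their current plan."""
    chosen = [plan(prefs, set()) for prefs in prefs_by_agent]
    rounds = 1  # counts rounds in which at least one plan was new or changed
    while True:
        # Each agent replans against the plans last broadcast by higher-priority agents.
        updated = [plan(prefs, set(chosen[:i])) for i, prefs in enumerate(prefs_by_agent)]
        if updated == chosen:
            return chosen, rounds
        chosen, rounds = updated, rounds + 1

prefs = [[0, 1], [0, 2], [3], [4], [2, 5], [6]]
sequential_plan, sequential_steps = prioritized(prefs)
parallel_plan, parallel_rounds = distributed_prioritized(prefs)
```

Both versions settle on the same slots, `[0, 2, 3, 4, 5, 6]`, but the distributed version needs 3 parallel rounds where the sequential one needs 6 planning steps, because most agents never conflict with anyone.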
Large-Scale Path Solutions
DPP Results
Fewer Sequential Plans
Longer Planning Time
Why does this happen?
• Prioritized Planning [timeline figure: agents A, B, C, D]
• DPP [timeline figure: agents A, B, C, D]
• Longest planning agents might replan multiple times
• Individual agent planning times varied by >2 orders of magnitude
• Solution 1: Prioritize by plan time?
• Solution 2: Incremental Planning
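A small worked example of how fewer sequential rounds can still cost more wall-clock time. All numbers are hypothetical; the only property taken from the slide is that one agent's planning time is more than two orders of magnitude larger than the others'.

```python
plan_time = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 200.0}  # seconds, hypothetical

def prioritized_time(times):
    """Agents plan one after another, once each."""
    return sum(times.values())

def dpp_time(times, rounds):
    """Each parallel round lasts as long as its slowest planning agent."""
    return sum(max(times[a] for a in agents) for agents in rounds)

# Hypothetical DPP run: everyone plans, then the slow agent D must replan twice.
rounds = [["A", "B", "C", "D"], ["C", "D"], ["D"]]
sequential = prioritized_time(plan_time)   # 4 sequential plans
distributed = dpp_time(plan_time, rounds)  # 3 sequential rounds
```

Here DPP needs 3 rounds instead of 4 sequential plans, yet takes 600 s instead of 203 s, because D sits on the critical path of every round.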
Summary of DPP
• Observable, certain world
• Only one type of interaction: collision
• Far fewer sequential planning iterations
• Incremental planning may reduce execution time
L-TREMOR
A Simple Rescue Domain
[Figure: grid map with legend: Unsafe Cell, Rescue Agent, Clearable Debris, Narrow Corridor, Victim, Cleaner Agent]
A Simple (Large) Rescue Domain
Distributed POMDP with Coordination Locales (DPCL)
• Often, interactions between agents are sparse
[Figure callouts: "Only fits one agent"; "Passable if cleaned"]
[Varakantham, et al 2009]
Distributed POMDP with Coordination Locales (DPCL)
• Define coordination locales (CLs) where POMDP model functions are not independent:
<S, A, Ω, P, R, O> = (states, actions, observations, transition function, reward function, observation function)
– Outside CL (typical): [Diagram: Sglobal; agent 1 with S1, A1 and its own R1, P1, O1; agent 2 with S2, A2 and its own R2, P2, O2]
– Inside CL (interaction): [Diagram: Sglobal; agents 1 and 2 with S1, A1 and S2, A2 sharing joint R12, P12, O12]
[Varakantham, et al 2009]
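A rough sketch of the factoring idea, shown for the reward function only (P and O would be treated the same way). The `Locale` class, the cell coordinates, and the reward values are hypothetical and are not taken from [Varakantham, et al 2009].

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Locale:
    """Hypothetical coordination locale: a cell where two agents' models couple."""
    cell: tuple
    joint_reward: float  # stands in for R12 when both agents occupy `cell`

def team_reward(s1, s2, r1, r2, locales):
    """Outside every CL the reward factors as R1 + R2; inside one, use R12."""
    for cl in locales:
        if s1 == cl.cell and s2 == cl.cell:
            return cl.joint_reward
    return r1(s1) + r2(s2)

# Hypothetical example: a narrow corridor cell that only fits one agent.
corridor = Locale(cell=(1, 1), joint_reward=-10.0)
r1 = lambda s: 1.0 if s == (2, 2) else 0.0  # agent 1 is rewarded at its goal
r2 = lambda s: 1.0 if s == (0, 0) else 0.0  # agent 2 is rewarded at its goal

apart = team_reward((2, 2), (0, 0), r1, r2, [corridor])
together = team_reward((1, 1), (1, 1), r1, r2, [corridor])
```

When interactions are sparse, almost every joint state falls in the factored branch, so each agent can plan with its own model and only treat the few locales specially.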
TREMOR
• Role Allocation: Branch & Bound MDP
• Policy Solution: Independent EVA[3] solvers
• Interaction Detection: Joint policy evaluation
• Coordination: Reward shaping of independent models
[Varakantham, et al 2009]
L-TREMOR
• Role Allocation: Branch & Bound MDP (TREMOR) → Decentralized Auction (L-TREMOR)
• Policy Solution: Independent EVA[3] solvers (both); distributed & parallelizable in L-TREMOR
• Interaction Detection: Joint policy evaluation (TREMOR) → Sampling & message passing (L-TREMOR)
• Coordination: Reward shaping of independent models (both)
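One way sampling, message passing, and reward shaping could fit together, as a hedged sketch and not the L-TREMOR implementation. Assumptions: an agent estimates how likely it is to occupy a locale at each time step by sampling its own policy, sends that vector to a teammate, and the teammate adds the expected collision penalty to its independent reward. The corridor world, `step_right` policy, probabilities, and penalty are all hypothetical.

```python
import random

def occupancy(policy_step, start, locale, horizon, samples=2000, seed=0):
    """Monte Carlo estimate of P(agent is in `locale` at time t), one entry per t."""
    rng = random.Random(seed)
    counts = [0] * (horizon + 1)
    for _ in range(samples):
        s = start
        for t in range(horizon + 1):
            counts[t] += (s == locale)
            s = policy_step(s, rng)
    return [c / samples for c in counts]

def shaped_reward(base, p_teammate, collision_penalty):
    """Expected reward for being in the locale at time t, given the teammate's message."""
    return [base + p * collision_penalty for p in p_teammate]

def step_right(s, rng):
    """Hypothetical policy on a 1-D corridor: advance with probability 0.8, else wait."""
    return s + 1 if rng.random() < 0.8 else s

# The teammate's "message": its sampled occupancy of cell 2 over five time steps.
occ = occupancy(step_right, start=0, locale=2, horizon=4)
# The receiving agent shapes its own reward for cell 2 with that message.
shaped = shaped_reward(1.0, occ, -10.0)
```

The receiving agent keeps planning with its independent model; only the reward at the locale changes, so cell 2 becomes unattractive at the time steps when the teammate is likely to be there.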
Preliminary Results – Joint Utility
[Figure panels: N = 6; N = 10; N = 100 (structurally similar to N = 10)]
Preliminary Results – Timing
Preliminary Results – Model Accuracy
R = 0.804
Current Issues
• Oscillations in solutions
• Discovery of relevant locales
Summary of L-TREMOR
• Partially-observable, uncertain world
• Multiple types of interactions
• Role-allocation of tasks
• Improvement over independent planning
• Handles large problems
• Next steps: improving convergence
Conclusions
• Two approaches to distributed planning
– DPP: approaching centralized performance
– L-TREMOR: exceeding joint tractability
• Analogous strategies for distributing planning
– Both iterate independent planners
– Both exchange messages about states, actions
Future Work
• Generalized framework for distributed planning through iterative message exchange
• Reduce necessary communication
• Better search over task allocations
• Scaling to larger team sizes