Distributed Planning
for Large Teams
Prasanna Velagapudi
Thesis Committee:
Katia Sycara (co-chair)
Paul Scerri (co-chair)
J. Andrew Bagnell
Edmund H. Durfee
Outline
• Motivation
• Background
• Approach
– SI-Dec-POMDP
– DIMS
• Preliminary Work
– DPP
– D-TREMOR
• Proposed Work
• Conclusion
Motivation
• 100s to 1000s of robots,
agents, people
• Complex, collaborative tasks
• Dynamic, uncertain
environment
• Offline planning
Motivation
• Scaling planning to large teams is hard
– Need to plan (with uncertainty) for each agent in team
– Agents must consider the actions of a growing
number of teammates
– The full joint problem is NEXP-complete [Bernstein 2002]
• Optimality is infeasible
• Find and exploit structure in the problem
• Make good plans in a reasonable amount of time
Motivation
• Exploit three characteristics of these domains
1. Explicit Interactions
• Specific combinations of states and actions where effects
depend on more than one agent
2. Sparsity of Interactions
• Many potential interactions could occur between agents
• Only a few will occur in any given solution
3. Distributed Computation
• Each agent has access to local computation
• A centralized algorithm has access to 1 unit of computation
• A distributed algorithm has access to N units of computation
Example: Interactions
(Figure: a rescue robot, a cleaner robot, debris, and a victim)
Example: Sparsity
Related Work
(Figure: related work plotted on axes of Scalability vs. Generality)
Related Work
Structured Dec-(PO)MDP planners:
– JESP [Nair 2003]
– TD-Dec-POMDP [Witwicki 2010]
– EDI-CR [Mostafa 2009]
– SPIDER [Marecki 2009]
• Restrict generality slightly to get scalability
• High optimality
Related Work
Heuristic Dec-(PO)MDP planners:
– TREMOR [Varakantham 2009]
– OC-Dec-MDP [Beynier 2005]
• Sacrifice optimality for scalability
• High generality
Related Work
Structured multiagent path planners:
– DPC [Bhattacharya 2010]
– Optimal Decoupling [van den Berg 2009]
• Sacrifice generality further to get scalability
• High optimality
Related Work
Heuristic multiagent path planners:
– Dynamic Networks [Clark 2003]
– Prioritized Planning [van den Berg 2005]
• Sacrifice optimality to get scalability
Related Work
Our approach:
• Fix high scalability and generality
• Explore what level of optimality is possible
Distributed, Iterative Planning
• Reduce the full joint problem into a set of smaller, independent sub-problems
• Inspiration:
– TREMOR [Varakantham 2009]
– JESP [Nair 2003]
• Solve independent sub-problems with a local algorithm
• Modify sub-problems to push locally optimal solutions towards a high-quality joint solution
Thesis Statement
Agents in a large team with known sparse interactions can find computationally efficient, high-quality solutions to planning problems through an iterative process of estimating the actions of teammates, locally planning based on these estimates, and refining those estimates by exchanging coordination messages.
Outline
• Motivation
• Background
• Approach
– SI-Dec-POMDP (problem formulation)
– DIMS (proposed algorithm)
• Preliminary Work
– DPP
– D-TREMOR
• Proposed Work
• Conclusion
Problem Formulation
(Figure: nested model classes: POMDP ⊂ Dec-POMDP ⊂ Sparse-Interaction Dec-POMDP)
Review: POMDP
S: set of states
A: set of actions
Ω: set of observations
T: transition function
R: reward function
O: observation function
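For reference, the standard POMDP tuple and function signatures, written out in standard notation (the slide's original symbols were lost in extraction):

```latex
\langle S, A, \Omega, T, R, O \rangle, \qquad
T(s' \mid s, a) = \Pr(s_{t+1} = s' \mid s_t = s,\ a_t = a), \qquad
O(o \mid s', a) = \Pr(o_t = o \mid s_{t+1} = s',\ a_t = a), \qquad
R : S \times A \to \mathbb{R}
```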
Review: Dec-POMDP
T: joint transition function (over the joint actions of all agents)
R: joint reward function
O: joint observation function
Dec-POMDP → SI-Dec-POMDP
Sparse Interaction Dec-POMDP
T_I: interaction transition function
R_I: interaction reward function
O_I: interaction observation function
(Interactions: explicit combinations of states and actions whose effects depend on more than one agent)
Proposed Approach: DIMS
Distributed Iterative Model Shaping
• Reduce the full joint problem into a set of smaller, independent sub-problems (one for each agent)
• Solve independent sub-problems with existing state-of-the-art algorithms
• Modify sub-problems such that locally optimal solutions approach a high-quality joint solution
(Cycle: Task Allocation → Local Planning → Interaction Exchange → Model Shaping)
Proposed Approach: DIMS
Distributed Iterative Model Shaping
Task Allocation:
• Assign tasks to agents
• Reduce the search space considered by each agent
• Define a local sub-problem for each robot
Proposed Approach: DIMS
Distributed Iterative Model Shaping
Task Allocation
(Figure: the full SI-Dec-POMDP is decomposed into a local, independent POMDP for each agent)
Proposed Approach: DIMS
Distributed Iterative Model Shaping
Local Planning:
• Solve local sub-problems using an off-the-shelf centralized solver
• Result: a locally-optimal policy per agent
Proposed Approach: DIMS
Distributed Iterative Model Shaping
Interaction Exchange:
• Given the local policy, estimate the local probability and value of interactions
• Communicate the probability and value of relevant interactions to team members
• Sparsity → relatively small # of messages
Proposed Approach: DIMS
Distributed Iterative Model Shaping
Model Shaping:
• Modify local sub-problems to account for the presence of interactions
Proposed Approach: DIMS
Distributed Iterative Model Shaping
Iterate:
• Reallocate tasks or re-plan using the modified local sub-problems
Proposed Approach: DIMS
Distributed Iterative Model Shaping
Task Allocation: any decentralized allocation mechanism (e.g. auctions)
Local Planning: stock graph, MDP, or POMDP solver
Interaction Exchange: lightweight local evaluation and low-bandwidth messaging
Model Shaping: methods to alter the local problem to incorporate non-local effects
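As a concrete sketch, one agent's view of the DIMS loop might look like the Python skeleton below. All names (allocate, solve, estimate, exchange, shape) are illustrative stand-ins for the four phases, not APIs from the thesis:

```python
from typing import Any, Callable, List

def dims_agent_loop(
    allocate: Callable[[], Any],                 # Task Allocation: build the local sub-problem
    solve: Callable[[Any], Any],                 # Local Planning: stock graph/MDP/POMDP solver
    estimate: Callable[[Any], List[Any]],        # evaluate the policy, find local interactions
    exchange: Callable[[List[Any]], List[Any]],  # Interaction Exchange: send/receive messages
    shape: Callable[[Any, List[Any]], Any],      # Model Shaping: fold remote interactions in
    iterations: int = 20,
) -> Any:
    """One agent's DIMS loop (conceptual sketch); every agent runs this in parallel."""
    subproblem = allocate()
    policy = None
    for _ in range(iterations):
        policy = solve(subproblem)               # locally-optimal policy for current model
        incoming = exchange(estimate(policy))    # trade interaction estimates with teammates
        subproblem = shape(subproblem, incoming) # push local optimum toward joint quality
    return policy
```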
Outline
• Motivation
• Background
• Approach
– SI-Dec-POMDP
– DIMS
• Preliminary Work
– DPP
– D-TREMOR
• Proposed Work
• Conclusion
Preliminary Results
Distributed Prioritized Planning
(DPP)
Distributed Team REshaping of
MOdels for Rapid execution
(D-TREMOR)
P. Velagapudi, K. Sycara, and P. Scerri, "Decentralized prioritized planning in large multirobot teams," IROS 2010.
P. Velagapudi, P. Varakantham, K. Sycara, and P. Scerri, "Distributed model shaping for scaling to decentralized POMDPs with hundreds of agents," AAMAS 2011 (in submission).
Preliminary Results
Distributed Prioritized Planning
(DPP)
• No uncertainty
• Many potential interactions
• Simple interactions
Distributed Team REshaping of
MOdels for Rapid execution
(D-TREMOR)
• Action/observation uncertainty
• Fewer potential interactions
• Complex interactions
Multiagent Path Planning
(Figure: agents plan paths from start to goal positions)
Multiagent Path Planning as an SI-Dec-POMDP
• Only one interaction: collision
• Many potential collisions
• Few collisions in any solution
DIMS: Distributed Prioritized Planning
Task Allocation: (given)
Local Planning: A*
Interaction Exchange: path messages
Model Shaping: prioritized configuration-time obstacles
Prioritized Planning [van den Berg et al. 2005]
• Assign priorities to agents based on path length
• Longer path length estimate → higher priority
Prioritized Planning [van den Berg et al. 2005]
• Sequentially plan from highest to lowest priority
– Takes n steps for n agents
• Use previous agents as dynamic obstacles
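A minimal sketch of the sequential scheme, assuming a hypothetical single-agent planner plan_path(agent, obstacles) such as A* in configuration-time space:

```python
def prioritized_planning(agents, plan_path):
    """Sequential prioritized planning [van den Berg et al. 2005], sketched.

    plan_path(agent, obstacles) is an assumed interface: it returns a
    path for one agent that avoids the given configuration-time obstacles."""
    # Higher priority = longer estimated path when planning alone.
    order = sorted(agents, key=lambda a: len(plan_path(a, [])), reverse=True)
    paths = []
    for agent in order:
        # All higher-priority agents' paths act as dynamic obstacles.
        paths.append(plan_path(agent, paths))
    return dict(zip(order, paths))
```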
Distributed Prioritized Planning
(Cycle: Local Planning → Interaction Exchange → Model Shaping, with agent priority driving the shaping)
Large-Scale Path Solutions
Experimental Results
• Scaling Dataset
– # robots varied: {40, 60, 80, 120, 160, 240}
– Density of map constant: 8 cells per robot
• Density Dataset
– # robots constant: 240
– Density of map varied: {32, 24, 16, 12, 8} cells per robot
• Used cellular automata to generate 15 random maps
• Solved maps with centralized prioritized planning for comparison
High-quality solutions
Both centralized and distributed PP are near-optimal.
(Charts: DPP solution quality for varying team size, and for varying density with 240 agents)
Few sequential iterations
DPP’s sequential iterations are a fraction of team size
(Charts: varying team size, where centralized prioritized planning takes 50-240 iterations; varying density with 240 agents, where centralized prioritized planning takes 240 iterations)
Summary of DPP
• DPP achieves high-quality solutions
– Same quality as centralized PP
• Prioritization + sparsity = rapid convergence
– Able to handle large numbers of collision interactions
– Far fewer sequential planning iterations
Preliminary Results
Distributed Prioritized Planning
(DPP)
• No uncertainty
• Many potential interactions
• Simple interactions
Distributed Team REshaping of
MOdels for Rapid execution
(D-TREMOR)
• Action/observation uncertainty
• Fewer potential interactions
• Complex interactions
A Simple Rescue Domain
(Figure: grid world with unsafe cells, clearable debris, a narrow corridor, victims, rescue agents, and cleaner agents)
A Simple (Large) Rescue Domain
Distributed POMDP with Coordination Locales
[Varakantham, et al 2009]
• Subset of SI-Dec-POMDP: interactions modify only the transition and reward functions
• Coordination locales (CLs) are subtypes of interactions that implicitly construct the interaction functions
• A CL specifies the participating agents' state-action pairs together with an explicit time constraint
DIMS: D-TREMOR
(extending [Varakantham, et al 2009])
Task Allocation: decentralized auction
Local Planning: EVA POMDP solver
Interaction Exchange: policy sub-sampling and Coordination Locale (CL) messages
Model Shaping: prioritized/randomized reward and transition shaping
D-TREMOR: Task Allocation
• Assign "tasks" using a decentralized auction
– Greedy, nearest allocation
• Create a local, independent sub-problem for each agent
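For illustration, the outcome of a greedy, nearest-task allocation could be computed as below. The real mechanism is a decentralized auction, serialized here for readability, and distance() is an assumed metric:

```python
def greedy_nearest_allocation(agents, tasks, distance):
    """Greedy nearest-task allocation (serialized sketch of the auction's outcome)."""
    assignment = {}
    remaining = list(tasks)
    for agent in agents:
        if not remaining:
            break  # more agents than tasks
        # Each agent wins the closest still-unclaimed task.
        task = min(remaining, key=lambda t: distance(agent, t))
        assignment[agent] = task
        remaining.remove(task)
    return assignment
```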
D-TREMOR: Local Planning
• Solve using off-the-shelf
algorithm (EVA)
• Result: locally-optimal policies
D-TREMOR: Interaction Exchange
Finding Pr_CL [Kearns 2002]:
• Evaluate the local policy
• Compute the frequency of the associated (s_i, a_i)
Example: entered the corridor in 95 of 100 runs → Pr_CL = 0.95
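A sketch of this sampling estimate, assuming hypothetical hooks simulate_rollout() (one trajectory under the current local policy) and triggers_cl(trajectory) (does the CL's (s_i, a_i) pair occur in it?):

```python
def estimate_pr_cl(simulate_rollout, triggers_cl, n_samples=100):
    """Monte Carlo estimate of Pr_CL from policy rollouts [after Kearns 2002]."""
    hits = sum(bool(triggers_cl(simulate_rollout())) for _ in range(n_samples))
    return hits / n_samples  # e.g. 95 of 100 runs -> 0.95
```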
D-TREMOR: Interaction Exchange
Finding Val_CL [Kearns 2002]:
• Sample the local policy's value with and without interactions
– Test interactions independently
• Compute the change in value if the interaction occurred
Example: no collision vs. collision → Val_CL = -7
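Correspondingly, Val_CL can be sketched as a difference of sampled values, where sample_value(cl_occurs) is an assumed evaluator that rolls out the local policy with the interaction forced on or off:

```python
def estimate_val_cl(sample_value, n_samples=100):
    """Estimate Val_CL: the change in local policy value if the CL occurs."""
    mean = lambda xs: sum(xs) / len(xs)
    with_cl = mean([sample_value(True) for _ in range(n_samples)])
    without_cl = mean([sample_value(False) for _ in range(n_samples)])
    return with_cl - without_cl  # e.g. a collision CL might yield -7
```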
D-TREMOR: Interaction Exchange
• Send CL messages (Pr_CL and Val_CL for each relevant CL) to teammates
• Sparsity → relatively small # of messages
D-TREMOR: Model Shaping
• Shape local model rewards/transitions based on remote interactions
• The shaped model mixes the interaction and independent model functions, weighted by the probability of interaction
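One plausible reading of the shaping equation (the original was lost in extraction) is an expectation over whether the interaction occurs, sketched here for rewards; transitions would be shaped analogously:

```python
def shape_reward(r_independent, r_interaction, pr_cl):
    """Blend independent and interaction reward models by Pr_CL (sketch):
    R'(s, a) = Pr_CL * R_int(s, a) + (1 - Pr_CL) * R_ind(s, a)."""
    def r_shaped(s, a):
        return pr_cl * r_interaction(s, a) + (1.0 - pr_cl) * r_independent(s, a)
    return r_shaped
```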
D-TREMOR: Local Planning (again)
• Re-solve the shaped local models to get new policies
• Result: new locally-optimal policies → new interactions
D-TREMOR: Adv. Model Shaping
• In practice, we run into three common issues
faced by concurrent optimization algorithms:
– Slow convergence
– Oscillation
– Local optima
• We can alter our model-shaping to mitigate
these by reasoning about the types of
interactions we have
D-TREMOR: Adv. Model Shaping
• Slow convergence → Prioritization
– Majority of interactions are collisions
– Assign priorities to agents, only model-shape
collision interactions for higher priority agents
– From DPP: prioritization can quickly resolve collision
interactions
– Similar properties for any purely negative interaction
• Negative interaction: when every agent is guaranteed to
have a lower-valued local policy if an interaction occurs
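A sketch of the prioritization rule, where sender_priority is a hypothetical field on each received CL message:

```python
def prioritized_collision_shaping(my_priority, collision_cls):
    """Prioritized shaping rule (sketch): an agent only shapes its model for
    collision CLs reported by higher-priority agents, so exactly one side of
    each potential collision yields."""
    return [cl for cl in collision_cls if cl.sender_priority > my_priority]
```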
D-TREMOR: Adv. Model Shaping
• Oscillation → Probabilistic shaping
– Often caused by time dynamics between agents
• Agent 1 shapes based on Agent 2’s old policy
• Agent 2 shapes based on Agent 1’s old policy
– Each agent only applies model-shaping with
probability δ [Zhang 2005]
– Breaks out of cycles between agent policies
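As a sketch, each agent can gate its shaping step on a coin flip; shape_fn and the default delta are illustrative:

```python
import random

def maybe_shape(subproblem, shape_fn, delta=0.5):
    """Apply model shaping only with probability delta [Zhang 2005],
    breaking oscillations caused by agents reacting to stale policies."""
    return shape_fn(subproblem) if random.random() < delta else subproblem
```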
D-TREMOR: Adv. Model Shaping
• Local optima → Optimistic initialization
– Agents cannot detect mixed interactions (e.g. debris)
• Rescue agent policies can only improve if debris is cleared
• Cleaner agent policies can only worsen if they clear debris
(Pr_CL = low, Val_CL = low)
If Val_CL is low, each agent's locally optimal policy is to do nothing.
D-TREMOR: Adv. Model Shaping
• Local optima → Optimistic initialization
– Agents cannot detect mixed interactions (e.g. debris)
• Rescue agent policies can only improve if debris is cleared
• Cleaner agent policies can only worsen if they clear debris
– Let each agent solve an initial model that uses an
optimistic assumption of interaction condition
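A toy sketch of the idea on a grid encoding (the "debris"/"free" cell labels are invented for illustration):

```python
def optimistic_initial_grid(grid):
    """Build the first-iteration model under the optimistic assumption
    that all clearable debris has already been cleared."""
    return [["free" if cell == "debris" else cell for cell in row]
            for row in grid]
```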
Preliminary Results
Scaling Dataset
Density Dataset
Experimental Setup
• D-TREMOR policies
– Max-joint-value
– Last iteration
• Scaling:
– 10 to 100 agents
– Random maps
• Density:
– 100 agents
– Concentric ring maps
• Comparison policies:
– Independent
– Optimistic
– Do-nothing
– Random
• 3 problems/condition
• 20 planning iterations
• 7 time step horizon
• 1 CPU per agent
(Callout: D-TREMOR produces reasonable policies for 100-agent planning problems in under 6 hrs, with some caveats.)
Preliminary Results: Scaling
(Chart: D-TREMOR policies vs. naïve policies)
Preliminary Results: Density
Do-nothing does the best? Ignoring interactions = poor performance.
Preliminary Results: Density
D-TREMOR rescues the most victims, but does not resolve every collision.
Preliminary Results: Time
Why is this increasing?
Preliminary Results: Time
Increase in time related to # of CLs, not # of agents
Summary of D-TREMOR
D-TREMOR produces reasonable policies for
100-agent planning problems in under 6 hrs.
– Partially-observable, uncertain world
– Multiple types of interactions & agents
• Improves over independent planning
• Resolved interactions in large problems
• Still some convergence/efficiency issues
Outline
• Motivation
• Background
• Approach
– SI-Dec-POMDP
– DIMS
• Preliminary Work
– DPP
– D-TREMOR
• Proposed Work
• Conclusion
Proposed Work
Proposed Work
                     | DPP                                      | D-TREMOR
Task Allocation      | Given                                    | Auction
Local Planning       | A* (Graph)                               | EVA (POMDP)
Interaction Exchange | Policy evaluation & prioritized exchange | Policy sub-sampling & full exchange
Model Shaping        | Prioritized shaping                      | Stochastic shaping + optimistic initialization
Proposed Work
Consolidate and
generalize DIMS:
1. Interaction classification
2. Model-shaping heuristics
3. Domain evaluation
– Search & Rescue
– Humanitarian Convoy
(Figure: DPP and D-TREMOR unify into the DIMS framework)
Interaction Classification
What are the different classes of possible
interactions between agents in DIMS?
Model-shaping terms:
                                      | Reward | Transition | Obs.
Collisions (Reward Only)              |   x    |            |
Collisions (Neg. Reward + Transition) |   x    |     x      |
Debris Clearing (Transition + Delay)  |        |     x      |  ?

Policy effects:
                      | Negative | Positive | Mixed
Collisions (DPP)      |    x     |          |
Collisions (D-TREMOR) |    x     |          |
Debris Clearing       |          |          |  x
Interaction Classification
1. Determine the sets of interactions that occur in
the domains of interest
2. Formalize the characteristics of useful classes
of interactions from this relevant set
– Start by identifying differences between interactions in preliminary work:
• Collisions: reward-only, same-time
• Collisions: reward + transition, same-time
• Debris-clearing: transition-only, different-time
3. Classify potential interactions by common
features
Model-shaping Heuristics
Given classes of relevant interactions, what
do we need to do to find good solutions?
Model shaping heuristics by interaction class:
– Collisions (Reward Only): prioritized shaping
– Collisions (Neg. Reward + Transition): prioritized shaping, stochastic shaping, ?
– Debris Clearing (Transition + Delay): optimistic initialization, ?
Model-shaping Heuristics
• Explore which, if any, of the existing heuristics apply to each class of interaction
• Apply targeted heuristics for newly-identified classes of interactions
• Attempt to bound the performance of the heuristics for particular classes of interaction
– e.g., proving that prioritization converges for negative interactions
Domain Evaluation
Using our approach, how well can we do in
realistic planning scenarios?
Search and Rescue (USARSim)
Humanitarian Convoy (Virtual Battlespace 2, VBS2)
Domain Evaluation
Evaluated along increasing uncertainty:
– Search and Rescue: simple model, then USARSim
– Humanitarian Convoy: simple model, then VBS2
– Planner progression: Graph (DPP) → MDP (DIMS) → POMDP (D-TREMOR)
Proposed Work: Timeline
Date                | Description
Nov 2010 – Feb 2011 | Develop classification of interactions
Feb 2011 – Mar 2011 | Design heuristics for common interactions
Mar 2011 – Jul 2011 | Implementation of DIMS solver
Jul 2011 – Oct 2011 | Rescue experiments
Oct 2011 – Jan 2012 | Convoy experiments
Feb 2012 – May 2012 | Thesis preparation
May 2012            | Defend thesis
Outline
• Motivation
• Background
• Approach
– SI-Dec-POMDP
– DIMS
• Preliminary Work
– DPP
– D-TREMOR
• Proposed Work
• Conclusion
Conclusions (1/3): Work-to-date
• DPP: Distributed path planning for large teams
• D-TREMOR: Decentralized planning for sparse
Dec-POMDPs with many agents
• Demonstrated complete distributability, fast
heuristic interaction detection, and local
message exchange to achieve high scalability
• Empirical results in simulated search and rescue
domain
Conclusions (2/3): Contributions
1. DIMS: a modular algorithm for solving planning
problems in large teams with sparse interactions
– Single framework, applied to path planning, MDP, POMDP
2. Empirical results of distributed planning using DIMS in
teams of at least 100 agents across two domains
3. Study of characteristics of interaction in sparse
planning problems
– Provide classification of interactions
– Determine features for distinguishing interaction behaviors
Conclusions (3/3): Take-home Message
This thesis will demonstrate that it is possible to efficiently find, in a distributed manner, high-quality solutions to planning problems with known sparse interactions, with and without uncertainty, for teams of at least a hundred agents.
The VECNA Bear (Yes, it exists!)
SI-Dec-POMDP vs. other models
• DPCL
– SI-Dec-POMDP extends the DPCL model
– Adds observational interactions
– Integrates time into the state rather than keeping it explicit
• EDI/EDI-CR
– Adds complex transitions and observations
• TD-Dec-MDP
– Allows simultaneous interaction (within epoch)
• Factored MDP/POMDP
– Adds interactions that span epochs
Proposed Approach: DIMS
Distributed Iterative Model Shaping
Task Allocation: assign tasks to agents; create local sub-problems
Local Planning: use a local solver to find the optimal solution to the sub-problem
Interaction Exchange: compute and exchange the probability and expected value of interactions
Model Shaping: alter the local sub-problem to incorporate non-local effects
Motivation
D-TREMOR
D-TREMOR
D-TREMOR: Reward functions
• Probability that debris will not allow a robot to enter the cell: P_Debris = 0.9
• Probability of action failure: P_ActionFailure = 0.2
• Probability that a robot will return to the same cell after a collision: P_ReboundAfterCollision = 0.5
• Probability that success is observed if the action failed: P_ObsSuccessOnFailure = 0.2
• Probability that success is observed if the action succeeded: P_ObsSuccessOnSuccess = 0.8
• Reward for saving a victim: R_Victim = 10.0
• Reward for cleaning debris: R_Cleaning = 0.25
• Reward for moving: R_Move = -0.5
• Reward for observing: R_Observe = -0.25
• Reward for a collision: R_Collision = -5.0
• Reward for landing in an unsafe cell: R_Unsafe = -1
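Collected as Python constants (values exactly as listed above), e.g. for configuring a simulation of this domain:

```python
# D-TREMOR rescue-domain parameters, from this slide.
P_Debris = 0.9                 # debris blocks entry to a cell
P_ActionFailure = 0.2          # an action fails
P_ReboundAfterCollision = 0.5  # robot returns to the same cell after a collision
P_ObsSuccessOnFailure = 0.2    # "success" observed although the action failed
P_ObsSuccessOnSuccess = 0.8    # "success" observed when the action succeeded

R_Victim = 10.0      # saving a victim
R_Cleaning = 0.25    # cleaning debris
R_Move = -0.5        # moving
R_Observe = -0.25    # observing
R_Collision = -5.0   # a collision
R_Unsafe = -1.0      # landing in an unsafe cell
```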