
Loughborough University Institutional Repository

A novel distributed scheduling algorithm for time-critical multi-agent systems

This item was submitted to Loughborough University's Institutional Repository by the/an author.

Citation: WHITBROOK, A., MENG, Q. and CHUNG, P.W.H., 2015. A novel distributed scheduling algorithm for time-critical multi-agent systems. Presented at: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28th Sept. to 2nd Oct., pp. 6451-6488.

Additional Information: Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Metadata Record: https://dspace.lboro.ac.uk/2134/18840

Version: Accepted for publication

Publisher: IEEE

Rights: This work is made available according to the conditions of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) licence. Full details of this licence are available at: https://creativecommons.org/licenses/by-nc-nd/4.0/

Please cite the published version.
A Novel Distributed Scheduling Algorithm for Time-Critical Multi-Agent Systems
Amanda Whitbrook, Qinggang Meng, and Paul W. H. Chung

Abstract — This paper describes enhancements made to the distributed performance impact (PI) algorithm and presents the results of trials that show how the work advances the state-of-the-art in single-task, single-robot, time-extended, multi-agent task assignment for time-critical missions. The improvement boosts performance by integrating the architecture with additional action selection methods that increase the exploratory properties of the algorithm (either soft max or ε-greedy task selection). It is demonstrated empirically that the average time taken to perform rescue tasks can be reduced by up to 8%, and that some problems that baseline PI cannot handle become solvable. Comparison with the consensus-based bundle algorithm (CBBA) also shows that both the baseline PI algorithm and the enhanced versions are superior. All test problems center around a team of heterogeneous, autonomous vehicles conducting rescue missions in a 3-dimensional environment, where a number of different tasks must be carried out in order to rescue a known number of victims that is always greater than the number of available vehicles.
I. INTRODUCTION
A. Motivation
Multi-agent task allocation problems in the Single-Task,
Single-Robot, Time-Extended Assignment (ST-SR-TA) class
are strongly NP-hard [4] as they represent complex,
combinatorial decision problems, and, even when these
problems are small, there is an exponential solution space [3].
These drawbacks mean that the Linear Programming (LP)
approach, which guarantees optimality, is not suitable for
problems of this type. Much research effort has thus been
directed toward designing heuristic-based methods that can
provide fitter solutions than those generated by greedy
algorithms.
The contribution of this paper is the development and
testing of extensions to the work carried out in [1] to enhance
the performance of a distributed heuristic algorithm that uses
the novel concept of performance impact (PI) to allocate
time-critical tasks among a heterogeneous set of vehicles.
The baseline PI algorithm is expanded to include an
appropriate combination of PI task selection and either soft
max or ε-greedy task selection. For each problem, the
parameter that determines the best action-selection
combination is obtained by repeatedly solving between start
and end values in suitable steps, until the best solution is
found.

A. M. Whitbrook, Qinggang Meng and Paul W. H. Chung are with the Department of Computer Science at Loughborough University, Loughborough, Leicestershire, LE11 3TU, United Kingdom (phone: +441509225913; e-mail: [email protected], [email protected], [email protected]).

Extensive testing under several different scenarios is
carried out to show empirically that the enhancement can
improve the performance of the baseline PI algorithm by up
to 8%, and enables solution of some problems that the
baseline cannot handle. Comparison with the state-of-the-art
CBBA algorithm is also included and it is shown that the
baseline PI is superior to it. The enhancements suggested
here thus advance the state of the art in task assignment for
multi-agent, time-critical systems even further. The increased
effectiveness can be attributed to enabling escape from local
minima by improving the exploration properties of the
algorithm. However, the search for an optimal parameter
introduces a trade-off between increasing solution time and
boosting solution quality.
B. Related Work
The choice of either a centralized or distributed
communication system is of paramount importance when
designing task allocation algorithms, as robots continually
need to share information about their current task set.
Centralized approaches, for example [6] and [7], incur a high
communication overhead for larger systems, and are
vulnerable to single-point failure. In addition, the vehicles
need to communicate with a central server in a fixed location,
limiting the range of the mission. However, these systems are
generally simpler to implement and tend to run faster as no
consensus processing stage is required to ensure that the
vehicles have identical situational awareness (SA) or
identical solutions. Alternatively, distributed systems, where
a planner is instantiated on each vehicle, require less
communication bandwidth, allow extension of the mission
range, and have no single-point failure vulnerability.
However, in real networks, where communication is
sometimes limited, inconsistencies in the SA or the
generation of different local solutions can lead to conflicting
assignments [8], meaning that some form of consensus
algorithm is necessary [9]. These consensus-before-planning
algorithms provide an additional computational and data
processing burden, which can slow down performance, but
they have been shown to be robust to different network
topologies [2].
Many methods designed for the solution of ST-SR-TA
problems involve iterative task allocation, for example
market-based decision strategies [5], where each robot is
modelled as a self-interested agent, and the whole fleet as an
economy. The robots must maximize their own profit by
making deals with others in the form of bidding for different
tasks. Globally, the profit (revenue minus cost) must be
maximized.
Auction-based algorithms (see [10], [11]), which are a
subset of market-based methods, have also been applied to
ST-SR-TA problems. In these algorithms, each robot bids on
a task based on information from its own SA, and the highest
bidder wins the task assignment. Either a central system or
one of the bidders can act as the auctioneer. A disadvantage
with the method is that each agent needs to communicate
directly with the auctioneer, limiting the choice of network
topology that can be employed. To avoid this difficulty the
auctions can be run within a set of direct neighbors only,
although this can compromise mission performance. Auction-based methods are generally robust to inconsistencies in the
SA, and have been shown to produce sub-optimal solutions
efficiently [2]. For a full discussion of centralized and
distributed auction methods see [12].
In [2] Choi et al. have shown that their distributed
consensus-based bundle algorithm (CBBA, suitable for
solving time-critical SR-ST-TA problems) effectively
combines the positive properties of auction-based and
consensus-before-planning approaches, producing conflict-free solutions independent of inconsistencies in the SA. Task
selection is implemented via a decentralized auction phase,
and agreement on the winning bids (rather than the SA) is
achieved through a consensus phase that also serves to
release tasks that have been outbid. Application of the
auction method to TA problems is made possible by grouping
common tasks into bundles, and allowing the vehicles to bid
on the bundles rather than individual tasks. Bundles are
continuously updated as the auction proceeds. The authors
show that the method produces the same solution as some
centralized sequential greedy procedures, and 50% optimality
is guaranteed. Task bundling auction methods are also
described in [13] and [14].
In CBBA the bundles are formed by logically grouping
similar tasks, as it would be too computationally costly to
enumerate all possible bundles. The architecture in [1] uses a
similar approach to CBBA, but considers a set of tasks to
have a positive synergy for a vehicle if the combined cost of
executing them together is less than the sum of the individual
costs incurred by carrying them out separately (and vice versa
for a negative synergy). The method uses a key novel concept
called performance impact (PI) in order to exploit the
synergies between tasks to increase optimality. This is a
measure of the importance of a task to a vehicle’s local cost.
Global cost is decreased by satisfying certain criteria on the
performance impact when switching tasks between vehicles.
The distributed PI algorithm has been shown empirically to
solve task allocation problems more effectively than the
CBBA method. When solving a number of time-critical ST-SR-TA problems with different network topologies, different
numbers of vehicles and tasks, and randomly generated
locations for survivors and vehicles, the PI approach
demonstrates a consistently lower average rescue time, and is
able to solve many problems that the CBBA method cannot.
This paper develops the PI algorithm further and tests
performance against both CBBA and baseline PI. It is
arranged as follows: Section II summarizes the problem
domain and describes the baseline PI architecture. Section III
details the additional action selection mechanisms and shows
how they are integrated into baseline PI. Section IV provides
the experimental methodology and presents a detailed result
comparison for the algorithms using several different
scenarios. Section V concludes the paper and suggests
possible extensions for the work.
II. PROBLEM DESCRIPTION AND PI ARCHITECTURE
A. Summary of the Problem
The problem of interest in this paper is formulated
mathematically by defining a set of 𝑛 heterogeneous rescue
vehicles 𝑽 = [𝑣1 , … , 𝑣𝑛 ] , a set of 𝑚 tasks 𝑻 =
[𝑡1 , … , 𝑡𝑚 ]T 𝑚 > 𝑛, and a set of ordered task allocations
𝑨 = [𝒂1 , … , 𝒂𝑛 ]T , where 𝒂𝑖 , 𝑖 = 1, … , 𝑛, is the task list
assigned to vehicle 𝑣𝑖 . Note that the actual size of a task list
𝛼𝑖 may vary between vehicles, for example vehicle 𝑣1 may
be assigned a single task, whereas vehicle 𝑣2 may be
assigned three tasks. However, mathematically |𝒂𝑖 | ≡ 𝑚 as
the remaining elements of each assignment vector are set to
-1. A compatibility matrix 𝑯 with entries ℎ𝑖,𝑗 ∈ [0,1] defines
whether vehicle 𝑣𝑖 is able to perform task 𝑡𝑗 as there may be
different task types (the value is 1 if it is able, 0 otherwise).
In addition, a maximum start time 𝑺 = [𝑠1 , … , 𝑠𝑚 ] is
randomly defined for each task in each test scenario. After
this time has elapsed the task cannot commence, i.e. the
problem has the additional complexity of being time-limited.
Thus, the time cost for a particular task is defined as the time
taken to arrive at the scene; the time taken to carry out the
current task is fixed for each task type and is not included;
only previous task times affect the current time cost. Each
task requires only one vehicle, and each vehicle can complete
only one task at a time, although it can complete other tasks
afterwards, provided that there is enough time. The problem
is to find a conflict-free assignment of vehicles to tasks that
maximizes some global reward. It falls into the general
category of Single Task–Single Robot (ST-SR) task
allocation problems, as defined in [3], and since there are
always more victims to find than vehicles available, the
problem is also a Time-Extended Assignment (TA) type, i.e.
it is an ST-SR-TA system under the same taxonomy.
In this particular case, the global objective function is to minimize φ, the average single task time over all tasks, i.e.

\varphi = \frac{1}{m} \sum_{i=1}^{n} \sum_{k=1}^{\alpha_i} c_{i,k}(\boldsymbol{a}_i),  (1)

where α_i is the number of tasks assigned to vehicle v_i, and c_{i,k}(a_i) is the time cost incurred by vehicle v_i servicing the kth task in its task list.
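To make the objective concrete, the sketch below evaluates (1) for a toy assignment; the cost interface and the numbers are hypothetical illustrations, not part of the PI implementation.

```python
# Minimal sketch of the global objective (1): average time cost over all
# m tasks. Task lists and per-slot arrival-time costs are hypothetical.

def average_task_time(assignments, cost):
    """assignments: list of task lists a_i (one per vehicle, -1 padding removed).
    cost(i, k, a_i): time cost of vehicle i servicing the k-th task in a_i."""
    m = sum(len(a) for a in assignments)           # total number of tasks
    total = sum(cost(i, k, a)
                for i, a in enumerate(assignments)
                for k in range(len(a)))
    return total / m                               # phi in equation (1)

# Toy example: arrival times supplied directly per (vehicle, slot).
times = {(0, 0): 100.0, (0, 1): 250.0, (1, 0): 150.0}
phi = average_task_time([[3, 7], [5]], lambda i, k, a: times[(i, k)])
```

Here φ = (100 + 250 + 150) / 3 ≈ 166.7 s, the average over all three tasks regardless of which vehicle performs them.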
The particular scenario used here is based on the rescue
aspect of urban search-and-rescue (USAR). The vehicles are
either Unmanned Air Vehicles (UAVs) supplying food or
helicopters supplying medicine, and each vehicle must find
its way to a victim that requires the supplies it is carrying.
The start locations of the UAVs and helicopters are known in
advance, as are the 3-dimensional locations and requirements
of the victims. Different test scenarios can be created by
using different seed values to set the vehicle and task
locations randomly. The following problem details hold
throughout this paper unless stated otherwise:
1. The number of tasks is always exactly double the number of vehicles, and the numbers of helicopters and UAVs are always equal.
2. The world x and y coordinates range from -5000m to 5000m and the z coordinates range from 0m to 1000m.
3. The helicopters travel at 30m/s and the UAVs at 50m/s.
4. All vehicles are available straight away at the start of the mission.
5. The mission time limit (the time window within which the mission must finish) is set at 2000s and the earliest start time is always 0s for all tasks.
6. The maximum start time s is generated for each task using a random fraction of 2000s.
7. The times to execute delivery of medicine and food are fixed at 300s and 350s respectively.
Note that real USAR missions do not conform to the above
assumptions; they are made merely to simplify the analysis
and enable general conclusions to be drawn more easily.
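A seeded scenario generator consistent with assumptions 1–6 might look as follows; the function and field names are hypothetical, and this is only a sketch of how the random test cases could be reproduced from seed values.

```python
import random

# Hypothetical scenario generator following the stated assumptions:
# m = 2n tasks, stated world bounds and vehicle speeds, and maximum
# start times drawn as a random fraction of the 2000 s mission window.
def make_scenario(n_vehicles, seed):
    rng = random.Random(seed)                  # same seed -> same scenario
    m = 2 * n_vehicles                         # tasks are double the vehicles

    def location():
        return (rng.uniform(-5000, 5000),      # x in metres
                rng.uniform(-5000, 5000),      # y in metres
                rng.uniform(0, 1000))          # z in metres

    vehicles = [{"loc": location(),
                 "speed": 30.0 if i < n_vehicles // 2 else 50.0}  # heli / UAV
                for i in range(n_vehicles)]
    tasks = [{"loc": location(),
              "max_start": rng.random() * 2000.0}  # latest allowed start (s)
             for _ in range(m)]
    return vehicles, tasks

vehicles, tasks = make_scenario(4, seed=1)
```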
B. The PI Architecture
PI is a distributed algorithm that runs independently on
board each vehicle. It uses the basic framework of the
distributed CBBA auction architecture [2], but introduces the
novel concept of performance impact (PI) as a score function
to determine the bundles and hence allocate the tasks. As in
the CBBA algorithm, there is a local task allocation phase in
which each vehicle generates a bundle of tasks, and a task
consensus phase that resolves conflicts through local
communication between connected vehicles. The two phases
are repeated until convergence, i.e. until a conflict-free task
assignment is reached.
There are two types of performance impact, which are
now explained. The removal performance impact (RPI)
𝑤𝑘 (𝒂𝑖 ⊝ 𝑡𝑘 ) of task 𝑡𝑘 to its assigned vehicle 𝑣𝑖 is the cost of
performing a removed task plus the difference in cost (with
and without the removed task) of performing future tasks. It
represents the contribution of a task to the local cost
generated by a vehicle. RPI is defined as:
w_k(\boldsymbol{a}_i \ominus t_k) = c_{i,b}(\boldsymbol{a}_i) + \sum_{r=b+1}^{\alpha_i} \{ c_{i,r}(\boldsymbol{a}_i) - c_{i,r-1}(\boldsymbol{a}_i \ominus t_k) \},  (2)
where 𝒂𝑖 ⊝ 𝑡𝑘 symbolizes removal of task 𝑡𝑘 from the task
list 𝒂𝑖 of vehicle 𝑣𝑖 , and 𝑏 denotes the position of task 𝑡𝑘 in
the task list, i.e. 𝑎𝑖,𝑏 = 𝑡𝑘 . The summation term represents
comparison of the time cost with the task 𝑡𝑘 included in the
task list (first term) and the time cost without it (second
term). It is a summation since this is calculated for all the
tasks following t_k in the task list. An RPI list γ_p = [w_1, …, w_m]^T, p = 1, …, n, is thus compiled for each vehicle. To facilitate consensus, a vehicle list β_p = [β_1, …, β_m]^T, p = 1, …, n, is also composed for each vehicle. This list records the local view of which vehicle is assigned to which task.
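The RPI in (2) can be evaluated directly once a vehicle's time-cost function is available; the sketch below assumes a caller-supplied `cost(a, r)` returning c_{i,r}(a) (a hypothetical interface), with a cumulative toy cost model standing in for real travel times.

```python
# Sketch of the removal performance impact (RPI) in equation (2).
# cost(a, r) is assumed to return the time cost c_{i,r} of the r-th task
# (0-indexed here) in task list a for the vehicle in question.
def rpi(a, b, cost):
    """RPI of the task at position b in task list a."""
    removed = a[:b] + a[b + 1:]                 # a with task t_k removed
    w = cost(a, b)                              # cost of the removed task itself
    for r in range(b + 1, len(a)):
        # difference in cost of each later task, with vs. without t_k
        w += cost(a, r) - cost(removed, r - 1)
    return w

# Toy cost: cumulative sum of task durations, a crude proxy for arrival time.
toy_cost = lambda a, r: float(sum(a[:r + 1]))
w = rpi([4, 2, 9], 1, toy_cost)
```

Removing the middle task (duration 2) saves its own arrival cost (6) plus the 2 s delay it imposed on the later task, so w = 8.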
When a task is removed from a vehicle’s task list it must
be added to the task list of another. Thus, it is necessary to
define inclusion performance impact (IPI) to measure the
task’s contribution to the local cost generated by the new
vehicle. The IPI 𝑤𝑘∗ (𝒂𝑗 ⊕ 𝑡𝑘 ) of task 𝑡𝑘 to vehicle 𝑣𝑗 is the
cost of performing the additional task plus the difference in
cost (with and without the added task) of performing future
tasks. It is defined as:
w_k^{*}(\boldsymbol{a}_j \oplus t_k) = \min_{l=1}^{\alpha_j} \{ w_{k,j,l}^{\Delta} \},  (3)

w_{k,j,l}^{\Delta} = c_{j,l}(\boldsymbol{a}_j \oplus_l t_k) + \sum_{r=l}^{\alpha_j} \{ c_{j,r+1}(\boldsymbol{a}_j \oplus_l t_k) - c_{j,r}(\boldsymbol{a}_j) \},  (4)
where a_j ⊕_l t_k symbolizes adding task t_k into the task list a_j of vehicle v_j at the l-th position. The value of w^Δ_{k,j,l} in (4) is calculated for each possible value of l, and w*_k is taken as the minimum of these. Again, the summation term represents comparison of the time cost with the task t_k now included in the task list (first term) and the time cost without it (second term). This is calculated for all the tasks including and following position l in the task list. An IPI list γ*_p = [w*_1, …, w*_m]^T, p = 1, …, n, is thus compiled for each vehicle. Note that in the implementation of PI an infinity value is used for w*_k when a task is already included in a vehicle's task list. Intuitively, the RPI and w^Δ_{k,j,l} in (4) have the same value when a task is removed from a vehicle's task list and is then added back into the same task list in the same position, i.e. when i = j and b = l.
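The IPI in (3)–(4) can be sketched as a search over insertion positions, keeping the cheapest; `cost(a, r)` is again a hypothetical caller-supplied interface for c_{j,r}(a), with a cumulative toy cost model.

```python
# Sketch of the inclusion performance impact (IPI) in equations (3)-(4):
# try inserting t_k at each position l and keep the minimum (equation (3)).
def ipi(a, t_k, cost):
    """cost(a, r): time cost of the r-th task (0-indexed) in task list a."""
    best = float("inf")
    for l in range(len(a) + 1):                # every insertion position
        added = a[:l] + [t_k] + a[l:]          # a with t_k inserted at l
        w = cost(added, l)                     # cost of the added task itself
        for r in range(l, len(a)):
            # difference in cost of each displaced task, with vs. without t_k
            w += cost(added, r + 1) - cost(a, r)
        best = min(best, w)                    # equation (3): minimum over l
    return best

# Toy cost: cumulative sum of task durations, a crude proxy for arrival time.
toy_cost = lambda a, r: float(sum(a[:r + 1]))
w_star = ipi([4, 9], 2, toy_cost)
```

With this toy model, inserting the duration-2 task at the front is cheapest (w* = 6: its own arrival cost of 2 plus a 2 s delay to each of the two displaced tasks); inserting it back at position 1 of [4, 2, 9] would cost 8, matching the RPI of removing it, as the text notes for i = j and b = l.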
When removing a task 𝑡𝑘 from the task list 𝒂𝑖 of 𝑣𝑖 it is
obvious that there is benefit in adding it to the task list 𝒂𝑗 of
vehicle 𝑣𝑗 if
𝑤𝑘 (𝒂𝑖 ⊝ 𝑡𝑘 ) > 𝑤𝑘∗ (𝒂𝑗 ⊕ 𝑡𝑘 )
as this will decrease the overall cost by the difference
between the two values. The novelty of the PI algorithm is
the RPI and IPI concepts; its full structure, which is similar to
the CBBA algorithm [2] and written in MATLAB code, is
now described.
At the start of the PI algorithm, the locations of the tasks
and vehicles and the maximum start times are randomly
generated, and the network topology (row, circular, mesh or
hybrid) is defined. Also, the vehicle RPI lists and IPI lists are
initialized to an 𝑚-sized vector holding the maximum
MATLAB real number. The task lists, time cost lists and
vehicle lists are initialized to an 𝑚-sized vector of -1, -1 and
0 values respectively.
During the consensus phase the vehicles exchange RPI
lists, vehicle lists and also time stamps with all other vehicles
in their range. When all the lists have been received, the
consensus takes place, i.e. the RPI and vehicle lists are recomputed according to an adaptation of the CBBA action
rules specified in Table 1 of [2], which stipulates conditions
for updating (adopting another vehicle’s lists), leaving
(keeping the same lists), and resetting. These rules are based
on comparing RPI values and determining which vehicle has
the most up-to-date information. For example, if vehicle 𝑗 is
the sender and vehicle 𝑖 is the receiver, and both vehicles
claim task 𝑘 then if 𝑤𝑗𝑘 < 𝑤𝑖𝑘 the receiver’s action is to
update so that 𝑤𝑖𝑘 = 𝑤𝑗𝑘 and 𝛽𝑖𝑘 = 𝛽𝑗𝑘 . If the sender claims
task 𝑘 but the receiver credits it to a different vehicle 𝑝 then,
if either the time stamp for the information exchange between
vehicles 𝑗 and 𝑝 is more recent than that between vehicles 𝑖
and 𝑝, or if 𝑤𝑗𝑘 < 𝑤𝑖𝑘 , then the receiver’s action is the same.
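The first of these comparisons can be sketched for the simplest case, where sender and receiver both claim the task; this is a deliberately reduced illustration, and the full rule table in [2] covers many more cases (third-party claims, time stamps, and resets).

```python
# Simplified sketch of one consensus action rule: sender j and receiver i
# both claim task k, and the receiver adopts the sender's entry if the
# sender's RPI is lower (better). Everything else in the CBBA-style rule
# table (third vehicles, time stamps, resets) is omitted here.
def consensus_update(w_i, beta_i, w_j, beta_j, k, i, j):
    if beta_i[k] == i and beta_j[k] == j:     # both vehicles claim task k
        if w_j[k] < w_i[k]:                   # sender has the lower RPI
            w_i[k] = w_j[k]                   # receiver updates its RPI list
            beta_i[k] = beta_j[k]             # ... and its vehicle list
    return w_i, beta_i

w_i, beta_i = consensus_update([50.0], [0], [30.0], [1], k=0, i=0, j=1)
```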
After the consensus phase, the task removal phase of the
algorithm begins. Tasks are marked as candidates for
removal from a vehicle’s task list if there is disagreement
between the vehicle list computed in the consensus phase and
the current task list, i.e. if a task is recorded on the task list,
but that task is assigned to a different vehicle on the vehicle
list. The RPI list γ⋄_i = [w_1, …, w_m]^T is then calculated according to (2) and iteratively compared with the previous RPI list γ_i = [w_1, …, w_m]^T that emerged from the consensus phase, for all candidate tasks d_i, i.e. the following is computed:

z = \max_{k=1}^{|\boldsymbol{d}_i|} \{ \gamma_{i,k}^{\diamond} - \gamma_{i,k} \}.  (5)
If z ≥ 0 then the task yielding the maximum 𝑧 is removed
from both the task list and the candidate list, and the time cost
is then re-calculated. In addition, 𝜸⋄𝑖 is computed again from
(2) as its value changes following the removal of the task.
Equation (5) is re-evaluated, and the process repeats until
𝒅𝑖 = ∅. Any unremoved tasks are assigned to 𝑣𝑖 in the
vehicle list, i.e. 𝜷𝑖 (𝒅𝑖 ) = 𝑣𝑖 and the RPI list 𝜸𝑖 is set as the
final 𝜸⋄𝑖 .
The next phase is the task inclusion phase, where the IPI list γ*_i = [w*_1, …, w*_m]^T is computed according to (3) and (4), and compared with the RPI list γ_i = [w_1, …, w_m]^T computed in the task removal phase, i.e. the following is calculated:

q = \max_{k=1}^{m} \{ \gamma_{i,k} - \gamma_{i,k}^{*} \}.  (6)

If q > 0 then the task t_ζ yielding the maximum q is added to the task list of v_i at the position l that returns the minimum w*_ζ, and the time cost is re-calculated. The RPI of the task for v_i becomes the IPI of the task for v_i, and the vehicle list β_i is adjusted accordingly. The IPI is recalculated, (6) is re-evaluated, and the process repeats until there are no more tasks in the task list. Finally, the RPI list γ_i = [w_1, …, w_m]^T is recalculated at the end of the phase. The communication exchange, consensus, task removal and task inclusion phases continue iteratively until suitable stopping criteria are met, for example, until no actions have been taken in the task removal and inclusion phases for more than two iterations. For further details of the PI algorithm see [1].

III. ACTION SELECTION MECHANISMS

A. Soft Max and ε-Greedy Action Selection

In the baseline PI algorithm task allocation is governed only by comparing the calculated RPI and IPI values using (5) and (6). This approach can restrict the solution search space to local minima. There is thus a need to provide an additional mechanism that permits further exploration of the solution space.

There are two variants of the modified PI algorithm: one that uses ε-greedy action selection, and one that uses Boltzmann soft max selection. In ε-greedy selection the best option is selected for a proportion 1 − ε of trials, and a random option (with uniform probability) is selected for a proportion ε, (0 ≤ ε ≤ 1). In the Boltzmann soft max method, selection is based on a fitness score f for the various options. If there are m items, and the fitness for item k is f_k, then the probability p_k of selecting item k is given by

p_k = \frac{e^{f_k/\tau}}{\sum_{j=1}^{m} e^{f_j/\tau}}.  (7)

By varying the parameter τ it is possible to alter the selection strategy from picking a random item (τ infinite), to assigning higher probabilities for higher fitness (τ small and finite), to choosing only the item with the best fitness (τ tending to 0).

B. Integration with the PI Algorithm

In the proposed approach ε-greedy or Boltzmann soft max action selection routines are integrated into the PI algorithm and a loop is constructed around the main program. Within the loop different values of ε and τ respectively are trialed (in appropriate steps) until the best solution (i.e. that yielding the minimum φ value from (1)) is obtained. Pseudo code for the modified main program is presented in Algorithm 1.

Algorithm 1 Main Program
1: Initialize φ̂, ε̂ (ε-greedy) or τ̂ (soft max) to large values; set Seed
2: for ε [τ] between start value and end value
3:   Reset Seed
4:   Define World, Vehicles, Tasks, Network Topology
5:   for each vehicle i
6:     Initialize a_i, c_i, w*_i, w_i, β_i
7:   next
8:   while not converged
9:     Communicate w_i and β_i between vehicles
10:    Re-compute w_i and β_i according to CBBA rules
11:    for each vehicle i
12:      Carry out ε-greedy [soft max] task removal
13:      Carry out ε-greedy [soft max] task inclusion
14:    next
15:    Check convergence
16:    if converged
17:      Compute φ from (1)
18:      Results set = ℛ
19:      Mark as converged
20:    end if
21:  end while
22:  if φ < φ̂ AND problem has not failed
23:    φ̂ = φ, ε̂ = ε [τ̂ = τ], ℛ̂ = ℛ
24:  end if
25: next
26: Show final φ̂, ε̂ [τ̂], ℛ̂

In the ε-greedy variant selection is modified when the RPI and IPI are calculated in the task removal and task inclusion phases respectively. In the task removal phase, after w_k has been calculated from (2) there is a probability ε, (0 ≤ ε ≤ 1), of multiplying w_k by a random factor δ, (1 ≤ δ ≤ 1.5). For example, if ε = 0.25 there is a 25% probability that w_k is modified for each task in the task list and a 75% probability (1 − ε) that w_k is unaffected. The modification means there is a probability that the task yielding z in (5) will differ from the equivalent task in the baseline PI algorithm. Task inclusion works in a similar way. Pseudo codes for the ε-greedy task removal and task inclusion routines are presented in Algorithms 2 and 3 respectively. Note that the upper and lower limits for δ were determined from pre-trials. Use of a fixed δ represents a more common form of ε-greedy selection, but the pre-trials demonstrated advantages (in terms of solution quality) when using a variable δ with minimum value 1 and maximum 1.5.
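The ε-greedy perturbation of an RPI or IPI value can be sketched as below; the δ range is the one stated from the pre-trials, while the function name and surrounding code are hypothetical.

```python
import random

# Sketch of the epsilon-greedy perturbation: with probability epsilon, an
# RPI/IPI value is inflated by a random factor delta in [1, 1.5], so the
# arg-max task in (5)/(6) may differ from the purely greedy choice.
def perturb(w, epsilon, rng):
    rho = rng.random()                        # rho uniform in [0, 1)
    if rho < epsilon:
        delta = rng.uniform(1.0, 1.5)         # variable delta from pre-trials
        return delta * w
    return w                                  # unchanged with prob. 1 - epsilon

rng = random.Random(0)                        # seeded for reproducibility
perturbed = [perturb(100.0, 0.25, rng) for _ in range(1000)]
```

Over many draws, roughly a quarter of the values are inflated by up to 50%, which is exactly the amount of "noise" that lets the selection escape a locally optimal task ordering.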
Algorithm 2 ε-greedy Task Removal
1: Compute candidate tasks for removal
2: while candidate list is not empty
3:   for each task in the task list
4:     if vehicle and task are compatible
5:       Set previous (and next) task and time cost
6:       Compute w_k from (2)
7:       Set ρ, 0 ≤ ρ ≤ 1.0 and δ, 1 ≤ δ ≤ 1.5
8:       if ρ < ε
9:         Set w_k = δw_k
10:      end if
11:    end if
12:  next
13:  Compute z from (5)
14:  if z ≥ 0
15:    Remove task yielding max z from task list
16:    Remove task yielding max z from candidate list
17:    Re-calculate time cost list
18:  end if
19: end while
20: Put unremoved tasks back into vehicle list
Algorithm 3 ε-greedy Task Inclusion
1: while tasks in task list not at upper limit
2:   for each task in problem
3:     if vehicle and task are compatible
4:       if task not already in task list
5:         for each insertion position
6:           Set previous task and time cost
7:           Compute w*_k from (3) and (4)
8:           Set ρ, 0 ≤ ρ ≤ 1.0 and δ, 1 ≤ δ ≤ 1.5
9:           if ρ < ε
10:            Set w*_k = δw*_k
11:          end if
12:        next
13:      end if
14:    end if
15:  next
16:  Compute q from (6) and l from (3) and (4)
17:  if q > 0
18:    Add task yielding max q to task list at position l
19:    Update vehicle list
20:    Set w_k = w*_k
21:    Recalculate time cost list
22:  end if
23: end while
24: Recalculate γ_i
In the Boltzmann soft max version selection is modified
in a different way. The RPI and IPI are calculated for each
task as in the baseline PI algorithm. For task removal the
arrays 𝝀, 𝝃 (fitness) and 𝝈 (related to the top term in (7)) are
then determined from:
\boldsymbol{\lambda} = \boldsymbol{\gamma}^{\diamond}[\boldsymbol{d}] - \boldsymbol{\gamma}[\boldsymbol{d}],  (8)

\mu = \min\{\boldsymbol{\lambda}\},  (9)

\mu^{*} = |\mu| \quad \forall\ \mu < 0,  (10)

\mu^{*} = 0 \quad \forall\ \mu \geq 0,  (11)

\boldsymbol{\xi} = \boldsymbol{\lambda} + \mu^{*},  (12)

\boldsymbol{\sigma} = e^{\boldsymbol{\xi}/\tau}.  (13)

For task inclusion λ is calculated from

\boldsymbol{\lambda} = \boldsymbol{\gamma} - \boldsymbol{\gamma}^{*}.  (14)
Calculation of 𝜇 (the adjustment factor to remove negative
values) is slightly more complex for task inclusion, as some
of the members of the 𝝀 array may have values equal to
MATLAB’s largest possible value 𝑅 from initialization.
Thus, 𝝀 is first adjusted so that any such members have their
values scaled by a factor 𝑅. If 𝝀∗ represents the adjusted 𝝀
array, then
\mu = \min\{\boldsymbol{\lambda}^{*}\},  (15)
and 𝜇 ∗ is given by (10) and (11) as before. The fitness is
defined as
\boldsymbol{\xi} = \boldsymbol{\lambda}^{*} + \mu^{*},  (16)
and 𝝈 is given by (13) as before. For both task removal and
task inclusion, the probability 𝑝𝑘 of task 𝑘 being selected is
given by
p_k = \sigma_k \Big/ \sum_{j=1}^{m} \sigma_j  (17)
from (7). To facilitate this, a random number 𝜌 is generated
for each iteration of the task removal and task inclusion
phases, and this number determines which task is selected for
removal or inclusion according to (17). By varying 𝜏 in (13)
it is possible to control the reliance of the strategy upon
probability. Pseudo codes for the Boltzmann soft max task
removal and task inclusion routines are presented in
Algorithms 4 and 5 respectively. Note that the value of 𝑧 is
still calculated from (5) for task removal, even if a different
task is selected (i.e. if the task yielding the maximum value is
not selected). For task inclusion 𝑞 is not calculated from (6);
instead, it is taken as 𝜆𝑗 where 𝑗 represents the selected task.
The position of insertion in the task list 𝑙 is still taken as that
yielding the minimum 𝑤𝑘∗ from (3). Note that, in theory,
parameter reselection could be carried out at any time, to
cope with dynamic changes in the environment. Although
this would impose an additional computational burden during
run time, minimization of impact is possible by limiting the
search to a region close to the original optimal parameter,
assessing overall mission time, and imposing suitable
stopping criteria.
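The fitness shift and roulette-wheel draw in (8)–(13) and (17) can be sketched as follows; this is a simplification that omits the large-value (R) adjustment used for task inclusion, and the function name is hypothetical.

```python
import math

# Sketch of Boltzmann soft max selection over PI differences, per
# (8)-(13) and (17): shift fitnesses to be non-negative, exponentiate
# with temperature tau, normalize, then draw one task with a single
# pre-generated random number rho.
def soft_max_select(lam, tau, rho):
    mu = min(lam)                              # equation (9)
    mu_star = abs(mu) if mu < 0 else 0.0       # equations (10)-(11)
    xi = [v + mu_star for v in lam]            # equation (12): fitness
    sigma = [math.exp(v / tau) for v in xi]    # equation (13)
    total = sum(sigma)
    cumulative = 0.0
    for k, s in enumerate(sigma):
        cumulative += s / total                # p_k from equation (17)
        if rho <= cumulative:
            return k                           # task selected by the draw
    return len(lam) - 1                        # guard against rounding

choice = soft_max_select([-5.0, 0.0, 20.0], tau=1.0, rho=0.5)
```

With τ = 1 the largest λ dominates and the third task is almost certain to be drawn; with τ very large the probabilities flatten towards uniform, matching the behaviour of τ described after (7).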
Algorithm 4 Soft Max Task Removal
1: Compute candidate tasks for removal
2: while candidate list is not empty
3:   for each task in the task list
4:     if vehicle and task are compatible
5:       Set previous (and next) task and time cost
6:       Compute w_k from (2)
7:     end if
8:   next
9:   Set λ, μ, μ*, ξ, σ from (8) to (13)
10:  Compute probabilities of task selection from (17)
11:  Generate ρ, 0 ≤ ρ ≤ 1.0
12:  Select task for removal based on probabilities
13:  Compute z from (5)
14:  if z ≥ 0
15:    Remove selected task from task list
16:    Remove selected task from candidate list
17:    Re-calculate time cost list
18:  end if
19: end while
20: Put unremoved tasks back into vehicle list
Algorithm 5 Soft Max Task Inclusion
1: while tasks in task list not at upper limit
2:   for each task in problem
3:     if vehicle and task are compatible
4:       if task not already in task list
5:         for each insertion position
6:           Set previous task and time cost
7:           Compute w*_k from (3) and (4)
8:         next
9:       end if
10:    end if
11:  next
12:  Set λ, λ*, μ, μ* from (14)-(15), (10)-(11)
13:  Set ξ, σ from (16) and (13)
14:  Compute probabilities of task selection from (17)
15:  Generate ρ, 0 ≤ ρ ≤ 1.0
16:  Select task for inclusion based on probabilities
17:  Calculate λ_j for selected task j
18:  Compute l from (3) and (4)
19:  if λ_j > 0
20:    Add task j to task list at position l
21:    Update vehicle list
22:    Set w_k = w*_k
23:    Recalculate time cost list
24:  end if
25: end while
26: Recalculate γ_i
IV. EXPERIMENTS
A. Methodology
In the experiments nine seeds were used to create
different 3-dimensional cases with 4, 6, 8, 10, 12, 14, and 16
vehicles, i.e. 63 different problems were tackled, each using a
row communication topology. The problems were solved
using CBBA, the baseline PI algorithm, and the two proposed
PI variants in this paper. If the problem was solved, i.e. each
task was completed on time, then 𝜑 was calculated and
recorded. If some tasks were not completed then the number
of failed tasks for each task type was recorded instead.
For ε-greedy selection, 𝜀 values of between 0.01 and 0.99
were trialed in steps of 0.01. However, pre-trials showed that
the program execution time for all 99 𝜀 values was becoming
too large as 𝑛 increased. To avoid delays the stopping values
were changed to 0.35 and 0.15 for 10/12 and 14/16 vehicles
respectively. For the same reason soft max selection 𝜏 values
between 1 and 100 in steps of 1 were used for up to 8
vehicles, but the stopping value was adjusted to 𝜏 = 50 for
10, 12 and 14 vehicles, and to 𝜏 = 20 for 16 vehicles. These
adjustments meant that run time never exceeded 7 minutes
for 𝜀-greedy selection (3 minutes for soft max selection,
which proved to be a faster algorithm) in the MATLAB
implementations used here.
In CBBA values of 0.001 were used for 𝜆 and also for 𝐹
(the vehicle fuel penalty) to match the scale of the worlds
generated. The CBBA score function used was:
\mathcal{H} e^{-\lambda t} - F d,
where ℋ is the reward associated with a task (ℋ = 100 for
all tasks) and 𝑑 is the distance between vehicle and task.
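The CBBA score used in the comparison can be written down directly with the stated constants; the function name is hypothetical and this is only a sketch of the scoring formula, not of the full CBBA bidding logic.

```python
import math

# Sketch of the CBBA score function used in the experiments:
# H * exp(-lambda * t) - F * d, with H = 100 and lambda = F = 0.001
# chosen to match the scale of the generated worlds.
def cbba_score(t, d, H=100.0, lam=0.001, F=0.001):
    """t: time to reach the task (s); d: vehicle-task distance (m)."""
    return H * math.exp(-lam * t) - F * d

score = cbba_score(t=0.0, d=0.0)   # an immediate, co-located task scores H
```

The exponential discounts the reward for tasks reached later, and the fuel penalty term favours nearer tasks, so the highest bidder is the vehicle that can service the task soonest and cheapest.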
B. Results
Table I shows the calculated 𝜑 values for each algorithm
and scenario and, in the case of the PI variants, the 𝜀 and 𝜏
values that produced the best solution are also displayed. The
percentage improvement of the best algorithm (highlighted in
bold) over baseline PI is also shown in the last column, where
appropriate. If the best algorithm is able to solve when
baseline PI cannot, then a ‘+’ symbol is shown. If no
algorithm can solve, then a ‘−’ symbol is displayed.
Table II shows the time taken in seconds for 15 iterations
of the 𝜀-greedy and soft max PI variants when case 4 is
solved. This is the time taken when the maximum 𝜀 is set to
0.15 and a step-size of 0.01 is used, and when the maximum
𝜏 is set to 15 with increments of size 1. These times are
compared with the corresponding run times of the baseline PI
algorithm, which only executes a single iteration. For
comparison purposes this value is also multiplied by 15 in the
next column. If an ‘F’ is shown in the table then the
algorithm failed to solve the problem and, if a ‘T’ is shown,
this indicates that the run was terminated because a 600
second limit was reached.
C. Discussion
CBBA demonstrated a high failure rate: of the 63
problems trialed, it was able to solve only 12. It offered the
best solution in only one problem (4 vehicles, case 8), but in
that case the ε-greedy and soft max PI variants also produced
the same solution.
The PI variants consistently outperformed the original in
terms of average rescue time; improvements of up to about
8% were observed. In addition, there were 3 and 4
problems respectively that the ε-greedy and soft max variants
could solve that the baseline could not. They improved the
average solution time (or were able to offer solutions when
the baseline could not) for 92% of the problems; the
remaining 8% of problems could not be solved by any
algorithm.
TABLE I.        𝜑 VALUES

Entries of the form a/b denote failed runs (the number of unfinished
tasks of each task type); ‘–’ marks entries that do not apply.

𝒏   Case  CBBA     PI       ε-greedy  𝜺      Soft max  𝝉    % over PI
4   1     1/0      1/0      1/0       –      1/0       –    −
4   2     1/0      293.93   293.93    0.01   293.93    94   0.00
4   3     0/1      309.30   302.99    0.43   309.30    2    2.04
4   4     297.76   314.53   296.76    0.31   297.76    1    5.65
4   5     0/1      263.83   261.57    0.43   263.83    1    0.86
4   6     0/1      0/1      336.71    0.14   349.00    28   +
4   7     338.03   342.75   332.12    0.19   337.78    41   3.10
4   8     274.65   281.35   274.65    0.49   274.65    1    2.38
4   9     313.20   312.72   298.65    0.05   298.65    1    4.50
6   1     1/1      320.83   298.65    0.91   294.42    28   8.23
6   2     1/2      313.80   307.40    0.21   307.40    19   2.04
6   3     0/1      289.50   288.84    0.61   289.50    2    0.23
6   4     1/0      285.95   280.04    0.49   280.04    2    2.07
6   5     0/1      270.88   255.66    0.08   256.82    1    5.62
6   6     0/1      297.97   289.09    0.01   291.18    18   2.98
6   7     307.61   303.01   291.45    0.31   291.45    5    3.81
6   8     281.06   275.12   274.91    0.30   274.91    31   0.08
6   9     270.23   288.95   269.10    0.44   270.23    2    6.87
8   1     3/0      297.76   294.85    0.76   297.77    2    0.98
8   2     2/2      313.11   296.12    0.49   303.11    94   5.43
8   3     1/2      316.31   292.05    0.91   292.23    7    7.67
8   4     2/1      328.79   304.50    0.75   305.34    15   7.39
8   5     274.96   270.33   257.95    0.77   266.53    63   4.58
8   6     1/1      266.51   264.79    0.58   270.94    82   0.65
8   7     0/1      288.89   278.54    0.97   283.99    10   3.58
8   8     270.67   255.28   254.01    0.10   255.28    3    0.50
8   9     275.92   281.69   265.38    0.88   267.33    2    5.79
10  1     3/2      294.15   284.44    0.32   289.12    6    4.00
10  2     3/2      1/1      1/1       –      1/1       –    −
10  3     2/1      296.75   282.23    0.26   281.24    3    5.23
10  4     2/1      312.05   301.23    0.29   298.74    5    4.26
10  5     1/1      1/0      274.27    0.33   288.94    4    +
10  6     1/1      261.77   255.25    0.11   270.10    15   2.49
10  7     0/1      276.20   268.32    0.16   271.44    2    2.86
10  8     0/1      280.76   262.05    0.28   276.00    2    7.77
10  9     273.67   272.68   263.15    0.27   263.48    10   3.92
12  1     2/2      287.77   276.28    0.33   278.19    3    3.99
12  2     4/1      1/1      0/1       –      1/1       –    −
12  3     2/0      303.77   278.44    0.16   283.27    4    8.34
12  4     1/1      268.95   262.32    0.33   267.08    9    2.46
12  5     2/1      274.55   262.40    0.18   262.27    19   4.47
12  6     2/0      255.63   243.80    0.08   249.64    7    4.63
12  7     0/1      272.94   264.13    0.32   261.54    9    4.17
12  8     0/1      285.07   261.09    0.15   261.04    12   8.43
12  9     1/0      271.86   250.68    0.35   256.28    12   7.79
14  1     4/1      278.68   271.30    0.14   277.71    15   2.65
14  2     3/5      0/1      1/1       –      284.09    47   +
14  3     3/0      281.63   275.13    0.11   279.52    42   4.33
14  4     0/1      264.79   259.28    0.15   257.13    17   2.89
14  5     1/1      269.45   262.10    0.15   260.55    31   3.31
14  6     2/1      263.44   249.37    0.13   251.36    6    5.34
14  7     2/1      272.09   261.76    0.08   259.26    12   4.72
14  8     1/1      262.40   255.94    0.07   259.39    41   2.46
14  9     1/0      245.47   241.52    0.15   245.47    14   1.61
16  1     2/3      0/2      0/2       –      0/2       –    −
16  2     3/4      1/0      1/0       –      1/0       –    −
16  3     3/4      274.48   267.32    0.06   266.85    20   2.78
16  4     2/1      1/0      267.09    0.11   278.25    10   +
16  5     2/1      265.43   263.93    0.03   259.56    10   2.21
16  6     1/2      241.58   240.52    0.03   237.51    5    1.68
16  7     2/0      238.49   241.87    0.06   237.02    12   0.62
16  8     1/0      254.40   250.00    0.02   246.44    7    3.13
16  9     251.23   239.81   237.23    0.11   240.01    13   1.08
In terms of average rescue time 𝜑, the ε-greedy variant
generally performed better than the soft max variant,
especially when vehicle and task numbers were low (fewer
than 16 vehicles). However, for higher numbers of vehicles
the search space had to be narrowed more for the ε-greedy
variant to improve run time efficiency, which led to a
decrease in its performance compared to the soft max variant.
For example, for 8 vehicles the ε-greedy variant produced the
best average rescue time in all cases. However, it is worth
noting that, for this number of vehicles, the ε-greedy variant
took much longer than the soft max variant to find the best
solution, yet the soft max version was still able to outperform the baseline in most cases.
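For reference, the two selection rules compared here have the following general form (an illustrative Python sketch, not the authors' implementation; it assumes a list of PI-based task scores in which lower is better):

```python
import math
import random

def epsilon_greedy(scores, eps, rng=random):
    """With probability eps explore (pick a random task index);
    otherwise exploit (pick the index with the lowest score)."""
    if rng.random() < eps:
        return rng.randrange(len(scores))
    return min(range(len(scores)), key=lambda i: scores[i])

def soft_max(scores, tau, rng=random):
    """Boltzmann selection: lower scores receive exponentially
    larger weight; tau controls how greedy the choice is."""
    m = min(scores)  # shift scores for numerical stability
    weights = [math.exp(-(s - m) / tau) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]
```

Large 𝜏 (or 𝜀 close to 1) makes the choice nearly uniform, while small 𝜏 (or 𝜀 close to 0) makes it nearly greedy, which is consistent with the wide spread of best 𝜀 and 𝜏 values reported in Table I.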
TABLE II.       ALGORITHM RUN TIMES IN SECONDS

n    PI    PI*15   ε-greedy   Soft max
4    0.1   1.7     1.0        1.0
6    0.2   3.5     2.7        3.0
8    0.6   9.1     9.6        9.2
10   0.9   13.4    117.1      18.8
12   1.5   22.5    44.8       29.8
14   3.2   48.5    189.6      54.9
16   F     -       T          83.5
The values of 𝜀 and 𝜏 that produced the best solutions
varied greatly. For example, for 6 vehicles, the best 𝜀 value
was 0.01 in one case, but it was 0.91 in another. Similarly, for
8 vehicles the best 𝜏 was 2 in one of the cases but was 94 in
another. Unfortunately, this means that it is not possible to
narrow the scope of the search to save run time whilst still
guaranteeing the best solution; if the scope of the search is
reduced then the opportunity to find the best solution may be
missed. Thus, there is a compromise between obtaining the
best possible solution and minimizing run time. In these
experiments as the number of vehicles increased, it was
necessary to reduce the ranges of 𝜀 and 𝜏 values that were
searched to preserve run-time efficiency. This meant that the
best solutions were sometimes missed. For example, for 12
vehicles the maximum 𝜀 was set to 0.35, and this produced a
best solution of 261.09 with 𝜀 = 0.15 for case 8. However, if
the maximum 𝜀 had remained at 0.99, then a better solution
(258.95) would have been obtained when 𝜀 reached 0.89.
Table II shows that the baseline PI algorithm runs almost
instantaneously, for example, it takes only about a second
when there are 10 vehicles, and the run time increases
steadily as the number of vehicles rises. The run time of the
soft max variant is comparable with column 3 of the table,
which multiplies baseline PI’s run time by 15. This suggests
that running the soft max variant is approximately equivalent
to running the baseline the same number of times, i.e. soft max
takes longer only because it executes a search; there are no
additional complications in its architecture that slow it down.
This is not the case for the 𝜀-greedy variant; for smaller
numbers of vehicles (< 10) it behaves in a similar manner to
the soft max variant, but begins to reveal much longer run
times for higher numbers of vehicles (≥ 10). Moreover, these
run times show erratic behavior, i.e. they do not rise steadily
with the number of vehicles. For example, when 10 vehicles
are used with case 4 the algorithm takes about 117s to run,
but converges in about 45s when 12 vehicles are used.
Further investigation has shown that the problem is caused by
the number of inner iterations of the task allocation phase.
For some problems this number can become very high for the
𝜀-greedy variant because new probability parameters are
generated each time the RPI and IPI are calculated for each
task in the task list. With the soft max variant, the RPI and
IPI are calculated for each task in the list first before the
probability parameters are created. Limiting the number of
internal iterations in the 𝜀-greedy variant should overcome
this problem. Additionally, the time efficiency of both
variants could be improved by terminating the search once
the objective function has reached a given percentage of the
best value computed so far.
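One possible reading of this termination criterion is sketched below (illustrative Python with assumed names `solve_pi` and `target`, not the authors' code): the sweep stops once the running best has improved on the first feasible solution by a chosen fraction.

```python
# Sketch of an early-termination rule for the parameter sweep:
# stop once the running best phi has improved on the first
# feasible solution by at least the fraction `target`.

def sweep_with_early_stop(grid, solve_pi, target=0.05):
    first, best_phi, best_p = None, float("inf"), None
    for p in grid:
        phi = solve_pi(p)          # one run of the PI variant
        if phi is None:            # infeasible: some task missed
            continue
        if first is None:
            first = phi            # first feasible solution found
        if phi < best_phi:
            best_phi, best_p = phi, p
        if best_phi <= (1.0 - target) * first:
            break                  # improved enough; stop early
    return best_phi, best_p
```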
An important consideration is that neither of the proposed
algorithms is as efficient as the original PI algorithm in terms
of actual run time. This matters for the problems considered
here, where the maximum mission time is 2000s (about half
an hour) and mean rescue times are about 280s (about four
and a half minutes). In these problems, any savings in mean
rescue time offered by the amended algorithms are counterbalanced by run-time losses. For example, if there is a 4%
saving in mean rescue time, this equates to about 11s for each
task. If there are 28 tasks the total saving is 308s or about 5
minutes. However, if the modified algorithm takes 5 minutes
to run, then any gain is eliminated. For smaller time-scale
problems this means that it is important to reduce the search
space accordingly as 𝑛 increases, even if this means missing
fitter solutions.
For more realistic larger-scale problems, for example,
problems with a maximum mission time in terms of days and
mean rescue start times in terms of hours, the algorithm’s
initial run-time efficiency matters much less, since a run time
measured in minutes and seconds has far less impact
on overall mean rescue time. In these cases, it would be
possible to examine a wider range of the spectrum of possible
𝜀 and 𝜏 values to be certain of finding the optimal or near
optimal solution without compromising the effectiveness or
efficiency of the mission. In addition, if a new parameter
search is required because new information has become
available then the computational burden can be reduced by
using prior knowledge of the original optimal parameters, i.e.
the search can be conducted within a much smaller range that
centers on the previously selected parameter. This means that
problem solution using modified PI is possible online, and
can be applied to dynamically changing problems.
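Such a narrowed re-search could take the following form (an illustrative Python sketch; the band half-width and names are assumptions, not taken from the paper):

```python
# Sketch of a warm-started re-search: instead of sweeping the full
# range, only epsilon values near the previously best parameter are
# tried, reducing the computational burden of re-planning.

def narrowed_grid(prev_best, half_width=0.05, step=0.01,
                  lo=0.01, hi=0.99):
    """Grid of epsilon values centred on the previous optimum,
    clipped to [lo, hi]."""
    start = max(lo, prev_best - half_width)
    stop = min(hi, prev_best + half_width)
    k0 = int(round(start / step))
    k1 = int(round(stop / step))
    return [round(k * step, 2) for k in range(k0, k1 + 1)]
```

For example, if 𝜀 = 0.15 was previously best, only the eleven values 0.10, 0.11, ..., 0.20 would be re-examined rather than the full 99-value grid.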
V. CONCLUSIONS AND FUTURE WORK
In this paper, the distributed performance impact (PI)
task-allocation algorithm has been enhanced to include a
degree of either 𝜀-greedy or soft max action selection, thus
introducing a level of exploration into the architecture, much
like a genetic algorithm, which uses random cross-over and
mutation to obtain more variation in the solution. These
variations can permit new areas of the search space to be
explored and hence can improve solution fitness. Indeed, in
the experiments performed here the introduction of more
exploration improved baseline PI’s task-allocation
performance by up to about 8%, and enabled the algorithm to
solve additional problems that failed using the baseline
version. Baseline PI has been shown to outperform the
popular state-of-the-art CBBA algorithm [2] in this paper and
also in [1]. The additional action selection mechanisms
detailed in this paper enable improved performance at
relatively low cost. This work thus represents an advance in
the state-of-the-art in ST-SR-TA multi-agent task planning.
Future work will aim to improve the time efficiency of
the 𝜀-greedy and soft max PI variants. This will be achieved
by limiting the number of inner iterations in the task-allocation
phase of the 𝜀-greedy variant and by stopping the
search in both variants when the objective function has
already improved by a given percentage of its best value. The
tests carried out in this paper will also be repeated using
different network topologies, and the effects of introducing
uncertainty into key parameters will also be explored.
Finally, the baseline PI algorithm will be adapted to allow it
to reschedule tasks as soon as new information becomes
available.
ACKNOWLEDGMENT
This work was supported by EPSRC (grant number
EP/J011525/1) with BAE Systems as the leading industrial
partner.
REFERENCES
[1] W. Zhao, Q. Meng, and P. W. H. Chung, “A Heuristic Distributed
Task Allocation Method for Multivehicle Multitask Problems and Its
Application to Search and Rescue Scenario,” IEEE Transactions on
Cybernetics, DOI: 10.1109/TCYB.2015.2418052, April 2015.
[2] H.-L. Choi, L. Brunet, and J. P. How, “Consensus-Based Decentralized
Auctions for Robust Task Allocation,” IEEE Transactions on
Robotics, vol. 25, no. 4, Aug. 2009, pp. 912–926.
[3] B. P. Gerkey and M. J. Matarić, “A Formal Analysis and
Taxonomy of Task Allocation in Multi-robot Systems,” Intl. J. of
Robotics Research, vol. 23, no. 9, Sept. 2004, pp. 939–954.
[4] J. L. Bruno, E. G. Coffman, and R. Sethi, “Scheduling Independent
Tasks to Reduce Mean Finishing Time,” Communications of the ACM,
vol. 17, no. 7, 1974, pp. 382–387.
[5] M. B. Dias and A. Stentz, “Opportunistic Optimization for Market-Based
Multi-robot Control,” in Proc. of the IEEE/RSJ Intl. Conf. on
Intelligent Robots and Systems (IROS), Lausanne, Switzerland, 2002,
pp. 2714–2720.
[6] C. Liu and A. Kroll, “Memetic Algorithms for Optimal Task
Allocation in Multi-robot Systems for Inspection Problems with
Cooperative Tasks,” Soft Comput., April 2014.
[7] C. Liu and A. Kroll, “A Centralized Multi-robot Task Allocation for
Industrial Plant Inspection by Using A* and Genetic Algorithms,” in L.
Rutkowski et al. (Eds.): ICAISC 2012, Part II, LNCS 7268, 2012, pp.
466–474.
[8] J. A. Fax and R. M. Murray, “Information Flow and Cooperative
Control of Vehicle Formations,” IEEE Trans. Autom. Control, vol. 49,
no. 9, Sept. 2004, pp. 1465–1476.
[9] W. Ren, R. Beard, and D. Kingston, “Multi-agent Kalman Consensus
with Relative Uncertainty,” in Proc. Amer. Control Conf., 2006, pp.
1865–1870.
[10] G. Oliver and J. Guerrero, “Auction and Swarm Multi-Robot Task
Allocation Algorithms in Real Time Scenarios,” in Toshiyuki Yasuda
(Ed.): Multi-Robot Systems, Trends and Development, InTech, ISBN:
978-953-307-425-2, 2011, pp. 437–456.
[11] D. P. Bertsekas, “The Auction Algorithm for Assignment and Other
Network Flow Problems,” Mass. Inst. Technol., Cambridge, MA,
Tech. Rep., 1989.
[12] K. Zhang, E. G. Collins, Jr., and D. Shi, “Centralized and Distributed
Task Allocation in Multi-robot Teams via a Stochastic Clustering
Auction,” ACM Trans. Autonom. Adapt. Syst., vol. 7, no. 2, July 2012,
Article 21.
[13] S. Zaman and D. Grosu, “A Combinatorial Auction-based
Mechanism for Dynamic VM Provisioning and Allocation in Clouds,”
IEEE Transactions on Cloud Computing, vol. 1, no. 2, 2013, pp. 129–
141.
[14] M. G. Lagoudakis, M. Berhault, S. Koenig, P. Keskinocak, and A. J.
Kleywegt, “Simple Auctions with Performance Guarantees for Multi-robot
Task Allocation,” in Proc. of the 2004 IEEE/RSJ Int. Conf. on
Intelligent Robots and Systems, vol. 1, 2004, pp. 698–705.