A Novel Distributed Scheduling Algorithm for Time-Critical Multi-Agent Systems

Amanda Whitbrook, Qinggang Meng, and Paul W. H. Chung

Abstract — This paper describes enhancements made to the distributed performance impact (PI) algorithm and presents the results of trials that show how the work advances the state-of-the-art in single-task, single-robot, time-extended, multi-agent task assignment for time-critical missions. The improvement boosts performance by integrating the architecture with additional action selection methods that increase the exploratory properties of the algorithm (either soft max or ε-greedy task selection). It is demonstrated empirically that the average time taken to perform rescue tasks can be reduced by up to 8%, and that some problems that baseline PI cannot handle become solvable. Comparison with the consensus-based bundle algorithm (CBBA) also shows that both the baseline PI algorithm and the enhanced versions are superior. All test problems center around a team of heterogeneous, autonomous vehicles conducting rescue missions in a 3-dimensional environment, where a number of different tasks must be carried out in order to rescue a known number of victims that is always greater than the number of available vehicles.

I. INTRODUCTION

A. Motivation

Multi-agent task allocation problems in the Single-Task, Single-Robot, Time-Extended Assignment (ST-SR-TA) class are strongly NP-hard [4] as they represent complex, combinatorial decision problems, and, even when these problems are small, there is an exponential solution space [3]. These drawbacks mean that the Linear Programming (LP) approach, which guarantees optimality, is not suitable for problems of this type. Much research effort has thus been directed toward designing heuristic-based methods that can provide fitter solutions than those generated by greedy algorithms. The contribution of this paper is the development and testing of extensions to the work carried out in [1] to enhance the performance of a distributed heuristic algorithm that uses the novel concept of performance impact (PI) to allocate time-critical tasks among a heterogeneous set of vehicles.
The baseline PI algorithm is expanded to include an appropriate combination of PI task selection and either soft max or ε-greedy task selection. For each problem, the parameter that determines the best action-selection combination is obtained by repeatedly solving the problem with parameter values stepped between start and end limits until the best solution is found. Extensive testing under several different scenarios is carried out to show empirically that the enhancement can improve the performance of the baseline PI algorithm by up to 8% and enables solution of some problems that the baseline cannot handle. Comparison with the state-of-the-art CBBA algorithm is also included, and it is shown that the baseline PI is superior to it. The enhancements suggested here thus advance the state of the art in task assignment for multi-agent, time-critical systems even further. The increased effectiveness can be attributed to enabling escape from local minima by improving the exploration properties of the algorithm. However, the search for an optimal parameter introduces a trade-off between increasing solution time and boosting solution quality.

A. M. Whitbrook, Qinggang Meng and Paul W. H. Chung are with the Department of Computer Science at Loughborough University, Loughborough, Leicestershire, LE11 3TU, United Kingdom (phone: +441509225913; e-mail: [email protected], [email protected], [email protected]).

B. Related Work

The choice of either a centralized or distributed communication system is of paramount importance when designing task allocation algorithms, as robots continually need to share information about their current task set. Centralized approaches, for example [6] and [7], incur a high communication overhead for larger systems and are vulnerable to single-point failure. In addition, the vehicles need to communicate with a central server in a fixed location, limiting the range of the mission. However, these systems are generally simpler to implement and tend to run faster, as no consensus processing stage is required to ensure that the vehicles have identical situational awareness (SA) or identical solutions. Alternatively, distributed systems, where a planner is instantiated on each vehicle, require less communication bandwidth, allow extension of the mission range, and have no single-point failure vulnerability. However, in real networks, where communication is sometimes limited, inconsistencies in the SA or the generation of different local solutions can lead to conflicting assignments [8], meaning that some form of consensus algorithm is necessary [9]. These consensus-before-planning algorithms impose an additional computational and data processing burden, which can slow down performance, but they have been shown to be robust to different network topologies [2].

Many methods designed for the solution of ST-SR-TA problems involve iterative task allocation, for example market-based decision strategies [5], where each robot is modelled as a self-interested agent, and the whole fleet as an economy. The robots must maximize their own profit by making deals with others in the form of bidding for different tasks. Globally, the profit (revenue minus cost) must be maximized. Auction-based algorithms (see [10], [11]), which are a subset of market-based methods, have also been applied to ST-SR-TA problems. In these algorithms, each robot bids on a task based on information from its own SA, and the highest bidder wins the task assignment. Either a central system or one of the bidders can act as the auctioneer.
A disadvantage of the method is that each agent needs to communicate directly with the auctioneer, limiting the choice of network topology that can be employed. To avoid this difficulty the auctions can be run within a set of direct neighbors only, although this can compromise mission performance. Auction-based methods are generally robust to inconsistencies in the SA, and have been shown to produce sub-optimal solutions efficiently [2]. For a full discussion of centralized and distributed auction methods see [12].

In [2] Choi et al. have shown that their distributed consensus-based bundle algorithm (CBBA, suitable for solving time-critical ST-SR-TA problems) effectively combines the positive properties of auction-based and consensus-before-planning approaches, producing conflict-free solutions independent of inconsistencies in the SA. Task selection is implemented via a decentralized auction phase, and agreement on the winning bids (rather than the SA) is achieved through a consensus phase that also serves to release tasks that have been outbid. Application of the auction method to TA problems is made possible by grouping common tasks into bundles and allowing the vehicles to bid on the bundles rather than individual tasks. Bundles are continuously updated as the auction proceeds. The authors show that the method produces the same solution as some centralized sequential greedy procedures, and 50% optimality is guaranteed. Task bundling auction methods are also described in [13] and [14]. In CBBA the bundles are formed by logically grouping similar tasks, as it would be too computationally costly to enumerate all possible bundles.

The architecture in [1] uses a similar approach to CBBA, but considers a set of tasks to have a positive synergy for a vehicle if the combined cost of executing them together is less than the sum of the individual costs incurred by carrying them out separately (and vice versa for a negative synergy). The method uses a key novel concept called performance impact (PI) in order to exploit the synergies between tasks to increase optimality. This is a measure of the importance of a task to a vehicle's local cost. Global cost is decreased by satisfying certain criteria on the performance impact when switching tasks between vehicles. The distributed PI algorithm has been shown empirically to solve task allocation problems more effectively than the CBBA method. When solving a number of time-critical ST-SR-TA problems with different network topologies, different numbers of vehicles and tasks, and randomly generated locations for survivors and vehicles, the PI approach demonstrates a consistently lower average rescue time, and is able to solve many problems that the CBBA method cannot.

This paper develops the PI algorithm further and tests performance against both CBBA and baseline PI. It is arranged as follows: Section II summarizes the problem domain and describes the baseline PI architecture. Section III details the additional action selection mechanisms and shows how they are integrated into baseline PI. Section IV provides the experimental methodology and presents a detailed result comparison for the algorithms using several different scenarios. Section V concludes the paper and suggests possible extensions for the work.

II. PROBLEM DESCRIPTION AND PI ARCHITECTURE

A. Summary of the Problem
The problem of interest in this paper is formulated mathematically by defining a set of n heterogeneous rescue vehicles V = [v_1, …, v_n]^T, a set of m tasks T = [t_1, …, t_m]^T, m > n, and a set of ordered task allocations A = [a_1, …, a_n]^T, where a_i, i = 1, …, n, is the task list assigned to vehicle v_i. Note that the actual size of a task list α_i may vary between vehicles; for example, vehicle v_1 may be assigned a single task, whereas vehicle v_2 may be assigned three tasks. However, mathematically |a_i| ≡ m, as the remaining elements of each assignment vector are set to -1. A compatibility matrix H with entries h_{i,j} ∈ {0, 1} defines whether vehicle v_i is able to perform task t_j, as there may be different task types (the value is 1 if it is able, 0 otherwise). In addition, a vector of maximum start times S = [s_1, …, s_m] is randomly defined for the tasks in each test scenario. After this time has elapsed the task cannot commence, i.e. the problem has the additional complexity of being time-limited. The time cost for a particular task is defined as the time taken to arrive at the scene; the time taken to carry out the current task is fixed for each task type and is not included; only previous task times affect the current time cost. Each task requires only one vehicle, and each vehicle can complete only one task at a time, although it can complete other tasks afterwards, provided that there is enough time.

The problem is to find a conflict-free assignment of vehicles to tasks that maximizes some global reward. It falls into the general category of Single Task–Single Robot (ST-SR) task allocation problems, as defined in [3], and since there are always more victims to find than vehicles available, the problem is also a Time-Extended Assignment (TA) type, i.e. it is an ST-SR-TA system under the same taxonomy. In this particular case, the global objective is to minimize φ, the average single-task time over all tasks, i.e.

\varphi = \frac{1}{m} \sum_{i=1}^{n} \sum_{k=1}^{\alpha_i} c_{i,k}(\boldsymbol{a}_i), \qquad (1)

where α_i is the number of tasks assigned to vehicle v_i, and c_{i,k}(a_i) is the time cost incurred by vehicle v_i servicing the kth task in its task list.

The particular scenario used here is based on the rescue aspect of urban search-and-rescue (USAR). The vehicles are either Unmanned Air Vehicles (UAVs) supplying food or helicopters supplying medicine, and each vehicle must find its way to a victim that requires the supplies it is carrying. The start locations of the UAVs and helicopters are known in advance, as are the 3-dimensional locations and requirements of the victims. Different test scenarios can be created by using different seed values to set the vehicle and task locations randomly. The following problem details hold throughout this paper unless stated otherwise:

1. The number of tasks is always exactly double the number of vehicles, and the numbers of helicopters and UAVs are always equal.
2. The world x and y coordinates range from -5000 m to 5000 m and the z coordinates range from 0 m to 1000 m.
3. The helicopters travel at 30 m/s and the UAVs at 50 m/s.
4. All vehicles are available straight away at the start of the mission.
5. The mission time limit (the time window within which the mission must finish) is set at 2000 s and the earliest start time is always 0 s for all tasks.
6. The maximum start time s is generated for each task using a random fraction of 2000 s.
7. The times to execute delivery of medicine and food are fixed at 300 s and 350 s respectively.
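As a concrete illustration of the objective (1), the short Python sketch below computes φ for a toy two-vehicle, four-task assignment; the time-cost values are invented for illustration and are not taken from the experiments.

# Illustrative computation of the global objective (1): phi is the
# average, over all m tasks, of the time costs c_{i,k} at which each
# vehicle services the k-th task in its list. All numbers are invented.

def phi(assignment_costs, m):
    """assignment_costs: one list of time costs c_{i,k} per vehicle."""
    total = sum(c for vehicle_costs in assignment_costs
                  for c in vehicle_costs)
    return total / m

# Two vehicles, four tasks (m = 2n): vehicle 1 arrives at its tasks at
# t = 120 s and 410 s, vehicle 2 at t = 95 s and 300 s.
costs = [[120.0, 410.0], [95.0, 300.0]]
print(phi(costs, m=4))  # (120 + 410 + 95 + 300) / 4 = 231.25 s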
Note that real USAR missions do not conform to the above assumptions; they are made merely to simplify the analysis and enable general conclusions to be drawn more easily.

B. The PI Architecture

PI is a distributed algorithm that runs independently on board each vehicle. It uses the basic framework of the distributed CBBA auction architecture [2], but introduces the novel concept of performance impact (PI) as a score function to determine the bundles and hence allocate the tasks. As in the CBBA algorithm, there is a local task allocation phase in which each vehicle generates a bundle of tasks, and a task consensus phase that resolves conflicts through local communication between connected vehicles. The two phases are repeated until convergence, i.e. until a conflict-free task assignment is reached. There are two types of performance impact, which are now explained.

The removal performance impact (RPI) w_k(a_i ⊖ t_k) of task t_k to its assigned vehicle v_i is the cost of performing the removed task plus the difference in cost (with and without the removed task) of performing future tasks. It represents the contribution of a task to the local cost generated by a vehicle. RPI is defined as:

w_k(\boldsymbol{a}_i \ominus t_k) = c_{i,b}(\boldsymbol{a}_i) + \sum_{r=b+1}^{\alpha_i} \left[ c_{i,r}(\boldsymbol{a}_i) - c_{i,r-1}(\boldsymbol{a}_i \ominus t_k) \right], \qquad (2)

where a_i ⊖ t_k symbolizes removal of task t_k from the task list a_i of vehicle v_i, and b denotes the position of task t_k in the task list, i.e. a_{i,b} = t_k. The summation term represents comparison of the time cost with the task t_k included in the task list (first term) and the time cost without it (second term). It is a summation since this is calculated for all the tasks following t_k in the task list. An RPI list γ_p = [w_1, …, w_m]^T, p = 1, …, n, is thus compiled for each vehicle.

To facilitate consensus, a vehicle list β_p = [β_1, …, β_m]^T, p = 1, …, n, is also composed for each vehicle. This list records the local view of which vehicle is assigned to which task.

When a task is removed from a vehicle's task list it must be added to the task list of another. Thus, it is necessary to define the inclusion performance impact (IPI) to measure the task's contribution to the local cost generated by the new vehicle. The IPI w*_k(a_j ⊕ t_k) of task t_k to vehicle v_j is the cost of performing the additional task plus the difference in cost (with and without the added task) of performing future tasks. It is defined as:

w_k^{*}(\boldsymbol{a}_j \oplus t_k) = \min_{l=1,\dots,\alpha_j} \{ w_{k,j,l}^{\Delta} \}, \qquad (3)

w_{k,j,l}^{\Delta} = c_{j,l}(\boldsymbol{a}_j \oplus_l t_k) + \sum_{r=l}^{\alpha_j} \left[ c_{j,r+1}(\boldsymbol{a}_j \oplus_l t_k) - c_{j,r}(\boldsymbol{a}_j) \right], \qquad (4)

where a_j ⊕_l t_k symbolizes adding task t_k into the task list a_j of vehicle v_j at the lth position. The value of w^Δ_{k,j,l} in (4) is calculated for each possible value of l, and w*_k is taken as the minimum of these. Again, the summation term represents comparison of the time cost with the task t_k now included in the task list (first term) and the time cost without it (second term). This is calculated for all the tasks at and following position l in the task list. An IPI list γ*_p = [w*_1, …, w*_m]^T, p = 1, …, n, is thus compiled for each vehicle. Note that in the implementation of PI an infinity value is used for w*_k when a task is already included in a vehicle's task list. Intuitively, the RPI and w^Δ_{k,j,l} in (4) have the same value when a task is removed from a vehicle's task list and is then added back into the same task list in the same position, i.e. when i = j and b = l.
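The following Python sketch makes (2)-(4) concrete under an assumed cost model consistent with the problem description: the time cost of a task is the arrival time at its location, accumulating straight-line travel at a fixed speed plus a fixed service duration for each earlier task. The speed, service time, and positions are invented, and the helper names are not from the released implementation.

import math

SPEED = 50.0     # m/s, assumed vehicle speed
SERVICE = 300.0  # s, assumed fixed execution time per task

def arrival_times(start, route, pos):
    """Time costs c_{i,r}: arrival time at each task in the route."""
    times, here, clock = [], start, 0.0
    for t in route:
        clock += math.dist(here, pos[t]) / SPEED
        times.append(clock)     # cost excludes the task's own service
        clock += SERVICE        # but service delays all later tasks
        here = pos[t]
    return times

def rpi(start, route, pos, b):
    """Removal performance impact (2) of task route[b] (0-indexed)."""
    with_t = arrival_times(start, route, pos)
    without = arrival_times(start, route[:b] + route[b+1:], pos)
    return with_t[b] + sum(with_t[r] - without[r-1]
                           for r in range(b + 1, len(route)))

def ipi(start, route, pos, task):
    """Inclusion performance impact (3)-(4): best insertion position."""
    base = arrival_times(start, route, pos)
    best = math.inf
    for l in range(len(route) + 1):
        new = arrival_times(start, route[:l] + [task] + route[l:], pos)
        w = new[l] + sum(new[r+1] - base[r] for r in range(l, len(route)))
        best = min(best, w)
    return best

pos = {0: (0.0, 0.0), 1: (1000.0, 0.0), 2: (1000.0, 1000.0)}
print(rpi((0.0, 0.0), [0, 1], pos, b=0))      # 300.0
print(ipi((0.0, 0.0), [0, 1], pos, task=2))   # 640.0 (appended last)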
When removing a task t_k from the task list a_i of v_i, there is clearly benefit in adding it to the task list a_j of vehicle v_j if w_k(a_i ⊖ t_k) > w*_k(a_j ⊕ t_k), as this will decrease the overall cost by the difference between the two values. The novelty of the PI algorithm is the RPI and IPI concepts; its full structure, which is similar to the CBBA algorithm [2] and implemented in MATLAB, is now described.

At the start of the PI algorithm, the locations of the tasks and vehicles and the maximum start times are randomly generated, and the network topology (row, circular, mesh or hybrid) is defined. Also, the vehicle RPI lists and IPI lists are initialized to an m-sized vector holding the maximum MATLAB real number. The task lists, time cost lists and vehicle lists are initialized to m-sized vectors of -1, -1 and 0 values respectively.

During the consensus phase the vehicles exchange RPI lists, vehicle lists and also time stamps with all other vehicles in their range. When all the lists have been received, the consensus takes place, i.e. the RPI and vehicle lists are recomputed according to an adaptation of the CBBA action rules specified in Table 1 of [2], which stipulates conditions for updating (adopting another vehicle's lists), leaving (keeping the same lists), and resetting. These rules are based on comparing RPI values and determining which vehicle has the most up-to-date information. For example, if vehicle j is the sender and vehicle i is the receiver, and both vehicles claim task k, then if w_{j,k} < w_{i,k} the receiver's action is to update so that w_{i,k} = w_{j,k} and β_{i,k} = β_{j,k}. If the sender claims task k but the receiver credits it to a different vehicle p then, if either the time stamp for the information exchange between vehicles j and p is more recent than that between vehicles i and p, or if w_{j,k} < w_{i,k}, the receiver's action is the same.

After the consensus phase, the task removal phase of the algorithm begins. Tasks are marked as candidates for removal from a vehicle's task list if there is disagreement between the vehicle list computed in the consensus phase and the current task list, i.e. if a task is recorded on the task list, but that task is assigned to a different vehicle on the vehicle list. The RPI list γ⋄_i = [w_1, …, w_m]^T is then calculated according to (2) and iteratively compared with the previous RPI list γ_i = [w_1, …, w_m]^T that emerged from the consensus phase, for all candidate tasks d_i, i.e. the following is computed:

z = \max_{k=1,\dots,|\boldsymbol{d}_i|} \{ \gamma_{i,k}^{\diamond} - \gamma_{i,k} \}. \qquad (5)

If z ≥ 0 then the task yielding the maximum z is removed from both the task list and the candidate list, and the time cost is then re-calculated. In addition, γ⋄_i is computed again from (2), as its value changes following the removal of the task. Equation (5) is re-evaluated, and the process repeats until d_i = ∅. Any unremoved tasks are assigned to v_i in the vehicle list, i.e. β_i(d_i) = v_i, and the RPI list γ_i is set as the final γ⋄_i.
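The removal test in (5) can be read as the short loop below, sketched in Python; the helpers recompute_rpi and recompute_costs stand in for (2) and the time-cost update, the break when z < 0 is an assumed stopping behavior, and all RPI numbers in the demo are invented.

def removal_phase(task_list, candidates, gamma, recompute_rpi,
                  recompute_costs):
    """Remove the candidate with the largest gap z in (5) until z < 0
    or no candidates remain; unremoved tasks stay with the vehicle."""
    while candidates:
        gamma_new = recompute_rpi(task_list)          # gamma-diamond
        k = max(candidates, key=lambda t: gamma_new[t] - gamma[t])
        if gamma_new[k] - gamma[k] < 0:               # z from (5)
            break
        task_list.remove(k)
        candidates.remove(k)
        recompute_costs(task_list)                    # refresh c_{i,r}
    return task_list, candidates

# Toy demo: task 2's recomputed RPI exceeds its consensus value, so it
# is removed; task 1's does not, so it stays with the vehicle.
gamma = {1: 50.0, 2: 30.0}                     # consensus RPI values
fake_rpi = lambda tl: {1: 45.0, 2: 80.0}       # stand-in for (2)
print(removal_phase([1, 2], [1, 2], gamma, fake_rpi, lambda tl: None))
# -> ([1], [1])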
The next phase is the task inclusion phase, where the IPI list γ*_i = [w*_1, …, w*_m]^T is computed according to (3) and (4) and compared with the RPI list γ_i = [w_1, …, w_m]^T computed in the task removal phase, i.e. the following is calculated:

q = \max_{k=1,\dots,m} \{ \gamma_{i,k} - \gamma_{i,k}^{*} \}. \qquad (6)

If q > 0 then the task t_ζ yielding the maximum q is added to the task list of v_i at the position l that returns the minimum w*_ζ, and the time cost is re-calculated. The RPI of the task for v_i becomes the IPI of the task for v_i, and the vehicle list β_i is adjusted accordingly. The IPI is recalculated, (6) is re-evaluated, and the process repeats until no further tasks can be added to the task list. Finally, the RPI list γ_i = [w_1, …, w_m]^T is recalculated at the end of the phase.

The communication exchange, consensus, task removal and task inclusion phases continue iteratively until a suitable stopping criterion is met, for example, until no actions have been taken in the task removal and inclusion phases for more than two iterations. For further details of the PI algorithm see [1].

III. ACTION SELECTION MECHANISMS

A. Soft Max and ε-Greedy Action Selection

In the baseline PI algorithm task allocation is governed only by comparing the calculated RPI and IPI values using (5) and (6). This approach can restrict the solution search to local minima. There is thus a need for an additional mechanism that permits further exploration of the solution space.

There are two variants of the modified PI algorithm: one that uses ε-greedy action selection, and one that uses Boltzmann soft max selection. In ε-greedy selection the best option is selected for a proportion 1 − ε of trials, and a random option (with uniform probability) is selected for a proportion ε (0 ≤ ε ≤ 1). In the Boltzmann soft max method, selection is based on a fitness score f for the various options. If there are m items, and the fitness for item k is f_k, then the probability p_k of selecting item k is given by

p_k = \frac{e^{f_k/\tau}}{\sum_{j=1}^{m} e^{f_j/\tau}}. \qquad (7)

By varying the parameter τ it is possible to alter the selection strategy from picking a random item (τ infinite), to assigning higher probabilities to higher fitness (τ small and finite), to choosing only the item with the best fitness (τ tending to 0).

B. Integration with the PI Algorithm

In the proposed approach the ε-greedy or Boltzmann soft max action selection routines are integrated into the PI algorithm and a loop is constructed around the main program. Within the loop, different values of ε or τ respectively are trialed (in appropriate steps) until the best solution (i.e. that yielding the minimum φ value from (1)) is obtained. Pseudo code for the modified main program is presented in Algorithm 1.

Algorithm 1 Main Program
1: Initialize φ̂, ε̂ (ε-greedy) or τ̂ (soft max) to large values; set seed
2: for ε [τ] between start value and end value
3:   Reset seed
4:   Define world, vehicles, tasks, network topology
5:   for each vehicle i
6:     Initialize a_i, c_i, w*_i, w_i, β_i
7:   next
8:   while not converged
9:     Communicate w_i and β_i between vehicles
10:    Re-compute w_i and β_i according to CBBA rules
11:    for each vehicle i
12:      Carry out ε-greedy [soft max] task removal
13:      Carry out ε-greedy [soft max] task inclusion
14:    next
15:    Check convergence
16:    if converged
17:      Compute φ from (1)
18:      Results set = ℛ
19:      Mark as converged
20:    end if
21:  end while
22:  if φ < φ̂ AND problem has not failed
23:    φ̂ = φ, ε̂ = ε [τ̂ = τ], ℛ̂ = ℛ
24:  end if
25: next
26: Show final φ̂, ε̂ [τ̂], ℛ̂

In the ε-greedy variant, selection is modified when the RPI and IPI are calculated in the task removal and task inclusion phases respectively. In the task removal phase, after w_k has been calculated from (2), there is a probability ε (0 ≤ ε ≤ 1) that w_k is multiplied by a random factor δ (1 ≤ δ ≤ 1.5). For example, if ε = 0.25 there is a 25% probability that w_k is modified for each task in the task list and a 75% probability (1 − ε) that w_k is unaffected. The modification means there is a probability that the task yielding z in (5) will differ from the equivalent task in the baseline PI algorithm. Task inclusion works in a similar way.
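A minimal Python sketch of the ε-greedy perturbation just described: with probability ε, each freshly computed RPI (or IPI) value is inflated by a random factor δ drawn from [1, 1.5], so the arg-max in (5) (or (6)) can pick a different task than baseline PI would. The RPI numbers are invented.

import random

def perturb(w, epsilon):
    """Return w, scaled by delta ~ U[1, 1.5] with probability epsilon."""
    if random.random() < epsilon:     # rho < epsilon in Algorithms 2-3
        return w * random.uniform(1.0, 1.5)
    return w

rpi_values = {"t1": 120.0, "t2": 115.0, "t3": 90.0}
noisy = {t: perturb(w, epsilon=0.25) for t, w in rpi_values.items()}
print(noisy)                          # some values may be inflated
print(max(noisy, key=noisy.get))      # arg-max may differ from baseline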
Pseudo codes for the ε-greedy task removal and task inclusion routines are presented in Algorithms 2 and 3 respectively. Note that the upper and lower limits for δ were determined from pre-trials. Use of a fixed δ represents a more common form of ε-greedy selection, but the pre-trials demonstrated advantages (in terms of solution quality) in using a variable δ with minimum value 1 and maximum 1.5.

Algorithm 2 ε-greedy Task Removal
1: Compute candidate tasks for removal
2: while candidate list is not empty
3:   for each task in the task list
4:     if vehicle and task are compatible
5:       Set previous (and next) task and time cost
6:       Compute w_k from (2)
7:       Set ρ, 0 ≤ ρ ≤ 1.0 and δ, 1 ≤ δ ≤ 1.5
8:       if ρ < ε
9:         Set w_k = δw_k
10:      end if
11:    end if
12:  next
13:  Compute z from (5)
14:  if z ≥ 0
15:    Remove task yielding max z from task list
16:    Remove task yielding max z from candidate list
17:    Re-calculate time cost list
18:  end if
19: end while
20: Put unremoved tasks back into vehicle list

Algorithm 3 ε-greedy Task Inclusion
1: while tasks in task list not at upper limit
2:   for each task in problem
3:     if vehicle and task are compatible
4:       if task not already in task list
5:         for each insertion position
6:           Set previous task and time cost
7:           Compute w*_k from (3) and (4)
8:           Set ρ, 0 ≤ ρ ≤ 1.0 and δ, 1 ≤ δ ≤ 1.5
9:           if ρ < ε
10:            Set w*_k = δw*_k
11:          end if
12:        next
13:      end if
14:    end if
15:  next
16:  Compute q from (6) and l from (3) and (4)
17:  if q > 0
18:    Add task yielding max q to task list at position l
19:    Update vehicle list
20:    Set w_k = w*_k
21:    Recalculate time cost list
22:  end if
23: end while
24: Recalculate γ_i

In the Boltzmann soft max version, selection is modified in a different way. The RPI and IPI are calculated for each task as in the baseline PI algorithm. For task removal the arrays λ, ξ (the fitness) and σ (related to the numerator in (7)) are then determined from:

\boldsymbol{\lambda} = \boldsymbol{\gamma}^{\diamond}[\boldsymbol{d}] - \boldsymbol{\gamma}[\boldsymbol{d}], \qquad (8)
\mu = \min\{\boldsymbol{\lambda}\}, \qquad (9)
\mu^{*} = |\mu| \;\;\forall\; \mu < 0, \qquad (10)
\mu^{*} = 0 \;\;\forall\; \mu \geq 0, \qquad (11)
\boldsymbol{\xi} = \boldsymbol{\lambda} + \mu^{*}, \qquad (12)
\boldsymbol{\sigma} = e^{\boldsymbol{\xi}/\tau}. \qquad (13)

For task inclusion λ is calculated from

\boldsymbol{\lambda} = \boldsymbol{\gamma} - \boldsymbol{\gamma}^{*}. \qquad (14)

Calculation of μ (the adjustment factor that removes negative values) is slightly more complex for task inclusion, as some members of the λ array may have values equal to MATLAB's largest possible value R from initialization. Thus, λ is first adjusted so that any such members have their values scaled by a factor R. If λ* represents the adjusted λ array, then

\mu = \min\{\boldsymbol{\lambda}^{*}\}, \qquad (15)

and μ* is given by (10) and (11) as before. The fitness is defined as

\boldsymbol{\xi} = \boldsymbol{\lambda}^{*} + \mu^{*}, \qquad (16)

and σ is given by (13) as before. For both task removal and task inclusion, the probability p_k of task k being selected is given by

p_k = \frac{\sigma_k}{\sum_{j=1}^{m} \sigma_j} \qquad (17)

from (7). To facilitate this, a random number ρ is generated for each iteration of the task removal and task inclusion phases, and this number determines which task is selected for removal or inclusion according to (17). By varying τ in (13) it is possible to control the reliance of the strategy upon probability. Pseudo codes for the Boltzmann soft max task removal and task inclusion routines are presented in Algorithms 4 and 5 respectively. Note that the value of z is still calculated from (5) for task removal, even if a different task is selected (i.e. if the task yielding the maximum value is not selected). For task inclusion q is not calculated from (6); instead, it is taken as λ_j, where j represents the selected task, and the position of insertion in the task list, l, is still taken as that yielding the minimum w*_k from (3).
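The fitness shifting and normalisation in (8)-(13) and (17) amount to the following Python sketch for task removal; the λ values are invented, and the weighted draw mirrors the random number ρ used in the implementation.

import math, random

def softmax_probs(lam, tau):
    """Selection probabilities from (8)-(13) and (17)."""
    mu = min(lam)                              # (9)
    mu_star = abs(mu) if mu < 0 else 0.0       # (10)-(11)
    xi = [v + mu_star for v in lam]            # (12): non-negative fitness
    sigma = [math.exp(v / tau) for v in xi]    # (13)
    total = sum(sigma)
    return [s / total for s in sigma]          # (17)

lam = [25.0, -10.0, 5.0]              # gamma-diamond minus gamma, as in (8)
probs = softmax_probs(lam, tau=10.0)
print(probs)                          # largest RPI difference dominates
chosen = random.choices(range(len(lam)), weights=probs)[0]
print(chosen)
# A small tau concentrates probability on the best task; a large tau
# approaches uniform random selection, as noted for (7) above.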
Note that, in theory, parameter reselection could be carried out at any time, to cope with dynamic changes in the environment. Although this would impose an additional computational burden during run time, the impact can be minimized by limiting the search to a region close to the original optimal parameter, assessing overall mission time, and imposing suitable stopping criteria.

Algorithm 4 Soft Max Task Removal
1: Compute candidate tasks for removal
2: while candidate list is not empty
3:   for each task in the task list
4:     if vehicle and task are compatible
5:       Set previous (and next) task and time cost
6:       Compute w_k from (2)
7:     end if
8:   next
9:   Set λ, μ, μ*, ξ, σ from (8) to (13)
10:  Compute probabilities of task selection from (17)
11:  Generate ρ, 0 ≤ ρ ≤ 1.0
12:  Select task for removal based on probabilities
13:  Compute z from (5)
14:  if z ≥ 0
15:    Remove selected task from task list
16:    Remove selected task from candidate list
17:    Re-calculate time cost list
18:  end if
19: end while
20: Put unremoved tasks back into vehicle list

Algorithm 5 Soft Max Task Inclusion
1: while tasks in task list not at upper limit
2:   for each task in problem
3:     if vehicle and task are compatible
4:       if task not already in task list
5:         for each insertion position
6:           Set previous task and time cost
7:           Compute w*_k from (3) and (4)
8:         next
9:       end if
10:    end if
11:  next
12:  Set λ, λ*, μ, μ* from (14)-(15) and (10)-(11)
13:  Set ξ, σ from (16) and (13)
14:  Compute probabilities of task selection from (17)
15:  Generate ρ, 0 ≤ ρ ≤ 1.0
16:  Select task for inclusion based on probabilities
17:  Calculate λ_j for selected task j
18:  Compute l from (3) and (4)
19:  if λ_j > 0
20:    Add task j to task list at position l
21:    Update vehicle list
22:    Set w_k = w*_k
23:    Recalculate time cost list
24:  end if
25: end while
26: Recalculate γ_i

IV. EXPERIMENTS

A. Methodology

In the experiments nine seeds were used to create different 3-dimensional cases with 4, 6, 8, 10, 12, 14, and 16 vehicles, i.e. 63 different problems were tackled, each using a row communication topology. The problems were solved using CBBA, the baseline PI algorithm, and the two PI variants proposed in this paper. If a problem was solved, i.e. each task was completed on time, then φ was calculated and recorded. If some tasks were not completed then the number of failed tasks for each task type was recorded instead.

For ε-greedy selection, ε values between 0.01 and 0.99 were trialed in steps of 0.01. However, pre-trials showed that the program execution time for all 99 ε values became too large as n increased. To avoid delays the stopping values were changed to 0.35 for 10 and 12 vehicles and to 0.15 for 14 and 16 vehicles. For the same reason, soft max selection τ values between 1 and 100 in steps of 1 were used for up to 8 vehicles, but the stopping value was adjusted to τ = 50 for 10, 12 and 14 vehicles, and to τ = 20 for 16 vehicles. These adjustments meant that run time never exceeded 7 minutes for ε-greedy selection (3 minutes for soft max selection, which proved to be a faster algorithm) in the MATLAB implementations used here.
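The outer parameter search in Algorithm 1 amounts to the loop below; solve_pi is a hypothetical stand-in that runs the modified PI algorithm once on the seeded problem and returns (φ, solved), not the authors' code.

def sweep(solve_pi, start, stop, step):
    """Re-solve the seeded problem for each parameter value and keep
    the assignment with the lowest phi, as in Algorithm 1."""
    best_phi, best_param = float("inf"), None
    n_steps = int(round((stop - start) / step))
    for i in range(n_steps + 1):
        param = start + i * step
        phi, solved = solve_pi(param)    # one full seeded PI run
        if solved and phi < best_phi:
            best_phi, best_param = phi, param
    return best_phi, best_param

# Toy stand-in: phi is a smooth function of epsilon with a dip at 0.3.
print(sweep(lambda e: (280 + 100 * (e - 0.3) ** 2, True),
            0.01, 0.99, 0.01))           # ~ (280.0, 0.3)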
In CBBA, values of 0.001 were used for λ and also for F (the vehicle fuel penalty) to match the scale of the worlds generated. The CBBA score function used was

\mathcal{H} e^{-\lambda t} - F d,

where H is the reward associated with a task (H = 100 for all tasks) and d is the distance between vehicle and task.
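A minimal sketch of this score; the interpretation of t as the arrival time at the task follows the CBBA formulation in [2], and the example values below are invented.

import math

H, LAM, F = 100.0, 0.001, 0.001   # reward, discount, fuel penalty

def cbba_score(t, d):
    """Time-discounted task reward minus the fuel (distance) penalty."""
    return H * math.exp(-LAM * t) - F * d

print(cbba_score(t=400.0, d=3000.0))  # 100*e^(-0.4) - 3 ~= 64.03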
B. Results

Table I shows the calculated φ values for each algorithm and scenario and, in the case of the PI variants, the ε and τ values that produced the best solution are also displayed. The percentage improvement of the best algorithm (highlighted in bold) over baseline PI is also shown in the last column, where appropriate. If the best algorithm is able to solve a problem that baseline PI cannot, a '+' symbol is shown. If no algorithm can solve a problem, a '−' symbol is displayed. Table II shows the time taken in seconds for 15 iterations of the ε-greedy and soft max PI variants when case 4 is solved. This is the time taken when the maximum ε is set to 0.15 and a step-size of 0.01 is used, and when the maximum τ is set to 15 with increments of size 1. These times are compared with the corresponding run times of the baseline PI algorithm, which only executes a single iteration. For comparison purposes this value is also multiplied by 15 in the next column. If an 'F' is shown in the table then the algorithm failed to solve the problem and, if a 'T' is shown, this indicates that the run was terminated because a 600 second limit was reached.

C. Discussion

CBBA demonstrated a high failure rate; out of the 63 problems trialed it was only able to solve 12. It offered the best solution in only 1 problem (for 4 vehicles, case 8), but in this case the ε-greedy and soft max PI variants also produced the same solution. The PI variants consistently out-performed the original in terms of the average rescue time; up to about 8% improvement was observed. In addition, there were 3 and 4 problems respectively that the ε-greedy and soft max variants could solve that the baseline could not. They improved the average solution time (or were able to offer solutions when the baseline could not) for 92% of the problems; the remaining 8% of problems could not be solved by any algorithm.

TABLE I. φ VALUES
[Per-case results for n = 4 to 16 vehicles, cases 1 to 9: φ values (or failed-task counts) for CBBA, baseline PI, the ε-greedy variant with its best ε, and the soft max variant with its best τ, plus the percentage improvement of the best algorithm over baseline PI in the final column.]

In terms of average rescue time φ, the ε-greedy variant generally performed better than the soft max variant, especially when vehicle and task numbers were low (fewer than 16 vehicles). However, for higher numbers of vehicles the search space had to be narrowed more for the ε-greedy variant to preserve run-time efficiency, which led to a decrease in its performance compared with the soft max variant. For example, for 8 vehicles the ε-greedy variant produced the best average rescue time in all cases. However, it is worth noting that, for this number of vehicles, the ε-greedy variant took much longer than the soft max variant to find the best solution, yet the soft max version was still able to outperform the baseline in most cases.

TABLE II. ALGORITHM RUN TIMES IN SECONDS

n    PI    PI*15   ε-greedy   Soft max
4    0.1    1.7      1.0        1.0
6    0.2    3.5      2.7        3.0
8    0.6    9.1      9.6        9.2
10   0.9   13.4    117.1       18.8
12   1.5   22.5     44.8       29.8
14   3.2   48.5    189.6       54.9
16   F      -        T         83.5

The values of ε and τ that produced the best solutions varied greatly.
For example, for 6 vehicles, the best ε value was 0.01 in one case, but it was 0.91 in another. Similarly, for 8 vehicles the best τ was 2 in one of the cases but was 94 in another. Unfortunately, this means that it is not possible to narrow the scope of the search to save run time whilst still guaranteeing the best solution; if the scope of the search is reduced then the opportunity to find the best solution may be missed. Thus, there is a compromise between obtaining the best possible solution and minimizing run time. In these experiments, as the number of vehicles increased, it was necessary to reduce the ranges of ε and τ values that were searched to preserve run-time efficiency. This meant that the best solutions were sometimes missed. For example, for 10 vehicles the maximum ε was set to 0.35, and this produced a best solution of 261.09 with ε = 0.15 for case 8. However, if the maximum ε had remained at 0.99, then a better solution (258.95) would have been obtained when ε reached 0.89.

Table II shows that the baseline PI algorithm runs almost instantaneously, for example, taking only about a second when there are 10 vehicles, and its run time increases steadily as the number of vehicles rises. The run time of the soft max variant is comparable with column 3 of the table, which multiplies baseline PI's run time by 15. This suggests that running the soft max variant is approximately equivalent to running the baseline the same number of times, i.e. soft max takes longer only because it executes a search; there are no additional complications in its architecture that slow it down. This is not the case for the ε-greedy variant; for smaller numbers of vehicles (< 10) it behaves in a similar manner to the soft max variant, but shows much longer run times for higher numbers of vehicles (≥ 10). Moreover, these run times are erratic, i.e. they do not rise steadily with the number of vehicles. For example, when 10 vehicles are used with case 4 the algorithm takes about 117 s to run, but converges in about 45 s when 12 vehicles are used. Further investigation has shown that the problem is caused by the number of inner iterations of the task allocation phase. For some problems this number can become very high for the ε-greedy variant, because new probability parameters are generated each time the RPI and IPI are calculated for each task in the task list. With the soft max variant, the RPI and IPI are calculated for every task in the list first, before the probability parameters are created. Limiting the number of internal iterations in the ε-greedy variant should overcome this problem. Additionally, the time efficiency of both variants could be improved by terminating the search once the objective function has reached a given percentage of the best value computed so far.

An important consideration is that neither of the proposed algorithms is as efficient as the original PI algorithm in terms of actual run time. This matters for the problems considered here, where the maximum mission time is 2000 s (about half an hour) and mean rescue times are about 280 s (about four and a half minutes). In these problems any savings in mean rescue time offered by the amended algorithms are counterbalanced by run-time losses. For example, if there is a 4% saving in mean rescue time, this equates to about 11 s for each task. If there are 28 tasks the total saving is 308 s, or about 5 minutes. However, if the modified algorithm takes 5 minutes to run, then any gain is eliminated.
For smaller time-scale problems this means that it is important to reduce the search space accordingly as n increases, even if this means missing fitter solutions. For more realistic larger-scale problems, for example problems with a maximum mission time in terms of days and mean rescue start times in terms of hours, the algorithm's initial run-time efficiency is not as important, as it runs in terms of minutes and seconds and thus has much less impact on overall mean rescue time. In these cases it would be possible to examine a wider range of the spectrum of possible ε and τ values to be certain of finding the optimal or near-optimal solution without compromising the effectiveness or efficiency of the mission. In addition, if a new parameter search is required because new information has become available, then the computational burden can be reduced by using prior knowledge of the original optimal parameters, i.e. the search can be conducted within a much smaller range that centers on the previously selected parameter. This means that problem solution using modified PI is possible online, and can be applied to dynamically changing problems.

V. CONCLUSIONS AND FUTURE WORK

In this paper, the distributed Performance Impact (PI) task-allocation algorithm has been enhanced to include a degree of either ε-greedy or soft max action selection, thus introducing a level of exploration to the architecture, much like a genetic algorithm, which uses random cross-over and mutation to obtain more variation in the solution. These variations can permit new areas of the search space to be explored and hence can improve solution fitness. Indeed, in the experiments performed here the introduction of more exploration improved baseline PI's task allocation performance by up to about 8%, and enabled the algorithm to solve additional problems that failed using the baseline version. Baseline PI has been shown to outperform the popular state-of-the-art CBBA algorithm [2] in this paper and also in [1]. The additional action selection mechanisms detailed in this paper enable improved performance at relatively low cost. This work thus represents an advance in the state-of-the-art in ST-SR-TA multi-agent task planning.

Future work will aim to improve the time efficiency of the ε-greedy and soft max PI variants. This will be achieved by limiting the number of inner iterations in the task allocation phase in the ε-greedy variant and by stopping the search in both variants when the objective function has already improved by a given percentage of its best value. The tests carried out in this paper will also be repeated using different network topologies, and the effects of introducing uncertainty into key parameters will be explored. Finally, the baseline PI algorithm will be adapted to allow it to reschedule tasks as soon as new information becomes available.

ACKNOWLEDGMENT

This work was supported by EPSRC (grant number EP/J011525/1) with BAE Systems as the leading industrial partner.

REFERENCES

[1] W. Zhao, Q. Meng, and P. W. H. Chung, "A Heuristic Distributed Task Allocation Method for Multivehicle Multitask Problems and Its Application to Search and Rescue Scenario," IEEE Transactions on Cybernetics, DOI: 10.1109/TCYB.2015.2418052, April 2015.
[2] H.-L. Choi, L. Brunet, and J. P. How, "Consensus-Based Decentralized Auctions for Robust Task Allocation," IEEE Transactions on Robotics, vol. 25, no. 4, Aug. 2009, pp. 912–926.
[3] B. P. Gerkey and M. J. Matarić, "A Formal Analysis and Taxonomy of Task Allocation in Multi-robot Systems," Intl. J. of Robotics Research, vol. 23, no. 9, Sept. 2004, pp. 939–954.
[4] J. L. Bruno, E. G. Coffman, and R. Sethi, "Scheduling Independent Tasks to Reduce Mean Finishing Time," Communications of the ACM, vol. 17, no. 7, 1974, pp. 382–387.
[5] M. B. Dias and A. Stentz, "Opportunistic Optimization for Market-Based Multi-robot Control," in Proc. of the IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), Lausanne, Switzerland, 2002, pp. 2714–2720.
[6] C. Liu and A. Kroll, "Memetic Algorithms for Optimal Task Allocation in Multi-robot Systems for Inspection Problems with Cooperative Tasks," Soft Comput., April 2014.
[7] C. Liu and A. Kroll, "A Centralized Multi-robot Task Allocation for Industrial Plant Inspection by Using A* and Genetic Algorithms," in L. Rutkowski et al. (Eds.): ICAISC 2012, Part II, LNCS 7268, 2012, pp. 466–474.
[8] J. A. Fax and R. M. Murray, "Information Flow and Cooperative Control of Vehicle Formations," IEEE Trans. Autom. Control, vol. 49, no. 9, Sept. 2004, pp. 1465–1476.
[9] W. Ren, R. Beard, and D. Kingston, "Multi-agent Kalman Consensus with Relative Uncertainty," in Proc. Amer. Control Conf., 2006, pp. 1865–1870.
[10] G. Oliver and J. Guerrero, "Auction and Swarm Multi-Robot Task Allocation Algorithms in Real Time Scenarios," in Toshiyuki Yasuda (Ed.): Multi-Robot Systems, Trends and Development, InTech, ISBN: 978-953-307-425-2, 2011, pp. 437–456.
[11] D. P. Bertsekas, "The Auction Algorithm for Assignment and Other Network Flow Problems," Mass. Inst. Technol., Cambridge, MA, Tech. Rep., 1989.
[12] K. Zhang, E. G. Collins Jr., and D. Shi, "Centralized and Distributed Task Allocation in Multi-robot Teams via a Stochastic Clustering Auction," ACM Trans. Autonom. Adapt. Syst., vol. 7, no. 2, July 2012, Article 21.
[13] S. Zaman and D. Grosu, "A Combinatorial Auction-based Mechanism for Dynamic VM Provisioning and Allocation in Clouds," IEEE Transactions on Cloud Computing, vol. 1, no. 2, 2013, pp. 129–141.
[14] M. G. Lagoudakis, M. Berhault, S. Koenig, P. Keskinocak, and A. J. Kleywegt, "Simple Auctions with Performance Guarantees for Multi-robot Task Allocation," in Proc. of the 2004 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, vol. 1, 2004, pp. 698–705.