Scalable Decentralised Control for Multi-Platform Reconnaissance and Information Gathering Tasks

George M. Mathews and Hugh F. Durrant-Whyte
ARC Centre of Excellence for Autonomous Systems, The University of Sydney, Sydney NSW 2006, Australia
[email protected]

Abstract - This paper describes the general multi-platform reconnaissance or information gathering task. It is shown that the general problem must be solved in a centralised manner using information supplied by every platform. However, if for a specific problem the objective function is partially separable, it is shown that the optimal control problem can be solved in a decentralised fashion for an arbitrarily sized group. This is demonstrated for scenarios with Gaussian probabilities and for an indoor mapping problem.

Keywords: Cooperative Control, Decentralised Information Gathering, Active Estimation, Exploration

1 Introduction

Reconnaissance or information gathering problems cover any scenario requiring information to be gathered from the external environment. This includes environmental monitoring, searching, surveillance, target tracking, mapping and exploration. Utilising an autonomous system to perform these tasks allows a human operator to be removed from possibly dangerous (or simply mundane) situations. A system comprising multiple autonomous sensor platforms offers additional benefits, such as increased spatial coverage and greater system redundancy. However, the system and control complexity grows rapidly with the number of platforms [1]. Existing approaches to this problem are either: (1) fully centralised [2, 3], where the system as a whole is modelled and controlled using conventional techniques; (2) distributed or hierarchical [4], which utilise the distributed computational capacity of the multiple platforms but require a single control centre to resolve global constraints or perform resource allocations; or (3) decentralised, requiring no centralised facility at all.
The decentralised approach can be further broken down into systems that require each platform to have global information about the group [5] and those that require only local information from a small subset. It is the latter that is of interest here. The complexity of these systems is independent of the number of platforms, and they achieve the full benefits offered by multi-agent systems.

This paper formulates the general multi-platform reconnaissance or information gathering problem for known platform dynamics. In general, the single step optimal control problem must be solved in a centralised manner using information from every platform. This is computationally intractable and physically infeasible (due to bandwidth considerations) for large teams. In this paper the work of Grocholsky [6] is generalised and a special form of objective function, called partially separable, is presented. This allows arbitrarily sized teams to solve the optimal control problem in a fully scalable and decentralised fashion.

This paper is organised as follows. The general optimal control problem is formulated in Section 2. Section 3 presents the partially separable form of the group objective function and demonstrates its scalability using a scenario with Gaussian probabilities. Section 4 demonstrates that the indoor mapping scenario has the same partially separable group objective function. Experimental results for the indoor mapping problem are presented in Section 5. Section 6 presents conclusions and future work.

2 The Information Gathering Problem

This paper is concerned with the best way to control a group of sensing platforms undertaking a reconnaissance or information gathering task. This scenario requires information to be gathered on a particular state variable x ∈ X. This may include the positions of targets, terrain properties of a given region, or surface information of a remote planet, for example.
This state will be observed by a group of n sensor platforms, jointly described by the state variable y = {1y, . . . , ny} ∈ Y, where Y is the product of the individual platform state spaces iY. It is assumed that the state x is not influenced by the sensor platforms; thus x evolves according to the probabilistic discrete time Markov motion model P(xk | xk−1).

The ith sensor platform is controlled by applying a particular control action iuk ∈ iU. In general this causes the platform's state to change according to the probabilistic motion model P(iyk | iyk−1, iuk). However, in this paper it is assumed that the platform's motion is known with precision, i.e. iyk = if(iyk−1, iuk). The group state transition model is given by

yk = f(yk−1, uk) = {1f(1yk−1, 1uk), . . . , nf(nyk−1, nuk)}.

The observations made by the ith platform are modelled by the conditional density P(izk | xk, iyk), which describes the probability of obtaining a particular observation izk given the states xk and iyk. The notation zk will denote the set of observations from all the platforms at time step k, i.e. zk = {1zk, . . . , nzk} ∈ Z, where Z is the product of the individual observation spaces iZ. Assuming that the observations are conditionally independent given the states xk and yk, the combined sensor model is

P(zk | xk, yk) = ∏i=1..n P(izk | xk, iyk).

The platforms are to be controlled such that the combined observations they receive produce the most informative (or least uncertain) belief about x. To accomplish this, the distribution's entropy will be used as a measure of its associated uncertainty. The entropy of a distribution P(x) is the negative of the expectation of its logarithm

Hx = Ex[−log P(x)].   (1)

This can be used for continuous variables (in a fixed coordinate frame), where P(x) is the probability density function, or for discrete variables, where P(x) is the probability distribution function. For a detailed description of the properties of this metric see [7].
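As a concrete illustration of (1), the following minimal sketch (not part of the original paper) computes the entropy of a discrete belief in nats:

```python
import math

def entropy(p):
    """H_x = E_x[-log P(x)] for a discrete distribution, in nats (Eq. 1)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

# A uniform belief is maximally uncertain; a peaked belief is more informative.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.94, 0.02, 0.02, 0.02]
assert entropy(uniform) > entropy(peaked)
```

Terms with zero probability are skipped, following the usual convention 0 log 0 = 0.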
2.1 Bayesian Filtering

This section outlines the recursive Bayesian estimation process [8]. To start, consider the system at a given time step k − 1. The team's state is given by yk−1 and the belief about xk−1, conditioned on all past observations and platform configurations, is P(xk−1 | Zk−1, Yk−1), where Zk−1 = {z1, . . . , zk−1}, Yk−1 = {y1, . . . , yk−1} and Z0 = Y0 = {∅}. When a group control action uk is taken, the new state of the platforms becomes yk(uk) = f(yk−1, uk), and an observation zk is taken. To update the belief about xk, it must first be predicted forward in time using the Chapman-Kolmogorov equation

P(xk | Zk−1, Yk−1) = ∫X P(xk | xk−1) P(xk−1 | Zk−1, Yk−1) dxk−1.   (2)

The belief can now be updated using Bayes rule

P(xk | Zk, Yk) = (1/µ) P(xk | Zk−1, Yk−1) ∏i=1..n P(izk | xk, iyk),   (3)

where zk = {1zk, . . . , nzk} are the actual observations taken by the platforms. The term P(izk | xk, iyk) is the ith platform's observation model evaluated at the actual observation and platform configuration, resulting in a likelihood over xk. The normalisation constant µ is given by

µ = P(zk | Zk−1, Yk)   (4)
  = ∫X P(xk | Zk−1, Yk−1) ∏i=1..n P(izk | xk, iyk) dxk.   (5)

The key problem in this process is deciding on the optimal group control action uk.

2.2 Group Utility

The objective of the group is to minimise its uncertainty in x. There are many ways to formally define this, the best being a discounted infinite horizon dynamic programming problem. As a simple approximation to this, a single step look ahead will be considered. At time step k − 1, the utility of a future group action uk and observation zk is related to the amount of uncertainty in the resulting posterior belief at time k. Actions and observations that produce a smaller uncertainty in the posterior belief are favoured over others. Thus a suitable utility function is the negative posterior entropy, such that when the utility is maximised, the posterior uncertainty is minimised

Uk(zk, uk) = −Hxk|Zk,Yk(zk, uk).   (6)

However, the actual observations made will not be known in advance. Thus, an expectation over all possible future observations is performed. The expected utility, or group objective function, is

Jk(uk) = Ezk[Uk(zk, uk)] = Ezk[−Hxk|Zk,Yk(zk, uk)].   (7)

Using the definition of entropy and substituting (3) and (5), this becomes

Jk(uk) = Ezk Exk[log P(xk | Zk, Yk(uk))]
       = Ezk Exk[ log( P(xk | Zk−1, Yk−1) ∏i=1..n P(izk | xk, iyk) / P(zk | Zk−1, Yk) ) ].   (8)

For continuous variables the double expectation is given by a double integral over the joint density of zk and xk, which can be expanded as

Jk(uk) = ∫Z ∫X P(xk | Zk−1, Yk−1) ∏i=1..n P(izk | xk, iyk)
         × log( P(xk | Zk−1, Yk−1) ∏i=1..n P(izk | xk, iyk) / P(zk | Zk−1, Yk) ) dxk dzk.   (9)

The dependence of this function on the control variables uk is embedded in the sensor model P(izk | xk, iyk) through the platform configuration iyk = if(iyk−1, iuk).

3 Decentralised Control

The single step group optimal control problem is defined by

uk* = arg max_{uk ∈ U} Jk(uk).   (10)

In general, the communication complexity is determined by what each agent is required to know about the other agents, while the computational requirements relate to the inherent complexity of the optimisation task. To evaluate the general objective function in (9), the motion and observation models from all the sensor platforms are required. Thus the communication complexity of the general optimal control problem will be strongly dependent on the size of the group. This constraint is a major limiting factor on the scalability of any distributed control algorithm, and no benefits are gained through a decentralised implementation. However, there exist certain objective functions which allow the control problem to be solved in a scalable decentralised fashion.
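Looking back at the filtering step of Section 2.1, the prediction (2) and multi-platform update (3)-(5) can be sketched for a discrete state. This toy three-state example is illustrative only; none of the numbers come from the paper:

```python
def predict(P_trans, belief):
    """Chapman-Kolmogorov prediction (Eq. 2): sum over x_{k-1}."""
    n = len(belief)
    return [sum(P_trans[i][j] * belief[j] for j in range(n)) for i in range(n)]

def fuse(prior, likelihoods):
    """Bayes update (Eq. 3): posterior proportional to prior times the
    product of the conditionally independent platform likelihoods."""
    post = list(prior)
    for lik in likelihoods:
        post = [p * l for p, l in zip(post, lik)]
    mu = sum(post)                     # normalisation constant (Eqs. 4-5)
    return [p / mu for p in post]

P_trans = [[0.9, 0.1, 0.0],            # P(x_k | x_{k-1}), columns sum to 1
           [0.1, 0.8, 0.1],
           [0.0, 0.1, 0.9]]
belief = [0.5, 0.3, 0.2]               # belief at time k-1
prior = predict(P_trans, belief)       # predicted prior at time k
post = fuse(prior, [[0.9, 0.1, 0.1],   # platform 1 observation likelihood
                    [0.8, 0.2, 0.3]])  # platform 2 observation likelihood
assert abs(sum(post) - 1.0) < 1e-12
```

Each platform contributes one likelihood factor, exactly as in the combined sensor model of Section 2.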
3.1 Fully Separable

The simplest case occurs when the objective function can be separated into a sum of terms from each platform

Jk(uk) = Σi=1..n iφk(iuk),   (11)

where iφk : iU → ℜ. The objective function (9) can be written in this form if the individual platforms observe independent components of the state x. The group optimal control problem is then solved by each platform independently maximising its decoupled function iφk(iuk). However, this form is not often encountered in practice.

3.2 Partially Separable

An objective function is partially separable if it can be redefined from a mapping from the space of team controls, 1U × · · · × nU, to a mapping from a common group impact space S. An element of this impact space represents all information relevant to the group objective:

Jk(uk) = ψk(Tαk),   (12)

where Tαk ∈ S is the combined group impact. This enables the number and individual models of the platforms in the group to be abstracted away.

To demonstrate how this group impact is obtained, consider a group containing only one platform. The impact this platform generates is given by its impact function, defined as

iΥk : iU → S,   (13)

such that Tαk = 1αk = 1Υk(1uk). This function abstracts away the individual model and state of the platform. If another platform is added, there must exist a commutative and associative binary operator ∗ for the impact space that combines impacts, i.e.

Tαk = 1,2αk = 1Υk(1uk) ∗ 2Υk(2uk).   (14)

Extending this further, an objective function is partially separable if it can be written in the form

Jk(uk) = ψk( 1Υk(1uk) ∗ · · · ∗ nΥk(nuk) ).   (15)

One interpretation of this definition can be given by considering the impact space as a vector space. Each platform has an impact vector iΥk(iuk); these are combined in some fashion using the operator ∗ (e.g. addition). The objective function is then given by a function of this combined vector (e.g. its norm). (The fully separable form of (11) is recovered if S ≡ ℜ, iΥk ≡ iφk and ∗ ≡ +.) If the objective function has this form, the group optimal control problem becomes

uk* = arg max_{uk ∈ U} ψk( 1Υk(1uk) ∗ · · · ∗ nΥk(nuk) ).   (16)

To demonstrate how this allows the group control problem to be solved in a decentralised fashion, a linearised Gaussian example will be given.

3.3 Gaussian Example

This section considers the multi-platform information gathering problem with all densities assumed Gaussian. It follows the work of Grocholsky [6] and demonstrates that it has the partially separable form. The predicted prior for time step k, given all information up to k − 1, is assumed Gaussian with mean x̂k|k−1 and covariance Pk|k−1:

P(xk | Zk−1, Yk−1) ∼ N(xk; x̂k|k−1, Pk|k−1).   (17)

The ith platform's observation model is given in linearised form as

P(izk | xk, iyk) ∼ N(izk; ih(xk, iyk), iRk),   (18)

where ih(xk, iyk) = iak + iHk xk. The vector iak and the matrix iHk are both generally dependent on x̂k|k−1 and iyk (and hence the control iuk) through the linearisation.

For a given group control action uk and observation zk, the posterior on xk can be computed using Bayes rule (3). This is identical to a Kalman filter update

P(xk | Zk, Yk) ∼ N(xk; x̂k|k, Pk|k),   (19)

with mean and covariance given in inverse covariance form

P⁻¹k|k = P⁻¹k|k−1 + Σi=1..n iHkᵀ iRk⁻¹ iHk,   (20)

P⁻¹k|k x̂k|k = P⁻¹k|k−1 x̂k|k−1 + Σi=1..n iHkᵀ iRk⁻¹ izk,   (21)

where (·)⁻¹ and (·)ᵀ are the matrix inverse and transpose operators respectively.

The group utility function defined in Section 2.2 is the negative of the posterior entropy. For a Gaussian density the entropy is a function of the determinant of the covariance matrix [7]

Hxk|Zk,Yk(zk, uk) = (1/2) log( (2πe)^dx |Pk|k| ),   (22)

where dx is the dimensionality of the state space X. An important property of the Gaussian representation is that the posterior entropy is not a function of the actual observation made.
This can be seen by examining the update equation (20), which depends only on the predicted prior covariance Pk|k−1 and the observation models iHk, iRk of the platforms. Thus the objective function Jk(uk) can be written as

Jk(uk) = −(1/2) log( (2πe)^dx |Pk|k| ).   (23)

This can be simplified by noting that the objective function is only required to provide a preference ordering over actions, and thus it can be replaced by

Jk(uk) = |P⁻¹k|k|.   (24)

Combining (20) and (24), the objective function becomes

Jk(uk) = |P⁻¹k|k−1 + Σi=1..n iIk(iuk)|,   (25)

where iIk(iuk) = iHkᵀ iRk⁻¹ iHk. This has the partially separable form specified in (15), where the impact space S contains all dx × dx symmetric matrices and the impact for each agent is given by the impact function

iαk = iΥk(iuk) = iIk(iuk).   (26)

The combination operator is matrix addition, hence the team impact is

Tαk = 1Υk(1uk) ∗ · · · ∗ nΥk(nuk) = Σi=1..n iIk(iuk).   (27)

Thus the objective function is

Jk(uk) = ψk(Tαk) = |P⁻¹k|k−1 + Tαk|.   (28)

The benefit of this formulation is that each agent only needs the combined impact from the rest of the team to evaluate the group objective function. The number of platforms, what types of sensors they have, how they move, even the individual impacts of each agent are irrelevant.

3.4 Decentralised Solution Method

The previous section defined what each agent must know for the objective function to be evaluated. This will now be extended and a decentralised algorithm to reach the optimal group controls presented. The proposed method makes one additional assumption on the structure of the impact space: a unary inverse operator² exists on the impact space S under the operation ∗, and will be denoted (·)⁻¹. It is assumed each sensor platform has some processing capability and authority over its control input iuk.
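Before turning to the algorithm, the Gaussian objective of Section 3.3 (Eqs. 25-28) can be sketched as follows. The example restricts itself to a 2-D state with hypothetical single-row observation models, so the linear algebra stays explicit; the models are illustrative, not from the paper:

```python
def impact(h, r):
    """iY(iu) = iH' iR^-1 iH for a single-row model h with noise variance r:
    each platform's Fisher information contribution (Eq. 26)."""
    return [[hi * hj / r for hj in h] for hi in h]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def objective(Y_prior, impacts):
    """J(u) = |P^-1_{k|k-1} + sum_i iI(iu)| (Eqs. 25 and 28)."""
    Y = Y_prior
    for I in impacts:
        Y = mat_add(Y, I)      # the combination operator * is matrix addition
    return det2(Y)

Y_prior = [[1.0, 0.0], [0.0, 1.0]]   # prior information matrix P^-1_{k|k-1}
I1 = impact([1.0, 0.0], 0.5)         # platform 1 observes the first state
I2 = impact([0.0, 1.0], 0.5)         # platform 2 observes the second state
# * is commutative and associative, so the combination order is irrelevant:
assert objective(Y_prior, [I1, I2]) == objective(Y_prior, [I2, I1])
```

Note that the objective is evaluated from the combined impact alone, which is exactly the property the decentralised method below exploits.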
The group optimal control problem, for a partially separable objective function, can then be solved using a distributed optimisation method, such as a nonlinear block iterative algorithm [9].

²This should not be confused with the matrix inverse used in the example in Section 3.3. The inverse of an impact in this example is simply its negative.

A simple algorithm of this type solves the group optimisation problem (16) in a synchronous fashion. At each iteration l + 1, platform i optimises over its local control input iuk while keeping all other controls fixed:

iuk,l+1 = arg max_{iuk ∈ iU} Jk(1uk,l, . . . , i−1uk,l, iuk, i+1uk,l, . . . , nuk,l),   (29)

where the subscript (·)l refers to the previous update. Here the updates are computed in parallel and hence it is a Jacobi type algorithm [9]. Using the partially separable form (15), this can be written as

iuk,l+1 = arg max_{iuk ∈ iU} ψk( iΥk(iuk) ∗ īαk,l ),   (30)

where īαk,l is the combined impact from the rest of the group (i.e. all platforms excluding i) from the last iteration.

To implement this, it may appear that each platform requires the impacts from all other platforms to evaluate īαk,l, so that the number of messages each platform sends and receives would be proportional to the size of the group n. However, since only the combination īαk,l is required, the information can be communicated throughout the group using an acyclic communication network. To demonstrate this, consider the simplest acyclic network, a chain, where each platform communicates with its two neighbours. After the lth evaluation of (30), all platforms evaluate iαk,l = iΥk(iuk,l) in parallel. For illustration purposes, assume communication begins at one end of the chain, with that platform sending its iαk,l. Each platform along the chain combines the component it receives with its local component and sends the result to the next platform. Once the message reaches the end of the chain, it contains the full impact from the group, Tαk,l.
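This forward pass can be sketched with scalar impacts, taking ∗ as addition so that the inverse operator of Section 3.4 is simply negation; the impact values are illustrative only:

```python
def chain_accumulate(impacts):
    """Forward pass along the chain: each platform combines the running
    summary with its local impact and passes it on.  The last platform
    holds the full group impact, which is then propagated back."""
    total = impacts[0]
    for a in impacts[1:]:
        total = total + a          # local combine with the operator *
    return total

impacts = [2.0, -1.0, 3.5]         # i-alpha for platforms 1..3
total = chain_accumulate(impacts)
# Each platform removes its own contribution via the inverse (negation here),
# recovering the combined impact of the rest of the group:
rest = [total + (-a) for a in impacts]
assert rest == [2.5, 5.5, 1.0]
```

The key point is that no platform ever needs the individual impacts of the others, only the running combination.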
This is propagated back up the chain such that each platform obtains a copy of it. This scenario is shown in Fig. 1 for 3 platforms. This process ensures that each platform has the full impact from the group. To start the (l + 1)th iteration, each platform must separate its own component from the last iteration's group summary to obtain īαk,l. This is done using the inverse operator

īαk,l = Tαk,l ∗ (iαk,l)⁻¹.   (31)

The (l + 1)th iteration is computed using (30) and the procedure repeated. The algorithm is summarised in Table 1 and will converge to the optimal group solution if the matrix ∇²Jk(uk) satisfies a diagonal dominance condition [9].

The benefits of formulating and solving the group optimal control problem in this fashion are:
• The computational requirements per iteration for each platform are independent of the size of the group n.
• The number of messages each platform must send is independent of n.
• The size of each message each platform must send is independent of n.
• The total number of messages the group must send is only O(n).
• The number of iterations required to converge depends only on the inherent complexity of the problem and not on the size of the group.

The last item can be demonstrated by considering a scenario where the objective function is fully separable but is solved using the above iterative method. The optimal solution will be found in a single iteration, regardless of the size of the team. This property also holds when only a few members' actions are coupled: in this situation, the solution time will depend only on this small subgroup and not on the full size of the group.

The presented algorithm requires synchronous operation over a chain network. However, it can be extended to allow asynchronous updates over an arbitrary acyclic network. This will also overcome the only dependence the algorithm has on the number of platforms: the time required for messages to be passed throughout the network. These extensions are out of scope and left for future work.

Figure 1: Simple communication process in a chain of 3 platforms.

Table 1: Example procedure for each platform i to solve the group optimal control problem specified in (16).
1. Initialise iuk,0 and set l = 0.
2. Evaluate iαk,l = iΥk(iuk,l) and send to neighbours.
3. Wait to receive Tαk,l.
4. Obtain īαk,l by removing the previously combined component from the group summary using (31).
5. Compute iuk,l+1 using equation (30).
6. Check for convergence: |iuk,l+1 − iuk,l| < threshold.
   - No: set l = l + 1 and go to 2.
   - Yes: stop iterating.

3.5 Decentralised Data Fusion

For all cases considered, the objective function is dependent on the prior belief P(xk−1 | Zk−1, Yk−1), which must be synchronised among the platforms before the control algorithm can start. It is assumed that this is accomplished using standard DDF techniques communicating over the same acyclic network. See Makarenko et al. [10] for a general formulation of the DDF algorithm, with specific formulations for the Gaussian and certainty grid cases.

For the Gaussian case the decentralised control algorithm has the same form as a decentralised data fusion (DDF) algorithm [10, 11], which requires each sensor platform to propagate the information matrix iIk through the network after an observation is made (the information vector component, specified in (21), is also communicated). This works for predicting the future expected posterior entropy because, as shown in (22), it is independent of the future observation and can be predicted deterministically. In general, however, this decentralised control algorithm requires additional information to be communicated beyond the likelihoods of future observations.

4 Indoor Mapping
To demonstrate how it is possible to create a decentralised control scheme for a non-Gaussian problem, a multi-platform indoor mapping scenario is considered. This problem has attracted a lot of research interest; however, all previous solutions require some form of centralised controller [4]. A group of n mobile sensor platforms, equipped with laser range scanners and access to external localisation, is considered. The full multi-platform exploration scenario requires the joint localisation problem to be solved concurrently; however, only the mapping component is considered here.

4.1 Belief Representation

The indoor environment will be represented using an occupancy grid (OG). This provides a simple representation that can be used for path planning [12]. An OG is a binary random field where the state of each cell is assumed independent and encodes the probability of the corresponding region in space containing an obstacle. The state of interest is the binary vector x = [x1, x2, . . . , xN]ᵀ, where xs ∈ Xs = {OCC, EMP} for all s ∈ {1, . . . , N}, and N is the number of cells in the grid. The environment is assumed static, and hence the time index k has been replaced with a cell index. For simplicity the notation P(xs) will mean P(xs = OCC), and hence P(xs = EMP) = 1 − P(xs). Due to the independence assumption, the belief can be maintained separately for each cell, P(x) = P(x1)P(x2) · · · P(xN). The entropy of the OG is the sum of the entropies of the independent distributions over the cells, Hx = Σs=1..N Hxs, where the entropy of a cell is

Hxs = −P(xs) log P(xs) − (1 − P(xs)) log(1 − P(xs)).

4.2 Observation Models

It is important to construct an observation model that can be used to efficiently evaluate the expected utility of future observations. This is opposed to models for accurately fusing data after an observation is made.
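Before constructing that model, the additive grid entropy of Section 4.1 can be sketched as follows (a toy five-cell grid with illustrative occupancy probabilities):

```python
import math

def cell_entropy(p_occ):
    """Binary entropy of a single OG cell, in nats."""
    if p_occ <= 0.0 or p_occ >= 1.0:
        return 0.0                 # a fully known cell carries no uncertainty
    return -p_occ * math.log(p_occ) - (1.0 - p_occ) * math.log(1.0 - p_occ)

def grid_entropy(grid):
    """H_x = sum over cells of H_{x_s}: independence makes entropies add."""
    return sum(cell_entropy(p) for p in grid)

grid = [0.5, 0.5, 1.0, 0.0, 0.2]   # unexplored cells sit at 0.5
# The uniform prior of 0.5 maximises the per-cell entropy:
assert cell_entropy(0.5) == max(cell_entropy(p) for p in grid)
```

Cells already known to be occupied (1.0) or empty (0.0) contribute nothing, which is what drives the platforms towards unexplored space.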
In general the same model should be used for both; however, this often proves computationally infeasible and will not lead to a partially separable group objective function.

For a low noise sensor, such as a laser range scanner, it is assumed the sensor is deterministic. Thus for any observation, each cell will be fully observed (being either occupied or empty), or left unobserved. The notation izk(s) will represent the outcome of the ith platform's observation at time k with respect to cell s, i.e. izk(s) ∈ iZ(s) = {OCC, EMP, UNO}. This simple model causes the posterior of each cell to be constrained to three values, corresponding to the cell known to be occupied, known to be empty, or remaining the same as the prior (see Fig. 2):

P(xs | izk) = 1        if izk(s) = OCC,
P(xs | izk) = P(xs)    if izk(s) = UNO,
P(xs | izk) = 0        if izk(s) = EMP.

Figure 2: Each platform is equipped with a perfect sensor that will either fully observe the state of a given cell (izk(s) = EMP or OCC) or will not observe the cell at all (izk(s) = UNO). As more sensor platforms are added to the planning process the possible posteriors will always be constrained to three per OG cell.

If all platforms employ a similar deterministic observation model, the posterior of each cell after all platform observations is constrained to one of these three values. However, two of these have the same entropy (if the state of a cell is known, P(xs) = 0 or 1, the entropy is zero), and the other has the same entropy as the prior. Thus, for the goal of calculating the expected posterior entropy, the observations izk(s) = OCC and EMP will be considered together. The observation space iZ(s) will be partitioned into iZ(s̃) = {OBS, UNO}, where OBS = OCC ∪ EMP represents the state of the cell being observed.

The full observation model is

P(izk(s̃) = OBS | x, iyk) = φ(iyk) if xm = EMP ∀m ∈ ray[iyk, s), and 0 otherwise,   (32)
P(izk(s̃) = UNO | x, iyk) = 1 − P(izk(s̃) = OBS | x, iyk),

where φ(iyk) ∈ {0, 1} includes geometric sensor constraints such as range and field of view, and ray[iyk, s) represents the cells between the sensor and the cell s, up to but not including s. Thus a cell will be observed if it is within the sensor's range and field of view and all the cells in between are empty.

4.3 Group Utility

The group objective function is given by the expected posterior entropy of each cell, summed over all cells

Jk(uk) = −Σs=1..N Σ_{zk(s̃) ∈ Z(s̃)} P(zk(s̃) | Zk−1, Yk) Hxs|Zk,Yk(zk(s̃), uk).   (33)

The expectation is taken over all possible group observations zk(s̃) = {1zk(s̃), . . . , nzk(s̃)} ∈ Z(s̃) = ∏i=1..n iZ(s̃). The probability P(zk(s̃) | Zk−1, Yk) is the joint probability that the multiple platforms will detect or not detect the state of a given cell s, and is given by

P(zk(s̃) | Zk−1, Yk) = Σ_{x ∈ X} P(x | Zk−1, Yk−1) ∏i=1..n P(izk(s̃) | x, iyk).   (34)

If the sensor platforms observe the same cell from slightly different directions, then, due to the independence assumption of the OG, the observations izk(s̃) made by the platforms will be independent. This occurs because the observations of each of the platforms, defined in (32), depend only on independent components of the state x (i.e. ray[iyk, s) ∩ ray[jyk, s) = ∅). Thus the joint probability becomes the product of the marginals

P(zk(s̃) | Zk−1, Yk) = ∏i=1..n P(izk(s̃) | Zk−1, Yk−1, iyk).   (35)

This allows each platform to independently calculate the probability of its observations. These could be obtained using

P(izk(s̃) | Zk−1, Yk−1, iyk) = Σ_{x ∈ X} P(x | Zk−1, Yk−1) P(izk(s̃) | x, iyk).   (36)

However, this is where the assumptions made in the OG cause a full probabilistic analysis to produce unrealistic predictions (see Fig. 3 for a comparison). Hence, the probability P(izk(s̃) | Zk−1, Yk−1, iyk) will be empirically modelled from the statistics of previous observations [13].

Figure 3: The probability of a cell being observed, P(izk(s̃) = OBS | Zk−1, Yk−1, iyk), versus distance from the sensor. The black line is for the OG model with a prior probability of occupancy of 0.5; the grey line was experimentally determined for the environment given in Fig. 6.

Using (33) and (35), the expected utility becomes

Jk(uk) = −Σs=1..N Σ_{zk(s̃) ∈ Z(s̃)} [ ∏i=1..n P(izk(s̃) | Zk−1, Yk−1, iyk) ] Hxs|Zk,Yk(zk(s̃), uk).   (37)

This can be further simplified by noting that if a cell is observed by any of the platforms (izk(s̃) = OBS for any i ∈ {1, . . . , n}) it will have zero posterior entropy, while if it is unobserved by the entire group its entropy will remain the same as its prior, Hxs|Zk−1,Yk−1. Thus the objective function becomes

Jk(uk) = −Σs=1..N [ ∏i=1..n ipk(s, iuk) ] Hxs|Zk−1,Yk−1,   (38)

where ipk(s, iuk) = P(izk(s̃) = UNO | Zk−1, Yk−1, iyk).

4.4 Decentralised Optimal Control

The group objective function (38) has the required partially separable form defined in Section 3.2. This can be demonstrated by letting the impact space S be the bounded N-dimensional space [0, 1]^N, where an element corresponds to the probability of the platform not observing each cell in the OG. Thus

iαk = iΥk(iuk) = [ ipk(1, iuk), . . . , ipk(N, iuk) ]ᵀ.   (39)

The combination operator is element-wise multiplication, and thus the group impact is

Tαk = [ ∏i=1..n ipk(1, iuk), . . . , ∏i=1..n ipk(N, iuk) ]ᵀ.   (40)

Letting Tαk(s) be the sth element of the group impact vector, the group objective function, in the partially separable form, is

Jk(uk) = ψk(Tαk) = −Σs=1..N Tαk(s) Hxs|Zk−1,Yk−1.   (41)

Thus, in theory, the group optimal control problem can be solved using the decentralised method outlined in Section 3.4.

Figure 4: (a) OG map representing the current group belief; the sensor platforms are shown in blue and grey. The red squares are the optimal strategies determined by the group. (b) Corresponding group impact Tαk displayed on the same grid as the OG; white indicates a high probability that the group will observe the cell, grey a low probability.

5 Experiment

Previous developments assumed the look ahead was until the next observation. However, for an exploration type scenario a much larger horizon is required (e.g. where to go next after exploring a room). In general this can be addressed by using a group objective function related to the expected posterior entropy after κ steps. However, this introduces the problem of choosing an appropriate horizon for the group. This is a complex problem, as an appropriate look ahead for one platform may not be right for another. For example, a platform that has just finished exploring a room will require a much larger look ahead than one exploring a corridor. In more general situations some platforms may have less processing capability than others and be unable to plan far into the future.

To overcome this problem, each platform will employ its own variable planning horizon. The group objective function then becomes the expected rate of uncertainty reduction in the map obtained by the group undertaking a joint strategy vk

Jk(vk) = E[−ΔH/Δt],   (42)

where the group control strategy is vk = {1vk, . . . , nvk} and the strategy for platform i is given by the set of controls ivk = {iuk:k+iκ, iκ}.

Figure 5: Mobile robots Hornet, Buzz and Mozzy used for the experiments.

5.1 Planning and Observation Simplifications

The decentralised block iterative algorithm presented in Section 3.4 remains the same, except that control actions iuk are replaced with control strategies ivk.
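Returning briefly to Section 4.4, the element-wise impact combination (Eqs. 39-41) can be sketched for a hypothetical three-cell grid and two platforms; the probabilities are illustrative only:

```python
import math

def group_impact(per_platform_unobs):
    """T-alpha: element-wise product of each platform's probability of NOT
    observing each cell (Eq. 40)."""
    combined = [1.0] * len(per_platform_unobs[0])
    for p in per_platform_unobs:
        combined = [c * pi for c, pi in zip(combined, p)]
    return combined

def objective(impact, prior_entropy):
    """J = psi(T-alpha) = -sum_s T-alpha(s) H_{x_s}: the negated expected
    residual map entropy (Eq. 41)."""
    return -sum(a * h for a, h in zip(impact, prior_entropy))

p1 = [0.1, 1.0, 1.0]       # platform 1 almost certainly observes cell 0
p2 = [1.0, 0.2, 1.0]       # platform 2 covers cell 1; cell 2 stays unobserved
H = [math.log(2)] * 3      # all cells at the prior 0.5, maximum uncertainty
J = objective(group_impact([p1, p2]), H)
assert J < 0.0             # entropy remains in the unobserved cell 2
```

Maximising this objective steers the platforms towards complementary, rather than overlapping, coverage.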
To limit the space of possible strategies, only paths to frontier cells are considered. Also, only observations made at the frontier will have a significant impact on the expected posterior entropy, and hence only these are included in the expectation for the group objective function. Figure 4 shows the results of a simple two platform scenario.

5.2 Results

The system was tested with the group of 3 Pioneer robots shown in Fig. 5. Localisation was performed using highly reflective laser beacons with known positions. Figure 6 shows the results of the group exploring an indoor office environment. The environment is approximately 20 × 30 m at a discretisation of 10 cm. At each stage shown, the group summary Tαk has been combined with the current OG. This combined map represents the expected belief given all previous and possible future observations by the group. The behaviour is consistent across the group, with each vehicle's strategy taking into account the strategies of all other platforms. The task is considered completed when no available actions will reduce the map uncertainty.

Figure 6: Progress made by the group throughout the mission. Each figure represents the combination of previous and future observations from the group (see Fig. 4).

6 Conclusions

This paper has presented the general multi-platform information gathering problem. This can be solved in a scalable decentralised manner if the group objective function can be written in a partially separable form. Two example scenarios were shown to result in the required objective function, the first using a Gaussian representation and the second an indoor occupancy grid mapping problem. The main advantage of this form is that the resulting solution complexity and platform bandwidth requirements are independent of the number of platforms in the group.
Although only the information gathering scenario was considered, this method can be applied to any group control problem whose objective function is partially separable. A good example is the cooperative search problem described by Bourgault et al. [14], which it is believed can also be formulated with a partially separable group objective function.

Acknowledgments

This work is partly supported by the ARC Centre of Excellence programme, funded by the Australian Research Council (ARC) and the New South Wales State Government, and by CSIRO Industrial Physics.

References

[1] D. V. Pynadath and M. Tambe, "Multiagent teamwork: Analyzing the optimality and complexity of key theories and models," in Proc. Int. Conf. of Autonomous Agents and Multiagent Systems, 2002.
[2] T. Furukawa, G. Dissanayake, and H. F. Durrant-Whyte, "Time-optimal cooperative control of multiple robot vehicles," in Proc. IEEE Int. Conf. on Robotics and Automation, 2003.
[3] A. Richards and J. P. How, "Aircraft trajectory planning with collision avoidance using mixed integer linear programming," in Proc. of the American Control Conf., 2002.
[4] W. Burgard, M. Moors, C. Stachniss, and F. Schneider, "Coordinated multi-robot exploration," IEEE Trans. on Robotics, vol. 21, no. 3, pp. 376-386, June 2005.
[5] S. Waslander, G. Inalhan, and C. J. Tomlin, "Decentralized optimisation via Nash bargaining," in Proc. Conf. Cooperative Control and Optimisation, 2003.
[6] B. Grocholsky, "Information-theoretic control of multiple sensor platforms," Ph.D. dissertation, Univ. of Sydney, Australia, 2002.
[7] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[8] P. S. Maybeck, Stochastic Models, Estimation and Control. Academic Press, New York, 1979-1982.
[9] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, 1989.
[10] A. Makarenko and H. F. Durrant-Whyte, "Decentralized data fusion and control algorithms in active sensor networks," in Proc. Int. Conf. on Information Fusion, 2004, pp. 479-486.
[11] J. Manyika and H. F. Durrant-Whyte, Data Fusion and Sensor Management: A Decentralized Information-Theoretic Approach. Ellis Horwood, 1994.
[12] A. Elfes, "Occupancy grids: A probabilistic framework for robot perception and navigation," Ph.D. dissertation, Carnegie Mellon University, 1989.
[13] W. Burgard, M. Moors, R. Simmons, and S. Thrun, "Collaborative multi-robot exploration," in Proc. IEEE Int. Conf. on Robotics and Automation, San Francisco, CA, USA, Apr. 2000, pp. 476-481.
[14] F. Bourgault, T. Furukawa, and H. F. Durrant-Whyte, "Decentralized Bayesian negotiation for cooperative search," in Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2004.