Scalable Decentralised Control for Multi-Platform
Reconnaissance and Information Gathering Tasks
George M. Mathews and Hugh F. Durrant-Whyte
ARC Centre of Excellence for Autonomous Systems
The University of Sydney, Sydney NSW 2006, Australia
[email protected]
Abstract - This paper describes the general multi-platform reconnaissance or information gathering
task. It is shown that the general problem must
be solved in a centralised manner using information
supplied by every platform. However, if for a specific problem the objective function is Partially Separable, it is shown that the optimal control problem can be solved in a decentralised fashion for an arbitrarily sized group. This is demonstrated for scenarios with
Gaussian probabilities and an indoor mapping problem.
Keywords: Cooperative Control, Decentralised Information Gathering, Active Estimation, Exploration
1 Introduction
Reconnaissance or information gathering problems
cover any scenario requiring information to be gathered from the external environment. This includes: environmental monitoring, searching, surveillance, target
tracking, mapping and exploration.
Utilising an autonomous system to perform these
tasks allows a human operator to be removed from possibly dangerous (or simply mundane) situations. A system comprising multiple autonomous sensor platforms
offers additional benefits such as increased spatial coverage and greater system redundancy. However, the
system and control complexity grows rapidly with the
number of platforms [1].
Existing approaches to this problem fall into three classes: (1) fully centralised [2, 3], where the system as a whole is modelled and controlled using conventional techniques; (2) distributed or hierarchical [4], which utilise the distributed computational capacity of the multiple platforms but require a single control centre to resolve global constraints or perform resource allocations; and (3) decentralised systems that do not require any centralised facility.
The decentralised approach can be further broken
down into systems that require each platform to have
global information about the group [5] and those that
only require local information from a small subset. It
is the latter that is of interest here. The complexity of these systems is independent of the number of platforms, and they achieve the full benefits offered by multi-agent systems.
This paper formulates the general multi-platform
reconnaissance or information gathering problem for
known platform dynamics. In general the single step
optimal control problem must be solved in a centralised
manner using information from every platform. This
is computationally intractable and physically infeasible
(due to bandwidth considerations) for large teams.
In this paper the work of Grocholsky [6] is generalised and a special form of objective function, called Partially Separable, is presented. This allows arbitrarily sized teams to solve the optimal control problem in a
fully scalable and decentralised fashion.
This paper is organised as follows. The general optimal control problem is formulated in Section 2. Section
3 presents the partially separable form of the group
objective function and demonstrates its scalability using a scenario with Gaussian probabilities. Section 4 demonstrates that the indoor mapping scenario has the
same partially separable group objective function. Experimental results for the indoor mapping problem are
presented in Section 5. Section 6 presents conclusions
and future work.
2 The Information Gathering Problem
This paper is concerned with the best way to control
a group of sensing platforms undertaking a reconnaissance or information gathering task. This scenario requires information to be gathered on a particular state
variable x ∈ X . This may include the positions of
targets, terrain properties of a given region, or surface
information of a remote planet, for example.
This state will be observed by a group of $n$ sensor platforms jointly described by the state variable $y = \{{}^1y, \ldots, {}^ny\} \in \mathcal{Y} = \prod_{i=1}^{n} {}^i\mathcal{Y}$. It is assumed that the state $x$ is not influenced by the sensor platforms. Thus $x$ evolves according to the probabilistic discrete time Markov motion model $P(x_k \mid x_{k-1})$.
The $i$th sensor platform is controlled by applying a particular control action ${}^iu_k \in {}^i\mathcal{U}$. In general this causes the platform's state to change according to the probabilistic motion model $P({}^iy_k \mid {}^iy_{k-1}, {}^iu_k)$. However, in this paper it is assumed that the platform's motion is known with precision, i.e. ${}^iy_k = {}^if({}^iy_{k-1}, {}^iu_k)$. The group state transition model is given by $y_k = f(y_{k-1}, u_k) = \{{}^1f({}^1y_{k-1}, {}^1u_k), \ldots, {}^nf({}^ny_{k-1}, {}^nu_k)\}$.
The observations made by the $i$th platform are modelled by the conditional density $P({}^iz_k \mid x_k, {}^iy_k)$, which describes the probability of obtaining a particular observation ${}^iz_k$ given states $x_k$ and ${}^iy_k$. The notation $z_k$ will denote the set of observations from all the platforms at time step $k$, i.e. $z_k = \{{}^1z_k, \ldots, {}^nz_k\} \in \mathcal{Z} = \prod_{i=1}^{n} {}^i\mathcal{Z}$. Assuming that the observations are conditionally independent given the states $x_k$ and ${}^iy_k$, the combined sensor model is $P(z_k \mid x_k, y_k) = \prod_{i=1}^{n} P({}^iz_k \mid x_k, {}^iy_k)$.
The platforms are to be controlled such that the
combined observations they receive produce the most
informative (or least uncertain) belief about x. To accomplish this the distribution’s entropy will be used as
a measure of its associated uncertainty. The entropy of
a distribution $P(x)$ is the negative of the expectation of its logarithm

$H_x = E_x[-\log P(x)]. \quad (1)$
This can be used for continuous variables (in a fixed
coordinate frame) where P (x) is the probability density function or for discrete variables where P (x) is
the probability distribution function. For a detailed
description of the properties of this metric see [7].
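As a concrete illustration of (1) for a discrete variable, the short sketch below computes the entropy of a probability vector; the numerical values are invented for the example and are not taken from the paper.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Entropy H = E[-log P(x)] of a discrete distribution.

    `p` is a 1-D array of probabilities summing to one; `eps` guards
    against log(0) for impossible outcomes.
    """
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p + eps)))

# A uniform belief is maximally uncertain; a peaked one is nearly certain.
print(entropy([0.25, 0.25, 0.25, 0.25]))  # ~1.386 nats
print(entropy([0.97, 0.01, 0.01, 0.01]))  # ~0.168 nats
```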
2.1 Bayesian Filtering

This section outlines the recursive Bayesian estimation process [8]. To start, consider the system at a given time step $k-1$. The team's state is given by $y_{k-1}$ and the belief about $x_{k-1}$ conditioned on all past observations and platform configurations is $P(x_{k-1} \mid Z_{k-1}, Y_{k-1})$, where $Z_{k-1} = \{z_1, \ldots, z_{k-1}\}$, $Y_{k-1} = \{y_1, \ldots, y_{k-1}\}$ and $Z_0 = Y_0 = \{\emptyset\}$.

When a group control action $u_k$ is taken, the new state of the platforms becomes $y_k(u_k) = f(y_{k-1}, u_k)$, and an observation $z_k$ is taken. To update the belief about $x_k$, it must first be predicted forward in time using the Chapman-Kolmogorov equation

$P(x_k \mid Z_{k-1}, Y_{k-1}) = \int_{\mathcal{X}} P(x_k \mid x_{k-1}) P(x_{k-1} \mid Z_{k-1}, Y_{k-1}) \, dx_{k-1}. \quad (2)$

The belief can now be updated using Bayes rule

$P(x_k \mid Z_k, Y_k) = \frac{1}{\mu} P(x_k \mid Z_{k-1}, Y_{k-1}) \prod_{i=1}^{n} P({}^iz_k \mid x_k, {}^iy_k) \quad (3)$

where $z_k = \{{}^1z_k, \ldots, {}^nz_k\}$ are the actual observations taken by the platforms. The term $P({}^iz_k \mid x_k, {}^iy_k)$ is the $i$th platform's observation model evaluated at the actual observation and platform configuration, resulting in a likelihood over $x_k$. The normalisation constant $\mu$ is given by

$\mu = P(z_k \mid Z_{k-1}, Y_k) \quad (4)$
$\phantom{\mu} = \int_{\mathcal{X}} P(x_k \mid Z_{k-1}, Y_{k-1}) \prod_{i=1}^{n} P({}^iz_k \mid x_k, {}^iy_k) \, dx_k. \quad (5)$

The key problem in this process is deciding on the optimal group control action $u_k$.

2.2 Group Utility

The objective of the group is to minimise its uncertainty in $x$. There are many ways to formally define this, the best being a discounted infinite horizon dynamic programming problem. As a simple approximation to this, a single step look ahead will be considered.

At time step $k-1$, the utility of a future group action $u_k$ and observation $z_k$ is related to the amount of uncertainty in the resulting posterior belief at time $k$. Actions and observations that produce a smaller uncertainty in the posterior belief are favoured over others. Thus a suitable utility function is the negative posterior entropy, such that when the utility is maximised, the posterior uncertainty is minimised

$U_k(z_k, u_k) = -H_{x_k \mid Z_k, Y_k}(z_k, u_k). \quad (6)$

However, the actual observations made will not be known in advance. Thus, an expectation over all possible future observations is performed. The expected utility or group objective function is

$J_k(u_k) = E_{z_k}\big[U_k(z_k, u_k)\big] = E_{z_k}\big[-H_{x_k \mid Z_k, Y_k}(z_k, u_k)\big]. \quad (7)$

Using the definition of entropy and substituting (3) and (5), this becomes

$J_k(u_k) = E_{z_k} E_{x_k}\big[\log P(x_k \mid Z_k, Y_k(u_k))\big] = E_{z_k} E_{x_k}\left[\log \frac{P(x_k \mid Z_{k-1}, Y_{k-1}) \prod_{i=1}^{n} P({}^iz_k \mid x_k, {}^iy_k)}{P(z_k \mid Z_{k-1}, Y_k)}\right]. \quad (8)$

For continuous variables the double expectation is given by a double integral over the joint density of $z_k$ and $x_k$, which can be expanded as

$J_k(u_k) = \int_{\mathcal{Z}} \int_{\mathcal{X}} P(x_k \mid Z_{k-1}, Y_{k-1}) \prod_{i=1}^{n} P({}^iz_k \mid x_k, {}^iy_k) \times \log \frac{P(x_k \mid Z_{k-1}, Y_{k-1}) \prod_{i=1}^{n} P({}^iz_k \mid x_k, {}^iy_k)}{P(z_k \mid Z_{k-1}, Y_k)} \, dx_k \, dz_k. \quad (9)$

The dependence of this function on the control variables $u_k$ is embedded in the sensor model $P({}^iz_k \mid x_k, {}^iy_k)$ through the platform configuration ${}^iy_k = {}^if({}^iy_{k-1}, {}^iu_k)$.
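For a discrete state and observation space the integrals in (2), (5) and (9) reduce to sums and the objective can be evaluated directly. The following sketch is a minimal single-platform illustration of that calculation; the prior, the sensor likelihood table and the helper names are assumptions introduced for this example rather than quantities defined in the paper.

```python
import numpy as np

def bayes_update(prior, likelihood):
    """Posterior over x given one observation likelihood P(z|x), as in eq. (3)."""
    post = prior * likelihood
    return post / post.sum()

def entropy(p, eps=1e-12):
    return float(-np.sum(p * np.log(p + eps)))

def expected_posterior_entropy(prior, sensor):
    """E_z[ H(x | z) ] for a discrete sensor model.

    `sensor[z, x]` = P(z | x); each posterior entropy is weighted by the
    predicted observation probability P(z), the normalisation of eq. (5).
    """
    expected_H = 0.0
    for z in range(sensor.shape[0]):
        like = sensor[z, :]
        p_z = float(np.dot(like, prior))   # normalisation constant mu
        if p_z > 0.0:
            expected_H += p_z * entropy(bayes_update(prior, like))
    return expected_H

# Two candidate sensor configurations observing a 3-valued state:
prior = np.array([0.5, 0.3, 0.2])
informative = np.array([[0.9, 0.1, 0.1],   # P(z=0|x)
                        [0.1, 0.9, 0.9]])  # P(z=1|x)
uninformative = np.full((2, 3), 0.5)

# The objective J = -E_z[H] prefers the configuration with lower expected entropy.
print(expected_posterior_entropy(prior, informative))    # lower
print(expected_posterior_entropy(prior, uninformative))  # equals the prior entropy
```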
3 Decentralised Control
The single step group optimal control problem is defined by

$u_k^* = \arg\max_{u_k} J_k(u_k). \quad (10)$

In general, the communication complexity is determined by what each agent is required to know about the other agents, while the computational requirements relate to the inherent complexity of the optimisation task.
To evaluate the general objective function in (9),
the motion and observation models from all the sensor
platforms are required. Thus the communication complexity of the general optimal control problem will be
strongly dependent on the size of the group. This constraint is a major limiting factor on the scalability of
any distributed control algorithm and no benefits are
gained through a decentralised implementation.
However, there exist certain objective functions
which allow the control problem to be solved in a scalable decentralised fashion.
3.1 Fully Separable

The simplest case occurs when the objective function can be separated into a sum of terms from each platform

$J_k(u_k) = \sum_{i=1}^{n} {}^i\phi_k({}^iu_k), \quad (11)$

where ${}^i\phi_k : {}^i\mathcal{U} \to \Re$. The objective function (9) can be written in this form if the individual platforms observe independent components of the state $x$. The group optimal control problem is then solved by each platform independently maximising its decoupled function ${}^i\phi_k({}^iu_k)$. However, this form is not often encountered in practice.

3.2 Partially Separable

An objective function is Partially Separable if it can be redefined from a mapping from the space of team controls, ${}^1\mathcal{U} \times \cdots \times {}^n\mathcal{U}$, to a mapping from a common group impact space $S$. An element of this impact space represents all information relevant to the group objective,

$J_k(u_k) = \psi_k\big({}^T\alpha_k\big), \quad (12)$

where ${}^T\alpha_k \in S$ is the combined group impact. This enables the number and individual models of the platforms in the group to be abstracted away.

To demonstrate how this group impact is obtained, consider a group containing only one platform. The impact this platform generates is given by its impact function, defined ${}^i\Upsilon_k : {}^i\mathcal{U} \to S$,

${}^T\alpha_k = {}^1\alpha_k = {}^1\Upsilon_k({}^1u_k). \quad (13)$

This function abstracts away the individual models and state of the platform. If another platform is added, there must exist a commutative and associative binary operator $*$ for the impact space that combines impacts, i.e.

${}^T\alpha_k = {}^{1,2}\alpha_k = {}^1\Upsilon_k({}^1u_k) * {}^2\Upsilon_k({}^2u_k). \quad (14)$

Extending this further, an objective function is partially separable if it can be written in the form

$J_k(u_k) = \psi_k\big({}^1\Upsilon_k({}^1u_k) * \cdots * {}^n\Upsilon_k({}^nu_k)\big). \quad (15)$

One interpretation of this definition can be given by considering the impact space as a vector space. Each platform has an impact vector ${}^i\Upsilon_k({}^iu_k)$; these are combined in some fashion, using the operator $*$ (e.g. addition), and the objective function is then given by a function of this combined vector (e.g. a norm). (The fully separable form of (11) is recovered if $S \equiv \Re$, ${}^i\Upsilon_k \equiv {}^i\phi_k$ and $* \equiv +$.)

If the objective function has this form, the group optimal control problem becomes

$u_k^* = \arg\max_{u_k \in \mathcal{U}} \psi_k\big({}^1\Upsilon_k({}^1u_k) * \cdots * {}^n\Upsilon_k({}^nu_k)\big). \quad (16)$

To demonstrate how this allows the group control problem to be solved in a decentralised fashion, a linearised Gaussian example will be given.

3.3 Gaussian Example

This section considers the multi-platform information gathering problem with all densities assumed Gaussian. It follows the work of Grocholsky [6] and demonstrates that it has the partially separable form.

The predicted prior for time step $k$, given all information up to $k-1$, is assumed Gaussian with mean $\hat{x}_{k|k-1}$ and covariance $P_{k|k-1}$,

$P(x_k \mid Z_{k-1}, Y_{k-1}) \sim \mathcal{N}(x_k; \hat{x}_{k|k-1}, P_{k|k-1}). \quad (17)$

The $i$th platform's observation model is given in linearised form as

$P({}^iz_k \mid x_k, {}^iy_k) \sim \mathcal{N}({}^iz_k; {}^ih(x_k, {}^iy_k), {}^iR_k) \quad (18)$

where ${}^ih(x_k, {}^iy_k) = {}^ia_k + {}^iH_k x_k$. The vector ${}^ia_k$ and the matrix ${}^iH_k$ are both generally dependent on $\hat{x}_{k|k-1}$ and ${}^iy_k$ (and hence the control ${}^iu_k$) through the linearisation.

For a given group control action $u_k$ and observation $z_k$, the posterior on $x_k$ can be computed using Bayes rule (3). This is identical to a Kalman filter update

$P(x_k \mid Z_k, Y_k) \sim \mathcal{N}(x_k; \hat{x}_{k|k}, P_{k|k}) \quad (19)$

with mean and covariance given in inverse covariance form

$P_{k|k}^{-1} = P_{k|k-1}^{-1} + \sum_{i=1}^{n} {}^iH_k^T \, {}^iR_k^{-1} \, {}^iH_k \quad (20)$

$P_{k|k}^{-1} \hat{x}_{k|k} = P_{k|k-1}^{-1} \hat{x}_{k|k-1} + \sum_{i=1}^{n} {}^iH_k^T \, {}^iR_k^{-1} \, {}^iz_k \quad (21)$

where $(\cdot)^{-1}$ and $(\cdot)^T$ are the matrix inverse and transpose operators respectively.

The group utility function defined in Section 2.2 is the negative of the posterior entropy. For a Gaussian density the entropy is a function of the determinant of the covariance matrix [7]

$H_{x_k \mid Z_k, Y_k}(z_k, u_k) = \tfrac{1}{2} \log \big| (2\pi e)^{d_x} P_{k|k} \big| \quad (22)$

where $d_x$ is the dimensionality of the state space $\mathcal{X}$. An important property of the Gaussian representation is that the posterior entropy is not a function of the actual observation made. This can be seen by examining the update equation (20), which is only dependent on the predicted prior covariance $P_{k|k-1}$ and the observation models ${}^iH_k$, ${}^iR_k$ of the platforms.

Thus the objective function $J_k(u_k)$ can be written as

$J_k(u_k) = -\tfrac{1}{2} \log \big| (2\pi e)^{d_x} P_{k|k} \big|. \quad (23)$

This can be simplified by noting that the objective function is only required as a preference ordering over actions, and thus it can be replaced by

$J_k(u_k) = \big| P_{k|k}^{-1} \big|. \quad (24)$

Combining (20) and (24) the objective function becomes

$J_k(u_k) = \Big| P_{k|k-1}^{-1} + \sum_{i=1}^{n} {}^iI_k({}^iu_k) \Big| \quad (25)$

where ${}^iI_k({}^iu_k) = {}^iH_k^T \, {}^iR_k^{-1} \, {}^iH_k$. This has the partially separable form specified in (15), where the impact space $S$ contains all $d_x \times d_x$ symmetric matrices and the impact for each agent is given by the impact function

${}^i\alpha_k = {}^i\Upsilon_k({}^iu_k) = {}^iI_k({}^iu_k). \quad (26)$

The combination operator is given by matrix addition, hence the team impact is

${}^T\alpha_k = {}^1\Upsilon_k({}^1u_k) * \cdots * {}^n\Upsilon_k({}^nu_k) = \sum_{i=1}^{n} {}^iI_k({}^iu_k). \quad (27)$

Thus the objective function is

$J_k(u_k) = \psi_k({}^T\alpha_k) = \Big| P_{k|k-1}^{-1} + {}^T\alpha_k \Big|. \quad (28)$

The benefit of this formulation is that each agent only needs the combined impact from the rest of the team to evaluate the group objective function. The number of platforms, what type of sensors they have, how they move, even the actual impacts of each agent are irrelevant.
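A minimal numerical sketch of this Gaussian case is given below; the state dimension, sensor Jacobians and noise values are invented for illustration. Each platform contributes only its information-matrix impact ${}^iI_k = {}^iH_k^T \, {}^iR_k^{-1} \, {}^iH_k$, and any agent holding the summed impact can evaluate (28) without knowing how many platforms produced it.

```python
import numpy as np

def impact(H, R):
    """Per-platform impact I = H^T R^{-1} H, as in eq. (26): a d_x x d_x matrix."""
    return H.T @ np.linalg.inv(R) @ H

def objective(P_prior, total_impact):
    """psi(alpha) = |P_prior^{-1} + alpha|, as in eqs. (25) and (28)."""
    return np.linalg.det(np.linalg.inv(P_prior) + total_impact)

# Illustrative 2-D state and two platforms with different linearised sensors.
P_prior = np.diag([4.0, 1.0])                       # predicted prior covariance
H1, R1 = np.array([[1.0, 0.0]]), np.array([[0.5]])  # platform 1 observes x[0]
H2, R2 = np.array([[0.0, 1.0]]), np.array([[0.2]])  # platform 2 observes x[1]

# The combination operator is matrix addition, eq. (27); the sum is all that
# any one platform needs to know about the rest of the team.
total = impact(H1, R1) + impact(H2, R2)
print(objective(P_prior, total))
```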
3.4 Decentralised Solution Method
The previous section defines what each agent must
know for the objective function to be evaluated. This will now be extended, and a decentralised algorithm for reaching the optimal group controls will be presented.
The proposed method makes one additional assumption on the structure of the impact space: a unary inverse operator exists on the impact space $S$ under the operation of $*$ and will be defined as $(\cdot)^{-1}$. (This should not be confused with the matrix inverse used in the example in Section 3.3; the inverse of an impact in that example is simply its negative.)

It is assumed each sensor platform has some processing capability and authority over its control input ${}^iu_k$. The group optimal control problem, for a partially separable objective function, can then be solved using a distributed optimisation method, such as a nonlinear block iterative algorithm [9].
A simple algorithm of this type solves the group optimisation problem (16) in a synchronous fashion. At each iteration $l+1$, platform $i$ optimises over its local control input ${}^iu_k$ while keeping all other controls fixed,

${}^iu_{k,l+1} = \arg\max_{{}^iu_k \in {}^i\mathcal{U}} J_k({}^1u_{k,l}, \ldots, {}^{i-1}u_{k,l}, {}^iu_k, {}^{i+1}u_{k,l}, \ldots, {}^nu_{k,l}) \quad (29)$

where the subscript $(\cdot)_l$ refers to the previous update. Here the updates are computed in parallel and hence it is a Jacobi type algorithm [9]. Using the partially separable form (15) this can be written as

${}^iu_{k,l+1} = \arg\max_{{}^iu_k \in {}^i\mathcal{U}} \psi_k\big({}^i\Upsilon_k({}^iu_k) * {}^{\bar{\imath}}\alpha_{k,l}\big) \quad (30)$

where ${}^{\bar{\imath}}\alpha_{k,l}$ is the combined impact from the rest of the group (i.e. all platforms not including $i$) from the last iteration. To implement this, it may appear that each platform requires the impacts from all other platforms to evaluate ${}^{\bar{\imath}}\alpha_{k,l}$, and thus that the number of messages each platform sends/receives will be proportional to the size of the group $n$.
However, since only the combination ${}^{\bar{\imath}}\alpha_{k,l}$ is required, the information can be communicated throughout the group using an acyclic communication network. To demonstrate this, consider the simplest acyclic network, a chain, where each platform communicates with its two neighbours.

After the $l$th evaluation of (30), all platforms evaluate ${}^i\alpha_{k,l} = {}^i\Upsilon_k({}^iu_{k,l})$ in parallel. For illustration purposes, assume communication begins at one end of the chain by the platform sending ${}^i\alpha_{k,l}$. Each platform along the chain combines the component it receives with its local component and sends the result to the next platform.

Once the message reaches the end of the chain, it contains the full impact from the group ${}^T\alpha_{k,l}$. This is propagated back up the chain such that each platform obtains a copy of it. This scenario is shown in Fig. 1 for 3 platforms.
This process ensures that each platform has the full impact from the group. To start the $(l+1)$th iteration each platform must separate its own component from the last iteration's group summary to obtain ${}^{\bar{\imath}}\alpha_{k,l}$. This is obtained by using the inverse operator

${}^{\bar{\imath}}\alpha_{k,l} = {}^T\alpha_{k,l} * \big({}^i\alpha_{k,l}\big)^{-1}. \quad (31)$

The $(l+1)$th iteration is computed using (30) and the procedure repeated. The algorithm is summarised in Table 1 and will converge to the optimal group solution if the matrix $\nabla^2 J_k(u_k)$ satisfies a diagonal dominance condition [9].
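The sketch below illustrates the synchronous Jacobi iteration (29)-(31) on the Gaussian impacts of Section 3.3; the candidate control sets, the toy impact function and the convergence test are placeholders invented for the example, and the group impact is summed directly rather than accumulated over a chain. Each platform optimises its own control against the combined impact of the others, recovered from the group summary using the inverse operator (here simply matrix subtraction).

```python
import numpy as np

def jacobi_coordination(P_prior, impact_fns, candidate_controls, max_iters=20):
    """Synchronous Jacobi iteration in the spirit of eqs. (29)-(31).

    impact_fns[i](u) returns platform i's impact matrix for control u;
    all platforms update in parallel against the previous group summary.
    """
    inv_prior = np.linalg.inv(P_prior)
    n = len(impact_fns)
    controls = [cands[0] for cands in candidate_controls]      # initialise u_{k,0}
    for _ in range(max_iters):
        impacts = [impact_fns[i](controls[i]) for i in range(n)]
        group = sum(impacts)                                    # group summary (chain in the paper)
        new_controls = []
        for i in range(n):
            others = group - impacts[i]                         # inverse operator, eq. (31)
            best = max(candidate_controls[i],
                       key=lambda u: np.linalg.det(inv_prior + others + impact_fns[i](u)))
            new_controls.append(best)
        if new_controls == controls:                            # convergence check
            break
        controls = new_controls
    return controls

# Toy problem: each control points a 1-D sensor along one axis of a 2-D state.
def make_impact_fn(noise):
    H = {"x": np.array([[1.0, 0.0]]), "y": np.array([[0.0, 1.0]])}
    return lambda u: H[u].T @ H[u] / noise

P_prior = np.diag([4.0, 1.0])
fns = [make_impact_fn(0.5), make_impact_fn(2.0)]   # one strong and one weak sensor
cands = [["x", "y"], ["x", "y"]]
print(jacobi_coordination(P_prior, fns, cands))
# -> ['x', 'y']: the stronger sensor covers the more uncertain axis.
```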
The benefits of formulating and solving the group optimal control problem in this fashion are:

• The computational requirements per iteration for each platform are independent of the size of the group $n$.

• The number of messages each platform must send is independent of $n$.

• The size of each message each platform must send is independent of $n$.

• The total number of messages the group must send is only $O(n)$.

• The number of iterations required to converge is only dependent on the inherent complexity of the problem and not the size of the group.

The last item can be demonstrated by considering a scenario where the objective function is fully separable but is to be solved using the above iterative method. The optimal solution will be found in a single iteration regardless of the size of the team. This property also holds when only a few members' actions are coupled; in this situation, the solution time will only depend on this small subgroup and not the full size of the group.

The presented algorithm requires synchronous operation over a chain network. However, it can be extended to allow asynchronous updates over an arbitrary acyclic network. This will also overcome the only dependence the algorithm has on the number of platforms: the time required for messages to be passed throughout the network. However, these extensions are out of scope and will be left for future work.

Figure 1: Simple communication process in a chain of 3 platforms. [The individual impacts ${}^1\alpha_{k,l}$, ${}^2\alpha_{k,l}$, ${}^3\alpha_{k,l}$ are accumulated along the chain (${}^{1,2}\alpha_{k,l}$, then ${}^T\alpha_{k,l} = {}^{1,2,3}\alpha_{k,l}$) and the total is then propagated back.]

Table 1: Example procedure for each platform $i$ to solve the group optimal control problem specified in (16).

1. Initialise ${}^iu_{k,0}$ and set $l = 0$.
2. Evaluate ${}^i\alpha_{k,l} = {}^i\Upsilon_k({}^iu_{k,l})$ and send to neighbours.
3. Wait to receive ${}^T\alpha_{k,l}$.
4. Obtain ${}^{\bar{\imath}}\alpha_{k,l}$ by removing the previously combined component from the group summary using (31).
5. Compute ${}^iu_{k,l+1}$ using equation (30).
6. Check for convergence: $\|{}^iu_{k,l+1} - {}^iu_{k,l}\| <$ threshold.
   - No: set $l = l + 1$ and go to 2.
   - Yes: stop iterating.

3.5 Decentralised Data Fusion

For all cases considered the objective function is dependent on the prior belief $P(x_{k-1} \mid Z_{k-1}, Y_{k-1})$, which must be synchronised among the platforms before the control algorithm can start. It is assumed that this is accomplished using standard DDF techniques communicating over the same acyclic network. See Makarenko et al. [10] for a general formulation of the DDF algorithm with specific formulations of the Gaussian and certainty grid cases.

For the Gaussian case the decentralised control algorithm has the same form as a decentralised data fusion (DDF) algorithm [10, 11], which requires each sensor platform to propagate the information matrix ${}^iI_k$ through the network after an observation is made (the information vector component, specified in (21), is also communicated). This works for predicting the future expected posterior entropy because, as shown in (22), it is independent of the future observation and can be predicted deterministically. In general this decentralised control algorithm requires additional information to be communicated beyond simply the likelihoods from future observations.

4 Indoor Mapping

To demonstrate how it is possible to create a decentralised control scheme for a non-Gaussian problem, a multi-platform indoor mapping scenario is considered. This problem has attracted a lot of research interest; however, all previous solutions require some form of centralised controller [4].

A group of $n$ mobile sensor platforms equipped with laser range scanners and access to external localisation is considered. The full multi-platform exploration scenario requires the joint localisation problem to be solved concurrently; however, only the mapping component is considered here.
4.1 Belief Representation
The indoor environment will be represented using an
occupancy grid (OG). This provides a simple representation that can be used for path planning [12]. An
OG is a binary random field where the state of each
cell is assumed independent and encodes the probability of the corresponding region in space containing an
obstacle.
The state of interest is the binary vector $x = [x_1, x_2, \ldots, x_N]^T$, where $x_s \in \mathcal{X}_s = \{OCC, EMP\}$, $\forall s \in \{1, \ldots, N\}$, and $N$ is the number of cells in the grid. The environment is assumed static and hence the time index $k$ has been replaced with a cell index. For simplicity the notation $P(x_s)$ will mean $P(x_s = OCC)$, and hence $P(x_s = EMP) = 1 - P(x_s)$.
Due to the independence assumption, the belief can be maintained separately for each cell, $P(x) = P(x_1)P(x_2)\cdots P(x_N)$. The entropy of the OG is the sum of the entropies of the independent distributions over the cells, $H_x = \sum_{s=1}^{N} H_{x_s}$, where the entropy of a cell is $H_{x_s} = -P(x_s)\log P(x_s) - \big(1 - P(x_s)\big)\log\big(1 - P(x_s)\big)$.
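As a small illustration (the grid values below are invented), the per-cell entropies and the total map entropy follow directly from the occupancy probabilities:

```python
import numpy as np

def cell_entropy(p_occ, eps=1e-12):
    """Entropy of a binary OG cell with occupancy probability p_occ."""
    p = np.clip(np.asarray(p_occ, dtype=float), eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

# Unknown cells (p = 0.5) dominate the map entropy; near-certain cells contribute ~0.
grid = np.array([0.5, 0.5, 0.05, 0.95, 0.5])   # occupancy probabilities P(x_s)
print(cell_entropy(grid))        # per-cell entropies H_{x_s}
print(cell_entropy(grid).sum())  # total map entropy H_x
```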
4.2 Observation Models
It is important to construct an observation model that
can be used to efficiently evaluate the expected utility
of future observations, as opposed to models for accurately fusing data after an observation is made. In
general the same model should be used for both, however this often proves computationally infeasible and
will not lead to a partially separable group objective
function.
For a low noise sensor, such as a laser range scanner, it is assumed the sensor is deterministic. Thus for any observation, each cell will be fully observed (being either occupied or empty) or left unobserved. The notation ${}^iz_k(s)$ will represent the outcome of the $i$th platform's observation at time $k$ with respect to cell $s$, i.e. ${}^iz_k(s) \in \mathcal{Z}_i(s) = \{OCC, EMP, UNO\}$. This simple model causes the posterior of each cell to be constrained to three values, corresponding to the cell being known to be occupied, known to be empty, or remaining the same as the prior (see Fig. 2).

Figure 2: Each platform is equipped with a perfect sensor that will either fully observe the state of a given cell (${}^iz_k(s) = EMP$ or $OCC$) or will not observe the cell at all (${}^iz_k(s) = UNO$). As more sensor platforms are added to the planning process the possible posteriors will always be constrained to three per OG cell: $P(x_s \mid {}^iz_k) = 1$ if ${}^iz_k(s) = OCC$, $P(x_s \mid {}^iz_k) = P(x_s)$ if ${}^iz_k(s) = UNO$, and $P(x_s \mid {}^iz_k) = 0$ if ${}^iz_k(s) = EMP$.

If all platforms employ a similar deterministic observation model, the posterior of each cell after all platform observations is constrained to one of these three values. However, two of these have the same entropy (if the state of a cell is known, $P(x_s) = 0$ or $1$, the entropy is zero), and the other has the same entropy as the prior. Thus for the goal of calculating the expected posterior entropy, the observations ${}^iz_k(s) = OCC, EMP$ will be considered together. Thus the observation space $\mathcal{Z}_i(s)$ will be partitioned into $\mathcal{Z}_i(\tilde{s}) = \{OBS, UNO\}$, where $OBS = \{OCC \cup EMP\}$ and represents the state of the cell being observed. The full observation model is

$P({}^iz_k(\tilde{s}) = OBS \mid x, {}^iy_k) = \begin{cases} \phi({}^iy_k) & \text{if } x_m = EMP, \; \forall m \in \mathrm{ray}[{}^iy_k, s) \\ 0 & \text{else} \end{cases}$
$P({}^iz_k(\tilde{s}) = UNO \mid x, {}^iy_k) = 1 - P({}^iz_k(\tilde{s}) = OBS \mid x, {}^iy_k) \quad (32)$

where $\phi({}^iy_k) \in \{0, 1\}$ includes geometric sensor constraints such as range and field of view, and $\mathrm{ray}[{}^iy_k, s)$ represents the cells between the sensor and up to but not including the cell $s$. Thus a cell will be observed if it is within the sensor range and field of view and all the cells in between are empty.

4.3 Group Utility

The group objective function is given by the expected posterior entropy of each cell, summed over all cells

$J_k(u_k) = -\sum_{s=1}^{N} \sum_{z_k(\tilde{s}) \in \mathcal{Z}(\tilde{s})} P(z_k(\tilde{s}) \mid Z_{k-1}, Y_k) \times H_{x_s \mid Z_k, Y_k}(z_k(\tilde{s}), u_k). \quad (33)$

The expectation is taken over all possible group observations $z_k(\tilde{s}) = \{{}^1z_k(\tilde{s}), \ldots, {}^nz_k(\tilde{s})\} \in \mathcal{Z}(\tilde{s}) = \prod_{i=1}^{n} \mathcal{Z}_i(\tilde{s})$. The probability $P(z_k(\tilde{s}) \mid Z_{k-1}, Y_k)$ is the joint probability that the multiple platforms will detect or not detect the state of a given cell $s$ and is given by

$P(z_k(\tilde{s}) \mid Z_{k-1}, Y_k) = \sum_{x \in \mathcal{X}} P(x \mid Z_{k-1}, Y_{k-1}) \prod_{i=1}^{n} P({}^iz_k(\tilde{s}) \mid x, {}^iy_k). \quad (34)$

If the sensor platforms are to observe the same cell from slightly different directions, then due to the independence assumption of the OG, the observations ${}^iz_k(\tilde{s})$ made by the platforms will be independent. This occurs because the observations of each of the platforms, defined in (32), will only be dependent on independent components of the state $x$ (i.e. $\mathrm{ray}[{}^iy_k, s) \cap \mathrm{ray}[{}^jy_k, s) = \emptyset$). Thus the joint probability becomes the product of the marginals

$P(z_k(\tilde{s}) \mid Z_{k-1}, Y_k) = \prod_{i=1}^{n} P({}^iz_k(\tilde{s}) \mid Z_{k-1}, Y_{k-1}, {}^iy_k). \quad (35)$

This allows each platform to independently calculate the probability of its observations. These could be obtained using

$P({}^iz_k(\tilde{s}) \mid Z_{k-1}, Y_{k-1}, {}^iy_k) = \sum_{x \in \mathcal{X}} P(x \mid Z_{k-1}, Y_{k-1}) P({}^iz_k(\tilde{s}) \mid x, {}^iy_k). \quad (36)$

However, this is where the assumptions made in the OG cause a full probabilistic analysis to produce unrealistic predictions (see Fig. 3 for a comparison). Hence, the probability $P({}^iz_k(\tilde{s}) \mid Z_{k-1}, Y_{k-1}, {}^iy_k)$ will be empirically modelled from the statistics of previous observations [13].

Using (33) and (35), the expected utility becomes

$J_k(u_k) = -\sum_{s=1}^{N} \sum_{z_k(\tilde{s}) \in \mathcal{Z}(\tilde{s})} \prod_{i=1}^{n} P({}^iz_k(\tilde{s}) \mid Z_{k-1}, Y_{k-1}, {}^iy_k) \times H_{x_s \mid Z_k, Y_k}(z_k(\tilde{s}), u_k). \quad (37)$

This can be further simplified by noting that if a cell is observed by any of the platforms (${}^iz_k(\tilde{s}) = OBS$ for any $i \in \{1, \ldots, n\}$) it will have zero posterior entropy, while if it is unobserved by the entire group, its entropy will remain the same as its prior $H_{x_s \mid Z_{k-1}, Y_{k-1}}$. Thus the objective function becomes

$J_k(u_k) = -\sum_{s=1}^{N} \prod_{i=1}^{n} {}^ip_k(s, {}^iu_k) \, H_{x_s \mid Z_{k-1}, Y_{k-1}} \quad (38)$

where ${}^ip_k(s, {}^iu_k) = P({}^iz_k(\tilde{s}) = UNO \mid Z_{k-1}, Y_{k-1}, {}^iy_k)$.
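The sketch below evaluates (38) for a toy grid; the per-platform non-observation probabilities stand in for the empirically modelled ${}^ip_k(s, {}^iu_k)$ and are invented for the example. Since a cell contributes its prior entropy only when every platform misses it, the per-cell product over platforms is exactly the quantity reused as the group impact in Section 4.4.

```python
import numpy as np

def og_objective(prior_entropy, p_unobserved):
    """Group objective in the form of eq. (38).

    prior_entropy:  length-N array of prior cell entropies H_{x_s}.
    p_unobserved:   n x N array, entry [i, s] = ^i p_k(s, ^i u_k), the
                    probability platform i does NOT observe cell s.
    """
    group_miss = np.prod(p_unobserved, axis=0)   # product over platforms, per cell
    return -float(np.sum(group_miss * prior_entropy))

# Three unknown cells (entropy log 2); two platforms with overlapping coverage.
H_prior = np.full(3, np.log(2.0))
p_uno = np.array([[0.1, 0.9, 0.9],    # platform 1 mostly observes cell 0
                  [0.9, 0.2, 0.9]])   # platform 2 mostly observes cell 1
print(og_objective(H_prior, p_uno))   # closer to zero the more the group expects to observe
```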
4.4 Decentralised Optimal Control

The group objective function (38) has the required partially separable form defined in Section 3.2. This can be demonstrated by letting the impact space $S$ be the bounded $N$ dimensional space $[0, 1]^N$, where an element corresponds to the probability of the platform not observing each cell in the OG. Thus

${}^i\alpha_k = {}^i\Upsilon_k({}^iu_k) = \big[{}^ip_k(1, {}^iu_k), \ldots, {}^ip_k(N, {}^iu_k)\big]^T. \quad (39)$

The composition operator is given by element-wise multiplication and thus the group impact is

${}^T\alpha_k = \Big[\prod_{i=1}^{n} {}^ip_k(1, {}^iu_k), \ldots, \prod_{i=1}^{n} {}^ip_k(N, {}^iu_k)\Big]^T. \quad (40)$

By letting ${}^T\alpha_k(s)$ be the $s$th element of the group impact vector, the group objective function, in the partially separable form, is

$J_k(u_k) = \psi_k({}^T\alpha_k) = -\sum_{s=1}^{N} {}^T\alpha_k(s) \, H_{x_s \mid Z_{k-1}, Y_{k-1}}. \quad (41)$

Thus in theory the group optimal control problem can be solved using the decentralised method outlined in Section 3.4.

Figure 3: The probability of a cell being observed, $P({}^iz_k(\tilde{s}) = OBS \mid Z_{k-1}, Y_{k-1}, {}^iy_k)$, versus distance from the sensor. The black line is for the OG model with a prior probability of occupancy of 0.5; the grey line was experimentally determined for the environment given in Fig. 6.

Figure 4: (a) OG map representing the current group belief; the sensor platforms are shown in blue and grey, and the red squares are the optimal strategies determined by the group. (b) Corresponding group impact ${}^T\alpha_k$ displayed on the same grid as the OG; white is high probability that the group will observe the cell and grey is low probability.

5 Experiment

Previous developments assumed the look ahead was until the next observation. However, for an exploration type scenario, a much larger horizon is required (e.g. where to go next after exploring a room). In general this can be overcome by using a group objective function related to the expected posterior entropy after $\kappa$ steps.

However, this introduces the problem of what is the appropriate horizon for the group. This is a complex problem, as an appropriate look ahead for one platform may not be right for another. For example, a platform that has just finished exploring a room will require a much larger look ahead than one exploring a corridor. In more general situations some platforms may have less processing capability than others and are unable to plan a long way into the future.

To overcome this problem, each platform will employ its own variable planning horizon. The group objective function then becomes the expected rate of uncertainty reduction in the map obtained by the group undertaking a joint strategy $v_k$,

$J_k(v_k) = E[-\Delta H / \Delta t] \quad (42)$

where the group control strategy is $v_k = \{{}^1v_k, \ldots, {}^nv_k\}$ and the strategy for platform $i$ is given by the set of controls ${}^iv_k = \{{}^iu_{k:k+{}^i\kappa}, {}^i\kappa\}$.

Figure 5: Mobile robots Hornet, Buzz and Mozzy used for the experiments.

5.1 Planning and Observation Simplifications

The decentralised block iterative algorithm presented in Section 3.4 remains the same except that control actions ${}^iu_k$ are replaced with control strategies ${}^iv_k$. To limit the space of possible strategies, only paths to frontier cells are considered. Also, only observations made at the frontier will have a significant impact on the expected posterior entropy, and hence only these are included in the expectation for the group objective function. Figure 4 shows the results of a simple two platform scenario.
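As a rough illustration of (42) under the frontier simplification, a candidate strategy can be scored by its expected entropy reduction divided by the time needed to execute it; the numbers and helper names below are invented and not results from the paper.

```python
def strategy_rate(expected_entropy_reduction, travel_time):
    """Score a candidate strategy by the expected rate of uncertainty
    reduction, J(v) = E[-dH/dt], in the spirit of eq. (42)."""
    return expected_entropy_reduction / travel_time

# Candidate frontier paths for one platform: (expected entropy reduction, time to reach the frontier).
candidates = {"near_frontier": (4.0, 5.0),    # small gain, quickly reached
              "far_room":      (30.0, 20.0)}  # large gain, long traverse
best = max(candidates, key=lambda v: strategy_rate(*candidates[v]))
print(best)  # -> 'far_room' (rate 1.5 > 0.8)
```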
5.2 Results

The system was tested with the group of 3 Pioneer robots shown in Fig. 5. Localisation was performed using highly reflective laser beacons with known positions. Figure 6 shows the results of the group exploring an indoor office environment. The environment is approximately 20 x 30 m at a discretisation of 10 cm.

Figure 6: Progress made by the group throughout the mission. Each panel represents the combination of previous and future observations from the group (see Fig. 4).
At each stage shown, the group summary ${}^T\alpha_k$ has been combined with the current OG. This combined
map represents the expected belief given all previous
and possible future observations by the group. The
behaviour is consistent across the group, with each
vehicle’s strategy taking into account the strategy of
all other platforms. The task is considered completed
when no available actions will reduce the map uncertainty.
6 Conclusions
This paper has presented the general multi-platform
information gathering problem. This can be solved in
a scalable decentralised manner if the group objective
function can be written in a partially separable form.
Two example scenarios were shown that result in the required objective function, the first using the Gaussian representation and the second an indoor occupancy grid mapping problem. The main advantage of this form is that the resulting solution complexity and platform bandwidth requirements are independent of the number of platforms in the group.
Although only the information gathering scenario was considered, this method can be applied to any group control problem whose objective function is partially separable. A good example of this is the cooperative search problem described by Bourgault in [14], which it is believed can also be formulated with a partially separable group objective function.
Acknowledgments

This work is partly supported by the ARC Centre of Excellence programme, funded by the Australian Research Council (ARC) and the New South Wales State Government, and by CSIRO Industrial Physics.

References

[1] D. V. Pynadath and M. Tambe, "Multiagent teamwork: Analyzing the optimality and complexity of key theories and models," in Proc. Int. Conf. of Autonomous Agents and Multiagent Systems, 2002.

[2] T. Furukawa, G. Dissanayake, and H. F. Durrant-Whyte, "Time-optimal cooperative control of multiple robot vehicles," in Proc. IEEE Int. Conf. on Robotics and Automation, 2003.

[3] A. Richards and J. P. How, "Aircraft trajectory planning with collision avoidance using mixed integer linear programming," in Proc. of the American Control Conf., 2002.

[4] W. Burgard, M. Moors, C. Stachniss, and F. Schneider, "Coordinated multi-robot exploration," IEEE Trans. on Robotics, vol. 21, no. 3, pp. 376–386, June 2005.

[5] S. Waslander, G. Inalhan, and C. J. Tomlin, "Decentralized optimisation via Nash bargaining," in Proc. Conf. Cooperative Control and Optimisation, 2003.

[6] B. Grocholsky, "Information-theoretic control of multiple sensor platforms," Ph.D. dissertation, Univ. of Sydney, Australia, 2002.

[7] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.

[8] P. S. Maybeck, Stochastic Models, Estimation and Control. New York: Academic Press, 1979–1982.

[9] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, 1989.

[10] A. Makarenko and H. F. Durrant-Whyte, "Decentralized data fusion and control algorithms in active sensor networks," in Proc. Int. Conf. on Information Fusion, 2004, pp. 479–486.

[11] J. Manyika and H. F. Durrant-Whyte, Data Fusion and Sensor Management: A Decentralized Information-Theoretic Approach. Ellis Horwood, 1994.

[12] A. Elfes, "Occupancy grids: A probabilistic framework for robot perception and navigation," Ph.D. dissertation, Carnegie Mellon University, 1989.

[13] W. Burgard, M. Moors, R. Simmons, and S. Thrun, "Collaborative multi-robot exploration," in Proc. IEEE Int. Conf. on Robotics and Automation, San Francisco, CA, USA, Apr. 2000, pp. 476–481.

[14] F. Bourgault, T. Furukawa, and H. F. Durrant-Whyte, "Decentralized Bayesian negotiation for cooperative search," in Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2004.