Preprints of the 18th IFAC World Congress, Milano (Italy), August 28 - September 2, 2011
An Information-Theoretic Approach to Distributed State Estimation
Giorgio Battistelli, Luigi Chisci, Stefano Morrocchi, Francesco Papi
Dipartimento di Sistemi e Informatica DSI - Università di Firenze, Via S. Marta 3, 50139 Firenze, Italy
(e-mail: {giorgio.battistelli, luigi.chisci}@unifi.it)
Abstract: It is shown that the covariance intersection fusion rule, widely used in the context of
distributed estimation, has a nice information-theoretic interpretation in terms of consensus on
the Kullback-Leibler average of Gaussian probability density functions (PDFs). Based on this
observation, a novel distributed state estimator based on the consensus among local posterior
PDFs is proposed and its stability properties are analyzed.
Keywords: Distributed state estimation; sensor fusion; Kalman filters; networks; information
theory.
1. INTRODUCTION
A challenging research topic is to develop efficient techniques for distributed information fusion in multi-agent
systems, i.e. networks consisting of multiple nodes (agents)
with local data acquisition, processing and communication
capabilities. Such networks are widely employed in modern
automation and supervision systems in both industrial and
other contexts (e.g., environmental monitoring) and pose
interesting problems that need a thorough investigation
from both a theoretical and a practical point of view.
A still open issue is how to effectively counteract the
deleterious effects of “data incest” that occur even in the
simple case in which multiple inter-communicating agents,
without any centralized coordination and according to an
unknown and possibly time-varying network topology, try
to estimate the state of the same linear dynamical process;
in such a case, due to the correlation among the estimates
of different agents, one commonly observes a remarkable
performance degradation with respect to the centralized
Kalman estimator or even, if the target process is unstable,
divergence of the estimation error. A
key objective to be pursued is, therefore, to devise, with
reference to the state estimation of a linear process, distributed state estimation algorithms (to be implemented
in each agent) that guarantee for each agent, under the
assumptions of process observability from the whole network (but not necessarily from the individual agents) and
network connectivity, bounded estimation error as close as
possible to the one obtained with the optimal centralized
estimator. Among the approaches devised to counteract
the data incest phenomenon, the consensus filters (Xiao et al., 2005;
Olfati-Saber et al., 2007) and the covariance intersection fusion rule
(Julier and Uhlmann, 1997; Chen et al., 2002) deserve special attention. Consensus
filters provide a general tool for distributed averaging; in
the context of distributed state estimation, they have been
exploited for averaging innovations (Olfati-Saber, 2007;
Kamgarpour and Tomlin, 2008) or state estimates (Alriksson and Rantzer, 2006; Carli et al., 2008; Olfati-Saber,
2009; Stankovic et al., 2009). The covariance intersection
fusion adopts an information filtering approach, by propagating the information (inverse covariance) matrix and
the information vector defined via premultiplication of the
state estimate by the information matrix, and computes
the fused information pair as a convex combination of the
local information pairs. Other relevant approaches to distributed state estimation include those based on moving-horizon estimation (Farina et al., 2010) and diffusion processes (Cattivelli and Sayed, 2010).
In this paper, we follow an information-theoretic approach
by formulating a consensus problem among the probability
density functions (PDFs) of the state vector to be estimated and by defining the average PDF as the one that
minimizes the sum of the information gains from the initial
PDFs. In particular, it is shown that in the linear Gaussian
case and for a single consensus step this just provides
the covariance intersection fusion which, therefore, can
be interpreted as a single-step consensus on local PDFs.
This consideration also suggests how multistep consensus
can be exploited to possibly improve performance of the
covariance intersection fusion. Another contribution of this
paper is to show that, under weak network observability
and connectivity assumptions as well as for any number of
consensus steps, distributed state estimation with consensus on the posterior PDFs guarantees bounded estimation
error covariance in all network nodes. All the proofs are
omitted due to space constraints.
2. CONSENSUS ON THE KULLBACK-LEIBLER AVERAGE
Consider a network consisting of $N$ nodes. Using graph
formalism, the communication structure among the nodes
is represented by a directed graph $\mathcal{G} = (\mathcal{N}, \mathcal{E})$, where
$\mathcal{N} := \{1, 2, \ldots, N\}$ is the set of nodes and $\mathcal{E} \subseteq \mathcal{N} \times \mathcal{N}$
is the set of links. In particular, it is supposed that $(j, i)$
belongs to $\mathcal{E}$ if and only if node $j$ can communicate with
node $i$ (by definition, $(i, i) \in \mathcal{E}$ for any $i \in \mathcal{N}$). Further, let
$\mathcal{N}^i = \{j \in \mathcal{N} : (j, i) \in \mathcal{E}\}$ denote the set of neighbors of
node $i$ (note that, by definition, $i$ always belongs to $\mathcal{N}^i$).
Finally, let $|\mathcal{N}^i|$ be the cardinality of $\mathcal{N}^i$ (i.e., the in-degree
of node $i$ plus one).
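To make the network model concrete, the following minimal Python sketch (illustrative names, not from the paper) encodes the directed graph as neighbor sets with the mandatory self-loops, so that $|\mathcal{N}^i|$ equals the in-degree of node $i$ plus one:

```python
# A minimal sketch of the network model above (illustrative names).
# Nodes are 1..N; an edge (j, i) means node j can send data to node i.
from typing import Dict, Set, Tuple

def neighbor_sets(N: int, edges: Set[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Return N^i = {j : (j, i) in E}, with the self-loop (i, i) always included."""
    nbrs = {i: {i} for i in range(1, N + 1)}   # (i, i) in E by definition
    for (j, i) in edges:
        nbrs[i].add(j)
    return nbrs

# Example: a 4-node directed ring; |N^i| = in-degree of i plus one.
nbrs = neighbor_sets(4, {(1, 2), (2, 3), (3, 4), (4, 1)})
assert all(i in nbrs[i] for i in nbrs)          # self-loops present
print({i: len(Ni) for i, Ni in nbrs.items()})   # {1: 2, 2: 2, 3: 2, 4: 2}
```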
Suppose that, in each node $i$ of the network, a PDF $p^i(\cdot)$
is available representing the local information on some
random vector $x \in \mathbb{R}^{n_x}$ of interest. For example, such
PDFs can be obtained via statistical inference or can be
the result of some recursive Bayesian estimation algorithm
(as will be discussed in the following sections). Here, it
is supposed that all the local PDFs belong to the same
parametric family $\mathcal{P} = \{p(\cdot) = g(\cdot; \theta),\ \theta \in \Theta \subset \mathbb{R}^{n_\theta}\}$, where
$g$ is a given function and $\theta$ a parameter vector completely
characterizing the PDF. In other words, each local PDF
can be written as $p^i(\cdot) = g(\cdot; \theta^i)$ for some vector $\theta^i \in \Theta$.
The aim of this section is to investigate whether it is possible
to devise a suitable consensus algorithm guaranteeing
that all the nodes of the network reach an agreement
regarding the PDF of x. Here, by consensus algorithm
we mean a distributed interaction rule specifying: i) the
information exchange between a node and its neighbors;
ii) the mechanism for updating the local PDF on the basis
of the received information.
In particular, the focus is on consensus algorithms that
preserve the shape of the PDFs so that all the updated
local PDFs always belong to the parametric family $\mathcal{P}$. To
this end, let $p^i_{[\ell]}(\cdot) \in \mathcal{P}$ denote the PDF available at node
$i$ at the $\ell$-th iteration of the consensus algorithm (where
$p^i_{[0]}(\cdot) = p^i(\cdot)$) and let $\theta^i_{[\ell]}$ be the corresponding parameter
vector. Then, the following average consensus problem is
addressed.
Problem 1. Find a consensus algorithm such that
$$\lim_{\ell \to \infty} \theta^i_{[\ell]} = \bar{\theta} \,, \quad \forall i \in \mathcal{N} \,,$$
where the asymptotic PDF $\bar{p}(\cdot) = g(\cdot; \bar{\theta})$ represents the
average (in some sense) of the initial PDFs $p^i(\cdot) = g(\cdot; \theta^i)$, $i \in \mathcal{N}$.
Clearly, the first important issue to be addressed is how
to define the average PDF $\bar{p}(\cdot) = g(\cdot; \bar{\theta})$ given the initial
PDFs $p^i(\cdot) = g(\cdot; \theta^i)$, $i \in \mathcal{N}$. To this end, it is convenient
to recall that, given a set of points $\{x^1, \ldots, x^N\}$ belonging
to $\mathbb{R}^n$, their average $\bar{x} = \frac{1}{N} \sum_{i=1}^N x^i$ also satisfies the variational
property of minimizing the mean squared distance
from such points, i.e.,
$$\bar{x} = \arg\min_{x \in \mathbb{R}^n} \frac{1}{N} \sum_{i=1}^N \|x - x^i\|^2 \qquad (1)$$
where $\|\cdot\|$ denotes the Euclidean norm. As is well known,
there exist many ways of measuring the distance between
two PDFs $p^i(\cdot)$ and $p^j(\cdot)$. From the information-theoretic
point of view, the most typical choice corresponds to
the Kullback-Leibler divergence (KLD), or relative entropy,
defined as
$$D_{KL}(p^i \| p^j) = \int p^i(x) \log \frac{p^i(x)}{p^j(x)} \, dx \,.$$
The KLD has many meaningful interpretations; for instance, in Bayesian statistics, it can be seen as the information gain achieved when moving from a prior PDF
$p^j(\cdot)$ to a posterior PDF $p^i(\cdot)$. It is also worth noting that
the KLD can be seen as the equivalent of the squared
Euclidean distance in the space of the PDFs; in fact, they
are both examples of Bregman divergences (Banerjee et al.,
2005). In this connection, taking into account (1), it seems
quite natural to introduce the following notion.
Definition 1. The Kullback-Leibler average (KLA) of $N$
PDFs $p^i(\cdot)$ belonging to the parametric family $\mathcal{P}$ is the
PDF
$$\bar{p} = \arg \inf_{p \in \mathcal{P}} \frac{1}{N} \sum_{i=1}^N D_{KL}\left( p \,\|\, p^i \right) \,.$$
According to Definition 1, the average PDF is the one that
minimizes the sum of the information gains from the initial
PDFs. Thus, this choice is coherent with the Principle of
Minimum Discrimination Information (PMDI) according
to which the PDF which best represents the current state
of knowledge is the one which produces an information
gain as small as possible (the interested reader is referred
to (Campbell, 1970; Akaike, 1973) for a discussion on
such a principle and its relation with Gauss’ principle
and maximum likelihood estimation) or, in other words,
“the probability assignment which most honestly describes
what we know should be the most conservative assignment
in the sense that it does not permit one to draw any
conclusions not warranted by the data.” (Jaynes, 2003)
In the remainder of this section, for the sake of brevity,
the attention will be restricted to a particular parametric
family of PDFs, namely, normal distributions. However,
similar considerations could also be made for all those
parametric families for which the KLD can be computed in
closed form (such families include those belonging to the
exponential class of density functions like the exponential
distribution, the binomial distribution, etc.).
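For two Gaussian PDFs, for instance, the KLD admits the well-known closed form $D_{KL}(\Phi(\cdot;\mu^i,\Sigma^i) \,\|\, \Phi(\cdot;\mu^j,\Sigma^j)) = \frac{1}{2}[\mathrm{tr}((\Sigma^j)^{-1}\Sigma^i) + (\mu^j-\mu^i)^\top (\Sigma^j)^{-1} (\mu^j-\mu^i) - n_x + \log(\det\Sigma^j/\det\Sigma^i)]$, which the following sketch evaluates (illustrative code, not from the paper):

```python
import numpy as np

def kld_gauss(mu_i, Sig_i, mu_j, Sig_j):
    """Closed-form KLD D_KL(N(mu_i, Sig_i) || N(mu_j, Sig_j))."""
    n = mu_i.shape[0]
    Sj_inv = np.linalg.inv(Sig_j)
    d = mu_j - mu_i
    return 0.5 * (np.trace(Sj_inv @ Sig_i) + d @ Sj_inv @ d - n
                  + np.log(np.linalg.det(Sig_j) / np.linalg.det(Sig_i)))

# The KLD is asymmetric and vanishes iff the two PDFs coincide.
mu1, S1 = np.array([0.0, 0.0]), np.eye(2)
mu2, S2 = np.array([1.0, 0.0]), 2.0 * np.eye(2)
print(kld_gauss(mu1, S1, mu2, S2), kld_gauss(mu2, S2, mu1, S1))
```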
Let now all the local PDFs $p^i(\cdot)$ take the form
$$p^i(x) = \Phi(x; \mu^i, \Sigma^i) = \frac{1}{\sqrt{(2\pi)^{n_x} \det(\Sigma^i)}} \, e^{-\frac{1}{2}(x - \mu^i)^\top (\Sigma^i)^{-1} (x - \mu^i)} \qquad (2)$$
where $\mu^i \in \mathbb{R}^{n_x}$ is the mean and $\Sigma^i \in \mathbb{R}^{n_x \times n_x}$ is the (positive
definite) covariance matrix. Then, the following result can
be stated.
Lemma 1. The KLA of $N$ Gaussian PDFs as in (2) takes
the form $\bar{p}(\cdot) = \Phi(\cdot; \bar{\mu}, \bar{\Sigma})$ where the mean $\bar{\mu}$ and the
covariance matrix $\bar{\Sigma}$ can be computed from the algebraic
equations
$$\bar{\Sigma}^{-1} = \frac{1}{N} \sum_{i=1}^N (\Sigma^i)^{-1} \,, \qquad (3)$$
$$\bar{\Sigma}^{-1} \bar{\mu} = \frac{1}{N} \sum_{i=1}^N (\Sigma^i)^{-1} \mu^i \,. \qquad (4)$$
Recalling that $\Omega^i := (\Sigma^i)^{-1}$ and $q^i := (\Sigma^i)^{-1} \mu^i$ are the
information matrix and information vector, respectively,
associated with the Gaussian PDF (2), Lemma 1 states
that the KLA of $N$ Gaussian PDFs can be simply obtained
by averaging their information matrices and information
vectors. An important consequence of this state of affairs
is that Problem 1 can be readily solved by applying one
of the many existing consensus algorithms for distributed
averaging of real numbers.
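As a concrete illustration of Lemma 1 (a minimal sketch, not code from the paper), the KLA of $N$ Gaussians can be computed by averaging the information pairs as in (3)-(4):

```python
import numpy as np

def kla_gauss(mus, Sigs):
    """KLA of N Gaussian PDFs via (3)-(4): average the information pairs."""
    Omegas = [np.linalg.inv(S) for S in Sigs]        # Omega^i = (Sigma^i)^{-1}
    qs = [Om @ mu for Om, mu in zip(Omegas, mus)]    # q^i = (Sigma^i)^{-1} mu^i
    Om_bar = sum(Omegas) / len(Omegas)               # equation (3)
    q_bar = sum(qs) / len(qs)                        # equation (4)
    Sig_bar = np.linalg.inv(Om_bar)
    return Sig_bar @ q_bar, Sig_bar                  # mean and covariance of the KLA

mus = [np.array([0.0, 1.0]), np.array([2.0, -1.0]), np.array([1.0, 0.0])]
Sigs = [np.eye(2), 2.0 * np.eye(2), np.diag([0.5, 3.0])]
mu_bar, Sig_bar = kla_gauss(mus, Sigs)
print(mu_bar, Sig_bar, sep="\n")
```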
In this connection, the simplest distributed averaging
algorithms consist of updating the local data via convex
combination with the data received by the neighbors (Xiao
et al., 2005; Olfati-Saber et al., 2007). In this case, such a
choice gives rise to the following consensus algorithm:¹
$$\Omega^i_{[\ell+1]} = \sum_{j \in \mathcal{N}^i} \pi^{i,j} \, \Omega^j_{[\ell]} \,, \quad i \in \mathcal{N}, \ \ell = 0, 1, \ldots \qquad (5)$$
$$q^i_{[\ell+1]} = \sum_{j \in \mathcal{N}^i} \pi^{i,j} \, q^j_{[\ell]} \,, \quad i \in \mathcal{N}, \ \ell = 0, 1, \ldots \qquad (6)$$
where the $\pi^{i,j}$ are suitable non-negative weights such that
$\sum_{j \in \mathcal{N}^i} \pi^{i,j} = 1$ for all $i \in \mathcal{N}$. Clearly, the recursion is
initialized by letting $\Omega^i_{[0]} = \Omega^i$ and $q^i_{[0]} = q^i$ for all $i \in \mathcal{N}$.

¹ Along the lines of Lemma 1, it can be seen that the update rules (5) and (6) correspond to computing a weighted KLA between the local PDF and the PDFs of the neighbors.
Let us now denote by $\Pi$ the consensus matrix whose
generic $(i,j)$-element coincides with the consensus weight
$\pi^{i,j}$ (if $j \notin \mathcal{N}^i$ then $\pi^{i,j}$ is taken as 0). Then, the following
proposition derives from well-known results on distributed
averaging (Xiao et al., 2005; Olfati-Saber et al., 2007).
Proposition 1. Let the consensus matrix $\Pi$ be primitive
and doubly stochastic.² Then, the consensus algorithm
(5)-(6) asymptotically yields the KLA of the initial local
PDFs in that
$$\lim_{\ell \to \infty} \Omega^i_{[\ell]} = \bar{\Omega} := \bar{\Sigma}^{-1} \,, \qquad \lim_{\ell \to \infty} q^i_{[\ell]} = \bar{q} := \bar{\Sigma}^{-1} \bar{\mu} \,.$$

² Recall that a non-negative square matrix $\Pi$ is doubly stochastic if all its rows and columns sum up to 1. Further, it is primitive if there exists an integer $m$ such that all the elements of $\Pi^m$ are strictly positive.
As is well known (Calafiore and Abrate, 2009), a necessary
condition for the matrix $\Pi$ to be primitive is that the
graph $\mathcal{G}$ associated with the sensor network be (strongly)
connected. In this case, a possible choice satisfying the
hypotheses of Proposition 1 is given by the so-called
Metropolis weights (Xiao et al., 2005; Calafiore and Abrate,
2009)
$$\pi^{i,j} = \frac{1}{\max\{|\mathcal{N}^i|, |\mathcal{N}^j|\}} \,, \quad i \in \mathcal{N}, \ j \in \mathcal{N}^i, \ i \neq j \,,$$
$$\pi^{i,i} = 1 - \sum_{j \in \mathcal{N}^i, \, j \neq i} \pi^{i,j} \,.$$
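The following sketch (illustrative, reusing the dict-of-sets network representation and the `kla_gauss` helper sketched above) builds the Metropolis weights for a bidirectional network and runs the consensus recursion (5)-(6); on such a network the Metropolis matrix is doubly stochastic by construction, and, when the graph is also strongly connected, every local information pair converges to the KLA of Lemma 1:

```python
import numpy as np

def metropolis_weights(nbrs):
    """Metropolis weights pi^{i,j}; doubly stochastic on bidirectional graphs."""
    Pi = {i: {} for i in nbrs}
    for i in nbrs:
        for j in nbrs[i]:
            if j != i:
                Pi[i][j] = 1.0 / max(len(nbrs[i]), len(nbrs[j]))
        Pi[i][i] = 1.0 - sum(Pi[i].values())  # self-weight closes the convex combination
    return Pi

def consensus(Omegas, qs, nbrs, L):
    """Run L steps of the recursion (5)-(6) on the local information pairs."""
    Pi = metropolis_weights(nbrs)
    for _ in range(L):
        Omegas = {i: sum(Pi[i][j] * Omegas[j] for j in nbrs[i]) for i in nbrs}
        qs = {i: sum(Pi[i][j] * qs[j] for j in nbrs[i]) for i in nbrs}
    return Omegas, qs

# Bidirectional 3-node chain: after enough steps each node holds (3)-(4).
nbrs = {1: {1, 2}, 2: {1, 2, 3}, 3: {2, 3}}
Sigs = {1: np.eye(2), 2: 2 * np.eye(2), 3: np.diag([0.5, 3.0])}
mus = {1: np.zeros(2), 2: np.array([2.0, -1.0]), 3: np.array([1.0, 0.0])}
Omegas = {i: np.linalg.inv(Sigs[i]) for i in nbrs}
qs = {i: Omegas[i] @ mus[i] for i in nbrs}
Om, q = consensus(Omegas, qs, nbrs, L=50)
print(np.linalg.inv(Om[1]) @ q[1])   # ~ KLA mean, recovered at node 1
```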
3. APPLICATION TO DISTRIBUTED STATE ESTIMATION
In this section, the developments of the previous section
are applied to distributed state estimation. To this end,
consider a discrete-time dynamical system
$$x_{k+1} = f(x_k) + w_k$$
where $k$ is the discrete time index, $x_k \in \mathbb{R}^{n_x}$ is the system
state and $w_k \in \mathbb{R}^{n_x}$ is the process noise. The initial state
$x_0$ of the system is unknown but distributed according to
a known PDF $p_{0|-1}(\cdot)$. It is supposed that the sequence
$\{w_k\}$ is generated by a zero-mean white stochastic process
with known PDF $p_w(\cdot)$. Further, suppose that, at each
time $k = 0, 1, \ldots$, each node of the network collects a
measurement
$$y^i_k = h^i(x_k) + v^i_k$$
of the state vector $x_k$, where $v^i_k \in \mathbb{R}^{n_y}$ is generated by a
zero-mean white stochastic process with known PDF $p_{v^i}(\cdot)$
(the initial state $x_0$ and the sequences $\{w_k\}$ and $\{v^i_k\}$ are
supposed to be mutually independent).
If no communication between the nodes of the network
were possible, the solution of the local state estimation
problem would yield the well-known Bayesian filtering
recursion:
$$p^i_{0|-1}(x) = p_{0|-1}(x) \,, \qquad (7)$$
$$p^i_{k|k}(x) = \frac{p_{v^i}[y^i_k - h^i(x)] \, p^i_{k|k-1}(x)}{\displaystyle\int p_{v^i}[y^i_k - h^i(\xi)] \, p^i_{k|k-1}(\xi) \, d\xi} \,, \qquad (8)$$
$$p^i_{k+1|k}(x) = \int p_w[x - f(\xi)] \, p^i_{k|k}(\xi) \, d\xi \,, \qquad (9)$$
for $k = 0, 1, \ldots$, where $p^i_{k|t}(\cdot)$ represents the PDF of $x_k$
conditioned on all the measurements collected at node $i$
up to time $t$.
If, however, a communication structure is available as
described in the previous section, then each node can
improve its local estimate by fusing the local information
with the one received from its neighbors. More specifically,
if one constrains³ all the local PDFs to belong to a
parametric family $\mathcal{P}$, then one can perform at each time
instant a certain number, say $L$, of consensus steps on
the posterior PDFs $p^i_{k|k}(\cdot)$, $i \in \mathcal{N}$, in order to compute
in a distributed fashion their KLA. This idea leads to
the following Distributed State Estimation Algorithm with
Consensus on the Posteriors (DSE-CP).
Algorithm 1. At each time $k = 0, 1, \ldots$, for each node
$i \in \mathcal{N}$:
(1) collect the local measurement $y^i_k$ and update the local
PDF $p^i_{k|k-1}(\cdot)$ via equation (8) to obtain the local
posterior $p^i_{k|k,[0]}$;
(2) perform $L$ consensus steps to obtain the fused posterior $p^i_{k|k,[L]}$;
(3) compute the local prior $p^i_{k+1|k}$ from $p^i_{k|k} := p^i_{k|k,[L]}$
via equation (9).

³ Notice that in some cases of interest, like the linear-Gaussian one, such an assumption is automatically satisfied. Further, even in a nonlinear and/or non-Gaussian setting, since it is usually not possible to compute the true conditional PDF in closed form, it is quite common to approximate it with a PDF belonging to some fixed parametric family $\mathcal{P}$. For example, when the Extended Kalman Filter (EKF) or the Unscented Kalman Filter (UKF) is used, all the PDFs are approximated as Gaussian.
Remark 1. The rationale for combining the local posteriors according to the KLA paradigm is to counteract the
deleterious effects of the so-called data incest (Smith and
Singh, 2006; Chang et al., 2008). In fact, it is well known
that, when multiple inter-communicating agents try to
estimate the state of the same dynamical process without
any centralized coordination, a remarkable performance
degradation with respect to the centralized estimator (or
even divergence of the estimation error) can occur due to
the possible correlation among the estimates of different
agents and the consequent double-counting of information.
As discussed in the previous section, an information fusion
rule based on the KLA paradigm is expected to avoid such
a double-counting since it follows the PMDI (this intuition
will be formalized in Section 4 for the linear-Gaussian
case).
3.1 The linear-Gaussian case: Covariance Intersection revisited
Suppose now that both the system dynamics and the
measurement equations are linear, i.e., consider a discrete-time linear system
$$x_{k+1} = A x_k + w_k \qquad (10)$$
whose state is measured by $N$ linear sensors
$$y^i_k = C^i x_k + v^i_k \,, \quad i = 1, \ldots, N \,. \qquad (11)$$
Further, let the initial state, the process disturbance and
all the measurement noises be normally distributed, i.e.,
$$p_0(x) = \Phi(x; \hat{x}_{0|-1}, P_{0|-1}) \,, \quad p_w(w) = \Phi(w; 0, Q) \,, \quad p_{v^i}(v^i) = \Phi(v^i; 0, R^i) \,, \ i = 1, \ldots, N \,,$$
where $\hat{x}_{0|-1}$ is a known vector and $P_{0|-1}, Q, R^i$, $i = 1, \ldots, N$, are known positive definite matrices. As is well
known, in this case the Bayesian filtering recursion (7)-(9) admits a closed-form solution in that, for each node $i$,
one has
$$p^i_{k|k}(x) = \Phi(x; \hat{x}^i_{k|k}, P^i_{k|k}) \,, \qquad p^i_{k+1|k}(x) = \Phi(x; \hat{x}^i_{k+1|k}, P^i_{k+1|k})$$
where the mean vectors $\hat{x}^i_{k|k}$ and $\hat{x}^i_{k+1|k}$ and the covariance
matrices $P^i_{k|k}$ and $P^i_{k+1|k}$ can be recursively computed
by means of the Kalman filter recursion. In view of the
developments of the previous section, in order to streamline the presentation as well as the stability analysis, it is
convenient to consider the information form of the Kalman
filter recursion whereby, instead of the mean vectors and
the covariance matrices, the information matrices
$$\Omega^i_{k|k} = (P^i_{k|k})^{-1} \,, \qquad \Omega^i_{k+1|k} = (P^i_{k+1|k})^{-1}$$
and information vectors
$$q^i_{k|k} = \Omega^i_{k|k} \hat{x}^i_{k|k} \,, \qquad q^i_{k+1|k} = \Omega^i_{k+1|k} \hat{x}^i_{k+1|k}$$
are recursively propagated. With this choice, the update
step can be simply written as
$$q^i_{k|k} = q^i_{k|k-1} + (C^i)^\top (R^i)^{-1} y^i_k \,, \qquad (12)$$
$$\Omega^i_{k|k} = \Omega^i_{k|k-1} + (C^i)^\top (R^i)^{-1} C^i \,, \qquad (13)$$
whereas the prediction step takes the form
$$q^i_{k+1|k} = A^{-\top} \left[ I - \Omega^i_{k|k} \left( \Omega^i_{k|k} + A^\top Q^{-1} A \right)^{-1} \right] q^i_{k|k} \,, \qquad (14)$$
$$\Omega^i_{k+1|k} = A^{-\top} \Omega^i_{k|k} A^{-1} - A^{-\top} \Omega^i_{k|k} \left( \Omega^i_{k|k} + A^\top Q^{-1} A \right)^{-1} \Omega^i_{k|k} A^{-1} \,. \qquad (15)$$
Notice that, for equations (14) and (15) to make sense,
the system matrix $A$ has to be invertible. This hypothesis
is automatically satisfied in sampled-data systems wherein
the matrix $A$ is obtained by discretization of a continuous-time system matrix. Further, in the other cases, such a limitation could easily be overcome by considering different
information filter algorithms (e.g., the generalized square-root filter proposed by Chisci and Zappa (1992)). However,
these generalizations are not addressed here in order to
keep the presentation as streamlined as possible.
By exploiting the above-defined information filter as well
as the consensus algorithm (5)-(6), it is possible to specialize Algorithm 1 as follows.
Algorithm 2. At each time $k = 0, 1, \ldots$, for each node
$i \in \mathcal{N}$:
(1) collect the local measurement $y^i_k$ and update the local
information pair $\left( q^i_{k|k-1}, \Omega^i_{k|k-1} \right)$ via equations (12)-(13)
to obtain the local posterior information pair
$\left( q^i_{k|k,[0]}, \Omega^i_{k|k,[0]} \right)$;
(2) perform $L$ consensus steps
$$\Omega^i_{k|k,[\ell+1]} = \sum_{j \in \mathcal{N}^i} \pi^{i,j} \, \Omega^j_{k|k,[\ell]} \,, \quad \ell = 0, \ldots, L-1 \,, \qquad (16)$$
$$q^i_{k|k,[\ell+1]} = \sum_{j \in \mathcal{N}^i} \pi^{i,j} \, q^j_{k|k,[\ell]} \,, \quad \ell = 0, \ldots, L-1 \,, \qquad (17)$$
to obtain the fused information pair $\left( q^i_{k|k}, \Omega^i_{k|k} \right) := \left( q^i_{k|k,[L]}, \Omega^i_{k|k,[L]} \right)$;
(3) compute the local prior information pair
$\left( q^i_{k+1|k}, \Omega^i_{k+1|k} \right)$ from $\left( q^i_{k|k,[L]}, \Omega^i_{k|k,[L]} \right)$ via equations (14)-(15).
In principle, the algorithm should be initialized by letting
$$\Omega^i_{0|-1} = P_{0|-1}^{-1} \,, \qquad q^i_{0|-1} = P_{0|-1}^{-1} \hat{x}_{0|-1} \,, \quad i \in \mathcal{N} \,.$$
However, since it is quite unrealistic to assume that all the
nodes of the network share the same a priori information
on the initial state $x_0$, a more practical initialization can
be obtained by letting
$$\Omega^i_{0|-1} = 0 \,, \quad i \in \mathcal{N} \,, \qquad (18)$$
$$q^i_{0|-1} = 0 \,, \quad i \in \mathcal{N} \,, \qquad (19)$$
which would amount to assuming that no a priori information is available.
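A minimal sketch of one DSE-CP time step in the linear-Gaussian case follows (illustrative code, not from the paper; `nbrs` and `Pi` are the hypothetical neighbor sets and Metropolis weights from the Section 2 sketch). Note that the prediction formulas (14)-(15) remain well defined under the uninformative initialization (18)-(19), since only $\Omega^i_{k|k} + A^\top Q^{-1} A$ is inverted:

```python
import numpy as np

def dse_cp_step(q, Om, y, A, Q, Cs, Rs, nbrs, Pi, L):
    """One DSE-CP time step (Algorithm 2): correction (12)-(13),
    consensus (16)-(17), prediction (14)-(15). Per-node data are dicts."""
    n = A.shape[0]
    Ainv, Qinv = np.linalg.inv(A), np.linalg.inv(Q)
    # (1) local correction, equations (12)-(13)
    q = {i: q[i] + Cs[i].T @ np.linalg.solve(Rs[i], y[i]) for i in nbrs}
    Om = {i: Om[i] + Cs[i].T @ np.linalg.solve(Rs[i], Cs[i]) for i in nbrs}
    # (2) L consensus steps on the posterior information pairs, (16)-(17)
    for _ in range(L):
        Om = {i: sum(Pi[i][j] * Om[j] for j in nbrs[i]) for i in nbrs}
        q = {i: sum(Pi[i][j] * q[j] for j in nbrs[i]) for i in nbrs}
    q_post, Om_post = q, Om          # fused posterior pair (q^i_{k|k}, Omega^i_{k|k})
    # (3) prediction, equations (14)-(15); A must be invertible here
    q_pred, Om_pred = {}, {}
    for i in nbrs:
        G = np.linalg.inv(Om[i] + A.T @ Qinv @ A)
        q_pred[i] = Ainv.T @ (np.eye(n) - Om[i] @ G) @ q[i]
        Om_pred[i] = Ainv.T @ (Om[i] - Om[i] @ G @ Om[i]) @ Ainv
    return (q_post, Om_post), (q_pred, Om_pred)
```

With the initialization (18)-(19) the local estimate is recovered as $\hat{x}^i_{k|k} = (\Omega^i_{k|k})^{+} q^i_{k|k}$ (Moore-Penrose pseudoinverse, e.g. `np.linalg.pinv`), since the information matrix may be singular during the first time instants.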
It is important to point out that the idea of fusing the
information received from multiple sensors by making
a convex combination of the information matrices and
vectors as in (16)-(17) is not novel in the literature (albeit
the derivation proposed here is quite different from the
existing ones). In fact, such a fusion rule is known in
the literature as Covariance Intersection and dates back
to over a decade ago (Julier and Uhlmann, 1997; Chen
et al., 2002). In this connection, a contribution of this
paper consists of showing that such a widely-used fusion
rule naturally arises as a step of a consensus algorithm
for computing, in a distributed fashion, the KLA of the
local posterior PDFs. It is believed that, in the spirit
of Algorithm 1, this result can serve as a starting point
for deriving useful generalizations of such a fusion rule in
nonlinear and/or non-Gaussian settings.
4. STABILITY ANALYSIS
In this section, the stability properties of the proposed
distributed state estimation algorithm are analyzed. To
this end, in order to ensure the well-posedness of the state
estimation problem, the following preliminary assumptions
are needed.
A1. The system matrix $A$ is invertible.
A2. The pair $(A, C)$ is observable, where $C := \mathrm{col}\left( C^1, \ldots, C^N \right)$.
One of the most basic features that a recursive state
estimation algorithm should enjoy is that the true estimation errors be consistent with their predicted statistics.
In fact, as well known (Jazwinski, 1970), divergence of
the estimation error can take place when the covariance
matrix becomes too small or optimistic (because in this
case subsequent observations tend to be ignored). In this
connection, the following definition can be introduced
(Jazwinski, 1970; Julier and Uhlmann, 1997).
Definition 2. Consider a random vector $x$. Further, let $\hat{x}$
be an unbiased estimate of $x$ and $P$ an estimate of the
corresponding error covariance. Then, the pair $(\hat{x}, P)$ is
said to be consistent if
$$\mathbb{E}\left\{ (x - \hat{x})(x - \hat{x})^\top \right\} \le P \,. \qquad (20)$$
In words, according to inequality (20), consistency amounts
to requiring that the estimated error covariance P be an
upper bound (in the positive definite sense) of the true
error covariance. This property becomes even more important in distributed state estimation because the unaware
reuse of the same data due to the presence of loops within
the network as well as the possible correlation between
measurements of different sensors can lead to inconsistency
and divergence. In fact, this was the primary motivation
that led to the development of the Covariance Intersection
fusion rule.
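Consistency in the sense of Definition 2 can be checked empirically (an illustrative Monte Carlo sketch, not from the paper): inequality (20) holds iff $P$ minus the true error covariance is positive semidefinite, which can be tested on the eigenvalues of the difference against a sample estimate:

```python
import numpy as np

def is_consistent(errors, P, tol=1e-9):
    """Check (20): P minus the sample covariance of the errors x - x_hat is PSD."""
    E = np.asarray(errors)            # rows are realizations of x - x_hat
    emp_cov = E.T @ E / E.shape[0]    # sample estimate of E{(x - x_hat)(x - x_hat)^T}
    return bool(np.all(np.linalg.eigvalsh(P - emp_cov) >= -tol))
```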
Notice now that, if one considers the information pair
$(q, \Omega) = (P^{-1}\hat{x}, P^{-1})$, inequality (20) can be rewritten as
$$\Omega \le \left[ \mathbb{E}\left\{ (x - \Omega^{-1} q)(x - \Omega^{-1} q)^\top \right\} \right]^{-1} \,.$$
Then, taking into account the relationship between Algorithm 2 and Covariance Intersection, the following result
can be stated.
Theorem 1. Let Assumption A1 hold and let the distributed state estimation Algorithm 2 be adopted with
the initialization (18)-(19). Then, for each $k = 0, 1, \ldots$ and
each $i \in \mathcal{N}$, the information pair $(q^i_{k|k}, \Omega^i_{k|k})$ is consistent
in that
$$\Omega^i_{k|k} \le \left[ \mathbb{E}\left\{ (x_k - \hat{x}^i_{k|k})(x_k - \hat{x}^i_{k|k})^\top \right\} \right]^{-1} \qquad (21)$$
with $\hat{x}^i_{k|k} := \left( \Omega^i_{k|k} \right)^{+} q^i_{k|k}$.⁴

⁴ Here, the Moore-Penrose pseudoinverse of the information matrix $\Omega^i_{k|k}$ is considered to account for the fact that, under the uninformative prior (18)-(19), such a matrix is singular during the first time instants.
Theorem 1 points out that, at least in the linear-Gaussian
case, the intuitions of Remark 1 are correct in that, thanks
to the adherence to the PMDI, the proposed distributed
state estimation algorithm avoids double-counting and
preserves consistency of all the information pairs. Further,
in view of Theorem 1, in order to prove the boundedness
of the error covariance $\mathbb{E}\left\{ (x_k - \hat{x}^i_{k|k})(x_k - \hat{x}^i_{k|k})^\top \right\}$ it is
sufficient to show that the information matrix $\Omega^i_{k|k}$ is
asymptotically bounded below by some positive definite
matrix (or, equivalently, that $P^i_{k|k} = (\Omega^i_{k|k})^{-1}$ is
asymptotically bounded above by some constant matrix).
In this connection, the following theorem can be stated
that represents the main result of this section.
Theorem 2. Let assumptions A1-A2 hold and let the distributed state estimation Algorithm 2 be adopted with
the initialization (18)-(19). Then, if the consensus matrix
$\Pi$ is primitive and the number $L$ of consensus iterations is
greater than 0, there exist a time instant $\bar{k}$ and a positive
definite matrix $\tilde{\Omega}$ such that
$$0 < \tilde{\Omega} \le \Omega^i_{k|k} \,, \quad \forall i \in \mathcal{N} \ \text{and} \ \forall k \ge \bar{k} \,.$$
As a consequence, the estimation error is asymptotically
bounded in mean square in that
$$\limsup_{k \to \infty} \mathbb{E}\left\{ (x_k - \hat{x}^i_{k|k})^\top (x_k - \hat{x}^i_{k|k}) \right\} \le \mathrm{tr}\{ \tilde{\Omega}^{-1} \} \,.$$
A few remarks on Theorem 2 are in order. First of all,
it is important to point out that, to the best of our
knowledge, this is the first stability result for distributed
state estimation that relies only on collective observability
assumptions. In fact, even the most recent stability proofs
(concerning other estimation algorithms) require some
sort of local observability (or detectability) condition in
each node, thus limiting the practical applicability (see,
for instance, (Stankovic et al., 2009; Olfati-Saber, 2009;
Cattivelli and Sayed, 2010)). Such an advancement is
mainly due to the fact that, in the proposed algorithm, the
local posterior PDFs (rather than just the local estimates)
are combined so that also the covariance matrices are
updated and kept consistent with the combined estimates.
It is also worth noting that such a stability result holds
regardless of the possible correlation between the measurements coming from different sensors (which is supposed
to be unknown). Further, only one consensus step per
iteration is needed for stability.
Another important feature of the proposed distributed
state estimation algorithm is that, when the system dynamics as well as all the measurement equations are noise-free, it is possible to show that all the local estimation
errors $x_k - \hat{x}^i_{k|k}$ converge asymptotically to zero. To see
this, it is convenient to consider the quadratic time-varying
Lyapunov functions $L^i_k(x) := x^\top \Omega^i_{k|k-1} x$, $i \in \mathcal{N}$, and to
define the vector
$$\mathcal{L}_k = \mathrm{col}\left( L^1_k(x_k - \hat{x}^1_{k|k-1}), \ldots, L^N_k(x_k - \hat{x}^N_{k|k-1}) \right) \,.$$
Then, the following theorem can be stated.
Theorem 3. Let the system dynamics (10) and the measurement equations (11) be noise-free, i.e.,
$$w_k = 0 \,, \qquad v^i_k = 0 \,, \quad i \in \mathcal{N} \,,$$
for $k = 0, 1, \ldots$. Then, under the same assumptions as
Theorem 2, the following facts hold:
(i) there exist a time instant $\bar{k}$ and a positive real $\beta < 1$
such that
$$\mathcal{L}_{k+1} \le \beta \, \Pi^L \mathcal{L}_k \,, \quad \text{for } k = \bar{k}, \bar{k}+1, \ldots \,;$$
(ii) the proposed distributed state estimation algorithm
yields an asymptotic observer on each node of the
network, in that
$$\lim_{k \to \infty} \left( x_k - \hat{x}^i_{k|k} \right) = 0 \,, \quad \forall i \in \mathcal{N} \,.$$
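In the spirit of Theorem 3, the hypothetical `dse_cp_step` sketched in Section 3.1 can be exercised with zero process and measurement noise; under collective observability and a primitive $\Pi$, the local errors should be observed contracting to zero. A minimal driver (illustrative, with $Q$ and $R^i$ kept as filter design parameters even though the realized noises are zero):

```python
import numpy as np

def run_noise_free(A, Cs, Q, Rs, nbrs, Pi, L, x0, steps=60):
    """Noise-free rollout: x_{k+1} = A x_k, y^i_k = C^i x_k (w_k = 0, v^i_k = 0).
    Relies on the dse_cp_step sketch from Section 3.1."""
    n = A.shape[0]
    q = {i: np.zeros(n) for i in nbrs}            # uninformative prior (18)-(19)
    Om = {i: np.zeros((n, n)) for i in nbrs}
    x = np.asarray(x0, dtype=float)
    for k in range(steps):
        y = {i: Cs[i] @ x for i in nbrs}          # exact measurements, v^i_k = 0
        (qf, Omf), (q, Om) = dse_cp_step(q, Om, y, A, Q, Cs, Rs, nbrs, Pi, L)
        err = max(np.linalg.norm(x - np.linalg.pinv(Omf[i]) @ qf[i]) for i in nbrs)
        print(k, err)                             # should decay to zero (Theorem 3)
        x = A @ x                                 # noise-free dynamics, w_k = 0
```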
5. CONCLUSIONS
An information-theoretic approach to distributed estimation, exploiting the consensus on the Kullback-Leibler average of Gaussian probability density functions (PDFs),
has been introduced. Following this approach, a distributed state estimator based on the consensus among
local posterior PDFs has been derived. It has been proved
that the proposed estimator guarantees bounded state
estimation error covariance in all network nodes, under
network connectivity and collective observability. Simulation experiments, not shown here due to lack of space, have
demonstrated the effectiveness of the distributed estimator
even in networks with few and low-degree sensor nodes.
Future work will concern the development of distributed
state estimation algorithms for nonlinear systems and/or
with fault detection & accommodation features.
REFERENCES
Akaike, H. (1973). Information theory and the extension
of the maximum likelihood principle. In Proceedings
of the Second International Symposium on Information
Theory, 267–281.
Alriksson, P. and Rantzer, A. (2006). Distributed Kalman
filtering using weighted averaging. In Proceedings of the
17th International Symposium on Mathematical Theory
of Networks and Systems, 2006.
Banerjee, A., Guo, X., and Wang, H. (2005). On the
optimality of conditional expectation as a Bregman
predictor. IEEE Transactions on Information Theory,
51(7), 2664–2669.
Calafiore, G.C. and Abrate, F. (2009). Distributed linear
estimation over sensor networks. International Journal
of Control, 82(5), 868–882.
Campbell, L. (1970). Equivalence of Gauss’s principle
and minimum discrimination information estimation of
probabilities. The Annals of Mathematical Statistics,
41(3), 1011–1015.
Carli, R., Chiuso, A., Schenato, L., and Zampieri, S.
(2008). Distributed Kalman filtering based on consensus
strategies. IEEE Journal on Selected Areas in Communications, 26, 622–633.
Cattivelli, F.S. and Sayed, A.H. (2010). Diffusion strategies for distributed Kalman filtering and smoothing.
IEEE Transactions on Automatic Control, 55(9), 2069–
2084.
Chang, K.C., Chong, C.Y., and Mori, S. (2008). On
scalable distributed sensor fusion. In Proceedings of
the 2008 11th International Conference on Information
Fusion, 1–8.
Chen, L., Arambel, P., and Mehra, R. (2002). Estimation under unknown correlation: covariance intersection
revisited. IEEE Transactions on Automatic Control,
47(11), 1879–1882.
Chisci, L. and Zappa, G. (1992). Square-root Kalman
filtering of descriptor systems. Systems & Control
Letters, 19(4), 325–334.
Farina, M., Ferrari-Trecate, G., and Scattolini, R. (2010).
Distributed moving horizon estimation for linear constrained systems. IEEE Transactions on Automatic
Control, 55, 2462–2475.
Jaynes, E.T. (2003). Probability Theory: The Logic of
Science. Cambridge University Press.
Jazwinski, A. (1970). Stochastic Processes and Filtering
Theory. Academic Press.
Julier, S. and Uhlmann, J. (1997). A non-divergent estimation algorithm in the presence of unknown correlations.
In Proceedings of the 1997 American Control Conference, volume 4, 2369–2373.
Kamgarpour, M. and Tomlin, C. (2008). Convergence
properties of a decentralized Kalman filter. In Proceedings of the 47th IEEE Conference on Decision and Control,
3205–3210.
Olfati-Saber, R. (2007). Distributed Kalman filtering
for sensor networks. In Proceedings of the 46th IEEE
Conference on Decision and Control, 2007, 5492–5498.
Olfati-Saber, R. (2009). Kalman-consensus filter: optimality, stability, and performance. In Proceedings of
the 48th IEEE Conference on Decision and Control,
2009 held jointly with the 2009 28th Chinese Control
Conference, 7036–7042.
Olfati-Saber, R., Fax, J.A., and Murray, R.M. (2007).
Consensus and cooperation in networked multi-agent
systems. Proceedings of the IEEE, 95(1), 49–54.
Smith, D. and Singh, S. (2006). Approaches to multisensor
data fusion in target tracking: A survey. IEEE Transactions on Knowledge and Data Engineering, 18(12),
1696–1710.
Stankovic, S.S., Stankovic, M.S., and Stipanovic, D.M.
(2009). Consensus based overlapping decentralized estimation with missing observations and communication
faults. Automatica, 45(6), 1397–1406.
Xiao, L., Boyd, S., and Lall, S. (2005). A scheme for robust
distributed sensor fusion based on average consensus.
In Proceedings of The 4th International Symposium on
Information Processing in Sensor Networks (IPSN), 63–
70.