ACCEPTED FOR PUBLICATION IN THE IEEE TRANSACTIONS ON COMMUNICATIONS, JULY 2005.
Scalable Decoding on Factor Trees:
A Practical Solution for Wireless Sensor Networks
João Barros and Michael Tüchler
Abstract— We consider the problem of jointly decoding
the correlated data picked up and transmitted by the nodes
of a large-scale sensor network. Assuming that each sensor
node uses a very simple encoder (a scalar quantizer and a
modulator), we focus on decoding algorithms that exploit
the correlation structure of the sensor data to produce
the best possible estimates under the minimum mean
square error (MMSE) criterion. Our analysis shows that
a standard implementation of the optimal MMSE decoder
is unfeasible for large-scale sensor networks, because
its complexity grows exponentially with the number of
nodes in the network. Seeking a scalable alternative, we
use factor graphs to obtain a simplified model for the
correlation structure of the sensor data. This model allows
us to use the sum-product decoding algorithm, whose
complexity can be made to grow linearly with the size
of the network. Considering large sensor networks with
arbitrary topologies, we focus on factor trees and give an
exact characterization of the decoding complexity, as well
as mathematical tools for factorizing Gaussian sources and
optimization algorithms for finding optimal factor trees
under the Kullback-Leibler criterion.
Index Terms— sensor networks, trees, complexity theory,
quantization, MAP estimation
I. INTRODUCTION
Consider a large-scale sensor network in which hundreds of sensor nodes pick up samples from a physical
process in a field, encode their observations, and transmit
the data back to a remote location over an array of
reachback channels. The task of the decoder at the
remote location is then to produce the best possible
estimates of the data sent by all the nodes.
In [2], Barros and Servetto show that knowledge available at the receiver on the correlation between the measurements of different sensors can in general be exploited to improve the decoding result and thus increase the reachback capacity of the network. This principle holds even when the sensor nodes themselves are not capable of eliminating the redundancy in the data prior to transmission. To fulfill this data compression task, each node would have to use complex Slepian-Wolf source codes, which might be impractical for large-scale sensor networks. In that case, the decoder can still take advantage of the remaining correlation to produce a more accurate estimate of the sent information.

J. Barros is with the Department of Computer Science of the University of Porto, Porto, Portugal. URL: http://www.dcc.fc.up.pt/~barros/. M. Tüchler is with the Center of Microelectronics Aargau, University of Applied Sciences of Northwestern Switzerland, Aargau, Switzerland. This work was conducted while the authors were with the Institute for Communications Engineering of the Technische Universität München, München, Germany. Parts of it have been presented at the 2004 IEEE International Conference on Communications [3] and the 2004 International Symposium on Information Theory and its Applications [29].
To study this scenario, we consider a reachback communications model in which the system complexity is
shifted from the sensor nodes to the receiver, i.e., a
reachback network with very simple encoders (e.g., a
scalar quantizer and a modulator) and a decoder of
increased yet manageable complexity. Our goal is then
to devise a practical decoding algorithm for this instance
of the sensor reachback problem.
Our main contributions are as follows. First, we argue
that a standard implementation of the optimal decoder
based on minimum mean square error (MMSE) estimation is unfeasible for large-scale sensor networks,
because its complexity grows exponentially with the
number of sensors. To guarantee the scalability of the
decoding algorithm, we propose to use factor graphs [20]
that model the correlation between the sensor signals
in a flexible way depending on the targeted decoding
complexity and the desired reconstruction fidelity. The
applied decoding algorithm is the sum-product (SP)
algorithm [20]. We are able to show that by choosing
the factor graph in an appropriate way we can make
the overall decoding complexity grow linearly with the
number of nodes.
We also provide evidence that cycle-free factor graphs, so-called factor trees, are particularly well suited for large-scale sensor networks with arbitrary topology, because (a) they guarantee that the SP algorithm is MMSE-optimal and thus the fidelity depends on the approximation only, (b) they yield low-complexity solutions, and (c) they allow the use of well-established minimum-weight spanning tree algorithms. Using the Kullback-Leibler distance (KLD) as a measure of the fidelity of
the approximated correlation model, we give a detailed
mathematical treatment of multivariate Gaussian sources and a set of optimization algorithms for factor trees with various degree constraints.
Finally, we present numerical results that underline the
performance and scalability of the proposed approach.
It turns out that under reasonable assumptions on the
spatial correlation of the sensor data, the performance of
our decoder is very close to the optimal MMSE solution.
Our results suggest a useful relationship between the
KLD and the MMSE, which still requires proof.
We note that the idea of exploiting the remaining
correlation in the source encoded data to enhance the
decoding result was already presented in Shannon’s
landmark paper [27]. This principle was put effectively
into practice in [13], triggering many contributions that
exploit the redundancy left by suboptimal quantizers in
combination with convolutional codes or turbo codes and
powerful iterative decoding schemes [14]. More recently,
this approach has also been successfully implemented
using low-density parity-check codes (see [19] and references therein). In the context of sensor networks, early
work by Slepian and Wolf [5] has inspired several contributions on the construction of distributed source codes
(e.g. [30], [6], and [7]), which remove the redundancy
in the data prior to transmission. Quantizers operating
along these lines were proposed in [9], [10] and [11].
Joint source-channel codes for correlated sources and
noisy channels were considered in [8]. Other related
lines of research are focused on multi-sensor data fusion [32], [33] and rate-constrained encodings of sensor
data with a given correlation structure (see e.g. [31] and
references therein).
The rest of the paper is organized as follows. Sec. II
sets the stage for the main decoding problem by describing the system setup and elaborating on the drawbacks
of the optimal decoder. Then, Sec. III describes our approach based on factor graphs and iterative decoding. A
key contribution is the set of optimization tools presented
in Sec. IV. The paper concludes with some numerical
results in Sec. V and some comments in Sec. VI.
II. PROBLEM SETUP
Such a PDF is denoted as N(µ, Σ).
Source Model: Each sensor k observes at time t
continuous real-valued data samples uk (t), with k =
1, 2, . . . , M . For simplicity, we assume that the M sensor
nodes are placed randomly on the unit square and
consider only the spatial correlation of measurements
and not their temporal dependence. Thus, we drop the
time variable t and consider only one time step. However,
it is worth pointing out that the discussed techniques can
be easily extended to account for sources with memory.
The sample vector u = (u1 u2 ... uM )T at any given time
t is assumed to be one realization of an M -dimensional
Gaussian random variable, whose PDF p(u) is given by
N (0M , R) with


R = \begin{pmatrix} 1 & \rho_{1,2} & \cdots & \rho_{1,M} \\ \rho_{2,1} & 1 & \cdots & \rho_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{M,1} & \rho_{M,2} & \cdots & 1 \end{pmatrix}.
It follows that the samples uk are distributed with
N (0, 1). Gaussian models for capturing the spatial correlation between sensors at different locations are discussed in [26], whereas reasonable models for the correlation coefficients ρi,j of physical processes unfolding in
a field can be found in [12]. In the following, we assume
that the sensors are randomly placed in a unit square according to a uniform distribution and that the correlation
ρi,j between sensor i and j decays exponentially with
their Euclidean distance di,j , i.e., ρi,j = exp(−β · di,j ),
where β is a positive constant. Notice that this correlation
p(a) = \exp\!\left(-\tfrac{1}{2}(a-\mu)^T \Sigma^{-1} (a-\mu)\right) \big/ \left((2\pi)^N |\Sigma|\right)^{1/2}. \qquad (1)
The basic system model that accompanies us throughout this paper is illustrated in Fig. 1. We begin with a
brief explanation of our notation and a precise description of the source model, the encoding procedure, and
the reachback channel.
Notation: In the following, vectors are always considered to be column vectors and are denoted with small bold letters. Matrices are denoted with capital bold letters, unless otherwise noted. The expression 0_N is the length-N all-zero column vector, I_N is the N × N identity matrix, and |A| is the determinant of A. The covariance is defined by Cov{a, b} = E{ab^T} − E{a}E{b}^T, where E{·} is the expectation operator. An N-dimensional random variable with realizations a ∈ R^N is Gaussian distributed with mean µ = E{a} and covariance matrix Σ = Cov{a, a}, when its probability density function (PDF) p(a) is given by (1).
A. System Model
Fig. 1. System model of a sensor network.
The required posterior probabilities p(i_k | y) are given by

p(i_k = i \,|\, y) = \gamma \cdot \sum_{\forall i \in L^M : i_k = i} p(y|i)\, p(i), \qquad (3)
structure, which we deem to be a reasonable abstraction
of the physical measurements picked up locally by a
number of scattered sensors, is only one of many source
models for which our algorithms apply.
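As an illustration of this source model, a small sketch (all names ours) that draws M sensor positions uniformly on the unit square and builds R entry-wise from ρ_ij = exp(−β·d_ij):

```python
import math
import random

# Build the correlation matrix R for M sensors placed uniformly at
# random on the unit square, with rho_ij = exp(-beta * d_ij) as in the
# assumed correlation model. beta > 0 controls how fast correlation
# decays with distance.
def correlation_matrix(M, beta, rng):
    pos = [(rng.random(), rng.random()) for _ in range(M)]
    R = [[0.0] * M for _ in range(M)]
    for i in range(M):
        for j in range(M):
            d = math.dist(pos[i], pos[j])      # Euclidean distance d_ij
            R[i][j] = math.exp(-beta * d)      # rho_ii = exp(0) = 1
    return R

R = correlation_matrix(9, 2.0, random.Random(0))
```

Any positive β works; larger β makes the sensors effectively less correlated.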
Encoding: We assume that the sensors are “cheap” devices consisting of a scalar quantizer, a bit mapper, and a modulator.¹ Each sensor k quantizes u_k to the index i_k ∈ L = {1, 2, ..., 2^Q}, representing Q bits, i.e., there are 2^Q reconstruction values ũ(i_k) ∈ R. The modulator maps i_k to a tuple x_k of channel symbols, which are transmitted to the remote receiver. In our examples we use binary phase shift keying (BPSK), such that in a discrete-time baseband description of our transmission scheme i_k is mapped to Q symbols x_k = (x_{k,1} ... x_{k,Q}) from the alphabet {+1, −1}.
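A minimal sketch of this encoder, under our own assumption of a uniform quantizer with thresholds on [−4, 4] (the paper fixes only the encoder structure, not the quantizer design):

```python
# Sketch of the "cheap" sensor-node encoder: a Q-bit scalar quantizer
# followed by a BPSK bit mapper. The uniform thresholds below are an
# assumption; the paper only requires some scalar quantizer with
# 2^Q reconstruction values u_tilde(i_k).
Q = 2
THRESHOLDS = [-2.0, 0.0, 2.0]        # 2^Q - 1 inner decision thresholds

def encode(u):
    i = sum(1 for t in THRESHOLDS if u > t)          # index i_k in {0, ..., 2^Q - 1}
    bits = [(i >> b) & 1 for b in range(Q)]          # Q-bit binary expansion of i_k
    x = [1.0 if b == 0 else -1.0 for b in bits]      # BPSK: bit 0 -> +1, bit 1 -> -1
    return i, x
```

The index-to-bit and bit-to-symbol maps are placeholders; any fixed one-to-one mapping serves the same role.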
Reachback Channel: Since in many applications sensors must transmit some data to the central receiver simultaneously, reservation-based medium access protocols such as TDMA or FDMA are a reasonable choice for this type of reachback network, as argued in [2]. Thus, we assume that the reachback channel is virtually interference-free, i.e., the joint PDF p(y_1, ..., y_M | x_1, ..., x_M) factors into \prod_{k=1}^{M} p(y_k | x_k). In addition, we model the reachback channel as an array of additive white Gaussian noise channels with noise variance σ², i.e., the channel outputs are given by y_k = x_k + n_k after demodulation, where n_k is distributed with N(0_Q, σ² I_Q).
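The channel model above takes only a few lines to simulate; the noise being independent across sensors is exactly what makes p(y|x) factor. A sketch (function name ours):

```python
import random

# The reachback channel as an array of independent AWGN channels:
# y_k = x_k + n_k with n_k ~ N(0_Q, sigma^2 I_Q). Because the noise is
# drawn i.i.d. per sensor, p(y | x) factors into prod_k p(y_k | x_k).
def reachback(x_all, sigma, rng):
    # x_all: list of M per-sensor symbol tuples (Q BPSK symbols each)
    return [[xq + sigma * rng.gauss(0.0, 1.0) for xq in x_k] for x_k in x_all]
```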
where i = (i_1 i_2 ... i_M)^T and γ = 1/p(y) is a constant normalizing the sum over the product of probabilities to one.
Since the AWGN channels are independent, the PDF p(y|i) factors into \prod_{k=1}^{M} p(y_k | i_k), where each p(y_k | i_k) is a Gaussian distribution given by N(x_k(i_k), σ² I_Q).
The probability mass function (PMF) p(i) of the index
vector i can be obtained by numerically integrating the
source PDF p(u) over the quantization region indexed
by i. Alternatively, one can resort to Monte Carlo simulations in order to estimate p(i), a task which needs to
be carried out only once and can therefore be performed
offline.
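The offline Monte Carlo estimation of p(i) mentioned above can be sketched as follows. For brevity this toy version (all names ours) assumes independent N(0, 1) samples and a 1-bit quantizer; the real setup would draw correlated u via a Cholesky factor of R:

```python
import random

# Offline Monte Carlo estimate of the index PMF p(i): draw u ~ p(u),
# quantize each component, and histogram the resulting index vectors.
# This sketch uses independent N(0,1) samples and a 1-bit quantizer
# (threshold at zero) purely for brevity.
def estimate_pmf(M, n_samples, rng):
    counts = {}
    for _ in range(n_samples):
        i = tuple(0 if rng.gauss(0.0, 1.0) <= 0.0 else 1 for _ in range(M))
        counts[i] = counts.get(i, 0) + 1
    return {i: c / n_samples for i, c in counts.items()}

pmf = estimate_pmf(2, 4000, random.Random(0))
```

For two independent sensors with a 1-bit quantizer, each of the four index vectors should receive probability close to 1/4.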
The computational complexity of the decoding process is determined by the number of additions and multiplications required to compute the estimates û_k for all k. The most demanding decoding operation is the marginalization of the indices i_k in p(i) · \prod_{k=1}^{M} p(y_k | i_k), denoted

m_k(i) = \sum_{\forall i \in L^M : i_k = i} p(i) \cdot \prod_{l=1}^{M} p(y_l | i_l). \qquad (4)

Although the calculation of the PDF p(y_k | i_k) and the estimate û_k = γ · \sum_{\forall i \in L} ũ(i) · m_k(i) for all k requires a number of additions and multiplications that is linear in M, the marginalization in (4) requires 2^{Q(M−1)} − 1 additions and M·2^{QM} multiplications per index k.
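The exponential blow-up of (4) is easy to see in code: a direct implementation must visit all |L|^M index vectors. A sketch (function and parameter names ours), usable only for toy M and Q:

```python
from itertools import product

# Brute-force evaluation of the marginals m_k(i) in (4). The loop over
# all of L^M is exactly the exponential bottleneck: 2^(QM) terms.
def marginals_bruteforce(p_i, lik, M, L):
    # p_i: dict mapping length-M index tuples to p(i)
    # lik[k][i]: channel likelihood p(y_k | i_k = i) for the observed y
    m = [[0.0] * L for _ in range(M)]
    for i_vec in product(range(L), repeat=M):
        w = p_i.get(i_vec, 0.0)
        for k in range(M):
            w *= lik[k][i_vec[k]]          # prod_l p(y_l | i_l)
        for k in range(M):
            m[k][i_vec[k]] += w            # accumulate into each marginal
    return m
```

With a uniform PMF and flat likelihoods, every marginal value must come out to 1/|L| of the total mass, which gives a quick correctness check.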
Problem Statement: We conclude that a straightforward computation of the MMSE-optimal decoder is unfeasible for networks with a large number of sensors M, the calculation in (4) being the major bottleneck — its computational complexity grows exponentially² with M. Our goal is thus to find a scalable decoding algorithm yielding the best possible trade-off between complexity and estimation error.
B. Optimal Decoding
The decoder uses the channel output vector y = (y_1 y_2 ... y_M)^T and the available knowledge of the source correlation R to produce estimates û_k of the measurements u_k. Assuming that the mean square error (MSE) E{(û_k − ũ(i_k))²} between the estimate û_k and the source representation ũ(i_k), corresponding to the transmitted quantization index i_k, is the fidelity criterion to be minimized by the decoder, the conditional mean estimator (CME) [25] should be applied:

û_k = E\{ũ(i_k) \,|\, y\} = \sum_{\forall i \in L} ũ(i) \cdot p(i_k = i \,|\, y). \qquad (2)
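Given the posteriors, the CME of (2) is a one-line weighted average; a sketch with hypothetical numbers (the posteriors and reconstruction values below are illustrative, not from the paper):

```python
# The conditional mean estimator of (2): u_hat_k is the average of the
# reconstruction values u_tilde(i) weighted by the posteriors
# p(i_k = i | y), which are assumed to be already computed.
def cme(posteriors, recon):
    return sum(p * u for p, u in zip(posteriors, recon))

# Hypothetical 2-bit example: 4 posteriors and 4 reconstruction values.
u_hat = cme([0.1, 0.2, 0.3, 0.4], [-1.5, -0.5, 0.5, 1.5])
```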
III. SCALABLE DECODING USING FACTOR GRAPHS
In this section we propose a scalable decoding solution
based on factor graphs and the sum-product (SP) algorithm [20], in which the computational complexity of
the decoding algorithm is restricted using the following
two-step approach: First, an approximate model of the
dependencies between the samples uk is defined yielding
a factor graph that is suitable for decoding. Second, the
SP algorithm is performed on the factor graph defined
Notice that for PDF-optimized quantizers this estimator also minimizes the MSE E{(û_k − u_k)²} between the estimate û_k and the originally observed value u_k [16].
¹This model for the encoder may seem too simple, yet it allows us to focus on the essential aspects of the problem and highlight the key features of our decoding algorithm. The latter can be easily extended to include, for example, more sophisticated channel coding.
²Notice that although this is certainly true for a straightforward implementation of the decoding algorithm, it remains to be seen whether a more efficient implementation exists.
by the SP algorithm directly from the node degrees [1].
For our factor graph, in which the variables are drawn from the size-2^Q alphabet L, a degree-d variable node requires d(d−2)2^Q multiplications to compute d outgoing messages (a message consists of 2^Q values, d−2 multiplications per value). A degree-d function node requires d(d−1)2^{Qd} multiplications and d(2^{Q(d−1)} − 1) additions to compute d outgoing messages (d summations over 2^{Q(d−1)} − 1 values, d−1 multiplications per value).
complexity counts hold for graphs with cycles as well,
but the number of operations scales with the number of
iterations performed during message passing. Thus, to
bound the complexity in this case, we must bound the
number of iterations.
Running the SP algorithm on the factor graph in Fig. 2, which yields the exact marginals m_k(i), requires M·2^Q multiplications in the M variable nodes for i_k and M(2^{Q(M−1)} − 1) additions and M(M−1)2^{QM} multiplications in the function node for p(i). Combining these numbers yields the same count as that below equation (4).
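The per-node operation counts above can be packaged as simple formulas, convenient for budgeting a decoder before building it; this is a direct transcription of the counts in the text (function names ours):

```python
# Operation counts for SP decoding with alphabet size 2^Q, transcribed
# from the text: per-node costs as a function of node degree d.
def var_node_mults(d, Q):
    return d * (d - 2) * 2**Q               # d messages, d-2 mults per value

def fun_node_mults(d, Q):
    return d * (d - 1) * 2**(Q * d)         # d messages, d-1 mults per value

def fun_node_adds(d, Q):
    return d * (2**(Q * (d - 1)) - 1)       # d sums over 2^(Q(d-1)) - 1 values

# The single degree-M function node for p(i) in Fig. 2 (M = 9, Q = 1)
# alone costs M(M-1)2^(QM) multiplications:
cost_p_i_node = fun_node_mults(9, 1)
```

Exponential dependence on the function-node degree d is plain from the 2^{Qd} factor, which is why the factor trees of Sec. III-B cap d at 2 or 3.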
in the first step, delivering the desired data estimates
with complexity growing linearly with M , the number
of sensors in the reachback network.
A. Factor Graphs and the Sum-Product Algorithm
A factor graph depicts a function of typically many
variables, which factors with respect to a suitable operation such as multiplication. There are two types of
nodes in a factor graph: variable nodes representing the
variables and function nodes representing the factors.
The dependencies between variables and factors are
indicated by edges connecting some of the nodes. The
degree of a node is the number of incident edges.
The function that needs to be factorized in our decoding problem is p(i) · \prod_{k=1}^{M} p(y_k | i_k), contained in (4). The corresponding factor graph, illustrated in Fig. 2 for M = 9 sensors, consists of M variable nodes, one for each i_k, and M function nodes, one for each p(y_k | i_k), as well as one degree-M function node for p(i).
B. Scalable Decoding on Factor Trees
This complexity count can decrease tremendously if
p(i) factors into functions with small numbers of arguments (indices ik ) yielding function nodes with small degree. There are general ways to factorize p(i) such as the
chain rule, e.g. p(i) = p(i1 )p(i2 |i1 )...p(iM |i1 , ..., iM −1 ).
However, some factors in this factorization still evidence a large degree (up to degree M ) and the factor
graph contains cycles, so that the SP algorithm cannot be exact. To overcome this drawback, we propose
factorizations yielding fully connected cycle-free factor
graphs, which we name factor trees. Running the SP
algorithm on a factor tree yields the correct marginals
mk (i). Moreover, we can restrict the connectivity of
the factor tree by limiting the function nodes to have a
prescribed degree. For example, if we factorize according to p(i) = g1 (i1 )g2 (i2 , i1 )...gM (iM , iM −1 ) for some
functions gk (·), we get a chain-like factor tree, whose
function nodes have a degree of at most 2. Obviously, in
most cases the PMF p(i) derived from p(u) will not have
a structure leading to such a factorization. Consequently,
we must seek an approximate source distribution p̂(u)
that does lead to a PMF p̂(i) with the desired properties.
In this paper, we consider factorizations of p̂(i) into N
functions gk (·), k = 1, 2, ..., N , where the function node
degree, i.e., the number of arguments of gk (·), is at most
1, 2, or 3. Fig. 3 depicts possible factor graphs with these
constraints on gk (·) for our example with M = 9 sensors.
Running the SP algorithm on the degree-1 factor graph
corresponds to scalar decoding, where no information
Fig. 2. Factor graph of the function p(i) · \prod_{k=1}^{M} p(y_k|i_k) for a sensor network consisting of 9 sensors (numbered circles). We have nine variable nodes, one for each index i_k (circles), nine function nodes, one for each factor p(y_k|i_k) (empty boxes), and one function node for the factor p(i) (filled box).
The marginals m_k(i) in (4) can be computed by running the SP algorithm [20] on the factor graph in Fig. 2,
which lets the nodes pass “messages” to their neighbors
along the edges of the graph. As long as the factor
graph is cycle-free, the SP algorithm yields the correct
marginals mk (i) for the M variable nodes, and we know
that the estimation error is due to the approximation
of the correlation structure only. Otherwise, it becomes
iterative (the messages circulate forever) and, in general,
the marginals mk (i) cannot be computed exactly, which
possibly leads to errors beyond those imposed by the
chosen approximation.
Cycle-free factor graphs also allow us to determine the
exact number of additions and multiplications required
Fig. 3. Factor graph of the function p̂(i) · \prod_{k=1}^{M} p(y_k|i_k) corresponding to a sensor network with M = 9 sensors, where p̂(i) = \prod_{k=1}^{N} g_k(·) is given by \prod_{k=1}^{9} g_k(i_k) (left plot, degree-1 function nodes), g(i_1,i_6)g(i_2,i_1)g(i_3,i_9)g(i_4,i_7)g(i_5,i_4)g(i_6,i_4)g(i_7,i_8)g(i_9,i_5) (middle plot, degree-2 function nodes), or g(i_1,i_2)g(i_3,i_9)g(i_1,i_4,i_6)g(i_4,i_5,i_9)g(i_4,i_7,i_8) (right plot, degree-2 and -3 function nodes).
Moreover, it is possible to construct an optimal set of
arguments of the functions gk (·) for these rather simple
factor graphs, as we will prove in the next section.
Remark: Since each quantization index ik depends
on a unique source symbol uk , any factorization of
the approximate probability distribution of the indices
p̂(i) corresponds uniquely to a factorization of p̂(u) and
vice-versa. Consequently, the optimal arguments of the functions g_k(·) of the factorization for the indices p̂(i) = \prod_{k=1}^{N} g_k(·) can be easily obtained from the optimal functions f_k(·) of the corresponding factorization for the source symbols p̂(u) = \prod_{k=1}^{N} f_k(·). Thus, to obtain the approximate source distribution p̂(u), we choose the functions g_k(·) whose arguments correspond uniquely to the arguments of the functions f_k(·). For example, from p̂(i) = p(i_1)p(i_2|i_1)p(i_3|i_1) we get p̂(u) = p(u_1)p(u_2|u_1)p(u_3|u_1).
about the correlations between sensors is taken into
consideration. In this case, we have N = M . To obtain
the approximate marginals m̂k (i) = p(yk |ik = i)gk (ik =
i), no operations are required on the function nodes for
g_k(i_k) and only M·2^Q multiplications must be performed
in the M variable nodes for ik .
Running the SP algorithm on the degree-2 or degree-3 factor graph requires 2·2^{2Q} multiplications and 2(2^Q − 1) additions per degree-2 function node and 6·2^{3Q} multiplications and 3(2^{2Q} − 1) additions per degree-3 function node.
In addition, some multiplications are required in the
variable nodes. The following lemma shows how many
function nodes N are required to yield a factor tree:
Lemma 1: A factor graph consisting of M variable
nodes and N function nodes with degree dk , k =
1, 2, ..., N , can be a fully connected cycle-free factor
graph, i.e., a factor tree, if and only if
M + N − 1 = \sum_{k=1}^{N} d_k.

IV. MODEL OPTIMIZATION

A. Optimization Criterion
Proof: The lemma follows from elementary graph
theory (see e.g. [28]).
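Lemma 1 gives a quick feasibility test for a proposed set of function-node degrees, since M + N − 1 is exactly the edge count of a tree on M + N vertices. A sketch (function name ours), checked against the three factor graphs of Fig. 3:

```python
# Lemma 1 as a predicate: M variable nodes plus N function nodes with
# degrees d_k can form a fully connected cycle-free factor graph
# (a factor tree) iff the total degree equals M + N - 1 edges.
def can_be_factor_tree(M, degrees):
    return M + len(degrees) - 1 == sum(degrees)
```

The middle plot of Fig. 3 has eight degree-2 nodes (9 + 8 − 1 = 16 edges) and the right plot has two degree-2 plus three degree-3 nodes (9 + 5 − 1 = 13 edges); the nine degree-1 nodes of the left plot cannot form a fully connected tree.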
The SP algorithm running on a factor tree that corresponds to the approximate index distribution p̂(i) is a
candidate for our desired scalable decoding algorithm,
whose complexity increases linearly with the number
of sensors M . There are numerous other factorizations
of p̂(i) yielding different complexity counts, e.g., by
increasing the admissible degree of the function nodes,
by clustering the variable nodes, or by allowing factor
graphs with cycles. The latter yields a very large class of factor graphs, which admit an iterative SP algorithm [3], [21]. Interestingly enough, as will be shown in Sec. V, the performance of the scalable decoder based on factor trees with degree-2 or degree-3 function nodes is already very close to that of the optimal decoder.
The performance of the scalable decoding algorithm
proposed in the previous section naturally depends on
how well p̂(u) approximates p(u). A useful distance
measure to determine how close p̂(u) is to p(u) is the
Kullback-Leibler distance (KLD) measured in bit [4,
Sec. 9.5]:
D(p(u) \| \hat{p}(u)) = \int p(u) \log_2 \frac{p(u)}{\hat{p}(u)} \, du. \qquad (5)
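For Gaussian p and p̂ the integral in (5) has a closed form, which is how it can be evaluated in practice: in nats, D = ½(tr(Σ̂⁻¹Σ) − M + ln(|Σ̂|/|Σ|)); dividing by ln 2 gives bits. A sketch for the 2×2 unit-variance case, where the two covariances differ only in the correlation coefficient (function name ours):

```python
import math

# Closed-form Gaussian KLD in bits for zero-mean 2x2 covariances
# S = [[1, rho], [rho, 1]] and S_hat = [[1, rho_hat], [rho_hat, 1]].
def kld_bits_2x2(rho, rho_hat):
    det, det_hat = 1 - rho**2, 1 - rho_hat**2
    # tr(S_hat^{-1} S) worked out by hand for this 2x2 structure:
    trace = (2 - 2 * rho * rho_hat) / det_hat
    nats = 0.5 * (trace - 2 + math.log(det_hat / det))
    return nats / math.log(2)
```

As expected, the distance vanishes when the model matches the source and is positive when correlation is ignored.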
Our motivation for using the KLD as the optimization
criterion comes from previous work on fixed-rate vector
quantization with mismatched codebooks [23, Sec. 6],
where it is shown that if the quantizer is optimized
for a model probability distribution p̂(u) instead of
the true source distribution p(u), the resulting excess
quadratic distortion in decibels is proportional to the
KLD D(p(u)||p̂(u)). Although this result does not apply
directly to our reachback system — in our case the
source coding is done by an array of scalar quantizers
and not by a vector quantizer — the vector quantizer
approach does correspond to the case of full cooperation
between the sensors, i.e. when every node knows all the
source realizations observed by all the nodes. Therefore,
the performance of a vector quantizer processing u can
be viewed as an upper bound to the fidelity achieved by
our coding scheme, and for this upper bound we know
from [23] that the loss in MSE is proportional to the
KLD between p̂(u) and p(u).
Indeed, our numerical results (discussed in detail in Sec. V) support a similar useful connection between the KLD of p(u) and p̂(u) and the average MSE
of our decoder. A mathematical proof that quantifies
this relationship is a non-trivial matter due to the nonlinearity of the array of scalar quantizers — it remains
a challenging problem for future work.
Thus, the set b_1 is always empty. A special case is the usual chain rule expansion, where b_k = \bigcup_{l=1}^{k-1} a_l holds. In our example, the CCRE p(u_1)p(u_2|u_1)p(u_3,u_4|u_2)p(u_5|u_4) of the PDF p(u_1,...,u_5) is specified by a_1 = {u_1}, b_1 = ∅, a_2 = {u_2}, b_2 = {u_1}, a_3 = {u_3, u_4}, b_3 = {u_2}, a_4 = {u_5}, and b_4 = {u_4}.
The next definition introduces another useful property:
Definition 2: The CCRE p̂(u) = \prod_{k=1}^{N} p(a_k|b_k) is said to be symmetric, if any b_k, k = 2, 3, ..., N, is a subset of (a_l, b_l) for some l < k.
The CCRE p(u_1)p(u_2|u_1)p(u_3,u_4|u_2)p(u_5|u_4) of the PDF p(u_1,...,u_5) is symmetric, because the properties b_2 ⊂ a_1, b_3 ⊂ a_2, and b_4 ⊂ a_3 hold. The CCRE p(u_1,u_2)p(u_3,u_4|u_2)p(u_5|u_4,u_1) of p(u_1,...,u_5) is not symmetric, since b_3 = {u_1, u_4} is not contained in {a_1, b_1} = {u_1, u_2} or {a_2, b_2} = {u_2, u_3, u_4}.
It turns out that symmetric CCREs of the source
distribution p(u) yield the factor graphs of interest in
this paper, i.e., the factor trees specified in Sec. III-B.
Consider the following lemma:
Lemma 2: If a CCRE p̂(u) = \prod_{k=1}^{N} p(a_k|b_k) for the
source distribution p(u) has at most one conditioning
variable in every factor, i.e., all bk are either empty
or contain a single element, then (1) the CCRE is
symmetric, and (2) the factor graph corresponding to
p̂(u) is a tree.
Proof: See the appendix.
From this lemma it follows, for example, that a CCRE p̂(u) = \prod_{k=1}^{N} p(a_k|b_k) in which both the a_k and the b_k consist of single elements yields a factor tree where all function nodes have degree 2.
Due to the chosen source model, we are particularly interested in CCREs of multivariate Gaussian distributions. Next, we present a useful lemma, which requires the following definition: let P be an M × M indicator matrix, whose entry in the l-th row and l′-th column is 1 if both u_l and u_{l′} are contained in one of the N factors p(a_k|b_k) and 0 otherwise. For example, for the CCRE p(u_1)p(u_2|u_1)p(u_3,u_4|u_2)p(u_5|u_4) of p(u_1,...,u_5) we find

P = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix}.
B. Constrained Chain Rule Expansions
Recall that our goal is to minimize the KLD D(p(u)||p̂(u)) subject to the constraints imposed on the functions f_k(·) of the factorization p̂(u) = \prod_{k=1}^{N} f_k(·). In this section, we introduce a few mathematical tools that are useful for this task.
Consider the chain rule expansion p(u_1)p(u_2|u_1)p(u_3,u_4|u_1,u_2)p(u_5|u_1,...,u_4) of the source distribution p(u) = p(u_1, u_2, ..., u_5), which consists of N = 4 factors. A constrained chain rule expansion (CCRE) is obtained by taking the factors of the chain rule expansion and removing some of the conditioning variables, thus yielding an approximate PDF of p(u_1,...,u_5). For example, a CCRE of p(u_1,...,u_5) with at most one conditioning variable is given by p(u_1)p(u_2|u_1)p(u_3,u_4|u_2)p(u_5|u_4). The CCRE concept can be formalized as follows:
Definition 1: Consider a PDF p̂(u), which factors into N PDFs p(a_k|b_k) according to

p̂(u) = \prod_{k=1}^{N} p(a_k|b_k), \qquad (6)

where a_k and b_k are subsets of the elements in u. This PDF is a constrained chain rule expansion (CCRE) of the source distribution p(u), if the following constraints are met:
1) All pairs of subsets a_k and a_l, k ≠ l, are disjoint: a_k ∩ a_l = ∅.
2) The elements in b_k are connected: b_k ⊆ \bigcup_{l=1}^{k-1} a_l.
3) All elements u_k of u are connected: \bigcup_{k=1}^{N} a_k = u.
The lemma can now be stated as follows.
Lemma 3: Let p̂(u) = \prod_{k=1}^{N} p(a_k|b_k) be a CCRE of a Gaussian PDF p(u) given by N(0_M, R). The following holds:
1) The PDF p̂(u) is a zero-mean Gaussian PDF with covariance matrix R̂, i.e., it is given by N(0_M, R̂).
2) The entries of R̂^{−1} are zero for all zero-positions in P.
3) The trace of RR̂^{−1} equals M, i.e., tr(RR̂^{−1}) = M.
4) If the CCRE is symmetric, then the entries of R̂ are equal to those in R for all one-positions in P.
Proof: See the appendix.
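Properties 2) and 3) of Lemma 3 are easy to verify numerically for a small symmetric CCRE. A sketch for the chain p(u_1)p(u_2|u_1)p(u_3|u_2) with M = 3, whose indicator matrix P is zero only at positions (1,3) and (3,1); by properties 2) and 4), R̂ matches R everywhere except at that entry, where the chain forces ρ̂_13 = ρ_12·ρ_23 (the helper functions and test values are ours):

```python
# Hand-rolled 3x3 determinant and inverse (adjugate formula), to keep
# the check dependency-free.
def det3(A):
    (a, b, c), (d, e, f), (g, h, i) = A
    return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)

def inv3(A):
    (a, b, c), (d, e, f), (g, h, i) = A
    D = det3(A)
    return [[(e*i - f*h)/D, (c*h - b*i)/D, (b*f - c*e)/D],
            [(f*g - d*i)/D, (a*i - c*g)/D, (c*d - a*f)/D],
            [(d*h - e*g)/D, (b*g - a*h)/D, (a*e - b*d)/D]]

r12, r23, r13 = 0.8, 0.6, 0.7          # illustrative correlations
R = [[1, r12, r13], [r12, 1, r23], [r13, r23, 1]]
# R_hat for the chain CCRE: agree with R on one-positions of P,
# and close the tree with rho_hat_13 = r12 * r23.
R_hat = [[1, r12, r12*r23], [r12, 1, r23], [r12*r23, r23, 1]]
Ri = inv3(R_hat)
trace = sum(R[i][j] * Ri[j][i] for i in range(3) for j in range(3))
```

Here `Ri[0][2]` vanishes (property 2) and `trace` equals M = 3 (property 3).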
Based on Lemma 3, we can prove the following connection between symmetric CCREs and the KLD-optimal functions f_k(·) of the factorization p̂(u) = \prod_{k=1}^{N} f_k(·), which minimize the KLD D(p(u)||p̂(u)):

Theorem 1: Consider the Gaussian source distribution p(u) given by N(0_M, R) and the PDF p̂(u) = \prod_{k=1}^{N} f_k(u_k), which factors into N functions f_k(u_k) with subsets u_k of u as argument. If the latter factorization admits a symmetric CCRE, i.e., all u_k can be split into pairs {a_k, b_k} satisfying the constraints in Definitions 1 and 2, then the KLD-optimal functions f_k(u_k) minimizing the KLD D(p(u)||p̂(u)) are equal to the Gaussian PDFs p(a_k|b_k) = p(a_k, b_k)/p(b_k), and the corresponding minimal KLD is given by

D(p(u) \| \hat{p}(u)) = -\frac{1}{2}\log_2 |R| + \frac{1}{2}\sum_{k=1}^{N} \log_2 \frac{|R_{a_k,b_k}|}{|R_{b_k}|}, \qquad (7)

where R_{a_k,b_k} and R_{b_k} are the covariance matrices of the zero-mean Gaussian PDFs p(a_k, b_k) and p(b_k), respectively.
Proof: See the appendix.

This theorem considerably simplifies our search for KLD-optimal approximate source distributions p̂(u) = \prod_{k=1}^{N} f_k(u_k) that yield factor trees with function nodes of degree at most 1, 2, or 3, by allowing us to restrict our attention to the set of symmetric CCREs and determine step by step the factor arguments a_k and b_k that minimize the KLD. Moreover, it follows from (7) that each factor p(a_k|b_k) of p̂(u) reduces the KLD D(p(u)||p̂(u)) by the amount log_2 |R_{a_k,b_k}|/|R_{b_k}|, which is strictly negative, because in general |R_{a_k,b_k}| < |R_{b_k}| holds.

C. Optimization Algorithms

In the previous section, we proved for Gaussian sources that symmetric CCREs of the source distribution p(u) yield the KLD-optimal functions f_k(·) of the factorization p̂(u) = \prod_{k=1}^{N} f_k(u_k), provided that the arguments u_k admit a symmetric CCRE. We also showed that factorizations yielding a factor tree with function nodes of degree 1, 2, or 3 always admit symmetric CCREs. Nevertheless, there exist many factor trees that connect the M variable nodes for each sensor in the network, and so the problem becomes finding the factor tree for which the underlying symmetric CCRE p̂(u) = \prod_{k=1}^{N} p(a_k|b_k) yields the smallest KLD D(p(u)||p̂(u)).

Let l_a and l_b denote the allowed maximal number of elements in the sets a_k and b_k of the CCRE p̂(u) = \prod_{k=1}^{N} p(a_k|b_k), respectively. Recall that the algorithmic complexity for scalable decoding based on sum-product decoding on a factor tree grows exponentially with the degree d_f = l_a + l_b of the function nodes, which is why we consider factor trees with d_f ≤ 3 only, as specified in Sec. III-B.

Besides the trivial scalar decoder corresponding to the symmetric CCRE p̂(u) = \prod_{k=1}^{M} p(u_k), i.e., (l_a, l_b) = (1, 0), we consider decoders based on the choice (l_a, l_b) = (1, 1) or (l_a, l_b) = (2, 1), which are based on factor trees with function node degrees of at most 2 or 3. From Lemma 2 it follows that symmetric CCREs generate such factor trees when l_b = 1, i.e., when the factors p(a_k|b_k) of the CCRE contain only a single conditioning variable.

Next, we provide optimization algorithms for these two classes of scalable decoders.

1) Factor Tree with Degree-2 Function Nodes: A symmetric CCRE p̂(u) = \prod_{k=1}^{N} p(a_k|b_k) yields a degree-2 factor tree if (l_a, l_b) = (1, 1), i.e., N = M − 1, as stated in Lemma 1. Starting with the trivial factorization p̂(u) = \prod_{k=0}^{M-1} p(u_{r_k}), where {r_0, ..., r_{M-1}} is a permutation of the index set {1, ..., M}, admissible CCREs are constructed by adding conditioning variables to M − 1 of these factors, i.e.,

p̂(u) = p(u_{r_0}) \prod_{k=1}^{M-1} p(u_{r_k} | u_{s_k}),

where s_1 is necessarily equal to r_0 and all other s_k are chosen from the set {1, ..., M}. Combining the PDFs p(u_{r_0}) = p(u_{s_1}) and p(u_{r_1} | u_{s_1}) yields the CCRE

p̂(u) = p(u_{r_1}, u_{s_1}) \prod_{k=2}^{M-1} p(u_{r_k} | u_{s_k}),

consisting of N = M − 1 factors with two arguments. The corresponding index factorization p̂(i) = \prod_{k=1}^{N} g_k(·) is given by

p̂(i) = p(i_{r_1}, i_{s_1}) \prod_{k=2}^{M-1} p(i_{r_k} | i_{s_k}).
The calculation of the KLD D(p(u)||p̂(u)) via (7)
requires the local covariance matrices
1
ρrk ,sk
and Rusk = 1, (8)
Rurk ,usk =
1
ρrk ,sk
7
which follow from the entries ρk,k0 of the covariance
matrix R of the source distribution p(u), such that
2) Degree-3 Factor Trees: The optimization procedure for the previous case turned out to be relatively
simple, because degree-2 factor trees can be interpreted
as classical graphs and we could exploit well-established
graph-theoretic techniques. Unfortunately, this is not true
for degree-3 factor trees, forcing us to seek an alternative
solution.
In analogy with the previous case, we begin by rewriting (6) specifically for degree-3 factor trees according to
1
1
D(p(u)||p̂(u)) = − log 2 |R| + log2 |Rur1 ,s1 |
2
2
M −1
|Rurk ,usk |
1 X
+
log 2
2
|Rusk |
k=2
1
= − log 2 |R|
2
M −1
1 X
log 2 (1 − ρ2rk ,sk ).
+
2
(M −1)/2
p̂(u) = p(ur0 )
k=1
p(urk , usk |utk ),
k=2
Notice that a function node connecting the variable nodes
irk and isk decreases the KLD by
where a1 = ur0 , ak = [urk−1 usk−1 ] (for k > 1) and
bk = urk−1 . In practice, it is not always possible or
useful to construct a degree-3 factor tree that consists
solely of degree-3 function nodes, however to simplify
the explanation we will neglect the additional degree-2
function nodes and assume that (M − 1)/2 is a natural
number.
Once again, we require the local covariance matrices
1
log 2 (1 − ρ2rk ,sk ),
(9)
2
corresponding to the factor p(urk |usk ). We denote the
first decrease due to the factor p(ur1 , us1 ) as ∆D0 =
1
2
2 log 2 (1 − ρr1 ,s1 ).
The function nodes corresponding to p(ur1 , us1 ) or
p(urk |usk ) can be regarded as vertices in a classical
graph connecting the (variable) nodes irk and isk , which
have the undirected weight 21 log2 (1 − ρ2rk ,sk ). Our optimization task — finding the factor tree arguments ak and
bk for the factors p(ak |bk ) yielding a minimal KLD —
can thus be formulated as a minimum weight spanning
tree problem where the undirected weight of an edge
between two nodes irk and isk is given by 12 log 2 (1 −
ρ2rk ,sk ). To find this tree, we adapted Prim’s minimum
weight spanning tree algorithm [28]. The algorithm finds
the optimal tree with a very low complexity. Fig. 4 shows
the outcomes of the proposed algorithm for a sensor
network with M = 100 nodes using the source model
outlined in Sec. II-A.
∆D1|1 =

1
Ruk+1 = Rrk ,sk ,tk = ρrk ,sk
ρrk ,tk
ρrk ,sk
1
ρsk ,tk

ρrk ,tk
ρsk ,tk 
1
(10)
and Rbk+1 = Rsk = 1, where ρrk ,sk = R(rk , sk )
denotes the covariance between urk and usk . Now, we
can calculate the KLD using (7), which results in
1
D(p(u)||p̂(u)) = − log2 |R|
2
(M −1)/2
1 X
|Rrk ,sk ,tk |
+
log2
2
|Rsk |
k=1
1
= − log 2 |R|
2
(M −1)/2
1 X
log2 |Rrk ,sk ,tk |.
+
2
1
0.9
k=1
0.8
Since the degree-3 factor tree cannot be described as
a classical graph, we cannot apply a minimum weight
spanning tree algorithm. Closer inspection reveals that
our optimization problem is equivalent to the problem
of finding the minimum spanning hypertree in a hypergraph, which is known to be NP-hard in general [17].
Thus, we propose a suboptimal greedy algorithm that
constructs a degree-3 factor tree based on the optimal
degree-2 factor tree: First, we try to replace a pair
of degree-2 function nodes with one degree-3 function
node that reduces the KLD without changing the original
structure of the tree; then, we repeat this procedure over
and over again until it is no longer possible to replace
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
Y
0
0.2
0.4
0.6
0.8
1
Fig. 4. Degree-2 factor trees for M sensors placed randomly on the
unit square, according to the source model described in Sec. II-A.
8
∆D2|1 − ∆D2×1|1
1
= log2
2
1 + 2 · ρrk ,sk · ρrk ,tk · ρsk ,tk − ρ2rk ,sk − ρ2rk ,tk − ρ2sk ,tk
1 + ρ2rk ,sk · ρ2rk ,tk − ρ2rk ,sk − ρ2rk ,tk
!
(11)
ing the factors p(usk |urk ) and p(urk |utk ) is
any function nodes.
Assume, for example, that we want to compute the
KLD decrement that results from the substitution of two
function nodes connecting, say, the variable nodes i rk
and isk and the variable nodes irk and itk , by a new
function node connecting irk , isk and itk .
1
1
log2 |Rsk ,rk | + log 2 |Rrk ,tk |
2
2
1
2
=
log2 (1 + ρrk ,sk · ρ2rk ,tk − ρ2rk ,sk − ρ2rk ,tk )
2
Consequently, the overall reduction in KLD that results
from the substitution is given by (11) shown on top of
the page. This quantity, which is used by the algorithm
to choose the appropriate substitutions, has the property
that
ρrk ,tk · ρrk ,sk
∆D2|1 − ∆D2×1|1 ≤ 0
if
≥0
ρsk ,tk
(12)
with equality when ρrk ,tk · ρrk ,sk = ρsk ,tk . It follows
that a degree-3 function node always leads to a smaller
KLD, except when the variables itk , irk and isk form a
Markov chain and ρtk ,rk ρrk ,sk = ρtk ,sk . In this case, the
two degree-2 factors translate the connection between
the variables in an optimal way [15].
∆D2×1|1 =
TABLE I
A LGORITHM : F IND THE OPTIMIZED DEGREE -3 FACTOR TREE
Initialization
Construct the optimal degree-2 factor tree T2 with Algorithm 1
Make a list of all combinations of three variable nodes
which are neighbours in T2
|Rs ,r ,t |
Calculate ∆D2|1 − ∆D2x1|1 = 21 log2 R k kR k
| sk ,rk || rk ,tk |
for every list entry
sort the list in order of increasing ∆D2|1 − ∆D2x1|1
Function node counter: k ← 0
Main Loop
repeat
read next row [irk isk itk ∆D2|1 − ∆D2x1|1 ] of list
if connection of irk , isk and itk does not form a cycle
then
k ←k+1
remove the two function nodes connecting irk , isk and
i tk
connect irk , isk and itk by a new function node with
function gk = p(irk , isk |itk ), where itk represents the
only conditioning argument in the previous functions
end if
until end of list
V. N UMERICAL E XAMPLES
To evaluate the decoder performance, we measure the
output signal-to-noise ratio (SNR) given by
kuk2
Output SNR = 10 · log 10
in dB
ku − ûk2
versus the channel SNR ES /N0 averaged over a sufficient amount of sample transmissions. We consider
two networks with M = 9 or M = 100 sensors. In
our implementation, MMSE-optimal decoding can be
simulated for the network with 9 sensors, only. Naturally,
the results are highly dependent on the chosen source
model. As outlined in Sec. II, we assume that the
correlation between the sensors ui and uj is given by
ρi,j = exp(−β · di,j ). Notice that if we keep increasing
the number of sensors in the unit square without altering
β , the sensor measurements would become increasingly
correlated. Therefore, to obtain a fair result, we set
β = 1.05 and β = 4.2 for the simulations with M = 9
sensors and M = 100 sensors, yielding correlation
values ρi,j between 0.217 and 0.930 and between 0 and
0.945, respectively. Each sensor node uses a Lloyd-Max
quantizer to map uk to ik , which is then transmitted in
accordance with the system setup described in Sec. II.
The decoder performance for the network with M = 9
sensors from Figs. 2 and 3 is illustrated in Fig. 5 for
In terms of the underlying CCRE, this step is equivalent to replacing the factors p(usk |urk ) and p(urk |utk )
with the factor p(usk , urk |utk ). Denoting the correlations between the sensors as ρrk ,sk , ρrk ,tk and ρsk ,tk ,
we can write the KLD decrement ∆D2|1 associated
with the degree-3 function node representing the factor
p(usk , urk , utk ) or p(usk , urk |utk ) as
1
log 2 |Rsk ,rk ,tk |
2
1
=
log 2 (1 + 2 · ρrk ,sk · ρrk ,tk · ρsk ,tk
2
−ρ2rk ,sk − ρ2rk ,tk − ρ2sk ,tk )
∆D2|1 =
On the other hand, the KLD decrement ∆D2×1|1 associated with the pair of degree-2 function nodes represent9
1-bit quantization (Q = 1). Clearly, the factor-tree-based
decoders (degree-2 and degree-3 tree) are nearly as good
as the MMSE-optimal decoder. Note also that there is
a direct correspondence between the decoding performance and the KLD. As expected, the scalar decoder
loses a lot of performance, since it does not exploit any
information about the source correlations. For this choice
of source model, the improvement of the degree-3 tree
over the degree-2 tree is barely noticeable, due to the
fact that the correlation between measurements decays
very quickly with the distance between the nodes.
Fig. 6 depicts the performance results for the network
with M = 100 sensors with multiple quantizers. The
KLD-optimal degree-2 factor tree for this network is
depicted in Fig. 4. Again, the KLD of the degree-2 tree
is nearly as good that of the degree-3 tree, which finds
a correspondence in their SNR performance. We recall
that for this network size (M > 100) the optimal MMSE
decoder is unfeasible.
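The Prim-based construction used to obtain the degree-2 trees evaluated above can be sketched in a few lines. This is our own illustrative reimplementation under stated assumptions (numpy available; the function name `optimal_degree2_tree` is hypothetical), not the authors' code: Prim's algorithm on the complete graph with edge weights $\frac{1}{2}\log_2(1-\rho_{i,j}^2)$.

```python
import numpy as np

def optimal_degree2_tree(R):
    """Minimum weight spanning tree over the correlation matrix R,
    with edge weights 0.5*log2(1 - rho_ij^2), i.e. the per-factor
    KLD decrement of (9); the result is the KLD-optimal degree-2
    factor tree as a list of (parent, child) index pairs."""
    M = R.shape[0]
    # Most correlated pairs get the most negative weight.
    w = 0.5 * np.log2(np.clip(1.0 - R ** 2, 1e-12, None))
    np.fill_diagonal(w, np.inf)        # no self-loops
    in_tree = {0}
    edges = []
    while len(in_tree) < M:
        best = None
        for i in in_tree:              # Prim: grow from the current tree
            for j in range(M):
                if j not in in_tree and (best is None or w[i, j] < best[2]):
                    best = (i, j, w[i, j])
        i, j, _ = best
        edges.append((i, j))           # corresponds to the factor p(u_j | u_i)
        in_tree.add(j)
    return edges
```

For a three-sensor example where sensor 1 is strongly correlated with both neighbours, the tree attaches sensors 0 and 2 to sensor 1, as expected.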
[Figure 5: output SNR in dB versus channel SNR in dB; curves for CME, SP tree df=3, SP tree df=2, and no a priori; t = 10000 samples, M = 9 sensors, Q = 1.]

Fig. 5. Performance of the MMSE-optimal decoder (CME) and three decoders applying the SP algorithm on the factor graphs in Fig. 3 for a network with M = 9 sensors and Q = 1-bit quantization. The correlation factor between any two sensor measurements varies between ρ = 0.217 and ρ = 0.930. We consider the following cases: (1) scalar decoder (cf. Fig. 3(left), D(p(u)||p̂(u)) = 3.92 bits), (2) optimal degree-2 factor tree (cf. Fig. 3(middle), D(p(u)||p̂(u)) = 0.43 bits), (3) optimized degree-3 factor tree (cf. Fig. 3(right), D(p(u)||p̂(u)) = 0.40 bits), (4) optimal MMSE decoder.

[Figure 6: output SNR in dB versus channel SNR in dB; curves for SP tree df=2, SP tree df=3, and no a priori, each for Q = 1, 2, 3; t = 10000 samples, M = 100 sensors.]

Fig. 6. Performance of three decoders based on optimized factor graphs for a network with M = 100 sensors using various quantizers (1-, 2-, or 3-bit quantization). The correlation factor between any two sensor measurements varies between ρ = 0 and ρ = 0.945. We consider the following cases: (1) scalar decoder (trivial factor graph, D(p(u)||p̂(u)) = 45.37 bits), (2) KLD-optimal degree-2 factor tree (D(p(u)||p̂(u)) = 6.13 bits), (3) optimized degree-3 factor tree (D(p(u)||p̂(u)) = 5.40 bits).

VI. SUMMARY AND CONCLUSIONS

We studied the problem of jointly decoding the correlated measurements picked up by a sensor reachback network. First, we showed that the complexity of the optimal MMSE decoder grows exponentially with the number of nodes in the network, thus motivating the search for scalable solutions offering a trade-off between complexity and end-to-end distortion. Then, we presented a scalable decoding scheme for the sensor reachback problem, which uses a simplified factor graph model of the dependencies between the sensor measurements, such that a sum-product algorithm can produce the required estimates efficiently.

Focusing on factor trees, for which we know that the SP algorithm delivers optimal estimates, we introduced the concept of constrained chain rule expansions and provided two optimization algorithms for the Gaussian case. The analysis tools we presented can be equally applied to many other factorization models, yielding decoders with various complexities.

Our analysis and simulation results indicate that the proposed approach is well suited for large-scale sensor networks. Natural extensions include (a) adapting the factor graph to account for sensor nodes with more complex features, such as entropy coding, channel coding or higher-order modulations, and (b) reducing the complexity further by running linear message updates in the nodes of the factor graph based on a Gaussian approximation of the message distributions [24].
APPENDIX

A) Proof of Lemma 2:

The proof of part 1) is straightforward: if $b_k$ consists of at most a single element, this element must be contained in $a_l$ for some $l<k$ according to Definition 1.

To prove part 2), we start with the CCRE $\hat{p}_0(u) = \prod_{k=1}^{N}p(a_k)$, which is derived from $\hat{p}(u)$ by removing all conditions $b_k$. The factor graph of this CCRE is a tree (more precisely, a forest), since the subsets $a_k$ are pairwise disjoint, again according to Definition 1. The $N$ subtrees corresponding to the factors $p(a_k)$ are connected to a complete tree by adding exactly $N-1$ extra edges to the graph, such that each edge starts in the function node of a $p(a_k)$. This results from adding $b_k$ conditions to the $p(a_k)$ factors for all $k=1,\dots,N$, such that $b_1$ is empty and all other $b_k$ consist of exactly one element, as stated in the lemma. This construction also serves to prove part 2) of the Theorem.

B) Proof of Lemma 3:

Let $[R_{a_k}]_K$ denote the expansion of $R_{a_k}$ to a $K\times K$ matrix, where the non-zero entries correspond to the positions of the $a_k$ elements in $u$, e.g.,
$$
R_{u_1,u_3} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \;\rightarrow\; [R_{u_1,u_3}]_5 = \begin{pmatrix} a & 0 & b & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ c & 0 & d & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.
$$
Using this notation, the inverse covariance matrix $\hat{R}^{-1}$ of the PDF $\hat{p}(u)=\prod_{k=1}^{N}p(a_k|b_k)$ can be written as
$$
\hat{R}^{-1} = \sum_{k=1}^{N}\left([R_{u_k}^{-1}]_K - [R_{b_k}^{-1}]_K\right), \qquad (13)
$$
where $u_k=(a_k,b_k)$ and $R_{u_k}$ and $R_{b_k}$ are the covariance matrices of the zero-mean Gaussian PDFs $p(u_k)=p(a_k,b_k)$ and $p(b_k)$, respectively. This follows from the equivalence $p(a_k|b_k)=p(u_k)/p(b_k)$ and the definition of a Gaussian PDF in (1). It is easy to see that $\hat{p}(u)$ given by $\mathcal{N}(0_M,\hat{R})$ is a zero-mean Gaussian PDF and that the elements of $\hat{R}^{-1}$ are zero at the zero-positions of $P$, which proves parts 1) and 2) of the lemma.

The proof of part 3) follows trivially from [18, Corollary 1.2]. For details see [21].

To prove part 4), assume that the factor $p(a_k|b_k)$ in $\hat{p}(u)$ can be replaced by $p(u_k)/p(b_k)$, while $1/p(b_k)$ cancels with the argument $u_l$ of another factor $p(a_l|b_l)$, $l\neq k$, of $\hat{p}(u)$. This is possible for symmetric CCREs, since $b_k$ is contained in $(a_l,b_l)$ for some $l<k$, which yields
$$
p(a_l|b_l)/p(b_k) = p(u_l)/p(b_k)/p(b_l) = p(u_l'|b_k)/p(b_l),
$$
where $u_l'$ contains the remaining elements of $u_l$ after taking out all those in $b_k$. This replacement can be repeated recursively to cancel $p(b_l)$ with $p(a_m|b_m)$ for some $m<l$, and so forth, until the empty set $b_1$ is reached. Thus, with a symmetric CCRE it is possible to factor $\hat{p}(u)$ into $p(u_k)$ times a product of PDFs where all elements in $u_k$ appear in the conditioning part only. The true source distribution $p(u)$ can always be factored into $p(u_k)$ times a PDF where $u_k$ is in the conditioning part using a suitable chain rule expansion. It follows that the variables in $u_k$ are Gaussian distributed with zero mean and covariance matrix $R_{u_k}$ according to either $\hat{p}(u)$ or $p(u)$, i.e. $\hat{R}$ and $R$ must have identical entries for all variable pairs $(u_l,u_{l'})$ in $u_k$.

C) Proof of Theorem 1:

The first step of the proof is to show that the KLD-optimal functions $f_k(u_k)$ and, thus, the PDF $\hat{p}(u)$ must be Gaussian, given that $p(u)$ is zero-mean Gaussian. This is shown in [21]. The second step is to show that the factors $f_k(u_k)=p(a_k|b_k)$ are the KLD-optimal functions: Let $S$ be the set of all positive definite $M\times M$ matrices whose entries are equal to those in $R$ for all one-positions in $P$, whereas the other entries are arbitrary. Let $S'$ be the set of all positive definite $M\times M$ matrices whose inverse has zero entries for all zero-positions in $P$, whereas the other entries are arbitrary. From Theorem 2 in [22] it follows that for any $A\in S$ and any $B\in S'$ the following inequality holds:
$$
D(\mathcal{N}(0_M,A)\,\|\,\mathcal{N}(0_M,\tilde{B})) \le D(\mathcal{N}(0_M,A)\,\|\,\mathcal{N}(0_M,B)),
$$
where $\tilde{B}$ is that unique matrix from $S'$ whose entries are equal to those in $R$ for all one-positions in $P$, i.e., $\tilde{B}\in S$. A covariance matrix $\hat{R}$ of the PDF $\hat{p}(u)$ constructed from a symmetric CCRE is an element of both $S$ (part 4 of Lemma 3) and $S'$ (part 2 of Lemma 3), i.e., $\hat{R}$ is equal to $\tilde{B}$. Since the true source distribution $R$ is an element of $S$, it follows that $D(p(u)\|\hat{p}(u))$, given by $D(\mathcal{N}(0_M,R)\|\mathcal{N}(0_M,\hat{R}))$, is the smallest KLD among all Gaussian PDFs whose covariance matrix is an element of $S'$. Finally, the elements of $S'$ represent the admissible factorizations $\prod_{k=1}^{N}f_k(u_k)$ of $\hat{p}(u)$, i.e., a Gaussian PDF $\hat{p}(u)$ constructed from a symmetric CCRE yields the KLD-optimal factors $f_k(u_k)$, given by $p(a_k|b_k)$.

Since $p(u)$ and $\hat{p}(u)$ are Gaussian, computing the KLD $D(p(u)\|\hat{p}(u))$ simplifies to
$$
D(p(u)\|\hat{p}(u)) = \frac{1}{2}\left(-\log_2(|R|\,|\hat{R}^{-1}|) + \operatorname{tr}(R\hat{R}^{-1}) - M\right) = -\frac{1}{2}\log_2(|R|\,|\hat{R}^{-1}|),
$$
as shown in [18], [21], where the last line follows from part 3) of Lemma 3. Applying the factorization (13) to $\hat{R}^{-1}$ yields the formula (7) in the theorem.
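The structure of the precision matrix in (13) is easy to check numerically. The sketch below is our own illustration (numpy assumed; the function name `ccre_precision` is hypothetical): it assembles the zero-padded local inverses for a chain CCRE and confirms both that the off-tree entries of the result vanish and that an exact Markov factorization reproduces the source covariance.

```python
import numpy as np

def ccre_precision(R, factors):
    """Inverse covariance of p_hat(u) via (13):
    R_hat^{-1} = sum_k ( [inv(R_{u_k})]_K - [inv(R_{b_k})]_K ),
    where each local inverse is zero-padded into the K x K frame."""
    K = R.shape[0]
    P = np.zeros((K, K))
    for a, b in factors:
        u = list(a) + list(b)          # u_k = (a_k, b_k)
        P[np.ix_(u, u)] += np.linalg.inv(R[np.ix_(u, u)])
        if b:                          # b_1 is empty, nothing to subtract
            bb = list(b)
            P[np.ix_(bb, bb)] -= np.linalg.inv(R[np.ix_(bb, bb)])
    return P
```

For a Gauss-Markov chain with the CCRE $p(u_1)p(u_2|u_1)p(u_3|u_2)$, the resulting precision matrix is tridiagonal and its inverse equals $R$ itself, consistent with parts 1), 2) and 4) of Lemma 3.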
ACKNOWLEDGEMENTS
The authors most gratefully acknowledge discussions
with Seong Per Lee and Christoph Hausl, who also
ran the simulations while finishing their master/diploma
theses at the Technische Universität München. The first
author would also like to thank Sergio D. Servetto and
J. Cardinal for their insightful comments.
R EFERENCES
[1] S. M. Aji and R. J. McEliece. The generalized distributive law.
IEEE Trans. on Inf. Theory, 46(2):325–343, March 2000.
[2] J. Barros and S. D. Servetto. Reachback capacity with non-interfering nodes. In Proc. of the IEEE International Symposium on Information Theory (ISIT 2003), Yokohama, Japan, June-July 2003.
[3] J. Barros, M. Tüchler, and Seong P. Lee.
Scalable
source/channel decoding for large-scale sensor networks. In
Proc. of the IEEE International Conference in Communications
(ICC2004), Paris, June 2004.
[4] T. M. Cover and J. Thomas. Elements of Inf. Theory. John
Wiley and Sons, Inc., 1991.
[5] D. Slepian and J. K. Wolf. A Coding Theorem for Multiple
Access Channels with Correlated Sources. Bell Syst. Tech. J.,
52(7):1037–1076, 1973.
[6] V. Stankovic, A. Liveris, Z. Xiong, and C. Georghiades. Design of Slepian-Wolf codes by channel code partitioning. In Proceedings of the IEEE Data Compression Conference (DCC), Snowbird, UT, USA, March 2004.
[7] J. Garcia-Frias and Y. Zhao. Compression of correlated binary
sources using turbo codes. IEEE Communications Letters, pages
417–419, 2001.
[8] W. Zhong, H. Lou, and J. Garcia-Frias. LDGM codes for joint
source-channel coding of correlated sources. In Proceedings
of the IEEE International Conference on Image Processing,
Barcelona, Spain, September 2003.
[9] Thomas J. Flynn, Robert M. Gray. Encoding of correlated
observations. IEEE Trans. Inform. Theory, IT-33:773–787,
1987.
[10] J. Cardinal, G. Van Assche. Joint entropy-constrained multiterminal quantization. In Proc. IEEE Int. Symposium on Information Theory (ISIT 2002), Lausanne, Switzerland, June-July 2002.
[11] G. Maierbacher, J. Barros. Low-Complexity Coding for the
CEO Problem with Many Encoders. In Proc. of the 26th
Symposium on Information Theory in the Benelux, Brussels,
Belgium, May 2005.
[12] C. R. Dietrich and G. N. Newsam. Fast and exact simulation of
stationary Gaussian processes through circulant embedding of
the covariance matrix. SIAM Journal on Scientific Computing,
18(4):1088–1107, 1997.
[13] J. Hagenauer. Source-controlled channel decoding. IEEE Trans.
on Communications, 43(9):2449–2457, September 1995.
[14] J. Hagenauer, E. Offer, and L. Papke. Iterative decoding of
binary block and convolutional codes. IEEE Trans. on Inf.
Theory, 42(2):429–445, March 1996.
[15] Christoph Hausl. Scalable decoding for large-scale sensor
networks. Diploma Thesis, Lehrstuhl für Nachrichtentechnik,
Technische Universität München, München, Germany, April
2004.
[16] N. Jayant and P. Noll. Digital Coding of Waveforms. Prentice
Hall, 1984.
[17] D. M. Warme. Spanning Trees in Hypergraphs with Applications to Steiner Trees. PhD Thesis, University of Virginia, May
1998.
[18] A. Kavcic and J. Moura. Matrices with banded inverses: Inversion algorithms and factorization of Gauss-Markov processes. IEEE Trans. on Inf. Theory, 46:1495–1509, July 2000.
[19] I. Kozintsev, R. Koetter, and K. Ramchandran. A framework
for joint source-channel coding using factor graphs. In Proc.
33rd Asilomar Conference on Signals, Systems, and Computers,
Pacific Grove, CA, USA, October 1999.
[20] F. R. Kschischang, B. Frey, and H.-A. Loeliger. Factor graphs
and the sum-product algorithm. IEEE Trans. on Inf. Theory,
47(2):498–519, 2001.
[21] Seong Per Lee. Iterative decoding of correlated sensor data.
Diploma Thesis, Lehrstuhl für Nachrichtentechnik, Technische
Universität München, München, Germany, October 2003.
[22] H. Lev-Ari, S. Parker, and T. Kailath. Multidimensional
maximum-entropy covariance extension. IEEE Trans. on Inf.
Theory, 35:497–508, May 1989.
[23] J. Li, N. Chaddha, and R. Gray. Asymptotic performance of
vector quantizers with the perceptual distortion measure. IEEE
Trans. on Inf. Theory, 45(4):1082–91, May 1999.
[24] H. Loeliger. Least Squares and Kalman Filtering on Forney
Graphs. Codes, Graphs, and Systems, R.E. Blahut and R.
Koetter, eds., Kluwer, 2002.
[25] H. V. Poor. An Introduction to Signal Detection and Estimation.
Springer-Verlag, 1994.
[26] A. Scaglione and S. D. Servetto. On the interdependence of
routing and data compression in multi-hop sensor networks. In
Proc. ACM MobiCom, Atlanta, GA, 2002.
[27] C. E. Shannon. A mathematical theory of communication. Bell
Syst. Tech. J., 27:379–423 and 623–656, 1948.
[28] K. Thulasiraman and M. N. S. Swamy. Graphs: Theory and
Algorithms. John Wiley and Sons, Inc., 1992.
[29] M. Tüchler, J. Barros, and C. Hausl. Joint source-channel
decoding on factor trees: A scalable solution for large-scale
sensor networks. In Proc. of the 2004 International Symposium
on Information Theory and its Applications (ISITA 2004),
Parma, October 2004.
[30] S. Pradhan, J. Kusuma and K. Ramchandran. Distributed
compression in a dense sensor network. IEEE Signal Processing
Magazine, 1, March 2002.
[31] Prakash Ishwar, Rohit Puri, S. Sandeep Pradhan and Kannan
Ramchandran. On rate-constrained estimation in unreliable
sensor networks. In Proc. of the Second International Workshop
on Information Processing in Sensor Networks (IPSN), Palo
Alto, CA, April 2003.
[32] D. L. Hall and J. Llinas. An introduction to multisensor data fusion. Proceedings of the IEEE, vol. 85, no. 1, pp. 6-23, 1997.
[33] H.F. Durrant-Whyte, M. Stevens and E. Nettleton. Data fusion
in decentralised sensing networks. In Proc. Fusion 2001,
pp.302-307, Montreal Canada, July 2001.