Estimation in Gaussian Graphical Models Using
Tractable Subgraphs: A Walk-Sum Analysis
Venkat Chandrasekaran, Student Member, IEEE, Jason K. Johnson, and Alan S. Willsky, Fellow, IEEE
Abstract—Graphical models provide a powerful formalism for
statistical signal processing. Due to their sophisticated modeling
capabilities, they have found applications in a variety of fields
such as computer vision, image processing, and distributed sensor
networks. In this paper, we present a general class of algorithms
for estimation in Gaussian graphical models with arbitrary structure. These algorithms involve a sequence of inference problems
on tractable subgraphs over subsets of variables. This framework
includes parallel iterations such as embedded trees, serial iterations such as block Gauss–Seidel, and hybrid versions of these
iterations. We also discuss a method that uses local memory at
each node to overcome temporary communication failures that
may arise in distributed sensor network applications. We analyze these algorithms based on the recently developed walk-sum
interpretation of Gaussian inference. We describe the walks
“computed” by the algorithms using walk-sum diagrams, and
show that for iterations based on a very large and flexible set of
sequences of subgraphs, convergence is guaranteed in walk-summable models. Consequently, we are free to choose spanning trees
and subsets of variables adaptively at each iteration. This leads to
efficient methods for optimizing the next iteration step to achieve
maximum reduction in error. Simulation results demonstrate that
these nonstationary algorithms provide a significant speedup in
convergence over traditional one-tree and two-tree iterations.
Index Terms—Distributed estimation, Gauss–Markov random
fields, graphical models, maximum walk-sum block, maximum
walk-sum tree, subgraph preconditioners, walk-sum diagrams,
walk-sums.
I. INTRODUCTION
GRAPHICAL models offer a convenient representation
for joint probability distributions and convey the Markov
structure in a large number of random variables compactly.
A graphical model [1], [2] is a collection of variables defined
with respect to a graph; each vertex of the graph is associated
with a random variable and the edge structure specifies the
conditional independence properties among the variables. Due
to their sophisticated modeling capabilities, graphical models
[also known as Markov random fields (MRFs)] have found
applications in a variety of signal processing tasks involving
distributed sensor networks [3], images [4], [5], and computer
vision [6]. Our focus in this paper is on the important class of Gaussian graphical models, also known as Gauss–Markov random fields (GMRFs), which have been widely used to model natural phenomena in many large-scale estimation problems [7], [8].

Manuscript received January 17, 2007; revised June 15, 2007. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Manuel Davy. This work was supported by the Army Research Office under Grant W911NF-05-1-0207 and by the Air Force Office of Scientific Research under Grant FA9550-04-1-0351.

The authors are with the Laboratory for Information and Decision Systems, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: [email protected]; [email protected]; [email protected]).

Digital Object Identifier 10.1109/TSP.2007.912280
In estimation problems in which the prior and observation models have normally distributed random components, computing the Bayes least-squares estimate is equivalent to solving a linear system of equations specified in terms of the information-form parameters of the conditional distribution. Due to its cubic computational complexity in the number of variables, direct matrix inversion to solve the Gaussian estimation problem is intractable in many applications in which the number of variables is very large (e.g., large-scale oceanography problems [8]). For tree-structured MRFs (i.e., graphs with no cycles), belief propagation (BP) [9] provides an efficient linear-complexity algorithm to compute exact
estimates. However, tree-structured Gaussian processes possess
limited modeling capabilities [10]. In order to model a richer
class of statistical dependencies among variables, one often requires loopy graphical models. As estimation on graphs with
cycles is substantially more complex, considerable effort has
been and still is being put into developing methods that overcome this computational barrier, including a variety of methods
that employ the idea of performing inference computations on
tractable subgraphs [11], [12]. The recently proposed embedded
trees (ET) iteration [10], [13] is one such approach that solves
a sequence of inference problems on trees or, more generally,
tractable subgraphs. If ET converges, it yields the correct conditional estimates, thus providing an effective inference algorithm
for graphs with essentially arbitrary structure.
For the case of stationary ET iterations—in which the same
tree or tractable subgraph is used at each iteration—necessary
and sufficient conditions for convergence are provided in [10],
[13]. However, experimental results in [13] provide compelling
evidence that much faster convergence can often be obtained by
changing the embedded subgraph that is used from one iteration
to the next. The work in [13] provided very limited analysis for
such nonstationary iterations, thus leaving open the problem of
providing easily computable, broadly applicable conditions that
guarantee convergence.
In related work that builds on [10], Delouille et al. [14] describe a stationary block Gauss–Jacobi (GJ) iteration for solving
the Gaussian estimation problem with the added constraint that
messages between variables connected by an edge in the graph
may occasionally be “dropped.” The local blocks (subgraphs)
are assumed to be small in size. Such a framework provides
a simple model for estimation in distributed sensor networks
where communication links between nodes may occasionally
fail. The proposed solution involves the use of memory at each
node to remember past messages from neighboring nodes. The
values in this local memory are used if there is a breakdown in
communication to prevent the iteration from diverging. However, the analysis in [14] is also restricted to the case of stationary iterations, in that the same partitioning of the graph into
local subgraphs is used at every iteration.
Finally, we note that ET iterations fall under the class of parallel update algorithms, in that every variable must be updated
in an iteration before one can proceed to the next iteration. However, serial schemes involving updates over subsets of variables
also offer tractable methods for solving large linear systems
[15], [16]. An important example in this class of algorithms is
block Gauss–Seidel (GS) in which each iteration involves updating a small subset of variables.
In this paper, we analyze nonstationary iterations based on
an arbitrary sequence of embedded trees or tractable subgraphs.
We refer to these trees and subgraphs on which inference is performed at each iteration as preconditioners, following the terminology used in the linear algebra literature. We present a general
class of algorithms that includes the nonstationary ET and block
GS iterations, and provide a general and very easily tested condition that guarantees convergence for any of these algorithms.
Our framework allows for hybrid nonstationary algorithms that
combine aspects of both block GS and ET. We also consider the
problem of failing links and describe a method that uses local
memory at each node to address this problem in general nonstationary parallel and serial iterations.
Our analysis is based on a recently introduced framework for
interpreting and analyzing inference in GMRFs based on sums
over walks in graphs [17]. We describe walk-sum diagrams that
provide an intuitive interpretation of the estimates computed by
each of the algorithms after every iteration. A walk-sum diagram
is a graph that corresponds to the walks “accumulated” after
each iteration. As developed in [17], walk-summability is an
easily tested condition which, as we will show, yields a simple
necessary and sufficient condition for the convergence of the
algorithms. As there are broad classes of models (including attractive, diagonally dominant, and so-called pairwise-normalizable models) that are walk-summable, our analysis shows that
our algorithms provide a convergent, computationally attractive
method for inference.
The walk-sum analysis and convergence results show that arbitrary nonstationary iterations of our algorithms based on a
very large and flexible set of sequences of subgraphs or subsets of variables converge in walk-summable models. Consequently, we are free to use any sequence of trees in the ET algorithm or any valid sequence of subsets of variables (one that
updates each variable infinitely often) in the block GS iteration, and still achieve convergence in walk-summable models.
We exploit this flexibility by choosing trees or subsets of variables adaptively to minimize the error at iteration $n$ based on the residual error at iteration $n-1$. To make these choices optimally, we formulate combinatorial optimization problems that
maximize certain reweighted walk-sums. We describe efficient
methods to solve relaxed versions of these problems. For the
case of choosing the “next best” tree, our method reduces to
solving a maximum-spanning tree problem. Simulation results
indicate that our algorithms for choosing trees and subsets of
variables adaptively provide a significant speedup in convergence over traditional approaches involving a single preconditioner or alternating between two preconditioners.
Our walk-sum analysis also shows that local memory at each
node can be used to achieve convergence for any of the above
algorithms when communication failures occur in distributed
sensor networks. Our protocol differs from the description in
[14], and as opposed to that work, allows for nonstationary updates. Also, our walk-sum diagrams provide a simple, intuitive
representation for the propagation of information with each
iteration.
One of the conditions for walk-summability in Section II-C
shows that walk-summable models are equivalent to models for
which the information matrix is an H-matrix [16], [18]. Several
methods for finding good preconditioners for such matrices have
been explored in the linear algebra literature, but these have been
restricted to either cycling through a fixed set of preconditioners
[19] or to so-called “multisplitting” algorithms [20], [21]. These
results do not address the problem of convergence of nonstationary iterations using arbitrary (noncyclic) sequences of subgraphs. The analysis of such algorithms along with the development of methods to pick a good sequence of preconditioners
are the main novel contributions of this paper, and the recently
developed concept of walk-sums is critical to our analysis.
In Section II, we provide the necessary background about
GMRFs and the walk-sum view of inference. Section III describes all the algorithms that we analyze in this paper, while
Section IV contains the analysis and walk-sum diagrams that
provide interpretations of the algorithms in terms of walk-sum
computations. In Section V, we use the walk-sum interpretation of Section IV to show that these algorithms converge in
walk-summable models. Section VI presents techniques for
choosing tree-based preconditioners and subsets of variables
adaptively for the ET and block GS iterations, respectively,
and demonstrates the effectiveness of these methods through
simulation. We conclude with a brief discussion in Section VII.
The Appendix provides additional details and proofs.
II. GAUSSIAN GRAPHICAL MODELS AND WALK-SUMS
A. Gaussian Graphical Models and Estimation
A graph $\mathcal{G} = (V, E)$ consists of a set of vertices $V$ and associated edges $E \subset \binom{V}{2}$, where $\binom{V}{2}$ is the set of all unordered pairs of vertices. A subset $B \subset V$ is said to separate subsets $A, C \subset V$ if every path in $\mathcal{G}$ between any vertex in $A$ and any vertex in $C$ passes through a vertex in $B$. A graphical model [1], [2] is a collection of random variables indexed by the vertices of a graph; each vertex $s \in V$ corresponds to a random variable $x_s$, and for any $A \subset V$, $x_A = \{x_s \,|\, s \in A\}$. A distribution $p(x_V)$ is Markov with respect to $\mathcal{G}$ if for any subsets $A, C \subset V$ that are separated by some $B \subset V$, the subset of variables $x_A$ is conditionally independent of $x_C$ given $x_B$, i.e., $p(x_A, x_C \,|\, x_B) = p(x_A \,|\, x_B)\, p(x_C \,|\, x_B)$.

We consider GMRFs $x \sim \mathcal{N}(\mu, P)$ parameterized by a mean vector $\mu$ and a positive-definite covariance matrix $P$ (denoted by $P \succ 0$) [1], [22]. For simplicity, each $x_s$
is assumed to be a scalar variable. An alternate natural parameterization for GMRFs is specified in terms of the information matrix $J = P^{-1}$ (also called the precision or concentration matrix) and the potential vector $h = J\mu$, and is denoted by $x \sim \mathcal{N}^{-1}(h, J)$. In particular, if $p(x)$ is Markov with respect to graph $\mathcal{G}$, then the specialization of the Hammersley–Clifford theorem for Gaussian models [1], [22] directly relates the sparsity of $J$ to the sparsity of $\mathcal{G}$: for every pair of vertices $s, t \in V$, $J_{st} \neq 0$ if and only if the edge $\{s, t\} \in E$. The partial correlation coefficient $r_{st}$ is the correlation coefficient of variables $x_s$ and $x_t$ conditioned on knowledge of all the other variables $x_{V \setminus \{s,t\}}$ [1]:

$$r_{st} \triangleq \operatorname{corr}\big(x_s, x_t \,\big|\, x_{V \setminus \{s,t\}}\big) = \frac{-J_{st}}{\sqrt{J_{ss} J_{tt}}}. \qquad (1)$$

Hence, $J_{st} = 0$ implies that $x_s$ and $x_t$ are conditionally independent given all the other variables $x_{V \setminus \{s,t\}}$.
Let $x \sim \mathcal{N}^{-1}(h, J)$, and suppose that we are given noisy observations $y = Cx + v$ of $x$, with $v \sim \mathcal{N}(0, R)$. The goal of the Gaussian estimation problem is to compute an estimate $\hat{x}$ that minimizes the expected squared error between $\hat{x}$ and $x$. The solution to this problem is the mean of the posterior distribution $p(x \,|\, y) \sim \mathcal{N}^{-1}(\hat{h}, \hat{J})$, with $\hat{h} = h + C^T R^{-1} y$ and $\hat{J} = J + C^T R^{-1} C$ [23]. Thus, the posterior mean $\hat{x}$ can be computed as the solution to the following linear system:

$$\hat{J}\, \hat{x} = \hat{h}. \qquad (2)$$

We note that $\hat{J}$ is a symmetric positive-definite matrix. If $C$ and $R$ are diagonal (corresponding to local measurements), $\hat{J}$ has the same sparsity structure as that of $J$. The conditions for all our convergence results and analysis in this paper are specified in terms of the posterior graphical model parameterized by $\hat{J}$. As described in the introduction, solving the linear system (2) is computationally expensive by direct matrix inversion even for moderate-size problems. In this paper, we discuss tractable methods to solve this linear system.
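To make the setup concrete, the following minimal sketch (our own notation, not code from the paper) forms the posterior information parameters and computes the mean by a direct dense solve, which is exactly the cubic-cost step the remainder of the paper seeks to avoid:

```python
# Minimal sketch (our notation): posterior mean via the information form (2).
import numpy as np

def posterior_mean(J, h, C, R_noise, y):
    """Solve (J + C^T R^{-1} C) x = h + C^T R^{-1} y for the posterior mean."""
    Rinv = np.linalg.inv(R_noise)
    J_hat = J + C.T @ Rinv @ C            # posterior information matrix
    h_hat = h + C.T @ Rinv @ y            # posterior potential vector
    return np.linalg.solve(J_hat, h_hat)  # direct solve: O(n^3), intractable at scale
```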
B. Walk-Summable Gaussian Graphical Models

We assume that the information matrix $J$ of a Gaussian model $\mathcal{N}^{-1}(h, J)$ defined on $\mathcal{G} = (V, E)$ has been normalized to have unit diagonal entries. For example, if $D$ is a diagonal matrix containing the diagonal entries of $J$, then the matrix $D^{-1/2} J D^{-1/2}$ contains rescaled entries of $J$ at off-diagonal locations and 1's along the diagonal. Such a rescaling does not affect the convergence results of the algorithms in this paper.¹ However, rescaled matrices are useful in order to provide simple characterizations of walk-sums. Let $R = I - J$. The off-diagonal elements of $R$ are precisely the partial correlation coefficients from (1), and have the same sparsity structure as that of $J$ (and consequently the same structure as $E$). Let these off-diagonal entries be the edge weights in $\mathcal{G}$, i.e., $r_{st} = R_{st}$ is the weight of the edge $\{s, t\}$. A walk in $\mathcal{G}$ is defined to be a sequence of vertices $w = (w_0, w_1, \ldots, w_L)$ such that $\{w_{\ell-1}, w_\ell\} \in E$ for each $\ell = 1, \ldots, L$. Thus, there is no restriction on a walk crossing the same node or traversing the same edge multiple times. The weight of the walk $\phi(w)$ is defined:

$$\phi(w) \triangleq \prod_{\ell=1}^{L} r_{w_{\ell-1} w_\ell}.$$

¹Although our analysis of the algorithms in Section III is specified for normalized models, these algorithms and our analysis can be easily extended to the un-normalized case. See Appendix A.
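As a brief worked example (ours, not from the paper), consider the walk $w = (1, 2, 3, 2)$ on a 3-node cycle. Its weight multiplies the partial correlations along its three hops, and the symmetry $r_{32} = r_{23}$ of the partial-correlation matrix gives

$$\phi(w) = r_{12}\, r_{23}\, r_{32} = r_{12}\, r_{23}^2.$$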
Note that the partial-correlation matrix $R$ is essentially a matrix of edge weights. Interpreted differently, one can also view each element of $R$ as the weight of the length-1 walk between two vertices. In general, $(R^\ell)_{st}$ is then the walk-sum over the (finite) set of all length-$\ell$ walks from $s$ to $t$ [17], where the walk-sum over a finite set is the sum of the weights of the walks in the set. Based on this point of view, we can interpret estimation in Gaussian models from (2) in terms of walk-sums:

$$P = J^{-1} = (I - R)^{-1} = \sum_{\ell=0}^{\infty} R^\ell. \qquad (3)$$

Thus, the covariance $P_{st}$ between variables $x_s$ and $x_t$ is the length-ordered sum over all walks from $s$ to $t$. This, however, is a very specific instance of an inference algorithm that converges if the spectral radius condition $\varrho(R) < 1$ is satisfied (so that the matrix geometric series converges). Other inference algorithms, however, may compute walks in different orders. In order to analyze the convergence of general inference algorithms that submit to a walk-sum interpretation, a stronger condition was developed in [17] as follows. Given a countable set of walks $\mathcal{W}$, the walk-sum over $\mathcal{W}$ is the unordered sum of the individual weights of the walks contained in $\mathcal{W}$:

$$\phi(\mathcal{W}) \triangleq \sum_{w \in \mathcal{W}} \phi(w).$$
In order for this sum to be well-defined, we consider the following class of Gaussian graphical models.
Definition 1: A Gaussian graphical model defined on $\mathcal{G} = (V, E)$ is said to be walk-summable if the absolute walk-sums over the set of all walks between every pair of vertices in $\mathcal{G}$ are well-defined. That is, for every pair $s, t \in V$,

$$\bar\phi(s \to t) \triangleq \sum_{w \in \mathcal{W}(s \to t)} |\phi(w)| < \infty.$$

Here, $\bar\phi$ denotes absolute walk-sums over a set of walks, and $\mathcal{W}(s \to t)$ corresponds to the set of all walks² beginning at vertex $s$ and ending at vertex $t$ in $\mathcal{G}$. Section II-C lists some easily tested equivalent and sufficient conditions for walk-summability. Based on the absolute convergence condition, walk-summability implies that walk-sums over a countable set of walks can be computed in any order and that the unordered walk-sum $\phi(\mathcal{W})$ is well-defined [24], [25]. Therefore, in walk-summable models, the covariances and means can be interpreted as follows:

²We denote walk-sets by $\mathcal{W}(\cdot)$, but generally drop this notation when referring to the walk-sum over $\mathcal{W}(\cdot)$, i.e., the walk-sum of the set $\mathcal{W}(\cdot)$ is denoted by $\phi(\cdot)$.

$$P_{st} = \phi(s \to t) \qquad (4)$$

$$\mu_t = \sum_{s \in V} h_s\, \phi(s \to t) \qquad (5)$$
where (3) is used in (4), and (4) in (5). In words, the covariance between variables $x_s$ and $x_t$ is the walk-sum over the set of all walks from $s$ to $t$, and the mean of variable $x_t$ is the walk-sum over all walks ending at $t$, with each walk being reweighted by the potential value at the starting node.
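The following small numerical check (our construction, assuming a unit-diagonal model with $\varrho(\bar{R}) < 1$) illustrates (3) and (5): the length-ordered partial sums $\sum_{\ell \le L} R^\ell h$ converge to the exact mean $J^{-1} h$:

```python
# Numerical illustration (our construction) of the walk-sum identity (5).
import numpy as np

rng = np.random.default_rng(0)
n = 6
R = rng.uniform(-0.15, 0.15, size=(n, n))
R = (R + R.T) / 2.0
np.fill_diagonal(R, 0.0)                  # J = I - R has unit diagonal
assert np.max(np.abs(np.linalg.eigvalsh(np.abs(R)))) < 1.0  # walk-summable

J, h = np.eye(n) - R, rng.standard_normal(n)

mean, term = np.zeros(n), h.copy()        # accumulate sum_l R^l h
for _ in range(500):
    mean += term
    term = R @ term

print(np.allclose(mean, np.linalg.solve(J, h)))  # True: matches J^{-1} h
```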
The goal in walk-sum analysis is to interpret an inference algorithm as the computation of walk-sums in $\mathcal{G}$. If the analysis shows that the walks being computed by an inference algorithm are the same as those required for the computation of the means and covariances above, then the correctness of the algorithm can be concluded directly for walk-summable models. This conclusion can be reached regardless of the order in which the algorithm computes the walks, due to the fact that walk-sums can be computed in any order in walk-summable models. Thus, the walk-sum formalism allows for very strong yet intuitive statements about the convergence of inference algorithms that submit to a walk-sum interpretation. Indeed, the overall template for analyzing our inference algorithms is simple. First, we show that the algorithms submit to a walk-sum interpretation. Next, we show that the walk-sets computed by these algorithms are nested, i.e., $\mathcal{W}_n \subseteq \mathcal{W}_{n+1}$, where $\mathcal{W}_n$ is the set of walks computed at iteration $n$. Finally, we show that every walk required for the computation of the mean (5) is contained in $\mathcal{W}_n$ for some $n$. A key ingredient in our analysis is that in computing all the walks in (5), the algorithms must not overcount any walks. Although each step in this procedure is nontrivial, combined together they allow us to conclude that the algorithms converge in walk-summable models.
C. Properties of Walk-Summable Models

Very importantly, there are easily testable necessary and sufficient conditions for walk-summability. Let $\bar{R}$ denote the matrix of the absolute values of the elements of $R$. Then, walk-summability is equivalent to either of the following [17]:
• $\varrho(\bar{R}) < 1$, where $\varrho(\cdot)$ denotes the spectral radius;
• $I - \bar{R} \succ 0$.
From the second condition, one can draw a connection to H-matrices in the linear algebra literature [16], [18]. Specifically, walk-summable information matrices are symmetric, positive-definite H-matrices.
Walk-summability of a model is sufficient but not necessary for the validity of the model (positive-definite information/covariance). Many classes of models are walk-summable [17], as follows:
1) diagonally dominant models, i.e., $\sum_{t \neq s} |J_{st}| < J_{ss}$ for each $s \in V$;
2) valid non-frustrated models, i.e., every cycle has an even number of negative edge weights and $J \succ 0$; special cases include valid attractive models ($r_{st} \geq 0$ for all $\{s,t\} \in E$) and tree-structured models;
3) pairwise-normalizable models, i.e., there exists a diagonal matrix $D \succ 0$ and a collection of matrices $J_e \succeq 0$ with $(J_e)_{st} = 0$ if $\{s,t\} \neq e$, such that $J = D + \sum_{e \in E} J_e$.
An example of a commonly encountered walk-summable model in statistical image processing is the thin-membrane prior [26]. Further, linear systems involving sparse diagonally dominant matrices are also a common feature in finite-element approximations of elliptic partial differential equations [27].
We now describe some operations that can be performed on walk-sets, and the corresponding walk-sum formulas. These relations are valid in walk-summable models [17].
• Let $\{\mathcal{U}_n\}_{n=1}^{\infty}$ be a countable collection of mutually disjoint walk-sets. From the sum-partition theorem for absolutely summable series [25], we have that $\phi(\cup_{n=1}^{\infty} \mathcal{U}_n) = \sum_{n=1}^{\infty} \phi(\mathcal{U}_n)$. This implies that for a countable collection of walk-sets $\{\mathcal{V}_n\}_{n=1}^{\infty}$ with $\mathcal{V}_n \subseteq \mathcal{V}_{n+1}$, we have that $\lim_{n \to \infty} \phi(\mathcal{V}_n) = \phi(\lim_{n \to \infty} \mathcal{V}_n)$. This is easily seen by defining mutually disjoint walk-sets $\mathcal{U}_n = \mathcal{V}_n \setminus \mathcal{V}_{n-1}$ with $\mathcal{V}_0 = \emptyset$.
• Let $u = (u_0, \ldots, u_{L_u})$ and $v = (v_0, \ldots, v_{L_v})$ be walks such that $u_{L_u} = v_0$. The concatenation of the walks is defined to be $u \cdot v \triangleq (u_0, \ldots, u_{L_u}, v_1, \ldots, v_{L_v})$. Now consider a walk-set $\mathcal{U}$ with all walks ending at vertex $s$ and a walk-set $\mathcal{V}$ with all walks starting at $s$. The concatenation of the walk-sets $\mathcal{U} \cdot \mathcal{V}$ is defined as the set of all walks $u \cdot v$ with $u \in \mathcal{U}$ and $v \in \mathcal{V}$. If every walk $w \in \mathcal{U} \cdot \mathcal{V}$ can be decomposed uniquely into $u \in \mathcal{U}$ and $v \in \mathcal{V}$ so that $w = u \cdot v$, then $\mathcal{U} \cdot \mathcal{V}$ is said to be uniquely decomposable into the sets $\mathcal{U}, \mathcal{V}$. For such uniquely decomposable walk-sets, $\phi(\mathcal{U} \cdot \mathcal{V}) = \phi(\mathcal{U})\, \phi(\mathcal{V})$.
Finally, the following notational convention is employed in the rest of this paper. We use the wild-card symbol $*$ to denote a union over all vertices in $V$. For example, given a collection of walk-sets $\{\mathcal{W}(s \to t)\}_{s,t \in V}$, we interpret $\mathcal{W}(* \to t)$ as $\cup_{s \in V} \mathcal{W}(s \to t)$. Further, the walk-sum over the set $\mathcal{W}(* \to t)$ is defined $\phi(* \to t) \triangleq \sum_{s \in V} \phi(s \to t)$. In addition to edges being assigned weights, vertices can also be assigned weights (for example, by the potential vector $h$). A reweighted walk-sum of a walk $w = (w_0, \ldots, w_L)$ with vertex weight vector $h$ is then defined to be $h_{w_0} \phi(w)$; for a walk-set $\mathcal{W}$, the reweighted walk-sum is denoted $\phi(h; \mathcal{W})$. Based on this notation, the mean of variable $x_t$ from (5) can be rewritten as

$$\mu_t = \phi(h;\, * \to t). \qquad (6)$$
III. NONSTATIONARY EMBEDDED SUBGRAPH ALGORITHMS
In this section, we describe a framework for the computation of the conditional mean estimates in order to solve the
Gaussian estimation problem of Section II-A. We present three
algorithms that become successively more complex in nature.
We begin with the parallel ET algorithm originally presented
in [10], [13]. Next, we describe a serial update scheme that involves processing only a subset of the variables at each iteration. Finally, we discuss a generalization of these nonstationary
algorithms that is tolerant to temporary communication failure
by using local memory at each node to remember past messages
from neighboring nodes. A similar memory-based approach was
used in [14] for the special case of stationary iterations. The key
theme underlying all these algorithms is that they are based on
solving a sequence of inference problems on tractable subgraphs
involving all or a subset of the variables. Convergent iterations
that compute means can also be used to compute exact error
variances [10]. Hence, we restrict ourselves to analyzing iterations that compute the conditional mean.
A. Nonstationary Parallel Updates: Embedded Trees
Let $\mathcal{S}$ be some subgraph of the graph $\mathcal{G}$. The stationary ET algorithm is derived by splitting the matrix $\hat{J} = J_{\mathcal{S}} - K_{\mathcal{S}}$, where $J_{\mathcal{S}}$ is known as the preconditioner and $K_{\mathcal{S}}$ is known as the cutting matrix. Each edge in $\mathcal{G}$ is either an element of $\mathcal{S}$ or of $\mathcal{G} \setminus \mathcal{S}$. Accordingly, every nonzero off-diagonal entry of $\hat{J}$ is either an element of $J_{\mathcal{S}}$ or of $-K_{\mathcal{S}}$. The diagonal entries of $\hat{J}$ are part of $J_{\mathcal{S}}$. Hence, the matrix $K_{\mathcal{S}}$ is symmetric, zero along the diagonal, and contains nonzero entries only in those locations that correspond to edges not included in the subgraph generated by the splitting. Cutting matrices may have nonzero diagonal entries in general, but we only consider zero-diagonal cutting matrices in this paper. The splitting of $\hat{J}$ according to $\mathcal{S}$ transforms (2) to $J_{\mathcal{S}}\hat{x} = K_{\mathcal{S}}\hat{x} + \hat{h}$, which suggests an iterative method to solve (2) by recursively solving $J_{\mathcal{S}}\hat{x}^{(n)} = K_{\mathcal{S}}\hat{x}^{(n-1)} + \hat{h}$, yielding a sequence of estimates

$$\hat{x}^{(n)} = J_{\mathcal{S}}^{-1}\big(K_{\mathcal{S}}\, \hat{x}^{(n-1)} + \hat{h}\big). \qquad (7)$$
If $J_{\mathcal{S}}^{-1}$ exists, then a necessary and sufficient condition for the iterates $\{\hat{x}^{(n)}\}$ to converge to $\hat{J}^{-1}\hat{h}$ for any initial guess $\hat{x}^{(0)}$ is that $\varrho(J_{\mathcal{S}}^{-1} K_{\mathcal{S}}) < 1$ [10]. ET iterations can be very effective if applying $J_{\mathcal{S}}^{-1}$ to a vector is efficient, e.g., if $\mathcal{S}$ corresponds to a tree or, in general, any tractable subgraph.

A nonstationary ET iteration is obtained by letting $\hat{J} = J_{\mathcal{S}_n} - K_{\mathcal{S}_n}$, where the matrices $J_{\mathcal{S}_n}$ correspond to some embedded tree or subgraph $\mathcal{S}_n$ in $\mathcal{G}$ and can vary in an arbitrary manner with $n$. This leads to the following ET iteration:

$$\hat{x}^{(n)} = J_{\mathcal{S}_n}^{-1}\big(K_{\mathcal{S}_n}\, \hat{x}^{(n-1)} + \hat{h}\big). \qquad (8)$$

Our walk-sum analysis proves the convergence of nonstationary ET iterations based on any sequence of subgraphs $\{\mathcal{S}_n\}$ in walk-summable models. Every step of the above algorithm is tractable if applying $J_{\mathcal{S}_n}^{-1}$ to a vector can be performed efficiently. Indeed, an important degree of freedom in the above algorithm is the choice of $\mathcal{S}_n$ at each stage so as to speed up convergence, while keeping the computation at every iteration tractable. We discuss some approaches to addressing this issue in Section VI.
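A compact sketch of iteration (8) follows (our code; the generic sparse solve stands in for the linear-time tree solver that would be used when each $\mathcal{S}_n$ is a spanning tree):

```python
# Nonstationary ET sketch (our notation). Each mask is a sparse 0/1 matrix with
# zero diagonal selecting the off-diagonal entries of J_hat kept in J_{S_n}.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def embedded_trees(J_hat, h_hat, masks, n_iter=50):
    x = np.zeros(J_hat.shape[0])
    for n in range(n_iter):
        A = masks[n % len(masks)]
        J_S = J_hat.multiply(A) + sp.diags(J_hat.diagonal())  # subgraph + diagonal
        K_S = (J_S - J_hat).tocsr()        # cutting matrix, so J_hat = J_S - K_S
        x = spla.spsolve(J_S.tocsr(), K_S @ x + h_hat)        # one solve per S_n
    return x
```

For spanning-tree masks, the sparse solve can be replaced by Gaussian BP on the tree, reducing the per-iteration cost to linear in the number of variables.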
B. Nonstationary Serial Updates of Subsets of Variables

We begin by describing the block GS iteration [15], [16]. For each $n$, let $V_n \subset V$ be some subset of $V$. The variables $x_{V_n}$ are updated at iteration $n$; the remaining variables do not change from iteration $n-1$ to $n$. Let $\hat{J}_{V_n}$ be the $|V_n| \times |V_n|$-dimensional principal submatrix of $\hat{J}$ corresponding to the variables $x_{V_n}$. The block GS update at iteration $n$ is as follows:

$$\hat{x}^{(n)}_{V_n} = \big(\hat{J}_{V_n}\big)^{-1}\big(\hat{h}_{V_n} - \hat{J}_{V_n, V_n^c}\, \hat{x}^{(n-1)}_{V_n^c}\big) \qquad (9)$$

$$\hat{x}^{(n)}_{V_n^c} = \hat{x}^{(n-1)}_{V_n^c}. \qquad (10)$$

Here, $V_n^c$ refers to the complement of the vertex set $V_n$, and $\hat{J}_{V_n, V_n^c}$ in (9) refers to the submatrix of edge weights of edges from the vertices $V_n^c$ to $V_n$. Every step of the above algorithm is tractable as long as applying $(\hat{J}_{V_n})^{-1}$ to a vector can be performed efficiently.
We now present a general serial iteration that incorporates an element of the ET algorithm of Section III-A. This update scheme involves a single ET iteration within the induced subgraph of the update variables $V_n$. We split the edges in the induced subgraph of $V_n$ into a tractable set $\mathcal{S}_n$ and a set of cut edges $\mathcal{K}_n$. Such a splitting leads to a tractable subgraph $\mathcal{S}_n$ of the induced subgraph of $V_n$. That is, the matrix $\hat{J}_{V_n}$ is split as $\hat{J}_{V_n} = J_{\mathcal{S}_n} - K_{\mathcal{S}_n}$. This matrix splitting is defined analogously to the splitting in Section III-A. The modified conditional mean update at iteration $n$ is as follows:

$$\hat{x}^{(n)}_{V_n} = J_{\mathcal{S}_n}^{-1}\big(K_{\mathcal{S}_n}\, \hat{x}^{(n-1)}_{V_n} + \hat{h}_{V_n} - \hat{J}_{V_n, V_n^c}\, \hat{x}^{(n-1)}_{V_n^c}\big) \qquad (11)$$

$$\hat{x}^{(n)}_{V_n^c} = \hat{x}^{(n-1)}_{V_n^c}. \qquad (12)$$

Every step of this algorithm is tractable as long as applying $J_{\mathcal{S}_n}^{-1}$ to a vector can be performed efficiently.
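A dense-matrix sketch of the update (11)–(12) (our code): with `S_mask` selecting all edges inside the block it reduces to block GS (9)–(10), and with the block equal to all of $V$ it reduces to the ET step (8):

```python
# Hybrid serial update (11)-(12) (our notation, dense for clarity).
import numpy as np

def hybrid_update(J_hat, h_hat, x, V, S_mask):
    """V: indices updated this iteration; S_mask: zero-diagonal boolean mask
    selecting the edges of the tractable subgraph S_n within the block."""
    Vc = np.setdiff1d(np.arange(len(x)), V)
    J_V = J_hat[np.ix_(V, V)]
    J_S = np.where(S_mask, J_V, 0.0) + np.diag(np.diag(J_V))  # S_n edges + diagonal
    K_S = J_S - J_V                                           # cut edges in the block
    rhs = K_S @ x[V] + h_hat[V] - J_hat[np.ix_(V, Vc)] @ x[Vc]
    x_new = x.copy()
    x_new[V] = np.linalg.solve(J_S, rhs)   # (11); (12): x_new[Vc] stays x[Vc]
    return x_new
```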
The preceding algorithm is a generalization of both the block GS update (9)–(10) and the nonstationary ET algorithm (8), thus allowing for a unified analysis framework. Specifically, by letting $\mathcal{S}_n$ be the entire induced subgraph of $V_n$ (so that $K_{\mathcal{S}_n} = 0$) for all $n$ above, we obtain the block GS algorithm. On the other hand, by letting $V_n = V$ for all $n$, we recover the ET algorithm. This hybrid approach also offers a tractable and flexible method for inference in large-scale estimation problems, because it possesses all the benefits of the ET and block GS iterations. We note that there is one potential complication with both the serial and the parallel iterations presented so far. Specifically, for an arbitrary graphical model with positive-definite information matrix $\hat{J}$, the corresponding information submatrix $J_{\mathcal{S}_n}$ for some choices of subgraphs $\mathcal{S}_n$ may be invalid or even singular, i.e., may have negative or zero eigenvalues.³ Importantly, this problem never arises for walk-summable models, and thus we are free to use any sequence of embedded subgraphs for our iterations and be guaranteed that the computations make sense probabilistically.

³For example, consider a five-cycle with each edge having a partial correlation of $-0.6$. This model is valid (but not walk-summable), with the corresponding $J$ having a minimum eigenvalue of $0.0292$. A spanning tree model $J_{\mathcal{S}}$ obtained by removing one of the edges in the cycle, however, is invalid, with a minimum eigenvalue of $-0.0392$.
Lemma 1: Let $\hat{J}$ be a walk-summable model, let $V_n \subseteq V$, and let $J_{\mathcal{S}_n}$ be the $|V_n| \times |V_n|$-dimensional information matrix corresponding to the distribution over some subgraph $\mathcal{S}_n$ of the induced subgraph of $V_n$. Then, $J_{\mathcal{S}_n}$ is walk-summable, and $J_{\mathcal{S}_n} \succ 0$.

Proof: For every pair of vertices $s, t \in V_n$, it is clear that the walks between $s$ and $t$ in $\mathcal{S}_n$ are a subset of the walks between these vertices in $\mathcal{G}$, i.e., $\mathcal{W}_{\mathcal{S}_n}(s \to t) \subseteq \mathcal{W}_{\mathcal{G}}(s \to t)$. Hence, $\bar\phi_{\mathcal{S}_n}(s \to t) \le \bar\phi_{\mathcal{G}}(s \to t) < \infty$, because $\hat{J}$ is walk-summable. Thus, the model specified by $J_{\mathcal{S}_n}$ is walk-summable. This allows us to conclude that $J_{\mathcal{S}_n} \succ 0$, because walk-summability implies validity of a model. ∎
C. Distributed Interpretation of (11)–(12) and Communication Failure

We first reinterpret (11)–(12) as local message-passing steps between nodes, followed by inference within the subgraph $\mathcal{S}_n$. At iteration $n$, let $E_n$ denote the set of directed edges in $K_{\mathcal{S}_n}$ and from $V_n^c$ to $V_n$:

$$E_n = \big\{(s, t) \;\big|\; \{s, t\} \in \mathcal{K}_n,\ \text{or}\ s \in V_n^c,\ t \in V_n\ \text{with}\ \{s, t\} \in E\big\}. \qquad (13)$$
The edge set $E_n$ corresponds to the nonzero elements of the matrices $K_{\mathcal{S}_n}$ and $\hat{J}_{V_n, V_n^c}$ in (11). Edges in $E_n$ are used to communicate information about the values at iteration $n-1$ to neighboring nodes for processing at iteration $n$. For each $(s, t) \in E_n$, the message $M_n(s \to t) = R_{ts}\, \hat{x}^{(n-1)}_s$ is sent at iteration $n$ from $s$ to $t$ using the links in $E_n$. Let $M_n(t)$ denote the summary of all the messages received at node $t$ at iteration $n$:

$$M_n(t) = \sum_{s\,:\,(s, t) \in E_n} M_n(s \to t). \qquad (14)$$

Thus, each $M_n(t)$ fuses all the information received about the previous iteration, and each node combines this with its local potential value to form a modified potential vector that is then used for inference within the subgraph $\mathcal{S}_n$:

$$\hat{x}^{(n)}_{V_n} = J_{\mathcal{S}_n}^{-1}\big(\hat{h}_{V_n} + M_n(V_n)\big) \qquad (15)$$

where $M_n(V_n)$ denotes the entire vector of fused messages $M_n(t)$ for $t \in V_n$. An interesting aspect of these message-passing operations is that they are local, and only nodes that are neighbors in $\mathcal{G}$ may participate in any communication.
If the subgraph $\mathcal{S}_n$ is tree-structured, the inference step (15) can also be performed efficiently in a distributed manner using only local BP messages [9]. We now present an algorithm that is tolerant to temporary link failure by using local memory at each node to store the most recent message $M(s \to t)$ received at $t$ from $s$. If the link $(s, t)$ fails at some future iteration, the stored message can be used in place of the new expected message. In order for the overall memory-based protocol to be consistent, we also introduce an additional post-inference message-passing step at each iteration. To make the above points precise, we specify a memory protocol that the network must follow; we assume that each node in the network has sufficient memory to store the most recent messages received from its neighbors. First, $\mathcal{S}_n$ must not contain any failed links; every link $(s, t)$ that fails at iteration $n$ must be a part of the cut-set⁴: $(s, t) \in E_n$. Therefore, the links that are used for the inference step (15) must be active at iteration $n$. Second, in order for nodes to synchronize after each iteration, they must perform a post-inference message-passing step. After the inference step (15) at iteration $n$, the variables in $V_n$ must update their neighbors in the subgraph $\mathcal{S}_n$. That is, for each $t \in V_n$, a message must be received post-inference from every $u \in V_n$ such that $\{u, t\} \in \mathcal{S}_n$:

$$M_n(u \to t) = R_{tu}\, \hat{x}^{(n)}_u. \qquad (16)$$

This operation is possible since the edge $\{u, t\}$ is assumed to be active. Apart from these two rules, all other aspects of the algorithm presented previously remain the same. Note that every new message received overwrites the existing stored message, and only the most recent message received is stored in memory. Thus, link failure affects only (14) in our iterative procedure. Suppose that a message to be received at $t$ from node $s$ is unavailable due to communication failure. The message $M(s \to t)$ from memory can be used instead in the fusion formula (14). Let $\tau_n(s \to t)$ denote the iteration count of the most recent information at node $t$ about $s$ at the information fusion step (14) at iteration $n$. In general, $\tau_n(s \to t) \le n - 1$, with equality if $(s, t) \in E_n$ and the link $(s, t)$ is active. With this notation, we can rewrite (14):

$$M_n(t) = \sum_{s\,:\,(s, t) \in E_n} R_{ts}\, \hat{x}^{(\tau_n(s \to t))}_s. \qquad (17)$$

⁴One way to ensure this is to select $\mathcal{S}_n$ to explicitly avoid the failed links. See Section VI-B for more details.
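In code, the fusion step (17) is a small bookkeeping loop; a sketch (our data structures, with `memory` initialized to zero for every directed edge):

```python
# Fusion step (17) with per-node memory (our sketch).
def fuse(t, E_n, active, R, x_prev, memory):
    """Return M_n(t). Fresh messages overwrite memory on active links; on a
    failed link the most recently stored message is reused."""
    total = 0.0
    for (s, u) in E_n:
        if u != t:
            continue
        if (s, t) in active:
            memory[(s, t)] = R[t][s] * x_prev[s]   # new message R_ts * x_s
        total += memory[(s, t)]                    # fresh or remembered
    return total
```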
IV. WALK-SUM INTERPRETATION AND WALK-SUM DIAGRAMS

In this section, we analyze each iteration of the algorithms of Section III as the computation of walk-sums in $\mathcal{G}$. Our analysis is presented for the most general algorithm involving failing links, since the parallel and serial nonstationary updates without failing links are special cases. For each of these algorithms, we then present walk-sum diagrams that provide intuitive, graphical interpretations of the walks being computed. Examples that we discuss include classic methods such as GJ and GS, and iterations involving general subgraphs. Throughout this section, we assume that the initial guess $\hat{x}^{(0)} = 0$, and we initialize the stored messages $M(s \to t) = 0$ for each directed edge $(s, t)$. In Section V, we prove the convergence of our algorithms for any initial guess $\hat{x}^{(0)}$.
A. Walk-Sum Interpretation
For every pair of vertices $s, t \in V$, we define a recursive sequence of walk-sets. We then show that these walk-sets are exactly the walks being computed by the iterative procedures in Section III:

$$\mathcal{W}_n(s \to t) = \Big[\bigcup_{(u,v) \in E_n} \mathcal{W}_{\tau_n(u \to v)}(s \to u) \cdot (u, v) \cdot \mathcal{W}_{\mathcal{S}_n}(v \to t)\Big] \cup \mathcal{W}_{\mathcal{S}_n}(s \to t), \quad t \in V_n \qquad (18)$$

$$\mathcal{W}_n(s \to t) = \mathcal{W}_{n-1}(s \to t), \quad t \in V_n^c \qquad (19)$$

with

$$\mathcal{W}_0(s \to t) = \emptyset. \qquad (20)$$

The notation in these equations is defined in Section II-C. $\mathcal{W}_{\tau_n(u \to v)}(s \to u)$ denotes the walks starting at node $s$ computed up to iteration $\tau_n(u \to v)$. $(u, v)$ corresponds to a length-1 walk (called a hop) across a directed edge in $E_n$. Finally, $\mathcal{W}_{\mathcal{S}_n}(v \to t)$ denotes walks within $\mathcal{S}_n$ that end at $t$. Thus, the first RHS term in (18) is the set of previously computed walks starting at $s$ that hop across an edge in $E_n$, and then propagate within $\mathcal{S}_n$ (ending at $t$). $\mathcal{W}_{\mathcal{S}_n}(s \to t)$ is the set of walks from $s$ to $t$ that live entirely within $\mathcal{S}_n$. To simplify notation, we define $\mathcal{W}_n(t) \triangleq \mathcal{W}_n(* \to t)$.

We now relate the walk-sets $\mathcal{W}_n(t)$ to the estimate $\hat{x}^{(n)}_t$ at iteration $n$.

Proposition 1: At iteration $n$, with $\hat{x}^{(0)} = 0$, the estimate for node $t$ in walk-summable models is given by

$$\hat{x}^{(n)}_t = \phi\big(\hat{h};\, \mathcal{W}_n(* \to t)\big) \qquad (21)$$

where the walk-sum is over the walk-sets defined by (18)–(20), and $\hat{x}^{(n)}$ is computed using (15) and (17).

Fig. 1. (Left) Gauss–Jacobi walk-sum diagrams $\mathcal{G}_n$ for $n = 1, 2, 3$. (Right) Gauss–Seidel walk-sum diagrams $\mathcal{G}_n$ for $n = 1, 2, 3, 4$.
This proposition, proven in Appendix B, states that each of our algorithms has a precise walk-sum interpretation. A consequence of this statement is that no walk is over-counted, i.e., each walk in $\mathcal{W}_n(* \to t)$ submits to a unique decomposition with respect to the construction process (18)–(20) (see proof for details), and appears exactly once in the sum at each iteration.
As discussed in Section V (Propositions 3 and 4), the iterative
process does even more; the walk-sets at successive iterations
are nested and, under an appropriate condition, are “complete”
so that convergence is guaranteed for walk-summable models.
Showing and understanding all these properties are greatly facilitated by the introduction of a visual representation of how
each of our algorithms computes walks, and that is the subject
of Section IV-B.
B. Walk-Sum Diagrams
In the rest of this section, we present a graphical interpretation of our algorithms, and of the walk-sets (18)–(20) that are central to Proposition 1 (which in turn is the key to our convergence analysis in Section V). This interpretation provides a clearer picture of memory usage and information flow at each iteration. Specifically, for each algorithm we construct a sequence of graphs $\mathcal{G}_n$ such that a particular set of walks in these graphs corresponds exactly to the sets (18)–(20) computed by the sequence of iterates $\{\hat{x}^{(n)}\}$. The graphs $\mathcal{G}_n$ are called walk-sum diagrams. Recall that $\mathcal{S}_n$ corresponds to the subgraph used at iteration $n$, generally using some of the values computed from a preceding iteration. The graph $\mathcal{G}_n$ captures all of these preceding computations leading up to and including the computations at iteration $n$.

As a result, $\mathcal{G}_n$ has very specific structure for each algorithm. It consists of a number of levels—within each level we
rithm. It consists of a number of levels—within each level we
capture the subgraph used at the corresponding iteration, and
the final level corresponds to the results at the end of iteration . Although some variables may not be updated at each
iteration, the values of those variables are preserved for use in
includes all the
subsequent iterations; thus, each level of
nodes in . The update variables at any iteration (i.e., the nodes
in ) are represented as solid circles, and the nonupdate ones
as open circles. All edges in each —edges of included in
this subgraph—are included in that level of the diagram. As in
, these are undirected edges, as our algorithms perform inference on this subgraph. However, this inference update uses
some values from preceding iterations (15), (17); hence, we use
directed edges (corresponding to ) from nodes at preceding
levels. The directed nature of these edges is critical as they capture the one-directional flow of computations from iteration to
iteration, while the undirected edges within each level capture
the inference computation (15) at each iteration. At the end of
iteration , only the values at level are of interest. Therefore,
that begin at any solid
the set of walks (reweighted by ) in
node at any level, and end at any node at the last level are of importance, where walks can only move in the direction of directed
edges between levels, but in any direction along the undirected
edges within each level.
Later in this section we provide a general procedure for constructing walk-sum diagrams for our most general algorithms, but we begin by illustrating these diagrams and the points made in the preceding paragraph using a simple 3-node, fully connected graph (with variables denoted $x_1$, $x_2$, $x_3$). We look at two of the simplest iterative algorithms in the classes we have described, namely the classic GJ and GS iterations [15], [16]. Fig. 1 shows the walk-sum diagrams for these algorithms.

In the GJ algorithm each variable is updated at each iteration using the values from the preceding iteration of every other variable (this corresponds to a stationary ET algorithm (7) with the subgraph $\mathcal{S}$ being the fully disconnected graph of all the nodes in $V$). Thus, each level on the left in Fig. 1 is fully disconnected, with solid nodes for all variables and directed edges from each node at the preceding level to every other node at the next level. This provides a simple way of seeing both how walks are extended from one level to the next and, more subtly, how walks captured at one iteration are also captured at subsequent iterations. For example, the walk 12 in $\mathcal{G}_2$ is captured by the directed edge that begins at node 1 at level 1 and proceeds to node 2 at level 2 (the final level of $\mathcal{G}_2$). However, this walk in $\mathcal{G}_3$ is captured by the walk that begins at node 1 at level 2 and proceeds to node 2 at level 3.
The GS algorithm is a serial iteration that updates one variable at a time, cyclically, so that after $|V|$ iterations each variable is updated exactly once. On the right-hand side of Fig. 1, only one node at each level is solid, using values of the other nodes from the preceding level. For nonupdate variables at any iteration, a weight-1 directed edge is included from the same node at the preceding level. For example, since $x_2$ is updated at level 2, we have open circles for nodes 1 and 3 at that level and weight-1 directed edges from their copies at level 1. Weight-1 edges do not affect the weight of any walk. Hence, at level 4 we still capture the walk 12 from level 2 (from node 1 at level 1 to node 2 at level 2); the walk is extended to node 2 at levels 3 and 4 with weight-1 directed edges.
For general graphs, the walk-sum diagram $\mathcal{G}_n$ of one of our algorithms is constructed as follows.
Fig. 2. (Left) Nonstationary ET: Subgraphs and walk-sum diagram. (Right) Hybrid serial updates: Subgraphs and walk-sum diagram.
1) For $n = 1$, create a new copy of each $v \in V$, using solid circles for update variables and open circles for nonupdate variables; label these $v^{(1)}$. Draw the subgraph $\mathcal{S}_1$ using the solid nodes and undirected edges weighted by the partial correlation coefficient of each edge. $\mathcal{G}_1$ is the same as $\mathcal{S}_1$, with the exception that $\mathcal{G}_1$ also contains the nonupdate variables denoted by open circles.

2) Given $\mathcal{G}_{n-1}$, create a new copy of each $v \in V$, using solid circles for update variables and open circles otherwise; label these $v^{(n)}$. Draw $\mathcal{S}_n$ using the update variables with undirected edges. Draw a directed edge from the variable $u^{(\tau_n(u \to v))}$ (since $\tau_n(u \to v) \le n - 1$, this variable lies at a preceding level) to $v^{(n)}$ for each $(u, v) \in E_n$. If there are no failed links, $\tau_n(u \to v) = n - 1$. Both these undirected and directed edges are weighted by their respective partial correlation coefficients. Draw a directed edge to each nonupdate variable $v^{(n)}$ from the corresponding $v^{(n-1)}$ with unit edge weight.

A level in a walk-sum diagram refers to the $n$th replica of the variables.
Rules for Walks in $\mathcal{G}_n$: Walks must respect the orientation of each edge, i.e., walks can cross an undirected edge in either direction, but can only cross directed edges in one direction. In addition, walks can only start at the update variables of each level. Interpreted in this manner, the walks in $\mathcal{G}_n$, reweighted by $\hat{h}$ and ending at one of the variables at the final level, are exactly the walks computed in $\hat{x}^{(n)}$.

Proposition 2: Let $\mathcal{G}_n$ be a walk-sum diagram constructed and interpreted according to the preceding rules. In walk-summable models, for any $n$ and $t \in V$, and with $\hat{x}^{(0)} = 0$,

$$\hat{x}^{(n)}_t = \phi\big(\hat{h};\, \mathcal{W}_{\mathcal{G}_n}(* \to t)\big). \qquad (22)$$

Proof: Based on the preceding discussion, one can check the equivalence of the walks computed by the walk-sum diagrams with the walk-sets (18)–(20). Proposition 1 then yields (22). ∎

Sections IV-C–E describe walk-sum diagrams for the various algorithms presented in Section III.

C. Nonstationary Parallel Updates

We describe walk-sum diagrams for the parallel ET algorithm of Section III-A. Here, $V_n = V$ for all $n$. Since there is no link failure, $\tau_n(u \to v) = n - 1$ for every directed edge. Hence, the walk-sum formulas (18)–(19) reduce to

$$\mathcal{W}_n(s \to t) = \Big[\bigcup_{(u,v) \in E_n} \mathcal{W}_{n-1}(s \to u) \cdot (u, v) \cdot \mathcal{W}_{\mathcal{S}_n}(v \to t)\Big] \cup \mathcal{W}_{\mathcal{S}_n}(s \to t). \qquad (23)$$

The stationary GJ iteration discussed previously falls in this class. The left-hand side of Fig. 2 shows the trees $\mathcal{S}_1$, $\mathcal{S}_2$, $\mathcal{S}_3$ and the corresponding first three levels of the walk-sum diagram for a more general nonstationary ET iteration. This example illustrates how walks are “collected” in walk-sum diagrams at each iteration. First, walks can proceed along undirected edges within each level, and from one level to the next along directed edges (capturing cut edges). Second, the walks relevant at each iteration must end at that level. For example, the walk 13231 is captured at iteration 1 as it is present in the undirected edges at level 1. At iteration 2, however, we are interested in walks ending at level 2. The walk 13231 is still captured, but in a different manner—through the walk 1323 at level 1, followed by the hop 31 along the directed edge from node 3 at level 1 to node 1 at level 2. At iteration 3, this walk is captured first by the hop from node 1 at level 1 to node 3 at level 2, then by the hop 32 at level 2, followed by the hop from node 2 at level 2 to node 3 at level 3, and finally by the hop 31 at level 3.

D. Nonstationary Serial Updates

We describe similar walk-sum diagrams for the serial update scheme of Section III-B. Since there is no link failure, $\tau_n(u \to v) = n - 1$ here as well. The recursive walk-set update (18) can be specialized as follows:

$$\mathcal{W}_n(s \to t) = \Big[\bigcup_{(u,v) \in E_n} \mathcal{W}_{n-1}(s \to u) \cdot (u, v) \cdot \mathcal{W}_{\mathcal{S}_n}(v \to t)\Big] \cup \mathcal{W}_{\mathcal{S}_n}(s \to t), \quad t \in V_n. \qquad (24)$$

While (23) is a specialization to iterations with parallel updates, (24) is relevant for serial updates. The GS iteration discussed in Section IV-B falls in this class, as do the more general serial updates described in Section III-B in which we update a subset of variables $V_n$ based on a subgraph of the induced graph of $V_n$. The right-hand side of Fig. 2 illustrates an example for our 3-node model. We show the subgraphs $\mathcal{S}_n$ used in the first four stages of the algorithm and the corresponding 4-level walk-sum diagram. Note that at iteration 2 we update two of the variables without taking into account the edge connecting them. Indeed,
Fig. 3. Nonstationary updates with failing links: Subgraphs used along with failed edges at each iteration (left) and walk-sum diagram (right).
the updates at the first four iterations of this example include
block GS, a hybrid of ET and block GS, parallel ET, and GS,
respectively.
E. Failing Links
We now discuss the general nonstationary update scheme of Section III-C involving failing links. The recursive walk-set computation equations for this iteration are given by (18)–(20). Fig. 3 shows the subgraph used and the edges in $E_n$ that fail at each iteration, and the corresponding 4-level walk-sum diagram. We elaborate on the computation and propagation of information at each iteration. At iteration 1, inference is performed using subgraph $\mathcal{S}_1$, followed by nodes 1 and 2 passing a message to each other according to the post-inference message-passing rule (16). At iteration 2 only node 3 is updated. As no links fail, node 3 gets information from nodes 1 and 2 at level 1. At iteration 3, the link from node 2 to node 1 fails. But node 1 has information about node 2 at level 1 (due to the post-inference message-passing step from iteration 1). This information is used from the local memory at node 1 in (17), and is represented by the arrow from node 2 at level 1 to node 1 at level 3. At iteration 4, the links between nodes 1 and 3 fail. Similar reasoning as in iteration 3 applies to the arrows drawn across multiple levels from node 1 to node 3, and from node 3 to node 1. Further, post-inference message-passing at this iteration only takes place between nodes 1 and 2, because $\{1, 2\}$ is the only edge in $\mathcal{S}_4$.
V. CONVERGENCE ANALYSIS
We now show that all the algorithms of Section III converge in walk-summable models. As in Section IV-A, we focus on the most general nonstationary algorithm with failing links of Section III-C. We begin by showing that $\hat{x}^{(n)}$ converges to the correct means when $\hat{x}^{(0)} = 0$. Next, we use this result to show that we can achieve convergence to the correct means for any initial guess $\hat{x}^{(0)}$.

The proof that $\hat{x}^{(n)} \to \hat{x}$ as $n \to \infty$ relies on the fact that $\mathcal{W}_n(s \to t)$ eventually contains every element of the set $\mathcal{W}(s \to t)$ of all the walks in $\mathcal{G}$ from $s$ to $t$, a condition we refer to as completeness. Showing this begins with the following
proposition proved in Appendix C.
Proposition 3 (Nesting): The walk-sets defined in (18)–(20) are nested, i.e., for every pair of vertices $s, t \in V$, $\mathcal{W}_n(s \to t) \subseteq \mathcal{W}_{n+1}(s \to t)$ for each $n$.
This statement is easily seen for a stationary ET algorithm because the walk-sum diagram $\mathcal{G}_n$ from levels 2 to $n$ is a replica of $\mathcal{G}_{n-1}$ (for example, the GJ diagram in Fig. 1). However,
the proposition is less clear for nonstationary iterations. The
discussion in Section IV-C illustrates this point; the paths that
a walk traverses change drastically depending on the level in
the walk-sum diagram at which the walk ends. Nonetheless, as
shown in Appendix C, the structure of the estimation algorithms
that we consider ensures that whenever a walk is not explicitly
captured in the same form it appeared in the preceding iteration, it is recovered through a different path in the subsequent
walk-sum diagram (no walks are lost).
Completeness relies on both nesting and the following additional condition.

Definition 2: Let $(s, t)$ be any directed edge in $\mathcal{G}$. For each $n$, let $A_n$ denote the set of directed active edges (links that do not fail) in $E_n$ at iteration $n$. The edge $(s, t)$ is said to be updated infinitely often⁵ if for every $n$, there exists an $m \ge n$ such that $(s, t) \in A_m$.
If there is no link failure, this definition reduces to including each vertex in $V$ in the update set $V_n$ infinitely often. For parallel nonstationary ET iterations (Section III-A), this property is satisfied for any sequence of subgraphs. Note that there are cases in which inference algorithms may not have to traverse each edge infinitely often. For instance, suppose that $\mathcal{G}$ can be decomposed into subgraphs $\mathcal{G}_1$ and $\mathcal{G}_2$ that are connected by a single edge, with $\mathcal{G}_2$ having small size so that we can perform exact computations. For example, $\mathcal{G}_2$ could be a leaf node (i.e., have degree one). We can eliminate the variables in $\mathcal{G}_2$, propagate information “into” $\mathcal{G}_1$ along the single connecting edge, perform inference within $\mathcal{G}_1$, and then back-substitute. Hence, the single connecting edge is traversed only finitely often. In this case the hard part of the overall inference procedure is on the reduced graph with leaves and small, dangling subgraphs eliminated, and we focus on inference problems on such graphs. Thus, we assume that each vertex in $\mathcal{G}$ has degree at least two and study algorithms that traverse each edge infinitely often.
Proposition 4 (Completeness): Let $w$ be an arbitrary walk from $s$ to $t$ in $\mathcal{G}$. If every edge in $\mathcal{G}$ is updated infinitely often (in both directions), then there exists an $N$ such that $w \in \mathcal{W}_n(s \to t)$ for all $n \ge N$, where the walk-set $\mathcal{W}_n(s \to t)$ is defined in (18)–(20).
The proof of this proposition appears in Appendix D. We can now state and prove the following.

Theorem 1: If every edge in $\mathcal{G}$ is updated infinitely often (in both directions), then $\hat{x}^{(n)} \to \hat{x}$ as $n \to \infty$ in walk-summable models, with $\hat{x}^{(0)} = 0$ as defined in Section IV-A.

Proof: One can check that $\mathcal{W}_n(s \to t) \subseteq \mathcal{W}(s \to t)$ for each $n$. This is because equations (18)–(20) only use edges from the original graph $\mathcal{G}$. We have from Proposition 4 that every walk from $s$ to $t$ in $\mathcal{G}$ is eventually contained in $\mathcal{W}_n(s \to t)$. Thus, $\lim_{n \to \infty} \mathcal{W}_n(s \to t) = \mathcal{W}(s \to t)$. Given these arguments and the nesting of the walk-sets $\mathcal{W}_n(s \to t)$ from Proposition 3, we can appeal to the results in Section II-C to conclude that $\hat{x}^{(n)} \to \hat{x}$ as $n \to \infty$. ∎
Theorem 1 shows that $\hat{x}^{(n)} \to \hat{x}$ for $\hat{x}^{(0)} = 0$. The following result, proven in Appendix E, shows that in walk-summable models convergence is achieved for any choice of initial condition.⁶

⁵If $\mathcal{G}$ contains a singleton node, then this node must be updated at least once.

⁶Note that in this case the messages must be initialized as $M_0(s \to t) = R_{ts}\, \hat{x}^{(0)}_s$ for each directed edge $(s, t) \in E$.
Theorem 2: If every edge is updated infinitely often, then $\hat{x}^{(n)}$, computed according to (15), (17), converges to the correct means in walk-summable models for any initial guess $\hat{x}^{(0)}$.
Next, we describe a stronger convergence result when there is no communication failure.

Corollary 1: Assuming no communication failure, we have the following special cases for convergence in walk-summable models (with any initial guess):
1) the hybrid algorithms of Section III-B converge to the correct mean as long as each variable is updated infinitely often;
2) the parallel ET iterations of Section III-A (with $V_n = V$) converge to the correct mean using any sequence of subgraphs.

We note that in the parallel ET iteration (with no link failure), it is not necessary to include each edge in some subgraph $\mathcal{S}_n$; indeed, even stationary algorithms that use the same subgraph at each iteration are guaranteed to converge.
Theorem 2 shows that walk-summability is a sufficient condition for all our algorithms to converge for a very large and flexible set of sequences of tractable subgraphs or subsets of variables (ones that update each edge infinitely often) on which to perform successive updates. Corollary 1 requires even weaker conditions for convergence if there is no communication failure. The following result, proven in Appendix F, shows that walk-summability is also necessary for this complete flexibility. Thus, while any of our algorithms may converge for some sequence of subsets of variables and tractable subgraphs in a non-walk-summable model, there is at least one sequence of updates that leads to a divergent iteration.

Theorem 3: For any non-walk-summable model, there exists at least one sequence of iterative steps that is ill-posed, or for which $\hat{x}^{(n)}$, computed according to (15) and (17), diverges.

VI. ADAPTIVE ITERATIONS AND EXPERIMENTAL RESULTS

In this section we address two topics. The first is taking advantage of the great flexibility in choosing successive iterative steps by developing techniques that adaptively optimize the online choice of the next tree or subset of variables to use in order to reduce the error as quickly as possible. The second is providing experimental results that demonstrate the convergence behavior of these adaptive algorithms.

A. Choosing Trees and Subsets of Variables Adaptively

At iteration $n$, let the error be $e^{(n)} \triangleq \hat{x} - \hat{x}^{(n)}$, and the residual error be $h^{(n)} \triangleq \hat{h} - \hat{J}\hat{x}^{(n)}$. Note that it is tractable to compute the residual error at each iteration.

1) Trees: We describe an efficient algorithm to choose spanning trees adaptively to use as preconditioners in the ET algorithm of Section III-A. We have the following relationship between the error at iteration $n$ and the residual error at iteration $n-1$:

$$e^{(n)} = J_{\mathcal{S}_n}^{-1} K_{\mathcal{S}_n}\, \hat{J}^{-1} h^{(n-1)}.$$

Based on this relationship, we have a walk-sum interpretation of $e^{(n)}$, and consequently the following bound on the norm of $e^{(n)}$:

$$\big\|e^{(n)}\big\| \le \bar\phi\big(|h^{(n-1)}|;\, \mathcal{W}(* \xrightarrow{\setminus \mathcal{S}_n} *)\big) = \bar\phi\big(|h^{(n-1)}|;\, \mathcal{G}\big) - \bar\phi\big(|h^{(n-1)}|;\, \mathcal{S}_n\big) \qquad (25)$$

where $\mathcal{W}(* \xrightarrow{\setminus \mathcal{S}_n} *)$ denotes walks in $\mathcal{G}$ that must traverse edges not in $\mathcal{S}_n$, $|h^{(n-1)}|$ refers to the entry-wise absolute value vector of $h^{(n-1)}$, $\bar\phi(|h^{(n-1)}|;\, \mathcal{G})$ refers to the reweighted absolute walk-sum over all walks in $\mathcal{G}$, and $\bar\phi(|h^{(n-1)}|;\, \mathcal{S}_n)$ refers to the reweighted absolute walk-sum over all walks in $\mathcal{S}_n$. Minimizing the error $e^{(n)}$ thus reduces to choosing $\mathcal{S}_n$ to maximize $\bar\phi(|h^{(n-1)}|;\, \mathcal{S}_n)$. Hence, if we maximize among all trees, we have the following maximum walk-sum tree problem:

$$\mathcal{S}_n = \arg\max_{\mathcal{T}\ \text{a spanning tree of}\ \mathcal{G}}\ \bar\phi\big(|h^{(n-1)}|;\, \mathcal{T}\big). \qquad (26)$$

Rather than solving this combinatorially complex problem, we instead solve a problem that minimizes a looser upper bound than (25). Specifically, consider any edge $\{u, v\} \in E$ and all of the walks that live solely on this single edge. It is not difficult to show that

$$w_{uv} \triangleq \bar\phi\big(|h^{(n-1)}|;\, u \leftrightarrow v\big) = \big(|h^{(n-1)}_u| + |h^{(n-1)}_v|\big)\, \frac{|r_{uv}|}{1 - |r_{uv}|}. \qquad (27)$$

This weight provides a measure of the error-reduction capacity of edge $\{u, v\}$ by itself at iteration $n$. This leads directly to choosing the maximum spanning tree [28] by solving

$$\mathcal{S}_n = \arg\max_{\mathcal{T}\ \text{a spanning tree of}\ \mathcal{G}}\ \sum_{\{u,v\} \in \mathcal{T}} w_{uv}. \qquad (28)$$

For any tree $\mathcal{T}$, the set of walks captured in the sum in (28) is a subset of all the walks in $\mathcal{T}$, so that solving (28) provides a lower bound on (26) and thus minimizes a looser upper bound than (25). Each iteration of this method can be solved using standard greedy approaches for the maximum spanning tree problem [28]. For sparse graphs, more sophisticated variants can also be used to achieve a lower per-iteration complexity [28].
2) Subsets of Variables: We present an algorithm to choose the next best subset of variables for the block GS algorithm of Section III-B. The error at iteration $n$ can be written as follows:

$$e^{(n)}_{V_n} = -\big(\hat{J}_{V_n}\big)^{-1} \hat{J}_{V_n, V_n^c}\, e^{(n-1)}_{V_n^c}, \qquad e^{(n)}_{V_n^c} = e^{(n-1)}_{V_n^c}.$$

As with (25), we have the following upper bound:

$$\big\|e^{(n)}\big\| \le \bar\phi\big(|h^{(n-1)}|;\, \mathcal{G}\big) - \bar\phi\big(|h^{(n-1)}|;\, E(V_n)\big) \qquad (29)$$

where $E(V_n)$ refers to the edges in the induced subgraph of $V_n$, and the second term is the reweighted absolute walk-sum over walks that live entirely within this induced subgraph. Minimizing this upper bound reduces to solving the following maximum walk-sum block problem:

$$V_n = \arg\max_{V' \subset V,\ |V'| = k}\ \bar\phi\big(|h^{(n-1)}|;\, E(V')\big). \qquad (30)$$

Fig. 4. (Left) Convergence results for a randomly generated 15 × 15 nearest-neighbor grid-structured model. (Right) Average number of iterations required for the normalized residual to decrease by a given factor over 100 randomly generated models.

As with the maximum walk-sum tree problem, finding the optimal such block directly is combinatorially complex. Therefore, we consider the following relaxed maximum walk-sum block problem based on single-edge walks:

$$V_n = \arg\max_{V' \subset V,\ |V'| = k}\ \bar\phi^{(1)}\big(|h^{(n-1)}|;\, E(V')\big) \qquad (31)$$

where the superscript $(1)$ denotes the restriction that walks can traverse at most one edge. The walks in (31) are a subset of the walks in (30). Thus, solving (31) provides a lower bound on (30), hence minimizing a looser upper bound on the error than (29).

Solving (31) is also combinatorially complex; therefore, we use a greedy method for an approximate solution:
1) Set $V_n = \emptyset$. Assuming that the goal is to solve the problem for $|V_n| = k$, compute node weights $w_s$ based on the walks captured by (31) if node $s$ were to be included in $V_n$.
2) Find the maximum-weight node $s^* = \arg\max_{s \in V \setminus V_n} w_s$, and set $V_n \leftarrow V_n \cup \{s^*\}$.
3) If $|V_n| = k$, stop. Otherwise, update the weight of each neighbor $t$ of $s^*$ and go to Step 2):

$$w_t \leftarrow w_t + \big(|h^{(n-1)}_{s^*}| + |h^{(n-1)}_t|\big)\, |r_{s^* t}|.$$

This update captures the extra walks in (31) if $t$ were to be added to $V_n$.

Step 3) is the greedy aspect of the algorithm, as it updates weights by computing the extra walks that would be captured in (31) if node $t$ were added to $V_n$, with the assumption that the nodes already in $V_n$ remain unchanged. Note that only the weights of the neighbors of $s^*$ are updated in Step 3); thus, there is a bias towards choosing a connected block. In choosing successive blocks in this way, we collect walks adaptively without explicit regard for the objective of updating each node infinitely often. However, our method is biased towards choosing variables that have not been updated for a few iterations, as the residual error of such variables becomes larger relative to the other variables. Indeed, empirical evidence confirms this behavior, with all the simulations leading to convergent iterations. For bounded block size $k$, each iteration using this technique can be performed efficiently using a sorted data structure over the node weights [28]. The space complexity of maintaining such a structure can be significant compared to the adaptive ET procedure.
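A sketch of the greedy block selection follows (our code, under the same single-edge-walk reconstruction used above, so initial node weights are $|h^{(n-1)}_s|$ and Step 3) adds the single-hop walks on the new edges):

```python
# Greedy maximum walk-sum block, Steps 1)-3) (our sketch).
import numpy as np

def next_best_block(R, h_res, k):
    """Pick k nodes greedily; R: partial-correlation matrix, h_res: residual."""
    w = np.abs(h_res).astype(float)        # Step 1): length-0 walk weights
    block = []
    for _ in range(k):
        s = int(np.argmax(w))              # Step 2): best remaining node
        block.append(s)
        w[s] = -np.inf                     # exclude s from future picks
        for t in np.flatnonzero(R[s]):     # Step 3): update neighbors of s
            if np.isfinite(w[t]):
                w[t] += (abs(h_res[s]) + abs(h_res[t])) * abs(R[s, t])
    return block
```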
Fig. 4. (Left) Convergence results for a randomly generated 15 × 15 nearest-neighbor grid-structured model. (Right) Average number of iterations required for the normalized residual to be reduced by a fixed factor over 100 randomly generated models.

3) Experimental Illustration: We test the preceding two adaptive algorithms on randomly generated 15 × 15 nearest-neighbor grid models with $\rho(\bar{R}) = 0.99$ and with $h = \mathbf{1}$.7 The blocks used in block GS were of a fixed small size. We compare these adaptive methods to standard nonadaptive one-tree and two-tree ET iterations [13]. Fig. 4 shows the performance of these algorithms. The plot shows the relative decrease in the normalized residual error versus the number of iterations. The table shows the average number of iterations required for these algorithms to reduce the normalized residual error below a fixed threshold; the average was computed based on the performance over 100 randomly generated models. All these models are poorly conditioned because they are barely walk-summable. The number of iterations for block GS is scaled appropriately to account for the different per-iteration computational costs of adaptive ET and adaptive block GS. The one-tree ET method uses a spanning tree obtained by removing all the vertical edges except the middle column. The two-tree method alternates between this tree and its rotation (obtained by removing all the horizontal edges except the middle row).
7 The grid edge weights are chosen uniformly at random from $[-1, 1]$. The matrix $R$ is then scaled so that $\rho(\bar{R}) = 0.99$. The potential vector $h$ is chosen to be the all-ones vector.
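In code, the model generation of footnote 7 might look as follows; this is a sketch under the stated conventions (uniform weights on $[-1, 1]$, rescaling so that $\rho(\bar{R}) = 0.99$, all-ones $h$), with the grid indexing and random seed being our own choices.

```python
import numpy as np

def random_grid_model(n=15, target_rho=0.99, seed=0):
    """Random nearest-neighbor grid model per footnote 7 (sketch)."""
    rng = np.random.default_rng(seed)
    N = n * n
    R = np.zeros((N, N))
    idx = lambda i, j: i * n + j
    for i in range(n):
        for j in range(n):
            for di, dj in ((0, 1), (1, 0)):      # right and down neighbors
                if i + di < n and j + dj < n:
                    w = rng.uniform(-1.0, 1.0)   # edge weight in [-1, 1]
                    R[idx(i, j), idx(i + di, j + dj)] = w
                    R[idx(i + di, j + dj), idx(i, j)] = w
    # Rescale so the model is barely walk-summable: rho(|R|) = 0.99.
    rho = np.max(np.abs(np.linalg.eigvalsh(np.abs(R))))
    R *= target_rho / rho
    J = np.eye(N) - R                            # normalized information matrix
    h = np.ones(N)                               # all-ones potential vector
    return J, h

J, h = random_grid_model()
print(J.shape)  # (225, 225)
```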
Fig. 5. Convergence of the memory-based algorithm on the same randomly generated 15 × 15 grid model used in Fig. 4: varying the link failure probability with the mean-downtime parameter fixed at 0.3 (left), and varying the mean-downtime parameter with the failure probability fixed at 0.3 (right).
Both the adaptive ET and block GS algorithms converge far faster than the one-tree and two-tree iterations, thus providing a computationally attractive method for estimation in the broad class of walk-summable models.
B. Communication Failure: Experimental Illustration
To illustrate our adaptive methods in the context of communication failure, we consider a simple model for a distributed
sensor network in which links (edges) fail independently with
failure probability , and each failed link remains inactive for
a certain number of iterations given by a geometric random
. At each iteration, we find the best
variable with mean
spanning tree (or forest) among the active links using the approach described in Section VI-A-1). The maximum spanning
tree problem can be solved in a distributed manner using the algorithms presented in [29], [30]. Fig. 5 shows the convergence
of our memory-based algorithm from Section III-C on the same
randomly generated 15 15 grid model used to generate the
plot in Fig. 4 (again, with
). The different curves are
obtained by varying and . As expected, the first plot shows
that our algorithm is slower to converge as the failure probability increases, while the second plot shows that convergence
is faster as is increased (which decreases the average inactive
time). These results show that our adaptive algorithms provide
a scalable, flexible, and convergent method for the estimation
problem in a distributed setting with communication failure.
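A minimal sketch of this failure model follows; the names p_fail and gamma, the per-iteration failure draw, and the revival bookkeeping are illustrative assumptions. At each iteration, one would run a maximum spanning tree (or forest) routine, such as the one sketched in Section VI-A-1) above, on the currently active edges before performing the memory-based update.

```python
import numpy as np

def simulate_active_edges(edges, num_iters, p_fail=0.3, gamma=0.3, seed=0):
    """Yield the set of active links at each iteration (sketch).

    edges:   list of (u, v) links in the sensor network
    p_fail:  per-iteration probability that an active link fails
    gamma:   parameter of the geometric downtime distribution (mean 1/gamma)
    """
    rng = np.random.default_rng(seed)
    downtime = {e: 0 for e in edges}   # iterations left until a link revives
    for _ in range(num_iters):
        for e in edges:
            if downtime[e] > 0:
                downtime[e] -= 1                    # still inactive
            elif rng.random() < p_fail:
                downtime[e] = rng.geometric(gamma)  # link fails
        yield [e for e in edges if downtime[e] == 0]

# Example: inspect the active links over three iterations of a small chain.
for active in simulate_active_edges([(0, 1), (1, 2), (2, 3)], num_iters=3):
    print(active)
```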
VII. CONCLUSION

In this paper, we have described and analyzed a rich set of algorithms for estimation in Gaussian graphical models with arbitrary structure. These algorithms are iterative in nature and involve a sequence of inference problems on tractable subgraphs over subsets of variables. Our framework includes parallel iterations such as ET, in which inference is performed on a tractable subgraph of the whole graph at each iteration, and serial iterations such as block GS, in which the induced subgraph of a small subset of variables is directly inverted at each iteration. We describe hybrid versions of these algorithms that involve inference on a subgraph of a subset of variables. We also discuss a method that uses local memory at each node to overcome temporary communication failures that may arise in distributed sensor networks. Our memory-based framework applies to the nonstationary ET, block GS, and hybrid algorithms.

We analyze these algorithms based on the recently introduced walk-sum interpretation of Gaussian inference. A salient feature of our analysis is the development of walk-sum diagrams. These graphs correspond exactly to the walks computed after each iteration, and provide an intuitive graphical comparison between the various algorithms. This walk-sum analysis allows us to conclude that, for the broad class of walk-summable models, our algorithms converge for a very large and flexible set of sequences of subgraphs and subsets of variables. We also describe how this flexibility can be exploited by formulating efficient algorithms that choose spanning trees and subsets of variables adaptively at each iteration. These algorithms are then used in the ET and block GS iterations, respectively, to demonstrate that significantly faster convergence can be obtained with these methods than with traditional one-tree and two-tree ET iterations.

Our adaptive algorithms are greedy in that they consider the effect of edges individually (by considering walk-sums concentrated on single edges). A strength of our analysis for the case of finding the "next best" tree is that we do obtain an upper bound on the resulting error, and hence on the possible gap between our greedy procedure and the truly optimal one. Obtaining tighter error bounds, or conditions on graphical models under which our choice of tree yields near-optimal solutions, is an open problem. Another interesting question is the development of general versions of the maximum walk-sum tree and maximum walk-sum block algorithms that choose the $k$ next best trees or blocks jointly in order to achieve maximum reduction in error after $k$ iterations. For applications involving communication failure, extending our adaptive algorithms in a principled manner to explicitly avoid failed links while optimizing the rate of convergence is an important problem. Finally, the fundamental principle of solving a sequence of tractable inference problems on subgraphs has been exploited for non-Gaussian inference problems (e.g., [12]), and extending our methods to this more general setting is of clear interest.

APPENDIX

A. Dealing With Un-Normalized Models

Consider an information matrix $J = D - K$ (where $D$ is the diagonal part of $J$) that is not normalized, i.e., $D \neq I$. The weight of a walk $w = (w_0, w_1, \ldots, w_\ell)$ can be redefined as follows:

$$\phi_J(w) = \frac{\prod_{i=0}^{\ell-1} K_{w_i, w_{i+1}}}{\prod_{i=0}^{\ell} D_{w_i, w_i}} = \left(D_{w_0, w_0}\, D_{w_\ell, w_\ell}\right)^{-1/2} \phi(w)$$

where $\phi_J(w)$ is the weight of $w$ with respect to the un-normalized model, and $\phi(w)$ is the weight of $w$ in the corresponding normalized model. We can then define walk-summability in terms of the absolute convergence of the un-normalized walk-sum over all walks from $u$ to $v$ (for each pair of vertices $u, v$). A necessary and sufficient condition for this un-normalized notion of walk-summability is $\rho(D^{-1/2} \bar{K} D^{-1/2}) < 1$, which is equivalent to the original condition $\rho(\bar{R}) < 1$ in the corresponding normalized model.

Un-normalized versions of the algorithms in Section III can be constructed by replacing every occurrence of the partial correlation matrix $R$ by the un-normalized off-diagonal matrix
$K$. The rest of our analysis and convergence results remain unchanged because we deal with abstract walk-sets. (Note that in the proof of Proposition 1, every occurrence of $I$ must be changed to $D$.) Alternatively, given an un-normalized model, one can first normalize the model by setting $R = D^{-1/2} K D^{-1/2}$ and $D^{-1/2} h$, then apply the algorithms of Section III, and finally "de-normalize" the resulting estimate as $\hat{x} = D^{-1/2} \hat{x}_{\mathrm{norm}}$. Such a procedure provides the same estimate as the direct application of the un-normalized versions of the algorithms in Section III outlined above.
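As a numerical sanity check of this equivalence, the following sketch normalizes a small (arbitrary, illustrative) information matrix, solves in the normalized domain, and de-normalizes the estimate; it recovers the solution of the original un-normalized system.

```python
import numpy as np

# Sketch: normalize an un-normalized information matrix J = D - K,
# solve in the normalized domain, then de-normalize the estimate.
J = np.array([[2.0, -0.5,  0.0],
              [-0.5, 3.0,  0.8],
              [0.0,  0.8,  1.5]])
h = np.ones(3)

Dhalf_inv = np.diag(1.0 / np.sqrt(np.diag(J)))

Jn = Dhalf_inv @ J @ Dhalf_inv        # normalized model: I - R
hn = Dhalf_inv @ h                    # normalized potential vector
xn = np.linalg.solve(Jn, hn)          # estimate in the normalized model
x = Dhalf_inv @ xn                    # de-normalize: x = D^{-1/2} x_norm

print(np.allclose(x, np.linalg.solve(J, h)))  # True
```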
B. Proof of Proposition 1

Remarks: Before proceeding with the proof of the proposition, we make some observations about the walk-sets (18)–(20) that will prove useful for the other proofs as well. For $n \ge 1$, notice that since the corresponding sets of edges are disjoint, the associated walk-sets are disjoint as well. Therefore, from Section II-C,

(32)

Every such walk can be decomposed uniquely as the concatenation of a walk from the first walk-set, a single-edge hop, and a walk from the second walk-set. The unique decomposition property is a consequence of the two edge sets being disjoint and of the middle portion of the walk being restricted to a length-1 hop. This property also implies that the walk-sets arising from distinct decompositions are disjoint. Based on these two observations, we have from Section II-C that

(33)

Proof: We provide an inductive proof. From (20), the iterate at iteration 0 equals the walk-sum over the initial walk-set, which is consistent with the proposition because we assume that our initial guess is 0 at each node. Assume, as the inductive hypothesis, that the proposition holds at iteration $n - 1$ for every node. For nodes that are not updated at iteration $n$, the claim follows from a chain of equalities in which the first equality is from (12), the second from the inductive hypothesis, and the third from (19). Hence, we can focus on the updated nodes, for which (32) and (33) can be rewritten as

(34)

From (32)–(34), we obtain an expression for the new iterate in which we use the walk-sum interpretation of the previous iterate $\hat{x}^{(n-1)}$. Simplifying further, we have that

(35)

where the last equality is from the inductive hypothesis. Finally, matching this walk-sum with the algorithm's update completes the induction; here the second equality is from (17) and the third from (15).
C. Proof of Proposition 3

We provide an inductive proof based on the walk-sum diagram $\mathcal{G}_n$. For a completely algebraic proof that is not based on walk-sum diagrams, we refer the reader to [31]. Based on the construction of the walk-sum diagrams in Section IV-B and the rules for interpreting walks in these diagrams, one can check the equivalence of the walks in $\mathcal{G}_n$ with the walk-sets (18)–(20) (this property is used in the proof of Proposition 2). More precisely, for each walk in the walk-set of a node there is a corresponding walk in $\mathcal{G}_n$ that begins at some copy of a node in the diagram and ends at that node, and vice versa.

First, note that the proposition is true for $n = 1$. We assume as the inductive hypothesis that it holds for $n - 1$, and suppose that $w$ is a walk captured at iteration $n$; we must show that $w$ corresponds to a walk in $\mathcal{G}_n$ that ends at the appropriate node and starts at some copy of a node in the diagram. If $w$ never leaves the final level of the diagram, the claim follows directly from (19). Otherwise, we trace the walk backwards in $\mathcal{G}_n$ from its endpoint, continuing as far as possible until we find an edge that crosses from the previous level of the diagram. We then have a decomposition of $w$ into a walk $u$ at level $n - 1$ of $\mathcal{G}_n$, the crossing edge, and the traced-back portion. We need to check that $u$ is captured at iteration $n - 1$. Since the traced walk makes its hop through some edge of the diagram, the definition of the crossing edge ensures that $u$ ends at a node of the previous level, and the inductive hypothesis then implies that $u$ is a walk in $\mathcal{G}_{n-1}$ that starts at some copy of a node and ends at the node where the hop occurs. This walk can continue in $\mathcal{G}_n$ along the crossing edge, and further along the traced-back portion to the endpoint of $w$. Therefore, $w$ corresponds to a walk in $\mathcal{G}_n$, as required.

D. Proof of Proposition 4

We provide an inductive proof with respect to the length of the walk $w$. If every edge is updated infinitely often, it is clear that every node is updated infinitely often. Therefore, the leading length-0 part of $w$ is computed when its starting node is first updated at some iteration. By the nesting of the walk-sets from Proposition 3, this part remains captured at all subsequent iterations. Now assume (as the inductive hypothesis) that the leading subwalk of $w$ containing all but the last step is captured from some iteration onwards. Given the infinitely-often update property, there exists a subsequent iteration at which the last edge of $w$ is updated, and at this iteration the full walk $w$ is captured. This can be concluded from (18) and from the nesting argument of Proposition 3. Again applying the nesting argument, we can prove the proposition because $w$ is then contained in the walk-sets of all later iterations. A similar argument yields the corresponding statement for the remaining walk-sets.

E. Proof of Theorem 2

From Theorem 1 and Proposition 1, we can conclude that the iterates converge element-wise to the correct means when the potential vector is entrywise non-negative. For a general potential vector, consider a shifted linear system with the same information matrix and an entrywise non-negative potential vector. If we solve this system using the same sequence of operations (subgraphs and failed links) that were used to obtain the original iterates, and with the same initial guess, then the shifted iterates converge to the correct solution of the shifted system. We will show that the difference between the shifted iterates and the original iterates itself follows the same iteration applied to the (entrywise non-negative) difference of the potential vectors; since both the shifted iterates and the difference iterates converge, this allows us to conclude that the original iterates converge element-wise to the correct means for any potential vector. We prove this final step inductively. The base case is clear because both iterations start from the same initial guess. Assume as the inductive hypothesis that the claimed relation holds at iteration $n - 1$. From this, one can check that it also holds at iteration $n$ using (15) and (17). The second equality in this verification follows from the inductive hypothesis, and the last two from simple algebra.
F. Proof of Theorem 3

Before proving the converse, we state the following lemma, which is proved in [32].

Lemma 2: Suppose $M$ is a symmetric positive-definite matrix, and $M = A - B$ is some splitting with $A$ and $B$ symmetric and $A$ nonsingular. Then, $\rho(A^{-1}B) < 1$ if and only if $A + B \succ 0$.

Proof: Assume that $J$ is valid but non-walk-summable. Therefore, $R$ must contain some negative partial correlation coefficients (since all valid attractive models, i.e., those containing only non-negative partial correlation coefficients, are walk-summable; see Section II-C). Let $R = R_{+} + R_{-}$, with $R_{+}$ containing the positive coefficients and $R_{-}$ containing the negative coefficients (including the negative sign). Consider a stationary ET iteration (7) based on cutting the negative edges, so that $J_{S} = I - R_{+}$ and $K_{S} = R_{-}$. If $J_{S}$ is singular, then the iteration is ill-posed. Otherwise, the iteration converges if and only if $\rho(J_{S}^{-1} K_{S}) < 1$ [15], [16]. From Lemma 2, we need to check the validity of

$$J_{S} + K_{S} = (I - R_{+}) + R_{-} = I - \bar{R}.$$

But $I - \bar{R} \succ 0$ if and only if the model is walk-summable (from Section II-C). Thus, this stationary iteration, if well-posed, does not converge in non-walk-summable models.
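The following sketch checks this argument numerically on a small non-walk-summable model of our own construction: it cuts the negative edges, forms $J_S$ and $K_S$, and confirms that $\rho(J_S^{-1} K_S) \ge 1$ exactly when $I - \bar{R}$ fails to be positive definite.

```python
import numpy as np

# A valid (positive definite) but non-walk-summable model: rho(|R|) = 1.2.
R = np.array([[0.0,  0.6, -0.6],
              [0.6,  0.0,  0.6],
              [-0.6, 0.6,  0.0]])
J = np.eye(3) - R
assert np.all(np.linalg.eigvalsh(J) > 0)       # valid model

Rplus = np.maximum(R, 0.0)                     # positive coefficients
Rminus = np.minimum(R, 0.0)                    # negative coefficients
JS = np.eye(3) - Rplus                         # cut the negative edges,
KS = Rminus                                    # so that J = JS - KS

rho_iter = max(abs(np.linalg.eigvals(np.linalg.solve(JS, KS))))
Rbar_pd = np.all(np.linalg.eigvalsh(np.eye(3) - np.abs(R)) > 0)

# Lemma 2: rho(JS^{-1} KS) < 1  iff  JS + KS = I - Rbar is positive definite.
print(rho_iter < 1, Rbar_pd)                   # both False for this model
```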
ACKNOWLEDGMENT
The authors would like to thank Dr. E. Sudderth for helpful
discussions and comments on early drafts of this work.
REFERENCES
[1] S. L. Lauritzen, Graphical Models. Oxford, U.K.: Oxford Univ.
Press, 1996.
[2] M. I. Jordan, “Graphical models,” Stat. Sci. (Special Issue on Bayesian
Statistics), vol. 19, pp. 140–155, 2004.
[3] C. Crick and A. Pfeffer, “Loopy belief propagation as a basis for communication in sensor networks,” presented at the 19th Conf. Uncertainty in Artificial Intelligence, Acapulco, Mexico, Aug. 8–10, 2003.
[4] S. Geman and D. Geman, “Stochastic relaxation, Gibbs distributions,
and the Bayesian restoration of images,” IEEE Trans. Pattern Anal.
Mach. Intell., vol. 6, no. 6, pp. 721–741, Nov. 1984.
[5] J. Woods, “Markov image modeling,” IEEE Trans. Autom. Control, vol.
23, no. 5, pp. 846–850, Oct. 1978.
[6] R. Szeliski, “Bayesian modeling of uncertainty in low-level vision,” J.
Comput. Vision, vol. 5, no. 3, pp. 271–301, Dec. 1990.
[7] P. Rusmevichientong and B. V. Roy, “An analysis of turbo decoding
with Gaussian densities,” Adv. Neural Inf. Process. Syst., vol. 12, pp.
575–581, 2000.
[8] P. W. Fieguth, W. C. Karl, A. S. Willsky, and C. Wunsch, “Multiresolution optimal interpolation and statistical analysis of Topex/Poseidon
satellite altimetry,” IEEE Trans. Geosci. Remote Sens., vol. 33, no. 2,
pp. 280–292, Mar. 1995.
[9] J. Pearl, Probabilistic Reasoning in Intelligent Systems. San Mateo,
CA: Morgan Kaufmann, 1988.
[10] E. B. Sudderth, M. J. Wainwright, and A. S. Willsky, “Embedded trees:
Estimation of Gaussian processes on graphs with cycles,” IEEE Trans.
Signal Process., vol. 52, no. 11, pp. 3136–3150, Nov. 2004.
[11] L. K. Saul and M. I. Jordan, “Exploiting tractable substructures in
intractable networks,” Adv. Neural Inf. Process. Syst., vol. 8, pp.
486–492, 1996.
[12] M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky, “Tree-based reparameterization framework for analysis of sum-product and related algorithms,” IEEE Trans. Inf. Theory, vol. 49, no. 5, pp. 1120–1146, May
2003.
[13] E. B. Sudderth, “Embedded trees: Estimation of Gaussian processes
on graphs with cycles,” Master’s thesis, Massachusetts Inst. of Technology, Cambridge, MA, 2002.
[14] V. Delouille, R. Neelamani, and R. Baraniuk, “Robust distributed estimation using the embedded subgraphs algorithm,” IEEE Trans. Signal
Process., vol. 54, no. 8, pp. 2998–3010, Aug. 2006.
[15] G. H. Golub and C. F. Van Loan, Matrix Computations. Baltimore,
MD: The Johns Hopkins Univ. Press, 1990.
[16] R. S. Varga, Matrix Iterative Analysis. New York: Springer-Verlag,
2000.
[17] D. M. Malioutov, J. K. Johnson, and A. S. Willsky, “Walk-sums and belief propagation in Gaussian graphical models,” J. Mach. Learn. Res.,
vol. 7, pp. 2031–2064, Oct. 2006.
[18] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis. Cambridge,
U.K.: Cambridge Univ. Press, 1994.
[19] R. Bru, F. Pedroche, and D. B. Szyld, “Overlapping additive and multiplicative Schwarz iterations for H-matrices,” Linear Algebra Its Appl.,
vol. 393, pp. 91–105, 2004.
[20] A. Frommer and D. B. Szyld, “H-splittings and two-stage iterative
methods,” Numer. Math., vol. 63, pp. 345–356, 1992.
[21] T. Gu, X. Liu, and X. Chi, “Relaxed parallel two-stage multisplitting
methods II: Asynchronous version,” Int. J. Comput. Math., vol. 80, no.
10, pp. 1277–1287, Oct. 2003.
[22] T. Speed and H. Kiiveri, “Gaussian Markov probability distributions
over finite graphs,” Ann. Stat., vol. 14, no. 1, pp. 138–150, Mar. 1986.
[23] L. Scharf, Statistical Signal Processing. Upper Saddle River, NJ:
Prentice-Hall, 2002.
[24] W. Rudin, Principles of Mathematical Analysis. New York: McGraw-Hill, 1976.
[25] R. Godement, Analysis I. New York: Springer-Verlag, 2004.
[26] P. Fieguth, W. Karl, and A. Willsky, “Efficient multiresolution counterparts to variational methods for surface reconstruction,” Comput. Vision Image Understand., vol. 70, no. 2, pp. 157–176, May 1998.
[27] W. G. Strang and G. J. Fix, An Analysis of the Finite Element Method.
Cambridge, MA: Wellesley-Cambridge, 1973.
[28] T. Cormen, C. Leiserson, R. Rivest, and C. Stein, Introduction to Algorithms. Cambridge, MA: MIT Press, 2001.
[29] R. G. Gallager, P. A. Humblet, and P. M. Spira, “A distributed algorithm for minimum-weight spanning trees,” ACM Trans. Program.
Lang. Syst., vol. 5, no. 1, pp. 66–77, Jan. 1983.
[30] B. Awerbuch, “Optimal distributed algorithms for minimum weight
spanning tree, counting, leader election, and related problems,” in
Proc. Annu. ACM Symp. Theory of Computing, 1987.
[31] V. Chandrasekaran, “Modeling and estimation in Gaussian graphical models: Maximum-entropy relaxation and walk-sum analysis,”
Master’s thesis, Massachusetts Inst. Technology, Cambridge, MA,
2007.
[32] L. Adams, “m-step preconditioned conjugate gradient methods,” SIAM
J. Sci. Stat. Comput., vol. 6, pp. 452–463, Apr. 1985.
Venkat Chandrasekaran (S’03) received the B.S.
degree in electrical engineering and the B.A. degree
in mathematics from Rice University, Houston, TX, in
2005, and the S.M. degree in electrical engineering
from the Massachusetts Institute of Technology,
Cambridge, in 2007, where he is currently working
towards the Ph.D. degree with the Stochastic Systems Group.
His research interests include statistical signal
processing, machine learning, and computational
harmonic analysis.
Jason K. Johnson received the B.S. degree in
physics and the M.S. degree in electrical engineering
and computer science, both from the Massachusetts
Institute of Technology (MIT), Cambridge, in 1995
and 2003, respectively, where he is currently working
towards the Ph.D. degree.
From 1995 to 2000, he was a Member of Technical
Staff at Alphatech, Burlington, MA, where he developed algorithms for multiscale signal and image
processing, data fusion and multitarget tracking. His
current research interests include Markov models,
tractable methods for inference and learning, and statistical signal and image
processing.
Alan S. Willsky (S’70–M’73–SM’82–F’86) joined
the Massachusetts Institute of Technology (MIT),
Cambridge, in 1973 and is the Edwin Sibley Webster Professor of Electrical Engineering. He was a
founder of Alphatech, Inc., Burlington, MA, and
is the Chief Scientific Consultant at BAE Systems
Advanced Information Technologies. From 1998
to 2002, he served on the U.S. Air Force Scientific
Advisory Board. He has delivered numerous keynote
addresses and is coauthor of the text Signals and Systems (Englewood Cliffs, NJ: Prentice-Hall, 1997).
His research interests are in the development and application of advanced
methods of estimation and statistical signal and image processing.
Dr. Willsky has received several awards, including the 1975 American Automatic Control Council Donald P. Eckman Award, the 1979 ASCE Alfred Noble
Prize, the 1980 IEEE Browder J. Thompson Memorial Award, the IEEE Control
Systems Society Distinguished Member Award in 1988, the 2004 IEEE Donald
G. Fink Prize Paper Award, and Doctorat Honoris Causa from the Université de
Rennes, France, in 2005.