Estimation in Gaussian Graphical Models Using Tractable Subgraphs: A Walk-Sum Analysis

Venkat Chandrasekaran, Student Member, IEEE, Jason K. Johnson, and Alan S. Willsky, Fellow, IEEE

Abstract—Graphical models provide a powerful formalism for statistical signal processing. Due to their sophisticated modeling capabilities, they have found applications in a variety of fields such as computer vision, image processing, and distributed sensor networks. In this paper, we present a general class of algorithms for estimation in Gaussian graphical models with arbitrary structure. These algorithms involve a sequence of inference problems on tractable subgraphs over subsets of variables. This framework includes parallel iterations such as embedded trees, serial iterations such as block Gauss–Seidel, and hybrid versions of these iterations. We also discuss a method that uses local memory at each node to overcome temporary communication failures that may arise in distributed sensor network applications. We analyze these algorithms based on the recently developed walk-sum interpretation of Gaussian inference. We describe the walks "computed" by the algorithms using walk-sum diagrams, and show that for iterations based on a very large and flexible set of sequences of subgraphs, convergence is guaranteed in walk-summable models. Consequently, we are free to choose spanning trees and subsets of variables adaptively at each iteration. This leads to efficient methods for optimizing the next iteration step to achieve maximum reduction in error. Simulation results demonstrate that these nonstationary algorithms provide a significant speedup in convergence over traditional one-tree and two-tree iterations.

Index Terms—Distributed estimation, Gauss–Markov random fields, graphical models, maximum walk-sum block, maximum walk-sum tree, subgraph preconditioners, walk-sum diagrams, walk-sums.

Manuscript received January 17, 2007; revised June 15, 2007. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Manuel Davy. This work was supported by the Army Research Office Grant W911NF-05-1-0207 and the Air Force Office of Scientific Research Grant FA9550-04-1-0351. The authors are with the Laboratory for Information and Decision Systems, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: [email protected]; [email protected]; [email protected]). Digital Object Identifier 10.1109/TSP.2007.912280

I. INTRODUCTION

Graphical models offer a convenient representation for joint probability distributions and convey the Markov structure in a large number of random variables compactly. A graphical model [1], [2] is a collection of variables defined with respect to a graph; each vertex of the graph is associated with a random variable and the edge structure specifies the conditional independence properties among the variables. Due to their sophisticated modeling capabilities, graphical models [also known as Markov random fields (MRFs)] have found applications in a variety of signal processing tasks involving distributed sensor networks [3], images [4], [5], and computer vision [6]. Our focus in this paper is on the important class of Gaussian graphical models, also known as Gauss–Markov random fields (GMRFs), which have been widely used to model natural phenomena in many large-scale estimation problems [7], [8].
In estimation problems in which the prior and observation models have normally distributed random components, computing the Bayes least-squares estimate is equivalent to solving a linear system of equations specified in terms of the information-form parameters of the conditional distribution. Due to its cubic computational complexity in the number of variables, direct matrix inversion to solve the Gaussian estimation problem is intractable in many applications in which the number of variables is very large (e.g., in oceanography problems [8] the number of variables may be on the order of $10^6$). For tree-structured MRFs (i.e., graphs with no cycles), belief propagation (BP) [9] provides an efficient linear-complexity algorithm to compute exact estimates. However, tree-structured Gaussian processes possess limited modeling capabilities [10]. In order to model a richer class of statistical dependencies among variables, one often requires loopy graphical models. As estimation on graphs with cycles is substantially more complex, considerable effort has been and still is being put into developing methods that overcome this computational barrier, including a variety of methods that employ the idea of performing inference computations on tractable subgraphs [11], [12]. The recently proposed embedded trees (ET) iteration [10], [13] is one such approach that solves a sequence of inference problems on trees or, more generally, tractable subgraphs. If ET converges, it yields the correct conditional estimates, thus providing an effective inference algorithm for graphs with essentially arbitrary structure. For the case of stationary ET iterations—in which the same tree or tractable subgraph is used at each iteration—necessary and sufficient conditions for convergence are provided in [10], [13]. However, experimental results in [13] provide compelling evidence that much faster convergence can often be obtained by changing the embedded subgraph that is used from one iteration to the next. The work in [13] provided very limited analysis for such nonstationary iterations, thus leaving open the problem of providing easily computable, broadly applicable conditions that guarantee convergence.

In related work that builds on [10], Delouille et al. [14] describe a stationary block Gauss–Jacobi (GJ) iteration for solving the Gaussian estimation problem with the added constraint that messages between variables connected by an edge in the graph may occasionally be "dropped." The local blocks (subgraphs) are assumed to be small in size. Such a framework provides a simple model for estimation in distributed sensor networks where communication links between nodes may occasionally fail. The proposed solution involves the use of memory at each node to remember past messages from neighboring nodes. The values in this local memory are used if there is a breakdown in communication to prevent the iteration from diverging. However, the analysis in [14] is also restricted to the case of stationary iterations, in that the same partitioning of the graph into local subgraphs is used at every iteration. Finally, we note that ET iterations fall under the class of parallel update algorithms, in that every variable must be updated in an iteration before one can proceed to the next iteration.
However, serial schemes involving updates over subsets of variables also offer tractable methods for solving large linear systems [15], [16]. An important example in this class of algorithms is block Gauss–Seidel (GS), in which each iteration involves updating a small subset of variables.

In this paper, we analyze nonstationary iterations based on an arbitrary sequence of embedded trees or tractable subgraphs. We refer to these trees and subgraphs on which inference is performed at each iteration as preconditioners, following the terminology used in the linear algebra literature. We present a general class of algorithms that includes the nonstationary ET and block GS iterations, and provide a general and very easily tested condition that guarantees convergence for any of these algorithms. Our framework allows for hybrid nonstationary algorithms that combine aspects of both block GS and ET. We also consider the problem of failing links and describe a method that uses local memory at each node to address this problem in general nonstationary parallel and serial iterations.

Our analysis is based on a recently introduced framework for interpreting and analyzing inference in GMRFs based on sums over walks in graphs [17]. We describe walk-sum diagrams that provide an intuitive interpretation of the estimates computed by each of the algorithms after every iteration. A walk-sum diagram is a graph that corresponds to the walks "accumulated" after each iteration. As developed in [17], walk-summability is an easily tested condition which, as we will show, yields a simple necessary and sufficient condition for the convergence of the algorithms. As there are broad classes of models (including attractive, diagonally dominant, and so-called pairwise-normalizable models) that are walk-summable, our analysis shows that our algorithms provide a convergent, computationally attractive method for inference.

The walk-sum analysis and convergence results show that arbitrary nonstationary iterations of our algorithms based on a very large and flexible set of sequences of subgraphs or subsets of variables converge in walk-summable models. Consequently, we are free to use any sequence of trees in the ET algorithm or any valid sequence of subsets of variables (one that updates each variable infinitely often) in the block GS iteration, and still achieve convergence in walk-summable models. We exploit this flexibility by choosing trees or subsets of variables adaptively to minimize the error at iteration $n$ based on the residual error at iteration $n-1$. To make these choices optimally, we formulate combinatorial optimization problems that maximize certain reweighted walk-sums. We describe efficient methods to solve relaxed versions of these problems. For the case of choosing the "next best" tree, our method reduces to solving a maximum spanning tree problem. Simulation results indicate that our algorithms for choosing trees and subsets of variables adaptively provide a significant speedup in convergence over traditional approaches involving a single preconditioner or alternating between two preconditioners. Our walk-sum analysis also shows that local memory at each node can be used to achieve convergence for any of the above algorithms when communication failures occur in distributed sensor networks. Our protocol differs from the description in [14], and as opposed to that work, allows for nonstationary updates.
Also, our walk-sum diagrams provide a simple, intuitive representation for the propagation of information with each iteration.

One of the conditions for walk-summability in Section II-C shows that walk-summable models are equivalent to models for which the information matrix is an H-matrix [16], [18]. Several methods for finding good preconditioners for such matrices have been explored in the linear algebra literature, but these have been restricted to either cycling through a fixed set of preconditioners [19] or to so-called "multisplitting" algorithms [20], [21]. These results do not address the problem of convergence of nonstationary iterations using arbitrary (noncyclic) sequences of subgraphs. The analysis of such algorithms, along with the development of methods to pick a good sequence of preconditioners, are the main novel contributions of this paper, and the recently developed concept of walk-sums is critical to our analysis.

In Section II, we provide the necessary background about GMRFs and the walk-sum view of inference. Section III describes all the algorithms that we analyze in this paper, while Section IV contains the analysis and walk-sum diagrams that provide interpretations of the algorithms in terms of walk-sum computations. In Section V, we use the walk-sum interpretation of Section IV to show that these algorithms converge in walk-summable models. Section VI presents techniques for choosing tree-based preconditioners and subsets of variables adaptively for the ET and block GS iterations, respectively, and demonstrates the effectiveness of these methods through simulation. We conclude with a brief discussion in Section VII. The Appendix provides additional details and proofs.

II. GAUSSIAN GRAPHICAL MODELS AND WALK-SUMS

A. Gaussian Graphical Models and Estimation

A graph $\mathcal{G} = (V, \mathcal{E})$ consists of a set of vertices $V$ and associated edges $\mathcal{E} \subseteq \binom{V}{2}$, where $\binom{V}{2}$ is the set of all unordered pairs of vertices. A subset $S \subset V$ is said to separate subsets $A, B \subset V$ if every path in $\mathcal{G}$ between any vertex in $A$ and any vertex in $B$ passes through a vertex in $S$. A graphical model [1], [2] is a collection of random variables indexed by the vertices of a graph; each vertex $s \in V$ corresponds to a random variable $x_s$, and for any $A \subseteq V$, $x_A = \{x_s \mid s \in A\}$. A distribution $p(x_V)$ is Markov with respect to $\mathcal{G}$ if for any subsets $A, B \subset V$ that are separated by some $S \subset V$, the subset of variables $x_A$ is conditionally independent of $x_B$ given $x_S$, i.e., $p(x_A, x_B \mid x_S) = p(x_A \mid x_S)\, p(x_B \mid x_S)$.

We consider GMRFs $x \sim \mathcal{N}(\mu, \Sigma)$ parameterized by a mean vector $\mu$ and a positive-definite covariance matrix $\Sigma \succ 0$ [1], [22]. For simplicity, each $x_s$ is assumed to be a scalar variable. An alternate natural parameterization for GMRFs is specified in terms of the information matrix $J = \Sigma^{-1}$ (also called the precision or concentration matrix) and the potential vector $h = J\mu$, and is denoted by $x \sim \mathcal{N}^{-1}(h, J)$. In particular, if $p(x)$ is Markov with respect to graph $\mathcal{G}$, then the specialization of the Hammersley–Clifford theorem for Gaussian models [1], [22] directly relates the sparsity of $J$ to the sparsity of $\mathcal{G}$: $J_{st} \neq 0$ if and only if the edge $\{s, t\} \in \mathcal{E}$, for every pair of vertices $s \neq t$. The partial correlation coefficient $\rho_{st}$ is the correlation coefficient of variables $x_s$ and $x_t$ conditioned on knowledge of all the other variables $x_{V \setminus \{s,t\}}$ [1]:

$\rho_{st} \triangleq \dfrac{\operatorname{cov}(x_s, x_t \mid x_{V \setminus \{s,t\}})}{\sqrt{\operatorname{var}(x_s \mid x_{V \setminus \{s,t\}})\,\operatorname{var}(x_t \mid x_{V \setminus \{s,t\}})}} = -\dfrac{J_{st}}{\sqrt{J_{ss} J_{tt}}}. \quad (1)$

Hence, $J_{st} = 0$ implies that $x_s$ and $x_t$ are conditionally independent given all the other variables $x_{V \setminus \{s,t\}}$.
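To make the information-form parameterization and (1) concrete, here is a minimal numerical sketch. It is our illustration, not code from the paper; the 3-node model and all of its numbers are hypothetical.

```python
# Minimal sketch (ours): information form of a small GMRF and the
# partial correlations of (1). The 3-node model is hypothetical.
import numpy as np

# Symmetric positive-definite information matrix J for a fully
# connected 3-node model; J_st != 0 exactly on the edges of the graph.
J = np.array([[1.0, 0.2, 0.3],
              [0.2, 1.0, 0.1],
              [0.3, 0.1, 1.0]])
assert np.all(np.linalg.eigvalsh(J) > 0)   # model validity

# (1): rho_st = -J_st / sqrt(J_ss * J_tt); a zero entry means x_s and
# x_t are conditionally independent given all the other variables.
d = np.sqrt(np.diag(J))
rho = -J / np.outer(d, d)
np.fill_diagonal(rho, 0.0)
print(rho)
```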
Let $x \sim \mathcal{N}^{-1}(h_{\mathrm{prior}}, J_{\mathrm{prior}})$, and suppose that we are given noisy observations $y = Cx + v$ of $x$, with $v \sim \mathcal{N}(0, R)$. The goal of the Gaussian estimation problem is to compute an estimate $\hat{x}$ that minimizes the expected squared error between $\hat{x}$ and $x$. The solution to this problem is the mean of the posterior distribution $p(x \mid y) = \mathcal{N}^{-1}(h, J)$, with $h = h_{\mathrm{prior}} + C^T R^{-1} y$ and $J = J_{\mathrm{prior}} + C^T R^{-1} C$ [23]. Thus, the posterior mean can be computed as the solution to the following linear system:

$J \hat{x} = h. \quad (2)$

We note that $J$ is a symmetric positive-definite matrix. If $C$ and $R$ are diagonal (corresponding to local measurements), $J$ has the same sparsity structure as that of $J_{\mathrm{prior}}$. The conditions for all our convergence results and analysis in this paper are specified in terms of the posterior graphical model parameterized by $J$ and $h$. As described in the Introduction, solving the linear system (2) is computationally expensive by direct matrix inversion even for moderate-size problems. In this paper, we discuss tractable methods to solve this linear system.

B. Walk-Summable Gaussian Graphical Models

We assume that the information matrix $J$ of a Gaussian model defined on $\mathcal{G} = (V, \mathcal{E})$ has been normalized to have unit diagonal entries. For example, if $D$ is a diagonal matrix containing the diagonal entries of $J$, then the matrix $D^{-1/2} J D^{-1/2}$ contains rescaled entries of $J$ at off-diagonal locations and 1's along the diagonal. Such a rescaling does not affect the convergence results of the algorithms in this paper.^1 However, rescaled matrices are useful in order to provide simple characterizations of walk-sums. Let $R = I - J$. The off-diagonal elements of $R$ are precisely the partial correlation coefficients from (1), and have the same sparsity structure as that of $J$ (and consequently the same structure as $\mathcal{E}$). Let these off-diagonal entries be the edge weights in $\mathcal{G}$, i.e., $R_{st}$ is the weight of the edge $\{s, t\}$. A walk in $\mathcal{G}$ is defined to be a sequence of vertices $w = (w_0, w_1, \ldots, w_\ell)$ such that $\{w_i, w_{i+1}\} \in \mathcal{E}$ for each $i = 0, \ldots, \ell - 1$. Thus, there is no restriction on a walk crossing the same node or traversing the same edge multiple times. The weight of the walk $w$ is defined:

$\phi(w) \triangleq \prod_{i=0}^{\ell-1} R_{w_i, w_{i+1}}.$

^1 Although our analysis of the algorithms in Section III is specified for normalized models, these algorithms and our analysis can be easily extended to the un-normalized case. See Appendix A.

Note that the partial-correlation matrix $R$ is essentially a matrix of edge weights. Interpreted differently, one can also view each element of $R$ as the weight of the length-1 walk between two vertices. In general, $(R^\ell)_{st}$ is then the walk-sum over the (finite) set of all length-$\ell$ walks from $s$ to $t$ [17], where the walk-sum over a finite set is the sum of the weights of the walks in the set. Based on this point of view, we can interpret estimation in Gaussian models from (2) in terms of walk-sums:

$\Sigma = J^{-1} = (I - R)^{-1} = \sum_{\ell=0}^{\infty} R^\ell. \quad (3)$

Thus, the covariance between variables $x_s$ and $x_t$ is the length-ordered sum over all walks from $s$ to $t$. This, however, is a very specific instance of an inference algorithm that converges if the spectral radius condition $\varrho(R) < 1$ is satisfied (so that the matrix geometric series converges). Other inference algorithms, however, may compute walks in different orders. In order to analyze the convergence of general inference algorithms that submit to a walk-sum interpretation, a stronger condition was developed in [17] as follows. Given a countable set of walks $\mathcal{W}$, the walk-sum over $\mathcal{W}$ is the unordered sum of the individual weights of the walks contained in $\mathcal{W}$:

$\phi(\mathcal{W}) \triangleq \sum_{w \in \mathcal{W}} \phi(w).$

In order for this sum to be well-defined, we consider the following class of Gaussian graphical models.

Definition 1: A Gaussian graphical model defined on $\mathcal{G} = (V, \mathcal{E})$ is said to be walk-summable if the absolute walk-sums over the set of all walks between every pair of vertices in $\mathcal{G}$ are well-defined. That is, for every pair $s, t \in V$,

$\bar{\phi}(s \to t) \triangleq \sum_{w \in \mathcal{W}(s \to t)} |\phi(w)| < \infty.$

Here, $\bar{\phi}$ denotes absolute walk-sums over a set of walks, and $\mathcal{W}(s \to t)$ corresponds to the set of all walks^2 beginning at vertex $s$ and ending at vertex $t$ in $\mathcal{G}$.

^2 We denote walk-sets by $\mathcal{W}(\cdot)$ but generally drop this notation when referring to the walk-sum over $\mathcal{W}(\cdot)$, i.e., the walk-sum of the set $\mathcal{W}(s \to t)$ is denoted by $\phi(s \to t)$.
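The series (3) can be checked numerically. The following sketch is our illustration (a hypothetical 3-node chain model, not an example from the paper); each matrix power extends every walk by one hop, so the partial sums are exactly the length-ordered walk-sums.

```python
# Sketch (ours): partial sums of R^l approach Sigma = (I - R)^{-1}
# when rho(R) < 1, per (3). J is a hypothetical normalized model.
import numpy as np

J = np.array([[1.0, -0.3, 0.0],
              [-0.3, 1.0, -0.3],
              [0.0, -0.3, 1.0]])
R = np.eye(3) - J                      # edge-weight (partial correlation) matrix
assert np.max(np.abs(np.linalg.eigvals(R))) < 1

Sigma = np.linalg.inv(J)
S, P = np.zeros_like(R), np.eye(3)     # S accumulates sum of R^l, P holds R^l
for _ in range(200):
    S += P                             # add the walk-sums of the current length
    P = P @ R                          # extend all walks by one hop
print(np.max(np.abs(S - Sigma)))       # -> essentially 0
```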
Section II-C lists some easily tested equivalent and sufficient conditions for walk-summability. Based on the absolute convergence condition, walk-summability implies that walk-sums over a countable set of walks can be computed in any order and that the unordered walk-sum $\phi(\mathcal{W})$ is well-defined [24], [25]. Therefore, in walk-summable models, the covariances and means can be interpreted as follows:

$\Sigma_{st} = \phi(s \to t) \quad (4)$

$\mu_t = \sum_{s \in V} h_s\, \phi(s \to t) \quad (5)$

where (3) is used in (4), and (4) in (5). In words, the covariance between variables $x_s$ and $x_t$ is the walk-sum over the set of all walks from $s$ to $t$, and the mean of variable $x_t$ is the walk-sum over all walks ending at $t$, with each walk being reweighted by the potential value at the starting node.

The goal in walk-sum analysis is to interpret an inference algorithm as the computation of walk-sums in $\mathcal{G}$. If the analysis shows that the walks being computed by an inference algorithm are the same as those required for the computation of the means and covariances above, then the correctness of the algorithm can be concluded directly for walk-summable models. This conclusion can be reached regardless of the order in which the algorithm computes the walks, due to the fact that walk-sums can be computed in any order in walk-summable models. Thus, the walk-sum formalism allows for very strong yet intuitive statements about the convergence of inference algorithms that submit to a walk-sum interpretation. Indeed, the overall template for analyzing our inference algorithms is simple. First, we show that the algorithms submit to a walk-sum interpretation. Next, we show that the walk-sets computed by these algorithms are nested, i.e., $\mathcal{W}^{(n)} \subseteq \mathcal{W}^{(n+1)}$, where $\mathcal{W}^{(n)}$ is the set of walks computed at iteration $n$. Finally, we show that every walk required for the computation of the mean (5) is contained in $\mathcal{W}^{(n)}$ for some $n$. A key ingredient in our analysis is that in computing all the walks in (5), the algorithms must not overcount any walks. Although each step in this procedure is nontrivial, combined together they allow us to conclude that the algorithms converge in walk-summable models.

C. Properties of Walk-Summable Models

Very importantly, there are easily testable necessary and sufficient conditions for walk-summability. Let $\bar{R}$ denote the matrix of the absolute values of the elements of $R$. Then, walk-summability is equivalent to either of the following [17]:
• $\varrho(\bar{R}) < 1$;
• $I - \bar{R} \succ 0$.

From the second condition, one can draw a connection to H-matrices in the linear algebra literature [16], [18]. Specifically, walk-summable information matrices are symmetric, positive-definite H-matrices. Walk-summability of a model is sufficient but not necessary for the validity of the model (positive-definite information/covariance). Many classes of models are walk-summable [17], as follows:
1) diagonally dominant models, i.e., $\sum_{t \neq s} |J_{st}| < J_{ss}$ for each $s \in V$;
2) valid non-frustrated models, i.e., every cycle has an even number of negative edge weights and $J \succ 0$; special cases include valid attractive models ($R_{st} \geq 0$ for all $s, t$) and tree-structured models;
3) pairwise normalizable models, i.e., there exists a diagonal matrix $D \succ 0$ and a collection of matrices $\{J_e \succeq 0\}_{e \in \mathcal{E}}$, with $[J_e]_{st} = 0$ if $\{s, t\} \neq e$, such that $J = D + \sum_{e \in \mathcal{E}} J_e$.

An example of a commonly encountered walk-summable model in statistical image processing is the thin-membrane prior [26].
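The two equivalent tests above, and the diagonal-dominance sufficient condition 1), are straightforward to check numerically. A sketch follows; the helper names are ours and dense numpy matrices are assumed.

```python
# Sketch (ours) of the walk-summability tests of Section II-C.
import numpy as np

def is_walk_summable(J):
    # Normalize to unit diagonal, form R = I - J, and test rho(|R|) < 1
    # (equivalently, I - |R| positive definite).
    d = np.sqrt(np.diag(J))
    R = np.eye(J.shape[0]) - J / np.outer(d, d)
    return np.max(np.abs(np.linalg.eigvals(np.abs(R)))) < 1

def is_diagonally_dominant(J):
    # Condition 1): off-diagonal magnitudes sum below the diagonal entry.
    off = np.sum(np.abs(J), axis=1) - np.abs(np.diag(J))
    return np.all(off < np.diag(J))

J = np.array([[1.0, -0.4], [-0.4, 1.0]])
print(is_diagonally_dominant(J), is_walk_summable(J))   # True True
```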
Further, linear systems involving sparse diagonally dominant matrices are also a common feature in finite element approximations of elliptic partial differential equations [27].

We now describe some operations that can be performed on walk-sets, and the corresponding walk-sum formulas. These relations are valid in walk-summable models [17].

• Let $\{\mathcal{W}_i\}_{i=1}^{\infty}$ be a countable collection of mutually disjoint walk-sets. From the sum-partition theorem for absolutely summable series [25], we have that $\phi(\cup_i \mathcal{W}_i) = \sum_i \phi(\mathcal{W}_i)$. This implies that for a countable collection of walk-sets $\{\mathcal{U}_i\}_{i=1}^{\infty}$ with $\mathcal{U}_i \subseteq \mathcal{U}_{i+1}$, we have that $\phi(\cup_i \mathcal{U}_i) = \lim_{i \to \infty} \phi(\mathcal{U}_i)$. This is easily seen by defining mutually disjoint walk-sets $\mathcal{W}_i = \mathcal{U}_i \setminus \mathcal{U}_{i-1}$ with $\mathcal{U}_0 = \emptyset$.

• Let $u$ and $v$ be walks such that $u$ ends at the vertex at which $v$ begins. The concatenation of the walks is defined to be $u \cdot v$, the walk that traverses $u$ and then continues along $v$. Now consider a walk-set $\mathcal{U}$ with all walks ending at vertex $s$ and a walk-set $\mathcal{V}$ with all walks starting at $s$. The concatenation of the walk-sets is defined as $\mathcal{U} \cdot \mathcal{V} \triangleq \{u \cdot v \mid u \in \mathcal{U}, v \in \mathcal{V}\}$. If every walk $w \in \mathcal{U} \cdot \mathcal{V}$ can be decomposed uniquely into $u \in \mathcal{U}$ and $v \in \mathcal{V}$ so that $w = u \cdot v$, then $\mathcal{U} \cdot \mathcal{V}$ is said to be uniquely decomposable into the sets $\mathcal{U}$ and $\mathcal{V}$. For such uniquely decomposable walk-sets, $\phi(\mathcal{U} \cdot \mathcal{V}) = \phi(\mathcal{U})\, \phi(\mathcal{V})$.

Finally, the following notational convention is employed in the rest of this paper. We use wild-card symbols ($*$ and $\bullet$) to denote a union over all vertices in $V$. For example, given a collection of walk-sets $\{\mathcal{W}(s \to t)\}_{s \in V}$, we interpret $\mathcal{W}(* \to t)$ as $\cup_{s \in V} \mathcal{W}(s \to t)$. Further, the walk-sum over the set $\mathcal{W}(* \to t)$ is defined $\phi(* \to t) \triangleq \sum_{s \in V} \phi(s \to t)$. In addition to edges being assigned weights, vertices can also be assigned weights (for example, the potential vector $h$). A reweighted walk-sum of a walk $w = (w_0, \ldots, w_\ell)$ with vertex weight vector $h$ is then defined to be $\phi(h; w) \triangleq h_{w_0}\, \phi(w)$. Based on this notation, the mean of variable $x_t$ from (5) can be rewritten as

$\mu_t = \phi(h; * \to t). \quad (6)$

III. NONSTATIONARY EMBEDDED SUBGRAPH ALGORITHMS

In this section, we describe a framework for the computation of the conditional mean estimates in order to solve the Gaussian estimation problem of Section II-A. We present three algorithms that become successively more complex in nature. We begin with the parallel ET algorithm originally presented in [10], [13]. Next, we describe a serial update scheme that involves processing only a subset of the variables at each iteration. Finally, we discuss a generalization of these nonstationary algorithms that is tolerant to temporary communication failure by using local memory at each node to remember past messages from neighboring nodes. A similar memory-based approach was used in [14] for the special case of stationary iterations. The key theme underlying all these algorithms is that they are based on solving a sequence of inference problems on tractable subgraphs involving all or a subset of the variables. Convergent iterations that compute means can also be used to compute exact error variances [10]. Hence, we restrict ourselves to analyzing iterations that compute the conditional mean.

A. Nonstationary Parallel Updates: Embedded Trees

Let $\mathcal{S}$ be some subgraph of the graph $\mathcal{G}$. The stationary ET algorithm is derived by splitting the matrix $J = J_{\mathcal{S}} - K_{\mathcal{S}}$, where $J_{\mathcal{S}}$ is known as the preconditioner and $K_{\mathcal{S}}$ is known as the cutting matrix. Each edge in $\mathcal{G}$ is either an element of $\mathcal{S}$ or of $\mathcal{E} \setminus \mathcal{S}$. Accordingly, every nonzero off-diagonal entry of $J$ is either an element of $J_{\mathcal{S}}$ or of $-K_{\mathcal{S}}$. The diagonal entries of $J$ are part of $J_{\mathcal{S}}$. Hence, the matrix $K_{\mathcal{S}}$ is symmetric, zero along the diagonal, and contains nonzero entries only in those locations that correspond to edges not included in the subgraph generated by the splitting. Cutting matrices may have nonzero diagonal entries in general, but we only consider zero-diagonal cutting matrices in this paper.
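A sketch of the splitting just described — our own helper, with a hypothetical 3-cycle and spanning tree — keeps the diagonal and the subgraph edges in $J_\mathcal{S}$ and pushes the cut edges into the zero-diagonal cutting matrix $K_\mathcal{S}$:

```python
# Sketch (ours) of the splitting J = J_S - K_S for an embedded subgraph S.
import numpy as np

def split(J, kept_edges):
    """Return (J_S, K_S): J_S keeps the diagonal and the edges of S;
    K_S is symmetric, zero-diagonal, and holds the cut edges."""
    J_S = np.diag(np.diag(J)).astype(float)
    for (s, t) in kept_edges:
        J_S[s, t] = J_S[t, s] = J[s, t]
    K_S = J_S - J            # so that J = J_S - K_S, with zero diagonal
    return J_S, K_S

# 3-cycle example: keep the spanning tree {0-1, 1-2}, cut the edge {0, 2}.
J = np.array([[1.0, -0.3, -0.3],
              [-0.3, 1.0, -0.3],
              [-0.3, -0.3, 1.0]])
J_S, K_S = split(J, [(0, 1), (1, 2)])
assert np.allclose(J, J_S - K_S) and np.allclose(np.diag(K_S), 0)
```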
The splitting of $J$ according to $\mathcal{S}$ transforms (2) to $J_{\mathcal{S}} \hat{x} = K_{\mathcal{S}} \hat{x} + h$, which suggests an iterative method to solve (2) by recursively solving $J_{\mathcal{S}} \hat{x}^{(n)} = K_{\mathcal{S}} \hat{x}^{(n-1)} + h$, yielding a sequence of estimates

$\hat{x}^{(n)} = J_{\mathcal{S}}^{-1} \big( K_{\mathcal{S}}\, \hat{x}^{(n-1)} + h \big). \quad (7)$

If $J_{\mathcal{S}}^{-1}$ exists, then a necessary and sufficient condition for the iterates $\{\hat{x}^{(n)}\}$ to converge to $J^{-1} h$ for any initial guess $\hat{x}^{(0)}$ is that $\varrho(J_{\mathcal{S}}^{-1} K_{\mathcal{S}}) < 1$ [10]. ET iterations can be very effective if applying $J_{\mathcal{S}}^{-1}$ to a vector is efficient, e.g., if $\mathcal{S}$ corresponds to a tree or, in general, any tractable subgraph. A nonstationary ET iteration is obtained by letting $J = J_{\mathcal{S}_n} - K_{\mathcal{S}_n}$, where the matrices correspond to some embedded tree or subgraph $\mathcal{S}_n$ in $\mathcal{G}$ and can vary in an arbitrary manner with $n$. This leads to the following ET iteration:

$\hat{x}^{(n)} = J_{\mathcal{S}_n}^{-1} \big( K_{\mathcal{S}_n}\, \hat{x}^{(n-1)} + h \big). \quad (8)$

Our walk-sum analysis proves the convergence of nonstationary ET iterations based on any sequence of subgraphs in walk-summable models. Every step of the above algorithm is tractable if applying $J_{\mathcal{S}_n}^{-1}$ to a vector can be performed efficiently. Indeed, an important degree of freedom in the above algorithm is the choice of $\mathcal{S}_n$ at each stage so as to speed up convergence, while keeping the computation at every iteration tractable. We discuss some approaches to addressing this issue in Section VI.

B. Nonstationary Serial Updates of Subsets of Variables

We begin by describing the block GS iteration [15], [16]. For each $n$, let $V_n \subseteq V$ be some subset of $V$. The variables $x_{V_n}$ are updated at iteration $n$. The remaining variables do not change from iteration $n-1$ to $n$. Let $J_{V_n}$ be the $|V_n| \times |V_n|$-dimensional principal submatrix of $J$ corresponding to the variables $V_n$. The block GS update at iteration $n$ is as follows (see the sketch at the end of this subsection):

$\hat{x}^{(n)}_{V_n} = (J_{V_n})^{-1} \big( h_{V_n} - J_{V_n, V_n^c}\, \hat{x}^{(n-1)}_{V_n^c} \big) \quad (9)$

$\hat{x}^{(n)}_{V_n^c} = \hat{x}^{(n-1)}_{V_n^c} \quad (10)$

Here, $V_n^c$ refers to the complement of the vertex set $V_n$. In (9), $J_{V_n, V_n^c}$ refers to the submatrix of edge weights of edges from the vertices in $V_n$ to those in $V_n^c$. Every step of the above algorithm is tractable as long as applying $(J_{V_n})^{-1}$ to a vector can be performed efficiently.

We now present a general serial iteration that incorporates an element of the ET algorithm of Section III-A. This update scheme involves a single ET iteration within the induced subgraph of the update variables $V_n$. We split the edges in the induced subgraph of $V_n$ into a tractable set $\mathcal{S}_n$ and a set of cut edges $\mathcal{K}_n$. Such a splitting leads to a tractable subgraph $\mathcal{S}_n$ of the induced subgraph of $V_n$. That is, the matrix $J_{V_n}$ is split as $J_{V_n} = J_{\mathcal{S}_n} - K_{\mathcal{S}_n}$. This matrix splitting is defined analogously to the splitting in Section III-A. The modified conditional mean update at iteration $n$ is as follows:

$\hat{x}^{(n)}_{V_n} = J_{\mathcal{S}_n}^{-1} \big( h_{V_n} + K_{\mathcal{S}_n}\, \hat{x}^{(n-1)}_{V_n} - J_{V_n, V_n^c}\, \hat{x}^{(n-1)}_{V_n^c} \big) \quad (11)$

$\hat{x}^{(n)}_{V_n^c} = \hat{x}^{(n-1)}_{V_n^c} \quad (12)$

Every step of this algorithm is tractable as long as applying $J_{\mathcal{S}_n}^{-1}$ to a vector can be performed efficiently. The preceding algorithm is a generalization of both the block GS update (9)–(10) and the nonstationary ET algorithm (8), thus allowing for a unified analysis framework. Specifically, by letting $\mathcal{S}_n$ contain all the edges of the induced subgraph of $V_n$ for all $n$ above, we obtain the block GS algorithm. On the other hand, by letting $V_n = V$ for all $n$, we recover the ET algorithm. This hybrid approach also offers a tractable and flexible method for inference in large-scale estimation problems, because it possesses all the benefits of the ET and block GS iterations.
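As an illustration of the block GS update (9)–(10), here is a sketch under our own conventions, with dense solves standing in for tractable subgraph inference and a hypothetical 3-cycle model. Setting $\mathcal{S}_n$ to the full induced subgraph (as here) or $V_n = V$ recovers the two special cases discussed above.

```python
# Sketch (ours) of one block GS step (9)-(10); V_n is a list of indices.
import numpy as np

def block_gs_step(J, h, x, V_n):
    x = x.copy()
    Vc = np.setdiff1d(np.arange(len(h)), V_n)          # complement of V_n
    rhs = h[V_n] - J[np.ix_(V_n, Vc)] @ x[Vc]          # (9): condition on x_{V_n^c}
    x[V_n] = np.linalg.solve(J[np.ix_(V_n, V_n)], rhs)
    return x                                           # (10): rest unchanged

J = np.array([[1.0, -0.3, -0.3],
              [-0.3, 1.0, -0.3],
              [-0.3, -0.3, 1.0]])
h = np.ones(3)
x = np.zeros(3)
for n in range(100):                                   # cycle over two blocks
    x = block_gs_step(J, h, x, [0, 1] if n % 2 == 0 else [1, 2])
print(np.max(np.abs(x - np.linalg.solve(J, h))))       # -> essentially 0
```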
We note that there is one potential complication with both the serial and the parallel iterations presented so far. Specifically, for an arbitrary graphical model with positive-definite information matrix $J$, the corresponding information submatrix $J_{\mathcal{S}_n}$ for some choices of subgraphs $\mathcal{S}_n$ may be invalid or even singular, i.e., may have negative or zero eigenvalues.^3 Importantly, this problem never arises for walk-summable models, and thus we are free to use any sequence of embedded subgraphs for our iterations and be guaranteed that the computations make sense probabilistically.

Lemma 1: Let $J$ be a walk-summable model, let $V_n \subseteq V$, and let $J_{\mathcal{S}_n}$ be the $|V_n| \times |V_n|$-dimensional information matrix corresponding to the distribution over some subgraph $\mathcal{S}_n$ of the induced subgraph of $V_n$. Then, $J_{\mathcal{S}_n}$ is walk-summable, and $J_{\mathcal{S}_n} \succ 0$.

Proof: For every pair of vertices $s, t \in V_n$, it is clear that the walks between $s$ and $t$ in $\mathcal{S}_n$ are a subset of the walks between these vertices in $\mathcal{G}$, i.e., $\mathcal{W}_{\mathcal{S}_n}(s \to t) \subseteq \mathcal{W}_{\mathcal{G}}(s \to t)$. Hence, $\bar{\phi}_{\mathcal{S}_n}(s \to t) \leq \bar{\phi}_{\mathcal{G}}(s \to t) < \infty$, because $J$ is walk-summable. Thus, the model specified by $J_{\mathcal{S}_n}$ is walk-summable. This allows us to conclude that $J_{\mathcal{S}_n} \succ 0$, because walk-summability implies validity of a model.

^3 For example, consider a five-cycle with each edge having a partial correlation coefficient of $-0.6$. This model is valid (but not walk-summable), with the corresponding $J$ having a minimum eigenvalue of 0.0292. A spanning-tree model $J_{\mathcal{S}}$ obtained by removing one of the edges in the cycle, however, is invalid, with a minimum eigenvalue of $-0.0392$.

C. Distributed Interpretation of (11)–(12) and Communication Failure

We first reinterpret (11)–(12) as local message-passing steps between nodes, followed by inference within the subgraph $\mathcal{S}_n$. At iteration $n$, let $\mathcal{E}_n$ denote the set of directed edges in $\mathcal{K}_n$ and from $V_n^c$ to $V_n$:

$\mathcal{E}_n \triangleq \big\{(u, s) : \{u, s\} \in \mathcal{K}_n,\ \text{or}\ u \in V_n^c,\ s \in V_n,\ \{u, s\} \in \mathcal{E}\big\}. \quad (13)$

The edge set $\mathcal{E}_n$ corresponds to the nonzero elements of the matrices $K_{\mathcal{S}_n}$ and $J_{V_n, V_n^c}$ in (11). Edges in $\mathcal{E}_n$ are used to communicate information about the values at iteration $n-1$ to neighboring nodes for processing at iteration $n$. For each $(u, s) \in \mathcal{E}_n$, the message $M_{u \to s} = R_{su}\, \hat{x}^{(n-1)}_u$ is sent at iteration $n$ from $u$ to $s$ using the links in $\mathcal{E}_n$. Let $\tilde{h}^{(n)}_s$ denote the summary of all the messages received at node $s$ at iteration $n$:

$\tilde{h}^{(n)}_s = h_s + \sum_{u : (u, s) \in \mathcal{E}_n} M_{u \to s}. \quad (14)$

Thus, each $s \in V_n$ fuses all the information received about the previous iteration and combines this with its local potential value to form a modified potential vector that is then used for inference within the subgraph $\mathcal{S}_n$:

$\hat{x}^{(n)}_{V_n} = J_{\mathcal{S}_n}^{-1}\, \tilde{h}^{(n)}_{V_n} \quad (15)$

where $\tilde{h}^{(n)}_{V_n}$ denotes the entire vector of fused messages for $s \in V_n$. An interesting aspect of these message-passing operations is that they are local, and only nodes that are neighbors in $\mathcal{G}$ may participate in any communication. If the subgraph $\mathcal{S}_n$ is tree-structured, the inference step (15) can also be performed efficiently in a distributed manner using only local BP messages [9].

We now present an algorithm that is tolerant to temporary link failure by using local memory at each node to store the most recent message received at $s$ from $u$. If the link $(u, s)$ fails at some future iteration, the stored message can be used in place of the new expected message. In order for the overall memory-based protocol to be consistent, we also introduce an additional post-inference message-passing step at each iteration. To make the above points precise, we specify a memory protocol that the network must follow; we assume that each node in the network has sufficient memory to store the most recent messages received from its neighbors. First, $\mathcal{S}_n$ must not contain any failed links; every link that fails at iteration $n$ must be a part of the cut-set^4: the links that are used for the inference step (15) must be active at iteration $n$. Second, in order for nodes to synchronize after each iteration, they must perform a post-inference message-passing step. After the inference step (15) at iteration $n$, the variables in $V_n$ must update their neighbors in the subgraph $\mathcal{S}_n$. That is, for each $s \in V_n$, a message must be received post-inference from every $u \in V_n$ such that $\{u, s\} \in \mathcal{S}_n$:

$M_{u \to s} = R_{su}\, \hat{x}^{(n)}_u \quad (16)$

This operation is possible since the edge $\{u, s\}$ is assumed to be active. Apart from these two rules, all other aspects of the algorithm presented previously remain the same.

^4 One way to ensure this is to select $\mathcal{S}_n$ to explicitly avoid the failed links. See Section VI-B for more details.
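The following toy sketch (ours, heavily simplified; the function and variable names are hypothetical) shows the memory-based fallback: a received message overwrites the stored one only when the link is active, and the fusion step (14) always reads whatever is currently in memory.

```python
# Sketch (ours) of the per-node memory protocol of Section III-C.
# 'memory' maps a directed edge (u, s) to the last stored message.

def receive(memory, u, s, R_su, x_u, link_up):
    if link_up:                        # overwrite only if the link is active
        memory[(u, s)] = R_su * x_u    # message M_{u->s} = R_su * current value

def fused_potential(s, h, in_edges, memory):
    # (14): local potential plus the most recent stored message per in-edge.
    return h[s] + sum(memory[e] for e in in_edges)

memory = {(1, 0): 0.0}
receive(memory, 1, 0, 0.3, 2.0, link_up=True)    # stores 0.6
receive(memory, 1, 0, 0.3, 5.0, link_up=False)   # failure: memory keeps 0.6
print(fused_potential(0, {0: 1.0}, [(1, 0)], memory))   # 1.6
```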
Note that every new message received overwrites the existing stored message, and only the most recent message received is stored in memory. Thus, link failure affects only (14) in our iterative procedure. Suppose that a message to be received at $s$ from node $u$ is unavailable due to communication failure. The message $M_{u \to s}$ from memory can be used instead in the fusion formula (14). Let $\tau_n(u, s)$ denote the iteration count of the most recent information at node $s$ about $u$ at the information fusion step (14) at iteration $n$. In general, $\tau_n(u, s) \leq n - 1$, with equality if $(u, s) \in \mathcal{E}_n$ and is active. With this notation, we can rewrite (14):

$\tilde{h}^{(n)}_s = h_s + \sum_{u : (u, s) \in \mathcal{E}_n} R_{su}\, \hat{x}^{(\tau_n(u, s))}_u. \quad (17)$

IV. WALK-SUM INTERPRETATION AND WALK-SUM DIAGRAMS

In this section, we analyze each iteration of the algorithms of Section III as the computation of walk-sums in $\mathcal{G}$. Our analysis is presented for the most general algorithm involving failing links, since the parallel and serial nonstationary updates without failing links are special cases. For each of these algorithms, we then present walk-sum diagrams that provide intuitive, graphical interpretations of the walks being computed. Examples that we discuss include classic methods such as GJ and GS, and iterations involving general subgraphs. Throughout this section, we assume that the initial guess $\hat{x}^{(0)} = 0$, and we initialize $M_{u \to s} = 0$ for each directed edge $(u, s)$. In Section V, we prove the convergence of our algorithms for any initial guess $\hat{x}^{(0)}$.

A. Walk-Sum Interpretation

For every pair of vertices $s, t \in V$, we define a recursive sequence of walk-sets. We then show that these walk-sets are exactly the walks being computed by the iterative procedures in Section III. For $t \in V_n$,

$\mathcal{W}^{(n)}(s \to t) = \Big( \bigcup_{(u, v) \in \mathcal{E}_n} \mathcal{W}^{(\tau_n(u, v))}(s \to u) \cdot (u \to v) \cdot \mathcal{W}_{\mathcal{S}_n}(v \to t) \Big) \cup \mathcal{W}_{\mathcal{S}_n}(s \to t) \quad (18)$

and for $t \notin V_n$,

$\mathcal{W}^{(n)}(s \to t) = \mathcal{W}^{(n-1)}(s \to t) \quad (19)$

with

$\mathcal{W}^{(0)}(s \to t) = \emptyset. \quad (20)$

The notation in these equations is defined in Section II-C. $\mathcal{W}^{(\tau_n(u, v))}(s \to u)$ denotes the walks starting at node $s$ computed up to iteration $\tau_n(u, v)$. $(u \to v)$ corresponds to a length-1 walk (called a hop) across a directed edge in $\mathcal{E}_n$. Finally, $\mathcal{W}_{\mathcal{S}_n}(v \to t)$ denotes walks within $\mathcal{S}_n$ that end at $t$. Thus, the first RHS term in (18) is the set of previously computed walks starting at $s$ that hop across an edge in $\mathcal{E}_n$, and then propagate within $\mathcal{S}_n$ (ending at $t$). $\mathcal{W}_{\mathcal{S}_n}(s \to t)$ is the set of walks from $s$ to $t$ that live entirely within $\mathcal{S}_n$. To simplify notation, we define $\phi^{(n)}(s \to t) \triangleq \phi(\mathcal{W}^{(n)}(s \to t))$.

We now relate the walk-sets to the estimate at iteration $n$.

Proposition 1: At iteration $n$, with $\hat{x}^{(0)} = 0$, the estimate for node $t$ in walk-summable models is given by

$\hat{x}^{(n)}_t = \phi^{(n)}(h; * \to t) \quad (21)$

where the walk-sum is over the walk-sets defined by (18)–(20), and $\hat{x}^{(n)}$ is computed using (15) and (17).

This proposition, proven in Appendix B, states that each of our algorithms has a precise walk-sum interpretation. A consequence of this statement is that no walk is over-counted, i.e., each walk in $\mathcal{W}^{(n)}(s \to t)$ submits to a unique decomposition with respect to the construction process (18)–(20) (see proof for details), and appears exactly once in the sum at each iteration. As discussed in Section V (Propositions 3 and 4), the iterative process does even more; the walk-sets at successive iterations are nested and, under an appropriate condition, are "complete" so that convergence is guaranteed for walk-summable models. Showing and understanding all these properties are greatly facilitated by the introduction of a visual representation of how each of our algorithms computes walks, and that is the subject of Section IV-B.

[Fig. 1. (Left) Gauss–Jacobi walk-sum diagrams $\mathcal{G}^{(n)}$ for $n = 1, 2, 3$. (Right) Gauss–Seidel walk-sum diagrams $\mathcal{G}^{(n)}$ for $n = 1, 2, 3, 4$.]
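Before turning to walk-sum diagrams, here is a numerical sketch (ours; a hypothetical 3-cycle with two alternating spanning trees) of the nonstationary ET iteration (8), whose iterates — per Proposition 1 — accumulate exactly the walk-sets above and converge to the exact mean in this walk-summable model:

```python
# Sketch (ours): nonstationary ET (8) with two alternating spanning trees.
import numpy as np

J = np.array([[1.0, -0.3, -0.3],
              [-0.3, 1.0, -0.3],
              [-0.3, -0.3, 1.0]])
h = np.ones(3)

def split(J, kept_edges):
    J_S = np.diag(np.diag(J)).astype(float)
    for (s, t) in kept_edges:
        J_S[s, t] = J_S[t, s] = J[s, t]
    return J_S, J_S - J                       # (J_S, K_S) with J = J_S - K_S

trees = [[(0, 1), (1, 2)], [(0, 1), (0, 2)]]  # two embedded spanning trees
x = np.zeros(3)
for n in range(60):
    J_S, K_S = split(J, trees[n % 2])
    x = np.linalg.solve(J_S, K_S @ x + h)     # one ET step (8)
print(np.max(np.abs(x - np.linalg.solve(J, h))))   # -> essentially 0
```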
B. Walk-Sum Diagrams

In the rest of this section, we present a graphical interpretation of our algorithms, and of the walk-sets (18)–(20) that are central to Proposition 1 (which in turn is the key to our convergence analysis in Section V). This interpretation provides a clearer picture of memory usage and information flow at each iteration. Specifically, for each algorithm we construct a sequence of graphs $\mathcal{G}^{(n)}$ such that a particular set of walks in these graphs corresponds exactly to the sets (18)–(20) computed by the sequence of iterates $\hat{x}^{(n)}$. The graphs $\mathcal{G}^{(n)}$ are called walk-sum diagrams. Recall that $\mathcal{S}_n$ corresponds to the subgraph used at iteration $n$, generally using some of the values computed from a preceding iteration. The graph $\mathcal{G}^{(n)}$ captures all of these preceding computations leading up to and including the computations at iteration $n$.

As a result, $\mathcal{G}^{(n)}$ has very specific structure for each algorithm. It consists of a number of levels—within each level we capture the subgraph used at the corresponding iteration, and the final level corresponds to the results at the end of iteration $n$. Although some variables may not be updated at each iteration, the values of those variables are preserved for use in subsequent iterations; thus, each level of $\mathcal{G}^{(n)}$ includes all the nodes in $V$. The update variables at any iteration (i.e., the nodes in $V_n$) are represented as solid circles, and the nonupdate ones as open circles. All edges in each $\mathcal{S}_n$—edges of $\mathcal{G}$ included in this subgraph—are included in that level of the diagram. As in $\mathcal{G}$, these are undirected edges, as our algorithms perform inference on this subgraph. However, this inference update uses some values from preceding iterations (15), (17); hence, we use directed edges (corresponding to $\mathcal{E}_n$) from nodes at preceding levels. The directed nature of these edges is critical as they capture the one-directional flow of computations from iteration to iteration, while the undirected edges within each level capture the inference computation (15) at each iteration. At the end of iteration $n$, only the values at level $n$ are of interest. Therefore, the set of walks (reweighted by $h$) in $\mathcal{G}^{(n)}$ that begin at any solid node at any level, and end at any node at the last level, are of importance, where walks can only move in the direction of directed edges between levels, but in any direction along the undirected edges within each level.

Later in this section we provide a general procedure for constructing walk-sum diagrams for our most general algorithms, but we begin by illustrating these diagrams and the points made in the preceding paragraph using a simple 3-node, fully connected graph (with variables denoted $x_1$, $x_2$, $x_3$). We look at two of the simplest iterative algorithms in the classes we have described, namely the classic GJ and GS iterations [15], [16]. Fig. 1 shows the walk-sum diagrams for these algorithms. In the GJ algorithm, each variable is updated at each iteration using the values from the preceding iteration of every other variable (this corresponds to a stationary ET algorithm (7) with the subgraph being the fully disconnected graph of all the nodes in $V$). Thus, each level on the left in Fig. 1 is fully disconnected, with solid nodes for all variables and directed edges from each node at the preceding level to every other node at the next level. This provides a simple way of seeing both how walks are extended from one level to the next and, more subtly, how walks captured at one iteration are also captured at subsequent iterations.
For example, the walk 12 in $\mathcal{G}^{(2)}$ is captured by the directed edge that begins at node 1 at level 1 and proceeds to node 2 at level 2 (the final level of $\mathcal{G}^{(2)}$). However, this walk in $\mathcal{G}^{(3)}$ is captured by the walk that begins at node 1 at level 2 and proceeds to node 2 at level 3.

The GS algorithm is a serial iteration that updates one variable at a time, cyclically, so that after $|V|$ iterations each variable is updated exactly once. On the right-hand side of Fig. 1, only one node at each level is solid, using values of the other nodes from the preceding level. For nonupdate variables at any iteration, a weight-1 directed edge is included from the same node at the preceding level. For example, since $x_2$ is updated at level 2, we have open circles for nodes 1 and 3 at that level and weight-1 directed edges from their copies at level 1. Weight-1 edges do not affect the weight of any walk. Hence, at level 4 we still capture the walk 12 from level 2 (from node 1 at level 1 to node 2 at level 2); the walk is extended to node 2 at levels 3 and 4 with weight-1 directed edges.

For general graphs, the walk-sum diagram $\mathcal{G}^{(n)}$ of one of our algorithms is constructed as follows.

[Fig. 2. (Left) Nonstationary ET: Subgraphs and walk-sum diagram. (Right) Hybrid serial updates: Subgraphs and walk-sum diagram.]

1) For $n = 1$, create a new copy of each $s \in V$, using solid circles for update variables and open circles for nonupdate variables; label these $s^{(1)}$. Draw the subgraph $\mathcal{S}_1$ using the solid nodes and undirected edges weighted by the partial correlation coefficient of each edge. $\mathcal{G}^{(1)}$ is the same as $\mathcal{S}_1$, with the exception that $\mathcal{G}^{(1)}$ also contains the nonupdate variables denoted by open circles.

2) Given $\mathcal{G}^{(n-1)}$, create a new copy of each $s \in V$, using solid circles for update variables and open circles otherwise; label these $s^{(n)}$. Draw $\mathcal{S}_n$ using the update variables with undirected edges. Draw a directed edge from the variable $u^{(\tau_n(u, s))}$ to $s^{(n)}$ for each $(u, s) \in \mathcal{E}_n$ (since $\tau_n(u, s) \leq n - 1$). If there are no failed links, $\tau_n(u, s) = n - 1$. Both these undirected and directed edges are weighted by their respective partial correlation coefficients. Draw a directed edge to each nonupdate variable $s^{(n)}$ from the corresponding $s^{(n-1)}$ with unit edge weight.

A level $n$ in a walk-sum diagram refers to the $n$th replica of the variables.

Rules for Walks in $\mathcal{G}^{(n)}$: Walks must respect the orientation of each edge, i.e., walks can cross an undirected edge in either direction, but can only cross directed edges in one direction. In addition, walks can only start at the update variables for each level. Interpreted in this manner, walks in $\mathcal{G}^{(n)}$ reweighted by $h$ and ending at one of the variables at the final level are exactly the walks computed in $\hat{x}^{(n)}$.

Proposition 2: Let $\mathcal{G}^{(n)}$ be a walk-sum diagram constructed and interpreted according to the preceding rules. In walk-summable models, for any $t \in V$,

$\hat{x}^{(n)}_t = \phi_{\mathcal{G}^{(n)}}\big(h; * \to t^{(n)}\big). \quad (22)$

Proof: Based on the preceding discussion, one can check the equivalence of the walks computed by the walk-sum diagrams with the walk-sets (18)–(20). Proposition 1 then yields (22).

Sections IV-C–E describe walk-sum diagrams for the various algorithms presented in Section III.

C. Nonstationary Parallel Updates

We describe walk-sum diagrams for the parallel ET algorithm of Section III-A. Here, $V_n = V$ for all $n$. Since there is no link failure, $\tau_n(u, s) = n - 1$, and (18)–(19) reduce to

$\mathcal{W}^{(n)}(s \to t) = \Big( \bigcup_{(u, v) \in \mathcal{E}_n} \mathcal{W}^{(n-1)}(s \to u) \cdot (u \to v) \cdot \mathcal{W}_{\mathcal{S}_n}(v \to t) \Big) \cup \mathcal{W}_{\mathcal{S}_n}(s \to t). \quad (23)$

The stationary GJ iteration discussed previously falls in this class. The left-hand side of Fig. 2 shows the trees $\mathcal{S}_1$, $\mathcal{S}_2$, $\mathcal{S}_3$, and the corresponding first three levels of the walk-sum diagrams for a more general nonstationary ET iteration. This example illustrates how walks are "collected" in walk-sum diagrams at each iteration. First, walks can proceed along undirected edges within each level, and from one level to the next along directed edges (capturing cut edges). Second, the walks relevant at each iteration must end at that level. For example, the walk 13231 is captured at iteration 1, as it is present in the undirected edges at level 1.
At iteration 2, however, we are interested in walks ending at level 2. The walk 13231 is still captured, but in a different manner—through the walk 1323 at level 1, followed by the hop 31 along the directed edge from node 3 at level 1 to node 1 at level 2. At iteration 3, this walk is captured first by the hop from node 1 at level 1 to node 3 at level 2, then by the hop 32 at level 2, followed by the hop from node 2 at level 2 to node 3 at level 3, and finally by the hop 31 at level 3.

D. Nonstationary Serial Updates

We describe similar walk-sum diagrams for the serial update scheme of Section III-B. Since there is no link failure, $\tau_n(u, s) = n - 1$. The recursive walk-set update (18) can be specialized as follows, for $t \in V_n$:

$\mathcal{W}^{(n)}(s \to t) = \Big( \bigcup_{(u, v) \in \mathcal{E}_n} \mathcal{W}^{(n-1)}(s \to u) \cdot (u \to v) \cdot \mathcal{W}_{\mathcal{S}_n}(v \to t) \Big) \cup \mathcal{W}_{\mathcal{S}_n}(s \to t). \quad (24)$

While (23) is a specialization to iterations with parallel updates, (24) is relevant for serial updates. The GS iteration discussed in Section IV-B falls in this class, as do the more general serial updates described in Section III-B in which we update a subset of variables $V_n$ based on a subgraph $\mathcal{S}_n$ of the induced graph of $V_n$. The right-hand side of Fig. 2 illustrates an example for our 3-node model. We show the subgraphs used in the first four stages of the algorithm and the corresponding 4-level walk-sum diagram. Note that at iteration 2 we update variables $x_1$ and $x_3$ without taking into account the edge connecting them. Indeed, the updates at the first four iterations of this example include block GS, a hybrid of ET and block GS, parallel ET, and GS, respectively.

E. Failing Links

We now discuss the general nonstationary update scheme of Section III-C involving failing links. The recursive walk-set computation equations for this iteration are given by (18)–(20).

[Fig. 3. Nonstationary updates with failing links: Subgraphs used along with failed edges at each iteration (left) and walk-sum diagram $\mathcal{G}^{(4)}$ (right).]

Fig. 3 shows the subgraph $\mathcal{S}_n$ and the edges in $\mathcal{E}_n$ that fail at each iteration, and the corresponding 4-level walk-sum diagram. We elaborate on the computation and propagation of information at each iteration. At iteration 1, inference is performed using subgraph $\mathcal{S}_1$, followed by nodes 1 and 2 passing a message to each other according to the post-inference message-passing rule (16). At iteration 2, only $x_3$ is updated. As no links fail, node 3 gets information from nodes 1 and 2 at level 1. At iteration 3, the link from node 2 to node 1 fails. But node 1 has information about $x_2$ at level 1 (due to the post-inference message-passing step from iteration 1). This information is used from the local memory at node 1 in (17), and is represented by the arrow from node 2 at level 1 to node 1 at level 3. At iteration 4, the links between nodes 1 and 3 fail in both directions. Similar reasoning as in iteration 3 applies to the arrows drawn across multiple levels from node 1 to node 3, and from node 3 to node 1. Further, post-inference message-passing at this iteration only takes place between nodes 1 and 2, because $\{1, 2\}$ is the only edge in $\mathcal{S}_4$.

V. CONVERGENCE ANALYSIS

We now show that all the algorithms of Section III converge in walk-summable models.
As in Section IV-A, we focus on the most general nonstationary algorithm with failing links of Section III-C. We begin by showing that $\hat{x}^{(n)}$ converges to the correct means when $\hat{x}^{(0)} = 0$. Next, we use this result to show that we can achieve convergence to the correct means for any initial guess $\hat{x}^{(0)}$.

The proof that $\hat{x}^{(n)} \to \mu$ relies on the fact that $\mathcal{W}^{(n)}(s \to t)$ eventually contains every element of the set $\mathcal{W}(s \to t)$ of all the walks in $\mathcal{G}$ from $s$ to $t$, a condition we refer to as completeness. Showing this begins with the following proposition, proved in Appendix C.

Proposition 3 (Nesting): The walk-sets defined in (18)–(20) are nested, i.e., for every pair of vertices $s, t \in V$, $\mathcal{W}^{(n-1)}(s \to t) \subseteq \mathcal{W}^{(n)}(s \to t)$ for each $n$.

This statement is easily seen for a stationary ET algorithm, because the portion of the walk-sum diagram from levels 2 to $n$ is a replica of $\mathcal{G}^{(n-1)}$ (for example, the GJ diagram in Fig. 1). However, the proposition is less clear for nonstationary iterations. The discussion in Section IV-C illustrates this point; the paths that a walk traverses change drastically depending on the level in the walk-sum diagram at which the walk ends. Nonetheless, as shown in Appendix C, the structure of the estimation algorithms that we consider ensures that whenever a walk is not explicitly captured in the same form it appeared in the preceding iteration, it is recovered through a different path in the subsequent walk-sum diagram (no walks are lost).

Completeness relies on both nesting and the following additional condition.

Definition 2: Let $(u, v)$ be any directed edge in $\mathcal{G}$. For each $n$, let $\mathcal{A}_n$ denote the set of directed active edges (links that do not fail) at iteration $n$. The edge $(u, v)$ is said to be updated infinitely often^5 if for every $n$, there exists an $m > n$ such that $(u, v) \in \mathcal{A}_m$.

If there is no link failure, this definition reduces to including each vertex in $V$ in the update set $V_n$ infinitely often. For parallel nonstationary ET iterations (Section III-A), this property is satisfied for any sequence of subgraphs. Note that there are cases in which inference algorithms may not have to traverse each edge infinitely often. For instance, suppose that $\mathcal{G}$ can be decomposed into subgraphs $\mathcal{G}_1$ and $\mathcal{G}_2$ that are connected by a single edge, with $\mathcal{G}_2$ having small size so that we can perform exact computations. For example, $\mathcal{G}_2$ could be a leaf node (i.e., have degree one). We can eliminate the variables in $\mathcal{G}_2$, propagate information "into" $\mathcal{G}_1$ along the single connecting edge, perform inference within $\mathcal{G}_1$, and then back-substitute. Hence, the single connecting edge is traversed only finitely often. In this case the hard part of the overall inference procedure is on the reduced graph with leaves and small, dangling subgraphs eliminated, and we focus on inference problems on such graphs. Thus, we assume that each vertex in $\mathcal{G}$ has degree at least two and study algorithms that traverse each edge infinitely often.

Proposition 4 (Completeness): Let $w$ be an arbitrary walk from $s$ to $t$ in $\mathcal{G}$. If every edge in $\mathcal{G}$ is updated infinitely often (in both directions), then there exists an $N$ such that $w \in \mathcal{W}^{(n)}(s \to t)$ for all $n \geq N$, where the walk-set $\mathcal{W}^{(n)}(s \to t)$ is defined in (18)–(20).

The proof of this proposition appears in Appendix D. We can now state and prove the following.

Theorem 1: If every edge in $\mathcal{G}$ is updated infinitely often (in both directions), then $\hat{x}^{(n)} \to \mu$ as $n \to \infty$ in walk-summable models, with $\hat{x}^{(0)} = 0$ as defined in Section IV-A.

Proof: One can check that $\mathcal{W}^{(n)}(s \to t) \subseteq \mathcal{W}(s \to t)$ for all $n$. This is because equations (18)–(20) only use edges from the original graph $\mathcal{G}$. We have from Proposition 4 that every walk from $s$ to $t$ in $\mathcal{G}$ is eventually contained in $\mathcal{W}^{(n)}(s \to t)$. Thus, $\cup_n \mathcal{W}^{(n)}(s \to t) = \mathcal{W}(s \to t)$.

^5 If $\mathcal{G}$ contains a singleton node, then this node must be updated at least once.
Given these arguments and the nesting of the walk-sets from Proposition 3, we can appeal to the results in Section II-C to conclude that $\hat{x}^{(n)}_t = \phi^{(n)}(h; * \to t) \to \phi(h; * \to t) = \mu_t$ as $n \to \infty$.

Theorem 1 shows that $\hat{x}^{(n)} \to \mu$ for $\hat{x}^{(0)} = 0$. The following result, proven in Appendix E, shows that in walk-summable models convergence is achieved for any choice of initial condition.^6

Theorem 2: If every edge is updated infinitely often, then $\hat{x}^{(n)}$ computed according to (15), (17) converges to the correct means in walk-summable models for any initial guess $\hat{x}^{(0)}$.

Next, we describe a stronger convergence result when there is no communication failure.

Corollary 1: Assuming no communication failure, we have the following special cases for convergence in walk-summable models (with any initial guess):
1) the hybrid algorithms of Section III-B converge to the correct mean as long as each variable is updated infinitely often;
2) the parallel ET iterations of Section III-A (with $V_n = V$) converge to the correct mean using any sequence of subgraphs.

We note that in the parallel ET iteration (with no link failure), it is not necessary to include each edge in some subgraph $\mathcal{S}_n$; indeed, even stationary algorithms that use the same subgraph at each iteration are guaranteed to converge.

Theorem 2 shows that walk-summability is a sufficient condition for all our algorithms to converge for a very large and flexible set of sequences of tractable subgraphs or subsets of variables (ones that update each edge infinitely often) on which to perform successive updates. Corollary 1 requires even weaker conditions for convergence if there is no communication failure. The following result, proven in Appendix F, shows that walk-summability is also necessary for this complete flexibility. Thus, while any of our algorithms may converge for some sequence of subsets of variables and tractable subgraphs in a non-walk-summable model, there is at least one sequence of updates that leads to a divergent iteration.

Theorem 3: For any non-walk-summable model, there exists at least one sequence of iterative steps that is ill-posed, or for which $\hat{x}^{(n)}$, computed according to (15) and (17), diverges.

^6 Note that in this case the messages must be initialized as $M_{s \to t} = R_{ts}\, \hat{x}^{(0)}_s$ for each directed edge $(s, t) \in \mathcal{E}$.

VI. ADAPTIVE ITERATIONS AND EXPERIMENTAL RESULTS

In this section we address two topics. The first is taking advantage of the great flexibility in choosing successive iterative steps by developing techniques that adaptively optimize the online choice of the next tree or subset of variables to use in order to reduce the error as quickly as possible.
The second is providing experimental results that demonstrate the convergence behavior of these adaptive algorithms.

A. Choosing Trees and Subsets of Variables Adaptively

At iteration $n$, let the error be $e^{(n)} \triangleq \hat{x} - \hat{x}^{(n)}$ and the residual error be $r^{(n)} \triangleq h - J \hat{x}^{(n)}$. Note that it is tractable to compute the residual error at each iteration.

1) Trees: We describe an efficient algorithm to choose spanning trees adaptively to use as preconditioners in the ET algorithm of Section III-A. We have the following relationship between the error at iteration $n$ and the residual error at iteration $n-1$:

$e^{(n)} = J_{\mathcal{S}_n}^{-1} K_{\mathcal{S}_n}\, J^{-1}\, r^{(n-1)}.$

Based on this relationship, $e^{(n)}_t$ has a walk-sum interpretation over walks reweighted by the residual that must traverse cut edges, and consequently we have the following bound on the norm of $e^{(n)}$:

$\|e^{(n)}\|_1 \leq \bar{\phi}\big(|r^{(n-1)}|;\ \mathcal{W}(* \xrightarrow{\setminus \mathcal{S}_n} *)\big) \quad (25)$

where $\mathcal{W}(* \xrightarrow{\setminus \mathcal{S}_n} *)$ denotes walks in $\mathcal{G}$ that must traverse edges not in $\mathcal{S}_n$, $|r^{(n-1)}|$ refers to the entry-wise absolute value vector of $r^{(n-1)}$, and $\bar{\phi}(|r^{(n-1)}|; \cdot)$ refers to the reweighted absolute walk-sum over all walks in the corresponding walk-set. Minimizing the error reduces to choosing $\mathcal{S}_n$ to maximize the reweighted absolute walk-sum over the walks within $\mathcal{S}_n$. Hence, if we maximize among all trees, we have the following maximum walk-sum tree problem:

$\mathcal{S}_n = \arg\max_{\mathcal{S}\ \text{a spanning tree}}\ \bar{\phi}\big(|r^{(n-1)}|;\ \mathcal{W}_{\mathcal{S}}(* \to *)\big). \quad (26)$

Rather than solving this combinatorially complex problem, we instead solve a problem that minimizes a looser upper bound than (25). Specifically, consider any edge $\{u, v\} \in \mathcal{E}$ and all of the walks that live solely on this single edge. It is not difficult to show that

$w^{(n)}_{uv} \triangleq \big(|r^{(n-1)}_u| + |r^{(n-1)}_v|\big)\, \dfrac{\bar{R}_{uv}}{1 - \bar{R}_{uv}}. \quad (27)$

This weight provides a measure of the error-reduction capacity of edge $\{u, v\}$ by itself at iteration $n$. This leads directly to choosing the maximum spanning tree [28] by solving

$\mathcal{S}_n = \arg\max_{\mathcal{S}\ \text{a spanning tree}} \sum_{\{u, v\} \in \mathcal{S}} w^{(n)}_{uv}. \quad (28)$

For any tree, the set of walks captured in the sum in (28) is a subset of all the walks in the tree, so that solving (28) provides a lower bound on (26) and thus a looser upper bound than (25). Each iteration of this method can be solved using standard greedy approaches for the maximum spanning tree problem [28]. For sparse graphs, more sophisticated variants can also be used to reduce the per-iteration complexity [28]. (A code sketch of this tree-selection step appears at the end of this subsection.)

2) Subsets of Variables: We present an algorithm to choose the next best subset of variables $V_n$ for the block GS algorithm of Section III-B. The error at iteration $n$ can again be written in terms of the residual error $r^{(n-1)} = J e^{(n-1)}$ at the previous iteration. As with (25), we have the following upper bound:

$\|e^{(n)}\|_1 \leq \bar{\phi}\big(|r^{(n-1)}|;\ \mathcal{W}(* \xrightarrow{\setminus \mathcal{E}(V_n)} *)\big) \quad (29)$

where $\mathcal{E}(V_n)$ refers to the edges in the induced subgraph of $V_n$. Minimizing this upper bound reduces to solving the following maximum walk-sum block problem:

$V_n = \arg\max_{|V'| = k}\ \bar{\phi}\big(|r^{(n-1)}|;\ \mathcal{W}_{\mathcal{E}(V')}(* \to *)\big). \quad (30)$

As with the maximum walk-sum tree problem, finding the optimal such block directly is combinatorially complex. Therefore, we consider the following relaxed maximum walk-sum block problem based on single-edge walks:

$V_n = \arg\max_{|V'| = k}\ \bar{\phi}^{\leq 1}\big(|r^{(n-1)}|;\ \mathcal{W}_{\mathcal{E}(V')}(* \to *)\big) \quad (31)$

where the superscript $\leq 1$ denotes the restriction that walks can traverse at most one edge. The walks in (31) are a subset of the walks in (30). Thus, solving (31) provides a lower bound on (30), hence minimizing a looser upper bound on the error than (29). Solving (31) is also combinatorially complex; therefore, we use a greedy method for an approximate solution:
1) Set $V_n = \emptyset$. Assuming that the goal is to solve the problem for $|V_n| = k$, compute for each node a weight given by the walks captured by (31) if that node were to be included in $V_n$.
2) Find the maximum-weight node $u^*$ from $V \setminus V_n$, and set $V_n \leftarrow V_n \cup \{u^*\}$.
3) If $|V_n| = k$, stop. Otherwise, update the weight of each neighbor $u$ of $u^*$ to capture the extra walks in (31) that would be gained if $u$ were to be added to $V_n$, and go to Step 2).

Step 3) is the greedy aspect of the algorithm, as it updates weights by computing the extra walks that would be captured in (31) if node $u$ were added to $V_n$, with the assumption that the nodes already in $V_n$ remain unchanged. Note that only the weights of the neighbors of $u^*$ are updated in Step 3); thus, there is a bias towards choosing a connected block. In choosing successive blocks in this way, we collect walks adaptively without explicit regard for the objective of updating each node infinitely often. However, our method is biased towards choosing variables that have not been updated for a few iterations, as the residual error of such variables becomes larger relative to the other variables.
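As promised above, here is a sketch of the tree-selection step: score each edge with (27) and extract a maximum spanning tree. It is our own illustration (a small Kruskal routine with union-find; the edge list and residuals are hypothetical); the greedy block selection of (31) could be sketched analogously.

```python
# Sketch (ours) of the adaptive tree choice (27)-(28).
import numpy as np

def next_tree(edges, r_abs):
    # (27): w_uv = (|r_u| + |r_v|) * Rbar_uv / (1 - Rbar_uv)
    scored = [((r_abs[u] + r_abs[v]) * R / (1.0 - R), u, v)
              for (u, v, R) in edges]          # edges: (u, v, |R_uv|) triples
    parent = list(range(len(r_abs)))           # union-find forest
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    tree = []
    for w, u, v in sorted(scored, reverse=True):   # (28): greedy Kruskal
        ru, rv = find(u), find(v)
        if ru != rv:                               # keep edge if it joins components
            parent[ru] = rv
            tree.append((u, v))
    return tree

r = np.array([0.9, 0.1, 0.5])                      # residual magnitudes |r^(n-1)|
print(next_tree([(0, 1, 0.3), (1, 2, 0.3), (0, 2, 0.3)], r))
```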
Indeed, empirical evidence confirms this behavior, with all the simulations leading to convergent iterations. For bounded-degree graphs, each iteration using this technique can be performed efficiently using a sorted data structure [28]. The space complexity of maintaining such a structure can be significant compared to the adaptive ET procedure.

3) Experimental Illustration: We test the preceding two adaptive algorithms on randomly generated 15 × 15 nearest-neighbor grid-structured models.^7 The blocks used in block GS were of a fixed small size. We compare these adaptive methods to standard nonadaptive one-tree and two-tree ET iterations [13].

[Fig. 4. (Left) Convergence results for a randomly generated 15 × 15 nearest-neighbor grid-structured model. (Right) Average number of iterations required for the normalized residual to reduce by a given factor, over 100 randomly generated models.]

Fig. 4 shows the performance of these algorithms. The plot shows the relative decrease in the normalized residual error versus the number of iterations. The table shows the average number of iterations required for these algorithms to reduce the normalized residual error below a fixed threshold. The average was computed based on the performance on 100 randomly generated models. All these models are poorly conditioned because they are barely walk-summable. The number of iterations for block GS is scaled appropriately to account for the different per-iteration computational costs of the adaptive ET and adaptive block GS steps. The one-tree ET method uses a spanning tree obtained by removing all the vertical edges except the middle column. The two-tree method alternates between this tree and its rotation (obtained by removing all the horizontal edges except the middle row). Both the adaptive ET and block GS algorithms provide far faster convergence compared to the one-tree and two-tree iterations, thus providing a computationally attractive method for estimation in the broad class of walk-summable models.

^7 The grid edge weights are chosen uniformly at random from $[-1, 1]$. The matrix $R$ is then scaled so that $\varrho(\bar{R}) = 0.99$. The potential vector $h$ is chosen to be the all-ones vector.

B. Communication Failure: Experimental Illustration

To illustrate our adaptive methods in the context of communication failure, we consider a simple model for a distributed sensor network in which links (edges) fail independently with failure probability $p$, and each failed link remains inactive for a certain number of iterations given by a geometric random variable with mean $1/q$. At each iteration, we find the best spanning tree (or forest) among the active links using the approach described in Section VI-A-1). The maximum spanning tree problem can be solved in a distributed manner using the algorithms presented in [29], [30].

[Fig. 5. Convergence of memory-based algorithm on the same randomly generated 15 × 15 model used in Fig. 4: varying $p$ with $q = 0.3$ (left) and varying $q$ with $p = 0.3$ (right).]

Fig. 5 shows the convergence of our memory-based algorithm from Section III-C on the same randomly generated 15 × 15 grid model used to generate the plot in Fig. 4 (again, with $\varrho(\bar{R}) = 0.99$). The different curves are obtained by varying $p$ and $q$. As expected, the first plot shows that our algorithm is slower to converge as the failure probability increases, while the second plot shows that convergence is faster as $q$ is increased (which decreases the average inactive time). These results show that our adaptive algorithms provide a scalable, flexible, and convergent method for the estimation problem in a distributed setting with communication failure.
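For completeness, here is a sketch of the failure model used in these experiments (our own simulation, with illustrative constants): each edge fails independently with probability $p$ and stays inactive for a Geometric($q$) number of iterations; the active edges at each iteration are what the tree selection of Section VI-A-1) would operate on.

```python
# Sketch (ours) of the link-failure model of Section VI-B.
import numpy as np

rng = np.random.default_rng(0)
p, q, edges = 0.3, 0.3, [(0, 1), (1, 2), (0, 2)]
down = {e: 0 for e in edges}            # remaining inactive iterations per edge

for n in range(5):
    for e in edges:
        if down[e] > 0:
            down[e] -= 1                # still recovering from an earlier failure
        elif rng.random() < p:
            down[e] = rng.geometric(q)  # new failure; mean downtime 1/q
    active = [e for e in edges if down[e] == 0]
    print(n, active)                    # feed these to the tree selection of VI-A
```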
VII. CONCLUSION

In this paper, we have described and analyzed a rich set of algorithms for estimation in Gaussian graphical models with arbitrary structure. These algorithms are iterative in nature and involve a sequence of inference problems on tractable subgraphs over subsets of variables. Our framework includes parallel iterations such as ET, in which inference is performed on a tractable subgraph of the whole graph at each iteration, and serial iterations such as block GS, in which the induced subgraph of a small subset of variables is directly inverted at each iteration. We describe hybrid versions of these algorithms that involve inference on a subgraph of a subset of variables. We also discuss a method that uses local memory at each node to overcome temporary communication failures that may arise in distributed sensor networks; our memory-based framework applies to the nonstationary ET, block GS, and hybrid algorithms. We analyze these algorithms based on the recently introduced walk-sum interpretation of Gaussian inference. A salient feature of our analysis is the development of walk-sum diagrams: these graphs correspond exactly to the walks computed after each iteration, and provide an intuitive graphical comparison between the various algorithms.

This walk-sum analysis allows us to conclude that, for the broad class of walk-summable models, our algorithms converge for a very large and flexible set of sequences of subgraphs and subsets of variables. We also describe how this flexibility can be exploited by formulating efficient algorithms that choose spanning trees and subsets of variables adaptively at each iteration. These algorithms are then used in the ET and block GS iterations, respectively, to demonstrate that significantly faster convergence can be obtained using these methods over traditional one-tree and two-tree ET iterations.

Our adaptive algorithms are greedy in that they consider the effect of edges individually (by considering walk-sums concentrated on single edges). A strength of our analysis for the case of finding the "next best" tree is that we do obtain an upper bound on the resulting error, and hence on the possible gap between our greedy procedure and the truly optimal one. Obtaining tighter error bounds, or conditions on graphical models under which our choice of tree yields near-optimal solutions, is an open problem. Another interesting question is the development of general versions of the maximum walk-sum tree and maximum walk-sum block algorithms that choose the next best trees or blocks jointly, in order to achieve maximum reduction in error after multiple iterations. For applications involving communication failure, extending our adaptive algorithms in a principled manner to explicitly avoid failed links while optimizing the rate of convergence is an important problem. Finally, the fundamental principle of solving a sequence of tractable inference problems on subgraphs has been exploited for non-Gaussian inference problems (e.g., [12]), and extending our methods to this more general setting is of clear interest.

APPENDIX

A. Dealing With Un-Normalized Models

Consider an information matrix J, with diagonal part D, that is not normalized, i.e., D ≠ I. The weight of a walk can be redefined with respect to the un-normalized model, so that the un-normalized weight of each walk is a rescaled version of its weight in the corresponding normalized model. We can then define walk-summability in terms of the absolute convergence of the un-normalized walk-sums over all walks between each pair of vertices. A necessary and sufficient condition for this un-normalized notion of walk-summability is equivalent to the original walk-summability condition in the corresponding normalized model.

Un-normalized versions of the algorithms in Section III can be constructed by replacing every occurrence of the partial correlation matrix by the un-normalized off-diagonal matrix. The rest of our analysis and convergence results remain unchanged, because we deal with abstract walk-sets. (Note that in the proof of Proposition 1, every occurrence of the partial correlation matrix must likewise be replaced by the un-normalized off-diagonal matrix.) Alternatively, given an un-normalized model, one can first normalize the model, then apply the algorithms of Section III, and finally "de-normalize" the resulting estimate. Such a procedure provides the same estimate as the direct application of the un-normalized versions of the algorithms, as outlined above.
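The normalize–solve–de-normalize alternative is a one-line change in practice. Below is a minimal Python sketch, assuming only that `solver` implements one of the Section III iterations on a normalized model; the function and argument names are ours.

```python
import numpy as np

def normalize_solve_denormalize(J, h, solver):
    """Normalize an information matrix J (diagonal part D), solve the
    normalized system, and de-normalize the estimate. Substituting
    x = D^{-1/2} y into J x = h gives (D^{-1/2} J D^{-1/2}) y = D^{-1/2} h,
    whose system matrix has unit diagonal."""
    d_inv_sqrt = 1.0 / np.sqrt(np.diag(J))                   # D^{-1/2} as a vector
    J_norm = d_inv_sqrt[:, None] * J * d_inv_sqrt[None, :]   # unit-diagonal model
    h_norm = d_inv_sqrt * h
    y_hat = solver(J_norm, h_norm)      # estimate in the normalized model
    return d_inv_sqrt * y_hat           # de-normalize: x_hat = D^{-1/2} y_hat
```

For a quick correctness check, solver can be replaced by lambda A, b: np.linalg.solve(A, b), in which case the function returns the exact estimate J^{-1} h.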
B. Proof of Proposition 1

Remarks: Before proceeding with the proof of the proposition, we make some observations about the walk-sets (18)–(20) that will prove useful for the other proofs as well. First, the set of edges contained in the subgraph used at iteration n and the set of cut edges are disjoint; the corresponding walk-sets are therefore disjoint as well, so from Section II-C their walk-sums add. Second, every walk in the iteration-n walk-set can be decomposed uniquely into a walk computed by iteration n − 1, a single hop across a cut edge, and a walk within the iteration-n subgraph. The unique decomposition property is a consequence of the two edge sets being disjoint and the connecting hop being restricted to length 1; it also implies that the walk-sets appearing in the decomposition are disjoint. Based on these two observations, we have from Section II-C that the corresponding walk-sums decompose as in (32) and (33).

Proof of proposition: We provide an inductive proof. From (20), the iteration-0 walk-sum is zero, which is consistent with the proposition because we assume that our initial guess is 0 at each node. Assume, as the inductive hypothesis, that the proposition holds at iteration n − 1. Hence, we can focus on the nodes updated at iteration n, for which (32) and (33) can be rewritten as (34), where we have used the walk-sum interpretation of the previous iterates. From (32)–(34), we obtain (35), in which the last equality is from the inductive hypothesis, the next steps follow from (17) and (15), and the final chain of equalities follows from (12), the inductive hypothesis, and (19). This completes the induction.
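Proposition 1 identifies each iterate with a walk-sum over a nested family of walk-sets. Since the length-k walk-sums of a normalized model are given by powers of the partial correlation matrix, the full walk-sum can be checked numerically against a direct solve. This is our own sanity-check sketch, not an algorithm from the paper:

```python
import numpy as np

def walk_sum_estimate(R, h, n_terms=400):
    """Partial walk-sums via the Neumann series: (R^k h)[t] is the sum
    over length-k walks ending at t, so the running sum accumulates
    walk-sums over walks of length < n_terms. The series converges
    absolutely when rho(|R|) < 1, i.e., for walk-summable models."""
    x = np.zeros_like(h, dtype=float)
    term = h.astype(float)
    for _ in range(n_terms):
        x += term
        term = R @ term          # extend every walk by one more step
    return x

# Tiny self-check on a random walk-summable model (assumed setup).
rng = np.random.default_rng(0)
R = rng.uniform(-1.0, 1.0, size=(6, 6))
R = (R + R.T) / 2.0
np.fill_diagonal(R, 0.0)
R *= 0.9 / np.max(np.abs(np.linalg.eigvalsh(np.abs(R))))   # force rho(|R|) = 0.9
h = np.ones(6)
exact = np.linalg.solve(np.eye(6) - R, h)                  # J^{-1} h with J = I - R
assert np.allclose(walk_sum_estimate(R, h), exact)
```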
C. Proof of Proposition 3

We provide an inductive proof based on the walk-sum diagrams; for a completely algebraic proof that is not based on walk-sum diagrams, we refer the reader to [31]. Based on the construction of the walk-sum diagrams in Section IV-B and the rules for interpreting walks in these diagrams, one can check the equivalence of the walks in the iteration-n diagram with the walk-sets (18)–(20) (this property is used in the proof of Proposition 2). More precisely, for each walk in the iteration-n walk-set ending at a node, there is a corresponding walk in the iteration-n diagram that begins at some copy of a node and ends at the final copy of that node, and vice versa.

First, note that the proposition is true in the base case. We assume it for iteration n − 1 as the inductive hypothesis, and must show that every walk in the iteration-(n − 1) walk-set also belongs to the iteration-n walk-set. If such a walk never leaves the subgraph used at iteration n, then the claim follows directly from (19). Otherwise, we must show that the corresponding walk in the iteration-n diagram ends at the final copy of its last node and starts at some copy of its first node. We begin by tracing the walk backwards in the iteration-(n − 1) diagram, and continue this process as far as possible until we find an edge at which the backward trace must stop. If there is no such edge, the claim is immediate. Otherwise, we can decompose the walk into a leading walk at an earlier level of the diagram, a hop across the stopping edge, and a trailing walk; we need to check that this decomposition is also realized in the iteration-n diagram. Since the walk ending at the intermediate node makes its hop through some edge of the iteration-n subgraph, this hop is available in the iteration-n diagram, and, by the definition of the diagrams, so is the trailing walk. We then use the inductive hypothesis to conclude that the leading walk is a walk in the iteration-n diagram that starts at some copy of a node and ends at the intermediate node. This walk can therefore continue in the iteration-n diagram along the hop edge, and further along the trailing walk to its final node. Therefore, the original walk belongs to the iteration-n walk-set.

D. Proof of Proposition 4

We provide an inductive proof with respect to the length of a fixed walk. If every edge is updated infinitely often, it is clear that every node is updated infinitely often. Therefore, the leading length-0 part of the walk is computed when its starting node is first updated at some iteration, and, by the nesting of the walk-sets from Proposition 3, it remains in all subsequent walk-sets. Now assume (as the inductive hypothesis) that the leading subwalk including all but the last step of the walk is contained in the walk-set of some iteration. Given the infinitely-often edge update property, there exists a later iteration at which the final edge of the walk is updated; at that iteration the full walk is captured, which can be concluded from (18) together with the nesting argument of Proposition 3 applied to the leading subwalk. Again applying the nesting argument, the full walk remains in all subsequent walk-sets, which proves the proposition. A similar argument handles the remaining case in which the final step lies within the subgraph updated at that iteration.

E. Proof of Theorem 2

From Theorem 1 and Proposition 1, we can conclude that the iterates converge element-wise to the correct estimate as the number of iterations grows, provided the potential vector is suitably restricted. For a general potential vector, consider a shifted linear system whose right-hand side is offset so as to satisfy this restriction. If we solve this shifted system using the same sequence of operations (subgraphs and failed links) that were used to obtain the original iterates, and with a correspondingly shifted initial guess, then the shifted iterates converge to the correct solution of the shifted system. We will show that the original iterates track the shifted iterates up to the offset, which allows us to conclude that the original iterates also converge element-wise to the correct estimate. We prove this final step inductively. The base case is clear from the choice of initial guesses. Assume, as the inductive hypothesis, that the relation between the two sets of iterates holds at iteration n − 1. Then, from (15) and (17), the relation holds at iteration n: the second equality in the resulting chain follows from the inductive hypothesis, and the last two from simple algebra.

F. Proof of Theorem 3

Before proving the converse, we state the following lemma, which is proved in [32].

Lemma 2: Suppose J is a symmetric positive-definite matrix, and J = J_S − K_S is some splitting with J_S symmetric and nonsingular. Then ϱ(J_S^{-1} K_S) < 1 if and only if J_S + K_S is positive definite.

Proof: Assume that J is valid but non-walk-summable. Then the partial correlation matrix must contain some negative partial correlation coefficients, since all valid attractive models, i.e., those containing only non-negative partial correlation coefficients, are walk-summable (see Section II-C). Split the partial correlation matrix into the part containing the positive coefficients and the part containing the negative coefficients (including the negative sign). Consider a stationary ET iteration (7) based on cutting the negative edges, so that J_S retains the positive coefficients and K_S consists of the negative part. If J_S is singular, then the iteration is ill-posed. Otherwise, the iteration converges if and only if ϱ(J_S^{-1} K_S) < 1 [15], [16]. From Lemma 2, we therefore need to check whether J_S + K_S is positive definite. But this matrix is positive definite if and only if the model is walk-summable (from Section II-C). Thus, this stationary iteration, if well-posed, does not converge in non-walk-summable models.
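The construction in this proof is easy to test numerically. The sketch below is an illustration under the assumed normalization J = I − R; the function name and the centralized eigenvalue computations are ours.

```python
import numpy as np

def et_negative_edge_splitting(J):
    """Form the Theorem 3 splitting J = J_S - K_S by cutting the
    negatively weighted edges of R = I - J, then evaluate both the
    convergence test rho(J_S^{-1} K_S) < 1 and the Lemma 2 criterion
    (J_S + K_S positive definite)."""
    I = np.eye(len(J))
    R = I - J
    R_pos = np.where(R > 0, R, 0.0)   # non-negative partial correlations
    R_neg = np.where(R < 0, R, 0.0)   # negative ones, sign included
    J_S = I - R_pos                   # keep the attractive edges
    K_S = R_neg                       # cut the negative edges
    rho = max(abs(np.linalg.eigvals(np.linalg.solve(J_S, K_S))))
    # J_S + K_S = I - (R_pos - R_neg) = I - |R|: positive definiteness of
    # this matrix is exactly walk-summability (Section II-C).
    walk_summable = bool(np.all(np.linalg.eigvalsh(J_S + K_S) > 0))
    return rho < 1, walk_summable
```

Lemma 2 says the two returned booleans agree whenever J is symmetric positive definite and J_S is nonsingular, which is precisely the dichotomy the proof exploits.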
ACKNOWLEDGMENT

The authors would like to thank Dr. E. Sudderth for helpful discussions and comments on early drafts of this work.

REFERENCES

[1] S. L. Lauritzen, Graphical Models. Oxford, U.K.: Oxford Univ. Press, 1996.
[2] M. I. Jordan, "Graphical models," Stat. Sci. (Special Issue on Bayesian Statistics), vol. 19, pp. 140–155, 2004.
[3] C. Crick and A. Pfeffer, "Loopy belief propagation as a basis for communication in sensor networks," presented at the 19th Conf. Uncertainty in Artificial Intelligence, Acapulco, Mexico, Aug. 8–10, 2003.
[4] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-6, no. 6, pp. 721–741, Nov. 1984.
[5] J. Woods, "Markov image modeling," IEEE Trans. Autom. Control, vol. 23, no. 5, pp. 846–850, Oct. 1978.
[6] R. Szeliski, "Bayesian modeling of uncertainty in low-level vision," Int. J. Comput. Vision, vol. 5, no. 3, pp. 271–301, Dec. 1990.
[7] P. Rusmevichientong and B. Van Roy, "An analysis of turbo decoding with Gaussian densities," Adv. Neural Inf. Process. Syst., vol. 12, pp. 575–581, 2000.
[8] P. W. Fieguth, W. C. Karl, A. S. Willsky, and C. Wunsch, "Multiresolution optimal interpolation and statistical analysis of Topex/Poseidon satellite altimetry," IEEE Trans. Geosci. Remote Sens., vol. 33, no. 2, pp. 280–292, Mar. 1995.
[9] J. Pearl, Probabilistic Reasoning in Intelligent Systems. San Mateo, CA: Morgan Kaufmann, 1988.
[10] E. B. Sudderth, M. J. Wainwright, and A. S. Willsky, "Embedded trees: Estimation of Gaussian processes on graphs with cycles," IEEE Trans. Signal Process., vol. 52, no. 11, pp. 3136–3150, Nov. 2004.
[11] L. K. Saul and M. I. Jordan, "Exploiting tractable substructures in intractable networks," Adv. Neural Inf. Process. Syst., vol. 8, pp. 486–492, 1996.
[12] M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky, "Tree-based reparameterization framework for analysis of sum-product and related algorithms," IEEE Trans. Inf. Theory, vol. 49, no. 5, pp. 1120–1146, May 2003.
[13] E. B. Sudderth, "Embedded trees: Estimation of Gaussian processes on graphs with cycles," Master's thesis, Massachusetts Inst. of Technology, Cambridge, MA, 2002.
[14] V. Delouille, R. Neelamani, and R. Baraniuk, "Robust distributed estimation using the embedded subgraphs algorithm," IEEE Trans. Signal Process., vol. 54, no. 8, pp. 2998–3010, Aug. 2006.
[15] G. H. Golub and C. F. Van Loan, Matrix Computations. Baltimore, MD: The Johns Hopkins Univ. Press, 1990.
[16] R. S. Varga, Matrix Iterative Analysis. New York: Springer-Verlag, 2000.
[17] D. M. Malioutov, J. K. Johnson, and A. S. Willsky, "Walk-sums and belief propagation in Gaussian graphical models," J. Mach. Learn. Res., vol. 7, pp. 2031–2064, Oct. 2006.
[18] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1994.
[19] R. Bru, F. Pedroche, and D. B. Szyld, "Overlapping additive and multiplicative Schwarz iterations for H-matrices," Linear Algebra Appl., vol. 393, pp. 91–105, 2004.
[20] A. Frommer and D. B. Szyld, "H-splittings and two-stage iterative methods," Numer. Math., vol. 63, pp. 345–356, 1992.
[21] T. Gu, X. Liu, and X. Chi, "Relaxed parallel two-stage multisplitting methods II: Asynchronous version," Int. J. Comput. Math., vol. 80, no. 10, pp. 1277–1287, Oct. 2003.
[22] T. Speed and H. Kiiveri, "Gaussian Markov probability distributions over finite graphs," Ann. Stat., vol. 14, no. 1, pp. 138–150, Mar. 1986.
[23] L. Scharf, Statistical Signal Processing. Upper Saddle River, NJ: Prentice-Hall, 2002.
[24] W. Rudin, Principles of Mathematical Analysis. New York: McGraw-Hill, 1976.
[25] R. Godement, Analysis I. New York: Springer-Verlag, 2004.
[26] P. Fieguth, W. Karl, and A. Willsky, "Efficient multiresolution counterparts to variational methods for surface reconstruction," Comput. Vision Image Understand., vol. 70, no. 2, pp. 157–176, May 1998.
[27] W. G. Strang and G. J. Fix, An Analysis of the Finite Element Method. Cambridge, MA: Wellesley-Cambridge, 1973.
[28] T. Cormen, C. Leiserson, R. Rivest, and C. Stein, Introduction to Algorithms. Cambridge, MA: MIT Press, 2001.
[29] R. G. Gallager, P. A. Humblet, and P. M. Spira, "A distributed algorithm for minimum-weight spanning trees," ACM Trans. Program. Lang. Syst., vol. 5, no. 1, pp. 66–77, Jan. 1983.
[30] B. Awerbuch, "Optimal distributed algorithms for minimum weight spanning tree, counting, leader election, and related problems," in Proc. Annu. ACM Symp. Theory of Computing, 1987.
[31] V. Chandrasekaran, "Modeling and estimation in Gaussian graphical models: Maximum-entropy relaxation and walk-sum analysis," Master's thesis, Massachusetts Inst. of Technology, Cambridge, MA, 2007.
[32] L. Adams, "m-step preconditioned conjugate gradient methods," SIAM J. Sci. Stat. Comput., vol. 6, pp. 452–463, Apr. 1985.

Venkat Chandrasekaran (S'03) received the B.S. degree in electrical engineering and the B.A. degree in mathematics from Rice University, Houston, TX, in 2005, and the S.M. degree in electrical engineering from the Massachusetts Institute of Technology (MIT), Cambridge, in 2007, where he is currently working towards the Ph.D. degree with the Stochastic Systems Group. His research interests include statistical signal processing, machine learning, and computational harmonic analysis.

Jason K. Johnson received the B.S. degree in physics and the M.S. degree in electrical engineering and computer science from the Massachusetts Institute of Technology (MIT), Cambridge, in 1995 and 2003, respectively, where he is currently working towards the Ph.D. degree. From 1995 to 2000, he was a Member of Technical Staff at Alphatech, Burlington, MA, where he developed algorithms for multiscale signal and image processing, data fusion, and multitarget tracking. His current research interests include Markov models, tractable methods for inference and learning, and statistical signal and image processing.

Alan S. Willsky (S'70–M'73–SM'82–F'86) joined the Massachusetts Institute of Technology (MIT), Cambridge, in 1973 and is the Edwin Sibley Webster Professor of Electrical Engineering. He was a founder of Alphatech, Inc., Burlington, MA, and is the Chief Scientific Consultant at BAE Systems Advanced Information Technologies. From 1998 to 2002, he served on the U.S. Air Force Scientific Advisory Board. He has delivered numerous keynote addresses and is coauthor of the text Signals and Systems (Englewood Cliffs, NJ: Prentice-Hall, 1997). His research interests are in the development and application of advanced methods of estimation and statistical signal and image processing. Dr. Willsky has received several awards, including the 1975 American Automatic Control Council Donald P. Eckman Award, the 1979 ASCE Alfred Noble Prize, the 1980 IEEE Browder J. Thompson Memorial Award, the IEEE Control Systems Society Distinguished Member Award in 1988, the 2004 IEEE Donald G. Fink Prize Paper Award, and a Doctorat Honoris Causa from the Université de Rennes, France, in 2005.