A Framework for Linear Information Inequalities

Raymond W. Yeung, Senior Member, IEEE

Abstract— We present a framework for information inequalities, namely, inequalities involving only Shannon's information measures, for discrete random variables. A region in IR^(2^n − 1), denoted by Γ*_n, is identified to be the origin of all information inequalities involving n random variables in the sense that all such inequalities are partial characterizations of Γ*_n. A product of this framework is a simple calculus for verifying all unconstrained and constrained linear information identities and inequalities which can be proved by conventional techniques. These include all information identities and inequalities of such types in the literature. As a consequence of this work, most identities and inequalities involving a definite number of random variables can now be verified by a software package called ITIP which is available on the World Wide Web. Our work suggests the possibility of the existence of information inequalities which cannot be proved by conventional techniques. We also point out the relation between Γ*_n and some important problems in probability theory and information theory.

Index Terms— Entropy, I-Measure, information identities, information inequalities, mutual information.

Manuscript received August 10, 1995; revised February 10, 1997. The material in this paper was presented in part at the 1996 IEEE Information Theory Workshop, Haifa, Israel, June 9–13, 1996. The author is with the Department of Information Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong. Publisher Item Identifier S 0018-9448(97)06816-8.

I. INTRODUCTION

SHANNON'S information measures refer to entropies, conditional entropies, mutual informations, and conditional mutual informations. By information inequalities, we refer to those involving only Shannon's information measures for discrete random variables. These inequalities play a central role in converse coding theorems for problems in information theory with discrete alphabets. This paper is devoted to a systematic study of these inequalities. We begin our discussion by examining the two examples below, which exemplify what we call the "conventional" approach to proving such inequalities.

Example 1: This is a version of the well-known data processing theorem. Let X, Y, and Z be random variables such that X - Y - Z forms a Markov chain. Then I(X;Z) ≤ I(Y;Z):

    I(Y;Z) = I(X,Y;Z) − I(X;Z|Y)
           = I(X,Y;Z)
           = I(X;Z) + I(Y;Z|X)
           ≥ I(X;Z).

In the above, the second equality follows from the Markov condition I(X;Z|Y) = 0, while the inequality follows because I(Y;Z|X) is always nonnegative.

Example 2: Consider the inequality

    H(X,Y) − 1.04 H(Y) + 0.7 I(Y;X,Z) + 0.04 H(Y|Z) ≥ 0.

A conventional proof rewrites the left-hand side, for instance as

    H(X|Y) + 0.66 I(Y;Z) + 0.7 I(X;Y|Z),

so that the inequality follows from the nonnegativity of H(X|Y), I(Y;Z), and I(X;Y|Z), respectively.

In the conventional approach, we invoke certain elementary identities and inequalities in the intermediate steps of a proof. Some frequently invoked identities and inequalities are the chain rules for entropy and mutual information, the nonnegativity of conditional entropy and conditional mutual information, and identities such as I(X;Z|Y) = 0 if X - Y - Z forms a Markov chain. Proving an identity or an inequality using the conventional approach can be quite tricky, because it may not be easy to see which elementary identity or inequality should be invoked next. For certain problems, like Example 1, we may rely on our insight to see how we should proceed in the proof. But of course, most of our insight in problems is developed from hindsight. For other problems like, or even more complicated than, Example 2 (which involves only three random variables), it may not be easy at all to work out a proof by brute force. The proof of information inequalities can be facilitated by the use of information diagrams^1 [25].
However, the use of such diagrams becomes very difficult when the number of random variables is more than four.

^1 It was called an I-diagram in [25], but we prefer to call it an information diagram to avoid confusion with an eye diagram in communication theory.

In the conventional approach, elementary identities and inequalities are invoked in a sequential manner. In the new framework that we shall develop in this paper, all identities and inequalities are considered simultaneously.

Before we proceed any further, we would like to make a few remarks. Let f and g be any expressions depending only on Shannon's information measures. We shall call them information expressions, and specifically linear information expressions if they are linear combinations of Shannon's information measures. Likewise, we shall call inequalities involving only Shannon's information measures information inequalities. Now f ≥ g if and only if f − g ≥ 0. Therefore, if for any expression we can determine whether it is always nonnegative, then we can determine whether any particular inequality always holds. We note that f = g if and only if f ≥ g and g ≥ f. Therefore, it suffices to study inequalities.

The rest of the paper is organized as follows. In the next section, we first give a brief review of the I-Measure [25] on which a few proofs will be based. In Section III, we introduce the canonical form of an information expression and discuss its uniqueness. We also define a region called Γ*_n which is central to the discussion in this paper. In Section IV, we present a simple calculus for verifying information identities and inequalities which can be proved by conventional techniques. In Section V, we further elaborate the significance of Γ*_n by pointing out its relations with some important problems in probability theory and information theory. Concluding remarks are given in Section VI.

II. REVIEW OF THE THEORY OF I-MEASURE

In this section, we give a review of the main results regarding the I-Measure. For a detailed discussion of the I-Measure, we refer the reader to [25]. Further results on the I-Measure can be found in [7].

Let X_1, X_2, ..., X_n be jointly distributed discrete random variables, and let X̃_i be a set variable corresponding to the random variable X_i. Define the universal set to be

    Ω = X̃_1 ∪ X̃_2 ∪ ... ∪ X̃_n    (1)

and let F_n be the σ-field generated by {X̃_1, ..., X̃_n}. The atoms of F_n have the form ∩_{i=1}^n Y_i, where Y_i is either X̃_i or its complement X̃_i^c. Let A be the set of all atoms of F_n except for A_0 = ∩_{i=1}^n X̃_i^c, which is empty by construction because of (1). Note that |A| = 2^n − 1. To simplify notation, we shall use N_n to denote the index set {1, 2, ..., n} and X_G to denote (X_i, i ∈ G) for a nonempty G ⊆ N_n; likewise, X̃_G denotes ∪_{i ∈ G} X̃_i.

It was shown in [25] that there exists a unique signed measure μ* on F_n which is consistent with all Shannon's information measures via the following formal substitution of symbols: H, I → μ*; "," → ∪; ";" → ∩; "|" → −, i.e., for any (not necessarily disjoint) nonempty G, G' ⊆ N_n and any (possibly empty) G'' ⊆ N_n,

    μ*(X̃_G ∩ X̃_{G'} − X̃_{G''}) = I(X_G; X_{G'} | X_{G''}).    (2)

When G'' = ∅, we interpret (2) as

    μ*(X̃_G ∩ X̃_{G'}) = I(X_G; X_{G'}).    (3)

When G = G', (2) becomes

    μ*(X̃_G − X̃_{G''}) = H(X_G | X_{G''}).    (4)

When G = G' and G'' = ∅, (2) becomes

    μ*(X̃_G) = H(X_G).    (5)

Thus (2) covers all the cases of Shannon's information measures. In other words, μ* is completely specified by the set of values H(X_G), G a nonempty subset of N_n, namely, all the joint entropies involving X_1, ..., X_n, and it follows from (5) that μ* is the unique measure on F_n which is consistent with all Shannon's information measures.

Fix arbitrary one-to-one mappings which index the nonempty subsets of N_n and the atoms in A by 1, 2, ..., 2^n − 1, and let h and u be the column (2^n − 1)-vectors whose components are the joint entropies H(X_G) and the atom values μ*(A), A ∈ A, respectively. Since each X̃_G is a union of atoms in A, every joint entropy is the sum of the values of μ* on the atoms contained in X̃_G, so we can write h = C_n u, where C_n is a unique (2^n − 1) × (2^n − 1) matrix (independent of the distribution of X_1, ..., X_n). An important characteristic of C_n is that it is invertible [25], so we can also write

    u = C_n^{-1} h.    (11)

Note that in general μ* is not nonnegative. However, if X_1 → X_2 → ... → X_n form a Markov chain, μ* is always nonnegative [7].
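To make the construction of μ* concrete, the following sketch (given here only for illustration; it is not part of [25] or of the ITIP software, and the function names are illustrative) recovers the values of μ* on the atoms of F_n from the joint entropies of a small discrete distribution. It uses only the fact that an atom A_B, B a nonempty subset of N_n, is contained in X̃_G exactly when B intersects G, so that H(X_G) = Σ_{B : B ∩ G ≠ ∅} μ*(A_B); this is the invertible relation h = C_n u above, and solving the system realizes (11). The example assumes numpy is available.

```python
# Illustrative sketch: recover the I-Measure mu* on the atoms of F_n from the
# joint entropies of a discrete distribution, for small n.  An atom A_B lies
# inside X~_G iff B intersects G, so H(X_G) = sum_{B ∩ G ≠ ∅} mu*(A_B); the
# 0-1 matrix built below is the matrix C_n of Section II, and solving the
# system gives u = C_n^{-1} h as in (11).
from itertools import chain, combinations
import numpy as np

def nonempty_subsets(n):
    return [frozenset(s) for s in chain.from_iterable(
        combinations(range(n), r) for r in range(1, n + 1))]

def joint_entropy(pmf, axes_to_keep):
    """Entropy (in bits) of the marginal of `pmf` on the given axes."""
    drop = tuple(i for i in range(pmf.ndim) if i not in axes_to_keep)
    marg = pmf.sum(axis=drop) if drop else pmf
    p = marg.ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def i_measure(pmf):
    """Return {atom index set B: mu*(A_B)} for the distribution `pmf`."""
    n = pmf.ndim
    subsets = nonempty_subsets(n)
    h = np.array([joint_entropy(pmf, G) for G in subsets])
    C = np.array([[1.0 if B & G else 0.0 for B in subsets] for G in subsets])
    u = np.linalg.solve(C, h)
    return dict(zip(subsets, u))

# Example: for three pairwise-independent bits with X3 = X1 XOR X2, the atom
# X~_1 ∩ X~_2 ∩ X~_3 has mu* = I(X1;X2;X3) = -1 bit, illustrating that mu* is
# in general a signed measure.
pmf = np.zeros((2, 2, 2))
for x1 in (0, 1):
    for x2 in (0, 1):
        pmf[x1, x2, x1 ^ x2] = 0.25
mu = i_measure(pmf)
print(round(mu[frozenset({0, 1, 2})], 6))   # -> -1.0
```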
As a consequence of the theory of the I-Measure, the information diagram was introduced as a tool to visualize the relationship among information measures [25]. Applications of information diagrams can be found in [7], [25], and [26].

III. THE CANONICAL FORM

In the rest of the paper, we shall assume that X_1, X_2, ..., X_n are the random variables involved in our discussion. We observe that conditional entropies, mutual informations, and conditional mutual informations can be expressed as linear combinations of joint entropies by using the following identity:

    I(X_G; X_{G'} | X_{G''}) = H(X_{G ∪ G''}) + H(X_{G' ∪ G''}) − H(X_{G ∪ G' ∪ G''}) − H(X_{G''})    (12)

where G and G' are nonempty subsets of N_n and G'' ⊆ N_n may be empty (with H(X_∅) = 0). Thus any information expression can be expressed in terms of the joint entropies. We call this the canonical form of an information expression.

Now for any X_1, X_2, ..., X_n, their joint entropies correspond to a vector h in IR^(2^n − 1), where we regard the 2^n − 1 joint entropies H(X_G), G a nonempty subset of N_n, as the coordinates of IR^(2^n − 1). On the other hand, a vector h in IR^(2^n − 1) is said to be constructible if there exist random variables X_1, ..., X_n whose joint entropies are given by h. We are then motivated to define

    Γ*_n = {h ∈ IR^(2^n − 1) : h is constructible}.

As we shall see, Γ*_n not only gives a complete characterization of all information inequalities, but is also closely related to some important problems in probability theory and information theory. Thus a complete characterization of Γ*_n is of fundamental importance. To our knowledge, there has not been such a characterization in the literature (see Section V).

Now every information expression can be expressed in canonical form. A basic question to ask is in what sense the canonical form is unique. Toward this end, we shall first establish the following theorem.

Theorem 1: Let f be a measurable function on IR^(2^n − 1) such that the set {h : f(h) = 0} has zero Lebesgue measure. Then f cannot be identically zero on Γ*_n.

We shall need the following lemma, which is immediate from the discussion in [26, Sec. 6]. The proof is omitted here.

Lemma 1: Let Ψ*_n = {u : u = C_n^{-1} h, h is constructible} (cf. (11)). Then the first quadrant of IR^(2^n − 1) is a subset of Ψ*_n.

Proof of Theorem 1: If Γ*_n has positive Lebesgue measure, then since {h : f(h) = 0} has zero Lebesgue measure, and hence {h ∈ Γ*_n : f(h) = 0} has zero Lebesgue measure, the set {h ∈ Γ*_n : f(h) ≠ 0} has positive Lebesgue measure. Then Γ*_n cannot be a subset of {h : f(h) = 0}, which implies that f cannot be identically zero on Γ*_n. Thus it suffices to prove that Γ*_n has positive Lebesgue measure. Using the above lemma, we see that the first quadrant of IR^(2^n − 1), which has positive Lebesgue measure, is a subset of Ψ*_n. Therefore Ψ*_n has positive Lebesgue measure. Since Γ*_n is the image of Ψ*_n under an invertible linear transformation, its Lebesgue measure must also be positive. This proves the theorem.

The uniqueness of the canonical form for very general classes of information expressions follows from this theorem. For example, suppose f and g are two polynomials of the joint entropies such that f(h) = g(h) for all h ∈ Γ*_n. Let k = f − g. If k is not the zero function, then {h : k(h) = 0} has zero Lebesgue measure. By the theorem, k cannot be identically zero on Γ*_n, which is a contradiction. Therefore k is the zero function, i.e., f = g. Thus we see that the canonical form is unique for polynomial information expressions. We note that the uniqueness of the canonical form for linear information expressions has been discussed in [4] and [2, p. 51, Theorem 3.6].

The importance of the canonical form will become clear in the next section. An application of the canonical form to recognizing the symmetry of an information expression will be discussed in Appendix II-A. We note that any invertible linear transformation of the joint entropies can be used for the purpose of defining the canonical form; a small sketch of the conversion to the joint-entropy form used here is given below.
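The following sketch (illustrative only; the encoding of terms and the function name are not from the paper) converts a linear information expression, given as a list of coefficients on terms of the form I(X_G; X_{G'} | X_{G''}), into canonical form by applying (12) term by term. Entropies and conditional entropies are covered by taking G = G'.

```python
# Illustrative sketch: put a linear information expression into canonical
# form, i.e., express it as a coefficient map over the 2^n - 1 joint
# entropies, by applying identity (12) term by term.  A term with coefficient
# c is encoded as (c, G, G', G'') and stands for c * I(X_G; X_G' | X_G'');
# entropies are covered by taking G = G' (and G'' empty for unconditional
# ones).
from collections import defaultdict

def canonical(terms):
    """terms: iterable of (coeff, G, Gp, Gpp); returns {nonempty subset: coeff}."""
    coef = defaultdict(float)
    for c, G, Gp, Gpp in terms:
        G, Gp, Gpp = frozenset(G), frozenset(Gp), frozenset(Gpp)
        for s, sign in ((G | Gpp, +1), (Gp | Gpp, +1),
                        (G | Gp | Gpp, -1), (Gpp, -1)):
            if s:                       # H(X_empty) = 0 contributes nothing
                coef[s] += sign * c
    return {s: v for s, v in coef.items() if abs(v) > 1e-12}

# Example: I(X1;X3|X2) + I(X1;X2) - H(X1) has canonical form
# H(X2,X3) - H(X1,X2,X3), i.e., -H(X1|X2,X3); indices 0, 1, 2 stand for
# X1, X2, X3.
expr = [(1.0, {0}, {2}, {1}),   # I(X1;X3|X2)
        (1.0, {0}, {1}, ()),    # I(X1;X2)
        (-1.0, {0}, {0}, ())]   # -H(X1)
print(canonical(expr))  # -> {frozenset({1, 2}): 1.0, frozenset({0, 1, 2}): -1.0}
```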
Nevertheless, the current definition of the canonical form has the advantage that if and are two sets of random variables such that , then the joint entropies involving the random variables in is a subset of the joint entropies involving the random variables in . IV. A CALCULUS FOR VERIFYING LINEAR IDENTITIES AND INEQUALITIES In this section, we shall develop a simple calculus for verifying all linear information identities and inequalities involving a definite number of random variables which can be proved by conventional techniques. All identities and inequalities in this section are assumed to be linear unless otherwise specified. Although our discussion will primarily be on linear identities and inequalities (possibly with linear constraints), our approach can be extended naturally to nonlinear cases. For nonlinear cases, the amount of computation required is larger. The question of what linear combinations of entropies are always nonnegative was first raised by Han [5]. A. Unconstrained Identities Due to the uniqueness of the canonical form for linear information expressions as discussed in the preceding section, it is easy to check whether two expressions and are identical. All we need to do is to express in canonical and are form. If all the coefficients are zero, then identical, otherwise they are not. B. Unconstrained Inequalities Since all information expressions can be expressed in canonical form, we shall only consider inequalities in this form. The following is a simple yet fundamental observation which apparently has not been discussed in the literature. For any , always holds if and only if . This observation, which follows immediately from the definition of , gives a complete characterization of all unconstrained inequalities (not necessary linear) in terms of . From this point of view, an unconstrained inequality is simply a partial characterization of . The nonnegativity of all Shannon’s information measures form a set of inequalities which we shall refer to as the basic YEUNG: A FRAMEWORK FOR LINEAR INFORMATION INEQUALITIES 1927 inequalities. We observe that in the conventional approach to proving information inequalities, whenever we establish an inequality in an intermediate step, we invoke one of the basic inequalities. Therefore, all information inequalities and conditional information identities which can be proved by conventional techniques are consequences of the basic inequalities. These inequalities, however, are not nonredundant. For example, and , which are both and , imply basic inequalities of the random variables again a basic inequality of and . We shall be dealing with linear combinations whose coefficients are nonnegative. We call such linear combinations nonnegative linear combinations. We observe that any Shannon’s information measure can be expressed as a nonnegative linear combination of the following two elemental forms of Shannon’s information measures: i) ii) , where and . This can be done by successive (if necessary) application(s) of the following identities: (13) (14) (15) (16) (17) (18) (Note that all the coefficients in the above identities are nonnegative.) It is easy to check that the total number of Shannon’s information measures of the two elemental forms is equal to (19) The nonnegativity of the two elemental forms of Shannon’s information measures form a proper subset of the set of basic inequalities. We call the inequalities in this smaller set the elemental inequalities. 
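The elemental forms are easy to enumerate mechanically. The sketch below (again illustrative, not part of ITIP) lists the triples (G, G', G'') of the two elemental forms for a given n and confirms the count in (19); each triple can be passed to the `canonical` sketch of Section III to obtain one row of the matrix of elemental inequalities used in the next subsection.

```python
# Illustrative sketch: enumerate the two elemental forms of Shannon's
# information measures for n random variables and check that their number
# equals n + C(n,2) * 2^(n-2), as stated in (19).
from itertools import chain, combinations
from math import comb

def all_subsets(items):
    items = list(items)
    return [frozenset(s) for s in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def elemental_forms(n):
    """Yield (G, G', G'') for H(X_i | X_{N-i}) and I(X_i; X_j | X_K)."""
    N = frozenset(range(n))
    for i in range(n):                      # form (i): H(X_i | X_{N - {i}})
        yield frozenset({i}), frozenset({i}), N - {i}
    for i, j in combinations(range(n), 2):  # form (ii): I(X_i; X_j | X_K)
        for K in all_subsets(N - {i, j}):
            yield frozenset({i}), frozenset({j}), K

for n in (2, 3, 4, 5):
    m = sum(1 for _ in elemental_forms(n))
    assert m == n + comb(n, 2) * 2 ** (n - 2)
    print(n, m)   # n, m: 2 3 / 3 9 / 4 28 / 5 85
```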
They are equivalent to the basic inequalities because each basic inequality which is not an elemental inequality can be obtained by adding a certain set of elemental inequalities in view of (13)–(18). The minimality of the elemental inequalities is proved in Appendix I. If the elemental inequalities are expressed in canonical form, then they become linear inequalities in . Denote this set of inequalities by , where is an matrix, and define (20) Since the elemental inequalities are satisfied by any , we have . Therefore, if then i.e., always holds. Let , be the column -vector whose th component is equal to and all the other components are equal to . Since a joint entropy can be expressed as a nonnegative linear combination of the two elemental forms of Shannon’s information measures, each can be expressed as a nonnegative linear combination of the rows of . This implies that is a pyramid in the positive quadrant. Let be any column -vector. Then ,a linear combination of joint entropies, is always nonnegative if . This is equivalent to say that the minimum of the problem (Primal) subject to Minimize is zero. Since gives ( is the only corner of ), all we need to do is to apply the optimality test of the simplex method [19] to check whether the point is optimal. We can obtain further insight in the problem from the Duality Theorem in linear programming [19]. The dual of the above linear programming problem is (Dual) Maximize subject to and where By the Duality Theorem, the maximum of the dual problem is also zero. Since the cost function in the dual problem is zero, the maximum of the dual problem is zero if and only if the feasible region and (21) is nonempty. Theorem 2: is nonempty if and only if for some , where is a column -vector, i.e., is a nonnegative linear combination of the rows of . is nonempty if Proof: We omit the simple proof that and only if for some , where is a column -vector. Let (22) If for some , then . Let Since can be expressed as a nonnegative linear combination of the rows of (23) can also be expressed as a nonnegative linear combinations of the rows of . By (22), this implies for some . Thus always holds (subject to ) if and only if it is a nonnegative linear combination of the elemental inequalities (in canonical form). 1928 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 43, NO. 6, NOVEMBER 1997 We now summarize the results in this section. For information expressions and , let be the cost function subject to the elemental inequalities. Then apply the optimality test of the simplex method to the point . If is optimal, then always holds. If not, then may or may not always hold. If it always holds, it is not implied by the elemental inequalities. In other words, it cannot be proved by conventional techniques, namely, invoking the elemental inequalities. Han has previously studied unconstrained information inequalities involving three random variables [5] as well as information inequalities which are symmetrical in all the random variables involved [6], and explicit characterizations of such inequalities were obtained. A discussion of these results is found in Appendix II. C. Constrained Inequalities Linear constraints on arise frequently in information theory. Some examples are 1) , , and are mutually independent if and only if 2) , , and are pairwise-independent if and only if When is a subspace of , we can easily modify the method in the last subsection by taking advantage of the linear structure of the problem. 
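For concreteness, the unconstrained test of Section IV-B may be sketched as follows (this is an illustrative sketch, not the actual ITIP implementation; a constrained variant appears later). Rather than running the simplex optimality test at the origin, it checks the equivalent dual condition of Theorem 2, namely that the canonical coefficient vector b is a nonnegative combination of the rows of the matrix of elemental inequalities, as a linear-programming feasibility problem. It reuses `nonempty_subsets`, `canonical`, and `elemental_forms` from the earlier sketches; the use of scipy's linprog is an assumption of convenience, and any LP solver would do.

```python
# Illustrative sketch (not the ITIP implementation): test whether a linear
# inequality  b . h >= 0, given in canonical form, is implied by the
# elemental inequalities  G h >= 0.  By the duality argument of Section IV-B
# (Theorem 2), this is the case iff b = G^T y for some y >= 0.
import numpy as np
from scipy.optimize import linprog

def vec(coef, cols):
    """Coefficient dict -> vector over a fixed ordering of nonempty subsets."""
    return np.array([coef.get(s, 0.0) for s in cols])

def elemental_matrix(n, cols):
    return np.array([vec(canonical([(1.0, g, gp, gpp)]), cols)
                     for g, gp, gpp in elemental_forms(n)])

def implied_by_elemental(b_terms, n):
    cols = nonempty_subsets(n)
    G = elemental_matrix(n, cols)
    b = vec(canonical(b_terms), cols)
    res = linprog(c=np.zeros(G.shape[0]), A_eq=G.T, b_eq=b,
                  bounds=(0, None), method="highs")
    return res.status == 0      # feasible <=> a Shannon-type inequality

# Example 2 of Section I: H(X,Y) - 1.04 H(Y) + 0.7 I(Y;X,Z) + 0.04 H(Y|Z) >= 0
# (indices 0, 1, 2 stand for X, Y, Z; H(.) is written as I(.;.) with G = G').
expr = [(1.0, {0, 1}, {0, 1}, ()),     # H(X,Y)
        (-1.04, {1}, {1}, ()),         # -1.04 H(Y)
        (0.7, {1}, {0, 2}, ()),        # 0.7 I(Y; X,Z)
        (0.04, {1}, {1}, {2})]         # 0.04 H(Y|Z)
print(implied_by_elemental(expr, 3))   # -> True, agreeing with ITIP
```

When the check returns False, the inequality is not implied by the elemental inequalities; as discussed in Section IV-B, it may or may not hold in general.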
Let the constraints on be given by (24) is a matrix (i.e., there are constraints). where Following our discussion in the last subsection, a linear combination of joint entropies is always nonnegative under the constraint if the minimum of the problem Minimize subject to is zero. Let be the rank of we can write . Since and is in the null space of , (25) matrix whose columns form a basis where is a of the orthogonal complement of the row space of , and is a column -vector. Then the elemental inequalities can be expressed as (26) 3) 4) is a function of - - - - - - if and only if . form a Markov chain if and only if and . In order to facilitate our discussion, we now introduce an alternative set of notations for . We do not distinguish elements and singletons of , and we write unions of subsets of as juxtapositions. For any nonempty , we use to denote , i.e., (refer to Section II for the definition of ). We also define for nonempty and in terms of , (27) (but not necessarily in the positive which is a pyramid in quadrant). Likewise, can be expressed as . With the constraints and all expressions in terms of , is always nonnegative under the constraint if the minimum of the problem Minimize to simplify notations. In general, a constraint is given by a subset instance, for the last example above, of . For When , there is no constraint. (In fact, there is no constraint if .) Parallel to our discussion in the preceding subsection, we have the following more general observation: Under the constraint , for any , always hold if and only if . Again, this gives a complete characterization of all constrained inequalities in terms of . Thus in fact is the origin of all constrained inequalities, with unconstrained inequalities being a special case. In this and the next subsection, however, we shall confine our discussion to the linear case. becomes subject to is zero. Again, since gives ( is the only corner of ), all we need to do is to apply the optimality test of the simplex method to check whether the point is optimal. By imposing the constraints in (24), the number of elemental inequalities remains the same, while the dimension of the problem decreases from to . Again from the Duality Theorem, we see that is always nonnegative if for some , where is a column vector, i.e., is a nonnegative linear combination of the elemental inequalities (in terms of ). We now summarize the results in this section. Let the constraints be given in (24). For expressions and , let . Then let be the cost function subject to the elemental inequalities (in terms of ) and apply the optimality test to the point . If is optimal, then always holds, otherwise it may or may not always hold. If it always holds, it is not implied by the elemental inequalities. In other words, it cannot be proved by conventional techniques. YEUNG: A FRAMEWORK FOR LINEAR INFORMATION INEQUALITIES 1929 D. Constrained Identities V. FURTHER DISCUSSION We impose the constraints in (24) as in the last subsection. As we have pointed out at the beginning of the paper, two information expressions and are identical if and only if and always hold. Thus we can apply the method in the last subsection to verify all constrained identities that can be proved by conventional techniques. When are unconstrained, the uniqueness of the canonical form for linear information expressions asserts that if and only if . However, when the constraints are imposed, does not imply in (24) on . We give a simple example to illustrate this point. Suppose and we impose the constraint . 
Then every information expression can be expressed in terms of the two remaining coordinates. Now consider the expression in (28). Note that the coefficients in the expression are nonzero. But from the elemental inequalities, we have (29) and (30), which imply that the expression vanishes identically under the constraint. We now discuss a special application of the method described in this subsection. Let us consider the following problem which is typical in probability theory. Suppose we are given that certain subsets of the random variables form Markov chains, and that two of the random variables are independent. We ask whether two other random variables are always independent. This problem can be formulated in information-theoretic terms with the constraints represented by setting the corresponding conditional mutual informations and the mutual information to zero, and we want to know whether these constraints imply that the remaining mutual information is zero. Problems of this kind can be handled by the method described in this subsection. Our method can prove any independence relation which can be proved by conventional information-theoretic techniques. The advantage of using an information-theoretic formulation of the problem is that we can avoid manipulating the joint distribution directly, which is awkward [8], if not difficult. It may be difficult to devise a calculus to handle independence relations of random variables in a general setting,^2 because an independence relation is "discrete" in the sense that it is either true or false. On the other hand, the problem becomes a continuous one if it is formulated in information-theoretic terms (because mutual informations are continuous functionals), and continuous problems are in general less difficult to handle. From this point of view, the problem of determining independence of random variables is a discrete problem embedded in a continuous problem.

^2 A calculus for independence relations has been devised by Massey [9] for the special case when the random variables have a causal interpretation.

V. FURTHER DISCUSSION ON Γ*_n

We have seen that Γ*_n ⊆ Γ_n, but it is not clear whether Γ*_n = Γ_n. If so, Γ*_n and hence all information inequalities are completely characterized by the elemental inequalities. In the following, we shall use the notations Γ*_n and Γ_n when we refer to them for a specific n. For n = 2, in I-Measure notation the elemental inequalities are the nonnegativity of the three atoms, and it then follows from Lemma 1 that Γ*_2 = Γ_2.

Inspired by the current work, the characterization of Γ*_3 and Γ*_4 has recently been investigated by Zhang and Yeung. They have found that Γ*_3 ≠ Γ_3 (therefore Γ*_n ≠ Γ_n in general), but that the closure of Γ*_3 is equal to Γ_3 [29]. This implies that all unconstrained (linear or nonlinear) inequalities involving three random variables are consequences of the elemental inequalities of the same set of random variables. However, it is not clear whether the same is true for all constrained inequalities. They also have discovered a conditional inequality involving four random variables which is not implied by the elemental inequalities: under two vanishing mutual-information conditions, a certain linear information inequality holds (30). If, in addition, a further quantity vanishes, the inequality implies a conditional independence relation which is not implied by the elemental inequalities. However, whether the closure of Γ*_4 equals Γ_4 remained an open problem. Subsequently, they have determined that it does not, by discovering an unconstrained inequality involving four random variables which is not implied by the elemental inequalities of the same set of random variables [30]. The existence of these two inequalities indicates that there may be many information inequalities yet to be discovered. Since most converse coding theorems are proved by means of information inequalities, it is plausible that some of these inequalities yet to be discovered are needed to settle certain open problems in information theory.
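As a concrete illustration of the constrained test of Sections IV-C and IV-D, the sketch below (again illustrative, not the ITIP implementation) adds equality constraints Q h = 0 and checks the corresponding dual condition: b must be a nonnegative combination of the elemental inequalities plus an arbitrary linear combination of the constraints (LP duality, as in Section IV-C). It reuses the helper functions of the earlier sketches, and it reproduces the verification of Example 1 shown in the ITIP session later in the paper.

```python
# Illustrative sketch (not the ITIP implementation): the constrained test of
# Section IV-D.  With linear constraints Q h = 0, the inequality b . h >= 0
# is provable from the elemental inequalities iff b = G^T y + Q^T z for some
# y >= 0 and unrestricted z.
import numpy as np
from scipy.optimize import linprog

def implied_under_constraints(b_terms, constraints, n):
    """constraints: list of expressions (term lists) that are set to zero."""
    cols = nonempty_subsets(n)
    G = elemental_matrix(n, cols)
    Q = np.array([vec(canonical(t), cols) for t in constraints])
    b = vec(canonical(b_terms), cols)
    A_eq = np.hstack([G.T, Q.T])
    bounds = [(0, None)] * G.shape[0] + [(None, None)] * Q.shape[0]
    res = linprog(c=np.zeros(A_eq.shape[1]), A_eq=A_eq, b_eq=b,
                  bounds=bounds, method="highs")
    return res.status == 0

# Example 1 of Section I, i.e., the ITIP call
#   ITIP('I(Y;Z) >= I(X;Z)', 'I(X;Z|Y) = 0'),
# with indices 0, 1, 2 standing for X, Y, Z.
claim = [(1.0, {1}, {2}, ()), (-1.0, {0}, {2}, ())]   # I(Y;Z) - I(X;Z)
markov = [[(1.0, {0}, {2}, {1})]]                     # I(X;Z|Y) = 0
print(implied_under_constraints(claim, markov, 3))    # -> True
```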
In the remainder of the section, we shall further elaborate on the significance of by pointing out its relations with some important problems in probability theory and information theory. 1930 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 43, NO. 6, NOVEMBER 1997 A. Conditional Independence Relations For any fixed number of random variables, a basic question is what sets of conditional independence relations are possible. In the recent work of Matúš and Studený [17], this problem and let is formulated as follows. Recall that be the family of all couples where and is the union of two, not necessarily different, singletons and of . Having a system of random variables with subsystems , , we introduce the notation where is the abbreviation of the statement “ is conditionally independent of given .” For , means is determined by . The subsystem is presumed to be constant. A subfamily is called probabilistically ( -) representable if there exists a system , . The problem is called a -representation, such that of all -representable relations. to characterize the class Note that this problem is more general than the application discussed in Section IV-D. is equivalent to . If Now is a proper subset of , i.e., is not of can be written as a nonnegative elemental form, then combination of the corresponding elemental forms of Shannon’s information measure. We observe that if and only if each of the corresponding elemental forms of Shannon’s information measures vanishes, and that an elemental form of Shannon’s information measure vanishes if and only if the corresponding conditional independence relation holds. Thus it is actually unnecessary to consider , , for separately because it is determined by the other conditional independence relations. Let us now look at some examples. For and statement: and Subsequent work on this subject has been done by Pearl and his collaborators in the 1980’s, and their work is summarized in the book by Pearl [18]. Their work has mainly been motivated by the study of the logic of integrity constraints from databases. Pearl conjectured that Dawid’s four axioms completely characterize the conditional independence structure of any joint distribution. This conjecture, however, was refuted by the work of Studený [20]. Since then, Matúš and Studený have written a series of papers on this problem [10]–[17], [20]–[24]. So far, they have solved the problem for three random variables, but the problem for four random variables remains open. The relation between this problem and is the following. Suppose we want to determine whether a subfamily of is -representable. Now each corresponds to setting to zero in . Note that is a hyperplane containing the origin in . Thus is representable if and only if there exists a in such that for all . Therefore, the problem of conditional independence relations is a subproblem of the problem of characterizing . B. Optimization of Information Quantities Consider minimizing given (31) (32) (33) where . This problem is equivalent to the following minimization problem. Minimize subject to As pointed out in the last paragraph, the couples are actually redundant. Let be a such that is not system of random variables , , and and are not deterministic, functions of each other. Then it is easy to see that Thus is -representable. is not On the other hand, and imply . 
representable, because The recent studies on the problem of conditional independence relations was launched by a seminal paper by Dawid [3], in which he proposed four axioms as heuristic properties of conditional independence. In information-theoretic terms, these four axioms can be summarized by the following (34) (35) (36) and (37) As no characterization of is available, this minimization problem cannot be solved. Nevertheless, since , if we replace by in the above minimization problem, it becomes a linear programming problem which renders a lower bound on the solution. C. Multiuser Information Theory The framework for information inequalities developed in this paper provides new tools for problems in multiuser information theory. Consider the source coding problem in Fig. 1, in which and are source random variables, YEUNG: A FRAMEWORK FOR LINEAR INFORMATION INEQUALITIES 1931 Prover) has been developed by Yeung and Yan [27], and it is available on World Wide Web. The following session from ITIP contains verifications of Example 1 and 2, respectively, in Section I. >> ITIP(’I(Y; Z) >= I(X; Z)’, ’I(X; Z|Y) = 0’) True >> ITIP(’H(X,Y) - 1.04 H(Y) + 0.7 I(Y; X,Z) + 0.04 H(Y|Z) >= 0’) True Fig. 1. A multiterminal source coding problem. and the blocks on the left and right are encoders and decoders, respectively. The random variables , , and are the outputs of the corresponding encoders. Given , , and , where and , we are interested in the admissible region of the triple . Evidently, , , and give the number of bits needed for the encoders. From the encoding and decoding requirements, we immediately have , , , , , and equal to zero. Now there are five random variables involved in this problem. Then the intersection of and the set containing all such that is the set of all possible vectors of the joint entropies involving given that they satisfy the encoding and decoding requirements of the problem as well as the constraints on the joint entropies involving and . Then is given as the projection of this set on the coordinates , , and . In the same spirit as that in the last subsection, an explicit outer bound of , denoted by , is given by replacing by . We refer to an outer bound such as as an LP (linear programming) bound. This is a new tool for proving converse coding theorems for problems in multiuser information theory. The LP bound already has found applications in the recent work of Yeung and Zhang [28] on a new class of multiterminal source coding problems. We expect that this approach will have impact on other problems in multiuser information theory. VI. CONCLUDING REMARKS We have identified the region as the origin all information inequalities. Our work suggests the possibility of the existence of information inequalities which cannot be proved by conventional techniques, and this has been confirmed by the recent results of Zhang and Yeung [29], [30]. A product from the framework we have developed is a simple calculus for verifying all linear information inequalities involving a definite number of random variables possibly with linear constraints which can be proved by conventional techniques; these include all inequalities of such type in the literature. Based on this calculus, a software running on MATLAB called ITIP (Information-Theoretic Inequality We see from (19) that the amount of computation required is moderate when . Our work gives a partial answer to Han’s question of what linear combinations of entropies are always nonnegative [5]. 
A complete answer to this question is impossible without further characterization of . is a very fundamental problem The characterization of in information theory. However, in view of the difficulty of some special cases of this problem [15], [17], [29], [30], it is not very hopeful that this problem can be solved completely in the near future. Nevertheless, partial characterizations of may lead to the discovery of some new inequalities which make the solutions of certain open problems in information theory possible. APPENDIX I MINIMALITY OF THE ELEMENTAL INEQUALITIES The elemental inequalities in set-theoretic notations have one of the following two forms: ; 1) , where and . 2) They will be referred to as -inequalities and -inequalities, respectively. We are to show that all the elemental inequalities are nonredundant, i.e., none of them is implied by the others. For an -inequality (38) since it is the only elemental inequality which involves the , it is clearly not implied by the other atom elemental inequalities. Therefore, we only need to show that all -inequalities are nonredundant. To show that a -inequality is nonredundant, it suffices to show that there exists a measure on which satisfies all other elemental inequalities except for that one. We shall show that the -inequality (39) is nonredundant. To facilitate our discussion, we denote by and we let be the atoms in , where (40) We first consider the case when . We construct a measure if otherwise where . In other words, , i.e., by (41) is the only 1932 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 43, NO. 6, NOVEMBER 1997 atom with measure ; all other atoms have measure . Then is trivially true. It is also trivial to Consider check that for any (42) such that and for any (43) . On the other hand, if is a proper if , then contains at least subset of two atoms, and therefore, (44) This completes the proof for the -inequality in (39) to be nonredundant when . We now consider the case when , or . We construct a measure as follows. For the atoms in , let . For atom of we let (51) and The nonnegativity of the second term above follows from (46). For the first term, is nonempty if and only if and (52) If this condition is not satisfied, then the first term in (51) , and (50) follows immediately. becomes Let us assume that the condition in (52) is satisfied. Then by simple counting, we see that the number atoms in is equal to , where (45) For example, for , if is odd, it is referred to as an odd atom of , and if is even, it is referred to as an even . For any atom , (46) This completes the construction of . We first prove that , there are atoms in namely, where or for We first consider the case when . We check that , i.e., (47) Consider Then where the last equality follows from the binomial formula (48) for . This proves (47). Next we prove that satisfies all that for any , the atom . Thus -inequalities. We note is not in (49) It remains to prove that for (39), i.e., for any and satisfies all -inequalities except such that (50) contains exactly one atom. If this atom is an even atom of , then the first term in (51) is either or (cf., (45)), and (50) follows immediately. If this atom is an odd atom of , then the first term in (51) is equal to . This happens if and only if and have one common element, which implies that is nonempty. Therefore, the second term in (51) is at least , and hence (50) follows. Finally, we consider the case when . 
Using the binomial formula in (48), we see that the number of odd atoms and even atoms of in are the same. Therefore, the first term in (51) is equal to if and is equal to otherwise. The former is true if and only if , which implies that YEUNG: A FRAMEWORK FOR LINEAR INFORMATION INEQUALITIES is nonempty, or that the second term is at least . Thus in either case (50) is true. This completes the proof that (39) is nonredundant. APPENDIX II SOME SPECIAL FORMS OF UNCONSTRAINED INFORMATION INEQUALITIES In this appendix, we shall discuss some special forms of unconstrained linear information inequalities previously investigated by Han [5], [6]. Explicit necessary and sufficient conditions for these inequalities to always hold have been obtained. The relation between these inequalities and the results in the current paper will also be discussed. 1933 It follows trivially from the elemental inequalities that is a sufficient condition for to always hold. The necessity of this condition can be seen by noting the existence of random variables for such that and each for all and . This implies that all unconstrained linear symmetrical information inequalities are consequences of the elemental inequalities. We refer the reader to [5] for a more detailed discussion of symmetrical information inequalities. B. Information Inequalities Involving Three Random Variables Consider . Let A. Symmetrical Information Inequalities An information expression is said to be symmetrical if it is identical under every permutation among . For example, for , the expression is symmetrical. This can be seen by permuting and symbolically in the expression. Now let us consider the expression . If we replace and by each other, the expression becomes , which is symbolically different from the original expression. However, both expression are identical to . Therefore, the two expressions are in fact identical, and the expression is actually symmetrical although it is not readily recognized symbolically. The symmetry of an information expression in general cannot be recognized symbolically. However, it is readily recognized symbolically if the expression is in canonical form. This is due to the uniqueness of the canonical form as discussed in Section III. Consider a linear symmetrical information expression (in canonical form). As seen in Section IV-B, can be expressed as a linear combination of the two elemental forms of Shannon’s information measures. It was shown in [5] that every symmetrical expression can be written in the form and and let Since is an invertible linear transformation of , all linear information expression can be written as , where It was shown in [6] that always holds if and only if the following conditions are satisfied: (53) In terms of , the elemental inequalities can be expressed as , where where and, for , Note that is the sum of all Shannon’s information mea, is sures of the first elemental form, and for the sum of all Shannon’s information measures of the second elemental form conditioning on random variables. From the discussion in Section IV-B, we see that always holds if and only if is a nonnegative linear combination of the rows of . We leave it as an exercise for the reader to show that is a nonnegative linear combination of the rows of if and only if the conditions in (53) are satisfied. Therefore, all unconditional linear inequalities involving three random variables are consequences of the elemental inequalities. This result also implies that is the smallest pyramid containing . 
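As a small numerical illustration of the three-variable discussion above, the hypothetical `implied_by_elemental` sketch of Section IV-B recognizes a Shannon-type inequality and rejects a relation that is not an inequality at all (the term encoding is that of the earlier sketches).

```python
# Illustrative check: H(X1|X2) >= 0 is implied by the elemental inequalities,
# while I(X1;X2) >= I(X1;X2|X3), i.e., I(X1;X2;X3) >= 0, is not -- indeed the
# XOR distribution of Section II violates it.
shannon_type = [(1.0, {0}, {0}, {1})]                        # H(X1|X2)
not_shannon = [(1.0, {0}, {1}, ()), (-1.0, {0}, {1}, {2})]   # I(X1;X2) - I(X1;X2|X3)
print(implied_by_elemental(shannon_type, 3))   # -> True
print(implied_by_elemental(not_shannon, 3))    # -> False
```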
ACKNOWLEDGMENT

The author wishes to acknowledge the help of a few individuals during the preparation of this paper. They include I. Csiszár, B. Hajek, F. Matúš, Y.-O. Yan, E.-h. Yang, and Z. Zhang.

REFERENCES

[1] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[2] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. New York: Academic, 1981.
[3] A. P. Dawid, "Conditional independence in statistical theory (with discussion)," J. Roy. Statist. Soc., Ser. B, vol. 41, pp. 1–31, 1979.
[4] T. S. Han, "Linear dependence structure of the entropy space," Inform. Contr., vol. 29, pp. 337–368, 1975.
[5] T. S. Han, "Nonnegative entropy measures of multivariate symmetric correlations," Inform. Contr., vol. 36, pp. 133–156, 1978.
[6] T. S. Han, "A uniqueness of Shannon's information distance and related nonnegativity problems," J. Combin., Inform. Syst. Sci., vol. 6, no. 4, pp. 320–331, 1981.
[7] T. Kawabata and R. W. Yeung, "The structure of the I-Measure of a Markov chain," IEEE Trans. Inform. Theory, vol. 38, pp. 1146–1149, May 1992.
[8] J. L. Massey, "Determining the independence of random variables," in 1995 IEEE Int. Symp. on Information Theory (Whistler, BC, Canada, Sept. 17–22, 1995).
[9] J. L. Massey, "Causal interpretations of random variables," in 1995 IEEE Int. Symp. on Information Theory (Special session in honor of Mark Pinsker on the occasion of his 70th birthday) (Whistler, BC, Canada, Sept. 17–22, 1995).
[10] F. Matúš, "Abstract functional dependency structures," Theor. Comput. Sci., vol. 81, pp. 117–126, 1991.
[11] F. Matúš, "On equivalence of Markov properties over undirected graphs," J. Appl. Probab., vol. 29, pp. 745–749, 1992.
[12] F. Matúš, "Ascending and descending conditional independence relations," in Trans. 11th Prague Conf. on Information Theory, Statistical Decision Functions and Random Processes (Academia, Prague, 1992), vol. B, pp. 181–200.
[13] F. Matúš, "Probabilistic conditional independence structures and matroid theory: Background," Int. J. General Syst., vol. 22, pp. 185–196, 1994.
[14] F. Matúš, "Extreme convex set functions with many nonnegative differences," Discr. Math., vol. 135, pp. 177–191, 1994.
[15] F. Matúš, "Conditional independence among four random variables II," Combin., Prob. Comput., to be published.
[16] F. Matúš, "Conditional independence structures examined via minors," Ann. Math. Artificial Intell., submitted for publication.
[17] F. Matúš and M. Studený, "Conditional independence among four random variables I," Combin., Prob. Comput., to be published.
[18] J. Pearl, Probabilistic Reasoning in Intelligent Systems. San Mateo, CA: Morgan Kaufmann, 1988.
[19] G. Strang, Linear Algebra and Its Applications, 2nd ed. New York: Academic, 1980.
[20] M. Studený, "Attempts at axiomatic description of conditional independence," in Proc. Workshop on Uncertainty Processing in Expert Systems, supplement to Kybernetika, vol. 25, nos. 1–3, pp. 65–72, 1989.
[21] M. Studený, "Multiinformation and the problem of characterization of conditional independence relations," Probl. Contr. and Inform. Theory, vol. 18, pp. 3–16, 1989.
[22] M. Studený, "Conditional independence relations have no finite complete characterization," in Trans. 11th Prague Conf. on Information Theory, Statistical Decision Functions and Random Processes (Academia, Prague, 1992), vol. B, pp. 377–396.
[23] M. Studený, "Structural semigraphoids," Int. J. Gen. Syst., submitted for publication.
[24] M. Studený, "Descriptions of structures of stochastic independence by means of faces and imsets (in three parts)," Int. J. Gen. Syst., submitted for publication.
[25] R. W. Yeung, "A new outlook on Shannon's information measures," IEEE Trans. Inform. Theory, vol. 37, pp. 466–474, May 1991.
[26] R. W. Yeung, "Multilevel diversity coding with distortion," IEEE Trans. Inform. Theory, vol. 41, pp. 412–422, Mar. 1995.
[27] R. W. Yeung and Y.-O. Yan, ITIP. [Online]. Available WWW: http://www.ie.cuhk.edu.hk/ITIP.
[28] R. W. Yeung and Z. Zhang, "Multilevel distributed source coding," in 1997 IEEE Int. Symp. on Information Theory (Ulm, Germany, June 1997), p. 276.
[29] Z. Zhang and R. W. Yeung, "A non-Shannon-type conditional information inequality," this issue, pp. 1982–1986.
[30] Z. Zhang and R. W. Yeung, "On the characterization of entropy function via information inequalities," to be published in IEEE Trans. Inform. Theory.