A Framework for Linear Information Inequalities
Raymond W. Yeung, Senior Member, IEEE
Abstract—We present a framework for information inequalities, namely, inequalities involving only Shannon's information measures, for discrete random variables. A region in ℝ^{2^n − 1}, denoted by Γ*, is identified to be the origin of all information inequalities involving n random variables in the sense that all such
inequalities are partial characterizations of Γ*. A product from
this framework is a simple calculus for verifying all unconstrained
and constrained linear information identities and inequalities
which can be proved by conventional techniques. These include
all information identities and inequalities of such types in the
literature. As a consequence of this work, most identities and
inequalities involving a definite number of random variables can
now be verified by a software called ITIP which is available on
the World Wide Web. Our work suggests the possibility of the
existence of information inequalities which cannot be proved by
conventional techniques. We also point out the relation between
Γ* and some important problems in probability theory and
information theory.
Index Terms—Entropy, I-Measure, information identities, information inequalities, mutual information.
I. INTRODUCTION
SHANNON'S information measures refer to entropies, conditional entropies, mutual informations, and conditional
mutual informations. For information inequalities, we refer
to those involving only Shannon’s information measures for
discrete random variables. These inequalities play a central
role in converse coding theorems for problems in information
theory with discrete alphabets. This paper is devoted to a systematic study of these inequalities. We begin our discussion by
examining the two examples below which exemplify what we
call the “conventional” approach to proving such inequalities.
Example 1: This is a version of the well-known data processing theorem. Let X, Y, and Z
be random variables such that
X → Y → Z forms a Markov chain. Then

I(Y; Z) = I(X; Z) + I(Y; Z | X) − I(X; Z | Y)
        = I(X; Z) + I(Y; Z | X)
        ≥ I(X; Z).

In the above, the second equality follows from the Markov
condition I(X; Z | Y) = 0, while the inequality follows because I(Y; Z | X) is
always nonnegative.
Manuscript received August 10, 1995; revised February 10, 1997. The
material in this paper was presented in part at the 1996 IEEE Information
Theory Workshop, Haifa, Israel, June 9–13, 1996.
The author is with the Department of Information Engineering, The Chinese
University of Hong Kong, Shatin, N.T., Hong Kong.
Publisher Item Identifier S 0018-9448(97)06816-8.
Example 2: Suppose we wish to show that for any random variables X, Y, and Z,

H(X,Y) − 1.04 H(Y) + 0.7 I(Y; X,Z) + 0.04 H(Y|Z) ≥ 0.

Writing H(X,Y) = H(Y) + H(X|Y), H(Y|Z) = H(Y) − I(Y;Z), and I(Y; X,Z) = I(Y;Z) + I(X;Y|Z), we obtain

H(X,Y) − 1.04 H(Y) + 0.7 I(Y; X,Z) + 0.04 H(Y|Z)
  = H(X|Y) + 0.66 I(Y;Z) + 0.7 I(X;Y|Z)
  ≥ 0.66 I(Y;Z) + 0.7 I(X;Y|Z)
  ≥ 0.7 I(X;Y|Z)
  ≥ 0.

The inequalities above follow from the nonnegativity of
H(X|Y), I(Y;Z), and I(X;Y|Z), respectively.
In the conventional approach, we invoke certain elementary
identities and inequalities in the intermediate steps of a proof.
Some frequently invoked identities and inequalities are the chain rules for entropy and mutual information, the nonnegativity of all Shannon's information measures, and facts that hold under a Markov condition, such as
I(X; Z | Y) = 0 if X → Y → Z forms a Markov chain, and
I(X; Z) ≤ I(X; Y) if X → Y → Z forms a Markov chain.
Proving an identity or an inequality using the conventional
approach can be quite tricky, because it may not be easy to
see which elementary identity or inequality should be invoked
next. For certain problems, like Example 1, we may rely on
our insight to see how we should proceed in the proof. But of
course, most of our insight in problems is developed from the
hindsight. For other problems like or even more complicated
than Example 2 (which involves only three random variables),
it may not be easy at all to work it out by brute force.
The proof of information inequalities can be facilitated by
the use of information diagrams1 [25]. However, the use of
such diagrams becomes very difficult when the number of
random variables is more than four.
1 It was called an I -diagram in [25], but we prefer to call it an information
diagram to avoid confusion with an eye diagram in communication theory.
In the conventional approach, elementary identities and
inequalities are invoked in a sequential manner. In the new
framework that we shall develop in this paper, all identities
and inequalities are considered simultaneously.
Before we proceed any further, we would like to make a
few remarks. Let f and g be any expressions depending
only on Shannon's information measures. We shall call them
information expressions, and specifically linear information
expressions if they are linear combinations of Shannon's
information measures. Likewise, we shall call inequalities
involving only Shannon's information measures information
inequalities. Now f ≥ g if and only if f − g ≥ 0.
Therefore, if for any expression we can determine whether it
is always nonnegative, then we can determine whether any
particular inequality always holds. We note that f = g if
and only if f ≥ g and g ≥ f. Therefore, it suffices to
study inequalities.
The rest of the paper is organized as follows. In the next
section, we first give a brief review of the I-Measure [25] on
which a few proofs will be based. In Section III, we introduce
the canonical form of an information expression and discuss its
uniqueness. We also define a region called Γ* which is central
to the discussion in this paper. In Section IV, we present
a simple calculus for verifying information identities and
inequalities which can be proved by conventional techniques.
In Section V, we further elaborate on the significance of Γ* by
pointing out its relations with some important problems in
probability theory and information theory. Concluding remarks
are given in Section VI.
II. REVIEW OF THE THEORY OF THE I-MEASURE

In this section, we give a review of the main results
regarding the I-Measure. For a detailed discussion of the I-Measure,
we refer the reader to [25]. Further results on the I-Measure can
be found in [7].

Let X_1, X_2, ..., X_n be jointly distributed discrete random
variables, and let X̃_i be a set variable corresponding to a random
variable X_i. Define the universal set to be Ω = X̃_1 ∪ X̃_2 ∪ ... ∪ X̃_n,
and let F_n be the field generated by X̃_1, X̃_2, ..., X̃_n. The
atoms of F_n have the form Y_1 ∩ Y_2 ∩ ... ∩ Y_n, where each Y_i is either X̃_i or X̃_i^c.
Let A be the set of all atoms of F_n except for X̃_1^c ∩ X̃_2^c ∩ ... ∩ X̃_n^c,
which is empty by construction because

X̃_1^c ∩ X̃_2^c ∩ ... ∩ X̃_n^c = (X̃_1 ∪ X̃_2 ∪ ... ∪ X̃_n)^c = Ω^c = ∅.   (1)

Then |A| = 2^n − 1; we write k = 2^n − 1. To simplify notations, we shall use X_G to denote (X_i, i ∈ G)
and X̃_G to denote ∪_{i ∈ G} X̃_i, and we let N_n = {1, 2, ..., n}. It was
shown in [25] that there exists a unique signed measure μ* on F_n
which is consistent with all Shannon's information measures
via the following formal substitution of symbols:

H, I ↔ μ*;   "," ↔ ∪;   ";" ↔ ∩;   "|" ↔ −

i.e., for any (not necessarily disjoint) nonempty G, G', G'' ⊆ N_n,

μ*(X̃_G ∩ X̃_{G'} − X̃_{G''}) = I(X_G; X_{G'} | X_{G''}).   (2)

When G'' = ∅, we interpret (2) as

μ*(X̃_G ∩ X̃_{G'}) = I(X_G; X_{G'}).   (3)

When G = G', (2) becomes

μ*(X̃_G − X̃_{G''}) = H(X_G | X_{G''}).   (4)

When G = G' and G'' = ∅, (2) becomes

μ*(X̃_G) = H(X_G).   (5)

Thus (2) covers all the cases of Shannon's information measures.
In other words, μ* is completely specified by the set of values
{H(X_G) : G a nonempty subset of N_n}, namely, all the joint entropies involving
X_1, X_2, ..., X_n, and it follows from (5) that μ* is the unique
measure on F_n which is consistent with all Shannon's information measures. Note that
μ* in general is not nonnegative.
However, if X_1 → X_2 → ... → X_n form a Markov chain, μ* is always
nonnegative [7].

Let

B = {X̃_G : G is a nonempty subset of N_n}.   (6)

Note that |B| = |A| = k. Fix arbitrary one-to-one mappings from {1, 2, ..., k} onto B and onto A, and define

h = (H(X_{G_1}), H(X_{G_2}), ..., H(X_{G_k}))^T   (7)

μ = (μ*(A_1), μ*(A_2), ..., μ*(A_k))^T   (8)

where X̃_{G_j} and A_j denote the jth elements of B and A under these mappings. Then

h = C_n μ   (9)

where C_n is a unique k × k matrix (independent of μ*) with

(C_n)_{ij} = 1 if A_j ⊆ X̃_{G_i}, and (C_n)_{ij} = 0 otherwise.   (10)

An important characteristic of C_n is that it is invertible [25],
so we can write

μ = C_n^{-1} h.   (11)

As a consequence of the theory of the I-Measure, the information diagram was introduced as a tool to visualize the
relationship among information measures [25]. Applications
of information diagrams can be found in [7], [25], and [26].
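To make the correspondence between h and μ in (9)-(11) concrete, the following short Python sketch (an illustration added here, not part of [25] or of the ITIP software; the indexing of atoms and joint entropies is an arbitrary choice made for this sketch) builds the matrix C_n for small n and checks numerically that it is invertible.

from itertools import combinations
import numpy as np

def nonempty_subsets(n):
    """All nonempty subsets of {1, ..., n}, as frozensets."""
    elems = range(1, n + 1)
    return [frozenset(c) for r in range(1, n + 1)
            for c in combinations(elems, r)]

def build_C(n):
    """Rows are indexed by nonempty G (the joint entropies H(X_G)),
    columns by the atoms of F_n.  An atom is identified with its
    'in' set W, i.e. the set of i for which the atom lies inside
    the set variable of X_i.  The atom is contained in the union
    of the set variables of X_i, i in G, exactly when W meets G."""
    subsets = nonempty_subsets(n)          # index set for both h and mu
    k = len(subsets)                       # k = 2**n - 1
    C = np.zeros((k, k))
    for row, G in enumerate(subsets):      # joint entropy H(X_G)
        for col, W in enumerate(subsets):  # atom with 'in' set W
            if G & W:                      # atom contained in the union over G
                C[row, col] = 1.0
    return C

if __name__ == "__main__":
    for n in (2, 3, 4):
        C = build_C(n)
        print(n, C.shape, "invertible:", abs(np.linalg.det(C)) > 1e-9)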
III. THE CANONICAL FORM
In the rest of the paper, we shall assume that X_1, X_2, ..., X_n
are the random variables involved in our discussion.
We observe that conditional entropies, mutual informations,
and conditional mutual informations can be expressed as a
linear combination of joint entropies by using the following
identity:

I(X_G; X_{G'} | X_{G''}) = H(X_{G ∪ G''}) + H(X_{G' ∪ G''}) − H(X_{G ∪ G' ∪ G''}) − H(X_{G''})   (12)

where G and G' are nonempty subsets of N_n, G'' is a (possibly empty) subset of N_n, and H(X_∅) is taken to be 0. Thus any information expression can
be expressed in terms of the joint entropies. We call this the
canonical form of an information expression.
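As an illustration of (12), the following sketch (a hypothetical helper written for this exposition, with names and subset indexing chosen here for convenience) expands a conditional mutual information into its coefficient vector over the joint entropies; summing such vectors puts any linear information expression into canonical form.

from collections import defaultdict

def canonical_cmi(G, Gp, Gpp=frozenset()):
    """Coefficients of I(X_G; X_G' | X_G'') over the joint entropies,
    following (12): H(X_{G u G''}) + H(X_{G' u G''})
                    - H(X_{G u G' u G''}) - H(X_{G''}),
    with the convention H(X_empty) = 0.
    A conditional entropy H(X_G | X_G'') is canonical_cmi(G, G, G'')."""
    coeff = defaultdict(float)
    for s, sign in ((G | Gpp, +1), (Gp | Gpp, +1),
                    (G | Gp | Gpp, -1), (Gpp, -1)):
        if s:                       # drop the H(X_empty) = 0 term
            coeff[frozenset(s)] += sign
    return dict(coeff)

# Example: I(X1; X2 | X3) = H(X1,X3) + H(X2,X3) - H(X1,X2,X3) - H(X3)
print(canonical_cmi(frozenset({1}), frozenset({2}), frozenset({3})))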
Now for any random variables X_1, X_2, ..., X_n, their joint entropies correspond
to a vector h in ℝ^k, where we regard the values H(X_G), G a nonempty subset of N_n, as the
coordinates of h. On the other hand, a vector h in ℝ^k is
said to be constructible if there exist random variables X_1, X_2, ..., X_n whose joint
entropies are given by h. We are then motivated to define

Γ* = {h ∈ ℝ^k : h is constructible}.

As we shall see, Γ* not only gives a complete characterization
of all information inequalities, but it also is closely related to
some important problems in probability theory and information
theory. Thus a complete characterization of Γ* is of fundamental importance. To our knowledge, there has not been such a
characterization in the literature (see Section V).
Now every information expression can be expressed in
canonical form. A basic question to ask is in what sense
the canonical form is unique. Toward this end, we shall first
establish the following theorem.

Theorem 1: Let f: ℝ^k → ℝ be measurable such that the set
{h ∈ ℝ^k : f(h) = 0} has zero Lebesgue measure. Then f cannot
be identically zero on Γ*.
We shall need the following lemma which is immediate
from the discussion in [26, Sec. 6]. The proof is omitted here.
Lemma 1: Let Ψ* = {C_n^{-1} h : h is constructible}
(cf. (11)). Then the first quadrant of ℝ^k is a subset of Ψ*.
Proof of Theorem 1: Suppose Γ* has positive Lebesgue measure. Since
{h : f(h) = 0} has zero Lebesgue measure, and hence
{h ∈ Γ* : f(h) = 0} has zero Lebesgue measure, the set
{h ∈ Γ* : f(h) ≠ 0} has positive Lebesgue measure. Then Γ*
cannot be a subset of {h : f(h) = 0}, which implies that f
cannot be identically zero on Γ*. Thus it suffices to prove
that Γ* has positive Lebesgue measure. Using the above
Lemma, we see that the first quadrant of ℝ^k, which has
positive Lebesgue measure, is a subset of Ψ*. Therefore Ψ*
has positive Lebesgue measure. Since Γ* is the image of Ψ* under an invertible
linear transformation, its Lebesgue measure must also
be positive. This proves the theorem.
The uniqueness of the canonical form for very general
classes of information expressions follows from this theorem.
For example, suppose f and g are two polynomials of
the joint entropies such that f(h) = g(h) for all h ∈ Γ*. Let
d = f − g. If d is not the zero function, then {h : d(h) = 0}
has zero Lebesgue measure. By the theorem, d cannot be
identically zero on Γ*, which is a contradiction. Therefore d is
the zero function, i.e., f = g. Thus we see that the canonical
form is unique for polynomial information expressions. We
note that the uniqueness of the canonical form for linear
information expressions has been discussed in [4] and [2, p.
51, Theorem 3.6].
The importance of the canonical form will become clear
in the next section. An application of the canonical form to
recognizing the symmetry of an information expression will
be discussed in Appendix II-A. We note that any invertible
linear transformation of the joint entropies can be used for
the purpose of defining the canonical form. Nevertheless, the
current definition of the canonical form has the advantage
that if S and S' are two sets of random variables such
that S ⊆ S', then the set of joint entropies involving the random
variables in S is a subset of the set of joint entropies involving the
random variables in S'.
IV. A CALCULUS FOR VERIFYING
LINEAR IDENTITIES AND INEQUALITIES
In this section, we shall develop a simple calculus for verifying all linear information identities and inequalities involving
a definite number of random variables which can be proved
by conventional techniques. All identities and inequalities
in this section are assumed to be linear unless otherwise
specified. Although our discussion will primarily be on linear
identities and inequalities (possibly with linear constraints),
our approach can be extended naturally to nonlinear cases.
For nonlinear cases, the amount of computation required is
larger. The question of what linear combinations of entropies
are always nonnegative was first raised by Han [5].
A. Unconstrained Identities
Due to the uniqueness of the canonical form for linear
information expressions as discussed in the preceding section,
it is easy to check whether two expressions f and g are
identical. All we need to do is to express f − g in canonical
form. If all the coefficients are zero, then f and g are
identical; otherwise they are not.
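For instance, the identity H(X_1) + H(X_2 | X_1) = H(X_1, X_2) can be checked mechanically along these lines (a small sketch assuming numpy is available; the subset indexing is ad hoc):

from itertools import combinations
import numpy as np

n = 2
subsets = [frozenset(c) for r in range(1, n + 1)
           for c in combinations(range(1, n + 1), r)]
idx = {S: t for t, S in enumerate(subsets)}

def H(*xs):
    """Coefficient vector of the joint entropy H(X_xs)."""
    v = np.zeros(len(subsets))
    v[idx[frozenset(xs)]] = 1.0
    return v

f = H(1) + (H(1, 2) - H(1))    # H(X1) + H(X2|X1), with H(X2|X1) = H(X1,X2) - H(X1)
g = H(1, 2)                    # H(X1,X2)
print("identical:", np.allclose(f - g, 0))   # True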
B. Unconstrained Inequalities
Since all information expressions can be expressed in canonical form, we shall only consider inequalities in this form.
The following is a simple yet fundamental observation which
apparently has not been discussed in the literature.
For any information expression f, f ≥ 0 always holds if and
only if Γ* ⊆ {h ∈ ℝ^k : f(h) ≥ 0}.

This observation, which follows immediately from the definition of Γ*, gives a complete characterization of all unconstrained inequalities (not necessarily linear) in terms of Γ*.
From this point of view, an unconstrained inequality is simply
a partial characterization of Γ*.
The nonnegativity of all Shannon's information measures
forms a set of inequalities which we shall refer to as the basic
inequalities. We observe that in the conventional approach
to proving information inequalities, whenever we establish
an inequality in an intermediate step, we invoke one of the
basic inequalities. Therefore, all information inequalities and
conditional information identities which can be proved by
conventional techniques are consequences of the basic inequalities. These inequalities, however, are not nonredundant. For
example,
and
, which are both
and , imply
basic inequalities of the random variables
again a basic inequality of
and .
We shall be dealing with linear combinations whose coefficients are nonnegative. We call such linear combinations
nonnegative linear combinations. We observe that any Shannon's information measure can be expressed as a nonnegative
linear combination of the following two elemental forms of
Shannon's information measures:
i) H(X_i | X_{N_n − {i}});
ii) I(X_i; X_j | X_K), where i ≠ j and K ⊆ N_n − {i, j}.
This can be done by successive (if necessary) applications
of chain-rule-type identities such as the following:

H(X_G) = H(X_i) + H(X_{G − {i}} | X_i),  i ∈ G   (13)
H(X_G | X_{G'}) = H(X_i | X_{G'}) + H(X_{G − {i}} | X_i, X_{G'}),  i ∈ G   (14)
I(X_G; X_{G'}) = I(X_i; X_{G'}) + I(X_{G − {i}}; X_{G'} | X_i),  i ∈ G   (15)
I(X_G; X_{G'} | X_{G''}) = I(X_i; X_{G'} | X_{G''}) + I(X_{G − {i}}; X_{G'} | X_i, X_{G''}),  i ∈ G   (16)
H(X_i) = H(X_i | X_j) + I(X_i; X_j),  j ≠ i   (17)
H(X_i | X_{G'}) = H(X_i | X_j, X_{G'}) + I(X_i; X_j | X_{G'}),  j ∉ G' ∪ {i}.   (18)
(Note that all the coefficients in the above identities are
nonnegative.) It is easy to check that the total number of
Shannon's information measures of the two elemental forms
is equal to

m = n + (n choose 2) 2^{n−2}.   (19)
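As a quick check of (19) (purely illustrative), the following sketch enumerates the two elemental forms for small n and compares the count with n + (n choose 2) 2^{n−2}.

from itertools import combinations
from math import comb

def elemental_forms(n):
    """Enumerate the elemental forms for X_1, ..., X_n as tuples:
    ('H', i, rest) stands for H(X_i | X_{N_n - {i}}) and
    ('I', i, j, K) stands for I(X_i; X_j | X_K) with K a subset
    of N_n - {i, j}."""
    N = set(range(1, n + 1))
    forms = [('H', i, frozenset(N - {i})) for i in N]
    for i, j in combinations(sorted(N), 2):
        rest = sorted(N - {i, j})
        for r in range(len(rest) + 1):
            for K in combinations(rest, r):
                forms.append(('I', i, j, frozenset(K)))
    return forms

for n in (2, 3, 4, 5):
    m = len(elemental_forms(n))
    assert m == n + comb(n, 2) * 2 ** (n - 2)
    print(n, m)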
The inequalities asserting the nonnegativity of the two elemental forms of
Shannon's information measures form a proper subset of the set of
basic inequalities. We call the inequalities in this smaller set
the elemental inequalities. They are equivalent to the basic
inequalities because each basic inequality which is not an
elemental inequality can be obtained by adding a certain set
of elemental inequalities in view of (13)–(18). The minimality
of the elemental inequalities is proved in Appendix I.
If the elemental inequalities are expressed in canonical form,
then they become linear inequalities in ℝ^k. Denote this set of
inequalities by Gh ≥ 0, where G is an m × k matrix, and
define

Γ = {h ∈ ℝ^k : Gh ≥ 0}.   (20)

Since the elemental inequalities are satisfied by the joint entropies of any
X_1, X_2, ..., X_n, we have Γ* ⊆ Γ. Therefore, if an inequality in canonical form
holds for all h ∈ Γ, then Γ* is a subset of the set of h satisfying it,
i.e., the inequality always holds.
Let e_j, 1 ≤ j ≤ k, be the column k-vector whose jth
component is equal to 1 and all the other components are
equal to 0. Since a joint entropy can be expressed as a
nonnegative linear combination of the two elemental forms
of Shannon's information measures, each e_j^T can be expressed
as a nonnegative linear combination of the rows of G. This
implies that Γ is a pyramid in the positive quadrant.
Let b be any column k-vector. Then b^T h, a
linear combination of joint entropies, is always nonnegative if
b^T h ≥ 0 for all h ∈ Γ. This is equivalent to saying that the minimum
of the problem

(Primal)  Minimize b^T h  subject to  Gh ≥ 0 and h ≥ 0

is zero. (Since Γ lies in the positive quadrant, the constraint h ≥ 0 may be imposed without changing the problem.) Since h = 0 gives b^T h = 0 (the origin is the only corner
of Γ), all we need to do is to apply the optimality test of
the simplex method [19] to check whether the point h = 0
is optimal.
We can obtain further insight into the problem from the
Duality Theorem in linear programming [19]. The dual of the
above linear programming problem is

(Dual)  Maximize 0  subject to  y^T G ≤ b^T  and  y ≥ 0

where y = (y_1, y_2, ..., y_m)^T.
By the Duality Theorem, the maximum of the dual problem is
also zero. Since the cost function in the dual problem is zero,
the maximum of the dual problem is zero if and only if the
feasible region

D = {y : y^T G ≤ b^T and y ≥ 0}   (21)

is nonempty.
Theorem 2: D is nonempty if and only if b^T = λ^T G for
some λ ≥ 0, where λ is a column m-vector, i.e., b^T is a
nonnegative linear combination of the rows of G.

Proof: If b^T = λ^T G for some λ ≥ 0, then λ ∈ D and D is
nonempty. For the converse, suppose D is nonempty and let y ∈ D. Let

s^T = b^T − y^T G.   (22)

Since y ∈ D, s ≥ 0. Since each e_j^T, and hence any row vector with nonnegative components,
can be expressed as a nonnegative linear
combination of the rows of G, so can s^T, and therefore

b^T = y^T G + s^T   (23)

can also be expressed as a nonnegative linear combination
of the rows of G. By (22) and (23), this gives b^T = λ^T G for some
λ ≥ 0.

Thus b^T h ≥ 0 always holds subject to Gh ≥ 0, i.e., it can be proved by invoking the elemental inequalities, if and
only if b^T is a nonnegative linear combination of the elemental
inequalities (in canonical form).
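The test just described can be carried out with any linear programming package; ITIP (Section VI) implements it in MATLAB. Purely as an illustration, and not as the ITIP code itself, the following Python sketch builds the matrix G of elemental inequalities for n = 3 in canonical form and applies the feasibility test of Theorem 2 to the expression of Example 2 (the helper names and the use of scipy are choices made here):

from itertools import combinations
import numpy as np
from scipy.optimize import linprog

n = 3
N = frozenset(range(1, n + 1))
subsets = [frozenset(c) for r in range(1, n + 1)
           for c in combinations(sorted(N), r)]
idx = {S: t for t, S in enumerate(subsets)}          # k = 2**n - 1 = 7

def cmi(G1, G2, K=frozenset()):
    """Canonical-form coefficient vector of I(X_G1; X_G2 | X_K),
    cf. (12); a conditional entropy H(X_G | X_K) is cmi(G, G, K)."""
    v = np.zeros(len(subsets))
    for S, sign in ((G1 | K, +1), (G2 | K, +1), (G1 | G2 | K, -1), (K, -1)):
        if S:
            v[idx[frozenset(S)]] += sign
    return v

# Rows of G: the elemental inequalities in canonical form.
rows = [cmi(frozenset({i}), frozenset({i}), N - {i}) for i in N]
for i, j in combinations(sorted(N), 2):
    rest = sorted(N - {i, j})
    for r in range(len(rest) + 1):
        for K in combinations(rest, r):
            rows.append(cmi(frozenset({i}), frozenset({j}), frozenset(K)))
G = np.array(rows)                                   # m x k, here 9 x 7

# Example 2 with (X, Y, Z) = (X1, X2, X3):
# b'h = H(X,Y) - 1.04 H(Y) + 0.7 I(Y; X,Z) + 0.04 H(Y|Z).
b = (cmi(frozenset({1, 2}), frozenset({1, 2}))
     - 1.04 * cmi(frozenset({2}), frozenset({2}))
     + 0.70 * cmi(frozenset({2}), frozenset({1, 3}))
     + 0.04 * cmi(frozenset({2}), frozenset({2}), frozenset({3})))

# Theorem 2: b'h >= 0 is implied by the elemental inequalities iff
# b = G' lambda for some lambda >= 0 (a pure feasibility problem).
res = linprog(c=np.zeros(G.shape[0]), A_eq=G.T, b_eq=b,
              bounds=[(0, None)] * G.shape[0], method="highs")
print("provable from the elemental inequalities:", res.status == 0)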
We now summarize the results in this section. For linear information expressions f and g, let b^T h be the canonical form of f − g, and take b^T h as the
cost function subject to the elemental inequalities. Then apply
the optimality test of the simplex method to the point h = 0.
If h = 0 is optimal, then f ≥ g always holds. If
not, then f ≥ g may or may not always hold. If it
always holds, it is not implied by the elemental inequalities. In
other words, it cannot be proved by conventional techniques,
namely, by invoking the elemental inequalities.
Han has previously studied unconstrained information inequalities involving three random variables [5] as well as
information inequalities which are symmetrical in all the
random variables involved [6], and explicit characterizations of
such inequalities were obtained. A discussion of these results
is found in Appendix II.
C. Constrained Inequalities

Linear constraints on X_1, X_2, ..., X_n arise frequently in information theory. Some examples are:

1) X_1, X_2, and X_3 are mutually independent if and only if H(X_1, X_2, X_3) = H(X_1) + H(X_2) + H(X_3);
2) X_1, X_2, and X_3 are pairwise independent if and only if I(X_1; X_2) = I(X_2; X_3) = I(X_1; X_3) = 0;
3) X_1 is a function of X_2 if and only if H(X_1 | X_2) = 0;
4) X_1 → X_2 → X_3 → X_4 form a Markov chain if and only if I(X_1; X_3 | X_2) = 0 and I(X_1, X_2; X_4 | X_3) = 0.

In order to facilitate our discussion, we now introduce
an alternative set of notations for the joint entropies. We do not
distinguish elements and singletons of N_n, and we write unions
of subsets of N_n as juxtapositions, to simplify notations. For any nonempty G ⊆ N_n,
we use h_G to denote H(X_G), i.e., the coordinate of h corresponding to G (refer to Section
II for the definition of h). We also define, for nonempty G and G' and for G'' ⊆ N_n, the
corresponding mutual informations in terms of the h's as in (12), so that constraints such as those in the examples above become linear equations in h.

In general, a constraint is given by a subset Φ of ℝ^k. For
instance, for the last example above,

Φ = {h ∈ ℝ^k : h_{12} + h_{23} − h_{123} − h_2 = 0 and h_{123} + h_{34} − h_{1234} − h_3 = 0}.

When Φ = ℝ^k, there is no constraint. (In fact, there is
no constraint if Γ* ⊆ Φ.) Parallel to our discussion in the
preceding subsection, we have the following more general
observation:

Under the constraint Φ, for any information expression f, f ≥ 0
always holds if and only if (Γ* ∩ Φ) ⊆ {h ∈ ℝ^k : f(h) ≥ 0}.

Again, this gives a complete characterization of all constrained
inequalities in terms of Γ*. Thus Γ* in fact is the origin of all
constrained inequalities, with unconstrained inequalities being
a special case. In this and the next subsection, however, we
shall confine our discussion to the linear case.

When Φ is a subspace of ℝ^k, we can easily modify the
method in the last subsection by taking advantage of the linear
structure of the problem. Let the constraints on h be
given by

Qh = 0   (24)

where Q is a q × k matrix (i.e., there are q constraints).
Following our discussion in the last subsection, a linear
combination of joint entropies b^T h is always nonnegative
under the constraint Qh = 0 if the minimum of the problem

Minimize b^T h  subject to  Gh ≥ 0 and Qh = 0

is zero.

Let r be the rank of Q. Since h is in the null space of Q,
we can write

h = Q̃ h̃   (25)

where Q̃ is a k × (k − r) matrix whose columns form a basis
of the orthogonal complement of the row space of Q, and
h̃ is a column (k − r)-vector. Then the elemental inequalities
can be expressed as

G Q̃ h̃ ≥ 0   (26)

in terms of h̃, and we define

Γ̃ = {h̃ ∈ ℝ^{k−r} : G Q̃ h̃ ≥ 0}   (27)

which is a pyramid in ℝ^{k−r} (but not necessarily in the positive
quadrant). Likewise, b^T h can be expressed as b^T Q̃ h̃.
With the constraints and all expressions in terms of h̃,
b^T h is always nonnegative under the constraint Qh = 0 if the
minimum of the problem

Minimize b^T Q̃ h̃  subject to  G Q̃ h̃ ≥ 0

is zero. Again, since h̃ = 0 gives b^T Q̃ h̃ = 0 (the origin is the
only corner of Γ̃), all we need to do is to apply the optimality
test of the simplex method to check whether the point h̃ = 0
is optimal.

By imposing the constraints in (24), the number of elemental inequalities remains the same, while the dimension
of the problem decreases from k to k − r. Again from the
Duality Theorem, we see that b^T h is always nonnegative under the constraint if
b^T Q̃ = λ^T G Q̃ for some λ ≥ 0, where λ is a column m-vector, i.e.,
b^T Q̃ h̃ is a nonnegative linear combination of the
elemental inequalities (in terms of h̃).

We now summarize the results in this section. Let the
constraints be given in (24). For linear information expressions f and
g, let b^T h be the canonical form of f − g. Then let b^T Q̃ h̃ be the cost function subject
to the elemental inequalities (in terms of h̃) and apply the
optimality test to the point h̃ = 0. If h̃ = 0 is optimal,
then f ≥ g always holds under the constraint, otherwise it may or may
not always hold. If it always holds, it is not implied by the
elemental inequalities together with the constraint. In other words, it cannot be proved by
conventional techniques.
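In the same illustrative spirit (again a sketch with ad hoc helper names, not the ITIP implementation), the constrained test can be carried out by projecting onto the null space of Q. The sketch below imposes the Markov constraint I(X_1; X_3 | X_2) = 0 of Example 1 as Qh = 0 and checks that the data processing inequality I(X_2; X_3) ≥ I(X_1; X_3) is a nonnegative combination of the elemental inequalities expressed in terms of h̃.

from itertools import combinations
import numpy as np
from scipy.linalg import null_space
from scipy.optimize import linprog

n = 3                                    # (X1, X2, X3) = (X, Y, Z)
N = frozenset(range(1, n + 1))
subsets = [frozenset(c) for r in range(1, n + 1)
           for c in combinations(sorted(N), r)]
idx = {S: t for t, S in enumerate(subsets)}

def cmi(G1, G2, K=frozenset()):
    """Canonical form of I(X_G1; X_G2 | X_K), cf. (12)."""
    v = np.zeros(len(subsets))
    for S, sign in ((G1 | K, +1), (G2 | K, +1), (G1 | G2 | K, -1), (K, -1)):
        if S:
            v[idx[frozenset(S)]] += sign
    return v

# Elemental inequalities G h >= 0 for n = 3.
rows = [cmi(frozenset({i}), frozenset({i}), N - {i}) for i in N]
for i, j in combinations(sorted(N), 2):
    rest = sorted(N - {i, j})
    for r in range(len(rest) + 1):
        for K in combinations(rest, r):
            rows.append(cmi(frozenset({i}), frozenset({j}), frozenset(K)))
G = np.array(rows)

# Constraint (24): the Markov chain X1 -> X2 -> X3, i.e. I(X1;X3|X2) = 0.
Q = cmi(frozenset({1}), frozenset({3}), frozenset({2})).reshape(1, -1)
# Inequality to verify under the constraint: I(X2;X3) - I(X1;X3) >= 0.
b = cmi(frozenset({2}), frozenset({3})) - cmi(frozenset({1}), frozenset({3}))

Qt = null_space(Q)                       # columns span the null space of Q
G_t = G @ Qt                             # elemental inequalities in terms of h~
b_t = Qt.T @ b                           # cost vector in terms of h~

# b'h >= 0 under Qh = 0 if b_t is a nonnegative combination of rows of G_t.
res = linprog(c=np.zeros(G_t.shape[0]), A_eq=G_t.T, b_eq=b_t,
              bounds=[(0, None)] * G_t.shape[0], method="highs")
print("implied by the elemental inequalities and the constraint:",
      res.status == 0)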
D. Constrained Identities

We impose the constraints in (24) as in the last subsection.
As we have pointed out at the beginning of the paper, two
information expressions f and g are identical if and only if
f ≥ g and g ≥ f always hold. Thus we can apply the
method in the last subsection to verify all constrained identities
that can be proved by conventional techniques.

When X_1, X_2, ..., X_n are unconstrained, the uniqueness of the
canonical form for linear information expressions asserts that
b^T h is identically zero if and only if b = 0. However, when the constraints
in (24) are imposed on X_1, X_2, ..., X_n, an expression being identically zero does not imply
that its coefficients vanish. We give a simple example to illustrate this point.
Suppose n = 2 and we impose the constraint H(X_1) = 0.
Then every information expression can be expressed in terms
of H(X_2) and H(X_1, X_2). Now consider

I(X_1; X_2) = H(X_2) − H(X_1, X_2).   (28)

Note that the coefficients in the above expression are nonzero.
But from the elemental inequalities, we have

I(X_1; X_2) ≥ 0   (29)

and, since H(X_1 | X_2) ≥ 0 and H(X_1) = I(X_1; X_2) + H(X_1 | X_2) = 0 under the constraint, also I(X_1; X_2) ≤ 0, which imply that H(X_2) − H(X_1, X_2) = I(X_1; X_2) vanishes under the constraint even though its coefficients are nonzero.
We now discuss a special application of the method described in this subsection. Let us consider the following
problem which is typical in probability theory. Suppose we are
given that X_1 → X_2 → X_3 and X_2 → X_3 → X_4 form Markov
chains, and that X_1 and X_4 are independent. We ask whether
X_1 and X_3 are always independent. This problem can be
formulated in information-theoretic terms with the constraints
represented by I(X_1; X_3 | X_2) = 0, I(X_2; X_4 | X_3) = 0, and I(X_1; X_4) = 0, and we want to know whether they imply
I(X_1; X_3) = 0.
Problems of such kind can be handled by the method
described in this subsection. Our method can prove any
independence relation which can be proved by conventional
information-theoretic techniques. The advantage of using an
information-theoretic formulation of the problem is that we can
avoid manipulations of the joint distribution directly, which is
awkward [8], if not difficult.
It may be difficult to devise a calculus to handle independence relations of random variables in a general setting,2
because an independence relation is “discrete” in the sense
that it is either true or false. On the other hand, the problem
becomes a continuous one if it is formulated in informationtheoretic terms (because mutual informations are continuous
functionals), and continuous problems are in general less
difficult to handle. From this point of view, the problem of
determining independence of random variables is a discrete
problem embedded in a continuous problem.
2 A calculus for independence relations has been devised by Massey [9] for
the special case when the random variables have a causal interpretation.
V. FURTHER DISCUSSION ON Γ* AND Γ

We have seen that Γ* ⊆ Γ, but it is not clear whether
Γ* = Γ. If so, Γ* = Γ and hence all information inequalities are
completely characterized by the elemental inequalities. In the
following, we shall use the notations Γ*_n and Γ_n when we
refer to Γ* and Γ for a specific n.

For n = 2, in I-Measure notations, the elemental inequalities are
μ*(X̃_1 ∩ X̃_2) ≥ 0, μ*(X̃_1 − X̃_2) ≥ 0, and μ*(X̃_2 − X̃_1) ≥ 0.
It then follows from Lemma 1 that Γ_2 = Γ*_2.

Inspired by the current work, the characterization of Γ*_3 and Γ*_4
has recently been investigated by Zhang and Yeung. They
have found that Γ*_3 ≠ Γ_3 (therefore Γ* ≠ Γ in general), but
that the closure of Γ*_3 is equal to Γ_3 [29]. This implies that all
unconstrained (linear or nonlinear) inequalities involving three
random variables are consequences of the elemental inequalities of the same set of random variables. However, it is not
clear whether the same is true for all constrained inequalities.
They also have discovered a conditional inequality
involving four random variables which is not implied by the
elemental inequalities: subject to two independence constraints on the four random variables, a certain mutual information is bounded above by the sum of two conditional mutual informations.

If, in addition, those two conditional mutual informations vanish, the above inequality implies that the mutual information on the left-hand side also vanishes.
This is a conditional independence relation which is not
implied by the elemental inequalities. However, whether
Γ*_4 = Γ_4 remained an open problem. Subsequently, they have determined that Γ*_4 ≠ Γ_4 by discovering the following unconstrained inequality involving four random variables which is
not implied by the elemental inequalities of the same set of
random variables [30]:

2 I(X_3; X_4) ≤ I(X_1; X_2) + I(X_1; X_3, X_4) + 3 I(X_3; X_4 | X_1) + I(X_3; X_4 | X_2).
The existence of the above two inequalities indicates that
there may be a lot of information inequalities yet to be
discovered. Since most converse coding theorems are proved
by means of information inequalities, it is plausible that some
of these inequalities yet to be discovered are needed to settle
certain open problems in information theory.
In the remainder of the section, we shall further elaborate
on the significance of Γ* by pointing out its relations with
some important problems in probability theory and information
theory.
A. Conditional Independence Relations

For any fixed number of random variables, a basic question
is what sets of conditional independence relations are possible.
In the recent work of Matúš and Studený [17], this problem
is formulated as follows. Recall that N_n = {1, 2, ..., n}, and let
the family of couples consist of all pairs (ij, K), where K ⊆ N_n and
ij is the union of two, not necessarily different, singletons i and j
of N_n not contained in K. Having a system of random variables
X = (X_i, i ∈ N_n) with subsystems X_G = (X_i, i ∈ G), G ⊆ N_n, we introduce the
notation i ⊥ j | K as the abbreviation of the statement "X_i
is conditionally independent of X_j given X_K." For i = j,
i ⊥ i | K means X_i is determined by X_K. The subsystem X_∅
is presumed to be constant. A subfamily L of the couples is called
probabilistically (p-) representable if there exists a system X,
called a p-representation of L, such that the couples that hold for X are precisely those in L. The problem is
to characterize the class of all p-representable subfamilies.
Note that this problem is more general than the application
discussed in Section IV-D.
Now a couple (ij, K) is equivalent to the vanishing of the corresponding Shannon's information measure, I(X_i; X_j | X_K) (or H(X_i | X_K) when i = j). If this measure is not of
elemental form, then it can be written as a nonnegative
combination of the corresponding elemental forms of Shannon's information measures. We observe that it vanishes
if and only if each of the corresponding elemental forms
of Shannon's information measures vanishes, and that an
elemental form of Shannon's information measure vanishes
if and only if the corresponding conditional independence
relation holds. Thus it is actually unnecessary to consider a couple whose corresponding measure is not of elemental form separately, because it
is determined by the other conditional independence relations.

Let us now look at some examples. For n = 3 (as pointed out in the last paragraph, the redundant couples need not be considered), let L be the subfamily containing no couples at all. Let X be a
system of random variables such that X_1, X_2, and X_3 are not
deterministic, no one of them is a function of the others, and no conditional independence relation among them holds; a generic joint distribution has this property. Then it is easy to see that
X is a p-representation of L, and thus L is p-representable.
On the other hand, a subfamily which contains (12, ∅) and (13, 2) but not (13, ∅) is not p-representable, because I(X_1; X_2) = 0 and I(X_1; X_3 | X_2) = 0 imply I(X_1; X_3) = 0.

The recent studies on the problem of conditional independence relations were launched by a seminal paper by Dawid
[3], in which he proposed four axioms as heuristic properties of conditional independence. In information-theoretic
terms, these four axioms can be summarized by the following
statement: conditional mutual information is symmetric, nonnegative, and satisfies the chain rule

I(X_G; X_{G'}, X_{G''} | X_{G'''}) = I(X_G; X_{G'} | X_{G'''}) + I(X_G; X_{G''} | X_{G'}, X_{G'''}).
Subsequent work on this subject has been done by Pearl and
his collaborators in the 1980’s, and their work is summarized
in the book by Pearl [18]. Their work has mainly been
motivated by the study of the logic of integrity constraints
from databases. Pearl conjectured that Dawid’s four axioms
completely characterize the conditional independence structure
of any joint distribution. This conjecture, however, was refuted
by the work of Studený [20]. Since then, Matúš and Studený
have written a series of papers on this problem [10]–[17],
[20]–[24]. So far, they have solved the problem for three
random variables, but the problem for four random variables
remains open.
The relation between this problem and Γ* is the following.
Suppose we want to determine whether a subfamily L of
the couples is p-representable. Now each couple (ij, K) corresponds
to setting I(X_i; X_j | X_K) to zero in h. Note that the set of h with I(X_i; X_j | X_K) = 0
is a hyperplane containing the origin in ℝ^k. Thus L
is representable if and only if there exists an h in Γ* such that
h lies in the hyperplane corresponding to (ij, K) for all (ij, K) in L. Therefore, the problem
of conditional independence relations is a subproblem of the
problem of characterizing Γ*.
B. Optimization of Information Quantities
Consider minimizing an information quantity involving X_1, X_2, ..., X_n,
given a collection of constraints on other Shannon's information measures of these random variables. This problem is equivalent to the following
minimization problem: minimize the canonical form of the quantity over all h in Γ*, subject to the corresponding linear constraints on h.
As no characterization of Γ* is available, this minimization
problem cannot be solved. Nevertheless, since Γ* ⊆ Γ, if
we replace Γ* by Γ in the above minimization problem, it
becomes a linear programming problem which renders a lower
bound on the solution.
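As a toy illustration of such an LP relaxation (the numbers and the specific problem below are invented here for concreteness and are not the problem considered above), suppose we wish to lower-bound I(X_1; X_2) over all pairs with H(X_1) = H(X_2) = 1 and H(X_1, X_2) ≤ 1.5. Replacing Γ*_2 by Γ_2 gives the following linear program, whose optimal value 0.5 is a valid lower bound on the true minimum (and is in fact achievable here).

import numpy as np
from scipy.optimize import linprog

# Coordinates of h for n = 2: (H(X1), H(X2), H(X1,X2)).
# Elemental inequalities G h >= 0:
#   I(X1;X2)  = h1 + h2 - h12 >= 0,
#   H(X1|X2)  = h12 - h2      >= 0,
#   H(X2|X1)  = h12 - h1      >= 0.
G = np.array([[ 1.0,  1.0, -1.0],
              [ 0.0, -1.0,  1.0],
              [-1.0,  0.0,  1.0]])

c = np.array([1.0, 1.0, -1.0])          # objective: I(X1;X2)

# Constraints: H(X1) = 1, H(X2) = 1, H(X1,X2) <= 1.5,
# together with -G h <= 0 (the elemental inequalities).
A_ub = np.vstack([-G, [0.0, 0.0, 1.0]])
b_ub = np.array([0.0, 0.0, 0.0, 1.5])
A_eq = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0]])
b_eq = np.array([1.0, 1.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(None, None)] * 3, method="highs")
print("LP lower bound on I(X1;X2):", res.fun)   # 0.5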
C. Multiuser Information Theory
Fig. 1. A multiterminal source coding problem.

The framework for information inequalities developed in
this paper provides new tools for problems in multiuser
information theory. Consider the source coding problem in
Fig. 1, in which X and Y are source random variables,
and the blocks on the left and right are encoders and
decoders, respectively. The random variables U_1, U_2, and U_3
are the outputs of the corresponding encoders. Given the joint entropies H(X), H(Y), and H(X, Y) of the sources, we are interested in the
admissible region R of the triple (H(U_1), H(U_2), H(U_3)).
Evidently, H(U_1), H(U_2), and H(U_3) give the number of
bits needed for the encoders. From the encoding and decoding
requirements, we immediately have certain conditional entropies, namely the conditional entropy of each encoder output given the sources and the conditional entropy of each source to be reconstructed given the encoder outputs available to the corresponding decoder, equal to zero. Now there are five random
variables involved in this problem. Then the intersection of
Γ*_5 and the set containing all h satisfying these constraints
is the set of all possible vectors of the joint entropies involving
X, Y, U_1, U_2, U_3 given that they satisfy the encoding and decoding
requirements of the problem as well as the constraints on the
joint entropies involving X and Y. Then R is given as the
projection of this set on the coordinates H(U_1), H(U_2), and H(U_3). In
the same spirit as that in the last subsection, an explicit outer
bound on R, denoted by R_LP, is given by replacing Γ*_5 by Γ_5.
We refer to an outer bound such as R_LP as an LP (linear
programming) bound. This is a new tool for proving converse
coding theorems for problems in multiuser information theory.
The LP bound already has found applications in the recent
work of Yeung and Zhang [28] on a new class of multiterminal source coding problems. We expect that this approach
will have impact on other problems in multiuser information
theory.
VI. CONCLUDING REMARKS
We have identified the region Γ* as the origin of all information inequalities. Our work suggests the possibility of the
existence of information inequalities which cannot be proved
by conventional techniques, and this has been confirmed by
the recent results of Zhang and Yeung [29], [30].
A product from the framework we have developed is a
simple calculus for verifying all linear information inequalities
involving a definite number of random variables possibly
with linear constraints which can be proved by conventional
techniques; these include all inequalities of such type in
the literature. Based on this calculus, a software running on
MATLAB
called ITIP (Information-Theoretic Inequality
Prover) has been developed by Yeung and Yan [27], and it
is available on the World Wide Web. The following session from
ITIP contains verifications of Examples 1 and 2, respectively,
in Section I.

>> ITIP('I(Y; Z) >= I(X; Z)', 'I(X; Z|Y) = 0')
True
>> ITIP('H(X,Y) - 1.04 H(Y) + 0.7 I(Y; X,Z) + 0.04 H(Y|Z) >= 0')
True

We see from (19) that the amount of computation required is
moderate when n is not large. Our work gives a partial answer to
Han’s question of what linear combinations of entropies are
always nonnegative [5]. A complete answer to this question is
impossible without further characterization of Γ*.
The characterization of Γ* is a very fundamental problem
in information theory. However, in view of the difficulty of
some special cases of this problem [15], [17], [29], [30], it is
not very hopeful that this problem can be solved completely
in the near future. Nevertheless, partial characterizations of Γ*
may lead to the discovery of some new inequalities which
make the solutions of certain open problems in information
theory possible.
APPENDIX I
MINIMALITY OF THE ELEMENTAL INEQUALITIES
The elemental inequalities in set-theoretic notations have
one of the following two forms for a measure μ on F_n:

1) μ(X̃_i − X̃_{N_n − {i}}) ≥ 0;
2) μ(X̃_i ∩ X̃_j − X̃_K) ≥ 0, where i ≠ j and K ⊆ N_n − {i, j}.

They will be referred to as α-inequalities and β-inequalities,
respectively.
We are to show that all the elemental inequalities are
nonredundant, i.e., none of them is implied by the others. For
an α-inequality

μ(X̃_i − X̃_{N_n − {i}}) ≥ 0   (38)

since it is the only elemental inequality which involves the
atom X̃_i − X̃_{N_n − {i}}, it is clearly not implied by the other
elemental inequalities. Therefore, we only need to show that all
β-inequalities are nonredundant. To show that a β-inequality
is nonredundant, it suffices to show that there exists a measure
on F_n which satisfies all other elemental inequalities except
for that one.

We shall show that the β-inequality

μ(X̃_i ∩ X̃_j − X̃_K) ≥ 0   (39)

is nonredundant. To facilitate our discussion, we denote
N_n − {i, j} − K by K̄, and we note that the number of atoms in X̃_i ∩ X̃_j − X̃_K is

2^{|K̄|}.   (40)
We first consider the case when K = N_n − {i, j}, i.e., K̄ = ∅. In this case X̃_i ∩ X̃_j − X̃_K is a single atom, which we denote by A_0. We construct a measure μ by

μ(A) = −1 if A = A_0, and μ(A) = 1 otherwise   (41)

for every atom A of F_n. In other words, A_0 is the only
atom with measure −1; all other atoms have measure 1. Then
μ(X̃_i ∩ X̃_j − X̃_K) = μ(A_0) = −1 < 0, so (39) is violated. It is also trivial to
check that for any i',

μ(X̃_{i'} − X̃_{N_n − {i'}}) = 1 ≥ 0   (42)

since the atom X̃_{i'} − X̃_{N_n − {i'}} is not A_0, and for any (i'j', K') ≠ (ij, K) such that A_0 is not an atom of X̃_{i'} ∩ X̃_{j'} − X̃_{K'},

μ(X̃_{i'} ∩ X̃_{j'} − X̃_{K'}) ≥ 0.   (43)

On the other hand, if A_0 is an atom of X̃_{i'} ∩ X̃_{j'} − X̃_{K'}, then {i', j'} = {i, j} and K' is a proper
subset of K, so that X̃_{i'} ∩ X̃_{j'} − X̃_{K'} contains at least
two atoms, and therefore

μ(X̃_{i'} ∩ X̃_{j'} − X̃_{K'}) ≥ −1 + 1 = 0.   (44)

This completes the proof for the β-inequality in (39) to be
nonredundant when K = N_n − {i, j}.
We now consider the case when K ≠ N_n − {i, j}, i.e., K̄ = N_n − {i, j} − K is nonempty. We construct a measure μ as follows. Write each atom of X̃_i ∩ X̃_j in the form

X̃_i ∩ X̃_j ∩ (∩_{t ∈ V} X̃_t) − X̃_{N_n − {i,j} − V},   V ⊆ N_n − {i, j}.   (45)

If V does not contain K̄, the atom is assigned the measure 0. If V contains K̄, the atom is assigned the measure −1 when |V ∩ K| is even (we refer to such an atom as an even atom) and the measure 1 when |V ∩ K| is odd (an odd atom). Every atom of F_n outside X̃_i ∩ X̃_j is assigned the measure 1.   (46)

This completes the construction of μ.

We first prove that (39) is violated, i.e.,

μ(X̃_i ∩ X̃_j − X̃_K) < 0.   (47)

The atoms of X̃_i ∩ X̃_j − X̃_K are those in (45) with V ⊆ K̄; among these, the only one whose V contains K̄ is the atom with V = K̄, which is an even atom and hence has measure −1, while all the other atoms have measure 0. This proves (47).

Next we prove that μ satisfies all α-inequalities. We note
that for any i', the atom X̃_{i'} − X̃_{N_n − {i'}} is not in X̃_i ∩ X̃_j, and hence its measure is 1 ≥ 0.

It remains to prove that μ satisfies all β-inequalities except
for (39). For any (i'j', K') ≠ (ij, K), the value μ(X̃_{i'} ∩ X̃_{j'} − X̃_{K'}) is the sum of two terms: the total measure of the atoms of X̃_{i'} ∩ X̃_{j'} − X̃_{K'} inside X̃_i ∩ X̃_j, and the total measure of those outside X̃_i ∩ X̃_j. The second term is nonnegative, since every atom outside X̃_i ∩ X̃_j has measure 1. For the first term, the atoms of X̃_{i'} ∩ X̃_{j'} − X̃_{K'} inside X̃_i ∩ X̃_j with nonzero measure are obtained by fixing i, j, and the elements of K̄ to be "in," fixing the elements of K' to be "out," and letting the remaining elements of K vary freely. If at least one element of K varies freely, then by the binomial formula

Σ_{r=0}^{u} (−1)^r (u choose r) = 0,  u ≥ 1,   (48)

the numbers of odd atoms and even atoms involved are the same, and the first term is equal to 0. Otherwise the first term is the measure of a single atom, which is equal to −1, 0, or 1. Whenever the first term is equal to −1, one checks that {i', j'} ≠ {i, j}, so that X̃_{i'} ∩ X̃_{j'} − X̃_{K'} contains at least one atom outside X̃_i ∩ X̃_j and the second term is at least 1. Thus in
either case μ(X̃_{i'} ∩ X̃_{j'} − X̃_{K'}) ≥ 0. This completes the proof that (39) is
nonredundant.
APPENDIX II
SOME SPECIAL FORMS OF UNCONSTRAINED
INFORMATION INEQUALITIES
In this appendix, we shall discuss some special forms
of unconstrained linear information inequalities previously
investigated by Han [5], [6]. Explicit necessary and sufficient
conditions for these inequalities to always hold have been
obtained. The relation between these inequalities and the
results in the current paper will also be discussed.
A. Symmetrical Information Inequalities
An information expression is said to be symmetrical if it
is identical under every permutation among X_1, X_2, ..., X_n. For
example, for n = 2, the expression H(X_1) + H(X_2)
is symmetrical. This can be seen by permuting X_1 and X_2
symbolically in the expression. Now let us consider the
expression H(X_1) + H(X_2 | X_1). If we replace X_1 and X_2
by each other, the expression becomes H(X_2) + H(X_1 | X_2), which is symbolically different from the original expression. However, both expressions are identical to
H(X_1, X_2). Therefore, the two expressions are in fact
identical, and the expression H(X_1) + H(X_2 | X_1) is
actually symmetrical although it is not readily recognized
symbolically.
The symmetry of an information expression in general
cannot be recognized symbolically. However, it is readily
recognized symbolically if the expression is in canonical
form. This is due to the uniqueness of the canonical form
as discussed in Section III.
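To illustrate (a small sketch with ad hoc indexing, added here for this exposition), the symmetry of H(X_1) + H(X_2 | X_1) discussed above can be verified by putting the expression into canonical form and checking that its coefficient vector is invariant under every permutation of the variables.

from itertools import combinations, permutations
import numpy as np

n = 2
subsets = [frozenset(c) for r in range(1, n + 1)
           for c in combinations(range(1, n + 1), r)]
idx = {S: t for t, S in enumerate(subsets)}

def joint(S):
    """Coefficient vector of the joint entropy H(X_S)."""
    v = np.zeros(len(subsets))
    v[idx[frozenset(S)]] = 1.0
    return v

# H(X1) + H(X2|X1) in canonical form.
b = joint({1}) + (joint({1, 2}) - joint({1}))

def permute(b, perm):
    """Coefficient vector after renaming X_i to X_perm[i]."""
    out = np.zeros_like(b)
    for S, t in idx.items():
        out[idx[frozenset(perm[i] for i in S)]] += b[t]
    return out

symmetric = all(np.allclose(b, permute(b, dict(zip(range(1, n + 1), p))))
                for p in permutations(range(1, n + 1)))
print("symmetrical:", symmetric)   # True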
Consider a linear symmetrical information expression f
(in canonical form). As seen in Section IV-B, f can be
expressed as a linear combination of the two elemental forms
of Shannon's information measures. It was shown in [5] that
every symmetrical expression f can be written in the form

f = a_1 u_1 + a_2 u_2 + ... + a_n u_n

where u_1 is the sum of all Shannon's information measures of the first elemental form and, for 2 ≤ k ≤ n, u_k is
the sum of all Shannon's information measures of the second
elemental form conditioning on k − 2 random variables.

It follows trivially from the elemental inequalities that
a_k ≥ 0 for all k is a sufficient condition for f ≥ 0
to always hold. The necessity of this condition can be seen
by noting the existence, for each k, of random variables X_1, X_2, ..., X_n
such that u_k > 0 and u_{k'} = 0 for all k' ≠ k. This implies that
all unconstrained linear symmetrical information inequalities
are consequences of the elemental inequalities. We refer the
reader to [5] for a more detailed discussion of symmetrical
information inequalities.

B. Information Inequalities Involving Three Random Variables

Consider n = 3. Let u be a column vector of seven information quantities which is an invertible linear transformation of h, so that every linear
information expression can be written as c^T u for some column vector c.
It was shown in [6] that c^T u ≥ 0 always holds if and only if
the coefficients in c satisfy a set of explicit conditions given in [6]. In terms of u, the elemental inequalities can likewise be expressed as
a set of linear inequalities on u.

From the discussion in Section IV-B, we see that c^T u ≥ 0
always holds if and only if it is a nonnegative
linear combination of the elemental inequalities. We leave it as an
exercise for the reader to show that this is the case
if and only if the conditions obtained in [6] are satisfied. Therefore, all unconstrained
linear inequalities involving three random variables are
consequences of the elemental inequalities. This result also
implies that Γ_3 is the smallest pyramid containing Γ*_3.
ACKNOWLEDGMENT

The author wishes to acknowledge the help of a few
individuals during the preparation of this paper. They include
I. Csiszár, B. Hajek, F. Matúš, Y.-O. Yan, E.-h. Yang, and Z.
Zhang.

REFERENCES

[1] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[2] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. New York: Academic, 1981.
[3] A. P. Dawid, "Conditional independence in statistical theory (with discussion)," J. Roy. Statist. Soc., Ser. B, vol. 41, pp. 1–31, 1979.
[4] T. S. Han, "Linear dependence structure of the entropy space," Inform. Contr., vol. 29, pp. 337–368, 1975.
[5] T. S. Han, "Nonnegative entropy measures of multivariate symmetric correlations," Inform. Contr., vol. 36, pp. 133–156, 1978.
[6] T. S. Han, "A uniqueness of Shannon's information distance and related nonnegativity problems," J. Combin. Inform. Syst. Sci., vol. 6, no. 4, pp. 320–331, 1981.
[7] T. Kawabata and R. W. Yeung, "The structure of the I-Measure of a Markov chain," IEEE Trans. Inform. Theory, vol. 38, pp. 1146–1149, May 1992.
[8] J. L. Massey, "Determining the independence of random variables," in 1995 IEEE Int. Symp. on Information Theory (Whistler, BC, Canada, Sept. 17–22, 1995).
[9] J. L. Massey, "Causal interpretations of random variables," in 1995 IEEE Int. Symp. on Information Theory (Special session in honor of Mark Pinsker on the occasion of his 70th birthday) (Whistler, BC, Canada, Sept. 17–22, 1995).
[10] F. Matúš, "Abstract functional dependency structures," Theor. Comput. Sci., vol. 81, pp. 117–126, 1991.
[11] F. Matúš, "On equivalence of Markov properties over undirected graphs," J. Appl. Probab., vol. 29, pp. 745–749, 1992.
[12] F. Matúš, "Ascending and descending conditional independence relations," in Trans. 11th Prague Conf. on Information Theory, Statistical Decision Functions and Random Processes (Academia, Prague, 1992), vol. B, pp. 181–200.
[13] F. Matúš, "Probabilistic conditional independence structures and matroid theory: Background," Int. J. General Syst., vol. 22, pp. 185–196, 1994.
[14] F. Matúš, "Extreme convex set functions with many nonnegative differences," Discr. Math., vol. 135, pp. 177–191, 1994.
[15] F. Matúš, "Conditional independence among four random variables II," Combin., Prob. Comput., to be published.
[16] F. Matúš, "Conditional independence structures examined via minors," Ann. Math. Artificial Intell., submitted for publication.
[17] F. Matúš and M. Studený, "Conditional independence among four random variables I," Combin., Prob. Comput., to be published.
[18] J. Pearl, Probabilistic Reasoning in Intelligent Systems. San Mateo, CA: Morgan Kaufman, 1988.
[19] G. Strang, Linear Algebra and Its Applications, 2nd ed. New York: Academic, 1980.
[20] M. Studený, "Attempts at axiomatic description of conditional independence," in Proc. Work. on Uncertainty Processing in Expert Systems, supplement to Kybernetika, vol. 25, nos. 1–3, pp. 65–72, 1989.
[21] M. Studený, "Multiinformation and the problem of characterization of conditional independence relations," Probl. Contr. and Inform. Theory, vol. 18, pp. 3–16, 1989.
[22] M. Studený, "Conditional independence relations have no finite complete characterization," in Trans. 11th Prague Conf. on Information Theory, Statistical Decision Functions and Random Processes (Academia, Prague, 1992), vol. B, pp. 377–396.
[23] M. Studený, "Structural semigraphoids," Int. J. Gen. Syst., submitted for publication.
[24] M. Studený, "Descriptions of structures of stochastic independence by means of faces and imsets (in three parts)," Int. J. Gen. Syst., submitted for publication.
[25] R. W. Yeung, "A new outlook on Shannon's information measures," IEEE Trans. Inform. Theory, vol. 37, pp. 466–474, May 1991.
[26] R. W. Yeung, "Multilevel diversity coding with distortion," IEEE Trans. Inform. Theory, vol. 41, pp. 412–422, Mar. 1995.
[27] R. W. Yeung and Y.-O. Yan, ITIP, [Online]. Available WWW: http://www.ie.cuhk.edu.hk/ITIP.
[28] R. W. Yeung and Z. Zhang, "Multilevel distributed source coding," in 1997 IEEE Int. Symp. on Information Theory (Ulm, Germany, June 1997), p. 276.
[29] Z. Zhang and R. W. Yeung, "A non-Shannon type conditional information inequality," this issue, pp. 1982–1986.
[30] Z. Zhang and R. W. Yeung, "On the characterization of entropy function via information inequalities," to be published in IEEE Trans. Inform. Theory.