Generating a Maximal Entropy Probability Density from an Interval-Valued Fuzzy Set
Carol L. Walker
Elbert A. Walker
Department of Mathematical Sciences
New Mexico State University
Las Cruces, NM 88003
Email: [email protected]
Ronald R. Yager
Machine Intelligence Institute
Iona College
New Rochelle, NY 10801
Email: [email protected]
Abstract: Let $f$ be an interval-valued fuzzy subset of a finite set $U$ of size $n$. This yields $n$ closed intervals inside the unit interval. Picking a point in each interval and dividing by the sum of the points gives rise to a probability density on the set of intervals. The problem is to pick the points that yield maximum entropy of the resulting density. We provide algorithms for picking such points.
I. INTRODUCTION
A problem that arises in several situations is this: given a family of probability densities, choose one from that family that has the largest entropy. Some instances are in [1], [2], [3], [4].
The problem we consider is this: given $n$ subintervals $[a_1, b_1], [a_2, b_2], \ldots, [a_n, b_n]$ of $[0, 1]$, choose $x_i \in [a_i, b_i]$ such that the density
$$\left( \frac{x_1}{\sum_i x_i}, \frac{x_2}{\sum_i x_i}, \ldots, \frac{x_n}{\sum_i x_i} \right)$$
has the largest entropy. This means that among all the possible choices $x$ for the $n$-tuple of $x_i$'s, the quantity
$$H(x) = -\sum_j \frac{x_j}{\sum_i x_i} \ln \frac{x_j}{\sum_i x_i}$$
is maximum.
Several remarks are in order.
• The intervals do not have to be distinct, and an interval can be a single point. Intervals such as $[0, 0]$ and $[0, a]$ are allowed, as are $[1, 1]$ and $[b, 1]$. And of course, an interval may be contained in another.
• If all the intervals are $[0, 0]$, then each $x_i = 0$ and each probability is $\frac{0}{0}$, which is undefined, so we assume this is not the case. In particular, we assume $S = \sum_i x_i$ is always positive. If some interval is not $[0, 0]$, then the interval $[0, 0]$ has associated probability $0$, and since $\lim_{x \to 0^+} x \ln x = 0$, it contributes no entropy. Thus, in developing an algorithm for finding the $x_i$'s that give maximum entropy, we can assume that no interval is $[0, 0]$. So in this situation, which we always assume, there are $x_i$ that give maximum entropy since $H(x) = -\sum_j \frac{x_j}{\sum_i x_i} \ln \frac{x_j}{\sum_i x_i}$ is a continuous function on a compact space.
• If the intersection of the intervals is non-empty, then choosing all the $x_i$ to be any point $x$ in that intersection yields maximum entropy, namely
$$-\sum_{j=1}^{n} \frac{x}{\sum_i x} \ln \frac{x}{\sum_i x} = \ln n.$$
(It is well known that entropy is maximized when the $x_j / \sum_i x_i$ are all equal.) In particular, the solution may not be unique. In fact, in this case, if the intersection is not a single point, then there are uncountably many solutions, namely any point in the intersection, which is itself a closed interval.
• An $x_i$ does not have to be an endpoint of its interval. For example, if $n = 3$ and the three intervals are disjoint, then there will be a unique solution, namely the right endpoint of the leftmost interval, the left endpoint of the rightmost interval, and generally some interior point of the middle interval. We will see an explicit example of this below.
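The entropy $H(x)$ defined above, and the remark that equal points give $\ln n$, can be illustrated with a minimal sketch. The function name `entropy` and the sample points are assumptions made for illustration; they are not part of the paper.

```python
import math

def entropy(points):
    """Entropy (natural log) of the density points[j] / sum(points)."""
    total = sum(points)
    if total <= 0:
        raise ValueError("the sum of the points must be positive")
    # Zero points are skipped because x ln x tends to 0 as x tends to 0+.
    return -sum((x / total) * math.log(x / total) for x in points if x > 0)

# Hypothetical data: four intervals that all contain the point 0.4.
equal_points = [0.4, 0.4, 0.4, 0.4]
unequal_points = [0.1, 0.4, 0.4, 0.4]
print(entropy(equal_points), math.log(4))  # the two values agree up to rounding
print(entropy(unequal_points))             # smaller than ln 4
```

Any common point in the intersection gives the same density, which is why the solution is not unique in that case.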
The motivation for this problem is the following. Let $U$ be a finite set and $f$ a function from $U$ into the set of closed intervals of $[0, 1]$. That is, $f : U \to \{[a, b] : 0 \le a \le b \le 1\}$. This yields a finite set of closed intervals $\{[a_i, b_i] : i = 1, 2, \ldots, n\}$, and choosing $x_i \in [a_i, b_i]$ in turn yields a probability density $P([a_i, b_i]) = x_i / \sum_j x_j$ on $\{[a_i, b_i] : i = 1, 2, \ldots, n\}$. This turns the membership grades of the interval-valued fuzzy set $f$ into probabilities. Also, maximizing the entropy of this probability density gives a canonical way to pick out a point $x_i$ in each interval $[a_i, b_i]$, a possible first step in defuzzifying the fuzzy set $f$.
We term $x = (x_1, x_2, \ldots, x_n)$ a solution if it maximizes entropy. There are some technical lemmas that enable us to provide an algorithm to produce a solution. The aim here is to get an exact solution: we want an algorithm that in a finite number of steps gets an exact solution $x = (x_1, x_2, \ldots, x_n)$. There are also uniqueness questions to consider.
II. SOME TECHNICAL LEMMAS
Lemma 1: If $x = (x_1, x_2, \ldots, x_n)$ is a solution, then for no two distinct entries $x_i < x_j$ in $x$ can it be that $a_i < b_i$, $a_j < b_j$, $x_i \in [a_i, b_i)$, and $x_j \in (a_j, b_j]$.
Proof: Let $x = (x_1, x_2, \ldots, x_n)$, suppose that $x_1 < x_2$, and suppose that $x_1 \in [a_1, b_1)$ and $x_2 \in (a_2, b_2]$. Consider the density given by $(x_1 + h, x_2 - h, x_3, \ldots, x_n)$. Since $x_1 + h + x_2 - h + \sum_{i=3}^{n} x_i = \sum_{i=1}^{n} x_i = S$, its entropy is
$$f(h) = -\left( \frac{x_1 + h}{S} \ln \frac{x_1 + h}{S} + \frac{x_2 - h}{S} \ln \frac{x_2 - h}{S} + \sum_{j=3}^{n} \frac{x_j}{S} \ln \frac{x_j}{S} \right)$$
and its derivative with respect to $h$ is
$$f'(h) = -\left( \frac{1}{S} + \frac{1}{S} \ln \frac{x_1 + h}{S} - \frac{1}{S} - \frac{1}{S} \ln \frac{x_2 - h}{S} \right) = \frac{1}{S} \ln \frac{x_2 - h}{S} - \frac{1}{S} \ln \frac{x_1 + h}{S}.$$
This latter quantity is positive whenever $x_2 - h > x_1 + h$. Since $x_1 \in [a_1, b_1)$ and $x_2 \in (a_2, b_2]$, we can choose such an $h > 0$, keeping $x_1 + h$ in $[a_1, b_1]$ and $x_2 - h$ in $[a_2, b_2]$, and increasing entropy. Thus $x = (x_1, x_2, \ldots, x_n)$ cannot be a solution.
This lemma just says that if $x = (x_1, x_2, \ldots, x_n)$ is a solution, and $x_i < x_j$, then $x_i$ cannot be moved to the right and $x_j$ to the left, keeping them in their original intervals, and keeping $x_i < x_j$.
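The exchange argument in the proof of Lemma 1 can be checked numerically. The sketch below is only an illustration: the intervals, the points, the step $h$, and the helper name `exchange` are all hypothetical choices, not data from the paper.

```python
import math

def entropy(points):
    """Entropy (natural log) of the density points[j] / sum(points)."""
    total = sum(points)
    return -sum((x / total) * math.log(x / total) for x in points if x > 0)

def exchange(points, i, j, h):
    """Copy of points with points[i] raised by h and points[j] lowered by h."""
    moved = list(points)
    moved[i] += h
    moved[j] -= h
    return moved

# Hypothetical intervals [0.1, 0.4], [0.4, 0.7], [0.5, 0.5] with x_1 < x_2.
x = [0.2, 0.6, 0.5]
h = 0.1                      # keeps x_1 + h <= 0.4 and x_2 - h >= 0.4
moved = exchange(x, 0, 1, h)
print(entropy(x), entropy(moved))  # the second value is larger
```

The sum $S$ is unchanged by the exchange, so only the two moved terms affect the entropy.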
Corollary 2: If $x = (x_1, x_2, \ldots, x_n)$ is a solution, then no two distinct entries in $x$ can be in the interior of their intervals.
As mentioned earlier, the function
$$H(x) = -\sum_j \frac{x_j}{\sum_i x_i} \ln \frac{x_j}{\sum_i x_i}$$
assumes a maximum since it is continuous on a compact space. Whatever that maximum, it consists of some endpoints of some of the intervals $[a_i, b_i]$ and interior points of the rest. But this corollary says that all those interior points must be equal. So any solution may be assumed to consist of endpoints of some of the intervals, and a common interior point of the remainder of the intervals, if there are any remaining.
The following lemma gives us one way to calculate $x$ that gives maximum entropy.
Lemma 3: Let $0 < k < n$. Suppose that $[a, b]$ is the intersection of the intervals $[a_1, b_1], [a_2, b_2], \ldots, [a_k, b_k]$, and $x_i \in [a_i, b_i]$, $i = k+1, \ldots, n$. Letting $S = kx + \sum_{i=k+1}^{n} x_i$, an $x \in [a, b]$ that maximizes
$$f(x) = -\left( k \frac{x}{S} \ln \frac{x}{S} + \sum_{j=k+1}^{n} \frac{x_j}{S} \ln \frac{x_j}{S} \right)$$
is either an endpoint $a$ or $b$, or an interior point $x$ such that $f'(x) = 0$. The solution to $f'(x) = 0$ is the point
$$x = \left( \prod_{j=k+1}^{n} x_j^{x_j} \right)^{1 / \sum_{j=k+1}^{n} x_j}$$
which may or may not be in the interval $[a, b]$.
Proof: Of course there is an $x$ that maximizes $f$ since $f$ is a continuous function on the closed interval $[a, b]$. The function $f$ has a maximum at $a$, $b$, or at a point in the interior of the interval $[a, b]$ where the derivative of $f$ is zero. Noting that the derivative of $S$ is $k$, the derivative of $f(x)$ with respect to $x$ is
$$f'(x) = -\left( k \frac{S - kx}{S^2} + k \frac{S - kx}{S^2} \ln \frac{x}{S} - \sum_{j=k+1}^{n} \left( \frac{k x_j}{S^2} + \frac{k x_j}{S^2} \ln \frac{x_j}{S} \right) \right).$$
Dividing through by $-k/S^2$, the zeroes of $f'(x)$ are the zeroes of the function
$$(S - kx) + (S - kx) \ln \frac{x}{S} - \sum_{j=k+1}^{n} x_j - \sum_{j=k+1}^{n} x_j \ln \frac{x_j}{S} = \left( \sum_{j=k+1}^{n} x_j \right) \ln x - \sum_{j=k+1}^{n} x_j (\ln x_j),$$
where the equality uses $S - kx = \sum_{j=k+1}^{n} x_j$. Solving
$$\left( \sum_{j=k+1}^{n} x_j \right) \ln x - \sum_{j=k+1}^{n} x_j (\ln x_j) = 0$$
we get
$$x = \exp \left( \frac{\sum_{j=k+1}^{n} x_j (\ln x_j)}{\sum_{j=k+1}^{n} x_j} \right)$$
or alternatively,
$$x = \left( \prod_{j=k+1}^{n} x_j^{x_j} \right)^{1 / \sum_{j=k+1}^{n} x_j} \qquad (1)$$
But this value of $x$ may not be in the interval $[a, b]$. If it is in the interval $[a, b]$, there are three possibilities for a maximum: this value, or $a$, or $b$. All three must be checked to see which gives the largest value for $f$. If the point $x$ in (1) is not in the interval $[a, b]$, then one simply checks $a$ and $b$ to see which gives the greater value for $f$.
There are several things to note here.
• The derivative $f'(x) = 0$ for at most one point $x$ in the interval $[a, b]$.
• This lemma only maximizes entropy given the $x_i$, $i = k+1, k+2, \ldots, n$.
• The solution $x = \left( \prod_{j=k+1}^{n} x_j^{x_j} \right)^{1 / \sum_{j=k+1}^{n} x_j}$ conceivably could be an endpoint of one (or more) of the intervals $[a_1, b_1], [a_2, b_2], \ldots, [a_k, b_k]$.
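The procedure described by Lemma 3 (evaluate formula (1), then compare it with the endpoints $a$ and $b$) can be sketched as follows. The function names are hypothetical, and the sketch assumes every fixed point is strictly positive so that the logarithms are defined.

```python
import math

def entropy(points):
    """Entropy (natural log) of the density points[j] / sum(points)."""
    total = sum(points)
    return -sum((x / total) * math.log(x / total) for x in points if x > 0)

def stationary_point(fixed):
    """Formula (1): exp(sum(x_j ln x_j) / sum(x_j)) over the fixed points."""
    return math.exp(sum(x * math.log(x) for x in fixed) / sum(fixed))

def best_common_point(a, b, k, fixed):
    """Maximize f over [a, b]: check a, b and, if it lies inside, formula (1)."""
    candidates = [a, b]
    x0 = stationary_point(fixed)
    if a < x0 < b:
        candidates.append(x0)
    # f(x) is the entropy of k copies of x together with the fixed points.
    return max(candidates, key=lambda x: entropy([x] * k + list(fixed)))

# Hypothetical call: one free interval [0.5, 0.67], fixed points 0.33 and 0.75.
print(best_common_point(0.5, 0.67, 1, [0.33, 0.75]))
```

The exponential form is used rather than the product form because it avoids raising small numbers to small powers, but the two expressions are the same quantity.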
III. A CRUDE ALGORITHM
Using the lemmas in the previous section, we can provide an algorithm for finding an $x$ that maximizes entropy. By Corollary 2, an $x = (x_1, x_2, \ldots, x_n)$ that maximizes entropy will be either a set of endpoints, or endpoints of some set of intervals and an interior point of the intersection of the other intervals, with multiplicity the number of intervals in that intersection. Here are the steps.
1) For each set $\mathcal{S}$ of intervals with nontrivial intersection, assign to each interval not in $\mathcal{S}$ one of its endpoints. The intersection of the intervals in $\mathcal{S}$ is some interval $[a, b]$. Use Lemma 3 above to calculate the resulting entropies. There are a lot of calculations to make, one for each choice of an endpoint of each interval not in $\mathcal{S}$. Note that one choice for $\mathcal{S}$ is just one interval, and hence a solution might be a set of endpoints.
2) Of all the entropies calculated in step 1, choose an $x$ that gives the maximum value. There may be many solutions, for example if the intersection of all the intervals is nontrivial.
Example 4: Take the three disjoint intervals $I_1 = [0.25, 0.33]$, $I_2 = [0.5, 0.67]$, $I_3 = [0.75, 1]$. Testing all possibilities as described above will show that the maximum entropy is achieved for $\mathcal{S} = \{I_2\}$ and the choice of endpoints $x_1 = 0.33 \in I_1$ and $x_3 = 0.75 \in I_3$. Hence the maximum entropy is the maximum value of
$$-\left( \frac{0.33}{1.08 + x} \ln \frac{0.33}{1.08 + x} + \frac{x}{1.08 + x} \ln \frac{x}{1.08 + x} + \frac{0.75}{1.08 + x} \ln \frac{0.75}{1.08 + x} \right)$$
for $x \in [0.5, 0.67]$. Setting the derivative equal to 0 gives $x = 0.5836$, so the candidates are $x = (0.33, 0.5836, 0.75)$, $x = (0.33, 0.5, 0.75)$, and $x = (0.33, 0.67, 0.75)$. The first yields the maximum entropy, which is 1.0475.
Example 5: Take the three intervals $I_1 = [0.25, 0.33]$, $I_2 = [0.3, 0.8]$, $I_3 = [0.6, 0.7]$. Testing all possibilities as described above will show that the maximum entropy is achieved for $\mathcal{S} = \{I_2\}$ and the choice of endpoints $x_1 = 0.33 \in I_1$ and $x_3 = 0.6 \in I_3$. Hence the maximum entropy is the maximum value of
$$-\left( \frac{0.33}{0.93 + x} \ln \frac{0.33}{0.93 + x} + \frac{x}{0.93 + x} \ln \frac{x}{0.93 + x} + \frac{0.6}{0.93 + x} \ln \frac{0.6}{0.93 + x} \right)$$
for $x \in [0.3, 0.8]$. Setting the derivative equal to 0 gives $x = 0.48531295$, so the candidates are $x = (0.33, 0.48531295, 0.6)$, $x = (0.33, 0.3, 0.6)$, and $x = (0.33, 0.8, 0.6)$. The first yields the maximum entropy, which is 1.070312.
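The crude algorithm of this section can be sketched as a brute-force search. This is only an illustration under stated assumptions: the function name `crude_max_entropy` is hypothetical, "nontrivial intersection" is read as "nonempty intersection", and every left endpoint is assumed to be strictly positive so that the logarithms are defined. The running time grows exponentially with the number of intervals.

```python
import itertools
import math

def entropy(points):
    """Entropy (natural log) of the density points[j] / sum(points)."""
    total = sum(points)
    return -sum((x / total) * math.log(x / total) for x in points if x > 0)

def crude_max_entropy(intervals):
    """Search all candidates of steps 1 and 2; return (entropy, points)."""
    n = len(intervals)
    best = (-math.inf, None)
    for size in range(1, n + 1):
        for free in itertools.combinations(range(n), size):
            a = max(intervals[i][0] for i in free)
            b = min(intervals[i][1] for i in free)
            if a > b:
                continue  # the chosen intervals do not intersect
            rest = [i for i in range(n) if i not in free]
            # One endpoint for every interval outside the chosen set.
            for ends in itertools.product(*(set(intervals[i]) for i in rest)):
                candidates = {a, b}
                if ends:
                    # Stationary point of Lemma 3, formula (1).
                    x0 = math.exp(sum(e * math.log(e) for e in ends) / sum(ends))
                    if a < x0 < b:
                        candidates.add(x0)
                for x in candidates:
                    points = [0.0] * n
                    for i in free:
                        points[i] = x
                    for i, e in zip(rest, ends):
                        points[i] = e
                    best = max(best, (entropy(points), points))
    return best

# The intervals of Example 4.
value, points = crude_max_entropy([(0.25, 0.33), (0.5, 0.67), (0.75, 1.0)])
print(value, points)
```

Every candidate evaluated is a feasible choice of points, so the returned entropy never exceeds the true maximum.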
IV. A REFINED ALGORITHM
The crude algorithm requires many calculations, but if the number of intervals is not large, this may not be a bother. However, the number of calculations can be significantly reduced.
Let $L = \max\{a_i\}$ and $R = \min\{b_i\}$. If $L \le R$, then every interval $[a_i, b_i]$ contains $[L, R]$, so that setting $x_i = x$ for any point $x$ in $[L, R]$ yields maximum entropy. Thus we may assume that $R < L$, which we now do. Note that $R > 0$.
Lemma 6: For any $x$ that maximizes entropy, each $x_i \ge R$. Similarly, each $x_i \le L$.
Proof: Suppose that $x_i \in [a_i, b_i]$ with $x_1 = x_2 = \cdots = x_k < x_{k+1} \le \cdots \le x_n$ and with $x_{k+1} \le R$. Let $S = \sum_{i=1}^{n} x_i$ and let $x_1 + h < x_{k+1}$. Note that for $i \le k$,
$$S - k x_i = S - \sum_{j=1}^{k} x_j = \sum_{j=k+1}^{n} x_j.$$
Then the entropy corresponding to $x_1 + h, x_2 + h, \ldots, x_k + h, x_{k+1}, \ldots, x_n$ is
$$E(h) = -\sum_{i=1}^{k} \frac{x_i + h}{S + kh} \ln \frac{x_i + h}{S + kh} - \sum_{i=k+1}^{n} \frac{x_i}{S + kh} \ln \frac{x_i}{S + kh}.$$
A calculation shows that the derivative of $E$ with respect to $h$ is
$$E'(h) = \frac{k}{(S + hk)^2} \left( \sum_{i=k+1}^{n} x_i \ln \frac{x_i}{h + x_1} \right)$$
and for $0 \le h \le x_{k+1} - x_1$, we have for $i \ge k + 1$ that $\frac{x_i}{h + x_1} \ge 1$, so this latter expression is positive on that interval.
Thus we may assume that for any $x$ that maximizes entropy, all its entries that are less than or equal to $R$ are equal. So if $x$ is a solution, then it has entries
$$x_1 = x_2 = \cdots = x_k < x_{k+1} \le \cdots \le x_n$$
with $x_k \le R$ and $R < x_{k+1}$. If $x_k < R$, then the same proof as above, restricting $h$ to be $\le R - x_k$, shows that all the entries of any $x$ that maximizes entropy must be $\ge R$.
Similarly, any such $x$ must have all its entries $\le L$. Thus if $x = (x_1 \le x_2 \le \cdots \le x_n)$ maximizes entropy, we may assume that $x_1 = R$ and $x_n = L$.
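The derivative formula for $E(h)$ in the proof of Lemma 6 can be compared with a central finite difference. The sketch below uses hypothetical data and hypothetical function names; it is a numerical sanity check, not part of the proof.

```python
import math

def entropy(points):
    """Entropy (natural log) of the density points[j] / sum(points)."""
    total = sum(points)
    return -sum((x / total) * math.log(x / total) for x in points)

def shifted_entropy(low, k, rest, h):
    """E(h): the k smallest entries (all equal to low) are raised by h."""
    return entropy([low + h] * k + list(rest))

def derivative_formula(low, k, rest, h):
    """E'(h) = k / (S + hk)^2 * sum over i > k of x_i ln(x_i / (h + x_1))."""
    S = k * low + sum(rest)
    return k / (S + h * k) ** 2 * sum(x * math.log(x / (h + low)) for x in rest)

# Hypothetical data: x_1 = x_2 = 0.2 and the remaining entries 0.3 and 0.5.
low, k, rest, h, eps = 0.2, 2, [0.3, 0.5], 0.05, 1e-6
numeric = (shifted_entropy(low, k, rest, h + eps)
           - shifted_entropy(low, k, rest, h - eps)) / (2 * eps)
print(numeric, derivative_formula(low, k, rest, h))  # both positive, nearly equal
```

A positive slope means that raising the smallest entries together increases entropy, which is the content of the lemma.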
Since for every $x = (x_1, x_2, \ldots, x_n)$ maximizing entropy, $R \le x_i \le L$, finding the $x_i$ so that $x = (x_1, x_2, \ldots, x_n)$ maximizes entropy for the intervals $[a_1, b_1], [a_2, b_2], \ldots, [a_n, b_n]$ is the same as finding the $x_i$ so that $x = (x_1, x_2, \ldots, x_n)$ maximizes entropy for the intervals $[R \vee a_1, L \wedge b_1], [R \vee a_2, L \wedge b_2], \ldots, [R \vee a_n, L \wedge b_n]$. Keep in mind that we are always assuming that $R < L$. So our original problem can be transformed as follows. First, from the $[a_i, b_i]$, calculate $R$ and $L$. If $L \le R$, for any $x \in [L, R]$, set all $x_i = x$. If $R < L$, form the intervals $[R \vee a_i, L \wedge b_i]$, and find the requisite $x_i$ for this set of intervals.
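The transformation just described can be sketched in a few lines. The function name `trim` and the convention of returning `None` when $L \le R$ are assumptions made for this illustration; the sample data are the intervals $J_1, \ldots, J_7$ of Example 8.

```python
def trim(intervals):
    """Return (R, L, trimmed) with R = min b_i and L = max a_i.

    When L <= R every interval contains [L, R], any common point there is a
    solution, and trimmed is None. Otherwise each interval [a, b] is replaced
    by [max(R, a), min(L, b)].
    """
    R = min(b for _, b in intervals)
    L = max(a for a, _ in intervals)
    if L <= R:
        return R, L, None
    return R, L, [(max(R, a), min(L, b)) for a, b in intervals]

J = [(0.2, 0.3), (0.35, 0.5), (0.15, 0.375), (0.1, 0.3),
     (0.35, 0.7), (0.25, 0.5), (0.4, 0.8)]
R, L, trimmed = trim(J)
print(R, L)
print(trimmed)
```

Here $R \vee a$ is `max(R, a)` and $L \wedge b$ is `min(L, b)`.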
Thus we may reformulate the problem as follows.
Reformulation: Let $0 < R < L \le 1$, and let $[a_1, b_1], [a_2, b_2], \ldots, [a_n, b_n]$ be subintervals of $[R, L]$ with $a_1 = b_1 = R$ and $a_n = b_n = L$. The problem is as always: find $x_i \in [a_i, b_i]$ which maximizes
$$-\sum_{i=1}^{n} \frac{x_i}{\sum_{j=1}^{n} x_j} \ln \frac{x_i}{\sum_{j=1}^{n} x_j}.$$
The reformulated problem has the same solution (or perhaps solutions) as the original problem. What are the advantages of this formulation? Among the subintervals of $[R, L]$, one is $[R, R]$, and one is $[L, L]$. Call these $[a_1, b_1]$ and $[a_n, b_n]$ respectively. So $x_1 = a_1 = b_1$, and $x_n = a_n = b_n$, and we have two of the required $x_i$ immediately. In general, the reformulation is just a special case of the original problem, so all our lemmas may be applied. But we can do better.
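Lemma 7: In the reformulated problem, if $a_k < b_k = L$, then $x_k < b_k$. Similarly, if $R = a_k < b_k$, then $a_k < x_k$.
Proof: This lemma says that if $x_k = L$, then $a_k = b_k = L = b_n$. In other words, if $a_k < b_k = b_n$, then the endpoint $b_n = L$ is not $x_k$. So let $x_i \in [a_i, b_i]$, $i = 1, 2, \ldots, n$, and $a_k < b_k = b_n = x_k = L$. Then
$$H(x) = -\sum_{i=1}^{n} \frac{x_i}{\sum_{j=1}^{n} x_j} \ln \frac{x_i}{\sum_{j=1}^{n} x_j}$$
is the entropy of the system. Let $S = \sum_{j=1}^{n} x_j$ and consider
$$f(h) = -\sum_{i=1, i \ne k}^{n} \frac{x_i}{S - h} \ln \frac{x_i}{S - h} - \frac{x_k - h}{S - h} \ln \frac{x_k - h}{S - h}$$
for $0 \le h < L - R$. Note that, under our assumptions, $h < x_k$. Then a calculation shows that
$$f'(h) = \frac{1}{(S - h)^2} \left( \sum_{i=1, i \ne k}^{n} x_i \left( \ln (x_k - h) - \ln x_i \right) \right).$$
If $h = 0$, then $f'(h) = \frac{1}{S^2} \sum_{i=1, i \ne k}^{n} x_i (\ln x_k - \ln x_i) = d > 0$ since $x_i \le x_k$ for all $i$, and $x_1 = R < x_k$. Since $f'$ is a continuous function of $h$, there is a sufficiently small positive $h$ such that
$$f'(h) = \frac{1}{(S - h)^2} \left( \sum_{i=1, i \ne k}^{n} x_i \left( \ln (x_k - h) - \ln x_i \right) \right) > 0.$$
Hence $f$ is increasing near $0$, and since $a_k < b_k$, replacing $x_k$ by $x_k - h$ for such a small $h$ keeps it in $[a_k, b_k]$ and increases entropy, so $x_k = L$ cannot be part of a solution. A similar proof works for $R = a_k < b_k$.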
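The sign of $f'(0)$ in the proof of Lemma 7 can be illustrated on a small instance. The three intervals and all function names below are hypothetical; the sketch only checks numerically that moving a point slightly away from the endpoint $L$ raises entropy.

```python
import math

def entropy(points):
    """Entropy (natural log) of the density points[j] / sum(points)."""
    total = sum(points)
    return -sum((x / total) * math.log(x / total) for x in points)

def lowered_entropy(points, k, h):
    """f(h): entropy after lowering points[k] by h."""
    moved = list(points)
    moved[k] -= h
    return entropy(moved)

def derivative_formula(points, k, h):
    """f'(h) = (S - h)^-2 * sum over i != k of x_i (ln(x_k - h) - ln x_i)."""
    S = sum(points)
    xk = points[k]
    return sum(x * (math.log(xk - h) - math.log(x))
               for i, x in enumerate(points) if i != k) / (S - h) ** 2

# Hypothetical reformulated problem: [0.3, 0.3], [0.3, 0.4], [0.4, 0.4],
# with the middle point placed at the endpoint L = 0.4.
x = [0.3, 0.4, 0.4]
print(derivative_formula(x, 1, 0.0))             # positive slope at h = 0
print(lowered_entropy(x, 1, 0.01) - entropy(x))  # entropy gain for a small h
```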
We are now ready to formulate the refined algorithm, which will be roughly the crude algorithm but more efficient. In particular, there will be fewer cases to consider. A few preliminary remarks are needed.
Corollary 2 says that any solution is an endpoint of some set of intervals and a common interior point of the intersection of the rest of the intervals. So we search out all such situations and choose the one with maximum entropy. The goal is to search out all such situations efficiently. Now any nonempty intersection of a set $\mathcal{S}$ of intervals is a member of $\mathcal{S}$ or is the intersection of two members of $\mathcal{S}$.
We are in the following situation. We have $R < L$ and $n$ closed subintervals $[a_i, b_i]$, $i = 1, 2, \ldots, n$, of $[R, L]$. Also $a_1 = b_1 = R$, and $a_n = b_n = L$. The refined algorithm is this:
1) Find the intersection of all pairs of intervals from $[a_i, b_i]$, $i = 2, 3, \ldots, n - 1$, such that these intersections are proper closed intervals, that is, are of the form $[a, b]$ with $a < b$. Let $\mathcal{I}$ be the set of these intersections together with all the proper intervals in the original collection.
2) For each interval $I \in \mathcal{I}$, form the family $\mathcal{S}_I$ of all intervals containing $I$. So for each intersection $[a, b]$, $\mathcal{S}_{[a,b]}$ is the family of all the intervals in $\{[a_i, b_i] : i = 2, 3, \ldots, n - 1\}$ containing $[a, b]$. Note that $\mathcal{S}_I$ respects multiplicities of the intervals that occurred in the original family.
3) For those intervals $[a_i, b_i]$ not in $\mathcal{S}_I$ and such that $a_i < b_i$, choose an endpoint in the interior of $[R, L]$. If $a_i < a_j < b_i < b_j$, do not choose both $a_j$ and $b_i$; such endpoints cannot be part of a solution maximizing entropy by Lemma 1. If no admissible choice of endpoints exists, proceed to another $\mathcal{S}_I$. For the intervals of the form $[R, R]$ and $[L, L]$, which include $[a_1, b_1]$ and $[a_n, b_n]$, choose their single point. This gets for each $[a_i, b_i] \notin \mathcal{S}_I$ an endpoint $x_i$, except for those $I$ for which no proper selection of endpoints is possible.
4) For each $I \in \mathcal{I}$, with each possible set of endpoints gotten from step 3, use Lemma 3 to calculate the resulting entropy. There may be several sets of possible endpoints for a given $I \in \mathcal{I}$.
5) Repeat for all $I \in \mathcal{I}$. Now choose from the resulting candidates the one that gives maximum entropy.
Note that the resulting solution given by this algorithm is an exact expression in the endpoints of the given intervals. This algorithm, in general, requires many fewer calculations than the crude algorithm. For all we know, there may be more than one solution, even when $R < L$, but we suspect not. And, of course, there may be a more efficient algorithm.
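Steps 1 and 2 of the refined algorithm can be sketched as below. The function names are hypothetical, intervals are assumed to be given as `(a, b)` pairs already trimmed to $[R, L]$, and families are returned as lists of 0-based indices so that repeated intervals keep their multiplicity. The sample data are the trimmed intervals of Example 8.

```python
from itertools import combinations

def candidate_intersections(intervals):
    """Step 1: proper intervals plus proper pairwise intersections, deduplicated."""
    proper = [iv for iv in intervals if iv[0] < iv[1]]
    found = set(proper)
    for (a1, b1), (a2, b2) in combinations(proper, 2):
        a, b = max(a1, a2), min(b1, b2)
        if a < b:
            found.add((a, b))
    return sorted(found)

def containing_family(target, intervals):
    """Step 2: 0-based indices of all intervals that contain the target."""
    a, b = target
    return [i for i, (lo, hi) in enumerate(intervals) if lo <= a and b <= hi]

trimmed = [(0.3, 0.3), (0.35, 0.4), (0.3, 0.375), (0.3, 0.3),
           (0.35, 0.4), (0.3, 0.4), (0.4, 0.4)]
for target in candidate_intersections(trimmed):
    print(target, containing_family(target, trimmed))
```

Filtering on proper intervals automatically leaves out the degenerate intervals $[R, R]$ and $[L, L]$, which step 1 excludes by restricting to $i = 2, \ldots, n - 1$.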
Example 8: Consider the following subintervals of $[0, 1]$: $J_1 = [0.2, 0.3]$, $J_2 = [0.35, 0.5]$, $J_3 = [0.15, 0.375]$, $J_4 = [0.1, 0.3]$, $J_5 = [0.35, 0.7]$, $J_6 = [0.25, 0.5]$, $J_7 = [0.4, 0.8]$. Then
$$[R, L] = [0.3, 0.4].$$
We trim the intervals to live inside $[R, L]$ to get the family $I_1 = [0.3, 0.3]$, $I_2 = [0.35, 0.4]$, $I_3 = [0.3, 0.375]$, $I_4 = [0.3, 0.3]$, $I_5 = [0.35, 0.4]$, $I_6 = [0.3, 0.4]$, $I_7 = [0.4, 0.4]$. Now we have 4 proper intervals $\{I_2, I_3, I_5, I_6\}$ and 3 points $\{P_1 = 0.3, P_4 = 0.3, P_7 = 0.4\}$.
1) Step 1 above yields the new interval $I_8 = I_2 \cap I_3 = [0.35, 0.375]$, and $\mathcal{I} = \{I_2, I_3, I_6, I_8\}$ (note that $I_5 = I_2$).
2) For each interval $I \in \mathcal{I}$, form the family $\mathcal{S}_I$ of all intervals containing $I$. That yields
a) $\mathcal{S}_{I_2} = \{I_2, I_5, I_6\}$
b) $\mathcal{S}_{I_3} = \{I_3, I_6\}$
c) $\mathcal{S}_{I_6} = \{I_6\}$
d) $\mathcal{S}_{I_8} = \{I_2, I_3, I_5, I_6\}$
3) For each of these four families there happens to be only one choice for endpoints for each of the intervals not in $\mathcal{S}_I$.
a) $\mathcal{S}_{I_2} = \{I_2, I_5, I_6\}$ pairs with $P_{I_2} = \{P_1 = 0.3, P_3 = 0.375, P_4 = 0.3, P_7 = 0.4\}$
b) $\mathcal{S}_{I_3} = \{I_3, I_6\}$ pairs with $P_{I_3} = \{P_1 = 0.3, P_2 = 0.35, P_4 = 0.3, P_5 = 0.35, P_7 = 0.4\}$
c) $\mathcal{S}_{I_6} = \{I_6\}$ pairs with $P_{I_6} = \{P_1 = 0.3, P_2 = 0.35, P_3 = 0.375, P_4 = 0.3, P_5 = 0.35, P_7 = 0.4\}$. But by Lemma 1, the endpoints $P_2$ and $P_3$ cannot both be part of a solution, so we discard this option.
d) $\mathcal{S}_{I_8} = \{I_2, I_3, I_5, I_6\}$ pairs with $P_{I_8} = \{P_1 = 0.3, P_4 = 0.3, P_7 = 0.4\}$
4) For each of these three remaining options $\mathcal{S}_I$, we compute the values
$$f_I(x) = -\left( k_I \frac{x}{S_I} \ln \frac{x}{S_I} + \sum_{x_j \in P_I} \frac{x_j}{S_I} \ln \frac{x_j}{S_I} \right)$$
where $k_I$ is the number of intervals in $\mathcal{S}_I$, $S_I = k_I x + \sum_{x_j \in P_I} x_j$, and $x$ is either an endpoint of the intersection of $\mathcal{S}_I$ that lies in the interior of $[R, L]$, or
$$x = \left( \prod_{x_j \in P_I} x_j^{x_j} \right)^{1 / \sum_{x_j \in P_I} x_j}.$$
a) For $\mathcal{S}_{I_2} = \{I_2, I_5, I_6\}$ we have $I_2 = [0.35, 0.4]$ and $P_{I_2} = \{0.3, 0.375, 0.3, 0.4\}$. Then
$$x = (0.23300137)^{1/1.375} = 0.34665468 \notin [0.35, 0.4]$$
so we compute entropy only at the endpoint $0.35$, which yields $f(0.35) = 1.9410992$.
b) For $\mathcal{S}_{I_3} = \{I_3, I_6\}$, we have $I_3 \cap I_6 = [0.3, 0.375]$ and $P_{I_3} = \{0.3, 0.35, 0.3, 0.35, 0.4\}$, and
$$x = (0.16141518)^{1/1.7} = 0.34204632 \in [0.3, 0.375]$$
so we compute entropy at this $x$ and one endpoint:
$$f(0.34204632) = 1.9416277, \qquad f(0.375) = 1.94074.$$
d) For $\mathcal{S}_{I_8} = \{I_2, I_3, I_5, I_6\}$ we have $I_2 \cap I_3 \cap I_5 \cap I_6 = [0.35, 0.375]$ and $P_{I_8} = \{0.3, 0.3, 0.4\}$. Then
$$x = 0.33658654 \notin [0.35, 0.375]$$
so we compute entropy at the two endpoints:
$$f(0.35) = 1.9415733, \qquad f(0.375) = 1.9403483.$$
Thus we have computed
a) $f(0.35) = 1.9410992$
b) $f(0.34204632) = 1.9416277$
b) $f(0.375) = 1.94074$
d) $f(0.35) = 1.9415733$
d) $f(0.375) = 1.9403483$
and see that the maximum entropy $1.9416277$ is achieved at
$$x = (0.3, 0.35, 0.34204632, 0.3, 0.35, 0.34204632, 0.4).$$
The probabilities yielding this maximum entropy are
$$p_1 = p_4 = 0.12583404, \quad p_2 = p_5 = 0.14680638, \quad p_3 = p_6 = 0.14347023, \quad p_7 = 0.16777872.$$
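The five candidate values of Example 8 can be recomputed with a short script. The dictionary layout and function names are hypothetical; the numbers of intervals, the intersections, and the fixed endpoints are taken from options a), b), and d) of the example.

```python
import math

def entropy(points):
    """Entropy (natural log) of the density points[j] / sum(points)."""
    total = sum(points)
    return -sum((x / total) * math.log(x / total) for x in points)

def stationary_point(fixed):
    """Formula (1) of Lemma 3 applied to the fixed endpoints."""
    return math.exp(sum(x * math.log(x) for x in fixed) / sum(fixed))

R, L = 0.3, 0.4
# label: (number of intervals k_I, intersection [a, b], fixed endpoints P_I)
options = {
    "a": (3, (0.35, 0.4), [0.3, 0.375, 0.3, 0.4]),
    "b": (2, (0.3, 0.375), [0.3, 0.35, 0.3, 0.35, 0.4]),
    "d": (4, (0.35, 0.375), [0.3, 0.3, 0.4]),
}
results = []
for label, (k, (a, b), fixed) in options.items():
    # Endpoints of the intersection count only if interior to [R, L].
    candidates = [e for e in (a, b) if R < e < L]
    x0 = stationary_point(fixed)
    if a < x0 < b:
        candidates.append(x0)
    for x in candidates:
        results.append((entropy([x] * k + fixed), label, x))

for value, label, x in sorted(results, reverse=True):
    print(label, x, value)
best = max(results)
```

The largest value comes from option b) at the stationary point, in agreement with the solution reported in the example.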
V. COMMENTS
We have not shown for $R < L$ that there is a unique solution. This seems to be an interesting technical problem, but it may yield to more sophisticated analytical techniques. The algorithms provided to find a solution are tedious, but could easily be programmed.
REFERENCES
[1] J. Abellan and S. Moral, "Maximum of entropy for credal sets," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 11, pp. 587-597, 2003.
[2] A. Meyerowitz, F. Richman, and E. A. Walker, "Calculating maximum-entropy probability densities for belief functions," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 2, pp. 377-389, 1994.
[3] H. T. Nguyen and E. A. Walker, "On decision-making using belief functions," in Advances in the Dempster-Shafer Theory of Evidence, R. Yager, M. Fedrizzi, and J. Kacprzyk, Eds., John Wiley & Sons: New York, 1994, pp. 311-330.
[4] H. T. Nguyen and E. A. Walker, A First Course in Fuzzy Logic, 3rd ed., Chapman & Hall/CRC: Boca Raton, Florida, 2006.