
The Widget Problem Revisited

MYRON TRIBUS AND GARY FITTS
Abstract-The Jaynes "widget problem" is reviewed as an example of an application of the principle of maximum entropy in the making of decisions. The exact solution yields an unusual probability distribution. The problem illustrates why some kinds of decisions can be made intuitively and accurately, but would be difficult to rationalize without the principle of maximum entropy.
INTRODUCTION
THE WIDGET problem, introduced by Jaynes [1], is paraphrased as follows:

Suppose you are the manager of a widget factory which has adopted the advertising slogan, "Your order filled in 24 hours or your money back." Your job is to attempt to anticipate the orders for widgets each day and thereby to protect the company against loss. For complex technological reasons, which need not be explained here, widgets are made at the rate of 200 per day and must all be painted one color: red, yellow, or green. If today you decide to paint the widgets green, tomorrow you may decide to paint them yellow, but all 200 must be painted the same color. Your sole task is to decide what color to paint the day's run of widgets.

Suppose you look into the storeroom and discover that there are some widgets already on hand. Presumably these are widgets left over from previous days' production. The data from the stockroom are given in Table I.

TABLE I
FIRST STATE OF KNOWLEDGE

              Red   Yellow   Green
In Stock      100    150       50

Given only the data in Table I, what would you decide? Call this decision D1.¹

¹ Most people choose D1 = green.

Of course, if you could obtain more information, you might come to another decision. Suppose, therefore, that you consulted more of the company records and found out how many widgets per day were being ordered. That is, you found out how many widgets were ordered last year and divided by the total number of days orders were taken. Table II gives the second state of knowledge.

TABLE II
SECOND STATE OF KNOWLEDGE

                       Red   Yellow   Green
In Stock               100    150       50
Average Daily Order     50    100       10

With this information, you might decide to change your mind from the decision you made with only the first state of knowledge. With the second state of knowledge, what would you decide? Call this decision D2.²

² Most people choose D2 = yellow.

Manuscript received January 2, 1968. This research was supported by a grant from the National Science Foundation. The authors are with the Thayer School of Engineering, Dartmouth College, Hanover, N.H. 03753.
Clearly, no one will feel unstable if decisions D1 and D2 disagree. In some sense, D1 represents the best that could be done with state of knowledge 1 and D2 represents the best that could be done with state of knowledge 2. Few persons will feel secure in making decision D2; most prefer additional knowledge.

Suppose that in addition to the information in Table II you learned that the average order size varies among the widgets. That is, people tend to order a lot of red widgets at a time, but they order the yellow widgets in small orders. Table III gives the third state of knowledge, which includes data on the average individual order size.

TABLE III
THIRD STATE OF KNOWLEDGE

                                 Red   Yellow   Green
In Stock                         100    150       50
Average Daily Order               50    100       10
Average Individual Order Size     75     10       20

With the information in Table III, now what would you do? Call this decision D3.³

³ Most people do not choose yellow or green.

Again, it will not be surprising if D2 and D3 differ, for clearly the proper decision must depend upon the knowledge available. If this contention were not true, there would be no point in data gathering.

Suppose, finally, that just as you were about to release your decision D3, the telephone rang and one of the salesmen called up and said, "Congratulate me-I just landed an order for 40 green widgets." Table IV shows this state of knowledge.

TABLE IV
FOURTH STATE OF KNOWLEDGE

                                 Red   Yellow   Green
In Stock                         100    150       50
Average Daily Order               50    100       10
Average Individual Order Size     75     10       20
Specific Order                                    40

Now what will you do? Call this D4.⁴

⁴ Most people cannot make up their minds about D4.

This problem was solved approximately by Jaynes [1], but the approximate solution does not reveal some rather interesting features of the exact solution to be presented here.

Before taking up the formal solution, it may be reported that Jaynes' widget problem has been presented to many gatherings of engineers who have been asked to vote on D1, D2, D3, and D4. There is almost unanimous agreement about D1. There is about 85 percent agreement on D2. There is about 70 percent agreement on D3, and almost no agreement on D4. One conclusion stands out from these informal tests: the average engineer has remarkably good intuition in problems of this kind. The majority vote for D1, D2, and D3 has always been in agreement with the formal mathematical solution. However, there has been almost universal disagreement over how to defend the intuitive solution. That is, while many engineers could agree on the best course of action, they were much less in agreement on why that course was the best one.

This latter point offers a basis for the importance of modern decision theory. It is not clear that we know how to solve in a formal manner the really interesting practical decision problems of engineering. The techniques of the field are still too few and the data required for solutions are often not at hand. Even simple problems, like the widget problem, are formidable exercises in mathematics. Yet, even if we are not entirely sure of our decisions, we like to be sure of how we arrived at them. Decision theory, looked upon as a communication tool, is much easier to defend than decision theory looked upon as a method of making "right" decisions. Indeed, there are no "right" decisions; there are only decisions which are consistent with the given information and whose basis can be rationally related to the given information. In this spirit, D1, D2, D3, and D4, which may differ from one another, are all "right" decisions to the same question.

SETTING UP THE PROBLEM

If the decision is A_i and the number of red, yellow, and green widgets ordered turns out to be n_r, n_y, n_g, the loss will be

    L_i = \sum_{j=r,y,g} u(n_j - s_j - 200\delta_{ij})        (1)

where u(x) is the ramp function, equal to 0 for negative x and equal to x for positive x, s_j is the number of widgets of color j in stock, and \delta_{ij} = 1 if i = j, 0 otherwise. If X represents a state of knowledge, the expected loss will be
    (L|A_i X) = \sum_{n_r=0}^{\infty} \sum_{n_y=0}^{\infty} \sum_{n_g=0}^{\infty} \sum_{j=r,y,g} u(n_j - s_j - 200\delta_{ij})\, p(n_r n_y n_g|X)        (2)

              = \sum_{j=r,y,g} \sum_{n_r=0}^{\infty} \sum_{n_y=0}^{\infty} \sum_{n_g=0}^{\infty} u(n_j - s_j - 200\delta_{ij})\, p(n_r n_y n_g|X)        (3)

              = \sum_{j=r,y,g} \sum_{n_j=0}^{\infty} u(n_j - s_j - 200\delta_{ij})\, p(n_j|X).        (4)

In the last equation we have made use of the fact that

    p(n_j|X) = \sum_{n_i=0}^{\infty} \sum_{n_k=0}^{\infty} p(n_i n_j n_k|X).        (5)

It is helpful to note that the boldface symbols which appear inside the parentheses of the probability measure are Boolean variables which are either true or false. The variables appearing in conventional type are ordinary algebraic variables which take on number values. Thus, the symbol n_r is read "the number of red widgets ordered is n_r." The symbol p(n_r|X) is read "the probability that the
statement n_r is true, conditional on the truth of the statement X, is the real number p." In this sense, the assignment
of a set of numbers as p values does not represent any
objective verifiable quantity in the real world. Rather it is
an encoding of the knowledge represented by X; p is not
taken to be a measure of anyone's strength of belief, or
degree of conviction. Rather it is a unique encoding of the
knowledge implied by the statement X. The central
problem of applied probability theory, therefore, is the
formulation of a rational basis for the assignment of p
values, given well-defined statements concerning the
conditional knowledge upon which the assignments are
based. In this sense, the use of p values as a "degree of
belief" may be interpreted as the assignment by someone
who is hazy about the X upon which the probability
assignment is based.
This way of looking at the matter is fundamental to the
methods used in this paper and the previous one by
Jaynes. It seems to be fundamental to the Bayesian
approach to problems of inference, though the matter is
not often put in precisely these terms.
Once it is accepted that the task is one of encoding
knowledge, there arise two problems. The first is the task
of being unambiguous about the knowledge to be encoded.
This is a formidable task and is central to all problems of
applied mathematics. The best that can be done is to be
explicit about the problem being solved, i.e., to define
what is and is not included in the problem statement.
This means the symbol X must be unambiguously defined.
The second task requires the invocation of a principle.
That is, since we are encoding partial knowledge (otherwise the p would be zeros or ones), the given information
will, in general, not suffice to establish a unique set of p
values, called a distribution. Rather there will be many
sets of p values which are consistent with the given information. The problem is to choose the one which in some
sense is "best." Thus the problem of probability assignment may be stated to be the search for a distribution
function which agrees with the given information (i.e., is
consistent) and satisfies a choice criterion.
The criterion proposed by Jaynes is Shannon's entropy
measure. According to this choice, Jaynes' principle of
maximum entropy may be stated as follows.
The minimally prejudiced distribution function is that
which maximizes the entropy S where
    S = -\sum p \ln p

subject to the given constraints.

One of the results of the application of Jaynes' principle is that where there are no constraints, other than the requirement that the probability distribution be normalized, all values of p_i are set equal to one another. Thus Laplace's principle of insufficient reason emerges as a special case of the application of the principle of maximum entropy. It has been shown that this principle arises from considerations of consistency, and in the cases in which there are no constraints beyond normalization it is easy to see that any other assignment of probabilities would be inconsistent. If the only way to tell one outcome from another is by the label attached to the outcome, then the assignment of probabilities must be invariant to a change in the labels. If this were not so, the labels would have more meaning than stated and there would be more information than implied by the normalization process. But this contention is contrary to the statement that the labels represent all that is known. This line of reasoning is due to Robert Evans and is similar to one advanced by David Hume in 1740. (See Jaynes [2].)
SOLUTION FOR THE FIRST STATE OF KNOWLEDGE X1
If X1 represents the knowledge in Table I (plus the general statement of the problem), the principle of maximum entropy will be found to give no reason for assigning values of p(n_r|X1) different from p(n_y|X1) and p(n_g|X1). Therefore, for state of knowledge X1 we have

    p(n_r|X1) = p(n_y|X1) = p(n_g|X1)        (6)

and the sum indicated in (4) will be a minimum when the expression

    u(n_r - 100 - 200\delta_{ir}) + u(n_y - 150 - 200\delta_{iy}) + u(n_g - 50 - 200\delta_{ig})        (7)

is a minimum. For the numbers given, this minimum occurs when i = g, i.e., the decision D1 is to paint the widgets green.
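The argument does not depend on which common distribution is assigned to the three order quantities. The following minimal sketch (not part of the original paper) assigns an arbitrary illustrative Poisson distribution to all three colors and evaluates the expectation of expression (7) for each decision; the Poisson form and its mean are assumptions of the sketch only.

```python
# Sketch of the first-state argument: with one common distribution p(n)
# assigned to red, yellow, and green orders alike, the expected value of
# expression (7) is smallest when the day's run is painted green.
# The Poisson form and its mean are illustrative assumptions only.
import math

def ramp(x):
    return x if x > 0 else 0

def poisson_pmf(n, mean):
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))

stock = {"red": 100, "yellow": 150, "green": 50}

def expected_loss(decision, mean=80.0, n_max=600):
    total = 0.0
    for color, s in stock.items():
        threshold = s + (200 if color == decision else 0)
        total += sum(ramp(n - threshold) * poisson_pmf(n, mean)
                     for n in range(n_max + 1))
    return total

for d in stock:
    print(d, round(expected_loss(d), 3))
# Painting green gives the smallest expected loss, matching D1 = green.
```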
SOLUTION TO THE SECOND STATE OF KNOWLEDGE X2
In this case, X2 represents the data given in Table II. If we identify the averages given in the table with the expectation values for the probability distributions (and if the data were sparse this would certainly be a questionable step), we have, as an encoding of this knowledge, the following constraints:

    \sum_{n_r=0}^{\infty} \sum_{n_y=0}^{\infty} \sum_{n_g=0}^{\infty} p(n_r n_y n_g|X2) = 1        (8)

    \sum_{n_r=0}^{\infty} \sum_{n_y=0}^{\infty} \sum_{n_g=0}^{\infty} n_r\, p(n_r n_y n_g|X2) = 50        (9)

    \sum_{n_r=0}^{\infty} \sum_{n_y=0}^{\infty} \sum_{n_g=0}^{\infty} n_y\, p(n_r n_y n_g|X2) = 100        (10)

    \sum_{n_r=0}^{\infty} \sum_{n_y=0}^{\infty} \sum_{n_g=0}^{\infty} n_g\, p(n_r n_y n_g|X2) = 10.        (11)

Equations (8) through (11) represent an encoding of the data of Table II. Any probability distribution chosen to encode this information should be consistent with these equations. The principle of maximum entropy may be applied by maximizing

    S = -\sum_{n_r=0}^{\infty} \sum_{n_y=0}^{\infty} \sum_{n_g=0}^{\infty} p(n_r n_y n_g|X2) \ln p(n_r n_y n_g|X2)        (12)

subject to (8) through (11).
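As a numerical aside (not in the original paper), the maximization in (12) can be checked for a single color on a truncated support: the entropy-maximizing distribution with a fixed mean has the exponential form p(n) proportional to e^{-beta n}, which is the geometric distribution obtained analytically in the next section. The truncation length and the use of scipy are assumptions of this sketch.

```python
# Numerical check: on a truncated support 0..N, the maximum-entropy
# distribution with a fixed mean has the form p(n) ~ exp(-beta*n).
# Here beta is found for <n_r> = 50 and compared with the closed form
# p(n) = (1/51)(50/51)**n derived in the text.  N and scipy are assumptions.
import numpy as np
from scipy.optimize import brentq

N = 5000                      # truncation of the infinite support
n = np.arange(N + 1)

def mean_given_beta(beta):
    w = np.exp(-beta * n)
    return np.sum(n * w) / np.sum(w)

beta = brentq(lambda b: mean_given_beta(b) - 50.0, 1e-6, 1.0)
p = np.exp(-beta * n)
p /= p.sum()

print("beta           :", round(beta, 5))          # ln(51/50), about 0.0198
print("p(0) numerical :", round(p[0], 5), " closed form 1/51 =", round(1 / 51, 5))
print("entropy        :", round(-(p * np.log(p)).sum(), 4))
```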
Writing

    p(n_r n_y n_g|X2) = p(n_r|X2)\, p(n_y|X2)\, p(n_g|X2),

we have

    S = S_r + S_y + S_g
      = -\sum_{n_r=0}^{\infty} p(n_r|X2) \ln p(n_r|X2) - \sum_{n_y=0}^{\infty} p(n_y|X2) \ln p(n_y|X2) - \sum_{n_g=0}^{\infty} p(n_g|X2) \ln p(n_g|X2)        (13)

and, as before,

    \sum_{n_r=0}^{\infty} p(n_r|X2) = 1, \quad \sum_{n_y=0}^{\infty} p(n_y|X2) = 1, \quad \sum_{n_g=0}^{\infty} p(n_g|X2) = 1        (14)

    \sum_{n_r=0}^{\infty} n_r\, p(n_r|X2) = 50, \quad \sum_{n_y=0}^{\infty} n_y\, p(n_y|X2) = 100, \quad \sum_{n_g=0}^{\infty} n_g\, p(n_g|X2) = 10.        (15)

By Lagrange's method of undetermined multipliers, it is straightforward to show that the maximum-entropy distributions consistent with these constraints are

    p(n_r|X2) = \frac{1}{51}\left(\frac{50}{51}\right)^{n_r}        (16)

    p(n_y|X2) = \frac{1}{101}\left(\frac{100}{101}\right)^{n_y}        (17)

    p(n_g|X2) = \frac{1}{11}\left(\frac{10}{11}\right)^{n_g}.        (18)

Equations (16) through (18) may be written compactly as

    p(n_j|X2) = \frac{1}{1+\langle n_j\rangle}\left(\frac{\langle n_j\rangle}{1+\langle n_j\rangle}\right)^{n_j}        (19)

where \langle n_j\rangle = 50, 100, 10 for j = r, y, g. Putting these results in (4), we have

    (L|A_i X2) = \sum_{j=r,y,g} \sum_{n_j=0}^{\infty} u(n_j - s_j - 200\delta_{ij})\, p(n_j|X2).        (20)

Since the ramp function is zero for n_j < s_j + 200\delta_{ij}, we have, letting x_j = n_j - s_j - 200\delta_{ij},

    (L|A_i X2) = \sum_{j=r,y,g} \sum_{x_j=0}^{\infty} x_j\, \frac{1}{1+\langle n_j\rangle} \left(\frac{\langle n_j\rangle}{1+\langle n_j\rangle}\right)^{x_j + s_j + 200\delta_{ij}}.        (21)

The last series is readily found, so that (21) becomes

    (L|A_i X2) = \sum_{j=r,y,g} \langle n_j\rangle \left(\frac{\langle n_j\rangle}{1+\langle n_j\rangle}\right)^{s_j + 200\delta_{ij}}.        (22)

Table V summarizes the results from (22).

TABLE V
EXPECTED LOSSES WITH STATE OF KNOWLEDGE X2

              Red        Yellow     Green
Decision      Widgets    Widgets    Widgets       Total
Red           0.13       22.5       0.085         22.7
Yellow        6.9        3.1        0.085         10.1
Green         6.9        22.5       4 x 10^-10    29.4

The smallest total loss corresponds to D2 = yellow, in agreement with intuition.
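A minimal sketch (not part of the original paper) that reproduces Table V directly from (22); the only inputs are the stocks and mean daily orders of Table II:

```python
# Sketch reproducing Table V from (22): the expected loss on color j is
# <n_j> * (<n_j>/(1+<n_j>))**(s_j + 200*delta_ij).
stock = {"red": 100, "yellow": 150, "green": 50}     # s_j, Table II
mean  = {"red": 50,  "yellow": 100, "green": 10}     # <n_j>, Table II

def losses_x2(decision):
    row = {}
    for j in stock:
        m = stock[j] + (200 if j == decision else 0)
        q = mean[j] / (1 + mean[j])
        row[j] = mean[j] * q**m
    row["total"] = sum(row[j] for j in stock)
    return row

for d in ("red", "yellow", "green"):
    print(d, {k: round(v, 3) for k, v in losses_x2(d).items()})
# Totals of about 22.7, 10.1, and 29.4, as in Table V; the yellow decision
# gives the smallest expected loss (D2 = yellow).
```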
THE THIRD STATE OF KNOWLEDGE

The third state of knowledge introduces the concept of the order size, hence a more complex encoding is needed to convey the additional information. Let

    u_i = the number of orders for i red widgets is u_i
    v_j = the number of orders for j yellow widgets is v_j
    w_k = the number of orders for k green widgets is w_k.

Also, let

    n_r = the number of red widgets ordered is n_r
    n_y = the number of yellow widgets ordered is n_y
    n_g = the number of green widgets ordered is n_g

and

    m_r = the number of orders for red widgets is m_r
    m_y = the number of orders for yellow widgets is m_y
    m_g = the number of orders for green widgets is m_g.

The quantities are related by

    \sum_{i=1}^{\infty} i\,u_i = n_r        (23)

    \sum_{j=1}^{\infty} j\,v_j = n_y        (24)

    \sum_{k=1}^{\infty} k\,w_k = n_g        (25)

and by \sum_i u_i = m_r, \sum_j v_j = m_y, \sum_k w_k = m_g.

Since the values of u_i, v_j, and w_k are unknown, the best that can be done is to encode the knowledge of these quantities in a probability distribution. Let

    p(\{u\}\{v\}\{w\}|X3) = p(u_1 u_2 \cdots v_1 v_2 \cdots w_1 w_2 \cdots |X3)        (26)

represent the probability that a particular set of orders will be received. We have

    p(\{u\}\{v\}\{w\}|X3) = p(\{u\}|X3)\, p(\{v\}|X3)\, p(\{w\}|X3)        (27)
and

    p(\{u\}|X3) = \prod_{i=1}^{\infty} p(u_i|X3)        (28)

    p(\{v\}|X3) = \prod_{j=1}^{\infty} p(v_j|X3)        (29)

    p(\{w\}|X3) = \prod_{k=1}^{\infty} p(w_k|X3).        (30)

The maximum-entropy assignment of p(\{u\}|X3), p(\{v\}|X3), and p(\{w\}|X3) consistent with Table III is that which maximizes

    S = -\sum_{i=1}^{\infty} \sum_{u_i=0}^{\infty} p(u_i|X3) \ln p(u_i|X3)        (31)

subject to

    \sum_{u_i=0}^{\infty} p(u_i|X3) = 1, \qquad i = 1, 2, \ldots        (32)

    \sum_{i=1}^{\infty} \sum_{u_i=0}^{\infty} i\,u_i\, p(u_i|X3) = \langle n_r\rangle = 50        (33)

    \sum_{i=1}^{\infty} \sum_{u_i=0}^{\infty} u_i\, p(u_i|X3) = \langle m_r\rangle = 50/75 = 0.667        (34)

with similar equations for p(v_j|X3) and p(w_k|X3). The maximum-entropy distribution is

    p(u_i|X3) = e^{-\lambda_{0i} - \lambda_1 u_i - \lambda_2 i u_i}        (35)

where

    \lambda_{0i} = \ln \sum_{u_i=0}^{\infty} e^{-(\lambda_1 + i\lambda_2) u_i}        (36)

              = -\ln\left(1 - e^{-\lambda_1 - i\lambda_2}\right).        (37)

Hence

    p(u_i|X3) = \left(1 - e^{-\lambda_1 - i\lambda_2}\right) e^{-(\lambda_1 + i\lambda_2) u_i}.        (38)

\lambda_1 and \lambda_2 must satisfy the equations

    \sum_{i=1}^{\infty} \sum_{u_i=0}^{\infty} i\,u_i \left(1 - e^{-\lambda_1 - i\lambda_2}\right) e^{-(\lambda_1 + i\lambda_2) u_i} = 50        (39)

    \sum_{i=1}^{\infty} \sum_{u_i=0}^{\infty} u_i \left(1 - e^{-\lambda_1 - i\lambda_2}\right) e^{-(\lambda_1 + i\lambda_2) u_i} = 50/75.        (40)

The sums over u_i are readily found, hence

    \sum_{i=1}^{\infty} \frac{i}{e^{\lambda_1 + i\lambda_2} - 1} = 50        (41)

    \sum_{i=1}^{\infty} \frac{1}{e^{\lambda_1 + i\lambda_2} - 1} = 50/75.        (42)

Similar equations are found for the other two colors. The values of \lambda_1 and \lambda_2 that satisfy the two preceding equations are found via digital computer. The results are tabulated in Table VI.

TABLE VI
STATE OF KNOWLEDGE X3

             Red       Yellow    Green
s            100       150       50
<n>          50        100       10
<m>          0.667     10        0.5
λ1           4.727     0.51      3.64
λ2           0.0131    0.0825    0.0512
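A numerical consistency check of Table VI (a sketch under stated assumptions, not the authors' original program): for each color, λ1 and λ2 must satisfy (41) and (42) with that color's <n> and <m>. The truncation of the sum, the use of scipy.optimize.fsolve, and the use of the tabulated values as starting guesses are assumptions of the sketch.

```python
# Check of Table VI: solve (41)-(42) for (lambda_1, lambda_2) per color.
#   sum_i i/(exp(l1 + i*l2) - 1) = <n>,   sum_i 1/(exp(l1 + i*l2) - 1) = <m>.
# Truncation I and scipy.optimize.fsolve are assumptions of this sketch.
import numpy as np
from scipy.optimize import fsolve

I = 2000                               # truncation of the sum over order sizes
i = np.arange(1, I + 1)

def residuals(lams, n_mean, m_mean):
    l1, l2 = lams
    x = 1.0 / (np.exp(l1 + i * l2) - 1.0)
    return [np.sum(i * x) - n_mean, np.sum(x) - m_mean]

table_vi = {                           # <n>, <m>, and tabulated (l1, l2) guesses
    "red":    (50.0, 50.0 / 75.0, (4.727, 0.0131)),
    "yellow": (100.0, 10.0,       (0.51,  0.0825)),
    "green":  (10.0,  0.5,        (3.64,  0.0512)),
}

for color, (n_mean, m_mean, guess) in table_vi.items():
    l1, l2 = fsolve(residuals, guess, args=(n_mean, m_mean))
    print("%-6s lambda_1 = %.4f  lambda_2 = %.4f" % (color, l1, l2))
# The solutions should lie close to the rounded values quoted in Table VI.
```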
The results given thus far recapitulate in compact form those given by Jaynes. We now consider the exact computation of p(n_r|X3) from the distribution functions p(u_i|X3). Since, as Jaynes pointed out, the expectation of u_i, i.e., \langle u_i|X3\rangle, is

    \langle u_i|X3\rangle = \frac{1}{e^{\lambda_1 + i\lambda_2} - 1},        (43)

which is a Bose-Einstein distribution, the results of the exact calculation may have interest to more than managers of widget factories.

To compute p(n_r|X3) from p(\{u\}|X3) it is convenient to use the z transform. If f(n) is a function of the discrete variable n, the z transform F(z) is defined by

    F(z) \equiv \sum_{n=0}^{\infty} f(n)\, z^n, \qquad 0 < z < 1.        (44)

We use the identity

    p(n_r|X3) = \sum_{u_1=0}^{\infty} \sum_{u_2=0}^{\infty} \cdots\, p(n_r\{u\}|X3)        (45)

              = \sum_{u_1=0}^{\infty} \sum_{u_2=0}^{\infty} \cdots\, p(n_r|\{u\}X3)\, p(\{u\}|X3).        (46)

But

    p(n_r|\{u\}X3) = \delta\left(n_r - \sum_{j=1}^{\infty} j\,u_j\right)        (47)

where \delta(x) = 1 if x = 0 and \delta(x) = 0 elsewhere. From (28), we have

    p(n_r|X3) = \sum_{u_1=0}^{\infty} \sum_{u_2=0}^{\infty} \cdots\, \delta\left(n_r - \sum_{j=1}^{\infty} j\,u_j\right) \prod_{i=1}^{\infty} p(u_i|X3).        (48)

If P(z) is the z transform of p(n_r|X3),

    P(z) = \sum_{n_r=0}^{\infty} \sum_{u_1=0}^{\infty} \sum_{u_2=0}^{\infty} \cdots\, z^{n_r}\, \delta\left(n_r - \sum_{j} j\,u_j\right) \prod_{i} p(u_i|X3)        (49)

         = \sum_{u_1=0}^{\infty} \sum_{u_2=0}^{\infty} \cdots\, z^{\sum_j j u_j} \prod_{i=1}^{\infty} p(u_i|X3)        (50)

         = \sum_{u_1=0}^{\infty} \sum_{u_2=0}^{\infty} \cdots\, \prod_{i=1}^{\infty} p(u_i|X3)\, z^{i u_i}        (51)

         = \prod_{i=1}^{\infty} \sum_{u_i=0}^{\infty} z^{i u_i}\, p(u_i|X3).        (52)

Similar equations are found for the other two colors. From (38), we have

    P(z) = \prod_{i=1}^{\infty} \left(1 - e^{-\lambda_1 - i\lambda_2}\right) \prod_{i=1}^{\infty} \sum_{u_i=0}^{\infty} \left(e^{-\lambda_1 - i\lambda_2} z^i\right)^{u_i}.        (53)
TABLE VII
VALUES OF C_{j,n}

 n \ j    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
  1       1
  2       1   1
  3       1   1   1
  4       1   2   1   1
  5       1   2   2   1   1
  6       1   3   3   2   1   1
  7       1   3   4   3   2   1   1
  8       1   4   5   5   3   2   1   1
  9       1   4   7   6   5   3   2   1   1
 10       1   5   8   9   7   5   3   2   1   1
 11       1   5  10  11  10   7   5   3   2   1   1
 12       1   6  12  15  13  11   7   5   3   2   1   1
 13       1   6  14  18  18  14  11   7   5   3   2   1   1
 14       1   7  16  23  23  20  15  11   7   5   3   2   1   1
 15       1   7  19  27  30  26  21  15  11   7   5   3   2   1   1
 16       1   8  21  34  37  35  28  22  15  11   7   5   3   2   1   1
 17       1   8  24  39  47  44  38  29  22  15  11   7   5   3   2   1   1
We note that P(0) = \prod_{i=1}^{\infty} \left(1 - e^{-\lambda_1 - i\lambda_2}\right), hence

    P(z) = P(0) \prod_{i=1}^{\infty} \sum_{u_i=0}^{\infty} \left(e^{-\lambda_1 - i\lambda_2} z^i\right)^{u_i}.        (54)

Writing out the first few terms in the product of sums, we find

    P(z)/P(0) = \left[1 + e^{-\lambda_1}(z e^{-\lambda_2}) + e^{-2\lambda_1}(z e^{-\lambda_2})^2 + e^{-3\lambda_1}(z e^{-\lambda_2})^3 + \cdots\right]
                \cdot \left[1 + e^{-\lambda_1}(z e^{-\lambda_2})^2 + e^{-2\lambda_1}(z e^{-\lambda_2})^4 + e^{-3\lambda_1}(z e^{-\lambda_2})^6 + \cdots\right]
                \cdot \left[1 + e^{-\lambda_1}(z e^{-\lambda_2})^3 + e^{-2\lambda_1}(z e^{-\lambda_2})^6 + \cdots\right] \cdots

              = 1 + (z e^{-\lambda_2})\, e^{-\lambda_1}
                  + (z e^{-\lambda_2})^2 \left[e^{-\lambda_1} + e^{-2\lambda_1}\right]
                  + (z e^{-\lambda_2})^3 \left[e^{-\lambda_1} + e^{-2\lambda_1} + e^{-3\lambda_1}\right]
                  + (z e^{-\lambda_2})^4 \left[e^{-\lambda_1} + 2e^{-2\lambda_1} + e^{-3\lambda_1} + e^{-4\lambda_1}\right] + \cdots

or

    P(z)/P(0) = \sum_{n=0}^{\infty} C_n \left(z e^{-\lambda_2}\right)^n, \qquad C_n = \sum_{j=1}^{n} C_{j,n}\, e^{-j\lambda_1}.        (55)

The coefficient C_{j,n} is the number of sets \{u_i\} such that \sum_i u_i = j and \sum_i i\,u_i = n. A proof that the C_{j,n} may be found from the recursion formula

    C_{j,n} = C_{j-1,n-1} + C_{j,n-j}        (56)

is given in the Appendix. The values of C_{j,n} for n = 1, 2, 3, 4 were found by inspection of the series expansion; all other values were then found by use of the recursion formula. Table VII gives the values of C_{j,n}.

Let P_0 = p(n_r|X3) for n_r = 0. Taking the inverse transform of P(z), we find

    p(n_r|X3) = P_0\, C_{n_r}\, e^{-n_r \lambda_2}.        (57)
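A minimal sketch of this computation for red widgets (not the authors' original program): the C_{j,n} are built from (56), the C_n from (55), and p(n_r|X3) from (57), using the red-widget parameters of Table VI. The truncation limits N and I_MAX are assumptions of the sketch.

```python
# Sketch: build C_{j,n} from the recursion (56), form C_n as in (55), and
# recover p(n_r|X3) from (57) for red widgets.
import math

N = 1000                          # largest order count carried
l1, l2 = 4.727, 0.0131            # lambda_1, lambda_2 for red (Table VI)

# C[j][n] = number of sets {u_i} with sum u_i = j and sum i*u_i = n
# (equivalently, the number of partitions of n into exactly j parts).
C = [[0] * (N + 1) for _ in range(N + 1)]
C[0][0] = 1
for j in range(1, N + 1):
    for n in range(1, N + 1):
        C[j][n] = C[j - 1][n - 1] + (C[j][n - j] if n >= j else 0)

# P_0 = P(0) = product over i of (1 - exp(-l1 - i*l2)), truncated at I_MAX.
I_MAX = 5000
P0 = math.exp(sum(math.log(1.0 - math.exp(-l1 - i * l2))
                  for i in range(1, I_MAX + 1)))

def p_red(n):
    """p(n_r = n | X3) from (55)-(57)."""
    if n == 0:
        return P0
    C_n = sum(C[j][n] * math.exp(-j * l1) for j in range(1, n + 1))
    return P0 * C_n * math.exp(-n * l2)

print("p(0)          =", round(p_red(0), 3))     # about 0.51, as in Fig. 1
print("normalization =", round(sum(p_red(n) for n in range(N + 1)), 3))
print("mean          =", round(sum(n * p_red(n) for n in range(N + 1)), 1))
# The mean comes out near the constrained value of 50; small discrepancies
# reflect the rounding of the tabulated lambdas and the truncation at N.
```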
THE PROBABILITY DISTRIBUTIONS FOR STATE OF KNOWLEDGE X3

From the recursion formula (56), the equation for the coefficients C_n (55), and the formula (57), it is straightforward to program a digital computer to use the values of the parameters \lambda_1 and \lambda_2 given in Table VI and thereby compute the probability distributions for widgets of each color. The results are shown in Fig. 1.

Fig. 1 indicates a rather unusual probability distribution for red and green widgets. There is a high probability that there will be no orders for widgets. But if any widgets are ordered at all, the order is likely to be a big one. Clearly this distribution function deserves to be called a "feast-or-famine" distribution. It is typical of a business which gets only a few large orders at a time. The distribution for yellow widgets is almost Gaussian.

To compute the losses associated with the various decisions, it is necessary only to compute the weighted probability beyond the amount of material in stock. For example, if A_r = "paint the widgets red" and a_r = "do not paint the widgets red," we find (L_r = losses due to failure to supply red widgets)

    (L_r|a_r X3) = \sum_{n_r = s_r}^{\infty} (n_r - s_r)\, p(n_r|X3)        (58)

and

    (L_r|A_r X3) = \sum_{n_r = s_r + 200}^{\infty} (n_r - s_r - 200)\, p(n_r|X3).        (59)

Similar expressions are used for the other colors. The results of the calculations for the third state of knowledge are summarized in Table VIII.
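Continuing the preceding sketch (it assumes the function p_red and the limit N defined there), the two losses (58) and (59) for red widgets can be evaluated directly; the other colors follow by substituting their Table VI parameters.

```python
# Builds on the previous sketch: p_red(n) and N are assumed to be defined
# there.  The losses follow (58) and (59) with s_r = 100 from Table III.
s_r = 100

loss_not_painted = sum((n - s_r) * p_red(n) for n in range(s_r, N + 1))
loss_painted = sum((n - s_r - 200) * p_red(n) for n in range(s_r + 200, N + 1))

print("red loss if not painted red:", round(loss_not_painted, 2))  # near 19.6
print("red loss if painted red    :", round(loss_painted, 2))      # near 2.61
# Both values should lie close to the corresponding entries of Table VIII.
```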
[Fig. 1. Probability distributions, third and fourth states of knowledge. The horizontal axis is the number of widgets ordered. The probability that no widgets at all are ordered is 0.510 for red and 0.604 for green.]
TABLE VIII
STATE OF KNOWLEDGE X3

                        Expected Losses
              Red        Yellow     Green
Decision      Widgets    Widgets    Widgets    Total
Red           2.61       5.16       1.31        9.08
Yellow        19.6       0          1.31       20.91
Green         19.6       5.16       0          24.76
From Table VIII, the minimum loss is seen to occur for D3 = red.

In the fourth state of knowledge, the only change is to make s_g = 50 - 40 = 10. The result of the analysis for the fourth state of knowledge is given in Table IX.

TABLE IX
STATE OF KNOWLEDGE X4

                        Expected Losses
              Red        Yellow     Green
Decision      Widgets    Widgets    Widgets    Total
Red           2.61       5.16       6.74       14.51
Yellow        19.6       0          6.74       26.34
Green         19.6       5.16       0          24.76

According to the analysis of Table IX, the best decision is D4 = red.

CONCLUSIONS

It is interesting to note the good agreement between the maximum-entropy solutions and the intuitive judgments. It is not until the fourth state of knowledge that the intuition begins to falter. The reason for the difficulty is clear. As seen in Table IX, there is not too much to choose among the given alternatives. While the decision is difficult, it appears to be unpleasant only because of the way the problem was structured. It is true that the minimum loss in the third state of knowledge is 9; this is a small price for being able to sell all 500 widgets.

When the number of widgets produced per day changes from 200 to 250, some of the answers change. It would spoil the fun for the reader if we were to tell what the changes are. Our study of this problem tells us that Jaynes chose just the right numbers to reach the limit of most people's intuitive abilities. It is most interesting to vary the numbers and compare the intuitive answers with those obtained using the principle of maximum entropy. We have not undertaken the problem of trying to minimize the losses for the next two days' operations. Clearly the manufacturer is in trouble on the second day. These interesting variations on the basic problem will be left to a future analysis.

It is now about ten years since Jaynes introduced the principle of maximum entropy as a general method of statistical inference [2]. In that time numerous applications have been demonstrated in a wide variety of fields. The generality and utility of the method seem no longer to be in doubt.

APPENDIX
PROOF OF THE RECURSION FORMULA

Let S_{j,n} be the class of all sets of non-negative integers \{u_i\}, i \ge 1, such that

    \sum_i i\,u_i = n, \qquad \sum_i u_i = j.        (60)

Partition S_{j,n} into two disjoint classes A and B defined by

    A = all sets of S_{j,n} for which u_1 = 0
    B = all sets of S_{j,n} for which u_1 \ne 0.

It will now be demonstrated that there exists a one-to-one correspondence between the sets of A and the sets of S_{j,n-j}, and a one-to-one correspondence between the sets of B and the sets of S_{j-1,n-1}.

Let \{v_i\} be a set of S_{j,n-j}. Let \{w_i\} be a set defined by

    w_1 = 0, \qquad w_i = v_{i-1} \text{ for } i \ge 2.        (61)

We have, from the definition of \{v_i\},

    \sum_i v_i = j, \qquad \sum_i i\,v_i = n - j.        (62)

For \{w_i\} we have

    \sum_{i\ge 1} w_i = \sum_{i\ge 2} v_{i-1} = \sum_{i\ge 1} v_i = j        (63)

    \sum_{i\ge 1} i\,w_i = \sum_{i\ge 2} i\,v_{i-1} = \sum_{i\ge 1} (i+1)\,v_i = \sum_{i\ge 1} i\,v_i + \sum_{i\ge 1} v_i = (n-j) + j = n.        (64)

Therefore, \{w_i\} is a set of A. Clearly every set \{v_i\} corresponds to a set \{w_i\}. The first relation has, therefore, been proved. By the converse argument, it is easy to see that every set \{w_i\} corresponds to a set \{v_i\}.

Let \{x_i\} be a set of S_{j-1,n-1}. Therefore

    \sum_i x_i = j - 1, \qquad \sum_i i\,x_i = n - 1.        (65)

Let \{y_i\} be defined by y_1 = x_1 + 1, y_i = x_i for i \ge 2. Therefore

    \sum_{i\ge 1} y_i = 1 + x_1 + \sum_{i\ge 2} x_i = 1 + \sum_{i\ge 1} x_i = 1 + (j-1) = j        (66)

    \sum_{i\ge 1} i\,y_i = 1 + x_1 + \sum_{i\ge 2} i\,x_i = 1 + \sum_{i\ge 1} i\,x_i = 1 + (n-1) = n.        (67)

Therefore, the set \{y_i\} is a set of B and there is a one-to-one correspondence between the sets of B and the sets of S_{j-1,n-1}.

Since the sets of S_{j,n} partition into either A or B, it follows that

    number of sets of S_{j,n} = number of sets of A + number of sets of B
                              = number of sets of S_{j,n-j} + number of sets of S_{j-1,n-1}.

If C_{j,n} is the number of sets in S_{j,n},

    C_{j,n} = C_{j,n-j} + C_{j-1,n-1}.        (68)
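As a quick cross-check of the Appendix (a sketch that is not part of the original paper), the recursion (68) can be compared with a direct enumeration of the sets S_{j,n} over the range covered by Table VII:

```python
# Brute-force check of the recursion (56)/(68): count the sets {u_i} with
# sum u_i = j and sum i*u_i = n directly and compare with the recursion.
# Intended for small n only.
from functools import lru_cache

def count_direct(j, n, max_part=None):
    """Number of multisets of positive integers with j parts summing to n."""
    if max_part is None:
        max_part = n
    if j == 0:
        return 1 if n == 0 else 0
    return sum(count_direct(j - 1, n - p, p) for p in range(1, min(max_part, n) + 1))

@lru_cache(maxsize=None)
def C(j, n):
    if j == 0 and n == 0:
        return 1
    if j <= 0 or n <= 0 or j > n:
        return 0
    return C(j - 1, n - 1) + C(j, n - j)

assert all(C(j, n) == count_direct(j, n)
           for n in range(1, 18) for j in range(1, n + 1))
print("recursion agrees with direct enumeration for n = 1..17")
```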
REFERENCES
[1] E. T. Jaynes, "New engineering applications of information theory," in Proc. 1st Symp. on Engineering Applications of Random Function Theory and Probability, J. L. Bogdanoff and F. Kozin, Eds. New York: Wiley, 1963, pp. 163-203.
[2] E. T. Jaynes, "Information theory and statistical mechanics," pt. I, Phys. Rev., vol. 106, pp. 620-630, 1957; pt. II, ibid., vol. 108, pp. 171-191, 1957.
[3] M. Tribus, "Information theory as the basis for thermostatics and thermodynamics," J. Appl. Mech., vol. 28, p. 108, March 1961.
[4] M. Tribus, "The maximum entropy estimate in reliability," in Recent Developments in Information and Decision Processes. New York: Macmillan, 1962.
Probabilistic Information Processing Systems:
Design and Evaluation
WARD EDWARDS, LAWRENCE D. PHILLIPS, MEMBER, IEEE, WILLIAM L. HAYS, AND BARBARA C. GOODMAN

Abstract-A Probabilistic Information Processing System (PIP) uses men and machines in a novel way to perform diagnostic information processing. Men estimate likelihood ratios for each datum and each pair of hypotheses under consideration, or a sufficient subset of these pairs. A computer aggregates these estimates by means of Bayes' theorem of probability theory into a posterior distribution that reflects the impact of all available data on all hypotheses being considered. Such a system circumvents human conservatism in information processing, the inability of men to aggregate information in such a way as to modify their opinions as much as the available data justify. It also fragments the job of evaluating diagnostic information into small separable tasks. The posterior distributions that are a PIP's output may be used as a guide to human decision making or may be combined with a payoff matrix to make decisions by means of the principle of maximizing expected value.

A large simulation-type experiment compared a PIP with three other information processing systems in a simulated strategic war setting of the 1970's. The difference between PIP and its competitors was that in PIP the information was aggregated by computer, while in the other three systems the operators aggregated the information in their heads. PIP processed the information dramatically more efficiently than did any competitor. Data that would lead PIP to give 99:1 odds in favor of a hypothesis led the next best system to give 4½:1 odds.

An auxiliary experiment showed that if PIP operators are allowed to know the current state of the system's opinions about the hypotheses it is considering, they perform less effectively than if they do not have this information.

Manuscript received April 2, 1968. This work was supported in part by Air Force Systems Command, Electronic Systems Division, Decision Sciences Laboratory, under Contract AF 19(628)-2823, and in part by the Air Force Office of Scientific Research under Contract AF 49(638)-1731. Preparation of this report was supported by NASA under Grant NGR-23-005-171.
W. Edwards and Barbara C. Goodman are with the Institute of Science and Technology, University of Michigan, Ann Arbor, Mich.
L. D. Phillips was with the Institute of Science and Technology, University of Michigan, Ann Arbor, Mich. He is now with Brunel University, London, England.
W. L. Hays is with the College of Literature, Science and the Arts, University of Michigan, Ann Arbor, Mich.
INTRODUCTION
THIS PAPER presents the basic ideas of a class of systems called Probabilistic Information Processing Systems (PIP) and describes the results of a large simula-