UNIVERSITY OF NORTH CAROLINA
Department of Statistics
Chapel Hill, N. C.
A BAYESIAN INDIFFERENCE POSTULATE
(Preliminary Report)
by
Melvin R. Novick, March 1962
This research was primarily supported by the Office
of Naval Research under contract No. Nonr-855(09)
for research in probability and statistics at the
University of North Carolina, Chapel Hill, N. C.
Reproduction in whole or in part is permitted for
any purpose of the United States Government.
Supplemental support was received by the National
Science Foundation, Grant G-5824.
Institute of Statistics
Mimeo Series No. 319
A BAYESIAN INDIFFERENCE POSTULATE¹,²
(Preliminary Report)
by
Melvin R. Novick
Department of Statistics
and
The Psychometric Laboratory
The University of North Carolina
"The only thing that I know is that I know nothing."
Attributed to Socrates.
1. Introduction and Summary.
A Bayesian indifference postulate and a mode of estimating the density of a random variable are proposed. If $f_\theta(x)$ is the density of a random variable $X$ dependent on a parameter $\theta \in \Theta$, and $\xi$ is a prior distribution for $\theta$, the prior marginal density of $X$ may be defined as

$$f_\xi(x) = \int_\Theta f_\theta(x)\, d\xi(\theta). \qquad (1.1)$$
In cases where no prior information is available it is proposed that a prior distribution $\xi$ be chosen which minimizes the Shannon information measure of $f_\xi(x)$. Some sufficient conditions are given under which the prior distribution will be uniquely specified.
In common examples the choice of $\xi$ agrees with that specified by the usual application of the Bayes postulate. The proposed postulate, however, leads to a choice of $f_\xi$ which is invariant under one-to-one transformation of the parameter space. The posterior marginal density of $X$ is considered as an estimator of the true density of $X$. Under general conditions, the sequence of posterior marginals
¹This research was primarily supported by the Office of Naval Research under contract No. Nonr-855(09) for research in probability and statistics at the University of North Carolina, Chapel Hill, N. C. Reproduction in whole or in part is permitted for any purpose of the United States Government. Supplemental support was received by the National Science Foundation, Grant G-5824.
²Presented as a contributed paper at the Eastern Regional Meeting of the Institute of Mathematical Statistics, April 1962, Chapel Hill, N. C.
is consistent and has certain interesting information-theoretic properties. This preliminary report is expository in nature; proofs of the results obtained are presented when they contribute materially to the conceptual development of the theory.
2. The Parametric Problem
We associate with the possible outcomes of an experiment the real- or vector-valued random variable $X$ taking values in $\mathcal{X}$. We assume that there exists a density function $f(x)$ (with respect to a suitable measure) which may serve as an adequate probabilistic representation of the relative frequencies of the possible outcomes of the experiment. If $f(x)$ were known, no statistical problem would exist, as the probability of any possible outcome could be calculated. When $f(x)$ is unknown a statistical problem exists and may be approached in different ways. An important consideration is the character of the assumptions the statistician is willing to make concerning the nature of $f$.
A common assumption is that $f$ belongs to a class of distributions $\mathcal{F}$ and that these distributions may be indexed by a parameter $\theta \in \Theta$ (real- or vector-valued). We say then that the problem is in parametric form. The parametrization of a class of distributions, however, is not unique. If $\varphi = \varphi(\theta)$ is a one-to-one transformation on $\Theta$, then $\varphi$ might as easily serve as a parametrization for $\mathcal{F}$. The choice of a particular parametrization has been subject to convention and convenience but not to logical specification.
A simple example may be illustrative. Consider $\mathcal{F}$ to be the class of normal distributions with mean zero. Then $\theta$ is real-valued, and an arbitrary member, $f_\theta$, of the class may be written

$$f_\theta(x) = \frac{1}{\theta\sqrt{2\pi}}\, e^{-x^2/2\theta^2}, \qquad -\infty < x < \infty,\ 0 < \theta < \infty. \qquad (2.1)$$

Either $\theta$ or $\theta^2$ may be considered to be the parameter, or the distribution may be parametrized by $h = 1/\theta^2$ [Raiffa and Schlaifer, 1961]. Often a parametrization is chosen so that the parameters coincide with moments of the distribution, as in the above case, where $\theta^2$, the variance of $X$, is often taken as the parameter. Other classes of distributions, for example the beta distributions, are not commonly parametrized by moments.
3. The Classical Approach
The classical approach to the statistical problem in its parametric formulation has been to consider the statistical problem to be one of drawing inferences concerning $\theta$. These inferences are usually expressed in the form of point estimates, interval estimates, or tests of hypotheses. Indeed, to avoid controversy, we will consider that to be the definition of "classical approach". The syntactic or mathematical problems arising from this formulation have been a major subject of research activity. The semantic problem, that is, the problem of making such inferences meaningful with respect to the scientific problem under study, has recently been given serious attention, particularly by those advocating Bayesian procedures. An inclination to add to the arguments against the meaningfulness of the classical approach will not be pursued. Those who subscribe to this approach are unlikely to be moved by further criticism here, while a more positive approach may be more compelling. We will note only that the classical approach has transformed the domain of study from the random variable to an arbitrary parameter.
4. The Bayesian Approach
In the Bayesian approach the parametric formulation is as in the classical approach. The additional assumption is that there is a distribution $\xi$ on $\Theta$ with probability element $d\xi(\theta)$. In Bayesian terminology $f_\theta(x)$ is the conditional density of $X$ given $\theta$, and $\xi$ is the unconditional or prior distribution of $\theta$. The marginal density of $X$, $f_\xi(x)$, was defined in (1.1). Given a vector of observations $\mathbf{x} = (x_1, \ldots, x_n)$, the posterior probability element of $\theta$ given $\mathbf{x}$ is given by

$$d\xi_n(\theta) = \frac{f_\theta(\mathbf{x})\, d\xi(\theta)}{f_\xi(\mathbf{x})}, \qquad (4.1)$$

where $f_\xi(\mathbf{x})$ is defined by (1.1) with $\mathbf{x}$ substituted for $x$. The posterior marginal density of $X$ is defined by

$$f_{\xi,n}(x) = \int_\Theta f_\theta(x)\, d\xi_n(\theta). \qquad (4.2)$$

The dependence of (4.1) and (4.2) on $\mathbf{x}$ is indicated only by the subscript $n$.

If it is possible to specify a prior distribution, the implied evaluation of prior information may be combined with the experimental results by (4.1) to yield a posterior distribution of $\theta$. In practice $\xi$ may be based upon prior observation of the process being studied or by analogy to some similar process.
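The mechanics of (4.1) and (4.2) can be sketched numerically by discretizing the parameter space. The following is my own illustration, not part of the text; the normal model, the grid, and the observed values are all hypothetical choices made only to exercise the formulas:

```python
import math

def normal_pdf(x, mean, sd=1.0):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

# Discretized parameter space Theta and a prior xi on it (hypothetical values).
thetas = [i * 0.1 for i in range(-50, 51)]
prior = [1.0 / len(thetas)] * len(thetas)

observations = [0.8, 1.1, 0.9]

# (4.1): posterior weights are proportional to likelihood times prior weight.
likelihood = [math.prod(normal_pdf(x, th) for x in observations) for th in thetas]
unnorm = [lk * pr for lk, pr in zip(likelihood, prior)]
total = sum(unnorm)
posterior = [u / total for u in unnorm]

# (4.2): posterior fiducial (marginal) density of a further observation x.
def posterior_fiducial(x):
    return sum(normal_pdf(x, th) * w for th, w in zip(thetas, posterior))

# The posterior Bayes distribution concentrates near the sample mean.
xbar = sum(observations) / len(observations)
mode = max(zip(thetas, posterior), key=lambda pair: pair[1])[0]
assert abs(sum(posterior) - 1.0) < 1e-9
assert abs(mode - xbar) < 0.1
```

The same two-step pattern (reweight the prior by the likelihood, then mix the model densities by the posterior weights) applies whatever the model density.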
The major unresolved difficulty in this procedure lies in specifying the prior distribution of $\theta$ when "nothing is known about $\theta$". A resolution of this problem by postulate was attempted, in a special case, by Bayes [1763]. A full discussion of the history of this problem may be found in Perks [1947].
The Bayes postulate stipulates that in the absence of prior knowledge, all points in the parameter space are to be taken as equally likely, the so-called "principle of indifference". For $a < \theta < b$, where $a$ and $b$ are arbitrary constants and $\theta$ is real-valued, a uniform prior distribution is taken. For $-\infty < \theta < \infty$, a normal distribution with "large" variance and arbitrary mean may be taken. For $0 < \theta < \infty$ an exponential distribution with large mean may be taken. The extension to vector-valued parameters is straightforward. An advantage of the Bayesian approach is that it permits direct mathematical probability statements concerning parameters. While these mathematical probability statements are not usually interpretable in a relative frequency sense, they may be considered as a model for a subjective evaluation of the parameter. The methods of point estimation, interval estimation, and tests of hypotheses have direct analogs in the Bayesian context and, it is felt, can be more meaningfully handled in that context since direct probability statements on $\theta$ can be made.
The major difficulty in the Bayes postulate arises in that it is not invariant under one-to-one transformation of the parameter space. As pointed out in section 2, the parametrization of a class of distributions is quite arbitrary and indeed, given any parametrization, say $\theta$, then $\varphi = \varphi(\theta)$ may also be taken as the parameter provided only that $\varphi$ is one-to-one. As an example suppose $X$ is binomially distributed. The class $\mathcal{F}$ of binomial densities may be parametrized by $p$, $0 < p < 1$, where $p$ is the probability of success in each of the Bernoulli trials which make up the experiment. We have then

$$f_p(x) = \binom{n}{x} p^x (1-p)^{n-x}, \qquad x = 0, 1, \ldots, n,$$

where $n$ is the total number of trials and $x$ is the number of successes.

Under the assumption that we have no prior information relative to the process, the Bayes postulate would specify that the prior distribution of $p$ should be uniform on $0 < p < 1$, which corresponds to the beta distribution

$$g(p) = \frac{p^{k-1}(1-p)^{\ell-1}}{\beta(k,\ell)}, \qquad 0 < p < 1;\ k, \ell > 0,$$

with parameters $k = \ell = 1$. However, the parameters $\varphi = \varphi(p) = 2 \arcsin \sqrt{p}$ and $\psi = \psi(p) = \ln \frac{p}{1-p}$ could also serve as parameters for $\mathcal{F}$ [Lindley, 1957]. The Bayes postulate would require uniform prior densities for $\varphi$ and $\psi$, which are equivalent to beta densities for $p$, in the first instance with parameters $k = \ell = \frac{1}{2}$ and in the second instance with parameters $k = \ell = \epsilon$ ($\epsilon$ arbitrarily small). Indeed if we restrict the prior to the class of beta densities, for every specification of $k = \ell = c$ (say) there could be found a transformation on $p$ such that the beta prior distribution on $p$ with parameters $k = \ell = c$ would be equivalent to a uniform distribution on the new parameter. Thus our problem would be to give a rule which would logically specify a value for $c$.
This general problem has persisted since the posthumous publication of Bayes' paper in 1763. Notable attempts [Jeffreys, 1948; Perks, 1947] have been made to circumvent this difficulty; however, the results have not met with general acceptance. One contemporary writer had ignored the problem in an earlier paper [Lindley, 1957] but recently [Lindley, 1961] has given it serious attention. A second has flatly asserted that it cannot be resolved [Raiffa and Schlaifer, 1961]. Early papers of Fisher contain excellent discussions of this and other points concerning Bayes theory. Fisher's discussion of the fiducial distribution of the $(n+1)$st observation [1935] was important in the development of the theory presented in this paper.
Two other criticisms of Bayes procedures have been raised on occasion. The first is that even if there is some prior information it is usually difficult to formulate it adequately into a prior distribution. It is certainly true, however, that the difficulty is not so great that it is completely unamenable to analysis. Schlaifer [1959], and Raiffa and Schlaifer [1961], discussed this problem in some detail. Given a method of specifying the prior distribution under indifference, this problem could be even less imposing.
A third criticism, a logical rather than mathematical one, has been disappearing now that the mathematical theory of probability has been placed on a firm axiomatic basis, and developments in the philosophy of science have more clearly demarcated the syntactic and semantic aspects of probability theory. It was at one time argued that it was illogical to consider a parameter to have a prior distribution as the parameter was a fixed constant and not a random variable. The error in this thinking was the failure to recognize the difference between the semantics and the syntactics of probability theory. The syntactic definition of probability, that is, the definition of probability in the syntax of a mathematical system, may be rigorously formulated within the theory of measure. The semantic definition, that is, the definition which links mathematical probability, as a model, to some empirical process, is less easily explicated. There have developed, according to Carnap [1942], two (semantic) definitions of probability. The first of these is associated with the concept of degree of belief, the second with the concept of relative frequency. Prominent names to be associated with the first definition are, in the earlier period, Bayes and Laplace, and in the later period, Jeffreys, Keynes, Good and Savage. The second definition, however, gained dominant acceptance through the work of Fisher, von Mises and others. It is possible to resolve the problem of definition and to avoid much of the confusion that has been associated with the Bayes postulate only by fully recognizing that the concept of probability requires a syntactic definition and, in a Bayesian context, two semantic definitions.
5. Minimal Information

In the previous section we discussed the problem of choosing a prior distribution for $\theta$ when there was no prior information concerning $\theta$. We were seeking what we might call a minimally informative distribution. A general measure of the amount of information in a distribution may be taken from the work of Shannon [1948]. If $X$ has density $h(x)$, continuous- or discrete-type, then the information in $h$ is defined to be [Lindley, 1956]

$$I(X) = \int_{\mathcal{X}} h(x) \log h(x)\, dx.$$

In the discrete case the integral is replaced by a sum. The convention $h \log h = 0$ when $h = 0$ is assumed. It is shown [Shannon, 1948] that the function $I(X)$ is the unique function, up to a multiplicative constant, satisfying certain properties which might reasonably be required of an information function. The major property will be noted later.
One property of the Shannon information measure [Shannon, 1948] is that for $a < x < b$, where $a$ and $b$ are fixed constants and $x$ is real-valued, the density

$$h(x) = \frac{1}{b-a},$$

the uniform density, has minimal information. For $-\infty < x < \infty$ and for $\mathrm{Var}(x)$ fixed, the normal distribution with arbitrary mean has minimal information, and for $0 < x < \infty$ and $E(x)$ fixed, the exponential distribution has minimal information. For $x$ taking a finite number of discrete values the discrete uniform distribution has minimal information. The requirement that $\mathrm{Var}(\theta)$ or $E(\theta)$ be fixed is equivalent to fixing the scale of measurement. In the case of a density on a finite interval the scale of measurement is set by specifying the endpoints of the interval. In each instance, the fixing of two constants is equivalent to fixing the origin and scale of measurement: in the case of a doubly infinite range of $x$, the constants are the prior mean and variance; in the case of a singly infinite range, the constants are the endpoint (zero) and the mean; while in the case of finite range, the constants are the two endpoints.
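The finite-interval case can be spot-checked numerically. The sketch below (my own illustration, not from the text) computes $I(h) = \int h \log h\, dx$ on $(0,1)$ for the uniform density and for a few beta densities; the uniform attains the smallest value:

```python
import math

def info(h, a=0.0, b=1.0, n=20000):
    # I(h) = integral of h(x) log h(x) dx by the midpoint rule; 0 log 0 = 0.
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        hx = h(x)
        if hx > 0.0:
            total += hx * math.log(hx) * dx
    return total

def beta_pdf(p, q):
    const = math.gamma(p + q) / (math.gamma(p) * math.gamma(q))
    return lambda x: const * x ** (p - 1.0) * (1.0 - x) ** (q - 1.0)

uniform_info = info(lambda x: 1.0)   # exactly 0, since log 1 = 0
for p, q in [(2.0, 2.0), (5.0, 1.0), (0.5, 0.5)]:
    assert info(beta_pdf(p, q)) > uniform_info
```

Note that with this sign convention more dispersion means less information, so the uniform density is the minimally informative member.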
Extensions to two discrete cases not covered by Shannon are quite simple. It is easily seen that the geometric distribution minimizes, for fixed mean, the information among distributions on the non-negative integers, and that the discrete analog of the normal distribution, with the probability of the integer $x$ proportional to $e^{-ax^2}$, minimizes the information, for fixed variance and arbitrary mean, over the set of all integers.
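The geometric claim can be checked against any competing distribution on the non-negative integers with the same mean; the comparison below against a Poisson of equal mean is my own illustration (the mean value and truncation point are arbitrary):

```python
import math

def discrete_info(probs):
    # I = sum of h log h over the support; 0 log 0 = 0.
    return sum(p * math.log(p) for p in probs if p > 0.0)

mean = 3.0
N = 150  # truncation point; the neglected tail mass is negligible here

# Geometric on {0, 1, 2, ...} with mean m: P(x) = (1 - q) * q**x, q = m/(1+m)
q = mean / (1.0 + mean)
geom = [(1.0 - q) * q ** x for x in range(N)]

# Poisson with the same mean
pois = [math.exp(-mean) * mean ** x / math.factorial(x) for x in range(N)]

# The geometric is the less informative (more disperse) of the two.
assert discrete_info(geom) < discrete_info(pois)
```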
For the joint density $h(x,y)$ of two random variables the information is defined analogously as

$$I(x,y) = \iint h(x,y) \log h(x,y)\, dx\, dy.$$

The major defining relation of the information measure, mentioned previously, is

$$I(x,y) = I(x) + I(y \mid x),$$

where $I(y \mid x)$ is the information in the conditional distribution of $y$ given $x$ [Lindley, 1956]. It has also been shown [Lindley, 1956] that

$$I(x,y) \geq I(x) + I(y),$$

with equality if and only if $x$ and $y$ are statistically independent. Hence to minimize $I(x,y)$ we need only obtain a joint distribution $h(x,y)$ such that the information in each marginal is minimized and $x$ and $y$ are independent. For distributions of infinite extent, means or variances must again be fixed. Extension to several variables is straightforward.
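The inequality and its equality condition are easy to exhibit on a small discrete joint distribution (an illustration of mine; the particular $2 \times 2$ table is arbitrary):

```python
import math

def info(probs):
    # I = sum of h log h; 0 log 0 = 0.
    return sum(p * math.log(p) for p in probs if p > 0.0)

# A dependent 2x2 joint distribution h(x, y)
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

hx = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
hy = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)}

I_joint = info(joint.values())
I_x, I_y = info(hx.values()), info(hy.values())

# Dependence makes the joint strictly more informative than the marginals sum:
assert I_joint > I_x + I_y

# Replacing the joint by the product of its marginals attains equality:
indep = {(x, y): hx[x] * hy[y] for x in (0, 1) for y in (0, 1)}
assert abs(info(indep.values()) - (I_x + I_y)) < 1e-12
```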
6. A Bayesian Indifference Postulate
It will be convenient now to give names to the distributions discussed in section 1. The density $f_\theta(x)$ will be called the model density. It may be the distribution of a single random variable $X$ or more generally of a random vector $\mathbf{X}$. The distribution $\xi$ will be called the prior Bayes distribution. The marginal density $f_\xi(x)$ will be called the prior fiducial density of $X$. The distribution $\xi_n$ will be called the posterior Bayes distribution and $f_{\xi,n}(x)$ will be called the posterior fiducial density. We will also wish to consider the sequence $\{f_n\}$ of fiducial distributions with $f_0(x) = f_\xi(x)$ and $f_n(x) = f_{\xi,n}(x)$. Note that the true and unknown density of $X$ is denoted by $f(x)$ and not by the more customary $f_{\theta_0}(x)$.
Our proposed Bayesian indifference postulate may be stated in the following form: Let $\mathcal{F}$ be a parametric class of model density functions for $X$. Let $\theta \in \Theta$ be an arbitrary parametrization for $\mathcal{F}$. Let $H$ be a class of prior distributions for $\theta$. When no prior information is available, that $\xi \in H$ is chosen, if existent, which minimizes the information in $f_\xi$. If no such minimizing $\xi$ exists, a $\xi \in H$ for which the information in $f_\xi$ is arbitrarily close to the minimum, if existent, is chosen. A slight further explication will be needed in some cases and will be demonstrated in a later binomial example. Typically, few restrictions on $H$ would be imposed.
However, it will often be convenient to restrict $H$ to the class of natural conjugates, or some other parametric class, and then show that the $\xi^* \in H$ which minimizes the information in $f_\xi(x)$ also provides the required minimization for $H^*$, the class of all distributions over the appropriate spectrum.
Consider the following example. Let the model density be

$$f_\theta(x) = \frac{1}{\sqrt{2\pi}}\, e^{-(x-\theta)^2/2}, \qquad -\infty < x < \infty.$$

For convenience let $H$ be the class of normal distributions with mean $\mu$ and variance $\tau^2$; we may express $\xi$ by the density

$$g(\theta) = \frac{1}{\tau\sqrt{2\pi}}\, e^{-(\theta-\mu)^2/2\tau^2}.$$

Then the prior fiducial density (1.1) is found to be

$$f_\xi(x) = \frac{1}{\sqrt{2\pi(\tau^2+1)}}\, e^{-(x-\mu)^2/2(\tau^2+1)}.$$

The information in $f_\xi(x)$ is

$$I_\xi(x) = -\log \sqrt{\tau^2+1}\, \sqrt{2\pi e},$$

and thus is minimized when $\tau^2$ is arbitrarily large and $\mu$ is arbitrary but fixed. This coincides with the usual interpretation of the Bayes postulate when the parameter of the model density is the mean $\theta$. Additionally, $f_\xi(x)$ has minimal information among all densities on $(-\infty, +\infty)$ with arbitrary mean and variance at most $(\tau^2+1)$ [Ketteridge, 1961].
The primary advantage of the new indifference rule is that it is obviously invariant under one-to-one transformation of the parameter space, provided the origin and scale of measurement of the random variable are considered fixed. Its reasonableness is partially confirmed by its agreement with what has come to be the accepted manner of applying the Bayes postulate.
A second example may be instructive. Consider the binomial example given in section 4. Since our postulate is invariant under reparametrization we may arbitrarily take the model distribution to be

$$f_p(x) = \binom{n}{x} p^x (1-p)^{n-x}, \qquad x = 0, 1, \ldots, n,$$

and the prior Bayes density to be

$$g(p) = \frac{p^{k-1}(1-p)^{\ell-1}}{\beta(k,\ell)}, \qquad 0 < p < 1;\ k, \ell > 0.$$

Then it is readily seen that

$$f_\xi(x) = \frac{\Gamma(n+1)\, \Gamma(k+\ell)\, \Gamma(x+k)\, \Gamma(n-x+\ell)}{\Gamma(x+1)\, \Gamma(n-x+1)\, \Gamma(k)\, \Gamma(\ell)\, \Gamma(n+k+\ell)}.$$

Since $f_\xi(x)$ is a discrete density defined on $x = 0, 1, \ldots, n$, the information will be minimized when $f_\xi(x)$ is uniform, i.e.,

$$f_\xi(x) = \frac{1}{n+1}, \qquad x = 0, 1, \ldots, n.$$

It is easily seen that this occurs if and only if $k = \ell = 1$, which agrees with the application of the Bayes postulate to the parameter $p$. We had required that the information be minimized only in the class of fiducial densities generated by prior Bayes distributions of the beta class, whereas we indeed minimized the information with respect to the class of all prior distributions (though not necessarily uniquely: see section 8).
7. Posterior Fiducial Inference

We now consider a second motivation for our postulate. Whereas both classical and Bayes approaches to the parametric formulation involve methods of inference concerning some parameter (or moment) of the distribution, we propose that a more natural procedure, and one which may in many scientific studies be extremely useful, would be to estimate the density function of $X$. This would have the advantage of completely freeing us from the restrictions imposed by selecting a particular parametrization. We would desire a general method which utilized all experimental information and all prior information, if there were any, but which could still be used when there was "no prior information", or when it was desirable to have a result which depended only upon the data.
Consider the sequences $\{\xi_n\}$ and $\{f_{\xi,n}(x)\}$. Under reasonably general conditions the distribution $\xi_n$ will converge to a point distribution on the true parameter value and hence $f_{\xi,n}(x)$ will be a consistent estimator of $f(x)$. While the author has been unable to find necessary and sufficient conditions in the literature it must be presumed that this matter has been previously studied. The following two conditions are easily seen to be sufficient. For the case of real-valued $\theta$ (extensions obvious) we require

(i) $\int_a^b d\xi(\theta) > 0$ for every non-degenerate interval $(a, b)$ in $\Theta$;

(ii) the existence of a consistent maximum likelihood estimator.

Distributions satisfying (i) will be called adaptive.
Our procedure when no prior information is available is to begin with an a priori distribution $\xi$ which minimizes the information in $f_\xi(x)$, and to utilize the results of the experiment to obtain an estimate $f_{\xi,n}(x)$. Following Lindley [1956] it is easily seen that under this procedure the expected information in $f_{\xi,n}(x)$, i.e. $E\, I_{\xi,n}(x)$, is greater than the information in $f_\xi(x)$, where the expectation is taken with respect to the true distribution of $(X_1, X_2, X_3, \ldots, X_n)$, and the expected value of each increase in information is positive, i.e.,

$$E\left[ I_{\xi,i}(x) - I_{\xi,i-1}(x) \right] > 0, \qquad i = 1, 2, 3, \ldots$$
In the normal example, variance one, under our proposed rule we have (see Section 9)

$$f_{\xi,n}(x) = \sqrt{\frac{n}{2\pi(n+1)}}\; e^{-n(x-\bar{x})^2/2(n+1)},$$

where $\bar{x}$ is the observed sample mean, and hence the information in $f_{\xi,n}(x)$ is $-\log \sqrt{(n+1)/n}\, \sqrt{2\pi e}$. We then have the "unusual" result that the information is totally independent of the observed value of $\bar{x}$ and that the sequence $I_{\xi,n}(x)$ is monotone increasing. In other cases $I_{\xi,n}(x)$ will depend on the vector of observations. It is possible in such cases to obtain, under certain conditions,

$$I_{\xi,n+i}(x) < I_{\xi,n}(x) \qquad \text{for some } i = 1, 2, 3, \ldots$$

This will occur, roughly speaking, when an unusual vector $(x_{n+1}, x_{n+2}, \ldots, x_{n+i})$ is observed.
We might also note that the Bayes approach, regardless of the prior distribution, provided only that it is not pre-judicial [Raiffa and Schlaifer, 1961], has the important and easily established property that in a sequential procedure the final distribution $\xi_n$, and hence $f_{\xi,n}(x)$, is dependent only on the value of a sufficient statistic for a parameter $\theta$ which indexes the distributions $\mathcal{F}$. It is thus, given the sufficient statistic, free of dependence on the individual observations and more particularly on the order in which they occurred. If $\xi$ is pre-judicial then by definition it is not adaptive, and a restriction, in applications, to adaptive prior distributions would be in conformance with the philosophy of methodology of science.
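The order-independence just noted is easy to exhibit in the conjugate normal model of this paper (unit observation variance): updating one observation at a time, in any order, leads to the same posterior Bayes parameters, which depend on the data only through $\sum x_i$. The data values and the prior precision below are my own arbitrary choices:

```python
def update(prior_mean, prior_prec, x, obs_prec=1.0):
    # One conjugate normal update: precisions add; the mean is precision-weighted.
    post_prec = prior_prec + obs_prec
    post_mean = (prior_prec * prior_mean + obs_prec * x) / post_prec
    return post_mean, post_prec

def sequential_posterior(data, mean0=0.0, prec0=0.25):
    m, p = mean0, prec0
    for x in data:
        m, p = update(m, p, x)
    return m, p

data = [1.2, -0.3, 0.8, 2.1, 0.5]
shuffled = [0.5, 2.1, -0.3, 1.2, 0.8]   # same values, different order

m1, p1 = sequential_posterior(data)
m2, p2 = sequential_posterior(shuffled)
assert abs(m1 - m2) < 1e-12 and abs(p1 - p2) < 1e-12   # depends only on sum(data)
```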
8. Unique determination of the prior distribution

The proposed indifference postulate provides a solution to the problem of invariance under reparametrization, provided the scale of measurement of the random variable is considered fixed. It, however, creates a second problem. If the postulate is to be meaningful it must be such as to completely specify the procedure to be followed and the result to be obtained for given observations. This resolves itself to the problem of showing that the proposed postulate uniquely specifies the prior distribution.
In some cases we may wish $H$ to be the class $H^*$ of all adaptive distributions on $\Theta$, or $H$ may be taken to be a parametrizable class of adaptive distributions, such as natural conjugates. In the latter case uniqueness can be readily obtained. In the former case it is often possible to obtain a solution by restricting $H$ to be the class of natural conjugates, obtaining a unique $\xi \in H$ which minimizes $I_\xi(x)$, and then showing that $\xi$ is also unique in $H^*$.
It does not appear to be entirely trivial to state conditions under which the integral equation

$$f_\xi(x) = \int_\Theta f_\theta(x)\, d\xi(\theta), \qquad x \in \mathcal{X}, \qquad (8.1)$$

has at most one solution for $\xi$, given $f_\theta(x)$ for all $\theta$ and given $f_\xi(x)$. We shall, however, demonstrate uniqueness for sufficiently large classes of problems to justify further research based on the proposed postulate. It is hoped that more general proofs of uniqueness may be forthcoming.
Let us suppose $\Theta$ is an interval (perhaps infinite). If $\xi$ is to be adaptive it would be convenient to restrict it to a continuous-type density with spectrum $\Theta$, so that (8.1) becomes

$$f_\xi(x) = \int_\Theta f_\theta(x)\, g(\theta)\, d\theta. \qquad (8.2)$$

In our normal example let $g(\theta)$ be any prior density with spectrum $(-\infty, \infty)$; then if (8.2) is to hold,

$$f_\xi(x) = \int \frac{1}{\sqrt{2\pi}}\, e^{-(x-\theta)^2/2}\, g(\theta)\, d\theta \qquad \text{for all } x.$$

By factoring the normal density $f_\theta(x)$ we have the condition

$$\sqrt{2\pi}\, e^{x^2/2}\, f_\xi(x) = \int e^{x\theta}\, h(\theta)\, d\theta$$

for $h(\theta) = e^{-\theta^2/2}\, g(\theta)$, and hence $g(\theta)$ is unique by the uniqueness of the bilateral Laplace transform. Uniqueness could be shown without the assumption of the continuity of $\xi$ by employing the uniqueness of the bilateral Laplace-Stieltjes transform. The restriction to continuous-type densities for $\theta$ when $\Theta$ is continuous is appealing. We may generalize this technique somewhat, though the generalization is more formal than useful.
If

$$\int_\Theta f_\theta(x)\, d\theta < \infty \qquad \text{for all } x \in \mathcal{X}, \qquad (8.3)$$

then

$$C_z(\theta) = \frac{f_\theta(z)}{\int_\Theta f_a(z)\, da}, \qquad \theta \in \Theta,\ z \in \mathcal{X},$$

is a density function (natural conjugate) with parameter $z$. If $\{C_z(\theta),\ z \in \mathcal{X}\}$ is a complete class [Lehmann, 1959] of densities for $\theta$, there can be at most one $g$ satisfying (8.2). The proof follows immediately from (8.2) by the definition of completeness. Unfortunately the techniques for establishing the completeness of classes of distributions tend to be rather varied and specialized. The uniqueness of integral transforms is one technique. The restriction (8.3) was made only in order to apply the definition of the completeness of a class of densities; completeness of more general classes of functions could be defined in an obvious way and applied to the likelihood function $f_\theta(x)$ (a function of $\theta$ for $x \in \mathcal{X}$) without such a restriction. In fact, multiplying both sides of (8.2) by $e^{itx}$ and integrating over $\mathcal{X}$ leads analogously to the following sufficient condition: if $\{\varphi(t \mid \theta)\}$ is a complete class of functions for $-\infty < t < \infty$, where $\varphi(t \mid \theta)$ is the characteristic function of the model density, then there is at most one prior density $g$ satisfying (8.2). This method would also apply in the normal example above.
A second instructive example is the binomial case. The model distribution is

$$f_p(x) = \binom{n}{x} p^x (1-p)^{n-x},$$

and the minimally informative prior fiducial distribution is

$$f_\xi(x) = \frac{1}{n+1},$$

and hence the integral equation is

$$\frac{1}{n+1} = \int_0^1 \binom{n}{x} p^x (1-p)^{n-x}\, g(p)\, dp, \qquad x = 0, 1, \ldots, n. \qquad (8.4)$$
For $n = 1$ we have

$$\frac{1}{2} = \int_0^1 \binom{1}{x} p^x (1-p)^{1-x}\, g(p)\, dp.$$

But this equation must hold for all $x$, i.e., $x = 0, 1$; hence

$$\int_0^1 p\, g(p)\, dp = \frac{1}{2} \quad \text{and} \quad \int_0^1 (1-p)\, g(p)\, dp = \frac{1}{2},$$

which implies $E(p) = \frac{1}{2}$. For $n = 2$ we have

$$\frac{1}{3} = \int_0^1 \binom{2}{x} p^x (1-p)^{2-x}\, g(p)\, dp \qquad \text{for } x = 0, 1, 2,$$

which with the restriction $E(p) = \frac{1}{2}$ implies $\mathrm{Var}(p) = \frac{1}{12}$. For general $n$, the first $n$ moments must equal the first $n$ moments of the uniform distribution.
Thus, since the uniform distribution obeys the moment theorem, we may obtain uniqueness by requiring that (8.4) hold for all $n$. It would seem that whenever the spectrum of $X$ was finite it would be necessary to require that the integral equation

$$f_\xi(x_1, \ldots, x_n) = \int_\Theta \prod_{i=1}^n f_\theta(x_i)\, g(\theta)\, d\theta \qquad (8.5)$$

hold for all $n$. If it holds for $n$ it holds for $n-1$: since the conditions of Fubini's theorem are satisfied, we may integrate (8.5) over $x_n$ and interchange the order of integration to obtain the required result.
The Poisson case is an example of a discrete random variable with an infinite spectrum. The model density is

$$f_\lambda(x) = \frac{e^{-\lambda t} (\lambda t)^x}{x!}, \qquad x = 0, 1, 2, \ldots,$$

with $t$ fixed. The Bayes density is usually taken as a member of the conjugate prior class

$$g(\lambda) = \frac{T^\rho\, \lambda^{\rho-1}\, e^{-\lambda T}}{\Gamma(\rho)}. \qquad (8.6)$$

The prior fiducial density is then

$$f_\xi(x) = \frac{t^x\, T^\rho\, \Gamma(x+\rho)}{\Gamma(\rho)\, \Gamma(x+1)\, (t+T)^{x+\rho}}, \qquad (8.7)$$

which is simply a negative binomial, which has the geometric ($\rho = 1$) as a special case. From section 5 we know that the geometric distribution minimizes the information, for fixed mean, in the class of distributions on the non-negative integers.
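Relation (8.7) can be verified by numerically integrating the Poisson model density against the gamma prior (8.6) (a check of mine; the values of $t$, $T$, and $\rho$ are arbitrary, and $\rho = 1$ exhibits the geometric special case):

```python
import math

def fiducial_formula(x, t, T, rho):
    # (8.7): t^x T^rho Gamma(x+rho) / (Gamma(rho) Gamma(x+1) (t+T)^(x+rho))
    return (t ** x * T ** rho * math.gamma(x + rho)) / (
        math.gamma(rho) * math.gamma(x + 1) * (t + T) ** (x + rho))

def fiducial_numeric(x, t, T, rho, dl=1e-3, lam_max=60.0):
    # midpoint-rule integration of Poisson(x | lam*t) * gamma(lam | rho, T)
    total = 0.0
    for i in range(int(lam_max / dl)):
        lam = (i + 0.5) * dl
        poisson = math.exp(-lam * t) * (lam * t) ** x / math.factorial(x)
        gamma_prior = T ** rho * lam ** (rho - 1.0) * math.exp(-lam * T) / math.gamma(rho)
        total += poisson * gamma_prior * dl
    return total

t, T = 2.0, 3.0
for rho in (1.0, 2.5):       # rho = 1 gives the geometric marginal
    for x in range(4):
        assert abs(fiducial_numeric(x, t, T, rho) - fiducial_formula(x, t, T, rho)) < 1e-5
```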
The equation (8.2) is then

$$\frac{t^x\, T^\rho\, \Gamma(x+\rho)}{\Gamma(\rho)\, \Gamma(x+1)\, (t+T)^{x+\rho}} = \int_0^\infty \frac{e^{-\lambda t} (\lambda t)^x}{\Gamma(x+1)}\, g_1(\lambda)\, d\lambda,$$

or

$$\frac{T^\rho\, \Gamma(x+\rho)}{\Gamma(\rho)\, (t+T)^{x+\rho}} = \int_0^\infty \lambda^x\, h_1(\lambda)\, d\lambda, \qquad (8.8)$$

where $h_1(\lambda) = e^{-\lambda t}\, g_1(\lambda)$ for any candidate prior density $g_1(\lambda)$. But (8.8) must hold for $x = 0, 1, 2, 3, \ldots$. Hence any function $h_1(\lambda)$ satisfying (8.8) must have all moments equal to the moments of $h(\lambda) = e^{-\lambda t}\, g(\lambda)$, with $g$ the member of (8.6). Since the condition of the moment theorem is clearly satisfied it follows that $h_1(\lambda)$, and hence $g_1(\lambda)$, is the unique solution to (8.8). It should be noted that we have here for convenience again taken $H$ to be the conjugate prior class and then shown that the $\xi \in H$ which uniquely minimizes $I_\xi(x)$ also uniquely minimizes $I_\xi(x)$ for $\xi \in H^*$. The method of characteristic functions mentioned above could alternately be used.
9. Frequency properties of posterior fiducial distributions - a conjecture

Having focused our attention on a procedure which estimates the density $f(x)$ of the random variable $X$, instead of a procedure which involves inferences concerning some arbitrary parametrization of $\mathcal{F}$, it is now necessary to begin an examination of the properties of the estimator $f_{\xi,n}(x)$. Information-theoretic properties are given in section 7. In this section we shall investigate the relative frequency properties of the posterior fiducial estimation procedure.
We propose that the experimenter may wish to use $f_{\xi,n}(x)$ to make probability statements concerning $X$, i.e., perhaps to act as if $f_{\xi,n}(x) = f(x)$. Hopefully, these probability statements will have meaning as relative frequency probability equalities or inequalities.
The unique feature of the proposed method is that it begins with a minimally
informative estimate and proceeds sequentially through a sequence of estimates
which are, in expectation, more informative and in the limit converge to the true
density. The amount of information in a distribution is a measure of the lack of
dispersion in the distribution. In the normal case it was found that information
was proportional to −log σ. Now to the extent that a distribution is less dis-
perse, that is, has more information, predictions made from it will be more precise.
Thus predictions made from the distributions f̂_n(x) will tend to be more pre-
cise as n increases.

Consider the normal example. The model density is normal with mean θ and
variance 1; the prior Bayes density is normal with mean μ (arbitrary but fixed)
and variance τ²; the prior fiducial density is normal with mean μ and variance
1 + τ²; the posterior Bayes distribution after n observations is normal with
mean (Σxᵢ + μτ⁻²)/(n + τ⁻²) and variance (n + τ⁻²)⁻¹. The posterior fiducial
density is normal with mean (Σxᵢ + μτ⁻²)/(n + τ⁻²) and with variance
[(n+1) + τ⁻²]/(n + τ⁻²). The specification of our indifference postulate
will take τ² to be arbitrarily large, and hence the posterior fiducial density
will be normal with mean arbitrarily close to x̄ and variance arbitrarily close
to (n+1)/n. We have then

\[ \hat f_n(x) \;=\; \Bigl[\frac{n}{2\pi(n+1)}\Bigr]^{1/2} \exp\Bigl[-\frac{n(x-\bar x)^2}{2(n+1)}\Bigr]. \tag{9.1} \]
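The normal example above can be checked numerically. The sketch below (our own illustration; the function names are not from the paper) evaluates the posterior fiducial density for a large but finite τ² and confirms that it approaches the limiting form (9.1), a normal density with mean x̄ and variance (n+1)/n:

```python
import math

import numpy as np

def posterior_fiducial_pdf(x, xs, mu, tau2):
    """Posterior fiducial density of the normal example:
    normal with mean (sum(x_i) + mu/tau2)/(n + 1/tau2)
    and variance ((n + 1) + 1/tau2)/(n + 1/tau2)."""
    n = len(xs)
    prec = 1.0 / tau2
    mean = (np.sum(xs) + mu * prec) / (n + prec)
    var = ((n + 1) + prec) / (n + prec)
    return np.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def limit_pdf(x, xs):
    """Equation (9.1): the tau2 -> infinity limit, N(xbar, (n+1)/n)."""
    n = len(xs)
    xbar = np.mean(xs)
    var = (n + 1) / n
    return np.exp(-n * (x - xbar) ** 2 / (2 * (n + 1))) / math.sqrt(2 * math.pi * var)

rng = np.random.default_rng(0)
xs = rng.normal(loc=2.0, scale=1.0, size=25)
grid = np.linspace(-2.0, 6.0, 201)
# As tau2 grows, the posterior fiducial density approaches (9.1).
gap = np.max(np.abs(posterior_fiducial_pdf(grid, xs, mu=0.0, tau2=1e8) - limit_pdf(grid, xs)))
print(gap)
```

For τ² = 10⁸ the two curves agree to many decimal places, which is the sense in which the indifference postulate "takes τ² arbitrarily large."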
We consider this to be the posterior fiducial distribution of the (n+1)st obser-
vation. We are thus concerned with the extent to which probability statements
based upon this distribution have relative frequency meaning with respect to the
random variable X, which has true density f(x).
Had we proceeded by more classical methods, e.g. least squares, maximum like-
lihood, or minimum variance unbiased estimation, we would have obtained x̄ as
our estimator of θ, and if we had replaced θ by x̄ in the model density, i.e., acted
as if θ = x̄, we would have obtained, instead of (9.1),

\[ f^-(x) \;=\; (2\pi)^{-1/2} \exp\bigl[-\tfrac{1}{2}(x-\bar x)^2\bigr]. \]
The probability that a 1 − α central interval of f^-(x) covers a 1 − α
central interval of f(x) is zero, as this will occur if and only if x̄ = θ. More
specifically, we see that the probability that the (n+1)st observation (which, of
course, is governed by the law f(x)) will fall in a 1 − α central interval of
f^-(x) is less than 1 − α. However, the probability that the (n+1)st observation
will fall in a 1 − α central interval of f̂_n(x) is exactly 1 − α. This
can be readily verified by noting that the distribution of X̄ − X_{n+1} is normal
with mean zero and variance (n+1)/n. Thus we see that a sequence of such indepen-
dent predictions based upon f̂_n(x) will be confirmed with long run relative
frequency exactly 1 − α.³
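The exact 1 − α coverage, and the undercoverage of the plug-in density, can be seen in a short simulation. This is our own illustration (the constant 1.959964 is the standard normal 97.5% point, so α = .05):

```python
import math

import numpy as np

rng = np.random.default_rng(1)
n, trials = 10, 200_000
z = 1.959964  # standard normal 97.5% point, alpha = .05

theta = 0.7  # true mean; arbitrary but fixed
samples = rng.normal(theta, 1.0, size=(trials, n))
xbar = samples.mean(axis=1)
x_next = rng.normal(theta, 1.0, size=trials)  # the (n+1)st observation

# 1 - alpha central interval of the posterior fiducial density (9.1):
# xbar +- z*sqrt((n+1)/n).  Since Xbar - X_{n+1} is N(0, (n+1)/n),
# the coverage is exactly 1 - alpha.
half_fid = z * math.sqrt((n + 1) / n)
cover_fid = float(np.mean(np.abs(x_next - xbar) <= half_fid))

# 1 - alpha central interval of the plug-in density f^-(x) = N(xbar, 1):
# ignores the sampling variability of xbar, so it undercovers.
cover_plug = float(np.mean(np.abs(x_next - xbar) <= z))
print(cover_fid, cover_plug)
```

With n = 10 the fiducial interval covers the next observation at the nominal 95% rate, while the plug-in interval covers at roughly 94%, exactly the "less than 1 − α" behavior claimed in the text.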
Somewhat similar results may be obtained in other examples. These results are
not generally in terms of central intervals, nor is exact equality attained. In the
binomial example, with the true p > .5, the expected value of the fiducial
probability p̃ of a success on the (n+1)st trial is easily seen to satisfy the in-
equality .5 < E(p̃) < p. Hence predictions of success made on the basis of p̃ will
be conservative, that is, the true probability of confirmation is greater than that
claimed by the fiducial prediction.

³We have, in essence, a Bayesian tolerance interval.
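The paper's binomial fiducial probability is derived in an earlier section and not reproduced here; as a stand-in illustration, assume the Bayes-Laplace uniform prior, under which the predictive probability of success after k successes in n trials is (k+1)/(n+2). Its expectation over K ~ Binomial(n, p) is (np+1)/(n+2), which indeed lies strictly between .5 and p whenever p > .5:

```python
from math import comb

def expected_predictive(n, p):
    """E[(K+1)/(n+2)] for K ~ Binomial(n, p): the mean of the
    Laplace-rule predictive probability of success on trial n+1."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) * (k + 1) / (n + 2)
               for k in range(n + 1))

for n in (5, 20, 100):
    for p in (0.6, 0.8, 0.95):
        e = expected_predictive(n, p)
        # Conservative: the claimed success probability sits between .5 and p.
        assert 0.5 < e < p, (n, p, e)
        print(n, p, round(e, 4))
```

The closed form makes the conservatism transparent: (np+1)/(n+2) > .5 reduces to np > .5n, and (np+1)/(n+2) < p reduces to 1 < 2p, both immediate for p > .5.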
The described information-theoretic behavior of f̂_n(x) and a consideration
of the above two very special cases lead us to the conjecture that more general
frequency theorems may be obtainable. The difficulty in generalizing this con-
cept lies in the fact that these frequency properties seem to be quite different
with regard to the spectrum of the random variable, i.e., whether finite, infinite,
or semi-infinite. This problem defines a major area of future research in the
development of the proposed mode of inference.
10.
Measurement of the random variable
It has been assumed throughout that the scale of measurement of the random
variable was fixed. The assumption is clearly more stringent than necessary. If
the random variable X is subjected to a linear transformation, the minimally
informative prior distribution then determined will be consistent with the
minimally informative prior distribution determined by the original random
variable, and this result will be true independently of the particular
parametrization chosen in either case. Thus whenever we may consider the scale of
measurement of the random variable to be fixed up to a linear transformation, the
proposed mode of analysis will not be troubled by lack of invariance. Invariance
under non-linear transformation of the random variable is generally neither
present nor meaningful in classical procedures when the model density is specified.
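The linear-invariance claim rests on a standard fact about the Shannon information measure: the differential entropy of Y = aX + b is H(X) + log|a|, a constant offset, so a linear change of scale cannot change which prior minimizes (or maximizes) the measure. A numerical sketch of this fact (our own illustration, using normal densities):

```python
import math

import numpy as np

def trapezoid(y, x):
    """Plain trapezoidal rule (written out to avoid NumPy-version differences)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def differential_entropy(pdf, grid):
    """-integral of f log f over the grid, with 0*log 0 treated as 0."""
    f = pdf(grid)
    safe = np.where(f > 1e-300, f, 1.0)
    integrand = np.where(f > 1e-300, -f * np.log(safe), 0.0)
    return trapezoid(integrand, grid)

def normal_pdf(mean, var):
    return lambda x: np.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

a, b = 3.0, -1.0
grid = np.linspace(-40.0, 40.0, 400_001)
h_x = differential_entropy(normal_pdf(0.0, 1.0), grid)       # X ~ N(0, 1)
h_y = differential_entropy(normal_pdf(b, a * a), grid)       # Y = aX + b ~ N(b, a^2)
print(h_y - h_x, math.log(abs(a)))  # the gap is log|a|, independent of b
```

Since the offset log|a| does not depend on the density being transformed, the minimally informative prior determined on the transformed scale is the transform of the one determined on the original scale.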
11.
Acknowledgments
The author wishes to acknowledge his very great indebtedness to his advisor,
Professor W. J. Hall, who has given most freely of his time and energy to guide
the author in formulating this system, based initially upon little more than the
author's intuition and hopeful conjecture, into some semblance of mathematical
coherence. Without this guidance these ideas could not have been developed. Pro-
fessor Hall's assistance in revising the present manuscript is also gratefully
acknowledged.
While many other members of the University of North Carolina community have
made some direct or indirect contribution to these developments, a few should be
mentioned explicitly. Professor R. Darrell Bock, of the Psychometric Laboratory,
has offered many invaluable criticisms and suggestions. Professor Walter L. Smith,
of the Department of Statistics, has on several occasions shown the author the
error of his ways. Professor W. Robert Mann, of the Department of Mathematics, has
given the author private instruction in areas of mathematics in which he was un-
trained. The Office of Naval Research and the National Science Foundation are to be
thanked for their financial support.
REFERENCES
Bayes, T. (1763), "Essay towards solving a problem in the doctrine of chances,"
Phil. Trans. Roy. Soc., 53, 370-418.

Fisher, R. A. (1950), Contributions to Mathematical Statistics, Wiley, New York.
Papers 10, 11, 22, 24, 25.

Fisher, R. A. (1933), "The concepts of inverse probability and fiducial probability
referring to unknown parameters," Proc. Roy. Soc. A, 139, 343-348.

Good, I. J. (1950), Probability and the Weighing of Evidence, Griffin, London.

Jeffreys, H. (1948), Theory of Probability, Oxford University Press, Oxford.

Kerridge, D. F. (1961), "Inaccuracy and inference," J. Roy. Stat. Soc. B, 23,
184-194.

Laplace, P. S., Marquis de (1796). Reprinted in A Philosophical Essay on Probabi-
lities, Dover, New York, 1951.

Lindley, D. V. (1956), "On a measure of the information provided by an experiment,"
Ann. Math. Stat., 27, 986-1005.

Lindley, D. V. (1957), "Binomial sampling schemes and the concept of information,"
Biometrika, 44, 179-186.

Lindley, D. V. (1961), "The use of prior probability distributions in statistical
inference and decisions," Proc. Fourth Berkeley Symp., Univ. of California Press,
Berkeley.

Lehmann, E. L. (1959), Testing Statistical Hypotheses, Wiley, New York.

Mises, R. von (1957), Probability, Statistics, and Truth, Macmillan, New York.

Neyman, J. (1952), Lectures and Conferences on Mathematical Statistics and
Probability, USDA Graduate School, Washington.

Neurath, O., et al. (1938), International Encyclopedia of Unified Science,
Univ. of Chicago Press, Chicago.

Perks, W. (1947), "Some observations on inverse probability including a new indif-
ference rule," J. Inst. Actuaries, 73, Part II, 285-334.

Raiffa, H., and Schlaifer, R. (1961), Applied Statistical Decision Theory,
Harvard Business School, Boston.

Savage, L. J. (1954), The Foundations of Statistics, Wiley, New York.

Schlaifer, R. (1959), Probability and Statistics for Business Decisions, McGraw-
Hill, New York.

Shannon, C. E., and Weaver, W. (1949), The Mathematical Theory of Communication,
University of Illinois Press, Urbana.