Novick, M. R. (1963). "A Bayesian indifference procedure."

UNIVERSITY OF NORTH CAROLINA
Department of Statistics
Chapel Hill, N. C.
A BAYESIAN INDIFFERENCE PROCEDURE
by
Melvin R. Novick
and
W. J. Hall
January 1963
This research was supported primarily by the Office of Naval Research under Contract No. Nonr-855(09) for research in probability and statistics at the University of North Carolina, Chapel Hill, N. C. Reproduction in whole or in part is permitted for any purpose of the United States Government. Supplemental support was received from the National Science Foundation under Grant G-5824 and from the National Institutes of Health under Grant GM-10397-01.

Institute of Statistics
Mimeo Series No. 345
TABLE OF CONTENTS

                                                                      page
CHAPTER I     INTRODUCTION                                               1

CHAPTER II    THE PARAMETRIC FORMULATION                                 4

CHAPTER III   DEFINITIONS OF PROBABILITY                                14
    3.1  Semantics and Syntactics                                       14
    3.2  Frequency Probability                                          15
    3.3  Logical Probability                                            17
    3.4  Personal Probability                                           18
    3.5  Fiducial Probability                                           20

CHAPTER IV    AN INDIFFERENCE PROCEDURE                                 24
    4.1  Indifference, Invariance and Improper Densities                24
    4.2  Natural Conjugate Bayes Densities                              27
    4.3  Minimum Prior Data                                             31
    4.4  Minimum Bayes Information                                      37
    4.5  Minimum Necessary Sample                                       42
    4.6  Minimum Fiducial Information                                   45
    4.7  The Natural Bayesian Structuring                               49

CHAPTER V     NORMAL MODELS                                             53
    5.1  Univariate Normal Model - Location Parameter Unknown           53
    5.2  Univariate Normal - Scale Parameter Unknown                    55
    5.3  Univariate Normal - Location and Scale Parameters Unknown      58
    5.4  Bivariate Normal Model - Regression Parameter Unknown          60
    5.5  Bivariate Normal Model - Three Parameters Unknown              61
    5.6  Bivariate Normal Model - Five Parameters Unknown               63
    5.7  A Log-Normal Model                                             65

CHAPTER VI    POISSON MODELS                                            67
    6.1  Random Poisson Sampling                                        67
    6.2  Poisson Process Model                                          68
    6.3  Discussion of the Indifference Rule for the Poisson Model      69

CHAPTER VII   MULTINOMIAL MODELS                                        71
    7.1  Binomial Models                                                71
    7.2  Trinomial Models                                               75
    7.3  The General Multinomial Model                                  77
    7.4  Validation and Semantic Interpretation of the Multinomial
         Models                                                         79

CHAPTER VIII  UNIFORM AND RELATED MODELS                                82
    8.1  Uniform Model - Scale Parameter Unknown                        82
    8.2  Uniform Model - Location Parameter Unknown                     84
    8.3  Uniform Model - Location and Scale Parameters Unknown          85
    8.4  A Positive Exponential Model                                   87

CHAPTER IX    NEGATIVE EXPONENTIAL AND RELATED MODELS                   89
    9.1  Negative Exponential Model - Scale Parameter Unknown           89
    9.2  Negative Exponential Model - Location Parameter Unknown        90
    9.3  Negative Exponential Model - Location and Scale Parameters
         Unknown                                                        90
    9.4  A Pareto Model                                                 92
    9.5  A Weibull Model                                                93

CHAPTER X     LIMITATIONS, EXTENSIONS AND REFINEMENTS                   95
    10.1 Limitations of the Procedure                                   95
    10.2 Evaluation of the Four Principles                              97
    10.3 Transformations of the Random Variables                        97
    10.4 Fiducial Inference                                             99

REFERENCES                                                             101
ACKNOWLEDGEMENTS

The authors wish to acknowledge the assistance of Professors W. L. Smith, W. Hoeffding, R. D. Bock, and L. V. Jones in offering suggestions which contributed to the work in this paper. Critical comments by Professors L. J. Savage and D. V. Lindley on the preliminary report of this research were very helpful.

Mrs. Doris Gardner handled the typing assignment in a most expeditious and efficient manner. Miss Martha Jordan assisted in the administrative details associated with the project.
CHAPTER I

INTRODUCTION

In 1763 there appeared in the Philosophical Transactions of the Royal Society a paper entitled "An essay towards solving a problem in the doctrine of chances" (Bayes, 1763). This paper, the work of the Reverend Thomas Bayes, had been transmitted to the Royal Society, following the author's death, by the author's friend, Richard Price. It has been suggested that Bayes failed to publish this paper because of his fear that the postulate which was the basis of his work "would be thought disputable by a critical reader" (Fisher, 1956). This postulate was, however, later accepted without question by Laplace and occupied a central position in his Théorie Analytique des Probabilités (Laplace, 1820). Later generations have found the axiomatic nature of Bayes postulate to be, at best, questionable. Beginning, perhaps, with Boole (1854) there appeared numerous criticisms of the Bayes postulate and later numerous attempts to circumvent or modify it. Few problems have attracted the efforts of a more distinguished roster of statisticians, and certainly none has given rise to more acrimonious debate.
The Bayes postulate was stated for use in what has come to be termed the theory of inverse probability, i.e. the theory which deals with the problem of assigning probabilities to states of nature or causes after having observed experimental results or effects (Fisher, 1930). This theory makes use of Bayes theorem which, in the form given by Jeffreys (1961), is

$$ \text{Posterior probability} \propto \text{Prior probability} \times \text{Likelihood}\,. \tag{1.1} $$

Bayes theorem is an immediate consequence of the definition of conditional probability and hence is not subject to question. But a posterior probability can be obtained from (1.1) only if both the likelihood and the prior probability are assumed known. The Bayes-Laplace postulate (or indifference rule) specifies that when it is assumed that, before the observational data are available, nothing is known about the true state of nature, a uniform prior distribution over the possible states of nature should be assumed. This specification yields an apparent symmetry in that it would seem that all possible states of nature are being considered as "equally likely" a priori.
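A minimal numerical sketch of (1.1), assuming a discrete set of three states of nature and hypothetical likelihood values (the states and numbers are illustrative only, not from the original text):

```python
import numpy as np

# Three hypothetical states of nature with a uniform (indifference) prior,
# and the likelihood of one observed result under each state.
prior = np.array([1/3, 1/3, 1/3])        # Bayes-Laplace uniform prior
likelihood = np.array([0.8, 0.5, 0.1])   # P(observed data | state), hypothetical

# (1.1): the posterior is proportional to prior times likelihood.
posterior = prior * likelihood
posterior /= posterior.sum()             # normalize to a probability distribution
print(posterior)                         # -> approximately [0.571  0.357  0.071]
```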
The major difficulty arising from the Bayes postulate is that, when the states of nature are indexed by a continuously variable parameter, it is not invariant under reversible transformation of the parameter space.
This paper presents an approach to the problem of specifying prior distributions within a parametric structuring and under indifference conditions, when the formulation is designed to serve as a model for scientific inference. Heavy reliance is placed upon the works of D. V. Lindley, of H. Jeffreys, of H. Raiffa and R. Schlaifer, and particularly of R. A. Fisher. An attempt is made to modify and expand principles and proposals taken from these works and to integrate them with other principles to furnish an acceptable procedure for specifying prior distributions under indifference for a wide range of statistical models. The procedure suggested in this paper appears to overcome the lack of invariance exhibited by Bayes postulate and to satisfy some heuristic criteria which may be reasonably demanded of an indifference rule. The question of the interpretation of mathematical statistical statements in applications is considered, and a dual mode of interpretation of the Bayesian model is proposed.

The indifference procedure is applied to one and two parameter normal models, uniform and exponential models, Poisson and multinomial models, normal regression models, and several other models. It is not proposed or supposed that this paper contains any final statement on the general problem of indifference specifications. It is suggested, however, that the proposed procedure yields reasonable specifications in the relatively wide range of cases considered and may be capable of further extension and refinement.
CHAPTER II
THE PARAMETRIC FORMULATION
In this chapter we introduce and discuss the parametric formulation of the statistical inference model and contrast the classical and Bayesian approaches to statistical inference. We associate the real- or vector-valued random variable X, taking values x in the Euclidean space 𝒳, with the possible outcomes of an experiment performed on experimental units drawn from a defined population. We assume that a distribution function M(x) and a corresponding density function m(x) (with respect to a fixed measure defined on the Borel sets of 𝒳) may serve as an adequate probabilistic representation of the relative frequencies of the possible outcomes of the experiment. When M is known, no statistical problem exists, as the probability of any collection of outcomes may be calculated. When M is unknown, a statistical problem exists and may be approached in various ways.
An important consideration is the character of the assumptions which the statistician is willing to make about M. A common assumption is that M belongs to a class 𝔐 of distribution functions and that these distributions may be indexed by a real- or vector-valued parameter θ ∈ Θ. By this is meant only that there exists a one-to-one correspondence between the points of the parameter space and the distributions in 𝔐. The parametrization of a class of distributions, however, is not unique. Any reversible transformation on Θ may be taken as an indexing parameter for 𝔐. The choice of a particular parametrization has been subject to convention and convenience but not to logical specification.
For example, let 𝔐 be the class of normal distributions with density functions indexed by the parameter σ and written as

$$ m_\sigma(x) = (2\pi)^{-1/2}\,\sigma^{-1}\exp\{-x^2/2\sigma^2\}\,, \qquad -\infty < x < \infty,\ 0 < \sigma < \infty\,. \tag{2.1} $$

In this particular formulation the chosen parameter σ corresponds to the standard deviation of the distribution. 𝔐 could, however, have been indexed by α = σ², the parameter corresponding to the variance, by ρ = σ⁻², the parameter corresponding to the precision, or by λ = −log σ√(2πe), the parameter corresponding to the (Shannon) information of the distribution. Often a parameter is chosen to correspond with a moment or other important characteristic of the distribution, or otherwise simply for convenience. A parameter usually does indicate some kind of ordering of the densities, however, and therefore it is reasonable to consider only a somewhat restricted class of possible transformations.
We shall assume that there is some natural parameter θ = (θ₁, θ₂, ..., θ_r) and shall restrict ourselves to monotone and differentiable transformations λ_i = φ_i(θ_i) such that φ_i(θ_i) may be expressed as the finite sum Σ_j a_j θ_i^{b_j}, where the a_j and b_j are real numbers. This last requirement will not be necessary for some examples. For simplicity of development we shall further assume that Θ is an n-dimensional interval. Finally, we assume that the scale of measurement of the random variable is fixed up to a linear transformation.
The specification of the class 𝔐 involves an assumption about M which presumably is made on the basis of prior knowledge. This knowledge may have been gained in a variety of ways. We may have sufficient knowledge concerning the physical process generating the random variable to completely specify 𝔐. For example, the assumptions of the Poisson process may be valid, and hence 𝔐 may be the class of Poisson or negative exponential distributions, depending on the nature of X. We may have considerable experience with similar experiments or processes and hence may find it useful to use this information as a basis for specifying 𝔐 by analogy. Finally, 𝔐 may be obtained by empirical curve-fitting based on previous observations of X.
The specification of the population, the experiment, the sampling rule, the random variable, the restriction of M to a particular class 𝔐, the specification of the dominating measure and associated metric, and the specification of the parameter element θ will be referred to as the parametric structuring of the statistical inference problem. For purposes of this paper sampling will be restricted to simple random sampling, and the dominating measure will always be either Lebesgue or counting measure. The distribution function M_θ will be called the model distribution, and the corresponding density m_θ will be called the model density. Thus, the model density will either be the usual probability density function of a continuous random variable or the usual frequency function of a discrete random variable.
The assumptions used to specify the model are not, within the inference procedures to be discussed, subject to modification on the basis of the data. Ideally these assumptions would represent only certain and exact knowledge. The statistician, in employing a model, is using a necessary, convenient and useful fiction. He must compromise between accuracy and convenience in choosing a model and then subject it to continuing re-evaluation based upon inference drawn from outside the model. Thus all inferences made using these procedures are conditional inferences, conditional upon the accuracy of the model.
On the basis of the information available to him, the experimenter will design an experiment and specify measurements to define a random variable. At times either the experiment chosen or the measurements specified will not be such as to utilize all of the prior certain knowledge. For example, the experimenter may know that the process under study could yield measurements which would be very nearly normally distributed. However, for convenience or by necessity, he may choose to limit himself to measuring whether a resultant value is greater than or less than some fixed value; that is, he specifies a binomial model instead of a normal model. Alternatively, a normal model may be appropriate, but all observations outside certain limits are truncated (or censored) so that a truncated normal model is in fact appropriate for the experiment as actually performed.
A third example is provided by a multinomial experiment in which draws are made from an urn containing balls of several colors, but a record is made only as to whether or not they are black; or sampling from the urn may be inverse rather than direct, inducing a negative binomial rather than a binomial model.

The term experimental model may be used to refer to the model appropriate to the experiment as performed and measurements as taken. The term informational model may be used to refer to the model appropriate to an experiment and resultant measurements which are such as to incorporate all prior certain knowledge. Fraser (1962) has also expressed this kind of idea in a discussion of a paper by A. Birnbaum. For the purposes of this paper it is assumed that the experimental model is identical with the informational model. More generally, it would seem necessary that the Bayesian parametric structuring be determined with reference to the informational model when the experimental and informational models differ. The technical problems involved in this case will not be studied here.
The classical approach to the statistical inference problem in its parametric formulation has been to consider the primary statistical problem to be that of drawing inferences about θ, without recourse to prior distributions or considerations of actions or attendant losses. The inferences are usually expressed in the form of point estimates, interval estimates, or tests of hypotheses. For convenience, the term classical approach will be used to denote the theory employing just these and related techniques. It is thus distinguished from both the decision-theoretic approach and the Bayesian approaches.
From a purely mathematical point of view, the theory underlying the classical approach seems unimpeachable. Yet there has been a continuing and particularly vocal criticism of these techniques by those who find a discrepancy between the valid interpretations of classical theory and the requirements of scientific reporting or individual decision making. A general agreement with a number of these criticisms underlies a motivation for the research reported here; however, neither an appraisal nor a review of all of these criticisms is attempted. The major advantages and disadvantages of the Bayesian approach as compared to the classical approach will be sketched, though not in detail. No further reference will be made to the decision-theory approach, as it is not primarily designed for purposes of scientific inference (Lindley, 1956).
In the Bayesian approach the parametric formulation is as in the classical approach. In addition, it is assumed that there is specified, a priori, a probability measure on the Borel sets of Θ, and that the model density m_θ(x) is regarded as the conditional density of X given θ. The dominating measure is usually, and here, assumed to be Lebesgue measure, and the corresponding density function will be denoted by b(θ). The joint density of (X, θ), of continuous or mixed type, is thus given by m_θ(x) b(θ). (We do not distinguish the random variable θ and its values.)
The joint density function of n independent observations, as a function of the vector of observations for fixed θ, is denoted by

$$ m_{\theta,n}(\underline{x})\,, \qquad n = 1, 2, 3, \ldots\,. \tag{2.2} $$

This same expression, as a function of θ for fixed x̲, is called the likelihood function. The prior unconditional density of a single observation, to be called the (prior) fiducial density of X, is defined by

$$ f(x) = \int_\Theta m_\theta(x)\, b(\theta)\, d\theta\,. \tag{2.3} $$

Given a vector x̲ of n observations, the posterior density of θ, given x̲, is given by

$$ b_n(\theta) = \frac{m_{\theta,n}(\underline{x})\, b(\theta)}{\int_\Theta m_{\theta,n}(\underline{x})\, b(\theta)\, d\theta}\,, \qquad n = 1, 2, 3, \ldots \tag{2.4} $$

(where x̲ has been suppressed in the notation b_n(θ)). This formula is known as Bayes theorem. The posterior fiducial (unconditional) density of X is given by

$$ f_n(x) = \int_\Theta m_\theta(x)\, b_n(\theta)\, d\theta\,, \qquad n = 1, 2, 3, \ldots\,. \tag{2.5} $$
The definition of the random variable X and the selection of the indexing element θ are often such as to restrict the domains of X and θ to some subset of their respective spaces. The region 𝒳 (or Θ) of the definition of X (or θ) will be referred to as the spectrum of X (or θ). The spectrum is thus the region of (possibly) positive density. The sequence {b_n(θ)}, n = 0, 1, 2, ..., with b₀(θ) = b(θ), will be called the sequence of Bayes densities. The sequence {f_n(x)}, n = 0, 1, 2, ..., with f₀(x) = f(x), will be called the sequence of fiducial densities. The letter b will be used to denote any Bayes density, m to denote any model density, and f to denote any fiducial density, without implying the identity of densities thus denoted.
Given a prior Bayes density b(θ), it is possible to combine the implied evaluation of prior information with the obtained experimental results to obtain posterior evaluations of θ (or X) through the posterior Bayes density (or fiducial density). Point or interval estimates of any parameters, or tests of simple or composite hypotheses about any parameters, may thus be obtained by an appropriate integration. Such posterior evaluation is, under very general conditions, consistent, in that the posterior Bayes distribution approaches, with probability one (M), a point distribution on the true parameter value, and in that f_n(x) approaches, with probability one (M), the true density m(x). An important necessary condition is that b(θ) be positive, almost everywhere, throughout Θ (that is, that Lebesgue measure be dominated by the prior Bayes measure). Prior Bayes densities having this property will be called adaptive. This requirement corresponds to Jeffreys' convention that no possibility be excluded a priori. A general discussion of the question of consistency may be found in Le Cam (1958); however, direct verification of such consistency is not difficult for any of the examples treated here.

A second desirable property of Bayes procedures is that they depend on the sample only through a sufficient statistic.
By the Neyman factorization theorem we have

$$ m_{\theta,n}(\underline{x}) = k_{\theta,n}(s)\, h_n(\underline{x})\,, \tag{2.6} $$

where s = s(x̲) ∈ S (the spectrum of the sufficient statistic) is sufficient for θ and h_n(x̲) does not depend on θ. We assume the existence of a sufficient statistic of fixed dimensionality for n sufficiently large. Then the factor h_n(x̲) cancels out in (2.4) to yield

$$ b_n(\theta) = \frac{k_{\theta,n}(s)\, b(\theta)}{\int_\Theta k_{\theta,n}(s)\, b(\theta)\, d\theta}\,. \tag{2.7} $$

The function k_{θ,n}(s) is called a kernel of the likelihood (Raiffa and Schlaifer, 1961).
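A small sketch of this sufficiency property for the Bernoulli model (hypothetical data; the posterior depends on the sample only through n and s = Σxᵢ):

```python
import numpy as np

theta = np.linspace(0.0005, 0.9995, 1000)
dt = theta[1] - theta[0]
prior = np.ones_like(theta)               # any fixed prior Bayes density will do

def posterior(xs):
    """Posterior on the grid; the likelihood involves xs only through sum(xs)."""
    lik = theta**sum(xs) * (1.0 - theta)**(len(xs) - sum(xs))
    post = lik * prior
    return post / (post * dt).sum()

# Two samples with the same sufficient statistic (n = 4, s = 2)
# yield identical posterior Bayes densities:
print(np.allclose(posterior([1, 1, 0, 0]), posterior([0, 1, 0, 1])))   # True
```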
A third desirable property of Bayes procedures is that they depend only on the joint likelihood. Thus, given a prior Bayes density b(θ) and a likelihood function m_{θ,n₁}(x̲), the resultant posterior Bayes density b_{n₁}(θ), if used as a prior Bayes density with a new likelihood function m_{θ,n₂}(x̲), will yield the same final posterior Bayes density as if the data yielding the two separate likelihood functions were pooled to yield a single likelihood and combined with b(θ) to yield a posterior Bayes density. More generally, the posterior Bayes density does not depend upon the order in which the data were received, the sampling or sequential stopping rule employed, or considerations of what results might have been obtained from the experiment, but only upon the experimental model and the actual results obtained, as summarized in the likelihood kernel. (This is in keeping with the likelihood principle, e.g. Birnbaum (1962).)
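A numerical sketch of this pooling property (two hypothetical Bernoulli batches under a uniform prior; updating in two stages agrees with updating on the pooled likelihood):

```python
import numpy as np

theta = np.linspace(0.0005, 0.9995, 1000)
DT = theta[1] - theta[0]

def combine(prior, lik):
    """Form a posterior density from a prior and a likelihood on the grid."""
    post = prior * lik
    return post / (post * DT).sum()

prior = np.ones_like(theta)               # b(theta) = c
lik1 = theta**3 * (1 - theta)**2          # batch 1: 3 successes in 5 trials
lik2 = theta**1 * (1 - theta)**4          # batch 2: 1 success in 5 trials

two_stage = combine(combine(prior, lik1), lik2)
pooled = combine(prior, lik1 * lik2)      # single pooled likelihood
print(np.allclose(two_stage, pooled))     # -> True
```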
The problem of multiple comparisons does not arise in a Bayesian analysis. Since the joint posterior Bayes density of all parameters is available, the statistician is free to transform this joint density to one on any other set of parameters more appropriate to the statements he may wish to make. A suitable region in this new space may then be defined so that the integral of b_n(θ) over this region is not less than some fixed level. Individual statements can then be made marginally about each of the parameters involved, with total probability as given by the joint statement.
Finally, and most importantly, the Bayesian approach permits direct probability statements about any parameter of interest. The primary criticism of classical theory is that it does not permit direct probability statements about θ but rather requires inferences about θ based to a large extent upon a logic which Fisher (1956) points out to be a simple disjunction. For example, in classical hypothesis testing, if the data fall in the rejection region, having small probability under the null hypothesis, either a rare event has occurred or the null hypothesis is not true. This constitutes a disjunction, and hence we infer that the null hypothesis is not true. However, no direct statement about θ is made nor is one possible, and even after the conclusion of the disjunction is asserted, nothing is said about the relative credibility of various subsets of Θ. In the Bayesian approach the credibility of any hypothesis set can be determined. On the other hand, the advantages of the Bayesian approach are all dependent upon the availability of a prior Bayes density.
CHAPTER III

DEFINITIONS OF PROBABILITY

3.1 Semantics and Syntactics

In this chapter we discuss the meaning of probability and introduce a dual mode of interpreting the Bayesian model. The Bayesian approach to the statistical inference problem has been subject to criticisms other than those involving the difficulties in specifying the a priori probabilities. These criticisms have not dealt with the mathematical validity of the theory, which seems as unimpeachable as the mathematical validity of the classical theory, but rather with its applicability. Thus, given the availability of a prior Bayes density, the question of relative applicability is essentially the sole basis for comparing the relative merits of the two theories. To clarify this point it is convenient to review some fundamental principles of mathematical logic and to introduce certain definitions. Mathematics proceeds by means of the axiomatic method, by which a purely abstract theory is developed deductively on the basis of the specification of certain elementary entities, the assumption of certain relations among these entities, and the acceptance and use of a formalism of logical deduction. The sole necessary requirement of a valid mathematical theory is that it be consistent in that no logical contradictions are derivable in it. It is in this sense that it has been said that the mathematical validity of both the classical and Bayesian theories seems unimpeachable.
The axiomatic development of probability theory requires a definition of probability within the mathematical system. This will be termed the syntactic definition of probability. A syntactic definition of probability based upon the theory of measure was first proposed by Kolmogorov (1933) and advocated by Cramér (1946), Doob (1941), and others, and is now the generally accepted axiomatic formulation. Within this theory, probability is simply a normed measure defined on a fixed σ-algebra of sets.
The use of any abstract logical system in the study of the physical world requires the establishment of a correspondence between the elementary units of the abstract theory and the physical elements under study. The axioms of the abstract theory must further be chosen so as to conform with known simple behavior of the physical elements. The establishment of this correspondence between axiomatic probability theory and the physical world involves the specification of what will here be termed the semantic definition of probability. Classical theory has been associated with what is known as the relative frequency (semantic) definition of probability. Bayesian theory has been associated with essentially two semantic definitions.
3.2 Frequency Probability

The frequentist view of the application of the probability calculus has been dominant throughout this century. Indeed the feeling had been so intense that this was the only legitimate application of probability theory that it is only within the last several years that any widespread effort has been made to employ techniques applicable to any other definition. Two unfortunate results ensued from this preoccupation with frequency interpretation. The first was the confusion between syntactic and semantic probability, which led to the attempt to give a syntactic definition of probability in terms of relative frequency. The second was that the number of new developments in the Bayesian syntax since the time of Laplace has been severely limited. Important contributions have been made, particularly by Jeffreys and more recently by a number of writers, but the work of a few men could not match that of an army of frequentists. In confronting an applied inference problem today the statistician may be forced to employ a classical procedure whether or not he considers it really appropriate, because the corresponding Bayesian procedure has not been derived.

The frequentist theory requires that every probability be interpretable as the limit of a relative frequency. If it is said that the probability is p that a ball drawn from a hypothetical urn containing an infinite number of balls will be red, then this must mean only that the observed proportion of red balls from an arbitrarily large sample of balls must almost always be arbitrarily close to p. No other interpretation is permitted. For a detailed exposition of this theory one may refer to Carnap (1945), Von Mises (1939), and Reichenbach (1935). For interesting evaluations of its limitations in statistical inference one may read Savage (1954), Raiffa and Schlaifer (1961), Birnbaum (1962), and the discussions by Good following the last paper.
3.3 Logical Probability

The first semantic definition applicable to Bayesian theory has been termed logical probability and may be associated with the works of Bayes (1763), Laplace (1820) and Jeffreys (1961). The second has been termed personal probability and may be associated with the works of Ramsey (1926), De Finetti (1937), Good (1950), Savage (1951), and Schlaifer (1959). Comprehensive explications of these semantic definitions may be found in the works of Jeffreys and Savage. A short but stimulating survey of semantic definitions, together with a vigorous discussion of the axiomatic method, may be found in Good (1950).

In simplest terms logical probability involves the making of numerical statements concerning the relative credibility of hypotheses (parameters), given all prior and experimental data. A satisfactory explication of logical probability is dependent upon the availability of a satisfactory Bayesian postulate. A proposed theory of logical probability is tested by verifying its internal consistency and by verifying that simple derivations from the Bayesian syntax lead to intuitively plausible interpretations. Having validated these simple derivations we would then be inclined to believe that more complex derivations would yield meaningful and valid results, though in these cases our intuitions may not be sufficiently schooled to judge. It is because of this limitation on the scope of our intuition that we require a theory. For this reason each mathematical result obtained requires semantic validation. A single important example which grossly offended our intuition would be sufficient to question the generality of a theory of logical probability, provided we were convinced that our intuitions could not be enlightened to eliminate the offence. Fortunately, the ease with which intuition can adapt to interpret and explain mathematical derivations is remarkable.
3.4 Personal Probability

Personal probability has to do with the risk-taking behavior of an idealized rational person. If it is said that your personal probability that a ball drawn as in 3.2 will be red is p, then this means that p/(1−p) is the odds that you would "barely be willing to offer" for a red ball against another ball (Savage, 1962).

A convenient way of classifying a Bayesian theory as pertaining to logical probability or personal probability may, following Savage, be based upon the theory's proposed method of specifying the prior Bayes density. If the specification is based upon the experimental or introspective evaluation of a person's a priori beliefs concerning the relative credibility of the available hypotheses, the theory will be considered to be a theory of personal probability. If the specification is based upon consideration of symmetry or indifference in the spirit of Bayes postulate and modifications based upon prior data, then the theory will be considered to be a theory of logical probability.
The problem studied in this paper is that of specifying the prior Bayes density under indifference when it is supposed that the resultant statistical model is to be used for the evaluation of scientific data. It is important to note that we are dealing with a problem in the area of statistical inference theory as opposed to statistical decision theory (Lindley, 1956). While it is possible that some ultimate action may be taken on the basis of the inferences drawn, it would be desirable that these inferences be made independently of any considerations of possible actions or resultant losses. The proposition (Savage, 1962) that a person faced with a choice of actions must rationally behave "as if" he had some prior distribution and some loss function seems reasonable, but it seems equally reasonable that, for the problem discussed in this paper, the prior distribution must be free of personal bias.
A second consideration precludes the adoption of the personal approach. Often it is desirable to assume no prior knowledge, not because there is none, but because there is too much. Several theories may have been proposed and several sets of contradictory data may be available, each supporting a different theory. In this context it may be convenient to disregard all prior data, in which case the indifference requirement is clearly one of symmetry with respect to the possible theories. This consideration really only underscores the idea that scientific inference involves group inference; hence prior Bayes distributions cannot properly be determined on the basis of personal bias, and thus, for the purposes of scientific inference, the requirement is for a theory of logical probability with respect to statements about parameters.
We have purposely eschewed the use of the commonly used terms subjective and objective probability. Bayes methods, whether serving as a model for logical probability or for personal probability, may be either subjective or objective depending upon how the prior Bayes density is determined. In the case of personal probability, if the prior Bayes density is determined by the introspective evaluation by the person concerned of his beliefs, then the method may rightly be called subjective. If, however, his beliefs are obtained experimentally through a measurement of his behavior, then the method may rightly be called objective. In the case of logical probability, if specific prior data are combined with a satisfactory Bayesian postulate to produce a prior Bayes density, we may term the method objective. If some "general consensus of opinion" concerning current available data about the parameter is used to modify a prior density obtained from a satisfactory Bayesian postulate, we may term the method subjective. The term subjective probability is taken (by some writers) to be synonymous with the term personal probability.
3.5 Fiducial Probability

Classical theory and current Bayesian theories deal primarily with techniques for making inferences about θ. But θ is, generally, nothing more than an arbitrary indexing element of the class 𝔐 which happens to correspond to an important characteristic of the distribution, and as indicated in example (2.1) the correct choice of parameter is often not clear. In light of the structuring defined above, it would seem that the more fundamental subject of interest is the random variable X, or values thereof in future repetitions of the experiment. This concept was introduced by Novick (1962), and later and independently a similar idea was suggested by Fraser (1962).
To understand nature is to predict successfully, and the proof of understanding is valid prediction. We therefore submit that a realistic prime object of inference is to provide a prediction model (which we shall term fiducial) to be substituted for the true but unknown model specified by M. In line with the classical approach, we may describe this approach as one of estimating the true model density function m(x) by the fiducial density function f_n(x). The point is not whether it is possible, by use of classical or Bayesian techniques, to make statements about X, or to estimate m(x), but rather whether the assertion of these statements is considered to be of first importance in formulating the inference procedure. In the Bayesian approach we propose that any indifference principle be consistent with this concept. The technique for utilizing this idea will be presented in Chapter IV. Of course, fiducial prediction does not preclude Bayesian logical probability inferences about parameters.
The use of the posterior marginal (fiducial) density of X first appeared as the law of succession, Laplace (1820), and has played an important part historically in the evaluation of proposed indifference rules. The idea of asserting statements about X and statements about θ in a completely symmetric manner was proposed by Fisher (1933) in his discussion of the fiducial distribution of the (n+1)-st observation. Our choice of the term fiducial for this density is based upon our interpretation of this neglected aspect of Fisher's work.
Although fiducial statements may be considered as logical probability statements about the (n+1)-st observation, they are capable of certain direct frequency interpretations. If after each observation from one fixed urn we use the fiducial density to predict the next observation, for example by predicting that the next observation will fall in the central (1−α)-interval of the fiducial density, then the continuation of this process will lead to a long run relative frequency equal to (1−α) of (n+1)-st observations falling in the predicted (1−α)-interval, provided the conditions for consistency are satisfied.
More importantly, at least one example exists in which an exact (not long run) frequency interpretation of fiducial statements is possible. Suppose we had available a large number of urns from which we could draw objects having associated with them a random variable which was normally distributed with unit variance but with unknown means, which might be different for the different urns. We then draw n observations from an urn, compute a central fiducial (1−α)-interval, and then determine whether or not the (n+1)-st observation from that urn falls within that interval. If the prior Bayes density of the mean μ for this urn is uniform (and hence improper - see Chapter IV), then the posterior fiducial density of X_{n+1} is normal with mean x̄_n (sample mean of X₁, ..., X_n) and variance (n+1)/n. By direct elimination methods we would find that the true distribution of X_{n+1} − X̄_n is normal with mean zero and variance (n+1)/n, independently of μ. Hence it follows that the probability that the (n+1)-st observation falls in the central (1−α)-interval of the fiducial density determined by the first n observations is exactly 1−α, and hence the relative frequency of occurrences of this event over all urns is exactly 1−α. We shall find that the principles proposed in this paper lead to the above prior Bayes density for μ.
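A simulation sketch of this exact frequency interpretation (the urn means below are drawn from an arbitrary wide distribution purely for illustration; the coverage result holds whatever the means are). With n = 5 and α = 0.10, the central fiducial interval is x̄_n ± 1.6449·√((n+1)/n), 1.6449 being the standard normal 0.95 quantile:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_urns = 5, 100_000
z90 = 1.6449                            # standard normal 0.95 quantile

# Unknown urn means, different for different urns (arbitrary choice here).
mus = rng.uniform(-50.0, 50.0, n_urns)

# n + 1 unit-variance normal draws from each urn.
xs = rng.normal(mus[:, None], 1.0, (n_urns, n + 1))
xbar = xs[:, :n].mean(axis=1)           # mean of the first n observations

# Central fiducial 90%-interval has half-width z90 * sqrt((n+1)/n).
half = z90 * np.sqrt((n + 1) / n)
coverage = (np.abs(xs[:, n] - xbar) <= half).mean()
print(coverage)                         # ~ 0.90 = 1 - alpha, over all urns
```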
Fraser (1960) has used a similar derivation for the one-parameter normal case to demonstrate a quite different frequency interpretation of fiducial probability. The basic difference between fiducial probability as proposed above and as considered by Fisher and Fraser is that the system proposed herein is avowedly Bayesian, whereas Fisher has denied that his fiducial probability is related to the Bayesian approach except by coincidence, and Lindley (1958) has verified the extent of this coincidence.
CHAPTER IV

AN INDIFFERENCE PROCEDURE

4.1 Indifference, Invariance and Improper Densities

In this chapter we review, re-interpret and extend a number of principles from the Bayesian literature and integrate them into a Bayesian indifference procedure.

The Bayes postulate was first stated for a problem which involved a class of densities most naturally indexed by a parameter whose spectrum was a bounded interval. For this case a proper uniform density could be defined on this space and taken as the prior Bayes density. Such a density expresses a kind of indifference. Other problems, however, involved densities more naturally indexed by parameters having spectra of infinite extent, a simple example being the normal model

$$ m_\mu(x) = (2\pi)^{-1/2}\exp\{-(x-\mu)^2/2\}\,, \qquad -\infty < x < \infty,\ -\infty < \mu < \infty\,, \tag{4.1} $$

in which the parameter μ takes values in an infinite interval. A second case is exemplified by (2.1), in which the parameter σ takes values in a semi-infinite interval. In these cases it is not possible to define a proper uniform density over the parameter space; however, an improper uniform density may be defined instead.
We define an improper or unnormed density as any non-negative (measurable) function g whose integral does not converge. A constant c is an improper uniform density on any interval of infinite extent. Since the posterior Bayes density will be independent of any constant multiplicative change in the prior density, it is unimportant what positive value the constant c takes. Indeed we shall always use c to indicate an improper uniform density, even when considering more than one such density and even when the constants cannot be the same.
Improper densities have long been used in Bayesian theory. Jeffreys extended Bayes postulate to parameter spaces of infinite extent by postulating that a parameter on (−∞, ∞) should have an improper uniform prior density, and that a parameter θ on (0, +∞) should have the improper density θ⁻¹, or equivalently that λ = log θ should have a uniform density on (−∞, ∞). The use of improper densities has not seemed unreasonable, particularly in that, if properly chosen, the posterior densities are proper after an appropriate number of observations. It is, for example, easily verified that with the improper uniform prior density for μ in (4.1), the posterior density of μ, given a single observation x, is proper. Similarly in the two parameter normal case, when the parameters (μ, λ) corresponding to the mean and the logarithm of the standard deviation are chosen, we find that a joint uniform density on (μ, λ) will lead to an improper joint density after one observation, but a proper conditional density of μ after one observation, and a proper joint density of (μ, λ) after two observations. This second example seems consistent with the idea that one observation yields no information about the degree of spread of a normal distribution with unknown mean.
Another motivation for considering improper densities is the desire to obtain closure for certain parametric classes of densities. Thus if g_σ(y) is the normal density with zero mean and standard deviation σ (0 < σ < ∞), it is useful to extend the definition of the density to permit a limiting density as σ tends to infinity. Since the relative density at any two points, g_σ(y₁)/g_σ(y₂), tends to a finite limit, namely unity, we say that the limiting (improper) density is defined as the (improper) uniform density. In general, if g_θ is a proper density except at a boundary point θ₀ of Θ, and g_θ(y₁)/g_θ(y₂) tends to a finite limit r(y₁, y₂) as θ → θ₀, then any improper density g for which g(y₁)/g(y₂) = r(y₁, y₂) is called a limiting improper density. Similarly, if for some other value θ′ ∈ Θ, g_{θ′}(y₁)/g_{θ′}(y₂) is a finite function r′(y₁, y₂), then any improper density g′ for which g′(y₁)/g′(y₂) = r′(y₁, y₂) is called an improper density corresponding to the point θ′. In this manner we may supply natural closures and extensions to parametric classes of proper densities. We shall later show how these extensions are uniquely determined.
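A quick numerical sketch of the first of these closures (zero-mean normal densities at two arbitrary fixed points; the relative density tends to unity as σ grows):

```python
import numpy as np

def g(y, sigma):
    """Normal density with zero mean and standard deviation sigma."""
    return np.exp(-y**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

y1, y2 = 1.0, 7.0                       # any two fixed points
for sigma in [1.0, 10.0, 100.0, 1000.0]:
    print(sigma, g(y1, sigma) / g(y2, sigma))
# The ratio tends to 1, so the limiting (improper) density is the uniform one.
```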
Even if we permit improper densities, the problem of invariance of Bayes postulate is hardly less crippling. Jeffreys' proposal that a parameter θ on (0, +∞) have the prior Bayes density θ⁻¹ does remain invariant under power transformations on θ, since then ω = θ^k also has the prior Bayes density ω⁻¹ on (0, ∞); but this is the extent of the invariance obtainable. Jeffreys (1948) also proposed that the initial probability density be taken as proportional to the square root of the determinant of the Fisherian information matrix. This rule, however, leads to contradictory specifications in some cases, and the arguments employed by Jeffreys to resolve individual cases must be considered ad hoc. There does not appear to be any general way of specifying prior densities which will be invariant under any but severely limited classes of transformations. Thus it might seem that the problem of lack of invariance is insolvable.
An alternative to finding a prior density exhibiting invariance under transformation would be to determine which parameter should be subject to Bayes' or some analogous postulate. If some rationale could be discovered which demanded that μ, and not μ³ say, in (4.1) should be uniformly distributed, then the problem would indeed have been solved. Now it might be tempting to say that a uniform prior density should be taken for whatever parameter seems of special interest to the statistician (Lindley 1957, Kerridge 1961). But clearly this is not an acceptable basis for a theory of logical probability. Again reference must be made to the fact that scientific inference is a problem of group inference, and just as the determination of the prior distribution cannot be subject to personal bias, it cannot be subject to personal interest. Basically a parameter is merely an indexing element of a class of densities. Huzurbazar (Jeffreys, 1961) has proposed a particular choice of parameter to be subjected to an indifference postulate in exponential class models, but little rationale has been given for this choice.
4.2 Natural Conjugate Bayes Densities

Raiffa and Schlaifer (1961) have furnished a rich formalism for the Bayesian analysis. For the purposes of this paper the essential feature to be emphasized is the theory of natural conjugate Bayes densities (NCBD). The function k_{θ,n}(s), defined in (2.6), has been termed a kernel of the likelihood. Suppose that ∫_Θ k_{θ,m}(s′) dθ exists (finite) for m = m₀, m₀+1, m₀+2, ..., and for s′ ∈ S_m, where S_m is the space of the sufficient statistic s when the sample size is m. Suppose also that the definition of k_{θ,m}(s′) can be extended in some natural way for all m > m₀ and s′ ∈ S*, where S* is the smallest interval containing ∪_i S_i. We shall then term k_{θ,m}(s′_m) a kernel of the NCBD defined by

$$ b_m(\theta) = \frac{k_{\theta,m}(s'_m)}{\int_\Theta k_{\theta,m}(s'_m)\, d\theta} \tag{4.2} $$

for all real m > m₀ and s′_m ∈ S*, and b_m(θ) will be called a natural conjugate Bayes density for m_θ. If k_{θ,m}(s′_m) is not integrable, we will still consider it to be a kernel of an improper density. The existence or non-existence of this integral for various values of m generally depends upon the choice of parameter.
For the models studied in this paper there exists a natural selection of the sufficient statistic which completes the definition of (4.2) for m > m₀. We shall use the symbol x in the kernel of the likelihood and the symbol z as the analogous quantity in the kernel of the NCBD; similarly m will replace n, and z̄ = Σ z_i/m will replace x̄ = Σ x_i/n. The analogs of the sufficient statistics are thus the parameters of the NCBD. For convenience we shall consider the sample size n to be a component of the sufficient statistic, and hence m will be considered to be one of the parameters of the NCBD.
To exhibit the general properties of the NCBD it is convenient to consider the example of the Bernoulli model. We have, as the joint likelihood of n observations,

$$ m_{p,n}(\underline{x}) = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i} = p^{\sum x_i}(1-p)^{n-\sum x_i}\,, $$

where s = Σ x_i = 0, 1, ..., n. Then k_{p,n}(s) = p^s (1−p)^{n−s}. We then define the NCBD as

$$ b_m(p) = \frac{p^{z}(1-p)^{m-z}}{\beta(z+1,\, m-z+1)} \tag{4.3} $$

for 0 ≤ z ≤ m, 0 < p < 1, m ≥ 0. The extension of the definition for non-integral m and z seems natural enough. The density (4.3) is a beta density of the first kind with parameters z and m−z. If (4.3) is taken as the prior Bayes density, then the posterior density of p given a sample of size n with x successes is

$$ b_{m+n}(p) = \frac{p^{x+z}(1-p)^{(m+n)-(x+z)}}{\beta(x+z+1,\, [m+n]-[x+z]+1)}\,, $$

which again is a beta density, now with parameters (x+z, [m+n]−[x+z]). The advantages of the use of a NCBD as a prior Bayes density, when a NCBD exists, become apparent from this example.

1. They are analytically tractable in that we can obtain a closed form expression for f(x) and b_n(θ).

2. The kernel of the prior density combines with the kernel of the likelihood in exactly the same way that two sample likelihood kernels combine.

3. The NCBD class is closed under random sampling; hence the entire class of prior and posterior densities may be denoted by (4.2) for m > m₀ and s′ ∈ S*. The class of fiducial densities is similarly determined.

A more formal and extended discussion of these points may be found in Raiffa and Schlaifer (1961).
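A sketch of the conjugate updating in (4.3), using scipy's beta density; the prior quantities m, z and the sample are hypothetical. Note that the kernel of (4.3) is a standard Beta(z+1, m−z+1) density:

```python
import numpy as np
from scipy.stats import beta

m, z = 4.0, 3.0    # hypothetical prior sample: size m with z successes
n, x = 10, 6       # observed sample: size n with x successes

p = np.linspace(0.01, 0.99, 99)

# Prior NCBD (4.3): kernel p^z (1-p)^(m-z), i.e. a Beta(z+1, m-z+1) density.
prior = beta.pdf(p, z + 1.0, m - z + 1.0)

# The prior kernel combines with the likelihood kernel p^x (1-p)^(n-x)
# exactly as two sample likelihood kernels do, giving a beta posterior.
posterior = beta.pdf(p, x + z + 1.0, (m + n) - (x + z) + 1.0)

# Check proportionality of the posterior to likelihood times prior.
kernel = p**x * (1.0 - p)**(n - x) * prior
ratio = posterior / kernel
print(np.allclose(ratio, ratio[0]))     # constant ratio -> same density: True
```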
Let m₀ (an integer) be the smallest value such that b_m is defined for m > m₀. To demonstrate the technique for extending the NCBD class to non-negative integral values of m < m₀, we now consider a second example. For the two parameter normal model with parameters μ, the mean, and λ, the logarithm of the standard deviation, we shall obtain, for m > 1,

$$ b_m(\mu,\lambda) \propto \Big[e^{-(m-1)\lambda}\exp\{-v_m/2e^{2\lambda}\}\Big]\Big[\sqrt{m e^{-2\lambda}/2\pi}\;\exp\{-m(\mu-\bar z)^2/2e^{2\lambda}\}\Big]\,, $$

where v_m = Σ_{i≤m}(z_i − z̄)². For m = 1, consistent with definitions to be presented in 4.3, we shall define

$$ b_1(\mu,\lambda) = \sqrt{e^{-2\lambda}/2\pi}\;\exp\{-(\mu - z_1)^2/2e^{2\lambda}\}\,. $$

For m = 0 we define b₀(μ, λ) = c. The function b₀(μ₁, λ₁)/b₀(μ₂, λ₂) = 1; hence b_m(μ, λ), 0 < m < 1, will not be defined, as such definition is not required for the theory to be presented in this paper. The second factor in the expression for b_m(μ, λ) is the conditional density of μ given λ, for m > 1. We may extend its definition in a natural way, to be made explicit later, to values 0 < m < 1. The class of fiducial densities is defined by

$$ f_m(x) = \int_\Theta m_\theta(x)\, b_m(\theta)\, d\theta $$

for m = 0, 1, ..., m₀ and m > m₀. Thus in extending the domain of the NCBD class we automatically extend the domain of the class of fiducial densities.
4.3 Minimum Prior Data

Raiffa and Schlaifer do not explicitly propose a Bayesian postulate. Indeed, they would seem to indicate that they believe that a consistent one is not possible (Raiffa and Schlaifer, 1961, p. 66; Schlaifer, 1959, p. 445). Yet the formalism which they have developed leads to a principle which, though it is not itself sufficient, will prove useful in developing a consistent Bayesian procedure. This principle is at least suggested by the choice of certain limiting operations taken (e.g. Raiffa and Schlaifer, 1961, p. 278, p. 292, p. 302, etc.).
The analytic tractability of certain classes of Bayes densities for certain models has long been known, and the restriction of prior densities to these classes has long been advocated on this basis alone. The formalism of Raiffa and Schlaifer, however, permits a most fruitful heuristic justification. If we require that all prior densities be determined by prior data, we might, for example, consider the quantities m and z in (4.3) to refer to some prior sample of size m with z prior successes. It would be natural then to see what happened as m → 0 and z → 0, with the restriction z ≤ m, i.e. to see what happened as the prior data went to naught. We obtain

$$ \lim_{\substack{m \to 0,\; z \to 0 \\ z \le m}} \frac{p^{z}(1-p)^{m-z}}{\beta(z+1,\, m-z+1)} = 1\,, \tag{4.4} $$

i.e. the uniform density, which agrees with Bayes postulate.
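A numerical sketch of the limit (4.4): as the hypothetical prior data go to naught, the beta density of (4.3) flattens to the uniform density:

```python
import numpy as np
from scipy.stats import beta

p = np.linspace(0.05, 0.95, 10)

# Let the prior data go to naught: m -> 0 and z -> 0 with z <= m.
for m in [1.0, 0.1, 0.001]:
    z = m / 2.0                          # any choice with z <= m will do
    print(m, beta.pdf(p, z + 1.0, m - z + 1.0).round(3))
# The printed densities approach 1 everywhere on (0, 1): the uniform density.
```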
This type of procedure is quite generally applicable to natural conjugate Bayes densities, though we shall in some cases consider different limiting values than those taken by Raiffa and Schlaifer. The choice of limits taken and the definition of the limiting values will always be such as to be consistent with the principle that the kernel of the Bayes density was determined by prior observations.
So that the sample size parameter may have fractional values, and so that limits, as the sample size approaches zero, may be precisely defined, we shall introduce explicit definitions of sums, products, maxima and minima when the index has a non-integral upper limit; any other convenient definitions leading to the same limiting behavior might do as well. Let m be positive and let m* be the largest integer less than or equal to m. For any function ω(z), let ω_i = ω(z_i), and suppose the infimum of the range of ω is a and the supremum is b. We then define:

$$ (1)\quad \sum_{i=1}^{m} \omega_i = \sum_{i=1}^{m^*} \omega_i + (m - m^*)\,\omega_{m^*+1}\,; $$

$$ (2)\quad \prod_{i=1}^{m} \omega_i = \Big(\prod_{i=1}^{m^*} \omega_i\Big)\,\omega_{m^*+1}^{\,m-m^*}\,; $$

$$ (3)\quad \max_{i \le m} \omega_i = \max_{i \le m^*} \omega_i + (m - m^*)\Big(\max_{i \le m^*+1} \omega_i - \max_{i \le m^*} \omega_i\Big)\,, \quad \text{where}\ \max_{i \le 0} \omega_i = a\,; $$

$$ (4)\quad \min_{i \le m} \omega_i = -\max_{i \le m}\,(-\omega_i) = \min_{i \le m^*} \omega_i - (m - m^*)\Big(\min_{i \le m^*} \omega_i - \min_{i \le m^*+1} \omega_i\Big)\,, \quad \text{where}\ \min_{i \le 0} \omega_i = b\,. $$
Hence, taking limits as m tends to zero, we have:

$$ (1')\ \lim_{m \to 0} \sum_{i=1}^{m} \omega_i = 0\,; \qquad (2')\ \lim_{m \to 0} \prod_{i=1}^{m} \omega_i = 1\,; \qquad (3')\ \lim_{m \to 0} \max_{i \le m} \omega_i = a\,; \qquad (4')\ \lim_{m \to 0} \min_{i \le m} \omega_i = b\,. $$
In particular, if the range of z is (−∞, +∞), then

$$ \lim_{m \to 0}\, \max_{i \le m} z_i = -\infty\,, \qquad \lim_{m \to 0}\, \min_{i \le m} z_i = +\infty\,. $$

These limits and the value m = 0 will be referred to as the null values of the parameters of the NCBD. We shall refer to this procedure of obtaining a prior Bayes density, by allowing the parameters of the Bayes density to approach null values (which are consistent with the interpretation of no prior data), as the principle of Minimum Prior Data (MPD).
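A sketch of definitions (1) and (2) and their limiting behavior (the ω values are hypothetical; ω_{m*+1} is indexed from zero below as w[ms]):

```python
import math

def frac_sum(w, m):
    """Sum of omega_i to a possibly non-integral upper index m, per (1)."""
    ms = math.floor(m)                    # m* = largest integer <= m
    return sum(w[:ms]) + (m - ms) * w[ms]

def frac_prod(w, m):
    """Product of omega_i to a possibly non-integral upper index m, per (2)."""
    ms = math.floor(m)
    out = 1.0
    for wi in w[:ms]:
        out *= wi
    return out * w[ms] ** (m - ms)

w = [2.0, 3.0, 5.0]                       # hypothetical omega values
for m in [2.5, 1.0, 0.5, 0.01]:
    print(m, frac_sum(w, m), frac_prod(w, m))
# As m -> 0 the sum tends to 0 and the product to 1, as in (1') and (2').
```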
In typical cases the application of this principle leads to a prior Bayes density which may be included in the NCBD class by the extension technique described in 4.2. The choice of prior density obtained by this extension technique will always be taken so that the posterior density after n observations with sufficient statistic s will be identical to the posterior density obtained by employing the general form of the NCBD as the prior density to obtain a posterior density, and then evaluating that posterior density for null values of the parameters of the prior NCBD. To illustrate, the NCBD for the location-parameter normal model is

$$ b_m(\mu) = \sqrt{\frac{m}{2\pi}}\,\exp\{-m(\mu - \bar z)^2/2\}\,. $$

The extension technique yields b₀(μ) = c. Now if b_m(μ) is taken as the prior Bayes density, the posterior Bayes density is

$$ b_{m+n}(\mu) = \sqrt{\frac{m+n}{2\pi}}\,\exp\Big\{-(m+n)\Big(\mu - \frac{\sum z_i + \sum x_i}{m+n}\Big)^2\Big/2\Big\}\,, $$

which, upon evaluation for the null values m = 0, Σ z_i = 0, is

$$ b_n(\mu) = \sqrt{\frac{n}{2\pi}}\,\exp\{-n(\mu - \bar x)^2/2\}\,, $$

which is identical to the posterior Bayes density obtained by employing the prior Bayes density b₀(μ) = c.
Unfortunately, this heuristic principle cannot stand on its own as an invariant Bayesian postulate. Suppose θ = φ(p) is an allowable transformation for the parameter of the Bernoulli model. Let p = ψ(θ) be the inverse transformation; then a more general form of the likelihood is

$$ m_{\theta,n}(\underline{x}) = \psi(\theta)^{\sum x_i}\,\big(1-\psi(\theta)\big)^{n-\sum x_i}\,, $$

and the NCBD for θ is

$$ b_m(\theta) = \frac{\psi(\theta)^{z}\big(1-\psi(\theta)\big)^{m-z}}{\int \psi(\theta)^{z}\big(1-\psi(\theta)\big)^{m-z}\, d\theta}\,, \qquad \text{for } m > z > 0\,. $$

This implies

$$ \lim_{\substack{m \to 0,\; z \to 0 \\ z < m}} b_m(p) = \frac{\varphi'(p)}{\int_0^1 \varphi'(p)\, dp} = \frac{\varphi'(p)}{\varphi(1) - \varphi(0)}\,, $$

which will not be consistent with (4.4) unless θ is a linear function of p. The MPD principle is thus subject to exactly the same difficulty as the original Bayes postulate.
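A numerical sketch of this failure of invariance, taking the allowable power transformation θ = φ(p) = p³ as a hypothetical example:

```python
import numpy as np

phi = lambda p: p**3          # allowable transformation theta = phi(p)
dphi = lambda p: 3 * p**2     # its derivative phi'(p)

p = np.linspace(0.05, 0.95, 10)

# MPD applied on the theta scale yields a uniform density in theta; carried
# back to p it is phi'(p) / (phi(1) - phi(0)), which is 3 p^2 here:
induced = dphi(p) / (phi(1.0) - phi(0.0))
print(induced.round(3))       # far from the uniform density obtained in (4.4)
```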
Raiffa and Schlaifer have suggested that when there exists some prior knowledge concerning the parameters, prior distributions might be chosen so as to conform with subjective betting odds that a person might place on various values. However, accepting the idea that the prior density is to be made up from knowledge gained from prior observations, it would seem more natural, in developing a model for scientific inference, to select a prior density which corresponds to an equivalence between our prior knowledge and the results of some prior hypothetical experiment. That is, we might say that our current knowledge concerning the parameter is approximately that which we would have if we had observed an experiment of a certain size and obtained certain results, with no information prior to that. Since we have extended the definition of the class of natural conjugates to non-integral m, we permit a continuum of prior information, so that this class may be sufficiently rich to adequately characterize prior data obtained from non-independent observations or non-equivalent experiments.
4.4 Minimum Bayes Information

In light of the parametric formulation adopted, it is possible to consider the Bayes problem to be that of making the minimal possible assumption about the process under study, subject to the assumptions implicit in the model. This may be thought of as choosing a prior Bayes density which is minimally informative. The amount of information in a density function may be associated with the lack of spread in that function. For example, a density which was almost entirely concentrated over a small interval would tell us rather precisely what the value of an observation from the associated population might be, whereas a density function which was relatively flat over the spectrum of X would not yield very precise predictions. In the first instance we would consider the density function to be very informative; in the second case we would consider the density function to be very uninformative.

One possible measure of the degree of spread or lack of information in a distribution is the variance of that distribution. It is not, however, difficult to construct rather concentrated densities which have infinite variance. A second possible measure of information is that due to Fisher, who defines the information in a sample as the expected value of the square of the derivative of the logarithm of the likelihood function. The important use of this function is in maximum likelihood estimation and in the Cramér-Rao lower bound for the variance of an unbiased estimator. As a true measure of information it is hardly satisfactory, since it defines the amount of information obtained as being directly proportional to the number of observations.
This contradicts the well-accepted principle of decreasing marginal utility of successive equidistributional observations, that is, the fact that the first n observations from a process yield more information about that process than the conditional increment in information afforded by the second n observations given the information from the first n observations. The variance is preferable in this respect; for example, the variance of the mean is given by the variance of the random variable divided by n, and hence the marginal decrement in variance is a decreasing function of n, assuming the variance of X is finite.
Another measure of the amount of information in a distribution was taken from the work of Shannon (1949) and applied in a Bayesian context by Lindley (1956, 1957, 1961, 1962) and by Kerridge (1961). A related application by the physicist E. T. Jaynes (1957) seems to have gone unnoticed by statisticians. If X has density f(x) (with respect to Lebesgue measure), then the information in f is defined as

    I(f) = ∫ f(x) log f(x) dx,                                      (4.6)

or by the corresponding sum in the discrete case. The convention f(x) log f(x) = 0 when f(x) = 0 is assumed. For a joint density h(x,y) the information is defined by

    I(h) = ∫∫ h(x,y) log h(x,y) dx dy,                              (4.7)

with the restrictions as above. The important defining relation of the information measure is

    I(h) = I(g) + E_g I(f|y),                                       (4.8)

where I(g) is the information in the density g(y) of Y, and E_g I(f|y) is the expected value, with respect to the density g(y), of the information in the conditional density of X given Y. It has been shown by Shannon that, in the discrete case, I(h) is the unique function, for fixed logarithm base, satisfying this condition and a mild continuity property. It has also been shown that

    I(h) ≥ I(f) + I(g),

with equality if and only if X and Y are independent.

A number of minimum-information results are available. If X has a discrete density defined over a finite set of N points, then the density f(x) = N^{-1}, with information I(f) = −log N, is minimally informative over this spectrum. If X has a continuous density with spectrum of finite Lebesgue measure, then the (proper) uniform density over that spectrum is minimally informative. For continuous X, −∞ < X < ∞, with Var(X) fixed and E(X) arbitrary, the normal density is minimally informative. For continuous X, 0 < X < ∞, with E(X) fixed, the exponential density is minimally informative. For discrete X, x = 0, 1, 2, ..., with E(X) fixed, the geometric density is minimally informative. Finally, for discrete X, x = 0, ±1, ±2, ..., with Var(X) fixed and E(X) arbitrary, the density proportional to e^{−a(x−m)²}, a > 0, is minimally informative; this last density might be called the discrete normal density. We note that many of the Bayes densities encountered in this paper are minimally informative densities.
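As a numerical check of the definition (4.6), the following sketch (with a quadrature grid and example values of our own choosing) evaluates I(f) for a normal density and for a discrete uniform density; the results agree with −log √(2πeσ²) and −log N respectively.

    import numpy as np

    # Shannon information I(f) = integral of f log f (natural logarithm).
    def information(f_vals, grid):
        y = np.where(f_vals > 0, f_vals * np.log(f_vals), 0.0)  # f log f = 0 where f = 0
        return np.trapz(y, grid)

    # Continuous case: the N(0, sigma^2) density gives I = -log sqrt(2*pi*e*sigma^2).
    sigma = 2.0
    x = np.linspace(-40.0, 40.0, 200001)
    f = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    print(information(f, x))                               # approximately -2.1121
    print(-np.log(np.sqrt(2 * np.pi * np.e * sigma**2)))   # exact value

    # Discrete case: the uniform density on N points gives I = -log N.
    N = 10
    print(N * (1.0 / N) * np.log(1.0 / N))                 # -log 10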
The infimum of the information function on an infinite set is −∞ and may be approached, for example, by taking a normal distribution and letting its variance go to infinity. In the continuous case the supremum of the information function is infinity and may be approached by letting the above variance tend to zero. In the discrete case the maximum information is zero, which is attained by a density assigning probability one to a single point. The fact that the limiting information, as a density approaches a point density in the continuous case, is not the same as the information of a point density in the discrete case is cause for contemplation. Lindley (1956) has shown that E_x I(b_n) ≥ I(b), where the expectation over x is taken with respect to f(x). Finally, Rajski (1960) has shown that information exists (finite) under a mild regularity condition provided there exists an ε > 0 such that ∫ m(x)^{1+ε} dx < ∞.
For improper densities, or densities not having moments of order ε, the Shannon information measure is not an adequate measure of spread, as the Shannon information need not be finite. It is apparent that some improper densities are more spread out than others, and perhaps the one with the most spread is the improper uniform density. The inadequacy of the Shannon information measure is nicely illustrated by the following example. Consider the density

    m_θ(x) = log θ / [x (log x)²],    1 < θ < x < ∞.                (4.10)

It may be verified that this is a proper density, having no moments, and that its information is minus infinity for all θ, 1 < θ < ∞. By taking θ very close to one we find that almost all of the probability is concentrated near one. One would not happily say that this density was non-informative. While such densities and improper densities may be uninformative in the Shannon sense, it is clear that in a wider sense some are more uninformative than others. In particular, we accept the proposition that the improper uniform density is most uninformative. In doing this we are in effect assuming two levels of measurement of information. When Shannon information is sufficient for our purposes, we shall employ it; when it is not, i.e. when dealing with improper densities, we shall introduce other ideas with which to discriminate among densities.
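The behavior claimed for the example (4.10) is easy to verify numerically; the following sketch (the particular values of θ are illustrative choices of our own) checks that the density is proper and that its mass piles up near one as θ → 1.

    import numpy as np
    from scipy.integrate import quad

    # m_theta(x) = log(theta) / (x * (log x)^2) on (theta, infinity), theta > 1.
    def m(x, theta):
        return np.log(theta) / (x * np.log(x)**2)

    for theta in (1.01, 2.0, 10.0):
        total, _ = quad(m, theta, np.inf, args=(theta,))
        near, _ = quad(m, theta, theta + 1.0, args=(theta,))
        print(theta, round(total, 4), round(near, 4))
    # total mass is 1 in each case; for theta = 1.01 nearly all of it
    # lies within one unit of the left endpoint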
A natural application of this theory to the Bayes problem would be to require that a minimally informative prior density be chosen for the parameter. We shall refer to this as the principle of Minimum Bayes Information (MBI). If the spectrum of θ is finite, this reduces to the Bayes postulate. Lindley and Kerridge have specified that in all cases the uniform density be taken, and Kerridge has displayed an interesting property of this specification. Since we have referred to the uniform (proper or improper) density as most uninformative, we shall require, as our MBI principle, that a uniform density be taken a priori. More specifically, we shall require a uniform density on the joint spectrum of all unknown parameters of the model. The invariance problem remains, however, as we have not specified what parameters should have a uniform density. Finally we note that the MBI principle is consistent with the restriction to NCBD's, since a uniform prior is always a limiting form of a NCBD if a NCBD exists. Moreover, an MBI density is always adaptive, as is a NCBD.

4.5 Minimum Necessary Sample
We shall take a second idea from Lindley (1961), which is best introduced by the following quotation:

    "Some clues on the nature of a prior distribution can be obtained
    from statements that are commonly made ... it is said that such
    and such an observation gives no information about a parameter.
    For example it is often said that a sample of size one from a
    normal distribution of unknown mean and variance gives no
    information about the variance. To a Bayesian this presumably
    means that the prior and posterior distributions are identical."

Lindley indicates his inability to do anything substantial with this principle. We shall employ this kind of idea, though for present purposes we shall modify the statement somewhat. Let us replace the last sentence with the following:

    To an information-theoretically-oriented Bayesian this presumably
    means that the Shannon information in the prior and posterior
    densities are identical (or perhaps, in certain contexts, that
    the prior and posterior densities are both improper).

By careful statement and application this principle will be seen to be workable for important examples when used in conjunction with the other principles developed in this chapter.
In Chapter V we shall examine the normal models in detail; however, in order to introduce our method for utilizing the above general idea, it will be convenient to consider the location parameter normal model here. A NCBD for the normal model (4.1) is the class of normal densities indexed by the parameters (m, z̄); the expectation of this density is z̄ and its variance is 1/m (the corresponding fiducial density has variance (m+1)/m). Using this as a prior Bayes density, the posterior density is indexed by the parameters

    (m + n, (Σz_i + Σx_i)/(m + n)).

Clearly then, the entire class of Bayes densities can be characterized by the improper and proper densities: for m = 0 we have the improper uniform density b₀(μ) = c, while for m = ε > 0, however small, the density is proper. It is not unreasonable to think of this as implying that, with the prior improper uniform density, the posterior density will be proper after any amount of data, even an epsilonth of an observation!
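The updating of the parameters (m, z̄) is immediate to exhibit; in the sketch below the prior values and the data are illustrative numbers of our own.

    import numpy as np

    # NCBD for the normal location model: N(zbar, 1/m) on mu.
    def update(m, zbar, xs):
        n = len(xs)
        return m + n, (m * zbar + np.sum(xs)) / (m + n)   # posterior N(., 1/(m+n))

    xs = np.array([0.8, -0.3, 1.1])
    print(update(2.0, 0.5, xs))       # two prior observations with mean 0.5

    # An "epsilonth of an observation": any m = eps > 0 already yields a
    # proper density, since the variance 1/m is then finite.
    eps = 1e-6
    print(update(eps, 0.0, xs[:0]))   # no data at all: variance 1/eps, but proper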
Similarly, in the two-parameter normal model we find that the posterior joint Bayes density will be improper after one observation, but proper (with probability one) after 1 + ε observations, when the prior density is joint improper uniform and the parameters are (μ, λ). If we were to add a skewness parameter, presumably we would want to have an improper joint Bayes density after two observations but a proper one after 2 + ε observations. We shall not attempt to make explicit now just how an epsilonth of an observation may be obtained; however, it would seem reasonable that two dependent observations would on the average contain more information than one observation, provided that the dependence was not complete, and that these two observations would contain less information, on the average, than two independent observations. Also, an epsilonth of an observation may be thought of in terms of a procedure which takes an observation with probability ε and fails to take an observation with probability 1 − ε.
For most models studied we shall require that the joint Bayes density be minimally-Shannon-informative after K−1 observations, where K is the number of parameters in the model, and more than minimally informative after K−1+ε observations. The multinomial and bivariate-normal models will not follow this pattern but will be seen to be consistent with "statements that are commonly made". We specify m₀ to be the largest sample number (always an integer) for which the joint Bayes density is to be minimally Shannon informative. The Bayesian analysis presented in Chapter VII gives some insights into why, for the multinomial model, m₀ = 0, regardless of the number of categories.

We shall refer to this requirement, that the posterior Bayes density become more than minimally (Shannon) informative at just the proper time, as the principle of the Minimum Necessary Sample (MNS). In multi-parameter problems it will be most convenient to apply this principle to appropriate marginal and conditional densities.
For example, for the two-parameter normal model we will require that the conditional density of μ given λ be proper for m = ε, the conditional density of λ given μ be proper for m = ε, while the marginal density of λ be proper only after m = 1 + ε.
We shall not attempt to give general rules for determining m₀, freely admitting that this involves a real weakness in this principle. However, no difficulties arise in many important examples, including those treated herein. The heuristic idea is that m₀ should be one less than the number of whole observations needed to "say something about the unknown parameter."
For the improper uniform density c the Shannon information is −∞, zero, or +∞ according as c < 1, c = 1, or c > 1. The Shannon information for the density cx^{-1} does not even have a definition as an extended Lebesgue integral, since

    ∫₀¹ (log x / x) dx = −∞   and   ∫₁^∞ (log x / x) dx = +∞.

We shall be dealing with such improper densities and, in order to make the proposed principles more precise, it will be necessary to define the information in these densities in some consistent manner. We have no difficulty in doing this, as these densities are obtained as limiting densities of a class of proper densities. It will be convenient, then, to define the information in the limiting density as the limit of the information in the sequence of densities. Thus, for the cases discussed in this paper, we shall define the information in these densities to be −∞, this being the above limiting information.

4.6 Minimum Fiducial Information
The fourth general principle to be reviewed is that due to Novick (1962). This principle is based upon the idea that the central subject of study should properly be taken to be the random variable itself and not some arbitrary parameter. (A similar suggestion has since been made by Fraser (1962).) It was suggested that, within the context of the parametric structuring defined above, a prior Bayes density should be selected so as to minimize the information I(f) in the fiducial density of X. Having specified a particular minimally informative f, the integral equation

    f(x) = ∫_Θ m_θ(x) b(θ) dθ                                       (4.11)

is considered, where f(x) and m_θ(x) are presumed known and b(θ) is presumed to be unknown. The problem of lack of invariance under reparametrization then disappears. It is replaced, however, by two other problems. The first is the uniqueness of solution of the integral equation (4.11) (assuming a solution exists).
For the example (4.1), a factorization of the density of a single observation,

    m_μ(x) = (2π)^{-1/2} exp{−(x−μ)²/2} = e^{xμ} (2π)^{-1/2} exp{−(x² + μ²)/2},

yields the equation

    f(x) = ∫ e^{xμ} (2π)^{-1/2} exp{−(x² + μ²)/2} b(μ) dμ.

Hence e^{x²/2} f(x), and hence b(μ), are unique by the general uniqueness of bilateral Laplace transforms. For the Poisson model we take

    m_λ(x) = e^{−λt}(λt)^x / x!,    t > 0, λ > 0, x = 0, 1, 2, ....

The equation (4.11) becomes

    f(x) = ∫₀^∞ [e^{−λt}(λt)^x / x!] b(λ) dλ,  say.

The requirement that this equation hold for x = 0, 1, 2, ... requires that all solutions have the same moments and hence, since the moment theorem is applicable, the solution is unique.
The choice of a minimally informative fiducial density is more difficult than the choice of a minimally informative Bayes density. At times it will be possible to obtain a solution to (4.11) when the fiducial density is specified to be uniform. If a uniform fiducial density can be obtained from some uniform Bayes density it will be required; otherwise we will attempt to get as close as possible to a uniform fiducial density. This principle will be called the principle of Minimum Fiducial Information (MFI). Some difficulties arise in applying the principle on its own, particularly in obtaining unique solutions to (4.11) and in specifying the correct fiducial density. We shall find, however, that its application in conjunction with the other principles proposed in this chapter will be fruitful.
Since a uniform fiducial density is often unobtainable, we will often need to determine which of a class of improper densities is least informative. Ideally we would want some more general information measure than the Shannon information measure in order to make this discrimination; unfortunately we know of none currently available. If we were to attempt to make such a discrimination over an arbitrary class of improper densities, we would find that task impossible. Within the framework discussed in this paper, however, the problem seems less difficult. We shall find that we need only make such a discrimination among classes of densities which are rather obviously ordered in this respect.
To illustrate the logic we shall employ, consider the densities (4.10). Each of these functions is a proper density, though with Shannon information equal to minus infinity. However, from an intuitive point of view there is no difficulty in making the needed discrimination. For θ near one, almost all of the probability is concentrated near θ, and we must say that this density is very informative. As θ becomes large the density approaches a uniform density on (θ, ∞), and hence becomes progressively less informative over its spectrum. If we reduce the class by restricting θ to 1 < θ ≤ θ₀, then it seems most reasonable to say that m_{θ₀} is least informative.

Along this same line of reasoning, when comparing the improper densities m_θ(x) = x^{-1} (x > θ), 0 < θ ≤ θ₀, we would say that m_{θ₀} is minimally informative. Also we say that m₁(x) ∝ x^{-2} (x > 0) is more informative than m₂(x) ∝ x^{-1}, or, more generally, that m₂(x) ∝ x^{-1} is minimally informative in the class m_r(x) ∝ x^{-r-1} (x > 0), r ≥ 0 (see 5.2).
We may note that the fiducial density is proper if and only if the corresponding Bayes density is proper, for, formally,

    ∫ f_m(x) dx = ∫_Θ b_m(θ) dθ.

The stated result is then a consequence of Fubini's theorem for positive functions. Additionally we may note that ∫ f_n(x) dx exists if and only if ∫_Θ b(θ) dθ exists, where the first integration is over the n-dimensional X-spectrum. Also, Lindley has shown that E I(b_n) ≥ I(f), where the expectation is with respect to f.
f.
The Natural Bayesian structuring
In this chapter, we have introduced four principles which might
be considered in the selection of a parametrization and a prior distribution thereon.
In summary these four principles are:
(1) !'Iinimum Prior Data (MPD) mined as a limiting
the prior density shall be d.eter..
NCBD, i.e. by taking null parameters to reflect
the idea of no prior data.
(2)
Minimum B~3! Information
(MEl) -
the prior
density
should be unifonn over the (joint) spectrulll of all parameters.
J:.lini~um
(3)
ties should be
vations
Nece:3sary Sample (MNS) - the Bayes and fiducic,l deD.si,
m~nimally
n = 0,1,2, ••• ,m
a
(Shannon) informative when the number of obseris less than some postulated value m
o
more than minimally informative for all values
(11-)
n>m
but
a
~!~m11ln l"io.ucial Info~~~jY,i2"9: (ME"!) .. the prior fiducial density
shO'lld be minimally informative, subject to the restrictions imposed by
tl'e model.
50
We propose that each of the four principles defined above is a reasonable requirement for the specification of a prior Bayes density under indifference in the context of a Bayesian parametric structuring. A Bayesian parametric structuring which satisfies all of these principles will be called, for convenience of reference, a Natural Bayesian Structuring (NBS) for the model. We propose to show that a NBS can be obtained for important statistical models and that this structuring is essentially unique in these cases. By essentially unique we mean unique up to a linear transformation on the parameter. For some models the uniqueness will not depend on all four principles, though the structuring determined by any lesser number will, in these cases, satisfy all four principles.
We further propose that the results of the Bayesian analysis be represented by the densities b_n(θ) and f_n(x), the posterior Bayes and posterior fiducial densities. The posterior Bayes density is to be used, for n > m₀, to make logical probability statements about θ, where θ is the parameter which bears a uniform Bayes density a priori under the NBS. The posterior fiducial density of X, for n > m₀, is to be used in a predictive sense; that is, it is to be used to replace the unknown true density m(x) to predict values of the (n+1)-st observation.
The
~rocedure
for determining and verifying the NBS follows a
typical pattern which 'rill be demonstrated in detail for the first two
l:J.odels studied.
If the "cm:-rect II parameter j.s not known, the analysis
will follow the scmewhat more complex pattern illustrated in 5 2;
0
51
however for brevity at this point we assume that we are fortunate
enough to begin with the parameter which proves to be "correct n •
We then postulate the value m
o
and define the
This definition is than extended for values
m
o
NCBD
for
m>m.
o
= O,1,2, ••• ,m0
and
the corresponding fiducial densities are defined.
The prior density of the parameter is then obtained as an extended NCBD for null values of the parameters, thus satisfying the MPD principle. It is then shown that this prior density is uniform on the joint spectrum of the parameters, thus satisfying the MBI principle. It is then shown that the posterior Bayes densities become more than minimally (Shannon) informative at the proper time, thus satisfying the MNS principle. In multi-parameter problems the application of this principle will be made in terms of certain marginal and conditional densities. It is then shown that the prior fiducial density is minimally informative, in the sense discussed above. Finally we show that there cannot exist any other NBS. This is accomplished by means of the kinds of completeness arguments discussed in 4.6, or by extending the class of model densities so that it is complete.
At this point we wish to affirm that the four Bayesian indifference principles were developed and refined inductively through consideration of successive models, the task being to find some set of intuitively reasonable principles which would be adequate to specify uniquely the prior Bayes density under indifference for a reasonably wide range of models, and to furnish a model which permitted the kinds of semantic interpretations which seemed desirable for the purposes of scientific inference. There have been many previous attempts to do this; none of the previous proposals has presented an internally consistent and widely applicable methodology.
CHAPTER V

NORMAL MODELS

In this chapter the principles developed in the preceding chapter are applied to obtain unique specifications of prior Bayes densities under indifference for each of the three basic univariate normal models and for three bivariate normal models.
5.1 Univariate Normal Model - Location Parameter Unknown

We assume the model density

    m_μ(x) = (2π)^{-1/2} exp{−(x−μ)²/2},    −∞ < x < ∞, −∞ < μ < ∞.   (5.1)

The joint likelihood of n observations is then proportional to exp{μ Σx_i − nμ²/2}. We define the NCBD as

    b_m(μ) = (m/2π)^{1/2} exp{−(m/2)(μ − z̄)²} ∝ exp{μ Σz_i − mμ²/2},   (5.2)

for m > 0 and all z̄ = Σz_i/m. We interpret the parameters (m, z̄) of the Bayes density as referring to m prior observations with mean z̄. The fiducial density, for m > 0, is

    f_m(x) = [m/(2π(m+1))]^{1/2} exp{−[m/(2(m+1))](x − z̄)²}.           (5.3)

The MPD principle requires that the prior Bayes density be found as an extension of (5.2) for null values of (m, Σz_i). For m = 0 both (5.2) and (5.3) are improper uniform densities, and for m > 0 both are proper densities with finite Shannon information. The ratio b_m(μ₁)/b_m(μ₂) → 1 as (m, Σz_i) → (0,0); hence we define b(μ) = c. Then by direct integration we obtain

    f(x) = c.                                                          (5.4)

Thus, postulating the value m₀ = 0, we see that the MNS principle is satisfied, and we have already shown that the MPD, MBI and MFI principles are satisfied. Hence the specification b(μ) = c defines a NBS.
Since a uniform fiducial density is obtainable, it is required by the MFI principle. Hence the essential uniqueness of the NBS follows from the uniqueness of the bilateral Laplace transform. This is perhaps the only problem about which there has been no controversy concerning the "correct" choice of prior density.
Under the NBS the posterior Bayes and fiducial densities are given by (5.2) and (5.3) with (m, z̄, x) replaced by (n, x̄, x̂), where x̂ is the (n+1)-st observation of X. The Shannon information in these densities is

    I(b_n) = −log √(2πe/n)   and   I(f_n) = −log √(2πe(n+1)/n),

and depends only upon the sample size. This result is quite atypical. We note also that both the Bayes and fiducial densities are minimally informative densities, for fixed variance; also the model density is a minimally informative density, for variance one. The limiting values, as n → 0, of the functions (5.4) are −∞, i.e. minimally informative on the real line. In this case the limiting information of both the sequences of Bayes and fiducial densities is equal to the information of the limiting density. In this case, and in others employing this derivation, the requirement stipulated in Chapter II concerning the power series expansion of the transformation is not required.
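The mean and variance asserted for (5.3), and the information formula above, can be checked by simulation; the parameter values in the sketch below are our own.

    import numpy as np

    rng = np.random.default_rng(0)
    m, zbar = 4.0, 1.0

    # Draw mu from the Bayes density N(zbar, 1/m), then x from N(mu, 1):
    mu = rng.normal(zbar, np.sqrt(1.0 / m), size=1_000_000)
    x = rng.normal(mu, 1.0)
    print(x.mean(), x.var())   # approximately zbar and (m+1)/m = 1.25, as in (5.3)

    # Shannon information of the posterior Bayes density after n observations:
    n = 10
    print(-np.log(np.sqrt(2 * np.pi * np.e / n)))   # I(b_n)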
5.2 Univariate Normal - Scale Parameter Unknown

We assume the model density

    m_σ(x) = σ^{-1}(2π)^{-1/2} exp{−x²/2σ²},    −∞ < x < ∞, σ > 0.     (5.5)

The joint likelihood of n observations is

    m_σ(x) = σ^{-n}(2π)^{-n/2} exp{−Σ_{i=1}^n x_i²/2σ²}.               (5.6)

If σ were the correct parameter we would define the NCBD as

    b_m(σ) = [2(Σz_i²/2)^{m/2} / Γ(m/2)] σ^{-(m+1)} exp{−Σz_i²/2σ²}.   (5.7)
Postulating the value m₀ = 0, we see that (5.7) is not acceptable, since b_{1/2}(σ) is improper, thus contradicting the MNS principle. We therefore require a more suitable parameter. Let λ = φ(σ) be an allowable transformation and suppose φ' is of order K. The corresponding integral to be considered is then

    ∫₀^∞ σ^K σ^{-(m+1)} exp{−Σ_{i=1}^m z_i²/2σ²} dσ,

which will diverge for null values and converge otherwise for K ≤ −1. Hence the MNS principle requires that the derivative be of order less than or equal to σ^{-1}. The corresponding fiducial density will be of maximum order, i.e. of order x^{-1}, when φ'(σ) = σ^{-1}. Since λ is allowable we also must have φ'(σ) > 0. Considering any other φ (other than φ(σ) = log σ, for which φ'(σ) = σ^{-1}) satisfying these conditions, with φ'(σ) = aσ^{-1} + terms of lower order, a > 0, we find that for any such parameter the MFI fiducial density is of order x^{-1}, but in each case the addition of terms of lower order in the Jacobian results in the fiducial density tending to infinity more quickly as x → 0 than the function x^{-1}.
The series expansion requirement for φ' is necessary to employ the logic of section 4.6 and to define the improper density x^{-1} as the minimally informative fiducial density, and hence to conclude that λ = log σ is (essentially) uniquely specified as the correct parameter of the NBS. Without the power series requirement on φ' we would have difficulty in ruling out φ(σ) = (log σ)^{2n+1}, n = 1, 2, 3, .... We may also see that there is no solution K < −1 to the corresponding equation involving Γ(−2K) for −1 < a < 0, and hence we see again that x^{-1} is the required MFI density. All future examples employing this mode of derivation are dependent upon the power series requirement on φ', while those employing the derivation of 5.1 do not.
Taking λ as the correct parameter, we have

    b_m(λ) = [2(Σz_i²/2)^{m/2} / Γ(m/2)] e^{−mλ} exp{−(Σz_i²/2) e^{−2λ}},  for m > 0.   (5.8)

For m = 0, b(λ) = c, which may be demonstrated in the usual manner. The fiducial density is the "t" density

    f_m(x) = Γ((m+1)/2) (Σz_i²)^{m/2} / [Γ(m/2) Γ(1/2) (x² + Σz_i²)^{(m+1)/2}],  for m > 0.   (5.9)

The posterior Bayes and fiducial densities are given by (5.8) and (5.9) with (x, Σz_i², m) replaced by (x̂, Σx_i², n).
The results obtained in this case agree with Jeffreys' general specification for a model indexed by a parameter taking values in (0, ∞), for the law b(λ) = c is equivalent to the law b(σ) = σ^{-1}. This result also agrees with that obtained from Fisher's fiducial theory. Finally we note that the limiting information, as m → 0, Σz_i² → 0, in the density (5.9) is −∞; hence the definition I(f) ≡ −∞ is consistent.
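The equivalence of b(λ) = c with b(σ) = σ^{-1} is a one-line change of variables; the numerical sketch below (with six simulated "prior" observations of our own) confirms that the two routes give the same posterior for λ.

    import numpy as np

    rng = np.random.default_rng(1)
    zs = rng.normal(0.0, 2.0, size=6)
    s2 = np.sum(zs**2)
    m = len(zs)

    lam = np.linspace(-3.0, 4.0, 20001)

    # Route 1: uniform prior on lambda = log(sigma); posterior kernel from (5.8).
    post1 = np.exp(-m * lam - (s2 / 2.0) * np.exp(-2.0 * lam))
    post1 /= np.trapz(post1, lam)

    # Route 2: the law b(sigma) = 1/sigma, transformed to lambda
    # (the trailing factor sig is the Jacobian d(sigma)/d(lambda)).
    sig = np.exp(lam)
    post2 = sig**(-m - 1) * np.exp(-s2 / (2.0 * sig**2)) * sig
    post2 /= np.trapz(post2, lam)

    print(np.max(np.abs(post1 - post2)))   # zero up to rounding error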
5.3 Univariate Normal - Location and Scale Parameters Unknown

We assume the model density

    m_{μ,λ}(x) = (2π)^{-1/2} e^{−λ} exp{−(x−μ)² e^{−2λ}/2},    −∞ < x, μ, λ < ∞,   (5.10)

with λ = log σ, which is the usual two-parameter normal model. The choice of parameters here has been suggested by cases 5.1 and 5.2. For this parametrization we define the NCBD as

    b_m(μ,λ) = {2[(Σz_i² − m z̄²)/2]^{(m−1)/2} / Γ((m−1)/2)} e^{−(m−1)λ} exp{−e^{−2λ}(Σz_i² − m z̄²)/2}
               × (m/2π)^{1/2} e^{−λ} exp{−m e^{−2λ}(μ − z̄)²/2}.        (5.11)

The ratio b_m(μ₁,λ₁)/b_m(μ₂,λ₂) → 1 for null values of the parameters; hence define b₀(μ,λ) = c. Correspondingly we have f(x) = c and f₁(x) = 1/(2|x − z|). For m > 1 the fiducial density is given by

    f_m(x) = m^{1/2} Γ(m/2) [(m Σz_i² − (Σz_i)²)/2m]^{(m−1)/2}
             / {Γ((m−1)/2) (m+1)^{1/2} (2π)^{1/2} [((m+1)(x² + Σz_i²) − (x + Σz_i)²)/(2(m+1))]^{m/2}}.   (5.13)
Since b₀(μ,λ) = c and f(x) = c, the MBI and MFI principles are satisfied. The first factor of (5.11), which is the marginal density of λ, will be proper for m > 1; for m = 1 the marginal density of λ is improper. The second factor of (5.11), the conditional density of μ given λ, is proper for m > 0 and improper for m = 0. Applying the uniqueness argument of 5.2 to the first density and the uniqueness argument of 5.1 to the second density, we find that the above NBS is unique.
The posterior Bayes and fiducial densities for m > 1 are given by (5.11) and (5.13) with (m, Σz_i, Σz_i², x) replaced by (n, Σx_i, Σx_i², x̂). The fiducial density of X is again a generalized "t" density. The obtained prior law b(μ,λ) = c is equivalent to Jeffreys' law b(μ,σ) = σ^{-1}, and also agrees with the results obtained by Fisher employing his fiducial method. Thus the obtained NBS implies a solution to the Behrens-Fisher problem which would be consistent with that originally proposed by Behrens and Fisher and with that proposed by Jeffreys.
5.4 Bivariate Normal Model - Regression Parameter Unknown

In 5.2 we demonstrated a technique for finding the correct parametrization for the model. In the case of the dependent bivariate normal model the correct initial choice of association parameter is even more crucial to the analysis than the correct initial choice of scale parameter in 5.2, for if we were to choose the wrong parameter (yet one of the common association parameters) we would be unable to obtain a closed-form expression for the NCBD. For the one-parameter (dependence parameter) bivariate normal model, the choice of ρ, the correlation coefficient, as the dependence parameter leads to the consideration of an integral in ρ which does not appear to exist in closed form. To solve this problem it seems necessary to replace the correlation parameter by a regression parameter.
In the simplest case we have the model given by

    m_β(y|x) = (2π)^{-1/2} exp{−(y − βx)²/2},

where X is marginally standard normal. The joint likelihood of n observations is then proportional to

    exp{β Σx_iy_i − β² Σx_i²/2}.

We define the NCBD as

    b_m(β) = (Σu_i²/2π)^{1/2} exp{−(Σu_i²/2)(β − Σu_iv_i/Σu_i²)²},      (5.16)

for m > 0, Σu_i² > 0, −∞ < β < ∞, where u is to x as v is to y. The fiducial density f_m(y|x) is then given by

    f_m(y|x) = [2π(1 + x²/Σu_i²)]^{-1/2} exp{−(y − x Σu_iv_i/Σu_i²)² / [2(1 + x²/Σu_i²)]},   (5.17)

where X is marginally standard normal. In the usual manner we obtain b(β) = c = f(y|x) and infer uniqueness of the NBS. The usual substitutions in (5.16) and (5.17) yield the posterior Bayes and fiducial densities.
5.5 Bivariate Normal Model - Three Parameters Unknown

A somewhat more realistic and more frequently applicable model is

    m_{σ₂.₁,μ₂,β₂.₁}(x,y) = [σ₂.₁^{-1}(2π)^{-1/2} exp{−(y − μ₂ − β₂.₁x)²/2σ₂.₁²}] [(2π)^{-1/2} exp{−x²/2}].   (5.18)

The first factor is the conditional density of Y given X, and the second is the marginal density of X. We shall, of course, work with λ₂.₁ = log σ₂.₁, but shall henceforth express the model densities using the somewhat less cumbersome form employing σ₂.₁. The unknown parameters are μ₂, the mean of Y, σ₂.₁², the conditional variance of Y given X, and β₂.₁, the regression coefficient of Y on X; we are assuming that the mean and variance of X are known, and hence we may, without ambiguity, let σ = σ₂.₁, μ = μ₂, β = β₂.₁, λ = λ₂.₁. Written in terms of (μ, σ, β), the joint likelihood of n observations is proportional to

    σ^{-n} exp{−Σ_{i=1}^n (y_i − μ − βx_i)²/2σ²} exp{−Σ_{i=1}^n x_i²/2}.   (5.19)
The determination of the NCBD for (μ, λ, β) is straightforward, though tedious. We have, for m > 2,

    b_m(μ,λ,β) = [(m/2π)^{1/2} e^{−λ} exp{−(m e^{−2λ}/2)(μ − (v̄ − βū))²}]
                 × [(S₁²/2π)^{1/2} e^{−λ} exp{−(e^{−2λ}/2) S₁²(β − S₁₂/S₁²)²}]
                 × [C_m e^{−(m−2)λ} exp{−(e^{−2λ}/2)(S₂² − S₁₂²/S₁²)}],   (5.20)

where ū = Σu_i/m, v̄ = Σv_i/m, C_m is the appropriate normalizing constant, u is to x as v is to y, and

    S₁² = Σu_i² − (Σu_i)²/m,   S₂² = Σv_i² − (Σv_i)²/m,   S₁₂ = Σu_iv_i − (Σu_i)(Σv_i)/m.

The first factor is the conditional density of μ given β and λ, and is proper for m > 0. The second factor is the conditional density of β given λ, and is proper for m > 1 by virtue of the definitions of S₁² and Σu_i². The last factor is the marginal density of λ. Proceeding in the usual manner we are able to define b(μ,λ,β) = c and f(y|x) = c, and infer the uniqueness of the NBS. The usual substitutions in (5.20) yield the posterior Bayes density.
5.6 Bivariate Normal Model - Five Parameters Unknown

No new techniques are needed to handle this case. The model density, assuming that the mean and variance of X are unknown, is

    m(x,y) = [σ₂.₁^{-1}(2π)^{-1/2} exp{−(y − μ₂ − β₂.₁x)²/2σ₂.₁²}] [σ₁^{-1}(2π)^{-1/2} exp{−(x − μ₁)²/2σ₁²}].   (5.21)

The NCBD is then defined, for m > 2, as a product of five factors,

    b_m(μ₁,μ₂,λ₁,λ₂.₁,β₂.₁) = b(μ₂ | β₂.₁, λ₂.₁) b(μ₁ | λ₁) b(λ₁) b(λ₂.₁) b(β₂.₁ | λ₂.₁),   (5.22)

each factor having the form of the corresponding factor in (5.11) and (5.20), where

    S₁² = Σu_i² − (Σu_i)²/m,   S₂² = Σv_i² − (Σv_i)²/m,   S₁₂ = Σu_iv_i − (Σu_i)(Σv_i)/m,

and u is to x as v is to y. The first factor of (5.22) is the conditional density of μ₂ given β₂.₁ and λ₂.₁; the second factor is the conditional density of μ₁ given λ₁; the third factor is the marginal density of λ₁; the fourth factor is the marginal density of λ₂.₁; the fifth factor is the conditional density of β₂.₁ given λ₂.₁. It is easily seen that b(μ₁,μ₂,λ₁,λ₂.₁,β₂.₁) = c and f(x,y) = c, and we may infer the uniqueness of the NBS as in previous cases. The usual substitutions in (5.22) yield the posterior Bayes density for m > 2. We have assumed m₀ = 2, which seems reasonable for this model. Uniqueness may be demonstrated as in previous examples.
From the above analysis it is apparent that any number of regression parameters might be added and we would still be able to obtain a NBS. Also we note that the prior Bayes density obtained for each of the parameters of this model is consistent with the prior Bayes density for that parameter in a simpler model. This does not mean that the two marginal densities are the same; indeed they are not. What is meant is the following: if we obtain a prior density of μ in the one-parameter model, it will be identical with the prior conditional density of μ given λ in the two-parameter model.

In the case of the location parameter μ and scale parameter λ, we find (see 5.8 and the first factor of 5.11) that the difference in the posterior Bayes densities of λ may be described as a difference of "one degree of freedom". Similarly, in (5.20) we see that there has been a loss of a second "degree of freedom" as the result of the introduction of the regression parameter β. Following through the details of the derivation of the NCBD, it is readily seen that there will be a loss of one degree of freedom for each regression parameter introduced.
5.7 A Log-Normal Model

In order to exhibit one result of an allowable transformation on the space of the random variable, we shall consider a simple case of an important model which is obtained by transformation from the location parameter normal model. We assume the model density

    m_μ(x) = x^{-1}(2π)^{-1/2} exp{−(log x − μ)²/2},    for x > 0, −∞ < μ < ∞.

The joint likelihood of n observations is

    m_μ(x) = (Π x_i)^{-1}(2π)^{-n/2} exp{−(1/2) Σ_{i=1}^n (log x_i − μ)²}.

We define the NCBD as

    b_m(μ) = (m/2π)^{1/2} exp{−(m/2)(μ − Σ log z_i/m)²},    for m > 0.   (5.24)

The fiducial density is then

    f_m(x) = x^{-1}[m/(2π(m+1))]^{1/2} exp{−[m/(2(m+1))](log x − Σ log z_i/m)²}.   (5.25)

For null values m = 0, Σ log z_i = 0, we have b(μ) = c, f(x) = x^{-1}. For m > 0, both (5.24) and (5.25) are proper densities. The uniqueness of this NBS then follows as in 5.2. If X has the above log-normal density, then Y = log X has the normal density with mean μ. The NBS is, in this case, invariant under the transformation on X.
CHAPTER VI

POISSON MODELS

The Poisson density arises as a model density in two distinct contexts. In some cases the context is that of simple random sampling, while in others it is in terms of the observation of a continuous Poisson process. We shall begin with a model in the former context but shall extend our results to cover more general problems.
6.1 Random Poisson Sampling

We have the model density

    m_λ(x) = e^{−λ} λ^x / x!,    λ > 0, x = 0, 1, 2, ....             (6.1)

The joint likelihood of n observations is then

    e^{−nλ} λ^{Σx_i} / Π x_i!.                                          (6.2)

We define the NCBD to be the gamma density

    b_m(λ) = m^{1+z} λ^z e^{−mλ} / Γ(1+z),    for m > 0, z > 0, z = Σz_i, λ > 0.   (6.3)

The fiducial density is

    f_m(x) = m^{1+z} Γ(x+z+1) / [(m+1)^{x+z+1} Γ(x+1) Γ(z+1)],    for x = 0, 1, 2, ...,   (6.4)

which is, for suitable values of (x,z), a negative binomial density. The quantities m and z are interpreted as the prior sample size and prior number of occurrences of the defining event. We have extended the domain of the NCBD to include non-integral m and z. The ratio b_m(λ₁)/b_m(λ₂) is unity for m = z = 0; hence define b(λ) = c. The corresponding fiducial density is f(x) = c. Now (6.3) and (6.4) will be proper for any m > 0; hence, postulating the value m₀ = 0, we see that the MNS principle is satisfied. Since the prior Bayes and fiducial densities are uniform, the MBI and MFI principles are satisfied. Hence we have a NBS, which is unique by virtue of the moment theorem argument given in 4.6. The posterior Bayes and fiducial densities are given by (6.3) and (6.4) with (m, z, x) replaced by (n, Σx_i, x̂).
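The negative binomial form (6.4) can be confirmed by simulation; the prior values (m, z) in the sketch below are illustrative choices of our own.

    import numpy as np
    from math import lgamma, exp, log

    rng = np.random.default_rng(2)
    m, z = 3.0, 5.0

    # Fiducial probability (6.4) for the next count x:
    def fiducial(x, m, z):
        return exp((1 + z) * log(m) + lgamma(x + z + 1)
                   - (x + z + 1) * log(m + 1) - lgamma(x + 1) - lgamma(z + 1))

    # Simulate: lambda from the gamma density (6.3), then x from Poisson(lambda).
    lam = rng.gamma(z + 1, 1.0 / m, size=1_000_000)
    xs = rng.poisson(lam)
    for x in range(4):
        print(x, fiducial(x, m, z), np.mean(xs == x))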
6.2 Poisson Process Model

We may generalize the above model somewhat, in the manner of Raiffa and Schlaifer. Suppose we were observing a Poisson process, say Poisson arrivals at a queue, and suppose we observed the process during random time intervals of variable length. It would be convenient to have a simple method of combining these data with data in a prior Bayes density. To accomplish this we define the model density as

    m_λ(x) = e^{−λt} (λt)^x / x!,                                       (6.5)

where x is the total number of occurrences of the defining event (arrivals) and t is the total time of observation. The NCBD is then defined as

    b_T(λ) = T^{z+1} λ^z e^{−λT} / Γ(z+1),    λ > 0, T > 0, z > 0.      (6.6)

The fiducial density is then

    f(x) = T^{z+1} t^x Γ(x+z+1) / [Γ(z+1) Γ(x+1) (T+t)^{x+z+1}].        (6.7)

The quantities (T, z) are thought of as the total time of prior observation and total number of prior occurrences of the defining event. The ratio

    b_T(λ₁)/b_T(λ₂) = (λ₁/λ₂)^z e^{−T(λ₁−λ₂)}

is unity for z = T = 0; hence we define b(λ) = c and obtain f(x) = c. The proof that this is the unique NBS is as in 6.1. The posterior Bayes and fiducial densities are given by (6.6) and (6.7) with (T, z, t, x) replaced by (t, x, t̂, x̂), where t̂ and x̂ are the total time of the next observation and the total number of new occurrences.
6.3 Discussion of the Indifference Rule for the Poisson Model

The result obtained here for the prior Bayes density differs from that proposed by Jeffreys, who would take b(λ) = λ^{-1}. Employing Jeffreys' rule, the posterior Bayes density after n observations is

    b_n(λ) = n^x λ^{x−1} e^{−nλ} / Γ(x),    x = Σx_i.                   (6.8)

For x = 0 this density is improper and (Shannon) uninformative, regardless of how large n might be. Under the model assumptions made here, i.e. that the underlying process is definitely Poisson with λ strictly greater than zero, and having observed a "great many" observations, or having observed a Poisson process for a "long period of time", and having observed no occurrences, it would seem that a small probability should be assignable to the set λ > λ₀ (say) for an appropriately chosen λ₀ depending on n. Jeffreys' rule does not permit this, and hence we must conclude that Jeffreys' rule is inappropriate for the problem as we have formulated it.

With the model (6.5) and the prior Bayes density b(λ) = λ^{-1}, the posterior Bayes density will be b₁(λ) = t e^{−λt} if x = 1. Since lim_{t→0} e^{−λt} = 1, we see that the Bayes density b(λ) = c is obtained as a posterior density when the prior Bayes density is b(λ) = λ^{-1} and the defining event occurs instantaneously after we have begun to study the process. Also, given b(λ) = λ^{-1} a priori and x = 1, the posterior Bayes density is minimally informative (that is, uniform) only if t = 0. It would appear, then, that the Jeffreys law may be appropriate when we have not yet established that λ > 0, while the rule given in this paper is appropriate under the model assumption that the process is definitely Poisson with λ > 0. This idea is further developed in 7.4.

The negative exponential density is often derived as a waiting-time distribution from a Poisson process. In this context the experimental density is determined by conditional stopping and hence should not be taken as the informational model; rather, the prior distribution derived above should be employed. When the negative exponential density is the informational density, the analysis should be as given in Chapter IX.
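The contrast drawn here can be made concrete; in the sketch below the observation time t and the level λ₀ are values of our own choosing.

    import numpy as np

    # Observe a Poisson process for time t and see x = 0 occurrences.
    # Under b(lambda) = c the posterior is t * exp(-lambda * t), so that
    # P(lambda > lambda0) = exp(-lambda0 * t): small after long observation.
    t, lam0 = 100.0, 0.1
    print(np.exp(-lam0 * t))   # about 4.5e-05

    # Under Jeffreys' law the posterior kernel with x = 0 is
    # exp(-lambda * t) / lambda, which is not integrable at zero:
    for lo in (1e-4, 1e-8, 1e-12):
        lam = np.logspace(np.log10(lo), 0.0, 200001)
        print(lo, np.trapz(np.exp(-lam * t) / lam, lam))
    # the integral grows roughly like log(1/lo): no probability statement
    # of the form P(lambda > lambda0) can be made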
CHAPTER VII

MULTINOMIAL MODELS

7.1 Binomial Models

In this chapter we deal first with the example which was the initial subject of Bayes' postulate. At least three prior Bayes densities have been proposed and remain seriously considered for the binomial proportion. We assume the model density

    m_p(x) = [n!/(x!(n−x)!)] p^x (1−p)^{n−x},    x = 0, 1, ..., n;  n = 1, 2, ....   (7.1)

The statement of the model density in this form is analogous to the second treatment of the Poisson model. In each case we have written the model in terms of a sufficient statistic with respect to the more basic underlying process, and in each case it is this sufficient statistic which is usually of interest in applications.
Proceeding in this framework, we define the NCBD to be a beta density of the first kind given by

    b_m(p) = p^z (1−p)^{m−z} / β(z+1, m−z+1),    for 0 < p < 1, m > z > 0,   (7.2)

with expectation

    E(p) = (z+1)/(m+2).                                                 (7.3)

The fiducial density is then

    f_m(x) = ∫₀¹ [n!/(x!(n−x)!)] p^{x+z} (1−p)^{n−x+m−z} dp / β(z+1, m−z+1)
           = Γ(n+1) Γ(m+2) Γ(x+z+1) Γ(n+m−z−x+1) / [Γ(x+1) Γ(n−x+1) Γ(z+1) Γ(m−z+1) Γ(m+n+2)],   (7.4)

which Raiffa and Schlaifer have called the beta-binomial density.

Applying the MPD principle we take m = z = 0, which yields the prior Bayes density b(p) = 1, i.e. uniform and hence minimally informative. The prior fiducial density is then f(x) = (n+1)^{-1}, again uniform and minimally informative. Now for any fixed n it would be possible to reparametrize in a manner such that the MPD principle would imply a MBI density and MFI density. The only restriction would be that the new parameter be such that a uniform density on it was equivalent to a density on p which had its first n moments equal to the first n moments of the uniform density.
Consider the following equations, defined for n = 2 and an allowable parameter θ with inverse transformation π(θ):

    1/3 = ∫ [2!/(x!(2−x)!)] π(θ)^x [1 − π(θ)]^{2−x} b(θ) dθ,    for x = 0, 1, 2.

If θ is uniform, then

    1/3 = 1 − 2E(p) + E(p²)    for x = 0,
    1/3 = 2[E(p) − E(p²)]      for x = 1,
    1/3 = E(p²)                for x = 2.

Hence E(p) = 1/2, E(p²) = 1/3, and in general the restriction on moments is as given above.
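A direct check of these three equations under b(p) = 1 (a sketch, using numerical quadrature) is as follows.

    from math import comb
    from scipy.integrate import quad

    # With b(p) = 1 and n = 2 the fiducial probabilities are all 1/3,
    # reflecting E(p) = 1/2 and E(p^2) = 1/3:
    for x in range(3):
        val, _ = quad(lambda p, x=x: comb(2, x) * p**x * (1 - p)**(2 - x), 0.0, 1.0)
        print(x, val)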
The densities (7.2) and (7.4) will be more than minimally informative for m > 0; hence the prior law b(p) = 1 defines a NBS, but not uniquely. Under this specification the posterior Bayes and fiducial densities are given by (7.2) and (7.4). For n̂ = n+1 and x = n we have

    f_n(n̂) = Γ(n+2) Γ(n+2) Γ(2n+2) Γ(1) / [Γ(n+2) Γ(1) Γ(n+1) Γ(1) Γ(2n+3)] = (n+1)/(2n+2) = 1/2.   (7.5)

This states that, following n successes in n trials, the fiducial probability that the next n+1 trials will also be successes is exactly 1/2, independently of n. This statement has been termed the law of succession and has received almost universal rejection as a reasonable assessment of the probability that might be proper in this case (Jeffreys, 1961). It is claimed that this value is too low and that, having observed n successes in n trials, for large n the correct probability of the next n+1 being successes should be very close, but not equal, to one, and that this probability should approach one as n → ∞.
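The evaluation (7.5) is reproduced by the short computation below.

    from math import lgamma, exp

    # Fiducial probability, from (7.4) with (m, z) = (n, n) and the next
    # n+1 trials all successes (the common Gamma factors cancel):
    def succession(n):
        return exp(lgamma(n + 2) + lgamma(2 * n + 2)
                   - lgamma(n + 1) - lgamma(2 * n + 3))

    for n in (1, 5, 50, 500):
        print(n, succession(n))   # exactly 1/2, independently of n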
The two other prior Bayes densities which have been proposed and remain seriously considered give substantially different values for this probability for all n (thus dispelling the notion that the choice of the prior density is unimportant when n is large). The first of these, due to Haldane, is b(θ) = c where θ = log[p/(1−p)]. This corresponds to a density for p given by (7.2) with m = 2z = −2. This law has intuitive appeal in that it represents the limit of the class of beta densities, just as the uniform was the limit of the class of proper normal densities. Haldane's law assigns probability one, independently of n, for the above event. This is, of course, far too high a value, particularly for small values of n. The third law (Perks, 1947) is b(λ) = c, where λ = sin^{-1}√p, which corresponds to (7.2) with m = 2z = −1, and which gives intermediate values for the above event. Each of these parameters could define a NBS for m₀ = 1.
The problem of the unique specification of the NBS and the problem of the law of succession require careful consideration. At this point we shall present an argument suggesting that b(p) = 1 yields the only satisfactory NBS. Later (Section 7.4) we shall present another argument which will suggest the correctness of the evaluation f_n(n̂) = 1/2 for n̂ = n+1 and x = n. Finally we shall present another argument which shows that the law b(p) = 1 is the only one which is consistent with the result obtained for the Poisson model.

The density (7.1) differs from the normal examples and the initial treatment of the Poisson model in that it represents an entire family of densities dependent upon the index, n, of the model. In this sense it is related to the second Poisson model, which also represents an entire family of densities dependent upon the index t. It certainly seems requisite that the choice of prior density for p be independent of the particular binomial model (index n) considered. The requirement, then, is that the prior Bayes density have all of its moments equal to the moments of the uniform density. Since the moment theorem is clearly applicable, the prior law b(p) = 1 is uniquely specified. This admittedly is a somewhat ad hoc argument though, we feel, a compelling one. The class of multinomial models is the only class of models considered requiring any such argument. More generally, if the class of likelihood functions is not complete, as in this case, it may be necessary to enlarge the class in order to make it complete.
7.2 Trinomial Model

To clarify the questions arising from the binomial problem, it will be necessary to study a more general model of which the following is a second special case. We assume the model density

    m_{p₁,p₂}(x₁,x₂) = [n!/(x₁! x₂! (n−x₁−x₂)!)] p₁^{x₁} p₂^{x₂} (1−p₁−p₂)^{n−x₁−x₂},    for x₁ + x₂ ≤ n.   (7.6)

The transformation p₂ = θ₂, p₁ = θ₁(1−θ₂), with Jacobian (1−θ₂), yields the bivariate beta density

    b_m(p) = p₁^{z₁} p₂^{z₂} (1−p₁−p₂)^{m−z₁−z₂} / [β(z₁+1, m−z₁−z₂+1) β(z₂+1, m−z₂+2)].   (7.7)

Thus, for m = z₁ = z₂ = 0, the joint density of (p₁,p₂) is uniform, and hence the MBI principle is satisfied. A similar derivation yields the fiducial density

    f_m(x₁,x₂) = Γ(x₁+z₁+1) Γ(x₂+z₂+1) Γ(n−x₁−x₂+m−z₁−z₂+1) Γ(n+1) Γ(m+3)
                 / [Γ(x₁+1) Γ(x₂+1) Γ(n−x₁−x₂+1) Γ(z₁+1) Γ(z₂+1) Γ(m−z₁−z₂+1) Γ(n+m+3)].   (7.9)

For m = z₁ = z₂ = 0 we have f(x₁,x₂) = 2/[(n+1)(n+2)], i.e. constant on the triangular lattice x₁ + x₂ ≤ n, and hence the MFI principle is satisfied. Assuming the value m₀ = 0, the MNS principle is satisfied by noting that (7.7) and (7.9) will be more than minimally informative for m > 0. The uniqueness of this NBS may be demonstrated following the requirement and method of 7.1. Under the NBS specification the posterior Bayes and fiducial densities are given by (7.7) and (7.9) with (m, z₁, z₂; x₁, x₂) replaced by (n, x₁, x₂; x̂₁, x̂₂).
The marginal Bayes density of p_i is

    b_m(p_i) = p_i^{z_i} (1−p_i)^{m−z_i+1} / β(z_i+1, m−z_i+2),    0 ≤ p_i ≤ 1,  i = 1, 2,   (7.12)

which, for m = z_i = 0, reduces to b(p_i) = 2(1−p_i). The fiducial density of X_i corresponding to (7.12) is

    f(x_i) = 2(n − x_i + 1)/[(n+1)(n+2)] = [2/(n+2)][1 − x_i/(n+1)].    (7.13)

The results (7.12) and (7.13) differ from those obtained under the assumption of a binomial model for X_i, and will be found to be different from those obtained from higher-order multinomial models.
Referring these models to the popular examples involving balls in an urn, it seems to us that our prior density for p_i should be different for different assumptions as to the nomiality (i.e. the number of categories) of the model. If we assume a binomial model, i.e. assume that there are exactly two types of balls in the urn, then our prior Bayes density should certainly be such as to assign more probability to higher values of p_i than if our model were such as to assume that there were 100, say, different kinds of balls in the urn. Indeed, assuming a model of very high nomiality, we would need to assign a prior Bayes density to p_i which concentrated most of the probability near zero if we were to satisfy any of the symmetry ideas of Bayes' postulate. We shall see that by stipulating that the prior Bayes density be uniform over the joint set of all parameters, as we have for our MBI principle, the marginal prior density of any p_i will behave properly as the nomiality of the model increases.
7.3 The General Multinomial Model

Employing generalizations of the transformation employed in 7.2, it is possible to obtain a NBS for a general multinomial model. In this section we shall omit consideration of the normalizing constants and shall present only the kernels of the various Bayes densities. This will prove sufficient to establish the existence of the NBS and for our general discussion in 7.4. The model density is

    m_p(x) ∝ p₁^{x₁} ··· p_{s−1}^{x_{s−1}} (1 − Σ_{i=1}^{s−1} p_i)^{n − Σ_{i=1}^{s−1} x_i},   (7.14)

where s is the nomiality of the model. The NCBD, b_m(p₁,...,p_{s−1}), is then proportional to

    p₁^{z₁} ··· p_{s−1}^{z_{s−1}} (1 − Σ_{i=1}^{s−1} p_i)^{m − Σ_{i=1}^{s−1} z_i}.            (7.15)

For m = z₁ = ··· = z_{s−1} = 0 we have

    b(p₁,...,p_{s−1}) ∝ 1,    for Σ_{i=1}^{s−1} p_i < 1,                (7.16)

and

    f(x₁,...,x_{s−1}) ∝ 1,    for Σ_{i=1}^{s−1} x_i ≤ n.                (7.17)

Corresponding to (7.16), the marginal Bayes density of p_i is

    b(p_i) = (s−1)(1−p_i)^{s−2},    i = 1, 2, ..., s−1,                 (7.18)

and the expectation of p_i corresponding to (7.15) is

    E(p_i) = (z_i+1)/(m+s),    i = 1, 2, ..., s−1.                      (7.19)

For m = z_i = 0 we have

    E(p_i) = s^{-1}.                                                    (7.20)

The behavior of the marginal density (7.18) and the expectation (7.20) seems eminently reasonable in view of the differing assumptions of the models of differing nomiality.
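The behavior of (7.18) and (7.20) under increasing nomiality is easily simulated; the values of s below are ours.

    import numpy as np

    rng = np.random.default_rng(3)

    # A uniform density on (p_1, ..., p_{s-1}) is the Dirichlet(1, ..., 1) law;
    # its marginal for any p_i is (s-1)(1-p)^(s-2), as in (7.18), with mean 1/s.
    for s in (2, 3, 10, 100):
        p1 = rng.dirichlet(np.ones(s), size=200_000)[:, 0]
        print(s, p1.mean(), 1.0 / s)   # simulated mean of p_1 versus (7.20)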
7.4 Validation and Semantic Interpretation of the Multinomial Models

We now propose a heuristic explanation of the correctness of the law b(p) = 1 in the binomial case, of the result obtained from it for the law of succession, and of the fact that we do not approach the limit of the class of beta densities to obtain the law for our indifference rule. We have continually emphasized the fact that the indifference procedure suggested in this paper is applicable only in a parametric model formulation. Part of this parametric formulation involves the definition of a class of model densities, which definition is based on prior information. Thus the model density, that is, the conditional density of X given θ, and the prior Bayes density are both carriers of our prior knowledge.

The assumption of a binomial model contains certain information, namely that there are exactly two types of balls, say, in an urn. One way of having determined that there are two types is to have previously observed one of each type. Thus the prior law b(p) = 1 for the binomial model may be thought of as expressing knowledge equivalent to that which would be obtained from having observed just one of each type, and having these observations used to define the binomiality of the model. Conditional on having observed at least one of each type, the minimum information is present if just exactly one of each kind has been observed. Hence the assumption of the model (7.1) and the prior density b(p) = 1 may be considered to be equivalent to the assumption of having started with an urn about whose contents we knew nothing at all and then having observed just two balls, each of a different type. The interpretation here is analogous to that of 6.3. Formulas (7.3), (7.18) and (7.19) are consistent with and suggestive of this interpretation. Considering (7.19), if the first s observations have defined the s categories of the multinomial model, there would have been just one in the i-th category. Hence, with m additional observations, the numerator of (7.19) would be the total number of type i observed and the denominator would be the total number observed in all categories.

Perhaps, then, an explanation for the fact that the prior beta density furnishing the NBS for the binomial model is not the limiting density of the beta class is that the model has incorporated information obtained from two observations. In general the marginal density of p_i, as given by (7.18), is dependent upon the nomiality of the model, or, by this interpretation, upon the number of prior observations used to define the model. The extension of this idea to the general multinomial model is mirrored in the beta density (7.18).
This heuristic interpretation of the NBS for the multinomial model permits a reevaluation of the statement (7.5). It had been said that, after observing n successes in n Bernoulli trials and having no other data, the probability of the next n+1 being successes would be 1/2, independent of n. If we accept the fact that the assumption of binomiality is equivalent to the assumption of the availability of one observation of each type, we may restate the above evaluation as follows:

    Given an urn about which we know nothing a priori, and the fact
    that in n+2 trials we have observed n+1 reds and one black, the
    probability that the next n+1 draws will all be reds is 1/2.

Perhaps this last statement may be accepted as a reasonable evaluation.
One final argument will be presented to support the prior law b(p) = 1 and to show its consistency with the prior law b(λ) = c obtained for the Poisson model. This argument is independent of the heuristic considerations of the preceding argument. We have seen that, when considering a simple Bernoulli model, i.e. a binomial model with index one, any prior law with expectation 1/2 would furnish a NBS. We then argued that this process was simply a special case of the more general binomial model with arbitrary index n, and that any prior density selected should furnish a NBS for an arbitrary index. Carrying this argument to its limit, we write

    m_p(x) = [n!/(x!(n−x)!)] p^x (1−p)^{n−x} = [n!/(x!(n−x)!)] (λ/n)^x (1 − λ/n)^{n−x},

where λ = np. We then have the usual Poisson limit of the binomial (as n → ∞),

    m_λ(x) = e^{−λ} λ^x / x!.

Clearly, then, the prior laws b(λ) = c and b(p) = 1, both uniform, are consistent. Each indeed logically requires the other.
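The limiting computation may be illustrated numerically (the values of λ, x and n below are ours):

    from math import comb, exp, factorial

    # Binomial(n, p = lam/n) tends to the Poisson(lam) density as n grows:
    lam, x = 2.0, 3
    for n in (10, 100, 10000):
        p = lam / n
        print(n, comb(n, x) * p**x * (1 - p)**(n - x))
    print("Poisson limit:", exp(-lam) * lam**x / factorial(x))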
CHAPTER VIII

UNIFORM AND RELATED MODELS

The preceding three chapters have dealt with three of the most frequently used classes of statistical models. With this chapter we begin the study of models of somewhat lesser frequency of application. However, from a theoretical point of view it is important to see that the principles developed and applied to the most basic models are also applicable to some other models. In this chapter we shall consider some uniform models and one related model, and in the following chapter we shall consider some negative exponential and related models.
8.1 Uniform Model - Scale Parameter Unknown

We assume that the random variable X is uniformly distributed on the interval (0, e^μ), for −∞ < μ < ∞. The model density is then

    m_μ(x) = e^{−μ} U(e^μ − x) U(x),                                   (8.1)

where U(y) is the Heaviside unit function defined by

    U(y) = 1  for y > 0,    U(y) = 0  for y ≤ 0.

We have chosen this unusual parametrization as it is the one necessary to obtain a NBS. The joint likelihood of n observations is

    e^{−nμ} U(e^μ − x*) U(x_*),                                        (8.2)

where x* = max x_i and x_* = min x_i. We then define the NCBD as

    b_m(μ) = m z*^m e^{−mμ} U(e^μ − z*),    for m, z* > 0.             (8.3)

Thus, for m > 0, the fiducial density of X is

    f_m(x) = [m/(m+1)] z*^m [max(x, z*)]^{−(m+1)} U(x),                (8.4)

where max(x, z*) denotes the larger of x and z*. We interpret m to be the number of prior observations and z* to be the maximum of the prior observations. Applying the MPD principle, we define b(μ) to be uniform on (−∞, ∞). The corresponding fiducial density is f(x) = x^{-1}. Now both (8.3) and (8.4) will be proper for m > 0. The completion of the proof that this constitutes a NBS is as in 5.2. Under the NBS the posterior Bayes and fiducial densities are given by (8.3) and (8.4) with (m, z*, x) replaced by (n, x*, x̂). The posterior Bayes density is of exponential form on (log x*, ∞), with increasing probability being concentrated near log x* as n → ∞. The posterior fiducial density is uniform on (0, x*) and proportional to x̂^{−(n+1)} for x̂ > x*. As n → ∞ the amount of probability on (x*, ∞) approaches zero. These forms seem very attractive intuitively.
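The forms of (8.3) and (8.4) can be checked directly; the sample in the sketch below is simulated with μ = 1 (a choice of our own).

    import numpy as np

    rng = np.random.default_rng(4)
    n = 20
    xs = rng.uniform(0.0, np.exp(1.0), size=n)   # true spectrum (0, e), i.e. mu = 1
    x_max = xs.max()

    # Posterior Bayes density (8.3): n * x_max^n * exp(-n*mu) on mu > log(x_max).
    mu = np.linspace(np.log(x_max), np.log(x_max) + 2.0, 10001)
    post = n * x_max**n * np.exp(-n * mu)
    print(np.trapz(post, mu))   # approximately 1: proper, peaked at log(x_max)

    # Fiducial density (8.4): uniform on (0, x_max), then x^-(n+1) beyond;
    # the probability of exceeding the prior maximum is 1/(n+1).
    print(1.0 / (n + 1))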
8.2 Uniform Model - Location parameter Unknown
We assume the model density

    m_\theta(x) = e^{-\mu}\, U(x - \theta)\, U(\theta + e^\mu - x),   (8.5)

where \mu is considered known. The joint likelihood of n observations is

    \ell_\theta(x_1, \dots, x_n) = e^{-n\mu}\, U(x_* - \theta)\, U(\theta + e^\mu - x^*).   (8.6)

We define the NCBD by

    b_m(\theta) = \big[e^\mu - (z^* - z_*)\big]^{-1}\, U(z_* - \theta)\, U(\theta + e^\mu - z^*),   (8.7)

for m > 0, i.e. uniform on (z^* - e^\mu, z_*). The fiducial density is

    f_m(x) = \frac{e^{-\mu}}{e^\mu - (z^* - z_*)} \begin{cases} x - z^* + e^\mu, & z^* - e^\mu < x \le z_* \\ e^\mu - (z^* - z_*), & z_* < x \le z^* \\ z_* + e^\mu - x, & z^* < x < z_* + e^\mu \\ 0, & \text{otherwise,} \end{cases}   (8.8)

i.e. uniform on (z_*, z^*), increasing on (z^* - e^\mu, z_*) and decreasing on (z^*, z_* + e^\mu). We interpret m and z as in case 1. Null values of (m, z_*, z^*) are (0, \infty, -\infty), hence define b(\theta) = f(x) = c. For m > 0, (8.7) and (8.8) are proper densities; hence, postulating m_0 = 0, we have a NBS. Uniqueness follows as in 5.1. The posterior Bayes and fiducial densities are given by (8.7) and (8.8) with (m, z_*, z^*, x) replaced by (n, x_*, x^*, \tilde{x}).
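A small simulation makes the trapezoidal shape of (8.8) concrete. The sketch below (our illustration, taking e^\mu = 1 and hypothetical prior extremes z_* = 0.4, z^* = 0.8) draws \theta from (8.7) and then X from the model:

```python
import random

z_lo, z_hi = 0.4, 0.8        # z_* and z^*: extremes of the prior observations
e_mu = 1.0                   # known range e^mu of the uniform model
N = 200_000
inside = 0
for _ in range(N):
    theta = random.uniform(z_hi - e_mu, z_lo)   # NCBD (8.7): uniform prior for theta
    x = random.uniform(theta, theta + e_mu)     # model (8.5): X uniform on (theta, theta + e^mu)
    inside += z_lo < x < z_hi
# on the flat part of the trapezoid the density is e^(-mu), so the mass on
# (z_*, z^*) should be about e^(-mu) * (z^* - z_*) = 0.4
print(inside / N)
```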
8.3 Uniform Model - Location and Scale Parameters Unknown
We assume the model density

    m_{\theta,\mu}(x) = e^{-\mu}\, U(x - \theta)\, U(\theta + e^\mu - x).   (8.9)

The joint likelihood of n observations is

    \ell_{\theta,\mu}(x_1, \dots, x_n) = e^{-n\mu}\, U(x_* - \theta)\, U(\theta + e^\mu - x^*).   (8.10)

We define the NCBD to be

    b_m(\mu, \theta) = m(m-1)\, (z^* - z_*)^{m-1}\, e^{-m\mu}\, U(z_* - \theta)\, U(\theta + e^\mu - z^*),   (8.11)

for m > 1. The second factor is a kernel of the conditional density of \theta given \mu, and the first factor is a kernel of the marginal density of \mu. The fiducial density of X is defined by

    f_m(x) = \frac{m-1}{m+1}\, (z^* - z_*)^{m-1}\, \big( [x, z^*] - [x, z_*]_* \big)^{-m},   (8.12)

where [x, z^*] indicates the maximum of x and z^*, and [x, z_*]_* the minimum of x and z_*. Following the methods of 8.1 and 8.2 we define b(\mu) = c and b(\theta | \mu) = c, so that b(\theta, \mu) = c, and f(x) = |x - z|^{-1} (note that for m = 1 we have z_* = z^* = z). The proof of the uniqueness of this NBS follows as in 5.3. The posterior Bayes and fiducial densities are given by (8.11) and (8.12) with (m, z_*, z^*, x) replaced by (n, x_*, x^*, \tilde{x}). It is interesting to note that the NBS involves a rather unusual parameterization.
8.4 A Positive Exponential Model
We assume the model density

    m_\mu(x) = (1+p)\, x^p\, e^{-(1+p)\mu}\, U(x)\, U(e^\mu - x), \quad p \ge 0.   (8.13)

The interesting feature of this model is that the density function is an increasing function of x throughout the spectrum of X, in contrast to previous models. For p = 0, (8.13) reduces to (8.1). The joint likelihood of n observations is

    \ell_\mu(x_1, \dots, x_n) = (1+p)^n \Big(\prod x_i\Big)^p\, e^{-n(1+p)\mu}\, U(x_*)\, U(e^\mu - x^*).   (8.14)

We define the NCBD by

    b_m(\mu) = \frac{e^{-m(1+p)\mu}\, U(e^\mu - z^*)}{\int e^{-m(1+p)\mu}\, U(e^\mu - z^*)\, d\mu} = m(1+p)\, z^{*m(1+p)}\, e^{-m(1+p)\mu}\, U(e^\mu - z^*),   (8.15)

for m > 0, z^* > 0. The fiducial density is given by

    f_m(x) = \frac{m(1+p)}{m+1}\, z^{*m(1+p)}\, x^p\, [x, z^*]^{-(m+1)(1+p)}\, U(x),   (8.16)

for m > 0, z^* > 0. For m = z^* = 0 we have b(\mu) = c, f(x) = x^{-1}. The proof that this is a unique NBS follows as in previous cases. The posterior Bayes and fiducial densities are given by (8.15) and (8.16) with (m, z^*, x) replaced by (n, x^*, \tilde{x}). The posterior Bayes density has an exponential form on (\log x^*, \infty). The posterior fiducial density has an interesting form, being an increasing function of x on (0, x^*) and a decreasing function of x on (x^*, \infty).
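As a check on (8.16), the two pieces of its mass can again be computed in closed form. The sketch below (our illustration, taking z^* = 1 so the z^* powers drop out) confirms that they sum to one:

```python
m, p = 2.0, 1.5                  # arbitrary index and power; z* = 1
c = m * (1 + p) / (m + 1)        # leading constant of (8.16) with z* = 1
mass_below = c / (p + 1)         # integral of c x^p on (0, 1): the increasing branch
mass_above = c / (m * (1 + p))   # integral of c x^(-1-m(1+p)) on (1, oo): the tail
print(mass_below, mass_above, mass_below + mass_above)  # m/(m+1), 1/(m+1), 1.0
```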
CHAPTER IX
NEGATIVE EXPONENTIAL AND RELATED MODELS
In this chapter we derive the NBS for three negative exponential models and for the Pareto and Weibull models.
It is necessary
to repeat here that we are assuming that the given experimental model
is identical to the informational model.
9.1 Negative Exponential Model - Scale Parameter Unknown
We assume the model density

    m_\lambda(x) = e^{\lambda - x e^\lambda}\, U(x).   (9.1)

The joint likelihood of n observations is

    \ell_\lambda(x_1, \dots, x_n) = e^{n\lambda - e^\lambda \Sigma x_i}\, U(x_*).   (9.2)

We then define the NCBD to be

    b_m(\lambda) = \frac{(\Sigma z_i)^m\, e^{m\lambda - e^\lambda \Sigma z_i}}{\Gamma(m)},   (9.3)

for m > 0, \Sigma z_i > 0. The fiducial density is then the "t" density

    f_m(x) = \frac{m\, (\Sigma z_i)^m}{(x + \Sigma z_i)^{m+1}}\, U(x),   (9.4)

for m > 0, \Sigma z_i > 0. For m = \Sigma z_i = 0 the ratio b_m(\lambda_1)/b_m(\lambda_2) is unity; we see then that the Bayes and fiducial densities are proper for m > 0, \Sigma z_i > 0 and improper for m = 0, \Sigma z_i = 0. Hence we define b(\lambda) = c and obtain f(x) = x^{-1}. The completion of the proof that this constitutes a unique NBS follows as in 5.2. Under the specification of the NBS the posterior Bayes and fiducial densities are given by (9.3) and (9.4) with (m, \Sigma z_i, x) replaced by (n, \Sigma x_i, \tilde{x}).
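Equation (9.4) is the familiar exponential-gamma mixture: under (9.3) the rate e^\lambda has a gamma density with index m and rate \Sigma z_i. A Monte Carlo sketch (our illustration, with hypothetical prior values m = 3 and \Sigma z_i = 2) confirms the resulting predictive mean \Sigma z_i/(m-1):

```python
import random

m, s = 3.0, 2.0              # m prior observations with total Sum z_i = s
N = 200_000
total = 0.0
for _ in range(N):
    rate = random.gammavariate(m, 1.0 / s)   # e^lambda ~ Gamma(m, rate s), from (9.3)
    total += random.expovariate(rate)        # X | lambda ~ exponential with rate e^lambda
# (9.4) gives f(x) = m s^m / (x + s)^(m+1), whose mean is s/(m-1) for m > 1
print(total / N, s / (m - 1))
```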
9.2 Negative Exponential Model - Location Parameter Unknown
We assume the model density

    m_\theta(x) = e^{\lambda - (x - \theta) e^\lambda}\, U(x - \theta),   (9.5)

where \lambda is considered known. The joint likelihood of n observations is then

    \ell_\theta(x_1, \dots, x_n) = e^{n\lambda - e^\lambda \Sigma (x_i - \theta)}\, U(x_* - \theta).   (9.6)

We define the NCBD to be

    b_m(\theta) = m e^\lambda\, e^{-m e^\lambda (z_* - \theta)}\, U(z_* - \theta),   (9.7)

for m > 0. The fiducial density is

    f_m(x) = \frac{m}{m+1}\, e^\lambda\, e^{-e^\lambda (x - [x, z_*]_*) - m e^\lambda (z_* - [x, z_*]_*)},   (9.8)

where [x, z_*]_* indicates the minimum of x and z_*. The ratio b_m(\theta_1)/b_m(\theta_2) tends to 1 for m \to 0, z_* \to \infty; hence define b(\theta) = c. The corresponding fiducial density is f(x) = c. The densities (9.7) and (9.8) are proper for m > 0, z_* < \infty, while b(\theta) and f(x) are improper. The essential uniqueness of the NBS may be demonstrated as in 5.1. Posterior Bayes and fiducial densities are given by (9.7) and (9.8) with (m, z_*, x) replaced by (n, x_*, \tilde{x}).
9.3 Negative Exponential Model - Location and Scale Parameters Unknown

We define the model density
    m_{\lambda,\theta}(x) = e^{\lambda - (x - \theta) e^\lambda}\, U(x - \theta).   (9.9)

The joint likelihood of n observations is

    \ell_{\lambda,\theta}(x_1, \dots, x_n) = e^{n\lambda - e^\lambda \Sigma (x_i - \theta)}\, U(x_* - \theta).   (9.10)

We define the NCBD by

    b_m(\lambda, \theta) = \frac{e^{m\lambda - e^\lambda \Sigma (z_i - \theta)}\, U(z_* - \theta)}{\int\!\!\int e^{m\lambda - e^\lambda \Sigma (z_i - \theta)}\, U(z_* - \theta)\, d\theta\, d\lambda} = \frac{m\, \big(\Sigma (z_i - z_*)\big)^{m-1}\, e^{m\lambda - e^\lambda \Sigma (z_i - \theta)}\, U(z_* - \theta)}{\Gamma(m-1)},   (9.11)

for m > 1. The fiducial density is then

    f_m(x) = \frac{m(m-1)}{m+1}\, \big(\Sigma (z_i - z_*)\big)^{m-1}\, \big( x + \Sigma z_i - (m+1)[x, z_*]_* \big)^{-m},   (9.12)

for m > 1, where [x, z_*]_* indicates the minimum of x and z_*. The ratio b_m(\lambda_1, \theta_1)/b_m(\lambda_2, \theta_2) is unity for the null values m = \Sigma z_i = 0 and z_* = \infty. Thus define b(\lambda, \theta) = c and obtain f(x) = c. The conditional density of \theta given \lambda is given by (9.7) for m > 0, and b(\theta | \lambda) = c for m = 0.
It should be noted that the prior density for the scale parameter in (9.1) and (9.9) does not appear to be consistent with the specification of the prior distribution for the Poisson parameter. The prior densities derived in this chapter should be used only when the models (9.1) and (9.9) may be considered to be identical to the informational model. This will not generally be the case. When the basic underlying assumptions are those of a Poisson process, the experimental model densities (9.1) and (9.9) are obtained only by employing conditional stopping. Thus neither (9.1) nor (9.9) will coincide with the informational density. When the assumptions of the Poisson process are valid, the choice of prior distribution should be that determined in Chapter VI, namely b(\lambda) = c, noting that the \lambda of this chapter is the logarithm of the \lambda of Chapter VI.
A similar situation exists in the binomial - negative binomial context where the binomial model is similarly employed.

9.4 A Pareto Model
Because of their importance in applications we shall now consider two rather special models. We deal first with a location-parameter Pareto model density given by

    m_\theta(x) = \frac{p\, \theta^p}{x^{p+1}}\, U(x - \theta), \quad \theta > 0,\ p > 0.   (9.13)

The joint likelihood of n observations is given by

    \ell_\theta(x_1, \dots, x_n) = p^n\, \theta^{np}\, \Big(\prod x_i\Big)^{-(p+1)}\, U(x_* - \theta).   (9.14)

We define the NCBD as

    b_m(\theta) = \frac{\theta^{mp}\, U(z_* - \theta)\, U(\theta)}{\int \theta^{mp}\, U(z_* - \theta)\, U(\theta)\, d\theta}   (9.15)

               = \frac{(mp+1)\, \theta^{mp}}{z_*^{mp+1}}\, U(z_* - \theta)\, U(\theta),   (9.16)

for m > 0. The corresponding fiducial density is

    f_m(x) = \frac{p(mp+1)}{(m+1)p+1}\, \frac{[x, z_*]_*^{(m+1)p+1}}{z_*^{mp+1}\, x^{p+1}}\, U(x),   (9.17)

where [x, z_*]_* indicates the minimum of x and z_*. In the usual manner we obtain b(\theta) = f(x) = c, and infer the uniqueness of this NBS. The posterior Bayes and fiducial densities under the NBS are then given by (9.16) and (9.17) with (m, z_*, x) replaced by (n, x_*, \tilde{x}). The above model is obtainable by a logarithmic transformation on the location-parameter exponential model of section 9.2. The other one-parameter and the two-parameter Pareto models may be treated analogously.
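The logarithmic relation to 9.2 is easy to exhibit numerically. In the sketch below (our illustration, with hypothetical values \theta = 2 and p = 3) we sample from (9.13) by inversion and check that \log X behaves as a location-parameter exponential variable with rate p and location \log \theta:

```python
import math, random

theta, p = 2.0, 3.0
N = 200_000
ys = []
for _ in range(N):
    u = 1.0 - random.random()                 # uniform on (0, 1]
    x = theta * u ** (-1.0 / p)               # inversion: survival of (9.13) is (theta/x)^p
    ys.append(math.log(x))
# log X = log(theta) + E, with E exponential of rate p: the model of 9.2
print(sum(ys) / N, math.log(theta) + 1.0 / p)  # empirical vs. exact mean
```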
9.5 A Weibull Model

The second special model to be considered is the one-parameter Weibull model with density function

    m_\lambda(x) = p\, x^{p-1}\, e^{p\lambda - x^p e^{p\lambda}}\, U(x).   (9.18)

The joint likelihood of n observations is

    \ell_\lambda(x_1, \dots, x_n) = p^n\, e^{np\lambda}\, \Big(\prod x_i\Big)^{p-1}\, e^{-e^{p\lambda} \Sigma x_i^p}\, U(x_*).   (9.19)

We define the NCBD to be

    b_m(\lambda) = \frac{p\, (\Sigma z_i^p)^m\, e^{mp\lambda - e^{p\lambda} \Sigma z_i^p}}{\Gamma(m)},   (9.20)

for m > 0. For p = 1 this reduces to (9.3). The fiducial density is

    f_m(x) = \frac{m p\, x^{p-1}\, (\Sigma z_i^p)^m}{(x^p + \Sigma z_i^p)^{m+1}}\, U(x).   (9.21)

For p = 1 this reduces to (9.4). The ratio b_m(\lambda_1)/b_m(\lambda_2) = e^{mp(\lambda_1 - \lambda_2) - \Sigma z_i^p (e^{p\lambda_1} - e^{p\lambda_2})}, which is unity for the null values m = \Sigma z_i^p = 0. Hence define b(\lambda) = c and obtain f(x) = x^{-1}. The uniqueness of this NBS follows as in previous cases. The posterior Bayes and fiducial densities are given by (9.20) and (9.21) with (m, \Sigma z_i^p, x) replaced by (n, \Sigma x_i^p, \tilde{x}). The caution given in 9.3 also applies in this case.
CHAPTER X
LIMITATIONS, EXTENSIONS AND REFINEMENTS

10.1 Limitations of the Procedure
We have shown that an essentially unique NBS can be obtained for a multitude of common examples. There are, however, an equal number of models, though they are not frequently employed, which do not seem amenable to this type of analysis. First consider the gamma density

    m_K(x) = \frac{x^{K-1}\, e^{-x}}{\Gamma(K)}\, U(x), \quad K > 0.   (10.1)
Following our usual approach we would define

    b_m(K) = \frac{\big(\prod z_i\big)^{K-1} \big(\Gamma(K)\big)^{-m}}{\int \big(\prod z_i\big)^{K-1} \big(\Gamma(K)\big)^{-m}\, dK}.   (10.2)

We would require that the integral in the denominator be obtained in closed form, which does not seem possible. Even if we were to consider K to be restricted to non-negative integral values and consider the integral with respect to counting measure, it would not appear that we can obtain a closed form expression. By analogy to the chi-square distribution we may refer to K as a "degrees of freedom" parameter. We have been unable to find a NCBD for any such parameter.
We should note, however, that degrees of freedom parameters generally arise in derived distributions and not in distributions of the basic random variable. Indeed we seem to be successful in obtaining a NBS in just those cases in which the model can be referred directly to some basic stochastic process.
A second example is illustrated by the model density

    m_{\mu,\theta}(x) = \frac{\mu^K\, (x - \theta)^{K-1}\, e^{-\mu(x-\theta)}}{\Gamma(K)}\, U(x - \theta),   (10.3)

with K \ne 1. This example fails because there does not exist a sufficient statistic of fixed dimensionality for (\mu, \theta). Here again there may be some artificiality since there may be some more basic Poisson process involved. This model is also quite troublesome for the classical theory.
The procedure is not directly applicable to some useful truncated models such as the truncated normal. In this respect we must use some care in defining the model. If the basic underlying model is normal but our measurements are restricted by the experiment, then, employing the ideas of Chapter II, we would obtain our prior Bayes densities from the full normal model, even though our experimental model might be truncated normal. Again we must refer to the more basic underlying process.
If we drop the requirement of the MPD principle we can obtain an essentially unique structuring for any class of models which may be indexed by a location parameter. Since the integral of m(x - \mu) is unity whether the integration is dx or d\mu, the assumption of a uniform prior density for \mu implies a uniform prior density for X. Also, the posterior density is proper after an epsilonth of an observation, and hence the MNS principle is satisfied (postulating m_0 = 0). Thus the above gamma model, the Laplace (double exponential), and the Cauchy model admit of reasonable structuring even though there is no sufficient statistic of fixed dimensionality.
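The properness claim is easily checked for the Cauchy model: with b(\mu) = c the posterior after a single observation x is proportional to m(x - \mu), which is itself a density in \mu. A small numerical sketch (our illustration) makes the point:

```python
import math

def cauchy(z):
    # standard Cauchy density, a location model m(x - mu)
    return 1.0 / (math.pi * (1.0 + z * z))

x = 0.7   # one observation
# with a uniform prior, the posterior in mu is proportional to cauchy(x - mu);
# a wide Riemann sum shows it already integrates to (nearly) one
h = 0.01
total = sum(cauchy(x - (-500.0 + k * h)) * h for k in range(int(1000.0 / h)))
print(total)   # approximately 1: proper after a single observation
```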
10.2 Evaluation of the Four Principles

The MPD, MBI, MFI and MNS principles in their current formulation have been reasonably satisfactory in that they have clearly indicated essentially unique specifications of the prior Bayes density for a number of important examples. The MFI and MNS principles are somewhat less than elegant. The major problem with the MFI principle is that we do not have an adequate measure of information for discriminating between improper densities. While the discriminations made in this paper are unlikely to raise violent objection, it would be desirable to have some more general statement. The problem with the MNS principle is that the value m_0 must be postulated for each case. Again, while we have taken this to be rather apparent for the cases studied, a general theory would be preferred. Finally, it has been necessary to introduce some additional, though again persuasive, ideas to obtain uniqueness for the structuring in the multinomial cases. Future research should be directed towards further explicating these principles.
10.3 Transformations of the Random Variables

For most applications the scale of measurement of the random variable may be considered fixed, at least up to a linear transformation. We may measure height in inches or in feet or even in meters, but we are unlikely to measure it in any scale which is not a linear transformation of the scale of inches. In the discrete case it is clear that we need not, even for theoretical purposes, consider transformations on \mathcal{X}. In the continuous case the Lebesgue measure is dependent upon the coordinate system chosen, and hence considerations of non-linear transformations on \mathcal{X} may be of some theoretical interest. In addition, there are applications, in optics for example, in which there may be a legitimate choice among random variables which are not linearly related.
In studying the effect of transformations on \mathcal{X} on the mode of analysis suggested in this paper we shall again restrict ourselves to allowable transformations y = \varphi(x), with inverse x = \tau(y), of real-valued X. If m_\theta(x) is the model density for X, then t_\theta(y) = m_\theta(\tau(y))\, \tau'(y) is the model density for Y. Given the model density for Y, the NCBD for \theta is

    b_m(\theta) = \frac{\prod_{i=1}^m t_\theta(w_i)}{\int_\Theta \prod_{i=1}^m t_\theta(w_i)\, d\theta} = \frac{\prod_{i=1}^m m_\theta(\tau(w_i))\, \tau'(w_i)}{\int_\Theta \prod_{i=1}^m m_\theta(\tau(w_i))\, \tau'(w_i)\, d\theta} = \frac{\prod_{i=1}^m m_\theta(z_i)}{\int_\Theta \prod_{i=1}^m m_\theta(z_i)\, d\theta},

where w is to y as z is to x. Hence the NCBD is invariant under \varphi. The fiducial density of Y is given by

    g_m(y) = \int_\Theta t_\theta(y)\, b_m(\theta)\, d\theta = \tau'(y) \int_\Theta m_\theta(\tau(y))\, b_m(\theta)\, d\theta = \tau'(y)\, f_m(\tau(y)).

Now,

    \int g_m(y)\, dy = \int \tau'(y)\, f_m(\tau(y))\, dy = \int f_m(x)\, dx, \quad dx = \tau'(y)\, dy.

Hence the fiducial density of Y is proper if and only if the fiducial density of X is proper. If, then, the spectra of X and Y are infinite, the MPD, MBI and MNS principles will be jointly satisfied for X and Y.
We have demonstrated several cases in which a transformation of X leaves the NBS invariant; specifically, the log-normal, Weibull and Pareto models may be related by transformation to the normal and exponential models. Also, for the examples studied in this paper, if \lambda is a scale parameter for X and we transform \lambda to \tau and X to Y so that \tau is a location parameter for Y, then the NBS is invariant under the dual transformation. It should be noted that transformations on \mathcal{X} and on \Theta are quite different in effect. A transformation on \mathcal{X} generally involves a change in the model distribution while a transformation on \Theta does not. This simply reflects that the model assumptions are dependent upon the scale of measurement.
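The cancellation of Jacobians that makes the NCBD invariant can also be seen numerically. The sketch below (our illustration, pairing a unit-variance normal location model with its log-normal image under y = e^x, using hypothetical prior observations) shows that the two NCBD kernels differ only by a factor constant in the parameter:

```python
import math

def m_normal(x, mu):
    # X-model density: normal location model with unit variance
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def t_lognormal(y, mu):
    # Y-model density for Y = exp(X): includes the Jacobian tau'(y) = 1/y
    return m_normal(math.log(y), mu) / y

zs = [0.3, 1.1, -0.4]            # hypothetical prior observations on the X scale
ws = [math.exp(z) for z in zs]   # the same observations on the Y scale

for mu in (-1.0, 0.0, 0.5, 2.0):
    kx = math.prod(m_normal(z, mu) for z in zs)      # NCBD kernel from the X-model
    ky = math.prod(t_lognormal(w, mu) for w in ws)   # NCBD kernel from the Y-model
    print(mu, kx / ky)  # constant in mu, so the normalized NCBDs agree
```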
10.4 Fiducial Inference

In addition to suggesting methods for specifying the prior Bayes density under indifference, and thus proposing a model for logical probability statements, it has been suggested that the fiducial density of X be used in place of the unknown true density of X to predict future observations from the process under study. It has been pointed out that this is simply an extension of the law of succession. Very little mathematical justification has been given for this suggestion. We have alluded to a consistency property, i.e. f_n(x) \to m(x) as n \to \infty. We have referred to Lindley's result that E\, I(f_n) \ge I(f_{n-1}), n = 1, 2, \dots, which says that on the average succeeding estimates become more informative, having started, of course, by being minimally informative. Finally, we have shown that in the simplest normal example a direct and exact predictive frequency result obtains.

We have been unable to obtain any general predictive frequency theorems. It is clear, however, that if this aspect of the proposed dual interpretation of the Bayesian model is to be justified, some further investigations must be undertaken in this respect. The information inequality as given above and the consistency property seem hardly sufficient to reasonably justify such an interpretation.
REFERENCES

Bayes, T. (1763), "An essay towards solving a problem in the doctrine of chances", Phil. Trans. Roy. Soc., 53, 370-418.

Birnbaum, A. (1962), "On the foundations of statistical inference", J. Amer. Statist. Assoc., 57, 269-306.

Boole, G. (1854), An Investigation of the Laws of Thought, reprinted New York: Dover.

Carnap, R. (1953), "Two concepts of probability", in Feigl, H. (ed.), Readings in the Philosophy of Science, New York: Appleton-Century-Crofts, 438-455.

Cramér, H. (1946), Mathematical Methods of Statistics, Princeton Univ. Press.

Doob, J. L. (1941), "Probability as measure", Ann. Math. Statist., 12.

de Finetti, B. (1937), "La prévision: ses lois logiques, ses sources subjectives", Ann. Inst. H. Poincaré, 7, 1-68.

Fisher, R. A. (1930), "Inverse probability", Proc. Camb. Phil. Soc., 26, 528-535.

Fisher, R. A. (1933), "The concepts of inverse probability and fiducial probability referring to unknown parameters", Proc. Roy. Soc. A, 139, 343-348.

Fisher, R. A. (1956), Statistical Methods and Scientific Inference, Edinburgh: Oliver and Boyd.

Fraser, D. A. S. (1961), "On fiducial inference", Ann. Math. Statist., 32, 661-676.

Fraser, D. A. S. (1962), "On the sufficiency and likelihood principles", Stanford Technical Report, 43.

Good, I. J. (1950), Probability and the Weighing of Evidence, London: Griffin.

Jaynes, E. T. (1957), "Information theory and statistical mechanics", Physical Review, 106, 620-630.

Jeffreys, H. (1948), Theory of Probability, 2nd Edition, Oxford University Press.

Jeffreys, H. (1961), Theory of Probability, 3rd Edition, Oxford University Press.

Kerridge, D. F. (1961), "Inaccuracy and inference", J. R. Statist. Soc. B, 23, 184-194.

Keynes, J. M. (1921), A Treatise on Probability, London: MacMillan.

Kolmogorov, A. N. (1956), Foundations of Probability, 2nd English Edition, New York: Chelsea.

Laplace, P. S. (1820), Théorie Analytique des Probabilités, 2nd Edition, Paris: Courcier.

LeCam, L. (1958), "Les propriétés asymptotiques des solutions de Bayes", Publ. de l'Institut de Statistique de l'Université de Paris, 7, 17-35.

Lindley, D. V. (1956), "On a measure of information provided by an experiment", Ann. Math. Statist., 27, 986-1005.

Lindley, D. V. (1957), "Binomial sampling schemes and the concept of information", Biometrika, 44, 179-186.

Lindley, D. V. (1958), "Fiducial distributions and Bayes' theorem", J. R. Statist. Soc. B, 20, 102-108.

Lindley, D. V. (1961), "The use of prior probability distributions in statistical inference and decisions", Proceedings of the Fourth Berkeley Symposium on Probability and Statistics, 1, 453-468.

von Mises, R. (1957), Probability, Statistics and Truth, New York: MacMillan.

Novick, M. R. (1962), "A Bayesian indifference postulate", Institute of Statistics Mimeograph Series No. 319, University of North Carolina.

Perks, W. (1947), "Some observations on inverse probability including a new indifference rule", Journ. Inst. Act., Part II.

Raiffa, H. and Schlaifer, R. (1961), Applied Statistical Decision Theory, Boston: Harvard Business School.

Rajski, C. (1960), "On the existence of entropy", Information Theory, Statistical Decision Functions, Random Processes: Transactions of the Second Prague Conference, 541-542. New York: Academic Press.

Ramsey, F. P. (1931), The Foundations of Mathematics and Other Logical Essays, New York: Harcourt Brace.

Reichenbach, H. (1932), "Axiomatik der Wahrscheinlichkeitsrechnung", Math. Zeitschrift, 34, 568-619.

Savage, L. J. (1954), The Foundations of Statistics, New York: Wiley.

Savage, L. J. (1961), "The foundations of statistics reconsidered", Proceedings of the Fourth Berkeley Symposium on Probability and Statistics, 575-586. University of California Press.

Savage, L. J. (1962), The Foundations of Statistical Inference, London: Methuen.

Schlaifer, R. (1959), Probability and Statistics for Business Decisions, New York: McGraw-Hill.

Shannon, C. E. and Weaver, W. (1949), The Mathematical Theory of Communication, University of Illinois Press.