Reference priors with partial information
By D. Sun
Department of Statistics, University of Missouri-Columbia, Columbia, MO 65211, USA
e-mail: [email protected]
and J. O. Berger
Institute of Statistics and Decision Sciences, Duke University, Durham, NC 27708, USA
e-mail: [email protected]
Summary
In this paper, reference priors are derived for three cases where partial information is available. If a subjective conditional prior is given, two reasonable methods are proposed for finding the marginal reference prior. If, instead, a subjective marginal prior is available, a method for defining the conditional reference prior is proposed. A sufficient condition is then given under which this conditional reference prior agrees with the conditional reference prior derived in the first stage of the reference prior algorithm of Berger and Bernardo. Finally, under the assumption of independence, a method for finding marginal reference priors is also proposed. Various examples are given to illustrate the methods.

Some key words: Kullback-Leibler divergence, noninformative prior, normal distribution, gamma distribution, beta distribution, Neyman-Scott problem.
1. Introduction
Bayesian analysis using noninformative or default priors has received considerable
attention in recent years. A common noninformative prior is the Jeffreys prior (Jeffreys, 1961), which is proportional to the square root of the determinant of the Fisher information matrix. The Jeffreys prior is quite useful for a single parameter, but can be seriously deficient in multiparameter problems (cf. Berger and Bernardo, 1992). For
a recent review of various approaches to the development of noninformative priors, see
Kass and Wasserman (1996). Here, we will concentrate on the reference prior approach,
as developed in Bernardo (1979) and Berger and Bernardo (1992).
In many practical problems, one has partial prior information for some of the parameters. For example, in a $N(\mu, \sigma^2)$ population, one might possess reasonably strong prior information about $\mu$, while the prior information for $\sigma$ is vague. As another example,
Lavine et al. (1991) considered robust Bayesian inference with specified prior marginals.
As a third example, the prior knowledge could be of independence of the parameters,
such as in the ECMO clinical trial example studied by Ware (1989). Another common
type of partial information is constraints on the parameter space. This is typically easily
handled, however, in that reference priors for a constrained space are almost always just
the unconstrained reference prior times the indicator function on the constrained space.
The paper is arranged as follows. In Section 2, we will develop the reference priors
when two types of partial information are available. We first consider the case when a conditional prior is known and it is desired to find the marginal reference prior. Two options are given. One is similar to the final stage of Berger and Bernardo's reference prior algorithm; the other is more intuitive and is based on deriving the marginal model.
Next, the conditional reference prior is derived when a marginal prior is known. A sufficient condition is found under which the conditional noninformative prior often agrees with the conditional reference prior from the first stage of Berger and Bernardo's
algorithm. Some examples are given in Section 3, illustrating various aspects of these
results. In Section 4, an algorithm is proposed for determining marginal reference priors
when the two parameters are known to be independent. Some sufficient conditions are
given under which the answers can be written in closed form. Formal examples and the
ECMO clinical trial example are used for illustration. Finally, Section 5 contains some
discussion.
2. Knowing A Marginal Or Conditional Prior
2.1. Introduction
Let $X_n = (x_1, \ldots, x_n)$ be a random sample from the density $p(x \mid \theta_1, \theta_2)$, where the parameters $\theta_1$ and $\theta_2$ are vectors of dimensions $d_1$ and $d_2$, respectively. Let $\pi(\theta_1, \theta_2)$ denote the prior density of $(\theta_1, \theta_2)$. The following questions are of interest.

1. Suppose there is available a subjective conditional prior density $\pi^s(\theta_2 \mid \theta_1)$ for $\theta_2$ given $\theta_1$. How can we find the marginal noninformative prior $\pi^r(\theta_1)$ for $\theta_1$?

2. Suppose there is available a subjective marginal prior density $\pi^s(\theta_1)$ for $\theta_1$. How can we find the conditional noninformative prior $\pi^r(\theta_2 \mid \theta_1)$ for $\theta_2$ given $\theta_1$?

Solutions to these two questions will be discussed in Sections 2.2 and 2.3, respectively. We will use $\pi(\theta_1, \theta_2 \mid X_n)$ to denote the joint posterior density of $\theta_1$ and $\theta_2$, and $\pi(\theta_1 \mid X_n)$ the marginal posterior density of $\theta_1$.
2.2. Finding The Marginal Prior
When the conditional density of $\theta_2$ given $\theta_1$ is available, there are two reasonable options for finding a marginal reference prior $\pi^r(\theta_1)$.
Option 1: Following Bernardo (1979), define the expected Kullback-Leibler divergence between the marginal posterior density of $\theta_1$ given $X_n$ and the marginal prior of $\theta_1$ by
\[
\Psi_{X_n}(\pi^r) = E\Big[\int \pi(\theta_1 \mid X_n)\,\log\frac{\pi(\theta_1 \mid X_n)}{\pi^r(\theta_1)}\, d\theta_1\Big], \tag{1}
\]
where the expectation is with respect to the marginal density $m(X_n) = \int p(X_n \mid \theta_1)\pi^r(\theta_1)\, d\theta_1$. We seek that prior, $\pi^r(\theta_1)$, which maximizes (1) asymptotically, since maximizing the distance between the prior and posterior is a plausible way to define a prior which has minimal influence on the analysis. It follows from Ghosh and Mukerjee (1992) that, under some regularity conditions and as $n \to \infty$,
\[
\Psi_{X_n}(\pi^r) = \frac{d_1}{2}\log\frac{n}{2\pi e} + \int \pi^r(\theta_1)\,\log\frac{\xi(\theta_1)}{\pi^r(\theta_1)}\, d\theta_1 + o(1), \tag{2}
\]
where
\[
\xi(\theta_1) = \exp\Big\{\frac{1}{2}\int \pi^s(\theta_2 \mid \theta_1)\,\log\frac{|\Sigma|}{|\Sigma_{22}|}\, d\theta_2\Big\}. \tag{3}
\]
Here $\Sigma = \Sigma(\theta_1, \theta_2)$ is the per observation Fisher information matrix for $(\theta_1, \theta_2)$, $\Sigma_{22} = \Sigma_{22}(\theta_1, \theta_2)$ is the per observation Fisher information for $\theta_2$, given that $\theta_1$ is held fixed, and $|\Sigma|$ is the determinant of $\Sigma$.

The reference prior strategy suggests choosing $\pi^r$ to maximize (1) or (2) asymptotically on compact sets; this can easily be seen to lead to
\[
\pi_1^r(\theta_1) \propto \xi(\theta_1). \tag{4}
\]
In fact, this is essentially the solution used in Berger and Bernardo (1989, 1992). The
following theorem gives an important special case.
Theorem 1. (a) If $|\Sigma|/|\Sigma_{22}|$ does not depend on $\theta_2$, then, for any subjective conditional density $\pi^s(\theta_2 \mid \theta_1)$, the marginal reference prior from Option 1 has the form
\[
\pi_1^r(\theta_1) \propto (|\Sigma|/|\Sigma_{22}|)^{1/2}. \tag{5}
\]
(b) If $|\Sigma|/|\Sigma_{22}|$ depends only on $\theta_2$ and $\pi^s(\theta_2 \mid \theta_1)$ does not depend on $\theta_1$, then $\pi_1^r(\theta_1) \propto 1$.

Proof. The results follow from the definition (3)-(4) of $\pi_1^r(\theta_1)$ and the assumptions.
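As a purely numerical illustration of how (3) and (4) can be used when no closed form suggests itself, the following sketch evaluates $\xi(\theta_1)$ by quadrature. The model and the conditional prior are assumptions made only for this example: the normal model of Section 3.2 with $\theta_1 = \mu$, $\theta_2 = \sigma$, and a hypothetical conditional prior in which $\sigma \mid \mu$ is lognormal with log-mean $\mu/2$.

```python
import numpy as np
from scipy import integrate, stats

# A minimal sketch of Option 1, eqs. (3)-(4): xi(theta1) is computed by quadrature.
# Hypothetical test case: x | mu, sigma ~ N(mu, sigma^2) with theta1 = mu, theta2 = sigma,
# and a conditional prior sigma | mu ~ lognormal(mu/2, 0.5), which does depend on mu.
def log_ratio(sigma):
    # for N(mu, sigma^2): |Sigma| = 2/sigma^4 and Sigma_22 = 2/sigma^2, so the ratio is 1/sigma^2
    return -2.0 * np.log(sigma)

def pi_s(sigma, mu):
    return stats.lognorm.pdf(sigma, s=0.5, scale=np.exp(mu / 2))

def xi(mu):
    integrand = lambda sigma: pi_s(sigma, mu) * log_ratio(sigma)
    val, _ = integrate.quad(integrand, 1e-8, np.inf)
    return np.exp(0.5 * val)

# The (unnormalised) Option 1 marginal reference prior pi_1^r(mu) is proportional to
# xi(mu); for this particular hypothetical prior the exact value is exp(-mu/2).
for mu in [-2.0, 0.0, 2.0]:
    print(mu, xi(mu))
```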
Option 2: Find the marginal model
\[
p(X_n \mid \theta_1) = \int p(X_n \mid \theta_1, \theta_2)\,\pi^s(\theta_2 \mid \theta_1)\, d\theta_2. \tag{6}
\]
Let $\Sigma^*(\theta_1)$ be the Fisher information matrix for $\theta_1$ based on this marginal model. Then the reference prior for $\theta_1$ in this model, again maximizing, asymptotically, the expected Kullback-Leibler divergence between the marginal posterior and the marginal prior, is
\[
\pi_2^r(\theta_1) \propto \{|\Sigma^*(\theta_1)|\}^{1/2}. \tag{7}
\]
Option 2 more closely mirrors the underlying motivation for reference priors in that, with $\pi^s(\theta_2 \mid \theta_1)$ given, the information in the data about $\theta_1$ resides in $p(X_n \mid \theta_1)$. Hence the marginal reference prior for $\theta_1$ should, ideally, be computed with respect to this mixture model. Unfortunately, the Fisher information matrix for such mixture models is often difficult to compute, so that implementation of Option 2 is often difficult. This same difficulty also motivated the use of the analogue of Option 1 in Berger and Bernardo (1989, 1992), in place of the more natural Option 2.
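When the mixture Fisher information is not available in closed form, Option 2 can still be carried out by brute force. The following sketch (an illustration only, using the same hypothetical model and conditional prior as the sketch above) forms $p(x \mid \theta_1)$ by quadrature and estimates $\Sigma^*(\theta_1)$ by Monte Carlo with a finite-difference second derivative.

```python
import numpy as np
from scipy import integrate, stats

# A minimal sketch of Option 2, eqs. (6)-(7): build the marginal model p(x | theta1) by
# integrating theta2 out against the subjective conditional prior, then estimate
# Sigma*(theta1) = E[-(d^2/dtheta1^2) log p(x | theta1)] by Monte Carlo.
# Hypothetical test case: x | mu, sigma ~ N(mu, sigma^2), sigma | mu ~ lognormal(mu/2, 0.5).
rng = np.random.default_rng(0)

def pi_s(sigma, mu):
    return stats.lognorm.pdf(sigma, s=0.5, scale=np.exp(mu / 2))

def log_marginal(x, mu):
    integrand = lambda sigma: stats.norm.pdf(x, loc=mu, scale=sigma) * pi_s(sigma, mu)
    val, _ = integrate.quad(integrand, 1e-8, np.inf)
    return np.log(val)

def fisher_info(mu, n_mc=400, h=1e-2):
    # draw x from the marginal model at mu, then average the negative second derivative
    sigmas = stats.lognorm.rvs(s=0.5, scale=np.exp(mu / 2), size=n_mc, random_state=rng)
    xs = rng.normal(mu, sigmas)
    d2 = [(log_marginal(x, mu + h) - 2 * log_marginal(x, mu) + log_marginal(x, mu - h)) / h**2
          for x in xs]
    return -np.mean(d2)

# The Option 2 marginal reference prior pi_2^r(mu) is proportional to sqrt(Sigma*(mu)):
for mu in [0.0, 1.0]:
    print(mu, np.sqrt(fisher_info(mu)))
```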
2.3. Finding The Conditional Prior
For the case when the marginal prior density $\pi^s(\theta_1)$ is known, consider the expected Kullback-Leibler divergence between the conditional posterior density of $\theta_2$, given $\theta_1$ and $X_n$, and the conditional prior of $\theta_2$ given $\theta_1$:
\begin{align}
\Psi_{X_n}(\pi^r) &= E\Big[\int \pi(\theta_1 \mid X_n)\Big\{\int \pi(\theta_2 \mid \theta_1, X_n)\,\log\frac{\pi(\theta_2 \mid \theta_1, X_n)}{\pi^r(\theta_2 \mid \theta_1)}\, d\theta_2\Big\}\, d\theta_1\Big] \nonumber\\
&= E\Big[\int\!\!\int \pi(\theta_1, \theta_2 \mid X_n)\,\log\frac{\pi(\theta_1, \theta_2 \mid X_n)}{\pi(\theta_1, \theta_2)}\, d\theta_2\, d\theta_1\Big]
 - E\Big[\int \pi(\theta_1 \mid X_n)\,\log\frac{\pi(\theta_1 \mid X_n)}{\pi^s(\theta_1)}\, d\theta_1\Big], \tag{8}
\end{align}
where $\pi(\theta_1, \theta_2) = \pi^s(\theta_1)\pi^r(\theta_2 \mid \theta_1)$. From an asymptotic expansion of the first term (compare formula (1.1) of Ghosh and Mukerjee (1992, p. 192) and (2)), we have that
\[
\Psi_{X_n}(\pi^r) = \frac{d_2}{2}\log\frac{n}{2\pi e} + \int \pi^s(\theta_1)\Big[\int \pi^r(\theta_2 \mid \theta_1)\,\log\frac{|\Sigma_{22}|^{1/2}}{\pi^r(\theta_2 \mid \theta_1)}\, d\theta_2\Big]\, d\theta_1 + o(1). \tag{9}
\]
Choosing $\pi^r(\theta_2 \mid \theta_1)$ to maximize (9) asymptotically thus suggests defining the conditional reference prior as
\[
\pi^r(\theta_2 \mid \theta_1) \propto |\Sigma_{22}(\theta_1, \theta_2)|^{1/2}. \tag{10}
\]
When this conditional reference prior is proper, matters are straightforward. In practice, it may be improper and will then encounter normalization concerns. The compact support argument that is typically used in the reference prior approach (Berger and Bernardo, 1992) may then be applied here. Choose a nested sequence $\Omega_1 \subset \Omega_2 \subset \cdots$ of compact subsets of the parameter space $\Omega$ for $(\theta_1, \theta_2)$, such that $\cup_i \Omega_i = \Omega$ and $\pi^r(\theta_2 \mid \theta_1)$ has finite mass on $\Omega_i(\theta_1) = \{\theta_2 : (\theta_1, \theta_2) \in \Omega_i\}$. Let $1_A$ be the indicator function on $A$ and
\[
K_i(\theta_1) = \int_{\Omega_i(\theta_1)} |\Sigma_{22}(\theta_1, \theta_2)|^{1/2}\, d\theta_2.
\]
The conditional reference prior of $\theta_2$ on $\Omega_i$ is
\[
\pi_i^r(\theta_2 \mid \theta_1) = \frac{|\Sigma_{22}(\theta_1, \theta_2)|^{1/2}}{K_i(\theta_1)}\, 1_{\Omega_i(\theta_1)}(\theta_2).
\]
Now define the conditional reference prior of $\theta_2$ by
\[
\pi^r(\theta_2 \mid \theta_1) = \lim_{i \to \infty}\frac{\pi_i^r(\theta_2 \mid \theta_1)}{\pi_i^r(\theta_{20} \mid \theta_{10})},
\]
when the limit exists; here $(\theta_{10}, \theta_{20})$ is any fixed point. It is easy to see that
\[
\pi^r(\theta_2 \mid \theta_1) \propto \lim_{i \to \infty}\frac{K_i(\theta_{10})}{K_i(\theta_1)}\,|\Sigma_{22}(\theta_1, \theta_2)|^{1/2}. \tag{11}
\]
The following theorem gives a sufficient condition under which this limit is proportional to $|\Sigma_{22}(\theta_1, \theta_2)|^{1/2}$.

Theorem 2. Assume
\[
|\Sigma_{22}(\theta_1, \theta_2)| = g_1(\theta_1)\,g_2(\theta_2) \tag{12}
\]
for some functions $g_1$ and $g_2$. Suppose $\Omega = \Omega_1 \times \Omega_2$ and the compact sets are chosen to be of the form $\Omega_i = \Omega_{1i} \times \Omega_{2i}$. Then the conditional reference prior of $\theta_2$ satisfies
\[
\pi^r(\theta_2 \mid \theta_1) \propto |\Sigma_{22}(\theta_1, \theta_2)|^{1/2} \propto \{g_2(\theta_2)\}^{1/2}.
\]

Proof. Under (12), $K_i(\theta_1) = \{g_1(\theta_1)\}^{1/2}\int_{\Omega_{2i}}\{g_2(\theta_2)\}^{1/2}\, d\theta_2$, so the ratio $K_i(\theta_{10})/K_i(\theta_1)$ does not depend on $i$. The result follows immediately from (11).

Note that the conditional reference prior, $\pi^r(\theta_2 \mid \theta_1)$, never depends on the specified marginal prior $\pi^s(\theta_1)$.
3. Examples
3.1. Multinomial Distribution
Consider the multinomial density for 3 cells,
\[
p(y_1, y_2 \mid p_1, p_2) = \frac{k!}{y_1!\, y_2!\, (k - y_1 - y_2)!}\, p_1^{y_1} p_2^{y_2} (1 - p_1 - p_2)^{k - y_1 - y_2}, \tag{13}
\]
where $k$ is a positive integer, $y_i$ is the observed frequency in cell $i$, $p_i$ is the probability of cell $i$, and $0 \le p_1 + p_2 \le 1$. Without loss of generality, we will suppress the cell count and probability for the third cell. The Fisher information matrix for $(p_1, p_2)$ is
\[
\Sigma(p_1, p_2) = k\left\{\begin{pmatrix} p_1^{-1} & 0 \\ 0 & p_2^{-1} \end{pmatrix} + \frac{1}{1 - p_1 - p_2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}\right\}. \tag{14}
\]
The Marginal Reference Prior. Given $p_1$, assume that $p_2$ has a conditional density $\pi^s(p_2 \mid p_1)$ on $(0, 1 - p_1)$. We want to find the marginal reference prior for $p_1$.

Option 1. Since $|\Sigma| = k^2/\{p_1 p_2(1 - p_1 - p_2)\}$ and $\Sigma_{22} = k\{p_2^{-1} + (1 - p_1 - p_2)^{-1}\}$, $|\Sigma|/\Sigma_{22} = k/\{p_1(1 - p_1)\}$. From Theorem 1, the marginal reference prior for $p_1$ is
\[
\pi_1^r(p_1) \propto \{p_1(1 - p_1)\}^{-1/2}. \tag{15}
\]
Option 2. Assume that $n$ observations $x_i = (y_{1i}, y_{2i})$, $i = 1, \ldots, n$, are obtained. Note that, in reference prior developments, one considers replications of the full experiment in creating the asymptotics. A considerable simplification results if
\[
\pi^s(p_2 \mid p_1) = (1 - p_1)^{-1} g\{p_2/(1 - p_1)\} \tag{16}
\]
for some density $g$ on $(0, 1)$. This says that the random variable $p_2/(1 - p_1)$ has the same distribution for any $p_1$. Then the marginal model is
\begin{align*}
p(X_n \mid p_1) &\propto \int_0^{1 - p_1} p_1^{y_{1+}} p_2^{y_{2+}} (1 - p_1 - p_2)^{nk - y_{1+} - y_{2+}}\,\frac{1}{1 - p_1}\, g\Big(\frac{p_2}{1 - p_1}\Big)\, dp_2 \\
&= p_1^{y_{1+}}(1 - p_1)^{nk - y_{1+}}\int_0^1 s^{y_{2+}}(1 - s)^{nk - y_{1+} - y_{2+}} g(s)\, ds \\
&\propto p_1^{y_{1+}}(1 - p_1)^{nk - y_{1+}},
\end{align*}
where $y_{i+} = \sum_{j=1}^n y_{ij}$. The Fisher information corresponding to this model is then
\[
\Sigma^*(p_1) = \frac{nk}{p_1(1 - p_1)}. \tag{17}
\]
Therefore, $\pi_2^r(p_1)$ is also given by (15) when (16) holds. If (16) does not hold, the marginal reference priors will typically be different.
The Conditional Reference Prior of $p_2$. For any marginal subjective prior on $p_1$, the conditional reference prior of $p_2$ is
\[
\pi^r(p_2 \mid p_1) \propto \{p_2(1 - p_1 - p_2)\}^{-1/2}, \qquad 0 < p_2 < 1 - p_1.
\]
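The algebra above is easy to confirm with a computer algebra system. The following short check (an illustration only, using the parameterization of (14)) verifies that $|\Sigma|/\Sigma_{22}$ is free of $p_2$ and that $\Sigma_{22}$ is proportional, in $p_2$, to $\{p_2(1 - p_1 - p_2)\}^{-1}$.

```python
import sympy as sp

# Symbolic check of the multinomial calculations: with Sigma as in (14),
# |Sigma|/Sigma_22 should reduce to k/{p1(1 - p1)} (giving (15) via Theorem 1),
# and Sigma_22 * p2 * (1 - p1 - p2) should be free of p2.
k, p1, p2 = sp.symbols('k p1 p2', positive=True)
Sigma = k * (sp.diag(1 / p1, 1 / p2) + sp.ones(2, 2) / (1 - p1 - p2))

print(sp.simplify(Sigma.det() / Sigma[1, 1]))          # k/(p1*(1 - p1)): free of p2
print(sp.simplify(Sigma[1, 1] * p2 * (1 - p1 - p2)))   # k*(1 - p1): free of p2
```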
3.2. Normal Distribution
Consider the normal density with mean $\mu$ and standard deviation $\sigma$,
\[
p(x \mid \mu, \sigma) = \frac{1}{(2\pi)^{1/2}\sigma}\exp\Big\{-\frac{(x - \mu)^2}{2\sigma^2}\Big\}, \qquad -\infty < x < \infty. \tag{18}
\]
The Fisher information matrix for $(\mu, \sigma)$ is
\[
\Sigma = \Sigma(\mu, \sigma) = \mathrm{diag}(\sigma^{-2},\; 2\sigma^{-2}).
\]
Case I. Marginal and Conditional Reference Priors For $\mu$.

Proposition 1. (a) If $\pi^s(\sigma \mid \mu)$ is independent of $\mu$, the marginal reference prior for $\mu$ under Option 1 is $\pi_1^r(\mu) \propto 1$.
(b) If $\pi^s(\sigma \mid \mu)$ is independent of $\mu$, the marginal reference prior for $\mu$ under Option 2 is $\pi_2^r(\mu) \propto 1$.
(c) For any given marginal prior of $\sigma$, the conditional reference prior of $\mu$ is $\pi^r(\mu \mid \sigma) \propto 1$.

Proof. Note that $|\Sigma|/\Sigma_{22} = \sigma^{-2}$, and (a) follows from Theorem 1(b). For (b), since $\pi^s(\sigma \mid \mu) = \pi^s(\sigma)$, the marginal probability density $p(X_n \mid \mu)$ is a scale mixture of normals and hence a location probability density. Consequently, the Fisher information for $\mu$ will be constant. Part (c) is obvious.

When $\pi^s(\sigma \mid \mu)$ depends on $\mu$, Option 1 and Option 2 can generate different marginal priors for $\mu$.
Case II. Marginal and Conditional Reference Priors For $\sigma$.

Proposition 2. (a) For any conditional prior of $\mu$ given $\sigma$, the marginal reference prior for $\sigma$ under Option 1 has the form $\pi_1^r(\sigma) \propto 1/\sigma$.
(b) If the prior distribution for $\mu$ given $\sigma$ is normal with mean $m$ and variance $\tau^2$, then the marginal prior for $\sigma$ under Option 2 has the form
\[
\pi_2^r(\sigma) \propto \Big\{\frac{n - 1}{\sigma^2} + \frac{\sigma^2}{(\sigma^2 + n\tau^2)^2}\Big\}^{1/2}. \tag{19}
\]
(c) For any given marginal prior on $\mu$, the conditional reference prior for $\sigma$ is also $\propto 1/\sigma$, independent of $\mu$.
Proof. Part (a) is immediate from Theorem 1(a). Part (c) is easy. For (b), let $X_n = (x_1, \ldots, x_n)$ be a random sample from $N(\mu, \sigma^2)$. Define $\bar{x}_n = (x_1 + \cdots + x_n)/n$ and $S^2 = \sum_{i=1}^n (x_i - \bar{x}_n)^2$. The marginal model is
\begin{align*}
p(X_n \mid \sigma) &= \frac{1}{(2\pi)^{n/2}\sigma^n}\exp\Big\{-\frac{S^2}{2\sigma^2}\Big\}\int_{-\infty}^{\infty}\exp\Big\{-\frac{n(\bar{x}_n - \mu)^2}{2\sigma^2}\Big\}\frac{1}{(2\pi)^{1/2}\tau}\exp\Big\{-\frac{(\mu - m)^2}{2\tau^2}\Big\}\, d\mu \\
&\propto \frac{1}{\sigma^{n-1}(\sigma^2 + n\tau^2)^{1/2}}\exp\Big\{-\frac{S^2}{2\sigma^2} - \frac{n(\bar{x}_n - m)^2}{2(\sigma^2 + n\tau^2)}\Big\}.
\end{align*}
Therefore, writing $\tilde{\sigma} = \sigma^2$, we have
\[
\frac{\partial^2}{\partial\tilde{\sigma}^2}\log p(X_n \mid \tilde{\sigma}) = \frac{n - 1}{2\tilde{\sigma}^2} + \frac{1}{2(\tilde{\sigma} + n\tau^2)^2} - \frac{S^2}{\tilde{\sigma}^3} - \frac{n(\bar{x}_n - m)^2}{(\tilde{\sigma} + n\tau^2)^3}.
\]
Under the marginal model, we have $E_{\tilde{\sigma}}S^2 = (n - 1)\tilde{\sigma}$ and $E_{\tilde{\sigma}}(\bar{x}_n - m)^2 = \tilde{\sigma}/n + \tau^2$. So the Fisher information for $\tilde{\sigma}$ under the marginal model is then
\[
\Sigma^*(\tilde{\sigma}) = \frac{n - 1}{2\tilde{\sigma}^2} + \frac{1}{2(\tilde{\sigma} + n\tau^2)^2}.
\]
The result is immediate.
Note that $\pi_2^r(\sigma)$ differs from $\pi_1^r(\sigma)$. When $n \to \infty$, however, $\pi_2^r(\sigma)$ will converge to $\pi_1^r(\sigma)$.
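A quick numerical check of this convergence, using formula (19) with the assumed value $\tau = 1$, is sketched below.

```python
import numpy as np

# Numerical check of the remark above: the Option 2 prior (19), normalised at
# sigma = 1, approaches the Option 1 prior 1/sigma as n grows (tau fixed at 1 here).
def pi2(sigma, n, tau=1.0):
    return np.sqrt((n - 1) / sigma**2 + sigma**2 / (sigma**2 + n * tau**2)**2)

sigmas = np.array([0.5, 1.0, 2.0, 4.0])
for n in [5, 50, 5000]:
    ratio = pi2(sigmas, n) / pi2(1.0, n)     # prior (19) relative to its value at sigma = 1
    print(n, np.round(ratio * sigmas, 4))     # each entry tends to 1 as n grows
```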
3.3. Gamma Distribution
Consider the gamma density $p(x \mid \alpha, \beta) = \{\beta^{\alpha}/\Gamma(\alpha)\}\, x^{\alpha - 1}\exp(-\beta x)$, $x > 0$, where $\alpha > 0$ and $\beta > 0$. The Fisher information matrix for $(\alpha, \beta)$ is
\[
\Sigma = \Sigma(\alpha, \beta) = \begin{pmatrix} \psi'(\alpha) & -\beta^{-1} \\ -\beta^{-1} & \alpha\beta^{-2} \end{pmatrix}, \tag{20}
\]
where $\psi(\alpha) = d\{\log\Gamma(\alpha)\}/d\alpha$.
Proposition 3. (a) For any conditional density of $\beta$ given $\alpha$, the marginal reference prior for $\alpha$ under Option 1 is given by
\[
\pi_1^r(\alpha) \propto \{\psi'(\alpha) - \alpha^{-1}\}^{1/2}. \tag{21}
\]
(b) Assume that the conditional distribution of $\beta$ given $\alpha$ is Gamma$(a, b)$. The marginal reference prior for $\alpha$ under Option 2 is given by
\[
\pi_2^r(\alpha) \propto \{\psi'(\alpha) - n\psi'(n\alpha + a)\}^{1/2}. \tag{22}
\]
(c) For any given marginal prior on $\alpha$, the conditional reference prior for $\beta$ is independent of $\alpha$ and is given by $\beta^{-1}$.
Proof. It is easy to see that $|\Sigma| = \beta^{-2}\{\alpha\psi'(\alpha) - 1\}$ and $\Sigma_{22} = \alpha\beta^{-2}$. Thus $|\Sigma|/\Sigma_{22} = \psi'(\alpha) - \alpha^{-1}$, which implies (21). Part (c) is clear. For (b), let $X_n = (x_1, \ldots, x_n)$ be a random sample from the gamma distribution. The marginal model is then
\begin{align*}
p(X_n \mid \alpha) &= \frac{(\prod_{i=1}^n x_i)^{\alpha - 1}}{\Gamma(\alpha)^n}\int_0^{\infty}\beta^{n\alpha}\exp\Big(-\beta\sum_{i=1}^n x_i\Big)\frac{b^a}{\Gamma(a)}\beta^{a - 1}\exp(-b\beta)\, d\beta \\
&\propto \frac{\Gamma(n\alpha + a)\,(\prod_{i=1}^n x_i)^{\alpha}}{\Gamma(\alpha)^n\,(\sum_{i=1}^n x_i + b)^{n\alpha + a}}.
\end{align*}
The second derivative of the logarithm of $p(X_n \mid \alpha)$ is $-n\psi'(\alpha) + n^2\psi'(n\alpha + a)$, from which (b) follows immediately.
Using an expansion for $\psi'(\cdot)$ (cf. equation (1.45b) of Bowman and Shenton, 1988), we can show that $\psi'(x) = x^{-1} + (2x^2)^{-1} + O(x^{-3})$ as $x \to \infty$. Therefore, $n\psi'(n\alpha + a) \to \alpha^{-1}$ as $n \to \infty$. Consequently, the marginal reference prior for $\alpha$ under Option 2 converges to the marginal reference prior for $\alpha$ under Option 1.
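The following sketch evaluates (21) and (22) with the trigamma function and illustrates this convergence numerically; the prior shape parameter $a$ (called a0 in the code) is set to 2 purely for illustration.

```python
import numpy as np
from scipy.special import polygamma

# Numerical illustration of (21)-(22): psi'(x) = polygamma(1, x).  As n grows,
# {psi'(alpha) - n psi'(n alpha + a0)}^(1/2) approaches {psi'(alpha) - 1/alpha}^(1/2).
def pi1(alpha):
    return np.sqrt(polygamma(1, alpha) - 1.0 / alpha)

def pi2(alpha, n, a0=2.0):                      # a0 is an arbitrary prior shape for beta
    return np.sqrt(polygamma(1, alpha) - n * polygamma(1, n * alpha + a0))

alphas = np.array([0.5, 1.0, 3.0, 10.0])
print(np.round(pi1(alphas), 4))
for n in [2, 20, 2000]:
    print(n, np.round(pi2(alphas, n), 4))        # rows approach the line above
```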
3.4. Neyman-Scott Problem

Suppose that $y_{ij}$, $i = 1, \ldots, n$, $j = 1, 2$, are independent observations, and $y_{ij}$ has a normal distribution with mean $\mu_i$ and variance $\sigma^2$. We want to find the marginal reference prior distribution for $\sigma^2$ or $\sigma$. The Fisher information matrix for $(\mu_1, \ldots, \mu_n, \sigma^2)$ is given by
\[
\Sigma(\mu_1, \ldots, \mu_n, \sigma^2) = \mathrm{diag}(2\sigma^{-2}, \ldots, 2\sigma^{-2},\; n\sigma^{-4}).
\]

Proposition 4. For any conditional prior of $(\mu_1, \ldots, \mu_n)$ given $\sigma^2$, the marginal reference prior for $\sigma^2$ under Option 1 is $\pi_1^r(\sigma^2) \propto \sigma^{-2}$. Equivalently, the marginal reference prior for $\sigma$ is $\pi_1^r(\sigma) \propto \sigma^{-1}$.

Proof. Let $\theta_1 = \sigma^2$ and $\theta_2 = (\mu_1, \ldots, \mu_n)$. We note that $|\Sigma|/|\Sigma_{22}| = n\sigma^{-4}$, which does not depend on $\theta_2$. The first result follows from Theorem 1(a). The second result is an immediate corollary.
Proposition 5. Suppose that the prior distributions for $\mu_i$, $i = 1, \ldots, n$, are independent normal with mean $m_i$ and variance $\tau^2$. The marginal reference prior for $\sigma^2$ under Option 2 then has the form
\[
\pi_2^r(\sigma^2) \propto \{\sigma^{-4} + (\sigma^2 + 2\tau^2)^{-2}\}^{1/2}. \tag{23}
\]
Equivalently, the marginal reference prior for $\sigma$ under Option 2 has the form
\[
\pi_2^r(\sigma) \propto \{\sigma^{-2} + \sigma^2(\sigma^2 + 2\tau^2)^{-2}\}^{1/2}. \tag{24}
\]
Proof. Define $\bar{y}_i = (y_{i1} + y_{i2})/2$ and $S^2 = \sum_{i=1}^n\sum_{j=1}^2(y_{ij} - \bar{y}_i)^2$. The marginal model is
\begin{align*}
p(X_n \mid \sigma) &\propto \frac{1}{\sigma^{2n}}\exp\Big\{-\frac{S^2}{2\sigma^2}\Big\}\prod_{i=1}^n\int_{-\infty}^{\infty}\exp\Big\{-\frac{2(\bar{y}_i - \mu_i)^2}{2\sigma^2}\Big\}\frac{1}{(2\pi)^{1/2}\tau}\exp\Big\{-\frac{(\mu_i - m_i)^2}{2\tau^2}\Big\}\, d\mu_i \\
&\propto \frac{1}{\sigma^n(\sigma^2 + 2\tau^2)^{n/2}}\exp\Big\{-\frac{S^2}{2\sigma^2} - \frac{\sum_{i=1}^n(\bar{y}_i - m_i)^2}{\sigma^2 + 2\tau^2}\Big\}.
\end{align*}
Therefore, writing $\tilde{\sigma} = \sigma^2$, we have
\[
\frac{\partial^2}{\partial\tilde{\sigma}^2}\log p(X_n \mid \tilde{\sigma}) = \frac{n}{2\tilde{\sigma}^2} + \frac{n}{2(\tilde{\sigma} + 2\tau^2)^2} - \frac{S^2}{\tilde{\sigma}^3} - \frac{2\sum_{i=1}^n(\bar{y}_i - m_i)^2}{(\tilde{\sigma} + 2\tau^2)^3}.
\]
Under the marginal model, we have $E_{\tilde{\sigma}}S^2 = n\tilde{\sigma}$ and $E_{\tilde{\sigma}}(\bar{y}_i - m_i)^2 = \tfrac{1}{2}\tilde{\sigma} + \tau^2$. So the Fisher information for $\tilde{\sigma}$ under the marginal model is then
\[
\Sigma^*(\tilde{\sigma}) = \frac{n}{2\tilde{\sigma}^2} + \frac{n}{2(\tilde{\sigma} + 2\tau^2)^2}.
\]
This proves the first assertion. The second assertion follows immediately.
Interestingly, $\pi_1^r(\sigma)$ and $\pi_2^r(\sigma)$ remain substantially different, because of the strong prior input on the $\mu_i$ in Option 2, even if $n \to \infty$. This prior input weakens if we take $\tau^2$ very large, and in that case $\pi_1^r(\sigma)$ and $\pi_2^r(\sigma)$ approximately agree. Also, for any given marginal prior on $(\mu_1, \ldots, \mu_n)$, the conditional reference prior of $\sigma$ is independent of $(\mu_1, \ldots, \mu_n)$, and is the same as $\pi_1^r(\sigma)$.
3.5. Bivariate Normal Distribution
Suppose that $(Y_1, Y_2)$ has a bivariate normal distribution with unknown mean vector $(\mu_1, \mu_2)$, variances equal to one and known correlation $\rho$. The joint density of $(Y_1, Y_2)$ is
\[
p(y_1, y_2 \mid \mu_1, \mu_2) = \frac{1}{2\pi(1 - \rho^2)^{1/2}}\exp\Big\{-\frac{(y_1 - \mu_1)^2 - 2\rho(y_1 - \mu_1)(y_2 - \mu_2) + (y_2 - \mu_2)^2}{2(1 - \rho^2)}\Big\}.
\]
The Fisher information matrix is the inverse of the covariance matrix.
Proposition 6. (a) For any conditional prior of $\mu_2$ given $\mu_1$, $\pi_1^r(\mu_1) \propto 1$.
(b) Suppose that $\mu_1$ and $\mu_2$ are independent. Then, for any subjective prior for $\mu_2$, $\pi_2^r(\mu_1) \propto 1$.

Proof. Part (a) is an immediate corollary of Theorem 1(a). For any subjective prior for $\mu_2$, say $\pi^s(\mu_2)$, the Fisher information for $\mu_1$ from the marginal model is given by
\begin{align*}
\Sigma^*(\mu_1) &= E\Bigg[\Bigg\{\frac{\frac{\partial}{\partial\mu_1}\int_{-\infty}^{\infty} p(Y_1, Y_2 \mid \mu_1, \mu_2)\,\pi^s(\mu_2)\, d\mu_2}{\int_{-\infty}^{\infty} p(Y_1, Y_2 \mid \mu_1, \mu_2)\,\pi^s(\mu_2)\, d\mu_2}\Bigg\}^2\Bigg] \\
&= \int\!\!\int \frac{\Big\{\frac{\partial}{\partial\mu_1}\int_{-\infty}^{\infty} p(y_1, y_2 \mid \mu_1, \mu_2)\,\pi^s(\mu_2)\, d\mu_2\Big\}^2}{\int_{-\infty}^{\infty} p(y_1, y_2 \mid \mu_1, \mu_2)\,\pi^s(\mu_2)\, d\mu_2}\, dy_1\, dy_2 \\
&= \int\!\!\int \frac{\Big[\int_{-\infty}^{\infty} p(y_1, y_2 \mid \mu_1, \mu_2)\,\frac{(y_1 - \mu_1) - \rho(y_2 - \mu_2)}{1 - \rho^2}\,\pi^s(\mu_2)\, d\mu_2\Big]^2}{\int_{-\infty}^{\infty} p(y_1, y_2 \mid \mu_1, \mu_2)\,\pi^s(\mu_2)\, d\mu_2}\, dy_1\, dy_2.
\end{align*}
By making the transformation $t_1 = y_1 - \mu_1$, we see that the right hand side does not depend on $\mu_1$, so the Fisher information for $\mu_1$ is in fact a constant. The result follows immediately.
In general, the marginal priors under Option 1 and Option 2 are different. Here is an example, where $\pi^s(\mu_2 \mid \mu_1)$ does depend on $\mu_1$. Suppose that $\rho = 0$ and assume that $\pi^s(\mu_2 \mid \mu_1)$ is a $N(\mu_1, \mu_1^2)$ distribution. It is easy to see that $Y_1 \mid \mu_1$ has a $N(\mu_1, 1)$ distribution and, independently, $Y_2 \mid \mu_1$ has a $N(\mu_1, 1 + \mu_1^2)$ distribution. The Fisher information based on the observation $Y_1$ is 1. Furthermore,
\[
-\frac{\partial^2}{\partial\mu_1^2}\log p(Y_2 \mid \mu_1) = \frac{2}{1 + \mu_1^2} - \frac{2\mu_1^2}{(1 + \mu_1^2)^2} + \frac{4\mu_1(Y_2 - \mu_1)}{(1 + \mu_1^2)^2} + \frac{4\mu_1^2(Y_2 - \mu_1)^2}{(1 + \mu_1^2)^3} - \frac{(Y_2 - \mu_1)^2}{(1 + \mu_1^2)^2}.
\]
The Fisher information based on the observation $Y_2$ is then $(1 + \mu_1^2)^{-1} + 2\mu_1^2/(1 + \mu_1^2)^2$. Therefore,
\[
\Sigma^*(\mu_1) = 1 + \frac{1}{1 + \mu_1^2} + \frac{2\mu_1^2}{(1 + \mu_1^2)^2}.
\]
Clearly, the reference priors under Option 1 and Option 2 are thus different.
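The Fisher information calculation for $Y_2$ above is easy to verify symbolically; the following check is an illustration only.

```python
import sympy as sp

# Symbolic check of the Y2 computation above: for Y2 | mu1 ~ N(mu1, 1 + mu1^2),
# E[-d^2/dmu1^2 log p(Y2 | mu1)] should reduce to 1/(1+mu1^2) + 2*mu1^2/(1+mu1^2)^2.
mu1, y2, z = sp.symbols('mu1 y2 z', real=True)
v = 1 + mu1**2
logp = -sp.Rational(1, 2) * sp.log(v) - (y2 - mu1)**2 / (2 * v)
neg_d2 = -sp.diff(logp, mu1, 2)

# Write the negative second derivative as a quadratic in z = y2 - mu1, then take
# expectations using E z = 0 and E z^2 = v.
poly = sp.Poly(sp.expand(neg_d2.subs(y2, mu1 + z)), z)
c2, c1, c0 = poly.all_coeffs()
info = sp.simplify(c2 * v + c0)
print(sp.simplify(info - (1 / v + 2 * mu1**2 / v**2)))   # prints 0
```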
3.6. Beta Distribution
Consider the beta density
\[
p(x \mid \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha - 1}(1 - x)^{\beta - 1}, \qquad 0 < x < 1, \tag{25}
\]
where $\alpha > 0$ and $\beta > 0$ are unknown parameters. The Fisher information matrix for $(\alpha, \beta)$ is
\[
\Sigma = \Sigma(\alpha, \beta) = \begin{pmatrix} G(\alpha) - G(\alpha + \beta) & -G(\alpha + \beta) \\ -G(\alpha + \beta) & G(\beta) - G(\alpha + \beta) \end{pmatrix}, \tag{26}
\]
where $G(\alpha) = \frac{d^2}{d\alpha^2}\log\Gamma(\alpha)$ is the polygamma (trigamma) function. A computational formula for $G$ is $G(x) = \sum_{j=0}^{\infty}(x + j)^{-2}$ (Bowman and Shenton, 1988).
We now try to find the conditional reference prior of $\alpha$ given $\beta$. Note that the Fisher information matrix does not satisfy the condition in Theorem 2. We will see that the conclusion of Theorem 2 fails in this case. Let $l_i < u_i$ be two sequences of constants satisfying
\[
l_i \to 0 \quad\text{and}\quad u_i \to \infty. \tag{27}
\]
Let $K_i(\beta) = \int_{l_i}^{u_i}\{G(\alpha) - G(\alpha + \beta)\}^{1/2}\, d\alpha$. When $\beta = 1$, it is easy to show that $G(\alpha) - G(\alpha + 1) = \alpha^{-2}$ and $K_i(1) = \int_{l_i}^{u_i}\alpha^{-1}\, d\alpha = \log(u_i) - \log(l_i)$. For an arbitrary $\beta > 0$, exact computation of $K_i(\beta)$ is quite complicated, but we have the following expansion.

Lemma 1. For fixed $\beta$ and as $i \to \infty$,
\[
K_i(\beta) = \surd\beta\,\log(u_i) - \log(l_i) + O(1),
\]
where $O(1)$ is a bounded term.

Proof. See the Appendix.
Proposition 7. Assume that (27) holds and that $u_i l_i \to 1$ as $i \to \infty$. For any given marginal prior of $\beta$, the conditional reference prior of $\alpha$ given $\beta$ is
\[
\pi^r(\alpha \mid \beta) \propto \frac{1}{\surd\beta + 1}\{G(\alpha) - G(\alpha + \beta)\}^{1/2}.
\]

Proof. From Lemma 1,
\[
\lim_{i \to \infty}\frac{K_i(1)}{K_i(\beta)} = \lim_{i \to \infty}\frac{\log(u_i) - \log(l_i)}{\surd\beta\,\log(u_i) - \log(l_i) + O(1)} = \frac{2}{\surd\beta + 1}.
\]
The result then follows from (11).
This fact illustrates that, when $|\Sigma_{22}(\theta_1, \theta_2)|$ does not have the form (12), $\pi^r(\theta_2 \mid \theta_1)$ may not be proportional to $|\Sigma_{22}(\theta_1, \theta_2)|^{1/2}$. Furthermore, $\pi^r(\theta_2 \mid \theta_1)$ may depend on the
choice of the compact supports $[l_i, u_i]$. For example, if $l_i = u_i^{-c}$ for a constant $c \neq 1$, then Lemma 1 gives $\pi^r(\alpha \mid \beta) \propto (\surd\beta + c)^{-1}\{G(\alpha) - G(\alpha + \beta)\}^{1/2}$, which differs from the result above, although such a choice of the compact sets would be rather unusual.
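A numerical illustration of Lemma 1 and Proposition 7 is sketched below, with $l_i = 1/u_i$ and the assumed value $\beta = 4$; the ratio $K_i(1)/K_i(\beta)$ is seen to approach $2/(\surd\beta + 1)$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import polygamma

# Numerical illustration of Lemma 1 / Proposition 7: with G = trigamma, the mass
# K_i(beta) grows like sqrt(beta)*log(u_i) - log(l_i), so with l_i = 1/u_i the ratio
# K_i(1)/K_i(beta) approaches 2/(sqrt(beta) + 1).
def K(beta, l, u):
    # integrate in t = log(alpha) for numerical stability over many orders of magnitude
    integrand = lambda t: np.sqrt(polygamma(1, np.exp(t))
                                  - polygamma(1, np.exp(t) + beta)) * np.exp(t)
    val, _ = quad(integrand, np.log(l), np.log(u), limit=200)
    return val

beta = 4.0
for u in [1e2, 1e4, 1e8]:
    l = 1.0 / u
    print(u, K(1.0, l, u) / K(beta, l, u), 2.0 / (np.sqrt(beta) + 1.0))
```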
4. When Two Parameters Are Known To Be Independent
4.1. Basic Algorithm
Other types of partial information may be available. For example, one might believe that $\theta_1$ and $\theta_2$ are independent. Then one wants, as a reference prior, the product of marginal reference priors, $\pi_1^r(\theta_1)$ and $\pi_2^r(\theta_2)$. It is not clear how to define these, but Option 1 in Section 2.2 suggests the following iterative algorithm.

Step 0. Choose any initial nonzero marginal prior density for $\theta_2$, say $\pi_2^{(0)}(\theta_2)$.

Step 1. Define an interim prior density for $\theta_1$ by
\[
\pi_1^{(1)}(\theta_1) \propto \exp\Big\{\frac{1}{2}\int \pi_2^{(0)}(\theta_2)\,\log(|\Sigma|/|\Sigma_{22}|)\, d\theta_2\Big\}.
\]

Step 2. Define an interim prior density for $\theta_2$ by
\[
\pi_2^{(1)}(\theta_2) \propto \exp\Big\{\frac{1}{2}\int \pi_1^{(1)}(\theta_1)\,\log(|\Sigma|/|\Sigma_{11}|)\, d\theta_1\Big\}.
\]

Now replace $\pi_2^{(0)}$ in Step 0 by $\pi_2^{(1)}$ and repeat Step 1 and Step 2, to obtain $\pi_1^{(2)}$ and $\pi_2^{(2)}$. Consequently, we generate two sequences $\{\pi_1^{(i)}\}_{i \ge 1}$ and $\{\pi_2^{(i)}\}_{i \ge 1}$. The desired marginal priors will be the limits $\pi_j^r = \lim_{i \to \infty}\pi_j^{(i)}$, $j = 1, 2$, if the limits exist. In applying the iterative algorithm, it may be necessary to operate on compact sets, and then let the sets grow.
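The following is a minimal grid-based sketch of the algorithm, run on fixed compact sets rather than on a growing sequence; the gamma model of Example 2 in Section 4.2 is used as a test case because the answer is known there.

```python
import numpy as np
from scipy.special import polygamma

# Grid-based sketch of the iterative algorithm, on fixed compact sets.  Densities are
# renormalised at every step.  Test case: the gamma model, for which the fixed point is
# pi_1(alpha) propto {psi'(alpha) - 1/alpha}^(1/2) and pi_2(beta) propto 1/beta.
alpha = np.linspace(0.1, 20.0, 400)            # compact set for theta1 = alpha
beta = np.linspace(0.1, 20.0, 400)             # compact set for theta2 = beta
da, db = alpha[1] - alpha[0], beta[1] - beta[0]
A, B = np.meshgrid(alpha, beta, indexing='ij')

det_Sigma = (A * polygamma(1, A) - 1.0) / B**2     # |Sigma| for the gamma model
Sigma11 = polygamma(1, A)                          # information for alpha given beta
Sigma22 = A / B**2                                 # information for beta given alpha

def normalise(f, dx):
    return f / (f.sum() * dx)

pi2 = normalise(np.ones_like(beta), db)            # Step 0: any nonzero starting density
for _ in range(10):
    # Step 1: pi1(alpha) propto exp{ (1/2) int pi2(beta) log(|Sigma|/|Sigma22|) dbeta }
    log1 = (pi2[None, :] * np.log(det_Sigma / Sigma22)).sum(axis=1) * db
    pi1 = normalise(np.exp(0.5 * log1), da)
    # Step 2: pi2(beta) propto exp{ (1/2) int pi1(alpha) log(|Sigma|/|Sigma11|) dalpha }
    log2 = (pi1[:, None] * np.log(det_Sigma / Sigma11)).sum(axis=0) * da
    pi2 = normalise(np.exp(0.5 * log2), db)

target1 = normalise(np.sqrt(polygamma(1, alpha) - 1.0 / alpha), da)
target2 = normalise(1.0 / beta, db)
print(np.abs(pi1 - target1).max(), np.abs(pi2 - target2).max())   # both are tiny
```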
We do not know the extent to which this algorithm converges in general. We have studied several specific situations, and convergence was achieved quickly. For instance, in the two-parameter Weibull model the equations iterate to the usual reference prior given in Sun (1997). It would clearly be of interest to establish conditions under which convergence is guaranteed. For many important situations, it is possible to deduce the result of the above algorithm directly without actually going through the iterations. Here are two sufficient conditions under which this can be done.
Theorem 3. (a) If $|\Sigma|/|\Sigma_{22}|$ does not depend on $\theta_2$, then the marginal reference priors are
\[
\pi_1^r(\theta_1) \propto (|\Sigma|/|\Sigma_{22}|)^{1/2} \quad\text{and}\quad
\pi_2^r(\theta_2) \propto \exp\Big\{\frac{1}{2}\int \pi_1^r(\theta_1)\,\log(|\Sigma|/|\Sigma_{11}|)\, d\theta_1\Big\}. \tag{28}
\]
(b) If $|\Sigma|/|\Sigma_{11}|$ does not depend on $\theta_1$, then the marginal reference priors are
\[
\pi_2^r(\theta_2) \propto (|\Sigma|/|\Sigma_{11}|)^{1/2} \quad\text{and}\quad
\pi_1^r(\theta_1) \propto \exp\Big\{\frac{1}{2}\int \pi_2^r(\theta_2)\,\log(|\Sigma|/|\Sigma_{22}|)\, d\theta_2\Big\}. \tag{29}
\]
Proof. Under the assumption in (a), the interim prior $\pi_1^{(1)}(\theta_1)$ produced in Step 1 does not depend on the choice of $\pi_2^{(0)}$ in Step 0, so the iteration stabilizes at once; (b) is analogous.
The reference priors under the independence assumption are, in general, different
from the reference prior or the reverse reference prior (Berger and Bernardo, 1992). The
following result gives a condition under which they are the same. Its proof is obvious,
and is omitted.
Theorem 4. If the Fisher information matrix of $(\theta_1, \theta_2)$ is of the form
\[
\Sigma(\theta_1, \theta_2) = \mathrm{diag}\{g_1(\theta_1)h_1(\theta_2),\; g_2(\theta_1)h_2(\theta_2)\},
\]
then the independent marginal reference priors are
\[
\pi_1^r(\theta_1) \propto \{g_1(\theta_1)\}^{1/2} \quad\text{and}\quad \pi_2^r(\theta_2) \propto \{h_2(\theta_2)\}^{1/2}. \tag{30}
\]

Under the conditions of the theorem, when either $\theta_1$ or $\theta_2$ is the parameter of interest, the reference priors have the same form: $\pi(\theta_1, \theta_2) \propto \{g_1(\theta_1)h_2(\theta_2)\}^{1/2}$ (cf. Datta and Ghosh, 1995). Therefore, the reference prior and the reverse reference prior are also as in (30).
4.2. Examples For Independent Priors
Example 1: normal distribution. Clearly, when $\mu$ and $\sigma$ are independent, the marginal reference priors are $\pi_1^r(\mu) \propto 1$ and $\pi_2^r(\sigma) \propto 1/\sigma$.
Example 2: gamma distribution. It is easy to see that $|\Sigma|/|\Sigma_{22}|$ does not depend on $\beta$. From Theorem 3, the marginal reference priors are $\pi_1^r(\alpha) \propto \{\psi'(\alpha) - 1/\alpha\}^{1/2}$ and
\[
\pi_2^r(\beta) \propto \exp\Big[\frac{1}{2}\int \pi_1^r(\alpha)\,\log\Big\{\frac{\alpha\psi'(\alpha) - 1}{\beta^2\,\psi'(\alpha)}\Big\}\, d\alpha\Big] \propto \beta^{-1},
\]
where $\pi_1^r$ is normalized over the relevant compact set. This is also the unrestricted reference prior when $\alpha$ is the parameter of interest (Sun and Ye, 1996).
Example 3: bivariate binomial distribution. Crowder and Sweeting (1989) consider the following bivariate binomial distribution, whose probability density is given by
\[
f(r, s \mid p, q) = \binom{m}{r} p^r (1 - p)^{m - r}\,\binom{r}{s} q^s (1 - q)^{r - s},
\]
where $0 < p, q < 1$, and $s$ and $r$ are nonnegative integers satisfying $0 \le s \le r \le m$. The Fisher information matrix for $(p, q)$ is given by
\[
\Sigma = m\,\mathrm{diag}[\{p(1 - p)\}^{-1},\; p\{q(1 - q)\}^{-1}].
\]
Clearly, the Jeffreys prior is proportional to $\{(1 - p)q(1 - q)\}^{-1/2}$. Based on the assumptions that $p$ and $q$ are independent, that $pq$ and $p(1 - q)(1 - pq)^{-1}$ are independent, and some invariance considerations, Crowder and Sweeting (1989) derived the noninformative prior $\pi^{CS}(p, q) \propto \{p(1 - p)q(1 - q)\}^{-1}$. Polson and Wasserman (1990) derived, as the reference prior when either $p$ or $q$ is the parameter of interest, $\pi^r(p, q) \propto \{p(1 - p)q(1 - q)\}^{-1/2}$. From Theorem 3, this is also the reference prior based on independence of $p$ and $q$.
4.3. A Clinical Trial: ECMO
Ware (1989) considered a Bayesian solution of a clinical trial. Ten patients were given standard therapy and six survived. On the other hand, nine patients were treated with ECMO (extracorporeal membrane oxygenation) and all nine survived. Let $p_1$ be the probability of success under standard therapy and $p_2$ be the probability of success under ECMO. It is desired to compare the two treatments. Let $\theta_i = \log\{p_i/(1 - p_i)\}$, $i = 1, 2$, and $\delta = \theta_2 - \theta_1$. The quantity of interest is then the posterior probability that $\delta > 0$, where $\theta_1$ is a nuisance parameter. This example was reanalyzed by Kass and Greenhouse (1989), who considered 84 different proper prior distributions, all involving the independence assumption. They said that the independence assumption is somewhat subtle and reasonable.

A follow-up to Kass and Greenhouse's study was given in Lavine et al. (1991), who studied bounds on the posterior probability that $\delta > 0$ under various priors with and without the independence constraint. Berger and Moreno (1994) also treated the example from a robust Bayesian viewpoint. Lavine et al. (1991) and Berger and Moreno (1994) all showed that, without the independence assumption, the infima of the posterior probability that $\delta > 0$ for a reasonable class of priors might be very small. They also thus suggested use of the independence assumption (assuming, of course, that it was plausible in the application).
For this problem, both the Jeffreys prior and the Berger and Bernardo (1992) reference prior will give a dependent prior for $\delta$ and $\theta_1$. We now derive the reference prior under the independence assumption. First, the Fisher information matrix of $(p_1, p_2)$ is
\[
\Sigma(p_1, p_2) = \mathrm{diag}[n_1/\{p_1(1 - p_1)\},\; n_2/\{p_2(1 - p_2)\}],
\]
where $n_1 = 10$ and $n_2 = 9$. Thus the Fisher information matrix of $(\theta_1, \delta)$ is given by
\[
\Sigma(\theta_1, \delta) = \begin{pmatrix}
\dfrac{n_1 e^{\theta_1}}{(1 + e^{\theta_1})^2} + \dfrac{n_2 e^{\theta_1 + \delta}}{(1 + e^{\theta_1 + \delta})^2} & \dfrac{n_2 e^{\theta_1 + \delta}}{(1 + e^{\theta_1 + \delta})^2} \\[2ex]
\dfrac{n_2 e^{\theta_1 + \delta}}{(1 + e^{\theta_1 + \delta})^2} & \dfrac{n_2 e^{\theta_1 + \delta}}{(1 + e^{\theta_1 + \delta})^2}
\end{pmatrix}. \tag{31}
\]
We can now apply Theorem 3, because $|\Sigma|/|\Sigma_{22}| = n_1 e^{\theta_1}/(1 + e^{\theta_1})^2$ is independent of $\delta$. Note that the two marginal priors are proper, so it is not necessary to use a compact set limiting argument for the derivation.
Proposition 8. (a) Under the constraint that the marginal priors for $\delta$ and $\theta_1$ be independent, the marginal reference priors are of the form
\[
\pi_1^r(\theta_1) = \frac{e^{\theta_1/2}}{\pi(1 + e^{\theta_1})}, \tag{32}
\]
\[
\pi_2^r(\delta) \propto \exp\Big(-\frac{1}{2\pi}\int_0^1 \{t(1 - t)\}^{-1/2}\,\log\Big[1 + \frac{n_1}{n_2}\{(1 - t)e^{-\delta/2} + te^{\delta/2}\}^2\Big]\, dt\Big). \tag{33}
\]
(b) The marginal densities in (a) are both symmetric, and $\exp(\theta_1/2)$ has a folded Cauchy (0,1) density.
Proof. The density in (32) follows from the fact that
\[
\int_{-\infty}^{\infty}\frac{e^{\theta_1/2}}{1 + e^{\theta_1}}\, d\theta_1 = \int_0^1 t^{-1/2}(1 - t)^{-1/2}\, dt = \Gamma(\tfrac{1}{2})^2 = \pi.
\]
For (33),
\begin{align*}
\pi_2^r(\delta) &\propto \exp\Bigg[\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{e^{\theta_1/2}}{1 + e^{\theta_1}}\,\log\Bigg\{\frac{\frac{n_1 e^{\theta_1}}{(1 + e^{\theta_1})^2}\cdot\frac{n_2 e^{\theta_1 + \delta}}{(1 + e^{\theta_1 + \delta})^2}}{\frac{n_1 e^{\theta_1}}{(1 + e^{\theta_1})^2} + \frac{n_2 e^{\theta_1 + \delta}}{(1 + e^{\theta_1 + \delta})^2}}\Bigg\}\, d\theta_1\Bigg] \\
&= \exp\Big[-\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{e^{\theta_1/2}}{1 + e^{\theta_1}}\,\log\Big\{\frac{n_1 e^{\theta_1}(1 + e^{\theta_1 + \delta})^2 + n_2 e^{\theta_1 + \delta}(1 + e^{\theta_1})^2}{e^{\theta_1}\, e^{\theta_1 + \delta}}\Big\}\, d\theta_1\Big] \\
&= \exp\Big[-\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{e^{\theta_1/2}}{1 + e^{\theta_1}}\,\log\Big\{n_1 e^{-(\theta_1 + \delta)}(1 + e^{\theta_1 + \delta})^2 + n_2 e^{-\theta_1}(1 + e^{\theta_1})^2\Big\}\, d\theta_1\Big].
\end{align*}
Making the transformation $t = e^{\theta_1}/(1 + e^{\theta_1})$, we have
\begin{align*}
\pi_2^r(\delta) &\propto \exp\Big[-\frac{1}{2\pi}\int_0^1\{t(1 - t)\}^{-1/2}\,\log\Big\{\frac{n_1\{(1 - t)e^{-\delta/2} + te^{\delta/2}\}^2 + n_2}{t(1 - t)}\Big\}\, dt\Big] \\
&\propto \exp\Big[-\frac{1}{2\pi}\int_0^1\{t(1 - t)\}^{-1/2}\,\log\Big[n_1\{(1 - t)e^{-\delta/2} + te^{\delta/2}\}^2 + n_2\Big]\, dt\Big].
\end{align*}
Formula (33) then follows. Part (b) is clear.
Kass and Greenhouse (1989) found that the posterior probability $P(\delta > 0 \mid \text{data})$ is approximately 0.95 based on the independent proper prior they favoured. For their independent prior in the $\delta$ and $\theta_1$ parameterization, $P(\delta > 0 \mid \text{data})$ was approximately 0.99. Figure 1 compares the independent reference prior density and the resulting posterior density of $\delta$. The resulting posterior probability that $\delta > 0$ is about 0.99. It is interesting that the noninformative prior analysis yields the same conclusion as the Kass and Greenhouse (1989) subjective analysis for the same parameterization, even though it can be shown that the reference priors are considerably more diffuse than the subjective priors of Kass and Greenhouse. Note finally that, even though $\pi_2^r(\delta)$ can be expressed only in terms of an integral, this is not a problem in that computation must be done by Monte Carlo integration in any case.
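As an illustration of that computation, the following sketch evaluates the independent reference prior (32)-(33) and the resulting posterior of $\delta$ on a grid (the grid ranges and sizes are ad hoc choices) and reproduces a posterior probability of roughly 0.99.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import expit, xlogy

# Sketch: posterior of delta under the independent reference prior, for the ECMO data
# (6 successes out of n1 = 10 on standard therapy, 9 out of n2 = 9 on ECMO).
n1, n2, x1, x2 = 10, 9, 6, 9

def log_prior_delta(delta):
    # log pi_2^r(delta) from (33); t = sin^2(phi) removes the {t(1-t)}^(-1/2) singularity
    def integrand(phi):
        t = np.sin(phi)**2
        u = (1 - t) * np.exp(-delta / 2) + t * np.exp(delta / 2)
        return 2.0 * np.log1p((n1 / n2) * u**2)
    val, _ = quad(integrand, 0.0, np.pi / 2)
    return -val / (2 * np.pi)

theta1 = np.linspace(-15, 15, 301)          # nuisance parameter grid
delta = np.linspace(-15, 30, 451)           # parameter of interest grid
T, D = np.meshgrid(theta1, delta, indexing='ij')

p1, p2 = expit(T), expit(T + D)
loglik = xlogy(x1, p1) + xlogy(n1 - x1, 1 - p1) + xlogy(x2, p2) + xlogy(n2 - x2, 1 - p2)
log_prior = (T / 2 - np.logaddexp(0, T)) \
            + np.array([log_prior_delta(d) for d in delta])[None, :]

log_post = loglik + log_prior
post = np.exp(log_post - log_post.max())
post_delta = post.sum(axis=0)                        # unnormalised marginal posterior of delta
print(post_delta[delta > 0].sum() / post_delta.sum())   # roughly 0.99
```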
5. Discussion
We have proposed two options to find the marginal reference prior for $\theta_1$ when the conditional prior for $\theta_2$ is known. Option 2 was felt to be the most natural approach, but difficulty in its implementation will usually necessitate use of the easy Option 1. Table 1 summarizes, for the examples in this paper, when the two options are known to yield the same answer. Note, however, that, for all examples considered in which the two options give different answers, with the exception of the Neyman-Scott problem, $\pi_1^r(\cdot)$ and $\pi_2^r(\cdot)$ agree asymptotically, as $n \to \infty$. This lends further support to general use of the simple Option 1. In the Neyman-Scott problem, the two marginal reference priors do remain different as $n \to \infty$, but, since the number of unknown parameters grows with $n$, this is perhaps not unexpected.

The conditional reference prior, $\pi^r(\theta_2 \mid \theta_1)$, is usually given by (10), and does not then depend on the specified marginal prior for $\theta_1$. However, as shown in the example of the beta distribution in Section 3.6, $\pi^r(\theta_2 \mid \theta_1)$ can differ from (10).

In dealing with the partial prior knowledge that $\theta_1$ and $\theta_2$ are independent, an iterative application of the reference prior algorithm was proposed. While this can be trivially implemented in many important special cases, its general applicability and convergence require further study.
Appendix. Proof of Lemma 1

Let
\[
J(\alpha, \beta) = \sum_{j=1}^{\infty}\big\{(\alpha + j)^{-2} - (\alpha + \beta + j)^{-2}\big\}.
\]
Then
\[
G(\alpha) - G(\alpha + \beta) = \alpha^{-2} - (\alpha + \beta)^{-2} + J(\alpha, \beta) = \frac{\beta(2\alpha + \beta)}{\alpha^2(\alpha + \beta)^2} + J(\alpha, \beta).
\]
Define $h(x) = (\alpha + x)^{-2} - (\alpha + \beta + x)^{-2}$ for $x \ge 0$. Since $h'(x) = -2\{(\alpha + x)^{-3} - (\alpha + \beta + x)^{-3}\} < 0$ for $x > 0$, $h(x)$ is a monotone decreasing function of $x$. Thus,
\[
J(\alpha, \beta) \ge \int_1^{\infty}\{(\alpha + x)^{-2} - (\alpha + \beta + x)^{-2}\}\, dx
= (\alpha + 1)^{-1} - (\alpha + \beta + 1)^{-1}
= \frac{\beta}{(\alpha + 1)(\alpha + \beta + 1)}.
\]
On the other hand, for any $x \ge 1$ and $\beta > 0$,
\[
\frac{1}{x^2} - \frac{1}{(x + \beta)^2} - \Big\{\frac{1}{x^2 - \frac{1}{4}} - \frac{1}{(x + \beta)^2 - \frac{1}{4}}\Big\}
= \{(x + \beta)^2 - x^2\}\Big\{\frac{1}{x^2(x + \beta)^2} - \frac{1}{(x^2 - \frac{1}{4})\{(x + \beta)^2 - \frac{1}{4}\}}\Big\},
\]
which is negative. Thus, for any $j \ge 1$,
\[
\frac{1}{(\alpha + j)^2} - \frac{1}{(\alpha + \beta + j)^2}
\le \frac{1}{(\alpha + j)^2 - \frac{1}{4}} - \frac{1}{(\alpha + \beta + j)^2 - \frac{1}{4}}
= \int_{j - \frac{1}{2}}^{j + \frac{1}{2}}\Big\{\frac{1}{(\alpha + x)^2} - \frac{1}{(\alpha + \beta + x)^2}\Big\}\, dx.
\]
Therefore,
\[
J(\alpha, \beta) \le \int_{\frac{1}{2}}^{\infty}\Big\{\frac{1}{(\alpha + x)^2} - \frac{1}{(\alpha + \beta + x)^2}\Big\}\, dx
= \frac{\beta}{(\alpha + \frac{1}{2})(\alpha + \beta + \frac{1}{2})}.
\]
Consequently, $H_i^L(\beta) \le K_i(\beta) \le H_i^U(\beta)$, where
\begin{align*}
H_i^L(\beta) &= \int_{l_i}^{u_i}\Big\{\frac{\beta(2\alpha + \beta)}{\alpha^2(\alpha + \beta)^2} + \frac{\beta}{(\alpha + 1)(\alpha + \beta + 1)}\Big\}^{1/2}\, d\alpha, \\
H_i^U(\beta) &= \int_{l_i}^{u_i}\Big\{\frac{\beta(2\alpha + \beta)}{\alpha^2(\alpha + \beta)^2} + \frac{\beta}{(\alpha + \frac{1}{2})(\alpha + \beta + \frac{1}{2})}\Big\}^{1/2}\, d\alpha.
\end{align*}
For any $0 < \varepsilon < 1$ small enough and $i$ large enough such that $l_i < \varepsilon < 1/\varepsilon < u_i$, we have
\begin{align*}
H_i^U(\beta) &= \int_{l_i}^{\varepsilon}\Big\{\frac{\beta(2\alpha + \beta)}{\alpha^2(\alpha + \beta)^2} + \frac{\beta}{(\alpha + \frac{1}{2})(\alpha + \beta + \frac{1}{2})}\Big\}^{1/2}\, d\alpha + O(1) \\
&\qquad + \int_{1/\varepsilon}^{u_i}\frac{\surd\beta}{\alpha}\Big\{\frac{2\alpha^{-1} + \beta\alpha^{-2}}{(1 + \beta\alpha^{-1})^2} + \frac{1}{(1 + \tfrac{1}{2}\alpha^{-1})\{1 + (\beta + \tfrac{1}{2})\alpha^{-1}\}}\Big\}^{1/2}\, d\alpha \\
&= \int_{l_i}^{\varepsilon}\frac{1}{\alpha}\{1 + O(\varepsilon)\}\, d\alpha + O(1) + \int_{1/\varepsilon}^{u_i}\frac{\surd\beta}{\alpha}\{1 + O(\varepsilon)\}\, d\alpha \\
&= \surd\beta\,\log(u_i) - \log(l_i) + O(1).
\end{align*}
Similarly, we have $H_i^L(\beta) = \surd\beta\,\log(u_i) - \log(l_i) + O(1)$. This completes the proof.
Acknowledgements
Sun's research was partially supported by a National Security Agency grant and a
Research Board award from the University of Missouri-System. Berger's research was supported under a National Science Foundation grant. The manuscript was prepared using computer facilities supported in part by a National Science Foundation grant awarded to the Department of Statistics at the University of Missouri-Columbia. The authors
gratefully acknowledge the very constructive comments of the editors (Professors A.P.
Dawid and D.M. Titterington), an associate editor, and two referees.
References
Berger, J. O. and Bernardo, J. M. (1989). Estimating a product of means:
Bayesian analysis with reference priors. J. Am. Statist. Assoc., 84, 200-207.
Berger, J. O. and Bernardo, J. M. (1992). On the development of reference priors
(with Discussion). In Bayesian Statistics, 4, Eds. J. M. Bernardo, J.O. Berger, A.P.
Dawid, and A.F.M. Smith, pp 35-60, Oxford University Press.
Berger, J. O. and Moreno, E. (1994). Bayesian robustness in bidimensional mod-
els: prior independence. J. Statistical Planning and Inference, 40, 161-176.
Bernardo, J. M. (1979). Reference posterior distributions for Bayesian inference. J.
Roy. Statist. Soc. Ser. B, 41, 113-147.
Bowman, K.O. and Shenton, L.R. (1988). Properties of Estimators for the Gamma
Distribution. New York: Marcel Dekker.
Crowder, M. and Sweeting, T. (1989). Bayesian inference for a bivariate binomial
distribution. Biometrika, 76, 599-603.
Datta, G.S. and Ghosh, M. (1995). Some remarks on noninformative priors. J.
Am. Statist. Assoc., 90, 1357-1363.
Ghosh, J.K. and Mukerjee, R. (1992). Noninformative priors (with Discussion).
In Bayesian Statistics, 4, Eds. J.M. Bernardo, J.O. Berger, A.P. Dawid and A.F.M.
Smith, pp 195-210, Oxford University Press.
Kass, R.E. and Greenhouse, J. (1989). "Investigating therapies of potentially great
benefit: a Bayesian perspective," comment on "Investigating therapies of potentially
great benefit: ECMO," by J.H. Ware, Statistical Science, 4, 310-314.
Kass, R.E. and Wasserman, L. (1996). The selection of prior distributions by formal
rules, J. Am. Statist. Assoc., 91, 1343-1370.
Lavine, M., Wasserman, L., and Wolpert, R.L. (1991). Bayesian inference with
specied prior marginals. J. Am. Statist. Assoc., 86, 964-971.
Polson, N. and Wasserman, L. (1990). Prior distributions for the bivariate bino-
mial. Biometrika, 77, 901-904.
Sun, D. (1997). A note on noninformative priors for Weibull distributions. Journal of
Statistical Planning and Inference, in press.
Sun, D. and Ye, K. (1996). Frequentist validity of posterior quantiles for a two-parameter
exponential family. Biometrika, 83, 55-65.
Ware, J. H. (1989). Investigating therapies of potentially great benefit: ECMO (with
Discussion). Statistical Science, 4, 298-340.
Table 1: Comparison of Marginal Reference Priors from Two Options

Distribution                                              $(\theta_1, \theta_2)$                    $\pi_2^r(\theta_1) = \pi_1^r(\theta_1)$?
Multinomial $(p_1, p_2)$                                  $(p_1, p_2)$                              Yes, if (16) holds
Normal $(\mu, \sigma^2)$                                  $(\mu, \sigma)$                           Yes, if $\pi^s(\sigma \mid \mu) = \pi^s(\sigma)$
                                                          $(\sigma, \mu)$                           No
Gamma $(\alpha, \beta)$                                   $(\alpha, \beta)$                         No
Neyman-Scott                                              $(\sigma^2, \{\mu_1, \ldots, \mu_n\})$    No
Normal$_2((\mu_1, \mu_2), 1, 1, \rho)$, $\rho$ is known   $(\mu_1, \mu_2)$                          Yes, if $\mu_1$ and $\mu_2$ are independent
[Figure 1 here. Legend: Prior, Posterior. Horizontal axis: $\delta$, from -20 to 20; vertical axis: density, from 0.0 to 0.20.]

Fig. 1: Prior and posterior densities of $\delta$ from the ECMO data.