Reference priors with partial information

By D. SUN
Department of Statistics, University of Missouri-Columbia, Columbia, MO 65211, U.S.A.
e-mail: [email protected]

and J. O. BERGER
Institute of Statistics and Decision Sciences, Duke University, Durham, NC 27708, U.S.A.
e-mail: [email protected]

Summary

In this paper, reference priors are derived for three cases where partial information is available. If a subjective conditional prior is given, two reasonable methods are proposed for finding the marginal reference prior. If, instead, a subjective marginal prior is available, a method for defining the conditional reference prior is proposed. A sufficient condition is then given under which this conditional reference prior agrees with the conditional reference prior derived in the first stage of the reference prior algorithm of Berger and Bernardo. Finally, under the assumption of independence, a method for finding marginal reference priors is also proposed. Various examples are given to illustrate the methods.

Some key words: Kullback-Leibler divergence; Noninformative prior; Normal distribution; Gamma distribution; Beta distribution; Neyman-Scott problem.

1. Introduction

Bayesian analysis using noninformative or default priors has received considerable attention in recent years. A common noninformative prior is the Jeffreys prior (Jeffreys, 1961), which is proportional to the square root of the determinant of the Fisher information matrix. The Jeffreys prior is quite useful for a single parameter, but can be seriously deficient in multiparameter problems (cf. Berger and Bernardo, 1992). For a recent review of various approaches to the development of noninformative priors, see Kass and Wasserman (1996). Here, we concentrate on the reference prior approach, as developed in Bernardo (1979) and Berger and Bernardo (1992).

In many practical problems, one has partial prior information for some of the parameters. For example, in a $N(\mu,\sigma^2)$ population, one might possess reasonably strong prior information about $\mu$, while the prior information for $\sigma$ is vague. As another example, Lavine et al. (1991) considered robust Bayesian inference with specified prior marginals. As a third example, the prior knowledge could be of independence of the parameters, such as in the ECMO clinical trial example studied by Ware (1989). Another common type of partial information is constraints on the parameter space. This is typically easily handled, however, in that reference priors for a constrained space are almost always just the unconstrained reference prior times the indicator function of the constrained space.

The paper is arranged as follows. In Section 2, we develop reference priors when two types of partial information are available. We first consider the case in which a conditional prior is known and it is desired to find the marginal reference prior. Two options are given: one is similar to the final stage of Berger and Bernardo's reference prior algorithm; the other is more intuitive and is based on deriving the marginal model. Next, the conditional reference prior is derived when a marginal prior is known. A sufficient condition is found under which this conditional noninformative prior agrees with the conditional reference prior from the first stage of Berger and Bernardo's algorithm. Some examples are given in Section 3, illustrating various aspects of these results. In Section 4, an algorithm is proposed for determining marginal reference priors when the two parameters are known to be independent.
Some sufficient conditions are given under which the answers can be written in closed form. Formal examples and the ECMO clinical trial example are used for illustration. Finally, Section 5 contains some discussion.

2. Knowing A Marginal Or Conditional Prior

2.1. Introduction

Let $X_n = (x_1,\ldots,x_n)$ be a random sample from the density $p(x\mid\theta_1,\theta_2)$, where the parameters $\theta_1$ and $\theta_2$ are vectors of dimensions $d_1$ and $d_2$, respectively. Let $\pi(\theta_1,\theta_2)$ denote the prior density of $(\theta_1,\theta_2)$. The following questions are of interest.

1. Suppose there is available a subjective conditional prior density $\pi^s(\theta_2\mid\theta_1)$ for $\theta_2$ given $\theta_1$. How can we find the marginal noninformative prior $\pi^r(\theta_1)$ for $\theta_1$?

2. Suppose there is available a subjective marginal prior density $\pi^s(\theta_1)$ for $\theta_1$. How can we find the conditional noninformative prior $\pi^r(\theta_2\mid\theta_1)$ for $\theta_2$ given $\theta_1$?

Solutions to these two questions will be discussed in §§2.2 and 2.3, respectively. We will use $\pi(\theta_1,\theta_2\mid X_n)$ to denote the joint posterior density of $\theta_1$ and $\theta_2$, and $\pi(\theta_1\mid X_n)$ the marginal posterior density of $\theta_1$.

2.2. Finding The Marginal Prior

When the conditional density of $\theta_2$ given $\theta_1$ is available, there are two reasonable options for finding a marginal reference prior $\pi^r(\theta_1)$.

Option 1: Following Bernardo (1979), define the expected Kullback-Leibler divergence between the marginal posterior density of $\theta_1$ given $X_n$ and the marginal prior of $\theta_1$ by
$$\Psi_n(\pi^r) = E\Big[\int \pi(\theta_1\mid X_n)\,\log\frac{\pi(\theta_1\mid X_n)}{\pi^r(\theta_1)}\,d\theta_1\Big], \qquad (1)$$
where the expectation is with respect to the marginal density $m(X_n) = \int p(X_n\mid\theta_1)\pi^r(\theta_1)\,d\theta_1$. We seek that prior, $\pi^r(\theta_1)$, which maximizes (1) asymptotically, since maximizing the distance between the prior and the posterior is a plausible way to define a prior which has minimal influence on the analysis. It follows from Ghosh and Mukerjee (1992) that, under some regularity conditions and as $n\to\infty$,
$$\Psi_n(\pi^r) = \frac{d_1}{2}\log\frac{n}{2\pi e} + \int \pi^r(\theta_1)\log\frac{\psi(\theta_1)}{\pi^r(\theta_1)}\,d\theta_1 + o(1), \qquad (2)$$
where
$$\psi(\theta_1) = \exp\Big\{\frac{1}{2}\int \pi^s(\theta_2\mid\theta_1)\,\log\frac{|I(\theta_1,\theta_2)|}{|I_{22}(\theta_1,\theta_2)|}\,d\theta_2\Big\}. \qquad (3)$$
Here $I = I(\theta_1,\theta_2)$ is the per observation Fisher information matrix for $(\theta_1,\theta_2)$, $I_{22} = I_{22}(\theta_1,\theta_2)$ is the per observation Fisher information for $\theta_2$ when $\theta_1$ is held fixed, and $|I|$ is the determinant of $I$. The reference prior strategy suggests choosing $\pi^r$ to maximize (1) or (2) asymptotically on compact sets; this can easily be seen to lead to
$$\pi_1^r(\theta_1) \propto \psi(\theta_1). \qquad (4)$$
In fact, this is essentially the solution used in Berger and Bernardo (1989, 1992). The following theorem gives an important special case.

Theorem 1. (a) If $|I|/|I_{22}|$ does not depend on $\theta_2$, then, for any subjective conditional density $\pi^s(\theta_2\mid\theta_1)$, the marginal reference prior from Option 1 has the form
$$\pi_1^r(\theta_1) \propto \big(|I|/|I_{22}|\big)^{1/2}. \qquad (5)$$
(b) If $|I|/|I_{22}|$ depends only on $\theta_2$ and $\pi^s(\theta_2\mid\theta_1)$ does not depend on $\theta_1$, then $\pi_1^r(\theta_1) \propto 1$.

Proof. The results follow from the definition of $\pi_1^r(\theta_1)$ and the assumptions.

Option 2: Find the marginal model
$$p(X_n\mid\theta_1) = \int p(X_n\mid\theta_1,\theta_2)\,\pi^s(\theta_2\mid\theta_1)\,d\theta_2. \qquad (6)$$
Let $I^*(\theta_1)$ be the Fisher information matrix for $\theta_1$ based on this marginal model. Then the reference prior for $\theta_1$ in this model, again maximizing asymptotically the expected Kullback-Leibler divergence between the marginal posterior and the marginal prior, is
$$\pi_2^r(\theta_1) \propto \{|I^*(\theta_1)|\}^{1/2}. \qquad (7)$$
Option 2 more closely mirrors the underlying motivation for reference priors in that, with $\pi^s(\theta_2\mid\theta_1)$ given, the information in the data about $\theta_1$ resides in $p(X_n\mid\theta_1)$. Hence the marginal reference prior for $\theta_1$ should, ideally, be computed with respect to this mixture model. Unfortunately, the Fisher information matrix for such mixture models is often difficult to compute, so that implementation of Option 2 is often difficult. This same difficulty also motivated the use of the analogue of Option 1 in Berger and Bernardo (1989, 1992), in place of the more natural Option 2.
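To make Option 1 concrete, the following is a minimal numerical sketch of (3)-(4): average $\log(|I|/|I_{22}|)$ over the subjective conditional prior and exponentiate half of the result. It anticipates the normal model of Section 3.2, with $\theta_1=\sigma$, $\theta_2=\mu$ and a $N(m,\tau^2)$ conditional prior; the helper names (log_info_ratio, psi_option1), the grid and the Monte Carlo size are ours, not the paper's.

```python
# A minimal sketch of Option 1, equations (3)-(4): psi(theta1) is
# exp{ (1/2) E[ log(|I|/|I_22|) ] }, the expectation being over the
# subjective conditional prior pi_s(theta2 | theta1).
import numpy as np

rng = np.random.default_rng(0)

def log_info_ratio(sigma, mu):
    # log(|I|/|I_22|) for one observation from N(mu, sigma^2):
    # I = diag(sigma^-2, 2 sigma^-2), I_22 (the mu-block) = sigma^-2,
    # so |I|/|I_22| = 2 / sigma^2, free of mu (Theorem 1(a) applies).
    return np.log(2.0) - 2.0 * np.log(sigma)

def psi_option1(sigma, m=0.0, tau=1.0, n_mc=2000):
    # Monte Carlo average of (1/2) log(|I|/|I_22|) over mu ~ N(m, tau^2).
    mu = rng.normal(m, tau, size=n_mc)
    return np.exp(0.5 * np.mean(log_info_ratio(sigma, mu)))

sigmas = np.linspace(0.5, 5.0, 10)
psi = np.array([psi_option1(s) for s in sigmas])
# psi * sigma should be (numerically) constant, i.e. psi(sigma) is
# proportional to 1/sigma, in agreement with Proposition 2(a) below.
print(psi * sigmas)
```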
2.3. Finding The Conditional Prior

For the case when the marginal prior density $\pi^s(\theta_1)$ is known, consider the expected Kullback-Leibler divergence between the conditional posterior density of $\theta_2$, given $\theta_1$ and $X_n$, and the conditional prior of $\theta_2$ given $\theta_1$:
$$\Psi_n(\pi^r) = E\Big[\int \pi(\theta_1\mid X_n)\Big\{\int \pi(\theta_2\mid\theta_1,X_n)\log\frac{\pi(\theta_2\mid\theta_1,X_n)}{\pi^r(\theta_2\mid\theta_1)}\,d\theta_2\Big\}d\theta_1\Big]$$
$$= E\Big[\int\!\!\int \pi(\theta_1,\theta_2\mid X_n)\log\frac{\pi(\theta_1,\theta_2\mid X_n)}{\pi(\theta_1,\theta_2)}\,d\theta_2\,d\theta_1\Big] - E\Big[\int \pi(\theta_1\mid X_n)\log\frac{\pi(\theta_1\mid X_n)}{\pi(\theta_1)}\,d\theta_1\Big]. \qquad (8)$$
From an asymptotic expansion of the first term, compare formula (1.1) of Ghosh and Mukerjee (1992, p. 192) and (2), we have that
$$\Psi_n(\pi^r) = \frac{d_2}{2}\log\frac{n}{2\pi e} + \int \pi^s(\theta_1)\Big\{\int \pi^r(\theta_2\mid\theta_1)\log\frac{|I_{22}(\theta_1,\theta_2)|^{1/2}}{\pi^r(\theta_2\mid\theta_1)}\,d\theta_2\Big\}d\theta_1 + o(1). \qquad (9)$$
Choosing $\pi^r(\theta_2\mid\theta_1)$ to maximize (9) asymptotically thus suggests defining the conditional reference prior as
$$\pi^r(\theta_2\mid\theta_1) \propto |I_{22}(\theta_1,\theta_2)|^{1/2}. \qquad (10)$$
When this conditional reference prior is proper, matters are straightforward. In practice it may be improper, and will then encounter normalization concerns. The compact support argument that is typically used in the reference prior approach (Berger and Bernardo, 1992) may then be applied here. Choose a nested sequence $\Omega_1\subset\Omega_2\subset\cdots$ of compact subsets of the parameter space $\Omega$ for $(\theta_1,\theta_2)$, such that $\bigcup_i\Omega_i = \Omega$ and $\pi^r(\theta_2\mid\theta_1)$ has finite mass on $\Omega_i(\theta_1) = \{\theta_2 : (\theta_1,\theta_2)\in\Omega_i\}$. Let $1_A$ be the indicator function of $A$ and
$$K_i(\theta_1) = \int_{\Omega_i(\theta_1)} |I_{22}(\theta_1,\theta_2)|^{1/2}\,d\theta_2.$$
The conditional reference prior of $\theta_2$ on $\Omega_i$ is
$$\pi_i^r(\theta_2\mid\theta_1) = \frac{|I_{22}(\theta_1,\theta_2)|^{1/2}}{K_i(\theta_1)}\,1_{\Omega_i(\theta_1)}(\theta_2).$$
Now define the conditional reference prior of $\theta_2$ by
$$\pi^r(\theta_2\mid\theta_1) = \lim_{i\to\infty}\frac{\pi_i^r(\theta_2\mid\theta_1)}{\pi_i^r(\theta_2^0\mid\theta_1^0)},$$
when the limit exists; here $(\theta_1^0,\theta_2^0)$ is any fixed point. It is easy to see that
$$\pi^r(\theta_2\mid\theta_1) \propto \lim_{i\to\infty}\frac{K_i(\theta_1^0)}{K_i(\theta_1)}\,|I_{22}(\theta_1,\theta_2)|^{1/2}. \qquad (11)$$
The following theorem gives a sufficient condition under which this limit is proportional to $|I_{22}(\theta_1,\theta_2)|^{1/2}$.

Theorem 2. Assume that
$$|I_{22}(\theta_1,\theta_2)| = g_1(\theta_1)\,g_2(\theta_2) \qquad (12)$$
for some functions $g_1$ and $g_2$. Suppose $\Omega = \Omega^{(1)}\times\Omega^{(2)}$ and the compact sets are chosen to be of the form $\Omega_i = \Omega_{1i}\times\Omega_{2i}$. Then the conditional reference prior of $\theta_2$ satisfies
$$\pi^r(\theta_2\mid\theta_1) \propto |I_{22}(\theta_1,\theta_2)|^{1/2} \propto \{g_2(\theta_2)\}^{1/2}.$$
Proof. Under (12) and the rectangular form of the $\Omega_i$, the normalizing constants factor as $K_i(\theta_1) = \{g_1(\theta_1)\}^{1/2}c_i$ with $c_i$ free of $\theta_1$, so that $\pi_i^r(\theta_2\mid\theta_1)$ does not depend on $\theta_1$. The result follows immediately.

Note that the conditional reference prior, $\pi^r(\theta_2\mid\theta_1)$, never depends on the specified marginal prior $\pi^s(\theta_1)$.
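The compact-support construction in (10)-(11) is easy to carry out numerically. The sketch below normalises $|I_{22}|^{1/2}$ on growing sets and forms the ratio against a fixed point; the information used, $I_{22}(\theta_1,\theta_2)=\theta_1/\theta_2^2$, is the factorised form met again in the gamma example of Section 3.3, and the function names and grids are ours.

```python
# A small sketch of (10)-(11): renormalise |I_22|^{1/2} on growing compact
# sets [lo, hi] and take the limit against a fixed point (theta1^0, theta2^0).
import numpy as np
from scipy.integrate import quad

def info22(theta1, theta2):
    return theta1 / theta2**2            # satisfies (12): g1(theta1) g2(theta2)

def K(theta1, lo, hi):
    val, _ = quad(lambda t2: np.sqrt(info22(theta1, t2)), lo, hi)
    return val

theta1_ref, theta2_ref = 1.0, 1.0        # the fixed point in (11)
for lo, hi in [(1e-1, 1e1), (1e-2, 1e2), (1e-3, 1e3)]:
    def pi_i(t1, t2):
        return np.sqrt(info22(t1, t2)) / K(t1, lo, hi)   # normalised on [lo, hi]
    ref = pi_i(theta1_ref, theta2_ref)
    # For a factorised I_22 the K_i's cancel (Theorem 2) and the ratio is
    # already proportional to g2(theta2)^{1/2} = 1/theta2 for every i.
    print(lo, hi, pi_i(3.0, 2.0) / ref, pi_i(3.0, 4.0) / ref)
```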
3. Examples

3.1. Multinomial Distribution

Consider the multinomial density for 3 cells,
$$p(y_1,y_2\mid p_1,p_2) = \frac{k!}{y_1!\,y_2!\,(k-y_1-y_2)!}\,p_1^{y_1}p_2^{y_2}(1-p_1-p_2)^{k-y_1-y_2}, \qquad (13)$$
where $k$ is a positive integer, $y_i$ is the observed frequency in cell $i$, $p_i$ is the probability of cell $i$, and $0\le p_1+p_2\le 1$. Without loss of generality, we suppress the cell count and probability for the third cell. The Fisher information matrix for $(p_1,p_2)$ is
$$I(p_1,p_2) = k\Big\{\frac{1}{1-p_1-p_2}\begin{pmatrix}1&1\\1&1\end{pmatrix} + \begin{pmatrix}p_1^{-1}&0\\0&p_2^{-1}\end{pmatrix}\Big\}. \qquad (14)$$

The marginal reference prior. Given $p_1$, assume that $p_2$ has a conditional density $\pi^s(p_2\mid p_1)$ on $(0,\,1-p_1)$. We want to find the marginal reference prior for $p_1$.

Option 1. Since $|I| = k^2/\{p_1p_2(1-p_1-p_2)\}$ and $I_{22} = k\{p_2^{-1}+(1-p_1-p_2)^{-1}\}$, we have $|I|/I_{22} = k/\{p_1(1-p_1)\}$. From Theorem 1, the marginal reference prior for $p_1$ is
$$\pi_1^r(p_1) \propto \{p_1(1-p_1)\}^{-1/2}. \qquad (15)$$

Option 2. Assume that $n$ observations $x_i = (y_{1i},y_{2i})$, $i=1,\ldots,n$, are obtained. Note that, in reference prior developments, one considers replications of the full experiment in creating the asymptotics. A considerable simplification results if
$$\pi^s(p_2\mid p_1) = (1-p_1)^{-1}\,g\{p_2/(1-p_1)\} \qquad (16)$$
for some density $g$ on $(0,1)$. This says that the random variable $p_2/(1-p_1)$ has the same distribution for any $p_1$. Then the marginal model is
$$p(X_n\mid p_1) \propto \int_0^{1-p_1} p_1^{y_{1+}}p_2^{y_{2+}}(1-p_1-p_2)^{nk-y_{1+}-y_{2+}}\,\frac{1}{1-p_1}\,g\Big(\frac{p_2}{1-p_1}\Big)\,dp_2$$
$$= p_1^{y_{1+}}(1-p_1)^{nk-y_{1+}}\int_0^1 s^{y_{2+}}(1-s)^{nk-y_{1+}-y_{2+}}g(s)\,ds \;\propto\; p_1^{y_{1+}}(1-p_1)^{nk-y_{1+}},$$
where $y_{i+} = \sum_{j=1}^n y_{ij}$. The Fisher information corresponding to this model is then
$$I^*(p_1) = \frac{nk}{p_1(1-p_1)}. \qquad (17)$$
Therefore, $\pi_2^r(p_1)$ is also given by (15) when (16) holds. If (16) does not hold, the marginal reference priors will typically be different.

The conditional reference prior of $p_2$. For any marginal subjective prior on $p_1$, the conditional reference prior of $p_2$ is
$$\pi^r(p_2\mid p_1) \propto \{p_2(1-p_1-p_2)\}^{-1/2}, \qquad 0<p_2<1-p_1.$$

3.2. Normal Distribution

Consider the normal density with mean $\mu$ and standard deviation $\sigma$,
$$p(x\mid\mu,\sigma) = \frac{1}{(2\pi)^{1/2}\sigma}\exp\Big\{-\frac{(x-\mu)^2}{2\sigma^2}\Big\}, \qquad -\infty<x<\infty. \qquad (18)$$
The Fisher information matrix for $(\mu,\sigma)$ is $I = I(\mu,\sigma) = \mathrm{diag}(\sigma^{-2},\,2\sigma^{-2})$.

Case I. Marginal and conditional reference priors for $\mu$.

Proposition 1. (a) If $\pi^s(\sigma\mid\mu)$ does not depend on $\mu$, the marginal reference prior under Option 1 is $\pi_1^r(\mu)\propto 1$. (b) If $\pi^s(\sigma\mid\mu)$ does not depend on $\mu$, the marginal reference prior under Option 2 is $\pi_2^r(\mu)\propto 1$. (c) For any given marginal prior of $\sigma$, the conditional reference prior of $\mu$ is $\pi^r(\mu\mid\sigma)\propto 1$.

Proof. Note that $|I|/I_{22} = \sigma^{-2}$, and (a) follows from Theorem 1(b). For (b), since $\pi^s(\sigma\mid\mu) = \pi^s(\sigma)$, the marginal probability density $p(X_n\mid\mu)$ is a scale mixture of normals and hence a location probability density. Consequently, the Fisher information for $\mu$ is constant. Part (c) is obvious.

When $\pi^s(\sigma\mid\mu)$ depends on $\mu$, Option 1 and Option 2 can generate different marginal priors for $\mu$.

Case II. Marginal and conditional reference priors for $\sigma$.

Proposition 2. (a) For any conditional prior of $\mu$ given $\sigma$, the marginal prior for $\sigma$ under Option 1 has the form $\pi_1^r(\sigma)\propto 1/\sigma$. (b) If the prior distribution for $\mu$ is normal with mean $m$ and variance $\tau^2$, then the marginal prior for $\sigma$ under Option 2 has the form
$$\pi_2^r(\sigma) \propto \sigma\Big\{\frac{n-1}{\sigma^4} + \frac{1}{(\sigma^2+n\tau^2)^2}\Big\}^{1/2}. \qquad (19)$$
(c) For any given marginal prior on $\mu$, the conditional prior for $\sigma$ is also $\propto 1/\sigma$, independent of $\mu$.

Proof. Part (a) is immediate from Theorem 1(a). Part (c) is easy. For (b), let $X_n=(x_1,\ldots,x_n)$ be a random sample from $N(\mu,\sigma^2)$. Define $\bar x_n = (x_1+\cdots+x_n)/n$ and $S^2 = \sum_{i=1}^n(x_i-\bar x_n)^2$. The marginal model is
$$p(X_n\mid\sigma) = \int_{-\infty}^{\infty}\frac{1}{(2\pi)^{n/2}\sigma^n}\exp\Big\{-\frac{S^2}{2\sigma^2}-\frac{n(\bar x_n-\mu)^2}{2\sigma^2}\Big\}\,\frac{1}{(2\pi)^{1/2}\tau}\exp\Big\{-\frac{(\mu-m)^2}{2\tau^2}\Big\}\,d\mu$$
$$\propto \frac{1}{\sigma^{n-1}(\sigma^2+n\tau^2)^{1/2}}\exp\Big\{-\frac{S^2}{2\sigma^2}-\frac{n(\bar x_n-m)^2}{2(\sigma^2+n\tau^2)}\Big\}.$$
Therefore, writing $\tilde\sigma = \sigma^2$, we have
$$\frac{\partial^2}{\partial\tilde\sigma^2}\log p(X_n\mid\tilde\sigma) = \frac{n-1}{2\tilde\sigma^2}+\frac{1}{2(\tilde\sigma+n\tau^2)^2}-\frac{S^2}{\tilde\sigma^3}-\frac{n(\bar x_n-m)^2}{(\tilde\sigma+n\tau^2)^3}.$$
Under the marginal model, $E(S^2) = (n-1)\tilde\sigma$ and $E(\bar x_n-m)^2 = \tilde\sigma/n+\tau^2$. So the Fisher information for $\tilde\sigma$ under the marginal model is
$$I^*(\tilde\sigma) = \frac{n-1}{2\tilde\sigma^2} + \frac{1}{2(\tilde\sigma+n\tau^2)^2},$$
and the result is immediate upon transforming back to $\sigma$.

Note that $\pi_2^r(\sigma)$ differs from $\pi_1^r(\sigma)$. When $n\to\infty$, however, $\pi_2^r(\sigma)$ converges to $\pi_1^r(\sigma)$.
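The convergence claim at the end of Section 3.2 is easy to check numerically. The sketch below normalises the two priors on a grid and compares them as $n$ grows; the grid, $\tau=1$ and the normalisation are ours.

```python
# Numerical illustration of the remark after Proposition 2: the Option 2
# prior (19) approaches the Option 1 prior 1/sigma as n grows.
import numpy as np

def prior_option2(sigma, n, tau=1.0):
    # formula (19): sigma * {(n-1)/sigma^4 + (sigma^2 + n tau^2)^-2}^{1/2}
    return sigma * np.sqrt((n - 1) / sigma**4 + (sigma**2 + n * tau**2) ** -2)

sigma = np.linspace(0.2, 5.0, 400)
p1 = 1.0 / sigma
p1 /= p1.sum()                          # normalise on the grid
for n in (2, 10, 100, 1000):
    p2 = prior_option2(sigma, n)
    p2 /= p2.sum()
    print(n, np.max(np.abs(p2 - p1)))   # shrinks towards 0 as n increases
```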
3.3. Gamma Distribution

Consider the gamma density
$$p(x\mid\alpha,\beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,x^{\alpha-1}\exp(-\beta x), \qquad x>0,$$
where $\alpha>0$ and $\beta>0$. The Fisher information matrix for $(\alpha,\beta)$ is
$$I = I(\alpha,\beta) = \begin{pmatrix}\psi'(\alpha) & -1/\beta\\ -1/\beta & \alpha/\beta^2\end{pmatrix}, \qquad (20)$$
where $\psi(\alpha) = d\log\Gamma(\alpha)/d\alpha$.

Proposition 3. (a) For any conditional density of $\beta$ given $\alpha$, the marginal reference prior for $\alpha$ under Option 1 is given by
$$\pi_1^r(\alpha) \propto \{\psi'(\alpha)-\alpha^{-1}\}^{1/2}. \qquad (21)$$
(b) Assume that the conditional distribution of $\beta$ given $\alpha$ is Gamma$(a,b)$. The marginal reference prior for $\alpha$ under Option 2 is given by
$$\pi_2^r(\alpha) \propto \{\psi'(\alpha)-n\,\psi'(n\alpha+a)\}^{1/2}. \qquad (22)$$
(c) For any given marginal prior on $\alpha$, the conditional reference prior for $\beta$ is independent of $\alpha$ and is given by $\beta^{-1}$.

Proof. It is easy to see that $|I| = \beta^{-2}\{\alpha\psi'(\alpha)-1\}$ and $I_{22} = \alpha/\beta^2$. Thus $|I|/I_{22} = \psi'(\alpha)-\alpha^{-1}$, which implies (21). Part (c) is clear. For (b), let $X_n = (x_1,\ldots,x_n)$ be a random sample from the gamma distribution. The marginal model is then
$$p(X_n\mid\alpha) = \frac{(\prod_{i=1}^n x_i)^{\alpha-1}}{\Gamma(\alpha)^n}\int_0^\infty \beta^{n\alpha}\exp\Big(-\beta\sum_{i=1}^n x_i\Big)\,\frac{b^a}{\Gamma(a)}\,\beta^{a-1}\exp(-b\beta)\,d\beta \propto \frac{\Gamma(n\alpha+a)\,(\prod_{i=1}^n x_i)^{\alpha-1}}{\Gamma(\alpha)^n\,(\sum_{i=1}^n x_i + b)^{n\alpha+a}}.$$
The second derivative of the logarithm of $p(X_n\mid\alpha)$ is $-n\psi'(\alpha)+n^2\psi'(n\alpha+a)$, from which (b) follows immediately.

Using an expansion for $\psi'(\cdot)$ (cf. equation (1.45b) of Bowman and Shenton, 1988), we can show that $\psi'(x) = x^{-1}+(2x^2)^{-1}+O(x^{-3})$ as $x\to\infty$. Therefore, $n\psi'(n\alpha+a)\to\alpha^{-1}$ as $n\to\infty$. Consequently, the marginal reference prior for $\alpha$ under Option 2 converges to the marginal reference prior for $\alpha$ under Option 1.
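The convergence of (22) to (21) can again be checked numerically, using the trigamma function $\psi'(\cdot)$ available as polygamma(1, .); the grid, the value $a=2$ and the normalisation below are ours.

```python
# Numerical check of the convergence noted after Proposition 3: the Option 2
# prior (22) approaches the Option 1 prior (21) as n -> infinity.
import numpy as np
from scipy.special import polygamma

alpha = np.linspace(0.2, 10.0, 300)
a = 2.0
p1 = np.sqrt(polygamma(1, alpha) - 1.0 / alpha)                  # (21)
p1 /= p1.sum()
for n in (1, 5, 50, 500):
    p2 = np.sqrt(polygamma(1, alpha) - n * polygamma(1, n * alpha + a))   # (22)
    p2 /= p2.sum()
    print(n, np.max(np.abs(p2 - p1)))                            # decreases with n
```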
3.4. Neyman-Scott Problem

Suppose that $y_{ij}$, $i=1,\ldots,n$, $j=1,2$, are independent observations, and $y_{ij}$ has a normal distribution with mean $\mu_i$ and variance $\sigma^2$. We want to find the marginal reference prior distribution for $\sigma^2$ or $\sigma$. The Fisher information matrix for $(\mu_1,\ldots,\mu_n,\sigma^2)$ is given by
$$I(\mu_1,\ldots,\mu_n,\sigma^2) = \mathrm{diag}(2\sigma^{-2},\ldots,2\sigma^{-2},\,n\sigma^{-4}).$$

Proposition 4. For any conditional prior of $(\mu_1,\ldots,\mu_n)$ given $\sigma^2$, the marginal reference prior for $\sigma^2$ under Option 1 is $\pi_1^r(\sigma^2)\propto\sigma^{-2}$. Equivalently, the marginal reference prior for $\sigma$ is $\pi_1^r(\sigma)\propto\sigma^{-1}$.

Proof. Let $\theta_1 = \sigma^2$ and $\theta_2 = (\mu_1,\ldots,\mu_n)$. We note that $|I|/|I_{22}| = n\sigma^{-4}$, which does not depend on $\theta_2$. The first result follows from Theorem 1(a). The second result is an immediate corollary.

Proposition 5. Suppose that the prior distributions for $\mu_i$, $i=1,\ldots,n$, are independent normal with mean $m_i$ and variance $\tau^2$. The marginal reference prior for $\sigma^2$ under Option 2 then has the form
$$\pi_2^r(\sigma^2) \propto \{\sigma^{-4} + (\sigma^2+2\tau^2)^{-2}\}^{1/2}. \qquad (23)$$
Equivalently, the marginal reference prior for $\sigma$ under Option 2 has the form
$$\pi_2^r(\sigma) \propto \{\sigma^{-2} + \sigma^{2}(\sigma^2+2\tau^2)^{-2}\}^{1/2}. \qquad (24)$$

Proof. Define $\bar y_i = (y_{i1}+y_{i2})/2$ and $S^2 = \sum_{i=1}^n\sum_{j=1}^2(y_{ij}-\bar y_i)^2$. The marginal model is
$$p(X_n\mid\sigma) \propto \frac{1}{\sigma^{2n}}\exp\Big(-\frac{S^2}{2\sigma^2}\Big)\prod_{i=1}^n\int_{-\infty}^{\infty}\exp\Big\{-\frac{(\bar y_i-\mu_i)^2}{\sigma^2}\Big\}\frac{1}{\tau}\exp\Big\{-\frac{(\mu_i-m_i)^2}{2\tau^2}\Big\}\,d\mu_i$$
$$\propto \frac{1}{\sigma^{n}(\sigma^2+2\tau^2)^{n/2}}\exp\Big\{-\frac{S^2}{2\sigma^2}-\frac{\sum_{i=1}^n(\bar y_i-m_i)^2}{\sigma^2+2\tau^2}\Big\}.$$
Therefore, writing $\tilde\sigma=\sigma^2$, we have
$$\frac{\partial^2}{\partial\tilde\sigma^2}\log p(X_n\mid\tilde\sigma) = \frac{n}{2\tilde\sigma^2}+\frac{n}{2(\tilde\sigma+2\tau^2)^2}-\frac{S^2}{\tilde\sigma^3}-\frac{2\sum_{i=1}^n(\bar y_i-m_i)^2}{(\tilde\sigma+2\tau^2)^3}.$$
Under the marginal model, $E(S^2) = n\tilde\sigma$ and $E(\bar y_i-m_i)^2 = \tilde\sigma/2+\tau^2$. So the Fisher information for $\tilde\sigma$ under the marginal model is
$$I^*(\tilde\sigma) = \frac{n}{2\tilde\sigma^2} + \frac{n}{2(\tilde\sigma+2\tau^2)^2}.$$
This proves the first assertion. The second assertion follows immediately.

Interestingly, $\pi_1^r(\sigma)$ and $\pi_2^r(\sigma)$ remain substantially different, because of the strong prior input on the $\mu_i$ in Option 2, even as $n\to\infty$. This prior input weakens if we take $\tau^2$ very large, and in that case $\pi_1^r(\sigma)$ and $\pi_2^r(\sigma)$ approximately agree. Also, for any given marginal prior on $(\mu_1,\ldots,\mu_n)$, the conditional prior of $\sigma$ is independent of $(\mu_1,\ldots,\mu_n)$, and is the same as $\pi_1^r(\sigma)$.

3.5. Bivariate Normal Distribution

Suppose that $(Y_1,Y_2)$ has a bivariate normal distribution with unknown mean vector $(\mu_1,\mu_2)$, variances equal to one, and known correlation $\rho$. The joint density of $(Y_1,Y_2)$ is
$$p(y_1,y_2\mid\mu_1,\mu_2) = \frac{1}{2\pi(1-\rho^2)^{1/2}}\exp\Big\{-\frac{(y_1-\mu_1)^2-2\rho(y_1-\mu_1)(y_2-\mu_2)+(y_2-\mu_2)^2}{2(1-\rho^2)}\Big\}.$$
The Fisher information matrix is the inverse of the covariance matrix.

Proposition 6. (a) For any conditional prior of $\mu_2$ given $\mu_1$, $\pi_1^r(\mu_1)\propto 1$. (b) Suppose that $\mu_1$ and $\mu_2$ are independent. Then, for any subjective prior for $\mu_2$, $\pi_2^r(\mu_1)\propto 1$.

Proof. Part (a) is an immediate corollary of Theorem 1(a). For any subjective prior for $\mu_2$, say $\pi^s(\mu_2)$, the Fisher information of $\mu_1$ from the marginal model is given by
$$I^*(\mu_1) = E\Big[\Big\{\frac{\partial}{\partial\mu_1}\log\int_{-\infty}^{\infty}p(Y_1,Y_2\mid\mu_1,\mu_2)\,\pi^s(\mu_2)\,d\mu_2\Big\}^2\Big]$$
$$= \int\!\!\int\frac{\Big[\int_{-\infty}^{\infty}p(y_1,y_2\mid\mu_1,\mu_2)\,\dfrac{(y_1-\mu_1)-\rho(y_2-\mu_2)}{1-\rho^2}\,\pi^s(\mu_2)\,d\mu_2\Big]^2}{\int_{-\infty}^{\infty}p(y_1,y_2\mid\mu_1,\mu_2)\,\pi^s(\mu_2)\,d\mu_2}\,dy_1\,dy_2.$$
By making the transformation $t_1 = y_1-\mu_1$, we see that the right-hand side does not depend on $\mu_1$, so the Fisher information for $\mu_1$ is in fact a constant. The result follows immediately.

In general, the marginal priors under Option 1 and Option 2 are different. Here is an example in which $\pi^s(\mu_2\mid\mu_1)$ does depend on $\mu_1$. Suppose that $\rho=0$ and assume that $\pi^s(\mu_2\mid\mu_1)$ is a $N(\mu_1,\mu_1^2)$ distribution. It is easy to see that $Y_1\mid\mu_1$ has a $N(\mu_1,1)$ distribution and, independently, $Y_2\mid\mu_1$ has a $N(\mu_1,\,1+\mu_1^2)$ distribution. The Fisher information based on the observation $Y_1$ is 1. Furthermore,
$$-\frac{\partial^2}{\partial\mu_1^2}\log p(Y_2\mid\mu_1) = \frac{2}{1+\mu_1^2}-\frac{2\mu_1^2}{(1+\mu_1^2)^2}+\frac{4\mu_1(Y_2-\mu_1)}{(1+\mu_1^2)^2}+\frac{4\mu_1^2(Y_2-\mu_1)^2}{(1+\mu_1^2)^3}-\frac{(Y_2-\mu_1)^2}{(1+\mu_1^2)^2}.$$
The Fisher information based on the observation $Y_2$ is then $(1+\mu_1^2)^{-1}+2\mu_1^2/(1+\mu_1^2)^2$. Therefore,
$$I^*(\mu_1) = 1 + \frac{1}{1+\mu_1^2} + \frac{2\mu_1^2}{(1+\mu_1^2)^2}.$$
Clearly, the reference priors under Option 1 and Option 2 are thus different.

3.6. Beta Distribution

Consider the beta density
$$p(x\mid\alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1}, \qquad 0<x<1, \qquad (25)$$
where $\alpha>0$ and $\beta>0$ are unknown parameters. The Fisher information matrix for $(\alpha,\beta)$ is
$$I = I(\alpha,\beta) = \begin{pmatrix}G(\alpha)-G(\alpha+\beta) & -G(\alpha+\beta)\\ -G(\alpha+\beta) & G(\beta)-G(\alpha+\beta)\end{pmatrix}, \qquad (26)$$
where $G(\alpha) = d^2\log\Gamma(\alpha)/d\alpha^2$ is the trigamma function. A computational formula for $G$ is $G(x) = \sum_{j=0}^{\infty}(x+j)^{-2}$ (Bowman and Shenton, 1988).

We now try to find the conditional reference prior of $\beta$ given $\alpha$. Note that the Fisher information matrix does not satisfy the condition of Theorem 2. We will see that the conclusion of Theorem 2 fails in this case. Let $l_i<u_i$ be two sequences of constants satisfying
$$l_i\to 0 \quad\text{and}\quad u_i\to\infty. \qquad (27)$$
Let $K_i(\alpha) = \int_{l_i}^{u_i}\{G(\beta)-G(\alpha+\beta)\}^{1/2}\,d\beta$. When $\alpha=1$, it is easy to show that $G(\beta)-G(\beta+1) = \beta^{-2}$ and $K_i(1) = \int_{l_i}^{u_i}\beta^{-1}\,d\beta = \log(u_i)-\log(l_i)$. For an arbitrary $\alpha>0$, exact computation of $K_i(\alpha)$ is quite complicated, but we have the following expansion.

Lemma 1. For fixed $\alpha$ and as $i\to\infty$,
$$K_i(\alpha) = \sqrt\alpha\,\log(u_i) - \log(l_i) + O(1),$$
where the $O(1)$ term remains bounded.

Proof. See the Appendix.

Proposition 7. Assume that (27) holds and that $u_il_i\to 1$ as $i\to\infty$. For any given marginal prior of $\alpha$, the conditional reference prior of $\beta$ given $\alpha$ is
$$\pi^r(\beta\mid\alpha) \propto \frac{1}{\sqrt\alpha+1}\{G(\beta)-G(\alpha+\beta)\}^{1/2}.$$
Proof. From Lemma 1,
$$\lim_{i\to\infty}\frac{K_i(1)}{K_i(\alpha)} = \lim_{i\to\infty}\frac{\log(u_i)-\log(l_i)}{\sqrt\alpha\,\log(u_i)-\log(l_i)+O(1)} = \frac{2}{\sqrt\alpha+1}.$$
The result then follows from (11).

This fact illustrates that, when $|I_{22}(\theta_1,\theta_2)|$ does not have the form (12), $\pi^r(\theta_2\mid\theta_1)$ may not be proportional to $|I_{22}(\theta_1,\theta_2)|^{1/2}$. Furthermore, $\pi^r(\theta_2\mid\theta_1)$ may depend on the choice of the compact supports $[l_i,u_i]$. For example, if $l_i = u_i^{-\sqrt\alpha}$, then $\pi^r(\beta\mid\alpha) \propto \alpha^{-1/2}\{G(\beta)-G(\alpha+\beta)\}^{1/2}$, although such a choice of the compact sets would be rather unusual.
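Lemma 1 and Proposition 7 are easy to probe numerically: with $l_i = 1/u_i$, the ratio $K_i(1)/K_i(\alpha)$ should approach $2/(1+\sqrt\alpha)$. The sketch below does this with a log substitution to tame the integrand; the particular $\alpha$ and the sequence of $u_i$ are ours.

```python
# Numerical look at Lemma 1 / Proposition 7 for the beta example:
# K_i(alpha) = int_{l_i}^{u_i} {G(beta) - G(alpha+beta)}^{1/2} d beta,
# with G = polygamma(1, .) the trigamma function and l_i = 1/u_i.
import numpy as np
from scipy.integrate import quad
from scipy.special import polygamma

def K(alpha, lo, hi):
    # substitute beta = exp(x); the transformed integrand is bounded
    f = lambda x: np.exp(x) * np.sqrt(polygamma(1, np.exp(x))
                                      - polygamma(1, alpha + np.exp(x)))
    val, _ = quad(f, np.log(lo), np.log(hi), limit=200)
    return val

alpha = 4.0
for u in (1e2, 1e4, 1e6):
    ratio = K(1.0, 1.0 / u, u) / K(alpha, 1.0 / u, u)
    print(u, ratio, 2.0 / (1.0 + np.sqrt(alpha)))   # ratio approaches 2/3 here
```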
4. When Two Parameters Are Known To Be Independent

4.1. Basic Algorithm

Other types of partial information may be available. For example, one might believe that $\theta_1$ and $\theta_2$ are independent. Then one wants, as a reference prior, the product of marginal reference priors, $\pi_1^r(\theta_1)$ and $\pi_2^r(\theta_2)$. It is not clear how to define these, but Option 1 of Section 2.2 suggests the following iterative algorithm.

Step 0. Choose any initial nonzero marginal prior density for $\theta_2$, say $\pi_2^{(0)}(\theta_2)$.

Step 1. Define an interim prior density for $\theta_1$ by
$$\pi_1^{(1)}(\theta_1) \propto \exp\Big\{\frac12\int\pi_2^{(0)}(\theta_2)\,\log\big(|I|/|I_{22}|\big)\,d\theta_2\Big\}.$$

Step 2. Define an interim prior density for $\theta_2$ by
$$\pi_2^{(1)}(\theta_2) \propto \exp\Big\{\frac12\int\pi_1^{(1)}(\theta_1)\,\log\big(|I|/|I_{11}|\big)\,d\theta_1\Big\}.$$

Now replace $\pi_2^{(0)}$ in Step 0 by $\pi_2^{(1)}$ and repeat Step 1 and Step 2, to obtain $\pi_1^{(2)}$ and $\pi_2^{(2)}$. Proceeding in this way, we generate two sequences $\{\pi_1^{(i)}\}_{i\ge 1}$ and $\{\pi_2^{(i)}\}_{i\ge 1}$. The desired marginal priors are the limits $\pi_j^r = \lim_{i\to\infty}\pi_j^{(i)}$, $j=1,2$, if the limits exist. In applying the iterative algorithm, it may be necessary to operate on compact sets, and then let the sets grow.

We do not know the extent to which this algorithm converges in general. We have studied several specific situations, and convergence was achieved quickly. For instance, in the two-parameter Weibull model the equations iterate to the usual reference prior given in Sun (1997). It would clearly be of interest to establish conditions under which convergence is guaranteed.

For many important situations, it is possible to deduce the result of the above algorithm directly, without actually going through the iterations. Here are two sufficient conditions under which this can be done.

Theorem 3. (a) If $|I|/|I_{22}|$ does not depend on $\theta_2$, then the marginal reference priors are
$$\pi_1^r(\theta_1) \propto \big(|I|/|I_{22}|\big)^{1/2} \quad\text{and}\quad \pi_2^r(\theta_2) \propto \exp\Big\{\frac12\int\pi_1^r(\theta_1)\,\log\big(|I|/|I_{11}|\big)\,d\theta_1\Big\}. \qquad (28)$$
(b) If $|I|/|I_{11}|$ does not depend on $\theta_1$, then the marginal reference priors are
$$\pi_2^r(\theta_2) \propto \big(|I|/|I_{11}|\big)^{1/2} \quad\text{and}\quad \pi_1^r(\theta_1) \propto \exp\Big\{\frac12\int\pi_2^r(\theta_2)\,\log\big(|I|/|I_{22}|\big)\,d\theta_2\Big\}. \qquad (29)$$
Proof. Under the assumption in (a), the prior produced in Step 1 does not depend on the choice of $\pi_2^{(0)}$ in Step 0, so the iteration stabilizes immediately.

The reference priors under the independence assumption are, in general, different from the reference prior or the reverse reference prior (Berger and Bernardo, 1992). The following result gives a condition under which they are the same. Its proof is obvious, and is omitted.

Theorem 4. If the Fisher information matrix of $(\theta_1,\theta_2)$ is of the form
$$I(\theta_1,\theta_2) = \mathrm{diag}\{g_1(\theta_1)h_1(\theta_2),\;g_2(\theta_1)h_2(\theta_2)\},$$
then the independent marginal reference priors are
$$\pi_1^r(\theta_1) \propto \{g_1(\theta_1)\}^{1/2} \quad\text{and}\quad \pi_2^r(\theta_2) \propto \{h_2(\theta_2)\}^{1/2}. \qquad (30)$$
Under the conditions of the theorem, when either $\theta_1$ or $\theta_2$ is the parameter of interest, the reference priors have the same form, $\pi(\theta_1,\theta_2)\propto\{g_1(\theta_1)h_2(\theta_2)\}^{1/2}$ (cf. Datta and Ghosh, 1995). Therefore, the reference prior and the reverse reference prior are also as in (30).
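A grid version of the Steps 0-2 iteration is straightforward. The sketch below runs it for the gamma model of Section 3.3, working on a fixed compact rectangle (one member of the nested sequence) and normalising each interim prior on its grid; the grids and the uniform starting prior are ours. Theorem 3(a) predicts the fixed point $\pi_1^r(\alpha)\propto\{\psi'(\alpha)-1/\alpha\}^{1/2}$ and $\pi_2^r(\beta)\propto 1/\beta$, and, because $|I|/|I_{22}|$ is free of $\beta$ here, the iteration settles after a single sweep.

```python
# Grid sketch of the Section 4.1 iteration for the gamma model.
import numpy as np
from scipy.special import polygamma

alpha = np.linspace(0.2, 10.0, 200)          # compact grid for theta1 = alpha
beta = np.linspace(0.2, 10.0, 200)           # compact grid for theta2 = beta
A, B = np.meshgrid(alpha, beta, indexing="ij")

log_det = np.log((A * polygamma(1, A) - 1.0) / B**2)      # log |I(alpha, beta)|
log_ratio_1 = log_det - np.log(A / B**2)                   # log(|I|/|I_22|)
log_ratio_2 = log_det - np.log(polygamma(1, A))            # log(|I|/|I_11|)

pi2 = np.ones_like(beta)
pi2 /= pi2.sum()                                           # Step 0: any starting prior
for _ in range(5):
    pi1 = np.exp(0.5 * (log_ratio_1 * pi2[None, :]).sum(axis=1))   # Step 1
    pi1 /= pi1.sum()
    pi2 = np.exp(0.5 * (log_ratio_2 * pi1[:, None]).sum(axis=0))   # Step 2
    pi2 /= pi2.sum()

# Compare with the closed forms predicted by Theorem 3(a).
target1 = np.sqrt(polygamma(1, alpha) - 1.0 / alpha)
target1 /= target1.sum()
target2 = 1.0 / beta
target2 /= target2.sum()
print(np.max(np.abs(pi1 - target1)), np.max(np.abs(pi2 - target2)))
```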
Based on the as- sumptions that p and q are independent, that = pq and = p(1 q)(1 pq);1 are ; ; independent, and some invariance considerations, Crowder and Sweeting (1989) derived the noninformative prior, CS (p q) p(1 p)q(1 q) ;1. Polson and Wasserman / f ; ; g (1990) derived, as the reference prior when either p or q is the parameter of interest, r (p q) / f p(1 p)q(1 q) ; ; ; 21 g . From Theorem 3, this is also the reference prior based on independence of p and q. 19 4.3. A Clinical Trial: ECMO Ware (1989) considered a Bayesian solution of a clinical trial. Ten patients were given standard therapy and six survived. On the other hand, nine patients were treated with ECMO (extra corporeal membrane oxygenation) and all nine survived. Let p1 be the probability of success under standard therapy and p2 be the probability of success under ECMO. It is desired to compare the two treatments. Let i = log pi =(1 pi ) i = f 1 2 and = 2 ; ; g 1: The quantity of interest is then the posterior probability that > 0 where 1 is a nuisance parameter. This example was reanalyzed by Kass and Greenhouse (1989), who considered 84 dierent proper prior distributions, all involving the independence assumption. They said that the independence assumption is somewhat subtle and reasonable. A follow-up to Kass and Greenhouse's study was given in Lavine et al. (1991), who studied bounds on the posterior probability that > 0 under various priors with and without the independence constraint. Berger and Moreno (1994) also treated the example from a robust Bayesian viewpoint. Lavine et al. (1991) and Berger and Moreno (1994) all showed that, without the independence assumption, the inma of the posterior probability that > 0 for a reasonable class of priors might be very small. They also thus suggested use of the independence assumption (assuming, of course, that it was plausible in the application). For this problem, both the Jereys prior and the Berger and Bernardo (1992) reference prior will give a dependent prior for and 1. We now derive the reference prior under the independence assumption. First, the Fisher information matrix of (p1 p2) is (p1 p2) = diag!n1= p1(1 p1 ) n2= p2 (1 p2) ] f ; 20 g f ; g where n1 = 10 and n2 = 9. Thus the Fisher information matrix of (1 ) is given by 0 n1 e1 + n2e1 + n2 e1 + 1 (1+e 1 )2 (1+e1 + )2 (1+e1 + )2 A: @ (31) + + n2 e n2 e 1 (1+e1 + )2 1 (1+e1 + )2 We can now apply Theorem 3, because = 22 = n1e1 =(1 + e1 )2 is independent of . j j j j Note that the two marginal priors are proper, so it is not necessary to use a compact set limiting argument for the derivation. (a) Under the constraint that the marginal priors for and 1 be Proposition 8. independent, the marginal reference priors are of the form 1r(1) = e1=2= (1 + e1 ) Z1 2r () exp 21 t(1 t) 0 f / (32) g f ; ; g ; 21 i h log 1 + nn1 (1 t)e;=2 + te=2 2 dt : (33) 2 f ; g (b) The marginal densities in (a) are both symmetric and exp(1 =2) has a folded Cauchy (0,1) density. Proof. 
Proposition 8. (a) Under the constraint that the marginal priors for $\eta$ and $\theta_1$ be independent, the marginal reference priors are of the form
$$\pi_1^r(\theta_1) = \frac{e^{\theta_1/2}}{\pi(1+e^{\theta_1})}, \qquad (32)$$
$$\pi_2^r(\eta) \propto \exp\Big[-\frac{1}{2\pi}\int_0^1\{t(1-t)\}^{-1/2}\log\Big\{1+\frac{n_1}{n_2}\big((1-t)e^{-\eta/2}+te^{\eta/2}\big)^2\Big\}\,dt\Big]. \qquad (33)$$
(b) The marginal densities in (a) are both symmetric, and $\exp(\theta_1/2)$ has a folded Cauchy(0,1) density.

Proof. The density in (32) follows from the fact that
$$\int_{-\infty}^{\infty}\frac{e^{\theta_1/2}}{1+e^{\theta_1}}\,d\theta_1 = \int_0^1 t^{-1/2}(1-t)^{-1/2}\,dt = \Gamma(\tfrac12)^2 = \pi.$$
For (33), write $A = n_1e^{\theta_1}/(1+e^{\theta_1})^2$ and $B = n_2e^{\theta_1+\eta}/(1+e^{\theta_1+\eta})^2$, so that $|I|/I_{11} = AB/(A+B)$. Then
$$\pi_2^r(\eta) \propto \exp\Big\{\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{e^{\theta_1/2}}{1+e^{\theta_1}}\,\log\frac{AB}{A+B}\,d\theta_1\Big\} \propto \exp\Big[-\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{e^{\theta_1/2}}{1+e^{\theta_1}}\,\log\big\{n_1e^{-(\theta_1+\eta)}(1+e^{\theta_1+\eta})^2 + n_2e^{-\theta_1}(1+e^{\theta_1})^2\big\}\,d\theta_1\Big].$$
Making the transformation $t = e^{\theta_1}/(1+e^{\theta_1})$, the expression in braces becomes $\{n_1((1-t)e^{-\eta/2}+te^{\eta/2})^2+n_2\}/\{t(1-t)\}$, and therefore
$$\pi_2^r(\eta) \propto \exp\Big[-\frac{1}{2\pi}\int_0^1\{t(1-t)\}^{-1/2}\log\big\{n_1\big((1-t)e^{-\eta/2}+te^{\eta/2}\big)^2+n_2\big\}\,dt\Big].$$
Formula (33) then follows. Part (b) is clear.

Kass and Greenhouse (1989) found that the posterior probability $P(\eta>0\mid\text{data})$ is approximately 0.95, based on the independent proper prior they favoured. For their independent prior in the $\eta$ and $\theta_1$ parameterization, $P(\eta>0\mid\text{data})$ was approximately 0.99. Figure 1 compares the independent reference prior density and the resulting posterior density of $\eta$. The resulting posterior probability that $\eta>0$ is about 0.99. It is interesting that the noninformative prior analysis yields the same conclusion as the Kass and Greenhouse (1989) subjective analysis for the same parameterization, even though it can be shown that the reference priors are considerably more diffuse than the subjective priors of Kass and Greenhouse. Note, finally, that even though $\pi_2^r(\eta)$ can be expressed only in terms of an integral, this is not a problem, in that computation must be done by Monte Carlo integration in any case.
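The computation just described is routine; the sketch below evaluates (33) by one-dimensional quadrature and the posterior probability that $\eta>0$ for the ECMO data by simple grid integration. The grids and the substitution $t=\sin^2 u$ (which removes the endpoint singularity of $\{t(1-t)\}^{-1/2}$) are ours; the text reports a value of about 0.99.

```python
# Sketch of the numerical work behind Section 4.3: independence reference
# prior (32)-(33) and P(eta > 0 | data) for 6/10 and 9/9 successes.
import numpy as np
from scipy.integrate import quad

n1, n2 = 10, 9
x1, x2 = 6, 9                                    # observed successes

def log_prior_eta(eta):
    # log of (33), up to an additive constant, with t = sin(u)^2
    f = lambda u: 2.0 * np.log(1 + (n1 / n2) *
                               (np.cos(u)**2 * np.exp(-eta / 2)
                                + np.sin(u)**2 * np.exp(eta / 2))**2)
    val, _ = quad(f, 0.0, np.pi / 2)
    return -val / (2 * np.pi)

theta1 = np.linspace(-10, 10, 401)
eta = np.linspace(-20, 20, 801)
T, E = np.meshgrid(theta1, eta, indexing="ij")
p1 = 1 / (1 + np.exp(-T))
p2 = 1 / (1 + np.exp(-(T + E)))

# log-likelihood (all nine ECMO patients survived, so no (1-p2) term)
# plus log of (32) plus log of (33).
log_post = (x1 * np.log(p1) + (n1 - x1) * np.log(1 - p1) + x2 * np.log(p2)
            + T / 2 - np.log(1 + np.exp(T))
            + np.array([log_prior_eta(e) for e in eta])[None, :])
post = np.exp(log_post - log_post.max())
print(post[:, eta > 0].sum() / post.sum())       # compare with ~0.99 in the text
```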
5. Discussion

We have proposed two options for finding the marginal reference prior for $\theta_1$ when the conditional prior for $\theta_2$ is known. Option 2 was felt to be the most natural approach, but difficulty in its implementation will usually necessitate use of the easier Option 1. Table 1 summarizes, for the examples in this paper, when the two options are known to yield the same answer. Note, however, that for all examples considered in which the two options give different answers, with the exception of the Neyman-Scott problem, $\pi_1^r(\cdot)$ and $\pi_2^r(\cdot)$ agree asymptotically as $n\to\infty$. This lends further support to general use of the simple Option 1. In the Neyman-Scott problem the two marginal reference priors do remain different as $n\to\infty$ but, since the number of unknown parameters grows with $n$, this is perhaps not unexpected.

The conditional reference prior, $\pi^r(\theta_2\mid\theta_1)$, is usually given by (10), and does not then depend on the specified marginal prior for $\theta_1$. However, as shown in the example of the beta distribution in Section 3.6, $\pi^r(\theta_2\mid\theta_1)$ can differ from (10).

In dealing with the partial prior knowledge that $\theta_1$ and $\theta_2$ are independent, an iterative application of the reference prior algorithm was proposed. While this can be trivially implemented in many important special cases, its general applicability and convergence require further study.

Appendix. Proof of Lemma 1

Let
$$J(\alpha,\beta) = \sum_{j=1}^{\infty}\big\{(\beta+j)^{-2}-(\alpha+\beta+j)^{-2}\big\}.$$
Then
$$G(\beta)-G(\alpha+\beta) = \beta^{-2}-(\alpha+\beta)^{-2}+J(\alpha,\beta) = \frac{\alpha(\alpha+2\beta)}{\beta^2(\alpha+\beta)^2}+J(\alpha,\beta).$$
Define $h(x) = (\beta+x)^{-2}-(\alpha+\beta+x)^{-2}$ for $x\ge 0$. Since $h'(x) = -2\{(\beta+x)^{-3}-(\alpha+\beta+x)^{-3}\}<0$ for $x>0$, $h(x)$ is a monotone decreasing function of $x$. Thus
$$J(\alpha,\beta) \ge \int_1^{\infty}\big\{(\beta+x)^{-2}-(\alpha+\beta+x)^{-2}\big\}\,dx = (\beta+1)^{-1}-(\alpha+\beta+1)^{-1} = \frac{\alpha}{(\beta+1)(\alpha+\beta+1)}.$$
On the other hand, for any $x\ge 1$ and $\alpha>0$,
$$\frac{1}{x^2}-\frac{1}{(x+\alpha)^2}-\Big\{\frac{1}{x^2-\frac14}-\frac{1}{(x+\alpha)^2-\frac14}\Big\} = -\frac{\{(x+\alpha)^2-x^2\}\{x^2+(x+\alpha)^2-\frac14\}}{4x^2(x^2-\frac14)(x+\alpha)^2\{(x+\alpha)^2-\frac14\}},$$
which is negative. Thus, for any $j\ge 1$,
$$\frac{1}{(\beta+j)^2}-\frac{1}{(\alpha+\beta+j)^2} \le \frac{1}{(\beta+j)^2-\frac14}-\frac{1}{(\alpha+\beta+j)^2-\frac14} = \int_{j-\frac12}^{j+\frac12}\Big\{\frac{1}{(\beta+x)^2}-\frac{1}{(\alpha+\beta+x)^2}\Big\}\,dx.$$
Therefore,
$$J(\alpha,\beta) \le \int_{1/2}^{\infty}\Big\{\frac{1}{(\beta+x)^2}-\frac{1}{(\alpha+\beta+x)^2}\Big\}\,dx = \frac{\alpha}{(\beta+\frac12)(\alpha+\beta+\frac12)}.$$
Consequently, $H_i^L(\alpha)\le K_i(\alpha)\le H_i^U(\alpha)$, where
$$H_i^L(\alpha) = \int_{l_i}^{u_i}\Big\{\frac{\alpha(\alpha+2\beta)}{\beta^2(\alpha+\beta)^2}+\frac{\alpha}{(\beta+1)(\alpha+\beta+1)}\Big\}^{1/2}d\beta, \qquad H_i^U(\alpha) = \int_{l_i}^{u_i}\Big\{\frac{\alpha(\alpha+2\beta)}{\beta^2(\alpha+\beta)^2}+\frac{\alpha}{(\beta+\frac12)(\alpha+\beta+\frac12)}\Big\}^{1/2}d\beta.$$
For any $0<\epsilon<1$ small enough, and $i$ large enough that $l_i<\epsilon<1/\epsilon<u_i$, split the range of integration into $(l_i,\epsilon)$, $(\epsilon,1/\epsilon)$ and $(1/\epsilon,u_i)$. The middle piece is $O(1)$. For $\beta\in(l_i,\epsilon)$ the integrand of $H_i^U$ is $\beta^{-1}+O(\beta)$, while for $\beta\in(1/\epsilon,u_i)$ it is $\sqrt\alpha\,\beta^{-1}+O(\beta^{-2})$. Hence
$$H_i^U(\alpha) = \sqrt\alpha\,\log(u_i)-\log(l_i)+O(1),$$
and, similarly, $H_i^L(\alpha) = \sqrt\alpha\,\log(u_i)-\log(l_i)+O(1)$. This completes the proof.

Acknowledgements

Sun's research was partially supported by a National Security Agency grant and a Research Board award from the University of Missouri System. Berger's research was supported under a National Science Foundation grant. The manuscript was prepared using computer facilities supported in part by a National Science Foundation grant awarded to the Department of Statistics at the University of Missouri-Columbia. The authors gratefully acknowledge the very constructive comments of the editors (Professors A. P. Dawid and D. M. Titterington), an associate editor, and two referees.

References

Berger, J. O. and Bernardo, J. M. (1989). Estimating a product of means: Bayesian analysis with reference priors. J. Am. Statist. Assoc. 84, 200-207.

Berger, J. O. and Bernardo, J. M. (1992). On the development of reference priors (with Discussion). In Bayesian Statistics 4, Eds. J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith, pp. 35-60. Oxford: Oxford University Press.

Berger, J. O. and Moreno, E. (1994). Bayesian robustness in bidimensional models: prior independence. J. Statist. Plan. Inference 40, 161-176.

Bernardo, J. M. (1979). Reference posterior distributions for Bayesian inference. J. R. Statist. Soc. B 41, 113-147.

Bowman, K. O. and Shenton, L. R. (1988). Properties of Estimators for the Gamma Distribution. New York: Marcel Dekker.

Crowder, M. and Sweeting, T. (1989). Bayesian inference for a bivariate binomial distribution. Biometrika 76, 599-603.

Datta, G. S. and Ghosh, M. (1995). Some remarks on noninformative priors. J. Am. Statist. Assoc. 90, 1357-1363.

Ghosh, J. K. and Mukerjee, R. (1992). Noninformative priors (with Discussion). In Bayesian Statistics 4, Eds. J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith, pp. 195-210. Oxford: Oxford University Press.

Jeffreys, H. (1961). Theory of Probability, 3rd ed. Oxford: Oxford University Press.

Kass, R. E. and Greenhouse, J. (1989). Investigating therapies of potentially great benefit: a Bayesian perspective. Comment on "Investigating therapies of potentially great benefit: ECMO," by J. H. Ware. Statistical Science 4, 310-314.

Kass, R. E. and Wasserman, L. (1996). The selection of prior distributions by formal rules. J. Am. Statist. Assoc. 91, 1343-1370.

Lavine, M., Wasserman, L. and Wolpert, R. L. (1991). Bayesian inference with specified prior marginals. J. Am. Statist. Assoc. 86, 964-971.

Polson, N. and Wasserman, L. (1990). Prior distributions for the bivariate binomial. Biometrika 77, 901-904.

Sun, D. (1997). A note on noninformative priors for Weibull distributions. J. Statist. Plan. Inference, in press.
Sun, D. and Ye, K. (1996). Frequentist validity of posterior quantiles for a two-parameter exponential family. Biometrika 83, 55-65.

Ware, J. H. (1989). Investigating therapies of potentially great benefit: ECMO (with Discussion). Statistical Science 4, 298-340.

Table 1: Comparison of marginal reference priors from the two options

Distribution                                   (theta_1, theta_2)             pi_2^r(theta_1) = pi_1^r(theta_1)?
Multinomial (p_1, p_2)                         (p_1, p_2)                     Yes, if (16) holds
Normal (mu, sigma^2)                           (mu, sigma)                    Yes, if pi^s(sigma|mu) = pi^s(sigma)
Normal (mu, sigma^2)                           (sigma, mu)                    No
Gamma (alpha, beta)                            (alpha, beta)                  No
Neyman-Scott                                   (sigma^2, (mu_1,...,mu_n))     No
Bivariate normal (mu_1, mu_2, 1, 1, rho),      (mu_1, mu_2)                   Yes, if mu_1 and mu_2 are independent
  rho known

Fig. 1: Prior and posterior densities of eta from the ECMO data.