MA40189 - Solution Sheet Four
Simon Shaw, [email protected]
http://people.bath.ac.uk/masss/ma40189.html
2016/17 Semester II

1. Let $X_1, \ldots, X_n$ be exchangeable so that the $X_i$ are conditionally independent given a parameter $\theta$.

(a) Let $X_i \mid \theta \sim \mathrm{Bern}(\theta)$.

i. Show that $f(x_i \mid \theta)$ belongs to the 1-parameter exponential family and for $X = (X_1, \ldots, X_n)$ state the sufficient statistic for learning about $\theta$.

Notice that we can write
\[ f(x_i \mid \theta) = \theta^{x_i}(1-\theta)^{1-x_i} = \exp\left\{ x_i \log \frac{\theta}{1-\theta} + \log(1-\theta) \right\} \]
so that $f(x_i \mid \theta)$ belongs to the 1-parameter exponential family with $\phi_1(\theta) = \log \frac{\theta}{1-\theta}$, $u_1(x_i) = x_i$, $g(\theta) = \log(1-\theta)$ and $h(x_i) = 0$. Notice that, from Proposition 1 (see Lecture 11), $t_n = [n, \sum_{i=1}^{n} X_i]$ is a sufficient statistic.

ii. By viewing the likelihood as a function of $\theta$, which generic family of distributions (over $\theta$) is the likelihood a kernel of?

The likelihood, without expressing it in the explicit exponential family form, is
\[ f(x \mid \theta) = \theta^{n\bar{x}}(1-\theta)^{n-n\bar{x}} \]
which, viewed as a function of $\theta$, we immediately recognise as a Beta kernel (in particular, a $\mathrm{Beta}(n\bar{x}+1, n-n\bar{x}+1)$).

iii. By first finding the corresponding posterior distribution for $\theta$ given $x = (x_1, \ldots, x_n)$, show that this family of distributions is conjugate with respect to the likelihood $f(x \mid \theta)$.

Taking $\theta \sim \mathrm{Beta}(\alpha, \beta)$ we have that
\[ f(\theta \mid x) \propto \theta^{n\bar{x}}(1-\theta)^{n-n\bar{x}} \times \theta^{\alpha-1}(1-\theta)^{\beta-1} = \theta^{\alpha+n\bar{x}-1}(1-\theta)^{\beta+n-n\bar{x}-1} \]
so that $\theta \mid x \sim \mathrm{Beta}(\alpha+n\bar{x}, \beta+n-n\bar{x})$. Thus, the prior and the posterior are in the same family, giving conjugacy.

Deriving the results directly from the exponential family representation: expressed in the 1-parameter exponential family form the likelihood is
\[ f(x \mid \theta) = \exp\left\{ \log \frac{\theta}{1-\theta} \sum_{i=1}^{n} x_i + n\log(1-\theta) \right\} \]
from which we immediately observe the sufficient statistic $t_n = [n, \sum_{i=1}^{n} x_i]$. Viewing $f(x \mid \theta)$ as a function of $\theta$, the natural conjugate prior is a member of the 2-parameter exponential family of the form
\[ f(\theta) = \exp\left\{ a \log \frac{\theta}{1-\theta} + d \log(1-\theta) + c(a, d) \right\} \]
where $c(a, d)$ is the normalising constant. Hence,
\[ f(\theta) \propto \exp\left\{ a \log \frac{\theta}{1-\theta} + d \log(1-\theta) \right\} = \theta^{a}(1-\theta)^{d-a} \]
which we recognise as a kernel of a Beta distribution. The convention is to label the hyperparameters as $\alpha$ and $\beta$, so we put $\alpha = \alpha(a, d) = a+1$ and $\beta = \beta(a, d) = d-a+1$ (equivalently, $a = a(\alpha, \beta) = \alpha-1$, $d = d(\alpha, \beta) = \beta+\alpha-2$). The conjugate prior distribution is $\theta \sim \mathrm{Beta}(\alpha, \beta)$.
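As a quick numerical companion to part (a), the following is a minimal sketch checking that prior times likelihood is a constant multiple of the claimed $\mathrm{Beta}(\alpha+n\bar{x}, \beta+n-n\bar{x})$ density on simulated Bernoulli data; the hyperparameters, sample size and seed are illustrative assumptions, not part of the question.

```python
# Minimal check of the Beta-Bernoulli update in 1(a)iii; all numerical
# values (alpha, beta, theta_true, n, seed) are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, beta, theta_true, n = 2.0, 3.0, 0.4, 50
x = rng.binomial(1, theta_true, size=n)        # Bernoulli(theta) draws

# Claimed posterior: theta | x ~ Beta(alpha + n*xbar, beta + n - n*xbar)
post = stats.beta(alpha + x.sum(), beta + n - x.sum())

# prior * likelihood should be a constant multiple of the posterior density
grid = np.linspace(0.01, 0.99, 99)
unnorm = stats.beta(alpha, beta).pdf(grid) * grid**x.sum() * (1 - grid)**(n - x.sum())
ratio = unnorm / post.pdf(grid)
assert np.allclose(ratio / ratio[0], 1.0)      # ratio is constant on the grid
```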
(b) Let $X_i \mid \theta \sim N(\mu, \theta)$ with $\mu$ known.

i. Show that $f(x_i \mid \theta)$ belongs to the 1-parameter exponential family and for $X = (X_1, \ldots, X_n)$ state the sufficient statistic for learning about $\theta$.

Writing the normal density as an exponential family (with parameter $\theta$, as $\mu$ is a known constant) we have
\[ f(x_i \mid \theta) = \exp\left\{ -\frac{1}{2\theta}(x_i-\mu)^2 - \frac{1}{2}\log\theta - \log\sqrt{2\pi} \right\} \]
so that $f(x_i \mid \theta)$ belongs to the 1-parameter exponential family. The sufficient statistic is $t_n = [n, \sum_{i=1}^{n}(x_i-\mu)^2]$. Note that, expressed explicitly as a 1-parameter exponential family, the likelihood for $x = (x_1, \ldots, x_n)$ is
\[ f(x \mid \theta) = \exp\left\{ -\frac{1}{2\theta}\sum_{i=1}^{n}(x_i-\mu)^2 - \frac{n}{2}\log\theta - n\log\sqrt{2\pi} \right\} \]
so that the natural conjugate prior has the form
\[ f(\theta) = \exp\left\{ -a\frac{1}{\theta} - d\log\theta + c(a, d) \right\} \propto \theta^{-d}\exp\left\{ -a\frac{1}{\theta} \right\} \]
which we recognise as a kernel of an Inverse-Gamma distribution.

ii. By viewing the likelihood as a function of $\theta$, which generic family of distributions (over $\theta$) is the likelihood a kernel of?

In conventional form,
\[ f(x \mid \theta) \propto \theta^{-\frac{n}{2}}\exp\left\{ -\frac{1}{2\theta}\sum_{i=1}^{n}(x_i-\mu)^2 \right\} \]
which, viewing $f(x \mid \theta)$ as a function of $\theta$, we recognise as a kernel of an Inverse-Gamma distribution (in particular, an $\mathrm{Inv\text{-}gamma}\left(\frac{n-2}{2}, \frac{1}{2}\sum_{i=1}^{n}(x_i-\mu)^2\right)$).

iii. By first finding the corresponding posterior distribution for $\theta$ given $x = (x_1, \ldots, x_n)$, show that this family of distributions is conjugate with respect to the likelihood $f(x \mid \theta)$.

Taking $\theta \sim \mathrm{Inv\text{-}gamma}(\alpha, \beta)$ we have
\[ f(\theta \mid x) \propto \theta^{-\frac{n}{2}}\exp\left\{ -\frac{1}{2\theta}\sum_{i=1}^{n}(x_i-\mu)^2 \right\} \times \theta^{-(\alpha+1)}\exp\left\{ -\frac{\beta}{\theta} \right\} = \theta^{-(\alpha+\frac{n}{2}+1)}\exp\left\{ -\left(\beta + \frac{1}{2}\sum_{i=1}^{n}(x_i-\mu)^2\right)\frac{1}{\theta} \right\} \]
which we recognise as a kernel of an Inverse-Gamma distribution so that $\theta \mid x \sim \mathrm{Inv\text{-}gamma}(\alpha+\frac{n}{2}, \beta+\frac{1}{2}\sum_{i=1}^{n}(x_i-\mu)^2)$. Hence, the prior and posterior are in the same family, giving conjugacy; this update is checked numerically after part (c).

(c) Let $X_i \mid \theta \sim \mathrm{Maxwell}(\theta)$, the Maxwell distribution with parameter $\theta$, so that
\[ f(x_i \mid \theta) = \left(\frac{2}{\pi}\right)^{\frac{1}{2}}\theta^{\frac{3}{2}}x_i^2\exp\left\{ -\frac{\theta x_i^2}{2} \right\}, \quad x_i > 0 \]
and $E(X_i \mid \theta) = 2\sqrt{\frac{2}{\pi\theta}}$, $Var(X_i \mid \theta) = \frac{3\pi-8}{\pi\theta}$.

i. Show that $f(x_i \mid \theta)$ belongs to the 1-parameter exponential family and for $X = (X_1, \ldots, X_n)$ state the sufficient statistic for learning about $\theta$.

Writing the Maxwell density in exponential family form we have
\[ f(x_i \mid \theta) = \exp\left\{ -\theta\frac{x_i^2}{2} + \frac{3}{2}\log\theta + \log x_i^2 + \frac{1}{2}\log\frac{2}{\pi} \right\} \]
so that $f(x_i \mid \theta)$ belongs to the 1-parameter exponential family. The sufficient statistic is $t_n = [n, \sum_{i=1}^{n} x_i^2]$. Note that, expressed explicitly as a 1-parameter exponential family, the likelihood for $x = (x_1, \ldots, x_n)$ is
\[ f(x \mid \theta) = \exp\left\{ -\theta\sum_{i=1}^{n}\frac{x_i^2}{2} + \frac{3n}{2}\log\theta + \sum_{i=1}^{n}\log x_i^2 + \frac{n}{2}\log\frac{2}{\pi} \right\} \]
so that the natural conjugate prior has the form
\[ f(\theta) = \exp\left\{ -a\theta + d\log\theta + c(a, d) \right\} \propto \theta^{d}e^{-a\theta} \]
which we recognise as a kernel of a Gamma distribution.

ii. By viewing the likelihood as a function of $\theta$, which generic family of distributions (over $\theta$) is the likelihood a kernel of?

In conventional form,
\[ f(x \mid \theta) = \left(\frac{2}{\pi}\right)^{\frac{n}{2}}\theta^{\frac{3n}{2}}\left(\prod_{i=1}^{n}x_i^2\right)\exp\left\{ -\frac{\theta\sum_{i=1}^{n}x_i^2}{2} \right\} \propto \theta^{\frac{3n}{2}}\exp\left\{ -\frac{\theta\sum_{i=1}^{n}x_i^2}{2} \right\} \]
which, viewing $f(x \mid \theta)$ as a function of $\theta$, we recognise as a kernel of a Gamma distribution (in particular, a $\mathrm{Gamma}\left(\frac{3n+2}{2}, \frac{1}{2}\sum_{i=1}^{n}x_i^2\right)$).

iii. By first finding the corresponding posterior distribution for $\theta$ given $x = (x_1, \ldots, x_n)$, show that this family of distributions is conjugate with respect to the likelihood $f(x \mid \theta)$.

Taking $\theta \sim \mathrm{Gamma}(\alpha, \beta)$ we have
\[ f(\theta \mid x) \propto \theta^{\frac{3n}{2}}\exp\left\{ -\frac{\theta\sum_{i=1}^{n}x_i^2}{2} \right\} \times \theta^{\alpha-1}e^{-\beta\theta} = \theta^{\alpha+\frac{3n}{2}-1}\exp\left\{ -\left(\beta+\frac{1}{2}\sum_{i=1}^{n}x_i^2\right)\theta \right\} \]
which, of course, is a kernel of a Gamma distribution so that $\theta \mid x \sim \mathrm{Gamma}(\alpha+\frac{3n}{2}, \beta+\frac{1}{2}\sum_{i=1}^{n}x_i^2)$. The prior and the posterior are in the same family, giving conjugacy.
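The Inverse-Gamma update in part (b)iii can be checked numerically in the same way as part (a). Below is a minimal sketch; the values of $\mu$, the hyperparameters, sample size and seed are illustrative assumptions, and note that scipy's invgamma takes the second parameter through its scale argument.

```python
# Minimal check of the Inverse-Gamma update in 1(b)iii; mu, alpha, beta,
# theta_true, n and the seed are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, theta_true, n = 5.0, 2.0, 40
alpha, beta = 3.0, 4.0
x = rng.normal(mu, np.sqrt(theta_true), size=n)   # N(mu, theta) draws

s = np.sum((x - mu)**2)
# Claimed posterior: theta | x ~ Inv-gamma(alpha + n/2, beta + s/2)
post = stats.invgamma(alpha + n / 2, scale=beta + s / 2)

grid = np.linspace(0.5, 8, 200)
unnorm = stats.invgamma(alpha, scale=beta).pdf(grid) \
         * grid**(-n / 2) * np.exp(-s / (2 * grid))
ratio = unnorm / post.pdf(grid)
assert np.allclose(ratio / ratio[0], 1.0)         # constant ratio => same kernel
```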
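Part (c) can be checked similarly; the only wrinkle is the parametrisation of the Maxwell distribution. The sketch below assumes that scipy's maxwell with scale $a = \theta^{-1/2}$ matches the density given in the question, and the hyperparameters and sample size are again illustrative.

```python
# Minimal check of the Gamma update in 1(c)iii; scipy's maxwell with
# scale = theta**-0.5 is assumed to match the density in the question,
# and all numerical values are illustrative.
import numpy as np
from scipy import stats

theta_true, n = 1.5, 30
x = stats.maxwell(scale=theta_true**-0.5).rvs(size=n, random_state=2)

alpha, beta = 2.0, 1.0
# Claimed posterior: theta | x ~ Gamma(alpha + 3n/2, beta + sum(x^2)/2)
post = stats.gamma(alpha + 1.5 * n, scale=1 / (beta + 0.5 * np.sum(x**2)))

grid = np.linspace(0.2, 5, 200)
unnorm = stats.gamma(alpha, scale=1 / beta).pdf(grid) \
         * grid**(1.5 * n) * np.exp(-0.5 * grid * np.sum(x**2))
ratio = unnorm / post.pdf(grid)
assert np.allclose(ratio / ratio[0], 1.0)
```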
2. Let $X_1, \ldots, X_n$ be exchangeable so that the $X_i$ are conditionally independent given a parameter $\theta$. Suppose that $X_i \mid \theta$ is geometrically distributed with probability density function
\[ f(x_i \mid \theta) = (1-\theta)^{x_i-1}\theta, \quad x_i = 1, 2, \ldots. \]

(a) Show that $f(x \mid \theta)$, where $x = (x_1, \ldots, x_n)$, belongs to the 1-parameter exponential family. Hence, or otherwise, find the conjugate prior distribution and corresponding posterior distribution for $\theta$.

As the $X_i$ are conditionally independent given $\theta$,
\[ f(x \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta) = \prod_{i=1}^{n} (1-\theta)^{x_i-1}\theta = (1-\theta)^{n\bar{x}-n}\theta^{n} = \exp\left\{ (n\bar{x}-n)\log(1-\theta) + n\log\theta \right\} \]
and so belongs to the 1-parameter exponential family. The conjugate prior is of the form
\[ f(\theta) \propto \exp\left\{ a\log(1-\theta) + b\log\theta \right\} = \theta^{b}(1-\theta)^{a} \]
which is a kernel of a Beta distribution. Letting $\alpha = b+1$, $\beta = a+1$ we have $\theta \sim \mathrm{Beta}(\alpha, \beta)$. Then
\[ f(\theta \mid x) \propto f(x \mid \theta)f(\theta) \propto \theta^{n}(1-\theta)^{n\bar{x}-n}\theta^{\alpha-1}(1-\theta)^{\beta-1} \]
which is a kernel of a $\mathrm{Beta}(\alpha+n, \beta+n\bar{x}-n)$ so that $\theta \mid x \sim \mathrm{Beta}(\alpha+n, \beta+n\bar{x}-n)$.

(b) Show that the posterior mean for $\theta$ can be written as a weighted average of the prior mean of $\theta$ and the maximum likelihood estimate, $\bar{x}^{-1}$.

\[ E(\theta \mid X) = \frac{\alpha+n}{(\alpha+n)+(\beta+n\bar{x}-n)} = \frac{\alpha+n}{\alpha+\beta+n\bar{x}} = \frac{\alpha+\beta}{\alpha+\beta+n\bar{x}}\cdot\frac{\alpha}{\alpha+\beta} + \frac{n\bar{x}}{\alpha+\beta+n\bar{x}}\cdot\frac{1}{\bar{x}} = \lambda E(\theta) + (1-\lambda)\bar{x}^{-1} \]
where $\lambda = \frac{\alpha+\beta}{\alpha+\beta+n\bar{x}}$.

(c) Suppose now that the prior for $\theta$ is instead given by the probability density function
\[ f(\theta) = \frac{1}{2B(\alpha+1, \beta)}\theta^{\alpha}(1-\theta)^{\beta-1} + \frac{1}{2B(\alpha, \beta+1)}\theta^{\alpha-1}(1-\theta)^{\beta}, \]
where $B(\alpha, \beta)$ denotes the Beta function evaluated at $\alpha$ and $\beta$. Show that the posterior probability density function can be written as
\[ f(\theta \mid x) = \lambda f_1(\theta) + (1-\lambda)f_2(\theta) \]
where
\[ \lambda = \frac{(\alpha+n)\beta}{(\alpha+n)\beta + (\beta-n+\sum_{i=1}^{n}x_i)\alpha} \]
and $f_1(\theta)$ and $f_2(\theta)$ are probability density functions.

\[ f(\theta \mid x) \propto f(x \mid \theta)f(\theta) = \theta^{n}(1-\theta)^{n\bar{x}-n}\left\{ \frac{\theta^{\alpha}(1-\theta)^{\beta-1}}{B(\alpha+1, \beta)} + \frac{\theta^{\alpha-1}(1-\theta)^{\beta}}{B(\alpha, \beta+1)} \right\} = \frac{\theta^{\alpha_1}(1-\theta)^{\beta_1-1}}{B(\alpha+1, \beta)} + \frac{\theta^{\alpha_1-1}(1-\theta)^{\beta_1}}{B(\alpha, \beta+1)} \]
where $\alpha_1 = \alpha+n$ and $\beta_1 = \beta+n\bar{x}-n$. Finding the constant of proportionality, we observe that $\theta^{\alpha_1}(1-\theta)^{\beta_1-1}$ is a kernel of a $\mathrm{Beta}(\alpha_1+1, \beta_1)$ and $\theta^{\alpha_1-1}(1-\theta)^{\beta_1}$ is a kernel of a $\mathrm{Beta}(\alpha_1, \beta_1+1)$. So,
\[ f(\theta \mid x) = c\left\{ \frac{B(\alpha_1+1, \beta_1)}{B(\alpha+1, \beta)}f_1(\theta) + \frac{B(\alpha_1, \beta_1+1)}{B(\alpha, \beta+1)}f_2(\theta) \right\} \]
where $f_1(\theta)$ is the density function of a $\mathrm{Beta}(\alpha_1+1, \beta_1)$ and $f_2(\theta)$ that of a $\mathrm{Beta}(\alpha_1, \beta_1+1)$. Hence,
\[ c^{-1} = \frac{B(\alpha_1+1, \beta_1)}{B(\alpha+1, \beta)} + \frac{B(\alpha_1, \beta_1+1)}{B(\alpha, \beta+1)} \]
so that $f(\theta \mid x) = \lambda f_1(\theta) + (1-\lambda)f_2(\theta)$ with, using the identities $B(\alpha+1, \beta) = \frac{\alpha}{\alpha+\beta}B(\alpha, \beta)$ and $B(\alpha, \beta+1) = \frac{\beta}{\alpha+\beta}B(\alpha, \beta)$,
\[ \lambda = \frac{\frac{B(\alpha_1+1, \beta_1)}{B(\alpha+1, \beta)}}{\frac{B(\alpha_1+1, \beta_1)}{B(\alpha+1, \beta)} + \frac{B(\alpha_1, \beta_1+1)}{B(\alpha, \beta+1)}} = \frac{\frac{\alpha_1(\alpha+\beta)B(\alpha_1, \beta_1)}{\alpha(\alpha_1+\beta_1)B(\alpha, \beta)}}{\frac{\alpha_1(\alpha+\beta)B(\alpha_1, \beta_1)}{\alpha(\alpha_1+\beta_1)B(\alpha, \beta)} + \frac{\beta_1(\alpha+\beta)B(\alpha_1, \beta_1)}{\beta(\alpha_1+\beta_1)B(\alpha, \beta)}} = \frac{\alpha_1\beta}{\alpha_1\beta + \beta_1\alpha} = \frac{(\alpha+n)\beta}{(\alpha+n)\beta + (\beta+\sum_{i=1}^{n}x_i-n)\alpha}. \]
This weight is checked numerically after question 3.

3. Let $X_1, \ldots, X_n$ be exchangeable so that the $X_i$ are conditionally independent given a parameter $\theta$. Suppose that $X_i \mid \theta$ is distributed as a double-exponential distribution with probability density function
\[ f(x_i \mid \theta) = \frac{1}{2\theta}\exp\left\{ -\frac{|x_i|}{\theta} \right\}, \quad -\infty < x_i < \infty \]
for $\theta > 0$.

(a) Find the conjugate prior distribution and corresponding posterior distribution for $\theta$ following observation of $x = (x_1, \ldots, x_n)$.

\[ f(x \mid \theta) = \prod_{i=1}^{n}\frac{1}{2\theta}\exp\left\{ -\frac{|x_i|}{\theta} \right\} \propto \frac{1}{\theta^{n}}\exp\left\{ -\frac{1}{\theta}\sum_{i=1}^{n}|x_i| \right\} \]
which, when viewed as a function of $\theta$, is a kernel of an $\mathrm{Inv\text{-}gamma}(n-1, \sum_{i=1}^{n}|x_i|)$. We thus take $\theta \sim \mathrm{Inv\text{-}gamma}(\alpha, \beta)$ as the prior so that
\[ f(\theta \mid x) \propto \frac{1}{\theta^{n}}\exp\left\{ -\frac{1}{\theta}\sum_{i=1}^{n}|x_i| \right\}\frac{1}{\theta^{\alpha+1}}\exp\left\{ -\frac{\beta}{\theta} \right\} = \frac{1}{\theta^{\alpha+n+1}}\exp\left\{ -\left(\beta+\sum_{i=1}^{n}|x_i|\right)\frac{1}{\theta} \right\} \]
which is a kernel of an $\mathrm{Inv\text{-}gamma}(\alpha+n, \beta+\sum_{i=1}^{n}|x_i|)$. Thus, with respect to $X \mid \theta$, the prior and posterior are in the same family, showing conjugacy, with $\theta \mid x \sim \mathrm{Inv\text{-}gamma}(\alpha+n, \beta+\sum_{i=1}^{n}|x_i|)$.

(b) Consider the transformation $\phi = \theta^{-1}$. Find the posterior distribution of $\phi \mid x$.

We have $\phi = g(\theta)$ where $g(\theta) = \theta^{-1}$ so that $\theta = g^{-1}(\phi) = \phi^{-1}$. Transforming $f_\theta(\theta \mid x)$ to $f_\phi(\phi \mid x)$ we have
\[ f_\phi(\phi \mid x) = f_\theta(g^{-1}(\phi) \mid x)\left|\frac{\partial\theta}{\partial\phi}\right| \propto \frac{1}{(\phi^{-1})^{\alpha+n+1}}\exp\left\{ -\left(\beta+\sum_{i=1}^{n}|x_i|\right)\phi \right\}\frac{1}{\phi^{2}} = \phi^{\alpha+n-1}\exp\left\{ -\phi\left(\beta+\sum_{i=1}^{n}|x_i|\right) \right\} \]
which is a kernel of a $\mathrm{Gamma}(\alpha+n, \beta+\sum_{i=1}^{n}|x_i|)$ distribution. That is, $\phi \mid x \sim \mathrm{Gamma}(\alpha+n, \beta+\sum_{i=1}^{n}|x_i|)$. The result highlights the relationship between the Gamma and Inverse-Gamma distributions shown in question 3(b)(i) of Question Sheet Two.
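The algebra for the mixture weight in question 2(c) is easy to slip on, so the sketch below compares the Beta-function form of $\lambda$ with the simplified closed form on simulated geometric data; the hyperparameters, success probability and seed are illustrative assumptions.

```python
# Numerical check of the mixture weight lambda in question 2(c); alpha,
# beta, the true success probability and the seed are illustrative.
import numpy as np
from scipy.special import betaln

rng = np.random.default_rng(3)
alpha, beta, n = 2.0, 3.0, 20
x = rng.geometric(0.3, size=n)                 # geometric data on 1, 2, ...
a1, b1 = alpha + n, beta + x.sum() - n         # alpha_1 and beta_1 above

# Beta-function form of lambda, computed on the log scale for stability
w1 = np.exp(betaln(a1 + 1, b1) - betaln(alpha + 1, beta))
w2 = np.exp(betaln(a1, b1 + 1) - betaln(alpha, beta + 1))
lam_beta_form = w1 / (w1 + w2)

# Simplified closed form: alpha_1 * beta / (alpha_1 * beta + beta_1 * alpha)
lam_closed = a1 * beta / (a1 * beta + b1 * alpha)
assert np.isclose(lam_beta_form, lam_closed)
```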
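A Monte Carlo sanity check of the transformation in question 3(b): reciprocals of Inverse-Gamma draws should match the corresponding Gamma quantiles. The parameter values, sample size and tolerance are illustrative assumptions.

```python
# Monte Carlo check of 3(b): if theta ~ Inv-gamma(a, b) then 1/theta
# should be Gamma(a, b); the values a, b and the tolerance are illustrative.
import numpy as np
from scipy import stats

a, b = 4.0, 2.5
theta = stats.invgamma(a, scale=b).rvs(size=200_000, random_state=4)
phi = 1.0 / theta

qs = np.linspace(0.05, 0.95, 19)
emp = np.quantile(phi, qs)                     # empirical quantiles of 1/theta
thy = stats.gamma(a, scale=1 / b).ppf(qs)      # Gamma(a, rate b) quantiles
assert np.allclose(emp, thy, rtol=0.02)
```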
4. Let $X_1, \ldots, X_n$ be a finite subset of a sequence of infinitely exchangeable random quantities with joint density function
\[ f(x_1, \ldots, x_n) = n!\left(1+\sum_{i=1}^{n}x_i\right)^{-(n+1)}. \]
Show that they can be represented as conditionally independent and exponentially distributed.

Using de Finetti's Representation Theorem (Theorem 2 of the on-line notes), the joint distribution has an integral representation of the form
\[ f(x_1, \ldots, x_n) = \int_{\theta}\left\{\prod_{i=1}^{n}f(x_i \mid \theta)\right\}f(\theta)\,d\theta. \]
If $X_i \mid \theta \sim \mathrm{Exp}(\theta)$ then
\[ \prod_{i=1}^{n}f(x_i \mid \theta) = \prod_{i=1}^{n}\theta\exp(-\theta x_i) = \theta^{n}\exp\left(-\theta\sum_{i=1}^{n}x_i\right). \]
Notice that, viewed as a function of $\theta$, this looks like a kernel of a $\mathrm{Gamma}(n+1, \sum_{i=1}^{n}x_i)$. The result holds if we can find an $f(\theta)$ such that
\[ n!\left(1+\sum_{i=1}^{n}x_i\right)^{-(n+1)} = \int_{\theta}\theta^{n}\exp\left(-\theta\sum_{i=1}^{n}x_i\right)f(\theta)\,d\theta. \]
The left hand side looks like the normalising constant of a $\mathrm{Gamma}(n+1, 1+\sum_{i=1}^{n}x_i)$ (as $n! = \Gamma(n+1)$) and if $f(\theta) = \exp(-\theta)$ then the integrand on the right hand side is a kernel of a $\mathrm{Gamma}(n+1, 1+\sum_{i=1}^{n}x_i)$. So, if $\theta \sim \mathrm{Gamma}(1, 1)$ then $f(\theta) = \exp(-\theta)$ and we have the desired representation. This identity is checked numerically at the end of the sheet.

5. Let $X_1, \ldots, X_n$ be exchangeable so that the $X_i$ are conditionally independent given a parameter $\theta$. Suppose that $X_i \mid \theta$ is distributed as a Poisson distribution with mean $\theta$.

(a) Show that, with respect to this Poisson likelihood, the gamma family of distributions is conjugate.

\[ f(x \mid \theta) = \prod_{i=1}^{n}P(X_i = x_i \mid \theta) \propto \prod_{i=1}^{n}\theta^{x_i}\exp\{-\theta\} = \theta^{n\bar{x}}\exp\{-n\theta\}. \]
As $\theta \sim \mathrm{Gamma}(\alpha, \beta)$ then
\[ f(\theta \mid x) \propto f(x \mid \theta)f(\theta) \propto \theta^{n\bar{x}}\exp\{-n\theta\}\theta^{\alpha-1}\exp\{-\beta\theta\} = \theta^{\alpha+n\bar{x}-1}\exp\{-(\beta+n)\theta\} \]
which is a kernel of a $\mathrm{Gamma}(\alpha+n\bar{x}, \beta+n)$ distribution. Hence, the prior and posterior are in the same family, giving conjugacy.

(b) Interpret the posterior mean of $\theta$ paying particular attention to the cases when we may have weak prior information and strong prior information.

\[ E(\theta \mid X) = \frac{\alpha+n\bar{x}}{\beta+n} = \frac{\beta}{\beta+n}\cdot\frac{\alpha}{\beta} + \frac{n}{\beta+n}\bar{x} = \lambda\frac{\alpha}{\beta} + (1-\lambda)\bar{x} \]
where $\lambda = \frac{\beta}{\beta+n}$. Hence, the posterior mean is a weighted average of the prior mean, $\frac{\alpha}{\beta}$, and the data mean, $\bar{x}$, which is also the maximum likelihood estimate. Weak prior information corresponds to a large variance of $\theta$, which can be viewed as small $\beta$ ($\beta$ is the inverse scale parameter). In this case, more weight is attached to $\bar{x}$ than to $\frac{\alpha}{\beta}$ in the posterior mean. Strong prior information corresponds to a small variance of $\theta$, which can be viewed as large $\beta$ (once again, $\beta$ is the inverse scale parameter). In this case, more weight is attached to $\frac{\alpha}{\beta}$ than to $\bar{x}$, so the posterior mean favours the prior mean.

(c) Suppose now that the prior for $\theta$ is given hierarchically. Given $\lambda$, $\theta$ is judged to follow an exponential distribution with mean $\frac{1}{\lambda}$ and $\lambda$ is given the improper distribution $f(\lambda) \propto 1$ for $\lambda > 0$. Show that
\[ f(\lambda \mid x) \propto \frac{\lambda}{(n+\lambda)^{n\bar{x}+1}} \]
where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n}x_i$.

$\theta \mid \lambda \sim \mathrm{Exp}(\lambda)$ so $f(\theta \mid \lambda) = \lambda\exp\{-\lambda\theta\}$. Then
\[ f(\lambda, \theta \mid x) \propto f(x \mid \theta, \lambda)f(\theta, \lambda) = f(x \mid \theta)f(\theta \mid \lambda)f(\lambda) \propto \theta^{n\bar{x}}\exp\{-n\theta\}\left(\lambda\exp\{-\lambda\theta\}\right) = \lambda\theta^{n\bar{x}}\exp\{-(n+\lambda)\theta\}. \]
Thus, integrating out $\theta$,
\[ f(\lambda \mid x) \propto \int_{0}^{\infty}\lambda\theta^{n\bar{x}}\exp\{-(n+\lambda)\theta\}\,d\theta = \lambda\int_{0}^{\infty}\theta^{n\bar{x}}\exp\{-(n+\lambda)\theta\}\,d\theta. \]
As the integrand is a kernel of a $\mathrm{Gamma}(n\bar{x}+1, n+\lambda)$ distribution we thus have
\[ f(\lambda \mid x) \propto \frac{\lambda\Gamma(n\bar{x}+1)}{(n+\lambda)^{n\bar{x}+1}} \propto \frac{\lambda}{(n+\lambda)^{n\bar{x}+1}}. \]
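To close, a few quick numerical checks of questions 4 and 5. First, the identity in question 4: integrating the $\mathrm{Exp}(\theta)$ likelihood against $f(\theta) = \exp(-\theta)$ should recover $n!(1+\sum_{i} x_i)^{-(n+1)}$. The data vector below is an illustrative assumption.

```python
# Numerical check of the de Finetti representation in question 4;
# the data vector x is an illustrative assumption.
import numpy as np
from math import factorial
from scipy.integrate import quad

x = np.array([0.3, 1.2, 0.7, 2.1])
n, s = len(x), x.sum()

# likelihood theta^n * exp(-theta * s) times the Gamma(1, 1) prior exp(-theta)
integrand = lambda t: t**n * np.exp(-t * s) * np.exp(-t)
lhs = factorial(n) * (1 + s)**-(n + 1)         # stated joint density
rhs, _ = quad(integrand, 0, np.inf)
assert np.isclose(lhs, rhs)
```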
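For question 5(b), the weighted-average form of the posterior mean can be verified directly; the hyperparameters, Poisson mean and seed below are illustrative assumptions.

```python
# Check of the weighted-average form of the posterior mean in 5(b);
# alpha, beta, n, the Poisson mean and the seed are illustrative.
import numpy as np

rng = np.random.default_rng(6)
alpha, beta, n = 2.0, 0.5, 30
x = rng.poisson(4.0, size=n)
xbar = x.mean()

post_mean = (alpha + n * xbar) / (beta + n)
lam = beta / (beta + n)                        # weight on the prior mean
assert np.isclose(post_mean, lam * (alpha / beta) + (1 - lam) * xbar)
```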
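Finally, the marginalisation in question 5(c): integrating out $\theta$ numerically should give a constant multiple, namely $\Gamma(n\bar{x}+1)$, of the closed form $\lambda/(n+\lambda)^{n\bar{x}+1}$. The simulated data, grid and truncation of the integral are illustrative assumptions.

```python
# Numerical check of the marginal posterior in 5(c): the direct integral
# over theta should be a constant multiple of the closed form; all
# numerical values are illustrative assumptions.
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(5)
n = 5
x = rng.poisson(3.0, size=n)
s = x.sum()                                    # n * xbar

def marginal(lam, upper=60.0):
    # integrate lam * theta^s * exp(-(n + lam) * theta) over theta;
    # the integrand is negligible beyond the finite upper limit
    val, _ = quad(lambda t: lam * t**s * np.exp(-(n + lam) * t), 0, upper)
    return val

grid = np.linspace(0.1, 10, 25)
direct = np.array([marginal(l) for l in grid])
closed = grid / (n + grid)**(s + 1)
ratio = direct / closed                        # should equal Gamma(s + 1) throughout
assert np.allclose(ratio / ratio[0], 1.0, rtol=1e-4)
```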