Bowdoin Math 2606, Spring 2016 – Homework #10
The Beta probability density functions, Bayesian estimation, and conjugate priors.
Due on Friday, April 22 (by 5 p.m., right before the midterm)
This homework assignment is on two pages.
Read sections 7.2 and 7.3 (excluding “Improper prior distributions”) of Probability and Statistics by DG&S.
Also, solve the following problems, some of which may be taken from the textbook. To obtain full credit, please
write clearly and show your reasoning. Please solve the problems in the order given and STAPLE
your homework together. Also, write your name (LAST, First) at the top of the first page.
Problem 10A (the Beta probability density functions). In class we defined the Beta(a, b) density:
f(x) = (1/B(a, b)) x^(a−1) (1 − x)^(b−1) for 0 < x < 1, and f(x) = 0 otherwise,

where B(a, b) = ∫₀¹ x^(a−1) (1 − x)^(b−1) dx is called the Beta function. One can show (though it is a bit laborious) that B(a, b) = Γ(a)Γ(b)/Γ(a + b). Using this fact and the result from Problem 4G (i.e. that Γ(a) = (a − 1)Γ(a − 1) for any a > 1), show that if X ∼ Beta(a, b) then its mean and variance are respectively given by:

E(X) = a/(a + b)   and   Var(X) = ab/[(a + b)² (a + b + 1)]
(do this in the general case: do not assume that a and b are necessarily integers).
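Optional sanity check (not part of the assignment): the short Python sketch below, assuming numpy and scipy are available, computes the mean and variance of a Beta(a, b) density by numerical integration and compares them with the formulas above, for a couple of arbitrary non-integer parameter choices.

    # Numerical sanity check of the Beta(a, b) mean and variance formulas.
    # The parameter values are illustrative only.
    import numpy as np
    from scipy import integrate

    for a, b in [(2.5, 3.7), (0.8, 1.9)]:
        dens = lambda x: x**(a - 1) * (1 - x)**(b - 1)
        B, _ = integrate.quad(dens, 0, 1)                        # the Beta function B(a, b)
        m1, _ = integrate.quad(lambda x: x * dens(x) / B, 0, 1)  # E(X)
        m2, _ = integrate.quad(lambda x: x**2 * dens(x) / B, 0, 1)
        print(m1, a / (a + b))                                   # should agree
        print(m2 - m1**2, a * b / ((a + b)**2 * (a + b + 1)))    # should agree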
Problem 10B. Let's consider our 'favorite' probability distribution again. Suppose that we have n i.i.d. random variables X1, X2, . . . , Xn, each with the following probability function (as usual, 0 ≤ θ ≤ 1 is unknown):
P(X = 0 | θ) = (2/3) θ,
P(X = 1 | θ) = (1/3) θ,
P(X = 2 | θ) = (2/3)(1 − θ),
P(X = 3 | θ) = (1/3)(1 − θ).
Summary from previous episodes: In Problems 6H, 7B, and 8D you found: (1) the MLE is θ̂_ml = (N0 + N1)/n, where Ni = # of Xj's that are equal to i, for i = 0, 1, 2, 3; (2) θ̂_ml is unbiased; (3) Var(θ̂_ml) = θ(1 − θ)/n. Moreover: (4) the MME is θ̂_mm = 7/6 − X̄/2, (5) θ̂_mm is also unbiased, and (6) Var(θ̂_mm) = [1/18 + θ(1 − θ)]/n.
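Optional sanity check (not graded): if you want to double-check formulas (3) and (6) above, the Monte Carlo sketch below (with made-up values for θ, n, and the number of replications) simulates both estimators and compares their empirical variances with the stated formulas.

    # Simulate the MLE and the MME and compare sample variances to the formulas.
    import numpy as np

    rng = np.random.default_rng(1)
    theta, n, reps = 0.4, 200, 20_000                 # illustrative values
    probs = [2*theta/3, theta/3, 2*(1 - theta)/3, (1 - theta)/3]
    x = rng.choice(4, size=(reps, n), p=probs)        # reps samples of size n
    mle = ((x == 0) | (x == 1)).mean(axis=1)          # (N0 + N1)/n for each sample
    mme = 7/6 - x.mean(axis=1)/2                      # 7/6 - (sample mean)/2
    print(mle.var(), theta*(1 - theta)/n)             # formula (3)
    print(mme.var(), (1/18 + theta*(1 - theta))/n)    # formula (6)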
(a) If the prior distribution (p.d.f.) of Θ is Uniform(0, 1), what is the posterior p.d.f. fΘ|X(θ|x)? Hint: As you did in Problem 6H, consider a generic set of sample values x = (x1, . . . , xn) and let ni be the number of observations in the sample that are equal to i (with i = 0, 1, 2, 3). Now apply Bayes' rule:
fΘ|X(θ|x) = fX|Θ(x|θ) fΘ(θ) / fX(x) ∝ fX|Θ(x|θ) fΘ(θ) = f(x1|θ) f(x2|θ) · · · f(xn|θ) fΘ(θ),

i.e. by "ignoring" the denominator, which is part of the normalizing constant of the posterior density. By inserting the f(xi|θ)'s, factoring out the parts that do not depend on θ and merging them into the normalizing constant, you should recognize that the posterior is a Beta(a, b) density. What are a and b?
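Optional numerical companion (not a substitute for the derivation): the sketch below, with a hypothetical sample, evaluates the posterior on a grid by multiplying the f(xi|θ)'s and normalizing only at the end, which is exactly the "ignore the denominator" trick in action.

    # Grid evaluation of the posterior under a Uniform(0, 1) prior.
    import numpy as np

    x = [0, 1, 1, 2, 3, 2]                       # a hypothetical sample
    theta = np.linspace(0, 1, 1001)
    pmf = [2*theta/3, theta/3, 2*(1 - theta)/3, (1 - theta)/3]   # P(X = i | theta)
    like = np.ones_like(theta)
    for xi in x:
        like *= pmf[xi]                          # product of the f(x_i | theta)'s
    post = like / np.trapz(like, theta)          # Uniform prior: just normalize
    print(theta[np.argmax(post)])                # posterior mode on the grid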
(b) If you want a number rather than a p.d.f. as your estimate, there are several options: the conditional expectation is one. Compute θ̂ = E(Θ|X = x), and express it in terms of n0, . . . , n3. Now write the conditional expectation E(Θ|X) as a r.v. and show that, for large n, it is approximately equal to θ̂_ml.
(c) Finally, assume that we have a sample size of n = 10 with sample values x = (3, 0, 2, 1, 3, 2, 1, 0, 2, 1),
like we had in Problem 6H. Compute the expression for the posterior density and roughly plot it. Also,
compute the mode of the posterior, i.e. the value of θ that maximizes the posterior p.d.f. fΘ|X (θ|x).
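Optional check for (c): once you have your answer on paper, a quick plot like the sketch below (matplotlib assumed available) lets you compare your rough plot and your mode; it reuses the same grid trick as above.

    # Plot the posterior for the sample in part (c).
    import numpy as np
    import matplotlib.pyplot as plt

    x = [3, 0, 2, 1, 3, 2, 1, 0, 2, 1]
    theta = np.linspace(0, 1, 1001)
    pmf = [2*theta/3, theta/3, 2*(1 - theta)/3, (1 - theta)/3]
    like = np.ones_like(theta)
    for xi in x:
        like *= pmf[xi]
    plt.plot(theta, like / np.trapz(like, theta))
    plt.xlabel("theta"); plt.ylabel("posterior density")
    plt.show()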
Problem 10C. Let the unknown probability that Alice, a basketball player, makes a shot successfully be θ.
Suppose that your prior on the unknown parameter is Θ ∼ Uniform(0, 1), and that she then makes two
successful shots in a row. Assume that the outcomes of the two shots are independent. Let us compute the
posterior probability density function of Θ in two different ways.
(a) Let Y1 and Y2 be two independent Bernoulli(θ) random variables (where Yi = 0 if the ith shot is unsuccessful, and Yi = 1 if it is successful). Compute the posterior p.d.f. fΘ|Y1,Y2(θ|1, 1) ∝ fY1|Θ(1|θ) fY2|Θ(1|θ) fΘ(θ).
Do you recognize it as a known probability density function?
(b) Now let S = Y1 + Y2. What kind of random variable is S (that is, what family of random variables does it belong to, and with what parameters)? Compute the posterior p.d.f. fΘ|S(θ|2) ∝ fS|Θ(2|θ) fΘ(θ).
Do you recognize it as a known p.d.f.? Remark: You should get the same result as in part (a). This is
because S = h(Y1 , Y2 ) is a so-called sufficient statistic for the parameter θ: roughly speaking, while you
“lose” some information by summing the two random variables Y1 and Y2 (knowing Y1 and Y2 allows
you to compute S, but knowing only S does not allow you to recover the separate values of Y1 and Y2 ),
the information provided by S is “sufficient” to compute the posterior probability density function of Θ.
(c) Knowing the posterior probability density function, i.e. fΘ|Y1 ,Y2 (θ|1, 1) = fΘ|S (θ|2) for θ ∈ (0, 1), what
would you estimate the probability that she makes a successful third shot to be?
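Optional intuition builder for (c), assuming nothing beyond numpy (the number of draws is arbitrary): sample θ from the prior, keep only the simulated "worlds" in which both shots succeed, and see how often a third shot succeeds in those worlds.

    # Monte Carlo estimate of P(third success | two successes) under the Uniform prior.
    import numpy as np

    rng = np.random.default_rng(0)
    theta = rng.uniform(size=1_000_000)          # draws from the Uniform(0, 1) prior
    y1 = rng.random(theta.size) < theta          # first shot
    y2 = rng.random(theta.size) < theta          # second shot
    kept = theta[y1 & y2]                        # condition on two successes
    y3 = rng.random(kept.size) < kept            # third shot in the kept worlds
    print(y3.mean())                             # compare with your answer to (c)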
Problem 10D (Conjugacy of the Beta p.d.f. with respect to the Binomial likelihood). Suppose
that we are flipping a biased coin n times, and that the probability θ of landing heads is unknown: we are
going to adopt the Bayesian approach to estimate the parameter. Define the random variables X1, X2, . . . , Xn as follows: Xi = 1 if the ith toss lands heads, and Xi = 0 otherwise. Therefore the r.v. S = X1 + · · · + Xn is the
total number of heads out of n tosses. Also, assume that P (Xi = 1|Θ = θ) = θ, P (Xi = 0|Θ = θ) = 1 − θ,
and that the prior on the unknown parameter is Θ ∼ Beta(a, b), for some known parameters a > 0 and b > 0.
(a) What is the conditional probability distribution of S|{Θ = θ}? (That is, what is the likelihood function?)
(b) What is the posterior probability density function of Θ|{S = s}, where 0 ≤ s ≤ n?
(c) What is the posterior mean E(Θ|S)? What is it approximately equal to, for a large sample size n?
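Optional check (scipy assumed; all parameter values made up): after solving (b) and (c) on paper, the grid sketch below computes the posterior and its mean numerically, without using any closed form, so you can compare against your answers.

    # Grid posterior for a Beta(a, b) prior and a binomial likelihood.
    import numpy as np
    from scipy.stats import beta as beta_dist

    a, b, n, s = 2.0, 3.0, 50, 31                   # hypothetical values
    theta = np.linspace(0, 1, 2001)[1:-1]           # open interval, avoids 0 and 1
    unnorm = beta_dist.pdf(theta, a, b) * theta**s * (1 - theta)**(n - s)
    post = unnorm / np.trapz(unnorm, theta)
    print(np.trapz(theta * post, theta))            # posterior mean E(Theta | S = s)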
Problem 10E. Let X1, X2, . . . , Xn be a random sample from a N(θ, σ²) distribution, where σ² is known (and we will treat it as a constant) and θ is unknown (so, we will treat it as a r.v.). Define S = X1 + · · · + Xn.
(a) Show that, when the prior distribution of the unknown parameter is Θ ∼ N (µ, τ 2 ), its posterior is
Θ|{S = s} ∼ N(µn, τn²), where

µn = (s/σ² + µ/τ²) / (n/σ² + 1/τ²)   and   τn² = 1 / (n/σ² + 1/τ²).
Hint: Once again, write fΘ|S(θ|s) ∝ fS|Θ(s|θ) fΘ(θ), therefore considering the denominator as part of the normalizing constant of the p.d.f. (which is a function of s). Insert the appropriate expressions for the two densities. The computation is not at all trivial, but it is greatly simplified if, as you proceed, you "get rid" of the factors that do not explicitly depend on θ by incorporating them into the normalizing constant. You will need to recognize, as the exponent of e, an expression of the type −(1/2)(θ² − 2µn θ + . . .)/τn².
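Optional numerical check of the formulas in (a) (all numbers made up): the sketch below builds the posterior on a grid, using the fact that S|{Θ = θ} ∼ N(nθ, nσ²), and compares its mean and variance with µn and τn².

    # Compare the grid posterior of Theta given S = s with N(mu_n, tau_n^2).
    import numpy as np

    sigma, mu, tau, n, s = 2.0, 1.0, 1.5, 25, 30.0       # hypothetical values
    theta = np.linspace(-4, 6, 4001)
    # f(s | theta) * f(theta): S | theta ~ N(n*theta, n*sigma^2), prior N(mu, tau^2)
    unnorm = np.exp(-(s - n*theta)**2 / (2*n*sigma**2) - (theta - mu)**2 / (2*tau**2))
    post = unnorm / np.trapz(unnorm, theta)
    mean_num = np.trapz(theta * post, theta)
    var_num = np.trapz((theta - mean_num)**2 * post, theta)
    mu_n = (s/sigma**2 + mu/tau**2) / (n/sigma**2 + 1/tau**2)
    tau_n2 = 1 / (n/sigma**2 + 1/tau**2)
    print(mean_num, mu_n)                                # should agree
    print(var_num, tau_n2)                               # should agree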
(b) Show that, for large n, the conditional expectation E(Θ|S) is approximately given by the sample mean
of the observations X1 , . . . , Xn . What is Var(Θ|S) approximately equal to, for large n?