Q1.
Given X ~ Bin(N, p) with N = 100 and p = 0.37.
Ten random values of X were generated in R:
X1=36, X2=41, X3=47, X4=40, X5=43, X6=39, X7=47, X8=38, X9=40, X10=41.
The maximum likelihood method gives $\hat{P}$ = 0.412 with 95% CI = [0.3155298, 0.5084702].
The Bayesian approach, taking as prior Beta(1,1), which is uniform on [0,1], gives
$\hat{P}$ = 0.4121756 with 95% CI = [0.38173, 0.44272].
The bias of the MLE is $\hat{P} - p$ = 0.412 - 0.37 = 0.042, slightly smaller than the bias of the
Bayesian estimate, $\hat{P} - p$ = 0.0421756.
With MSE = $\sum(\hat{P} - p)^2$, the MSE of the MLE is 0.0292, much larger than the MSE of the
Bayesian estimate, 0.001778785. The two figures are computed on different bases: the MLE value sums
the squared errors of the ten individual estimates X_i/100, while the Bayesian value is the squared
error of the single pooled posterior mean (the pooled MLE would give 0.042^2 = 0.001764).
The MLE interval is the wider of the two; its standard error uses 100 trials, whereas the posterior pools all 1000.
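The maximum likelihood figures quoted above can be reproduced from the ten listed values. The following is a minimal sketch rather than the original script: the names `phat_i`, `sse`, and `ci_pooled` are illustrative, and the pooled interval is an added comparison that assumes the ten values are treated as a single sample of 1000 trials.

```r
# Ten values of X listed in the text, each from n = 100 trials
X <- c(36, 41, 47, 40, 43, 39, 47, 38, 40, 41)
n <- 100
p <- 0.37

phat_i <- X / n                 # one estimate per value of X
phat   <- mean(phat_i)          # averaged estimate
bias   <- phat - p
sse    <- sum((phat_i - p)^2)   # summed squared error, the "MSE" in the text

# Wald interval with the standard error of a single sample of n trials
se <- sqrt(phat * (1 - phat) / n)
ci <- phat + c(-1.96, 1.96) * se

# Comparison (assumption): standard error when all n * length(X) trials are pooled
se_pooled <- sqrt(phat * (1 - phat) / (n * length(X)))
ci_pooled <- phat + c(-1.96, 1.96) * se_pooled

print(c(phat = phat, bias = bias, sse = sse))
print(ci)
print(ci_pooled)
```

The pooled interval is narrower by a factor of sqrt(10), which accounts for most of the difference in width between the two reported intervals.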
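The R code used is given below.
N=10      # number of simulated values of X
n=100     # trials per value (N in the problem statement)
p=0.37    # true success probability
X=rbinom(N,n,p)
# ---------------- maximum likelihood ----------------
phat=numeric(N)   # must exist before the loop assigns into it
mse=0
for(i in 1:N){
phat[i]=X[i]/n   # estimate from the i-th value
mse=mse+(phat[i]-p)^2   # running sum of squared errors
}
phat=sum(phat)/N   # average of the individual estimates
se=sqrt(phat*(1-phat)/n)   # standard error based on n trials
phat;mse;phat-1.96*se;phat+1.96*se
# ---------------- Bayesian, Beta(1,1) prior ----------------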
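alpha=1;beta=1   # Beta(1,1) prior
x=seq(0.00001,0.99999,0.00001)
p.beta=dbeta(x,alpha,beta)   # prior density on a grid (not used below)
pv=seq(0.00001,0.99999,0.00001)   # grid of candidate values of p
xv=sum(X);nv=n*N   # pooled successes and pooled trials
# log of the unnormalised posterior (prior times likelihood)
logf1=(xv+alpha-1)*log(pv)+(beta+nv-xv-1)*log(1-pv)
f2=exp(logf1-max(logf1))   # subtract the maximum to avoid underflow
intf2=sum(f2)*(pv[2]-pv[1])   # Riemann-sum normalising constant
post=f2/intf2   # posterior density on the grid
pcdf=cumsum(post)/sum(post)   # posterior CDF on the grid
pmean=sum(pv*post)/sum(post)   # posterior mean
# lower the density threshold until the region above it holds 95% of the mass
for(i in seq(0.999,0.001,-0.001))
{
threshold=i*max(post)
within=which(post>=threshold)
coverage=pcdf[max(within)]-pcdf[min(within)]
if(coverage>=0.95) break
}
CI=pv[range(within)]   # 95% highest posterior density interval
mse=(pmean-p)^2   # squared error of the posterior mean
pmean;mse;CI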
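Because the Beta(1,1) prior is conjugate to the binomial likelihood, the grid result for Q1 can be cross-checked in closed form. The sketch below assumes the ten values listed in the text and uses an equal-tailed interval from `qbeta`, which is not the same as the highest posterior density interval computed by the grid search, although the two are close when the posterior is nearly symmetric. The variable names are illustrative.

```r
# Ten values of X listed in the text, each from n = 100 trials
X <- c(36, 41, 47, 40, 43, 39, 47, 38, 40, 41)
n <- 100
alpha <- 1; beta <- 1   # Beta(1,1) prior

# Conjugate update: Beta(alpha + successes, beta + failures)
shape1 <- alpha + sum(X)
shape2 <- beta + n * length(X) - sum(X)

post_mean     <- shape1 / (shape1 + shape2)   # 413 / 1002
ci_equal_tail <- qbeta(c(0.025, 0.975), shape1, shape2)

print(post_mean)
print(ci_equal_tail)
```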
Q2.
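Since P ∈ [0,1], we take Beta(1,1) as the prior.
We then draw one random number from Binomial(342, 0.1) and denote it xv.
The remaining steps are the same as for the Thai HIV test.
The posterior mean and 95% highest posterior density interval obtained are:
> pmean;CI
[1] 0.125
[1] 0.09085 0.16040
R code: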
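nv=342   # number of trials
p=0.1    # true success probability
xv=rbinom(1,nv,p)   # one simulated count of successes
alpha=1;beta=1   # Beta(1,1) prior
pv=seq(0.00001,0.99999,0.00001)   # grid of candidate values of p
logf1=(xv+alpha-1)*log(pv)+(beta+nv-xv-1)*log(1-pv)   # log unnormalised posterior
f2=exp(logf1-max(logf1))
intf2=sum(f2)*(pv[2]-pv[1])
post=f2/intf2
pcdf=cumsum(post)/sum(post)
pmean=sum(pv*post)/sum(post)
# lower the density threshold until the region above it holds 95% of the mass
for(i in seq(0.999,0.001,-0.001))
{
threshold=i*max(post)
within=which(post>=threshold)
coverage=pcdf[max(within)]-pcdf[min(within)]
if(coverage>=0.95) break
}
CI=pv[range(within)]   # 95% highest posterior density interval
pmean;CI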
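The same grid-and-threshold search is repeated in every question, so it could be wrapped in a helper. The following is a sketch under stated assumptions: `hpd_grid` is a hypothetical name, not a function from any package, and the value xv = 42 is inferred from the reported posterior mean 0.125 = (xv + 1)/344 rather than stated in the text.

```r
# Hypothetical helper: grid posterior for a binomial count with a Beta prior,
# returning the posterior mean and an approximate HPD interval.
hpd_grid <- function(xv, nv, alpha = 1, beta = 1, level = 0.95,
                     pv = seq(0.00001, 0.99999, 0.00001)) {
  # log unnormalised posterior, shifted by its maximum to avoid underflow
  logf <- (xv + alpha - 1) * log(pv) + (beta + nv - xv - 1) * log(1 - pv)
  f    <- exp(logf - max(logf))
  post <- f / (sum(f) * (pv[2] - pv[1]))
  pcdf <- cumsum(post) / sum(post)
  pmean <- sum(pv * post) / sum(post)
  # lower the density threshold until the region above it reaches the level
  for (i in seq(0.999, 0.001, -0.001)) {
    within   <- which(post >= i * max(post))
    coverage <- pcdf[max(within)] - pcdf[min(within)]
    if (coverage >= level) break
  }
  list(mean = pmean, interval = pv[range(within)])
}

# Assumed count: xv = 42 out of nv = 342
res <- hpd_grid(xv = 42, nv = 342)
print(res$mean)
print(res$interval)
```

Passing a narrower `pv` grid, such as `seq(0.00001, 0.05, 0.00001)`, would mirror the grid used for the later questions.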
Q3.
Since $P_v \in [0,1]$ and a proper prior should satisfy $\int_0^1 p(x)\,dx = 1$, a beta distribution is a good choice.
For $P_v \sim \mathrm{Be}(\alpha, \beta)$, we have
\[
P(P_v) = \frac{P_v^{\alpha-1} (1-P_v)^{\beta-1}}{\mathrm{Be}(\alpha,\beta)},
\]
where $\mathrm{Be}(\alpha,\beta)$ in the denominator is the normalizing beta function.
With
\[
P(X_v \mid N_v, P_v) = \binom{N_v}{X_v} P_v^{X_v} (1-P_v)^{N_v - X_v},
\]
we have that
\[
\begin{aligned}
P(P_v \mid N_v, X_v)
&\propto \frac{P_v^{\alpha-1} (1-P_v)^{\beta-1}}{\mathrm{Be}(\alpha,\beta)} \binom{N_v}{X_v} P_v^{X_v} (1-P_v)^{N_v - X_v} \\
&\propto P_v^{\alpha-1+X_v} (1-P_v)^{\beta-1+N_v-X_v} \\
&\propto \frac{P_v^{(X_v+\alpha)-1} (1-P_v)^{\beta+N_v-X_v-1}}{\mathrm{Be}(X_v+\alpha,\ \beta+N_v-X_v)}.
\end{aligned}
\]
So $P_v \mid N_v, X_v \sim \mathrm{Be}(X_v+\alpha,\ \beta+N_v-X_v)$.
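The derivation above can be checked numerically by normalizing prior times likelihood on a grid and comparing it with the closed-form Beta density. This is a minimal sketch, assuming the values xv = 51 and nv = 8197 used in the code for this question and a Beta(1,1) prior; the names `post_grid`, `post_exact`, and `max_rel_err` are illustrative.

```r
alpha <- 1; beta <- 1
xv <- 51; nv <- 8197
pv <- seq(0.00001, 0.05, 0.00001)   # the posterior mass lies well inside this range

# Prior times likelihood on the log scale, using the built-in densities
log_unnorm <- dbeta(pv, alpha, beta, log = TRUE) +
  dbinom(xv, nv, pv, log = TRUE)
unnorm    <- exp(log_unnorm - max(log_unnorm))
post_grid <- unnorm / (sum(unnorm) * (pv[2] - pv[1]))   # Riemann-sum normalisation

# Closed form from the derivation: Be(xv + alpha, beta + nv - xv)
post_exact <- dbeta(pv, xv + alpha, beta + nv - xv)

# Largest discrepancy relative to the peak of the density
max_rel_err <- max(abs(post_grid - post_exact)) / max(post_exact)
print(max_rel_err)
```

A value of `max_rel_err` close to zero indicates that the grid posterior and the conjugate result agree up to discretization error.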
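alpha=1;beta=1   # prior parameters; changed for each row of the table below
x=seq(0.00001,0.99999,0.00001)
p.beta=dbeta(x,alpha,beta)   # prior density on a grid
plot(x,p.beta)   # plot the prior
pv=seq(0.00001,0.05,0.00001)   # grid restricted to where the posterior has mass
xv=51;nv=8197
logf1=(xv+alpha-1)*log(pv)+(beta+nv-xv-1)*log(1-pv)   # log unnormalised posterior
f2=exp(logf1-max(logf1))
intf2=sum(f2)*(pv[2]-pv[1])
post=f2/intf2
pcdf=cumsum(post)/sum(post)
pmean=sum(pv*post)/sum(post)   # posterior mean reported in the table
# lower the density threshold until the region above it holds 95% of the mass
for(i in seq(0.999,0.001,-0.001))
{
threshold=i*max(post)
within=which(post>=threshold)
coverage=pcdf[max(within)]-pcdf[min(within)]
if(coverage>=0.95) break
}
CI=pv[range(within)]   # 95% highest posterior density interval
plot(post)   # plot the posterior
pmean;CI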
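Posterior mean and 95% highest posterior density interval for different Beta(Alpha, Beta) priors:

Alpha   Beta   Pmean         CI
2       2      0.006462627   (0.00477, 0.00822)
1/2     1      0.006281637   (0.00461, 0.00802)
1       1/2    0.00628202    (0.00461, 0.00802)
1       1      0.006342237   (0.00467, 0.00809)
1/2     1/2    0.00628202    (0.00461, 0.00802)

The third and fifth rows report identical values. For Alpha = 1, Beta = 1/2 the conjugate posterior
mean (xv + Alpha)/(nv + Alpha + Beta) = 52/8198.5 is about 0.00634, so the third row may need to be re-run.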
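The sensitivity of the posterior to the prior can also be computed in closed form from the conjugate result Be(xv + alpha, beta + nv - xv). The sketch below assumes xv = 51 and nv = 8197 and the five prior pairs listed above; the intervals are equal-tailed quantile intervals from `qbeta`, not highest posterior density intervals, so they will differ slightly from the grid values. Column names are illustrative.

```r
xv <- 51; nv <- 8197

# The five (alpha, beta) prior pairs considered in the text
priors <- data.frame(alpha = c(2, 1/2, 1,   1, 1/2),
                     beta  = c(2, 1,   1/2, 1, 1/2))

# Conjugate posterior parameters for each prior
priors$shape1 <- xv + priors$alpha
priors$shape2 <- priors$beta + nv - xv

# Closed-form posterior mean and equal-tailed 95% interval
priors$mean  <- priors$shape1 / (priors$shape1 + priors$shape2)
priors$lower <- qbeta(0.025, priors$shape1, priors$shape2)
priors$upper <- qbeta(0.975, priors$shape1, priors$shape2)

print(priors, digits = 7)
```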
Q4.
Since $P_v \sim \mathrm{Be}(1+X_v,\ 1+N_v-X_v)$, we have
\[
P(P_v) = \frac{P_v^{X_v} (1-P_v)^{N_v - X_v}}{\mathrm{Be}(1+X_v,\ 1+N_v-X_v)}.
\]
With
\[
P(X_v \mid N_v, P_v) = \binom{N_v}{X_v} P_v^{X_v} (1-P_v)^{N_v - X_v},
\]
we have that
\[
\begin{aligned}
P(P_v \mid N_v, X_v)
&\propto \frac{P_v^{X_v} (1-P_v)^{N_v - X_v}}{\mathrm{Be}(1+X_v,\ 1+N_v-X_v)} \binom{N_v}{X_v} P_v^{X_v} (1-P_v)^{N_v - X_v} \\
&\propto P_v^{2X_v} (1-P_v)^{2(N_v-X_v)} \\
&\propto \frac{P_v^{(2X_v+1)-1} (1-P_v)^{[2(N_v-X_v)+1]-1}}{\mathrm{Be}(2X_v+1,\ 2(N_v-X_v)+1)}.
\end{aligned}
\]
So $P_v \mid N_v, X_v \sim \mathrm{Be}(2X_v+1,\ 2(N_v-X_v)+1)$.
The 95% highest posterior density interval obtained for $P_v$ is (0.00509, 0.00751).
R code for a 95% highest posterior density interval for $P_v$:
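pv=seq(0.00001,0.05,0.00001)   # grid restricted to where the posterior has mass
xv=51;nv=8197
logf1=2*xv*log(pv)+2*(nv-xv)*log(1-pv)   # log kernel of Be(2*xv+1, 2*(nv-xv)+1)
f2=exp(logf1-max(logf1))   # subtract the maximum to avoid underflow
intf2=sum(f2)*(pv[2]-pv[1])
post=f2/intf2
pcdf=cumsum(post)/sum(post)
# lower the density threshold until the region above it holds 95% of the mass
for(i in seq(0.999,0.001,-0.001))
{
threshold=i*max(post)
within=which(post>=threshold)
coverage=pcdf[max(within)]-pcdf[min(within)]
if(coverage>=0.95) break
}
CI=pv[range(within)]   # 95% highest posterior density interval
CI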
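As a cross-check on the grid search, the highest posterior density interval of a unimodal Beta posterior can be found as the shortest interval with the required coverage. The following is a sketch of that alternative technique, not the method used above: it assumes xv = 51 and nv = 8197, and the names `width`, `opt`, and `hpd` are illustrative.

```r
xv <- 51; nv <- 8197
shape1 <- 2 * xv + 1          # posterior Be(2*xv + 1, 2*(nv - xv) + 1)
shape2 <- 2 * (nv - xv) + 1
level  <- 0.95

# Width of the interval that leaves probability `lower_tail` below it
# and covers `level` in total
width <- function(lower_tail) {
  qbeta(lower_tail + level, shape1, shape2) - qbeta(lower_tail, shape1, shape2)
}

# For a unimodal density the shortest such interval is the HPD interval
opt <- optimize(width, interval = c(0, 1 - level), tol = 1e-10)
hpd <- qbeta(c(opt$minimum, opt$minimum + level), shape1, shape2)

print(hpd)
```

The endpoints should agree with the grid result up to the grid spacing and the 0.001 step used for the density threshold.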