Q1. Given X ~ Bin(N,p)

Q1.
Given X ~ Bin(N,p), N=100, p=0.37.
Generate ten random values of X using R, giving
X1=36, X2=41, X3=47, X4=40, X5=43, X6=39, X7=47, X8=38, X9=40, X10=41.
The maximum likelihood method gives P̂ = 0.412, CI = [0.3155298, 0.5084702].
The Bayesian approach, with a Beta(1,1) prior (which is uniform on [0,1]), gives
P̂ = 0.4121756, CI = [0.38173, 0.44272].
The bias of the MLE is P̂ − p = 0.412 − 0.37 = 0.042, which is slightly smaller than the bias of the
Bayesian estimate, P̂ − p = 0.0421756.
MSE = ∑(P̂ᵢ − p)^2. For the MLE, summing over the ten per-draw estimates P̂ᵢ = Xᵢ/100 gives 0.0292, which is much larger than
the value for the Bayesian method, (0.4121756 − p)^2 = 0.001778785. Note that the Bayesian figure is a single squared error of the pooled posterior mean, so the two numbers are not on the same scale.
The CI from the MLE method is wider: its standard error uses a sample size of 100, whereas the posterior pools all 1000 trials.
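The figures above can be cross-checked in closed form, because a Beta(1,1) prior combined with 412 successes in 1000 trials gives a Beta(413, 589) posterior. The following is a minimal sketch that recomputes the reported numbers from the ten listed draws. The variable names are illustrative, and the equal-tailed interval from `qbeta` is an assumption swapped in for the highest-density interval, so it will differ slightly from the reported one.

```r
# The ten draws reported in the text
X <- c(36, 41, 47, 40, 43, 39, 47, 38, 40, 41)
n <- 100      # trials per draw
p <- 0.37     # true success probability

# MLE side: per-draw estimates, their average, and a Wald interval
phat_i <- X / n
phat   <- mean(phat_i)
se     <- sqrt(phat * (1 - phat) / n)
wald   <- phat + c(-1.96, 1.96) * se
sse    <- sum((phat_i - p)^2)          # the "MSE" as defined in the text (a sum)

# Bayesian side: conjugate Beta posterior under a Beta(1,1) prior
a <- sum(X) + 1
b <- length(X) * n - sum(X) + 1
post_mean <- a / (a + b)
post_ci   <- qbeta(c(0.025, 0.975), a, b)  # equal-tailed, not HPD

phat; wald; sse
post_mean; post_ci
```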
The R code I used is given below.
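N=10     # number of draws
n=100    # trials per draw
p=0.37
X=rbinom(N,n,p)
# ---- MLE ----
mse=0
phat=numeric(N)   # must exist before it is filled in the loop
for(i in 1:N){
phat[i]=X[i]/n    # per-draw estimate; X was already drawn above, so it is not redrawn here
mse=mse+(phat[i]-p)^2
}
phat=sum(phat)/N  # average of the per-draw estimates
se=sqrt(phat*(1-phat)/n)
phat;mse;phat-1.96*se;phat+1.96*se
# ---- Bayesian, Beta(1,1) prior ----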
alpha=1;beta=1
x=seq(0.00001,0.99999,0.00001)
p.beta=dbeta(x,alpha,beta)   # prior density on a grid (not used below)
pv=seq(0.00001,0.99999,0.00001)
xv=sum(X);nv=n*N             # pool all draws: total successes and total trials
# log of the unnormalised posterior, shifted by its maximum to avoid underflow
logf1=(xv+alpha-1)*log(pv)+(beta+nv-xv-1)*log(1-pv)
f2=exp(logf1-max(logf1))
intf2=sum(f2)*(pv[2]-pv[1])
post=f2/intf2
pcdf=cumsum(post)/sum(post)
pmean=sum(pv*post)/sum(post)
# lower the density threshold until the region above it holds 95% of the mass
for(i in seq(0.999,0.001,-0.001))
{
threshold=i*max(post)
within= which(post>=threshold)
coverage=pcdf[max(within)]-pcdf[min(within)]
if(coverage>=0.95) break
}
CI=pv[range(within)]
mse=(pmean-p)^2
pmean;mse;CI
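A single set of ten draws gives only one estimation error per method, so bias and MSE in the usual sense are better estimated by repeating the experiment many times. The sketch below is one way to do that, not the method used above. The seed, the number of replications `reps`, and the variable names are assumptions. It relies on the fact that the sum of ten independent Bin(100, p) draws is Bin(1000, p), and on the closed-form posterior mean under a Beta(1,1) prior.

```r
set.seed(1)          # arbitrary seed, for reproducibility only
n <- 100; N <- 10    # trials per draw, draws per experiment
p <- 0.37
reps <- 5000         # number of simulated experiments (assumed)

# Total successes per experiment: sum of N Bin(n, p) draws is Bin(n * N, p)
tot <- rbinom(reps, n * N, p)

mle   <- tot / (n * N)             # pooled maximum likelihood estimate
bayes <- (tot + 1) / (n * N + 2)   # posterior mean under a Beta(1,1) prior

bias <- c(mle = mean(mle) - p, bayes = mean(bayes) - p)
mse  <- c(mle = mean((mle - p)^2), bayes = mean((bayes - p)^2))
bias; mse
```

With 1000 pooled trials the two estimators are nearly identical, so their simulated bias and MSE should be close to each other.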
Q2.
Since P∈[0,1], we take Beta(1,1) as the prior.
We then generate one random number from Binomial(342,0.1) and denote it xv.
The rest of the calculation is the same as for the Thai HIV test.
> pmean;CI
[1] 0.125
[1] 0.09085 0.16040
nv=342
p=0.1
xv=rbinom(1,nv,p)   # one simulated count of successes
alpha=1;beta=1      # Beta(1,1) prior
pv=seq(0.00001,0.99999,0.00001)
logf1=(xv+alpha-1)*log(pv)+(beta+nv-xv-1)*log(1-pv)
f2=exp(logf1-max(logf1))
intf2=sum(f2)*(pv[2]-pv[1])
post=f2/intf2
pcdf=cumsum(post)/sum(post)
pmean=sum(pv*post)/sum(post)
for(i in seq(0.999,0.001,-0.001))
{
threshold=i*max(post)
within=which(post>=threshold)
coverage=pcdf[max(within)]-pcdf[min(within)]
if(coverage>=0.95) break
}
CI=pv[range(within)]   # 95% highest posterior density interval
pmean;CI
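Since the same grid calculation recurs in every question, it can be wrapped in a function. The following is a sketch under that assumption. The name `hpd_grid` and its arguments are hypothetical, not part of the original code. The usage line takes xv=51 and nv=8197 from Q3 so that the result can be compared with the values reported there.

```r
# Grid approximation of a Beta-Binomial posterior: mean and HPD interval.
# hpd_grid and its argument names are illustrative, not from the original.
hpd_grid <- function(xv, nv, alpha = 1, beta = 1, level = 0.95, step = 1e-5) {
  pv <- seq(step, 1 - step, by = step)
  # log unnormalised posterior, shifted by its maximum to avoid underflow
  logf <- (xv + alpha - 1) * log(pv) + (beta + nv - xv - 1) * log(1 - pv)
  f <- exp(logf - max(logf))
  post <- f / (sum(f) * step)
  pcdf <- cumsum(post) / sum(post)
  # lower the density threshold until the region above it reaches `level`
  for (i in seq(0.999, 0.001, by = -0.001)) {
    within <- which(post >= i * max(post))
    coverage <- pcdf[max(within)] - pcdf[min(within)]
    if (coverage >= level) break
  }
  list(pmean = sum(pv * post) / sum(post), CI = pv[range(within)])
}

res <- hpd_grid(xv = 51, nv = 8197)   # Q3 data with a Beta(1,1) prior
res$pmean; res$CI
```

For Q2, the same function would be called as `hpd_grid(xv, 342)` after drawing `xv`.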
Q3.
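Since $P_v \in [0,1]$, a proper prior should satisfy $\int_0^1 p(x)\,dx = 1$, thus a beta distribution is a good choice.
For $P_v \sim \mathrm{Be}(\alpha, \beta)$, we have
\[ P(P_v) = \frac{P_v^{\alpha-1}(1-P_v)^{\beta-1}}{\mathrm{Be}(\alpha,\beta)}. \]
With
\[ P(X_v \mid N_v, P_v) = \binom{N_v}{X_v} P_v^{X_v} (1-P_v)^{N_v-X_v}, \]
we have that
\[
\begin{aligned}
P(P_v \mid N_v, X_v) &\propto \frac{P_v^{\alpha-1}(1-P_v)^{\beta-1}}{\mathrm{Be}(\alpha,\beta)} \binom{N_v}{X_v} P_v^{X_v} (1-P_v)^{N_v-X_v} \\
&\propto P_v^{\alpha-1+X_v} (1-P_v)^{\beta-1+N_v-X_v} \\
&\propto \frac{P_v^{(X_v+\alpha)-1} (1-P_v)^{\beta+N_v-X_v-1}}{\mathrm{Be}(X_v+\alpha,\ \beta+N_v-X_v)}.
\end{aligned}
\]
So $P_v \mid N_v, X_v \sim \mathrm{Be}(X_v+\alpha,\ \beta+N_v-X_v)$.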
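The conjugacy result can be checked numerically by comparing the grid-normalised posterior with the exact density from `dbeta`. This is a small sketch using the Q3 data (xv=51, nv=8197) and a Beta(1,1) prior. The names `exact` and `max_rel_diff` are illustrative.

```r
alpha <- 1; beta <- 1
xv <- 51; nv <- 8197
pv <- seq(0.00001, 0.05, 0.00001)

# Grid posterior: prior times likelihood, normalised by a Riemann sum
logf1 <- (xv + alpha - 1) * log(pv) + (beta + nv - xv - 1) * log(1 - pv)
f2 <- exp(logf1 - max(logf1))
post <- f2 / (sum(f2) * (pv[2] - pv[1]))

# Exact posterior density from the derivation: Be(xv + alpha, beta + nv - xv)
exact <- dbeta(pv, xv + alpha, beta + nv - xv)

# Largest discrepancy, relative to the peak of the exact density
max_rel_diff <- max(abs(post - exact)) / max(exact)
max_rel_diff
```

A value close to zero indicates that the grid posterior and the Beta density agree on this range.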
alpha=1;beta=1                  # prior parameters; change these to reproduce each row of the table
x=seq(0.00001,0.99999,0.00001)
p.beta=dbeta(x,alpha,beta)
plot(x,p.beta)                  # prior density
pv=seq(0.00001,0.05,0.00001)    # the posterior mass lies well below 0.05
xv=51;nv=8197
logf1=(xv+alpha-1)*log(pv)+(beta+nv-xv-1)*log(1-pv)
f2=exp(logf1-max(logf1))
intf2=sum(f2)*(pv[2]-pv[1])
post=f2/intf2
pcdf=cumsum(post)/sum(post)
pmean=sum(pv*post)/sum(post)    # posterior mean reported in the table
for(i in seq(0.999,0.001,-0.001))
{
threshold=i*max(post)
within= which(post>=threshold)
coverage=pcdf[max(within)]-pcdf[min(within)]
if(coverage>=0.95) break
}
CI=pv[range(within)]
pmean;CI
plot(pv,post)                   # posterior density
Alpha   Beta   Pmean         CI
2       2      0.006462627   0.00477 0.00822
1/2     1      0.006281637   0.00461 0.00802
1       1/2    0.00628202    0.00461 0.00802
1       1      0.006342237   0.00467 0.00809
1/2     1/2    0.00628202    0.00461 0.00802
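Because the posterior is Be(xv+α, β+nv−xv), its mean is (xv+α)/(nv+α+β) in closed form, which gives a quick cross-check on the grid values for each prior. The sketch below assumes that check. The data frame `priors` and its column names are illustrative, and the equal-tailed `qbeta` limits are a stand-in for the HPD limits, so they will not match the CI column exactly. As an example, α=1, β=1/2 gives 52/8198.5 ≈ 0.00634, so that row of the table may be worth re-running.

```r
xv <- 51; nv <- 8197

# The five priors considered, in the same order as the table
priors <- data.frame(alpha = c(2, 0.5, 1, 1, 0.5),
                     beta  = c(2, 1, 0.5, 1, 0.5))

# Posterior Beta parameters under each prior
a_post <- xv + priors$alpha
b_post <- nv - xv + priors$beta

priors$pmean <- a_post / (a_post + b_post)       # closed-form posterior mean
priors$lower <- qbeta(0.025, a_post, b_post)     # equal-tailed 95% limits
priors$upper <- qbeta(0.975, a_post, b_post)

print(priors, digits = 6)
```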
Q4.
Since $P_v \sim \mathrm{Be}(1+X_v,\ 1+N_v-X_v)$, we have
\[ P(P_v) = \frac{P_v^{X_v}(1-P_v)^{N_v-X_v}}{\mathrm{Be}(1+X_v,\ 1+N_v-X_v)}. \]
With
\[ P(X_v \mid N_v, P_v) = \binom{N_v}{X_v} P_v^{X_v} (1-P_v)^{N_v-X_v}, \]
we have that
\[
\begin{aligned}
P(P_v \mid N_v, X_v) &\propto \frac{P_v^{X_v}(1-P_v)^{N_v-X_v}}{\mathrm{Be}(1+X_v,\ 1+N_v-X_v)} \binom{N_v}{X_v} P_v^{X_v} (1-P_v)^{N_v-X_v} \\
&\propto P_v^{2X_v} (1-P_v)^{2(N_v-X_v)} \\
&\propto \frac{P_v^{(2X_v+1)-1} (1-P_v)^{[2(N_v-X_v)+1]-1}}{\mathrm{Be}(2X_v+1,\ 2(N_v-X_v)+1)}.
\end{aligned}
\]
So $P_v \mid N_v, X_v \sim \mathrm{Be}(2X_v+1,\ 2(N_v-X_v)+1)$.
The resulting 95% highest posterior density interval for 𝑃𝑣 is (0.00509, 0.00751).
R code for a 95% highest posterior density interval for 𝑃𝑣 :
pv=seq(0.00001,0.05,0.00001)
xv=51;nv=8197
# log of the unnormalised posterior Be(2*xv+1, 2*(nv-xv)+1)
logf1=2*xv*log(pv)+2*(nv-xv)*log(1-pv)
f2=exp(logf1-max(logf1))
intf2=sum(f2)*(pv[2]-pv[1])
post=f2/intf2
pcdf=cumsum(post)/sum(post)
# lower the density threshold until the region above it holds 95% of the mass
for(i in seq(0.999,0.001,-0.001))
{
threshold=i*max(post)
within= which(post>=threshold)
coverage=pcdf[max(within)]-pcdf[min(within)]
if(coverage>=0.95) break
}
CI=pv[range(within)]
CI
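As a cross-check on the grid result, the posterior Be(2*xv+1, 2*(nv-xv)+1) can be evaluated directly with R's Beta functions. This is a sketch under that assumption. The names `a`, `b`, and `ci_eq` are illustrative, and the interval is equal-tailed rather than highest-density, so it should sit close to, but not exactly on, (0.00509, 0.00751).

```r
xv <- 51; nv <- 8197

# Posterior parameters from the derivation above
a <- 2 * xv + 1
b <- 2 * (nv - xv) + 1

pmean <- a / (a + b)                      # closed-form posterior mean
ci_eq <- qbeta(c(0.025, 0.975), a, b)     # equal-tailed 95% interval (not HPD)

pmean; ci_eq
```

Using the Q3 posterior as the prior amounts to counting the same data twice, which is why this interval is narrower than the Q3 intervals.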
