MM and ML for a sample of n = 30 from Gamma(3,2)
===============================================
Generate the sample with shape parameter α = 3 and rate parameter λ = 2:
> x=rgamma(30,3,2)
> x
[1] 0.7390502 0.6296051 2.6959741 1.4224229 2.3107442 2.4565824 2.6377385 1.3118252
[9] 0.6833715 0.1624100 1.1383413 0.8929329 2.5453926 2.9638079 0.7831139 2.4481332
[17] 3.3791020 1.2064738 1.4916348 1.2501272 1.9502161 1.7128422 4.0314710 1.2669130
[25] 0.5189497 0.3306934 3.3667978 1.7410155 1.7168681 2.5169849
> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.1624  0.9543  1.6020  1.7430  2.5020  4.0310
> xave=mean(x)
> xave
[1] 1.743385
! the sample average, x̄
> xsd=sqrt(var(x))
> xsd
[1] 0.9973476
! the sample standard deviation, s
> stem(x)
The decimal point is at the |
0 | 23
0 | 567789
1 | 123334
1 | 5777
2 | 034
2 | 55567
3 | 044
3 |
4 | 0
> hist(x)
Estimation of the Unknown Parameters α and λ:
===================================
Now pretend we don’t know that this sample is from a Gamma(3,2) population. Treat it as a random
sample of n = 30 data points from some population. We will use the Gamma(α,λ) distribution as the
statistical model for this data set.
Method of Moments estimates (MMEs) for this sample:
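For reference, these estimates come from matching the Gamma(α,λ) mean α/λ and variance α/λ² to the sample mean x̄ and to the sample variance σ̂² computed with divisor n:
\[
\hat\alpha_{MM} = \frac{\bar{x}^{2}}{\hat\sigma^{2}}, \qquad
\hat\lambda_{MM} = \frac{\bar{x}}{\hat\sigma^{2}} = \frac{\hat\alpha_{MM}}{\bar{x}}, \qquad
\hat\sigma^{2} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^{2} = \frac{n-1}{n}\,s^{2},
\]
which is why the code below uses 29·s²/30 (with n = 30) in the denominator.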
> amm=xave^2/(29*xsd^2/30)
> amm
[1] 3.160942
! α̂ MM
> lmm=amm/xave
> lmm
[1] 1.813107
! λ̂MM
Method of Maximum Likelihood estimates (MLEs) for this sample:
Evaluate log of average – average of logs:
> c=log(xave)-mean(log(x))
> c
[1] 0.2097930
First, let's plot f(α) = Γ′(α)/Γ(α) − log(α) + c, the function of α for which we need to find the root in order to determine the MLEs. Note that Γ′(α)/Γ(α) can be calculated as digamma(α) in R.
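For completeness, this estimating equation comes from the gamma log-likelihood: setting ∂ℓ/∂λ = 0 gives λ̂ = α̂/x̄, and substituting this into ∂ℓ/∂α = 0 leaves a single equation in α:
\[
\ell(\alpha,\lambda) = n\alpha\log\lambda - n\log\Gamma(\alpha) + (\alpha-1)\sum_{i}\log x_i - \lambda\sum_{i} x_i,
\]
\[
\frac{\Gamma'(\hat\alpha)}{\Gamma(\hat\alpha)} - \log\hat\alpha + \underbrace{\log\bar{x} - \overline{\log x}}_{=\,c} \;=\; 0,
\qquad \text{i.e. } f(\hat\alpha)=0,
\]
where \(\overline{\log x}\) is the average of the \(\log x_i\). Once α̂ is found, λ̂ = α̂/x̄ gives the other MLE.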
a) We know α > 0. The MME suggests α is around 3. Let's plot f(α) for 0 < α < 10:
> xx=seq(0.1,10,0.1)
> y=digamma(xx)-log(xx)+c
> plot(xx,y)
> abline(0,0)
b) It looks like f(α) is monotone increasing, with a root somewhere between 2 and 4. (You can check monotonicity by plotting f′(α). Note that Γ″(α)/Γ(α) − [Γ′(α)/Γ(α)]² can be calculated as trigamma(α) in R.) Plot f(α) on the interval [2, 4]:
> xx=seq(2,4,0.01)
> y=digamma(xx)-log(xx)+c
> plot(xx,y)
> abline(0,0)
The root is close to 2.5, so α̂ ML is going to be a fair bit different from α̂ MM ≈ 3.16. From the plot, it is clear we can use Newton's method to find the root as precisely as desired. The R function N0gamma below determines the root (iteration stops once the absolute change is < 10⁻⁶) and prints out the details of the iterations. (Note that −f(α) is used in N0gamma rather than f(α); there is no particular reason for that, and it does not change the root.)
> N0gamma
function(a,x){
  it=0                            # iteration counter
  e=0.000001                      # convergence tolerance for |diff|
  diff=1
  dx=log(mean(x))-mean(log(x))    # this is c = log(xbar) - mean(log(x))
  while(abs(diff)>e){
    it=it+1
    f=log(a)-digamma(a)-dx        # equals -f(alpha)
    df=1/a-trigamma(a)            # equals -f'(alpha)
    diff=-f/df                    # Newton step
    cat("It= ",it,"a= ",a," f= ",f," df= ",df," diff= ",diff,"\n")
    a=a+diff
  }
  cat("MLE= ",a,"\n")
  return(a)
}
Let’s try an initial guess of 2 for the root:
> N0gamma(2,x)
It= 1 a= 2 f= 0.06056987 df= -0.1449341 diff= 0.4179133
It= 2 a= 2.417913 f= 0.01102419 df= -0.09695163 diff= 0.1137081
It= 3 a= 2.531621 f= 0.0005214821 df= -0.0879952 diff= 0.005926256
It= 4 a= 2.537548 f= 1.283135e-06 df= -0.0875627 diff= 1.46539e-05
It= 5 a= 2.537562 f= 7.806866e-12 df= -0.08756163 diff= 8.915853e-11
MLE= 2.537562
[1] 2.537562
Note how the iteration converges to the root α ≈ 2.54 : the first step is quite large (diff = 0.42) but the
successive steps are smaller: 6×10⁻³, 1×10⁻⁵ and 9×10⁻¹¹ at the 3rd, 4th and 5th iterations, reflecting the
quadratic convergence property of Newton’s method. The value of the function at the successive
iterations very quickly becomes very small. Also note that the derivative of the function changes very
little once you get close to the root (as is also clear from the plot). For this function, Newton’s method
works very well.
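For reference, quadratic convergence means that, near a simple root α*, the error is approximately squared at each iteration:
\[
|\alpha_{k+1} - \alpha^{*}| \;\approx\; \left|\frac{f''(\alpha^{*})}{2\,f'(\alpha^{*})}\right|\, |\alpha_{k} - \alpha^{*}|^{2},
\]
so the number of correct digits roughly doubles per step once the iterate is close to the root, exactly the pattern seen in the diff column above.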
What if we start further away? Let’s try an initial guess of 4:
> N0gamma(4,x)
It= 1 a= 4 f= -0.07961628 df= -0.03382296 diff= -2.353912
It= 2 a= 1.646088 f= 0.1237274 df= -0.2196443 diff= 0.5633081
It= 3 a= 2.209396 f= 0.03326364 df= -0.1173232 diff= 0.2835214
It= 4 a= 2.492917 f= 0.003983043 df= -0.09089993 diff= 0.04381789
It= 5 a= 2.536735 f= 7.247592e-05 df= -0.08762182 diff= 0.0008271446
It= 6 a= 2.537562 f= 2.489054e-08 df= -0.08756165 diff= 2.84263e-07
MLE= 2.537562
[1] 2.537562
Basically the same story: even though the first iteration "overshoots" the root, the subsequent iterations quickly locate it. If you used a very large initial guess, however, the first step could produce a negative updated value of α, which is not a permissible value. N0gamma would then "bomb" when it tries to evaluate log(a) at that negative value. (Try an initial value of 6, for example.) Depending on the nature of the function, Newton's method can require careful guidance until it gets into the vicinity of the root, where the quadratic convergence property takes hold.
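If you want a root-finder that cannot step outside the permissible range, one alternative (not used in this handout) is R's built-in bracketing routine uniroot(); a minimal sketch, reusing the constant c computed above and an interval chosen wide enough to bracket the root:
> # Bracketing alternative to Newton's method: uniroot() keeps its iterates
> # inside the supplied interval, so a poor starting value cannot go negative.
> f=function(a) digamma(a)-log(a)+c    # same f(alpha) as plotted above; c from earlier
> uniroot(f,interval=c(0.01,20),tol=1e-8)$root
This should return essentially the same root as N0gamma, approximately 2.54.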
> aml=2.537562
! α̂ ML
> lml=aml/xave
> lml
[1] 1.455538
! λ̂ML
Evaluation of the Precision of these Estimates via Parametric Bootstrap Method:
==========================================================
The R function bootgammaMM first calculates α̂ MM and λ̂MM for the sample x and then generates B
samples of size n (the same size as the original sample) from the Gamma( α̂ MM , λ̂MM ) distribution. The
MMEs are evaluated for each of these B bootstrap samples and are stored in a B×2 matrix.
> bootgammaMM
function(x,B){
  n=length(x)
  aM=(n/(n-1))*mean(x)^2/var(x)
  lM=aM/mean(x)
  aB=matrix(0,nrow=B,ncol=2)
  for(i in 1:B){
    xB=rgamma(n,shape=aM,rate=lM)
    aB[i,1]=(n/(n-1))*mean(xB)^2/var(xB)
    aB[i,2]=aB[i,1]/mean(xB)
  }
  return(aB)
}
Let’s do B = 1000 bootstrap samples:
> ymm=bootgammaMM(x,1000)
> summary(ymm)
       X1               X2
 Min.   :1.354   Min.   :0.6725
 1st Qu.:2.899   1st Qu.:1.6468
 Median :3.429   Median :1.9685
 Mean   :3.616   Mean   :2.0930
 3rd Qu.:4.132   3rd Qu.:2.4257
 Max.   :9.242   Max.   :5.8215
> plot(ymm[,1],ymm[,2])
Now do the same thing for the MLEs α̂ ML and λ̂ML . The R function bootgammaML uses the R
function Ngamma to evaluate the MLEs for the bootstrap samples; Ngamma is a version of the function
N0gamma above with all of the detailed output suppressed.
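Ngamma itself is not printed in this handout; a version consistent with N0gamma (the same Newton iteration with the cat() output removed) might look like:
function(a,x){
  e=0.000001
  diff=1
  dx=log(mean(x))-mean(log(x))
  while(abs(diff)>e){
    f=log(a)-digamma(a)-dx
    df=1/a-trigamma(a)
    diff=-f/df
    a=a+diff
  }
  return(a)
}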
> bootgammaML
function(x,B){
  n=length(x)
  aM=Ngamma(mean(x)^2/var(x),x)
  lM=aM/mean(x)
  aB=matrix(0,nrow=B,ncol=2)
  for(i in 1:B){
    xB=rgamma(n,shape=aM,rate=lM)
    aB[i,1]=Ngamma(mean(xB)^2/var(xB),xB)
    aB[i,2]=aB[i,1]/mean(xB)
  }
  return(aB)
}
Let’s do B = 1000 bootstrap samples:
> yml=bootgammaML(x,1000)
> summary(yml)
       X1               X2
 Min.   :1.357   Min.   :0.7743
 1st Qu.:2.269   1st Qu.:1.2790
 Median :2.657   Median :1.5201
 Mean   :2.797   Mean   :1.6274
 3rd Qu.:3.126   3rd Qu.:1.8838
 Max.   :6.892   Max.   :4.4669
> plot(yml[,1],yml[,2])
Note that the x and y scales are different in this plot than in the corresponding plot for the MMEs. Both
the summaries and the plots seem to indicate less scatter (in both dimensions!) for the bootstrap MLEs
than for the bootstrap MMEs. So it looks like, at least for this example, the MLEs are less variable
than the MMEs. Let's compare graphically:
Bootstrap estimates of α :
> boxplot(ymm[,1],yml[,1])
Bootstrap estimates of λ :
> boxplot(ymm[,2],yml[,2])
In both cases, the boxplots are centered at different values for the MME and the MLE, but it is also
clear that the "box part" of the boxplot (the central portion of those datasets) is considerably less spread out
for the MLE than for the MME.
In case you prefer histograms to boxplots:
> par(mfrow=c(2,1))
> hist(ymm[,1],prob=T)
> hist(yml[,1],prob=T)
> hist(ymm[,2],prob=T)
> hist(yml[,2],prob=T)
To quantify the variability of the bootstrap values of α̂ MM and λ̂MM, let's calculate the variance-covariance matrix:
> cov(ymm)
[,1] [,2]
[1,] 1.2019082 0.7103911
[2,] 0.7103911 0.4759623
So, the bootstrap estimates of:
" the SEs of α̂ MM and λ̂MM are √1.2019 = 1.096 and √0.4760 = 0.690, respectively,
" the correlation between α̂ MM and λ̂MM is 0.7104/√1.2019 √0.4760 = 0.939.
Similarly for the bootstrap MLEs:
> cov(yml)
[,1] [,2]
[1,] 0.5807971 0.3471700
[2,] 0.3471700 0.2454677
So, the bootstrap estimates of:
" the SEs of α̂ ML and λ̂ML are √0.5808 = 0.762 and √0.2455 = 0.495, respectively,
" the correlation between α̂ ML and λ̂ML is 0.3472/√0.5808 √0.2455 = 0.919.
The bootstrap estimates of the SEs are much smaller for the MLEs than for the MMEs, indicating that,
at least for this example, the MLEs are much less variable than the MMEs. We will soon see this is a
general phenomenon: in a sense to be described more precisely in class, MLEs are always better than
MMEs. In fact, we will show that (in that same sense) no estimator can do better than the MLE.
You may be curious how much things change if you do more bootstrap samples:
> ymm=bootgammaMM(x,10000)
> summary(ymm)
       X1                X2
 Min.   : 1.029   Min.   :0.5041
 1st Qu.: 2.844   1st Qu.:1.6182
 Median : 3.436   Median :1.9878
 Mean   : 3.607   Mean   :2.0911
 3rd Qu.: 4.180   3rd Qu.:2.4425
 Max.   :11.061   Max.   :7.2850
> cov(ymm)
[,1] [,2]
[1,] 1.1610086 0.6772225
[2,] 0.6772225 0.4482210
> yml=bootgammaML(x,10000)
> summary(yml)
       X1               X2
 Min.   :1.262   Min.   :0.5769
 1st Qu.:2.259   1st Qu.:1.2807
 Median :2.666   Median :1.5464
 Mean   :2.801   Mean   :1.6284
 3rd Qu.:3.197   3rd Qu.:1.8797
 Max.   :8.889   Max.   :6.0282
> cov(yml)
[,1] [,2]
[1,] 0.5930060 0.3479845
[2,] 0.3479845 0.2432802
Plotting the bootstrap MMEs and the bootstrap MLEs (note the differences in the scales) yields:
The resulting bootstrap estimates of the SEs:
" of α̂ MM and λ̂MM are √1.1610 = 1.078 and √0.4482 = 0.669, respectively,
" of α̂ ML and λ̂ML are √0.5930 = 0.770 and √0.2433 = 0.493, respectively,
In summary, once the results are expressed as estimated SEs, they differ in only minor ways from the
corresponding results obtained with B = 1000.
If we assume that n = 30 is large enough to consider both the MMEs and the MLEs to be, for all
practical purposes, unbiased (note that the results above suggest this may NOT be the case), then we
would estimate the efficiency of the MME relative to the MLE to be approximately:
" (0.770/1.078)2 = 0.51, or 51% for the estimation of α ,
" (0.493/0.669)2 = 0.54, or 54% for the estimation of λ .
That is, we would need almost twice the sample size to attain the same asymptotic variance with the
MMEs as with the MLEs.
Confidence Intervals for the Unknown Parameters α and λ:
===========================================
We will see in class that θ̂ ± 1.96 ŜE(θ̂) provides an approximate 95% confidence interval for θ
whenever the estimator θ̂ is asymptotically normally distributed, as is the case for MMEs and MLEs.
Using the bootstrap estimates of the SEs (let's use the B = 10000 results), we obtain approximate 95%
confidence intervals:
For α of:
- 3.161 ± 1.96 × 1.078 ≈ (1.05, 5.27), based on the MME, α̂ MM
- 2.538 ± 1.96 × 0.770 ≈ (1.03, 4.05), based on the MLE, α̂ ML
For λ of:
- 1.813 ± 1.96 × 0.669 ≈ (0.50, 3.13), based on the MME, λ̂MM
- 1.456 ± 1.96 × 0.493 ≈ (0.49, 2.42), based on the MLE, λ̂ML
Note that, in both cases, the lower endpoints of the two confidence intervals are basically the same so
the main difference is in the upper endpoint. Because the estimated SEs for the MLEs are smaller than
those for the MMEs, the confidence intervals based on the MLEs are quite a bit shorter – that is, the
MLEs lead to considerably tighter “ranges of plausible values” for the unknown parameters.
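These intervals are just estimate ± 1.96 × SE; a sketch that reproduces them in R, with the point estimates and bootstrap SEs copied (rounded) from the output above:
> est=c(3.161,2.538,1.813,1.456)     # amm, aml, lmm, lml (rounded, from above)
> se=c(1.078,0.770,0.669,0.493)      # bootstrap SEs from the B=10000 runs
> cbind(lower=est-1.96*se,upper=est+1.96*se)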
Comparison to Results Based on Large Sample Theory:
========================================
As noted in class, the (bivariate) CLT for the first two sample moments, combined with the delta method,
yields the asymptotic (bivariate) distribution of the MME, while the general large-sample theory for
MLEs presented in class yields the asymptotic distribution of the MLE. Specifically, for the case of
sampling from a Gamma(α,λ) distribution, as n → ∞:
\[
\hat\theta_{MM} = \begin{pmatrix} \hat\alpha_{MM} \\ \hat\lambda_{MM} \end{pmatrix}
\;\approx\;
N_2\!\left( \begin{pmatrix} \alpha \\ \lambda \end{pmatrix},\;
\frac{1}{n}\begin{pmatrix} 2\alpha(\alpha+1) & 2(\alpha+1)\lambda \\
                           2(\alpha+1)\lambda & (2\alpha+3)\lambda^{2}/\alpha \end{pmatrix} \right)
\]
and
\[
\hat\theta_{ML} = \begin{pmatrix} \hat\alpha_{ML} \\ \hat\lambda_{ML} \end{pmatrix}
\;\approx\;
N_2\!\left( \begin{pmatrix} \alpha \\ \lambda \end{pmatrix},\;
\frac{1}{n\,[\alpha g(\alpha)-1]}\begin{pmatrix} \alpha & \lambda \\
                                                 \lambda & \lambda^{2} g(\alpha) \end{pmatrix} \right),
\]
where g(α) = Γ″(α)/Γ(α) − [Γ′(α)/Γ(α)]² is the trigamma function.
Plugging the values of the estimates into these expressions (we need to use the MMEs in the expressions for
θ̂ MM and the MLEs in the expressions for θ̂ ML, since that is all we would have in practice) yields the estimated
(asymptotic) SEs:
" for α̂ MM and λ̂MM as [2(3.1609)(4.1609)/30]1/2 = 0.936 and [9.3219(1.8131)2/30(3.1609)]1/2 =
0.568, respectively,
" for α̂ ML and λ̂ML as [2.5376/30(0.2222)]1/2 = 0.617 and [(1.4555)2(0.4816)/30(0.2222)]1/2 =
0.391, respectively.
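A sketch reproducing these plug-in calculations in R, using the estimates amm, lmm, aml and lml computed earlier (with n = 30):
> sqrt(2*amm*(amm+1)/30)              # asymptotic SE of the MME of alpha
> sqrt((2*amm+3)*lmm^2/(amm*30))      # asymptotic SE of the MME of lambda
> g=trigamma(aml)
> sqrt(aml/(30*(aml*g-1)))            # asymptotic SE of the MLE of alpha
> sqrt(lml^2*g/(30*(aml*g-1)))        # asymptotic SE of the MLE of lambda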
Compared to the bootstrap results, in both cases, it appears these asymptotic expressions underestimate
the variability to some degree; that is, it looks like n = 30 may not be a large enough sample size for
these asymptotic results to provide accurate approximations.
Let’s check this using simulation. As you have learned in the lab, it is very easy to use R to carry out
simulation experiments and this is a powerful tool that you can use in many circumstances, for
example, when you want to check the properties of some random phenomenon and the necessary
probability calculations are too hard to do analytically.
Simulate distributions of MME and MLE for a sample of n = 30 from Gamma(3,2):
============================================================
The function simgammaMM evaluates the MMEs for repeated samples of n = 30 from Gamma(3,2). You
could easily make this function suitable for any Gamma distribution (a generalized version is sketched after the listing). We simulate 1000 samples:
> simgammaMM
function(n,S){
  aB=matrix(0,nrow=S,ncol=2)
  for(i in 1:S){
    xB=rgamma(n,shape=3,rate=2)
    aB[i,1]=(n/(n-1))*mean(xB)^2/var(xB)
    aB[i,2]=aB[i,1]/mean(xB)
  }
  return(aB)
}
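For instance, a generalized version (a sketch; the name simgammaMM2 and the extra shape/rate arguments are not part of the original handout) could be:
simgammaMM2=function(n,S,shape,rate){
  aB=matrix(0,nrow=S,ncol=2)
  for(i in 1:S){
    xB=rgamma(n,shape=shape,rate=rate)
    aB[i,1]=(n/(n-1))*mean(xB)^2/var(xB)
    aB[i,2]=aB[i,1]/mean(xB)
  }
  return(aB)
}
so that simgammaMM2(30,1000,3,2) would reproduce the simgammaMM(30,1000) call that follows.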
> ymm=simgammaMM(30,1000)
> summary(ymm)
       X1               X2
 Min.   :1.328   Min.   :0.8884
 1st Qu.:2.729   1st Qu.:1.7928
 Median :3.293   Median :2.2037
 Mean   :3.385   Mean   :2.2764
 3rd Qu.:3.888   3rd Qu.:2.6487
 Max.   :8.597   Max.   :5.7478
> cov(ymm)
[,1] [,2]
[1,] 0.9856841 0.6336221
[2,] 0.6336221 0.4744681
Now do the same thing for the MLEs:
> simgammaML
function(n,S){
  aB=matrix(0,nrow=S,ncol=2)
  for(i in 1:S){
    xB=rgamma(n,shape=3,rate=2)
    aB[i,1]=Ngamma(mean(xB)^2/var(xB),xB)
    aB[i,2]=aB[i,1]/mean(xB)
  }
  return(aB)
}
> yml=simgammaML(30,1000)
> summary(yml)
       X1              X2
 Min.   :1.568   Min.   :1.020
 1st Qu.:2.661   1st Qu.:1.750
 Median :3.122   Median :2.095
 Mean   :3.283   Mean   :2.209
 3rd Qu.:3.734   3rd Qu.:2.583
 Max.   :8.783   Max.   :5.849
> cov(yml)
[,1] [,2]
[1,] 0.7956796 0.5392879
[2,] 0.5392879 0.4214266
First consider estimation of α. Plot the histograms of the simulated MMEs and MLEs of α and
superimpose a plot of the normal density corresponding to the asymptotic approximations:
> hist(ymm[,1],prob=T)
> xp=seq(1.33,8.59,0.01)
> s=sqrt(2*3*4/30)
> s
[1] 0.8944272
> lines(xp,dnorm(xp,mean=3,sd=s))
> hist(yml[,1],prob=T)
> xp=seq(1.57,8.78,0.01)
> s=sqrt(3/(30*(3*trigamma(3)-1)))
> s
[1] 0.735608
> lines(xp,dnorm(xp,mean=3,sd=s))
The histograms of the simulated MMEs and MLEs are skewed to the right (positively) and hence more
“spread out” than the (symmetric) asymptotic normal approximation. Although not overly dramatic,
this skewness is apparent in the qqplots for the MME (top plot below) and MLE (bottom plot).
> qqnorm(ymm[,1])
> qqnorm(yml[,1])
Because of this skewness in the true distributions, the (symmetric) asymptotic normal approximations
will underestimate the variability – as we suspected was the case (now we know!). We conclude that a
larger value of n is required for the asymptotic normal approximations to the distributions of α̂ MM and
α̂ ML to be accurate.
Now consider estimation of λ similarly.
> hist(ymm[,2],prob=T)
> xp=seq(0.89,5.74,0.01)
> s=sqrt((2*3+3)*2*2/(3*30))
> s
[1] 0.6324555
> lines(xp,dnorm(xp,mean=2,sd=s))
> hist(yml[,2],prob=T)
> xp=seq(1.03,5.84,0.01)
> s=sqrt(2*2*trigamma(3)/(30*(3*trigamma(3)-1)))
> s
[1] 0.5337994
> lines(xp,dnorm(xp,mean=2,sd=s))
The histograms of the simulated MMEs and MLEs are again skewed to the right (positively) and hence
more “spread out” than the (symmetric) asymptotic normal approximation. This skewness can also be
seen in the qqplots for the MME (top plot below) and the MLE (bottom plot).
> qqnorm(ymm[,2])
> qqnorm(yml[,2])
Transformations to Improve Asymptotic Normal Approximations:
===============================================
The asymptotic normal approximations to the distributions of the MMEs and the MLEs do not appear
to be very accurate for this example. Might these approximations perform better on a different scale?
For positive random variables that have (positively) skewed distributions, a natural transformation to
consider is the log: it "pulls in" the very large positive values and so might eliminate the positive
skewness. Of course, it could also induce negative skewness!
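As a reminder, on the log scale the delta method simply rescales the variances and covariances:
\[
\operatorname{Var}\bigl(\log\hat\alpha\bigr) \approx \frac{\operatorname{Var}(\hat\alpha)}{\alpha^{2}}, \qquad
\operatorname{Var}\bigl(\log\hat\lambda\bigr) \approx \frac{\operatorname{Var}(\hat\lambda)}{\lambda^{2}}, \qquad
\operatorname{Cov}\bigl(\log\hat\alpha,\log\hat\lambda\bigr) \approx \frac{\operatorname{Cov}(\hat\alpha,\hat\lambda)}{\alpha\lambda},
\]
which is where the entries of the covariance matrices below come from.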
From the asymptotic results given above, the delta method easily yields that for the case of sampling
from a Gamma(α,λ) distribution, as n → ∞ :
\[
\log\hat\theta_{MM} = \begin{pmatrix} \log\hat\alpha_{MM} \\ \log\hat\lambda_{MM} \end{pmatrix}
\;\approx\;
N_2\!\left( \begin{pmatrix} \log\alpha \\ \log\lambda \end{pmatrix},\;
\frac{1}{n}\begin{pmatrix} 2(\alpha+1)/\alpha & 2(\alpha+1)/\alpha \\
                           2(\alpha+1)/\alpha & (2\alpha+3)/\alpha \end{pmatrix} \right)
\]
and
\[
\log\hat\theta_{ML} = \begin{pmatrix} \log\hat\alpha_{ML} \\ \log\hat\lambda_{ML} \end{pmatrix}
\;\approx\;
N_2\!\left( \begin{pmatrix} \log\alpha \\ \log\lambda \end{pmatrix},\;
\frac{1}{n\,[\alpha g(\alpha)-1]}\begin{pmatrix} 1/\alpha & 1/\alpha \\
                                                 1/\alpha & g(\alpha) \end{pmatrix} \right).
\]
Let’s compare the histograms of the logs of the simulated estimates of α to these asymptotic normal
approximations:
> hist(log(ymm[,1]),prob=T)
> xp=seq(0.29,2.15,0.01)
> s=sqrt(2*4/(3*30))
> lines(xp,dnorm(xp,mean=log(3),sd=s))
> hist(log(yml[,1]),prob=T)
> xp=seq(0.46,2.17,0.01)
> s=1/sqrt(30*3*(3*trigamma(3)-1))
> lines(xp,dnorm(xp,mean=log(3),sd=s))
Note how much more symmetric these histograms are than the earlier ones. The asymptotic normal
approximations to the distributions of log(α̂ MM) and log(α̂ ML) are still far from perfect, but they now
look reasonably accurate. The improvement is also apparent in the qqplots, which look very much like
straight lines:
> qqnorm(log(ymm[,1]))
> qqnorm(log(yml[,1]))
Similarly, for the simulated estimates of λ:
> hist(log(ymm[,2]),prob=T)
> xp=seq(-0.11,1.74,0.01)
> s=sqrt((2*3+3)/(3*30))
> lines(xp,dnorm(xp,mean=log(2),sd=s))
> hist(log(yml[,2]),prob=T)
> xp=seq(0.03,1.76,0.01)
> s=sqrt(trigamma(3)/(30*(3*trigamma(3)-1)))
> lines(xp,dnorm(xp,mean=log(2),sd=s))
These histograms are also much more symmetric than the earlier histograms of the estimates on the raw
scale. The asymptotic normal approximations to the distributions of log(λ̂MM) and log(λ̂ML) now look
reasonably accurate, and this is also reflected in the qqplots, which look very much like straight lines:
> qqnorm(log(ymm[,2]))
> qqnorm(log(yml[,2]))
You may want to check how much different things look for this example when:
" the values of α and λ being considered are different,
" the value of n being considered is different.
Of course, you can equally well carry out such an investigation for any other example!
Your computer is a powerful tool for learning. You can use it to carry out:
" exact calculations of probabilities that are not otherwise feasible,
" the Monte Carlo method to approximate probabilities (and other integrals),
" simulation studies to help understand when asymptotic approximations are accurate.
THE END