Maximum Likelihood Estimation
Event Count Models: Poisson and Negative-Binomial
Charles H. Franklin
[email protected]
University of Wisconsin – Madison
Lecture 14
Last Modified: June 13, 2005

Event Count Models
Suppose the social system produces events over time:
Coup d'état
Collapse of cabinet government
Outbreak of war
Deaths by horse kick in cavalry units
Siméon-Denis Poisson (1781-1840)
"Life is good for only two things, discovering mathematics and teaching mathematics." — Poisson
A talented mathematician at a young age, by 18 Poisson had attracted the attention of his mentors and teachers Laplace, Lagrange, and Legendre.
He derived the Poisson distribution for the first time in Recherches sur la probabilité des jugements en matière criminelle et en matière civile, published in 1837.

Event Count Models
The Poisson distribution describes the probability that a random event will occur in a time or space interval when the probability of the event occurring is very small but the number of trials is very large.
It is the limit of a binomial process in which π → 0 and n → ∞.
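As a quick numerical check of that limit, the short R sketch below compares binomial and Poisson probabilities as n grows while nπ is held fixed at λ; the particular values of λ and n are arbitrary choices for illustration, not taken from the slides.

# Binomial(n, p) probabilities approach Poisson(lambda) as n grows with n*p = lambda fixed
lambda <- 2
y <- 0:8
for (n in c(10, 100, 10000)) {
  p <- lambda / n                     # success probability π = λ/n
  cat("n =", n, " max |binomial - Poisson| =",
      max(abs(dbinom(y, size = n, prob = p) - dpois(y, lambda))), "\n")
}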
A Poisson distribution
[Figure: histogram of 1000 draws from a Poisson distribution; counts 0-5 on the horizontal axis, percent on the vertical axis.]
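A histogram like this one can be reproduced with a short simulation; the seed and the rate below are illustrative assumptions, since the slide does not report the λ used.

# 1000 draws from a Poisson, plotted as a percent distribution
set.seed(42)                               # arbitrary seed
y <- rpois(1000, lambda = 1)               # rate chosen for illustration
barplot(100 * table(y) / length(y), xlab = "count", ylab = "Percent")
c(mean = mean(y), variance = var(y))       # for a Poisson these should be close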
Ladislaus Bortkiewicz (1868-1931)
Bortkiewicz was a Russian economist and statistician of Polish descent.
In 1898 he published a book about the Poisson distribution, titled The Law of Small Numbers.
Events with low frequency in large populations tend to follow the Poisson distribution.

Bortkiewicz and death by horse kick
The best thing about the book is his data on the number of members of 14 Prussian cavalry units killed by being kicked by a horse from 1875-1894.
These data fit a Poisson quite well, despite year-to-year and unit-to-unit variation.
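A minimal sketch of checking that pooled fit. The counts used here are the commonly reproduced Bortkiewicz figures for 14 corps observed over 20 years (280 corps-years); they are supplied as an assumption, since the slides show only the plots.

# Pooled horse-kick deaths per corps-year (commonly reproduced Bortkiewicz counts, assumed)
deaths <- 0:4
n.obs  <- c(144, 91, 32, 11, 2)               # corps-years with 0, 1, ..., 4 deaths
lambda <- sum(deaths * n.obs) / sum(n.obs)    # 196 deaths over 280 corps-years
expected <- sum(n.obs) * dpois(deaths, lambda)
round(cbind(deaths, observed = n.obs, expected), 2)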
Prussian Cavalry Deaths
[Figure: lattice of histograms, one panel per cavalry unit, of deaths by horse kick (0-4 per year) with percent on the vertical axis. Caption: Prussian Cavalry Deaths by Horsekick, 1875-94.]
Prussian Cavalry Deaths
[Figure: histogram of the pooled deaths per unit-year (0-4), percent on the vertical axis. Caption: Pooled Prussian Cavalry Deaths.]

Flying-bomb hits in London
An interesting application from WWII.
R. D. Clarke, 1946, "An Application of the Poisson Distribution", Journal of the Institute of Actuaries, Vol. 72, p. 481.
"During the flying-bomb attack on London, frequent assertions were made that the points of impact of the bombs tended to be grouped in clusters. It was accordingly decided to apply a statistical test to discover whether any support could be found for this allegation."
Solution: Divide London into 576 squares of 1/4 km² each.
Count the number of flying-bomb hits per square.
There is a low probability of any specific square being hit, but a large number of squares!
Analysis of flying-bombs
> T <- 537                          # total number of flying-bomb hits
> N <- 576                          # number of 1/4 km^2 squares
> y <- 0:5                          # hits per square (top category is "5 or more")
> lambda <- T/N                     # estimated rate of hits per square
> lambda
[1] 0.9322917
> n <- c(229, 211, 93, 35, 7, 1)    # observed number of squares with y hits
> n
[1] 229 211  93  35   7   1
> py <- dpois(y, lambda)            # Poisson probabilities for y = 0, ..., 5
> py
[1] 0.393650560 0.366997137 0.171074186 0.053163679 0.012391014
[6] 0.002310408
> py[6] <- 1 - sum(py[1:5])         # fold the upper tail into the last category
> En <- round(N * py, 2)            # expected number of squares under the Poisson
Analysis of flying-bombs
> print(cbind(y, En, n))
     y     En   n
[1,] 0 226.74 229
[2,] 1 211.39 211
[3,] 2  98.54  93
[4,] 3  30.62  35
[5,] 4   7.14   7
[6,] 5   1.57   1
>
> chi2 <- sum(((n - En)^2)/En)      # chi-square goodness-of-fit statistic
> chi2
[1] 1.170929
> 1 - pchisq(1.17, 4)               # df = 6 categories - 1 - 1 estimated parameter
[1] 0.8830128
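The same calculation can be wrapped in a small helper for reuse; poisson.gof below is a hypothetical function name, and it simply restates the steps above (estimate λ, fold the upper tail into the last category, compare observed and expected counts, chi-square test with categories − 2 degrees of freedom because λ is estimated).

# Hypothetical helper restating the goodness-of-fit calculation above
poisson.gof <- function(n.obs, total.events) {
  N <- sum(n.obs)                              # number of squares (or other units)
  y <- 0:(length(n.obs) - 1)
  lambda <- total.events / N                   # estimated rate per unit
  py <- dpois(y, lambda)
  py[length(py)] <- 1 - sum(py[-length(py)])   # open-ended top category
  En <- N * py
  chi2 <- sum((n.obs - En)^2 / En)
  df <- length(n.obs) - 2                      # categories - 1 - 1 estimated parameter
  list(lambda = lambda, expected = En, chi2 = chi2, p.value = 1 - pchisq(chi2, df))
}
poisson.gof(c(229, 211, 93, 35, 7, 1), total.events = 537)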
Flying-bomb hits in London
"The occurrence of clustering would have been reflected in the above table by an excess number of squares containing either a high number of flying bombs or none at all, with a deficiency in the intermediate classes."
"The closeness of fit which in fact appears lends no support to the clustering hypothesis." — Clarke, 1946.
The Poisson Model
If the process generates events independently and at a fixed rate within time periods, then the result is a Poisson process.

P(y_i | λ) = e^{−λ} λ^{y_i} / y_i!

and

E(y_i) = λ   and   Var(y_i) = λ
The Poisson Model
We reparameterize λ to be a non-negative, continuous variable:

λ_i = e^{x_iβ}

giving us the reparameterized Poisson model as

P(y_i | λ_i) = e^{−e^{x_iβ}} (e^{x_iβ})^{y_i} / y_i!

Poisson Model
The log-likelihood of the model is:

lnL = ln ∏_{i=1}^{N} e^{−λ_i} λ_i^{y_i} / y_i!
    = Σ_i [ ln(e^{−λ_i}) + ln(λ_i^{y_i}) − ln(y_i!) ]
    = Σ_i (−λ_i + y_i ln λ_i)           (dropping ln y_i!, which does not involve β)
    = Σ_i (−e^{x_iβ} + y_i ln e^{x_iβ})
    = Σ_i (y_i x_iβ − e^{x_iβ})
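The last line of the derivation can be maximized directly. The sketch below codes that log-likelihood by hand and checks it against glm() on simulated data; the sample size, covariate, and coefficient values are arbitrary assumptions for illustration.

# Maximize sum(y_i * x_i'beta - exp(x_i'beta)) directly and compare with glm()
set.seed(1)
n <- 500
x <- cbind(1, rnorm(n))                     # intercept plus one covariate
beta.true <- c(0.5, 0.8)                    # arbitrary true values
y <- rpois(n, drop(exp(x %*% beta.true)))

poisson.loglik <- function(beta, y, x) {
  xb <- drop(x %*% beta)
  sum(y * xb - exp(xb))                     # ln y_i! dropped, as in the derivation
}
fit.optim <- optim(c(0, 0), poisson.loglik, y = y, x = x,
                   control = list(fnscale = -1))   # fnscale = -1 turns optim into a maximizer
fit.glm <- glm(y ~ x - 1, family = poisson)
cbind(optim = fit.optim$par, glm = coef(fit.glm))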
Inference for Poisson
Since these are ML estimates, all the usual results apply.
What else is there to say?

The Poisson Model
Marginal effects in the Poisson are

∂E(y|x)/∂x = ∂λ_i/∂x = ∂e^{xβ}/∂x = (∂e^{xβ}/∂(xβ)) (∂(xβ)/∂x) = e^{xβ} β
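A sketch of computing that marginal effect from a fitted model, evaluated at the covariate means; it assumes a fitted Poisson glm object such as fit.glm from the sketch above.

# Marginal effects exp(x'beta) * beta, evaluated at the means of the covariates
X <- model.matrix(fit.glm)                  # design matrix from the fitted model
b <- coef(fit.glm)
xbar <- colMeans(X)
exp(sum(xbar * b)) * b                      # one marginal effect per coefficient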
Modifications to the Poisson
Suppose the observation intervals are not all equal: the number of appointments a president gets to make to the Supreme Court depends on his length of office. Not all serve exactly 4 or 8 years: Kennedy, Johnson, Nixon, Ford, and (almost!) Clinton.
Or a more contemporary case: SARS reporting is not always equally spaced. A report of new cases may cover 1, 2, 3, or even more days. To estimate the rate of new cases we have to take the length of the reporting period into account.
Rather than ignore the unequal intervals, the Poisson is easily modified to incorporate them.
Poisson with unequal intervals
Let t_i be the time interval for observation of case i.

y_i ~ e^{−λ_i t_i} (λ_i t_i)^{y_i} / y_i!

Reparameterize as before, λ_i = e^{x_iβ}, and the log-likelihood becomes:

lnL = Σ_i [ −e^{x_iβ} t_i + y_i ln(e^{x_iβ} t_i) − ln y_i! ]
    = Σ_i [ −e^{x_iβ} t_i + y_i ln(e^{x_iβ} t_i) ]
    = Σ_i [ y_i (x_iβ + ln t_i) − e^{x_iβ} t_i ]
    = Σ_i [ y_i x_iβ − e^{x_iβ} t_i ]

where ln y_i! and ln t_i do not involve β and so can be dropped from the log-likelihood.
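In practice this is handled by entering ln t_i as an offset, so the fitted mean is e^{x_iβ} t_i. The sketch below checks this on simulated data; the variable names, interval lengths, and coefficient values are assumptions for illustration.

# Unequal observation intervals: include log(t_i) as an offset
set.seed(2)
n <- 300
t.obs <- sample(1:5, n, replace = TRUE)     # hypothetical reporting-period lengths
x1 <- rnorm(n)
y <- rpois(n, exp(0.2 + 0.5 * x1) * t.obs)  # arbitrary true coefficients
fit.offset <- glm(y ~ x1 + offset(log(t.obs)), family = poisson)
coef(fit.offset)                            # estimates should be near (0.2, 0.5)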
Negative Binomial Event Counts
What if there is more heterogeneity in the population than we can model with a Poisson?
In the Poisson we had λ_i = e^{x_iβ}.
This captures variation in λ_i as due to variation in x.
But what if x_iβ fails to adequately account for all the heterogeneity in λ_i?

Negative Binomial
Suppose the model should actually be

λ̃_i = exp(x_iβ + ε_i)

where ε_i is a random variable uncorrelated with x.
As usual, we cannot observe ε, so for any fixed value of x we have a distribution of λ_i, rather than a single value, as in the Poisson model.
The relation of λ̃_i and λ_i
Reexpress λ̃_i as

λ̃_i = e^{x_iβ + ε_i} = e^{x_iβ} e^{ε_i} = λ_i e^{ε_i} = λ_i δ_i

where we let δ_i = e^{ε_i}.

δ_i
We need some assumptions about δ_i.
It would be especially nice if E(δ_i) = 1, because then we'd get E(y_i) = λ_i despite the added heterogeneity.
Further, the distribution of y_i conditional on x and δ remains a Poisson:

Pr(y_i | x, δ) = e^{−λ̃_i} λ̃_i^{y_i} / y_i! = e^{−λ_i δ_i} (λ_i δ_i)^{y_i} / y_i!
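A small simulation sketch of this mixing, anticipating the gamma assumption introduced on the next slide: δ_i is drawn with mean 1, so the mean of the counts is unchanged but their variance is inflated. The values of λ and θ are arbitrary choices.

# Gamma-Poisson mixture: delta_i with mean 1 leaves the mean alone but inflates the variance
set.seed(3)
n <- 100000
lambda <- 2                                        # arbitrary common rate
theta <- 1.5                                       # arbitrary gamma parameter
delta <- rgamma(n, shape = theta, rate = theta)    # E(delta) = 1, Var(delta) = 1/theta
y.mix <- rpois(n, lambda * delta)                  # heterogeneous counts
y.poi <- rpois(n, lambda)                          # plain Poisson counts for comparison
c(mean.mix = mean(y.mix), var.mix = var(y.mix),
  mean.poi = mean(y.poi), var.poi = var(y.poi))    # var.mix exceeds mean.mix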
The Negative Binomial Likelihood
Suppose δ_i has a gamma distribution.
If we parameterize the gamma as gamma(θ)/θ (shape θ, rate θ), then δ_i has mean 1 and variance 1/θ, and hence at the mean we get our standard Poisson λ_i back.
This requires us to integrate over the values of δ_i to replace its value in terms of the gamma parameter, θ.
The result is

f(y_i | λ_i, θ) = [Γ(θ + y_i) / (Γ(θ) y_i!)] · [λ_i^{y_i} θ^θ / (λ_i + θ)^{θ + y_i}]

Poisson Example: Presidential Vetoes
> p.fm7 <- glm(nveto ~ janpop + prespty + congmaj + popvote, family = poisson)
> summary(p.fm7)
Call: glm(formula = nveto ~ janpop + prespty + congmaj + popvote,
    family = poisson)
Deviance Residuals:
   Min     1Q Median     3Q    Max
-4.783 -1.978 -1.064  1.069  9.823
Poisson Example: Presidential Vetoes
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.409909   0.566030   0.724  0.46895
janpop       0.013684   0.004884   2.802  0.00508 **
prespty      0.173597   0.105534   1.645  0.09998 .
congmaj     -0.275073   0.115369  -2.384  0.01711 *
popvote      0.031044   0.009749   3.184  0.00145 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)
    Null deviance: 264.85 on 25 degrees of freedom
Residual deviance: 242.35 on 21 degrees of freedom
AIC: 361.48
Number of Fisher Scoring iterations: 4

Negative Binomial: Presidential Vetoes
> nb.fm7 <- glm.nb(nveto ~ janpop + prespty + congmaj + popvote)
> summary(nb.fm7)
Call: glm.nb(formula = nveto ~ janpop + prespty + congmaj + popvote,
    init.theta = 1.94788867323951, link = log)
Deviance Residuals:
    Min      1Q  Median      3Q     Max
-2.6817 -0.6970 -0.3070  0.4499  2.5772
Negative Binomial: Presidential Vetoes
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.33999    1.64197  -0.207    0.836
janpop       0.01419    0.01450   0.979    0.328
prespty      0.26004    0.31660   0.821    0.411
congmaj     -0.30439    0.35376  -0.860    0.390
popvote      0.04423    0.02847   1.554    0.120

(Dispersion parameter for Negative Binomial(1.948) family taken to be 1)
    Null deviance: 32.197 on 25 degrees of freedom
Residual deviance: 29.004 on 21 degrees of freedom
AIC: 202.47
Number of Fisher Scoring iterations: 1

              Theta:  1.948
          Std. Err.:  0.602
 2 x log-likelihood:  -190.469
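A sketch of how the two fits might be compared, assuming the p.fm7 and nb.fm7 objects above are available in the session. The likelihood-ratio test for overdispersion sits on the boundary of the parameter space, so halving the chi-square p-value is the usual adjustment.

# Compare the Poisson and negative binomial fits (assumes p.fm7 and nb.fm7 exist)
library(MASS)                                # glm.nb comes from MASS
AIC(p.fm7, nb.fm7)                           # the much lower NB AIC (202.47 vs 361.48) favors it
lr <- as.numeric(2 * (logLik(nb.fm7) - logLik(p.fm7)))
0.5 * (1 - pchisq(lr, df = 1))               # boundary-adjusted p-value for overdispersion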