the use of poisson regression in the sociological study of suicide

CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY Vol.5 (2014) 2, 97–114 DOI: 10.14267/cjssp.2014.02.04
THE USE OF POISSON REGRESSION IN THE
SOCIOLOGICAL STUDY OF SUICIDE
Ferenc Moksony–Rita Hegedűs1
ABSTRACT This paper explains how Poisson regression can be used in studies
in which the dependent variable describes the number of occurrences of some
rare event such as suicide. After pointing out why ordinary linear regression is
inappropriate for treating dependent variables of this sort, we go on to present
the basic Poisson regression model and show how it fits in the broad class of
generalized linear models. Then we turn to discussing a major problem of Poisson
regression known as overdispersion and suggest possible solutions, including the
correction of standard errors and negative binomial regression. The paper ends
with a detailed empirical example, drawn from our own research on suicide.
KEYWORDS Deviant behavior, Generalized linear models, Poisson regression,
Rare events, Research methods, Statistical analysis, Suicide
In social research, we often encounter dependent variables that describe
the frequency of occurrence of events of various kinds. Students of deviant
behavior, for example, look at whether media reporting of celebrity suicides
leads to an increase in the number of self-destruction; demographers study
the impact of environmental hazards on the number of birth defects; and
sociologists of science examine the factors that influence the frequency
with which publications are cited by fellow researchers. In cases like these,
scholars frequently rely on ordinary linear regression to analyze their data.
This sometimes produces acceptable results, especially when the mean
frequency of occurrence of the event under study is relatively large, since
in this situation the distribution of the dependent variable does not usually
deviate substantially from the normal distribution assumed by ordinary least
1 Ferenc Moksony is professor, Rita Hegedűs is associate professor at the Corvinus University
of Budapest. Corresponding author’s e-mail: [email protected]. The research reported in
this paper was financially supported by the Hungarian Scientific Research Fund (Grant no.
T043441).
CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2 (2014)
98
FERENC MOKSONY–RITA HEGEDŰS
squares regression.
Often, however, the event we are interested in explaining is rare and the
distribution of the dependent variable is highly skewed, with frequencies
peaking at the lowest value and sharply declining toward the upper end of the
scale. Figure 1 illustrates this, showing the frequency distribution of suicides
in Hungarian villages from 1990 to 1995.2
Figure 1 Frequency distribution of suicide in Hungarian villages, 1990–1995
What we see on this graph is a far cry from the familiar symmetrical bellFrequency
of suicide
Hungarian
1990–1995no
shapedFigure
curve1.of
normal distribution
distribution:
almostin800
villagesvillages,
had absolutely
suicide during the six years under study and another 600 had but one. The
number of communities with many cases of death, in contrast, is very low,
with only 38 villages registering more than twenty suicides. Variables with
such asymmetric, right-skewed distributions can be approximated with an
important class of discrete distribution, Poisson distribution.3
2 The graph displays data for areas officially recognized as villages for the whole period from
1990 to 1995.
3 As the mean frequency of occurrence of the event under study increases, the shape of the
Poisson distribution gradually approaches that of normal distribution (see, e.g., Sváb 1981:
498). The Poisson distribution, then, is of greatest importance for the study of rare events (cf.,
Land - McCall 1996).
CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2ZZZZY (2014)
PROBLEMS OF ORDINARY LEAST SQUARES AND POSSIBLE SOLUTIONS
THEimportant
US OF POISSON
REGRESSION
IN THE
SOCIOLOGICAL
OF SUICIDE
PROBLEMS
OF
ORDINARY
SQUARES
AND
POSSIBLE
99
One
characteristic
of LEAST
Poisson
distribution
is
thatSTUDY
the mean
is SOLUTIONS
equal to the variance.
This is easy to understand if we regard the Poisson distribution as an extreme case of
One
important characteristic
of Poisson distribution
is thatSQUARES
the mean is equal
to the variance.
PROBLEMS
OF ORDINARY
LEAST
AND
POSSIBLE
SOLUTIONS
binomial
distribution
in which the probability of one of two possible outcomes (e.g.,
This is easy to understand if we regard the Poisson distribution as an extreme case of
suicide/no
suicide, birth
defect/no birth
defect) isdistribution
very small. The
mean
of a binomially
One distribution
important
characteristic
of Poisson
that
theoutcomes
mean
is equal
binomial
in which the probability
of one of twoispossible
(e.g.,
to the variance. This is easy to understand if we regard the Poisson distribution
distributed variable is
suicide/no
suicide, case
birth of
defect/no
birthdistribution
defect) is very
small. the
The probability
mean of a binomially
as an extreme
binomial
in which
of one of
two possible outcomes (e.g., suicide/no suicide, birth defect/no birth defect)
x  npdistributed
,
is very small. The mean of a binomially
variable is
distributed variable is
and its variance is
x  np,
and its variance is
and its variance is
s 2  np (1  p),
where n is the number of observations
s 2  np (1 and
 p),p is the relative frequency of one
where
n istwo
the possible
number ofoutcomes.
observationsAs
andp pgets
is thesmaller
relative and
frequency
of one
the two
of the
smaller,
theofterm
in
parentheses on the right hand side of the equation approaches 1, and thus s2,
possible outcomes. As p gets smaller and smaller, the term in parentheses on the right hand
where
n is the number
of observations
and p is the relative frequency of one of the two
the variance,
approaches
np, the mean.
does this
characteristic
distribution
affect
the use of
the variance,
approaches np,
the mean.
sideHow
of the equation
approaches
1, and thusofs2,Poisson
possible outcomes. As p gets smaller and smaller, the term in parentheses on the right hand
ordinary linear regression? In regression analysis, we assume the conditional
mean
the dependent
is some
functionapproaches
of the explanatory
variables;
np, the mean.
side
of theofequation
approachesvariable
1, and thus
s2, the variance,
that is, we assume that the former changes in a systematic way with the values
3
of the
themean
latter.
With
Poisson-distributed
this
assumption
As
frequency
of occurrence
of the event underdependent
study increases,variables,
the shape of the
Poisson
distribution
implies
that thethatconditional
variance
of the
variable,
being equal
gradually
approaches
of normal distribution
(see, e.g.,
Svábdependent
1981: 498). The
Poisson distribution,
then, isto
of
3
the
also
theevent
values
of
the
explanatory
variables,
greatest
importance
forchanges
the
study ofwith
rareofevents
(cf.,
Land study
- McCall
1996). the shape of
As
the mean,
mean
frequency
of occurrence
the
under
increases,
the Poisson violating
distribution
an important
requirement
of ordinary
squares
regression,
namely,
that
gradually
approaches that
of normal distribution
(see, e.g., least
Sváb 1981:
498). The
Poisson distribution,
then, is
of
is the same for all levels of
the independent variables (homoscedasticity).
Conventional regression methods, then, are not generally appropriate
for dependent variables that describe the frequency of rare events such as
suicides or birth defects. What next, then? One option is to transform the
dependent variable in order to make it satisfy the assumptions underlying
ordinary least squares, and then use the transformed data in place of the
original ones. Researchers taking this approach commonly employ the
square-root transformation, which has the advantage of eliminating the meanvariance identity, since the variance of the square-root of a Poisson variable is
independent of its mean (Chatterjee & Price 1977: 39). An additional benefit
is that the distribution of the new variable will be more similar to the normal
distribution, which is important for significance tests.
the conditional
variance
ofevents
the dependent
variable
greatest
importance for the
study of rare
(cf., Land - McCall
1996).
CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2 (2014)
100
FERENC MOKSONY–RITA HEGEDŰS
While transforming the dependent variable is a common practice that is
widely adopted by researchers (e.g., Moksony 2001), another possibility is to
modify the regression model, making it suitable for the analysis of Poisson
variables. This approach, in some sense, is the reverse of the previous
one: rather than tailoring the distribution of the dependent variable to the
requirements of ordinary least squares, here we proceed the other way around
and tailor the regression model to the distribution of the dependent variable.
This “custom-made” version of regression analysis is known as Poisson
regression.
Poisson regression belongs to the family of generalized linear models
(Hoffman 2004; Agresti 1996: Ch. 4). These models extend the scope of
ordinary linear regression in two ways. First, they describe transformations
of the conditional mean of the dependent variable, rather than the mean itself,
as linear functions of explanatory variables; second, they allow the dependent
variable to have conditional distributions other than the normal. Various
forms of generalized linear models differ from each other in the particular
type of transformation applied and the specific distribution assumed for the
dependent variable. Table 1, adapted from Agresti (1996: 97), lists the most
important of these models.
Table 1 Generalized linear models
Model
Transformation
applied to the mean
Distribution of
Dependent Variable
Type of explanatory
variables
Numerical and
Categorical
Numerical and
Categorical
Poisson regression
Logarithm
Poisson
Logistic regression
Logit
Binomial
Linear regression
Identity
Normal
Numerical
Analysis of variance
Identity
Normal
Categorical
Analysis of covariance
Identity
Normal
Numerical and
Categorical
Source: Agresti 1996: 97 (Table 4.5).
In Poisson regression, as can be seen, logarithmic transformation is used
and the dependent variable is taken to follow a Poisson distribution. In
contrast, in logistic regression, a logit transformation is employed and the
dependent variable is assumed to have a binomial distribution. The table
also includes ordinary linear regression, analysis of variance and analysis
of covariance, which together comprise what is known as the general linear
model (Fennessey 1968; Cohen 1968). In these three types of models, the
CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2ZZZZY (2014)
THE US OF POISSON REGRESSION IN THE SOCIOLOGICAL STUDY OF SUICIDE
101
All these variants of the general linear model, then, are just special cases of the general
normal distribution
is assumed for the dependent variable and the identity
4
linear model.is
transformation
used;
that is, the conditional
is directly describedOF ITS
THE POISSON REGRESSION
MODELmean
ANDitself
INTERPRETATION
as COEFFICIENTS
a linear function of the explanatory variables. All these variants of the
general linear model, then, are just special cases of the generalized linear
4
model.
In the
simple bivariate case, the Poisson regression model has the following form:
ln   MODEL
(1)
0  1 X AND
THE POISSON REGRESSION
THE POISSON REGRESSION MODEL AND INTERPRETATION OF ITS
INTERPRETATION
OF ITS COEFFICIENTS
COEFFICIENTS
where  denotes the mean or expected value of the dependent variable, X is the indepe
In
has the
In the
the simple
simple bivariate
bivariate case,
case, the
the Poisson
Poisson regression
regression model
model has
the following
following form:
form:
variable, and  0 and 1 are the regression coefficients to be estimated. These coefficient
ln    0  1 X
(1)
essentially the same meaning as in ordinary linear regression; 1 , in particular giv
where l denotes the mean or expected value of the dependent variable, X is
 denotes
thewhere
independent
variable,
andorb0expected
and b1 are
the of
regression
coefficients
to X
beis the indepen
the mean
value
the dependent
variable,
change in the natural logarithm
of the mean frequency of the dependent variable per on
estimated. These coefficients have essentially the same meaning as in ordinary
linear
regression;
band
gives thecoefficients
change in the
logarithm coefficients
1particular
1, in
are the
regression
to benatural
variable,
change inand
the 0explanatory
variable.
This interpretation
isestimated.
not veryThese
user-friendly, how
of the mean frequency of the dependent variable per one unit change in the
explanatory
variable.
interpretation
istonot
very
however;
1 , in particular
researchers,
after
all,This
do not
usually
think
inuser-friendly,
terms
of logarithms.
It is, therefore,
gives
essentially
the
same
meaning
as in like
ordinary
linear
regression;
researchers, after all, do not usually like to think in terms of logarithms. It is,
therefore,
useful
to reformulate
Eq.
(1)
by taking
the
of both
to reformulate
Eq.
(1)logarithm
by taking
thethe
antilogarithm
of antilogarithm
both
sides:
change
in
the natural
of
mean
frequency
of the
dependent
variable per one
sides:
change in the explanatory variable. This interpretation is not very user-friendly, howe
  exp( 0  1 X )  exp( 0 ) exp(1 X )
(2)
researchers, after all, do not usually like to think in terms of logarithms. It is, therefore, us
In In
contrast
totob1,1 which
contrast
, whichrepresented
representedthe
theadditive
additiveeffect
effectof
of the
the explanatory
explanatory variable on t
variable
on
the
log
of
the
mean
frequency,
exp(
b
)
represents
its
multiplicative
to reformulate Eq. (1) by taking the antilogarithm
of both sides:
1
on mean
the
mean
frequency
indicating
how
many
times
larger
(orthe
under studyeffect
becomes
as the
independent
variable
increases
by
one unit.
To
see this,
suppose
X
1 ) represents
its
multiplicative
effect
on
mean freq
of the
frequency,
exp(itself,
smaller) the mean frequency of the phenomenon under study becomes as the
  exp(
 0  1by
X )  exp( 0 ) exp(1 X )
(2)
independent
increases
one unit.
see like
this, suppose X is equal to
is equal to a,
which
canvariable
be anyhow
number.
then To
looks
itself, indicating
many Eq.
times(2)larger
(or
smaller)this:
the mean frequency of the phenom
a, which can be any number. Eq. (2) then looks like this:
In contrast to 1 , which represented the additive effect of the explanatory variable on the
( X  a )  exp(  0  1a )  exp(  0 )  exp( 1a ),
4
Roughly speaking, while the general linear model extends the scope of ordinary regression by a
where
themean
vertical
bar refers
condition its
thatmultiplicative
the explanatory
variable
) represents
effect
on the mean freque
of the
frequency,
exp(to1the
categorical
to be included
the
analysis,
generalized
linear
models
step
where the takes
vertical
barindependent
refers tovariables
the condition
explanatory
takes go
onone
one
on one
particular
value,
namely,
a. that
Let inusthe
now
increase
thevariable
value
of the
itself, indicating how many times larger (or smaller) the mean frequency of the phenome
particular value, namely, a. Let us now increase the value of the explanatory variable by one
unit, from
4 R
oughly speaking, while the general linear model extends the scope of ordinary regression by
allowing categorical independent variables to be included in the analysis, generalized linear
4
a tomodels
a+1.
(2)
takes
this
Roughly
speaking,
while
general
linear model
extends
scope of variable.
ordinary regression by allo
goEq.
one
step now
further
andthe
also
lift form:
the constraints
imposed
on the dependent
categorical independent variables to be included in the analysis, generalized linear models go one step fu
OF 
SOCIOLOGY
AND SOCIAL POLICY 2 (2014)
( X  a  1)  exp 0 CORVINUS
1(a  1JOURNAL
)  exp(
0 )  exp( 1a)  exp( 1 ).
Taking the ratio of the two equations, we get:
where thevalue,
vertical
bar refers
thatvalue
the explanatory
variablevariable
takes on
one
particular
namely,
a. Lettousthe
nowcondition
increase the
of the explanatory
by one
Taking the ratio of the two equations, we get:
particular
value,
namely,
a.now
Let us
now
increase
of the
explanatory variable
FERENC
MOKSONY–RITA
HEGEDŰSby one
102 a
unit,
from
to a+1.
Eq. (2)
takes
this
form: the value
(Eq.X(2)
 now
a  1takes
) exp(
0 )  ( 1a )  exp( 1 )  exp(  ).
unit, from a to a+1.
this 
form:
explanatory
oneunit,
a+1.
(2)1now
takes
( variable
X  a  1by
)  exp
 from
(a  1a) to
 exp(
 Eq.
)  exp(
a)  exp(
11).this form:
0
1
( X  a )
exp(  0 )  ( 01a )
( X  a  1)  exp 0  1(a  1)  exp(  0 )  exp( 1a)  exp( 1 ).
Taking the ratio of the two equations, we get:
We can Taking
see thatthe
as ratio
the value
explanatory
variable, X , has increased from a to (a+1),
of the of
twothe
equations,
we get:
Taking the ratio of the two equations, we get:
( X  a  1) exp(  )  (  a )  exp(  )
( X  a )
exp(  )  (  a )
( X  a  1) exp(  0 )  (01a ) 1exp( 1 )

 exp( 1 ).
a factor equal to exp( 1(). X  a)
exp(  0 )  ( 1a )
We can
see
that
as
the
value
of
the
explanatory
variable,
X
,
has
from from
a to (a+1),
We can see that as the value of the explanatory variable, X,increased
has increased
1
1variable,
that is, by one unit, the mean frequency
of0the dependent

 exp( 1). , has indeed changed by
a tosee
(a+1),
is,value
by one
the mean frequency
variable,
We can
that that
as the
of unit,
the explanatory
variable, Xof, the
has dependent
increased
from
a to (a+1),
 , has indeed
that is,
unit, changed
the mean by
frequency
the dependent
changed by
l, by
hasone
indeed
a factorofequal
to exp(b variable,
).
1
It can be helpful to express this change in percentage form
using the following formula:
be helpful
to frequency
express this
change
in percentage
usingchanged
the
 , form
that is, It
by can
one unit,
the
mean
of
the
dependent
variable,
has indeed
by
a factor
equal to formula:
exp( 1 ).
following
a factor equal to exp( 1 ). percentage change  100 exp( 1 )  1.
It can be helpful to express this change in percentage form using the following formula:
If, for example, the dependent variable is the number of suicides in a village,
It can
be explanatory
helpful
express
thisvariable
change
inispercentage
form
the value
following
the
variable
is
the unemployment
rate,
and the
of b1formula:
is, say,
If, for
example,
thetodependent
the number
ofusing
suicides
in a village,
the explanatory
 100
exp(
1 )  1.(b1)–1=100*(1.057–
.055, then exp (b1) = percentage
exp (.055)change
= 1.057
and
100*(exp
means each percentage
pointthe
rise in
the level of unemployment
variable 1)=5.7,
is thewhich
unemployment
rate,
and
value
percentage
change
 100 exp(
 )  1.of 1 is, say, .055, then
increases, on average, the number of suicides by 5.71 percent. In the same way,
If, for example, the dependent variable is the number of suicides in a village, the explanatory
if the independent variable is type of settlement, with villages coded 0 and
exp( 1 )  exp(.055)  1.057, and 100  (exp( 1 )  1  100 (1.057  1)  5.7 , which means each
1, and the
value ofvariable
b1 is, say,
–.035, then
(b1)=exp(–.35)=.70,
and
If, forcities
example,
dependent
of exp
suicides
the .055,
explanatory
1a village,
variable
is thetheunemployment
rate,is the
andnumber
the value
of in
is, say,
then
100*(exp (b1)–1)=100*(.70–1)= –30, which means the number of suicides is,
percentage
point
rise
in
the
level
of
unemployment
increases,
on
average,
the
number
of
5
on average,
30 percent
less in cities
than
variable
is the
the
value
of  1) 1 5.is,
say, means
.055, each
then
(exp(and
 ) in
1 villages.
100
(1.057
7 , which
exp(  )  exp(.
055) unemployment
1.057, and 100rate,
1
1
100  unemployment
(exp(  )  1  100 (1.057  1)  5.7 , which means each
exp( 1 )  exp(.
055rise
)  1in
.057
, and
percentage
point
the
level
and also
lift the constraints
imposed
on
the of
dependent1variable. increases, on average, the number of
INCLUDING POPULATION AT RISK IN THE MODEL
percentage point rise in the level of unemployment increases, on average, the number of
and also lift
theimportant
constraints imposed
thethe
dependent
variable.
One
featureonof
Poisson
regression
model discussed thus far
is
that
it
does
not
take
differences
in
the
size
of
the
population at risk into
and also lift the constraints imposed on the dependent variable.
account. This is clearly a limitation because, with other factors remaining
5 It should be noted, though, that with dichotomous independent variables like type of settlement
(cities/villages), exp (b1) can only be interpreted directly as reflecting the impact of those
variables when we use dummy coding; that is, when we assign the values 0 and 1 to the two
categories. This is because in this case, the distance or difference between the categories
exactly equals one unit. With effect coding, in contrast, when the values –1 and +1 are used to
denote the two categories, the distance or difference is two rather than one unit and thus exp
(b1)no longer directly indicates the influence of the explanatory variable under study. To get this
influence, exp (b1) has to be raised to the second power, to accommodate the greater distance
between categories.
CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2ZZZZY (2014)
THEin
USthe
OF POISSON
IN THE
STUDY OF for
SUICIDE
103 sort of
Differences
size of REGRESSION
the population
at SOCIOLOGICAL
risk can be controlled
by applying some
standardization,
dividing
mean
frequency
by
theobviously
population
risk:
Differences
in the
size
of the
the
population
risk
can
be
controlled
for
by applying
some sort of
constant,
a larger
population
atatrisk
can
beatexpected
to produce
a greater frequency of the phenomenon under study. More populous cities,
standardization,
dividing
frequency
by the
population
at risk:simply because

for instance,
willthe
in mean
general
have higher
numbers
of suicides
ln    0  1 X
(3), factors, such as
n
of
their
dimensions,
quite
regardless
of
the
impact
of
other
 population at risk can be controlled
Differences in the size of the
for by applying some
poverty, unemployment,
   or residential segregation.
ln    0  1 X
(3),
where n is Differences
the population
as the
number at
of risk
individuals
in a city
or village.
in at
the
of the
population
can be living
controlled
for by
n size
 risk,
 such
standardization,
dividing
theofmean
frequencydividing
by the the
population
at risk:by the
applying
some sort
standardization,
mean frequency
This
of the
Poisson
modelofdescribes
theliving
rate ofinoccurrence
of the
population
at risk:
wheremodified
n is
the version
population
at risk,
suchregression
as the number
individuals
a city or village.
   than its absolute frequency, as a linear function of X, the
phenomenon
study,
lnrather
 1 X model describes the rate (3),
This modifiedunder
version
of the
of occurrence of the
Poisson
   0regression
n
independent
variable.
making
use
of absolute
the
properties
of logarithms,
(3)living
canofalso
be
phenomenon
under
rather than
its
asofa individuals
linearEq.
function
X, the
where
n is study,
theBy
population
at risk,
such
as frequency,
the number
in a population
city or village.
of the Poisson
regressionliving
modelin a city or
where n is the
at This
risk,modified
such asversion
the number
of individuals
written
asdescribes
independent
variable.
By making
use ofofthe
of logarithms,
(3) than
can also be
the rate
of occurrence
theproperties
phenomenon
under study,Eq.
rather
its
absolute
frequency,
as
a
linear
function
of
X,
the
independent
variable.
Thiswritten
modified
version of the Poisson regression model describes the rateByof occurrenc
asmaking
 )  ln( nof) logarithms,
 0  1 X Eq. (3) can also
(4), be written as
use of theln(
properties
phenomenon under study,ln(rather
absolute frequency,
as a linear function of
 )  ln(than
n )  its
(4),
0  1 X
which, in turn, can be rearranged to get:
in turn,By
can making
be rearranged
independentwhich,
variable.
use toofget:
the properties of logarithms, Eq. (3) can
which, in turn, can be rearranged to get:
written as
ln( )  ln( n )   0  1 X
(5).
ln(from
 )  ln(
)  given
This model differs
thenone
it contains a separate
0  by
1 X Eq. (1) in that (5).
This model
differs from
the
one
given
by
Eq.
(1)
in
that
it
contains
a separate
explanatory
variable,
ln(n),
generally
called
an
offset,
which
reflects explanatory
the
ln( )  ln( n )   0  1 X
(4),
size of the population at risk, the coefficient of which is automatically taken
variable,
ln(n),
generally
angiven
offset,
reflects
the
size
of thein
atofrisk, the
This model
differs
the
oneof
bywhich
Eq. (1)
that
contains
apopulation
separate
to
equal
1.from
If thecalled
size
the
population
atinrisk
isitthe
same
each unitexplanatory
observation, then ln(n) will be constant and thus can be incorporated into b0,
coefficient
ofcan
which
automatically
taken
to equal
1. Ifthe
the sizeofofthe
the population at risk is
which,
in turn,
beis
rearranged
to
get:which
variable,
ln(n),
generally
called
an offset,
reflects
which
takes
us
back
to the
original
formulation;
thatsize
is, Eq. (1).population at risk, the
the
same inof
each
unitisofautomatically
observation, then
will be
constant
can be incorporated
coefficient
which
takenln(n)
to equal
1. If
the sizeand
of thus
the population
at risk is
ln(to)the
 ln(
n ) ln(n)
 0 will
 1beXconstant
 0 , which
the same
in each
unitus
ofback
observation,
then
and(1).
thus (5).
can be incorporated
into
takes
original
formulation;
that is, Eq.
OVERDISPERSION
Besides including a separate independent variable to capture differences in
us population
back to the at
original
formulation;
thatofis,Poisson
Eq. (1).regression also
into  0 , which
the sizetakes
of the
risk, the
initial model
This modeloften
differs
one given
by Eq.
(1)One
in important
that it contains
a separate expl
needsfrom
to bethe
modified
for another
reason.
characteristics
variable,
of the Poisson distribution, as already noted, is that the mean is equal
to thegenerally
variance. called
The default
model which
of Poisson
regression
is built
on population
this
ln(n),
an offset,
reflects
the size
of the
identity; computations are performed assuming the mean and the variance
of the dependent variable really are the same. In many cases, however, this
at r
coefficient of which is automatically taken to equal 1. If the size of the population a
the same in each unit of observation, then ln(n) will be constant and thus can be incor
CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2 (2014)

104
FERENC MOKSONY–RITA HEGEDŰS
assumption does not hold true and the variance exceeds the mean. This is
what is known in the statistical literature as overdispersion (Agresti 1996:
92–3; Le 1998: 226–228).
Overdispersion generally arises from one of two sources (King 1989:
766–769; Osgood 2000: 28). First, in almost all research, some explanatory
variables uncorrelated with the ones included in the analysis are left out
from the model, either because they do not come to mind, or because we
cannot capture them empirically. Suppose, for example, the frequency of
suicide for cities or villages is influenced by economic development and
regional culture, but we only include the former in the study. What is likely
to happen in this situation? Holding the level of economic development
constant, areas belonging to that level will be geographically – and culturally
– heterogeneous; they will be a mixture of sub-populations, with each subpopulation having its own distinct, regionally-determined mean frequency
of suicide. That is, the mean of the dependent variable for a certain value
of explanatory variable will not be constant, as is assumed by the Poisson
distribution, but we will instead have as many different means as there are
regions. The result of this heterogeneity is that the variance of the conditional
distribution of suicide – that is, of the distribution pertaining to a given
level of economic development – will be greater than expected based on the
Poisson distribution. The source of overdispersion, then, in this case is the
excess variation caused by explanatory variables left out from the model;
quite similarly to ordinary linear regression, where the impact of the omission
of independent variables uncorrelated with those included in the analysis is to
increase the error or residual variance.
Another factor that may give rise to overdispersion is dependence among
observations. One assumption of Poisson distribution is that observations are
independent of each other; that the occurrence of an event (e.g., a suicide) does
not influence the occurrence of another. What happens when this requirement
is not met; when, for example, the incidence of a suicide increases the chance
of another? In this situation, the frequency of small and large values – that
is, those at the two extreme tails – will be greater than expected based on
the Poisson distribution and the variance of the variable will increase as a
result. If, for instance, kids in a school imitate each other’s behavior, then
instead of observing a random distribution of cases, what we will likely see
is that while nothing happens for months, a substantial number of suicide
attempts take place within a very short time frame. The literature is replete
with examples of such time-space clusterings of cases (e.g., Phillips 1974;
Phillips and Carstensen 1986; Gould et al. 1990). From our point of view, the
important thing about these examples is that the process of imitation and the
CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2ZZZZY (2014)
standard
errors results.
using aOne
correction
the degree
overdispersion.
This
e overly
optimistic
way tofactor
cope that
with reflects
this problem
is to ofadjust
the
THE US OF POISSON REGRESSION IN THE SOCIOLOGICAL STUDY OF SUICIDE
105
correction
performedfactor
based that
on an
analysis
of the residuals
from the This
original Poisson
d errors
using ais correction
reflects
the degree
of overdispersion.
underestimated and thus confidence interva
resulting
dependence
of observations
maysum
leadoftosquared
an increase of the variance
and
involves
forming
the of
ratio
of residuals
the
residuals to their
on isregression
performed
based
on
an
analysis
the
from the standardized
original Poisson
of the dependent variable, thus violating a fundamental assumption of Poisson
will give overly optimistic results. One w
that also
the mean
is dispersion
equal to theparameter,
variance. turns out to be considerably
degrees
ofdistribution,
freedom.
Ifnamely
this
ratio,
called
on and
involves
forming
the
ratio
of
the
sum
of
squared
residualsThe
to their
How does overdispersion affect Poissonstandardized
regression results?
most
standard
errors
using
a
correction factor t
6
serious
consequence
is
that
although
regression
coefficients
remain
unbiased,
, out
thisto
is be
a sign
of overdispersion and
greater than
then, also
provided
the
model isparameter,
well-specified
of freedom.
Iftheir
this1,standard
ratio,
called
dispersion
considerably
errors
will
be underestimated
andturns
thus confidence
intervals will
correction
is performed
based on an analy
be
unduly
narrow
and significance
tests
will
give
overly
optimistic
results.
6
standard
errors
can
be
adjusted
by
multiplying
them
with
the
square
root
of
the dispersion
is a sign
of overdispersion
and
han 1, then, provided
is well-specified
One waythe
to model
cope with
this problem is, this
to adjust
the standard
errors using
a
regression
and
involves
forming
correction factor that reflects the degree of overdispersion. This correction the ratio of t
parameter
1996:
93; Allisonthem
2001: 223;the
Gelman
Hill
114-116).
d errors
can beis(Agresti
adjusted
multiplying
squareand
root
of2007:
the dispersion
performedbybased
on an analysis with
of the residuals
from
the
original
Poisson
degrees
of
freedom.
If
this ratio, also called d
regression and involves forming the ratio of the sum of squared standardized
er (Agresti 1996:
93; Allison
2001:
223;ofGelman
andIfHill
2007:
114-116).
residuals
to their
degrees
freedom.
this
ratio,
called ofdispersion
Another way,
besides
correcting
standard
errors,
to cure
thealso
problem
overdispersion
greater
than
then,provided
provided
the modelisistow
parameter, turns out to be considerably greater
than
1,1,then,
the
is well-specified
, this is aexcess
sign ofvariability
overdispersion
and standard
errors regression.
to model
acorrecting
model
that
incorporates
not
captured
Poisson
way,switch
besides
standard
errors,the
to them
cure the
of
overdispersion
is to by multiply
standard
errors
canthebeby
adjusted
can
be adjusted
by multiplying
withproblem
the square
root
of
dispersion
6
parameter (Agresti 1996: 93; Allison 2001: 223; Gelman and Hill 2007: 114-
In thisthat
newincorporates
model, the conditional
mean of the
variable
is1996:
no
longer
a constant,
o a model
the excess variability
notdependent
captured
Poisson
regression.
parameterby
(Agresti
93; Allison
2001:as2
116).
Another way, besides correcting standard errors, to cure the problem of
in the the
original
formulation,
butthe
a variable
that
scattersisrandomly
around
someascentral value,
ew model,
conditional
mean
of
dependent
variable
no longer
a constant,
overdispersion
is to switch
to
a model that
incorporates
the excess
variability
Another
besides correcting
not captured by Poisson regression. In this new
model,way,
the conditional
mean of standard er
due
to the effectbut
of aomitted
explanatory
variables. What
we do, basically,
is
to add a random
riginal
formulation,
variable
central
value,
the dependent
variablethat
is noscatters
longer arandomly
constant, around
as in thesome
original
formulation,
switch
a model
thatdue
incorporates
the exces
but a variable that scatters randomly around
sometocentral
value,
to the
component
to Eq.
he effect
of omitted
explanatory
variables. What
we do,
basically,
is to add is
a random
effect
of (2):
omitted explanatory
variables.
What
we do, basically,
to add a
In this new model, the conditional mean of t
random component to Eq. (2):
ent to Eq. (2):
~
  exp(  0  1 X  ein) the original (6)
formulation, but a variable th
~
due
the effect
of and
omitted
varia
a random
Eq. (2)
e is explanatory
the
 is
 exp(
 0  variable
~ where l
(6)ltofrom
1 X  e) that replaces
where  is
a
random
that term
replaces
from Eq. the
(2)additional
and e is the
newly added random
newly added variable
random error
thatrepresents
heterogeneity
component
to Eq.
introduced by causal factors not explicitly
considered
in(2):
the analysis.
error
term
that
represents
the
additional
heterogeneity
introduced
byrandom
causal
The
main
difference
between
Eqs.
(2)
and
(6)
is
that
while
in
Eq.
(2) for factors not
is a random variable that replaces  from Eq. (2) and e is the newly added
each level of the independent variable we had a single mean value of the
~ (6) is that
explicitly
considered
in the(lanalysis.
Thewemain
difference
Eqs.
(2)not
and
dependent
), heterogeneity
in Eq. (6),
have
a whole distribution
means
(l
), exp(  0  1
rm that
represents
the variable
additional
introduced
bybetween
causal of
factors
corresponding to the fact that now the omitted variables subsumed in the error
the individual
means
randomly
deviate Eqs.
from (2)
the level
determined
y considered term
in themake
analysis.
The main
difference
between
and (6)
is that
~

is
a
random
variable
that replaces
where
6
This qualification is important because a high value for the ratio of the sum of squared standardized
residuals
to their degrees
freedom may
also indicate
fit or
rather
than
overdispersion, such as
6 Thisof
qualification
is important
becauselack
a highof
value
formisspecification,
the ratio of the sum of
squared
standardized
error
term
that
represents
the additional h
residuals
to their
degrees
of freedom
may also
indicate
lack
of fit orstandardized
misspecification,
rather
alification is important
because
a high
value
for the ratio
of the
sum of
squared
residuals
than overdispersion, such as when a linear model is used to describe a curvilinear relationship
egrees of freedom (Bair,
may also
indicate lack of fit or misspecification,
rather thanconsidered
overdispersion,
such analysis.
as
2013).
explicitly
in the
The m
CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2 (2014)
6
This qualification is important because a high value
to their degrees of freedom may also indicate lack of
degrees of freedom. If parameter
this ratio, also
called
dispersion
parameter,
turnsGelman
out to beand
con
(Agresti
1996:
93; Allison
2001: 223;
6
MOKSONY–RITA
, thisHEGEDŰS
is a sign of overdispe
than 1, then, provided the model isFERENC
well-specified
Another way, besides correcting standard errors, to cure the
errors part
can of
be the
adjusted
multiplying
them with
the square
by standard
the systematic
model by
(Long
1997: 230–231;
Gardner
et al. root of the
switch to a model that incorporates the excess variability not
1995: 399–400).
parameter
(Agrestiof1996:
93; Allison
2001:
223;goes
Gelman
andthe
Hillinitial
2007: 114-116).
This
modification
Poisson
regression,
then,
beyond
In of
this
model, the
mean
of the dependent var
model by changing the mean
thenew
distribution
fromconditional
a constant to
a variable.
106greater
In the original formulation, the dependent variable (Y) was a random
Another
besides
correcting
errors,a to
the
problem
of overdispe
in
the
butcure
a variable
that scatters
rando
variable
thatway,
scattered
around
theoriginal
meanstandard
(l)formulation,
following
Poisson
distribution.
The mean itself, however, was a constant – its value was unequivocally
switch toby
a model
thatof
incorporates
the of
excess
variability
notmodel,
captured
Poisson
determined
the value
thetoexplanatory
variable.
Inexplanatory
the new
Y byWhat
due
the effect
omitted
variables.
we rd
continues to be a random variable scattering around a mean level following
able that fluctuates
around
a central
value
following
Gamma-distribution.
result
In this new
model,
the
conditional
of the
dependent
variable
is noThe
longer
a co
a Poisson
distribution;
butcomponent
now,
this mean
isa also
a random
variable
tomean
Eq.level
(2):
that fluctuates, following a Gamma-distribution, around some central value,
iable that fluctuates around a central value following a Gamma-distribution. The resul
intothe
formulation,
a expected
variable
thatbinomial
scatters
randomly
someiscen
theoriginal
random
error, e. Sincebut
value
of the error
is zeroaround
by which
combinationdue
of
these
two distribution
isthe
the
negative
distribution,
w
~
assumption, the central value that the individual means (
l) scatter
around
is
exp(  0  1 X  e)
combination
of these
distribution
is themodel
negative
binomial
distribution,
which
is w
identical
thetwo
mean
from
the original
given by
Eq.
(2); we
that
is, basically,
due
totothe
effect
of
omitted
explanatory
variables.
What
do,
is to add
modified form of Poisson regression ~described here is called negative binom
~ (l ) =l .
E
component
to Eq. (2):regression
modified form
of Poisson
here that
is called
random variable
replaces negative
from Eq. binom
(2) an
where  is adescribed
ression.
In final analysis, then, in the modified form of Poisson regression we have a
mixture of two different distributions:
first,
have thethe
dependent
variable
error term
that we
represents
additional
heterogeneity int
~
(Y) that scatters around the mean (l)following
a
Poisson
distribution;
second,
exp(  0  1 X  e)
(6)
have the mean
that isregression
itself a random
around
a difference
advantage we
of negative
binomial
overvariable
conventional
Poisson
regression
is thab
explicitly
considered
in that
the fluctuates
analysis.
The
main
central value following a Gamma-distribution. The result of the combination
~
e advantageofofwhere
negative
conventional
Poisson
regression
isadde
th
these
two
isregression
the negative over
binomial
distribution,
which
ise why
 isdistribution
abinomial
random
fromallows
Eq. (2)
andformer
is
thetonewly
s not require
the
variance
to be variable
equal tothatthereplaces
mean and
the
exceed
t
the modified form of Poisson regression described here is called negative
6
This qualification
is important
for the
of the
binomial
regression.
es not require
the variance
be equal
the mean
andbecause
allowsa high
the value
former
exceed
error
term
that to
represents
thetoadditional
heterogeneity
introduced
bytoratio
causal
fa
er. As against The
the advantage
original form
of toPoisson
regression,
where
of negative
binomial
regression
over
conventional
Poisson
their degrees of freedom may also indicate lack of fit or misspecificati
regression
is that
itform
does not
require
the
variance
to be
equal
to the between
mean andEqs. (2) and (
explicitly
considered
the analysis.
The main
difference
er. As against
the original
ofinPoisson
regression,
where
allows the former to exceed the latter. As against the original form of Poisson
E(Y )  var(Y )   ,
regression, where
gression.
E(Y )  var(Y )   ,
6
This qualification is important because a high value for the ratio of the sum of squared standardiz
egative binomial
regression,
to their degrees of freedom may also indicate lack of fit or misspecification, rather than overdispers
negative binomial
regression,
in negative
binomial regression,
E(Y )  
E(Y )  
and
and
var(Y )   (1   )    2 ,
var(Y )   (1   )    2 ,
and
ere α is the dispersion
that indicates
the degree
of of
overdispersion.
where α is theparameter
dispersion parameter
that indicates
the degree
overdispersion. In the spec
the special case
when there
no overdispersion,
α = 0of
andoverdispersion.
var (Y) = l+a In the spe
ere α is the Indispersion
parameter
thatis indicates
the degree
e when therel2 is
= 0 and
var(Y) to
 Poisson
  2 regression.
  and negative binom
= lno
andoverdispersion,
negative binomial αregression
simplifies
e when there is no overdispersion, α = 0 and var(Y)    2   and negative binom
ession simplifies to Poisson regression.
CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2ZZZZY (2014)
ression simplifies to Poisson regression.
THE US OF POISSON REGRESSION IN THE SOCIOLOGICAL STUDY OF SUICIDE
107
AN EXAMPLE: LOCAL AREA DEPRIVATION AND
SUICIDE IN RURAL HUNGARY
To illustrate the use of the methods discussed in the previous sections of
this paper, we now present results from one of our studies that looked at the
impact of deprivation on suicide in rural Hungary. Given the main character of
our article, in what follows, we focus on methodological issues at the expense
of substantive details. The analysis spanned the years from 1990 to 1995 and
covered areas officially qualified as villages throughout the whole period.
The number of villages meeting this criterion was 2869 and the number
of suicides committed in those villages was 9237. Statistical data analysis
proceeded in two steps. We first employed principal component analysis to
create a composite measure of deprivation, then we used that measure as the
main explanatory variable in a Poisson regression in which the number of
suicides was the dependent variable and the size of the population at risk was
entered as an offset.
The following indicators were included in the principal component analysis:
– Number of cars per 1,000 inhabitants, 1992-1995
– Power consumption per household, 1993-1995
– Quality of educational infrastructure and health services, 19937
– Unemployment rate, 1993-1994
– Percentage of the population living on welfare, 1993-1995
– Percentage of the population receiving income supplement, 1993-1995
– Net migration, 1990-1995
– Dependency rate8
Figure 2 shows the principal component loadings for these indicators.
Variables capturing economic development, such as power consumption and
the number of cars, all have negative loadings, while those reflecting the lack
thereof, such as unemployment and percentage living on welfare, all have
positive loadings, lending some face validity to the principal component as a
summary measure of deprivation.
7 This variable measures the number of educational and health institutions such as schools and
hospitals.
8 This is the ratio of individuals below 18 and above 59 to those between 18 and 59.
CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2 (2014)
108
FERENC MOKSONY–RITA HEGEDŰS
Figure 2. Graphic display of principal component loadings
on welfare, all have positive loadings, lending some face validity to the principal component
as a summary measure of deprivation.
[FIGURE 2 GOES HERE]
Having constructed our composite index of deprivation, in the second phase of the analysis,
Having constructed
our composite
index
of deprivation, in the second phase
Figure 2. Graphic
display ofmodel:
principal component loadings
rananalysis,
Poisson regression,
the following
ofwethe
we ranusing
Poisson
regression, using the following model:
ln( i )  ln(ni )   0   1 X i ,
isthe
theaverage
average
number
ofinsuicide
in village
during
where
where l
ii is
number
of suicide
village i during
1990 toi 1995;
ni is1990
the sizetoof1995;
ni is the size of the population for the same village in the same period; and
for thecomponent
same village in
the same
i is the principal component
Xithe
is population
the principal
score
for period;
villageandi. XWe
employed the statistical
software
Stata i.8.0
estimate
regression
b0 and
b1.
score for village
Weto
employed
the the
statistical
software coefficients,
Stata 8.0 to estimate
the regression
Table 2 displays the results obtained from Poisson regression. The
coefficients,  0 and 1 .
coefficient
for the principal component score is, as can be seen, positive and
statistically significant, with a one unit increase in our summary measure of
Table 2 displays
the associated
results obtained
fromanPoisson
regression.
The coefficient
deprivation
being
with
increase
of about
.115 in for
the the
natural
logarithm
of the number
ofcan
suicides.
To getand
ridstatistically
of logarithms,
wewith
exponentiate
principal component
score is, as
be seen, positive
significant,
a one
the coefficient and get exp(.1148) = 1.122, which means that a one unit rise in
unit increase in our summary measure of deprivation being associated with an increase of
the
principal component score raises the number of suicides by 12.2 percent.
All
in all,
findings
suggest
local
deprivation
thewerisk of
about
.115 these
in the natural
logarithm
of thethat
number
of suicides.
To get aggravates
rid of logarithms,
self-destruction.
exponentiate the coefficient and get exp(.1148) = 1.122, which means that a one unit rise in
the principal component score raises the number of suicides by 12.2 percent. All in all, these
findings suggest that local deprivation aggravates the risk of self-destruction.
CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2ZZZZY (2014)
[TABLE 2 GOES HERE]
109
THE US OF POISSON REGRESSION IN THE SOCIOLOGICAL STUDY OF SUICIDE
Table 2. Effect of deprivation on suicide. Poisson regression results
Variable
Principal
component score
Constant
n = 2869
Coefficient
Std. error
.1148*
-7.7462
Z-value
.0114
10.11
.0110
-706.17
Antilog
of coefficient
1.122
* p < .001
Checking for overdispersion
As already noted, overdispersion, while not introducing bias into the
regression coefficients, leads to an underestimation of standard errors, thereby
potentially undermining the validity of confidence intervals and significance
tests. It is, therefore, important, before going too far with our conclusions, to
check for the presence of overdispersion and adjust the analysis accordingly,
if necessary.
For this purpose, we compared the observed distribution of suicides with
the one predicted from the Poisson model. The latter gives, for different
frequencies of occurrence, the probability that we would expect if the mean
number of suicides was the same as the one actually obtained (3.22) and the
assumptions underlying the Poisson model were fully met. Figure 3 displays
the results. As can be seen, at the two tails of the distribution, the solid line
runs above the dotted one, indicating that at the lowest and highest number
of suicides, observed proportions exceed Poisson probabilities. In the middle
portion of graph, in contrast, the solid line consistently stays below the dotted
one, implying that over this range, observed proportions are lower than those
predicted from the model. Observed values, then, appear to be spread out more
widely than those calculated based on the Poisson distribution. Corresponding
to this finding, we found the variance of the observed distribution to be
much larger (22.72) than its mean (3.22) and the ratio of the sum of squared
standardized residuals to their degrees of freedom also proved to be higher
than 1 (1.42), which, as already noted, is a sign of overdispersion, provided
the model is well-specified. All in all, these results speak for the fact that
overdispersion presents a real threat to the validity of conclusions drawn from
our study.
CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2 (2014)
110
FERENC MOKSONY–RITA HEGEDŰS
Figure 3. Observed and expected probabilities (based on Poisson distribution, mean =
3.22)
Correcting for overdispersion
Figure 3. Observed and expected probabilities
(based on Poisson distribution, mean = 3.22)
Earlier in this paper,
we described two ways to handle the problem of
overdispersion: adjusting standard errors and switching from Poisson to
negative binomial regression. As for the former, original standard errors
are, as previously explained, multiplied by the square root of the dispersion
parameter, which is the ratio of the sum of squared standardized residuals
to their degrees of freedom. Table 4 reports these corrected standard errors,
along with the original ones, so we can better judge the change that results
from the adjustment.9
9 Stata reports two different corrected standard errors: one is based on the traditional Pearson
Chi-square, which is equivalent to the ratio of the sum of squared standardized residuals to
their degrees of freedom, while the other is calculated using the Likelihood-ratio Chi-square.
Although the two usually give similar results and thus the choice between them does not
generally make much difference, statistical theory suggests that we prefer Pearson Chi-square
to Likelihood-ratio Chi-square (Allison 2001: 223).
CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2ZZZZY (2014)
111
THE US OF POISSON REGRESSION IN THE SOCIOLOGICAL STUDY OF SUICIDE
Table 3. Observed and expected distribution of suicide
0
.273
Expected probability (based on
Poisson distribution)
.040
1
.203
.129
2
.138
.207
3
.099
.222
4
.070
.179
5
.045
.115
6
.029
.062
7
.024
.028
8
.021
.011
9
.021
.004
10
.014
.001
Number of suicide
Observed relative frequency
Table 4. Poisson regression: corrected standard errors
Variable
Principal
component score
Constant
Coefficient
.1148
-7.7462
Original standard
error
.0114
(10.11)
.0110
(-706.17)
Corrected standard error
based on
Pearson
Chi-square
Likelihood-ratio
Chi-square
.0140
(8.50)
.0130
(-593.65)
.0135
(8.20)
.0140
(-573.22)
Note: Numbers in parentheses below standard errors are z-values
As can be seen, although the standard errors have increased somewhat after
correcting them for overdispersion and the z-values have correspondingly
declined slightly (since the coefficients themselves have remained
unchanged), the effect of deprivation, as captured by the principal component
score, continues to be highly statistically significant: the z-value is 6.74 and
the associated p-value is well below the usual 5% threshold.
Besides adjusting the standard errors, we also ran negative binomial
regression, the results of which are displayed in Table 5. The coefficient
for the principal component score is similar to that obtained from Poisson
regression, both testifying to the harmful effect of deprivation on suicide. The
antilogarithm of the coefficient equals 1.109, meaning that a one unit increase
in the principal component score increases the mean number of suicides by
about 11 percent. This effect of deprivation is statistically significant, as is the
estimate of the dispersion parameter, alpha, which provides further evidence
that overdispersion presents a problem in our study.
CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2 (2014)
112
FERENC MOKSONY–RITA HEGEDŰS
Table 5. The impact of deprivation on suicide. Results from negative binomial
regression
Coefficient
Std. error
Z-value
Antilog
of coefficient
Principal component score
.1034*
.0153
6.74
1.109
Constant
–7.7816
.0144
Alpha
.1462*
.0130
Variable
n = 2869
* p < .001
Although the two regressions did not produce markedly different results,
the negative binomial regression appears to fit the data better than the Poisson
regression. This is evident from Figure 4, where the actual distribution of
suicides is shown along with the distributions predicted from Poisson and
negative binomial regressions. As can be seen, while Poisson regression
systematically underestimates very low and very high frequencies, the
negative binomial regression gives predicted values that are fairly close to the
observed ones even at the two tails.
Figure 4. Observed and expected number of suicides, based on Poisson and negative
binomial regression (mean = 3.22, overdispersion = 1.402)
Figure 4. Observed and expected number of suicides, based on Poisson and
CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2ZZZZY (2014)
negative binomial regression (mean = 3.22, overdispersion = 1.402)
THE US OF POISSON REGRESSION IN THE SOCIOLOGICAL STUDY OF SUICIDE
113
SUMMARY AND CONCLUSIONS
Our aim with this paper was to give an introduction to Poisson regression,
which represents an important class of generalized linear models and can
profitably be used in studies in which the dependent variable describes
the number of occurrences of some rare event such as suicide. Although
researchers sometimes apply ordinary linear regression in these situations,
this method, as we have shown, is not generally appropriate for variables
of this sort. And while transformation of the dependent variable may help
alleviate part of the difficulties associated with conventional regression
techniques, Poisson regression approaches the problem more fittingly by
tailoring the model to the distribution of the dependent variable, rather than
the other way around.
REFERENCES
Agresti, Alan (1996), An introduction to categorical data analysis. New York, Wiley.
Agresti, Alan (2002), Categorical data analysis. 2nd edition. New York, Wiley.
Allison, Paul (2001), Logistic regression using the SAS system. Theory and application.
Cary, NC, SAS Institute
Bair, H. (2013), “Poisson Regression: Lack of Fit ≠ Overdispersion”, StatNews #86,
Cornell University, http,//www.cscu.cornell.edu/news/statnews/stnews86.pdf.
Accessed July 22, 2014
Chatterjee, Samprit – Bertram Price (1977), Regression analysis by example. New
York, Wiley.
Cohen, Jacob (1968), “Multiple regression as a general data-analytic system”.
Psychological Bulletin, 70, 426–443.
Fennessey, James (1968), “The general linear model, a new perspective on some
familiar topics”. American Journal of Sociology, 74, 1–27.
Gardner, William et al. (1995), “Regression analyses of counts and rates, Poisson,
overdispersed Poisson and Negative Binomial models”, Psychological Bulletin,
118, 392–404.
Gelman, Andrew – Jennifer Hill (2007), Data analysis using regression and multilevel/
hierarchical models, Cambridge University Press, New York
Gould, Madelyn et al. (1990), “Time-space clustering of teenage suicide”, American
Journal of Epidemiology, 131, 71–78.
Hoffman, John (2004), Generalized linear models, Boston, Pearson Education Inc.
King, Gary (1989), “Variance specification in event count models, from restrictive
assumptions to a generalized estimator”, American Journal of Political Science,
33, 762–784.
Land, Kenneth – Patricia McCall (1996), “A comparison of Poisson, negative
binomial, and semiparametric mixed Poisson regression models”, Sociological
Methods & Research, 24, 387–443.
CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2 (2014)
114
FERENC MOKSONY–RITA HEGEDŰS
Le, Chap (1998), Applied categorical data analysis, New York, Wiley.
Long, J. Scott (1997), Regression models for categorical and limited dependent
variables, Thousand Oaks, Sage.
Moksony, Ferenc (2001), “Victims of change or victims of backwardness? Suicide in
rural Hungary”, in: Lengyel, Gy. - Rostoványi, Zs., ed., The small transformation.
Society, economy and politics in Hungary and the new European architecture,
Budapest, Akadémiai Kiadó, 366-376.
Osgood, D. Wayne (2000), “Poisson-Based Regression Analysis of Aggregate Crime
Rates”, Journal of Quantitative Criminology, 16, 21–43.
Phillips, David (1974), “The Influence of Suggestion on Suicide, Substantive and
Theoretical Implications of the Werther Effect”, American Sociological Review,
39, 340–354.
Phillips, David – Lundie Carstensen (1986), Clustering of teenage suicides after
television news stories about suicide, New England Journal of Medicine, 315,
685–89.
Sváb János (1981), Biometric methods in research [Biometriai módszerek a
kutatásban], Budapest, Mezőgazdasági Kiadó.
CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2ZZZZY (2014)