CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY Vol.5 (2014) 2, 97–114 DOI: 10.14267/cjssp.2014.02.04 THE USE OF POISSON REGRESSION IN THE SOCIOLOGICAL STUDY OF SUICIDE Ferenc Moksony–Rita Hegedűs1 ABSTRACT This paper explains how Poisson regression can be used in studies in which the dependent variable describes the number of occurrences of some rare event such as suicide. After pointing out why ordinary linear regression is inappropriate for treating dependent variables of this sort, we go on to present the basic Poisson regression model and show how it fits in the broad class of generalized linear models. Then we turn to discussing a major problem of Poisson regression known as overdispersion and suggest possible solutions, including the correction of standard errors and negative binomial regression. The paper ends with a detailed empirical example, drawn from our own research on suicide. KEYWORDS Deviant behavior, Generalized linear models, Poisson regression, Rare events, Research methods, Statistical analysis, Suicide In social research, we often encounter dependent variables that describe the frequency of occurrence of events of various kinds. Students of deviant behavior, for example, look at whether media reporting of celebrity suicides leads to an increase in the number of self-destruction; demographers study the impact of environmental hazards on the number of birth defects; and sociologists of science examine the factors that influence the frequency with which publications are cited by fellow researchers. In cases like these, scholars frequently rely on ordinary linear regression to analyze their data. This sometimes produces acceptable results, especially when the mean frequency of occurrence of the event under study is relatively large, since in this situation the distribution of the dependent variable does not usually deviate substantially from the normal distribution assumed by ordinary least 1 Ferenc Moksony is professor, Rita Hegedűs is associate professor at the Corvinus University of Budapest. Corresponding author’s e-mail: [email protected]. The research reported in this paper was financially supported by the Hungarian Scientific Research Fund (Grant no. T043441). CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2 (2014) 98 FERENC MOKSONY–RITA HEGEDŰS squares regression. Often, however, the event we are interested in explaining is rare and the distribution of the dependent variable is highly skewed, with frequencies peaking at the lowest value and sharply declining toward the upper end of the scale. Figure 1 illustrates this, showing the frequency distribution of suicides in Hungarian villages from 1990 to 1995.2 Figure 1 Frequency distribution of suicide in Hungarian villages, 1990–1995 What we see on this graph is a far cry from the familiar symmetrical bellFrequency of suicide Hungarian 1990–1995no shapedFigure curve1.of normal distribution distribution: almostin800 villagesvillages, had absolutely suicide during the six years under study and another 600 had but one. The number of communities with many cases of death, in contrast, is very low, with only 38 villages registering more than twenty suicides. Variables with such asymmetric, right-skewed distributions can be approximated with an important class of discrete distribution, Poisson distribution.3 2 The graph displays data for areas officially recognized as villages for the whole period from 1990 to 1995. 3 As the mean frequency of occurrence of the event under study increases, the shape of the Poisson distribution gradually approaches that of normal distribution (see, e.g., Sváb 1981: 498). The Poisson distribution, then, is of greatest importance for the study of rare events (cf., Land - McCall 1996). CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2ZZZZY (2014) PROBLEMS OF ORDINARY LEAST SQUARES AND POSSIBLE SOLUTIONS THEimportant US OF POISSON REGRESSION IN THE SOCIOLOGICAL OF SUICIDE PROBLEMS OF ORDINARY SQUARES AND POSSIBLE 99 One characteristic of LEAST Poisson distribution is thatSTUDY the mean is SOLUTIONS equal to the variance. This is easy to understand if we regard the Poisson distribution as an extreme case of One important characteristic of Poisson distribution is thatSQUARES the mean is equal to the variance. PROBLEMS OF ORDINARY LEAST AND POSSIBLE SOLUTIONS binomial distribution in which the probability of one of two possible outcomes (e.g., This is easy to understand if we regard the Poisson distribution as an extreme case of suicide/no suicide, birth defect/no birth defect) isdistribution very small. The mean of a binomially One distribution important characteristic of Poisson that theoutcomes mean is equal binomial in which the probability of one of twoispossible (e.g., to the variance. This is easy to understand if we regard the Poisson distribution distributed variable is suicide/no suicide, case birth of defect/no birthdistribution defect) is very small. the The probability mean of a binomially as an extreme binomial in which of one of two possible outcomes (e.g., suicide/no suicide, birth defect/no birth defect) x npdistributed , is very small. The mean of a binomially variable is distributed variable is and its variance is x np, and its variance is and its variance is s 2 np (1 p), where n is the number of observations s 2 np (1 and p),p is the relative frequency of one where n istwo the possible number ofoutcomes. observationsAs andp pgets is thesmaller relative and frequency of one the two of the smaller, theofterm in parentheses on the right hand side of the equation approaches 1, and thus s2, possible outcomes. As p gets smaller and smaller, the term in parentheses on the right hand where n is the number of observations and p is the relative frequency of one of the two the variance, approaches np, the mean. does this characteristic distribution affect the use of the variance, approaches np, the mean. sideHow of the equation approaches 1, and thusofs2,Poisson possible outcomes. As p gets smaller and smaller, the term in parentheses on the right hand ordinary linear regression? In regression analysis, we assume the conditional mean the dependent is some functionapproaches of the explanatory variables; np, the mean. side of theofequation approachesvariable 1, and thus s2, the variance, that is, we assume that the former changes in a systematic way with the values 3 of the themean latter. With Poisson-distributed this assumption As frequency of occurrence of the event underdependent study increases,variables, the shape of the Poisson distribution implies that thethatconditional variance of the variable, being equal gradually approaches of normal distribution (see, e.g., Svábdependent 1981: 498). The Poisson distribution, then, isto of 3 the also theevent values of the explanatory variables, greatest importance forchanges the study ofwith rareofevents (cf., Land study - McCall 1996). the shape of As the mean, mean frequency of occurrence the under increases, the Poisson violating distribution an important requirement of ordinary squares regression, namely, that gradually approaches that of normal distribution (see, e.g., least Sváb 1981: 498). The Poisson distribution, then, is of is the same for all levels of the independent variables (homoscedasticity). Conventional regression methods, then, are not generally appropriate for dependent variables that describe the frequency of rare events such as suicides or birth defects. What next, then? One option is to transform the dependent variable in order to make it satisfy the assumptions underlying ordinary least squares, and then use the transformed data in place of the original ones. Researchers taking this approach commonly employ the square-root transformation, which has the advantage of eliminating the meanvariance identity, since the variance of the square-root of a Poisson variable is independent of its mean (Chatterjee & Price 1977: 39). An additional benefit is that the distribution of the new variable will be more similar to the normal distribution, which is important for significance tests. the conditional variance ofevents the dependent variable greatest importance for the study of rare (cf., Land - McCall 1996). CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2 (2014) 100 FERENC MOKSONY–RITA HEGEDŰS While transforming the dependent variable is a common practice that is widely adopted by researchers (e.g., Moksony 2001), another possibility is to modify the regression model, making it suitable for the analysis of Poisson variables. This approach, in some sense, is the reverse of the previous one: rather than tailoring the distribution of the dependent variable to the requirements of ordinary least squares, here we proceed the other way around and tailor the regression model to the distribution of the dependent variable. This “custom-made” version of regression analysis is known as Poisson regression. Poisson regression belongs to the family of generalized linear models (Hoffman 2004; Agresti 1996: Ch. 4). These models extend the scope of ordinary linear regression in two ways. First, they describe transformations of the conditional mean of the dependent variable, rather than the mean itself, as linear functions of explanatory variables; second, they allow the dependent variable to have conditional distributions other than the normal. Various forms of generalized linear models differ from each other in the particular type of transformation applied and the specific distribution assumed for the dependent variable. Table 1, adapted from Agresti (1996: 97), lists the most important of these models. Table 1 Generalized linear models Model Transformation applied to the mean Distribution of Dependent Variable Type of explanatory variables Numerical and Categorical Numerical and Categorical Poisson regression Logarithm Poisson Logistic regression Logit Binomial Linear regression Identity Normal Numerical Analysis of variance Identity Normal Categorical Analysis of covariance Identity Normal Numerical and Categorical Source: Agresti 1996: 97 (Table 4.5). In Poisson regression, as can be seen, logarithmic transformation is used and the dependent variable is taken to follow a Poisson distribution. In contrast, in logistic regression, a logit transformation is employed and the dependent variable is assumed to have a binomial distribution. The table also includes ordinary linear regression, analysis of variance and analysis of covariance, which together comprise what is known as the general linear model (Fennessey 1968; Cohen 1968). In these three types of models, the CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2ZZZZY (2014) THE US OF POISSON REGRESSION IN THE SOCIOLOGICAL STUDY OF SUICIDE 101 All these variants of the general linear model, then, are just special cases of the general normal distribution is assumed for the dependent variable and the identity 4 linear model.is transformation used; that is, the conditional is directly describedOF ITS THE POISSON REGRESSION MODELmean ANDitself INTERPRETATION as COEFFICIENTS a linear function of the explanatory variables. All these variants of the general linear model, then, are just special cases of the generalized linear 4 model. In the simple bivariate case, the Poisson regression model has the following form: ln MODEL (1) 0 1 X AND THE POISSON REGRESSION THE POISSON REGRESSION MODEL AND INTERPRETATION OF ITS INTERPRETATION OF ITS COEFFICIENTS COEFFICIENTS where denotes the mean or expected value of the dependent variable, X is the indepe In has the In the the simple simple bivariate bivariate case, case, the the Poisson Poisson regression regression model model has the following following form: form: variable, and 0 and 1 are the regression coefficients to be estimated. These coefficient ln 0 1 X (1) essentially the same meaning as in ordinary linear regression; 1 , in particular giv where l denotes the mean or expected value of the dependent variable, X is denotes thewhere independent variable, andorb0expected and b1 are the of regression coefficients to X beis the indepen the mean value the dependent variable, change in the natural logarithm of the mean frequency of the dependent variable per on estimated. These coefficients have essentially the same meaning as in ordinary linear regression; band gives thecoefficients change in the logarithm coefficients 1particular 1, in are the regression to benatural variable, change inand the 0explanatory variable. This interpretation isestimated. not veryThese user-friendly, how of the mean frequency of the dependent variable per one unit change in the explanatory variable. interpretation istonot very however; 1 , in particular researchers, after all,This do not usually think inuser-friendly, terms of logarithms. It is, therefore, gives essentially the same meaning as in like ordinary linear regression; researchers, after all, do not usually like to think in terms of logarithms. It is, therefore, useful to reformulate Eq. (1) by taking the of both to reformulate Eq. (1)logarithm by taking thethe antilogarithm of antilogarithm both sides: change in the natural of mean frequency of the dependent variable per one sides: change in the explanatory variable. This interpretation is not very user-friendly, howe exp( 0 1 X ) exp( 0 ) exp(1 X ) (2) researchers, after all, do not usually like to think in terms of logarithms. It is, therefore, us In In contrast totob1,1 which contrast , whichrepresented representedthe theadditive additiveeffect effectof of the the explanatory explanatory variable on t variable on the log of the mean frequency, exp( b ) represents its multiplicative to reformulate Eq. (1) by taking the antilogarithm of both sides: 1 on mean the mean frequency indicating how many times larger (orthe under studyeffect becomes as the independent variable increases by one unit. To see this, suppose X 1 ) represents its multiplicative effect on mean freq of the frequency, exp(itself, smaller) the mean frequency of the phenomenon under study becomes as the exp( 0 1by X ) exp( 0 ) exp(1 X ) (2) independent increases one unit. see like this, suppose X is equal to is equal to a, which canvariable be anyhow number. then To looks itself, indicating many Eq. times(2)larger (or smaller)this: the mean frequency of the phenom a, which can be any number. Eq. (2) then looks like this: In contrast to 1 , which represented the additive effect of the explanatory variable on the ( X a ) exp( 0 1a ) exp( 0 ) exp( 1a ), 4 Roughly speaking, while the general linear model extends the scope of ordinary regression by a where themean vertical bar refers condition its thatmultiplicative the explanatory variable ) represents effect on the mean freque of the frequency, exp(to1the categorical to be included the analysis, generalized linear models step where the takes vertical barindependent refers tovariables the condition explanatory takes go onone one on one particular value, namely, a. that Let inusthe now increase thevariable value of the itself, indicating how many times larger (or smaller) the mean frequency of the phenome particular value, namely, a. Let us now increase the value of the explanatory variable by one unit, from 4 R oughly speaking, while the general linear model extends the scope of ordinary regression by allowing categorical independent variables to be included in the analysis, generalized linear 4 a tomodels a+1. (2) takes this Roughly speaking, while general linear model extends scope of variable. ordinary regression by allo goEq. one step now further andthe also lift form: the constraints imposed on the dependent categorical independent variables to be included in the analysis, generalized linear models go one step fu OF SOCIOLOGY AND SOCIAL POLICY 2 (2014) ( X a 1) exp 0 CORVINUS 1(a 1JOURNAL ) exp( 0 ) exp( 1a) exp( 1 ). Taking the ratio of the two equations, we get: where thevalue, vertical bar refers thatvalue the explanatory variablevariable takes on one particular namely, a. Lettousthe nowcondition increase the of the explanatory by one Taking the ratio of the two equations, we get: particular value, namely, a.now Let us now increase of the explanatory variable FERENC MOKSONY–RITA HEGEDŰSby one 102 a unit, from to a+1. Eq. (2) takes this form: the value (Eq.X(2) now a 1takes ) exp( 0 ) ( 1a ) exp( 1 ) exp( ). unit, from a to a+1. this form: explanatory oneunit, a+1. (2)1now takes ( variable X a 1by ) exp from (a 1a) to exp( Eq. ) exp( a) exp( 11).this form: 0 1 ( X a ) exp( 0 ) ( 01a ) ( X a 1) exp 0 1(a 1) exp( 0 ) exp( 1a) exp( 1 ). Taking the ratio of the two equations, we get: We can Taking see thatthe as ratio the value explanatory variable, X , has increased from a to (a+1), of the of twothe equations, we get: Taking the ratio of the two equations, we get: ( X a 1) exp( ) ( a ) exp( ) ( X a ) exp( ) ( a ) ( X a 1) exp( 0 ) (01a ) 1exp( 1 ) exp( 1 ). a factor equal to exp( 1(). X a) exp( 0 ) ( 1a ) We can see that as the value of the explanatory variable, X , has from from a to (a+1), We can see that as the value of the explanatory variable, X,increased has increased 1 1variable, that is, by one unit, the mean frequency of0the dependent exp( 1). , has indeed changed by a tosee (a+1), is,value by one the mean frequency variable, We can that that as the of unit, the explanatory variable, Xof, the has dependent increased from a to (a+1), , has indeed that is, unit, changed the mean by frequency the dependent changed by l, by hasone indeed a factorofequal to exp(b variable, ). 1 It can be helpful to express this change in percentage form using the following formula: be helpful to frequency express this change in percentage usingchanged the , form that is, It by can one unit, the mean of the dependent variable, has indeed by a factor equal to formula: exp( 1 ). following a factor equal to exp( 1 ). percentage change 100 exp( 1 ) 1. It can be helpful to express this change in percentage form using the following formula: If, for example, the dependent variable is the number of suicides in a village, It can be explanatory helpful express thisvariable change inispercentage form the value following the variable is the unemployment rate, and the of b1formula: is, say, If, for example, thetodependent the number ofusing suicides in a village, the explanatory 100 exp( 1 ) 1.(b1)–1=100*(1.057– .055, then exp (b1) = percentage exp (.055)change = 1.057 and 100*(exp means each percentage pointthe rise in the level of unemployment variable 1)=5.7, is thewhich unemployment rate, and value percentage change 100 exp( ) 1.of 1 is, say, .055, then increases, on average, the number of suicides by 5.71 percent. In the same way, If, for example, the dependent variable is the number of suicides in a village, the explanatory if the independent variable is type of settlement, with villages coded 0 and exp( 1 ) exp(.055) 1.057, and 100 (exp( 1 ) 1 100 (1.057 1) 5.7 , which means each 1, and the value ofvariable b1 is, say, –.035, then (b1)=exp(–.35)=.70, and If, forcities example, dependent of exp suicides the .055, explanatory 1a village, variable is thetheunemployment rate,is the andnumber the value of in is, say, then 100*(exp (b1)–1)=100*(.70–1)= –30, which means the number of suicides is, percentage point rise in the level of unemployment increases, on average, the number of 5 on average, 30 percent less in cities than variable is the the value of 1) 1 5.is, say, means .055, each then (exp(and ) in 1 villages. 100 (1.057 7 , which exp( ) exp(. 055) unemployment 1.057, and 100rate, 1 1 100 unemployment (exp( ) 1 100 (1.057 1) 5.7 , which means each exp( 1 ) exp(. 055rise ) 1in .057 , and percentage point the level and also lift the constraints imposed on the of dependent1variable. increases, on average, the number of INCLUDING POPULATION AT RISK IN THE MODEL percentage point rise in the level of unemployment increases, on average, the number of and also lift theimportant constraints imposed thethe dependent variable. One featureonof Poisson regression model discussed thus far is that it does not take differences in the size of the population at risk into and also lift the constraints imposed on the dependent variable. account. This is clearly a limitation because, with other factors remaining 5 It should be noted, though, that with dichotomous independent variables like type of settlement (cities/villages), exp (b1) can only be interpreted directly as reflecting the impact of those variables when we use dummy coding; that is, when we assign the values 0 and 1 to the two categories. This is because in this case, the distance or difference between the categories exactly equals one unit. With effect coding, in contrast, when the values –1 and +1 are used to denote the two categories, the distance or difference is two rather than one unit and thus exp (b1)no longer directly indicates the influence of the explanatory variable under study. To get this influence, exp (b1) has to be raised to the second power, to accommodate the greater distance between categories. CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2ZZZZY (2014) THEin USthe OF POISSON IN THE STUDY OF for SUICIDE 103 sort of Differences size of REGRESSION the population at SOCIOLOGICAL risk can be controlled by applying some standardization, dividing mean frequency by theobviously population risk: Differences in the size of the the population risk can be controlled for by applying some sort of constant, a larger population atatrisk can beatexpected to produce a greater frequency of the phenomenon under study. More populous cities, standardization, dividing frequency by the population at risk:simply because for instance, willthe in mean general have higher numbers of suicides ln 0 1 X (3), factors, such as n of their dimensions, quite regardless of the impact of other population at risk can be controlled Differences in the size of the for by applying some poverty, unemployment, or residential segregation. ln 0 1 X (3), where n is Differences the population as the number at of risk individuals in a city or village. in at the of the population can be living controlled for by n size risk, such standardization, dividing theofmean frequencydividing by the the population at risk:by the applying some sort standardization, mean frequency This of the Poisson modelofdescribes theliving rate ofinoccurrence of the population at risk: wheremodified n is the version population at risk, suchregression as the number individuals a city or village. than its absolute frequency, as a linear function of X, the phenomenon study, lnrather 1 X model describes the rate (3), This modifiedunder version of the of occurrence of the Poisson 0regression n independent variable. making use of absolute the properties of logarithms, (3)living canofalso be phenomenon under rather than its asofa individuals linearEq. function X, the where n is study, theBy population at risk, such as frequency, the number in a population city or village. of the Poisson regressionliving modelin a city or where n is the at This risk,modified such asversion the number of individuals written asdescribes independent variable. By making use ofofthe of logarithms, (3) than can also be the rate of occurrence theproperties phenomenon under study,Eq. rather its absolute frequency, as a linear function of X, the independent variable. Thiswritten modified version of the Poisson regression model describes the rateByof occurrenc asmaking ) ln( nof) logarithms, 0 1 X Eq. (3) can also (4), be written as use of theln( properties phenomenon under study,ln(rather absolute frequency, as a linear function of ) ln(than n ) its (4), 0 1 X which, in turn, can be rearranged to get: in turn,By can making be rearranged independentwhich, variable. use toofget: the properties of logarithms, Eq. (3) can which, in turn, can be rearranged to get: written as ln( ) ln( n ) 0 1 X (5). ln(from ) ln( ) given This model differs thenone it contains a separate 0 by 1 X Eq. (1) in that (5). This model differs from the one given by Eq. (1) in that it contains a separate explanatory variable, ln(n), generally called an offset, which reflects explanatory the ln( ) ln( n ) 0 1 X (4), size of the population at risk, the coefficient of which is automatically taken variable, ln(n), generally angiven offset, reflects the size of thein atofrisk, the This model differs the oneof bywhich Eq. (1) that contains apopulation separate to equal 1.from If thecalled size the population atinrisk isitthe same each unitexplanatory observation, then ln(n) will be constant and thus can be incorporated into b0, coefficient ofcan which automatically taken to equal 1. Ifthe the sizeofofthe the population at risk is which, in turn, beis rearranged to get:which variable, ln(n), generally called an offset, reflects which takes us back to the original formulation; thatsize is, Eq. (1).population at risk, the the same inof each unitisofautomatically observation, then will be constant can be incorporated coefficient which takenln(n) to equal 1. If the sizeand of thus the population at risk is ln(to)the ln( n ) ln(n) 0 will 1beXconstant 0 , which the same in each unitus ofback observation, then and(1). thus (5). can be incorporated into takes original formulation; that is, Eq. OVERDISPERSION Besides including a separate independent variable to capture differences in us population back to the at original formulation; thatofis,Poisson Eq. (1).regression also into 0 , which the sizetakes of the risk, the initial model This modeloften differs one given by Eq. (1)One in important that it contains a separate expl needsfrom to bethe modified for another reason. characteristics variable, of the Poisson distribution, as already noted, is that the mean is equal to thegenerally variance. called The default model which of Poisson regression is built on population this ln(n), an offset, reflects the size of the identity; computations are performed assuming the mean and the variance of the dependent variable really are the same. In many cases, however, this at r coefficient of which is automatically taken to equal 1. If the size of the population a the same in each unit of observation, then ln(n) will be constant and thus can be incor CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2 (2014) 104 FERENC MOKSONY–RITA HEGEDŰS assumption does not hold true and the variance exceeds the mean. This is what is known in the statistical literature as overdispersion (Agresti 1996: 92–3; Le 1998: 226–228). Overdispersion generally arises from one of two sources (King 1989: 766–769; Osgood 2000: 28). First, in almost all research, some explanatory variables uncorrelated with the ones included in the analysis are left out from the model, either because they do not come to mind, or because we cannot capture them empirically. Suppose, for example, the frequency of suicide for cities or villages is influenced by economic development and regional culture, but we only include the former in the study. What is likely to happen in this situation? Holding the level of economic development constant, areas belonging to that level will be geographically – and culturally – heterogeneous; they will be a mixture of sub-populations, with each subpopulation having its own distinct, regionally-determined mean frequency of suicide. That is, the mean of the dependent variable for a certain value of explanatory variable will not be constant, as is assumed by the Poisson distribution, but we will instead have as many different means as there are regions. The result of this heterogeneity is that the variance of the conditional distribution of suicide – that is, of the distribution pertaining to a given level of economic development – will be greater than expected based on the Poisson distribution. The source of overdispersion, then, in this case is the excess variation caused by explanatory variables left out from the model; quite similarly to ordinary linear regression, where the impact of the omission of independent variables uncorrelated with those included in the analysis is to increase the error or residual variance. Another factor that may give rise to overdispersion is dependence among observations. One assumption of Poisson distribution is that observations are independent of each other; that the occurrence of an event (e.g., a suicide) does not influence the occurrence of another. What happens when this requirement is not met; when, for example, the incidence of a suicide increases the chance of another? In this situation, the frequency of small and large values – that is, those at the two extreme tails – will be greater than expected based on the Poisson distribution and the variance of the variable will increase as a result. If, for instance, kids in a school imitate each other’s behavior, then instead of observing a random distribution of cases, what we will likely see is that while nothing happens for months, a substantial number of suicide attempts take place within a very short time frame. The literature is replete with examples of such time-space clusterings of cases (e.g., Phillips 1974; Phillips and Carstensen 1986; Gould et al. 1990). From our point of view, the important thing about these examples is that the process of imitation and the CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2ZZZZY (2014) standard errors results. using aOne correction the degree overdispersion. This e overly optimistic way tofactor cope that with reflects this problem is to ofadjust the THE US OF POISSON REGRESSION IN THE SOCIOLOGICAL STUDY OF SUICIDE 105 correction performedfactor based that on an analysis of the residuals from the This original Poisson d errors using ais correction reflects the degree of overdispersion. underestimated and thus confidence interva resulting dependence of observations maysum leadoftosquared an increase of the variance and involves forming the of ratio of residuals the residuals to their on isregression performed based on an analysis the from the standardized original Poisson of the dependent variable, thus violating a fundamental assumption of Poisson will give overly optimistic results. One w that also the mean is dispersion equal to theparameter, variance. turns out to be considerably degrees ofdistribution, freedom. Ifnamely this ratio, called on and involves forming the ratio of the sum of squared residualsThe to their How does overdispersion affect Poissonstandardized regression results? most standard errors using a correction factor t 6 serious consequence is that although regression coefficients remain unbiased, , out thisto is be a sign of overdispersion and greater than then, also provided the model isparameter, well-specified of freedom. Iftheir this1,standard ratio, called dispersion considerably errors will be underestimated andturns thus confidence intervals will correction is performed based on an analy be unduly narrow and significance tests will give overly optimistic results. 6 standard errors can be adjusted by multiplying them with the square root of the dispersion is a sign of overdispersion and han 1, then, provided is well-specified One waythe to model cope with this problem is, this to adjust the standard errors using a regression and involves forming correction factor that reflects the degree of overdispersion. This correction the ratio of t parameter 1996: 93; Allisonthem 2001: 223;the Gelman Hill 114-116). d errors can beis(Agresti adjusted multiplying squareand root of2007: the dispersion performedbybased on an analysis with of the residuals from the original Poisson degrees of freedom. If this ratio, also called d regression and involves forming the ratio of the sum of squared standardized er (Agresti 1996: 93; Allison 2001: 223;ofGelman andIfHill 2007: 114-116). residuals to their degrees freedom. this ratio, called ofdispersion Another way, besides correcting standard errors, to cure thealso problem overdispersion greater than then,provided provided the modelisistow parameter, turns out to be considerably greater than 1,1,then, the is well-specified , this is aexcess sign ofvariability overdispersion and standard errors regression. to model acorrecting model that incorporates not captured Poisson way,switch besides standard errors,the to them cure the of overdispersion is to by multiply standard errors canthebeby adjusted can be adjusted by multiplying withproblem the square root of dispersion 6 parameter (Agresti 1996: 93; Allison 2001: 223; Gelman and Hill 2007: 114- In thisthat newincorporates model, the conditional mean of the variable is1996: no longer a constant, o a model the excess variability notdependent captured Poisson regression. parameterby (Agresti 93; Allison 2001:as2 116). Another way, besides correcting standard errors, to cure the problem of in the the original formulation, butthe a variable that scattersisrandomly around someascentral value, ew model, conditional mean of dependent variable no longer a constant, overdispersion is to switch to a model that incorporates the excess variability Another besides correcting not captured by Poisson regression. In this new model,way, the conditional mean of standard er due to the effectbut of aomitted explanatory variables. What we do, basically, is to add a random riginal formulation, variable central value, the dependent variablethat is noscatters longer arandomly constant, around as in thesome original formulation, switch a model thatdue incorporates the exces but a variable that scatters randomly around sometocentral value, to the component to Eq. he effect of omitted explanatory variables. What we do, basically, is to add is a random effect of (2): omitted explanatory variables. What we do, basically, to add a In this new model, the conditional mean of t random component to Eq. (2): ent to Eq. (2): ~ exp( 0 1 X ein) the original (6) formulation, but a variable th ~ due the effect of and omitted varia a random Eq. (2) e is explanatory the is exp( 0 variable ~ where l (6)ltofrom 1 X e) that replaces where is a random that term replaces from Eq. the (2)additional and e is the newly added random newly added variable random error thatrepresents heterogeneity component to Eq. introduced by causal factors not explicitly considered in(2): the analysis. error term that represents the additional heterogeneity introduced byrandom causal The main difference between Eqs. (2) and (6) is that while in Eq. (2) for factors not is a random variable that replaces from Eq. (2) and e is the newly added each level of the independent variable we had a single mean value of the ~ (6) is that explicitly considered in the(lanalysis. Thewemain difference Eqs. (2)not and dependent ), heterogeneity in Eq. (6), have a whole distribution means (l ), exp( 0 1 rm that represents the variable additional introduced bybetween causal of factors corresponding to the fact that now the omitted variables subsumed in the error the individual means randomly deviate Eqs. from (2) the level determined y considered term in themake analysis. The main difference between and (6) is that ~ is a random variable that replaces where 6 This qualification is important because a high value for the ratio of the sum of squared standardized residuals to their degrees freedom may also indicate fit or rather than overdispersion, such as 6 Thisof qualification is important becauselack a highof value formisspecification, the ratio of the sum of squared standardized error term that represents the additional h residuals to their degrees of freedom may also indicate lack of fit orstandardized misspecification, rather alification is important because a high value for the ratio of the sum of squared residuals than overdispersion, such as when a linear model is used to describe a curvilinear relationship egrees of freedom (Bair, may also indicate lack of fit or misspecification, rather thanconsidered overdispersion, such analysis. as 2013). explicitly in the The m CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2 (2014) 6 This qualification is important because a high value to their degrees of freedom may also indicate lack of degrees of freedom. If parameter this ratio, also called dispersion parameter, turnsGelman out to beand con (Agresti 1996: 93; Allison 2001: 223; 6 MOKSONY–RITA , thisHEGEDŰS is a sign of overdispe than 1, then, provided the model isFERENC well-specified Another way, besides correcting standard errors, to cure the errors part can of be the adjusted multiplying them with the square by standard the systematic model by (Long 1997: 230–231; Gardner et al. root of the switch to a model that incorporates the excess variability not 1995: 399–400). parameter (Agrestiof1996: 93; Allison 2001: 223;goes Gelman andthe Hillinitial 2007: 114-116). This modification Poisson regression, then, beyond In of this model, the mean of the dependent var model by changing the mean thenew distribution fromconditional a constant to a variable. 106greater In the original formulation, the dependent variable (Y) was a random Another besides correcting errors,a to the problem of overdispe in the butcure a variable that scatters rando variable thatway, scattered around theoriginal meanstandard (l)formulation, following Poisson distribution. The mean itself, however, was a constant – its value was unequivocally switch toby a model thatof incorporates the of excess variability notmodel, captured Poisson determined the value thetoexplanatory variable. Inexplanatory the new Y byWhat due the effect omitted variables. we rd continues to be a random variable scattering around a mean level following able that fluctuates around a central value following Gamma-distribution. result In this new model, the conditional of the dependent variable is noThe longer a co a Poisson distribution; butcomponent now, this mean isa also a random variable tomean Eq.level (2): that fluctuates, following a Gamma-distribution, around some central value, iable that fluctuates around a central value following a Gamma-distribution. The resul intothe formulation, a expected variable thatbinomial scatters randomly someiscen theoriginal random error, e. Sincebut value of the error is zeroaround by which combinationdue of these two distribution isthe the negative distribution, w ~ assumption, the central value that the individual means ( l) scatter around is exp( 0 1 X e) combination of these distribution is themodel negative binomial distribution, which is w identical thetwo mean from the original given by Eq. (2); we that is, basically, due totothe effect of omitted explanatory variables. What do, is to add modified form of Poisson regression ~described here is called negative binom ~ (l ) =l . E component to Eq. (2):regression modified form of Poisson here that is called random variable replaces negative from Eq. binom (2) an where is adescribed ression. In final analysis, then, in the modified form of Poisson regression we have a mixture of two different distributions: first, have thethe dependent variable error term that we represents additional heterogeneity int ~ (Y) that scatters around the mean (l)following a Poisson distribution; second, exp( 0 1 X e) (6) have the mean that isregression itself a random around a difference advantage we of negative binomial overvariable conventional Poisson regression is thab explicitly considered in that the fluctuates analysis. The main central value following a Gamma-distribution. The result of the combination ~ e advantageofofwhere negative conventional Poisson regression isadde th these two isregression the negative over binomial distribution, which ise why isdistribution abinomial random fromallows Eq. (2) andformer is thetonewly s not require the variance to be variable equal tothatthereplaces mean and the exceed t the modified form of Poisson regression described here is called negative 6 This qualification is important for the of the binomial regression. es not require the variance be equal the mean andbecause allowsa high the value former exceed error term that to represents thetoadditional heterogeneity introduced bytoratio causal fa er. As against The the advantage original form of toPoisson regression, where of negative binomial regression over conventional Poisson their degrees of freedom may also indicate lack of fit or misspecificati regression is that itform does not require the variance to be equal to the between mean andEqs. (2) and ( explicitly considered the analysis. The main difference er. As against the original ofinPoisson regression, where allows the former to exceed the latter. As against the original form of Poisson E(Y ) var(Y ) , regression, where gression. E(Y ) var(Y ) , 6 This qualification is important because a high value for the ratio of the sum of squared standardiz egative binomial regression, to their degrees of freedom may also indicate lack of fit or misspecification, rather than overdispers negative binomial regression, in negative binomial regression, E(Y ) E(Y ) and and var(Y ) (1 ) 2 , var(Y ) (1 ) 2 , and ere α is the dispersion that indicates the degree of of overdispersion. where α is theparameter dispersion parameter that indicates the degree overdispersion. In the spec the special case when there no overdispersion, α = 0of andoverdispersion. var (Y) = l+a In the spe ere α is the Indispersion parameter thatis indicates the degree e when therel2 is = 0 and var(Y) to Poisson 2 regression. and negative binom = lno andoverdispersion, negative binomial αregression simplifies e when there is no overdispersion, α = 0 and var(Y) 2 and negative binom ession simplifies to Poisson regression. CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2ZZZZY (2014) ression simplifies to Poisson regression. THE US OF POISSON REGRESSION IN THE SOCIOLOGICAL STUDY OF SUICIDE 107 AN EXAMPLE: LOCAL AREA DEPRIVATION AND SUICIDE IN RURAL HUNGARY To illustrate the use of the methods discussed in the previous sections of this paper, we now present results from one of our studies that looked at the impact of deprivation on suicide in rural Hungary. Given the main character of our article, in what follows, we focus on methodological issues at the expense of substantive details. The analysis spanned the years from 1990 to 1995 and covered areas officially qualified as villages throughout the whole period. The number of villages meeting this criterion was 2869 and the number of suicides committed in those villages was 9237. Statistical data analysis proceeded in two steps. We first employed principal component analysis to create a composite measure of deprivation, then we used that measure as the main explanatory variable in a Poisson regression in which the number of suicides was the dependent variable and the size of the population at risk was entered as an offset. The following indicators were included in the principal component analysis: – Number of cars per 1,000 inhabitants, 1992-1995 – Power consumption per household, 1993-1995 – Quality of educational infrastructure and health services, 19937 – Unemployment rate, 1993-1994 – Percentage of the population living on welfare, 1993-1995 – Percentage of the population receiving income supplement, 1993-1995 – Net migration, 1990-1995 – Dependency rate8 Figure 2 shows the principal component loadings for these indicators. Variables capturing economic development, such as power consumption and the number of cars, all have negative loadings, while those reflecting the lack thereof, such as unemployment and percentage living on welfare, all have positive loadings, lending some face validity to the principal component as a summary measure of deprivation. 7 This variable measures the number of educational and health institutions such as schools and hospitals. 8 This is the ratio of individuals below 18 and above 59 to those between 18 and 59. CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2 (2014) 108 FERENC MOKSONY–RITA HEGEDŰS Figure 2. Graphic display of principal component loadings on welfare, all have positive loadings, lending some face validity to the principal component as a summary measure of deprivation. [FIGURE 2 GOES HERE] Having constructed our composite index of deprivation, in the second phase of the analysis, Having constructed our composite index of deprivation, in the second phase Figure 2. Graphic display ofmodel: principal component loadings rananalysis, Poisson regression, the following ofwethe we ranusing Poisson regression, using the following model: ln( i ) ln(ni ) 0 1 X i , isthe theaverage average number ofinsuicide in village during where where l ii is number of suicide village i during 1990 toi 1995; ni is1990 the sizetoof1995; ni is the size of the population for the same village in the same period; and for thecomponent same village in the same i is the principal component Xithe is population the principal score for period; villageandi. XWe employed the statistical software Stata i.8.0 estimate regression b0 and b1. score for village Weto employed the the statistical software coefficients, Stata 8.0 to estimate the regression Table 2 displays the results obtained from Poisson regression. The coefficients, 0 and 1 . coefficient for the principal component score is, as can be seen, positive and statistically significant, with a one unit increase in our summary measure of Table 2 displays the associated results obtained fromanPoisson regression. The coefficient deprivation being with increase of about .115 in for the the natural logarithm of the number ofcan suicides. To getand ridstatistically of logarithms, wewith exponentiate principal component score is, as be seen, positive significant, a one the coefficient and get exp(.1148) = 1.122, which means that a one unit rise in unit increase in our summary measure of deprivation being associated with an increase of the principal component score raises the number of suicides by 12.2 percent. All in all, findings suggest local deprivation thewerisk of about .115 these in the natural logarithm of thethat number of suicides. To get aggravates rid of logarithms, self-destruction. exponentiate the coefficient and get exp(.1148) = 1.122, which means that a one unit rise in the principal component score raises the number of suicides by 12.2 percent. All in all, these findings suggest that local deprivation aggravates the risk of self-destruction. CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2ZZZZY (2014) [TABLE 2 GOES HERE] 109 THE US OF POISSON REGRESSION IN THE SOCIOLOGICAL STUDY OF SUICIDE Table 2. Effect of deprivation on suicide. Poisson regression results Variable Principal component score Constant n = 2869 Coefficient Std. error .1148* -7.7462 Z-value .0114 10.11 .0110 -706.17 Antilog of coefficient 1.122 * p < .001 Checking for overdispersion As already noted, overdispersion, while not introducing bias into the regression coefficients, leads to an underestimation of standard errors, thereby potentially undermining the validity of confidence intervals and significance tests. It is, therefore, important, before going too far with our conclusions, to check for the presence of overdispersion and adjust the analysis accordingly, if necessary. For this purpose, we compared the observed distribution of suicides with the one predicted from the Poisson model. The latter gives, for different frequencies of occurrence, the probability that we would expect if the mean number of suicides was the same as the one actually obtained (3.22) and the assumptions underlying the Poisson model were fully met. Figure 3 displays the results. As can be seen, at the two tails of the distribution, the solid line runs above the dotted one, indicating that at the lowest and highest number of suicides, observed proportions exceed Poisson probabilities. In the middle portion of graph, in contrast, the solid line consistently stays below the dotted one, implying that over this range, observed proportions are lower than those predicted from the model. Observed values, then, appear to be spread out more widely than those calculated based on the Poisson distribution. Corresponding to this finding, we found the variance of the observed distribution to be much larger (22.72) than its mean (3.22) and the ratio of the sum of squared standardized residuals to their degrees of freedom also proved to be higher than 1 (1.42), which, as already noted, is a sign of overdispersion, provided the model is well-specified. All in all, these results speak for the fact that overdispersion presents a real threat to the validity of conclusions drawn from our study. CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2 (2014) 110 FERENC MOKSONY–RITA HEGEDŰS Figure 3. Observed and expected probabilities (based on Poisson distribution, mean = 3.22) Correcting for overdispersion Figure 3. Observed and expected probabilities (based on Poisson distribution, mean = 3.22) Earlier in this paper, we described two ways to handle the problem of overdispersion: adjusting standard errors and switching from Poisson to negative binomial regression. As for the former, original standard errors are, as previously explained, multiplied by the square root of the dispersion parameter, which is the ratio of the sum of squared standardized residuals to their degrees of freedom. Table 4 reports these corrected standard errors, along with the original ones, so we can better judge the change that results from the adjustment.9 9 Stata reports two different corrected standard errors: one is based on the traditional Pearson Chi-square, which is equivalent to the ratio of the sum of squared standardized residuals to their degrees of freedom, while the other is calculated using the Likelihood-ratio Chi-square. Although the two usually give similar results and thus the choice between them does not generally make much difference, statistical theory suggests that we prefer Pearson Chi-square to Likelihood-ratio Chi-square (Allison 2001: 223). CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2ZZZZY (2014) 111 THE US OF POISSON REGRESSION IN THE SOCIOLOGICAL STUDY OF SUICIDE Table 3. Observed and expected distribution of suicide 0 .273 Expected probability (based on Poisson distribution) .040 1 .203 .129 2 .138 .207 3 .099 .222 4 .070 .179 5 .045 .115 6 .029 .062 7 .024 .028 8 .021 .011 9 .021 .004 10 .014 .001 Number of suicide Observed relative frequency Table 4. Poisson regression: corrected standard errors Variable Principal component score Constant Coefficient .1148 -7.7462 Original standard error .0114 (10.11) .0110 (-706.17) Corrected standard error based on Pearson Chi-square Likelihood-ratio Chi-square .0140 (8.50) .0130 (-593.65) .0135 (8.20) .0140 (-573.22) Note: Numbers in parentheses below standard errors are z-values As can be seen, although the standard errors have increased somewhat after correcting them for overdispersion and the z-values have correspondingly declined slightly (since the coefficients themselves have remained unchanged), the effect of deprivation, as captured by the principal component score, continues to be highly statistically significant: the z-value is 6.74 and the associated p-value is well below the usual 5% threshold. Besides adjusting the standard errors, we also ran negative binomial regression, the results of which are displayed in Table 5. The coefficient for the principal component score is similar to that obtained from Poisson regression, both testifying to the harmful effect of deprivation on suicide. The antilogarithm of the coefficient equals 1.109, meaning that a one unit increase in the principal component score increases the mean number of suicides by about 11 percent. This effect of deprivation is statistically significant, as is the estimate of the dispersion parameter, alpha, which provides further evidence that overdispersion presents a problem in our study. CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2 (2014) 112 FERENC MOKSONY–RITA HEGEDŰS Table 5. The impact of deprivation on suicide. Results from negative binomial regression Coefficient Std. error Z-value Antilog of coefficient Principal component score .1034* .0153 6.74 1.109 Constant –7.7816 .0144 Alpha .1462* .0130 Variable n = 2869 * p < .001 Although the two regressions did not produce markedly different results, the negative binomial regression appears to fit the data better than the Poisson regression. This is evident from Figure 4, where the actual distribution of suicides is shown along with the distributions predicted from Poisson and negative binomial regressions. As can be seen, while Poisson regression systematically underestimates very low and very high frequencies, the negative binomial regression gives predicted values that are fairly close to the observed ones even at the two tails. Figure 4. Observed and expected number of suicides, based on Poisson and negative binomial regression (mean = 3.22, overdispersion = 1.402) Figure 4. Observed and expected number of suicides, based on Poisson and CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2ZZZZY (2014) negative binomial regression (mean = 3.22, overdispersion = 1.402) THE US OF POISSON REGRESSION IN THE SOCIOLOGICAL STUDY OF SUICIDE 113 SUMMARY AND CONCLUSIONS Our aim with this paper was to give an introduction to Poisson regression, which represents an important class of generalized linear models and can profitably be used in studies in which the dependent variable describes the number of occurrences of some rare event such as suicide. Although researchers sometimes apply ordinary linear regression in these situations, this method, as we have shown, is not generally appropriate for variables of this sort. And while transformation of the dependent variable may help alleviate part of the difficulties associated with conventional regression techniques, Poisson regression approaches the problem more fittingly by tailoring the model to the distribution of the dependent variable, rather than the other way around. REFERENCES Agresti, Alan (1996), An introduction to categorical data analysis. New York, Wiley. Agresti, Alan (2002), Categorical data analysis. 2nd edition. New York, Wiley. Allison, Paul (2001), Logistic regression using the SAS system. Theory and application. Cary, NC, SAS Institute Bair, H. (2013), “Poisson Regression: Lack of Fit ≠ Overdispersion”, StatNews #86, Cornell University, http,//www.cscu.cornell.edu/news/statnews/stnews86.pdf. Accessed July 22, 2014 Chatterjee, Samprit – Bertram Price (1977), Regression analysis by example. New York, Wiley. Cohen, Jacob (1968), “Multiple regression as a general data-analytic system”. Psychological Bulletin, 70, 426–443. Fennessey, James (1968), “The general linear model, a new perspective on some familiar topics”. American Journal of Sociology, 74, 1–27. Gardner, William et al. (1995), “Regression analyses of counts and rates, Poisson, overdispersed Poisson and Negative Binomial models”, Psychological Bulletin, 118, 392–404. Gelman, Andrew – Jennifer Hill (2007), Data analysis using regression and multilevel/ hierarchical models, Cambridge University Press, New York Gould, Madelyn et al. (1990), “Time-space clustering of teenage suicide”, American Journal of Epidemiology, 131, 71–78. Hoffman, John (2004), Generalized linear models, Boston, Pearson Education Inc. King, Gary (1989), “Variance specification in event count models, from restrictive assumptions to a generalized estimator”, American Journal of Political Science, 33, 762–784. Land, Kenneth – Patricia McCall (1996), “A comparison of Poisson, negative binomial, and semiparametric mixed Poisson regression models”, Sociological Methods & Research, 24, 387–443. CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2 (2014) 114 FERENC MOKSONY–RITA HEGEDŰS Le, Chap (1998), Applied categorical data analysis, New York, Wiley. Long, J. Scott (1997), Regression models for categorical and limited dependent variables, Thousand Oaks, Sage. Moksony, Ferenc (2001), “Victims of change or victims of backwardness? Suicide in rural Hungary”, in: Lengyel, Gy. - Rostoványi, Zs., ed., The small transformation. Society, economy and politics in Hungary and the new European architecture, Budapest, Akadémiai Kiadó, 366-376. Osgood, D. Wayne (2000), “Poisson-Based Regression Analysis of Aggregate Crime Rates”, Journal of Quantitative Criminology, 16, 21–43. Phillips, David (1974), “The Influence of Suggestion on Suicide, Substantive and Theoretical Implications of the Werther Effect”, American Sociological Review, 39, 340–354. Phillips, David – Lundie Carstensen (1986), Clustering of teenage suicides after television news stories about suicide, New England Journal of Medicine, 315, 685–89. Sváb János (1981), Biometric methods in research [Biometriai módszerek a kutatásban], Budapest, Mezőgazdasági Kiadó. CORVINUS JOURNAL OF SOCIOLOGY AND SOCIAL POLICY 2ZZZZY (2014)
© Copyright 2025 Paperzz