Exploring Cross-Correlation Between Traffic Citations and Traffic Accidents in the City of Phoenix
A Study by Seth Tribble
Introduction
This study examines monthly counts of traffic citations issued and traffic accidents
reported in Phoenix, Arizona, over the fifteen-year period 1983-1997. The first question
is how the number of citations affects the number of accidents. The goal of issuing
traffic citations is presumably to bring about a safer driving environment, so this project
gauges whether the data show statistically significant evidence of that effect. Some may
argue that traffic citations are not a significant safety measure because speeding, the
offense most commonly cited, is only rarely the culprit in traffic accidents. The second
question is how the number of accidents affects the number of citations. Here the
hypothesis would appear to be less debatable, as most anyone would suppose that a high
number of traffic accidents induces an increased number of traffic citations. As will be
seen, the nature of this correlation is a bit surprising.
Methods
The monthly records of accidents and citations are available at the Internet site of the
Phoenix police department at http://www.ci.phoenix.az/POLICE. This study incorporates
only the data for the period from 1983 through 1997. Using the statistical software
package S-Plus, we attempt to fit auto-regressive models to these series. As will be seen
later, the series require some detrending before their auto-regressive behavior can be
examined. This is to be expected given external factors tied to the seasonal cycle.
Several options for detrending and modeling are presented in this report.
Preliminary Steps: The Issue of Detrending
The first figure shows a plot of the two original series. Note that the seasonal
structure in the series of accidents appears much stronger than in the series of citations.
In order to fit ARIMA models to the data, it is important that the mean of each
series be removed. In the series of citations, subtracting the series mean appears to be
sufficient, as seasonal structure is nearly nonexistent. This is confirmed by examining
the plot of the spectral density of the series as well as the auto-correlation function.
The monthly means of the series of citations are quite similar to each other and do not
have a clear correlation with the monthly means of the series of accidents.
The series of accidents must undergo a detrending process that deals with the
strong seasonal structure. The possibilities include the incorporation of a seasonal AR
model in the examination of the auto-regressive behavior of the series with itself, the use
of additional (monthly) regression variables in the estimate of ARIMA models, and a
manual removal of the monthly means. The use of a simple discrete Fourier transform
will clearly be inadequate, as the seasonal variation is not sinusoidal. Using the “sabl”
command in S-Plus, we can see an estimate of the seasonal component of the series of
accidents:
No simple combination of Fourier pairs can be used as regression variables to
accommodate this skewed seasonal structure. This irregularity may indicate that the
removal of monthly means is the most appropriate method in detrending the data.
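The monthly-mean removal described above can be sketched in a few lines. The snippet is written for R, the open-source dialect of the S language, rather than S-Plus itself; the series `acc` is simulated purely for illustration and is not the Phoenix data, and the names `month`, `pattern`, `acc`, and `acc.mm` are hypothetical.

```r
# Illustrative only: a simulated 180-month series, NOT the Phoenix data.
set.seed(1)
month <- rep(1:12, times = 15)        # calendar month of each observation
pattern <- c(5, 3, 0, -2, -4, -6, -5, -1, 2, 4, 8, 12)  # made-up, non-sinusoidal shape
acc <- 1500 + 10 * pattern[month] + rnorm(180, sd = 20)

# ave() replaces each value by the mean of its group (here, its calendar month),
# so subtracting it removes all twelve monthly means in one step.
acc.mm <- acc - ave(acc, month)

# After the adjustment, each calendar month averages to zero (up to rounding).
month.means <- tapply(acc.mm, month, mean)
```

This should give the same adjusted series as an explicit double loop over years and months, and it makes no assumption about the shape of the seasonal pattern.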
Included in the attached S-Plus script is a comparison of the ARIMA model
estimates of an AR(3) model on the accident series with monthly regression variables
and on the accident series with monthly means removed. Also included is the ARIMA
estimate of a seasonal model combining an AR(2) term with an AR(1) term of period 12.
The p-values for the Ljung-Box chi-squared statistic show that the seasonal AR model may
not be reliable. Here it is shown in comparison to the same statistic for the estimate
with monthly means removed. The seasonal model is shown first.
Neither is a desirable result, but of the two models, the one with monthly means
subtracted appears more suitable. The model with monthly regression variables produces
similar results to the manual removal of monthly means, so we will choose the manual
removal of the monthly means as the simplest and most effective detrending option.
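For readers unfamiliar with the Ljung-Box statistic used in the comparison above, the following sketch computes it by hand and checks it against a built-in test. It assumes R rather than S-Plus (`arima` and `Box.test` are R functions, not the `arima.mle` and `arima.diag` calls of the study), and the data are simulated, not the accident series.

```r
# Illustrative only: simulated AR(2) data stand in for a detrended series.
set.seed(2)
x <- arima.sim(model = list(ar = c(0.5, 0.2)), n = 180)

# Fit an AR(3) with the mean fixed at zero and keep the residuals.
fit <- arima(x, order = c(3, 0, 0), include.mean = FALSE)
res <- residuals(fit)

# Ljung-Box statistic over the first h residual autocorrelations:
#   Q = n (n + 2) * sum_{k = 1..h} r_k^2 / (n - k)
h <- 12
n <- length(res)
r <- acf(res, lag.max = h, plot = FALSE)$acf[-1]   # drop lag 0
Q <- n * (n + 2) * sum(r^2 / (n - 1:h))

# Refer Q to a chi-squared law with h minus the number of fitted AR terms d.f.
p <- 1 - pchisq(Q, df = h - 3)

# The built-in test should agree with the hand computation.
bt <- Box.test(res, lag = h, type = "Ljung-Box", fitdf = 3)
```

A small p-value indicates that the residuals still carry autocorrelation, which is the sense in which a fitted model "may not be reliable". The choice of `h = 12` here is an assumption made for the example.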
Modeling the Series
After detrending each series as indicated above, we can take a look at several
preliminary plots. It seems likely that an AR model would fit each univariate series
moderately well; we can confirm such behavior by looking at the auto-correlation
function and partial auto-correlation function for the bivariate series. For the following
overlapping plot of the two series, the series of citations centered on the series mean was
divided by the ratio of the standard deviations of the two series, so that both are on a
comparable scale.
We note that the series of citations appears to follow the series of accidents with some
delay. At certain peaks we can see what appears to be a high correlation between
citations and approximately the 12th lag of accidents. It is difficult to determine whether
this behavior continues in other areas of the series based on this graph alone. The effect
of citations on accidents is quite unclear from the graph. Now we will take a look at the
ACF and PACF of the series.
As expected, we note the high ACF values for the 10th-20th lags of the accidents with the
citations (shown in the second graph in the first row). The PACF also shows significant
activity on lags of accidents with citations (fourth graph in first row). Interestingly the
PACF on lags of citations with accidents (third graph in second row) is virtually zero
everywhere. To isolate the cross-correlation effect, however, we must “pre-whiten” both
series by modeling their dependence structure as separate univariate series and keeping
the residuals. As we note from the ACF, both series exhibit strong auto-regressive
behavior. The PACF tells us that an AR(4) should be adequate to model both series, as
the PACF is virtually zero for lags of a higher order.
The attached script file includes a pre-whitening process where an AR model of
maximum order 4 is applied to each series separately. The cross-correlation effect will
be more evident in the ACF and PACF of the pre-whitened series.
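The pre-whitening step can be sketched as follows. This is a minimal illustration in R, not the study's S-Plus script: the two series are simulated so that one depends on the other with a known delay, the names `x`, `y`, `fit.x`, `fit.y`, `res`, and `cc` are hypothetical, and `ccf` is R's cross-correlation function.

```r
# Illustrative only: two simulated series, NOT the Phoenix data.
# Both are autocorrelated, and y[t] depends on x[t - 3].
set.seed(3)
n <- 180
x0 <- as.numeric(arima.sim(model = list(ar = 0.7), n = n + 3))
y <- 0.8 * x0[1:n] + as.numeric(arima.sim(model = list(ar = 0.5), n = n))
x <- x0[4:(n + 3)]

# Fit an AR model of order at most 4 to each series separately (order by AIC).
fit.x <- ar(x, aic = TRUE, order.max = 4, method = "burg")
fit.y <- ar(y, aic = TRUE, order.max = 4, method = "burg")

# The leading residuals are NA, so keep only rows where both are present.
res <- cbind(x = fit.x$resid, y = fit.y$resid)
res <- res[complete.cases(res), ]

# Cross-correlation of the whitened series; ccf(a, b) at lag k estimates
# the correlation between a[t + k] and b[t].
cc <- ccf(res[, "x"], res[, "y"], lag.max = 20, plot = FALSE)
```

With data built this way, the largest cross-correlation would be expected near lag -3, since `x` leads `y` by three steps. Removing each series' own autocorrelation first keeps that structure from masquerading as cross-correlation.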
The ACF and PACF of the citations show that perhaps there is, in fact, a small seasonal
variation that an AR(4) cannot deal with, although it is not so large a deviation that it
would distort the findings on the cross-correlation here. The ACF and PACF of the
accidents are mostly within the 95% confidence interval for the sample functions of an
ideally uncorrelated series, and so we will accept the fits as sufficient pre-whitening.
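As a rough guide to the size of the 95% bounds mentioned above, the usual large-sample approximation for an uncorrelated series of length n is plus or minus z divided by the square root of n. The sketch below, in R, assumes that approximation and takes n = 175 because the appended script keeps residuals 5 through 179; the variable names are hypothetical.

```r
# Approximate 95% bounds for sample correlations of an uncorrelated series
# of length n: +/- z / sqrt(n), with z the 97.5% standard normal quantile.
n <- 175                          # residuals 5..179 kept in the appended script
bound <- qnorm(0.975) / sqrt(n)   # roughly 0.148

# With 40 lags examined, about 5% of them (two lags) would be expected to fall
# outside the bounds by chance alone even when no correlation is present.
expected.outside <- 0.05 * 40
```

This is why one or two isolated lags just outside the bounds are not treated as evidence of dependence.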
We can see that the ACF for lags of either series with the other series generally
falls within the error bounds of an uncorrelated series, with the lone exception in each
direction occurring at a lag of order greater than 30. This ACF shows no significant
evidence of any inter-dependence structure in either direction. In contrast, the PACF has
quite strong values for lags of accidents with citations. Note that the PACF for the first
two lags is negative and of a considerable magnitude, although it becomes consistently
positive in lags of order between 10 and 20. The outcome for the first two lags is a
counter-intuitive result, as one would expect a positive sign for the immediate lags. The
data suggest that police response to a high number of accidents suffers from a delay of
almost a year, such that the response bears less relevance to the current trend in accidents.
The PACF for lags of citations with accidents again looks as though it is zero
everywhere. We are left with no significant evidence of an effect of citations on traffic
accidents, although perhaps the fact that the majority of lags in the ACF are negative tells
us that there is a reducing effect that is too slight to be reliably detected.
We may look at transfer function modeling to see if the series of accidents is a
filtered version of the series of citations with some delay. Again we fit the citations with
an AR(4) model, but then we filter the accidents with the coefficients of the model for
citations to see if we can isolate a delay. The ACF and PACF of the filtered series are
given here:
The ACF is shown for 60 lags, although a detected delay of five years would make little
sense in interpreting the model. Again there seem to be a few outlying lags as
we move into the distant lags (first graph of the second row), but there is no significant
evidence of a correlation. As each series is only of length 180, we cannot trust the
results for lags near 60, since almost 1/3 of the data is being ignored and the chance
for error becomes greater. Again the PACF (third graph of the second row) remains close
to zero everywhere.
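The filtering step used in the transfer function approach can be sketched as below. The code is R rather than S-Plus, the series are simulated stand-ins and not the Phoenix data, and the names `cit`, `acc`, `w`, `cit.f`, and `acc.f` are hypothetical.

```r
# Illustrative only: simulated series, NOT the Phoenix data.
set.seed(4)
n <- 180
cit <- as.numeric(arima.sim(model = list(ar = c(0.6, 0.2)), n = n))
acc <- as.numeric(arima.sim(model = list(ar = 0.4), n = n))

# Fit an AR(4) to the input series (order fixed, no AIC selection).
fit <- ar(cit, aic = FALSE, order.max = 4, method = "burg")

# Apply the whitening filter 1 - phi1 B - ... - phi4 B^4 to BOTH series.
# sides = 1 uses only present and past values, so the first 4 outputs are NA.
w <- c(1, -fit$ar)
cit.f <- stats::filter(cit, filter = w, method = "convolution", sides = 1)
acc.f <- stats::filter(acc, filter = w, method = "convolution", sides = 1)

# Cross-correlation of the filtered pair; a delay would show up as a peak.
keep <- 5:n
cc <- ccf(cit.f[keep], acc.f[keep], lag.max = 40, plot = FALSE)

# The estimate at lag k rests on (length - |k|) overlapping pairs, so at
# lag 60 a series of length 180 contributes only 120 pairs.
pairs.at.60 <- n - 60
```

The last line makes the caution about distant lags concrete: one third of the observations drop out of the estimate at lag 60.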
So far we have not seen sufficient evidence of an effect of traffic citations on
traffic accidents, but perhaps the effect is too gradual to be perceived by correlation
between the actual values. The last model here uses differencing on the series of
accidents. The ith term of the difference series is given by the (i+1)th term of the series
of accidents (with monthly means removed) minus the ith term. As this series is only of
length 179, the 180th term of the series of citations is removed. We can now look at these
series to determine if lags of the citations have a detectable effect on the change in the
number of accidents that occur. We can see from the ACF of the bivariate series that
there seems to be a consistently negative correlation between lags of citations and the
change in the number of accidents, although it is once again far too small to be attributed
confidently to a true inter-dependence structure. Again, to be certain of the nature of
the cross-correlation, we must pre-whiten each series by fitting an AR model to both
series independently. Again an AR of maximum order 4 is used to fit each series and the
resulting ACF and PACF of the pre-whitened series are given below.
The ACF once again shows nothing statistically significant in the plot of lags of citations
versus change in accidents, with only two of the first twenty lags approaching the error
bounds for sample correlation of an uncorrelated series. The PACF is also close to zero
for all of the lags. Even a gradual shift in accidents in response to citations cannot be
significantly detected.
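The differencing used for this last model can be written in one call. The sketch below is in R with simulated data (not the Phoenix series); `acc.mm`, `cit.m`, `acc.diff`, `cit.d`, and `pair` are hypothetical names.

```r
# Illustrative only: simulated 180-point series, NOT the Phoenix data.
set.seed(5)
acc.mm <- as.numeric(arima.sim(model = list(ar = 0.5), n = 180))
cit.m  <- rnorm(180)

# First differences: element i is acc.mm[i + 1] - acc.mm[i], giving length 179.
acc.diff <- diff(acc.mm)

# Drop the 180th citation value so the two series have equal length.
cit.d <- cit.m[1:179]
pair <- cbind(cit.d, acc.diff)
```

Element i of `acc.diff` pairs with citation value i, so a correlation at a positive lag of citations relates earlier citations to a later month-to-month change in accidents.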
Conclusion
This report has looked at a variety of methods in modeling the inter-dependence
of the issuing of traffic citations and the record of traffic accidents in the city of Phoenix
in order to address a number of possible effects. Accounting for seasonal variation poses
some difficulty, but it appears that the removal of monthly means from the highly
seasonal accident series is an appropriate detrending step. Once measures were taken to
isolate cross-correlation, there was no reliable evidence for the hypothesis that
high-volume enforcement of traffic laws induced a drop in the number of accidents that
occurred. Another outcome of this report is the conclusion that police response to trends
in accidents is significantly delayed. These two findings are independent, however; a
more punctual response by the police would not likely produce any more of an effect on
traffic accidents in the future than it has.
Appendix: S-Plus script file
pacf<-function(x){acf(x,type="partial")}
prettyplot<-function(rows,columns){
par(mfrow=c(rows,columns),omi=c(.5,1,.5,1))
}
basicplots<-function(x){
prettyplot(2,2)
ts.plot(x)
hist(x)
acf(x)
pacf(x)
}
Pho.acc<-accidents$V1
Pho.cit<-citations$V1
graphsheet()
prettyplot(2,1)
tsplot(Pho.cit)
title("Citations series")
tsplot(Pho.acc)
title("Accidents series")
prettyplot(1,1)
# This will take a look at the odd seasonal structure in accidents
sablplot(sabl(rts(Pho.acc, frequency=12)))
mtext("Linear, seasonal and noise components of accident series",line=4.5)
#mean removal
Pho.citm<-Pho.cit-mean(Pho.cit)
Pho.accm<-Pho.acc-mean(Pho.acc)
# Smoothed periodogram plots
spec.plot(spec.pgram(Pho.citm,plot=F,span=4))
spec.plot(spec.pgram(Pho.accm,plot=F,span=4))
# The acf shows that the seasonal component is quite strong
Pho<-cbind(Pho.citm,Pho.accm)
acf(Pho)
# We can look at a seasonal ARIMA estimate of the accident series
ac.mod<-list(list(order=c(2,0,0)),list(order=c(1,0,0), period=12))
accar<-arima.mle(Pho.accm,model=ac.mod)
arima.diag(accar)
# The following removes the monthly means in the accidents series to
# remove the irregular seasonal variation
monthmean<-rep(0,12)
for(j in 0:14){
for(i in 1:12)
monthmean[i]<-monthmean[i]+Pho.acc[12*j+i]/15
}
Pho.accmm<-accidents$V1
for(j in 0:14){
for(i in 1:12)
Pho.accmm[12*j+i]<-Pho.accmm[12*j+i]-monthmean[i]
}
# The series of citations has negligible seasonal variation and so we will not
# remove the monthly means from it
# Try an arima.mle on the accident series with and without
# monthly mean removal
#adjust for monthly variation using xreg command
xa<-matrix(0,nrow=180,ncol=1)
xb<-xa
xc<-xb
xd<-xc
xe<-xd
xf<-xe
xg<-xf
xh<-xg
xi<-xh
xj<-xi
xk<-xj
# Regression matrix includes constant series of 1s, and monthly series for
# eleven months (December is held as standard)
for(j in 0:14){
xa[12*j+1]<-1
xb[12*j+2]<-1
xc[12*j+3]<-1
xd[12*j+4]<-1
xe[12*j+5]<-1
xf[12*j+6]<-1
xg[12*j+7]<-1
xh[12*j+8]<-1
xi[12*j+9]<-1
xj[12*j+10]<-1
xk[12*j+11]<-1
}
xl<-rep(1,180)
xx<-cbind(xa,xb,xc,xd,xe,xf,xg,xh,xi,xj,xk,xl)
mmest<-arima.mle(Pho.accmm,model=list(order=c(3,0,0)))
est<-arima.mle(Pho.acc,model=list(order=c(3,0,0)),xreg=xx)
# Estimates are highly similar, monthly means acceptable
arima.diag(mmest)
graphsheet()
prettyplot(2,1)
tsplot(Pho.citm)
title("Citations series with series mean removed")
tsplot(Pho.accmm)
title("Accidents series with monthly means removed")
Pho.m<-cbind(Pho.citm,Pho.accmm)
prettyplot(1,1)
# Look at overlap of plots (citations divided by ratio of errors)
tsplot(Pho.citm/sqrt(var(Pho.citm)/var(Pho.accmm)),Pho.accmm,lty=c(4,1),col=c(3,4))
title("Pho.citm (dashed maroon, reduced) vs. Pho.accmm (solid green)")
spec.plot(spec.pgram(Pho.citm,plot=F,span=4))
spec.plot(spec.pgram(Pho.accmm,plot=F,span=4))
acf(Pho.m, lag.max=40)
acf(Pho.m, type="partial", lag.max=40)
# PACF allows for ar(4) on univariate series, use this maximum for bivariate fit
Phomarfit<-ar(Pho.m,order.max=4,method="burg")
# Pre-whitening will isolate cross-correlation effect
accmar<-ar(Pho.accmm,aic=T,order.max=4,method="burg")
citmar<-ar(Pho.citm,aic=T,order.max=4,method="burg")
mr<-cbind(citmar$resid[5:179],accmar$resid[5:179])
prettyplot(2,1)
ts.plot(citmar$resid[5:179])
title("Prewhitened Citations Series")
ts.plot(accmar$resid[5:179])
title("Prewhitened Accidents Series")
prettyplot(1,1)
acf(mr,lag.max=40)
acf(mr,lag.max=40,type="partial")
# We will try transfer function modeling in both directions
# First with accidents as a filtered version of citations
citfit<-ar(Pho.citm,aic=F,order.max=4,method="burg")
graphsheet()
basicplots(citfit$resid)
acctr<-filter(Pho.accmm,filter=c(1,-1*citfit$ar),sides=1)
cittr<-filter(Pho.citm,filter=c(1,-1*citfit$ar),sides=1)
prettyplot(1,1)
acf(cbind(cittr,acctr), lag.max=60)
pacf(cbind(cittr,acctr))
# Try the reverse
accfit<-ar(Pho.accmm,aic=T,order.max=12,method="burg")
basicplots(accfit$resid)
acctr<-filter(Pho.accmm,filter=c(1,-1*accfit$ar),sides=1)
cittr<-filter(Pho.citm,filter=c(1,-1*accfit$ar),sides=1)
prettyplot(1,1)
acf(cbind(cittr,acctr))
pacf(cbind(cittr,acctr))
# Transfer function modeling inconclusive as to correlation between lags of
# citations and accidents; however there is a strong PACF for lags of
# accidents and citations.
# We will look at series of differences in accidents
accdiff<-rep(0,179)
citd<-Pho.citm[1:179]
for (j in 1:179)
accdiff[j]<-Pho.accmm[j+1]-Pho.accmm[j]
Pho.d<-cbind(citd,accdiff)
graphsheet()
tsplot(accdiff)
title("Series of differences in accident series")
acf(Pho.d,lag.max=40)
Phodarfit<-ar(Pho.d,order.max=4,aic=T,method="burg")
# Again try pre-whitening
accdar<-ar(accdiff,aic=T,order.max=4,method="burg")
citdar<-ar(citd,aic=T,order.max=4,method="burg")
diffr<-cbind(citdar$resid[5:179],accdar$resid[5:179])
prettyplot(2,1)
ts.plot(citdar$resid[5:179])
title("Prewhitened Citations Series")
ts.plot(accdar$resid[5:179])
title("Prewhitened Accident Differences Series")
prettyplot(1,1)
acf(diffr)
pacf(diffr)