Forecasting criminal sentencing decisions

inrernational Journal of Forecasting 9 (1993) 49-60
North-Holland
49
Forecastingcriminal sentencingdecisions
DuncanI. Simester
MIT, Cambridge,
MA 02139,USA
SloanSchoolof Management,
RoderickJ. Brodie
University of Auckland, Private Bag, Auckland, New Zealand
Abstract: This paper investigates the feasibility of developing bootstrapping models of criminal
sentencing decisions to predict sentence length and the outcome of sentencing appeals. New Zealand
Court of Appeal sexual offence sentencingdecisionsare modelled as a function of variables describing
the character of the offender and the circumstancesof the offence. The model outperforms both an
equal weights approach and a naive mean projection in predicting sentencelengths in a holdout sample.
As applications and further validations of the model, the disparity between actual and predicted
sentence is used to forecast the successof an appeal and to predict change in sentence, given a
successfulappeal. The results confirm that bootstrappingmodels have the potential to contribute to a
field traditionally believed to favour solely judgemental approaches.
Keywords: Bootstrapping, Criminal sentencing,Econometric forecasting.
1. Introduction
In a recent review of the performance of
judgemental and statistical forecasting methods,
Bunn and 'Wright (1991) conclude that contemporary studies have demonstrated that statistical models of judgement may have less predictive accuracy than the judges upon whom the
models are based. They go on to state that the
superiority of these so-called 'bootstrapping'
models is most at risk when there is a need to
consider peripheral information, and where the
forecasting process is less stable and routine.
This conclusion contrasts with the findings in an
earlier review by Armstrong (1985), where it is
claimed that a bootstrapping modei is always at
least as accurate as the judge.
This paper seeks to contribute to this discussion, by investigating the feasibility of developCorrespondence b:
R.J. Brodie, University of Auckland,
Private Bag, Auckland,
New Zealand.
ing a bootstrapping model of criminal sentencing
decisions. Accurate forecasts of criminal sentencing decisionsshould, in principle, be of considerable value to attorneys, judges and policymakers. Attorneys must themselves intuitively
forecast sentencing decisions in order to advise
their clients properly on the wisdom of plea
bargaining options, guilty pleas and appeals
against sentence. A formal forecasting model
therefore has the potential to improve the advice
that attorneys can offer their clients, while also
providing an objective measure of disparity for
use in appeal submissions.
However, the Courts have expresseda reluctance to encourage the development of such models, arguing that the complexity and variety of
factors to be considered preclude such an approach:
U]t is clearly to be recognised that sentencing is an
individual, subjective exercise and must be tailored to all
the circumstances of each individual case. Anv exercise
0169-2070t93/$06.00
O 1993- Elsevier Science publishers BV. All rights reserved
I
!
I
50
Forecasting criminal sentencing decisions
o a s e d o n m a t h e m a t i c sr e l a t e d t o s o m e t a r i f f t h a t d o e s n o l
exist and from u'hich a percentageis deducted must be
unreliable and certainlv does not have any approbation
from the Court. R. t'. Vlilliants.
Successfuldevelopment of a predictive model of
judicial sentencingbehaviour would offer further
ev idence o f th e p o te n ti a l c o n tri b u tion of boots t r appin g a p p ro a c h e s .e v e n i n fi e l d s t radi ti onal l y
believ ed to fa v o u r s o l e l y j u d g e m e n ta lforecasti ng
approaches.
Previous applications of quantitative analysis
t ec hniqu e sto c ri m i n a l s e n te n c i n gd a t a have been
descriptive in nature, investigating disparitiesin
sentence type and magnitude which are attributable to race, sex or other demographic and
socioeconomic factors [see Turk (1969), Hills
(L97t), Phillipson (I971), Chiricos, Jackson and
Waldo (1972), Quinney (1973), Thornberry
(1973), Hall and Simkus (1975), Talarico (L979),
Cr oy le (1 9 8 3 ), Sp o h n , W e l c h a n d Gruhl (1985),
Z at z ( 19 8 5 ), a n d g e n e ra l l y Bl u m stei n et al .
(1983)1. The studies have been criticised for
failing to adequately control for selection and
measurement biases and for omitting potentially
relevant data available to the sentencing judge
[ K lec k ( 1 9 8 1 ), Bl u m s te i n e t a l . (1 9 8 3)]. Moreover, they tend to concentrate on a small number of potential sources of discrimination rather
than characterising and replicating the sentencing decision. The investigationof predictive ability has received very little attention.
Quantitative forecastingtechniqueshave been
used to forecast the results of US Supreme Court
hearings in a range of alternative legal contexts,
including: the right to counsel [Kort (1957,
1963) , L a w l o r (1 9 6 2 ), U l me r (L 9 67, 1969)l ;
wor k er s ' c o m p e n s a ti o n [K o rt (1 9 6 3 )]; i nvol unt ar y c on fe s s i o n s [Ko rt (1 9 6 3 ), U l m er (7967,
1969) l; c i v i l l i b e rti e s [N a g e l (1 9 6 5), U l mer
(1969)l; race discrimination in jury selection
[ Ulm er ( 1 9 6 3 )]; a n d s e a rc h a n d s ei zure cases
[ Ulm er (1 9 6 3 ), S e g a l (1 9 8 a )]. In j uri sdi cti ons
beneath the Supreme Court level, the outcomes
of union regulation [Grunbaum and Newhouse
( 1965) 1, l a b o u r re l a ti o n s [Sp a e th (1963)], effi ployment discrimination IField and Holley
( 1982) ] , a n d c e rti o ra ri d e c i s i o n s[U l mer (1984)]
have also been quantitatively analysed with a
view to forecasting future decisions.Although a
range of analysis techniques are employed in
these studies. all of the studies use a dichotom-
ous dependent vari abl e representi nga d ecision
i n favour of ei ther the pl ai nti ff or the def cndant ,
In contrast. the pri mary model devel oped and
validated in this paper seeksto forecastsentence
length: a bounded continuous dependent variabl e.
The paper proceeds,in Section2, with discussion of an appropriate model specificationand
identification of independent variablesdescribing
the character of the offender and the circumstances of the offence. Section 3 presents the
results of calibrating the model using a sample of
67 recent appellate sentencing decisions. In Section 4 the forecasting ability of the resultant
model is tested and compared, using a holdout
sample, with that of an equal weights model and
a naive constant forecast. As applications and
further validations of the model, the disparity
between actual and predicted sentenceis used to
forecast successon appeal in Section 5; and to
predict the change in sentence,given a successful
appeal, in Section 6. Section 7 summarisesthe
findings and conclusions.
2. Model specification and variable selection
The approach of the New Zealand Court of
Appeal to sexual offence sentencing was described in R. v. Clark (1987). A single base
sentencing level for the relevant offence is identified, and the appropriate sentence determined
by adjusting this level accordingto the mitigating
and aggravating factors present:
. . . for a rape committed by an adult without any aggravating or mitigating features, a figure of five years
should be taken as a starting point in a contested case.
Aggravating features can include additional violence or
indignities, acting in concert with other offenders, the
youth or age of the victim, intrusion into a home . . . .
R. v. Clark at 383
The present model uses single-stagelinear regression to derive weights for each of the
mitigating and aggravating features. The structure of the model is consistentwith the approach
described in R. v. Clark; it begins with a base
sentencinglevel (the intercept) and adds or subtracts from that level according to the aggravating or mitigating nature of the prevailing sentencing factors.
D.l. Simester, R.J. Brodie I Forecasting criminal sentencing decisions
The data for the model were drawn from New
Zealand Court of Appeal sexual offence sentencing decisions.The Court of Appeal was chosen
since it is in practice New Zealand's highest
appellate criminal court. Our analysisdoes not
consider casesheard by lower level courts and
thus no conclusionscan be drawn regarding the
ability of the model to predict sentencingdecisions outside the Court of Appeal. However,
lower level courts are expected to closely follow
rhe sentencingguidelines of the Court of Appeal. Departures from those guidelines can be expected to result in a decision subsequentlybeing
reviewed by the Court of APPeal.
Sexual offence sentencingdecisionswere used
because sentencesin these cases are generally
terms of imprisonment. This avoids the complications inherent in weighting the severity of alternative sentencetypes if a variety of sentence
ty pes ar e im pose d [s e e Bl u m s te i n e t a l . (1 9 8 3)].
Moreover, the sentencing decision for sexual
offenders is not inhibited by the presenceof any
mandatory sentence requirements, while the
wide range in potential seriousnessof sexual
offences ensures that there is considerable variance in the sentencesreceived.
The maximum penalty serves as an indicator
to the courts of the seriousnessof an offence.
For different sexual offences it varies between 7,
10 and 14 years. The influence of each of the
explanatoryvariables on the sentenceimposed is
expected to vary with the size of the maximum
penalty; for example, any credit given for a plea
of guilty is expected to increaseas the maximum
penalty increases.Rather than including an interaction term with each explanatory variable,
when defining the dependent variable the actual
sentenceimposed was treated as a proportion of
the maximum penalty available.
Twenty-three features describing the circumstances of an offence and the character of an
offender were identified from an examination of
sentencing literature and decisions, and from
responsesto a series of questionnairesdistributed among a sample of criminal barristers, the
police and the public. The variablesderived were
then refined, following inspection of a pairwise
correlation matrix, to reduce the level of collinearity in the data. Thirteen of the variables
can be classified as describing the circumstances
of the offence. five describe the character of the
51
offender, and five either fit into neither category
or are relevant to both. A more complete description of the specificationand measurementof
the variables is available upon request from the
authors.
The i denti ty of the sentenci ngj udge w as not
included as a potential explanatory variable because 38 different judges were involved in the
calibration and holdout decisions. Moreover,
while the Court of Appeal presentsonly a single
decision, the Court consists of a minimum of
three judges. The vagarities of an individual
judge can be expected to be stabilised by the
influence of opposing opinions.
A coding form was designedand, after initial
pretesting and minor modifications,used to assist
data collection. The reliability of the data collection processwas checked by replicating the measurement process using another researcher.For
the variables with absolute meaning (year of
decision, offender's age, maximum penalty) no
discrepancieswere observed. Similarly, for dichotomous variables, discrimination between the
alternativesusing the facts reported in the judge- ,
ments was relatively simple. However, difficulties were encountered with the variables which
required a subjective assessmentof the degree of
severity (Emotional Injury, Offence was Premediated, Indencies, and Offender Psychiatrically
ill). Although the inconsistencies were minor,
using more explicit coding instructions may help
to improve measurement reliability in future research.
3. Results
The variable measurementswere standardised
so that they ranged from 1 to 0 and a preliminary
calibration was undertaken using all 23 independent variables and a sample of. 67 cases. From
this preliminary investigation five variables were
omitted from the model since their coefficients
did not have the expected sign (negative for
mitigating factors and positive for aggravating
factors), and thus lacked face validity. None of
the coefficients for these five variables was significantly different from zero, even at the 80Vo
confidence level.
Data for the remaining eighteen variables
were then recalibrated. The final weights for
52
D.l. Simester, R.J. Brodie I Forecasting criminal sentencing decisions
Exhibit 1
Estimatesof variableweightsand the basesentencinslevel.
Weight
t
value
Circumstances of the offence
Young Victim
Elderly Victim
Physical Injury
Threatened Violence
Emotional Injury
Victim was Offender's Wife
Victim was Offender's De Facto
Victim & Offender Acquainted
Male Victim
Indecencies
Number of Offenders
Offence was Premeditated
otb
L..a
2.5b
1.4
-3.6'
-
0.014
-0.076
-0.070
Other
Year of Decision
Appeal by Offender
0.080
-0.032
-0.096
0.306
a tb
L.+
-1.9'
-4.L^
0.093
0.159
0.r21
Character of the offender
Young Offender
Previous Conviction
Remorse Shown
Nature of Probation Reoort
Intercept
1.g"
2.0b
u.l)l
0.236
0.151
0.16'7
0.085
-0 262
-0.209
-0.078
-0.255
23b
"J..7"
2.5b
- 1.9'
0.3
-
r
L . l
rb
a 1 a
Effect on a
1 4 - v e a rs e n t e n c e
2.12
3.30
2.II
z.J+
1.19
-3.67
-2.92
- 1.09
-3.57
r.3r
2.22
1.69
-1.34
0.19
-L.07
-0.98
1..4b
0.s6
-0.45
9.1"
4.28
-2.1.
Omitted variables
Length of Maximum Penalty
Offender Pleaded Guilty
Offender Breached a Position of Trust
Offender was Intoxicated
Offender Psychiatrically Ill
Adj R'?
F-value
n
0.673
8.541"
67
" Significant at the 99Vo confidence level.
b
Significant at the 95Vo confidence level.
'Significant
at the gAVo confidence level.
these variables are presented in Exhibit 1 along
with the corresponding impacts on a potential
l|-year sentence (the maximum possible sentence for any sexual offence). These impacts
differ from the estimated coefficients since the
dependent variable used in the regression was
actual sentence imposed as a proportion of the
maximum penalty available.
The coefficient of determination, 0.762, indicates that over 76% of. the deviations from an
average sentence in the cases studied are explained by the weighted inffuence of the mitigatittg and aggravating factors. This finding con-
A--
trasts with those in previous empirical studies
examining the effect of offence and offender
characteristicswhich typically have accountedfor
less than a third of the sentencingvariance [Sutton (1978), B l umstei n et al . (1983); see also
Vining and Dean (1980) and Lovegrove (1989)].
An examination of the variance inflation factors,
and the eigenvalues of the X'X matrix, indicates
a low overall level of multicollinearity in the
data.
All but three of the parameterswere found to
be significantly different from zero at the 90%
confidence level. Eleven of the variables had
D.l. Simester, R.J. Brodie I Forecasting criminal sentencing decisions
coefficients which were significant at the 95Vo
level, while three variables and the intercept
were significant at the 99% level. The F-test
sratisticof the joint distribution of the independent variables, 8.54, is significant at the 997o
l ev el.
The average absolute effect, on a potential
14-vear sentence,of the variables reflecting circumstancesof the offence was found to be 2.29
years. By contrast, the corresponding average
weight given to the variables describing the
character of the offender was only 0.90 years. It
is clear that the Judiciary intuitively ascribemore
importance to the circumstances of the offence
than to the character of the offender when determining appropriate sentencelengths for sexual offenders.
The base sentencing level for an offence having a 74-year maximum penalty was found to be
4.28 years in 1983.The weight calculatedfor the
Year of Decision variable indicates that this base
level has increased by an average of 0.11 years
each year. These results are consistentwith dicta
in R. v. Clark (1987) suggestingthat the base
sentencing level for rape (an offence with a
maximum sentence of 14 years) was 5 years in
1987. Similarly, research by the Justice Department found that, on average, rape sentences
increasedfrom 3.9 years in 1979 to 4.87 years in
1986, an average annual increase of 0.12 years
[Spier and Luketina (1988)]. An increaseof 0.82
years was observed in 1987.
The positive coefficients for the Young Victim,
Elderly Victim, Physical Injury, Threatened Violence, Offence was Premeditated, Indecencies,
and Number of Offenders variables are consistentwith the Court of Appeal's recognition of
these variables as aggravating factors. Similarly,
the negative coefficients for the Remorse Shown,
Itlature of Probation Report and Young Offender
variables support the mitigating nature of these
variables.
Although not conceptually a mitigating or aggravating factor, the identity of the appellant
party (Crown or offender) was expected to indicate whether the sentencinglevel in a case is at
the lower or upper end oithe sentencingspectl!t.
The weight estimated for the Appeal by
Offender variible
is consisrent with this
reasoning.
Surprisingly, an offender's previous convic-
53
ti ons appear to have had onl v an i nci denta l
aggravating impact on sentencing levels. It appeared from preliminary analysis that the influence of this variable is instead upon the type
of sentence required: in particular, whether a
finite custodial or an indefinite preventative detention sentenceis appropriate. The aggravatins
influence of the Emotional Injury variable was
also less than expected.This may be explained in
part by the paucity of information available to
the courts concerning the emotional harm suffered by the victims of sexual offences.Although
the lack of such information prompted a change
in legislation in 7987, this amendment was too
recent to have been reflected in the model.
All the male complainants in the calibration
sample caseswere the victims of incestuousoffences perpetrated by a relative of the complainant. Therefore, together with the Wife and De
Facto variables, the Male Victim variable identifies incidences of intra-family offending. These
intra-family variables account for three of the
four largest coefficients(in absolute magnitude),
suggesting that intra-family offending is treated
as a separate, lesser category of offences.Alternatively, the prominence of the Male Victim
variable as a mitigating factor may indicate that
the courts treat the violation of a male as less
serious than that of a female - which is in direct
contrast to dicta of the Court of Appeal in R. v.
Ngawhika (1987).
The omission of the Length of Maximum Penalty variable indicates that sentencing levels vary
uniformly as proportions of maximum penalties.
This implies that individual sentencesare strongly dependent upon legislativeguidelinesas to the
seriousness of an offence (as embodied in the
level of the maximum penalty) and that the
Court applies similar sentencing practices across
the gamut of different sexual offences.
The insignificance of the Offender Pleaded
Guilty variable was surprising, since the courts
have previously suggestedthat such pleas will be
rewarded with a one year reduction in rape
sentences [Hall (1987)]. However, its absence
may be explicable. First, the cases considered
are all appeals to the Court of Appeal by which
point the relevance of a guilty plea in reducing
pressure on the criminal justice system is somewhat lessened. Second, a guilty plea is a less
emotive feature of cases, and may therefore
54
Forecasting criminal sentencing decisions
attract a lesser subjective weight. Third, the
presenceof remorse(whichmay be evidencedby
a guilty plea) is accountedfor separatelyin the
model.
4. Forecasting accuracy
The ability of the model to forecast sentence
lengths (as a proportion of the maximum penalty) in a holdout sample of. 22 additional cases
was compared with that of an equal weights
approach and a naive constant forecast.
The equal weights regressionmodel was calibrated by first summing the standardisedmeasures
for each variable. The dependent variable (proportion of maximum penalty awarded) was then
regressedagainst this score (and an intercept) in
the calibration sample of.67 casesand the resulting model used to predict the sentence in the
remaining sample of.22 cases.The naive constant
forecast represents a projection of the mean
sentence imposed in the calibration sample.
As a partial rationale for using unit weights,
it has been suggested [Einhorn and Hogarth
(1975)] that the precision of individual coefficients may be of less predictive importance than
identification of the relevant variables and the
signs of their respective coefficients. There have
been a number of studies in which equal weight
models have been found to have greater forecasting accuracy than standard regressionmodels
[Lawshe and Shucker (1959), Wesman and Bennet t ( 1959) , T ra ttn e r (1 9 6 3 ), L e h m a n n (1 971),
Fischer (1972), Beckwith and Lehmann (1973)
and Parsons and Hulin (1982)]. Parker and
Srinivasan (1976) report contradictory findings.
Monte Carlo procedures have also been used to
investigate when an equal weights model can be
expected to perform better ihan a standard regression model [Schmidt (1971), Einhorn and
Hogarth (1975), Dorans and Drasgow (1978)].
Overall the Monte Carlo results point to the
efficiency of equal weights when sample sizesare
small, the number of parameters is large and
multicollinearity is high. In contrast, Keren and
Newman (1978) note that equal weights models
will not perform well when suppressor variables
are present. Suppressorvariables are defined as
explanatory variables which are not correlated
L
with the response,but which improve the prediction of the response.
The current circumstances suggest that the
predictive performance of the regression model
will be similar to that of an equal weights model.
The sample size is small relative to the number
of parameters to be estimated. However, there
was little evidence of multicollinearity in the
data. Furthermore, while almost all of the coefficients reported in Exhibit L are significant,
few of the variables had significant pairwise correlations with the response, suggestingthe presence of a number of suppressorvariables.
The accuracy of the sentence predictions for
the 22 holdout casesis reported in Exhibit 2. The
percentage mean absolute error was calculated
by dividing the mean absoluteerror by the actual
mean sentence. The mean absolute error is reported assuming a 1.4-yearmaximum sentence.
The mean absolute prediction error for the
regressionmodel correspondsto 1.59 years for a
potential l4-year sentence.This representsa percentage mean absolute error of just under 25Vo
and a significant (o :0.05) improvement over
the naive forecasts. Only a very slight improvement is observed over the equal weights model.
Investigation of the nature of the forecast error
indicates that unsystematic residual noise accounted for almost all of the regressionmodel's
prediction error, with just 3Vo of.the error arising systematically[Maddala (1977)1.
The decrease in retrospective and predictive
efficiency between calibration and validation
samplesis commonly called 'shrinkage' [Farrington and Tarling (1985)1. It arises because the
product moment correlations in a sample may
differ from those in the population as a whole.
Modelling techniques which exploit these correlations can be expected to better fit observations
in the calibration sample than observationsrandomly drawn from the population as a rvhole.
The level of shrinkage across calibration and
validation samples is an indication of the extent
to which results estimated on the calibration
sample can be generalisedto a wider population.
Shrinkage between the calibration and validation samples is investigated by regressing
forecasted (or fitted) sentences against actual
sentencesin the calibration and validation samples. The results are presented in Exhibit 2.
Perfect predictions would result in a constant of
55
D . l . S i m e s t e r ,R . J ' B r o d i e I Forecasting,criminal sentencing decisions
Exhibit2
l e n g t h f o r e c a s t i n ga c c u r a c y .
Sentence
Regression
model
Mean absolute
error
P e r c e n t a g em e a n
a b s o l u t ee r r o r
1 . 5 9y e a r
Equal weights
model
1 . 6 0y e a r
24.8%
25.0%
Coefficient
S.E.
Naive mean
forecast
2.21year
35.jVc
R e g r e s s i o no f a c t u a i o n P r e d i c t e d
Calibratiotr sarnPle
Constant
Fitted sentence
Vo varianceexPlained
0.0000
1.0001
76Vo
67
n
Validation samPle
Constant
Predicted sentence
Vo varianceexPlained
0.0299
0.0693
0.0359
0.8577
n
zero, a coefficient of one, and all of the variance
in the dependentvariable being explained' These
criteria ir. better satisfied on the calibration
sample. However, the constant and coefficient
estimated on the validation sample are not significantly different from zero and one, respectively, and the forecastsdo explain over 50Voof
the variance in the validation sample sentences'
5. An application to forecasting successon
appeal
It is anticipated that a forecasting model of
sentencing might be used by attorneys to improve the advice they give to clients regarding
the likelihood of a successfulsentencing appeal'
The usefulness of the present model in this regard is investigated by evaluating its ability to
predict the successof the appeals in the decisions
used to calibrate and validate the model.
As New Zealand's highest appellate criminal
court, the Court of Appeal is responsible for
smoothing out disparities and ensuring the
evenhandedadministration of justice. As a consequence, the Court has a discretion to grant
appeals against sentence when the original sentence represents an unjustified departure from
0.0879
0.t702
56Vo
22
previous sentencing practice. In determining
'unjustified', the court has stated a'wilwhat is
lingness to employ different definitions depending upon whether the appeal is brought by the
Crown or the offender. When the Crown is
appealing, the departure from previous practice
must be greater than when the appeal is brought
by the offender; considerationsjustifying an in'speak more powerfully than those
crease must
which ordinarily might justify a reduction' [R' v'
Wihapi 1976, at 424 per McCarthy P., noted in
H al l (1987); see al so R . v' B eaman 1982;and R '
v. Lepper and McDonald 19851.This distinction
necessitatesincorporation of the identity of the
appellant into the forecasting model, or alternitively, estimation of separate models using
Crown and offender appellant subsamples'Both
approachesare considered:
o Ful l sampl e model :
Success: G + BrDisparity + BrAppellant'
o Offender and Crown subsamplemodels:
Success: d. L BrDisParitY,
w here:
Original : original sentence length/maximum
PcnaltY;
Predicted : sentence length prediction/maximum P enal tY ;
Forecasting criminal sentencing decisions
Appellant:0, if the appeal was by the offender,
and 7, if the appeal was by the
Crown;
Success :1, if the appeal was successful,and
0, if the appeal was unsuccessful;
Disparity : Original - Predicted, if, Appellant:
0,
and
Predicted - Original,
if
Ap p e l l a n t: l .
In some instancesnegative levels of disparity
occurred. These measureswere not truncated to
zero as appeals brought before the Court of
Appeal are not limited to those with genuine
merit. Some appellants may realise that there is
little hope of a successfulappeal, but believe that
they have little to lose from making an attempt.
This is particularly true for those defendantswho
have their legal costs paid by the state.
Owing to the dichotomous nature of the dependent variable, the models are estimatedusing
PROBIT. The coefficientsrepresent the change
in the cumulative normal probability function
that results from a unit change in the respective
independent variables. The original sentence
length was not reported in six of the cases,and
the size of the dataset is thus reduced to 83
observations.The results of the full sample and
offender/Crown appellant subsamples are reported in Exhibit 3.
Overall the models show a high degree of
statisticalsignificanceand discriminatory ability.
The log likelihood ratio statisticsand all but one
of the variable coefficientsare statisticallysignificant (a :0.05). The dependentvari abl e means
indicate that the sentencing appeals were successfulin 42Vo,24Vo and 76Voof the full sample,
and offender appellant and Crown appellant subsample observations (respectively).With no further information, appeal successcould therefore
be categorised successfully in 58Vo of the full
sample cases and 76Vo of both the Crown and
offender appellant subsample cases. By adding
the independent variables, discriminatory performance increased to 83Va accuracy across all
three samples.
Contrary to the Court of Appeal's stated reluctance to increase sentencesupon appeal, the
parameter estimates indicate that to successfully
Exhibit 3
Appeal success
model.
Full
sample
Coefficients
Intercept
Disparity
Appellant
-!.07"
-0.98
(-4.6)
5.76^
(4.2)
1.L9'
( 3 . 3)
(-4.3)
4.46'
(3.2)
Critical disparity values (14 year maximumpenalty)
Offender appellant
2 . 6 0y r
- 0 . 3 0y r
Crown appellant
R'
2 log(likelihood)
D e p . v a r i a b l em e a n
Correct classifications
n
'Significant
at a :0.01.
o
S i g n i f i c a n ra t a : 0 . 0 - 5 .
N o l e : F i g u r e s i n p a r e n t h e s e sa r e t - r ' a l u e s .
A
Offender
appellant
subsample
0.51
'67.9"
0.42
83Vo
83
Cro*'n
appellant
subsample
-0.95
(- r.r/
19.32b
(2.?)
3 . 0 9y r
0.73yr
0.29
46.9"
0.21
83Vc
54
0.52
15.1"
0.i6
83Vo
29
D.l. Simester. R.J. Brodie I Forecasting criminal sentencing decisions
appeal a sentence, an offender must show a
tgnifi.untly greater disparity between the original sentence and previous sentencing practice
than is required of the Crown. For an offence
carrying a maximum penalty of 14 years imprisonment, the probability of a successfulappeal by
the offender will onll' exceed 507o when the
original sentence is approximately three years
sreater than the predicted sentence.In contrast,
in upp.ul by the Crown is likely to be successful
when the predicted sentence exceedsthe actual
se n tenc ebY jus t nin e m o n th s .
To investigate forecasting accuracy, the full
samplewas split into two subsamplesand models
calibrated using the data in each subsample.
These models were then used to forecast the
successof the appeals in the other subsampleThe distribution of the cases into each of the
subsampleswas performed using a stratified random selection procedure, in order to ensure
similarity between the subsamples.The parameters estimatedusing the two subsampleswere not
significantly different from those estimated using
the full sample. Results are presentedin Exhibit
4.
The dependent variable means indicate that a
constant forecast of failure would have been
accuratein 57Voof the subsample1 observations
and 59Vo of the subsample 2 observations.The
forecasting models performed with SIVo and
88Vo accuracy, respectively. Little shrinkage was
observed between the calibration and validation
sample performances; indeed, the best forecasts
of appeal successin the second subsamplewere
obtained from the model calibrated on the first
subsample. These performances represent
dramatic improvements on a constant forecast,
and confirm that the model is capable of accurately forecasting the outcome of sentencing appeals.
Exhibit4
Appeal successmodel forecast results.
Subsample 1
Dependent variable mean
Correct classifications
Subsample 2
Dependent variable mean
Correct classifications
Calibration
Validation
0.43
83Vo
0.43
SIVo
0.41
85o/o
0.41
88Vo
57
6. Forecasting change in sentencein a
successfulappeal
As a further investigation of the predictive
power of the model, the disparity between actual
and predicted sentencesis used to predict the
amount sentences changed in those cases in
which the Court found that a change was justified. The relevanceof the appellant'sidentity is
again investigatedboth by inclusion of an AppelIant variable and by estimation of separatemodels using Crown and otfender subsamples.
The coefficient estimated for the Appellant
variable was not significant and no significant
difference was found in the parametersestimated
using the Crown and offender subsamples.The
results of regressingabsolute change in sentence
against Disparity (as defined in Section 5) in a
calibration sample of. 20 cases are presented in
Exhibit 5. Forecast accuracyin a holdout sample
of 15 casesis also reported, both for the disparity
model and for a naive constant forecast. The
Exhibit5
Sentence chanse model.
Calibration results
Coefficient
Disparity
Intercept
Adj R'
^F-value
0.437
0.097
0.572
26.40^
20
n
t value
5.1"
5.4^
Forecast results
Disparity
model
Naive mean
forecast
Mean absolute error
Percentagemean
Absolute error
0.73year
2l.6Va
t.2L year
45.77o
Regression of actual on predicted
Calibration sample
Constant
Fitted sentence
Eo variance explained
n
Validation sample
Constant
Predicted sentence
Ea variance explained
n
Coefficient
Stderror
0.0000
1.0000
0.0355
0.1947
59Va
20
-0.0178
0.0375
0.2067
73Va
15
r.2394
" Significant at the 997o confidence level.
58
Forecasting criminal sentencing decisions
naive constant forecast represented the mean
sentence change in the 20 calibration sample
c as es .
The coefficient estimated for the disparity
variable indicates that, having decided an appeal
is justified, the Court will adjust a sentenceby
approximately 0.44 of a year for every year of
disparity observed betrveenthe original and the
predicted sentences.This result is consistentwith
statements of the Court indicating that it will
vary sentencesby the minimum extent necessary
to redress discrepancies(see R. v. Urlich; R. v.
Sirnm; and R. v. Steer).
Mean absolute prediction error corresponds
to 0.73 years for offences with a l4-year maximum penalty, while the percentagemean absolute error equals 27.6Vo. Predictions from the
regression model were significantly (c :0.05)
more precise than the naive projection. There
was some evidence of shrinkage between the
calibration and validation samples.Although the
model explained less sentencechangevariancein
the calibration sample than in the validation
sample, the calibration sample predictions show
no s y s t em a ti cb i a s . In th e v a l i d a ti o n th ere w as a
tendency to underpredict at the lower end of the
spectrum and to overpredict large sentence
changes. However, this tendency was not significant; when the actual sentencewas regressedon
the predicted sentence neither the constant nor
the estimated coefficient were significantly different from their respectiveideals.
7. Summary and conclusions
In this paper we have investigated the
feasibility of developing a bootstrapping model
of criminal sentencing decisions. New Zealand
Court of Appeal sexual offence sentencingdecisions are modelled as a function of factors describing the character of the offender and the
circumstancesof the offence. The structure of
the model assumesthat the Judiciary's intuitive
approach to the sentencing decision is to start
with a base sentencinglevel (as a proportion of
t he m ax im u m p e n a l ty ) a n d to a d d o r subtract
from that level according to the aggravatingor
mitigating nature of the prevailing factors. The
results are encouraging. The weights estimated
for the various sentencing factors are generally
*"-
consistent with prior expectations.Although inconsistenciesdo arise, explanationsare generally
available, either suggestingmodificationsto current theory or offering justifications for the lack
of corroboration.
This model significantly outperforms a naive
mean projection in forecastingsentencelengths
in a holdout sample of 22 additional decisions.In
contrast, its forecasting performance is almost
identical to that of an equal weights model,
indicating that predictive performance is not
terribly sensitive to the relative magnitudes of
the estimated coefficients. This result is consistent with previous research investigating the
relative predictive performanceof regressionand
equal weights models.
Almost all of the prediction error resulted
from unsystematic residual variance. This variance may be explained by inconsistencies
in the
sentencing behaviour of the judges, and from
imprecision in the model. Inconsistenciesin the
sentencingbehaviour of the Judiciary would correspond with observations by the judges themselvesthat the features of different casesare too
complex to be reliably compared using intuition
alone, Model imprecision may be due to the
omission of relevant mitigating or aggravating
factors; alternatively, the precision with which
the model is able to predict sentence lengths
should be viewed in the context of the generality
of the model. The cases include a range of
different offences, each of which may be associated with varied weights for the sentencing
factors.
Sentencepredictions generated by the model
are used to forecast the likelihood of successon
appeal, and to predict the change in sentence
given a successfulappeal. The results confirm
that bootstrapping models have the potential to
assistattorneys and improve the advice they can
offer cl i entsregardi ngthc w i sdom of pursuingan
appeal. Moreover, by identifying the intuitive
weights attached by the Judiciary to the various
sentencing factors and clarifying existing practice, such models can facilitate discussion of
sentencing objectives and identify the need for
potenti al reform.
These findings contradict claims by the
Judiciary that the complexity of criminal sentencing precludesdevelopment of a useful bootstrappi ng model of sentenci ngdeci si ons.In doi ng so,
D. I . Simester, R. J . Brodie I Forecasting criminal sentencing decisions
the findinss demonstrate the potential contribution of bootstrapping models to a field traditiona l l l ' b e l iev ed t o f av our s o l e l y j u d g e me n ta lfo re ca sti n ga ppr oac hes .
Acknowledgements
Th e aut hor s ac k now l e d g eth e c o n tri b u ti o n o f
Andrew Simester and wish to thank J. Scott
Armstrong and three anonymous 1"/F reviewers
for helpful comtnents and suggestions.
Case references
R. v. Beaman 1982 (16 November 7982) CA
r77I 82
R. v. Clark[1987]1 NZLR 380
R. v. Lepperand McDonald 1985CA
R. v. Ngawhika1987(3 July 1987)CA 91187
R . v . S t m m 1 9 8 1( 9 O c t o b e r1 9 8 1 )C A 1 4 8 / 8 1
R. v. Steer1981(7 December1981)CA 165/81
R. v. Urlich [1981]1 NZLR 310
R. v. Wihapi U97611 ttZLF. 422
R. v. Williams[1983]Crim LR 693
References
Armstrong, J.S., 1985, Long-Range Forecasting:From Crystal Ball to Computer, 2nd edn. (Wiley-lnterscience, New
York).
Beckwith, N.E. and D.R. Lehmann, 1973."The importance
of differential weights in multiple attribute models of
consumer attitude", Journal of Marketing Research, 10,
141-145.
B l u m s t e i n ,A . , J . C o h e n , S . E . M a r t i n a n d M . H . T o n r y , e d s . ,
7983, Research on Sentencing: The Search for Reform,
P a n e l o n S e n t e n c i n gR e s e a r c h ( N a t i o n a l A c a d e m y P r e s s ,
W a s h i n g t o n ,D C ) .
Bunn, D. and G. Wright, 1991,"Interaction of judgemental
and statistical forecasting methods: Issues and analysis",
M a n a g e m e n tS c i e n c e , 3 7 , n o . 5 , 5 0 1 - 5 1 8 .
Chiricos, T.G., P.D. Jackson and G.P. Waldo, 1972,
"lnequality in the imposition of a criminal label", Social
Problems,19, no. 4, 553-572.
C r o y l e , J . L . , 1 9 8 3 . " M e a s u r i n g a n d e x p l a i n i n g d i s p a r i t i e si n
felony sentences:Courtroom work group factors and race,
sex and socioeconomic influences on sentence severity",
Political Behaviour, 5, 135-153.
Dorans, N. and F. Drawgow,ISTS, "Alternative weighting
schemes for linear prediction" , Organizational Behaviour
and Humsn Performance, 2I, 316-345.
59
Einhorn, H.J. and R.VI. Hogarth, 1975, "Unit weighting
s c h e m e sf o r d e c i s i o n m a k i n g " , O r g a n i z a t i o n u l B e h a v i o u r
and Human PerJ'ormance,13, IlI-192.
Farrington, D.P. and R. Tarling, 1985,"Criminological prediction: An introduction", in; D.P. Farrington and R.
Tarling, eds., Prediction in CriminologS, (SUNY Press,
New York).
F i e l d , H . S . a n d W . H . H o l l e y , 1 9 8 2 , " T h e r e l a t i o n s h i po f
p e r f o r m a n c e a p p r a i s a l s y s t e m c h a r a c t e r i s t i c st o v e r d i c t s i n
selected employment discrimination cases", Academy of
Management Journal,2-5, no. 2, 392-406.
Fischer, G.W., 1972, Four Methods for Assessing Nlultl
Attribute Utilities: An Experintentol Validation (Engineeri n g P s y c h o l o g yL a b o r a t o r y , U n i v e r s i t y o f M i c h i g a n ) .
Grunbaum, W.F. and A. Nervhouse, 1965, "Quantitative
analysis of Judicial decisions: Some problems in
prediction", Houslon Law Review,3, Z0l-220.
Hall, G.G., 1987. Hall on Sentencingin New Zealand (Butterworths, \&'ellington).
H a l l , E . L . a n d A . A . S i m k u s , 1 9 7 5 ," I n e q u a l i t y i n t h e t y p e s
o f s e n t e n c e sr e c e i v e d b y n a t i v e A m e r i c a n s a n d w h i t e s " ,
Criminology, 13, no. 2, 199-222.
Hills, S.L. 197I, Crime, Power, and Morallty (Chandler,
Scranton, PA).
Keren, G. and J.R. Newman, 1978, "Additional considerations with regard to multiple regression and equal weighting", Organizational Behaviour and Human Performance.
22, r43-t64.
senti:ncKleck,G., 1981,"Racialdiscrimination
in criminal
ing: A critical evaluation of the evidencewith additional
evidence on the death penalty", American Sociological
Review, 46, 783-805.
K o r t , F . , 1 , 9 5 7 ," P r e d i c t i n g S u p r e m e C o u r t d e c i s i o n sm a t h e 'Right
m a t i c a l l y : A q u a n t i t a t i v e a n a l y s i so f t h e
to Counsel'
cases", American Political Science Review, 5L, 1-lZ.
K o r t , F . , 1 9 6 3 , " C o n t e n t a n a l y s i so f J u d i c i a l o p i n i o n s a n d
rules of law", in: G. Schubert, eds.. Judicial DecisionMaking (The Free Press of Glencoe, New York), 133-197.
Lawlor, R.C., 1962, "Mathematical aids to prediction of
court decisions", Proc. Second National Law and Electronics Conference.
Lawshe, C.H. and R.E. Shucker, 1959, "The relative efficiency of four test weighting methods in multiple regression", Educational and Psychological Measurement, 19,
1 0 3 - 11 4 .
L e h m a n n , D . R . , 1 9 7 1 , " T e l e v i s i o n s h o w p r e f e r e n c e :A p p l i cation of a choice model", Journal of Marketing Research,
8,47-55.
Lovegrove, A., 1989, Judicial Decision Making, Sentencing
Policy, and Numerical Guidance (Springer-Verlag, New
York).
M a d d a l a , G . S . , 1 9 7 7 , E c o n o m e t r i c s( M c G r a w - H i l l , L o n d o n ) .
Nagel, S., 1965, "Predicting court cases quantitatively",
Michigan Law Review, 63, I4ll-1422.
Parker, B.R. and V. Srinivasan, 1976, "A consumer preference approach to the planning of rural primary healthc a r e f a c i l i t i e s " , O p e r a t i o n sR e s e a r c h , 2 4 , n o . 5 , 9 9 1 - 1 0 2 5 .
P a r s o n s ,C . K . a n d C . L . H u l i n , l 9 S 2 , " D i f f e r e n t i a l l y w e i g h t ing linear models in organizational research: A crossvalidation comparison of four methods", Organizational
Behaviour and Human Performance, 30,289-311.
Brodie I Forecasting criminal sentencing decisions
Phillipson, M., 1,971, Sociological Aspects of Crime and
Delinquency (Routledge and Kegan Paul, London).
Quinney, R., 1973, Critique of Legal Order (Little Brown,
Boston).
Schmidt, F.L., 1971, "The relative efficiency of regression
and simple unit predictor weights in applied differential
psychology", Educational and Psychological Measurement,
37,699-714.
Segal, J.A., 1984, "Predicting Supreme Court casesprobabilistically: The search and seizure cases, 1962-198I" .
The American Political ScienceReview, 78, 891-900.
Spaeth, J., 1963, "An analysisof Judicial attitudes in the
labor relations decisions of the Warren Court", Journal of
Polirics, 25, 290-311.
Spier, P. and F. Luketina, 1988, The Impact on Sentencingof
the Criminal Justice Act 1985 (Department of Justice,
Wellington).
S p o h n , C . , S . W e l c h a n d J . G r u h l , 1 9 8 5 ," W o m e n d e f e n d a n t s
in court: The interaction between sex and race in convicting and sentencing", Social ScienceQuarterly,66, 178-185.
Sutton, L.P., 1978, Variations in Federal Criminal Sentences:
Statistical Assessment at the National Level (Criminal Justice Research Centre, New York).
T a l a r i c o , S . M . , 1 9 7 9 , " J u d i c i a l d e c i s i o n sa n d s a n c t i o n p a t terns in criminal justice", The Journal of Criminal Law and
Criminolo gy, 70, 1.I'7-I24.
T h o r n b e r r v , T . P . , 7 9 7 3 , " R a c e , s o c i o e c o n o m i cs t a t u s a n d
sentencing in the juvenile justice system", Journal of
Criminal Law and Criminology, 6.1, 90-98.
Trattner, M.H., 1963, "Comparison of three methods for
assembling aptitude test batteries", Personnel Psychology,
76,Z2t-232..
Turk, A.T., 1969, Criminality and the Legal Order (Rand
McNally, Chicago).
Ulmer, S.S., 1963, "Quantitative analysis of judicial processes:Some practical and theoretical applications" , Journal of Law and Contemporary Issues,28, 164-184.
L
Ulmer, S.S., 1967, "Mathematical models for predicting
sentencing", in: J.L. Bernd, ed., Mathemotical Applications in Polilical Science, Vol. III (The University Press of
Virginia, Charlottesville), 67*95.
U l m e r , S . S . , i 9 6 9 , " T h e d i s c r i m i n a n tf u n c t i o n a n d a t h e o r e t i c a l c o n t e x t f o r i t s u s e i n e s t i m a t i n gt h e v o t e s o f j u d g e s " , i n :
J.B. Grossman and J. Tanenhaus, eds., Frontiers of Judicial Researclr (Wiley, New York), 335-369.
Ulmer, S.S., 1984, "The Supreme Court's certiorari decisions: Conflict as a predictive variable", American Polilical
S c i e n c eR e v i e w , 7 8 , 9 0 1 - 9 1 1 .
Vining, A.R. and C. Dean, 1980, "Towards sentencing
uniformity: Integrating the normative and the empirical
orientation", in: B.A. Grosman, eds., New Directionsin
Sentencing (Butterworth, Toronto), 117-154.
Wesman, A.G. and G.K. Bennett, 1959, "Multiple regression vs. simple addition of scores in prediction of college
grades", Educational and Psychological Measurement, 19,
243-246.
Zatz, M.5., 1985, "Pleas, priors and prison: Racial/Ethnic
differences in sentencing", Social SciencesResearch, 11,
169-193.
Biographies: Roderick J. BRODIE is Professor of Marketing
and Head of Department of Marketing and International
Business, University of Auckland, New Zealand. His research interests are in the areas of marketing science and
forecasting. His publications have appeared in the Journal of
Marketing Research, International Journal of Forecasting and
lournal of Business Research and other journals.
Duncan L SIMESTER is completing his Ph.D. in the Sloan
S c h o o l o f M a n a g e m e n t , M a s s a c h u s e t t sI n s t i t u t e o f T e c h n o l ogy. His research interests are in the areas of marketing
science and forecastins.
L