inrernational Journal of Forecasting 9 (1993) 49-60 North-Holland 49 Forecastingcriminal sentencingdecisions DuncanI. Simester MIT, Cambridge, MA 02139,USA SloanSchoolof Management, RoderickJ. Brodie University of Auckland, Private Bag, Auckland, New Zealand Abstract: This paper investigates the feasibility of developing bootstrapping models of criminal sentencing decisions to predict sentence length and the outcome of sentencing appeals. New Zealand Court of Appeal sexual offence sentencingdecisionsare modelled as a function of variables describing the character of the offender and the circumstancesof the offence. The model outperforms both an equal weights approach and a naive mean projection in predicting sentencelengths in a holdout sample. As applications and further validations of the model, the disparity between actual and predicted sentence is used to forecast the successof an appeal and to predict change in sentence, given a successfulappeal. The results confirm that bootstrappingmodels have the potential to contribute to a field traditionally believed to favour solely judgemental approaches. Keywords: Bootstrapping, Criminal sentencing,Econometric forecasting. 1. Introduction In a recent review of the performance of judgemental and statistical forecasting methods, Bunn and 'Wright (1991) conclude that contemporary studies have demonstrated that statistical models of judgement may have less predictive accuracy than the judges upon whom the models are based. They go on to state that the superiority of these so-called 'bootstrapping' models is most at risk when there is a need to consider peripheral information, and where the forecasting process is less stable and routine. This conclusion contrasts with the findings in an earlier review by Armstrong (1985), where it is claimed that a bootstrapping modei is always at least as accurate as the judge. This paper seeks to contribute to this discussion, by investigating the feasibility of developCorrespondence b: R.J. Brodie, University of Auckland, Private Bag, Auckland, New Zealand. ing a bootstrapping model of criminal sentencing decisions. Accurate forecasts of criminal sentencing decisionsshould, in principle, be of considerable value to attorneys, judges and policymakers. Attorneys must themselves intuitively forecast sentencing decisions in order to advise their clients properly on the wisdom of plea bargaining options, guilty pleas and appeals against sentence. A formal forecasting model therefore has the potential to improve the advice that attorneys can offer their clients, while also providing an objective measure of disparity for use in appeal submissions. However, the Courts have expresseda reluctance to encourage the development of such models, arguing that the complexity and variety of factors to be considered preclude such an approach: U]t is clearly to be recognised that sentencing is an individual, subjective exercise and must be tailored to all the circumstances of each individual case. Anv exercise 0169-2070t93/$06.00 O 1993- Elsevier Science publishers BV. All rights reserved I ! I 50 Forecasting criminal sentencing decisions o a s e d o n m a t h e m a t i c sr e l a t e d t o s o m e t a r i f f t h a t d o e s n o l exist and from u'hich a percentageis deducted must be unreliable and certainlv does not have any approbation from the Court. R. t'. Vlilliants. Successfuldevelopment of a predictive model of judicial sentencingbehaviour would offer further ev idence o f th e p o te n ti a l c o n tri b u tion of boots t r appin g a p p ro a c h e s .e v e n i n fi e l d s t radi ti onal l y believ ed to fa v o u r s o l e l y j u d g e m e n ta lforecasti ng approaches. Previous applications of quantitative analysis t ec hniqu e sto c ri m i n a l s e n te n c i n gd a t a have been descriptive in nature, investigating disparitiesin sentence type and magnitude which are attributable to race, sex or other demographic and socioeconomic factors [see Turk (1969), Hills (L97t), Phillipson (I971), Chiricos, Jackson and Waldo (1972), Quinney (1973), Thornberry (1973), Hall and Simkus (1975), Talarico (L979), Cr oy le (1 9 8 3 ), Sp o h n , W e l c h a n d Gruhl (1985), Z at z ( 19 8 5 ), a n d g e n e ra l l y Bl u m stei n et al . (1983)1. The studies have been criticised for failing to adequately control for selection and measurement biases and for omitting potentially relevant data available to the sentencing judge [ K lec k ( 1 9 8 1 ), Bl u m s te i n e t a l . (1 9 8 3)]. Moreover, they tend to concentrate on a small number of potential sources of discrimination rather than characterising and replicating the sentencing decision. The investigationof predictive ability has received very little attention. Quantitative forecastingtechniqueshave been used to forecast the results of US Supreme Court hearings in a range of alternative legal contexts, including: the right to counsel [Kort (1957, 1963) , L a w l o r (1 9 6 2 ), U l me r (L 9 67, 1969)l ; wor k er s ' c o m p e n s a ti o n [K o rt (1 9 6 3 )]; i nvol unt ar y c on fe s s i o n s [Ko rt (1 9 6 3 ), U l m er (7967, 1969) l; c i v i l l i b e rti e s [N a g e l (1 9 6 5), U l mer (1969)l; race discrimination in jury selection [ Ulm er ( 1 9 6 3 )]; a n d s e a rc h a n d s ei zure cases [ Ulm er (1 9 6 3 ), S e g a l (1 9 8 a )]. In j uri sdi cti ons beneath the Supreme Court level, the outcomes of union regulation [Grunbaum and Newhouse ( 1965) 1, l a b o u r re l a ti o n s [Sp a e th (1963)], effi ployment discrimination IField and Holley ( 1982) ] , a n d c e rti o ra ri d e c i s i o n s[U l mer (1984)] have also been quantitatively analysed with a view to forecasting future decisions.Although a range of analysis techniques are employed in these studies. all of the studies use a dichotom- ous dependent vari abl e representi nga d ecision i n favour of ei ther the pl ai nti ff or the def cndant , In contrast. the pri mary model devel oped and validated in this paper seeksto forecastsentence length: a bounded continuous dependent variabl e. The paper proceeds,in Section2, with discussion of an appropriate model specificationand identification of independent variablesdescribing the character of the offender and the circumstances of the offence. Section 3 presents the results of calibrating the model using a sample of 67 recent appellate sentencing decisions. In Section 4 the forecasting ability of the resultant model is tested and compared, using a holdout sample, with that of an equal weights model and a naive constant forecast. As applications and further validations of the model, the disparity between actual and predicted sentenceis used to forecast successon appeal in Section 5; and to predict the change in sentence,given a successful appeal, in Section 6. Section 7 summarisesthe findings and conclusions. 2. Model specification and variable selection The approach of the New Zealand Court of Appeal to sexual offence sentencing was described in R. v. Clark (1987). A single base sentencing level for the relevant offence is identified, and the appropriate sentence determined by adjusting this level accordingto the mitigating and aggravating factors present: . . . for a rape committed by an adult without any aggravating or mitigating features, a figure of five years should be taken as a starting point in a contested case. Aggravating features can include additional violence or indignities, acting in concert with other offenders, the youth or age of the victim, intrusion into a home . . . . R. v. Clark at 383 The present model uses single-stagelinear regression to derive weights for each of the mitigating and aggravating features. The structure of the model is consistentwith the approach described in R. v. Clark; it begins with a base sentencinglevel (the intercept) and adds or subtracts from that level according to the aggravating or mitigating nature of the prevailing sentencing factors. D.l. Simester, R.J. Brodie I Forecasting criminal sentencing decisions The data for the model were drawn from New Zealand Court of Appeal sexual offence sentencing decisions.The Court of Appeal was chosen since it is in practice New Zealand's highest appellate criminal court. Our analysisdoes not consider casesheard by lower level courts and thus no conclusionscan be drawn regarding the ability of the model to predict sentencingdecisions outside the Court of Appeal. However, lower level courts are expected to closely follow rhe sentencingguidelines of the Court of Appeal. Departures from those guidelines can be expected to result in a decision subsequentlybeing reviewed by the Court of APPeal. Sexual offence sentencingdecisionswere used because sentencesin these cases are generally terms of imprisonment. This avoids the complications inherent in weighting the severity of alternative sentencetypes if a variety of sentence ty pes ar e im pose d [s e e Bl u m s te i n e t a l . (1 9 8 3)]. Moreover, the sentencing decision for sexual offenders is not inhibited by the presenceof any mandatory sentence requirements, while the wide range in potential seriousnessof sexual offences ensures that there is considerable variance in the sentencesreceived. The maximum penalty serves as an indicator to the courts of the seriousnessof an offence. For different sexual offences it varies between 7, 10 and 14 years. The influence of each of the explanatoryvariables on the sentenceimposed is expected to vary with the size of the maximum penalty; for example, any credit given for a plea of guilty is expected to increaseas the maximum penalty increases.Rather than including an interaction term with each explanatory variable, when defining the dependent variable the actual sentenceimposed was treated as a proportion of the maximum penalty available. Twenty-three features describing the circumstances of an offence and the character of an offender were identified from an examination of sentencing literature and decisions, and from responsesto a series of questionnairesdistributed among a sample of criminal barristers, the police and the public. The variablesderived were then refined, following inspection of a pairwise correlation matrix, to reduce the level of collinearity in the data. Thirteen of the variables can be classified as describing the circumstances of the offence. five describe the character of the 51 offender, and five either fit into neither category or are relevant to both. A more complete description of the specificationand measurementof the variables is available upon request from the authors. The i denti ty of the sentenci ngj udge w as not included as a potential explanatory variable because 38 different judges were involved in the calibration and holdout decisions. Moreover, while the Court of Appeal presentsonly a single decision, the Court consists of a minimum of three judges. The vagarities of an individual judge can be expected to be stabilised by the influence of opposing opinions. A coding form was designedand, after initial pretesting and minor modifications,used to assist data collection. The reliability of the data collection processwas checked by replicating the measurement process using another researcher.For the variables with absolute meaning (year of decision, offender's age, maximum penalty) no discrepancieswere observed. Similarly, for dichotomous variables, discrimination between the alternativesusing the facts reported in the judge- , ments was relatively simple. However, difficulties were encountered with the variables which required a subjective assessmentof the degree of severity (Emotional Injury, Offence was Premediated, Indencies, and Offender Psychiatrically ill). Although the inconsistencies were minor, using more explicit coding instructions may help to improve measurement reliability in future research. 3. Results The variable measurementswere standardised so that they ranged from 1 to 0 and a preliminary calibration was undertaken using all 23 independent variables and a sample of. 67 cases. From this preliminary investigation five variables were omitted from the model since their coefficients did not have the expected sign (negative for mitigating factors and positive for aggravating factors), and thus lacked face validity. None of the coefficients for these five variables was significantly different from zero, even at the 80Vo confidence level. Data for the remaining eighteen variables were then recalibrated. The final weights for 52 D.l. Simester, R.J. Brodie I Forecasting criminal sentencing decisions Exhibit 1 Estimatesof variableweightsand the basesentencinslevel. Weight t value Circumstances of the offence Young Victim Elderly Victim Physical Injury Threatened Violence Emotional Injury Victim was Offender's Wife Victim was Offender's De Facto Victim & Offender Acquainted Male Victim Indecencies Number of Offenders Offence was Premeditated otb L..a 2.5b 1.4 -3.6' - 0.014 -0.076 -0.070 Other Year of Decision Appeal by Offender 0.080 -0.032 -0.096 0.306 a tb L.+ -1.9' -4.L^ 0.093 0.159 0.r21 Character of the offender Young Offender Previous Conviction Remorse Shown Nature of Probation Reoort Intercept 1.g" 2.0b u.l)l 0.236 0.151 0.16'7 0.085 -0 262 -0.209 -0.078 -0.255 23b "J..7" 2.5b - 1.9' 0.3 - r L . l rb a 1 a Effect on a 1 4 - v e a rs e n t e n c e 2.12 3.30 2.II z.J+ 1.19 -3.67 -2.92 - 1.09 -3.57 r.3r 2.22 1.69 -1.34 0.19 -L.07 -0.98 1..4b 0.s6 -0.45 9.1" 4.28 -2.1. Omitted variables Length of Maximum Penalty Offender Pleaded Guilty Offender Breached a Position of Trust Offender was Intoxicated Offender Psychiatrically Ill Adj R'? F-value n 0.673 8.541" 67 " Significant at the 99Vo confidence level. b Significant at the 95Vo confidence level. 'Significant at the gAVo confidence level. these variables are presented in Exhibit 1 along with the corresponding impacts on a potential l|-year sentence (the maximum possible sentence for any sexual offence). These impacts differ from the estimated coefficients since the dependent variable used in the regression was actual sentence imposed as a proportion of the maximum penalty available. The coefficient of determination, 0.762, indicates that over 76% of. the deviations from an average sentence in the cases studied are explained by the weighted inffuence of the mitigatittg and aggravating factors. This finding con- A-- trasts with those in previous empirical studies examining the effect of offence and offender characteristicswhich typically have accountedfor less than a third of the sentencingvariance [Sutton (1978), B l umstei n et al . (1983); see also Vining and Dean (1980) and Lovegrove (1989)]. An examination of the variance inflation factors, and the eigenvalues of the X'X matrix, indicates a low overall level of multicollinearity in the data. All but three of the parameterswere found to be significantly different from zero at the 90% confidence level. Eleven of the variables had D.l. Simester, R.J. Brodie I Forecasting criminal sentencing decisions coefficients which were significant at the 95Vo level, while three variables and the intercept were significant at the 99% level. The F-test sratisticof the joint distribution of the independent variables, 8.54, is significant at the 997o l ev el. The average absolute effect, on a potential 14-vear sentence,of the variables reflecting circumstancesof the offence was found to be 2.29 years. By contrast, the corresponding average weight given to the variables describing the character of the offender was only 0.90 years. It is clear that the Judiciary intuitively ascribemore importance to the circumstances of the offence than to the character of the offender when determining appropriate sentencelengths for sexual offenders. The base sentencing level for an offence having a 74-year maximum penalty was found to be 4.28 years in 1983.The weight calculatedfor the Year of Decision variable indicates that this base level has increased by an average of 0.11 years each year. These results are consistentwith dicta in R. v. Clark (1987) suggestingthat the base sentencing level for rape (an offence with a maximum sentence of 14 years) was 5 years in 1987. Similarly, research by the Justice Department found that, on average, rape sentences increasedfrom 3.9 years in 1979 to 4.87 years in 1986, an average annual increase of 0.12 years [Spier and Luketina (1988)]. An increaseof 0.82 years was observed in 1987. The positive coefficients for the Young Victim, Elderly Victim, Physical Injury, Threatened Violence, Offence was Premeditated, Indecencies, and Number of Offenders variables are consistentwith the Court of Appeal's recognition of these variables as aggravating factors. Similarly, the negative coefficients for the Remorse Shown, Itlature of Probation Report and Young Offender variables support the mitigating nature of these variables. Although not conceptually a mitigating or aggravating factor, the identity of the appellant party (Crown or offender) was expected to indicate whether the sentencinglevel in a case is at the lower or upper end oithe sentencingspectl!t. The weight estimated for the Appeal by Offender variible is consisrent with this reasoning. Surprisingly, an offender's previous convic- 53 ti ons appear to have had onl v an i nci denta l aggravating impact on sentencing levels. It appeared from preliminary analysis that the influence of this variable is instead upon the type of sentence required: in particular, whether a finite custodial or an indefinite preventative detention sentenceis appropriate. The aggravatins influence of the Emotional Injury variable was also less than expected.This may be explained in part by the paucity of information available to the courts concerning the emotional harm suffered by the victims of sexual offences.Although the lack of such information prompted a change in legislation in 7987, this amendment was too recent to have been reflected in the model. All the male complainants in the calibration sample caseswere the victims of incestuousoffences perpetrated by a relative of the complainant. Therefore, together with the Wife and De Facto variables, the Male Victim variable identifies incidences of intra-family offending. These intra-family variables account for three of the four largest coefficients(in absolute magnitude), suggesting that intra-family offending is treated as a separate, lesser category of offences.Alternatively, the prominence of the Male Victim variable as a mitigating factor may indicate that the courts treat the violation of a male as less serious than that of a female - which is in direct contrast to dicta of the Court of Appeal in R. v. Ngawhika (1987). The omission of the Length of Maximum Penalty variable indicates that sentencing levels vary uniformly as proportions of maximum penalties. This implies that individual sentencesare strongly dependent upon legislativeguidelinesas to the seriousness of an offence (as embodied in the level of the maximum penalty) and that the Court applies similar sentencing practices across the gamut of different sexual offences. The insignificance of the Offender Pleaded Guilty variable was surprising, since the courts have previously suggestedthat such pleas will be rewarded with a one year reduction in rape sentences [Hall (1987)]. However, its absence may be explicable. First, the cases considered are all appeals to the Court of Appeal by which point the relevance of a guilty plea in reducing pressure on the criminal justice system is somewhat lessened. Second, a guilty plea is a less emotive feature of cases, and may therefore 54 Forecasting criminal sentencing decisions attract a lesser subjective weight. Third, the presenceof remorse(whichmay be evidencedby a guilty plea) is accountedfor separatelyin the model. 4. Forecasting accuracy The ability of the model to forecast sentence lengths (as a proportion of the maximum penalty) in a holdout sample of. 22 additional cases was compared with that of an equal weights approach and a naive constant forecast. The equal weights regressionmodel was calibrated by first summing the standardisedmeasures for each variable. The dependent variable (proportion of maximum penalty awarded) was then regressedagainst this score (and an intercept) in the calibration sample of.67 casesand the resulting model used to predict the sentence in the remaining sample of.22 cases.The naive constant forecast represents a projection of the mean sentence imposed in the calibration sample. As a partial rationale for using unit weights, it has been suggested [Einhorn and Hogarth (1975)] that the precision of individual coefficients may be of less predictive importance than identification of the relevant variables and the signs of their respective coefficients. There have been a number of studies in which equal weight models have been found to have greater forecasting accuracy than standard regressionmodels [Lawshe and Shucker (1959), Wesman and Bennet t ( 1959) , T ra ttn e r (1 9 6 3 ), L e h m a n n (1 971), Fischer (1972), Beckwith and Lehmann (1973) and Parsons and Hulin (1982)]. Parker and Srinivasan (1976) report contradictory findings. Monte Carlo procedures have also been used to investigate when an equal weights model can be expected to perform better ihan a standard regression model [Schmidt (1971), Einhorn and Hogarth (1975), Dorans and Drasgow (1978)]. Overall the Monte Carlo results point to the efficiency of equal weights when sample sizesare small, the number of parameters is large and multicollinearity is high. In contrast, Keren and Newman (1978) note that equal weights models will not perform well when suppressor variables are present. Suppressorvariables are defined as explanatory variables which are not correlated L with the response,but which improve the prediction of the response. The current circumstances suggest that the predictive performance of the regression model will be similar to that of an equal weights model. The sample size is small relative to the number of parameters to be estimated. However, there was little evidence of multicollinearity in the data. Furthermore, while almost all of the coefficients reported in Exhibit L are significant, few of the variables had significant pairwise correlations with the response, suggestingthe presence of a number of suppressorvariables. The accuracy of the sentence predictions for the 22 holdout casesis reported in Exhibit 2. The percentage mean absolute error was calculated by dividing the mean absoluteerror by the actual mean sentence. The mean absolute error is reported assuming a 1.4-yearmaximum sentence. The mean absolute prediction error for the regressionmodel correspondsto 1.59 years for a potential l4-year sentence.This representsa percentage mean absolute error of just under 25Vo and a significant (o :0.05) improvement over the naive forecasts. Only a very slight improvement is observed over the equal weights model. Investigation of the nature of the forecast error indicates that unsystematic residual noise accounted for almost all of the regressionmodel's prediction error, with just 3Vo of.the error arising systematically[Maddala (1977)1. The decrease in retrospective and predictive efficiency between calibration and validation samplesis commonly called 'shrinkage' [Farrington and Tarling (1985)1. It arises because the product moment correlations in a sample may differ from those in the population as a whole. Modelling techniques which exploit these correlations can be expected to better fit observations in the calibration sample than observationsrandomly drawn from the population as a rvhole. The level of shrinkage across calibration and validation samples is an indication of the extent to which results estimated on the calibration sample can be generalisedto a wider population. Shrinkage between the calibration and validation samples is investigated by regressing forecasted (or fitted) sentences against actual sentencesin the calibration and validation samples. The results are presented in Exhibit 2. Perfect predictions would result in a constant of 55 D . l . S i m e s t e r ,R . J ' B r o d i e I Forecasting,criminal sentencing decisions Exhibit2 l e n g t h f o r e c a s t i n ga c c u r a c y . Sentence Regression model Mean absolute error P e r c e n t a g em e a n a b s o l u t ee r r o r 1 . 5 9y e a r Equal weights model 1 . 6 0y e a r 24.8% 25.0% Coefficient S.E. Naive mean forecast 2.21year 35.jVc R e g r e s s i o no f a c t u a i o n P r e d i c t e d Calibratiotr sarnPle Constant Fitted sentence Vo varianceexPlained 0.0000 1.0001 76Vo 67 n Validation samPle Constant Predicted sentence Vo varianceexPlained 0.0299 0.0693 0.0359 0.8577 n zero, a coefficient of one, and all of the variance in the dependentvariable being explained' These criteria ir. better satisfied on the calibration sample. However, the constant and coefficient estimated on the validation sample are not significantly different from zero and one, respectively, and the forecastsdo explain over 50Voof the variance in the validation sample sentences' 5. An application to forecasting successon appeal It is anticipated that a forecasting model of sentencing might be used by attorneys to improve the advice they give to clients regarding the likelihood of a successfulsentencing appeal' The usefulness of the present model in this regard is investigated by evaluating its ability to predict the successof the appeals in the decisions used to calibrate and validate the model. As New Zealand's highest appellate criminal court, the Court of Appeal is responsible for smoothing out disparities and ensuring the evenhandedadministration of justice. As a consequence, the Court has a discretion to grant appeals against sentence when the original sentence represents an unjustified departure from 0.0879 0.t702 56Vo 22 previous sentencing practice. In determining 'unjustified', the court has stated a'wilwhat is lingness to employ different definitions depending upon whether the appeal is brought by the Crown or the offender. When the Crown is appealing, the departure from previous practice must be greater than when the appeal is brought by the offender; considerationsjustifying an in'speak more powerfully than those crease must which ordinarily might justify a reduction' [R' v' Wihapi 1976, at 424 per McCarthy P., noted in H al l (1987); see al so R . v' B eaman 1982;and R ' v. Lepper and McDonald 19851.This distinction necessitatesincorporation of the identity of the appellant into the forecasting model, or alternitively, estimation of separate models using Crown and offender appellant subsamples'Both approachesare considered: o Ful l sampl e model : Success: G + BrDisparity + BrAppellant' o Offender and Crown subsamplemodels: Success: d. L BrDisParitY, w here: Original : original sentence length/maximum PcnaltY; Predicted : sentence length prediction/maximum P enal tY ; Forecasting criminal sentencing decisions Appellant:0, if the appeal was by the offender, and 7, if the appeal was by the Crown; Success :1, if the appeal was successful,and 0, if the appeal was unsuccessful; Disparity : Original - Predicted, if, Appellant: 0, and Predicted - Original, if Ap p e l l a n t: l . In some instancesnegative levels of disparity occurred. These measureswere not truncated to zero as appeals brought before the Court of Appeal are not limited to those with genuine merit. Some appellants may realise that there is little hope of a successfulappeal, but believe that they have little to lose from making an attempt. This is particularly true for those defendantswho have their legal costs paid by the state. Owing to the dichotomous nature of the dependent variable, the models are estimatedusing PROBIT. The coefficientsrepresent the change in the cumulative normal probability function that results from a unit change in the respective independent variables. The original sentence length was not reported in six of the cases,and the size of the dataset is thus reduced to 83 observations.The results of the full sample and offender/Crown appellant subsamples are reported in Exhibit 3. Overall the models show a high degree of statisticalsignificanceand discriminatory ability. The log likelihood ratio statisticsand all but one of the variable coefficientsare statisticallysignificant (a :0.05). The dependentvari abl e means indicate that the sentencing appeals were successfulin 42Vo,24Vo and 76Voof the full sample, and offender appellant and Crown appellant subsample observations (respectively).With no further information, appeal successcould therefore be categorised successfully in 58Vo of the full sample cases and 76Vo of both the Crown and offender appellant subsample cases. By adding the independent variables, discriminatory performance increased to 83Va accuracy across all three samples. Contrary to the Court of Appeal's stated reluctance to increase sentencesupon appeal, the parameter estimates indicate that to successfully Exhibit 3 Appeal success model. Full sample Coefficients Intercept Disparity Appellant -!.07" -0.98 (-4.6) 5.76^ (4.2) 1.L9' ( 3 . 3) (-4.3) 4.46' (3.2) Critical disparity values (14 year maximumpenalty) Offender appellant 2 . 6 0y r - 0 . 3 0y r Crown appellant R' 2 log(likelihood) D e p . v a r i a b l em e a n Correct classifications n 'Significant at a :0.01. o S i g n i f i c a n ra t a : 0 . 0 - 5 . N o l e : F i g u r e s i n p a r e n t h e s e sa r e t - r ' a l u e s . A Offender appellant subsample 0.51 '67.9" 0.42 83Vo 83 Cro*'n appellant subsample -0.95 (- r.r/ 19.32b (2.?) 3 . 0 9y r 0.73yr 0.29 46.9" 0.21 83Vc 54 0.52 15.1" 0.i6 83Vo 29 D.l. Simester. R.J. Brodie I Forecasting criminal sentencing decisions appeal a sentence, an offender must show a tgnifi.untly greater disparity between the original sentence and previous sentencing practice than is required of the Crown. For an offence carrying a maximum penalty of 14 years imprisonment, the probability of a successfulappeal by the offender will onll' exceed 507o when the original sentence is approximately three years sreater than the predicted sentence.In contrast, in upp.ul by the Crown is likely to be successful when the predicted sentence exceedsthe actual se n tenc ebY jus t nin e m o n th s . To investigate forecasting accuracy, the full samplewas split into two subsamplesand models calibrated using the data in each subsample. These models were then used to forecast the successof the appeals in the other subsampleThe distribution of the cases into each of the subsampleswas performed using a stratified random selection procedure, in order to ensure similarity between the subsamples.The parameters estimatedusing the two subsampleswere not significantly different from those estimated using the full sample. Results are presentedin Exhibit 4. The dependent variable means indicate that a constant forecast of failure would have been accuratein 57Voof the subsample1 observations and 59Vo of the subsample 2 observations.The forecasting models performed with SIVo and 88Vo accuracy, respectively. Little shrinkage was observed between the calibration and validation sample performances; indeed, the best forecasts of appeal successin the second subsamplewere obtained from the model calibrated on the first subsample. These performances represent dramatic improvements on a constant forecast, and confirm that the model is capable of accurately forecasting the outcome of sentencing appeals. Exhibit4 Appeal successmodel forecast results. Subsample 1 Dependent variable mean Correct classifications Subsample 2 Dependent variable mean Correct classifications Calibration Validation 0.43 83Vo 0.43 SIVo 0.41 85o/o 0.41 88Vo 57 6. Forecasting change in sentencein a successfulappeal As a further investigation of the predictive power of the model, the disparity between actual and predicted sentencesis used to predict the amount sentences changed in those cases in which the Court found that a change was justified. The relevanceof the appellant'sidentity is again investigatedboth by inclusion of an AppelIant variable and by estimation of separatemodels using Crown and otfender subsamples. The coefficient estimated for the Appellant variable was not significant and no significant difference was found in the parametersestimated using the Crown and offender subsamples.The results of regressingabsolute change in sentence against Disparity (as defined in Section 5) in a calibration sample of. 20 cases are presented in Exhibit 5. Forecast accuracyin a holdout sample of 15 casesis also reported, both for the disparity model and for a naive constant forecast. The Exhibit5 Sentence chanse model. Calibration results Coefficient Disparity Intercept Adj R' ^F-value 0.437 0.097 0.572 26.40^ 20 n t value 5.1" 5.4^ Forecast results Disparity model Naive mean forecast Mean absolute error Percentagemean Absolute error 0.73year 2l.6Va t.2L year 45.77o Regression of actual on predicted Calibration sample Constant Fitted sentence Eo variance explained n Validation sample Constant Predicted sentence Ea variance explained n Coefficient Stderror 0.0000 1.0000 0.0355 0.1947 59Va 20 -0.0178 0.0375 0.2067 73Va 15 r.2394 " Significant at the 997o confidence level. 58 Forecasting criminal sentencing decisions naive constant forecast represented the mean sentence change in the 20 calibration sample c as es . The coefficient estimated for the disparity variable indicates that, having decided an appeal is justified, the Court will adjust a sentenceby approximately 0.44 of a year for every year of disparity observed betrveenthe original and the predicted sentences.This result is consistentwith statements of the Court indicating that it will vary sentencesby the minimum extent necessary to redress discrepancies(see R. v. Urlich; R. v. Sirnm; and R. v. Steer). Mean absolute prediction error corresponds to 0.73 years for offences with a l4-year maximum penalty, while the percentagemean absolute error equals 27.6Vo. Predictions from the regression model were significantly (c :0.05) more precise than the naive projection. There was some evidence of shrinkage between the calibration and validation samples.Although the model explained less sentencechangevariancein the calibration sample than in the validation sample, the calibration sample predictions show no s y s t em a ti cb i a s . In th e v a l i d a ti o n th ere w as a tendency to underpredict at the lower end of the spectrum and to overpredict large sentence changes. However, this tendency was not significant; when the actual sentencewas regressedon the predicted sentence neither the constant nor the estimated coefficient were significantly different from their respectiveideals. 7. Summary and conclusions In this paper we have investigated the feasibility of developing a bootstrapping model of criminal sentencing decisions. New Zealand Court of Appeal sexual offence sentencingdecisions are modelled as a function of factors describing the character of the offender and the circumstancesof the offence. The structure of the model assumesthat the Judiciary's intuitive approach to the sentencing decision is to start with a base sentencinglevel (as a proportion of t he m ax im u m p e n a l ty ) a n d to a d d o r subtract from that level according to the aggravatingor mitigating nature of the prevailing factors. The results are encouraging. The weights estimated for the various sentencing factors are generally *"- consistent with prior expectations.Although inconsistenciesdo arise, explanationsare generally available, either suggestingmodificationsto current theory or offering justifications for the lack of corroboration. This model significantly outperforms a naive mean projection in forecastingsentencelengths in a holdout sample of 22 additional decisions.In contrast, its forecasting performance is almost identical to that of an equal weights model, indicating that predictive performance is not terribly sensitive to the relative magnitudes of the estimated coefficients. This result is consistent with previous research investigating the relative predictive performanceof regressionand equal weights models. Almost all of the prediction error resulted from unsystematic residual variance. This variance may be explained by inconsistencies in the sentencing behaviour of the judges, and from imprecision in the model. Inconsistenciesin the sentencingbehaviour of the Judiciary would correspond with observations by the judges themselvesthat the features of different casesare too complex to be reliably compared using intuition alone, Model imprecision may be due to the omission of relevant mitigating or aggravating factors; alternatively, the precision with which the model is able to predict sentence lengths should be viewed in the context of the generality of the model. The cases include a range of different offences, each of which may be associated with varied weights for the sentencing factors. Sentencepredictions generated by the model are used to forecast the likelihood of successon appeal, and to predict the change in sentence given a successfulappeal. The results confirm that bootstrapping models have the potential to assistattorneys and improve the advice they can offer cl i entsregardi ngthc w i sdom of pursuingan appeal. Moreover, by identifying the intuitive weights attached by the Judiciary to the various sentencing factors and clarifying existing practice, such models can facilitate discussion of sentencing objectives and identify the need for potenti al reform. These findings contradict claims by the Judiciary that the complexity of criminal sentencing precludesdevelopment of a useful bootstrappi ng model of sentenci ngdeci si ons.In doi ng so, D. I . Simester, R. J . Brodie I Forecasting criminal sentencing decisions the findinss demonstrate the potential contribution of bootstrapping models to a field traditiona l l l ' b e l iev ed t o f av our s o l e l y j u d g e me n ta lfo re ca sti n ga ppr oac hes . Acknowledgements Th e aut hor s ac k now l e d g eth e c o n tri b u ti o n o f Andrew Simester and wish to thank J. Scott Armstrong and three anonymous 1"/F reviewers for helpful comtnents and suggestions. Case references R. v. Beaman 1982 (16 November 7982) CA r77I 82 R. v. Clark[1987]1 NZLR 380 R. v. Lepperand McDonald 1985CA R. v. Ngawhika1987(3 July 1987)CA 91187 R . v . S t m m 1 9 8 1( 9 O c t o b e r1 9 8 1 )C A 1 4 8 / 8 1 R. v. Steer1981(7 December1981)CA 165/81 R. v. Urlich [1981]1 NZLR 310 R. v. Wihapi U97611 ttZLF. 422 R. v. Williams[1983]Crim LR 693 References Armstrong, J.S., 1985, Long-Range Forecasting:From Crystal Ball to Computer, 2nd edn. (Wiley-lnterscience, New York). Beckwith, N.E. and D.R. Lehmann, 1973."The importance of differential weights in multiple attribute models of consumer attitude", Journal of Marketing Research, 10, 141-145. B l u m s t e i n ,A . , J . C o h e n , S . E . M a r t i n a n d M . H . T o n r y , e d s . , 7983, Research on Sentencing: The Search for Reform, P a n e l o n S e n t e n c i n gR e s e a r c h ( N a t i o n a l A c a d e m y P r e s s , W a s h i n g t o n ,D C ) . Bunn, D. and G. Wright, 1991,"Interaction of judgemental and statistical forecasting methods: Issues and analysis", M a n a g e m e n tS c i e n c e , 3 7 , n o . 5 , 5 0 1 - 5 1 8 . Chiricos, T.G., P.D. Jackson and G.P. Waldo, 1972, "lnequality in the imposition of a criminal label", Social Problems,19, no. 4, 553-572. C r o y l e , J . L . , 1 9 8 3 . " M e a s u r i n g a n d e x p l a i n i n g d i s p a r i t i e si n felony sentences:Courtroom work group factors and race, sex and socioeconomic influences on sentence severity", Political Behaviour, 5, 135-153. Dorans, N. and F. Drawgow,ISTS, "Alternative weighting schemes for linear prediction" , Organizational Behaviour and Humsn Performance, 2I, 316-345. 59 Einhorn, H.J. and R.VI. Hogarth, 1975, "Unit weighting s c h e m e sf o r d e c i s i o n m a k i n g " , O r g a n i z a t i o n u l B e h a v i o u r and Human PerJ'ormance,13, IlI-192. Farrington, D.P. and R. Tarling, 1985,"Criminological prediction: An introduction", in; D.P. Farrington and R. Tarling, eds., Prediction in CriminologS, (SUNY Press, New York). F i e l d , H . S . a n d W . H . H o l l e y , 1 9 8 2 , " T h e r e l a t i o n s h i po f p e r f o r m a n c e a p p r a i s a l s y s t e m c h a r a c t e r i s t i c st o v e r d i c t s i n selected employment discrimination cases", Academy of Management Journal,2-5, no. 2, 392-406. Fischer, G.W., 1972, Four Methods for Assessing Nlultl Attribute Utilities: An Experintentol Validation (Engineeri n g P s y c h o l o g yL a b o r a t o r y , U n i v e r s i t y o f M i c h i g a n ) . Grunbaum, W.F. and A. Nervhouse, 1965, "Quantitative analysis of Judicial decisions: Some problems in prediction", Houslon Law Review,3, Z0l-220. Hall, G.G., 1987. Hall on Sentencingin New Zealand (Butterworths, \&'ellington). H a l l , E . L . a n d A . A . S i m k u s , 1 9 7 5 ," I n e q u a l i t y i n t h e t y p e s o f s e n t e n c e sr e c e i v e d b y n a t i v e A m e r i c a n s a n d w h i t e s " , Criminology, 13, no. 2, 199-222. Hills, S.L. 197I, Crime, Power, and Morallty (Chandler, Scranton, PA). Keren, G. and J.R. Newman, 1978, "Additional considerations with regard to multiple regression and equal weighting", Organizational Behaviour and Human Performance. 22, r43-t64. senti:ncKleck,G., 1981,"Racialdiscrimination in criminal ing: A critical evaluation of the evidencewith additional evidence on the death penalty", American Sociological Review, 46, 783-805. K o r t , F . , 1 , 9 5 7 ," P r e d i c t i n g S u p r e m e C o u r t d e c i s i o n sm a t h e 'Right m a t i c a l l y : A q u a n t i t a t i v e a n a l y s i so f t h e to Counsel' cases", American Political Science Review, 5L, 1-lZ. K o r t , F . , 1 9 6 3 , " C o n t e n t a n a l y s i so f J u d i c i a l o p i n i o n s a n d rules of law", in: G. Schubert, eds.. Judicial DecisionMaking (The Free Press of Glencoe, New York), 133-197. Lawlor, R.C., 1962, "Mathematical aids to prediction of court decisions", Proc. Second National Law and Electronics Conference. Lawshe, C.H. and R.E. Shucker, 1959, "The relative efficiency of four test weighting methods in multiple regression", Educational and Psychological Measurement, 19, 1 0 3 - 11 4 . L e h m a n n , D . R . , 1 9 7 1 , " T e l e v i s i o n s h o w p r e f e r e n c e :A p p l i cation of a choice model", Journal of Marketing Research, 8,47-55. Lovegrove, A., 1989, Judicial Decision Making, Sentencing Policy, and Numerical Guidance (Springer-Verlag, New York). M a d d a l a , G . S . , 1 9 7 7 , E c o n o m e t r i c s( M c G r a w - H i l l , L o n d o n ) . Nagel, S., 1965, "Predicting court cases quantitatively", Michigan Law Review, 63, I4ll-1422. Parker, B.R. and V. Srinivasan, 1976, "A consumer preference approach to the planning of rural primary healthc a r e f a c i l i t i e s " , O p e r a t i o n sR e s e a r c h , 2 4 , n o . 5 , 9 9 1 - 1 0 2 5 . P a r s o n s ,C . K . a n d C . L . H u l i n , l 9 S 2 , " D i f f e r e n t i a l l y w e i g h t ing linear models in organizational research: A crossvalidation comparison of four methods", Organizational Behaviour and Human Performance, 30,289-311. Brodie I Forecasting criminal sentencing decisions Phillipson, M., 1,971, Sociological Aspects of Crime and Delinquency (Routledge and Kegan Paul, London). Quinney, R., 1973, Critique of Legal Order (Little Brown, Boston). Schmidt, F.L., 1971, "The relative efficiency of regression and simple unit predictor weights in applied differential psychology", Educational and Psychological Measurement, 37,699-714. Segal, J.A., 1984, "Predicting Supreme Court casesprobabilistically: The search and seizure cases, 1962-198I" . The American Political ScienceReview, 78, 891-900. Spaeth, J., 1963, "An analysisof Judicial attitudes in the labor relations decisions of the Warren Court", Journal of Polirics, 25, 290-311. Spier, P. and F. Luketina, 1988, The Impact on Sentencingof the Criminal Justice Act 1985 (Department of Justice, Wellington). S p o h n , C . , S . W e l c h a n d J . G r u h l , 1 9 8 5 ," W o m e n d e f e n d a n t s in court: The interaction between sex and race in convicting and sentencing", Social ScienceQuarterly,66, 178-185. Sutton, L.P., 1978, Variations in Federal Criminal Sentences: Statistical Assessment at the National Level (Criminal Justice Research Centre, New York). T a l a r i c o , S . M . , 1 9 7 9 , " J u d i c i a l d e c i s i o n sa n d s a n c t i o n p a t terns in criminal justice", The Journal of Criminal Law and Criminolo gy, 70, 1.I'7-I24. T h o r n b e r r v , T . P . , 7 9 7 3 , " R a c e , s o c i o e c o n o m i cs t a t u s a n d sentencing in the juvenile justice system", Journal of Criminal Law and Criminology, 6.1, 90-98. Trattner, M.H., 1963, "Comparison of three methods for assembling aptitude test batteries", Personnel Psychology, 76,Z2t-232.. Turk, A.T., 1969, Criminality and the Legal Order (Rand McNally, Chicago). Ulmer, S.S., 1963, "Quantitative analysis of judicial processes:Some practical and theoretical applications" , Journal of Law and Contemporary Issues,28, 164-184. L Ulmer, S.S., 1967, "Mathematical models for predicting sentencing", in: J.L. Bernd, ed., Mathemotical Applications in Polilical Science, Vol. III (The University Press of Virginia, Charlottesville), 67*95. U l m e r , S . S . , i 9 6 9 , " T h e d i s c r i m i n a n tf u n c t i o n a n d a t h e o r e t i c a l c o n t e x t f o r i t s u s e i n e s t i m a t i n gt h e v o t e s o f j u d g e s " , i n : J.B. Grossman and J. Tanenhaus, eds., Frontiers of Judicial Researclr (Wiley, New York), 335-369. Ulmer, S.S., 1984, "The Supreme Court's certiorari decisions: Conflict as a predictive variable", American Polilical S c i e n c eR e v i e w , 7 8 , 9 0 1 - 9 1 1 . Vining, A.R. and C. Dean, 1980, "Towards sentencing uniformity: Integrating the normative and the empirical orientation", in: B.A. Grosman, eds., New Directionsin Sentencing (Butterworth, Toronto), 117-154. Wesman, A.G. and G.K. Bennett, 1959, "Multiple regression vs. simple addition of scores in prediction of college grades", Educational and Psychological Measurement, 19, 243-246. Zatz, M.5., 1985, "Pleas, priors and prison: Racial/Ethnic differences in sentencing", Social SciencesResearch, 11, 169-193. Biographies: Roderick J. BRODIE is Professor of Marketing and Head of Department of Marketing and International Business, University of Auckland, New Zealand. His research interests are in the areas of marketing science and forecasting. His publications have appeared in the Journal of Marketing Research, International Journal of Forecasting and lournal of Business Research and other journals. Duncan L SIMESTER is completing his Ph.D. in the Sloan S c h o o l o f M a n a g e m e n t , M a s s a c h u s e t t sI n s t i t u t e o f T e c h n o l ogy. His research interests are in the areas of marketing science and forecastins. L
© Copyright 2026 Paperzz