The Surgical Mortality Probability Model

ORIGINAL ARTICLE
Derivation and Validation of a Simple Risk Prediction Rule for Noncardiac Surgery
Laurent G. Glance, MD,∗ † Stewart J. Lustik, MD,∗ Edward L. Hannan, PhD,‡ Turner M. Osler, MD,§
Dana B. Mukamel, PhD,¶ Feng Qian, MD,PhD,∗ and Andrew W. Dick, PhD||
Objective: To develop a 30-day mortality risk index for noncardiac surgery
that can be used to communicate risk information to patients and guide clinical
management at the “point-of-care,” and that can be used by surgeons and
hospitals to internally audit their quality of care.
Background: Clinicians rely on the Revised Cardiac Risk Index to quantify
the risk of cardiac complications in patients undergoing noncardiac surgery.
Because mortality from noncardiac causes accounts for many perioperative
deaths, there is also a need for a simple bedside risk index to predict 30-day
all-cause mortality after noncardiac surgery.
Methods: Retrospective cohort study of 298,772 patients undergoing noncardiac surgery during 2005 to 2007 using the American College of Surgeons
National Surgical Quality Improvement Program database.
Results: The 9-point S-MPM (Surgical Mortality Probability Model) 30-day
mortality risk index was derived empirically and includes three risk factors:
ASA (American Society of Anesthesiologists) physical status, emergency
status, and surgery risk class. Patients with ASA physical status I, II, III, IV
or V were assigned either 0, 2, 4, 5, or 6 points, respectively; intermediate- or
high-risk procedures were assigned 1 or 2 points, respectively; and emergency
procedures were assigned 1 point. Patients with risk scores less than 5 had
a predicted risk of mortality less than 0.50%, whereas patients with a risk
score of 5 to 6 had a risk of mortality between 1.5% and 4.0%. Patients
with a risk score greater than 6 had a risk of mortality more than 10%. S-MPM exhibited excellent discrimination (C statistic, 0.897) and acceptable
calibration (Hosmer-Lemeshow statistic 13.0, P = 0.023) in the validation
data set.
Conclusions: Thirty-day mortality after noncardiac surgery can be accurately
predicted using a simple risk score based on information readily
available at the bedside. This risk index may play a useful role in facilitating
shared decision making, developing and implementing risk-reduction strategies, and guiding quality improvement efforts.
(Ann Surg 2012;255:696–702)
From the Departments of ∗Anesthesiology and †Community and Preventive
Medicine, University of Rochester School of Medicine, Rochester, NY; ‡School
of Public Health, Department of Health Policy, Management and Behavior, Albany, NY; §Department of Surgery, University of Vermont Medical College,
Burlington, VT; ¶Center for Health Policy Research, University of California,
Irvine, CA; and ||RAND, Pittsburgh, PA.
Disclosure: The authors declare no conflicts of interest.
Supplemental digital content is available for this article. Direct URL citations
appear in the printed text and are provided in the HTML and PDF versions of
this article on the journal’s Web site (www.annalsofsurgery.com).
Reprints: Laurent G. Glance, MD, Department of Anesthesiology, University of
Rochester Medical Center, 601 Elmwood Avenue, Box 604, Rochester, NY
14642. E-mail: [email protected].
Copyright © 2012 by Lippincott Williams & Wilkins
ISSN: 0003-4932/12/25504-0696
DOI: 10.1097/SLA.0b013e31824b45af
For more than 20 years, clinicians have used the Goldman Index,1
and its successor, the Revised Cardiac Risk Index (RCRI), to
quantify the risk of cardiac complications and cardiac mortality
in patients scheduled to undergo noncardiac surgery.2 Estimates of
patient risk based on the RCRI are used in conjunction with AHA
guidelines to implement cardiac risk-reduction strategies for preoperative patients.3 However, the RCRI was not designed, nor should it
be used, to predict all-cause mortality in surgical patients.3 Mortality
from noncardiac causes accounts for a large portion of perioperative
deaths.4 In addition to the RCRI, we need a global measure of surgical risk to guide the clinical care of noncardiac surgical patients and
make true informed consent more feasible. Like the RCRI, such a
risk index needs to be based on readily available clinical data, simple
enough to implement at the bedside, and robust enough to convey
accurate risk information to patients and clinicians.
Although many models have been developed to estimate the
risk of all-cause mortality in patients undergoing noncardiac surgery,
no simple mortality risk score has been implemented in the United
States.5–9 The American College of Surgeons (ACS) has spearheaded
a national effort to benchmark general surgical outcomes in US
hospitals.10 However, the ACS National Surgical Quality Improvement Program (NSQIP) model is too complicated to use at the
bedside.11 Furthermore, because of the data collection burden and
cost of participation, only 3% of US hospitals currently participate
in the ACS NSQIP.12 By comparison, hospital report cards based
on universally available administrative data using prediction models developed by the Agency for Healthcare Research and Quality
(AHRQ)13 are prominently featured on the Web sites of many third-party payers. Unfortunately, models based on administrative data may
generate biased measures of hospital performance due to poor data
quality14,15 and, like the ACS model, are also too complicated to use to
risk stratify individual patients at the bedside. In contrast to the ACS
and AHRQ surgical mortality prediction models, the RCRI is based
on easily obtainable clinical data and can be rapidly calculated at the
bedside. However, the RCRI was created to estimate cardiac risk and
does not accurately predict overall mortality risk.16 Thus, the RCRI
cannot be used to assess the overall risk of mortality for individual
patients undergoing noncardiac surgery, or to perform hospital and
physician benchmarking.
We sought to develop a simple risk index that can be used to
communicate risk information to patients and guide clinical management at the “point-of-care,” and that can be used by surgeons and
hospitals to internally audit their quality of care. Using clinical data
from the ACS NSQIP, our objective was to develop a simple risk
score for noncardiac surgical patients, which could be easily implemented without sacrificing predictive accuracy. In creating this risk
score, we had 3 goals. First, the risk score should be based on readily
available clinical data and should not require intensive data collection
resources. Second, the risk score should be simple enough to use at
the bedside to estimate risk of mortality without the use of a calculator. And third, this prediction model should be accurate enough
to be used by hospitals and physicians with limited data collection
resources to internally audit their outcomes. With respect to performance measurement, our goal is not to replace the NSQIP model, but
rather to provide a reasonable alternative for nonpublic performance
measurement when hospital participation in the ACS NSQIP is not
feasible due to cost considerations.
Annals of Surgery • Volume 255, Number 4, April 2012
Copyright © 2012 Lippincott Williams & Wilkins. Unauthorized reproduction of this article is prohibited.
METHODS
Data Source
This study used data from the ACS NSQIP database for patients undergoing noncardiac surgery between 2005 and 2007. This
database includes information on patient demographics, functional
status, admission source, preoperative risk factors, intraoperative variables, 30-day postoperative outcomes, and predicted probability of
30-day mortality (based on the ACS NSQIP model) for patients undergoing major surgery in more than 200 participating hospitals.17
Hospital demographic information is not available in the research
data file. A systematic sampling strategy is used to avoid bias in case
selection and to ensure a diverse surgical case mix.17 Trained surgical
clinical reviewers collect patient data from the medical chart, operative log, anesthesia record, interviews with the surgical attending,
and telephone interviews with the patient. Data quality is ensured
through comprehensive training of the nurse reviewers, and through
an interrater reliability audit of participating sites. The University of
Rochester School of Medicine institutional review board approved
this study after expedited review (Rochester, NY).
Study Population and Outcomes
We first identified 322,389 records for patients who underwent noncardiac surgery using current procedural terminology codes.
We excluded patients who received no anesthesia, local anesthesia,
or monitored anesthesia care (20,490); patients who were missing
American Society of Anesthesiologists Physical Status (ASA PS)
codes (158); and patients who were in coma or mechanically ventilated (2969). The study cohort consisted of 298,772 patients.
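The exclusion counts can be checked against the reported cohort size with simple arithmetic. The sketch below uses only the figures given in the text; the variable names are illustrative and not part of the study.

```python
# Counts reported in the text; variable names are illustrative only.
initial_records = 322_389
excluded = {
    "no, local, or monitored anesthesia care": 20_490,
    "missing ASA PS code": 158,
    "coma or mechanical ventilation": 2_969,
}

# Subtract every exclusion from the initial record count.
study_cohort = initial_records - sum(excluded.values())
print(study_cohort)
```

The result agrees with the study cohort of 298,772 patients stated above.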
Development and Validation of Risk Score
The outcome of interest was 30-day mortality. We elected not
to create a comprehensive prediction model because the complexity of the resulting risk score would have rendered it impractical for
clinical use. We based our selection of risk factors on clinical judgment and our review of the literature. Because our objective was to
create a parsimonious model to use as the basis for a risk score, we
selected 3 components for the risk model: (1) ASA PS (I, II, III, IV,
or V); (2) surgery-specific risk (low, intermediate, or high risk); and
(3) emergent versus nonemergent operation. We used the ASA PS
as a summary measure of baseline patient risk because it is highly
correlated with other preoperative clinical risk predictors18 and has
been shown to be a very strong predictor of outcomes19–23 (Table 1).
A recent study designed to examine the predictive ability of parsimonious models based on ACS NSQIP data found that ASA PS was
one of the 5 key clinical predictors, and that the limited model had
similar statistical performance to the full model.23
TABLE 1. ASA PS Classification
ASA PS   Definition
I        A normal healthy patient
II       A patient with mild systemic disease
III      A patient with severe systemic disease
IV       A patient with severe systemic disease that is a constant threat to life
V        A moribund patient who is not expected to survive without the operation
We used a 3-stage approach to construct a risk score for surgical mortality.24,25 In the first stage, we used an empiric data-driven
approach to classify surgical procedures into low-, intermediate-, and
high-risk procedures. By comparison, risk scores, such as the RCRI,2
typically assign specific surgical procedures to risk categories based
on expert opinion.5,7 First, we grouped current procedural terminology codes for similar procedures into categories (eg, pulmonary
resection) and then created dummy variables for each of the procedure
groups. Second, we estimated a logistic regression model using ASA
PS, emergency status, and each of the procedure dummy codes as
explanatory variables. Third, we assigned each procedure to 1 of the
3 mutually exclusive risk categories (low, intermediate, and high risk)
based on the estimated regression coefficient for each procedure (Supplemental Digital Content 1–3: Appendices 1a, 1b, and 1c, available
at http://links.lww.com/SLA/A223, http://links.lww.com/SLA/A224,
and http://links.lww.com/SLA/A225, respectively), as opposed to categorizing them using crude mortality rates.
In the second stage, we reestimated a logistic regression model
using ASA PS, surgery risk category, and emergency status as explanatory variables for 30-day mortality. We then assigned points to
the levels of each risk factor based on the estimated coefficients. For
each risk factor, the base category was assigned 0 points (ASA PS I,
low-risk surgery, and nonemergent surgery). We then rounded the estimated coefficients to the nearest whole number to obtain the points
associated with each of the risk factor levels. For example, because
the estimated coefficient for ASA PS II was 2.009, patients with ASA
PS II were assigned 2 points.25
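The rounding step can be reproduced directly from the coefficients reported in Table 4. The following is a minimal sketch, not the authors' STATA code; the dictionary layout and names are assumptions made for illustration.

```python
# Coefficients reported in Table 4 (derivation data set); base categories are 0.
coefficients = {
    "ASA PS": {"I": 0.0, "II": 2.009, "III": 3.827, "IV": 5.062, "V": 6.366},
    "Procedure risk": {"Low": 0.0, "Intermediate": 1.250, "High": 2.065},
    "Emergency": {"Nonemergent": 0.0, "Emergency": 0.934},
}

# Round each coefficient to the nearest whole number to obtain its points.
# (Python's round() resolves exact .5 ties to the even integer; none occur here.)
points = {
    factor: {level: round(beta) for level, beta in levels.items()}
    for factor, levels in coefficients.items()
}

for factor, levels in points.items():
    print(factor, levels)
```

The resulting points match the scoring system in Table 7.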
In the third stage, we summed up the points for each patient. We
then estimated a final logistic regression model in which patient score
was the sole explanatory variable used to predict 30-day mortality.
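In symbols, the third-stage model can be written as a logistic regression on the point total alone. The symbols \(\alpha\) (intercept) and \(\gamma\) (coefficient on the score) are notation assumed here for illustration; the text does not report their values.

```latex
\[
\operatorname{logit}(P) \;=\; \ln\frac{P}{1-P} \;=\; \alpha + \gamma \cdot \text{score},
\qquad
P \;=\; \frac{1}{1 + \exp\bigl(-(\alpha + \gamma \cdot \text{score})\bigr)}
\]
```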
The data set was randomly split (50:50) into a derivation data
set and a validation data set. The risk score was developed entirely
using the derivation data set. The multivariate model and risk score
estimated in the derivation data set were then cross-validated in the
validation data set using measures of discrimination and calibration.
Fractional polynomials were used to verify the linearity of the association between the risk score and the logit of 30-day mortality.26 The
performance of the ACS NSQIP model was evaluated in the validation data set for general surgical and vascular patients using the ACS
NSQIP probability-of-death (POD) present in the database.
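A random 50:50 split of this kind can be sketched as follows. This is an illustration only: the function name, the seed, and the use of Python's `random` module are assumptions, and the authors performed the split in STATA (their two halves were not exactly equal in size).

```python
import random


def split_derivation_validation(record_ids, seed=0):
    """Randomly split record identifiers 50:50 into derivation and validation sets."""
    shuffled = list(record_ids)
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Toy usage with 10 hypothetical record identifiers.
derivation, validation = split_derivation_validation(range(10))
print(len(derivation), len(validation))
```

The risk score would then be developed on the first half only and evaluated on the second.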
Data management and statistical analyses were performed using STATA SE/MP version 11 (STATA Corp, College Station, TX).
All statistical tests were 2-tailed and P values less than 0.05 were considered significant. We used robust variance estimators to account for
the nonindependence of observations within hospitals.27 Model discrimination was assessed using the C statistic, and model calibration
was assessed using the Hosmer–Lemeshow statistic and calibration
plots.28
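For readers unfamiliar with these two measures, the sketch below shows one plain-Python way to compute them. It is not the authors' STATA implementation: the function names are hypothetical, the C statistic uses a simple pairwise comparison that is only practical for small samples, and the Hosmer-Lemeshow groups are formed by sorting on predicted risk and cutting into equal-sized chunks (ties in predicted risk may be split across groups).

```python
def c_statistic(y_true, y_prob):
    """Probability that a randomly chosen death has a higher predicted risk
    than a randomly chosen survivor (ties count as one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    concordant = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return concordant / (len(pos) * len(neg))


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow chi-square statistic over equal-sized risk groups."""
    ranked = sorted(zip(y_prob, y_true))
    n = len(ranked)
    statistic = 0.0
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)  # expected deaths in the group
        observed = sum(y for _, y in chunk)  # observed deaths in the group
        variance = expected * (1 - expected / size)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic


# Toy data: 1 = died within 30 days, 0 = survived.
print(c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

A P value for the Hosmer-Lemeshow statistic is conventionally obtained from a chi-square distribution with (groups − 2) degrees of freedom; that step is omitted here to keep the sketch dependency-free.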
RESULTS
Patient demographics are displayed in Table 2. The overall
mortality rate for the study cohort was 1.34%. The median age was
55 with an interquartile range between 42 and 68. Nearly 60% of
the patients were male. The majority of the patients were ASA PS II
(47%) or ASA PS III (37%), with the remainder distributed across
ASA PS I (11%), Class IV (5.4%), and Class V (0.15%). Nearly
30% underwent either an intermediate-risk (12%) or high-risk (17%)
surgical procedure. Approximately 6% had a history of previous open-heart surgery and 5% had undergone angioplasty. The prevalence of diabetes
was 14%, and 12% were morbidly obese. Figure 1 and Table 3 show
mortality as a function of ASA PS and surgery-specific risk.
TABLE 2. Preoperative Risk Factors of the Study Cohort and of the S-MPM Classes
                              Overall        S-MPM Class I   S-MPM Class II        S-MPM Class III
                                             (Low Risk)      (Intermediate Risk)   (High Risk)
                              N = 298,772    N = 229,779     N = 55,882            N = 13,111
ASA PS
  I                           10.8           14.1            0                     0
  II                          47.0           60.1            0                     0
  III                         36.7           25.9            81.2                  36.6
  IV                          5.39           0               14.7                  60.2
  V                           0.15           0               0.03                  3.24
Procedure risk
  Low risk                    70.4           87.6            16.2                  0.10
  Intermediate risk           12.3           5.87            40.2                  6.66
  High risk                   17.3           6.54            43.6                  93.2
Urgency
  Emergency                   12.9           9.15            14.8                  69.8
Demographics
  Age∗                        55 (42,68)     51 (39,63)      66 (56,76)            70 (58,79)
  Male                        58.1           61.9            44.9                  46.9
Cardiac
  Angina                      0.78           0.40            1.79                  3.12
  Myocardial infarction       0.59           0.15            1.52                  4.19
  Congestive heart failure    0.85           0.20            2.13                  6.73
  Open-heart surgery          6.06           3.23            15.1                  17.2
  PCI                         5.14           3.00            12.2                  12.5
  HTN                         44.3           37.1            68.0                  69.5
Pulmonary
  COPD                        4.30           2.17            10.5                  15.2
  Pneumonia, current          0.31           0.08            0.64                  2.99
  Dyspnea-at-rest             1.13           0.43            2.42                  7.78
Renal
  Renal failure               2.08           0.77            5.13                  11.9
Neurologic
  TIA                         2.89           2.14            5.33                  5.67
  CVA                         2.36           1.34            5.27                  7.98
Other
  Diabetes                    14.2           10.5            26.3                  28.3
  Underweight                 2.31           1.65            4.15                  6.16
  Morbid obesity              11.7           13.1            7.37                  6.00
Observed mortality rate       1.34           0.21            2.80                  35.6
Values are expressed as % unless otherwise stated.
∗Median, interquartile range.
FIGURE 1. The observed mortality rate as a function of American Society of Anesthesiologists’ physical status and surgery-specific risk.
The baseline logistic regression model, which included each of
the risk factors coded as categorical variables (ASA PS, emergency
status, and surgical risk), demonstrated excellent discrimination
and acceptable calibration (Table 4). The C statistic, a measure
of discrimination, was 0.902 in the derivation data set and 0.900
in the validation data set. The Hosmer–Lemeshow statistic was
12.7 (P = 0.026) in the derivation data set and 10.5 (P = 0.063) in the
validation data set. These values for the Hosmer–Lemeshow statistic reflect acceptable calibration given the large sample sizes of the
derivation and validation data sets, and the recognized sensitivity of
the Hosmer-Lemeshow statistic to sample size.29 On the basis of the
regression coefficients estimated using the derivation data set, a point
value was assigned to each of the risk factors (Table 5). The total
score was then calculated for each patient by summing points for
each of the 3 predictors. The final logistic regression model, which
included only total score as an explanatory variable, also exhibited excellent discrimination and acceptable calibration. The C statistic was
0.899 in the derivation data set and 0.897 in the validation data set.
The Hosmer-Lemeshow statistic was 5.53 (P = 0.35) in the derivation data set and 13.0 (P = 0.023) in the validation data set. Visual
inspection of the calibration plot, and comparisons of the observed
and predicted mortality rates for each score level (Table 5) indicate very good model calibration. We created 3 S-MPM (Surgical
Mortality Probability Model) classes to facilitate bedside determination of the approximate risk of all-cause mortality: Class I <0.50%;
Class II 1.5%–4.0%; and Class III >10% based on point totals
(Table 6).
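The class assignment amounts to a lookup on the point total. A minimal sketch, using the cut points from Table 6 (the function name and return format are illustrative assumptions):

```python
def smpm_class(point_total):
    """Map an S-MPM point total (0-9) to its class and approximate mortality risk."""
    if not 0 <= point_total <= 9:
        raise ValueError("S-MPM point totals range from 0 to 9")
    if point_total <= 4:
        return "I", "<0.50%"
    if point_total <= 6:
        return "II", "1.5%-4.0%"
    return "III", ">10%"


print(smpm_class(5))
```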
TABLE 3. Observed Mortality Percent/Number at Risk as a Function of ASA PS and Surgery Risk Category
Surgery Risk Category   ASA PS I       ASA PS II      ASA PS III     ASA PS IV     ASA PS V
Low risk                0.03/15,226    0.08/56,980    0.49/32,200    3.31/2420     7.69/13
Intermediate risk       0.65/459       0.52/6673      2.21/12,338    7.15/2419     32.7/49
High risk               0.00/427       0.75/6306      4.81/10,267    19.2/3251     54.2/144
TABLE 4. Logistic Regression Model Used to Assign Points to Risk Factors in S-MPM
Risk Factor             Coefficient   95% CI            P         Odds Ratio
ASA physical status
  I                     Reference
  II                    2.009         (0.861, 3.157)    0.001     7.46
  III                   3.827         (2.687, 4.968)    <0.001    45.9
  IV                    5.062         (3.920, 6.206)    <0.001    158
  V                     6.366         (5.194, 7.539)    <0.001    582
Procedure risk
  Low risk              Reference
  Intermediate risk     1.250         (1.082, 1.419)    <0.001    3.49
  High risk             2.065         (1.921, 2.210)    <0.001    7.89
Emergency
  Nonemergent           Reference
  Emergency surgery     0.934         (0.829, 1.039)    <0.001    2.54
TABLE 5. Goodness-of-Fit of the S-MPM Risk Score in the Validation Sample
Point Total   n        Observed Mortality, %   Estimated Mortality, %   Mortality, % (Rounded)
0             11,313   0.035                   0.009                    0.01
1             4245     0                       0.024                    0.02
2             49,810   0.072                   0.067                    0.07
3             12,366   0.27                    0.19                     0.2
4             36,951   0.48                    0.52                     0.5
5             14,354   1.67                    1.46                     1.5
6             13,555   3.98                    3.98                     4.0
7             4762     10.4                    10.4                     10
8             1677     25.3                    24.6                     25
9             139      56.8                    47.8                     50
Of the 149,172 patients in the validation data set, 144,404 had
an ACS NSQIP predicted POD in the database. The performance of
ACS NSQIP was compared to S-MPM in this subset of the validation data set. The discrimination of S-MPM (C statistic, 0.897) was
slightly worse than that of ACS NSQIP (C statistic, 0.935). S-MPM was better calibrated (Hosmer–Lemeshow statistic 11.8; P = 0.04) than the ACS NSQIP model
(Hosmer–Lemeshow statistic 25.3; P < 0.001) (Fig. 3).
To illustrate the application of S-MPM to calculate the POD, we
show a specific example of the correspondence between the mortality
prediction estimated by the full logistic regression model and the
mortality estimate based on the risk index:
Case. A 60-year-old ASA PS III patient undergoing an elective
cholecystectomy.
Risk Factor        Value          Points
ASA PS             ASA III        4
Procedure risk     Intermediate   1
Emergency          Non-emergent   0
Point total                       5
Estimate of risk                  1.5%
Estimating the probability of mortality using the full logistic
regression model (Table 4):
\[ P = \frac{1}{1 + \exp\left(-\sum_i \beta_i X_i\right)} = 1.6\% \]
\[ \sum_i \beta_i X_i = -9.18 + 3.83(1) + 1.25(1) \]
Using the point system (Table 7), the estimated risk of mortality
is 1.5% versus 1.6% for the full logistic regression model.
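The full-model estimate in the worked example can be reproduced numerically. The sketch below uses the intercept and coefficients quoted in the example (−9.18, 3.83 for ASA PS III, and 1.25 for intermediate-risk surgery); the variable names are illustrative.

```python
import math

# Values quoted in the worked example (rounded coefficients from Table 4).
intercept = -9.18
beta_asa_iii = 3.83        # ASA PS III
beta_intermediate = 1.25   # intermediate-risk procedure
# A nonemergent operation is the reference category, so it contributes 0.

linear_predictor = intercept + beta_asa_iii * 1 + beta_intermediate * 1
probability = 1 / (1 + math.exp(-linear_predictor))

print(f"{probability:.1%}")
```

The linear predictor is −4.10, which corresponds to a predicted mortality of about 1.6%, in line with the 1.5% given by the point system.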
DISCUSSION
We have used a large multicenter database to create a simple
scoring system for predicting all-cause mortality in patients undergoing noncardiac surgery. Our scoring system requires that clinicians
determine only 3 risk factors—ASA PS, surgical risk category, and
emergency status—to predict 30-day all-cause mortality for noncardiac surgical patients with a high degree of accuracy. We believe that
this risk index is simple enough to implement at the patient’s bedside
and can be used to help inform clinical decision-making. By providing an estimate of overall mortality risk, S-MPM complements the
preoperative assessment of cardiac risk obtained using the RCRI, and
may provide a framework for the development and implementation of
risk-reduction strategies based on all-cause mortality. The additional
information provided by S-MPM may also facilitate informed consent
and shared decision making between patients and their physicians.30
Finally, our risk score may prove useful to hospitals which lack the
resources to participate in ACS NSQIP, but nonetheless want to assess their risk-adjusted outcomes after noncardiac surgery for quality
improvement.
FIGURE 2. Calibration graph for S-MPM. The solid line is the
predicted mortality, and the open circles represent observed
mortality rates. Vertical bars represent 95% confidence intervals for the observed mortality rates.
TABLE 6. S-MPM Class Levels and Associated Risk of Mortality
Class   Point Total   Mortality
I       0–4           <0.50%
II      5–6           1.5%–4.0%
III     7–9           >10%
FIGURE 3. Calibration graph for S-MPM versus ACS NSQIP
models. The black line represents perfect calibration; observed
and predicted mortality are identical.
TABLE 7. S-MPM Scoring System for Estimating Risk of 30-Day Mortality After Noncardiac Surgery
Risk Factor             Points Assigned
ASA physical status
  I                     0
  II                    2
  III                   4
  IV                    5
  V                     6
Procedure risk
  Low risk              0
  Intermediate risk     1
  High risk             2
Emergency
  Nonemergent           0
  Emergency surgery     1
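The scoring system in Table 7 can be expressed as a short lookup. The following is a minimal sketch: the names are hypothetical, and the rounded mortality estimates are copied from the last column of Table 5.

```python
# Points from Table 7.
ASA_POINTS = {"I": 0, "II": 2, "III": 4, "IV": 5, "V": 6}
PROCEDURE_POINTS = {"low": 0, "intermediate": 1, "high": 2}
EMERGENCY_POINT = 1

# Rounded estimated mortality (%) by point total 0-9, from Table 5.
ROUNDED_MORTALITY_PCT = [0.01, 0.02, 0.07, 0.2, 0.5, 1.5, 4.0, 10, 25, 50]


def smpm_score(asa_ps, procedure_risk, emergency):
    """Sum the points for ASA PS, procedure risk category, and emergency status."""
    return (
        ASA_POINTS[asa_ps]
        + PROCEDURE_POINTS[procedure_risk]
        + (EMERGENCY_POINT if emergency else 0)
    )


# The case from the text: ASA PS III, intermediate-risk procedure, elective.
score = smpm_score("III", "intermediate", emergency=False)
print(score, ROUNDED_MORTALITY_PCT[score])
```

For the case described in the text, this gives a point total of 5 and a rounded mortality estimate of 1.5%.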
The ASA PS is the most important risk factor in S-MPM. The
relationship between ASA PS and mortality was recognized early
on by Dripps and colleagues19 and by other investigators.20–22,31,32
Despite this finding, the ASA PS was never designed to “prognosticate the effect of a surgical procedure.”33 Yet, 7 decades after it was
introduced,33 the ASA PS remains one of the most important single
predictors of mortality and morbidity for general surgery.23,34,35 One
of the major strengths of the ASA PS is its simplicity. But this
simplicity is also its greatest limitation. Specifically, the lack of precise definitions for each of the ASA
PS levels can result in inconsistent ratings.36–38 Different physicians
may not always agree on whether a patient should be classified as
ASA II (a patient with mild systemic disease) or ASA
III (a patient with severe systemic disease). However, despite
the subjective nature of the ASA PS, it is one of the key risk factors
in both the ACS NSQIP and VA NSQIP prediction models, and it is
also 1 of the 5 risk factors in the parsimonious models proposed to
replace the more comprehensive current models employed by ACS
NSQIP.23 Furthermore, predictions of ASA PS classes using objective
NSQIP risk variables were found to correlate strongly with assignments made by anesthesiologists, further underlining the objective
basis of the ASA PS classification system.18
To our knowledge, only 2 other surgical risk indices based
on the ASA PS have been proposed, and neither has been widely
incorporated into clinical practice. The first, published more than 20
years ago, was developed using a cohort of 2055 cases. It predicted
major complications occurring within the first 24 hours after surgery,
as opposed to 30-day mortality. This risk score was based on patient
age, ASA PS score, emergency status, and surgical risk.39 The second,
the Surgical Risk Scale (SRS) for in-hospital mortality, was developed
using a cohort of 4903 patients treated by 3 surgeons.7 This risk index
was based on ASA PS, surgical urgency, and surgery-specific risk.
The major limitation of SRS is that it is based on a relatively small
patient cohort and may not be readily generalizable to a US patient
population. In contrast, S-MPM was developed and validated using
a cohort of 298,772 patients undergoing surgery at more than 200
centers.
Compared to the ACS NSQIP mortality model, S-MPM has
slightly worse discrimination and marginally better calibration. Based
on only 3 predictors, S-MPM exhibits excellent discrimination with
a C statistic of 0.90, compared to the 35-variable ACS NSQIP risk
adjustment model, which has a C statistic of 0.94. Recently, Dimick
and colleagues23 have developed parsimonious 2-variable procedure-specific models also based on the ACS NSQIP database. These models
had C statistics ranging between 0.73 and 0.92. The goal of creating
these models was to reduce the burden of data collection and make
participation in the ACS NSQIP more affordable. However, unlike
S-MPM, these risk-adjustment models are not risk scores that can
be easily implemented at the bedside. Furthermore, these models
are only applicable to a limited set of procedures: cholecystectomy,
ventral hernia repair, gastric bypass, pancreatectomy, and colectomy.
Finally, because each of these procedure-specific models has unique
coefficients and different variables, no single risk score could be
created to quantify mortality risk based on these separate procedure-specific models.
Gawande et al40 proposed the Surgical Apgar Score to predict
major postoperative complications and mortality for patients undergoing general or vascular surgery.11,41–43 This risk index incorporates
heart rate, blood pressure, and estimated blood loss. This surgical
outcome score was initially developed and validated at 2 major
academic centers in the United States. The Surgical Apgar Score was
subsequently tested at 8 international sites serving as pilot sites for a
surgical quality improvement program sponsored by the World Health
Organization.43 As a standalone risk index, it exhibits
moderately good discrimination and conveys important prognostic
information. One of the primary advantages of the Surgical Apgar
Score is its simplicity. Its primary limitation is the potential for measurement variability, especially with respect to estimated blood loss.40
Unlike S-MPM, the Surgical Apgar Score was not designed for performance benchmarking41 because it does not adjust for preoperative
risk factors or surgical complexity. If it were used for surgical quality reporting, it would give “credit” to surgeons with more extensive
surgical blood loss. In other words, if 2 surgeons were performing
an identical procedure (such as a cholecystectomy) and had identical
crude mortality rates, the surgeon with more intraoperative blood loss
would have a higher predicted mortality rate and would thus appear
to have a lower risk-adjusted mortality rate.
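The argument can be made concrete with a small numerical illustration. The sketch assumes that risk-adjusted mortality is computed as the observed-to-expected ratio multiplied by a reference rate, a common convention that the text does not spell out; all numbers and names below are hypothetical.

```python
def risk_adjusted_rate(observed_rate, expected_rate, reference_rate):
    """Observed-to-expected ratio scaled by a reference mortality rate."""
    return observed_rate / expected_rate * reference_rate


# Hypothetical values: both surgeons have the same crude mortality (2%),
# but surgeon B's greater blood loss raises the predicted (expected) mortality.
reference = 0.02
surgeon_a = risk_adjusted_rate(observed_rate=0.02, expected_rate=0.02, reference_rate=reference)
surgeon_b = risk_adjusted_rate(observed_rate=0.02, expected_rate=0.03, reference_rate=reference)

print(f"A: {surgeon_a:.2%}, B: {surgeon_b:.2%}")
```

Under these assumptions, surgeon B appears to perform better after risk adjustment despite identical crude outcomes, which is the distortion described above.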
Our goal in creating this new surgical risk index was to create
a risk score that was both accurate enough to convey important
prognostic information and simple enough to use at the patient’s bedside using readily available risk factors. S-MPM meets these criteria.
Data collection is limited to 3 risk factors: ASA PS, emergency status, and type of operation. These data elements are easily obtained.
The goodness-of-fit of S-MPM in the validation sample suggests that
the predicted probabilities accurately reflect the observed outcomes in
our patient sample. Because the ACS NSQIP collects data from a self-selected group of major medical centers, mortality estimates based on
S-MPM will reflect the mortality experience in a group of hospitals
that may have somewhat lower mortality rates compared to “average”
hospitals in the United States. Additional work may be necessary
to recalibrate this model to a more representative population-based
database.
Risk scores, like practice guidelines, should not be used in isolation to make clinical decisions. Although prognostication based on
a prediction model may at first appear to be devoid of the subjectivity of clinical reasoning, it has long been recognized that “statistical
model building can be as much an art as it is a science.”44 Nonetheless, risk scores can be used to enhance clinical decision making by
supplementing a clinician’s subjective “gut feeling”41 with a summary measure based on a patient sample much larger than the sum
total of any individual physician’s clinical experience. One of the
central tenets of evidence-based medicine is that clinical decision
making should involve the integration of best evidence with individual expertise.45 Risk scores, properly constructed, represent the
best evidence on patient prognosis. The utility of risk scores to help
guide clinical decision making is exemplified by the widespread application of the RCRI to guide the preoperative cardiac evaluation
of surgical patients based on a patient’s risk of developing a cardiac complication.46 Risk-reduction strategies, such as the intensity
of postoperative surveillance, are dictated by patient risk and surgical risk. The introduction of a more general risk score, such as
S-MPM, may facilitate the development of other evidence-based practice guidelines, like the AHA/ACC guidelines, in which the branch
points in decision trees are based on objective measures of patient
risk.
This study has some potential limitations. First, it can be argued that
using the ASA PS as one of the key variables in S-MPM introduces
too much subjectivity, limiting the potential value of this risk index.
As noted earlier, however, the potential for measurement error in the
ASA PS does not appear to significantly impact the value of the ASA
PS as a key predictor of surgical outcome, nor its use in the 2 largest
surgical benchmarking efforts in the United States—the VA NSQIP
and the ACS NSQIP. Second, the subjectivity of the ASA PS may lead
to up-coding of the ASA PS if S-MPM were used as the basis for public reporting of hospital surgical outcomes. However, this risk index
is not designed to be used in this manner. Third, our categorization
of specific surgical procedures into high-, intermediate-, and low-risk procedures may not always match clinical intuition. For example, we classify pulmonary resections and exploratory laparotomies
as high-risk surgery, whereas these are typically considered to be
intermediate-risk procedures. In mapping specific procedures to risk
categories, we relied on the results of our regression analyses, as opposed to expert clinical judgment. To the extent that some procedures
in S-MPM are classified differently than expected, clinicians will need
to “recalibrate” their choice of surgical risk to use S-MPM. Because
most surgeons perform a limited number of surgical procedures, we
do not believe that this should present a significant obstacle for most
clinicians. Fourth, despite being based on a large patient population,
our study cohort is not population-based, and our risk score may not
perform as well in an external data set. This drop-off in performance
when a scoring system is applied to a different population has long
been recognized,47 and it is unlikely to limit the clinical value of this
new risk score3 because it would be straightforward to recalibrate
S-MPM to the new patient population (as long as the characteristics
of the new population are not markedly different from ACS NSQIP).
Finally, our risk score is intended to provide a baseline estimate of risk
before a patient goes to surgery. In some cases, the nature and extent of
the planned operation will change during the course of the operation.
In theory, the observed statistical performance of this risk index may
be overstated because the risk index classifies surgery-specific risk on
the basis of the actual procedure, as opposed to the planned procedure.
Although we do not believe that this increment in model performance
is clinically important, we cannot rule it out using available data.
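The recalibration mentioned above is not specified in the text. As a minimal sketch, assuming the simplest common approach (an intercept-only update, sometimes called calibration-in-the-large), the predicted risks can be shifted on the logit scale until their mean matches the mortality rate observed in the new population. The function names and the example numbers below are hypothetical and are not taken from the study.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of logit: map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def recalibrate_intercept(predicted, observed_deaths, tol=1e-10):
    """Return the logit-scale shift that makes the mean predicted risk
    equal the observed mortality rate in the new population.

    Assumption: only the intercept is updated; the relative ranking of
    patients (discrimination) is left unchanged.
    """
    n = len(predicted)
    target = observed_deaths / n
    if not 0.0 < target < 1.0:
        raise ValueError("observed mortality rate must be strictly between 0 and 1")
    logits = [logit(p) for p in predicted]

    # The mean shifted risk increases monotonically with the shift,
    # so bisection on a wide bracket is sufficient.
    lo, hi = -20.0, 20.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        mean_risk = sum(inv_logit(x + mid) for x in logits) / n
        if mean_risk < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def apply_shift(predicted, shift):
    """Apply a logit-scale shift to every predicted risk."""
    return [inv_logit(logit(p) + shift) for p in predicted]


# Hypothetical example: 100 patients with a mean predicted risk of 3%,
# of whom 6 died within 30 days in the new population.
original = [0.01] * 50 + [0.05] * 50
shift = recalibrate_intercept(original, observed_deaths=6)
recalibrated = apply_shift(original, shift)
```

An intercept-only update of this kind corrects a systematic over- or underestimate of risk. It would not repair a model whose risk factors carry different weights in the new population, which is consistent with the caveat that the new population should not differ markedly from ACS NSQIP.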
CONCLUSIONS
In summary, we have developed a simple risk index for all-cause 30-day mortality for noncardiac surgery. S-MPM is a 9-point
score based on a patient’s ASA PS, surgery-specific risk, and whether
the procedure is performed on an emergency basis. Despite its simplicity and ease of application, this risk score exhibits excellent statistical performance. S-MPM may play a useful role in facilitating
shared decision making, developing and implementing risk-reduction
strategies, and guiding quality improvement efforts.
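The score can be computed in a few lines. The sketch below uses the point values and risk bands reported in the article's abstract; the function names, the string labels for surgical risk class, and the assignment of 0 points to low-risk procedures (implied but not stated in the abstract) are assumptions made for illustration. Mapping a specific procedure to its risk class is left to the caller.

```python
# Points by ASA physical status (I to V), as reported in the abstract.
ASA_POINTS = {1: 0, 2: 2, 3: 4, 4: 5, 5: 6}

# Points by surgery risk class; 0 for low-risk procedures is an assumption
# implied by the 1 and 2 points given to intermediate- and high-risk procedures.
PROCEDURE_POINTS = {"low": 0, "intermediate": 1, "high": 2}

EMERGENCY_POINTS = 1


def smpm_score(asa_ps, procedure_risk, emergency):
    """Return the S-MPM score (0 to 9) for one patient."""
    if asa_ps not in ASA_POINTS:
        raise ValueError("asa_ps must be an integer from 1 to 5")
    if procedure_risk not in PROCEDURE_POINTS:
        raise ValueError("procedure_risk must be 'low', 'intermediate', or 'high'")
    score = ASA_POINTS[asa_ps] + PROCEDURE_POINTS[procedure_risk]
    if emergency:
        score += EMERGENCY_POINTS
    return score


def smpm_risk_band(score):
    """Return the predicted 30-day mortality band reported for a score."""
    if score < 5:
        return "less than 0.50%"
    if score <= 6:
        return "1.5% to 4.0%"
    return "more than 10%"


# Example: ASA PS III patient, high-risk procedure, emergency case.
example_score = smpm_score(3, "high", True)
example_band = smpm_risk_band(example_score)
```

In this example the patient scores 4 + 2 + 1 = 7 points, which falls in the highest reported risk band.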
ACKNOWLEDGMENTS
This project was supported by a grant from the Agency for
Healthcare Research and Quality (RO1 HS 16737) and the Department of Anesthesiology, University of Rochester.
The sponsors of this study had no role in the conduct of this
study; in the analysis or the interpretation of the data; or in the
preparation, review, or approval of the article.
The views presented in this manuscript are those of the authors
and may not reflect those of the Agency for Healthcare Research and
Quality.
REFERENCES
1. Goldman L, Caldera DL, Nussbaum SR, et al. Multifactorial index of cardiac
risk in noncardiac surgical procedures. N Engl J Med. 1977;297:845–850.
2. Lee TH, Marcantonio ER, Mangione CM, et al. Derivation and prospective
validation of a simple index for prediction of cardiac risk of major noncardiac
surgery. Circulation. 1999;100:1043–1049.
3. Goldman L. The revised cardiac risk index delivers what it promised. Ann Int
Med. 2010;152:57–58.
4. Devereaux PJ, Yang H, Yusuf S, et al. Effects of extended-release metoprolol
succinate in patients undergoing non-cardiac surgery (POISE trial): a randomised controlled trial. Lancet. 2008;371:1839–1847.
5. Copeland GP, Jones D, Walters M. POSSUM: a scoring system for surgical
audit. Br J Surg. 1991;78:355–360.
6. Prytherch DR, Whiteley MS, Higgins B, et al. POSSUM and Portsmouth
POSSUM for predicting mortality. Physiological and Operative Severity Score
for the enUmeration of Mortality and morbidity. Br J Surg. 1998;85:1217–
1220.
7. Sutton R, Bann S, Brooks M, et al. The Surgical Risk Scale as an improved tool for risk-adjusted analysis in comparative surgical audit. Br J Surg.
2002;89:763–768.
8. Neary WD, Prytherch D, Foy C, et al. Comparison of different methods of risk
stratification in urgent and emergency surgery. Br J Surg. 2007;94:1300–1305.
9. Liebman B, Strating RP, van Wieringen W, et al. Risk modelling of outcome
after general and trauma surgery (the IRIS score). Br J Surg. 2010;97:128–133.
10. Hall BL, Hamilton BH, Richards K, et al. Does surgical quality improve in
the American College of Surgeons National Surgical Quality Improvement
Program: an evaluation of all participating hospitals. Ann Surg. 2009;250:363–
376.
11. Regenbogen SE, Lancaster RT, Lipsitz SR, et al. Does the Surgical Apgar
Score measure intraoperative performance? Ann Surg. 2008;248:320–328.
12. Birkmeyer JD, Shahian DM, Dimick JB, et al. Blueprint for a new American
College of Surgeons: National Surgical Quality Improvement Program. J Am
Coll Surg. 2008;207:777–782.
13. Glance LG, Osler TM, Mukamel DB, et al. Impact of the present-on-admission
indicator on hospital quality measurement: experience with the Agency for
Healthcare Research and Quality (AHRQ) Inpatient Quality Indicators. Med
Care. 2008;46:112–119.
14. Iezzoni LI. Assessing quality using administrative data. Ann Int Med.
1997;127:666–674.
15. Gordon HS, Johnson ML, Wray NP, et al. Mortality after noncardiac surgery:
prediction from administrative versus clinical data. Med Care. 2005;43:159–
167.
16. Ford MK, Beattie WS, Wijeysundera DN. Systematic review: prediction of
perioperative cardiac complications and mortality by the revised cardiac risk
index. Ann Int Med. 2010;152:26–35.
17. Khuri SF, Henderson WG, Daley J, et al. The patient safety in surgery
study: background, study design, and patient populations. J Am Coll Surg.
2007;204:1089–1102.
18. Davenport DL, Bowe EA, Henderson WG, et al. National Surgical Quality
Improvement Program (NSQIP) risk factors can be used to validate American
Society of Anesthesiologists physical status classification (ASA PS) levels.
Ann Surg. 2006;243:636–641; discussion 641–644.
19. Dripps RD, Lamont A, Eckenhoff JE. The role of anesthesia in surgical mortality. JAMA. 1961;178:261–266.
20. Vacanti CJ, VanHouten RJ, Hill RC. A statistical analysis of the relationship
of physical status to postoperative mortality in 68,388 cases. Anesth Analg.
1970;49:564–566.
21. Marx GF, Mateo CV, Orkin LR. Computer analysis of postanesthetic deaths.
Anesthesiology. 1973;39:54–58.
22. Wolters U, Wolf T, Stutzer H, et al. ASA classification and perioperative
variables as predictors of postoperative outcome. Br J Anaesth. 1996;77:217–
222.
23. Dimick JB, Osborne NH, Hall BL, et al. Risk adjustment for comparing hospital quality with surgery: how many variables are needed? J Am Coll Surg.
2010;210:503–508.
24. Wasson JH, Sox HC, Neff RK, et al. Clinical prediction rules. Applications
and methodological standards. N Engl J Med. 1985;313:793–799.
25. Sullivan LM, Massaro JM, D’Agostino RB, Sr. Presentation of multivariate
data for clinical use: the Framingham Study risk score functions. Stat Med.
2004;23:1631–1660.
26. Royston P, Altman DG. Regression using fractional polynomials of continuous covariates: parsimonious parametric modeling. Applied Statistics.
1994;43:429–467.
27. White H. A heteroskedasticity-consistent covariance matrix estimator and a
direct test for heteroskedasticity. Econometrica. 1980;48:817–830.
28. Hosmer DW, Lemeshow S. Applied Logistic Regression. Vol 2. New York, NY:
Wiley-Interscience Publication; 2000.
29. Kramer AA, Zimmerman JE. Assessing the calibration of mortality benchmarks in critical care: the Hosmer-Lemeshow test revisited. Crit Care Med.
2007;35:2052–2056.
30. Krumholz HM. Informed consent to promote patient-centered care. JAMA.
2010;303:1190–1191.
31. Prause G, Ratzenhofer-Komenda B, Pierer G, et al. Can ASA grade or Goldman’s cardiac risk index predict peri-operative mortality? A study of 16,227
patients. Anaesthesia. 1997;52:203–206.
32. Prause G, Offner A, Ratzenhofer-Komenda B, et al. Comparison of two preoperative indices to predict perioperative mortality in non-cardiac thoracic
surgery. Eur J Cardiothorac Surg. 1997;11:670–675.
33. Saklad M. Grading of patients for surgical procedures. Anesthesiology.
1941;2:281–284.
34. Khuri SF, Daley J, Henderson W, et al. Risk adjustment of the postoperative
mortality rate for the comparative assessment of the quality of surgical care:
results of the National Veterans Affairs Surgical Risk Study. J Am Coll Surg.
1997;185:315–327.
35. Daley J, Khuri SF, Henderson W, et al. Risk adjustment of the postoperative
morbidity rate for the comparative assessment of the quality of surgical care:
results of the National Veterans Affairs Surgical Risk Study. J Am Coll Surg.
1997;185:328–340.
36. Owens WD, Felts JA, Spitznagel EL, Jr. ASA physical status classifications: a
study of consistency of ratings. Anesthesiology. 1978;49:239–243.
37. Ranta S, Hynynen M, Tammisto T. A survey of the ASA physical status classification: significant variation in allocation among Finnish anaesthesiologists.
Acta Anaesthesiol Scand. 1997;41:629–632.
38. Mak PH, Campbell RC, Irwin MG. The ASA physical status classification:
inter-observer consistency. American Society of Anesthesiologists. Anaesth
Intensive Care. 2002;30:633–640.
39. Tiret L, Hatton F, Desmonts JM, et al. Prediction of outcome of anaesthesia
in patients over 40 years: a multifactorial risk index. Stat Med. 1988;7:947–
954.
40. Gawande AA, Kwaan MR, Regenbogen SE, et al. An Apgar score for surgery.
J Am Coll Surg. 2007;204:201–208.
41. Regenbogen SE, Ehrenfeld JM, Lipsitz SR, et al. Utility of the surgical Apgar
score: validation in 4119 patients. Arch Surg. 2009;144:30–36; discussion 37.
42. Regenbogen SE, Bordeianou L, Hutter MM, et al. The intraoperative Surgical Apgar Score predicts postdischarge complications after colon and rectal
resection. Surgery. 2010;148:559–566.
43. Haynes AB, Regenbogen SE, Weiser TG, et al. Surgical outcome measurement
for a global patient population: validation of the Surgical Apgar Score in 8
countries. Surgery. 2011;149:519–524.
44. Lemeshow S, Teres D, Klar J, et al. Mortality Probability Models (MPM II)
based on an international cohort of intensive care unit patients. JAMA.
1993;270:2478–2486.
45. Sackett DL, Rosenberg WM, Gray JA, et al. Evidence based medicine: what it
is and what it isn’t. BMJ. 1996;312:71–72.
46. Fleisher LA, Beckman JA, Brown KA, et al. ACC/AHA 2007 Guidelines on
Perioperative Cardiovascular Evaluation and Care for Noncardiac Surgery: Executive Summary: A Report of the American College of Cardiology/American
Heart Association Task Force on Practice Guidelines (Writing Committee to
Revise the 2002 Guidelines on Perioperative Cardiovascular Evaluation for
Noncardiac Surgery): Developed in Collaboration With the American Society of Echocardiography, American Society of Nuclear Cardiology, Heart
Rhythm Society, Society of Cardiovascular Anesthesiologists, Society for Cardiovascular Angiography and Interventions, Society for Vascular Medicine
and Biology, and Society for Vascular Surgery. Circulation. 2007;116:1971–
1996.
47. Teres D, Lemeshow S. As American as apple pie and APACHE. Acute
physiology and chronic health evaluation. Crit Care Med. 1998;26:1297–
1298.