Special Article

Variability in the Measurement of Hospital-wide Mortality Rates
David M. Shahian, M.D., Robert E. Wolf, M.Sc., Lisa I. Iezzoni, M.D.,
Leslie Kirle, M.P.H., and Sharon-Lise T. Normand, Ph.D.
From Massachusetts General Hospital (D.M.S., L.I.I.); Harvard Medical School (D.M.S., R.E.W., L.I.I., S.-L.T.N.); Massachusetts Division of Health Care Finance and Policy (L.K.); and Harvard School of Public Health (S.-L.T.N.) — all in Boston. Address reprint requests to Dr. Shahian at the Center for Quality and Safety and Department of Surgery, Massachusetts General Hospital, 55 Fruit St., Boston, MA 02114, or at [email protected].

N Engl J Med 2010;363:2530-9. Copyright © 2010 Massachusetts Medical Society.

Abstract

Background
Several countries use hospital-wide mortality rates to evaluate the quality of hospital care, although the usefulness of this metric has been questioned. Massachusetts
policymakers recently requested an assessment of methods to calculate this aggregate mortality metric for use as a measure of hospital quality.
Methods
The Massachusetts Division of Health Care Finance and Policy provided four vendors with identical information on 2,528,624 discharges from Massachusetts acute
care hospitals from October 1, 2004, through September 30, 2007. Vendors applied
their risk-adjustment algorithms and provided the predicted probability of in-hospital death for each discharge, along with hospital-level observed and expected mortality rates.
We compared the numbers and characteristics of discharges and hospitals included
by each of the four methods. We also compared hospitals’ standardized mortality
ratios and classification of hospitals with mortality rates that were higher or lower
than expected, according to each method.
Results
The proportions of discharges that were included by each method ranged from 28% to 95%, and the severity of patients’ diagnoses varied widely. Because of their discharge-selection criteria, two methods calculated in-hospital mortality rates (4.0% and 5.9%) that were roughly two to three times the state average (2.1%). Pairwise associations (Pearson
correlation coefficients) of discharge-level predicted mortality probabilities ranged
from 0.46 to 0.70. Hospital-performance categorizations varied substantially and were
sometimes completely discordant. In 2006, a total of 12 of 28 hospitals that had
higher-than-expected hospital-wide mortality when classified by one method had
lower-than-expected mortality when classified by one or more of the other methods.
Conclusions
Four common methods for calculating hospital-wide mortality produced substantially different results. This may have resulted from a lack of standardized national
eligibility and exclusion criteria, different statistical methods, or fundamental flaws
in the hypothesized association between hospital-wide mortality and quality of care.
(Funded by the Massachusetts Division of Health Care Finance and Policy.)
Hospital-performance metrics are increasingly used for value-based purchasing and public reporting. For example, Section 3001 of the Patient Protection and Affordable Care Act mandates incentive payments to hospitals that meet quality performance standards, which remain unspecified.
In 2008, the Massachusetts Health Care Quality and Cost Council1 directed the Division of
Health Care Finance and Policy (DHCFP) to study
various approaches to estimating hospital-wide
mortality rates as performance metrics. Medicare
released hospital-wide mortality rates for its beneficiaries beginning in 1986 but suspended publication of the rates in 1993 because of methodologic concerns.2,3 The United Kingdom, various
European countries, and Canada currently publish variants of such metrics, but questions remain
regarding their clinical rationale, their usefulness
for informing consumers and practitioners, and
statistical methods for calculating them.4-6
In response to the council’s mandate, the
DHCFP initiated systematic evaluations of various
methods for calculating hospital-wide mortality
rates.7 At the DHCFP’s request, we studied four
such commercial methods.
Methods
Study Design
In November 2008, the DHCFP issued a public
call for methods to calculate risk-adjusted, hospital-wide mortality rates with the use of data from
standard hospital discharge abstracts. Five commercial vendors responded, and two elected to
develop a joint method, resulting in four methods
for inclusion in this study. These four vendors are
3M Health Information Systems (3M),8 which used
a patient-classification system called All Patient
Refined Diagnosis Related Groups; the Dr. Foster
Unit at Imperial College London (Dr. Foster)9;
Thomson Reuters8; and University HealthSystem
Consortium (UHC)–Premier (the last representing the joint work of two of the vendors).8 Each
vendor specified its own methodologic approach.
(Details are provided in the Supplementary Appendix, available with the full text of this article
at NEJM.org.)
The DHCFP sent each vendor identical computerized files representing all discharges from
Massachusetts general acute care hospitals from
October 1, 2004, through September 30, 2007 (fis-
cal years 2005 through 2007). These data included
demographic information, admission source and
type, up to 15 discharge diagnoses and 15 procedure codes, and indicators of vital status (alive
or dead) at discharge. We did not provide vendors
with data on previous or subsequent hospitalizations, nor did we provide any information on outcomes after discharge, such as 30-day mortality.
We asked vendors to apply their risk-adjustment
algorithms to these data and provide the results
of these analyses, including the predicted probability of in-hospital death for each individual discharge and hospital-level estimates (e.g., observed
and expected mortality rates) for each hospital.
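To make these quantities concrete, the sketch below shows how discharge-level predicted probabilities can be aggregated into hospital-level observed and expected mortality rates and a standardized mortality ratio (SMR). This is a minimal illustration of the general approach, not any vendor's proprietary algorithm; the column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical data: one row per discharge, with the hospital ID, vital
# status at discharge (1 = in-hospital death), and a vendor's predicted
# probability of in-hospital death from its risk-adjustment model.
discharges = pd.DataFrame({
    "hospital_id": ["A", "A", "A", "B", "B", "B"],
    "died":        [0, 1, 0, 0, 0, 1],
    "pred_prob":   [0.02, 0.30, 0.05, 0.01, 0.04, 0.10],
})

by_hospital = discharges.groupby("hospital_id").agg(
    n=("died", "size"),
    observed_rate=("died", "mean"),       # observed in-hospital mortality rate
    expected_rate=("pred_prob", "mean"),  # expected rate = mean predicted probability
)

# Standardized mortality ratio (observed/expected), multiplied by 100 as in
# the comparisons reported here; 100 indicates mortality as expected.
by_hospital["smr_x100"] = 100 * by_hospital["observed_rate"] / by_hospital["expected_rate"]
print(by_hospital)
```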
Comparative Analyses
We performed three principal analyses of the four methods. First, each method used different inclusion and exclusion criteria in its mortality algorithm. We therefore calculated the numbers of
discharges and hospitals that were included by
each method according to fiscal year and the overall 3-year period. For each method and for Massachusetts overall, we compared the attributes of
discharges and hospitals, including in-hospital
mortality rates; the age, sex, and race or ethnic
group of patients; distributions of major diagnostic categories; and the mean length of stay and
median number of discharges.
Second, we calculated Pearson correlation coefficients for individual discharge-level predicted
probabilities of in-hospital death between pairs
of methods. We performed the calculation in two
ways: first using the set of discharges common to
all four methods and then using discharges common to each pair of methods.
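The sketch below illustrates these two correlation calculations under an assumed data layout (one column of predicted probabilities per method, with missing values where a method excluded a discharge); the method labels and numbers are hypothetical.

```python
from itertools import combinations

import pandas as pd

# Hypothetical layout: one column of predicted death probabilities per
# method, indexed by discharge; NaN marks discharges a method excluded.
preds = pd.DataFrame(
    {
        "UHC-Premier":     [0.02, 0.31, None, 0.05, 0.12],
        "3M":              [0.03, 0.25, 0.01, 0.06, 0.10],
        "Thomson Reuters": [0.01, 0.40, 0.02, None, 0.15],
        "Dr. Foster":      [0.02, 0.28, 0.02, 0.07, 0.09],
    }
)

# First way: Pearson correlations over discharges common to all four methods.
common = preds.dropna()
print(common.corr(method="pearson"))

# Second way: correlations over discharges common to each pair of methods.
for a, b in combinations(preds.columns, 2):
    pair = preds[[a, b]].dropna()
    print(f"{a} vs. {b}: r = {pair[a].corr(pair[b]):.2f} (n = {len(pair)})")
```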
Third, because the goal of each method was to
provide a measure of hospital quality, we examined how well hospital-specific performance estimates agreed among methods. For each method,
we used the predicted and actual mortality rates
that were based on the discharges included by
that risk-adjustment method. Some methods produced risk-standardized mortality ratios, whereas
others produced risk-standardized mortality rates.
We converted all measures to ratios, multiplied
these values by 100, and then examined pairwise
correlations of ratios between methods, using
Pearson correlation coefficients. We estimated
three correlations: one with no weighting, one
weighted by the smaller number of hospital discharges analyzed by any two methods, and one
weighted by the larger number of discharges analyzed by any two methods. To quantify consistency
among methods, we also calculated the intraclass correlation coefficient (method ICC [3,1] of
Shrout and Fleiss10), using an analysis-of-variance
procedure that modeled standardized mortality
ratios as a function of method fixed effects and
hospital random effects. This approach reflected
two of our key assumptions: that only these four
specific methods were of interest and that the discharges represented a sample of discharges that
could change from year to year.
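A sketch of these two computations follows: a Pearson correlation with observation weights (applied with either the smaller or the larger discharge count for each hospital) and ICC(3,1) computed from two-way analysis-of-variance mean squares. It is a minimal implementation of the formulas described above; the SMRs and discharge counts are hypothetical.

```python
import numpy as np

def weighted_pearson(x, y, w):
    """Pearson correlation of x and y with nonnegative observation weights w."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    w = np.asarray(w, float) / np.sum(w)
    mx, my = np.sum(w * x), np.sum(w * y)
    cov = np.sum(w * (x - mx) * (y - my))
    return cov / np.sqrt(np.sum(w * (x - mx) ** 2) * np.sum(w * (y - my) ** 2))

def icc_3_1(ratings):
    """ICC(3,1) of Shrout and Fleiss: rows are hospitals (random targets),
    columns are methods (fixed raters); computed from ANOVA mean squares."""
    r = np.asarray(ratings, float)
    n, k = r.shape
    grand = r.mean()
    ss_total = ((r - grand) ** 2).sum()
    ss_hosp = k * ((r.mean(axis=1) - grand) ** 2).sum()
    ss_meth = n * ((r.mean(axis=0) - grand) ** 2).sum()
    ems = (ss_total - ss_hosp - ss_meth) / ((n - 1) * (k - 1))  # residual mean square
    bms = ss_hosp / (n - 1)                                     # between-hospital mean square
    return (bms - ems) / (bms + (k - 1) * ems)

# Hypothetical SMRs (x100) for 5 hospitals (rows) under 4 methods (columns).
smrs = np.array([
    [ 82,  95,  90,  88],
    [112, 128, 104, 119],
    [ 99,  97, 103,  96],
    [141, 122, 149, 134],
    [ 76,  90,  81,  85],
])
print(icc_3_1(smrs))

# Weighted correlation between two methods, weighting each hospital by the
# smaller of the two discharge counts analyzed for it (hypothetical counts).
smaller_n = np.array([500, 820, 640, 910, 300])
print(weighted_pearson(smrs[:, 0], smrs[:, 1], smaller_n))
```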
Comparing Estimates of Hospital Performance
Each method indicated whether a hospital was a
mortality outlier in each of the 3 years. We compared how each hospital was categorized according to each of the methods into three mutually
exclusive groups: higher-than-expected mortality,
as-expected mortality, and lower-than-expected
mortality. Typically, outlier designations were
based on P values of 0.05 or 95% confidence intervals (see the Supplementary Appendix). However, the Dr. Foster method provided outlier designations on the basis of 3 years of data, along
with annual standardized mortality ratios (including standard errors) for each hospital. To be consistent with other methods, we used Dr. Foster
annual estimates and standard errors to identify
hospitals with lower-than-expected, as-expected,
or higher-than-expected mortality, with a P value
of less than 0.05 indicating statistical significance.
(The Dr. Foster method typically uses funnel charts
with 99.8% control limits.) We quantified agreement between method pairs using kappa statistics and characterized the strength of agreement
using the Landis and Koch approach.11
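One way to implement this classification and agreement calculation is sketched below: a three-way outlier call from an annual SMR estimate and its standard error (a 95% confidence interval that excludes 100, i.e., two-sided P < 0.05) and an unweighted Cohen's kappa between two methods' categorizations. The estimates shown are hypothetical, not data from the study.

```python
import numpy as np

def classify(smr_x100, se_x100, z=1.96):
    """Outlier call from an SMR (x100) and its standard error: 'higher' if
    the 95% CI lies entirely above 100, 'lower' if entirely below 100,
    otherwise 'as expected'."""
    lo, hi = smr_x100 - z * se_x100, smr_x100 + z * se_x100
    if lo > 100:
        return "higher"
    if hi < 100:
        return "lower"
    return "as expected"

def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two methods' categorizations."""
    cats = sorted(set(labels_a) | set(labels_b))
    idx = {c: i for i, c in enumerate(cats)}
    m = np.zeros((len(cats), len(cats)))
    for a, b in zip(labels_a, labels_b):
        m[idx[a], idx[b]] += 1
    m /= m.sum()
    p_obs = np.trace(m)                    # observed agreement
    p_exp = m.sum(axis=1) @ m.sum(axis=0)  # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical annual SMR estimates (x100) and standard errors for one
# method, compared with a second method's calls for the same hospitals.
method_1 = [classify(s, se) for s, se in [(125, 10), (90, 6), (103, 8), (70, 12)]]
method_2 = ["higher", "lower", "as expected", "lower"]
print(method_1)
print(cohen_kappa(method_1, method_2))
```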
Results
Discharge and Hospital Cohorts
A total of 83 Massachusetts hospitals had
2,528,624 hospital discharges during the study
period, with an overall in-hospital mortality rate
of 2.1% (Table 1). Two methods excluded some
hospitals from their mortality calculations. Percentages of discharges that were included by each
method ranged from 28% for UHC–Premier to 95%
for 3M. Methods varied widely not only in the
numbers of discharges analyzed but also in the
distributions of included diagnoses. For example,
discharges analyzed by the UHC–Premier and Dr. Foster methods had overall in-hospital mortality rates
of 5.9% and 4.0%, respectively, perhaps because
these methods included higher proportions of patients with respiratory or cardiovascular diagnoses than did the other methods.
Discharge-Level Mortality
Of 2,528,624 total discharges, 567,784 (22%) were
included in all four methods (Table 1). These 22%
of discharges involved patients who were older and
had higher in-hospital mortality, longer lengths
of stay, and more respiratory and circulatory system diagnoses than patients in Massachusetts
overall.
Individual discharge-level predicted probabilities of in-hospital death ranged from 0 to 0.999
among the four methods (data not shown). Pairwise associations of individual predicted probabilities for the 22% of discharges that were common among the four methods ranged from 0.46
(for Thomson Reuters vs. Dr. Foster in fiscal year
2005) to 0.70 (for UHC–Premier vs. 3M in fiscal
year 2005) (Table 2). Estimated Pearson correlation coefficients were slightly lower among pairs
when discharges that were common to each pair
of methods were used rather than discharges that
were common to all methods (data not shown).
Hospital-Level Mortality
The four methods produced different estimates of
hospital-level mortality and assessments of whether individual hospitals had higher-than-expected
or lower-than-expected mortality. Every method
provided hospital-wide estimates for 77 of the 83
Massachusetts hospitals for discharges in fiscal
year 2005, for 78 hospitals in fiscal year 2006,
and for 75 hospitals in fiscal year 2007. Excluded
hospitals were usually specialty facilities (e.g., rehabilitation hospitals). Pairwise correlations of
hospital-wide standardized mortality ratios depended on the weighting of measures, ranging
from 0.32 to 0.74 (Table 3). UHC–Premier and
3M had the strongest linear correlations, regardless of the weighting method used. Using fiscal
year 2007 data, Thomson Reuters produced an estimate for Hospital C, a large general hospital,
that differed from the estimates with all the other
methods (Fig. 1). The Thomson Reuters estimate
was based on 3% of Hospital C’s total discharges,
whereas the other methods used more than 30%
of the discharges. Patients who were included in
the Thomson Reuters analysis of Hospital C had
a higher percentage of diagnoses that are generally associated with an increased risk of death (e.g., respiratory disease, cancer, infectious disease, and injuries) than the state population overall. Conversely, diagnoses with a low risk of death (e.g., pregnancy and childbirth) were underrepresented in the Thomson Reuters cohort for Hospital C. The mortality rate among these 3% of discharges was 59.8%, which was substantially higher than Hospital C’s overall mortality rate (2.2%).

Table 1. Hospital Discharges and Populations Studied for Fiscal Years 2005 through 2007, According to Four Methods of Measuring Hospital-wide Mortality.*

| Characteristic | All Discharges | UHC–Premier | 3M | Thomson Reuters | Dr. Foster | Discharges Included in Every Method |
|---|---|---|---|---|---|---|
| Hospitals: no. included | 83 | 81 | 83 | 83 | 82 | 81 |
| Hospitals: percent of total | 100 | 98 | 100 | 100 | 99 | 98 |
| Discharges: no. included | 2,528,624 | 716,315 | 2,406,881 | 2,048,377 | 1,072,918 | 567,784 |
| Discharges: median no. per hospital | 23,428 | 7,114 | 21,926 | 18,747 | 10,140 | 5,870 |
| Discharges: percent of total | 100 | 28 | 95 | 81 | 42 | 22 |
| Mean length of stay per hospital (days) | 5.8 | 7.0 | 5.8 | 5.7 | 6.1 | 6.5 |
| Inpatient mortality rate (%) | 2.1 | 5.9 | 2.0 | 2.4 | 4.0 | 6.2 |
| Female sex (%) | 57 | 52 | 58 | 59 | 50 | 52 |
| Race or ethnic group (%)†: White | 78 | 84 | 78 | 79 | 79 | 84 |
| Race or ethnic group (%)†: Black | 6 | 5 | 6 | 6 | 6 | 5 |
| Race or ethnic group (%)†: Asian | 2 | 1 | 2 | 2 | 2 | 1 |
| Race or ethnic group (%)†: Hispanic | 6 | 4 | 6 | 6 | 5 | 4 |
| Race or ethnic group (%)†: Other or unknown | 8 | 6 | 8 | 7 | 8 | 6 |
| Mean age (yr) | 51±28 | 66±23 | 50±28 | 56±24 | 52±33 | 69±20 |
| Major diagnostic category (%)‡: Nervous system | 5.7 | 6.4 | 5.7 | 6.1 | 5.1 | 5.5 |
| Major diagnostic category (%)‡: Respiratory system | 9.9 | 25.0 | 9.7 | 11.2 | 17.1 | 26.2 |
| Major diagnostic category (%)‡: Circulatory system | 15.5 | 25.9 | 15.1 | 17.1 | 25.5 | 30.4 |
| Major diagnostic category (%)‡: Digestive system | 9.4 | 10.8 | 9.5 | 10.5 | 6.2 | 8.5 |
| Major diagnostic category (%)‡: Musculoskeletal or connective tissue | 8.7 | 5.2 | 8.9 | 8.5 | 3.4 | 4.2 |
| Major diagnostic category (%)‡: Metabolic disease | 3.5 | 5.2 | 3.6 | 4.0 | 5.2 | 6.3 |
| Major diagnostic category (%)‡: Kidney and urinary tract | 4.2 | 9.7 | 4.3 | 4.8 | 7.1 | 10.4 |
| Major diagnostic category (%)‡: Pregnancy or childbirth | 10.0 | <0.1 | 10.4 | 12.2 | <0.1 | <0.1 |
| Major diagnostic category (%)‡: Newborn or neonate | 9.8 | 3.3 | 10.0 | <0.1 | 21.7 | 0 |
| Major diagnostic category (%)‡: Mental diseases and disorders | 4.6 | 0 | 4.7 | 5.1 | <0.1 | 0 |

*Plus–minus values are means ±SD. For the study, the fiscal year was October 1 through September 30. The term 3M denotes 3M Health Information Systems, and UHC University HealthSystem Consortium.
†Race was self-reported. Because Massachusetts did not add Hispanic as an ethnic group until October 1, 2007, subjects in this study were categorized with the use of a hierarchical algorithm.
‡Listed are the top 10 major diagnostic categories that accounted for more than 5% of patients included by at least one vendor. A complete listing of all major diagnostic categories is available in the Supplementary Appendix.
Table 2. Correlation Coefficients for Agreement among Discharge-Level Estimates of In-Hospital Risk of Death for Fiscal Years 2005 through 2007.*

| Method and Year | UHC–Premier | 3M | Thomson Reuters | Dr. Foster |
|---|---|---|---|---|
| UHC–Premier, 2005 | 1.00 | 0.70 | 0.63 | 0.60 |
| UHC–Premier, 2006 | 1.00 | 0.69 | 0.62 | 0.60 |
| UHC–Premier, 2007 | 1.00 | 0.67 | 0.59 | 0.59 |
| 3M, 2005 | | 1.00 | 0.57 | 0.61 |
| 3M, 2006 | | 1.00 | 0.58 | 0.61 |
| 3M, 2007 | | 1.00 | 0.58 | 0.61 |
| Thomson Reuters, 2005 | | | 1.00 | 0.46 |
| Thomson Reuters, 2006 | | | 1.00 | 0.48 |
| Thomson Reuters, 2007 | | | 1.00 | 0.48 |

*Listed are Pearson correlation coefficients that were calculated on the basis of hospital discharges that were common to all four methods. The total numbers of these discharges were 186,170 for the 2005 fiscal year, 186,888 for the 2006 fiscal year, and 194,726 for the 2007 fiscal year. In this study, the fiscal year was October 1 to September 30. The term 3M denotes 3M Health Information Systems, and UHC University HealthSystem Consortium.
Intraclass correlation coefficients indicated
consistency among the methods in fiscal year 2005
(0.73) and fiscal year 2006 (0.80), although the fiscal year 2007 intraclass correlation coefficient was
lower (0.45). Percentages of hospitals that were assessed as having lower-than-expected or higher-than-expected mortality were high and varied
among methods (Fig. 2). For example, in fiscal
year 2005, the Dr. Foster method designated 15 of
77 hospitals (19%) as having higher-than-expected hospital-wide mortality, as compared with 38
of 77 hospitals (49%) so designated by 3M.
Kappa statistics indicated poor-to-substantial
agreement11 between methods in classifying hospital mortality performance, depending on the
year and method pairs. In fiscal year 2007, kappa
statistics for agreement between methods in designating higher-than-expected outliers ranged
from −0.04 (UHC–Premier and 3M) to 0.39 (Thomson Reuters and Dr. Foster). For some individual hospitals, categorizations varied widely and
in some cases were completely discordant. For
instance, in fiscal year 2006, of 28 hospitals designated as having higher-than-expected hospital-wide mortality by one method, 12 were simultaneously classified as having lower-than-expected
mortality by other methods (6 by one method, 3 by
two methods, and 3 by three methods).
Discussion
The goal of assessing hospital-wide mortality rates
is to make inferences about the relative quality of
care among hospitals. Proponents believe that
hospital-wide mortality metrics provide useful
warning flags about problems with the quality of
inpatient care, aid consumers in choosing a hospital, and help provide a focus for hospital quality-improvement activities.12-19 In addition to parsimony, this single, overall measure of hospital
performance has other potentially attractive features. In-hospital death is unambiguous, usually
undesired, and tabulated accurately. Hospital-wide
mortality encompasses wide-ranging diagnoses,
perhaps providing more expansive insight than
metrics that are based on single diagnoses. Hospital-wide results may uncover cross-cutting concerns, such as infection control, handoffs of patient care between shifts, and care coordination.
The four commercially available methods for
assessing hospital-wide mortality that we studied
are marketed to hospitals to support internal
quality-improvement activities. However, their implications are even more important, and the corresponding need for methodologic accuracy is
greater, when such measures are used for broader
initiatives, such as public reporting or performance-based purchasing. We found that estimates of hospital-wide mortality could vary, sometimes widely, among methods, which consequently leads to different inferences regarding the quality of hospital care. Although these analyses used Massachusetts data, our results should be generalizable to other hospitalized populations.

Table 3. Correlation among 78 Hospital-Level Standardized Mortality Ratios for Fiscal Year 2007, According to Pairs of Vendor Methods.*

| Methods | Correlation, Equally Weighted | Correlation Weighted by Smaller No. of Discharges | Correlation Weighted by Larger No. of Discharges |
|---|---|---|---|
| UHC–Premier vs. 3M | 0.74 | 0.68 | 0.65 |
| UHC–Premier vs. Thomson Reuters | 0.38 | 0.57 | 0.44 |
| UHC–Premier vs. Dr. Foster | 0.53 | 0.38 | 0.40 |
| 3M vs. Thomson Reuters | 0.48 | 0.62 | 0.43 |
| 3M vs. Dr. Foster | 0.48 | 0.37 | 0.37 |
| Thomson Reuters vs. Dr. Foster | 0.36 | 0.42 | 0.32 |

*Listed are Pearson correlation coefficients that were calculated with the use of different weighting methods for each pair of measurement methods. For example, the correlation for estimated standardized mortality ratios (SMRs) between UHC–Premier and 3M is 0.74 when each of the 78 pairs of SMRs is weighted equally, 0.68 when the pair of SMRs is weighted by the smaller number of discharges analyzed by the two methods, and 0.65 when the pair of SMRs is weighted by the larger number of discharges. The term 3M denotes 3M Health Information Systems, and UHC University HealthSystem Consortium.
Explanations for these discrepancies are largely embedded within the different inclusion and
exclusion criteria (for patients, diagnoses, and hospital types) among the four methods and in how
each method analyzed hospital discharge abstract
data to quantify in-hospital risks of death. These
methodologic differences produced large differences in the patient cohorts that were evaluated
by each approach, with only one fifth of discharges included among all methods. Thus, it is
not surprising that different methods rated the
same individual hospitals at either end of the
spectrum of mortality categories (higher than
expected vs. lower than expected). The observed
differences in eligibility and exclusion criteria
among methods may reflect the degree to which
those who developed the methods were willing
to accept uncertainty in the presumed association between mortality and hospital quality.
With more liberal inclusion criteria, the analyses
would probably include some diagnoses for which
the linkages between in-hospital mortality and
quality are problematic. In addition to differences
in case-selection criteria, other methodologic differences (Table 2 in the Supplementary Appendix)
may also have contributed to the discrepant results of the four methods.
Differences in categorizing performance on the
basis of hospital-wide mortality rates raise the
inevitable question of which method best identifies potential quality problems. Our study could
not address that question, since an observable
benchmark for overall hospital quality does not
exist. Without such a standard, our analytic approach relied on convergence, another indicator of
validity (i.e., the extent of agreement among results obtained with different measurement methods20,21). If different methods ostensibly aim to
capture the same construct (e.g., the quality of
hospital care) and their results are similar, such
convergence is reassuring. It is one piece of evidence that these methods provide valid information about that construct. In our study, hospital-level results differed among methods, sometimes substantially, and in some instances, these differences would lead to completely divergent inferences. This disagreement suggests that not all methods reflect the same underlying construct, although it is possible that one method might perform better than the others in estimating the quality of hospital care. The substantial differences we observed among methods for
estimating hospital-wide mortality are similar to
the findings of a study22 of condition-specific mortality rates and hospital performance. The aggregation of results from many disparate diagnoses
is methodologically complex and may accentuate
the magnitude of differences among methods.
Figure 1. Comparison of Hospital-Level Standardized Mortality Ratios (SMRs) for Fiscal Year 2007, According to Four Measurement Methods. Shown are scatter plots of SMRs, which have been multiplied by 100, as calculated by University HealthSystem Consortium (UHC)–Premier and plotted against data from 3M Health Information Systems (3M) (Panel A), Thomson Reuters (Panel B), and the Dr. Foster Unit at Imperial College London (Dr. Foster) (Panel C); as calculated by Thomson Reuters and plotted against data from 3M (Panel D) and Dr. Foster (Panel F); and as calculated by Dr. Foster and plotted against data from 3M (Panel E). The diagonal line indicates identical results from the two methods being compared. Panels B, D, and F show the Thomson Reuters estimates as compared with those of the other three vendors. Data points for Hospital C as estimated by Thomson Reuters on the basis of 3% of total discharges (circled) diverge substantially from all other estimates. In this study, the fiscal year was October 1 through September 30.

A broader question is whether in-hospital mortality rates are valid quality indicators.4-6,23-28 Studies have examined the relationships between
hospital mortality rates for individual conditions
and judgments about quality generated through
medical-record reviews conducted by experienced
clinicians. In general, the correlations have been
poor. This finding may reflect the absence of
an association between in-hospital mortality and
quality or, given the study designs, the confounding effects of small samples and randomness,
inadequate risk adjustment, coding problems, or
other methodologic concerns.22,25,26,28-41 These
issues are even more worrisome when it comes
to hospital-wide mortality measures that encompass a diverse portfolio of conditions and procedures. For some diagnoses (e.g., major trauma
or advanced cancer), linking in-hospital mortality with hospital quality is clearly problematic.
Regardless of whether a palliative care or “do
not resuscitate” (DNR) order is written, patients
with such diagnoses still have a higher likelihood of dying than patients overall, even if a
high quality of care is delivered. The four methods we examined used different criteria for
identifying such patients and determining whether to include them in hospital-wide mortality estimates.
Differences in hospital mortality findings
among methods also probably relate to inadequacies of the data source (currently limited to administrative claims data for hospital-wide approaches) and the different accommodations for
such problems among methods. For example, although Massachusetts discharge abstracts contain
status codes for patients receiving palliative care
and for those with DNR orders, individual hospitals report these markers inconsistently. The
four methods varied in how they handled these
indicators or whether they were considered at all.
Although the inclusion of these factors makes
clinical sense, a potential for “gaming” exists
when risk-adjusted mortality rates are used for
public reporting or determining reimbursement.5
If quality problems occur during a patient’s admission that increase the likelihood of death, hospitals might code such patients as having DNR orders or move them to palliative care status. Some
observers have asserted that increased coding for
palliative care might have accounted in part for
an apparent temporal reduction in hospital-wide
mortality rates in the United Kingdom.5
The use of data on hospital discharge abstracts
is also complicated by the difficulty in distinguishing intrinsic illnesses at the time of patients’
admissions from complications that arise during
their subsequent care. If risk-adjustment calculations misclassify complications as coexisting illnesses, this might mask the very quality problems
that hospital-mortality metrics aim to identify. One
potential solution is the recent requirement to
add “present on admission” (POA) flags to discharge diagnoses. Massachusetts began requiring
POA coding for all hospital discharges in fiscal
year 2006. However, hospitals have been slow to
report these POA indicators consistently, and the
quality and accuracy of such indicators remain
questionable. Although some fraction of our data
set included POA indicators, this information was
too inconsistent and inadequate to consider in
these analyses — an important limitation of our
study.
Despite these concerns, some individual hospital findings could be useful to potential consumers or to hospitals. In fiscal year 2006, all
four methods identified three hospitals as having
higher-than-expected mortality. In fiscal year 2007,
all four methods identified one hospital as having lower-than-expected mortality and another
hospital as having higher-than-expected mortality. Given this consistency, it is likely that these
hospitals differ somehow from others, but we were
unable to further investigate these findings. Furthermore, in real-world settings, agencies or hospitals are unlikely to use more than one method
of mortality measurement. Since they typically
must select a single method, opportunities for
comparing results among methods — and therefore identifying consistently high or low outlier
hospitals — are unavailable.
Figure 2. Percentages of Hospitals with Mortality Rates Higher or Lower Than Expected for Fiscal Years 2005 through 2007, According to Four Measurement Methods. Panel A shows fiscal year 2005 (77 hospitals), Panel B fiscal year 2006 (78 hospitals), and Panel C fiscal year 2007 (75 hospitals). Each panel shows, for UHC–Premier, 3M, Thomson Reuters, and Dr. Foster, the percentage of hospitals with mortality rates lower than expected and higher than expected.

Notwithstanding differences in eligibility and exclusion criteria and in statistical methods, the
substantially different results we observed among
methods may reflect flaws in the fundamental
hypothesis that hospital-wide mortality is a valid
metric for the quality of hospital care. Our study
also does not rule out the possibility that the
estimation of hospital-wide mortality rates on
the basis of nationally standardized eligibility
and exclusion criteria with the use of fixed time
intervals for end points (e.g., death within 30
days after admission) might produce more concordant results. However, the four modeling approaches were sufficiently dissimilar in other respects that greater agreement is by no means
certain. If appropriate standards existed or were
developed, evaluation of such an approach would
be a logical next step.
Our results suggest that efforts to use hospital-wide mortality rates to evaluate the quality of
care must proceed cautiously. At least among the
four widely used and representative methods that
we studied, different approaches to measurement
produced different results, leading to different and
sometimes completely discordant impressions
about relative hospital performance. These issues
will grow in urgency and importance as state and
national mandates for developing and reporting
hospital-quality metrics are implemented. One
potential alternative to the use of hospital-wide
mortality rates as a metric would be to estimate
hospital quality on the basis of a more limited
subgroup of diagnoses for which the link between
mortality and quality is most plausible, sample
sizes and end points are adequate, and credible
risk models are available or can be developed.
The views expressed in this article are those of the authors and
do not necessarily reflect the views of the DHCFP, the Executive
Office of Health and Human Services, or the Commonwealth of
Massachusetts.
Supported by the DHCFP.
Disclosure forms provided by the authors are available with
the full text of this article at NEJM.org.
References
1. Commonwealth of Massachusetts. General law: chapter 6A. (http://www.mass.gov/legis/laws/mgl/6a/6a-16k.htm.)
2. Krakauer H, Bailey RC, Skellan KJ, et al. Evaluation of the HCFA model for the analysis of mortality following hospitalization. Health Serv Res 1992;27:317-35.
3. Iezzoni LI. Risk adjustment for measuring health care outcomes. 3rd ed. Chicago: Health Administration Press, 2003.
4. Black N. Assessing the quality of hospitals. BMJ 2010;340:c2066.
5. Hawkes N. Patient coding and the ratings game. BMJ 2010;340:c2153.
6. Lilford R, Pronovost P. Using hospital mortality rates to judge hospital performance: a bad idea that just won’t go away. BMJ 2010;340:c2016.
7. Massachusetts Division of Health Care Finance and Policy. A hospital-wide mortality project review. June 3, 2009. (http://www.mass.gov/Ihqcc/docs/meetings/2009_06-03_Patient_Safety_Mortality_Measurement_Update.ppt#396.)
8. Mortality measurement. Rockville, MD: Agency for Healthcare Research and Quality, March 2009. (http://www.ahrq.gov/qual/mortality/.)
9. Aylin P, Bottle A, Jen MH, Middleton S. HSMR mortality indicators. London: Dr. Foster Research, February 23, 2010. (http://www.drfosterhealth.co.uk/docs/HSMR-methodology-2010.pdf.)
10. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979;86:420-8.
11. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159-74.
12. Jarman B, Gault S, Alves B, et al. Explaining differences in English hospital death rates using routinely collected data. BMJ 1999;318:1515-20.
13. Jarman B, Bottle A, Aylin P, Browne M. Monitoring changes in hospital standardised mortality ratios. BMJ 2005;330:329.
14. Jarman B. In defence of the hospital standardized mortality ratio. Healthc Pap 2008;8:37-42.
15. Zahn C, Baker M, MacNaughton J, Flemming C, Bell R. Hospital standardized mortality ratio is a useful burning platform. Healthc Pap 2008;8:50-3.
16. Wen E, Sandoval C, Zelmer J, Webster G. Understanding and using the hospital standardized mortality ratio in Canada: challenges and opportunities. Healthc Pap 2008;8:26-36.
17. McKinley J, Gibson D, Ardal S. Hospital standardized mortality ratio: the way forward in Ontario. Healthc Pap 2008;8:43-9.
18. Figler S. Data may reveal real issues. Healthc Pap 2008;8:54-6.
19. Brien SE, Ghali WA. CIHI’s hospital standardized mortality ratio: friend or foe? Healthc Pap 2008;8:57-61.
20. Nunnally JC, Bernstein IH. Psychometric theory. 3rd ed. New York: McGraw-Hill, 1994.
21. Crocker LM, Algina J. Introduction to classical and modern test theory. Mason, OH: Wadsworth Group/Thomson Learning, 2006.
22. Iezzoni LI. The risks of risk adjustment. JAMA 1997;278:1600-7.
23. Shojania KG, Forster AJ. Hospital mortality: when failure is not a good measure of success. CMAJ 2008;179:153-7.
24. Penfold RB, Dean S, Flemons W, Moffatt M. Do hospital standardized mortality ratios measure patient safety? HSMRs in the Winnipeg Regional Health Authority. Healthc Pap 2008;8:8-24.
25. Mohammed MA, Deeks JJ, Girling A, et al. Evidence of methodological bias in hospital standardised mortality ratios: retrospective database study of English hospitals. BMJ 2009;338:b780. [Erratum, BMJ 2009;338:b1348.]
26. Pitches DW, Mohammed MA, Lilford RJ. What is the empirical evidence that hospitals with higher-risk adjusted mortality rates provide poorer quality care? A systematic review of the literature. BMC Health Serv Res 2007;7:91.
27. Using hospital standardized mortality ratios for public reporting: a comment by the Consortium of Chief Quality Officers. Am J Med Qual 2009;24:164-5.
28. Lilford R, Mohammed MA, Spiegelhalter D, Thomson R. Use and misuse of process and outcome data in managing performance of acute medical care: avoiding institutional stigma. Lancet 2004;363:1147-54.
29. Thomas JW, Holloway JJ, Guire KE. Validating risk-adjusted mortality as an indicator for quality of care. Inquiry 1993;30:6-22.
30. Thomas JW, Hofer TP. Accuracy of risk-adjusted mortality rate as a measure of hospital quality of care. Med Care 1999;37:83-92.
31. Thomas JW, Hofer TP. Research evidence on the validity of risk-adjusted mortality rate as a measure of hospital quality of care. Med Care Res Rev 1998;55:371-404.
32. Hofer TP, Hayward RA. Identifying poor-quality hospitals: can hospital mortality rates detect quality problems for medical diagnoses? Med Care 1996;34:737-53.
33. Zalkind DL, Eastaugh SR. Mortality rates as an indicator of hospital quality. Hosp Health Serv Adm 1997;42:3-15.
34. Park RE, Brook RH, Kosecoff J, et al. Explaining variations in hospital death rates: randomness, severity of illness, quality of care. JAMA 1990;264:484-90.
35. Dubois RW, Rogers WH, Moxley JH III, Draper D, Brook RH. Hospital inpatient mortality — is it a predictor of quality? N Engl J Med 1987;317:1674-80.
36. Jencks SF, Daley J, Draper D, Thomas N, Lenhart G, Walker J. Interpreting hospital mortality data: the role of clinical risk adjustment. JAMA 1988;260:3611-6.
37. Best WR, Cowper DC. The ratio of observed-to-expected mortality as a quality of care indicator in non-surgical VA patients. Med Care 1994;32:390-400.
38. Blumberg MS. Comments on HCFA hospital death rate statistical outliers. Health Serv Res 1987;21:715-39.
39. Austin PC, Naylor CD, Tu JV. A comparison of a Bayesian vs. a frequentist method for profiling hospital performance. J Eval Clin Pract 2001;7:35-45.
40. Rothberg MB, Morsi E, Benjamin EM, Pekow PS, Lindenauer PK. Choosing the best hospital: the limitations of public quality reporting. Health Aff (Millwood) 2008;27:1680-7.
41. Shapiro MF, Park RE, Keesey J, Brook RH. The effect of alternative case-mix adjustments on mortality differences between municipal and voluntary hospitals in New York City. Health Serv Res 1994;29:95-112.
Copyright © 2010 Massachusetts Medical Society.