Special Article

Variability in the Measurement of Hospital-wide Mortality Rates

David M. Shahian, M.D., Robert E. Wolf, M.Sc., Lisa I. Iezzoni, M.D., Leslie Kirle, M.P.H., and Sharon-Lise T. Normand, Ph.D.

From Massachusetts General Hospital (D.M.S., L.I.I.); Harvard Medical School (D.M.S., R.E.W., L.I.I., S.-L.T.N.); Massachusetts Division of Health Care Finance and Policy (L.K.); and Harvard School of Public Health (S.-L.T.N.) — all in Boston. Address reprint requests to Dr. Shahian at the Center for Quality and Safety and Department of Surgery, Massachusetts General Hospital, 55 Fruit St., Boston, MA 02114, or at [email protected]. N Engl J Med 2010;363:2530-9.

Abstract

Background
Several countries use hospital-wide mortality rates to evaluate the quality of hospital care, although the usefulness of this metric has been questioned. Massachusetts policymakers recently requested an assessment of methods to calculate this aggregate mortality metric for use as a measure of hospital quality.

Methods
The Massachusetts Division of Health Care Finance and Policy provided four vendors with identical information on 2,528,624 discharges from Massachusetts acute care hospitals from October 1, 2004, through September 30, 2007. Vendors applied their risk-adjustment algorithms and provided predicted probabilities of in-hospital death for each discharge, as well as hospital-level observed and expected mortality rates. We compared the numbers and characteristics of discharges and hospitals included by each of the four methods. We also compared hospitals' standardized mortality ratios and the classification of hospitals as having mortality rates that were higher or lower than expected, according to each method.

Results
The proportions of discharges that were included by each method ranged from 28% to 95%, and the severity of patients' diagnoses varied widely. Because of their discharge-selection criteria, two methods calculated in-hospital mortality rates (4.0% and 5.9%) that were twice the state average (2.1%). Pairwise associations (Pearson correlation coefficients) of discharge-level predicted mortality probabilities ranged from 0.46 to 0.70. Hospital-performance categorizations varied substantially and were sometimes completely discordant. In 2006, a total of 12 of 28 hospitals that had higher-than-expected hospital-wide mortality when classified by one method had lower-than-expected mortality when classified by one or more of the other methods.

Conclusions
Four common methods for calculating hospital-wide mortality produced substantially different results. This may have resulted from a lack of standardized national eligibility and exclusion criteria, different statistical methods, or fundamental flaws in the hypothesized association between hospital-wide mortality and quality of care. (Funded by the Massachusetts Division of Health Care Finance and Policy.)

Hospital-performance metrics are increasingly used for value-based purchasing and public reporting.
For example, Section 3001 of the Patient Protection and Affordable Care Act mandates incentive payments to hospitals that meet quality performance standards, which remain unspecified. In 2008, the Massachusetts Health Care Quality and Cost Council1 directed the Division of Health Care Finance and Policy (DHCFP) to study various approaches to estimating hospital-wide mortality rates as performance metrics.

Medicare released hospital-wide mortality rates for its beneficiaries beginning in 1986 but suspended publication of the rates in 1993 because of methodologic concerns.2,3 The United Kingdom, various European countries, and Canada currently publish variants of such metrics, but questions remain regarding their clinical rationale, their usefulness for informing consumers and practitioners, and the statistical methods for calculating them.4-6 In response to the council's mandate, the DHCFP initiated systematic evaluations of various methods for calculating hospital-wide mortality rates.7 At the DHCFP's request, we studied four such commercial methods.

Methods

Study Design
In November 2008, the DHCFP issued a public call for methods to calculate risk-adjusted, hospital-wide mortality rates with the use of data from standard hospital discharge abstracts. Five commercial vendors responded, and two elected to develop a joint method, resulting in four methods for inclusion in this study. The four vendors are 3M Health Information Systems (3M),8 which used a patient-classification system called All Patient Refined Diagnosis Related Groups; the Dr. Foster Unit at Imperial College London (Dr. Foster)9; Thomson Reuters8; and University HealthSystem Consortium (UHC)–Premier (the last representing the joint work of two of the vendors).8 Each vendor specified its own methodologic approach. (Details are provided in the Supplementary Appendix, available with the full text of this article at NEJM.org.)

The DHCFP sent each vendor identical computerized files representing all discharges from Massachusetts general acute care hospitals from October 1, 2004, through September 30, 2007 (fiscal years 2005 through 2007). These data included demographic information, admission source and type, up to 15 discharge diagnoses and 15 procedure codes, and indicators of vital status (alive or dead) at discharge. We did not provide vendors with data on previous or subsequent hospitalizations, nor did we provide any information on outcomes after discharge, such as 30-day mortality. We asked vendors to apply their risk-adjustment algorithms to these data and to return the results of their analyses, including the predicted probability of in-hospital death for each individual discharge and hospital-level estimates (e.g., observed and expected mortality rates) for each hospital.

Comparative Analyses
We performed three principal analyses of the four methods. First, because each method used different inclusion and exclusion criteria in its mortality algorithm, we calculated the numbers of discharges and hospitals that were included by each method according to fiscal year and the overall 3-year period. For each method and for Massachusetts overall, we compared the attributes of discharges and hospitals, including in-hospital mortality rates; the age, sex, and race or ethnic group of patients; distributions of major diagnostic categories; and the mean length of stay and median number of discharges.
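Computationally, this first analysis reduces to descriptive statistics over each method's included discharges. The following is a minimal sketch of that comparison, assuming a hypothetical discharge-level file with one row per discharge, a death indicator, a hospital identifier, and one boolean inclusion flag per method; the file and column names are illustrative and are not the actual DHCFP file layout.

```python
import pandas as pd

# Hypothetical schema: one row per discharge, with a death indicator and
# one boolean inclusion flag per vendor method (column names are illustrative).
df = pd.read_csv("ma_discharges_fy2005_2007.csv")
methods = ["uhc_premier", "m3", "thomson_reuters", "dr_foster"]

rows = []
for m in ["all"] + methods:
    cohort = df if m == "all" else df[df[f"included_{m}"]]
    rows.append({
        "method": m,
        "n_discharges": len(cohort),
        "pct_of_total": 100 * len(cohort) / len(df),
        "inpatient_mortality_pct": 100 * cohort["died"].mean(),
        "median_discharges_per_hospital": cohort.groupby("hospital_id").size().median(),
        "mean_age": cohort["age"].mean(),
        "mean_los_days": cohort["los_days"].mean(),
    })
print(pd.DataFrame(rows).round(1))
```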
Second, we calculated Pearson correlation coefficients for individual discharge-level predicted probabilities of in-hospital death between pairs of methods. We performed the calculation in two ways: first using the set of discharges common to all four methods and then using the discharges common to each pair of methods.

Third, because the goal of each method was to provide a measure of hospital quality, we examined how well hospital-specific performance estimates agreed among methods. For each method, we used the predicted and actual mortality rates that were based on the discharges included by that risk-adjustment method. Some methods produced risk-standardized mortality ratios, whereas others produced risk-standardized mortality rates. We converted all measures to ratios, multiplied these values by 100, and then examined pairwise correlations of the ratios between methods, using Pearson correlation coefficients. We estimated three correlations: one with no weighting, one weighted by the smaller number of hospital discharges analyzed by any two methods, and one weighted by the larger number of discharges analyzed by any two methods.

To quantify consistency among methods, we also calculated the intraclass correlation coefficient (method ICC[3,1] of Shrout and Fleiss10), using an analysis-of-variance procedure that modeled standardized mortality ratios as a function of method fixed effects and hospital random effects. This approach reflected two of our key assumptions: that only these four specific methods were of interest and that the discharges represented a sample of discharges that could change from year to year.
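To make these calculations concrete, the sketch below implements the conversion of per-hospital observed and expected deaths to a standardized mortality ratio on the ×100 scale, a discharge-weighted Pearson correlation, and ICC(3,1) of Shrout and Fleiss10 computed from two-way ANOVA mean squares. This is a minimal illustration under stated assumptions, not the vendors' proprietary code; the function names and the reduction of each method's output to observed and expected deaths per hospital are ours.

```python
import numpy as np

def smr_times_100(observed_deaths, expected_deaths):
    """Standardized mortality ratio, expressed on the x100 scale used here."""
    return 100.0 * np.asarray(observed_deaths) / np.asarray(expected_deaths)

def weighted_pearson(x, y, w):
    """Pearson correlation of hospital-level ratios x and y with weights w
    (e.g., the smaller or larger number of discharges analyzed by the two
    methods being compared)."""
    x, y, w = (np.asarray(v, dtype=float) for v in (x, y, w))
    mx, my = np.average(x, weights=w), np.average(y, weights=w)
    cov = np.average((x - mx) * (y - my), weights=w)
    vx = np.average((x - mx) ** 2, weights=w)
    vy = np.average((y - my) ** 2, weights=w)
    return cov / np.sqrt(vx * vy)

def icc_3_1(ratings):
    """ICC(3,1) of Shrout and Fleiss: two-way model with fixed 'raters'
    (the four methods) and random targets (hospitals), single ratings.
    ICC(3,1) = (BMS - EMS) / (BMS + (k - 1) * EMS), where BMS is the
    between-hospital mean square and EMS the residual mean square.
    `ratings` has shape (n_hospitals, k_methods)."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    hosp_means = x.mean(axis=1)
    meth_means = x.mean(axis=0)
    bms = k * ((hosp_means - grand) ** 2).sum() / (n - 1)   # between hospitals
    resid = x - hosp_means[:, None] - meth_means[None, :] + grand
    ems = (resid ** 2).sum() / ((n - 1) * (k - 1))          # residual
    return (bms - ems) / (bms + (k - 1) * ems)
```

With four methods and, for example, 78 hospitals, `ratings` would be a 78×4 matrix of standardized mortality ratios for a single fiscal year.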
Comparing Estimates of Hospital Performance
Each method indicated whether a hospital was a mortality outlier in each of the 3 years. We compared how each method categorized each hospital into one of three mutually exclusive groups: higher-than-expected mortality, as-expected mortality, or lower-than-expected mortality. Typically, outlier designations were based on P values of 0.05 or on 95% confidence intervals (see the Supplementary Appendix). However, the Dr. Foster method provided outlier designations on the basis of 3 years of data, along with annual standardized mortality ratios (including standard errors) for each hospital. To be consistent with the other methods, we used the Dr. Foster annual estimates and standard errors to identify hospitals with lower-than-expected, as-expected, or higher-than-expected mortality, with a P value of less than 0.05 indicating statistical significance. (The Dr. Foster method typically uses funnel charts with 99.8% control limits.) We quantified agreement between method pairs using kappa statistics and characterized the strength of agreement using the approach of Landis and Koch.11
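A minimal sketch of the outlier designation and the pairwise agreement calculation follows, assuming that each hospital's annual standardized mortality ratio (×100) and its standard error are available. The normal approximation for the P value and the three-level coding are our illustrative choices; they mirror, but are not, the vendors' implementations.

```python
import numpy as np
from scipy.stats import norm

def classify_hospital(smr, se, alpha=0.05):
    """Return -1 (lower than expected), 0 (as expected), or +1 (higher than
    expected) from an annual SMR on the x100 scale and its standard error,
    using a two-sided test at the 0.05 level (normal approximation)."""
    z = (smr - 100.0) / se
    p = 2.0 * norm.sf(abs(z))
    if p >= alpha:
        return 0
    return 1 if z > 0 else -1

def cohen_kappa(a, b):
    """Unweighted kappa for two methods' categorizations of the same hospitals.
    Landis and Koch bands: 0.00-0.20 slight, 0.21-0.40 fair, 0.41-0.60
    moderate, 0.61-0.80 substantial, 0.81-1.00 almost perfect."""
    a, b = np.asarray(a), np.asarray(b)
    cats = np.union1d(a, b)
    p_obs = float(np.mean(a == b))                                   # observed agreement
    p_exp = float(sum(np.mean(a == c) * np.mean(b == c) for c in cats))  # chance agreement
    return (p_obs - p_exp) / (1.0 - p_exp)
```

(scikit-learn's `cohen_kappa_score` computes the same unweighted kappa.)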
Results

Discharge and Hospital Cohorts
A total of 83 Massachusetts hospitals had 2,528,624 hospital discharges during the study period, with an overall in-hospital mortality rate of 2.1% (Table 1). Two methods excluded some hospitals from their mortality calculations. Percentages of discharges that were included by each method ranged from 28% for UHC–Premier to 95% for 3M.

Methods varied widely not only in the numbers of discharges analyzed but also in the distributions of included diagnoses. For example, discharges analyzed by the UHC–Premier and Dr. Foster methods had overall in-hospital mortality rates of 5.9% and 4.0%, respectively, perhaps because these methods included higher proportions of patients with respiratory or cardiovascular diagnoses than did the other methods.

Table 1. Hospital Discharges and Populations Studied for Fiscal Years 2005 through 2007, According to Four Methods of Measuring Hospital-wide Mortality.*

Characteristic                          UHC–Premier  3M         Thomson Reuters  Dr. Foster  All Discharges  Included in Every Method
Hospitals
  No. included                          81           83         83               82          83              81
  Percent of total                      98           100        100              99          100             98
Discharges
  No. included                          716,315      2,406,881  2,048,377        1,072,918   2,528,624       567,784
  Median no. per hospital               7,114        21,926     18,747           10,140      23,428          5,870
  Percent of total                      28           95         81               42          100             22
Mean length of stay per hospital (days) 7.0          5.8        5.7              6.1         5.8             6.5
Inpatient mortality rate (%)            5.9          2.0        2.4              4.0         2.1             6.2
Mean age (yr)                           66±23        50±28      56±24            52±33       51±28           69±20
Female sex (%)                          52           58         59               50          57              52
Race or ethnic group (%)†
  White                                 84           78         79               79          78              84
  Black                                 5            6          6                6           6               5
  Asian                                 1            2          2                2           2               1
  Hispanic                              4            6          6                5           6               4
  Other or unknown                      6            8          7                8           8               6
Major diagnostic category (%)‡
  Nervous system                        6.4          5.7        6.1              5.1         5.7             5.5
  Respiratory system                    25.0         9.7        11.2             17.1        9.9             26.2
  Circulatory system                    25.9         15.1       17.1             25.5        15.5            30.4
  Digestive system                      10.8         9.5        10.5             6.2         9.4             8.5
  Musculoskeletal or connective tissue  5.2          8.9        8.5              3.4         8.7             4.2
  Metabolic disease                     5.2          3.6        4.0              5.2         3.5             6.3
  Kidney and urinary tract              9.7          4.3        4.8              7.1         4.2             10.4
  Pregnancy or childbirth               <0.1         10.4       12.2             <0.1        10.0            <0.1
  Newborn or neonate                    3.3          10.0       <0.1             21.7        9.8             0
  Mental diseases and disorders         0            4.7        5.1              <0.1        4.6             0

*Plus–minus values are means ±SD. For the study, the fiscal year was October 1 through September 30. The term 3M denotes 3M Health Information Systems, and UHC University HealthSystem Consortium.
†Race was self-reported. Because Massachusetts did not add Hispanic as an ethnic group until October 1, 2007, subjects in this study were categorized with the use of a hierarchical algorithm.
‡Listed are the 10 major diagnostic categories that accounted for more than 5% of the patients included by at least one vendor. A complete listing of all major diagnostic categories is available in the Supplementary Appendix.

Discharge-Level Mortality
Of 2,528,624 total discharges, 567,784 (22%) were included in all four methods (Table 1). These 22% of discharges involved patients who were older and had higher in-hospital mortality, longer lengths of stay, and more respiratory and circulatory system diagnoses than patients in Massachusetts overall. Individual discharge-level predicted probabilities of in-hospital death ranged from 0 to 0.999 among the four methods (data not shown). Pairwise associations of individual predicted probabilities for the 22% of discharges that were common among the four methods ranged from 0.46 (for Thomson Reuters vs. Dr. Foster in fiscal year 2005) to 0.70 (for UHC–Premier vs. 3M in fiscal year 2005) (Table 2). Estimated Pearson correlation coefficients were slightly lower among pairs when discharges that were common to each pair of methods were used rather than discharges that were common to all methods (data not shown).

Table 2. Correlation Coefficients for Agreement among Discharge-Level Estimates of In-Hospital Risk of Death for Fiscal Years 2005 through 2007.*

Method and Year    UHC–Premier  3M    Thomson Reuters  Dr. Foster
UHC–Premier
  2005             1.00         0.70  0.63             0.60
  2006             1.00         0.69  0.62             0.60
  2007             1.00         0.67  0.59             0.59
3M
  2005                          1.00  0.57             0.61
  2006                          1.00  0.58             0.61
  2007                          1.00  0.58             0.61
Thomson Reuters
  2005                                1.00             0.46
  2006                                1.00             0.48
  2007                                1.00             0.48

*Listed are Pearson correlation coefficients that were calculated on the basis of hospital discharges that were common to all four methods. The total numbers of these discharges were 186,170 for the 2005 fiscal year, 186,888 for the 2006 fiscal year, and 194,726 for the 2007 fiscal year. In this study, the fiscal year was October 1 to September 30. The term 3M denotes 3M Health Information Systems, and UHC University HealthSystem Consortium.
Hospital-Level Mortality
The four methods produced different estimates of hospital-level mortality and different assessments of whether individual hospitals had higher-than-expected or lower-than-expected mortality. Every method provided hospital-wide estimates for 77 of the 83 Massachusetts hospitals for discharges in fiscal year 2005, for 78 hospitals in fiscal year 2006, and for 75 hospitals in fiscal year 2007. Excluded hospitals were usually specialty facilities (e.g., rehabilitation hospitals). Pairwise correlations of hospital-wide standardized mortality ratios depended on the weighting of measures, ranging from 0.32 to 0.74 (Table 3). UHC–Premier and 3M had the strongest linear correlations, regardless of the weighting method used.

Table 3. Correlation among 78 Hospital-Level Standardized Mortality Ratios for Fiscal Year 2007, According to Pairs of Vendor Methods.*

Methods                          Equally Weighted  Weighted by Smaller No. of Discharges  Weighted by Larger No. of Discharges
UHC–Premier vs. 3M               0.74              0.68                                   0.65
UHC–Premier vs. Thomson Reuters  0.38              0.57                                   0.44
UHC–Premier vs. Dr. Foster       0.53              0.38                                   0.40
3M vs. Thomson Reuters           0.48              0.62                                   0.43
3M vs. Dr. Foster                0.48              0.37                                   0.37
Thomson Reuters vs. Dr. Foster   0.36              0.42                                   0.32

*Listed are Pearson correlation coefficients that were calculated with the use of different weighting methods for each pair of measurement methods. For example, the correlation for estimated standardized mortality ratios (SMRs) between UHC–Premier and 3M is 0.74 when each of the 78 pairs of SMRs is weighted equally, 0.68 when each pair is weighted by the smaller number of discharges analyzed by the two methods, and 0.65 when each pair is weighted by the larger number of discharges. The term 3M denotes 3M Health Information Systems, and UHC University HealthSystem Consortium.

Using fiscal year 2007 data, Thomson Reuters produced an estimate for Hospital C, a large general hospital, that differed from the estimates produced by all the other methods (Fig. 1). The Thomson Reuters estimate was based on 3% of Hospital C's total discharges, whereas the other methods used more than 30% of the discharges. Patients who were included in the Thomson Reuters analysis of Hospital C had a higher percentage of diagnoses that are generally associated with an increased risk of death (e.g., respiratory disease, cancer, infectious disease, and injuries) than the state population overall. Conversely, diagnoses with a low risk of death (e.g., pregnancy and childbirth) were underrepresented in the Thomson Reuters cohort for Hospital C. The mortality rate among these 3% of discharges was 59.8%, which was substantially higher than Hospital C's overall mortality rate (2.2%).

[Figure 1: six scatter plots (Panels A through F) pairing the hospital-level SMRs (×100) of each method against those of the others, with the Hospital C point circled in the Thomson Reuters panels.]
Figure 1. Comparison of Hospital-Level Standardized Mortality Ratios (SMRs) for Fiscal Year 2007, According to Four Measurement Methods. Shown are scatter plots of SMRs, which have been multiplied by 100, as calculated by University HealthSystem Consortium (UHC)–Premier and plotted against data from 3M Health Information Systems (3M) (Panel A), Thomson Reuters (Panel B), and the Dr. Foster Unit at Imperial College London (Dr. Foster) (Panel C); as calculated by Thomson Reuters and plotted against data from 3M (Panel D) and Dr. Foster (Panel F); and as calculated by Dr. Foster and plotted against data from 3M (Panel E). The diagonal line indicates identical results from the two methods being compared. Panels B, D, and F show the Thomson Reuters estimates as compared with those of the other three vendors. Data points for Hospital C as estimated by Thomson Reuters on the basis of 3% of total discharges (circled) diverge substantially from all other estimates. In this study, the fiscal year was October 1 through September 30.

Intraclass correlation coefficients indicated consistency among the methods in fiscal year 2005 (0.73) and fiscal year 2006 (0.80), although the fiscal year 2007 intraclass correlation coefficient was lower (0.45). The percentages of hospitals that were assessed as having lower-than-expected or higher-than-expected mortality were high and varied among methods (Fig. 2). For example, in fiscal year 2005, the Dr. Foster method designated 15 of 77 hospitals (19%) as having higher-than-expected hospital-wide mortality, as compared with 38 of 77 hospitals (49%) so designated by 3M.

[Figure 2: bar charts of the percent of hospitals (0 to 50%) with lower-than-expected and higher-than-expected mortality according to each method, for fiscal year 2005 (Panel A, 77 hospitals), fiscal year 2006 (Panel B, 78 hospitals), and fiscal year 2007 (Panel C, 75 hospitals).]
Figure 2. Percentages of Hospitals with Mortality Rates Higher or Lower Than Expected for Fiscal Years 2005 through 2007, According to Four Measurement Methods.

Kappa statistics indicated poor-to-substantial agreement11 between methods in classifying hospital mortality performance, depending on the year and the method pairs. In fiscal year 2007, kappa statistics for agreement between methods in designating higher-than-expected outliers ranged from −0.04 (UHC–Premier and 3M) to 0.39 (Thomson Reuters and Dr. Foster). For some individual hospitals, categorizations varied widely and in some cases were completely discordant. For instance, in fiscal year 2006, of 28 hospitals designated as having higher-than-expected hospital-wide mortality by one method, 12 were simultaneously classified as having lower-than-expected mortality by other methods (6 by one method, 3 by two methods, and 3 by three methods).

Discussion

The goal of assessing hospital-wide mortality rates is to make inferences about the relative quality of care among hospitals. Proponents believe that hospital-wide mortality metrics provide useful warning flags about problems with the quality of inpatient care, aid consumers in choosing a hospital, and help provide a focus for hospital quality-improvement activities.12-19 In addition to parsimony, this single, overall measure of hospital performance has other potentially attractive features. In-hospital death is unambiguous, usually undesired, and tabulated accurately. Hospital-wide mortality encompasses wide-ranging diagnoses, perhaps providing more expansive insight than metrics that are based on single diagnoses. Hospital-wide results may uncover cross-cutting concerns, such as infection control, handoffs of patient care between shifts, and care coordination.

The four commercially available methods for assessing hospital-wide mortality that we studied are marketed to hospitals to support internal quality-improvement activities. However, their implications are even more important, and the corresponding need for methodologic accuracy is greater, when such measures are used for broader initiatives, such as public reporting or performance-based purchasing.
We found that estimates of hospital-wide mortality could vary, sometimes widely, among methods, consequently leading to different inferences regarding the quality of hospital care. Although these analyses used Massachusetts data, our results should be generalizable to other hospitalized populations.

Explanations for these discrepancies are largely embedded within the different inclusion and exclusion criteria (for patients, diagnoses, and hospital types) among the four methods and in how each method analyzed hospital discharge abstract data to quantify in-hospital risks of death. These methodologic differences produced large differences in the patient cohorts that were evaluated by each approach, with only one fifth of discharges included by all methods. Thus, it is not surprising that different methods rated the same individual hospitals at opposite ends of the spectrum of mortality categories (higher than expected vs. lower than expected). The observed differences in eligibility and exclusion criteria among methods may reflect the degree to which those who developed the methods were willing to accept uncertainty in the presumed association between mortality and hospital quality. With more liberal inclusion criteria, the analyses would probably include some diagnoses for which the linkages between in-hospital mortality and quality are problematic. In addition to differences in case-selection criteria, other methodologic differences (Table 2 in the Supplementary Appendix) may also have contributed to the discrepant results of the four methods.

Differences in categorizing performance on the basis of hospital-wide mortality rates raise the inevitable question of which method best identifies potential quality problems. Our study could not address that question, since an observable benchmark for overall hospital quality does not exist. Without such a standard, our analytic approach relied on convergence, another indicator of validity (i.e., the extent of agreement among results obtained with different measurement methods20,21). If different methods ostensibly aim to capture the same construct (e.g., the quality of hospital care) and their results are similar, such convergence is reassuring. It is one piece of evidence that these methods provide valid information about that construct. In our study, hospital-level results differed among methods, sometimes substantially, and in some instances these differences would lead to completely divergent inferences. This disagreement suggests that the methods are not all reflecting the same underlying construct, although it is possible that one method might perform better than the others in estimating the quality of hospital care.

The substantial differences we observed among methods for estimating hospital-wide mortality are similar to the findings of a study22 of condition-specific mortality rates and hospital performance. The aggregation of results from many disparate diagnoses is methodologically complex and may accentuate the magnitude of differences among methods.

A broader question is whether in-hospital mortality rates are valid quality indicators.4-6,23-28
Studies have examined the relationships between hospital mortality rates for individual conditions and judgments about quality generated through medical-record reviews conducted by experienced clinicians. In general, the correlations have been poor. This finding may reflect the absence of an association between in-hospital mortality and quality or, given the study designs, the confounding effects of small samples and randomness, inadequate risk adjustment, coding problems, or other methodologic concerns.22,25,26,28-41 These issues are even more worrisome for hospital-wide mortality measures that encompass a diverse portfolio of conditions and procedures.

For some diagnoses (e.g., major trauma or advanced cancer), linking in-hospital mortality with hospital quality is clearly problematic. Regardless of whether a palliative care or "do not resuscitate" (DNR) order is written, patients with such diagnoses still have a higher likelihood of dying than patients overall, even if a high quality of care is delivered. The four methods we examined used different criteria for identifying such patients and for determining whether to include them in hospital-wide mortality estimates.

Differences in hospital mortality findings among methods also probably relate to inadequacies of the data source (currently limited to administrative claims data for hospital-wide approaches) and to the methods' different accommodations for such problems. For example, although Massachusetts discharge abstracts contain status codes for patients receiving palliative care and for those with DNR orders, individual hospitals report these markers inconsistently.
The four methods varied in how they handled these indicators or whether they were considered at all. Although the inclusion of these factors makes clinical sense, a potential for "gaming" exists when risk-adjusted mortality rates are used for public reporting or for determining reimbursement.5 If quality problems occur during a patient's admission that increase the likelihood of death, hospitals might code such patients as having DNR orders or move them to palliative care status. Some observers have asserted that increased coding for palliative care might have accounted in part for an apparent temporal reduction in hospital-wide mortality rates in the United Kingdom.5

The use of data on hospital discharge abstracts is also complicated by the difficulty in distinguishing illnesses that were intrinsic at the time of patients' admissions from complications that arose during their subsequent care. If risk-adjustment calculations misclassify complications as coexisting illnesses, this might mask the very quality problems that hospital-mortality metrics aim to identify. One potential solution is the recent requirement to add "present on admission" (POA) flags to discharge diagnoses. Massachusetts began requiring POA coding for all hospital discharges in fiscal year 2006. However, hospitals have been slow to report these POA indicators consistently, and the quality and accuracy of such indicators remain questionable. Although some fraction of our data set included POA indicators, this information was too inconsistent and inadequate to consider in these analyses — an important limitation of our study.

Despite these concerns, some individual hospital findings could be useful to potential consumers or to hospitals. In fiscal year 2006, all four methods identified three hospitals as having higher-than-expected mortality. In fiscal year 2007, all four methods identified one hospital as having lower-than-expected mortality and another hospital as having higher-than-expected mortality. Given this consistency, it is likely that these hospitals differ somehow from others, but we were unable to investigate these findings further. Furthermore, in real-world settings, agencies or hospitals are unlikely to use more than one method of mortality measurement. Since they typically must select a single method, opportunities for comparing results among methods — and therefore identifying consistently high or low outlier hospitals — are unavailable.
Notwithstanding differences in eligibility and exclusion criteria and in statistical methods, the substantially different results we observed among methods may reflect flaws in the fundamental hypothesis that hospital-wide mortality is a valid metric for the quality of hospital care. Our study also does not rule out the possibility that estimating hospital-wide mortality rates on the basis of nationally standardized eligibility and exclusion criteria, with the use of fixed time intervals for end points (e.g., death within 30 days after admission), might produce more concordant results. However, the four modeling approaches were sufficiently dissimilar in other respects that greater agreement is by no means certain. If appropriate standards existed or were developed, evaluation of such an approach would be a logical next step.

Our results suggest that efforts to use hospital-wide mortality rates to evaluate the quality of care must proceed cautiously. At least among the four widely used and representative methods that we studied, different approaches to measurement produced different results, leading to different and sometimes completely discordant impressions about relative hospital performance. These issues will grow in urgency and importance as state and national mandates for developing and reporting hospital-quality metrics are implemented. One potential alternative to the use of hospital-wide mortality rates as a metric would be to estimate hospital quality on the basis of a more limited subgroup of diagnoses for which the link between mortality and quality is most plausible, sample sizes and end points are adequate, and credible risk models are available or can be developed.

The views expressed in this article are those of the authors and do not necessarily reflect the views of the DHCFP, the Executive Office of Health and Human Services, or the Commonwealth of Massachusetts.

Supported by the DHCFP.

Disclosure forms provided by the authors are available with the full text of this article at NEJM.org.

References
1. Commonwealth of Massachusetts. General law: chapter 6A. (http://www.mass.gov/legis/laws/mgl/6a/6a-16k.htm.)
2. Krakauer H, Bailey RC, Skellan KJ, et al. Evaluation of the HCFA model for the analysis of mortality following hospitalization. Health Serv Res 1992;27:317-35.
3. Iezzoni LI. Risk adjustment for measuring health care outcomes. 3rd ed. Chicago: Health Administration Press, 2003.
4. Black N. Assessing the quality of hospitals. BMJ 2010;340:c2066.
5. Hawkes N. Patient coding and the ratings game. BMJ 2010;340:c2153.
6. Lilford R, Pronovost P. Using hospital mortality rates to judge hospital performance: a bad idea that just won't go away. BMJ 2010;340:c2016.
7. Massachusetts Division of Health Care Finance and Policy. A hospital-wide mortality project review. June 3, 2009. (http://www.mass.gov/Ihqcc/docs/meetings/2009_06-03_Patient_Safety_Mortality_Measurement_Update.ppt#396.)
8. Mortality measurement. Rockville, MD: Agency for Healthcare Research and Quality, March 2009. (http://www.ahrq.gov/qual/mortality/.)
9. Aylin P, Bottle A, Jen MH, Middleton S. HSMR mortality indicators. London: Dr. Foster Research, February 23, 2010. (http://www.drfosterhealth.co.uk/docs/HSMR-methodology-2010.pdf.)
10. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979;86:420-8.
11. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159-74.
12. Jarman B, Gault S, Alves B, et al. Explaining differences in English hospital death rates using routinely collected data. BMJ 1999;318:1515-20.
13. Jarman B, Bottle A, Aylin P, Browne M. Monitoring changes in hospital standardised mortality ratios. BMJ 2005;330:329.
14. Jarman B. In defence of the hospital standardized mortality ratio. Healthc Pap 2008;8:37-42.
15. Zahn C, Baker M, MacNaughton J, Flemming C, Bell R. Hospital standardized mortality ratio is a useful burning platform. Healthc Pap 2008;8:50-3.
16. Wen E, Sandoval C, Zelmer J, Webster G. Understanding and using the hospital standardized mortality ratio in Canada: challenges and opportunities. Healthc Pap 2008;8:26-36.
17. McKinley J, Gibson D, Ardal S. Hospital standardized mortality ratio: the way forward in Ontario. Healthc Pap 2008;8:43-9.
18. Figler S. Data may reveal real issues. Healthc Pap 2008;8:54-6.
19. Brien SE, Ghali WA. CIHI's hospital standardized mortality ratio: friend or foe? Healthc Pap 2008;8:57-61.
20. Nunnally JC, Bernstein IH. Psychometric theory. 3rd ed. New York: McGraw-Hill, 1994.
21. Crocker LM, Algina J. Introduction to classical and modern test theory. Mason, OH: Wadsworth Group/Thomson Learning, 2006.
22. Iezzoni LI. The risks of risk adjustment. JAMA 1997;278:1600-7.
23. Shojania KG, Forster AJ. Hospital mortality: when failure is not a good measure of success. CMAJ 2008;179:153-7.
24. Penfold RB, Dean S, Flemons W, Moffatt M. Do hospital standardized mortality ratios measure patient safety? HSMRs in the Winnipeg Regional Health Authority. Healthc Pap 2008;8:8-24.
25. Mohammed MA, Deeks JJ, Girling A, et al. Evidence of methodological bias in hospital standardised mortality ratios: retrospective database study of English hospitals. BMJ 2009;338:b780. [Erratum, BMJ 2009;338:b1348.]
26. Pitches DW, Mohammed MA, Lilford RJ. What is the empirical evidence that hospitals with higher-risk adjusted mortality rates provide poorer quality care? A systematic review of the literature. BMC Health Serv Res 2007;7:91.
27. Using hospital standardized mortality ratios for public reporting: a comment by the Consortium of Chief Quality Officers. Am J Med Qual 2009;24:164-5.
28. Lilford R, Mohammed MA, Spiegelhalter D, Thomson R. Use and misuse of process and outcome data in managing performance of acute medical care: avoiding institutional stigma. Lancet 2004;363:1147-54.
29. Thomas JW, Holloway JJ, Guire KE. Validating risk-adjusted mortality as an indicator for quality of care. Inquiry 1993;30:6-22.
30. Thomas JW, Hofer TP. Accuracy of risk-adjusted mortality rate as a measure of hospital quality of care. Med Care 1999;37:83-92.
31. Idem. Research evidence on the validity of risk-adjusted mortality rate as a measure of hospital quality of care. Med Care Res Rev 1998;55:371-404.
32. Hofer TP, Hayward RA. Identifying poor-quality hospitals: can hospital mortality rates detect quality problems for medical diagnoses? Med Care 1996;34:737-53.
33. Zalkind DL, Eastaugh SR. Mortality rates as an indicator of hospital quality. Hosp Health Serv Adm 1997;42:3-15.
34. Park RE, Brook RH, Kosecoff J, et al. Explaining variations in hospital death rates: randomness, severity of illness, quality of care. JAMA 1990;264:484-90.
35. Dubois RW, Rogers WH, Moxley JH III, Draper D, Brook RH. Hospital inpatient mortality — is it a predictor of quality? N Engl J Med 1987;317:1674-80.
36. Jencks SF, Daley J, Draper D, Thomas N, Lenhart G, Walker J. Interpreting hospital mortality data: the role of clinical risk adjustment. JAMA 1988;260:3611-6.
37. Best WR, Cowper DC. The ratio of observed-to-expected mortality as a quality of care indicator in non-surgical VA patients. Med Care 1994;32:390-400.
38. Blumberg MS. Comments on HCFA hospital death rate statistical outliers. Health Serv Res 1987;21:715-39.
39. Austin PC, Naylor CD, Tu JV. A comparison of a Bayesian vs. a frequentist method for profiling hospital performance. J Eval Clin Pract 2001;7:35-45.
40. Rothberg MB, Morsi E, Benjamin EM, Pekow PS, Lindenauer PK. Choosing the best hospital: the limitations of public quality reporting. Health Aff (Millwood) 2008;27:1680-7.
41. Shapiro MF, Park RE, Keesey J, Brook RH. The effect of alternative case-mix adjustments on mortality differences between municipal and voluntary hospitals in New York City. Health Serv Res 1994;29:95-112.

Copyright © 2010 Massachusetts Medical Society.