
International Journal for Quality in Health Care 1998; Volume 10, Number 5: pp. 443–447
Methodology Matters—XII
Identifying achievable benchmarks of care:
concepts and methodology
CATARINA I. KIEFE1, NORMAN W. WEISSMAN2, JEROAN J. ALLISON1, ROBERT FARMER3,
MICHAEL WEAVER1 AND O. DALE WILLIAMS1
1Department of Medicine, University of Alabama at Birmingham, 2Department of Health Services Administration, University of Alabama at Birmingham, and 3Alabama Quality Assurance Foundation, Inc., Birmingham, AL, USA
Abstract
Webster’s Dictionary defines a benchmark as ‘something that serves as a standard by which others can be measured’. Benchmarking pervades the health care quality improvement literature, yet benchmarks are usually based on subjective assessment rather than on measurements derived from data. As such, benchmarks may fail to yield an achievable level of excellence that can be replicated under specific conditions. In this paper, we provide an overview of benchmarking in health care. We then describe the evolution of our data-driven method for identifying an Achievable Benchmark of Care (ABC™) on the basis of process-of-care indicators. Here, our experience leads us to postulate the following premises for sound benchmarks: (i) benchmarks should represent a level of excellence; (ii) benchmarks should be demonstrably attainable; (iii) providers with high performance should be selected from among all providers in a predefined way using reliable data; (iv) all providers with high performance levels should contribute to the benchmark level; and (v) providers with high performance levels but small numbers of cases should not unduly influence the level of the benchmark.
An example of an ABC™ applied to the Cooperative Cardiovascular Project leads the reader through the computation of an ABC™. Finally, we consider several refinements of the original ABC™ concept that are in progress, e.g. how to approach the special problems posed by very small denominators.
The ABC™ methodology has been well accepted in multiple quality improvement projects. This approach lends objectivity and reliability to benchmarks, a widely used but, until now, arbitrarily defined tool.
Keywords: benchmarks, continuous quality improvement, feedback, outcomes, quality improvement

Address correspondence to Catarina Kiefe, University of Alabama at Birmingham, 1717 Eleventh Avenue South, MT 700, Birmingham, AL 35205-4785, USA. Tel: +1 205 934 3773. Fax: +1 205 934 7959. E-mail: [email protected]
Benchmarking is generally described as the identification of
‘industry leaders’ so that their practices may be understood
and emulated [1–3]. Determination of benchmark performance must be based on objective criteria and must
incorporate what is reliably and validly measured [4,5]. As an
example in the health policy arena, a group of investigators
at Dartmouth College in the USA [6] and Schroeder [7] have
argued for the use of benchmarks to set targets for the
number of physicians in the workforce. Until recently, the
literature on benchmarking had not gone beyond definitions
such as, ‘benchmarking is the continuous process of measuring products, services and practices against the toughest
competitors or those known as leaders in the field’ [8,9]. We
have developed a methodology to identify peer-group-based, objective, reproducible, data-driven performance measures that we call the University of Alabama at Birmingham’s Achievable Benchmarks of Care (ABC™s).
Many recent advances in the methodology of quality
improvement have focused on the reliability and validity of
methods to measure how physicians and organizations perform specific functions of medical care. These are reflected in process-of-care indicators, such as mammography rates for women aged over 50 years. For each process-of-care indicator, a realistic, achievable, ‘best in class’ performance level should be identified. Since no process-of-care indicator is flawless or non-controversial, ‘perfect’ benchmarks are unlikely to be derived from provider data.
For example, patient preferences and health care access issues
that are not under the control of a provider make setting a
mammography target of 100% ill advised. Hence, establishing
benchmark performance as a realistic, achievable goal is of
utmost importance to achieve provider acceptance.
In previous research, members of our team developed
the ‘pared-mean’ method to define from data the best
achievable practice in the inpatient care of acute myocardial
infarction (AMI) [10]. The pared-mean method is generalizable to a wide variety of conditions and circumstances
including various health care settings, levels of aggregation
(national, state, local), and multiple data sources such as
claims and census data. This approach differs from the
more traditional benchmarking efforts in clinical settings,
where benchmarks have mostly been defined subjectively.
The pared-mean method applies quantitative techniques
to determine ‘top performance’, defined as the mean of
the best care achieved for at least 10% of the population
[11]. Making this determination is far from trivial. The
computation usually demands a more sophisticated approach
than taking a simple average, which may assign undue
weight to those with few cases and set the benchmark
too high.
Many quality improvement projects now present providers
with benchmarks and averages to aid in the transition
from performance measurement to performance improvement [12,13]. The dissemination of provider profiles
to employers, payers, and health care organizations and of
report cards to consumers and payers has recently attracted
much attention [14,15]. In our own research [10], we use the
provider profile approach for feedback and dissemination
through confidential letters to providers. Provider acceptance
of any of these forms of feedback is largely dependent
on the perceived validity of the measures [16]. We believe
that presenting provider profiles within the framework of an objective, data-driven, and achievable benchmark confers significant face as well as content validity to the profiles [10].
Data-driven benchmarks: early definition
and examples
The Cooperative Cardiovascular Project (CCP) was the first
project piloted under the Health Care Quality Improvement
Program of the US Health Care Financing Administration
(HCFA) [11]. The pilot CCP, which began in 1992, focused
on the routine management of patients with AMI [17].
Published clinical guidelines [18] were transformed into
computerized algorithms allowing quantitative assessment of
process-of-care indicators; these would measure appropriate
utilization of indicated therapies. Approximately 21 000
medical records were reviewed by the peer review organizations of Alabama, Connecticut, Iowa and Wisconsin
for the pilot CCP. A recent publication reports on the
effects of the initial measurement followed by feedback
and then remeasurement in 1995 [19]. The successful pilot
project became a national model and, since its beginning
in 1995, approximately 250 000 charts nationwide have
been abstracted by HCFA using these procedures.
Benchmarking in CCP
The initial CCP benchmarking analysis focused on six process-of-care indicators for AMI patients: smoking cessation counseling; aspirin, angiotensin-converting enzyme inhibitor, and beta-blocker prescriptions at discharge; and aspirin and low-dose heparin administration during hospitalization. Other indicators, such as thrombolysis, were addressed at a later date.
Three plausible data-driven definitions for benchmark performance of a given indicator were tested in the CCP [10].
These definitions were then compared using the following
criteria:
• benchmark levels should represent a level of excellence; for example, benchmark levels should always exceed mean performance;
• benchmark levels should be attainable, i.e. clinically realistic; for example, if the data-driven methodology leads to a benchmark level on a given indicator which corresponds to perfect performance, consideration should be given to making that indicator a more rigid standard rather than a flexible guideline [20];
• providers with high performance should be selected from all providers in a predefined, data-driven way;
• all providers with high performance on a process-of-care indicator should contribute to the benchmark level for that indicator;
• providers with high process-of-care indicator performance but small numbers of cases should not unduly influence the level of the benchmark (although even small-number providers should contribute to the benchmark, if they exhibit high performance).
The first two criteria are important so that benchmarks will, in fact, encourage continuous quality improvement. The third, fourth, and fifth criteria are necessary to ensure the objectivity and validity of the benchmark.
Details concerning specific benchmark levels according to the different definitions, and their comparison to average performance, have been reported previously [10]. Both in the pilot CCP and in subsequent projects in which the data-driven ABC™ was used, the same operational definition of benchmark worked best in all cases: it yielded realistic and acceptable benchmarks, and was labeled the ‘pared mean’. The steps necessary to calculate a pared-mean benchmark are shown in Table 1.

Table 1 Steps in calculating a benchmark with the pared-mean method

1. Rank-order providers (e.g. hospitals, physicians, or other levels of aggregation) in descending order of performance on the process indicator.
2. Beginning with the best-performing provider, add providers sequentially in descending order until this subset of providers represents at least 10% of all patients or subjects in the entire dataset.
3. Calculate the benchmark from this subset as: (total number of patients in the subset receiving the recommended intervention)/(total number of patients in the subset).
Table 2 summarizes pertinent results from our initial benchmark analysis. For example, there were 106 hospitals in the Alabama pilot CCP, which had a total of 1253 patients who should have been counseled to stop smoking. The overall pooled mean counseling rate (mean weighted for denominator size) for these 106 hospitals was 14%. We ranked the 106 hospitals in order of their smoking cessation counseling rates, and then selected enough hospitals, from the top-ranked down, to include at least 125 patients (10% of 1253) eligible for smoking cessation counseling. This resulted in the selection of 12 hospitals (the high-performance hospitals, or benchmark contributors).
Pooling the patients eligible for counseling from these 12 hospitals resulted in a benchmark performance level of 49%. Of these 12 benchmark contributors, 10 hospitals actually had rates above 49%, i.e. were benchmark performers. Note that simply taking the top 10% of hospitals (in this case 11), as opposed to the hospitals accounting for the top 10% of patients, might have resulted in a sample representing many fewer patients if, say, the top 11 hospitals had all had very few patients eligible for smoking cessation counseling.

Table 2 Pilot CCP mean performance and benchmark levels

Process-of-care indicator                  No. of hospitals to      State performance    Benchmark      No. of hospitals at
                                           which indicator applied  (95% confidence      performance    or above benchmark
                                                                    interval)
.............................................................................................................................
Smoking counseling                         106                      0.14 (0.12–0.16)     0.49           10
Beta-blockers                              89                       0.34 (0.30–0.39)     0.73           10
Aspirin during hospitalization             97                       0.65 (0.62–0.67)     0.96           15
Aspirin at discharge                       105                      0.57 (0.55–0.59)     0.84           8
Low-dose heparin                           67                       0.27 (0.22–0.32)     0.67           7
Angiotensin-converting enzyme inhibitors   50                       0.51 (0.45–0.57)     0.91           9
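To make the computation concrete, the following Python sketch (ours, not part of the original CCP software) implements the three steps of Table 1 and mirrors the logic of the smoking cessation example above; the hospital counts are hypothetical. Ranking here uses the raw rate; the refinement described below would substitute the adjusted performance fraction.

def pared_mean_benchmark(providers, fraction=0.10):
    """Compute a pared-mean benchmark (Table 1).

    providers: list of (numerator, denominator) pairs, where the
    denominator is the number of eligible patients for one provider
    and the numerator is how many of them received the indicated care.
    """
    total_patients = sum(d for _, d in providers)
    target = fraction * total_patients  # at least 10% of all patients

    # Step 1: rank providers in descending order of performance.
    ranked = sorted(providers, key=lambda nd: nd[0] / nd[1], reverse=True)

    # Step 2: accumulate top-ranked providers until the subset
    # covers at least `target` patients.
    subset_num = subset_den = 0
    for num, den in ranked:
        subset_num += num
        subset_den += den
        if subset_den >= target:
            break

    # Step 3: the benchmark is the pooled rate within the subset.
    return subset_num / subset_den

# Illustrative (hypothetical) hospital data: (counseled, eligible).
hospitals = [(9, 15), (4, 10), (20, 60), (2, 25), (1, 40), (3, 8)]
print(round(pared_mean_benchmark(hospitals), 2))

With these hypothetical data the subset closes after the two top-ranked hospitals (25 of 158 eligible patients, i.e. at least 10%), giving a pared-mean benchmark of 13/25 = 0.52.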
The Alabama Quality Assurance Foundation (the Alabama peer review organization) subsequently and successfully incorporated the benchmarking methodology into the pilot CCP’s feedback to hospitals, with positive hospital responses to that feedback [16]. Since the pilot CCP, we have applied the ABC™ to several quality improvement projects and found it to be useful, as well as generally well accepted.
Refinements to the benchmark
methodology
We are currently refining and testing the pared-mean method
through a 5-year cooperative agreement with the Agency for
Health Care Policy and Research (AHCPR). As such, we are
engaged in an iterative process of exploratory data analysis,
literature review, expert consultation, and feasibility studies.
We report briefly on two issues involved in refining the
ABC™ method.
Small numbers of patients
One particular problem we are facing stems from low denominators, or low numbers of patients indicated for a given
measure. For example, if a physician has only one qualifying
patient, then clearly the performance of that physician in
recommending an intervention, e.g. mammography, will be
either 0 or 100%. A large proportion of providers with 100%
performance for only a small number of patients could yield
a misleading result because they would artificially inflate the
benchmark level.
Pragmatically, ABC™ users must consider whether a physician who recommended mammography for his only eligible patient (i.e. 100% performance) should be ranked higher than a physician who recommended mammography for nine of his ten eligible patients, clearly indicating consistently high, but not perfect, performance for a larger number of patients.
The ABC™ method recognizes this potential problem and solves it by calculating performance using a Bayesian estimator technique [21], which effectively reduces the impact of providers with small numbers of eligible patients. Applying this correction generates a number called the adjusted performance fraction (APF), calculated as follows:

APF = (x + 1)/(d + 2)

where x is the actual number of patients receiving the indicated intervention, and d is the total number of patients for whom the intervention is clinically appropriate.
Thus, in the case of a provider with only one appropriate patient, for whom the intervention was actually given (100% performance), the performance fraction = (1 + 1)/(1 + 2) = 0.67. As the number of appropriate patients (d) increases, the performance fraction calculated using the ABC™ method and the unadjusted mathematical percentage tend to the same number; e.g. a provider treating eight out of 10 patients will have a Bayesian estimator-adjusted performance of (8 + 1)/(10 + 2) = 0.75
vs. an unadjusted percentage performance of 0.80. The two
numbers are close to being equivalent (within 3%) at around
30 patients. In addition to the advantages of reducing the
effect of performance percentages based on small numbers,
the Bayesian adjustment allows all data to be used, rather
than simply eliminating providers with small numbers. The
APF may now be used to rank providers in Step 1 of Table 1.
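A minimal sketch of the APF computation (the function name is ours); the printed checks correspond to the examples discussed above:

def adjusted_performance_fraction(x, d):
    """Bayesian shrinkage of a provider's performance rate.

    x: patients who received the indicated intervention.
    d: patients for whom the intervention was appropriate.
    Pulls extreme rates from small denominators toward 0.5.
    """
    return (x + 1) / (d + 2)

print(round(adjusted_performance_fraction(1, 1), 2))   # 0.67: sole eligible patient treated
print(round(adjusted_performance_fraction(9, 10), 2))  # 0.83: now ranks above the 1-of-1 provider
print(round(adjusted_performance_fraction(8, 10), 2))  # 0.75 vs. an unadjusted 0.80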
Summary ABC™s
In another refinement, we would also like to identify those providers who perform at a high level on several process-of-care indicators, so that an aggregate ABC™ can be identified. This will provide a data-driven alternative to the opinion-based approaches for identifying best practices described by Green and colleagues [22].
During the process of aggregation, problems with ‘small numbers’ surface again. If a provider has a large denominator for one indicator and a very small denominator for another, how should this be handled? And what counts as ‘very small’? We have derived a formula identifying a ‘minimum sufficient denominator’ (MSD) for each indicator in a given dataset. When even the APF-adjusted denominator falls below the MSD, including that provider in the benchmark calculation without further adjustment risks distorting the overall performance by artificially inflating the benchmark. The MSD for a given indicator is the smallest positive integer N such that 100% performance on that indicator, attained on a denominator of N patients, would be statistically significantly different from the mean performance. The theoretical background for this, as well as alternative approaches to the problems of aggregation, including small numbers, will be published separately.
Table 3 presents the MSDs for selected mean performances ranging from 5% to 95%. For example, we advise that if the mean performance on an indicator is 50% or below, and denominators for all providers are at least six, no special denominator adjustment is required. If, on the other hand, mean performance is 70%, denominators below 11 require special adjustment; and if mean performance is 80%, ‘very small’ denominators of size less than 17 require special adjustment.

Table 3 Minimum sufficient denominators (minimum denominator size for a given mean performance so that adjustment for small numbers in benchmark computation is not required)

Mean performance (%)    Minimum sufficient denominator
............................................................
95                      72
90                      36
85                      23
80                      17
75                      13
70                      11
65                      9
60                      8
55                      7
50                      6
45                      5
40                      5
35                      4
30                      4
25                      3
20                      3
15                      2
10                      2
5                       2
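The theoretical background is deferred to a separate publication, but one way to operationalize the MSD definition, and one that reproduces Table 3, is an exact binomial criterion: under the null hypothesis that a provider’s true rate equals the mean performance p, the probability of N successes in N trials is p^N, and the MSD is the smallest N for which this probability falls below 0.025 (a two-sided test at α = 0.05). The significance threshold here is our assumption, not the paper’s.

def minimum_sufficient_denominator(mean_perf, alpha=0.05):
    """Smallest N such that 100% performance on N patients differs
    significantly from the mean performance.

    Under the null that the provider's true rate is mean_perf, the
    chance of N successes in N trials is mean_perf ** N; we require
    it to fall below alpha / 2 (two-sided exact binomial criterion,
    our assumption).
    """
    assert 0 < mean_perf < 1
    n = 1
    while mean_perf ** n >= alpha / 2:
        n += 1
    return n

# Reproduces Table 3, e.g. 50% -> 6, 70% -> 11, 80% -> 17.
for pct in range(95, 0, -5):
    print(pct, minimum_sufficient_denominator(pct / 100))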
A manual detailing many practical aspects of our benchmarking methodology is currently under development and is
available from the University of Alabama at Birmingham.
Discussion
We offer a data-driven methodology to derive levels of
performance which can be presented as both a realistic and
desirable level of excellence. Deliberately, we have applied
the methodology only to the most immediately ‘actionable’
aspects of the quality of care, i.e. process-of-care indicators.
It is important to note that our ABC™ method incorporates, both in its basic calculation and in its use as a tool for
continuous quality improvement, the number of patients
receiving excellent care.
Palmer cogently makes the point that process-based (as
opposed to outcome-based) measures match patients to
specific health care processes indicated only for given conditions and for given patient characteristics [23]. Therefore,
these measures do not require risk-adjustment, a complex
and difficult issue that invariably arises when comparing
outcomes such as mortality for different providers [24].
Another practical advantage of process measures noted by
Mant [25] is their higher sensitivity, compared to outcome
measures, in detecting differences between providers. While
Palmer and Mant provide good reasons to focus quality
improvement activities on process rather than outcome measures, we benchmark process because we wish to address the
most immediately ‘actionable’ aspect of quality measurement,
i.e. the process-of-care (as opposed to structure or outcome).
We assume, and are currently testing the hypothesis, that guideline-based, benchmark-level performance will correlate with very good outcomes.
A benchmark performance level on a process indicator is
a tool to improve provider performance, rather than a tool
for the analysis of whether or not the provider is a statistical
outlier. Only if providers were ranked with subsequent tangible consequences (e.g. penalties for low rank) would addressing potential random variation be important. This is one
reason why risk-adjustment is so crucial when comparing
outcomes, especially for release to the public. However,
the purpose here is to identify high, achievable levels of
performance.
Testing the extent of the ABC™’s contributions to the
improvement of outcomes is a major challenge which lies
ahead for us. This challenge is but a component of proving
the assumption underlying many quality improvement efforts:
that, indeed, improved processes result in improved outcomes
[5].
Conclusion
Standardization of desirable and achievable performance,
designated as benchmark performance, is not only methodologically feasible, but also useful in quality improvement. Our ABC™ methodology demonstrates a technique for reliably calculating an achievable benchmark from performance data.
Acknowledgements

This work is supported by HS09446 from the Agency for Health Care Policy and Research. The authors are grateful for the contributions of Donald C. Martin and Ian G. Child.

References

1. Quality Council Work Group, Headquarters Air Force Logistics Command. Benchmark matrix and guide: Part I. J Qual Assess 1991; 13: 14–19.
2. Lorence D. Benchmarking quality under U.S. health care reform: the next generation. Qual Prog 1994; 27: 103–107.
3. Berkey T. Benchmarking in health care: turning challenges into success. Jt Comm J Qual Improv 1994; 20: 277–284.
4. Berwick DM. Continuous improvement as an ideal in health care. N Engl J Med 1989; 320: 53–56.
5. Berwick DM, Godfrey AB, Roessner J. Curing Health Care. New Strategies for Quality Improvement. San Francisco, CA: Jossey-Bass, 1990.
6. Goodman DC, Fisher ES, Bubolz TA et al. Benchmarking the US physician workforce. An alternative to needs-based or demand-based planning. J Am Med Assoc 1996; 276: 1811–1817.
7. Schroeder SA. How can we tell whether there are too many or too few physicians? The case for benchmarking. J Am Med Assoc 1996; 276: 1841–1843.
8. Camp RC. Benchmarking: The Search for Industry Best Practices That Lead to Superior Performance. Milwaukee, WI: Quality Press, 1989.
9. Lohr KN (ed.). Medicare: A Strategy for Quality Assurance. Institute of Medicine, Vol. I. Washington, DC: National Academy Press, 1990.
10. Kiefe CI, Woolley TW, Allison JJ et al. Determining benchmarks: a data-driven search for the best achievable performance. Clin Perform Qual Health Care 1994; 2: 190–194.
11. Jencks SF, Wilensky GR. The health care quality improvement initiative. J Am Med Assoc 1992; 268: 900–904.
12. Mohr JJ, Mahoney CC, Nelson EC et al. Improving health care, Part 3: Clinical benchmarking for best patient care. Jt Comm J Qual Improv 1996; 22: 599–616.
13. Tomas S. Benchmarking: a technique for improvement. Hosp Material Manage Q 1993; 14: 78–82.
14. Epstein A. Sounding board: performance reports on quality – prototypes, problems, and prospects. N Engl J Med 1995; 333: 57–61.
15. Nelson EC, Wasson JH. Using patient-based information to rapidly redesign care. In Bridging the Gap Between Theory and Practice. Chicago, IL: Hospital Research and Education Trust (American Hospital Association), 1994: 69–85.
16. Wallace RG, Farmer RM, Craig AS, Kiefe CI. Hospital responses to a data-driven, state-wide quality assurance project. Clin Perform Qual Health Care 1996; 4: 34–37.
17. Ellerbeck EF, Jencks SF, Radford MJ et al. Quality of care for Medicare patients with acute myocardial infarction. J Am Med Assoc 1995; 273: 1509–1514.
18. Task Force on Assessment of Diagnostic and Therapeutic Cardiovascular Procedures (Subcommittee to Develop Guidelines for the Early Management of Patients with Acute Myocardial Infarction). Guidelines for the early management of patients with acute myocardial infarction. A report of the American College of Cardiology/American Heart Association. J Am Coll Cardiol 1990; 16: 249–292.
19. Marciniak TA, Ellerbeck EF, Radford MJ et al. Improving the quality of care for Medicare patients with acute myocardial infarction. Results from the Cooperative Cardiovascular Project. J Am Med Assoc 1998; 279: 1351–1357.
20. Eddy DM. A Manual for Assessing Health Practices and Designing Practice Policies. The Explicit Approach. Philadelphia, PA: American College of Physicians, 1992.
21. Agresti A. Categorical Data Analysis. New York: John Wiley & Sons, 1990: 463–464.
22. Green J, Wintfeld N, Krasner M, Wells C. In search of America's best hospitals: the promise and reality of quality assessment. J Am Med Assoc 1997; 277: 1152–1155.
23. Palmer RH. Process-based measures of quality: the need for detailed clinical data in large health care databases. Ann Intern Med 1997; 127: 733–738.
24. Christiansen CL, Morris CN. Improving the statistical approach to health care provider profiling. Ann Intern Med 1997; 127: 764–768.
25. Mant J, Hicks N. Detecting differences in quality of care: the sensitivity of measures of process and outcome in treating acute myocardial infarction. Br Med J 1995; 311: 793–796.