AJCP / Original Article
Inappropriate Repeats of Six Common Tests
in a Canadian City
A Population Cohort Study Within a
Laboratory Informatics Framework
Eric K. Morgen, MD, MPH, FRCPC,1,2 and Christopher Naugler, MD, MSc, FRCPC3
From the 1Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Canada; 2Department of Pathology and Laboratory
Medicine, Mount Sinai Hospital, Toronto, Canada; and 3Department of Pathology and Laboratory Medicine, University of Calgary and Calgary Laboratory
Services, Calgary, Canada.
Key Words: Laboratory utilization; Repeated testing; Inappropriate testing; Cholesterol; HbA1c; Vitamin D; Vitamin B12; TSH; Ferritin
Am J Clin Pathol November 2015;144:704-712
Objectives: To identify inappropriate repeats of six common
laboratory tests in a population sample of patients, using
highly specific criteria based only on repeat time and test result.
Methods: We used a laboratory informatics database to
conduct a retrospective cohort study using a population
sample of 103,000 patients in the city of Calgary with an
index test in 2010 and uniform follow-up of 1 year. We
examined six tests (cholesterol, hemoglobin A1c, thyroid-stimulating hormone, vitamin B12, vitamin D, and ferritin)
with consensus-based or easily justified criteria for
inappropriate repeats based solely on time to repeat and the
index test value.
Results: The percentages of tests repeated at 3, 6, and 12
months were 11%, 23%, and 41%, respectively. In total, 16%
of these six tests were inappropriately repeated, representing
an annual internal cost of $0.6 to $2.2 million Canadian
dollars and corresponding to population-scaled national
estimates for Canada and the United States of $160 million
and $2.4 billion, respectively.
Conclusions: Objective definitions based on repeated
testing identified 16% of six studied tests as inappropriate,
delineating a subset of inappropriate testing that is well
suited to automated identification and intervention and
that provides a likely lower bound on the true burden of
inappropriate testing.
Upon completion of this activity you will be able to:
• define the “Ulysses syndrome” and describe its relation to
laboratory testing.
• describe the difficulties in defining “inappropriate testing.”
• discuss the advantages of objective definitions for inappropriate testing.
• approximately quantify minimum burdens of inappropriate testing in
tests studied in this article.
The ASCP is accredited by the Accreditation Council for Continuing
Medical Education to provide continuing medical education for physicians.
The ASCP designates this journal-based CME activity for a maximum of
1 AMA PRA Category 1 Credit ™ per article. Physicians should claim only
the credit commensurate with the extent of their participation in the activity. This activity qualifies as an American Board of Pathology Maintenance
of Certification Part II Self-Assessment Module.
The authors of this article and the planning committee members and staff
have no relevant financial relationships with commercial interests to disclose.
Exam is located at
Laboratory testing is an integral part of the health care
system, with many patient encounters resulting in the ordering of laboratory tests and a large proportion of clinical
decisions relying on them.1 Consequently, changes in laboratory testing practices have fundamental impacts on both
the systemic costs of health care and the care that patients
receive. There is recent evidence that laboratory utilization is increasing above and beyond what can be attributed
to inflation and population aging2,3 and that the trajectory
appears unsustainable.4 We also know that due to population variation in analyte values and laboratory variation in
test results, false-positive results are inevitable. As a result,
increasing test volumes also increase unintended patient
morbidity due to the “Ulysses syndrome” of investigations
and interventions in patients who were actually healthy.5,6
(Coined in 1972, “Ulysses syndrome” refers to an abnormal
test result leading to a series of health care adventures [ie,
follow-up investigations and/or interventions, accompanied
by patient anxiety, potential morbidity, and extra monetary
cost] that were ultimately unnecessary, merely leading the
patient back to his or her starting point before the abnormal
result.) While measuring the true extent of such downstream
consequences of false-positive tests is difficult and not frequently attempted in the literature, studies of particular tests
have documented increased hospital/pharmacy/laboratory
services amounting to thousands of dollars (blood cultures),7
increased interventions and hospitalizations (tuberculosis
cultures),8 and short-term psychosocial consequences (cystic
fibrosis screening).9
There is a strong perception that a substantial proportion of laboratory tests are unnecessary.6,10-12 Indeed, studies
show that differences in laboratory testing volumes among
institutions are often not correlated with the intended clinical
outcomes13 and thus that excess testing between institutions
appears to yield no measurable benefit.14-16 However, definitions of inappropriate testing have proven problematic. One
systematic review noted that many studies of inappropriate
laboratory testing lacked objective criteria,17 potentially
leading to low specificity as well as sensitivity. A more
recent systematic review emphasized the differences in
results between studies with objective vs subjective criteria,
noting that the former were more dependable and ultimately
preferred but appeared to often underestimate inappropriate
testing and required improvement.18 A further weakness
of most criteria (all subjective criteria and many objective
criteria) is the complexity of identifying inappropriate tests.
This often requires human intervention and/or the evaluation of patient information not readily available to a testing
laboratory, which makes such criteria poorly suited to the automated approaches that
could facilitate system-level reduction strategies. This may
provide a partial explanation for an apparent lack of any systemic improvements in this area between 1997 and 2012.18
In this context, examining repeated ordering of the same
test type on the same patient (ie, repeat testing) holds great
promise to expand the scope of objective criteria for inappropriateness while remaining amenable to system-level
strategies. Prior investigators have reported that repeat testing within 1 month comprises 30% of test volumes for eight
common tests and 63% within 1 year,19 making this a fertile
area for potential reductions. Here, we apply survival analysis to repeated laboratory testing in a population sample
of 103,000 patients living in Calgary and the surrounding
area. In particular, we examine six common laboratory test
types where a reliable (highly specific) assessment of inappropriateness can be made entirely from the test interval
and index test value, and we calculate the associated direct
annual costs in the study laboratory catchment area, as well
as corresponding population-scaled estimates nationally for
Canada and the United States.
© American Society for Clinical Pathology
Materials and Methods
Population and Data Extraction
This study was approved by the University of Calgary
Conjoint Research Ethics Board. Data on laboratory testing were extracted from the information system of Calgary
Laboratory Services, the single laboratory provider for the
city of Calgary and the surrounding area. This geographic
area encompasses approximately 1.4 million people, and
essentially all tests occurring within this area are captured
within the database. We selected six test types for the current
study that have well-accepted guidelines or easily defined
uncontroversial criteria regarding the appropriateness of
repeating testing: vitamin B12, cholesterol, thyroid-stimulating hormone (TSH), vitamin D, hemoglobin A1c (HbA1c),
and ferritin. The tests are shown in ❚Table 1❚,20-39 along with
the criteria used. For all six test types, every test instance
occurring in 2010 or 2011 was extracted from the database.
For each test instance, the following data were recorded:
name of the test, date and time of testing, test value (ie, the
numeric result), laboratory-defined upper and lower limits
for that test, patient age at the time of testing, patient sex,
and type of testing location (ie, outpatient, inpatient, community clinic). Because the full data set was too large to work with conveniently, we compiled a list
of all patients receiving any laboratory test over the 2-year
period and used the “SAMPLE” function of the structured
query language (SQL) database to randomly subsample
20% of these patients. This subsample constituted the main data set for further analysis in this study.
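The patient-level subsampling described above can be illustrated with a minimal Python sketch. The source states only that an SQL “SAMPLE” function was used, so the function names, the fixed seed, and the toy data below are assumptions made for illustration and are not the study's actual code. The point of the sketch is that sampling is done by patient, so every test instance of a sampled patient is retained.

```python
import random

def sample_patients(patient_ids, fraction=0.20, seed=2010):
    """Return a random subset of distinct patient IDs (hypothetical helper)."""
    unique_ids = sorted(set(patient_ids))      # one entry per patient
    k = round(len(unique_ids) * fraction)      # number of patients to keep
    rng = random.Random(seed)                  # fixed seed so the sketch is reproducible
    return set(rng.sample(unique_ids, k))

def filter_tests(test_rows, sampled_ids):
    """Keep every test instance that belongs to a sampled patient."""
    return [row for row in test_rows if row["patient_id"] in sampled_ids]

# Hypothetical toy data: 1,000 patients with two TSH tests each.
tests = [{"patient_id": pid, "test": "TSH"} for pid in range(1000) for _ in range(2)]
sampled = sample_patients(row["patient_id"] for row in tests)
subset = filter_tests(tests, sampled)
print(len(sampled), len(subset))
```

Sampling patients rather than individual test instances keeps each sampled patient's testing history complete, which the repeat-testing analysis requires.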
Descriptive Analysis
The distribution of test dates over the entire 2-year
study period was plotted. For the entire data set as well as
each test type, the distribution of abnormality (ie, whether
the test was designated as “abnormal” according to the laboratory reference range) was calculated, as were the distributions of patient ages and sexes associated with each test type.
In addition, the distributions of age and sex across all studied
patients were calculated.
Survival Analysis
All test instances recorded during 2010 were designated
as “index tests.” For each index test, the tested patient was
followed for exactly 1 year from the date of the index test,
looking for a second test instance of the same type. The
presence of such a repeat test was used as the outcome of
interest for survival analysis. If any repeats were found
during the 1-year period, this was designated as a positive
outcome (ie, a repeat occurred) for the index test, and the
intervening time to the first such repeat was recorded as the
time to that outcome. Kaplan-Meier curves were plotted for
the entire data set (stratified for various factors), as well as
for individual test types, stratified by whether the index test
was abnormal by local laboratory criteria. Cox proportional
hazards (CPH) modeling was used to test for a significant
difference between these two curves.

❚Table 1❚
Selected Tests With Simple Criteria for Identifying Inappropriate Repeat Testinga

Tests with clear guidelines that can be used to identify inappropriate repeat testing

Test: Total cholesterol
Criterion: No repeats <12 weeks20
Rationale: Serum cholesterol changes slowly, and repeat testing is not useful before the interval
Exception: A lipid panel might reasonably be repeated at 1 month in patients undergoing treatment intensification.

Test: HbA1c
Criterion: No repeats <3 months (abnormal values) or <1 year (normal values)26
Rationale: HbA1c represents an average effect over several months, and it is not useful to test more frequently.24,25 Diabetic patients not meeting their glycemic goals (HbA1c generally 6%-7%) should be screened every 3 months. Those consistently meeting goals may be screened every 6 months. Patients without diabetes (HbA1c <6%) should be screened every 1 to 3 years, depending on risk profile.26,27
Exception: Pregnant women who require treatment should be tested no more frequently than once per month.26,27

Test: TSH
Criterion: Should not be repeated within 8 weeks28
Rationale: Patients taking thyroxine with a recent dose change should be tested after 8 to 12 weeks. Other clinical categories should be tested no more often than every 2

Tests without consensus-based guidelines but where a clear argument for particular criteria exists

Test: Vitamin B12
Criterion: No repeats <1 year
Rationale: This test has low diagnostic accuracy and is not indicated for routine screening, and there are no guidelines to support retesting a patient with a normal result or retesting a patient with an abnormal result unless noncompliance with therapy is

Test: Vitamin D
Criterion: No repeats of normal values <1 year
Rationale: The primary reason to repeat a normal test would be to monitor supplementation to achieve higher levels of vitamin D, which is currently not recommended due to potential risks of renal calculi in the context of no demonstrated benefit for hip fracture31-33 or other potential benefits.34

Test: Ferritin
Criterion: No repeats of normal values <1 year
Rationale: Ferritin is indicated for the diagnosis of iron deficiency or iron overload states.35 In both of these situations, a normal result essentially rules out the disease.36,37 It may be appropriate to monitor serum ferritin in the specific clinical situation of patients with transfusion-dependent anemia, but very few of these patients will have normal

HbA1c, hemoglobin A1c; TSH, thyroid-stimulating hormone.
a These criteria for inappropriate testing rely only on the time interval to repeat and whether the index test was abnormal. Tests are divided into two categories, depending on how well accepted the presented criteria are.
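As an illustration of the time-to-repeat analysis described under Survival Analysis, the following Python sketch computes a Kaplan-Meier cumulative repeat fraction from (time, repeated) pairs. It is a minimal sketch under stated assumptions: the function names and the eight toy observations are hypothetical, index tests without a repeat are treated as censored at 365 days, and no stratification or Cox modeling is attempted.

```python
def kaplan_meier_repeat_curve(observations):
    """Cumulative fraction repeated over time.

    observations: list of (time_in_days, repeated) pairs, where repeated is
    False for index tests with no repeat during follow-up (censored).
    Returns a list of (time, cumulative_fraction_repeated) at each event time.
    """
    event_times = sorted({t for t, repeated in observations if repeated})
    not_repeated = 1.0            # Kaplan-Meier "survival" = not yet repeated
    curve = []
    for t in event_times:
        at_risk = sum(1 for time, _ in observations if time >= t)
        events = sum(1 for time, repeated in observations if repeated and time == t)
        not_repeated *= 1.0 - events / at_risk
        curve.append((t, 1.0 - not_repeated))
    return curve

def time_to_fraction(curve, fraction):
    """First event time at which the cumulative repeat fraction reaches `fraction`."""
    for t, cumulative in curve:
        if cumulative >= fraction:
            return t
    return None                   # threshold not reached during follow-up

# Hypothetical toy data: four repeats and four index tests censored at 1 year.
toy = [(30, True), (90, True), (90, True), (200, True)] + [(365, False)] * 4
curve = kaplan_meier_repeat_curve(toy)
print(time_to_fraction(curve, 0.25))   # time to 25% repeated → 90
print(time_to_fraction(curve, 0.75))   # never reached within follow-up → None
```

With uniform 1-year follow-up and no earlier censoring, the estimate at each time point reduces to the simple proportion of index tests repeated by that time. The `None` return mirrors the situation in which a test type does not reach a given repeat fraction within the follow-up period.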
Calculation of Costs
To calculate the cost of individual laboratory tests
within the study laboratory, we used internal laboratory data
representing two separate numbers: (1) a lower bound on
this cost, representing only the variable costs (ie, reagent
costs), which are saved whenever a test is not run, and (2) an
upper bound on this cost, representing all expenses associated with performing the test, including equipment, reagents,
and personnel time, per test instance. To obtain an annual
cost range, we multiplied this range by the respective annual
volumes of tests within the study laboratory. To derive statistics on the list prices for each test in the United States and
Canada, we requested such prices from laboratories across
Canada and the United States, and we calculated the 5th,
50th, and 95th percentile costs for each country from these
data to provide an estimate of the range of prices that might
be encountered in each country.
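The percentile summary of surveyed list prices can be sketched as follows. The article does not state which percentile definition was used, so linear interpolation between order statistics is an assumption here, and the six prices are hypothetical placeholders rather than surveyed values.

```python
def percentile(values, p):
    """Percentile by linear interpolation between order statistics (assumed method)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no prices supplied")
    rank = (len(ordered) - 1) * p / 100.0     # fractional position in the sorted list
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    weight = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight

# Hypothetical list prices for one test from six laboratories.
prices = [10.0, 12.0, 15.0, 20.0, 30.0, 50.0]
summary = {p: percentile(prices, p) for p in (5, 50, 95)}
print(summary)
```

With only a handful of laboratories per country, the 5th and 95th percentiles sit close to the minimum and maximum surveyed prices, so they are best read as a rough range.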
To estimate annual national cost burdens for redundant
testing in Canada and the United States, we started with the
annual volumes of redundant testing identified in this study
for our geographic area, encompassing 1.4 million people in
Calgary and surrounding areas. We then scaled the volumes
of redundant testing up from a population of 1.4 million to
the population of Canada (factor of 24.91) and the United
States (factor of 224.21). This provided an estimate of the
volume of redundant tests of each type for both countries,
which could then be multiplied by various potential costs per
individual test identified above.
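The population scaling described above amounts to a simple multiplication, sketched here in Python. The scaling factors and the 1.4 million catchment population come from the text; the local volume of redundant tests and the price per test are hypothetical values chosen only to show the arithmetic.

```python
STUDY_POPULATION = 1_400_000   # catchment population stated in the text
CANADA_FACTOR = 24.91          # scaling factor stated in the text
US_FACTOR = 224.21             # scaling factor stated in the text

def national_cost(local_redundant_tests, price_per_test, scale_factor):
    """Scale a local annual volume of redundant tests to a national cost."""
    national_volume = local_redundant_tests * scale_factor
    return national_volume * price_per_test

# Populations implied by the scaling factors (about 34.9 and 313.9 million).
implied_canada = STUDY_POPULATION * CANADA_FACTOR
implied_us = STUDY_POPULATION * US_FACTOR

# Hypothetical inputs: 10,000 redundant tests per year locally at $20 per test.
canada_cost = national_cost(10_000, 20.0, CANADA_FACTOR)
us_cost = national_cost(10_000, 20.0, US_FACTOR)
print(round(canada_cost), round(us_cost))
```

This calculation assumes that per-capita volumes of redundant testing elsewhere match those of the study area, which is the same simplification the scaling approach itself makes.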
Results
Test and Patient Characteristics
Nearly 400,000 test instances were included in the
study, performed on just over 100,000 patients. The test
counts and patient counts for the entire data set and for each
studied test type are shown in ❚Table 2❚, along with other
univariate descriptive statistics (median age, proportion
female, proportion abnormal). The proportion of men and
women tested was approximately equal for cholesterol and
HbA1c, but women were more likely to be tested for ferritin, TSH, vitamin B12, and vitamin D. The number of tests
by month across the 2 study years was plotted (not shown)
and demonstrated a generally even distribution, with small
seasonal variability and a gradual upward trend consistent
with gradually rising test volumes.

❚Table 2❚
Univariate Descriptive Statistics for Selected Laboratory Tests of Calgary Area Residents in 2010a
Columns: Test Type; No. of Tests; No. of Patientsb; Median Patient Age, y; Patient Sex, % Female; Abnormal Results, % of Index Tests; Time to 25% Repeated, mo
Rows include: Vitamin B12; Vitamin D; All six tests
HbA1c, hemoglobin A1c; TSH, thyroid-stimulating hormone.
a The presented data represent a 20% sample by patient of all patients tested at least once during this period in this geographic area. Because not all tests achieved a median repeat
time (ie, time to 50% repeated) during the study, the time to 25% repeated is presented instead.
b Some patients had multiple tests.
Repeat Testing
Survival analysis was performed to analyze repeat testing. The 25th percentile repeat time (ie, the time interval
until 25% of all tests are repeated) is displayed in Table 2
(right side) for the entire data set and each test type. This was
chosen instead of the median repeat time since most tests did
not reach a median repeat time within 1 year. Overall Kaplan-Meier curves are plotted for each test type, as well as the two
curves created by stratification on abnormal vs normal index
test results ❚Figure 1❚. The influence of an abnormal index test
on the risk of test repetition showed statistically significant
effects by CPH modeling for each test type. Hazard ratios
and 95% confidence intervals for an abnormal index test were
as follows: cholesterol, 1.47 (1.44-1.50); HbA1c, 4.36 (4.24-4.48); TSH, 3.89 (3.79-3.99); vitamin B12, 1.58 (1.49-1.66);
vitamin D, 1.30 (1.25-1.36); and ferritin, 1.99 (1.93-2.05).
The percentages and numbers representing inappropriate
repeat testing, according to the Table 1 criteria, were calculated and are shown in ❚Table 3❚, along with the calculated
cost burden at the study laboratory. The percentages of inappropriate repeats were generally lower for the three tests with
well-established consensus guidelines than for the tests without, which may indicate some positive influence of creating
consensus guidelines on inappropriate repeat testing.
Test Costs
We received price lists from six Canadian laboratories (encompassing six provinces) and 13 US laboratories
(encompassing local laboratories from six states and two
national-based laboratories). This provided a range of costs,
for which various percentiles were calculated and are presented in ❚Table 4❚ and ❚Table 5❚, along with the estimated
cost burden of redundant testing nationally for these six tests
in Canada and the United States. The 50th percentiles for
cost provide useful point estimates for cost in Canada and
the United States of $160 million and $2.4 billion, respectively, while the other estimates provide a better sense for
the range of potential costs under different assumptions.
Discussion
Repeat ordering of laboratory tests is common. Van
Walraven and Raymond19 reported, in a population-based
study of four million instances of eight common test types,
that 30% were repeated within 1 month and 60% within 1
year. There has also been evidence that a substantial portion
of such repeats is inappropriate. Stewart et al40 examined
patients transferred between institutions and found that 32%
had tests repeated within 12 hours, with 20% thought by the
study authors to not be clinically indicated. Ganiyu-Dada
and Bowcock41 studied three common hematologic tests
repeated within an 8-week period and noted that 86% followed a normal result. However, they surveyed physicians
who felt that “borderline normal” results were appropriate
to repeat, and after excluding these, only 6% of repeat tests
followed an unequivocally normal result. Kwok and Jones42
reported that 18% of tests at a tertiary-level immunology
laboratory were repeated within 12 weeks and considered
these to be redundant. Finally, Bates et al43 created multiple
definitions of redundancy and reported rates of inappropriate repeat testing to be between 8% and 30% based on a
combination of chart review and repeat interval, depending
on the criteria used. These studies have been of great help
to determine the magnitude of repeated laboratory testing.
They also provide some clues as to what portion may be
redundant. However, in this respect they suffer from similar
drawbacks to the general literature on overtesting, where a
systematic review found the area characterized by widely
varying definitions, small study sizes, and unvalidated, often
subjective criteria.17

❚Figure 1❚ Kaplan-Meier curves of repeat laboratory testing for Calgary area patients with index tests in 2010. Curves display
the cumulative percentage of tests repeated up to each time point and are shown for each of the six tests of interest, stratified
by abnormal (black) vs normal (light gray) results, with the nonstratified curve (dark gray) also shown for comparison. For curves
corresponding to our proposed criteria for inappropriate repeated testing, vertical dashed lines indicate the time threshold for
inappropriate tests, and horizontal dashed lines indicate the corresponding percentage of repeats within that time threshold. The
colors of the dashed lines match those of the curve they apply to. A, Cholesterol. B, Hemoglobin A1c. C, Thyroid-stimulating
hormone. D, Vitamin B12. E, Vitamin D. F, Ferritin. Axes in each panel: Time to Repeat (mo) and Fraction Repeated.

❚Table 3❚
Characteristics of Tests Performed in 2010 on Calgary Area Residents That Were Inappropriately Repeateda
Columns: Test Type; Annual Test Count at the Study Laboratory; Percentage Repeated (95% Confidence Interval); Recoverable Internal Cost of Each Test for the Study Laboratory (CAD)b; Annualized Cost of Inappropriate Repeats at the Study Laboratory (Millions)
Rows include: HbA1c with normal resultc; HbA1c with abnormal resultd; Vitamin B12; Vitamin D with normal resultc; Ferritin with normal resultc; All testse
Percentage Repeated (95% Confidence Interval), in table order: 10.5 (10.3-10.7); 31.3 (30.7-31.9); 24.8 (24.3-25.3); 7.2 (7.0-7.3); 28.4 (27.8-28.9); 24.5 (23.8-25.3); 35.8 (35.4-36.2); 16.4 (16.3-16.6)
CAD, Canadian dollars; HbA1c, hemoglobin A1c; NA, not applicable; TSH, thyroid-stimulating hormone.
a Presented here are the estimated total annual counts for selected test types at the study laboratory, the percentage that were inappropriately repeated (using the criteria in Table 1), and the associated costs for the study laboratory.
b The lower bound of each range represents only reagent costs (always recoverable when a test is not performed), while the upper bound includes all indirect costs (which may be recoverable if volumes decrease consistently over the long term).
c The percentages and annual test counts for these rows reflect only normal results (ie, the count and percentage of HbA1c index tests with a normal result that were repeated
d The percentages and annual test counts for this row reflect only abnormal results.
e The count in this row reflects all tests (both normal and abnormal) for all six studied test types and thus is larger than the sum of the numbers above it. This also influences the denominator for the percentage calculation.
In contrast, we have assembled a large study population
and chosen tests using uncontroversial, objective definitions
for inappropriateness of repeat testing, based on easily available information about the tests. We followed over 100,000
patients (a population-based sample randomly selected from
the 1.4 million residents of the study area) for at least a year,
monitoring six common tests that together represent 18% of
test volumes and 25% of test costs at the study laboratory.
Index tests were ascertained over a 1-year period to avoid
any biases due to seasonal effects, and each index test had
a uniform 1-year follow-up period for the same reason. We
selected three tests (cholesterol, HbA1c, and TSH) with consensus-based guidelines regarding the appropriate frequency
of repetition, as well as three other tests (vitamin D, vitamin
B12, and ferritin) where we could create cutoffs based on
straightforward adaptation of existing testing guidelines.
The choice of definitions is a clear advantage to our
study, resulting in a set of objective criteria for identifying
inappropriate retesting that have three useful implications:
1. Our estimates for the burden of inappropriate testing
are very conservative and represent an approximate lower
bound for the tests investigated. Our criteria will identify
as inappropriate very few repeat tests (or perhaps none, for
certain test types) that were actually appropriate. Furthermore, there are certain to be many repeats at greater intervals
than we have targeted that are also inappropriate (ie, not
every cholesterol test repeated at 12 or more weeks will be
appropriate), and we have not even attempted to target inappropriate tests that are not repeats. Thus, the true fractions of
inappropriate tests should be no lower (and are likely much
higher) than the numbers we report for these test types.
2. The criteria are not specific to particular health care
settings but are applicable in diverse clinical settings and
administrative levels. They may be applied in primary care,
specialty clinics, or inpatient wards, and they may be applied
to a single physician or an entire country.
3. Because the criteria are based on only two simple
pieces of information (the repeat interval and whether the
index test was abnormal), they are readily amenable to automated identification.
Together, these suggest a dramatically simplified
approach for the otherwise daunting task of “reducing inappropriate testing” that could be applied at any administrative
level. The types of simple definitions proposed could be
used to automatically flag tests that are likely to be inappropriate in electronic health record (EHR) systems. For
flagged tests, the computer could then remind the ordering
physician of the earlier result (which may have been overlooked) and prompt her or him for a justification if she or
he wishes to override the warning. Computer logs of overrides and the justifications given could then be reviewed
on an annual basis to monitor the program success, provide
feedback to physicians on their ordering habits, and identify
situations where the flags may be inappropriate because
many overrides are being performed with good justification.
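To make the idea of automated flagging concrete, the following Python sketch encodes the Table 1 criteria as minimum repeat intervals and checks a proposed repeat against them. It is a hedged illustration only: the dictionary keys, the function name, and the conversion of "3 months" to 91 days and "1 year" to 365 days are assumptions, and the exceptions noted in Table 1 (treatment intensification for cholesterol, pregnancy for HbA1c) are not modeled.

```python
from datetime import date

# Minimum repeat interval in days: (if index test abnormal, if index test normal).
# None means Table 1 defines no criterion for that case.
CRITERIA = {
    "cholesterol": (84, 84),      # no repeats <12 weeks
    "hba1c": (91, 365),           # <3 months (abnormal) or <1 year (normal)
    "tsh": (56, 56),              # not within 8 weeks
    "vitamin_b12": (365, 365),    # no repeats <1 year
    "vitamin_d": (None, 365),     # no repeats of normal values <1 year
    "ferritin": (None, 365),      # no repeats of normal values <1 year
}

def is_inappropriate_repeat(test, index_date, index_abnormal, repeat_date):
    """Return True when a repeat falls inside the minimum interval for its test type."""
    abnormal_min, normal_min = CRITERIA[test]
    min_days = abnormal_min if index_abnormal else normal_min
    if min_days is None:
        return False              # no criterion applies, so never flag
    return (repeat_date - index_date).days < min_days

# Hypothetical orders: a TSH repeat after 31 days, and HbA1c repeats after 100 days.
flag_tsh = is_inappropriate_repeat("tsh", date(2010, 1, 1), False, date(2010, 2, 1))
flag_a1c_abnormal = is_inappropriate_repeat("hba1c", date(2010, 1, 1), True, date(2010, 4, 11))
flag_a1c_normal = is_inappropriate_repeat("hba1c", date(2010, 1, 1), False, date(2010, 4, 11))
print(flag_tsh, flag_a1c_abnormal, flag_a1c_normal)   # → True False True
```

Because the check needs only the test type, the two dates, and the abnormal flag of the index result, it could run at order entry without chart review. A production rule would also need an override path for the documented exceptions.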
Study Limitations
There are a number of limitations to the current study.
First, we examined only six test types, of the hundreds available at most laboratories. While these six tests represented
25% of study laboratory test volumes by cost, the picture
is far from complete regarding inappropriate repeat testing.
Morgen and Naugler / Inappropriately Repeated Laboratory Testing
❚Table 4❚
Estimated Test Prices Within the Study Laboratory and the Distribution of Surveyed Test Prices Across Canada and the United States
Columns: Test Type; Estimated Internal Cost in This Studya; Selected Percentiles of Surveyed Canada List Prices for a Single Test (CAD); Selected Percentiles of Surveyed US List Prices for a Single Test (USD)
Rows include: Vitamin B12; Vitamin D
CAD, Canadian dollars; HbA1c, hemoglobin A1c; TSH, thyroid-stimulating hormone; USD, US dollars.
a Estimated internal costs are rounded to the nearest $1.00 CAD.

❚Table 5❚
Estimated Total Cost of Redundant Testing in the Canadian and US Health Care Systems
Columns: Test Type; Estimated National Annual Canadian Cost Burden for Each Percentile Cost Estimate in Table 4 (Millions CAD); Estimated National Annual US Cost Burden for Each Percentile Cost Estimate in Table 4 (Millions USD)
Rows include: HbA1c (normal resulta); HbA1c (abnormal resultb); Vitamin B12; Vitamin D (normal resulta); Ferritin (normal resulta); Totals for all six testsc
CAD, Canadian dollars; HbA1c, hemoglobin A1c; TSH, thyroid-stimulating hormone; USD, US dollars.
a The percentages and annual test counts for these rows reflect only normal results (ie, the count and percentage of HbA1c index tests with a normal result that were repeated
b The percentages and annual test counts for this row reflect only abnormal results.
c Totals may not add up due to rounding.
Furthermore, the types of definitions for inappropriate testing
used in this study will not be easily applicable to all laboratory tests. This is particularly true for tests with a wide variety
of applications in a wide variety of clinical situations. However, for many tests, the inherent kinetics of the biological
markers involved may provide certain limits for reasonable
repeat times—for example, using the biological half-life of
drugs to guide limits for repeat testing of drug levels. And for
other tests, deliberation by guidelines committees may reveal
reasonable limits on repeat testing intervals in many clinical
situations, as they did for many of the tests studied in the
current article. Ultimately, some tests will be well suited to
this approach, and others will be poorly suited. However, the
current approach represents a useful general strategy that can
identify significant volumes of inappropriate testing and has
favorable characteristics when feasible.
Second, there are a small number of exceptions to our
criteria for inappropriateness, defined in Table 1. For cholesterol, it may be considered appropriate to retest a patient after
1 month (rather than 3 months) if he or she is undergoing a
treatment intensification. For HbA1c, pregnant women may
be retested at 1 month rather than 3 months. Such exceptions
would have to be considered when implementing any plan to
reduce unnecessary testing but are likely to have only a small
impact on the estimated burden of inappropriate testing.
Third, the criteria for several tests in our study (vitamin
B12, vitamin D, and ferritin) were not based on consensus
guidelines, rendering them less standardized. However, the
argument for these criteria is straightforward. Each of these
tests screens for problems (vitamin B12 deficiency, vitamin
D deficiency, and iron deficiency) that are relatively uncommon and develop slowly, so after a negative test result it is
difficult to justify repeat testing.
Fourth, our approach of targeting repeated laboratory
tests will not identify all inappropriate testing. Indeed, one
of the most compelling reasons to study inappropriate repeat
testing is that it represents a fraction of inappropriate testing that is relatively easier to target and define, in a field
rife with vague definitions that make careful study and
improvement programs difficult. So while we estimated in
Table 3 that 16% of all tests we studied were inappropriate,
this really represents only a lower bound on this percentage. Indeed, the trend observed in Table 3, where tests with
formal consensus-defined restrictions about repeat testing
tended to have lower rates of inappropriate repeats, implies
that the percentage of inappropriate repeats may be higher
for tests without clear consensus definitions and supports the
potential benefit of creating such guidelines.
Finally, it is quite difficult to estimate costs for laboratory testing. Even within a study laboratory, where it might
seem straightforward to estimate the cost of a test, there
is controversy due to the distinction between the marginal
cost savings of avoiding testing and the price charged by a
laboratory for a given test.44 In the simplest case, the reduction of a laboratory test volume by one test will save only
the reagent cost of that test and will not affect the fixed
costs of operating the laboratory. However, when testing is
reduced by larger volumes and over longer periods of time,
the marginal cost reduction may include reduced staffing
and less need to increase analyzer capacity, thus deferring
equipment upgrade/maintenance and staff hiring costs as
overall laboratory volumes increase. We have dealt with this
issue by providing the reagent costs as a lower bound and
the total internal cost as an upper bound, defining the range
of potential cost savings. The actual marginal savings will
depend on the relative volume of tests avoided and whether
this decrease is consistent over time.
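The lower and upper cost bounds described above can be written as a short calculation. In this sketch the 16.4% fraction is the overall figure reported in Table 3, while the annual test count and the two per-test costs are hypothetical values used only to show how the range is formed.

```python
def annual_cost_range(annual_tests, fraction_inappropriate, reagent_cost, full_cost):
    """Return (lower, upper) annual cost of inappropriate repeats.

    lower uses only the reagent (variable) cost per test;
    upper uses the total internal cost per test.
    """
    inappropriate_tests = annual_tests * fraction_inappropriate
    return inappropriate_tests * reagent_cost, inappropriate_tests * full_cost

# Hypothetical inputs: 100,000 tests per year, $2 reagent cost, $8 total internal cost.
lower, upper = annual_cost_range(100_000, 0.164, 2.0, 8.0)
print(round(lower), round(upper))
```

The realized saving would fall somewhere inside this range, closer to the lower bound for small or short-lived reductions and closer to the upper bound when volumes fall consistently over time.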
Extending such cost estimates beyond a single laboratory is even more problematic, since (beyond the problems
already discussed) laboratories are funded under a variety of
remuneration schemes and at a variety of prices. Thus, any
specific analysis of potential costs would be necessarily relevant only to a very narrow subset of the population. Instead,
we have chosen to provide cost estimates based on internal
costs at the study laboratory (representing the minimum
burden of redundant testing, when there is no profit margin
to the testing performed), as well as for a full range of costs
derived from Canadian and US list prices provided by a
range of laboratories. This allows readers to get a sense of
the potential cost burden of redundant testing under a variety
of cost situations. There are similar challenges relating to
the use of test volumes from one geographic area to estimate
volumes in other areas. However, given the complexity of
presenting the range of potential cost situations already considered, we have simply extrapolated our local test volumes
directly to the Canadian and US systems using scaling factors according to the population sizes served. This will not
provide definitive estimates but is a reasonable starting point
for discussion about issues surrounding redundant testing.
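The population-based extrapolation amounts to multiplying a local estimate by the ratio of the target population to the population served locally. The sketch below illustrates only that scaling step under stated assumptions: the function name `scale_by_population` is hypothetical, and the populations and dollar values are round placeholder numbers, not the figures used in this study.

```python
def scale_by_population(local_value: float,
                        local_population: float,
                        target_population: float) -> float:
    """Scale a local estimate to a target region in proportion to population.

    Assumes per-capita test volumes and unit costs are the same in both
    regions, which is a simplification rather than an established fact.
    """
    if local_population <= 0 or target_population <= 0:
        raise ValueError("populations must be positive")
    return local_value * (target_population / local_population)


# Placeholder inputs for illustration only.
local_low, local_high = 1.0, 4.0   # local cost range, millions of dollars
local_pop = 1_000_000              # population served by the local laboratory
target_pop = 30_000_000            # population of the target region

scaled_low = scale_by_population(local_low, local_pop, target_pop)
scaled_high = scale_by_population(local_high, local_pop, target_pop)
print(f"{scaled_low:.0f} to {scaled_high:.0f} million")  # prints "30 to 120 million"
```

Because both bounds are multiplied by the same factor, the relative width of the cost range is preserved; any regional differences in ordering patterns or pricing are not captured.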
Survival analysis of laboratory testing data for six tests
with guideline-based or otherwise straightforward definitions
of inappropriate repeat testing has demonstrated that 16% of
these tests are repeated inappropriately. This provides a fairly
strong minimum estimate (ie, lower bound) for the burden of
inappropriate repeat testing. It also provides a valuable and
relatively unexploited opportunity to target inappropriate testing using EHR or other computer-based strategies and reduce
unnecessary test volumes by leveraging these straightforward
guidelines. Because redundant tests appear to make no direct
contribution to guiding patient care,14-16 such interventions
could reduce health care expenditures while likely reducing patient morbidity.5,6 The annual savings achievable by
eliminating the inappropriate tests targeted in this study could
amount to as much as $2 million within our institution and
correspond to an estimated $160 million across Canada. In
the United States, with roughly 10 times the population of Canada, the
health system savings could be an order of magnitude higher.
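As an illustration of how such criteria might be automated, the following sketch flags a repeat as inappropriate using only the time since the index test and the index test value. It is a hypothetical example, not the implementation used in this study: the rule table, the test name `EXAMPLE_TEST`, the 90-day interval, and the value threshold are all placeholders and do not represent clinical guidance.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict


@dataclass(frozen=True)
class RepeatRule:
    """A repeat is flagged if it occurs too soon after a qualifying index result."""
    min_interval_days: int                    # minimum acceptable days before a repeat
    index_qualifies: Callable[[float], bool]  # does the index value trigger the rule?


# Placeholder rule for illustration only; not a clinical criterion.
EXAMPLE_RULES: Dict[str, RepeatRule] = {
    "EXAMPLE_TEST": RepeatRule(min_interval_days=90,
                               index_qualifies=lambda value: value < 5.0),
}


def is_inappropriate_repeat(test: str,
                            index_value: float,
                            index_date: date,
                            repeat_date: date,
                            rules: Dict[str, RepeatRule] = EXAMPLE_RULES) -> bool:
    """Return True if the repeat violates the rule defined for this test."""
    rule = rules.get(test)
    if rule is None:
        return False  # no objective criterion defined, so nothing is flagged
    days_elapsed = (repeat_date - index_date).days
    too_soon = 0 <= days_elapsed < rule.min_interval_days
    return too_soon and rule.index_qualifies(index_value)


# A repeat 31 days after a qualifying index result is flagged.
print(is_inappropriate_repeat("EXAMPLE_TEST", 4.0, date(2010, 1, 1), date(2010, 2, 1)))
```

A rule of this form needs no clinical context beyond the prior result, which is what makes it a candidate for order-entry alerts or laboratory-side filtering.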
Corresponding author: Christopher Naugler, MD, MSc, FRCPC,
C410, Diagnostic and Scientific Centre, 9, 3535 Research Rd NW,
Calgary AB T2L 2K8, Canada; [email protected]
Acknowledgments: We thank Jeannine Viczko and Maggie Guo
for help with data extraction.
1. Forsman RW. Why is the laboratory an afterthought for
managed care organizations? Clin Chem. 1996;42:813-816.
2. McGrail KM, Evans RG, Barer ML, et al. Diagnosing
senescence: contributions to physician expenditure increases
in British Columbia, 1996/97 to 2005/06. Healthc Policy.
3. Sivananthan SN, Peterson S, Lavergne R, et al. Designation,
diligence and drift: understanding laboratory expenditure
increases in British Columbia, 1996/97 to 2005/06. BMC
Health Serv Res. 2012;12:472.
4. Naugler C. A perspective on laboratory utilization
management from Canada. Clin Chim Acta. 2014;427:142-144.
5. Rang M. The Ulysses syndrome. Can Med Assoc J.
6. McGregor MJ, Martin D. Testing 1, 2, 3: is overtesting
undermining patient and system health? Can Fam Physician.
2012;58:1191-1193, e615-e617.
7. Bates DW, Goldman L, Lee TH. Contaminant blood cultures
and resource utilization: the true consequences of false-positive results. JAMA. 1991;265:365-369.
8. De Boer AS, Blommerde B, de Haas PEW, et al. False-positive Mycobacterium tuberculosis cultures in 44 laboratories
in the Netherlands (1993 to 2000): incidence, risk factors,
and consequences. J Clin Microbiol. 2002;40:4004-4009.
9. Tluczek A, Orland KM, Cavanagh L. Psychosocial
consequences of false-positive newborn screens for cystic
fibrosis. Qual Health Res. 2011;21:174-186.
10. Beck JR. Does feedback reduce inappropriate test ordering?
Arch Pathol Lab Med. 1993;117:33-34.
11. Bareford D, Hayling A. Inappropriate use of laboratory
services: long term combined approach to modify request
patterns. BMJ. 1990;301:1305-1307.
12. Perraro F, Rossi P, Liva C, et al. Inappropriate emergency test
ordering in a general hospital: preliminary reports. Qual Assur
Health Care. 1992;4:77-81.
13. Ashley JS, Pasker P, Beresford JC. How much clinical
investigation? Lancet. 1972;1:890-892.
14. Daniels M, Schroeder SA. Variation among physicians in
use of laboratory tests, II: relation to clinical productivity and
outcomes of care. Med Care. 1977;15:482-487.
15. Bell DD, Ostryzniuk T, Verhoff B, et al. Postoperative
laboratory and imaging investigations in intensive care units
following coronary artery bypass grafting: a comparison of two
Canadian hospitals. Can J Cardiol. 1998;14:379-384.
16. Powell EC, Hampers LC. Physician variation in test ordering
in the management of gastroenteritis in children. Arch Pediatr
Adolesc Med. 2003;157:978-983.
17. van Walraven C, Naylor C. Do we know what inappropriate
laboratory utilization is? a systematic review of laboratory
clinical audits. JAMA. 1998;280:550-558.
18. Zhi M, Ding EL, Theisen-Toupal J, et al. The landscape of
inappropriate laboratory testing: a 15-year meta-analysis.
PLoS One. 2013;8:e78962.
19. Van Walraven C, Raymond M. Population-based study of
repeat laboratory testing. Clin Chem. 2003;49:1997-2005.
20. Anderson TJ, Grégoire J, Hegele RA, et al. 2012 update
of the Canadian Cardiovascular Society guidelines for the
diagnosis and treatment of dyslipidemia for the prevention
of cardiovascular disease in the adult. Can J Cardiol.
21. Iliadi V, Kastanioti C, Maropoulos G, et al. Inappropriately
repeated lipid tests in a tertiary hospital in Greece: the
magnitude and cost of the phenomenon. Hippokratia.
22. Virani SS, Woodard LD, Wang D, et al. Correlates of repeat
lipid testing in patients with coronary heart disease. JAMA
Intern Med. 2013;173:1439-1444.
23. Goodwin JS, Asrabadi A, Howrey B, et al. Multiple
measurement of serum lipids in the elderly. Med Care.
24. Laxmisan A, Vaughan-Sarrazin M, Cram P. Repeated
hemoglobin A1C ordering in the VA Health System. Am J
Med. 2011;124:342-349.
25. Akan P, Cimrin D, Ormen M, et al. The inappropriate use
of HbA1c testing to monitor glycemia: is there evidence in
laboratory data? J Eval Clin Pract. 2007;13:21-24.
26. Canadian Diabetes Association Clinical Practice
Guidelines Expert Committee, Cheng AYY. Canadian
Diabetes Association 2013 clinical practice guidelines for
the prevention and management of diabetes in Canada:
introduction. Can J Diabetes. 2013;37(suppl 1):S1-S3.
27. Alberta Health Services. Alberta Health Services Laboratory
Bulletin—March 28, 2014. http://www.albertahealthservices.
ca/LabServices/wf-lab-bulletin-new-nemoglobin-a1c-testutilization-criteria.pdf. Accessed September 12, 2014.
28. Towards Optimized Practice Program. Clinical practice
guidelines: investigation and management of primary
thyroid dysfunction. 2008.
download/350/thyroid_guideline.pdf. Accessed August 18, 2014.
29. Smellie WSA, Wilson D, McNulty CAM, et al. Best
practice in primary care pathology: review 1. J Clin Pathol.
30. Health Quality Ontario. Serum vitamin B12 testing: a rapid
review. 2012.
eds/rapid-reviews/vitamin-b12-121212-en.pdf. Accessed
August 19, 2014.
31. Moyer VA; U.S. Preventive Services Task Force. Vitamin
D and calcium supplementation to prevent fractures in
adults: U.S. Preventive Services Task Force recommendation
statement. Ann Intern Med. 2013;158:691-696.
32. Nestle M, Nesheim MC. To supplement or not to
supplement: the U.S. Preventive Services Task Force
recommendations on calcium and vitamin D. Ann Intern
Med. 2013;158:701-702.
33. Rizzoli R, Boonen S, Brandi M-L, et al. Vitamin D
supplementation in elderly or postmenopausal women:
a 2013 update of the 2008 recommendations from the
European Society for Clinical and Economic Aspects of
Osteoporosis and Osteoarthritis (ESCEO). Curr Med Res
Opin. 2013;29:305-313.
34. Prentice RL, Pettinger MB, Jackson RD, et al. Health risks
and benefits from calcium and vitamin D supplementation:
Women’s Health Initiative clinical trial and cohort study.
Osteoporos Int. 2013;24:567-580.
35. Health Quality Ontario. Ferritin testing: a rapid review. 2012. Accessed August 19, 2014.
36. Oh RC, Franzos T, Montoya C. Clinical inquiry: how
best to diagnose iron-deficiency anemia in patients with
inflammatory disease? J Fam Pract. 2012;61:160-161.
37. Bacon BR, Adams PC, Kowdley KV, et al. Diagnosis and
management of hemochromatosis: 2011 practice guideline
by the American Association for the Study of Liver Diseases.
Hepatology. 2011;54:328-343.
38. Brittenham GM, Cohen AR, McLaren CE, et al. Hepatic
iron stores and plasma ferritin concentration in patients with
sickle cell anemia and thalassemia major. Am J Hematol.
39. Cappellini MD, Porter J, El-Beshlawy A, et al. Tailoring iron
chelation by iron intake and serum ferritin: the prospective
EPIC study of deferasirox in 1744 patients with transfusion-dependent anemias. Haematologica. 2010;95:557-566.
40. Stewart BA, Fernandes S, Rodriguez-Huertas E, et al. A
preliminary look at duplicate testing associated with lack
of electronic health record interoperability for transferred
patients. J Am Med Inform Assoc. 2010;17:341-344.
41. Ganiyu-Dada Z, Bowcock S. Repeat haematinic requests
in patients with previous normal results: the scale of the
problem in elderly patients at a district general hospital. Int J
Lab Hematol. 2011;33:610-613.
42. Kwok J, Jones B. Unnecessary repeat requesting of tests: an
audit in a government hospital immunology laboratory. J Clin
Pathol. 2005;58:457-462.
43. Bates DW, Boyle DL, Rittenberg E, et al. What proportion
of common diagnostic tests appear redundant? Am J Med.
44. MacMillan D. Calculating cost savings in utilization
management. Clin Chim Acta. 2014;427:123-126.