MEDICAL SIGNIFICANCE OF LABORATORY RESULTS

THE AMERICAN JOURNAL OF CLINICAL PATHOLOGY
Vol. 50, No. 6
Printed in U.S.A.
Copyright 1968 by The Williams & Wilkins Co.
MEDICAL SIGNIFICANCE OF LABORATORY RESULTS
ROY N. BARNETT, M.D.
Chairman,
Subcommittee on Criteria of Medical Usefulness of the College of
American Pathologists, Chicago, Illinois 60601
Tests performed in clinical laboratories
are among the main diagnostic aids available
to physicians. The great demand for these
tests and the enormous number performed
have led to wide public interest in the manner in which they are carried out. In turn,
this interest has led to Federal specifications through legislation for clinical laboratories in two areas. One is the Medicare Act,
to assure that Medicare beneficiaries receive
proper services by setting standards for
acceptable laboratories which may be paid
under the act. The other is the Clinical
Laboratories Improvement Act of 1967, to
regulate the laboratories dealing in interstate commerce. Regulations under both
acts set basic standards of operation and require participation in a proficiency testing
program to help assure accurate laboratory
results.
Fundamental to consideration of proper
accurate test results is the question: how
accurate must clinical laboratory work be?
There is no simple over-all answer to this
question, and any answer that we find will
be temporary, depending on the progress of
medical knowledge and technology. The
Standards Committee* of the College of
American Pathologists, which has been the
major proficiency surveying body in the
United States for a number of years,† has
been deeply involved in this question since
Received May 22, 1968.
Requests for reprints should be sent to: College
of American Pathologists, 230 N. Michigan Avenue, Chicago, Ill. 60601.
* The Committee is also concerned with certification of certain medically essential standard
materials (cyanmethemoglobin, bilirubin), provision of aqueous standards to laboratories, and
evaluation of certain laboratory products.
† The number of laboratories participating in
these surveys in 1967 was: comprehensive survey,
1500; basic survey, 1900 (total 3400). In 1968 it was:
comprehensive survey, 2200; basic survey, 1700
(total 3900).
the inception of its surveys in 1949. These
surveys lead to two questions:
1. How well are laboratories performing
certain common analyses?
2. Is this level of performance adequate to
meet the demands of good medical practice?
The present report is an examination of
this problem by the Subcommittee on
Criteria of Medical Usefulness (of which the
author is Chairman), appointed in October
1967 by Dr. O. B. Hunter, President of the
College of American Pathologists, and by
Dr. Russell Eilers, Chairman of the Standards Committee. The report has been prepared after consultation with various interested pathologists, other physicians, and
other medical laboratory scientists. It is a
provisional report, intended to be the basis
for further discussion with all interested
groups, and intended to be changed as
science progresses.
DEFINITION OF TERMS
A glossary of pertinent terms is introduced
at this point so that there will be no misunderstandings later.
1. Accuracy is closeness to the true value.
It must be admitted at once that we do not
"know" the true value of substances in
biologic material. Frequently different
methods give different results.
2. Precision is the closeness with which
repeat analyses of the same material can be
made. We may define precision by the following terms:
A. Standard deviation. If the analytic results fall in a gaussian distribution about a
mean (x̄), we can calculate a value on either
side of the mean known as the standard
deviation (s). The percentage of values included within various multiples of the
standard deviation is known. The value
which we use most often in this discussion
is the mean ± 2 s encompassing 95.45%
of all of the results.
B. Coefficient of variation. Often the
standard deviation can be usefully expressed
as a percentage of the mean rather than as an
absolute value. This is known as the coefficient of variation and is derived by the
formula
$$\mathrm{CV} = \frac{\text{standard deviation}}{\text{mean value}} \times 100$$
C. Percentiles. These are the cumulative
percentages of numerical observations,
usually arranged in ascending order. If it is
felt that numerical observations do not fit a
gaussian curve one can actually calculate a
range to exclude very low values (below 2.5
percentile) and very high values (above
97.5 percentile) and thereby include a 95%
range comparable to x̄ ± 2 s.
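The three measures of precision defined above can be sketched in a few lines of Python. This is only an illustration: the function names and any replicate values passed to them are hypothetical, and the sample (n - 1) form of the standard deviation is an assumption, since the text does not say which form is intended.

```python
import statistics


def summarize(values):
    """Return (mean, standard deviation, coefficient of variation in %)."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1), an assumption
    cv = s / mean * 100.0         # CV = standard deviation / mean value x 100
    return mean, s, cv


def central_95_range(values):
    """Percentile-based 95% range: the 2.5th and 97.5th percentiles."""
    # n=40 yields cut points at every 2.5%; the first and last bound the central 95%.
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]
```

For results that follow a gaussian curve, the range returned by `central_95_range` should be close to the mean ± 2 s; for skewed results the percentile range is the one that still excludes 2.5% at each end.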
3. "State of the art." This is the current
evaluation of the accuracy and precision of
laboratory analyses. In this report we derive
these values from the 3400 laboratories
which participated in the 1967 voluntary
survey programs of the College of American
Pathologists, and which forwarded their
results to the Standards Committee for
statistical analysis. The Committee recognizes that survey specimens may be handled
more carefully than routine specimens,
particularly when surveys are required to
meet regulatory requirements. On the other
hand, survey samples are unfamiliar to many
laboratory workers and may be handled incorrectly because they differ from routine
samples. Sometimes survey samples themselves are inaccurate. With all of these reservations we still believe that such voluntary
survey data offer a sound indication of
routine laboratory performance.
4. Medically significant (medically useful).
Medically significant limits of accuracy encompass those values which are of maximal
use in patient care. The many factors which
must be considered in setting specific limits
are taken up in later sections.
5. Normal range. Limits for the "normal"
population are commonly expressed as those
values which include 95% of persons not
known to have an illness affecting the component under consideration. The 95% range
may be calculated from a gaussian or percentile distribution.
6. Decision level. This is the dividing point at which medical decisions are commonly
made concerning the presence or absence of a
disease state or the necessity for treatment.
For example, serum potassium provides two
decision levels. One is at 3.0 mEq. per l.;
values of this level or lower would be commonly accepted as indicating hypokalemia
with a need for prescribing potassium supplements. The other is at 6.0 mEq. per l.; values
of this level or higher would be generally
accepted as indicating hyperkalemia and
suggesting a need for treatment.
GUIDELINES TO FOLLOW IN ESTABLISHING LIMITS FOR REPORTING VALUES
OF MEDICAL SIGNIFICANCE
A. Desirable limits for accuracy and precision must be defined individually for each
type of analysis performed.
Comment. It would be most convenient if a
single set of limits could be applied to every
type of analysis. This is impossible. Some
clinical laboratory analyses are reported
quantitatively, and others as positive or
negative; others require value judgments as
to exact identification of lesions, cells, or
organisms.
Even within the primarily quantitative
disciplines such as clinical chemistry and
hematology, significant limits differ for
different substances.
1. Some differences are technical in
nature, reflecting the much greater precision
of certain analytic technics. The thinking of
attending physicians over the years reflects
their observation of this fact; they draw conclusions from small changes in values for
some tests and not for others. When better
methods are introduced, clinicians utilize the
greater precision by appropriate changes in
their interpretation of results.
2. Other differences are physiologic. For
calcium, a substance which is under close
homeostatic control, a very precise technic
is most desirable. For glucose, a substance
whose blood level varies widely depending on
ingestion of food, emotion, time of day, and
other factors, such great precision of analysis
is not helpful in the interpretation of test
results.
B. Desirable limits for accuracy and precision must be defined at each level of medical
significance. Maximal accuracy and precision
are necessary at decision levels.
Comment. For example, the bilirubin determination decision levels are at about 1.2
mg. per 100 ml., separating normal from
hyperbilirubinemic individuals, and at about
20.0 mg. per 100 ml., the critical level for
embarking on exchange transfusion of erythroblastotic infants. At these levels physicians
require the greatest accuracy because vital
decisions depend on the results. On the other
hand, at such intermediate levels as 9.0 mg.
per 100 ml. no change in diagnosis or treatment would follow a relatively large change
in the reported result. Another example is in
glucose determination. A 2-hr. postprandial
plasma glucose of 120 mg. per 100 ml. is
accepted as normal; a level of 130 mg. per
100 ml. leads to further consideration of
possible diabetes, so that 120 mg. represent
a decision level. A much larger difference
between 200 mg. per 100 ml. and 250 mg.
per 100 ml. would alter neither diagnosis nor
treatment.
C. Accuracy and precision of a degree
greater than is useful clinically should not be
required if extra time or expense is thereby
made necessary.
Comment. Erythrocyte counts done in a
single chamber are not accurate enough to
be clinically useful. If four chambers are
counted and the values averaged, a useful
but expensive result is achieved. As technology improved, automatic counting devices appeared and the results became both
cheap and clinically useful. Another example
is identification of Salmonella. Knowledge
that a stool culture contains a Salmonella
grouped by group serum and biochemical
reactions is medically vital information. Further complete identification of the organism
by antigen analysis is not necessary for patient care, despite the epidemiologic information provided; it is also prohibitively expensive in hospital practice. Antigen analysis
therefore should not be obligatory for ordinary medical care facilities.
D. Desirable accuracy should be such that
the method will create no substantial divergence
from generally accepted values for normal and
disease states.
Comment. For many nonenzyme constituents of body fluids physicians have
learned normal ranges. It is not proper to
adopt a new method yielding different
ranges unless there are substantial advantages in accuracy, precision, ease, rapidity of performance, or freedom from random
error. If a truly advantageous method is
developed and introduced a thorough explanation must be made to clinicians. This
was done, for example, when "true" glucose
methods replaced Folin-Wu technics. Pressure to change to new technics yielding
different normals should not be applied
unless medical benefits are clearly promoted
thereby.
E. Desirable precision should be such that
errors induced by the measurement process
do not significantly widen the range of values
for the normal population.
Comment. This objective is achieved by
methods whose standard deviation does
not exceed one-twelfth to one-twentieth of
the normal population range defined as
including 95% of normal persons. The
"normal" range is a composite of true
differences between individuals and of
differences introduced by the technical
methods. If the standard deviation of the
method is one-twelfth of the population
range it will cause the apparent range to be
5.4% larger than the true range; if it is one-twentieth of the population range it will
cause an enlargement of 2.0%. This particular criterion is less reliable than the
others noted because our present knowledge
of normal ranges is inadequate and uncertain. If the normal range were compiled
for a group uniform as to sex, age, ethnic
group, and geographic location, it would be
narrower than the usual range for all adults.
Goals for precision in this category would
therefore differ, depending on the population chosen.
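The 5.4% and 2.0% figures quoted above can be reproduced with a short calculation. The sketch below rests on assumptions that the text does not state explicitly: that true between-person variation and method error are independent, so their variances add, and that the 95% population range is taken as four true standard deviations wide (mean ± 2 s). The function name is hypothetical.

```python
import math


def apparent_range_inflation(method_sd_as_fraction_of_range):
    """Percentage by which method error widens the apparent normal range."""
    true_sd = 1.0                   # arbitrary unit; only ratios matter
    true_range = 4.0 * true_sd      # 95% range taken as mean +/- 2 s (assumption)
    method_sd = method_sd_as_fraction_of_range * true_range
    # Independent sources of variation: variances add (assumption).
    observed_sd = math.sqrt(true_sd ** 2 + method_sd ** 2)
    return (observed_sd / true_sd - 1.0) * 100.0


print(round(apparent_range_inflation(1 / 12), 1))  # one-twelfth of the range
print(round(apparent_range_inflation(1 / 20), 1))  # one-twentieth of the range
```

Under these assumptions the two calls give 5.4 and 2.0, in agreement with the figures in the Comment.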
F. Ability to distinguish normal from
abnormal values is often more important than
the determination of absolute values.
Comment. For some substances in body
fluids there are many analytic methods
yielding widely disparate numerical results.
Enzyme analyses fall into this category.
Even laboratories allegedly using identical
methods rarely achieve identical results,
yet most of them distinguish adequately
between normal and abnormal values—
this is the information which the attending
physician needs.
G. An approximate result available promptly
may be much more useful than an exact
result reported after a long delay.
Comment. Two examples will illustrate
this point clearly.
1. In an unconscious diabetic patient an
immediate report that the blood glucose is
very low is an invaluable guide to prompt
treatment and may be lifesaving. Conversely,
a precise report that the glucose level is
20.3 mg. per 100 ml. is useless if it is not
available until 24 hr. later, when the patient
is dead.
2. A Gram stain of purulent spinal fluid
correctly and immediately reported as
demonstrating Gram-positive lanceolate diplococci is vital information for initiating
treatment. If the full report of Diplococcus
pneumoniae Type 18 is delayed for 2 days
it is useless.
H. An approximate result available locally
under usual laboratory conditions may be more
useful medically than a more accurate value
available only at a distant center.
Comment. Here the medical use to which
the result is to be put is crucial. Again blood
glucose is an example; a crude method which
can be performed with locally available
personnel and equipment is necessary to
save lives. On the other hand, a crude
protein-bound iodine method would never
be justified because delay in reporting of
mailed out specimens will not harm the
patient.
I. A less precise analytic technic free of
large errors may be preferable to a more precise
method subject to large random errors.
This aspect of analytic technics has not
received adequate attention. It is particularly important for tests in which a sudden
large shift of values or a single abnormal
result may lead to immediate therapeutic
or diagnostic decisions. Some technical
factors which lead to large random errors
are: complex or difficult instrument manipulations, too many steps in the procedure,
and intricate calculations of results.

TABLE 1
MEDICAL SIGNIFICANCE VALUES

Component             Decision Level*   s at Same Level†   Calculated CV (%)   Low Level*
Hemoglobin            10.5 Gm.          0.5                4.76                5 Gm.
Hematocrit            32%               1.0                3.12                10%
Glucose               50 mg.            5.0                10.00               20 mg.
Glucose               100 mg.           5.0                5.00
Glucose               120 mg.           5.0                4.17
Blood urea nitrogen   27 mg.            2.0                7.41                4 mg.
Uric acid             6.0 mg.           0.5                8.33                4 mg.
Total protein         7.0 Gm.           0.3                4.28                2 Gm.
Albumin               3.5 Gm.           0.25               7.14                1.5 Gm.
Globulin              3.5 Gm.           0.25               7.14                1.5 Gm.
Cholesterol           250 mg.           20.0               8.00                80 mg.
Bilirubin             1.0 mg.           0.2                20.00               0.4 mg.
Bilirubin             20.0 mg.          1.5                7.50
Calcium               11.0 mg.          0.25               2.27                5.0 mg.
Phosphorus            4.5 mg.           0.25               5.56                1.5 mg.
Sodium                130 mEq./l.       2.0                1.54                100 mEq./l.
Sodium                150 mEq./l.       2.0                1.33
Potassium             3 mEq./l.         0.25               8.33                1.5 mEq./l.
Potassium             6 mEq./l.         0.25               4.17
Chloride              90 mEq./l.        2.0                2.22                50 mEq./l.
Chloride              110 mEq./l.       2.0                1.82
CO2                   20 mEq./l.        1.0                5.00                8 mEq./l.
CO2                   30 mEq./l.        1.0                3.33

* All values per 100 ml. unless indicated. One low level is given for each component.
† Same units as corresponding decision level.

TABLE 2
COMPARISON OF 1967 CV LIMITS FOR MOST PRECISE METHOD WITH "MEDICALLY SIGNIFICANT CV"

1. Component and Level         2. Medically         3. State of       4. Per Cent of Participant
                               Significant CV (%)   the Art CV (%)    Values Excluded, 3 vs. 2
Hemoglobin, 10.5 Gm.           4.8                  3.5               0
Glucose, 100 mg.               5.0                  5.3               1.4
Glucose, 120 mg.               4.2                  5.2               0.1
Blood urea nitrogen, 27 mg.    7.4                  8.3               3.0
Uric acid, 6.0 mg.             8.3                  5.8               0
Total protein, 7.0 Gm.         4.3                  3.9               0
Albumin, 3.5 Gm.               7.1                  8.8               0.2
Globulin, 3.5 Gm.              7.1                  8.8               0.2
Cholesterol, 250 mg.           8.0                  9.1               3.3
Bilirubin, 1.0 mg.             20.0                 23.3              4.0
Bilirubin, 20.0 mg.            7.5                  12.8              19.7
Calcium, 11.0 mg.              2.3                  2.8               5.7
Phosphorus, 4.5 mg.            5.6                  8.4               13.7
Sodium, 130 mEq./l.            1.5                  1.8               4.9
Sodium, 150 mEq./l.            1.3                  2.0               14.8
Potassium, 3 mEq./l.           8.3                  3.7               0
Potassium, 6 mEq./l.           4.2                  3.3               0
Chloride, 90 mEq./l.           2.2                  2.1               0
Chloride, 110 mEq./l.          1.8                  2.1               4.2

Note. For CAP proficiency surveys, values outside ± 2 CV calculated from Column 3 are considered to be not acceptable, thus excluding 4.55%
of all results. If Column 2 values were to be used
to evaluate survey performance, an additional
percentage of participant values as indicated in
Column 4 would be considered not acceptable.
SPECIFIC LIMITS FOR MEDICAL SIGNIFICANCE
Table 1 is a synthesis of opinions by
clinicians and laboratory specialists. It
lists 16 commonly tested blood constituents
(Column 1) at 23 decision levels (Column 2).
Column 3 gives the appropriate standard
deviation at the corresponding decision
level and represents what would be clinically expected for ordinary use, that is,
that 95% of analytic values would be within
± 2 s of the true value. Column 4 is the
coefficient of variation calculated from
Columns 2 and 3. Column 5 is a list of
"low" values below which accuracy is unnecessary; a report that the concentration
is this value or lower is adequate for medical
purposes.
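The Column 4 calculation for Table 1 can be checked directly from Columns 2 and 3. The following is a minimal sketch; the function name is hypothetical, and the three rows are taken from Table 1 (hemoglobin, calcium, and the lower potassium decision level).

```python
def coefficient_of_variation(decision_level, s):
    """CV (%) = standard deviation / decision level x 100."""
    return s / decision_level * 100.0


# (component, decision level, s at same level), as listed in Table 1
rows = [
    ("Hemoglobin", 10.5, 0.5),
    ("Calcium", 11.0, 0.25),
    ("Potassium", 3.0, 0.25),
]

for component, level, s in rows:
    print(component, round(coefficient_of_variation(level, s), 2))
```

The same absolute s gives very different CVs at different levels (compare potassium at 3 and at 6 mEq. per l.), which is why the CV must be stated for each decision level separately.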
INTRALABORATORY AND INTERLABORATORY EVALUATION
Precision within a single laboratory is
inevitably superior to that between laboratories, no matter how excellent their performance, because interlaboratory differences result from systematic bias.* For
example, let us assume that five competent
laboratories analyze a sample of serum for
glucose, each with a day to day precision of
2 mg. for 1 s. However, the mean values
are 90, 94, 98, 102, and 106 mg. per 100 ml.,
respectively. A physician who used any one
of these laboratories routinely would be able
to use the normal and abnormal results
readily. However, in a survey, if the mean
value were 98, and only values from 94 to 102
were thereby accepted as satisfactory by
using the precision of a single laboratory,
60% of the values for these five laboratories
would be outside the 2-s range of the mean.
It is necessary, therefore, to use a system
based on results of all participants to
incorporate both interlaboratory and intralaboratory variability into proficiency evaluation.
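The 60% figure in the five-laboratory example can be reproduced if each laboratory's results are assumed to scatter in a gaussian fashion about its own mean with s = 2 mg. That reading is an interpretation, not something the text spells out, and the function name below is hypothetical. The sketch averages, over the five laboratories, the probability that a single result falls outside 98 ± 4 mg.

```python
from statistics import NormalDist


def fraction_outside(lab_means, lab_sd, center, half_width):
    """Average probability that one result from each laboratory falls outside the limits."""
    low, high = center - half_width, center + half_width
    total = 0.0
    for mean in lab_means:
        dist = NormalDist(mean, lab_sd)          # one laboratory's day to day scatter
        inside = dist.cdf(high) - dist.cdf(low)  # probability of an accepted result
        total += 1.0 - inside
    return total / len(lab_means)


# Five laboratories with systematic bias, each with s = 2 mg.; limits are 98 +/- 2 s.
share = fraction_outside([90, 94, 98, 102, 106], 2.0, 98.0, 4.0)
print(round(share, 2))
```

Under this assumption the laboratories centered at 90 and 106 fail almost always, those at 94 and 102 fail about half the time, and the one at 98 fails about 4.55% of the time, which averages to roughly 60%.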
UTILIZATION OF SPECIFIC LIMITS FOR PROFICIENCY SURVEYS
Proficiency surveying is a field which has
its own technics, problems, and pitfalls.
Many components of blood, for example,
* A large amount of data to support this thesis
has been collected by the Association of Official
Analytic Chemists, Inc. (AOAC). It is well
summarized in their booklet "Statistical Techniques for Collaborative Tests" by W. J. Youden,
published by the AOAC in 1967, and available from
the Association of Official Analytic Chemists, Inc.,
Box 540, Benjamin Franklin Station, Washington,
D.C. 20044.
are easily preserved in survey samples;
others are not. Utilizing certain materials
which can be surveyed adequately, the
College of American Pathologists Survey
for 1968 provides "state of the art" values
which can be compared with "medical
significance" values. The values are calculated as follows.
1. All participant results for each method
are used to construct a gaussian curve.
Reports outside of x̄ ± 3 s are assumed to
be gross errors and are omitted in the next
step.
2. The remaining values are used to
construct a new gaussian curve from which
x̄, s, and CV are calculated.
3. Participant results falling within ± 2
CV are considered "acceptable" for the
following year's survey. This means that the
outer 4.55% are always considered "not
acceptable."
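The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the College's actual procedure: the function name and the survey results are hypothetical, the sample (n - 1) standard deviation is assumed, and "± 2 CV" is read as the mean plus or minus twice the CV expressed as a percentage of the mean.

```python
import statistics


def survey_limits(results):
    """Two-pass estimate of mean, s, CV (%) and the +/- 2 CV acceptance limits."""
    # Step 1: first gaussian fit; drop reports beyond mean +/- 3 s as gross errors.
    mean1 = statistics.mean(results)
    s1 = statistics.stdev(results)
    kept = [r for r in results if abs(r - mean1) <= 3 * s1]
    # Step 2: recompute mean, s, and CV from the remaining values.
    mean2 = statistics.mean(kept)
    s2 = statistics.stdev(kept)
    cv = s2 / mean2 * 100.0
    # Step 3: acceptable results lie within +/- 2 CV (as a percentage of the mean).
    low = mean2 * (1.0 - 2.0 * cv / 100.0)
    high = mean2 * (1.0 + 2.0 * cv / 100.0)
    return mean2, s2, cv, low, high


# Hypothetical survey: twenty plausible results and one gross error of 200.
reports = [98, 99, 100, 101, 102] * 4 + [200]
mean, s, cv, low, high = survey_limits(reports)
print(mean, round(cv, 2), round(low, 1), round(high, 1))
```

Without the first pass, the single gross error would inflate s and widen the acceptance limits enough to accept results that most participants would consider poor.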
In Table 2 the 1967 CV limits for the
most precise method* are compared with
the "medically significant CV." The last
column indicates what percentage of participant values in addition to the 4.55%
already considered "not acceptable" would
be excluded if medically significant limits
were used. Values of 0% indicate that at
this time laboratories can provide values as
accurate as are necessary medically on a
routine basis. Values above 0% indicate the
extent to which current "state of the art"
routine analyses do not meet the physician's
needs or desires. Unfortunately, this is the
best which 1967 methodology permits. It is
unrealistic to require medically significant
limits where they are not practical. However, these figures point clearly to those
areas where methodology must be improved
as rapidly as possible.
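The text does not say how the last column of Table 2 was computed. One plausible reconstruction, offered only as an assumption, treats participant results as gaussian with the state of the art CV and asks what share falls outside ± 2 medically significant CV, less the 4.55% already excluded. The function name is hypothetical. This reproduces some entries of Table 2 (for example sodium at 150 mEq. per l. and glucose at 100 mg.) but is not guaranteed to match every row.

```python
from statistics import NormalDist


def additional_excluded_percent(medical_cv, state_of_art_cv):
    """Extra share (%) of results rejected if the medically significant CV were used."""
    if medical_cv >= state_of_art_cv:
        return 0.0  # current performance already meets the medical limit
    standard = NormalDist()
    # Medical limits expressed in units of the state of the art standard deviation.
    z = 2.0 * medical_cv / state_of_art_cv
    outside_medical = 2.0 * (1.0 - standard.cdf(z))
    outside_survey = 2.0 * (1.0 - standard.cdf(2.0))  # the usual 4.55%
    return (outside_medical - outside_survey) * 100.0


print(round(additional_excluded_percent(1.3, 2.0), 1))  # sodium, 150 mEq. per l.
print(round(additional_excluded_percent(5.0, 5.3), 1))  # glucose, 100 mg.
```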
When proficiency test samples of the
constituents listed in Table 2 are sent to
laboratories the CV for the value nearest the
"true" value should be used. When Column
3 values exceed Column 2, the larger values
should be used because they reflect the best
performance presently attainable on a routine basis.
* Methods used by only a few laboratories are
omitted.
UTILIZATION OF SPECIFIC LIMITS IN MEDICAL PRACTICE
It is best if the practicing physician knows
the normal values and precision for the
clinical laboratory which he uses most often.
If he does not have these data, or is presented
with values from another laboratory, he
can use Table 2 by observing the following
simple rules.
1. Find the component and the nearest
level in Column 1.
2. Take the corresponding CV in Column
3 and double it. The correct value for any
result will almost certainly lie within plus
or minus the percentage just calculated.
(Example: A uric acid is reported as 7.0 mg.
per 100 ml. We take the 5.8% in Column
3; double it to get 11.6%; the true value
is almost certainly within 7.0 × 0.116 =
0.81 above or below 7.0; i.e., between 6.19
and 7.81.)
3. For substances whose Column 4 value
is 0% or near it, the laboratory accuracy is
adequate for your use. If the Column 4
value is high and the decision vital, repeat
the analysis several times. For example, if
hyperparathyroidism is suspected and a
calcium level of 10.8 is found, at least two
repeat samples should be examined before
the disease is considered to be excluded or
demonstrated as far as the calcium level is
concerned.
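Rule 2 above amounts to a one-line calculation, sketched here with the uric acid example from the text. The function name is hypothetical; the inputs are the reported value and the Column 3 CV from Table 2.

```python
def likely_range(reported_value, column3_cv_percent):
    """Range within which the correct value almost certainly lies (+/- 2 CV)."""
    margin = reported_value * (2.0 * column3_cv_percent / 100.0)
    return reported_value - margin, reported_value + margin


# Uric acid reported as 7.0 mg. per 100 ml.; Column 3 CV is 5.8%.
low, high = likely_range(7.0, 5.8)
print(round(low, 2), round(high, 2))
```

For the uric acid example this gives 6.19 to 7.81, as in the text. Two results from different laboratories whose ranges overlap should not be read as a real change in the patient.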
SUMMARY
The Standards Committee of the College
of American Pathologists presents a statement on medical laboratory accuracy relating medical significance, state of the art
achievements, and proficiency testing of
laboratories. This is intended to be provisional and to serve as a basis for scientific
consideration of the entire problem of
laboratory performance.