THE AMERICAN JOURNAL OF CLINICAL PATHOLOGY
Vol. 50, No. 6
Printed in U.S.A.
Copyright 1968 by The Williams & Wilkins Co.

MEDICAL SIGNIFICANCE OF LABORATORY RESULTS

ROY N. BARNETT, M.D.

Chairman, Subcommittee on Criteria of Medical Usefulness of the College of American Pathologists, Chicago, Illinois 60601

Tests performed in clinical laboratories are among the main diagnostic aids available to physicians. The great demand for these tests and the enormous number performed have led to wide public interest in the manner in which they are carried out. In turn, this interest has led to Federal specifications through legislation for clinical laboratories in two areas. One is the Medicare Act, to assure that Medicare beneficiaries receive proper services by setting standards for acceptable laboratories which may be paid under the act. The other is the Clinical Laboratories Improvement Act of 1967, to regulate the laboratories dealing in interstate commerce. Regulations under both acts set basic standards of operation and require participation in a proficiency testing program to help assure accurate laboratory results.

Fundamental to consideration of proper, accurate test results is the question: how accurate must clinical laboratory work be? There is no simple over-all answer to this question, and any answer that we find will be temporary, depending on the progress of medical knowledge and technology. The Standards Committee* of the College of American Pathologists, which has been the major proficiency surveying body in the United States for a number of years,† has been deeply involved in this question since the inception of its surveys in 1949. These surveys lead to two questions:

1. How well are laboratories performing certain common analyses?

2. Is this level of performance adequate to meet the demands of good medical practice?

Received May 22, 1968. Requests for reprints should be sent to: College of American Pathologists, 230 N. Michigan Avenue, Chicago, Ill. 60601.

* The Committee is also concerned with certification of certain medically essential standard materials (cyanmethemoglobin, bilirubin), provision of aqueous standards to laboratories, and evaluation of certain laboratory products.

† The number of laboratories participating in these surveys in 1967 was: comprehensive survey, 1500; basic survey, 1900 (total 3400). In 1968 it was: comprehensive survey, 2200; basic survey, 1700 (total 3900).

The present report is an examination of this problem by the Subcommittee on Criteria of Medical Usefulness (of which the author is Chairman), appointed in October 1967 by Dr. O. B. Hunter, President of the College of American Pathologists, and by Dr. Russell Eilers, Chairman of the Standards Committee. The report has been prepared after consultation with various interested pathologists, other physicians, and other medical laboratory scientists. It is a provisional report, intended to be the basis for further discussion with all interested groups, and intended to be changed as science progresses.

DEFINITION OF TERMS

A glossary of pertinent terms is introduced at this point so that there will be no misunderstandings later.

1. Accuracy is closeness to the true value. It must be admitted at once that we do not "know" the true value of substances in biologic material. Frequently different methods give different results.

2. Precision is the closeness with which repeat analyses of the same material can be made. We may define precision by the following terms:

A. Standard deviation. If the analytic results fall in a gaussian distribution about a mean (x̄) we can calculate a value on either side of the mean known as the standard deviation (s).
The percentage of values included within various multiples of the standard deviation is known. The value which we use most often in this discussion is the mean ± 2 s, encompassing 95.45% of all of the results.

B. Coefficient of variation. Often the standard deviation can be usefully expressed as a percentage of the mean rather than as an absolute value. This is known as the coefficient of variation (CV) and is derived by the formula

CV = (standard deviation / mean value) × 100

C. Percentiles. These are the cumulative percentages of numerical observations, usually arranged in ascending order. If it is felt that numerical observations do not fit a gaussian curve one can actually calculate a range to exclude very low values (below 2.5 percentile) and very high values (above 97.5 percentile) and thereby include a 95% range comparable to x̄ ± 2 s.

3. "State of the art." This is the current evaluation of the accuracy and precision of laboratory analyses. In this report we derive these values from the 3400 laboratories which participated in the 1967 voluntary survey programs of the College of American Pathologists, and which forwarded their results to the Standards Committee for statistical analysis. The Committee recognizes that survey specimens may be handled more carefully than routine specimens, particularly when surveys are required to meet regulatory requirements. On the other hand, survey samples are unfamiliar to many laboratory workers and may be handled incorrectly because they differ from routine samples. Sometimes survey samples themselves are inaccurate. With all of these reservations we still believe that such voluntary survey data offer a sound indication of routine laboratory performance.

4. Medically significant (medically useful). Medically significant limits of accuracy encompass those values which are of maximal use in patient care. The many factors which must be considered in setting specific limits are taken up in later sections.

5. Normal range.
Limits for the "normal" population are commonly expressed as those values which include 95% of persons not known to have an illness affecting the component under consideration. The 95% range may be calculated from a gaussian or percentile distribution.

6. Decision level. This is the dividing point at which medical decisions are commonly made concerning the presence or absence of a disease state or the necessity for treatment. For example, serum potassium provides two decision levels. One is at 3.0 mEq. per l.; values of this level or lower would be commonly accepted as indicating hypokalemia with a need for prescribing potassium supplements. The other is at 6.0 mEq. per l.; values of this level or higher would be generally accepted as indicating hyperkalemia and suggesting a need for treatment.

GUIDELINES TO FOLLOW IN ESTABLISHING LIMITS FOR REPORTING VALUES OF MEDICAL SIGNIFICANCE

A. Desirable limits for accuracy and precision must be defined individually for each type of analysis performed.

Comment. It would be most convenient if a single set of limits could be applied to every type of analysis. This is impossible. Some clinical laboratory analyses are reported quantitatively, and others as positive or negative; others require value judgments as to exact identification of lesions, cells, or organisms. Even within the primarily quantitative disciplines such as clinical chemistry and hematology, significant limits differ for different substances.

1. Some differences are technical in nature, reflecting the much greater precision of certain analytic technics. The thinking of attending physicians over the years reflects their observation of this fact; they draw conclusions from small changes in values for some tests and not for others. When better methods are introduced, clinicians utilize the greater precision by appropriate changes in their interpretation of results.

2. Other differences are physiologic.
For calcium, a substance which is under close homeostatic control, a very precise technic is most desirable. For glucose, a substance whose blood level varies widely depending on ingestion of food, emotion, time of day, and other factors, such great precision of analysis is not helpful in the interpretation of test results.

B. Desirable limits for accuracy and precision must be defined at each level of medical significance. Maximal accuracy and precision are necessary at decision levels.

Comment. For example, the bilirubin determination decision levels are at about 1.2 mg. per 100 ml., separating normal from hyperbilirubinemic individuals, and at about 20.0 mg. per 100 ml., the critical level for embarking on exchange transfusion of erythroblastotic infants. At these levels physicians require the greatest accuracy because vital decisions depend on the results. On the other hand, at such intermediate levels as 9.0 mg. per 100 ml. no change in diagnosis or treatment would follow a relatively large change in the reported result.

Another example is in glucose determination. A 2-hr. postprandial plasma glucose of 120 mg. per 100 ml. is accepted as normal; a level of 130 mg. per 100 ml. leads to further consideration of possible diabetes, so that 120 mg. represents a decision level. A much larger difference between 200 mg. per 100 ml. and 250 mg. per 100 ml. would alter neither diagnosis nor treatment.

C. Accuracy and precision of a degree greater than is useful clinically should not be required if extra time or expense is thereby made necessary.

Comment. Erythrocyte counts done in a single chamber are not accurate enough to be clinically useful. If four chambers are counted and the values averaged, a useful but expensive result is achieved. As technology improved, automatic counting devices appeared and the results became both cheap and clinically useful.

Another example is identification of Salmonella.
Knowledge that a stool culture contains a Salmonella grouped by group serum and biochemical reactions is medically vital information. Further complete identification of the organism by antigen analysis is not necessary for patient care, despite the epidemiologic information provided; it is also prohibitively expensive in hospital practice. Antigen analysis therefore should not be obligatory for ordinary medical care facilities.

D. Desirable accuracy should be such that the method will create no substantial divergence from generally accepted values for normal and disease states.

Comment. For many nonenzyme constituents of body fluids physicians have learned normal ranges. It is not proper to adopt a new method yielding different ranges unless there are substantial advantages in accuracy, precision, ease, rapidity of performance, or freedom from random error. If a truly advantageous method is developed and introduced, a thorough explanation must be made to clinicians. This was done, for example, when "true" glucose methods replaced Folin-Wu technics. Pressure to change to new technics yielding different normals should not be applied unless medical benefits are clearly promoted thereby.

E. Desirable precision should be such that errors induced by the measurement process do not significantly widen the range of values for the normal population.

Comment. This objective is achieved by methods whose standard deviation does not exceed one-twelfth to one-twentieth of the normal population range defined as including 95% of normal persons. The "normal" range is a composite of true differences between individuals and of differences introduced by the technical methods. If the standard deviation of the method is one-twelfth of the population range it will cause the apparent range to be 5.4% larger than the true range; if it is one-twentieth of the population range it will cause an enlargement of 2.0%.
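
The 5.4% and 2.0% figures quoted above follow if the 95% normal range is taken to be four standard deviations wide (mean ± 2 s) and if biologic and analytic variation are assumed to be independent and gaussian, so that their variances add. The report does not state its derivation; the Python sketch below is one reading under those assumptions, and the function name is illustrative only.

```python
import math

def range_enlargement_pct(fraction_of_range):
    """Per cent widening of the apparent 95% range caused by analytic error.

    fraction_of_range: analytic s expressed as a fraction of the true
    95% population range (for example 1/12).
    Assumptions (not stated in the report): the true range spans four
    biologic standard deviations, and biologic and analytic variances add.
    """
    # Analytic s in units of the biologic standard deviation.
    analytic_s = 4.0 * fraction_of_range
    # Combined standard deviation relative to the biologic one.
    apparent_s = math.sqrt(1.0 + analytic_s ** 2)
    return (apparent_s - 1.0) * 100.0

for divisor in (12, 20):
    pct = range_enlargement_pct(1.0 / divisor)
    print(f"s = range/{divisor}: apparent range {pct:.1f}% wider")
# s = range/12: apparent range 5.4% wider
# s = range/20: apparent range 2.0% wider
```

Both printed figures agree with the values given in the comment to guideline E.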
This particular criterion is less reliable than the others noted because our present knowledge of normal ranges is inadequate and uncertain. If the normal range were compiled for a group uniform as to sex, age, ethnic group, and geographic location, it would be narrower than the usual range for all adults. Goals for precision in this category would therefore differ, depending on the population chosen.

F. Ability to distinguish normal from abnormal values is often more important than the determination of absolute values.

Comment. For some substances in body fluids there are many analytic methods yielding widely disparate numerical results. Enzyme analyses fall into this category. Even laboratories allegedly using identical methods rarely achieve identical results, yet most of them distinguish adequately between normal and abnormal values; this is the information which the attending physician needs.

G. An approximate result available promptly may be much more useful than an exact result reported after a long delay.

Comment. Two examples will illustrate this point clearly.

1. In an unconscious diabetic patient an immediate report that the blood glucose is very low is an invaluable guide to prompt treatment and may be lifesaving. Conversely, a precise report that the glucose level is 20.3 mg. per 100 ml. is useless if it is not available until 24 hr. later, when the patient is dead.

2. A Gram stain of purulent spinal fluid correctly and immediately reported as demonstrating Gram-positive lanceolate diplococci is vital information for initiating treatment. If the full report of Diplococcus pneumoniae Type 18 is delayed for 2 days it is useless.

H. An approximate result available locally under usual laboratory conditions may be more useful medically than a more accurate value available only at a distant center.

Comment. Here the medical use to which the result is to be put is crucial.
Again blood glucose is an example; a crude method which can be performed with locally available personnel and equipment is necessary to save lives. On the other hand, a crude protein-bound iodine method would never be justified because delay in reporting of mailed out specimens will not harm the patient.

I. A less precise analytic technic free of large errors may be preferable to a more precise method subject to large random errors.

This aspect of analytic technics has not received adequate attention. It is particularly important for tests in which a sudden large shift of values or a single abnormal result may lead to immediate therapeutic or diagnostic decisions. Some technical factors which lead to large random errors are: complex or difficult instrument manipulations, too many steps in the procedure, and intricate calculations of results.

SPECIFIC LIMITS FOR MEDICAL SIGNIFICANCE

Table 1 is a synthesis of opinions by clinicians and laboratory specialists. It lists 16 commonly tested blood constituents (Column 1) at 23 decision levels (Column 2). Column 3 gives the appropriate standard deviation at the corresponding decision level and represents what would be clinically expected for ordinary use, that is, that 95% of analytic values would be within ± 2 s of the true value. Column 4 is the coefficient of variation calculated from Columns 2 and 3. Column 5 is a list of "low" values below which accuracy is unnecessary; a report that the concentration is this value or lower is adequate for medical purposes.

TABLE 1
MEDICAL SIGNIFICANCE VALUES

Component             Decision Level*   s at Same Level†   Calculated CV (%)   Low Level*
Hemoglobin            10.5 Gm.          0.5                4.76                5 Gm.
Hematocrit            32%               1.0                3.12                10%
Glucose               50 mg.            5.0                10.00               20 mg.
Glucose               100 mg.           5.0                5.00
Glucose               120 mg.           5.0                4.17
Blood urea nitrogen   27 mg.            2.0                7.41                4. mg.
Uric acid             6.0 mg.           0.5                8.33                4 mg.
Total protein         7.0 Gm.           0.3                4.28                2 Gm.
Albumin               3.5 Gm.           0.25               7.14                1.5 Gm.
Globulin              3.5 Gm.           0.25               7.14                1.5 Gm.
Cholesterol           250 mg.           20.0               8.00                80 mg.
Bilirubin             1.0 mg.           0.2                20.00               0.4 mg.
Bilirubin             20.0 mg.          1.5                7.50
Calcium               11.0 mg.          0.25               2.27                5.0 mg.
Phosphorus            4.5 mg.           0.25               5.56                1.5 mg.
Sodium                130 mEq./l.       2.0                1.54                100 mEq./l.
Sodium                150 mEq./l.       2.0                1.33
Potassium             3 mEq./l.         0.25               8.33                1.5 mEq./l.
Potassium             6 mEq./l.         0.25               4.17
Chloride              90 mEq./l.        2.0                2.22                50 mEq./l.
Chloride              110 mEq./l.       2.0                1.82
CO2                   20 mEq./l.        1.0                5.00                8 mEq./l.
CO2                   30 mEq./l.        1.0                3.33

* All values per 100 ml. unless indicated.
† Same units as corresponding decision level.

TABLE 2
COMPARISON OF 1967 CV LIMITS FOR MOST PRECISE METHOD WITH "MEDICALLY SIGNIFICANT CV"

1. Component and Level        2. Medically        3. State of the     4. Per Cent of Participant
                              Significant CV (%)  Art CV (%)          Values Excluded (3 vs. 2)
Hemoglobin, 10.5 Gm.          4.8                 3.5                 0
Glucose, 100 mg.              5.0                 5.3                 1.4
Glucose, 120 mg.              4.2                 5.2                 6.1
Blood urea nitrogen, 27 mg.   7.4                 8.3                 3.0
Uric acid, 6.0 mg.            8.3                 5.8                 0
Total protein, 7.0 Gm.        4.3                 3.9                 0
Albumin, 3.5 Gm.              7.1                 8.8                 6.2
Globulin, 3.5 Gm.             7.1                 8.8                 6.2
Cholesterol, 250 mg.          8.0                 9.1                 3.3
Bilirubin, 1.0 mg.            20.0                23.3                4.0
Bilirubin, 20.0 mg.           7.5                 12.8                19.7
Calcium, 11.0 mg.             2.3                 2.8                 5.7
Phosphorus, 4.5 mg.           5.6                 8.4                 13.7
Sodium, 130 mEq./l.           1.5                 1.8                 4.9
Sodium, 150 mEq./l.           1.3                 2.0                 14.8
Potassium, 3 mEq./l.          8.3                 3.7                 0
Potassium, 6 mEq./l.          4.2                 3.3                 0
Chloride, 90 mEq./l.          2.2                 2.1                 0
Chloride, 110 mEq./l.         1.8                 2.1                 4.2

Note. For CAP proficiency surveys, values outside ± 2 CV calculated from Column 3 are considered to be not acceptable, thus excluding 4.55% of all results. If Column 2 values were to be used to evaluate survey performance, an additional percentage of participant values as indicated in Column 4 would be considered not acceptable.
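
The calculated CV of Table 1 and the last column of Table 2 can be checked numerically. The sketch below is an illustration and not the Committee's stated procedure: it assumes that participant results are gaussian with the state of the art CV, so that limits of ± 2 medically significant CV correspond to a z value of 2 × (Column 2 / Column 3), and that Column 4 is the two-tailed percentage beyond that z minus the 4.55% already outside ± 2 s. The function names are hypothetical.

```python
import math

def cv_percent(s, level):
    """Coefficient of variation: standard deviation as a per cent of the level."""
    return s / level * 100.0

def two_tailed_pct(z):
    """Per cent of a gaussian distribution lying outside +/- z standard deviations."""
    return math.erfc(z / math.sqrt(2.0)) * 100.0

def additional_excluded_pct(medical_cv, state_of_art_cv):
    """Extra per cent of results rejected if medical limits replaced survey limits.

    Assumes gaussian results whose spread equals the state of the art CV.
    Returns 0 when laboratories already meet the medical requirement.
    """
    if state_of_art_cv <= medical_cv:
        return 0.0
    z = 2.0 * medical_cv / state_of_art_cv
    return two_tailed_pct(z) - two_tailed_pct(2.0)

print(f"{cv_percent(0.5, 10.5):.2f}")              # hemoglobin, Table 1: 4.76
print(f"{additional_excluded_pct(4.8, 3.5):.1f}")  # hemoglobin: 0.0
print(f"{additional_excluded_pct(5.0, 5.3):.1f}")  # glucose, 100 mg.: 1.4
print(f"{additional_excluded_pct(1.3, 2.0):.1f}")  # sodium, 150 mEq./l.: 14.8
```

Under these assumptions the other rows of Table 2 are reproduced to within a few tenths of a per cent, the remaining differences being attributable to rounding of the tabulated CV values.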
INTRALABORATORY AND INTERLABORATORY EVALUATION

Precision within a single laboratory is inevitably superior to that between laboratories, no matter how excellent their performance, because interlaboratory differences result from systematic bias.* For example, let us assume that five competent laboratories analyze a sample of serum for glucose, each with a day-to-day precision of 2 mg. for 1 s. However, the mean values are 90, 94, 98, 102, and 106 mg. per 100 ml., respectively. A physician who used any one of these laboratories routinely would be able to use the normal and abnormal results readily. However, in a survey, if the mean value were 98, and only values between 94 and 102 were thereby accepted as satisfactory by using the precision of a single laboratory, 60% of the values for these five laboratories would be outside the 2-s range of the mean. It is necessary, therefore, to use a system based on results of all participants to incorporate both interlaboratory and intralaboratory variability into proficiency evaluation.

* A large amount of data to support this thesis has been collected by the Association of Official Analytic Chemists, Inc. (AOAC). It is well summarized in their booklet "Statistical Techniques for Collaborative Tests" by W. J. Youden, published by the AOAC in 1967, and available from the Association of Official Analytic Chemists, Inc., Box 540, Benjamin Franklin Station, Washington, D. C. 20044.

UTILIZATION OF SPECIFIC PROFICIENCY LIMITS FOR SURVEYS

Proficiency surveying is a field which has its own technics, problems, and pitfalls. Many components of blood, for example, are easily preserved in survey samples; others are not. Utilizing certain materials which can be surveyed adequately, the College of American Pathologists Survey for 1968 provides "state of the art" values which can be compared with "medical significance" values. The values are calculated as follows.

1.
All participant results for each method are used to construct a gaussian curve. Reports outside of x̄ ± 3 s are assumed to be gross errors and are omitted in the next step.

2. The remaining values are used to construct a new gaussian curve from which x̄, s, and CV are calculated.

3. Participant results falling within ± 2 CV are considered "acceptable" for the following year's survey. This means that the outer 4.55% are always considered "not acceptable."

In Table 2 the 1967 CV limits for the most precise method* are compared with the "medically significant CV." The last column indicates what percentage of participant values in addition to the 4.55% already considered "not acceptable" would be excluded if medically significant limits were used. Values of 0% indicate that at this time laboratories can provide values as accurate as are necessary medically on a routine basis. Values above 0% indicate the extent to which current "state of the art" routine analyses do not meet the physician's needs or desires. Unfortunately, this is the best which 1967 methodology permits. It is unrealistic to require medically significant limits where they are not practical. However, these figures point clearly to those areas where methodology must be improved as rapidly as possible.

* Methods used by only a few laboratories are omitted.

When proficiency test samples of the constituents listed in Table 2 are sent to laboratories the CV for the value nearest the "true" value should be used. When Column 3 values exceed Column 2, the larger values should be used because they reflect the best performance presently attainable on a routine basis.

UTILIZATION OF SPECIFIC LIMITS IN MEDICAL PRACTICE

It is best if the practicing physician knows the normal values and precision for the clinical laboratory which he uses most often.
If he does not have these data, or is presented with values from another laboratory, he can use Table 2 by observing the following simple rules.

1. Find the component and the nearest level in Column 1.

2. Take the corresponding CV in Column 3 and double it. The correct value for any result will almost certainly lie within plus or minus the percentage just calculated. (Example: A uric acid is reported as 7.0 mg. per 100 ml. We take the 5.8% in Column 3 and double it to get 11.6%; the correct value is almost certainly within 7.0 × 0.116 = 0.81 above or below 7.0; i.e., between 6.19 and 7.81.)

3. For substances whose Column 4 value is 0% or near it, the laboratory accuracy is adequate for your use. If the Column 4 value is high and the decision vital, repeat the analysis several times. For example, if hyperparathyroidism is suspected and a calcium level of 10.8 is found, at least two repeat samples should be examined before the disease is considered to be excluded or demonstrated as far as the calcium level is concerned.

SUMMARY

The Standards Committee of the College of American Pathologists presents a statement on medical laboratory accuracy relating medical significance, state of the art achievements, and proficiency testing of laboratories. This is intended to be provisional and to serve as a basis for scientific consideration of the entire problem of laboratory performance.
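
The second rule given above for the practicing physician (double the Column 3 CV and apply it as a percentage of the reported value) can be written as a short calculation. This Python sketch simply restates the uric acid example from the text; the function name is illustrative and not part of the report.

```python
def likely_range(reported_value, state_of_art_cv_pct):
    """Limits within which the correct value almost certainly lies.

    Applies plus or minus twice the state of the art CV (Table 2, Column 3),
    expressed as a percentage of the reported value.
    """
    half_width = reported_value * (2.0 * state_of_art_cv_pct / 100.0)
    return reported_value - half_width, reported_value + half_width

# Uric acid reported as 7.0 mg. per 100 ml.; Column 3 CV is 5.8%.
low, high = likely_range(7.0, 5.8)
print(f"{low:.2f} to {high:.2f}")  # 6.19 to 7.81
```

The same function applies to any component in Table 2 once the nearest level and its Column 3 CV have been found.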