Special Report

Need for Improved Instrument and Kit Evaluations

JAMES R. HACKNEY, M.D. AND GEORGE S. CEMBROWSKI, M.D., PH.D.

Department of Pathology and Laboratory Medicine, Ochsner Clinic and Ochsner Foundation Hospital, New Orleans, Louisiana, and the Department of Pathology and Laboratory Medicine, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania

Received September 30, 1985; received revised manuscript and accepted for publication March 10, 1986. Address reprint requests to Dr. Cembrowski: Department of Pathology and Laboratory Medicine, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104.

A review of method comparison studies published in the American Journal of Clinical Pathology indicates that hematology evaluations are less rigorous than their chemistry counterparts and rely heavily on the correlation coefficient. While clinical chemistry evaluations depend more on linear regression, they tend to omit relevant, lesser-known statistics such as the standard error of the estimate. The authors reiterate guidelines for the collection and statistical analysis of method comparison data and recommend that both hematology and chemistry evaluations be improved. (Key words: Instrument evaluation; Method evaluation; Accuracy; Precision; Allowable error) Am J Clin Pathol 1986; 86:391-393

Recently, Fraser and Singer pointed out that many method comparison studies published in the clinical chemistry literature either omit useful statistics or rely too heavily on weak or inappropriate statistical tests.8 Because a cursory inspection of method evaluation studies in the clinical pathology literature appeared to demonstrate these same limitations, we reviewed the articles published in a leading clinical pathology journal, the American Journal of Clinical Pathology. We studied all the evaluations of instruments and commercial quantitative assays published in the American Journal of Clinical Pathology from January 1984 to June 1985 inclusive (volumes 81-83) and tabulated the statistical tests employed by both clinical chemistry and hematology/coagulation evaluations.

Methods and Results

Because there are no well-accepted guidelines for the evaluation of qualitative and semiquantitative assays, we eliminated such studies from further consideration. Of the 13 articles that evaluated quantitative assays, 6 were in the general category of clinical chemistry and 7 in the category of hematology/coagulation. The statistics used by each of the articles were tabulated and are shown in Table 1.

Table 1 indicates that clinical chemistry evaluations are more likely to use linear regression and report the slope and y-intercept (four of six vs. two of seven). By contrast, hematology and coagulation evaluations tend to present only a scatter graph without regression analysis (five of seven) and rely heavily on the correlation coefficient (four of seven). While the clinical chemistry studies appeared to be well planned and better documented, several useful statistics were omitted. The standard error of the estimate (also known as the standard deviation of the regression line, Sy,x) was reported in only one study. The standard error of the slope and the standard error of the y-intercept were not reported in any of the articles. Only one study determined the total error of the test method and compared it with the allowable error at a medical decision level.9
Discussion

The recent proliferation of clinical chemistry evaluation protocols, plus the requirements of journals such as Clinical Chemistry for specific method evaluation statistics,2-4,20 have contributed to the increasing sensitivity of clinical chemists and chemical pathologists to the rigorous demands of method comparison studies. Our study confirms the findings of Fraser and Singer that clinical chemistry evaluations still omit important statistics such as the standard error of the estimate and the standard deviations of the slope and intercept. In fact, estimates of within-run and between-run imprecision were provided for only one of the evaluations. There is little attempt to estimate total analytic error and compare it with the medically allowable error. Only one article, a calcium evaluation, evaluated the total error and compared it with the medically allowable error.9

In contrast to the clinical chemistry evaluations, the evaluations of quantitative hematology and coagulation methods are generally less rigorous, with approximately one-half employing the correlation coefficient as the primary criterion for test acceptability. While few of the evaluations present imprecision data, we would like to believe that within-run and between-run imprecision studies were performed but not documented. The determination of inaccuracy by regression analysis is sporadic.

With the onset of the prospective payment system, the clinical laboratory has become a cost center. The laboratory cannot afford to produce data of uncertain analytic and clinical significance. Published method evaluations should be rigorous and provide estimates of imprecision, inaccuracy, and total analytic error. Such evaluations will make it easier for laboratory personnel to avoid the evaluation and installation of instruments and methods with borderline performance.

Table 1. The Use of Statistics in Recent Method Comparison Studies Published in the American Journal of Clinical Pathology

Statistics                               Clinical Chemistry   Hematology/Coagulation
Number of studies                        6                    7
Within-day imprecision                   3/6                  2/7
Between-day imprecision                  3/6                  1/7
Scatter plot                             4/6                  7/7
Slope of regression line                 4/6                  2/7
Standard deviation of the slope          0/6                  0/7
y-Intercept                              3/6                  3/7
Standard deviation of the y-intercept    0/6                  0/7
Standard error of the estimate           1/6                  0/7
Correlation coefficient                  5/6                  6/7
Bias                                     4/6                  2/7
Bias (evaluated with t-test)             2/6                  1/7
Total error at medical decision level    1/6                  0/7

Table 2. Recommended Studies and Pertinent Statistics to Be Used in Method Comparison Studies

Recommended Studies                      Comments
Linearity                                Determines the analytic range and should encompass the medical decision level(s)
Recovery                                 Estimate of proportional error
Interference                             Estimate of error due to common interfering substances
Imprecision                              Estimate of random error
  Within-day imprecision                 Ten replicates of 5-10 samples
  Between-day imprecision                Conducted over a 20-day period
Paired sample comparison                 At least 100 patient samples
  Scatter plot
  Linear regression analysis
    Slope                                Proportional error
    y-Intercept                          Constant error
    Standard error of the estimate       Intermethod random error
    Correlation coefficient              Sensitive to range of analyte concentrations; determines type of regression analysis to be used
  t-Test                                 Sensitive to constant and proportional error
  Estimate of total error                Compare with "medically allowable error"
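To make the imprecision entries of Table 2 concrete, the following is a minimal sketch, in Python (a language the original article does not use), of the within-run and between-day summary statistics. The replicate and daily values, and the helper name imprecision, are hypothetical; in practice the data would come from ten replicate analyses of each sample and from control material analyzed over at least 20 separate days.

```python
# Sketch of the imprecision statistics recommended in Table 2.
# All numerical values below are hypothetical examples.
from statistics import mean, stdev

def imprecision(values):
    """Return the mean, standard deviation, and coefficient of variation (%)."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return m, s, 100.0 * s / m

# Within-run: ten replicates of one sample analyzed in a single run.
within_run = [9.8, 10.1, 9.9, 10.0, 10.2, 9.7, 10.0, 10.1, 9.9, 10.0]

# Between-day: a control material analyzed on 20 separate days.
between_day = [10.0, 10.3, 9.8, 10.1, 9.9, 10.4, 10.0, 9.7, 10.2, 10.1,
               9.9, 10.0, 10.3, 9.8, 10.1, 10.2, 9.9, 10.0, 10.1, 9.8]

for label, data in (("within-run", within_run), ("between-day", between_day)):
    m, s, cv = imprecision(data)
    print(f"{label}: mean = {m:.2f}, SD = {s:.3f}, CV = {cv:.1f}%")
```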
Guidelines for Method Comparison Studies

Based on accepted recommendations for method comparison studies13,14,20 and tempered by our experience, we offer the following brief method comparison protocol, which is summarized in Table 2.

A linearity study should always be done to confirm the analytic range (the range over which the method is applicable without modification). The calibration curve should be accurate and reproducible at the medical decision levels. Recovery and interference studies require some effort and should have been performed by the manufacturer before marketing. Nevertheless, selected experiments should be done to rule out proportional error and susceptibility to common interfering substances such as bilirubin.

Assessment of imprecision requires measurement of both short-term (within-run or within-day) and long-term (between-day) imprecision. The short-term imprecision studies should be carried out first. For these studies, we recommend that ten replicate analyses be performed on five to ten varied samples, chosen to represent different levels of medical interest, using both patient and control material. The between-day imprecision studies may be carried out at the same time as the comparison of methods experiment. The between-day imprecision studies should extend over at least one week in the absence of stable control material, but preferably over 20 separate days, using two to three levels of control material. We recommend four measurements per day for 20 days. Using more observations per day or fewer than 20 days will underestimate the between-day imprecision.10 If the between-day imprecision study must be conducted over a short period of time, analysis of variance (ANOVA) may be needed to estimate the between-day imprecision. In any event, short- and long-term standard deviations should be calculated for each method.

Ideally, the new test method should be compared with a definitive or reference method. When such a method is unavailable or is too labor intensive, the new method should at least be compared with the one currently in use. Comparison of methods experiments that are to be published should include at least 100 patient samples selected so that they reflect the entire range of medical interest.14 These patient samples should be analyzed five per day over at least 20 days. The data should be plotted daily so that outliers can be identified, investigated, and, if indicated, eliminated from the final computation. It may be useful to analyze the samples in duplicate to aid in the identification of outliers. The duplicate values can be used later if an alternate method of linear regression analysis is necessary, such as the method of Deming.5,17,18

Linear regression should be used to analyze the paired sample data because it identifies both proportional error (the slope of the regression line) and constant error (the y-intercept). Classical linear regression is usually employed; it minimizes the distance between the paired data points and the regression line only in the y (vertical) direction.
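The regression statistics recommended in this protocol (slope, y-intercept, standard error of the estimate, standard errors of the slope and intercept, and the correlation coefficient) can be computed directly from the paired results. The following is a minimal sketch under classical least squares assumptions; the function regression_stats and the paired values are illustrative, not part of the original protocol.

```python
# Sketch of classical (ordinary least squares) regression statistics for
# a method comparison experiment: x = comparative method, y = test method.
from math import sqrt

def regression_stats(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx                  # estimate of proportional error
    intercept = y_bar - slope * x_bar  # estimate of constant error
    residual_ss = sum((yi - (intercept + slope * xi)) ** 2
                      for xi, yi in zip(x, y))
    sy_x = sqrt(residual_ss / (n - 2))          # standard error of the estimate
    se_slope = sy_x / sqrt(sxx)                 # standard error of the slope
    se_intercept = sy_x * sqrt(1.0 / n + x_bar ** 2 / sxx)
    r = sxy / sqrt(sxx * syy)                   # correlation coefficient
    return slope, intercept, sy_x, se_slope, se_intercept, r

# Hypothetical paired results; a real study would use at least 100 samples.
x = [2.1, 3.4, 4.8, 5.5, 6.9, 8.2, 9.7, 11.0, 12.4, 13.8]
y = [2.3, 3.3, 5.0, 5.4, 7.1, 8.0, 9.9, 11.3, 12.2, 14.1]

slope, intercept, sy_x, se_slope, se_int, r = regression_stats(x, y)
print(f"slope = {slope:.3f} +/- {se_slope:.3f}")
print(f"intercept = {intercept:.3f} +/- {se_int:.3f}")
print(f"Sy,x = {sy_x:.3f}, r = {r:.4f}")
```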
Alternate forms of regression, such as the technic of Deming, are required if the range of values is narrow (for example, serum sodium levels and mean corpuscular volumes) or if the comparative method is imprecise.5,17,18 Deming's technic minimizes the distances between the paired data points and the regression line in a direction perpendicular to the regression line and makes no assumptions about the relative imprecisions of the test and comparative methods.

Since the correlation coefficient (r) is extremely sensitive to random error, outliers, and the range of concentration of the analyte,21 it is not an appropriate estimate of the agreement between two methods. The correlation coefficient's most useful role in method comparison studies is in indicating the need for an alternate method of linear regression. The heavy reliance on the correlation coefficient as the criterion for accepting or rejecting a test method is disconcerting because it is not sensitive to constant or proportional error.

The standard error of the estimate, the standard deviation of the points around the regression line, provides a measure of the intermethod imprecision. The standard error of the slope of the regression line and the standard error of the y-intercept, together with their confidence intervals, can be used to evaluate the magnitude of the differences of the slope and the y-intercept from their ideal values of 1 and 0, respectively.

Linear regression assumes that the variance of the observed test values (the y's) is constant throughout the range of the comparative values (the x's). This assumption can be verified by visual inspection of the scatter plot of the data. If this assumption of stable imprecision of the test method proves false, weighted regression or transformation of the data may be required for the analysis. In practice, however, the robust nature of linear regression usually ensures adequate results for the evaluation of most clinical laboratory methods.

The importance of the paired t-test in analyzing method comparison data is overemphasized. While the elements of the paired t-test (the bias and the standard deviation of the differences) offer measures of systematic and random error, these estimates tend to be valid only near the means of the data sets. In addition, the t-test may indicate a statistically significant bias in situations where the difference may not be clinically significant, especially when N is large.

The ultimate criterion for instrument or method selection should be whether the estimate of total analytic error exceeds the medically allowable error. The total analytic error is derived from the sum of the estimates of imprecision and inaccuracy.18,20 Estimates of medically allowable error for clinical chemistry tests have been suggested by Barnett, Elion-Gerritzen, and others1,6,18 and have been neatly summarized by Fraser.7 With the exception of hemoglobin, there are no accepted estimates of medically allowable error for hematology and coagulation tests. Westgard has provided a conceptually simple approach for comparing the total error with the medically allowable error.19,20
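One way to illustrate this comparison is the following minimal sketch, which combines an estimate of bias with a multiple of the long-term imprecision and checks the sum against an allowable-error limit, in the spirit of the Westgard criterion.19,20 The numerical values, the 1.96 multiplier, and the total_error helper are illustrative assumptions, not figures taken from the studies reviewed here.

```python
# Sketch of comparing estimated total analytic error with a medically
# allowable error at a decision level.  All values are hypothetical.

def total_error(bias, sd, z=1.96):
    """Estimate total analytic error as |bias| + z * SD."""
    return abs(bias) + z * sd

# Hypothetical performance of a calcium method at a decision level.
bias = 0.10       # inaccuracy (systematic error), mg/dL
sd = 0.15         # long-term imprecision, mg/dL
allowable = 0.50  # assumed medically allowable error, mg/dL

te = total_error(bias, sd)
verdict = "acceptable" if te < allowable else "unacceptable"
print(f"total error = {te:.2f} mg/dL vs. allowable {allowable:.2f} mg/dL -> {verdict}")
```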
References

1. Barnett RN: Medical significance of laboratory results. Am J Clin Pathol 1968; 50:671-676
2. Broughton PMG, Bergonzi C, Lindstedt G, et al: Guideline for kit evaluation, second draft. European Committee for Clinical Laboratory Standards 1985; 5(1):1-46
3. Carey RN, Garber CC: Evaluation of methods, Clinical chemistry: Theory, analysis, and correlation. Edited by LA Kaplan, A Pesce. St. Louis, CV Mosby, 1984, pp 338-359
4. Cembrowski GC, Sullivan A: Method selection and evaluation, Clinical chemistry: Principles, procedures, correlations. Edited by ML Bishop. Philadelphia, JB Lippincott, 1985, pp 70-73
5. Cornbleet PJ, Gochman N: Incorrect least squares regression coefficients in method comparison analysis. Clin Chem 1979; 25:432-438
6. Elion-Gerritzen WE: Analytical precision in clinical chemistry and medical decisions. Am J Clin Pathol 1980; 73:183-195
7. Fraser CG: Goals for clinical biochemistry analytical imprecision: A graphic approach. Pathology 1980; 12:209-218
8. Fraser CG, Singer R: Better laboratory evaluations of instruments and kits are required. Clin Chem 1985; 31:667-670
9. Hartmann AE, Lewis LL: Evaluation of the ASTRA® o-cresolphthalein complexone method. Am J Clin Pathol 1984; 82:182-187
10. Krouwer JS, Rabinowitz R: How to improve estimates of imprecision. Clin Chem 1984; 30:290-292
11. Lloyd PH: A scheme for the evaluation of diagnostic kits. Ann Clin Biochem 1978; 15:136-145
12. Logan JE: Revised (IFCC) recommendations (1983) on evaluation of diagnostic kits. Part 2. Guidelines for the evaluation of clinical chemistry kits. J Clin Chem Biochem 1983; 21:899-902
13. National Committee for Clinical Laboratory Standards: NCCLS Tentative Guideline EP5-T. User evaluation of precision performance of clinical chemistry devices. Villanova, PA, NCCLS 1984; 4(8):185-193
14. National Committee for Clinical Laboratory Standards: NCCLS document EP4-T, Tentative guidelines for manufacturers for establishing performance claims for clinical chemistry. Comparison of methods experiment. Villanova, PA, NCCLS 1982; 2(21):675-730
15. Percy-Robb IW, Broughton PMG, Jennings RD, et al: A recommended scheme for the evaluation of kits in the clinical chemistry laboratory. Ann Clin Biochem 1980; 17:217-226
16. Peters T, Westgard JO: Evaluation of methods, Textbook of clinical chemistry. Edited by N Tietz. Philadelphia, WB Saunders, 1985, pp 410-423
17. Wakkers PJM, Hellendoorn HBA, Op De Weegh GJ, Heerspink W: Applications of statistics in clinical chemistry: A critical evaluation of regression lines. Clin Chim Acta 1975; 64:173-184
18. Weisbrot IM: Statistics for the clinical laboratory. Philadelphia, JB Lippincott, 1985, pp 129-149
19. Westgard JO, Carey RN, Wold S: Criteria for judging precision and accuracy in method development and evaluation. Clin Chem 1974; 20:825-833
20. Westgard JO, de Vos DJ, Hunt MR, Quam EF, Carey RN, Garber CC: Method evaluation. Houston, TX, American Society of Medical Technology, 1978
21. Westgard JO, Hunt MR: Use and interpretation of common statistical tests in method comparison studies. Clin Chem 1973; 19:49-57
22. White GH, Fraser CG: The evaluation kit for clinical chemistry: A practical guide for the evaluation of methods, instruments and reagent kits. J Auto Chem 1984; 6:122-141