DIAGNOSTIC METHODS
ELECTROCARDIOGRAPHY

Assessment of the performance of electrocardiographic computer programs with the use of a reference data base

Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017

JOS L. WILLEMS, M.D., PIERRE ARNAUD, M.D., JAN H. VAN BEMMEL, PH.D., PETER J. BOURDILLON, M.D., CHRISTIAN BROHET, M.D., SERGIO DALLA VOLTA, M.D., JENS DAMGAARD ANDERSEN, M.D., ROSANNA DEGANI, PH.D., BERNARD DENIS, M.D., MICHEL DEMEESTER, M.D., JOACHIM DUDECK, M.D., FRITS M. A. HARMS, M.D., PETER W. MACFARLANE, PH.D., GIANFRANCO MAZZOCCA, M.D., JORGEN MEYER, M.D., JORG MICHAELIS, M.D., JOS PARDAENS, D.Sc., SIEGFRIED J. PÖPPL, PH.D., BERNARD C. REARDON, PH.D., HENK J. RITSEMA VAN ECK, M.D., ETIENNE O. ROBLES DE MEDINA, M.D., PAUL RUBEL, M.SC., JAN L. TALMON, PH.D., AND CHRISTOPH ZYWIETZ, M.SC.

ABSTRACT To allow an exchange of measurements and criteria between different electrocardiographic (ECG) computer programs, an international cooperative project has been initiated aimed at standardization of computer-derived ECG measurements. To this end an ECG reference library of 250 ECGs with selected abnormalities was established and a comprehensive reviewing scheme was devised for the visual determination of the onsets and offsets of P, QRS, and T waves. This task was performed by a group of cardiologists on highly amplified, selected complexes from the library of ECGs. With use of a modified Delphi approach, individual outlying point estimates were eliminated in four successive rounds. In this way final referee estimates were obtained that proved to be highly reproducible and precise. This reference data base was used to study measurement results obtained with nine vectorcardiographic and 10 standard 12-lead ECG analysis programs. The medians of program determinations of P, QRS, and T wave onsets and offsets were close to the final referee estimates.
However, an important variability could be demonstrated between measurements from individual programs, and mean differences from the referee estimates amounted to 10 msec for QRS for certain programs. In addition, the variance with respect to the referee point estimates differed from program to program. Some programs proved to be more accurate and stable when the data from high- vs low-noise recordings were analyzed. Average Q wave durations calculated from ECGs for which programs agreed on the presence of a Q or QS wave differed by more than 8 msec in several program-to-program comparisons. Such differences may have important consequences with respect to diagnostic performance. Various factors that might explain these differences have been determined. The present study demonstrates that to allow an exchange of results and diagnostic criteria between different ECG computer programs, definitions, minimum wave requirements, and measurement procedures urgently need to be standardized.
Circulation 71, No. 3, 523-534, 1985.

The authors' academic affiliations are listed in the Common Standards for Quantitative Electrocardiography (CSE) organizational structure that appears before the references.
Supported in part by the Commission of the European Communities, within the frame of its Medical and Public Health Research program under project No. 82/616/EEE 11.2.2, and by local and national research funding to different institutes in nine member states of the European Economic Community.
Address for correspondence: Jos L. Willems, M.D., CSE Project Leader, University Hospital of Gasthuisberg, 49, Herestraat, 3000 Leuven, Belgium.
Received May 30, 1984; revision accepted Nov. 8, 1984.

DURING the last decade rapid growth has occurred in computer electrocardiographic (ECG) processing.1-3 At present, however, no standards for quantitative ECG analysis exist. There is a lack of agreement on definitions of waves and common measurements, standardized criteria for classification, and common terminology for reporting.4,5 To overcome some of these problems, an international project entitled Common Standards for Quantitative Electrocardiography (CSE) was initiated in the European community.6-9 The principal objectives of CSE are to establish recommendations for the standardization of computer-derived ECG measurements and to obtain agreement on definitions of waves and on references for the on- and offsets of P, QRS, and T waves. In other words, when the same data are given as input to any three computer programs, the ultimate goal is to obtain the same measurement results, e.g., for Q durations. Only then can diagnostic ECG criteria for myocardial infarction and other conditions be exchanged and possibly standardized. Means and variances of measurement results obtained by various programs analyzing a common data base should fall within acceptable ranges. A measurement reference library was therefore established through a comprehensive, interactive review process that was performed by a group of cardiologists on highly amplified ECG tracings. Methods used were those that would ensure maximum quality and reproducibility of the resulting reference data base.

The purpose of the present report is to describe results obtained by nine vectorcardiographic (VCG) and 10 standard 12-lead ECG computer programs analyzing this data base, and to highlight the need for recommendations on more precise standards, rules of measurement, and definitions. The study does not aim to criticize individual programs but rather to provide a firm foundation for improvement of all programs in the future.

Methods
Study protocol and description of the data base. The CSE Working Party consists of active participants from 20 institutions of the European community.
In addition, investigators from six North American and one Japanese center also collaborated in the project by processing data or as consultants. The protocol of the study (the method of collection and analysis of the data base by a group of referee-cardiologists) has been described in detail elsewhere.6-10 Briefly, from a group of digitized ECGs submitted to the coordinating center by five participating institutes, a sample of 250 was chosen that represented a wide variety of ECG morphologies. The data were collected at 500 Hz with a resolution of at least 10 bits and a minimum quantization level of 5 µV. They were recorded with equipment meeting AHA standards in groups of at least three simultaneous leads; the group included tracings of the standard 12 as well as the Frank XYZ leads. Because different ECG measurement programs have various philosophies with respect to analysis, e.g., some select 1 beat for analysis while others base results on an average beat, so-called artificial ECGs were also created. This was done by selecting 1 beat from each of the lead groups of each of the 250 original recordings and by creating strings of identical beats with stable RR intervals over 10 sec for the XYZ leads and over 5 sec for each three-lead group of the conventional 12 leads. The selected beats were chosen by eye in such a way as to be close to the dominant beat with the least possible baseline shift, noise, and artifact. A variable segment was interlaced between the beats to correct for possible offset artifacts. Another group of 60 artificial ECGs was composed from additional beats selected for a study of beat-to-beat variation, so the total artificial ECG library was composed of 310 recordings. Seventy of the artificial ECGs were recorded with six simultaneous leads, i.e., the six peripheral and the six precordial leads. The 250 original and 310 artificial ECGs were randomly divided into two sets containing nearly equal samples of each pathologic entity. It was agreed that detailed results would be made generally available only from data set 1, the so-called training set, whereas summary results only would be available from data set 2, the test set. This was done to prevent the processing centers from adapting their programs based on the referee results of the test set.

*CSE participants are listed before the references.

Analysis by the referees. The beats selected for the artificial library have been analyzed by a board of referee-cardiologists from five different countries. The referees had experience in computer-assisted ECG interpretation, but to avoid bias had not been involved in program development. An overview of their analysis is presented in figure 1. In view of the well-known interobserver and intraobserver variability in determining wave recognition points, an elaborate reviewing scheme, consisting of four rounds, was devised. With the use of a modified Delphi approach,11 individual referee outliers were eliminated from the analysis in successive steps, an outlier being a point estimate that differs considerably from the median result. The referees were asked to mark the group on- and offsets of the P wave and the group end of the T wave, as well as the individual on- and offsets of the QRS complexes in each lead (figure 2), on highly amplified tracings written out at 500 mm/sec and 100 mm/mV gain. The earliest onset and latest offset of QRS in any lead was taken as the QRS group onset and offset, respectively. These lead-group onsets and offsets were used to compute so-called isoelectric segments at the beginning and end of QRS in each lead by measuring the distance to QRS onsets and offsets determined in the respective single leads. In addition, the referees had to provide, per lead, a wave morphology description (e.g., P+ QRSR' T+, or positive P and T wave and an R' after a QRS complex).
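The derivation of lead-group limits and isoelectric segments described above can be illustrated with a minimal sketch. It is not the software used in the coordinating center: the function names, the dictionary layout, and the sample indexes are hypothetical, and the only value taken from the text is the 500 Hz sampling rate (2 msec per sample point).

```python
from typing import Dict, Tuple

MS_PER_SAMPLE = 2.0  # 500 Hz sampling, as stated for the CSE library


def group_limits(lead_points: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Return (group onset, group offset) in sample points.

    The group onset is the earliest single-lead QRS onset and the
    group offset is the latest single-lead QRS offset.
    """
    onsets = [onset for onset, _ in lead_points.values()]
    offsets = [offset for _, offset in lead_points.values()]
    return min(onsets), max(offsets)


def isoelectric_segments(
    lead_points: Dict[str, Tuple[int, int]]
) -> Dict[str, Tuple[float, float]]:
    """Return, per lead, the isoelectric segment (msec) at QRS onset and offset.

    Each segment is the distance between the lead-group limit and the
    limit marked in that single lead.
    """
    group_on, group_off = group_limits(lead_points)
    return {
        lead: ((onset - group_on) * MS_PER_SAMPLE,
               (group_off - offset) * MS_PER_SAMPLE)
        for lead, (onset, offset) in lead_points.items()
    }


# Hypothetical single-lead QRS onsets and offsets (sample points) for one lead group.
qrs_points = {"I": (105, 150), "II": (100, 152), "III": (102, 148)}

if __name__ == "__main__":
    print(group_limits(qrs_points))
    print(isoelectric_segments(qrs_points))
```

In this hypothetical lead group, lead I starts five sample points after the group onset, so its initial isoelectric segment is 10 msec.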
The referees completed their first-round analyses at home with Mingograph recordings. Reference points were marked on the paper tracings and subsequently 81,450 points were transferred to the computer in the coordinating center. The subsequent rounds were performed on a subset of ECGs in the coordinating center on a Tektronix 4010 graphics display terminal. To test intraobserver variability the referees were given the same beats of 26 ECGs on two other randomly selected occasions over a period of 1 year. Measurement precision could also be assessed in the ECGs in which six leads were recorded simultaneously but which were analyzed in sets of three. From a theoretical point of view, wave onsets and offsets should occur at the same time in simultaneously recorded unipolar and bipolar limb leads, since these leads are mathematically interrelated. This is not necessarily the case for the precordial leads.

Processing by the computer programs. Ten centers in Europe, five in North America, and one in Japan, including those of commercial groups, participated in the analysis of the ECGs. Each of the cooperating centers had to present results of the analysis on magnetic tape in an agreed format. Both the 250 original and the 310 artificial ECG recordings were processed by a total of nine VCG and 10 standard 12-lead programs, which are listed in table 1. Descriptions of these programs have been published.12-15 The parameters measured were those of basic interval and amplitude, i.e., P and QRS duration, PR and QT interval, duration and amplitude of Q, R, S, R', S', and R'', and amplitude of the J point and of the positive and negative components of the P and T waves. Time locations with respect to the beginning of the record or of the reference beat were requested, as well as a copy of the raw data for the modal or averaged beat, when applicable (see Discussion).
Alignment of the respective averaged beats with the beat analyzed by the referees was made in the coordinating center by means of a cross-correlation method. As for the referee results, the earliest onset and latest offset of QRS in any of the three corresponding leads were taken to represent the computer QRS onset and offset for that lead group.

FIGURE 1. Summary of the different reviewing rounds for the final determination of P, QRS, and T onsets and offsets by the group of referees. The limits D1 to D3 used for the deviations of the individual from the median referee results are given in the inset of the flow diagram. [Flow diagram labels: 1st round; at least 4 referees within D1 value from median (yes/no); 2nd round: each referee reviews measurement with feedback; take median as referee estimate; 3rd round; 4th round.]

Statistical analysis. Various listings and tables containing results of referee-to-referee, program-to-program, and program-to-referee comparisons were returned as feedback to the processing centers. With respect to wave onsets and offsets, differences (algebraic and absolute) were calculated between the final referee estimates and the median of the program results, as well as between the referee estimates and individual program results. This has been performed for both data sets separately and combined, as well as for the ECGs divided into those with the lowest and those with the highest noise content. Details on the calculation of the noise content and the applied ranking procedure have been reported elsewhere.16 With respect to the durations and amplitudes of the various components of P, QRS, and T, differences were computed between individual and median program results. This was done for the artificial as well as for the original ECG recordings. The referees were not asked to make such measurements on the individual wave components.
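The difference calculations described in this section can be sketched as follows. This is a minimal illustration, not the statistical software of the coordinating center: the function names and all numerical values are hypothetical, and the sign convention (program minus referee) is an assumption.

```python
from statistics import median
from typing import Dict, List, Tuple


def differences(referee: List[float], program: List[float]) -> Tuple[List[float], List[float]]:
    """Return (algebraic, absolute) differences per recording.

    Assumed sign convention: program result minus referee estimate,
    so a positive value means the program located the point later.
    """
    algebraic = [p - r for p, r in zip(program, referee)]
    absolute = [abs(d) for d in algebraic]
    return algebraic, absolute


def program_median(results: Dict[str, List[float]]) -> List[float]:
    """Return, per recording, the median of the results of all programs."""
    return [median(values) for values in zip(*results.values())]


# Hypothetical QRS onsets (msec from the start of the beat) for three recordings.
referee_onsets = [100, 200, 300]
program_onsets = {
    "A": [102, 198, 304],
    "B": [100, 204, 300],
    "C": [96, 202, 310],
}

if __name__ == "__main__":
    median_onsets = program_median(program_onsets)
    print(differences(referee_onsets, median_onsets))
    for name, onsets in program_onsets.items():
        print(name, differences(referee_onsets, onsets))
```

The algebraic differences expose a systematic shift of a program relative to the referees, whereas the absolute differences summarize scatter irrespective of direction.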
Median program results were used to determine the minimum duration and amplitude of the QRS waves that the referees could recognize confidently, i.e., at least four of the five referees had to agree on the presence or absence of the specific wave component, and their wave onsets and offsets of QRS had to fall within specific limits. Parametric statistics were used to evaluate mean differences and variances between program and referee results. Also, 99% confidence intervals were calculated. Because one or two large outliers might significantly distort variance figures, 2% of the cases with the highest differences (for QRS onset and offset) and 3% (for P and T wave results) were deleted for each program for this calculation. The agreement between programs on the absence and presence of QRS waves was tested with nonparametric analysis of variance (Friedman and Wilcoxon tests).

Results
Number of measurements reviewed by the referees. The percentage of measurements reviewed by each referee during the second round amounted to 9.5% (1548 of 16,290) of the total. For P onset, P offset, T end, QRS onset, and QRS offset this amounted to 8.0%, 12.6%, 14.6%, 7.8%, and 9.0%, respectively, of the total number of measurements for each. The overall results were not significantly different in the two data sets. The number of measurements reviewed in the third round averaged 3.0% (n = 486). Each referee reviewed between 1.6% and 3.5% of his measurements during the so-called 4/1 review. For the five readers combined, this amounted to 1975 measurements or 12.1% of the grand total. The fourth-round analysis was performed on 340 and 363 measurements from data sets 1 and 2, respectively. Modifications of the third-round estimates were made on 66 measurements in both sets combined.

FIGURE 2. Example of an enlarged beat (amplification x 10) given as feedback for the third-round discussion. In this case, P end was discussed. Small vertical lines denote the five individual point estimates and long ones the median results. The values close to the latter denote sample point locations relative to the onset of the selected beat. Note that individual referee estimates may overlap and that QRS onset in lead II apparently starts 10 msec (five sample points) later than in lead I.

Interobserver variability. When individual referee results obtained after the second round were compared with the final group estimates, minor but systematic differences were observed (figure 3). Mean differences were smallest for QRS and P onset, whereas they were largest for T end. The SD of the differences was approximately 3 msec for QRS onset and equalled 5 to 6 msec for the end of QRS and for P on- and offset, whereas for T end it varied between 12 and 20 msec. Results obtained in data set 2 were concordant with those in data set 1.

Reproducibility of referee results. Table 2 lists estimates of the reproducibility of the final group results for the 26 ECGs that were analyzed three times during the study period. Maximal differences between any pair of the three repeat readings are listed. It can be seen that for 89.0% of the measurements (347 out of 390), the final estimates of QRS onset were within 4 msec. For QRS offset, P onset, and P offset these values were 76.4%, 72.4%, and 67.6%, respectively. The repeat readings of T end were within 20 msec of the originals 80.0% of the time. With respect to the reproducibility of individual referee results, no significantly different results were obtained. Average deviations and corresponding SDs were of the same order of magnitude for each referee.

TABLE 1
Programs examined in the present study

CSE program No.   Program name     12 lead   XYZ   Version
2                 CIMHUB           yes       yes   June 81
3                 Louvain                    yes   1979
4                 Hannover                   yes   3
5                 HP               yes             3.4
6                 Giessen          yes       yes   1980
7                 IBM              yes             2-5890
8                 Nagoya           yes             BI
9                 Lyon                       yes   5.6
10                AVA                        yes   4.0
11                Glasgow                    yes   1976
12                Halifax          yes       yes   1980
13                Padova           yes             1980
14                Telemed          yes             6H
15                Modular          yes       yes   8101
16                Sicard-Riedel    yes             1980

FIGURE 3. Bar graph of differences (in msec) between individual referee estimates and final group results. Mean differences are depicted by small vertical lines and 99% confidence intervals by horizontal bars. The long vertical lines denote zero difference. Composite lead group results are presented for data set 1 and 2 combined (n = 310). For P measurements n = 261 due to exclusion of ECGs showing atrial fibrillation, flutter, and atrioventricular junctional rhythm.

Results from six-channel recordings. Reproducibility and precision of referee results could also be derived from the ECGs in which six channels were recorded simultaneously. Determinations of QRS onset were the most reliable. The onset of QRS in lead group I-III differed by no more than 4 msec from the time location obtained in the simultaneously recorded, but separately analyzed, lead group aVR-aVF in 91.4% of the cases (64 out of 70). For QRS offset and P on- and offset a difference of less than or equal to 4 msec was observed in 87.1%, 88.2%, and 80.4%, respectively. The difference for T end was less than 20 msec in 90% of the cases.
The differences between the point estimates derived from the bipolar and unipolar limb leads varied symmetrically around zero and were significantly less (p < .01) than the differences noted between the precordial lead groups. P onset, P end, and QRS onset were often determined earlier, whereas QRS offset was often located later in V1-V3 than in the simultaneously recorded lead group V4-V6.

TABLE 2
Reproducibility of median referee results

Max difA   QRS onset   QRS offset   P onsetB   P offsetB   Max difA   T end
(msec)     (%)         (%)          (%)        (%)         (msec)     (%)
0          36.2        31.0         22.9       9.5         0-2        17.7
2          42.8        35.4         31.4       42.9        4-6        24.6
4          10.0        10.0         18.1       15.2        8-10       12.3
6          3.3         9.7          6.7        7.6         12-14      13.1
8          3.8         3.1          5.7        5.7         16-18      12.3
≥10        3.8         10.8         15.2       19.0        ≥20        20.0

AMaximum differences (in msec) between medians of three repeat readings in 26 cases. QRS results were derived from each of the 15 leads (15 x 26 = 390), whereas P and T refer to lead group (5 x 26 = 130) measurements.
BFive cases with atrial fibrillation excluded.

Isoelectric segments and small waves. From figure 4 it is evident that so-called isoelectric segments of 10 msec and more were not uncommon at the beginning or at the end of QRS, especially in leads I, aVR, aVL, and X, where they occurred in 17% to 23% of the cases. The smallest recognizable wave that could be detected in a reproducible manner on standard ECG recordings was studied by comparing the referees' wave morphologic results and the measured values from the programs. Scattergrams of duration and amplitude results for small Q and R waves reliably identified by the referees (i.e., four of five reported a wave) demonstrated that the smallest detected QRS waves have an amplitude on the order of 20 µV and a duration of 6 msec. Only a few programs detected waves of less than 30 µV.

Comparison of program with referee point estimates. The median results of the programs were quite close to the referee estimates.
However, differences between individual program results and the referee standard were significantly larger. The bar graphs in figures 5 and 6 show the 99% confidence intervals for the mean differences in P, QRS, and T onsets and offsets with the referee results as reference. They demonstrate that various program results deviate significantly from the referee point estimates. Not only mean results, but also the variances (indicating scatter around the referee standard), differed from program to program. The results obtained from data set 2, the test set, were not significantly different from those from data set 1. Variations for QRS on- and offset were larger on ECGs indicating conduction defects (n = 47) and slightly larger on ECGs indicating myocardial infarction (n = 91) than on tracings with normal QRS complexes (n = 67).

FIGURE 4. Bar chart of isoelectric segments found at QRS onset and offset in the 15 leads analyzed by the referees. Results derived from the final referee estimates. n = 310 observations for each lead.

FIGURE 5. Comparison of referee standard with individual and median lead group onsets and offsets determined by 10 standard 12-lead programs used to analyze data set 2 (n = 155). Means are depicted by small vertical lines and 99% confidence intervals, after omitting outliers, by horizontal bars. The long vertical lines denote zero differences. Note that program performance is characterized by measurement instability (width of bars) and systematic deviations (distance between short and long vertical lines). Some results for programs 2 and 5 were missing.

FIGURE 6. Comparison of program results with referee standard (mean differences and 99% confidence intervals). Individual and median lead group results derived from the Frank XYZ leads by nine VCG computer programs for data set 2 (n = 155) are shown.

Comparison of program-to-program results. Agreement in reporting the presence or absence of Q waves was attained 70% to 96% of the time, depending on the leads analyzed. High agreement figures were obtained for the right precordial leads, lower ones for the limb leads. The agreement between different computer programs varied, on the average, between 80% and 90%. The number of small Q and R waves (duration <12 msec and amplitude ≤50 µV) reported by the various programs differed widely, as can be seen from table 3. When average durations were calculated for those ECGs for which programs agreed on the presence of a Q or QS wave, then significant differences amounting to more than 10 msec were found, as illustrated in figure 7.
These differences were apparent to the same degree and in the same direction in the original and the so-called artificial recordings.

Effect of noise on program results. A comparison of point estimates from high- vs low-noise recordings indicated that on the average computer-derived wave onsets and offsets were shifted outward by noise (figure 8). However, this shift was significantly less for some programs than for others.

Discussion
One objective of the CSE project is to reduce the variation of measurements made by computer programs for interpreting the ECG. To this end a data base with well-defined wave reference points was established. As might be expected, individual referee results demonstrated a certain interobserver and intraobserver variability. However, this variability was lower than in former studies17 because of the interactive reviewing process. Indeed, each of the four successive rounds of the Delphi-type reviewing process led to smaller variances. When acting as a group, the final results of the referees proved to be very stable and can be supposed to be a valid standard reference. Results for each recording of half the library (the so-called training set) have been published in a CSE Atlas18 and are available on magnetic tape. These results can be used to test or refine wave recognition results of ECG analysis programs in which three simultaneously recorded leads are used.

A number of compromises were required for an effective implementation of the procedures used for choosing the standard reference time points for ECG wave onsets and ends.8 One such compromise was the use of median values of the referees after the first and second review rounds as the "correct" reference points for evaluating interval measurements by the programs. It is conceivable that occasionally the median value did not correspond to the most accurate reference point.
However, the choice of the median values was considered necessary to cope with the problems caused by outliers, i.e., sporadic erratic measurements in certain difficult or noisy records. On the other hand, from statistical theory it is well known that means or medians of multiple observations are more precise and reproducible in reflecting the population truth than single estimates. In view of this it can be postulated that another group of readers would produce similar results within the statistical limits presented in this study. Indeed, the median of the independent programs, each with its built-in cardiologic experience, closely approached the final referee estimates (figures 5 and 6), further supporting their validity. It can be assumed that the interactive reviewing process resulted in a learned response. However, such an effect on the readers at most affected 12% of the measurements that were made after the first-round analysis (figure 1). This percentage only slightly decreased over the study period, indicating that the five readers remained independent. Although the referees were given low-pass (15 Hz) filtered recordings to assist them in localizing the onset and offset of the P wave and the end of the T wave, they had to indicate all the fiducial points on the unfiltered high-gain recordings. Averaging techniques and signal transformations such as spatial velocity or magnitude curves were not provided since none of the cardiologists used such signals in routine electrocardiography. In addition, this could have biased the analysis results toward certain algorithms.

TABLE 3
Number of "small" Q and R waves identified by 12-lead ECG programs in data set 1 and 2 combined

              Q wave <12 msec and ≤50 µV           R wave <12 msec and ≤50 µV
Program No.   Original ECGs    Artificial ECGs     Original ECGs    Artificial ECGs
              (n = 12 x 250)   (n = 12 x 310)      (n = 12 x 250)   (n = 12 x 310)
2             26               46                  11               16
5             14               76                  8                31
6             86               39                  29               7
7             45               80                  26               37
8             16               22                  7                11
12            148              240                 81               150
13            63               97                  32               63
14            0                13                  21               14
15            157              178                 88               132
16            10               11                  39               52

FIGURE 7. Histogram of differences (in msec) between individual Q duration and median program results derived by eight VCG programs in lead X and by 10 standard 12-lead ECG programs in lead I. Q wave durations are not calculated by the Lyon program. Results are from data set 1 and 2, original and artificial ECGs combined (n = 560).

FIGURE 8. Lead group onsets and offsets from XYZ leads obtained by nine VCG computer programs in comparison to the reference standard in the 50% lowest vs 50% highest noise recordings. Results were obtained from data set 1 and 2 combined (n = 310).

The standard 12-lead ECGs of the current data base were acquired with the conventional three-channel sequencing, which is known to be suboptimal as a result of lack of orthogonality of the lead groups. However, this recording technique is used all over the world and all routinely used 12-lead computer programs in existence at the start of the project required such data.
Of course, lack of orthogonality cannot be offered as a criticism of the analysis of XYZ data in the present study.

There were several reasons why the establishment of a data base with well-defined onsets and offsets of P, QRS, and T waves was given high priority in the CSE project. Experience of investigators working in pattern recognition has demonstrated that several mathematic algorithms may lead to similar solutions in the average case. Some methods, however, may perform better than others under certain conditions and worse under others. The use of a data base for the development of algorithms is standard practice in various fields, from automated character reading to computer-assisted chromosome and leukocyte typing. For this, a local data bank and human wave recognition, usually by a single reader, have mostly been used. Furthermore, discussions with cardiologists revealed an unwillingness of the medical community to accept strict mathematic definitions if they had not been tested against wave recognition results derived by human reading.

From the intraobserver and interobserver reproducibility tests in the present study it is apparent that QRS onset is the measurement that can be made most reliably. A precision of less than 6 msec (three sample points) is attainable for QRS onset at high amplification in relatively noise-free records. Based on the results of the present study for P onset and offset, as well as for QRS offset, a difference of 10 msec is tolerable, whereas for T end, this may be increased to 25 msec. These empirical findings are in accordance with electrophysiologic theory. The onset of ventricular depolarization is usually a well-defined entity. QRS offset, in contrast, is a rather arbitrary fiducial point at which final echoes of depolarization merge imperceptibly with the early signs of repolarization. The same is true for the end of P. The T wave recovery forces move slowly and are of small magnitude. The end of T is therefore inherently less well defined. Nonetheless, in practical electrocardiology the end of QRS, as well as of the P and T waves, needs to be determined as accurately as possible.

The construction of the current data base from simultaneously recorded three-lead ECGs is a primary step in the process of standardization of computer ECG measurement programs, as was recommended at the first IFIP Conference12 and at the Tenth Bethesda Conference19 on computer-assisted electrocardiography. While the data base cannot be guaranteed to be a representative sample of the ECG universe (the collection of all conceivable ECGs) and the number of ECGs in the data base has been constrained by practical considerations, it is highly probable that conclusions reached by evaluating program performance with the present data base may be generalized to cover program performance in daily routine practice.

In fact, the present study demonstrates a rather wide variation in wave measurement results, and especially time intervals, obtained by nine currently used VCG and 10 standard 12-lead ECG computer programs analyzing a common reference data base. This variability may be explained by several factors. Various programs apply different algorithms and references for wave recognition, beat selection, and parameter extraction.12-15, 19-21 Most wave recognition programs apply threshold-level crossing methods to amplitude differences of filtered leads or use different matching techniques on templates in the filtered spatial velocity-time function. These templates or threshold levels are at best derived from a set of ECGs and are computed around the points indicated by one or more human observers.
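As a rough illustration of the threshold-level crossing idea applied to a spatial velocity function, the following minimal Python sketch computes the spatial velocity of an XYZ recording and returns the first sample at which it exceeds a level. It is not the method of any program tested here: the function names, the unfiltered first difference, and the threshold value are assumptions chosen for illustration only.

```python
import numpy as np

def spatial_velocity(xyz, fs):
    """Spatial velocity of an (n_samples, 3) XYZ array.

    Units are amplitude units per msec (e.g. µV/msec if xyz is in µV).
    No filtering is applied here; real programs usually filter first.
    """
    xyz = np.asarray(xyz, dtype=float)
    step = np.diff(xyz, axis=0)                  # sample-to-sample change per lead
    return np.sqrt((step ** 2).sum(axis=1)) * fs / 1000.0

def first_threshold_crossing(xyz, fs, threshold, search_from=0):
    """Index of the first sample interval whose spatial velocity exceeds
    `threshold` (a hypothetical, program-specific level), or None."""
    sv = spatial_velocity(xyz, fs)
    above = np.nonzero(sv[search_from:] > threshold)[0]
    if above.size == 0:
        return None
    return int(above[0]) + search_from
```

A returned index i refers to the interval between samples i and i + 1, so the corresponding time is i * 1000 / fs msec. Because the level is tuned around points chosen by particular observers, two programs using this same scheme with different levels would be expected to report systematically different onsets.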
Since the onsets and offsets of waves to which various programs were tailored have been determined by different referees using different sets of ECGs, it follows that systematic differences in computer results similar to those observed in human interobserver variability studies may be expected.

Other factors also contribute to the variability in measurement results. It has been demonstrated that programs that apply strategies for location of fiducial points on simultaneously recorded leads produce greater measurement reliability and reproducibility than programs in which single-lead ECG analysis is used.22, 23 The sampling rate and record length effectively used by various programs are often different. Some apply a sampling rate of 250 Hz, while others use 300, 400, or 500 Hz. Significant differences between program measurements and the reference standard are less likely to occur with programs using a sampling rate of 500 Hz. For programs based on a sampling rate of 250 Hz, the odd samples were analyzed in the present study, whereas for those using 300 or 400 Hz, interpolation techniques were applied to the 500 Hz CSE library. As demonstrated by Bailey et al.,22 small shifts in sampling thereby introduced may have caused different measurement results.

All programs attempt to make use of the redundancy of complexes available in the sampled ECG to optimize the accuracy of measurement extraction from the tracing.19 Three techniques are currently in use. The first locates the "best complex" for analysis. To find the best complex, all of the complexes of a given lead set are located. The program then chooses one complex for analysis, generally the one with the least noise and baseline wander. In the second technique, some time-coherent averaging is made of all complexes that are considered to be morphologically of the same type. This procedure reduces the random noise in the signal.
The third method extracts the measurements from every complex in the lead set and subsequently operates on the measurements of similar dominant complexes. From the above it is evident that, although identical ECGs may be given as input to various programs, final measurements may have been derived from different beats. This will inevitably lead to different results in a specific ECG record. However, if computer analysis of ECGs is to become a standardized laboratory procedure, averages of measurements based on a sample of sufficient size should be identical and variances should be within "acceptable ranges."5 Results from the present study indicate that this goal has not yet been achieved.

In the CSE project, a so-called artificial ECG library was created by selecting 1 beat from each of the lead groups of the original ECG recordings and by making strings of identical beats with a constant RR interval.10 Differences in measurement results due to the different beat selection methods listed above have thereby been circumvented. Nevertheless, an important variability between measurements obtained with the VCG and standard 12-lead computer programs could still be demonstrated. This variability results not only from difficulties in the determination of wave onsets and offsets, but also from a lack of consistent and precise common definitions, minimum wave requirements, and measurement rules.

In the present study we have found that isoelectric segments of 10 msec and more are not uncommon at the beginning and end of QRS in various leads. There are at present no generally accepted guidelines with respect to these segments. In a minority of programs, these isoelectric segments are enclosed in the duration of the initial or terminal QRS components, whereas in the majority they are excluded. Furthermore, various programs use different limits for the detection and labeling of small QRS waves.
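The effect of including or excluding an initial isoelectric segment on a measured duration can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the function name, the single-lead input in µV, and the `flat_uv` level used to decide what counts as isoelectric are hypothetical and are not taken from any of the programs studied.

```python
import numpy as np

def initial_component_duration_ms(lead, global_onset, component_end, fs,
                                  flat_uv=20.0, include_isoelectric=True):
    """Duration (msec) of the first QRS component in one lead.

    lead           : 1-D samples of a single lead, in µV
    global_onset   : sample index of the QRS onset common to the lead group
    component_end  : sample index just past the end of the first component
    flat_uv        : hypothetical level below which the lead is called isoelectric
    """
    lead = np.asarray(lead, dtype=float)
    seg = lead[global_onset:component_end] - lead[global_onset]
    if include_isoelectric:
        start = 0                                # count from the global onset
    else:
        active = np.nonzero(np.abs(seg) >= flat_uv)[0]
        if active.size == 0:
            return 0.0                           # nothing exceeds the flat level
        start = int(active[0])                   # count from first visible deflection
    return (len(seg) - start) * 1000.0 / fs
```

For a lead sampled at 500 Hz that stays flat for five samples after the global onset and then shows a Q wave lasting ten samples, the two conventions give 30 and 20 msec respectively, a difference of the same order as the program-to-program differences in Q duration discussed here.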
Results from the present investigation indicate that the smallest recognizable waves, by visual inspection, have an amplitude on the order of 20 µV and a duration of 6 msec. A few programs provide Q and R wave measurement results below these limits. Other programs, however, require a minimum amplitude of 30 or even 40 µV and a duration of 8 or 10 msec. Others use noise-dependent thresholds based on a signal derivative or a combination of amplitude and duration results. On the average, the programs tested agreed on the presence and absence of Q and QS waves about 80% to 90% of the time. When durations were calculated for those records for which programs agreed on the presence of a Q or QS wave, significant average differences amounting to more than 8 msec were found in the limb leads and the differences were even greater in the right precordial leads. Such differences have important consequences for diagnostic performance, given that these programs might use the same thresholds and logic for the diagnosis of myocardial infarction.23, 24

The comparison of referee with computer point estimates reported in the present study demonstrates that some programs are more precise and show less variability than others. In general the measurement performance of XYZ programs was better than that of 12-lead programs. These results have been confirmed by noise-tolerance tests. Results from low- and high-noise recordings indicate that P, QRS, and T onsets and offsets are shifted outward by noise in various computer programs. However, the extent of this shift is variable from program to program, probably as a result of different preprocessing methods. Further studies in this area are still in progress.25 Preliminary results indicate that programs that apply time-coherent averaging perform better in noisy records.

To allow an exchange of diagnostic criteria, wave measurement results need to be standardized.
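Why time-coherent averaging helps in noisy records can be shown with a minimal Python sketch. It assumes that the beats have already been classified as morphologically similar and that a fiducial sample index is available for each beat; the function name and parameters are hypothetical and do not describe any specific program.

```python
import numpy as np

def coherent_average(signal, fiducials, pre, post):
    """Average beats of one lead after aligning them on their fiducial points.

    signal    : 1-D array of samples for one lead
    fiducials : sample indices of the alignment point of each beat
    pre, post : number of samples kept before and after each fiducial
    """
    signal = np.asarray(signal, dtype=float)
    beats = [signal[f - pre:f + post] for f in fiducials
             if f - pre >= 0 and f + post <= signal.size]
    if not beats:
        raise ValueError("no complete beat available for averaging")
    # Uncorrelated noise shrinks roughly as 1/sqrt(number of beats);
    # the repeating waveform itself is preserved.
    return np.mean(beats, axis=0)
```

With 16 aligned beats, random noise in the average is expected to fall to about one quarter of its single-beat level, which makes low-amplitude onsets and offsets easier to locate. Misaligned fiducials or the mixing of different beat morphologies would smear the average, so the benefit depends on the preceding detection and classification steps.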
From the above it is obvious that common standards for quantitative electrocardiography are still missing. Therefore, parallel to the establishment of the CSE reference data base, the CSE Working Party has attempted to establish definitions, wave requirements, and measurement procedures. Recommendations in this direction are being developed. In addition, steps have been initiated to evaluate diagnostic program performance and to test the clinical impact of improved measurements.

The data presented in the current investigation were derived from ECG analysis programs with the use of three simultaneously recorded leads. At the start of the CSE project, equipment that could be used to acquire 12 (eight independent) or 15 leads simultaneously was not yet on the market. Some of the latest programs can only operate on such multichannel leads.26-28 To this end the CSE data base has recently been extended with several hundred new ECGs. However, the basic problems encountered in the present analysis are also applicable to these newer programs.

We gratefully acknowledge the secretarial assistance of Diane Wolput and Viviane Dillemans, as well as the technical assistance of Ludo Van den dries and Danny De Schreye.

Organizational structure: CSE committees and participants

CSE Steering Committee. P. Arnaud (France), R. Degani (Italy), P. W. Macfarlane (United Kingdom), J. H. van Bemmel (The Netherlands), J. L. Willems (Project Leader; Belgium), C. Zywietz (West Germany).

CSE Board of Referees. P. J. Bourdillon (United Kingdom), G. Mazzocca (Italy), B. Denis (France), J. Meyer (West Germany), E. O. Robles de Medina and F. M. A. Harms (acting as a team with one vote, The Netherlands), H. J. Ritsema van Eck (consultant, The Netherlands).

CSE European Working Party. Belgium: C. Brohet (University of Louvain), M. Demeester (University of Brussels), J. Pardaens and J. L. Willems (University of Leuven). West Germany: J. Dudeck (University of Giessen), J. Meyer and J. Michaelis (University of Mainz), S. J. Poppl (Institute Medical Data Processing, Munchen), C. Zywietz (University of Hannover). Denmark: J. Damgaard Andersen (University of Copenhagen). France: P. Arnaud (INSERM U121 Lyon), B. Denis (University of Grenoble), P. Rubel (INSA, Lyon). Greece: S. Moulopoulos (University of Athens), E. Skordalakis (NCR Democritos, Attiki). Italy: S. Dalla Volta (University of Padova), R. Degani (Ladseb CNR, Padova), G. Mazzocca (University of Pisa). Ireland: I. Graham and B. C. Reardon (University of Dublin). The Netherlands: J. H. van Bemmel and J. L. Talmon (Free University, Amsterdam), F. M. A. Harms and E. O. Robles de Medina (University of Utrecht), H. J. Ritsema van Eck (Rotterdam). United Kingdom: P. J. Bourdillon (University of London), P. W. Macfarlane (University of Glasgow).

Consultants. J. J. Bailey (N.I.H.) and H. V. Pipberger (George Washington University, Washington, D.C.), P. M. Rautaharju (University of Dalhousie, Halifax, Nova Scotia).

Non-European participants. U.S.A.: R. Bonner (IBM), J. Doue (Hewlett-Packard), K. Michler (Telemed). Canada: P. M. Rautaharju and P. Macinnis (University of Dalhousie, Halifax, Nova Scotia). Japan: M. Okajima, N. Okamoto, M. Yokoi (University of Nagoya), M. Ohsawa (Fukuda Denshi).

CSE Coordinating Center. Division of Medical Informatics, University of Leuven, Belgium.

References

1. Pipberger HV: Twenty years of ECG data processing. What has been accomplished? In Antaloczy Z, editor: Modern electrocardiology. Amsterdam, 1978, Excerpta Medica, p 159
2. Rautaharju PM: The current state of computer ECG analysis: a critique. In van Bemmel JH, Willems JL, editors: Trends in computer-processed electrocardiograms. Amsterdam, 1977, North Holland Publishing Co, p 117
3. Drazen E: Use of computer-assisted ECG interpretation in the United States. In Ripley KL, Ostrow HG, editors: Computers in cardiology. Long Beach, CA, 1979, IEEE Computer Society, p 83
4. Willems JL, Pardaens J: Differences in measurement results obtained by four different ECG computer programs. In Ostrow HG, Ripley KL, editors: Computers in cardiology. Long Beach, CA, 1977, IEEE Computer Society, p 115
5. Willems JL: A plea for common standards in computer aided ECG analysis. Comp Biomed Res 13: 120, 1980
6. The CSE European Working Party: Common standards for quantitative electrocardiography. The CSE pilot study. In Gremy F, et al, editors: Medical informatics Europe 81. Berlin, 1981, Springer Verlag, p 319
7. The CSE European Working Party: Common standards for quantitative electrocardiography. CSE project phase one. In Ripley KL, editor: Computers in cardiology. Long Beach, CA, 1982, IEEE Computer Society, p 69
8. Bourdillon PJ, Denis B, Harms FMA, Mazzocca G, Meyer J, Robles de Medina EO, Ritsema van Eck HJ, Willems JL: European experience in the standardization of measurements and of definitions of the electrocardiogram. In Laks M, editor: Computerized interpretation of electrocardiograms VII. New York, 1982, Engineering Foundation, p 9
9. Macfarlane PW, Willems JL on behalf of the CSE Working Party: The CSE Project: progress as viewed by the cooperating centers. In Selvester R, editor: Computer interpretation of electrocardiograms VIII. New York, 1983, Engineering Foundation (in press)
10. Willems JL, Arnaud P, Degani R, Macfarlane PW, van Bemmel JH, Zywietz C: Protocol for the concerted action project "Common Standards for Quantitative Electrocardiography," Second R&D programme in the field of Medical and Public Health Research of the EEC (80/344/EEC), CSE Ref. 80-06-00, Leuven, Belgium, 1980, ACCO Publ, p 152
11. Dalkey N: Analysis from a group opinion study. Rand Corporation. Futures, December 1969, p 541
12. Zywietz C, Schneider B, editors: Computer application in ECG and VCG analysis. Amsterdam, 1973, North Holland Publishing, p 271
13. van Bemmel JH, Willems JL, editors: Trends in computer processed electrocardiograms. Amsterdam, 1977, North Holland Publishing, p 437
14. Wolf HK, Macfarlane PW, editors: Optimization of computer ECG processing. Amsterdam, 1980, North Holland Publishing, p 346
15. Talmon JL: Pattern recognition of the ECG. A structured analysis, doctoral thesis. Free University, Amsterdam, 1983, p 366
16. Willems JL: Common standards for quantitative electrocardiography. Third progress report. Leuven, Belgium, 1983, ACCO Publ, p 275
17. Fischmann E, Cosma J, Pipberger HV: Beat to beat and observer variation of the electrocardiogram. Am Heart J 75: 465, 1968
18. Willems JL, editor: CSE atlas - referee results first phase library data set one, CSE Ref. 83-05-13, Leuven, Belgium, 1983, ACCO Publ, p 655
19. Rautaharju PM, Ariet M, Pryor TA, Arzbaecher RC, Bailey JJ, Bonner R, et al: Task Force III: computers in diagnostic electrocardiography. Am J Cardiol 41: 158, 1978
20. Stallman FW, Pipberger HV: Automatic recognition of electrocardiographic waves by digital computer. Circ Res 9: 1138, 1961
21. van Bemmel JH, Talmon JL, Duisterhout JP, Hengeveld SJ: Template wave form recognition applied to ECG/VCG analysis. Comp Biomed Res 6: 430, 1973
22. Bailey JJ, Horton M, Itscoitz SB: A method for evaluating computer programs for electrocardiographic interpretation. III. Reproducibility testing and the sources of program errors. Circulation 50: 88, 1974
23. Helppi RR, Unite V, Wolf HK: Suggested initial performance requirements and methods of performance evaluation for computer ECG analysis programs. Can Med Assoc J 108: 1251, 1973
24. Rautaharju PM: Use and abuse of electrocardiographic classification systems in epidemiologic studies. Eur J Cardiol 8: 155, 1978
25. Zywietz C, Alraun W, Willems JL on behalf of the CSE Working Party: Results of ECG program noise tests within the CSE project. In Ripley KL, editor: Computers in cardiology. Long Beach, CA, 1984, IEEE Computer Society (in press)
26. MAC II Marquette Electronics Inc, Milwaukee, 1982
27. Macfarlane PW, Peden A, Podolski M, Lawrie TDV: A new 12 lead ECG diagnostic computer program. Jpn Heart J 23(suppl I): 667, 1982
28. Bortolan G, Cavaggion C, Degani RT: A comparison of ECG measurements derived from 3, 6 and 12 simultaneous leads. In Ripley KL, editor: Computers in cardiology. Long Beach, CA, 1983, IEEE Computer Society, p 269

Circulation. 1985;71:523-534. doi: 10.1161/01.CIR.71.3.523. Circulation is published by the American Heart Association, 7272 Greenville Avenue, Dallas, TX 75231. Copyright © 1985 American Heart Association, Inc. All rights reserved. Print ISSN: 0009-7322. Online ISSN: 1524-4539. The online version of this article is located at: http://circ.ahajournals.org/content/71/3/523