ANATOMIC PATHOLOGY
Original Article

Interobserver Reproducibility of the Nottingham Modification of the Bloom and Richardson Histologic Grading Scheme for Infiltrating Ductal Carcinoma

HENRY F. FRIERSON, JR, MD,1 ROBERT A. WOLBER, MD,2 KENNETH W. BEREAN, MD,2 DOUGLAS W. FRANQUEMONT, MD,3 MICHAEL J. GAFFEY, MD,1 JAMES C. BOYD, MD,1 AND DAVID C. WILBUR, MD4

The interobserver reproducibility of the Nottingham modification of the Bloom and Richardson histologic grading scheme for invasive breast carcinoma was tested. Six surgical pathologists from four institutions independently evaluated histologic grade and each of its three components for 75 infiltrating ductal carcinomas. The number of slides per case ranged from one to nine (median 3). Pairwise κ values for histologic grade indicated moderate to substantial agreement (0.43-0.74). Generalized κ values indicated substantial agreement for tubule formation (0.64), moderate agreement for mitotic count (0.52), and fair, nearly moderate, agreement for nuclear pleomorphism (0.40). Normalizing the mitotic counts per mm² produced only a slight improvement in agreement over scoring with the published ranges of mitotic counts for three different field areas. The results suggest that steps to discriminate better between the categories of nuclear pleomorphism would likely improve the interobserver reproducibility of histologic grade. Nevertheless, the Nottingham modification of the Bloom and Richardson grading system is recommended as a suitable scheme for evaluating invasive breast carcinomas in the routine clinical setting. (Key words: Breast cancer; Grade; Reproducibility) Am J Clin Pathol February 1995;103(2):195-198.
From the Departments of Pathology, 1University of Virginia Health Sciences Center, Charlottesville, Virginia; 2University of British Columbia, Vancouver, British Columbia; 3Washington University, St. Louis, Missouri; and 4University of Rochester Medical Center, Rochester, New York.

Manuscript received February 23, 1994; revision accepted April 21, 1994.

Address reprint requests to Dr. Frierson: Department of Pathology, Box 214, University of Virginia Health Sciences Center, Charlottesville, VA 22908.

Histologic grade has been shown to be an important independent prognostic factor for women who have infiltrating mammary carcinoma.1-4 Studies that do not analyze grade as an independent prognostic variable have presumably been influenced by reports showing poor reproducibility of histologic grading. Unfortunately, no single grading system is currently used universally by pathologists for invasive breast cancer. Among the histologic and nuclear grading schemes that have been employed, the Nottingham modification4 of the Bloom and Richardson5 system seems to be the most clearly defined. However, no multi-institutional study of the interobserver reproducibility of the Nottingham system has been undertaken for a large number of invasive breast cancers. In this study, six surgical pathologists from four institutions independently evaluated grade and each of its three components for 75 infiltrating ductal carcinomas (IDC). The study was limited to IDC because this is the most common histologic type of invasive mammary carcinoma and perhaps the most difficult to grade with acceptable reproducibility.

MATERIALS AND METHODS

Surgical pathologists from the University of Virginia Health Sciences Center (HFF and MJG), University of Rochester (DCW), Washington University (DWF), and University of British Columbia (KWB and RAW) participated in this interobserver reproducibility study. Each independently reviewed slides (range 1-9 per case; median 3) of adequate quality from 75 IDC obtained by excisional biopsy, mastectomy, or, less often, incisional biopsy. The IDC were selected by one of the authors (HFF) solely on the basis of slide availability. The specimens were fixed in formalin or zinc formalin.

Each participant was furnished with a copy of the publication describing the updated version of the Nottingham grading system.4 In addition, each participant was referred to the illustrations of the histologic features of the three grades in chapter 17 of Diagnostic Histopathology of the Breast, edited by David L. Page and Thomas J. Anderson.6 Several written instructions for histologic grading (some of which had appeared in the Elston and Ellis publication4) were also provided. These guidelines included the following. For scoring tubule formation, the overall appearance of the neoplasm was to be taken into consideration. For nuclear pleomorphism, the areas of the cancer having cells with the greatest atypia were to be evaluated. Mitotic figures were to be counted only at the periphery of each IDC. Counting was to begin in the most mitotically active area (fields that subjectively had the highest density of mitotic figures), starting with a field having one or more mitotic figures. Ten high-power (×400) fields were to be counted in approximately the same area, although they need not be contiguous. The fields were to be filled with as much tumor as possible, and poorly preserved areas, if present, were to be avoided. Only bona fide mitotic figures were to be counted; cells in prophase were to be ignored. The appropriate mitotic count score was to be selected using whichever high-power field area in Table 2 of the Elston and Ellis publication4 was nearest to that of the participant's own microscope.

According to the Nottingham scheme,4 the grade was obtained by summing the scores for tubule formation, nuclear pleomorphism, and mitotic count, each of which was given 1, 2, or 3 points. Briefly, when most of an IDC (>75%) was composed of tubules, a score of 1 point was given. Two points were given when tubule formation was present in moderate amounts (10% to 75%), and 3 points were given when an IDC lacked or showed only minimal tubule formation (<10%). For nuclear pleomorphism, an IDC having nuclei with minimal variation in size and shape was given 1 point, one with moderate pleomorphism was given 2 points, and one with marked variation was given 3 points. The mitotic count score ranged from 1 to 3. Grade I IDC had 3 to 5 points, grade II neoplasms had 6 or 7 points, and grade III tumors had 8 or 9 points.

Each participant also recorded the total number of mitotic figures per 10 high-power (×400) fields for each case, as well as the brand and type of microscope used and its high-power field area. From these data, the number of mitotic figures per mm² was calculated, and mitotic count scores (1 to 3) were then derived from the normalized counts using the high-power field area (0.274 mm²) originally used by Elston and Ellis.4 Finally, the participants were encouraged to comment on any particular case.
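The scoring rules just described reduce to simple arithmetic. The Python sketch below restates them; the function names are hypothetical, the tubule percentage is assumed to be the observer's overall estimate for the tumor, and the nuclear pleomorphism and mitotic scores are taken as already assigned by the pathologist (the mitotic score depends on microscope field area and is not computed here).

```python
# Hypothetical helper functions (not from the paper) restating the
# Nottingham scoring arithmetic described in the Methods.


def tubule_score(percent_tubules: float) -> int:
    """Score tubule formation from the estimated percentage of tumor forming tubules."""
    if not 0 <= percent_tubules <= 100:
        raise ValueError("percent_tubules must lie between 0 and 100")
    if percent_tubules > 75:
        return 1  # most of the tumor (>75%) forms tubules
    if percent_tubules >= 10:
        return 2  # moderate tubule formation (10% to 75%)
    return 3      # little or no tubule formation (<10%)


def nottingham_grade(tubules: int, pleomorphism: int, mitoses: int) -> int:
    """Sum the three component scores (1-3 each) and map the 3-9 total to grade I-III."""
    for score in (tubules, pleomorphism, mitoses):
        if score not in (1, 2, 3):
            raise ValueError("each component score must be 1, 2, or 3")
    total = tubules + pleomorphism + mitoses
    if total <= 5:
        return 1  # grade I: 3 to 5 points
    if total <= 7:
        return 2  # grade II: 6 or 7 points
    return 3      # grade III: 8 or 9 points


if __name__ == "__main__":
    # 40% tubules (2 points) + marked pleomorphism (3) + mitotic score 2 = 7 points
    print(nottingham_grade(tubule_score(40), 3, 2))  # prints 2 (grade II)
```

Because the grade depends only on the sum, different combinations of component scores can yield the same grade, which is one reason the components were analyzed separately in this study.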
Interobserver agreement for tumor grade, tubule formation, nuclear pleomorphism, and mitotic count was tested using pairwise κ and generalized κ statistics.7 The divisions of the κ statistic providing "benchmarks" for strength of agreement were those described by Landis and Koch:8

κ statistic      Strength of agreement
<0.00            Poor
0.00-0.20        Slight
0.21-0.40        Fair
0.41-0.60        Moderate
0.61-0.80        Substantial
0.81-1.00        Almost perfect

Pairwise percent agreement for histologic grade, tubule formation, nuclear pleomorphism, and mitotic count was also calculated, as were consensus-4 (diagnostic agreement by four or more of the six pathologists), consensus-5, and unanimous diagnoses.

RESULTS

Consensus-4 agreement for histologic grade and its three components is presented in Table 1. Of the IDC reaching consensus-4 agreement on grade, 23% were grade I, 32% were grade II, and 45% were grade III. Among the cases reaching consensus on each component, only 4% had tubule formation score 1 and 7% had nuclear pleomorphism score 1, whereas slightly more than half had mitotic count score 3. Consensus agreement for grade could not be reached for 11.5% of IDC (Table 2). There was consensus agreement for tubule formation and nuclear pleomorphism for 92% and 95% of the neoplasms, respectively. Consensus agreement was lowest for mitotic rate (81%).

TABLE 1. CONSENSUS-4 AGREEMENT FOR GRADE AND ITS THREE COMPONENTS
Number of cases (%) by grade or score

Parameter               1          2          3
Grade                   15 (23)    21 (32)    30 (45)
Tubule formation         3 (4)     31 (45)    35 (51)
Nuclear pleomorphism     5 (7)     37 (52)    29 (41)
Mitotic count           19 (31)    10 (16)    32 (53)

Pairwise percent agreement and pairwise κ values are presented in Table 3. The median pairwise percent agreement was 71% for grade, 81% for tubule formation, 64% for nuclear pleomorphism, and 67% for mitotic rate. Generalized κ values for grade, tubule formation, nuclear pleomorphism, and mitotic count were 0.55, 0.64, 0.40, and 0.52, respectively. Hence, the strength of agreement for grade and mitotic count was moderate, whereas that for nuclear pleomorphism approached moderate.
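For readers who want to reproduce statistics of this kind, a minimal Python sketch follows. It assumes that the generalized κ cited above is the multi-rater statistic usually called Fleiss' κ, that every case is scored by all raters on the 1-3 scale, and that "consensus-4" means at least four identical scores among six raters. The function names and the four rated cases are hypothetical, not study data.

```python
from collections import Counter
from typing import Optional, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[int]], categories: Sequence[int] = (1, 2, 3)) -> float:
    """Fleiss' multi-rater kappa; each row holds one case's scores from every rater."""
    n = len(ratings[0])
    if n < 2 or any(len(row) != n for row in ratings):
        raise ValueError("every case needs the same number (>= 2) of ratings")
    counts = [Counter(row) for row in ratings]
    if any(k not in categories for c in counts for k in c):
        raise ValueError("a rating falls outside the allowed categories")
    cases = len(ratings)
    # Observed agreement: mean proportion of agreeing rater pairs within each case.
    p_obs = sum(
        sum(c[k] * (c[k] - 1) for k in categories) / (n * (n - 1)) for c in counts
    ) / cases
    # Chance agreement from the pooled proportion of ratings in each category.
    p_cat = [sum(c[k] for c in counts) / (cases * n) for k in categories]
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1:
        raise ValueError("kappa is undefined when every rating falls in one category")
    return (p_obs - p_exp) / (1 - p_exp)


def landis_koch(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch benchmark label."""
    if kappa < 0:
        return "Poor"
    for upper, label in ((0.20, "Slight"), (0.40, "Fair"), (0.60, "Moderate"), (0.80, "Substantial")):
        if kappa <= upper:
            return label
    return "Almost perfect"


def consensus(row: Sequence[int], threshold: int = 4) -> Optional[int]:
    """Score given by at least `threshold` raters, else None.

    With six raters and a threshold of 4 or more, at most one score can qualify,
    so taking the most common score is unambiguous.
    """
    score, votes = Counter(row).most_common(1)[0]
    return score if votes >= threshold else None


if __name__ == "__main__":
    # Four hypothetical cases, each scored 1-3 by six pathologists
    ratings = [
        [1, 1, 1, 1, 1, 1],
        [2, 2, 2, 2, 3, 3],
        [3, 3, 3, 3, 3, 2],
        [2, 2, 3, 3, 1, 2],
    ]
    k = fleiss_kappa(ratings)
    print(round(k, 3), landis_koch(k))          # 0.397 Fair
    print([consensus(row) for row in ratings])  # [1, 2, 3, None]
```

Consensus-5 and unanimous agreement follow from the same helper by passing `threshold=5` or `threshold=6`.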
Of the three components, the strongest agreement (substantial) was seen for tubule formation.

Using the high-power (×400) field area of each participant's microscope (0.344, 0.196, 0.344, 0.105, 0.178, and 0.344 mm²), the mitotic counts for each case were normalized to the number of mitotic figures per mm². The field area (0.274 mm²) originally used by Elston and Ellis4 was then employed to derive mitotic count scores (1 to 3) from these normalized counts. With this approach, the improvement in reproducibility of mitotic count scores was slight: pairwise percent agreement ranged from 63% to 81% (median 72%), and the generalized κ value rose only from 0.52 to 0.55.

TABLE 2. DIAGNOSTIC AGREEMENT FOR GRADE AND ITS THREE COMPONENTS
Rate of agreement by participants (n = 6)

Parameter               6/6        5/6        4/6          ≥4/6         <4/6
Grade                   30 (40)    17 (23)    19 (25.5)    66 (88.5)     9 (11.5)
Tubule formation        41 (54)    17 (23)    11 (15)      69 (92)       6 (8)
Nuclear pleomorphism    18 (24)    22 (30)    31 (41)      71 (95)       4 (5)
Mitotic rate            30 (40)    12 (16)    19 (25)      61 (81)      14 (19)

Values are numbers of cases of infiltrating ductal carcinoma, with percentages in parentheses.

DISCUSSION

The Nottingham/Tenovus Primary Breast Cancer Study has shown that histologic grade is an independent prognostic factor for women with invasive mammary carcinoma and that, in combination with tumor size and lymph node stage, it yields a useful prognostic index.4 The histologic grading scheme is a modification of the Bloom and Richardson5 system, the most notable improvement being the assignment of points for mitotic counts according to the high-power field areas of each of three types of microscope. Although several reproducibility studies of the Bloom and Richardson system have been published, there has been no multi-institutional evaluation of the interobserver reproducibility of grade and its three components under the Nottingham scheme in a large series of IDC. In our study, each of six surgical pathologists from four institutions evaluated grade and each of its components for 75 IDC.

We found that tubule formation was the most reproducible of the three features of histologic grade (generalized κ = 0.64). This substantial agreement may have been due in part to the semiquantitative scoring of tubule formation (>75% tubule formation = 1 point; 10% to 75% = 2 points; <10% = 3 points). Using the Bloom and Richardson system, others have also found that tubule formation is scored more reproducibly than either nuclear pleomorphism or mitotic count.9-11

The reproducibility of scoring nuclear pleomorphism was clearly inferior to that for grade and for each of the other two components. In two studies of the Bloom and Richardson scheme, agreement among pathologists was also worst for nuclear pleomorphism.9,… Improving the assessment of nuclear pleomorphism would therefore likely improve histologic grading. More precise definitions and liberal use of published photomicrographs might enhance agreement in scoring this parameter. Future studies using image analysis to evaluate nuclear size and contour, chromatin distribution, and nucleolar size might one day lead to a more accurate assessment of nuclear pleomorphism as a component of histologic grade.

The generalized κ value of 0.52 for mitotic count indicated moderate reproducibility.
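The per-mm² normalization used in this study can be sketched in Python as follows. It is a minimal sketch under stated assumptions: counts are taken over ten high-power fields, the example counts are hypothetical, and the score cutoffs for the 0.274 mm² reference field (score 1 for 0-9 mitoses, score 2 for 10-19, score 3 for 20 or more) are our reading of Table 2 in Elston and Ellis and should be checked against that publication before any real use.

```python
# ASSUMPTION: cutoffs for ten fields of 0.274 mm^2 as we read them in Elston and
# Ellis's Table 2. Verify against the original table before any real use.
REFERENCE_FIELD_MM2 = 0.274
FIELDS_COUNTED = 10
REFERENCE_CUTOFFS = (10, 20)  # lowest equivalent counts scoring 2 and 3, respectively


def mitoses_per_mm2(count: int, field_area_mm2: float, fields: int = FIELDS_COUNTED) -> float:
    """Normalize a raw count over `fields` high-power fields to mitotic figures per mm^2."""
    if field_area_mm2 <= 0 or fields <= 0:
        raise ValueError("field area and number of fields must be positive")
    return count / (fields * field_area_mm2)


def score_from_density(per_mm2: float) -> int:
    """Convert a density to an equivalent count for ten reference fields and score it.

    ASSUMPTION: fractional equivalents are compared directly with the cutoffs,
    so, for example, 9.5 equivalent mitoses scores 1.
    """
    # Rounding suppresses floating-point noise exactly at the cutoffs.
    equivalent = round(per_mm2 * FIELDS_COUNTED * REFERENCE_FIELD_MM2, 6)
    score2_min, score3_min = REFERENCE_CUTOFFS
    if equivalent < score2_min:
        return 1
    if equivalent < score3_min:
        return 2
    return 3


if __name__ == "__main__":
    # Hypothetical counts from two observers whose microscopes have different field areas
    print(score_from_density(mitoses_per_mm2(24, 0.344)))  # 2 (about 19.1 equivalent mitoses)
    print(score_from_density(mitoses_per_mm2(8, 0.105)))   # 3 (about 20.9 equivalent mitoses)
```

The example shows how two similar densities (about 7.0 and 7.6 mitoses per mm²) can fall on opposite sides of a cutoff, which is consistent with normalization alone producing only a slight gain in agreement.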
The variables responsible for differences among observers in mitotic counts are numerous and include inexact criteria for identifying mitotic figures, staining quality, section quality and thickness, nonrandom distribution of mitotic figures within the histologic section, tumor cell size, amount of stroma, and microscopic field area.12,13 In a further analysis of the reproducibility of mitotic count scores, the numbers of mitotic figures per mm² were calculated and compared with the scores determined for the high-power field area (0.274 mm²) originally used by Elston and Ellis.4 Interestingly, the reproducibility of the mitotic count scores improved only slightly when they were calculated per mm², compared with the scores the participants selected using the field area nearest to their own among those provided by Elston and Ellis.4 This indicates that the mitotic count cutoff points provided for the three different field areas are satisfactory and that mitotic counts per mm² need not be calculated.

Moderate to substantial agreement (pairwise κ range 0.43-0.74) was found for histologic grade, and the median pairwise percent agreement was 71%. Studies using the Bloom and Richardson system, with its less precisely defined criteria, have yielded agreement between two pathologists of 50.6%, 54%, 66%, 72%, and 78%.1,10,11,14,15 Of these five studies, the only one that limited the evaluation to IDC reported a pairwise agreement of 72%.10 One other study examined the reproducibility among six pathologists who graded IDC using the Bloom and Richardson system.9 Its generalized κ value for grade (0.30) was considerably lower than that found in our study (0.55). In a recent study, 10 slides from 10 IDC were submitted to 25 private practice pathologists.16 Using an earlier Elston and Ellis scheme, there was >87% agreement for 8 of the 10 IDC.
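The pairwise comparisons discussed above (percent agreement and κ for each pair of observers, summarized by the median) can be sketched as follows. We assume that "pairwise κ" means unweighted Cohen's κ; the source does not state whether a weighted form was used. The function names and the rater data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from statistics import median
from typing import List, Sequence, Tuple


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    if len(a) != len(b) or not a:
        raise ValueError("both raters must score the same, non-empty set of cases")
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's own marginal distribution of scores.
    p_exp = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_exp == 1:
        raise ValueError("kappa is undefined when both raters use one identical score")
    return (p_obs - p_exp) / (1 - p_exp)


def pairwise_summary(scores_by_rater: Sequence[Sequence[int]]) -> List[Tuple[int, int, float, float]]:
    """(rater i, rater j, percent agreement, kappa) for every pair; 15 pairs for six raters."""
    results = []
    for (i, a), (j, b) in combinations(enumerate(scores_by_rater), 2):
        percent = 100 * sum(x == y for x, y in zip(a, b)) / len(a)
        results.append((i, j, percent, cohen_kappa(a, b)))
    return results


if __name__ == "__main__":
    # Hypothetical grades from three raters (rows) for five cases (columns)
    raters = [
        [1, 2, 3, 2, 3],
        [1, 2, 3, 3, 3],
        [2, 2, 3, 2, 1],
    ]
    summary = pairwise_summary(raters)
    for i, j, percent, kappa in summary:
        print(f"raters {i}-{j}: {percent:.0f}% agreement, kappa {kappa:.2f}")
    print("median % agreement:", median(row[2] for row in summary))
```

With six raters the same function returns 15 pairs, which is the population summarized by the ranges and medians reported for pairwise agreement.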
In our study and in others, disagreements by more than one grade category were uncommon, typically occurring in no more than 5% of cases.10,15

Our study was designed to test rigorously the interobserver reproducibility of grading invasive breast cancer. Hence, we restricted the cases to IDC, excluding the "special" types of invasive breast cancer, which, if included, might have yielded higher κ values. Other measures might also have produced more impressive statistics: limiting the number of slides per case, identifying specific areas on the slides to be evaluated, issuing a training set of slides, and holding microscopic teaching sessions for all participants would likely have increased the reproducibility of grade and each of its components. However, such measures would be more artificial and less likely to reflect the "everyday" experience of grading breast cancer. Sources of variability in grading, including intraobserver variability, were not examined. These sources might be identified by a more detailed analysis, such as the pathtracking method that Cramer and colleagues17 applied to examine the diagnostic variability in classifying common ovarian cancers.

TABLE 3. PAIRWISE PERCENT AGREEMENT AND PAIRWISE KAPPA VALUES FOR GRADE AND ITS THREE COMPONENTS
Values are numbers of observer pairs; each row sums to 15, the number of pairs among six pathologists.

Pairwise % agreement    31-40    41-50    51-60    61-70    71-80    81-90
Grade                     0        0        0        6        8        1
Tubule formation          0        0        0        1        6        8
Nuclear pleomorphism      1        0        5        3        6        0
Mitotic count             0        0        1        8        5        1

Pairwise κ              0-0.20    0.21-0.40    0.41-0.60    0.61-0.80    0.81-1.0
Grade                     0          0           12            3            0
Tubule formation          0          1            4           10            0
Nuclear pleomorphism      1          8            5            1            0
Mitotic count             0          1           12            2            0

REFERENCES

1. Davis BW, Gelber RD, Goldhirsch A, et al. Prognostic significance of tumor grade in clinical trials of adjuvant therapy for breast cancer with axillary lymph node metastasis. Cancer 1986;58:2662-2670.
2. Contesso G, Mouriesse H, Friedman S, et al. The importance of histologic grade in long-term prognosis of breast cancer: A study of 1,010 patients, uniformly treated at the Institut Gustave-Roussy. J Clin Oncol 1987;5:1378-1386.
3. Le Doussal V, Tubiana-Hulin M, Friedman S, et al. Prognostic value of histologic grade nuclear components of Scarff-Bloom-Richardson (SBR): An improved score modification based on a multivariate analysis of 1262 invasive ductal breast carcinomas. Cancer 1989;64:1914-1921.
4. Elston CW, Ellis IO. Pathological prognostic factors in breast cancer. I. The value of histological grade in breast cancer: Experience from a large study with long-term follow-up. Histopathology 1991;19:403-410.
5. Bloom HJG, Richardson WW. Histological grading and prognosis in breast cancer: A study of 1409 cases of which 359 have been followed for 15 years. Br J Cancer 1957;11:359-377.
6. Elston CW. Grading of invasive carcinoma of the breast. In: Page DL, Anderson TJ, eds. Diagnostic Histopathology of the Breast. Edinburgh: Churchill Livingstone, 1987, pp 300-311.
7. Fleiss JL. Statistical Methods for Rates and Proportions. New York: John Wiley & Sons, 1981, pp 212-236.
8. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159-174.
9. Delides GS, Garas G, Georgouli G, et al. Intralaboratory variations in the grading of breast carcinoma. Arch Pathol Lab Med 1982;106:126-128.
10. Theissig F, Kunze KD, Haroske G, Meyer W. Histological grading of breast cancer: Interobserver, reproducibility and prognostic significance. Pathol Res Pract 1990;186:732-736.
11. Harvey JM, de Klerk NH, Sterrett GF. Histological grading in breast cancer: Interobserver agreement, and relation to other prognostic factors including ploidy. Pathology 1992;24:63-68.
12. Simpson JF, Dutt PL, Page DL. Expression of mitoses per thousand cells and cell density in breast carcinomas: A proposal. Hum Pathol 1992;23:608-611.
13. van Diest PJ, Baak JPA, Matze-Cok P, et al. Reproducibility of mitoses counting in 2,469 breast cancer specimens: Results from the Multicenter Morphometric Mammary Carcinoma Project. Hum Pathol 1992;23:603-607.
14. Stenkvist B, Westman-Naeser S, Vegelius J, et al. Analysis of reproducibility of subjective grading system for breast carcinoma. J Clin Pathol 1979;32:979-985.
15. Hopton DS, Thorogood J, Clayden AD, Mackinnon D. Observer variation in histologic grading of breast cancer. Eur J Surg Oncol 1989;15:21-23.
16. Dalton LW, Page DL, Dupont WD. Histologic grading of breast carcinoma: A reproducibility study. Cancer 1994;73:2765-2770.
17. Cramer SF, Roth LM, Mills SE. Sources of variability in classifying common ovarian cancers using the World Health Organization classification: Application of the pathtracking method. Pathol Annu 1993;28(Pt 2):243-286.