ANATOMIC PATHOLOGY
Original Article
Interobserver Reproducibility of the
Nottingham Modification of the Bloom and
Richardson Histologic Grading Scheme for
Infiltrating Ductal Carcinoma
HENRY F. FRIERSON, JR, MD,1 ROBERT A. WOLBER, MD,2 KENNETH W. BEREAN, MD,2
DOUGLAS W. FRANQUEMONT, MD,3 MICHAEL J. GAFFEY, MD,1 JAMES C. BOYD, MD,1
AND DAVID C. WILBUR, MD4
The interobserver reproducibility of the Nottingham modification of the Bloom and Richardson histologic grading scheme for invasive breast carcinoma was tested. Six surgical pathologists from four institutions independently evaluated histologic grade and each of its three components for 75 infiltrating ductal carcinomas. The number of slides per case ranged from one to nine (median 3). For histologic grade, pairwise κ values ranged from moderate to substantial agreement (0.43-0.74). Generalized κ values indicated substantial agreement for tubule formation (0.64), moderate agreement for mitotic count (0.52), and nearly moderate agreement for nuclear pleomorphism (0.40). Normalizing the mitotic counts per mm² produced only a slight improvement in agreement over scoring with the published mitotic count ranges for three different field areas. The results suggest that better discrimination between the categories of nuclear pleomorphism would likely improve the interobserver reproducibility of histologic grade. Nevertheless, the Nottingham modification of the Bloom and Richardson grading system is recommended as a suitable scheme for evaluating invasive breast carcinomas in the routine clinical setting. (Key words: Breast cancer; Grade; Reproducibility) Am J Clin Pathol 1995;103:195-198.
Histologic grade has been shown to be an important independent prognostic factor for women who have infiltrating mammary carcinoma.1-4 Studies that do not analyze grade as an independent prognostic variable have presumably been influenced by reports showing poor reproducibility of histologic grading. Unfortunately, no single universal grading system is currently used by pathologists for invasive breast cancer. Among the histologic and nuclear grading schemes that have been employed, the Nottingham modification4 of the Bloom and Richardson5 system seems to be the most clearly defined. However, no multi-institutional study of the interobserver reproducibility of the Nottingham system has been undertaken for a large number of invasive breast cancers. In this study, six surgical pathologists from four institutions independently evaluated grade and each of its three components for 75 infiltrating ductal carcinomas (IDC). The study was limited to IDC because this histologic type of invasive mammary carcinoma is the most common and is perhaps the most difficult to grade with acceptable reproducibility.
MATERIALS AND METHODS
Surgical pathologists from the University of Virginia Health
Sciences Center (HFF and MJG), University of Rochester
(DCW), Washington University (DWF), and University of
British Columbia (KWB and RAW) participated in this interobserver reproducibility study. Each independently reviewed slides (range 1-9 per case; median 3) of adequate quality from 75 IDC obtained by excisional biopsy, mastectomy, or, less often, incisional biopsy. The IDC were selected by one of the authors (HFF) solely on the basis of slide availability. The specimens were fixed in formalin or zinc formalin.
From the 1Departments of Pathology, University of Virginia Health Sciences Center, Charlottesville, Virginia; 2University of British Columbia, Vancouver, British Columbia; 3Washington University, St. Louis, Missouri; and 4University of Rochester Medical Center, Rochester, New York.
Manuscript received February 23, 1994; revision accepted April 21, 1994.
Address reprint requests to Dr. Frierson: Department of Pathology, Box 214, University of Virginia Health Sciences Center, Charlottesville, VA 22908.

Each participant was given a copy of the publication describing the updated version of the Nottingham grading system.4 In addition, each participant was referred to the illustrations of the histologic features of the three grades in chapter 17 of Diagnostic Histopathology of the Breast by David L. Page and Thomas J. Anderson.6
Several written instructions for histologic grading were also provided to the participants (some of these details had appeared in the Elston and Ellis publication).4 These guidelines included the following: For scoring tubule formation, the overall appearance of the neoplasm was to be taken into consideration. For nuclear pleomorphism, the areas of the cancer containing cells with the greatest atypia were to be evaluated. Mitotic figures were to be counted only at the periphery of each IDC. Counting was to begin in the most mitotically active area (fields that subjectively had the highest density of mitotic figures), starting with a field containing one or more mitotic figures. Ten high-power (×400) fields were to be counted in approximately the same area, although they did not need to be contiguous. The fields were to be filled with as much tumor as possible. Poorly preserved areas, if present, were to be avoided. Only bona fide mitotic figures were to be counted, and cells in prophase were to be ignored. The appropriate mitotic count scores were to be selected from the row of Table 2 in the Elston and Ellis publication whose high-power field area was nearest that of the participant's microscope.4
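Choosing the mitotic count score from the tabulated field area nearest one's own amounts to a table lookup. The Python sketch below illustrates this step. The three reference areas and their cutoffs are our reading of Table 2 in Elston and Ellis4 and are assumptions that should be checked against that publication. The names CUTOFFS_BY_AREA, nearest_reference_area, and mitotic_score are hypothetical.

```python
# Assumed cutoffs, read from Elston and Ellis Table 2 (verify before use):
# field area in mm^2 -> (highest count scoring 1, highest count scoring 2)
# for mitotic figures counted in ten high-power fields.
CUTOFFS_BY_AREA = {
    0.152: (5, 10),
    0.274: (9, 19),
    0.312: (11, 22),
}


def nearest_reference_area(field_area_mm2: float) -> float:
    """Return the tabulated field area closest to the observer's own field area."""
    return min(CUTOFFS_BY_AREA, key=lambda ref: abs(ref - field_area_mm2))


def mitotic_score(count_per_10_hpf: int, field_area_mm2: float) -> int:
    """Score a ten-field mitotic count (1-3) using the nearest tabulated field area."""
    highest_1, highest_2 = CUTOFFS_BY_AREA[nearest_reference_area(field_area_mm2)]
    if count_per_10_hpf <= highest_1:
        return 1
    if count_per_10_hpf <= highest_2:
        return 2
    return 3


# A participant with a 0.344 mm^2 field is matched to the 0.312 mm^2 row.
print(nearest_reference_area(0.344), mitotic_score(15, 0.344))  # prints 0.312 2
```

Under these assumed cutoffs, participants with small fields (for example 0.105 or 0.196 mm²) would all be scored on the 0.152 mm² row.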
According to the Nottingham scheme,4 the grade was obtained by summing the scores for tubule formation, nuclear
pleomorphism, and mitotic count, each of which was given 1,
2, or 3 points. Briefly, when most of an IDC (>75%) was composed of tubules, a score of 1 point was given. Two points were
given when tubule formation was present in moderate amounts
(10% to 75%), and 3 points were given when an IDC lacked or
showed only minimal tubule formation (<10%). For nuclear
pleomorphism, an IDC having nuclei with minimal variation
in size and shape was given 1 point. An IDC with moderate
pleomorphism was given 2 points, and those with marked variation were given 3 points. The mitotic count score ranged from
1 to 3. Grade I IDC had 3 to 5 points, grade II neoplasms had 6
or 7 points, and grade III tumors had 8 or 9 points.
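The point summation described above can be written as a short Python sketch. We score exactly 10% and exactly 75% tubule formation as 2. This follows our reading of the stated ranges (">75%", "10% to 75%", "<10%") and is an assumption. The function names are hypothetical.

```python
def tubule_score(percent_tubules: float) -> int:
    """Tubule formation score from the percentage of tumor composed of tubules."""
    if percent_tubules > 75:
        return 1
    if percent_tubules >= 10:
        return 2
    return 3


def nottingham_grade(tubules: int, pleomorphism: int, mitoses: int) -> str:
    """Sum three component scores (each 1-3) and map the total (3-9) to grade I-III."""
    for score in (tubules, pleomorphism, mitoses):
        if score not in (1, 2, 3):
            raise ValueError(f"component score must be 1, 2, or 3, got {score}")
    total = tubules + pleomorphism + mitoses
    if total <= 5:
        return "I"
    if total <= 7:
        return "II"
    return "III"


# An IDC with 5% tubules, marked pleomorphism, and mitotic score 2: 3 + 3 + 2 = 8.
print(nottingham_grade(tubule_score(5), 3, 2))  # prints III
```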
Each participant also recorded the total number of mitotic figures per 10 high-power (×400) fields for each case, together with the brand and type of microscope used and its high-power field area. The number of mitotic figures per mm² was subsequently calculated, and the high-power field area originally used by Elston and Ellis4 (0.274 mm²) was employed to convert these values into mitotic count scores (1 to 3). Finally, the participants were encouraged to provide comments on any particular case.
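The sketch below shows one way to normalize and rescore the counts. It assumes that each ten-field count is divided by ten times the participant's field area, and that the resulting density is rescored by re-expressing it as a count over ten fields of 0.274 mm². The cutoffs of 0-9, 10-19, and 20 or more figures for that area are our reading of Elston and Ellis and should be verified. The function names are hypothetical.

```python
REFERENCE_AREA_MM2 = 0.274  # Elston and Ellis field area, as stated in the text
FIELDS_COUNTED = 10         # ten high-power fields per case, as stated in the text


def mitoses_per_mm2(count_per_10_hpf: int, field_area_mm2: float) -> float:
    """Convert a count over ten fields into mitotic figures per mm^2."""
    return count_per_10_hpf / (FIELDS_COUNTED * field_area_mm2)


def score_at_reference_area(density_per_mm2: float) -> int:
    """Score (1-3) a density re-expressed as a count over ten 0.274 mm^2 fields.

    Assumes cutoffs of 0-9, 10-19 and >=20 figures at 0.274 mm^2.
    """
    equivalent_count = density_per_mm2 * FIELDS_COUNTED * REFERENCE_AREA_MM2
    # Round away floating-point noise so that exact cutoff counts are not misclassified.
    equivalent_count = round(equivalent_count, 9)
    if equivalent_count < 10:
        return 1
    if equivalent_count < 20:
        return 2
    return 3


# 25 figures in ten 0.344 mm^2 fields is about 7.27/mm^2, i.e. about 19.9 figures
# in ten 0.274 mm^2 fields.
print(score_at_reference_area(mitoses_per_mm2(25, 0.344)))  # prints 2
```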
Interobserver agreement for tumor grade, tubule formation, nuclear pleomorphism, and mitotic count was tested using pairwise κ and generalized κ statistics.7 The divisions of the κ statistic that provide "benchmarks" for strength of agreement were described by Landis and Koch8:
Kappa Statistic     Strength of Agreement
<0.00               Poor
0.00-0.20           Slight
0.21-0.40           Fair
0.41-0.60           Moderate
0.61-0.80           Substantial
0.81-1.00           Almost Perfect
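We assume that the generalized κ of Fleiss7 is the multirater statistic commonly called Fleiss' κ. This statistic compares the observed proportion of agreeing rater pairs per case with the agreement expected from the pooled category proportions. The Python sketch below computes it and maps a value onto the Landis and Koch benchmarks. It is an illustration, not the authors' analysis code, and the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[int]]) -> float:
    """Generalized (Fleiss) kappa; ratings[i] lists every rater's category for case i."""
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(case) != n_raters for case in ratings):
        raise ValueError("each case needs the same number (>= 2) of ratings")
    category_totals: Counter = Counter()
    observed = 0.0
    for case in ratings:
        counts = Counter(case)
        category_totals.update(counts)
        # Fraction of rater pairs that agree on this case.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        observed += agreeing_pairs / (n_raters * (n_raters - 1))
    observed /= n_cases
    total = n_cases * n_raters
    # Chance agreement from the pooled category proportions.
    expected = sum((t / total) ** 2 for t in category_totals.values())
    if expected == 1:
        raise ValueError("kappa is undefined when every rating is the same category")
    return (observed - expected) / (1 - expected)


def landis_koch(kappa: float) -> str:
    """Map a kappa value onto the Landis and Koch strength-of-agreement labels."""
    if kappa < 0:
        return "Poor"
    for upper, label in ((0.20, "Slight"), (0.40, "Fair"),
                         (0.60, "Moderate"), (0.80, "Substantial")):
        if kappa <= upper:
            return label
    return "Almost Perfect"


print(landis_koch(0.55))  # prints Moderate
```

With six pathologists, each inner list would hold six scores for one IDC.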
Pairwise percent agreement for histologic grade, tubule formation, nuclear pleomorphism, and mitotic count was also calculated, as were the numbers of cases with consensus 4 (diagnostic agreement by four or more of the six pathologists), consensus 5 (agreement by five or more), and unanimous diagnoses.
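Six pathologists form 15 pairs, so each parameter yields 15 pairwise values. The Python sketch below computes these pairwise values and a per-case consensus level. It assumes unweighted Cohen's κ for the pairwise comparisons, because the text does not say whether weighting was used. It also defines a case's consensus level as the size of the largest group of pathologists giving the same score. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Sequence, Tuple


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Unweighted Cohen's kappa for two raters who scored the same cases."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Counter returns 0 for categories rater b never used.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (observed - expected) / (1 - expected)


def pairwise_stats(by_rater: Sequence[Sequence[int]]) -> List[Tuple[int, int, float, float]]:
    """(rater i, rater j, percent agreement, kappa) for every pair of raters."""
    results = []
    for i, j in combinations(range(len(by_rater)), 2):
        a, b = by_rater[i], by_rater[j]
        percent = 100 * sum(x == y for x, y in zip(a, b)) / len(a)
        results.append((i, j, percent, cohen_kappa(a, b)))
    return results


def consensus_level(case_scores: Sequence[int]) -> int:
    """Largest number of raters giving the same score to one case (6 = unanimous)."""
    return max(Counter(case_scores).values())
```

Under this definition, a case counts toward consensus 4 when consensus_level returns 4 or more, and toward unanimous diagnoses when it returns 6.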
RESULTS
Consensus 4 agreement for histologic grade and its three components is presented in Table 1. Of the IDC with consensus 4 agreement, 23% were grade I, 32% were grade II, and 45% were grade III. Only 4% had a tubule formation score of 1, and 7% had a nuclear pleomorphism score of 1. Slightly more than half had a mitotic count score of 3. Consensus agreement for grade could not be reached for 11.5% of IDC (Table 2). There was consensus agreement for tubule formation and nuclear pleomorphism for 92% and 95% of the neoplasms, respectively. Consensus agreement was lowest for mitotic rate (81%).

TABLE 1. CONSENSUS-4 AGREEMENT FOR GRADE AND ITS THREE COMPONENTS

Parameter                  Number of Cases (%)
Grade
  1                        15 (23)
  2                        21 (32)
  3                        30 (45)
Tubule formation
  1                        3 (4)
  2                        31 (45)
  3                        35 (51)
Nuclear pleomorphism
  1                        5 (7)
  2                        37 (52)
  3                        29 (41)
Mitotic count
  1                        19 (31)
  2                        10 (16)
  3                        32 (53)
Pairwise percent agreement and pairwise κ values are presented in Table 3. The median pairwise percent agreement for grade was 71%; for tubule formation, 81%; for nuclear pleomorphism, 64%; and for mitotic rate, 67%. Generalized κ values for grade, tubule formation, nuclear pleomorphism, and
mitotic count were 0.55, 0.64, 0.40, and 0.52, respectively.
Hence, the strength of agreement for grade and mitotic count
was moderate, whereas that for nuclear pleomorphism approached moderate. The strongest agreement (substantial) was
seen for tubule formation.
Using the microscopic high-power (×400) field area for each participant (0.344, 0.196, 0.344, 0.105, 0.178, and 0.344 mm²), the mitotic counts were normalized (number of mitotic figures per mm²) for each case. The field area originally used by Elston and Ellis4 (0.274 mm²) was then used to convert these normalized counts into mitotic count scores (1 to 3). With this approach, the improvement in reproducibility for mitotic count scores was slight: the pairwise percent agreement ranged from 63% to 81% (median 72%), and the generalized κ value improved only from 0.52 to 0.55.
DISCUSSION
The Nottingham/Tenovus Primary Breast Cancer Study has shown that histologic grade is an independent prognostic factor for women with invasive mammary carcinoma. In combination with tumor size and lymph node stage, grade has been used to generate a useful prognostic index.4 The histologic grading scheme is a modification of the Bloom and Richardson5 system. Its most notable improvement is the assignment of points for mitotic counts according to the high-power field areas of three types of microscopes. Although several reproducibility studies of the Bloom and Richardson system have been published, there has been no multi-institutional evaluation of the interobserver reproducibility of grade and its three components under the Nottingham scheme for a large series of IDC. In our study, each of six surgical pathologists from four institutions examined grade and each of its components for 75 IDC.

A.J.C.P.-February 1995

TABLE 2. DIAGNOSIS AGREEMENT FOR GRADE AND ITS THREE COMPONENTS

                         Rate of Agreement by Participants (n = 6)
Parameter                6/6        5/6        4/6          ≥4/6         <4/6
Grade                    30 (40)    17 (23)    19 (25.5)    66 (88.5)    9 (11.5)
Tubule formation         41 (54)    17 (23)    11 (15)      69 (92)      6 (8)
Nuclear pleomorphism     18 (24)    22 (30)    31 (41)      71 (95)      4 (5)
Mitotic rate             30 (40)    12 (16)    19 (25)      61 (81)      14 (19)

Values are numbers of cases of infiltrating ductal carcinoma; values in parentheses are percentages.
We found that tubule formation was the most reproducible of the three features of histologic grade (generalized κ = 0.64). This substantial agreement may have been due in part to the semiquantitative scoring of tubule formation (>75% tubule formation = 1 point; 10% to 75% = 2 points; and <10% = 3 points). Using the Bloom and Richardson system, others have found that scoring tubule formation is more reproducible than scoring either nuclear pleomorphism or mitotic count.9-11
The reproducibility for scoring nuclear pleomorphism was
clearly inferior to that for grade as well as for each of the other
two components. In two studies of the Bloom and Richardson
scheme, agreement among pathologists was also worst for
nuclear pleomorphism.9" It would seem that improvement in
analyzing nuclear pleomorphism would improve histologic grading. More precise definitions and liberal use of published photomicrographs might enhance agreement in scoring this parameter. In the future, image analysis of nuclear size and contour, chromatin distribution, and nucleolar size might lead to a more accurate assessment of nuclear pleomorphism as a component of histologic grade.
The generalized κ value of 0.52 for mitotic count indicated moderate reproducibility. Many variables contribute to differences among observers in mitotic counts. These include inexact criteria for identifying mitotic figures, staining quality, section quality and thickness, nonrandom distribution of mitotic figures within the histologic section, tumor cell size, amount of stroma, and microscopic field area.12,13 In a further analysis of the reproducibility of mitotic count scores, the numbers of mitotic figures per mm² were calculated and converted to scores using the high-power field area originally used by Elston and Ellis4 (0.274 mm²). Interestingly, reproducibility improved only slightly with these per-mm² scores, compared with the scores that participants had selected using the Elston and Ellis field area nearest their own.4 This indicates that the mitotic count cutoff points provided for the three different field areas are satisfactory and that mitotic counts per mm² need not be calculated.
Moderate to substantial agreement (pairwise κ range 0.43-0.74) was found for histologic grade, with a median pairwise percent agreement of 71%. The Bloom and Richardson system has less precisely defined criteria, and studies using it have yielded agreement between two pathologists of 50.6%, 54%, 66%, 72%, and 78%.1,10,11,14,15 Of these five studies, the only one that limited the evaluation to IDC reported a pairwise agreement of 72%.10 One other study examined the reproducibility of six pathologists who graded IDC using the Bloom and Richardson system.9 Its generalized κ value for grade (0.30) was considerably lower than that found in our study (0.55). In a recent study, 10 slides from 10 IDC were submitted to 25 private practice pathologists.16 Using an earlier version of the Elston and Ellis scheme, these pathologists achieved >87% agreement for 8 of the 10 IDC. In our study and in others, disagreements of more than one grade category were uncommon, typically occurring in no more than 5% of cases.10,15
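With three grades, a disagreement of more than one category can only mean grade I versus grade III. The hypothetical helper below shows one way to count such disagreements. It assumes grades coded 1 to 3 and a case-level definition, in which a case is flagged if any two pathologists differ by two grades. A pairwise definition would count pathologist pairs instead.

```python
from typing import Sequence


def two_grade_disagreement_rate(grades_by_case: Sequence[Sequence[int]]) -> float:
    """Fraction of cases in which some pair of raters differs by two grades (I vs III)."""
    flagged = sum(1 for grades in grades_by_case if max(grades) - min(grades) >= 2)
    return flagged / len(grades_by_case)


print(two_grade_disagreement_rate([[1, 1, 2], [1, 3, 2], [3, 3, 3], [2, 2, 2]]))  # prints 0.25
```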
Our study was designed to rigorously test the interobserver reproducibility of grading invasive breast cancer. We therefore restricted the cases to IDC and excluded the "special" types of invasive breast cancer, which, if included, might have resulted in higher κ values. Other measures might also have led to more impressive statistical values. These include limiting the number of slides per case, identifying specific areas on the slides to be evaluated, issuing a training set of slides, and holding microscopic teaching sessions for all participants. Such measures would likely have increased reproducibility for grade and each of its components. However, they would be more artificial and less likely to reflect the "everyday" experience of grading breast cancer.

TABLE 3. PAIRWISE PERCENT AGREEMENT AND PAIRWISE KAPPA VALUES FOR GRADE AND ITS THREE COMPONENTS

Pairwise % Agreement    Grade    Tubule Formation    Nuclear Pleomorphism    Mitotic Count
31-40                   0        0                   1                       0
41-50                   0        0                   0                       0
51-60                   0        0                   5                       1
61-70                   6        1                   3                       8
71-80                   8        6                   6                       5
81-90                   1        8                   0                       1

Pairwise Kappa          Grade    Tubule Formation    Nuclear Pleomorphism    Mitotic Count
0-0.20                  0        0                   1                       0
0.21-0.40               0        1                   8                       1
0.41-0.60               12       4                   5                       12
0.61-0.80               3        10                  1                       2
0.81-1.0                0        0                   0                       0

Entries are numbers of pathologist pairs (15 pairs among the six pathologists) falling in each range.

Vol. 103-No. 2
Sources of variability in grading, including intraobserver variability, were not examined. These sources might be identified by a more detailed analysis, such as the pathtracking method that Cramer and colleagues17 applied to examine diagnostic variability in classifying common ovarian cancers.
REFERENCES
1. Davis BW, Gelber RD, Goldhirsch A, et al. Prognostic significance of tumor grade in clinical trials of adjuvant therapy for breast cancer with axillary lymph node metastasis. Cancer 1986;58:2662-2670.
2. Contesso G, Mouriesse H, Friedman S, et al. The importance of histologic grade in long-term prognosis of breast cancer: A study of 1,010 patients, uniformly treated at the Institut Gustave-Roussy. J Clin Oncol 1987;5:1378-1386.
3. Le Doussal V, Tubiana-Hulin M, Friedman S, et al. Prognostic value of histologic grade nuclear components of Scarff-Bloom-Richardson (SBR): An improved score modification based on a multivariate analysis of 1262 invasive ductal breast carcinomas. Cancer 1989;64:1914-1921.
4. Elston CW, Ellis IO. Pathological prognostic factors in breast cancer. I. The value of histological grade in breast cancer: Experience from a large study with long-term follow-up. Histopathology 1991;19:403-410.
5. Bloom HJG, Richardson WW. Histological grading and prognosis in breast cancer: A study of 1409 cases of which 359 have been followed for 15 years. Br J Cancer 1957;11:359-377.
6. Elston CW. Grading of invasive carcinoma of the breast. In: Page DL, Anderson TJ, eds. Diagnostic Histopathology of the Breast. Edinburgh: Churchill Livingstone, 1987, pp 300-311.
7. Fleiss JL. Statistical Methods for Rates and Proportions. New York: John Wiley & Sons, 1981, pp 212-236.
8. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159-174.
9. Delides GS, Garas G, Georgouli G, et al. Intralaboratory variations in the grading of breast carcinoma. Arch Pathol Lab Med 1982;106:126-128.
10. Theissig F, Kunze KD, Haroske G, Meyer W. Histological grading of breast cancer: Interobserver, reproducibility and prognostic significance. Pathol Res Pract 1990;186:732-736.
11. Harvey JM, de Klerk NH, Sterrett GF. Histological grading in breast cancer: Interobserver agreement, and relation to other prognostic factors including ploidy. Pathology 1992;24:63-68.
12. Simpson JF, Dutt PL, Page DL. Expression of mitoses per thousand cells and cell density in breast carcinomas: A proposal. Hum Pathol 1992;23:608-611.
13. van Diest PJ, Baak JPA, Matze-Cok P, et al. Reproducibility of mitoses counting in 2,469 breast cancer specimens: Results from the Multicenter Morphometric Mammary Carcinoma Project. Hum Pathol 1992;23:603-607.
14. Stenkvist B, Westman-Naeser S, Vegelius J, et al. Analysis of reproducibility of subjective grading system for breast carcinoma. J Clin Pathol 1979;32:979-985.
15. Hopton DS, Thorogood J, Clayden AD, Mackinnon D. Observer variation in histologic grading of breast cancer. Eur J Surg Oncol 1989;15:21-23.
16. Dalton LW, Page DL, Dupont WD. Histologic grading of breast carcinoma: A reproducibility study. Cancer 1994;73:2765-2770.
17. Cramer SF, Roth LM, Mills SE. Sources of variability in classifying common ovarian cancers using the World Health Organization classification: Application of the pathtracking method. Pathol Annu 1993;28(Pt 2):243-286.