MICCAI Grand challenge: Assessment of mitosis detection algorithms (AMIDA13)

This is a printer-friendly PDF version of the content of the website http://amida13.isi.uu.nl
Version date: 17-04-13

Organizers:
Mitko Veta, Max A. Viergever, Josien P.W. Pluim (Image Sciences Institute, University Medical Center Utrecht)
Nikolaos Stathonikos, Paul J. van Diest (Pathology Department, University Medical Center Utrecht)

Important dates:
• March 28th – Training set available for download
• May 27th – Testing set available for download
• July 22nd – Conference early bird registration deadline
• September 8th – Deadline for submission of results that will be presented at the workshop
• September 22nd – Workshop date

Introduction

Histologic tumor grading systems assess the differentiation of tumors, i.e., how closely they resemble normal tissue when examined under a microscope. Generally, patients with well-differentiated tumors have better outcomes, and vice versa. Clinicians use the histologic grade, among other factors, to estimate the patient's prognosis and to develop individual treatment plans.

The most widely used system for histologic grading of invasive breast cancer (BC) is the Bloom & Richardson grading system (B&R). It consists of three components: nuclear pleomorphism, degree of tubule formation and mitotic activity.

Mitotic activity is one of the strongest prognosticators for invasive breast carcinoma. It is expressed as the number of mitotic figures per tissue area. As part of the B&R grading system, mitotic activity is routinely assessed in pathology labs across the world. In addition, mitotic activity can be used as a prognosticator independently of the B&R grading system. Although it has strong prognostic value for invasive breast carcinoma, mitosis counting is a tedious task that is prone to observer variability. With the advent of digital imaging in pathology, which has enabled cost- and time-efficient digitization of whole histological slides, automatic image analysis has been suggested as a way to tackle these problems.

Challenge goals

The goal of this challenge is to evaluate and compare (semi-)automatic mitotic figure detection methods that work on regions extracted from whole-slide images. It is our strong belief that providing open access to a high-quality annotated dataset can lead to major advances in the development of a successful mitosis detection method.

Since only the number of mitotic figures present in the tissue is of importance (i.e., the size and shape of the mitotic figures are not of interest), we formulated the challenge as a detection problem. The ground truth is provided in the form of locations of ground truth objects, and the results are requested in the same format. In the spirit of cooperative scientific progress, we want to evaluate whether different methods work better in different situations, and whether a combination of methods and ideas can improve the results.

Challenge format

Teams or individuals interested in participating in the challenge can register on this website. Upon registration, the participants will be able to download the training set (consisting of images and ground truth locations for the mitotic figures) that they can use to develop their method. At the end of May 2013, a testing set (consisting of images only) will become available for download. The participants will be able to run their method on the testing set and send their results for evaluation. All submissions must be accompanied by a short description of the method.
All participants will be invited to attend the challenge workshop on September 26th as part of MICCAI 2013 in Nagoya, Japan. The workshop will consist of presentations of the proposed methods by the participants and a summary of the results by the organizers, followed by a discussion. After the workshop, a summary article describing the proposed methods and results will be written and submitted to a high-impact peer-reviewed journal. The article will include the study setup, a description of the dataset, brief descriptions of the proposed methods and an overview of the results achieved. All teams that have submitted results and participated in the challenge workshop will be assigned two co-authorships. After the challenge workshop is concluded, this website will remain open for additional submissions.

For more information, please refer to the Background, Dataset, Evaluation and Rules pages.

Background

On this page, we give an overview of the tissue/slide preparation process, so that participants without prior experience in the analysis of histopathology images can follow along. We also give a short introduction to mitosis counting: how it is performed in standard clinical practice and what the challenges are.

Tissue preparation

After breast tumor excision is performed in the operating room, the excised material is sent for analysis in a pathology lab. The tissue preparation process starts with making smaller cuts of the material, which are then fixed in formalin and (after processing) embedded in paraffin. Using a high-precision cutting instrument (a microtome), thin sections are cut from the paraffin block and put on glass slides. The final stage of the tissue preparation process is staining the sections with stains that highlight specific structures of the tissue so they are better visible under a microscope. The standard staining protocol uses the hematoxylin and eosin stains (the diagnostic/prognostic procedure for all patients always starts by staining the sections with these stains). Hematoxylin dyes the nuclei a dark purple color, and eosin dyes other structures (cytoplasm, stroma, etc.) a pink color.

[Figure: Preparation of H&E stained histology slides. From top left to bottom right: 1) small cuts are made from the tissue; 2-3) the smaller cuts are put into cassettes and (after processing) embedded in paraffin; 4) thin sections are made from the paraffin blocks with a microtome; 5) the sections are put onto glass slides for staining; 6) H&E stained slide.]

Digital Pathology

Recent years have brought a trend of digitization of histological slides. Digital slide scanners, in combination with digital slide viewers, aim to provide the experience of viewing a digital slide on a computer monitor in a manner analogous to viewing it under a microscope, but with all the added benefits of the digital format (ease of annotation, image analysis, collaborative viewing, etc.). The output of a digital slide scanner is a multi-layered image, stored in a format that enables fast zooming and panning. Depending on the area of the tissue present on the slide and the magnification and resolution at which the slide is scanned, the lowest layer of the digital slide can be up to several tens of thousands of pixels in width or height. Currently, digital slides are mainly used for research, education and remote consultation purposes. Their use for routine diagnosis and prognosis is not yet common; however, that is expected to change in the coming years.
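To give a feel for how such multi-layered digital slides are accessed programmatically, below is a minimal sketch using the open-source OpenSlide Python bindings. This is purely illustrative: the challenge data itself is distributed as plain TIFF images (see the Dataset page), and the filename in the sketch is hypothetical.

```python
# Minimal sketch of reading a multi-resolution digital slide with the
# open-source OpenSlide Python bindings (pip install openslide-python).
# The filename is hypothetical; Aperio scanners typically produce .svs files.
import openslide

slide = openslide.OpenSlide("example_slide.svs")

# Each level of the image pyramid stores the slide at a coarser resolution,
# which is what makes fast zooming and panning possible.
for level in range(slide.level_count):
    width, height = slide.level_dimensions[level]
    print(f"level {level}: {width} x {height} px "
          f"(downsample {slide.level_downsamples[level]:.1f}x)")

# Read a 2000 x 2000 pixel region at full resolution (level 0);
# the location is given in level-0 pixel coordinates.
region = slide.read_region((10000, 10000), 0, (2000, 2000))
region = region.convert("RGB")  # read_region returns an RGBA PIL image
slide.close()
```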
Availability of automatic image analysis algorithms that can aid pathologists in their work can be a major incentive for the acceptance of digital slides in the routine pathology lab workflow.

[Figure: The Aperio ScanScope XT scanner model used at our Pathology Department.]

Mitosis counting

Histological tumor grading systems assess the differentiation of tumors, i.e., how closely they resemble normal tissue when examined under a microscope. Generally, patients with well-differentiated tumors have better outcomes, and vice versa. Clinicians use the histological grade, among other factors, to estimate the patient's prognosis and develop individual treatment plans. The most widely used system for histological grading of invasive breast cancer (BC) is the Bloom & Richardson grading system (B&R). It consists of the assessment of three components: nuclear pleomorphism, degree of tubule formation and mitotic activity.

Mitotic activity is one of the strongest prognosticators for invasive breast carcinoma. It is expressed as the number of mitotic figures per tissue area. Aggressive tumors have a high proliferation rate, which is reflected in a high number of mitotic figures in the histological sections. As part of the B&R grading system, mitotic activity is routinely assessed in pathology labs across the world. In addition, mitotic activity can be used as a prognosticator independently of the B&R grading system.

Typically, the pathologist receives a panel of slides for each case that is to be graded. He or she then selects one slide on which the histological grading will be performed. Mitosis counting is performed in 8-10 consecutive microscope high power fields (depending on the microscope model), which should correspond to an area of 2 mm². The standard guidelines are to select an area that encompasses the most invasive part of the tumor, at the periphery and with the highest cellularity. Depending on the number of figures counted, a mitotic activity score is assigned: cases with 7 or fewer mitotic figures are assigned score 1 (best prognosis), cases with more than 12 mitotic figures are assigned score 3 (worst prognosis), and the intermediate cases are assigned score 2.
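The count-to-score mapping above translates directly into code; a minimal sketch (the function name is ours):

```python
def mitotic_activity_score(mitosis_count: int) -> int:
    """Map a mitotic figure count per 2 mm^2 (8-10 high power fields)
    to a mitotic activity score, using the thresholds given above."""
    if mitosis_count <= 7:
        return 1   # best prognosis
    if mitosis_count <= 12:
        return 2   # intermediate
    return 3       # worst prognosis
```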
Challenges in spotting mitotic figures

Because of the aberrant chromosomal makeup of many tumors (aneusomy, polysomy, translocations, amplifications, deletions), the appearance of mitotic figures in the images can differ significantly from the textbook examples of a splitting nucleus. In addition, imperfections of the tissue preparation process result in variability of tissue appearance, which can present a challenge for an automated mitosis detection system. Some examples of mitotic figures in H&E stained breast cancer sections are shown in the image below.

[Figure: Examples of mitotic figures (marked with green arrows). Note: best viewed on a computer screen.]

Most commonly, mitotic figures appear as hyperchromatic objects. In addition, they lack a clear nuclear membrane, have "hairy" protrusions around the edges, and show basophilia instead of eosinophilia of the surrounding cytoplasm. However, these are more guidelines than hard rules, and the bulk of the training of pathologists is done by looking at specific examples of mitotic figures. One of the main challenges in spotting mitotic figures is that other objects, such as apoptotic nuclei (shown in the images below), can have a similar appearance, making it difficult even for trained experts to make a distinction. Lymphocytes, compressed nuclei, "junk" particles and other artifacts from the tissue preparation process can also have a hyperchromatic appearance.

[Figure: Examples of apoptotic nuclei, most commonly mistaken for mitoses (marked with green arrows). Note: best viewed on a computer screen.]

Useful links

• Grand Challenges in Medical Image Analysis
• Mitosis Detection in Breast Cancer Histopathology Images: An ICPR2012 Contest

Dataset

Patient, slide and region selection

For the formation of the dataset for the mitosis detection challenge, hematoxylin and eosin (H&E) stained slides from 23 invasive breast carcinoma patients were made available. These are invasive breast carcinoma patients who underwent an excision biopsy between July 2009 and January 2010 at the University Medical Center Utrecht. The single patient selection criterion was the availability of the slides in the Pathology Department archive. Please note that we use the routinely prepared H&E sections, which capture the day-to-day variability of the tissue preparation and staining processes.

One expert pathologist selected one representative stained slide per patient and marked a large region of the tumor on the glass slides where mitosis annotation was to be performed. For the larger tumors, the marked areas within the digital slides were selected to encompass the most invasive part of the tumor, at the periphery and with the highest cellularity, which are the standard guidelines for performing mitosis counting. Smaller tumors were included in their entirety. The regions of interest vary in size from 7 mm² to 27 mm². The standard in pathology practice is to count mitotic figures in an area of 2 mm² (translating to 8 to 10 high power fields, depending on the microscope) and report that number as the mitotic activity index. However, in order to annotate as many mitotic figures as possible, the counting was not limited to 2 mm² but was extended to the entire marked area.

Digitization

The digitization of the regions marked for annotation was performed with the Aperio ScanScope XT scanner at 40× magnification and with a spatial resolution of 0.25 μm/pixel. This is one of the most widely used digital slide scanners at present. During scanning, the focus points automatically selected by the scanner were manually revised in order to avoid out-of-focus artifacts and to ensure the best possible image quality. At the time of scanning, high-quality JPEG 2000 compression (quality factor of 85) was used in order to reduce the storage requirements.

Mitosis annotation

Two expert pathologists independently traversed the selected regions on the digital slides and annotated the locations of mitotic figures. This was done using standard digital slide viewing software on consumer-grade computer monitors. The concordant cases (objects that were annotated as mitotic figures by both observers) were taken as ground truth objects directly. The discordant cases (objects that were annotated as mitotic figures by only one of the observers) were presented to a panel of two additional observers, who made the final decision. Note that the two additional observers did not traverse the slides, but only looked at the discordant cases. With this setup, all objects that are accepted as ground truth mitotic figures have been agreed upon by at least two experts. A sketch of this concordant/discordant split is given below.
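The text does not state what distance tolerance was used to decide that two observers annotated the same object; the sketch below borrows the 30-pixel (7.5 μm) criterion used for evaluation later in this document, purely as an assumption, and uses a simple greedy match.

```python
# Sketch of the concordant/discordant split described above: each of
# observer 1's point annotations is matched to the nearest-listed unmatched
# annotation of observer 2 within a tolerance. The 30 px tolerance is an
# assumption; the organizers' actual criterion is not specified.
import math

def split_annotations(obs1, obs2, tol=30.0):
    """obs1, obs2: lists of (x, y) annotations from the two observers.
    Returns (concordant, discordant) point lists."""
    concordant, discordant = [], []
    unmatched2 = list(obs2)
    for p in obs1:
        match = next((q for q in unmatched2 if math.dist(p, q) <= tol), None)
        if match is not None:
            unmatched2.remove(match)
            concordant.append(p)   # annotated by both -> ground truth
        else:
            discordant.append(p)   # observer 1 only -> panel review
    discordant.extend(unmatched2)  # observer 2 only -> panel review
    return concordant, discordant
```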
Dataset format

The annotated regions were exported into separate images (TIFF format), each image representing one high power field (HPF, defined as 0.5×0.5 mm², or 2000×2000 pixels at the 0.25 μm/pixel scan resolution). Since for some cases the total number of HPFs is very high (in the order of several hundreds), only the HPFs that contain at least one mitotic figure were included in the dataset. For the cases that have fewer than 10 HPFs in which a mitotic figure is present, additional "empty" HPFs were included to extend the total number to 10 (in order to include sufficient "background" information, which is necessary for good training and evaluation).

The patients were divided into two groups, one used for training and the other as an independent testing set. The division was done in such a way that the number of mitotic figures in the two groups is balanced. Both the training and testing sets are organized into numbered folders, each folder containing the HPFs and, if applicable, the ground truth data from a single slide (patient). The HPFs are stored as 8-bit RGB TIFF images with PackBits lossless compression. An alternative version of the datasets, with a smaller download size, where the images are stored with light lossy JPEG compression (quality factor of 95), is available for download. Note that this compression is on top of the one used at scan time.

The training HPF images are accompanied by a comma separated value (CSV) file with the same filename but a different extension (.csv), containing the locations of the ground truth mitotic figures. The "empty" HPFs do not have a corresponding CSV file. Each row in the CSV file corresponds to one mitotic figure, and the two columns give the image coordinates of the annotated location.

Evaluation

Once the testing set is available for download, the participants can run their methods and submit results, upon which they will receive the evaluation results. Each participating team or individual can submit results up to three times before the challenge workshop. Each submission must be accompanied by an abstract describing the proposed method. The second and third submissions must be accompanied by a new abstract, or a short description of how the method differs from the previous submissions. After the participants send their results, they will receive the number of true positives, false positives and false negatives for each HPF in the testing set.

Our main goal is to evaluate automatic methods for mitosis detection. However, methods that require or use some degree of user interaction are acceptable, provided they still offer a benefit over fully manual mitosis counting. The user interaction must be described in the submitted abstract (see below) and, if applicable, the output from the user interaction should be uploaded along with the results. All methods that require user interaction will be designated as semi-automatic when the results of the challenge are presented.

Results format

The results must be submitted as CSV files, one for each HPF, each with the same filename as the HPF it refers to. Each row in the CSV file must correspond to one detected mitotic figure location. The first two columns must contain the image coordinates of the detection. An optional third column can contain a confidence value for the detection. If the participants submit results in this format, they must provide a threshold for the confidence value, such that all objects with confidence above the threshold are considered detected mitotic figures. The threshold value must be identical for all HPFs from all patients. If the third column is not provided, all objects in the CSV file will be considered detected mitotic figures.
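To make the file format concrete, here is a minimal sketch that writes one such results file. The detection values and the filename are made-up examples; the actual filenames must match the HPF images. Reading the ground truth CSV files of the training set follows the same pattern, with two columns (x, y) per row.

```python
# Sketch of writing one results CSV file in the format described above:
# one row per detection, with columns x, y and an optional confidence.
import csv

# Hypothetical detections for one HPF: (x, y) pixel coordinates plus a
# confidence value per detected mitotic figure.
detections = [(1043, 552, 0.91), (388, 1710, 0.62)]

# The CSV file must share the filename of the HPF image it refers to;
# "01_05.csv" is a made-up example.
with open("01_05.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for x, y, confidence in detections:
        writer.writerow([x, y, confidence])
```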
Although the confidence values of the detected mitotic figures are not considered in the evaluation of the results (see below), this information might be used in the summary paper to plot free-response ROC curves or other similar graphs.

All CSV files must be organized in a directory structure identical to that of the provided dataset. If applicable, the confidence threshold value should be provided as a 'threshold.txt' file at the top level of the directory tree. The abstract (see below) should also be at the top level of the directory tree, in PDF format, with the filename 'abstract.pdf'. For submission, the directory tree must be compressed with the following filename: teamUsername_#submission.[zip, tar.gz, …]

Abstract format

The abstract should be 500 to 1000 words long and contain Methods and Experiments sections. The Methods section should contain a short overview of the proposed method, in sufficient detail to understand how the method works. If a commercial system is used, a method description is not necessary, but the exact name of the system and the version number need to be provided. The Experiments section should describe the steps taken in order to select the detection model and/or model parameters (the training procedure).

Evaluation measures

A detection will be considered a true positive if its Euclidean distance to a ground truth location is less than 7.5 μm (30 pixels). It can happen that multiple detections fall within 7.5 μm of a single ground truth location; in that case, they will be counted as one true positive. All detections that are not within 7.5 μm of a ground truth location will be counted as false positives. All ground truth locations that do not have a detection within 7.5 μm will be counted as false negatives.

For comparison of the proposed methods, two different rankings will be produced:
• Ranking according to the overall F1-score;
• Ranking according to the F1-score computed for each patient separately.

In the first ranking scheme, all ground truth objects are considered as a single dataset (regardless of which patient they belong to). The proposed methods are simply ranked according to the F1-score, calculated as

F1 = 2 · precision · recall / (precision + recall)

The first ranking scheme is heavily influenced by the results for the cases with a very high number of mitotic figures. The second ranking scheme weights the results from all cases equally, regardless of the number of mitotic figures present in them. In this case, the ground truth objects belonging to a single patient are considered as separate datasets. The F1-score is calculated at the patient level, and the proposed methods are ranked for each patient separately. The final placing of the methods is according to the average ranking over all patients.

In the training dataset, there is one case with zero ground truth mitotic figures. If such cases occur in the testing dataset, the ranking for those cases will be done according to the number of false positive detections, as precision and recall are not defined. The ranking of the semi-automatic methods will be done separately from that of the automatic methods.

The final analysis of the results, which will be part of the presentation at the workshop and the overview paper, will also include a qualitative evaluation. This will include a review of the most common false positive detections and false negatives by additional expert observers.
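A minimal sketch of the matching rule and F1-score computation described above; the greedy first-match tie-breaking is our simplification, as the organizers' exact matching procedure is not specified beyond the rules given.

```python
# Sketch of the evaluation rule above: a detection is a true positive if it
# lies within 30 px (7.5 um) of a ground truth location; multiple detections
# of one ground truth object count as a single true positive.
import math

def evaluate_hpf(detections, ground_truth, tol=30.0):
    """detections, ground_truth: lists of (x, y) pixel coordinates.
    Returns (tp, fp, fn) for one HPF."""
    matched = set()  # indices of ground truth objects already detected
    fp = 0
    for det in detections:
        hit = next((i for i, gt in enumerate(ground_truth)
                    if math.dist(det, gt) < tol), None)
        if hit is None:
            fp += 1           # no ground truth object within tolerance
        else:
            matched.add(hit)  # repeated hits on one object count once
    tp = len(matched)
    fn = len(ground_truth) - tp
    return tp, fp, fn

def f1_score(tp, fp, fn):
    # Precision and recall are undefined when there are no detections or no
    # ground truth objects (see the note on zero-mitosis cases above).
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```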
As one of the goals of this challenge is to see whether different methods perform better in specific situations, we will also evaluate whether a combination of different methods can improve the overall results.

Rules

This challenge is organized in the spirit of cooperative scientific progress. We therefore ask anybody using this website to respect the rules below. The following rules apply to those who register a team and download the data:

• The downloaded data sets, or any data derived from these data sets, may not be given or redistributed under any circumstances to persons not belonging to the registered team;
• All information entered when registering a team, including the name of the contact person, the affiliation (institute, organization or company the team's contact person works for) and the e-mail address, must be complete and correct; in other words, anonymous registration is not allowed. The data provided will not be used for any purposes other than the challenge;
• The data downloaded from this website must primarily be used for preparing an entry to be submitted to this challenge. The data may not be used for other purposes in scientific studies and may not be used to train or develop other algorithms, including but not limited to algorithms used in commercial products, without prior participation in this challenge;
• Evaluation of results uploaded to this website will be made publicly available on this site, and by submitting results, you grant us permission to publish our evaluation. Participating teams maintain full ownership of and rights to their method; we do not claim any ownership of or rights to the algorithms;
• If the results of algorithms in this challenge are to be used in scientific publications (journal publications, conference papers, technical reports, presentations at conferences and meetings), you must make an appropriate citation. Currently, this citation will refer to this website, and later to the publication that will describe the results of this challenge;
• Teams must notify the organizers of the challenge about any publication that is (partly) based on the results data published on this site, in order for us to maintain a list of publications associated with the challenge.

Feel free to contact us if you have any questions or need clarification of the rules of the challenge.