CYTO 2017 Image analysis challenge

Background information

Short description for the CYTO program:
In a time when vast amounts of bioimaging data are produced in labs around the globe every day, effectively extracting salient information from this growing resource is paramount to understanding complex biological questions. The CYTO 2017 Image analysis challenge proposes four tasks where the aim is to classify fluorescence microscopy images from the Human Protein Atlas database www.proteinatlas.org based on subcellular protein localization, and present your findings during the final platform session at CYTO 2017.

Description:
In a time when vast amounts of bioimaging data are produced in labs around the globe every day, effectively extracting salient information from this growing resource is paramount to understanding complex biological questions. In this challenge, you have the opportunity to attempt a series of automated classification tasks for fluorescence microscopy data and present your findings during the final platform session at CYTO 2017.

Prizes:
In addition to the satisfaction of besting the challenges, prizes for winners may include complimentary registration to CYTO 2018 in Prague, complimentary ISAC membership and the possibility of participating in a paper published on the challenge in Cytometry Part A.

The Dataset:
The image data provided in this challenge were generated by the Cell Atlas (Thul et al. “A subcellular map of the human proteome”, Science, in press.), part of the Human Protein Atlas database www.proteinatlas.org (Uhlen et al. 2010). The images visualize immunostaining of human proteins and the aim of this challenge is to recognize the patterns of protein subcellular distribution to major organelles and fine substructures. All images were acquired in a standardized manner using Leica SP5 confocal microscopes using a 63x/1.2 NA oil objective and Nyquist sampling rate in 4 fluorescence channels. Each field of view is comprised of 4 images.
This includes 3 reference channels: DAPI for the nucleus (“blue”), antibody-based staining of microtubules (“red”), and endoplasmic reticulum (“yellow”). These can be used to aid you in predicting localizations of the protein of interest (“green”).

Download the data from: http://www.proteinatlas.org/CYTO_challenge2017/

Inputs:
Images: The input images for all sub-challenges will be in .tif format, with separate images for each of the channels from a given field of view.

A brief description of the dataset contained in each sub-challenge:
Challenge 1: 1802 fields of view containing multilabel data for 2 protein localizations
Challenge 2: 20,000 fields of view containing multilabel data for 13 protein localizations
Challenge 3: 870 fields of view containing multilabel data for an additional 3 classes to be combined with the dataset from Challenge 2
Challenge 4: There are no new images in Challenge 4. Solution keys for this challenge reveal patterns that were merged in previous challenges and should replace the solution keys from those challenges.
Bonus challenge: There are no new images for this challenge. Solution keys for this challenge specify a binary value indicating whether the field of view has been labeled as “variable”.

Solution keys:
Solution keys for the data have been manually generated by gamers in EVE Online via Project Discovery and curated/augmented by the Human Protein Atlas for quality. Each solution key will contain a list of image filenames and strings encoding the set of locations of the protein of interest, using keywords provided in the attached “keywords.txt” file. The solution key lists the types of localization present in each image. It is important to note that the same type of localization may not be present in all cells in the same image, e.g.
1001_A1_1, Mitochondria,Nucleoli
1001_A2_1, Nucleoli
…

Initial assessment:
Self-assessment can be performed for each sub-challenge using cross-validation for the average per-class F1-score.
$$F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$

Final assessment:
Held-out datasets without solutions will be provided via the challenge website 2 weeks before the Cytometry conference. Accuracy will be judged using the average per-class F1-score. Solutions to these sets can be submitted online at [URL] for automated scoring in the format:
<File_ID>,Class 1, Class 2, Class 3
Final assessment metrics are briefly outlined in each sub-challenge below.

Presentation of results at CYTO 2017 (June 14 15:30-16:00):
The leader board and presentation submissions will close 24 hours prior to the presentations (June 13 15:30 EST). Top teams present at the CYTO conference should prepare a 5-minute presentation of their approach and email it to [email protected]. Teams not present that still wish to present results should submit a 2-slide presentation to [email protected]. Teams will be notified by 17:00 on June 13 if they are presenting.

CYTO2017 Image Analysis Challenge

This challenge is split into 5 sub-challenges. Participants may choose to complete any or all of these to the best of their abilities; however, the sub-challenges are generally meant to build on each other and increase in difficulty, so it is suggested that participants attempt them in order.

1. Getting started
Using the mito_nui.tar dataset, create a model capable of distinguishing the three classes within the dataset (Figure 1). The solution key for this dataset is called mito_nui_solution_key.zip.
TIP: Creating a learner capable of recognizing multi-label data will be key for future sub-challenges.
Figure 1. Example of protein localizing to mitochondria (a), nucleoli (b) and a protein localizing to both mitochondria and nucleoli (c).
Assessment - This sub-challenge will be assessed using the F1-score for a held-out set of data containing the same classes present in this sub-challenge.

2. Adding more complexity
Using the major13.tar dataset, create a model capable of distinguishing each of the 13 “major” organelles and their mixtures. This dataset contains 13 labels, where each image may have any number of labels (1-13). The solution key for this dataset is called major13_solution_key.zip.
TIP: Some classes may be much less common than others, so class balancing may be necessary to avoid over-training on certain classes.
Figure 2. Cartoon representation of the 13 major organelles present in major13.tar. Each of these localizations may be present
Assessment - This sub-challenge will be assessed using the F1-score for a held-out set of data containing the same classes present in this sub-challenge.

3. Rare events
Often what is most interesting is what’s unusual. As you discovered in the previous sub-challenge, some classes are far rarer than others. Adding the rare_events.tar dataset to the major13.tar dataset, design a classifier capable of accurately recovering rare phenotypes. The solution key for this dataset is called rare_events_solution_key.zip.
TIP: How do you represent rare phenotypes? Is trusting one guess beneficial or is a minimum number of instances required? This might not be the same for every class.
Figure 3. The rare classes contained in rare_events.zip are cytokinetic bridge, aggresomes, and focal adhesions (a-c). Each class may be present in combination with other classes or individually. Note that when a pattern is present, not every cell in the image must contain the pattern. In particular, patterns unique to a transient temporal phase, such as cytokinetic bridge, may be both rare in the population and uncommon in the image.
Assessment - This sub-challenge will be assessed using the F1-score for rare events.

4. Class discovery
How many classes are there really?
So far there has been a common class for all nucleoli localizations, referred to as ‘nucleoli’, but in reality this localization could be subdivided into more detailed localizations such as ‘nucleoli rim’ and ‘nucleoli fibrillar center’, increasing the number of classes. Using major13.tar and class_discovery_solution_key.zip, develop a model capable of “discovering” such distinct sub-populations.
TIP: It may be possible to find even more sub-classes than presented in class_discovery_solution_key.zip.
Assessment - This sub-challenge will be assessed using the F1-score for a set of held-out “hidden” classes.

BONUS ROUND: Not all mixtures are created equal
Multilabel data is central to this challenge; however, sometimes only a fraction of the cells show a pattern or set of patterns. In other words, not all mixtures, e.g. “Mitochondria,Nucleoli”, are the same. Some fields of view may show a Mitochondria pattern together with a Nucleoli pattern in every cell, while others may show only Mitochondria in some cells and only Nucleoli in others. Identifying these cell-to-cell variations can be key to understanding dynamic protein behavior such as cell-cycle, micro-environment or drug effects. Depending on your architecture, these cases may be very difficult to distinguish from each other. In this challenge, we are interested in how you handle these cases. Can your algorithm distinguish them, and if so, how? The ccv_data_solution_key.zip will help you to tune your model by providing binary information about which fields of view in the major13.tar dataset have cells with varying protein localizations. However, we do not currently have per-cell annotations of these fields of view, so this sub-challenge will not have a leader board. We will, however, consider solution descriptions sent to us at [email protected] and choose interesting implementations to highlight during the final platform session.

Here are some things to consider when you attempt this challenge:
1. Is the classifier capable of finding single-cell variations and distinguishing them from cases where mixtures are present within cells?
2. Can you estimate which cells are showing which patterns? What is the fraction of cells showing each pattern, or maybe this is better described as fraction of fluorescence?
TIP: Rerun the solution for this challenge on previous challenges. How many variable cases do you find? Does pruning these from your training set improve performance?
Figure 4. Examples of variable protein expression. (a) Here Nucleoli are present in one cell but not others. (b) In another case, PSMC6 is shown to translocate between the cytosol and the nucleus.
Assessment - This sub-challenge will not be formally assessed; however, we will accept submissions to [email protected] for consideration in the final platform session presentations.
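
Self-assessment sketch:
For the self-assessment described under “Initial assessment”, the following is a minimal Python sketch of how the average per-class F1-score could be computed from a solution key and a set of predictions. It is not the official scoring code. It assumes one record per line in the form shown in the solution key example (a file ID followed by comma-separated keywords), that a class with no true positives scores 0, and that predictions for file IDs missing from the solution key are ignored. The function names and the file ID 1001_A3_1 are hypothetical and used only for illustration.

```python
from typing import Dict, Iterable, List, Set


def parse_solution_key(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each file ID to its set of localization keywords.

    Assumes one record per line: "<File_ID>, Class 1,Class 2,...".
    """
    key: Dict[str, Set[str]] = {}
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        if not fields[0]:
            continue  # skip blank lines
        key[fields[0]] = {field for field in fields[1:] if field}
    return key


def per_class_f1(truth: Dict[str, Set[str]],
                 pred: Dict[str, Set[str]],
                 classes: List[str]) -> Dict[str, float]:
    """Compute the F1-score of each class over the files listed in `truth`."""
    scores: Dict[str, float] = {}
    for cls in classes:
        tp = fp = fn = 0
        for file_id, true_labels in truth.items():
            in_true = cls in true_labels
            in_pred = cls in pred.get(file_id, set())
            if in_true and in_pred:
                tp += 1
            elif in_pred:
                fp += 1
            elif in_true:
                fn += 1
        # Assumption: undefined precision or recall is treated as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[cls] = 2 * precision * recall / total if total else 0.0
    return scores


def average_f1(truth: Dict[str, Set[str]],
               pred: Dict[str, Set[str]],
               classes: List[str]) -> float:
    """Unweighted mean of the per-class F1-scores."""
    scores = per_class_f1(truth, pred, classes)
    return sum(scores.values()) / len(scores)


# Illustrative records only; 1001_A3_1 is a made-up file ID.
truth = parse_solution_key([
    "1001_A1_1, Mitochondria,Nucleoli",
    "1001_A2_1, Nucleoli",
    "1001_A3_1, Mitochondria",
])
pred = parse_solution_key([
    "1001_A1_1, Mitochondria",
    "1001_A2_1, Nucleoli",
    "1001_A3_1, Mitochondria,Nucleoli",
])
classes = ["Mitochondria", "Nucleoli"]
print(per_class_f1(truth, pred, classes))
print(average_f1(truth, pred, classes))
```

In this toy example, Mitochondria is predicted correctly in both images that contain it, while Nucleoli has one hit, one miss and one false alarm, so the per-class scores are 1.0 and 0.5 and their average is 0.75. Because every class carries equal weight in the average, rare classes affect the score as much as common ones, which is worth keeping in mind for the class balancing and rare event tips above.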