Learning Algorithms Applied to Cell Subpopulation Analysis in High Content Screening

Bohdan Soltys*, Yuriy Alexandrov, Denis Remezov, Marcin Swiatek, Louis Dagenais, Samantha Murphy and Ahmad Yekta

GE Healthcare, 500 Glenridge Ave, St. Catharines, Ontario, Canada L2S 3A1; Tel: 905.688.2040, E-mail: [email protected]

Abstract and Introduction

The use of machine learning in image-based screening is reported. This approach provides a novel means to deal with complex image analysis problems. Cell heterogeneity often confounds image analysis, even in established cell lines. There is a need to increase data quality by classifying cells into distinct subpopulations. Learning algorithms classify each cell according to user-specified cell categories. Once cells are classified, measurements are reported for specific subpopulations.

In this report we demonstrate the use of supervised learning algorithms. Here, the operator first trains the program simply by providing typical images of cells from each class. The algorithm represents these cells in feature space using multiple intensity and morphology descriptors. Next, in screening applications, the program automatically classifies cells using their feature vectors. Robust results are demonstrated by comparing the performance of the algorithm with human scoring of images acquired using the IN Cell Analyzer 1000.

Application 1: Cell cycle analysis

                Human
Machine     G1       G2      M
G1          91.5%    13      3
G2          8        87%     3
M           0.5      0       94%

Table 2. Cell by cell comparison of human and machine classification results. The confusion matrix shows that the algorithm correctly identifies >85% of the cells in the training data. Perfect agreement is not expected, as even two human scorers will disagree on some cells.

Experimental Methods

Cells were plated and grown in Greiner 96-well plastic microplates (#655090, Greiner BioOne Inc.).
The following cell lines were used:
• U2OS cells expressing a cyclin B1–GFP reporter construct (Amersham Biosciences 2580-10-50) were used for cell cycle analysis. The cell line and experimental procedures have been previously described (1-2). Colchicine was used as a model drug to cause accumulation of cells in mitosis.
• GFP-PLCδ-PH Domain Assay (Amersham Biosciences 25-8007-26; CHO-derived cell line).
• AKT1-EGFP Assay (Amersham Biosciences, 25-801017; CHO-derived cell line). AKT1 is also known as protein kinase Bα (PKBα). IGF-1 was used as a reference agonist to stimulate translocation to membrane ruffles.

Nuclei were labeled with Hoechst 33258, then cells were washed with physiological saline and imaged live. Images were acquired using the IN Cell Analyzer 1000 at 1392 x 1040 pixels, 12-bit precision, at low magnification (10X objective, field of view ~1 mm²).

Machine Results

Fig 2. Step 2: building the classifier. The operator determines which and how many descriptors (here, 6) are best for identifying each population in the training data. The software automatically selects the best descriptors from the candidates. Three algorithmic approaches are available for identifying the classes: nearest neighbor, neural network and quadratic discriminant. We find that the best way to select one is by experimentation with the data at hand. Next, canonical variate analysis (CVA) is used to represent the multi-dimensional data, as shown in the lower part of the figure. The goal is to achieve maximum separation of the classes, as exemplified here by the red and green data points (each cell is represented by a data point). The classifier training is now complete. The next step couples the classifier to a desired analysis module for image quantification.

Fig 4. Cell classification. U2OS cells expressing GFP-cyclin B1: (cell body, A and C); (nuclei, B and D). Cells are identified by the software as being in either G1/S (green), G2 (yellow) or mitosis (red).
Control cells (A and B) are compared with colchicine-treated cells (C and D). About 1200 cells were human annotated and used to train the system, as described in Figs. 1-2. Classification is possible because GFP expression levels differ between cell cycle phases, and because of changes in cell and nuclear DNA morphology:
• G1/S - low nuclear and cytoplasmic expression of cyclin B1
• G2 - low nuclear and high cytoplasmic expression
• Mitosis (M) - high ‘nuclear’ expression, cell is rounded up, DNA condensation

In this example, the number of descriptors used to build the classifier is 6. Examples of descriptors are given in Table 1:
• Cell body intensity/background intensity
• Nucleus intensity/cell body intensity (both channels)
• Nucleus size/cell size
• Nuclear and cell morphology

Nearest Neighbor: G1 G2 G1 80 20 G2 5 92 M 2 3 M 0 3 95 G1 92 24 3 G2 7 74 3 M 0 1 94
Neural Network: 74 19 21 2 5 76 85 22 9 14 69 5 1 10 86 24 76 3
Quadratic Discriminant: 1 6 93 1 9 90 4 68 28 5 70 25 94 2 4 94 4 2

Table 3. Comparison of different classification methods in two independent image datasets. When applied to the current test data, the nearest neighbor method produces lower error rates than the other two. Experimentation allows for selection of the best approach. The two datasets in Table 3 differ in signal to noise ratios. Best results are obtained when image qualities of training and test data are similar.

            PLC     AKT     annexin
PLC         87%     10      3
AKT         19      72%     9
annexin     4       6       90%

Table 2. Comparison of known and machine classification of subcellular patterns. The algorithm was trained and tested using pure cell populations (cf. Fig. 6). The software correctly identified >70% of the indicated patterns in each cell type.

Outlook

The described learning algorithms are flexible and are expected to be applicable to a wide variety of applications. They enable independent identification and analysis of each subset of cells in a population, thereby increasing both information content and quality.
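The cell-by-cell comparisons reported in the tables above can be illustrated with a short sketch. The Python below is not the IN Cell Analyzer software. It assumes a plain 1-nearest-neighbor rule with Euclidean distance, and it assumes a confusion matrix normalized per human-assigned class. The function names and the two descriptor values per cell (nucleus/cell body intensity ratio, cell body/background intensity ratio) are hypothetical and invented for illustration only.

```python
import math
from collections import Counter

CLASSES = ["G1", "G2", "M"]


def nearest_neighbor(train, query):
    """Return the label of the training cell closest to `query` (Euclidean distance)."""
    features, label = min(train, key=lambda cell: math.dist(cell[0], query))
    return label


def confusion_percent(human, machine, classes=CLASSES):
    """Rows: human label. Columns: machine label. Entries: percent of each row."""
    counts = Counter(zip(human, machine))
    table = {}
    for h in classes:
        row_total = sum(counts[(h, m)] for m in classes)
        table[h] = {
            m: (100.0 * counts[(h, m)] / row_total if row_total else 0.0)
            for m in classes
        }
    return table


# Hypothetical annotated training cells: (feature vector, human label).
# Feature vector = (nucleus/cell body intensity, cell body/background intensity).
train = [
    ((0.5, 1.2), "G1"), ((0.6, 1.1), "G1"),   # low nuclear, low cytoplasmic signal
    ((0.4, 3.0), "G2"), ((0.5, 3.2), "G2"),   # low nuclear, high cytoplasmic signal
    ((1.8, 2.5), "M"), ((2.0, 2.8), "M"),     # high 'nuclear' signal
]

# Hypothetical test cells with their human scores.
test_cells = [(0.55, 1.0), (0.45, 2.9), (1.9, 2.6), (0.5, 2.3)]
human = ["G1", "G2", "M", "G1"]

machine = [nearest_neighbor(train, cell) for cell in test_cells]
table = confusion_percent(human, machine)

for h in CLASSES:
    print(h, [round(table[h][m], 1) for m in CLASSES])
```

In this toy data the last test cell has an intermediate cytoplasmic signal, so the nearest-neighbor rule assigns it to G2 while the human score is G1. This is the kind of disagreement that appears off the diagonal of a confusion matrix.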
Suggested additional applications of learning algorithms:
• Mixed cell cultures containing more than one cell type or multiple reporters
• Transfected cells where only a subset of cells have the correct expression level
• Responder/nonresponder classification
• Intrinsic cell heterogeneity characterization
• Search for inherent non-evident links between cell classes and response profiles
• Anomaly detection and genetic screening

Conclusions

• Novel learning algorithms have been developed for use in high content screening
• These algorithms identify cell states, and provide the means to increase data quality where cells have heterogeneous cell states
• Effective discrimination of G1, G2 and mitotic cells is demonstrated
• Preliminary work shows that pattern recognition of subcellular organelles or particles is feasible

Application 2: Organelle/object identification

Table 1. Morphological and intensity descriptors used by the cell cycle analysis software.

Algorithmic Methods

Training the algorithm involves 3 steps.

[Diagram labels: G1/S, G2, mitosis; cell cycle analysis training; descriptor feature vectors; classifier; untrained analysis module; trained analysis module]

Fig 1. Step 1: annotation. The operator provides the algorithm with example images of cells in each class. Point and click. This human annotation operation creates the training dataset.

Fig 3. Step 3: import the classifier into an assay analysis protocol. By this approach the analysis module is converted into a trained module. Multiple classifiers can be used simultaneously in an analysis module. A classifier can in principle be used in any module with learning capability.

GE and GE Monogram are trademarks of General Electric Company. Amersham Biosciences UK Limited, a General Electric company, going to market as GE Healthcare. © 2004 General Electric Company - All rights reserved. Amersham Biosciences UK Limited Amersham Place Little Chalfont Buckinghamshire England U.K. HP7 9NA. Amersham Biosciences AB SE-751 84 Uppsala Sweden.
Amersham Biosciences Corp 800 Centennial Avenue PO Box 1327 Piscataway NJ 08855 USA. Amersham Biosciences Europe GmbH Munzinger Strasse 9 D-79111 Freiburg

Figure 5. Software classification of cell subpopulations. Colchicine treatment leads to an expected accumulation of cells in mitosis (Fig. 5). Determining how accurate the software is, however, requires cell by cell comparisons with human scoring, as shown in Tables 2 and 3.

Fig 6. Classification of subcellular objects. The algorithm was trained to identify 3 types of subcellular patterns: (A) PLC-GFP cells showing cell periphery labeling, (B) AKT-GFP showing membrane ruffle labeling and (C) annexin, showing punctate/vesicular labeling. The pattern assignments made by the algorithm are shown as colored outlines: red=PLC, yellow=AKT, green=annexin. The algorithm misclassified only a few cells.

All goods and services are sold subject to terms and conditions of sale of the company within the Amersham group which supplies them. A copy of these terms and conditions is available on request. General Electric Company reserves the right, subject to any regulatory approval if required, to make changes in specifications and features shown herein, or discontinue the product described at any time without notice or obligation. Contact your GE Representative for the most current information.

This poster was presented at the Conference of the Society of Biomolecular Screening, 11th to 15th September 2004, Orlando, Florida.

* To whom all correspondence should be addressed.

References

(1) Tinkler, H., Thomas, N., Goodyer, I., Zaltsman, A., Arini, N., Game, S. 2003. Multi-parameter analysis of cell cycle related events using a GFP sensor (Amersham Scientific Poster 147). Conference of the Society of Biomolecular Screening. Eugene, Oregon.
(2) Goodyer, I., Jones, A., Thomas, N., Tinkler, H., Zaltsman, A., Pines, J., Almuina, N.M., Game, S. 2002.
Cell Cycle Position Reporting (Amersham Scientific Poster 121). Conference of the Society of Biomolecular Screening, The Hague, Netherlands.