Computer exercise in CellProfiler & CellProfiler Analyst, 111129
Carolina Wählby, [email protected] & Martin Simonsson, [email protected]

Starting the software and downloading images
CellProfiler and CellProfiler Analyst can be downloaded from www.cellprofiler.org. This webpage also provides tutorials, examples and a forum where you can find answers to questions you may have and get help. However, for this exercise the programs have already been installed on your computer, and can be found on ‘Gemensamt on 'IT-Pclab' (G:) → Program’. Download sample images and a description of experimental parameters from http://www.cb.uu.se/~carolina/TranslocationData.zip

Scenario
In this experiment we have human osteosarcoma cells where a ‘forkhead’ protein has been labeled with GFP (green fluorescent protein). We know that 150nM of a positive control drug causes the cells to transport the protein from the cytoplasm to the nucleus, but we do not know the lowest dose at which this effect can be seen. We also want to optimize the image analysis to separate positive controls (treated with 150nM drug) and negative controls (untreated) as well as possible. The ultimate goal of developing this type of image-based screen is to use it to search for other, previously unknown drugs that have the same effect at the cell level, and that may be of use for treatment of patients (possibly with fewer side effects than the known drug).

Challenge
Analyze how cells respond to treatment with a drug, in this case how a protein is translocated from one subcellular compartment to another.

Material
Your images originate from a ‘96-well plate’, but you will work with a subset of 26 images with the following setup: 8 wells were untreated (negative control), 8 wells were treated with the maximum dose of 150nM (positive control), and 10 wells were used to create a gradient with increasing concentration of the drug.
You will also use a simple text file called ‘Translocation_doses_n_controls.csv’ containing information about where on the 96-well plate the wells were located and how the cells were treated.

Methods
Using CellProfiler and CellProfiler Analyst, your job is to outline (segment) each cell and cytoplasm and extract a number of measurements from each cell. You have a total of 26 images and approximately 200 cells per image, so you want to automate the process. The first task is to set up a CellProfiler ‘pipeline’ consisting of a number of ‘modules’ and then test the pipeline on a few images. You will thereafter run it automatically on all the images in the experiment. Once you have extracted information from the images and stored it in a database, you will use CellProfiler Analyst to visualize your data and to train a classifier to distinguish between treated and untreated cells.

CellProfiler

Select Input and Output folder
Unzip TranslocationData.zip. Start CellProfiler and set the ‘Default Input Folder’ to the folder where you placed your images. The names of your images should now appear in the bottom left square of the CellProfiler interface. Double-click on a few of the images to see what they look like. Set ‘Default Output Folder’ to the same folder as the input folder. This is where your measurements will be saved.

Load images and associated data
Click on ‘+’ to add your first module to the pipeline. Click on ‘File Processing’ and you can select either ‘LoadImages’ or ‘LoadData’. Today we will use ‘LoadData’, as this allows us to load the information about the drug dose together with the image data, which makes analysis of the results easier. Then click ‘+ Add to Pipeline’. In the module adjustment window, click on the folder next to ‘Name of the file’ to select ‘Translocation_doses_n_controls.csv’, which is your file describing image names and doses. Click ‘View’ to see what the file looks like.
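
If you prefer to inspect the file outside CellProfiler, a few lines of Python can list its columns and count its rows. This is only a minimal sketch: the sample rows and column names below are hypothetical placeholders, and the real ‘Translocation_doses_n_controls.csv’ may use different names.

```python
import csv
import io


def summarize_table(csv_text):
    """Return (column_names, number_of_data_rows) for comma-separated text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)  # reading the rows also parses the header line
    return list(reader.fieldnames or []), len(rows)


# Hypothetical two-row sample; the columns of the real file may differ.
SAMPLE = (
    "Image_FileName_rawDNA,Image_FileName_rawGFP,Metadata_Well,Metadata_Dose\n"
    "dna_A01.tif,gfp_A01.tif,A01,0\n"
    "dna_A02.tif,gfp_A02.tif,A02,150\n"
)

columns, n_rows = summarize_table(SAMPLE)
print(columns)
print(n_rows)
```

To apply it to the real file, pass the file contents instead of SAMPLE, for example summarize_table(open('Translocation_doses_n_controls.csv').read()); for this exercise you would expect one data row per image.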

Identify primary objects – the Nuclei
To keep things simple, we skip the pre-processing for now. Instead we want to find our primary objects: the cell nuclei. Again, click on ‘+’ to add the module ‘IdentifyPrimaryObjects’, located under the module category ‘Object Processing’. Then click ‘+ Add to Pipeline’. Now adjust the settings. At ‘Select the input image’ choose ‘rawDNA’ from the drop-down list. We choose a descriptive name for our primary objects, ‘Nuclei’, so that we can refer to them in later steps, and enter the name at ‘Name the primary objects to be identified’.

Before making adjustments to other parameters, we want to be able to test. Go to the top menu and select ‘Test’ and ‘Start test run’. A pointer appears next to the modules in the pipeline. Then click on ‘Step’ below the pipeline to go through each step. The first step is the module ‘LoadData’. A result window should pop up with a list of the loaded files. Then click on ‘Step’ again to test the module ‘IdentifyPrimaryObjects’. A result window should pop up with three images: the Original image, the segmented Nuclei and the Nuclei outlines.

Examine the results
Use the ‘Zoom tool’ to select an area to examine more closely. You can make the selection in any of the three images. You can zoom in more by reapplying the ‘Zoom tool’. You can use the ‘Pan tool’ to move around the image. The ‘Back arrow’ and ‘Forward arrow’ allow you to undo or redo zooms and movements. If you get lost you can always press the ‘Home button’. In the Nuclei image each color represents a separate object. When two objects are touching, but identified as separate, the objects will appear in distinct colors. In the Nuclei outlines image, green outlines highlight valid objects, yellow outlines indicate invalid objects touching the image border, and red outlines indicate objects that are invalid based on the size criterion.

Improve identification of primary objects
You may try adjusting settings in the module to improve the result.
You want the outlines to match the nuclei boundaries, and to separate touching objects without splitting a single nucleus into several objects. For example, the automated thresholding algorithm finds a threshold that includes a bit too much background. At ‘Select the thresholding method’ change to ‘MoG Global’ from the drop-down list and set the ‘Approximate fraction of image covered by objects’ to 0.2 to get a more exact outline. Click on ‘Step’ again to see the result of your new settings.

Identify secondary objects – the Cells
When you are satisfied with the segmentation of the nuclei, it is time to find the entire cell using ‘IdentifySecondaryObjects’. Add this module to the pipeline. Select ‘rawGFP’ as input image, choose a descriptive name for the secondary objects, ‘Cells’, and enter it at ‘Name the objects to be identified’. Click on ‘Step’ again to see what the result looks like when using the default settings. By default, secondary objects are identified with the method ‘Propagation’, meaning that cell outlines are defined by propagating the nuclear mask until an intensity threshold in the GFP image is reached. For this assay, the intensity of GFP in the cytoplasm varies depending on treatment. As this is the change we want to measure, it is better to have a segmentation method that is not dependent on the GFP stain. Instead, at ‘Select the method to identify the secondary objects’ select ‘Distance – N’. Adjust ‘Number of pixels by which to expand the primary objects’ to 10 pixels.

Identify tertiary objects – the Cytoplasms
Once we have identified the nucleus and expanded the region to identify the entire cell, we can use these two objects, ‘Nuclei’ and ‘Cells’, to define the cytoplasm of the cells. Add the module ‘IdentifyTertiaryObjects’ to the pipeline. This module will take the smaller identified objects and remove them from the larger identified objects. We want to "subtract" the ‘Nuclei’ from the ‘Cells’, which will leave just the ‘Cytoplasms’.
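
The idea of this "subtraction" can be illustrated on a tiny label image. The sketch below is plain Python written only for illustration; the function name subtract_labels and the toy images are made up here, and CellProfiler's own implementation may differ in details such as how boundary pixels are treated.

```python
def subtract_labels(cells, nuclei):
    """Keep each cell label only at pixels where no nucleus is present."""
    return [
        [c if n == 0 else 0 for c, n in zip(cell_row, nucleus_row)]
        for cell_row, nucleus_row in zip(cells, nuclei)
    ]


# Toy label images: 0 is background, 1 is the label of the only object.
nuclei = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
cells = [
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
]

cytoplasm = subtract_labels(cells, nuclei)
for row in cytoplasm:
    print(row)
```

The cytoplasm keeps the label of its cell, so measurements made in the cytoplasm can later be related to the nucleus and cell they belong to.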
Select the objects from the drop-down lists and enter the name ‘Cytoplasm’ for the tertiary objects; this is the object name that the measurement modules below refer to.

Save outlines of cytoplasm
It is often useful to save the result of the segmentation to be able to refer back to it when validating the results. In this case we only save the outlines of the cytoplasms, as they reveal the outlines of both cells and nuclei. In ‘IdentifyTertiaryObjects’, make sure you click ‘Retain outlines of tertiary objects’ and name the outline image ‘CytoplasmOutlines’. From ‘File Processing’, select the ‘SaveImages’ module. Set ‘Select the type of image to save’ to ‘Image’, select ‘CytoplasmOutlines’, and set ‘construct file names’ to ‘From image filename’. Use file prefix ‘rawDNA’, suffix ‘outlines’ and file format png, and save to ‘Default Output Folder’. Save ‘Every cycle’, as ‘Grayscale’, and make sure you click ‘Store file and path information to the saved image’. This information will be used by the database when running CellProfiler Analyst.

Measurements
The next step is to make measurements. From the module category ‘Measurements’ add the module ‘MeasureObjectIntensity’. We want to measure the ‘rawGFP’ intensity in the objects ‘Nuclei’ and, using ‘Add another object’, ‘Cytoplasm’. We also want to add ‘MeasureCorrelation’, set to ‘Within object’, to measure the correlation between the nuclear stain, ‘rawDNA’, and the GFP signal, ‘rawGFP’, within each of ‘Nuclei’, ‘Cytoplasm’ and ‘Cells’. As we want to study transportation of GFP from the cytoplasm to the nucleus, it may be interesting to look at the ratio of cytoplasmic stain to nuclear stain. Therefore, add ‘CalculateMath’ from the ‘Data Tools’ category and at ‘Name the output measurement’ enter ‘IntensityRatio’. Adjust the settings to divide the mean intensity of the GFP signal in each cytoplasm by the mean intensity in each nucleus.

Export to database
Today, we want to explore the data and use machine learning to classify the cells as having ‘cytoplasmic GFP’ or ‘nuclear GFP’.
The machine learning tools are run from CellProfiler Analyst (CPA), and in order to access the measurements from CPA they have to be saved to a database. To do this, add the module ‘ExportToDatabase’ found under the ‘Data Tools’ category. Select database type ‘SQLite’, and click ‘Create a CellProfiler Analyst properties file’, setting plate type to 96, plate metadata to ‘Plate’, and well metadata to ‘Well’. Leave the rest of the parameters at their default settings.

Run the pipeline
Your pipeline is now ready to run on the full data set of 26 images. Exit the test mode by going to the top menu and selecting ‘Test’ and ‘Stop test run’. Click on all ‘open eyes’ in the pipeline to close them, so that the result of each step is not displayed. This makes the analysis faster (you may keep one open if you want to see what happens, e.g. in ‘IdentifyPrimaryObjects’). To analyze all images, click the ‘Analyze images’ button in the lower right corner.

CellProfiler Analyst
You can now start CellProfiler Analyst (CPA) to explore the data you have extracted from the cells. When you start CPA you are asked to select a ‘properties file’. The properties file created by your CellProfiler pipeline is located in the output folder (called *.properties). The properties file is a text file describing where the database and the images are located, and how to handle the data.

Visualization
Once the properties file is loaded, click on ‘Plate Viewer’ in the CPA menu. This is an artificial view of the ‘96-well plate’ from which your images originate. The colored squares represent wells on the plate for which you have data, while the crossed-out wells are wells for which you do not have any measurements (I only gave you 26 images to save you some time). Currently the color coding represents the image number, which is not very interesting. Instead, under ‘Measurements’ choose ‘Image_Metadata_Dose’ from the drop-down list, and you will see the amount of drug added to each well (hover over a well to see the actual value).
You should see something like the figure to the left. To access your measurements, try ‘Image_Count_Nuclei’, and you will see that the number of nuclei varies per image. We can also access our measurement ‘IntensityRatio’. The wells are directly linked to the image data. Click on a well and the corresponding image with outlines should pop up. To make the display correct, change the color for each channel by selecting colors in the top menu (rawDNA blue, rawGFP green, and Outlines any other color). The same settings will be kept for all subsequent images you open.

Classify and train classifier
Now select ‘Classifier’ from the CPA menu. Click on ‘Fetch!’ to have CPA select a number of random cells from the experiment. Now drag and drop some positive cells (GFP in nucleus) and negative cells (GFP in cytoplasm) into the corresponding bins (you may want to turn off the CytoplasmOutlines to better see the image data). If you are not sure about the GFP location in a cell, do not use it for training. Once you have a few cells in each bin, click ‘Train Classifier’.

Examine rules
Now look at the rules found based on your samples. Note that the classifier uses all available measurements, including the position and object number of each cell. If you notice, for example, that ‘Nuclei_Number_Object_Number’ or ‘Nuclei_Location_Center_Y’ is one of the features used by the classification rules, you can be sure that the classifier will fail, as these measurements are not correlated with the phenotype we want to find. Either train on more cells, or (much better) open the properties file in a text editor, scroll down to ‘classifier_ignore_columns’ and add .*_Object_Number, .*_Location_Center.* to the list of measurements that should be ignored by the classifier. To load this new properties file you have to restart CPA.
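
To check what the two patterns would exclude, you can test them against a few column names in Python. This is a minimal sketch that assumes each pattern has to match the whole column name; how CPA applies the patterns internally is not shown here, and the third column name is only an illustrative example of a measurement you would want to keep.

```python
import re

# The two patterns suggested for 'classifier_ignore_columns'.
IGNORE_PATTERNS = [r".*_Object_Number", r".*_Location_Center.*"]


def is_ignored(column, patterns=IGNORE_PATTERNS):
    """Return True if any pattern matches the complete column name."""
    return any(re.fullmatch(pattern, column) for pattern in patterns)


columns = [
    "Nuclei_Number_Object_Number",
    "Nuclei_Location_Center_Y",
    "Cytoplasm_Intensity_MeanIntensity_rawGFP",  # illustrative name
]

kept = [column for column in columns if not is_ignored(column)]
print(kept)
```

Only the intensity measurement remains in the list, which is the behavior you want from the classifier's feature selection.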

Improve training
When you have a set of rules you can ask the computer to fetch more examples of positive and negative cells, and drag and drop those cells to the corresponding bins to improve the classifier. You may also use ‘Score Image’ and draw examples from the image. Note that image number 1 is a negative control (most GFP cytoplasmic), and image number 2 is a positive control (most GFP in nuclei). Then click ‘Train Classifier’ again.

Score all
Now test how the classifier does on the complete data set: press ‘Score all’ (use the default settings), and every cell in every image will be scored as positive or negative by the classifier you built. A ‘Hit table’ pops up containing the summarized scores for every image. Double-click on the star next to an image number to display the corresponding image. Click on ‘Classify’ and ‘Classify Image’ at the top of the image to see the class of each cell. If you see errors, drag and drop cells directly from the image to the ‘Classifier bins’ and click ‘Train Classifier’ again.

Save the results to database
Select the ‘Hit table’ window and save the results to the database by going to the top menu and selecting ‘File’ and ‘Save table to database’ (choose a short name, e.g. ‘HitTable’, no path, and ‘Store permanently’).

Visualize the results
Then go to Plate Viewer, choose ‘OTHER TABLE’ as ‘Data source’, and select your saved results (choose ‘per-well’, ‘Plate_ID’ to match ‘Image_Metadata_Plate’, and ‘Image_Metadata_Well’ to match ‘Image_Metadata_Well’). Then select ‘pEnriched_positive’ as ‘Measurement’. If you have done things correctly, it should look something like the figure to the left. Compare this to the dose shown above (by opening a new Plate Viewer window if you closed it). If the patterns are similar, you have done a good job training the classifier. If you see errors, click on a well to see the image, click on ‘Classify’ and ‘Classify Image’, drag and drop cells directly from the image to the ‘Classifier bins’, and click ‘Train Classifier’ again.

Plot the results
You can easily plot your data in several ways with CellProfiler Analyst. This time we will use ‘ScatterPlot’ to plot a dose-response curve. We want to see how the ratio of positive cells increases with dose. At ‘x-axis’ select ‘Per_Image’ and ‘Image_Metadata_Dose’ from the drop-down lists. At ‘y-axis’ select the results you saved from the ‘HitTable’ and ‘pEnriched_positive’ from the drop-down lists. Click ‘Update Chart’ to see the scatter plot. The dose-response curve is easier to interpret if you change the ‘x-axis’ ‘scale’ from ‘linear’ to ‘log’ in the drop-down list.

Lab report
To show that you completed the exercise, please answer the following questions and send them by email to [email protected]. Please remember to write the name of your lab partner in the email.
Q1: Which of Channel 1 and Channel 2 shows the DNA stain?
Q2: How many positive and negative control images do we have?
Q3: By default, the image is smoothed by a smoothing filter defined by the value given as minimum object size. What happens to the separation of the nuclei if you choose not to automatically calculate the size of the smoothing filter, and instead increase the smoothing filter size to 20?
Q4: According to the help (reached by pressing ‘?’), how does ‘Distance – N’ define the cytoplasm?
Q5: Look at the final scatter plot. What is the lowest dose (Image_Metadata_Dose) that results in a cellular response (pEnriched_positive) similar to that of the maximum dose?

Learn more?
Download sample pipelines from www.cellprofiler.org: click ‘Getting started’ and ‘Example pipelines’.