Introduction to CellProfiler and CellProfiler Analyst

Computer exercise in CellProfiler & CellProfiler Analyst, 111129
Carolina Wählby, [email protected] & Martin Simonsson, [email protected]
Starting software and downloading images
CellProfiler and CellProfiler Analyst can be downloaded from www.cellprofiler.org. This webpage
also provides tutorials, examples and a forum where you can find answers to questions you may have
and get help. However, for this exercise the programs have already been installed on your computer,
and can be found on ‘Gemensamt on 'IT-Pclab' (G:) → Program’.
Download sample images and a description of experimental parameters from
http://www.cb.uu.se/~carolina/TranslocationData.zip
Scenario
In this experiment we have human osteosarcoma cells where a ‘forkhead’-protein has been labeled
with GFP (a green fluorescing protein). We know that 150nM of a positive control drug causes the
cells to transport the protein from the cytoplasm to the nucleus, but we do not know the lowest
possible dose to see this effect. We also want to optimize the image analysis to separate positive
(treated with 150nM drug) and negative controls (untreated) as well as possible. The ultimate goal of
developing this type of image-based screen is to use it to search for other, previously unknown
drugs that have the same effect on the cell level, and may be of use for treatment of patients
(possibly with fewer side effects than the known drug).
Challenge
Analyze how cells respond to treatment with a drug, in this case how a protein is translocated from
one subcellular compartment to another.
Material
Your images originate from a ‘96-well plate’, but you will work with a subset of 26 images with the
following setup: 8 wells were untreated (negative control), 8 wells were treated with the maximum
dose of 150nM (positive control), and 10 wells were used to create a gradient with increasing
concentration of the drug. You will also use a simple text file called
‘Translocation_doses_n_controls.csv’ containing information about where on the 96-well plate the
wells were located and how the cells were treated.
Methods
Using CellProfiler and CellProfiler Analyst, your job is to outline (segment) each cell and
cytoplasm and extract a number of measurements from each cell. You have a total of 26 images and
approximately 200 cells per image, so you want to automate the process. The first task is to set up a
CellProfiler ‘pipeline’ consisting of a number of ‘modules’ and then test the pipeline on a few
images. You will thereafter run it automatically on all the images in the experiment. Once you have
extracted information from the images and stored it in a database you will use CellProfiler
Analyst to visualize your data and to train a classifier to distinguish between treated and untreated
cells.
CellProfiler
Select Input and Output folder
Unzip TranslocationData.zip. Start CellProfiler and set the ‘Default Input Folder’ to the folder
where you placed your images. The names of your images should now appear in the bottom left
square of the CellProfiler interface. Double-click on a few of the images to see what they look like.
Set ‘Default Output folder’ to the same folder as the input folder. This is where your
measurements will be saved.
Load images and associated data
Click on ‘+’ to add your first module to the pipeline. Click on ‘File Processing’ and you can select
either ‘LoadImages’ or ‘LoadData’. Today we will use ‘LoadData’ as this allows us to load the
information about the drug dose together with the image data, which makes analysis of results
easier. Then click ‘+ Add to Pipeline’. In the module adjustment window, click on the folder next
to ‘Name of the file’ to select ‘Translocation_doses_n_controls.csv’ which is your file describing
image names and doses. Click ‘View’ to see what the file looks like.
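If you would rather inspect the file outside CellProfiler, a minimal Python sketch like the one below returns the column names and the first few rows. It makes no assumption about what the columns are called; the file path in the comment is a placeholder for wherever you unzipped the data.

```python
import csv


def preview_csv(path, n_rows=3):
    """Return the header and the first n_rows data rows of a CSV file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)  # first line holds the column names
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows


# Example call (placeholder path, adjust to your own folder):
# header, rows = preview_csv("Translocation_doses_n_controls.csv")
```

The header should show which columns hold the image file names and which hold the dose and well information.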
Identify primary objects – the Nuclei
To keep things simple, we skip the pre-processing for now. Instead we want to find our primary
objects: the cell nuclei. Again, click on ‘+’ to add the module ‘IdentifyPrimaryObjects’, located
under module category ‘Object Processing’. Then click ‘+ Add to Pipeline’. Now adjust the
settings. At ‘Select the input image’ choose ‘rawDNA’ from the drop-down-list. We choose a
descriptive name for our primary objects - ‘Nuclei’, so we can refer to them in later steps, and enter
the name at ‘Name the primary objects to be identified’. Before making adjustments to other
parameters, we want to be able to test. Go to the top menu and select ‘Test’ and ‘Start test run’. A
pointer appears next to the modules in the pipeline. Then click on ‘Step’ below the pipeline to go
through each step. The first step is the module ‘LoadData’. A result window should pop up with a
list of the loaded files. Then click on ‘Step’ again to test the module ‘IdentifyPrimaryObjects’. A
result window should pop up with three images: the Original image, the segmented Nuclei and the
Nuclei outlines.
Examine the results
Use the ‘Zoom-tool’ to select an area to examine more closely. You can make the selection in any
of the three images. You can zoom in more by reapplying the ‘Zoom-tool’. You can use the
‘Pan-tool’ to move around the image. The ‘Back-arrow’ and ‘Forward-arrow’ allow you to undo or redo
zooms and movements. If you get lost you can always press the ‘Home-button’.
In the Nuclei image each color represents a separate object. When two objects are touching, but
identified as separate, the objects will appear as distinct colors. In the Nuclei outlines image green
outlines highlight valid objects, yellow indicates invalid objects touching the image border and red
indicates objects that are invalid based on the size criterion.
Improve identification of primary objects
Try adjusting the settings in the module to improve the result. You want the outlines to
match the nuclei boundaries, and to separate touching objects while at the same time not splitting a
nucleus into separate objects. For example, the automated thresholding algorithm finds a threshold
that includes a bit too much background. At ‘Select the thresholding method’ change to ‘MoG
Global’ from the drop-down-list and set the ‘Approximate fraction of image covered by objects’
to 0.2 to get a more exact outline. Click on ‘Step’ again to see the result from your new settings.
Identify secondary objects – the Cells
When you are satisfied with the segmentation of the nuclei, it is time to find the entire cell using
‘IdentifySecondaryObjects’. Add this module to the pipeline. Select ‘rawGFP’ as input image,
choose a descriptive name for the secondary objects, ‘Cells’, and enter it at ‘Name the
objects to be identified’. Click on ‘Step’ again to see what the result looks like when using the default
settings. By default secondary objects are identified with the method ‘Propagation’, meaning that
cell outlines are defined by propagating the nuclear mask until an intensity threshold in the GFP
image is reached. For this assay, the intensity of GFP in the cytoplasm varies depending on
treatment. As this is the change we want to measure, it is better to have a segmentation method that
is not dependent on the GFP stain. Instead, at ‘Select the method to identify the secondary
objects’ select ‘Distance – N’. Adjust ‘Number of pixels by which to expand the primary
objects’ to 10 pixels.
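To see why this kind of expansion does not depend on the GFP stain, here is a minimal Python sketch of growing labelled nuclei by a fixed number of pixels. It is an illustration only, not CellProfiler's own code: it assumes 4-connected growth (city-block distance), and the function name `expand_labels` is made up for this example.

```python
from collections import deque


def expand_labels(labels, n_pixels):
    """Grow each labelled object by up to n_pixels, ignoring image intensity.

    labels is a list of rows; 0 is background, positive integers are objects.
    Growth uses 4-connected steps, a simplification chosen for this sketch.
    """
    height, width = len(labels), len(labels[0])
    expanded = [row[:] for row in labels]
    # Seed the search with every labelled pixel at distance 0.
    queue = deque(
        (r, c, 0)
        for r in range(height)
        for c in range(width)
        if labels[r][c] != 0
    )
    while queue:
        r, c, dist = queue.popleft()
        if dist == n_pixels:
            continue  # this pixel is already at the maximum distance
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < height and 0 <= cc < width and expanded[rr][cc] == 0:
                # The first object to reach a background pixel claims it.
                expanded[rr][cc] = expanded[r][c]
                queue.append((rr, cc, dist + 1))
    return expanded


nuclei = [[1, 0, 0, 0, 0, 0, 2]]
print(expand_labels(nuclei, 2))  # [[1, 1, 1, 0, 2, 2, 2]]
```

Only the positions of the nuclei decide where the cell borders end up, so a dim or bright cytoplasm gives the same outline.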
Identify tertiary objects – the Cytoplasms
Once we have identified the nucleus and expanded the region to identify the entire cell we can use
these two objects: ‘Nuclei’ and ‘Cells’ to define the cytoplasm of the cells. Add the module
‘IdentifyTertiaryObjects’ to the pipeline. This module will take the smaller identified objects and
remove them from the larger identified objects. We want to "subtract" the ‘Nuclei’ from the ‘Cells’,
which will leave just the ‘Cytoplasms’. Select the objects from the drop-down-lists and enter the
name ‘Cytoplasm’ for the tertiary objects (‘CytoplasmOutlines’ is the name used for their outlines below).
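The "subtraction" can be pictured with a small Python sketch on label images, where 0 is background and each cell shares its number with its nucleus. This is a simplified illustration rather than the module's actual implementation, and `subtract_objects` is a made-up name.

```python
def subtract_objects(cells, nuclei):
    """Keep each cell label only where no nucleus is present."""
    return [
        [cell if nucleus == 0 else 0 for cell, nucleus in zip(cell_row, nucleus_row)]
        for cell_row, nucleus_row in zip(cells, nuclei)
    ]


cells = [[1, 1, 1, 0, 2, 2, 2]]
nuclei = [[0, 1, 0, 0, 0, 2, 0]]
print(subtract_objects(cells, nuclei))  # [[1, 0, 1, 0, 2, 0, 2]]
```

What remains is a ring around each nucleus that keeps the number of the cell it belongs to.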
Save outlines of cytoplasm
It is often useful to save the result of the segmentation to be able to refer back to it when validating
the results. In this case we only save the outlines of the cytoplasms as they reveal outlines of both
cells and nuclei. In ‘IdentifyTertiaryObjects’, make sure you click ‘Retain outlines of tertiary
objects’. From ‘File Processing’, select the ‘SaveImages’ module. Set ‘Select the type of image to
save’ to ‘Image’, select ‘CytoplasmOutlines’ and set ‘construct file names’ to ‘From image filename’.
Set the file prefix to ‘rawDNA’, the suffix to ‘outlines’, the file format to png, and save to the
‘Default Output Folder’. Save ‘Every cycle’, as ‘Grayscale’, and make sure you click ‘Store file and
path information to the saved image’. This information will be used by the database when running
CellProfiler Analyst.
Measurements
The next step is to make measurements. From the module category ‘Measurements’ add the
module for ‘MeasureObjectIntensity’. We want to measure the ‘rawGFP’ intensity in the objects
‘Nuclei’ and, using ‘Add another object’, ‘Cytoplasm’. We also want to add ‘MeasureCorrelation’
and measure, ‘Within object’, the correlation between the nuclear stain, ‘rawDNA’, and the GFP
signal, ‘rawGFP’, within each of ‘Nuclei’, ‘Cytoplasm’, and ‘Cells’. As we want to study transportation
of GFP from the cytoplasm to the nucleus it may be interesting to look at the ratio of cytoplasmic
stain to nuclear stain. Therefore, add ‘CalculateMath’ from the ‘Data Tools’ category and ‘Name the
output measurement’ ‘IntensityRatio’. Adjust settings to divide the mean intensity of GFP signal in
each cytoplasm by the mean intensity in each nucleus.
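The ratio itself is simple arithmetic. The Python sketch below computes it for one cell from a GFP image and two label images; the function names and the tiny example image are invented for illustration and are not part of CellProfiler.

```python
def mean_intensity(image, labels, label):
    """Mean pixel value of image over the pixels where labels equals label."""
    values = [
        pixel
        for image_row, label_row in zip(image, labels)
        for pixel, lab in zip(image_row, label_row)
        if lab == label
    ]
    return sum(values) / len(values)


def intensity_ratio(gfp, cytoplasm, nuclei, label):
    """Cytoplasmic mean GFP divided by nuclear mean GFP for one cell."""
    return mean_intensity(gfp, cytoplasm, label) / mean_intensity(gfp, nuclei, label)


# Made-up one-row image: a bright nucleus flanked by dim cytoplasm.
gfp = [[0.2, 0.8, 0.2]]
cytoplasm = [[1, 0, 1]]
nuclei = [[0, 1, 0]]
print(intensity_ratio(gfp, cytoplasm, nuclei, 1))  # 0.25
```

A ratio below 1 means the GFP is brighter in the nucleus than in the cytoplasm, and a ratio above 1 means the opposite.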
Export to database
Today, we want to explore the data and use machine learning to classify the cells as having
‘cytoplasmic GFP’ or ‘nuclear GFP’. The machine learning tools are run from CellProfiler Analyst
(CPA), and in order to access the measurements from CPA they have to be saved to a database. To
do this, add the module ‘ExportToDatabase’ found under the ‘Data Tools’ category. Select
database type ‘SQLite’, and click ‘Create a CellProfiler Analyst properties file’, setting plate type to
96, plate metadata to ‘Plate’, and well metadata to ‘Well’. Let the rest of the parameters be at their
default settings.
Run the pipeline
Your pipeline is now ready to run on the full data set of 26 images. Exit the test mode by going to
the top menu and selecting ‘Test’ and ‘Stop test run’. Click on all ‘open eyes’ in the pipeline to
hide the result of each step. This makes the analysis faster (you may keep one open if you want to see
what happens, e.g. in ‘IdentifyPrimaryObjects’). To analyze all images, click the ‘Analyze images’
button in the lower right corner.
CellProfiler Analyst
You can now start CellProfiler Analyst (CPA) to explore the data you have extracted from the cells.
When you start CPA you are asked to select a ‘properties file’. The properties file created by
your CellProfiler pipeline is located in the output folder (named *.properties). The properties file is
a text file describing where the database and the images are located, and how to handle the data.
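If you are curious about what the database contains, you can also open it directly. The Python sketch below lists the tables in an SQLite file using the standard `sqlite3` module. The handout does not state the database file name or the table names, so the sketch lists whatever is there instead of assuming them, and the path in the comment is a placeholder.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in an SQLite database file."""
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]


# Example call (placeholder path, look in your output folder for the .db file):
# print(list_tables("path/to/your_database.db"))
```

Check that the path points to an existing file first, since `sqlite3.connect` creates an empty database if the file is missing.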
Visualization
Once the properties file is loaded, click on ‘Plate Viewer’ in the CPA menu. This is an artificial
view of the ‘96-well plate’ from which your images originate. The colored squares represent
wells for which you have data, while the crossed-out wells mean you do not have any measurements
(I only gave you 26 images to save you some time). Currently the color coding represents the image
number, which is not very interesting. Instead, under ‘Measurements’ choose ‘Image_Metadata_Dose’
from the drop-down list, and you will see the amount of drug added to each well (hover over the
well to see the actual value). You should see something like the figure to the left.
To access your measurements, try ‘Image_Count_Nuclei’, and you will see that the number of nuclei
varies per image. We can also access our measurement ‘IntensityRatio’. The wells are directly linked
to the image data. Click on a well and the corresponding image with outlines as shown here should
pop up. To make the display correct, change the color for each channel by selecting colors in the
top menu (rawDNA blue and rawGFP green, Outlines to any other color). The same settings will be
kept for all subsequent images you open.
Classify and train classifier
Now select ‘Classifier’ from the CPA menu. Click on ‘Fetch!’ to have CPA select a number of random
cells from the experiment. Now drag and drop some positive (GFP in nucleus) and negative (GFP in
cytoplasm) cells into the corresponding bins (you may want to turn off the CytoplasmOutlines to
better see the image data).
If you are not sure about GFP location in a cell, do not use it for training. Once you have a few cells
in each bin, click ‘Train Classifier’.
Examine rules
Now look at the rules found based on your samples. Note that the classifier uses all available
measurements, including the position and object number of each cell. If you for example
notice that ‘Nuclei_Number_Object_Number’ or ‘Nuclei_Location_Center_Y’ is one of the
features used by the classification rules, you can be sure that the classifier will fail, as these
measurements are not correlated with the phenotype we want to find. Either train on more cells, or
(much better) open the properties file in a text editor and scroll down to
‘classifier_ignore_columns’ and add .*_Object_Number, .*_Location_Center.* to the list of
measurements that should be ignored by the classifier. To load this new properties file you have to
re-start CPA.
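The entries in ‘classifier_ignore_columns’ are regular expressions. To check which measurement names a pattern would catch before you edit the file, you can try it in Python. The sketch below assumes that each pattern has to match the whole column name, which may differ from how CPA applies them internally; the third column name is only an example of a measurement you would want to keep.

```python
import re

# The two patterns from the text above.
IGNORE_PATTERNS = [r".*_Object_Number", r".*_Location_Center.*"]


def is_ignored(column, patterns=IGNORE_PATTERNS):
    """True if the whole column name matches any of the ignore patterns."""
    return any(re.fullmatch(pattern, column) for pattern in patterns)


for column in (
    "Nuclei_Number_Object_Number",
    "Nuclei_Location_Center_Y",
    "Nuclei_Intensity_MeanIntensity_rawGFP",  # example name, assumed for illustration
):
    print(column, is_ignored(column))
```

The first two names are caught by the patterns, while the intensity measurement is left available to the classifier.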
Improve training
When you have a set of rules you can ask the computer to fetch more examples of positive and
negative cells, and drag-and-drop those cells to the corresponding bins to improve the classifier.
You may also ‘Score Image’, and draw examples from the image. Note that image number 1 is a negative
control (most GFP cytoplasmic), and number 2 is a positive control (most GFP in nuclei). Then
‘Train Classifier’ again.
Score all
Now, test how it does on the complete data set: press ‘Score all’ (use default settings), and every
cell in every image will be scored as positive or negative by the classifier you built. A ‘Hit table’
pops up containing the summarized scores for every image. Double-click on the star next to an
image number to display the corresponding image. Click on ‘Classify’ and ‘Classify Image’ at the
top of the image to see the class of each cell. If you see errors, drag and drop cells directly from the
image to the ‘Classifier bins’ and ‘Train Classifier’ again.
Save the results to database
Select the ‘Hit table’ window and save the results to the database by going to the top menu and
selecting ‘File’ and ‘Save table to database’ (choose a short name, e.g. ‘HitTable’, no path, and
‘Store permanently’).
Visualize the results
Then go to Plate Viewer and choose ‘OTHER TABLE’ as ‘Data source’, and select your saved
results (choose ‘per-well’ and ‘Plate_ID’ to match ‘Image_Metadata_Plate’ and
‘Image_Metadata_Well’ to match ‘Image_Metadata_Well’).
Then select ‘pEnriched_positive’ as ‘Measurement’. If you’ve done things correctly it should look
something like the figure to the left. Compare this to the dose shown above (by opening a new Plate
Viewer window if you closed it). If the patterns are similar, you’ve done a good job training the
classifier. If you see errors, click on a well to see the image, click on ‘Classify’ and ‘Classify
Image’ and drag and drop cells directly from the image to the ‘Classifier bins’ and ‘Train
Classifier’ again.
Plot the results
You can easily plot your data in several ways with CellProfiler Analyst. This time we will use
‘ScatterPlot’ to plot a dose-response curve. We want to see how the ratio of positive cells increases
with dose. At ‘x-axis’ select ‘Per_Image’ and ‘Image_Metadata_Dose’ from the drop-down-lists.
At ‘y-axis’ select the results you saved from the ‘HitTable’ and ‘pEnriched_positive’ from the
drop-down-lists. Click ‘Update Chart’ to see the scatter plot. The dose-response can be more easily
interpreted by changing the ‘x-axis’ ‘scale’ from ‘linear’ to ‘log’ from the drop-down list.
Lab report
To show that you completed the exercise, please answer the following questions and send by email
to [email protected]. Please remember to write the name of your lab partner in the email.
Q1: Which of Channel 1 and Channel 2 shows the DNA stain?
Q2: How many positive and negative control images do we have?
Q3: By default, the image is smoothed by a smoothing filter defined by the value given as minimum
object size. What happens with the separation of the nuclei if you choose not to automatically
calculate the size of the smoothing filter, and instead increase the smoothing filter size to 20?
Q4: According to the help (reached by pressing ‘?’) how does ‘Distance – N’ define the cytoplasm?
Q5: Look at the final scatter plot. What is the lowest dose (Image_Metadata_Dose) that results in a
cellular response (pEnriched_positive) similar to the maximum dose?
Learn more?
Download sample pipelines from www.cellprofiler.org: click ‘Getting started’ and then ‘Example pipelines’.