Instructions for Isolating One Median from the Sea of

Instructions for Isolating One Median from the Sea of Data
Introduction
A print includes, at a minimum, quadruplicates of each antigen so we need a way of compressing all this
information into one numerical value for statistical analyses. Inevitably, compressing the data leads to loss
of information so we use a scheme that minimizes the loss of useful information. To this end, we take the
median value of a unique antigen. The median is an excellent representation of the average value for data
that is not normally distributed and reduces skewing do to outliers in the data.
Which Column?
GenePix already calculated the median, mean, standard deviation, ratio, and many other statistical
parameters during the analysis section of adding a grid to the slide. In addition, GenePix 5.0 outputs
information from the GAL file, settings, and flags. In all, it’s a sea of information, and for a person doing
their first analysis, swimming may not look like an attractive option.
Fortunately, we use a simple strategy in our statistical analyses. This strategy uses the median value of the
fluorescent color used with the secondary antibody. For example, if a cy3-conjugated antibody was used,
then we want to collect the F532 Median – B532 column (column AX in Excel). Alternatively, if a cy5conjugated antibody was used, then we want to collect the F635 Median – B635 column (column AW in
Excel).
Consolidating All the Slides into One Sheet
Launch Microsoft Excel
Select File, Open… (Ctrl+O)
Select a slide from the GenePix analysis (see Howto Grid µArray Images.doc)
Select the Name column (column G in Excel)
Select Edit, Copy (Ctrl+C)
Select File, New… (Ctrl+N)
I’ll call the new spreadsheet, spreadsheet A, and the old spreadsheet, spreadsheet X.
Select Edit, Paste (Ctrl+V) in the first column (column A).
Rename the label Name to something such as Unique ID (later programs don’t like the header Name).
Return to spreadsheet X.
Select the appropriate FXXX Median – BXXX column (depends on color)
Select Edit, Copy (Ctrl+C)
Return to spreadsheet A.
Select Edit, Paste (Ctrl+V) in the second column (column B).
Rename the label FXXX Median – BXXX to something more descriptive of the sample and slide #.
Select File, Open… (Ctrl+O)
Select the next slide from the GenePix analysis
Repeat the procedure for adding the Median column from spreadsheet X to spreadsheet A until you have
one file containing all the slides from the scan/analysis.
Select File, Save As
Name the Excel file with a description that represents the state of the analysis. For example, you might
name the file
slXX-YY_print_YY_MM_DD_medians.xls
The first part in the data compression process is complete.
Sorting All the Median Values
Select File, New… (Ctrl+N)
I’ll call the new spreadsheet, spreadsheet B, and the spreadsheet with all the slides, spreadsheet A.
Highlight the entire data set (Ctrl+A) from spreadsheet A.
Select Edit, Copy (Ctrl+C)
Return to spreadsheet B.
Select Edit, Paste (Ctrl+V)
See the images below for the final product of the next few steps*.
Delete empty rows above headers.
Insert a row between 1 & 2 (beneath the row of sample/slide names)
Number the empty cells beneath each slide with a number that starts with one and increases to n, where
n is the total number of slides in the experiment.
Move the Unique ID to cell A2.
Highlight cells A3 – MN (where M is the final data column and N is the final data row)
Select Data, Sort…
Select Column A, Check Ascending, Check No Header Row, and Click OK
This step groups all the replicates together (see sorted image).
Unsorted Image
*
Sorted Image
I reference cells by their row (numbers) and their column (letters). For example, A1 is the cell for the first row and first column.
Consolidating All the Median Values into One Median
The algorithm to consolidate the one or two thousand total antigens into only the unique antigens uses
functions that the average Excel user may or may not be familiar with. I’ll go slowly and clarify each step
along the way.
Please refer to the image below because I rely on the cell names and numbers in this explanation. This
particular experiment contains 26 samples, spanning columns B – AA. I hid columns E – AA to fit all the
relevant information onto a reasonably sized image.
Highlight cells A1 – AA2
Select Edit, Copy (Ctrl+C)
Select cell AD1
Select Edit, Paste (Ctrl+V)
Enter the number 1 in cell AB2
Enter the following algorithm in cell AB3
=IF((EXACT($A3,$A4)),1+AB2,1)
Select cell AB3
Select Edit, Copy (Ctrl+C)
Highlight cells AB3 – ABn†
Select Edit, Paste (Ctrl+V)
Select cell AC3
Enter the number 1 in cell AC3
Select cell AC4
Enter the following algorithm in cell AC4
=AC3+1
Select cell AC4
Select Edit, Copy (Ctrl+C)
Highlight cells AC4 – ACn
Select Edit, Paste (Ctrl+V)
Select cell AD3
Enter the following algorithm in cell AD3
=IF((EXACT($A3,$A4))," ",A3)
Select cell AD4
Select Edit, Copy (Ctrl+C)
Highlight cells AD4 – ADn
Select Edit, Paste (Ctrl+V)
Select cell AE3
Enter the following algorithm in cell AE3
=IF((EXACT($A3,$A4))," ",MEDIAN(OFFSET($A$3,($AC3-$AB2),AE$2,1,1):OFFSET($A$3,$AC2,AE$2,1,1)))
Select cell AE4
Select Edit, Copy (Ctrl+C)
Highlight cells AE4 – BDn
Select Edit, Paste (Ctrl+V)
The second part in the data compression process is complete.
Eliminating the Extra Space
Highlight cells AD1 – BDn
Select Edit, Copy (Ctrl+C)
Select Sheet2
Right click on cell A1 and select Paste Special…
Select Values and click OK
Highlight cells A3 – AAn
Select Data, Sort…
Select Column A, Check Descending, Check No Header Row, and Click OK
Scroll down to row n and make sure all the antigens are grouped continuously. If they aren’t, then delete
the intervening rows. Save this file and the data compression is now complete!
†
n represents the final row of the complete data set
Brief Explanation of the Algorithm
There are two main parts to this algorithm. The first part searches through the list of antigen names looking
for duplications. The second part determines the number of replicates for a particular antigen then
computes the median value of all the replicates.
The first algorithm uses the EXACT function to search for exact string matches. I ask the question, is this
name the same as the previous one. If the answer is yes, then I add a blank to the cell and move on. If the
answer is no, then I add the name to the cell and move on.
The second algorithm uses the EXACT function in the same manner as the first algorithm. In addition, the
second algorithm also uses the OFFSET and MEDIAN functions. The OFFSET function allows me to
include all the cells that particular group of replicates occupy. The MEDIAN function computes the
median value of this set.

Download Report

Instructions for Isolating One Median from the Sea of

Paperzz.com

Your Paperzz