Kimberly F. Sellers (1)
Jeffrey Miecznikowski (2)
Surya Viswanathan (3)
Jonathan S. Minden (4)
William F. Eddy (4, 5, 6)
(1) Department of Mathematics, Georgetown University, Washington, DC, USA
(2) Department of Biostatistics, University of Buffalo, Buffalo, NY, USA
(3) Proteopure Incorporated, Pittsburgh, PA, USA
(4) Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
(5) School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
(6) Department of Statistics, Carnegie Mellon University, Pittsburgh, PA, USA
Electrophoresis 2007, 18, 3324–3332
Research Article
Lights, Camera, Action!
Systematic variation in 2-D difference gel
electrophoresis images
2-D Difference gel electrophoresis (DIGE) circumvents many of the problems associated
with gel comparison via the traditional 2-DE approach. DIGE’s accuracy and precision,
however, are compromised by the existence of other significant sources of systematic variation, including that caused by the apparatus used for imaging proteins (location of the
camera and lighting units, background material, imperfections within that material, etc.).
Through a series of experiments, we estimate some of these factors, and account for their
effect on the DIGE experimental data, thus providing improved estimates of the true
relative protein intensities. The model presented here includes 2-DE images as a special
case.
Keywords:
ANOVA / 2-DE / Normalization / Systematic variation
DOI 10.1002/elps.200600793
Received December 5, 2006
Revised January 5, 2007
Accepted February 23, 2007
1 Introduction
Ünlü et al. [1] developed a novel modified 2-DE technique
called 2-D difference gel electrophoresis (DIGE), which circumvents many of the problems associated with comparison
of separate gels. In 2-D DIGE, two different protein samples
are pre-labeled with a pair of matched cyanine dyes which
fluoresce at different wavelengths of light but have identical
charge and almost identical mass, thus allowing two different samples to be run within one gel mixture in both
dimensions. Because the two samples have been subjected to
essentially the same environment during separation, the
“between-gel” variation is significantly reduced; however, the
“within-gel” variation that results from the imaging process
is essentially unaffected [2,3].
This paper focuses on the systematic variation which
occurs due to the DIGE data imaging process and the system
used to image the proteins; for example, see Figs. 1 and 2.
Correspondence: Professor Kimberly F. Sellers, Department of
Mathematics, Georgetown University, Washington, DC 20057,
USA
E-mail: [email protected]
Fax: +1-202-687-6067
Abbreviations: DIGE, difference gel electrophoresis; MS, mean
square; MSE, mean squared error; SST, total sum of squares;
SSE, error sum of squares
© 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
The purpose is to aid in the automatic detection of differential protein expression and modification. This problem’s
difficulty lies in developing a model that accurately represents the complexity of the experimental conditions. Only
through extensive experimentation and deliberation have we
successfully identified and modelled several nuisance effects
influencing the overall experimental data. Section 2
describes the imaging apparatus and the sequence of steps
performed in creating a protein image, thus helping to
establish a statistical model that describes the effect caused
by the system components. The general approach is widely
applicable, while the specific model is appropriate for this
particular type of imaging system.
2 Materials and methods
2.1 System description
The fluorescence gel imager contains a cabinet and platform,
and a scientific-grade CCD camera is situated and focused
approximately five inches above the platform with a lamp approximately three inches away from the camera on each side.
There are several machines of this specific type, while other
commercial fluorescent gel imagers exist (e.g. a laser scanner).
The camera image area is approximately 35 mm × 35 mm,
while a gel is approximately 140 mm × 175 mm. Thus, to
www.electrophoresis-journal.com
Figure 1. DIGE experiment image excited at Cy3 wavelength. A
substantial number of nuisance effects (of considerable size)
contribute to the images. This paper identifies and removes the
most prominent factors, to isolate the true relative protein intensity corresponding to this and the complementary Cy5 light
source.
Figure 2. DIGE experiment image excited at Cy5 wavelength. A
substantial number of nuisance effects (of considerable size)
contribute to the images. This paper identifies and removes the
most prominent factors, to isolate the true relative protein intensity corresponding to this and the complementary Cy3 light
source.
capture an image of the whole gel, a 5 × 4 array of tiled images is taken; we will henceforth refer to one of the 35 mm
square camera (sub-)images as a “tile.” The platform moves
so that the tiles are collected in boustrophedonic order (the
first row from left to right, the second row from right to left,
Proteomics and 2-DE
etc.). Each tile is 256 pixels × 256 pixels, so the entire image of
20 tiles is 1280 pixels × 1024 pixels, and each pixel covers approximately 135 μm × 135 μm of the gel. Beginning with Tile 1, the camera images
the DIGE sample using the Cy3 filter, then changes filters
and images the same sample under the Cy5 light. After both
images have been recorded, the platform moves in position
to image Tile 2, first under the Cy3 light and then under the
Cy5 light. A typical exposure time for DIGE images is 30 s for
each tile under each lighting filter. Meanwhile, the transition
time to move from one tile to the next is about 5 s. Thus,
the total time to create the DIGE images corresponding to
both cyanine dyes is approximately 25 min. For completeness, we tested for a change in pixel intensity by imaging the
gel from bottom to top, and by reversing the lighting filter
order, and did not find a statistically significant difference in
pixel intensity in either case.
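The tiling scheme described above can be sketched in a few lines. This is only an illustration, assuming that tiles are numbered 1 to 20 in acquisition order, that the array is 5 tiles across and 4 tiles down (consistent with a 1280 × 1024 pixel image), and that the first row is at the top. The names `tile_position` and `global_pixel` are hypothetical and are not part of the imager software.

```python
TILE = 256              # pixels per tile side
N_COLS, N_ROWS = 5, 4   # assumed layout: 5 tiles across, 4 tiles down

def tile_position(k):
    """Return 0-based (tile_row, tile_col) for acquisition index k = 1..20."""
    if not 1 <= k <= N_ROWS * N_COLS:
        raise ValueError("tile index out of range")
    row, offset = divmod(k - 1, N_COLS)
    # Boustrophedonic order: even rows run left to right, odd rows right to left.
    col = offset if row % 2 == 0 else N_COLS - 1 - offset
    return row, col

def global_pixel(i, j, k):
    """Map 1-based within-tile pixel (i, j) in tile k to 0-based image coordinates."""
    row, col = tile_position(k)
    return row * TILE + (i - 1), col * TILE + (j - 1)

# Acquisition order of all 20 tiles as (tile_row, tile_col) pairs.
order = [tile_position(k) for k in range(1, N_ROWS * N_COLS + 1)]
```

Under these assumptions, Tile 5 ends the first row on the right and Tile 6 starts the second row directly beneath it, so the platform never has to travel back across the whole gel.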
This work focuses on image analysis using CCD-based
systems because of their inherent qualities that make such a
framework appealing. CCD-based systems are more versatile
than scanners with respect to illumination wavelength. Scanners rely on the wavelengths available for the lasers used in the
system, and on photomultipliers which amplify the photon
signal by creating an electron cascade. This response is adjusted by setting the gain of the photomultiplier tube (PMT). This
means that the system response to light is adjustable (and
inherently variable). Thus, the number of counts from one
experiment to the next may drift for PMTs. The CCD, however,
does not amplify the photon signal, but instead relies on the
ability of photons to mobilize electrons in the detector’s solid
state SiO material. Further, the number of counts per photon (or
electron) for CCD systems is very stable over time.
A disadvantage of the CCD system is the limited stability of the
light source. The halogen lamps take time to warm up and
stabilize. The halogen lamps also have a relatively short lifespan, while lasers are more stable and long-lived. The experimental design described, however, takes this matter into
account and accepts the tradeoff because of the more significant structural benefits. The gel in the CCD-based system is
placed in an open tray, which provides access for a gel cutting
robot to remove protein spots for mass spectrometric analysis.
The scanners, however, image the gel between glass plates,
which does not allow one to remove protein spots. As a result,
any subsequent sample preparation for mass spectrometric
analysis would be further hindered by scientist involvement.
Further, it is unclear what technical artifacts may be introduced due to the glass plates. An analogous modeling
approach is essential for laser scanner systems to address the
inherent issues pertaining to this technology. We remind the
reader that the overarching scientific goal is to introduce an
automated approach for protein analysis.
2.2 Overview: The model
Let l = {3, 5} denote the light filter setting to Cy3 or Cy5,
respectively, and let $\bar l$ denote the complementary light setting.
There are numerous experimental factors that can add to the
K. F. Sellers et al.
true relative protein intensities. Let Yl denote the intensity
outcome from a DIGE experimental gel imaged under light
source l. This image has an increased intensity reading due
to the CCD camera bias inherent in the system. The background intensity induced by the platform and consistency of
the gel also contributes to Yl. To normalize the two protein
images, we must also account for the difference in intensity
due to the effect of the lighting filter source exciting the
associated dye. Finally, because both protein samples are
present on the same gel, there is a multiplicative effect from
the latent protein sample (hereafter referred to as the bleed-through effect) that represents an additive relative protein
contribution from the complementary sample when imaging the gel under light source l.
With the above considerations in mind, the following
model is believed to describe the systematic effects:
$$Y_l = b + t\,(d_l + r_l + a_l\,p_l + k_{\bar l}\,a_{\bar l}\,p_{\bar l}) + e_l \qquad (1)$$
where b represents the camera bias, t denotes the exposure
time associated with the image, $d_l$ is the effect of the lighting
system on the platform under light l, $r_l$ measures the additional light effect that is introduced by gel excitation under
light l, $a_l$ and $a_{\bar l}$ are the lighting effects when a uniformly
labeled gel is excited with its corresponding light, k denotes
the bleed-through effect, $p_l$ and $p_{\bar l}$ represent the true relative
intensities (on a 0-1 scale) of the proteins in the two samples, respectively, and e is
random error. All of the parameters may vary spatially. We let
i and j denote the pixel row and column location on the CCD,
respectively, within tile k; i = 1, . . ., 256; j = 1, . . ., 256; k = 1,
. . ., 20. Thus, Equation (1) becomes
$$Y_{l,ijk} = b_{ijk} + t\,(d_{l,ijk} + r_{l,ijk} + a_{l,ijk}\,p_{l,ijk} + k_{\bar l,ijk}\,a_{\bar l,ijk}\,p_{\bar l,ijk}) + e_{l,ijk} \qquad (2)$$
For simplicity, we suppress the (ijk) subscripts when their
presence is implicit. This paper focuses on the two dye, two
light framework: $a_l p_l$ is the effect due to the protein when the
light matches the dye, and $a_{\bar l} p_{\bar l}$ refers to the light intensity
caused by the complementary light-dye relationship.
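To make the roles of the terms in Eq. (1) concrete, the noise-free part of the model can be evaluated for a single pixel. The sketch below is illustrative only: the function name `expected_intensity` and all numerical values are made up for the example and are not estimates from this paper.

```python
def expected_intensity(b, t, d, r, a_own, p_own, k_other, a_other, p_other):
    """Noise-free part of Eq. (1) for one pixel imaged under one light.

    a_own, p_own  : lighting effect and relative intensity for the matching dye
    k_other, a_other, p_other : bleed-through coefficient, lighting effect and
                                relative intensity for the complementary dye
    """
    return b + t * (d + r + a_own * p_own + k_other * a_other * p_other)

# Made-up values for one pixel under Cy3 light (DIGE: both samples present).
y3_dige = expected_intensity(b=500.0, t=30.0, d=50.0, r=0.3,
                             a_own=4000.0, p_own=0.10,
                             k_other=0.06, a_other=5000.0, p_other=0.20)

# Same pixel with no complementary sample (p_other = 0): the bleed-through
# term vanishes and only the matching-dye contribution remains.
y3_single = expected_intensity(b=500.0, t=30.0, d=50.0, r=0.3,
                               a_own=4000.0, p_own=0.10,
                               k_other=0.06, a_other=5000.0, p_other=0.0)
```

The difference between the two values is exactly t times the bleed-through term, which is the quantity the bleed-through experiment is designed to isolate.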
Our ultimate goal is to estimate the sample relative
intensities, p3 and p5, for the proteins when excited under
Cy3 and Cy5, respectively, from a single gel. Thus, we consider two models of the form (2) where, given DIGE images
Y3 and Y5, we want to estimate p3 and p5 and compare those
resulting images because they are on a comparable intensity
scale. Any spots that appear more abundant in one image
than in the other imply that the associated protein is differentially expressed, whereas spots that statistically significantly change in location signify differential modification.
Although these issues are not addressed here, they are noted
to demonstrate this work’s broad, significant impact.
All terms in the model, other than p3 and p5, are nuisance
parameters of considerable size, contributing to the complexity of the model. The model has been developed iteratively;
examination of residuals at each stage has suggested further
experimentation and the form of additional terms in the
model. In contrast, most biologists simply take a picture of the
platform for a time equal to that for the DIGE experiment and
subtract this result from the experimental data to determine
the “preprocessed data”. Our work goes above and beyond that
which is currently practiced, attributing the variation present
in the data to specific sources that are not considered via the
standard preprocessing technique. The model was not developed for the specific datasets; the values apply only to this
particular dataset but the general model is applicable to all of
the images we have examined. Note that this model can also
be applied to standard 2-DE images. 2-DE images are created
by exciting a labeled gel with its corresponding light source,
e.g. a Cy3 (Cy5) gel excited with a Cy3 (Cy5) light. Thus, under
a particular light source l, these images can be modeled via
Equation (1) with $p_{\bar l} = 0$.
The image data (in the form of fit files) were analyzed
using the computational freeware tool, Functional Image
Analysis Software Computational Olio (FIASCO version 5.1;
http://www.stat.cmu.edu/~fiasco), in conjunction with Splus version 7.0 on a Linux server 2.4.21 (2005). FIASCO is a
collection of software originally designed to analyze fMRI data
using a series of processing steps. Because of its flexibility and
ability to analyze large datasets fairly quickly, however, it is a
good choice for analyzing proteomic data. The resulting processing time to analyze the images is quite efficient at approximately five min. In an effort to quantify the significant
contributors of systematic variation in the data, we used analysis of variance (ANOVA) measures, including the associated
mean-square (MS) of a factor, mean-squared error (MSE),
error sum of squares (SSE), and total sum of squares (SST).
These measures assess the quality of the estimators through
their associated variance. A “large” term MS value implies
statistical significance for that effect because it accounts for a
substantial amount of the total variation in the data, quantified by the total sum of squares. Meanwhile, a small MS value
implies that the effect is not statistically significant.
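The ANOVA quantities named above follow standard definitions, which can be written out directly. The sketch below assumes a generic set of observed pixel values and fitted values; the helper names `sums_of_squares` and `mean_square` are hypothetical and do not reproduce the FIASCO or S-PLUS routines used for the analysis.

```python
def sums_of_squares(observed, fitted, n_params):
    """Return (SST, SSE, MSE) for observed values and their fitted values.

    SST: squared deviations of the data from their overall mean.
    SSE: squared deviations of the data from the fitted values.
    MSE: SSE divided by the error degrees of freedom (n - n_params).
    """
    n = len(observed)
    mean = sum(observed) / n
    sst = sum((y - mean) ** 2 for y in observed)
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return sst, sse, sse / (n - n_params)

def mean_square(sum_sq, df):
    """Mean square of a factor: its sum of squares over its degrees of freedom."""
    return sum_sq / df

# Tiny worked example: four pixels fitted by two group means.
sst, sse, mse = sums_of_squares([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5], n_params=2)
```

A factor whose mean square is large relative to the MSE accounts for much more variation per degree of freedom than the residual noise does.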
2.3 The system of equations
Five experiments are performed (see Table 1) to obtain a
number of images representing different contributions to
the intensity values obtained in the DIGE gel image. Given
the resulting images from these experiments, we consider
the system of equations based on Eq. (1) with the error term removed,
$$
\begin{cases}
Y^{(A)} = b \\
Y^{(B)}_l = b + t^{(B)}\,d_l \\
Y^{(C)}_l = b + t^{(C)}\,(d_l + r_l) \\
Y^{(D)}_l = b + t^{(D)}\,(d_l + r_l + a_l) \\
Y^{(E)}_l = b + t^{(E)}\,(d_l + r_l + k_{\bar l}\,a_{\bar l})
\end{cases}
$$
where the superscripts (A)-(E) denote the various experiments outlined in Table 1. For each equation within the system, t is pre-determined by the scientist; in particular, we set
t(B) = t(C) = t. Thus we have a system of nine equations and
Table 1. Experiments performed (in the order listed in the table) to estimate nuisance parameters

Experiment                    Condition                       Estimated parameter
(A) Camera bias               t = 0                           $b$
(B) Dark field, light l       $r_l = 0$                       $d_l$
(C) No dye, light l           $p_3 = p_5 = 0$                 $r_l$
(D) Bright, light l           $p_l = 1$, $p_{\bar l} = 0$     $a_l$
(E) Bleed through, light l    $p_l = 0$, $p_{\bar l} = 1$     $k_{\bar l}$

Each row in the table corresponds to an experiment or set of
experiments. The parameter in the column labelled “Condition”
is set to the value supplied in the table. The parameter in the last
column is estimated from the experimental results.
nine unknowns ($b, d_3, d_5, r_3, r_5, a_3, a_5, k_3, k_5$), which can be
determined exactly for the nuisance parameters (denoted
$\hat b, \hat d_3, \hat d_5, \hat r_3, \hat r_5, \hat a_3, \hat a_5, \hat k_3, \hat k_5$). To avoid overfitting, however,
we further investigate the systematic variation that exists
within each of the above constructed images. Details regarding subsequent models are supplied in the upcoming sections, and additional images representing the precise parameter estimates are provided at http://www9.georgetown.edu/faculty/kfs7/images.
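The exact solution of the nine-equation system can be obtained by back-substitution, one experiment at a time. The following is a minimal sketch for a single pixel, assuming the bright and bleed-through images each have their own exposure time per light (written here as `tD` and `tE`, names introduced only for this example). The function `solve_nuisance` and all numbers are hypothetical.

```python
OTHER = {3: 5, 5: 3}  # complementary light/dye

def solve_nuisance(yA, yB, yC, yD, yE, tB, tC, tD, tE):
    """Solve the nine equations for one pixel.

    yA, tB, tC are scalars; yB, yC, yD, yE, tD, tE are dicts keyed by light (3 or 5).
    """
    b = yA                                                      # (A): t = 0
    d = {l: (yB[l] - b) / tB for l in (3, 5)}                   # (B)
    r = {l: (yC[l] - b) / tC - d[l] for l in (3, 5)}            # (C)
    a = {l: (yD[l] - b) / tD[l] - d[l] - r[l] for l in (3, 5)}  # (D)
    # (E): the image under light l carries the bleed-through of the other dye.
    k = {OTHER[l]: ((yE[l] - b) / tE[l] - d[l] - r[l]) / a[OTHER[l]]
         for l in (3, 5)}
    return b, d, r, a, k

# Round trip on made-up values: build the five images from known parameters,
# then recover the parameters from the images.
true_d, true_r = {3: 50.0, 5: 25.0}, {3: 0.5, 5: 0.25}
true_a, true_k = {3: 4000.0, 5: 5000.0}, {3: 0.002, 5: 0.06}
tD = tE = {3: 20.0, 5: 15.0}
yA = 500.0
yB = {l: yA + 30.0 * true_d[l] for l in (3, 5)}
yC = {l: yA + 30.0 * (true_d[l] + true_r[l]) for l in (3, 5)}
yD = {l: yA + tD[l] * (true_d[l] + true_r[l] + true_a[l]) for l in (3, 5)}
yE = {l: yA + tE[l] * (true_d[l] + true_r[l] + true_k[OTHER[l]] * true_a[OTHER[l]])
      for l in (3, 5)}
b, d, r, a, k = solve_nuisance(yA, yB, yC, yD, yE, 30.0, 30.0, tD, tE)
```

Because each pixel gets its own nine parameters, this solution reproduces the calibration images exactly, noise included, which is why the estimates are then smoothed with the structured models of the following subsections.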
2.3.1 Camera bias
In order to model the camera bias, we assume no exposure
time, i.e. t = 0 (Experiment A in Table 1). Further, our data
consist of one 1280 × 1024 image of 20 tiles. In light of the
systematic variation present in $\hat b$, we consider the model
$$\hat b_{ijk} = m_b + a_i + b_j + c_{ij} + d_k + e_{ijk} \qquad (3)$$
where $m_b$ is a typical pixel value, $a_i$ and $b_j$ represent the row
and column effects respectively within the CCD, $c_{ij}$ is the
additional CCD (i.e. within-tile) effect, $d_k$ is the tile effect, and
e is the unexplained random error. We estimated the 256
values of $a_i$ (and $b_j$, respectively) by averaging across the row
(over the column) within each tile.
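The row and column averaging can be illustrated on a toy stack of tiles. This sketch assumes that the per-tile row and column means are pooled over tiles and centered on the overall mean; the text does not spell out the pooling, so that step is an assumption, as is the function name `bias_effects`.

```python
def bias_effects(tiles):
    """Estimate the typical value, row effects and column effects of Eq. (3).

    tiles: list of 2-D lists (rows x cols), all of the same shape.
    """
    n_t, n_r, n_c = len(tiles), len(tiles[0]), len(tiles[0][0])
    grand = sum(v for t in tiles for row in t for v in row) / (n_t * n_r * n_c)
    # Row effect: mean of row i over all columns and tiles, minus the grand mean.
    row_eff = [sum(v for t in tiles for v in t[i]) / (n_t * n_c) - grand
               for i in range(n_r)]
    # Column effect: mean of column j over all rows and tiles, minus the grand mean.
    col_eff = [sum(t[i][j] for t in tiles for i in range(n_r)) / (n_t * n_r) - grand
               for j in range(n_c)]
    return grand, row_eff, col_eff

# Toy data: two identical 3 x 3 tiles with a linear row trend and a column shift.
col_shift = [-1.0, 0.0, 1.0]
tiles = [[[100.0 + 2.0 * i + col_shift[j] for j in range(3)] for i in range(3)]
         for _ in range(2)]
grand, row_eff, col_eff = bias_effects(tiles)
```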
2.3.2 Dark field
A “dark field” image is collected with the light on and the filter set to either Cy3 or Cy5, but no gel (and therefore no
protein) present in the device. The platform is then imaged
with an exposure time of approximately 30 s for each tile.
The purpose of such an experiment is to estimate d (Experiment B in Table 1).
Because no dyed gel is considered in this experiment, we
consider Eq. (2) with $r_l = p_l = p_{\bar l} = 0$. We obtain the initial
estimate $\hat d_l$ via Section 2.3 and update it using the model
$$\hat d_{l,ijk} = m_{l,d} + f_{l,ij} + g_{l,k} + h_{l,ijk} + e_{l,ijk}.$$
The dark effect contains a typical pixel value associated with the dark field under
light l ($m_{l,d}$), and the effects of the light $f_{l,ij}$ at pixel (i, j), a
particular tile ($g_{l,k}$), nonuniformities on the platform (e.g.,
scratches), $h_{l,ijk}$, and random error, $e_{l,ijk}$, respectively.
Modeling the dark field image corresponding to Cy3 and
Cy5 occurs in a stepwise fashion. We estimate $m_{l,d}$ by averaging the 1280 × 1024 × 2 = 2621440 pixels from the two dark
field images under light l. After subtracting $\hat m_{l,d}$, we estimate
the effect of the light by averaging the 40 pixels associated
with location (i, j) across tiles for both replicates to get $\hat f_{l,ij}$.
After we subtract this within-tile effect, we average over the
256 × 256 × 2 = 131072 pixels within tile k from both replicates to get $\hat g_{l,k}$, k = 1, . . ., 20. Finally, we average over the
resulting images for each replicate to estimate the platform
effect, $\hat h_{l,ijk}$. For this example, the tile and platform effects are
not statistically significant, although scaling allows for the
defects to be detected by eye; thus, the estimated dark field
image is $\hat{\hat d}_{l,ijk} = \hat m_{l,d} + \hat f_{l,ij}$.
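The stepwise procedure can be sketched on a small synthetic example. The code assumes the dark-field estimates are stored as `reps[r][k][i][j]` (replicate, tile, within-tile row, within-tile column). The function name `stepwise_dark_field` and the toy data are illustrative, and the platform step is read here as an average across replicates.

```python
def stepwise_dark_field(reps):
    """Return (m, f, g, h) from replicated dark-field estimates reps[r][k][i][j]."""
    n_r, n_k = len(reps), len(reps[0])
    n_i, n_j = len(reps[0][0]), len(reps[0][0][0])
    # Step 1: typical pixel value m, averaged over every pixel of every replicate.
    m = sum(reps[r][k][i][j] for r in range(n_r) for k in range(n_k)
            for i in range(n_i) for j in range(n_j)) / (n_r * n_k * n_i * n_j)
    # Step 2: within-tile lighting effect f[i][j], averaged over tiles and replicates.
    f = [[sum(reps[r][k][i][j] - m for r in range(n_r) for k in range(n_k))
          / (n_r * n_k) for j in range(n_j)] for i in range(n_i)]
    # Step 3: tile effect g[k], averaged over pixels and replicates after removing m and f.
    g = [sum(reps[r][k][i][j] - m - f[i][j] for r in range(n_r)
             for i in range(n_i) for j in range(n_j)) / (n_r * n_i * n_j)
         for k in range(n_k)]
    # Step 4: platform effect h[k][i][j], the replicate average of what remains.
    h = [[[sum(reps[r][k][i][j] - m - f[i][j] - g[k] for r in range(n_r)) / n_r
           for j in range(n_j)] for i in range(n_i)] for k in range(n_k)]
    return m, f, g, h

# Toy data: 2 replicates, 2 tiles of 2 x 2 pixels, built from known effects.
f0 = [[-1.0, 1.0], [1.0, -1.0]]
g0 = [-2.0, 2.0]
reps = [[[[10.0 + f0[i][j] + g0[k] for j in range(2)] for i in range(2)]
         for k in range(2)] for _ in range(2)]
m, f, g, h = stepwise_dark_field(reps)
```

Each step averages over exactly the indices the previous steps did not absorb, so in this balanced toy example the known effects are recovered and the platform term is zero.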
2.3.3 Gel images with no protein
In this experiment, a gel is prepared containing no proteins
and imaged with the lights on and filter set to either the Cy3
or Cy5 filter for 30 s (Experiment C in Table 1). Accordingly,
we consider Eq. (2) with $p_l = p_{\bar l} = 0$, and solve for $\hat r_l$ via Section 2.3. The overfit estimates obtained from solving the
system of equations appear approximately constant; therefore, we consider $\hat{\hat r}_{l,ijk} = \hat{\hat r}_l$.
2.3.4 Single-dye gel images
Single gel images are collected with the light on, a special gel
uniformly loaded with protein labeled with only a single dye,
and imaged under a Cy3 and a Cy5 lighting filter. The proteins on this gel are not separated via the electrophoresis
process. “Bright” images refer to single gel images that are
excited under the associated cyanine light filter (for example,
a Cy3-labeled gel excited under a Cy3 light filter), while
“bleed through” refers to those images where a single-dyed
gel is excited under a complementary light source (for
example, a Cy3-labeled gel excited under a Cy5 light filter).
For each light, the exposure time for these images is dependent on the time needed to obtain near saturation in the
“bright” image.
We use Eq. (2) to model these images, where bright images are such that $p_l \equiv 1$ and $p_{\bar l} \equiv 0$ (Experiment D in Table 1),
providing the estimates of $a_l$ and $a_{\bar l}$, respectively, and bleed
through images set $p_l \equiv 0$ and $p_{\bar l} \equiv 1$ (Experiment E in Table
1) to estimate $k_{\bar l,ijk}$, l = {3, 5}.
2.3.5 Bright coefficients
In order to estimate this parameter, we isolate the middle
3 × 2 matrix of tiles from the full images to circumvent any
edge-effect concerns (we remind the reader that the goal is to
automate a procedure for image preprocessing). We average
the six tiles to determine $\hat{\hat a}_{l,ij}$, because no statistically significant tile effect appears to be present.
2.3.6 Bleed coefficients
To estimate the bleed coefficients, we collect images made
with the same single dye gel used to estimate the bright
coefficients, and imaged with the complementary light (e.g.,
Cy3 dye, Cy5 light). The goal here is to model the added effect
caused by having both dyes excited under a particular light.
We determine an estimate for $k_{\bar l,ijk}$ via $\hat k_{\bar l,ijk}$, the exact solution obtained from Section 2.3.
The $\hat k_{\bar l,ijk}$ image is approximately constant spatially; thus,
we average over the inner 3 × 2 matrix of tiles (to avoid contributions from random gel effects) to obtain $\hat{\hat k}_{\bar l,ijk}$.
3 Results and discussion
3.1 Camera bias
In an effort to model the row effect via the $a_i$ values, we
recognized a linear trend in the data (see Fig. 3) due to the
CCD readout process. In particular, while earlier rows of data
are being read by the CCD, additional dark current accumulates, with the later rows having larger accumulations than
the earlier rows. We modeled this by assuming that $a_i = a \cdot i$.
Our estimate for the row effect is $\hat a \approx 0.017$, implying an
estimated dark current accumulation of 0.68 electrons/row/s.
This parameter describes the largest amount of systematic variation, with MS = 2163479; see Table 2. The $\hat b_j$
estimates, however, deviated from the analogous trend on
both sides of the CCD, most strikingly on the left-hand
side of the CCD; see Fig. 4. Thus, we retain 256 individual estimates, $\hat b_j$, to model the transfer toward the
output amplifier. After removing the row and column
effects, we noted that the additional CCD effect $c_{ij}$ and
tiling effect $d_k$ were not significant. The resulting model
error image has a random white-noise pattern with MSE
= 8.02; see Table 2. Thus we conclude that an adequate
fit for the bias is $\hat{\hat b}_{ij} = 507.9 + 0.017\,i + \hat b_j$; see Fig. 5.
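The linear row trend can be fitted with an ordinary least-squares line through the row means. The text does not state which fitting method was used, so least squares is an assumption here, and the data below are noise-free synthetic values generated from the reported fit rather than the measured row means.

```python
def ols_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic row means following the reported trend (no noise, no column effect).
rows = list(range(1, 257))
row_means = [507.9 + 0.017 * i for i in rows]
slope, intercept = ols_line(rows, row_means)
```

With real data the same call would be applied to the within-tile row means of Fig. 3, and the residuals from the line would then be inspected for any remaining row structure.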
3.2 Dark field
Our estimate for m3,d is approximately 48.856 counts/s, while
the estimate for m5,d is approximately 23.800 counts/s. A
significant lighting effect exists in both cases. For this imager, the Cy3 lighting effect can be roughly described as a
convex, 3-D paraboloid such that the center of each tile contains the lowest intensities, and the values increase with the
Euclidean distance from the center. The Cy5 lighting effect,
in contrast, can be described as a convex parabola in the
y-direction, crossed with a constant function in the x-direction so that the intensities associated with the middle rows
contain the lowest values, and they increase systematically as
the distance to the top and bottom borders of the tile decreases. We do not have a physical explanation for these patterns.
There was no statistically significant tile effect associated
with either model. The platform contains dust and scratches
on the glass surface, as detected by the associated platform
estimate. The imaging process does not occur in a dust-free
environment, therefore dust falls onto the gel and is included
in the image. The scratches on the glass appear to be due to
the cleaning process administered by the operators; they use
detergent and paper towels to scrub and cleanse the platform
surface between uses. Although this factor is easily explained,
its effect is not statistically significant, therefore it is not included in the model. The resulting residual images therefore
include the nonsignificant tiling effect and platform effects.
We conclude that an adequate fit for the Cy3 dark field is
$\hat{\hat d}_{3,ijk} = 48.856 + \hat f_{3,ij}$ (shown in Fig. 6). The SST associated
Figure 3. Scatterplot of the 5120 within-tile row means versus row number. We
recognize a linear trend in the data – while
earlier rows of data are being read by the
CCD, additional dark current accumulates
causing larger intensity readings associated with later rows. The estimated
slope is 0.017, corresponding to the dark
current of 0.68 electrons/row/s. This trend
accounts for the largest amount of systematic variation regarding the camera
bias.
Figure 4. Scatterplot of the 5120 within-tile column means versus column number. The assumption was that a linear
trend would be detected with respect to
column number, an analogous result to
that of the row result. The most striking
feature, however, is the reduced intensity
within the three columns on the left of
the CCD and a similar but smaller effect
on the right. Thus, the individual column
estimates were retained to model the
transfer toward the output amplifier.
Table 2. Analysis of variance table for camera bias

Source              df         Sum of squares    Mean square
Outliers            10         12386297.0        1238629.70
Row gradient        1          2163479.0         2163479.00
Column direction    255        84410.0           331.02
Error               1310453    10506868.0        8.02
Corrected total     1310719    25141054.0
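The entries of Table 2 can be cross-checked with simple arithmetic: each mean square is the sum of squares divided by its degrees of freedom, and the component rows should add up to the corrected total. The dictionary layout below is only a convenience for this check.

```python
# (degrees of freedom, sum of squares) for each row of Table 2.
table2 = {
    "Outliers": (10, 12386297.0),
    "Row gradient": (1, 2163479.0),
    "Column direction": (255, 84410.0),
    "Error": (1310453, 10506868.0),
}

# Mean square = sum of squares / degrees of freedom.
mean_squares = {name: ss / df for name, (df, ss) in table2.items()}

# Component rows should reproduce the "Corrected total" row.
total_df = sum(df for df, _ in table2.values())
total_ss = sum(ss for _, ss in table2.values())
```

The total degrees of freedom equal 1280 × 1024 − 1, one less than the number of pixels in the bias image.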
substantial amount of variation in the data (SST3 =
$2.267 \times 10^{13}$, and SSE3 = $1.456 \times 10^{12}$; SST5 = $3.096 \times 10^{13}$,
and SSE5 = $1.151 \times 10^{12}$).
3.5 Bleed coefficients
We modeled the bleed coefficient as a constant, $k_{l,ijk} = k_l$. For
the Cy3 gel under Cy5 light, we estimate $k_3 \approx 0.0024$, and $k_5
= 0.0623$ for the data from the experiment imaging a Cy5 gel
with the Cy3 dark field is $6.369 \times 10^{9}$, while its associated
SSE = $1.315 \times 10^{7}$. Meanwhile, an adequate fit for the Cy5
dark field is $\hat{\hat d}_{5,ijk} = 23.800 + \hat f_{5,ij}$ (shown in Fig. 7). The
SST here is $1.499 \times 10^{9}$, while its associated SSE =
$1.887 \times 10^{6}$.
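One way to read the sums of squares reported for the two dark-field fits, which the text does not compute explicitly, is as a proportion of variation explained, 1 − SSE/SST. Treating the reported SST and SSE as directly comparable (an assumption made here), the arithmetic is:

```latex
\[
1 - \frac{\mathrm{SSE}_3}{\mathrm{SST}_3}
  = 1 - \frac{1.315 \times 10^{7}}{6.369 \times 10^{9}} \approx 0.998,
\qquad
1 - \frac{\mathrm{SSE}_5}{\mathrm{SST}_5}
  = 1 - \frac{1.887 \times 10^{6}}{1.499 \times 10^{9}} \approx 0.999 .
\]
```

On this reading, the typical value plus the within-tile lighting effect accounts for nearly all of the variation in each dark-field image.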
3.3 Gel images with no protein
Due to the apparent homogeneity among intensity values
within these images, we model the gel effect as a constant, $\hat{\hat r}_{l,ijk} = \hat{\hat r}_l$. In particular, $\hat{\hat r}_3 = 0.339$ counts/s and $\hat{\hat r}_5
= 0.258$ counts/s. The total sum of squares associated
with the Cy3 data equals $1.871 \times 10^{7}$, while the resulting
residual sum of squares is $1.836 \times 10^{7}$; similarly, the Cy5
data have a total sum of squares of $1.134 \times 10^{8}$, while SSE
= $1.130 \times 10^{8}$.
3.4 Bright coefficients
For our dataset, $\hat{\hat a}_{3,ij}$ equals 4415.520 plus a significant
within-tile effect, while $\hat{\hat a}_{5,ij}$ equals 4988.065 plus its associated significant within-tile effect; see Figs. 8 and 9. Each of
the resulting residual images, $e_{3,ijk}$ and $e_{5,ijk}$, shows that the
gel is actually nonuniform; however, the gel was poured
uniformly to the best of our ability. The model accounts for a
Figure 5. Fitted image for the bias effect, $\hat{\hat b}_{ij} = 507.9 + 0.017\,i + \hat b_j$,
where $\hat b_j$ denotes the 256 individual column estimates. There was
no significant additional CCD effect or tiling effect, and the associated residual image shows a white-noise pattern with mean-squared error equaling 8.02.
Figure 6. Cy3 lighting effect associated with the dark (i.e. platform) image. This image shows the significant circular lighting
effect when exciting the platform. The model fit for the Cy3 dark
field is 48.856 plus this effect. There was no statistically significant tile effect or platform effect, although there exists a physical
explanation associated with the platform effect.
Figure 7. Cy5 lighting effect associated with the dark (i.e. platform) image. This image shows the horizontal lighting effect
when exciting the platform. The model fit for the Cy5 dark field
image is 23.8 plus this effect. There is no statistically significant
tile or platform effect associated with this model.
under Cy3 light. The added intensity caused by a bleed-through of Cy3 light on a Cy5-dyed gel is significantly greater
than that for Cy5 bleed-through on a Cy3-dyed gel. The
Figure 8. Fitted image for the Cy3 bright effect, $\hat{\hat a}_{3,ij}$. The image
shows the statistically significant lighting effect due to the presence of proteins in the gel. The exposure time associated with the
bright Cy3 image is 20 s.
Figure 9. Fitted image for the Cy5 bright effect, $\hat{\hat a}_{5,ij}$. The image
shows the lighting effect due to the presence of proteins in the gel.
The exposure time associated with the bright Cy5 image is 15 s.
resulting residual images (not shown) are white noise within
the area of the gel, with the exception of values along the top
edge of Gel3 where it originally reached saturation, and the
right edge of Gel5 where the gel curled. A substantial amount
of variation is explained through the model (SST3 = 185.815,
and SSE3 = 177.096; SST5 = 5150.757, and SSE5 = 476.707).
3.6 Overall model
The basic model for an image becomes
$$Y_{l,ijk} = b_{ij} + t\,(d_{l,ijk} + r_l + a_{l,ij}\,p_{l,ijk} + k_{\bar l}\,a_{\bar l,ij}\,p_{\bar l,ijk}) + e_{l,ijk} \qquad (4)$$
where all of the nuisance parameters were estimated with
results detailed above. The true relative protein intensities
($p_{l,ijk}$ and $p_{\bar l,ijk}$) are estimated here.
To be exact, two DIGE images, $Y_3$ and $Y_5$, are created, and
we want to use Eq. (4) to solve for $p_3$ and $p_5$. For now, we
disregard e and let $W_l = (Y_l - b)/t - d_l - r_l$ to obtain the
system of linear equations $W_l = a_l p_l + k_{\bar l}\,a_{\bar l}\,p_{\bar l}$, for l = {3, 5}, i.e.
there are two equations and two unknowns ($W_l$, $a_l$, $a_{\bar l}$, and $k_{\bar l}$
are known from the previous sections).
Thus, we can solve
for $p_l$ directly: $p_l = (W_l - k_{\bar l}\,W_{\bar l}) / (a_l\,(1 - k_l k_{\bar l}))$; see Figs. 10
and 11.
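The closed-form solution for the two relative intensities can be checked with a short round trip on one pixel. The sketch assumes both lights share the same exposure time t, and that `k[3]` and `k[5]` are indexed by dye, as in the bleed coefficient estimates. The function name `protein_intensities` and all numbers are made up for illustration.

```python
def protein_intensities(y, b, t, d, r, a, k):
    """Solve the two equations of the form (4) for p3 and p5 at one pixel.

    y, d, r, a, k are dicts keyed by 3 and 5; b and t are scalars.
    """
    # W_l = (Y_l - b)/t - d_l - r_l removes bias, dark field and gel effects.
    w = {l: (y[l] - b) / t - d[l] - r[l] for l in (3, 5)}
    det = 1.0 - k[3] * k[5]
    p3 = (w[3] - k[5] * w[5]) / (a[3] * det)
    p5 = (w[5] - k[3] * w[3]) / (a[5] * det)
    return p3, p5

# Round trip: simulate noise-free DIGE intensities from known p, then recover p.
b, t = 500.0, 30.0
d, r = {3: 50.0, 5: 25.0}, {3: 0.5, 5: 0.25}
a, k = {3: 4000.0, 5: 5000.0}, {3: 0.002, 5: 0.06}
true_p = {3: 0.10, 5: 0.20}
y = {
    3: b + t * (d[3] + r[3] + a[3] * true_p[3] + k[5] * a[5] * true_p[5]),
    5: b + t * (d[5] + r[5] + a[5] * true_p[5] + k[3] * a[3] * true_p[3]),
}
p3, p5 = protein_intensities(y, b, t, d, r, a, k)
```

Since the product of the two bleed coefficients is small, the denominator stays close to the bright coefficient itself, so the correction acts mainly through the subtraction of the complementary image in the numerator.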
To construct these estimates, we use the protein images
Yl and solve for pl. Much of the systematic variation in the
DIGE experimental data is reduced or removed in the
estimated true protein images. DIGE experimental images,
however, are created without replication; thus, additional
factors (including edge effects and other phenomena) can
still contribute to further variation in the data. The data, $Y_l$
and $Y_{\bar l}$, are used in the two equations of the form (4) to
solve for $p_3$ and $p_5$. Thus, the estimated value of the residuals is approximately zero, and any random error in the
DIGE experiment is contained in the estimates for p3 and p5.
With replication, however, we can estimate p3 and p5 via least
squares; this will lead to non-zero residuals. By “replication”,
we mean that the biologists create, electrophorese and image
more than one gel from a specific protein mixture. This will
allow for a more accurate estimate of the relative proteins
corresponding to each light, and for other random phenomena to be accounted for in the residual images.
For the Cy3 DIGE image here, SST3 = $1.654 \times 10^{13}$,
while SST5 = $9.628 \times 10^{12}$ for the Cy5 DIGE image. To
account for the variation in the data, we provide the sums of
squares associated with each of the estimates in Eq. (4) in
Table 3. Given the structure of the model, as well as the
procedures used to determine the estimates, we note that
the sums of squares do not add to the total sums of squares.
The purpose here is to indicate the relative amount of variation stemming from each of the factors in the model. As a
result, we witness the impact of this work. The lighting
effect associated with the CCD camera system is the most
significant of the parameters estimated. In particular, its
contributing factors, namely the dark and bright effects
respectively, address the background variation that we see in
the original DIGE images. The current preprocessing
approach for such images is to subtract the associated dark
field image from the raw DIGE image corresponding to
each light source. Although this addresses a significant
amount of variation, it does not remove significant contributions due to the gel or light source as we have quantified above.
Consider our final model as being composed of individual models, where the estimation of each parameter is done
through an experiment meant to isolate that parameter in
model (1). Several issues have not been addressed in this
model; to further improve on the current procedure,
Figure 10. Estimated true relative Cy3 protein image. Much of the
systematic variation in the raw DIGE experimental data image is
reduced or removed here. Because there were no replicates for
the DIGE experiment, however, random phenomena still exist in
these images.
Figure 11. Estimated true relative Cy5 protein image. As with the
associated Cy3 protein image, much of the systematic variation
that exists in the raw DIGE output has been significantly reduced
or removed. Further, the resulting Cy3 and Cy5 images are normalized to allow for better image comparison.
Table 3. Sums of squares (over pixels) associated with each of the estimates in Eq. (4)

Factor           Parameter      Sum of squares
Bias             $b_{ij}$       $1.46 \times 10^{7}$
Dark field       $d_{3,ijk}$    $3.141 \times 10^{9}$
                 $d_{5,ijk}$    $7.499 \times 10^{8}$
Gel              $r_3$          $4.444 \times 10^{6}$
                 $r_5$          $3.376 \times 10^{6}$
Bright           $a_{3,ij}$     $2.573 \times 10^{13}$
                 $a_{5,ij}$     $3.280 \times 10^{13}$
Bleed-through    $k_5$          $3.211 \times 10^{3}$
                 $k_3$          $8.163 \times 10^{4}$
Proteins         $p_3$          132.898
                 $p_5$          13.152
we continue to investigate other less important effects
including the lamp age, lamp location, movement and drying of the gel, edge effects from the gel, etc.
The grayscale associated with the images influences our
impression of the model, as well as our ability to detect protein spots in the final image estimates. “Cropping” the gel
images circumvents the influence due to edge effects, allowing for an automated procedure; however, there remain other
internal influences from the grayscale that might lead one to
overfit the data. For example, excessive Winsorization of an
image may suggest the inclusion of more parameters (e.g., to
account for an apparent subsequent tiling effect).
4 Concluding remarks
We have developed an automated procedure to account for all
of the aforementioned factors contributing to the variation in
DIGE experimental data. Our program is supplied with data
from the experiments to estimate the nuisance parameters,
along with the experimental data of interest. We obtain
image estimates for the true relative protein intensities
under the Cy3 and Cy5 light filters, respectively, along with
estimates of the variances from the original and final images.
The automated code allows for routine use. To further
improve on the current procedure, we continue to investigate
other less important effects including the lamp age, lamp
location, movement and drying of the gel, edge effects from
the gel, etc.
We have developed a method to account for the major
systematic sources of variation present in DIGE images. The
model can be extended to any 2-DE images, as well as 1-D gel
images. The design and motivation for the model are broad
and applicable to a variety of biological images based on
CCD-based fluorescence imaging. This novel model for fluorescent CCD camera images will undoubtedly improve the
accuracy and precision with which we are able to detect and
estimate quantities of interest.
KFS and JCM were supported by the National Science
Foundation VIGRE program, grant number DMS-9819950.
The authors thank the reviewers for their helpful comments and
suggestions.
5 References
[1] Ünlü, M., Morgan, M., Minden, J., Electrophoresis 1997, 18,
2071–2077.
[2] Tonge, R., Shaw, J., Middleton, B., Rowlinson, R., Proteomics
2001, 1, 377–396.
[3] Alban, A., David, S., Bjorkesten, L., Andersson, C., Proteomics
2003, 3, 36–44.
[4] Conradsen, K., Pedersen, J., Biometrics 1992, 48, 1273–1287.
[5] Gygi, S., Corthals, G., Zhang, Y., Rochon, Y., Aebersold, R.,
Proc. Natl. Acad. Sci. USA 2000, 97, 9390–9395.
[6] O’Farrell, P., J. Biol. Chem. 1975, 250, 4007–4021.