3324 Kimberly F. Sellers1 Jeffrey Miecznikowski2 Surya Viswanathan3 Jonathan S. Minden4 William F. Eddy4, 5, 6 1 Department of Mathematics, Georgetown University, Washington, DC, USA 2 Department of Biostatistics, University of Buffalo, Buffalo, NY, USA 3 Proteopure Incorporated, Pittsburgh, PA, USA 4 Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA 5 School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA 6 Department of Statistics, Carnegie Mellon University, Pittsburgh, PA, USA Electrophoresis 2007, 18, 3324–3332 Research Article Lights, Camera, Action! Systematic variation in 2-D difference gel electrophoresis images 2-D Difference gel electrophoresis (DIGE) circumvents many of the problems associated with gel comparison via the traditional 2-DE approach. DIGE’s accuracy and precision, however, is compromised by the existence of other significant sources of systematic variation, including that caused by the apparatus used for imaging proteins (location of the camera and lighting units, background material, imperfections within that material, etc.). Through a series of experiments, we estimate some of these factors, and account for their effect on the DIGE experimental data, thus providing improved estimates of the true relative protein intensities. The model presented here includes 2-DE images as a special case. Keywords: ANOVA / 2-DE / Normalization / Systematic variation DOI 10.1002/elps.200600793 Received December 5, 2006 Revised January 5, 2007 Accepted February 23, 2007 1 Introduction Ünlü et al. [1] developed a novel modified 2-DE technique called 2-D difference gel electrophoresis (DIGE), which circumvents many of the problems associated with comparison of separate gels. In 2-D DIGE, two different protein samples are pre-labeled with a pair of matched cyanine dyes which fluoresce at different wavelengths of light but have identical charge and almost identical mass, thus allowing two different samples to be run within one gel mixture in both dimensions. Because the two samples have been subjected to essentially the same environment during separation, the “between-gel” variation is significantly reduced; however, the “within-gel” variation that results from the imaging process is essentially unaffected [2,3]. This paper focuses on the systematic variation which occurs due to the DIGE data imaging process and the system used to image the proteins; for example, see Figs. 1 and 2. Correspondence: Professor Kimberly F. Sellers, Department of Mathematics, Georgetown University, Washington, DC 20057, USA E-mail: [email protected] Fax: 1202-687-6067 Abbreviations: DIGE, difference gel electrophoresis; MS, mean square; MSE, mean squared error; SST, total sum of squares; SSE, error sum of squares © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim The purpose is to aid in the automatic detection of differential protein expression and modification. This problem’s difficulty lies in developing a model that accurately represents the complexity of the experimental conditions. Only through extensive experimentation and deliberation have we successfully identified and modelled several nuisance effects influencing the overall experimental data. Section 2 describes the imaging apparatus and the sequence of steps performed in creating a protein image, thus helping to establish a statistical model that describes the effect caused by the system components. The general approach is widely applicable, while the specific model is appropriate for this particular type of imaging system. 2 Materials and methods 2.1 System description The fluorescence gel imager contains a cabinet and platform, and a scientific-grade CCD camera is situated and focused approximately five inches above the platform with a lamp approximately three inches away from the camera on each side. There are several machines of this specific type, while other commercial fluorescent gel imagers exist (e.g. a laser scanner). The camera image area is approximately 35 mm635 mm, while a gel is approximately 140 mm6175 mm. Thus, to www.electrophoresis-journal.com Electrophoresis 2007, 18, 3324–3332 Figure 1. DIGE experiment image excited at Cy3 wavelength. A substantial number of nuisance effects (of considerable size) contribute to the images. This paper identifies and removes the most prominent factors, to isolate the true relative protein intensity corresponding to this and the complementary Cy5 light source. Figure 2. DIGE experiment image excited at Cy5 wavelength. A substantial number of nuisance effects (of considerable size) contribute to the images. This paper identifies and removes the most prominent factors, to isolate the true relative protein intensity corresponding to this and the complementary Cy3 light source. capture an image of the whole gel, a 564 array of tiled images is taken; we will henceforth refer to one of the 35 mm square camera (sub-)images as a “tile.” The platform moves so that the tiles are collected in boustrophedonic order (the first row from left to right, the second row from right to left, © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Proteomics and 2-DE 3325 etc). Each tile is 256 pixels6256 pixels, so the entire image of 20 tiles is 1280 pixels61024 pixels, and each pixel is approximately 135 mm2. Beginning with Tile 1, the camera images the DIGE sample using the Cy3 filter, then changes filters and images the same sample under the Cy5 light. After both images have been recorded, the platform moves in position to image Tile 2, first under the Cy3 light and then under the Cy5 light. A typical exposure time for DIGE images is 30 s for each tile under each lighting filter. Meanwhile, the transition time to move from one tile to the next is about five s. Thus, the total time to create the DIGE images corresponding to both cyanine dyes is approximately 25 min. For completeness, we tested for a change in pixel intensity by imaging the gel from bottom to top, and by reversing the lighting filter order, and did not find a statistically significant difference in pixel intensity in either case. This work focuses on image analysis using CCD-based systems because of their inherent qualities that make such a framework appealing. CCD-based systems are more versatile than scanners with respect to illumination wavelength. Scanners rely on the wavelengths available for the lasers used in the system, and on photomultipliers which amplify the photon signal by creating an electron cascade. This response is adjusted by setting the gain of the photomultiplier tube (PMT). This means that the system response to light is adjustable (and inherently variable). Thus, the number of counts from one experiment to the next may drift for PMTs. The CCD, however, does not amplify the photon signal, but instead relies on the ability of photons to mobilize electrons in the detector’s solid state SiO material. Further, the number of counts per photon (or electron) for CCD systems is very stable over time. A disadvantage of the CCD system is the stability of the light source. The halogen lamps take time to warm up and stabilize. The halogen lamps also have a relatively short lifespan, while lasers are more stable and long-lived. The experimental design described, however, takes this matter into account and accepts the tradeoff because of the more significant structural benefits. The gel in the CCD-based system is placed in an open tray, which provides access for a gel cutting robot to remove protein spots for mass spectrometric analysis. The scanners, however, image the gel between glass plates, which does not allow one to remove protein spots. As a result, any subsequent sample preparation for mass spectrometric analysis would be further hindered by scientist involvement. Further, it is unclear what technical artifacts may be introduced due to the glass plates. An analogous modeling approach is essential for laser scanner systems to address the inherent issues pertaining to this technology. We remind the reader that the overarching scientific goal is to introduce an automated approach for protein analysis. 2.2 Overview: The model Let l = {3, 5} denote the light filter setting to Cy3 or Cy5, respectively, and denote the complementary light setting. There are numerous experimental factors that can add to the www.electrophoresis-journal.com 3326 K. F. Sellers et al. Electrophoresis 2007, 18, 3324–3332 true relative protein intensities. Let Yl denote the intensity outcome from a DIGE experimental gel imaged under light source l. This image has an increased intensity reading due to the CCD camera bias inherent in the system. The background intensity induced by the platform and consistency of the gel also contributes to Yl. To normalize the two protein images, we must also account for the difference in intensity due to the effect of the lighting filter source exciting the associated dye. Finally, because both protein samples are present on the same gel, there is a multiplicative effect from the latent protein sample (hereafter referred to as the bleedthrough effect) that represents an additive relative protein contribution from the complementary sample when imaging the gel under light source l. With the above considerations in mind, the following model is believed to describe the systematic effects: Y i ¼ b þ tðdi þ ri þ ai pi þ ki ai pi Þ þ ei (1) where b represents the camera bias, t denotes the exposure time associated with the image, dl is the effect of the lighting system on the platform under light l, rl measures the additional light effect that is introduced by gel excitation under light l, al and ai are the lighting effects when a uniformly labeled gel is excited with its corresponding light, k denotes bleed-through effect, pl and pi represent the true relative intensity (on a 0-1 scale) of the proteins, respectively, and e is random error. All of the parameters may vary spatially. We let i and j denote the pixel row and column location on the CCD, respectively, within tile k; i = 1, . . ., 256; j = 1, . . ., 256; k = 1, . . ., 20. Thus, Equation (1) becomes Yl;ijk ¼ bijk þ t dl;ijk þ rl;ijk þal;ijk pl;ijk þ kl;ijk al;ijk pl;ijk Þ þ el;ijk (2) For simplicity, we suppress the (ijk) subscripts when their presence is implicit. This paper focuses on the two dye, two light framework: alpl is the effect due to the protein when the light matches the dye, and ai pi refers to the light intensity caused by the complementary light-dye relationship. Our ultimate goal is to estimate the sample relative intensities, p3 and p5, for the proteins when excited under Cy3 and Cy5, respectively, from a single gel. Thus, we consider two models of the form (2) where, given DIGE images Y3 and Y5, we want to estimate p3 and p5 and compare those resulting images because they are on a comparable intensity scale. Any spots that appear more abundant in one image than in the other imply that the associated protein is differentially expressed, whereas spots that statistically significantly change in location signify differential modification. Although these issues are not addressed here, they are noted to demonstrate this work’s broad, significant impact. All terms in the model, other than p3 and p5, are nuisance parameters of considerable size, contributing to the complexity of the model. The model has been developed iteratively; examination of residuals at each stage has suggested further © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim experimentation and the form of additional terms in the model. In contrast, most biologists simply take a picture of the platform for a time equal to that for the DIGE experiment and subtract this result from the experimental data to determine the “preprocessed data”. Our work goes above and beyond that which is currently practiced, attributing the variation present in the data to specific sources that are not considered via the standard preprocessing technique. The model was not developed for the specific datasets; the values apply only to this particular dataset but the general model is applicable to all of the images we have examined. Note that this model can also be applied to standard 2-DE images. 2-DE images are created by exciting a labeled gel with its corresponding light source, e.g. a Cy3 (Cy5) gel excited with a Cy3 (Cy5) light. Thus, under a particular light source l, these images can be modeled via Equation (1) with pi = 0. The image data (in the form of fit files) were analyzed using the computational freeware tool, Functional Image Analysis Software Computational Olio (FIASCO version 5.1; http://www.stat.cmu.edu/~fiasco), in conjunction with Splus version 7.0 on a Linux server 2.4.21 (2005). FIASCO is a collection of software originally designed to analyze fMRI data using a series of processing steps. Because of its flexibility and ability to analyze large datasets fairly quickly, however, it is a good choice for analyzing proteomic data. The resulting processing time to analyze the images is quite efficient at approximately five min. In an effort to quantify the significant contributors of systematic variation in the data, we used analysis of variance (ANOVA) measures, including the associated mean-square (MS) of a factor, mean-squared error (MSE), error sum of squares (SSE), and total sum of squares (SST). These measures assess the quality of the estimators through their associated variance. A “large” term MS value implies statistical significance for that effect because it accounts for a substantial amount of the total variation in the date, quantified by the total sum of squares. Meanwhile, a small MS value implies that the effect is not statistically significant. 2.3 The system of equations Five experiments are performed (see Table 1) to obtain a number of images representing different contributions to the intensity values obtained in the DIGE gel image. Given the resulting images from these experiments, we consider the system of equations based on Eq. (1) with epsilon removed, 8 > Y ðAÞ ¼ b > > > Y ðBÞ ¼ b þ t d > > ðBÞ l < l ðCÞ Y l ¼ b þ tðCÞ ðdl þ rl Þ > > ðDÞ > Y l ¼ b þ pðdl þ rl þ al Þ > > > : ðEÞ Y l ¼ b þ pðdl þ rl þ kl al Þ where the superscripts (A)-(E) denote the various experiments outlined in Table 1. For each equation within the system, t is pre-determined by the scientist; in particular, we set t(B) = t(C) = t. Thus we have a system of nine equations and www.electrophoresis-journal.com Proteomics and 2-DE Electrophoresis 2007, 18, 3324–3332 Table 1. Experiments performed (in the order listed in the table) to estimate nuisance parameters Experiment Condition (A) Camera bias (B) Dark field, light l (C) No dye, light l (D) Bright, light l (E) Bleed through, light l t=0 rl = 0 p3 = p5 = 0 pl = 1, pi = 0 pl = 0, pi = 1 Estimated parameter b dl rl al ki Each row in the table corresponds to an experiment or set of experiments. The parameter in the column labelled “Condition” is set to the value supplied in the table. The parameter in the last column is estimated from the experimental results. nine (b, d3, d5, r3, r5, a3, a5, k3, k5) unknowns which can be determined exactly for the nuisance parameters (denoted ^ ^d3 ; ^d5 ; r ^3 ; r ^5 ; a ^3 ; a ^5 ; k ^3 ; k ^ 5 ). To avoid overfitting, however, b; we further investigate the systematic variation that exists within each of the above constructed images. Details regarding subsequent models are supplied in the upcoming sections, and additional images representing the precise parameter estimates are provided at http://www9.georgetown.edu/faculty/kfs7/images. 2.3.1 Camera bias In order to model the camera bias, we assume no exposure time, i.e. t = 0 (Experiment A in Table 1). Further, our data consists of one 128061024 image of 20 tiles. In light of the ^ we consider the model systematic variation present in b, ^ ¼ mb þ a þ b þ cij þ dk Þ þ eijk b ijk (3) where mb is a typical pixel value, ai and bj represent the row and column effects respectively within the CCD, cij is the additional CCD (i.e. within-tile) effect, dk is the tile effect, and e is the unexplained random error. We estimated the 256 values of ai (and bj, respectively) by averaging across the row (over the column) within each tile. 2.3.2 Dark field A “dark field” image is collected with the light on and the filter set to either Cy3 or Cy5, but no gel (and therefore no protein) present in the device. The platform is then imaged with an exposure time of approximately 30 s for each tile. The purpose of such an experiment is to estimate d (Experiment B in Table 1). Because no dyed gel is considered in this experiment, we consider Eq. (2) with r = pl = p = 0. We obtain the initial estimate of ^dl via Section 2.3 and update it using the model ^ dl;ijk ¼ ml;d þ f l;ij þ g l;k þ hl;ijk þ el;ijk . The dark effect contains a typical pixel value associated with the dark field under light l (ml,d), and the effects of the light fl,ij at pixel (i, j), a © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim 3327 particular tile (gl,k), nonuniformities on the platform (e.g., scratches), hl,ijk, and random error, el,ijk, respectively. Modeling the dark field image corresponding to Cy3 and Cy5 occurs in a stepwise fashion. We estimate ml,d by averaging the 1280 ? 1024 ? 2 = 2621440 pixels from the two dark ^ l,d, we estimate field images under light l. After subtracting m the effect of the light by averaging the 40 pixels associated with location (i, j) across tiles for both replicates to get ^f l,ij. After we subtract this within-tile effect, we average over the 256 ? 256 ? 2 = 131072 pixels within tile k from both replicates to get ^g l,k, k = 1, . . ., 20. Finally, we average over the resulting images for each replicate to estimate the platform ^ijk. For this example, the tile and platform effects are effect, h not statistically significant although scaling allows for the defects to be detected by eye; thus, the estimated dark field ^ ^ ld þ ^fijk; . image is ^ dl;ijk ¼ m 2.3.3 Gel images with no protein In this experiment, a gel is prepared containing no proteins and imaged with the lights on and filter set to either the Cy3 or Cy5 filter for 30 s (Experiment C in Table 1). Accordingly, ^l via Secwe consider Eq. (2) with pl = pi = 0, and solve for r tion 2.3. The overfit estimates obtained from solving the system of equations appear approximately constant, there^ ^ ^l;ijk ¼ r ^ l. fore we consider r 2.3.4 Single-dye gel images Single gel images are collected with the light on, a special gel uniformly loaded with protein labeled with only a single dye, and imaged under a Cy3 and a Cy5 lighting filter. The proteins on this gel are not separated via the electrophoresis process. “Bright” images refer to single gel images that are excited under the associated cyanine light filter (for example, a Cy3-labeled gel excited under a Cy3 light filter), while “bleed through” refers to those images where a single-dyed gel is excited under a complementary light source (for example, a Cy3-labeled gel excited under a Cy5 light filter). For each light, the exposure time for these images is dependent on the time needed to obtain near saturation in the “bright” image. We use Eq. (2) to model these images, where bright images are such that pl ; 1 and p ; 0 (Experiment D in Table 1) providing the estimates of al and a, respectively, and bleed through images set pl ; 0 and p ; 1 (Experiment E in Table 1) to estimate kl,ijk, l = {3, 5}. 2.3.5 Bright coefficients In order to estimate this parameter, we isolate the middle 362 matrix of tiles from the full images to circumvent any edge-effect concerns (we remind the reader that the goal is to automate a procedure for image preprocessing). We average ^ ^l;ij , because no statistically signifithe six tiles to determine a cant tile effect appears to be present. www.electrophoresis-journal.com 3328 K. F. Sellers et al. 2.3.6 Bleed coefficients To estimate the bleed coefficients, we collect images made with the same single dye gel used to estimate the bright coefficients, and imaged with the complimentary light (e.g., Cy3 dye, Cy5 light). The goal here is to model the added effect caused by having both dyes excited under a particular light. ^l ;ijk , the exact soluWe determine an estimate for kl ;ijk via k tion obtained from Section 2.3. ^l ;ijk image is approximately constant spatially thus, The k we average over the inner 362 matrix of tiles (to avoid con^^l ;ijk . tributions from random gel effects) to obtain k 3 Results and discussion 3.1 Camera bias In an effort to model the row effect via the ai values, we recognized a linear trend in the data (see Fig. 3) due to the CCD readout process. In particular, while earlier rows of data are being read by the CCD, additional dark current accumulates with the later rows having larger accumulations than the earlier rows. We modeled this by assuming that ai = a ? i. Our estimate for the row effect is â 0.017, implying an estimated dark current accumulation of 0.68 electrons/row/s. This parameter describes the largest amount of systematic variation, with MS = 2163479; see Table 2. The ^bj estimates, however, deviated from the analogous trend on both sides of the CCD, most strikingly on the left-hand side of the CCD; see Fig. 4. Thus, we retain 256 individual estimates, ^bj to model the transfer toward the output amplifier. After removing the row and column effects, we noted that the additional CCD effect cij and Electrophoresis 2007, 18, 3324–3332 tiling effect dk were not significant. The resulting model error image has a random white-noise pattern with MSE = 8.02; see Table 2. Thus we conclude that an adequate ^ ^ ¼ 507:9 þ 0:017i þ ^bj ; see Fig. 5. fit for the bias is b ij 3.2 Dark field Our estimate for m3,d is approximately 48.856 counts/s, while the estimate for m5,d is approximately 23.800 counts/s. A significant lighting effect exists in both cases. For this imager, the Cy3 lighting effect can be roughly described as a convex, 3-D paraboloid such that the center of each tile contains the lowest intensities, and the values increase with the Euclidean distance from the center. The Cy5 lighting effect, in contrast, can be described as a convex parabola in the y-direction, crossed with a constant function in the x-direction so that the intensities associated with the middle rows contain the lowest values, and they increase systematically as the distance to the top and bottom borders of the tile decreases. We do not have a physical explanation for these patterns. There was no statistically significant tile effect associated with either model. The platform contains dust and scratches on the glass surface, as detected by the associated platform estimate. The imaging process does not occur in a dust-free environment, therefore dust falls onto the gel and is included in the image. The scratches on the glass appear to be due to the cleaning process administered by the operators; they use detergent and paper towels to scrub and cleanse the platform surface between uses. Although this factor is easily explained, its effect is not statistically significant, therefore it is not included in the model. The resulting residual images therefore include the nonsignificant tiling effect and platform effects. We conclude that an adequate fit for the Cy3 dark field is ^ ^ d3;ijk ¼ 48:856 þ ^f3;ij (shown in Fig. 6). The SST associated Figure 3. Scatterplot of the 5120 withintile row means versus row number. We recognize a linear trend in the data – while earlier rows of data are being read by the CCD, additional dark current accumulates causing larger intensity readings associated with later rows. The estimated slope is 0.017, corresponding to the dark current of 0.68 electrons/row/s. This trend accounts for the largest amount of systematic variation regarding the camera bias. © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.electrophoresis-journal.com Proteomics and 2-DE Electrophoresis 2007, 18, 3324–3332 3329 Figure 4. Scatterplot of the 5120 withintile column means versus column number. The assumption was that a linear trend would be detected with respect to column number, an analogous result to that of the row result. The most striking feature, however, is the reduced intensity within the three columns on the left of the CCD and a similar but smaller effect on the right. Thus, the individual column estimates were retained to model the transfer toward the output amplifier. Table 2. Analysis of variance table for camera bias Outliers Row gradient Column direction Error Corrected total df Sum of squares Mean square 10 1 255 1310453 1310719 12386297.0 2163479.0 84410.0 10506868.0 25141054.0 1238629.70 2163479.00 331.02 8.02 substantial amount of variation in the data (SST3 = 2.26761013, and SSE3 = 1.45661012; SST5 = 3.09661013, and SSE5 = 1.15161012). 3.5 Bleed coefficients We modeled the bleed coefficient as a constant, kl,ijk = kl. For the Cy3 gel under Cy5 light, we estimate k3 0.0024, and k5 = 0.0623 for the data from the experiment imaging a Cy5 gel with the Cy3 dark field is 6.3696109, while its associated SSE = 1.3156107). Meanwhile, an adequate fit for the Cy5 ^ dark field is ^d5;ijk ¼ 23:800 þ ^f5;ij (shown in Fig. 7). The SST here is 1.4996109, while its associated SSE = 1.8876106. 3.3 Gel images with no protein Due to the apparent homogeneity among intensity values within these images, we model the gel effect as a con^^ ^^ . In particular, r ^^ = 0.339 counts/s and r ^ ^5 stant, r l;ijk ¼ r l 3 = 0.258 counts/s. The total sum of squares associated with the Cy3 data equals 1.8716107, while the resulting residual sum of squares is 1.8366107; similarly, the Cy5 data have a total sum of squares of 1.1346108, while SSE = 1.1306108. 3.4 Bright coefficients ^^ 3;ij equals 4415.520 plus a significant For our dataset, a ^^5;ijk equals 4988.065 plus its assowithin-tile effect, while a ciated significant within-tile effect; see Figs. 8 and 9. Each of the resulting residual images, e3,ijk and e5,ijk, shows that the gel is actually nonuniform; however, the gel was poured uniformly to the best of our ability. The model accounts for a © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim ^ ^ ¼ 507:9 þ 0:017i þ ^bj Figure 5. Fitted image for the bias effect, b ij ^ denotes the 256 individual column estimates. There was where b j no significant additional CCD effect or tiling effect, and the associated residual image shows a white-noise pattern with meansquared error equaling 8.02. www.electrophoresis-journal.com 3330 K. F. Sellers et al. Figure 6. Cy3 lighting effect associated with the dark (i.e. platform) image. This image shows the significant circular lighting effect when exciting the platform. The model fit for the Cy3 dark field is 48.856 plus this effect. There was no statistically significant tile effect or platform effect, although there exists a physical explanation associated with the platform effect. Figure 7. Cy5 lighting effect associated with the dark (i.e. platform) image. This image shows the horizontal lighting effect when exciting the platform. The model fit for the Cy5 dark field image is 23.8 plus this effect. There is no statistically significant tile or platform effect associated with this model. under Cy3 light. The added intensity caused by a bleedthrough of Cy3 light on a Cy5-dyed gel is significantly greater than that for Cy5 bleed-through on a Cy3-dyed gel. The © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Electrophoresis 2007, 18, 3324–3332 ^ ^3;ij . The image Figure 8. Fitted image for the Cy3 bright effect, a shows the statistically significant lighting effect due to the presence of proteins in the gel. The exposure time associated with the bright Cy3 image is 20 s. ^ ^ 5;ij . The image Figure 9. Fitted image for the Cy5 bright effect, a shows the lighting effect due to the presence of proteins in the gel. The exposure time associated with the bright Cy3 image is 15 s. resulting residual images (not shown) are white noise within the area of the gel, with the exception of values along the top edge of Gel3 where it originally reached saturation, and the right edge of Gel5 where the gel curled. A substantial amount of variation is explained through the model (SST3 = 185.815, and SSE3 = 177.096; SST5 = 5150.757, and SSE5 = 476.707). www.electrophoresis-journal.com Proteomics and 2-DE Electrophoresis 2007, 18, 3324–3332 3.6 Overall model The basic model for an image becomes Y l;ijk ¼ bij þ t dl;ijk þ rl þ al;ij pl;ijk þ kl al;ij pl;ijk Þ þ el;ijk (4) where all of the nuisance parameters were estimated with results detailed above. The true relative protein intensities (pl,ijk and pi;ijk ) are estimated here. To be exact, two DIGE images, Y3 and Y5, are created, and we want to use Eq. (4) to solve for p3 and p5. For now, we disregard e and let W l ¼ ðY l bÞ=t d r to obtain the system of linear equations, Wl = alpl + ki ai pi , for l = {3, 5}, i.e. there are two equations and two unknowns (W, al, ai , and ki are known from the previous sections). Thus, we can solve for pl directly: pl ¼ ðW l kl Wl Þ ðal ð1 kl kl ÞÞ; see Figs. 10 and 11. To construct these estimates, we use the protein images Yl and solve for pl. Much of the systematic variation in the DIGE experimental data is reduced or removed in the estimated true protein images. DIGE experimental images, however, are created without replication; thus, additional factors (including edge effects and other phenomena) can still contribute to further variation in the data. The data, Yl and Yi , are used in the two equations of the form (4) to solve for p3 and p5. Thus, the estimated value of the residuals is approximately zero, and any random error in the DIGE experiment is contained in the estimates for p3 and p5. With replication, however, we can estimate p3 and p5 via least squares; this will lead to non-zero residuals. By “replication”, we mean that the biologists create, electrophorese and image more than one gel from a specific protein mixture. This will allow for a more accurate estimate of the relative proteins corresponding to each light, and for other random phenomena to be accounted for in the residual images. For the Cy3 DIGE image here, SST3 = 1.65461013, while SST5 = 9.62861012 for the Cy5 DIGE image. To account for the variation in the data, we provide the sums of squares associated with each of the estimates in Eq. (4) in Table 3. Given the structure of the model, as well as the procedures used to determine the estimates, we note that the sums of squares do not add to the total sums of squares. The purpose here is to indicate the relative amount of variation stemming from each of the factors in the model. As a result, we witness the impact of this work. The lighting effect associated with the CCD camera system is the most significant of the parameters estimated. In particular, its contributing factors, namely the dark and bright effects respectively, address the background variation that we see in the original DIGE images. The current preprocessing approach for such images is to subtract the associated dark field image from the raw DIGE image corresponding to each light source. Although this addresses a significant amount of variation, it does not remove significant contributions due to the gel or light source as we have quantified above. © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim 3331 Consider our final model as being composed of individual models, where the estimation of each parameter is done through an experiment meant to isolate that parameter in model (1). Several issues have not been addressed in this model, namely, to further improve on the current procedure, Figure 10. Estimated true relative Cy3 protein image. Much of the systematic variation in the raw DIGE experimental data image is reduced or removed here. Because there were no replicates for the DIGE experiment, however, random phenomena still exist in these images. Figure 11. Estimated true relative Cy5 protein image. As with the associated Cy3 protein image, much of the systematic variation that exists in the raw DIGE output has been significantly reduced or removed. Further, the resulting Cy3 and Cy5 images are normalized to allow for better image comparison. www.electrophoresis-journal.com 3332 K. F. Sellers et al. Electrophoresis 2007, 18, 3324–3332 Table 3. Sums of squares (over pixels) associated with each of the estimates in Eq. (4) Factor Parameter Sum of squares Bias Dark field bij d3,ijk d5,ijk r3 r5 a3,ij a5,ij k5 k3 p3 p5 1.466107 3.1416109 7.4996108 4.4446106 3.3766106 2.57361013 3.28061013 3.2116103 8.1636104 132.898 13.152 Gel Bright Bleed-through Proteins we continue to investigate other less important effects including the lamp age, lamp location, movement and drying of the gel, edge effects from the gel, etc. The grayscale associated with the images influences our impression of the model, as well as our ability to detect protein spots in the final image estimates. “Cropping” the gel images circumvents the influence due to edge effects, allowing for an automated procedure; however, there remain other internal influences from the grayscale that might lead one to overfit the data. For example, excessive Winsorization of an image may suggest the inclusion of more parameters (e.g., to account for an apparent subsequent tiling effect). 4 Concluding remarks We have developed an automated procedure to account for all of the aforementioned factors contributing to the variation in DIGE experimental data. Our program is supplied with data from the experiments to estimate the nuisance parameters, along with the experimental data of interest. We obtain © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim image estimates for the true relative protein intensities under the Cy3 and Cy5 light filters, respectively, along with estimates of the variances from the original and final images. The automated code allows for routine use. To further improve on the current procedure, we continue to investigate other less important effects including the lamp age, lamp location, movement and drying of the gel, edge effects from the gel, etc. We have developed a method to account for the major systematic sources of variation present in DIGE images. The model can be extended to any 2-DE images, as well as 1-D gel images. The design and motivation for the model is broad and applicable to a variety of biological images based on CCD-based fluorescence imaging. This novel model for fluorescent CCD camera images will undoubtedly improve the accuracy and precision in which we are able to detect and estimate quantities of interest. KFS and JCM were supported by the National Science Foundation VIGRE program, grant number DMS-9819950. The authors thank the reviewers for their helpful comments and suggestions. 5 References [1] Ünlü, M., Morgan, M., Minden, J., Electrophoresis 1997, 18, 2071–2077. [2] Tonge, R., Shaw, J., Middleton, B., Rowlinson, R., Proteomics 2001, 1, 377–396. [3] Alban, A., David, S., Bjorkesten, L., Andersson, C., Proteomics 2003, 3, 36–44. [4] Conradson, K., Pederson, J., Biometrics 1992, 48, 1273–1287. [5] Gygi, S., Corthals, G., Zhang, Y., Rochon, Y., Aebersold, R., Proc. Natl. Acad. Sci. USA 2000, 97, 9390–9395. [6] O’Farrell, P., Biol. J. Chem. 1975, 250, 4007–4021. www.electrophoresis-journal.com
© Copyright 2026 Paperzz