BIOINFORMATICS APPLICATIONS NOTE Vol. 23 no. 5 2007, pages 639–640 doi:10.1093/bioinformatics/btl644 Gene expression Software package for automatic microarray image analysis (MAIA) Eugene Novikov and Emmanuel Barillot Service Bioinformatique, Institut Curie, 26 Rue d’Ulm, 75248 Paris Cedex 05, France Received on May 11, 2006; revised on December 1, 2006; accepted on December 18, 2006 Advance Access publication January 19, 2007 Associate Editor: Alfonso Valencia ABSTRACT Summary: Although various software solutions are currently available for microarray image analysis, one would still expect to develop algorithms ensuring higher level of intelligence and robustness. We present a fully functional software package for automatic processing of the two-color microarray images including spot localization, quantification and quality control. The developed algorithms aim at making ratio estimates more resistant to array contamination and offer automatic tools to evaluate spot quality. Availability: A demo version of the software can be downloaded from http://bioinfo.curie.fr/projects/maia. A full version is freely available to non-commercial users upon request from the authors. Contact: [email protected] Two-color (typically, Cy3/green and Cy5/red) comparative microarray experiment is a key point of cDNA (Hedge et al., 2000), CGH (comparative genome hybridization, Pinkel et al., 1998) and, more recently, protein (Eckel-Passow et al., 2005) microarray technologies. In a conventional two-color experiment, one obtains two images (one for each color) containing thousands of bright spots, distributed over the array according to a special pattern. The first step in microarray data analysis consists of image processing intending to estimate the ratio of the fluorescence intensities in two color channels at each spot. This ratio reflects differential gene (cDNA) or protein expression or change in DNA copy number (CGH) between two compared samples. Although various academic and commercial software solutions for microarray image analysis are currently available (Jain et al., 2002; Molecular Devices, 2005; White et al., 2005; Yang et al., 2002), further efforts will perhaps never be exhausted because the existing solutions are not ideal and offer some room for further improvements. In this note, we present a fully functional software package (MAIA) for automatic processing of the microarray two-color images including spot localization, quantification and quality control (Fig. 1). The spot localization module (i) identifies the position of each spot on the array to associate it with the spotted clone; and (ii) establishes the borders between the neighboring spots to allow further independent data processing (extracting *To whom correspondence should be addressed. quantitative information) for each spot. We deal with the most widespread, orthogonal, spot localization pattern. In this pattern, the spots are aligned horizontally and vertically and can be arranged in blocks containing different numbers of spot rows and spot columns (Fig. 1). The spot localization algorithm (Novikov and Barillot, 2006) is fully automatic and robust with respect to deviations from perfect spot alignment and contamination. As an input, it requires only the common array design parameters: number of blocks and number of spots in the x and y directions of the array. Although fully automatic, there is no guarantee that it will perform well for any array. Therefore, we offer some interactive tools to repair grid in case if it is erroneous. The spot quantification module generates two sets of quantitative parameters for each spot: ratio estimates and spot quality characteristics. Two algorithms for ratio estimation were implemented (Novikov and Barillot, 2005a). The first is a direct arithmetic ratio of the background-corrected fluorescence intensity estimates in the two color channels. This approach requires the identification of both the foreground—the measured spot—and the background—typically the level of nonspecific hybridization. The second ratio estimate is the slope of the linear regression plot of the pixel intensities in one color channel versus another one. This approach does not require spot segmentation, however unavoidable presence of some aberrant or outlier pixels, occurring even in small quantities, can distract the regression line and strongly bias the regression ratio. Therefore we have developed a statistical procedure (Novikov and Barillot, 2005a), which systematically searches and removes aberrant or outlier pixels (Fig. 1). This procedure guarantees a higher level of confidence in the ratio estimates obtained using linear regression approach. Moreover, after aberrant pixels are removed the segmentation algorithm also ensures more robust estimates, and there is a greater agreement in the ratio values obtained by both methods. The spot quality module provides a value of spot quality reflecting the level of confidence in the obtained ratio estimate at each spot. The unique spot quality value is derived from a set of nine marginal quality parameters characterizing certain features of the spot (Novikov and Barillot, 2005b). The contribution of each quality parameter in the overall quality is automatically evaluated based on the visual classification of the spots, or using information available from the replicated spots, located on the same array (Fig. 1) or over a set of replicated arrays (Novikov and Barillot, 2005b). Therefore ß The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected] 639 E.Novikov and E.Barillot Fig. 1. A screenshot of the MAIA processing window. An exemplary spot with the pixel regression outliers is shown. Five outlier pixels (in cyan), which can hardly be flagged out by geometrical criteria, can easily be found using linear regression approach. Three replicated spots are placed as neighbors in a row on this array. Quality plot demonstrates tendency to decrease the overall quality index of the spot replicates ( y-axis) with the increase of the ratios variability of the replicates (x-axis). ‘Bad’ spots are the spot with the overall quality index beyond the user-defined threshold. The overall quality is derived from the nine spot quality characteristics, shown in the table. the developed procedure allows us not only to quantify spot quality, but also to identify different types of spot deficiency occurring in microarray technology. The quality values can be used either directly to flag out some spots with the quality lower than the user-defined threshold, or in the followup analysis as a weight controlling the contribution/influence of the obtained ratio estimates. Image simulation. We have developed a special software component for Monte-Carlo simulation of the microarray images (Novikov and Barillot, 2005a). It takes into account the statistical noise and different types of contamination like nonspecific hybridization and dust. Since in a simulation experiment the true values of the ratios are precisely known, it allows us to evaluate, to test and to compare different algorithms of microarray image analysis objectively. Software. MAIA runs on 95/98/Me/NT/2000/XP Windows platforms and needs the Java Runtime Environment. Complete analysis of one 4 Mb image pair (Cy3/Cy5, 7300 spots; each spot is 45 pixels) takes 5 s on 3.00 GHz PentiumÕ 4 CPU with 1 GB of RAM. The validity of the algorithms has been tested and confirmed on a large set of experimental images from the different microarray platforms used within our Institute and on a representative selection of CGH images obtained from the UCSF Cancer Center. ACKNOWLEDGEMENTS We would like to thank our colleagues from the different laboratories of the Institute Curie: (F. Radvanyi, UMR 144 640 CNRS/IC: Compartimentation et dynamique cellulaires; O. Delattre, INSERM 509: Pathologie moléculaire des cancers; M. Dutreix, UMR 2027 CNRS/IC: Génotoxicologie et cycle cellulaire) and Prof D. Pinkel (UCSF Comprehensive Cancer Center), who has provided numerous microarray images allowing considerable improvement of the algorithms. Conflict of Interest: none declared. REFERENCES Eckel-Passow,J.E. et al. (2005) Experimental design and analysis of antibody microarrays: applying methods from cDNA arrays. Cancer Res., 65, 2985–2989. Hegde,P. et al. (2000) A concise guide to cDNA microarray analysis. BioTechniques, 29, 548–562. Jain,A.N. et al. (2002) Fully automated quantification of microarray image data. Genome Res., 12, 325–332. Molecular Devices, Corp (2005) GenePix Pro 6.0. http://www.moleculardevices. com/, User’s Guide and Tutorial. Novikov,E. and Barillot,E. (2005a) An algorithm for automatic evaluation of the spot quality in two-color DNA microarray experiments. BMC Bioinformatics, 6, 293. Novikov,E. and Barillot,E. (2005b) A robust algorithm for ratio estimation in two-color microarray experiments. J. Bioinform. Comput. Biol., 3, 1411–1428. Novikov,E. and Barillot,E. (2006) A noise-resistant algorithm for grid finding in microarray image analysis. Machine Vis. Appl., 17, 337–345. Pinkel,D. et al. (1998) High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat. Genet., 20, 207–211. White,A.M. et al. (2005) Automated microarray image analysis toolbox for MATLAB. Bioinformatics, 21, 3578–3579. Yang,Y.H. et al. (2002) Comparison of methods for image analysis on cDNA microarray data. J. Comput. Graph. Stat., 11, 108–136.
© Copyright 2026 Paperzz