Microarray Data Analysis Data quality assessment and normalization for affymetrix chips Outline • Visualization and diagnostic plots • Normalization • Filtering 2 Microarray studies life cycle Biological question Experimental design Failed Microarray experiment Quality Measurement Image analysis Normalization Pass Here we are Analysis Estimation Testing Clustering Biological verification and interpretation Discrimination 3 Looking at microarray data Diagnostic Plots Was the experiment a success? Diagnostic plots for cDNA chips • Plots can be used to check microarray quality • Some plots useful for both cDNA or Affy. – Scatterplot / MA plot – Histograms – Spatial plots – Box plots • Other are more technology specific – Degradation plots 5 Image plots for affymetrix chips 6 MA-plot for GeneChip arrays (1 color) aRNA RMA MT intensity for each probe set MT aRNA RMA WT WT intensity for each probe set M Log2(MT)Log2(WT) A Log2(MT*WT) / 2 (signal strength) 7 Box plots 8 Density plots 9 Digestion plots 10 Preprocessing affy chips Preprocessing steps • Computing expression values for each probe set requires 3-steps – Background correction – Normalization – Probe set summaries 12 Most popular approaches • Affymetrix’s own MAS 5 or GCOS 1.0 algorithms • RMA (Robust Multichip Analysis) – Irizarry, Bolstad, Collin, Cope, Hobbs, Speed (2003), Summaries of Affymetrix GeneChip probe level data. NAR 31(4):e15 • dChip http://www.dchip.org: Li and Wong (2001). Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. PNAS 98, 3113 MAS • Background correction – Ej = PMj - MMj* where MMj* is chosen so that Ej is non-negative • Normalization – Scale so that mean Ej is same for each chip • Probe Set Summary – log(Signal Intensity) = TukeyBiweight(log Ej) 14 RMA- Background correction • Ignore MM, fit model to PM – PM = Background (N(0,s2) + Signal (Exp(a)) 15 RMA-Normalization • Force the empirical distribution of probe intensities to be the same for every chip in an experiment • The common distribution is obtained by averaging each quantile across chips: Quantile Normalization 16 One distribution for all arrays: the black curve 17 RMA: Probe set summary • Robustly fit a two-way model yielding an estimate of log2(signal) for each probe set • Fit may be by – median polish (quick) or by – Mestimation (slower but yields standard errors and good quality • RMA reduces variability without loosing the ability to detect differential expression 18 MA plots before and after RMA 19 Summary • Microarray experiments have many “hot spots” where errors or systematic biases can apper • Visual and numerical quality control should be performed • Usually intensities will require normalisation – At least global or intensity dependent normalisation should be performed – More sophisticated procedures rely on stronger assumptions Must look for a balance 20
© Copyright 2026 Paperzz