Diapositiva 1

Microarray Data Analysis
Data quality assessment
and normalization
for affymetrix chips
Outline
• Visualization and diagnostic plots
• Normalization
• Filtering
2
Microarray studies life cycle
Biological question
Experimental design
Failed
Microarray experiment
Quality
Measurement
Image analysis
Normalization
Pass
Here
we are
Analysis
Estimation
Testing
Clustering
Biological verification
and interpretation
Discrimination
3
Looking at microarray data
Diagnostic Plots
Was the experiment a success?
Diagnostic plots for cDNA chips
• Plots can be used to check microarray
quality
• Some plots useful for both cDNA or Affy.
– Scatterplot / MA plot
– Histograms
– Spatial plots
– Box plots
• Other are more technology specific
– Degradation plots
5
Image plots for affymetrix chips
6
MA-plot for GeneChip arrays (1 color)
aRNA
RMA
MT
intensity
for each
probe set
MT
aRNA
RMA
WT
WT
intensity for
each probe
set
M
Log2(MT)Log2(WT)
A
Log2(MT*WT) / 2
(signal strength)
7
Box plots
8
Density plots
9
Digestion plots
10
Preprocessing affy chips
Preprocessing steps
• Computing expression values for each
probe set requires 3-steps
– Background correction
– Normalization
– Probe set summaries
12
Most popular approaches
• Affymetrix’s own MAS 5 or GCOS 1.0 algorithms
• RMA (Robust Multichip Analysis)
– Irizarry, Bolstad, Collin, Cope, Hobbs, Speed
(2003), Summaries of Affymetrix GeneChip probe level
data. NAR 31(4):e15
• dChip http://www.dchip.org: Li and Wong (2001).
Model-based analysis of oligonucleotide arrays:
expression index computation and outlier
detection. PNAS 98, 3113
MAS
• Background correction
– Ej = PMj - MMj*
where MMj* is chosen so that Ej is non-negative
• Normalization
– Scale so that mean Ej is same for each chip
• Probe Set Summary
– log(Signal Intensity) = TukeyBiweight(log Ej)
14
RMA- Background correction
• Ignore MM, fit model to PM
– PM = Background (N(0,s2) + Signal (Exp(a))
15
RMA-Normalization
• Force the empirical distribution of probe
intensities to be the same for every chip in
an experiment
• The common distribution is obtained by
averaging each quantile across chips:
Quantile Normalization
16
One distribution for all arrays:
the black curve
17
RMA: Probe set summary
• Robustly fit a two-way model yielding an
estimate of log2(signal) for each probe set
• Fit may be by
– median polish (quick) or by
– Mestimation (slower but yields standard errors
and good quality
• RMA reduces variability without loosing
the ability to detect differential expression
18
MA plots before and after RMA
19
Summary
• Microarray experiments have many “hot spots”
where errors or systematic biases can apper
• Visual and numerical quality control should be
performed
• Usually intensities will require normalisation
– At least global or intensity dependent normalisation
should be performed
– More sophisticated procedures rely on stronger
assumptions Must look for a balance
20