SNP chips
Advanced Microarray Analysis
Mark Reimers, Dept Biostatistics, VCU, Fall 2008

Affy SNP chips

SNP Chip Probe Design
• 10 25-mers overlapping the SNP
• Alleles A & B
• Sense and Anti-sense
  – or PM and MM (old)

RMA for SNP chips
• Initial Affy software wasn’t very accurate
• Rabbee & Speed (2006) proposed RLMM, an RMA-like method using:
  – Quantile normalization
  – Two variables (A & B signals)
  – Discriminant analysis
• Much better than Affy software
• Variant (BRLMM) adopted by Affy

Discriminating SNPs
• Estimate a common covariance for the clusters on ‘training’ set (HapMap) data
• Separate clusters by Mahalanobis metric
• Use pre-defined clusters & metric to tell apart alleles on new data

Success Rate
• 90% (MPAM) to 98% (CRLMM) called at comparable accuracy on HapMap data
  – Cross-validation estimate
• BUT: new chips don’t have the same distributions as the ‘training’ set

CRLMM - a heroic solution
• RLMM couldn’t be extended across labs
• Still problems with several hundred SNPs
• CRLMM addresses both these issues by careful normalization
• Achieves accuracy of 99.85% on hets; 99.95% on homozygotes
• Most complicated statistical calculation in BioC!

CRLMM Overview
1. Normalize intensity on each chip separately by [equation not recovered]
2. Summarize $q_{A+}$, $q_{B+}$, $q_{A-}$, $q_{B-}$ by median polish: $M_+ = q_{A+} - q_{B+}$ ; $M_- = q_{A-} - q_{B-}$
3. Model log ratio bias on each chip by [equation not recovered]
4. Estimate log ratio bias using E-M
   – Where $Z_i$ indexes which SNP state is likely
   – k = 1, 2, 3 for AA, AB, BB

Normalization – Step 1
• Regress (PM) intensity on sequence predictors and fragment length
[Figures: $g(L)$ and 95% CI on one chip; $h_b(t)$ for all four bases on two chips]

Normalization – Step 1 (continued)
• Too many $h_b(t)$’s
  – Impose constraint:
• $h_b(t)$ is a cubic spline with 5 df on [1,25]
• Forces neighboring values of h to be close
• Allows variation in smoothness (unlike loess)
• Subtract fitted values from signal
• BUT: bias still present

Step 2 – Summarization
• Median Polish
  – Tukey’s exploratory method for arrays of numbers
  – Iterative method
• Subtract medians of each row and each column (and accumulate) until medians converge
• Robust
• Fast

Step 3 – Ratio Normalization
• Fit bias function:
  – of form: [equation not recovered]
• m reflects allele bases
• But what is k?
• Estimate by E-M
[Figure: m $f_L(L)$ for one chip]

E-M Algorithm
• Systematic way to ‘guess and improve’
• Start with putative assignments to classes
  – i.e. guess k based on overall separations
• Estimate bias for each k: $f_{i,k}$
• Use residuals from fit to classify again
• Repeat until convergence

Final Step: Calling
• Aim: separation in two-dimensional log-ratio space
• Accuracy > 99.85% on all HapMap calls
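
Illustration for the ‘Discriminating SNPs’ slide: a minimal Python sketch of calling a genotype by Mahalanobis distance to pre-defined clusters that share one pooled covariance. This is not the RLMM or CRLMM implementation. The function names (`pooled_covariance`, `mahalanobis_sq`, `call_genotype`) and the toy (A signal, B signal) values are invented for illustration only.

```python
def pooled_covariance(clusters):
    """Return cluster centers and one common 2x2 covariance (sxx, sxy, syy).

    `clusters` maps a genotype label to a list of (A, B) training points.
    """
    sxx = sxy = syy = 0.0
    n_total = 0
    centers = {}
    for label, pts in clusters.items():
        n = len(pts)
        mx = sum(p[0] for p in pts) / n
        my = sum(p[1] for p in pts) / n
        centers[label] = (mx, my)
        # Accumulate within-cluster scatter around each cluster's own center.
        for x, y in pts:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
        n_total += n
    dof = n_total - len(clusters)  # pooled degrees of freedom
    return centers, (sxx / dof, sxy / dof, syy / dof)


def mahalanobis_sq(point, center, cov):
    """Squared Mahalanobis distance in 2-D, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def call_genotype(point, centers, cov):
    """Assign the point to the cluster with the smallest Mahalanobis distance."""
    return min(centers, key=lambda g: mahalanobis_sq(point, centers[g], cov))


# Toy training data (invented): log-scale (A, B) signals per genotype.
training = {
    "AA": [(10, 6), (11, 7), (10, 7), (11, 6)],
    "AB": [(9, 9), (10, 10), (9, 10), (10, 9)],
    "BB": [(6, 10), (7, 11), (6, 11), (7, 10)],
}
centers, cov = pooled_covariance(training)
print(call_genotype((10.2, 6.8), centers, cov))
```

Because the clusters and metric are fixed from the training set, a new chip whose intensity distribution differs from the training data can be miscalled, which is the limitation raised on the ‘Success Rate’ slide.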
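
Illustration for ‘Step 2 – Summarization’: a minimal Python sketch of Tukey’s median polish as described there (subtract row and column medians, accumulate them, repeat until the medians converge). The function name `median_polish`, its parameters, and the choice of rows as probes and columns as chips are assumptions made for this example, not details taken from CRLMM.

```python
from statistics import median


def median_polish(table, max_iter=10, tol=1e-6):
    """Tukey's median polish on a list of equal-length rows.

    Returns (overall, row_effects, col_effects, residuals) with
    table[i][j] == overall + row_effects[i] + col_effects[j] + residuals[i][j].
    """
    resid = [[float(x) for x in row] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols

    for _ in range(max_iter):
        # Row sweep: subtract each row's median and accumulate it.
        row_med = [median(r) for r in resid]
        for i in range(n_rows):
            row_eff[i] += row_med[i]
            resid[i] = [x - row_med[i] for x in resid[i]]
        # Move the median of the column effects into the overall term.
        shift = median(col_eff)
        overall += shift
        col_eff = [c - shift for c in col_eff]

        # Column sweep: subtract each column's median and accumulate it.
        col_med = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for j in range(n_cols):
            col_eff[j] += col_med[j]
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
        # Move the median of the row effects into the overall term.
        shift = median(row_eff)
        overall += shift
        row_eff = [r - shift for r in row_eff]

        # Stop once the medians being subtracted have become negligible.
        if max(abs(m) for m in row_med + col_med) < tol:
            break
    return overall, row_eff, col_eff, resid


# Toy table (invented): 3 probes (rows) by 3 chips (columns), one outlier.
table = [[7, 9, 11], [8, 10, 120], [9, 11, 13]]
overall, probe_eff, chip_eff, resid = median_polish(table)
print(overall, chip_eff)
```

Under the assumed layout, `overall + chip_eff[j]` serves as the per-chip summary. Because only medians are used, the single outlying probe value ends up in the residuals instead of distorting the summary, which is the robustness the slide refers to.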