Genotyping Technology
02-223 How to Analyze Your Own Genome, Fall 2013

HapMap Project
                     | Phase 1                         | Phase 2                     | Phase 3
Samples & POP panels | 269 samples (4 populations)     | 270 samples (4 populations) | 1,115 samples (11 populations)
Genotyping centers   | HapMap International Consortium | Perlegen                    | Broad & Sanger
Unique QC+ SNPs      | 1.1 M                           | 3.8 M (phase I+II)          | 1.6 M (Affy 6.0 & Illumina 1M)
Reference            | Nature (2005) 437:p1299         | Nature (2007) 449:p851      | Draft Rel. 1 (May 2008)

Phase 3 Samples
Label | Population sample                                                           | # samples | QC+ Draft 1
ASW*  | African ancestry in Southwest USA                                           | 90        | 71
CEU*  | Utah residents with Northern and Western European ancestry from the CEPH collection | 180 | 162
CHB   | Han Chinese in Beijing, China                                               | 90        | 82
CHD   | Chinese in Metropolitan Denver, Colorado                                    | 100       | 70
GIH   | Gujarati Indians in Houston, Texas                                          | 100       | 83
JPT   | Japanese in Tokyo, Japan                                                    | 91        | 82
LWK   | Luhya in Webuye, Kenya                                                      | 100       | 83
MEX*  | Mexican ancestry in Los Angeles, California                                 | 90        | 71
MKK*  | Maasai in Kinyawa, Kenya                                                    | 180       | 171
TSI   | Toscans in Italy                                                            | 100       | 77
YRI*  | Yoruba in Ibadan, Nigeria                                                   | 180       | 163
Total |                                                                             | 1,301     | 1,115
* Population is made of family trios

HapMap Browser
1a. Go to www.hapmap.org
1b. Select "HapMap phase 3"

Overview
Genotyping technology
- SNPs and copy number variations
Processing the data from genotyping assays
- Linkage disequilibrium
- Haplotype inference, phasing
- Tag SNP selection
Preliminary analysis of HapMap data

SNP Genotyping with SNP Array
• SNP arrays rely on the biochemical principle that nucleotide bases bind to their complementary partners (A binds to T, C binds to G).
– An array of oligonucleotide sequences is laid across the surface of the chip.
– The sample's DNA is amplified and hybridized to the array.
– The array is scanned to quantify the relative amount of sample bound to each feature.
• For each SNP there is a pair of probes, one for each allele.
• Widely used SNP array technologies: Affymetrix vs. Illumina SNP arrays

[Figure: Affymetrix GeneChip probe array]

SNP Array Technology: Affymetrix Array
• The probes interrogate a fragment of DNA that harbors an A/C SNP.
• There are 25-mer probes (25 nucleotides) for both alleles.
• The position of the SNP within the probe varies from probe to probe.
• The DNA binds to both probes regardless of which allele it carries. However, it binds more efficiently to the probe that is complementary at all 25 bases (bright yellow) than to the probe that mismatches at the SNP site (dimmer yellow).
• This impeded binding shows up as a dimmer signal.

SNP Array Technology: Illumina Array
• The probes interrogate a fragment of DNA that harbors an A/C SNP.
• Attached to each Illumina bead is a 50-mer sequence complementary to the sequence adjacent to the SNP site.
• The single-base extension (T or G) that is complementary to the allele carried by the DNA (A or C, respectively) then binds and produces the corresponding colored signal (red or green, respectively).

Calling Genotypes
• The raw signal intensities from the SNP array can be noisy.
• How to cope with the noise:
– Pool the raw signal intensities from multiple individuals for each SNP and perform a cluster analysis.
– Expect three clusters, one for each of the three possible genotypes (AA, AB, BB).
[Figure: each dot represents the raw signal intensity for one SNP in one individual]

CNV Genotyping with Array CGH
• Genomic DNA from two cell populations is labeled with different dyes (red and green) and hybridized to a microarray.
• The signal at each probe is summarized as log2(red/green) = log2(red) - log2(green).
[Figure: copy numbers are mostly the same across the chromosome between the Test and Ref samples; one region shows a reduction by a factor of two in copy number, i.e. a log2 ratio of -1]

Linkage Disequilibrium (LD)
• LD reflects the relationship between alleles at different loci.
• r² (the squared correlation coefficient) is often used as a measure of LD.

Basic Concepts
[Figure: Locus A and Locus B. Parent 1 and Parent 2 carry the haplotypes A-B and a-b. High LD -> no recombination (r² = 1): offspring inherit only A-B or a-b, so SNP1 "tags" SNP2. Low LD -> recombination: many possibilities (A-b, a-B, A-B, a-b, etc.)]

How to Compute r²
Individual | SNP1 | SNP2 | SNP3
1          | 1    | 1    | 0
2          | 1    | 1    | 1
3          | 1    | 1    | 0
4          | 1    | 1    | 1
5          | 0    | 0    | 0
6          | 0    | 0    | 1
7          | 0    | 0    | 0
8          | 0    | 0    | 1
r²(SNP1, SNP2) = 1.0; r²(SNP1, SNP3) = 0.0; r²(SNP2, SNP3) = 0.0

r² matrix
     | SNP1 | SNP2 | SNP3
SNP1 | 1.0  | 1.0  | 0.0
SNP2 | 1.0  | 1.0  | 0.0
SNP3 | 0.0  | 0.0  | 1.0

Linkage Disequilibrium in SNP Data
• r² in SNP data from a population of individuals (black: r² = 1, white: r² = 0)
[Figure: r² heat maps along the genome for Population 1 and Population 2]

Summary
• SNP/CNV genotyping technology and genotype-calling methods
• Linkage disequilibrium among neighboring loci reflects the non-random distribution of recombination sites across the genome.
• The level of linkage disequilibrium can be quantified by r².
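The Calling Genotypes slide pools one SNP's raw intensities across many individuals and looks for three clusters. The sketch below illustrates that idea. It is not the algorithm used by Affymetrix or Illumina software, which rely on more elaborate statistical models. It rests on two assumptions made for illustration:

- Each individual contributes one (allele A, allele B) intensity pair.
- The pair is reduced to a single "angle" between 0 (pure A signal) and 1 (pure B signal), which is then clustered with plain 1-D k-means (k = 3).

The function names `allele_angle` and `call_genotypes` are hypothetical.

```python
import math


def allele_angle(a: float, b: float) -> float:
    """Map (A, B) intensities to [0, 1]: 0 = only allele A signal, 1 = only allele B."""
    return math.atan2(b, a) / (math.pi / 2)


def call_genotypes(intensities, n_iter=50):
    """Cluster one SNP's (A, B) intensities from many individuals into three
    groups with 1-D k-means, then label the groups AA / AB / BB.
    This is a hypothetical illustration, not a production genotype caller."""
    thetas = [allele_angle(a, b) for a, b in intensities]
    # Start the centroids at the idealized positions of AA, AB and BB.
    centroids = [0.0, 0.5, 1.0]
    labels = [0] * len(thetas)
    for _ in range(n_iter):
        # Assignment step: each individual joins its nearest centroid.
        labels = [min(range(3), key=lambda k: abs(t - centroids[k])) for t in thetas]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for k in range(3):
            members = [t for t, lab in zip(thetas, labels) if lab == k]
            new_centroids.append(sum(members) / len(members) if members else centroids[k])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    names = ("AA", "AB", "BB")
    return [names[lab] for lab in labels]


# Made-up intensities for six individuals at one SNP.
sample = [(1000, 50), (950, 80), (520, 480), (600, 550), (40, 900), (70, 1100)]
print(call_genotypes(sample))  # ['AA', 'AA', 'AB', 'AB', 'BB', 'BB']
```

Seeding the centroids at 0, 0.5 and 1 keeps the cluster order fixed, so cluster 0 can always be read as AA and cluster 2 as BB. A real caller must also handle SNPs where one genotype is absent and flag low-confidence calls.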
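The array CGH slide summarizes each probe as log2(red/green) = log2(red) - log2(green). The short sketch below computes that ratio and shows why a halving of copy number appears as a value of -1. It assumes red is the Test sample and green is the Ref sample. The ±0.5 cutoff in `flag_copy_changes` is a hypothetical choice for illustration, not a published standard, and both function names are hypothetical.

```python
import math


def log2_ratios(red, green):
    """Per-probe log2(red / green), computed as log2(red) - log2(green).
    Assumes red = Test sample and green = Ref sample."""
    return [math.log2(r) - math.log2(g) for r, g in zip(red, green)]


def flag_copy_changes(ratios, threshold=0.5):
    """Label each probe 'loss', 'gain' or 'normal'.
    The 0.5 threshold is a hypothetical cutoff used only for illustration."""
    calls = []
    for x in ratios:
        if x <= -threshold:
            calls.append("loss")
        elif x >= threshold:
            calls.append("gain")
        else:
            calls.append("normal")
    return calls


# Made-up intensities: probes 3 and 4 sit in a region with half the copy number.
red = [1000, 980, 500, 510, 1010]
green = [1000, 1000, 1000, 1000, 1000]
ratios = log2_ratios(red, green)
print([round(x, 2) for x in ratios])
print(flag_copy_changes(ratios))  # ['normal', 'normal', 'loss', 'loss', 'normal']
```

Working on the log scale makes gains and losses symmetric: halving the copy number gives -1, and doubling it gives +1.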
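The How to Compute r² slide can be reproduced directly. The sketch below treats each SNP as a vector of 0/1 allele codes over the eight individuals and computes r² as the squared Pearson correlation. It also computes the haplotype-frequency form D² / (pA(1 - pA) pB(1 - pB)) with D = pAB - pA·pB. For 0/1 codes the two forms give the same value. Both function names are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two 0/1 allele vectors.
    Returns NaN for a monomorphic SNP, where r^2 is undefined."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov * cov / (vx * vy)


def r_squared_ld(x, y):
    """Same quantity from allele frequencies: D^2 / (pA(1-pA) pB(1-pB))."""
    n = len(x)
    p_a, p_b = sum(x) / n, sum(y) / n
    p_ab = sum(1 for a, b in zip(x, y) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Data from the "How to Compute r²" slide (8 individuals).
snps = {
    "SNP1": [1, 1, 1, 1, 0, 0, 0, 0],
    "SNP2": [1, 1, 1, 1, 0, 0, 0, 0],
    "SNP3": [0, 1, 0, 1, 0, 1, 0, 1],
}
names = list(snps)
matrix = [[r_squared(snps[a], snps[b]) for b in names] for a in names]
for name, row in zip(names, matrix):
    print(name, " ".join(f"{v:.1f}" for v in row))
# SNP1 1.0 1.0 0.0
# SNP2 1.0 1.0 0.0
# SNP3 0.0 0.0 1.0
```

An r² of 1 between SNP1 and SNP2 means either SNP predicts the other perfectly, which is what lets SNP1 "tag" SNP2.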