Resources at HapMap.Org

HapMap Phase II Dataset
Release #21a, January 2007 (NCBI build 35)
3.8 M genotyped SNPs => 1 SNP/700 bp
[Figure: # polymorphic SNPs/kb in consensus dataset]
International HapMap Consortium (2007). Nature 449:851-861

Goals of this segment
• Briefly summarize HapMap design and current status
• Discuss the application of HapMap

HapMap Project
A freely available public resource to increase the power and efficiency of genetic association studies of medical traits
High-density SNP genotyping across the genome provides information about
– SNP validation, frequency, assay conditions
– correlation structure of alleles in the genome
All data are freely available on the web for application in study design and analyses as researchers see fit

HapMap Samples
• 90 Yoruba individuals (30 parent-parent-offspring trios) from Ibadan, Nigeria (YRI)
• 90 individuals (30 trios) of European descent from Utah (CEU)
• 45 Han Chinese individuals from Beijing (CHB)
• 45 Japanese individuals from Tokyo (JPT)

Will HapMap apply to other population samples?
[Figure: comparison of three samples: CEU (Utah residents with European ancestry, CEPH); Whites from Los Angeles, CA; Botnia, Finland]
Population differences add very little inefficiency
From Paul de Bakker

HapMap progress
PHASE I – completed, described in Nature paper
* 1,000,000 SNPs successfully typed in all 270 HapMap samples
* ENCODE variation reference resource available
PHASE II – complete, data released in 2007, described in Nature paper
* >3,500,000 SNPs typed in total
PHASE II – complete, data released April 2009

ENCODE-HAPMAP variation project
• Ten “typical” 500 kb regions
• 48 samples sequenced
• All discovered SNPs (and any others in dbSNP) typed in all 270 HapMap samples
• Current data set: 1 SNP every 279 bp
A much more complete variation resource by which the genome-wide map can be evaluated

Completeness of dbSNP
The vast majority of common SNPs are contained in dbSNP or highly correlated with a SNP in dbSNP

Recombination hotspots are widespread and account for LD structure
[Figure: 7q21]

Utility of LD in association study
• “If I’m a causal variant, what is relevant to my detection in association studies is how well correlated I am with one of the SNPs or haplotypes examined in the study.”

Coverage of Phase II HapMap (estimated from ENCODE data)

Panel      % r2 > 0.8   max r2
YRI        81%          0.90
CEU        94%          0.97
CHB+JPT    94%          0.97

% r2 > 0.8: percentage of deeply ascertained common variants highly correlated with a HapMap SNP
max r2: average maximum correlation between a deeply ascertained variant and a neighboring HapMap SNP
Vast majority of common variation (MAF > .05) captured by Phase II HapMap
From Table 6 – “A Haplotype Map of the Human Genome”, Nature

HapMap Project

                       Phase 1                           Phase 2                  Phase 3
Samples & POP panels   269 samples (4 panels)            270 samples (4 panels)   1,115 samples (11 panels)
Genotyping centers     HapMap International Consortium   Perlegen                 Broad & Sanger
Unique QC+ SNPs        1.1 M                             3.8 M (phase I+II)       1.6 M (Affy 6.0 & Illumina 1M)
Reference              Nature (2005) 437:p1299           Nature (2007) 449:p851   Draft Rel.
1 (May 2008)

Phase 3 Samples

label   population sample                                                                      # samples   QC+ Draft 1
ASW*    African ancestry in Southwest USA                                                      90          71
CEU*    Utah residents with Northern and Western European ancestry from the CEPH collection    180         162
CHB     Han Chinese in Beijing, China                                                          90          82
CHD     Chinese in Metropolitan Denver, Colorado                                               100         70
GIH     Gujarati Indians in Houston, Texas                                                     100         83
JPT     Japanese in Tokyo, Japan                                                               91          82
LWK     Luhya in Webuye, Kenya                                                                 100         83
MEX*    Mexican ancestry in Los Angeles, California                                            90          71
MKK*    Maasai in Kinyawa, Kenya                                                               180         171
TSI     Toscani in Italia                                                                      100         77
YRI*    Yoruba in Ibadan, Nigeria                                                              180         163
Total                                                                                          1,301       1,115
* Population is made of family trios

Phase 3
• 11 panels & 1,115 samples
  – 558/557 males/females
  – 924/191 founders/non-founders
• Platforms:
  – Illumina Human 1M (Sanger)
  – Affymetrix SNP 6.0 (Broad)
• EXCLUDED from QC+ data set:
  – Samples with low completeness, and SNPs with low call rate in each pop (< 80%) and not in HWE (p < 0.001)
  – Overall false positive rate: ~3.2%
• Data merged with PLINK (concordance over 249,889 overlapping SNPs = 0.9931)
• Alleles on the (+/fwd) strand of NCBI b36

Goals of This Tutorial
This tutorial will show you how to:
• Find HapMap SNPs near a gene or region of interest (ROI)
  – View patterns of LD in the ROI
  – Select tag SNPs in the ROI
  – Download information on the SNPs in ROI for use in Haploview
  – Add custom tracks of association data
  – Create publication-quality images
• Generate customized extracts of the entire data set
• Download the entire data set in bulk

Finding HapMap SNPs in a Region of Interest
• Find the TCF7L2 gene
• Identify the characterized SNPs in the region
• View the patterns of LD (NCBI b35)
• Pick tag SNPs (NCBI b35)
• Download the region in Haploview format
• Upload your own annotations & superimpose on the HapMap
• Make a customized image for publication
• View GWA hits & OMIM annotations in the region (NCBI b36)

HapMap Glossary
• LD (linkage disequilibrium): For a pair of SNP alleles, it’s a measure of deviation from random
association (which assumes no recombination). Measured by D’, r2, LOD.
• Phased haplotypes: Estimated distribution of SNP alleles. Alleles transmitted from Mom are in the same chromosome haplotype (the maternal haplotype), while Dad’s form the paternal haplotype.
• Tag SNPs: Minimum SNP set to identify a haplotype. r2 = 1 indicates SNPs are redundant, so either one “tags” the other.
• Questions? [email protected]

1: Surf to the HapMap Browser
1a. Go to www.hapmap.org
1b. Select “HapMap Genome Browser B35”
ncbi B35: full dataset (includes LD patterns)
ncbi B36: latest, new tracks (e.g., GWA hits)

2: Search for TCF7L2
2. Type search term – “TCF7L2”
Search for a gene name, a chromosome band, or a phrase like “insulin receptor”

3: Examine Region
Chromosome-wide summary data is shown in overview
Default tracks show HapMap genotyped SNPs, refGenes with exon/intron splicing patterns, etc.
3: This exonic region has many typed SNPs. Click on ruler to re-center image.
Region view puts your ROI in genomic context

3: Examine Region (cont)
Use the Scroll/Zoom buttons and menu to change position & magnification
As you zoom in further, the display changes to include more detail

3: Examine Region (cont)
Phase III
Use the Scroll/Zoom buttons and menu to change position & magnification
3: Mouse over a SNP to see allele frequency table
Click to go to SNP details page
As you zoom in further, the display changes to include more detail

4: Turn on LD & Haplotype Tracks
4a: Scroll down to the “Tracks” section. Turn on the LD Plot and Haplotype Display tracks.
4b: Press “Update Image”
These sections allow you to adjust the display and to superimpose your own data on the HapMap

5: View variation patterns
Triangle plot shows LD values using r2 or D’/LOD scores in one or more HapMap populations
Phased haplotype track shows all 120 chromosomes with alleles colored yellow and blue

7: Adjust Track Settings (on the spot)
7a. Click on question mark preceding track name
7b.
Adjust population and display settings & press “Configure”

7: Adjust Track Settings (cont)
Select the analysis track to adjust and press “Configure”

8: Turn on Tag SNP Track
8: Activate the “tag SNP Picker” and press “Update Image”

9: Adjust tag SNP picker
Tag SNPs are selected on the fly as you navigate around the genome
9a: Click on question mark behind “tag SNP Picker”
Alternatively, you may select “Annotate tag SNP Picker” and press “Configure…”

9: Adjust tag SNP picker (cont)
Select population
Select tagging algorithm and parameters
[optional] upload list of SNPs to be included, excluded, or design scores
9b: Press “Configure” to save changes

10: Generate Reports
10: Select the desired “Download” option and press “Go” or “Configure”
Available Downloads:
• Individual Genotypes
• Population Allele & Genotype frequencies
• Pairwise LD values
• Tag SNPs

10: Generate Reports (cont)
The Genotype download format can be saved to disk or loaded directly into Haploview

10: Generate Reports (cont)
The tag SNP download is the same as you get from TAGGER …

11: Create your own tracks
Example:
• Interested in T2DM genetics
• Create file with custom annotations from http://www.broad.mit.edu/diabetes and superimpose on the HapMap
11: Upload example file: TCF7L2_annotations.txt
Detailed help on the format is under the “Help” link

11: Create your own tracks (cont)
Formatted data for the T2DM association results (score is -LOG10 of p-value)
Some SNPs were typed (known platform) and others were imputed. Format data for both typed & imputed SNPs.
Save as a text file!

11: Create your own tracks (cont)
Make edits on your own browser window by clicking on “Edit File…”

12: Create Image for Publication
Click on the +/- sign to hide/show a section
12a. Click on “Highres Image”
Mouse over a track until a cross appears. Click on track name to drag track up or down.

12: Image for Publication (cont)
12b.
Click on “View SVG Image in new browser window”
12c. Save the generated file with a “.svg” extension
You can view the file in Firefox, but use other programs (Adobe Illustrator or Inkscape) to edit it or convert it to other formats

12: Image for Publication (cont)
Inkscape is free and lets you edit and convert to other formats (many journals prefer EPS)

13: View GWA hits
13a. Go to www.hapmap.org
13b. Select “HapMap Genome Browser B36”

13: View GWA hits (cont)
13c. Type search term - “FTO”
Default tracks for B36 include GWA hits, OMIM predicted associations, and Reactome pathways

14: Read PubMed abstracts for GWA hits
14a: Mouse over a GWA hit to learn more about the association
14b: Click on the GWA hit to see the study’s PubMed abstract

Use HapMart to Generate Extracts of the HapMap Dataset
Find all HapMap characterized SNPs that:
1. Have a MAF > 0.20 in the Yoruban population panel (YRI)
2. Cause a nonsynonymous amino acid change

1. Go to hapmart.hapmap.org
1. From www.hapmap.org click on “HapMart”

2. Select data source and population of interest
Use schema menu to select dataset
2a. Choose Yoruba population or “All Populations”
2b. Press “Next”

3. Select the desired filters
3a. Check “Allele Frequency Filter” and select MAF >= 0.2
3b. Select “SNPs found in Exons – non synonymous coding SNPs”
3c. Press “Next”

4. Select output fields
4a. Choose among several pages of fields
4b. Select the fields to include in the report.
The summary shows active filters and # SNPs to be output
Options at the bottom let you select text or Excel format
4c. Press “Export”

5. Download report

Bulk downloads: Download the Complete Data
• Download the entire HapMap data set to your own computer

1. Surf to www.hapmap.org
1. From www.hapmap.org, click on “Bulk Data Download”
Or directly click on “Data”

2. Choose the Data Type
2.
Select “Genotypes”
Raw genotypes & frequencies
Analytic results
Protocols & assay design
Your own copy of the HapMap Browser
HapMap Samples
* Data also available via FTP: ftp://www.hapmap.org

3. Choose the dataset of interest
3. Select latest build, fwd_strand orientation, and “non-redundant”
fwd_strand => same as NCBI reference assembly
rs_strand => same as in dbSNP
Available Genotype Datasets:
• Non-redundant: QC+ filtered & redundant data removed
• Filtered-redundant: QC+ filtered; duplicated data not removed
• Unfiltered-redundant: Includes assays that failed QC

Applying the HapMap
• Study design - tagging
• Study coverage evaluation
• Study analysis - improving association testing
• Study interpretation
  – Comparison of multiple studies
  – Connection to genes/genomic features
  – Integration with expression and other functional data
• Other uses of HapMap data
  – Admixture, LOH, selection

Tagging from HapMap
• Since HapMap describes the majority of common variation in the genome, choosing non-redundant sets of SNPs from HapMap offers considerable efficiency without power loss in association studies

Pairwise tagging

SNP   Alleles   Haplotype 1   Haplotype 2   Haplotype 3   Haplotype 4
1     A/T       A             A             T             T
2     G/A       G             G             A             A
3     G/C       G             C             G             C
4     T/C       T             C             C             C
5     G/C       G             C             G             C
6     A/C       A             C             C             C

High r2: SNPs 1 and 2; SNPs 3 and 5; SNPs 4 and 6
After Carlson et al. (2004) AJHG 74:106
Tags: SNP 1, SNP 3, SNP 6 (3 in total)
Test for association: SNP 1, SNP 3, SNP 6

Pairwise Tagging Efficiency
Table 7: Number of selected tag SNPs to capture all observed common SNPs in the Phase I HapMap for the three analysis panels using pairwise tagging at different r2 thresholds

Pairwise    YRI       CEU       CHB+JPT
r2 ≥ 0.5    324,865   178,501   159,029
r2 ≥ 0.8    474,409   293,835   259,779
r2 = 1      604,886   447,579   434,476

Tag SNPs were picked to capture common SNPs in release 16c.1 for every 7,000 SNP bin using Haploview.
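The six-SNP pairwise tagging example above can be worked through in a few lines of Python. This is a minimal sketch, not the Tagger or Haploview implementation: it assumes the four haplotypes are equally frequent, computes r2 from haplotype counts, and uses a simple greedy rule (take the SNP that captures the most untagged SNPs at r2 ≥ 0.8). The names `HAPLOTYPES`, `r_squared`, and `greedy_pairwise_tags` are illustrative only.

```python
# Toy data: the four haplotypes from the six-SNP pairwise-tagging example
# (one string per haplotype, one character per SNP, SNPs 1..6 left to right).
HAPLOTYPES = ["AGGTGA", "AGCCCC", "TAGCGC", "TACCCC"]


def r_squared(haps, i, j):
    """r2 between biallelic SNPs i and j (0-based), weighting haplotypes equally."""
    ref_i, ref_j = haps[0][i], haps[0][j]  # arbitrary reference alleles
    n = len(haps)
    p_i = sum(h[i] == ref_i for h in haps) / n
    p_j = sum(h[j] == ref_j for h in haps) / n
    p_ij = sum(h[i] == ref_i and h[j] == ref_j for h in haps) / n
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    d = p_ij - p_i * p_j
    return d * d / denom


def greedy_pairwise_tags(haps, threshold=0.8):
    """Pick tags greedily: each round takes the SNP capturing the most untagged SNPs."""
    n_snps = len(haps[0])
    captures = {
        i: {j for j in range(n_snps) if j == i or r_squared(haps, i, j) >= threshold}
        for i in range(n_snps)
    }
    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # sorted() makes ties resolve to the lowest-numbered SNP
        best = max(sorted(untagged), key=lambda i: len(captures[i] & untagged))
        tags.append(best)
        untagged -= captures[best]
    return tags, captures


tags, captures = greedy_pairwise_tags(HAPLOTYPES)
for t in tags:
    print(f"SNP {t + 1} captures SNPs {sorted(j + 1 for j in captures[t])}")
```

With this tie-breaking rule the sketch selects SNP 4 rather than SNP 6 as the third tag. The two are perfectly correlated in the toy data (r2 = 1), so either choice gives three tags in total, as on the slide.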
Tagging Phase I HapMap offers 2-5x gains in efficiency

Use of haplotypes can improve genotyping efficiency

SNP   Alleles   Haplotype 1   Haplotype 2   Haplotype 3   Haplotype 4
1     A/T       A             A             T             T
2     G/A       G             G             A             A
3     G/C       G             C             G             C
4     T/C       T             C             C             C
5     G/C       G             C             G             C
6     A/C       A             C             C             C

Tags in a multi-marker test should be conditional on the significance of LD in order to avoid overfitting
Pairwise tags: SNP 1, SNP 3, SNP 6 (3 in total)
Multi-marker tags: SNP 1, SNP 3 (2 in total)
Test for association:
• SNP 1 (captures SNPs 1+2)
• SNP 3 (captures SNPs 3+5)
• “AG” haplotype (captures SNPs 4+6)

Efficiency and power
[Figure: relative power (%) against average marker density (per kb), for tag SNPs and random SNPs]
~300,000 tag SNPs needed to cover common variation in whole genome in CEU
P.I.W. de Bakker et al. (2005) Nat Genet Advance Online Publication 23 Oct 2005

How to pick tag SNPs?
• What is the genetic hypothesis? Which variants do you want to test for a role in disease?
  – functional annotation (coding SNPs)
  – allele frequency (HapMap ascertainment)
  – previously implicated associations
• Go to http://www.hapmap.org
  – DCC supported interactive tagging
• Export HapMap data into tools such as Tagger, Haploview (www.broad.mit.edu/mpg)

Will tag SNPs picked from HapMap apply to other population samples?
[Figure: comparison of three samples: CEU (Utah residents with European ancestry, CEPH); Whites from Los Angeles, CA; Botnia, Finland]
Population differences add very little inefficiency
Platform presentation: Paul de Bakker (#223: Sat 9.30)

Genome-wide association coverage
• If genome-wide products are typed on the HapMap sample panel, the HapMap SNPs not included in the product provide an evaluation of the product’s coverage
  – ENCODE (deep ascertainment)
  – Phase II (dense, genome-wide)

Further Information
• HapMap Publications & Guidelines: http://hapmap.cshl.org/publications.html.en
• Past tutorials & user’s guide to HapMap.org: http://www.hapmap.org/tutorials.html.en
• Questions? [email protected]
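The coverage evaluation described under “Genome-wide association coverage” can be sketched as follows. This is an illustration under stated assumptions, not the method used for the published coverage figures: it reuses the toy six-SNP haplotypes from the tagging example, treats the haplotypes as equally frequent, applies no MAF filter, and assumes both the product and the set of untyped SNPs are non-empty. The function names and the two-SNP “product” are hypothetical.

```python
# Toy data reused from the six-SNP tagging example; real use would load
# phased HapMap haplotypes for one analysis panel instead.
HAPLOTYPES = ["AGGTGA", "AGCCCC", "TAGCGC", "TACCCC"]


def r_squared(haps, i, j):
    """r2 between biallelic SNPs i and j (0-based), weighting haplotypes equally."""
    ref_i, ref_j = haps[0][i], haps[0][j]  # arbitrary reference alleles
    n = len(haps)
    p_i = sum(h[i] == ref_i for h in haps) / n
    p_j = sum(h[j] == ref_j for h in haps) / n
    p_ij = sum(h[i] == ref_i and h[j] == ref_j for h in haps) / n
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    d = p_ij - p_i * p_j
    return d * d / denom


def coverage(haps, product_snps, threshold=0.8):
    """Return (% of untyped SNPs with max r2 >= threshold, mean max r2).

    product_snps holds the 0-based indices of SNPs typed by the product;
    every other SNP in the haplotypes is treated as untyped.
    """
    product = set(product_snps)
    untyped = [i for i in range(len(haps[0])) if i not in product]
    # Best correlation of each untyped SNP with any SNP on the product
    best = [max(r_squared(haps, i, p) for p in product) for i in untyped]
    pct = 100.0 * sum(b >= threshold for b in best) / len(best)
    return pct, sum(best) / len(best)


# Hypothetical product that types only SNPs 1 and 3 (0-based indices 0 and 2).
pct, mean_max_r2 = coverage(HAPLOTYPES, [0, 2])
print(f"{pct:.0f}% of untyped SNPs have r2 >= 0.8; mean max r2 = {mean_max_r2:.2f}")
```

The two returned values correspond to the two columns of the Phase II coverage table: the percentage of variants with r2 above the threshold and the average maximum r2.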