Introduction to SNP resources and a tool for variation functional analysis

Chuan-Kun Liu
[email protected]
National Genotyping Center
Academia Sinica, Taiwan
2010/07/26

Training p.2

Outline
• Basic Concepts
• NCBI dbSNP
• International HapMap Project
• VarioWatch

Basic Concepts
The Human Genome
• 3 × 10^9 base pairs
• The Human Genome Project was completed in 2003 (telomeres, centromeres, contig order?)
• Approximately 20,000~25,000 genes (alternative splicing?)
• Sequence and structural variations are seen in different genomes

Basic Concepts
• Allele
  Original definition: one of the different forms of a gene that can exist at a single locus
  Extended definition: one of the alternative forms at a site where DNA, either coding or noncoding, differs among genomes

Basic Concepts
• Homozygous: having two identical alleles at corresponding loci on homologous chromosomes
• Heterozygous: having two different alleles at corresponding loci on homologous chromosomes

Basic Concepts
• Single Nucleotide Polymorphism (SNP): SNPs are DNA sequence variations that occur when a single nucleotide (A, T, C, or G) in the genome sequence is changed

      SNP1         SNP2
TGTAGTTGTGCAGGCCTGTAGTCCCAG
TGTAGTAGTGCAGGCCTGTCGTCCCAG
TGTAGTAGTGCAGGCCTGTGGTCCCAG

Properties of SNPs
• More than 23 million SNPs have been found in the human genome (dbSNP 131)
• About 3 million SNP differences between any 2 individuals (3 billion * 1/1000)
• Minor allele frequency ≥ 1%
• Allele type: biallelic [A/C], triallelic [A/C/G], N [A/T/C/G], deletion [-/C], insertion [G/-], and others

NCBI dbSNP
http://www.ncbi.nlm.nih.gov/projects/SNP

NCBI dbSNP
Timeline of dbSNP builds and human genome assemblies, 2005 to 2010:
2005/08  Human genome assembly 36
2005/10  dbSNP125 (Build 35.1)
2006/05  dbSNP126 (Build 36.1)
2007/03  dbSNP127 (Build 36.2)
2007/10  dbSNP128 (Build 36.2)
2008/04  dbSNP129 (Build 36.3)
2009/02  GRCh37
2009/05  dbSNP130 (Build 36.3)
2010/04  dbSNP131 (Build 37.1)

NCBI dbSNP 126 (updated 2006/11/15)
Variation Class   Number of Variation   Percentage (%)
SNP               9,636,322             80.82
DIP               2,202,926             18.48
MIXED             51,775                0.43
MNP               16,329                0.14
NAMED             9,815                 0.08
STR               5,096                 0.04
NO VARIATION      424                   0.004
HETEROZYGOUS      4                     0.00003

NCBI dbSNP 131 (updated 2010/04/01)
Variation Class   Number of Variation   Percentage (%)
SNP               18,603,484            78.66
DIP               4,833,619             20.44
MIXED             111,366               0.47
MNP               N/A                   N/A
NAMED             N/A                   N/A
STR               5,195                 0.02
NO VARIATION      3,332                 0.01
HETEROZYGOUS      N/A                   N/A

SNP Submission Info
Notable Human SNP Submitters in dbSNP Build 131
Handle             Submitter Name                                              Institution
1000GENOMES        1000 Genomes Project Data coordinated by EBI and NCBI       European Bioinformatics Institute / National Center for Biotechnology Information
COMPLETE_GENOMICS  Radoje Drmanac                                              Complete Genomics, Inc.
ILLUMINA           Cindy Taylor Lawley                                         Illumina Inc.
ENSEMBL            Ewan Birney                                                 EMBL Outstation Hinxton
GMI                Jeongsun Seo                                                Seoul National University Medical Research Center
AFFY               Affymetrix Technical Support                                Affymetrix

How to collect SNP data
1. dbSNP homepage
2. NCBI Entrez web query interface
3. NCBI Entrez Programming Utilities
4. Localized dbSNP

dbSNP homepage
http://www.ncbi.nlm.nih.gov/projects/SNP/

NCBI Entrez web query interface
http://www.ncbi.nlm.nih.gov/snp/limits
Base positions are 1-based, NOT 0-based

NCBI Entrez web query interface
http://www.ncbi.nlm.nih.gov/snp?Db=snp&Cmd=DetailsSearch&Term=%233+AND+%234+AND+"indchineseyh1"[Filter]

SNP info
Cannot represent the version of the SNP info
SNP: GeneView

SNP info
Not always the same as the reference sequence

SNP: GeneView

NCBI Map Viewer
Maps & options -> Sequence Maps -> Variation

NCBI Entrez Programming Utilities
http://eutils.ncbi.nlm.nih.gov

Eutils for EntrezSNP
http://www.ncbi.nlm.nih.gov/projects/SNP/SNPeutils.htm

NCBI Entrez Programming Utilities
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=snp&term=59980974

NCBI Entrez Programming Utilities
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=59980974&report=DocSet

Create a local copy of dbSNP
ftp://ftp.ncbi.nih.gov/snp/database/
README.create_local_dbSNP.txt
erd_dbSNP.pdf
/organism_data : Data for each organism-specific database
/organism_schema : Schema DDL (Data Definition Language in SQL) for each organism-specific database
/shared_data : Data in dbSNP_main that is shared by all organism databases
/shared_schema : Schema DDL for dbSNP_main

Practice
• What is the physical position of rs2770 in dbSNP build 131 with assembly GRCh37?
• Is rs60817388 a valid snp_id?
• How many human SNPs in dbSNP build 131 are triallelic?
• Which genome assembly is reported when SNP info is queried through E-utilities?
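
The two E-utilities request URLs shown on the slides can also be assembled in a script rather than typed by hand. The following is a minimal sketch only: the helper name `build_eutils_url` is hypothetical, the base URL and the parameters (`db`, `term`, `id`, `report`) are copied from the slides as they stood in 2010, and the sketch assumes nothing about what the service returns today. It only builds the URLs and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base URL exactly as printed on the slides (2010). Whether the service
# still accepts this form is an assumption, not something checked here.
EUTILS_BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_eutils_url(tool, **params):
    """Return the request URL for an E-utilities tool such as 'esearch'.

    Hypothetical helper: it URL-encodes the keyword arguments in the
    order given and appends them to <base>/<tool>.fcgi.
    """
    return f"{EUTILS_BASE}/{tool}.fcgi?{urlencode(params)}"


# The two requests from the slides, for SNP id 59980974.
esearch_url = build_eutils_url("esearch", db="snp", term="59980974")
efetch_url = build_eutils_url("efetch", db="snp", id="59980974", report="DocSet")

print(esearch_url)
print(efetch_url)
```

To actually retrieve a record, the resulting URL could be passed to `urllib.request.urlopen`; that step is left out so the sketch runs without network access.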

DNA recombination
http://www.nature.com/nrm/journal/v6/n1/fig_tab/nrm1546_F4.html

Haplotype / Tag SNP
LD (linkage disequilibrium): for a pair of SNP alleles, a measure of deviation from random association (the deviation persists where there has been little or no recombination).
Measured by D', r^2, LOD
High LD -> little or no recombination
http://hapmap.ncbi.nlm.nih.gov/originhaplotype.html.en
http://hapmap.ncbi.nlm.nih.gov/whatishapmap.html

HapMap Project
Launched in Oct 2002
                      Phase 1                          Phase 2                  Phase 3
Samples & POP panels  269 samples (4 panels)           270 samples (4 panels)   1,397 samples (11 panels)
Genotyping centers    HapMap International Consortium  Perlegen                 Broad & Sanger
Unique QC+ SNPs       1.1 M                            3.8 M (phase I+II)       1.6 M (Affy 6.0 & Illumina 1M)
Reference             Nature (2005) 437:p1299          Nature (2007) 449:p851   Draft Rel. 3 (May 2010)

Phase 3 Samples
Label    Population Sample                                                                     Samples  QC+ Draft 3
ASW (A)  African ancestry in Southwest USA                                                     98       87
CEU (C)  Utah residents with Northern and Western European ancestry from the CEPH collection   180      165
CHB (H)  Han Chinese in Beijing, China                                                         162      137
CHD (D)  Chinese in Metropolitan Denver, Colorado                                              129      109
GIH (G)  Gujarati Indians in Houston, Texas                                                    117      101
JPT (J)  Japanese in Tokyo, Japan                                                              131      113
LWK (L)  Luhya in Webuye, Kenya                                                                122      110
MEX (M)  Mexican ancestry in Los Angeles, California                                           104      86
MKK (K)  Maasai in Kinyawa, Kenya                                                              205      184
TSI (T)  Tuscan in Italy                                                                       114      102
YRI (Y)  Yoruban in Ibadan, Nigeria (West Africa)                                              220      203
Total                                                                                          1,582    1,397
http://ccr.coriell.org/Sections/Collections/NHGRI/hapmap.aspx?PgId=266&coll=HG
ftp://ftp.ncbi.nlm.nih.gov/hapmap/genotypes/2010-05_phaseIII/relationships_w_pops_041510.txt

Phase 3: Draft Release 1
Population  samples  QC+ SNPs   poly QC+ SNPs
ASW         71       1,632,186  1,536,247
CEU         162      1,634,020  1,403,896
CHB         82       1,637,672  1,311,113
CHD         70       1,619,203  1,270,600
GIH         83       1,631,060  1,391,578
JPT         82       1,637,610  1,272,736
LWK         83       1,631,688  1,507,520
MEX         71       1,614,892  1,430,334
MKK         171      1,621,427  1,525,239
TSI         77       1,629,957  1,393,925
YRI         163      1,634,666  1,484,416

Phase 3 Data
ftp://ftp.ncbi.nlm.nih.gov/hapmap/genotypes/

HapMap Genome Browser
http://www.hapmap.org

HapMap Genome Browser
Search examples: Chr20, Chr9:660000..760000, SNP:rs6870660, NM_153254, BRCA2, 5q31, ENm010, gwa*, PARK3

HapMap Genome Browser
Click on ruler to re-center image

Impute genotypes using HapMap Data
• Interested in the VAV1 gene
• Commercially available platforms have few overlapping SNPs in this region
• HapMap genotyped many SNPs in the region
• Use genotypes for HapMap SNPs to impute genotypes & compare nonoverlapping SNP sets

Impute genotypes using HapMap Data
example.dat: 20 user-provided SNPs; all should be part of the HapMap
example.ped: genotypes for 336 unrelated individuals

Impute genotypes using HapMap Data
• Info (143 provided & imputed HapMap SNPs)
SNP         Al1  Al2  Freq1   MAF     Quality  Rsq
rs10419572  T    A    0.9041  0.0959  0.8179   0.1069
rs415218    T    A    0.9709  0.0291  0.9427   0.0313
rs4807100   A    G    0.4713  0.4713  0.9790   0.9625
rs4807101   T    C    0.4714  0.4714  0.9803   0.9649
rs1651876   T    C    0.9631  0.0369  0.9277   0.0216
…
• Geno (143 SNPs x 336 inds)
PED00001->IND00001 ML_GENO T/T T/T G/G C/C T/T T/T A/T G/G A/A T/T T/C …
PED00002->IND00002 ML_GENO T/T T/T A/G T/C T/T T/T A/T G/G A/A T/T T/C …
PED00003->IND00003 ML_GENO T/T T/T A/A T/T T/T T/T A/T G/G A/A T/T T/T …
…
• Dose (allele dosage)
PED00001->IND00001 ML_DOSE 1.719 1.911 0.004 0.003 1.913 1.980 1.246 1.884 1.949 1.948 1.302 …
PED00002->IND00002 ML_DOSE 1.861 1.957 1.000 1.000 1.952 1.892 1.086 1.909 1.949 1.948 1.096 …
PED00003->IND00003 ML_DOSE 1.994 1.999 1.993 1.995 1.955 1.656 1.297 1.863 1.987 1.988 1.374 …
…

VarioWatch
Query examples:
rs5934
ADSSL1
APOE
chr8:19856718-19858718
chr14:104317518-104317518
chr11:234234+C
chr15:234343

Practice
• Find tag SNPs in a region of the human genome and download those SNPs from HapMap
• Submit SNPs or a gene symbol to VarioWatch and find the high-risk SNPs

Acknowledgement

Thanks for your attention!
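
Supplementary note on the LD measures named on the Haplotype / Tag SNP slide: D' and r^2 can be computed from one haplotype frequency and the two allele frequencies. The sketch below uses the standard textbook definitions, not any HapMap tool; the function name `ld_measures` and the example frequencies are hypothetical, and the LOD score is not covered.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic SNPs.

    p_ab : frequency of the haplotype carrying allele A at SNP1 and B at SNP2
    p_a  : frequency of allele A at SNP1
    p_b  : frequency of allele B at SNP2
    Both SNPs must be polymorphic (0 < p_a < 1 and 0 < p_b < 1).
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both SNPs must be polymorphic")

    # D: deviation of the observed haplotype frequency from the value
    # expected under random association of the two alleles.
    d = p_ab - p_a * p_b

    # D_max: largest |D| possible given the allele frequencies;
    # the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical frequencies: alleles A and B always travel together.
complete = ld_measures(p_ab=0.5, p_a=0.5, p_b=0.5)
# Hypothetical frequencies: alleles A and B associate at random.
independent = ld_measures(p_ab=0.25, p_a=0.5, p_b=0.5)

print(complete)
print(independent)
```

With the first set of frequencies both D' and r^2 reach 1 (complete LD); with the second, D is 0 and both measures are 0 (random association).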