Application Note

Re-Sequencing of Eukaryotic Model Organisms – The Easy Way

Introduction

Do you work with human samples or model organisms such as mice, rats, or C. elegans, and want to know the genetic variation in your specimens? Whole-genome re-sequencing is becoming easier and more affordable thanks to the newest sequencing technologies and the availability of well-curated reference genomes. This allows you to (i) understand how genetic differences affect health, (ii) perform population studies, (iii) carry out molecular breeding for functional gene or marker detection, and much more.

Microsynth offers a "one-stop service" for the re-sequencing of model organisms, from DNA isolation to data analysis, to detect the genome-wide variation in your specimens. Results are reported in user-friendly outputs and can be used directly in your research. Thanks to the flexible and scalable Illumina NextSeq and MiSeq platforms, Microsynth can offer its customers an optimized sequencing strategy for every study.

Table 1. Genome sizes and typical number of samples per organism that can be sequenced in a single run on the Illumina NextSeq or MiSeq platform. Sample numbers are calculated to achieve >30x coverage per sample, taking pooling variability into account. For larger studies that require two or more sequencing runs, the sequencing can be optimized further, resulting in even more samples per run.

Organism                   Genome Size   NextSeq (samples/run)   MiSeq (samples/run)
Pan troglodytes            3.3 Gb        1                       n/a
Homo sapiens               3.1 Gb        1                       n/a
Rattus norvegicus          2.9 Gb        1                       n/a
Sus scrofa                 2.8 Gb        1                       n/a
Mus musculus               2.7 Gb        1                       n/a
Bos taurus                 2.7 Gb        1                       n/a
Equus caballus             2.5 Gb        1                       n/a
Canis familiaris           2.4 Gb        1                       n/a
Danio rerio                1.4 Gb        1                       n/a
Gallus gallus              1.1 Gb        1                       n/a
Drosophila melanogaster    143 Mb        10                      2
Arabidopsis thaliana       119 Mb        12                      2
Caenorhabditis elegans     100 Mb        15                      2
Saccharomyces cerevisiae   12 Mb         125                     16
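The sample numbers in Table 1 rest on a simple budget: each sample needs roughly genome size × target coverage sequenced bases, plus some headroom because libraries are never perfectly balanced in a pool. The sketch below illustrates only that arithmetic. The usable run output and the 20% pooling margin are hypothetical placeholders, not Microsynth's actual parameters or Illumina platform specifications, so the function is not expected to reproduce Table 1 exactly.

```python
import math


def mean_coverage(sequenced_bases: float, genome_size_bp: float) -> float:
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    return sequenced_bases / genome_size_bp


def samples_per_run(genome_size_bp: float, run_output_bp: float,
                    target_coverage: float = 30.0,
                    pooling_margin: float = 0.2) -> int:
    """Maximum number of samples per run that still reach target_coverage.

    pooling_margin is a hypothetical headroom factor: each sample is budgeted
    (1 + pooling_margin) times its nominal share of bases, so that libraries
    that end up under-represented in the pool still reach the target depth.
    """
    bases_per_sample = genome_size_bp * target_coverage * (1.0 + pooling_margin)
    return math.floor(run_output_bp / bases_per_sample)


# Hypothetical example: a 100 Mb genome and an assumed 10 Gb of usable output.
print(samples_per_run(100e6, 10e9))  # 2
print(mean_coverage(3e9, 100e6))     # 30.0
```

Rounding down is deliberate: a fractional sample slot would push every sample in the pool below the coverage target.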
Microsynth Competences and Services

With more than 7 years of experience in the field of next-generation sequencing, Microsynth counts high-quality one-stop services among its core competences. Re-sequencing of model organisms covers the entire process, from experimental design and DNA isolation through Illumina sequencing to a detailed bioinformatics analysis (see also the workflow below).

Workflow
Project Input: Option I, samples (tissue, cells, blood, ...); Option II, isolated DNA
Samples (tissue, cells, blood, ...) → DNA Isolation → TruSeq Library Preparation → Illumina Sequencing → Bioinformatics Analysis → Report Generation
Project Output: see the following sections and pages

Microsynth AG | Schützenstrasse 15 | 9436 Balgach | Switzerland
Phone: +41 71 722 83 33 | E-mail: [email protected] | Web: www.microsynth.ch

Experimental Design: Microsynth's NGS specialists help you define a suitable experimental set-up for your re-sequencing project and discuss the sequencing strategies best suited to address your research question(s).

DNA Isolation: Customers can either provide isolated DNA or outsource this step to Microsynth. Microsynth has extensive experience in DNA/RNA isolation from a wide range of matrices, including demanding ones such as plant material and FFPE samples.

Library Preparation and Sequencing: Following a quality check of your samples, Microsynth constructs the Illumina libraries, including specific barcoded sequencing adaptors. Depending on the experimental design, the libraries are pooled and sequenced on either the Illumina MiSeq or the NextSeq platform. These flexible platforms allow the sequencing strategy to be optimized for the number of samples/specimens.

Bioinformatics Analysis: Sequenced reads are quality-filtered and mapped against the reference genome of the organism under study. Small insertions and deletions (small InDels) and substitutions (SNPs) are detected based on the consensus of three independent software packages.
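Consensus calling across independent tools can be pictured as a vote: a variant is kept only if enough callers report it. The text does not say whether all three packages must agree or whether a majority suffices, so the sketch below makes the threshold a parameter (`min_support`, a hypothetical name; default 2 of 3). It also assumes that every caller reports variants in the same normalized (chrom, pos, ref, alt) representation; in practice, InDels usually have to be left-aligned and normalized before calls from different tools can be matched.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """A normalized small variant (SNP or small InDel), as in a VCF record."""
    chrom: str
    pos: int
    ref: str
    alt: str


def consensus_calls(callsets: Iterable[set[Variant]],
                    min_support: int = 2) -> set[Variant]:
    """Keep variants reported by at least `min_support` of the callers."""
    votes: Counter[Variant] = Counter()
    for calls in callsets:
        votes.update(set(calls))  # each caller votes at most once per variant
    return {variant for variant, n in votes.items() if n >= min_support}


# Hypothetical call sets from three independent variant callers.
snp = Variant("chr1", 100, "A", "G")
indel = Variant("chr1", 200, "GT", "G")
other = Variant("chr2", 50, "C", "T")
calls = [{snp, indel}, {snp, other}, {snp}]
print(consensus_calls(calls))                       # {Variant(chrom='chr1', pos=100, ref='A', alt='G')}
print(len(consensus_calls(calls, min_support=1)))   # 3
```

Raising `min_support` to the number of callers turns the vote into a strict intersection, trading sensitivity for specificity.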
Besides useful summaries of the sequencing and mapping, the variant calling results are provided in VCF format. The variant calling step also produces a user-friendly graphic summarizing the major findings across all investigated samples (e.g. specimens). For variants that occur within protein-coding regions, the affected gene is annotated and the impact on the translated amino acid sequence is shown, so that each detected mutation can be classified (silent vs. missense vs. nonsense mutation). Structural variations are analyzed using the GASVPro package, which searches for discrepancies between the observed and expected alignments of the paired-end sequencing data. Copy-number variation is analyzed either de novo based on read depth or against a reference specimen sequenced in the study. In addition to the raw sequencing data and the analysis output, a user-friendly summary report is provided.

Examples of the Most Important Output Files Provided by Microsynth

Important note: every variant is handled separately and independently of other locations.
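The silent/missense/nonsense classification described above comes down to translating the reference and the altered codon and comparing the two amino acids, one variant at a time. The sketch below covers single-nucleotide substitutions only, using the standard genetic code (NCBI translation table 1). It assumes the codons have already been extracted from the transcript in coding orientation (i.e. reverse-complemented for anti-sense genes); the function name `classify_substitution` and the extra "stop-loss" label are illustrative additions, not part of Microsynth's pipeline.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# the bases in T, C, A, G order, first position varying slowest; "*" = stop.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a coding substitution by comparing the encoded amino acids."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "silent"      # synonymous: same amino acid (or stop -> stop)
    if alt_aa == "*":
        return "nonsense"    # a premature stop codon is introduced
    if ref_aa == "*":
        return "stop-loss"   # illustrative extra label: the stop codon is lost
    return "missense"        # a different amino acid is encoded


print(classify_substitution("GCT", "GCC"))  # silent   (Ala -> Ala)
print(classify_substitution("GCT", "GAT"))  # missense (Ala -> Asp)
print(classify_substitution("TGG", "TGA"))  # nonsense (Trp -> stop)
```

Small InDels need a different treatment (in-frame vs. frameshift), because they change the codon grid downstream of the variant rather than a single codon.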
[Figure 1: example HTML variant table. For coding variants, the figure also shows an alignment of the reference and altered protein sequences (Ref_Protein/Alt_Protein: Align_Seq1, Match_Line, Align_Seq2).]

Chr   Var Position  Reference  Alternative  Total Read Depth  AC  PV        Gene Name  Primary Tag  Start      End        Strand      Transcript ID  Mutation_Type
chr1  31905889      A          ACAG         16                8   1.54E-08  SERINC2    CDS          31905814   31906032   sense       NM_018565      Insertion (373;368)
chr1  37695988      GT         G            38                19  1.11E-16  NA         NA           NA         NA         NA          NA             NA
chr1  54605319      G          GC           26                13  4.33E-13  CDCP2      CDS          54605196   54605779   anti-sense  NM_201546      Missense (409;409)
chr1  74987170      CAT        C            24                11  9.42E-11  NA         NA           NA         NA         NA          NA             NA
chr1  109792735     A          ACGC         18                9   1.88E-09  CELSR2     CDS          109792702  109796011  sense       NM_001408      Insertion (17;13)
chr1  247978541     AAGG       A            31                15  1.24E-14  OR14A16    CDS          247978105  247979031  anti-sense  NM_001001966   Deletion (163;163)
chr1  248683277     C          CAT          28                13  1.47E-12  NA         NA           NA         NA         NA          NA             NA
Figure 1. Typical overview output file (HTML) resulting from the variant calling step for SNPs and small InDels. For each sample and chromosome/contig, SNPs and small InDels are reported separately, and their effects on all annotated features of the reference genome are shown. SNPs and small InDels that are not located in a coding region are included and marked as NA in the feature descriptions. Besides the HTML format, the data are also provided in VCF format. If you sequence several samples/specimens, a customized summary table covering the different samples/specimens can be compiled according to your specifications.

Cluster_ID  LeftChr  LeftBreakPoint           RightChr  RightBreakPoint          Num PRS  Localization  Type  LogLikelihoodRatio
c21         1        5807 ; 5882              2         566411 ; 566486          4        53            TN+1  8661.06
c110        1        10585 ; 10736            6         134263592 ; 134263000    8        106.8         TN-1  8702.01
c3435       2        6373851 ; 6374022        2         6373894 ; 6374065        4        120.9         D     8270.28
c13422      2        27874002 ; 27874100      2         27874065 ; 27874160      4        71.4          D     8406.58
c56623      2        93262144 ; 93262320      2         93262282 ; 93262630      9        183           IR    6852.01
c155520     2        121484319 ; 121484000    20        27738499 ; 27738710      21       147.4         TN+2  8824.48
c155531     2        121484815 ; 121485000    20        27733892 ; 27733930      4        88.2          TR+   6873.08
c157310     2        84517648 ; 84518010      24        76411624 ; 76411980      4        241.5         TN-2  8661.06
c178639     3        42661362 ; 42661670      3         42678752 ; 42679060      5        209.4         I+    8504.86
c210730     3        123476425 ; 123476000    3         123482348 ; 123482000    4        185.9         I-    8661.06
c633554     8        40879292 ; 40879460      8         40880378 ; 40880540      19       144.5         IR    6731.64
c747731     9        73023376 ; 73023710      9         73023612 ; 73023940      7        217.1         I-    8653.2
c765683     9        103435398 ; 103435000    9         103435923 ; 103436000    5        127.9         I+    8398.63
c819748     10       80552030 ; 80552050      10        80552050 ; 80552070      4        20.5          D     8168.84
c1252383    18       28381258 ; 28381360      18        28381407 ; 28381510      4        78.5          D     8280.61

Figure 2. Typical output of the analysis for structural variants, given for each sample/specimen included in a study, using the GASVPro software. The location of each structural variation is given by the chromosome(s) (LeftChr and RightChr) and the approximate break points on the chromosome(s) (an interval of coordinates estimating the left and right break point), together with the number of paired-end reads supporting the structural variation (Num PRS). The type of structural variation follows the GASVPro definitions (D = Deletion, IR = Reciprocal Inversion, I+/I- = Inversion, TR = Reciprocal Translocation, TN = Non-Reciprocal Translocation).

[Figure 3: panels A) and B); panel A plots read depth against position on chromosome 20 (×10³)]

Figure 3. Example results of the copy-number variation analyses. Copy-number variation can be analyzed either de novo based on read depth (Fig. 3A) or against a reference included in the study (Fig. 3B).

Further Readings
• Langmead B, Salzberg SL: Fast gapped-read alignment with Bowtie 2. Nat Methods 2012, 9(4): 357-359.
• Wei Z, Wang W, Hu P, Lyon GJ, Hakonarson H: SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data. Nucleic Acids Research 2011, 39(19): e132.
• Sindi S, Helman E, Bashir A, Raphael B: A geometric approach for classification and comparison of structural variants. Bioinformatics 2009, 25: i222-i230.
• Abyzov A, Urban AE, Snyder M, Gerstein M: CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Research 2011, 21(6): 974-984.
• Xie C, Tammi M: CNV-seq, a new method to detect copy number variation using high-throughput sequencing. BMC Bioinformatics 2009, 10(1): 80.

Version: March 2015