DE-NOVO SEQUENCING OF GENOMES USING UP TO FIVE DIFFERENT TYPES OF GENOMIC DNA LIBRARIES

Dr. Georg Gradl, Dr. Axel Strittmatter, Dr. Sascha Glinka, Eurofins Genomics, Anzinger Str. 7a, 85560 Ebersberg, Germany

ABSTRACT

This application note describes our strategy for de-novo sequencing of prokaryotic, fungal and diploid higher eukaryotic genomes of any size. Routinely, up to 5 different non-cloned genomic DNA libraries are prepared for sequencing with Roche GS FLX Titanium series chemistry using massively parallel sequencing technology. The libraries are a combination of one Shotgun library (SG) and various Long Paired End libraries (LPE) with different jumping distances (3 kb, 8 kb, 20 kb and up to 40 kb). After sequencing, the resulting reads, which provide not only the genetic sequence but also the necessary scaffolding information, are assembled into large contigs (> 1 kb). In a final step, remaining gaps are closed by generating PCR fragments, sequencing them on both strands by Sanger sequencing, and assembling the results.

INTRODUCTION

Next Generation Sequencing technology enables the de-novo sequencing of any kind of genome in a fast and efficient way. To achieve the optimum result in a reasonable time and at an affordable cost, we have developed a strategy that combines sequencing and scaffolding of up to 5 different types of libraries. This setup allows us to directly span repetitive genomic regions of up to 40 kb. As a result, the assembly not only orders the hundreds or thousands of contigs, but also drastically cuts down gap-closing and manual-editing time. After assembly, the scaffolding information makes it possible to automatically design PCR primer pairs for adjacent contig ends, perform PCR on the genomic DNA and sequence the PCR fragments by Sanger technology in 96-well format. The resulting sequence reads are incorporated into the original assembly, thereby closing the gaps.
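The automatic primer design for adjacent contig ends can be illustrated with a minimal sketch. This is not the pipeline used by Eurofins Genomics; it assumes the two contigs are already ordered and oriented within a scaffold, and the search window, primer length and GC range are hypothetical defaults. The forward primer is taken from the end of the left contig, the reverse primer from the reverse complement of the start of the right contig, and in each case the acceptable candidate closest to the gap is kept to keep the PCR product short.

```python
from typing import Optional, Tuple

# Complement table for A/C/G/T and N (unknown base).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def pick_primer(region: str, length: int, gc_min: float, gc_max: float) -> Optional[str]:
    """Return the acceptable primer closest to the 3' end of region, or None.

    The 3' end of region is assumed to face the gap, so scanning from the
    end inward favours the shortest PCR product.
    """
    for end in range(len(region), length - 1, -1):
        candidate = region[end - length:end]
        if "N" not in candidate and gc_min <= gc_fraction(candidate) <= gc_max:
            return candidate
    return None


def design_primer_pair(
    left_contig: str,
    right_contig: str,
    window: int = 200,  # hypothetical: search only the terminal 200 bases
    length: int = 20,  # hypothetical primer length
    gc_min: float = 0.4,
    gc_max: float = 0.6,
) -> Optional[Tuple[str, str]]:
    """Propose a forward/reverse primer pair flanking the gap between two
    adjacent contigs that are already ordered and oriented in a scaffold."""
    # Forward primer: taken as-is from the end of the left contig, pointing into the gap.
    forward = pick_primer(left_contig[-window:], length, gc_min, gc_max)
    # Reverse primer: taken from the start of the right contig on the opposite
    # strand; after reverse-complementing, the gap side is the 3' end.
    reverse = pick_primer(reverse_complement(right_contig[:window]), length, gc_min, gc_max)
    if forward is None or reverse is None:
        return None
    return forward, reverse
```

A production primer design step would additionally check melting temperature, self-complementarity and uniqueness of the primer in the genome, none of which this sketch attempts.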
With this strategy it is also possible to verify doubtful sequences, mainly in regions of low coverage or in regions of suspected misassembly.

MATERIAL AND METHODS

DNA Preparation

High molecular weight DNA should be prepared with one of the various commercially available kits or with the customer’s in-house laboratory protocols. If the DNA was prepared with phenol and/or chloroform, we include an additional purification step to avoid loss of enzymatic activity during the subsequent library preparation steps.

Library Preparation

All libraries for this sequencing strategy are prepared according to the recommendations and protocols of Roche/454.

Shotgun libraries (SG)

High molecular weight genomic DNA is fragmented by nebulization using the Roche/454 GS Rapid Library Prep Kit and the nebulizers supplied with the kit. Further library preparation is performed according to the “GS FLX Titanium Rapid Library Preparation Method Manual”.

Long Paired End libraries (LPE)

Genomic DNA is sheared to the appropriate fragment sizes (3 kb, 8 kb, 20 kb and up to 40 kb) using the HydroShear™ DNA Shearing Device (GeneMachine). Further library preparation is performed according to the “GS FLX Titanium Paired End Library Prep 3kb Span Method Manual” and the “GS FLX Titanium Paired End Library Prep 20+8kb Span Method Manual” (see Fig.1).

eurofinsgenomics.com

Fig.1 Preparation of a Long Paired End Library (LPE) with 3, 8, 20 or 40 kb spanning distance

DNA Sequencing

Sequencing is performed on Roche/454 GS FLX systems equipped with the on-instrument Data Analysis Software Modules v2.3.

Assembly and Scaffolding

After sequencing, data assembly and scaffolding are performed with appropriate hardware and adapted software solutions. Depending on the amount of data produced, we use either the current version of the Roche Genome Assembler (formerly known as “Newbler”), the Celera Assembler version 6.1 or MIRA version 3.2.
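The principle by which long paired ends turn contigs into scaffolds can be sketched as follows: a read pair whose mates land on two different contigs both links the contigs and, given the library's jumping distance, estimates the gap between them. This illustrates the general idea only, not the algorithm of Newbler, the Celera Assembler or MIRA. It assumes the contigs are already ordered and oriented (A before B), that the mates have been mapped with the first on A and the second on B, and that the nominal jumping distance (e.g. 8,000 bp for an LPE8kb library) is the expected fragment span. `LinkingPair` and `estimate_gap` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class LinkingPair:
    """A read pair whose mates map to two adjacent contigs A -> B.

    a_start: 0-based position on contig A where the first mate starts.
    b_end:   0-based, exclusive position on contig B where the second mate ends.
    """
    a_start: int
    b_end: int


def estimate_gap(len_a: int, pairs: List[LinkingPair], insert_size: int) -> float:
    """Estimate the gap between contig A and contig B.

    For each pair, the nominal library span equals the part of the
    fragment on A, plus the gap, plus the part on B:
        insert_size = (len_a - a_start) + gap + b_end
    The median over all pairs damps the spread of LPE fragment sizes.
    A negative result suggests that the contig ends overlap.
    """
    if not pairs:
        raise ValueError("at least one linking pair is required")
    estimates = [insert_size - (len_a - p.a_start) - p.b_end for p in pairs]
    return median(estimates)
```

For example, three 8 kb pairs linking a 100 kb contig to its neighbour give per-pair estimates of 500, 2,400 and 300 bp; the median of 500 bp is less sensitive to the single outlier than the mean would be.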
All sequencing data are processed on one of several multiprocessor computer systems, ranging from 8 cores with 32 GB to 32 cores with 1024 GB of RAM (see Fig.2 and Fig.3).

Strategy depends on genome size

With the library portfolio described above, Eurofins Genomics can readily address large diploid eukaryotic genomes, e.g. complex higher plants, but also fungi, algae or insects. For all of these complex and large genomes, we routinely use our 5-library strategy, which also includes preparing several copies of each type of library. This avoids bias in the chromosomal fragments used for each library, so that the entire genetic information is covered.

Fig.2 Assembly and scaffolding of a combination of shotgun (SG) and long paired end (LPE) libraries. Gap closing made easy!

For medium-sized genomes such as fungal genomes of 40 Mb, we recommend a 2–3 library setup (shotgun library plus 8 kb library, optionally with a 3 kb or 20 kb library). For smaller fungal genomes or large bacterial genomes, a 2-library setup (shotgun and 8 kb) is likely to be sufficient. Small or non-complex bacterial genomes (excluding some extremophiles) are sequenced with a 1-library strategy, i.e. a single long paired end library with an 8 kb jumping distance.

RESULTS AND DISCUSSION

Depending on the complexity and size of the genome, we select the appropriate strategy for our sequencing approach.

Example 1: Sequencing of an unknown fungal species (11.3 Mb) with the 4-library protocol (shotgun, LPE3kb, LPE8kb, LPE20kb). Each library was sequenced on a quarter of a PicoTiterPlate on the Roche GS FLX sequencer, resulting in 992,611 reads with a very even distribution of reads per library (SG 25.6%, LPE3kb 24.3%, LPE8kb 26.8%, LPE20kb 23.3%). The draft assembly and scaffolding resulted in 11 large scaffolds (see Fig.4). The largest scaffold was 3,020,278 base pairs long, and manual inspection showed that two chromosomes were linked in it.
After resolving this linkage, we obtained 12 large scaffolds representing the 11 chromosomes and a plasmid. The complete mitochondrial sequence was not correctly assembled under the selected conditions; however, its correct structure could be resolved during the manual editing phase.

Fig.3 Screenshot of a contig co-assembled from SG, LPE3kb, LPE8kb and LPE20kb reads. Contig visualised with GAP5 (Staden Package version 2.0).

Fig.4 First draft assembly and scaffold of the yeast genome. Scaffold 1 could be resolved into 2 chromosomes that had been erroneously linked together.

Example 2: Sequencing of a highly GC-rich and repetitive Streptomyces spec. with 4 libraries (shotgun, LPE3kb, LPE8kb, LPE20kb). Each library was sequenced on a quarter of a PicoTiterPlate on the Roche GS FLX sequencer, resulting in approx. 250,000 reads per library with a very even distribution of reads (SG 28.5%, LPE3kb 23.7%, LPE8kb 23.8%, LPE20kb 24.0%). The sequencing revealed a genome size of 8.8 Mb. Assembly and scaffolding resulted in 118 large contigs in a single scaffold of 8,760,064 bases with a total gap length of 31 kb.

Example 3: Sequencing of a 3.1 Mb beta-proteobacterium. Cost-efficient sequencing was carried out with a single LPE8kb library only. Owing to the way the library is produced and sequenced, an LPE library delivers 3 types of sequences: true paired ends, pairs without a partner, and shotgun-like reads (see Fig.5). These reads can be assembled into contigs and ordered into scaffolds at the same time (see Fig.6 and Fig.7). The sequencing resulted in 21 large contigs, 18 of which were placed in 5 scaffolds. The largest contig was 650 kb long. This allowed straightforward gap closing.

Example 4: For numerous other species, sequencing with the LPE8kb strategy results, in most cases, in a manageable number of contigs and scaffolds. The outcome depends strongly on the complexity and GC content of the genome.
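The three read types delivered by an LPE library can be illustrated with a simple classifier that looks for the linker inside each read. This is a minimal sketch, assuming an exact-match search; the `LINKER` string below is a hypothetical placeholder, not the actual Roche/454 Titanium adapter, and the 20-base minimum tag length is an arbitrary choice. Production tools also tolerate sequencing errors within the linker, which this sketch does not.

```python
from typing import Optional, Tuple

# Hypothetical placeholder linker; substitute the adapter of the kit in use.
LINKER = "GTTGGAACCGAAAGG"


def classify_lpe_read(
    read: str, linker: str = LINKER, min_tag: int = 20
) -> Tuple[str, Optional[str], Optional[str]]:
    """Classify a read from an LPE library as one of three types.

    Returns (read_type, tag_1, tag_2):
      - "true pair": linker found with enough sequence on both sides,
        so both ends of the original long fragment can be placed;
      - "pair without partner": linker found near one end, so only the
        longer flank is usable as a single tag;
      - "shotgun-like": no linker, the read behaves like a shotgun read.
    """
    pos = read.find(linker)
    if pos == -1:
        return ("shotgun-like", read, None)
    left = read[:pos]
    right = read[pos + len(linker):]
    if len(left) >= min_tag and len(right) >= min_tag:
        return ("true pair", left, right)
    # The linker sits off-centre: keep only the longer flank.
    longer = left if len(left) >= len(right) else right
    return ("pair without partner", longer, None)
```

Only the "true pair" reads carry the long-range link used for scaffolding; the other two types still contribute sequence to the contigs, which is why a single LPE8kb library can yield both contigs and scaffolds.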
If the results of the LPE8kb-only approach are not sufficient, adding further shotgun sequencing immediately improves the quality (see Table 1).

Fig.5 Because the adapter is not always in the middle of the fragment, an LPE library delivers 3 types of sequences.

Fig.6 Sequencing and scaffolding with one LPE8kb library

Fig.7 A typical result for an LPE-only approach

Organism               Genome Size   Coverage (x)   Large Contigs   Scaffolds
Cyanobacterium spec.   8.6 Mb        28             198             24
Helicobacter spec.     2.1 Mb        27             11              9
Clostridium spec.      3.9 Mb        35             98              23
Clostridium spec.      3.9 Mb        44             91              27
E. coli spec.          4.5 Mb        28             18              11
Lactobacillus spec.    3.1 Mb        45             14              7
Lactobacillus spec.    2.4 Mb        37             24              22
Fungal Genome          42.9 Mb       20             231             32

Table 1 Examples of genomes that have been sequenced with one LPE8kb library only

OUTLOOK

We are currently adapting all of the protocols described above for use with Illumina sequencing technology. After the final evaluation steps, the described strategy will enable de-novo sequencing of genomes on the Illumina HiSeq 2000 sequencer, which will also deliver highly accurate and correctly assembled data sets. This will in turn further decrease the price per sample, as the data output of the Illumina HiSeq 2000 makes it possible to sequence several samples in parallel. New sequencing targets, e.g. complex communities, might also be addressed in the near future.

CONCLUSION

It is possible and affordable to routinely sequence and assemble haploid or diploid genomes of any size with the technology described above. Further improvements will open the way to new applications and to a new sequencing platform.

CONTACT

Global Sales Manager Next Generation Sequencing
Dr. Georg Gradl
Tel. +49 8092 8289-945

European Sales Manager Next Generation Sequencing
Dr. Axel Strittmatter
Tel. +49 8092 8289-972

GS FLX is a trademark of Roche. Illumina and HiSeq are trademarks of Illumina, Inc.