Re-Sequencing of Eukaryotic Model Organism – The Easy Way

Application Note
Introduction
Do you work with human samples or model organisms such as mouse, rat, or C. elegans, and want to know the genetic variation in your specimens? Whole genome re-sequencing is becoming easier and more affordable thanks to the newest sequencing technologies and the availability of well-curated reference genomes. This allows you to (i) understand how genetic differences affect health, (ii) perform population studies, (iii) carry out molecular breeding for functional gene or marker detection, and much more. Microsynth offers a “one-stop service” for the re-sequencing of model organisms, from DNA isolation to data analysis, to detect the genome-wide variation in your specimens. Results are reported in user-friendly outputs and can be used directly in your research. Thanks to the flexible and scalable Illumina NextSeq and MiSeq platforms, Microsynth can offer its customers an optimized sequencing strategy for every study.

Organism                    Genome Size   Illumina NextSeq   Illumina MiSeq
Pan troglodytes             3.3 Gb        1                  n/a
Homo sapiens                3.1 Gb        1                  n/a
Rattus norvegicus           2.9 Gb        1                  n/a
Sus scrofa                  2.8 Gb        1                  n/a
Mus musculus                2.7 Gb        1                  n/a
Bos taurus                  2.7 Gb        1                  n/a
Equus caballus              2.5 Gb        1                  n/a
Canis familiaris            2.4 Gb        1                  n/a
Danio rerio                 1.4 Gb        1                  n/a
Gallus gallus               1.1 Gb        1                  n/a
Drosophila melanogaster     143 Mb        10                 2
Arabidopsis thaliana        119 Mb        12                 2
Caenorhabditis elegans      100 Mb        15                 2
Saccharomyces cerevisiae    12 Mb         125                16
Table 1. Genome sizes and typical number of samples per organism that can be sequenced in a single run on the Illumina NextSeq or MiSeq platform. Sample numbers are calculated to achieve >30x coverage per sample, taking into account pooling variability. For larger studies where two or more sequencing runs are required, the sequencing can be optimized further, resulting in even more samples per run.
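The relationship between genome size, target coverage and samples per run can be sketched as follows. This is only a rough illustration: the run yield of 120 Gb and the 20% pooling margin are hypothetical values chosen for the example, not Microsynth specifications, so the results are not expected to reproduce the numbers in Table 1.

```python
import math


def samples_per_run(run_yield_gb, genome_size_gb, coverage=30.0, pooling_margin=0.2):
    """Estimate how many samples fit into one run at the target coverage.

    run_yield_gb   -- total bases produced by one run, in Gb (assumption)
    genome_size_gb -- genome size of the organism, in Gb
    coverage       -- target mean coverage per sample (30x as in Table 1)
    pooling_margin -- fraction of the yield held back to absorb uneven
                      pooling (hypothetical value, not from the source)
    """
    if not 0.0 <= pooling_margin < 1.0:
        raise ValueError("pooling_margin must be in [0, 1)")
    needed_per_sample_gb = genome_size_gb * coverage
    usable_yield_gb = run_yield_gb * (1.0 - pooling_margin)
    return math.floor(usable_yield_gb / needed_per_sample_gb)


# Hypothetical run yield of 120 Gb, used only for illustration.
for name, size_gb in [("Homo sapiens", 3.1), ("Saccharomyces cerevisiae", 0.012)]:
    print(name, samples_per_run(120.0, size_gb))
```

The margin parameter stands in for the "pooling variability" mentioned in the table caption; in a real project the values would come from the platform specifications and the experimental design.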
Microsynth Competences and Services
With more than 7 years of experience in the field of next-generation sequencing, one of Microsynth’s core competences is to provide high-quality one-stop services. Re-sequencing of model organisms covers the entire process from experimental design planning, DNA isolation, and Illumina sequencing through to a detailed bioinformatics analysis (see also the workflow below).
Workflow:
Project Input: Option I: Samples (tissue, cells, blood, ...) or Option II: Isolated DNA
Steps: Samples > DNA Isolation > TruSeq Library Preparation > Illumina Sequencing > Bioinformatics Analysis > Report Generation
Project Output: see following sections & pages

Experimental Design: Microsynth’s NGS specialists help you to define a suitable experimental set-up for your re-sequencing project and discuss possible sequencing strategies best suited to address your research question(s).

Microsynth AG | Schützenstrasse 15 | 9436 Balgach | Switzerland
Phone: +41 71 722 83 33 | E-mail: [email protected] | Web: www.microsynth.ch
DNA Isolation: The customer either provides isolated DNA or outsources this step to Microsynth. Microsynth has extensive experience in DNA/RNA isolation from a wide range of matrices, including demanding ones such as plant material, FFPE samples, etc.
Library Preparation and Sequencing: Following a quality check of your samples, Microsynth will construct the Illumina library, including specific sequencing adaptors with barcodes. Depending on the experimental design, the libraries are pooled and sequenced on either the Illumina MiSeq or NextSeq platform. These flexible sequencing platforms allow the sequencing strategy to be optimized depending on the number of samples/specimens.
Bioinformatics Analysis: Sequenced reads are quality-filtered and mapped against the reference genome of the organism under study. Possible small insertions and deletions (small InDels) and substitutions (SNPs) are detected based on the consensus of three independent software packages. Besides useful summaries of the sequencing and mapping, the variant calling results are provided in VCF format.
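The consensus idea described above can be illustrated with a minimal sketch. The text does not state whether all three packages must agree or whether a majority is sufficient, so the `min_support` parameter below is an assumption, as are the function name and the toy call sets (the assignment of variants to callers is invented for illustration).

```python
from collections import Counter


def consensus_calls(call_sets, min_support=None):
    """Return the variants reported by at least `min_support` callers.

    call_sets   -- one collection of (chrom, pos, ref, alt) tuples per caller
    min_support -- required number of agreeing callers; defaults to all
                   callers (unanimous agreement), which is an assumption
    """
    if min_support is None:
        min_support = len(call_sets)
    counts = Counter()
    for calls in call_sets:
        counts.update(set(calls))  # count each variant once per caller
    return sorted(v for v, n in counts.items() if n >= min_support)


# Toy call sets for three hypothetical callers.
caller_a = {("chr1", 31905889, "A", "ACAG"), ("chr1", 37695988, "GT", "G")}
caller_b = {("chr1", 31905889, "A", "ACAG"), ("chr1", 37695988, "GT", "G")}
caller_c = {("chr1", 31905889, "A", "ACAG"), ("chr1", 74987170, "CAT", "C")}

print(consensus_calls([caller_a, caller_b, caller_c]))
print(consensus_calls([caller_a, caller_b, caller_c], min_support=2))
```

Requiring unanimous agreement favours precision, while a two-out-of-three rule keeps more candidate variants; which rule is used in the actual pipeline is not specified here.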
The variant calling step results in a user-friendly graphic summarizing the major findings for all investigated samples (e.g. specimens). For variants that occur within protein-coding regions, the affected gene is annotated and the impact on the translated amino acid sequence is shown. As a consequence, each detected mutation can be classified (silent vs. missense vs. nonsense mutation).
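The distinction between silent, missense and nonsense mutations can be sketched for a single-codon substitution using the standard genetic code. This is a simplified illustration, not the annotation software used in the service; the function name is hypothetical and frameshifts or stop-loss changes are not handled.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a codon change as silent, missense or nonsense.

    Simplification: a change that removes a stop codon is reported as
    "missense" here, because only the three classes named in the text
    are distinguished.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


print(classify_substitution("GAA", "GAG"))  # Glu -> Glu
print(classify_substitution("GAA", "GTA"))  # Glu -> Val
print(classify_substitution("TGG", "TGA"))  # Trp -> stop
```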
Structural variations are analyzed using the GASVPro package, which searches for discrepancies between the observed and expected alignments of the paired-end sequencing data. Copy-number variation is analyzed either de novo based on read depth or against a reference specimen sequenced in the study. Besides the raw sequencing data and the output of the analysis, a user-friendly summary report is provided.
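The comparison against a reference specimen can be pictured as a per-window ratio of read depths. The sketch below is a generic illustration of that idea under simple assumptions (fixed windows, normalization by total depth, a pseudocount of 1); it is not the algorithm of any specific package, and all names are hypothetical.

```python
import math


def log2_depth_ratios(sample_depths, reference_depths, pseudocount=1.0):
    """Per-window log2 ratio of sample depth to reference depth.

    Both lists hold read depths for the same genomic windows. The sample
    is rescaled so that both specimens have the same total depth, which
    removes differences in sequencing yield (a simplifying assumption).
    """
    if len(sample_depths) != len(reference_depths):
        raise ValueError("both specimens need the same number of windows")
    sample_total = sum(sample_depths)
    if sample_total <= 0:
        raise ValueError("sample has no reads")
    scale = sum(reference_depths) / sample_total
    return [
        math.log2((s * scale + pseudocount) / (r + pseudocount))
        for s, r in zip(sample_depths, reference_depths)
    ]


# Toy depths for four windows; the third window is duplicated in the sample.
reference = [100, 100, 100, 100]
sample = [100, 100, 200, 100]
print([round(x, 2) for x in log2_depth_ratios(sample, reference)])
```

Windows with a clearly positive ratio suggest a gain and clearly negative ratios a loss relative to the reference specimen; real tools add statistical testing and segmentation on top of such ratios.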
Examples of the Most Important Output Files Provided by Microsynth
Important note: every variant is handled separately and independently of other locations.
ID    Primary Gene Name  Primary Tag  Start      End        Strand      Transcript ID  Var Position  Reference  Alternative  Total Read Depth  AC  PV
chr1  SERINC2            CDS          31905814   31906032   sense       NM_018565      31905889      A          ACAG         16                8   1.54E-08
chr1  NA                 NA           NA         NA         NA          NA             37695988      GT         G            38                19  1.11E-16
chr1  CDCP2              CDS          54605196   54605779   anti-sense  NM_201546      54605319      G          GC           26                13  4.33E-13
chr1  NA                 NA           NA         NA         NA          NA             74987170      CAT        C            24                11  9.42E-11
chr1  CELSR2             CDS          109792702  109796011  sense       NM_001408      109792735     A          ACGC         18                9   1.88E-09
chr1  OR14A16            CDS          247978105  247979031  anti-sense  NM_001001966   247978541     AAGG       A            31                15  1.24E-14
chr1  NA                 NA           NA         NA         NA          NA             248683277     C          CAT          28                13  1.47E-12

Protein-level columns (Align_Seq1, Match_Line, Align_Seq2, Mutation_Type, Ref_Protein/Alt_Protein) for the variants in coding regions; these columns are NA for all other variants, and the sequences are truncated in the figure:

SERINC2, Mutation_Type: Insertion (373;368)
Align_Seq1:  MGAEGAPDFLSCPRVRRASCLCGSAPCILCSCCPASRNSTVSRLIFTFF
Match_Line:  |||||||||||||||||||||||||||||||||||||||||||||||||||||||
Align_Seq2:  MGAEGAPDFLSCPRVRRASCLCGSAPCILCSCCPASRNSTVSRLIFTFF
Ref_Protein: MGAEGAPDFLSCPRVRRASCLCGSAPCILCSCCPASRNSTVSRLIFTFFLFLGVLVSIIMLSPGVESQLYKLPWVC
Alt_Protein: MGAEGAPDFLSCPRVRRASCLCGSAPCILCSCCPASRNSTVSRLIFTFFLFLGVLVSIIMLSPGVESQLYKLPWVC

CDCP2, Mutation_Type: Missense (409;409)
Align_Seq1:  MLAEWGACLLLAVALLGPGLQAQAMEGVKCGGVLSAPSGNFSSPN
Match_Line:  |||||||||||||||||||||||||||||||||||||||||||||||||||||||
Align_Seq2:  MLAEWGACLLLAVALLGPGLQAQAMEGVKCGGVLSAPSGNFSSPN
Ref_Protein: MLAEWGACLLLAVALLGPGLQAQAMEGVKCGGVLSAPSGNFSSPNFPRLYPYNTECSWLIVVAEGSSVLLTFH
Alt_Protein: MLAEWGACLLLAVALLGPGLQAQAMEGVKCGGVLSAPSGNFSSPNFPRLYPYNTECSWLIVVAEGSSVLLTFH

CELSR2, Mutation_Type: Insertion (17;13)
Align_Seq1:  MRSPATGVPLPT-PPPPLLLLLLLLLPPPLLGDQVGPCRSLGSRGRGSSG
Match_Line:  |||||||||||| |||||||||||||||||||||||||||||||||||||||||||
Align_Seq2:  MRSPATGVPLPTPPPPPLLLLLLLLLPPPLLGDQVGPCRSLGSRGRGSS
Ref_Protein: MRSPATGVPLPTPPPPLLLLLLLLLPPPLLGDQVGPCRSLGSRGRGSSGACAPMGWLCPSSASNLWLYTSRCRD
Alt_Protein: MRSPATGVPLPTPPPPPLLLLLLLLLPPPLLGDQVGPCRSLGSRGRGSSGACAPMGWLCPSSASNLWLYTSRCR

OR14A16, Mutation_Type: Deletion (163;163)
Align_Seq1:  MANLTIVTEFILMGFSTNKNMCILHSILFLLIYLCALMGNVLIIMITTLD
Match_Line:  |||||||||||||||||||||||||||||||||||||||||||||||||||||||
Align_Seq2:  MANLTIVTEFILMGFSTNKNMCILHSILFLLIYLCALMGNVLIIMITTLD
Ref_Protein: MANLTIVTEFILMGFSTNKNMCILHSILFLLIYLCALMGNVLIIMITTLDHHLHTPVYFFLKNLSFLDLCLISVTAPK
Alt_Protein: MANLTIVTEFILMGFSTNKNMCILHSILFLLIYLCALMGNVLIIMITTLDHHLHTPVYFFLKNLSFLDLCLISVTAPK
Figure 1. Typical output overview file (HTML) resulting from the variant calling step for SNPs and small InDels. For each sample and chromosome/contig, SNPs and small InDels are reported separately, and their effect on all annotated features of the reference genome is shown. SNPs and small InDels that are not located in a coding region are included, with NA in the feature descriptions. Besides the HTML format, the data are also provided in VCF format. If you sequence several samples/specimens, a customized summary table covering the different samples/specimens can be compiled according to your specifications.
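The Reference and Alternative columns of Figure 1 are enough to tell SNPs from small insertions and deletions. A minimal sketch, assuming VCF-style alleles with a shared leading base and a hypothetical function name:

```python
def variant_kind(ref, alt):
    """Classify a variant from its reference and alternative alleles."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "substitution"  # equal-length change of more than one base


# Allele pairs taken from the Reference/Alternative columns of Figure 1.
for ref, alt in [("A", "ACAG"), ("GT", "G"), ("CAT", "C"), ("C", "CAT")]:
    print(ref, alt, variant_kind(ref, alt))
```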
#Cluster_ID  LeftChr  LeftBreakPoint         RightChr  RightBreakPoint        Num PRS  Localization  Type  LogLikelihoodRatio
c21          1        5807 ; 5882            2         566411 ; 566486        4        53            TN+1  8661.06
c110         1        10585 ; 10736          6         134263592 ; 134263000  8        106.8         TN-1  8702.01
c3435        2        6373851 ; 6374022      2         6373894 ; 6374065      4        120.9         D     8270.28
c13422       2        27874002 ; 27874100    2         27874065 ; 27874160    4        71.4          D     8406.58
c56623       2        93262144 ; 93262320    2         93262282 ; 93262630    9        183           IR    6852.01
c155520      2        121484319 ; 121484000  20        27738499 ; 27738710    21       147.4         TN+2  8824.48
c155531      2        121484815 ; 121485000  20        27733892 ; 27733930    4        88.2          TR+   6873.08
c157310      2        84517648 ; 84518010    24        76411624 ; 76411980    4        241.5         TN-2  8661.06
c178639      3        42661362 ; 42661670    3         42678752 ; 42679060    5        209.4         I+    8504.86
c210730      3        123476425 ; 123476000  3         123482348 ; 123482000  4        185.9         I-    8661.06
c633554      8        40879292 ; 40879460    8         40880378 ; 40880540    19       144.5         IR    6731.64
c747731      9        73023376 ; 73023710    9         73023612 ; 73023940    7        217.1         I-    8653.2
c765683      9        103435398 ; 103435000  9         103435923 ; 103436000  5        127.9         I+    8398.63
c819748      10       80552030 ; 80552050    10        80552050 ; 80552070    4        20.5          D     8168.84
c1252383     18       28381258 ; 28381360    18        28381407 ; 28381510    4        78.5          D     8280.61

Figure 2. Typical output of the analysis for structural variants, given for each sample/specimen included in a study, using the GASVPro software. The location of the structural variation is given (LeftChr and RightChr), the approximate break points on the chromosome(s) are indicated by boundary points (intervals of coordinates estimating the left and right break points), and the number of paired-end reads supporting the structural variation is given (Num PRS). The type of the structural variation is indicated following the GASVPro definition (D = Deletion, IR = Reciprocal Inversion, I+/I- = Inversion, TR = Reciprocal Translocation, TN = Non-Reciprocal Translocation).
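When working with such a table, the type codes can be expanded with the definitions from the caption and the clusters filtered by the number of supporting read pairs. The sketch below is illustrative only: the function names and the threshold of 5 supporting pairs are hypothetical, and the suffixes after the letters (such as "+1" or "-2") are simply ignored.

```python
# Type definitions as listed in the Figure 2 caption.
SV_TYPES = {
    "D": "Deletion",
    "IR": "Reciprocal Inversion",
    "I": "Inversion",
    "TR": "Reciprocal Translocation",
    "TN": "Non-Reciprocal Translocation",
}


def describe_sv_type(code):
    """Expand a type code such as 'TN+1' or 'I-' using its letters only."""
    letters = "".join(ch for ch in code if ch.isalpha()).upper()
    return SV_TYPES.get(letters, "Unknown")


def filter_by_support(rows, min_num_prs):
    """Keep rows (cluster_id, type_code, num_prs) with enough support."""
    return [row for row in rows if row[2] >= min_num_prs]


# Three rows taken from Figure 2: cluster ID, Type, Num PRS.
rows = [("c21", "TN+1", 4), ("c56623", "IR", 9), ("c155520", "TN+2", 21)]
for cluster_id, type_code, num_prs in filter_by_support(rows, min_num_prs=5):
    print(cluster_id, describe_sv_type(type_code), num_prs)
```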
[Figure 3, panels A) and B): read depth (y-axis, 0 to 180) plotted against position on chromosome 20 (x-axis, 1480 to 1680, ×10^3).]
Figure 3. Example results of the copy-number variation analyses. Copy-number variation can be analyzed either de novo based on read depth (Fig. 3A) or against a reference included in the study (Fig. 3B).
Further Readings
• Langmead B, Salzberg SL: Fast gapped-read alignment with Bowtie 2. Nat Meth, 9(4): 357-359.
• Wei Z, Wang W, Hu P, Lyon GJ, Hakonarson H: SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data. Nucleic Acids Research, 39(19): e132.
• Sindi S, Helman E, Bashir A, Raphael B: A geometric approach for classification and comparison of structural variants. Bioinformatics 2009, 25: i222-230.
• Abyzov A, Urban AE, Snyder M, Gerstein M: CNVnator: An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Research, 21(6): 974-984.
• Xie C, Tammi M: CNV-seq, a new method to detect copy number variation using high-throughput sequencing. BMC Bioinformatics 2009, 10(1): 80.
Version : March 2015