
Complete mitochondrial genome NGS investigation on Ion Torrent PGM™
platform and population genetic studies in Northern Han Chinese
Yishu Zhou¹, Jiao Yu², Fei Guo³, Jinling Zhao², Feng Liu², Hongying Shen², Bin Zhao², Fei Jia², Zhu Sun², He Song¹, Xianhua Jiang¹,²*
1 China Medical University School of Forensic Medicine, No. 77, Puhe Road, Shenyang North New Area, Shenyang, Liaoning 110122, P.R. China
2 Criminal Science and Technology Institute of Liaoning Province, No. 2, Qishan Middle Road, Huanggu District, Shenyang, Liaoning 110032, P.R. China
3 Department of Forensic Medicine, National Police University of China, No. 83, Tawan Street, Huanggu District, Shenyang, Liaoning 110854, P.R. China
* Corresponding author at: Criminal Science and Technology Institute of Liaoning Province, E-mail address: jiangxianhua_2006@aliyun.
Introduction
Previous studies have often been restricted to sequencing the hypervariable segments HVS-I, HVS-II and HVS-III of the control region (CR), together with selected single nucleotide polymorphisms (SNPs) in the coding region (CodR). Such partial information limits the polymorphism information content of this genetic marker and hinders its application in practical forensic casework. In this study, we developed strategies for complete mitochondrial genome (mtGenome) sequencing on the Ion Torrent Personal Genome™ Machine (PGM™) platform. We also investigated mtGenome features of the Northern Chinese Han population to evaluate its application in forensic science.

Strategies
The strategy comprised three parts: sequencing, data analysis and population genetics.

Sequencing
1. Long-range PCR amplification of the mtGenome (16569 bp) in two overlapping fragments:
   Fragment A (8.3 kbp), primers L644 and H8982;
   Fragment B (8.6 kbp), primers L8789 and H877.
2. Library construction (fragmentation, adapter ligation and size selection).
3. Clonal amplification.
4. PGM sequencing (314 chip: 4 samples; 316 chip: 15 samples; 318 chip: 30 samples).

Reliability of Torrent Variant Caller (TVC) v.4.0
Proportions of erroneous and correct TVC variant calls:
Good samples (average coverage 1561×): 1.37% error variants, 98.63% correct variants
All samples (average coverage 1269×): 2.66% error variants, 97.34% correct variants
Poor samples (average coverage 632×): 5.26% error variants, 94.74% correct variants
Outside homopolymer regions: 1.08% error variants, 98.92% correct variants
Within homopolymer regions: 7.58% error variants, 92.42% correct variants

Error rate within homopolymer regions, by homopolymer length (error variants/all variants):
3 bp: 1.19% (6/504)
4 bp: 1.90% (4/211)
5 bp: 4.96% (7/141)
6 bp: 32.14% (9/28)
7 bp: 25.00% (2/8)
8 bp: 75.00% (3/4)
≥9 bp: 53.75% (43/80)

The results showed that TVC was more reliable at an average coverage of ≥ 1500× and in homopolymers of ≤ 5 bp.
Where homopolymers of ≥ 6 bp (especially ≥ 8 bp) are present, and where average coverage is ≤ 500×, variants should be verified by visual inspection. This may apply to the affected regions only or even to the complete mtGenome.
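The Python sketch below makes the figure's numbers concrete. It recomputes the per-length error rates from the counts reported above and shows one way a homopolymer-length check could flag calls for visual inspection. Some elements are illustrative assumptions and are not part of TVC or of the authors' pipeline:

- the function names;
- the toy sequence;
- the reading of the thresholds (flag runs of ≥ 6 bp, or samples at ≤ 500× average coverage).

```python
# Sketch only: per-length TVC error rates recomputed from the counts in the
# figure, plus a hypothetical review flag based on the poster's thresholds.

# Homopolymer length (bp) -> (error variants, all variants), from the figure.
# The key 9 stands for the ">= 9 bp" class.
COUNTS = {3: (6, 504), 4: (4, 211), 5: (7, 141), 6: (9, 28),
          7: (2, 8), 8: (3, 4), 9: (43, 80)}


def error_rates(counts):
    """Percentage of erroneous calls in each homopolymer-length class."""
    return {length: 100 * err / total for length, (err, total) in counts.items()}


def pooled_rate(counts):
    """Errors, total calls and error percentage over all homopolymer classes."""
    errors = sum(err for err, _ in counts.values())
    total = sum(n for _, n in counts.values())
    return errors, total, 100 * errors / total


def homopolymer_length(seq, pos):
    """Length of the single-base run that covers 0-based index pos in seq."""
    base = seq[pos]
    left = right = pos
    while left > 0 and seq[left - 1] == base:
        left -= 1
    while right + 1 < len(seq) and seq[right + 1] == base:
        right += 1
    return right - left + 1


def needs_visual_check(seq, pos, avg_coverage, max_reliable_run=5, low_coverage=500):
    """Hypothetical rule: review calls in runs >= 6 bp or in samples at <= 500x."""
    return (homopolymer_length(seq, pos) > max_reliable_run
            or avg_coverage <= low_coverage)


if __name__ == "__main__":
    for length, rate in error_rates(COUNTS).items():
        label = ">=9" if length == 9 else str(length)
        print(f"{label} bp: {rate:.2f}%")
    errors, total, rate = pooled_rate(COUNTS)
    print(f"within homopolymers: {errors}/{total} = {rate:.2f}%")  # 74/976 = 7.58%
    ref = "A" + "C" * 7 + "TAG"  # toy sequence containing a 7-bp C run
    print(needs_visual_check(ref, 3, avg_coverage=1561))  # True (7-bp run)
    print(needs_visual_check(ref, 0, avg_coverage=1561))  # False
```

Pooling the seven length classes reproduces the 7.58% error rate reported for homopolymer regions (74 of 976 calls). This suggests the per-length counts and the summary proportions describe the same call set.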
Summary statistics of the mtGenome from 107 Northern Chinese Han individuals.
The last three columns give the percentage change between successive data sets: (1) HVS-I → HVS-I/HVS-II; (2) HVS-I/HVS-II → CR; (3) CR → mtGenome.

Statistic                            | HVS-I  | HVS-I/HVS-II | CR     | mtGenome | (1)     | (2)    | (3)
No. of variants (a)                  | 522    | 892          | 1102   | 4022     | -       | -      | -
No. of haplotypes (a)                | 94     | 102          | 105    | 107      | 8.51%   | 2.94%  | 1.90%
No. of unique haplotypes (a)         | 84     | 98           | 103    | 107      | 16.67%  | 5.10%  | 3.88%
Mean no. of pairwise differences (a) | 7.36   | 9.66         | 11.41  | 39.15    | 31.25%  | 18.12% | 243.12%
Range of differences                 | 65     | 75           | 79     | 88       | 15.38%  | 5.33%  | 11.39%
HD                                   | 0.9967 | 0.9989       | 0.9996 | 1        | 0.22%   | 0.07%  | 0.04%
RMP                                  | 0.0126 | 0.0104       | 0.0097 | 0.0093   | -17.46% | -6.73% | -4.12%
No. of haplogroups (b)               | 43     | 56           | 60     | 74       | 30.23%  | 7.14%  | 23.33%

HD, haplotype diversity; RMP, random match probability.
(a) Cn indels at positions 309, 315 and 16193 were excluded from all calculations. 523–524DEL, 521–524DEL, 524.1A and 524.2C, and 8281–8289DEL were each treated as a single variant. Point heteroplasmies (PHPs) were treated as variants.
(b) Haplogroups for HVS-I, HVS-I/HVS-II, CR and CodR were assigned with Mthap, whereas haplogroups for the mtGenome were confirmed by EMPOP.
Sequencing the complete mtGenome reduced the RMP markedly, by 26.19% relative to HVS-I alone (from 0.0126 to 0.0093).
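The HD and RMP values in the table are consistent with two standard estimators: RMP = Σp² and HD = n/(n − 1)·(1 − Σp²), where p is each haplotype's frequency among n individuals. With 107 distinct mtGenome haplotypes among 107 individuals, these give HD = 1 and RMP = 1/107 ≈ 0.0093. The poster does not state its formulas, so the Python sketch below assumes these estimators, and its function names are hypothetical. It also shows the relative-change calculation behind the percentage columns and the 26.19% figure.

```python
from collections import Counter


def haplotype_diversity_and_rmp(haplotypes):
    """Return (HD, RMP) for a sample of haplotype labels (standard formulas, assumed).

    RMP = sum(p_i ** 2) over distinct haplotypes
    HD  = n / (n - 1) * (1 - RMP)
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    freqs = [count / n for count in Counter(haplotypes).values()]
    rmp = sum(p * p for p in freqs)
    hd = n / (n - 1) * (1 - rmp)
    return hd, rmp


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100 * (new - old) / old


if __name__ == "__main__":
    # 107 individuals, every complete mtGenome haplotype distinct (as reported).
    hd, rmp = haplotype_diversity_and_rmp([f"H{i}" for i in range(107)])
    print(round(hd, 4), round(rmp, 4))               # 1.0 0.0093
    print(round(percent_change(0.0126, 0.0093), 2))  # -26.19 (HVS-I -> mtGenome)
    print(round(percent_change(94, 102), 2))         # 8.51 (haplotypes, HVS-I -> HVS-I/HVS-II)
```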
Data analysis and haplogroup resolution
Three in-house Perl scripts were developed for primary data analysis. They screen uncertain positions and samples out of the variant call format (VCF) reports.
Both IGV and NextGENe software were used for base-by-base review.
Online tools were used for haplogroup assignment, including Mthap (http://dna.jameslick.com/mthap), EMMA (www.empop.org) and PhyloTree (www.phylotree.org).
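The in-house Perl scripts are not described in detail. The Python sketch below is a rough, hypothetical illustration of this kind of screening: it reads VCF text and flags positions for base-by-base review. Several elements are assumptions rather than the authors' method:

- the INFO column is assumed to carry DP (read depth) and AF (allele frequency) keys;
- the thresholds are placeholders;
- the example VCF lines are made up.

The field names and cut-offs used by the authors' scripts, and by TVC's own VCF output, may differ.

```python
# Hypothetical screening sketch: NOT the authors' Perl scripts.
# Assumes INFO has DP (depth) and AF (allele frequency); thresholds are placeholders.
from dataclasses import dataclass

MIN_DEPTH = 500        # placeholder: calls below this depth go to manual review
NOISE_AF = 0.10        # placeholder: AF below this is treated as background noise
HOMOPLASMIC_AF = 0.90  # placeholder: AF at or above this is treated as homoplasmic


@dataclass
class FlaggedCall:
    pos: int
    ref: str
    alt: str
    reason: str


def parse_info(info):
    """Split an INFO string such as 'DP=812;AF=0.97' into a dict of strings."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def screen_vcf(lines):
    """Return calls that need base-by-base review (e.g. in IGV), with a reason."""
    flagged = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        pos, ref, alt = int(cols[1]), cols[3], cols[4]
        info = parse_info(cols[7])
        depth = int(info.get("DP") or 0)
        af = float((info.get("AF") or "0").split(",")[0])  # first ALT allele only
        if depth < MIN_DEPTH:
            flagged.append(FlaggedCall(pos, ref, alt, "low depth"))
        elif NOISE_AF <= af < HOMOPLASMIC_AF:
            flagged.append(FlaggedCall(pos, ref, alt, "intermediate AF (possible PHP)"))
    return flagged


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chrM\t73\t.\tA\tG\t500\tPASS\tDP=1500;AF=0.99",
        "chrM\t310\t.\tT\tTC\t80\tPASS\tDP=320;AF=0.95",
        "chrM\t16189\t.\tT\tC\t200\tPASS\tDP=1200;AF=0.40",
    ]
    for call in screen_vcf(example):
        print(call.pos, call.reason)
```

The sketch flags two kinds of calls rather than discarding them: low-depth calls and intermediate-frequency calls (possible point heteroplasmies). This mirrors the manual IGV/NextGENe review step described above.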
Results and discussion
Haplogroup assignment based on the CR versus the complete mtGenome
Sample | CR         | mtGenome
N021   | M9a'b      | G
N044   | M9a1a1c1b1 | D4
N079   | M33-16362  | G2b1a
N083   | D4k        | C4a1-195
N088   | M74        | D4j7/D4j-16311
N094   | R6         | D4-195C
N096   | M33-16362  | G1c
Haplotypes based on the complete mtGenome can be assigned to more accurate haplogroups than those based on the control region alone.
Coverage and strand balance
Some regions showed particularly low coverage; most were located in the HVS and in the NADH dehydrogenase (ND) coding regions.
Most positions with high reverse-strand bias lay in regions with low coverage relative to the rest of the mtGenome.
Most of these regions appeared to coincide with homopolymer stretches.
Homopolymers
The results demonstrated that, on the PGM™ platform, low coverage and high reverse-strand bias were mainly attributable to homopolymers. This was especially the case for a single long homopolymer (≥ 8 bp) and/or several consecutive homopolymers within a short region.

Conclusions
This study outlines strategies for complete mtGenome sequencing on the Ion Torrent PGM™ platform and for analysing the resulting NGS data.
In our experience, a single operator can process approximately 30 samples per week on the PGM™ platform.
TVC calls are more reliable for samples with higher average coverage (e.g., ≥ 1500×) and within homopolymers of ≤ 5 bp.
Sequencing the complete mtGenome markedly improved resolution compared with the subsets of the molecule historically targeted for human identification.
We therefore believe that NGS technology has strong potential for complete mtGenome analysis compared with traditional methods.
Acknowledgements
This study was supported by a grant (No. 201201ZDYJ001) from the Key Research Project of the Ministry of Public Security, China. The authors thank Walther Parson and Simone Nagl of EMPOP for their advice and their efforts in evaluating the data. We also thank Qingqing Zhang of the Department of Field Bioinformatics Support (FBS), Thermo Fisher Scientific, for advice on the Perl scripts.