Jasmin Zohren , Elizabeth Sollars , Jo Clark , David Boshier , Anika

Sequencing the genome of Fraxinus excelsior (European Ash)
Jasmin
1
Zohren ,
Elizabeth
1,2
Sollars ,
Jo
3
Clark ,
David
3
Boshier ,
Anika
2
Joecker ,
Richard
1
Buggs
1. School of Biological and Chemical Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, UK; 2. CLC bio, Silkeborgvej 2, Prismet, 8000 Aarhus C., Denmark;
3. The Earth Trust, Little Wittenham, Abingdon, Oxfordshire, OX14 4QZ, UK
Ash dieback
A
B
Assembly results
Biological material
C
Figure 1:
(A) Distribution
of ash trees
(blue area) and
year of first
confirmed cases
of ash dieback in
Europe.
(B) An infected
ash leaf.
(C) Confirmed
cases of infected
ash trees (red
dots) and ash
tree density
(colour grade)
across Great
Britain.
We are sequencing the whole genome of an ash tree with
low heterozygosity. The tree was generated by the selfpollination of an hermaphroditic ash tree from
Gloucestershire with low heterozygosity at five
microsatellite loci. The selfed progeny (Fig. 2B) is growing
at the Earth Trust in Oxfordshire, and was an outcome of the
EU FRAXIGEN project (2002-05).
The 1C genome size of the selfed tree was measured as
880Mbp by flow cytometry (Fig. 2A). DNA for sequencing
was extracted using the Qiagen DNeasy protocol. RNA was
extracted from leaf tissue using a modified CTAB/phenol
protocol. We also extracted RNA from the leaves, flowers,
roots, and cambium of the mother tree.
Sequencing was carried out at Eurofins using a combination
of Illumina HiSeq 2000 and Roche 454 FLX++ (Table 1). The
reads were adapter and quality trimmed, and then
assembled using SOAPdenovo2 [4], CLC Genomics
Workbench [5], and the stand-alone scaffolder SSPACE [6].
Assemblies were compared using length metrics, mapping
reads, CEGMA analysis [7], and the validation program
FRCbam [8]. The RNA is currently being sequenced on an
Illumina HiSeq at the QMUL genome centre.
Ash 2C = 1.79 pg
Huge selection pressure is being placed on European
populations of the common ash tree (Fraxinus excelsior) by
the fungal pathogen Hymenoscyphus pseudoalbidus, which
causes ash dieback. It has recently been introduced in the
UK and it is known that in areas such as Denmark where the
disease is widespread, less than 10% of trees in natural
populations have low susceptibility to the fungus. Over the
coming decades there will be strong selection for those
variants in ash populations that reduce susceptibility. We
wish ultimately to identify these genetic or epigenetic
variants, enabling the conservation of the species in Europe.
References
[1] www.forestry.gov.uk
[6] Bioinformatics, 27(4):578-579.
[2] Heredity, 106:788-797.
[7] Bioinformatics, 23(9):1061-1067.
[3] Forensic Pathology, 42:69–74.
[8] PLoS ONE 7(12):e52210.
[4] GigaScience, 1(1):18.
[9] Genome research, 18(1):188–196.
[5] CLC de novo Assembly White Paper [10] BMC Bioinformatics, 12:491.
B
Figure 2: (A) Flow cytometry results for the sequenced tree.
(B) The selfed individual growing at Paradise Wood in
Oxfordshire, owned by the Earth Trust. (Photo: R. Buggs)
Further information
Correspondence: [email protected], [email protected]
Website: www.ashgenome.org
Insert size Read length No. reads/
Coverage of
raw data 880Mbp genome
Roche 454
NA
800bp
6m
4.5x
Illumina
200bp
100bp
350m
40x
Illumina
300bp
100bp
138m
15x
Illumina
500bp
100bp
245m
30x
Illumina
3kb
100bp
147m
15x
Illumina
8kb
100bp
328m
35x
Illumina
20kb
100bp
348m
40x
Illumina
40kb
100bp
256m
30x
Table 1: The data that was
generated by Eurofins and
used for the assemblies. The
first instalment of data was
received 6 weeks after sample
delivery.
Statistics
CLC
CLC + SSPACE SOAPdenovo2
NG50
93,223
78,383
45,525
Size (Mb)
2,027
979
1,470
No. scaffolds 203,780
142,371
208,477
Max
412,238
558761
379077
scaffold size
% Reads
94.54
92.69
73.49
mapped
% Broken
43.74
35.91
44.54
pairs
% CEGMA
96.37
97.18
90.73
partial hits
% CEGMA
82.66
86.69
72.18
complete hits
Table 2: Assemblies were performed using SOAPdenovo2,
CLC Genomics Workbench, and a combination of CLC and
the SSPACE scaffolder.
Tomato 2C = 1.96 pg
A
Platform
Figure 3: Graphical Comparison of the
different assemblies using FRCbam. Taking
this and Table 2 into account, we believe the
CLC+SSPACE approach performed best.
Future studies
RNA-seq data will be fed into the annotation pipeline Maker [9, 10] to locate coding genes. We will also
assemble the transcriptome to analyse expression profiles in the different tissues.
We plan to perform bisulphite sequencing to compare DNA methylation patterns between high and low
susceptibility individuals.
By making the reference genome publicly available on our website (www.ashgenome.org) we are
encouraging other groups to use this foundational resource for evolutionary and ecological genetics
studies. These can include the discovery of markers for a low-susceptibility genotype, which could be used
to aid the breeding of resistant ash tree populations.
Acknowledgments
We thank the following people for helpful input: Martin Simonsen, Mahesh Pancholi, and Richard
Nichols. The project is funded by an urgency grant from NERC. E. Sollars and J. Zohren are funded by
INTERCROSSING, an EU-funded Marie Curie Initial Training Network.
Jasmin
Lizzy