Z. filipendulae

DEPARTMENT OF PLANT BIOLOGY AND BIOTECHNOLOGY
VKR RESEARCH CENTER PRO-ACTIVE PLANTS
FACULTY OF LIFE SCIENCES
UNIVERSITY OF COPENHAGEN
Genes involved in metabolism of cyanogenic glucosides in Zygaena
filipendulae determined by 454 pyrosequencing
Mika Zagrobelny (a), Karsten Scheibye-Alsing (a, b), Niels Bjerg Jensen (a), Birger Lindberg Møller (a), Jan Gorodkin (b), Søren Bak (a)
(a) Plant Biochemistry Laboratory, Department of Plant Biology, University of Copenhagen, 40 Thorvaldsensvej, DK-1871 Frederiksberg C, Denmark;
The VKR Research Centre "Proactive Plants", University of Copenhagen, 40 Thorvaldsensvej, DK-1871 Frederiksberg C, Denmark
(b) Department of Basic Animal and Veterinary Sciences/Genetics & Bioinformatics, University of Copenhagen, 3 Grønnegårdsvej, DK-1871 Frederiksberg C, Denmark;
Center for Applied Bioinformatics, University of Copenhagen, 40 Thorvaldsensvej, DK-1871 Frederiksberg C, Denmark
Introduction
Zygaena filipendulae is a brightly colored diurnal moth, capable of biosynthesizing the cyanogenic
glucosides (CNglcs) linamarin and lotaustralin as well as sequestering them from its food plant
Lotus corniculatus. The CNglcs are toxic and act as defense compounds, and they also serve
other important functions in the life cycle of Z. filipendulae. In insects, the biosynthetic
pathway of CNglcs is unknown, but in plants the pathway has been resolved in Sorghum bicolor.
Sorghum contains the CNglc dhurrin, whose biosynthesis involves the P450s CYP79A1 and
CYP71E1 and the glycosyltransferase UGT85B1. The CNglcs are bio-activated by degradation
involving a β-glucosidase and an α-hydroxynitrile lyase (Figure 1).
UGTs
Forty-one putative UGTs were identified in the Z. filipendulae transcriptome, three of which are
full length (Figure 4).
Figure 4. Neighbor-joining bootstrap tree of full-length UGT genes. Genes from Z. filipendulae
are marked in red and genes from plants in green. Ae: Aedes aegypti, AM: Antheraea
mylitta, At: Arabidopsis thaliana, BM: Bombyx mori, Ce: Caenorhabditis elegans, Cq: Culex
quinquefasciatus, Hs: Homo sapiens, SF: Spodoptera frugiperda, Tc: Tribolium castaneum.
[Figure 1 scheme: Biosynthesis: isoleucine → lotaustralin (plant enzymes: CYP79, CYP71, UGT85B). Bio-activation: lotaustralin → 2-butanone + Glc + HCN (β-glucosidase, α-hydroxynitrile lyase).]
P450s
Approximately 120 putative P450s could be identified in the Z. filipendulae transcriptome. Five of
these are full length, and seven more were extended by RACE PCR (Figure 3).
[Figure 1 scheme: valine → linamarin → acetone.]
Figure 1. Metabolism of cyanogenic glucosides. Enzymes are shown in green.
To elucidate the pathways of CNglc metabolism in insects, we had the transcriptome of
Z. filipendulae feeding on acyanogenic L. corniculatus plants sequenced.
Results
Figure 3. Neighbor-joining bootstrap tree of
full-length P450 genes. Genes from Z.
filipendulae are marked in red, original full
length genes encircled in red. Green genes are
from plants. Ag: Anopheles gambiae, Bm:
Bombyx mori, Dm: Drosophila melanogaster,
Lj: Lotus japonicus, Sb: Sorghum bicolor.
We received 320,000 reads, which were assembled into 30,000 contigs and 40,000 singletons (Figure 2).
Figure 2. Distribution of sequence lengths and cluster sizes. A: The lengths of individual reads. B: The lengths
of contigs. C: Cluster sizes.
All sequences in our dataset similar to P450s, glycosyltransferases (UGTs), α-hydroxynitrile
lyases (HNLs) and β-glucosidases were identified by BLAST searches, aligned with ClustalW
in MEGA, and refined by hand.
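The alignments described above are the input for the neighbor-joining trees shown in the figures. As a rough illustration of the distance step that precedes tree building, the following sketch computes pairwise p-distances (the proportion of differing sites, skipping gapped columns) from an alignment. The poster does not state which distance measure was used in MEGA, so p-distance is an assumption here, and the sequence names and toy alignment are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(alignment):
    """Return {(name1, name2): p-distance} for every pair in the alignment."""
    return {
        (n1, n2): p_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment; not data from the poster.
toy_alignment = {
    "seq1": "ATGGCTAAAGTT",
    "seq2": "ATGGCAAAAGTT",
    "seq3": "ATG-CTAAGGTA",
}

if __name__ == "__main__":
    for pair, dist in distance_matrix(toy_alignment).items():
        print(pair, round(dist, 3))
```

A matrix like this is what a neighbor-joining implementation would consume; bootstrap support would come from repeating the calculation on alignment columns resampled with replacement.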
HNLs
HNLs are divided into four groups, three of which are represented in Z. filipendulae: FAD
HNLs, serine carboxypeptidase-related HNLs, and non-FAD HNLs. Fifty-two putative HNLs were
identified in the Z. filipendulae transcriptome, but none of them is full length. Six are
longer than 1000 nucleotides, two from each HNL group.
Phylogenetic analyses
We tested full-length sequences from the gene families for selection in PAML4.1. The
glucocerebrosidases and HNLs were not tested, since there were too few full-length sequences in
each group. We fitted Models 0, 1, 2, 3, 5, 7 and 8 from codeml to our sequences and compared them
with likelihood ratio tests. The ω values are low, signifying purifying selection, but with less
constraint in some regions of the P450s and β-glucosidases, whose best-fit model has three classes
of ω. The UGTs have larger regions that seem to undergo neutral evolution. No sites under positive
selection were detected in any of the tested gene families.
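Two pieces of arithmetic sit behind this paragraph, and the sketch below illustrates both under stated assumptions. First, a likelihood ratio test compares nested site models through 2ΔlnL against a chi-square distribution. The pairs M0 vs M3 (4 degrees of freedom with three classes), M1 vs M2 and M7 vs M8 (2 degrees of freedom each) are the comparisons commonly made with codeml; the poster does not say which pairs were used, and the log-likelihoods in the example are hypothetical. Second, an average ω can be obtained as the proportion-weighted mean over site classes. Using the class proportions and ω values reported on the poster for the best-fit models (with the class assignment inferred from the proportions summing to about 1), this comes close to the reported averages, but whether the authors computed them this way is an assumption.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable for even df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i<k} (x/2)^i / i!, so no external library is needed.
    """
    if df <= 0 or df % 2:
        raise ValueError("this helper only handles positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    terms = sum(half**i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, p-value) for a pair of nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf_even_df(stat, df)


def mean_omega(proportions, omegas):
    """Proportion-weighted mean of omega over site classes."""
    return sum(p * w for p, w in zip(proportions, omegas))


# Site-class proportions (p) and omega values as read from the poster.
best_fit = {
    "P450": ([0.028, 0.404, 0.567], [0.003, 0.097, 0.252]),
    "UGT": ([0.543, 0.456], [0.129, 1.000]),
    "beta-glucosidase": ([0.199, 0.460, 0.340], [0.000, 0.116, 0.536]),
}

if __name__ == "__main__":
    # Hypothetical log-likelihoods for an M0 vs M3 comparison (not from the poster).
    stat, p_value = likelihood_ratio_test(-12345.67, -12330.12, df=4)
    print(f"2*delta lnL = {stat:.2f}, p = {p_value:.2e}")
    for family, (props, omegas) in best_fit.items():
        print(family, round(mean_omega(props, omegas), 4))
```

The weighted means differ from the tabulated averages only in the third decimal place, which is consistent with the proportions and ω values having been rounded or truncated to three decimals.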
Enzymes from Z. filipendulae   Average ω (dN/dS)   Best-fit model from PAML4.1
P450                           0.1833              Model 3: p: 0.028, 0.404, 0.567; ω: 0.003, 0.097, 0.252
UGT                            0.5268              Model 1: p: 0.543, 0.456; ω: 0.129, 1.000
β-glucosidase                  0.2366              Model 3: p: 0.199, 0.460, 0.340; ω: 0.000, 0.116, 0.536
Average                        0.3218
β-Glucosidases
Seventeen putative β-glucosidases were identified in the Z. filipendulae transcriptome, three of
which are full length. However, protein sequences obtained earlier from a Z. filipendulae
β-glucosidase were similar to glucocerebrosidases, so our cyanogenic β-glucosidase could be a
glucocerebrosidase. Four glucocerebrosidases were identified in the Z. filipendulae
transcriptome, one of which is full length.
Figure 5. Neighbor-joining bootstrap tree of full-length β-glucosidase and
glucocerebrosidase genes. Genes from Z. filipendulae are marked in red and plant
genes in green. AM: Antheraea mylitta, Am: Apis mellifera, BM: Bombyx mori,
HE: Heliconius erato, Hs: Homo sapiens, Lj: Lotus japonicus, Tc: Tribolium
castaneum.
Perspectives
Based on analysis of the Z. filipendulae transcriptome generated by 454 pyrosequencing, gene
candidates for biosynthesis and bio-activation of CNglcs were identified. The number of Z.
filipendulae sequences within the examined gene families corresponds closely to the number of genes
in the same families in other sequenced insects, indicating that 454 pyrosequencing achieved good
coverage. Full-length sequences of the gene candidates are being generated by RACE PCR. They will
then be heterologously expressed, and the recombinant enzymes will be biochemically
characterized to determine their biological function.