Annotation of transposable elements in genome sequences

Melampsora Genome Consortium
2008 Summer Workshop
Transposable elements in the Melampsora larici-populina genome
Marie-Pierre Oudot-Le Secq
INRA-Nancy, August 20-21 2008
Outline
1. Transposable elements
1.1. Main types characteristics
1.2. Nomenclature
2. TE annotation
2.1. Annotation pipe-line
2.2. Manual curation
MGC Summer Workshop INRA-Nancy, August 20-21 2008
1. Transposable elements
« mobile DNA segments in the genome »
Eubacteria - Archaebacteria - Eukaryotes
Impact on the host genome
1.1. Main types characteristics
2 classes, by transposition intermediate
RNA: Class I, « Retrotransposons » (Reverse Transcriptase, copy-and-paste)
DNA: Class II, « DNA transposons » (Transposase, cut-and-paste)
1.2. Nomenclature
Wicker et al., Nature Reviews Genetics, December 2007
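As an illustration of the nomenclature, here is a minimal sketch of the three-letter codes of the Wicker et al. classification (class, order, superfamily), limited to the orders and superfamilies named in this talk. The function name `wicker_code` and the lookup tables are hypothetical helpers written for this example; "X" stands for a level that is not determined.

```python
# First letter: class (R = Class I / retrotransposon, D = Class II / DNA transposon).
CLASS_LETTER = {"I": "R", "II": "D"}

# Second letter: order (only the orders that appear in this talk).
ORDER_LETTER = {"LTR": "L", "LINE": "I", "SINE": "S", "TIR": "T"}

# Third letter: superfamily, keyed by (order, superfamily).
SUPERFAMILY_LETTER = {
    ("LTR", "Copia"): "C",
    ("LTR", "Gypsy"): "G",
    ("TIR", "Tc1-Mariner"): "T",
    ("TIR", "hAT"): "A",
    ("TIR", "Mutator"): "M",
    ("TIR", "PIF-Harbinger"): "H",
}


def wicker_code(te_class, order=None, superfamily=None):
    """Build the three-letter code; unknown levels are written as 'X'."""
    return (
        CLASS_LETTER.get(te_class, "X")
        + ORDER_LETTER.get(order, "X")
        + SUPERFAMILY_LETTER.get((order, superfamily), "X")
    )


if __name__ == "__main__":
    print(wicker_code("I", "LTR", "Gypsy"))   # RLG
    print(wicker_code("II", "TIR", "hAT"))    # DTA
    print(wicker_code("II", "TIR"))           # DTX
```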
2. TE annotation
Starting point: TE annotation pipe-line result
Run on the previous assembly of the Melampsora larici-populina genome
Timothé Flutre, Elodie Duprat and Hadi Quesneville
2.1. TE pipe-line
Gathering of repeated sequences and making consensus out of them
[Pipe-line diagram] Set of sequences -> Blaster (pairwise alignments) -> HSPs -> Grouper, RECON, PILER -> Groups of repeated sequences -> Multiple alignments -> Consensus
Other labels on the diagram: Matcher, Connected, Lucy
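The last step of the diagram, going from a multiple alignment to a consensus, can be sketched as a column-wise majority vote. This is only an illustration of the idea, not the method used by the pipe-line; the function name `majority_consensus` and the `min_fraction` threshold are assumptions made for this example.

```python
from collections import Counter


def majority_consensus(aligned, min_fraction=0.5):
    """Column-wise majority consensus of equal-length aligned sequences.

    Columns where the gap character '-' is the most frequent symbol are
    dropped; columns whose top base is below min_fraction become 'N'.
    """
    if not aligned:
        raise ValueError("no sequences given")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have the same length")

    consensus = []
    for column in zip(*aligned):
        symbol, count = Counter(column).most_common(1)[0]
        if symbol == "-":
            continue  # majority gap: the column is left out of the consensus
        consensus.append(symbol if count / len(column) >= min_fraction else "N")
    return "".join(consensus)


if __name__ == "__main__":
    group = ["ACGT-A", "ACGTTA", "ACCT-A"]
    print(majority_consensus(group))
```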
Consensus characterization
Structural annotation: TIR, LTR, poly A tail, SSR, putative ORFs
Functional annotation: comparison with RepBaseUpdate nucleotides and RepBaseUpdate proteins (known TE, associated ORFs)
Consensus -> Characterized consensus
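Two of the structural features above can be illustrated with a short sketch: a terminal inverted repeat (TIR) is present when the start of a sequence matches the reverse complement of its end, and a poly A tail is a run of A at the 3' end. The functions below are hypothetical helpers for illustration only; they look for perfect matches, whereas a real pipe-line would tolerate mismatches.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tir_length(seq, max_len=50):
    """Length of the longest perfect terminal inverted repeat (0 if none).

    The prefix of seq is compared with the prefix of its reverse complement,
    which is the reverse complement of the suffix of seq.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    limit = min(max_len, len(seq) // 2)
    n = 0
    while n < limit and seq[n] == rc[n]:
        n += 1
    return n


def poly_a_tail_length(seq):
    """Number of consecutive A at the 3' end of the sequence."""
    seq = seq.upper()
    return len(seq) - len(seq.rstrip("A"))


if __name__ == "__main__":
    # Toy sequence: 6 bp TIR (CAGTGG ... CCACTG) around a 6 bp core.
    print(tir_length("CAGTGGTTTTCCCCACTG"))
    print(poly_a_tail_length("ACGTAAAA"))
```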
Excerpt of the characterized consensus list
Consensus                              Length  Class  Order  Status  Structure        lgth -LTR
melampsora-long_Recon_1027_1-21094     5993    I      LTR    comp    (LTR)            5655
melampsora-long_Recon_1039_1-21562     6461    I      LTR    comp    (LTR)            6231
melampsora-long_Recon_10703_1-14580    5042    I      LTR    comp    (LTR)            4314
melampsora-long_Recon_1518_1-26652     5971    I      LTR    comp    (LTR)(LTR)       4853
melampsora-long_Recon_10426_1-3059     279     I      LARD   -       -                -
melampsora-long_Recon_2493_1-11803     6298    I      LTR    comp    (LTR)            5912
melampsora-long_Recon_10473_1-3812     725     I      LARD   -       (LTR)            -
melampsora-long_Recon_2972_1-30131     6389    I      LTR    comp    (LTR)(LTR)       4681      *
melampsora-long_Recon_12388_2-16751    602     I      LARD   -       -                -         *
melampsora-long_Recon_3254_1-12596     6190    I      LTR    comp    (LTR)            5406
melampsora-long_Recon_12585_1-17153    649     I      LARD   -       (LTR)            -         *
melampsora-long_Recon_15417_2-13773    9014    I      LINE   comp    (SSR_tail)       -         *
melampsora-long_Recon_3563_1-8149      6003    I      LTR    comp    (LTR)            5697
melampsora-long_Recon_14209_1-3077     369     I      LARD   -       (LTR)            -         *
melampsora-long_Recon_2114_1-1060      3586    I      LINE   comp    (SSR_tail)       -         *
melampsora-long_Recon_391_2-32597      4918    I      LTR    comp    (LTR)            4678
melampsora-long_Recon_14776_1-25898    736     I      LARD   -       (LTR)            -         *
melampsora-long_Recon_2365_2-10734     3605    I      LINE   comp    (SSR_tail)       -         *
melampsora-long_Recon_4076_1-27239     4994    I      LTR    comp    (LTR)(LTR)       4258      *
melampsora-long_Recon_14920_1-5625     349     I      LARD   -       -                -         *
melampsora-long_Recon_2922_1-14693     3555    I      LINE   comp    (SSR_tail)       -         *
melampsora-long_Recon_1013_1-20576     1983    I      SINE   -       (SSR_tail)       -         *
melampsora-long_Recon_771_1-13518      5367    I      LTR    comp    (LTR)            4889
melampsora-long_Recon_14925_1-5760     988     I      LARD   -       (LTR)            -
melampsora-long_Recon_4091_1-8313      3832    I      LINE   comp    (SSR_tail)       -
melampsora-long_Recon_10145_1-5767     192     I      SINE   -       (SSR_tail)       -
melampsora-long_Recon_8262_1-26667     6983    I      LTR    comp    (LTR)(LTR)       6323      *
melampsora-long_Recon_14930_1-5829     326     I      LARD   -       -                -         *
melampsora-long_Recon_4451_2-20823     5673    I      LINE   comp    (polyA_tail)     -         *
melampsora-long_Recon_10151_1-31077    2056    I      SINE   -       (polyA_tail)     -         *
melampsora-long_Recon_9126_2-21297     6470    I      LTR    comp    (LTR)            5872
melampsora-long_Recon_18336_1-5173     548     I      LARD   -       (LTR)            -         *
melampsora-long_Recon_4592_1-5088      4510    I      LINE   comp    (SSR_tail)       -         *
melampsora-long_Recon_10178_1-11972    2111    I      SINE   -       (SSR_tail)       -         *
melampsora-long_Recon_10087_1-2643     4924    II     TIR    comp    (TIR)            -         *
melampsora-long_Recon_2299_1-16036     774     I      LARD   -       (LTR)            -
melampsora-long_Recon_463_1-2309       3433    I      LINE   comp    (SSR_tail)       -
melampsora-long_Recon_10188_1-20982    344     I      SINE   -       (polyA_tail)     -
melampsora-long_Recon_10136_1-28621    1073    II     TIR    comp    (TIR)            -
melampsora-long_Recon_2362_1-22723     572     I      LARD   -       (LTR)            -         *
melampsora-long_Recon_10210_1-7539     1620    I      SINE   -       (polyA_tail)     -         *
melampsora-long_Recon_1017_3-20612     960     II     TIR    comp    (TIR)            -         *
melampsora-long_Recon_3298_3-32285     495     I      LARD   -       (LTR)            -
melampsora-long_Recon_10217_1-4962     504     I      SINE   -       (SSR_tail)       -
melampsora-long_Recon_10329_1-25101    3742    II     TIR    comp    (TIR)(TIR)       -
melampsora-long_Recon_890_1-16679      848     II     TIR    uncomp  -                -
melampsora-long_Recon_10302_1-13696    300     I      SINE   -       (SSR_tail)       -
melampsora-long_Recon_104_1-21914      1275    II     TIR    comp    (TIR)(TIR)       -
melampsora-long_Recon_890_2-16679      849     II     TIR    uncomp  -                -
melampsora-long_Recon_10304_1-13407    1012    I      SINE   -       (SSR_tail)       -
melampsora-long_Recon_104_4-21914      1417    II     TIR    comp    (TIR)(TIR)       -         *
melampsora-long_Recon_898_2-16900      620     II     TIR    uncomp  -                -         *
melampsora-long_Recon_10545_1-27351    1619    II     TIR    comp    (TIR)(TIR)       -
melampsora-long_Recon_9373_1-29066     683     II     TIR    uncomp  -                -         *
melampsora-long_Recon_1025_1-21042     244     II     MITE   -       (TIR)            -         *
melampsora-long_Recon_10626_1-9973     1081    II     TIR    comp    (TIR)(TIR)       -
melampsora-long_Recon_950_1-18292      542     II     TIR    uncomp  -                -
melampsora-long_Recon_1055_2-22050     242     II     MITE   -       (TIR)            -
melampsora-long_Recon_10806_1-20186    1942    II     TIR    comp    (TIR)(TIR)       -         *
melampsora-long_Recon_950_3-18292      542     II     TIR    uncomp  -                -         *
melampsora-long_Recon_1055_3-22050     243     II     MITE   -       (TIR)            -
melampsora-long_Recon_11179_2-3734     2098    II     TIR    comp    (TIR)(TIR)       -         *
melampsora-long_Recon_955_3-18702      821     II     TIR    uncomp  -                -         *
melampsora-long_Recon_1055_4-22050     242     II     MITE   -       (TIR)            -
melampsora-long_Recon_969_1-19117      892     II     TIR    uncomp  (TIR)(TIR)       -         *
melampsora-long_Recon_10843_1-18729    457     II     MITE   -       -                -         *
melampsora-long_Recon_969_2-19117      873     II     TIR    uncomp  (TIR)(TIR)       -         *
melampsora-long_Recon_1094_1-25152     173     II     MITE   -       -                -         *
melampsora-long_Recon_969_3-19117      889     II     TIR    uncomp  (TIR)(TIR)       -
melampsora-long_Recon_1094_2-25152     174     II     MITE   -       -                -
melampsora-long_Recon_969_4-19117      893     II     TIR    uncomp  (TIR)(TIR)       -
melampsora-long_Recon_1094_3-25152     179     II     MITE   -       -                -
melampsora-long_Recon_10_1-20220       179     II     MITE   -       (TIR)            -
* Fields of this row were merged with a neighbouring row in the slide text and have been separated by inference.
2.2. Manual curation
Annotation of elements
Checking the consensus
BLAST of the consensus against the genomic sequence
Checking the results in Artemis:
refining elements
detection and fine annotation of nested elements
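Detecting nested elements comes down to finding annotated elements whose coordinates lie entirely inside another element on the same scaffold. The sketch below shows that containment test on toy coordinates; the `Element` type, the function `find_nested` and the example names are hypothetical, and manual curation in Artemis involves far more judgment than this check.

```python
from typing import NamedTuple


class Element(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def find_nested(elements):
    """Return (inner, outer) name pairs where inner lies within outer.

    An element counts as nested only if it is strictly shorter than the
    element that contains it, so identical intervals are not reported.
    """
    pairs = []
    for inner in elements:
        for outer in elements:
            if inner is outer:
                continue
            contained = outer.start <= inner.start and inner.end <= outer.end
            shorter = (inner.end - inner.start) < (outer.end - outer.start)
            if contained and shorter:
                pairs.append((inner.name, outer.name))
    return pairs


if __name__ == "__main__":
    # Toy annotations on one scaffold (coordinates invented for the example).
    annotations = [
        Element("LTR_a", 1000, 7000),
        Element("MITE_b", 3000, 3240),
        Element("TIR_c", 9000, 10000),
    ]
    print(find_nested(annotations))
```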
Number of consensus in Melampsora
• Number of LTRcomp: 13
• Number of LTRuncomp: 0
• Number of LARD: 17
• Number of LINEcomp: 32
• Number of LINEuncomp: 0
• Number of SINE: 536
• Number of TIRcomp: 180
• Number of TIRuncomp: 117
• Number of MITE: 212 (+ 107)
• Number of Helitron: 0
• Number of Polinton: 0
• Number of confused: 1372
• Number of NoCat: 4616
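The counts above can be totalled per class to see how much of the consensus set is actually classified. This is a small arithmetic sketch: the grouping of categories into Class I and Class II follows the classes shown earlier in the talk, and the "(+ 107)" next to MITE is deliberately left out because the slide does not say whether it is already included elsewhere.

```python
# Counts copied from the slide; the "(+ 107)" next to MITE is not included.
CONSENSUS_COUNTS = {
    "LTRcomp": 13, "LTRuncomp": 0, "LARD": 17,
    "LINEcomp": 32, "LINEuncomp": 0, "SINE": 536,
    "TIRcomp": 180, "TIRuncomp": 117, "MITE": 212,
    "Helitron": 0, "Polinton": 0,
    "confused": 1372, "NoCat": 4616,
}

CLASS_I = ("LTRcomp", "LTRuncomp", "LARD", "LINEcomp", "LINEuncomp", "SINE")
CLASS_II = ("TIRcomp", "TIRuncomp", "MITE", "Helitron", "Polinton")


def summarize(counts):
    """Total the consensus counts per class and compute the classified share."""
    class_i = sum(counts[key] for key in CLASS_I)
    class_ii = sum(counts[key] for key in CLASS_II)
    total = sum(counts.values())
    return {
        "class_I": class_i,
        "class_II": class_ii,
        "unclassified": counts["confused"] + counts["NoCat"],
        "total": total,
        "classified_fraction": (class_i + class_ii) / total,
    }


if __name__ == "__main__":
    for key, value in summarize(CONSENSUS_COUNTS).items():
        print(key, value)
```

With these numbers, 1107 of 7095 consensus (about 16%) fall in a Class I or Class II category; the rest are "confused" or "NoCat".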
LTR Consensus
Fine checking and annotation of scaffold_1: other full elements found
=> 2 rounds of BLAST/annotation
At the moment:
2660 (full and partial) covering 7,855,502 bp => 7.77%
From original consensus:
Order LTR: 6 Copia, 7 Gypsy
Order DIRS...
TIR Consensus
Original consensus:
12 Tc1/Mariner
28 hAT
3 Mutator
15 Harbinger
19 undefined
107 without BLAST hits or characteristic ORF => « MITE »
Raw mapping:
4159 (full and partial) covering 12,046,646 bp => 11.92%
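The LTR and TIR coverage figures can be cross-checked against each other. The sketch below assumes that both percentages are fractions of the total assembly length, which the slides do not state explicitly; under that assumption, dividing the covered length by the percentage gives the implied assembly size. The function names are hypothetical.

```python
def coverage_percent(covered_bp, genome_bp):
    """Percentage of the genome covered by the annotated copies."""
    return 100.0 * covered_bp / genome_bp


def implied_genome_size(covered_bp, percent):
    """Genome size implied by a covered length and its reported percentage."""
    return covered_bp / (percent / 100.0)


if __name__ == "__main__":
    # Figures from the slides: LTR copies, then raw TIR mapping.
    ltr_size = implied_genome_size(7_855_502, 7.77)
    tir_size = implied_genome_size(12_046_646, 11.92)
    print(f"implied size from LTR figures: {ltr_size / 1e6:.1f} Mb")
    print(f"implied size from TIR figures: {tir_size / 1e6:.1f} Mb")
```

Both figures point to an assembly of about 101 Mb, so the two percentages are consistent with a common denominator.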
Mapping on the genome: example of nested elements