Transcriptomics overview:
available methods and data analysis
Sandrine Balzergue
INRA-URGV Transcriptome Platform
INRA, Versailles 02.04.2012
Outline
- State of the art on April 2, 2012
  • Microarrays
  • NGS and RNA-seq
- RNA-seq and CATMA
  • CATMAv6 validation & URGV RNA-seq bioinformatics pipeline
  • RNA-seq / CATMA: qualitative aspects
  • RNA-seq / CATMA: quantitative aspects
  • Conclusions and perspectives
- Available on the URGV Transcriptomic platform
Overview of the microarray pipeline
Array support: oligos or PCR products
Total RNA → amplification and labeling (2 days) → hybridization (1 day) → wash → scan → statistical analysis (1 day) → list of differentially expressed genes (p-value)
[Figure: excerpt of results, protoplast culture time course (pr0h, pr24h, pr48h, pr96h and whole plant): per-gene probe type (exon, intron, multiple, virtual), qPCR primer check (good, primer dimers, double band, blank, wrong size), GO annotation, log2 mean intensities (Ired, Igreen), log2 ratios and p-values]
Microarray technologies
|                   | AFFYMETRIX       | Home made     | AGILENT             | NimbleGen            |
|-------------------|------------------|---------------|---------------------|----------------------|
| Feature           | 25 b probes      | PCR product   | 60 b probes         | 50-60 b probes       |
| Features/chamber  | 500K             | 34000         | 1 million           | 4.2 million          |
| Hybridization     | monochrome       | bicolor       | mono/bicolor        | mono/bicolor         |
| Format            | 1 chamber        | 1 chamber     | 1, 2, 4, 8 chambers | 1, 3, 4, 12 chambers |
| Array type        | 3', exon, tiling | 3'            | 3', exon, tiling    | 3', exon, tiling     |
| Versatility       | --               | -             | +                   | ++                   |
Overview of the RNA-seq pipeline
Plant → RNA / small RNA extraction (2 h) → library construction from total RNA: cDNA libraries or small RNA libraries (2 days) → pre-sequencing library preparation (1 day) → sequencing (11 days) → reads → bioinformatics / statistical analysis (7 days) → list of differentially expressed genes (p-value)
RNA-seq library preparation protocol
Total RNA → polyA purification → polyA RNA → RT (dT) / fragmentation → fragmented molecules → adapter ligation → PCR (+ sizing) → PCR product: oriented cDNA
RNA-seq pre-sequencing and sequencing technologies
Pre-sequencing library preparation (emulsion, clusters…): 1 day → product in sequencing format → sequencing: 11 days → reads
Characteristics of sequencing machines as of April 2, 2012!
Machines: Illumina HiSeq2000 and HiSeq2500; 454 Life Sciences GS FLX Titanium XL+; Ion Torrent PGM (318 chip) and Proton II chip (early 2013); SOLiD (Applied Biosystems) 5500xl System
Values in each row are listed in slide order:
- Output per run: 600 Gb; 120 Gb; 0.7 Gb; 1 Gb; 144 Gb (1 lane); 70 Gb (20 Gb/day)
- Number of reads per run: 660 million; 100 million/lane (8 x 2 flowcells); 3 billion; 1.2 million; 1 million; 12 million
- Read length: 2x100 b paired-end (PE); 2x150 b PE; up to 700 b; 200 b; 2x200 b PE; 2x50 b
- Running time: 11 days; 27 hours; 23 hours; 2 hours; 5 hours; 3.5 days
- Disadvantages: run time; transition/transversion; homopolymers; homopolymers
- Advantages: PE, output; length; rapid, PE, output (200 million/lane, i.e. about 360 million in practice); length, slow
Characteristics of promising sequencing machines
- Pacific Biosciences SMRT® (Single Molecule, Real-Time)
- Oxford Nanopore MinION
Do not focus on one technology: tomorrow there will be a new one!
What matters is the expertise…
CAT-seq project
AIP Bio-ressource; IBISA, 6-month fixed-term contract (E. Blondet)
Design: WT vs PPR mutant (CRR4); flower buds vs leaves; 2 biological replicates
Technologies compared: directional mRNA-seq on HiSeq/GAII (1 sample/lane or 5 samples/lane), CATMA v6, CATMA v5, tiling array, Q-RT-PCR
Deliverables:
- Upgrade of … and FLAGdb++ for NGS data management
- RNA-seq analysis (as a member of the BIOS network)
- Development of statistical methods for transcript quantification
- Clustering of transcriptomic data from hybridizations/RNA-seq (collab. with G. Celeux, INRIA)
- Detection of post-transcriptional maturations in PPR mutants
- Comparison of RNA-seq results with classical microarray approaches
CATMAv6 description
| Feature                | Probe number |
|------------------------|--------------|
| TAIR annotated genes   | 30 834       |
| EUGENE annotated genes | 1289         |
| Repeat elements        | 5352         |
| miRNAs                 | 658          |
| Other RNAs             | 342          |
| Controls               | 36           |
- 1 probe per gene, in triplicate
- Both strands
- 12 hybridizations on one slide
- New amplification/labeling/hybridization protocols
- New statistical method
Differential analysis
Raw intensities → normalization (no background subtraction; control of technical bias) → normalized intensities → differential analysis (variance estimation with Limma) → false positive control (FPC: FDR with m0 adjustment) → differentially expressed genes with p-value (ratio)
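The false positive control step is described only as "FDR with m0 adjustment", without naming the estimator. A minimal sketch, assuming a Storey-type estimate of m0 (the number of genes that are truly not differentially expressed) plugged into the Benjamini-Hochberg step-up rule; λ = 0.5 and the function names are illustrative choices, not the platform's code.

```python
from typing import List, Optional, Sequence


def estimate_m0(pvalues: Sequence[float], lam: float = 0.5) -> float:
    """Storey-type estimate of m0, the number of true null hypotheses.

    Null p-values are uniform on [0, 1], so about m0 * (1 - lam) of them
    should exceed lam. The +1 is a finite-sample correction; the result
    is capped at m, the total number of tests.
    """
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(float(m), (above + 1) / (1.0 - lam))


def adaptive_bh(pvalues: Sequence[float], alpha: float = 0.05,
                lam: float = 0.5, m0: Optional[float] = None) -> List[bool]:
    """Benjamini-Hochberg step-up rule with m replaced by an estimate of m0.

    Returns one boolean per p-value: True means the gene is declared
    differentially expressed at FDR level alpha. Passing m0=len(pvalues)
    gives the classical (non-adaptive) BH procedure.
    """
    m = len(pvalues)
    if m == 0:
        return []
    if m0 is None:
        m0 = estimate_m0(pvalues, lam)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k such that p_(k) <= k * alpha / m0 (step-up rule).
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m0:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

When many genes truly change, the estimated m0 is smaller than m, so the adaptive rule declares more genes at the same nominal FDR than classical BH.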
CATMAv6 Q-PCR validation
[Figure: qPCR Ct vs CATMA log-ratios on 228 genes; fitted line y = 1.0427x + 0.2718, R² = 0.7395]
Very good correlation with qPCR, more accurate than the previous CATMA-v5 version
Sample: BF vs F
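The validation plots summarize agreement between two technologies with a least-squares line and its R². A minimal stdlib sketch of that computation; the function name is hypothetical and no data from the slide are used.

```python
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r2), where r2 is the coefficient of
    determination; for a simple linear fit it equals the squared
    Pearson correlation between x and y.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2
```

Applied to per-gene CATMA log-ratios (x) and qPCR values (y), it returns the kind of slope, intercept and R² printed on the plot.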
URGV RNA-seq bioinformatics pipeline
Reads → trimming (no N, length > 30, quality …) → assembly with Velvet + Oases + TGICL (4 days) → contigs → mapping with Bowtie2 (6 h) → read counting per contig or mRNA model* (1 h) → read number per contig or mRNA model → DESeq normalization / differential analyses
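The trimming criteria (no N, length > 30, quality …) can be sketched as a FASTQ read filter. The quality criterion is elided on the slide, so here it is a required parameter; using the mean Phred score and Phred+33 encoding are assumptions, and all function names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\n")
            next(it)  # '+' separator line
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        yield header.rstrip("\n"), seq, qual


def keep_read(seq: str, qual: str, min_mean_quality: float,
              min_length_exclusive: int = 30, phred_offset: int = 33) -> bool:
    """Slide filters: no N, length > 30, plus an assumed mean-quality test."""
    if len(qual) != len(seq):
        raise ValueError("sequence and quality lengths differ")
    if "N" in seq.upper():
        return False
    if len(seq) <= min_length_exclusive:
        return False
    scores = [ord(c) - phred_offset for c in qual]
    return sum(scores) / len(scores) >= min_mean_quality


def filter_reads(lines: Iterable[str], min_mean_quality: float) -> Iterator[Record]:
    """Stream the records of a FASTQ file that pass keep_read."""
    for header, seq, qual in read_fastq(lines):
        if keep_read(seq, qual, min_mean_quality):
            yield header, seq, qual
```

For example, iterating `filter_reads(open(path), min_mean_quality=20)` keeps passing reads; the value 20 is only illustrative, not the pipeline's threshold.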
BIOS computer: 96 GB RAM, 10 processors
Output: list of differentially expressed genes (p-value), 1 h
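DESeq normalizes read counts with per-sample size factors computed by the median-of-ratios method: each count is divided by the gene's geometric mean across samples, and the median of these ratios over genes gives the sample's factor. The stdlib sketch below illustrates that idea only; it is not the DESeq package, and the function names are hypothetical.

```python
import math
from statistics import median
from typing import List, Sequence


def size_factors(counts: Sequence[Sequence[int]]) -> List[float]:
    """Median-of-ratios size factors.

    counts[i][j] is the read count of gene i in sample j. Genes with a zero
    count in any sample have a zero geometric mean and are skipped.
    """
    n_samples = len(counts[0])
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c == 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has non-zero counts in every sample")
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts: Sequence[Sequence[int]], factors: Sequence[float]) -> List[List[float]]:
    """Divide each sample's counts by its size factor."""
    return [[c / s for c, s in zip(row, factors)] for row in counts]
```

Working on log scale avoids overflow of the geometric mean, and the median makes the factors robust to a few strongly differentially expressed genes.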
Evaluation of the assembly pipeline
Reads (57 M) → trimming → reads (55 M) → assembly → contigs (56177) → mapping → reads/contig (53378)
97% of contigs map to annotated genes (TAIR annotation):
- 90% of contigs confirm gene models
- 7% of contigs suggest other gene models
- on average, 2 or 3 contigs per gene
- on average, contigs cover 78% of the mRNA length
3% of contigs map outside annotated genes (TAIR): new genes?!
[FLAGdb++ screenshot: contigs aligned with CDS and mRNA models]
Production of high-quality RNA contigs
Even with a well-annotated genome like Arabidopsis, new transcriptional units are detected
RNA-seq is a very powerful tool for gene discovery
Sample: Col0, leaves, GAII
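The statement that contigs cover 78% of the mRNA length on average can be computed by merging the contig alignments on each mRNA. A sketch, assuming alignments are already expressed as half-open intervals in mRNA coordinates; averaging the result over genes gives a figure comparable to the 78%. The function name is hypothetical.

```python
from typing import Iterable, Tuple


def covered_fraction(mrna_length: int, contigs: Iterable[Tuple[int, int]]) -> float:
    """Fraction of an mRNA covered by the union of contig alignments.

    Intervals are half-open [start, end) in mRNA coordinates; overlapping
    contigs are merged so shared bases are counted once.
    """
    if mrna_length <= 0:
        raise ValueError("mrna_length must be positive")
    clipped = sorted((max(0, s), min(mrna_length, e)) for s, e in contigs)
    covered = 0
    cur_start, cur_end = None, None
    for s, e in clipped:
        if e <= s:
            continue  # empty interval, or entirely outside the mRNA
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)  # overlapping or adjacent: extend
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / mrna_length
```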
Alternative splicing detection
[FLAGdb++ screenshot, melon: contigs aligned with CDS and mRNA models]
No mature mRNA detected? (linked to read number)
mRNA-seq: very powerful for detecting alternative splicing forms
Splicing forms remain difficult to quantify
Detected genes vs read number
| Millions of reads | Detected genes |
|-------------------|----------------|
| 0.5               | 16507          |
| 2.5               | 19041          |
| 5                 | 19926          |
| 10                | 20748          |
| 20                | 21487          |
| 27.5              | 21739          |
| 40                | 22015          |
| 190               | 22600          |
No need for a large number of reads to perform gene detection
RNA-seq has better detection sensitivity than microarrays (the plot also marks the number of genes detected in the same sample using CATMAv6)
Sample: Col0 leaves; GAII and HiSeq, single reads (SR)
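A detection curve like the table above is typically obtained by randomly subsampling mapped reads at increasing depths and counting the genes that still receive reads. A sketch under that assumption; the slide does not give the detection threshold, so min_reads=1 is illustrative, and the function names are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def detected_genes(read_gene_ids: Sequence[str], n_reads: int,
                   min_reads: int = 1, seed: int = 0) -> int:
    """Number of genes with at least min_reads reads in a random subsample.

    read_gene_ids holds one gene identifier per mapped read; n_reads reads
    are drawn without replacement.
    """
    if not 0 <= n_reads <= len(read_gene_ids):
        raise ValueError("n_reads must be between 0 and the number of reads")
    rng = random.Random(seed)
    counts = Counter(rng.sample(list(read_gene_ids), n_reads))
    return sum(1 for c in counts.values() if c >= min_reads)


def detection_curve(read_gene_ids: Sequence[str], depths: Sequence[int],
                    min_reads: int = 1, seed: int = 0) -> List[Tuple[int, int]]:
    """(depth, detected genes) pairs, one subsample per depth."""
    return [(d, detected_genes(read_gene_ids, d, min_reads, seed)) for d in depths]
```

Holding one identifier per read is memory-hungry at 190 million reads; it keeps the sketch short rather than being a production design.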
Read distribution
[Figure: % of reads vs detected gene fraction, genes sorted by decreasing expression level (%)]
Sample: Col0 leaves, HiSeq2000 PE
Very biased read distribution: 114 million reads are assigned to the 1500 most expressed genes
Many genes have only one or two reads, which makes them difficult to quantify
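The bias described above can be quantified from per-gene read counts: the share of reads taken by the N most expressed genes (N = 1500 on the slide) and the number of genes supported by only one or two reads. A minimal sketch; the function names are hypothetical.

```python
from typing import Sequence


def top_genes_read_share(counts: Sequence[int], top_n: int) -> float:
    """Fraction of all reads assigned to the top_n most expressed genes."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads")
    return sum(sorted(counts, reverse=True)[:top_n]) / total


def low_count_genes(counts: Sequence[int], max_reads: int = 2) -> int:
    """Number of detected genes with only 1..max_reads reads."""
    return sum(1 for c in counts if 0 < c <= max_reads)
```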
CATMAv6 intensities vs RNA-seq counts
[Figure: CATMA intensities (log2) vs RNA-seq counts (log2(RPKM)); samples: leaves]
Very large dynamic range of RNA-seq
Good correlation for genes with CATMA intensities above 8: the hybridization background impairs the quantification of poorly expressed genes by microarrays
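The plot's x axis is log2(RPKM): read counts normalized by transcript length (per kilobase) and sequencing depth (per million mapped reads). A minimal sketch; the slide does not state whether the denominator is all mapped reads or only reads assigned to genes, so it is a parameter, and the function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def rpkm(counts: Sequence[int], lengths_bp: Sequence[int],
         total_mapped: Optional[int] = None) -> List[float]:
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = count * 10**9 / (length_bp * total_mapped). If total_mapped is
    not given, the sum of the counts is used (an assumption).
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths must have the same length")
    n = sum(counts) if total_mapped is None else total_mapped
    if n <= 0:
        raise ValueError("total number of mapped reads must be positive")
    return [c * 1e9 / (length * n) for c, length in zip(counts, lengths_bp)]


def log2_or_none(values: Sequence[float]) -> List[Optional[float]]:
    """log2 of each value; None for zero values, which cannot be plotted."""
    return [math.log2(v) if v > 0 else None for v in values]
```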
RNA-seq read count reproducibility
[Figure: BF2 vs BF1 RNA-seq counts (log2); R² = 0.9497]
Very good reproducibility of RNA-seq counts between biological replicates, which suggests excellent technical reproducibility.
RNA-seq and CATMAv6 log-ratio reproducibility
[Figure: BF2/F2 vs BF1/F1 log-ratios, one panel for RNA-seq and one for CATMAv6; fitted lines y = 0.9379x + 0.0058 (R² = 0.806) and y = 0.9069x - 0.0149 (R² = 0.7988)]
After removing genes with an indeterminate log-ratio in one of the two replicates, both technologies show good log-ratio reproducibility.
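The comparison above depends on dropping genes whose log-ratio is indeterminate (a zero value in either condition) before comparing replicates. A sketch, assuming the inputs are already normalized per-gene values; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def paired_log_ratios(bf1: Sequence[float], f1: Sequence[float],
                      bf2: Sequence[float], f2: Sequence[float]) -> List[Tuple[float, float]]:
    """(log2(BF1/F1), log2(BF2/F2)) for genes with a defined ratio in both replicates.

    A log-ratio is indeterminate when either value is zero, so such genes are dropped.
    """
    pairs = []
    for a1, b1, a2, b2 in zip(bf1, f1, bf2, f2):
        if min(a1, b1, a2, b2) <= 0:
            continue
        pairs.append((math.log2(a1 / b1), math.log2(a2 / b2)))
    return pairs


def r_squared(pairs: Sequence[Tuple[float, float]]) -> float:
    """Squared Pearson correlation between the two replicates' log-ratios."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy * sxy / (sxx * syy)
```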
Differentially expressed genes in RNA-seq and CATMA
|                                        | Bud vs leaves | CRR4 vs Col |
|----------------------------------------|---------------|-------------|
| DE genes, CATMAv5                      | 4527          | 0           |
| DE genes, CATMAv6                      | 16226         | in progress |
| RNA-seq (one lane/sample)              | 8799          | 80          |
| RNA-seq (multiplexed, 5 samples/lane)  | 7896          | NA          |
- CATMAv6 detects almost twice as many differentially expressed genes as RNA-seq
- 84% of the genes found differentially expressed by RNA-seq are also found by CATMAv6
- 1100 genes are lacking when 5 samples are multiplexed per lane
2 biological replicates
CATMAv5: common variance, Bonferroni 5%
CATMAv6: Limma, FDR 5%
RNA-seq: DESeq, BH 5%
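The overlap statements above amount to set operations on lists of DE gene identifiers: the 84% figure is the fraction of RNA-seq DE genes also present in the CATMAv6 list, and one way to count the genes lacking with multiplexing is the set difference between the single-lane and multiplexed lists. A minimal sketch; the function names are hypothetical.

```python
from typing import Set


def fraction_included(query: Set[str], reference: Set[str]) -> float:
    """Share of genes in `query` that also belong to `reference`."""
    if not query:
        raise ValueError("query set is empty")
    return len(query & reference) / len(query)


def missing_genes(single_lane: Set[str], multiplexed: Set[str]) -> Set[str]:
    """Genes called DE with one sample per lane but not after multiplexing."""
    return single_lane - multiplexed
```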
Distribution of differentially expressed gene intensities by microarray or RNA-seq
[Figure: number of DE genes per log2 CATMA intensity class (F), from <6 to 15-16, for DE CATMA, DE RNA-seq single (one sample/lane) and DE RNA-seq multi (multiplexed)]
Samples: buds vs leaves; 190 million reads; HiSeq2000 PE
Surprising lack of power of RNA-seq at medium to high intensities
At low expression levels (in the CATMAv6 background range), RNA-seq detects more differentially expressed genes
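The histogram groups DE genes by their log2 CATMA intensity in the F sample, in one-unit classes from "<6" to "15-16". A sketch of that binning, assuming left-closed classes (a gene at exactly 7.0 falls in "7-8") and an extra ">=16" class not shown on the slide; applied separately to the CATMA, RNA-seq single and RNA-seq multi DE lists, it gives the three series. Function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def intensity_class(log2_intensity: float) -> str:
    """Map a log2 CATMA intensity to the classes used on the histogram."""
    if log2_intensity < 6:
        return "<6"
    low = math.floor(log2_intensity)
    if low >= 16:
        return ">=16"  # beyond the last class shown on the slide (assumption)
    return f"{low}-{low + 1}"


def de_genes_per_class(intensities: Iterable[float]) -> Dict[str, int]:
    """Count DE genes per intensity class."""
    return dict(Counter(intensity_class(x) for x in intensities))
```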
qPCR vs RNA-seq and CATMAv6
[Figure: qPCR Cq vs log-ratios for CATMAv6 (V6.1 limma) and RNA-seq, with linear fits; values shown: y = 0.7606x + 0.6005 (R² = 0.8286), y = 1.2933x - 0.2356 (R² = 0.6988), and R² = 0.8422]
On 142 genes; sample: BF vs F
Similar correlation of RNA-seq and CATMAv6 with qPCR
The additional differentially expressed genes detected by CATMAv6 are true ones!
qPCR remains the most sensitive technology!
Conclusions & perspectives
Perspectives:
- Analysis of samples with few differentially expressed genes: CRR4 vs WT
- Build oriented (directional) libraries
- Find post-transcriptional RNA modifications (CRR4, chloroplast)
- Study the reproducibility of library construction (technical replicate sequencing)
- Comparison with tiling array data
- …
Conclusions:
- RNA-seq is more sensitive at low expression levels
- RNA-seq allows genome annotation with a low number of reads
- RNA-seq provides information on splicing events
- RNA-seq is the best tool for gene detection
- Compared with the CATMAv6 array, and for Arabidopsis, the statistical methods for differential analysis of RNA-seq lack power, BUT these methods are evolving rapidly (J. Aubert et al., submitted)
- RNA-seq remains expensive (e.g. 4 samples: RNA-seq 2320€ vs CATMAv6 860€)
- RNA-seq analysis (assembly, clustering…) requires heavy computing facilities
Intelligent use of transcriptomic technologies!
What is your biological question? Quantification, splicing, new gene discovery, annotation, small RNAs, etc.
What have you already got? A unigene set, an automatically annotated genome, a functional annotation, nothing
What are your options?
- Sequencing technologies
- High-density microarrays
- Multiplexing samples
- Tiling microarrays
- Library protocol
- RNA amplification protocol
- Assembly or direct mapping
- Sequencing length
- Paired-end/single reads
- Custom-made microarrays
Transcriptomic Platform pipeline
Transcriptome choice (method) → experimental design → library/sequencing/hybridizations → statistical analysis → expertise/help with result interpretation
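Transcriptome technologies available on the platform
|            | Illumina                                                        | Life Technologies                                                          |
|------------|-----------------------------------------------------------------|----------------------------------------------------------------------------|
| Sequencer  | HiSeq2000 / GAII (IG-CNS)                                       | Ion Proton (Sept. 2012)                                                    |
| Library    | TruSeq (Small TruSeq)                                           | Ion Total RNA-Seq + Small Kit (directional)                                |
| Cost       | 200€/library + 1400€/HiSeq lane + 100€/sample for bioinformatics | around 200€/library + 1100€/Proton II chip + 100€/sample for bioinformatics |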
Other Transcriptomic Platform developments (RNA-seq and microarrays)
- Design/hybridization/analysis of high-density NimbleGen arrays for cultivated plants:
  • Melon: D. Hosemans, VCo (KBBE)
  • Pea: J. Burstin, Dijon (ANR)
  • Medicago: P. Gamas / J. Gouzy / J. Buitink, Toulouse/Angers (collab.)
  • Vitis: M. Delledonne (Verona)
- Micro-dissected samples:
  • Brachypodium: R. Sibout (collaboration, AFFY chips)
  • A. thaliana: JD Faure (ANR Regeneome, CATMA arrays)
- Individual cell transcriptome:
  • Plant RNA analysis using biosynthetic tagging technology: R. Berthome, URGV
  • INTACT system (Dev Cell, 2010), collab. B. Dubreucq
- Non-coding RNA (non-polyA RNA):
  • A. thaliana: A. Dietrich (IBMP, collaboration, CATMA array); E. Delannoy, URGV
Platform organization chart 2012
Scientific Council
- S. Balzergue: Platform Manager, Research Engineer
- L. Soubigou-Taconnat: Research Assistant; CATMAv6, A. th tiling, mRNA-seq, quality
- S. Huguet: Technician; AFFYMETRIX, Roche-NimbleGen
- S. Pateyron (half-time): Technician; CATMAv6, A. th tiling
- J. Caius (half-time): Technical Assistant; AFFYMETRIX
- E. Delannoy: Research Scientist
- C. Lurin: Group Leader, Research Director
- S. Aubourg: Group Leader, Research Director
DUA URGV; Bioinformatics
Bioinformatics and statistics support ("Bioinformatics & Predictive Genomics team"):
V. Brunaud, J-Ph Tamby, M-L Martin-Magniette, Ph. Grevet, G. Rigaill, O. Rogier* and S. Aubourg.
Thank you for your attention
http://www-urgv.versailles.inra.fr/microarray/index.htm
"Les journées transcriptome de l'URGV-Genopôle" (URGV-Genopôle transcriptome days)
May 21-22, 2012, Evry
https://colloque4.inra.fr/journees_transcriptome_urgv