Identification of long non-coding RNAs (lncRNAs) using RNASeq in dogs
IGDR - UMR6290 - CNRS - Université de Rennes 1
Canine Genetics Group - Catherine André
Thomas DERRIEN
December 10th

Non-coding Genome
• 80% of the variants associated with disease (by GWAS) are located outside of protein-coding genes (Manolio et al., Hindorff et al.)
• >60% of the human genome is covered by processed transcripts (~75% by primary transcripts), with only 2% corresponding to proteins (ENCODE Consortium; Djebali et al., Nature, 2012)
• Back to the future: the cell as an RNA machinery (from Amaral P, et al., 2008)

Type          Functions
miRNAs        Regulation of gene expression
siRNAs        RNA interference pathway
snoRNAs       Chemical modification of rRNA, tRNAs and small RNAs
piRNAs        Transposon defense - regulate euchromatin formation
snRNA         Splicing, regulation of TFs, telomere stability...
...           ...
long ncRNAs   Various

What is known about lncRNAs
• Definition: transcripts without coding potential, >200 nt, spliced, polyA+/- (Derrien et al., 2012)
• Annotation in human: e.g. GENCODE reference annotation (Harrow et al., 2012) (1000 Genomes Project)
[Figure: number of protein-coding genes and lncRNA genes (y-axis: number of genes, 0 to 25,000) across annotation releases from July 2009 to August 2012]
• "Famous" lncRNAs: XIST, H19, HOTAIR... (Guttman et al., Duret et al., Navarro et al., Ponting et al.)
• Known functions: regulation of mRNA expression, X chromosome inactivation, imprinting...

LncRNAs Functions (broad overview)
• Can enhance or repress transcription of targeted mRNA(s)
• Can act in cis or in trans
• Serve as "flexible scaffolds". Examples:
  - XIST: binds PRC2 (DNMT3A) => DNA hypermethylation => silencing of the X chromosome
  - HOTTIP: binds MLL1 => H3K4me3 => activation of HOXA genes
(from Mattick JS, et al., 2010)

➡ RNASeq in dogs
➡ FEELnc : Annotation of lncRNAs
➡ Characterization of canine lncRNAs set

Dog and non-coding genome
• Unique evolutionary history
• High heterogeneity between breeds vs. high homogeneity within a breed
• One breed = one genetic isolate
• Facilitates the identification of genotype/phenotype relationships
➡ Annotate ncRNAs to exploit the strength of the dog model to identify genotype/phenotype relationships

How to annotate lncRNAs: RNASeq
• RNASeq: high-throughput sequencing of all RNA molecules of a cell line or a tissue at a specific time point.
• RNASeq experiment for bioinformaticians (skipping all the different steps/protocols...):
Fragments of RNA (cDNA) sequences
Library construction
@BRAIN_1_R.1 1 length=76
TATACATAAGCAGGTACCCACAAGGCAAGGTAGGACAGTTACTGTAGCTAATGAAAGAAAAAAGTCAGGGTGAGGA
+BRAIN_1_R.1 1 length=76
CCCFFFFFHHHHHJEHJJIJJJJJJIJJJJFGIIJJJJIJIJJJIJJJJJJJJJJJJJJJJJJJGHHHHH?BBFEC
@BRAIN_1_R.2 2 length=76
TGCATATATCACTTTTATTGGTAAATCCGCATTTCTTAGCTTAGAGACATATTCTGTAGATTATTCCCCTCCCCCT
+BRAIN_1_R.2 2 length=76
CCCFFFFFHHHHHJJJJJJJJIGIIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJJJJIJJJJJGHIJHIJJJJE
@BRAIN_1_R.3 3 length=76
TGCATATATCACTTTTATTGGTAAATCCGCATTTCTTAGCTTAGAGACATATTCTGTAGATTATTCCCCTCCCCCT
+BRAIN_1_R.3 3 length=76
CCCFFFFFHHHHHJJJGIGIICHHHGJJJIJJJJJJJJHIIIIIJEIIIICHGIJIJIIBGHEIJIGHGEEHIIIB
@BRAIN_1_R.4 4 length=76
GAAGTGTAATCACATTTAGTTTCAAAAGTTCAAATGCCTGTTCCTGTTATACATAAGCAGGTACCCACAAGGCAAG
+BRAIN_1_R.4 4 length=76
BCCFDFFEHHHHHJGJJIJIJJJJIJJJHIJJJIIJJJJJIJJJJIIJJJJJJJJJIIJIIDFGIIIJJJJJGGHC
@BRAIN_1_R.5 5 length=76
AAGGTTTGCCCTCTTTTCTCTGAAACTTCTAGGTATTTTTAAGTTCCAGCTGGTTCTCTGCTCTGCCATAAACGAG
+BRAIN_1_R.5 5 length=76
@CCFDFFFGHHFHGIIIGIEHGHGGGGHGIJIJ:EHIIJIJIIJGHIGHIJJIHIJIHGHIIJJJIJJIIG>HIIG
@BRAIN_1_R.6 6 length=76
GAAGTAACCGCCTTTCCTGGAGGAGTGGGTGGTCTCCGCTACAATCTCATCTGCCTCCTCTCCTGAAACAGGACTG
+BRAIN_1_R.6 6 length=76
BBCFDDEDHHHHHJJJJJJJJJJIJFGIJHIIIJJJJJJJJJJIJJJJJJJJHGHHHHFFFFFEEECEDDDDDDDB
@BRAIN_1_R.7 7 length=76
GGAAATATCAGAAGTAAAAGAGTAAATGGGAAGAGGCCAAGGATGTATTCGTCCAACGGATATTAAAATGTCCTTT
+BRAIN_1_R.7 7 length=76
CCCFFFFFHHHHHJHIJJJJIJCHHIJJJJJJJJJJIJIJJJJJJHIIJJJIJJJIJJJIJJJJJJJHHHHHHHFB
@BRAIN_1_R.8 8 length=76
TGGCGCCCTGCCTGGCTCCATTAAAACAATTACCACCCTTTTGGGATCATCTACACTTCTGCTATGTCCTCTCCCT
+BRAIN_1_R.8 8 length=76
CCCFFFFFHHHHHJJJJIJIJJJIJJGJJJJJJJJJJJJJJJJIJIJJJIJJJIJJJJJJJJHEHHHHGFFFFFFC
@BRAIN_1_R.9 9 length=76
TGGCGCCCTGCCTGGCTCCATTAAAACAATTACCACCCTTTTGGGATCATCTACACTTCTGCTATGTCCTCTCCCT
+BRAIN_1_R.9 9 length=76
CCCFFFFFHHHHHJJJJJJJJJIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIHHHHHHHFFFFFFA
@BRAIN_1_R.10 10 length=76

I. RNASeq samples available in dog
CaniDNA BioBank:
➡ 34 samples
   ➡ 28 from dogs at GIGA (Liège)
   ➡ 6 from dogs at CNG (Evry)
   ➡ Unstranded
➡ 24 samples
   ➡ 18 from dogs
   ➡ 6 from wolves
   ➡ Stranded and not stranded

58 RNAseq, 33 dogs, 10 breeds, 17 tissues
paired-end: 2x75 bp / 2x100 bp
~30-60 million reads per RNA-seq
~3 billion reads
~300 billion nucleotides

#    TISSUE            Samples
1    BLOOD             1X
2    CELL_DC           1X
3    CELL_LINE         1X
4    HEART             1X
5    LIVER             1X
6    MUSCLE            1X
7    OVARY             1X
8    TESTIS            1X
9    KIDNEY            2X
10   MUCOSA            2X
11   SPLEEN            3X
12   LYMPHATIC_NODE    4X
13   LUNG              5X
14   SKIN              5X
15   BRAIN             7X
16   BRAIN_CORTEX      11X
17   MUCOSA_ORAL       11X

Pipeline for dog RNASeq analysis (Christophe Hitte)
Dog reference annotation: Ensembl (v70)
RNASeq_file (.fastq) [stats]
  Cleaning: fastqc + sickle...
Cleaned sequences (.fastq) [stats]
  Mapping: tophat2, bowtie2
Mapped files (.bam) [stats]
  Transcriptome reconstruction: Cufflinks2
Known and novel transcripts (.gtf), expression levels (.fpkm) [stats]

Example of Brain (cortex) RNASeq
Current dog annotation vs. one RNASeq experiment
BRAIN RNASeq
- #Genes: 29,878
- #tcpts: 44,831
[Figure: genome browser view (canFam3), chr6:9,525,500-9,533,500, x-axis: genomic positions (bp). Tracks: BROAD2_BRAIN.transcripts_gt0_ENSv70.gtf (CUFF.25557.1 to CUFF.25557.4), LncRNAs_merged58_v70, RefSeq Genes, Ensembl Gene Predictions (archive Ensembl 70, Jan 2013: ENSCAFT00000023568, ZNF3-201), RepeatMasker repeats, gap locations]
=> RNASeq makes it possible to annotate new isoforms with respect to the current reference annotation

Example of Brain (cortex) RNASeq: new transcript
[Figure: genome browser view (canFam3), chr9:60,950,000-61,040,000. Tracks: BROAD2_BRAIN.transcripts_gt0_ENSv70.gtf (CUFF.30318.1, CUFF.30324.1), RefSeq Genes (GGTA1), Ensembl Gene Predictions (archive Ensembl 70, Jan 2013: ENSCAFT00000043699), RepeatMasker repeats, gap locations]
=> RNASeq makes it possible to annotate new (expressed) transcripts
=> Are these lncRNAs?
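
The distinction between known and novel transcripts drawn in the two brain examples can be sketched in Python as a simple coordinate-overlap test against the reference annotation. This is only an illustration under stated assumptions: the names `Interval`, `overlaps` and `split_known_novel` are hypothetical, the coordinates are invented for the example, and real comparison tools also look at exon and intron structure rather than whole-transcript spans.

```python
from typing import Dict, List, NamedTuple, Tuple


class Interval(NamedTuple):
    """Genomic span of a transcript (hypothetical minimal model)."""
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if both intervals are on the same chromosome and share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def split_known_novel(
    assembled: Dict[str, Interval],
    reference: Dict[str, Interval],
) -> Tuple[List[str], List[str]]:
    """Split assembled transcripts into those overlapping a reference
    transcript ("known" loci) and those overlapping none ("novel")."""
    known, novel = [], []
    for name, span in assembled.items():
        if any(overlaps(span, ref) for ref in reference.values()):
            known.append(name)
        else:
            novel.append(name)
    return known, novel


# Invented coordinates, for illustration only.
reference = {"ref_tx": Interval("chr6", 9_526_000, 9_532_000)}
assembled = {
    "asm_1": Interval("chr6", 9_525_800, 9_531_500),    # overlaps ref_tx
    "asm_2": Interval("chr9", 60_960_000, 60_975_000),  # no reference here
}

known, novel = split_known_novel(assembled, reference)
print("known loci:", known)
print("novel loci:", novel)
```

Transcripts that end up in the novel list are only candidates; whether they are lncRNAs still depends on the filtering and coding-potential steps.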

➡ RNASeq in dogs
➡ FEELnc : Annotation of lncRNAs
➡ Characterization of canine lncRNAs set

FEELnc : Fast and Effective Extraction of LncRNAs
RNASeq Experiment(s)
I- FEELnc_Filter
II- FEELnc_CodingPot
III- FEELnc_Classifier
LncRNAs

FEELnc : Filters
Input: merged 58 dog RNASeq samples, known and novel transcripts
- #tcpts: 300,735
- #genes: 140,007
I- FEELnc_Filters
- transcripts overlapping annotated mRNA exons
- size > N bp [default N=200]
- monoexonic transcripts
- Options
  - transcripts overlapping an mRNA locus (to get lincRNAs)
  - ...
Filtered out:
- 153,910 overlap mRNAs (in sense)
- 111,447 mono-exonic transcripts (from unstranded RNASeq)
- 3,940 with length lower than 200 bp
Candidate lncRNAs
- #tcpts: 31,157

FEELnc : Coding potential
Input: candidate lncRNAs
- #tcpts: 31,157
II- FEELnc_CodingPot
- Combination of 1 to 4 dedicated programs: CPC, TxCdsPredict, CPAT, Geneid
- CPC: Coding Potential Calculator; blast on protein database
- CPAT: Coding-Potential Assessment Tool; hexamer frequency + ORF length analysis + codon usage bias
- GeneId - TxCdsPredict: HMM trained on mRNAs...
Get intersection/union and construct Venn diagram
[Figure: Venn diagram]
Dog stringent set of lncRNAs
- #tcpts: 18,051
- #genes: 9,810

FEELnc : Classifier
• Classifying the genomic context of lncRNAs with respect to mRNAs could help predict functionality
Input: dog set of lncRNAs
- #tcpts: 18,051
- #genes: 9,810
III- FEELnc_Classifier: classify bona fide lncRNAs
- Intergenic (lincRNA), 3 classes = 14,726
  - Divergent: 5,497
  - Convergent: 2,777
  - Same orientation: 6,452
- Genic (mRNA overlap), 5 classes = 3,325
  - Exon (AS): 1,920
  - Intron (S/AS): 57/1,018
  - Contain (S/AS): 129/201
[Figure: schematic overlapping scenarios between lncRNA exons and coding exons: bidirectional promoter, exonic AS, intronic, contain]
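
The filter criteria and the three intergenic classes can be sketched as follows. This is a minimal Python illustration, not the FEELnc implementation: the `Transcript` class and both function names are hypothetical, only the length and exon-count criteria are shown, and the orientation logic assumes the usual definitions (divergent = opposite strands, transcribed away from each other; convergent = opposite strands, transcribed toward each other).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Transcript:
    """Hypothetical minimal transcript model."""
    name: str
    chrom: str
    start: int                     # 1-based, inclusive
    end: int                       # 1-based, inclusive
    strand: str                    # "+" or "-"
    exon_lengths: Tuple[int, ...]  # length of each exon in bp

    @property
    def length(self) -> int:
        """Spliced length: sum of exon lengths."""
        return sum(self.exon_lengths)


def passes_filter(tx: Transcript, min_len: int = 200) -> bool:
    """Keep transcripts with size > min_len bp and more than one exon."""
    return tx.length > min_len and len(tx.exon_lengths) > 1


def classify_intergenic(lnc: Transcript, mrna: Transcript) -> str:
    """Classify a non-overlapping lncRNA relative to a neighbouring mRNA."""
    if lnc.chrom != mrna.chrom:
        raise ValueError("transcripts are on different chromosomes")
    if lnc.start <= mrna.end and mrna.start <= lnc.end:
        raise ValueError("transcripts overlap: genic, not intergenic")
    if lnc.strand == mrna.strand:
        return "same_orientation"
    # Opposite strands: look at the strand of the leftmost transcript.
    left = lnc if lnc.end < mrna.start else mrna
    # Leftmost on "-" means both are transcribed away from each other.
    return "divergent" if left.strand == "-" else "convergent"


# Invented coordinates, for illustration only.
mrna = Transcript("mRNA_1", "chr1", 1000, 5000, "+", (300, 200, 400))
lnc_a = Transcript("lnc_a", "chr1", 100, 800, "-", (150, 120))
lnc_b = Transcript("lnc_b", "chr1", 6000, 7000, "-", (400, 300))
lnc_c = Transcript("lnc_c", "chr1", 6000, 6150, "+", (150,))

for lnc in (lnc_a, lnc_b, lnc_c):
    print(lnc.name, passes_filter(lnc), classify_intergenic(lnc, mrna))
```

A full classifier would also handle the genic classes (exonic, intronic, containing; sense or antisense), which require exon coordinates rather than whole-transcript spans.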

Dog lncRNAs: a few examples
• Dog XIST: not annotated by the Ensembl reference annotation (partially by comparative genomics)
[Figure: genome browser view (canFam3), chrX:57,320,000-57,355,000. Tracks: user-supplied track (TCONS_00297821 to TCONS_00297831), RefSeq Genes, Ensembl Gene Predictions (archive Ensembl 70, Jan 2013: ENSCAFT00000045197, ENSCAFT00000048497), Non-Dog RefSeq Genes (Bos XIST, Homo XIST), gap locations]
• CDKN2B-AS (ANRIL): associated by GWAS with many diseases (coronary disease, aneurysm, type 2 diabetes)
[Figure: genome browser view (canFam3), chr11:41,270,000-41,410,000. Tracks: LncRNAs_merged58_v70 (TCONS_00028371 to TCONS_00028403), RefSeq Genes (CDKN2B), Ensembl Gene Predictions (archive Ensembl 70, Jan 2013: ENSCAFT00000002632, ENSCAFT00000045034)]

➡ RNASeq in dogs
➡ FEELnc : Annotation of lncRNAs
➡ Characterization of canine lncRNAs set

Dog lncRNAs Characterization (i)
GERP (Genomic Evolutionary Rate Profiling) identifies constrained elements in multiple alignments by quantifying substitution deficits
=> LncRNA exons do not seem to be evolutionarily conserved relative to mRNAs
=> LncRNA promoters are as conserved as mRNA promoters

Dog lncRNAs Characterization (ii)
Proportion of lncRNA/mRNA transcripts found in 58 RNASeq experiments by increasing RPKM thresholds
[Figure: proportion of elements (%) by transcript type (LncRNAs, mRNAs) for RPKM thresholds 0, 0.5, 1, 5, 10, 100]
=> Lower level of expression compared to mRNAs

Dog lncRNAs Characterization (iii)
Dog lncRNA transcripts are more tissue-specific than mRNAs

Dog bidirectional lncRNAs and disease (J. Plassais)
• Sensory neuropathy: insensitivity to pain
• GWAS on 50 case/control dogs identifies a single locus
• Locus capture and NGS resequencing
• One mutation located in a lncRNA sharing a bidirectional promoter with a (very) interesting candidate gene
• Functional validation: qPCR + enhancer assay (Rory Johnson, CRG Barcelona)

Conclusions
• Annotation of a catalogue of ~18,000 lncRNAs in dogs
• Development of a bioinformatic method (FEELnc) to automate lncRNA identification using RNASeq
• As in humans, canine lncRNAs:
  1. are modestly conserved through evolution
  2. are less expressed compared to mRNAs
  3. tend to exhibit tissue-specific expression
• Integration of the lncRNA catalogue with ongoing research projects in the team allows the study (and experimental validation) of dog lncRNAs and highlights the importance of lncRNAs in genotype/phenotype relationships (disease)

Perspectives: Finding functionally important lncRNAs
➡ by comparative genomics: AutoGRAPH (http://autograph.genouest.org/)
   Using species-specific lncRNA and gene order conservation:
   - to increase the dog lncRNA repertoire
   - to find evolutionarily conserved lncRNAs
➡ by structure assessment: find lncRNAs that are more likely to be folded
   Collaboration with G. Rizk and D. Lavenier (Genscale, IRISA)
   - GPU acceleration because of long ncRNAs versus miRNAs

Integrating lncRNAs catalogue with cancer projects in the lab
Affected dogs / Healthy dogs
- Hist. Sarcoma
- Melanoma
- Lymphoma
- Mendelian diseases
RNASeq | LncRNA catalogue
• Variants in lncRNA sequences
• eQTL affecting lncRNA expression
• LncRNAs/mRNAs differentially expressed
• Fusion genes (Mathieu Bahin)
• …

ACKNOWLEDGEMENTS
- IGDR.
CNRS-UMR6290, Rennes: Christophe Hitte, Laetitia Lagoutte, Mathieu Bahin, Anne-Sophie Guillory, Benoit Hédan, Clotilde de Brito, Amaury Vaysse, Melanie Rault, Jocelyn Plassais, Ronan Ulvé, Edouard Cadieu, Morgane Bunel, Catherine ANDRÉ
- Unit of Animal Genomics, GIGA-R & Faculty of Veterinary Medicine, University of Liège: Benoit HENNUY, Wouter COPPIETERS
- BROAD Institute/Uppsala University: Jennifer MEADOW, Kerstin LINDBLAD-TOH
- Center for Genomic Regulation, Barcelona: Rory JOHNSON, Giovanni BUSSOTTI, Cédric NOTREDAME, Roderic GUIGÓ
- LUPA
- Genomic Plate-form, Rennes: Birama N’DIAYE, Marie DE TAYRAC, Marc AUBRY
- Biogenouest - Symbiose - Genscale Team: Fabrice Legeai, Claire Lemaître, Pierre Peterlongo, Guillaume Rizk, D. Lavenier, Olivier Collin et al.
- Centre National Genotypage, Paris: Diana ZELENIKA, Anne BOLAND

Parameters:
- RNASeq_file (.fastq)
- reference genome (e.g. canFam3)
- reference annotation (e.g. Ensembl v.70)
RNASeq_file (.fastq) [stats]
  Cleaning: fastqc or sickle or ...
Cleaned sequences (.fastq) [stats]
  Mapping: tophat-bowtie2 or gsnap
Mapped files (.sam) [stats]
  Samtools
Compressed mapped files (.bam) [stats]
  Transcriptome reconstruction: Cufflinks or Trinity
Annotated and novel transcripts (.gtf)
  Coding potential: CPC-CPAT-TxCdsPredict => mRNAs / ncRNAs
  Filtering: filter (length 200 bp)
  Characterization:
  - length
  - number of exons
  - comparison with mRNAs
  Classification:
  - intergenic
  - intragenic
  - exon overlap
  - intron overlap
  - sense versus antisense
LncRNAs / mRNAs
  Correlation of expression: expression in N tissues/development
  Structure prediction: interaction prediction via structure

Searching for functional LncRNAs
➡ Determine secondary structure for lncRNAs (computationally challenging task)
➡ Interactions lncRNAs:mRNAs
★ Correlation of expression lncRNAs:mRNAs: predict lncRNA-mRNA co-expression profiles

Genotype to phenotype relationship
Not only protein-coding genes...
- ENVIRONMENT
- Epigenetic
- Population (sub-)structure
- Non-coding transcriptome
- CNV
- ...

Why dogs?
• Unique population structure due to unique history
• One breed = one genetic isolate
• High heterogeneity between breeds whereas homogeneity within breeds
• Cancers homologous to human cancers
• Breed-specific cancers (high frequency within a breed, ≈ 20%)
• Spontaneous cancers (not induced, unlike in mouse models)
• Same environment as humans
• Easier access to samples
➡ Most traits are governed by a few variants with high phenotypic effects