CHAPTER 16

SMALL RNA DISCOVERY AND CHARACTERISATION IN EUKARYOTES USING HIGH-THROUGHPUT APPROACHES

Helio Pais, Simon Moxon, Tamas Dalmay and Vincent Moulton*

School of Computing Sciences, University of East Anglia, Norwich, UK

*Corresponding Author: Vincent Moulton, Email: [email protected]

Abstract: RNA silencing is a mechanism of genetic regulation that is mediated by short noncoding RNAs, or small RNAs (sRNAs). Regulatory interactions are established based on nucleotide sequence complementarity between the sRNAs and their targets. The development of new high-throughput sequencing technologies has accelerated the discovery of sRNAs in a variety of plants and animals. The use of these and other high-throughput technologies, such as microarrays, to measure RNA and protein concentrations of gene products potentially regulated by sRNAs has also been important for their functional characterisation. mRNAs targeted by sRNAs can produce new sRNAs, or the protein encoded by the target mRNA can regulate other mRNAs. In either case, the targeting sRNAs are parts of complex RNA networks; therefore, identifying and characterising sRNAs contributes to a better understanding of RNA networks. In this chapter we will review RNA silencing, the different types of sRNAs that mediate it and the computational methods that have been developed to use high-throughput technologies in the study of sRNAs and their targets.

RNA Infrastructure and Networks, edited by Lesley J. Collins. ©2011 Landes Bioscience and Springer Science+Business Media.

INTRODUCTION

Biological systems are regulated at multiple layers through a myriad of mechanisms. At the cellular level, normal function requires regulation of gene expression. One of the systems eukaryotic cells have in place to accomplish this task is RNA silencing, a process in which a complex formed by an RNA molecule and one or more proteins interacts either with a different RNA or with DNA, causing modifications in the rates of translation or transcription. The RNA molecules involved usually contain fewer than 30 nucleotides and are commonly called small RNAs (sRNAs).
In this chapter we will adopt this convention, although we note that there are other noncoding small RNAs that are not involved in RNA silencing (e.g., tRNAs, snoRNAs); these will not be described here. In prokaryotic organisms there are also noncoding RNAs that regulate gene expression, but the mechanisms by which they act are completely different from eukaryotic RNA silencing, and their review falls outside the scope of this chapter. For reviews on this topic see, for example, references 1-3.

One of the most important factors in the explosive growth of knowledge in the RNA silencing field has been the application of new DNA sequencing technologies. In comparison to conventional Sanger sequencing, these new technologies produce shorter reads at very high throughput. Researchers in the RNA silencing field have been among the first to adopt high-throughput sequencing approaches because the main limitation of the technique, i.e., the short read length, is irrelevant for the analysis of small RNAs (sRNAs), which are usually shorter than 30 nucleotides. High-throughput sequencing and other high-throughput methods, such as microarrays, have also been important both to profile the expression of sRNAs and to profile transcripts potentially regulated by sRNAs (sRNA targets). The latter has been particularly useful in animal systems, where the interaction between most sRNAs and their targets is mediated by a small number of nucleotides (as few as six). Consequently, accurate computational prediction of sRNA targets is quite challenging. In this chapter we will review the field of RNA silencing, describe the small RNAs that are involved in this process and explain how high-throughput technologies have been used to study their biological function.

THE MECHANISMS OF RNA SILENCING

The first sRNA to be discovered (lin-4) was found in a genetic screen to study developmental defects in the worm C.
elegans.4,5 In plants, the discovery of sRNAs was made by researchers working on viral defence.6,7 Meanwhile, work was being done on the delivery of exogenous RNA molecules with the goal of repressing gene expression.8 The use of double-stranded RNA molecules to specifically and strongly repress gene expression led to the award of a Nobel Prize in medicine less than ten years after it was first uncovered.9

Since then, a multitude of different sRNA classes have been characterised, most of which are derived from longer double-stranded or foldback RNA precursors. RNase III-type enzymes called Dicers are able to recognise these precursors and process them to produce double-stranded sRNAs. These are then incorporated into an Argonaute protein, where one of the strands is degraded. The other strand, called the guide strand, remains incorporated into the Argonaute, which, in conjunction with other proteins, forms an effector complex capable of recognising a specific DNA or RNA target.10 Once the effector complex has bound the target, it is able to induce target mRNA cleavage, cause mRNA destabilisation without cleavage, inhibit protein translation, or cause DNA and histone modifications that lead to transcriptional silencing.11-13 The nature of the precursor, of the subsequent processing steps and of the target (DNA, mRNA or other sRNAs) determines the class an sRNA belongs to (Fig. 1).

Figure 1. Small RNA production pathways. Reprinted with permission from Phillips JR, Dalmay T, Bartels D. FEBS Letters 2007; 581(19):3592-3597.

sRNAs involved in RNA silencing have been described in a large number of eukaryotes. However, the small RNA landscapes are quite diverse among different phyla. Although some of the RNA silencing machinery is conserved between most eukaryotes, to date no
individual small RNA conserved between animals and plants has been discovered.14,15 There are also differences in the heterogeneity of sRNAs: in plants there is a much greater diversity of sRNA-producing mechanisms than in animals.7

MicroRNAs (miRNAs) derive from precursors characterised by a secondary structure in which most nucleotides form a single stem. miRNAs are present in both animals and plants, but there are significant structural differences between them.15-17 Generally, pre-miRNAs are longer and form a greater number of base pairs in plants than in animals. The biogenesis of miRNAs has been particularly well studied. This process begins with the transcription of the genomic locus where the miRNA is encoded, continues with a series of processing steps of the resulting RNA transcript and culminates with the incorporation of the mature miRNA into a silencing complex. We refer the reader to the many review articles that extensively describe miRNA biogenesis in animals,18-21 plants17,22 and both.15,23

Small interfering RNAs (siRNAs) are also present in animals and plants, and again there are many differences in their biogenesis and modes of action. siRNAs originate from a longer single-stranded precursor that, in plants, is turned into double-stranded form by an RNA-dependent RNA polymerase (RdRp) before being processed by a Dicer enzyme.10,14,24,25 siRNAs can target both the loci from which they are produced and other loci with high sequence homology.
siRNAs can act both through transcriptional silencing at the DNA level and through RNA degradation. A subclass of siRNAs called natural antisense siRNAs (nat-siRNAs) is produced from partially overlapping pairs of transcripts originating from opposing strands of DNA. Production of nat-siRNAs has so far only been observed in stress responses. In animals, less is known about the biogenesis of siRNAs and the identity of their targets. At least a subset of them are produced from overlapping transcripts, and they seem to be involved in transposon silencing.26

Piwi-interacting RNAs (piRNAs) constitute a different class of sRNAs that interact with a subfamily of the Argonaute proteins called Piwi proteins.14,27 piRNAs are thought to be present exclusively in animals, seem to be expressed only in the germline and contribute to the stability of the cell line by silencing transposons. Most piRNAs are encoded in repetitive regions of the genome.

RNA silencing requires both an sRNA and an effector complex formed by a set of proteins, including a member of the Argonaute family. There are mainly two types of effector complex: RISCs (RNA-induced silencing complexes) and RITSs (RNA-induced initiation of transcriptional silencing complexes). RITSs, as the name indicates, are involved in transcriptional silencing by promoting the formation of heterochromatin.13 RISCs mediate posttranscriptional regulation and act by binding to messenger RNAs and either promoting the mRNA’s degradation or influencing the rate of the mRNA’s translation.12 Each sRNA guides the RISC to a specific region of an mRNA, known as the target site, based on complementarity between the sRNA and its target.
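Because an sRNA and its target site pair in antiparallel orientation, checking complementarity means reading the target site from its 3' end. The Python sketch below is a minimal illustration, not a published scoring scheme: it assumes both sequences are given 5' to 3' in the RNA alphabet and have equal length, labels each sRNA position as a Watson-Crick match, a G:U wobble or a mismatch, and does not model bulges or gaps. All function names are hypothetical.

```python
from collections import Counter

# Watson-Crick pairs and the G:U wobble pair, written as (sRNA base, target base).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq: str) -> str:
    """Upper-case a sequence and convert DNA T to RNA U."""
    return seq.upper().replace("T", "U")


def reverse_complement(seq: str) -> str:
    """Return the perfectly complementary target site, written 5'->3'."""
    return to_rna(seq).translate(COMPLEMENT)[::-1]


def pairing_profile(srna: str, site: str) -> list[str]:
    """Label each sRNA position (5'->3') as 'match', 'wobble' or 'mismatch'.

    Both sequences are given 5'->3'. Because the duplex is antiparallel,
    sRNA position i pairs with the site base at index len(site) - 1 - i.
    Bulges and gaps are not modelled.
    """
    srna, site = to_rna(srna), to_rna(site)
    if len(srna) != len(site):
        raise ValueError("sRNA and target site must have the same length")
    profile = []
    for i, base in enumerate(srna):
        pair = (base, site[len(site) - 1 - i])
        if pair in WATSON_CRICK:
            profile.append("match")
        elif pair in WOBBLE:
            profile.append("wobble")
        else:
            profile.append("mismatch")
    return profile


if __name__ == "__main__":
    srna = "UGACAGAAGAGAGUGAGCAC"            # arbitrary example sequence
    perfect_site = reverse_complement(srna)
    print(Counter(pairing_profile(srna, perfect_site)))
    # The last base of the site faces sRNA position 1 (a U); a G there is a wobble.
    print(pairing_profile(srna, perfect_site[:-1] + "G")[0])
```

Counting the labels gives the raw material that rule-based target predictors then weight by position; those position-specific weights are deliberately left out of this sketch.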
The rules governing the interaction between an sRNA-containing RISC and an mRNA are not yet fully understood. For example, not only is it infeasible to reliably predict an mRNA-sRNA interaction, it is also not always possible to say, a priori, what effect an sRNA has on a target mRNA. The four known outcomes of sRNA targeting are cleavage of the target mRNA, accelerated degradation of the target mRNA, repression of translation without mRNA degradation, and enhancement of translation.28

The biological processes regulated by each sRNA depend on its set of targets. For this reason, a large effort has been made to identify sRNA targets extensively, and miRNA targets in particular. sRNAs guide the effector complexes to their targets, and target recognition is based on nucleotide sequence complementarity. The degree of complementarity partially determines the targeting outcome: a high degree of complementarity (more common in plants) normally leads to target cleavage, whereas a low degree of complementarity (more common in animals) leads to decreased rates of translation or increased rates of target degradation.10 Based on experimentally validated sRNA targets, sets of empirically derived rules have been created both for high-complementarity target sites29,30 and for low-complementarity target sites.31,32

Given the widespread effect of sRNA-guided gene expression regulation, it comes as no surprise that the expression of sRNA genes themselves is tightly regulated. As with protein-coding genes, control can be exerted at many levels and with distinct levels of specificity.33 One level of regulation takes place during the biogenesis of the sRNAs: the availability of the proteins required for the formation of a certain class of sRNAs will limit the level of those sRNAs. For example, in some cancer tissues an increased number of copies of Drosha, a gene involved in miRNA biogenesis, has been observed, leading to widespread overexpression of miRNAs.34 Another example is the control of the expression level of one of the Dicer proteins (DCL1) by a miRNA (miR162). Because the production rate of the miRNA depends on the level of Dicer, these interactions form a negative feedback loop.35
Another factor that influences sRNA activity is the presence of other RNA molecules that might compete with the sRNAs for one or more of these proteins. In Arabidopsis, some noncoding RNAs whose secondary structure resembles that of pre-miRNAs seem to compete for one protein involved in plant miRNA biogenesis (HYL1).36 An interesting consequence of the limited availability of the proteins involved in miRNA activity is that each miRNA controls the activity level of all other miRNAs, albeit passively and indirectly. For example, in zebrafish embryos, where miRNA activity is essential to promote the degradation of maternal mRNA,37 transfection with siRNA reduces the activity of endogenous miRNAs, presumably because the siRNA competes with the miRNAs for Argonaute.38 There are also many mechanisms regulating the activity of individual sRNAs. As with other types of genes, there are many factors that control the transcription rate of sRNA-containing transcripts by binding to the respective promoter region.39 In particular, the promoter regions of miRNAs have been extensively studied,40,41 and for a large number of miRNAs some of the proteins regulating transcription have been identified.40

USING MICROARRAYS AND DNA SEQUENCING TO MEASURE RNA

Much of the understanding of how sRNAs are produced and what functions they perform inside the cell is derived from measurements of the sRNAs themselves, of their respective precursors and of the transcripts regulated by them. Techniques to measure the abundance of RNA molecules, such as RT-qPCR and northern blotting, have been used for more than thirty years and are still used to verify results obtained through other means.42,43 However, these methods are very time-consuming and can therefore be used to study only a limited number of RNAs. More recently, two high-throughput approaches that can be used to measure RNA have been developed: microarrays and RNA sequencing (RNA-seq).

Microarrays are based on the simultaneous hybridization between a large number of probes attached to a solid surface and the sample of interest.44 The probes are designed so that each of them is complementary to a unique sequence in the sample, called the probe target.
This approach has two limitations: it is possible to measure only molecules for which the sequence is known, and measurements may be corrupted by cross-hybridization, that is, hybridization of a probe to a molecule other than its target. This can happen if the sample contains sequences highly similar to the target, in which case hybridization to the probe might still occur.

Another way of estimating RNA concentrations consists of sequencing a number of molecules in the sample and taking the number of times each molecule is sequenced as a measure of its abundance. The quality of the results obtained using this approach depends critically on the total number of reads that can be obtained from a single sample. The new high-throughput sequencing technologies developed over the last few years have increased this number by several orders of magnitude. The first high-throughput sequencing method to be developed, massively parallel signature sequencing (MPSS),45 is now very seldom used. Currently, three commercial platforms are widely used: Roche’s 454 FLX system,46 Illumina’s Genome Analyzer47 (formerly known as Solexa sequencing, and succeeded by Illumina’s more recent model, the HiSeq 2000) and ABI’s SOLiD.48

A known drawback of 454 sequencing is that the sequencing software uses signal intensity to determine the number of consecutive identical bases in a sequence. When multiple consecutive identical bases are encountered (especially four or more repeated bases), the software cannot reliably interpret the signal intensity, and therefore the number of bases read, which can lead to sequencing errors, especially with low-complexity sequences.
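The read-counting idea above, together with the homopolymer problem, can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions, not part of any platform's software: reads are plain strings, abundance is expressed as reads per million (RPM, one common normalisation chosen here for illustration), and a homopolymer is taken to be a run of four or more identical bases, following the threshold mentioned in the text. The function names are hypothetical.

```python
import re
from collections import Counter


def collapse_reads(reads):
    """Count how many times each distinct sequence was read (empty reads skipped)."""
    return Counter(read.strip().upper() for read in reads if read.strip())


def reads_per_million(counts):
    """Scale raw counts so that the whole library sums to one million."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {seq: n * 1_000_000 / total for seq, n in counts.items()}


def has_homopolymer(seq, min_run=4):
    """True if seq contains a run of at least `min_run` identical bases (min_run >= 2)."""
    # (.)\1{k,} matches one base followed by k or more copies of the same base.
    return re.search(r"(.)\1{%d,}" % (min_run - 1), seq) is not None


if __name__ == "__main__":
    library = ["UCGGACCAGG", "UCGGACCAGG", "UCGGACCAGG", "AAAAGGCUUC"]
    counts = collapse_reads(library)
    for seq, rpm in reads_per_million(counts).items():
        flag = "homopolymer" if has_homopolymer(seq) else ""
        print(seq, counts[seq], round(rpm), flag)
```

Flagging rather than discarding homopolymer-containing reads keeps the decision about how to treat them with the analyst, since genuine sRNAs can also contain such runs.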
Using the Genome Analyzer technology, each nucleotide is sequenced individually, eliminating the problem that 454 technology has with homopolymeric sequences. However, the Genome Analyzer produces an increased number of errors at the 3' end of longer sequences, meaning that sequence length is a limiting factor of this technology. SOLiD sequencing is also affected by length but has an increased throughput compared to the other high-throughput platforms.

The sequencing approach overcomes the two main limitations of microarrays: with sequencing it is possible to measure both known and unknown products, and, in principle, the measurements are independent. Furthermore, in high-throughput sequencing datasets it is possible to detect sequence variants. One problem of this method has been highlighted by two recent studies that indicate that there might be biases in the number of times a sequence is read.49,50 That is, the expression levels, as measured by the count of a sequence, will be distorted because individual sequences have different propensities to be sequenced. Another potential disadvantage of this method is that the discovery of sRNAs is limited to ligation-compatible sRNAs, in general those with a 5' monophosphate group and a 3' hydroxyl group. This means that there are probably unknown sRNAs that have not yet been uncovered by this technology.

HIGH-THROUGHPUT APPROACHES FOR THE DISCOVERY AND FUNCTIONAL CLASSIFICATION OF sRNAs

sRNA Discovery

Before the advent of high-throughput sequencing methods, some efforts were made to computationally predict miRNA genes.30,51-55 Most of these methods try to find genomic regions that could produce RNAs with characteristics similar to those of precursor miRNAs. To find these regions, secondary structure prediction algorithms are often employed.56,57 These algorithms return secondary structures that minimize the free energy of the RNA under a certain model.
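To give a flavour of how secondary structure prediction works, the Python sketch below implements the classic Nussinov dynamic programming algorithm. It is a deliberate simplification and not the method of the algorithms cited above: Nussinov maximises the number of nested base pairs (Watson-Crick and G:U), whereas practical folding tools minimise free energy under a nearest-neighbour energy model. The minimum hairpin loop of three unpaired bases is an assumption of this sketch.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> tuple[int, str]:
    """Return (maximum number of base pairs, dot-bracket structure).

    dp[i][j] holds the maximum number of nested pairs in seq[i..j];
    a pair (k, j) needs at least `min_loop` unpaired bases between k and j.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0, ""
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                    # case 1: j is unpaired
            for k in range(i, j - min_loop):       # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best

    # Traceback: rebuild one optimal structure from the filled table.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if left + dp[k + 1][j - 1] + 1 == dp[i][j]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return dp[0][n - 1], "".join(structure)


if __name__ == "__main__":
    print(nussinov("GGGAAAUCC"))
```

A miRNA precursor candidate would show up here as a long run of brackets with a single hairpin loop; energy-based tools make the same kind of judgement far more reliably.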
miRNA candidates display predicted secondary structures with characteristics similar to those of known miRNAs (for example, the number and positions of the base pairs, the number of nucleotides bridging the two arms of the stem, or the number and length of internal bulges in the stem).

Most methods predict many thousands of candidate miRNA sequences, indicating that such approaches suffer from a lack of specificity. To reduce the number of false positive predictions, many algorithms30,53,54,58 employ a conservation rule, i.e., a candidate miRNA is only accepted if a homologue can be found in the genome of at least one other related species. This method of miRNA prediction and cross-species conservation checking has been successfully employed to find many novel miRNAs in both plants and animals with a high degree of accuracy. Although some miRNAs are conserved between closely related organisms, many have now been shown to be specific to individual taxonomic groups.59,60-63 This discovery has exposed the limitations of comparative methods and has created the need for alternative approaches to miRNA detection.

In the past, most miRNAs were formally identified using traditional Sanger sequencing after size fractionation (selecting for sequences of ~20-22 nt) and ligation into cloning vectors. This process was adopted in Arabidopsis, rice and poplar, and comparison of miRNA sequences across plant families showed that the majority were conserved.64 In animals, similar studies led to the discovery of many miRNAs.65-67 Recently, the high-throughput sequencing of sRNAs has led to the discovery of a plethora of new sRNAs, many of which are expressed at low levels and are either unique to a specific species or at least not widely conserved in related organisms.
For example, this has been used to uncover miRNAs specific to the human brain,68 QDE-2-associated RNAs (qiRNAs) in Neurospora crassa69 and endo-siRNAs in the fruit fly.70 The use of high-throughput sequencing technologies removed the need for cloning prior to sequencing. MPSS was the first high-throughput sequencing method successfully used to discover a number of novel miRNAs in Arabidopsis.71 Subsequently, 454 pyrosequencing,59,61,63,72,73 Illumina’s Genome Analyzer74-76 and ABI’s SOLiD77,78 platforms have been used to discover sRNAs.

Although high-throughput techniques have revolutionised sRNA sequencing, they have led to new problems with data analysis. Previously, biologists would often manually work through small lists of sRNAs, testing each for miRNA-like properties. Now, with millions of reads being produced by a single sequencing run, the need for computational techniques to process and classify sRNAs in a high-throughput manner has become apparent.

The first step of the computational analysis of this type of data consists of identifying the genomic coordinates that could have generated each of the reads. This is done by matching the sequence reads to the genome, a procedure usually referred to as read mapping or read aligning. A limited number of mismatches, insertions or deletions in the reads may be allowed in order to account for sequencing errors and genuine differences in relation to the reference genome, such as single nucleotide polymorphisms. Mapping millions of reads to a eukaryotic genome is a computationally intensive task, and a number of software packages have been developed to perform this task optimally.79 These new methods run in less time and use less memory than other alignment tools, such as BLAST, by preprocessing either the genome or the set of reads into indexes. Methods such as Bowtie and BWA80,81 index the reference genome using a Burrows-Wheeler Transform. The PatMaN program indexes all reads into a suffix tree and uses a modified version of the Aho-Corasick algorithm to align the suffix tree to the genome.82 Additionally, in contrast to BLAST, these methods are guaranteed to find all the matches between the set of reads
and the reference genome. Usually, in sRNA profiling the reads are not assembled into contigs, but this step might be useful for the discovery of longer ncRNAs.

The genomic coordinates are subsequently used to extract regions of the genome that have the same size as a typical miRNA precursor and are compatible with the production of a mature miRNA at the position where the read was mapped. These regions are then subjected to the procedure previously described: prediction of secondary structure and comparison to the secondary structures of known miRNAs. Two computational methods have been released to predict miRNAs from high-throughput sequencing data: miRDeep,83 specific to animal datasets, and miRCat,84 at the time specific to plant datasets. Both miRDeep and miRCat show a high degree of specificity in comparison to purely computational de novo miRNA prediction algorithms. Another software package, miRanalyzer, has also been released.85 This tool produces expression profiles of known miRNAs and other noncoding RNAs, and predicts new miRNAs. The prediction of new miRNAs is performed using a machine learning approach, with the set of known miRNAs as the test set. Because the set of predicted miRNAs does not include the known miRNAs, it is difficult to assess the sensitivity of this method and to compare the quality of its predictions to those of miRCat or miRDeep. Additionally, there are two programs that specialise in profiling known miRNAs: miRProf84 and miRExpress.86 The existence of less abundant sequence variants of miRNAs, called isomiRs, was also uncovered using high-throughput sequencing data.76 SeqBuster, a recently published software package,87 reports the relative abundance of the canonical miRNAs and the respective isomiRs. No doubt, by the time this chapter is in press there will be additional software packages for miRNA analysis, as this is a rapidly growing field.

The recent explosion in high-throughput sequencing of sRNAs has led to a huge increase in the discovery of novel miRNAs in a wide variety of organisms. This is demonstrated by the rapid growth of the central miRNA repository, miRBase,88 which has grown dramatically since the advent of high-throughput sRNA sequencing, reaching 14,197 entries in the latest release (15.0).
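The read-mapping step described earlier in this section relies on precomputed genome indexes. As an illustration of the Burrows-Wheeler idea mentioned for Bowtie and BWA, the Python sketch below builds a Burrows-Wheeler Transform and an FM-index over a toy genome and uses backward search to report every exact occurrence of a read. It is a teaching sketch only, with hypothetical function names: it sorts suffixes naively, stores full occurrence tables and allows no mismatches, whereas production mappers use compressed, sampled indexes and tolerate mismatches.

```python
def build_fm_index(genome: str):
    """Build (suffix array, BWT, C table, occurrence table) for genome + '$'."""
    text = genome.upper() + "$"                 # '$' sorts before A, C, G and T
    sa = sorted(range(len(text)), key=lambda i: text[i:])   # naive suffix array
    bwt = "".join(text[i - 1] for i in sa)      # text[-1] is '$' when i == 0
    alphabet = sorted(set(text))
    # c_table[c]: number of characters in text that sort before c.
    c_table, running = {}, 0
    for c in alphabet:
        c_table[c] = running
        running += text.count(c)
    # occ[c][k]: number of occurrences of c in bwt[:k].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for k, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][k + 1] = occ[c][k] + (ch == c)
    return sa, bwt, c_table, occ


def find_exact(read: str, index) -> list[int]:
    """Return sorted 0-based genome positions where `read` occurs exactly."""
    sa, bwt, c_table, occ = index
    lo, hi = 0, len(bwt)                        # suffix-array interval [lo, hi)
    for ch in reversed(read.upper()):           # backward search, right to left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    index = build_fm_index("ACGTACGTTTACGA")
    print(find_exact("ACG", index))
```

The search cost depends on the read length rather than the genome length once the index is built, which is why index-based mappers handle millions of short sRNA reads efficiently.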
With next-generation sequencing technologies becoming ever more powerful and more economical, it is likely that many more miRNAs will be characterised over the coming years. The real challenge for biologists now is to discover the functions of the thousands of miRNAs about which we currently know nothing. The difficulties of target and functional characterisation will be covered later in this chapter.

The plant-specific trans-acting siRNAs (ta-siRNAs) are another class of endogenous sRNA that can be identified using computational methods. ta-siRNAs are derived from a single-stranded RNA transcript that is targeted by a miRNA; this targeting is thought to trigger double-strand formation by an RNA-dependent RNA polymerase.89 In double-stranded form, the precursor becomes a substrate for a Dicer enzyme (DCL4), whose progressive cleavage in 21-nt intervals leads to the “phased” pattern of mature sRNA production that is a hallmark of ta-siRNA loci. sRNA-producing loci can be tested statistically for this pattern in order to classify novel ta-siRNA-producing regions.90

Other types of sRNAs are usually not so well defined by alignment patterns or sRNA sequence properties and are therefore currently difficult to classify computationally. Dicer is also known to produce sRNAs from long perfect or near-perfect double-stranded precursors in a less predictable pattern than that of the precisely defined miRNAs and ta-siRNAs.91 This imprecise Dicer processing can give rise to highly complex sRNA loci, where large numbers of sRNA sequences are represented by low-abundance reads, even in high-throughput data sets.
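ta-siRNA loci are recognisable by their phased pattern of sRNA production, and a very simple heuristic can expose it. The Python sketch below is not the statistical test used in the cited work: it takes the 5' start coordinates of sRNAs mapped to one strand of a candidate locus and reports which register (coordinate modulo the phase length) holds the largest share of distinct start positions. A phase length of 21 nt is an assumption here, the interval commonly reported for plant ta-siRNA loci. Real tools also combine both strands (reads on the opposite strand are offset because Dicer products carry 3' overhangs) and assess significance; this sketch does neither, and its function name is hypothetical.

```python
from collections import Counter


def dominant_phase(start_positions, phase=21):
    """Return (register, fraction) for the most populated phase register.

    start_positions: 5' coordinates of sRNAs mapped to one strand of a locus.
    Each distinct coordinate is assigned to register (position % phase);
    a strongly phased locus concentrates most coordinates in one register.
    """
    distinct = set(start_positions)
    if not distinct:
        return None, 0.0
    registers = Counter(pos % phase for pos in distinct)
    register, hits = registers.most_common(1)[0]
    return register, hits / len(distinct)


if __name__ == "__main__":
    # Eight reads spaced exactly 21 nt apart, plus one read out of phase.
    phased = [1000 + 21 * k for k in range(8)] + [1013]
    print(dominant_phase(phased))
```

Counting distinct coordinates rather than reads stops a single very abundant sRNA from making a locus look phased on its own.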
Table 1. Performance of sequencing technologies. Numbers obtained from the manufacturers’ websites in May. The numbers for the high-throughput platforms will change, as there is very active product development on all platforms.

Technology | Read length | Nucleotides read per day | Reference
Sanger sequencing (ABI3730) | Up to 900 nt | Up to ~… | http://www.appliedbiosystems.com
454 FLX (Titanium series) | Up to 500 nt | Up to ~… | http://454.com
Genome Analyzer (IIe) | Up to 100 nt | Up to ~… | http://www.illumina.com
SOLiD (version 4) | Up to 50 nt | Up to ~… | http://www.appliedbiosystems.com

Table 2. Bioinformatic software resources to analyse high-throughput sequencing data obtained from sRNAs. All software is freely available for noncommercial use.

Name | Organism | Functionalities | Reference
miRDeep | Animals | Find miRNAs | 83
miRCat | Plants and animals | Find miRNAs | 84
miRanalyzer | Animals | Profile known sRNAs; find miRNAs | 85
miRExpress | Animals and plants | Profile known miRNAs | 86
miRProf | Animals and plants | Profile known miRNAs | 84
pssRNAMiner | Plants | Prediction of ta-siRNAs | 121
Phasing detection tool | Plants | Prediction of ta-siRNAs | 84
SeqBuster | Animals | Analysis of isomiRs | 87
SiLoCo | Plants | Identification of sRNA loci | 84
NiBLS | Plants | Identification of sRNA loci | 92

In order to overcome problems when comparing read counts between sRNAs in multiple samples, it is often advisable to group sRNAs into transcriptional units, or loci, using either genome or transcript annotations. The SiLoCo method groups sRNAs such that each locus contains a minimum number of sRNAs and the gap between two consecutive sRNAs is below a maximum gap.84 More recently, the NiBLS algorithm, based on graph properties, has been developed.92 This method builds a graph whose nodes are individual sRNAs, with edges created between sRNAs that are close to each other. The algorithm identifies subsets of nodes with high clustering coefficients, and the sRNA loci are then defined by the minimum and maximum coordinates among the sequences in each subset.
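The gap-based grouping attributed to SiLoCo can be sketched as follows. This is a hypothetical re-implementation of the idea as described in the text, not SiLoCo's actual code: the `max_gap` and `min_srnas` defaults are arbitrary illustration values, and the input is assumed to be a list of (chromosome, start, end) genomic matches of sRNA reads.

```python
from collections import defaultdict


def group_srna_loci(hits, max_gap=100, min_srnas=3):
    """Merge sRNA genomic matches into loci.

    hits: iterable of (chromosome, start, end) tuples.
    Matches sorted by start join the current locus when the gap to the
    locus end is at most `max_gap`; loci with fewer than `min_srnas`
    matches are discarded. Returns (chromosome, start, end, n_srnas) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in hits:
        by_chrom[chrom].append((start, end))

    loci = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        locus_start, locus_end = intervals[0]
        count = 1
        for start, end in intervals[1:]:
            if start - locus_end <= max_gap:    # close enough: extend the locus
                locus_end = max(locus_end, end)
                count += 1
            else:                               # gap too large: close the locus
                if count >= min_srnas:
                    loci.append((chrom, locus_start, locus_end, count))
                locus_start, locus_end, count = start, end, 1
        if count >= min_srnas:
            loci.append((chrom, locus_start, locus_end, count))
    return loci


if __name__ == "__main__":
    hits = [("chr1", 100, 120), ("chr1", 130, 150), ("chr1", 200, 221),
            ("chr1", 1000, 1020), ("chr1", 1010, 1030)]
    print(group_srna_loci(hits))
```

Both parameters trade sensitivity against fragmentation: a larger `max_gap` merges neighbouring clusters, while a larger `min_srnas` drops sparsely covered regions.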
Target Identification

Currently, the experimental validation of sRNA targets, and hence of function, is a very time-consuming and expensive process. In plants, sRNA-mediated target site cleavage is widespread. The resulting cleavage products have two properties that facilitate their recognition. First, the cleavage point is very well defined: it lies between the target nucleotides that are complementary to nucleotides 10 and 11 of the sRNA. Second, the 3' cleavage product is generally stable. A consequence of the first property is that it is possible to distinguish between cleavage products and other mRNA degradation products. The implication of the second property is that it is possible to clone and sequence these cleavage products. Additionally, the 3' cleavage product contains a 5' monophosphate group, which allows an RNA oligonucleotide to be ligated to it, making it possible to identify the exact cleavage position.93 Traditionally, this has been done using a process called 5' Rapid Amplification of cDNA Ends (5' RACE). In essence, this process allows the sequencing of cleavage products from the transcript predicted to be targeted by a given sRNA. The sequences can then be aligned to the full-length mRNA and, if the mRNA is regulated, the cleavage products should begin at the position immediately after the predicted miRNA-mediated cleavage point.
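The logic behind 5' RACE and degradome validation can be expressed as a small calculation. The Python sketch below is illustrative and rests on simplifying assumptions: it only looks for perfectly complementary target sites (real plant target sites usually tolerate a few mismatches), it places the cleavage point between the target nucleotides paired with sRNA positions 10 and 11, and it treats the 5' ends of sequenced cleavage fragments as 0-based positions on the same mRNA. Function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Perfectly complementary sequence, written 5'->3'."""
    return rna.upper().replace("T", "U").translate(COMPLEMENT)[::-1]


def predicted_cleavage_starts(srna: str, mrna: str) -> list[int]:
    """0-based mRNA positions where 3' cleavage products are expected to begin.

    For a site starting at index s and an sRNA of length L (both 5'->3'),
    sRNA position p pairs with mRNA index s + L - p. Cleavage between the
    nucleotides paired with positions 10 and 11 means the 3' product starts
    at index s + L - 10.
    """
    site = reverse_complement(srna)
    mrna = mrna.upper().replace("T", "U")
    length = len(srna)
    starts = []
    s = mrna.find(site)
    while s != -1:
        starts.append(s + length - 10)
        s = mrna.find(site, s + 1)
    return starts


def fragment_support(expected_starts, fragment_5prime_ends):
    """Count sequenced fragments whose 5' end falls exactly on each expected start."""
    observed = Counter(fragment_5prime_ends)
    return {pos: observed[pos] for pos in expected_starts}


if __name__ == "__main__":
    srna = "UCGGACCAGGCUUCAUUCCCC"                 # arbitrary 21-nt example
    mrna = "AAAAA" + reverse_complement(srna) + "GGGG"
    expected = predicted_cleavage_starts(srna, mrna)
    print(expected, fragment_support(expected, [16, 16, 3, 17]))
```

Because the expected start is a single nucleotide, fragments that begin one position away (such as 17 in the usage line) are deliberately not counted; this precision is what lets cleavage products be told apart from general mRNA decay.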
Recently, a new approach has been described that allows researchers to carry out target validation in a high-throughput manner.94,95 This degradome sequencing approach captures all cleaved mRNA fragments in the transcriptome of the input sample and, using suitable bioinformatics tools such as CleaveLand, allows the prediction and validation of all cleaved plant miRNA targets in a single experiment.96

For targets that are not cleaved, there are two main approaches for experimental validation.97 The first requires the ability to manipulate the concentration or level of activity of the sRNA: for real targets, the respective protein concentration levels should change in response to different sRNA activity levels. The second approach consists of copying the sequence of the target site into a reporter gene and measuring the reporter’s activity. The copied target site should contain not only the region complementary to the sRNA but also the corresponding flanking regions. For a real target, introducing point mutations into the copied sequence should disrupt sRNA binding and relieve the repression, resulting in increased activity of the reporter gene.

The limitations of experimental methods have led to the development of computational methods to predict sRNA targets. Most of these methods take a combination of three features into account: sequence complementarity between miRNAs and target sites, thermodynamic factors, and conservation of putative target sites across species. Different methods exist to predict target sites with a high degree of complementarity (more typical of plants) and target sites with a low degree of complementarity (more usual in animals). For the former, the most widely used method is a rule-based algorithm presented in a paper by Schwab and colleagues.29 For the latter, TargetScan98 and PicTar99 have been used to produce predictions of conserved miRNA targets. Two other methods, RNA22 and Pita, are widely used to predict nonconserved targets.100,101 The predictions generated by these methods include many false positives and false negatives, with these rates much higher for the low-complementarity predictions.
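Seed matching, the starting point of many animal target-prediction methods, can be illustrated with a short sketch. The Python below is not TargetScan or PicTar: it only scans a 3' UTR for sites complementary to miRNA nucleotides 2-7 and labels them with the site-type names commonly used in the animal literature (6mer, 7mer-A1, 7mer-m8, 8mer), ignoring conservation, site context and thermodynamics. Function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq: str) -> str:
    return seq.upper().replace("T", "U")


def reverse_complement(seq: str) -> str:
    return to_rna(seq).translate(COMPLEMENT)[::-1]


def seed_sites(mirna: str, utr: str) -> list[tuple[int, str]]:
    """Return (0-based start of the 6-nt core, site type) for each seed match.

    The core is complementary to miRNA nucleotides 2-7. In the UTR (5'->3'),
    the base pairing with miRNA nucleotide 8 sits just upstream of the core,
    and the base facing miRNA nucleotide 1 sits just downstream of it.
    """
    mirna, utr = to_rna(mirna), to_rna(utr)
    core = reverse_complement(mirna[1:7])
    partner_of_8 = reverse_complement(mirna[7])
    sites = []
    start = utr.find(core)
    while start != -1:
        m8 = start > 0 and utr[start - 1] == partner_of_8
        a1 = start + 6 < len(utr) and utr[start + 6] == "A"
        if m8 and a1:
            kind = "8mer"
        elif m8:
            kind = "7mer-m8"
        elif a1:
            kind = "7mer-A1"
        else:
            kind = "6mer"
        sites.append((start, kind))
        start = utr.find(core, start + 1)
    return sites


if __name__ == "__main__":
    mirna = "UGAGGUAGUAGGUUGUAUAGUU"          # arbitrary example sequence
    print(seed_sites(mirna, "GGGCUACCUCAGGGUACCUCGG"))
```

Seed sites of just six or seven nucleotides occur by chance in many UTRs, which is why such matches alone produce the high false-positive rates mentioned above and why published methods add conservation and context information.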
Alternative hybrid target prediction methods using microarrays or high-throughput sequencing have also been developed. One set of approaches relies on measuring the concentrations of either mRNAs or proteins under different activity levels of a single sRNA. Targeting relationships are inferred for the genes that show concentration changes and contain potential target sites. mRNA concentrations can be estimated using microarrays and, more recently, high-throughput sequencing. Large-scale measurement of protein concentrations is also possible, although with current technology only a fraction of proteins can be measured.102,103

In the first experiment that used mRNA measurements obtained after manipulating miRNA activity, the abundances of miR-1 and miR-124 were increased by transfecting cells with siRNAs whose sequences were identical to those of the two miRNAs.104 Evidence for widespread targeting at the mRNA level was found by analysing the sequences of the downregulated genes. In particular, the sequence complementary to the first eight nucleotides of the miRNAs, thought to be important for target recognition, was found to be highly over-represented among these genes. In other experiments, the activity of individual miRNAs was decreased, either by suppressing their activity or by deleting the miRNA gene.105-108 In both types of experiments, the reduced miRNA activity is expected to lead to derepression of the targeted mRNAs and a consequent increase in their concentrations. Finally, in a subsequent experiment, the effects of overexpressing and suppressing the same miRNA were jointly assessed.109 Combining the results from these two experiments increased the specificity of the set of target candidates. This method of miRNA target prediction (looking for differences in mRNA concentrations under different miRNA activities) has two limitations: first, it identifies only the fraction of targets regulated at the mRNA level; second, some of the predicted targets might be false positives, since the observed change in mRNA concentration can be caused by factors other than miRNA targeting. However, it usually generates a lower number of false positives than purely computational
methods.

Another approach using high-throughput measurement of mRNAs has also been developed in recent years. This approach relies on the immunoprecipitation of RISCs bound to targeted mRNAs and subsequent profiling of these mRNAs using microarrays.110-113 The profiling must be followed by computational analysis of the sequences of the profiled mRNAs to identify the putative target sites and the sRNAs that target each mRNA. More recently, a study reported the immunoprecipitation of RISC-mRNA and RISC-sRNA complexes from the same sample, after which both the mRNAs and the sRNAs were subjected to high-throughput sequencing. These combined data allowed a more accurate determination of sRNA-mRNA regulatory interactions.114

Further sRNA Characterisation

As we have seen, sRNA classes are partially defined by their biogenesis pathways. High-throughput sequencing of small RNAs in cells that lack one or more of the proteins involved in these pathways has been used to classify sRNAs and, in particular, to distinguish miRNAs from other types of sRNAs.72,115 This might be useful to predict the targeting pathways in which each sRNA is involved.

High-throughput technologies have also been used to study the transcriptional regulation of sRNAs. This can be done by immunoprecipitating transcription factors together with the DNA they are bound to, followed by DNA profiling using either microarrays or high-throughput sequencing.
This method has been used to determine transcription start sites and promoter regions116 and to establish transcriptional regulatory interactions between transcription factors and sRNAs.39 These types of studies have allowed the construction of regulatory networks involving both sRNAs and regulatory proteins.117,118

CONCLUSION

High-throughput methods have been extremely important for the characterisation of sRNAs, namely for the discovery and classification of the genes that encode sRNAs, the identification of the genes regulated by sRNAs and the identification of the genes that regulate the expression of sRNAs. In the near future, the incorporation of new technological developments is going to have a great influence on the progress of this research. A new method to prepare libraries for high-throughput sequencing that requires much smaller quantities of RNA has also been recently published,119 making it possible to generate much more accurate measurements of RNA concentrations at the single-cell level. These new protocols will allow easier sequencing of bacterial and other non-poly(A)-tailed RNAs, and it is likely that there will be a rapid increase in research in these areas. New sequencing technologies capable of single-molecule resolution have recently been described and are expected to be widely available in the next few years.120 The expected increase in throughput and decrease in the cost of sequencing will likely cause a shift in the usage of these technologies: in the future, sequencing will be used less to catalogue sRNAs and more to understand their function, for instance by simultaneously profiling sRNAs and their targets across multiple tissues and developmental stages. These datasets will allow us not only to increase the amount and quality of information on regulatory interactions involving sRNAs, that is, to uncover the topology of the regulatory networks involving sRNAs, but also to improve our understanding of the dynamics of these networks and of the regulation of biological processes by sRNAs.
ACKNOWLEDGEMENTS

The authors thank Dr Frank Schwach for helpful discussions during the preparation of this manuscript. HP is a student of the Instituto Gulbenkian de Ciência’s PhD Programme in Computational Biology (sponsored by Fundação para a Ciência e a Tecnologia [FCT], Fundação Calouste Gulbenkian and Siemens SA Portugal) and was supported by FCT fellowship SFRH/BD/…. Work supported by the Biotechnology and Biological Sciences Research Council (BB/E…).

REFERENCES

1. Gottesman S. The small RNA regulators of Escherichia coli: roles and mechanisms. Annu Rev Microbiol 2004; 58:303-328.
2. Wagner EG, Simons RW. Antisense RNA control in bacteria, phages, and plasmids. Annu Rev Microbiol 1994; 48:713-742.
3. Wassarman KM. Small RNAs in bacteria: diverse regulators of gene expression in response to environmental changes. Cell 2002; 109(2):141-144.
4. Wightman B, Ha I, Ruvkun G. Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell.
5. Lee RC, Feinbaum RL, Ambros V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 1993; 75(5):843-854.
6. Mueller E, Gilbert J, Davenport G et al. Homology-dependent resistance: transgenic virus resistance in plants related to homology-dependent gene silencing. Plant J 1995; 7(6):1001-1013.
7. Voinnet O. Systemic Spread of Sequence-Specific Transgene RNA Degradation in Plants Is Initiated by Localized Introduction of Ectopic Promoterless DNA. Cell.
8. Nellen W, Lichtenstein C. What makes an mRNA antisensitive? Trends Biochem Sci.
9. Fire A, Xu S, Montgomery MK et al. Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature 1998; 391(6669):806-811.
10. Carthew RW, Sontheimer EJ. Origins and Mechanisms of miRNAs and siRNAs. Cell.
11. Eulalio A, Huntzinger E, Izaurralde E. Getting to the root of miRNA-mediated gene silencing. Cell 132(1):9-14.
12. Filipowicz W, Jaskiewicz L, Kolb FA et al. Post-transcriptional gene silencing by siRNAs and miRNAs. Curr Opin Struct Biol 2005; 15(3):331-341.
13. Verdel A, Jia S, Gerber S et al. RNAi-mediated targeting of heterochromatin by the RITS complex. Science 2004; 303(5658):672-676.
14. Ghildiyal M, Zamore PD. Small silencing RNAs: an expanding universe. Nat Rev Genet 2009; 10(2):94-108.
15. Bartel DP. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell.
16. Bushati N, Cohen SM. microRNA Functions. Annu Rev Cell Dev Biol 2007; 23(1):175-205.
17. Jones-Rhoades MW, Bartel DP, Bartel B. MicroRNAs and Their Regulatory Roles in Plants. Annu Rev Plant Biol 2006; 57:19-53.
18. Cullen BR. Transcription and processing of human microRNA precursors. Mol Cell.
19. Du T, Zamore PD. microPrimer: the biogenesis and function of microRNA. Development 132(21):4645-4652.
20. Faller M, Guo F. MicroRNA biogenesis: there’s more than one way to skin a cat. Biochim Biophys Acta 2008; 1779(11):663-667.
21. Kim VN, Han J, Siomi MC. Biogenesis of small RNAs in animals. Nat Rev Mol Cell Biol.
22. Voinnet O. Origin, Biogenesis, and Activity of Plant MicroRNAs. Cell.
23. Kim VN. MicroRNA biogenesis: coordinated cropping and dicing. Nat Rev Mol Cell Biol 2005; 6(5):376-385.
24. Chen X. Small RNAs and Their Roles in Plant Development. Annu Rev Cell Dev Biol.
25. Schwach F, Moxon S, Moulton V et al. Deciphering the diversity of small RNAs in plants: the long and short of it. Brief Funct Genomic Proteomic.
26. Okamura K, Lai EC. Endogenous small interfering RNAs in animals. Nat Rev Mol Cell Biol 9(9):673-678.
27. Thomson T, Lin H. The biogenesis and function of PIWI proteins and piRNAs: progress and prospect. Annu Rev Cell Dev Biol 2009; 25(1):355-376.
28. Filipowicz W, Bhattacharyya SN, Sonenberg N. Mechanisms of post-transcriptional regulation by microRNAs: are the answers in sight? Nat Rev Genet.
29. Schwab R, Palatnik JF, Riester M et al. Specific Effects of MicroRNAs on the Plant Transcriptome. Dev Cell 2005; 8(4):517-527.
30. Jones-Rhoades MW, Bartel DP. Computational identification of plant microRNAs and their targets, including a stress-induced miRNA. Mol Cell 2004; 14(6):787-799.
31. Rajewsky N. microRNA target predictions in animals. Nat Genet (Suppl).
32. Bartel DP. MicroRNAs: target recognition and regulatory functions. Cell.
33. Ding XC, Weiler J, Grosshans H. Regulating the regulators: mechanisms controlling the maturation of microRNAs. Trends Biotechnol 2009; 27(1):27-36.
34. Muralidhar B, Goldstein L, Ng G et al. Global microRNA profiles in cervical squamous cell carcinoma depend on Drosha expression levels. J Pathol.
35. Xie Z, Kasschau KD, Carrington JC. Negative feedback regulation of Dicer-like in Arabidopsis by microRNA-guided mRNA degradation. Curr Biol 2003; 13(9):784-789.
36. Piriyapongsa J, Jordan IK. Dual coding of siRNAs and miRNAs by plant transposable elements. RNA 2008; 14(5):814-821.
37. Giraldez AJ, Mishima Y, Rihel J et al. Zebrafish MiR promotes deadenylation and clearance of maternal mRNAs. Science 2006; 312(5770):75-79.
38. Zhao X, Fjose A, Larsen N et al. Treatment with small interfering RNA affects the microRNA pathway and causes unspecific defects in zebrafish embryos. FEBS J.
39. Marson A, Levine SS, Cole MF et al. Connecting microRNA genes to the core transcriptional regulatory circuitry of embryonic stem cells. Cell.
40. Zhou X, Ruan J, Wang G et al. Characterization and identification of microRNA core promoters in four model species. PLoS Comput Biol 2007; 3(3).
41. Megraw M, Baev V, Rusinov V et al. MicroRNA promoter element discovery in Arabidopsis. RNA 2006; 12(9):1612-1619.
42. Alwine JC, Kemp DJ, Stark GR. Method for detection of specific RNAs in agarose gels by transfer to diazobenzyloxymethyl-paper and hybridization with DNA probes. Proc Natl Acad Sci USA 1977; 74(12):5350-5354.
43. Becker-André M, Hahlbrock K. Absolute mRNA quantification using the polymerase chain reaction (PCR). A novel approach by a PCR aided transcript titration assay (PATTY). Nucleic Acids Res 1989; 17(22):9437-9446.
44. Brown PO, Botstein D. Exploring the new world of the genome with DNA microarrays. Nat Genet 21(1 Suppl):33-37.
45. Reinartz J, Bruyns E, Lin J et al. Massively parallel signature sequencing (MPSS) as a tool for in-depth quantitative gene expression profiling in all organisms. Brief Funct Genomic Proteomic.
46. Margulies M, Egholm M, Altman WE et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature 2005; 437(7057):376-380.
47. Bennett S. Solexa Ltd. Pharmacogenomics 2004; 5(4):433-438.
48. Shendure J, Porreca GJ, Reppas NB et al. Accurate multiplex polony sequencing of an evolved bacterial genome. Science 2005; 309(5741):1728-1732.
49. Willenbrock H, Salomon J, Søkilde R et al. Quantitative miRNA expression analysis: comparing microarrays with next-generation sequencing. RNA 2009; 15(11):2028-2034.
50. Linsen SE, de Wit E, Janssens G et al. Limitations and possibilities of small RNA digital gene expression profiling. Nat Methods.
51. Lai E, Tomancak P, Williams R et al. Computational identification of Drosophila microRNA genes. Genome Biol 2003; 4(7).
52. Lim LP, Glasner ME, Yekta S et al. Vertebrate microRNA genes. Science 2003; 299(5612).
53. Bonnet E, Wuyts J, Rouzé P et al. Detection of potential conserved plant microRNAs in Arabidopsis thaliana and Oryza sativa identifies important target genes. Proc Natl Acad Sci USA.
54. Wang X, Zhang J, Li F et al. MicroRNA identification based on sequence and structure alignment. Bioinformatics.
55. Lim LP, Lau NC, Weinstein EG et al. The microRNAs of Caenorhabditis elegans. Genes Dev 17(8):991-1008.
56. Hofacker IL. Vienna RNA secondary structure server. Nucleic Acids Res.
57. Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31(13):3406-3415.
58. Adai A, Johnson C, Mlotshwa S et al. Computational prediction of miRNAs in Arabidopsis thaliana. Genome Res 2005; 15(1):78-91.
59. Barakat A, Wall K, Leebens-Mack J et al. Large-scale identification of microRNAs from a basal eudicot (Eschscholzia californica) and conservation in flowering plants. Plant J 2007; 51(6):991-1003.
60. Bentwich I, Avniel A, Karov Y et al. Identification of hundreds of conserved and nonconserved human microRNAs. Nat Genet 2005; 37(7):766-770.
61. Fahlgren N, Howell MD, Kasschau KD et al. High-throughput sequencing of Arabidopsis microRNAs: evidence for frequent birth and death of MIRNA genes. PLoS ONE.
62. Szittya G, Moxon S, Santos DM et al. High-throughput sequencing of Medicago truncatula short RNAs identifies eight new miRNA families. BMC Genomics.
63. Yao Y, Guo G, Ni Z et al. Cloning and characterization of microRNAs from wheat (Triticum aestivum L.). Genome Biol 2007; 8(6):R96.
64. Axtell MJ, Bartel DP. Antiquity of microRNAs and their targets in land plants. Plant Cell 17(6):1658-1673.
65. Lagos-Quintana M, Rauhut R, Lendeckel W et al. Identification of novel genes coding for small expressed RNAs. Science 2001; 294(5543):853-858.
66. Lee RC, Ambros V. An extensive class of small RNAs in Caenorhabditis elegans. Science 2001; 294(5543):862-864.
67. Lau NC, Lim LP, Weinstein EG et al. An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 2001; 294(5543):858-862.
68. Berezikov E, Thuemmler F, Laake LW et al. Diversity of microRNAs in human and chimpanzee brain. Nat Genet 2006; 38(12):1375-1377.
69. Lee H, Chang S, Choudhary S et al. qiRNA is a new type of small interfering RNA induced by DNA damage. Nature 2009; 459(7244):274-277.
70. Ghildiyal M, Seitz H, Horwich MD et al. Endogenous siRNAs derived from transposons and mRNAs in Drosophila somatic cells. Science 2008; 320(5879):1077-1081.
71. Wang XJ, Reyes J, Chua NH et al. Prediction and identification of Arabidopsis thaliana microRNAs and their mRNA targets. Genome Biol 2004; 5(9).
72. Lu C, Kulkarni K, Souret FF et al. MicroRNAs and other small RNAs enriched in the Arabidopsis RNA-dependent RNA polymerase-2 mutant. Genome Res 2006; 16(10):1276-1288.
73. Moxon S, Jing R, Szittya G et al. Deep sequencing of tomato short RNAs identifies microRNAs targeting genes involved in fruit ripening. Genome Res.
74. Glazov EA, Cottee PA, Barris WC et al. A microRNA catalog of the developing chicken embryo identified by a deep sequencing approach. Genome Res 2008; 18(6):957-964.
75. Kuchenbauer F, Morin RD, Argiropoulos B et al. In-depth characterization of the microRNA transcriptome in a leukemia progression model. Genome Res 2008; 18(11):1787-1797.
76. Morin RD, O’Connor MD, Griffith M et al. Application of massively parallel sequencing to microRNA profiling and discovery in human embryonic stem cells. Genome Res.
77. Goff LA, Davila J, Swerdel MR et al. Ago immunoprecipitation identifies predicted microRNAs in human embryonic stem cells and neural precursors. PLoS ONE 2009; 4(9).
78. Cai Y, Yu X, Zhou Q et al. Novel microRNAs in silkworm (Bombyx mori). Funct Integr Genomics 10(3):405-415.
79. Trapnell C, Salzberg SL. How to map billions of short reads onto genomes. Nat Biotechnol.
80. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics.
81. Langmead B, Trapnell C, Pop M et al. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 2009; 10(3).
82. Prüfer K, Stenzel U, Dannemann M et al. PatMaN: rapid alignment of short sequences to large databases. Bioinformatics.
83. Friedländer MR, Chen W, Adamidi C et al. Discovering microRNAs from deep sequencing data using miRDeep. Nat Biotechnol 2008; 26(4):407-415.
84. Moxon S, Schwach F, Dalmay T et al. A toolkit for analysing large-scale plant small RNA datasets. Bioinformatics.
85. Hackenberg M, Sturm M, Langenberger D et al. miRanalyzer: a microRNA detection and analysis tool for next-generation sequencing experiments. Nucleic Acids Res (Suppl).
86. Wang WC, Lin FM, Chang WC et al. miRExpress: analyzing high-throughput sequencing data for profiling microRNA expression. BMC Bioinformatics.
87. Pantano L, Estivill X, Martí E. SeqBuster, a bioinformatic tool for the processing and analysis of small RNAs datasets, reveals ubiquitous miRNA modifications in human embryonic cells. Nucleic Acids Res 2010; 38(5):e34.
88. Griffiths-Jones S, Saini HK, van Dongen S et al. miRBase: tools for microRNA genomics. Nucleic Acids Res (Suppl).
89. Axtell MJ, Jan C, Rajagopalan R et al. A two-hit trigger for siRNA biogenesis in plants. Cell 127(3):565-577.
90. Chen H, Li Y, Wu S. Bioinformatic prediction and experimental validation of a microRNA-directed tandem trans-acting siRNA cascade in Arabidopsis. Proc Natl Acad Sci USA 2007; 104(9):3318-3323.
91. Elbashir SM, Lendeckel W, Tuschl T. RNA interference is mediated by and nucleotide RNAs. Genes Dev 2001; 15(2):188-200.
92. MacLean D, Moulton V, Studholme DJ. Finding sRNA generative locales from high-throughput sequencing data with NiBLS. BMC Bioinformatics.
93. Llave C, Kasschau KD, Rector MA et al. Endogenous and silencing-associated small RNAs in plants. Plant Cell 2002; 14(7):1605-1619.
94. German MA, Pillay M, Jeong DH et al. Global identification of microRNA-target RNA pairs by parallel analysis of RNA ends. Nat Biotechnol.
95. Addo-Quaye C, Eshoo TW, Bartel DP et al. Endogenous siRNA and miRNA targets identified by sequencing of the Arabidopsis degradome. Curr Biol.
96. Addo-Quaye C, Miller W, Axtell MJ. CleaveLand: a pipeline for using degradome data to find cleaved small RNA targets. Bioinformatics.
97. Kuhn DE, Martin MM, Feldman DS et al. Experimental validation of miRNA targets. Methods.
98. Lewis BP, Burge CB, Bartel DP. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell.
99. Krek A, Grün D, Poy MN et al. Combinatorial microRNA target predictions. Nat Genet.
100. Kertesz M, Iovino N, Unnerstall U et al. The role of site accessibility in microRNA target recognition. Nat Genet 2007; 39(10):1278-1284.
101. Miranda KC, Huynh T, Tay Y et al. A pattern-based method for the identification of MicroRNA binding sites and their corresponding heteroduplexes. Cell 2006; 126(6):1203-1217.
102. Baek D, Villen J, Shin C et al. The impact of microRNAs on protein output. Nature.
103. Selbach M, Schwanhäusser B, Thierfelder N et al. Widespread changes in protein synthesis induced by microRNAs. Nature 2008; 455(7209):58-63.
104. Lim LP, Lau NC, Garrett-Engele P et al. Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature.
105. Krützfeldt J, Rajewsky N, Braich R et al. Silencing of microRNAs in vivo with ‘antagomirs’. Nature 2005; 438(7068):685-689.
106. Elmen J, Lindow M, Silahtaroglu A et al. Antagonism of microRNA in mice by systemically administered LNA-antimiR leads to up-regulation of a large set of predicted target mRNAs in the liver. Nucleic Acids Res 2007; 36(4):1153-1162.
107. Elmén J, Lindow M, Schütz S et al. LNA-mediated microRNA silencing in non-human primates. Nature 2008; 452(7189):896-899.
108. Zhao Y, Ransom JF, Li A et al. Dysregulation of cardiogenesis, cardiac conduction, and cell cycle in mice lacking miRNA-1-2. Cell 2007; 129(2):303-317.
109. Nicolas FE, Pais H, Schwach F et al. Experimental identification of microRNA-140 targets by silencing and overexpressing miR-140. RNA 2008; 14(12):2513-2520.
110. Zhang L, Hammell M, Kudlow BA et al. Systematic analysis of dynamic miRNA-target interactions during C. elegans development. Development 2009; 136(18):3043-3055.
111. Karginov FV, Conaco C, Xuan Z et al. A biochemical approach to identifying microRNA targets. Proc Natl Acad Sci USA 2007; 104(49):19291-19296.
112. Easow G, Teleman AA, Cohen SM. Isolation of microRNA targets by miRNP immunopurification. RNA 2007; 13(8):1198-1204.
113. Hong X, Hammell M, Ambros V et al. Immunopurification of Ago miRNPs selects for a distinct class of microRNA targets. Proc Natl Acad Sci USA.
114. Chi SW, Zang JB, Mele A et al. Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature 2009; 460(7254):479-486.
115. Babiarz JE, Ruby JG, Wang Y et al. Mouse ES cells express endogenous shRNAs, siRNAs and other Microprocessor-independent, Dicer-dependent small RNAs. Genes Dev 2008; 22(20):2773-2785.
116. Corcoran DL, Pandit KV, Gordon B et al. Features of Mammalian microRNA Promoters Emerge from Polymerase II Chromatin Immunoprecipitation Data. PLoS ONE 2009; 4(4).
117. Martinez NJ, Ow MC, Barrasa MI et al. A C. elegans genome-scale microRNA network contains composite feedback motifs with high flux capacity. Genes Dev.
118. Shalgi R, Lieber D, Oren M et al. Global and local architecture of the mammalian microRNA-transcription factor regulatory network. PLoS Comput Biol.
119. White R, Blainey P, Fan HC et al. Digital PCR provides sensitive and absolute calibration for high throughput sequencing. BMC Genomics 2009; 10:116.
120. Ansorge WJ. Next-generation DNA sequencing techniques. N Biotechnol 2009; 25(4):195-203.
121. Dai X, Zhao PX. pssRNAMiner: a plant short small RNA regulatory cascade analysis server. Nucleic Acids Res 2008; 36(Web Server issue):W114-W118.