
CHAPTER 16
SMALL RNA DISCOVERY AND
CHARACTERISATION IN EUKARYOTES USING
HIGH-THROUGHPUT APPROACHES
Helio Pais, Simon Moxon, Tamas Dalmay and Vincent Moulton*
School of Computing Sciences, University of East Anglia, Norwich, UK
*Corresponding Author: Vincent Moulton. Email: [email protected]
Abstract:
RNA silencing is a mechanism of genetic regulation that is mediated by short
noncoding RNAs, or small RNAs (sRNAs). Regulatory interactions are established
based on nucleotide sequence complementarity between the sRNAs and their
targets. The development of new high-throughput sequencing technologies has
accelerated the discovery of sRNAs in a variety of plants and animals. The use
of these and other high-throughput technologies, such as microarrays, to measure
RNA and protein concentrations of gene products potentially regulated by sRNAs
has also been important for their functional characterisation. mRNAs targeted by
sRNAs can produce new sRNAs or the protein encoded by the target mRNA can
regulate other mRNAs. In either case the targeting sRNAs are parts of complex
RNA networks; therefore, identifying and characterising sRNAs contributes to a better
understanding of RNA networks. In this chapter we will review RNA silencing,
the different types of sRNAs that mediate it and the computational methods that
have been developed to use high-throughput technologies in the study of sRNAs
and their targets.
INTRODUCTION
Biological systems are regulated at multiple layers through a myriad of mechanisms.
At the cellular level, normal function requires regulation of gene expression. One of the
systems eukaryotic cells have in place to accomplish this task is RNA silencing, a process
in which a complex formed by an RNA molecule and one or more proteins interacts
either with a different RNA or DNA, causing modifications in the rates of translation
RNA Infrastructure and Networks, edited by Lesley J. Collins.
©2011 Landes Bioscience and Springer Science+Business Media.
or transcription. The RNA molecules present usually contain less than 30 nucleotides
and are commonly called small RNAs (sRNAs). In this chapter we will adopt this
convention, although we note that there are other noncoding small RNAs that are not
involved in RNA silencing (e.g., tRNAs, snoRNAs) but they will not be described here.
In prokaryotic organisms there are also noncoding RNAs that regulate gene expression,
but the mechanisms by which they act are completely different from eukaryotic RNA
silencing and their review falls outside the scope of this chapter. For reviews on this
topic see, for example, references 1-3.
One of the most important factors in the explosive growth of knowledge in the
RNA silencing field has been the application of new DNA sequencing technologies. In
comparison to conventional Sanger sequencing, these new technologies are characterised
by producing shorter reads with very high-throughput. Researchers in the RNA silencing
field have been among the first to adopt high-throughput sequencing approaches because
the main limitation of the technique, i.e., the short read length, is irrelevant for the analysis
of small RNAs (sRNA), which are usually shorter than 30 nucleotides. High-throughput
sequencing and other high-throughput methods, such as microarrays, have also been
important both to profile the expression of sRNAs and transcripts potentially regulated by
sRNAs (sRNA targets). The latter has been particularly useful in animal systems where
the interaction between most sRNAs and their targets is mediated by a small number of
nucleotides (as few as six). Consequently, accurate computational prediction of sRNA
targets is quite challenging. In this chapter we will review the field of RNA silencing,
describe the small RNAs that are involved in this process and how high-throughput
technologies have been used to study their biological function.
THE MECHANISMS OF RNA SILENCING
The first sRNA to be discovered, lin-4, was found in a genetic screen to study
developmental defects in the worm C. elegans.4,5 In plants, the discovery of sRNAs was
made by researchers working on viral defence.6,7 Meanwhile, work was being done on the
delivery of exogenous RNA molecules with the goal of repressing gene expression.8 The use
of double stranded RNA molecules to specifically and strongly repress gene expression led
to the award of a Nobel Prize in medicine less than ten years after it was first uncovered.9
Since then a multitude of different sRNA classes have been characterised, most
of which are derived from longer double stranded or fold-back RNA precursors.
RNase III-type enzymes called Dicers are able to recognise these precursors and process
them to produce double stranded sRNAs. These are then incorporated into an Argonaute
protein where one of the strands is degraded. The other strand, called the guide strand,
remains incorporated into the Argonaute which, in conjunction with other proteins, forms
an effector complex capable of recognizing a specific DNA or RNA target.10 Once the
effector complex has bound the target it is able to either induce target mRNA cleavage,
mRNA destabilisation without cleavage, inhibit protein translation or cause DNA and
histone modifications that lead to transcriptional silencing.11-13 The nature of the precursor,
of the subsequent processing steps and of the target (DNA, mRNA or other sRNAs)
determines the class a sRNA belongs to (Fig. 1).
sRNAs involved in RNA silencing have been described in a large number of eukaryotes.
However, the small RNA landscapes are quite diverse among different phyla. Although
some of the RNA silencing machinery is conserved between most eukaryotes, to date no
Figure 1. Small RNA production pathways. Reprinted with permission from: Phillips JR, Dalmay T, Bartels D. FEBS Letters 2007; 581(19):3592-3597.
individual small RNA conserved between animals and plants has been discovered.14,15
There are also differences in the heterogeneity of sRNAs; in plants there is a much greater
diversity in sRNA producing mechanisms than in animals.7
miRNAs derive from precursors characterised by a secondary structure in which most
nucleotides form a single stem. miRNAs are present in animals and plants, but there are
significant structural differences between them.15-17 Generally, pre-miRNAs are longer
and form a greater number of base pairs in plants than in animals. The biogenesis of
miRNAs has been particularly well studied. This process begins with the transcription of
the genomic locus where the miRNA is encoded, continues with a series of processing
steps of the resulting RNA transcript and culminates with the incorporation of the mature
miRNA into a silencing complex. We refer the reader to the many review articles that
extensively describe miRNA biogenesis in animals,18-21 plants17,22 and both.15,23
Small interfering RNAs (siRNAs) are also present in animals and plants and again
there are many differences in their biogenesis and modes of action. siRNAs originate
from a longer single-stranded precursor that, in plants, is turned into double-stranded
form by an RNA-dependent RNA polymerase (RdRP) before being processed by a
Dicer enzyme.10,14,24,25 siRNAs can target both the loci where they are produced from and
other loci with high sequence homology. siRNAs can act both by DNA transcriptional
silencing and RNA degradation. A subclass of siRNAs, called natural antisense siRNAs
(nat-siRNAs), is produced from partially overlapping pairs of transcripts originating from
opposing strands of DNA. Production of nat-siRNAs has so far only been observed in
stress response. In animals less is known about the biogenesis and identity of the targets
of siRNAs. At least a subset of them are produced from overlapping transcripts and seem
to be involved in transposon silencing.26
Piwi-interacting RNAs (piRNAs) constitute a different class of sRNAs that interact
with a subfamily of the Argonaute proteins called Piwi proteins.14,27 piRNAs are thought
to be present exclusively in animals, seem to be expressed only in the germline and
contribute to the stability of the cell line by silencing transposons. piRNAs are usually
[…] nucleotides long and most of them are encoded in repetitive regions of the genome.
RNA silencing requires both an sRNA and an effector complex formed by a set of
proteins including a member of the Argonaute family. There are mainly two types of
effector complex: RISCs (RNA-induced silencing complexes) and RITSs (RNA-induced
initiation of transcriptional silencing). RITSs, as the name indicates, are involved in
transcriptional silencing by promoting the formation of heterochromatin.13 RISCs mediate
posttranscriptional regulation and act by binding to messenger RNAs and either promoting
the mRNA’s degradation or influencing the rate of the mRNA’s translation.12 Each sRNA
guides the RISC to a specific region of an mRNA, known as the target site, based on
complementarity between the sRNA and its target. The rules governing the interaction
between an sRNA-containing RISC and an mRNA are not currently fully understood.
For example, not only is it infeasible to reliably predict an mRNA-sRNA interaction, it
is also not always possible to say, a priori, what effect an sRNA has on a target mRNA.
The four known outcomes of sRNA targeting are cleavage of target mRNA, accelerated
degradation of target mRNA, repression of translation without mRNA degradation and
enhancement of translation.28
The biological processes regulated by each sRNA depend on its set of targets. For this
reason a large effort has been made to extensively identify sRNA targets and in particular
miRNA targets. sRNAs guide the effector complexes to their targets. The target recognition
is based on nucleotide sequence complementarity. The degree of complementarity
partially determines the targeting outcome: a high degree of complementarity, more
common in plants, normally leads to target cleavage; a low degree of complementarity,
more common in animals, leads to decreased rates of translation or increased rates of
target degradation.10 Based on experimentally validated sRNA targets, sets of empirically
derived rules have been created both for high-complementarity target sites29,30 and
low-complementarity target sites.31,32
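As a rough illustration of what "degree of complementarity" means in practice, the following Python sketch counts the positions at which an sRNA and a candidate target site of the same length fail to form a Watson-Crick pair. The function name, the equal-length requirement and the decision to count G:U wobble pairs as mismatches are assumptions made for this example; they are not taken from the empirical rule sets cited above.

```python
# Watson-Crick pairs only; G:U wobble pairs count as mismatches here (an assumption).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def count_mismatches(srna, site):
    """Count unpaired positions between an sRNA and a target site.

    Both sequences are written 5'->3' in the RNA alphabet and must have the
    same length. The site is read backwards because the duplex is antiparallel.
    """
    if len(srna) != len(site):
        raise ValueError("sRNA and target site must have the same length")
    return sum(1 for pair in zip(srna, reversed(site)) if pair not in WATSON_CRICK)


perfect = count_mismatches("UGAGGUAG", "CUACCUCA")    # fully complementary site
imperfect = count_mismatches("UGAGGUAG", "CUACGUCA")  # one position altered
```

A low count would correspond to the high-complementarity situation described for plants, a higher count to the low-complementarity situation described for animals; where to draw the line is left open here.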
Given the widespread effect of sRNA-guided gene expression regulation it comes
as no surprise that the expression of sRNA genes themselves is tightly regulated. As
with protein coding genes, control can be exerted at many levels and with distinct
levels of specificity.33 One level of regulation takes place during the biogenesis of the
sRNAs. Namely, the availability of the proteins required for the formation of a certain
class of sRNAs will limit the level of those sRNAs. For example, in some cancer tissues
an increased number of copies of Drosha, a gene involved in miRNA biogenesis, has
been observed, leading to widespread overexpression of miRNAs.34 Another example
is the control of the expression level of one of the Dicer proteins, DCL1, by a miRNA,
miR162. Because the production rate of the miRNA depends on the level of Dicer,
these interactions form a negative feedback loop.35
Another factor that influences sRNA activity is the presence of other RNA molecules
that might compete with the sRNAs for one or more of these proteins. In Arabidopsis,
some noncoding RNAs whose secondary structure resembles that of pre-miRNAs seem
to compete for one protein involved in plant miRNA biogenesis, HYL1.36 An interesting
consequence of the importance of the limited availability of the proteins involved in
miRNA activity is that each miRNA controls the activity level of all other miRNAs,
albeit passively and indirectly. For example, in zebrafish embryos, where miRNA activity
is essential to promote the degradation of maternal mRNA,37 transfection with siRNA
reduces the activity of endogenous miRNAs, presumably because the siRNA competes
with the miRNAs for Argonaute.38 There are also many mechanisms of regulating the
activity of individual sRNAs. As with other types of genes, there are many factors that
control the transcription rate of sRNA-containing transcripts by binding to the respective
promoter region.39 In particular, the promoter regions of miRNAs have been extensively
studied40,41 and for a large number of miRNAs some of the proteins regulating transcription
have been identified.40
USING MICROARRAYS AND DNA SEQUENCING TO MEASURE RNA
Much of the understanding of how sRNAs are produced and what functions they
perform inside the cell is derived from measurements of the sRNAs themselves, of their
respective precursors and of the transcripts regulated by them. Techniques to measure
the abundance of RNA molecules, such as RT-qPCR and northern blotting, have been
used for more than thirty years and are still used to verify results obtained through
other means.42,43 However, these methods are very time-consuming and therefore can
be used to study only a limited number of RNAs. Since the 1990s, two high-throughput
approaches that can be used to measure RNA have been developed: microarrays and
RNA sequencing (RNAseq).
Microarrays are based on the simultaneous hybridization of a large number of
probes attached to a solid surface and the sample of interest.44 The probes are designed
so that each of them is complementary to a unique sequence in the sample, called the
probe target. This approach has two limitations: it is possible to measure only molecules
for which the sequence is known and it is possible that measurements are corrupted by
cross-hybridization, that is, hybridization of a probe to a molecule other than its target.
This can happen if the sample contains sequences highly similar to the target, in which
case hybridization to the probe might still occur.
Another way of estimating RNA concentrations consists of sequencing a number of
molecules in the sample and taking the number of times each molecule is sequenced as a
measure of its abundance. The quality of the results obtained using this approach critically
depends on the total number of reads that can be obtained from a single sample. The new
high-throughput sequencing technologies developed over the last few years have increased
this number by several orders of magnitude. The first high-throughput sequencing method
to be developed is called massively parallel signature sequencing (MPSS)45 but is now
very seldom used. Currently, three commercial platforms are widely used: Roche’s 454
FLX system,46 Illumina’s Genome Analyzer47 (formerly known as Solexa sequencing
and succeeded by Illumina’s more recent model the HiSeq 2000) and ABI’s SOLiD.48
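The idea of using the number of times a molecule is sequenced as a measure of its abundance can be sketched in a few lines of Python. The example below counts identical reads and rescales the counts to reads per million so that samples of different sequencing depth can be compared. The function name and the reads-per-million scaling are choices made for this illustration, and the two sequences are placeholder reads, not data from any study discussed here.

```python
from collections import Counter


def reads_per_million(reads):
    """Return {sequence: abundance}, scaled to one million total reads."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {seq: 1_000_000 * n / total for seq, n in counts.items()}


# Four placeholder reads: one sequence seen three times, another seen once.
reads = ["UGAGGUAGUAGGUUGUAUAGUU"] * 3 + ["UAGCUUAUCAGACUGAUGUUGA"]
rpm = reads_per_million(reads)
```

Such a simple scaling takes the raw counts at face value; it does nothing to correct for sequence-specific differences in how readily a molecule is sequenced.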
A known drawback of 454 sequencing is that the sequencing software uses signal
intensity to determine the number of consecutive identical bases in a sequence. When
multiple consecutive identical bases are encountered (especially four or more repeated
bases) the software cannot reliably interpret the signal intensity and therefore the number
of bases read, which can lead to sequencing errors, especially with low complexity
sequences. Using the Genome Analyzer technology each nucleotide is sequenced
individually, eliminating the problem that 454 technology has with homopolymeric
sequences. However, the Genome Analyzer produces an increased number of errors at
the 3′ end of longer sequences, meaning that sequence length is a limiting factor of this
technology. SOLiD sequencing is also affected by length but has an increased throughput
compared to the other high-throughput platforms.
The sequencing approach overcomes the two main limitations of microarrays:
with sequencing it is possible to measure both known and unknown products and in
principle the measurements are independent. Furthermore, in high-throughput sequencing
datasets it is possible to detect sequence variants. One problem of this method has been
highlighted by two recent studies that indicate that there might be biases on the number
of times a sequence is read.49,50 That is, the expression levels, as measured by the count
of a sequence, will be distorted because individual sequences have different propensities
to be sequenced. Another potential disadvantage of this method is that the discovery of
sRNAs is limited to ligation compatible sRNAs, in general with a 5′ mono-phosphate
group and a 3′ hydroxyl group. This means that there are probably unknown sRNAs that
have not been uncovered by this technology as of yet.
HIGH-THROUGHPUT APPROACHES FOR THE DISCOVERY
AND FUNCTIONAL CLASSIFICATION OF sRNAs
sRNA Discovery
Before the advent of high-throughput sequencing methods some efforts were made to
computationally predict miRNA genes.30,51-55 Most of these methods try to find genomic
regions that could produce RNAs with similar characteristics to those of precursor miRNAs.
To find these regions, secondary structure prediction algorithms are often employed.56,57
These algorithms return secondary structures that minimize the free energy of the RNA
under a certain model. miRNA candidates display predicted secondary structures with
characteristics similar to those of known miRNAs. For example: number and positions
of the base pairs, number of nucleotides bridging the two arms of the stem, or number
and length of internal bulges in the stem.
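The kind of structural features listed above can be read off a predicted structure once it is written in dot-bracket notation, the usual text output of folding programs. The sketch below assumes such a string is already available (it does not run a folding algorithm) and extracts three illustrative features: the number of base pairs, the number of nucleotides bridging the two arms, and the number of unpaired nucleotides inside the stem. The function name and the returned keys are hypothetical, and the thresholds a real miRNA predictor would apply to these values are not shown.

```python
def hairpin_features(structure):
    """Summarise a dot-bracket structure such as '((((....))))'."""
    stack, pairs = [], []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(index)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), index))
    if stack:
        raise ValueError("unbalanced structure")
    if not pairs:
        return {"base_pairs": 0, "single_stem": False,
                "loop_size": 0, "unpaired_in_stem": 0}
    # A single stem means every opening bracket precedes every closing bracket.
    single_stem = structure.rfind("(") < structure.find(")")
    inner_open, inner_close = max(pairs)  # pair with the rightmost opening bracket
    outer_open, outer_close = min(pairs)  # pair with the leftmost opening bracket
    loop_size = inner_close - inner_open - 1
    unpaired = structure[outer_open:outer_close + 1].count(".")
    return {"base_pairs": len(pairs), "single_stem": single_stem,
            "loop_size": loop_size, "unpaired_in_stem": unpaired - loop_size}


features = hairpin_features("((((..((((....))))..))))")
```

The loop size and stem counts are only meaningful when `single_stem` is true; a branched structure is flagged rather than analysed further.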
Most methods predict many thousands of candidate miRNA sequences, indicating that
such approaches suffer from a lack of specificity. To reduce the number of false positive
predictions, many algorithms30,53,54,58 employ a conservation rule, i.e., a candidate miRNA
is only accepted if a homologue can be found in the genome of at least one other related
species. This method of miRNA prediction and cross-species conservation checking
has been successfully employed to find many novel miRNAs in both plants and animals
with a high degree of accuracy. Although some miRNAs are conserved between closely
related organisms, many have now been shown to be specific to individual taxonomic
groups.59-63 This discovery has exposed the limitations of comparative methods and has
led to the need for alternative approaches to miRNA detection.
In the past, most miRNAs were formally identified using traditional Sanger sequencing
after size fractionation (selecting for sequences ~20-22nt) and ligation into cloning vectors.
This process was adopted in Arabidopsis, rice and poplar and comparison of miRNA
sequences across plant families showed that the majority were conserved.64 In animals
similar studies led to the discovery of many miRNAs.65-67 Recently, the high-throughput
sequencing of sRNAs has led to the discovery of a plethora of new sRNAs, many
of which are expressed at low levels and are either unique to a specific species or at
least not widely conserved in related organisms. For example, this has been used to
uncover miRNAs specific to the human brain,68 QDE-2-associated RNAs (qiRNAs) in
Neurospora crassa,69 and endo-siRNAs in the fruit fly.70 The use of high-throughput
sequencing technologies removed the need of cloning prior to sequencing. MPSS was
the first high-throughput sequencing method successfully used to discover a number of
novel miRNAs in Arabidopsis.71 Subsequently, 454 pyrosequencing,59,61,63,72,73 Illumina’s
Genome Analyzer74-76 and ABI’s SOLiD77,78 platform have been used to discover sRNAs.
Although high-throughput techniques have revolutionised sRNA sequencing they
have led to new problems with data analysis. Previously biologists would often manually
work through small lists of sRNAs, testing each for miRNA-like properties. Now, with
millions of reads being produced by a single sequencing run, the need for computational
techniques to process and classify sRNAs in a high-throughput manner has become apparent.
The first step of the computational analysis of this type of data consists of the
identification of the genomic coordinates that could have generated each of the reads. This
is done by matching the sequence reads to the genome, a procedure usually referred to as
read mapping or read aligning. A limited number of mismatches, insertions or deletions in
the reads may be allowed in order to account for sequencing errors and genuine differences
in relation to the reference genome, such as single nucleotide polymorphisms. Mapping
millions of reads to a eukaryotic genome is a computationally intensive task and a number
of software packages have been developed to perform this task optimally.79 These new
methods run in less time using less memory than other alignment tools, such as BLAST,
by preprocessing either the genome or the set of reads into indexes. Methods such as
Bowtie and BWA80,81 index the reference genome using a Burrows-Wheeler Transform.
The PatMaN program indexes all reads into a suffix tree and uses a modified version of the
Aho-Corasick algorithm to align the suffix tree to the genome.82 Additionally, in contrast
to BLAST, these methods are guaranteed to find all the matches between the set of reads
and the reference genome. Usually, in sRNA profiling the reads are not assembled into
contigs, but this step might be useful for the discovery of longer ncRNAs.
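To make the idea of indexing the reads and then scanning the genome concrete, here is a deliberately simplified Python sketch. It stores the distinct reads in hash sets grouped by length and slides a window along the genome, reporting exact matches on both strands. This is not how Bowtie, BWA or PatMaN work internally (they rely on a Burrows-Wheeler Transform or a suffix tree, and they can tolerate mismatches); the hash-set index, the function names and the toy sequences are assumptions chosen only to keep the example short.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def map_reads(reads, genome):
    """Map reads to a genome by exact matching on both strands.

    Returns {read: [(start, strand), ...]} with 0-based start coordinates
    given on the forward strand. Reads without a match are absent.
    """
    by_length = defaultdict(set)          # the "index": distinct reads per length
    for read in set(reads):
        by_length[len(read)].add(read)
    hits = defaultdict(list)
    for length, group in by_length.items():
        for start in range(len(genome) - length + 1):
            window = genome[start:start + length]
            if window in group:
                hits[window].append((start, "+"))
            rc = reverse_complement(window)
            if rc in group:
                hits[rc].append((start, "-"))
    return dict(hits)


hits = map_reads(["GTTG", "CAAC", "TTTT"], "ACGTTGCAACGT")
```

Because every window of the genome is tested against the index, all exact matches are found; the price is a running time proportional to the genome length for each distinct read length, which the specialised tools avoid.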
The genomic coordinates are subsequently used to extract regions of the genome
of the same size as a typical miRNA precursor and compatible with the production of a
mature miRNA in the same position where the read was mapped to. These regions are
then subjected to the procedure previously described: prediction of secondary structure
and comparison to secondary structures of known miRNAs. In 2008 two computational
methods were released to predict miRNAs from high-throughput sequencing data:
miRDeep,83 specific to animal datasets, and miRCat,84 at the time specific to plant
datasets. Both miRDeep and miRCat show a high degree of specificity in comparison to
purely computational de novo miRNA prediction algorithms. In 2009 one other software
package, miRanalyzer, was released.85 This tool produces expression profiles of known
miRNAs and other noncoding RNAs, and predicts new miRNAs. The prediction of new
miRNAs is performed using a machine learning approach and the set of known miRNAs
as the test set. Because the set of predicted miRNAs does not include the known miRNAs
it is difficult to assess the sensitivity of this method and to compare the quality of the
predictions to those of miRCat or miRDeep. Additionally, there are two programs that
specialise in profiling known miRNAs: miRProf84 and miRExpress.86 The existence of
less abundant sequence variants of miRNAs, called isomiRs, was also uncovered using
high-throughput sequencing data.76 SeqBuster, a recently published software package,87
reports the relative abundance of the canonical miRNAs and the respective isomiRs. No
doubt, by the time this chapter is in press there will be additional software packages for
miRNA analysis as this is a rapidly growing field.
The recent explosion in high-throughput sequencing of sRNAs has led to a huge
increase in the discovery of novel miRNAs in a wide variety of organisms. This is
demonstrated by the rapid growth of the central miRNA repository, miRBase,88 which
has grown dramatically from […] entries in […], at the advent of high-throughput
sRNA sequencing, to 14,197 in the latest release (15.0). With next generation sequencing
technologies becoming ever more powerful and more economical, it is likely that many
more miRNAs will be characterised over the coming years. The real challenge for
biologists now is to try to discover functions of the thousands of miRNAs about which we
currently know nothing. The difficulties of target and functional characterisation
will be covered later in this chapter.
The plant-specific trans-acting siRNAs (ta-siRNAs) are another class of endogenous
sRNA that can be identified using computational methods. ta-siRNAs are derived from
a single-stranded RNA transcript which is targeted in two positions by a miRNA which
is thought to trigger double strand formation by an RNA-dependent RNA polymerase.89
In double-stranded form the precursor becomes a substrate for a Dicer enzyme (DCL4),
whose progressive cleavage in 21 nt intervals leads to the “phased” pattern of mature
sRNA production that is a hallmark of ta-siRNA loci. sRNA-producing loci can be tested
statistically in order to classify novel ta-siRNA-producing regions.90
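The phased pattern lends itself to a simple numerical check. The following Python sketch takes the start coordinates of the sRNA reads mapped to one strand of a locus and reports which register (start position modulo the phase length) holds the largest share of reads. It is only a descriptive summary under stated assumptions: it is not the statistical test of reference 90, it ignores reads on the opposite strand and the offset between strands, and the function names are hypothetical.

```python
from collections import Counter


def phase_fractions(start_positions, phase_length=21):
    """Fraction of reads whose start falls in each register (start mod phase_length)."""
    registers = Counter(pos % phase_length for pos in start_positions)
    total = sum(registers.values())
    return {register: n / total for register, n in registers.items()}


def best_register(start_positions, phase_length=21):
    """Return (register, fraction) for the most populated register, or None if empty."""
    fractions = phase_fractions(start_positions, phase_length)
    if not fractions:
        return None
    return max(fractions.items(), key=lambda item: item[1])


# Five toy reads spaced 21 nt apart plus one out-of-phase read.
register, fraction = best_register([5, 26, 47, 68, 89, 12])
```

A locus in which one register dominates is a candidate for phased processing, whereas imprecise Dicer processing would spread the reads across many registers; deciding how much dominance is significant requires a proper statistical model.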
Other types of sRNAs are usually not so well defined by alignment patterns or sRNA
sequence properties and are therefore currently difficult to classify computationally.
Dicer is also known to produce sRNAs from long perfect or near perfect double-stranded
precursors in a more unpredictable pattern to that of the precisely defined miRNAs and
ta-siRNAs.91 This imprecise Dicer processing can give rise to highly complex sRNA
loci where large numbers of sRNA sequences are represented by low abundance reads
even in high-throughput data sets. In order to try to overcome problems when comparing
Table 1. Performance of sequencing technologies. Numbers obtained from manufacturer’s websites on May […]. The numbers for the high-throughput platforms will change as there is very active product development on all platforms.
Technology | Read Length | Nucleotides Read Per Day | Reference
Sanger sequencing (ABI3730) | Up to 900nt | Up to […] | http://www.appliedbiosystems.com
454 (FLX Titanium series) | Up to 500nt | Up to […] | http://454.com
Genome Analyzer (IIe) | Up to 100nt | Up to […] | http://www.illumina.com
SOLiD (version 4) | Up to 50nt | Up to […] | http://www.appliedbiosystems.com
Table 2. Bioinformatic software resources to analyse high-throughput sequencing data obtained from sRNAs. All software is freely available for noncommercial use.
Name | Organism | Functionalities | Reference
miRDeep | Animals | Find miRNAs | 83
miRCat | Plants and animals | Find miRNAs | 84
miRanalyzer | Animals | Profile known sRNAs, find miRNAs | 85
miRExpress | Animals and plants | Profile known miRNAs | 86
miRProf | Animals and plants | Profile known miRNAs | 84
pssRNAMiner | Plants | Prediction of ta-siRNA | 121
Phasing detection tool | Plants | Prediction of ta-siRNA | 84
SeqBuster | Animals | Analysis of isomiRs | 87
SiLoCo | Plants | Identification of sRNA loci | 84
NiBLS | Plants | Identification of sRNA loci | 92
read counts between sRNAs in multiple samples, it is often advisable to try to group
sRNAs into transcriptional units or loci using either genome or transcript annotations.
The SiLoCo method groups sRNAs such that each locus contains a minimum number of
sRNAs and such that the gap between two consecutive sRNAs is below a maximum gap.84
More recently the NiBLS algorithm, based on graph properties, has been developed.92
This method builds a graph where its nodes are individual sRNAs and edges are created
between sRNAs close to each other. The algorithm identifies subsets of nodes with high
clustering coefficients. The sRNA loci are then defined by the minimum and maximum
coordinates among the sequences in each subset.
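The gap-based grouping described for SiLoCo can be sketched as a single pass over the mapped sRNAs sorted by position. The Python example below follows the two criteria stated in the text (a minimum number of sRNAs per locus and a maximum gap between consecutive sRNAs) but is not the SiLoCo implementation: the function name, the tuple layout and the threshold values used in the example call are assumptions, and strand and chromosome handling are left out.

```python
def group_into_loci(hits, max_gap, min_hits):
    """Group mapped sRNAs into loci.

    hits: iterable of (start, end) coordinates on one chromosome.
    Returns a list of (locus_start, locus_end, number_of_hits).
    """
    loci = []
    current = []     # hits in the locus being built
    current_end = 0  # rightmost end coordinate seen in that locus
    for start, end in sorted(hits):
        if current and start - current_end > max_gap:
            # Gap too large: close the current locus and start a new one.
            if len(current) >= min_hits:
                loci.append((current[0][0], current_end, len(current)))
            current = []
        current_end = end if not current else max(current_end, end)
        current.append((start, end))
    if len(current) >= min_hits:
        loci.append((current[0][0], current_end, len(current)))
    return loci


toy_hits = [(100, 121), (110, 131), (150, 171),
            (1000, 1021), (1010, 1031), (5000, 5021)]
loci = group_into_loci(toy_hits, max_gap=200, min_hits=2)
```

In this toy input the isolated sRNA at position 5000 is discarded because it does not reach the minimum count. A graph-based method such as NiBLS replaces the fixed gap with a criterion based on how densely neighbouring sRNAs are connected.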
Target Identification
Currently, the experimental validation of sRNA targets, and hence function, is
a very time-consuming and expensive process. In plants, sRNA mediated target site
cleavage is widespread. The resulting cleavage products have two properties that facilitate
their recognition. First, the cleavage point is very well defined, i.e., between the nucleotides
that are complementary to nucleotides 10 and 11 of the sRNA. Second, the 3′ cleavage
product is generally stable. A consequence of the first property is that it is possible
to distinguish between cleavage products and other mRNA degradation products.
The implication of the second property is that it is possible to clone and sequence these
cleavage products. Additionally, the 3′ cleavage product contains a 5′ mono-phosphate
group which helps to ligate an RNA oligo, making it possible to identify the exact
cleavage position.93 Traditionally this has been done using a process called 5′ Rapid
Amplification of cDNA Ends (5′ RACE). In essence this process allows the sequencing
of cleavage products from the transcript predicted to be targeted by a given sRNA. The
sequences can then be aligned to the full length mRNA and, if the mRNA is regulated,
then cleavage products should begin at the position after the miRNA-mediated cleavage
was predicted to occur.
Recently a new high-throughput approach has been described which allows researchers
to carry out a high-throughput target validation analysis.94,95 This degradome sequencing
approach captures all cleaved mRNA fragments in the transcriptome of the input sample
and, using suitable bioinformatics tools such as CleaveLand, allows the prediction and
validation of all plant miRNA targets in a single experiment.96
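The logic of checking sequenced cleavage products against a predicted target site can be illustrated with a small Python sketch. It assumes an ungapped sRNA-target duplex in which the last nucleotide of the target site pairs with nucleotide 1 of the sRNA, so that cleavage between the nucleotides paired with sRNA positions 10 and 11 leaves a 3′ product starting ten nucleotides before the end of the site. The function names and the coordinate convention (0-based, end-exclusive) are assumptions of this example; it is not the scoring scheme of CleaveLand or of any other published tool.

```python
def expected_fragment_start(site_end):
    """0-based mRNA position where the 3' cleavage product should begin.

    site_end is the exclusive end of the target site, i.e. position
    site_end - 1 pairs with nucleotide 1 of the sRNA (ungapped duplex assumed).
    """
    return site_end - 10


def cleavage_support(fragment_starts, site_end):
    """Return (supporting, total): fragments starting at the expected position."""
    expected = expected_fragment_start(site_end)
    supporting = sum(1 for start in fragment_starts if start == expected)
    return supporting, len(fragment_starts)


# Toy example: a 21 nt target site occupying mRNA positions 100..120.
supporting, total = cleavage_support([111, 111, 111, 95, 130], site_end=121)
```

Fragments whose 5′ ends pile up at the expected position support sRNA-guided cleavage, while ends scattered along the transcript are more consistent with general mRNA degradation.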
For targets that are not cleaved there are two main approaches used for experimental
validation.97 The first one requires the ability to manipulate the concentration or level of
activity of the sRNA. For real targets the respective protein concentration levels should
change in response to different sRNA activity levels. The second approach consists of
copying the sequence of the target site to a reporter gene and measuring its activity.
The target site should contain not only the region complementary to the sRNA but also
the corresponding flanking regions. For a real target, introducing point mutations to the
copied sequence should result in decreased activity of the reporter gene.
The limitations of experimental methods have led to the development of computational
methods to predict sRNA targets. In most of these methods a combination of three features
is taken into account: sequence complementarity between miRNAs and target sites,
other thermodynamic factors and conservation of putative target sites across species.
Different methods exist to predict the target sites with high degree of complementarity,
more typical of plants, and target sites with low degree of complementarity, more usual
in animals. For the former, the most used method is a rule-based algorithm presented in
a paper by Schwab and colleagues.29 For the latter TargetScan98 and PicTar99 have been
used to produce predictions of conserved miRNA targets. Two other methods, RNA22
and Pita, are widely used to predict nonconserved targets.100,101 The predictions generated
by these methods include many false positives and false negatives, with these rates much
higher in the low complementarity predictions.
Alternative hybrid target prediction methods using microarrays or high-throughput
sequencing have also been developed. One set of approaches relies on measuring the
concentrations of either mRNAs or proteins under different activity levels of a single sRNA.
Targeting relationships are inferred for the genes that show concentration changes and
contain potential target sites. mRNA concentrations can be estimated using microarrays
and, more recently, high-throughput sequencing. Large scale measurement of protein
concentrations is also possible, although with the current technology only a fraction of
proteins can be measured.102,103
In the first experiment that used mRNA measurements obtained after miRNA activity
manipulation, the abundances of miR-1 and miR-124 were increased by transfecting cells
with siRNAs with identical sequences of those two miRNAs.104 Evidence for widespread
targeting at the mRNA level was found by analysing the sequences of downregulated
genes. In particular, the sequence complementary to the first eight nucleotides of the
miRNAs, thought to be important for target recognition, was found to be highly over-
represented among these genes. In other experiments the activity of individual miRNAs
was decreased, either by suppressing activity or deleting the miRNA gene.105-108 In both
types of experiments the reduced miRNA activity is expected to lead to a derepression
of the targeted mRNAs and a consequent increase in mRNA concentrations. Finally, in a
subsequent experiment the effects of overexpressing and suppressing the same miRNA
were jointly assessed.109 The combination of the results from these two experiments
increased the specificity of the set of target candidates. This method of miRNA target
prediction, looking for differences in mRNA concentrations under different miRNA
activities, has two limitations: first, it identifies only the fraction of targets regulated at
the mRNA level; second, some of the predicted targets might be false positives since
the observed mRNA concentration change can be caused by factors other than miRNA
targeting. However, it usually generates a lower number of false positives than purely
computational methods.
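The over-representation analysis mentioned above can be sketched as follows. The Python example derives the sequence that would pair perfectly with the first eight nucleotides of a miRNA and compares the fraction of transcripts containing it among downregulated genes and in a background set. The function names, the toy transcript fragments and the use of a plain fraction instead of a statistical test are assumptions made for illustration; the miRNA shown is an example sequence and not one of the two miRNAs used in the experiment described.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna, length=8):
    """Sequence an mRNA needs in order to pair with the first `length` nt of the miRNA."""
    seed = mirna.upper().replace("T", "U")[:length]
    return seed.translate(COMPLEMENT)[::-1]


def fraction_with_site(transcripts, site):
    """Fraction of transcript sequences that contain the site at least once."""
    if not transcripts:
        return 0.0
    hits = sum(1 for seq in transcripts if site in seq.upper().replace("T", "U"))
    return hits / len(transcripts)


site = seed_site("UGAGGUAGUAGGUUGUAUAGUU")
downregulated = ["AAACUACCUCAGG", "GGGGCUACCUCAUU", "ACGUACGU"]
background = ["ACGUACGU", "UUUUCUACCUCA", "GGGG", "CCCC"]
enrichment = (fraction_with_site(downregulated, site),
              fraction_with_site(background, site))
```

A markedly higher fraction among the downregulated genes than in the background is the kind of signal that points to direct targeting, although, as noted above, expression changes can also arise from indirect effects.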
Another approach using high-throughput measurement of mRNAs has also been
developed in recent years. This approach relies on the immunoprecipitation of RISCs
bound to targeted mRNAs and subsequent profiling of these mRNAs using microarrays.110-113
The profiling must be followed by computational analysis of the sequences of the profiled
mRNAs to identify the putative target sites and sRNAs that target each mRNA. More
recently a study reported the sequencing of a sample where RISC-mRNA and RISC-sRNA
complexes were immuno-precipitated in the same sample and both mRNAs and sRNAs
were subjected to high-throughput sequencing. This combined data allowed for a more
accurate determination of sRNA-mRNA regulatory interactions.114
Further sRNA Characterisation
As we have seen, sRNA classes are partially defined by their biogenesis pathways.
High-throughput sequencing of small RNAs in cells that lack one or more of the
proteins involved in these pathways has been used to classify sRNAs and, in particular,
to distinguish miRNAs from other types of sRNAs.72,115 This might be useful to predict
the targeting pathways in which each sRNA is involved.
High-throughput technologies have also been used to study the transcriptional
regulation of sRNAs. This can be done by immunoprecipitating transcription factors and
the DNA they are bound to, followed by DNA profiling, resorting either to microarrays
or high-throughput sequencing. This method has been used to determine transcription
start sites and promoter regions116 and to establish transcriptional regulatory interactions
between transcription factors and sRNAs.39 These types of studies have allowed the
construction of regulatory networks involving both sRNAs and regulatory proteins.117,118
CONCLUSION
High-throughput methods have been extremely important for the characterization of
sRNAs, namely for the discovery and classification of the genes that encode sRNAs, the
identification of the genes regulated by sRNAs and the genes that regulate the expression
of sRNAs. In the near future the incorporation of new technological developments is
going to have a great influence in the progress of the research. A new method to prepare
libraries for high-throughput sequencing that requires much smaller quantities of RNA
has also been recently published,119 making it possible to generate much more accurate
measurements of RNA concentrations at a single cell level. These new protocols will
allow easier sequencing of bacterial and other non-polyA-tailed RNAs and it is likely that
there will be a rapid increase in research in these areas. New sequencing technologies
capable of single-molecule resolution have been recently described and are expected to
be widely available in the next few years.120 The expected throughput increase and cost
decrease of sequencing will likely cause a shift in the usage of these technologies in the
future: sequencing will be used less to catalogue sRNAs and more to understand their
function, for instance, by simultaneously profiling sRNAs and their targets across multiple
tissues and developmental stages. These datasets will allow us not only to increase the
amount and quality of information on regulatory interactions involving sRNAs, that is, to
uncover the topology of the regulatory networks involving sRNAs, but also to improve
our understanding of the dynamics of these networks and of the regulation of biological
processes by sRNAs.
ACKNOWLEDGEMENTS
The authors thank Dr Frank Schwach for helpful discussions during the preparation
of this manuscript. HP is a student (Instituto Gulbenkian de Ciência's PhD Programme
in Computational Biology, sponsored by Fundação para a Ciência e a Tecnologia [FCT],
Fundação Calouste Gulbenkian and Siemens SA Portugal) and was supported by FCT
fellowship SFRH/BD/…. Work supported by Biotechnology and Biological
Sciences Research Council (BB/E…).
REFERENCES
1. Gottesman S. The small RNA regulators of Escherichia coli: roles and mechanisms. Annual review of
microbiology 2004; 58:303-328.
2. Wagner EG, Simons RW. Antisense RNA control in bacteria, phages, and plasmids. Annual review of
microbiology 1994; 48:713-742.
3. Wassarman KM. Small RNAs in bacteria: diverse regulators of gene expression in response to environmental
changes. Cell 2002; 109(2):141-144.
4. Wightman B, Ha I, Ruvkun G. Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4
mediates temporal pattern formation in C. elegans. Cell.
5. Lee RC, Feinbaum RL, Ambros V. The C. elegans heterochronic gene lin-4 encodes small RNAs with
antisense complementarity to lin-14. Cell 1993; 75(5):843-854.
6. Mueller E, Gilbert J, Davenport G et al. Homology-dependent resistance: transgenic virus resistance in plants
related to homology-dependent gene silencing. Plant J. 1995; 7(6):1001-1013.
7. Voinnet O. Systemic Spread of Sequence-Specific Transgene RNA Degradation in Plants Is Initiated by
Localized Introduction of Ectopic Promoterless DNA. Cell.
8. Nellen W, Lichtenstein C. What makes an mRNA anti-sense-itive? Trends Biochem Sci.
9. Fire A, Xu S, Montgomery MK et al. Potent and specific genetic interference by double-stranded RNA in
Caenorhabditis elegans. Nature 1998; 391(6669):806-811.
10. Carthew RW, Sontheimer EJ. Origins and Mechanisms of miRNAs and siRNAs. Cell.
11. Eulalio A, Huntzinger E, Izaurralde E. Getting to the root of miRNA-mediated gene silencing. Cell
132(1):9-14.
12. Filipowicz W, Jaskiewicz L, Kolb FA et al. Post-transcriptional gene silencing by siRNAs and miRNAs.
Curr Opin Struct Biol 2005; 15(3):331-341.
13. Verdel A, Jia S, Gerber S et al. RNAi-mediated targeting of heterochromatin by the RITS complex. Science
2004; 303(5658):672-676.
14. Ghildiyal M, Zamore PD. Small silencing RNAs: an expanding universe. Nat Rev Genet 2009; 10(2):94-108.
15. Bartel DP. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell.
16. Bushati N, Cohen SM. microRNA Functions. Annu Rev Cell Dev Biol 2007; 23(1):175-205.
17. Jones-Rhoades MW, Bartel DP, Bartel B. MicroRNAs and Their Regulatory Roles in Plants. Annu Rev
Plant Biol 2006; 57:19-53.
18. Cullen BR. Transcription and processing of human microRNA precursors. Mol Cell.
19. Du T, Zamore PD. microPrimer: the biogenesis and function of microRNA. Development 132(21):4645-4652.
20. Faller M, Guo F. MicroRNA biogenesis: there's more than one way to skin a cat. Biochim Biophys Acta
2008; 1779(11):663-667.
21. Kim VN, Han J, Siomi MC. Biogenesis of small RNAs in animals. Nat Rev Mol Cell Biol.
22. Voinnet O. Origin, Biogenesis, and Activity of Plant MicroRNAs. Cell.
23. Kim VN. MicroRNA biogenesis: coordinated cropping and dicing. Nat Rev Mol Cell Biol 2005; 6(5):376-385.
24. Chen X. Small RNAs and Their Roles in Plant Development. Annu Rev Cell Dev Biol.
25. Schwach F, Moxon S, Moulton V et al. Deciphering the diversity of small RNAs in plants: the long and
short of it. Brief Funct Genomic Proteomic.
26. Okamura K, Lai EC. Endogenous small interfering RNAs in animals. Nat Rev Mol Cell Biol 9(9):673-678.
27. Thomson T, Lin H. The biogenesis and function of PIWI proteins and piRNAs: progress and prospect.
Annu Rev Cell Dev Biol 2009; 25(1):355-376.
28. Filipowicz W, Bhattacharyya SN, Sonenberg N. Mechanisms of post-transcriptional regulation by microRNAs:
are the answers in sight? Nat Rev Genet.
29. Schwab R, Palatnik JF, Riester M et al. Specific Effects of MicroRNAs on the Plant Transcriptome. Dev
Cell 2005; 8(4):517-527.
30. Jones-Rhoades MW, Bartel DP. Computational identification of plant microRNAs and their targets, including
a stress-induced miRNA. Mol Cell 2004; 14(6):787-799.
31. Rajewsky N. microRNA target predictions in animals. Nat Genet (Suppl).
32. Bartel DP. MicroRNAs: target recognition and regulatory functions. Cell.
33. Ding XC, Weiler J, Grosshans H. Regulating the regulators: mechanisms controlling the maturation of
microRNAs. Trends in biotechnology 2009; 27(1):27-36.
34. Muralidhar B, Goldstein L, Ng G et al. Global microRNA profiles in cervical squamous cell carcinoma
depend on Drosha expression levels. The Journal of Pathology.
35. Xie Z, Kasschau KD, Carrington JC. Negative feedback regulation of dicer-like1 in Arabidopsis by
microRNA-guided mRNA degradation. Current Biology 2003; 13(9):784-789.
36. Piriyapongsa J, Jordan IK. Dual coding of siRNAs and miRNAs by plant transposable elements. RNA
2008; 14(5):814-821.
37. Giraldez AJ, Mishima Y, Rihel J et al. Zebrafish MiR-430 promotes deadenylation and clearance of maternal
mRNAs. Science 2006; 312(5770):75-79.
38. Zhao X, Fjose A, Larsen N et al. Treatment with small interfering RNA affects the microRNA pathway
and causes unspecific defects in zebrafish embryos. FEBS Journal.
39. Marson A, Levine SS, Cole MF et al. Connecting microRNA genes to the core transcriptional regulatory
circuitry of embryonic stem cells. Cell.
40. Zhou X, Ruan J, Wang G et al. Characterization and identification of microRNA core promoters in four
model species. PLoS Comput Biol 2007; 3(3).
41. Megraw M, Baev V, Rusinov V et al. MicroRNA promoter element discovery in Arabidopsis. RNA 2006;
12(9):1612-1619.
42. Alwine JC, Kemp DJ, Stark GR. Method for detection of specific RNAs in agarose gels by transfer to
diazobenzyloxymethyl-paper and hybridization with DNA probes. Proc Natl Acad Sci USA 1977;
74(12):5350-5354.
43. Becker-André M, Hahlbrock K. Absolute mRNA quantification using the polymerase chain reaction
(PCR). A novel approach by a PCR aided transcript titration assay (PATTY). Nucleic Acids Res 1989;
17(22):9437-9446.
44. Brown PO, Botstein D. Exploring the new world of the genome with DNA microarrays. Nat Genet
21(1 Suppl):33-37.
45. Reinartz J, Bruyns E, Lin J et al. Massively parallel signature sequencing (MPSS) as a tool for in-depth
quantitative gene expression profiling in all organisms. Brief Funct Genomic Proteomic.
46. Margulies M, Egholm M, Altman WE et al. Genome sequencing in microfabricated high-density picolitre
reactors. Nature 2005; 437(7057):376-380.
47. Bennett S. Solexa Ltd. Pharmacogenomics 2004; 5(4):433-438.
48. Shendure J, Porreca GJ, Reppas NB et al. Accurate multiplex polony sequencing of an evolved bacterial
genome. Science 2005; 309(5741):1728-1732.
49. Willenbrock H, Salomon J, Søkilde R et al. Quantitative miRNA expression analysis: comparing microarrays
with next-generation sequencing. RNA (New York, N.Y.) 2009; 15(11):2028-2034.
50. Linsen SE, de Wit E, Janssens G et al. Limitations and possibilities of small RNA digital gene expression
profiling. Nature methods.
51. Lai E, Tomancak P, Williams R et al. Computational identification of Drosophila microRNA genes.
Genome Biol 2003; 4(7).
52. Lim LP, Glasner ME, Yekta S et al. Vertebrate microRNA genes. Science 2003; 299(5612).
53. Bonnet E, Wuyts J, Rouzé P et al. Detection of 91 potential conserved plant microRNAs in Arabidopsis thaliana
and Oryza sativa identifies important target genes. Proc Natl Acad Sci USA.
54. Wang X, Zhang J, Li F et al. MicroRNA identification based on sequence and structure alignment.
Bioinformatics.
55. Lim LP, Lau NC, Weinstein EG et al. The microRNAs of Caenorhabditis elegans. Genes Dev 17(8):991-1008.
56. Hofacker IL. Vienna RNA secondary structure server. Nucleic Acids Res.
57. Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res
31(13):3406-3415.
58. Adai A, Johnson C, Mlotshwa S et al. Computational prediction of miRNAs in Arabidopsis thaliana.
Genome Res 2005; 15(1):78-91.
59. Barakat A, Wall K, Leebens-Mack J et al. Large-scale identification of microRNAs from a basal eudicot
(Eschscholzia californica) and conservation in flowering plants. Plant J: for cell and molecular biology
2007; 51(6):991-1003.
60. Bentwich I, Avniel A, Karov Y et al. Identification of hundreds of conserved and nonconserved human
microRNAs. Nat Genet 2005; 37(7):766-770.
61. Fahlgren N, Howell MD, Kasschau KD et al. High-throughput sequencing of Arabidopsis microRNAs:
evidence for frequent birth and death of MIRNA genes. PloS one.
62. Szittya G, Moxon S, Santos DM et al. High-throughput sequencing of Medicago truncatula short RNAs
identifies eight new miRNA families. BMC genomics.
63. Yao Y, Guo G, Ni Z et al. Cloning and characterization of microRNAs from wheat (Triticum aestivum
L.). Genome Biol 2007; 8(6):R96.
64. Axtell MJ, Bartel DP. Antiquity of microRNAs and their targets in land plants. Plant Cell 17(6):1658-1673.
65. Lagos-Quintana M, Rauhut R, Lendeckel W et al. Identification of novel genes coding for small expressed
RNAs. Science (New York, N.Y.) 2001; 294(5543):853-858.
66. Lee RC, Ambros V. An extensive class of small RNAs in Caenorhabditis elegans. Science (New York,
N.Y.) 2001; 294(5543):862-864.
67. Lau NC, Lim LP, Weinstein EG et al. An abundant class of tiny RNAs with probable regulatory roles in
Caenorhabditis elegans. Science 2001; 294(5543):858-862.
68. Berezikov E, Thuemmler F, Laake LW et al. Diversity of microRNAs in human and chimpanzee brain.
Nat Genet 2006; 38(12):1375-1377.
69. Lee H, Chang S, Choudhary S et al. qiRNA is a new type of small interfering RNA induced by DNA
damage. Nature 2009; 459(7244):274-277.
70. Ghildiyal M, Seitz H, Horwich MD et al. Endogenous siRNAs derived from transposons and mRNAs in
Drosophila somatic cells. Science 2008; 320(5879):1077-1081.
71. Wang XJ, Reyes J, Chua NH et al. Prediction and identification of Arabidopsis thaliana microRNAs and
their mRNA targets. Genome Biol 2004; 5(9).
72. Lu C, Kulkarni K, Souret FF et al. MicroRNAs and other small RNAs enriched in the Arabidopsis
RNA-dependent RNA polymerase-2 mutant. Genome Res 2006; 16(10):1276-1288.
73. Moxon S, Jing R, Szittya G et al. Deep sequencing of tomato short RNAs identifies microRNAs targeting
genes involved in fruit ripening. Genome Res.
74. Glazov EA, Cottee PA, Barris WC et al. A microRNA catalog of the developing chicken embryo identified
by a deep sequencing approach. Genome Res 2008; 18(6):957-964.
75. Kuchenbauer F, Morin RD, Argiropoulos B et al. In-depth characterization of the microRNA transcriptome
in a leukemia progression model. Genome Res 2008; 18(11):1787-1797.
76. Morin RD, O'Connor MD, Griffith M et al. Application of massively parallel sequencing to microRNA
profiling and discovery in human embryonic stem cells. Genome Res.
77. Goff LA, Davila J, Swerdel MR et al. Ago2 immunoprecipitation identifies predicted microRNAs in human
embryonic stem cells and neural precursors. PLoS ONE 2009; 4(9).
78. Cai Y, Yu X, Zhou Q et al. Novel microRNAs in silkworm (Bombyx mori). Funct Integr Genomics
10(3):405-15.
79. Trapnell C, Salzberg SL. How to map billions of short reads onto genomes. Nat Biotechnol.
80. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics
(Oxford, England).
81. Langmead B, Trapnell C, Pop M et al. Ultrafast and memory-efficient alignment of short DNA sequences
to the human genome. Genome Biol 2009; 10(3).
82. Prüfer K, Stenzel U, Dannemann M et al. PatMaN: rapid alignment of short sequences to large databases.
Bioinformatics.
83. Friedländer MR, Chen W, Adamidi C et al. Discovering microRNAs from deep sequencing data using
miRDeep. Nat Biotech 2008; 26(4):407-415.
84. Moxon S, Schwach F, Dalmay T et al. A toolkit for analysing large-scale plant small RNA datasets.
Bioinformatics.
85. Hackenberg M, Sturm M, Langenberger D et al. miRanalyzer: a microRNA detection and analysis tool for
next-generation sequencing experiments. Nucl Acids Res (suppl).
86. Wang WC, Lin FM, Chang WC et al. miRExpress: Analyzing high-throughput sequencing data for profiling
microRNA expression. BMC Bioinformatics.
87. Pantano L, Estivill X, Martí E. SeqBuster, a bioinformatic tool for the processing and analysis of small
RNAs datasets, reveals ubiquitous miRNA modifications in human embryonic cells. Nucleic Acids
Res 2010; 38(5):e34.
88. Griffiths-Jones S, Saini HK, van Dongen S et al. miRBase: tools for microRNA genomics. Nucleic Acids
Res (suppl).
89. Axtell MJ, Jan C, Rajagopalan R et al. A two-hit trigger for siRNA biogenesis in plants. Cell 127(3):565-577.
90. Chen H, Li Y, Wu S. Bioinformatic prediction and experimental validation of a microRNA-directed tandem
trans-acting siRNA cascade in Arabidopsis. Proc Natl Acad Sci USA 2007; 104(9):3318-3323.
91. Elbashir SM, Lendeckel W, Tuschl T. RNA interference is mediated by 21- and 22-nucleotide RNAs.
Genes Dev 2001; 15(2):188-200.
92. MacLean D, Moulton V, Studholme DJ. Finding sRNA generative locales from high-throughput sequencing
data with NiBLS. BMC Bioinformatics.
93. Llave C, Kasschau KD, Rector MA et al. Endogenous and silencing-associated small RNAs in plants. The
Plant cell 2002; 14(7):1605-1619.
94. German MA, Pillay M, Jeong DH et al. Global identification of microRNA-target RNA pairs by parallel
analysis of RNA ends. Nat Biotechnol.
95. Addo-Quaye C, Eshoo TW, Bartel DP et al. Endogenous siRNA and miRNA targets identified by sequencing
of the Arabidopsis degradome. Current Biology.
96. Addo-Quaye C, Miller W, Axtell MJ. CleaveLand: a pipeline for using degradome data to find cleaved
small RNA targets. Bioinformatics (Oxford, England).
97. Kuhn DE, Martin MM, Feldman DS et al. Experimental validation of miRNA targets. Methods (San Diego,
Calif).
98. Lewis BP, Burge CB, Bartel DP. Conserved seed pairing, often flanked by adenosines, indicates that
thousands of human genes are microRNA targets. Cell.
99. Krek A, Grün D, Poy MN et al. Combinatorial microRNA target predictions. Nat Genet.
100. Kertesz M, Iovino N, Unnerstall U et al. The role of site accessibility in microRNA target recognition.
Nat Genet 2007; 39(10):1278-1284.
101. Miranda KC, Huynh T, Tay Y et al. A pattern-based method for the identification of MicroRNA binding
sites and their corresponding heteroduplexes. Cell 2006; 126(6):1203-1217.
102. Baek D, Villen J, Shin C et al. The impact of microRNAs on protein output. Nature.
103. Selbach M, Schwanhäusser B, Thierfelder N et al. Widespread changes in protein synthesis induced by
microRNAs. Nature 2008; 455(7209):58-63.
104. Lim LP, Lau NC, Garrett-Engele P et al. Microarray analysis shows that some microRNAs downregulate
large numbers of target mRNAs. Nature.
105. Krützfeldt J, Rajewsky N, Braich R et al. Silencing of microRNAs in vivo with 'antagomirs'. Nature
2005; 438(7068):685-689.
106. Elmen J, Lindow M, Silahtaroglu A et al. Antagonism of microRNA-122 in mice by systemically
administered LNA-antimiR leads to up-regulation of a large set of predicted target mRNAs in the liver.
Nucleic Acids Res 2007; 36(4):1153-1162.
107. Elmén J, Lindow M, Schütz S et al. LNA-mediated microRNA silencing in non-human primates. Nature
2008; 452(7189):896-899.
108. Zhao Y, Ransom JF, Li A et al. Dysregulation of cardiogenesis, cardiac conduction, and cell cycle in mice
lacking miRNA-1-2. Cell 2007; 129(2):303-317.
109. Nicolas FE, Pais H, Schwach F et al. Experimental identification of microRNA-140 targets by silencing
and overexpressing miR-140. RNA 2008; 14(12):2513-2520.
110. Zhang L, Hammell M, Kudlow BA et al. Systematic analysis of dynamic miRNA-target interactions
during C. elegans development. Development 2009; 136(18):3043-3055.
111. Karginov FV, Conaco C, Xuan Z et al. A biochemical approach to identifying microRNA targets. Proc
Natl Acad Sci USA 2007; 104(49):19291-19296.
112. Easow G, Teleman AA, Cohen SM. Isolation of microRNA targets by miRNP immunopurification. RNA
2007; 13(8):1198-1204.
113. Hong X, Hammell M, Ambros V et al. Immunopurification of Ago1 miRNPs selects for a distinct class
of microRNA targets. Proc Natl Acad Sci USA.
114. Chi SW, Zang JB, Mele A et al. Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps.
Nature 2009; 460(7254):479-486.
115. Babiarz JE, Ruby JG, Wang Y et al. Mouse ES cells express endogenous shRNAs, siRNAs and other
Microprocessor-independent, Dicer-dependent small RNAs. Genes Dev 2008; 22(20):2773-2785.
116. Corcoran DL, Pandit KV, Gordon B et al. Features of Mammalian microRNA Promoters Emerge from
Polymerase II Chromatin Immunoprecipitation Data. PLoS ONE 2009; 4(4).
117. Martinez NJ, Ow MC, Barrasa MI et al. A C. elegans genome-scale microRNA network contains composite
feedback motifs with high flux capacity. Genes Dev.
118. Shalgi R, Lieber D, Oren M et al. Global and local architecture of the mammalian microRNA-transcription
factor regulatory network. PLoS Comput Biol.
119. White R, Blainey P, Fan HC et al. Digital PCR provides sensitive and absolute calibration for high
throughput sequencing. BMC Genomics 2009; 10:116.
120. Ansorge WJ. Next-generation DNA sequencing techniques. N Biotechnol 2009; 25(4):195-203.
121. Dai X, Zhao PX. pssRNAMiner: a plant short small RNA regulatory cascade analysis server. Nucleic
Acids Res 2008; 36(Web Server issue):W114-W118.