Briefings in Bioinformatics Advance Access published December 22, 2016
Briefings in Bioinformatics, 2016, 1–11
doi: 10.1093/bib/bbw117
Paper
Logic programming to infer complex RNA expression
patterns from RNA-seq data
*These authors contributed equally to this work.
Corresponding author. Shizuka Uchida, Cardiovascular Innovation Institute, University of Louisville, 302 E Muhammad Ali Blvd, Louisville, KY 40202, USA.
Tel.: 502-854-0570; Fax: 502-852-7195; E-mail: [email protected]
Abstract
To meet the increasing demand in the field, numerous long noncoding RNA (lncRNA) databases are available. Although many
lncRNAs are expressed specifically in certain cell types and/or in a time-dependent manner, most lncRNA databases fall short
of providing such profiles. We developed a strategy using logic programming to handle the complex organization of organs,
their tissues and cell types as well as gender and developmental time points. To showcase this strategy, we introduce
‘RenalDB’ (http://renaldb.uni-frankfurt.de), a database providing expression profiles of RNAs in major organs focusing on
kidney tissues and cells. RenalDB uses logic programming to describe complex anatomy, sample metadata and logical relationships defining expression, enrichment or specificity. We validated the content of RenalDB with biological experiments
and functionally characterized two long intergenic noncoding RNAs: LOC440173 is important for cell growth or cell survival,
whereas PAXIP1-AS1 is a regulator of cell death. We anticipate RenalDB will be used as a first step toward functional studies
of lncRNAs in the kidney.
Key words: gene expression; kidney; lncRNA; microarray; RNA-seq
Introduction
A noncoding RNA (ncRNA) is any expressed transcript that is
not translated into a protein. ncRNAs largely outnumber
protein-coding transcripts and contain many sub-classes, such
as ribosomal RNAs, transfer RNAs and microRNAs. Long noncoding RNAs (lncRNAs) are defined as any ncRNA longer than
200 nt [1]. Interest in lncRNAs stems from their high abundance,
their importance to many biological functions and the relatively small
number that have been characterized, indicating a wealth of discoveries waiting to be made [2–9]. RNA sequencing (RNA-seq) is an
essential tool for studying lncRNAs and is widely used to screen
for lncRNAs. Furthermore, the majority of RNA-seq data from
published studies is publicly and freely available. Thus, collections of these data can be re-analyzed to test hypotheses outside of the studies they were published in.
Tyler Weirick is a senior Research Technologist at the University of Louisville and a PhD student at the Institute of Cardiovascular Regeneration (Uchida
Lab), who is focused on elucidating the evolutional conservation of long noncoding RNAs (lncRNAs).
Giuseppe Militello is a senior Research Technologist at the University of Louisville and a PhD student at the Institute of Cardiovascular Regeneration
(Uchida Lab), who is working with lncRNAs in the skeletal muscle.
Yuliya Ponomareva is a PhD student at the Institute of Cardiovascular Regeneration (Uchida Lab), who is working with lncRNAs in the heart and stem
cells.
David John is a PhD student at the Institute of Cardiovascular Regeneration (Uchida Lab), who is developing computational algorithms and pipelines to
identify RNA modification events.
Dr Claudia Döring is a bioinformatics scientist and laboratory manager of the RNA laboratory at the Dr Senckenberg Institute of Pathology, who is focused
on gene expression and next-generation sequencing analysis especially in lymphoma diseases.
Prof. Dr Stefanie Dimmeler is the director of the Institutes of Cardiovascular Regeneration.
Dr Shizuka Uchida is an Associate Professor of Medicine at the University of Louisville and an Independent Junior Group Leader at the Institute of
Cardiovascular Regeneration. His laboratory (‘Cardiovascular Bioinformatics’: http://heartlncrna.github.io) is interested in elucidating the functions of
lncRNAs using dry and wet laboratory techniques.
Submitted: 11 August 2016; Received (in revised form): 18 October 2016
Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.
Downloaded from http://bib.oxfordjournals.org/ at University of Louisville on January 12, 2017
Tyler Weirick,* Giuseppe Militello,* Yuliya Ponomareva, David John,
Claudia Döring, Stefanie Dimmeler and Shizuka Uchida
Methods
The RenalDB database
The primary data analysis for RenalDB was performed using the
Snakemake workflow engine [13]. The Snakemake pipelines
used in this study are provided at https://bitbucket.org/tweirick/
renaldb. In the pipeline, RNA-seq data sets were downloaded
from the NCBI Sequence Read Archive (SRA) as SRA files [14, 15].
Fastq-dump (version 2.1.7) was used to convert the SRA files to
fastq files (http://www.ncbi.nlm.nih.gov/sra). STAR [16] (version
2.5.1b) was used to align the reads using the genome annotation
files from the Ensembl database (http://www.ensembl.org/info/
data/ftp/index.html; version 83). HTSeq [17] (version 0.6.1.p2)
was used to extract read counts. Conditions in which <25% of
sequences were expressed were discarded. For paired-end
reads, gene-level and transcript-level counts were obtained. For
single-end reads, only the gene-level counts were measured to
avoid problems assigning counts to similar isoforms. DESeq2
[18] (version 3.2) was used to perform between-sample normalization on the read counts. Finally, the normalized counts for each sequence were divided
by the sequence’s effective length and multiplied by
1e3 for sequence length normalization, which is similar to the
calculation of Fragments Per Kilobase of transcript per Million
mapped reads (FPKM) values. The FPKM calculation
also includes the total library size in its denominator. However,
this division was omitted because between-sample normalization had already been applied.
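The length normalization described above can be illustrated with a minimal sketch. The function and parameter names are hypothetical and are not taken from the RenalDB pipeline; the input is assumed to be a count that has already been normalized between samples. A standard FPKM function is included only for comparison, to show the library-size division that was left out.

```python
def length_normalize(normalized_count: float, effective_length: float) -> float:
    """Scale a between-sample-normalized count to a per-kilobase value.

    Hypothetical helper: divides by the effective length and multiplies by 1e3,
    without the library-size division used in FPKM.
    """
    if effective_length <= 0:
        raise ValueError("effective_length must be positive")
    return normalized_count / effective_length * 1e3


def fpkm(raw_count: float, effective_length: float, library_size: float) -> float:
    """Standard FPKM, shown for comparison: it also divides by the library size."""
    if effective_length <= 0 or library_size <= 0:
        raise ValueError("effective_length and library_size must be positive")
    return raw_count * 1e9 / (effective_length * library_size)


if __name__ == "__main__":
    # Illustrative numbers only: 500 normalized counts on a 2000-nt transcript.
    print(length_normalize(500, 2000))
    print(fpkm(500, 2000, 1e7))
```

Because the between-sample normalization already accounts for sequencing depth, only the length term remains in the first function.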
The information and analyzed RNA-seq data were stored in
a MySQL database (Figure 1A). The Datalog knowledge bases are
shown in Figure 1B–D, which includes examples of the code
describing kidney anatomy (Figure 1B), experiments and their
corresponding accession IDs (Figure 1C) and relationships between anatomical objects and the corresponding information
about their RNA expression patterns (Figure 1D). The web interface was built using the Django web framework, and Datalog
processing was handled by the pyDatalog package. Gene Ontology
(GO) annotations were obtained from the GO annotations available via the BioMart Community Portal (www.biomart.org) [19].
For each GO term, a link to AmiGO 2 (http://amigo.geneontology.org/amigo)
[20] is provided. In RenalDB, the UGAHash accession system was used for primary IDs [9]. RenalDB will be updated twice
a year to include the latest publicly available RNA-seq data sets.
Culturing of cells, quantitative reverse transcription
polymerase chain reaction and siRNAs
‘Human Embryonic Kidney 293’ (HEK-293) cells were cultured in
the growth medium consisting of DMEM with low glucose and
pyruvate (Life Technologies) supplemented with 10% FBS (Life
Technologies) and antibiotics (100 units of penicillin and 100 µg of
streptomycin per ml, Sigma-Aldrich) at 37 °C in a humidified atmosphere containing 5% CO2.
RNA was isolated with TRIzol reagent, purified and treated
with TURBO DNase (Life Technologies) before reverse transcription. The primer pairs were designed using Primer3
(http://bioinfo.ut.ee/primer3-0.4.0/) [21] and validated in silico with UCSC in
silico polymerase chain reaction (PCR; https://genome.ucsc.edu/cgi-bin/hgPcr)
before being tested experimentally for the
presence of a single band of the expected size for each primer
pair. The list of primer pairs used in this study can be found in
Supplementary Table S1.
For human tissues, purified RNA was purchased from commercial vendors as follows: Human Total RNA Master Panel II
(Clontech, #636643, Lot Number 1202050A); and human heart
(Amsbio, #R1234122-50, Lot Number A804058).
Transient transfection of siRNA duplexes (MISSION, Sigma-Aldrich; 10 nM and 100 nM final concentration for LOC440173
and PAXIP1-AS1, respectively; Supplementary Table S1) was carried out using RNAiMax (Life Technologies) according to the
manufacturer’s protocol. The corresponding amount of control
siRNA (MISSION Negative control SIC002, confidential sequence;
Sigma-Aldrich) was used. Forty-eight hours after the transfection of siRNAs, cells were exposed to TRIzol to extract RNA.
After the purification and treatment of RNA with TURBO
DNase (Life Technologies), 1 µg of RNA was reverse transcribed
with SuperScript VILO Master Mix (Life Technologies). The first-strand cDNA was diluted to a concentration of 5 ng/µl. For
quantitative reverse transcription polymerase chain reaction
(qRT-PCR), 1 µl (5 ng) of the cDNA template was used with Fast
SYBR Green Master Mix (Life Technologies) on a StepOne Plus
Real-Time PCR System (Applied Biosystems) with the following
thermal cycling conditions: 95 °C for 20 s followed by 40 cycles of
95 °C for 3 s and 60 °C for 30 s. Relative fold expression was calculated by the 2^(−ΔΔCt) method using Gapdh as an internal control.
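As an illustration of the 2^(−ΔΔCt) calculation, the following sketch computes the relative fold expression from four threshold cycle (Ct) values. The function name, argument names and Ct values are hypothetical and serve only to show the arithmetic; they are not data from this study.

```python
def relative_fold_expression(
    ct_target_treated: float,
    ct_reference_treated: float,
    ct_target_control: float,
    ct_reference_control: float,
) -> float:
    """Relative expression of a target gene by the 2^(-ddCt) method.

    The reference gene (e.g. Gapdh) serves as the internal control in both the
    treated sample and the control sample.
    """
    delta_ct_treated = ct_target_treated - ct_reference_treated
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Made-up Ct values: the target crosses the threshold two cycles later
    # after treatment, while the reference gene is unchanged.
    print(relative_fold_expression(26.0, 18.0, 24.0, 18.0))
```

A value below 1 indicates lower expression in the treated sample than in the control, as would be expected after efficient siRNA silencing.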
Cell viability assay
A total of 200 000 HEK-293 cells were plated in each well of a six-well plate. On the following day, siRNAs were transfected. Twenty-four
hours after the transfection, hydrogen peroxide (Sigma-Aldrich,
#H1009) was added at a final concentration of 50 mM. The next
day, cells were detached, stained with Trypan Blue (Sigma-Aldrich,
cat. T8154) and counted using a Neubauer chamber.
Once a set of RNA-seq data sets has been assembled, there
are a number of methods for describing the expression specificity of RNAs. Each method falls into one of two categories. The
first category describes whether a sequence is tissue specific or
ubiquitously expressed. The second describes how specific
a sequence is to a certain tissue (referred to as global and relative specificity, respectively, in this article) [10]. The basic
requirement of many methods for describing specificity is that
the samples must be at a similar level of the anatomical hierarchy. This is a concern, considering that the expression profiles of some lncRNAs are known to be much more complicated than
simple expression in one tissue [11].
To meet these needs, we propose to extend the current methods with logic programming, which allows much greater
logical nuance to be used when processing data. Logic programming is a programming paradigm based on formal logic and is
composed of facts, rules and queries [12]. Here, we created sets
of logic programming facts describing the metadata of RNA-seq
samples and the anatomical organization of kidneys. We then
created logic programming rules describing what expression,
enrichment and specificity mean in these contexts. These facts
and rules were used to perform logic programming queries to
find the hierarchical expression, enrichment and specificity of
RNAs within various RNA-seq data sets. To this end, we introduce a new relational database and Datalog knowledge base for
nephrology called ‘RenalDB’ (http://renaldb.uni-frankfurt.de) to
serve the needs of researchers working with kidneys and
to serve as an example of the logic programming technique for
bioinformaticians. Additionally, logic programming is used
within RenalDB to extend the utility of our SQL-based advanced
search and to determine the layout of the hierarchical tree
structures shown with the expression heat maps in RenalDB.
Furthermore, given that most of the available lncRNA databases
were released without biological validation, we provide biological experiments validating the expression of lncRNAs
included in RenalDB as well as functional data to earn the confidence of potential users of RenalDB.
are separated based on the type of information they contain. (B) An example of the facts describing kidney anatomy. This series of facts describes the relationships
(e.g. contains, develops from) among anatomical objects (e.g. organism, tissue, cell). (C) An example of the facts describing the experiments included in RenalDB. (D)
An example of the rules describing how expression, enrichment and specificity are defined. These high-level logical statements are then used as queries on the anatomical
and experimental databases to determine whether the gene/transcript is expressed, enriched or specific to various anatomical objects.
Microarray experiments and data analysis
GeneChip® Human Gene 1.0 ST Arrays (Affymetrix) were used according
to the manufacturer’s protocol and scanned. The CEL
files were analyzed through the updated version of noncoder
web interface (http://noncoder.mpi-bn.mpg.de) [22] using the
pipeline setup for Gene Array Analyzer web interface (http://gaa.
mpi-bn.mpg.de) [23]. After normalization by Robust Multi-array Average [24] and the application of moderated t-statistics via
the Limma package [25], Transcript Cluster IDs that did not match
a gene or that matched multiple genes were discarded. Then, the
standard deviation of each Transcript Cluster ID was calculated across samples. For a gene that
matched multiple Transcript Cluster IDs, the Transcript
Cluster ID with the highest standard deviation across samples
was kept for further analysis.
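The filtering of Transcript Cluster IDs can be sketched as follows. This is a minimal illustration under stated assumptions and not the code of the noncoder or Gene Array Analyzer pipelines: the function name, the dictionary-based inputs and the example identifiers are all hypothetical.

```python
from statistics import stdev


def select_transcript_clusters(expression, cluster_to_genes):
    """Keep one Transcript Cluster ID per gene.

    expression: dict mapping a cluster ID to its expression values across samples
                (at least two samples per cluster).
    cluster_to_genes: dict mapping a cluster ID to the list of genes it matches.
    Clusters matching no gene or several genes are discarded; for each remaining
    gene, the cluster with the highest standard deviation across samples is kept.
    """
    best = {}  # gene -> (cluster ID, standard deviation)
    for cluster_id, values in expression.items():
        genes = cluster_to_genes.get(cluster_id, [])
        if len(genes) != 1:
            continue  # unannotated or ambiguous cluster
        gene = genes[0]
        sd = stdev(values)
        if gene not in best or sd > best[gene][1]:
            best[gene] = (cluster_id, sd)
    return {gene: cluster_id for gene, (cluster_id, _) in best.items()}


if __name__ == "__main__":
    # Hypothetical identifiers and values, for illustration only.
    expression = {
        "tc1": [1.0, 2.0, 3.0],
        "tc2": [1.0, 5.0, 9.0],
        "tc3": [2.0, 2.0, 2.0],
        "tc4": [0.0, 10.0, 20.0],
    }
    cluster_to_genes = {
        "tc1": ["GENE_A"],
        "tc2": ["GENE_A"],
        "tc3": [],
        "tc4": ["GENE_B", "GENE_C"],
    }
    print(select_transcript_clusters(expression, cluster_to_genes))
```

Keeping the most variable cluster per gene favors the probe set that best discriminates between conditions.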
All the microarray data in this study were deposited in the
Gene Expression Omnibus (GSE74325). The analyzed data can be
accessed via our noncoder web interface (http://noncoder.mpi-bn.mpg.de/) [22] using ‘Kidney’ as the user name and password.
GO analyses were performed using DAVID (https://david.
ncifcrf.gov/home.jsp) [26].
Statistics
Data are presented as mean ± SEM. A two-sample, two-tailed,
heteroscedastic Student’s t-test was performed to calculate
p-values using Microsoft Excel.
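For readers who wish to reproduce these statistics outside of a spreadsheet, the sketch below computes the mean, the SEM and the heteroscedastic (Welch) t statistic with its approximate degrees of freedom. The function names and the example values are hypothetical. The p-value itself is not computed here because the Python standard library has no t-distribution; a statistics package would be needed for that step (e.g. `scipy.stats.ttest_ind` with `equal_var=False`).

```python
from math import sqrt
from statistics import mean, variance


def mean_sem(values):
    """Return the mean and the standard error of the mean (SEM)."""
    return mean(values), sqrt(variance(values) / len(values))


def welch_t(a, b):
    """Return the unequal-variance t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Made-up measurements, for illustration only.
    control = [1.0, 2.0, 3.0]
    treated = [2.0, 4.0, 6.0]
    print(mean_sem(control))
    print(welch_t(control, treated))
```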
Figure 1. RenalDB. (A) The database schema for the relational database portion of RenalDB. (B–D) Examples of the knowledge bases used in RenalDB. Knowledge bases
Table 1. List of public databases of lncRNAs with their expression profiles
Database name | Organism(s) | Samples | Technology | URL
ALDB^a | Chicken, cow, pig | Tissues | RNA-seq | http://res.xaut.edu.cn/aldb/index.jsp
ANGIOGENES^a | Human, mouse, zebrafish | Cell lines, tissues | RNA-seq | http://angiogenes.uni-frankfurt.de
C-It-Loci^a | Human, mouse, zebrafish | Tissues | RNA-seq | http://c-it-loci.uni-frankfurt.de
ChIPBase^a | Human | Tissues | RNA-seq | http://deepbase.sysu.edu.cn/chipbase/index.php
Co-LncRNA^a | Human | Cancers, cell lines, tissues | RNA-seq | http://www.bio-bigdata.com/Co-LncRNA/
deepBase v2.0 | Chicken, chimpanzee, cow, gorilla, fly, frog, human, monkey, mouse, opossum, platypus, rat, worm, zebrafish | Cell lines, tissues | RNA-seq | http://biocenter.sysu.edu.cn/deepBase/index.php
Expression Atlas^a | Many | Many | Many | https://www.ebi.ac.uk/gxa/home
GEO Profiles^a | Many | Many | Many | https://www.ncbi.nlm.nih.gov/geoprofiles/
LncBase v.2 | Human, mouse | Cell lines, tissues | RNA-seq | http://carolina.imis.athena-innovation.gr/index.php?r=lncbasev2
Human Body Map lincRNAs^a | Human | Tissues | FISH, RNA-seq | http://www.broadinstitute.org/genome_bio/human_lincrnas/
lncRNA2function^a | Human | Tissues | RNA-seq | http://mlg.hit.edu.cn/lncrna2function/index.jsp
lncRNAdb v2.0 | Human | Tissues | RNA-seq | http://www.lncrnadb.org
lncRNAMap | Human | Cancers, tissues | RNA-seq | http://lncrnamap.mbc.nctu.edu.tw/php/index.php
lncRNAtor^a | Fly, human, mouse, worm, zebrafish | Cancers, diseases, tissues | RNA-seq | http://lncrnator.ewha.ac.kr/index.htm
MTD^a | Human, pig, rat, mouse | Cell lines, tissues | RNA-seq | http://mtd.cbi.ac.cn
NONCODE 2016 | Human, mouse | Tissues | RNA-seq | http://www.noncode.org
NRED^a | Human, mouse | Cell lines, tissues | ISH, microarray | http://jsm-research.imb.uq.edu.au/nred/cgi-bin/ncrnadb.pl
TANRIC | Human | Cancers, cell lines | RNA-seq | http://ibl.mdanderson.org/tanric/_design/basic/index.html
TF2lncRNA | Human | Tissues | RNA-seq | http://mlg.hit.edu.cn/tf2lncrna/index.jsp
^a The databases that contain both protein-coding genes and lncRNAs.
FISH, fluorescent in situ hybridization; ISH, in situ hybridization.
Results
Survey of public lncRNA databases with expression data
Increasing research interests in the field of lncRNAs have
prompted the building of databases to cover the expression profiles of lncRNAs in various conditions and organisms. Currently,
there are 19 public databases that contain expression data for
lncRNAs (Table 1). These are ALDB [27], ANGIOGENES [28], C-It-Loci [8], ChIPBase [29], Co-LncRNA [30], deepBase v2.0 [31],
Expression Atlas [32], GEO Profiles [33], LncBase v.2 [34], Human
Body Map long intergenic noncoding RNAs (lincRNAs) [35],
lncRNA2function [36], lncRNAdb v2.0 [37], lncRNAMap [38],
lncRNAtor [39], MTD [40], NONCODE 2016 [41], NRED [42],
TANRIC [43] and TF2lncRNA [44]. Of note, some lncRNA-focused
databases, such as LNCipedia [45], were not included in Table 1
because they do not contain expression profiles. The general
trend of these public databases is to provide a comprehensive
view of the expressions of lncRNAs in various conditions. In all
databases except NRED, the expression profiles are based on
RNA-seq data, as only a few types of microarrays are designed for
lncRNAs [22]. Most of the public databases (marked with the footnote ‘a’ in
Table 1) are designed to provide the expression profiles of
protein-coding genes as well. The availability of expression profiles of protein-coding genes is useful, as such profiles
can be used to validate a certain expression pattern
(e.g. guilt-by-association for tissue specificity). Furthermore, with
the inclusion of protein-coding genes, GO terms can be used to
infer the possible biological functions of a lncRNA from its co-expression with protein-coding genes, as in the case of Co-LncRNA, lncRNA2function and lncRNAtor.
It is generally accepted in the field that lncRNAs are poorly
conserved from one species to another when their sequences
are examined [3, 4]. Nevertheless, for the purpose of biological
experiments, it is important to know the conservation across species
of lncRNAs of interest, as it is not always possible to perform
biological experiments (e.g. gain/loss-of-function) in human
subjects, which leads to the use of model organisms (e.g.
mouse, zebrafish) for in vivo experiments [46]. To provide
evidence of the evolutionary conservation of lncRNAs, deepBase v2.0,
MTD and NONCODE 2016 offer such information based on
sequence similarity via BLAST, while ANGIOGENES and C-It-Loci use three types of homology. The first homology is
based on the concept of ‘positional conservation’ [8, 47], in which a
genomic locus spanning the region between two homologous protein-coding genes is considered conserved when these protein-coding genes
are conserved between/among organisms. By defining this
locus as conserved, any lncRNA in this locus is also considered conserved between/among organisms. The second
homology is based on ultraconserved elements, which are
species-conserved regions that have been shown to be transcriptional
regulators of key developmental genes [48, 49]. The third homology is based on species-conserved cis-regulatory elements
(enhancers) that have been experimentally validated in transgenic
mice [50, 51]. As the intention of biological databases should be
to assist researchers with their biological experiments, it is imperative that an option to check the evolutionary
conservation of lncRNAs is provided.
As lncRNAs are expressed more tissue-specifically but generally at lower
levels than protein-coding genes [35, 52–55], many databases contain expression data of various tissues. In most cases,
Building of RenalDB for kidney-related RNA expressions
Given the above situation, we set out to build an expression database for lncRNAs and protein-coding genes across organisms to offer comprehensive profiling of transcripts in one
tissue. For this purpose, we chose the kidney as a model, as this tissue is present in all vertebrates, is related to human health (e.g.
diabetes) and has a modest but diverse set of RNA-seq experiments available. Kidneys are complex organs that perform
many important functions, including filtering the blood for excess organic molecules and regulating blood pressure via the secretion of hormones. To maintain their various functions, they
are composed of many cell types, whose transcriptomes require careful
profiling. Furthermore, one of the most widely
used cell lines, HEK-293 [64], is kidney-derived and allows for easy experimental manipulation (e.g. transfection of plasmids and
siRNAs).
To collect RNA-seq data sets of kidneys, various databases
(e.g. Gene Expression Omnibus DataSets, PubMed and SRA)
were searched manually. Because the quality of genomic
sequence information and gene annotations varies across organisms, we chose three well-annotated organisms for further
study: human, mouse and zebrafish (Supplementary
Table S2). The available data sets included whole kidney, kidney
sub-tissues, isolated cell types and even single cells. To integrate
data from these different levels, we propose a database and analysis programs using logic
programming. Logic programming is a programming paradigm
based on formal logic, using a set of logical sentences consisting
of facts, rules and queries to solve a given problem [65]. For example, consider a transcript expressed in the renal cortex. The
renal cortex is located within kidneys. When sequencing whole
kidney under the same conditions, the same transcript should
be expressed (Figure 2A). One could even descend to the level of
cell types (e.g. endothelial cells isolated from interlobular
arteries, which are located within the kidney cortex). Similarly,
all sequences expressed within these endothelial cells are expressed in the kidney. Furthermore, it is well known that high-abundance
sequences can overwhelm lower-abundance sequences, so a transcript detected in an isolated cell type may not be detected directly in the whole organ. Thus, logic programming can be a useful tool for integrating RNA-seq data at different hierarchical levels and
beyond. This can be accomplished by modeling the anatomical
and experimental relationships (Figure 2B), creating rules to define various types of expression characteristics (Figure 2C) and
then using queries to determine the expression characteristic of
a given RNA (Figure 2D).
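The facts, rules and queries described above can be sketched in a few lines of code. RenalDB itself uses Datalog via the pyDatalog package; the example below deliberately uses plain Python so that no library interface has to be assumed. All names (`CONTAINS`, `DETECTED_IN`, `contains`, `expressed_in`) and the anatomical terms are hypothetical simplifications, not the actual RenalDB knowledge base.

```python
# Facts: direct 'contains' relationships between anatomical objects
# and the anatomical object in which each RNA was detected.
CONTAINS = {
    ("kidney", "renal_cortex"),
    ("renal_cortex", "interlobular_artery"),
    ("interlobular_artery", "endothelial_cell"),
}
DETECTED_IN = {("RNA_1", "endothelial_cell")}


def contains(ancestor, descendant):
    """Rule: X contains Y if stated directly, or if X contains Z and Z contains Y."""
    stack, seen = [ancestor], set()
    while stack:
        node = stack.pop()
        for parent, child in CONTAINS:
            if parent == node and child not in seen:
                if child == descendant:
                    return True
                seen.add(child)
                stack.append(child)
    return False


def expressed_in(rna, anatomical_object):
    """Rule: an RNA is expressed in X if detected in X or in anything X contains."""
    return any(
        r == rna and (where == anatomical_object or contains(anatomical_object, where))
        for r, where in DETECTED_IN
    )


if __name__ == "__main__":
    # Queries: is RNA_1 expressed in the kidney? In the renal medulla?
    print(expressed_in("RNA_1", "kidney"))
    print(expressed_in("RNA_1", "renal_medulla"))
```

The transitive `contains` rule is what lets a detection in an isolated cell type count as expression in every enclosing structure, even when no whole-organ sample detected the transcript.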
The above concept is further extended in RenalDB through
the advanced search functions, which can handle arbitrarily
complex combinations of search tags and Boolean operators
(‘and’, ‘or’, ‘not’). The search is used in the [LOCI] view (Figure
3A) and in the [VENN] view (Figure 3B). The [LOCI] view displays
rows of sequences with Universal Genomic Accessions (UGAs),
names and other high-level descriptive data [9]. Clicking on a
UGA leads to the sequence view (Figure 3C). The sequence
view contains detailed information about the corresponding sequence, such as general annotation information, links to other
databases provided via CORS requests to the UGAHash server
(so that accessions stay up to date automatically) [9], a link to
the UCSC Genome Browser [66], associated GO terms and the
corresponding links to the AmiGO 2 [20] database. Furthermore,
the sequence’s expression data are available as a heat map displaying expression strength, with hierarchical tree structures
showing the ‘contains’ and ‘develops to’ relationships described
by the logical models. Numerical values of the expression profiles
are also provided in table format when the [Numeric Values] tab
is clicked. The samples in both of these views can be grouped or
ungrouped based on sex, age and strain. The search is also used
in the [VENN] view, allowing users to visualize up to three
searches as a Venn diagram. The numbers shown in each Venn
diagram are clickable. Once clicked, the list of associated genes
and/or transcripts will be displayed in the [LOCI] view. Special
attention should be paid to the power of the search, as it includes some of the logic programming capabilities. For example,
searching for RNAs expressed in kidney (EXPRESSED:Kidney)
will yield not only RNAs directly detected in the kidney but also
those detected in any child component of the kidney. Similarly,
an RNA found to be specific to a child component of the kidney will
also be listed as ‘kidney specific’.
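To make the combination of search tags and Boolean operators concrete, the sketch below evaluates nested ‘and’, ‘or’ and ‘not’ expressions over sets of RNA identifiers. It is an assumption-laden illustration rather than the RenalDB search implementation: the tag index, the nested-tuple query format and every tag other than EXPRESSED:Kidney are hypothetical.

```python
# Hypothetical index: search tag -> set of matching RNA identifiers.
TAG_INDEX = {
    "EXPRESSED:Kidney": {"RNA_1", "RNA_2", "RNA_3"},
    "EXPRESSED:Heart": {"RNA_2", "RNA_4"},
    "TYPE:lincRNA": {"RNA_1", "RNA_2", "RNA_4"},
}
ALL_RNAS = set().union(*TAG_INDEX.values())


def evaluate(query):
    """Evaluate a query given as a tag string or a nested tuple.

    Tuples take the form ('and', q1, q2, ...), ('or', q1, q2, ...) or ('not', q).
    """
    if isinstance(query, str):
        return set(TAG_INDEX.get(query, set()))
    operator, *operands = query
    results = [evaluate(operand) for operand in operands]
    if operator == "and":
        return set.intersection(*results)
    if operator == "or":
        return set.union(*results)
    if operator == "not":
        return ALL_RNAS - results[0]
    raise ValueError(f"unknown operator: {operator}")


if __name__ == "__main__":
    # lincRNAs expressed in kidney but not in heart.
    query = ("and", "EXPRESSED:Kidney", "TYPE:lincRNA", ("not", "EXPRESSED:Heart"))
    print(evaluate(query))
```

Each of up to three such result sets could then be intersected pairwise to obtain the counts shown in a Venn diagram.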
these databases are built in a way that prompts further knowledge
discovery by the user with a defined hypothesis (e.g. ‘In
which tissue is a lncRNA of interest expressed?’). Among these
databases, ALDB, C-It-Loci, MTD and deepBase v2.0 offer a way
to screen for a list of lncRNAs expressed in a target tissue.
Among them, ANGIOGENES, C-It-Loci and MTD allow for the
comparison of expressed lncRNAs across various cell lines and
tissues. With this feature, these three databases provide a set of
predefined hypotheses that could be used directly to screen for
tissue-expressed, enriched and/or specific lncRNAs as well as
protein-coding genes. This feature is important for in silico
screening, as a researcher could obtain a set of interesting
lncRNAs to be studied further in their favorite tissues.
As more and more studies are conducted on lncRNAs, it has
become evident that lncRNAs are expressed in a more cell-type-specific manner than protein-coding genes [11, 56]. As different
cell types make up a tissue, it is important that such information
is provided. Of the databases listed in Table 1, Human Body Map
lincRNAs (called ‘lincRNA-FISH catalog’) and NRED provide cell-type-specific expression of lncRNAs via in situ hybridization
data, while MTD offers such RNA-seq data. Given that it is possible to sequence at the single-cell level [57–60], more such data sets
will become available for inclusion in the databases.
However, as lncRNAs are known to be expressed at lower levels than
protein-coding genes [5, 8, 52], it might be difficult to comprehensively cover the transcriptomes of lncRNAs at the
single-cell level; thus, it will be more helpful to include RNA-seq data
of tissue compartments (e.g. hypothalamus of the brain) in the
databases, as in the case of MTD.
Taken together, although there are public databases for
lncRNAs providing their expression patterns in various tissues,
most of these databases fall short of offering a comprehensive
profiling of lncRNAs to cover their cell-type-specific expression. Such information will be especially important for in vivo
studies, as it gives a clue about where to look for phenotypes on
ablating a lncRNA, as in the case of knockout mice [61–63].
Furthermore, for the utilization of model organisms, it is of utmost importance that the databases provide information regarding the evolutionary conservation of lncRNAs to allow for
more functional studies. More importantly, most of the databases are released to the public without validation experiments,
especially functional assays beyond expression profiling
by RT-PCR, leaving users to validate the content of the databases by performing biological experiments, which are costly and time-consuming. From the
perspective of product building, it is not good practice to release a product (i.e. a database) without extensive validation
of its content, in this case by performing biological
validation experiments.
https://commons.wikimedia.org/wiki/File:Diagram_of_epithelial_cells_CRUK_033.svg. (B) A sample knowledge base describing the kidney anatomy and experimental
data relationships with natural language comments explaining what each Datalog statement represents. (C) Another knowledge base describing some simple logical
relationships (i.e. ‘contains’ and ‘expressed in’) with natural language comments explaining what each statement represents. (D) A sample Datalog query using the knowledge bases in (B) and (C) to determine expression of ‘RNA_1’.
Validity of RenalDB and functional data of lncRNAs
To validate the content of RenalDB, we screened for lncRNAs,
more specifically lincRNAs, which are located between protein-coding genes on the genome [3]. The reason for focusing specifically on lincRNAs (instead of sense-overlapping lncRNAs, for
example) is that it is experimentally difficult to separate the expression of an overlapping lncRNA from that of the nearby protein-coding
gene, as some of their sequences overlap and the likelihood of
their sharing promoter sequences is high.
From RenalDB, we selected 22 lincRNAs that are expressed
in the human kidney and performed RT-PCR experiments using
cDNA generated from 10 human tissues to validate their expression patterns in the kidney. As a result, 18 of 22 lincRNAs were found to be expressed in the kidney, and some were enriched in the kidney
compared with other tissues (Figure 4). Of note, the sources of
total RNA are different from those of the publicly available RNA-seq data sets included in RenalDB. Furthermore, we set the number
of PCR cycles to 35, which may result in lowly
expressed lincRNAs not being detected. Based on the RT-PCR results and conservation among organisms, we chose two lincRNAs (LOC440173 and
PAXIP1-AS1) and characterized them further.
It is well known that many lncRNAs have distinct localization patterns within the cell (e.g. expressed exclusively in the
nucleus). To determine the subcellular localization of the selected
lincRNAs, nuclear and cytoplasmic fractions of RNA were prepared from HEK-293 cells, and lincRNAs were detected by RT-PCR (Figure 5A). The result indicates that LOC440173
is expressed in both the nuclear and cytoplasmic fractions,
whereas PAXIP1-AS1 is exclusively detected in the nucleus.
Although the above expression profiling experiments are informative, the biological functions of the selected lincRNAs remain unknown without further experiments. To this end, LOC440173
was silenced by siRNAs. Compared with the control (siRNA
against scramble control sequence, termed ‘siScr’ hereafter),
the expression of LOC440173 was efficiently silenced (Figure 5B).
From these samples, total RNA was isolated and subjected to
microarray analysis (Figure 5C). When thresholds of 1.5-fold change and p < 0.05
were applied, 80 up- and 375 downregulated
genes were identified (Supplementary Tables S3 and S4). GO analysis
was performed on these genes (Supplementary Tables
S5 and S6). Among upregulated genes, GO terms related to protein transport and cell division are enriched, while GO terms
Figure 2. Logic programming. (A) Kidney anatomy. The original image was obtained from (Blausen.com staff. ‘Blausen gallery 2014’. Wikiversity Journal of Medicine 1
(2); doi:10.15347/wjm/2014.010; ISSN 20018762) and modified by showing only anatomical terms related to this figure. The image of endothelial cells was obtained from
values can be modified using the checkbox list and updated by clicking the [GO] button.
related to cell migration and growth are enriched among downregulated genes. These findings are consistent with the morphology of the cells, which revealed a reduction of cell numbers
after LOC440173 silencing (Figure 5D), suggesting that
LOC440173 is important for cell growth or cell survival.
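The thresholding used to call up- and downregulated genes can be sketched as follows. The function name and the example values are hypothetical, and treating downregulation symmetrically (a fold change at or below 1/1.5) is an assumption made for this illustration, since the text states only the 1.5-fold and p-value cutoffs.

```python
def classify_gene(fold_change, p_value, fold_threshold=1.5, p_threshold=0.05):
    """Classify a gene as 'up', 'down' or 'unchanged'.

    fold_change is the linear expression ratio of silenced over control cells.
    Assumption: downregulation is defined symmetrically as a ratio of at most
    1 / fold_threshold.
    """
    if p_value >= p_threshold:
        return "unchanged"
    if fold_change >= fold_threshold:
        return "up"
    if fold_change <= 1.0 / fold_threshold:
        return "down"
    return "unchanged"


if __name__ == "__main__":
    # Made-up (fold change, p-value) pairs, for illustration only.
    for fold_change, p_value in [(2.0, 0.01), (0.5, 0.01), (2.0, 0.2), (1.2, 0.01)]:
        print(fold_change, p_value, classify_gene(fold_change, p_value))
```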
Next, PAXIP1-AS1 was analyzed in a similar manner.
Although its official name is ‘PAXIP1 Antisense RNA 1’, PAXIP1AS1 does not overlap with the protein-coding gene PAXIP1 on
the genome. After silencing of PAXIP1-AS1 (Figure 6A) followed
by microarray experiment (Figure 6B), the same set of threshold
values (1.5-fold and p < 0.05) was applied. There were 91 upand 39 downregulated genes (Supplementary Tables S7 and S8).
To these genes, GO analysis was performed (Supplementary
Tables S9 and S10). Aside from various GO terms related to
metabolic processes enriched among upregulated genes, many
GO terms related to cell death are enriched in both up- and
downregulated genes. To test whether silencing of PAXIP1-AS1
affects cell viability, cells were treated with hydrogen peroxide,
and the surviving cells were counted and normalized to those of
the corresponding siScr cells (Figure 6C). Compared with siScr
cells, the survival of PAXIP1-AS1-silenced cells was improved,
particularly when compared with cells after LOC440173 silencing.
These experiments suggest that PAXIP1-AS1 is a regulator
of cell death. However, the mechanism of its action is unknown.
To further elucidate the mechanism, we determined whether
PAXIP1-AS1 could cis-regulate the expression of the nearby
protein-coding gene PAXIP1. On silencing of PAXIP1-AS1,
downregulation of PAXIP1 was recorded (Figure 6D). Given that
Paxip1 (also known as ‘PTIP’) homozygous mutant mice die by
embryonic day 9.5 via accumulation of DNA damage [67], the
modulation of cell death-related genes on silencing of PAXIP1-AS1 might be due to decreased expression of PAXIP1.
However, further research is required to clearly define this
mechanism.
Discussion
A survey of public lncRNA databases shows that
most current databases do not provide detailed, genome-wide profiling of cell-type-specific lncRNA expression.
As a step toward providing such
information, we built a knowledge database, RenalDB, to comprehensively cover the transcriptomes of human, mouse and
zebrafish kidneys. Although some databases (e.g. Expression
Atlas [32] and GEO Profiles [33]) contain more data than
RenalDB, none are able to filter data by tissue enrichment or
specificity. Furthermore, RenalDB is the only one of these databases to use logic programming. Expression Atlas does not have
an advanced search option, while GEO Profiles does offer an
advanced search with many options; however, it suffers from a
lack of curation. By curation we mean humans going over the
data and resolving discrepancies in metadata. This poses a
Figure 3. Usage of RenalDB. (A) A [LOCI] view from RenalDB showing search query using search tags and Boolean operators. (B) Example of a query on the [VENN] view.
(C) Example from sequence view showing the basic annotation data, the graphical expression overview for a gene, GO terms and homologs. The grouping of expression
LOC440173. (A) Subcellular localization of LOC440173 and PAXIP1-AS1 in HEK-293
cells. For GAPDH, the primer pair targeting its intron between exons 2 and 3 was
used. A representative image from three independent assays is shown. (B) Efficiency of
silencing by siRNAs; n = 3. (C) Volcano plot comparing silencing of LOC440173
and siScr. Genes selected above the threshold of 1.5-fold and p < 0.05 are colored
in red; n = 2. (D) Morphologies of cells on siRNA transfection. The scale bar represents 100 µm.
Figure 4. RT-PCR experiment of selected kidney-expressed lincRNAs. To be consistent, 35 cycles of PCR reactions were used for all primer pairs. GAPDH and
RPLP0 were used as loading controls. Those lincRNAs whose expression could
not be detected with 35 cycles of PCR reactions are marked in blue, while two
lincRNAs used for further experiments are marked in red.
problem for performing advanced searches. For example, consider simple metadata such as ‘sex’. One would expect the values to be something like ‘Male’, ‘Female’ or ‘Unknown’.
However, the actual metadata are much messier, with multiple related headings and values. While building RenalDB, we
encountered various headings, including SEX, sex, Sex, Gender,
GENDER and mouse gender. The values under these headings contained various labels, such as M, Male, male, None, N, U, Mixed and
pooled. This is a metadata heading with only a few
possible values, and yet it became complicated. The sample
source metadata tags are more complex, especially because
many of the cell types and tissues have synonyms, for example,
‘Renal Cortex’ versus ‘Kidney Cortex’. We standardized all
metadata within RenalDB using text similarity clustering with
OpenRefine [68]. Furthermore, we extensively searched the
GEO/SRA archives for all kidney-related samples. Because of the
inconsistent metadata described above, we only considered kidney-related metadata for which
high-quality samples were available. The curated metadata can be
found here: http://renaldb.uni-frankfurt.de/static/cit/data/samples_dump.20160210.tsv
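To make the ‘sex’ example concrete, the following sketch collapses the headings and labels listed above into one controlled vocabulary. It is only an illustration: the curation itself was done with OpenRefine and manual review, the `fingerprint` helper is loosely modeled on key-collision clustering, and the mapping of each raw label (for instance, treating N and U as ‘Unknown’, or adding F and Female) is an assumption, because the text lists the labels but not how each was resolved.

```python
import re

# Headings reported in the text that all describe the same field.
SEX_HEADINGS = {"sex", "gender", "mouse gender"}

# Assumed mapping from raw labels to a controlled vocabulary.
SEX_VALUES = {
    "m": "Male", "male": "Male",
    "f": "Female", "female": "Female",
    "mixed": "Mixed", "pooled": "Mixed",
    "none": "Unknown", "n": "Unknown", "u": "Unknown",
}


def fingerprint(text):
    """Lowercase, strip punctuation, then sort and de-duplicate the tokens."""
    tokens = re.sub(r"[^\w\s]", "", text.strip().lower()).split()
    return " ".join(sorted(set(tokens)))


HEADING_KEYS = {fingerprint(h) for h in SEX_HEADINGS}


def normalize_sex(record):
    """Return the standardized sex for one sample's raw metadata dict."""
    for heading, value in record.items():
        if fingerprint(heading) in HEADING_KEYS:
            return SEX_VALUES.get(fingerprint(str(value)), "Unknown")
    return "Unknown"  # no recognizable heading present


print(normalize_sex({"SEX": "M"}))
print(normalize_sex({"mouse gender": "pooled"}))
```

Fingerprinting only merges variants in case, punctuation and word order. True synonyms such as ‘Renal Cortex’ and ‘Kidney Cortex’ would still need a curated synonym table, which is where human review comes in.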
A search for similar transcriptomics databases focused on the kidney found only two currently
available: Renal Gene Expression Database (RGED) [69] and
Toxygates [70]. Both databases contain microarray data but not
RNA-seq data, which are now increasingly used in laboratories around the world. This fact alone makes RenalDB a valuable
tool for researchers working in the field of nephrology.
Furthermore, both RGED and Toxygates only contain information for the selected sets of protein-coding genes that are on
the microarray platforms used. In comparison,
RenalDB covers whole transcriptomes, including all protein-coding genes and lncRNAs currently annotated by the
most widely used informational database, Ensembl.
In the field of nephrology, the following lncRNAs have been
identified and studied in detail: Arid2-IR [71], H19 [72], HOTAIR
[73], RCCRT1 [74], TapSAKI [75] and Xist [76]. Given that many
lncRNAs are expressed in various cell types and parts of the kidney, more functional evidence is necessary to comprehensively
understand kidney transcriptomes and their contributions to kidney function across organisms.
In contrast to previous reports on lncRNA databases, this
study provides functional data on lncRNAs along with a demonstration of the
applicability of the lncRNA database itself. This point is important, as it should not be up to the users to verify the content of
a database being built and introduced to the public, although
the database itself might have been built using previously
Figure 5. Expression of two selected lincRNAs and characterization of
numbers of surviving cells were normalized to the corresponding siScr (100 nM), whereas those of LOC440173-silenced cells were normalized to the corresponding siScr
(10 nM). (D) Expression of PAXIP1 on silencing of PAXIP1-AS1; n = 3.
published high-throughput data sets. It is important to note
that it is the responsibility of the software developers and their
team to provide functional data for such a database. In
conclusion, this study should set a standard for
building further bioinformatics tools in which users can have
confidence.
Key Points
• There are public databases providing transcriptomics
data for expressions of lncRNAs.
• There is a lack of cell-type-specific databases for
lncRNAs targeting a specific tissue.
• RenalDB provides a convenient way to screen for RNAs that are expressed, enriched and/or
specific in the kidney, its sub-tissues and cells.
• Experimental evidence helps demonstrate the validity
of databases being introduced.
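The screening idea in the third point can be sketched as rules over a part-of hierarchy. RenalDB itself uses logic programming; the Python below only mimics such rules. The `PART_OF` relations, the cutoff of 1.0 and the definitions of "expressed" and "specific" are hypothetical simplifications, not the database's actual ontology or thresholds.

```python
# Hypothetical part-of relations (child -> parent); not RenalDB's actual ontology.
PART_OF = {
    "renal cortex": "kidney",
    "renal medulla": "kidney",
    "glomerulus": "renal cortex",
    "podocyte": "glomerulus",
}


def ancestors(term):
    """Yield every structure that contains `term`, nearest first."""
    while term in PART_OF:
        term = PART_OF[term]
        yield term


def within(term, region):
    """True if `term` is `region` itself or lies anywhere inside it."""
    return term == region or region in ancestors(term)


def expressed_in(samples, region, cutoff=1.0):
    """Assumed rule: some sample inside `region` reaches the cutoff."""
    return any(v >= cutoff for site, v in samples.items() if within(site, region))


def specific_to(samples, region, cutoff=1.0):
    """Assumed rule: expressed inside `region` and nowhere outside it."""
    outside = any(
        v >= cutoff for site, v in samples.items() if not within(site, region)
    )
    return expressed_in(samples, region, cutoff) and not outside


# Hypothetical expression values per sample source for one RNA.
toy = {"podocyte": 5.2, "heart": 0.1, "liver": 0.0}
print(expressed_in(toy, "kidney"), specific_to(toy, "kidney"))
```

Because the rules walk the hierarchy, a measurement annotated only as ‘podocyte’ also answers queries about the glomerulus, the renal cortex and the kidney without duplicating annotations.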
Supplementary Data
Supplementary data are available online at http://bib.oxfordjournals.org/.
Acknowledgements
The authors would like to thank Wenjun Jin for excellent
technical assistance.
Funding
The LOEWE Center for Cell and Gene Therapy (State of
Hessen) (to S.U. and S.D.); the Deutsche
Forschungsgemeinschaft (SFB834 to S.U. and S.D.); the
German Center for Cardiovascular Research (DZHK) (to S.U.
and S.D.); and the startup funding from the Mansbach
Family, the Gheens Foundation and other generous supporters at the University of Louisville (to S.U.).
References
1. Lander ES, Linton LM, Birren B, et al. Initial sequencing and
analysis of the human genome. Nature 2001;409:860–921.
2. Mercer TR, Gerhardt DJ, Dinger ME, et al. Targeted RNA
sequencing reveals the deep complexity of the human transcriptome. Nat Biotechnol 2012;30:99–104.
3. Uchida S, Dimmeler S. Long noncoding RNAs in cardiovascular diseases. Circ Res 2015;116:737–50.
4. Uchida S, Gellert P, Braun T. Deeply dissecting stemness:
making sense to non-coding RNAs in stem cells. Stem Cell Rev
2012;8:78–86.
5. Weirick T, Militello G, Muller R, et al. The identification and
characterization of novel transcripts from RNA-seq data. Brief
Bioinform 2016;17:678–85.
6. Boeckel JN, Jae N, Heumuller AW, et al. Identification and
characterization of Hypoxia-regulated endothelial circular
RNA. Circ Res 2015;117:884–90.
7. Michalik KM, You X, Manavski Y, et al. Long noncoding RNA
MALAT1 regulates endothelial cell function and vessel
growth. Circ Res 2014;114:1389–97.
Figure 6. Characterization of PAXIP1-AS1. (A) Efficiency of silencing by siRNAs; n = 3. (B) Volcano plot comparing silencing of PAXIP1-AS1 and siScr. Genes selected
above the threshold of 1.5-fold and p < 0.05 are colored in red; n = 2. (C) Cell viability on the treatment with hydrogen peroxide; n = 3. In the case of PAXIP1-AS1, the
pathways based on human RNA-Seq data. Database
2015;2015.
31. Zheng LL, Li JH, Wu J, et al. deepBase v2.0: identification, expression, evolution and function of small RNAs, LncRNAs
and circular RNAs from deep-sequencing data. Nucleic Acids
Res 2016;44:D196–202.
32. Petryszak R, Keays M, Tang YA, et al. Expression atlas update–
an integrated database of gene and protein expression in
humans, animals and plants. Nucleic Acids Res
2016;44:D746–52.
33. Barrett T, Wilhite SE, Ledoux P, et al. NCBI GEO: archive for
functional genomics data sets–update. Nucleic Acids Res
2013;41:D991–5.
34. Paraskevopoulou MD, Georgakilas G, Kostoulas N, et al.
DIANA-LncBase: experimentally verified and computationally predicted microRNA targets on long non-coding RNAs.
Nucleic Acids Res 2013;41:D239–45.
35. Cabili MN, Trapnell C, Goff L, et al. Integrative annotation of
human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev 2011;25:1915–27.
36. Jiang Q, Ma R, Wang J, et al. LncRNA2Function: a comprehensive resource for functional investigation of human lncRNAs
based on RNA-seq data. BMC Genomics 2015;16(Suppl 3):S2.
37. Quek XC, Thomson DW, Maag JL, et al. lncRNAdb v2.0: expanding the reference database for functional long noncoding RNAs. Nucleic Acids Res 2015;43:D168–73.
38. Chan WL, Huang HD, Chang JG. lncRNAMap: a map of putative regulatory functions in the long non-coding transcriptome. Comput Biol Chem 2014;50:41–9.
39. Park C, Yu N, Choi I, et al. lncRNAtor: a comprehensive resource for functional investigation of long non-coding RNAs.
Bioinformatics 2014;30:2480–5.
40. Sheng X, Wu J, Sun Q, et al. MTD: a mammalian transcriptomic database to explore gene expression and regulation.
Brief Bioinform 2016, in press.
41. Zhao Y, Li H, Fang S, et al. NONCODE 2016: an informative and
valuable data source of long non-coding RNAs. Nucleic Acids
Res 2015;44:D203–8.
42. Dinger ME, Pang KC, Mercer TR, et al. NRED: a database of long
noncoding RNA expression. Nucleic Acids Res 2009;37:D122–6.
43. Li J, Han L, Roebuck P, et al. TANRIC: an interactive open platform to explore the function of lncRNAs in cancer. Cancer Res
2015;75:3728–37.
44. Jiang Q, Wang J, Wang Y, et al. TF2LncRNA: identifying common transcription factors for a list of lncRNA genes from
ChIP-Seq data. Biomed Res Int 2014;2014:317642.
45. Volders PJ, Verheggen K, Menschaert G, et al. An update on
LNCipedia: a database for annotated human lncRNA sequences. Nucleic Acids Res 2015;43:D174–80.
46. Uchida S, Schneider A, Wiesnet M, et al. An integrated approach for the systematic identification and characterization
of heart-enriched genes with unknown functions. BMC
Genomics 2009;10:100.
47. Ulitsky I, Shkumatava A, Jan CH, et al. Conserved function of
lincRNAs in vertebrate embryonic development despite rapid
sequence evolution. Cell 2011;147:1537–50.
48. Bejerano G, Pheasant M, Makunin I, et al. Ultraconserved
elements in the human genome. Science 2004;304:1321–5.
49. Dimitrieva S, Bucher P. UCNEbase–a database of ultraconserved non-coding elements and genomic regulatory blocks.
Nucleic Acids Res 2013;41:D101–9.
50. Pennacchio LA, Ahituv N, Moses AM, et al. In vivo enhancer
analysis of human conserved non-coding sequences. Nature
2006;444:499–502.
8. Weirick T, John D, Dimmeler S, et al. C-It-Loci: a knowledge
database for tissue-enriched loci. Bioinformatics
2015;31:3537–43.
9. Weirick T, John D, Uchida S. Resolving the problem of multiple accessions of the same transcript deposited across various public databases. Brief Bioinform 2016, doi: 10.1093/bib/
bbw017.
10. Pine PS, Rosenzweig BA, Thompson KL. An adaptable method
using human mixed tissue ratiometric controls for benchmarking performance on gene expression microarrays in
clinical laboratories. BMC Biotechnol 2011;11:38.
11. Goff LA, Groff AF, Sauvageau M, et al. Spatiotemporal expression and transcriptional perturbations by long noncoding RNAs
in the mouse brain. Proc Natl Acad Sci USA 2015;112:6855–62.
12. Baral C, Gelfond M. Logic programming and knowledge representation. J Log Program 1994;19:73–148.
13. Koster J, Rahmann S. Snakemake–a scalable bioinformatics
workflow engine. Bioinformatics 2012;28:2520–2.
14. Kodama Y, Shumway M, Leinonen R. The sequence read
archive: explosive growth of sequencing data. Nucleic Acids
Res 2012;40:D54–6.
15. McWilliam H, Li W, Uludag M, et al. Analysis tool web services
from the EMBL-EBI. Nucleic Acids Res 2013;41:W597–600.
16. Dobin A, Davis CA, Schlesinger F, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 2013;29:15–21.
17. Anders S, Pyl PT, Huber W. HTSeq–a Python framework to
work with high-throughput sequencing data. Bioinformatics
2015;31:166–9.
18. Love MI, Huber W, Anders S. Moderated estimation of fold
change and dispersion for RNA-seq data with DESeq2.
Genome Biol 2014;15:550.
19. Smedley D, Haider S, Durinck S, et al. The BioMart community
portal: an innovative alternative to large, centralized data
repositories. Nucleic Acids Res 2015;43:W589–98.
20. Carbon S, Ireland A, Mungall CJ, et al. AmiGO: online access to
ontology and annotation data. Bioinformatics 2009;25:288–9.
21. Untergasser A, Cutcutache I, Koressaar T, et al. Primer3–new
capabilities and interfaces. Nucleic Acids Res 2012;40:e115.
22. Gellert P, Ponomareva Y, Braun T, et al. Noncoder: a web interface for exon array-based detection of long non-coding RNAs.
Nucleic Acids Res 2013;41:e20.
23. Gellert P, Teranishi M, Jenniches K, et al. Gene array analyzer:
alternative usage of gene arrays to study alternative splicing
events. Nucleic Acids Res 2012;40:2414–25.
24. Irizarry RA, Hobbs B, Collin F, et al. Exploration, normalization, and summaries of high density oligonucleotide array
probe level data. Biostatistics 2003;4:249–64.
25. Ritchie ME, Phipson B, Wu D, et al. limma powers differential
expression analyses for RNA-sequencing and microarray
studies. Nucleic Acids Res 2015;43:e47.
26. Huang da W, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 2009;4:44–57.
27. Li A, Zhang J, Zhou Z, et al. ALDB: a domestic-animal long
noncoding RNA database. PLoS One 2015;10:e0124003.
28. Muller R, Weirick T, John D, et al. ANGIOGENES: knowledge
database for protein-coding and noncoding RNA genes in
endothelial cells. Sci Rep 2016;6:32475.
29. Yang JH, Li JH, Jiang S, et al. ChIPBase: a database for decoding
the transcriptional regulation of long non-coding RNA and
microRNA genes from ChIP-Seq data. Nucleic Acids Res
2013;41:D177–87.
30. Zhao Z, Bai J, Wu A, et al. Co-LncRNA: investigating the
lncRNA combinatorial effects in GO annotations and KEGG
64. Stepanenko AA, Dmitrenko VV. HEK293 in cell biology and
cancer research: phenotype, karyotype, tumorigenicity, and
stress-induced genome-phenotype evolution. Gene
2015;569:182–90.
65. Eklund P, Klawonn F. Neural fuzzy logic programming. IEEE
Trans Neural Netw 1992;3:815–8.
66. Kent WJ, Sugnet CW, Furey TS, et al. The human genome
browser at UCSC. Genome Res 2002;12:996–1006.
67. Cho EA, Prindle MJ, Dressler GR. BRCT domain-containing
protein PTIP is essential for progression through mitosis. Mol
Cell Biol 2003;23:1666–73.
68. Ham K. OpenRefine (version 2.5). http://openrefine.org. Free,
open-source tool for cleaning and transforming data. J Med
Libr Assoc 2013;101:233–4.
69. Zhang Q, Yang B, Chen X, et al. Renal Gene Expression
Database (RGED): a relational database of gene expression
profiles in kidney disease. Database 2014;2014, in press.
70. Nystrom-Persson J, Igarashi Y, Ito M, et al. Toxygates: interactive toxicity analysis on a hybrid microarray and linked
data platform. Bioinformatics 2013;29:3080–6.
71. Zhou Q, Huang XR, Yu J, et al. Long noncoding RNA Arid2-IR is
a novel therapeutic target for renal inflammation. Mol Ther
2015;23:1034–43.
72. Kanwar YS, Pan X, Lin S, et al. Imprinted mesodermal specific
transcript (MEST) and H19 genes in renal development and
diabetes. Kidney Int 2003;63:1658–70.
73. Wu Y, Liu J, Zheng Y, et al. Suppressed expression of long
non-coding RNA HOTAIR inhibits proliferation and tumourigenicity
of renal carcinoma cells. Tumour Biol
2014;35:11887–94.
74. Song S, Wu Z, Wang C, et al. RCCRT1 is correlated with prognosis and promotes cell migration and invasion in renal cell
carcinoma. Urology 2014;84:730 e731–7.
75. Lorenzen JM, Schauerte C, Kielstein JT, et al. Circulating long
noncoding RNA TapSAKI is a predictor of mortality in critically
ill patients with acute kidney injury. Clin Chem
2015;61:191–201.
76. Huang YS, Hsieh HY, Shih HM, et al. Urinary Xist is a potential
biomarker for membranous nephropathy. Biochem Biophys Res
Commun 2014;452:415–21.
51. Visel A, Minovitsky S, Dubchak I, et al. VISTA enhancer
browser–a database of tissue-specific human enhancers.
Nucleic Acids Res 2007;35:D88–92.
52. Derrien T, Johnson R, Bussotti G, et al. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene
structure, evolution, and expression. Genome Res 2012;22:
1775–89.
53. Clark MB, Mercer TR, Bussotti G, et al. Quantitative gene
profiling of long noncoding RNAs with targeted RNA sequencing. Nat Methods 2015;12:339–42.
54. Molyneaux BJ, Goff LA, Brettler AC, et al. DeCoN: genomewide analysis of in vivo transcriptional dynamics during pyramidal neuron fate selection in neocortex. Neuron 2015;85:
275–88.
55. Werber M, Wittler L, Timmermann B, et al. The tissue-specific
transcriptomic landscape of the mid-gestational mouse embryo. Development 2014;141:2325–30.
56. Mercer TR, Dinger ME, Sunkin SM, et al. Specific expression of
long noncoding RNAs in the mouse brain. Proc Natl Acad Sci
USA 2008;105:716–21.
57. Tang F, Barbacioru C, Bao S, et al. Tracing the derivation of
embryonic stem cells from the inner cell mass by single-cell
RNA-Seq analysis. Cell Stem Cell 2010;6:468–78.
58. Tang F, Barbacioru C, Nordman E, et al. RNA-Seq analysis to
capture the transcriptome landscape of a single cell. Nat
Protoc 2010;5:516–35.
59. Trapnell C, Cacchiarelli D, Grimsby J, et al. The dynamics and
regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol 2014;32:381–6.
60. Yan L, Yang M, Guo H, et al. Single-cell RNA-Seq profiling of
human preimplantation embryos and embryonic stem cells.
Nat Struct Mol Biol 2013;20:1131–9.
61. Grote P, Wittler L, Hendrix D, et al. The tissue-specific lncRNA
Fendrr is an essential regulator of heart and body wall development in the mouse. Dev Cell 2013;24:206–14.
62. Lai KM, Gong G, Atanasio A, et al. Diverse phenotypes and
specific transcription patterns in twenty mouse lines with
ablated LincRNAs. PLoS One 2015;10:e0125522.
63. Sauvageau M, Goff LA, Lodato S, et al. Multiple knockout
mouse models reveal lincRNAs are required for life and brain
development. Elife 2013;2:e01749.