Full Text - Briefings in Functional Genomics

B RIEFINGS IN FUNC TIONAL GENOMICS . VOL 11. NO 2. 96 ^106
doi:10.1093/bfgp/els002
ESTand transcriptome analysis of
cephalochordate amphioxusçpast,
present and future
Yu-Bin Wang, Shu-Hwa Chen, Chun-Yen Lin and Jr-Kai Yu
Advance Access publication date 4 February 2012
Abstract
The cephalochordates, commonly known as amphioxus or lancelets, are now considered the most basal chordate
group, and the studies of these organisms therefore offer important insights into various levels of evolutionary
biology. In the past two decades, the investigation of amphioxus developmental biology has provided key knowledge
for understanding the basic patterning mechanisms of chordates. Comparative genome studies of vertebrates and
amphioxus have uncovered clear evidence supporting the hypothesis of two-round whole-genome duplication
thought to have occurred early in vertebrate evolution and have shed light on the evolution of morphological novelties in the complex vertebrate body plan. Complementary to the amphioxus genome-sequencing project, a large
collection of expressed sequence tags (ESTs) has been generated for amphioxus in recent years; this valuable collection represents a rich resource for gene discovery, expression profiling and molecular developmental studies in the
amphioxus model. Here, we review previous EST analyses and available cDNA resources in amphioxus and discuss
their value for use in evolutionary and developmental studies. We also discuss the potential advantages of applying
high-throughput, next-generation sequencing (NGS) technologies to the field of amphioxus research.
Keywords: amphioxus; EST; evolution; development; transcriptome; genome
INTRODUCTION
Amphioxus (also known as lancelets) are a group
of marine invertebrate animals belonging to the subphylum Cephalochordata. Together with Tunicata
and Vertebrata, they constitute the phylum
Chordata. All chordates share several key characteristics, including a dorsal nerve chord, a notochord,
segmented somites and pharyngeal gill slits. Unlike
most tunicates whose larvae undergo drastic metamorphosis and lose many chordate characteristics as
they become sessile in adulthood, the adult body plan
of amphioxus remains highly similar to that of vertebrates (Figure 1A). Furthermore, the embryology of
amphioxus is more comparable to that of vertebrates
[1]. Therefore, cephalochordates were traditionally
considered the closest relatives to vertebrates.
Recent molecular phylogenetic analyses, however,
have indicated that tunicates are in fact the closest
sister group to vertebrates and that cephalochordates
actually represent the most basal group within the
chordate lineage [2–4]. Therefore, amphioxus represents a key model system for understanding both
conserved chordate developmental mechanisms and
the evolutionary origins of the complex vertebrate
body plan. For this reason, the genome of the amphioxus Branchiostoma floridae was sequenced and published in 2008 [5, 6], and the draft genome has
become a rich source for comparative genomics
and developmental studies in this organism.
To facilitate gene identification and verification of
gene model predictions in the draft genome, a large
collection of B. floridae expressed sequence tags
Corresponding author. Jr-Kai Yu, Institute of Cellular and Organismic Biology, Academia Sinica, 128 Academia Road, Section 2,
Nankang, Taipei, 11529, Taiwan. Tel: þ886-2-2789-9517; Fax: þ886-2-2785-8059; E-mail: [email protected]
Yu-Bin Wang is a graduate student at the Institute of Information Science, Academia Sinica and Graduate Institute of Zoology,
College of Life Science, National Taiwan University, Taipei, Taiwan.
Shu-Hwa Chen is an invited research scholar at the Computational Biology Research Center (CBRC), Advanced Industrial Science
and Technology (AIST), Japan.
Chun-Yen-Lin is an Associate Research Fellow at the Institute of Information Science, Academia Sinica, Taipei, Taiwan.
Jr-KaiYu is an Assistant Research Fellow at the Institute of Cellular and Organismic Biology, Academia Sinica and is an adjunct faculty
member of the Institute of Oceanography, National Taiwan University.
ß The Author 2012. Published by Oxford University Press. All rights reserved. For permissions, please email: [email protected]
Amphioxus ESTand transcriptome analysis
97
Figure 1: Adult and embryonic stages of the amphioxus B. floridae. (A) Photograph of a living adult female animal
(left lateral view) and a corresponding diagram of the major anatomical features of amphioxus. (B) Unfertilized
egg; (C) gastrula; (D) neurula; (E) 36 -h larva.
(ESTs) was generated [6, 7]. ESTs are single-pass
reads of several hundred base pairs generated from
either the 50 - or 3’-end of randomly selected clones
from cDNA libraries representing the transcript ends
of the expressed portion of the genome [8]. In this
review, we first provide a historical overview of EST
studies that were performed prior to the completion
of the B. floridae genome project, focusing on the
biological insights that they provided. We next describe the large-scale EST analysis of B. floridae that
was carried out along with the B. floridae genomesequencing project and provide updated information
on the currently available cDNA resources for this
species. We also discuss the applications of the
B. floridae cDNA resources for the use in evolutionary
and developmental studies. Finally, we conclude
with a discussion of the use of high-throughput
next-generation DNA sequencing (NGS) technologies in the amphioxus model, focusing on the potential applications for future transcriptome analysis.
Historical overview of amphioxus EST
analyses
Before the amphioxus genome-sequencing project,
several studies had exploited EST surveys as a
low-cost alternative to full genome sequencing to
elucidate the tissue specific and developmental gene
expression profiles of amphioxus and to investigate
gene duplication events during vertebrate evolution.
EST studies have been carried out by various research groups and with different amphioxus species,
including the most widely used Florida amphioxus
B. floridae and the Asian amphioxus species. It should
be noted that the Asian amphioxus was originally
named Branchiostoma belcheri, and later a subspecies
status (Branchiostomabelcheritsingtauense) was recognized
for the population distributed along the northern
Chinese coast and in Japan [9, 10]. However,
recent studies have clearly demonstrated that the
amphioxus population of the Asian-Pacific coast is
comprised of two morphologically and genetically
distinct species, B. belcheri and B. japonicum [11, 12].
For clarity, the northern Asian-Pacific coast subspecies is now called B. japonicum, and its distribution
extends to the Xiamen area near the southeast
Chinese coast, where B. japonicum and B. belcheri
co-exist [11]. Based on the location of the Asian
amphioxus collection sites, we suspect that some of
the EST studies [13, 14] did not actually collect samples from the southern species B. belcheri but actually
98
Y.-B.Wang et al.
collected samples of the northern species B. japonicum.
To avoid confusion, we henceforth will refer to
B. belcheri and B. japonicum in this review to attribute
the EST resources to the correct Asian amphioxus
species; however, we still list the species name in
parentheses as they appeared in the original literature
for reference.
EST analysis was used to provide a first glimpse of
the genes expressed in the amphioxus notochord
[14]. The notochord is the defining characteristic
of the Chordata phylum. In vertebrates, the notochord is a transient embryonic structure derived from
the dorsal axial mesoderm that serves as an important
signaling center, secreting various signals necessary
for patterning the dorsal–ventral and left–right axes
of the vertebrate body plan [15]. The notochord also
functions as an axial skeletal structure in developing
embryos before it is replaced by the vertebrae [15].
Unlike vertebrates, amphioxus retain the notochord
throughout their lives and never develop vertebrae.
Suzuki and Satoh [14] dissected the notochord tissues
and isolated notochord cells from approximately 200
B. japonicum adults (called Branchiostoma belcheri in their
paper) to construct a notochord cDNA library. They
then sequenced both the 50 and the 30 ESTs from
257 randomly picked cDNA clones. After analyzing
the amphioxus notochord ESTs, they found that
11% of the cDNA clones code for muscle-related
proteins and 6% code for extracellular matrix proteins, suggesting that in addition to its structural
function, the adult amphioxus notochord is a contractile tissue used in locomotion. Interestingly, with
the exception of bFGF, they did not detect homologs of any known signaling molecules in their EST
data. This finding is probably due to differences in
the developmental stage assayed, as the Suzuki and
Satoh library is derived from adult notochord cells,
while most known signaling molecules primarily
function during embryonic stages and are therefore
not likely to be highly expressed in the adult stage.
Additionally, the limited number of ESTs sequenced
in this study may have been insufficient to detect low
abundance transcripts. In fact, later large-scale EST
analyses of different developmental stages of the
Florida amphioxus B. floridae did identify classical signaling molecules and transcription factors involved in
major axis patterning [16]. Moreover, the expression
surveys with in situ hybridization indicated that many
of these genes are expressed in either the notochord
or in its precursor in amphioxus embryos [16],
suggesting a conserved organizer function for amphioxus dorsal axial mesoderm.
In 2002, Mou et al. [13] published the sequences
of 5235 ESTs from a cDNA library constructed
from neurula-stage embryos of Asian amphioxus
B. japonicum (called Branchiostoma belcheri tsingtauense in
their paper). In addition to identifying many genes
coding for common metazoan structural and enzymatic housekeeping proteins, Mao et al. also identified
many transcription factors and regulatory proteins
that are involved in developmental processes [13].
Thus, their EST analysis provided a preliminary
global view of the genes expressed during this important developmental stage in amphioxus.
In another study, Panopoulou et al. [17]
sequenced 14 189 50 ESTs from two pre-normalized
embryonic cDNA libraries of the Florida amphioxus
B. floridae. Without whole-genome sequencing data,
they attempted to compile a substantial amphioxus
gene set to address the long-debated two rounds
of whole-genome duplication (2R) hypothesis
of early vertebrate genome evolution [18]. After
assembling their amphioxus ESTs into 9173 consensus sequences and assigning them to orthologous
groups with genes from human, mouse, ascidian,
Caenorhabditis elegans, Drosophila, and yeast, they estimated the extent of gene duplication at the transition
from invertebrates to vertebrates. They found that
on average, humans and mice have 2.6 times more
gene copies per orthologous group than amphioxus.
The authors therefore concluded at least one large
genome duplication event occurred at the origin of
vertebrates and that subsequent smaller scale duplications may have also occurred during the course of
vertebrate evolution. However, this estimation of
gene-copy ratio could be complicated by possible
gene loss in vertebrate lineages after gene duplication
events [19] or gene duplication events unique to the
amphioxus lineage [20–23].
Later, the amphioxus genome-sequencing project provided more definitive information about
the extent of whole-genome duplication during
early vertebrate evolution. When whole-genome sequence data from the Florida amphioxus B. floridae
became available [5, 6], a remarkably high level of
conserved synteny between the amphioxus genome
and human genome was recognized, leading to the
identification of seventeen putative ancestral chordate linkage groups [6]. More interestingly, when
mapped to human chromosomes, each putative ancestral chordate linkage group corresponded to large
Amphioxus ESTand transcriptome analysis
segments on four different human chromosomes [6].
This quadruple conserved synteny pattern strongly
suggests that two rounds of whole-genome duplication occurred in the vertebrate lineage.
Large-scale ESTanalysis and the
available cDNA resources of amphioxus
In addition to the B. floridae genome project, a
large-scale EST analysis of the amphioxus B. floridae
has been carried out since 2004. Five nonnormalized cDNA libraries were prepared from
unfertilized eggs, gastrula, neurula, 36-h larvae
(Figure 1B–E), and mixed male and female mature
adults. Initially, 262 037 ESTs were sequenced from
both the 50 and 30 - ends of approximately 140 000
cDNA clones at the National Institute of Genetics,
Japan [16]. Since the first-strand cDNAs were
synthesized with an oligo(dT) primer during library
preparation, the 3’ ESTs of this data set were subjected to cluster analysis to identify overlapping
cDNA clones, and 21 229 unique transcript clusters
were identified in this process. Representative cDNA
clones from each cluster were chosen and re-arrayed
into sixty-four 384-well plates to construct the
‘Branchiostoma floridae Gene Collection Release 1’
[7]. A searchable online database for these ESTs
and cDNA resources has been constructed at
http://amphioxus.icob.sinica.edu.tw/. Users can
perform BLAST searches against the EST database
to identify their genes of interest and can also request
cDNA clones through the website. At the same time,
an additional set of 97 536 ESTs was sequenced from
the same gastrula, neurula and 36-h larvae libraries by
the Joint Genome Institute (USA) during the B. floridae genome-sequencing project [6]. After the removal of low-quality and contaminant EST
sequences, 56 964 ESTs from this data set were also
deposited in Genbank. Together with the aforementioned ESTs from B. floridae, these ESTs were used to
predict protein-coding loci in the amphioxus draft
genome. Because the previous clustering analysis did
not include this second set of EST data [7], we reanalyzed the data and provided some updates on the
status of the current B. floridae EST data.
To date, we have obtained 319 001 ESTs from
these five B. floridae cDNA libraries (Table 1), and
have found that 41.8% of the predicted amphioxus
gene models on v.2.0 assembly are covered by the
ESTs. This finding suggests that more than half of
the predicted genes are either not expressed in the
developmental stages analyzed in this EST analysis,
99
Table 1: Numbers of ESTs from five B. floridae cDNA
libraries
Library
50 -EST
30 -EST
Total
30 Clusters
Egg
Gastrula
Neurula
36 h larvae
Adults
Total
19 396
21708
69 058
31730
18 801
160 693
19126
19 872
68 582
31766
18 962
158 308
38 522
41580
137640
63 496
37 763
319 001
5484
6437
14193
10 552
4868
27415a
a
The number of total clusters was calculated by combining all 30 EST
data from the five libraries for the clustering analysis.
or that they are expressed at such a low level that
their transcripts could not be detected. We used
158 308 30 ESTs for the clustering analysis and detected 27 415 unique cDNA clusters (17 452 singleton clusters and 9963 clusters of multiple ESTs).
Of note, the number of cDNA clusters is larger
than the number of protein-coding loci (21 900) predicted from the draft genome sequence [6]. This
result is likely due in part to the existence of alternatively spliced isoforms of specific genes in amphioxus [24]. Unlike genome-sequencing coverage, it is
not straightforward to extrapolate the coverage of
the EST data to the entire transcriptome because
the exact number of different transcript isoforms produced by each gene is not known in amphioxus. To
estimate the coverage of this EST data, we plotted
the relationship between the number of 30 ESTs and
the unique cDNA clusters detected by the clustering
analysis and found that the curve did not plateau
(Figure 2A), suggesting that further EST sequencing
of these cDNA libraries could still uncover many
new cDNA clusters. We applied mathematical
models to this plot to estimate the maximum diversity of amphioxus transcript species through extrapolation [25]. The model predicts that the amphioxus
transcriptome contains a maximum of 79 100 unique
transcripts, and many transcripts are predicted to be
rare species that would only be identify by sequencing a considerable number of ESTs. A comprehensive analysis of mouse full-length cDNAs from the
FANTOM project has identified more than 181 000
independent transcripts [26], and the data suggested
that the total number of transcripts is at least one
order of magnitude larger than the estimated
number of protein-coding genes (22 000) in the
mouse genome. A recent study using a transcripttagging technique and the NGS platform identified
more than 194 000 unique tag sequences from
100
Y.-B.Wang et al.
Figure 2: Clustering analysis of current B. floridae ESTs from five cDNA libraries. (A) The plot of detected unique
cDNA species with respect to the number of input 3’ EST sequences from all five cDNA libraries. The curve has
not reached its plateau due to the continuous discovery of many singleton clusters. (B) The corresponding plot
with high confidence clusters, which are defined as clusters having at least two 3’ EST reads. The number of unique
high confidence cDNA clusters detected begins to plateau (99%) when the number of input 3’ ESTs reaches approximately 116 000 sequences. (C) Relationships between the number of 3’ ESTs and the unique cDNA clusters detected
by clustering analysis at the five developmental stages indicate that adult amphioxus express a smaller number
of transcripts. As embryonic development proceeds, the embryos and larvae begin to express additional transcript
species.
Amphioxus ESTand transcriptome analysis
the cheetah skin transcriptome [27]. Therefore, our
estimation of that the amphioxus transcriptome contains 79 100 unique transcripts, which is 40% of the
mammalian transcriptome, is within a reasonable
limits. It should be noted that non-coding RNAs
(either polyadenylated or non-polyadenylated) are
now recognized to contribute a significant portion
of reads to transcriptome data [28, 29]. This finding
may also explain in part the large difference between
the number of predicted protein-coding genes in
amphioxus and the number of transcripts. Further
comprehensive analysis of amphioxus full-length
transcripts and detailed annotation of different classes
of non-coding RNAs in this organism will help determine the relative contribution of non-coding
RNAs to the amphioxus transcriptome.
We also noted that when we focus on unique
cDNA clusters composed of 2 or more reads (‘high
confidence clusters’), the plot of sequenced 3’ ESTs
versus the identified unique cDNA clusters shows a
saturation plateau (99%) at approximately 116 000 30
ESTs (Figure 2B); however, the number of transcript
species (9862) determined by the high confidence
cluster set is far less than the expected gene
number (21 900 protein-coding loci). This result
suggests that although the current coverage of
amphioxus EST data is insufficient to identify transcripts that are expressed at relatively low levels, the
current EST data can account for most of the highly
expressed genes. The model calculates that an additional 880 000 ESTs would be needed to reach 90%
transcriptome coverage and that approximately 2
million ESTs would be needed to detect >99% of
the predicted transcript species (79 100). Therefore,
traditional Sanger sequencing of these cDNA
libraries does not represent a cost-effective way to
increase our coverage of the amphioxus transcriptome. We will further discuss this issue in the last
section of this review.
Comparisons of amphioxus and
Ciona ESTdata
Since the five B. floridae libraries prepared in this
EST project were not amplified or normalized [7],
the EST counts for each transcript cluster should
be in proportion to their abundance in each particular library. Because quantitative data from more
sophisticated methods, such as microarray analysis
or NGS-based RNA-sequencing, are currently lacking in the amphioxus system, we therefore examined
101
the relationship between the number of 3’ESTs and
the number of detected clusters at each of the five
developmental stages (Figure 2C) and used this as a
proxy method to obtain a preliminary view of the
global expression trends. Comparing the four embryonic stages shows that the ratio of detected clusters versus 3’ESTs increases during embryonic
development, with the lowest ratio in the unfertilized eggs, comparable levels in the gastrula and the
neurula, and the highest ratio in larvae. This pattern
suggests that amphioxus eggs express a relatively
small population of transcripts and, as embryonic development proceeds, embryos and larvae begin to
express additional transcript species. The ratio of detected clusters versus 3’ESTs is lowest in the adult
stage, suggesting that amphioxus expresses the smallest number of transcripts in adulthood; presumably,
many developmental genes are no longer expressed
in adults or are expressed at such low levels that they
were not detected in this EST analysis. Interestingly,
this global expression pattern stands in stark contrast
to the expression patterns found in similar large-scale
EST analyses of the ascidian Ciona intestinalis [30, 31].
It has been demonstrated that Ciona eggs express the
most diverse population of transcripts, with embryos
and larvae gradually expressing fewer transcripts as
they progress through development [31]. As the
larvae undergo metamorphosis to develop an adult
body plan, Ciona appears to express a slightly higher
number of transcript species, suggesting that there is a
major transition during metamorphosis. Recent
microarray data from Ciona have confirmed this pattern, indicating that the largest expressed developmental gene set is the maternal gene cluster, which
comprises 38.8% of the tested genes [32]. The sharp
contrast between the developmental expression
trends of Ciona and amphioxus may reflect fundamental differences in the early developmental mechanisms of ascidians and amphioxus. It is widely
recognized that ascidian embryos rely heavily on
early localized maternal factors for cell fate determination [33]. The most notable examples are the
Postplasmic/PEM class of mRNAs [34, 35], such as
the Macho-1 mRNA that initially localizes to the
vegetal cytoplasm after fertilization and is subsequently inherited by specific blastomeres where it
drives muscle differentiation [36], and the mRNA
of Vasa homologue (Ci-VH) which is important in
the development of primordial germ cells [37]. In
contrast, amphioxus development is generally
102
Y.-B.Wang et al.
considered to be highly regulative [38]. Indeed, the
global gene expression trends that have been identified in the amphioxus EST data seem to be in accordance with this concept.
However, it should be noted that mosaic and
regulative development are simply two extremes of
the same scale, and all embryos probably make use of
both mechanisms [39]. For example, we now know
that ascidian development does not completely
depend on localized determinants. The regulative
interactions mediated by secreted signaling molecules, such as Fgf, BMP and Nodal, play important
functions in cell fate determination for mesoderm,
endoderm and neural ectoderm tissues in ascidian
embryos [40–42]. On the other hand, many vertebrates, such as Xenopus and zebrafish, display highly
regulated development yet also possess high levels of
maternal transcripts in their eggs [43–45]. Similar to
vertebrates, the presence of maternal transcripts in
amphioxus eggs should not be disregarded as their
identities and functions have not yet been systematically studied. Using the B. floridae EST and cDNA
resources, our group has recently demonstrated that
the maternal transcripts of Vasa and Nanos localize
asymmetrically near the vegetal pole of the amphioxus eggs, and our data suggest that they are the
putative germ cell determinants of amphioxus [46].
More interestingly, we showed that the condensed
vegetal pole cytoplasm is inherited asymmetrically by
only one blastomere from the two-cell embryo to
the early gastrula, suggesting that even the first two
blastomeres of the amphioxus embryo are developmentally distinct [46]. This study highlights the potential mosaic property of amphioxus embryos and
indicates a need for additional study of the molecular
properties of maternal transcripts in amphioxus.
Further detailed analysis of the amphioxus egg transcriptome and additional comparative studies between amphioxus and other chordate groups
(namely, the ascidians and vertebrates) should provide more information on the evolution of early patterning mechanisms in chordates.
The use of amphioxus ESTresources
Large-scale EST analysis and the cDNA resources of
the amphioxus B. floridae have become valuable tools
for research requiring comprehensive genomic information. Several phylogenomic studies have relied
mainly on EST-based data from representative metazoan taxa (including amphioxus) to construct large
data matrices for phylogenetic analyses [2, 3, 47].
This approach has been very effective and has provided many new insights that could not be easily
discovered through single-gene analysis, such as the
sister-group relationship between tunicates and vertebrates. The amphioxus EST data also allow researchers to systematically obtain a set of transcripts
involved in a specific developmental process or in
a specific metabolic pathway without the need for
individual screening of the desired cDNAs. For example, Martinez et al. [48] used the amphioxus EST
data along with other EST data and genome assembly data from representative organisms to trace the
evolutionary history of neural crest-related genes.
Using this method, they found that a considerable
number of genes encoding signaling molecules
emerged during vertebrate evolution, which may account for the cell type diversification of neural crest
derivatives in early vertebrates.
In addition to data-mining research, the currently
available B. floridae cDNA resource [7] is particularly
useful for cDNA cloning-based studies. After initial
identification of ESTs representing genes of interest,
cDNA clones can be easily obtained through request
via the Branchiostoma floridae cDNA Database (http://
amphioxus.icob.sinica.edu.tw/), and their expression
or function can then be analyzed. Using such
approaches, researchers have identified amphioxus
homologues of genes involved in the development
of many important vertebrate featuress, and the subsequent comparative analyses have proved highly informative in infering the evolutionary origins of
these structures. For example, using the B. floridae
EST database, researchers have identified most of
the important signaling molecules and transcription
factors involved in the development of the vertebrate
organizer (Spemann’s organizer) in amphioxus [16].
They have also demonstrated that the expression
patterns of these genes and the developmental mechanism for dorsal/ventral (D/V) patterning are conserved between amphioxus and vertebrates,
suggesting that the Spemann’s organizer has deeper
roots in the base of the chordate lineage. Similarly, to
understand the evolutionary origin of the vertebrate
neural crest, researchers have used genomic and EST
databases to identify amphioxus genes whose homologues have been known to function in a putative
neural crest gene regulatory network [49]. Using
comparative expression analysis, they demonstrated
that while amphioxus possess conserved upstream
Amphioxus ESTand transcriptome analysis
signaling molecules and transcription factors for specifying the neural plate during embryogenesis,
amphioxus do not express one group of important
transcription factors (such as FoxD3, SoxE, Id and
Twist) at the neural plate border [49]. They therefore
proposed that this group of genes may have been
co-opted by the neural plate border during early
vertebrate evolution, contributing to the evolution
of neural crest cells. In another study, researchers
used the EST database to search for amphioxus
homologues of developmental genes involved in cranial neural crest and cartilage development [50], and
they found that those genes are predominantly expressed in the mesodermal tissues of amphioxus, suggesting that the deployment of the chondrogenic
gene network of the mesoderm predates the evolution of neural crest-derived cartilages. Other studies
have also used genomic and EST database to investigate the evolution of the somites [51] and of the
lateral plate mesoderm [52]. These results demonstrated that amphioxus and vertebrates use a mostly
conserved genetic toolkit in somite segmentation
[51]; however, judging from marker gene expression, the ventral mesoderm of amphioxus seems to
lack apparent regionalization compared to that of
vertebrates [52]. EST data have also been used to
confirm the existence of a large family of endogenous green fluorescent proteins (GFP) in B. floridae and
other amphioxus species [53–55], making amphioxus
the only deuterostome animal in which endogenous
GFPs have been identified.
Similar approaches have also been used in
B. japonicum to identify immune-relevant genes and
liver-specific genes from amphioxus EST collections
[56, 57]. Using EST data from lipopolysaccharidechallenged amphioxus, Liu et al. [56] identified 63
genes related to the immune response. Interestingly,
some genes involved in histocompatibility and
lymphocyte immune signaling were also identified
in their study. Their results therefore support the
previous notion that amphioxus possess some of
the genetic components required for an adaptive
immune system [58, 59]. This finding in turn
suggests that amphioxus may possess a primitive
adaptive immune system and that vertebrates may
have co-opted these genetic components for the assemblage of a more sophisticated adaptive immune
system. In another EST study using B. japonicum,
Wang and Zhang [57] identified 69 homologues of
vertebrate liver-specific genes and used quantitative
103
real-time PCR to confirm that 58 of these genes are
highly expressed in the hepatic cecum of amphioxus,
supporting the hypothesis that the amphioxus hepatic cecum is homologous to the vertebrate liver.
They also found that the majority of these hepatic
cecum-specific genes responded to lipopolysaccharide challenges in a similar fashion to their zebrafish
homologues, further suggesting that amphioxus
also has liver-mediated innate immune response.
Thus, the currently available EST data represent a
rich source for both sequence-based analyses and
molecular developmental studies in various amphioxus species.
FUTURE PERSPECTIVES
In recent years, the development of NGS technologies has revolutionized the scale of transcriptomic
studies [60, 61]. The major advantage of NGS is
the ability to generate large amounts of sequence
data, ranging from hundreds of thousands to billions
of short sequence reads (50–350 bp depending on
the sequencing platform used) at a relatively low
cost. Several commercially available NGS platforms,
including Roche/454, Illumina/Solexa and ABI/
SOLiD systems, have been successfully used for
identifying small regulatory RNAs (microRNAs),
characterizing alternative splicing isoforms, and mapping and quantifying transcriptomes in both traditional model organisms and in human cells
[62, 63]. More recently, NGS applications have
been expanded to include analysis of non-model organisms for genome/transcriptome studies [64–67].
Because of their high-throughput nature, the
sequencing depth provided by the NGS platforms
is usually sufficient for global transcriptome characterization from a single sample, or for surveying
a great number of genes simultaneously from multiple samples [63]. Moreover, compared with
hybridization-based methods such as cDNA microarrays or genomic tiling arrays, sequence-based NGS
transcriptome profiling has less background noise and
a much greater dynamic range for quantifying gene
expression levels. The development of algorithms for
de novo transcriptome assembly from short-read
sequencing data and for the tagging approach for
individual transcripts also make global gene expression analyses possible in organisms for which
there are not yet high-quality reference genomes
[27, 68, 69].
104
Y.-B.Wang et al.
To date, the 454 and Solexa sequencing platforms
have been used to characterize microRNA complements in the amphioxus system [70, 71] and have led
to the discoveries of more novel microRNAs than
were either computationally predicted based on the
draft genome or identified via Sanger sequencing
of small RNA libraries [72, 73]. We anticipate that
further applications of NGS technologies will greatly
enhance our current knowledge of the amphioxus
transcriptome. For example, RNA sequencing
(RNA-Seq) of libraries prepared from normal
wild-type embryos at different developmental
stages and from selected adult tissues would help us
to catalogue the full complement of transcripts in
amphioxus. The throughput provided by NGS platforms, such as Solexa, should be sufficient to fill
in the gaps in our current amphioxus transcriptome
coverage. Presumably, high-coverage NGS data
could also help improve the current genome
annotation by providing information on transcription start sites and exon/intron boundaries.
Furthermore, it should also provide information on
alternative transcript splicing patterns and help to
identify non-coding RNAs in the amphioxus
transcriptome.
Quantitative measures of transcript species from
NGS data can be used for profiling global expression
changes during embryonic development. Recently,
various pharmacological reagents and recombinant
proteins have been used to study the developmental
functions of major signaling pathways in amphioxus
[16, 49, 74–79]. Future NGS-based transcriptomic
comparisons between wild-type embryos and
reagent-treated embryos could not only reveal the
global effects of manipulating these signaling pathways during embryogenesis but also help to identify
novel target genes downstream of these signaling
pathways. In the post-genomic era of amphioxus research, this approach will be highly effective and
will provide global information for understanding
the basic developmental mechanisms of chordates.
With continued improvements in the current
sequencing platforms and associated bioinformatic
methods and simultaneous decreases in the cost of
NGS-based analyses, this type of study is becoming
economically feasible for many researchers. We anticipate that NGS-based transcriptome analyses
will greatly increase the width and depth of evo-devo
research and will further facilitate the use of
non-model organisms for future comparative genomic studies.
Key points
Amphioxus represents the most basal chordate group and is
an emerging model organism for evolutionary developmental
biology.
The amphioxus draft genome and EST/cDNA resources have
greatly increased our ability to use amphioxus in comparative
genomics and developmental studies.
The current B. floridae EST data and its comparison with Ciona
EST data provide interesting insights into the dynamic changes
in gene expression occurring during embryonic development.
The advances of high-throughput NGS technologies have
dramatically expanded the scope of transcriptome studies;
further applications of NGS in the amphioxus system will
increase the width and depth of such transcriptomic analysis on
a genome-wide scale.
FUNDING
National Science Council, Taiwan (NSC 99-2627B-001-004 to J.K.Y., NSC 98-2221-E-001-018 to
C.Y.L); and the Academia Sinica, Taipei, Taiwan
(Career Development Award AS-98-CDA-L06 to
J.K.Y.).
References
1.
Whittaker JR. Cephalochordates, the lancelets. In: Gilbert
SF, Raunio AM (eds). Embryology: Constructing the Organism.
Sunderland: Sinauer Associates, 1997:365–81.
2. Bourlat SJ, Juliusdottir T, Lowe CJ, et al. Deuterostome
phylogeny reveals monophyletic chordates and the new
phylum Xenoturbellida. Nature 2006;444:85–8.
3. Delsuc F, Brinkmann H, Chourrout D, et al. Tunicates
and not cephalochordates are the closest living relatives of
vertebrates. Nature 2006;439:965–8.
4. Philippe H, Lartillot N, Brinkmann H. Multigene analyses of bilaterian animals corroborate the monophyly of
Ecdysozoa, Lophotrochozoa, and Protostomia. Mol Biol
Evol 2005;22:1246–53.
5. Holland LZ, Albalat R, Azumi K, et al. The amphioxus
genome illuminates vertebrate origins and cephalochordate
biology. Genome Res 2008;18:1100–11.
6. Putnam NH, Butts T, Ferrier DE, et al. The amphioxus
genome and the evolution of the chordate karyotype.
Nature 2008;453:1064–71.
7. Yu JK, Wang MC, Shin IT, et al. A cDNA resource for
the cephalochordate amphioxus Branchiostoma floridae. Dev
Genes Evol 2008;218:723–7.
8. Parkinson J, Blaxter M. Expressed sequence tags: an overview. Methods Mol Biol 2009;533:1–12.
9. Nishikawa T. Considerations on the taxonomic status of
the lancelets of the genus Branchiostoma from the Japanese
waters. Publ Seto Mar Biol Lab 1981;26:135–56.
10. Wang YQ, Fang SH. Taxonomic and molecular phylogenetic studies of amphioxus: a review and prospective evaluation. Zool Res 2005;26:666–72.
11. Zhang QJ, Zhong J, Fang SH, et al. Branchiostoma japonicum
and B. belcheri are distinct lancelets (Cephalochordata) in
Xiamen waters in China. Zool Sci 2006;23:573–9.
Amphioxus ESTand transcriptome analysis
12. Zhong J, Zhang QJ, Xu QS, et al. Complete mitochondrial
genomes defining two distinct lancelet species in the West
Pacific Ocean. Mar Biol Res 2009;5:278–85.
13. Mou CY, Zhang SC, Lin JH, et al. EST analysis of mRNAs
expressed in neurula of Chinese amphioxus. Biochem Biophys
Res Commun 2002;299:74–84.
14. Suzuki MM, Satoh N. Genes expressed in the amphioxus
notochord revealed by EST analysis. Dev Biol 2000;224:
168–77.
15. Stemple DL. Structure and function of the notochord: an
essential organ for chordate development. Development
2005;132:2503–12.
16. Yu JK, Satou Y, Holland ND, et al. Axial patterning in
cephalochordates and the evolution of the organizer.
Nature 2007;445:613–7.
17. Panopoulou G, Hennig S, Groth D, et al. New evidence for
genome-wide duplications at the origin of vertebrates using
an amphioxus gene set and completed animal genomes.
Genome Res 2003;13:1056–66.
18. Holland PW, Garcia-Fernandez J, Williams NA, et al. Gene
duplications and the origins of vertebrate development.
Dev Suppl 1994125–33.
19. Lynch M, Conery JS. The evolutionary fate and consequences of duplicate genes. Science 2000;290:1151–5.
20. Minguillon C, Ferrier DE, Cebrian C, et al. Gene duplications in the prototypical cephalochordate amphioxus. Gene
2002;287:121–8.
21. Minguillon C, Jimenez-Delgado S, Panopoulou G, et al.
The amphioxus Hairy family: differential fate after duplication. Development 2003;130:5903–14.
22. Schubert M, Brunet F, Paris M, et al. Nuclear hormone
receptor signaling in amphioxus. Dev Genes Evol 2008;218:
651–65.
23. Yu JK, Mazet F, Chen YT, et al. The Fox genes of
Branchiostoma floridae. Dev Genes Evol 2008;218:629–38.
24. Short S, Holland LZ. The evolution of alternative splicing
in the Pax family: the view from the Basal chordate amphioxus. J Mol Evol 2008;66:605–20.
25. Chao A. Estimating the population size for capturerecapture data with unequal catchability. Biometrics 1987;
43:783–91.
26. Carninci P, Kasukawa T, Katayama S, et al. The transcriptional landscape of the mammalian genome. Science 2005;
309:1559–63.
27. Hong LZ, Li J, Schmidt-Kuntzel A, et al. Digital gene expression for non-model organisms. Genome Res 2011;21:
1905–15.
28. Carninci P, Yasuda J, Hayashizaki Y. Multifaceted mammalian transcriptome. Curr Opin Cell Biol 2008;20:274–80.
29. Costa FF. Non-coding RNAs: meet thy masters. Bioessays
2010;32:599–608.
30. Satou Y, Takatori N, Fujiwara S, et al. Ciona intestinalis
cDNA projects: expressed sequence tag analyses and gene
expression profiles during embryogenesis. Gene 2002;287:
83–96.
31. Kawashima T, Satou Y, Murakami SD, et al. Dynamic
changes in developmental gene expression in the basal chordate Ciona intestinalis. Dev Growth Differ 2005;47:187–99.
32. Azumi K, Sabau SV, Fujie M, et al. Gene expression profile
during the life cycle of the urochordate Ciona intestinalis.
Dev Biol 2007;308:572–82.
105
33. Sardet C, Paix A, Prodon F, et al. From oocyte to 16-cell
stage: cytoplasmic and cortical reorganizations that pattern
the ascidian embryo. Dev Dyn 2007;236:1716–31.
34. Prodon F, Yamada L, Shirae-Kurabayashi M, et al.
Postplasmic/PEM RNAs: a class of localized maternal
mRNAs with multiple roles in cell polarity and
development in ascidian embryos. Dev Dyn 2007;236:
1698–715.
35. Yamada L. Embryonic expression profiles and conserved
localization mechanisms of pem/postplasmic mRNAs of
two species of ascidian, Ciona intestinalis and Ciona savignyi.
Dev Biol 2006;296:524–36.
36. Nishida H, Sawada K. Macho-1 encodes a localized mRNA
in ascidian eggs that specifies muscle fate during embryogenesis. Nature 2001;409:724–9.
37. Shirae-Kurabayashi M, Nishikata T, Takamura K, et al.
Dynamic redistribution of vasa homolog and exclusion of
somatic cell determinants during germ cell specification in
Ciona intestinalis. Development 2006;133:2683–93.
38. Holland LZ, Holland ND. A revised fate map for amphioxus and the evolution of axial patterning in chordates.
Integr Comp Biol 2007;47:360–72.
39. Lawrence PA, Levine M. Mosaic and regulative
development: two faces of one coin. Curr Biol 2006;16:
R236–9.
40. Hudson C, Lotito S, Yasuo H. Sequential and combinatorial inputs from Nodal, Delta2/Notch and FGF/MEK/ERK
signalling pathways establish a grid-like organisation of distinct cell identities in the ascidian neural plate. Development
2007;134:3527–37.
41. Kobayashi K, Sawada K, Yamamoto H, et al. Maternal
macho-1 is an intrinsic factor that makes cell response
to the same FGF signal differ between mesenchyme and
notochord induction in ascidian embryos. Development
2003;130:5179–90.
42. Kondoh K, Kobayashi K, Nishida H. Suppression of
macho-1-directed muscle fate by FGF and BMP is required
for formation of posterior endoderm in ascidian embryos.
Development 2003;130:3205–16.
43. Aanes H, Winata CL, Lin CH, et al. Zebrafish mRNA
sequencing deciphers novelties in transcriptome dynamics
during maternal to zygotic transition. Genome Res 2011;21:
1328–38.
44. Baldessari D, Shin Y, Krebs O, et al. Global gene expression
profiling and cluster analysis in Xenopus laevis. Mech Dev
2005;122:441–75.
45. Mathavan S, Lee SG, Mak A, et al. Transcriptome analysis
of zebrafish embryogenesis using microarrays. PLoS Genet
2005;1:260–76.
46. Wu HR, Chen YT, Su YH, et al. Asymmetric localization
of germline markers Vasa and Nanos during early development in the amphioxus Branchiostoma floridae. Dev Biol 2011;
353:147–59.
47. Dunn CW, Hejnol A, Matus DQ, et al. Broad phylogenomic sampling improves resolution of the animal tree of
life. Nature 2008;452:745–9.
48. Martinez-Morales JR, Henrich T, Ramialison M, etal. New
genes in the evolution of the neural crest differentiation
program. Genome Biol 2007;8:R36.
49. Yu JK, Meulemans D, McKeown SJ, et al. Insights from the
amphioxus genome on the origin of vertebrate neural crest.
Genome Res 2008;18:1127–32.
106
Y.-B.Wang et al.
50. Meulemans D, Bronner-Fraser M. Insights from amphioxus
into the evolution of vertebrate cartilage. PLoS One 2007;2:
e787.
51. Beaster-Jones L, Kaltenbach SL, Koop D, etal. Expression of
somite segmentation genes in amphioxus: a clock without a
wavefront? Dev Genes Evol 2008;218:599–611.
52. Onimaru K, Shoguchi E, Kuratani S, et al. Development
and evolution of the lateral plate mesoderm: comparative
analysis of amphioxus and lamprey with implications for the
acquisition of paired fins. Dev Biol 2011;359:124–36.
53. Bomati EK, Manning G, Deheyn DD. Amphioxus encodes
the largest known family of green fluorescent proteins,
which have diversified into distinct functional classes.
BMC Evol Biol 2009;9:77.
54. Deheyn DD, Kubokawa K, McCarthy JK, et al.
Endogenous green fluorescent protein (GFP) in amphioxus.
Biol Bull 2007;213:95–100.
55. Li G, Zhang QJ, Zhong J, et al. Evolutionary and functional
diversity of green fluorescent proteins in cephalochordates.
Gene 2009;446:41–9.
56. Liu Z, Li L, Li H, etal. EST analysis of the immune-relevant
genes in Chinese amphioxus challenged with lipopolysaccharide. Fish Shellfish Immunol 2009;26:843–9.
57. Wang Y, Zhang S. Identification and expression of
liver-specific genes after LPS challenge in amphioxus: the
hepatic cecum as liver-like organ and "pre-hepatic" acute
phase response. Funct Integr Genomics 2011;11:111–8.
58. Huang G, Xie X, Han Y, et al. The identification of
lymphocyte-like cells and lymphoid-related genes in
amphioxus indicates the twilight for the emergence of adaptive immune system. PLoS One 2007;2:e206.
59. Huang S, Yuan S, Guo L, et al. Genomic analysis of the
immune gene repertoire of amphioxus reveals extraordinary
innate complexity and diversity. Genome Res 2008;18:
1112–26.
60. Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet 2010;11:31–46.
61. Mortazavi A, Williams BA, McCue K, et al. Mapping and
quantifying mammalian transcriptomes by RNA-Seq. Nat
Methods 2008;5:621–8.
62. Ozsolak F, Milos PM. RNA sequencing: advances,
challenges and opportunities. Nat Rev Genet 2011;12:
87–98.
63. Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 2009;10:57–63.
64. Li R, Fan W, Tian G, et al. The sequence and de novo
assembly of the giant panda genome. Nature 2010;463:
311–7.
65. Shinzato C, Shoguchi E, Kawashima T, et al. Using the
Acropora digitifera genome to understand coral responses to
environmental change. Nature 2011;476:320–3.
66. Jackson DJ, McDougall C, Woodcroft B, et al. Parallel evolution of nacre building gene sets in molluscs. Mol Biol Evol
2010;27:591–608.
67. Meyer E, Aglyamova GV, Wang S, et al. Sequencing and de
novo analysis of a coral larval transcriptome using 454
GSFlx. BMC Genomics 2009;10:219.
68. Martin JA, Wang Z. Next-generation transcriptome assembly. Nat Rev Genet 2011;12:671–82.
69. Grabherr MG, Haas BJ, Yassour M, et al. Full-length
transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 2011;29:644–52.
70. Campo-Paysaa F, Semon M, Cameron RA, et al.
microRNA complements in deuterostomes: origin and evolution of microRNAs. Evol Dev 2011;13:15–27.
71. Chen X, Li Q, Wang J, et al. Identification and characterization of novel amphioxus microRNAs by Solexa
sequencing. Genome Biol 2009;10:R78.
72. Dai Z, Chen Z, Ye H, etal. Characterization of microRNAs
in cephalochordates reveals a correlation between
microRNA repertoire homology and morphological similarity in chordate evolution. Evol Dev 2009;11:41–9.
73. Luo Y, Zhang S. Computational prediction of amphioxus
microRNA genes and their targets. Gene 2009;428:41–6.
74. Bertrand S, Camasses A, Somorjai I, et al. Amphioxus FGF
signaling predicts the acquisition of vertebrate morphological traits. Proc Natl Acad Sci USA 2011;108:9160–5.
75. Koop D, Holland ND, Semon M, et al. Retinoic acid signaling targets Hox genes during the amphioxus gastrula
stage: insights into early anterior-posterior patterning of
the chordate body plan. Dev Biol 2010;338:98–106.
76. Onai T, Lin HC, Schubert M, etal. Retinoic acid and Wnt/
beta-catenin have complementary roles in anterior/posterior patterning embryos of the basal chordate amphioxus.
Dev Biol 2009;332:223–33.
77. Onai T, Yu JK, Blitz IL, et al. Opposing Nodal/Vg1 and
BMP signals mediate axial patterning in embryos of the
basal chordate amphioxus. Dev Biol 2010;344:377–89.
78. Schubert M, Holland ND, Laudet V, et al. A retinoic
acid-Hox hierarchy controls both anterior/posterior patterning and neuronal specification in the developing central
nervous system of the cephalochordate amphioxus. Dev Biol
2006;296:190–202.
79. Schubert M, Yu JK, Holland ND, et al. Retinoic acid signaling acts via Hox1 to establish the posterior limit of the
pharynx in the chordate amphioxus. Development 2005;132:
61–73.