Comprehensive Sequence Analysis of 24783 Barley Full

Plant Physiology Preview. Published on March 17, 2011, as DOI:10.1104/pp.110.171579
1
Running head: Analysis of Barley FLcDNAs
2
3
4
Corresponding author: Takashi Matsumoto
5
National Institute of Agrobiological Sciences, 2-1-2, Kannondai, Tsukuba, Ibaraki
6
305-8602, Japan
7
Tel: +81-29-838-7441
8
e-mail: [email protected]
9
10
Research area: Genome Analysis
11
1
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
Copyright 2011 by the American Society of Plant Biologists
1
Comprehensive Sequence Analysis of 24,783 Barley Full-length cDNAs derived
2
from Twelve Clone Libraries
3
4
Takashi Matsumoto*#, Tsuyoshi Tanaka*, Hiroaki Sakai, Naoki Amano, Hiroyuki
5
Kanamori, Kanako Kurita, Ari Kikuta, Kozue Kamiya, Mayu Yamamoto, Hiroshi Ikawa,
6
Nobuyuki Fujii, Kiyosumi Hori, Takeshi Itoh and Kazuhiro Sato
7
National Institute of Agrobiological Sciences, 2-1-2, Kannondai, Tsukuba, Ibaraki
8
305-8602, Japan (T. M., T. T., H. S., N. A., K. H., T. I.), Institute of Society for
9
Techno-Innovation of Agriculture, Forestry and Fisheries, 446-1, Kamiyokoba, Tsukuba,
10
Ibaraki 305-0854, Japan (H. K., K. Ku., A. K., K. Ka., M. Y., H. I.), Hitachi
11
Government & Public Corporation System Engineering, Ltd., 2-4-8, Toyo-cyo, Koto,
12
Tokyo 135-8633, Japan (N. F.), 4Institute of Plant Science and Resources, Okayama
13
University, 2-20-1, Chuo, Kurashiki, Okayama, 710-0046, Japan (K. H., K. S.)
14
2
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
Financial sources: This work was supported by grants to T. M. and T. I. from the
2
Ministry of Agriculture, Forestry and Fisheries of Japan (Genomics for Agricultural
3
Innovation, TRC-1008, GIR-1001; and Integrated Research Project for Plant, Insect and
4
Animal using Genome Technology, GD-1001, GD-1002).
5
6
* These authors contributed equally to this work.
7
8
# Corresponding author.
9
Takashi Matsumoto
10
e-mail: [email protected]
11
3
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
Abstract
2
3
Full-length cDNA (FLcDNAs) libraries consisting of 172,000 clones were constructed
4
from a two-row malting barley cultivar, Hordeum vulgare L. cv. Haruna Nijo, under
5
normal and stressed conditions. After sequencing the clones from both ends and
6
clustering the sequences, a total of 24,783 complete sequences were produced. By
7
removing duplicates between these and publicly available sequences, 22,651
8
representative sequences were obtained; 17,773 were novel barley FLcDNAs, and 1,699
9
were barley specific. Highly conserved genes were found in the barley FLcDNA
10
sequences for 721 of 881 rice trait genes with 50% or greater identity. These FLcDNA
11
resources from our Haruna Nijo cDNA libraries and the full-length sequences of
12
representative clones, will improve our understanding of the biological functions of
13
genes in barley, which is the cereal crop with the fourth-highest production in the world,
14
and will provide a powerful tool for annotating the barley genome sequences that will
15
become available in the near future.
16
4
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
Introduction
2
3
In 2009, approximately 54 million hectares of barley (Hordeum vulgare) were
4
cultivated, and harvests of this species produced approximately 150 million tons of
5
grain worldwide, according to the Food and Agriculture Organization of the United
6
Nations (FAOSTAT; http://faostat.fao.org/). Barley belongs to the Poaceae family (i.e.,
7
grasses), which also includes rice (Oryza sativa), maize (Zea mays), wheat (Triticum
8
spp.), and rye (Secale sereade; cross-pollinated). These members of the Poaceae,
9
including barley, are the major cereal crops cultivated throughout the world. Barley is
10
utilized for animal feed, malting, and as a human food source. Because barley is
11
self-pollinated and has a diploid (2n = 14) genome, it is recognized as a genetic model
12
of the Triticeae tribe within the Poaceae. Studies of the barley genome, transcriptome,
13
and proteome are currently advancing our understanding of the molecular functions of
14
agriculturally important barley genes (Sreenivasulu and Wobus 2008; Sreenivasulu et al.
15
2008).
16
Although barley has been extensively studied in terms of its genetics and breeding,
17
molecular biology and genomics studies have been limited until recently because of the
18
large genome size of this species (>5 Gb; ca. 12 times that of rice) (Varshney et al.
19
2007). To further promote barley genomics, a multi-national collaboration, the
20
International Barley Sequencing Consortium (IBSC), has been developed with the
21
objective of obtaining the whole genome sequence of barley (http://barleygenome.org;
22
Schulte et al. 2009).
5
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
The study of an organism's transcriptome (i.e., all transcribed sequences) is one of
2
the most effective ways to investigate the structure and function of its active genes. In
3
many plant species, transcript contigs have been constructed by assembling all the
4
expressed sequence tag (EST) data available in PlantGDB, with the aim of identifying a
5
dataset of unique mRNA sequences and maximizing the information obtained for both
6
protein-coding and non-coding (nc) regions in these sequences (Duvick et al. 2008). A
7
large set of ESTs (501,620 from the vulgare subspecies and 24,161 from the
8
spontaneum
9
www.ncbi.nlm.nih.gov/dbEST/;
subspecies
in
the
and
NCBI-dbEST
522,561
sequences
release-100110:
in
PlantGDB:
10
www.plantgdb.org/) has been accumulated in the public domain. These ESTs have been
11
assembled
12
(http://www.ncbi.nlm.nih.gov/unigene) and into 134,482 Plant GDB-assembled unique
13
transcripts
14
estimated that approximately 75% of the genes in the barley genome have been captured
15
(Sreenivasulu et al. 2008). Although these assemblies provide researchers with a
16
tremendous amount of information for understanding partial barley gene structures, the
17
assembly of sequences from different barley varieties might result in erroneous contigs
18
or lead to the fusion of unrelated transcripts via common domains or motif sequences.
19
Moreover, these data might lead to an overestimation of the number of genes in barley
20
because these assemblies do not consider virtual connections between 5 and 3 end
21
sequences, where both sequences are derived from the same cDNA clone.
22
into
(PUTs)
23,595
clusters
in
NCBI’s
UniGene
(http://www.plantgdb.org/prj/ESTCluster/progress.php).
′
dataset
It
is
′
Capturing the transcripts of all active genes under defined (temporal, spatial, and
6
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
stressed) conditions can provide a “snapshot” of living cells. The resultant data can be
2
used to obtain quantitative (expression frequency) and qualitative (predicted protein
3
function) information about these genes. For barley transcriptome analysis, a 22K
4
Barley1 GeneChip probe array based on an EST database containing 350,000 sequences
5
from 84 different RNA sources has been established (Close et al. 2004). The results of
6
high-throughput transcript profiling of barley using this chip can be retrieved from
7
PLEXdb
8
(http://www.ebi.ac.uk/microarray-as/ae/). Using several platforms for transcript
9
profiling, researchers have described the responses of barley to conditions such as high
10
salinity (Ueda et al. 2006; Walia et al. 2006), low temperature or freezing (Koo et al.
11
2008), and drought (Talame et al. 2007; Tommasini et al. 2008), as well as during
12
various stages, such as grain development (Sreenivasulu et al. 2006), germination
13
(Sreenivasulu et al. 2008), and growth (Druka et al. 2006), by means of coordinated up
14
or down regulation of sets of genes.
(http://www.plexdb.org/)
and
ArrayExpress
15
Moreover, the construction of a comprehensive gene set for barley by the genome
16
sequencing is underway. However, following sequencing, genome annotation in the
17
absence of information on exon-intron junctions is not an easy task, even for organisms
18
for which complete genome sequences have been revealed. Although several gene
19
prediction programs have been developed, the predicted gene structures might not
20
always be correct because these programs may select and connect incorrectly predicted
21
exons (Bennetzen et al. 2004; Cruveiller et al. 2004; Jabbari et al. 2004). Hence, most
22
gene annotation projects refer to EST or mRNA sequences for the accurate structural
7
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
annotation of genes (Imanishi et al. 2004; Itoh et al. 2007).
2
Full-length cDNA (FLcDNA) is defined as the DNA complementary to an mRNA
3
sequence that extends from the region near the 5' cap structure to the poly(A) tail, which
4
are structures specific to eukaryotic mRNAs (Maruyama and Sugano 1994; Suzuki et al.
5
1997; Carninci et al. 2000). FLcDNA technology has been applied to the genomes of
6
humans (Ota et al. 2004), mice (Kawai et al. 2001), and several plant species, including
7
Arabidopsis thaliana (Seki et al. 2002), O. sativa (Kikuchi et al. 2003), T. aestivum
8
(Ogihara et al. 2004; Kawaura et al. 2009), soybean (Glycine max) (Umezawa et al.
9
2008), Z. mays (Soderlund et al. 2009) and tomato (Solanum lycopersicum) (Aoki et al.
10
2010). Recently, an FLcDNA library has been constructed for the Japanese malting
11
barley variety Haruna Nijo (Sato et al. 2009). From more than 45,000 cDNA clones,
12
5,006 non-overlapping sequences were obtained. These sequences and clones have been
13
made publicly available through the Barley DB (http://www.shigen.nig.ac.jp/barley/),
14
and they have proven effective in the identification of functional genes. However, this
15
information is insufficient because the number of obtained FLcDNAs is much lower
16
than all of the expressed genes in the barley genome.
17
In the present study, we constructed twelve FLcDNA libraries for Haruna Nijo
18
from various organs or under different conditions to produce a comprehensive dataset
19
for this barley variety. After 5 and 3 end-sequencing and clustering of approximately
20
172,000 cDNA clones, 24,783 complete FLcDNAs sequences were retrieved. In
21
conjunction with publicly available barley FLcDNAs, a total of 22,651 non-redundant
22
(nr) barley FLcDNAs were obtained. These sequenced clones should provide
′
′
8
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
comprehensive information about the barley gene repertory.
2
9
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
Results
2
3
Barley FLcDNA dataset
4
A total of 24,783 FLcDNAs were completely sequenced from the EST library (see
5
Supplemental Information) (Fig.1). After eliminating contaminated sequences and
6
possible fusions of unrelated transcripts, we obtained 23,614 barley FLcDNA sequences
7
(henceforth, the "National Institute of Agrobiological Sciences FLcDNAs, or NIAS
8
FLcDNAs"). These FLcDNAs were derived from the 12 cDNA libraries distinguished
9
by library-specific tag sequences and an unknown group because of lacking the tag
10
sequences (Supplemental Table S1). The number of FLcDNAs in the various libraries
11
ranged from 857 in the library derived from flowers in the juvenile stage to 2,643 in the
12
library derived from the plants treated with jasmonic acid (JA). To characterize the
13
dataset, we compared the sequences with 4,999 Haruna Nijo FLcDNA sequences that
14
had been published previously (henceforth, "Public FLcDNAs") (Sato et al. 2009). We
15
found that the NIAS FLcDNA inserts were clearly longer (1,701 ± 846 bp) than the
16
Public FLcDNA inserts (1,455 ± 742 bp) (Supplemental Figure S1). For comparison,
17
the average lengths of rice FLcDNAs in sets of 37,139 O. sativa var. japonica
18
FLcDNAs and 10,083 O. sativa var. indica FLcDNAs (DDBJ/EMBL/GenBank) were
19
1,746 ± 945 bp and 1,042 ± 535 bp, respectively. Moreover, the estimated proportion of
20
clones that might contain complete predicted open reading frames (ORFs; from Met to
21
the stop codon) was similar between the NIAS and Public FLcDNAs (84.9% and 85.6%,
22
respectively; Table I). These results suggest that the qualities of the two FLcDNA
10
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
datasets are quite similar in terms of capturing complete ORFs.
2
To evaluate the novel barley FLcDNAs revealed by our data, we compared the
3
sequence similarity between the NIAS and Public FLcDNAs. We found that 21,586 of
4
the 23,614 NIAS FLcDNAs showed no significant hits for any of the Public FLcDNAs.
5
After redundant clones were removed, the remaining 17,773 FLcDNAs represented
6
novel barley sequences (Fig. 1). Combined with the 4,878 nonredundant Public
7
FLcDNAs, 22,651 representative barley FLcDNAs were constructed. We have
8
designated these FLcDNAs “Uni-FLcDNAs” in this study and used this set for further
9
comparative analyses. The average length of the Uni-FLcDNAs was 1,711 ± 863 bp.
10
Recently, many paired sense/antisense transcripts referred to as natural antisense
11
transcripts (NATs) (Osato et al. 2003; Wang et al. 2005) have been identified. In
12
Arabidopsis, 958 NAT pairs have been confirmed (Alexandrov et al. 2006), and in rice,
13
687 bidirectional transcription units have been discovered by means of FLcDNA
14
mapping onto genome sequences (Osato et al. 2003). To test for the existence of NATs
15
in the obtained FLcDNAs, we used blastn to screen for paired sense/antisense
16
transcripts among all of the FLcDNAs, with a positive match criterion of greater than
17
95% identity and an E-value of less than 1 × 10−5. We determined that 2,051 FLcDNAs
18
could form sense/antisense pairs. An example of a sense/antisense pair presented by
19
ClustalX alignment software is shown in Supplemental Figure S2. NIASHv1002F20
20
and NIASHv1022F01 exhibit complete reverse homology, despite the fact that they
21
both contain predicted ORFs. This suggests that our dataset contains NATs, even for
22
sequences that encode predicted proteins.
11
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
2
Protein-coding genes
3
Based on homology searches and predictions of the longest ORF, 22,623 of
4
22,651 representative ORFs were identified; 19,212 of these ORFs (84.9%) were
5
homologous to known functional genes according to the results of blastx searches
6
against the RefSeq (Pruitt et al. 2009) and UniProtKB (The UniProt Consortium 2009)
7
databases. The proportion of representative FLcDNAs that were deemed to contain
8
complete ORFs was 85.4% (Table I), indicating that most of the FLcDNAs were
9
associated with protein-coding genes.
10
To identify sequences in the Uni-FLcDNAs that were homologous to the
11
previously cloned, rice trait genes, we employed 881 genes from Oryzabase
12
(http://www.shigen.nig.ac.jp/rice/oryzabase/, Kurata and Yamazaki 2006). Barley
13
homologs of these genes could play important roles in barley traits (phenotypes). We
14
searched for homologs of these sequences among the Uni-FLcDNAs using blastp
15
(Supplemental Table S2). We found that 721 of the 881 genes (81.8%) had barley
16
homologs with a high similarity (≥50% identity). Although we currently do not know
17
whether these homologs are orthologs, this result indicates that these trait genes are
18
highly conserved between barley and rice. The detection of homologous sequences
19
among the Uni-FLcDNAs could accelerate the comparative mapping of barley genes
20
and functional prediction of these homologous cDNAs to promote future barley
21
genomics research.
22
We found 75 representative FLcDNAs that were longer than 5,000 bp in length,
12
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
and some of these had the capacity to encode relatively long ORFs. For example, four
2
of the ten longest FLcDNAs encoded ORFs longer than 1,000 amino acids. In contrast,
3
three of their ORFs were shorter than 100 amino acids (Table II).
4
We estimated that there were 28 nc FLcDNAs among the Uni-FLcDNAs (see
5
Materials and Methods). Although these nc FLcDNAs could have been derived from
6
miRNAs, a blastn search against rice miRNAs revealed no significant homologies.
7
Moreover, as 26 of the 28 nc FLcDNAs exhibited no homologies with four fully
8
sequenced grass genomes (O. sativa, Z. mays, Sorghum bicolor and Brachypodium
9
distachyon), they might represent poorly conserved nc RNAs.
10
An analysis using InterProScan revealed that 16,859 of the 22,623 representative
11
ORFs (74.5%) contained conserved domains, and 12,595 ORFs (55.7%) were assigned
12
GO terms(Barrell et al. 2009). Using GO2slim, we found that “binding” (GO:0005488)
13
and “catalytic activity” (GO:0003824) were the most common second-level GO terms
14
found in the data (Supplemental Fig. S3). The distribution of GO terms related to the
15
barley Uni-FLcDNAs was quite similar to that seen in representative rice RAP data
16
(Tanaka et al. 2008).
17
18
Comparison of barley FLcDNAs with the Triticeae transcriptome
19
We described 17,773 novel barley nr FLcDNAs (Fig. 1). To evaluate the amount
20
of novel gene information in these sequences, homology searches of the FLcDNAs were
21
conducted against sequences from publicly available transcripts. For this purpose, 6,625
22
mRNAs and 525,559 ESTs from barley and 3,433 mRNAs and 1,107,168 ESTs from
13
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
wheat were downloaded from DDBJ/EMBL/GenBank.
2
A blastn search showed that 3,278 of the representative FLcDNAs had no
3
homologous sequences. The average insert length of these novel FLcDNAs (1,602 ±
4
921 bp) was slightly shorter than that of the known FLcDNAs (1,729 ± 851 bp) (Fig. 2).
5
Characterization of protein functions using the second-level GO terms showed a similar
6
distribution of terms among the novel and known FLcDNAs, except for “structural
7
molecule activity” (GO:0005198) (Fig. 3) (2.4% in the “known” gene set and 5.1% in
8
the “novel” gene set), indicating that the molecular functions of the novel FLcDNAs
9
were distributed similarly to those of the known FLcDNAs. The novel transcripts were
10
predicted to code for many different kinds of essential proteins, such as proteins
11
involved in protein synthesis, including ribosomal proteins and elongation factors,
12
transcription factors, cytochromes and stress-related proteins, including JA-induced
13
protein, CBFII-5.1, heat shock protein, and NBS-LRR disease resistance protein
14
homologue (Supplemental Table S3). Hence, we anticipate that these novel transcripts
15
will support the construction of an essential gene network for barley.
16
Of the 3,278 novel FLcDNAs, 1,974 were conserved in all four sequenced grass
17
genomes (i.e., O. sativa, Z. mays, S. bicolor, and B. distachyon (International Rice
18
Genome Sequencing Project 2005; Paterson et al. 2009; Schnable et al. 2009; The
19
International Brachypodium Initiative 2010)) according to the results of blastn searches,
20
but 942 had no grass homologs (Supplemental Table S4). We conducted blastp
21
searches of the amino acid sequences derived from the novel ORFs against predicted
22
protein sequences from the four species, and found that 1,731 ORFs were homologous
14
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
to genes annotated in at least one species. These results suggest that the molecular
2
functions of approximately two-thirds of the novel barley FLcDNAs could be predicted
3
by comparison with annotated genes in major grass species.
4
5
Detection of barley-specific FLcDNAs
6
To elucidate whether the Uni-FLcDNAs contained gene structures that are
7
conserved among crop plants, blastn searches were conducted against the four
8
completed genome sequences. The results showed that 1,699 FLcDNAs (Specific
9
FLcDNAs) were not associated with homologous regions in the four genomes, whereas
10
20,952 FLcDNAs (Common FLcDNAs) had a homolog in at least one of the four plant
11
species (Supplemental Table S5). Most FLcDNAs (19,778) were conserved in all four
12
species, but more barley ORFs were homologous to ORFs in O. sativa than in B.
13
distachyon, which shares a more recent common ancestor with barley than rice does.
14
This result might have been caused by differences in the methodology used to sequence
15
O. sativa and B. distachyon and the resulting quality of the reference genome sequences.
16
The number of conserved genes detected in Z. mays was less than half the number
17
detected in S. bicolor, even though the evolutionary distances of barley from Z. mays
18
and S. bicolor are similar. The length of the inserts in the barley-specific FLcDNAs was
19
shorter (1,301 ± 897 bp) than in the other FLcDNAs (1,745 ± 851 bp).
20
Of the specific FLcDNAs, 263 contained known functional domains based on
21
InterProScan analysis, and 25 were nc transcripts; thus, 85% of the specific FLcDNAs
22
were not assigned to any putative function. GO data suggested that the relative
15
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
proportions of the genes related to “signal transducer activity” (GO:0004871) and
2
“enzyme regulator activity” (GO:0030234) were greater in the specific FLcDNAs than
3
in the total representative FLcDNAs (data not shown). This suggests that barley exhibits
4
species-specific regulatory networks involved in signal transduction, transcription, and
5
metabolism.
6
7
Comparison of FLcDNA sequences with published BAC sequences of barley
8
To evaluate the barley FLcDNAs dataset in terms of its ability to capture a large
9
number of active FLcDNAs, we mapped all of the FLcDNAs on publicly available
10
barley bacterial artificial chromosome (BAC) sequences. Taketa and colleagues reported
11
(Taketa et al. 2008) a Haruna BAC contig (AP009567) in which two genes were
12
predicted. BAG12385.1 encodes a putative iron-deficiency specific 4 protein, and
13
BAG12386.1 is a putative ethylene-responsive transcription factor responsible for the
14
nud (hull-less) phenotype. Even though the public FLcDNAs could not be mapped to
15
the predicted loci, multiple FLcDNAs determined in this study could be mapped at each
16
locus, which suggests that the structures of these genes are accurate (Supplemental Fig.
17
S4). We also mapped the FLcDNAs on BACs of a different cultivar, Morex. Haruna
18
Nijo is an established Japanese two-row malting cultivar, whereas Morex is a six-row
19
cultivar that is used as the American malting industry's standard cultivar. Because
20
Morex will provide the first reference sequence for the barley genome, there have been
21
increasing numbers of submissions of Morex sequences in the DNA Data Bank of Japan
22
(DDBJ; www.ddbj.nig.ac.jp/). To estimate the conservation of genes sequences between
16
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
these cultivars, we mapped the FLcDNAs to publicly available Morex BAC sequences
2
using EST2genome (Mott 1997). We found that 373 FLcDNAs mapped to genomic
3
sequences from 113 of 181 BACs. Forty-five FLcDNAs (31 representatives) mapped to
4
more than one locus. For example, NIASHv1123K08 was mapped to four loci in three
5
BACs (one locus each of AY758233.1 and DQ249273.1 and two loci of AC239053.1).
6
These genes were highly conserved (99.4% identity on average, Fig. 4), and 61 of them
7
were completely identical. The results of this analysis indicate that the Haruna Nijo
8
FLcDNAs can be utilized by IBSC to annotate the Morex genome.
9
17
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
Discussion
2
3
Scientists have recently recognized that FLcDNAs are indispensable resources for
4
gene identification and annotation. As the barley genome sequencing project progresses,
5
obtaining information about barley FLcDNAs has become increasingly important.
6
There is a considerable amount of Triticeae (barley and wheat) transcript data in the
7
public domain; however, the number of barley FLcDNAs is insufficient compared with
8
the number available for other crops, such as O. sativa, Z. mays, and G. max (Kikuchi et
9
al. 2003; Umezawa et al. 2008; Soderlund et al.2009).
10
The sequences developed in this study represent the largest collection of Triticeae
11
FLcDNA data produced to date. To comprehensively cover the entire barley
12
transcriptome with FLcDNA clones, we constructed novel libraries from Haruna Nijo
13
from 12 different developmental stages, organs and/or stressed conditions and isolated
14
more than 170,000 clones (see Supplemental information). To identify the source of
15
clones in the pooled library constructions, tagged sub-libraries were used.
16
The ESTs we identified were clustered, and representative cDNA clones were
17
selected for full sequencing. The quality of the FLcDNAs obtained in this study was
18
evaluated by comparing their average sequence length (1,701bp) with that of publicly
19
available barley and rice FLcDNAs. A recent report in which the average length of
20
27,455 maize FLcDNAs was found to be 1,442 bp (Soderlund et al.2009) also supports
21
the length quality of the FLcDNAs in the present study. The method of FLcDNA
22
construction used here was the CAP-trapper method (Carninci et al. 1996), which has
18
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
been used in FLcDNA projects in mice, Arabidopsis (Seki et al. 2002; Seki et al. 2004),
2
rice (Kikuchi et al. 2003), and soybean (Umezawa et al. 2008). Because this
3
methodology is based on the selection of mRNAs with an existing cap structure, it
4
provides high coverage of mRNA structures.
5
The 22,651 representative barley FLcDNAs described here were constructed
6
using the clones produced in this study and publicly available barley FLcDNAs.
7
Functional categorization of the Haruna Nijo FLcDNAs based on ORF prediction and
8
GO assignments produced results similar to those found for rice FLcDNAs. This could
9
indicate conservation of the entire gene sets between barley and rice. We also detected
10
28 nc genes. Only 4 out of the 28 nc transcripts were included in the NAT pairs,
11
suggesting that NATs are not the sole reason underlying the observed the nc transcripts.
12
Comparative analysis of the Uni-FLcDNAs against public data indicated the
13
importance of this dataset. First, 17,773 FLcDNAs were found to be novel in barley,
14
and 3,278 cDNAs from the Uni-FLcDNAs exhibited no homology to public EST or
15
mRNA data from barley or wheat. We consider it unlikely that this paucity of
16
homologous sequences resulted from natural variation among Haruna Nijo and other
17
barley varieties, because 1,974 of the 3,278 novel FLcDNAs had common homologs in
18
all four grass species examined (i.e., O. sativa, Z. mays, S. bicolor, and B. distachyon).
19
As full-length sequences are available for these four grass species, these 1,974 genes
20
might be structurally conserved and functionally active in grasses in general. These
21
findings should help us to assign gene functions to the novel barley FLcDNAs.
22
Second, we detected 1,699 FLcDNAs that showed no homology to any of the four
19
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
grass genomes. Even though the mean insert length of these FLcDNAs was shorter than
2
that of the other cDNAs, these 1,699 FLcDNAs still have the capacity to encode
3
functional proteins. We therefore concluded that these genes are Triticeae-specific (or at
4
least Hordeum-specific) sequences. Unfortunately, only 263 specific FLcDNAs could
5
be assigned putative functions based on InterPro domains, and the gene functions of the
6
other genes are unknown. The complete gene structure information and the FLcDNA
7
clones of the barley-specific genes can be used in future experimental studies, such as
8
for overexpression of recombinant proteins or microarray analyses, to reveal the
9
functions of these genes. We note that the 20,952 FLcDNAs with homologs in all four
10
grass genomes might still include some barley-specific genes, because the coding
11
potential of their mapped regions has not been verified; additionally, there were cases
12
where FLcDNAs mapped to non-genic regions in the latest annotated genome data.
13
The barley cultivar Haruna Nijo, which was used here as the source of the cDNA
14
libraries generated, was released as a malting barley variety in 1981 and has been
15
intensively used in the pedigree of Japanese malting barleys because of its excellent
16
quality profiles for brewing. Additionally, several genetic/genomic resources have been
17
established for this cultivar. More than 140,000 ESTs have been sequenced, and the
18
positions of more than 2,890 of these have been located in genetic maps (Sato et al.
19
2009). A BAC library has been constructed (Saisho et al. 2007), and some of these BAC
20
clones have been beneficial for map-based cloning of trait genes in barley (Taketa et al.
21
2008). These resources have made Haruna Nijo a useful variety for investigating barley
22
genetics and genomics (see http://www.shigen.nig.ac.jp/barley/index.html).
20
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
The IBSC is currently sequencing the complete genome of the Morex cultivar.
2
Mapping of all of the Haruna Nijo FLcDNAs to genome sequences from Morex BACs
3
clearly showed that the Haruna Nijo FLcDNAs can serve as good resources for the
4
genome annotation of Morex. Based on the density of the mapped FLcDNAs (i.e., 48.8
5
kb per gene; 277 loci mapped to 181 BAC sequences, covering 13.52 Mb), we
6
estimated that the total gene number in the barley genome is approximately 100,000.
7
This estimation is much larger than an estimation presented in a previous report (Mayer
8
et al. 2009) of 38,000 to 48,000 genes in the barley genome (i.e., 104 to 132 kb per gene
9
on average). It is likely that we have overestimated the number of genes, because the
10
BAC clones used for genome sequencing come from gene-rich regions targeted for the
11
purpose of map-based cloning or the investigation of genic regions. Based on the
12
previously estimated gene number (Mayer et al. 2009), our Hruna Nijo dataset of
13
22,651 nr FLcDNAs could represent 47% to 59% of the total number of genes present
14
in barley. The fact that 54% to 70% of the rice genes predicted by RAP have been
15
validated by rice FLcDNAs indicates that our dataset is similarly comprehensive. Not
16
all genes present in barley are expected to have been expressed in the 12 conditions
17
examined in the present study, so we suggest that the proportion of active genes
18
captured could be much greater than 47% to 59%.
19
21
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
Conclusion
2
We cloned more than 170,000 Haruna Nijo barley FLcDNAs, and identified more
3
than 24,000 complete FLcDNAs. The final set of 22,651 representative FLcDNAs
4
obtained will be a very useful resource for future studies of barley, as well as for the
5
annotation of barley genomic sequences in the ongoing IBSC genome sequencing
6
project. These data will also support the future wheat genome sequencing project
7
currently being organized by the International Wheat Genome Sequencing Consortium
8
(IWGSC, http://www.wheatgenome.org/).
9
10
All FLcDNA data obtained in this study are available from our full length Barley
cDNA Database (http://barleyflc.dna.affrc.go.jp/hvdb/index.html).
11
22
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
Materials and Methods
2
3
Full-sequencing of representative clones
4
For each representative clone determined after clustering the end sequences of
5
clones, both the 5 and the 3 ends were resequenced using the Big Dye Terminator v3.1
6
Cycle Sequencing Kit and then analyzed with an ABI 3730xl DNA sequencer (Applied
7
Biosystems Inc., Foster City, CA, USA) to confirm the clone-sequence identity. Next,
8
the internal regions were sequenced using a primer-walking method until the sequences
9
from opposite directions overlapped. We designed the sequencing strategy so that at
10
least two reads covered every part of the insert region of each cDNA clone to assure
11
sequence quality. Finally, we assembled these reads to create a consensus sequence for
12
each clone.
′
′
13
14
Removing redundancy
15
All-against-all blastn searches (Altschul et al. 1990) were conducted among the
16
barley FLcDNA sequences to detect redundant sequences. When an FLcDNA was
17
similar to another FLcDNA with 90% identity and 95% coverage, the FLcDNAs
18
were clustered. If two FLcDNAs that did not overlap were in the same cluster, we
19
considered that a fused transcript that bridged the two FLcDNAs was contained in this
20
cluster. In this case, the cDNA was discarded and the cluster was separated into two
21
clusters.
≥
≥
22
23
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
Prediction of open reading frames and functional annotation
2
Representative ORFs were predicted by blastx searches against the RefSeq and
3
UniProtKB databases with a positive match cutoff set at an E-value less than 1 × 10−20.
4
If there were no homologous proteins in the databases, the longest ORFs (>70 amino
5
acids in length) were assigned as predicted ORFs. If no ORFs were predicted, for an
6
FLcDNA, it was defined as nc FLcDNA. nc FLcDNAs were compared with rice
7
microRNAs
8
Griffiths-Jones et al. 2008) using blastn. To assign a gene function on the basis of
9
conserved domains or motifs, InterProScan searches (Zdobnov and Apweiler 2001;
10
Hunter et al. 2009) were conducted using the predicted ORFs. Based on the results of
11
the InterProScan searches, GOs categories were assigned to each predicted ORF
12
(Barrell et al. 2009). The GOs assignments were categorized using map2slim software
13
(http://www.geneontology.org/GO.slims.shtml#whatIs).
(miRNAs)
downloaded
from
miRBase
(http://www.mirbase.org/;
14
15
Comparison of FLcDNA sequences with complete genome sequences from other
16
crop species
17
To conduct genome-wide comparisons of the FLcDNA sequences, genome
18
sequences and annotation data from four completely sequenced crop plants were
19
obtained at the following sites: the Rice Annotation Project Database (RAP-DB;
20
http://rapdb.dna.affrc.go.jp/)
21
(http://www.maizesequence.org/index.html)
22
(http://genome.jgi-psf.org/Sorbi1/Sorbi1.home.html) for S. bicolor, and BrachyBase
for
O.
sativa,
for
the
MaizeSequence
Z.
mays,
24
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
the
database
JGI
1
(http://www.brachybase.org/) for B. distachyon. The FLcDNA sequences were mapped
2
onto the genome sequences using blastn search at an E-value of less than 1 × 10-5.
3
4
Detection of novel barley FLcDNAs
5
mRNAs and ESTs from barley and wheat were downloaded from DDBJ. The
6
poly(A) sequences were then trimmed, and repetitive regions were masked using
7
RepeatMasker software (http://repeatmasker.org). We conducted blastn searches of the
8
sequences against all barley FLcDNAs with the threshold of a positive match set at an
9
identity 90% and a coverage 90%. Because the dataset of barley FLcDNAs contained
10
redundant sequences, multiple hits for the query mRNA or EST sequences were
11
accepted.
≥
≥
12
13
Comparison with rice trait genes
14
A
15
(http://www.shigen.nig.ac.jp/rice/oryzabase/genes/genesTop.jsp). Of 4,124 trait genes,
16
881 genes had RAP-DB transcript information. The amino acid sequences of these
17
genes were used as queries for blastp analyses with the predicted ORFs of the
18
Uni-FLcDNAs.
list
of
rice
trait
genes
was
downloaded
from
Oryzabase
19
20
Comparison with barley BAC sequences
To compare the FLcDNAs with BAC sequences, one Haruna Nijo (Taketa et al.
21
22
2008)
and
181
Morex
BAC
sequences
were
downloaded
25
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
from
the
1
DDBJ/EMBL/GenBank. The Morex BACs were downloaded using the keywords
2
“Morex” and “BAC”. After masking repeat sequences using RepeatMasker and the
3
libraries
4
(http://mips.helmholtz-muenchen.de/plant/genomes.jsp) (Spannagl et al. 2007) and
5
Triticeae
6
http://wheat.pw.usda.gov/ITMI/Repeats/), the FLcDNAs were mapped onto the BACs
7
using the same method as was used for RAP annotation (Tanaka et al. 2008). The
8
FLcDNAs were first mapped using blastn to determine the possible mapping regions in
9
the genomic sequences and further mapped using EST2genome (Mott 1997) with
10
thresholds of 95% identity and 90% coverage. When two or more FLcDNAs were
11
mapped to the same locus, the longest FLcDNA was used for further analysis.
of
Repeat
≥
MIPS
Repeat
Sequence
Element
Database
release
Database
10
4.3
(TREP,
≥
12
13
14
15
Sequence publication in the public domain
All of the sequence data for the representative FLcDNAs have been submitted to
the DDBJ (AK353559-AK377172).
16
26
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
Acknowledgments
2
3
The authors are grateful to Sapporo Breweries Co. Ltd for providing the
4
germinating barley seeds and to Dr. J. F. Ma from Okayama University for experimental
5
assistance. Published barley FLcDNA information on was provided by the National
6
Bio-Resource Project of the MEXT, Japan.
7
27
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
Literature cited
2
3
Alexandrov NN, Troukhan ME, Brover VV, Tatarinova T, Flavell RB and
4
Feldmann KA (2006) Features of Arabidopsis genes and genome discovered using
5
full-length cDNAs. Plant Mol Biol 60:69-85
6
Altschul SF, Gish W, Miller W, Myers EW and Lipman DJ (1990) Basic local
7
alignment search tool. J Mol Biol 215:403-410
8
Aoki K, Yano K, Suzuki A, Kawamura S, Sakurai N, Suda K, Kurabayashi A,
9
Suzuki T, Tsugane T, Watanabe M, et al (2010) Large-scale analysis of full-length
10
cDNAs from the tomato (Solanum lycopersicum) cultivar Micro-Tom, a reference
11
system for the Solanaceae genomics. BMC Genomics 11:210
12
Barrell D, Dimmer E, Huntley R P, Binns D, O'Donovan C and Apweiler R (2009)
13
The GOA database in 2009--an integrated Gene Ontology Annotation resource. Nucleic
14
Acids Res 37:D396-D403
15
Bennetzen JL, Coleman C, Liu R, Ma J and Ramakrishna W (2004) Consistent
16
over-estimation of gene number in complex plant genomes. Curr Opin Plant Biol
17
7:732-736
18
Carninci P, Kvam C, Kitamura A, Ohsumi T, Okazaki Y, Itoh M, Kamiya M,
19
Shibata K, Sasaki N, Izawa M, et al (1996) High-efficiency full-length cDNA cloning
20
by biotinylated CAP trapper. Genomics 37:327-336
21
Close TJ, Wanamaker SI, Caldo RA, Turner SM, Ashlock DA, Dickerson JA,
22
Wing RA, Muehlbauer GJ, Kleinhofs A and Wise RP (2004) A New Resource for
28
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
Cereal Genomics: 22K Barley GeneChip Comes of Age. Plant Physiol. 134:960-968
2
Cruveiller S, Jabbari K, Clay O and Bernardi G (2004) Incorrectly predicted genes
3
in rice? Gene 333:187-188
4
Druka A, Muehlbauer G, Druka I, Caldo R, Baumann U, Rostoks N, Schreiber A,
5
Wise R, Close T, Kleinhofs A, et al (2006) An atlas of gene expression from seed to
6
seed through barley development. Functional & Integrative Genomics 6:202-211
7
Duvick J, Fu A, Muppirala U, Sabharwal M, Wilkerson MD, Lawrence CJ,
8
Lushbough C and Brendel V (2008) PlantGDB: a resource for comparative plant
9
genomics. Nucl. Acids Res. 36:D959-D965
10
Griffiths-Jones S, Saini HK, van Dongen S and Enright AJ (2008) miRBase: tools
11
for microRNA genomics. Nucleic Acids Res 36:D154-D158
12
Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das
13
U, Daugherty L, Duquenne L, et al (2009) InterPro: the integrative protein signature
14
database. Nucleic Acids Res 37:D211-D215
15
Imanishi T, Itoh T, Suzuki Y, O'Donovan C, Fukuchi S, Koyanagi KO, Barrero R
16
A, Tamura T, Yamaguchi-Kabata Y, Tanino M, et al (2004) Integrative annotation
17
of 21,037 human genes validated by full-length cDNA clones. PLoS Biol. 2:0859-0875
18
International Rice Genome Sequencing Project (2005) The map-based sequence of
19
the rice genome. Nature 436:793-800
20
Itoh T, Tanaka T, Barrero RA, Yamasaki C, Fujii Y, Hilton PB, Antonio BA,
21
Aono H, Apweiler R, Bruskiewich R, et al (2007) Curated genome annotation of
22
Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana.
29
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
Genome Res 17:175-183
2
Jabbari K, Cruveiller S, Clay O, Le Saux J and Bernardi G (2004) The new genes
3
of rice: a closer look. Trends in Plant Science 9:281
4
Kawai J, Shinagawa A, Shibata K, Yoshino M, Itoh M, Ishii Y, Arakawa T, Hara
5
A, Fukunishi Y, Konno H, et al (2001) Functional annotation of a full-length mouse
6
cDNA collection. Nature 409:685-690
7
Kawaura K, Mochida K, Enju A, Totoki Y, Toyoda A, Sakaki Y, Kai C, Kawai J,
8
Hayashizaki Y, Seki M, et al (2009) Assessment of adaptive evolution between wheat
9
and rice as deduced from full-length common wheat cDNA sequence data and
10
expression patterns. BMC Genomics 10:271
11
Kikuchi S, Satoh K, Nagata T, Kawagashira N, Doi K, Kishimoto N, Yazaki J,
12
Ishikawa M, Yamada H, Ooka H, et al (2003) Collection, Mapping, and Annotation
13
of Over 28,000 cDNA Clones from japonica Rice. Science 301:376-379
14
Koo BC, Bushman BS and Mott IW (2008) Transcripts Associated with
15
Non-Acclimated Freezing Response in Two Barley Cultivars. The Plant Genome
16
1:21-32
17
Kurata N and Yamazaki Y (2006) Oryzabase. An integrated biological and genome
18
information database for rice. Plant Physiol. 140:12-17
19
Marin-Rodriguez MC, Orchard J and Seymour GB (2002) Pectate lyases, cell wall
20
degradation and fruit softening. J. Exp. Bot. 53:2115-2119
21
Maruyama K and Sugano S (1994) Oligo-capping: a simple method to replace the cap
22
structure of eukaryotic mRNAs with oligoribonucleotides. Gene 138:171-174
30
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
Mayer KFX, Taudien S, Martis M, Simkova H, Suchankova P, Gundlach H,
2
Wicker T, Petzold A, Felder M, Steuernagel B, et al (2009) Gene Content and
3
Virtual Gene Order of Barley Chromosome 1H. Plant Physiol. 151:496-505
4
Mochida K, Saisho D, Yoshida T, Sakurai T and Shinozaki K (2008) TriMEDB: A
5
database to integrate transcribed markers and facilitate genetic studies of the tribe
6
Triticeae. BMC Plant Biology 8:72
7
Mott R (1997) EST_GENOME: a program to align spliced DNA sequences to
8
unspliced genomic DNA. Comput Appl Biosci. 13:477-478
9
Ogihara Y, Mochida K, Kawaura K, Murai K, Seki M, Kamiya A, Shinozaki K,
10
Carninci P, Hayashizaki Y, Shin-I T, et al (2004) Construction of a full-length cDNA
11
library from young spikelets of hexaploid wheat and its characterization by large-scale
12
sequencing of expressed sequence tags. Genes & Genetic Systems 79:227-232
13
Osato N, Yamada H, Satoh K, Ooka H, Yamamoto M, Suzuki K, Kawai J,
14
Carninci P, Ohtomo Y, Murakami K, et al (2003) Antisense transcripts with rice
15
full-length cDNAs. Genome Biol 5:R5
16
Ota T, Suzuki Y, Nishikawa T, Otsuki T, Sugiyama T, Irie R, Wakamatsu A,
17
Hayashi K, Sato H, Nagai K, et al (2004) Complete sequencing and characterization
18
of 21,243 full-length human cDNAs. Nat Genet 36:40-45
19
Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H,
20
Haberer G, Hellsten U, Mitros T, Poliakov A, et al (2009) The Sorghum bicolor
21
genome and the diversification of grasses. Nature 457:551-556
22
Pruitt KD, Tatusova T, Klimke W and Maglott DR (2009) NCBI Reference
31
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
Sequences: current status, policy and new initiatives. Nucleic Acids Res 37:D32-D36
2
Saisho D, Myoraku E, Kawasaki S, Sato K and Takeda K (2007) Construction and
3
Characterization of a Bacterial Artificial Chromosome (BAC) Library from the
4
Japanese Malting Barley variety Haruna Nijo. Breeding Science 57:29-38
5
Sato K, Nankaku N and Takeda K (2009) A high-density transcript linkage map of
6
barley derived from a single population. Heredity 103:110-117
7
Sato K, Shin I T, Seki M, Shinozaki K, Yoshida H, Takeda K, Yamazaki Y, Conte
8
M and Kohara Y (2009) Development of 5006 full-length CDNAs in barley: a tool for
9
accessing cereal genomics resources. DNA Res 16:81-89
10
Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, Liang C, Zhang J,
11
Fulton L, Graves TA, et al (2009) The B73 maize genome: complexity, diversity, and
12
dynamics. Science 326:1112-1115
13
Schulte D, Close TJ, Graner A, Langridge P, Matsumoto T, Muehlbauer G, Sato K,
14
Schulman AH, Waugh R, Wise RP, et al (2009) The International Barley Sequencing
15
Consortium--At the Threshold of Efficient Access to the Barley Genome. Plant Physiol.
16
149:142-147
17
Seki M, Narusaka M, Kamiya A, Ishida J, Satou M, Sakurai T, Nakajima M, Enju
18
A, Akiyama K, Oono Y, et al (2002) Functional Annotation of a Full-Length
19
Arabidopsis cDNA Collection. Science 296:141-145
20
Seki M, Satou M, Sakurai T, Akiyama K, Iida K, Ishida J, Nakajima M, Enju A,
21
Narusaka M, Fujita M, et al (2004) RIKEN Arabidopsis full-length (RAFL) cDNA
22
and its applications for expression profiling under abiotic stress conditions. J. Exp. Bot.
32
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
55:213-223
2
Soderlund C, Descour A, Kudrna D, Bomhoff M, Boyd L, Currie J, Angelova A,
3
Collura K, Wissotski M, Ashley E, et al (2009) Sequencing, Mapping, and Analysis
4
of 27,455 Maize Full-Length cDNAs. PLoS Genet 5:e1000740
5
Spannagl M, Noubibou O, Haase D, Yang L, Gundlach H, Hindemitt T, Klee K,
6
Haberer G, Schoof H and Mayer KF (2007) MIPSPlantsDB--plant database resource
7
for integrative and comparative plant genome research. Nucleic Acids Res
8
35:D834-D840
9
Sreenivasulu N, Radchuk V, Strickert M, Miersch O, Weschke W and Wobus U
10
(2006) Gene expression patterns reveal tissue-specific signaling networks controlling
11
programmed cell death and ABA-regulated maturation in developing barley seeds. The
12
Plant Journal 47:310-327
13
Sreenivasulu N, Usadel B, Winter A, Radchuk V, Scholz U, Stein N, Weschke W,
14
Strickert M, Close TJ, Stitt M, et al (2008) Barley Grain Maturation and
15
Germination: Metabolic Pathway and Regulatory Network Commonalities and
16
Differences Highlighted by New MapMan/PageMan Profiling Tools. Plant Physiol.
17
146:1738-1758
18
Sreenivasulu N and Wobus U (2008) Barley Genomics: An Overview. International
19
Journal of Plant Genomics 2008
20
Suzuki Y, Yoshitomo-Nakagawa K, Maruyama K, Suyama A and Sugano S (1997)
21
Construction and characterization of a full length-enriched and a 5'-end-enriched cDNA
22
library. Gene 200:149-156
33
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
Taketa S, Amano S, Tsujino Y, Sato T, Saisho D, Kakeda K, Nomura M, Suzuki T,
2
Matsumoto T, Sato K, et al (2008) Barley grain with adhering hulls is controlled by an
3
ERF family transcription factor gene regulating a lipid biosynthesis pathway.
4
Proceedings of the National Academy of Sciences 105:4062-4067
5
Talame V, Ozturk NZ, Bohnert HJ and Tuberosa R (2007) Barley transcript profiles
6
under dehydration shock and drought stress treatments: a comparative analysis. J. Exp.
7
Bot. 58:229-240
8
Tanaka T, Antonio B A, Kikuchi S, Matsumoto T, Nagamura Y, Numa H, Sakai H,
9
Wu J, Itoh T, Sasaki T, et al (2008) The Rice Annotation Project Database
10
(RAP-DB): 2008 update. Nucleic Acids Res 36:D1028-33
11
The International Brachypodium Initiative (2010) Genome sequencing and analysis
12
of the model grass Brachypodium distachyon. Nature 463:763-768
13
The UniProt Consortium (2009) The Universal Protein Resource (UniProt) 2009.
14
Nucleic Acids Res 37:D169-D174
15
Tommasini L, Svensson J, Rodriguez E, Wahid A, Malatrasi M, Kato K,
16
Wanamaker S, Resnik J and Close T (2008) Dehydrin gene expression provides an
17
indicator of low temperature and drought stress: transcriptome-based analysis of Barley
18
( Hordeum vulgare L.). Functional & Integrative Genomics 8:387-405
19
Ueda A, Kathiresan A, Bennett J and Takabe T (2006) Comparative transcriptome
20
analyses of barley and rice under salt stress. TAG Theoretical and Applied Genetics
21
112:1286-1294
22
Umezawa T, Sakurai T, Totoki Y, Toyoda A, Seki M, Ishiwata A, Akiyama K,
34
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
Kurotani A, Yoshida T, Mochida K, et al (2008) Sequencing and Analysis of
2
Approximately 40 000 Soybean cDNA Clones from a Full-Length-Enriched cDNA
3
Library. DNA Res 15:333-346
4
Varshney RK, Langridge P, Graner A and Jeffery CH (2007) Application of
5
Genomics to Molecular Breeding of Wheat and Barley. Advances in Genetics Volume
6
58:121-155
7
Walia H, Wilson C, Wahid A, Condamine P, Cui X and Close T (2006) Expression
8
analysis of barley (Hordeum vulgare L.) during salinity stress. Functional & Integrative
9
Genomics 6:143-156
10
Wang XJ, Gaasterland T and Chua NH (2005) Genome-wide prediction and
11
identification of cis-natural antisense transcripts in Arabidopsis thaliana. Genome Biol
12
6:R30
13
Wang J, Shi ZY, Wan XS, Sheu GZ and Zhang JL (2008) The expression pattern of
14
a rice proteinase inhibitor gene OsP18-1 implies its role in plant development. J Plant
15
Physiol 165:1519-1529
16
Zdobnov E M and Apweiler R (2001) InterProScan--an integration platform for the
17
signature-recognition methods in InterPro. Bioinformatics 17:847-848
18
35
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
Figure Legends
2
Figure 1.
3
study
Determination of the representative barley FLcDNAs obtained in this
4
5
Figure 2.
Distribution of nucleotide lengths of known and novel FLcDNAs
6
Size distributions of FLcDNAs with a high homology to known grass genes (white bar)
7
and with no homology to known gene s (black bars).
8
9
Figure 3.
GO categorization of known and novel FLcDNAs
10
Functional motif/domains were detected and categorized using GO criteria. Results for
11
the representative FLcDNAs with a high homology to known grass genes (Known) and
12
with no homology to known genes (Novel).
13
14
Figure 4.
Distribution of sequence identities of mapped Haruna Nijo FLcDNAs to
15
published Morex BAC sequences
16
36
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
Tables
2
Table I. Comparison of the completeness of the ORFs in the barley FLcDNAs
3
NIAS
Complete ORF
Public
Representative
20,055
4,279
19,335
5'-truncated ORF
3,404
685
3,181
3'-truncated ORF
100
11
79
5’-, 3’-truncated ORFa
29
4
28
Non-protein codingb
26
20
28
23,614
4,999
22,651
Total
4
a
5
proteins or longest ORFs whose lengths were ≥70 amino acids.
ORF had neither start nor stop codon.
b
FLcDNAs had no ORFs that were similar to known
6
37
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.
1
Table II. The ten longest FLcDNAs in the Uni-FLcDNA library
2
Nucleotide Amino acid length
Clone name
Gene function
length (bp) (AA)
NIASHv2010H17
7,384
460
Mitogen-activated protein kinase
NIASHv3115C23
7,155
2,230
Acetyl-coenzyme A carboxylase
NIASHv2090B16
7,012
90
Cycloeucalenol cycloisomerase
FLbaf76j08
6,765
98
Hypothetical protein
NIASHv2019K03
6,637
1,731
Chromatin remodeling complex subunit
NIASHv2035I12
6,311
227
Conserved hypothetical protein
NIASHv2017F18
6,183
152
KN1-type homeobox transcription factor
NIASHv1031N03
6,072
33
Hypothetical protein
NIASHv2125A18
6,037
1,859
DNA-directed RNA polymerase
NIASHv3085A02
6,036
1,792
Callose synthase 1 catalytic subunit
3
38
Downloaded from on June 18, 2017 - Published by www.plantphysiol.org
Copyright © 2011 American Society of Plant Biologists. All rights reserved.