Plant Physiology Preview. Published on March 17, 2011, as DOI:10.1104/pp.110.171579 1 Running head: Analysis of Barley FLcDNAs 2 3 4 Corresponding author: Takashi Matsumoto 5 National Institute of Agrobiological Sciences, 2-1-2, Kannondai, Tsukuba, Ibaraki 6 305-8602, Japan 7 Tel: +81-29-838-7441 8 e-mail: [email protected] 9 10 Research area: Genome Analysis 11 1 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. Copyright 2011 by the American Society of Plant Biologists 1 Comprehensive Sequence Analysis of 24,783 Barley Full-length cDNAs derived 2 from Twelve Clone Libraries 3 4 Takashi Matsumoto*#, Tsuyoshi Tanaka*, Hiroaki Sakai, Naoki Amano, Hiroyuki 5 Kanamori, Kanako Kurita, Ari Kikuta, Kozue Kamiya, Mayu Yamamoto, Hiroshi Ikawa, 6 Nobuyuki Fujii, Kiyosumi Hori, Takeshi Itoh and Kazuhiro Sato 7 National Institute of Agrobiological Sciences, 2-1-2, Kannondai, Tsukuba, Ibaraki 8 305-8602, Japan (T. M., T. T., H. S., N. A., K. H., T. I.), Institute of Society for 9 Techno-Innovation of Agriculture, Forestry and Fisheries, 446-1, Kamiyokoba, Tsukuba, 10 Ibaraki 305-0854, Japan (H. K., K. Ku., A. K., K. Ka., M. Y., H. I.), Hitachi 11 Government & Public Corporation System Engineering, Ltd., 2-4-8, Toyo-cyo, Koto, 12 Tokyo 135-8633, Japan (N. F.), 4Institute of Plant Science and Resources, Okayama 13 University, 2-20-1, Chuo, Kurashiki, Okayama, 710-0046, Japan (K. H., K. S.) 14 2 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 Financial sources: This work was supported by grants to T. M. and T. I. from the 2 Ministry of Agriculture, Forestry and Fisheries of Japan (Genomics for Agricultural 3 Innovation, TRC-1008, GIR-1001; and Integrated Research Project for Plant, Insect and 4 Animal using Genome Technology, GD-1001, GD-1002). 5 6 * These authors contributed equally to this work. 7 8 # Corresponding author. 9 Takashi Matsumoto 10 e-mail: [email protected] 11 3 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 Abstract 2 3 Full-length cDNA (FLcDNAs) libraries consisting of 172,000 clones were constructed 4 from a two-row malting barley cultivar, Hordeum vulgare L. cv. Haruna Nijo, under 5 normal and stressed conditions. After sequencing the clones from both ends and 6 clustering the sequences, a total of 24,783 complete sequences were produced. By 7 removing duplicates between these and publicly available sequences, 22,651 8 representative sequences were obtained; 17,773 were novel barley FLcDNAs, and 1,699 9 were barley specific. Highly conserved genes were found in the barley FLcDNA 10 sequences for 721 of 881 rice trait genes with 50% or greater identity. These FLcDNA 11 resources from our Haruna Nijo cDNA libraries and the full-length sequences of 12 representative clones, will improve our understanding of the biological functions of 13 genes in barley, which is the cereal crop with the fourth-highest production in the world, 14 and will provide a powerful tool for annotating the barley genome sequences that will 15 become available in the near future. 16 4 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 Introduction 2 3 In 2009, approximately 54 million hectares of barley (Hordeum vulgare) were 4 cultivated, and harvests of this species produced approximately 150 million tons of 5 grain worldwide, according to the Food and Agriculture Organization of the United 6 Nations (FAOSTAT; http://faostat.fao.org/). Barley belongs to the Poaceae family (i.e., 7 grasses), which also includes rice (Oryza sativa), maize (Zea mays), wheat (Triticum 8 spp.), and rye (Secale sereade; cross-pollinated). These members of the Poaceae, 9 including barley, are the major cereal crops cultivated throughout the world. Barley is 10 utilized for animal feed, malting, and as a human food source. Because barley is 11 self-pollinated and has a diploid (2n = 14) genome, it is recognized as a genetic model 12 of the Triticeae tribe within the Poaceae. Studies of the barley genome, transcriptome, 13 and proteome are currently advancing our understanding of the molecular functions of 14 agriculturally important barley genes (Sreenivasulu and Wobus 2008; Sreenivasulu et al. 15 2008). 16 Although barley has been extensively studied in terms of its genetics and breeding, 17 molecular biology and genomics studies have been limited until recently because of the 18 large genome size of this species (>5 Gb; ca. 12 times that of rice) (Varshney et al. 19 2007). To further promote barley genomics, a multi-national collaboration, the 20 International Barley Sequencing Consortium (IBSC), has been developed with the 21 objective of obtaining the whole genome sequence of barley (http://barleygenome.org; 22 Schulte et al. 2009). 5 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 The study of an organism's transcriptome (i.e., all transcribed sequences) is one of 2 the most effective ways to investigate the structure and function of its active genes. In 3 many plant species, transcript contigs have been constructed by assembling all the 4 expressed sequence tag (EST) data available in PlantGDB, with the aim of identifying a 5 dataset of unique mRNA sequences and maximizing the information obtained for both 6 protein-coding and non-coding (nc) regions in these sequences (Duvick et al. 2008). A 7 large set of ESTs (501,620 from the vulgare subspecies and 24,161 from the 8 spontaneum 9 www.ncbi.nlm.nih.gov/dbEST/; subspecies in the and NCBI-dbEST 522,561 sequences release-100110: in PlantGDB: 10 www.plantgdb.org/) has been accumulated in the public domain. These ESTs have been 11 assembled 12 (http://www.ncbi.nlm.nih.gov/unigene) and into 134,482 Plant GDB-assembled unique 13 transcripts 14 estimated that approximately 75% of the genes in the barley genome have been captured 15 (Sreenivasulu et al. 2008). Although these assemblies provide researchers with a 16 tremendous amount of information for understanding partial barley gene structures, the 17 assembly of sequences from different barley varieties might result in erroneous contigs 18 or lead to the fusion of unrelated transcripts via common domains or motif sequences. 19 Moreover, these data might lead to an overestimation of the number of genes in barley 20 because these assemblies do not consider virtual connections between 5 and 3 end 21 sequences, where both sequences are derived from the same cDNA clone. 22 into (PUTs) 23,595 clusters in NCBI’s UniGene (http://www.plantgdb.org/prj/ESTCluster/progress.php). ′ dataset It is ′ Capturing the transcripts of all active genes under defined (temporal, spatial, and 6 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 stressed) conditions can provide a “snapshot” of living cells. The resultant data can be 2 used to obtain quantitative (expression frequency) and qualitative (predicted protein 3 function) information about these genes. For barley transcriptome analysis, a 22K 4 Barley1 GeneChip probe array based on an EST database containing 350,000 sequences 5 from 84 different RNA sources has been established (Close et al. 2004). The results of 6 high-throughput transcript profiling of barley using this chip can be retrieved from 7 PLEXdb 8 (http://www.ebi.ac.uk/microarray-as/ae/). Using several platforms for transcript 9 profiling, researchers have described the responses of barley to conditions such as high 10 salinity (Ueda et al. 2006; Walia et al. 2006), low temperature or freezing (Koo et al. 11 2008), and drought (Talame et al. 2007; Tommasini et al. 2008), as well as during 12 various stages, such as grain development (Sreenivasulu et al. 2006), germination 13 (Sreenivasulu et al. 2008), and growth (Druka et al. 2006), by means of coordinated up 14 or down regulation of sets of genes. (http://www.plexdb.org/) and ArrayExpress 15 Moreover, the construction of a comprehensive gene set for barley by the genome 16 sequencing is underway. However, following sequencing, genome annotation in the 17 absence of information on exon-intron junctions is not an easy task, even for organisms 18 for which complete genome sequences have been revealed. Although several gene 19 prediction programs have been developed, the predicted gene structures might not 20 always be correct because these programs may select and connect incorrectly predicted 21 exons (Bennetzen et al. 2004; Cruveiller et al. 2004; Jabbari et al. 2004). Hence, most 22 gene annotation projects refer to EST or mRNA sequences for the accurate structural 7 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 annotation of genes (Imanishi et al. 2004; Itoh et al. 2007). 2 Full-length cDNA (FLcDNA) is defined as the DNA complementary to an mRNA 3 sequence that extends from the region near the 5' cap structure to the poly(A) tail, which 4 are structures specific to eukaryotic mRNAs (Maruyama and Sugano 1994; Suzuki et al. 5 1997; Carninci et al. 2000). FLcDNA technology has been applied to the genomes of 6 humans (Ota et al. 2004), mice (Kawai et al. 2001), and several plant species, including 7 Arabidopsis thaliana (Seki et al. 2002), O. sativa (Kikuchi et al. 2003), T. aestivum 8 (Ogihara et al. 2004; Kawaura et al. 2009), soybean (Glycine max) (Umezawa et al. 9 2008), Z. mays (Soderlund et al. 2009) and tomato (Solanum lycopersicum) (Aoki et al. 10 2010). Recently, an FLcDNA library has been constructed for the Japanese malting 11 barley variety Haruna Nijo (Sato et al. 2009). From more than 45,000 cDNA clones, 12 5,006 non-overlapping sequences were obtained. These sequences and clones have been 13 made publicly available through the Barley DB (http://www.shigen.nig.ac.jp/barley/), 14 and they have proven effective in the identification of functional genes. However, this 15 information is insufficient because the number of obtained FLcDNAs is much lower 16 than all of the expressed genes in the barley genome. 17 In the present study, we constructed twelve FLcDNA libraries for Haruna Nijo 18 from various organs or under different conditions to produce a comprehensive dataset 19 for this barley variety. After 5 and 3 end-sequencing and clustering of approximately 20 172,000 cDNA clones, 24,783 complete FLcDNAs sequences were retrieved. In 21 conjunction with publicly available barley FLcDNAs, a total of 22,651 non-redundant 22 (nr) barley FLcDNAs were obtained. These sequenced clones should provide ′ ′ 8 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 comprehensive information about the barley gene repertory. 2 9 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 Results 2 3 Barley FLcDNA dataset 4 A total of 24,783 FLcDNAs were completely sequenced from the EST library (see 5 Supplemental Information) (Fig.1). After eliminating contaminated sequences and 6 possible fusions of unrelated transcripts, we obtained 23,614 barley FLcDNA sequences 7 (henceforth, the "National Institute of Agrobiological Sciences FLcDNAs, or NIAS 8 FLcDNAs"). These FLcDNAs were derived from the 12 cDNA libraries distinguished 9 by library-specific tag sequences and an unknown group because of lacking the tag 10 sequences (Supplemental Table S1). The number of FLcDNAs in the various libraries 11 ranged from 857 in the library derived from flowers in the juvenile stage to 2,643 in the 12 library derived from the plants treated with jasmonic acid (JA). To characterize the 13 dataset, we compared the sequences with 4,999 Haruna Nijo FLcDNA sequences that 14 had been published previously (henceforth, "Public FLcDNAs") (Sato et al. 2009). We 15 found that the NIAS FLcDNA inserts were clearly longer (1,701 ± 846 bp) than the 16 Public FLcDNA inserts (1,455 ± 742 bp) (Supplemental Figure S1). For comparison, 17 the average lengths of rice FLcDNAs in sets of 37,139 O. sativa var. japonica 18 FLcDNAs and 10,083 O. sativa var. indica FLcDNAs (DDBJ/EMBL/GenBank) were 19 1,746 ± 945 bp and 1,042 ± 535 bp, respectively. Moreover, the estimated proportion of 20 clones that might contain complete predicted open reading frames (ORFs; from Met to 21 the stop codon) was similar between the NIAS and Public FLcDNAs (84.9% and 85.6%, 22 respectively; Table I). These results suggest that the qualities of the two FLcDNA 10 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 datasets are quite similar in terms of capturing complete ORFs. 2 To evaluate the novel barley FLcDNAs revealed by our data, we compared the 3 sequence similarity between the NIAS and Public FLcDNAs. We found that 21,586 of 4 the 23,614 NIAS FLcDNAs showed no significant hits for any of the Public FLcDNAs. 5 After redundant clones were removed, the remaining 17,773 FLcDNAs represented 6 novel barley sequences (Fig. 1). Combined with the 4,878 nonredundant Public 7 FLcDNAs, 22,651 representative barley FLcDNAs were constructed. We have 8 designated these FLcDNAs “Uni-FLcDNAs” in this study and used this set for further 9 comparative analyses. The average length of the Uni-FLcDNAs was 1,711 ± 863 bp. 10 Recently, many paired sense/antisense transcripts referred to as natural antisense 11 transcripts (NATs) (Osato et al. 2003; Wang et al. 2005) have been identified. In 12 Arabidopsis, 958 NAT pairs have been confirmed (Alexandrov et al. 2006), and in rice, 13 687 bidirectional transcription units have been discovered by means of FLcDNA 14 mapping onto genome sequences (Osato et al. 2003). To test for the existence of NATs 15 in the obtained FLcDNAs, we used blastn to screen for paired sense/antisense 16 transcripts among all of the FLcDNAs, with a positive match criterion of greater than 17 95% identity and an E-value of less than 1 × 10−5. We determined that 2,051 FLcDNAs 18 could form sense/antisense pairs. An example of a sense/antisense pair presented by 19 ClustalX alignment software is shown in Supplemental Figure S2. NIASHv1002F20 20 and NIASHv1022F01 exhibit complete reverse homology, despite the fact that they 21 both contain predicted ORFs. This suggests that our dataset contains NATs, even for 22 sequences that encode predicted proteins. 11 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 2 Protein-coding genes 3 Based on homology searches and predictions of the longest ORF, 22,623 of 4 22,651 representative ORFs were identified; 19,212 of these ORFs (84.9%) were 5 homologous to known functional genes according to the results of blastx searches 6 against the RefSeq (Pruitt et al. 2009) and UniProtKB (The UniProt Consortium 2009) 7 databases. The proportion of representative FLcDNAs that were deemed to contain 8 complete ORFs was 85.4% (Table I), indicating that most of the FLcDNAs were 9 associated with protein-coding genes. 10 To identify sequences in the Uni-FLcDNAs that were homologous to the 11 previously cloned, rice trait genes, we employed 881 genes from Oryzabase 12 (http://www.shigen.nig.ac.jp/rice/oryzabase/, Kurata and Yamazaki 2006). Barley 13 homologs of these genes could play important roles in barley traits (phenotypes). We 14 searched for homologs of these sequences among the Uni-FLcDNAs using blastp 15 (Supplemental Table S2). We found that 721 of the 881 genes (81.8%) had barley 16 homologs with a high similarity (≥50% identity). Although we currently do not know 17 whether these homologs are orthologs, this result indicates that these trait genes are 18 highly conserved between barley and rice. The detection of homologous sequences 19 among the Uni-FLcDNAs could accelerate the comparative mapping of barley genes 20 and functional prediction of these homologous cDNAs to promote future barley 21 genomics research. 22 We found 75 representative FLcDNAs that were longer than 5,000 bp in length, 12 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 and some of these had the capacity to encode relatively long ORFs. For example, four 2 of the ten longest FLcDNAs encoded ORFs longer than 1,000 amino acids. In contrast, 3 three of their ORFs were shorter than 100 amino acids (Table II). 4 We estimated that there were 28 nc FLcDNAs among the Uni-FLcDNAs (see 5 Materials and Methods). Although these nc FLcDNAs could have been derived from 6 miRNAs, a blastn search against rice miRNAs revealed no significant homologies. 7 Moreover, as 26 of the 28 nc FLcDNAs exhibited no homologies with four fully 8 sequenced grass genomes (O. sativa, Z. mays, Sorghum bicolor and Brachypodium 9 distachyon), they might represent poorly conserved nc RNAs. 10 An analysis using InterProScan revealed that 16,859 of the 22,623 representative 11 ORFs (74.5%) contained conserved domains, and 12,595 ORFs (55.7%) were assigned 12 GO terms(Barrell et al. 2009). Using GO2slim, we found that “binding” (GO:0005488) 13 and “catalytic activity” (GO:0003824) were the most common second-level GO terms 14 found in the data (Supplemental Fig. S3). The distribution of GO terms related to the 15 barley Uni-FLcDNAs was quite similar to that seen in representative rice RAP data 16 (Tanaka et al. 2008). 17 18 Comparison of barley FLcDNAs with the Triticeae transcriptome 19 We described 17,773 novel barley nr FLcDNAs (Fig. 1). To evaluate the amount 20 of novel gene information in these sequences, homology searches of the FLcDNAs were 21 conducted against sequences from publicly available transcripts. For this purpose, 6,625 22 mRNAs and 525,559 ESTs from barley and 3,433 mRNAs and 1,107,168 ESTs from 13 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 wheat were downloaded from DDBJ/EMBL/GenBank. 2 A blastn search showed that 3,278 of the representative FLcDNAs had no 3 homologous sequences. The average insert length of these novel FLcDNAs (1,602 ± 4 921 bp) was slightly shorter than that of the known FLcDNAs (1,729 ± 851 bp) (Fig. 2). 5 Characterization of protein functions using the second-level GO terms showed a similar 6 distribution of terms among the novel and known FLcDNAs, except for “structural 7 molecule activity” (GO:0005198) (Fig. 3) (2.4% in the “known” gene set and 5.1% in 8 the “novel” gene set), indicating that the molecular functions of the novel FLcDNAs 9 were distributed similarly to those of the known FLcDNAs. The novel transcripts were 10 predicted to code for many different kinds of essential proteins, such as proteins 11 involved in protein synthesis, including ribosomal proteins and elongation factors, 12 transcription factors, cytochromes and stress-related proteins, including JA-induced 13 protein, CBFII-5.1, heat shock protein, and NBS-LRR disease resistance protein 14 homologue (Supplemental Table S3). Hence, we anticipate that these novel transcripts 15 will support the construction of an essential gene network for barley. 16 Of the 3,278 novel FLcDNAs, 1,974 were conserved in all four sequenced grass 17 genomes (i.e., O. sativa, Z. mays, S. bicolor, and B. distachyon (International Rice 18 Genome Sequencing Project 2005; Paterson et al. 2009; Schnable et al. 2009; The 19 International Brachypodium Initiative 2010)) according to the results of blastn searches, 20 but 942 had no grass homologs (Supplemental Table S4). We conducted blastp 21 searches of the amino acid sequences derived from the novel ORFs against predicted 22 protein sequences from the four species, and found that 1,731 ORFs were homologous 14 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 to genes annotated in at least one species. These results suggest that the molecular 2 functions of approximately two-thirds of the novel barley FLcDNAs could be predicted 3 by comparison with annotated genes in major grass species. 4 5 Detection of barley-specific FLcDNAs 6 To elucidate whether the Uni-FLcDNAs contained gene structures that are 7 conserved among crop plants, blastn searches were conducted against the four 8 completed genome sequences. The results showed that 1,699 FLcDNAs (Specific 9 FLcDNAs) were not associated with homologous regions in the four genomes, whereas 10 20,952 FLcDNAs (Common FLcDNAs) had a homolog in at least one of the four plant 11 species (Supplemental Table S5). Most FLcDNAs (19,778) were conserved in all four 12 species, but more barley ORFs were homologous to ORFs in O. sativa than in B. 13 distachyon, which shares a more recent common ancestor with barley than rice does. 14 This result might have been caused by differences in the methodology used to sequence 15 O. sativa and B. distachyon and the resulting quality of the reference genome sequences. 16 The number of conserved genes detected in Z. mays was less than half the number 17 detected in S. bicolor, even though the evolutionary distances of barley from Z. mays 18 and S. bicolor are similar. The length of the inserts in the barley-specific FLcDNAs was 19 shorter (1,301 ± 897 bp) than in the other FLcDNAs (1,745 ± 851 bp). 20 Of the specific FLcDNAs, 263 contained known functional domains based on 21 InterProScan analysis, and 25 were nc transcripts; thus, 85% of the specific FLcDNAs 22 were not assigned to any putative function. GO data suggested that the relative 15 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 proportions of the genes related to “signal transducer activity” (GO:0004871) and 2 “enzyme regulator activity” (GO:0030234) were greater in the specific FLcDNAs than 3 in the total representative FLcDNAs (data not shown). This suggests that barley exhibits 4 species-specific regulatory networks involved in signal transduction, transcription, and 5 metabolism. 6 7 Comparison of FLcDNA sequences with published BAC sequences of barley 8 To evaluate the barley FLcDNAs dataset in terms of its ability to capture a large 9 number of active FLcDNAs, we mapped all of the FLcDNAs on publicly available 10 barley bacterial artificial chromosome (BAC) sequences. Taketa and colleagues reported 11 (Taketa et al. 2008) a Haruna BAC contig (AP009567) in which two genes were 12 predicted. BAG12385.1 encodes a putative iron-deficiency specific 4 protein, and 13 BAG12386.1 is a putative ethylene-responsive transcription factor responsible for the 14 nud (hull-less) phenotype. Even though the public FLcDNAs could not be mapped to 15 the predicted loci, multiple FLcDNAs determined in this study could be mapped at each 16 locus, which suggests that the structures of these genes are accurate (Supplemental Fig. 17 S4). We also mapped the FLcDNAs on BACs of a different cultivar, Morex. Haruna 18 Nijo is an established Japanese two-row malting cultivar, whereas Morex is a six-row 19 cultivar that is used as the American malting industry's standard cultivar. Because 20 Morex will provide the first reference sequence for the barley genome, there have been 21 increasing numbers of submissions of Morex sequences in the DNA Data Bank of Japan 22 (DDBJ; www.ddbj.nig.ac.jp/). To estimate the conservation of genes sequences between 16 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 these cultivars, we mapped the FLcDNAs to publicly available Morex BAC sequences 2 using EST2genome (Mott 1997). We found that 373 FLcDNAs mapped to genomic 3 sequences from 113 of 181 BACs. Forty-five FLcDNAs (31 representatives) mapped to 4 more than one locus. For example, NIASHv1123K08 was mapped to four loci in three 5 BACs (one locus each of AY758233.1 and DQ249273.1 and two loci of AC239053.1). 6 These genes were highly conserved (99.4% identity on average, Fig. 4), and 61 of them 7 were completely identical. The results of this analysis indicate that the Haruna Nijo 8 FLcDNAs can be utilized by IBSC to annotate the Morex genome. 9 17 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 Discussion 2 3 Scientists have recently recognized that FLcDNAs are indispensable resources for 4 gene identification and annotation. As the barley genome sequencing project progresses, 5 obtaining information about barley FLcDNAs has become increasingly important. 6 There is a considerable amount of Triticeae (barley and wheat) transcript data in the 7 public domain; however, the number of barley FLcDNAs is insufficient compared with 8 the number available for other crops, such as O. sativa, Z. mays, and G. max (Kikuchi et 9 al. 2003; Umezawa et al. 2008; Soderlund et al.2009). 10 The sequences developed in this study represent the largest collection of Triticeae 11 FLcDNA data produced to date. To comprehensively cover the entire barley 12 transcriptome with FLcDNA clones, we constructed novel libraries from Haruna Nijo 13 from 12 different developmental stages, organs and/or stressed conditions and isolated 14 more than 170,000 clones (see Supplemental information). To identify the source of 15 clones in the pooled library constructions, tagged sub-libraries were used. 16 The ESTs we identified were clustered, and representative cDNA clones were 17 selected for full sequencing. The quality of the FLcDNAs obtained in this study was 18 evaluated by comparing their average sequence length (1,701bp) with that of publicly 19 available barley and rice FLcDNAs. A recent report in which the average length of 20 27,455 maize FLcDNAs was found to be 1,442 bp (Soderlund et al.2009) also supports 21 the length quality of the FLcDNAs in the present study. The method of FLcDNA 22 construction used here was the CAP-trapper method (Carninci et al. 1996), which has 18 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 been used in FLcDNA projects in mice, Arabidopsis (Seki et al. 2002; Seki et al. 2004), 2 rice (Kikuchi et al. 2003), and soybean (Umezawa et al. 2008). Because this 3 methodology is based on the selection of mRNAs with an existing cap structure, it 4 provides high coverage of mRNA structures. 5 The 22,651 representative barley FLcDNAs described here were constructed 6 using the clones produced in this study and publicly available barley FLcDNAs. 7 Functional categorization of the Haruna Nijo FLcDNAs based on ORF prediction and 8 GO assignments produced results similar to those found for rice FLcDNAs. This could 9 indicate conservation of the entire gene sets between barley and rice. We also detected 10 28 nc genes. Only 4 out of the 28 nc transcripts were included in the NAT pairs, 11 suggesting that NATs are not the sole reason underlying the observed the nc transcripts. 12 Comparative analysis of the Uni-FLcDNAs against public data indicated the 13 importance of this dataset. First, 17,773 FLcDNAs were found to be novel in barley, 14 and 3,278 cDNAs from the Uni-FLcDNAs exhibited no homology to public EST or 15 mRNA data from barley or wheat. We consider it unlikely that this paucity of 16 homologous sequences resulted from natural variation among Haruna Nijo and other 17 barley varieties, because 1,974 of the 3,278 novel FLcDNAs had common homologs in 18 all four grass species examined (i.e., O. sativa, Z. mays, S. bicolor, and B. distachyon). 19 As full-length sequences are available for these four grass species, these 1,974 genes 20 might be structurally conserved and functionally active in grasses in general. These 21 findings should help us to assign gene functions to the novel barley FLcDNAs. 22 Second, we detected 1,699 FLcDNAs that showed no homology to any of the four 19 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 grass genomes. Even though the mean insert length of these FLcDNAs was shorter than 2 that of the other cDNAs, these 1,699 FLcDNAs still have the capacity to encode 3 functional proteins. We therefore concluded that these genes are Triticeae-specific (or at 4 least Hordeum-specific) sequences. Unfortunately, only 263 specific FLcDNAs could 5 be assigned putative functions based on InterPro domains, and the gene functions of the 6 other genes are unknown. The complete gene structure information and the FLcDNA 7 clones of the barley-specific genes can be used in future experimental studies, such as 8 for overexpression of recombinant proteins or microarray analyses, to reveal the 9 functions of these genes. We note that the 20,952 FLcDNAs with homologs in all four 10 grass genomes might still include some barley-specific genes, because the coding 11 potential of their mapped regions has not been verified; additionally, there were cases 12 where FLcDNAs mapped to non-genic regions in the latest annotated genome data. 13 The barley cultivar Haruna Nijo, which was used here as the source of the cDNA 14 libraries generated, was released as a malting barley variety in 1981 and has been 15 intensively used in the pedigree of Japanese malting barleys because of its excellent 16 quality profiles for brewing. Additionally, several genetic/genomic resources have been 17 established for this cultivar. More than 140,000 ESTs have been sequenced, and the 18 positions of more than 2,890 of these have been located in genetic maps (Sato et al. 19 2009). A BAC library has been constructed (Saisho et al. 2007), and some of these BAC 20 clones have been beneficial for map-based cloning of trait genes in barley (Taketa et al. 21 2008). These resources have made Haruna Nijo a useful variety for investigating barley 22 genetics and genomics (see http://www.shigen.nig.ac.jp/barley/index.html). 20 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 The IBSC is currently sequencing the complete genome of the Morex cultivar. 2 Mapping of all of the Haruna Nijo FLcDNAs to genome sequences from Morex BACs 3 clearly showed that the Haruna Nijo FLcDNAs can serve as good resources for the 4 genome annotation of Morex. Based on the density of the mapped FLcDNAs (i.e., 48.8 5 kb per gene; 277 loci mapped to 181 BAC sequences, covering 13.52 Mb), we 6 estimated that the total gene number in the barley genome is approximately 100,000. 7 This estimation is much larger than an estimation presented in a previous report (Mayer 8 et al. 2009) of 38,000 to 48,000 genes in the barley genome (i.e., 104 to 132 kb per gene 9 on average). It is likely that we have overestimated the number of genes, because the 10 BAC clones used for genome sequencing come from gene-rich regions targeted for the 11 purpose of map-based cloning or the investigation of genic regions. Based on the 12 previously estimated gene number (Mayer et al. 2009), our Hruna Nijo dataset of 13 22,651 nr FLcDNAs could represent 47% to 59% of the total number of genes present 14 in barley. The fact that 54% to 70% of the rice genes predicted by RAP have been 15 validated by rice FLcDNAs indicates that our dataset is similarly comprehensive. Not 16 all genes present in barley are expected to have been expressed in the 12 conditions 17 examined in the present study, so we suggest that the proportion of active genes 18 captured could be much greater than 47% to 59%. 19 21 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 Conclusion 2 We cloned more than 170,000 Haruna Nijo barley FLcDNAs, and identified more 3 than 24,000 complete FLcDNAs. The final set of 22,651 representative FLcDNAs 4 obtained will be a very useful resource for future studies of barley, as well as for the 5 annotation of barley genomic sequences in the ongoing IBSC genome sequencing 6 project. These data will also support the future wheat genome sequencing project 7 currently being organized by the International Wheat Genome Sequencing Consortium 8 (IWGSC, http://www.wheatgenome.org/). 9 10 All FLcDNA data obtained in this study are available from our full length Barley cDNA Database (http://barleyflc.dna.affrc.go.jp/hvdb/index.html). 11 22 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 Materials and Methods 2 3 Full-sequencing of representative clones 4 For each representative clone determined after clustering the end sequences of 5 clones, both the 5 and the 3 ends were resequenced using the Big Dye Terminator v3.1 6 Cycle Sequencing Kit and then analyzed with an ABI 3730xl DNA sequencer (Applied 7 Biosystems Inc., Foster City, CA, USA) to confirm the clone-sequence identity. Next, 8 the internal regions were sequenced using a primer-walking method until the sequences 9 from opposite directions overlapped. We designed the sequencing strategy so that at 10 least two reads covered every part of the insert region of each cDNA clone to assure 11 sequence quality. Finally, we assembled these reads to create a consensus sequence for 12 each clone. ′ ′ 13 14 Removing redundancy 15 All-against-all blastn searches (Altschul et al. 1990) were conducted among the 16 barley FLcDNA sequences to detect redundant sequences. When an FLcDNA was 17 similar to another FLcDNA with 90% identity and 95% coverage, the FLcDNAs 18 were clustered. If two FLcDNAs that did not overlap were in the same cluster, we 19 considered that a fused transcript that bridged the two FLcDNAs was contained in this 20 cluster. In this case, the cDNA was discarded and the cluster was separated into two 21 clusters. ≥ ≥ 22 23 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 Prediction of open reading frames and functional annotation 2 Representative ORFs were predicted by blastx searches against the RefSeq and 3 UniProtKB databases with a positive match cutoff set at an E-value less than 1 × 10−20. 4 If there were no homologous proteins in the databases, the longest ORFs (>70 amino 5 acids in length) were assigned as predicted ORFs. If no ORFs were predicted, for an 6 FLcDNA, it was defined as nc FLcDNA. nc FLcDNAs were compared with rice 7 microRNAs 8 Griffiths-Jones et al. 2008) using blastn. To assign a gene function on the basis of 9 conserved domains or motifs, InterProScan searches (Zdobnov and Apweiler 2001; 10 Hunter et al. 2009) were conducted using the predicted ORFs. Based on the results of 11 the InterProScan searches, GOs categories were assigned to each predicted ORF 12 (Barrell et al. 2009). The GOs assignments were categorized using map2slim software 13 (http://www.geneontology.org/GO.slims.shtml#whatIs). (miRNAs) downloaded from miRBase (http://www.mirbase.org/; 14 15 Comparison of FLcDNA sequences with complete genome sequences from other 16 crop species 17 To conduct genome-wide comparisons of the FLcDNA sequences, genome 18 sequences and annotation data from four completely sequenced crop plants were 19 obtained at the following sites: the Rice Annotation Project Database (RAP-DB; 20 http://rapdb.dna.affrc.go.jp/) 21 (http://www.maizesequence.org/index.html) 22 (http://genome.jgi-psf.org/Sorbi1/Sorbi1.home.html) for S. bicolor, and BrachyBase for O. sativa, for the MaizeSequence Z. mays, 24 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. the database JGI 1 (http://www.brachybase.org/) for B. distachyon. The FLcDNA sequences were mapped 2 onto the genome sequences using blastn search at an E-value of less than 1 × 10-5. 3 4 Detection of novel barley FLcDNAs 5 mRNAs and ESTs from barley and wheat were downloaded from DDBJ. The 6 poly(A) sequences were then trimmed, and repetitive regions were masked using 7 RepeatMasker software (http://repeatmasker.org). We conducted blastn searches of the 8 sequences against all barley FLcDNAs with the threshold of a positive match set at an 9 identity 90% and a coverage 90%. Because the dataset of barley FLcDNAs contained 10 redundant sequences, multiple hits for the query mRNA or EST sequences were 11 accepted. ≥ ≥ 12 13 Comparison with rice trait genes 14 A 15 (http://www.shigen.nig.ac.jp/rice/oryzabase/genes/genesTop.jsp). Of 4,124 trait genes, 16 881 genes had RAP-DB transcript information. The amino acid sequences of these 17 genes were used as queries for blastp analyses with the predicted ORFs of the 18 Uni-FLcDNAs. list of rice trait genes was downloaded from Oryzabase 19 20 Comparison with barley BAC sequences To compare the FLcDNAs with BAC sequences, one Haruna Nijo (Taketa et al. 21 22 2008) and 181 Morex BAC sequences were downloaded 25 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. from the 1 DDBJ/EMBL/GenBank. The Morex BACs were downloaded using the keywords 2 “Morex” and “BAC”. After masking repeat sequences using RepeatMasker and the 3 libraries 4 (http://mips.helmholtz-muenchen.de/plant/genomes.jsp) (Spannagl et al. 2007) and 5 Triticeae 6 http://wheat.pw.usda.gov/ITMI/Repeats/), the FLcDNAs were mapped onto the BACs 7 using the same method as was used for RAP annotation (Tanaka et al. 2008). The 8 FLcDNAs were first mapped using blastn to determine the possible mapping regions in 9 the genomic sequences and further mapped using EST2genome (Mott 1997) with 10 thresholds of 95% identity and 90% coverage. When two or more FLcDNAs were 11 mapped to the same locus, the longest FLcDNA was used for further analysis. of Repeat ≥ MIPS Repeat Sequence Element Database release Database 10 4.3 (TREP, ≥ 12 13 14 15 Sequence publication in the public domain All of the sequence data for the representative FLcDNAs have been submitted to the DDBJ (AK353559-AK377172). 16 26 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 Acknowledgments 2 3 The authors are grateful to Sapporo Breweries Co. Ltd for providing the 4 germinating barley seeds and to Dr. J. F. Ma from Okayama University for experimental 5 assistance. Published barley FLcDNA information on was provided by the National 6 Bio-Resource Project of the MEXT, Japan. 7 27 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 Literature cited 2 3 Alexandrov NN, Troukhan ME, Brover VV, Tatarinova T, Flavell RB and 4 Feldmann KA (2006) Features of Arabidopsis genes and genome discovered using 5 full-length cDNAs. Plant Mol Biol 60:69-85 6 Altschul SF, Gish W, Miller W, Myers EW and Lipman DJ (1990) Basic local 7 alignment search tool. J Mol Biol 215:403-410 8 Aoki K, Yano K, Suzuki A, Kawamura S, Sakurai N, Suda K, Kurabayashi A, 9 Suzuki T, Tsugane T, Watanabe M, et al (2010) Large-scale analysis of full-length 10 cDNAs from the tomato (Solanum lycopersicum) cultivar Micro-Tom, a reference 11 system for the Solanaceae genomics. BMC Genomics 11:210 12 Barrell D, Dimmer E, Huntley R P, Binns D, O'Donovan C and Apweiler R (2009) 13 The GOA database in 2009--an integrated Gene Ontology Annotation resource. Nucleic 14 Acids Res 37:D396-D403 15 Bennetzen JL, Coleman C, Liu R, Ma J and Ramakrishna W (2004) Consistent 16 over-estimation of gene number in complex plant genomes. Curr Opin Plant Biol 17 7:732-736 18 Carninci P, Kvam C, Kitamura A, Ohsumi T, Okazaki Y, Itoh M, Kamiya M, 19 Shibata K, Sasaki N, Izawa M, et al (1996) High-efficiency full-length cDNA cloning 20 by biotinylated CAP trapper. Genomics 37:327-336 21 Close TJ, Wanamaker SI, Caldo RA, Turner SM, Ashlock DA, Dickerson JA, 22 Wing RA, Muehlbauer GJ, Kleinhofs A and Wise RP (2004) A New Resource for 28 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 Cereal Genomics: 22K Barley GeneChip Comes of Age. Plant Physiol. 134:960-968 2 Cruveiller S, Jabbari K, Clay O and Bernardi G (2004) Incorrectly predicted genes 3 in rice? Gene 333:187-188 4 Druka A, Muehlbauer G, Druka I, Caldo R, Baumann U, Rostoks N, Schreiber A, 5 Wise R, Close T, Kleinhofs A, et al (2006) An atlas of gene expression from seed to 6 seed through barley development. Functional & Integrative Genomics 6:202-211 7 Duvick J, Fu A, Muppirala U, Sabharwal M, Wilkerson MD, Lawrence CJ, 8 Lushbough C and Brendel V (2008) PlantGDB: a resource for comparative plant 9 genomics. Nucl. Acids Res. 36:D959-D965 10 Griffiths-Jones S, Saini HK, van Dongen S and Enright AJ (2008) miRBase: tools 11 for microRNA genomics. Nucleic Acids Res 36:D154-D158 12 Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das 13 U, Daugherty L, Duquenne L, et al (2009) InterPro: the integrative protein signature 14 database. Nucleic Acids Res 37:D211-D215 15 Imanishi T, Itoh T, Suzuki Y, O'Donovan C, Fukuchi S, Koyanagi KO, Barrero R 16 A, Tamura T, Yamaguchi-Kabata Y, Tanino M, et al (2004) Integrative annotation 17 of 21,037 human genes validated by full-length cDNA clones. PLoS Biol. 2:0859-0875 18 International Rice Genome Sequencing Project (2005) The map-based sequence of 19 the rice genome. Nature 436:793-800 20 Itoh T, Tanaka T, Barrero RA, Yamasaki C, Fujii Y, Hilton PB, Antonio BA, 21 Aono H, Apweiler R, Bruskiewich R, et al (2007) Curated genome annotation of 22 Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana. 29 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 Genome Res 17:175-183 2 Jabbari K, Cruveiller S, Clay O, Le Saux J and Bernardi G (2004) The new genes 3 of rice: a closer look. Trends in Plant Science 9:281 4 Kawai J, Shinagawa A, Shibata K, Yoshino M, Itoh M, Ishii Y, Arakawa T, Hara 5 A, Fukunishi Y, Konno H, et al (2001) Functional annotation of a full-length mouse 6 cDNA collection. Nature 409:685-690 7 Kawaura K, Mochida K, Enju A, Totoki Y, Toyoda A, Sakaki Y, Kai C, Kawai J, 8 Hayashizaki Y, Seki M, et al (2009) Assessment of adaptive evolution between wheat 9 and rice as deduced from full-length common wheat cDNA sequence data and 10 expression patterns. BMC Genomics 10:271 11 Kikuchi S, Satoh K, Nagata T, Kawagashira N, Doi K, Kishimoto N, Yazaki J, 12 Ishikawa M, Yamada H, Ooka H, et al (2003) Collection, Mapping, and Annotation 13 of Over 28,000 cDNA Clones from japonica Rice. Science 301:376-379 14 Koo BC, Bushman BS and Mott IW (2008) Transcripts Associated with 15 Non-Acclimated Freezing Response in Two Barley Cultivars. The Plant Genome 16 1:21-32 17 Kurata N and Yamazaki Y (2006) Oryzabase. An integrated biological and genome 18 information database for rice. Plant Physiol. 140:12-17 19 Marin-Rodriguez MC, Orchard J and Seymour GB (2002) Pectate lyases, cell wall 20 degradation and fruit softening. J. Exp. Bot. 53:2115-2119 21 Maruyama K and Sugano S (1994) Oligo-capping: a simple method to replace the cap 22 structure of eukaryotic mRNAs with oligoribonucleotides. Gene 138:171-174 30 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 Mayer KFX, Taudien S, Martis M, Simkova H, Suchankova P, Gundlach H, 2 Wicker T, Petzold A, Felder M, Steuernagel B, et al (2009) Gene Content and 3 Virtual Gene Order of Barley Chromosome 1H. Plant Physiol. 151:496-505 4 Mochida K, Saisho D, Yoshida T, Sakurai T and Shinozaki K (2008) TriMEDB: A 5 database to integrate transcribed markers and facilitate genetic studies of the tribe 6 Triticeae. BMC Plant Biology 8:72 7 Mott R (1997) EST_GENOME: a program to align spliced DNA sequences to 8 unspliced genomic DNA. Comput Appl Biosci. 13:477-478 9 Ogihara Y, Mochida K, Kawaura K, Murai K, Seki M, Kamiya A, Shinozaki K, 10 Carninci P, Hayashizaki Y, Shin-I T, et al (2004) Construction of a full-length cDNA 11 library from young spikelets of hexaploid wheat and its characterization by large-scale 12 sequencing of expressed sequence tags. Genes & Genetic Systems 79:227-232 13 Osato N, Yamada H, Satoh K, Ooka H, Yamamoto M, Suzuki K, Kawai J, 14 Carninci P, Ohtomo Y, Murakami K, et al (2003) Antisense transcripts with rice 15 full-length cDNAs. Genome Biol 5:R5 16 Ota T, Suzuki Y, Nishikawa T, Otsuki T, Sugiyama T, Irie R, Wakamatsu A, 17 Hayashi K, Sato H, Nagai K, et al (2004) Complete sequencing and characterization 18 of 21,243 full-length human cDNAs. Nat Genet 36:40-45 19 Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, 20 Haberer G, Hellsten U, Mitros T, Poliakov A, et al (2009) The Sorghum bicolor 21 genome and the diversification of grasses. Nature 457:551-556 22 Pruitt KD, Tatusova T, Klimke W and Maglott DR (2009) NCBI Reference 31 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 Sequences: current status, policy and new initiatives. Nucleic Acids Res 37:D32-D36 2 Saisho D, Myoraku E, Kawasaki S, Sato K and Takeda K (2007) Construction and 3 Characterization of a Bacterial Artificial Chromosome (BAC) Library from the 4 Japanese Malting Barley variety Haruna Nijo. Breeding Science 57:29-38 5 Sato K, Nankaku N and Takeda K (2009) A high-density transcript linkage map of 6 barley derived from a single population. Heredity 103:110-117 7 Sato K, Shin I T, Seki M, Shinozaki K, Yoshida H, Takeda K, Yamazaki Y, Conte 8 M and Kohara Y (2009) Development of 5006 full-length CDNAs in barley: a tool for 9 accessing cereal genomics resources. DNA Res 16:81-89 10 Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, Liang C, Zhang J, 11 Fulton L, Graves TA, et al (2009) The B73 maize genome: complexity, diversity, and 12 dynamics. Science 326:1112-1115 13 Schulte D, Close TJ, Graner A, Langridge P, Matsumoto T, Muehlbauer G, Sato K, 14 Schulman AH, Waugh R, Wise RP, et al (2009) The International Barley Sequencing 15 Consortium--At the Threshold of Efficient Access to the Barley Genome. Plant Physiol. 16 149:142-147 17 Seki M, Narusaka M, Kamiya A, Ishida J, Satou M, Sakurai T, Nakajima M, Enju 18 A, Akiyama K, Oono Y, et al (2002) Functional Annotation of a Full-Length 19 Arabidopsis cDNA Collection. Science 296:141-145 20 Seki M, Satou M, Sakurai T, Akiyama K, Iida K, Ishida J, Nakajima M, Enju A, 21 Narusaka M, Fujita M, et al (2004) RIKEN Arabidopsis full-length (RAFL) cDNA 22 and its applications for expression profiling under abiotic stress conditions. J. Exp. Bot. 32 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 55:213-223 2 Soderlund C, Descour A, Kudrna D, Bomhoff M, Boyd L, Currie J, Angelova A, 3 Collura K, Wissotski M, Ashley E, et al (2009) Sequencing, Mapping, and Analysis 4 of 27,455 Maize Full-Length cDNAs. PLoS Genet 5:e1000740 5 Spannagl M, Noubibou O, Haase D, Yang L, Gundlach H, Hindemitt T, Klee K, 6 Haberer G, Schoof H and Mayer KF (2007) MIPSPlantsDB--plant database resource 7 for integrative and comparative plant genome research. Nucleic Acids Res 8 35:D834-D840 9 Sreenivasulu N, Radchuk V, Strickert M, Miersch O, Weschke W and Wobus U 10 (2006) Gene expression patterns reveal tissue-specific signaling networks controlling 11 programmed cell death and ABA-regulated maturation in developing barley seeds. The 12 Plant Journal 47:310-327 13 Sreenivasulu N, Usadel B, Winter A, Radchuk V, Scholz U, Stein N, Weschke W, 14 Strickert M, Close TJ, Stitt M, et al (2008) Barley Grain Maturation and 15 Germination: Metabolic Pathway and Regulatory Network Commonalities and 16 Differences Highlighted by New MapMan/PageMan Profiling Tools. Plant Physiol. 17 146:1738-1758 18 Sreenivasulu N and Wobus U (2008) Barley Genomics: An Overview. International 19 Journal of Plant Genomics 2008 20 Suzuki Y, Yoshitomo-Nakagawa K, Maruyama K, Suyama A and Sugano S (1997) 21 Construction and characterization of a full length-enriched and a 5'-end-enriched cDNA 22 library. Gene 200:149-156 33 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 Taketa S, Amano S, Tsujino Y, Sato T, Saisho D, Kakeda K, Nomura M, Suzuki T, 2 Matsumoto T, Sato K, et al (2008) Barley grain with adhering hulls is controlled by an 3 ERF family transcription factor gene regulating a lipid biosynthesis pathway. 4 Proceedings of the National Academy of Sciences 105:4062-4067 5 Talame V, Ozturk NZ, Bohnert HJ and Tuberosa R (2007) Barley transcript profiles 6 under dehydration shock and drought stress treatments: a comparative analysis. J. Exp. 7 Bot. 58:229-240 8 Tanaka T, Antonio B A, Kikuchi S, Matsumoto T, Nagamura Y, Numa H, Sakai H, 9 Wu J, Itoh T, Sasaki T, et al (2008) The Rice Annotation Project Database 10 (RAP-DB): 2008 update. Nucleic Acids Res 36:D1028-33 11 The International Brachypodium Initiative (2010) Genome sequencing and analysis 12 of the model grass Brachypodium distachyon. Nature 463:763-768 13 The UniProt Consortium (2009) The Universal Protein Resource (UniProt) 2009. 14 Nucleic Acids Res 37:D169-D174 15 Tommasini L, Svensson J, Rodriguez E, Wahid A, Malatrasi M, Kato K, 16 Wanamaker S, Resnik J and Close T (2008) Dehydrin gene expression provides an 17 indicator of low temperature and drought stress: transcriptome-based analysis of Barley 18 ( Hordeum vulgare L.). Functional & Integrative Genomics 8:387-405 19 Ueda A, Kathiresan A, Bennett J and Takabe T (2006) Comparative transcriptome 20 analyses of barley and rice under salt stress. TAG Theoretical and Applied Genetics 21 112:1286-1294 22 Umezawa T, Sakurai T, Totoki Y, Toyoda A, Seki M, Ishiwata A, Akiyama K, 34 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 Kurotani A, Yoshida T, Mochida K, et al (2008) Sequencing and Analysis of 2 Approximately 40 000 Soybean cDNA Clones from a Full-Length-Enriched cDNA 3 Library. DNA Res 15:333-346 4 Varshney RK, Langridge P, Graner A and Jeffery CH (2007) Application of 5 Genomics to Molecular Breeding of Wheat and Barley. Advances in Genetics Volume 6 58:121-155 7 Walia H, Wilson C, Wahid A, Condamine P, Cui X and Close T (2006) Expression 8 analysis of barley (Hordeum vulgare L.) during salinity stress. Functional & Integrative 9 Genomics 6:143-156 10 Wang XJ, Gaasterland T and Chua NH (2005) Genome-wide prediction and 11 identification of cis-natural antisense transcripts in Arabidopsis thaliana. Genome Biol 12 6:R30 13 Wang J, Shi ZY, Wan XS, Sheu GZ and Zhang JL (2008) The expression pattern of 14 a rice proteinase inhibitor gene OsP18-1 implies its role in plant development. J Plant 15 Physiol 165:1519-1529 16 Zdobnov E M and Apweiler R (2001) InterProScan--an integration platform for the 17 signature-recognition methods in InterPro. Bioinformatics 17:847-848 18 35 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 Figure Legends 2 Figure 1. 3 study Determination of the representative barley FLcDNAs obtained in this 4 5 Figure 2. Distribution of nucleotide lengths of known and novel FLcDNAs 6 Size distributions of FLcDNAs with a high homology to known grass genes (white bar) 7 and with no homology to known gene s (black bars). 8 9 Figure 3. GO categorization of known and novel FLcDNAs 10 Functional motif/domains were detected and categorized using GO criteria. Results for 11 the representative FLcDNAs with a high homology to known grass genes (Known) and 12 with no homology to known genes (Novel). 13 14 Figure 4. Distribution of sequence identities of mapped Haruna Nijo FLcDNAs to 15 published Morex BAC sequences 16 36 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 Tables 2 Table I. Comparison of the completeness of the ORFs in the barley FLcDNAs 3 NIAS Complete ORF Public Representative 20,055 4,279 19,335 5'-truncated ORF 3,404 685 3,181 3'-truncated ORF 100 11 79 5’-, 3’-truncated ORFa 29 4 28 Non-protein codingb 26 20 28 23,614 4,999 22,651 Total 4 a 5 proteins or longest ORFs whose lengths were ≥70 amino acids. ORF had neither start nor stop codon. b FLcDNAs had no ORFs that were similar to known 6 37 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved. 1 Table II. The ten longest FLcDNAs in the Uni-FLcDNA library 2 Nucleotide Amino acid length Clone name Gene function length (bp) (AA) NIASHv2010H17 7,384 460 Mitogen-activated protein kinase NIASHv3115C23 7,155 2,230 Acetyl-coenzyme A carboxylase NIASHv2090B16 7,012 90 Cycloeucalenol cycloisomerase FLbaf76j08 6,765 98 Hypothetical protein NIASHv2019K03 6,637 1,731 Chromatin remodeling complex subunit NIASHv2035I12 6,311 227 Conserved hypothetical protein NIASHv2017F18 6,183 152 KN1-type homeobox transcription factor NIASHv1031N03 6,072 33 Hypothetical protein NIASHv2125A18 6,037 1,859 DNA-directed RNA polymerase NIASHv3085A02 6,036 1,792 Callose synthase 1 catalytic subunit 3 38 Downloaded from on June 18, 2017 - Published by www.plantphysiol.org Copyright © 2011 American Society of Plant Biologists. All rights reserved.
© Copyright 2025 Paperzz