DNA RESEARCH 4, 291-300 (1997)
Short Communication

Structural Analysis of Arabidopsis thaliana Chromosome 5. II. Sequence Features of the Regions of 1,044,062 bp Covered by Thirteen Physically Assigned P1 Clones

Hirokazu KOTANI, Yasukazu NAKAMURA, Shusei SATO, Takakazu KANEKO, Erika ASAMIZU, Nobuyuki MIYAJIMA, and Satoshi TABATA*

Kazusa DNA Research Institute, 1532-3 Yana, Kisarazu, Chiba 292, Japan

(Received 20 July 1997)

Abstract

A total of 13 P1 clones, each containing a marker(s) specifically mapped on chromosome 5, were isolated from a P1 library of the Arabidopsis thaliana Columbia genome. Their nucleotide sequences were determined according to the shotgun-based strategy and precisely located on the physical map of chromosome 5. The total length of the sequenced regions was 1,044,062 bp. Since we have previously reported the sequence of 1,621,245 bp by analysis of 20 non-redundant P1 clones, the total length of the sequences of chromosome 5 determined so far has reached 2,665,307 bp. The regions sequenced in this study were analysed by comparison with the sequences in protein and EST databases and by analysis with computer programs for gene modeling; a total of 225 potential protein-coding genes and/or gene segments with known or predicted functions were identified. The positions of exons which do not exhibit similarity to known genes were also predicted by computer-aided analysis. The average density of the genes and/or gene segments was 1 gene/4,640 bp. Introns were identified in approximately 84% of the potential genes, and the average number and length of the introns per gene were 5.3 and 184 bp, respectively. These sequence features are essentially identical to those for the previously sequenced regions. The transcription level of the predicted genes has been roughly monitored by counting the numbers of matched Arabidopsis ESTs. The sequence data and gene information are available through the World Wide Web at http://www.kazusa.or.jp/arabi/.
Key words: Arabidopsis thaliana chromosome 5; genomic sequence; P1 genomic library; gene prediction

With the final objective of understanding the entire genetic system in higher plants, we initiated large-scale sequencing of the Arabidopsis thaliana genome, which consists of five chromosomes totalling an estimated 130 Mb. In the initial phase of the project, we focused on chromosome 5, in line with the international agreement of the Arabidopsis Genome Initiative.1 We screened chromosome 5-specific clones from a P1 library, from which a contig map was constructed (manuscript in preparation), and started sequence analysis of the physically assigned P1 clones. We previously reported sequence features of the 1,621,245 bp regions covered by 20 non-redundant P1 clones.2 We have now completed sequence determination of 13 additional P1 clones which have been localized on chromosome 5. In this paper, we describe the gene organization and the structural and functional information, deduced by computer-aided analysis, of the genes which likely reside in the sequenced regions.

Communicated by Mituru Takanami
* To whom correspondence should be addressed. Tel. +81-438-52-3933, Fax. +81-438-52-3934, E-mail: [email protected]

1. Isolation and Sequencing of P1 Clones

DNA sources and the method of clone isolation were essentially the same as described in the previous paper.2 P1 clones containing the DNA regions corresponding to 13 DNA markers on chromosome 5 were isolated by screening the Mitsui P1 library3 by PCR with primers designed on the basis of the marker sequences. The DNA markers used and the selected clones are: m217 (MHF15), nga249 (MAH20), CHS (MSH12), mi438 (MVA3), mi433 (MDJ22), CIC4D4R (MYJ24), CIC4D4 (MOP9), BELL1 (MYC6), mi83 (MRH10), mi61 (MCL19), and CIC10H1 (MAF19). MPO12 and MTH12 were directly isolated as clones showing restriction fragment length polymorphism (RFLP) when used as probes for genomic Southern hybridization (manuscript in preparation). The relative positions of the markers and the sequenced clones on chromosome 5 are shown in Fig. 1. The relative orientation of each clone on the chromosome is not yet known.

The nucleotide sequence of each P1 insert was determined according to the bridging shotgun method described previously.2 The finally confirmed length of the nucleotide sequence of each P1 insert is indicated at the top of Fig. 2. The total length of the DNA regions sequenced in this study was 1,044,062 bp. Since we have previously reported the sequences of 1,621,245 bp covered by 20 non-redundant P1 clones,2 the total length of the sequences of chromosome 5 determined is now up to 2,665,307 bp.

Figure 1. Relative locations of the sequenced P1 clones and the associated markers on the physical map of chromosome 5. Positions of DNA markers used for P1 isolation and of other major DNA markers were mapped based on the YAC tiling path and on map information in ref. 12 and Sato et al. (manuscript in preparation). The vertical box represents the entire length of chromosome 5. Names of P1 clones are given at the right side, and those of markers at the left side. The distance (Mbp) from the telomeric site of the top arm is given in the vertical scale.

2. Assignment of the Potential Coding Regions

Assignment of potential protein coding regions and gene modeling were performed by a combination of similarity search and computer prediction as described in the previous paper.2 Briefly, similarity search was first carried out using the BLASTP program4 against the non-redundant protein sequence database, owl (release 29). The identified exons were integrated into the gene models constructed with the help of computer programs: Grail,5 FEXA in GeneFinder,6 ER (Murakami K., personal communication), ASPL in GeneFinder,6 GENSCAN7 and NetPlantGene,8 which predict either exon regions or exon-intron boundaries. The transcribed regions were assigned by comparison of the nucleotide sequences with Arabidopsis ESTs9,10 in the public databases.

The potential protein coding regions assigned were divided into the following three categories. A single exon or a region containing consecutive multiple exons showing similarity to a single reported gene throughout the alignment was assigned as a potential protein gene. These are denoted by the clone names followed by sequential numbers from one end of the insert to the other. A region which matched only portions of a reported gene, and a region which matched only Arabidopsis ESTs, were assigned as a potential exon(s) and a transcribed region, respectively. These regions were distinguished from the potential protein genes by adding "p" and "t", respectively, between the clone names and the sequential numbers in the identifiers. All the genes and gene portions assigned in each P1 clone according to the above procedure are listed in the table below the figure, and are also schematically represented in Fig. 2. In total, 120 potential protein genes, 62 potential exons, and 43 transcribed regions were assigned in the 1,044,062 bp regions. The average density of the genes in the three categories in the total of 2,665,307 bp, including the previously reported 1,621,245 bp sequences, is estimated to be 1 gene per 4,640 bp. However, the possibility remains that additional genes may be discovered in the intergenic regions in the future, since our prediction is based mainly on computer-assisted analysis.

RNA coding regions were assigned on the basis of sequence similarity to the reported structural RNAs. For tRNA genes, prediction by the tRNAscan-SE program11 was also taken into account. As indicated in Fig. 2, one tRNA gene was identified on the opposite strand of the fourth intron of a chloroplast triose phosphate translocator precursor gene in MCL19, and was denoted as mcl19r1.

3. Structural Features of the Potential Protein Genes

In the DNA regions sequenced in this and the previous paper,2 the structure was predicted for 259 potential protein genes, approximately 1.3% of the total gene constituents (20,000 genes) assumed for A. thaliana. Structural features of the potential protein genes deduced so far are listed in Table 1. Introns were identified in approximately 81% of the potential genes, and the average number of introns per gene was 4.5. The average length of the introns was 174 bp, which was consistent with the result obtained from analysis of 146 Arabidopsis genes registered in GenBank.8 It was noted that the average GC content of introns (32%) was significantly lower than that of exons (43%).

Table 1. Structural features of potential protein genes in A. thaliana chromosome 5

Features                        120 genes a)                 259 genes b)
Gene length including introns   194-11,377 bp (2,457 bp)     191-11,377 bp (2,138 bp)
Product length                  65-1,837 a.a. (496 a.a.)     64-1,837 a.a. (456 a.a.)
Genes with introns              101                          210
Number of introns/gene          0-42 (5.3)                   0-42 (4.5)
Exon length                     2-3,049 bp (240 bp)          2-4,026 bp (251 bp)
Intron length                   23-2,435 bp (184 bp)         23-2,435 bp (174 bp)
GC content of exons             44%                          43%
GC content of introns           32%                          32%

Structural features of the 120 potential protein genes assigned in this study a) and the 259 genes assigned so far b) are listed. Average values are shown in parentheses.

4. Expression Level of the Potential Protein Genes and Gene Segments

The number of matched Arabidopsis ESTs in the public DNA databases for each of the potential protein genes and gene segments was counted to monitor the transcriptional level of the genes. Of the 225 genes and gene segments that we have identified on chromosome 5 in this study, 114 carried matched ESTs. The putative products of the genes hit by 10 or more EST files, suggesting that they are highly expressed genes, include those showing sequence similarity to chloroplast triose phosphate translocator precursor and acid phosphatase precursor 1.

The sequence data as well as the gene information shown in this paper are available through the World Wide Web at http://www.kazusa.or.jp/arabi/.

Acknowledgments: We thank S. Sasamoto and K. Naruo for excellent technical assistance and the members of the DNA Sequencing Laboratory: T. Kimura, T. Hosouchi, K. Kawashima, M. Matsumoto, A. Matsuno, E. Mitsui, A. Muraki, N. Nakazaki, S. Okumura, S. Shinpo, C. Takeuchi, T. Wada, A. Watanabe, M. Yamada, M. Yasuda, and M. Yatabe for their excellent team work. We are grateful to A. Tanaka for technical advice, and to the Mitsui Plant Biotechnology Research Institute and the Arabidopsis Biological Resource Center at Ohio State University for providing the DNA markers and the DNA libraries. This work was supported by the Kazusa DNA Research Institute Foundation. We thank M. Takanami for his support and encouragement to perform this project.

References

1. Kaiser, J. 1996, First global sequencing effort begins, Science, 274, 30.
2. Sato, S., Kotani, H., Nakamura, Y. et al. 1997, Structural analysis of Arabidopsis thaliana chromosome 5. I. Sequence features of the 1.6 Mb regions covered by twenty physically assigned P1 clones, DNA Res., 4, 215-230.
3. Liu, Y.-G., Mitsukawa, N., Vazquez-Tello, A., and Whittier, R. F. 1995, Generation of a high-quality P1 library of Arabidopsis suitable for chromosome walking, Plant J., 7, 351-358.
4. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. 1990, Basic local alignment search tool, J. Mol. Biol., 215, 403-410.
5. Uberbacher, E. C. and Mural, R. J. 1991, Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach, Proc. Natl. Acad. Sci. USA, 88, 11261-11265.
6. Solovyev, V. V., Salamov, A. A., and Lawrence, C. B. 1994, Predicting internal exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames, Nucl. Acids Res., 22, 5156-5163.
7. Burge, C. and Karlin, S. 1997, Prediction of complete gene structures in human genomic DNA, J. Mol. Biol., 268, 78-94.
8. Hebsgaard, S. M., Korning, P. G., Tolstrup, N., Engelbrecht, J., Rouze, P., and Brunak, S. 1996, Splice site prediction in Arabidopsis thaliana DNA by combining local and global sequence information, Nucl. Acids Res., 24, 3439-3452.
9. Newman, T., Bruijn, F. J., and Green, P. 1994, Genes galore: A summary of methods for accessing results from large-scale partial sequencing of anonymous Arabidopsis cDNA clones, Plant Physiol., 106, 1241-1255.
10. Cooke, R., Raynal, M., Laudie, M. et al. 1996, Further progress towards a catalogue of all Arabidopsis genes: analysis of a set of 5000 non-redundant ESTs, Plant J., 9, 101-124.
11. Lowe, T. M. and Eddy, S. R. 1997, tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence, Nucl. Acids Res., 25, 955-964.
12. Schmidt, R., Love, K., West, J. et al. 1997, Description of 31 YAC contigs spanning the majority of Arabidopsis thaliana chromosome 5, Plant J., 11, 563-572.
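The totals and percentages quoted in the text can be cross-checked with a few lines of arithmetic. The sketch below is not part of the original analysis: it takes the insert lengths printed at the top of each Fig. 2 panel and the counts given in the text and in Table 1, and all variable names are illustrative. Note that the combined-region feature count is not stated in the text, so the density is recomputed only for the regions sequenced in this study.

```python
# Insert lengths (bp) as printed at the top of each Fig. 2 panel.
INSERT_LENGTHS_BP = {
    "MAF19": 78379, "MAH20": 80970, "MCL19": 84510, "MDJ22": 77363,
    "MHF15": 83865, "MOP9": 84194, "MPO12": 86263, "MRH10": 71522,
    "MSH12": 79259, "MTH12": 74877, "MVA3": 81701, "MYC6": 82315,
    "MYJ24": 78844,
}

# Figures quoted in the text.
THIS_STUDY_BP = 1_044_062
PREVIOUS_STUDY_BP = 1_621_245
FEATURE_COUNTS = {
    "potential protein genes": 120,
    "potential exons": 62,
    "transcribed regions": 43,
}

# Sum of the 13 insert lengths and of the three feature categories.
total_insert_bp = sum(INSERT_LENGTHS_BP.values())
total_features = sum(FEATURE_COUNTS.values())

# Cumulative sequence length and bp per assigned feature in this study.
combined_bp = THIS_STUDY_BP + PREVIOUS_STUDY_BP
bp_per_feature = THIS_STUDY_BP / total_features

# Fraction of potential protein genes with introns (Table 1).
intron_pct_this_study = 100 * 101 / 120
intron_pct_so_far = 100 * 210 / 259

print(f"sum of insert lengths : {total_insert_bp:,} bp")
print(f"features in this study: {total_features}")
print(f"combined length       : {combined_bp:,} bp")
print(f"bp per feature        : {bp_per_feature:,.0f}")
print(f"genes with introns    : {intron_pct_this_study:.0f}% / {intron_pct_so_far:.0f}%")
```

Under these inputs the insert lengths add up to the reported 1,044,062 bp, the three categories add up to 225, and dividing one by the other gives roughly one feature per 4,640 bp.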
[Figure 2 panels: MAF19 (78379 bp) and MAH20 (80970 bp). Each panel shows Grail exon, Protein db hit, EST db hit and Gene tracks for both strands, followed by tables of potential protein genes, potential exons and transcribed regions.]

Figure 2. Gene organization in the 13 P1 clones. Positions of the identified or predicted genes and gene segments in each insert of the P1 clones are schematically presented by color-coded boxes above (rightward) and below (leftward) the wide line in the middle, which represents the entire insert sequence. The insert length is given in parentheses together with the clone name at the top. Arrowheads indicate the directions of the DNA strands (5' to 3'). Dark and faint blue bars with numbers represent the positions of the identified potential protein genes and potential exons, respectively, and red bars represent the positions of structural RNA genes. Gray bars with numbers indicate the positions of the transcribed regions. The regions which showed similarity to the sequences in the protein database are shown by yellow, orange and red bars, which correspond to BLASTP scores of 70-100, 100-250, and 250 or more, respectively. The green bars indicate the positions of the potential exons predicted by the Grail program. Each of three different colors with increasing depth corresponds to the region with Grail scores of less than 70, 70-90, and 90 or more, respectively. The potential protein genes, the gene portions and the potential RNA genes assigned as described in the text are listed below each of the figures. The accession numbers are as follows: AB006696 (MAF19), AB006697 (MAH20), AB006698 (MCL19), AB006699 (MDJ22), AB006700 (MHF15), AB006701 (MOP9), AB006702 (MPO12), AB006703 (MRH10), AB006704 (MSH12), AB006705 (MTH12), AB006706 (MVA3), AB006707 (MYC6), and AB006708 (MYJ24).
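The score-to-color scheme in the Figure 2 caption amounts to a simple binning rule. The following is a minimal sketch of that rule, not the software used to draw the figure. The function names are hypothetical, the treatment of scores that fall exactly on a shared boundary (100, 250, 70, 90) is an assumption, since the caption's ranges share their endpoints, and the three green depths are represented only as levels 1 to 3.

```python
from typing import Optional


def blastp_bar_colour(score: float) -> Optional[str]:
    """Map a BLASTP score to the bar color named in the Fig. 2 caption.

    Assumption: a score on a shared boundary goes to the higher bin,
    in line with the caption's "250 or more". Scores below 70 are not
    described in the caption, so None is returned for them.
    """
    if score >= 250:
        return "red"
    if score >= 100:
        return "orange"
    if score >= 70:
        return "yellow"
    return None


def grail_green_depth(score: float) -> int:
    """Map a Grail score to one of three green depths (1 = faintest).

    Bins follow the caption: less than 70, 70-90, and 90 or more.
    """
    if score >= 90:
        return 3
    if score >= 70:
        return 2
    return 1


if __name__ == "__main__":
    for s in (65, 70, 120, 250):
        print(s, blastp_bar_colour(s), grail_green_depth(s))
```

Keeping the thresholds in descending order means each score is tested against the strictest bin first, so every score falls into exactly one bin.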
[Figure 2, continued. Panels: MCL19 (84510 bp), MDJ22 (77363 bp), MHF15 (83865 bp), MOP9 (84194 bp), MPO12 (86263 bp), MRH10 (71522 bp), MSH12 (79259 bp), MTH12 (74877 bp), MVA3 (81701 bp), MYC6 (82315 bp) and MYJ24 (78844 bp). Each panel shows Grail exon, Protein db hit, EST db hit and Gene tracks for both strands, followed by tables of potential protein genes, potential exons, transcribed regions and, for MCL19, potential RNA genes. Table columns: identifier, direction, position (5', 3'), exon, EST hit, length, accession, overlap, identity, definition and species.]
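The identifiers listed in the Figure 2 tables follow the naming convention described in the text: a clone name, an optional category letter, and a sequential number. The sketch below illustrates how such identifiers could be split into their parts. It is an assumption-laden illustration rather than a tool from the paper: the function and type names are hypothetical, the dot separator is inferred from the table entries (for example mva3.p4), and the letter "r" for RNA genes is inferred from the single identifier mcl19r1 given in the text.

```python
import re
from typing import NamedTuple

# Category letters: "p" and "t" are defined in the text; "r" is inferred
# from the one RNA gene identifier given there (an assumption).
CATEGORY = {
    None: "potential protein gene",
    "p": "potential exon",
    "t": "transcribed region",
    "r": "potential RNA gene",
}

# Clone name = three letters plus digits (e.g. mcl19, mva3). The category
# letter may follow a dot ("mcl19.p2") or be written directly ("mcl19r1").
_ID_PATTERN = re.compile(r"^([a-z]{3}\d+)(?:\.([ptr])?|([ptr]))(\d+)$")


class FeatureId(NamedTuple):
    clone: str
    category: str
    number: int


def parse_feature_id(text: str) -> FeatureId:
    """Split an identifier such as 'mcl19.p2' into clone, category, number."""
    match = _ID_PATTERN.match(text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised identifier: {text!r}")
    clone, dotted_tag, bare_tag, number = match.groups()
    tag = dotted_tag or bare_tag  # None for a plain potential protein gene
    return FeatureId(clone.upper(), CATEGORY[tag], int(number))


if __name__ == "__main__":
    for name in ("mcl19.6", "mva3.p4", "mdj22.t3", "mcl19r1"):
        print(name, "->", parse_feature_id(name))
```

A plain gene identifier without a dot (such as "mcl196") is rejected on purpose, because the boundary between clone name and sequential number would be ambiguous.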