Structural Analysis of Arabidopsis thaliana Chromosome 5. II

DNA RESEARCH 4, 291-300 (1997)
Short Communication
Structural Analysis of Arabidopsis thaliana Chromosome 5. II.
Sequence Features of the Regions of 1,044,062 bp Covered by
Thirteen Physically Assigned P1 Clones
Hirokazu KOTANI, Yasukazu NAKAMURA, Shusei SATO, Takakazu KANEKO, Erika ASAMIZU,
Nobuyuki MIYAJIMA, and Satoshi TABATA*
Kazusa DNA Research Institute, 1532-3 Yana, Kisarazu, Chiba 292, Japan
(Received 20 July 1997)
Abstract
A total of 13 P1 clones, each containing one or more markers specifically mapped on chromosome 5, were isolated from a P1 library of the Arabidopsis thaliana Columbia genome. Their nucleotide sequences were determined by a shotgun-based strategy and precisely located on the physical map of chromosome 5. The total length of the sequenced regions was 1,044,062 bp. Since we have previously reported the sequence of 1,621,245 bp by analysis of 20 non-redundant P1 clones, the total length of the chromosome 5 sequences determined so far has reached 2,665,307 bp. The regions sequenced in this study were analysed by comparison with the sequences in protein and EST databases and by computer programs for gene modeling; a total of 225 potential protein-coding genes and/or gene segments with known or predicted functions were identified. The positions of exons that do not exhibit similarity to known genes were also predicted by computer-aided analysis. The average density of the genes and/or gene segments was 1 gene per 4,640 bp. Introns were identified in approximately 84% of the potential genes, and the average number and length of the introns per gene were 5.3 and 184 bp, respectively. These sequence features are essentially identical to those of the previously sequenced regions. The transcription level of the predicted genes was roughly monitored by counting the numbers of matched Arabidopsis ESTs. The sequence data and gene information are available through the World Wide Web at http://www.kazusa.or.jp/arabi/.
Key words: Arabidopsis thaliana chromosome 5; genomic sequence; P1 genomic library; gene prediction
With the final objective of understanding the entire genetic system of higher plants, we initiated large-scale sequencing of the Arabidopsis thaliana genome, which consists of five chromosomes totalling an estimated 130 Mb. In the initial phase of the project, we focused on chromosome 5, in line with the international agreement of the Arabidopsis Genome Initiative.1 We screened chromosome 5-specific clones from a P1 library, from which a contig map was constructed (manuscript in preparation), and started sequence analysis of the physically assigned P1 clones. We previously reported sequence features of the 1,621,245 bp regions covered by 20 non-redundant P1 clones.2 We have now completed sequence determination of 13 additional P1 clones that have been localized on chromosome 5. In this paper, we describe the gene organization of the sequenced regions, together with structural and functional information on the genes likely to reside in them, as deduced by computer-aided analysis.
Communicated by Mituru Takanami
1. Isolation and Sequencing of P1 Clones
DNA sources and the method of clone isolation were essentially the same as described in the previous paper.2 P1 clones containing the DNA regions corresponding to 13 DNA markers on chromosome 5 were isolated by screening the Mitsui P1 library3 by PCR with primers designed on the basis of the marker sequences. The DNA markers used and the selected clones are: m217 (MHF15), nga249 (MAH20), CHS (MSH12), mi438 (MVA3), mi433 (MDJ22), CIC4D4R (MYJ24), CIC4D4 (MOP9), BELL1 (MYC6), mi83 (MRH10), mi61 (MCL19), and CIC10H1 (MAF19). MPO12 and MTH12 were directly isolated as clones showing restriction fragment length polymorphism (RFLP) when used as probes for genomic Southern hybridization (manuscript in preparation). The relative positions of the markers and the sequenced clones on chromosome 5 are shown in Fig. 1. The relative orientation of each clone on the chromosome is not yet known.
* To whom correspondence should be addressed. Tel. +81-438-52-3933, Fax. +81-438-52-3934, E-mail: [email protected]
[Figure 1: map of chromosome 5 drawn as a vertical box with a length scale (Mbp). DNA markers are labeled at the left and the P1 clones MHF15, MAH20, MSH12, MVA3, MDJ22, MYJ24, MOP9, MPO12, MYC6, MRH10, MCL19, MTH12, and MAF19 at the right. Other markers shown include mi322, mi90, mi219, mi125, mi291b, mi137, m423, mi69, mi70, and g3844.]
Figure 1. Relative locations of the sequenced P1 clones and the associated markers on the physical map of chromosome 5. Positions of DNA markers used for P1 isolation and of other major DNA markers were mapped based on the YAC tiling path and on map information in ref. 12 and Sato et al. (manuscript in preparation). The vertical box represents the entire length of chromosome 5. Names of P1 clones are given at the right side, and those of markers at the left side. The distance (Mbp) from the telomeric site of the top arm is given in the vertical scale.
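The marker and clone pairs listed in Section 1 can be held in a small lookup table. The following is a minimal sketch, not part of the original analysis; the names `MARKER_TO_CLONE`, `RFLP_CLONES`, `clone_for_marker`, and `all_clones` are hypothetical, and the marker spellings follow the listing in the text.

```python
# Marker -> P1 clone pairs as listed in Section 1 of the text.
MARKER_TO_CLONE = {
    "m217": "MHF15",
    "nga249": "MAH20",
    "CHS": "MSH12",
    "mi438": "MVA3",
    "mi433": "MDJ22",
    "CIC4D4R": "MYJ24",
    "CIC4D4": "MOP9",
    "BELL1": "MYC6",
    "mi83": "MRH10",
    "mi61": "MCL19",
    "CIC10H1": "MAF19",
}

# Clones that the text says were isolated directly as RFLP probes.
RFLP_CLONES = ("MPO12", "MTH12")


def clone_for_marker(marker):
    """Return the P1 clone selected with a marker, or None if unlisted."""
    return MARKER_TO_CLONE.get(marker)


def all_clones():
    """Return every sequenced clone name, sorted alphabetically."""
    return sorted(set(MARKER_TO_CLONE.values()) | set(RFLP_CLONES))


print(len(MARKER_TO_CLONE), "marker-selected clones")
print(len(all_clones()), "clones in total")
```

Under these assumptions, the eleven marker-selected clones and the two RFLP clones together account for the 13 sequenced P1 clones.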
The nucleotide sequence of each P1 insert was determined according to the bridging shotgun method described previously.2 The length of the finally confirmed nucleotide sequence of each P1 insert is indicated at the top of Fig. 2. The total length of the DNA regions sequenced in this study was 1,044,062 bp. Since we have previously reported the sequences of 1,621,245 bp covered by 20 non-redundant P1 clones,2 the total length of the chromosome 5 sequences determined is now 2,665,307 bp.
2. Assignment of the Potential Coding Regions
Assignment of potential protein coding regions and gene modeling were performed by a combination of similarity search and computer prediction as described in the previous paper.2 Briefly, a similarity search was first carried out using the BLASTP program4 against the non-redundant protein sequence database OWL (release 29). The identified exons were integrated into gene models constructed with the help of the computer programs Grail,5 FEXA in GeneFinder,6 ER (Murakami, K., personal communication), ASPL in GeneFinder,6 GENSCAN,7 and NetPlantGene,8 which predict either exon regions or exon-intron boundaries. The transcribed regions were assigned by comparison of the nucleotide sequences with Arabidopsis ESTs9,10 in the public databases.
The potential protein coding regions assigned were divided into the following three categories. A single exon or a region containing consecutive multiple exons showing similarity to a single reported gene throughout the alignment was assigned as a potential protein gene. These are denoted by the clone names followed by sequential numbers from one end of the insert to the other. A region that matched only portions of a reported gene was assigned as a potential exon(s), and a region that matched only Arabidopsis ESTs was assigned as a transcribed region. These regions were distinguished from the potential protein genes by adding "p" and "t", respectively, between the clone names and the sequential numbers in the identifiers. All the genes and gene portions assigned in each P1 clone according to the above procedure are listed in the table below the figure, and are also schematically represented in Fig. 2. To sum up, 120 potential protein genes, 62 potential exons, and 43 transcribed regions were assigned in the 1,044,062 bp regions. The average density of the genes in the three categories in the total of 2,665,307 bp, including the previously reported 1,621,245 bp sequences, is estimated to be 1 gene per 4,640 bp. However, additional genes may yet be discovered in the intergenic regions, since our prediction is based mainly on computer-assisted analysis.
RNA coding regions were assigned on the basis of sequence similarity to the reported structural RNAs. For tRNA genes, prediction by the tRNAscan-SE program11 was also taken into account. As indicated in Fig. 2, one tRNA gene was identified on the opposite strand of the fourth intron of a chloroplast triose phosphate translocator precursor gene in MCL19, and was denoted as mcl19r1.
3. Structural Features of the Potential Protein Genes
In the DNA regions sequenced in this and the previous paper,2 the structure was predicted for 259 potential protein genes, approximately 1.3% of the total gene constituents (20,000 genes) assumed for A. thaliana. Structural features of the potential protein genes deduced so far are listed in Table 1. Introns were identified in approximately 81% of the potential genes, and the average number of introns per gene was 4.5. The average length of the introns was 174 bp, which was consistent with the result obtained from analysis of 146 Arabidopsis genes registered in GenBank.8 It was noted that the average GC content of introns (32%) was significantly lower than that of exons (43%).

Table 1. Structural features of potential protein genes in A. thaliana chromosome 5
Features                         120 genes a)                259 genes b)
Gene length including introns    194-11,377 bp (2,457 bp)    191-11,377 bp (2,138 bp)
Product length                   65-1,837 a.a. (496 a.a.)    64-1,837 a.a. (456 a.a.)
Genes with introns               101                         210
Number of introns/gene           0-42 (5.3)                  0-42 (4.5)
Exon length                      2-3,049 bp (240 bp)         2-4,026 bp (251 bp)
Intron length                    23-2,435 bp (184 bp)        23-2,435 bp (174 bp)
GC content of exons              44%                         43%
GC content of introns            32%                         32%
Structural features of the 120 potential protein genes assigned in this study a) and the 259 genes assigned so far b) are listed. Average values are shown in parentheses.

4. Expression Level of the Potential Protein Genes and Gene Segments
The number of matched Arabidopsis ESTs in the public DNA databases was counted for each of the potential protein genes and gene segments to monitor the transcriptional level of the genes. Of the 225 genes and gene segments that we have identified in chromosome 5 in this study, 114 carried matched ESTs. The putative products of the genes hit by 10 or more EST files, which suggests that they are highly expressed, include those showing sequence similarity to chloroplast triose phosphate translocator precursor and acid phosphatase precursor 1.
The sequence data as well as the gene information shown in this paper are available through the World Wide Web at http://www.kazusa.or.jp/arabi/.
Acknowledgments: We thank S. Sasamoto and K. Naruo for excellent technical assistance and the members of the DNA Sequencing Laboratory, T. Kimura, T. Hosouchi, K. Kawashima, M. Matsumoto, A. Matsuno, E. Mitsui, A. Muraki, N. Nakazaki, S. Okumura, S. Shinpo, C. Takeuchi, T. Wada, A. Watanabe, M. Yamada, M. Yasuda, and M. Yatabe, for their excellent teamwork. We are grateful to A. Tanaka for technical advice, and to the Mitsui Plant Biotechnology Research Institute and the Arabidopsis Biological Resource Center at Ohio State University for providing the DNA markers and the DNA libraries. This work was supported by the Kazusa DNA Research Institute Foundation. We thank M. Takanami for his support and encouragement in carrying out this project.
References
1. Kaiser, J. 1996, First global sequencing effort begins, Science, 274, 30.
2. Sato, S., Kotani, H., Nakamura, Y. et al. 1997, Structural analysis of Arabidopsis thaliana chromosome 5. I. Sequence features of the 1.6 Mb regions covered by twenty physically assigned P1 clones, DNA Res., 4, 215-230.
3. Liu, Y.-G., Mitsukawa, N., Vazquez-Tello, A., and Whittier, R. F. 1995, Generation of a high-quality P1 library of Arabidopsis suitable for chromosome walking, Plant J., 7, 351-358.
4. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. 1990, Basic local alignment search tool, J. Mol. Biol., 215, 403-410.
5. Uberbacher, E. C. and Mural, R. J. 1991, Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach, Proc. Natl. Acad. Sci. USA, 88, 11261-11265.
6. Solovyev, V. V., Salamov, A. A., and Lawrence, C. B. 1994, Predicting internal exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames, Nucl. Acids Res., 22, 5156-5163.
7. Burge, C. and Karlin, S. 1997, Prediction of complete gene structures in human genomic DNA, J. Mol. Biol., 268, 78-94.
8. Hebsgaard, S. M., Korning, P. G., Tolstrup, N., Engelbrecht, J., Rouze, P., and Brunak, S. 1996, Splice site prediction in Arabidopsis thaliana DNA by combining local and global sequence information, Nucl. Acids Res., 24, 3439-3452.
9. Newman, T., Bruijn, F. J., and Green, P. 1994, Genes galore: A summary of methods for accessing results from large-scale partial sequencing of anonymous Arabidopsis cDNA clones, Plant Physiol., 106, 1241-1255.
10. Cooke, R., Raynal, M., Laudie, M. et al. 1996, Further progress towards a catalogue of all Arabidopsis genes: analysis of a set of 5000 non-redundant ESTs, Plant J., 9, 101-124.
11. Lowe, T. M. and Eddy, S. R. 1997, tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence, Nucl. Acids Res., 25, 955-964.
12. Schmidt, R., Love, K., West, J. et al. 1997, Description of 31 YAC contigs spanning the majority of Arabidopsis thaliana chromosome 5, Plant J., 11, 563-572.
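The summary figures quoted in Sections 1 to 4 and Table 1 can be cross-checked with simple arithmetic. The sketch below is offered only as an illustration; the variable names and the `percent` helper are hypothetical, and every number is taken from the text. The gene count for the earlier 1,621,245 bp is not given in this paper, so the density is recomputed only for the regions of this study, where the quotient already rounds to 4,640 bp per gene.

```python
# Figures quoted in the text (Sections 1-4 and Table 1).
BP_THIS_STUDY = 1_044_062
BP_PREVIOUS = 1_621_245
ASSIGNED_THIS_STUDY = {
    "potential protein genes": 120,
    "potential exons": 62,
    "transcribed regions": 43,
}


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


total_bp = BP_THIS_STUDY + BP_PREVIOUS
assigned_total = sum(ASSIGNED_THIS_STUDY.values())

# Density for the regions of this study only (see the lead-in).
bp_per_gene = BP_THIS_STUDY / assigned_total

# Table 1: genes with introns among potential protein genes.
intron_pct_this_study = percent(101, 120)
intron_pct_so_far = percent(210, 259)

# Section 3: 259 genes against the assumed 20,000 genes.
share_of_assumed_genes = percent(259, 20_000)

# Section 4: genes and gene segments with matched ESTs.
est_matched_pct = percent(114, assigned_total)

print(f"total sequenced: {total_bp:,} bp")
print(f"assigned genes and segments: {assigned_total}")
print(f"bp per gene (this study): {bp_per_gene:.0f}")
print(f"genes with introns: {intron_pct_this_study:.1f}% / {intron_pct_so_far:.1f}%")
print(f"share of assumed gene total: {share_of_assumed_genes:.2f}%")
print(f"with matched ESTs: {est_matched_pct:.1f}%")
```

Under these inputs the totals, the 84% and 81% intron fractions, and the 1.3% share agree with the values stated in the text.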
[Figure 2, panel MAF19 (78379 bp): map tracks for Grail exon, Protein db hit, EST db hit, and Gene on both strands, followed by listings of potential protein genes, potential exons, and transcribed regions.]
Figure 2. Gene organization in the 13 P1 clones. Positions of the identified or predicted genes and gene segments in each insert of the P1 clones are schematically presented by color-coded boxes above (rightward) and below (leftward) the wide line in the middle, which represents the entire insert sequence. The insert length is given in parentheses together with the clone name at the top. Arrowheads indicate the directions of the DNA strands (5' to 3'). Dark and faint blue bars with numbers represent the positions of the identified potential protein genes and potential exons, respectively, and red bars represent the positions of structural RNA genes. Gray bars with numbers indicate the positions of the transcribed regions. The regions that showed similarity to the sequences in the protein database are shown by yellow, orange, and red bars, which correspond to BLASTP scores of 70-100, 100-250, and 250 or more, respectively. The green bars indicate the positions of the potential exons predicted by the Grail program. The three colors of increasing depth correspond to regions with Grail scores of less than 70, 70-90, and 90 or more, respectively. The potential protein genes, the gene portions, and the potential RNA genes assigned as described in the text are listed below each of the figures. The accession numbers are as follows: AB006696 (MAF19), AB006697 (MAH20), AB006698 (MCL19), AB006699 (MDJ22), AB006700 (MHF15), AB006701 (MOP9), AB006702 (MPO12), AB006703 (MRH10), AB006704 (MSH12), AB006705 (MTH12), AB006706 (MVA3), AB006707 (MYC6), and AB006708 (MYJ24).
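The identifier scheme described in Section 2 and the score classes in the legend of Fig. 2 can be expressed as small helper functions. This is a minimal sketch under stated assumptions, not the authors' software: the function names are hypothetical, the dotted form (for example `maf19.p10`) is assumed from the listings, the tag "r" is inferred from the tRNA gene named in the text, the shared boundary scores (100, 250, 70, 90) are assigned to the higher class, and the shade labels for Grail are placeholders for the three greens.

```python
import re
from typing import NamedTuple, Optional


class Assignment(NamedTuple):
    clone: str
    category: str
    number: int


# "p" and "t" are defined in the text; "r" is inferred from mcl19r1.
CATEGORY_BY_TAG = {
    "": "potential protein gene",
    "p": "potential exon",
    "t": "transcribed region",
    "r": "potential RNA gene",
}

_TAGGED = re.compile(r"^([a-z]+\d+)\.?([ptr])(\d+)$")  # e.g. maf19.p10, mcl19r1
_PLAIN = re.compile(r"^([a-z]+\d+)\.(\d+)$")           # e.g. mcl19.7


def parse_identifier(identifier: str) -> Assignment:
    """Split an identifier into clone name, category, and serial number."""
    text = identifier.strip().lower()
    match = _TAGGED.match(text)
    if match:
        clone, tag, number = match.groups()
        return Assignment(clone, CATEGORY_BY_TAG[tag], int(number))
    match = _PLAIN.match(text)
    if match:
        clone, number = match.groups()
        return Assignment(clone, CATEGORY_BY_TAG[""], int(number))
    raise ValueError(f"unrecognized identifier: {identifier!r}")


def blastp_color(score: float) -> Optional[str]:
    """Bar color for a protein database hit; None below the lowest class."""
    if score >= 250:
        return "red"
    if score >= 100:
        return "orange"
    if score >= 70:
        return "yellow"
    return None


def grail_shade(score: float) -> str:
    """Shade of green for a Grail exon (labels are placeholders)."""
    if score >= 90:
        return "dark green"
    if score >= 70:
        return "medium green"
    return "light green"


print(parse_identifier("maf19.p10"))
print(blastp_color(120), grail_shade(85))
```

Requiring either a dot or a category letter between the clone name and the serial number avoids ambiguity, since clone names themselves end in digits.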
[Figure 2, panel MAH20 (80970 bp): map tracks for Grail exon, Protein db hit, EST db hit, and Gene on both strands, followed by listings of the assigned genes and gene segments.]
[Figure 2, panel MCL19 (84510 bp): map tracks for Grail exon, Protein db hit, EST db hit, and Gene on both strands, followed by listings of the assigned genes and gene segments, including potential RNA genes.]
[Figure 2, panel MDJ22 (77363 bp): map tracks for Grail exon, Protein db hit, EST db hit, and Gene on both strands, followed by listings of the assigned genes and gene segments.]
[Figure 2, panel MHF15 (83865 bp): map tracks for Grail exon, Protein db hit, EST db hit, and Gene on both strands, followed by listings of the assigned genes and gene segments.]
[Figure 2, panel MOP9 (84194 bp): map tracks for Grail exon, Protein db hit, EST db hit, and Gene on both strands, followed by listings of the assigned genes and gene segments.]
[Figure 2, panel MPO12 (86263 bp): map tracks for Grail exon, Protein db hit, EST db hit, and Gene on both strands, followed by listings of the assigned genes and gene segments.]
[Figure 2, panel MRH10 (71522 bp): map tracks for Grail exon, Protein db hit, EST db hit, and Gene on both strands, followed by listings of the assigned genes and gene segments.]
[Figure 2, panel MSH12 (79259 bp): map tracks for Grail exon, Protein db hit, EST db hit, and Gene on both strands, followed by listings of the assigned genes and gene segments.]
[Figure 2, panel MTH12 (74877 bp): map tracks for Grail exon, Protein db hit, EST db hit, and Gene on both strands, followed by listings of the assigned genes and gene segments.]
[Figure 2, panel MVA3 (81701 bp): map tracks for Grail exon, Protein db hit, EST db hit, and Gene on both strands, followed by listings of the assigned genes and gene segments.]
[Figure 2, panel MYC6 (82315 bp): map tracks for Grail exon, Protein db hit, EST db hit, and Gene on both strands, followed by listings of the assigned genes and gene segments.]
[Figure 2, panel MYJ24 (78844 bp): map tracks for Grail exon, Protein db hit, EST db hit, and Gene on both strands, followed by listings of the assigned genes and gene segments.]
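The insert lengths printed at the top of each panel of Fig. 2 can be summed to check the total length reported for this study, and the accession numbers in the legend can be paired with the clones. The following is a minimal sketch; the names `INSERT_LENGTH_BP`, `FIRST_ACCESSION_NUMBER`, and `accession_for` are hypothetical. The observation that the accessions run consecutively in alphabetical order of clone name is read off the legend, not stated by the authors.

```python
# Insert lengths (bp) as printed at the top of each panel of Fig. 2.
INSERT_LENGTH_BP = {
    "MAF19": 78379,
    "MAH20": 80970,
    "MCL19": 84510,
    "MDJ22": 77363,
    "MHF15": 83865,
    "MOP9": 84194,
    "MPO12": 86263,
    "MRH10": 71522,
    "MSH12": 79259,
    "MTH12": 74877,
    "MVA3": 81701,
    "MYC6": 82315,
    "MYJ24": 78844,
}

# AB006696 is the accession given for MAF19 in the legend of Fig. 2.
FIRST_ACCESSION_NUMBER = 6696


def accession_for(clone):
    """Return the accession, assuming consecutive numbering by clone name."""
    index = sorted(INSERT_LENGTH_BP).index(clone)
    return f"AB{FIRST_ACCESSION_NUMBER + index:06d}"


total_bp = sum(INSERT_LENGTH_BP.values())
print(f"{len(INSERT_LENGTH_BP)} clones, {total_bp:,} bp in total")
for clone in sorted(INSERT_LENGTH_BP):
    print(clone, INSERT_LENGTH_BP[clone], accession_for(clone))
```

The thirteen lengths sum to 1,044,062 bp, the total stated in the abstract and in Section 1.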