
Identification of the Chromosomal Origins of Replication (oriCRSI and oriCRSII) in R. sphaeroides 2.4.1
Tim Johnson, Randi Harbour, Kristina Hernandez, Lin Lin, and Madhusudan Choudhary
Department of Biological Sciences, Sam Houston State University, Huntsville, Texas 77341
INTRODUCTION
Rhodobacter sphaeroides belongs to the α-3 subdivision of the Proteobacteria. This organism is
metabolically versatile and grows under a variety of conditions, including aerobic, semi-aerobic,
and photosynthetic growth. R. sphaeroides has a complex genome comprising two chromosomes (CI and CII)
and five endogenous plasmids (1). CI and CII are ~3.0 Mbp and ~0.9 Mbp in size, respectively (2).
Analysis of the R. sphaeroides genome reveals that genes for a wide variety of essential functions are
dispersed between the two chromosomes. It has also recently been demonstrated that CI and CII have been
both essential and ancient partners within the R. sphaeroides genome since its separation from its
ancestral lineage (3).
Unlike eukaryotic cells, prokaryotic cells lack mitosis or a mitosis-like apparatus. The presence of
multiple chromosomes in bacteria may therefore require well-coordinated chromosomal replication and
segregation to distribute the chromosomes equally between the two daughter cells. To understand this
process of DNA replication, the origin of chromosomal replication must first be identified. The origin
of replication (referred to as oriC in E. coli) is the specific region of the chromosome where the DNA
double helix begins to denature, allowing replication of the chromosome to initiate (4). This region
varies from 40 to 80 base pairs in length among bacterial species and is usually very AT-rich (70 to
80%), because adenine-thymine pairs, held together by two hydrogen bonds rather than the three of
guanine-cytosine pairs, are more easily denatured. Cis-elements located within and around this region
are recognized by a set of proteins, including DnaA, RepABC, and other proteins associated with
chromosomal replication, which bind to specific DNA sequences in this region and facilitate the
initiation of chromosomal replication.
RESULTS AND DISCUSSION
The advantage of the program used in this study is that it performs a progressive search and can process
an entire genome at once, whereas many currently available web-based programs accept only a small number
of sequences, which is time-consuming and provides only limited information. An alternative approach
using Artemis further validated the result, as shown in Figure 2. Furthermore, the efficacy and accuracy
of this program were tested using the entire genomic sequences of Caulobacter crescentus and
Sinorhizobium meliloti, and the program identified all putative origins, including the one that is
biologically functional.
The output of both search programs yielded 3013 and 336 nucleotide sequence files for Chromosome I
and Chromosome II, respectively. After the overlapping regions were merged, a total of 125 CI- and
16 CII-specific sequences remained. Following a search of the protein database, 37 CI- and 9
CII-specific sequences were chosen for further analysis. These putative origin regions were then
analyzed for the presence of known cis-elements to which DnaA and other replication proteins bind, as
shown in Figure 3. Each of the 13 DNA regions (listed in Table 1), together with 300 nucleotides of
upstream and downstream sequence, was then analyzed for 21 different conserved boxes for oriC, DnaA,
RepABC1, and RepABC2 (5). Many of these sequences contain 2 to 5 of these conserved binding boxes, as
shown in Figure 3. Based on the %AT content and the number of binding boxes, 13 possible origins of
replication were identified in the R. sphaeroides chromosomes.
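Because the candidate windows reported by the scan overlap, the step of matching overlapping regions (which reduced 3013 and 336 hits to 125 and 16 regions) amounts to an interval merge. The following is a minimal sketch, assuming each hit is stored as 0-based, end-exclusive (start, end) coordinates and that abutting windows should also be joined; merge_overlapping is a hypothetical name, not part of the authors' program.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end), 0-based, end-exclusive.
Interval = Tuple[int, int]


def merge_overlapping(hits: List[Interval]) -> List[Interval]:
    """Combine overlapping or abutting windows into non-redundant regions."""
    merged: List[Interval] = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1]:
            # This window overlaps (or touches) the previous region: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            # Gap before this window: start a new region.
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    hits = [(100, 160), (110, 190), (150, 230), (500, 560), (520, 600)]
    print(merge_overlapping(hits))  # [(100, 230), (500, 600)]
```

Sorting first means each window only has to be compared with the most recent merged region, so the merge costs O(n log n) for n hits.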
Comparison of the genome sequences of Caulobacter crescentus and Rickettsia prowazekii revealed that
both species share a conserved cluster of genes in the hemE-hemH region that overlaps the established
origin of replication in C. crescentus and the putative origin of replication in R. prowazekii (6). The
origin of replication of the S. meliloti chromosome has also been predicted and experimentally
confirmed to lie approximately 400 kb from dnaA and adjacent to hemE (5). A putative origin of
replication of CI in R. sphaeroides is located ~40 kb from hemE, but it remains uncertain until it is
confirmed experimentally. Like R. sphaeroides, Vibrio cholerae possesses two chromosomes, and the
origins of replication of its two chromosomes (oriCIvc and oriCIIvc) have been studied experimentally (7).
Thus, the identification of chromosomal origins in R. sphaeroides may further our understanding of the
mechanism of chromosomal replication in bacterial species that possess multiple chromosomes.
To identify the putative origins on CI and CII in R. sphaeroides, an in silico approach was employed
to search CI- and CII-specific genomic sequences with both variable sequence length and variable %GC
composition. Two different computer programs, which search either overlapping or discrete segments of
DNA sequence, were used to search the entire chromosome-specific sequences. All sequences of 50 to 100
nucleotides in length with >65% AT content were selected for further analysis. These sequences were
then analyzed for the presence of cis-elements using the conserved consensus sequences found in
Sinorhizobium meliloti (5), a species closely related to R. sphaeroides that also belongs to the α-3
subgroup of the Proteobacteria.
METHODS
In silico approach for the identification of the origin of replication: To identify the chromosomal
origin of replication in Rhodobacter sphaeroides 2.4.1, a computer program was designed to search for
A-T rich regions within the CI and CII sequences. The candidate sequences were then analyzed for the
presence of the consensus cis-elements that the initiator proteins require to start replication. The
algorithm searches both variable nucleotide lengths (50-100 nucleotides) and varying %AT composition
(65% to 80%) in an overlapping and progressive manner, as shown in Figure 1. The program was applied
to each chromosomal sequence of R. sphaeroides in FASTA format, obtained directly from the NCBI server.
For efficient use of memory and input-output loading, each sequence is analyzed sequentially in a
buffer. The %AT is calculated for each candidate sequence, and the program checks whether it is above
a chosen threshold value; sequences above the threshold are written to the output data files. In
addition, Artemis was used to calculate the %GC composition within each discrete 120-nucleotide window
along each of the two chromosomal sequences, as shown in Figure 2.
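The overlapping, variable-length %AT scan described above can be sketched as follows. This is a minimal illustration, not the authors' program: it assumes the chromosome has already been read into a string (a simple FASTA reader is included), uses prefix sums of A/T counts so that each window's %AT is a constant-time lookup, and reports every window of 50-100 nt whose A+T content is strictly above the threshold. The function names and the step/len_step parameters are hypothetical.

```python
from typing import Dict, Iterator, Tuple


def read_fasta(path: str) -> Dict[str, str]:
    """Minimal FASTA reader: returns {header_line: sequence}."""
    records: Dict[str, str] = {}
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    records[name] = "".join(chunks)
                name, chunks = line[1:].strip(), []
            else:
                chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def at_percent(seq: str) -> float:
    """Percent of A and T bases in seq (case-insensitive); 0.0 if empty."""
    seq = seq.upper()
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq) if seq else 0.0


def scan_at_rich(seq: str, min_len: int = 50, max_len: int = 100,
                 threshold: float = 65.0, step: int = 1,
                 len_step: int = 1) -> Iterator[Tuple[int, int, float]]:
    """Yield (start, end, at_pct) for every overlapping window whose length
    lies in [min_len, max_len] and whose A+T content is strictly above
    threshold. Coordinates are 0-based and end-exclusive."""
    seq = seq.upper()
    n = len(seq)
    # prefix[i] = number of A/T bases in seq[:i]
    prefix = [0] * (n + 1)
    for i, base in enumerate(seq):
        prefix[i + 1] = prefix[i] + (base in "AT")
    for start in range(0, n - min_len + 1, step):
        for length in range(min_len, min(max_len, n - start) + 1, len_step):
            end = start + length
            pct = 100.0 * (prefix[end] - prefix[start]) / length
            if pct > threshold:
                yield start, end, pct


if __name__ == "__main__":
    toy = "GC" * 30 + "AT" * 30 + "GC" * 30  # 180 nt with a 60-nt A-T block
    hits = list(scan_at_rich(toy, min_len=50, max_len=60, threshold=80.0))
    print(max(hits, key=lambda h: h[2]))  # (60, 110, 100.0)
```

On a real chromosome one would loop over read_fasta(...) records (file name hypothetical). With step=1 and len_step=1, pure Python evaluates about 51 windows per position, roughly 150 million checks for a 3 Mbp chromosome, so coarser steps or a vectorized version would be advisable in practice.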
Identification of the conserved DNA sequence boxes in the origin region: Because the program searches
in an overlapping manner, the output sequences overlapped each other and had to be combined to avoid
analyzing the same region twice. The assembled sequences were searched against the R. sphaeroides
protein database to determine whether any of them encode a protein. Finally, the remaining sequences
were analyzed using DNADynamo to determine whether they contain the consensus boxes previously
identified in the chromosomal origin of S. meliloti (5). The program was downloaded from its publicly
available website. It performs the searches in both the forward and reverse-complement directions of
the target sequence.
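A consensus-box search on both strands, as described above, can be approximated with a short script. This is a minimal sketch, not DNADynamo's algorithm: it assumes each box is given as an IUPAC consensus string, converts it to a regular expression, and finds the reverse-strand hits by searching the forward strand for the motif's reverse complement. Because the 21 S. meliloti boxes from (5) are not listed here, the demo uses the E. coli DnaA box consensus TTATNCACA purely as a stand-in.

```python
import re
from typing import List, Tuple

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of every IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_regex(motif: str) -> re.Pattern:
    """Compile an IUPAC motif; the lookahead lets overlapping hits be found."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile(f"(?=({body}))")


def find_boxes(seq: str, motif: str) -> List[Tuple[int, str]]:
    """Return sorted (start, strand) hits of motif on both strands.

    start is the 0-based position on the forward strand; a '-' hit means
    the motif reads 5'->3' on the reverse strand at that location."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in motif_regex(motif).finditer(seq)]
    rc_motif = reverse_complement(motif)
    if rc_motif != motif.upper():  # palindromic motifs would be counted twice
        hits += [(m.start(), "-") for m in motif_regex(rc_motif).finditer(seq)]
    return sorted(hits)


if __name__ == "__main__":
    # Stand-in motif only: the E. coli DnaA box consensus TTATNCACA.
    region = "GGTTATCCACAGGCCTGTGGATAACC"
    print(find_boxes(region, "TTATNCACA"))  # [(2, '+'), (15, '-')]
```

Counting the hits per box type for each candidate region (plus its 300-nt flanks) would give the box tallies that, together with %AT, were used to rank the candidate origins.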
Figure 1. a) Program window; b) Input data; c) Output data.
Figure 2. The G+C content and possible sites for origin of replication in CI and CII of R. sphaeroides
2.4.1 (purple, below average; yellow, above average). a) G+C content and two possible sites for origin
of replication in Chromosome I (~69% GC). b) G+C content and nine possible sites for origin of
replication in Chromosome II (~69% GC).
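The discrete-window G+C profile behind Figure 2 can be reproduced in outline with the sketch below. It is not Artemis's implementation: it assumes non-overlapping 120-nt windows (as stated in the Methods), drops any trailing partial window, and takes "average" to mean the G+C content of the whole input sequence, so windows are labelled "below" (purple) or "above" (yellow) relative to that mean. All function names are hypothetical.

```python
from typing import List, Tuple


def gc_percent(seq: str) -> float:
    """Percent of G and C bases in seq (case-insensitive); 0.0 if empty."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_windows(seq: str, window: int = 120) -> List[Tuple[int, float]]:
    """G+C% of consecutive, non-overlapping windows (0-based start positions);
    a trailing partial window is ignored."""
    seq = seq.upper()
    return [(start, gc_percent(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, window)]


def classify_windows(seq: str, window: int = 120) -> List[Tuple[int, float, str]]:
    """Label each window 'below' or 'above' the whole-sequence G+C average
    (a window exactly at the average is labelled 'above')."""
    average = gc_percent(seq)
    return [(start, gc, "below" if gc < average else "above")
            for start, gc in gc_windows(seq, window)]


if __name__ == "__main__":
    toy = "GC" * 60 + "AT" * 60 + "GC" * 60  # three 120-nt windows
    for start, gc, label in classify_windows(toy):
        print(start, round(gc, 1), label)
```

In a ~69% GC genome such as R. sphaeroides, the "below" windows are the A-T richer stretches, which is why they mark possible origin sites.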
Figure 3. DnaA and RepABC box binding sites for the origin of replication. a) A G+C content graph of a ~6 kb region
encompassing the possible region for origin of replication in R. sphaeroides. b) The sequence of the possible regions for
origin of replication. c) DnaA and RepABC binding sites that match the DnaA and RepABC box consensus sequences. d)
The sequences of the putative DnaA and RepABC boxes. (* Binding sites for multiple box consensus sequences.)
FUTURE WORK
All thirteen putative chromosomal origins of R. sphaeroides 2.4.1 will be cloned into a suicide vector (pLO1 or
pSUP202). The resulting recombinant plasmids will be tested biologically to determine whether any of these origins allows
the suicide plasmid to replicate autonomously in R. sphaeroides. This work is currently in progress.
Table 1. The possible regions for origin of replication in Chromosome I and Chromosome II
Columns: Coordinates; Sequence (A-T rich region marked in red); A+T content of the A-T rich region; Location (chromosome)
2380028-2380181
TCGCATCGCCCCTCCCGCTTCGTTGAACATTTTGGCCGATTAAATTCATTTTTTTGCCGACCATCAACGTTTATTTTCTTTTTG
ATGAAGATTTCCAGATTTACTTTCAGTTTTTCCATGCTTATGCCTTGGAAACTGGCAGTTTCCCGTTGGC
69.32%
CI
1700865-1701165
GGAGTGACTGAATGAAAGGCAACGATGTATCAATCATGAGATCGGAACATGAGTCTGCTCTCGAATAGAGTGAGATCAGG
ATTTAAGACAAAGTAAACATTTTTGGTATTCTTAAGTGATTGATTTTATTGAATAAATCAAGGGTGTCATATGGATTTGTTTT
TCTTAAGAAATCGTTTAATGATTGATTTATTGATTTATTAAGAAATGGATGAATCGAGATTTGATGTTCATGGTTCTTGAATG
GGTATTCCATCAATGAACATGAACATGAGTGCATTTTGGCGTAAGTGAGCGAAGC
72.58%
CI
1701171-1701360
GAACGCCACCTTTAATCCACATAGAGGTTTTGAGATCAGGAAAGGAGTCTTCTTTCAGATAAAGGTTTGAGATCAGGAAAG
GAGTCTTCTTTCACATAGAGGTTTTGAGATCGGATAAACCTTTAATCCACATAGAGGTTTTGAGATCGGATAAACTGCATCG
AATAAGGGTCACCATAAGCAATCTGGC
63.06%
CI
1701367-1701598
CCGCGCGAAGCGCCAATGGAATCGTTTATCCAATAGAGATTTGGACTCATACAGATCGGATAAATGATCTATGCTCAGATA
GAGATTTTGAGATATCAAATTTCATCAGATAAAGGTATTTTGGATCTTCAAACTTCCTTTCTCTAACTCAGATCTCATCTGGA
CCTTATAGTTAAGATTCTGATTATAGCTCTATTTCTATAGGGGGACGAAACCCCCATTTTCGTGGTGA
68.88%
CI
199834-199941
TCTTCCCCAGCTTATTGAAAGACAAACTGAAGAAAAAACGAGAAATTCTGACGGTTATAGAAAGTCAGACTTACAGAAGAT
CCGAGGGGGTGCTTTGAAACGCACATC
62.65%
CII
205692-205820
GTTCGGCGAGGCTCCACCTGTTCCCATTGACAGGCTAATCGAAAGCTAATCTAATAAAAACAAATAAAAGCTGACATGTGA
TGTAAGAAAATCTGACGAAAGAGAGGGGCGGATGTCGATCCGGATGCT
66.67%
CII
365609-365736
AGTATCAACTAAAGGTTGTAACCCGTCTATACTTTAGCGATAGAGTTTCATTAAGATACAATCAAGCGGGATTGTTCCTTCG
AGACTGGAACACCGTCAAAAGTGTGGGATATGGTCATTTTGACACA
63.75%
CII
478071-478177
GGAGTCAAGCATTTTGTAAACTTGTTATATACCAATCGGTTTCACTTGCTGAGCGAGGCCCCGGATAATCTGTTTTCGCATT
GTTTTGGAATGATAATCACTCTG
61.82%
CII
583697-583830
GTTACATTTTGTGCAAGACCATCACGATCTGTCAATCTCATTTTGCCAGATTTTCATGCTGCACCGCAGATAAACTCGGTGA
TTGACTTGTTCATATGTTTATTTGACAACTAATATGATCGTAGCCCAAGCGC
60.57%
CII
634469-634658
AAGAAAGTCAGCATAGAAATTGAGAATTAAGCACTCGTCTGGCAGAAAGGCCTTCCCGAAATTACATCGGGCAATTCAAA
AGAACCACCGTATTTAAGTTGACTGACGAAATACACATGTAGTTAAAATGCAGCCAATCGGAGGGCAATATGGACGGTCAG
AGAGTATCACAAGAAGAGTTTGAGGAACT
62.50%
CII
738147-738270
GCGAGTGGGATGTTCAGTAAGTTGATGAGTTTATCTGCTCGATAGTGCATGTATGCACCAATATTGGTTAAGTAAACGCTAC
CACTTTCGATTGAATCAAAAGCCGGACAAATCACCCATGGAT
63.64%
CII
876323-876437
AAGGACGAAAACACGTCATGACTCGCTTCATACTCAGCGACCTTTGCATCTGTTGTTATATTGGGGAAATAGTAGTGGTCTT
CAAATGCCATTATTTTCTTCCAATCTTTGTCGG
64.39%
CII
921529-921661
AATGGCTGATCCTTGGGTAATTTGTCCGGCTTTTGATTCAATCGAAAGTGGTAGCGTTTACTTAACCAATATTGGTGCATAC
ATGCACTATCGAGCAAATAAACTCATCAACTTACTGAACATCCCACTCGCC
64.84%
CII
REFERENCES
1. Suwanto, A., and S. Kaplan. 1989. Physical and genetic mapping of the Rhodobacter sphaeroides 2.4.1 genome:
presence of two unique circular chromosomes. J. Bacteriol. 171:5850-5859.
2. Mackenzie, C., et al. 2001. The home stretch, a first analysis of the nearly completed genome of Rhodobacter
sphaeroides 2.4.1. Photosynth. Res. 70:19-41.
3. Choudhary, M., Y.-X. Fu, C. Mackenzie, and S. Kaplan. 2004. DNA sequence duplication in Rhodobacter
sphaeroides genome: evidence of an ancient partnership between chromosomes I and II. J. Bacteriol. 186:2019-2027.
4. Fuller, R. S., J. M. Kaguni, and A. Kornberg. 1981. Enzymatic replication of the origin of the Escherichia coli
chromosome. Proc. Natl. Acad. Sci. USA 78:7370-7374.
5. Sibley, C. D., S. R. MacLellan, and T. Finan. 2006. The Sinorhizobium meliloti chromosomal origin of replication.
Microbiology 152:443-455.
6. Brassinga, A. K. C., R. Siam, and G. T. Marczynski. 2001. Conserved gene cluster at replication origins of the
α-Proteobacteria Caulobacter crescentus and Rickettsia prowazekii. J. Bacteriol. 183:1824-1829.
7. Egan, E. S., and M. K. Waldor. 2003. Distinct replication requirements for the two Vibrio cholerae chromosomes.
Cell 114:521-530.