Subtraction hybridisation and shot-gun sequencing: a new approach

Nucleic Acids Research, 1994, Vol. 22, No. 8
1335-1341
Subtraction hybridisation and shot-gun sequencing: a new approach to identify symbiotic loci
X.Perret1,2, R.Fellay1, A.J.Bjourson3, J.E.Cooper3, S.Brenner2 and W.J.Broughton1,*
1Laboratoire de Biologie Moléculaire des Plantes Supérieures, Université de Genève, 1 chemin de
l'Impératrice, 1292 Chambésy, Switzerland, 2Molecular Genetics, University of Cambridge School of
Clinical Medicine, Addenbrooke's Hospital, Hills Road, Cambridge CB2 2QQ and 3Department of
Mycology and Plant Pathology, The Queen's University of Belfast, Belfast BT9 5PX, UK
Received March 14, 1994; Accepted March 22, 1994
EMBL accession no. X74134
ABSTRACT
Traditionally, new loci involved in the Rhizobium-legume symbiosis have been identified by transposon
mutagenesis and/or complementation. Wide dispersal
of the symbiotic loci in Rhizobium species NGR234, as
well as the large number of potential host-plants to be
screened, greatly reduces the efficiency of these
techniques. As an alternative strategy designed to
identify new NGR234 genes involved in the early stages
of the symbiosis, we combined data from competitive
RNA hybridisation, subtractive DNA hybridisation and
shot-gun sequencing. On the assumption that the
expression of most nodulation genes is triggered by
compounds released by the host-plant, we identified,
in the ordered cosmid library of the large symbiotic
plasmid pNGR234a, restriction fragments that carry
transcripts induced by flavonoids. To target genes not
present in the closely related strain R.fredii USDA257,
we selected fragments that also carried sequences
purified by subtractive DNA hybridisation. Shot-gun
sequencing of this subset of fragments led to the
identification of sequences with strong homology to
diverse prokaryotic genes/proteins. Amongst these, a
symbiotically active ORF from pNGR234a is highly
homologous to the leucine-responsive regulatory
protein of Escherichia coli (Lrp), is induced by
flavonoids, and is not present in USDA257.
INTRODUCTION
Symbiotic associations between leguminous plants and soil
bacteria belonging to the genera Azorhizobium, Bradyrhizobium
and Rhizobium lead to the formation of nitrogen-fixing root
structures called nodules. In contrast to strains from temperate
regions, which tend to have a limited host-range, tropical rhizobia
such as Rhizobium species NGR234 (1) and R.fredii USDA257
(2) nodulate a wide variety of host-plants. Tests on more than
400 different legumes have shown that NGR234 is able to
nodulate at least 75 plant genera, including the non-legume
Parasponia andersonii (1; 3; S.G.Pueppke and W.J.Broughton,
unpublished). Comparative studies have shown that the hosts of R.fredii
USDA257 form an exact subset of those of NGR234
(S.G.Pueppke and W.J.Broughton, unpublished). At the
nucleotide level, several symbiotic loci, including nodABC (3)
and nodS (4), are almost perfectly conserved, suggesting a very
close phylogenetic relationship between the two rhizobia.
Interestingly, the nodSU genes that allow NGR234 to nodulate
Leucaena species (5) are also present in the USDA257 genome. However, a
deletion in the promoter region renders nodSU inactive
and is responsible for the Nod- phenotype of
USDA257 on Leucaena (4).

*To whom correspondence should be addressed
Widespread dispersal of the symbiotic loci in NGR234 (5,
6), coupled with the large number of potential hosts to be
screened, complicates traditional genetic approaches towards
identifying symbiotic genes (random mutagenesis, interspecies
complementation, etc.). Accordingly, we designed an alternative
strategy to identify genes involved in the early stages of nodulation
(outlined in Fig. 1). The ordered cosmid library, which covers
the symbiotic plasmid pNGR234a (6) as well as 97% of the
remaining 5.7 megabases of the NGR234 genome (7), was used
to index the position of loci whose expression is triggered by
plant signals (e.g. flavonoids). Many XhoI restriction fragments
dispersed over pNGR234a and carrying flavonoid-inducible
genes were identified by competitive RNA hybridisation (8). Some
of these fragments carried known inducible loci, such as the
nodABC and nodSU genes. Concomitantly, using DNA
subtraction hybridisation, we purified NGR234 sequences that
are absent from the genome of R.fredii USDA257. By probing
the cosmid library with these 'unique' sequences, we were able
to assign them first to certain cosmid clones and later to specific
XhoI restriction fragments. To target flavonoid-inducible loci that
are not present in USDA257, we combined the results of the
competitive RNA and subtractive DNA hybridisations. In this way,
we identified a subset of restriction fragments that carry both sequences
not shared by USDA257 and inducible transcripts. Shot-gun
sequencing of these DNA fragments, together with a fast
search for homology among existing nucleic acid and protein databases,
identified a number of putative genes with strong homology
to diverse prokaryotic genes/proteins.

[Figure 1: flow diagram. DNA subtraction hybridisation (Sau3AI fragments from NGR234 not shared by USDA257) and RNA competition hybridisation; probing of the ordered cosmid library (PCR amplification; dot-blot and Southern blots of XhoI restriction digests); purification of XhoI fragments from pNGR234a positive in both types of hybridisation; cloning (Sau3AI library); random sequencing (100 sequences: pX140, pX177, pX185; 150 sequences: ORF-1 (Lrp-like), pRK41, pRK21, pR57, pR64).]
Figure 1. Flow diagram showing the methods used to analyze the symbiotic
plasmid pNGR234a for loci induced by flavonoids and not shared by R.fredii
USDA257.
MATERIALS AND METHODS
RNA competition hybridisation
Rhizobium species NGR234 was grown at 28°C in RMM
minimal medium (9) with succinate as the carbon source.
Flavonoid induction was performed by adding 200 nM daidzein
to cultures with a turbidity of 0.6 at 600 nm. Cells were harvested
at different times after induction and resuspended in a pre-warmed
solution (90°C) consisting of equal volumes of phenol saturated
with sodium acetate (pH 4.5) and 20 mM Tris-HCl containing
600 mM NaCl, 1 mM EDTA and 1% (w/v) SDS. RNA was
extracted with phenol-chloroform, precipitated with ethanol and
purified by centrifugation through a CsCl cushion at 115,000 ×
g for 1 h. To prepare radioactive probes, 10 to 15 ng of RNA
was partially hydrolyzed in 125 mM NaOH for 25 min on ice
and labelled using T4 polynucleotide kinase and γ-[32P]ATP for
90 min at 37°C. The probes were purified by centrifugation
through Sephadex G50 using Ultrafree-MC 0.45 µm filter units
(Millipore, Bedford, MA, USA). Digested cosmid DNA was
separated on 0.8% (w/v) agarose gels and transferred to
'GeneScreen Plus' nylon membranes, which were pre-hybridised
overnight at 65°C in 50 mM Tris-HCl (pH 7.4), 0.2% (w/v)
bovine serum albumin, 0.2% (w/v) Ficoll, 0.1% (w/v) sodium
pyrophosphate, 1% (w/v) SDS, 1 M NaCl and 100-150 µg of
non-labelled RNA prepared from non-induced rhizobia.
Hybridisation was performed at 65°C for 20 h by adding the
purified probes directly to the pre-hybridisation solution. Membranes
were washed three times for 30 min at 65°C in 1% (w/v) SDS, 1×
SSC, and then for 15 min at room temperature in 0.2× SSC.
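The competition principle can be illustrated with a deliberately simple Python sketch. It assumes (our simplification, not the authors' model) that labelled and unlabelled molecules of the same transcript bind their target fragment in proportion to their mass; real hybridisation kinetics are more complex. The masses echo the protocol (about 10 ng of labelled probe against about 100 µg of competitor RNA), while the transcript abundances, fragment names and the function `labelled_share` are hypothetical.

```python
def labelled_share(probe_ng: float, competitor_ng: float,
                   probe_abundance: dict[str, float],
                   competitor_abundance: dict[str, float]) -> dict[str, float]:
    """Fraction of each fragment's hybridised RNA that carries the label.

    Toy assumption: labelled (induced) and unlabelled (non-induced) molecules of a
    transcript bind its target fragment in proportion to their mass.
    Abundances are the fraction of total RNA mass contributed by each transcript.
    """
    share = {}
    for fragment, rel in probe_abundance.items():
        labelled = probe_ng * rel
        unlabelled = competitor_ng * competitor_abundance.get(fragment, 0.0)
        total = labelled + unlabelled
        share[fragment] = labelled / total if total > 0 else 0.0
    return share


if __name__ == "__main__":
    # ~10 ng of labelled probe versus ~100 ug (1e5 ng) of non-induced competitor RNA.
    probe = {"constitutive": 1e-3, "induced": 1e-3}
    competitor = {"constitutive": 1e-3, "induced": 0.0}
    for name, value in labelled_share(10, 1e5, probe, competitor).items():
        print(f"{name}: {value:.4%}")
```

With a roughly 10,000-fold mass excess of non-induced RNA, a constitutively expressed transcript keeps only about 0.01% of its labelled signal in this model, whereas a transcript absent from the competitor keeps all of it.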
DNA subtraction hybridisation (for a detailed protocol see
Bjourson et al., 10)
Genomic DNAs from NGR234 (the 'probe' strain) and R.fredii
USDA257 (the 'subtracter' strain) were prepared using standard
procedures (10, 11). Approximately 1 µg of DNA from each
strain was digested to completion with Sau3AI. Specific linkers (600 ng)
were ligated to approximately 200 ng of each of the restricted DNAs.
Only the linker designed for the subtracter DNA was biotinylated,
and it was synthesized using uracil in place of thymine. One ng
of the ligated probe DNA was amplified by 45 PCR cycles (80
s denaturation at 94°C, 60 s annealing at 55°C and 120 s
DNA polymerization at 72°C) in reaction mixtures containing
10 mM Tris-HCl (pH 8.3), 50 mM KCl, 1.5 mM MgCl2,
0.01% (w/v) gelatin, 200 µM dNTPs, 1 µM primer
(complementary to the linker) and 0.5 U of Taq DNA
polymerase. With the exception of the biotinylated primer and
the replacement of dTTP by dUTP, the same amplification
conditions were applied to the subtracter DNA. To ensure that
sufficient biotin groups were present for subsequent binding to
streptavidin, the amplified subtracter DNA was additionally
biotinylated. Subtraction hybridisation was performed in 0.5 ml
centrifuge tubes, with approximately 1 to 5 ng of PCR-amplified probe DNA
(NGR234) and 20 µg of subtracter DNA (USDA257) in a
hybridisation solution containing 50 mM HEPES (pH 7.5), 0.5
M NaCl, 1 mM EDTA and 0.1% (w/v) SDS. The mixture was
denatured at 99°C for 10 min and incubated at 65°C for 48 h.
To isolate the probe DNA from the subtraction mixture, 30 µg
of streptavidin was added in two steps, and the mixture was extracted
several times with an equal volume of phenol-chloroform (50:50,
v/v). Prior to cloning, the NGR234 DNA sequences left after
two consecutive cycles of subtraction were PCR-amplified using
the same amplification conditions as described above, except for
the addition of uracil DNA glycosylase to destroy any remaining traces
of USDA257 subtracter DNA. The specific primer-linkers
flanking the subtracted sequences were removed by digestion with
Sau3AI. Fragments larger than 100 bp were purified from a 1.2%
agarose gel using a DEAE cellulose membrane (Schleicher and
Schuell GmbH, Dassel, Germany) and cloned into the BamHI
restriction site of the Bluescript KS+ vector (Stratagene, La Jolla,
CA, USA).
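As a hedged, in-silico analogue of the procedure above, the Python sketch below digests a 'probe' genome at Sau3AI sites (^GATC) and discards every fragment that also occurs, on either strand, in a 'subtracter' genome. Exact substring matching is our stand-in for hybridisation; the biotin/streptavidin removal and the dUTP/uracil DNA glycosylase step have no counterpart here, and the function names are hypothetical. The 100 bp cut-off mirrors the size selection of fragments larger than 100 bp.

```python
import re


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case A/C/G/T string (other letters kept)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def sau3ai_digest(seq: str) -> list[str]:
    """Cut before every GATC site (Sau3AI cleaves ^GATC) and return the pieces."""
    seq = seq.upper()
    cuts = [m.start() for m in re.finditer("GATC", seq) if m.start() > 0]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def subtract(probe_genome: str, subtracter_genome: str, min_len: int = 100) -> list[str]:
    """Keep probe fragments longer than min_len that are absent from both strands
    of the subtracter genome (exact matching stands in for hybridisation)."""
    sub = subtracter_genome.upper()
    both_strands = sub + "N" + revcomp(sub)  # 'N' blocks matches across the join
    return [frag for frag in sau3ai_digest(probe_genome)
            if len(frag) > min_len and frag not in both_strands]


if __name__ == "__main__":
    shared = "GATC" + "A" * 150      # toy fragment present in both strains
    unique = "GATC" + "CG" * 80      # toy fragment present only in the probe strain
    probe = "TT" + shared + unique
    subtracter = "TT" + shared + "TT"
    print([len(f) for f in subtract(probe, subtracter)])
```

Only the 164 bp 'unique' toy fragment survives; a fragment present only on the opposite strand of the subtracter would also be removed, because both strands are searched.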
DNA isolation and sequencing
Bacterial strains and plasmids used are listed in Table 1. E.coli
was grown on TYE or Terrific Broth (12). Bluescript
recombinants were raised in E.coli DH5α, while Lorist2 cosmid
clones were grown in E.coli 1046. Cosmid and Bluescript
recombinant DNAs were prepared by standard alkaline mini-preparations (12). A Sau3AI library of selected DNA fragments
from the cosmid clones covering pNGR234a was prepared as
follows: XhoI restriction fragments purified from agarose
gels were pooled and cleaved with Sau3AI, extracted with
phenol/chloroform and cloned in Bluescript KS+. DNA
sequences of inserts larger than 100 bp were determined by the
dideoxy method of Sanger et al. (13), using double-stranded
templates and the Sequenase II kit (United States Biochemical
Corp., Cleveland, OH, USA).
DNA labelling and hybridisation procedures
32P-labelling of the Sau3AI fragments from NGR234 remaining
after subtraction against USDA257 genomic DNA was performed
by three cycles of PCR amplification as in Bjourson et al. (10). Inserts
from selected Bluescript KS+ clones were radioactively labelled
by PCR amplification using either the T3 and T7 primers that flank the
entire insert, or synthetic primers designed to span that part
of the sequence with the highest degree of homology to the
database entries. Endonuclease-digested DNAs were transferred
to nylon membranes by standard Southern blotting procedures.
Multiple samples of non-digested DNA were analysed by dot-blot hybridisation.

Table 1. Bacterial strains, plasmids and vectors used in this work

Strains, plasmids and vectors | Relevant characteristics | References
Bacteria:
Escherichia coli DH5α | recA1, φ80 lacZΔM15 | 35
E.coli 1046 | recA1 | 36
Rhizobium sp. NGR234 | broad host-range, RifR | 37
R. sp. ANU265 | sym-plasmid cured derivative of NGR234 | 38
R.fredii USDA257 | broad host-range Rhizobium isolated from soybean nodules, KmR | 2
R. sp. NGR234ΩORF-1 | Ω mutant of the ORF-1 locus (Lrp-like) | this work
Cosmid clones:
pXB315, pXB807 | from the Sym-plasmid pNGR234a |
pXB739, pXB424 | from the chromosome of NGR234 |
Bluescript KS+ clones:
pX140 | unique to NGR234, homologous to the leucine-responsive regulatory protein from E.coli | this work
pX177 | unique to NGR234, strong homology with the R.leguminosarum OmpIII locus | this work
pX185 | unique to NGR234, homologous to cation ATPases | this work
pR57 | homology with the C-terminal domain of the E.coli GabD protein | this work
pR64 | homologous to the C-terminal domain of the C.crescentus McpA protein | this work
pRK41 | homologous to the leucine-responsive regulatory protein from E.coli | this work
pRK21 | homologous to the UGPC gene of E.coli and to the ATP-binding domain of that protein | this work
pXB315X1.4 | 1.4 kb XhoI fragment from pXB315 | this work
pXB315P3 | 3 kb PstI fragment from pXB315 | this work
pXB807X5.2 | 5.2 kb XhoI fragment from pXB807 | this work
pXB739P3 | 3 kb PstI fragment from pXB739 | this work
pRAF14 | Omega interposon inserted in the HindIII site of the 1.4 kb XhoI fragment from pXB315 | this work
Data acquisition and computer analysis
Sequence data were collected on Macintosh computers (Apple
Computer Inc., Cupertino, CA, USA) using the DNA Parrot
system (Clontech, Palo Alto, CA, USA). Once transferred to
Sun workstations (Sun Microsystems Inc., Mountain View, CA,
USA), DNA sequences were analysed for redundant and similar
elements using the ICATOOL programme (14). Similar
sequences were subsequently aligned with CLUSTAL V (15). To
identify homologies with published nucleotide or amino acid
sequences, the non-redundant elements were individually
compared to the latest versions of the EMBL, GenBank,
NBRF and SWISSPROT databases using the BLAST software (16).
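ICATOOL itself is not reproduced here. To illustrate what grouping 'redundant and similar elements' involves, the Python sketch below uses a different, swapped-in technique: k-mer Jaccard similarity, checked on both strands because clones can be inserted in either orientation, followed by union-find merging of similar clones into families. The k-mer size, the 0.5 threshold and all function names are arbitrary assumptions.

```python
from itertools import combinations


def revcomp(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmer_set(seq: str, k: int) -> set[str]:
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int) -> float:
    """Best k-mer Jaccard index between a and either strand of b."""
    ka = kmer_set(a, k)
    best = 0.0
    for kb in (kmer_set(b, k), kmer_set(revcomp(b), k)):
        union = ka | kb
        if union:
            best = max(best, len(ka & kb) / len(union))
    return best


def group_families(seqs: dict[str, str], k: int = 8,
                   threshold: float = 0.5) -> list[set[str]]:
    """Single-linkage grouping of clones whose similarity reaches `threshold`."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if similarity(seqs[a], seqs[b], k) >= threshold:
            parent[find(a)] = find(b)

    families: dict[str, set[str]] = {}
    for name in seqs:
        families.setdefault(find(name), set()).add(name)
    return sorted(families.values(), key=sorted)
```

For example, a clone and a copy of it inserted in the opposite orientation fall into one family, while an unrelated clone forms its own family. Single linkage keeps the sketch short; it can chain loosely related clones together, which a real redundancy analysis would need to guard against.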
Construction and phenotype of NGR234 ORF-1 mutant
The HindIII site in the polylinker of the clone pXB315X1.4 was
removed by digestion with ClaI and BamHI; the protruding ends were
filled in, and the clone was restored by re-ligation. A SpR Omega
interposon (17) was inserted in the remaining HindIII site internal
to ORF-1. pRAF14 was derived by cloning the XhoI fragment
containing Omega into the suicide vector pJQ200SK (18). This
vector carries the sacB gene from Bacillus subtilis, which is
inducible by sucrose and lethal when expressed in Gram-negative
bacteria. pRAF14 was then mobilized into NGR234 by triparental
mating using the helper plasmid pRK2013 (19).
Transconjugants were selected and purified on RMM plates
containing 100 µg/ml Rif, 50 µg/ml Sp and 1% (w/v) mannitol.
Single colonies were grown in liquid TY and spread on plates
containing both antibiotics and 5% (w/v) sucrose (to select for
loss or inactivation of the sacB gene). In NGR234ΩORF-1, marker
exchange by double crossover was confirmed by Southern blot
analysis. The nodulation capacity of the ORF-1 Omega mutant was
compared with that of wild-type NGR234 on Calopogonium caeruleum
(Benth.) Hemsl., Leucaena leucocephala (Lam.) de Wit,
Pachyrhizus tuberosus (Lam.) Spreng. and Vigna unguiculata
(L.) Walp. Except for V.unguiculata, all plants were grown in
Magenta™ jars (5). Twenty to thirty-five plants were used per
treatment. They were harvested 35 d after inoculation with 10⁹
bacteria per plant. Kinetics of nodulation in the 5 weeks following
inoculation were determined on Vigna plants held in growth
pouches (5). Each experiment was repeated two to three times.
RESULTS
Competitive RNA hybridisation
More than 50 XhoI restriction fragments, representing 100 kb
in total, and carrying genes regulated by flavonoids were
identified. These fragments are dispersed over pNGR234a, and
a detailed analysis of them is given in Fellay et al. (8).
Analysis of NGR234 DNA sequences not shared by R.fredii USDA257
To assess the efficiency of the two consecutive cycles of
subtraction hybridisation, dot-blot filters of genomic DNAs from
USDA257 (the subtracter strain), NGR234 (the probe strain) and
ANU265 (NGR234 cured of its symbiotic plasmid) were probed
with the subtracted fragments. No cross-hybridisation was
detected with USDA257, but the subtracted sequences hybridised
strongly to ANU265 and NGR234 genomic DNAs (data not
shown). Next, the ordered cosmid library was used to index the
position of these 'unique' sequences. When dot-blot filters of DNA
prepared from the 309 cosmids that cover approximately 97% of the NGR234
genome (7) (see Fig. 2-A) were probed with the unique
sequences, less than a third of all the clones
hybridised. By comparing their respective positions in the
'contigs' (sets of contiguous cosmids), we found that positive
clones generally overlapped and were grouped in about 30
distinct chromosomal regions. Since two thirds of the 24 cosmids
necessary to cover pNGR234a hybridised to fragments not shared
by R.fredii USDA257, the symbiotic plasmid, in proportion to
its size, carries a greater number of unique sequences than the
chromosome. Assignment to distinct restriction fragments was
achieved by probing Southern blots of XhoI-restricted cosmid
DNAs representative of pNGR234a. Specificity of the DNA
subtraction was confirmed by the absence of hybridisation signals
to restriction fragments known to carry genes (such as nodABC,
nodS, nolB and nifKDH) shared by both NGR234 and USDA257
(see Fig. 2-B).

Figure 2. A. Hybridisation patterns of the labelled NGR234 DNA fragments
not shared by R.fredii USDA257 on dot-blot filters prepared from cosmid DNAs
covering approximately 97% of the NGR234 genome. The 24 clones representing the symbiotic
plasmid pNGR234a are boxed. B. Left, XhoI restriction digests of overlapping
cosmid DNAs covering half of pNGR234a. Right, Southern filter of the same
gel probed as in panel A. Positions of representative known genes are shown
on the autoradiogram with numbered circles: 1) nifKDH, 2) nolB and 3) ORF-1.

Figure 3. A. Protein alignments for clones pX177, pX185, pR57, pR64 and pRK21
assembled using the BLAST programme. Upper lines correspond to the putative
protein product encoded by one of the six reading frames of the NGR234 query sequence.
The most significant database matches are displayed on the lower lines, with the
identical, conserved (double dot) and less conserved (single dot) amino acids listed
in the middle lines. In the case of pX185, however, two alignments are provided,
above and below the query sequence, with the PROSITE signature for the E1-E2
class of ATPases (accession number PS00154) marked in bold. The aspartate
residue believed to undergo phosphorylation is marked with an asterisk. Numbers
next to the first and last amino acids of each line show their respective positions
in the homologous protein. The RI methylation domain of the C.crescentus McpA
protein is underlined, while • marks the potential methylation site based on
the reported methyl-accepting peptide RI in E.coli McpI (34). In the alignment of
pRK21, the bold amino acids correspond to the ATP-binding site signature reported
in the PROSITE database (accession number PS00211). B. One-gap protein alignments
of the putative ORF-1 product (centre line) with E.coli Lrp (upper line) and
Bkdr from P.putida (lower line). Peptide ends are marked with •.
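The enrichment of unique sequences on pNGR234a can be checked with simple arithmetic on the figures above. This Python sketch assumes that the 24 plasmid cosmids are among the 309 clones on the dot-blot filters (they are boxed in Fig. 2-A) and, like the text, counts cosmids rather than base pairs, so overlaps between clones are ignored. Variable names are illustrative.

```python
import math
from fractions import Fraction

TOTAL_COSMIDS = 309    # cosmids on the dot-blot filters (~97% of the genome)
PLASMID_COSMIDS = 24   # cosmids needed to cover pNGR234a, assumed to be among the 309

# "Two thirds of the 24 cosmids" hybridised.
plasmid_positive = PLASMID_COSMIDS * Fraction(2, 3)
# "Less than a third of all the clones": the largest whole number below 309/3.
max_total_positive = math.ceil(Fraction(TOTAL_COSMIDS, 3)) - 1
max_chromosome_positive = max_total_positive - plasmid_positive
chromosome_cosmids = TOTAL_COSMIDS - PLASMID_COSMIDS

plasmid_rate = plasmid_positive / PLASMID_COSMIDS
max_chromosome_rate = max_chromosome_positive / chromosome_cosmids

print(f"pNGR234a: {plasmid_positive} of {PLASMID_COSMIDS} cosmids "
      f"({float(plasmid_rate):.0%})")
print(f"chromosome: at most {max_chromosome_positive} of {chromosome_cosmids} "
      f"({float(max_chromosome_rate):.0%})")
```

Under these assumptions about 67% of the plasmid cosmids hybridised, against at most about 30% of the chromosomal cosmids, which is consistent with the statement that pNGR234a carries proportionally more unique sequences.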
Figure 4. Genetic and SpeI restriction map of the 500 kb symbiotic plasmid
pNGR234a. SpeI restriction sites are marked with S. Approximate positions of
the known genes and the newly identified loci pRK21, pR57, pR64, syrM and
ORF-1 (pX140 and pRK41) are shown on the outer circle.

A sample of the unique fragments was analysed by shot-gun
sequencing. Of 100 randomly picked clones, the sequences of
73 inserts could be grouped into 24 families of similar elements.
Subsequently, a limited set of 59 non-redundant sequences was
matched against the nucleotide and amino-acid databases. Three
clones with significant homologies extending over the entire DNA
sequence were studied further. Clones pX140, pX177 and pX185
matched (see Fig. 3-A) a segment of a leucine regulatory protein
from E.coli (20), a sequence from R.leguminosarum coding for
an outer membrane protein (21) and a cadmium resistance
protein from Staphylococcus aureus (22), respectively.

[Figure 5: (A) restriction map of the 1.4 kb XhoI fragment (XhoI, HindIII, SalI, SacI, SmaI and ClaI sites; ORF-1 and ORF-2; scale bar 200 bp); (B) nucleotide sequence with the deduced amino acid sequences.]
Figure 5. A. Restriction map of the 1.4 kb XhoI fragment cloned in pXB315X1.4, with the positions of the two open reading frames (shaded boxes).
B. Complete DNA sequence of the same restriction fragment. Probable ribosome binding sites (RBS), putative start codons (ATGs and one alternative GTG) and nonsense codons (marked Opa) are underlined. The deduced amino acid sequences of the two ORFs are displayed under the nucleotide sequence.
Shot-gun sequence analysis of pNGR234a restriction
fragments that carry induced transcripts not shared by
USDA257
To identify flavonoid-inducible loci of pNGR234a that are not
present in the USDA257 genome, we combined data from the
competitive RNA hybridisation with those shown in Fig. 2. A
Sau3AI library was prepared from the 18 XhoI restriction fragments that gave
hybridisation signals in both experiments, did not carry any
known symbiotic loci and are dispersed over approximately 57 kb
of pNGR234a. Four (pR57, pR64, pRK21 and
pRK41) out of 150 sequences of the library (representing approximately 28
kb) showed very strong homologies (Fig. 3-A) to a succinate-semialdehyde dehydrogenase from E.coli (SWISSPROT accession
number P25526), a methyl-accepting chemotaxis protein from
Caulobacter crescentus (23), the UGPC protein from E.coli (24)
and the leucine-responsive regulatory protein, respectively.
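Combining the two screens amounts to a set intersection followed by removal of fragments that carry known symbiotic loci. A minimal Python sketch; the fragment identifiers, sizes and function names are hypothetical.

```python
def select_candidates(rna_positive, unique_positive, known_symbiotic):
    """Fragments positive in both screens and free of known symbiotic loci.

    rna_positive:    fragments detected with flavonoid-induced RNA (competitive hybridisation)
    unique_positive: fragments hybridising to NGR234-specific subtracted DNA
    known_symbiotic: fragments already known to carry symbiotic genes
    """
    return sorted((set(rna_positive) & set(unique_positive)) - set(known_symbiotic))


def total_kb(fragments, sizes_kb):
    """Summed size (kb) of the selected fragments."""
    return sum(sizes_kb[f] for f in fragments)


if __name__ == "__main__":
    # Hypothetical XhoI fragment IDs and sizes, for illustration only.
    rna = {"X1", "X2", "X3", "X4"}
    unique = {"X2", "X3", "X4", "X5"}
    known = {"X4"}  # e.g. a fragment carrying an already characterised nod locus
    sizes = {"X1": 2.0, "X2": 3.5, "X3": 1.5, "X4": 5.2, "X5": 0.8}
    chosen = select_candidates(rna, unique, known)
    print(chosen, total_kb(chosen, sizes))  # ['X2', 'X3'] 5.0
```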
Detailed analysis of the selected clones
Confidence in gene identification by homology search clearly
depends upon the accuracy of the query sequence and increases
with homologies extending over larger DNA segments. To verify
that the homologies obtained for the seven selected clones were
not fortuitous, we cloned the corresponding genomic loci from
the ordered cosmid library of NGR234. For each of the seven
loci, we confirmed and extended the original sequence using
the appropriate genomic fragment as template and two synthetic
primers designed to span the DNA segment showing the highest
degree of homology with the database entries.
Two sets of overlapping cosmids were homologous to pX177:
clones pXBS23 and pXBS4 from pNGR234a as well as cosmids
pXB482 and pXB739 from the chromosome. Sequence data
confirmed that the segment of pX177 which is homologous to
the R.leguminosarum OmpIII gene mapped to a 3 kb PstI
restriction fragment from the chromosome. Clone pX185 was
assigned to pXB424 of the chromosome. The homologies
reported with S.aureus CadA (22) and R.meliloti FixI (25)
proteins correspond to a highly conserved domain in cation
transporters with E1-E2 ATPase activity (Fig. 3-A).
ICATOOL analysis showed that the pR64 and pR57 sequences
were complementary and overlapped by 178 bases. Combined,
they form a single Sau3AI fragment of 286 bp that maps to a
3 kb PstI-XhoI restriction fragment shared by pXB43 and
pXB315 (see Fig. 4 for approximate position). Interestingly,
pR64 and pR57 gave different and statistically significant results
in the BLAST analysis. First, pR64 showed a high degree of
homology to the carboxy-terminal domain of several E.coli and
Caulobacter crescentus methyl-accepting chemotaxis proteins;
this homology extends over the second of the two proposed methylation
domains (KI and RI), adjacent to a well-conserved cytoplasmic
region (23). On the complementary strand, the putative peptide
encoded by pR57 is highly homologous to the C-terminal domain
of several semialdehyde dehydrogenases. Since the putative
proteins from both sequences correspond to very conserved
carboxy-terminal domains, with nonsense codons correctly
placed to match the expected protein lengths, this Sau3AI
fragment appears to span the 3' ends of two genes transcribed in
opposite directions that overlap by 34 bp.
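Reconstructing one Sau3AI fragment from two complementary clones (pR64 and pR57) can be sketched as a suffix/prefix overlap search after reverse-complementing one read. ICATOOL's actual algorithm is not described here; this exact-match merge, its `min_overlap` parameter and the function names are assumptions, and the test fragment is synthetic, chosen only to reproduce the reported 178-base overlap and 286 bp length.

```python
import random


def revcomp(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def merge_complementary(read_a: str, read_b: str, min_overlap: int = 20):
    """Merge read_a with the reverse complement of read_b.

    Returns (merged_sequence, overlap_length) for the longest exact
    suffix/prefix overlap of at least min_overlap bases, or None.
    """
    a, b_rc = read_a.upper(), revcomp(read_b)
    for olen in range(min(len(a), len(b_rc)), min_overlap - 1, -1):
        if a[-olen:] == b_rc[:olen]:
            return a + b_rc[olen:], olen
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    fragment = "".join(rng.choice("ACGT") for _ in range(286))  # synthetic 286 bp fragment
    read_a = fragment[:232]           # one clone, read on the top strand
    read_b = revcomp(fragment[54:])   # the other clone, read on the bottom strand
    merged, overlap = merge_complementary(read_a, read_b)
    print(len(merged), overlap, merged == fragment)
```

Two 232-base reads overlapping by 178 bases give 232 + 232 - 178 = 286 bp. Real reads contain sequencing errors, which an exact-match merge like this one would not tolerate.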
The pRK21 insert was mapped to the 5.2 kb XhoI restriction
fragment of pXB807 (see position in Fig. 4). This DNA fragment
was cloned (pXB807X5.2) and partially sequenced. About 800
bp of the NGR234 RS.1 repeat element (one copy on pNGR234a
and three on the chromosome; 6) cover one end of this DNA
fragment, while a syrM-homologous sequence was identified at
the other end (data not shown). Alignments with database
entries showed strong homologies, at both the DNA and protein
levels, to the UGPC locus from E.coli. The putative pRK21 protein
product also displayed a high degree of homology to other related
ATP-binding proteins, such as R.leguminosarum and R.loti NodI,
that are involved in the active transport of small hydrophilic
molecules across the cytoplasmic membrane. Despite these
homologies, we believe that pRK21 does not code for the
NGR234 NodI product, as recent sequence data show that nodI
is part of the nodABCIJ operon in Rhizobium sp. NGR234
(B.Relić, unpublished). Finally, ICATOOL analysis demonstrated
that the pX140 and pRK41 sequences are complementary and
overlap by 125 bp. Both clones map to a 1.4 kb XhoI
restriction fragment carried by the cosmid pXB315 (see ORF-1
map location in Fig. 4), which is contiguous to the 3 kb PstI-XhoI
restriction fragment carrying the pR57 and pR64 sequences.
To test whether any of the seven sequences described above are part of
open reading frames whose expression is induced by flavonoids,
we prepared PCR-amplified products from the selected inserts
using primers designed to flank the DNA segment with the highest
degree of homology in the BLAST analysis. Probing a Southern
transfer of the resulting PCR products in a competitive RNA
hybridisation experiment showed that only inserts from pRK41
and pX140 hybridised to the labelled RNA prepared from
flavonoid-induced NGR234 bacteria. Induction of this locus
was later confirmed by Northern analysis (data not shown).
The deduced peptide of ORF-1 is strongly homologous to
E.coli Lrp and P.putida Bkdr
To test the reliability of our screening strategy, we analysed the
Lrp-like locus that is both inducible and unique to NGR234.
First, to demonstrate that this sequence is truly unique, we probed
restricted genomic DNAs from USDA257 and NGR234 with a
32P-labelled insert of pX140. As expected, only one strong band
was observed in NGR234, and there was no cross-hybridisation
with R.fredii DNA (data not shown). Second, to determine
whether the pRK41 and pX140 inserts are part of a larger open
reading frame, we sequenced the entire 1.4 kb XhoI restriction
fragment cloned in pXB315X1.4 (Fig. 5). Both pX140 and pRK41
sequences matched the 381 bp ORF-1. BLAST analysis showed
that the putative ORF-1 product is highly homologous to two
regulatory proteins, one from E.coli and the other from Pseudomonas
putida [Lrp and Bkdr, respectively (26)]. The one-gap alignment
presented in Fig. 3-B predicts that the deduced amino acid
sequence of the protein encoded by ORF-1 has 37% and 40%
identity, or 78% and 81% homology (when similar amino acids
are included), to Lrp and Bkdr, respectively. All scores are higher
than those proposed in a more flexible three-gap alignment of
the E.coli Lrp and AsnC proteins (20). Extensive homology of
the ORF-1 amino-terminal domain with the E.coli Lrp helix-turn-helix domain suggests that protein synthesis initiates
at the GTG codon rather than at the downstream ATG (Fig. 5-B).
If translation starts at the alternative GTG codon, the NGR234 Lrp
homologue is 127 amino acids long, 35 amino acids shorter at its amino-terminus than the E.coli Lrp. A second ORF (ORF-2; see
Fig. 5-A) was identified on the 1.4 kb XhoI fragment. The
deduced peptide sequence of ORF-2 shares 28% identity and 72%
homology (when similar amino acids are included in the analysis)
with the protein A3 from Agrobacterium tumefaciens IS869 (27)
in a no-gap protein alignment (data not shown).
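The choice between the upstream GTG and the downstream ATG can be made concrete with a small Python sketch that lists every in-frame start codon (ATG or GTG) upstream of a stop and translates from any of them, reading the initiating codon as methionine as bacterial initiator tRNA does. The codon table is the standard genetic code (NCBI table 1; the bacterial table 11 assigns the same amino acids). The toy sequence and function names are invented for illustration; this is not the ORF-1 sequence.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons ordered TCAG x TCAG x TCAG ('*' = stop).
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def orfs_in_frame(seq: str, frame: int, starts=("ATG", "GTG")):
    """Return (start_positions, stop_end) for each stop-terminated ORF in one frame.

    start_positions lists every in-frame start codon since the previous stop,
    so an upstream GTG can be compared with a downstream ATG.
    """
    seq = seq.upper()
    results, pending = [], []
    for pos in range(frame, len(seq) - 2, 3):
        codon = seq[pos:pos + 3]
        if CODON_TABLE.get(codon) == "*":
            if pending:
                results.append((pending, pos + 3))
            pending = []
        elif codon in starts:
            pending.append(pos)
    return results


def translate(orf: str) -> str:
    """Translate an ORF that begins at a start codon; the first codon is read as Met."""
    orf = orf.upper()
    protein = "".join(CODON_TABLE.get(orf[i:i + 3], "X")
                      for i in range(0, len(orf) - 2, 3))
    protein = protein.rstrip("*")
    return "M" + protein[1:] if protein else protein


if __name__ == "__main__":
    seq = "CC" + "GTG" + "AAA" + "ATG" + "GCT" + "TAA" + "G"  # toy sequence
    for starts, stop_end in orfs_in_frame(seq, frame=2):
        for s in starts:
            print(seq[s:s + 3], s, translate(seq[s:stop_end]))
```

Here the GTG start gives the longer product (MKMA) and the downstream ATG the shorter one (MA); homology with a known N-terminal domain, as argued above for ORF-1, is what favours one start over the other.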
Symbiotic phenotype of the ORF-1::Ω mutant
To assay the symbiotic activity of the ORF-1 locus, a mutant carrying
the Omega interposon in the HindIII site internal to the gene was
constructed (NGR234ΩORF-1). In comparison with wild-type
NGR234, this mutant caused a 4.5 day delay in nodulation of
V.unguiculata (measured 21 d after inoculation). On
L.leucocephala and P.tuberosus, the number of nodules increased
by more than 65% in comparison with the wild type, while on
C.caeruleum the nodule number decreased by about 25%.
DISCUSSION
Random sequencing has been used to study the genome structure
of various organisms including the laryngotracheitis virus (28),
Mycoplasma genitalium (29), Saccharomyces cerevisiae
(C.J.Davies, Ph.D. thesis, 1991) and the pufferfish, Fugu
rubripes rubripes (30). In association with competitive RNA
hybridisation and/or subtraction DNA hybridisation, it becomes
a potent method to compare related genomes as well as to target
actively transcribed genes. The screening strategy outlined in
Fig. 1 is based on the NGR234 physical map, and expands the
level of analysis from limited DNA segments to the whole
replicon. It is flexible, since minor modifications to the
RNA-DNA hybridisation procedures allow targeting of genes
induced or repressed under many different conditions. However, only genes
with relatively strong homologies to database entries will be
identified in this way.
Data from DNA subtraction hybridisations confirmed that
Rhizobium species NGR234 and R.fredii strain USDA257 are
phylogenetically related, and share most of their genomic
background. No essential gene was identified in the random
sequence analysis of the Sau3AI fragments remaining after two
cycles of DNA subtraction hybridisation. Homologies with IS
elements indexed in the databases (data not shown), and the
absence from the USDA257 genome of the RS.1 transposon-like
repeat (X.Perret, unpublished), suggest that many of the
sequences 'unique' to NGR234 are mobile elements which have
accumulated since the two bacteria diverged. The higher proportion
of the 'unique' sequences in the symbiotic plasmid compared with
the rest of the genome suggests that pNGR234a tolerates
integration of non-endogenous sequences better than the
chromosome. Icatool and ClustalV analysis of more than 100
NGR234 Sau3AI fragments not shared by R.fredii USDA257
revealed inherent limitations in the library prepared from the
subtracted fragments. There was 40% redundancy amongst the
clones analysed, with similar sequences grouped into 24 families
of as many as 10 elements. Probing Southern blots of multiple
restriction digests of NGR234 genomic DNA with sequences
representative of some of these largest families showed that the
redundancy does not result from repeated elements in the NGR234
genome (data not shown). Moreover, several sequence
mismatches were found among nearly identical fragments cloned
in both orientations. This indicates that PCR amplification
of the subtracted sequences prior to cloning creates or accentuates
an unbalanced distribution of fragments, and introduces small
sequence errors through Taq polymerase misincorporation. This biased
fragment distribution prevents use of the level of redundancy in
the pool of analysed clones to estimate the total length of
sequences specific to NGR234. Nevertheless, valuable genetic
information can be retrieved from the library of subtracted
fragments, particularly with the BLAST software, which can
detect distant protein homologies even in the presence of
common sequencing errors such as frameshifts and base substitutions.
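The redundancy-based estimate that the biased fragment distribution rules out can be made explicit. If clones were drawn uniformly at random from N equally represented fragments, n clones would be expected to contain N(1 - (1 - 1/N)^n) distinct sequences, so an observed count of distinct sequences could be inverted to estimate N. The Python sketch below does this by bisection. It only illustrates the uniform-sampling assumption that the PCR bias described above violates; it is not an analysis performed in this work, and the function names are hypothetical.

```python
def expected_distinct(N: float, n: int) -> float:
    """Expected number of distinct fragments among n clones drawn
    uniformly at random (with replacement) from N fragments."""
    return N * (1.0 - (1.0 - 1.0 / N) ** n)


def estimate_pool_size(n: int, d: int, rel_tol: float = 1e-10) -> float:
    """Invert expected_distinct by bisection: find N such that
    expected_distinct(N, n) == d. Valid only under uniform sampling."""
    if not 0 < d < n:
        raise ValueError("need 0 < distinct < clones for a finite estimate")
    lo = hi = float(d)
    # expected_distinct rises towards n as N grows, so doubling hi
    # eventually brackets the solution.
    while expected_distinct(hi, n) < d:
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = (lo + hi) / 2.0
        if expected_distinct(mid, n) < d:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

If, purely hypothetically, 40% redundancy were read as 60 distinct sequences among 100 clones, `estimate_pool_size(100, 60)` would return the N at which 60 distinct sequences are expected, and multiplying by a mean fragment length would give a total length. Because over-represented fragments lower the number of distinct sequences seen for a given pool, applying this uniform model to a PCR-biased library would tend to underestimate the pool, which is why the estimate is not used here.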
This combination of techniques led to the identification of
several new loci with putative symbiotic functions. Among these,
the sequence homologous to the symbiotic regulator syrM is
adjacent to pRK21; the clone pX177, with homology to the ompIII
gene, which is symbiotically repressed in R.leguminosarum, has
been mapped to the chromosome of NGR234; and pX185 carries
a highly conserved domain with E1-E2 ATPase activity, common
in cation transporters such as the FixI protein. In addition, pR57
has been shown to be homologous to the C-terminal domain of
succinate- and other semialdehyde dehydrogenases. In R.meliloti,
a mutant with a low succinic semialdehyde dehydrogenase activity
is defective in symbiotic nitrogen fixation. More interestingly,
we identified ORF-1, a new symbiotic gene. The peptide encoded
by ORF-1 is very similar to the regulatory proteins Lrp from
E.coli and BkdR from P.putida. BkdR is a positive activator of
the branched-chain keto acid dehydrogenase operon, while Lrp
combines repressor and activator activities that coordinate various
functions involved in global responses (31). The conservation
in the ORF-1 product of all but one of the amino acids known
to affect the DNA-binding ability of Lrp (32) suggests that ORF-1
may have retained regulatory functions. In the presence of a suitable
carbon source, the Lrp mutant of E.coli grows normally.
Similarly, the ORF-1 mutant of NGR234 does not display an
extreme phenotype. However, the Ω::ORF-1 mutation modifies
the efficiency of nodulation by NGR234. Depending upon the
plant tested, we observed a significant delay in nodulation, a
reduction or even a large increase in the number of nodules. This
symbiotic phenotype, together with the observed flavonoid
induction of ORF-1 and its location on the non-essential symbiotic
plasmid pNGR234a, suggests that this gene is probably not
involved in the regulation of operons similar to those controlled
by Lrp and BkdR. Furthermore, the absence of homologous genes
in other rhizobia (data not shown), as well as in the closely related
R.fredii USDA257, suggests that NGR234 has developed
additional systems to regulate nodulation. Another symbiotic
regulatory system, nodVW, has been described in
Bradyrhizobium japonicum (33).
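Checking whether ORF-1 keeps the residues known to affect DNA binding by Lrp (32) requires mapping positions numbered on the Lrp sequence onto the columns of an alignment, where gaps shift the numbering. The Python sketch below shows one way to do this; the helper name is hypothetical, and the aligned sequences and positions in the test example are placeholders, not the actual Lrp or ORF-1 residues.

```python
from typing import Dict, Iterable, Tuple


def residues_at_reference_positions(
    ref_aln: str, query_aln: str, positions: Iterable[int]
) -> Dict[int, Tuple[str, str, bool]]:
    """For each 1-based position in the ungapped reference sequence,
    return (reference residue, query residue, identical?) read from
    the alignment column that holds it. '-' marks a gap."""
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    # Map reference residue numbers to alignment columns, skipping gaps.
    column_of = {}
    ref_pos = 0
    for col, residue in enumerate(ref_aln):
        if residue != "-":
            ref_pos += 1
            column_of[ref_pos] = col
    report = {}
    for pos in positions:
        col = column_of[pos]
        ref_res, query_res = ref_aln[col], query_aln[col]
        report[pos] = (ref_res, query_res, ref_res == query_res)
    return report
```

Counting the entries whose third field is `True` over the set of known DNA-binding positions gives the kind of "all but one conserved" summary used in the text; a gap in the query at one of those positions is reported as not conserved.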
ACKNOWLEDGEMENTS
We wish to thank M.Trower, G.Elgar and D.Gerber for their
help in many aspects of this work. We are grateful to J.Parsons
and S.Aparicio for their assistance with the computer analysis.
Financial support was provided by the Fonds National Suisse de
la Recherche Scientifique (Grants #31-30950.91 and
31-36454.92) and the Fondation Sandoz pour l'Avancement des
Sciences Medico-biologiques. R.Fellay gratefully acknowledges
the receipt of an EMBO short-term fellowship.
REFERENCES
1. Trinick, M.J. (1980) J. Appl. Bacteriol., 49, 39-53.
2. Heron, D.S. and Pueppke, S.G. (1984) J. Bacteriol., 160, 1061-1066.
3. Relić, B., Perret, X., Golinowsky, W., Pueppke, S.G., Krishnan, H.B.
and Broughton, W.J. (1993) Science, submitted.
4. Krishnan, H.B., Lewin, A., Fellay, R., Broughton, W.J. and Pueppke, S.G.
(1992) Mol. Microbiol., 6, 3321-3330.
5. Lewin, A., Cervantes, E., Wong, C.-H. and Broughton, W.J. (1990) Mol.
Plant-Microbe Interact., 3, 317-326.
6. Perret, X., Broughton, W.J. and Brenner, S. (1991) Proc. Natl. Acad. Sci.
USA, 88, 1923-1927.
7. Perret, X. (1992) Ph.D. thesis # 2489, University of Geneva, Geneva,
Switzerland.
8. Fellay, R., Perret, X., Broughton, W.J. and Brenner, S. (1993) Mol.
Microbiol., submitted.
9. Broughton, W.J., Wong, C.-H., Lewin, A., Samrey, U., Myint, H., Meyer
z.A., H., Dowling, D.N. and Simon, R. (1986) J. Cell Biol., 102,
1173-1182.
10. Bjourson, A.J., Stone, C.E. and Cooper, J.E. (1992) Appl. Environ.
Microbiol., 58, 2296-2301.
11. Stanley, J., Dowling, D.N., Stucker, M. and Broughton, W.J. (1987) FEMS
Microbiol. Lett., 48, 25-30.
12. Sambrook, J., Fritsch, E.F. and Maniatis, T. (1989) Molecular Cloning:
A Laboratory Manual, second edition. Cold Spring Harbor Laboratory Press,
Cold Spring Harbor.
13. Sanger, F., Nicklen, S. and Coulson, A.R. (1977) Proc. Natl. Acad. Sci.
USA, 74, 5463-5467.
14. Parsons, J.D., Brenner, S. and Bishop, M.J. (1992) Comput. Appl. Biosci.,
8, 461-466.
15. Higgins, D.G. and Sharp, P.M. (1988) Gene, 73, 237-244.
16. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990)
J. Mol. Biol., 215, 403-410.
17. Prentki, P. and Krisch, H.M. (1984) Gene, 29, 303-313.
18. Quandt, J. and Hynes, M.F. (1993) Gene, 127, 15-21.
19. Figurski, D.H. and Helinski, D.R. (1979) Proc. Natl. Acad. Sci. USA, 76,
1648-1652.
20. Willins, D.A., Ryan, C.W., Platko, J.V. and Calvo, J.M. (1991) J. Biol.
Chem., 266, 10768-10774.
21. de Maagd, R.A., Mulders, I.H.M., Canter Cremers, H.C.J. and Lugtenberg,
B.J.J. (1992) J. Bacteriol., 174, 214-221.
22. Nucifora, G., Chu, L., Misra, T.K. and Silver, S. (1989) Proc. Natl. Acad.
Sci. USA, 86, 3544-3548.
23. Alley, M.R.K., Maddock, J.R. and Shapiro, L. (1992) Genes Dev.,
6, 825-836.
24. Overduin, P., Boos, W. and Tommassen, J. (1988) Mol. Microbiol., 2,
767-775.
25. Kahn, D., David, M., Domergue, O., Daveran, M.L., Ghai, J., Hirsch,
P.R. and Batut, J. (1989) J. Bacteriol., 171, 929-939.
26. Madhusudhan, K.T., Lorenz, D. and Sokatch, J.R. (1993) J. Bacteriol., 175,
3934-3940.
27. Paulus, F., Canaday, J., Vincent, F., Bonard, G., Kares, C. and Otten,
L. (1991) Plant Mol. Biol., 16, 601-614.
28. Griffin, A.M. (1989) J. Gen. Virol., 70, 3085-3089.
29. Peterson, S.N., Schramm, N., Hu, P.-C., Bott, K.F. and Hutchison, C.A.
(1991) Nucleic Acids Res., 19, 6027-6031.
30. Brenner, S., Elgar, G., Sandford, R., Macrae, A., Venkatesh, B. and
Aparicio, S. (1993) Nature, 366, 265-268.
31. Newman, E.B., D'Ari, R. and Lin, R.T. (1992) Cell, 68, 617-619.
32. Platko, J.V. and Calvo, J.M. (1993) J. Bacteriol., 175, 1110-1117.
33. Göttfert, M., Grob, P. and Hennecke, H. (1990) Proc. Natl. Acad. Sci. USA,
87, 2680-2684.
34. Kehry, M.R., Bond, M.W., Hunkapiller, M.W. and Dahlquist, F.W. (1983)
Proc. Natl. Acad. Sci. USA, 80, 3599-3603.
35. Hanahan, D. (1983) J. Mol. Biol., 166, 557-580.
36. Cami, B. and Kourilsky, P. (1978) Nucleic Acids Res., 5, 2381-2390.
37. Stanley, J., Dowling, D.N. and Broughton, W.J. (1988) Mol. Gen.
Genet., 215, 32-37.
38. Morrison, N.A., Hau, C.Y., Trinick, M.J., Shine, J. and Rolfe, B.G. (1983)
J. Bacteriol., 153, 527-531.