Nucleic Acids Research, 1994, Vol. 22, No. 8 1335-1341

Subtraction hybridisation and shot-gun sequencing: a new approach to identify symbiotic loci

X.Perret1,2, R.Fellay1, A.J.Bjourson3, J.E.Cooper3, S.Brenner2 and W.J.Broughton1,*

1Laboratoire de Biologie Moléculaire des Plantes Supérieures, Université de Genève, 1 chemin de l'Impératrice, 1292 Chambésy, Switzerland, 2Molecular Genetics, University of Cambridge School of Clinical Medicine, Addenbrooke's Hospital, Hills Road, Cambridge CB2 2QQ and 3Department of Mycology and Plant Pathology, The Queen's University of Belfast, Belfast BT9 5PX, UK

Received March 14, 1994; Accepted March 22, 1994

EMBL accession no. X74134

ABSTRACT
Traditionally, new loci involved in the Rhizobium-legume symbiosis have been identified by transposon mutagenesis and/or complementation. Wide dispersal of the symbiotic loci in Rhizobium species NGR234, as well as the large number of potential host-plants to be screened, greatly reduce the efficiency of these techniques. As an alternate strategy designed to identify new NGR234 genes involved in the early stages of the symbiosis, we combined data from competitive RNA hybridisation, subtractive DNA hybridisation and shot-gun sequencing. On the assumption that the expression of most nodulation genes is triggered by compounds released by the host-plant, we identified, in the ordered cosmid library of the large symbiotic plasmid pNGR234a, restriction fragments that carry transcripts induced by flavonoids. To target genes not present in the closely related strain R.fredii USDA257, we selected fragments that also carried sequences purified by subtractive DNA hybridisation. Shot-gun sequencing of this subset of fragments led to the identification of sequences with strong homology to diverse prokaryotic genes/proteins.
Amongst these, a symbiotically active ORF from pNGR234a is highly homologous to the leucine responsive regulatory protein of Escherichia coli (Lrp), is induced by flavonoids, and is not present in USDA257.

*To whom correspondence should be addressed

INTRODUCTION
Symbiotic associations between leguminous plants and soil bacteria belonging to the genera Azorhizobium, Bradyrhizobium and Rhizobium lead to the formation of nitrogen-fixing root structures called nodules. In contrast to strains from temperate regions, which tend to have a limited host-range, tropical rhizobia such as Rhizobium species NGR234 (1) and R.fredii USDA257 (2) nodulate a wide variety of host-plants. Tests on more than 400 different legumes have shown that NGR234 is able to nodulate at least 75 plant genera, including the non-legume Parasponia andersonii (1; 3; S.G.Pueppke and W.J.Broughton, unpublished). Comparative studies have shown that R.fredii USDA257 nodulates an exact subset of the hosts of NGR234 (S.G.Pueppke and W.J.Broughton, unpublished). At the nucleotide level, several symbiotic loci, including nodABC (3) and nodS (4), are almost perfectly conserved, suggesting a very close phylogenetic relationship between the two rhizobia. Interestingly, the nodSU genes that allow NGR234 to nodulate Leucaena species (5) are present in the USDA257 genome. A deletion in the promoter region renders nodSU inactive, however, and is responsible for the Nod- phenotype of USDA257 on Leucaena (4). Widespread dispersal of the symbiotic loci in NGR234 (5, 6), coupled with the large number of potential hosts to be screened, complicates traditional genetic approaches towards identifying symbiotic genes (random mutagenesis, interspecies complementation, etc). Accordingly, we designed an alternate strategy to identify genes involved in the early stages of nodulation (outlined in Fig. 1).
The ordered cosmid library which covers the symbiotic plasmid pNGR234a (6), as well as 97% of the remaining 5.7 megabases of the NGR234 genome (7), was used to index the position of loci whose expression is triggered by plant signals (e.g. flavonoids). Many XhoI restriction fragments, dispersed over pNGR234a and carrying flavonoid-inducible genes, were identified by competitive RNA hybridisation (8). Some of these fragments carried known inducible loci, such as the nodABC and nodSU genes. Concomitantly, using DNA subtraction hybridisation, we purified NGR234 sequences that are absent from the genome of R.fredii USDA257. By probing the cosmid library with these 'unique' sequences, we were able to assign them first to certain cosmid clones and later to specific XhoI restriction fragments. To target flavonoid-inducible loci that are not present in USDA257, we combined the results of the competitive RNA and subtractive DNA hybridisations. In this way, we identified a subset of restriction fragments that carry sequences not shared by USDA257, as well as inducible transcripts. Shot-gun sequencing of these DNA fragments, together with a fast search for homology among existing nucleic acid and protein databases, identified a number of putative genes with strong homology to diverse prokaryotic genes/proteins.

Figure 1. Flow diagram showing the methods used to analyze the symbiotic plasmid pNGR234a for loci induced by flavonoids and not shared by R.fredii USDA257.
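The selection logic behind the combined screen can be written as a set operation: keep the restriction fragments that are positive in both hybridisations, then drop those already known to carry symbiotic loci. The following is only a minimal sketch of that logic; the function name and the fragment labels are hypothetical and do not correspond to real XhoI fragments of pNGR234a.

```python
def select_candidate_fragments(induced, unique, known_loci):
    """Return fragments positive in both screens that carry no known symbiotic locus.

    induced    -- fragments giving a signal in competitive RNA hybridisation
    unique     -- fragments hybridising to the subtracted (NGR234-specific) DNA
    known_loci -- fragments already known to carry symbiotic genes
    """
    return sorted((set(induced) & set(unique)) - set(known_loci))


# Hypothetical fragment labels, for illustration only.
induced = {"X01", "X02", "X03", "X07"}
unique = {"X02", "X03", "X05", "X07"}
known_loci = {"X07"}

candidates = select_candidate_fragments(induced, unique, known_loci)
print(candidates)  # ['X02', 'X03']
```

Only the fragments returned by such an intersection would go on to shot-gun sequencing, which is what keeps the sequencing effort small relative to the size of the replicon.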
MATERIAL AND METHODS

RNA competition hybridisation
Rhizobium species NGR234 was grown at 28°C in RMM minimal medium (9) with succinate as the carbon source. Flavonoid induction was performed by adding 200 nM of daidzein to cultures with a turbidity of 0.6 at 600 nm. Cells were harvested at different times after induction and resuspended in a pre-warmed solution (90°C) consisting of equal volumes of phenol saturated with sodium acetate (pH 4.5) and 20 mM Tris-HCl containing 600 mM NaCl, 1 mM EDTA, and 1% (w/v) SDS. RNA was extracted with phenol-chloroform, precipitated with ethanol and purified by centrifugation through a CsCl cushion at 115,000 × g for 1 h. To prepare radioactive probes, 10 to 15 ng of RNA was partially hydrolyzed in 125 mM NaOH for 25 min on ice, and labelled using T4 polynucleotide kinase and [γ-32P]ATP for 90 min at 37°C. The probes were purified by centrifugation through Sephadex G50 using Ultrafree-MC 0.45 µm filter units (Millipore, Bedford, MA, USA). Digested cosmid DNA was separated on 0.8% (w/v) agarose gels and transferred to 'GeneScreen Plus' nylon membranes, which were pre-hybridised overnight at 65°C in 50 mM Tris-HCl (pH 7.4), 0.2% (w/v) bovine serum albumin, 0.2% (w/v) Ficoll, 0.1% (w/v) sodium pyrophosphate, 1% (w/v) SDS, 1 M NaCl and 100-150 µg of non-labelled RNA prepared from non-induced rhizobia. Hybridisation was performed at 65°C for 20 h by adding the purified probes directly to the pre-hybridisation solution. Washing was performed in 1% (w/v) SDS, 1 × SSC (3 × 30 min at 65°C), followed by 15 min at RT in 0.2 × SSC.

DNA subtraction hybridisation (for a detailed protocol see Bjourson et al., 10)
Genomic DNAs from NGR234 (the 'probe' strain) and R.fredii USDA257 (the 'subtracter' strain) were prepared using standard procedures (10, 11). Approximately 1 µg of DNA from each strain was digested to completion with Sau3AI. 600 ng of specific linkers were ligated to ≈ 200 ng of each of the restricted DNAs.
Only the linker designed for the subtracter DNA was biotinylated and was synthesized using uracil in place of thymidine. One ng of the ligated probe DNA was amplified by 45 PCR cycles (80 sec denaturation at 94°C, 60 sec annealing at 55°C and 120 sec DNA polymerization at 72°C) in reaction mixtures containing 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 1.5 mM MgCl2, 0.01% (w/v) gelatin, 200 µM dNTPs, 1 µM primer (complementary to the linker) and 0.5 U of Taq DNA polymerase. With the exception of the biotinylated primer and the replacement of dTTP by dUTP, the same amplification conditions were applied to the subtracter DNA. To ensure that sufficient biotin groups were present for subsequent binding to streptavidin, the amplified subtracter DNA was additionally biotinylated. Subtraction hybridisation was performed in 0.5 ml centrifuge tubes, with ≈ 1 to 5 ng of PCR-amplified probe DNA (NGR234) and 20 µg of subtracter DNA (USDA257) in a hybridisation solution containing 50 mM HEPES (pH 7.5), 0.5 M NaCl, 1 mM EDTA, and 0.1% (w/v) SDS. The mixture was denatured at 99°C for 10 min, and incubated at 65°C for 48 h. To isolate the probe DNA from the subtraction mixture, 30 µg of streptavidin was added in two steps, and the mixture extracted several times with an equal volume of phenol-chloroform (50:50, v/v). Prior to cloning, the NGR234 DNA sequences left after two consecutive cycles of subtraction were PCR-amplified using the same amplification conditions as described above, except for the addition of uracil glycosylase to destroy any remaining traces of USDA257 subtracter DNA. The specific primer-linkers flanking the subtracted sequences were removed by digestion with Sau3AI. Fragments larger than 100 bp were purified from a 1.2% agarose gel using a DEAE cellulose membrane (Schleicher and Schuell GmbH, Dassel, Germany) and cloned into the BamHI restriction site of the Bluescript KS+ vector (Stratagene, La Jolla, CA, USA).
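Two quantities implied by the protocol above are easy to make explicit: the mass excess of subtracter over probe DNA in the hybridisation, and the time spent at the programmed temperatures during the 45-cycle amplification. The sketch below uses only the figures quoted in the text; the function names are illustrative, and the cycling time ignores ramping between temperatures and any initial or final steps, which the text does not specify.

```python
def subtracter_excess(probe_ng, subtracter_ug):
    """Mass ratio of subtracter DNA to probe DNA (both converted to ng)."""
    return subtracter_ug * 1000.0 / probe_ng


def cycling_time_hours(cycles, step_seconds):
    """Total programmed hold time for a PCR run, excluding temperature ramps."""
    return cycles * sum(step_seconds) / 3600.0


# 20 ug of USDA257 subtracter DNA against 1 to 5 ng of NGR234 probe DNA.
excess_low = subtracter_excess(probe_ng=5, subtracter_ug=20)
excess_high = subtracter_excess(probe_ng=1, subtracter_ug=20)
print(excess_low, excess_high)  # 4000.0 20000.0

# 45 cycles of 80 s denaturation, 60 s annealing, 120 s polymerization.
print(cycling_time_hours(45, (80, 60, 120)))  # 3.25
```

A several-thousand-fold excess of subtracter is what drives shared probe sequences into biotinylated hybrids, so that only sequences without a USDA257 counterpart remain after streptavidin binding and phenol-chloroform extraction.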
DNA isolation and sequencing
Bacterial strains and plasmids used are listed in Table 1. E.coli was grown on TYE or Terrific Broth (12). Bluescript recombinants were raised in E.coli DH5α, while Lorist2 cosmid clones were grown in E.coli 1046. Cosmid and Bluescript recombinant DNAs were prepared by standard alkaline minipreparations (12). A Sau3AI library of selected DNA fragments from the cosmid clones covering pNGR234a was prepared as follows: XhoI digested restriction fragments purified from agarose gels were pooled and cleaved with Sau3AI, extracted with phenol/chloroform and cloned in Bluescript KS+. DNA sequences of inserts larger than 100 bp were determined by the dideoxy method of Sanger et al. (13), using double stranded templates and the Sequenase II kit (United States Biochemical Corp., Cleveland, OH, USA).

Table 1. Bacterial strains, plasmids and vectors used in this work

Strains, plasmids and vectors | Relevant characteristics | References
Bacteria:
Escherichia coli DH5α | recA1, φ80 lacZΔM15 | 35
E.coli 1046 | recA1 | 36
Rhizobium sp. NGR234 | broad host-range, RifR | 37
R. sp. ANU265 | sym-plasmid cured derivative of NGR234 | 38
R.fredii USDA257 | broad host-range Rhizobium isolated from soybean nodules, KmR | 2
R. sp. NGR234ΩORF-1 | Ω mutant of the ORF-1 locus (Lrp like) | this work
Cosmid clones:
pXB315, pXB807 | from the Sym-plasmid pNGR234a |
pXB739, pXB424 | from the chromosome of NGR234 |
Bluescript KS+ clones:
pX140 | unique to NGR234, homologous to the leucine responsive regulatory protein from E.coli | this work
pX177 | unique to NGR234, strong homology with R.leguminosarum OmpIII locus | this work
pX185 | unique to NGR234, homologous to cation ATPases | this work
pR57 | homology with the C-terminal domain of E.coli GabD protein | this work
pR64 | homologous to the C-terminal domain of C.crescentus McpA protein | this work
pRK21 | homologous to the UGPC gene of E.coli and to the ATP-binding domain of that protein | this work
pXB315X1.4 | 1.4 kb XhoI fragment from pXB315 | this work
pXB315P3 | 3 kb PstI fragment from pXB315 | this work
pXB807X5.2 | 5.2 kb XhoI fragment from pXB807 | this work
pXB739P3 | 3 kb PstI fragment from pXB739 | this work
pRAF14 | Omega interposon inserted in the HindIII site of the 1.4 kb XhoI fragment from pXB315 | this work

DNA labelling and hybridisation procedures
32P-labelling of the Sau3AI fragments from NGR234 remaining after subtraction against USDA257 genomic DNA was performed by 3 cycles of PCR amplification as in Bjourson et al. (10). Inserts from selected Bluescript KS+ clones were radioactively labelled by PCR amplification using either T3-T7 primers that flank the entire insert, or synthesized primers designed to span that part of the sequence with the highest degree of homology to the database entries. Endonuclease digested DNAs were transferred to nylon membranes by standard Southern blotting procedures. Multiple samples of non-digested DNA were analysed by dot-blot hybridisation.

Data acquisition and computer analysis
Sequence data were collected on Macintosh computers (Apple Computer Inc., Cupertino, CA, USA) using the DNA Parrot system (Clontech, Palo Alto, CA, USA). Once transferred to Sun workstations (Sun Microsystems Inc., Mountain View, CA, USA), DNA sequences were analysed for redundant and similar elements using the ICATOOL programme (14). Similar sequences were subsequently aligned by CLUSTAL V (15). To identify homologies with published nucleotide or amino acid sequences, the non-redundant elements were individually compared to the latest version of the EMBL, GenBank, NBRF and SWISSPROT databases using BLAST software (16).
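Because Sau3AI fragments are cloned in either orientation, a redundancy screen has to compare each pair of reads on both strands. The sketch below illustrates that idea with exact suffix-prefix matching; it is not the algorithm used by ICATOOL, the function names are hypothetical, and real reads containing sequencing or Taq errors would need a tolerant alignment rather than exact string comparison.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def suffix_prefix_overlap(a, b, min_len=10):
    """Length of the longest suffix of a equal to a prefix of b.

    Returns 0 when no exact overlap of at least min_len bases exists.
    min_len must be at least 1.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def best_overlap(a, b, min_len=10):
    """Test b on both strands and in both orders; return (length, orientation)."""
    best = (0, None)
    for label, other in (("same strand", b),
                         ("reverse complement", reverse_complement(b))):
        n = max(suffix_prefix_overlap(a, other, min_len),
                suffix_prefix_overlap(other, a, min_len))
        if n > best[0]:
            best = (n, label)
    return best


# Toy reads: the second read is stored on the opposite strand.
read_a = "TTGACCGTAGGCTA"
read_b = reverse_complement("GGCTAACCTTCC")
print(best_overlap(read_a, read_b, min_len=4))  # (5, 'reverse complement')
```

Pairs reported with the "reverse complement" label correspond to inserts cloned in opposite orientations, which can then be merged into a single longer sequence before database searches.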
Construction and phenotype of NGR234 ORF-1 mutant
The HindIII site in the polylinker of the clone pXB315X1.4 was removed by digestion with ClaI and BamHI, the protruding ends filled in, and the clone restored by re-ligation. A SpR Omega interposon (17) was inserted in the remaining HindIII site internal to ORF-1. pRAF14 was derived by cloning the XhoI fragment containing Omega in the suicide vector pJQ200SK (18). This vector carries the sacB gene from Bacillus subtilis, which is inducible by sucrose and lethal when expressed in Gram-negative bacteria. pRAF14 was then mobilized into NGR234 by tri-parental mating using the helper plasmid pRK2013 (19). Transconjugants were selected and purified on RMM plates containing 100 mg/ml Rif, 50 mg/ml Sp and 1% (w/v) mannitol. Single colonies were grown in liquid TY and spread on plates containing both antibiotics and 5% (w/v) sucrose (to select for inactivation of the sacB gene). In NGR234ΩORF-1, marker exchange by double crossover was confirmed by Southern blot analysis. Nodulation capacity of the ORF-1 Omega mutant was compared to wild type NGR234 on Calopogonium caeruleum (Benth.) Hemsl., Leucaena leucocephala (Lam.) DeWit, Pachyrhizus tuberosus (Lam.) Spreng., and Vigna unguiculata (L.) Walp. Except for V.unguiculata, all plants were grown in Magenta jars (5). Twenty to thirty-five plants were used per treatment. They were harvested 35 d after inoculation with 10^9 bacteria per plant. Kinetics of nodulation in the 5 weeks following inoculation were determined on Vigna plants held in growth pouches (5). Each experiment was repeated two to three times.

RESULTS

Competitive RNA hybridisation
More than 50 XhoI restriction fragments, representing 100 kb in total, and carrying genes regulated by flavonoids were identified. These fragments are dispersed over pNGR234a, and a detailed analysis of them is given in Fellay et al.
(8).

Analysis of NGR234 DNA sequences not shared by R.fredii USDA257
To assess the efficiency of the two consecutive cycles of subtraction hybridisation, dot-blot filters of genomic DNAs from USDA257 (the subtracter strain), NGR234 (the probe strain) and ANU265 (NGR234 cured of its symbiotic plasmid) were probed with the subtracted fragments. No cross-hybridisation was detected with USDA257, but the subtracted sequences hybridised strongly to ANU265 and NGR234 genomic DNAs (data not shown).

Next, the ordered cosmid library was used to index the position of these 'unique' sequences. Dot-blot filters of DNA prepared from the 309 cosmids that cover ≥ 97% of the NGR234 genome (7) (see Fig.2-A), when probed with the unique sequences, showed that less than a third of all the clones hybridised. By comparing their respective positions in the 'contigs' (sets of contiguous cosmids), we found that positive clones generally overlapped, and were grouped in about 30 distinct chromosomal regions. Since two thirds of the 24 cosmids necessary to cover pNGR234a hybridised to fragments not shared by R.fredii USDA257, the symbiotic plasmid, in proportion to its size, carries a greater number of unique sequences than the chromosome. Assignment to distinct restriction fragments was achieved by probing Southern blots of XhoI restricted cosmid DNAs representative of pNGR234a. Specificity of the DNA subtraction was confirmed by the absence of hybridisation signals to restriction fragments known to carry genes (such as nodABC, nodS, nolB and nifKDH) shared by both NGR234 and USDA257 (see Fig.2-B).

Figure 2. A. Hybridisation patterns of the labelled NGR234 DNA fragments not shared by R.fredii USDA257 on dot-blot filters prepared from cosmid DNAs covering ≥ 97% of the NGR234 genome. The 24 clones representing the symbiotic plasmid pNGR234a are boxed. B. Left, XhoI restriction digests of overlapping cosmid DNAs covering half of pNGR234a. Right, Southern filter of the same gel probed as in Fig.2-A. Positions of representative known genes are shown on the autoradiogram with numbered circles: 1) nifKDH, 2) nolB, and 3) ORF-1.

Figure 3. A. Protein alignments for clones pX177, pX185, pR57, pR64 and pRK21 assembled using the BLAST programme. Upper lines correspond to the putative protein product encoded by one of the 6 ORFs of the NGR234 query sequence. The most significant database matches are displayed on the lower lines, with the identical, conserved (double dot) and less conserved (single dot) amino acids listed in the middle lines. In the case of pX185 however, two alignments are provided, above and below the query sequence, with the Prosite signature for the E1-E2 class of ATPases (accession number PS00154) marked in bold. The aspartate residue believed to undergo phosphorylation is marked with an asterisk. Numbers next to the first and last amino acids of each line show their respective positions in the homologous protein. The R1 methylation domain of the C.crescentus McpA protein is underlined, while the • marks the potential methylation site based on the reported methyl accepting peptide R1 in E.coli McpI (34). In the alignment of pRK21, the bold amino acids correspond to the ATP-binding site signature reported in the Prosite database (accession number PS00211). B. One gap protein alignments of the ORF-1 putative product (centre line) with E.coli Lrp (upper line), and Bkdr from P.putida (lower line). Peptide ends are marked with •.

Figure 4. Genetic and SpeI restriction map of the 500 kb symbiotic plasmid pNGR234a. SpeI restriction sites are marked with S. Approximate positions of the known genes and the newly identified loci pRK21, pR57, pR64, syrM and ORF-1 (pX140 and pRK41) are shown on the outer circle.

A sample of the unique fragments was analysed by shot-gun sequencing. Of 100 randomly picked clones, the sequences of 73 inserts could be grouped into 24 families of similar elements. Subsequently, a limited set of 59 non-redundant sequences was matched against the nucleotide and amino-acid databases.
Three clones with significant homologies extending over the entire DNA sequence were studied further. Clones pX140, pX177 and pX185 matched (see Fig.3-A) a segment of a leucine responsive regulatory protein from E.coli (20), a sequence from R.leguminosarum coding for an outer membrane protein (21), and a cadmium resistance protein from Staphylococcus aureus (22), respectively.

Figure 5. A. Restriction map of the 1.4 kb XhoI fragment cloned in pXB315X1.4, with the position of the two open reading frames reported (shadowed boxes). B. Complete DNA sequence of the same restriction fragment. Probable ribosome binding sites (RBS), putative start codons (ATGs and one alternate GTG) and nonsense codons (marked Opa) are underlined. The deduced amino acid sequence of the two ORFs is displayed under the nucleotide sequence.

Shot-gun sequence analysis of pNGR234a restriction fragments that carry induced transcripts not shared by USDA257
To identify flavonoid-inducible loci of pNGR234a that are not present in the USDA257 genome, we combined data from the competitive RNA hybridisation with those shown in Fig.2. A Sau3AI library was prepared from the 18 XhoI restriction fragments that gave hybridisation signals in both experiments, that did not carry any known symbiotic loci and which are dispersed over ≥ 57 kb of pNGR234a. Four (pR57, pR64, pRK21 and pRK41) out of 150 sequences of the library (representing ≥ 28 kb) showed very strong homologies (Fig.3-A) to a succinate-semialdehyde dehydrogenase from E.coli (Swissprot accession number P25526), a methyl-accepting chemotaxis protein from Caulobacter crescentus (23), the UGPC protein from E.coli (24) and the leucine responsive regulatory protein, respectively.

Detailed analysis of the selected clones
Confidence in gene identification by homology search clearly depends upon the accuracy of the query sequence and increases with homologies extending over larger DNA segments. To verify that the homologies obtained for the seven selected clones were not fortuitous, we cloned the corresponding genomic loci from the ordered cosmid library of NGR234. For each of the seven loci, we confirmed and extended the original sequence using as template the appropriate genomic fragment, and two synthetic primers designed to span the DNA segment showing the highest degree of homology with the database entries. Two sets of overlapping cosmids were homologous to pX177: clones pXB523 and pXB54 from pNGR234a as well as cosmids pXB482 and pXB739 from the chromosome.
Sequence data confirmed that the segment of pX177 which is homologous to the R.leguminosarum OmpIII gene mapped to a 3 kb PstI restriction fragment from the chromosome. Clone pX185 was assigned to pXB424 of the chromosome. The homologies reported with S.aureus CadA (22) and R.meliloti FixI (25) proteins correspond to a highly conserved domain in cation transporters with E1-E2 ATPase activity (Fig.3-A). ICATOOL analysis showed that pR64 and pR57 sequences were complementary and overlapped by 178 bases. Combined, they form a single Sau3AI fragment of 286 bp that maps to a 3 kb PstI-XhoI restriction fragment shared by pXB43 and pXB315 (see Fig.4 for approximate position). Interestingly, pR64 and pR57 each gave different and statistically significant results in the BLAST analysis. First, pR64 showed a high degree of homology to the carboxy-terminal domain of several E.coli and Caulobacter crescentus methyl-accepting chemotaxis proteins, a homology which extends over the second of the two proposed methylation domains (K1 and R1) adjacent to a well conserved cytoplasmic region (23). On the complementary strand, the putative peptide encoded by pR57 is highly homologous to the C-terminal domain of several semialdehyde dehydrogenases. Since the putative proteins from both sequences correspond to very conserved carboxy-terminal domains, with nonsense codons correctly placed to match the right protein length, it seems that this Sau3AI fragment extends over the ends of two genes that are transcribed in opposite directions and overlap by 34 bp. The pRK21 insert was mapped to the 5.2 kb XhoI restriction fragment of pXB807 (see position in Fig.4). This DNA fragment was cloned (pXB807X5.2) and partially sequenced. About 800 bp of the NGR234 RS.1 repeat element (one copy on pNGR234a, three on the chromosome, 6) cover one extremity of this DNA fragment, while a syrM homologous sequence was identified at the other extremity (data not shown).
Alignments with database entries showed strong homologies, both at the DNA and protein level, to the UGPC locus from E.coli. The putative pRK21 protein product also displayed a high degree of homology to other related ATP-binding proteins, such as R.leguminosarum and R.loti NodI, that are involved in the active transport of small hydrophilic molecules across the cytoplasmic membrane. Despite these homologies, we believe that pRK21 does not code for the NGR234 NodI product, as recent sequence data show that nodI is part of the nodABCIJ operon in Rhizobium sp. NGR234 (B.Relić, unpublished). Finally, ICATOOL analysis demonstrated that the pX140 and pRK41 sequences are complementary and overlap by 125 bp. Both clones are linked to a 1.4 kb XhoI restriction fragment carried by the cosmid pXB315 (see ORF-1 map location in Fig.4), and contiguous to the 3 kb PstI-XhoI restriction fragment carrying the pR57 and pR64 sequences. To test if any of the 7 sequences described above are part of open reading frames whose expression is induced by flavonoids, we prepared PCR-amplified products from the selected inserts using primers designed to flank the DNA segment with the highest degree of homology in the BLAST analysis. Probing a Southern transfer of the resulting PCR products in a competitive RNA hybridisation experiment showed that only inserts from pRK41 and pX140 hybridised to the labelled RNA prepared from flavonoid-induced NGR234 bacteria. Later, induction of this locus was also confirmed by Northern analysis (data not shown).

The deduced peptide of ORF-1 is strongly homologous to E.coli Lrp and P.putida Bkdr
To test the reliability of our screening strategy, we analysed the Lrp-like locus that is both inducible and unique to NGR234. First, to demonstrate that this sequence is truly unique, we probed restricted genomic DNAs from USDA257 and NGR234 with a 32P-labelled insert of pX140.
As expected, only one strong band was observed in NGR234, and there was no cross-hybridisation with R.fredii DNA (data not shown). Second, to determine whether pRK41 and pX140 inserts are part of a larger open reading frame, we sequenced the entire 1.4 kb XhoI restriction fragment cloned in pXB315X1.4 (Fig.5). Both pX140 and pRK41 sequences matched the 381 bp ORF-1. BLAST analysis showed that the putative ORF-1 product is highly homologous to two regulatory proteins, one from E.coli, the other from Pseudomonas putida [Lrp and Bkdr respectively (26)]. The one gap alignment presented in Fig.3-B predicts that the deduced amino acid sequence of the protein encoded by ORF-1 has 37% and 40% identity, or 78% and 81% homology (when similar amino acids are included), to Lrp and Bkdr respectively. All scores are higher than those proposed in a more flexible three gaps alignment of the E.coli Lrp and AsnC proteins (20). Extensive homology of the ORF-1 amino-terminal domain with the E.coli Lrp Helix-Turn-Helix domain suggests that protein synthesis should initiate at the GTG codon rather than at the downstream ATG (Fig.5-B). If translation starts at the alternate GTG codon, the NGR234 Lrp homologue is 127 amino acids long, 35 a.a. shorter at its amino-terminus than the E.coli Lrp. A second ORF (ORF-2; see Fig.5-A) was identified on the 1.4 kb XhoI fragment. The deduced peptide sequence of ORF-2 shares 28% identity and 72% homology (when similar amino acids are included in the analysis) with the protein A3 from Agrobacterium tumefaciens IS869 (27) in a no-gap protein alignment (data not shown).

Symbiotic phenotype of the ORF-1::Ω mutant
To assay symbiotic activity of the ORF-1 locus, a mutant carrying the Omega interposon in the HindIII site internal to the gene was constructed (NGR234ΩORF-1). In comparison with wild-type NGR234, this mutant caused a 4.5 day delay in nodulation of V.unguiculata (measured 21 d after inoculation).
On L.leucocephala and P.tuberosus, the number of nodules increased by more than 65% in comparison to the wild-type, while on C.caeruleum the nodule number was decreased by approximately 25%.

DISCUSSION
Random sequencing has been used to study the genome structure of various organisms including the Laryngotracheitis virus (28), Mycoplasma genitalium (29), Saccharomyces cerevisiae (C.J.Davies, Ph.D. thesis, 1991) and the Pufferfish, Fugu rubripes rubripes (30). In association with competitive RNA hybridisation and/or subtraction DNA hybridisation, it becomes a potent method to compare related genomes as well as to target actively transcribed genes. The screening strategy outlined in Fig. 1 is based on the NGR234 physical map, and expands the level of analysis from limited DNA segments to the whole replicon. It is flexible, since minor modifications to the RNA-DNA hybridisation procedures allow targeting of genes induced or repressed under many different conditions. Only genes with relatively strong homologies to database entries will be identified this way, however. Data from DNA subtraction hybridisations confirmed that Rhizobium species NGR234 and R.fredii strain USDA257 are phylogenetically related, and share most of their genomic background. No essential gene was identified in the random sequence analysis of the Sau3AI fragments remaining after two cycles of DNA subtraction hybridisation. Homologies with IS elements indexed in the databases (data not shown), and the absence from the USDA257 genome of the RS.1 transposon-like repeat (X.Perret, unpublished), suggest that many of the sequences 'unique' to NGR234 are mobile elements which have accumulated since both bacteria diverged. The higher proportion of the 'unique' sequences in the symbiotic plasmid compared with the rest of the genome suggests that pNGR234a tolerates integration of non-endogenous sequences better than the chromosome.
Icatool and ClustalV analysis of more than 100 NGR234 Sau3AI fragments not shared by R.fredii USDA257 revealed inherent limitations in the library prepared from the subtracted fragments. There was 40% redundancy amongst the clones analysed, with similar sequences grouped into 24 families of as many as 10 elements. Probing Southern blots of multiple restriction digests of NGR234 genomic DNA with sequences representative of some of the largest families showed that the redundancy does not result from repeated elements in the NGR234 genome (data not shown). Moreover, several sequence mismatches were found among nearly identical fragments cloned in both orientations. This indicates that the PCR amplification of subtracted sequences prior to cloning generates or increases an unbalanced distribution of fragments, and introduces small sequence anomalies due to Taq polymerase misreadings. This biased fragment distribution prevents use of the level of redundancy in the pool of analysed clones to estimate the total length of sequences specific to NGR234. Nevertheless, valuable genetic information can be retrieved from the library of subtracted fragments, particularly using the BLAST software, which is capable of detecting distant protein homologies even when confronted with such common sequencing errors as frameshifts and replacements. This combination of techniques led to the identification of several new loci with putative symbiotic functions. Among these, the sequence homologous to the symbiotic regulator syrM is adjacent to pRK21; the clone pX177, with homology to the ompIII gene which is symbiotically repressed in R.leguminosarum, has been mapped to the chromosome of NGR234; and pX185 carries a highly conserved domain with E1 E2 ATPase activity common in cation transporters such as the FixI protein. In addition, pR57 has been shown to be homologous to the C-terminal domain of succinate- and other semialdehyde dehydrogenases.
In R.meliloti, a mutant with a low succinic semialdehyde dehydrogenase activity is defective in symbiotic nitrogen fixation. More interestingly, we identified ORF-1, a new symbiotic gene. The peptide encoded by ORF-1 is very similar to the regulatory proteins Lrp from E.coli and BkdR from P.putida. BkdR is a positive activator of the branched-chain keto acid dehydrogenase operon, while Lrp combines repressor and activator activities that coordinate various functions involved in global responses (31). The conservation in the ORF-1 product of all but one of the amino acids known to affect the DNA binding ability of Lrp (32) suggests that ORF-1 may have retained regulatory functions. In the presence of a suitable carbon source, the Lrp mutant of E.coli grows normally. Similarly, the ORF-1 mutant of NGR234 does not display an extreme phenotype. However, the Ω::ORF-1 mutation modifies the efficiency of nodulation by NGR234. Depending upon the plant tested, we observed a significant delay in nodulation, a reduction, or even a large increase in the number of nodules. This symbiotic phenotype, together with the observed flavonoid induction of ORF-1 and its location on the non-essential symbiotic plasmid pNGR234a, suggests that this gene is probably not involved in the regulation of operons similar to those controlled by Lrp and BkdR. Furthermore, the absence of homologous genes in other rhizobia (data not shown), as well as in the closely related R.fredii USDA257, suggests that NGR234 has developed additional systems to regulate nodulation. Another symbiotic regulatory system, nodVW, has been described in Bradyrhizobium japonicum (33).

ACKNOWLEDGEMENTS
We wish to thank M.Trower, G.Elgar and D.Gerber for their help in many aspects of this work. We are grateful to J.Parsons and S.Aparicio for their assistance with the computer analysis.
Financial support was provided by the Fonds National Suisse de la Recherche Scientifique (Grants #31-30950.91 and 31-36454.92) and the Fondation Sandoz pour l'Avancement des Sciences Medico-biologiques. R.Fellay gratefully acknowledges the receipt of an EMBO short-term fellowship.

REFERENCES
1. Trinick, M.J. (1980) J. Appl. Bacteriol., 49, 39-53.
2. Heron, D.S. and Pueppke, S.G. (1984) J. Bacteriol., 160, 1061-1066.
3. Relić, B., Perret, X., Golinowsky, W., Pueppke, S.G., Krishnan, H.B. and Broughton, W.J. (1993) Science, submitted.
4. Krishnan, H.B., Lewin, A., Fellay, R., Broughton, W.J. and Pueppke, S.G. (1992) Mol. Microbiol., 6, 3321-3330.
5. Lewin, A., Cervantes, E., Wong, C.-H. and Broughton, W.J. (1990) Mol. Plant-Microbe Interact., 3, 317-326.
6. Perret, X., Broughton, W.J. and Brenner, S. (1991) Proc. Natl. Acad. Sci. USA, 88, 1923-1927.
7. Perret, X. (1992) Ph.D. thesis # 2489, University of Geneva, Geneva, Switzerland.
8. Fellay, R., Perret, X., Broughton, W.J. and Brenner, S. (1993) Mol. Microbiol., submitted.
9. Broughton, W.J., Wong, C.-H., Lewin, A., Samrey, U., Myint, H., Meyer z.A., H., Dowling, D.N. and Simon, R. (1986) J. Cell Biol., 102, 1173-1182.
10. Bjourson, A.J., Stone, C.E. and Cooper, J.E. (1992) Appl. Environ. Microbiol., 58, 2296-2301.
11. Stanley, J., Dowling, D.N., Stucker, M. and Broughton, W.J. (1987) FEMS Microbiol. Lett., 48, 25-30.
12. Sambrook, J., Fritsch, E.F. and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, second edition. Cold Spring Harbor University Press, Cold Spring Harbor.
13. Sanger, F., Nicklen, S. and Coulson, A.R. (1977) Proc. Natl. Acad. Sci. USA, 74, 5463-5467.
14. Parsons, J.D., Brenner, S. and Bishop, M.J. (1992) Comp. Appl. Biosci., 8, 461-466.
15. Higgins, D.G. and Sharp, P.M. (1988) Gene, 73, 237-244.
16. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D. (1990) J. Mol. Biol., 215, 403-410.
17. Prentki, P. and Krisch, H.M. (1984) Gene, 29, 303-313.
18. Quandt, J. and Hynes, M.F. (1993) Gene, 127, 15-21.
19. Figurski, D.H. and Helinski, D.R. (1979) Proc. Natl. Acad. Sci. USA, 76, 1648-1652.
20. Willins, D.A., Ryan, C.W., Platko, J.V. and Calvo, J.M. (1991) J. Biol. Chem., 266, 10768-10774.
21. de Maagd, R.A., Mulders, I.H.M., Canter Cremers, H.C.J. and Lugtenberg, B.J.J. (1992) J. Bacteriol., 174, 214-221.
22. Nucifora, G., Chu, L., Misra, T.K. and Silver, S. (1989) Proc. Natl. Acad. Sci. USA, 86, 3544-3548.
23. Alley, M.R.K., Maddock, J.R. and Shapiro, L. (1992) Genes Dev., 6, 825-836.
24. Overduin, P., Boos, W. and Tommassen, J. (1988) Mol. Microbiol., 2, 767-775.
25. Kahn, D., David, M., Domergue, O., Daveran, M.L., Ghai, J., Hirsch, P.R. and Batut, J. (1989) J. Bacteriol., 171, 929-939.
26. Madhusudhan, K.T., Lorenz, D. and Sokatch, J.R. (1993) J. Bacteriol., 175, 3934-3940.
27. Paulus, F., Canaday, J., Vincent, F., Bonard, G., Kares, C. and Otten, L. (1991) Plant Mol. Biol., 16, 601-614.
28. Griffin, A.M. (1989) J. Gen. Virol., 70, 3085-3089.
29. Peterson, S.N., Schramm, N., Hu, P.-C., Bott, K.F. and Hutchison, C.A. (1991) Nucleic Acids Res., 19, 6027-6031.
30. Brenner, S., Elgar, G., Sandford, R., Macrae, A., Venkatesh, B. and Aparicio, S. (1993) Nature, 366, 265-268.
31. Newman, E.B., D'Ari, R. and Lin, R.T. (1992) Cell, 68, 617-619.
32. Platko, J.V. and Calvo, J.M. (1993) J. Bacteriol., 175, 1110-1117.
33. Göttfert, M., Grob, P. and Hennecke, H. (1990) Proc. Natl. Acad. Sci. USA, 87, 2680-2684.
34. Kehry, M.R., Bond, M.W., Hunkapiller, M.W. and Dahlquist, F.W. (1983) Proc. Natl. Acad. Sci. USA, 80, 3599-3603.
35. Hanahan, D. (1983) J. Mol. Biol., 166, 557-580.
36. Cami, B. and Kourilsky, P. (1978) Nucleic Acids Res., 5, 2381-2390.
37. Stanley, J., Dowling, D.N. and Broughton, W.J. (1988) Mol. Gen. Genet., 215, 32-37.
38. Morrison, N.A., Hau, C.Y., Trinick, M.J., Shine, J. and Rolfe, B.G. (1983) J. Bacteriol., 153, 527-531.