Environmental Microbiology (2003) 5(5), 383–394

The gene cassette metagenome is a basic resource for bacterial genome evolution

Andrew J. Holmes,1† Michael R. Gillings,1 Blair S. Nield,2 Bridget C. Mabbutt,3 K. M. Helena Nevalainen2 and H. W. Stokes2*

1 Key Centre for Biodiversity and Bioresources, Macquarie University, Sydney NSW 2109, Australia.
2 Department of Biological Sciences, Macquarie University, Sydney NSW 2109, Australia.
3 Department of Chemistry, Macquarie University, Sydney NSW 2109, Australia.

Summary

Lateral gene transfer has been proposed as a fundamental process underlying bacterial diversity. Transposons, plasmids and phage are widespread and have been shown to contribute significantly to lateral gene transfer. However, the processes by which disparate genes are assembled and integrated into the host regulatory network to yield new phenotypes are poorly understood. Recent discoveries about the integron/gene cassette system indicate it has the potential to play a role in this process. Gene cassettes are small mobile elements typically consisting of a promoterless orf and a recombination site. Integrons are capable of acquisition and re-arrangement of gene cassettes and of the expression of their associated genes. The potential of the integron/gene cassette system is thus largely determined by the diversity contained within the cassette pool and the rate at which integrons sample this pool. Using a polymerase chain reaction (PCR) approach that samples the environmental gene cassette (EGC) metagenome directly, we show that this metagenome contains both protein-coding and non-protein-coding genes. Environmental gene cassette-associated recombination sites showed greater diversity than previously seen in integron arrays.
Class 1 integrons were shown to be capable of accessing this gene pool through tests of recombinational activity with a representative range of EGCs. We propose that gene cassettes represent a vast, prepackaged genetic resource that could be thought of as a metagenomic template for bacterial evolution.

Received 17 October, 2002; accepted 15 January, 2003. *For correspondence. E-mail [email protected]; Tel. (+612) 9850 8164; Fax (+612) 9850 8245. †Present address: School of Molecular and Microbial Biosciences, The University of Sydney, New South Wales, 2006.

© 2003 Society for Applied Microbiology and Blackwell Publishing Ltd

Introduction

The Bacteria are the most physiologically diverse group known. Such physiological diversity is underpinned by corresponding genomic diversity. Given that the typical bacterial genome size is <10 Mbp, this diversity has arisen from a remarkably small genomic template. These contrasting observations can be reconciled through the proposition that horizontal gene transfer (HGT) is the major factor in the evolution of bacteria (Ochman et al., 2000). There is a large body of evidence supporting this general thesis. Mechanisms for transfer of genes between cells are well known and virtually ubiquitous in both a phylogenetic and ecological sense. Genome sequence analyses have shown that a large proportion of any one bacterial genome is likely to have been acquired from ‘foreign’ sources. For several complex phenotypes there is strong evidence that sets of genes were separately acquired by HGT. A major gap in our knowledge, however, is how transferred genes are integrated into the metabolism of the recipient cell.

In recent years the integron/gene cassette system has emerged as one of the best examples of capture and expression of new genes (Hall et al., 1999).
Integrons include a site-specific recombination system and were first identified as the sites of antibiotic resistance gene capture in mobile elements from clinical isolates (Stokes and Hall, 1989; Martinez and de la Cruz, 1990; Collis et al., 1993). The integron is a recombination and expression system that captures genes as part of a genetic element called a gene cassette (Recchia and Hall, 1995). Gene cassettes are very simple genetic elements that typically consist of a single promoterless gene and a recombination site termed a 59-base element (59-be). In the well-studied class 1 integrons, the gene capture system consists of a site-specific recombinase (IntI1) and a recombination site (attI1). IntI1 reversibly catalyses two types of site-specific recombination reaction: recombination between attI1 and a 59-be, and recombination between two 59-be sites. Collectively these reactions result in the assembly of new genes downstream of an integron-associated promoter, Pc, that directs transcription of the cassette-associated genes (Stokes and Hall, 1989; Hall et al., 1991; Collis and Hall, 1992; Collis and Hall, 1995). The arrangement of these features is shown in Fig. 1. Recently, a similar organization has also been demonstrated for the class 3 integron (Collis et al., 2002). Class 1 and 3 integrons thus fulfil the basic requirements for gene acquisition in the HGT model for evolution of bacterial physiological diversity. Disparate genes can be assembled at a specific locus where they are amenable to regulatory control by the cell.

Recent discoveries have shown that integrons are not simply part of mobile elements carrying antibiotic resistance genes, but are a distinct type of genetic element found in a variety of genomic contexts. Integrons and gene cassette arrays have been sequenced in the chromosomes of Pseudomonas, Vibrio, Xanthomonas and Shewanella spp.
(Heidelberg et al., 2000; Rowe-Magnus et al., 2001; Vaisvila et al., 2001; da Silva et al., 2002). Furthermore, intI homologues are present in the unfinished genomes of Treponema denticola, Geobacter sulfurreducens, Acidithiobacillus ferrooxidans and Nitrosomonas europaea, implying that integrons are also present in these genomes (Nield et al., 2001).

Given the gene acquisition and expression properties of class 1 and 3 integrons, the discovery that integrons are widespread raises the proposition that they may play a general role in the acquisition of new genes in bacterial genomes. However, the integron platform is essentially a simple structure, entirely dependent on gene cassettes as its substrates. The crucial questions therefore revolve around the nature of the gene cassette pool and how it interacts with integrons. Difficulties in sampling or recognizing gene cassettes from outside an integron context have restricted our capacity to address these questions.

Recognition of gene cassettes independently of integron features requires an objective definition of the 59-be sequence family. Amongst characterized 59-be sites, length ranges from 57 bp to 145 bp and the pairwise sequence difference may exceed 70%. A detailed comparison of all cassette-associated recombination sites available at the time was reported by Stokes et al. (1997), who noted a number of conserved features. These include an overall imperfect inverted repeat structure, each half of which includes a simple site of the type commonly associated with the tyrosine family of recombinases (Grainge and Jayaram, 1999). 59-be sites include a core site with the consensus GTTRRRY (designated 1R, Fig. 2) and, for recombination events mediated by IntI1, the recombination crossover point is between the G and first T of this site (Stokes et al., 1997). Within the 59-be conserved features there is moderate sequence conservation, including eight near-invariant positions (Fig. 2).

Fig. 1. Structure of In3 and of co-integrates formed with test 59-be sites. The two most common insertion sites into In3 for a test 59-be are indicated. LHS, left-hand side. RHS, right-hand side. Restriction sites are: B, BamHI; H, HindIII; and S, SalI. 3′-CS, 3′-conserved segment. Pc, promoter for cassette-associated genes. The horizontal arrows indicate the binding sites and direction of synthesis for each of the primers used in mapping co-integrate junctions. Cm, chloramphenicol.

Fig. 2. Alignment of the conserved domains of 59-be sites from environmental cassettes. 59-be recombination site sequence is shown in bold, with five bases of flanking sequence at each end also shown. The highly variable central regions are not shown, with numbers indicating the length of the omitted sequence. Sequence depicted in upper case is that predicted for the free, circular form of the cassette. For most sequences shown here, the last six bases of the 59-be (which also represent the first six bases of the integrated, linear form of the cassette) are derived from a PCR primer and are consequently shown in lower case. Putative IntI binding domains in the left (1L and 2L) and right (1R and 2R) halves of the elements are indicated by shading and arrows (top). The filled triangles (▼) identify locations at which extra bases are not shown (see also Fig. 4). The alignment is separated into three groups to accommodate length variation. The ‘position’ lines allow alignment between groups. Upper and lower case letters in the ‘position’ lines indicate bases that are generally complementary when left and right halves of the element are compared. The asterisk indicates the extra base in 2L compared to 2R. The ‘+’ symbol indicates positions that are found in all 59-be sequences but disrupt the repeat structure (see Experimental procedures). Positions that are not common to all 59-be are left blank. Numbers on the left distinguish each of the 11 identified subfamilies.

We have recently demonstrated that the use of degenerate primers targeting the conserved regions of 59-be sites in PCR with environmental DNA samples results in the recovery of diverse sequences showing characteristics of gene cassettes (Stokes et al., 2001). In total, 123 predicted gene cassettes were recovered in that study. Here we have confirmed these sequences as gene cassettes. Thus the ‘cassette PCR’ technique allows us to address a number of questions regarding the nature of the gene cassette pool for the first time. Here we describe the characterization of a further 41 environmental gene cassettes and the sequence relationships of environmental gene cassettes to gene cassettes from other sources, and demonstrate their ability to be recruited by class 1 integrons.

Results

PCR recovery of gene cassettes

A total of 57 cloned amplicons from soil microcosm samples were analysed. They were derived by PCR with the primers HS286 and HS287, which predominantly target cassettes contained within linear arrays. All showed the characteristics expected of cassette PCR products (Stokes et al., 2001). Of these, 38 represented amplification of a single gene cassette, and 19 the amplification of arrays of two, three or four cassettes in tandem. Many gene cassettes were sampled more than once (either singly or as part of an array), so this dataset yielded a total of 41 distinct cassettes (EGC111 – EGC151, see Experimental procedures). When pooled with our previous study, the total number of gene cassettes directly sampled from natural environments via cassette PCR is 164.

Gene cassette content: protein-coding orfs

The vast majority of experimentally characterized gene cassettes contain promoterless protein-coding orfs.
Our Environmental Gene Cassette (EGC) dataset and gene cassettes recovered as part of large-scale sequencing projects contain a disproportionately high fraction of novel sequences. This complicates the task of predicting coding sequences. Where software predicts alternative reading frames, the cassette boundaries can assist in identifying the ‘true’ orf. Regardless of the difficulties in predicting the coding content of gene cassettes, it is evident that the spectrum of proteins harboured in the total gene cassette pool is extraordinarily diverse. Of the 142 hypothetical proteins currently in the EGC dataset, only 24 (17%) show a sequence relationship to any previously described protein and, of these, 17 are similar only to ‘hypothetical proteins’ (Table 1). The same pattern holds for other bacterial gene cassette pools outside of the antibiotic resistance context, including the Xanthomonas campestris pv. campestris, Pseudomonas alcaligenes and Vibrio cholerae chromosomes (Heidelberg et al., 2000; Vaisvila et al., 2001; da Silva et al., 2002). No obvious hints of a general role for cassette-associated proteins could be predicted.

Predicted protein sizes in the EGC pool range from 36 to 346 amino acids. The distribution is skewed, with ∼60% of all hypothetical proteins in the size range of 70–140 amino acids. Only 10% were greater than 200 amino acids. However, in general, the size range of predicted proteins matches that of biologically active peptides that are ribosomally synthesized. Similarly, calculations of hydrophobicity values show an essentially normal distribution, indicating that the pool of proteins is unlikely to show any marked bias towards membrane or cytoplasmic location. The lack of pattern in size and physicochemical properties of hypothetical proteins is matched by the diversity of biological activities in those cassettes that have been characterized or which show homology to characterized proteins.
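The hydrophobicity comparison mentioned above can be illustrated with a minimal sketch. The text does not say which scale or software was used, so the Kyte-Doolittle hydropathy scale and the function name `gravy` are assumptions made purely for illustration, and the two example peptides are hypothetical rather than EGC sequences.

```python
# Kyte-Doolittle hydropathy values per residue.
# ASSUMPTION: the source does not name the scale it used; this one is illustrative.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Return the mean hydropathy (GRAVY score) of a protein sequence."""
    residues = protein.upper()
    if not residues:
        raise ValueError("empty sequence")
    # Average the per-residue hydropathy values over the whole sequence.
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


if __name__ == "__main__":
    # Hypothetical peptides, not sequences from the EGC dataset.
    examples = {"hydrophobic_example": "MLLVVAAIIL", "charged_example": "MKRDEEKRDK"}
    for name, seq in examples.items():
        print(f"{name}: GRAVY = {gravy(seq):+.2f}")
```

Computing one such score per predicted protein and plotting the scores as a histogram would be one way to inspect whether the distribution is roughly symmetric; positive scores indicate more hydrophobic sequences on this scale.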
Most genes within cassettes from class 1 integrons encode antibiotic resistance. However, the mode of resistance varies tremendously (Recchia and Hall, 1995). Gene cassettes identified from genome sequencing or environmental contexts encode diverse properties including lipases, restriction endonucleases, transport proteins, toxins and surface antigens (Clark et al., 2000; Vaisvila et al., 2001; Rowe-Magnus et al., 2001; Stokes et al., 2001).

Table 1. Cassette gene products with database matches.

Gene product   Top database hit                        %Identity/%similarity  Predicted function
orf297_EGC010  Bacillus subtilis (CAB15191)            28/48                  Hypothetical protein
orf101_EGC017  Caulobacter crescentus (AAK23946)       61/74                  Hypothetical protein
orf117_EGC020  Bacteriophage 933W (AAD25429)           41/56                  Hypothetical protein
orf346_EGC034  Pseudomonas syringae (ZP_00124941)      27/49                  Aminoglycoside phosphotransferase
orf271_EGC035  Wolinella succinogenes (CAC50085)       31/52                  Sulphur transferase
orf81_EGC044   Nitrosomonas europaea (ZP_00003253)     91/96                  Possible toxin antidote protein
orf113_EGC044  Clostridium thermocellum (ZP_00061925)  45/60                  PemK family
orf132_EGC064  Pseudomonas aeruginosa (AAG07752)       49/61                  Hypothetical protein
orf208_EGC067  Thermus thermophilus (BAB17605)         30/53                  RNA methyl transferase
orf133_EGC068  Mycobacterium tuberculosis (AAK44262)   32/50                  Hypothetical protein
orf90_EGC101   Nostoc punctiforme (ZP_00108370)        71/78                  Hypothetical protein
orf105_EGC103  Mycobacterium tuberculosis (AAK46615)   33/52                  Hypothetical protein
orf147_EGC162  Nitrosomonas europaea (ZP_00003997)     68/80                  Pyrimidine dimer DNA glycosylase
orf168_EGC027  Pasteurella multocida (AAK03695)        27/52                  Hypothetical protein
orf209_EGC029  Xanthomonas campestris (AAM39391)       27/46                  Hypothetical protein
orf154_EGC030  Shewanella oneidensis (AAN56682)        50/61                  Hypothetical protein
orf135_EGC079  Agrobacterium tumefaciens (AAK86432)    34/60                  Hypothetical protein
orf139_EGC084  Oceanobacillus iheyensis (BAC15208)     29/54                  Hypothetical protein
orf159_EGC115  Bacillus anthracis (NP_656837)          36/58                  Hypothetical protein
orf174_EGC125  Bacillus anthracis (NP_656524)          36/54                  Hypothetical protein
orf125_EGC139  Caulobacter crescentus (AAK24858)       42/63                  Bleomycin resistance
orf161_EGC159  Brucella suis (AAN30397)                27/57                  Hypothetical protein
Orf110_EGC088  Xylella fastidiosa (AAF85305)           17/41                  Hypothetical protein
Orf117_EGC148  Synechocystis sp. PCC6803 (BAA18636)    34/62                  Hypothetical protein

Other notable hits: Orf360_EGC124 is in the reverse orientation. This orf has strong similarity (42% identity, 65% similarity) to cheA (AAK78103). It is likely that this cassette is a non-specific product. Orf196_EGC129 shows strong identity over the first 36 residues only to a diverse range of peptides including HI1126 (AAC22780). It is possible this region is a common leader sequence.

Gene cassette content: non-protein-coding sequences

Some gene cassettes appear to have a biological role other than protein-coding. EGC104 is the first cassette of an array in the clone Bal48 (AF349111). This cassette contains 178 bp that are not part of its 59-be. Several observations indicate that this sequence is non-protein-coding. First, although there is one reading frame in each of the forward and reverse orientations that comprises an uninterrupted stretch of coding codons, neither of these frames has plausible start or stop codons within the cassette. Second, the EGC104 cassette content shows significant sequence relationship to three other cassettes (EGC048, EGC050, EGC051). Collectively the contents of these four cassettes comprise a sequence family sharing 61–94% identity (data not shown), yet none contains obvious protein-coding orfs.
Third, despite the DNA sequence conservation across the family, neither of the possible orfs in EGC104 is conserved in other family members. Finally, sequence conservation is particularly strong through the central 70 bp of the cassette content and this region forms an imperfect inverted repeat. The pattern of family sequence conservation indicates that sequence structure, rather than coding potential, is more biologically relevant. We conclude that members of this sequence family have a role other than encoding a protein.

A similar situation is seen for the cassette content of EGC091. The complete cassette was recovered as part of an array in clone Bal33 (AF349108). It shows significant DNA identity to other sequences from various environments (two soils and a hot spring). These comprise at least nine distinct EGC types, including EGC049, EGC052, EGC053, EGC054, EGC055, EGC056, EGC057, EGC058 and EGC091. For brevity, only three of these, representing the range of sequence divergence, are shown in Fig. 3. Members of the family are 308–330 bp in length and in all cases stop codons are prevalent, precluding the presence of protein-coding orfs across the sequence family. Pairwise sequence identity ranges from 62 to 91%. Noteworthy features of the EGC091 family are that poly A/poly T tracts are prominent (16 occurrences) and that the predicted RNA (if transcribed) contains a number of sequence domains likely to have a stable secondary structure.

Fig. 3. Alignment of three representative members of the EGC091 sequence family. Positions that are universal in all nine members of the sequence family are shown in the consensus as upper case. Those that are strongly conserved (> 75% identity) across all members of the sequence family are shown in lower case. Where no data are given in the consensus sequence there is no significant sequence conservation across all members of the sequence family. Members of the sequence family are EGC049, EGC052, EGC053, EGC054, EGC055, EGC056, EGC057, EGC058 and EGC091.

Sequence relationships of ‘environmental’ 59-be sites

In total, 37 inverted repeat structures were identified within the recovered environmental clones that were consistent with the generalized structure for 59-be sites as outlined in Experimental procedures (Fig. 2). There was nonetheless considerable sequence and structural variation between elements. The most notable variable feature of 59-be sites is the length and sequence of the region separating the two halves of the repeat. This central region ranges in length from three to 79 bases and occurs between positions ‘s’ and ‘S’ in the generalized structure (Fig. 2). None, part, or all of this region may be part of the inverted repeat structure. Another noteworthy variation is the presence of ‘asymmetric’ sequence insertions in some 59-be sites (triangles in Fig. 2). These appear to occur only at specific loci that are internal to the conserved ‘paired’ regions, making the inverted repeat structure asymmetric.

It is this sequence variation that makes evaluation of relationships between members of the 59-be family difficult. Estimation of evolutionary distances from sequence analyses requires the comparison of orthologous positions. The conservation of structure indicates that 59-be sites constitute an orthologous sequence family. However, it is unlikely that positions outside the core 51 positions are orthologous across the sequence family. Furthermore, within the 51 core positions the structural constraints are such that this short sequence contains no effective ‘phylogenetic signal’. A consequence of this heterogeneity is that inferring evolutionary relationships across the whole 59-be family is not possible.
Nevertheless, evolutionarily distinct subfamilies can be recognized on the basis of heterologous sequence features. That is, any group of sequences containing a sequence insertion that is heterologous with respect to all other members of the family represents an evolutionarily distinct group (such groups are not necessarily monophyletic). The ‘PAR signature’ described for some Ps. alcaligenes gene cassettes is one such example (Vaisvila et al., 2001). On the basis of ‘heterologous insertions’ our EGC dataset includes at least 11 distinct subfamilies of 59-be sites (Fig. 2). In each of these subfamilies the inserted sequence (with respect to the core structure) is either at a different locus to all other examples or is differentiated by length and structure. Only three of these subfamilies are presently found in class 1 integrons.

Examples of 59-be sequence variation, representing the range observed in the EGC dataset, are shown in Fig. 2. Of note are the 59-be sites associated with EGC099 (shorter than previously assayed elements), EGC068 (greatest divergence from the canonical structure), EGC102 (large insertion between halves), and BGC001 (containing an insertion introducing asymmetry). These 59-be sites are ‘extremes’ of the diversity observed and, to date, no members of these subfamilies have been found in gene cassettes encoding antibiotic resistance genes.

59-be sites from environmental DNA samples are active recombination sites

59-be sites from clinical isolates are active recombination sites recognizable by integron integrases. Of the considerable number of 59-be sites tested for activity with IntI1, all have been found to be functional, with varying levels of efficiency (Martinez and de la Cruz, 1990; Hall et al., 1991; Collis et al., 1993; 2001; Stokes et al., 1997). In addition, activity with the class 3 integrase, IntI3, has also been demonstrated (Collis et al., 2002).
To determine whether the 59-be sites identified in environmental clones are active recombination sites, six of them (Fig. 4; EGC099, EGC082, EGC102, EGC140, EGC068 and BGC001) were tested in conduction assays to determine if they could be recognized by the class 1 integrase, IntI1. Tested elements were selected to represent the diversity of elements recovered from the environments examined. The elements associated with EGC082 and EGC140 are similar to the well-studied and highly active aadB 59-be element in terms of both length and sequence. In contrast, the EGC099 59-be, at 56 bases, is the shortest element of the 59-be family seen to date, while EGC102 is an example of an element that groups with members that are of greater length. EGC068 is similar in sequence to other elements recovered from Balmain but is noteworthy in that it has a nine-base right-hand simple site spacer (Fig. 2). This is the only element known to have a spacer of this length and contrasts with the seven or eight bases for all other elements (Stokes et al., 1997). The sixth element tested, BGC001 (Fig. 4), was from a cassette within an array in a strain of Pseudomonas stutzeri from a soil enrichment culture and is noteworthy in that it possesses the PAR signature (Vaisvila et al., 2001 and Fig. 4) associated with elements from Pseudomonas species.

All tested elements were functional (Table 2). Activity levels varied, however. The three shortest elements, EGC099, EGC082 and EGC140, were included in a group of five that were the most active, with activity comparable to that of the highly active aadB 59-be. A fourth, longer element of 102 bases, EGC102, also fell within this highly active group, as did the element from Ps. stutzeri. EGC068, the element with a nine-base right-hand spacer, was the least active, at a level about 10- to 100-fold below the others. It nonetheless had a level of activity 50-fold above a no-element control.
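The fold differences quoted above follow from simple ratios of the average conduction frequencies reported in Table 2. The sketch below is illustrative only: the dictionary and function names are hypothetical, and the values are transcribed from the "Average frequency" column of Table 2.

```python
# Average conduction frequencies transcribed from Table 2 (test element -> average).
# The names used here are illustrative; they do not come from the paper.
AVERAGE_FREQUENCY = {
    "aadB/qacE": 1.1e-2,
    "EGC099": 1.9e-2,
    "EGC082": 8.9e-3,
    "EGC102": 6.1e-3,
    "EGC140": 6.1e-3,
    "BGC001": 1.3e-3,
    "EGC068": 1.5e-4,
    "none": 2.9e-6,  # pACYC184 no-element control
}


def fold_difference(numerator: str, denominator: str) -> float:
    """Ratio of the average conduction frequencies of two test elements."""
    return AVERAGE_FREQUENCY[numerator] / AVERAGE_FREQUENCY[denominator]


if __name__ == "__main__":
    for element in AVERAGE_FREQUENCY:
        if element in ("none", "EGC068"):
            continue
        # How far each element sits above the least active element, EGC068.
        print(f"{element}: {fold_difference(element, 'EGC068'):.0f}-fold above EGC068")
    print(f"EGC068: {fold_difference('EGC068', 'none'):.0f}-fold above the no-element control")
```

With these values EGC068 comes out at about 52-fold above the no-element control, in line with the roughly 50-fold figure given in the text.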
Table 2. Conduction frequency of 59-base elements from environmental samples.

Plasmid   Test element  Fragment lengthᵃ  Element length  Range                    Average frequencyᵇ
pMAQ28    aadB/qacE     202/198           60              4.5 × 10⁻³–1.6 × 10⁻²ᶜ   1.1 × 10⁻² (5)ᶜ
pMAQ701   EGC099        101/197           56              3.2 × 10⁻³–5.6 × 10⁻²    1.9 × 10⁻² (12)
pMAQ653   EGC082        486/383           60              4.1 × 10⁻³–1.3 × 10⁻²    8.9 × 10⁻³ (6)
pMAQ713   EGC102        164/95            102             1.7 × 10⁻³–9.6 × 10⁻³    6.1 × 10⁻³ (7)
pMAQ714   EGC140        84/110            60              7.2 × 10⁻⁴–1.4 × 10⁻²    6.1 × 10⁻³ (5)
pMAQ707   BGC001        124/114           77              2.1 × 10⁻⁴–2.5 × 10⁻³ᶜ   1.3 × 10⁻³ (6)ᶜ
pMAQ710   EGC068        142/87            73              4.1 × 10⁻⁵–2.6 × 10⁻⁴    1.5 × 10⁻⁴ (7)
pACYC184  none          N/A               N/A             8.8 × 10⁻⁷–4.5 × 10⁻⁶    2.9 × 10⁻⁶ (4)

a. Numbers refer to nucleotides in the cloned fragment to the left and right of the recombination crossover point.
b. Values for test elements are derived from at least three independent donor constructs with the number of assays shown in brackets.
c. Values for pMAQ28 and pMAQ707 are from Collis et al. (2001) and Holmes et al. (2003) respectively.

Fig. 4. 59-be sites tested for recombination activity shown as foldbacks to highlight their inverted repeat structure. Colons indicate complementary bases. Sequences shown are as they appear in the linear array from which they are derived and as tested in conduction assays. Both EGC068 and BGC001 contain an insert in the left side of the element compared to the right side. The positions of these extra bases are indicated by the vertical arrows. In EGC068 the insertion is TAG and in BGC001 it is TCGCTCGCCTCGCTCACT.

Analysis of recombination events

To investigate the recombination events involving each of the environmental test elements, the sensitivity of co-integrates to trimethoprim (Tp) was determined (Table 3). Tp sensitivity implies insertion at attI1, since such an insertion separates the dfrB2 gene from the Pc promoter on which it depends for expression (Fig. 1). The percentage of Tp-sensitive co-integrates was between 81 and 94, indicating a strong preference for insertion at attI1. These values are consistent with those seen previously for 59-be sites from antibiotic resistance cassettes when cloned into pACYC184 in orientation 2 (Collis et al., 2001).

The insertion site of several co-integrates was further analysed by PCR mapping (Experimental procedures). In total, 69 co-integrates were mapped and in all cases the length of the PCR product was consistent with insertion at either attI1 or orfA (Table 3). This mapping was also consistent with the Tp phenotype in that co-integrates mapping to attI1 were Tp sensitive and those mapping to orfA were Tp resistant. No insertion events were found at dfrB2, a result also seen previously for 59-be sites from antibiotic resistance cassettes, where non-attI1 insertion events favour orfA (Collis et al., 2001).

To confirm the PCR mapping data and further investigate the nature of the recombination events involving the environmental 59-be sites, the junctions of several of these 69 co-integrates were sequenced. In total, 10 independent co-integrates were sequenced at both the left and right junctions (Fig. 1) and a further 12 independent co-integrates were sequenced at one junction. In all cases (Table 3), the recombination crossover point could be localized to a region of between four (BGC001 versus attI1) and seven (EGC099 or EGC068 versus attI1) bases that included the invariant GTT of the 1R core site (Fig. 2). Consequently it is likely that the mechanism of IntI1-mediated recombination involving these environmental 59-be sites is the same as that previously described for 59-be sites from antibiotic resistance cassettes, where the recombination crossover has been shown to occur between the G and first T of the 1R core site (Stokes et al., 1997).

Table 3. Characteristics of co-integrates formed with environmental 59-be sites and In3.

Test element  Percentageᵃ TpS  PCR mappingᵇ            Junction sequencingᶜ (attI)  Junction sequencingᶜ (orfA)
EGC082        94               9/11 attI; 2/11 orfA    GTTAG (1/1)                  GTTAGA (0/1)
BGC001        88               6/9 attI; 3/9 orfA      GTTA (3/3)                   NE
EGC099        81               5/9 attI; 4/9 orfA      GTTAGGC (2/0)                CGTTAG (0/1)
EGC068        88               8/16 attI; 8/16 orfA    GTTAGGC (3/0)                CGTTAG (0/1)
EGC102        90               11/12 attI; 1/12 orfA   GTTAG (0/2)                  NE
EGC140        94               11/12 attI; 1/12 orfA   GTTAG (1/2)                  CGTTAG (0/1)

a. At least 100 co-integrates were tested from a minimum of four independent crosses.
b. Mapped co-integrates are derived from at least three independent crosses. Co-integrates were otherwise selected randomly except for BGC001 and EGC068, for which co-integrates were selected on the basis of their Tp phenotype.
c. Co-integrates sequenced were selected from those identified as being either attI1 or orfA insertions by PCR mapping. Where more than one co-integrate was sequenced for a particular element at the same insertion point (attI1 or orfA), replicates are derived from independent crosses. Sequence shown indicates the region at and around the core site to which the recombination crossover point can be defined (i.e. the two recombining molecules are identical in the core site region indicated). Numbers in brackets before and after the slash (/) refer to the number of co-integrates sequenced at both junctions and at one junction respectively. NE = not examined.
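The role of the 1R core site in localizing the crossover can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the example sequence are hypothetical, the consensus GTTRRRY is expanded with the standard IUPAC codes (R = A or G, Y = C or T), and the junction regions are transcribed from Table 3.

```python
import re

# Standard IUPAC codes needed for the core-site consensus (R = purine, Y = pyrimidine).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}
CORE_CONSENSUS = "GTTRRRY"  # 1R core-site consensus given in the text

# Crossover-containing regions transcribed from Table 3 (attI and orfA junctions).
JUNCTION_REGIONS = ["GTTAG", "GTTA", "GTTAGGC", "GTTAGA", "CGTTAG"]


def consensus_to_regex(consensus: str) -> "re.Pattern[str]":
    """Translate an IUPAC consensus into a regex; the lookahead allows overlapping hits."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile(f"(?=({body}))")


def find_core_sites(sequence: str, consensus: str = CORE_CONSENSUS):
    """Return (start, site, crossover) for each consensus match in the sequence.

    `crossover` is the 0-based index of the first T, i.e. the cut lies between
    the G at `start` and the T at `start + 1`.
    """
    pattern = consensus_to_regex(consensus)
    return [(m.start(), m.group(1), m.start() + 1) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical test sequence, not taken from the paper.
    for start, site, crossover in find_core_sites("AAGTTAGGCTT"):
        print(f"core site {site} at {start}; crossover between positions {start} and {crossover}")
    # Every junction region reported in Table 3 should contain the invariant GTT.
    print(all("GTT" in region for region in JUNCTION_REGIONS))
```

Scanning both strands (by also searching the reverse complement) would be a natural extension, since a 59-be contains core-like sites in both halves of the inverted repeat.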
Discussion

The properties of integrons and gene cassettes indicate that these elements have the potential to play a broader role in bacterial evolution. Given that the integron is a relatively simple structure, the significance of the integron/gene cassette system is inextricably linked to the nature of the mobile gene cassette pool. Of particular importance here are the diversity of cassette-associated genes, the distribution of gene cassettes, and the ability of different integrons to exploit gene cassettes. We have previously reported that primers can be used in ‘cassette PCR’ to recover intact genes from environmental DNA and that this technique taps a very large genetic resource (Stokes et al., 2001). In this paper we confirm that cassette PCR samples an environmental gene cassette ‘metagenome’ which is accessible to class 1 integrons.

Even on the basis of the present, limited dataset it is evident that the EGC metagenome sampled by cassette PCR shows remarkable diversity in both 59-be sites and cassette content. The majority of EGCs include protein-coding orfs. It appears that the nature of the cassette-encoded proteins is different from that of typical protein-coding genes found in bacteria. The majority of cassette-encoded proteins represent novel families and no genes encoding enzymes of central metabolic pathways were found. In this respect the EGC metagenome is markedly different from any bacterial genome characterized to date. However, the lack of genes of central metabolism could simply reflect that, in terms of overall bacterial genetic diversity, such genes are a minor component. Indeed, genome sequencing projects have indicated that two strains of the same species may diverge considerably and that this primarily reflects genes outside of central metabolism. Given the very large size of bacterial populations it is not unreasonable to expect a high proportion of novelty in the ‘species genome’ (Lan and Reeves, 2000).
Consideration of the diversity of orf sizes, inferred physicochemical properties and predicted functions suggests that any protein may be encoded within a gene cassette. Evaluation of this possibility will require large-scale sequencing of the EGC metagenome.

It is clear that non-protein-coding DNA, including features such as binding sites for regulatory proteins and small RNAs, is an important part of bacterial genomes. If the EGC pool is a fundamental resource for bacteria, whereby mobilised genes facilitate genome evolution, we might expect it to contain a significant proportion of such features. Gene cassettes that do not contain obvious protein-coding orfs occur in a V. cholerae chromosomal integron (Heidelberg et al., 2000) and are present in this EGC dataset. One noteworthy observation here is that the cassette content of the EGC105 and EGC091 families is characteristically not protein-coding. Two factors suggest that these sequences encode some biological activity. First, non-identical members of the families were repeatedly isolated from several separate environmental DNA samples (four for the EGC091 family). Second, the strong conservation of both sequence families implies selective constraints on these sequences, perhaps suggesting that they represent a family of transcribed RNAs. These observations raise the possibility that essentially any DNA-encoded function may be contained within a gene cassette.

Present data indicate that gene cassettes, when classified by their 59-be sequence, show at least some partitioning across bacterial species. The first evidence for distinctive patterns of relationship among gene cassettes emerged from studies of integron arrays in Vibrio species.
In these examples the 59-be sites were found to be characteristically long (~130 bp) and showed an unexpectedly close sequence relationship (Mazel et al., 1998; Clark et al., 2000). The demonstrations that the integrons hosting these arrays are fixed in the chromosomes of most Vibrio species, and that there is at least some correlation between 59-be sequence relationships and the species of origin, led Rowe-Magnus et al. (2001) to conclude that certain groups of 59-be sites are characteristic of particular bacterial species. Subsequent data from other bacterial genera have supported the view that cassettes found within the same chromosomal integron show significant similarity of their associated 59-be sites. In this ‘integron/gene cassette relationship’ model, atypical 59-be sequences found in chromosomal integrons are inferred to reflect acquisition of cassettes via HGT.

One of the most significant features of the gene cassettes recovered directly from soils is that the diversity of 59-be sites observed is greater than that observed in any one integron array. The 37 cassette-associated recombination sites recovered here include at least 11 distinct subfamilies (Fig. 2). This is greater than in all three completely sequenced chromosomal integrons. There are four subfamilies from 22 cassettes in X. campestris (one of which has 19 members, the others being unique), three subfamilies from 33 cassettes in Ps. alcaligenes (Vaisvila et al., 2001) and one from 179 in V. cholerae (Heidelberg et al., 2000). It is also much greater than the cumulative total for any single integron class with the exception of class 1. Together these observations indicate that the EGC metagenome sampled by cassette PCR is likely to be partitioned across multiple bacterial species and/or multiple integrons.
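The comparison of subfamily counts can be made concrete with a small calculation. The following Python sketch simply divides the number of 59-be subfamilies by the number of sites for each dataset quoted above; the ratio is a crude illustration introduced here, not a diversity statistic used in this study, and it takes no account of the different sample sizes or of the EGC figure being a lower bound ("at least 11").

```python
# (label, number of 59-be sites or cassettes, number of 59-be subfamilies),
# using the counts quoted in the text.
arrays = [
    ("EGC (cassette PCR)", 37, 11),  # "at least 11" subfamilies
    ("X. campestris", 22, 4),
    ("Ps. alcaligenes", 33, 3),
    ("V. cholerae", 179, 1),
]


def subfamilies_per_site(n_sites: int, n_subfamilies: int) -> float:
    """Crude richness measure: subfamilies observed per site sampled."""
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return n_subfamilies / n_sites


# Rank datasets from most to least subfamilies per site.
ranked = sorted(
    arrays,
    key=lambda row: subfamilies_per_site(row[1], row[2]),
    reverse=True,
)

for label, n_sites, n_sub in ranked:
    ratio = subfamilies_per_site(n_sites, n_sub)
    print(f"{label}: {n_sub}/{n_sites} = {ratio:.3f}")
```

On these counts the environmental sample ranks first and the V. cholerae array last, which is the pattern the paragraph above describes.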
In support of this we have directly recovered diverse integrons from soil by PCR (Nield et al., 2001) and have recently isolated several different species that contain integrons (unpublished). A 59-be site from one such isolate (Ps. stutzeri strain Q) was included in this analysis.

We tested six EGC 59-be sites for recombinational activity with the class 1 integron integrase and recombination sites found in class 1 integrons. Although relatively few in number, these elements were deliberately selected to represent as diverse a range of element types as possible. They included elements of a previously undescribed total length (56 bases) and a previously undescribed right-hand spacer length (nine bases). Despite this, all elements were found to be active. Mapping of co-integrate junctions showed that in all cases recombination was site-specific, preserving the orientation of the gene cassette and therefore its compatibility with the integron-associated promoter Pc. These data indicate that class 1 integrons are inherently capable of acquiring the tested elements and orienting them in such a way that any associated gene could be expressed. They also indicate that the bias towards the capture of cassettes containing antibiotic resistance genes is certainly a result of natural selection, and that such cassettes are being acquired by class 1 integrons from a much larger pool of cassettes.

Our data set represents the closest currently available to a random sampling of natural gene cassette diversity. When viewed together with available data on gene cassettes from clinical environments, or large-scale organism sequencing projects, a number of points become clear. Specific environmental pressures may show correlation with specific cassette-associated genes, as witnessed by the abundance of antibiotic resistance gene cassettes in clinical or animal production environments.
Specific organisms (or integrons) may show correlation with specific subfamilies of recombination sites, as shown by various Vibrio, Pseudomonas and Xanthomonas species. Natural communities contain very high diversity of both recombination sites and cassette-associated genes.

The abundance of gene cassettes, and their capacity to encode diverse types of DNA-related function, indicate that the EGC metagenome represents a fundamental resource for bacteria. Integrons provide a means for bacteria to perform ‘combinatorial genetics’ upon this pool. The association of integrons with other genetic elements such as transposons, plasmids and chromosomes provides both gene cassettes and integrons with routes by which they may travel both within cells and between cells. A number of genetic elements and recombination processes are now known to contribute to the mobilization, transfer and eventual capture of DNA by the receiving cells. To what extent the integron/gene cassette system contributes to the different stages within the total gene flux, in proportion to other systems, is not yet clear. In part, this will depend on the proportion of cells that possess this gene capture system. The abundance of gene cassettes, however, would appear to indicate that the impact of this system on bacterial genome evolution will be substantial.

Experimental procedures

Bacterial strains, plasmids and primers

UB5201 is F– pro met recA56 gyrA; UB1637 is F– his lys trp recA56 rpsL (de la Cruz and Grinsted, 1982). Plasmids and primers used are shown in Table 4 and Table 5 respectively.

Table 4. Plasmids.
Plasmid | Description | Cloned recombination site [a] | Position in pACYC184 [b] | Relevant phenotype | Reference
R388 | 33 kb IncW plasmid containing class 1 integron In3 | N/A | N/A | TpR SuR Tra+ IntI1+ | Avila and de la Cruz (1988)
pMAQ495 | R388 with aphA inserted into intI1 gene | N/A | N/A | TpR SuR KmR Tra+ IntI1– | Collis et al. (1998)
pACYC184 | Cloning vector | N/A | N/A | CmR TcR | Chang and Cohen (1978)
pSU2056 | 1176 bp RsaI-BamH1 fragment of In2 in pUC9 | N/A | N/A | ApR IntI1+ | Martinez and de la Cruz (1990)
pMAQ28 | 400 bp Sau3A-HindIII fragment | aadB/qacE 59-be | BamH1-HindIII (1) | CmR | Hall et al. (1991)
pMAQ653 | 882 bp BamH1 fragment | EGC082 59-be | BamH1 (2) | CmR | This study
pMAQ701 | 300 bp HindIII-BamH1 fragment | EGC099 59-be | BamH1-HindIII (2) | CmR | This study
pMAQ707 | 238 bp HindIII-BamH1 fragment | BGC001 59-be (AY129391) | BamH1-HindIII (2) | CmR | Holmes et al. (2003)
pMAQ710 | 241 bp HindIII-BamH1 fragment | EGC140 59-be (AF421329) | BamH1-HindIII (2) | CmR | This study
pMAQ713 | 271 bp HindIII-BamH1 fragment | EGC068 59-be (AF349098) | BamH1-HindIII (2) | CmR | This study
pMAQ714 | 206 bp HindIII-BamH1 fragment | EGC102 59-be (AF265275) | BamH1-HindIII (2) | CmR | This study
a. Accession numbers for environmental and bacterial gene cassettes from which cloned elements are derived are shown in brackets.
b. Numbers in parentheses represent the orientation of the cloned fragment with respect to pACYC184 as previously defined (Collis et al., 2001).

DNA manipulations

Recovery of gene cassettes from natural environments by PCR, and their cloning and sequencing, have been described (Stokes et al., 2001). The cassette PCR technique may recover partial gene cassettes or gene cassette arrays that include recombination sites. In this study we have considerably expanded the number of 59-be recombination sites recovered from environmental samples through sampling microcosms established from the previously described Balmain, Homebush and Lidsdale soil samples. This expanded dataset enables us to address the recombination activity of environmental gene cassettes for the first time. Details of the microcosm conditions and enrichment are not pertinent to the present data and will be reported elsewhere, or can be obtained upon request.
Polymerase chain reaction conditions for co-integrate mapping were: one cycle of 94°C for 3 min; 35 cycles of 94°C for 30 s, 65°C for 30 s and 72°C for 90 s; and one cycle of 72°C for 5 min. DNA sequencing was performed at the Macquarie Sequencing Facility (Macquarie University, Australia) using an ABI Prism 377 (PE Biosystems).

Table 5. Sequencing and PCR primers.
Primer | Sequence | Position [a]/comment | Accession number/reference
HS286 | 5′GGGATCCTCSGCTKGARCGAMTTGTTAGVC3′ [b] | For cassette PCR. Targets left half of 59-be | Stokes et al. (2001)
HS287 | 5′GGGATCCGCSGCTKANCTCVRRCGTTAGSC3′ [b] | For cassette PCR. Targets right half of 59-be | Stokes et al. (2001)
HS318 | 5′GCTTCATCGCTACTTTG3′ | 815–831 (C). Within dfrB2 gene cassette | J01773
HS319 | 5′GTATGAAGTCTTTGGCG3′ | 282–298. Within orfA cassette | X12869
HS320 | 5′AGTAAAGCCCTCGCTAG3′ | 606–622 (C). Within 3′-conserved segment | X12869
HS457 | 5′CAAATGTAGCACCTGAAGTCAGCCC3′ | 1452–1476. Adjacent to unique HindIII site of pACYC184 | X06403
HS458 | 5′GTTTGATGTTATGGAGCAGCAACG3′ | 648–671. Within 5′-conserved segment | J01773
HS459 | 5′GCAAAAAGGCAGCAATTATGAGCC3′ | 813–836 (C). Within 3′-conserved segment | X12869
HS460 | 5′GGAAGGAGCTGACTGGGTTGAAGG3′ | 2167–2190 (C). Adjacent to unique SalI site of pACYC184 | X06403
a. Numbers refer to location in the cited database entry. (C) indicates sequence is the complementary strand.
b. The first eight bases include a BamH1 linker that is not complementary to targeted sequences.

Conduction assays

The conduction assay was performed as described previously (Collis et al., 2002). Briefly, a donor cell contains three plasmids. One of these is a conjugative plasmid, pMAQ495, that contains the integron In3 (Fig. 1) but with an insertionally inactivated intI1 gene (Table 4). The second plasmid is a derivative of the cloning vector pACYC184 and includes a test recombination site. The third plasmid, pSU2056, supplies highly expressed IntI1 protein in trans. Recombination efficiencies are determined by the frequency with which the test recombination site recombines with one of the three partner recombination sites in In3 (see below). This efficiency is measured as the ratio of the number of co-integrates conducted to a recipient cell, as measured by transfer of chloramphenicol resistance, to the total number of pMAQ495 transconjugants, as measured by transfer of trimethoprim resistance.

Analysis of co-integrates

In3 of pMAQ495 (R388) contains three recombination sites. These are attI1, and the 59-be sites of dfrB2 and orfA (Fig. 1). Insertion of a test element at attI1 separates the dfrB2 gene from the Pc promoter, leading to a TpS phenotype. Consequently the Tp phenotype was used as an indicator of insertion at attI1. However, to accurately and rapidly map co-integrates, a PCR-based strategy was used (Fig. 1). Two primers, one specific for a sequence within pACYC184 of the test element and a second specific for a sequence within In3, were used to directly amplify co-integrate template DNA. The derived PCR product was of a length dependent on the insertion site. The primer pair commonly used was HS457 and HS459. As an example, PCR product lengths involving pMAQ701 were 578 bp for insertion at orfA compared to 1463 bp for insertion at attI1. Polymerase chain reaction products were also used as sequencing templates to identify the recombination crossover point. For HS457/459 products and insertion at orfA, right-hand junctions were sequenced by priming with HS320. For insertion at attI1, the sequencing primer used was HS318. For some co-integrates the left-hand junction was also amplified and sequenced, and this was achieved with HS458 and HS460 as primers. Sequencing primers for these were HS458 for insertion events at attI1 and HS319 for insertion events at orfA.
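The two calculations described above, the conduction frequency and the assignment of an insertion site from PCR product length, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure used in this study: the function names, the 50 bp sizing tolerance and the transconjugant counts in the usage lines are all hypothetical. Only the expected product lengths (578 bp and 1463 bp for pMAQ701 with HS457/HS459) come from the text.

```python
# Expected HS457/HS459 product lengths (bp) for pMAQ701 co-integrates,
# as quoted in the text. Insertion at the dfrB2 59-be is not listed there.
EXPECTED_BP = {"orfA": 578, "attI1": 1463}


def map_insertion(product_bp: int, tolerance_bp: int = 50) -> str:
    """Assign an insertion site from an estimated PCR product length.

    tolerance_bp is an assumed allowance for gel sizing error,
    not a value taken from the paper.
    """
    for site, expected in EXPECTED_BP.items():
        if abs(product_bp - expected) <= tolerance_bp:
            return site
    return "unassigned"


def conduction_frequency(cm_transconjugants: float,
                         tp_transconjugants: float) -> float:
    """Ratio of co-integrates conducted (chloramphenicol-resistant
    transconjugants) to total pMAQ495 transconjugants
    (trimethoprim-resistant transconjugants)."""
    if tp_transconjugants <= 0:
        raise ValueError("total transconjugant count must be positive")
    return cm_transconjugants / tp_transconjugants


# Hypothetical usage: the counts and band sizes below are invented.
print(map_insertion(580))    # close to the orfA product length
print(map_insertion(1450))   # close to the attI1 product length
print(conduction_frequency(2.0e3, 4.0e6))
```

A lookup keyed on expected length keeps the mapping explicit for each test plasmid; products matching neither length are left unassigned rather than forced into a category.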
Nomenclature of gene cassettes and 59-be sites

The recovery of mobile gene cassettes directly from the environment by PCR means that the source organism cannot be identified. Consequently, for such cassettes, we have adopted a nomenclature whereby each cassette is assigned the descriptor ‘EGC’ (environmental gene cassette) followed by a unique numerical code. For PCR products with more than one cassette it is also possible to identify the sequence of a cassette's 59-be as it appears in the linearized form, in all but the last cassette of the recovered array. Consequently, identified 59-be sites are assigned the same descriptor (i.e. EGCxxx) as their cognate cassette. In the linear, integrated form of a gene cassette, the last six bases of the 59-be are derived from the following cassette. Consequently these bases may differ depending on the context of the gene cassette. For consistency all experimental data presented here refer to the ‘linear’ sequence of the element as observed in the cloned fragment unless indicated otherwise. If the sequence of the last six bases of the element in the circular form of the cassette is known, and if they are different to that seen in the linear form of the cassette in a particular array, these differences are noted.

In our studies we are also recovering cassette arrays from bacterial strains that have been cultured directly from the environment. As gene cassettes are mobile and it is not yet clear if any are truly specific to certain bacterial species, we use the descriptor ‘BGC’ (bacterial gene cassette) followed by a unique numerical code (e.g. BGCxxx) to describe such cassettes. For completeness of this study we have included a representative 59-be from a gene cassette in an integron in a Ps. stutzeri strain (Holmes et al., 2003) recovered from the Balmain soil (Stokes et al., 2001).
Sequence accession numbers

Cassettes recovered from Balmain: EGC086 (Accession number AF349106), EGC090 (AF349108), EGC104/EGC105 (AF349111), EGC068 (AF349098), EGC070 (AF349099), EGC072 (AF349100), EGC074 (AF349101), EGC076 (AF349102), EGC078 (AF349103), EGC080 (AF349104), EGC066 (AF349097), EGC064 (AF265272), EGC095/EGC096 (AF349109), EGC092/EGC093 (AF265270); Homebush Bay cassettes: EGC082 (AF265263); Cape Denison cassettes: EGC084 (AF349105), EGC099 (AF349110); Sturt National Park cassettes: EGC101/EGC102 (AF265275). No-orf containing cassettes: EGC049 (AF349081), EGC052 (AF349085), EGC053 (AF349086), EGC054 (AF349087), EGC055 (AF349088), EGC056 (AF349089), EGC057 (AF349090), EGC058 (AF349091), and EGC091 (AF349108). Environmental locations are described in Stokes et al. (2001). Other sequences reported but not specifically discussed can be found in Accession numbers AF421312-AF421335.

Recognition of gene cassettes

Bioinformatics analyses were conducted on BioManager.com provided by ANGIS (http://www.angis.org.au). Gene cassettes in public databases were identified by searches through the NCBI web site (http://www.ncbi.nlm.nih.gov/cgi-bin/Entrez/genom_table_cgi) using various sequences obtained in this study as the query sequence. Sequences were analysed for features associated with gene cassettes, principally an open reading frame and a 59-be. All 59-be sites are imperfect, inverted repeat structures. This common structure is achieved through conservation of a number of positions and features (Fig. 2). Within each half of the element there are 19 positions that covary (i.e. they base pair in the fold-back structure). For ease of description we designate these positions as ‘a’ to ‘s’ on the left and ‘S’ to ‘A’ for the co-varying positions on the right (Fig. 2). The repeat is imperfect as a result of ‘disruptions’ (i.e. mismatches or insertions), and these commonly occur at certain points on both sides.
In total there are 13 of these and they are indicated by an asterisk or ‘+’ symbol in Fig. 2. Seven of these are located in the left half of the element and six in the right half. We define a DNA sequence as a 59-be site if it contains these 51 positions in the relative positions shown in Fig. 2. Known 59-be sites show considerable variation on this model structure. From comparisons of previously described elements as well as for the sequences obtained here it is apparent that variation is largely restricted to a few specific loci within the 59-be sequence and results in retention of the model structure (see Results).

Acknowledgements

Supported by a Research Innovation Fund Grant from Macquarie University. We thank Roberto Anitori and Malcom Walter for providing the DNA from the Flinders Ranges Hot Springs, Clare McInnes for sampling the Yerranderie mine site and Alexandra Kirsten for collection of material from Cape Denison, Antarctica. We thank Ruth Hall for helpful discussions. Didier Mazel provided sequence information from XCR 59-be sites.

References

Avila, P., and de la Cruz, F. (1988) Physical and genetic map of the IncW plasmid R388. Plasmid 20: 155–157.
Chang, A.C.Y., and Cohen, S.N. (1978) Construction and characterization of amplifiable multicopy DNA cloning vehicles derived from the p15A cryptic miniplasmid. J Bacteriol 134: 1141–1156.
Clark, C.A., Purins, L., Kaewrakon, P., Focareta, T., and Manning, P.A. (2000) The Vibrio cholerae O1 chromosomal integron. Microbiology 146: 2605–2612.
Collis, C.M., and Hall, R.M. (1992) Gene cassettes from the insert region of integrons are excised as covalently closed circles. Mol Microbiol 6: 2875–2885.
Collis, C.M., and Hall, R.M. (1995) Expression of antibiotic resistance genes in the integrated cassettes of integrons. Antimicrob Agents Chemother 39: 155–162.
Collis, C.M., Grammaticopoulos, G., Briton, J., Stokes, H.W., and Hall, R.M. (1993) Site-specific insertion of gene cassettes into integrons. Mol Microbiol 9: 41–52.
Collis, C.M., Kim, M.-J., Stokes, H.W., and Hall, R.M. (1998) Binding of the purified integron DNA integrase IntI1 to integron- and cassette-associated recombination sites. Mol Microbiol 29: 477–490.
Collis, C.M., Recchia, G.D., Kim, M.-J., Stokes, H.W., and Hall, R.M. (2001) Efficiency of recombination reactions catalysed by the Class 1 integron integrase IntI1. J Bacteriol 183: 2535–2542.
Collis, C.M., Kim, M.-J., Partridge, S.R., Stokes, H.W., and Hall, R.M. (2002) Characterization of the class 3 integron and the site-specific recombination system it determines. J Bacteriol 184: 3017–3026.
de la Cruz, F., and Grinsted, J. (1982) Genetic and molecular characterization of Tn21, a multiple resistance transposon from R100.1. J Bacteriol 151: 222–228.
Grainge, I., and Jayaram, M. (1999) The integrase family of recombinase: organization and function of the active site. Mol Microbiol 33: 449–456.
Hall, R.M., Brookes, D.E., and Stokes, H.W. (1991) Site-specific insertion of genes into integrons: role of the 59-base element and determination of the recombination cross-over point. Mol Microbiol 5: 1941–1959.
Hall, R.M., Collis, C.M., Kim, M.-J., Partridge, S.R., Recchia, G.D., and Stokes, H.W. (1999) Mobile gene cassettes and integrons in evolution. Ann New York Acad Sci 870: 68–80.
Heidelberg, J.F., Eisen, J.A., Nelson, W.C., Clayton, R.A., Gwinn, M.L., Dodson, R.J. et al. (2000) DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae. Nature 406: 477–483.
Holmes, A.J., Holley, M.P., Mahon, A., Nield, B.S., Gillings, M.R., and Stokes, H.W. (2003) A distinctive and functional integron/gene cassette system present in soil bacterial communities associated with genomic diversity in Pseudomonas stutzeri. J Bacteriol 185: 918–928.
Lan, R., and Reeves, P.R. (2000) Intraspecies variation in bacterial genomes: the need for a species genome concept. Trends Microbiol 8: 396–401.
Martinez, E., and de la Cruz, F. (1990) Genetic elements involved in Tn21 site-specific integration, a novel mechanism for the dissemination of antibiotic resistance genes. EMBO J 9: 1275–1281.
Mazel, D., Dychinco, B., Webb, B., and Davies, J. (1998) A distinctive class of integron in the Vibrio cholerae genome. Science 280: 605–608.
Nield, B.S., Holmes, A.J., Gillings, M.R., Recchia, G.D., Mabbutt, B.C., Nevalainen, K.M.H., and Stokes, H.W. (2001) Recovery of new integron classes from environmental DNA. FEMS Microbiol Letts 195: 59–65.
Ochman, H., Lawrence, J.G., and Groisman, E.A. (2000) Lateral gene transfer and the nature of bacterial innovation. Nature 405: 299–304.
Recchia, G.D., and Hall, R.M. (1995) Gene cassettes: a new class of mobile element. Microbiology 141: 3015–3027.
Rowe-Magnus, D.A., Guerot, A.M., Ploncard, P., Dychinco, B., Davies, J., and Mazel, D. (2001) The evolutionary history of chromosomal super-integrons provides an ancestry for multiresistant integrons. Proc Natl Acad Sci USA 98: 652–657.
da Silva, A.C.R., Ferro, J.A., Reinach, F.C., Farah, C.S., Furlan, L.R., Quaggio, R.B. et al. (2002) Comparison of the genomes of two Xanthomonas pathogens with differing host specificities. Nature 417: 459–463.
Stokes, H.W., and Hall, R.M. (1989) A novel family of potentially mobile DNA elements encoding site-specific gene integration functions: integrons. Mol Microbiol 3: 1669–1683.
Stokes, H.W., O'Gorman, D.B., Recchia, G.D., Parsekhian, M., and Hall, R.M. (1997) Structure and function of 59-base element recombination sites associated with mobile gene cassettes. Mol Microbiol 26: 731–745.
Stokes, H.W., Holmes, A.J., Nield, B.S., Holley, M.P., Nevalainen, K.M.H., Mabbutt, B.C., and Gillings, M.R. (2001) Gene cassette PCR: sequence-independent recovery of entire genes from environmental DNA. Appl Environ Microbiol 67: 5240–5246.
Vaisvila, R., Morgan, R.D., Posfai, J., and Raleigh, E.A. (2001) Discovery and distribution of super-integrons among Pseudomonads. Mol Microbiol 42: 587–601.