Environmental Microbiology (2003) 5(5), 383–394
The gene cassette metagenome is a basic resource for
bacterial genome evolution
Andrew J. Holmes,1† Michael R. Gillings,1 Blair S. Nield,2 Bridget C. Mabbutt,3 K. M. Helena Nevalainen2 and H. W. Stokes2*
1 Key Centre for Biodiversity and Bioresources, Macquarie University, Sydney NSW 2109, Australia.
2 Department of Biological Sciences, Macquarie University, Sydney NSW 2109, Australia.
3 Department of Chemistry, Macquarie University, Sydney NSW 2109, Australia.
Summary
Lateral gene transfer has been proposed as a fundamental process underlying bacterial diversity. Transposons, plasmids and phage are widespread and have been shown to significantly contribute to lateral gene transfer. However, the processes by which disparate genes are assembled and integrated into the host regulatory network to yield new phenotypes are poorly known. Recent discoveries about the integron/gene cassette system indicate it has the potential to play a role in this process. Gene cassettes are small mobile elements typically consisting of a promoterless orf and a recombination site. Integrons are capable of acquisition and re-arrangement of gene cassettes and of the expression of their associated genes. The potential of the integron/gene cassette system is thus largely determined by the diversity contained within the cassette pool and the rate at which integrons sample this pool. We show here, using a polymerase chain reaction (PCR) approach by which the environmental gene cassette (EGC) metagenome can be directly sampled, that this metagenome contains both protein-coding and non-protein-coding genes. Environmental gene cassette-associated recombination sites showed greater diversity than previously seen in integron arrays. Class 1 integrons were shown to be capable of accessing this gene pool through tests of recombinational activity with a representative range of EGCs. We propose that gene cassettes represent a vast, prepackaged genetic resource that could be thought of as a metagenomic template for bacterial evolution.
Received 17 October, 2002; accepted 15 January, 2003. *For correspondence. E-mail [email protected]; Tel. (+612) 9850 8164; Fax (+612) 9850 8245. †Present address: School of Molecular and Microbial Biosciences, The University of Sydney, New South Wales, 2006.
© 2003 Society for Applied Microbiology and Blackwell Publishing Ltd
Introduction
The Bacteria are the most physiologically diverse group
known. Such physiological diversity is underpinned by
corresponding genomic diversity. Given that the typical
bacterial genome size is <10 Mbp, this diversity has arisen
from a remarkably small genomic template. These contrasting observations can be reconciled through the proposition that horizontal gene transfer (HGT) is the major
factor in the evolution of bacteria (Ochman et al., 2000).
There is a large body of evidence supporting this general
thesis. Mechanisms for transfer of genes between cells
are well known and virtually ubiquitous in both a phylogenetic and ecological sense. Genome sequence analyses
have shown that a large proportion of any one bacterial
genome is likely to have been acquired from ‘foreign’
sources. For several complex phenotypes there is strong
evidence that sets of genes were separately acquired by
HGT. A major gap in our knowledge, however, is how
transferred genes are integrated into the metabolism of
the recipient cell.
In recent years the integron/gene cassette system has
emerged as one of the best examples of capture and
expression of new genes (Hall et al., 1999). Integrons
include a site-specific recombination system and were
first identified as the sites of antibiotic resistance gene
capture in mobile elements from clinical isolates (Stokes
and Hall, 1989; Martinez and de la Cruz, 1990; Collis
et al., 1993). The integron is a recombination and expression system that captures genes as part of a genetic
element called a gene cassette (Recchia and Hall, 1995).
Gene cassettes are very simple genetic elements that
typically consist of a single promoterless gene and a
recombination site termed a 59-base element (59-be). In
the well-studied class 1 integrons, the gene capture system consists of a site-specific recombinase (IntI1) and a
recombination site (attI1). IntI1 reversibly catalyses two
types of site-specific recombination reaction. These are
recombination between attI1 and a 59-be, or recombination between two 59-be sites. Collectively these reactions
result in the assembly of new genes downstream of an
integron-associated promoter Pc that directs transcription
of the cassette-associated genes (Stokes and Hall, 1989;
Hall et al., 1991; Collis and Hall, 1992; Collis and Hall,
1995). The arrangement of these features is shown in
Fig. 1. Recently, a similar organization has also been
demonstrated for the class 3 integron (Collis et al., 2002).
Class 1 and 3 integrons thus fulfill the basic requirements
for gene acquisition in the HGT model for evolution of
bacterial physiological diversity. Disparate genes can be
assembled at a specific locus where they are amenable
to regulatory control by the cell.
Recent discoveries have shown that integrons are not
simply part of mobile elements carrying antibiotic resistance genes, but are a distinct type of genetic element
found in a variety of genomic contexts. Integrons and
gene cassette arrays have been sequenced in the chromosomes of Pseudomonas, Vibrio, Xanthomonas and
Shewanella spp. (Heidelberg et al., 2000; Rowe-Magnus
et al., 2001; Vaisvila et al., 2001; da Silva et al., 2002).
Furthermore, intI homologues are present in the unfinished genomes of Treponema denticola, Geobacter sulfurreducens, Acidithiobacillus ferrooxidans, and
Nitrosomonas europaea implying that integrons are also
present in these genomes (Nield et al., 2001). Given the
gene acquisition and expression properties of class 1
and 3 integrons, the discovery that integrons are widespread raises the proposition that they may play a general role in the acquisition of new genes in bacterial
genomes. However, the integron platform is essentially a
simple structure, entirely dependent on gene cassettes
as its substrates. The crucial questions therefore revolve
around the nature of the gene cassette pool and how it
interacts with integrons. Difficulties in sampling or recognizing gene cassettes from outside an integron context have restricted our capacity to address these
questions.
Recognition of gene cassettes independently of integron features requires an objective definition of the 59-be sequence family. Amongst characterized 59-be sites,
length ranges from 57 bp to 145 bp and the pairwise
sequence difference may exceed 70%. A detailed comparison of all cassette-associated recombination sites
available at the time was reported by Stokes et al.
(1997) and they noted a number of conserved features.
These include an overall imperfect inverted repeat structure, each half of which includes a simple site of the type
commonly associated with the tyrosine family of recombinases (Grainge and Jayaram, 1999). 59-be sites
include a core site with the consensus GTTRRRY (designated 1R, Fig. 2) and for recombination events mediated by IntI1, the recombination crossover point is
between the G and first T of this site (Stokes et al.,
1997). Within the 59-be conserved features there is
moderate sequence conservation including eight near-invariant positions (Fig. 2).
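The core-site consensus described above lends itself to a simple pattern search. The following is a minimal sketch, not the procedure used in this study: the function names and the example sequence are hypothetical, only the forward strand is scanned, and a match to GTTRRRY alone does not establish that a sequence is a 59-be.

```python
import re

# IUPAC codes needed for the consensus: R = purine (A/G), Y = pyrimidine (C/T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_core_sites(seq, consensus="GTTRRRY"):
    """Return (start, matched_text) for every match, including overlapping ones."""
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


# Hypothetical test sequence, not taken from the paper.
example = "ACGTTAGGCTTTGTTGAACAA"
for start, site in find_core_sites(example):
    # For IntI1 the crossover lies between the G and the first T of the site,
    # i.e. between 0-based positions start and start + 1.
    print(start, site)
```

Because the consensus is short and degenerate, chance matches are expected in any long sequence, so in practice a hit would need to be evaluated together with the other conserved 59-be features.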
Fig. 1. Structure of In3 and of co-integrates formed with test 59-be sites. The two most common insertion sites into In3 for a test 59-be are
indicated. LHS, Left-hand side. RHS, Right-hand side. Restriction sites are: B, BamHI; H, HindIII; and S, SalI. 3′-CS, 3′-conserved segment.
Pc, promoter for cassette-associated genes. The horizontal arrows indicate the binding sites and direction of synthesis for each of the primers
used in mapping co-integrate junctions. Cm, chloramphenicol
Fig. 2. Alignment of the conserved domains of 59-be sites from environmental cassettes. 59-be recombination site sequence is shown in bold,
with five bases of flanking sequence at each end also shown. The highly variable central regions are not shown, with numbers indicating the
length of the omitted sequence. Sequence depicted in upper case is that predicted for the free, circular form of the cassette. For most sequences
shown here, the last six bases of the 59-be (which also represent the first six bases of the integrated, linear form of the cassette) are derived
from a PCR primer and consequently shown in lower case. Putative IntI binding domains in the left (1L and 2L) and right (1R and 2R) halves of
the elements are indicated by shading and arrows (top). The filled triangles (▼) identify locations at which extra bases are not shown (see also
Fig. 4). The alignment is separated into three groups to accommodate length variation. The ‘position’ lines allow alignment between groups. Upper
and lower case letters in the ‘position’ lines indicate bases that are generally complementary when left and right halves of the element are
compared. The asterisk indicates the extra base in 2L compared to 2R. The ‘+’ symbol indicates positions that are found in all 59-be sequences
but disrupt the repeat structure (see Experimental procedures). Positions that are not common to all 59-be are left blank. Numbers on the left
distinguish each of the 11 identified subfamilies.
We have recently demonstrated that use of degenerate
primers targeting the conserved regions of 59-be sites in
PCR with environmental DNA samples results in recovery of diverse sequences showing characteristics of gene
cassettes (Stokes et al., 2001). In total, 123 predicted
gene cassettes were recovered in that study. Here we
have confirmed these sequences as gene cassettes.
Thus the ‘cassette PCR’ technique allows us to address a
number of questions regarding the nature of the gene
cassette pool for the first time. Here we describe the
characterization of a further 41 environmental gene cassettes and their sequence relationships
to gene cassettes from other sources,
and demonstrate their ability to be recruited by class 1
integrons.
Results
PCR recovery of gene cassettes
A total of 57 cloned amplicons from soil microcosm samples were analysed.
These were derived by PCR with the primers HS286 and HS287, which predominantly target
cassettes contained within linear arrays.
All showed the characteristics expected of cassette PCR products (Stokes et al.,
2001). Of these, 38 represented amplification of a single
gene cassette, and 19 the amplification of arrays of two,
three or four cassettes in tandem. Many gene cassettes
were sampled more than once (either singly or as part of
an array), thus this dataset resulted in a total of 41 distinct
cassettes (EGC111 – EGC151, see Experimental procedures). When pooled with our previous study the total
number of gene cassettes directly sampled from natural
environments via cassette PCR is 164.
Gene cassette content: protein-coding orfs
The vast majority of experimentally characterized gene
cassettes contain promoterless protein coding orfs. Our
Environmental Gene Cassette (EGC) dataset and gene
cassettes recovered as part of large scale sequencing
projects contain a disproportionately high fraction of
novel sequences. This complicates the task of predicting
coding sequences. Where alternative reading frames are
found by software programs the cassette boundaries can
assist in predicting the ‘true’ orf. Regardless of the difficulties in prediction of the coding content of the gene
cassettes it is evident that the spectrum of proteins harboured in the total gene cassette pool is extraordinarily
diverse. Of the 142 hypothetical proteins currently in the
EGC data set only 24 (17%) show sequence relationship
to any previously described protein and, of these, 17 are
similar only to ‘hypothetical proteins’ (Table 1). This same
pattern is also true of other bacterial gene cassette pools
outside of the antibiotic resistance context including the
Xanthomonas campestris pv campestris, Pseudomonas
alcaligenes and Vibrio cholerae chromosomes (Heidelberg et al., 2000; Vaisvila et al., 2001; da Silva et al.,
2002).
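The proportions quoted above follow directly from the reported counts. As a small worked check (the variable names are ours, and the counts are those given in the text):

```python
# Counts as reported in the text for the EGC dataset.
total_hypothetical_proteins = 142
with_database_match = 24
match_only_hypothetical = 17


def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


matched_pct = percent(with_database_match, total_hypothetical_proteins)
novel_pct = percent(
    total_hypothetical_proteins - with_database_match,
    total_hypothetical_proteins,
)
# Matches for which the database hit carries a functional annotation.
with_predicted_function = with_database_match - match_only_hypothetical

print(matched_pct, novel_pct, with_predicted_function)
```

On these figures only seven predicted proteins in the dataset match a database entry with an assigned function, which agrees with the number of rows in Table 1 that are not annotated as hypothetical proteins.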
No obvious hints of a general role for cassette-associated proteins could be predicted. Predicted protein sizes
in the EGC pool range from 36 to 346 amino acids.
The size distribution is skewed, with ∼60% of
all hypothetical proteins in the size range of 70–140
amino acids. Only 10% were greater than 200 amino
acids. However, in general, the size range of predicted
proteins matches that of biologically active peptides that
are ribosomally synthesized. Similarly, calculations of
hydrophobicity values show an essentially normal distribution indicating the pool of proteins is unlikely to show
any marked bias towards membrane or cytoplasmic
location.
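The kind of summary described in this paragraph can be sketched as follows. This is an illustration under stated assumptions rather than the calculation performed here: the text does not name the hydrophobicity scale, so the widely used Kyte-Doolittle hydropathy values are assumed, and the function names and example inputs are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not specified in the text).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(protein):
    """Average hydropathy per residue; positive values are more hydrophobic."""
    residues = protein.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def fraction_in_size_range(lengths, low, high):
    """Fraction of protein lengths (in amino acids) with low <= length <= high."""
    inside = sum(1 for n in lengths if low <= n <= high)
    return inside / len(lengths)


# Hypothetical inputs for illustration only.
print(mean_hydropathy("MKLVIA"))
print(fraction_in_size_range([36, 81, 101, 133, 346], 70, 140))
```

A strongly membrane-biased protein pool would be expected to shift the distribution of such averages towards positive values, which is the pattern the paragraph reports as absent.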
The lack of pattern in size and physicochemical properties of hypothetical proteins is matched by the diversity
of biological activities in those cassettes that have been
characterized or which show homology to characterized
proteins. Most genes within cassettes from class 1 integrons
encode antibiotic resistance. However, the mode of
resistance varies tremendously (Recchia and Hall, 1995).
Gene cassettes identified from genome sequencing or
environmental contexts encode diverse properties including lipases, restriction endonucleases, transport proteins,
toxins and surface antigens (Clark et al., 2000; Vaisvila
et al., 2001; Rowe-Magnus et al., 2001; Stokes et al.,
2001).
Table 1. Cassette gene products with database matches.
Gene product     Top database hit                          %Identity/%similarity   Predicted function
orf297_EGC010    Bacillus subtilis (CAB15191)              28/48                   Hypothetical protein
orf101_EGC017    Caulobacter crescentus (AAK23946)         61/74                   Hypothetical protein
orf117_EGC020    Bacteriophage 933 W (AAD25429)            41/56                   Hypothetical protein
orf346_EGC034    Pseudomonas syringae (ZP_00124941)        27/49                   Aminoglycoside phosphotransferase
orf271_EGC035    Wolinella succinogenes (CAC50085)         31/52                   Sulphur transferase
orf81_EGC044     Nitrosomonas europaea (ZP_00003253)       91/96                   Possible toxin antidote protein
orf113_EGC044    Clostridium thermocellum (ZP_00061925)    45/60                   PemK family
orf132_EGC064    Pseudomonas aeruginosa (AAG07752)         49/61                   Hypothetical protein
orf208_EGC067    Thermus thermophilus (BAB17605)           30/53                   RNA methyl transferase
orf133_EGC068    Mycobacterium tuberculosis (AAK44262)     32/50                   Hypothetical protein
orf90_EGC101     Nostoc punctiforme (ZP_00108370)          71/78                   Hypothetical protein
orf105_EGC103    Mycobacterium tuberculosis (AAK46615)     33/52                   Hypothetical protein
orf147_EGC162    Nitrosomonas europaea (ZP_00003997)       68/80                   Pyrimidine dimer DNA glycosylase
orf168_EGC027    Pasteurella multocida (AAK03695)          27/52                   Hypothetical protein
orf209_EGC029    Xanthomonas campestris (AAM39391)         27/46                   Hypothetical protein
orf154_EGC030    Shewanella oneidensis (AAN56682)          50/61                   Hypothetical protein
orf135_EGC079    Agrobacterium tumefaciens (AAK86432)      34/60                   Hypothetical protein
orf139_EGC084    Oceanobacillus iheyensis (BAC15208)       29/54                   Hypothetical protein
orf159_EGC115    Bacillus anthracis (NP_656837)            36/58                   Hypothetical protein
orf174_EGC125    Bacillus anthracis (NP_656524)            36/54                   Hypothetical protein
orf125_EGC139    Caulobacter crescentus (AAK24858)         42/63                   Bleomycin resistance
orf161_EGC159    Brucella suis (AAN30397)                  27/57                   Hypothetical protein
Orf110_EGC088    Xylella fastidiosa (AAF85305)             17/41                   Hypothetical protein
Orf117_EGC148    Synechocystis sp. PCC6803 (BAA18636)      34/62                   Hypothetical protein
Other notable hits: Orf360_EGC124 is in the reverse orientation. This orf has strong similarity (42% identity, 65% similarity) to cheA (AAK78103).
It is likely that this cassette is a non-specific product. Orf196_EGC129 shows strong identity over the first 36 residues only to a diverse range of
peptides including HI1126 (AAC22780). It is possible this region is a common leader sequence.
Gene cassette content: non-protein-coding sequences
Some gene cassettes appear to have a biological role
other than protein-coding. EGC104 is the first cassette of
an array in the clone Bal48 (AF349111). This cassette
contains 178 bp that are not part of its 59-be. Several
observations indicate that this sequence is non-protein-coding.
First, although there is one reading frame in each of the forward and reverse
orientations that comprises an uninterrupted stretch of sense codons, neither of these frames
has plausible start or stop codons within the cassette.
Second, the EGC104 cassette content shows significant
sequence relationship to three other cassettes (EGC048,
EGC050, EGC051). Collectively the contents of these four
cassettes comprise a sequence family sharing 61–94%
identity (data not shown), yet none contains obvious protein-coding orfs.
Third, despite the DNA sequence conservation across the family, neither of the possible orfs in
EGC104 is conserved in other family members. Finally,
sequence conservation is particularly strong through the
central 70 bp of the cassette content and this region forms
an imperfect inverted repeat. The pattern of family
sequence conservation indicates that sequence structure,
rather than coding potential, is more biologically relevant.
We conclude that members of this sequence family have
a role other than encoding a protein.
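The first of these observations rests on a six-frame inspection for reading frames bounded by start and stop codons. A minimal sketch of such a scan is given below. It is not the software used in this study, the function names and the `min_codons` threshold are hypothetical, and for simplicity only ATG is treated as a start codon, although bacterial genes also initiate at GTG and TTG.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame, min_codons):
    """Yield (start, end) for ATG...stop reading frames in one forward frame."""
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i  # first in-frame start codon opens a candidate orf
        elif start is not None and codon in STOPS:
            # Number of codons before the stop, counting the start codon.
            if (i - start) // 3 >= min_codons:
                yield (start, i + 3)
            start = None


def find_orfs(seq, min_codons=30):
    """Scan all six frames; coordinates are relative to the scanned strand."""
    seq = seq.upper()
    found = []
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end in orfs_in_frame(template, frame, min_codons):
                found.append((strand, frame, start, end))
    return found


# Hypothetical toy sequence: one short orf on the forward strand.
print(find_orfs("ATGGCTGCTGCTTAA", min_codons=2))
```

A frame that is free of stop codons but lacks an in-frame start and stop within the cassette, as described for EGC104, yields no result from a scan of this kind.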
A similar situation is seen for the cassette content of
EGC091. The complete cassette was recovered as part
of an array in clone Bal33 (AF349108). It shows significant
DNA identity to other sequences from various environments (two soils and a hot spring). These comprise at
least nine distinct EGC types, including EGC049,
EGC052, EGC053, EGC054, EGC055, EGC056,
EGC057, EGC058 and EGC091. For brevity, only three
of these representing the range of sequence divergence,
are shown in Fig. 3. Members of the family are 308–
330 bp in length and in all cases stop codons are prevalent, precluding the presence of protein-coding orfs
across the sequence family. Pairwise sequence identity
ranges from 62 to 91%. Noteworthy features of the
EGC091 family are that poly A/poly T tracts are prominent
(16 occurrences) and that the predicted RNA (if transcribed) contains a number of sequence domains likely
to have a stable secondary structure.
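Pairwise identity ranges of the kind quoted for this family can be computed from a multiple alignment. The sketch below assumes one common convention, in which columns with a gap in either sequence are ignored; the text does not state which convention was used, and the function names and toy alignment are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def identity_range(aligned):
    """Return (lowest, highest) pairwise identity for a dict of aligned sequences."""
    values = [
        percent_identity(aligned[p], aligned[q])
        for p, q in combinations(sorted(aligned), 2)
    ]
    return min(values), max(values)


# Hypothetical toy alignment, not sequences from the EGC091 family.
toy = {"seqA": "ACGTTTTA-C", "seqB": "ACGTTTAA-C", "seqC": "ACCTTTTAGC"}
print(identity_range(toy))
```

Other conventions, such as counting gap columns as mismatches, give lower values, so the convention should be fixed before comparing ranges between families.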
Fig. 3. Alignment of three representative members of the EGC091 sequence family. Positions that are universal in all nine members of the
sequence family are shown in the consensus as upper case. Those that are strongly conserved (> 75% identity) across all members of the
sequence family are shown in lower case. Where no data is given in the consensus sequence there is not significant sequence conservation
across all members of the sequence family. Members of the sequence family are EGC049, EGC052, EGC053, EGC054, EGC055, EGC056,
EGC057, EGC058 and EGC091.
Sequence relationships of ‘environmental’ 59-be sites
In total 37 inverted repeat structures were identified within
the recovered environmental clones that were consistent
with the generalized structure for 59-be sites as outlined
in Experimental procedures (Fig. 2). There was nonetheless considerable sequence and structural variation
between elements.
The most notable variable feature of 59-be sites is the
length and sequence of the region separating the two
halves of the repeat. This central region ranges in length
from 3 to 79 bases and lies between positions ‘s’
and ‘S’ in the generalized structure (Fig. 2). None, part,
or all of this region may be part of the inverted repeat
structure. Another noteworthy variation is the presence of
‘asymmetric’ sequence insertions in some 59-be sites
(triangles in Fig. 2). These appear to only occur at specific loci that are internal to the conserved ‘paired’
regions making the inverted repeat structure asymmetric.
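The inverted repeat organization discussed here can be examined by pairing bases from the two halves of an element as in a foldback drawing. The following is a simplified sketch with hypothetical function names and toy sequences. It pairs positions strictly by distance from the centre, so it does not handle the asymmetric insertions described above, which would require a gapped alignment of the two halves.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def foldback_pairs(left, right):
    """Pair left[i] with the i-th base from the 3' end of right.

    Returns one boolean per pair, True where the bases are complementary.
    """
    right_reversed = right.upper()[::-1]
    return [(l, r) in WATSON_CRICK for l, r in zip(left.upper(), right_reversed)]


def complementarity(left, right):
    """Fraction of paired positions that are complementary."""
    flags = foldback_pairs(left, right)
    return sum(flags) / len(flags) if flags else 0.0


# Hypothetical halves: a perfect and an imperfect inverted repeat.
print(complementarity("GTTAGC", "GCTAAC"))
print(foldback_pairs("GTTAGC", "GCTTAC"))
```

An imperfect inverted repeat, the usual case for 59-be sites, gives a fraction below 1, with the mismatched positions identified by the False entries.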
It is this sequence variation that makes evaluation of relationships between members of the 59-be family difficult.
Estimation of evolutionary distances from sequence analyses requires the comparison of orthologous positions.
The conservation of structure indicates 59-be sites constitute an orthologous sequence family. However, it is
unlikely that positions outside the core 51 positions are
orthologous across the sequence family. Furthermore,
within the 51 core positions the structural constraints are
such that this short sequence contains no effective ‘phylogenetic signal’.
A consequence of this heterogeneity is that inferring
evolutionary relationships across the whole 59-be family
is not possible. Nevertheless, evolutionarily distinct subfamilies can be recognized on the basis of heterologous
sequence features. That is, any group of sequences containing a sequence insertion that is heterologous with
respect to all other members of the family represents an
evolutionarily distinct group (such groups are not necessarily monophyletic). The ‘PAR signature’ described for
some Ps. alcaligenes gene cassettes is such an example
(Vaisvila et al., 2001). On the basis of ‘heterologous
insertions’ our EGC dataset includes at least 11 distinct
subfamilies of 59-be sites (Fig. 2). In each of these families the inserted sequence (with respect to the core
structure) is either at a different locus to all other examples or is differentiated by length and structure. Only
three of these subfamilies are presently found in class 1
integrons.
Examples of 59-be sequence variation, representing
the range observed in the EGC dataset, are shown in
Fig. 2. Of note are the 59-be sites associated with
EGC099 (shorter than previously assayed elements),
EGC068 (greatest divergence from the canonical structure), EGC102 (large insertion between halves), and
BGC001 (containing an insertion introducing asymmetry).
These 59-be sites are ‘extremes’ of the diversity observed
and to date no members of these subfamilies have been
found in gene cassettes encoding antibiotic resistance
genes.
59-be sites from environmental DNA samples are active
recombination sites
59-be sites from clinical isolates are active recombination sites recognizable by integron integrases. Of the
considerable number of 59-be sites tested for activity
with IntI1, all have been found, with varying levels of
efficiency, to be functional (Martinez and de la Cruz,
1990; Hall et al., 1991; Collis et al., 1993; 2001; Stokes
et al., 1997). In addition, activity with the class 3 integrase, IntI3, has also been demonstrated (Collis et al.,
2002). To determine whether the 59-be sites identified in
environmental clones are active recombination sites, six
of them (Fig. 4), EGC099, EGC082, EGC102, EGC140,
EGC068 and BGC001 were tested in conduction assays
to determine if they could be recognized by the class 1
integrase, IntI1. Tested elements were selected to represent the diversity of elements recovered from the environments examined. The elements associated with
EGC082 and EGC140 are similar to the well-studied
and highly active aadB 59-be element in terms of both
length and sequence. In contrast, the EGC099 59-be, at
56 bases is the shortest element of the 59-be family
seen to date while EGC102 is an example of an element
that groups with members that are of greater length.
EGC068 is similar in sequence to other elements recovered from Balmain but is noteworthy in that it has a nine
base right hand simple site spacer (Fig. 2). This is the
only element known to have a spacer of this length and
contrasts with the seven or eight bases for all other elements (Stokes et al., 1997). A sixth element was also
tested. This element, BGC001 (Fig. 4), was from a cassette within an array in a strain of Pseudomonas stutzeri
from a soil enrichment culture and is noteworthy in that
it possesses the PAR signature (Vaisvila et al., 2001 and
Fig. 4) associated with elements from Pseudomonas
species.
All tested elements were functional (Table 2). Activity
levels varied however, with the three shortest elements
EGC099, EGC082 and EGC140 included in a group of
five that were the most active and comparable to that of
the highly active aadB 59-be. A fourth, longer element of
102 bases, EGC102, also fell within this highly active
group, as did the element from Ps. stutzeri. EGC068, the
element with a nine-base right-hand spacer, was the least
active, at a level about 10- to 100-fold below the others. It
nonetheless had a level of activity 50-fold above a no-element control.
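The fold differences quoted in this paragraph can be reproduced from the average conduction frequencies listed in Table 2. The short calculation below is a worked check only; the dictionary and function names are ours, and the values are the Table 2 averages.

```python
# Average conduction frequencies as listed in Table 2.
average_frequency = {
    "aadB/qacE": 1.1e-2,
    "EGC099": 1.9e-2,
    "EGC082": 8.9e-3,
    "EGC102": 6.1e-3,
    "EGC140": 6.1e-3,
    "BGC001": 1.3e-3,
    "EGC068": 1.5e-4,
    "none": 2.9e-6,  # pACYC184 without a test element
}


def fold_over(element, reference="none"):
    """Ratio of the average frequency of an element to that of a reference."""
    return average_frequency[element] / average_frequency[reference]


# EGC068 relative to the no-element control (about 50-fold).
print(round(fold_over("EGC068")))

# Every other test element relative to EGC068 (roughly 10- to 100-fold).
for name in ("BGC001", "EGC140", "EGC102", "EGC082", "aadB/qacE", "EGC099"):
    print(name, round(fold_over(name, "EGC068"), 1))
```

The ratios are computed from averages of assays whose individual ranges span about an order of magnitude, so they should be read as approximate.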
Table 2. Conduction frequency of 59-base elements from environmental samples.
Plasmid     Test element   Fragment lengthᵃ   Element length   Range                        Average frequencyᵇ
pMAQ28      aadB/qacE      202/198            60               4.5 × 10⁻³ to 1.6 × 10⁻²ᶜ    1.1 × 10⁻² (5)ᶜ
pMAQ701     EGC099         101/197            56               3.2 × 10⁻³ to 5.6 × 10⁻²     1.9 × 10⁻² (12)
pMAQ653     EGC082         486/383            60               4.1 × 10⁻³ to 1.3 × 10⁻²     8.9 × 10⁻³ (6)
pMAQ713     EGC102         164/95             102              1.7 × 10⁻³ to 9.6 × 10⁻³     6.1 × 10⁻³ (7)
pMAQ714     EGC140         84/110             60               7.2 × 10⁻⁴ to 1.4 × 10⁻²     6.1 × 10⁻³ (5)
pMAQ707     BGC001         124/114            77               2.1 × 10⁻⁴ to 2.5 × 10⁻³ᶜ    1.3 × 10⁻³ (6)ᶜ
pMAQ710     EGC068         142/87             73               4.1 × 10⁻⁵ to 2.6 × 10⁻⁴     1.5 × 10⁻⁴ (7)
pACYC184    none           N/A                N/A              8.8 × 10⁻⁷ to 4.5 × 10⁻⁶     2.9 × 10⁻⁶ (4)
a. Numbers refer to nucleotides in the cloned fragment to the left and right of the recombination crossover point.
b. Values for test elements are derived from at least three independent donor constructs with the number of assays shown in brackets.
c. Values for pMAQ28 and pMAQ707 are from Collis et al. (2001) and Holmes et al. (2003) respectively.
Fig. 4. 59-be sites tested for recombination activity shown as foldbacks to highlight their inverted repeat structure. Colons indicate complementary bases. Sequences shown are as they appear in the linear array from which they are derived and as tested in conduction assays. Both EGC068 and BGC001 contain an insert in the left side of the element compared to the right side. The positions of these extra bases are indicated by the vertical arrows. In EGC068 the insertion is TAG and in BGC001 it is TCGCTCGCCTCGCTCACT.
Analysis of recombination events
To investigate the recombination events involving each of
the environmental test elements, the sensitivity of co-integrates to trimethoprim (Tp) was determined (Table 3).
Tp sensitivity implies insertion at attI1 since the dfrB2
gene is separated from the Pc promoter on which it
depends for expression (Fig. 1). The percentage of Tp-sensitive
co-integrates was between 81 and 94, indicating
a strong preference for insertion at attI1. These values are
consistent with those seen previously for 59-be sites from
antibiotic resistance cassettes when cloned into
pACYC184 in orientation 2 (Collis et al., 2001).
The insertion site of several co-integrates was further
analysed by PCR mapping (Experimental procedures). In
total, 69 co-integrates were mapped and in all cases the
length of the PCR product was consistent with insertion
at either attI1 or orfA (Table 3). This mapping was also
consistent with the Tp phenotype in that co-integrates
mapping to attI1 were Tp sensitive and those mapping to
orfA were Tp resistant. No insertion events were found at
dfrB2, a result also seen previously for 59-be sites from
antibiotic resistance cassettes where non-attI1 insertion
events favour orfA (Collis et al., 2001).
To confirm the PCR mapping data and further investigate the nature of the recombination events involving the
environmental 59-be sites, the junctions of several of
these 69 co-integrates were sequenced. In total, 10 independent co-integrates were sequenced at both the left and
right junctions (Fig. 1) and a further 12 independent co-integrates were sequenced at one junction. In all cases
(Table 3), the recombination crossover point could be
localized to a region of between four (BGC001 versus
attI1) and seven (EGC099 or EGC068 versus attI1) bases
that included the invariant GTT of the 1R core site (Fig. 2).
Consequently it is likely that the IntI1-mediated recombination events involving these environmental 59-be sites are
the same as those previously described for 59-be sites from
antibiotic resistance cassettes, where the recombination
crossover has been shown to occur between the G and
first T of the 1R core site (Stokes et al., 1997).
Table 3. Characteristics of co-integrates formed with environmental 59-be sites and In3.
Test element   Percentageᵃ TpS   PCR mappingᵇ               Junction sequencingᶜ: attI   Junction sequencingᶜ: orfA
EGC082         94                9/11 attI; 2/11 orfA       GTTAG (1/1)                  GTTAGA (0/1)
BGC001         88                6/9 attI; 3/9 orfA         GTTA (3/3)                   NE
EGC099         81                5/9 attI; 4/9 orfA         GTTAGGC (2/0)                CGTTAG (0/1)
EGC068         88                8/16 attI; 8/16 orfA       GTTAGGC (3/0)                CGTTAG (0/1)
EGC102         90                11/12 attI; 1/12 orfA      GTTAG (0/2)                  NE
EGC140         94                11/12 attI; 1/12 orfA      GTTAG (1/2)                  CGTTAG (0/1)
a. At least 100 co-integrates were tested from a minimum of four independent crosses.
b. Mapped co-integrates are derived from at least three independent crosses. Co-integrates were otherwise selected randomly except for BGC001
and EGC068 for which co-integrates were selected on the basis of their Tp phenotype.
c. Co-integrates sequenced were selected from those identified as being either attI1 or orfA insertions by PCR mapping. Where more than one co-integrate
was sequenced for a particular element at the same insertion point (attI1 or orfA), replicates are derived from independent crosses. Sequence
shown indicates the region at and around the core site to which the recombination crossover point can be defined (i.e. the two recombining
molecules are identical in the core site region indicated). Numbers in brackets before and after the slash (/) refer to the number of co-integrates
sequenced at both junctions and at one junction respectively. NE = not examined.
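The totals quoted in this section can be cross-checked against the per-element entries of Table 3. The tally below is a bookkeeping sketch only; the data structure and variable names are ours, and the numbers are those listed in Table 3.

```python
# Table 3 entries per test element.
# pcr: (mapped to attI, mapped to orfA)
# seq: for each insertion point, (sequenced at both junctions, at one junction);
#      insertion points that were not examined (NE) are omitted.
table3 = {
    "EGC082": {"pcr": (9, 2), "seq": {"attI": (1, 1), "orfA": (0, 1)}},
    "BGC001": {"pcr": (6, 3), "seq": {"attI": (3, 3)}},
    "EGC099": {"pcr": (5, 4), "seq": {"attI": (2, 0), "orfA": (0, 1)}},
    "EGC068": {"pcr": (8, 8), "seq": {"attI": (3, 0), "orfA": (0, 1)}},
    "EGC102": {"pcr": (11, 1), "seq": {"attI": (0, 2)}},
    "EGC140": {"pcr": (11, 1), "seq": {"attI": (1, 2), "orfA": (0, 1)}},
}

total_mapped = sum(sum(row["pcr"]) for row in table3.values())
both_junctions = sum(
    both for row in table3.values() for both, _ in row["seq"].values()
)
one_junction = sum(
    one for row in table3.values() for _, one in row["seq"].values()
)

print(total_mapped, both_junctions, one_junction)
```

The three sums reproduce the 69 mapped co-integrates, the 10 sequenced at both junctions and the 12 sequenced at one junction reported in the text.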
Discussion
The properties of integrons and gene cassettes indicate
that these elements have the potential to play a broader
role in bacterial evolution. Given that the integron is a
relatively simple structure the significance of the integron/
gene cassette system is inextricably linked to the nature
of the mobile gene cassette pool. Of particular importance
here is the diversity of cassette-associated genes, the
distribution of gene cassettes, and the ability of different
integrons to exploit gene cassettes. We have previously
reported that primers can be used in ‘cassette PCR’ to
recover intact genes from environmental DNA and that
this technique taps a very large genetic resource (Stokes
et al., 2001). In this paper we confirm that cassette PCR
samples an environmental gene cassette ‘metagenome’
which is accessible to class 1 integrons.
Even on the basis of the present, limited dataset it is
evident that the EGC metagenome sampled by cassette
PCR shows remarkable diversity in both 59-be sites and
cassette content. The majority of EGCs include protein-coding orfs. It appears that the nature of the cassette-encoded proteins is different to typical protein-coding
genes found in bacteria. The majority of cassette-encoded
proteins represent novel families and no genes encoding
enzymes of central metabolic pathways were found. In this
respect the EGC metagenome is markedly different from
any bacterial genome characterized to date. However, the
lack of genes of central metabolism could simply reflect
that, in terms of overall bacterial genetic diversity, such
genes are a minor component. Indeed, genome sequencing projects have indicated that two strains of the same
species may diverge considerably and that this primarily
reflects genes outside of central metabolism. Given the
very large size of bacterial populations it is not unreasonable to expect a high proportion of novelty in the ‘species
genome’ (Lan and Reeves, 2000). Consideration of the
diversity of orf sizes, inferred physicochemical properties
and predicted functions suggests that any protein may be
encoded within a gene cassette. Evaluation of this possibility will require large scale sequencing of the EGC
metagenome.
It is clear that non-protein coding DNA, including features
such as binding sites for regulatory proteins and small
RNAs, is an important part of bacterial genomes. If the
EGC pool is a fundamental resource for bacteria, whereby
mobilised genes facilitate genome evolution, we might
expect it to contain a significant proportion of such features.
Gene cassettes that do not contain obvious protein-coding
orfs occur in a V. cholerae chromosomal integron (Heidelberg et al., 2000) and are present in this EGC dataset.
One noteworthy observation here is that the cassette content of the EGC105 and EGC091 families is characteristically not protein-coding. Two factors suggest that these
sequences encode some biological activity. First, non-identical members of the families were repeatedly isolated
from several separate environmental DNA samples (four
for the EGC091 family). Second, the strong conservation
of both sequence families implies selective constraints on
these sequences perhaps suggesting that they represent
a family of transcribed RNAs. These observations raise
the possibility that essentially any DNA-encoded function
may be contained within a gene cassette.
Present data indicate that gene cassettes, when classified by their 59-be sequence, show at least some partitioning across bacterial species. The first evidence for
distinctive patterns of relationship among gene cassettes
emerged from studies of integron arrays in Vibrio species.
In these examples the 59-be sites were found to be characteristically long (∼ 130 bp) and showed unexpectedly
close sequence relationship (Mazel et al., 1998; Clark
et al., 2000). The demonstrations that the integrons hosting these arrays are fixed in the chromosomes of most
Vibrio species, and that there is at least some correlation
between 59-be sequence relationships and the species of
origin, led Rowe-Magnus et al. (2001) to conclude that
certain groups of 59-be sites are characteristic of particular bacterial species. Subsequent data from other bacterial genera have supported the view that cassettes found
within the same chromosomal integron show significant
similarity of their associated 59-be sites. In this ‘integron/
gene cassette relationship’ model, atypical 59-be
sequences found in chromosomal integrons are inferred
to reflect acquisition of cassettes via HGT.
One of the most significant features of the gene cassettes recovered directly from soils is that the diversity of
59-be sites observed is greater than that observed in any
one integron array. The 37 cassette-associated recombination sites recovered here include at least 11 distinct
subfamilies (Fig. 2). This is greater than in all three completely sequenced chromosomal integrons. There are four
subfamilies from 22 cassettes in X. campestris (one of
which has 19 members, the others being unique), three
subfamilies from 33 cassettes in Ps. alcaligenes (Vaisvila
et al., 2001) and one from 179 in V. cholerae (Heidelberg
et al., 2000). It is also much greater than the cumulative
total for any single integron class with the exception of
class 1. Together these observations demonstrate that the
EGC metagenome sampled by cassette PCR is likely to
be partitioned across multiple bacterial species and/or
multiple integrons. In support of this we have directly
recovered diverse integrons from soil by PCR (Nield et al.,
2001) and have recently isolated several different species
that contain integrons (unpublished). A 59-be site from
one such isolate (Ps. stutzeri strain Q) was included in
this analysis.
We tested six EGC 59-be sites for recombinational
activity with the class 1 integron integrase and recombination sites found in class 1 integrons. Although a relatively limited number, these elements were deliberately
selected to represent as diverse a range of element types
as possible. These included elements of a previously
undescribed total length (56 bases) and previously undescribed right hand spacer length (nine bases). Despite this,
all elements were found to be active. Mapping of co-integrate junctions showed that in all cases recombination
was site-specific, preserving the orientation of the gene
cassette and therefore its compatibility with the integron-associated promoter Pc. These data indicate that class 1
integrons are inherently capable of acquiring the tested
elements and orienting them in such a way that any associated gene could be expressed. They also indicate that the bias towards
the capture of antibiotic resistance cassettes is
certainly a result of natural selection and that such cassettes are being acquired by class 1 integrons from a
much larger pool of cassettes.
Our data set represents the closest currently available
to a random sampling of natural gene cassette diversity.
When viewed together with available data on gene cassettes from clinical environments, or large-scale organism
sequencing projects, a number of points become clear.
Specific environmental pressures may show correlation
with specific cassette-associated genes, as witnessed by
the abundance of antibiotic resistance gene cassettes in
clinical or animal production environments. Specific
organisms (or integrons) may show correlation with specific subfamilies of recombination sites, as shown by various Vibrio, Pseudomonas and Xanthomonas species.
Natural communities contain very high diversity of both
recombination sites and cassette-associated genes. The
abundance of gene cassettes and their capacity to
encode diverse types of DNA-related function indicate that
the EGC metagenome represents a fundamental
resource for bacteria. Integrons provide a means for bacteria to perform ‘combinatorial genetics’ upon this pool.
The association of integrons with other genetic elements
such as transposons, plasmids and chromosomes provides both gene cassettes and integrons routes by which
they may travel both within cells and between cells.
A number of genetic elements and recombination processes are now known to contribute to the mobilization,
transfer and eventual capture of DNA by the receiving
cells. To what extent the integron/gene cassette system
contributes to the different stages within the total gene flux
in proportion to other systems is not yet clear. In part, this
will depend on the proportion of cells that possess this
gene capture system. The abundance of gene cassettes,
however, would appear to indicate that the impact of this
system on bacterial genome evolution will be substantial.
Experimental procedures
Bacterial strains, plasmids and primers
UB5201 is F– pro met recA56 gyrA; UB1637 is F– his lys trp
recA56 rpsL (de la Cruz and Grinsted, 1982). Plasmids and
primers used are shown in Table 4 and Table 5 respectively.
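The two cassette PCR primers listed in Table 5 (HS286 and HS287) are written with IUPAC ambiguity codes, so each name stands for a pool of oligonucleotides rather than a single sequence. As a minimal sketch, the size of each pool can be counted from the published sequences; the function name and the lookup table below are illustrative and are not part of the original methods.

```python
# Number of bases represented by each IUPAC nucleotide code.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer: str) -> int:
    """Return how many distinct sequences a degenerate primer represents."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count

# Primer sequences as printed in Table 5 (5' to 3').
HS286 = "GGGATCCTCSGCTKGARCGAMTTGTTAGVC"
HS287 = "GGGATCCGCSGCTKANCTCVRRCGTTAGSC"

print(degeneracy(HS286))  # 48
print(degeneracy(HS287))  # 384
```

The counts follow directly from the ambiguity codes present in each primer (S, K, R, M, V in HS286; S, K, N, V, R, R, S in HS287).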
DNA manipulations
Recovery of gene cassettes from natural environments by
PCR, and their cloning and sequencing, have been described
(Stokes et al., 2001). The cassette PCR technique may
recover partial gene cassettes or gene cassette arrays that
include recombination sites. In this study we have considerably
expanded the number of 59-be recombination sites
recovered from environmental samples through sampling
microcosms established from the previously described
Balmain, Homebush and Lidsdale soil samples. This
expanded dataset enables us to address the recombination
activity of environmental gene cassettes for the first time.
Details of the microcosm conditions and enrichment are not
pertinent to the present data and will be reported elsewhere
or can be obtained upon request. Polymerase chain reaction
conditions for co-integrate mapping were: [(94°C ×
3 min)] × 1, [(94°C × 30 s)(65°C × 30 s)(72°C × 90 s)] × 35,
[(72°C × 5 min)] × 1. DNA sequencing was performed at the
Macquarie Sequencing Facility (Macquarie University, Australia) using an ABI Prism 377 (PE Biosystems).

Table 4. Plasmids.

| Plasmid | Description | Cloned recombination site (a) | Position in pACYC184 (b) | Relevant phenotype | Reference |
|---|---|---|---|---|---|
| R388 | 33 kb IncW plasmid containing class 1 integron In3 | N/A | N/A | TpR SuR Tra+ IntI1+ | Avila and de la Cruz (1988) |
| pMAQ495 | R388 with aphA inserted into intI1 gene | N/A | N/A | TpR SuR KmR Tra+ IntI1– | Collis et al. (1998) |
| pACYC184 | Cloning vector | N/A | N/A | CmR TcR | Chang and Cohen (1978) |
| pSU2056 | 1176 bp RsaI-BamH1 fragment of In2 in pUC9 | N/A | N/A | ApR IntI1+ | Martinez and de la Cruz (1990) |
| pMAQ28 | 400 bp Sau3A-HindIII fragment | aadB/qacE 59-be | BamH1-HindIII (1) | CmR | Hall et al. (1991) |
| pMAQ653 | 882 bp BamH1 fragment | EGC082 59-be | BamH1 (2) | CmR | This study |
| pMAQ701 | 300 bp HindIII-BamH1 fragment | EGC099 59-be | BamH1-HindIII (2) | CmR | This study |
| pMAQ707 | 238 bp HindIII-BamH1 fragment | BGC001 59-be (AY129391) | BamH1-HindIII (2) | CmR | Holmes et al. (2003) |
| pMAQ710 | 241 bp HindIII-BamH1 fragment | EGC140 59-be (AF421329) | BamH1-HindIII (2) | CmR | This study |
| pMAQ713 | 271 bp HindIII-BamH1 fragment | EGC068 59-be (AF349098) | BamH1-HindIII (2) | CmR | This study |
| pMAQ714 | 206 bp HindIII-BamH1 fragment | EGC102 59-be (AF265275) | BamH1-HindIII (2) | CmR | This study |

a. Accession numbers for environmental and bacterial gene cassettes from which cloned elements are derived are shown in brackets.
b. Numbers in parentheses represent the orientation of the cloned fragment with respect to pACYC184 as previously defined (Collis et al., 2001).
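The cycling programme quoted above for co-integrate mapping can be written out as a small data structure, which makes the three stages and their repeat counts explicit. This is only an illustrative sketch: the class and function names are assumptions, and the total shown counts programmed hold times only, not the ramping time of any particular thermocycler.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    temp_c: int    # hold temperature in degrees Celsius
    seconds: int   # hold time in seconds

# (number of cycles, steps within one cycle), as given in the text.
PROGRAM = [
    (1, [Step(94, 180)]),                              # initial denaturation
    (35, [Step(94, 30), Step(65, 30), Step(72, 90)]),  # amplification cycles
    (1, [Step(72, 300)]),                              # final extension
]

def total_hold_seconds(program) -> int:
    """Sum the programmed hold times over all stages and cycles."""
    return sum(cycles * sum(step.seconds for step in steps)
               for cycles, steps in program)

print(total_hold_seconds(PROGRAM))  # 5730 s, i.e. 95.5 min of hold time
```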
Conduction assays
The conduction assay was performed as described previously
(Collis et al., 2002). Briefly, a donor cell contains three plasmids. One of these is a conjugative plasmid, pMAQ495, that
contains the integron In3 (Fig. 1) but with an insertionally
inactivated intI1 gene (Table 4). The second plasmid is a
derivative of the cloning vector pACYC184 and includes a
test recombination site. The third plasmid, pSU2056, supplies
highly expressed IntI1 protein in trans. Recombination efficiencies are determined by the frequency with which the test
recombination site recombines with one of the three partner
recombination sites in In3 (see below). This efficiency is
measured as the number of co-integrates conducted to a recipient cell (scored by transfer of chloramphenicol resistance) divided by the total number of pMAQ495
transconjugants (scored by transfer of trimethoprim
resistance).
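The efficiency measure described above is a simple ratio of two transconjugant counts. A minimal sketch follows; the function name is hypothetical and the counts in the usage line are invented purely to show the arithmetic, they are not data from this study.

```python
def recombination_frequency(cm_r_transconjugants: float,
                            tp_r_transconjugants: float) -> float:
    """Co-integrates conducted (CmR) per pMAQ495 transconjugant (TpR)."""
    if tp_r_transconjugants <= 0:
        raise ValueError("no TpR transconjugants: frequency is undefined")
    return cm_r_transconjugants / tp_r_transconjugants

# Hypothetical plate counts, expressed per ml of mating mixture.
freq = recombination_frequency(25, 5_000_000)
print(f"{freq:.1e}")  # 5.0e-06
```

Both counts must be expressed in the same units (for example per ml of the same mating mixture) for the ratio to be meaningful.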
Analysis of co-integrates
In3 of pMAQ495 (R388) contains three recombination sites.
These are attI1, and the 59-be sites of dfrB2 and orfA (Fig. 1).
Insertion of a test element at attI1 separates the dfrB2 gene
from the Pc promoter leading to a TpS phenotype. Consequently the Tp phenotype was used as an indicator of insertion at attI1. However, to accurately and rapidly map co-integrates, a PCR-based strategy was used (Fig. 1). Two
primers, one specific for a sequence within pACYC184 of the
test element and a second specific for a sequence within In3,
were used to directly amplify co-integrate template DNA. The
derived PCR product was of a length dependent on the
insertion site. The primer pair commonly used was HS457
and HS459. As an example, PCR product lengths involving
pMAQ701 were 578 bp for insertion at orfA compared to
1463 bp for insertion at attI1. Polymerase chain reaction
products were also used as sequencing templates to identify
the recombination crossover point. For HS457/459 products
and insertion at orfA, right hand junctions were sequenced
by priming with HS320. For insertion at attI1, the sequencing
primer used was HS318. For some co-integrates the left hand
junction was also amplified and sequenced and this was
achieved with HS458 and HS460 as primers. Sequencing
primers for these were HS458 for insertion events at attI1 and
HS319 for insertion events at orfA.

Table 5. Sequencing and PCR primers.

| Primer | Sequence | Position (a)/comment | Accession number/reference |
|---|---|---|---|
| HS286 | 5′GGGATCCTCSGCTKGARCGAMTTGTTAGVC3′ (b) | For cassette PCR. Targets left half of 59-be | Stokes et al. (2001) |
| HS287 | 5′GGGATCCGCSGCTKANCTCVRRCGTTAGSC3′ (b) | For cassette PCR. Targets right half of 59-be | Stokes et al. (2001) |
| HS318 | 5′GCTTCATCGCTACTTTG3′ | 815–831 (C). Within dfrB2 gene cassette | J01773 |
| HS319 | 5′GTATGAAGTCTTTGGCG3′ | 282–298. Within orfA cassette | X12869 |
| HS320 | 5′AGTAAAGCCCTCGCTAG3′ | 606–622 (C). Within 3′-conserved segment | X12869 |
| HS457 | 5′CAAATGTAGCACCTGAAGTCAGCCC3′ | 1452–1476. Adjacent to unique HindIII site of pACYC184 | X06403 |
| HS458 | 5′GTTTGATGTTATGGAGCAGCAACG3′ | 648–671. Within 5′-conserved segment | J01773 |
| HS459 | 5′GCAAAAAGGCAGCAATTATGAGCC3′ | 813–836 (C). Within 3′-conserved segment | X12869 |
| HS460 | 5′GGAAGGAGCTGACTGGGTTGAAGG3′ | 2167–2190 (C). Adjacent to unique SalI site of pACYC184 | X06403 |

a. Numbers refer to location in the cited database entry. (C) indicates sequence is the complementary strand.
b. The first eight bases include a BamH1 linker that is not complementary to targeted sequences.
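Because the length of the HS457/HS459 product depends on the insertion site, a co-integrate can be assigned to a site by comparing the observed band size with the expected sizes. The sketch below uses only the two sizes quoted in the text for pMAQ701; the tolerance value and the function name are assumptions introduced for illustration, and expected sizes for other test plasmids or for insertion at the dfrB2 59-be would need to be supplied separately.

```python
# Expected HS457/HS459 product sizes (bp) for pMAQ701 co-integrates,
# as quoted in the text.
EXPECTED_BP_PMAQ701 = {"orfA": 578, "attI1": 1463}

def call_insertion_site(observed_bp, expected=EXPECTED_BP_PMAQ701,
                        tolerance_bp=50):
    """Return the single site whose expected size matches, else None.

    tolerance_bp is a hypothetical allowance for gel sizing error.
    """
    matches = [site for site, size in expected.items()
               if abs(observed_bp - size) <= tolerance_bp]
    return matches[0] if len(matches) == 1 else None

print(call_insertion_site(580))   # orfA
print(call_insertion_site(1450))  # attI1
print(call_insertion_site(1000))  # None (no expected size within tolerance)
```

A size-based call of this kind only narrows down the insertion site; in the procedure described here the crossover point itself was identified by sequencing the junctions.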
Nomenclature of gene cassettes and 59-be sites
The recovery of mobile gene cassettes directly from the environment by PCR means that the source organism cannot be
identified. Consequently, for such cassettes, we have
adopted a nomenclature whereby each cassette is assigned
the descriptor ‘EGC’ (environmental gene cassette) followed
by a unique numerical code. For PCR products with more
than one cassette it is also possible to identify the sequence
of a cassette's 59-be as it appears in the linearized form, in
all but the last cassette of the recovered array. Consequently,
identified 59-be sites are assigned the same descriptor (i.e.
EGCxxx) as their cognate cassette. In the linear, integrated
form of a gene cassette, the last six bases of the 59-be are
derived from the following cassette. Consequently these
bases may differ depending on the context of the gene cassette. For consistency all experimental data presented here
refer to the ‘linear’ sequence of the element as observed in
the cloned fragment unless indicated otherwise. If the
sequence of the last six bases of the element in the circular
form of the cassette is known, and if they are different to that
seen in the linear form of the cassette in a particular array,
these differences are noted.
In our studies we are also recovering cassette arrays from
bacterial strains that have been cultured directly from the
environment. As gene cassettes are mobile and it is not yet
clear if any are truly specific to certain bacterial species we
use the descriptor ‘BGC’ (bacterial gene cassette) followed
by a unique numerical code (e.g. BGCxxx) to describe such
cassettes. For completeness of this study we have included
a representative 59-be from a gene cassette in an integron
in a Ps. stutzeri strain (Holmes et al., 2003) recovered from
the Balmain soil (Stokes et al., 2001).
Sequence accession numbers
Cassettes recovered from Balmain: EGC086 (Accession
number AF349106), EGC090 (AF349108), EGC104/
EGC105 (AF349111), EGC068 (AF349098), EGC070
(AF349099), EGC072 (AF349100), EGC074 (AF349101),
EGC076 (AF349102), EGC078 (AF349103), EGC080
(AF349104), EGC066 (AF349097), EGC064 (AF265272),
EGC095/EGC096 (AF349109), EGC092/EGC093 (AF265270);
Homebush Bay cassettes: EGC082 (AF265263); Cape Denison
cassettes: EGC084 (AF349105),
EGC099 (AF349110); Sturt National Park cassettes:
EGC101/EGC102 (AF265275). No-orf containing cassettes:
EGC049 (AF349081), EGC052 (AF349085), EGC053
(AF349086), EGC054 (AF349087), EGC055 (AF349088),
EGC056 (AF349089), EGC057 (AF349090), EGC058
(AF349091), and EGC091 (AF349108). Environmental locations are described in Stokes et al. (2001). Other sequences
reported but not specifically discussed can be found in
Accession numbers AF421312-AF421335.
Recognition of gene cassettes
Bioinformatics analyses were conducted on BioManager.com
provided by ANGIS (http://www.angis.org.au). Gene cassettes in public databases were identified by searches
through the NCBI web site (http://www.ncbi.nlm.nih.gov/cgi-bin/Entrez/genom_table_cgi)
using various sequences
obtained in this study as the query sequence.
Sequences were analysed for features associated with
gene cassettes, principally an open reading frame and a 59-be. All 59-be sites are imperfect, inverted repeat structures.
This common structure is achieved through conservation of
a number of positions and features (Fig. 2). Within each half
of the element there are 19 positions that covary (i.e. they
base pair in the fold-back structure). For ease of description
we designate these positions as ‘a’ to ‘s’ on the left and ‘S’
to ‘A’ for the co-varying positions on the right (Fig. 2). The
repeat is imperfect as a result of ‘disruption’ (i.e. mismatch
or insertion) and these commonly occur at certain points on
both sides. In total there are 13 of these and they are indicated by an asterisk or ‘+’ symbol in Fig. 2. Seven of these
are located in the left half of the element and six in the right
half. We define a DNA sequence as a 59-be site if it contains
these 51 positions in the relative positions shown in Fig. 2.
Known 59-be sites show considerable variation on this model
structure. From comparisons of previously described elements as well as for the sequences obtained here it is apparent that variation is largely restricted to a few specific loci
within the 59-be sequence and results in retention of the
model structure (see Results).
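The position count in this definition follows from the numbers given above: 19 covarying positions in each half plus 13 disruption points (seven left, six right) give 2 × 19 + 13 = 51. The covariation itself can be checked by pairing position ‘a’ with ‘A’, ‘b’ with ‘B’ and so on. The sketch below assumes the covarying bases have already been extracted from each half in 5′ to 3′ order (which requires the alignment in Fig. 2); the function name and the short example strings are illustrative only and are not sequences from this dataset.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def paired_positions(left: str, right: str) -> int:
    """Count Watson-Crick pairs between the two halves of the element.

    left  : covarying bases of the left half, 5'->3' (positions a..s)
    right : covarying bases of the right half, 5'->3' (positions S..A)
    """
    if len(left) != len(right):
        raise ValueError("halves must supply the same number of positions")
    # Position 'a' pairs with 'A', the last base of the right half.
    return sum(COMPLEMENT[l] == r for l, r in zip(left, reversed(right)))

# Positions required by the definition: 19 per half plus 13 disruptions.
print(2 * 19 + 7 + 6)                          # 51

# Toy seven-base halves (not real 59-be data).
print(paired_positions("GCCTAAC", "GTTAGGC"))  # 7: perfect inverted repeat
print(paired_positions("GCCTAAC", "GTTAGAC"))  # 6: one mismatch
```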
Acknowledgements
Supported by a Research Innovation Fund Grant from Macquarie University. We thank Roberto Anitori and Malcolm
Walter for providing the DNA from the Flinders Ranges Hot
Springs, Clare McInnes for sampling the Yerranderie mine
site and Alexandra Kirsten for collection of material from
Cape Denison, Antarctica. We thank Ruth Hall for helpful
discussions. Didier Mazel provided sequence information
from XCR 59-be sites.
References
Avila, P., and de la Cruz, F. (1988) Physical and genetic map
of the IncW plasmid R388. Plasmid 20: 155–157.
Chang, A.C.Y., and Cohen, S.N. (1978) Construction and
characterization of amplifiable multicopy DNA cloning vehicles
derived from the p15A cryptic miniplasmid. J Bacteriol
134: 1141–1156.
Clark, C.A., Purins, L., Kaewrakon, P., Focareta, T., and
Manning, P.A. (2000) The Vibrio cholerae O1 chromosomal integron. Microbiology 146: 2605–2612.
Collis, C.M., and Hall, R.M. (1992) Gene cassettes from the
insert region of integrons are excised as covalently closed
circles. Mol Microbiol 6: 2875–2885.
Collis, C.M., and Hall, R.M. (1995) Expression of antibiotic
resistance genes in the integrated cassettes of integrons.
Antimicrob Agents Chemother 39: 155–162.
Collis, C.M., Grammaticopoulos, G., Briton, J., Stokes, H.W.,
and Hall, R.M. (1993) Site-specific insertion of gene cassettes into integrons. Mol Microbiol 9: 41–52.
Collis, C.M., Kim, M.-J., Stokes, H.W., and Hall, R.M. (1998)
Binding of the purified integron DNA integrase IntI1 to
integron- and cassette-associated recombination sites. Mol
Microbiol 29: 477–490.
Collis, C.M., Recchia, G.D., Kim, M.-J., Stokes, H.W., and
Hall, R.M. (2001) Efficiency of recombination reactions
catalysed by the Class 1 integron integrase IntI1. J Bacteriol 183: 2535–2542.
Collis, C.M., Kim, M.-J., Partridge, S.R., Stokes, H.W., and
Hall, R.M. (2002) Characterization of the class 3 integron
and the site-specific recombination system it determines.
J Bacteriol 184: 3017–3026.
de la Cruz, F., and Grinsted, J. (1982) Genetic and molecular
characterization of Tn21, a multiple resistance transposon
from R100.1. J Bacteriol 151: 222–228.
Grainge, I., and Jayaram, M. (1999) The integrase family of
recombinase: organization and function of the active site.
Mol Microbiol 33: 449–456.
Hall, R.M., Brookes, D.E., and Stokes, H.W. (1991) Site-specific insertion of genes into integrons: role of the 59-base element and determination of the recombination
cross-over point. Mol Microbiol 5: 1941–1959.
Hall, R.M., Collis, C.M., Kim, M.-J., Partridge, S.R., Recchia,
G.D., and Stokes, H.W. (1999) Mobile gene cassettes and
integrons in evolution. Ann New York Acad Sci 870: 68–80.
Heidelberg, J.F., Eisen, J.A., Nelson, W.C., Clayton, R.A.,
Gwinn, M.L., Dodson, R.J. et al. (2000) DNA sequence of
both chromosomes of the cholera pathogen Vibrio cholerae. Nature 406: 477–483.
Holmes, A.J., Holley, M.P., Mahon, A., Nield, B.S., Gillings,
M.R., and Stokes, H.W. (2003) A distinctive and functional
integron/gene cassette system present in soil bacterial
communities associated with genomic diversity in
Pseudomonas stutzeri. J Bacteriol 185: 918–928.
Lan, R., and Reeves, P.R. (2000) Intraspecies variation in
bacterial genomes: the need for a species genome concept. Trends Microbiol 8: 396–401.
Martinez, E., and de la Cruz, F. (1990) Genetic elements
involved in Tn21 site-specific integration, a novel mechanism for the dissemination of antibiotic resistance genes.
EMBO J 9: 1275–1281.
Mazel, D., Dychinco, B., Webb, B., and Davies, J. (1998) A
distinctive class of integron in the Vibrio cholerae genome.
Science 280: 605–608.
Nield, B.S., Holmes, A.J., Gillings, M.R., Recchia, G.D., Mabbutt, B.C., Nevalainen, K.M.H., and Stokes, H.W. (2001)
Recovery of new integron classes from environmental
DNA. FEMS Microbiol Letts 195: 59–65.
Ochman, H., Lawrence, J.G., and Groisman, E.A. (2000)
Lateral gene transfer and the nature of bacterial innovation.
Nature 405: 299–304.
Recchia, G.D., and Hall, R.M. (1995) Gene cassettes: a new
class of mobile element. Microbiology 141: 3015–3027.
Rowe-Magnus, D.A., Guerot, A.M., Ploncard, P., Dychinco,
B., Davies, J., and Mazel, D. (2001) The evolutionary history of chromosomal super-integrons provides an ancestry
for multiresistant integrons. Proc Natl Acad Sci USA 98:
652–657.
da Silva, A.C.R., Ferro, J.A., Reinach, F.C., Farah, C.S.,
Furlan, L.R., Quaggio, R.B. et al. (2002) Comparison of
the genomes of two Xanthomonas pathogens with differing
host specificities. Nature 417: 459–463.
Stokes, H.W., and Hall, R.M. (1989) A novel family of potentially mobile DNA elements encoding site-specific gene
integration functions: integrons. Mol Microbiol 3: 1669–1683.
Stokes, H.W., O'Gorman, D.B., Recchia, G.D., Parsekhian,
M., and Hall, R.M. (1997) Structure and function of 59-base
element recombination sites associated with mobile gene
cassettes. Mol Microbiol 26: 731–745.
Stokes, H.W., Holmes, A.J., Nield, B.S., Holley, M.P., Nevalainen, K.M.H., Mabbutt, B.C., and Gillings, M.R. (2001)
Gene cassette PCR: sequence-independent recovery of
entire genes from environmental DNA. Appl Environ Microbiol 67: 5240–5246.
Vaisvila, R., Morgan, R.D., Posfai, J., and Raleigh, E.A.
(2001) Discovery and distribution of super-integrons
among Pseudomonads. Mol Microbiol 42: 587–601.