The Influence of Mutation, Recombination, Population History, and Selection on
Patterns of Genetic Diversity in Neisseria meningitidis
K. A. Jolley,* D. J. Wilson,† P. Kriz,‡ G. McVean,† and M. C. J. Maiden*
*Peter Medawar Building for Pathogen Research and Department of Zoology, University of Oxford, Oxford, UK; †Peter Medawar
Building for Pathogen Research and Department of Statistics, University of Oxford, Oxford, UK; and ‡National Reference Laboratory
for Meningococcal Infections, National Institute of Public Health, Prague, Czech Republic
Patterns of genetic diversity within populations of human pathogens, shaped by the ecology of host-microbe interactions,
contain important information about the epidemiological history of infectious disease. Exploiting this information,
however, requires a systematic approach that distinguishes the genetic signal generated by epidemiological processes
from the effects of other forces, such as recombination, mutation, and population history. Here, a variety of quantitative
techniques were employed to investigate multilocus sequence information from isolate collections of Neisseria
meningitidis, a major cause of meningitis and septicemia worldwide. This allowed quantitative evaluation of alternative
explanations for the observed population structure. A coalescent-based approach was employed to estimate the rate of
mutation, the rate of recombination, and the size distribution of recombination fragments from samples of disease-associated and carried meningococci obtained in the Czech Republic in 1993 and from a global collection of disease-associated isolates collected from 1937 to 1996. The parameter estimates were used to reject a model in which
genetic structure arose by chance in small populations, and analysis of molecular variation showed that geographically
restricted gene flow was unlikely to be the cause of the genetic structure. The genetic differentiation between disease and
carriage isolate collections indicated that, whereas certain genotypes were overrepresented among the disease-isolate
collections (the ‘‘hyperinvasive’’ lineages), disease-associated and carried meningococci exhibited remarkably little
differentiation at the level of individual nucleotide polymorphisms. In combination, these results indicated the repeated
action of natural selection on meningococcal populations, possibly arising from the coevolutionary dynamic of host-pathogen interactions.
Introduction
Neisseria meningitidis, the meningococcus, is a globally distributed cause of bacterial meningitis and septicemia (Rosenstein et al. 2001). Despite its reputation as an
aggressive pathogen (van Deuren, Brandtzaeg, and van der
Meer 2000), this encapsulated gram-negative bacterium is
a common commensal of the upper respiratory tract of
humans that causes disease only infrequently, relative to its
carriage prevalence of between 5% and 40% in human
populations (Wenzel et al. 1973; Broome 1986). Invasive
disease is not a route for transmission between hosts, and
the meningococcus appears in this respect to be an
‘‘accidental pathogen’’ (Maiden 2002) that derives no
long-term evolutionary benefit from the pathology that it
causes (Levin 1996). The meningococcus is both antigenically and genetically highly diverse, but comparisons of
disease-associated and carried meningococci have established that most invasive disease is caused by a minority of
serological groups, as defined by the expression of particular
capsular polysaccharides, and genotypes, as defined by
combinations of alleles at housekeeping loci distributed
around the chromosome (Caugant 1998; Maiden et al.
1998).
Key words: genetic diversity, multilocus sequence typing, Neisseria
meningitidis, nucleotide polymorphisms.
E-mail: [email protected].
Mol. Biol. Evol. 22(3):562–569. 2005
doi:10.1093/molbev/msi041
Advance Access publication November 10, 2004
Multilocus sequence typing (MLST) (Maiden et al.
1998), a development of multilocus enzyme electrophoresis (Selander et al. 1986), is currently the most widely
employed approach to cataloging genetic variation in
the meningococcus. This nucleotide sequence–based
technique examines seven housekeeping gene fragments,
approximately 450 bp in length, derived from loci
distributed around the single haploid chromosome. Each
unique sequence is assigned an allele number in order of
discovery, and each combination of seven allele numbers is assigned
a sequence type (ST; equivalent to haplotype) number for
identification. Surveys of meningococcal isolate collections have revealed clusters of related STs, referred to as
clonal complexes. These clusters are thought to correspond
to lineages of bacteria that share a recent common ancestor
(Urwin and Maiden 2003).
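The numbering scheme described above can be illustrated with a minimal sketch. It assumes each isolate is supplied as a dictionary mapping the seven locus names to sequence strings, and it numbers alleles and STs locally in order of first appearance; the function name `assign_profiles` is hypothetical, and real allele and ST numbers are assigned by the curated MLST database rather than by local counting.

```python
from typing import Dict, List, Tuple

# The seven MLST loci named in this article.
LOCI = ["abcZ", "adk", "aroE", "fumC", "gdh", "pdhC", "pgm"]


def assign_profiles(isolates: List[Dict[str, str]]) -> List[Tuple[Tuple[int, ...], int]]:
    """Return (allelic profile, ST number) per isolate, numbered in order of discovery.

    Hypothetical local numbering for illustration only.
    """
    allele_ids: Dict[str, Dict[str, int]] = {locus: {} for locus in LOCI}
    st_ids: Dict[Tuple[int, ...], int] = {}
    results = []
    for isolate in isolates:
        profile = []
        for locus in LOCI:
            known = allele_ids[locus]
            seq = isolate[locus]
            if seq not in known:
                # A sequence not seen before at this locus gets the next allele number.
                known[seq] = len(known) + 1
            profile.append(known[seq])
        key = tuple(profile)
        if key not in st_ids:
            # A new combination of seven allele numbers gets the next ST number.
            st_ids[key] = len(st_ids) + 1
        results.append((key, st_ids[key]))
    return results
```

Two isolates receive the same ST only if they are identical at all seven fragments, which is why an ST is equivalent to a haplotype.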
Analyses of MLST data from isolates from both
carriage and disease collections have identified a number
of apparent paradoxes. Certain meningococcal STs,
especially those belonging to the ST-1, ST-4, ST-5, ST-8, ST-11, ST-32, and ST-41/44 clonal complexes, are
overrepresented in collections of disease-associated isolates relative to their frequency in collections of asymptomatically carried isolates. These meningococci, some of
which have persisted over periods of decades and during
global spread (Caugant 1998; Maiden et al. 1998), are
referred to as ‘‘hyperinvasive.’’ Given that pathogenicity
does not aid meningococcal transmission, this persistence
is unexpected. That is, the hyperinvasive lineages would
be expected to die out quickly unless there is some
other force acting; for example, if increased pathogenicity
is an unavoidable consequence of increased carriage transmission efficacy. Furthermore, phylogenetic analyses have
demonstrated evidence for frequent recombination in
meningococcal populations (Feil et al. 1999; Holmes,
Urwin, and Maiden 1999; Jolley et al. 2000). In the
presence of frequent recombination, the high frequency
of certain genotypes, as represented by STs and clonal
complexes, is also unexpected, because recombination
should act to introduce genetic novelty into existing genotypes.
Molecular Biology and Evolution vol. 22 no. 3 © Society for Molecular Biology and Evolution 2004; all rights reserved.
The persistence of hyperinvasive lineages and the
maintenance of clonal complexes in the presence of frequent recombination suggest that genetic diversity within
meningococcal populations is repeatedly structured by
selective forces. However, under simple population genetic models of recombination, mutation, and genetic drift,
genetic structuring can occur simply by chance. Furthermore, geographically restricted gene flow can lead to structuring of genetic data, as would happen if transmission
between hosts is typically limited to immediate neighbors.
In the absence of estimates of the rates of mutation and recombination in natural populations, and without an understanding of the influence of geography on genetic variation,
inferences about selection are premature.
Here, a systematic approach is taken to test hypotheses concerning the nature of the evolutionary forces
that structure genetic variation in N. meningitidis populations. Three meningococcal isolate collections were
analyzed, representing (1) isolates obtained from the carriage population in the Czech Republic, (2) a temporally
and geographically equivalent set of isolates from individuals with meningococcal disease, and (3) the global
diversity of meningococcal disease isolates in the latter
part of the twentieth century. First, a coalescent approach
was employed to estimate underlying parameters of mutation and recombination, providing estimates of the relative
contribution of mutation and recombination to patterns of
meningococcal diversity, and the size distribution of recombination fragments. These estimates allowed rejection
of the null model, in which structure was a consequence
of purely neutral processes. Furthermore, there was no
evidence for differentiation among the carried meningococcal isolates found in different regions of the Czech
Republic, so limitations in gene flow could not explain the
observed level of genetic structuring, at least in this data
set. Finally, although there were marked differences in
sequence type and alleles at each locus between the Czech
disease-associated and carried meningococci, these isolate
collections exhibited very little differentiation at the level
of individual nucleotide polymorphisms. In combination,
these results demonstrated that the disease-associated and
carried isolate collections represented different combinations of polymorphisms from a common gene pool, rather
than different gene pools. In conclusion, we suggest that
patterns of diversity in N. meningitidis may reflect a
dynamic in which hyperinvasive lineages arise repeatedly
and spread to moderate frequency through an increased
transmission advantage but persist in a given transmission
system for only short periods of evolutionary time.
Methods
Bacterial Isolates
Three isolate collections were examined. The first
collection consisted of 217 carried meningococci obtained
from unrelated 15-year-old to 24-year-old individuals in
the Czech Republic during 1993 (Jolley et al. 2000, 2002).
Although there was an outbreak of meningococcal disease
in this country during the early 1990s (Krizova and
Musilek 1995), none of the isolates in this collection
originated from an individual suffering from, or who had
contact with a case of, invasive meningococcal disease.
The second collection was a previously unpublished
collection, comprising 53 of the 55 meningococci isolated
from cases of invasive disease and submitted to the Czech
National Reference Laboratory for Meningococcal Infections during 1993. This collection was geographically and
temporally coincident with the isolates obtained from
asymptomatic carriers. The third collection included 107
mainly disease-associated meningococci, isolated over
a period spanning 1937 to 1996 from a variety of locations
around the world. This collection was assembled with an
intentional overrepresentation of meningococci belonging
to the major disease-associated genotypes described by
1996 (Maiden et al. 1998).
Genetic Characterization
Each of the isolates from all three collections was
characterized by multilocus sequence typing (MLST)
employing previously published methods (Maiden et al.
1998; Holmes, Urwin, and Maiden 1999). Briefly, nucleotide sequences of seven fragments of housekeeping genes
were determined on each strand, and each unique sequence
was assigned an arbitrary ‘‘allele’’ number by reference
to the Neisseria MLST database (http://pubmlst.org/
neisseria/) (Jolley, Chan, and Maiden 2004). The combination of allele numbers for all seven loci of a given isolate
was assigned an arbitrary sequence type (ST) number.
Each ST was equivalent to a unique haplotype. The MLST
data for the Czech carriage and global-disease–isolate
collections have been published previously (Maiden et al.
1998; Jolley et al. 2000, 2002); the Czech disease-associated isolates are reported here for the first time.
Estimation of Population Mutation and Recombination
Rates
Levels of genetic variation and the degree of association between alleles at different loci, or linkage disequilibrium (LD), are generated by the interaction between
population history, mutation, and recombination. Under
simple models of population history, the key parameters
that determine patterns of variation are not the mutation
and recombination rates themselves, but their products with the effective population size,
expressed as the compound parameters θ = 2Neμ and ρ = 2Ner,
where Ne is the effective population size and μ and r
are the per-site, per-generation (in this case, per-transmission-cycle)
rates of mutation and recombination. Estimates
of the scaled parameters can be obtained from empirical
data using a variety of statistical methods. The moment
estimator of Watterson (1975) was employed to estimate θ,
and a composite likelihood method, referred to as LDhat
(McVean, Awadalla, and Fearnhead 2002), was employed
to estimate ρ and the recombination tract length (the LDhat
software is available from http://www.stats.ox.ac.uk/~mcvean).
For collections with more than 100 sequences,
multiple random subsets of 100 sequences were analyzed
for computational tractability and the average across subsets reported. In addition, nonparametric, permutation-based tests implemented in LDhat were used to test the
significance of the evidence for recombination (McVean,
Awadalla, and Fearnhead 2002).
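As a hedged illustration of the moment estimator mentioned above, the following sketch computes Watterson's per-site estimate of the population mutation rate from a list of aligned, equal-length sequences. The function names are hypothetical, and the sketch counts segregating columns only; it is not the authors' code and need not reproduce the values in table 2 exactly, since those were averaged over random subsets.

```python
def segregating_sites(seqs):
    """Count alignment columns in which more than one nucleotide is observed."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta_per_site(seqs):
    """Watterson's estimator: S / (a_n * L), with a_n = sum_{i=1}^{n-1} 1/i."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / (a_n * length)
```

For example, three sequences of length 4 with two segregating columns give 2 / (1.5 × 4), or about 0.33 per site.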
In bacteria, recombination typically occurs by the
nonreciprocal incorporation of short DNA fragments. Over
short physical distances, such recombination generates
patterns of LD similar to those observed over short ranges
in species with reciprocal crossover. However, over longer
physical distances, the predictions of the two models are
different in that LD in bacteria will tend to asymptote to
a nonzero level, whereas LD between two distant loci in
organisms with crossing over will tend to zero (McVean,
Awadalla, and Fearnhead 2002). This difference gives
power to estimate the average size of DNA fragments
incorporated during recombination events. The fragment
size incorporated was assumed to be drawn from a geometric distribution following Falush et al. (2001), and the
parameter of the distribution (the average size of incorporated fragments, t) that maximized the composite
likelihood in an analysis of all seven loci was found. For
distant loci, the key parameter in determining levels of LD
is the product of the per-site population recombination rate
and the average size of the DNA fragment incorporated
in recombination events, 2Nert. The average tract length
was estimated from a concatenation of 3,284 bp from
the Czech carriage population, comprising all seven gene
fragments assembled in the order in which they occurred in
the genome sequence of N. meningitidis Z2491 and in
which the physical distance between loci was preserved
(Parkhill et al. 2000).
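The role of the geometric tract-length assumption can be sketched numerically. Assuming tract lengths are geometric with mean t bp (so the per-base stopping probability is 1/t), a tract that already covers one site extends at least d bp further in a given direction with probability (1 - 1/t)^d, by the memoryless property. The function name below is hypothetical, and the sketch ignores tract start positions and chromosome circularity.

```python
def prob_tract_extends(d_bp, mean_tract_bp):
    """P(a geometric tract covering a site also reaches d_bp further along).

    Assumes a geometric length distribution with mean `mean_tract_bp`,
    i.e. a per-base stopping probability of 1 / mean_tract_bp.
    """
    if mean_tract_bp < 1 or d_bp < 0:
        raise ValueError("mean tract must be >= 1 bp and distance must be >= 0")
    return (1.0 - 1.0 / mean_tract_bp) ** d_bp
```

Under this assumption, sites within one roughly 450-bp fragment are often transferred together, whereas sites in loci separated by many kilobases are almost never covered by the same tract, which is the contrast that makes the tract length estimable.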
Testing the Neutral Model
Estimates of the population mutation rate, population
recombination rate, and average recombination fragment
size were obtained under the assumption of a simple neutral
model. To assess whether the patterns of genetic diversity
observed differ significantly from the expectations of the
standard neutral model, the data were summarized, for each
locus and for the combined set, in terms of the Tajima
D statistic (Tajima 1989), which measured departures from
neutrality in the frequency spectrum of polymorphic sites.
The Tajima D statistic is constructed such that it has an
approximately normal distribution; however, significance
levels were obtained by simulation using the estimated
mutation and recombination rates (table 2).
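For readers unfamiliar with the statistic, a minimal sketch of Tajima's D as defined by Tajima (1989) follows. It assumes equal-length aligned sequences supplied as strings and uses the standard normalizing constants; the function name is hypothetical. It does not perform the coalescent simulations used here to obtain significance levels.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for aligned sequences (requires n >= 4 and at least one segregating site)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences")
    # S: number of segregating (polymorphic) alignment columns.
    s = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    if s == 0:
        raise ValueError("no segregating sites")
    # pi: mean number of nucleotide differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Normalizing constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

An excess of intermediate-frequency variants makes the pairwise term exceed the segregating-site term and gives positive values; an excess of rare variants gives negative values.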
To assess the level of genetic structuring, we compared the number of STs (i.e., unique haplotypes) observed
in the empirical data from the Czech carriage population
to the distribution expected under the neutral model
(Fu 1996), using the estimates of per-locus population
mutation rate, population recombination rate, and the
average tract length of recombination events obtained previously. Monte Carlo coalescent simulations were performed using software written by G.Mc.V. and available
on request.
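The comparison of an observed ST count with a simulated null distribution can be sketched as a one-sided Monte Carlo test. The sketch assumes the simulated counts have already been produced by a coalescent simulator, which is not shown; the function name is hypothetical, and the add-one correction is a common convention rather than something stated in this article.

```python
def lower_tail_p(observed, simulated):
    """One-sided Monte Carlo P value for 'observed is unusually small'.

    `simulated` holds the statistic from each neutral simulation
    (assumed to be supplied by an external coalescent simulator).
    """
    if not simulated:
        raise ValueError("need at least one simulated value")
    hits = sum(1 for value in simulated if value <= observed)
    # Add-one correction so the estimated P value is never exactly zero.
    return (hits + 1) / (len(simulated) + 1)
```

With 999 simulations, an observed count below every simulated count would give P = 0.001.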
Geographical Structuring Within the Population
The degree of genetic variation among carried
meningococcal isolates sampled from different locations
within the Czech Republic was quantified by Wright’s
statistic FST, which is defined as the correlation between
alleles within a subpopulation relative to alleles within
the total population (Wright 1943, 1951; Balding 2003). Isolates were sampled from seven locations within the Czech
Republic (Jolley et al. 2000), but given the uneven
distribution of numbers of isolates among sampling centers,
these isolates were pooled into three geographic regions
covering the whole country. Region A comprised Praha,
Plzen, and Hradec Králové; region B comprised Ceské
Budejovice and Kutna Hora; and region C comprised
Olomouc and Opava. Analysis of molecular variation
(AMOVA) (Excoffier, Smouse, and Quattro 1992), implemented in the software package ARLEQUIN version 2.0
(Schneider, Roessli, and Excoffier 2000), was used to assess
the significance of the observed values of FST by means of
a permutation test. The analysis was performed separately
for each of the seven loci.
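A simplified version of this kind of test can be sketched as follows. The sketch assumes allele numbers at a single locus grouped by region, and it uses a gene-diversity-based FST, (HT - HS)/HT, rather than the variance-component estimator of AMOVA as implemented in ARLEQUIN; all function names are hypothetical.

```python
import random
from collections import Counter


def gene_diversity(alleles):
    """One minus the sum of squared allele frequencies."""
    n = len(alleles)
    return 1.0 - sum((count / n) ** 2 for count in Counter(alleles).values())


def fst(groups):
    """Gene-diversity-based FST across groups of allele labels (a simplification)."""
    pooled = [a for g in groups for a in g]
    h_total = gene_diversity(pooled)
    if h_total == 0:
        return 0.0
    # Within-group diversity, weighted by group sample size.
    h_within = sum(len(g) * gene_diversity(g) for g in groups) / len(pooled)
    return (h_total - h_within) / h_total


def permutation_p(groups, n_perm=999, seed=0):
    """P value from shuffling isolates among groups while keeping group sizes fixed."""
    rng = random.Random(seed)
    observed = fst(groups)
    pooled = [a for g in groups for a in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if fst(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Shuffling isolates among regions breaks any association between genotype and geography, so the shuffled values approximate the distribution of the statistic when gene flow is unrestricted.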
Assessing Differentiation Between Disease and Carriage
Populations
We wished to assess both the nature and extent of
genetic differentiation between disease and carriage isolates in the Czech sample. At one extreme, disease and
carriage populations may have near-identical distributions
of STs (unique haplotypes across all loci), locus haplotypes (unique haplotypes at a single MLST locus), and
individual polymorphisms. At the other extreme, disease
and carriage samples may represent entirely differentiated
populations that share no common polymorphisms. To
assess differentiation, it is possible to compare the frequency of STs, locus haplotypes, and SNPs between the
populations. A simple metric of differentiation, which we
refer to as the classification index, was devised:
$$D = 1 - \sum_i \frac{p_{i1}\, p_{i2}}{\bar{p}_i}$$
where $p_{ij}$ is the frequency of haplotype or polymorphism
$i$ in population $j$, and $\bar{p}_i$ is the average frequency of $i$ across
the populations. When two populations have identical haplotype or polymorphism frequencies, the statistic is 0, and
when there is no overlap between populations, the statistic
is 1. For the case of biallelic loci, this is equivalent to FST,
and for multiallelic loci or haplotype data, the statistic is
more sensitive to differentiation than FST, as it compares
allele/haplotype with allele/haplotype rather than a summary of diversity such as homozygosity. The statistic can
intuitively be thought of as the probability of correctly
classifying an allele into one population where classification is based on the difference in allele frequency between
the populations. Whether the statistic was significantly
different from 0 was assessed through permutation. Because of finite sample size, the statistic is expected to be
positive under the null hypothesis of no differentiation, an
effect that is greater for multiallelic loci. We, therefore,
bias-corrected the statistic by subtracting the expectation
under the null hypothesis as assessed by permutation. Note
that this does not affect significance levels.
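The classification index and its permutation-based bias correction can be sketched as below. The sketch assumes each sample is a list of labels (STs, locus haplotypes, or nucleotides at one site) and takes the average frequency as the unweighted mean of the two sample frequencies, which is an assumption consistent with the limiting values of 0 and 1 described above; the function names are hypothetical.

```python
import random
from collections import Counter


def classification_index(sample1, sample2):
    """D = 1 - sum_i p_i1 * p_i2 / pbar_i, with pbar_i the unweighted mean frequency."""
    f1, f2 = Counter(sample1), Counter(sample2)
    n1, n2 = len(sample1), len(sample2)
    total = 0.0
    for variant in set(f1) | set(f2):
        p1 = f1[variant] / n1
        p2 = f2[variant] / n2
        total += p1 * p2 / ((p1 + p2) / 2.0)
    return 1.0 - total


def bias_corrected_index(sample1, sample2, n_perm=999, seed=0):
    """Return (D minus its mean under label permutation, permutation P value)."""
    rng = random.Random(seed)
    observed = classification_index(sample1, sample2)
    pooled = list(sample1) + list(sample2)
    n1 = len(sample1)
    null = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        null.append(classification_index(pooled[:n1], pooled[n1:]))
    p_value = (sum(1 for d in null if d >= observed) + 1) / (n_perm + 1)
    return observed - sum(null) / n_perm, p_value
```

For a biallelic site with frequencies 0.75 and 0.25 in the two samples, the index is 0.25, matching FST computed as the squared half-difference in frequency divided by the product of the mean allele frequencies.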
Table 1
Genetic Diversity of Loci Analyzed
                  Number of Segregating Sites      Number of Alleles/STs            Tajima's D Statistic
                  Czech      Czech      Global     Czech      Czech      Global     Czech      Czech      Global
Locus     Length  Carriage   Disease    Disease    Carriage   Disease    Disease    Carriage   Disease    Disease
          (bp)    (n = 100)  (n = 53)   (n = 100)  (n = 100)  (n = 53)   (n = 100)  (n = 100)  (n = 53)   (n = 100)
abcZ      433     73         61         75         17         11         15         1.15       −0.268     1.063
adk       465     16         18         17         13         10         10         0.817      0.512      0.392
aroE      490     133        150        166        21         12         17         0.926      −0.966     0.498
fumC      465     42         38         38         21         16         18         0.328      −0.221     0.157
gdh       501     26         26         28         14         13         15         1.355      1.126      1.742*
pdhC      480     81         69         80         18         14         24         1.433*     1.944*     1.842*
pgm       450     81         63         72         23         12         19         0.811      0.541      0.286
all loci  3284    459        425        481        50         28         47         1.101**    0.106      0.833
NOTE: * P < 0.05; ** P < 0.01.
Results
Genetic Diversity of Meningococcal Isolate Collections
The nucleotide sequence data are available at http://
pubmlst.org/neisseria/. The 153 STs identified were unevenly distributed among the isolate collections, with only
ST-11 present in all three data sets. The analyzed subsets of
these data contained between 10 and 24 alleles per locus,
with the number of segregating sites per locus ranging
among loci from 16 (adk, Czech carriage collection) to
166 (aroE, global-disease collection). Levels of diversity
were consistent for a given locus across the three isolate
collections, indicating similar effective population sizes.
Similar results were obtained with several different subsets
of the data. Tajima’s D statistic (Tajima 1989) was estimated
for each locus and for the 3,284-bp concatenated sequences
(table 1).
Estimates of Recombination and Mutation
On the basis of the number of segregating sites, the
population mutation rate (2Neμ, or θ) was estimated to be
between 6.80 × 10⁻³ and 80.50 × 10⁻³ per site; these
estimates were consistent within loci across the three data
sets (table 2). In all isolate collections, there was highly
significant evidence for recombination at all loci with the
exception of the least-diverse locus, adk; none of the
estimates of recombination in this locus were statistically
significant. Estimates of the population recombination rate
(2Ner, or ρ) ranged from 2.70 × 10⁻³ to 34.04 × 10⁻³ per
site. These estimates were used to calculate the ratio of
recombination to mutation rates (r/μ), which varied from
0.08 to 2.66 for all three isolate collections, with rates of
0.16 to 1.83 within the Czech carriage data set. As with the
per-locus analyses, estimates of the per-site population
mutation rate 2Neμ from the concatenated sequences
varied little among the isolate collections, with values of
29.10 × 10⁻³, 30.50 × 10⁻³, and 30.59 × 10⁻³ for the
carriage, Czech disease, and global-disease collections,
respectively.
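The ratio of recombination to mutation rates for the Czech carriage collection can be recomputed directly from the per-locus estimates in table 2. The short sketch below uses those published values (in units of 10⁻³ per site); the dictionary names are illustrative only.

```python
# Czech carriage estimates from table 2, in units of 1e-3 per site.
POP_MUTATION = {"abcZ": 36.0, "adk": 6.8, "aroE": 61.2, "fumC": 18.3,
                "gdh": 10.3, "pdhC": 35.7, "pgm": 38.3}
POP_RECOMBINATION = {"abcZ": 19.15, "adk": 12.43, "aroE": 9.74, "fumC": 23.23,
                     "gdh": 17.55, "pdhC": 22.5, "pgm": 16.75}

# The effective population size cancels in the ratio, leaving r divided by the mutation rate.
RATIOS = {locus: POP_RECOMBINATION[locus] / POP_MUTATION[locus] for locus in POP_MUTATION}

for locus, ratio in RATIOS.items():
    print(f"{locus}: {ratio:.2f}")
```

The smallest ratio is for aroE and the largest for adk, which gives the range of 0.16 to 1.83 quoted for this collection.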
Estimates of Recombination Fragment Size
In analysis of the Czech carriage collection, the rate at
which distant loci were separated by gene-conversion
events was estimated as 2Nert = 28.2, and the average tract
length was estimated to be 1.1 kb (fig. 1). Within LDhat,
assessing confidence intervals was problematic because of
nonindependence. However, because the log composite likelihood surface could be approximated as a multiple of the
true log-likelihood surface, the values of 500 bp and
2.5 kb were judged to be about equally less likely than the estimate. Moreover, 500 bp, the approximate size of the loci examined, was probably a lower
bound on the average fragment size; otherwise, LD would
not decay consistently within loci (data not shown).
Table 2
Estimates of Recombination and Mutation Rates
          θ × 10⁻³                         ρ × 10⁻³                         ρ/θ (a)
          Czech      Czech      Global     Czech      Czech      Global     Czech      Czech      Global
Gene      Carriage   Disease    Disease    Carriage   Disease    Disease    Carriage   Disease    Disease
abcZ      36         33.5       36.7       19.15      7.54       8.12       0.53       0.23       0.22
adk       6.8        8.7        7.2        12.43      5.4        2.7        1.83       0.62       0.38
aroE      61.2       80.5       79.9       9.74       2.56       6.15       0.16       0.03       0.08
fumC      18.3       18.8       16.5       23.23      34.04      23.23      1.27       1.81       1.41
gdh       10.3       11.7       11.1       17.55      31.1       13.04      1.7        2.66       1.17
pdhC      35.7       34.2       35.2       22.5       8.9        17.8       0.63       0.26       0.51
pgm       38.3       33.2       33.7       16.75      6.7        22.89      0.44       0.2        0.68
(a) Equivalent to r/μ.
FIG. 1. Relative log composite likelihood curve for estimation of
the average tract length of recombination events. The fitted model
assumes a geometric distribution of tract lengths, which is estimated to
have an average of 1.1 kb.
Testing the Neutral Model
The hypothesis that patterns of genetic variation in
the samples reflected chance events arising from the
interaction of mutation and recombination in a population
of constant size and in the absence of natural selection was
examined by testing for departures from the model by
looking at the allele frequency spectrum of biallelic polymorphisms using the Tajima D statistic and the difference
between the observed and expected number of STs. The
values of Tajima’s D statistic for each sample collection
are shown in table 1. Across loci, and in all sample collections (Czech carriage, Czech disease, and global disease),
there was a tendency for positive statistics, indicative of
a dearth of low-frequency variants; however, only for
pdhC was the statistic significantly different from that
expected under the coalescent accounting for recombination. Using concatenated sequences, only the statistic for
Czech carriage was significant (P < 0.01). In contrast, the
number of sequence types observed in the Czech carriage
population (50) was significantly lower than expected (fig.
2) under the neutral model for the mutation and recombination parameters estimated. The positive, genome-wide,
Tajima’s D statistic was indicative of a low level of population structure (arising from either geographical restriction of gene flow or selective processes such as balancing
selection) or a recent, weak (or strong but old) population bottleneck. Population growth or complete selective
sweeps would, in contrast, tend to generate negative Tajima
D values. Likewise, the relative deficit of STs implies that
genetic structure is stronger than expected under the neutral
model, again as might be expected under models with
geographical restriction of gene flow, selective processes,
or population bottlenecks.
FIG. 2. Comparison of the observed number of STs in the Czech
sample of carried meningococci to the distribution expected under the
neutral coalescent model using parameters estimated from the data.
Differentiation Between Disease and Carriage Samples
The classification index was calculated for comparisons of the Czech carriage and disease-isolate collections
for STs, locus haplotypes, and nucleotide polymorphisms.
Strong differentiation was observed at the ST level, with
weaker differentiation at the level of locus haplotypes, and
low or no significant differentiation at the nucleotide-polymorphism level (table 4). These results indicated that
disease and carriage isolate collections represented significantly
different sets of STs, as established by the existence
of hyperinvasive lineages and, to a lesser extent, haplotypes at individual loci; however, individual polymorphisms were not only shared between disease and carriage
populations but also occurred at very similar frequencies.
In conclusion, disease and carriage populations, although
significantly differentiated, represented different combinations of individual polymorphisms from a common gene
pool.
Estimates of Geographic Differentiation
To test the hypothesis that geographically restricted
gene flow was the cause of the structuring observed in
meningococcal diversity, the evidence for differentiation
between the isolates obtained from the seven sampling
sites in the Czech carriage collection (Jolley et al. 2000)
was assessed. FST values were not significantly different
from 0 at any locus except fumC, and the fumC value was nonsignificant after Bonferroni correction (table 3). This
indicated that geographical factors could not explain the
observed level of genetic structuring.
Table 3
Geographic Differentiation Among Carried Meningococci in
the Czech Republic in 1993
Locus     FST       P
abcZ      0.000     0.434
adk       0.004     0.249
aroE      0.007     0.167
fumC      0.017     0.014
gdh       0.005     0.202
pdhC      0.002     0.322
pgm       0.002     0.343
Table 4
Differentiation Between the Czech Carriage and Disease Isolate Collections
                  Classification Index
Unit of Analysis  all loci  abcZ      adk       aroE      fumC      gdh       pdhC      pgm
Sequence type     0.447**   –         –         –         –         –         –         –
Allele            –         0.240**   0.186**   0.084**   0.280**   0.091**   0.307**   0.165**
Nucleotide        –         0.048**   0.011*    0.008     0.013**   −0.001    0.024*    0.016*
NOTE: * indicates significantly different from 0 at P < 0.01; ** indicates significantly different from 0 at P < 0.001.
Discussion
A systematic analysis of the factors influencing genetic
diversity in the pathogen N. meningitidis has been undertaken. The estimated parameters for mutation and recombination demonstrated that patterns of diversity, as
measured by absolute levels of polymorphism, allele
frequency spectra, and linkage disequilibrium, were very
similar in collections of disease-associated and carried
meningococci, despite the dominance of the disease-associated isolate collections by hyperinvasive lineages. In
addition, the rates of mutation and recombination scaled by effective population size, and the average size of DNA
fragments incorporated by recombination, were estimated.
Although there was no strong signal of major changes in
population size, as demonstrated by the largely nonsignificant Tajima D statistics, the presence of a few high-frequency STs and their associated clonal complexes in the
carriage population was incompatible with the level of
recombination observed. The analysis of the geographical
distribution of diversity demonstrated that this structuring
could not be explained by geographically restricted gene
flow. We, therefore, propose that selective forces, most
likely associated with the repeated origin of hyperinvasive
lineages, act repeatedly to structure meningococcal populations over evolutionary time.
Structuring of Genetic Variation by Natural Selection
A number of selective forces might act to structure
genetic variation. Genetic variation among isolates in factors such as transmission route or resistance to host immune
genotypes would lead to cryptic stratification within populations; that is, the pathogen population could be divided
into isolated subpopulations, each specializing in the
colonization of different groups of hosts. In this scenario,
gene flow between subpopulations would be most greatly
limited at sites closely genetically linked to genes responsible for host specialization. This process could, however,
exert an influence over the entire genome through hitchhiking (Maynard Smith and Haigh 1974). An extension of these
ideas is that stratification is a dynamic process, emerging
from the ongoing coevolution between the pathogen and
the host immune system. New variants that are more efficient at transmitting (e.g., because they have novel antigenic
repertoires) may spread through the population rapidly at
first, but subsequently transmission rates decrease as
potential hosts become resistant as a consequence of prior
infection.
These ideas are related to the epidemic-clone model
(Maynard Smith et al. 1993), in which genetic variation in
natural pathogen populations is repeatedly structured by the
origin of novel pathogen types that cause epidemics of
limited duration. In the case of the meningococcus, this
model is apparently inapplicable, as transmission is not
disease associated (Levin and Bull 1994; Maiden 2000);
however, increased disease risk, although not adaptive to
isolates, could be an unavoidable consequence of increased
transmission efficiency. The ST-11 complex and the other
hyperinvasive lineages could, therefore, be recent in
evolutionary origin and associated with increased transmission rates, perhaps through the possession of novel
genotypes at antigen-encoding loci, such as the capsule
operon. Such variants would tend to rapidly increase in
frequency, generating a hitchhiking event at linked sites
and, therefore, generating high-frequency STs. As host
immunity increased at the population level, so the selective
advantage of the novel variant would decrease, preventing
any single variant from dominating the entire population.
This dynamic would lead to repeated structuring of genetic
variation and differentiation between disease-associated
and carrier isolates at the ST level, as reported here. The
repeated and independent origin of such variants would
mean that disease-associated and carriage populations
would share a common gene pool at the level of nucleotide
polymorphisms; this is also consistent with our findings.
More extensive genome-wide analyses of variation could
identify candidate genes for disease association by
revealing loci with elevated levels of structuring.
Evolutionary Parameter Estimates
Estimates of the population-mutation rate, the
population-recombination rate, and the average size of
DNA fragments introduced by recombination enabled a
comparison of the relative influence of recombination and
mutation on patterns of diversity observed. The relative
rate of recombination to mutation has been proposed to be
of fundamental importance in determining the degree of
clonal structure present in a given bacterial population
(Feil et al. 1999, 2000, 2001). In terms of intralocus diversity, novel haplotypes of a locus of length L sites can be
generated either by mutation, at rate μL, or by a recombination event in which at least one end of the incorporated
fragment lies within the locus, at rate rL. The ratio r/μ,
therefore, places an upper bound on the relative contribution of recombination to mutation in generating haplotype
diversity. For the Czech carriage collection, this ratio, as
calculated from the ratio of estimates, ρ/θ, ranges from
0.16 to 1.83, which indicates that recombination and
mutation are of roughly equal importance in generating
allelic diversity within a locus.
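The ratio of the two population-scaled estimates can stand in for the per-site ratio because the effective population size cancels. The line below is a sketch of that step, writing ρ for the population-recombination rate, θ for the population-mutation rate, and assuming both are scaled by the same factor (shown here as 2N_e; the exact constant is an assumption and does not affect the result):

```latex
\frac{\hat{\rho}}{\hat{\theta}} \approx \frac{2 N_e\, r}{2 N_e\, \mu} = \frac{r}{\mu},
\qquad
0.16 \le \frac{\hat{\rho}}{\hat{\theta}} \le 1.83
\quad \text{(Czech carriage collection)}.
```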
It is also informative to consider the relative role of
recombination and mutation in generating diversity at the
interlocus or genome level. A given single-nucleotide site
in the genome will change by mutation at rate μ and by
recombination at the rate at which gene conversion events
that include that site occur, rt/2, multiplied by the probability that the incorporated site has a different nucleotide,
the average pairwise difference (π). Using the Czech
carriage data set, the relative impact of recombination to
mutation can be estimated by multiplying π, which ranges
from 0.009 for adk to 0.067 for aroE, by the tract length, t,
estimated at 1.1 kb (fig. 2), and the r/μ value for the
appropriate locus divided by 2. Across loci, estimates of
this ratio lie in the remarkably narrow range of 6.2 to 16.8,
which is within the range of “r/m” values of 4 to 100
calculated previously for the set of 107 global-disease
isolates with counting techniques (Feil et al. 1999, 2000,
2001). In summary, in terms of generating novel genomes,
recombination is roughly 10 times as important as
mutation.
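The per-site calculation described above can be sketched in a few lines of Python. The function name, the argument names, and the recombination-to-mutation ratio of 0.25 used in the example are illustrative assumptions; the text reports only the range of that ratio (0.16 to 1.83), not its value for each locus. The pairwise difference for aroE and the 1.1 kb tract length are taken from the text.

```python
def per_site_recombination_to_mutation(pi, tract_length_bp, r_over_mu):
    """Relative per-site impact of recombination versus mutation.

    Computes pi * t * (r/mu) / 2, i.e. the average pairwise difference
    multiplied by the tract length and by half the per-locus
    recombination-to-mutation ratio, as described in the text.
    """
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must be a proportion between 0 and 1")
    if tract_length_bp <= 0:
        raise ValueError("tract length must be positive")
    if r_over_mu < 0:
        raise ValueError("r/mu ratio must be non-negative")
    return pi * tract_length_bp * r_over_mu / 2.0


TRACT_LENGTH_BP = 1100      # 1.1 kb tract length estimate (from the text)
PI_AROE = 0.067             # average pairwise difference for aroE (from the text)
HYPOTHETICAL_R_OVER_MU = 0.25  # assumed for illustration, NOT the aroE estimate

example = per_site_recombination_to_mutation(
    PI_AROE, TRACT_LENGTH_BP, HYPOTHETICAL_R_OVER_MU
)
print(f"per-site recombination:mutation impact = {example:.2f}")
```

With these inputs the result is about 9.2, which falls inside the reported range of 6.2 to 16.8. Because the two factors are locus specific, the extremes of π and of the ratio should not be combined freely.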
The average tract length in recombination events was
estimated to be 1.1 kb. This is considerably less than the
values of 7.6 kb (Feil et al. 2000) and 5 to 10 kb (Linz et al.
2000) estimated from direct observation. The discrepancy
is most likely a result of the difference between those
events that occur and those that persist over evolutionary
time. Larger tracts will introduce more nucleotide differences into the existing genome and, if epistatic interactions
are important, are more likely to lead to a decrease in
fitness (Zhu et al. 2001) than will shorter tracts. Shorter
recombination tracts are less likely to lead to fitness
decreases but are also harder to detect by direct methods. It
is also worth noting that the average size of coding sequences in the meningococcal genome, at 852 bp (Parkhill
et al. 2000), is smaller than the current estimate of the
average size of imported fragments. Consequently, many
recombination events will include complete coding sequences. In other words, gene replacement events would
be more common than the generation of new alleles with
mosaic structure when compared with the rates observed
in bacteria that exhibit recombination fragment sizes that
are smaller than the average coding sequence, such as
Helicobacter pylori (Falush et al. 2001).
Origin of Hyperinvasive Lineages
The emergence of pathogenic variants within populations of commensal organisms can be rationalized when the
disease syndrome contributes to the spread of the pathogen
(May and Anderson 1983); however, it is more difficult to
explain the emergence and persistence of pathogenic
variants in populations of bacteria such as N. meningitidis,
where the disease syndromes caused do not generate
opportunities for host-to-host spread and, indeed, are
inimical to it (Levin and Bull 1994; Maiden 2000). As
suggested above, this apparent paradox is resolved if
disease is a consequence of increased transmission efficacy
in recently arisen or introduced variants. The behavior of
the ST-11 clonal complex is particularly illustrative of this
effect. In both the Czech disease and carriage collections,
the most common ST was ST-11, the principal, or “central”
(Urwin and Maiden 2003), ST of the ST-11 complex
(previously called the ET-37 complex), a hyperinvasive
lineage that has caused disease globally for at least the past
40 years (Wang et al. 1993; Maiden et al. 1998). Previous
studies indicated that normally these meningococci exhibit
very low point prevalence in carriage, even during disease
outbreaks (Caugant et al. 1988; Feavers et al. 1999;
Kellerman et al. 2002), and before 1993, this hyperinvasive
lineage was not found in the Czech Republic (Krizova and
Musilek 1995). However, at the time of the 1993 sample,
the Czech Republic was experiencing a major epidemic of
serogroup C ST-11 complex meningococci. In short, a novel
variant to which the population was naïve was sweeping
through the carriage population and generating a large
number of disease cases.
Acknowledgments
This work was funded by the Wellcome Trust. Part of
the work performed in the Czech Republic was supported
by grant No. NI/6882-3 from the Internal Grant Agency
of the Ministry of Health of the Czech Republic. M.C.J.M.
is a Wellcome Trust Senior Research Fellow and G.Mc.V.
is a Royal Society University Research Fellow. D.J.W. is
funded by a BBSRC research studentship.
Literature Cited
Balding, D. J. 2003. Likelihood-based inference for genetic
correlation coefficients. Theor. Popul. Biol. 63:221–230.
Broome, C. V. 1986. The carrier state: Neisseria meningitidis. J.
Antimicrob. Chemother. 18(Suppl. A):25–34.
Caugant, D. A. 1998. Population genetics and molecular
epidemiology of Neisseria meningitidis. Apmis 106:505–525.
Caugant, D. A., B. E. Kristiansen, L. O. Frøholm, K. Bovre, and
R. K. Selander. 1988. Clonal diversity of Neisseria
meningitidis from a population of asymptomatic carriers.
Infect. Immunol. 56:2060–2068.
Excoffier, L., P. E. Smouse, and J. M. Quattro. 1992. Analysis of
molecular variance inferred from metric distances among
DNA haplotypes: application to human mitochondrial DNA
restriction data. Genetics 131:479–491.
Falush, D., C. Kraft, N. S. Taylor, P. Correa, J. G. Fox,
M. Achtman, and S. Suerbaum. 2001. Recombination and
mutation during long-term gastric colonization by Helicobacter pylori: estimates of clock rates, recombination size, and
minimal age. Proc. Natl. Acad. Sci. USA 98:15056–15061.
Feavers, I. M., S. J. Gray, R. Urwin, J. E. Russell, J. A. Bygraves,
E. B. Kaczmarski, and M. C. J. Maiden. 1999. Multilocus
sequence typing and antigen gene sequencing in the investigation of a meningococcal disease outbreak. J. Clin. Microbiol.
37:3883–3887.
Feil, E. J., E. C. Holmes, D. E. Bessen et al. (12 co-authors).
2001. Recombination within natural populations of pathogenic bacteria: short-term empirical estimates and long-term
phylogenetic consequences. Proc. Natl. Acad. Sci. USA
98:182–187.
Feil, E. J., M. C. J. Maiden, M. Achtman, and B. G. Spratt. 1999.
The relative contributions of recombination and mutation to
the divergence of clones of Neisseria meningitidis. Mol. Biol.
Evol. 16:1496–1502.
Feil, E. J., J. Maynard Smith, M. C. Enright, and B. G. Spratt.
2000. Estimating recombinational parameters in Streptococcus pneumoniae from multilocus sequence typing data.
Genetics 154:1439–1450.
Fu, Y. X. 1996. New statistical tests of neutrality for DNA
samples from a population. Genetics 143:557–570.
Holmes, E. C., R. Urwin, and M. C. J. Maiden. 1999. The
influence of recombination on the population structure and
evolution of the human pathogen Neisseria meningitidis. Mol.
Biol. Evol. 16:741–749.
Jolley, K. A., M. S. Chan, and M. C. Maiden. 2004. mlstdbNet—
distributed multi-locus sequence typing (MLST) databases.
BMC Bioinformatics 5:86.
Jolley, K. A., J. Kalmusova, E. J. Feil, S. Gupta, M. Musilek,
P. Kriz, and M. C. Maiden. 2000. Carried meningococci in
the Czech Republic: a diverse recombining population.
J. Clin. Microbiol. 38:4492–4498.
———. 2002. Carried meningococci in the Czech Republic:
a diverse recombining population. J. Clin. Microbiol.
40:3549–3550.
Kellerman, S. E., K. McCombs, M. Ray et al. (11 co-authors).
2002. Genotype-specific carriage of Neisseria meningitidis
in Georgia counties with hyper- and hyposporadic rates of
meningococcal disease. J. Infect. Dis. 186:40–48.
Krizova, P., and M. Musilek. 1995. Changing epidemiology of
meningococcal invasive disease in the Czech Republic caused
by new clone Neisseria meningitidis C:2a:P1.2(P1.5), ET-15/
37. Central Eur. J. Public Health 3:189–194.
Levin, B. R. 1996. The evolution and maintenance of virulence in
microparasites. Emerg. Infect. Dis. 2:93–102.
Levin, B. R., and J. J. Bull. 1994. Short-sighted evolution and
the virulence of pathogenic microorganisms. Trends Microbiol.
2:76–81.
Linz, B., M. Schenker, P. Zhu, and M. Achtman. 2000. Frequent
interspecific genetic exchange between commensal neisseriae
and Neisseria meningitidis. Mol. Microbiol. 36:1049–1058.
Maiden, M. C. 2002. Population structure of Neisseria
meningitidis. Pp. 151–170 in C. Ferreirós, M. T. Criado,
and J. Vázquez, eds. Emerging strategies in the fight against
meningitis: molecular and cellular aspects. Horizon Scientific
Press, Wymondham, Norfolk, United Kingdom.
Maiden, M. C. J. 2000. High-throughput sequencing in the population analysis of bacterial pathogens. Int. J. Med. Microbiol.
290:183–190.
Maiden, M. C. J., J. A. Bygraves, E. Feil et al. (13 co-authors).
1998. Multilocus sequence typing: a portable approach to
the identification of clones within populations of pathogenic microorganisms. Proc. Natl. Acad. Sci. USA 95:3140–
3145.
May, R. M., and R. M. Anderson. 1983. Epidemiology and
genetics in the coevolution of parasites and hosts. Proc. R.
Soc. Lond. B Biol. Sci. 219:281–313.
Maynard Smith, J., and J. Haigh. 1974. The hitch-hiking effect of
a favourable gene. Genet. Res. 23:23–35.
Maynard Smith, J., N. H. Smith, M. O’Rourke, and B. G. Spratt.
1993. How clonal are bacteria? Proc. Natl. Acad. Sci. USA
90:4384–4388.
McVean, G., P. Awadalla, and P. Fearnhead. 2002. A coalescent-based method for detecting and estimating recombination
from gene sequences. Genetics 160:1231–1241.
Parkhill, J., M. Achtman, K. D. James et al. (28 co-authors).
2000. Complete DNA sequence of a serogroup A strain of
Neisseria meningitidis Z2491. Nature 404:502–506.
Rosenstein, N. E., B. A. Perkins, D. S. Stephens, T. Popovic, and
J. M. Hughes. 2001. Meningococcal disease. N. Engl. J. Med.
344:1378–1388.
Schneider, S., D. Roessli, and L. Excoffier. 2000. Arlequin
version 2.000: software for population genetic data analysis.
University of Geneva, Geneva, Switzerland.
Selander, R. K., D. A. Caugant, H. Ochman, J. M. Musser, M. N.
Gilmour, and T. S. Whittam. 1986. Methods of multilocus
enzyme electrophoresis for bacterial population genetics and
systematics. Appl. Environ. Microbiol. 51:837–884.
Tajima, F. 1989. Statistical method for testing the neutral
mutation hypothesis by DNA polymorphism. Genetics 123:
585–595.
Urwin, R., and M. C. Maiden. 2003. Multi-locus sequence
typing: a tool for global epidemiology. Trends Microbiol.
11:479–487.
van Deuren, M., P. Brandtzaeg, and J. W. van der Meer. 2000.
Update on meningococcal disease with emphasis on pathogenesis and clinical management. Clin. Microbiol. Rev.
13:144–166.
Wang, J.-F., D. A. Caugant, G. Morelli, B. Koumaré, and
M. Achtman. 1993. Antigenic and epidemiological properties
of the ET-37 complex of Neisseria meningitidis. J. Infect. Dis.
167:1320–1329.
Watterson, G. A. 1975. On the number of segregating sites.
Theor. Popul. Biol. 7:256.
Wenzel, R. P., J. A. Davies, J. R. Mitzel, and W. E. Beam Jr.
1973. Non-usefulness of meningococcal carriage-rates. Lancet
2:205.
Wright, S. 1943. Isolation by distance. Genetics 28:114–138.
———. 1951. The genetical structure of populations. Annal.
Eugen. 15:323–354.
Zhu, P., A. van der Ende, D. Falush et al. (16 co-authors). 2001.
Fit genotypes and escape variants of subgroup III Neisseria
meningitidis during three pandemics of epidemic meningitis.
Proc. Natl. Acad. Sci. USA 98:5234–5239.
Pierre Capy, Associate Editor
Accepted October 26, 2004